Build Hadoop from Source

Instructions in this write-up were tested and run successfully on Ubuntu 10.04, 10.10, 11.04, and 11.10. The instructions should run, with minor modifications, on most flavors and variants of Linux and Unix. For example, replacing apt-get with yum should get it working on Fedora and CentOS.

If you are starting out with Hadoop, one of the best ways to get it working on your box is to build it from source. Using stable binary distributions is an option, but a rather risky one. You are unlikely to stop at Hadoop common; you will probably go on to set up Pig and Hive for analyzing data and may also give HBase a try. The Hadoop suite of tools suffers from a huge version mismatch and version confusion problem. So much so that many start out with Cloudera’s distribution, also known as CDH, simply because it solves this version confusion disorder.

Michael Noll’s well-written blog post, Building an Hadoop 0.20.x version for HBase 0.90.2, serves as a great starting point for building the Hadoop stack from source. I recommend you read it and follow the steps in that article to build and install Hadoop common. Early on in the article you are told about a critical problem that HBase faces when run on top of a stable release version of Hadoop: HBase may lose data unless it is running on top of an HDFS with durable sync. This important feature is only available in the branch-0.20-append of the Hadoop source and not in any of the release versions.

Assuming you have successfully followed Michael’s guidelines, you should have the Hadoop jars built and available in a folder named ‘build’ within the folder that contains the Hadoop source. At this stage, it’s advisable to configure Hadoop and take a test drive.

Configure Hadoop: Pseudo-distributed mode

Running Hadoop in pseudo-distributed mode provides a little taste of a cluster install using a single node. The Hadoop infrastructure includes a few daemon processes, namely

  1. HDFS namenode, secondary namenode, and datanode(s)
  2. MapReduce jobtracker and tasktracker(s)

When run on a single node, you can choose to run everything within a single Java process (also known as standalone mode) or run each daemon in a separate Java process (pseudo-distributed mode).

If you go with the pseudo-distributed setup, you will need to provide some minimal custom configuration to your Hadoop install. In Hadoop, the general philosophy is to bundle a default configuration with the source and allow for overriding it using a separate configuration file. For example, hdfs-default.xml, which you can find in the ‘src/hdfs’ folder of your Hadoop root folder, contains the default configuration for HDFS properties. This file gets bundled within the compiled and packaged Hadoop jar file, and Hadoop uses the configuration specified in it for setting up HDFS properties. If you need to override any of the HDFS properties that use the default configuration from hdfs-default.xml, re-specify that property in a file named hdfs-site.xml. This custom configuration file resides in the ‘conf’ folder within the Hadoop root folder. Custom configuration in core-site.xml, hdfs-site.xml, and mapred-site.xml corresponds to default configuration in core-default.xml, hdfs-default.xml, and mapred-default.xml, respectively.

Contents of conf/core-site.xml after custom configuration:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 
<!-- Put site-specific property overrides in this file. -->
 
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

The specified configuration makes the HDFS namenode daemon accessible on port 9000 on localhost.

Contents of conf/hdfs-site.xml after custom configuration:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 
<!-- Put site-specific property overrides in this file. -->
 
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>path/to/data/dfs/name</value>
    <description>Determines where on the local filesystem the DFS name node
      should store the name table(fsimage).  If this is a comma-delimited list
      of directories then the name table is replicated in all of the
      directories, for redundancy. </description>
  </property>
</configuration>

The override specifies a replication factor of 1. On a single node, you can’t have a failover, can you? The custom configuration also sets the namenode directory to a path on your file system. The default is a folder within your /tmp folder, which gets purged on a restart.
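
As a small sketch of picking a durable location for dfs.name.dir, the helper below creates a dfs/name folder under a base directory of your choice and prints the resulting path, which you can then paste into the value element. The function name make_name_dir and the base folder used in the usage note are assumptions for illustration, not something the Hadoop build requires.

```shell
# make_name_dir BASE: create BASE/dfs/name, restrict it to the current user,
# and print the resulting path (hypothetical helper, not part of Hadoop).
make_name_dir() {
  name_dir="$1/dfs/name"
  mkdir -p "$name_dir" && chmod 700 "$name_dir" && echo "$name_dir"
}
```

For example, running make_name_dir "$HOME/hadoop-data" would give you a path under your home folder, which survives a restart, unlike /tmp.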

Contents of conf/mapred-site.xml after custom configuration:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 
<!-- Put site-specific property overrides in this file. -->
 
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>

After this configuration the MapReduce jobtracker is accessible on port 9001 on localhost.

Finally, set JAVA_HOME in conf/hadoop-env.sh. On my Ubuntu 11.10 machine, I have it set as follows:

export JAVA_HOME=/usr/lib/jvm/java-6-openjdk
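
If you are not sure where your JDK lives, one way to find a candidate value is to resolve the javac symlink and strip the trailing /bin/javac. The sketch below assumes a GNU userland (readlink -f) and that javac is on your PATH; the helper name java_home_from_javac is made up for this example.

```shell
# java_home_from_javac PATH: strip the trailing /bin/javac from a resolved javac path.
java_home_from_javac() {
  dirname "$(dirname "$1")"
}

# If a JDK is installed, follow the javac symlink chain and print the candidate JAVA_HOME.
if command -v javac >/dev/null 2>&1; then
  java_home_from_javac "$(readlink -f "$(command -v javac)")"
fi
```

Treat the printed folder as a suggestion and check that it really contains bin/java before putting it into hadoop-env.sh.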

If you don’t have passphraseless ssh set up on your machine, then you may need to execute the following commands:

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

This creates a DSA key pair named id_dsa and adds its public key to the set of authorized_keys for the SSH server running on your localhost. If you aren’t sure whether passphraseless ssh access is set up, simply run ‘ssh localhost’ in a terminal. If you are prompted for a password, then you need to complete the steps to set up passphraseless ssh.
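
The same check can be done without an interactive prompt. The sketch below relies on ssh’s BatchMode option, which makes ssh fail instead of asking for a password or passphrase; the function name has_passphraseless_ssh is an assumption for this example. Note that a first-time connection may also fail in batch mode simply because the host key has not been accepted yet.

```shell
# has_passphraseless_ssh HOST: succeed only if ssh can log in to HOST
# and run a trivial command without prompting for anything.
has_passphraseless_ssh() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true >/dev/null 2>&1
}
```

You could then run has_passphraseless_ssh localhost and, if it fails, go through the ssh-keygen steps above.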

Now start up the Hadoop daemons using:

bin/start-all.sh

Run this command from the root of your Hadoop folder.
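
To confirm that the daemons listed earlier actually came up, you can look at the output of jps, the JDK tool that lists running Java processes. The helper below is a minimal sketch: it reads jps output on standard input and prints any of the five expected daemons that are missing. The function name missing_daemons is made up for this example.

```shell
# missing_daemons: read `jps` output on stdin and print each expected
# Hadoop daemon that does not appear in it.
missing_daemons() {
  jps_output="$(cat)"
  for daemon in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
    # jps prints "<pid> <main class>", so match the class name at the end of a line.
    printf '%s\n' "$jps_output" | grep -q " $daemon\$" || echo "$daemon"
  done
}
```

Run it as jps | missing_daemons; no output means all five daemons were found.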

As an additional step, set the HADOOP_HOME environment variable to point to the root of your Hadoop folder. HADOOP_HOME is used by other pieces of software, like HBase, Pig, and Hive, that are built on top of Hadoop.
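
A minimal sketch of what that could look like, for example in your ~/.bashrc. The folder name hadoop-common under your home directory is only an assumed location; point it at wherever your Hadoop source and build actually live.

```shell
# Hypothetical checkout location; point this at the root of your own Hadoop folder.
export HADOOP_HOME="${HADOOP_HOME:-$HOME/hadoop-common}"
# Optional convenience: make the Hadoop launcher scripts available on PATH.
export PATH="$PATH:$HADOOP_HOME/bin"
```

The ${HADOOP_HOME:-...} form keeps any value you have already exported and only falls back to the assumed path when the variable is unset.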

Running a Simple Example
If you ran the tests after the Hadoop common install and they passed, then you should be ready to use Hadoop. However, for completeness, I would suggest running a simple Hadoop example while the daemons are up and waiting. Run the simple example illustrated in the official document, available online at http://hadoop.apache.org/common/docs/r0.20.203.0/single_node_setup.html#PseudoDistributed. The example is available at the end of the sub-section on Pseudo-Distributed Operation.

Build HBase from Source

Once Hadoop is up and running, you are ready to build and run HBase. Start by getting the HBase source as follows:

git clone https://github.com/apache/hbase.git

I clone it from the Apache HBase mirror on GitHub. Alternatively, you can get the source from the HBase svn repository, which is where the official commits are checked in.

HBase can make use of Snappy compression. Snappy is a fast compression/decompression library, which was built by Google and is available as an open source software library under the ‘New BSD License’. The official site defines snappy as follows:

Snappy is a compression/decompression library. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. For instance, compared to the fastest mode of zlib, Snappy is an order of magnitude faster for most inputs, but the resulting compressed files are anywhere from 20% to 100% bigger. On a single core of a Core i7 processor in 64-bit mode, Snappy compresses at about 250 MB/sec or more and decompresses at about 500 MB/sec or more.

Snappy is widely used inside Google, in everything from BigTable and MapReduce to our internal RPC systems. (Snappy has previously been referred to as “Zippy” in some presentations and the likes.)

You can learn more about Snappy at http://code.google.com/p/snappy/.

To build HBase with snappy support we need to do the following:

  1. Build and install the snappy library
  2. Build and install hadoop-snappy, the library that bridges snappy and Hadoop
  3. Compile HBase with snappy

Build and Install snappy
Building and installing snappy is easy and quick. Get the latest stable snappy release as follows:

wget http://snappy.googlecode.com/files/snappy-1.0.4.tar.gz

The current latest release version is 1.0.4. This version number could vary as newer versions supersede this version.
Once you download the snappy zipped tarball, extract it:

tar zxvf snappy-1.0.4.tar.gz

Next, change into the snappy extracted folder and use the common configure, make, make install trio to complete the build and install process.

cd snappy-1.0.4
./configure && make && sudo make install

‘make install’ usually needs superuser privileges, which is why the command above runs it as ‘sudo make install’.

Build and Install hadoop-snappy
Hadoop-snappy is a project for Hadoop that provides access to the snappy compression/decompression library. You can learn details about hadoop-snappy at http://code.google.com/p/hadoop-snappy/. Building and installing hadoop-snappy requires Maven. To get started, check out the hadoop-snappy code from its subversion repository like so:

svn checkout http://hadoop-snappy.googlecode.com/svn/trunk/ hadoop-snappy-read-only

Then, change to the ‘hadoop-snappy-read-only’ folder and make a small modification to maven/build-compilenative.xml:

<!-- add JAVA_HOME as an env var -->
<exec dir="${native.staging.dir}" executable="sh" failonerror="true">
    <env key="OS_NAME" value="${os.name}"/>
    <env key="OS_ARCH" value="${os.arch}"/>
    <env key="JVM_DATA_MODEL" value="${sun.arch.data.model}"/>
    <env key="JAVA_HOME" value="/usr/lib/jvm/java-6-openjdk"/>
    <arg line="configure ${native.configure.options}"/>
</exec>

Also, install a few required zlib-related libraries:

sudo apt-get install zlibc zlib1g zlib1g-dev

Next, build Hadoop-snappy using maven like so:

sudo mvn package

Once hadoop-snappy is built, install the jar and tar distributions of hadoop-snappy to your local repository:

mvn install:install-file -DgroupId=org.apache.hadoop -DartifactId=hadoop-snappy -Dversion=0.0.1-SNAPSHOT -Dpackaging=jar -Dfile=./target/hadoop-snappy-0.0.1-SNAPSHOT.jar
mvn install:install-file -DgroupId=org.apache.hadoop -DartifactId=hadoop-snappy -Dversion=0.0.1-SNAPSHOT -Dclassifier=Linux-amd64-64 -Dpackaging=tar -Dfile=./target/hadoop-snappy-0.0.1-SNAPSHOT-Linux-amd64-64.tar

Compile, Configure & Run HBase
Once snappy and hadoop-snappy are compiled and installed, you are ready to compile HBase with snappy support. Change to the folder that contains the HBase repository clone and run the ‘mvn compile’ command to build HBase from source.

cd hbase
mvn compile -Dsnappy

The -Dsnappy option tells maven to compile HBase with snappy support.

Earlier, I set up Hadoop to run in pseudo-distributed mode. Let’s configure HBase to also run in pseudo-distributed mode. As with Hadoop, the default configuration for HBase is available in hbase-default.xml and custom configuration can be specified to override the default configuration. Custom configuration resides in conf/hbase-site.xml. To set up HBase in pseudo-distributed mode, make sure the contents of conf/hbase-site.xml are as follows:

<configuration>
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://localhost:9000/hbase</value>
        <description>The directory shared by RegionServers.
        </description>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
        <description>The replication count for HLog and HFile storage.
        Should not be greater than HDFS datanode count.
        </description>
    </property>
</configuration>

You may recall we configured the HDFS namenode to be accessible on port 9000 on localhost. Therefore, ‘hbase.rootdir’ needs to be specified with respect to that HDFS URL. If you configure the HDFS namenode to run on a different port, then please adjust the configuration for ‘hbase.rootdir’ in line with that. The second custom property definition sets the replication factor to 1. On a single node that’s the best and only option you have!
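
Since a mismatch between the two files is easy to introduce, here is a rough sketch of a consistency check. It pulls fs.default.name out of core-site.xml and hbase.rootdir out of hbase-site.xml and verifies that the latter starts with the former. Both function names are made up for this example, and the parsing is deliberately naive: it assumes each name and value element sits on its own line, as in the listings above, and is not a real XML parser.

```shell
# get_property FILE NAME: print the <value> that follows <name>NAME</name>.
# Assumes every <name> and <value> element sits on a line of its own.
get_property() {
  awk -v want="$2" '
    /<name>/ {
      name = $0
      sub(/.*<name>/, "", name)
      sub(/<\/name>.*/, "", name)
    }
    /<value>/ {
      value = $0
      sub(/.*<value>/, "", value)
      sub(/<\/value>.*/, "", value)
      if (name == want) { print value; exit }
    }
  ' "$1"
}

# rootdir_matches_fs CORE_SITE HBASE_SITE: succeed if hbase.rootdir
# is a path underneath the URL given by fs.default.name.
rootdir_matches_fs() {
  fs="$(get_property "$1" fs.default.name)"
  rootdir="$(get_property "$2" hbase.rootdir)"
  [ -n "$fs" ] || return 1
  case "$rootdir" in
    "$fs"/*) return 0 ;;
    *)       return 1 ;;
  esac
}
```

With HADOOP_HOME set, you might call it from the HBase folder as rootdir_matches_fs "$HADOOP_HOME/conf/core-site.xml" conf/hbase-site.xml and fix the port in hbase-site.xml if it reports a failure.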

Now you can start up HBase using:

bin/start-hbase.sh

Build Pig
Building and installing Pig from source is a simple three-command operation:

svn checkout http://svn.apache.org/repos/asf/pig/trunk/ pig
cd pig
ant

The first command checks out the source from the Pig svn repository. It grabs the source from the ‘trunk’, which is referred to as ‘master’ in git jargon. The second command changes to the folder that contains the Pig source. The third command compiles the Pig source using Apache Ant. Invoking the default target, i.e. simply calling ‘ant’ without any argument, compiles Pig and packages it as a jar for distribution and consumption. The Pig jar file can be found at the root of the Pig folder. Pig usually generates two jar files:

  1. pig.jar — to be run with Hadoop.
  2. pigwithouthadoop.jar — to be run locally. Pig does not need to always use Hadoop.

Build Hive
Building and Installing Hive is almost as easy as building and installing Pig. The following set of commands gets the job done:

svn co http://svn.apache.org/repos/asf/hive/trunk hive
cd hive
ant package

You should be able to understand the commands if you have come so far in this article.

There is one little catch in this Hive instruction set though. As you run the ‘ant package’ task you will see the build fail. With HADOOP_HOME pointing to a hadoop-0.20-append branch build, the Hive ShimLoader does not get the Hadoop version correctly. It’s the “-” in the name that causes the problem! Apply the simple patch available at https://issues.apache.org/jira/browse/HIVE-2294 and things should work just fine. Apply the patch as follows:

patch -p0 -i HIVE-2294.3.patch
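
To get a feel for how a “-” in a version string can throw off version detection, here is a simplified illustration. It is not Hive’s actual code, and the version string 0.20-append-r1056497 is only an assumed example of what an append-branch build might report. The naive function takes the first two dot-separated fields as they are, while the tolerant one drops everything after the first “-” before splitting.

```shell
# naive_major VERSION: take the first two dot-separated fields as they are.
naive_major() {
  echo "$1" | cut -d. -f1,2
}

# tolerant_major VERSION: drop anything after the first "-" before splitting.
tolerant_major() {
  echo "${1%%-*}" | cut -d. -f1,2
}
```

For a release-style string such as 0.20.2 both functions yield 0.20, but for the assumed append-branch string only tolerant_major does; naive_major returns the whole string, suffix included.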

To start using Hive, you will also need to minimally carry out these additional tasks:

  • Set HIVE_HOME environment variable to point to the root of the HIVE directory.
  • Add $HIVE_HOME/bin to $PATH
  • Create /tmp in HDFS and set appropriate permissions
    bin/hadoop fs -mkdir /tmp
    bin/hadoop fs -chmod g+w /tmp
  • Create /user/hive/warehouse and set appropriate permissions
    bin/hadoop fs -mkdir /user/hive/warehouse
    bin/hadoop fs -chmod g+w /user/hive/warehouse

Now you are ready with pseudo-distributed Hadoop, pseudo-distributed HBase, Pig, and Hive running on your box. This is of course just the beginning. You need to learn to leverage these tools to analyze data, but that is not covered in this write-up. A following post will possibly address the topic of analyzing data using MapReduce and its abstractions.

Scala syntax highlighting in gedit

Update: A small typo, an unnecessary “<” tag before xmlns in scala-mime.xml has been corrected. Thanks @win for finding the error. See the comments below for additional references.

The default text editor on Ubuntu, or for that matter any Gnome powered desktop, is gedit. If you are a developer like me, who isn’t a huge fan of IDE(s), there is a good chance you use gedit for some of your development. Gedit supports syntax highlighting for a number of languages but if you were hacking some Scala code using the editor, you wouldn’t find any syntax highlighting support out-of-the-box. However, the Scala folks offer gedit syntax highlighting support via the scala-tool-support subproject. To get it working with your gedit installation, do the following:

  1. Download the scala.lang file from http://lampsvn.epfl.ch/trac/scala/browser/scala-tool-support/trunk/src/gedit/scala.lang. You can check out the source using svn or scrape the screen by simply copying the contents and pasting them into a file named scala.lang. On Ubuntu, using Ctrl-Shift and the mouse helps accurately select and copy the content from the screen.
  2. Copy or move scala.lang file to ~/.gnome2/gtksourceview-1.0/language-specs/
  3. Create a file named scala-mime.xml at /usr/share/mime/packages/ using
    sudo touch /usr/share/mime/packages/scala-mime.xml
  4. Add the following contents to scala-mime.xml:
    <?xml version="1.0" encoding="UTF-8"?>
    <mime-info
     xmlns='http://www.freedesktop.org/standards/shared-mime-info'>
    <mime-type type="text/x-scala">
    <comment>Scala programming language</comment>
    <glob pattern="*.scala"/>
    </mime-type>
    </mime-info>
  5. Run
    sudo update-mime-database /usr/share/mime
  6. Start (or restart, if it’s running) gedit and you now have Scala syntax highlighting in place.
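
If you prefer not to type the XML by hand, steps 3 and 4 can be sketched as a small helper that writes the same MIME definition to a file of your choice. The function name write_scala_mime is made up for this example, and the XML is just the content from step 4 with indentation added.

```shell
# write_scala_mime FILE: write the MIME definition from step 4 to FILE.
write_scala_mime() {
  cat > "$1" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<mime-info xmlns='http://www.freedesktop.org/standards/shared-mime-info'>
  <mime-type type="text/x-scala">
    <comment>Scala programming language</comment>
    <glob pattern="*.scala"/>
  </mime-type>
</mime-info>
EOF
}
```

Because /usr/share/mime/packages/ is owned by root, one option is to write the file somewhere you own first, for instance write_scala_mime /tmp/scala-mime.xml, and then copy it into place with sudo cp before running update-mime-database.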

Ubuntu and HP TouchSmart Sound

I upgraded my Ubuntu install on my HP TouchSmart machine to version 11.04 (Natty Narwhal). Ubuntu 11.04 Unity Desktop experience is so nice and smooth that I started using my HP TouchSmart actively again. It had been sitting gathering dust for the last many months!

The last version of Ubuntu on this machine was 10.04, which was upgraded to 11.04, via a 10.10 upgrade en route. During 10.04 days, I had trouble getting Ubuntu to work smoothly on this machine. The internal speakers did not work (only external speakers did), the wifi did not work, and the touch screen lost its touch qualities. After I upgraded to 11.04, I somehow believed many of these past woes would get corrected but that wasn’t the case. So I actively started making some effort to resolve these issues. Getting the internal speaker sound to work was the first of the things I did and surprisingly a few minutes is all I needed to solve the problem.

The fix on the TouchSmart is really a simple one-line addition to a configuration file. Open the terminal and type the following:

sudo gedit /etc/modprobe.d/alsa-base.conf

This will open alsa-base.conf in gedit, the official text editor on the Gnome desktop. If you like vi instead of gedit then open the file as follows:

sudo vi /etc/modprobe.d/alsa-base.conf

At the very end of alsa-base.conf, add the following line:

options snd-hda-intel model=touchsmart

Now, save the file and reload alsa using:

sudo alsa force-reload

and the internal speakers are in business. That was quick and simple. Wasn’t it?
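
If you would rather script the edit than open an editor, the sketch below appends a line to a file only when an identical line is not already there, so running it twice does not duplicate the option. The function name append_once is an assumption for this example.

```shell
# append_once LINE FILE: add LINE to FILE unless an identical line already exists.
append_once() {
  grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}
```

Since /etc/modprobe.d/alsa-base.conf is owned by root, the redirection inside the function needs a root shell; alternatively, the same grep test can guard an echo piped into sudo tee -a on that file.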

A little peek into why this fix works and how this may apply to systems other than the TouchSmart:

Find out the model of your sound card using:

cat /proc/asound/card0/codec* | grep Codec

On my TouchSmart the output is as follows:

Codec: Analog Devices AD1984A

ALSA (Advanced Linux Sound Architecture) provides audio and MIDI functionality to the Linux OS. Browse the ALSA documentation to see the list of supported audio models for your card. The documentation is available in /usr/share/doc/alsa-base/driver/HD-Audio-Models.txt.gz, which is a compressed file. You can list the content of this file, without decompressing it on disk, as follows:

gunzip -c /usr/share/doc/alsa-base/driver/HD-Audio-Models.txt.gz

It may be a good idea to page through the file using the more command like so:

gunzip -c /usr/share/doc/alsa-base/driver/HD-Audio-Models.txt.gz | more

On my machine, I see the following entries relevant to AD1984A :

....
 
AD1884A / AD1883 / AD1984A / AD1984B
====================================
desktop       3-stack desktop (default)
laptop        laptop with HP jack sensing
mobile        mobile devices with HP jack sensing
thinkpad      Lenovo Thinkpad X300
touchsmart    HP Touchsmart
 
....

(First column is the model and second one is the description)
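
Rather than paging through the whole file, you can also pull out just the section for your codec. The helper below is a rough sketch that assumes the layout shown above: a heading line naming the codec, followed by the model lines, with a blank line ending the section. The function name models_for_codec is made up for this example.

```shell
# models_for_codec CODEC FILE: print the section of the gzipped models file
# that starts at the first line mentioning CODEC and ends at the next blank line.
models_for_codec() {
  gunzip -c "$2" | awk -v codec="$1" '
    index($0, codec) { show = 1 }   # first line that names the codec
    show && NF == 0  { exit }       # stop at the blank line ending the section
    show             { print }
  '
}
```

For the TouchSmart this would be called as models_for_codec AD1984A /usr/share/doc/alsa-base/driver/HD-Audio-Models.txt.gz; if the codec name also appears earlier in the file for some other reason, the section printed may not be the one you want.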

This explains why the snd-hda-intel model option was set to touchsmart. Hopefully this also gives you a clue for finding your own sound card’s codec and the model values it supports if you have a problem getting sound to work on your own Ubuntu install.

For additional reference, consider reading https://help.ubuntu.com/community/HdaIntelSoundHowto.

My new book: Professional NoSQL (Wiley, 2011)

My new book, Professional NoSQL (Wiley, 2011) is now available in bookstores.

NoSQL is an emerging topic and a lot of developers, architects, technology managers, and CIO(s) are fairly confused trying to understand where it fits in the stack. While these folks are trying to come up to speed and climb up the learning curve, many NoSQL enthusiasts and product vendors are presenting the usual jargon heavy, myth centric promises and confusing them further. Given this context, I have made an attempt to present an unbiased and objective overview of the topic: explaining the fundamentals, introducing the products, presenting a few of its nuances, and describing the context in which it exists.

Read the first chapter, which is available for download online and consider buying a copy. If you find errors, then please let me know of them.

Hope you enjoy reading the book and find it useful.

Research Papers and Videos on Google Bigtable, GFS, Chubby and MapReduce

Thanks all for coming to my talk today on sorted ordered column-family stores at the Silicon Valley Cloud Computing Meetup. Here are the links to the Google’s research papers and a couple of videos that relate to Bigtable (and Google App Engine internals):

Bigtable: A Distributed Storage System for Structured Data

http://labs.google.com/papers/bigtable.html

The Chubby Lock Service for Loosely-Coupled Distributed Systems

http://labs.google.com/papers/chubby.html

The Google File System

http://labs.google.com/papers/gfs.html

MapReduce: Simplified Data Processing on Large Clusters

http://labs.google.com/papers/mapreduce.html

BigTable: A Distributed Structured Storage System –

http://video.google.com/videoplay?docid=7278544055668715642&hl=en#

Google I/O 2008 – App Engine Datastore Under the Covers –

http://www.youtube.com/watch?v=tx5gdoNpcZM

Enjoy reading the research papers and watching the videos!


Getting Friendly With Document Databases

Here is a draft of what I plan to present in Part 1 of the NoSQL Series on Jan 24th at Fenwick & West in Mountain View, CA (The series has 4 parts in all. It runs between 1/24 and 1/27, every day at 7pm). The event is hosted by the Silicon Valley Cloud Computing Meetup.

Topic: Getting Friendly With Document Databases

Scope:
Products covered: MongoDB (mongodb.org) and CouchDB (couchdb.apache.org)
Level: Introductory but not cursory. Full of examples.
Duration: 60 mins. (1 hour) — may have too much for an hour. Could do a bit more than an hour if need be.
Session Contents:
– Document databases
  • What are they?
  • Their essential structure (in the context of MongoDB and CouchDB)
  • Data types supported
  • Schemaless
– Creating, Reading, Updating and Deleting Documents
  • Using MongoDB
  • Using CouchDB
– Querying Documents
  • Filtering
  • Ordering
  • Limiting result set
  • Grouping
  • Joining (?)
(Includes MapReduce)
– Indexes
  • Types
  • How-to
– Very first steps in performance tuning
  • Understanding query plans
  • Faster query results
– A few peculiarities
– Questions
This should give you a head start, but an hour isn’t enough to cover all the details, so I am planning on organizing a follow-up 2-day training in February. See you on Jan 24 at Fenwick & West.

Special Guest at the NIT PowerConnect

The NIT (National Institute of Technology) Alumni Network in the Silicon Valley organizes a monthly power connect networking event. I am honored to be invited as a special guest to their event this evening. I am not an NIT alumnus (I attended St. Stephen’s College, XLRI and Courant, NYU) but do know that NIT (which was formerly known as REC) produces a number of very bright engineers every year. Although IIT is the big global brand from India that has produced a number of very smart and well-recognized engineers, few know that NIT has a lot of great success stories as well.

I will be leading the conversation on NoSQL and Cloud Computing and will participate in a panel discussion on “Snakes and Ladders – How to climb the corporate Ladder” with a bunch of well-known and respected professionals including Bala Sahejpal (IT Director at Juniper Networks), Paul Chen (Director at PapayaMobile), Dilip Saraf (a career coach), Anand Kamannavar (Applied Ventures) and Biren Gandhi (Ex Studio CTO Zynga). Looking forward to the exciting event this evening. If you are coming to the event, I look forward to seeing you there.

5 Technology Application Trends in 2011!

5 technology application trends that I think will be most popular in 2011 are:

  1. Tablets, tablets and tablets: iPad started the fever and it’s not stopping anytime soon. Newer and smarter tablets of all sizes, form factors and capabilities will emerge. Newer and newer applications for these devices will be available.
  2. Big data will get bigger: More and more big data will become available in the public domain and we will see the emergence of newer and smarter storage and analytics solutions in the space. In other words NoSQL and all tools that help manage big data will boom. Cloud will continue its expansion.
  3. Local will be king: Groupon has shown the way but there is a lot more to win! Hyperlocal communities will be the way forward. You will see a lot more startups in the space. If you are an investor, don’t forget to put some money there :)
  4. Social networking shakeout & correction: Every boom meets a correction, Facebook and friends will see some correction as well.
  5. Growth of collaboration: the audio-video segment has been largely a consume only space for a while. Collaborative rich communication and interaction will see some innovative new applications.

Happy New Year!

NoSQL Sessions at Silicon Valley Cloud Computing Meetup in January 2011

After Santa Claus has come and gone, NoSQL is coming to town! Come January 2011, I present the core NoSQL ideas, concepts, tools and technologies via a set of 4 back-to-back daily sessions at the Silicon Valley Cloud Computing Meetup. The schedule is as follows:

Jan 24th (Monday): NoSQL Series – Part 1: Getting friendly with document databases
http://www.meetup.com/cloudcomputing/calendar/15226964/
Jan 25th (Tuesday): NoSQL Series – Part 2: Nothing beats a distributed hash
http://www.meetup.com/cloudcomputing/calendar/15226985/
Jan 26th (Wednesday): NoSQL Series – Part 3: HBase beyond the “Hello World!”
http://www.meetup.com/cloudcomputing/calendar/15227013/
Jan 27th (Thursday): NoSQL Series – Part 4: Eventually it’s consistent
http://www.meetup.com/cloudcomputing/calendar/15227037/
The venue is:
Fenwick & West
801 California St
Mountain View, CA 94041

Each day’s session starts at 7pm so you don’t have to miss work to join us. Actually, I wanted to make sure everyone was tired after a long day’s work so there were fewer questions :)

Each session is about an hour and a half long with a short break of 5 mins. or so in the middle. There is pizza, veggies and desserts to go along with the talk.

Thanks to Sebastian Stadil for organizing the Silicon Valley Cloud Computing Meetup and making these sessions possible. If you are into big data, cloud computing and web scale stuff and are in the Bay Area, then this meetup is surely the one you should join.

All these talks will leverage my efforts towards writing Wiley’s Professional NoSQL (coming end of Q1/Q2 2011).

First Steps with BlackBerry PlayBook AIR SDK

Last week at Adobe MAX 2010, the BlackBerry PlayBook was introduced to the world of developers. Positioned as a powerful multi-tasking environment, it promises to pose formidable competition in the tablet landscape. The PlayBook OS is built on QNX, the time-tested real-time OS, and offers an Adobe AIR based SDK for developers to write their apps.

As soon as I was back from MAX, I was itching to write my first “Hello World” app using the BlackBerry PlayBook SDK. So here in this post is an account of that adventure.

To get started with the PlayBook SDK, first go to the BlackBerry Tablet OS development resources and download the versions of the SDK and the simulator appropriate for your platform. At the moment, Mac and Windows are supported. I tried my hand with both Mac and Windows. There are “Getting started” guides for Mac and Windows and that’s where my journey began.

To set up the development environment you need the following:

  • The BlackBerry Tablet OS Simulator ISO
  • VMWare Player (or VMWare Fusion on the Mac) to configure and run a virtual environment on the basis of the ISO
  • AIR 2.5 SDK
  • Flash Builder 4.x (the support officially extends to Flash Builder 4.0.1 but you can make it work with FB Burrito)
  • BlackBerry AIR SDK

The PlayBook simulator leverages VMWare Player to create a virtual environment for you to test your apps. When I began to follow the steps in the “Getting started” guide for Mac and wanted to download and install VMWare Player as per the guidelines in Configure a virtual machine for the BlackBerry Tablet Simulator, I realized that VMWare Player didn’t exist for the Mac at all. So I downloaded a trial version of VMWare Fusion instead to get things working. Once VMWare Fusion is installed, you can follow the instructions in the “Getting started” guide to set up and configure the simulator. Since that is documented and available online, it makes no sense to reproduce it here.

Next, I downloaded the BlackBerry AIR SDK. Flash Builder 4.0.1, AIR 2.5 SDK and Flash Builder Burrito were already installed so I did not need to re-install them. If you don’t have Flash Builder and AIR on your machine then please download and install them before you move forward. Installing the BlackBerry SDK was elementary as a wizard guided me through the process. However, one little thing was very odd. The BlackBerry SDK refused to deploy to any Flash Builder other than one named “Adobe Flash Builder 4”. I have Flash Builder 4 installed as a plug-in and have the standalone FB Burrito, which does not have a folder with that name. So I had to hack around the problem by creating a symbolic link named “Adobe Flash Builder 4” to my FB Burrito folder on the Mac. On Windows I just configured the SDK, which I will talk about in a bit.

With that done things were working on the Mac.

On the Windows box, I had a different story. Installing the VMWare Player was no problem. It comes for free and installing it requires clicking the installer. That’s it! However, things got tricky right after that. I downloaded the simulator installer but on clicking the installer I was greeted with this message about the lack of 64-bit platform support:

BlackBerry TabletOS Simulator Installer Error on Windows 7

This I think was quite uninviting. Anyway, I worked around this problem. All I needed from this bundle was the TabletOS Simulator ISO, so I extracted the installer using 7-zip and traversed down the extracted folder to BlackBerryPlayBookSimulator-Installer-Win\InstallerData\Disk1\InstData, where I again extracted the zipped-up file called “Resource1”. Once extracted, I could get the ISO at Resource1\$IA_PROJECT_DIR$\installerdata. From there on I could use the instructions in the “Getting started” guide for Windows to set up the simulator.

As expected, the BlackBerry SDK installer didn’t work on Win 7 either. It again threw up the now familiar message stating lack of Win64 support. It makes me wonder why companies don’t have their installers support the 64-bit platform. This is a developer’s tool and not an end-user product. Many developers are already using the 64-bit platform and all should and will use it in a couple of years’ time.

I used the same trick as before and extracted the SDK installer using 7-zip. This time I traversed down the extracted installer to BlackBerryTabletSDK-Air-Installer-0.9.0-Win\InstallerData\Disk1\InstData. There I extracted the “Resource1” zip file and traversed further down to Resource1\$IA_PROJECT_DIR$\installerdata. In this folder there are two JAR (Java Archive) files that hold the contents of the SDK. The names of these 2 jar files are as follows:

  • blackberry-tablet-sdk-0.9.0_zg_ia_sf.jar
  • qnxsdk_zg_ia_sf.jar

Next, I extracted these two jar files using 7-zip again. After this I created a folder in the “applications” folder, which resides at the root of the C: drive, and named it “blackberry-tablet-sdk-0.9.0”, essentially everything except the “_zg_ia_sf.jar” part of the BlackBerry Tablet jar file name. You can choose any other name, say just blackberry-tablet-sdk. Next, I merged into this folder the contents of the two extracted jar files and the Adobe AIR 2.5 SDK (which I assume you would have downloaded and set up by now). That was it! It got my BlackBerry Tablet SDK up and running.

Once the SDK and simulator are set up, you can open up Flash Builder and first configure the BlackBerry Tablet AIR SDK by adding it to the list of “Installed Flex SDKs” as shown in the figure below:

Installed Flex SDKs

Once the setup was complete, I could finally get to writing the “Hello World” app. I wanted the “Hello World” app to be a bit more exciting than printing “Hello World” out to the screen, so I used the Google Maps Flash API to draw a map of the area around the Adobe MAX venue, i.e. the LA Convention Center.

To create this app, I created a Flex project in Flash Builder and selected it to be an Adobe AIR application. Then I explicitly chose the BlackBerry Tablet AIR SDK as the SDK and finally I chose an ActionScript file as the main file. (As far as I understood, the SDK doesn’t support MXML directly as of now.)

Once the application was written, I compiled the application as usual and then packaged and installed the application to the simulator. I used the command line to package and install the app. You can read about the command line options online. You will need to retrieve the simulator IP to get this to work. Don’t forget to read about retrieving the IP of the simulator.

The command line packager and installer has a format like so:

blackberry-airpackager -package output_bar_file_name -installApp -launchApp project_name-app.xml project_name.swf any_other_project_files -device IP_address

I will not talk about the app itself at the moment but may cover that in a subsequent post, especially once I integrate with a few BlackBerry TabletOS specific features and gestures.

For now I just include a snapshot of the initial “Hello World” version.

Hello BlackBerry PlayBook World!
