Wednesday, May 21, 2014

Setting up and running Hadoop (hadoop-1.2.1) on Windows 7 (64 bit) [Single node cluster]

I will list the steps (with screenshots) for setting up Hadoop (stable release 1.2.1) on Windows 7 as a single node cluster.

For an introduction to Hadoop, refer to the following article

We will need the following:
  • JDK 1.6+ installed on the Windows box
  • Cygwin (with openssh)
  • Hadoop 1.2.1

Installing JDK 1.6+
I installed JDK 1.7.0_55 for this setup.
It can be downloaded from the Oracle JDK download page.

After installation, add the JDK folder path to the system environment variables as a new JAVA_HOME entry.
[Right-click Computer > Advanced System Settings > Environment Variables > New > enter JAVA_HOME as the name and D:\Java\jdk1.7.0_55 (wherever your JDK is installed) as the value]

Installing and setting up Cygwin (with openssh)
Refer to the following article for a step-by-step guide:

Installing and setting up Hadoop
  • Download hadoop-1.2.1 to your local machine
  • Use WinZip to get the .tar file.
  • Use WinZip again to extract the hadoop folder from the .tar file. Place the folder in the Cygwin directory under <Cygwin_install_dir>\home\<user>
  • e.g. I have installed under C:\cygwin64\home\nlachwan. Thus my hadoop folder is C:\cygwin64\home\nlachwan\hadoop-1.2.1
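
If you prefer to stay inside the Cygwin terminal, the same extraction can be sketched with tar instead of WinZip. This is an alternative to the steps above, not what I did; the function name unpack_hadoop and the download location in TARBALL are assumptions for illustration.

```shell
# Unpack a Hadoop release tarball ($1) into a destination directory ($2).
unpack_hadoop() {
  mkdir -p "$2" && tar -xzf "$1" -C "$2"
}

# Assumed download location; change it to wherever you saved the archive.
TARBALL="$HOME/hadoop-1.2.1.tar.gz"

if [ -f "$TARBALL" ]; then
  unpack_hadoop "$TARBALL" "$HOME"
  ls -d "$HOME/hadoop-1.2.1"
else
  echo "Tarball not found: $TARBALL"
fi
```

In Cygwin, $HOME is /home/<user>, which maps to <Cygwin_install_dir>\home\<user> on the Windows side.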

Edit <Hadoop Home>/conf/ to set the Java home:
export JAVA_HOME=D:\\Java\\jdk1.7.0_55 (note the double backslash \\)
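
The doubled backslash matters because the export line is shell syntax, and an unquoted backslash in the shell escapes the character that follows it. A small sketch of the difference, using the JDK path from this post (the variable SINGLE exists only for the comparison):

```shell
# Unquoted: each "\\" collapses to one literal backslash.
export JAVA_HOME=D:\\Java\\jdk1.7.0_55
printf '%s\n' "$JAVA_HOME"    # prints D:\Java\jdk1.7.0_55

# With single backslashes the shell drops them and the path is mangled.
SINGLE=D:\Java\jdk1.7.0_55
printf '%s\n' "$SINGLE"       # prints D:Javajdk1.7.0_55
```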

Update <Hadoop Home>/conf/core-site.xml to set the Hadoop file system property inside the <configuration> element.
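
For a Hadoop 1.x single node setup, core-site.xml usually carries one property naming the default file system. Here is a minimal sketch, written from the Cygwin terminal. The property name fs.default.name is the Hadoop 1.x setting; the value hdfs://localhost:9000 and the CONF_DIR variable are assumptions, so adjust them to your setup.

```shell
# Assumed location of the Hadoop conf directory; override CONF_DIR if yours differs.
CONF_DIR="${CONF_DIR:-$HOME/hadoop-1.2.1/conf}"
mkdir -p "$CONF_DIR"

# Write a single-property core-site.xml (this overwrites any existing file).
# The host and port in <value> are assumed, not taken from the original post.
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
```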

Update <Hadoop Home>/conf/mapred-site.xml
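
In Hadoop 1.x, mapred-site.xml normally tells MapReduce clients where the JobTracker runs. A minimal sketch under that assumption: mapred.job.tracker is the Hadoop 1.x property name, while localhost:9001 and the CONF_DIR variable are assumed values for illustration.

```shell
# Assumed location of the Hadoop conf directory; override CONF_DIR if yours differs.
CONF_DIR="${CONF_DIR:-$HOME/hadoop-1.2.1/conf}"
mkdir -p "$CONF_DIR"

# Write a single-property mapred-site.xml (this overwrites any existing file).
# The host:port pair is an assumption; pick any free port on your machine.
cat > "$CONF_DIR/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
EOF
```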

Create two folders, "data" and "name", under /home/<user> and run chmod 755 on both.
e.g. /home/nlachwan/data and /home/nlachwan/name
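
From the Cygwin terminal, this step can be sketched as follows. It assumes $HOME is /home/<user>, which is the Cygwin default.

```shell
# Create both folders (no error if they already exist).
mkdir -p "$HOME/data" "$HOME/name"

# rwx for the owner, r-x for group and others.
chmod 755 "$HOME/data" "$HOME/name"

# Show the resulting permissions for a quick visual check.
ls -ld "$HOME/data" "$HOME/name"
```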

Update <Hadoop Home>/conf/hdfs-site.xml
<value>/home/nlachwan/data</value> (note: use your own username instead of nlachwan)
<value>/home/nlachwan/name</value> (note: use your own username instead of nlachwan)
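
A sketch of a complete hdfs-site.xml built around the two values above. I am assuming they belong to the Hadoop 1.x properties dfs.data.dir and dfs.name.dir, which set the DataNode and NameNode storage folders. CONF_DIR is also an assumed variable, and $HOME stands in for /home/<user>.

```shell
# Assumed location of the Hadoop conf directory; override CONF_DIR if yours differs.
CONF_DIR="${CONF_DIR:-$HOME/hadoop-1.2.1/conf}"
mkdir -p "$CONF_DIR"

# The heredoc delimiter is unquoted on purpose so that $HOME expands
# to your own home directory (this overwrites any existing file).
cat > "$CONF_DIR/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>$HOME/data</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>$HOME/name</value>
  </property>
</configuration>
EOF
```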

Note: You could run dos2unix on the above files (if they were edited in Windows). However, it didn't make any difference for me (I didn't do it).
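
If you want to know whether the conversion is needed at all, a small check for Windows line endings could look like this. The helper name has_crlf and the CONF_DIR default are assumptions for illustration.

```shell
# Succeeds (exit 0) if the file contains at least one carriage return.
has_crlf() {
  grep -q "$(printf '\r')" "$1"
}

# Assumed location of the Hadoop conf directory; override CONF_DIR if yours differs.
CONF_DIR="${CONF_DIR:-$HOME/hadoop-1.2.1/conf}"

for f in "$CONF_DIR"/*.xml "$CONF_DIR"/*.sh; do
  [ -f "$f" ] || continue    # skip patterns that matched nothing
  if has_crlf "$f"; then
    echo "CRLF line endings: $f (convert with: dos2unix \"$f\")"
  fi
done
```

Files that are not listed already use Unix line endings and can be left alone.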

After setup, add the hadoop bin folder path to the PATH system environment variable.

[Right-click Computer > Advanced System Settings > Environment Variables > edit PATH and append C:\cygwin64\home\nlachwan\hadoop-1.2.1\bin (wherever your hadoop is installed)]
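
As an alternative that only affects Cygwin sessions, the same folder can be appended to PATH from the shell, for example in ~/.bashrc. This is a sketch of that variant, not the method used in this post; HADOOP_BIN is an assumed variable name and the path assumes the install layout described earlier.

```shell
# Assumed install location, matching the layout used in this post.
HADOOP_BIN="$HOME/hadoop-1.2.1/bin"

# Append to PATH only if it is not already there.
case ":$PATH:" in
  *":$HADOOP_BIN:"*) ;;
  *) PATH="$PATH:$HADOOP_BIN" ;;
esac
export PATH

# Report whether the hadoop launcher can now be resolved.
command -v hadoop || echo "hadoop not found on PATH; check $HADOOP_BIN"
```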

Format HDFS:

In the Cygwin terminal, execute hadoop to confirm it's working:

 nlachwan@NLACHWAN-IN ~
$ hadoop
Warning: $HADOOP_HOME is deprecated.

Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  mradmin              run a Map-Reduce admin client
  fsck                 run a DFS filesystem checking utility
  fs                   run a generic filesystem user client
  balancer             run a cluster balancing utility
  oiv                  apply the offline fsimage viewer to an fsimage


In the Cygwin terminal, execute hadoop namenode -format

Execute stop-all.sh
Execute start-all.sh
If you see an ssh "connection refused" error, follow all the steps from "Setup authorization keys"

Once Hadoop is running, test the Hadoop file system by executing "hadoop fs -ls /"
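
Another quick check is to list the running Java processes with jps, a tool that ships with the JDK. A Hadoop 1.x single node cluster normally runs five daemons: NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker. The sketch below prints whichever of them are missing; the helper name missing_daemons and the EXPECTED variable are assumptions for illustration.

```shell
# Expected daemons for a Hadoop 1.x single node setup.
EXPECTED="NameNode DataNode SecondaryNameNode JobTracker TaskTracker"

# Reads jps output on stdin and prints each expected daemon that is absent.
missing_daemons() {
  local listing name
  listing="$(cat)"
  for name in $EXPECTED; do
    # jps prints "<pid> <ClassName>"; compare the class name as a whole field.
    printf '%s\n' "$listing" \
      | awk -v n="$name" '$2 == n { found = 1 } END { exit !found }' \
      || printf '%s\n' "$name"
  done
}

if command -v jps >/dev/null 2>&1; then
  jps | missing_daemons
else
  echo "jps not found; is the JDK bin directory on PATH?"
fi
```

No output from the jps branch means all five daemons were found.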

You can also check the following URLs:

Your Hadoop is set up. Have fun!!