Apache Hadoop is an open-source framework for distributed processing of large data sets: it runs computations on a cluster by distributing the work to each of the nodes. The framework mainly comes with the Hadoop kernel, the ability to run distributed MapReduce jobs, and a filesystem, HDFS.
There are many tutorials that help you install Hadoop on Windows, but most of them have some issues. After working through a few of them, I am writing this one to cover what the others missed.
Since this tutorial installs Hadoop on Windows, and Hadoop contains a lot of shell scripts that need to be executed, we need a *nix shell for Windows. Cygwin is one of them, and the best as well. Download it from here.
Run setup.exe as an administrator and, after selecting the download mirror, remember to select the ssh package for installation. Refer to the image below:
Once you have given the password, you will receive the confirmation shown in the image above.
Now that ssh is configured, you can test it using the command ssh localhost
Now generate a key for ssh authentication, so that you need not give the password every time you invoke ssh, using the command
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Once done, append the public key to the authorized keys using the command cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys as shown above.
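The two key steps can be wrapped in one small script. This is only a sketch: setup_ssh_key is a hypothetical helper, and it generates an RSA key instead of the DSA key used above, because recent OpenSSH releases disable DSA keys by default. If your Cygwin ssh still accepts DSA, the commands from the tutorial work as written.

```shell
# setup_ssh_key SSH_DIR: hypothetical helper, not part of Cygwin or Hadoop.
# Creates a passphrase-less RSA key in SSH_DIR (an assumption swapped in for
# the tutorial's DSA key) and authorizes it for logins to this machine.
setup_ssh_key() {
    ssh_dir="$1"
    key_file="$ssh_dir/id_rsa"

    mkdir -p "$ssh_dir" || return 1
    chmod 700 "$ssh_dir"

    # Generate the key only once; -N '' sets an empty passphrase.
    if [ ! -f "$key_file" ]; then
        ssh-keygen -q -t rsa -N '' -f "$key_file" || return 1
    fi

    # Append the public key unless that exact line is already present.
    touch "$ssh_dir/authorized_keys"
    if ! grep -qxF "$(cat "$key_file.pub")" "$ssh_dir/authorized_keys"; then
        cat "$key_file.pub" >> "$ssh_dir/authorized_keys"
    fi

    # Keep the file private to the owner, which sshd's permission checks expect.
    chmod 600 "$ssh_dir/authorized_keys"
}
```

Call it as setup_ssh_key "$HOME/.ssh" and then run ssh localhost again; it should no longer ask for a password.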
Now it is time to download the Hadoop framework. I used this mirror and selected the hadoop-1.1.1 release: from http://mirror.catn.com/pub/apache/hadoop/common/hadoop-1.1.1/ download hadoop-1.1.1-bin.tar.gz
(For a list of other mirrors, you can always check http://www.apache.org/dyn/closer.cgi/hadoop/common/ )
Now extract the hadoop-1.1.1-bin.tar.gz file to c:\cygwin\usr\local and rename the c:\cygwin\usr\local\hadoop-1.1.1 folder to c:\cygwin\usr\local\hadoop
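If you prefer to do the extract-and-rename step from the Cygwin shell, something like the following should work. It is a sketch: install_hadoop is a hypothetical helper, and it assumes Cygwin is installed in c:\cygwin, so that c:\cygwin\usr\local is /usr/local inside the shell.

```shell
# install_hadoop TARBALL DEST_DIR: hypothetical helper for illustration.
# Unpacks the release archive into DEST_DIR and renames the versioned
# folder to plain "hadoop", as described in the step above.
install_hadoop() {
    tarball="$1"
    dest="$2"

    mkdir -p "$dest" || return 1
    tar -xzf "$tarball" -C "$dest" || return 1

    # The archive unpacks into a hadoop-1.1.1 folder.
    mv "$dest/hadoop-1.1.1" "$dest/hadoop"
}
```

For example, from the folder that holds the download: install_hadoop hadoop-1.1.1-bin.tar.gz /usr/local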
Now go to C:\cygwin\usr\local\hadoop\conf and open hadoop-env.sh
Go to line 9 and you will find an entry for export JAVA_HOME=something
Change it to
export JAVA_HOME=/cygdrive/c/Program\ Files/Java/jdk1.6.0_11
Do not forget to uncomment the line (remove the # from the beginning of the line).
If you want to get rid of the hassle of escaping the space in "Program Files", you can always install the JDK in c:\java\jre or something similar, or use /cygdrive/c/Program\ Files/Java/jdk1.6.0_11. It worked for me!
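Before going further, it can save time to confirm that the path you put in JAVA_HOME really contains a Java launcher. The check below is a sketch; check_java_home is a hypothetical helper and not something Hadoop ships.

```shell
# check_java_home DIR: hypothetical helper, not part of Hadoop.
# Succeeds if DIR contains a java launcher under bin/ (java or java.exe).
check_java_home() {
    if [ -x "$1/bin/java" ] || [ -x "$1/bin/java.exe" ]; then
        echo "JAVA_HOME looks usable: $1"
    else
        echo "No java launcher found under: $1/bin" >&2
        return 1
    fi
}
```

Note that on the command line you quote the path instead of escaping the space: check_java_home "/cygdrive/c/Program Files/Java/jdk1.6.0_11"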
Below are a few screenshots of errors you might get if you don't configure JAVA_HOME properly.
Then open C:\cygwin\usr\local\hadoop\conf\mapred-site.xml to configure the MapReduce service:
Once this is done, we can format the HDFS filesystem using the command
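The configuration for this step is not reproduced in the text here. As a hedged sketch, the stock Hadoop 1.x single-node (pseudo-distributed) setup points the mapred.job.tracker property at localhost:9001. That value and the write_mapred_site helper are assumptions for illustration, not necessarily the settings this tutorial used. Be aware that the helper overwrites any existing mapred-site.xml in the folder you give it.

```shell
# write_mapred_site CONF_DIR: hypothetical helper for illustration.
# Writes a minimal mapred-site.xml; localhost:9001 is the value from the
# standard Hadoop 1.x single-node setup, assumed here.
write_mapred_site() {
    cat > "$1/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
EOF
}
```

For example: write_mapred_site /usr/local/hadoop/conf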
bin/hadoop namenode -format
Then start the DFS subsystems using the command
We have now successfully installed Hadoop on Windows.
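The start command itself did not survive in the text here. In a stock Hadoop 1.x release the HDFS daemons are started by the start-dfs.sh script in the bin folder, so a sketch under that assumption looks like this. start_dfs is a hypothetical wrapper that only checks the script is present before running it.

```shell
# start_dfs HADOOP_HOME: hypothetical wrapper for illustration.
# Assumes the stock Hadoop 1.x layout, where bin/start-dfs.sh
# launches the HDFS daemons.
start_dfs() {
    script="$1/bin/start-dfs.sh"
    if [ ! -x "$script" ]; then
        echo "start-dfs.sh not found or not executable under: $1/bin" >&2
        return 1
    fi
    "$script"
}
```

For example: start_dfs /usr/local/hadoop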