This article explains how to install Apache Hadoop on OS X. It is not specific to Talend and should be helpful whatever your reason for using Hadoop.

The topics discussed here are useful if you want to learn Hadoop and set up your own single-node cluster for learning and development. OS X is probably not the first platform you would consider for building a large Hadoop cluster; however, this is a useful exercise when you're taking your first look at Hadoop. This tutorial was written by installing Hadoop on a MacBook Air running OS X 10.9.1 (Mavericks). You can also install Hadoop on Unix and Linux variants and on Windows Server.

Prerequisites for Installing Hadoop

There are a few things you need to sort out before installing Hadoop.

Java Version

Check your Java version. You'll need Java 6 (1.6) or higher.

Update your main Hadoop configuration files, as shown in the sample files below.

At the time of writing, the latest version of Java was Java 7 (1.7).

Note

Although this is a universal guide for installing Hadoop, this is primarily a site about Talend. At the time of writing, the latest version of Talend (5.4.1) only supports Java 6 so, if you are running Talend, you'll need to have two Java versions installed or use Hadoop with Java 6.

Mavericks

By default, Mavericks does not include Java and, if you've upgraded to Mavericks, Java will have been uninstalled. There are plenty of resources that explain how to install Java, if you do not already know how. Just Google it.
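If you do end up with two Java versions side by side, as the note above suggests for Talend, each Terminal session needs to be told which one to use. The following is a minimal sketch, not a definitive recipe. It assumes Apple's /usr/libexec/java_home tool, which ships with OS X but not with other platforms. The function names use_java and java_feature_version are made up for this example.

```shell
#!/bin/sh
# Sketch only: use_java and java_feature_version are hypothetical helper names.

# Point JAVA_HOME at an installed JDK of the requested version, e.g. use_java 1.6
use_java() {
    if [ ! -x /usr/libexec/java_home ]; then
        echo "java_home not found; this helper only works on OS X" >&2
        return 1
    fi
    # Note: java_home may fall back to the default JVM when nothing matches,
    # so confirm the result with java -version afterwards.
    new_home=$(/usr/libexec/java_home -v "$1") || return 1
    JAVA_HOME=$new_home
    PATH="$JAVA_HOME/bin:$PATH"
    export JAVA_HOME PATH
}

# Reduce a version line such as: java version "1.7.0_51"  to the number 7.
java_feature_version() {
    # Pull out the text between the double quotes.
    version=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
    case "$version" in
        1.*) printf '%s\n' "$version" | cut -d. -f2 ;;   # 1.7.0_51 -> 7
        *)   printf '%s\n' "$version" | cut -d. -f1 ;;   # other schemes: first field
    esac
}

java_feature_version 'java version "1.7.0_51"'   # prints 7
```

With these loaded into your shell, you could run use_java 1.6 in the window where you start Talend and use_java 1.7 in the window where you run Hadoop. Nothing is changed system-wide; the setting lasts only for that Terminal session.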

To check your Java version, enter the following command.

java -version

You should receive the following response.

java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)

Hadoop User

For security and administration reasons, it is recommended that you create a Hadoop operating system user. You can create a new user from Launchpad->System Preferences->Users & Groups. If you create the user hadoop, create the account as a Standard user. If you are running Hadoop on your own personal computer, you may choose to run Hadoop under your own regular account (this is what I've chosen to do). If you choose to run Hadoop under an account name other than hadoop, amend the commands in this tutorial accordingly.

If you are using the hadoop user, you should now log out and log back in using that account.

Open a Terminal Window

The following commands are entered from a command prompt, so you will need to open a Terminal window. You can do this from Launchpad->Other->Terminal.

SSH

Hadoop needs to be able to establish SSH connections, and to do so without having to provide a password or passphrase. OS X comes with SSH pre-installed, so there is no need to install any additional software.

Enter the following command.

ssh-keygen -t rsa -P ""

You will be asked to enter the file in which to save the key. The default value is /Users/hadoop/.ssh/id_rsa; press Return to accept it. This creates a key file that can be used without a passphrase, because -P "" sets an empty one.

You should receive the following response.

Generating public/private rsa key pair.
Enter file in which to save the key (/Users/hadoop/.ssh/id_rsa):
Your identification has been saved in /Users/hadoop/.ssh/id_rsa.
Your public key has been saved in /Users/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
55:b7:8e:1b:b1:76:a4:e8:bb:2f:be:e4:c8:f5:68:89 [email protected]
The key's randomart image is:
+--[ RSA 2048]----+
(randomart image)
+-----------------+

Now that the key pair has been created, we can authorize its use with the following command.

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

You can test the connection and save the key fingerprint by entering the following command. Respond with yes when prompted to save the fingerprint. Note that if you have followed the preceding steps correctly, you should not be asked to enter your password or a passphrase.
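Before running the test, bear in mind that the cat command appends the public key every time it is run, so repeating this tutorial leaves duplicate lines in authorized_keys. The sketch below shows one way to make the key steps repeatable. The helper names ensure_key and authorize_key are made up for this example, and the demonstration at the end works on a throwaway directory with a dummy key line rather than on your real .ssh directory.

```shell
#!/bin/sh
# Sketch only: ensure_key and authorize_key are hypothetical helper names.

# Create an RSA key pair with an empty passphrase unless the file already exists.
# -f names the key file, so ssh-keygen does not stop to ask for it.
ensure_key() {
    key_file=$1
    [ -f "$key_file" ] && return 0
    mkdir -p "$(dirname "$key_file")"
    ssh-keygen -q -t rsa -P "" -f "$key_file"
}

# Append a public key to authorized_keys only if that exact line is not there yet.
authorize_key() {
    pub_file=$1
    auth_file=$2
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    if ! grep -qxF -- "$(cat "$pub_file")" "$auth_file"; then
        cat "$pub_file" >> "$auth_file"
    fi
    # The SSH server is usually strict about permissions on these files,
    # so keep them private to the owning user.
    chmod 700 "$(dirname "$auth_file")"
    chmod 600 "$auth_file"
}

# Demonstration on a scratch directory with a dummy (invalid) key line.
demo_dir=$(mktemp -d "${TMPDIR:-/tmp}/hadoop-ssh-demo.XXXXXX")
printf '%s\n' 'ssh-rsa AAAAexample demo@localhost' > "$demo_dir/id_rsa.pub"
authorize_key "$demo_dir/id_rsa.pub" "$demo_dir/authorized_keys"
authorize_key "$demo_dir/id_rsa.pub" "$demo_dir/authorized_keys"   # second call adds nothing
grep -c . "$demo_dir/authorized_keys"   # prints 1
```

On a real account the equivalent calls would be ensure_key "$HOME/.ssh/id_rsa" followed by authorize_key "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys". Note that the chmod 700 then applies to your .ssh directory itself.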