Friday, September 6, 2013

HBase Installation

A. Pre-Requisite

1. Hadoop

B. HBase Installation
1. Download the latest stable version of HBase from: http://apache.mirrors.hoobly.com/hbase/stable/

2. Go to the Downloads folder: launch a terminal -> cd <download path> (ex: cd /home/hduser/Downloads)

3. Extract: tar -xzf hbase-<version>.tar.gz (ex: tar -xzf hbase-0.94.11.tar.gz)

4. Change directory name: mv hbase-<version>/ hbase/

5. Move hbase to the directory where all Hadoop tools are installed: sudo mv hbase/ /usr/hadoop/.

6. Change Ownership: sudo chown -R hduser:hadoop /usr/hadoop/hbase/

7. For convenience, set HBASE_HOME and add its bin directory to your PATH: export HBASE_HOME=/usr/hadoop/hbase; export PATH=$PATH:$HBASE_HOME/bin (so that you can call hbase commands from anywhere)
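
Step 7's exports in copy-paste form; the /usr/hadoop/hbase path assumes HBase was moved there in step 5, so adjust it if you installed elsewhere. Append these lines to hduser's ~/.bashrc and run source ~/.bashrc to make them permanent:

```shell
# Environment for HBase; the path assumes the layout used in the steps above.
export HBASE_HOME=/usr/hadoop/hbase
export PATH=$PATH:$HBASE_HOME/bin
```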

8. Edit $HBASE_HOME/conf/hbase-env.sh to specify JAVA_HOME. Save & exit from the editor

9. Start HBase: start-hbase.sh

10. Launch hbase shell to work on it (create, alter, drop tables & insert data, etc ...): hbase shell

That's it; your HBase installation is done.

To test your installation, use the statements below at the hbase shell prompt to create and describe a table:

1. hbase(main):001:0> create 'emptable', 'empno', 'empname', 'salary', 'deptno' (note: in HBase these arguments are column families, not columns)

2. hbase(main):002:0> describe 'emptable'
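
A few more shell commands round out the smoke test. Since 'empno', 'empname', etc. are column families, a put addresses a family:qualifier pair; the qualifiers and values below are made up for illustration:

```
hbase(main):003:0> put 'emptable', 'row1', 'empno:value', '1001'
hbase(main):004:0> put 'emptable', 'row1', 'empname:value', 'John'
hbase(main):005:0> scan 'emptable'
hbase(main):006:0> disable 'emptable'
hbase(main):007:0> drop 'emptable'
```

scan should show the two cells under row1; disable followed by drop cleans the test table up again.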


Monday, August 19, 2013

Public Datasets

http://www.scaleunlimited.com/datasets/public-datasets/ - Sample data for various sectors. This data can be used for POCs, prototypes, pre-production tests, etc ...


http://www.grouplens.org/node/12 - Datasets from the GroupLens research group (e.g. MovieLens ratings), likewise useful for POCs, prototypes & pre-production tests.


http://stat-computing.org/dataexpo/2009/the-data.html - Airline on-time performance data, useful for analyzing & predicting delays:
       - e.g. to calculate the average departure delay by month for each airline
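
As a quick non-Hadoop sketch of that calculation: assuming the dataexpo column layout (Month in column 2, UniqueCarrier in column 9, DepDelay in column 16), an awk one-liner computes the average departure delay per carrier & month. 2008.csv stands for whichever year's file you downloaded:

```shell
# Average departure delay by carrier and month; skips the header row and
# "NA" delays. Column positions assume the dataexpo 2009 layout.
awk -F, 'NR > 1 && $16 != "NA" { sum[$9","$2] += $16; cnt[$9","$2]++ }
         END { for (k in sum) printf "%s,%.2f\n", k, sum[k]/cnt[k] }' 2008.csv
```

The same grouping translates naturally into a Pig GROUP BY or a Hive GROUP BY query once the data is in HDFS.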


Saturday, August 17, 2013

Zookeeper Installation

A. Pre-Requisite

1. Hadoop

B. Zookeeper Installation

1. Download ZooKeeper from : http://www.apache.org/dyn/closer.cgi/zookeeper/ (download the latest stable version !)

2. Go to the Downloads folder: launch a terminal -> cd <download path> (ex: cd /home/hduser/Downloads)

3. Extract: tar -xzf zookeeper-<version>.tar.gz (ex: tar -xzf zookeeper-3.4.5.tar.gz)

4. Change directory name: mv zookeeper-<version>/ zookeeper/

5. Move zookeeper to the common directory where all Hadoop tools are installed: sudo mv zookeeper/ /usr/hadoop/.

6. Change Ownership: sudo chown -R hduser:hadoop /usr/hadoop/zookeeper/

7. Create a directory under /tmp and change its ownership to the user under which ZooKeeper is installed: cd /tmp -> sudo mkdir zookeeper -> sudo chown -R hduser:hadoop zookeeper (this directory stores the in-memory database snapshots & the transaction logs)

8. Edit the zoo config. The distribution ships a zoo_sample.cfg; create a copy of it as zoo.cfg & edit it: cd <Zookeeper installation path>/conf; cp zoo_sample.cfg zoo.cfg; vi zoo.cfg. Configure dataDir & the other settings as below (dataDir should point at the directory created in step 7):
tickTime=2000
dataDir=/tmp/zookeeper
clientPort=2181

Upon any change save the file & exit (esc :wq)

9. Now log in as / switch to hduser (the ZooKeeper installation user)

10. cd (to go to the home directory)

11. vi .bashrc
add below environment variables:
ZOOKEEPER_HOME=/usr/hadoop/zookeeper
export ZOOKEEPER_HOME
PATH=$PATH:$ZOOKEEPER_HOME/bin
export PATH
Upon adding above lines, save & exit from .bashrc (esc :wq)

12. source .bashrc

13. Start ZooKeeper server: zkServer.sh start

14. Zookeeper's command-line client:

zkCli.sh -server localhost:2181 (then run commands such as ls / at the prompt)
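
A short interactive session with the client, creating and removing a test znode; /zoo_test is an arbitrary example path:

```
$ zkCli.sh -server localhost:2181
[zk: localhost:2181(CONNECTED) 0] create /zoo_test 'hello'
[zk: localhost:2181(CONNECTED) 1] get /zoo_test
[zk: localhost:2181(CONNECTED) 2] ls /
[zk: localhost:2181(CONNECTED) 3] delete /zoo_test
[zk: localhost:2181(CONNECTED) 4] quit
```

If create and get round-trip the value, the server and its dataDir are working.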

Thursday, August 15, 2013

Sqoop Installation

A. Pre-Requisite:

I.  Hadoop 
II. RDBMS (MySQL, Oracle, DB2, etc ...)
III. RDBMS Connector

I have installed MySQL for testing purpose.

Here is the simple way to install MySQL on ubuntu:

1. Launch Terminal

2. sudo apt-get install mysql-server (prompt for password)

3. While installing, it will prompt you to key in the root password for MySQL (not for your system). Key in the password for the MySQL root user (new password & re-type password)

4. Upon successful installation, check the status using below command:

5. sudo netstat -tap | grep mysql

result should be: tcp        0      0 localhost:mysql         *:*                     LISTEN      10444/mysqld   

If it shows above output then your mysql database is ready.

6. After installation, download the respective JDBC connector. In my case I downloaded mysql-connector-java-5.1.25.jar & added it to the CLASSPATH.
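
The Sqoop examples further down assume a hadoop_test database with an Employee table on the MySQL side. A minimal schema matching those examples might look like this (the column types are my assumptions, chosen to fit the column names):

```
mysql> CREATE DATABASE hadoop_test;
mysql> USE hadoop_test;
mysql> CREATE TABLE Employee (
    ->   empno   INT PRIMARY KEY,
    ->   empname VARCHAR(50),
    ->   salary  DECIMAL(10,2),
    ->   deptno  INT
    -> );
```

Insert a few rows before trying the imports so the target files in HDFS are non-empty.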

B.Sqoop Installation

1. Download Sqoop from http://mirror.sdunix.com/apache/sqoop/1.4.4/sqoop-1.4.4.bin__hadoop-1.0.0.tar.gz (check your hadoop version & download the matching version of sqoop; make sure the file name has "bin")

2. Extract Sqoop : tar -xzf sqoop-1.4.4.bin__hadoop-1.0.0.tar.gz (it will extract to a folder sqoop-1.4.4.bin__hadoop-1.0.0)

3. sudo mv sqoop-1.4.4.bin__hadoop-1.0.0/ sqoop

4. sudo mv sqoop/ /usr/hadoop/.

5. cd /usr/hadoop

6. sudo chown -R hduser:hadoop sqoop/

7. cd sqoop

8. cp *.jar $HADOOP_HOME/lib/. (copy sqoop jar files to hadoop lib directory)
9. Set below Env variables (under hduser).
export SQOOP_HOME=/usr/hadoop/sqoop
export PATH=$SQOOP_HOME/bin:$PATH
(Make sure hadoop is started. To start Hadoop, ssh localhost; start-all.sh)
 
10. Type sqoop at the command prompt (it will tell you to type "sqoop help" to get help).

11. Below are sample sqoop statements for importing & exporting data between MySQL and HDFS

a. sqoop import --connect jdbc:mysql://localhost/hadoop_test --username xxxxx --password ******** --table Employee --target-dir /data/emp1 -m 1

b. sqoop import --connect jdbc:mysql://localhost/hadoop_test --username xxxxx --password ******** --table Employee --target-dir /data/emp2/ --split-by deptno;

Import as Avro:
c. sqoop import --connect jdbc:mysql://localhost/hadoop_test --username xxxxx --password ****** --table Employee --target-dir /data/emp3/ --as-avrodatafile -m 1;

d. sqoop export --connect jdbc:mysql://localhost/hadoop_test --table Employee --username xxxxx --password ******** --export-dir /data/emp --input-fields-terminated-by '\t';

** While exporting, one should specify the absolute path of the file. In the case of part files, give the full path, ex: /data/emp/part-00000
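
After an export completes, a quick sanity check on the MySQL side is to compare row counts against what was in HDFS:

```
mysql> USE hadoop_test;
mysql> SELECT COUNT(*) FROM Employee;
```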

(Hadoop Definitive guide has very good & simple example to work on !).

"Sqoop is mainly used to transport data from RDBMS to HDFS & vis-a-vis"

**************** End of Sqoop Installation ******************************

PIG Installation

A. Pre-Requisite: 
Hadoop

B. Pig Installation steps:

1. Download Pig from : http://www.apache.org/dyn/closer.cgi/pig/ (download the latest stable version !)

2. Extract Pig : tar -xzf pig-0.11.1.tar.gz (it will extract to a folder pig-0.11.1)

3. sudo mv pig-0.11.1/ pig

4. sudo mv pig/ /usr/hadoop/.

5. cd /usr/hadoop

6. sudo chown -R hduser:hadoop pig/

7. Set below Env variables (under hduser).
export PIG_HOME=/usr/hadoop/pig
export PATH=$PIG_HOME/bin:$PATH
(Make sure hadoop is started. To start Hadoop, ssh localhost; start-all.sh)
8. At the command prompt in a terminal, type pig & press enter. It will lead you to the grunt> prompt, where you can run Pig statements / scripts
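
Once at grunt>, a small sample session; this assumes comma-delimited employee data was already imported to /data/emp1, as in the Sqoop examples above:

```
grunt> emp = LOAD '/data/emp1' USING PigStorage(',')
>>           AS (empno:int, empname:chararray, salary:double, deptno:int);
grunt> by_dept = GROUP emp BY deptno;
grunt> avg_sal = FOREACH by_dept GENERATE group AS deptno, AVG(emp.salary);
grunt> DUMP avg_sal;
```

DUMP triggers a MapReduce job and prints the average salary per department.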

Note: hduser is the user under which I have installed hadoop & hive. In your case it may be different

********** End of Pig Installation - Enjoy statement based HDFS tool ***************************

Tuesday, August 13, 2013

Hive Installation

A. Pre-Requisite: 

Hadoop

B. Hive Installation steps:

1. Download Hive from : http://www.apache.org/dyn/closer.cgi/hive/ (download the latest stable version !)

2. Extract Hive : tar -xzf hive-0.10.0-bin.tar.gz (it will extract to a folder hive-0.10.0-bin)

3. sudo mv hive-0.10.0-bin/ hive

4. sudo mv hive/ /usr/hadoop/.

5. cd /usr/hadoop

6. sudo chown -R hduser:hadoop hive/

7. Set below Env variables (under hduser).
export HIVE_HOME=/usr/hadoop/hive
export PATH=$HIVE_HOME/bin:$PATH
(Make sure hadoop is started. To start Hadoop, ssh localhost; start-all.sh)
8. hive

It will give you the hive prompt, so that you can start accessing the default DB or create a new DB :)
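
A small sample session at the hive prompt; test_db and emp are arbitrary example names, and the tab field terminator matches the tab-delimited export example in the Sqoop post:

```
hive> CREATE DATABASE test_db;
hive> USE test_db;
hive> CREATE TABLE emp (empno INT, empname STRING, salary DOUBLE, deptno INT)
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
hive> SHOW TABLES;
hive> SELECT deptno, AVG(salary) FROM emp GROUP BY deptno;
```

The SELECT runs as a MapReduce job against whatever data you load into the table.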

Note: hduser is the user under which I have installed hadoop & hive. In your case it may be different

********** End of Hive Installation - Enjoy SQL based HDFS tool ***************************