A. Pre-Requisite:
I. Hadoop
II. RDBMS (MySQL, Oracle, DB2, etc ...)
III. RDBMS Connector
I have installed MySQL for testing purposes.
Here is a simple way to install MySQL on Ubuntu:
1. Launch Terminal
2. sudo apt-get install mysql-server (prompts for your sudo password)
3. During installation it will prompt you to set a root password for MySQL (not for your system). Key in the password for the MySQL root user (new password & re-type password).
4. Upon successful installation, check the status using the command below:
5. sudo netstat -tap | grep mysql
The result should look like: tcp 0 0 localhost:mysql *:* LISTEN 10444/mysqld
If it shows the above output, your MySQL database is ready.
6. After installation, download the JDBC connector for your database. In my case I downloaded mysql-connector-java-5.1.25.jar & added it to the CLASSPATH.
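To have something to import later, it helps to create the hadoop_test database and Employee table referenced in the Sqoop examples in section B. A minimal sketch — the column layout and sample rows here are my own assumption (note a deptno column, since one of the imports splits on it); adjust them to your own data:

```shell
# Create the test database and a small Employee table in MySQL
# (you will be prompted for the MySQL root password set above).
mysql -u root -p <<'EOF'
CREATE DATABASE IF NOT EXISTS hadoop_test;
USE hadoop_test;
CREATE TABLE IF NOT EXISTS Employee (
  empno  INT PRIMARY KEY,
  ename  VARCHAR(50),
  deptno INT
);
INSERT INTO Employee VALUES (7369, 'SMITH', 20), (7499, 'ALLEN', 30);
EOF
```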
B. Sqoop Installation
1. Download Sqoop from http://mirror.sdunix.com/apache/sqoop/1.4.4/sqoop-1.4.4.bin__hadoop-1.0.0.tar.gz (check your Hadoop version & download the matching Sqoop build; make sure the file name contains "bin")
2. Extract Sqoop: tar -xzf sqoop-1.4.4.bin__hadoop-1.0.0.tar.gz (it will extract to a folder named sqoop-1.4.4.bin__hadoop-1.0.0)
3. sudo mv sqoop-1.4.4.bin__hadoop-1.0.0/ sqoop
4. sudo mv sqoop/ /usr/hadoop/.
5. cd /usr/hadoop
6. sudo chown -R hduser:hadoop sqoop/
7. cd sqoop
8. cp *.jar $HADOOP_HOME/lib/. (copy the Sqoop jar files to the Hadoop lib directory)
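Sqoop also needs the JDBC connector on its classpath at runtime; placing the jar in Sqoop's own lib directory is a common alternative to setting CLASSPATH. A sketch — the download location of the jar in your home directory is an assumption:

```shell
# Make the MySQL JDBC driver visible to Sqoop by copying it into
# Sqoop's lib directory (source path of the jar is an assumption).
sudo cp ~/mysql-connector-java-5.1.25.jar /usr/hadoop/sqoop/lib/
sudo chown hduser:hadoop /usr/hadoop/sqoop/lib/mysql-connector-java-5.1.25.jar
```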
9. Set the environment variables below (as hduser):
export SQOOP_HOME=/usr/hadoop/sqoop
export PATH=$SQOOP_HOME/bin:$PATH
(Make sure Hadoop is started. To start Hadoop: ssh localhost; start-all.sh)
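To make these variables survive logouts, you can append them to hduser's ~/.bashrc (paths assume the move in step 4 above):

```shell
# Persist the Sqoop environment variables by appending them to
# ~/.bashrc (run once, then reload with: source ~/.bashrc)
cat >> ~/.bashrc <<'EOF'
export SQOOP_HOME=/usr/hadoop/sqoop
export PATH=$SQOOP_HOME/bin:$PATH
EOF
```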
10. Type sqoop at the command prompt (it will tell you to type "sqoop help" for usage).
11. Below are sample Sqoop statements for importing & exporting data between a MySQL database and HDFS:
a. sqoop import --connect jdbc:mysql://localhost/hadoop_test --username xxxxx --password ******** --table Employee --target-dir /data/emp1 -m 1
b. sqoop import --connect jdbc:mysql://localhost/hadoop_test --username xxxxx --password ******** --table Employee --target-dir /data/emp2/ --split-by deptno
Import as Avro:
c. sqoop import --connect jdbc:mysql://localhost/hadoop_test --username xxxxx --password ******** --table Employee --target-dir /data/emp3/ --as-avrodatafile -m 1
d. sqoop export --connect jdbc:mysql://localhost/hadoop_test --table Employee --username xxxxx --password ******** --export-dir /data/emp --input-fields-terminated-by '\t'
** While exporting, specify the absolute path of the file. In the case of part files, give the full path, ex: part-00000
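After an import finishes, you can sanity-check what Sqoop wrote to HDFS; with -m 1 the single mapper produces one part-m-00000 file. A quick check against the first example above:

```shell
# List the import directory and preview the first rows written to HDFS.
hadoop fs -ls /data/emp1
hadoop fs -cat /data/emp1/part-m-00000 | head
```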
(Hadoop: The Definitive Guide has a very good & simple example to work through!)
"Sqoop is mainly used to transfer data between RDBMS and HDFS, in both directions"
**************** End of Sqoop Installation ******************************