Monday, January 27, 2014

Executing Hive Scripts

Step 1: Writing a Hive script.

To write a Hive script, save the file with a .sql extension. Open a terminal in your Cloudera CDH4 distribution and run the following command to create the script file.
Command: sudo gedit sample.sql

This command opens the file in an editor, where you list all the Hive commands that need to be executed.
In this script, a table is created and described, and data is loaded into the table and then retrieved from it.

1. To create the table in Hive:

Command: create table product (productid int, productname string, price float, category string) row format delimited fields terminated by ',';
Here, product is the table name and productid, productname, price and category are the columns of this table.
The clause fields terminated by ',' indicates that the columns in the input file are separated by a comma.
By default the records in the input file are separated by a new line.

2. Describing the table:

Command: describe product;

3. Loading the data into the table.

To load data into the table, we first need to create an input file containing the records to be inserted.
Create the input file with the following command:
Command: sudo gedit input.txt
Add the records to the file, one per line, with the four column values of each record separated by commas.
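As a rough illustration of this step, the sketch below writes a small input.txt and prints a load statement that could be added to sample.sql. The three product records and the /home/cloudera/input.txt path are assumptions made for this example; they are not taken from the original figure.

```shell
#!/bin/sh
# Hypothetical sample records in the order productid,productname,price,category.
cat > input.txt <<'EOF'
1,Laptop,45000.00,Electronics
2,Notebook,35.50,Stationery
3,Headphones,1200.00,Electronics
EOF

# Every record should have exactly four comma-separated fields,
# matching the four columns of the product table.
bad_lines=$(awk -F',' 'NF != 4 {n++} END {print n+0}' input.txt)
echo "records with a wrong field count: $bad_lines"

# A load statement for sample.sql (the path is an assumption; use the
# location where input.txt was actually saved).
load_stmt="load data local inpath '/home/cloudera/input.txt' into table product;"
printf '%s\n' "$load_stmt"
```

The field-count check is only a convenience: a line with too few or too many commas would still load, but its columns would not line up with the table definition.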

4. Retrieving the data:

To retrieve the data, the select command is used.
Command: Select * from product;
This command retrieves the values of all the columns in the table. The finished script should contain the create, describe, load and select commands, in that order.
The Hive script is now complete, and the file sample.sql can be saved.
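Since the screenshot of the finished script is not reproduced here, the following is a minimal sketch of what sample.sql could contain, written from the shell with a here-document instead of an editor. The load statement and its /home/cloudera/input.txt path are assumptions; the other three statements are the ones described in the steps above.

```shell
#!/bin/sh
# Write the four statements to sample.sql in the order they should run.
# The input file path in the load statement is an assumed location.
cat > sample.sql <<'EOF'
create table product (productid int, productname string, price float, category string)
row format delimited fields terminated by ',';
describe product;
load data local inpath '/home/cloudera/input.txt' into table product;
select * from product;
EOF

# Count the statements by counting lines that end with a semicolon.
stmt_count=$(grep -c ';$' sample.sql)
echo "sample.sql contains $stmt_count statements"
```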

Step 2: Running the Hive Script

The following is the command to run the Hive script:
Command: hive -f /home/cloudera/sample.sql
When running the script, make sure to give the full path to the script file.
If the script is correct, all the commands are executed successfully.
This is how Hive scripts are run and executed in CDH4.
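The advice about the full path can be sketched as a small wrapper that builds an absolute path before calling hive -f. It assumes sample.sql sits in the current directory; the check for an installed hive command is only there so the sketch does nothing harmful on a machine without Hive.

```shell
#!/bin/sh
# Build an absolute path so that hive -f does not depend on the
# directory the command happens to be run from.
script_name="sample.sql"            # assumed to be in the current directory
script_path="$(pwd)/$script_name"
echo "script path: $script_path"

# Only call Hive if it is installed and the script file exists.
if command -v hive >/dev/null 2>&1 && [ -f "$script_path" ]; then
    hive -f "$script_path"
else
    echo "hive or $script_path not found; skipping the run"
fi
```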

Apache Hive Installation on Ubuntu

Hive Installation on Ubuntu:
Follow the steps below to install Apache Hive on Ubuntu:
Step 1:  Download Hive tar.
Command: wget -c http://archive.apache.org/dist/hive/hive-0.9.0/hive-0.9.0-bin.tar.gz
Step 2:  Extract the tar file.
Command: tar -xzvf hive-0.9.0-bin.tar.gz
Step 3: Edit the “.bashrc” file to update the environment variables for the user.
Command:  sudo gedit .bashrc
Add the following at the end of the file:
export HADOOP_HOME=/home/user/hadoop-1.2.0
export HIVE_HOME=/home/user/hive-0.9.0-bin
export PATH=$PATH:$HIVE_HOME/bin
export PATH=$PATH:$HADOOP_HOME/bin
Step 4:  Create Hive directories within HDFS.
Command:

  • hadoop fs -mkdir /user/hive/warehouse
  • hadoop fs -mkdir /temp

The directory ‘warehouse’ is the location to store the tables and data related to Hive.
The temporary directory ‘temp’ is the temporary location to store the intermediate results of processing.
Step 5: Set read/write permissions for table.
Command:

  • hadoop fs -chmod g+w /user/hive/warehouse
  • hadoop fs -chmod g+w /temp

These commands give write permission to the group.
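The directory and permission commands can also be run together from a short script. This is a sketch under the assumption that a Hadoop 1.x client is on the PATH, as in the .bashrc setup; the guard around the hadoop calls and the final listing are additions for illustration only.

```shell
#!/bin/sh
# HDFS locations used in this tutorial.
warehouse_dir="/user/hive/warehouse"
temp_dir="/temp"

if command -v hadoop >/dev/null 2>&1; then
    for dir in "$warehouse_dir" "$temp_dir"; do
        # On Hadoop 1.x, -mkdir also creates missing parent directories;
        # newer Hadoop releases would need "-mkdir -p" for that.
        hadoop fs -mkdir "$dir"
        # Give the group write permission on the directory.
        hadoop fs -chmod g+w "$dir"
    done
    # List the parent directory to confirm the permission bits.
    hadoop fs -ls /user/hive
else
    echo "hadoop command not found; skipping HDFS setup"
fi
```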
Step 6:  Set the Hadoop path in hive-config.sh.
Command: sudo gedit hive-config.sh
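The screenshots showing this edit are not available, so here is a sketch of one common way to do it: appending an export HADOOP_HOME line to hive-config.sh. The file's location under $HIVE_HOME/bin and the Hadoop path (reused from the .bashrc step) are assumptions about this setup, not details confirmed by the original post.

```shell
#!/bin/sh
# Assumed locations; the Hadoop path is the one used in the .bashrc step.
hadoop_home="/home/user/hadoop-1.2.0"
config_file="${HIVE_HOME:-/home/user/hive-0.9.0-bin}/bin/hive-config.sh"

line="export HADOOP_HOME=$hadoop_home"

if [ -f "$config_file" ]; then
    # Append the line only if an identical line is not already present.
    grep -qxF "$line" "$config_file" || echo "$line" >> "$config_file"
else
    echo "$config_file not found; line to add: $line"
fi
```

Checking for the line before appending keeps the script safe to run more than once.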
Step 7: Launch Hive.
Command: hive
Step 8: Create sample tables.
Command:  hive> CREATE TABLE shakespeare (freq INT, word STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;
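For comparison, the same statement can be run without entering the interactive prompt by passing it to hive -e. This is only a sketch: the guard for a missing hive command is an addition for illustration, and the statement itself is the one from Step 8.

```shell
#!/bin/sh
# Same statement as in Step 8; double quotes keep '\t' intact for Hive.
create_stmt="CREATE TABLE shakespeare (freq INT, word STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;"

if command -v hive >/dev/null 2>&1; then
    # -e runs the quoted statement and then returns to the shell.
    hive -e "$create_stmt"
else
    # printf is used instead of echo, which may turn \t into a real tab.
    printf '%s\n' "$create_stmt"
fi
```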
Step 9: To exit from Hive:
Command: hive> exit;

Tuesday, January 7, 2014

Video Tutorial for MongoDB DBA Course from MongoDB University


Week - 1 : 
  1. Course Overview
  2. Concepts and Philosophy
  3. Installing on Unix
  4. Installing on Windows
  5. JSON Types
  6. JSON Syntax - 1
  7. JSON Syntax - 2
  8. Introduction to BSON
  9. What is Mongo shell
  10. What is JavaScript - 1
  11. What is JavaScript - 2
  12. MongoImport
  13. Introduction to the Mongo shell
  14. Shell Queries
  15. Shell Sorting 
  16. Shell Cursors and Shell Help
Week - 2 : 

  1. Introduction to Week - 2
  2. Inserting Data
  3. Updating the Documents
  4. Removing the Documents
  5. Updating the Documents Part -2 
  6. MongoDB Commands - 1
  7. MongoDB Commands - 2

Week - 3 : 

  1. Introduction to Week - 3
  2. Schema Design
  3. The Aggregation Framework - 1
  4. The Aggregation Framework - 2
  5. More $ Operations
  6. The FindAndModify Command
  7. MapReduce
Week - 4 : 
  1. Introduction to Replication
  2. Replica Sets Overview
  3. Replica Sets Demo 
  4. Replica Sets Demo (Windows)
  5. Replica Sets - the Simple HTTP admin UI
  6. Replica Set Configuration
  7. GetLastError and cluster-wide commits
  8. Multi data center and sample configurations
  9. ReadPreference (SlaveOK)
Week - 5 : 

  1. Indexes and Optimizing Performance
  2. Index Types
  3. Covered Indexes
  4. Explain and Hint
  5. Read vs Write Tradeoffs
  6. CurrentOp and KillOp
  7. The Profiler
  8. Mongostat and Mongotop
  9. Introduction to MMS Monitoring
  10. Overview of MMS
  11. MMS Agent Requires PyMongo
  12. Installing PyMongo (Mac)
  13. Installing PyMongo (Windows)
  14. Registering for MMS Monitoring
  15. MMS Installation (Linux)
  16. MMS Installation (Windows)
Week - 6 : 

  1. Introduction to Sharding
  2. Sharding Setup Demo
  3. The Config Database
  4. Setup Part - 2 Adding the initial Shards
  5. Enabling Sharding for a collection
  6. Working with a Sharded collection
  7. Choosing Shard Keys
  8. Process and Machine Layout
  9. Bulk Inserts and Pre-Splitting
  10. Further Tips and Best Practices