Friday, October 25, 2013

Pentaho Data Integration 4.4 and Hadoop 1.0.4

Prerequisites:

  • Copy the hadoop-20 folder to a new folder named hadoop-104 (create it manually) in the /opt/pentaho/design-tools/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/ directory.
  • Replace the following JARs in the client subfolder with the versions from the Apache Hadoop 1.0.4 distribution:
    • commons-codec-1.4.jar
    • hadoop-core-1.0.4.jar
  • Also add the following JAR from the Hadoop 1.0.4 distribution to the client subfolder:
    • commons-configuration-1.6.jar
  • Then change the active configuration property in plugin.properties to point to the new folder:
    • active.hadoop.configuration=hadoop-104
  • Start Hadoop as the user created during the Hadoop installation. Note: the Hadoop credentials are provided on page 4, step 12.
  • Start PDI
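The folder copy and property change above can be sketched as a short shell script. This is a minimal sketch, not the official procedure: a temp directory stands in for the real PDI install (/opt/pentaho/design-tools/data-integration) so the sketch is self-contained, and the paths to the downloaded Hadoop 1.0.4 JARs are placeholders you must adjust.

```shell
#!/bin/sh
set -e
# Illustrative scaffolding: a temp dir stands in for the real PDI install.
PDI_HOME="$(mktemp -d)"
PLUGIN_DIR="$PDI_HOME/plugins/pentaho-big-data-plugin"
CONF_DIR="$PLUGIN_DIR/hadoop-configurations"
mkdir -p "$CONF_DIR/hadoop-20/lib/client"
echo "active.hadoop.configuration=hadoop-20" > "$PLUGIN_DIR/plugin.properties"

# 1. Copy the shipped hadoop-20 configuration to a new hadoop-104 folder.
cp -r "$CONF_DIR/hadoop-20" "$CONF_DIR/hadoop-104"

# 2. Swap in the Hadoop 1.0.4 client JARs (the source paths below are
#    placeholders for wherever you unpacked the 1.0.4 distribution).
# cp /path/to/hadoop-1.0.4/hadoop-core-1.0.4.jar \
#    /path/to/hadoop-1.0.4/lib/commons-codec-1.4.jar \
#    /path/to/hadoop-1.0.4/lib/commons-configuration-1.6.jar \
#    "$CONF_DIR/hadoop-104/lib/client/"

# 3. Point the plugin at the new configuration.
sed -i 's/^active\.hadoop\.configuration=.*/active.hadoop.configuration=hadoop-104/' \
    "$PLUGIN_DIR/plugin.properties"
cat "$PLUGIN_DIR/plugin.properties"
```

Restart PDI after the change so the plugin picks up the new active configuration.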

Transformation [CSV → Hadoop]:

Follow the instructions below to begin creating your transformation.
  • Click New in the upper left corner of Spoon.
  • Select Transformation from the list.
  • Under the Design tab, expand the Input node; then, select and drag a CSV file input step onto the canvas on the right.
  • Expand the Big Data node; click and drag a Hadoop File Output step onto the canvas.
  • To connect the steps to each other, you must add a hop. Hops describe the flow of data between steps in your transformation. To create the hop, click the CSV file input step, press and hold the <SHIFT> key, and draw a line to the Hadoop File Output step.
  • Double click the CSV file input step to open its edit properties dialog box.
  • In the Filename field, click the Browse button and navigate to the input file location.
  • Select the desired input file (e.g., sample.csv).
  • Click the Get Fields button to load the columns of the input file, then click OK.
  • Double click the Hadoop File Output step to open its edit properties dialog box.
  • In the Filename field, click the Browse button; the Open File dialog box appears.
  • Enter the following credentials to connect with HDFS:
    • Look In – make sure HDFS is selected
    • In Connection,
      • Server – localhost
      • Port - 54310
      • User ID - hduser
      • Password - password
  • Click the Connect button; once connected, the Open File dialog box shows the HDFS file system.
  • Click OK button.
  • Append the desired output file name to the path shown in the Filename field.
  • Navigate to the Fields tab, click the Get Fields button to load the columns of the input file, and click OK.
  • Click the Save icon and save the transformation you have created.
  • Click the Run icon to execute the transformation.
  • The Execute a Transformation dialog box appears.
  • Note: Local Execution is enabled by default. Select Detailed logging.
  • Click Launch.
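Behind the dialog, the Hadoop File Output step stores the connection and path as a single VFS URL in the Filename field. With the connection values used above it takes roughly this shape (the output path and file name are illustrative, not from the original):

```
hdfs://hduser:password@localhost:54310/user/hduser/sample-out.txt
```

If the transformation will be shared, consider parameterizing the credentials rather than leaving them embedded in the URL.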

Transformation [Hadoop → Text File]:

Follow the instructions below to begin creating your transformation.

  • Click New in the upper left corner of Spoon.
  • Select Transformation from the list.
  • Under the Design tab, expand the Big Data node; then, select and drag a Hadoop File Input step onto the canvas on the right.
  • Expand the Output node; click and drag a Text file output step onto the canvas.
  • To connect the steps to each other, you must add a hop. Hops describe the flow of data between steps in your transformation. To create the hop, click the Hadoop File Input step, press and hold the <SHIFT> key, and draw a line to the Text file output step.
  • Double click the Hadoop File Input step to open its edit properties dialog box.
  • In the File or directory field, click the Browse button; the Open File dialog box appears.
  • Enter the following credentials to connect with HDFS:
    • Look In – make sure HDFS is selected
    • In Connection,
      • Server – localhost
      • Port - 54310
      • User ID - hduser
      • Password – password
  • Click the Connect button; once connected, the Open File dialog box shows the HDFS file system.
  • Select the desired input file from HDFS. Click OK button.
  • Click the Add button next to the File or directory field to add the selected file to the list.
  • Navigate to the Fields tab, click the Get Fields button to get the columns of the input file and click OK button.
  • Double click the Text file output step to open its edit properties dialog box.
  • In the Filename field, click the Browse button and navigate to the location where the output file should be placed.
  • Append the desired output file name to the path shown in the Filename field.
  • Navigate to the Fields tab, click the Get Fields button to load the columns of the input file, and click OK.
  • Click the Save icon and save the transformation you have created.
  • Click the Run icon to execute the transformation.
  • Click Launch.
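Taken together, the two transformations form a CSV round trip: a local file is written into HDFS, then read back out to a text file. A minimal command-line sketch of the same data flow, using the local filesystem as a stand-in for HDFS (all file names and the sample data are illustrative):

```shell
#!/bin/sh
set -e
WORK="$(mktemp -d)"
# Stand-in for the input CSV selected in the CSV file input step.
printf 'id,name\n1,alice\n2,bob\n' > "$WORK/sample.csv"

# Transformation 1: CSV file input -> Hadoop File Output
# (modelled here as a copy onto the "cluster").
cp "$WORK/sample.csv" "$WORK/hdfs_sample.csv"

# Transformation 2: Hadoop File Input -> Text file output
# (a copy back off the "cluster").
cp "$WORK/hdfs_sample.csv" "$WORK/result.txt"

# The round trip should leave the data unchanged.
diff "$WORK/sample.csv" "$WORK/result.txt" && echo "round-trip OK"
```

On a real cluster you can make the same check by comparing the original file against the HDFS copy after the first transformation runs.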
