tag:blogger.com,1999:blog-17435577388891300582024-03-28T00:34:03.978-07:00BI and Big Data Adventure via Open Source TechnologiesKarthikhttp://www.blogger.com/profile/18220563988883365640noreply@blogger.comBlogger24125tag:blogger.com,1999:blog-1743557738889130058.post-81020454493525397542015-03-10T01:02:00.000-07:002015-03-10T01:02:46.244-07:00Custom Java UDA implementation in Apache Drill 0.7<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="margin-bottom: 0cm;">
<br /></div>
<div style="border-bottom: none; border-left: none; border-right: none; border-top: 1px solid #00000a; margin-bottom: 0cm; padding-bottom: 0cm; padding-left: 0cm; padding-right: 0cm; padding-top: 0.04cm;">
<br /></div>
<div align="CENTER" style="margin-bottom: 0cm;">
<span style="font-size: x-small;"><span style="color: #7f6000; font-family: Verdana, sans-serif;"><b>CUSTOM
JAVA UDA IMPLEMENTATION IN APACHE DRILL</b></span></span></div>
<div align="CENTER" style="margin-bottom: 0cm;">
<span style="font-size: x-small;"><span style="color: #7f6000; font-family: Verdana, sans-serif;"><b><br /></b></span></span></div>
<div style="border-bottom: none; border-left: none; border-right: none; border-top: 1px solid #00000a; margin-bottom: 0cm; padding-bottom: 0cm; padding-left: 0cm; padding-right: 0cm; padding-top: 0.04cm;">
<br /></div>
<b style="color: #7f6000; font-family: Verdana, sans-serif; font-size: small; text-indent: 0cm;">PREREQUISITE</b><br />
<div style="margin-bottom: 0cm; margin-left: 1.27cm;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;">The
following software must be installed.</span></div>
<ul>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;">Ubuntu
13.10</span></div>
</li>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;">Java
1.7</span></div>
</li>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;">Maven
3.0.4</span></div>
</li>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;">Git</span></div>
</li>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;">Eclipse
Kepler with Maven Plugins</span></div>
</li>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;">Apache
Drill 0.7.0 (installed in embedded mode)</span></div>
</li>
</ul>
<b style="color: #7f6000; font-family: Verdana, sans-serif; font-size: small; text-indent: 0cm;">DOWNLOAD
AND COMPILE SOURCE CODE OF DRILL</b><br />
<div style="margin-bottom: 0cm; text-indent: 1.27cm;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;">Download
and build the source code of Apache Drill 0.7.0 and 0.8.0 from Git.</span></div>
<ul>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;">Apache
Drill 0.8.0</span></div>
</li>
</ul>
<dl>
<dl>
<dd><table cellpadding="7" cellspacing="0" style="width: 554px;">
<colgroup><col width="538"></col>
</colgroup><tbody>
<tr>
<td style="border: 1.00pt solid #000001; padding: 0.18cm;" valign="TOP" width="538"><div align="LEFT" style="margin-bottom: 0cm; orphans: 0; page-break-after: auto; page-break-inside: auto; widows: 0;">
<span style="font-family: Verdana, sans-serif;"><span style="font-size: x-small;">git clone
</span><a href="https://github.com/apache/drill.git"><span style="color: #1155cc;"><span style="font-size: x-small;"><u>https://github.com/apache/drill.git</u></span></span></a></span></div>
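<div align="LEFT" style="margin-bottom: 0cm; orphans: 0; page-break-after: auto; page-break-inside: auto; widows: 0;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;">cd drill</span></div>
<div align="LEFT" style="orphans: 0; page-break-after: auto; page-break-inside: auto; widows: 0;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;">mvn clean install</span></div>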
</td></tr>
</tbody></table>
</dd></dl>
</dl>
<ul>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Verdana, sans-serif;"><span style="font-size: x-small;">Apache
Drill 0.7.0 </span>
</span></div>
</li>
</ul>
<dl>
<dl>
<dd><table cellpadding="7" cellspacing="0" style="width: 557px;">
<colgroup><col width="541"></col>
</colgroup><tbody>
<tr>
<td style="border: 1.00pt solid #000001; padding: 0.18cm;" valign="TOP" width="541"><div style="margin-bottom: 0cm;">
<span style="font-family: Verdana, sans-serif;"><span style="font-size: x-small;">git
clone -b 0.7.0 </span><a href="https://github.com/apache/drill.git"><span style="color: #1155cc;"><span style="font-size: x-small;"><u>https://github.com/apache/drill.git</u></span></span></a><span style="font-size: x-small;"> drill-0.7.0</span></span></div>
<div style="margin-bottom: 0cm;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;">cd drill-0.7.0</span></div>
<span style="font-family: Verdana, sans-serif; font-size: x-small;">mvn clean install</span></td>
</tr>
</tbody></table>
</dd></dl>
</dl>
<b style="color: #7f6000; font-family: Verdana, sans-serif; font-size: small; text-indent: 0cm;">SET
UP DRILL DEVELOPMENT ENVIRONMENT IN ECLIPSE</b><br />
<ul>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;">Open
Eclipse, right-click in the Package Explorer and select Import.</span></div>
</li>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;">Expand
Maven and select Existing Maven Projects to import the compiled
source code of Drill 0.7.0.</span></div>
</li>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;">If
Eclipse reports errors in any of the imported projects, fix them before continuing.</span></div>
</li>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;">Create
a new Maven project (e.g., groupId: com.udf, artifactId: udf,
version: 0.0.1-SNAPSHOT, packaging: jar).</span></div>
</li>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;">Add
the following lines to the pom.xml file of the new project.</span></div>
</li>
</ul>
<dl>
<dl>
<dd><table cellpadding="7" cellspacing="0" style="width: 558px;">
<colgroup><col width="542"></col>
</colgroup><tbody>
<tr>
<td style="border: 1.00pt solid #000001; padding: 0.18cm;" valign="TOP" width="542"><div align="LEFT" style="margin-bottom: 0cm; orphans: 0; page-break-after: auto; page-break-inside: auto; widows: 0;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;"> <dependencies></span></div>
<div align="LEFT" style="margin-bottom: 0cm; orphans: 0; page-break-after: auto; page-break-inside: auto; widows: 0;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;"> <dependency></span></div>
<div align="LEFT" style="margin-bottom: 0cm; orphans: 0; page-break-after: auto; page-break-inside: auto; widows: 0;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;"> <groupId>org.apache.drill.exec</groupId></span></div>
<div align="LEFT" style="margin-bottom: 0cm; orphans: 0; page-break-after: auto; page-break-inside: auto; widows: 0;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;"> <artifactId>drill-java-exec</artifactId></span></div>
<div align="LEFT" style="margin-bottom: 0cm; orphans: 0; page-break-after: auto; page-break-inside: auto; widows: 0;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;"> <version>0.7.0</version></span></div>
<div align="LEFT" style="margin-bottom: 0cm; orphans: 0; page-break-after: auto; page-break-inside: auto; widows: 0;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;"> </dependency></span></div>
<div align="LEFT" style="margin-bottom: 0cm; orphans: 0; page-break-after: auto; page-break-inside: auto; widows: 0;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;"> </dependencies></span></div>
<div align="LEFT" style="margin-bottom: 0cm; orphans: 0; page-break-after: auto; page-break-inside: auto; widows: 0;">
<span style="font-family: Verdana, sans-serif;"><br />
</span></div>
<div align="LEFT" style="margin-bottom: 0cm; orphans: 0; page-break-after: auto; page-break-inside: auto; widows: 0;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;"> <build></span></div>
<div align="LEFT" style="margin-bottom: 0cm; orphans: 0; page-break-after: auto; page-break-inside: auto; widows: 0;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;"> <plugins></span></div>
<div align="LEFT" style="margin-bottom: 0cm; orphans: 0; page-break-after: auto; page-break-inside: auto; widows: 0;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;"> <plugin></span></div>
<div align="LEFT" style="margin-bottom: 0cm; orphans: 0; page-break-after: auto; page-break-inside: auto; widows: 0;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;"> <groupId>org.apache.maven.plugins</groupId></span></div>
<div align="LEFT" style="margin-bottom: 0cm; orphans: 0; page-break-after: auto; page-break-inside: auto; widows: 0;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;"> <artifactId>maven-source-plugin</artifactId></span></div>
<div align="LEFT" style="margin-bottom: 0cm; orphans: 0; page-break-after: auto; page-break-inside: auto; widows: 0;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;"> <version>2.4</version></span></div>
<div align="LEFT" style="margin-bottom: 0cm; orphans: 0; page-break-after: auto; page-break-inside: auto; widows: 0;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;"> <executions></span></div>
<div align="LEFT" style="margin-bottom: 0cm; orphans: 0; page-break-after: auto; page-break-inside: auto; widows: 0;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;"> <execution></span></div>
<div align="LEFT" style="margin-bottom: 0cm; orphans: 0; page-break-after: auto; page-break-inside: auto; widows: 0;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;"> <id>attach-sources</id></span></div>
<div align="LEFT" style="margin-bottom: 0cm; orphans: 0; page-break-after: auto; page-break-inside: auto; widows: 0;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;"> <!--
<phase>verify</phase> --></span></div>
<div align="LEFT" style="margin-bottom: 0cm; orphans: 0; page-break-after: auto; page-break-inside: auto; widows: 0;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;"> <goals></span></div>
<div align="LEFT" style="margin-bottom: 0cm; orphans: 0; page-break-after: auto; page-break-inside: auto; widows: 0;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;"> <goal>jar</goal></span></div>
<div align="LEFT" style="margin-bottom: 0cm; orphans: 0; page-break-after: auto; page-break-inside: auto; widows: 0;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;"> </goals></span></div>
<div align="LEFT" style="margin-bottom: 0cm; orphans: 0; page-break-after: auto; page-break-inside: auto; widows: 0;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;"> </execution></span></div>
<div align="LEFT" style="margin-bottom: 0cm; orphans: 0; page-break-after: auto; page-break-inside: auto; widows: 0;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;"> </executions></span></div>
<div align="LEFT" style="margin-bottom: 0cm; orphans: 0; page-break-after: auto; page-break-inside: auto; widows: 0;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;"> </plugin></span></div>
<div align="LEFT" style="margin-bottom: 0cm; orphans: 0; page-break-after: auto; page-break-inside: auto; widows: 0;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;"> </plugins></span></div>
<div align="LEFT" style="orphans: 0; page-break-after: auto; page-break-inside: auto; widows: 0;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;"> </build></span></div>
</td></tr>
</tbody></table>
</dd></dl>
</dl>
<ul>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;">Create
a Java class that implements your UDA.</span></div>
</li>
</ul>
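Before working with Drill's annotations, it can help to see the lifecycle a UDA goes through. The sketch below is a plain-Java model, not Drill code: the class AverageAggSketch and its averageByKey driver are hypothetical and are not loadable by Drill. Its four callbacks are named after the setup, add, output and reset methods of Drill's DrillAggFunc interface, and the driver calls them roughly the way a streaming GROUP BY would (add once per row, output and reset when the group key changes). In a real Drill 0.7 UDA the fields would be holder objects marked with @Param, @Workspace and @Output, and the class would carry a @FunctionTemplate annotation with an aggregate scope; check the exact names against the Drill 0.7 source imported into Eclipse.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of a UDA lifecycle (hypothetical; does not use or load into Drill).
// The callback names follow Drill's DrillAggFunc interface: setup, add, output, reset.
class AverageAggSketch {
    private double sum;   // workspace state carried across the rows of one group
    private long count;   // workspace state carried across the rows of one group
    private double out;   // output value produced once per group

    void setup() {            // once, before any rows are processed
        reset();
    }

    void add(double value) {  // once per input row of the current group
        sum += value;
        count++;
    }

    void output() {           // when the current group is complete
        out = count == 0 ? 0.0 : sum / count;
    }

    void reset() {            // clear the workspace before the next group
        sum = 0.0;
        count = 0;
    }

    double result() {
        return out;
    }

    // Drives the callbacks like a streaming GROUP BY over rows already sorted by key.
    static Map<String, Double> averageByKey(List<String> keys, List<Double> values) {
        Map<String, Double> results = new LinkedHashMap<>();
        AverageAggSketch agg = new AverageAggSketch();
        agg.setup();
        String current = null;
        for (int i = 0; i < keys.size(); i++) {
            String key = keys.get(i);
            if (current != null && !current.equals(key)) {
                agg.output();
                results.put(current, agg.result());
                agg.reset();
            }
            current = key;
            agg.add(values.get(i));
        }
        if (current != null) {   // flush the last group
            agg.output();
            results.put(current, agg.result());
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "a", "b", "b", "b");
        List<Double> values = Arrays.asList(1.0, 3.0, 2.0, 4.0, 6.0);
        System.out.println(averageByKey(keys, values)); // prints {a=2.0, b=4.0}
    }
}
```

The running total and count live in fields rather than local variables because they must survive between add calls; that is the role Drill's @Workspace fields play.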
<b style="color: #7f6000; font-family: Verdana, sans-serif; font-size: small; text-indent: 0cm;">CLEAN,
COMPILE AND BUILD JAR WITH DEPENDENCIES</b><br />
<ul>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;">To
clean: right-click your Maven project and select Run As ->
Maven clean.</span></div>
</li>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;">Open
the Project menu in Eclipse and select Clean.</span></div>
</li>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;">To
compile and build the JAR: right-click your Maven project and select
Run As -> Maven install.</span></div>
</li>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;">The
JAR files are now available in the target folder of your project.</span></div>
</li>
</ul>
<b style="color: #7f6000; font-family: Verdana, sans-serif; font-size: small; text-indent: 0cm;">DEPLOY
THE CUSTOM JAR IN APACHE DRILL</b><br />
<ul>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;">Stop
the drillbit service</span></div>
</li>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;">Copy
the JAR files (yourprojectname-0.0.1-SNAPSHOT-sources.jar and
yourprojectname-0.0.1-SNAPSHOT.jar) from the target directory of
your project to $DRILL_HOME/jars/classb/. Drill needs both the class JAR and the sources JAR.</span></div>
</li>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;">Start
the drillbit service.</span></div>
</li>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;">Check
the Drill log files to make sure that your JAR was loaded successfully.</span></div>
</li>
</ul>
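The copy step can be scripted. The sketch below is a hypothetical Java helper: the class name DeployUdaJars is made up, and it assumes the example artifactId udf from the project set-up step and a DRILL_HOME environment variable pointing at the Drill installation. It only copies files; stopping and starting the drillbit is still done by hand. It refuses to copy anything unless both the class JAR and the sources JAR are present, because Drill uses the function source code as well as the compiled class.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not part of Drill): copies the class JAR and the
// sources JAR built by "mvn install" into $DRILL_HOME/jars/classb/.
class DeployUdaJars {

    static List<Path> deploy(Path targetDir, Path drillHome, String artifactId) throws IOException {
        // Collect e.g. udf-0.0.1-SNAPSHOT.jar and udf-0.0.1-SNAPSHOT-sources.jar
        List<Path> jars = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(targetDir, artifactId + "-*.jar")) {
            for (Path jar : stream) {
                jars.add(jar);
            }
        }
        Collections.sort(jars);

        // Refuse to deploy unless both kinds of JAR are present.
        boolean hasSources = false;
        boolean hasClasses = false;
        for (Path jar : jars) {
            if (jar.getFileName().toString().endsWith("-sources.jar")) {
                hasSources = true;
            } else {
                hasClasses = true;
            }
        }
        if (!hasSources || !hasClasses) {
            throw new IOException("Expected a class JAR and a -sources.jar in " + targetDir);
        }

        Path destDir = drillHome.resolve("jars").resolve("classb");
        Files.createDirectories(destDir);
        List<Path> copied = new ArrayList<>();
        for (Path jar : jars) {
            Path dest = destDir.resolve(jar.getFileName());
            Files.copy(jar, dest, StandardCopyOption.REPLACE_EXISTING);
            copied.add(dest);
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        String drillHome = System.getenv("DRILL_HOME");
        if (drillHome == null) {
            System.err.println("DRILL_HOME is not set");
            return;
        }
        // "udf" is the example artifactId from the project set-up step.
        for (Path jar : deploy(Paths.get("target"), Paths.get(drillHome), "udf")) {
            System.out.println("Copied " + jar);
        }
    }
}
```

Run it from the project directory while the drillbit is stopped, then start the drillbit and check the log as described above.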
<b style="color: #7f6000; font-family: Verdana, sans-serif; font-size: small; text-indent: 0cm;">TEST
YOUR CUSTOM UDA FUNCTION IN WEB UI OF THE APACHE DRILL</b><br />
<br />
<ul>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Verdana, sans-serif;"><span style="font-size: x-small;">Open
the Apache Drill Web UI (</span><a href="http://localhost:8047/"><span style="color: #1155cc;"><span style="font-size: x-small;"><u>http://localhost:8047/</u></span></span></a><span style="font-size: x-small;">).</span></span></div>
</li>
<li><div style="margin-bottom: 0cm;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;">Navigate
to the Query page and run a query that calls your custom function.</span></div>
</li>
</ul>
</div>
Karthikhttp://www.blogger.com/profile/18220563988883365640noreply@blogger.com28tag:blogger.com,1999:blog-1743557738889130058.post-12920326526697811212015-02-11T21:47:00.004-08:002015-02-11T21:47:59.634-08:00My Github Repository of Hadoop and HBase developement<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
<b><a href="https://github.com/kkarthik21/hbase-dev">HBase Github Repository</a></b><br />
<br />
<b><a href="https://github.com/kkarthik21/hadoop-dev">Hadoop Github Repository</a></b><br />
<br /></div>
Karthikhttp://www.blogger.com/profile/18220563988883365640noreply@blogger.com6tag:blogger.com,1999:blog-1743557738889130058.post-74345688050500538822014-09-18T06:56:00.001-07:002014-09-18T10:35:41.407-07:00Running K-Means Clustering Algorithm against numerical data in Apache Mahout<div dir="ltr" style="text-align: left;" trbidi="on">
<div>
<b style="color: blue;">Procedures :</b></div>
<ul>
<li><div style="margin-bottom: 0cm;">
Dataset Preparation
</div>
</li>
<li><div style="margin-bottom: 0cm;">
Generate Sequence
File</div>
</li>
<li><div style="margin-bottom: 0cm;">
Move the Sequence
File to Hadoop Cluster and check the contents of the Sequence File</div>
</li>
<li><div style="margin-bottom: 0cm;">
Plan and Run K-Means
clustering algorithm</div>
</li>
<li><div style="margin-bottom: 0cm;">
Export the K-Means
output using Cluster Dumper tool</div>
</li>
<li><div style="margin-bottom: 0cm;">
Export the K-Means
output as graphml file</div>
</li>
<li><div style="margin-bottom: 0cm;">
Visualize the output of K-Means using graphml in Gephi</div>
</li>
</ul>
<div style="margin-bottom: 0cm;">
<span style="color: blue;"><b>Dataset Preparation :</b></span></div>
<div style="margin-bottom: 0cm;">
<b></b><span style="font-weight: normal;"> I
am going to generate floating-point values (two dimensions, five different ranges) using
the Gaussian generator of java.util.Random, as shown below. The program prints 25 values
drawn around a single mean (MEAN); the values for the other ranges are produced by changing MEAN.</span>
</div>
<div style="margin-bottom: 0cm;">
<br /></div>
<table cellpadding="4" cellspacing="0" style="width: 100%;">
<colgroup><col width="256*"></col>
</colgroup><tbody>
<tr>
<td style="border: 1px solid #000000; padding: 0.1cm;" valign="TOP" width="100%"><div align="LEFT" style="margin-bottom: 0cm;">
<span style="font-family: Times New Roman, serif;"><span style="color: #7f0055;"><b>import</b></span><span style="color: black;">
java.util.Random;</span></span></div>
<div align="LEFT" style="margin-bottom: 0cm;">
<br /></div>
<div align="LEFT" style="margin-bottom: 0cm;">
<span style="font-family: Times New Roman, serif;"><span style="color: #7f0055;"><b>public</b></span><span style="color: black;">
</span><span style="color: #7f0055;"><b>final</b></span><span style="color: black;">
</span><span style="color: #7f0055;"><b>class</b></span><span style="color: black;">
RandomGaussian {</span></span></div>
<div align="LEFT" style="margin-bottom: 0cm;">
<span style="font-family: Times New Roman, serif;"><span style="color: black;"> </span><span style="color: #7f0055;"><b>public</b></span><span style="color: black;">
</span><span style="color: #7f0055;"><b>static</b></span><span style="color: black;">
</span><span style="color: #7f0055;"><b>void</b></span><span style="color: black;">
main(String... aArgs) {</span></span></div>
<div align="LEFT" style="margin-bottom: 0cm;">
<span style="font-family: Times New Roman, serif;"><span style="color: black;"> RandomGaussian
gaussian = </span><span style="color: #7f0055;"><b>new</b></span><span style="color: black;">
RandomGaussian();</span></span></div>
<div align="LEFT" style="margin-bottom: 0cm;">
<span style="font-family: Times New Roman, serif;"><span style="color: black;"> </span><span style="color: #7f0055;"><b>double</b></span><span style="color: black;">
MEAN = -0.9;</span></span></div>
<div align="LEFT" style="margin-bottom: 0cm;">
<span style="font-family: Times New Roman, serif;"><span style="color: black;"> </span><span style="color: #7f0055;"><b>double</b></span><span style="color: black;">
VARIANCE = 0.1;</span></span></div>
<div align="LEFT" style="margin-bottom: 0cm;">
<span style="font-family: Times New Roman, serif;"><span style="color: black;"> </span><span style="color: #7f0055;"><b>for</b></span><span style="color: black;">
(</span><span style="color: #7f0055;"><b>int</b></span><span style="color: black;">
idx = 1; idx <= 25; ++idx) {</span></span></div>
<div align="LEFT" style="margin-bottom: 0cm;">
<span style="font-family: Times New Roman, serif;"><span style="color: black;"> </span><span style="color: black;"><i>log</i></span><span style="color: black;">(gaussian.getGaussian(MEAN,
VARIANCE));</span></span></div>
<div align="LEFT" style="margin-bottom: 0cm;">
<span style="font-family: Times New Roman, serif;"><span style="color: black;"> }</span></span></div>
<div align="LEFT" style="margin-bottom: 0cm;">
<span style="font-family: Times New Roman, serif;"><span style="color: black;"> }</span></span></div>
<div align="LEFT" style="margin-bottom: 0cm;">
<span style="font-family: Times New Roman, serif;"><span style="color: black;"> </span><span style="color: #7f0055;"><b>private</b></span><span style="color: black;">
Random </span><span style="color: #0000c0;">fRandom</span><span style="color: black;">
= </span><span style="color: #7f0055;"><b>new</b></span><span style="color: black;">
Random();</span></span></div>
<div align="LEFT" style="margin-bottom: 0cm;">
<span style="font-family: Times New Roman, serif;"><span style="color: black;"> </span><span style="color: #7f0055;"><b>private</b></span><span style="color: black;">
</span><span style="color: #7f0055;"><b>double</b></span><span style="color: black;">
getGaussian(</span><span style="color: #7f0055;"><b>double</b></span><span style="color: black;">
aMean, </span><span style="color: #7f0055;"><b>double</b></span><span style="color: black;">
aVariance) {</span></span></div>
<div align="LEFT" style="margin-bottom: 0cm;">
<span style="font-family: Times New Roman, serif;"><span style="color: black;"> </span><span style="color: #7f0055;"><b>return</b></span><span style="color: black;">
aMean + </span><span style="color: #0000c0;">fRandom</span><span style="color: black;">.nextGaussian()
* Math.sqrt(aVariance); // nextGaussian() has unit variance, so scale by the standard deviation</span></span></div>
<div align="LEFT" style="margin-bottom: 0cm;">
<span style="font-family: Times New Roman, serif;"><span style="color: black;"> }</span></span></div>
<div align="LEFT" style="margin-bottom: 0cm;">
<span style="font-family: Times New Roman, serif;"><span style="color: black;"> </span><span style="color: #7f0055;"><b>private</b></span><span style="color: black;">
</span><span style="color: #7f0055;"><b>static</b></span><span style="color: black;">
</span><span style="color: #7f0055;"><b>void</b></span><span style="color: black;">
log(Object aMsg) {</span></span></div>
<div align="LEFT" style="margin-bottom: 0cm;">
<span style="font-family: Times New Roman, serif;"><span style="color: black;"> System.</span><span style="color: #0000c0;"><i>out</i></span><span style="color: black;">.println(String.</span><span style="color: black;"><i>valueOf</i></span><span style="color: black;">(aMsg));</span></span></div>
<div align="LEFT" style="margin-bottom: 0cm;">
<span style="font-family: Times New Roman, serif;"><span style="color: black;"> }</span></span></div>
<div align="LEFT">
<span style="font-family: Times New Roman, serif;"><span style="color: black;">}</span></span></div>
</td>
</tr>
</tbody></table>
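The program above prints one series of one-dimensional values, while the sequence-file step that follows expects inputvectorfile.csv to contain one line per point in the form name,x,y. The sketch below fills that gap under stated assumptions: the class name ClusterDataGenerator, the five cluster centres, the point names, the fixed seed and the variance of 0.01 (a standard deviation of 0.1) are all illustrative choices, not values from the original post. It generates 25 two-dimensional Gaussian points around each of the five means.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Random;

// Hypothetical generator for inputvectorfile.csv: five 2-D Gaussian blobs
// written as "name,x,y" lines, the layout VectorFileCreation parses.
class ClusterDataGenerator {

    // Example cluster centres (assumptions, not the author's values).
    static final double[][] MEANS = {
        {-0.9, -0.9}, {-0.9, 0.9}, {0.0, 0.0}, {0.9, -0.9}, {0.9, 0.9}
    };

    static List<String> generate(int pointsPerCluster, double variance, long seed) {
        Random random = new Random(seed);
        double stdDev = Math.sqrt(variance); // nextGaussian() has unit variance
        List<String> lines = new ArrayList<>();
        for (int c = 0; c < MEANS.length; c++) {
            for (int i = 0; i < pointsPerCluster; i++) {
                double x = MEANS[c][0] + random.nextGaussian() * stdDev;
                double y = MEANS[c][1] + random.nextGaussian() * stdDev;
                // Locale.ROOT keeps '.' as the decimal separator for Double.parseDouble
                lines.add(String.format(Locale.ROOT, "c%d_p%d,%.4f,%.4f", c, i, x, y));
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        try (PrintWriter out = new PrintWriter("inputvectorfile.csv", "UTF-8")) {
            for (String line : generate(25, 0.01, 42L)) {
                out.println(line);
            }
        }
    }
}
```

A fixed seed makes the data set reproducible, which is handy when comparing k-means runs with different -k or -cd values.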
<div style="margin-bottom: 0cm;">
<br /></div>
<div style="margin-bottom: 0cm;">
<span style="color: blue;"><b>Generate Sequence File
:</b></span></div>
<div style="font-weight: normal; margin-bottom: 0cm;">
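Mahout's
clustering jobs read their input as Hadoop SequenceFiles of vectors, so numerical data
must first be converted into that format by a small utility program. The following Java
program converts the numerical data created above into a sequence file of vectors; it expects
inputvectorfile.csv to contain one line per point in the form name,x,y. A SequenceFile is a
binary file of key-value pairs; here each key is the item name (Text) and each value is its
feature vector (VectorWritable).</div>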
<div style="margin-bottom: 0cm;">
<br /></div>
<table cellpadding="4" cellspacing="0" style="width: 100%;">
<colgroup><col width="256*"></col>
</colgroup><tbody>
<tr>
<td style="border: 1px solid #000000; padding: 0.1cm;" valign="TOP" width="100%"><div style="font-weight: normal; margin-bottom: 0cm;">
import
java.io.BufferedReader;</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
import
java.io.FileReader;</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
import
java.util.ArrayList;</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
import
java.util.List;</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
<br /></div>
<div style="font-weight: normal; margin-bottom: 0cm;">
import
org.apache.hadoop.conf.Configuration;</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
import
org.apache.hadoop.fs.FileSystem;</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
import
org.apache.hadoop.fs.Path;</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
import
org.apache.hadoop.io.SequenceFile;</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
import
org.apache.hadoop.io.Text;</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
import
org.apache.mahout.math.DenseVector;</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
import
org.apache.mahout.math.NamedVector;</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
import
org.apache.mahout.math.VectorWritable;</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
<br /></div>
<div style="font-weight: normal; margin-bottom: 0cm;">
class
VectorFileCreation {</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
private
VectorFileCreation() {</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
}</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
<br /></div>
<div style="font-weight: normal; margin-bottom: 0cm;">
public
static final int NUM_COLUMNS = 3;</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
<br /></div>
<div style="font-weight: normal; margin-bottom: 0cm;">
public
static void main(String[] args) throws Exception {</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
String
INPUT_FILE = "inputvectorfile.csv";</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
String
OUTPUT_FILE = "sampleseqfile";</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
List<NamedVector>
apples = new ArrayList<NamedVector>();</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
NamedVector
apple;</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
BufferedReader
br = null;</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
br
= new BufferedReader(new FileReader(INPUT_FILE));</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
String
sCurrentLine;</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
while
((sCurrentLine = br.readLine()) != null) {</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
String
item_name = sCurrentLine.split(",")[0];</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
double[]
features = new double[NUM_COLUMNS - 1];</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
for
(int indx = 1; indx < NUM_COLUMNS; ++indx) {</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
features[indx
- 1] = Double.parseDouble(sCurrentLine.split(",")[indx]);</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
}</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
apple
= new NamedVector(new DenseVector(features), item_name);</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
apples.add(apple);</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
}</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
br.close();</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
Configuration
conf = new Configuration();</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
FileSystem
fs = FileSystem.get(conf);</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
Path
path = new Path(OUTPUT_FILE);</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
SequenceFile.Writer
writer = new SequenceFile.Writer(fs, conf, path,Text.class,
VectorWritable.class);</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
VectorWritable
vec = new VectorWritable();</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
for
(NamedVector vector : apples) {</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
vec.set(vector);</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
writer.append(new
Text(vector.getName()), vec);</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
}</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
writer.close();</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
SequenceFile.Reader
reader = new SequenceFile.Reader(fs, new Path(OUTPUT_FILE), conf);</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
Text
key = new Text();</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
VectorWritable
value = new VectorWritable();</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
while
(reader.next(key, value)) {</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
System.out.println(key.toString()
+ ","+ value.get().asFormatString());</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
}</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
reader.close();</div>
<div style="font-weight: normal; margin-bottom: 0cm;">
}</div>
<div style="font-weight: normal;">
}</div>
</td>
</tr>
</tbody></table>
<div style="margin-bottom: 0cm;">
<span style="color: blue;"><b>Move the Sequence File
to Hadoop Cluster and check the contents of the Sequence File :</b></span></div>
<div style="margin-bottom: 0cm;">
<b></b><span style="font-weight: normal;"> Use
the Hadoop shell commands (for example, hadoop fs -put) to copy the sequence file to the Hadoop
cluster. A sequence file is binary, so its contents cannot be read directly; the
command below dumps them as text.</span></div>
<div style="margin-bottom: 0cm;">
<br /></div>
<table cellpadding="4" cellspacing="0" style="width: 100%;">
<colgroup><col width="256*"></col>
</colgroup><tbody>
<tr>
<td style="border: 1px solid #000000; padding: 0.1cm;" valign="TOP" width="100%"><div align="CENTER">
mahout seqdumper -i
/your-hdfs-path-to-seqfiles | less</div>
</td>
</tr>
</tbody></table>
<div style="margin-bottom: 0cm;">
<br /></div>
<div style="margin-bottom: 0cm;">
<span style="color: blue;"><b>Plan and Run K-Means
clustering algorithm :</b></span></div>
<div style="margin-bottom: 0cm;">
<b></b><span style="font-weight: normal;"> Plan
the clustering by choosing the number of clusters (-k), the maximum number of
iterations (-x) and the distance measure, then execute the command below.</span></div>
<table cellpadding="4" cellspacing="0" style="width: 100%;">
<colgroup><col width="256*"></col>
</colgroup><tbody>
<tr>
<td style="border: 1px solid #000000; padding: 0.1cm;" valign="TOP" width="100%">mahout kmeans -i /your-hdfs-path-to-seqfiles -c
/your-hdfs-path-to-initial-cluster -o
/your-hdfs-path-to-seqfiles-final-cluster -x <numeric value of
iteration> -k <numeric value of clusters> -ow
--clustering -cd <numeric value></td>
</tr>
</tbody></table>
<div style="margin-bottom: 0cm;">
<br /></div>
<div style="margin-bottom: 0cm;">
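<span style="font-weight: normal;">By
default, the job uses SquaredEuclideanDistanceMeasure and a
convergence delta (-cd) of 0.5. When -k is given, Mahout randomly selects k input vectors as the initial centroids and writes them to the -c path; -ow overwrites an existing output directory, and --clustering assigns the input points to the final clusters (written to clusteredPoints).</span></div>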
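To make the -k, -x and -cd options concrete, here is a minimal in-memory k-means (Lloyd's algorithm) sketch in Java. It is not Mahout's MapReduce implementation, and the class name KMeansSketch is hypothetical. It picks k different input points as the initial centroids (as -k does), alternates assignment and update steps for at most maxIterations rounds (-x), uses squared Euclidean distance, and stops early once no centroid moves more than delta under that distance; treating -cd this way is an assumption about how Mahout applies it.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical in-memory k-means (Lloyd's algorithm); NOT Mahout's MapReduce job.
class KMeansSketch {

    static double squaredEuclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    // Assignment step: label each point with the index of its nearest centroid.
    static void assign(double[][] points, double[][] centroids, int[] labels) {
        for (int p = 0; p < points.length; p++) {
            int best = 0;
            for (int c = 1; c < centroids.length; c++) {
                if (squaredEuclidean(points[p], centroids[c])
                        < squaredEuclidean(points[p], centroids[best])) {
                    best = c;
                }
            }
            labels[p] = best;
        }
    }

    // k ~ -k, maxIterations ~ -x, delta ~ -cd. Returns a cluster label per point.
    static int[] cluster(double[][] points, int k, int maxIterations, double delta, long seed) {
        if (k < 1 || k > points.length) {
            throw new IllegalArgumentException("k must be between 1 and the number of points");
        }
        // Initial centroids: k different input points chosen by a partial shuffle.
        Random random = new Random(seed);
        int[] order = new int[points.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            int j = c + random.nextInt(points.length - c);
            int tmp = order[c]; order[c] = order[j]; order[j] = tmp;
            centroids[c] = points[order[c]].clone();
        }

        int dims = points[0].length;
        int[] labels = new int[points.length];
        for (int iter = 0; iter < maxIterations; iter++) {
            assign(points, centroids, labels);
            // Update step: move each centroid to the mean of its assigned points.
            boolean converged = true;
            for (int c = 0; c < k; c++) {
                double[] mean = new double[dims];
                int count = 0;
                for (int p = 0; p < points.length; p++) {
                    if (labels[p] == c) {
                        for (int d = 0; d < dims; d++) {
                            mean[d] += points[p][d];
                        }
                        count++;
                    }
                }
                if (count == 0) {
                    continue; // empty cluster: keep its previous centroid
                }
                for (int d = 0; d < dims; d++) {
                    mean[d] /= count;
                }
                if (squaredEuclidean(mean, centroids[c]) > delta) {
                    converged = false;
                }
                centroids[c] = mean;
            }
            if (converged) {
                break; // no centroid moved more than delta
            }
        }
        assign(points, centroids, labels); // final labels for the final centroids
        return labels;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0, 1}, {1, 0}, {10, 10}, {10, 11}, {11, 10}};
        // The first three points share one label and the last three the other.
        System.out.println(Arrays.toString(cluster(points, 2, 10, 0.5, 1L)));
    }
}
```

A larger delta lets the loop stop earlier with centroids that are still moving slightly, while -x caps the work when the data never settles.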
<div style="margin-bottom: 0cm;">
<br /></div>
<div style="margin-bottom: 0cm;">
<span style="color: blue;"><b>Export the K-Means
output using Cluster Dumper tool :</b></span></div>
<div style="margin-bottom: 0cm;">
<br /></div>
<table cellpadding="4" cellspacing="0" style="width: 100%;">
<colgroup><col width="256*"></col>
</colgroup><tbody>
<tr>
<td style="border: 1px solid #000000; padding: 0.1cm;" valign="TOP" width="100%">mahout clusterdump -i
/your-hdfs-path-to-clusters-*-final -p
/your-hdfs-path-to-clusteredPoints -o
/your-local-destination-path-with-filename.txt</td>
</tr>
</tbody></table>
<div style="margin-bottom: 0cm;">
<br /></div>
<div style="margin-bottom: 0cm;">
<span style="color: blue;"><b>Export the K-Means
output as graphml file :</b></span></div>
<div style="margin-bottom: 0cm;">
<br /></div>
<table cellpadding="4" cellspacing="0" style="width: 100%;">
<colgroup><col width="256*"></col>
</colgroup><tbody>
<tr>
<td style="border: 1px solid #000000; padding: 0.1cm;" valign="TOP" width="100%">mahout clusterdump -i
/your-hdfs-path-to-clusters-*-final -p
/your-hdfs-path-to-clusteredPoints -of GRAPH_ML -o
/your-local-destination-path-with-filename.graphml</td>
</tr>
</tbody></table>
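<div style="margin-bottom: 0cm;">GraphML is a plain XML format, which is why Gephi can open the exported file directly. The sketch below (plain Java, hypothetical class and method names, not Mahout's own writer) builds a tiny GraphML document with one node per centroid, one node per point, and an edge joining each point to the centroid it was assigned to. The node ids and attributes that Mahout actually writes may differ; the sketch only shows the general shape of such a graph.</div>

```java
/** Builds a small GraphML document for a clustering result; illustrative only. */
public class GraphMlSketch {

    // Escape XML special characters that could appear in a node label.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }

    // assignment[i] is the centroid index of point i (hypothetical input, e.g. from a k-means run).
    static String toGraphMl(int centroidCount, String[] pointLabels, int[] assignment) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<graphml xmlns=\"http://graphml.graphdrawing.org/xmlns\">\n");
        // Declare a string attribute named "label" for nodes.
        sb.append("  <key id=\"label\" for=\"node\" attr.name=\"label\" attr.type=\"string\"/>\n");
        sb.append("  <graph id=\"clusters\" edgedefault=\"undirected\">\n");
        for (int c = 0; c < centroidCount; c++) { // one node per centroid
            sb.append("    <node id=\"c").append(c).append("\"><data key=\"label\">centroid ")
              .append(c).append("</data></node>\n");
        }
        for (int i = 0; i < pointLabels.length; i++) { // one node per clustered point
            sb.append("    <node id=\"p").append(i).append("\"><data key=\"label\">")
              .append(escape(pointLabels[i])).append("</data></node>\n");
        }
        for (int i = 0; i < assignment.length; i++) { // edge from each point to its centroid
            sb.append("    <edge source=\"p").append(i).append("\" target=\"c")
              .append(assignment[i]).append("\"/>\n");
        }
        sb.append("  </graph>\n</graphml>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        String xml = toGraphMl(2, new String[] {"[1.0, 1.0]", "[8.0, 8.0]", "[9.0, 8.5]"}, new int[] {0, 1, 1});
        System.out.print(xml);
    }
}
```

<div style="margin-bottom: 0cm;">In Gephi, a graph shaped like this shows each centroid as a hub with its member points attached, which is the star-like layout you would expect from a clustering result.</div>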
<div style="margin-bottom: 0cm;">
<b><br /></b></div>
<div style="margin-bottom: 0cm;">
<span style="color: blue;"><b>Visualize the output of
K-Means using GraphML in Gephi </b></span><b style="color: blue;">:</b></div>
<ul>
<li><div style="margin-bottom: 0cm;">
Open
the GraphML file in Gephi to visualize the centroids and the clustered points, as
shown below:</div>
</li>
</ul>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg0u4JY0YLMHc-lvtAkKYf4EYZQ0ybUuEiX11lb3QVTSGY83b1ev2vxGADpytt7bzFs1xVb3O2ePAiO1V5CXaWkfEE6r3VT6gwTr419JmHI7l9RnHmVsMYmRTSK8v_h5-RjyyefY2CKDmk/s1600/sample.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg0u4JY0YLMHc-lvtAkKYf4EYZQ0ybUuEiX11lb3QVTSGY83b1ev2vxGADpytt7bzFs1xVb3O2ePAiO1V5CXaWkfEE6r3VT6gwTr419JmHI7l9RnHmVsMYmRTSK8v_h5-RjyyefY2CKDmk/s1600/sample.png" height="360" width="640" /></a></div>
<div>
<span style="font-size: x-small;"><br /></span></div>
</div>
Karthikhttp://www.blogger.com/profile/18220563988883365640noreply@blogger.com4tag:blogger.com,1999:blog-1743557738889130058.post-63536712647926760912014-09-17T22:58:00.001-07:002015-03-10T21:49:21.970-07:00Hadoop EcoSystems Table<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
<table border="0" cellspacing="0" cols="2">
<colgroup width="182"></colgroup>
<colgroup width="857"></colgroup>
<tbody>
<tr>
<td align="CENTER" bgcolor="#83CAFF" height="30" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">Distributed Filesystem</span></b></td>
<td align="CENTER" bgcolor="#83CAFF" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">Description</span></b></td>
</tr>
<tr>
<td align="CENTER" height="62" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache HDFS</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">The Hadoop Distributed File System (HDFS) offers a way to store large files across multiple machines. HDFS was derived from the Google File System (GFS) paper. Prior to Hadoop 2.0.0, the NameNode was a single point of failure (SPOF) in an HDFS cluster. The HDFS High Availability feature addresses this problem by running two redundant NameNodes in the same cluster in an Active/Passive configuration with a hot standby, with ZooKeeper used to coordinate automatic failover.</span></td>
</tr>
<tr>
<td align="CENTER" height="47" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Red Hat GlusterFS</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">GlusterFS is a scale-out network-attached storage file system. It was originally developed by Gluster, Inc., and then by Red Hat, Inc., after Red Hat purchased Gluster in 2011. In June 2012, Red Hat Storage Server was announced as a commercially supported integration of GlusterFS with Red Hat Enterprise Linux, and the Gluster File System is now known commercially as Red Hat Storage Server.</span></td>
</tr>
<tr>
<td align="CENTER" height="92" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Quantcast File System QFS</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">The Quantcast File System (QFS) is an open-source distributed file system software package for large-scale MapReduce or other batch-processing workloads. It was designed as an alternative to Apache Hadoop’s HDFS, intended to deliver better performance and cost-efficiency for large-scale processing clusters. It is written in C++ and has fixed-footprint memory management. QFS uses Reed-Solomon error correction as its method for ensuring reliable access to data. Reed–Solomon coding is very widely used in mass storage systems to correct the burst errors associated with media defects. Rather than storing three full replicas of each file as HDFS does, which requires three times the raw storage, QFS needs only 1.5x the raw capacity because it stripes data across nine different disk drives (six data stripes plus three parity stripes, so 9 / 6 = 1.5).</span></td>
</tr>
<tr>
<td align="CENTER" height="47" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Ceph Filesystem</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Ceph is a free software storage platform designed to present object, block, and file storage from a single distributed computer cluster. Ceph's main goals are to be completely distributed without a single point of failure, scalable to the exabyte level, and freely available. Data is replicated, making it fault tolerant. The current limitation is that Ceph's Hadoop integration requires the Hadoop 1.1.x stable series.</span></td>
</tr>
<tr>
<td align="CENTER" height="136" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Lustre file system</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">The Lustre filesystem is a high-performance distributed filesystem intended for large network and high-availability environments. Traditionally, Lustre is configured to manage remote data storage disk devices within a Storage Area Network (SAN), which consists of two or more remotely attached disk devices communicating via the Small Computer System Interface (SCSI) protocol. This includes Fibre Channel, Fibre Channel over Ethernet (FCoE), Serial Attached SCSI (SAS) and even iSCSI. Hadoop HDFS needs a dedicated cluster of computers on which to run, but people who run high-performance computing clusters for other purposes often don't run HDFS. That leaves them with plenty of computing power, tasks that could almost certainly benefit from a bit of MapReduce, and no way to put that power to work running Hadoop. Intel noticed this and added Lustre support in version 2.5 of its Hadoop distribution: the Intel® HPC Distribution for Apache Hadoop* Software, a new product that combines Intel Distribution for Apache Hadoop software with Intel® Enterprise Edition for Lustre software. It is the only distribution of Apache Hadoop that is integrated with Lustre, the parallel file system used by many of the world's fastest supercomputers.</span></td>
</tr>
<tr>
<td align="CENTER" height="32" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Tachyon</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Tachyon is a memory-centric distributed file system. By storing the file-system contents in the main memory of all cluster nodes, it achieves higher throughput than traditional disk-based storage systems such as HDFS.</span></td>
</tr>
<tr>
<td align="CENTER" height="136" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">GridGain</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">GridGain is an open source project licensed under Apache 2.0. One of the main pieces of this platform is the In-Memory Apache Hadoop Accelerator, which aims to accelerate HDFS and MapReduce by bringing both data and computations into memory. This is done with GGFS, a Hadoop-compliant in-memory file system. For I/O-intensive jobs, GridGain reports that GGFS performs close to 100x faster than standard HDFS. Paraphrasing Dmitriy Setrakyan of GridGain Systems on how GGFS compares with Tachyon:<br />- GGFS allows read-through and write-through to and from the underlying HDFS, or any other Hadoop-compliant file system, with zero code change. Essentially, GGFS removes the ETL step from integration entirely.<br />- GGFS can pick and choose which folders stay in memory, which stay on disk, and which get synchronized with the underlying (HD)FS, either synchronously or asynchronously.<br />- GridGain is working on a native MapReduce component that will provide complete native Hadoop integration without API changes, unlike Spark, which currently forces you to adopt its own API. Essentially, GridGain MR+GGFS will allow Hadoop to run completely or partially in memory in a plug-and-play fashion without any API changes.</span></td>
</tr>
<tr>
<td align="CENTER" bgcolor="#83CAFF" height="30" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">Distributed Programming</span></b></td>
<td align="CENTER" bgcolor="#83CAFF" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">Description</span></b></td>
</tr>
<tr>
<td align="CENTER" height="92" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache MapReduce</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">MapReduce is a programming model for processing large data sets with a parallel, distributed algorithm on a cluster. Apache MapReduce was derived from the Google paper “MapReduce: Simplified Data Processing on Large Clusters”. The current Apache MapReduce version is built on the Apache YARN framework. YARN stands for “Yet Another Resource Negotiator”. It is a new framework that facilitates writing arbitrary distributed processing frameworks and applications. YARN’s execution model is more generic than the earlier MapReduce implementation: YARN can run applications that do not follow the MapReduce model, unlike the original Apache Hadoop MapReduce (also called MR1). Hadoop YARN is an attempt to take Apache Hadoop beyond MapReduce for data processing.</span></td>
</tr>
<tr>
<td align="CENTER" height="106" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Pig</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Pig provides an engine for executing data flows in parallel on Hadoop. It includes a language, Pig Latin, for expressing these data flows. Pig Latin includes operators for many of the traditional data operations (join, sort, filter, etc.), as well as the ability for users to develop their own functions for reading, processing, and writing data. Pig runs on Hadoop. It makes use of both the Hadoop Distributed File System, HDFS, and Hadoop’s processing system, MapReduce. Pig uses MapReduce to execute all of its data processing. It compiles the Pig Latin scripts that users write into a series of one or more MapReduce jobs that it then executes. Pig Latin looks different from many of the programming languages you have seen. There are no if statements or for loops in Pig Latin. This is because traditional procedural and object-oriented programming languages describe control flow, and data flow is a side effect of the program. Pig Latin instead focuses on data flow.</span></td>
</tr>
<tr>
<td align="CENTER" height="151" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">JAQL</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">JAQL is a functional, declarative programming language designed especially for working with large volumes of structured, semi-structured and unstructured data. As its name implies, a primary use of JAQL is to handle data stored as JSON documents, but JAQL can work on various types of data. For example, it can support XML, comma-separated values (CSV) data and flat files. A "SQL within JAQL" capability lets programmers work with structured SQL data while employing a JSON data model that's less restrictive than its Structured Query Language counterparts. Specifically, JAQL allows you to select, join, group, and filter data that is stored in HDFS, much like a blend of Pig and Hive. JAQL’s query language was inspired by many programming and query languages, including Lisp, SQL, XQuery, and Pig. JAQL was created by researchers at IBM Research Labs in 2008 and released to open source. While it continues to be hosted as a project on Google Code, where a downloadable version is available under an Apache 2.0 license, the major development activity around JAQL has remained centered at IBM. The company offers the query language as part of the tools suite associated with InfoSphere BigInsights, its Hadoop platform. Working together with a workflow orchestrator, JAQL is used in BigInsights to exchange data between storage, processing and analytics jobs. It also provides links to external data and services, including relational databases and machine learning data.</span></td>
</tr>
<tr>
<td align="CENTER" height="121" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Spark</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Data analytics cluster computing framework originally developed in the AMPLab at UC Berkeley. Spark fits into the Hadoop open-source community, building on top of the Hadoop Distributed File System (HDFS). However, Spark provides an easier-to-use alternative to Hadoop MapReduce and offers performance up to 10 times faster than previous-generation systems like Hadoop MapReduce for certain applications. Spark is a framework for writing fast, distributed programs. It solves problems similar to those Hadoop MapReduce solves, but with a fast in-memory approach and a clean functional-style API. With its ability to integrate with Hadoop and its built-in tools for interactive query analysis (Shark), large-scale graph processing and analysis (Bagel), and real-time analysis (Spark Streaming), it can quickly process and query big data sets. To make programming faster, Spark provides clean, concise APIs in Scala, Java and Python, and you can use Spark interactively from the Scala and Python shells to rapidly query big datasets. Spark is also the engine behind Shark, a fully Apache Hive-compatible data warehousing system that can run up to 100x faster than Hive.</span></td>
</tr>
<tr>
<td align="CENTER" height="136" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Stratosphere</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Stratosphere is a general-purpose cluster computing framework. It is compatible with the Hadoop ecosystem: Stratosphere can access data stored in HDFS and runs with Hadoop's new cluster manager, YARN. The common input formats of Hadoop are supported as well. Stratosphere does not use Hadoop's MapReduce implementation; it is a completely new system that brings its own runtime. The new runtime lets users define more advanced operations that include more transformations than just map and reduce. Additionally, Stratosphere lets users express analysis jobs as advanced data flow graphs, which model common data analysis tasks more naturally. Users can write their analysis programs using Scala and Java programming interfaces. Graph processing can be done using its Giraph-inspired graph processing abstraction. Stratosphere is available under the Apache 2.0 License and is being actively developed by an open-source community. Among its more advanced features is an optimizer that chooses the optimal execution strategy behind the scenes. Tasks that need to go over the data multiple times can use a feature called "Iterations", which allows machine learning or graph processing algorithms to be expressed more naturally within the system and to achieve higher performance.</span></td>
</tr>
<tr>
<td align="CENTER" height="62" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Netflix PigPen</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">PigPen is MapReduce for Clojure; it compiles to Apache Pig. Clojure is a dialect of the Lisp programming language created by Rich Hickey; it is a functional, general-purpose language that runs on the Java Virtual Machine, the Common Language Runtime, and JavaScript engines. In PigPen there are no special user defined functions (UDFs): you define Clojure functions, anonymously or named, and use them as you would in any Clojure program. This tool was open sourced by Netflix, Inc., the American provider of on-demand Internet streaming media.</span></td>
</tr>
<tr>
<td align="CENTER" height="77" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">AMPLab SIMR</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Spark was developed with Apache YARN in mind. However, up to now, it has been relatively hard to run Apache Spark on Hadoop MapReduce v1 clusters, i.e. clusters that do not have YARN installed. Typically, users would have to get permission to install Spark/Scala on some subset of the machines, a process that could be time-consuming. SIMR (Spark In MapReduce) allows anyone with access to a Hadoop MapReduce v1 cluster to run Spark out of the box. A user can run Spark directly on top of Hadoop MapReduce v1 without any administrative rights, and without having Spark or Scala installed on any of the nodes.</span></td>
</tr>
<tr>
<td align="CENTER" height="62" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Facebook Corona</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">“The next version of MapReduce” from Facebook, based on its own fork of Hadoop. The current Hadoop implementation of the MapReduce technique uses a single job tracker, which causes scaling issues for very large data sets. The Apache Hadoop developers have been creating their own next-generation MapReduce, called YARN, which Facebook engineers looked at but discounted because of the highly customised nature of the company's deployment of Hadoop and HDFS. Corona, like YARN, spawns multiple job trackers (one for each job, in Corona's case).</span></td>
</tr>
<tr>
<td align="CENTER" height="121" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Twill</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Twill is an abstraction over Apache Hadoop® YARN that reduces the complexity of developing distributed applications, allowing developers to focus more on their business logic. Twill uses a simple thread-based model that Java programmers will find familiar. YARN can be viewed as the compute fabric of a cluster, which means YARN applications like Twill will run on any Hadoop 2 cluster; YARN is open source software that lets a Hadoop cluster act as a collection of virtual machines. Weave, developed by Continuuity and initially housed on GitHub, is a complementary open source application that uses a programming model similar to Java threads, making it easy to write distributed applications. To avoid a conflict with a similarly named Apache project called "Weaver", Weave was renamed Twill when it moved to Apache incubation. Twill functions as a scaled-out proxy: it is a middleware layer between YARN and any application running on YARN. When you develop a Twill app, Twill exposes the YARN APIs through an interface that resembles a familiar multi-threaded Java application, which makes it easy to build multi-process distributed applications.</span></td>
</tr>
<tr>
<td align="CENTER" height="47" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Damballa Parkour</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Library for developing MapReduce programs in Clojure, a Lisp-like language. Parkour aims to provide deep Clojure integration for Hadoop. Programs using Parkour are normal Clojure programs, using standard Clojure functions instead of new framework abstractions. Programs using Parkour are also full Hadoop programs, with complete access to everything possible in raw Java Hadoop MapReduce.</span></td>
</tr>
<tr>
<td align="CENTER" height="47" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Hama</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache top-level open source project that allows you to do advanced analytics beyond MapReduce. Many data analysis techniques, such as machine learning and graph algorithms, require iterative computations; this is where the Bulk Synchronous Parallel (BSP) model can be more effective than "plain" MapReduce.</span></td>
</tr>
<tr>
<td align="CENTER" height="17" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Datasalt Pangool</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">A new MapReduce paradigm: an API for MapReduce jobs at a higher level than Hadoop's standard Java API.</span></td>
</tr>
<tr>
<td align="CENTER" height="92" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Tez</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Tez is a proposal to develop a generic application that can process complex DAGs of data-processing tasks and runs natively on Apache Hadoop YARN. Tez generalizes the MapReduce paradigm into a more powerful framework based on expressing computations as a dataflow graph. Tez is not meant directly for end users; instead, it enables developers to build end-user applications with much better performance and flexibility. Hadoop has traditionally been a batch-processing platform for large amounts of data. However, there are many use cases that need near-real-time query processing, and several workloads, such as machine learning, do not fit well into the MapReduce paradigm. Tez helps Hadoop address these use cases. The Tez framework is part of the Stinger initiative (a low-latency, SQL-style query interface for Hadoop based on Hive).</span></td>
</tr>
<tr>
<td align="CENTER" height="62" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache DataFu</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">DataFu provides a collection of Hadoop MapReduce jobs, plus functions for higher-level languages built on MapReduce, for performing data analysis. It provides functions for common statistics tasks (e.g. quantiles, sampling), PageRank, stream sessionization, and set and bag operations. DataFu also provides Hadoop jobs for incremental data processing in MapReduce. Its functions are a collection of Pig UDFs (including PageRank, sessionization, set operations, sampling, and much more) that were originally developed at LinkedIn.</span></td>
</tr>
<tr>
<td align="CENTER" height="62" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Pydoop</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Pydoop is a Python MapReduce and HDFS API for Hadoop, built upon the C++ Pipes and the C libhdfs APIs, that lets you write full-fledged MapReduce applications with HDFS access. Pydoop has several advantages over Hadoop’s built-in solutions for Python programming, i.e., Hadoop Streaming and Jython: being a CPython package, it gives you access to all standard library and third-party modules, some of which may not be available in Jython.</span></td>
</tr>
<tr>
<td align="CENTER" colspan="2" height="17" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">NoSQL Databases</span></b></td>
</tr>
<tr>
<td align="CENTER" height="17" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">Column Data Model</span></b></td>
<td align="CENTER" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">Description</span></b></td>
</tr>
<tr>
<td align="CENTER" height="32" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache HBase</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Inspired by Google BigTable. A non-relational distributed database offering random, real-time read/write operations on very large column-oriented tables (BDDB: Big Data Data Base). Often called the Hadoop database, it can back the input and output of Hadoop MapReduce jobs with Apache HBase tables.</span></td>
</tr>
<tr>
<td align="CENTER" height="92" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Cassandra</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Distributed NoSQL DBMS; it is a BDDB. MapReduce jobs can retrieve data from Cassandra. This BDDB can run without HDFS, or on top of HDFS (DataStax fork of Cassandra). HBase and its required supporting systems are derived from what is known of the original Google BigTable and Google File System designs (as known from the Google File System paper Google published in 2003, and the BigTable paper published in 2006). Cassandra, on the other hand, is a recent open source fork of a standalone database system initially coded by Facebook, which, while implementing the BigTable data model, uses a system inspired by Amazon’s Dynamo for storing data (in fact, much of the initial development work on Cassandra was performed by two Dynamo engineers recruited to Facebook from Amazon).</span></td>
</tr>
<tr>
<td align="CENTER" height="47" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Hypertable</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Database system inspired by publications on the design of Google's BigTable. The project draws on the experience of engineers who spent many years solving large-scale, data-intensive problems. Hypertable runs on top of a distributed file system such as the Apache Hadoop DFS, GlusterFS, or the Kosmos File System (KFS). It is written almost entirely in C++. Sponsored by Baidu, the Chinese search engine.</span></td>
</tr>
<tr>
<td align="CENTER" height="32" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Accumulo</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Distributed key/value store: a robust, scalable, high-performance data storage and retrieval system. Apache Accumulo is based on Google's BigTable design and is built on top of Apache Hadoop, ZooKeeper, and Thrift. Accumulo was originally created by the NSA and adds security features such as cell-level access control.</span></td>
</tr>
<tr>
<td align="CENTER" bgcolor="#83CAFF" height="30" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">Document Data Model</span></b></td>
<td align="CENTER" bgcolor="#83CAFF" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">Description</span></b></td>
</tr>
<tr>
<td align="CENTER" height="32" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">MongoDB</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Document-oriented database system and part of the NoSQL family of database systems. Instead of storing data in tables as a "classical" relational database does, MongoDB stores structured data as JSON-like documents.</span></td>
</tr>
<tr>
<td align="CENTER" height="32" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">RethinkDB</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">RethinkDB is built to store JSON documents and to scale to multiple machines with very little effort. Its query language supports operations such as table joins and group by, and it is easy to set up and learn.</span></td>
</tr>
<tr>
<td align="CENTER" height="32" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">ArangoDB</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">An open-source database with a flexible data model for documents, graphs, and key-values. High-performance applications can be built using a convenient SQL-like query language or JavaScript extensions.</span></td>
</tr>
<tr>
<td align="CENTER" bgcolor="#83CAFF" height="30" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">Stream Data Model</span></b></td>
<td align="CENTER" bgcolor="#83CAFF" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">Description</span></b></td>
</tr>
<tr>
<td align="CENTER" height="62" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">EventStore</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">An open-source, functional database with support for Complex Event Processing. It provides a persistence engine for applications that use event sourcing, and can also store time-series data. Event Store is written in C# and C++; the server runs on Mono or the .NET CLR, on Linux or Windows. Applications using Event Store can be written in JavaScript. Event sourcing (ES) is a way of persisting an application's state by storing the history of events that determines its current state.</span></td>
</tr>
<tr>
<td align="CENTER" bgcolor="#83CAFF" height="30" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">Key-Value Data Model</span></b></td>
<td align="CENTER" bgcolor="#83CAFF" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">Description</span></b></td>
</tr>
<tr>
<td align="CENTER" height="47" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Redis DataBase</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Redis is an open-source, networked, in-memory, key-value data store with optional durability. It is written in ANSI C. In its outer layer, the Redis data model is a dictionary that maps keys to values. One of the main differences between Redis and other structured storage systems is that Redis supports not only strings but also abstract data types. Sponsored by Pivotal and VMware. It is BSD licensed.</span></td>
</tr>
<tr>
<td align="CENTER" height="17" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">LinkedIn Voldemort</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Distributed data store that is designed as a key-value store used by LinkedIn for high-scalability storage.</span></td>
</tr>
<tr>
<td align="CENTER" height="32" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">RocksDB</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">RocksDB is an embeddable persistent key-value store for fast storage. RocksDB can also be the foundation for a client-server database, but its developers currently focus on embedded workloads.</span></td>
</tr>
<tr>
<td align="CENTER" height="47" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">OpenTSDB</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">OpenTSDB is a distributed, scalable Time Series Database (TSDB) written on top of HBase. OpenTSDB was written to address a common need: store, index and serve metrics collected from computer systems (network gear, operating systems, applications) at a large scale, and make this data easily accessible and graphable.</span></td>
</tr>
<tr>
<td align="CENTER" bgcolor="#83CAFF" height="30" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">Graph Data Model</span></b></td>
<td align="CENTER" bgcolor="#83CAFF" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">Description</span></b></td>
</tr>
<tr>
<td align="CENTER" height="32" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">ArangoDB</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">An open-source database with a flexible data model for documents, graphs, and key-values. High-performance applications can be built using a convenient SQL-like query language or JavaScript extensions.</span></td>
</tr>
<tr>
<td align="CENTER" height="32" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Neo4j</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">An open-source graph database written entirely in Java. It is an embedded, disk-based, fully transactional Java persistence engine that stores data structured in graphs rather than in tables.</span></td>
</tr>
<tr>
<td align="CENTER" bgcolor="#83CAFF" height="30" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">NewSQL Databases</span></b></td>
<td align="CENTER" bgcolor="#83CAFF" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">Description</span></b></td>
</tr>
<tr>
<td align="CENTER" height="47" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">TokuDB</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">TokuDB is a storage engine for MySQL and MariaDB that is specifically designed for high performance on write-intensive workloads. It achieves this via Fractal Tree indexing. TokuDB is a scalable, ACID and MVCC compliant storage engine. TokuDB is one of the technologies that enable Big Data in MySQL.</span></td>
</tr>
<tr>
<td align="CENTER" height="47" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">HandlerSocket</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">HandlerSocket is a NoSQL plugin for MySQL/MariaDB that accesses the MySQL storage engine layer directly. It works as a daemon inside the mysqld process, accepting TCP connections and executing requests from clients. HandlerSocket does not support SQL queries; instead, it supports simple CRUD operations on tables. HandlerSocket can be much faster than mysqld/libmysql in some cases because it has lower CPU, disk, and network overhead.</span></td>
</tr>
<tr>
<td align="CENTER" height="32" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Akiban Server</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Akiban Server is an open source database that brings document stores and relational databases together. Developers get powerful document access alongside surprisingly powerful SQL.</span></td>
</tr>
<tr>
<td align="CENTER" height="77" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Drizzle</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Drizzle is a re-designed version of the MySQL v6.0 codebase, built around the central concept of a microkernel architecture. Features such as the query cache and the authentication system are plugins to the database, following the general theme of "pluggable storage engines" introduced in MySQL 5.1. Through the plugins it ships, it supports PAM, LDAP, and HTTP AUTH for authentication, and logging to files, syslog, and remote services such as RabbitMQ and Gearman. Drizzle is an ACID-compliant relational database that supports transactions via an MVCC design.</span></td>
</tr>
<tr>
<td align="CENTER" height="32" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Haeinsa</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Haeinsa is a linearly scalable, multi-row, multi-table transaction library for HBase. Use Haeinsa if you need strong ACID semantics on your HBase cluster. It is based on the concept of Google's Percolator.</span></td>
</tr>
<tr>
<td align="CENTER" height="32" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">SenseiDB</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Open-source, distributed, real-time, semi-structured database. Features include full-text search, fast real-time updates, structured and faceted search, BQL (an SQL-like query language), fast key-value lookup, high performance under heavy concurrent update and query volumes, and Hadoop integration.</span></td>
</tr>
<tr>
<td align="CENTER" height="32" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Sky</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Sky is an open source database used for flexible, high performance analysis of behavioral data. For certain kinds of data such as clickstream data and log data, it can be several orders of magnitude faster than traditional approaches such as SQL databases or Hadoop.</span></td>
</tr>
<tr>
<td align="CENTER" height="62" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">BayesDB</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">BayesDB, a Bayesian database table, lets users query the probable implications of their tabular data as easily as an SQL database lets them query the data itself. Using the built-in Bayesian Query Language (BQL), users with no statistics training can solve basic data science problems, such as detecting predictive relationships between variables, inferring missing values, simulating probable observations, and identifying statistically similar database entries.</span></td>
</tr>
<tr>
<td align="CENTER" height="62" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">InfluxDB</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">InfluxDB is an open source distributed time series database with no external dependencies. It is useful for recording metrics and events, and for performing analytics. It has a built-in HTTP API, so you don't have to write any server-side code to get up and running. InfluxDB is designed to be scalable, simple to install and manage, and fast to get data in and out. It aims to answer queries in real time: every data point is indexed as it comes in and is immediately available in queries that should return in &lt; 100 ms.</span></td>
</tr>
<tr>
<td align="CENTER" bgcolor="#83CAFF" height="30" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">SQL-On-Hadoop</span></b></td>
<td align="CENTER" bgcolor="#83CAFF" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">Description</span></b></td>
</tr>
<tr>
<td align="CENTER" height="32" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Hive</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Data warehouse infrastructure developed by Facebook for data summarization, query, and analysis. It provides an SQL-like language, HiveQL, which is not SQL-92 compliant.</span></td>
</tr>
<tr>
<td align="CENTER" height="32" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache HCatalog</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">HCatalog’s table abstraction presents users with a relational view of data in the Hadoop Distributed File System (HDFS) and ensures that users need not worry about where or in what format their data is stored. HCatalog is now part of Hive; only older versions are available as a separate download.</span></td>
</tr>
<tr>
<td align="CENTER" height="62" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">AMPLab Shark</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Shark is a large-scale data warehouse system built on top of Spark and designed to be compatible with Apache Hive. It can execute HiveQL queries up to 100 times faster than Hive without any modification to the existing data or queries. Shark supports Hive's query language, metastore, serialization formats, and user-defined functions, providing seamless integration with existing Hive deployments and a familiar, more powerful option for new ones.</span></td>
</tr>
<tr>
<td align="CENTER" height="77" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Drill</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Drill is an open source system modeled on Google's Dremel, which is available as an infrastructure service called Google BigQuery. In recent years open source systems have emerged to address the need for scalable batch processing (Apache Hadoop) and stream processing (Storm, Apache S4). Apache Hadoop, originally inspired by Google's internal MapReduce system, is used by thousands of organizations processing large-scale datasets. Apache Hadoop is designed to achieve very high throughput, but not the sub-second latency needed for interactive data analysis and exploration. Drill, inspired by Google's internal Dremel system, is intended to address this need.</span></td>
</tr>
<tr>
<td align="CENTER" height="32" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Cloudera Impala</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">The Apache-licensed Impala project brings scalable parallel database technology to Hadoop, enabling users to issue low-latency SQL queries to data stored in HDFS and Apache HBase without requiring data movement or transformation. It is inspired by Google Dremel, the system behind Google BigQuery.</span></td>
</tr>
<tr>
<td align="CENTER" height="32" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Facebook Presto</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Facebook has open sourced Presto, a SQL engine it says is on average 10 times faster than Hive for running queries across large data sets stored in Hadoop and elsewhere.</span></td>
</tr>
<tr>
<td align="CENTER" height="17" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Datasalt Splout SQL</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Splout allows serving an arbitrarily big dataset with high QPS rates and at the same time provides full SQL query syntax.</span></td>
</tr>
<tr>
<td align="CENTER" height="77" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Tajo</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Tajo is a robust big data relational and distributed data warehouse system for Apache Hadoop. Tajo is designed for low-latency and scalable ad-hoc queries, online aggregation, and ETL (extract-transform-load process) on large-data sets stored on HDFS (Hadoop Distributed File System) and other data sources. By supporting SQL standards and leveraging advanced database techniques, Tajo allows direct control of distributed execution and data flow across a variety of query evaluation strategies and optimization opportunities. For reference, the Apache Software Foundation announced Tajo as a Top-Level Project in April 2014.</span></td>
</tr>
<tr>
<td align="CENTER" height="77" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Phoenix</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Phoenix is a SQL skin over HBase delivered as a client-embedded JDBC driver targeting low latency queries over HBase data. Apache Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets. The table metadata is stored in an HBase table and versioned, such that snapshot queries over prior versions will automatically use the correct schema. Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows.</span></td>
</tr>
<tr>
<td align="CENTER" bgcolor="#83CAFF" height="30" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">Data Ingestion</span></b></td>
<td align="CENTER" bgcolor="#83CAFF" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">Description</span></b></td>
</tr>
<tr>
<td align="CENTER" height="17" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Flume</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Aggregator that collects unstructured data (such as logs) and moves it into HDFS.</span></td>
</tr>
<tr>
<td align="CENTER" height="17" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Sqoop</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">System for bulk data transfer between HDFS and structured datastores such as RDBMSs. Where Flume ingests unstructured data into HDFS, Sqoop moves structured data between relational databases and HDFS in either direction.</span></td>
</tr>
<tr>
<td align="CENTER" height="17" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Facebook Scribe</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Real-time log aggregator, implemented as an Apache Thrift service.</span></td>
</tr>
<tr>
<td align="CENTER" height="17" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Chukwa</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Large-scale log aggregation and analytics system.</span></td>
</tr>
<tr>
<td align="CENTER" height="181" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Storm</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Storm is a complex event processor and distributed computation framework written predominantly in the Clojure programming language. It is a distributed real-time computation system for processing fast, large streams of data. Storm is based on a master-worker paradigm: a Storm cluster mainly consists of a master node and worker nodes, with coordination done by ZooKeeper. Storm makes use of ZeroMQ (0MQ), an advanced, embeddable networking library. ZeroMQ provides a message queue but, unlike message-oriented middleware (MOM), a 0MQ system can run without a dedicated message broker; the library is designed to have a familiar socket-style API. Storm was originally created by Nathan Marz and his team at BackType, where it was initially developed and deployed in 2011. After 7 months of development, BackType was acquired by Twitter in July 2011, and Storm was open sourced in September 2011. Hortonworks is developing a Storm-on-YARN version and planned to finish the base-level integration in Q4 2013. Yahoo and Hortonworks also plan to move the Storm-on-YARN code from github.com/yahoo/storm-yarn into a subproject of the Apache Storm project. Twitter has also released a Hadoop-Storm hybrid called “Summingbird”, which fuses the two frameworks into one, allowing developers to use Storm for short-term processing and Hadoop for deep data dives. It aims to mitigate the trade-offs between batch processing and stream processing by combining them into a hybrid system.</span></td>
</tr>
<tr>
<td align="CENTER" height="62" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Kafka</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Distributed publish-subscribe system for processing large amounts of streaming data. Kafka is a message queue developed by LinkedIn that persists messages to disk in a highly efficient way. Because messages are persisted, clients have the interesting ability to rewind a stream and consume the messages again. Another upside of disk persistence is that bulk importing the data into HDFS for offline analysis can be done very quickly and efficiently. By contrast, Storm, developed by BackType (which was acquired by Twitter in 2011), is more about transforming a stream of messages into new streams.</span></td>
</tr>
<tr>
<td align="CENTER" height="17" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Netflix Suro</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Suro has its roots in Apache Chukwa, which was initially adopted by Netflix. It is a log aggregator, comparable to Storm and Samza.</span></td>
</tr>
<tr>
<td align="CENTER" height="32" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Samza</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Samza is a distributed stream processing framework. It uses Apache Kafka for messaging, and Apache Hadoop YARN to provide fault tolerance, processor isolation, security, and resource management. Developed at LinkedIn (http://www.linkedin.com/in/jaykreps).</span></td>
</tr>
<tr>
<td align="CENTER" height="32" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Cloudera Morphlines</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Cloudera Morphlines is a new open source framework that reduces the time and skills necessary to integrate, build, and change Hadoop processing applications that extract, transform, and load data into Apache Solr, Apache HBase, HDFS, enterprise data warehouses, or analytic online dashboards.</span></td>
</tr>
<tr>
<td align="CENTER" height="32" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">HIHO</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">This project is a framework for connecting disparate data sources with the Apache Hadoop system, making them interoperable. HIHO connects Hadoop with multiple RDBMS and file systems, so that data can be loaded into Hadoop and unloaded from Hadoop.</span></td>
</tr>
<tr>
<td align="CENTER" bgcolor="#83CAFF" height="30" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">Service Programming</span></b></td>
<td align="CENTER" bgcolor="#83CAFF" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">Description</span></b></td>
</tr>
<tr>
<td align="CENTER" height="77" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Thrift</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">A cross-language RPC framework for building services. It is the service foundation for Facebook technologies (Facebook was the original Thrift contributor). Thrift provides a framework for developing and accessing remote services. It allows developers to create services that can be consumed by any application written in a language for which Thrift bindings exist. Thrift manages serialization of data to and from a service, as well as the protocol that describes a method invocation, response, etc. Instead of writing all the RPC code yourself, you can get straight to your service logic. Thrift uses TCP, so a given service is bound to a particular port.</span></td>
</tr>
<tr>
<td align="CENTER" height="106" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Zookeeper</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">It’s a coordination service that gives you the tools you need to write correct distributed applications. ZooKeeper was developed at Yahoo! Research. Several Hadoop projects already use ZooKeeper to coordinate the cluster and provide highly available distributed services; perhaps the best known are Apache HBase, Storm and Kafka. ZooKeeper consists of an application library with two principal implementations of the APIs (Java and C) and a service component implemented in Java that runs on an ensemble of dedicated servers. ZooKeeper simplifies the process of building distributed systems, making development more agile and enabling more robust implementations. Back in 2006, Google published a paper on "Chubby", a distributed lock service which gained wide adoption within their data centers. ZooKeeper is, not surprisingly, closely modeled on Chubby and designed to fulfill many of the same roles for HDFS and other Hadoop infrastructure.</span></td>
</tr>
<tr>
<td align="CENTER" height="47" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Avro</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Avro is a framework for modeling, serializing and making Remote Procedure Calls (RPC). Avro data is described by a schema, and one interesting feature is that the schema is stored in the same file as the data it describes, so files are self-describing. Avro does not require code generation. This framework competes with similar tools such as Apache Thrift, Google Protocol Buffers and ZeroC ICE.</span></td>
</tr>
<tr>
<td align="CENTER" height="17" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Curator</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Curator is a set of Java libraries that make using Apache ZooKeeper much easier.</span></td>
</tr>
<tr>
<td align="CENTER" height="32" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Karaf</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Karaf is an OSGi runtime that runs on top of any OSGi framework and provides a set of services, a powerful provisioning concept, an extensible shell and more.</span></td>
</tr>
<tr>
<td align="CENTER" height="47" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Twitter Elephant Bird</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Elephant Bird is a project that provides utilities (libraries) for working with LZOP-compressed data. It also provides a container format for working with Protocol Buffers and Thrift data in MapReduce, along with Writables, Pig LoadFuncs, Hive SerDes and HBase miscellanea. This open source library is used extensively at Twitter.</span></td>
</tr>
<tr>
<td align="CENTER" height="62" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">LinkedIn Norbert</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Norbert is a library that provides easy cluster management and workload distribution. With Norbert, you can quickly distribute a simple client/server architecture to create a highly scalable architecture capable of handling heavy traffic. Implemented in Scala, Norbert wraps ZooKeeper and Netty and uses Protocol Buffers for transport, making it easy to build a cluster-aware application. A Java API is provided, and pluggable load-balancing strategies are supported, with round-robin and consistent-hash strategies provided out of the box.</span></td>
</tr>
<tr>
<td align="CENTER" bgcolor="#83CAFF" height="30" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">Scheduling</span></b></td>
<td align="CENTER" bgcolor="#83CAFF" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">Description</span></b></td>
</tr>
<tr>
<td align="CENTER" height="32" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Oozie</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Workflow scheduler system for MapReduce jobs using DAGs (Directed Acyclic Graphs). The Oozie Coordinator can trigger jobs by time (frequency) and by data availability.</span></td>
</tr>
<tr>
<td align="CENTER" height="17" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">LinkedIn Azkaban</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Hadoop workflow management. Azkaban is a batch job scheduler that can be seen as a combination of the Unix cron and make utilities with a friendly UI.</span></td>
</tr>
<tr>
<td align="CENTER" height="106" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Falcon</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache™ Falcon is a data management framework for simplifying data lifecycle management and processing pipelines on Apache Hadoop®. It enables users to configure, manage and orchestrate data motion, pipeline processing, disaster recovery, and data retention workflows. Instead of hard-coding complex data lifecycle capabilities, Hadoop applications can now rely on the well-tested Apache Falcon framework for these functions. Falcon’s simplification of data management is useful to anyone building apps on Hadoop. Data management on Hadoop encompasses data motion, process orchestration, lifecycle management, data discovery and other concerns that go beyond ETL. Falcon is a new data processing and management platform for Hadoop that addresses these concerns and creates additional opportunities by building on existing components within the Hadoop ecosystem (e.g. Apache Oozie, Apache Hadoop DistCp) without reinventing the wheel.</span></td>
</tr>
<tr>
<td align="CENTER" bgcolor="#83CAFF" height="30" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">Machine Learning</span></b></td>
<td align="CENTER" bgcolor="#83CAFF" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">Description</span></b></td>
</tr>
<tr>
<td align="CENTER" height="17" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Mahout</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Machine learning and math library built on top of MapReduce.</span></td>
</tr>
<tr>
<td align="CENTER" height="32" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">WEKA</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Weka (Waikato Environment for Knowledge Analysis) is a popular suite of machine learning software written in Java, developed at the University of Waikato, New Zealand. Weka is free software available under the GNU General Public License.</span></td>
</tr>
<tr>
<td align="CENTER" height="32" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Cloudera Oryx</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">The Oryx open source project provides simple, real-time, large-scale machine learning / predictive analytics infrastructure. It implements a few classes of algorithms commonly used in business applications: collaborative filtering / recommendation, classification / regression, and clustering.</span></td>
</tr>
<tr>
<td align="CENTER" height="47" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">MADlib</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">The MADlib project leverages the data-processing capabilities of an RDBMS to analyze data. The aim of this project is the integration of statistical data analysis into databases. The MADlib project describes itself as "Big Data Machine Learning in SQL for Data Scientists". The MADlib software project began as a collaboration between researchers at UC Berkeley and engineers and data scientists at EMC/Greenplum (now Pivotal).</span></td>
</tr>
<tr>
<td align="CENTER" bgcolor="#83CAFF" height="30" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">Benchmarking</span></b></td>
<td align="CENTER" bgcolor="#83CAFF" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">Description</span></b></td>
</tr>
<tr>
<td align="CENTER" height="77" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Hadoop Benchmarking</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">There are two main JAR files in Apache Hadoop for benchmarking. These JARs contain micro-benchmarks for testing particular parts of the infrastructure. For instance, TestDFSIO measures HDFS I/O performance, TeraSort evaluates MapReduce tasks, and WordCount measures cluster performance. The micro-benchmarks are packaged in the tests and examples JAR files, and you can get a list of them, with descriptions, by invoking the JAR file with no arguments. In the Apache Hadoop 2.2.0 stable release, the micro-benchmarks are bundled in these JAR files: hadoop-mapreduce-examples-2.2.0.jar and hadoop-mapreduce-client-jobclient-2.2.0-tests.jar.</span></td>
</tr>
<tr>
<td align="CENTER" height="17" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Yahoo Gridmix3</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Hadoop cluster benchmarking tool from the Yahoo engineering team.</span></td>
</tr>
<tr>
<td align="CENTER" height="62" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">PUMA Benchmarking</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Benchmark suite representing a broad range of MapReduce applications, with high/low computation and high/low shuffle volumes. There are a total of 13 benchmarks, of which TeraSort, WordCount and Grep come from the Hadoop distribution. The rest of the benchmarks were developed in-house and are currently not part of the Hadoop distribution. The three benchmarks taken from the Hadoop distribution are also slightly modified to take the number of reduce tasks as input from the user and to generate final job completion time statistics.</span></td>
</tr>
<tr>
<td align="CENTER" height="47" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Berkeley SWIM Benchmark</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">The SWIM benchmark (Statistical Workload Injector for MapReduce) is a benchmark representing a real-world big data workload, developed by the University of California at Berkeley in close cooperation with Facebook. This test provides rigorous measurements of the performance of MapReduce systems using real industry workloads.</span></td>
</tr>
<tr>
<td align="CENTER" height="17" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Intel HiBench</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">HiBench is a Hadoop benchmark suite.</span></td>
</tr>
<tr>
<td align="CENTER" bgcolor="#83CAFF" height="30" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">Security</span></b></td>
<td align="CENTER" bgcolor="#83CAFF" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">Description</span></b></td>
</tr>
<tr>
<td align="CENTER" height="47" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Sentry</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Sentry is the next step in enterprise-grade big data security and delivers fine-grained authorization to data stored in Apache Hadoop™. An independent security module that integrates with the open source SQL query engines Apache Hive and Cloudera Impala, Sentry delivers advanced authorization controls to enable multi-user applications and cross-functional processes for enterprise data sets. Sentry was originally developed by Cloudera.</span></td>
</tr>
<tr>
<td align="CENTER" height="47" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Knox Gateway</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">System that provides a single point of secure access for Apache Hadoop clusters. The goal is to simplify Hadoop security for both users (i.e. those who access the cluster data and execute jobs) and operators (i.e. those who control access and manage the cluster). The Gateway runs as a server (or cluster of servers) that serves one or more Hadoop clusters.</span></td>
</tr>
<tr>
<td align="CENTER" bgcolor="#83CAFF" height="30" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">System Deployment</span></b></td>
<td align="CENTER" bgcolor="#83CAFF" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">Description</span></b></td>
</tr>
<tr>
<td align="CENTER" height="77" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Ambari</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Intuitive, easy-to-use Hadoop management web UI backed by its RESTful APIs. Apache Ambari was donated to the ASF by the Hortonworks team. It provides a powerful, pleasant interface for Hadoop and other typical applications from the Hadoop ecosystem. Apache Ambari is under heavy development and is expected to gain new features in the near future. For example, Ambari is able to deploy a complete Hadoop system from scratch, but it is not possible to use this GUI with a Hadoop system that is already running. The ability to provision the operating system would be a good addition, but it is probably not on the roadmap.</span></td>
</tr>
<tr>
<td align="CENTER" height="32" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Cloudera HUE</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Web application for interacting with Apache Hadoop. It is not a deployment tool; it is an open-source web interface that supports Apache Hadoop and its ecosystem, licensed under the Apache v2 license. HUE is used for user operations on Hadoop and its ecosystem.</span></td>
</tr>
<tr>
<td align="CENTER" height="32" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Whirr</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Whirr is a set of libraries for running cloud services. It allows you to use simple commands to boot clusters of distributed systems for testing and experimentation. Apache Whirr makes booting clusters easy.</span></td>
</tr>
<tr>
<td align="CENTER" height="32" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Mesos</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Mesos is a cluster manager that provides resource sharing and isolation across cluster applications, much as HTCondor, SGE or Torque do. However, Mesos has a Hadoop-centred design.</span></td>
</tr>
<tr>
<td align="CENTER" height="32" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Marathon</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Marathon is a Mesos framework for long-running services. Given that you have Mesos running as the kernel for your datacenter, Marathon is the init or upstart daemon.</span></td>
</tr>
<tr>
<td align="CENTER" height="62" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Brooklyn</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Brooklyn is a library that simplifies application deployment and management. For deployment, it is designed to tie in with other tools, giving single-click deployment and adding the concepts of manageable clusters and fabrics. Many common software entities are available out of the box. Brooklyn integrates with Apache Whirr (and thereby Chef and Puppet) to deploy well-known services such as Hadoop and Elasticsearch, or can use POBS (plain-old-bash-scripts). It can also use PaaS offerings such as OpenShift alongside self-built clusters for maximum flexibility.</span></td>
</tr>
<tr>
<td align="CENTER" height="77" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Hortonworks HOYA</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">HOYA is defined as “running HBase On YARN”: it is a tool, developed on top of YARN, for deploying HBase. Hoya is a Java tool and is currently CLI-driven. It takes in a cluster specification covering the number of region servers, the location of HBASE_HOME, the ZooKeeper quorum hosts, the configuration that the new HBase cluster instance should use, and so on. Once the cluster has been started, it can be made to grow or shrink using the Hoya commands. The cluster can also be stopped and later resumed. Hoya implements this functionality through the YARN APIs and HBase’s shell scripts. The goal of the prototype was to require minimal code changes, and as of this writing, it has required zero code changes in HBase.</span></td>
</tr>
<tr>
<td align="CENTER" height="47" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Helix</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Helix is a generic cluster management framework used for the automatic management of partitioned, replicated and distributed resources hosted on a cluster of nodes. Originally developed at LinkedIn, it is now an incubator project at Apache. Helix is built on top of ZooKeeper for coordination tasks.</span></td>
</tr>
<tr>
<td align="CENTER" height="62" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Bigtop</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Bigtop was originally developed and released by Cloudera as an open source packaging infrastructure. Some vendors use Bigtop to build their own distributions based on Apache Hadoop (CDH, Pivotal HD, Intel's distribution); however, Apache Bigtop does much more, such as continuous integration testing (with Jenkins, Maven, ...), packaging (RPM and DEB), deployment with Puppet, and so on. Apache Bigtop can be considered a community effort with one main focus: integrating all the pieces of the Hadoop ecosystem as a whole, rather than as individual projects.</span></td>
</tr>
<tr>
<td align="CENTER" height="62" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Buildoop</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Buildoop is an open source project licensed under the Apache License 2.0 and based on the Apache Bigtop idea. Buildoop is a collaboration project that provides templates and tools to help you create custom Linux-based systems based on the Hadoop ecosystem. The project is built from scratch in the Groovy language instead of the mixture of tools that Bigtop uses (Makefile, Gradle, Groovy, Maven), so it is probably easier to program than Bigtop, and its design is focused on the basic ideas behind Buildroot and the Yocto Project. The project is in the early stages of development right now.</span></td>
</tr>
<tr>
<td align="CENTER" height="47" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Deploop</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Deploop is a tool for provisioning, managing and monitoring Apache Hadoop clusters, focused on the Lambda Architecture (LA). The LA is a generic design based on concepts from Twitter engineer Nathan Marz, created to address common big data requirements. Deploop is in ongoing development, at an alpha level of maturity. The system is built on top of highly scalable technologies such as Puppet and MCollective.</span></td>
</tr>
<tr>
<td align="CENTER" height="17" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">Applications</span></b></td>
<td align="CENTER" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">Description</span></b></td>
</tr>
<tr>
<td align="CENTER" height="47" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Nutch</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">A highly extensible and scalable open source web crawler software project, and a search engine based on Lucene. A web crawler is an Internet bot that systematically browses the World Wide Web, typically for the purpose of web indexing. Web crawlers can copy all the pages they visit for later processing by a search engine, which indexes the downloaded pages so that users can search them much more quickly.</span></td>
</tr>
<tr>
<td align="CENTER" height="32" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Sphinx Search Server</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Sphinx lets you either batch index and search data stored in an SQL database, NoSQL storage, or plain files quickly and easily, or index and search data on the fly, working with Sphinx much as you would with a database server.</span></td>
</tr>
<tr>
<td align="CENTER" height="17" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache OODT</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">OODT was originally developed at NASA's Jet Propulsion Laboratory to support capturing, processing and sharing of data for NASA's scientific archives.</span></td>
</tr>
<tr>
<td align="CENTER" height="17" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">HIPI Library</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">HIPI is a library for Hadoop's MapReduce framework that provides an API for performing image processing tasks in a distributed computing environment.</span></td>
</tr>
<tr>
<td align="CENTER" height="77" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">PivotalR</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">PivotalR is a package that enables users of R, the most popular open source statistical programming language and environment, to interact with the Pivotal (Greenplum) Database, Pivotal HD / HAWQ, and the open source database PostgreSQL for Big Data analytics. R is a programming language and data analysis software: you analyze data in R by writing scripts and functions in the R programming language. R is a complete, interactive, object-oriented language designed by statisticians, for statisticians. The language provides objects, operators and functions that make exploring, modeling, and visualizing data a natural process.</span></td>
</tr>
<tr>
<td align="CENTER" bgcolor="#83CAFF" height="30" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">Development Frameworks</span></b></td>
<td align="CENTER" bgcolor="#83CAFF" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">Description</span></b></td>
</tr>
<tr>
<td align="CENTER" height="136" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Spring XD</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Spring XD (eXtreme Data), from Pivotal, is an evolution of the Spring Java application development framework aimed at Big Data applications. SpringSource was the company created by the founders of the Spring Framework. SpringSource was purchased by VMware, where it was maintained for some time as a separate division. Later, VMware and its parent company EMC Corporation formally created a joint venture called Pivotal. Spring XD is more than a development framework library: it is a distributed, extensible system for data ingestion, real-time analytics, batch processing, and data export, and it can be considered an alternative to Apache Flume/Sqoop/Oozie in some scenarios. Spring XD is part of Pivotal's Spring for Apache Hadoop (SHDP). SHDP, together with Spring, Spring Batch and Spring Data, is part of the Spring IO Platform as a foundational library. Building on and extending this foundation, the Spring IO Platform provides Spring XD as its big data runtime. Spring for Apache Hadoop (SHDP) aims to simplify the development of Hadoop-based applications by providing a consistent configuration and API across a wide range of Hadoop ecosystem projects such as Pig, Hive, and Cascading, in addition to providing extensions to Spring Batch for orchestrating Hadoop-based workflows.</span></td>
</tr>
<tr>
<td align="CENTER" bgcolor="#83CAFF" height="30" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">Not Yet Categorized</span></b></td>
<td align="CENTER" bgcolor="#83CAFF" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><b><span style="font-family: Times New Roman;">Description</span></b></td>
</tr>
<tr>
<td align="CENTER" height="32" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Twitter Summingbird</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">A system that aims to mitigate the trade-offs between batch processing and stream processing by combining them into a hybrid system. At Twitter, Hadoop handles batch processing, Storm handles stream processing, and the hybrid system is called Summingbird.</span></td>
</tr>
<tr>
<td align="CENTER" height="17" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Kiji</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">A framework for building real-time Big Data applications on Apache HBase.</span></td>
</tr>
<tr>
<td align="CENTER" height="32" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">S4 Yahoo</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">S4 is a general-purpose, distributed, scalable, fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data.</span></td>
</tr>
<tr>
<td align="CENTER" height="17" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Metamarkets Druid</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Real-time analytical data store.</span></td>
</tr>
<tr>
<td align="CENTER" height="17" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Concurrent Cascading</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">An application framework that lets Java developers easily build robust data analytics and data management applications on Apache Hadoop.</span></td>
</tr>
<tr>
<td align="CENTER" height="32" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Concurrent Lingual</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">An open source project enabling fast and simple Big Data application development on Apache Hadoop. It delivers ANSI-standard SQL technology to easily build new applications and integrate existing ones onto Hadoop.</span></td>
</tr>
<tr>
<td align="CENTER" height="17" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Concurrent Pattern</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Machine learning for Cascading on Apache Hadoop, through an API and standards-based PMML.</span></td>
</tr>
<tr>
<td align="CENTER" height="32" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Giraph</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Giraph is an iterative graph processing system built for high scalability. For example, it is currently used at Facebook to analyze the social graph formed by users and their connections. Giraph originated as the open source counterpart to Pregel, the graph processing architecture developed at Google.</span></td>
</tr>
<tr>
<td align="CENTER" height="32" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Talend</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Talend is an open source software vendor that provides data integration, data management, enterprise application integration and big data software and solutions.</span></td>
</tr>
<tr>
<td align="CENTER" height="17" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Akka Toolkit</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Akka is an open-source toolkit and runtime simplifying the construction of concurrent applications on the Java platform.</span></td>
</tr>
<tr>
<td align="CENTER" height="17" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Eclipse BIRT</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">BIRT is an open source Eclipse-based reporting system that integrates with your Java/Java EE application to produce compelling reports.</span></td>
</tr>
<tr>
<td align="CENTER" height="47" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">SpagoBI</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">SpagoBI is an open source Business Intelligence suite, belonging to the free/open source SpagoWorld initiative, founded and supported by Engineering Group. It offers a large range of analytical functions, a highly functional semantic layer often absent in other open source platforms and projects, and a respectable set of advanced data visualization features, including geospatial analytics.</span></td>
</tr>
<tr>
<td align="CENTER" height="47" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Jedox Palo</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Palo Suite combines all core applications (OLAP Server, Palo Web, Palo ETL Server and Palo for Excel) into one comprehensive and customisable Business Intelligence platform. The platform is based entirely on open source products and represents a high-end Business Intelligence solution that is available free of any license fees.</span></td>
</tr>
<tr>
<td align="CENTER" height="32" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Twitter Finagle</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Finagle is an asynchronous network stack for the JVM that you can use to build asynchronous Remote Procedure Call (RPC) clients and servers in Java, Scala, or any JVM-hosted language.</span></td>
</tr>
<tr>
<td align="CENTER" height="17" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Intel GraphBuilder</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">A library that provides tools to construct large-scale graphs on top of Apache Hadoop.</span></td>
</tr>
<tr>
<td align="CENTER" height="17" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">Apache Tika</span></td>
<td align="LEFT" style="border-bottom: 1px solid #000000; border-left: 1px solid #000000; border-right: 1px solid #000000; border-top: 1px solid #000000;" valign="MIDDLE"><span style="font-family: Times New Roman;">A toolkit that detects and extracts metadata and structured text content from various documents using existing parser libraries.</span></td>
</tr>
</tbody></table>
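The Summingbird and Deploop rows above describe combining a batch layer with a streaming layer (the Lambda Architecture). As a toy illustration of the query-time merge only, and not of Summingbird's or Deploop's actual API, the shell sketch below assumes each layer has already written its partial counts as tab-separated key/count lines in a file; the file layout and the merge_views name are hypothetical.

```shell
#!/bin/bash
# Toy sketch of the query-time merge in a Lambda Architecture style system:
# a batch view and a real-time (speed layer) view are combined per key.
# Assumption: each layer wrote tab-separated "key count" lines to a file;
# these files and the merge_views name are hypothetical.

# merge_views BATCH_FILE REALTIME_FILE
# Prints one tab-separated "key total" line per key, sorted by key.
merge_views() {
  awk -F '\t' -v OFS='\t' '
    { total[$1] += $2 }                          # add this count to the key total
    END { for (k in total) print k, total[k] }   # emit the merged totals
  ' "$1" "$2" | sort -t $'\t' -k1,1
}
```

For example, `merge_views batch_counts.tsv realtime_counts.tsv` prints one merged total per key. A real system would also have to drop speed-layer results once the batch layer has caught up with them, which this sketch ignores.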
</div>
Karthikhttp://www.blogger.com/profile/18220563988883365640noreply@blogger.com0tag:blogger.com,1999:blog-1743557738889130058.post-32385571492982031232014-09-10T03:56:00.001-07:002014-09-10T04:07:21.362-07:00Shell Script to get the status of HBase Cluster<div dir="ltr" style="text-align: left;" trbidi="on">
#!/bin/bash<br />
bin=`dirname $0`<br />
bin=`cd $bin;pwd`<br />
<br />
HADOOP_HOME=/home/bigdata/installs/hadoop-1.2.1<br />
HBASE_HOME=/home/bigdata/installs/hbase-0.94.20<br />
DFS_REMAINING_WARNING=15<br />
DFS_REMAINING_CRITICAL=5<br />
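ABNORMAL_QUERY="INCONSISTENT|CORRUPT|FAILED|Exception"<br />
<br />
<span style="color: blue;"># Exit codes used below (Nagios plugin convention)</span><br />
STATE_OK=0<br />
STATE_WARNING=1<br />
STATE_CRITICAL=2<br />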
<br />
<span style="color: blue;"># hbck and fsck report (">" starts a fresh report so errors from earlier runs are not counted again)</span><br />
output=/home/bigdata/hdfscluster/cluster-status<br />
"$HBASE_HOME"/bin/hbase hbck > "$output" 2>&1<br />
"$HADOOP_HOME"/bin/hadoop fsck /hbase >> "$output" 2>&1<br />
<br />
<span style="color: blue;"># check report</span><br />
count=$(grep -E -c "$ABNORMAL_QUERY" "$output")<br />
if [ "$count" -eq 0 ]; then<br />
echo "[OK] Cluster is healthy." >> "$output"<br />
else<br />
echo "[ABNORMAL] Cluster is abnormal!" >> "$output"<br />
<br />
<span style="color: blue;"># Get the last matching entry in the report file</span><br />
last_entry=$(grep -E "$ABNORMAL_QUERY" "$output" | tail -1)<br />
echo "($count) $last_entry"<br />
exit $STATE_CRITICAL<br />
fi<br />
<br />
<span style="color: blue;"># HDFS usage</span><br />
dfs_remaining=$(curl -s http://hbase-karthik:50070/dfshealth.jsp | grep -E -o "DFS Remaining%.*%" | grep -E -o "[0-9]*\.[0-9]*" | head -1)<br />
dfs_remaining_word="DFS Remaining%: ${dfs_remaining}%"<br />
echo "$dfs_remaining_word" >> $output<br />
<br />
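<span style="color: blue;"># check HDFS usage (compare the integer part only)</span><br />
dfs_remaining=$(echo "$dfs_remaining" | awk -F '.' '{print $1}')<br />
if [ -z "$dfs_remaining" ]; then<br />
echo "UNKNOWN - could not read DFS Remaining% from the NameNode page"<br />
<span style="color: blue;"># exit code 3 = UNKNOWN (Nagios plugin convention)</span><br />
exit 3<br />
fi<br />
if [ "$dfs_remaining" -lt $DFS_REMAINING_CRITICAL ]; then<br />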
echo "Low DFS space. $dfs_remaining_word"<br />
exit_status=$STATE_CRITICAL<br />
elif [ $dfs_remaining -lt $DFS_REMAINING_WARNING ]; then<br />
echo "Low DFS space. $dfs_remaining_word"<br />
exit_status=$STATE_WARNING<br />
else<br />
echo "HBase check OK - DFS and HBase healthy.<br />
$dfs_remaining_word"<br />
exit_status=$STATE_OK<br />
fi<br />
exit $exit_status</div>
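The DFS-space part of the script above can be pulled into two small functions so the thresholds can be tested without a live cluster. This is a sketch under assumptions: the exit codes follow the Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), the default thresholds mirror DFS_REMAINING_WARNING=15 and DFS_REMAINING_CRITICAL=5, and the function names extract_dfs_remaining and dfs_state are hypothetical.

```shell
#!/bin/bash
# Sketch of the DFS-space check from the script above, split into two
# testable functions. Assumptions: Nagios-style exit codes
# (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) and the same 15% / 5%
# thresholds as DFS_REMAINING_WARNING and DFS_REMAINING_CRITICAL.

# extract_dfs_remaining TEXT
# Prints the first "DFS Remaining%" figure found in TEXT (for example a page
# fetched from dfshealth.jsp), using the same grep patterns as the script.
extract_dfs_remaining() {
  printf '%s\n' "$1" | grep -E -o "DFS Remaining%.*%" | grep -E -o "[0-9]*\.[0-9]*" | head -1
}

# dfs_state PERCENT [WARNING] [CRITICAL]
# Echoes OK/WARNING/CRITICAL/UNKNOWN and returns the matching exit code.
dfs_state() {
  local pct=${1%%.*}                 # keep the integer part, like the awk call
  local warn=${2:-15} crit=${3:-5}
  case $pct in
    ''|*[!0-9]*) echo "UNKNOWN"; return 3 ;;   # empty or non-numeric input
  esac
  if [ "$pct" -lt "$crit" ]; then
    echo "CRITICAL"; return 2
  elif [ "$pct" -lt "$warn" ]; then
    echo "WARNING"; return 1
  fi
  echo "OK"; return 0
}
```

For example, `dfs_state "$(extract_dfs_remaining "$(curl -s http://hbase-karthik:50070/dfshealth.jsp)")"` prints the state for the NameNode used in the script. Because the function also returns the matching code, a caller can pass `$?` straight to `exit`.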
Karthikhttp://www.blogger.com/profile/18220563988883365640noreply@blogger.com7tag:blogger.com,1999:blog-1743557738889130058.post-6245739306638854642014-08-25T05:11:00.003-07:002014-08-25T05:11:50.419-07:00Apache Pig Working Example<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="color: blue;"># Uploading the Datasets into Hadoop HDFS</span><br />
<br />
bigdata@bigdata/$ hadoop fs -mkdir pig<br />
bigdata@bigdata/$ hadoop fs -ls<br />
Found 4 items<br />
drwxr-xr-x - bigdata supergroup 0 2014-07-07 13:01 /user/bigdata/backup<br />
drwxr-xr-x - bigdata supergroup 0 2014-07-09 17:40 /user/bigdata/datasets<br />
drwxr-xr-x - bigdata supergroup 0 2014-07-10 11:51 /user/bigdata/imagepath<br />
drwxr-xr-x - bigdata supergroup 0 2014-07-10 18:10 /user/bigdata/pig<br />
bigdata@bigdata/$ hadoop fs -put /home/bigdata/download/pig/excite-small.log /user/bigdata/pig<br />
bigdata@bigdata/$ hadoop fs -ls /user/bigdata/pig<br />
Found 1 items<br />
-rw-r--r-- 1 bigdata supergroup 208348 2014-07-10 18:05 /user/bigdata/pig/excite-small.log<br />
bigdata@bigdata/$<br />
<br />
<span style="color: blue;"># Open the Pig Grunt shell in a terminal and run the script</span><br />
<br />
grunt> log = LOAD '/user/bigdata/pig/excite-small.log' AS (user, timestamp, query);<br />
grunt> grpd = GROUP log BY user;<br />
grunt> cntd = FOREACH grpd GENERATE group, COUNT(log);<br />
grunt> STORE cntd INTO '/user/bigdata/pig/group_output';<br />
<br />
2014-07-10 18:08:42,840 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY<br />
2014-07-10 18:08:42,840 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED[AddForEach, ColumnMapKeyPrune, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED[FilterLogicExpressionSimplifier]}<br />
2014-07-10 18:08:42,842 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false<br />
2014-07-10 18:08:42,842 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer - Choosing to move algebraic foreach to combiner<br />
2014-07-10 18:08:42,843 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1<br />
2014-07-10 18:08:42,843 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1<br />
2014-07-10 18:08:42,849 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job<br />
2014-07-10 18:08:42,850 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3<br />
2014-07-10 18:08:42,850 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.<br />
2014-07-10 18:08:42,850 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator<br />
2014-07-10 18:08:42,851 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=208348<br />
2014-07-10 18:08:42,851 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1<br />
2014-07-10 18:08:42,851 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process<br />
2014-07-10 18:08:42,851 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job5158749338361837443.jar<br />
2014-07-10 18:08:44,644 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job5158749338361837443.jar created<br />
2014-07-10 18:08:44,647 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job<br />
2014-07-10 18:08:44,647 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.<br />
2014-07-10 18:08:44,647 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche<br />
2014-07-10 18:08:44,647 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []<br />
2014-07-10 18:08:44,668 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.<br />
2014-07-10 18:08:44,811 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1<br />
2014-07-10 18:08:44,811 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1<br />
2014-07-10 18:08:44,812 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1<br />
2014-07-10 18:08:45,168 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201407101041_0003<br />
2014-07-10 18:08:45,168 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases cntd,grpd,log<br />
2014-07-10 18:08:45,168 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: log[9,6],cntd[11,7],grpd[10,7] C: cntd[11,7],grpd[10,7] R: cntd[11,7]<br />
2014-07-10 18:08:45,168 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://bigdata:50030/jobdetails.jsp?jobid=job_201407101041_0003<br />
2014-07-10 18:08:45,172 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete<br />
2014-07-10 18:08:45,172 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_201407101041_0003]<br />
2014-07-10 18:08:48,178 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete<br />
2014-07-10 18:08:48,179 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_201407101041_0003]<br />
2014-07-10 18:08:55,205 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 66% complete<br />
2014-07-10 18:08:55,205 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_201407101041_0003]<br />
2014-07-10 18:08:56,207 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_201407101041_0003]<br />
2014-07-10 18:09:00,250 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete<br />
2014-07-10 18:09:00,251 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:<br />
<br />
HadoopVersion<span class="Apple-tab-span" style="white-space: pre;"> </span>PigVersion<span class="Apple-tab-span" style="white-space: pre;"> </span>UserId<span class="Apple-tab-span" style="white-space: pre;"> </span>StartedAt<span class="Apple-tab-span" style="white-space: pre;"> </span>FinishedAt<span class="Apple-tab-span" style="white-space: pre;"> </span>Features<br />
1.2.1<span class="Apple-tab-span" style="white-space: pre;"> </span>0.13.0<span class="Apple-tab-span" style="white-space: pre;"> </span>bigdata<span class="Apple-tab-span" style="white-space: pre;"> </span>2014-07-10 18:08:42<span class="Apple-tab-span" style="white-space: pre;"> </span>2014-07-10 18:09:00<span class="Apple-tab-span" style="white-space: pre;"> </span>GROUP_BY<br />
<br />
Success!<br />
<br />
Job Stats (time in seconds):<br />
JobId<span class="Apple-tab-span" style="white-space: pre;"> </span>Maps<span class="Apple-tab-span" style="white-space: pre;"> </span>Reduces<span class="Apple-tab-span" style="white-space: pre;"> </span>MaxMapTime<span class="Apple-tab-span" style="white-space: pre;"> </span>MinMapTIme<span class="Apple-tab-span" style="white-space: pre;"> </span>AvgMapTime<span class="Apple-tab-span" style="white-space: pre;"> </span>MedianMapTime<span class="Apple-tab-span" style="white-space: pre;"> </span>MaxReduceTime<span class="Apple-tab-span" style="white-space: pre;"> </span>MinReduceTime<span class="Apple-tab-span" style="white-space: pre;"> </span>AvgReduceTime<span class="Apple-tab-span" style="white-space: pre;"> </span>MedianReducetime<span class="Apple-tab-span" style="white-space: pre;"> </span>Alias<span class="Apple-tab-span" style="white-space: pre;"> </span>Feature<span class="Apple-tab-span" style="white-space: pre;"> </span>Outputs<br />
job_201407101041_0003<span class="Apple-tab-span" style="white-space: pre;"> </span>1<span class="Apple-tab-span" style="white-space: pre;"> </span>1<span class="Apple-tab-span" style="white-space: pre;"> </span>1<span class="Apple-tab-span" style="white-space: pre;"> </span>1<span class="Apple-tab-span" style="white-space: pre;"> </span>1<span class="Apple-tab-span" style="white-space: pre;"> </span>1<span class="Apple-tab-span" style="white-space: pre;"> </span>8<span class="Apple-tab-span" style="white-space: pre;"> </span>8<span class="Apple-tab-span" style="white-space: pre;"> </span>8<span class="Apple-tab-span" style="white-space: pre;"> </span>8<span class="Apple-tab-span" style="white-space: pre;"> </span>cntd,grpd,log<span class="Apple-tab-span" style="white-space: pre;"> </span>GROUP_BY,COMBINER<span class="Apple-tab-span" style="white-space: pre;"> </span>/user/bigdata/pig/group_output,<br />
<br />
Input(s):<br />
Successfully read 4501 records (208725 bytes) from: "/user/bigdata/pig/excite-small.log"<br />
<br />
Output(s):<br />
Successfully stored 891 records (17051 bytes) in: "/user/bigdata/pig/group_output"<br />
<br />
Counters:<br />
Total records written : 891<br />
Total bytes written : 17051<br />
Spillable Memory Manager spill count : 0<br />
Total bags proactively spilled: 0<br />
Total records proactively spilled: 0<br />
<br />
Job DAG:<br />
job_201407101041_0003<br />
<br />
<br />
2014-07-10 18:09:00,268 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!<br />
<br />
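The job statistics above list the GROUP_BY and COMBINER features, with 1 map task and 1 reduce task. With a combiner, each map task pre-counts its own records per key, so the reducer only adds up partial counts instead of receiving every raw record. The Java (8+) sketch below mimics that data flow in memory. It is a hypothetical illustration, not Pig's actual implementation: the class and method names and the sample rows are made up, and the tab-separated user/time/query layout of excite-small.log is an assumption.<br />

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical in-memory sketch of GROUP BY user + COUNT with a combiner.
 * It only mimics the map -> combine -> reduce data flow; it is not Pig code.
 */
public class GroupCountSketch {

    // Map task + combiner: count records per user within one input split,
    // so each map task emits at most one (user, partialCount) pair per user.
    static Map<String, Long> combine(List<String> split) {
        Map<String, Long> partial = new HashMap<>();
        for (String line : split) {
            // Assumed layout: user<TAB>timestamp<TAB>query; the user ID is field 0.
            String user = line.split("\t", -1)[0];
            partial.merge(user, 1L, Long::sum);
        }
        return partial;
    }

    // Reduce task: add up the partial counts for each user.
    // TreeMap keeps the keys sorted, similar to the rows of part-r-00000.
    static Map<String, Long> reduce(List<Map<String, Long>> partials) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map<String, Long> partial : partials) {
            for (Map.Entry<String, Long> e : partial.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up rows, split across two hypothetical map tasks.
        List<String> split1 = Arrays.asList(
                "6D2B4F8DEE6D3EAF\t970916105432\tsome query",
                "6D39FA30ABF97CF1\t970916105500\tanother query",
                "6D2B4F8DEE6D3EAF\t970916105610\tsome query");
        List<String> split2 = Arrays.asList(
                "6D39FA30ABF97CF1\t970916110101\tyet another query");

        Map<String, Long> totals = reduce(Arrays.asList(combine(split1), combine(split2)));

        long sum = 0;
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            // Same "user<TAB>count" shape as the stored output.
            System.out.println(e.getKey() + "\t" + e.getValue());
            sum += e.getValue();
        }
        // In this sketch every input record lands in exactly one group,
        // so the per-group counts add up to the number of records read.
        System.out.println("records=" + sum + ", groups=" + totals.size());
    }
}
```

Each output row is one group (user) and its count, so the 891 records stored in group_output correspond to 891 distinct users. By the same reasoning, the counts in part-r-00000 should add up to the 4501 records read, assuming the script counts the grouped relation directly; Pig's COUNT skips tuples whose first field is null, so records with an empty first field would not be counted.<br />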
<span style="color: blue;"># Analyzing the Output</span><br />
<br />
bigdata@bigdata/$ hadoop fs -ls /user/bigdata/pig/<br />
Found 2 items<br />
-rw-r--r-- 1 bigdata supergroup 208348 2014-07-10 18:05 /user/bigdata/pig/excite-small.log<br />
drwxr-xr-x - bigdata supergroup 0 2014-07-10 18:15 /user/bigdata/pig/group_output<br />
bigdata@bigdata/$ hadoop fs -ls /user/bigdata/pig/group_output<br />
Found 3 items<br />
-rw-r--r-- 1 bigdata supergroup 0 2014-07-10 18:15 /user/bigdata/pig/group_output/_SUCCESS<br />
drwxr-xr-x - bigdata supergroup 0 2014-07-10 18:15 /user/bigdata/pig/group_output/_logs<br />
-rw-r--r-- 1 bigdata supergroup 17051 2014-07-10 18:15 /user/bigdata/pig/group_output/part-r-00000<br />
bigdata@bigdata/$<br />
<br />
bigdata@bigdata/$ hadoop fs -cat /user/bigdata/pig/group_output/part-r-00000<br />
<br />
6D2B4F8DEE6D3EAF<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
6D39FA30ABF97CF1<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
6D3EDFFC5B370C42<span class="Apple-tab-span" style="white-space: pre;"> </span>9<br />
6D906622D87278E5<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
6D9E037CA1E489A3<span class="Apple-tab-span" style="white-space: pre;"> </span>7<br />
6DC44AB70EB110CB<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
6E2A4B3FED94E84D<span class="Apple-tab-span" style="white-space: pre;"> </span>5<br />
6E75DE6D131ADADF<span class="Apple-tab-span" style="white-space: pre;"> </span>9<br />
6EBE46CEB6BD249E<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
6F0A679A71DC2F39<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
6FA5EF0FFF3D6CB3<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
6FB3D2D282761F25<span class="Apple-tab-span" style="white-space: pre;"> </span>32<br />
6FFB6B341F8FA1F1<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
7039BF2E7257EC24<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
70AFB9518EB9997A<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
70F5A03AEC3DA7BF<span class="Apple-tab-span" style="white-space: pre;"> </span>9<br />
714E77EBD1691710<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
719CF3C90004051C<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
71CAF72D8BF0CE30<span class="Apple-tab-span" style="white-space: pre;"> </span>6<br />
71D2D4E6C01FD7E1<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
71ED89525010D11A<span class="Apple-tab-span" style="white-space: pre;"> </span>5<br />
72270DEAFE0BF9FC<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
726DA9740623758A<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
729B745475893F44<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
72DF27659DE7BE5A<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
736AB65C0C0C10EC<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
736D28D439E9FE2D<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
73AEEF0996A27551<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
73BD52528B217820<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
73D74648CC2CA35E<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
73D7DE81DF856F8D<span class="Apple-tab-span" style="white-space: pre;"> </span>7<br />
74165896F4654D30<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
742C7109B9A57A73<span class="Apple-tab-span" style="white-space: pre;"> </span>14<br />
74406AAFF322E81B<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
74BB49A63C0C4D79<span class="Apple-tab-span" style="white-space: pre;"> </span>7<br />
752FE259E734662C<span class="Apple-tab-span" style="white-space: pre;"> </span>15<br />
75C18D86685AAEAE<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
75F1B47D5BADE010<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
762B8ED16AC158C6<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
762F03C6189BBB1D<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
763A0EADC8BD3533<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
765EAEA8FC0AC936<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
76D05AFAC10D18D0<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
76FC27D6D468BC91<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
772DFF55A7E701DC<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
77752FDFA121C234<span class="Apple-tab-span" style="white-space: pre;"> </span>10<br />
778EBE06AC999541<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
77AC89619076A8E1<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
781BF65D3D769ED3<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
78D021506018889F<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
78EC0A6026552159<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
790FC18760C238A6<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
7965E55C29F534F3<span class="Apple-tab-span" style="white-space: pre;"> </span>5<br />
79785E25B2F213B8<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
7981B1CD3861167E<span class="Apple-tab-span" style="white-space: pre;"> </span>13<br />
79E7FA8E26F7349E<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
7A17DCDA7EB5033C<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
7A8D9CFC957C7FCA<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
7AE07E7F0053F0A9<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
7AFA10B67193DBF4<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
7B99756B742D8E89<span class="Apple-tab-span" style="white-space: pre;"> </span>5<br />
7BABD6CD11ABD104<span class="Apple-tab-span" style="white-space: pre;"> </span>6<br />
7C32FF6A8E176FA6<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
7C60C0A2EBF7A3E7<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
7CCA971B578450AB<span class="Apple-tab-span" style="white-space: pre;"> </span>8<br />
7D1DD1781EDB79A0<span class="Apple-tab-span" style="white-space: pre;"> </span>9<br />
7D286B5592D83BBE<span class="Apple-tab-span" style="white-space: pre;"> </span>59<br />
7D61F86F1732EDC6<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
7DF76165C8DCAD93<span class="Apple-tab-span" style="white-space: pre;"> </span>8<br />
7E0D6592BC38322F<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
7E4DF85483B4A9E8<span class="Apple-tab-span" style="white-space: pre;"> </span>6<br />
7E7371B2288BF353<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
7F173C311C29F64A<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
7F419DC93BF79BD3<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
7F88C9EC4CD0BB3A<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
7FEACB5630B25683<span class="Apple-tab-span" style="white-space: pre;"> </span>6<br />
80BC451BC7EE9FE0<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
810EFC647D40E4CB<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
818D7157855D5D48<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
8192D9F4FCBDA81B<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
81CC31A8588135F2<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
81E73204D37D7E50<span class="Apple-tab-span" style="white-space: pre;"> </span>13<br />
8223F74BED5A061A<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
824F413FA37520BF<span class="Apple-tab-span" style="white-space: pre;"> </span>10<br />
82A061D8FA28AAC4<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
833A52784CCEF115<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
83607290B8BEAFC6<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
83FD0A3E1FB0FAD9<span class="Apple-tab-span" style="white-space: pre;"> </span>5<br />
83FE595CE7D05209<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
84100211438C80DC<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
84312FE558AC325A<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
8462E97CB561C077<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
85690B11E1ED5FB5<span class="Apple-tab-span" style="white-space: pre;"> </span>6<br />
858E8CCC3D889E86<span class="Apple-tab-span" style="white-space: pre;"> </span>5<br />
85D5A78E64418242<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
86609B9799FE7CA5<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
86D1E09F8F8F5B9F<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
86EAEA913CC8D7C4<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
8775EDC82244F40A<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
887E337AB02C2BF7<span class="Apple-tab-span" style="white-space: pre;"> </span>10<br />
88ED9040788FF9FD<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
893C3ADD0EFBBECB<span class="Apple-tab-span" style="white-space: pre;"> </span>10<br />
8A095E9B925D411D<span class="Apple-tab-span" style="white-space: pre;"> </span>17<br />
8A7BC9076D6F166F<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
8AFBE95F88FA5C99<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
8B2065581C770F50<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
8B6A0AB3CF0804D1<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
8BCE9868E3F8CC75<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
8C549AB7E029345E<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
8CDEE772A295AA02<span class="Apple-tab-span" style="white-space: pre;"> </span>5<br />
8D2E263D44C09DA0<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
8D4626A753C2D3CF<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
8DCC6FC2AC2EBB4F<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
8DD654BF9AD99482<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
8DDB5A84C1C94E60<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
8E0A39B262A2C60B<span class="Apple-tab-span" style="white-space: pre;"> </span>14<br />
8E1A8EA81FEA8A30<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
8E1E0FBF51628427<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
8E53B09082CF6A75<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
8EDC8ECDA4AD017E<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
8EE83362186F49EF<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
8F0ECFEDAB4A03DB<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
8FE55B4D65B22166<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
90498194D0D0A2F8<span class="Apple-tab-span" style="white-space: pre;"> </span>10<br />
90B21F67EEA27FEA<span class="Apple-tab-span" style="white-space: pre;"> </span>5<br />
910B9A0392D29303<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
917FDFC55A0EA9ED<span class="Apple-tab-span" style="white-space: pre;"> </span>38<br />
918BE10E19D4E24F<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
91A98BC9BEDCF053<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
91AC6C05E866DAF4<span class="Apple-tab-span" style="white-space: pre;"> </span>7<br />
91C510E369703D99<span class="Apple-tab-span" style="white-space: pre;"> </span>10<br />
9218F60AF54851E4<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
923C890A8A149FD5<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
936BADBE23F87A96<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
93750016D02B541F<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
937EB54AB1F9EE94<span class="Apple-tab-span" style="white-space: pre;"> </span>5<br />
93CE4FF9E36FA112<span class="Apple-tab-span" style="white-space: pre;"> </span>5<br />
93E0FC7EE40C63BF<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
944E332F3090AA60<span class="Apple-tab-span" style="white-space: pre;"> </span>11<br />
9471B615A518B7E4<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
949946B881F137F0<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
94B5B95EAEED1FE7<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
9541D2047C5360F9<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
972F13CE9A8E2FA3<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
9742AECDDE7E5895<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
975635FE3F837969<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
977C9CEB63318175<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
977E1B646010C88E<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
97E8A9F4A0DFD224<span class="Apple-tab-span" style="white-space: pre;"> </span>6<br />
986AC2B6E1384999<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
98825190824FBCEC<span class="Apple-tab-span" style="white-space: pre;"> </span>7<br />
98F5BBD3754D292F<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
98FA1E93D617E416<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
9912390F5E1D690F<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
99BA461C4F96233F<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
99D8C7D14A864902<span class="Apple-tab-span" style="white-space: pre;"> </span>5<br />
9A04A6464335D051<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
9A10B373FA529557<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
9A33FFD53E103291<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
9A5F075ABDE5635D<span class="Apple-tab-span" style="white-space: pre;"> </span>24<br />
9AD67B4FE4D37CDB<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
9B6691E12BC09D27<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
9BB2A0503EE9CE8E<span class="Apple-tab-span" style="white-space: pre;"> </span>7<br />
9BC25E584304AEFA<span class="Apple-tab-span" style="white-space: pre;"> </span>26<br />
9C431E20C78D50AD<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
9C8FA03A0D9DF175<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
9CF1A20154759F8F<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
9D5447106CB898E3<span class="Apple-tab-span" style="white-space: pre;"> </span>9<br />
9D7A7624B927FE31<span class="Apple-tab-span" style="white-space: pre;"> </span>12<br />
9DB263190BB17AC2<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
9E1707EE57C96C1E<span class="Apple-tab-span" style="white-space: pre;"> </span>12<br />
9EAF527F15CABB79<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
9F9453D10F3C6718<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
A0037BF72EC3BA42<span class="Apple-tab-span" style="white-space: pre;"> </span>6<br />
A01609F239CC8A05<span class="Apple-tab-span" style="white-space: pre;"> </span>6<br />
A01C8755A311CD61<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
A02D95B65ECAFADD<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
A127C018E4812A29<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
A1801A1ACC7BE15C<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
A1A9D53780361768<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
A1C6967657FF9158<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
A1CFAE0FF0E6CFDE<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
A1F547F916AD8A43<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
A215ACD1331A1E5F<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
A224DDFAF7314978<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
A25C8C765238184A<span class="Apple-tab-span" style="white-space: pre;"> </span>8<br />
A2800E21FDCEE2BF<span class="Apple-tab-span" style="white-space: pre;"> </span>10<br />
A320339E3C0BE4AA<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
A369CD69E8B08A1A<span class="Apple-tab-span" style="white-space: pre;"> </span>11<br />
A3C564B7D1FA1EA0<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
A40BA4A81324E029<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
A5033398CB2B7728<span class="Apple-tab-span" style="white-space: pre;"> </span>5<br />
A5A6085F03DA0416<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
A5EB957C8CBFB0CD<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
A628C436696FE1B7<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
A67BC352137D9E89<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
A6867C13B8B29D8A<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
A68E8A400F6B4C56<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
A6EEB808BC4324D2<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
A71ABB39E3E46318<span class="Apple-tab-span" style="white-space: pre;"> </span>18<br />
A71C7704625F3DF4<span class="Apple-tab-span" style="white-space: pre;"> </span>10<br />
A7807FC4C410F719<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
A7863A716CA7614D<span class="Apple-tab-span" style="white-space: pre;"> </span>14<br />
A7FB2A8002E86F26<span class="Apple-tab-span" style="white-space: pre;"> </span>10<br />
A8983AA53DDA6E62<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
A8A0674EA33D1249<span class="Apple-tab-span" style="white-space: pre;"> </span>22<br />
A9167118E28877F1<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
A93156BD79F164A4<span class="Apple-tab-span" style="white-space: pre;"> </span>15<br />
AA17D9D7A97BF879<span class="Apple-tab-span" style="white-space: pre;"> </span>11<br />
AA2BA7F06CBD473D<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
AA50F12D6650122F<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
AA716408D075660C<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
AAA21900843C49E7<span class="Apple-tab-span" style="white-space: pre;"> </span>5<br />
AAA6E4471629BC8F<span class="Apple-tab-span" style="white-space: pre;"> </span>47<br />
AAAEFC630BA8D7B6<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
AAE7D472AA45AB96<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
AAFA3254AF25D0FC<span class="Apple-tab-span" style="white-space: pre;"> </span>7<br />
AB0D6B51B487075A<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
AC43C9B376B132D5<span class="Apple-tab-span" style="white-space: pre;"> </span>9<br />
AC5FD7086CB44602<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
AD461CB2E3D2B8D7<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
AD957FB1A4A86779<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
ADDF71B56E078EC1<span class="Apple-tab-span" style="white-space: pre;"> </span>18<br />
ADFBC35853A325C6<span class="Apple-tab-span" style="white-space: pre;"> </span>5<br />
AE27828868F61353<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
AE341CEB2D79E51D<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
AE5E2B48ED103FA0<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
AEB9383955EECF5B<span class="Apple-tab-span" style="white-space: pre;"> </span>5<br />
AF44F3885296C45B<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
AF7BBE7E92E62D6E<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
AFF4AFE7145DC5C9<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
AFFDEFE691EAD2FA<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
B0274667D0A700A8<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
B038389D403E4C43<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
B0C1B6DC7370F24B<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
B0FBC83B9C9FBB21<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
B144CE6F1EDAB0DE<span class="Apple-tab-span" style="white-space: pre;"> </span>5<br />
B163FAFD64AFAB18<span class="Apple-tab-span" style="white-space: pre;"> </span>16<br />
B1E4391F6E6EFEF4<span class="Apple-tab-span" style="white-space: pre;"> </span>9<br />
B21B920FD9253010<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
B21C28EF21B46438<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
B27D574886585DD6<span class="Apple-tab-span" style="white-space: pre;"> </span>6<br />
B2B4FD80D447F15B<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
B2D86EFD1C83A81B<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
B3797986B594F03D<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
B3CEAE9CC28714CF<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
B42894B030717FB8<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
B439C4E265D35E3D<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
B451329A3E623408<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
B53A8E9C0F0A04B8<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
B61BFEA8D6B8369F<span class="Apple-tab-span" style="white-space: pre;"> </span>9<br />
B7C5C0BCD35D4CC9<span class="Apple-tab-span" style="white-space: pre;"> </span>7<br />
B80DC510FF1B6C01<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
B8E12AFC196C5FB7<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
B9436C1E65C39A9E<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
B9922F32F8DD2511<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
B9D3C28C13F46D1D<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
B9E187FD56A5C322<span class="Apple-tab-span" style="white-space: pre;"> </span>8<br />
BA449E5E59C384BB<span class="Apple-tab-span" style="white-space: pre;"> </span>16<br />
BAC295D278E3E496<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
BAE7F92AAD81B7C5<span class="Apple-tab-span" style="white-space: pre;"> </span>8<br />
BB925FF85FF44849<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
BBBBA6C4B71C1455<span class="Apple-tab-span" style="white-space: pre;"> </span>8<br />
BC383CAF4C39027A<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
BC492F4E132262FE<span class="Apple-tab-span" style="white-space: pre;"> </span>9<br />
BCB8F383043E184C<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
BCD36229594FAAF4<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
BCD90B7247D8FC7C<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
BD4D0061A2CB3CC9<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
BD6739C2A5932AE7<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
BDDDD3F6DA8557A4<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
BDEB5480328AEB41<span class="Apple-tab-span" style="white-space: pre;"> </span>10<br />
BE2650AD779FC652<span class="Apple-tab-span" style="white-space: pre;"> </span>14<br />
BE4B27358BABBC46<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
BE95DF3EAD425CAB<span class="Apple-tab-span" style="white-space: pre;"> </span>7<br />
BED75271605EBD0C<span class="Apple-tab-span" style="white-space: pre;"> </span>20<br />
BF67450937AA9990<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
BF6A27B4287138DE<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
BF76256C3A233A8A<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
BFDFC6040837EE14<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
C01F07D111E19068<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
C040A1754EEF11B1<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
C0733122E43CFD82<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
C07C30D02210A05E<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
C07D4ECD1ACE0C89<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
C0916429A59CE5A1<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
C0BD480632F27E58<span class="Apple-tab-span" style="white-space: pre;"> </span>5<br />
C1340737666AB6D7<span class="Apple-tab-span" style="white-space: pre;"> </span>5<br />
C1896F8C0035B349<span class="Apple-tab-span" style="white-space: pre;"> </span>5<br />
C1977F1B854584B3<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
C1C4228EA191F401<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
C1C9C378C3568522<span class="Apple-tab-span" style="white-space: pre;"> </span>5<br />
C1ECD6FD44B29196<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
C207D5DC9D314B5B<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
C2482CBA783A419D<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
C28C7C97640037C1<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
C2E319C7310CF5CA<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
C32C5E6E8CE7DAFD<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
C33468CFBB6BBA02<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
C33FE9482743BD0F<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
C35A0850C4B94541<span class="Apple-tab-span" style="white-space: pre;"> </span>28<br />
C3DC13DF9F22602C<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
C4176145E5944CA2<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
C444109E68351C02<span class="Apple-tab-span" style="white-space: pre;"> </span>10<br />
C485DF6D1EA489BA<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
C5460576B58BB1CC<span class="Apple-tab-span" style="white-space: pre;"> </span>7<br />
C5779DED2B0EA592<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
C5901C622223E71D<span class="Apple-tab-span" style="white-space: pre;"> </span>5<br />
C5D01E05FF9CA265<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
C611A0BC0216E1CD<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
C68A35C476240F3D<span class="Apple-tab-span" style="white-space: pre;"> </span>10<br />
C6A50F1089717BA1<span class="Apple-tab-span" style="white-space: pre;"> </span>8<br />
C71BEDADCB745808<span class="Apple-tab-span" style="white-space: pre;"> </span>12<br />
C73A5C29D1FC5C7D<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
C771C1E3DF333CDC<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
C7C6CF328CF46E0D<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
C7CA4669EBEAF90B<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
C80EB1206EAC493D<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
C81329DC0EF932FB<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
C86AA16FFD90B66C<span class="Apple-tab-span" style="white-space: pre;"> </span>14<br />
C871CAE33E1EBD23<span class="Apple-tab-span" style="white-space: pre;"> </span>8<br />
C89F34E15252E94A<span class="Apple-tab-span" style="white-space: pre;"> </span>5<br />
C92AD22C24629491<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
C983FC6A580A67D4<span class="Apple-tab-span" style="white-space: pre;"> </span>6<br />
C989A6531FD9EEC8<span class="Apple-tab-span" style="white-space: pre;"> </span>9<br />
C99EB10EF3F2240E<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
C9EDF6F6F7C8C2C0<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
C9F4F61D48892F7B<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
CA15DFA42D265175<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
CB6EB7CE0467E74F<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
CB9EA2EEB8E11932<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
CBAEB52E28985C5E<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
CC3D6796CCB8F9B4<span class="Apple-tab-span" style="white-space: pre;"> </span>6<br />
CC4F90BE5D6F0F9A<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
CC51BF8EC2ED9FD8<span class="Apple-tab-span" style="white-space: pre;"> </span>21<br />
CCC1CC82483DD48E<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
CD37F95FC0886E1D<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
CD6DBDCB71996CDA<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
CDA3014FEEE660F2<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
CE09372F159CA389<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
CE65B6131CEBAC78<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
CEBE1A072B345F9F<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
CF5AFAEC0B19A940<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
CFE6B4DACA25B607<span class="Apple-tab-span" style="white-space: pre;"> </span>7<br />
D058447C791B3F76<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
D0AA66103CEC6749<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
D0B7245F30B8170E<span class="Apple-tab-span" style="white-space: pre;"> </span>13<br />
D0EA324518D428BE<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
D17494E7F006DB9A<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
D210EAD7F74E82EE<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
D25EF156EEE4AB94<span class="Apple-tab-span" style="white-space: pre;"> </span>11<br />
D2A2F6B93EE290B0<span class="Apple-tab-span" style="white-space: pre;"> </span>6<br />
D2E8CBAEF95A890B<span class="Apple-tab-span" style="white-space: pre;"> </span>18<br />
D2FFE38AFF1C358A<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
D356BF7183CAA42E<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
D39275A3A8A2B21E<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
D3D3ED7BAD64DDC1<span class="Apple-tab-span" style="white-space: pre;"> </span>5<br />
D49B04FF9BE2DFA1<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
D4D89B48594B5C6E<span class="Apple-tab-span" style="white-space: pre;"> </span>8<br />
D4DA409F40BB9102<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
D4FAB7E5ED4E8BF8<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
D532DEB0BB3D50FD<span class="Apple-tab-span" style="white-space: pre;"> </span>5<br />
D5D6264C66799EEB<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
D5D8220D36969861<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
D61E5828503E6438<span class="Apple-tab-span" style="white-space: pre;"> </span>8<br />
D6316653B9793BB3<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
D74F0CDBC2EEB6E0<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
D7886648F0884E25<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
D7CACB5A4976AB9E<span class="Apple-tab-span" style="white-space: pre;"> </span>6<br />
D87F01105536CAEB<span class="Apple-tab-span" style="white-space: pre;"> </span>21<br />
D89ADE64C31D4963<span class="Apple-tab-span" style="white-space: pre;"> </span>5<br />
D901F064DA40CC67<span class="Apple-tab-span" style="white-space: pre;"> </span>8<br />
D9142519595FF9D1<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
D9804262D7097FA0<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
DA1CAD0C5D86B84B<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
DA22EE4DFE3C8179<span class="Apple-tab-span" style="white-space: pre;"> </span>8<br />
DA4586C99882E0BE<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
DA8A7A56AE86C1ED<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
DA8CF5A56D67D01C<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
DA94D4B5A7C0D1AF<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
DA9DC83856C1269E<span class="Apple-tab-span" style="white-space: pre;"> </span>9<br />
DAA8C88C7DA0F0B9<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
DAF7A3D38ED9A343<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
DB0CC854B82A662C<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
DB150CA81A21781F<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
DB1C66C105955633<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
DB2D5E0E0A0A11C6<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
DB38E7AF26F3AD9A<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
DB49308A76F8A6C4<span class="Apple-tab-span" style="white-space: pre;"> </span>7<br />
DC4F3ECF90B35B9A<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
DCC5EACF75BCEF0E<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
DCF04899DF8CD6C7<span class="Apple-tab-span" style="white-space: pre;"> </span>10<br />
DD36B11F3ADA30C8<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
DD99EA68707D6EBB<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
DE3AE35D76E898B0<span class="Apple-tab-span" style="white-space: pre;"> </span>19<br />
DEA8DB3FF5F70B93<span class="Apple-tab-span" style="white-space: pre;"> </span>21<br />
DF04028778B8D665<span class="Apple-tab-span" style="white-space: pre;"> </span>5<br />
DF3E47213C887544<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
DF9BBDB4B1E1B8EC<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
DFDC0E9E4E3055AA<span class="Apple-tab-span" style="white-space: pre;"> </span>12<br />
DFFFF72A42DD6526<span class="Apple-tab-span" style="white-space: pre;"> </span>30<br />
E016B2DA270CB1B3<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
E075523C884E339E<span class="Apple-tab-span" style="white-space: pre;"> </span>12<br />
E08CDDAE633645FB<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
E0D12FA14991D2D9<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
E0F9E1C71AF27644<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
E131BFC55AF4CDCE<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
E1B23A1B0EAE7DCF<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
E29559653E1E5D44<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
E2BE501BA64CD453<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
E2BE900C633FC8BB<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
E2CC72180CD1173F<span class="Apple-tab-span" style="white-space: pre;"> </span>8<br />
E2E1A6C2BC5E324C<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
E3987AED25D1C7CA<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
E3E5D96565D98DA9<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
E3E8E56E44175FD9<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
E4ACB00AD7316719<span class="Apple-tab-span" style="white-space: pre;"> </span>5<br />
E55487B7296ED015<span class="Apple-tab-span" style="white-space: pre;"> </span>7<br />
E559AEBED8E9E078<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
E685D01156BD1FA6<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
E7426E62B87C050F<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
E760CDDAD774D717<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
E7B845B836EB153E<span class="Apple-tab-span" style="white-space: pre;"> </span>5<br />
E7BF2A1987308ECA<span class="Apple-tab-span" style="white-space: pre;"> </span>5<br />
E7E4CAE0EEA18A00<span class="Apple-tab-span" style="white-space: pre;"> </span>8<br />
E81CC79FF1064DBD<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
E84833C4A26D6818<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
E84EC370A6154A16<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
E8AE49E596BE8075<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
E8BD3F9DE94CF252<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
E8D74D7394CDB87C<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
E8ED6BB6158694B5<span class="Apple-tab-span" style="white-space: pre;"> </span>5<br />
E9913C2EF0736101<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
E9CE7EE37511E710<span class="Apple-tab-span" style="white-space: pre;"> </span>11<br />
EA0C5440778B6D73<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
EA117DA516DCCE9A<span class="Apple-tab-span" style="white-space: pre;"> </span>10<br />
EAE86EA0EE9F3F2E<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
EAED31B6A8CCC1D2<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
EB7125303EA0A6F9<span class="Apple-tab-span" style="white-space: pre;"> </span>5<br />
EB73C03E6F4F2602<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
EBE0D9D904DF6E52<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
EC6D96F35B4B6EDC<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
EC6E91864359DD8D<span class="Apple-tab-span" style="white-space: pre;"> </span>47<br />
ECD32A3785A6338C<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
ED3EA19F0B5A556B<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
ED405D0C0341A807<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
ED46FBB036F53C65<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
EE772E45E3DC084E<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
EEF64006C7D47AC1<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
EF5896A4EBC0CA3C<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
EF8A0725112B7813<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
EFC23FE521A780BB<span class="Apple-tab-span" style="white-space: pre;"> </span>11<br />
EFD8A3AF3D55DDA8<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
F006E556E2A09E96<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
F0D2ACCD226C9EB8<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
F0FD2AA13A263844<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
F11C6441E99CF50A<span class="Apple-tab-span" style="white-space: pre;"> </span>5<br />
F19ED8F44663520A<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
F22F55E90508FC0F<span class="Apple-tab-span" style="white-space: pre;"> </span>18<br />
F268B329129FEA09<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
F2C185AC2A3FFE4B<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
F30E97C85680593D<span class="Apple-tab-span" style="white-space: pre;"> </span>8<br />
F31767E967324A34<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
F37A1475EADCEC5E<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
F3FEA5332560D893<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
F44CC3ECE5C1C448<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
F5052DF171744331<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
F5192686FA9BA516<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
F559561E697722BB<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
F567E121D669BA67<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
F58053809A3FD38F<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
F584862B9B7346EB<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
F5C0159294563B38<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
F5F3D76DC932FA2C<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
F5FAFB447A057019<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
F61A119640D7C0EB<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
F623B8196D573996<span class="Apple-tab-span" style="white-space: pre;"> </span>13<br />
F63897E2C5E3ABA0<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
F63CC494C71DF1E8<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
F6D9A01E32E0BE2F<span class="Apple-tab-span" style="white-space: pre;"> </span>6<br />
F83A4A675B2D7087<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
F83D9A82EA70E97C<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
F8548204B42ABEED<span class="Apple-tab-span" style="white-space: pre;"> </span>9<br />
F85AF1304F7D7D56<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
F91FE5DF055F8E7B<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
F95E943D66FEEE8F<span class="Apple-tab-span" style="white-space: pre;"> </span>8<br />
F9D5FD25E1671290<span class="Apple-tab-span" style="white-space: pre;"> </span>8<br />
F9F8675E8F3925BC<span class="Apple-tab-span" style="white-space: pre;"> </span>7<br />
FA0ECA96038AD21E<span class="Apple-tab-span" style="white-space: pre;"> </span>6<br />
FA27C381A64FFDA5<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
FA75BB73B37F9E91<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
FB02D1A76ED1E308<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
FB3EA7AB8B51C95D<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
FB91EB2A6E481F1A<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
FBC6BF4991AE18A7<span class="Apple-tab-span" style="white-space: pre;"> </span>7<br />
FBD3AC3CACEBE693<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
FCBB8401805D783F<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
FCE735441720FBE8<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
FD2253483C3B15DC<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
FD2A6A330C3F58DB<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
FD3373744827EFA7<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
FD4BB9A09080B726<span class="Apple-tab-span" style="white-space: pre;"> </span>2<br />
FD83D5C547D3EA2E<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
FDCC0A3F96D1C47A<span class="Apple-tab-span" style="white-space: pre;"> </span>10<br />
FE106E193F938B17<span class="Apple-tab-span" style="white-space: pre;"> </span>3<br />
FE33FDC5FAE7EB96<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
FE785BA19AAA3CBB<span class="Apple-tab-span" style="white-space: pre;"> </span>10<br />
FEA681A240A74D76<span class="Apple-tab-span" style="white-space: pre;"> </span>4<br />
FF5C9156B2D27FBD<span class="Apple-tab-span" style="white-space: pre;"> </span>1<br />
FFA4F354D3948CFB<span class="Apple-tab-span" style="white-space: pre;"> </span>6<br />
FFCA848089F3BA8C<span class="Apple-tab-span" style="white-space: pre;"> </span>1</div>
Karthikhttp://www.blogger.com/profile/18220563988883365640noreply@blogger.com0tag:blogger.com,1999:blog-1743557738889130058.post-33305734359660907682014-08-25T05:07:00.006-07:002014-08-25T05:07:53.157-07:00Apache Hive Working Example<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="color: blue;"># Opening Hive Terminal</span><br />
<br />
Open a terminal and type "hive" to start the Hive CLI.<br />
<br />
<span style="color: blue;"># Listing all the databases</span><br />
<br />
hive> show databases;<br />
OK<br />
default<br />
Time taken: 0.022 seconds, Fetched: 1 row(s)<br />
<br />
<span style="color: blue;"># Creating a New Database in Hive</span><br />
<br />
hive> CREATE DATABASE employee;<br />
OK<br />
Time taken: 0.069 seconds<br />
<br />
<span style="color: blue;"># Listing the created databases</span><br />
<br />
hive> show databases; <br />
OK<br />
default<br />
employee<br />
Time taken: 0.012 seconds, Fetched: 2 row(s)<br />
<br />
<span style="color: blue;"># Choosing/Selecting the Database</span><br />
<br />
hive> use employee;<br />
OK<br />
Time taken: 0.018 seconds<br />
<br />
<span style="color: blue;"># Creating a New Table in the Database</span><br />
<br />
hive> CREATE TABLE country_list (name STRING);<br />
OK<br />
Time taken: 0.083 seconds<br />
<br />
<span style="color: blue;"># Listing the Tables in the Database</span><br />
<br />
hive> show tables;<br />
OK<br />
country_list<br />
Time taken: 0.03 seconds, Fetched: 1 row(s)<br />
<br />
<span style="color: blue;"># Loading a dataset into Hive from HDFS</span><br />
<br />
hive> LOAD DATA INPATH '/user/bigdata/hive/country_example.tsv' OVERWRITE INTO TABLE country_list;<br />
Loading data to table employee.country_list<br />
Deleted hdfs://bigdata-karthik:9000/user/hive/warehouse/employee.db/country_list<br />
Table employee.country_list stats: [numFiles=1, numRows=0, totalSize=38, rawDataSize=0]<br />
OK<br />
Time taken: 0.179 seconds<br />
<br />
<span style="color: blue;"># Selecting All Rows from the Table</span><br />
<br />
hive> select * from country_list;<br />
OK<br />
Atlantis<br />
Albania<br />
China<br />
France<br />
Russia<br />
<br />
Time taken: 0.056 seconds, Fetched: 6 row(s)<br />
hive><br />
<br />
<span style="color: blue;"># Beforehand, I copied the TSV file into Hadoop HDFS</span><br />
<br />
bigdata@bigdata-karthik:~$ hadoop fs -ls /user/bigdata/hive<br />
bigdata@bigdata-karthik:~$ hadoop fs -put /home/bigdata/Downloads/hive/country_example.tsv /user/bigdata/hive<br />
bigdata@bigdata-karthik:~$ hadoop fs -ls /user/bigdata/hive<br />
Found 1 items<br />
-rw-r--r-- 1 bigdata supergroup 38 2014-07-14 18:14 /user/bigdata/hive/country_example.tsv<br />
bigdata@bigdata-karthik:~$ </div>
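The interactive session above can be replayed in one go from the shell. This is a minimal sketch, assuming a Hadoop 2.x client (for `-mkdir -p` and `-put -f`) and the hive CLI on the PATH; `build_hql` and `run_all` are illustrative names, and `IF NOT EXISTS` is added (it is not in the session) so the script can be re-run.

```shell
#!/bin/bash
# Sketch: replay the Hive session above non-interactively.
# Assumes a Hadoop 2.x client and the hive CLI on PATH.
# Paths mirror the ones used in the session.

LOCAL_TSV="/home/bigdata/Downloads/hive/country_example.tsv"
HDFS_DIR="/user/bigdata/hive"

# Print the HiveQL statements typed at the hive> prompt.
# IF NOT EXISTS (an addition) lets the script run more than once.
build_hql() {
  cat <<EOF
CREATE DATABASE IF NOT EXISTS employee;
USE employee;
CREATE TABLE IF NOT EXISTS country_list (name STRING);
LOAD DATA INPATH '$1' OVERWRITE INTO TABLE country_list;
SELECT * FROM country_list;
EOF
}

# Copy the TSV into HDFS first, then hand all statements to hive -e.
run_all() {
  local hdfs_file
  hdfs_file="$HDFS_DIR/$(basename "$LOCAL_TSV")"
  hadoop fs -mkdir -p "$HDFS_DIR" &&
    hadoop fs -put -f "$LOCAL_TSV" "$HDFS_DIR" &&
    hive -e "$(build_hql "$hdfs_file")"
}

# Usage: run_all
```

Order matters here: the file must already be in HDFS when LOAD DATA INPATH runs. Because that statement moves the file into the table's warehouse directory rather than copying it, /user/bigdata/hive/country_example.tsv no longer exists after the load.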
Karthikhttp://www.blogger.com/profile/18220563988883365640noreply@blogger.com1tag:blogger.com,1999:blog-1743557738889130058.post-10702271204492813092014-08-25T04:47:00.002-07:002014-08-25T04:47:53.226-07:00HBase : Unix/Shell Script File for Creating,Putting/Disabling/Droping tables and Inserting bulk load of datasets from Hadoop HDFS<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="color: blue;">#!/bin/bash</span><br />
<br />
<span style="color: blue;"># Declaring Variables</span><br />
table0="employee"<br />
<br />
path=$(pwd);<br />
<br />
<span style="color: blue;"># Creating the employee table in HBase</span><br />
<span style="color: blue;"><br /></span>
echo "exists '$table0'" | hbase shell > log<br />
if grep -q "Table $table0 does exist" log; then<br />
echo "************ table already exists **********"<br />
<br />
<span style="color: blue;"># You can either truncate the table or disable and drop it</span><br />
<span style="color: blue;"><br /></span>
echo "disable '$table0'" | hbase shell<br />
echo "drop '$table0'" | hbase shell<br />
# echo "truncate '$table0'" | hbase shell<br />
echo "create '$table0','count'" | hbase shell<br />
<br />
<span style="color: blue;"># You can either run the shell commands inline or call a separate .sh file; a separate file is used here</span><br />
<span style="color: blue;"><br /></span>
<span class="Apple-tab-span" style="white-space: pre;"> </span>cd "$path/depends"<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>chmod +x hbase-script.sh<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>./hbase-script.sh | hbase shell<br />
else<br />
echo "*********** need to create a table **********"<br />
echo "create '$table0','count'" | hbase shell<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>cd "$path/depends"<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>chmod +x hbase-script.sh<br />
<span class="Apple-tab-span" style="white-space: pre;"> </span>./hbase-script.sh | hbase shell<br />
fi<br />
<br />
<span style="color: blue;"># Copying the exported table0 data to HDFS, importing it into the table and cleaning up</span><br />
<span style="color: blue;"><br /></span>
echo ${HADOOP_HOME}<br />
username="$USER"<br />
<br />
hadoop fs -copyFromLocal "$path/depends/table0" "/user/$username/hbasetable/table0"<br />
hbase org.apache.hadoop.hbase.mapreduce.Import "$table0" "/user/$username/hbasetable/table0"<br />
hadoop fs -rmr "/user/$username/hbasetable"<br />
exit</div>
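The script pipes the output of depends/hbase-script.sh into hbase shell, but that helper is not shown in the post. Below is a hypothetical sketch of what such a helper could look like. It prints one HBase shell `put` command per tab-separated line of an input file. The input layout (row key, tab, value), the `emit_puts` name and the `count:value` column are all assumptions; only the `count` column family comes from the script above.

```shell
#!/bin/bash
# Hypothetical stand-in for depends/hbase-script.sh.
# Prints HBase shell "put" commands so they can be piped into `hbase shell`.
# Assumed input: one "rowkey<TAB>value" pair per line.
# Assumed column: count:value (the script only defines the 'count' family).

emit_puts() {
  local table="$1" file="$2" key val
  # The "|| [ -n "$key" ]" part also processes a last line with no trailing newline.
  while IFS=$'\t' read -r key val || [ -n "$key" ]; do
    [ -z "$key" ] && continue   # skip blank lines
    printf "put '%s','%s','count:value','%s'\n" "$table" "$key" "$val"
  done < "$file"
}

# Example: emit_puts employee rows.tsv | hbase shell
```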
Karthikhttp://www.blogger.com/profile/18220563988883365640noreply@blogger.com12tag:blogger.com,1999:blog-1743557738889130058.post-41147788506362168072014-02-07T03:21:00.002-08:002014-02-07T04:08:25.186-08:00Hadoop Books<div dir="ltr" style="text-align: left;" trbidi="on">
<b><span style="color: blue;">Hadoop: The Definitive Guide - Tom White :</span></b><br />
<b><span style="color: blue;"><br />
</span></b> <iframe height="600" src="https://docs.google.com/file/d/0B9uwIxVfFZMOcks2VklodmxUaEU/preview" width="1024"></iframe><br />
<b><span style="color: blue;"><br />
</span></b> <b><span style="color: blue;">Hadoop MapReduce Cookbook : </span></b><br />
<b><span style="color: blue;"><br />
</span></b> <iframe height="600" src="https://docs.google.com/file/d/0B9uwIxVfFZMOQjI3NU9ub1poWWc/preview" width="1024"></iframe><br />
<b><span style="color: blue;"><br />
</span></b> <b><span style="color: blue;">Hadoop Operations : </span></b><br />
<b><span style="color: blue;"><br />
</span></b> <iframe height="600" src="https://docs.google.com/file/d/0B9uwIxVfFZMOVXVfVnlEM0FreWs/preview" width="1024"></iframe><br />
<b><span style="color: blue;"><br />
</span></b> <b><span style="color: blue;">Hadoop In Action : </span></b><br />
<b><span style="color: blue;"><br />
</span></b> <iframe height="600" src="https://docs.google.com/file/d/0B9uwIxVfFZMOTDhtWEJ3ekZmdGc/preview" width="1024"></iframe><br />
<span style="color: blue;"><b><br />
</b></span> <span style="color: blue;"><b>Hive : </b></span><br />
<span style="color: blue;"><b><br />
</b></span> <iframe height="600" src="https://docs.google.com/file/d/0B9uwIxVfFZMOQkJxOGlPemxhRlk/preview" width="1024"></iframe><br />
<span style="color: blue;"><b><br />
</b></span> <span style="color: blue;"><b>Mapred Tutorial : </b></span><br />
<span style="color: blue;"><b><br />
</b></span> <iframe height="600" src="https://docs.google.com/file/d/0B9uwIxVfFZMOQVpWM0RVa1Y5Rm8/preview" width="1024"></iframe><br />
<b style="color: blue;"><br />
</b> <b style="color: blue;">MapReduce Design Patterns :</b><br />
<b style="color: blue;"><br />
</b> <iframe height="600" src="https://docs.google.com/file/d/0B9uwIxVfFZMOOVZYXzV1cndQZjg/preview" width="1200"></iframe></div>
Karthikhttp://www.blogger.com/profile/18220563988883365640noreply@blogger.com0tag:blogger.com,1999:blog-1743557738889130058.post-59448834166075079262014-01-27T06:46:00.001-08:002014-01-27T06:46:23.980-08:00Executing Hive Scripts<div dir="ltr" style="text-align: left;" trbidi="on">
<h3 style="background-color: white; border-bottom-color: rgb(204, 204, 204); border-bottom-style: dashed; border-width: 0px 0px 1px; clear: both; font-weight: normal; line-height: 1em; margin: 0px 0px 10px; padding: 0px; vertical-align: baseline;">
<strong><span style="font-family: inherit; font-size: small;">Step 1: Writing a Hive script.</span></strong></h3>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; vertical-align: baseline;">
<span style="font-family: inherit;">Save the Hive script in a file with a .sql extension (Hive does not require a particular extension; .hql is also common). Open a terminal in your Cloudera CDH4 distribution and run the following command to create the script file.<br /><strong>Command:</strong> sudo gedit sample.sql</span></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; vertical-align: baseline;">
<img alt="how to run hive scripts" class="size-full wp-image-5758 alignleft maxwidth" height="21" src="http://www.edureka.in/blog/wp-content/uploads/2013/10/01.jpg" style="border: 0px; display: inline; float: left; font-style: inherit; font-weight: inherit; height: auto; margin: 0px 1em 1em 0px; max-width: 100%; padding: 0px; text-align: center; vertical-align: baseline;" title="how to run hive scripts" width="602" /></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; vertical-align: baseline;">
<span style="font-family: inherit;"><br /></span></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; vertical-align: baseline;">
<span style="font-family: inherit;">This opens sample.sql in the editor, where you enter all the Hive commands to be executed.</span></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; vertical-align: baseline;">
<span style="font-family: inherit;">This script creates a table, describes it, loads data into it and then retrieves the data from it.</span></div>
<h4 style="background-color: white; border: 0px; clear: both; font-weight: normal; line-height: 1em; margin: 0px 0px 5px; padding: 0px; vertical-align: baseline;">
<span style="font-family: inherit;">1. Creating the table in Hive:</span></h4>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; vertical-align: baseline;">
<span style="font-family: inherit;"><strong>Command:</strong> create table product (productid int, productname string, price float, category string) row format delimited fields terminated by ',';</span></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; vertical-align: baseline;">
<span style="font-family: inherit;">Here, product is the table name and productid, productname, price and category are its columns.</span></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; vertical-align: baseline;">
<span style="font-family: inherit;">FIELDS TERMINATED BY ',' indicates that the columns in the input file are separated by commas.</span></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; vertical-align: baseline;">
<span style="font-family: inherit;">By default, the records in the input file are separated by newlines.</span></div>
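To make the row format concrete, the sketch below splits one input line the way the table definition describes it: each newline-terminated line is one row, and commas separate its fields. It is a rough bash illustration, not Hive's parser. `split_record` is a hypothetical name, and the NULL handling (missing trailing fields become NULL, extra fields are ignored) is an approximation of Hive's default text format; type conversion is not modeled.

```shell
#!/bin/bash
# Sketch: map one comma-delimited line onto the product columns.
# Approximates Hive's default delimited text handling:
# missing trailing fields -> NULL, extra fields -> ignored.

split_record() {
  local -a fields
  local -a cols=(productid productname price category)
  local i
  # Split the record on commas into an array.
  IFS=',' read -r -a fields <<< "$1"
  for i in "${!cols[@]}"; do
    # ${var-NULL} substitutes NULL only when the field is absent.
    printf '%s=%s\n' "${cols[$i]}" "${fields[$i]-NULL}"
  done
}

# Example: split_record '1,pen,10.5,stationery'
```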
<h4 style="background-color: white; border: 0px; clear: both; font-weight: normal; line-height: 1em; margin: 0px 0px 5px; padding: 0px; vertical-align: baseline;">
<span style="font-family: inherit;">2. Describing the table:</span></h4>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; vertical-align: baseline;">
<span style="font-family: inherit;"><strong>Command:</strong> describe product;</span></div>
<h4 style="background-color: white; border: 0px; clear: both; font-weight: normal; line-height: 1em; margin: 0px 0px 5px; padding: 0px; vertical-align: baseline;">
<span style="font-family: inherit;">3. Loading the data into the table.</span></h4>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; vertical-align: baseline;">
<span style="font-family: inherit;">Before loading data into the table, we need to create an input file containing the records to be inserted.</span></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; vertical-align: baseline;">
<span style="font-family: inherit;">Let us create an input file. The command is:</span></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; vertical-align: baseline;">
<span style="font-family: inherit;"><strong>Command:</strong> sudo gedit input.txt</span></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; vertical-align: baseline;">
<img alt="how to run hive scripts" class="size-full wp-image-5760 alignleft maxwidth" height="17" src="http://www.edureka.in/blog/wp-content/uploads/2013/10/02.jpg" style="border: 0px; display: inline; float: left; font-style: inherit; font-weight: inherit; height: auto; margin: 0px 1em 1em 0px; max-width: 100%; padding: 0px; text-align: center; vertical-align: baseline;" title="how to run hive scripts" width="602" /></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; vertical-align: baseline;">
<span style="font-family: inherit;">Enter the records in the file as shown in the figure.</span></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; vertical-align: baseline;">
<img alt="how to run hive scripts" class="size-full wp-image-5764 alignleft maxwidth" height="248" src="http://www.edureka.in/blog/wp-content/uploads/2013/10/03.jpg" style="border: 0px; display: inline; float: left; font-style: inherit; font-weight: inherit; height: auto; margin: 0px 1em 1em 0px; max-width: 100%; padding: 0px; text-align: center; vertical-align: baseline;" title="how to run hive scripts" width="602" /></div>
<h4 style="background-color: white; border: 0px; clear: both; font-weight: normal; line-height: 1em; margin: 0px 0px 5px; padding: 0px; vertical-align: baseline;">
</h4>
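Instead of typing the records into gedit, you can create input.txt from the shell. The sketch below is a minimal example. The three rows are hypothetical placeholders rather than the records shown in the screenshot, and `make_input` is an illustrative helper name. The file has no header row, because Hive would load a header line as an ordinary record.

```shell
#!/bin/bash
# Sketch: create input.txt non-interactively.
# The rows are hypothetical placeholders, not the post's data.
# Column order: productid,productname,price,category

make_input() {
  local out="${1:-input.txt}"
  # One record per line, comma-separated fields, no header row.
  cat > "$out" <<'EOF'
1,pen,10.5,stationery
2,notebook,45.0,stationery
3,mouse,350.0,electronics
EOF
}

# Example: make_input input.txt
```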
<h4 style="background-color: white; border: 0px; clear: both; font-weight: normal; line-height: 1em; margin: 0px 0px 5px; padding: 0px; vertical-align: baseline;">
<span style="border: 0px; font-style: inherit; font-weight: inherit; margin: 0px; padding: 0px; vertical-align: baseline;"><span style="font-family: inherit;">4. Retrieving the data:</span></span></h4>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; vertical-align: baseline;">
<span style="font-family: inherit;">To retrieve the data, the select command is used.</span></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; vertical-align: baseline;">
<span style="font-family: inherit;"><strong>Command:</strong> Select * from product;</span></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; vertical-align: baseline;">
<span style="font-family: inherit;">This command retrieves the values of all the columns in the table. The complete script should look like the one shown in the image below.</span></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; vertical-align: baseline;">
<span style="font-family: inherit;">This completes the Hive script; sample.sql can now be saved.</span></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; vertical-align: baseline;">
<img alt="how to run hive scripts" class="size-full wp-image-5766 alignleft maxwidth" height="273" src="http://www.edureka.in/blog/wp-content/uploads/2013/10/04.jpg" style="border: 0px; display: inline; float: left; font-style: inherit; font-weight: inherit; height: auto; margin: 0px 1em 1em 0px; max-width: 100%; padding: 0px; text-align: center; vertical-align: baseline;" title="how to run hive scripts" width="602" /></div>
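<div style="text-align: left;">
If you would rather build the script from the terminal than in an editor, the statement from this step can be appended to the file with a shell heredoc. This is a minimal sketch, not the method used in this post: it assumes the product table is created by statements written earlier in the same file, and SCRIPT_FILE is a hypothetical variable that defaults to sample.sql in the current directory (the post itself uses /home/cloudera/sample.sql).</div>

```shell
#!/usr/bin/env bash
# Minimal sketch: append the retrieval statement to the Hive script file.
# Assumption: the 'product' table is created by statements written earlier
# in the same file. SCRIPT_FILE is a hypothetical variable for the path.
SCRIPT_FILE="${SCRIPT_FILE:-./sample.sql}"

# The quoted delimiter ('EOF') makes the shell write the lines exactly as
# typed, with no variable or command substitution.
cat >> "$SCRIPT_FILE" <<'EOF'
-- Retrieve the values of every column in the product table.
SELECT * FROM product;
EOF

echo "Appended SELECT statement to $SCRIPT_FILE"
```

Appending (>>) rather than overwriting (>) keeps any statements already in the file.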
<h3 style="background-color: white; border-bottom-color: rgb(204, 204, 204); border-bottom-style: dashed; border-width: 0px 0px 1px; clear: both; font-weight: normal; line-height: 1em; margin: 0px 0px 10px; padding: 0px; vertical-align: baseline;">
<span style="border: 0px; font-style: inherit; font-weight: inherit; margin: 0px; padding: 0px; vertical-align: baseline;"><span style="font-family: inherit; font-size: small;">Step 2: Running the Hive Script</span></span></h3>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; vertical-align: baseline;">
<span style="font-family: inherit;">The following is the command to run the Hive script:</span></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; vertical-align: baseline;">
<span style="font-family: inherit;"><strong>Command:</strong> hive -f /home/cloudera/sample.sql</span></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; vertical-align: baseline;">
<span style="font-family: inherit;"><img alt="how to run hive scripts" class="maxwidth" height="17" src="http://www.edureka.in/blog/wp-content/uploads/2013/10/05.jpg" style="border: 0px; font-style: inherit; font-weight: inherit; height: auto; margin: 0px; max-width: 100%; padding: 0px; text-align: center; vertical-align: baseline;" title="how to run hive scripts" width="602" /></span></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; vertical-align: baseline;">
<span style="font-family: inherit;">When running the script, pass the full path to the script file.</span></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; vertical-align: baseline;">
<span style="font-family: inherit;">The output shows that all the commands executed successfully.</span></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; vertical-align: baseline;">
<span style="font-family: inherit;"><img alt="how to run hive scripts" class="maxwidth" height="418" src="http://www.edureka.in/blog/wp-content/uploads/2013/10/06.jpg" style="border: 0px; font-style: inherit; font-weight: inherit; height: auto; margin: 0px; max-width: 100%; padding: 0px; text-align: center; vertical-align: baseline;" title="how to run hive scripts" width="602" /></span></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; vertical-align: baseline;">
<span style="font-family: inherit;">This is how Hive scripts are run in CDH4.</span></div>
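<div style="text-align: left;">
Since the script has to be given by its full path, a small wrapper can resolve the path before calling hive -f, so the same call works from any directory. This is a minimal sketch; run_hive_script and the HIVE_BIN override are hypothetical names introduced here so the function can be tried without a Hive installation.</div>

```shell
#!/usr/bin/env bash
# Minimal sketch: run a Hive script with `hive -f`, always passing the
# script's absolute path so it works from any working directory.
# HIVE_BIN is a hypothetical override; the `hive` CLI on PATH is the default.
run_hive_script() {
  local script="$1"
  if [ ! -f "$script" ]; then
    echo "Hive script not found: $script" >&2
    return 1
  fi
  # Resolve the containing directory to an absolute path.
  local dir
  dir="$(cd "$(dirname "$script")" && pwd)" || return 1
  "${HIVE_BIN:-hive}" -f "$dir/$(basename "$script")"
}
```

For example, calling run_hive_script sample.sql from /home/cloudera ends up executing hive -f /home/cloudera/sample.sql, the same command shown above.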
</div>
Karthikhttp://www.blogger.com/profile/18220563988883365640noreply@blogger.com0tag:blogger.com,1999:blog-1743557738889130058.post-39529944006765515122014-01-27T06:40:00.006-08:002014-01-27T06:42:00.886-08:00Apache Hive Installation on Ubuntu<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="background-color: white; border: 0px; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: left; vertical-align: baseline;">
<b><span style="color: blue; font-family: inherit;">Hive Installation on Ubuntu:</span></b></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: left; vertical-align: baseline;">
<span style="font-family: inherit;">Follow the steps below to install <b>Apache Hive</b> on Ubuntu:</span></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: left; vertical-align: baseline;">
<span style="font-family: inherit;"><b>Step 1:</b> Download the <b>Hive</b> tarball.</span></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: left; vertical-align: baseline;">
<b><span style="font-family: inherit;">Command: wget -c http://archive.apache.org/dist/hive/hive-0.9.0/hive-0.9.0-bin.tar.gz</span></b></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: center; vertical-align: baseline;">
<a class="fancybox" href="http://www.edureka.in/blog/wp-content/uploads/2014/01/hive1.png" rel="gallery" style="border: 0px; color: #368ba1; font-style: inherit; font-weight: inherit; margin: 0px; padding: 0px; text-decoration: none; vertical-align: baseline;"><span style="font-family: inherit;"><img alt="Hive installation on ubuntu - 1" class="aligncenter size-full wp-image-7101 maxwidth" src="http://www.edureka.in/blog/wp-content/uploads/2014/01/hive1.png" height="367" style="border: none; clear: both; display: block; font-style: inherit; font-weight: inherit; height: auto; margin: 20px auto; max-width: 100%; padding: 0px; vertical-align: baseline;" title="Apache Hive installation on ubuntu - 1" width="882" /></span></a></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: left; vertical-align: baseline;">
<span style="font-family: inherit;"><b>Step 2: </b>Extract the <b>tar</b> file.</span></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: left; vertical-align: baseline;">
<b><span style="font-family: inherit;">Command: tar -xzvf hive-0.9.0-bin.tar.gz</span></b></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: center; vertical-align: baseline;">
<b><a class="fancybox" href="http://www.edureka.in/blog/wp-content/uploads/2014/01/hive2.png" rel="gallery" style="border: 0px; color: #368ba1; font-style: inherit; font-weight: inherit; margin: 0px; padding: 0px; text-decoration: none; vertical-align: baseline;"><span style="font-family: inherit;"><img alt="Hive installation on ubuntu - 2" class="aligncenter size-full wp-image-7102 maxwidth" src="http://www.edureka.in/blog/wp-content/uploads/2014/01/hive2.png" height="27" style="border: none; clear: both; display: block; font-style: inherit; font-weight: inherit; height: auto; margin: 20px auto; max-width: 100%; padding: 0px; vertical-align: baseline;" title="Apache Hive installation on ubuntu - 2" width="677" /></span></a></b></div>
<div style="text-align: left;">
<span style="font-family: inherit;"><b>Step 3: </b>Edit the <b>“.bashrc”</b> file to update the environment variables for the current user.</span></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: left; vertical-align: baseline;">
<b><span style="font-family: inherit;">Command: sudo gedit .bashrc</span></b></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: center; vertical-align: baseline;">
<a class="fancybox" href="http://www.edureka.in/blog/wp-content/uploads/2014/01/hive3.png" rel="gallery" style="border: 0px; color: #368ba1; font-style: inherit; font-weight: inherit; margin: 0px; padding: 0px; text-decoration: none; vertical-align: baseline;"><span style="font-family: inherit;"><img alt="Hive installation on ubuntu - 3" class="aligncenter size-full wp-image-7103 maxwidth" src="http://www.edureka.in/blog/wp-content/uploads/2014/01/hive3.png" height="60" style="border: none; clear: both; display: block; font-style: inherit; font-weight: inherit; height: auto; margin: 20px auto; max-width: 100%; padding: 0px; vertical-align: baseline;" title="Apache hive installation on ubuntu - 3" width="720" /></span></a></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: left; vertical-align: baseline;">
<span style="font-family: inherit;">Add the following at the end of the file:</span></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: left; vertical-align: baseline;">
<b><span style="font-family: inherit;">export HADOOP_HOME=/home/user/hadoop-1.2.0</span></b></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: left; vertical-align: baseline;">
<b style="line-height: 1.75em;"><span style="font-family: inherit;">export HIVE_HOME=/home/user/hive-0.9.0-bin</span></b></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: left; vertical-align: baseline;">
<b><span style="font-family: inherit;">export PATH=$PATH:$HIVE_HOME/bin</span></b></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: left; vertical-align: baseline;">
<b><span style="font-family: inherit;">export PATH=$PATH:$HADOOP_HOME/bin</span></b></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: center; vertical-align: baseline;">
<b><a class="fancybox" href="http://www.edureka.in/blog/wp-content/uploads/2014/01/hive4.png" rel="gallery" style="border: 0px; color: #368ba1; font-style: inherit; font-weight: inherit; margin: 0px; padding: 0px; text-decoration: none; vertical-align: baseline;"><span style="font-family: inherit;"><img alt="Hive installation on ubuntu - 4" class="aligncenter size-full wp-image-7104 maxwidth" src="http://www.edureka.in/blog/wp-content/uploads/2014/01/hive4.png" height="535" style="border: none; clear: both; display: block; font-style: inherit; font-weight: inherit; height: auto; margin: 20px auto; max-width: 100%; padding: 0px; vertical-align: baseline;" title="Apache hive installation on ubuntu - 4" width="669" /></span></a></b></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: left; vertical-align: baseline;">
<span style="font-family: inherit;">Save the file, then run <b>source ~/.bashrc</b> (or open a new terminal) so that the new variables take effect.</span></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: left; vertical-align: baseline;">
<span style="font-family: inherit;"><b>Step 4: </b>Create the <b>Hive</b> directories in <strong>HDFS</strong>.</span></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: left; vertical-align: baseline;">
<b><span style="font-family: inherit;">Commands:</span></b></div>
<ul style="background-color: white; border: 0px; color: #333333; line-height: 21.280000686645508px; list-style: square; margin: 0px 0px 20px 18px; padding: 0px; vertical-align: baseline;">
<li style="border: 0px; font-style: inherit; font-weight: inherit; margin: 0px; padding: 0px; vertical-align: baseline;"><b><span style="font-family: inherit;">hadoop fs -mkdir /user/hive/warehouse</span></b></li>
<li style="border: 0px; font-style: inherit; font-weight: inherit; margin: 0px; padding: 0px; vertical-align: baseline;"><b><span style="font-family: inherit;">hadoop fs -mkdir /temp</span></b></li>
</ul>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: left; vertical-align: baseline;">
<span style="font-family: inherit;">The <b>‘warehouse’</b> directory is where Hive stores its tables and their data.</span></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: center; vertical-align: baseline;">
<a class="fancybox" href="http://www.edureka.in/blog/wp-content/uploads/2014/01/hive5.png" rel="gallery" style="border: 0px; color: #368ba1; font-style: inherit; font-weight: inherit; margin: 0px; padding: 0px; text-decoration: none; vertical-align: baseline;"><span style="font-family: inherit;"><img alt="Hive installation on ubuntu - 5" class="aligncenter size-full wp-image-7105 maxwidth" src="http://www.edureka.in/blog/wp-content/uploads/2014/01/hive5.png" height="44" style="border: none; clear: both; display: block; font-style: inherit; font-weight: inherit; height: auto; margin: 20px auto; max-width: 100%; padding: 0px; vertical-align: baseline;" title=" Apache hive installation on ubuntu - 5" width="800" /></span></a></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: left; vertical-align: baseline;">
<span style="font-family: inherit;">The <b>‘temp’</b> directory is a temporary location for the intermediate results of processing.</span></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: center; vertical-align: baseline;">
<b><a class="fancybox" href="http://www.edureka.in/blog/wp-content/uploads/2014/01/hive6.png" rel="gallery" style="border: 0px; color: #368ba1; font-style: inherit; font-weight: inherit; margin: 0px; padding: 0px; text-decoration: none; vertical-align: baseline;"><span style="font-family: inherit;"><img alt="Hive installation on ubuntu - 6" class="aligncenter size-full wp-image-7106 maxwidth" src="http://www.edureka.in/blog/wp-content/uploads/2014/01/hive6.png" height="45" style="border: none; clear: both; display: block; font-style: inherit; font-weight: inherit; height: auto; margin: 20px auto; max-width: 100%; padding: 0px; vertical-align: baseline;" title="Apache hive installation on ubuntu - 6" width="716" /></span></a></b></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: left; vertical-align: baseline;">
<span style="font-family: inherit;"><b>Step 5: </b>Give the group write permission on the Hive directories.</span></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: left; vertical-align: baseline;">
<b><span style="font-family: inherit;">Commands:</span></b></div>
<ul style="background-color: white; border: 0px; color: #333333; line-height: 21.280000686645508px; list-style: square; margin: 0px 0px 20px 18px; padding: 0px; vertical-align: baseline;">
<li style="border: 0px; font-style: inherit; font-weight: inherit; margin: 0px; padding: 0px; vertical-align: baseline;"><b><span style="font-family: inherit;">hadoop fs -chmod g+w /user/hive/warehouse</span></b></li>
<li style="border: 0px; font-style: inherit; font-weight: inherit; margin: 0px; padding: 0px; vertical-align: baseline;"><b><span style="font-family: inherit;">hadoop fs -chmod g+w /temp</span></b></li>
</ul>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: left; vertical-align: baseline;">
<span style="font-family: inherit;">These commands give the owning group write permission on both directories:</span></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: center; vertical-align: baseline;">
<a class="fancybox" href="http://www.edureka.in/blog/wp-content/uploads/2014/01/hive7.png" rel="gallery" style="border: 0px; color: #368ba1; font-style: inherit; font-weight: inherit; margin: 0px; padding: 0px; text-decoration: none; vertical-align: baseline;"><span style="font-family: inherit;"><img alt="Hive installation on ubuntu - 7" class="aligncenter size-full wp-image-7107 maxwidth" src="http://www.edureka.in/blog/wp-content/uploads/2014/01/hive7.png" height="43" style="border: none; clear: both; display: block; font-style: inherit; font-weight: inherit; height: auto; margin: 20px auto; max-width: 100%; padding: 0px; vertical-align: baseline;" title="Apache hive installation on ubuntu - 7" width="854" /></span></a></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: center; vertical-align: baseline;">
<a class="fancybox" href="http://www.edureka.in/blog/wp-content/uploads/2014/01/hive8.png" rel="gallery" style="border: 0px; color: #368ba1; font-style: inherit; font-weight: inherit; margin: 0px; padding: 0px; text-decoration: none; vertical-align: baseline;"><span style="font-family: inherit;"><img alt="Hive installation on ubuntu - 8" class="aligncenter size-full wp-image-7108 maxwidth" src="http://www.edureka.in/blog/wp-content/uploads/2014/01/hive8.png" height="43" style="border: none; clear: both; display: block; font-style: inherit; font-weight: inherit; height: auto; margin: 20px auto; max-width: 100%; padding: 0px; vertical-align: baseline;" title="Apache hive installation on ubuntu - 8" width="733" /></span></a></div>
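<div style="text-align: left;">
The HDFS commands from Steps 4 and 5 can be collected into one shell function. This is a minimal sketch, assuming Hadoop 1.x as used in this post, where hadoop fs -mkdir also creates missing parent directories (on Hadoop 2 and later, add -p). setup_hive_hdfs_dirs and the HADOOP_BIN override are hypothetical names; the function stops at the first command that fails.</div>

```shell
#!/usr/bin/env bash
# Minimal sketch of Steps 4 and 5: create Hive's HDFS directories and give
# their group write permission. Assumes Hadoop 1.x as in this post, where
# `hadoop fs -mkdir` also creates missing parents (Hadoop 2+ needs -p).
# HADOOP_BIN is a hypothetical override; `hadoop` on PATH is the default.
setup_hive_hdfs_dirs() {
  local hadoop="${HADOOP_BIN:-hadoop}"
  local d
  # /temp follows this post; the Apache Hive Getting Started guide uses /tmp.
  for d in /user/hive/warehouse /temp; do
    "$hadoop" fs -mkdir "$d" || return 1
  done
  for d in /user/hive/warehouse /temp; do
    "$hadoop" fs -chmod g+w "$d" || return 1
  done
}
```

Running setup_hive_hdfs_dirs with HDFS up issues the same four commands, in the same order, as Steps 4 and 5.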
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: left; vertical-align: baseline;">
<span style="font-family: inherit;"><b>Step 6: </b>Set the <b>Hadoop</b> path in the <b>Hive</b> script <b>hive-config.sh</b>, located in <b>$HIVE_HOME/bin</b>.</span></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: left; vertical-align: baseline;">
<b><span style="font-family: inherit;">Command: sudo gedit hive-config.sh</span></b></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: center; vertical-align: baseline;">
<b><a class="fancybox" href="http://www.edureka.in/blog/wp-content/uploads/2014/01/hive9.png" rel="gallery" style="border: 0px; color: #368ba1; font-style: inherit; font-weight: inherit; margin: 0px; padding: 0px; text-decoration: none; vertical-align: baseline;"><span style="font-family: inherit;"><img alt="Hive installation on ubuntu - 9" class="aligncenter size-full wp-image-7109 maxwidth" src="http://www.edureka.in/blog/wp-content/uploads/2014/01/hive9.png" height="137" style="border: none; clear: both; display: block; font-style: inherit; font-weight: inherit; height: auto; margin: 20px auto; max-width: 100%; padding: 0px; vertical-align: baseline;" title="Apache hive installation on ubuntu - 9" width="881" /></span></a></b></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: center; vertical-align: baseline;">
<b><a class="fancybox" href="http://www.edureka.in/blog/wp-content/uploads/2014/01/hive10.png" rel="gallery" style="border: 0px; color: #368ba1; font-style: inherit; font-weight: inherit; margin: 0px; padding: 0px; text-decoration: none; vertical-align: baseline;"><span style="font-family: inherit;"><img alt="Hive installation on ubuntu - 10" class="aligncenter size-full wp-image-7110 maxwidth" src="http://www.edureka.in/blog/wp-content/uploads/2014/01/hive10.png" height="534" style="border: none; clear: both; display: block; font-style: inherit; font-weight: inherit; height: auto; margin: 20px auto; max-width: 100%; padding: 0px; vertical-align: baseline;" title="Apache hive installation on ubuntu - 10" width="664" /></span></a></b></div>
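<div style="text-align: left;">
Steps 3 and 6 are plain text edits, so they can also be scripted. The sketch below appends each line only if it is not already present, so running it twice does not duplicate the exports. The paths are the ones used in this post; the Step 6 line (exporting HADOOP_HOME at the end of hive-config.sh) is an assumption, since the exact edit appears only in the screenshot. append_line_once and configure_hive_env are hypothetical helpers, not part of Hive.</div>

```shell
#!/usr/bin/env bash
# Minimal sketch: apply the Step 3 and Step 6 edits from the shell.
# Paths are the ones used in this post; adjust them to your machine.
HADOOP_HOME_DIR="/home/user/hadoop-1.2.0"
HIVE_HOME_DIR="/home/user/hive-0.9.0-bin"

# Append a line to a file unless that exact line is already there.
append_line_once() {
  local file="$1" line="$2"
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

configure_hive_env() {
  local bashrc="${1:-$HOME/.bashrc}"
  local hive_config="${2:-$HIVE_HOME_DIR/bin/hive-config.sh}"
  # Step 3: environment variables for the current user.
  # Single quotes keep $PATH and $HIVE_HOME literal in the file.
  append_line_once "$bashrc" "export HADOOP_HOME=$HADOOP_HOME_DIR"
  append_line_once "$bashrc" "export HIVE_HOME=$HIVE_HOME_DIR"
  append_line_once "$bashrc" 'export PATH=$PATH:$HIVE_HOME/bin'
  append_line_once "$bashrc" 'export PATH=$PATH:$HADOOP_HOME/bin'
  # Step 6 (assumed edit): point Hive's launcher at the Hadoop install.
  append_line_once "$hive_config" "export HADOOP_HOME=$HADOOP_HOME_DIR"
}
```

Usage: configure_hive_env, then source ~/.bashrc so the variables apply to the current shell.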
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: left; vertical-align: baseline;">
<span style="font-family: inherit;"><b>Step 7</b>: Launch <b>Hive.</b></span></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: left; vertical-align: baseline;">
<b><span style="font-family: inherit;">Command: hive</span></b></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: center; vertical-align: baseline;">
<b><a class="fancybox" href="http://www.edureka.in/blog/wp-content/uploads/2014/01/hive11.png" rel="gallery" style="border: 0px; color: #368ba1; font-style: inherit; font-weight: inherit; margin: 0px; padding: 0px; text-decoration: none; vertical-align: baseline;"><span style="font-family: inherit;"><img alt="Hive installation on ubuntu - 11" class="aligncenter size-full wp-image-7111 maxwidth" src="http://www.edureka.in/blog/wp-content/uploads/2014/01/hive11.png" height="245" style="border: none; clear: both; display: block; font-style: inherit; font-weight: inherit; height: auto; margin: 20px auto; max-width: 100%; padding: 0px; vertical-align: baseline;" title="Apache hive installation on ubuntu - 11" width="883" /></span></a></b></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: left; vertical-align: baseline;">
<span style="font-family: inherit;"><b>Step 8</b>: Create a sample table.</span></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: left; vertical-align: baseline;">
<span style="font-family: inherit;"><b>Command: </b> <strong>hive> CREATE TABLE shakespeare (freq INT, word STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;</strong></span></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: left; vertical-align: baseline;">
<span style="font-family: inherit;"><b> </b><b>Create sample tables:</b></span></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: center; vertical-align: baseline;">
<b><a class="fancybox" href="http://www.edureka.in/blog/wp-content/uploads/2014/01/hive12.png" rel="gallery" style="border: 0px; color: #368ba1; font-style: inherit; font-weight: inherit; margin: 0px; padding: 0px; text-decoration: none; vertical-align: baseline;"><span style="font-family: inherit;"><img alt="Hive installation on ubuntu - 12" class="aligncenter size-full wp-image-7112 maxwidth" src="http://www.edureka.in/blog/wp-content/uploads/2014/01/hive12.png" height="92" style="border: none; clear: both; display: block; font-style: inherit; font-weight: inherit; height: auto; margin: 20px auto; max-width: 100%; padding: 0px; vertical-align: baseline;" title="Apache hive installation on ubuntu - 12" width="883" /></span></a></b></div>
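<div style="text-align: left;">
To give the new table something to query, a small tab-delimited file matching its definition (freq INT, word STRING, fields terminated by '\t') can be generated and loaded with Hive's LOAD DATA LOCAL INPATH statement. This is a minimal sketch: the file name, the sample rows, load_sample_data and HIVE_BIN are made up for illustration, and it assumes the shakespeare table from this step already exists.</div>

```shell
#!/usr/bin/env bash
# Minimal sketch: write a few sample rows that match the shakespeare table
# (freq INT, word STRING, tab-separated) and load them with Hive.
# The file name and the rows are made-up values for illustration only.
DATA_FILE="${DATA_FILE:-$PWD/shakespeare_sample.tsv}"

# printf reuses its format for each pair of arguments: one "freq<TAB>word" row each.
printf '%s\t%s\n' 42 the 17 king 5 crown > "$DATA_FILE"

# Load the file into the existing table. HIVE_BIN is a hypothetical override;
# LOCAL tells Hive to read the file from the local file system, not HDFS.
load_sample_data() {
  "${HIVE_BIN:-hive}" -e "LOAD DATA LOCAL INPATH '$DATA_FILE' INTO TABLE shakespeare;"
}
```

After load_sample_data runs, SELECT * FROM shakespeare; in the Hive shell should list the three sample rows.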
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: left; vertical-align: baseline;">
<span style="font-family: inherit;"><b>Step 9: </b>Exit <b>Hive</b>.</span></div>
<div style="background-color: white; border: 0px; color: #333333; line-height: 1.75em; margin-bottom: 20px; padding: 0px; text-align: left; vertical-align: baseline;">
<b><span style="font-family: inherit;">Command: hive> exit;</span></b></div>
</div>
Karthikhttp://www.blogger.com/profile/18220563988883365640noreply@blogger.com1tag:blogger.com,1999:blog-1743557738889130058.post-3531238962340763012014-01-09T01:48:00.002-08:002014-01-28T03:53:50.170-08:00Video Tutorial for Data mining with Weka from University of Waikato<div dir="ltr" style="text-align: left;" trbidi="on">
<div class="separator" style="clear: both;">
<b><span style="color: blue;">Week - 1 : Getting started with Weka</span></b></div>
<div class="separator" style="clear: both;">
</div>
<ol>
<li><a href="http://youtu.be/piOQDJmLqQs" target="_blank">Introduction</a></li>
<li><a href="http://youtu.be/bCJ5UZZjL4s" target="_blank">Exploring the Explorer</a></li>
<li><a href="http://youtu.be/v7SkVvL46AM" target="_blank">Exploring datasets</a></li>
<li><a href="http://youtu.be/3EC7XwuvksQ" target="_blank">Building a classifier</a></li>
<li><a href="http://youtu.be/KiPwULWd1wI" target="_blank">Using a filter</a></li>
<li><a href="http://youtu.be/jZPmI9Dqtlk" target="_blank">Visualizing your data</a></li>
</ol>
<div>
<div class="separator" style="clear: both;">
<b><span style="color: blue;">Week - 2 : Evaluation</span></b></div>
<div class="separator" style="clear: both;">
</div>
<ol style="text-align: left;">
<li><a href="http://youtu.be/B11pEZQCaOs" target="_blank">Be a classifier!</a></li>
<li><a href="http://youtu.be/Dyz5wj76Ya8" target="_blank">Training and testing</a></li>
<li><a href="http://youtu.be/X7qckodx4o8" target="_blank">Repeated training and testing</a></li>
<li><a href="http://youtu.be/aDiKpQ_y7OY" target="_blank">Baseline accuracy</a></li>
<li><a href="http://youtu.be/nfXq_6r_SGU" target="_blank">Cross-Validation</a></li>
<li><a href="http://youtu.be/ihR9sJJPVw8" target="_blank">Cross-Validation results</a></li>
</ol>
<div>
<div class="separator" style="clear: both;">
<b><span style="color: blue;">Week - 3 : Simple Classifiers</span></b></div>
<div class="separator" style="clear: both;">
</div>
<ol>
<li><a href="http://youtu.be/82GsT7NsQFo" target="_blank">Simplicity first!</a></li>
<li><a href="http://youtu.be/1TT6RRxB2Yc" target="_blank">Overfitting</a></li>
<li><a href="http://youtu.be/7SIsBNaaUKY" target="_blank">Using probabilities</a></li>
<li><a href="http://youtu.be/WifxU_w7KzI" target="_blank">Decision trees</a></li>
<li><a href="http://youtu.be/bdW1_gnZM_8" target="_blank">Pruning decision trees</a></li>
<li><a href="http://youtu.be/PQKeET2X5YE" target="_blank">Nearest neighbor</a></li>
</ol>
<div>
<div class="separator" style="clear: both;">
<b><span style="color: blue;">Week - 4 : More Classifiers</span></b></div>
<div class="separator" style="clear: both;">
</div>
<ol>
<li><a href="http://youtu.be/Lxz6p3_EB7c" target="_blank">Classification boundaries</a></li>
<li><a href="http://youtu.be/KTJEvoJ3YqI" target="_blank">Linear regression</a></li>
<li><a href="http://youtu.be/JA9Q2HBdz84" target="_blank">Classification by regression</a></li>
<li><a href="http://youtu.be/sEzrmkhJ_rs" target="_blank">Logistic regression</a></li>
<li><a href="http://youtu.be/9YvXm_sEUig" target="_blank">Support vector machines</a></li>
<li><a href="http://youtu.be/geTS76MAuhM" target="_blank">Ensemble learning</a></li>
</ol>
<div>
<div class="separator" style="clear: both;">
<b><span style="color: blue;">Week - 5 : Putting it all together</span></b></div>
<div class="separator" style="clear: both;">
</div>
<ol>
<li><a href="http://youtu.be/-SlGUuaI950" target="_blank">The data mining process</a></li>
<li><a href="http://youtu.be/MVjvSMZhLTM" target="_blank">Pitfalls and pratfalls</a></li>
<li><a href="http://youtu.be/TMgpJ0NusGg" target="_blank">Data mining and ethics</a></li>
<li><a href="http://youtu.be/wNcM_d0qIM8" target="_blank">Summary</a></li>
</ol>
</div>
</div>
</div>
</div>
</div>
Karthikhttp://www.blogger.com/profile/18220563988883365640noreply@blogger.com354tag:blogger.com,1999:blog-1743557738889130058.post-19314948536314026012014-01-07T06:08:00.002-08:002014-01-27T23:22:40.409-08:00Video Tutorial for MongoDB DBA Course from MongoDB University<div dir="ltr" style="text-align: left;" trbidi="on">
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
<b><span style="color: blue;">Week - 1 : </span></b></div>
<div class="separator" style="clear: both; text-align: left;">
</div>
<ol style="text-align: left;">
<li><a href="http://youtu.be/WGGjP9fz0ZU" target="_blank">Course Overview</a></li>
<li><a href="http://youtu.be/Mq9cBFsJoqM" target="_blank">Concepts and Philosophy</a></li>
<li><a href="http://youtu.be/IXOYZ3_70IM" target="_blank">Installing on Unix</a></li>
<li><a href="http://youtu.be/ZbxkQ3y6j_I" target="_blank">Installing on Windows</a></li>
<li><a href="http://youtu.be/XfiaTe0q2Ng" target="_blank">JSON Types</a></li>
<li><a href="http://youtu.be/gdbq0lrKWMk" target="_blank">JSON Syntax - 1</a></li>
<li><a href="http://youtu.be/id7WB6_F4i0" target="_blank">JSON Syntax - 2</a></li>
<li><a href="http://youtu.be/rJAiyjJ_tZU" target="_blank">Introduction to BSON</a></li>
<li><a href="http://youtu.be/VhUnT1citd0" target="_blank">What is Mongo shell</a></li>
<li><a href="http://youtu.be/XgkaYVN-Gec" target="_blank">What is JavaScript - 1</a></li>
<li><a href="http://youtu.be/rnXUxdny0B4" target="_blank">What is JavaScript - 2</a></li>
<li><a href="http://youtu.be/9BcQj6Zh5qQ" target="_blank">MongoImport</a></li>
<li><a href="http://youtu.be/IegVM6gubzE" target="_blank">Introduction to the Mongo shell</a></li>
<li><a href="http://youtu.be/H7QIr0u3amk" target="_blank">Shell Queries</a></li>
<li><a href="http://youtu.be/CRVbEpMZ-SE" target="_blank">Shell Sorting </a></li>
<li><a href="http://youtu.be/luZykcwYwnU" target="_blank">Shell Cursors and Shell Help</a></li>
</ol>
<b><span style="color: blue;">Week - 2 : </span></b><br />
<br />
<ol style="text-align: left;">
<li><a href="http://youtu.be/7um0FC7E5NA" target="_blank">Introduction to Week - 2</a></li>
<li><a href="http://youtu.be/fBHJjBK2lLE" target="_blank">Inserting Data</a></li>
<li><a href="http://youtu.be/RJGw0dgOXUI" target="_blank">Updating the Documents</a></li>
<li><a href="http://youtu.be/LQfT6S4arTg" target="_blank">Removing the Documents</a></li>
<li><a href="http://youtu.be/m6RqnLZv9q4" target="_blank">Updating the Documents Part -2 </a></li>
<li><a href="http://youtu.be/ZdTdf_3izI4" target="_blank">MongoDB Commands - 1</a></li>
<li><a href="http://youtu.be/kqlw74tRLPc" target="_blank">MongoDB Commands - 2</a></li>
</ol>
<br />
<b><span style="color: blue;">Week - 3 : </span></b><br />
<br />
<ol>
<li><a href="http://youtu.be/I6zxFt9dkx0" target="_blank">Introduction to Week - 3</a></li>
<li><a href="http://youtu.be/AMV80gD74rQ" target="_blank">Schema Design</a></li>
<li><a href="http://youtu.be/0PAOLHJ8kvE" target="_blank">The Aggregation Framework - 1</a></li>
<li><a href="http://youtu.be/up8Yz3j-NKI" target="_blank">The Aggregation Framework - 2</a></li>
<li><a href="http://youtu.be/X_Fj548vYoQ" target="_blank">More $ Operations</a></li>
<li><a href="http://youtu.be/rkOsMEwAKk8" target="_blank">The FindAndModify Command</a></li>
<li><a href="http://youtu.be/IuJjvVhKEQw" target="_blank">MapReduce</a></li>
</ol>
<div>
<b><span style="color: blue;">Week - 4 : </span></b><br />
<ol>
<li><a href="http://www.youtube.com/watch?v=7DX5i0J5pK4" target="_blank">Introduction to Replication</a></li>
<li>Replica Sets Overview</li>
<li>Replica Sets Demo </li>
<li>Replica Sets Demo (Windows)</li>
<li>Replica Sets - the Simple HTTP Admin UI</li>
<li>Replica Set Configuration</li>
<li>getLastError and Cluster-Wide Commits</li>
<li>Multi data center and sample configurations</li>
<li>ReadPreference (SlaveOK)</li>
</ol>
<b><span style="color: blue;">Week - 5 : </span></b><br />
<br />
<ol>
<li><a href="http://youtu.be/6DZzsa18hVA" target="_blank">Indexes and Optimizing Performance</a></li>
<li><a href="http://youtu.be/yN2YYKiCwvc" target="_blank">Index Types</a></li>
<li><a href="http://youtu.be/u9EgzrBx4z4" target="_blank">Covered Indexes</a></li>
<li><a href="http://youtu.be/18xZ5dCEj1Q" target="_blank">Explain and Hint</a></li>
<li><a href="http://youtu.be/iJ36xwbM9Dw" target="_blank">Read vs Write Tradeoffs</a></li>
<li><a href="http://youtu.be/SmXsGTf8L74" target="_blank">CurrentOp and KillOp</a></li>
<li><a href="http://youtu.be/kwJ904RQC54" target="_blank">The Profiler</a></li>
<li><a href="http://youtu.be/XDIXEnKxqwU" target="_blank">Mongostat and Mongotop</a></li>
<li><a href="http://youtu.be/MAPgzeeapx4" target="_blank">Introduction to MMS Monitoring</a></li>
<li><a href="http://youtu.be/OInX56P2E9E" target="_blank">Overview of MMS</a></li>
<li><a href="http://youtu.be/7eY9EDZAL74" target="_blank">MMS Agent Requires PyMongo</a></li>
<li><a href="http://youtu.be/72L4BnDOHf0" target="_blank">Installing PyMongo (mac)</a></li>
<li><a href="http://youtu.be/OQXYQU9SnPQ" target="_blank">Installing PyMongo (Windows)</a></li>
<li><a href="http://youtu.be/h7ohVLEPbFA" target="_blank">Registering for MMS Monitoring</a></li>
<li><a href="http://youtu.be/wGYuiT-3xwc" target="_blank">MMS Installation (Linux)</a></li>
<li><a href="http://youtu.be/YZaN6hyhnVo" target="_blank">MMS Installation (Windows)</a></li>
</ol>
</div>
<b><span style="color: blue;">Week - 6 : </span></b><br />
<br />
<ol>
<li><a href="http://youtu.be/j2mYoEW9ehk" target="_blank">Introduction to Sharding</a></li>
<li><a href="http://youtu.be/ZLWRe15trOM" target="_blank">Sharding Setup Demo</a></li>
<li><a href="http://youtu.be/Mv-IFEE_COA" target="_blank">The Config Database</a></li>
<li><a href="http://youtu.be/MmZ5IhnOmPQ" target="_blank">Setup Part - 2 Adding the initial Shards</a></li>
<li><a href="http://youtu.be/GLdR4ZsvjMo" target="_blank">Enabling Sharding for a collection</a></li>
<li><a href="http://youtu.be/kiBYgL6XnJw" target="_blank">Working with a Sharded collection</a></li>
<li><a href="http://youtu.be/WU5rIUKJ9Fo" target="_blank">Choosing Shard Keys</a></li>
<li><a href="http://youtu.be/7B-x9nSEhdw" target="_blank">Process and Machine Layout</a></li>
<li><a href="http://youtu.be/cqtX7In_BeA" target="_blank">Bulk Inserts and Pre-Splitting</a></li>
<li><a href="http://youtu.be/10CjrHhW4PI" target="_blank">Further Tips and Best Practices</a></li>
</ol>
<div>
<b><span style="color: blue;">Week - 7 : </span></b><br />
<br />
<ol>
<li><a href="http://youtu.be/eW_vyLKDOO8" target="_blank">Introduction to Security</a></li>
<li><a href="http://youtu.be/51tSE820oik" target="_blank">Security and Clients</a></li>
<li><a href="http://youtu.be/DECkGOasW3M" target="_blank">Intra-Cluster Security</a></li>
<li><a href="http://youtu.be/M_wdmhHZ1Ro" target="_blank">Backups</a></li>
<li><a href="http://youtu.be/1wipz_jlgB0" target="_blank">Backup Strategies</a></li>
<li><a href="http://youtu.be/TIcRAr8WV08" target="_blank">Introduction to MMS Backup</a></li>
<li><a href="http://youtu.be/OgUf0i9wxCA" target="_blank">Installing MMS Backup</a></li>
<li><a href="http://youtu.be/L2KfMXkYpus" target="_blank">Data Recovery with MMS Backup</a></li>
<li><a href="http://youtu.be/_pmDPjERisg" target="_blank">Geospatial Indexes</a></li>
<li><a href="http://youtu.be/Lnr6u-f7UBE" target="_blank">Additional Features</a></li>
<li><a href="http://youtu.be/WW4hjxplGe4" target="_blank">Hardware Tips</a></li>
<li><a href="http://youtu.be/wG6Fk1E8R9c" target="_blank">Additional Resources</a></li>
</ol>
</div>
</div>
Karthikhttp://www.blogger.com/profile/18220563988883365640noreply@blogger.com2tag:blogger.com,1999:blog-1743557738889130058.post-62617648312360004022013-10-28T03:08:00.006-07:002013-10-28T03:09:47.511-07:00RStudio Server: Configuring the Server on Ubuntu<div dir="ltr" style="text-align: left;" trbidi="on">
<h1 style="background-color: white; color: #333333; line-height: 40px; margin: 10px 0px; text-rendering: optimizelegibility;">
<span style="font-family: Times, 'Times New Roman', serif; font-size: small;">Overview</span></h1>
<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">RStudio is configured by adding entries to two configuration files (note that these files do not exist by default so you will need to create them if you wish to specify custom settings):</span></div>
<pre class="code" style="background-color: whitesmoke; border-bottom-left-radius: 4px; border-bottom-right-radius: 4px; border-top-left-radius: 4px; border-top-right-radius: 4px; border: 1px solid rgba(0, 0, 0, 0.14902); color: #333333; line-height: 20px; margin-bottom: 10px; padding: 9.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><code style="background-color: transparent; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 0px; color: inherit; padding: 0px;"><span style="font-family: Times, Times New Roman, serif;">/etc/rstudio/rserver.conf
/etc/rstudio/rsession.conf
</span></code></pre>
<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">After editing configuration files you should perform a check to ensure that the entries you specified are valid. This can be accomplished by executing the following command:</span></div>
<pre class="code" style="background-color: whitesmoke; border-bottom-left-radius: 4px; border-bottom-right-radius: 4px; border-top-left-radius: 4px; border-top-right-radius: 4px; border: 1px solid rgba(0, 0, 0, 0.14902); color: #333333; line-height: 20px; margin-bottom: 10px; padding: 9.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><code style="background-color: transparent; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 0px; color: inherit; padding: 0px;"><span style="font-family: Times, Times New Roman, serif;">$ sudo rstudio-server test-config
</span></code></pre>
<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">Note that this command is also automatically executed when starting or restarting the server (those commands will fail if the configuration is not valid).</span></div>
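<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">Because the check runs separately from the restart, the two can be chained so that an invalid entry is reported before the running server is touched. The function below is a minimal sketch, not part of RStudio: the name <code>apply_rstudio_config</code> is hypothetical, it assumes it runs in a root shell (for example one opened with <code>sudo -s</code>, since <code>sudo</code> cannot call a shell function directly), and it assumes <code>rstudio-server test-config</code> exits with a non-zero status when an entry is invalid. Confirm that behavior on your installation.</span></div>
<pre class="code" style="background-color: whitesmoke; border-bottom-left-radius: 4px; border-bottom-right-radius: 4px; border-top-left-radius: 4px; border-top-right-radius: 4px; border: 1px solid rgba(0, 0, 0, 0.14902); color: #333333; line-height: 20px; margin-bottom: 10px; padding: 9.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><code># Hypothetical helper: validate the configuration, then restart only if
# the check passed. Assumes a root shell and that "rstudio-server
# test-config" returns a non-zero exit status for invalid entries.
apply_rstudio_config() {
  if rstudio-server test-config; then
    rstudio-server restart
  else
    echo "test-config reported a problem; the server was not restarted"
    return 1
  fi
}
</code></pre>
<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">After editing either configuration file, you would run <code>apply_rstudio_config</code> in place of a bare restart.</span></div>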
<h2 style="background-color: white; color: #333333; line-height: 40px; margin: 10px 0px; text-rendering: optimizelegibility;">
<span style="font-family: Times, Times New Roman, serif; font-size: small;">Network Port and Address</span></h2>
<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">After initial installation, RStudio accepts connections on port 8787. To use a different port, create an <strong><code style="background-color: #f7f7f9; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(225, 225, 232); color: #dd1144; padding: 2px 4px;">/etc/rstudio/rserver.conf</code></strong> file (if one doesn't already exist) and add a <code style="background-color: #f7f7f9; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(225, 225, 232); color: #dd1144; padding: 2px 4px;">www-port</code> entry specifying the port RStudio should listen on. For example:</span></div>
<pre class="code" style="background-color: whitesmoke; border-bottom-left-radius: 4px; border-bottom-right-radius: 4px; border-top-left-radius: 4px; border-top-right-radius: 4px; border: 1px solid rgba(0, 0, 0, 0.14902); color: #333333; line-height: 20px; margin-bottom: 10px; padding: 9.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><code style="background-color: transparent; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 0px; color: inherit; padding: 0px;"><span style="font-family: Times, Times New Roman, serif;">www-port=80
</span></code></pre>
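<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">Since the file may or may not exist yet, and may already contain the key, a small helper can create the file when needed and replace an existing entry rather than append a duplicate. This is a sketch under stated assumptions: <code>set_conf_option</code> is a hypothetical name, it relies on GNU <code>sed -i</code> (as shipped with Ubuntu), it assumes the key contains no regular-expression metacharacters and the value contains no <code>|</code> or backslash, and it must run as root when the file lives under <code>/etc/rstudio</code>.</span></div>
<pre class="code" style="background-color: whitesmoke; border-bottom-left-radius: 4px; border-bottom-right-radius: 4px; border-top-left-radius: 4px; border-top-right-radius: 4px; border: 1px solid rgba(0, 0, 0, 0.14902); color: #333333; line-height: 20px; margin-bottom: 10px; padding: 9.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><code># Hypothetical helper (not part of RStudio): set KEY=VALUE in FILE,
# creating FILE (and its directory) if it does not exist yet.
set_conf_option() {
  file=$1 key=$2 value=$3
  mkdir -p "$(dirname "$file")"
  touch "$file"
  if grep -q "^$key=" "$file"; then
    # Replace the existing line for this key in place (GNU sed -i).
    sed -i "s|^$key=.*|$key=$value|" "$file"
  else
    # Append a new KEY=VALUE line; tee also echoes it to the terminal.
    printf '%s=%s\n' "$key" "$value" | tee -a "$file"
  fi
}
</code></pre>
<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">From a root shell, <code>set_conf_option /etc/rstudio/rserver.conf www-port 80</code> would produce the entry shown above; running it again with a different port rewrites that line instead of adding a second one.</span></div>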
<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">By default, RStudio binds to address 0.0.0.0 (accepting connections from any remote IP). You can change this with the <code style="background-color: #f7f7f9; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(225, 225, 232); color: #dd1144; padding: 2px 4px;">www-address</code> entry. For example, to accept connections only from the local machine:</span></div>
<pre class="code" style="background-color: whitesmoke; border-bottom-left-radius: 4px; border-bottom-right-radius: 4px; border-top-left-radius: 4px; border-top-right-radius: 4px; border: 1px solid rgba(0, 0, 0, 0.14902); color: #333333; line-height: 20px; margin-bottom: 10px; padding: 9.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><code style="background-color: transparent; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 0px; color: inherit; padding: 0px;"><span style="font-family: Times, Times New Roman, serif;">www-address=127.0.0.1
</span></code></pre>
<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">Note that after editing the <strong><code style="background-color: #f7f7f9; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(225, 225, 232); color: #dd1144; padding: 2px 4px;">/etc/rstudio/rserver.conf</code></strong> file you should always restart the server to apply your changes (and validate that your configuration entries were valid). You can do this by entering the following command:</span></div>
<pre class="code" style="background-color: whitesmoke; border-bottom-left-radius: 4px; border-bottom-right-radius: 4px; border-top-left-radius: 4px; border-top-right-radius: 4px; border: 1px solid rgba(0, 0, 0, 0.14902); color: #333333; line-height: 20px; margin-bottom: 10px; padding: 9.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><code style="background-color: transparent; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 0px; color: inherit; padding: 0px;"><span style="font-family: Times, Times New Roman, serif;">$ sudo rstudio-server restart
</span></code></pre>
<h2 style="background-color: white; color: #333333; line-height: 40px; margin: 10px 0px; text-rendering: optimizelegibility;">
<span style="font-family: Times, Times New Roman, serif; font-size: small;">External Libraries</span></h2>
<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">You can add elements to the default LD_LIBRARY_PATH for R sessions (as determined by the R ldpaths script) by adding an <code style="background-color: #f7f7f9; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(225, 225, 232); color: #dd1144; padding: 2px 4px;">rsession-ld-library-path</code> entry to the server config file. This is useful for ensuring that packages can locate external library dependencies that aren't installed in the system's standard library paths. For example:</span></div>
<pre class="code" style="background-color: whitesmoke; border-bottom-left-radius: 4px; border-bottom-right-radius: 4px; border-top-left-radius: 4px; border-top-right-radius: 4px; border: 1px solid rgba(0, 0, 0, 0.14902); color: #333333; line-height: 20px; margin-bottom: 10px; padding: 9.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><code style="background-color: transparent; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 0px; color: inherit; padding: 0px;"><span style="font-family: Times, Times New Roman, serif;">rsession-ld-library-path=/opt/local/lib:/opt/local/someapp/lib
</span></code></pre>
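<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">The value is a colon-separated list of directories, so a typo in one element is easy to miss. The function below is a hypothetical pre-flight check (the name <code>check_ld_library_path</code> is illustrative, not an RStudio command): it splits the value on colons and prints every directory that does not exist, returning a non-zero status if any are missing. It runs in a subshell so that changing <code>IFS</code> and disabling globbing do not leak into your shell.</span></div>
<pre class="code" style="background-color: whitesmoke; border-bottom-left-radius: 4px; border-bottom-right-radius: 4px; border-top-left-radius: 4px; border-top-right-radius: 4px; border: 1px solid rgba(0, 0, 0, 0.14902); color: #333333; line-height: 20px; margin-bottom: 10px; padding: 9.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><code># Hypothetical helper: report directories in a colon-separated
# rsession-ld-library-path value that do not exist.
check_ld_library_path() (
  IFS=:     # split the value on colons only
  set -f    # do not expand glob characters in directory names
  status=0
  for dir in $1; do
    if [ ! -d "$dir" ]; then
      echo "missing: $dir"
      status=1
    fi
  done
  exit $status
)
</code></pre>
<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">For the example above you would run <code>check_ld_library_path /opt/local/lib:/opt/local/someapp/lib</code> before adding the entry.</span></div>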
<h2 style="background-color: white; color: #333333; line-height: 40px; margin: 10px 0px; text-rendering: optimizelegibility;">
<span style="font-family: Times, Times New Roman, serif; font-size: small;">Specifying R Version</span></h2>
<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">By default RStudio Server runs against the version of R which is found on the system PATH (using <code style="background-color: #f7f7f9; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(225, 225, 232); color: #dd1144; padding: 2px 4px;">which R</code>). You can override which version of R is used via the <code style="background-color: #f7f7f9; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(225, 225, 232); color: #dd1144; padding: 2px 4px;">rsession-which-r</code> setting in the server config file. For example, if you have two versions of R installed on the server and want to make sure the one at <code style="background-color: #f7f7f9; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(225, 225, 232); color: #dd1144; padding: 2px 4px;">/usr/local/bin/R</code> is used by RStudio then you would use:</span></div>
<pre class="code" style="background-color: whitesmoke; border-bottom-left-radius: 4px; border-bottom-right-radius: 4px; border-top-left-radius: 4px; border-top-right-radius: 4px; border: 1px solid rgba(0, 0, 0, 0.14902); color: #333333; line-height: 20px; margin-bottom: 10px; padding: 9.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><code style="background-color: transparent; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 0px; color: inherit; padding: 0px;"><span style="font-family: Times, Times New Roman, serif;">rsession-which-r=/usr/local/bin/R
</span></code></pre>
<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">Note again that the server must be restarted for this setting to take effect.</span></div>
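<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">Before pointing <code>rsession-which-r</code> at a binary, it can help to confirm that the path is an executable file and to see which R version it reports (compare it with the output of <code>which R</code> to see the default). The function below is a small sketch with a hypothetical name, <code>check_r_binary</code>; it relies only on R's standard <code>--version</code> flag, whose first output line names the version.</span></div>
<pre class="code" style="background-color: whitesmoke; border-bottom-left-radius: 4px; border-bottom-right-radius: 4px; border-top-left-radius: 4px; border-top-right-radius: 4px; border: 1px solid rgba(0, 0, 0, 0.14902); color: #333333; line-height: 20px; margin-bottom: 10px; padding: 9.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><code># Hypothetical helper: check a candidate rsession-which-r value.
check_r_binary() {
  # Directories also carry the execute bit, so reject them explicitly.
  if [ -d "$1" ] || [ ! -x "$1" ]; then
    echo "not an executable file: $1"
    return 1
  fi
  # Print only the first line of "R --version", which names the version.
  "$1" --version | head -n 1
}
</code></pre>
<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">For the example above, <code>check_r_binary /usr/local/bin/R</code> prints the version line of that installation, or an error if nothing executable lives at that path.</span></div>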
<h2 style="background-color: white; color: #333333; line-height: 40px; margin: 10px 0px; text-rendering: optimizelegibility;">
<span style="font-family: Times, Times New Roman, serif; font-size: small;">Setting User Limits</span></h2>
<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
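<span style="font-family: Times, Times New Roman, serif;">There are a number of settings that limit which users can access RStudio and the amount of resources they can consume. These go in <code style="background-color: #f7f7f9; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(225, 225, 232); color: #dd1144; padding: 2px 4px;">/etc/rstudio/rserver.conf</code>; that file does not exist by default, so if you wish to specify any of the settings below you should create it.</span></div>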
<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">To limit RStudio logins to the members of a specific group, use the <code style="background-color: #f7f7f9; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(225, 225, 232); color: #dd1144; padding: 2px 4px;">auth-required-user-group</code> setting. For example:</span></div>
<pre class="code" style="background-color: whitesmoke; border-bottom-left-radius: 4px; border-bottom-right-radius: 4px; border-top-left-radius: 4px; border-top-right-radius: 4px; border: 1px solid rgba(0, 0, 0, 0.14902); color: #333333; line-height: 20px; margin-bottom: 10px; padding: 9.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><code style="background-color: transparent; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 0px; color: inherit; padding: 0px;"><span style="font-family: Times, Times New Roman, serif;">auth-required-user-group=rstudio_users
</span></code></pre>
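<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">The group named in this setting has to exist and contain the intended users. The sketch below shows the standard Ubuntu commands for creating the group and adding a user (as comments, since they need root; <code>alice</code> is a placeholder user name), followed by a hypothetical helper, <code>user_in_group</code>, that checks membership using <code>id -nG</code>.</span></div>
<pre class="code" style="background-color: whitesmoke; border-bottom-left-radius: 4px; border-bottom-right-radius: 4px; border-top-left-radius: 4px; border-top-right-radius: 4px; border: 1px solid rgba(0, 0, 0, 0.14902); color: #333333; line-height: 20px; margin-bottom: 10px; padding: 9.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><code># Create the group and add a user (run as root; "alice" is a placeholder):
#   groupadd rstudio_users
#   usermod -aG rstudio_users alice

# Hypothetical helper: succeed if USER ($1) is a member of GROUP ($2).
user_in_group() {
  # id -nG lists all group names for the user, separated by spaces;
  # put one per line and look for an exact match.
  id -nG "$1" | tr ' ' '\n' | grep -qx "$2"
}
</code></pre>
<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">For example, <code>user_in_group alice rstudio_users</code> returns success only once <code>alice</code> has been added to the group.</span></div>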
<h2 style="background-color: white; color: #333333; line-height: 40px; margin: 10px 0px; text-rendering: optimizelegibility;">
<span style="font-family: Times, Times New Roman, serif; font-size: small;">Additional Settings</span></h2>
<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">There is a separate <strong><code style="background-color: #f7f7f9; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(225, 225, 232); color: #dd1144; padding: 2px 4px;">/etc/rstudio/rsession.conf</code></strong> configuration file that enables you to control various aspects of R sessions (note that as with <code style="background-color: #f7f7f9; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(225, 225, 232); color: #dd1144; padding: 2px 4px;">rserver.conf</code> this file does not exist by default). These settings are especially useful if you have a large number of potential users and want to make sure that resources are balanced appropriately.</span></div>
<h3 style="background-color: white; color: #333333; line-height: 40px; margin: 10px 0px; text-rendering: optimizelegibility;">
<span style="font-family: Times, Times New Roman, serif; font-size: small;">Session Timeouts</span></h3>
<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">By default if a user hasn't issued a command for 2 hours RStudio will suspend that user's R session to disk so they are no longer consuming server resources (the next time the user attempts to access the server their session will be restored). You can change the timeout (including disabling it by specifying a value of 0) using the <code style="background-color: #f7f7f9; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(225, 225, 232); color: #dd1144; padding: 2px 4px;">session-timeout-minutes</code> setting. For example:</span></div>
<pre class="code" style="background-color: whitesmoke; border-bottom-left-radius: 4px; border-bottom-right-radius: 4px; border-top-left-radius: 4px; border-top-right-radius: 4px; border: 1px solid rgba(0, 0, 0, 0.14902); color: #333333; line-height: 20px; margin-bottom: 10px; padding: 9.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><code style="background-color: transparent; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 0px; color: inherit; padding: 0px;"><span style="font-family: Times, Times New Roman, serif;">session-timeout-minutes=30
</span></code></pre>
<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">Note that a user's session will never be suspended while it is running code (only sessions which are idle will be suspended).</span></div>
<h3 style="background-color: white; color: #333333; line-height: 40px; margin: 10px 0px; text-rendering: optimizelegibility;">
<span style="font-family: Times, Times New Roman, serif; font-size: small;">Package Library Path</span></h3>
<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">By default, RStudio sets the R_LIBS_USER environment variable to ~/R/library. This ensures that packages installed by end users do not have R version numbers encoded in the path (which is R's default behavior). This in turn enables administrators to upgrade the version of R on the server without resetting users' installed packages (which would occur if the installed packages were in an R-version-derived directory).</span></div>
<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">If you wish to override this behavior, you can do so using the <code style="background-color: #f7f7f9; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(225, 225, 232); color: #dd1144; padding: 2px 4px;">r-libs-user</code> setting. For example:</span></div>
<pre class="code" style="background-color: whitesmoke; border-bottom-left-radius: 4px; border-bottom-right-radius: 4px; border-top-left-radius: 4px; border-top-right-radius: 4px; border: 1px solid rgba(0, 0, 0, 0.14902); color: #333333; line-height: 20px; margin-bottom: 10px; padding: 9.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><code style="background-color: transparent; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 0px; color: inherit; padding: 0px;"><span style="font-family: Times, Times New Roman, serif;">r-libs-user=~/R/packages
</span></code></pre>
<h3 style="background-color: white; color: #333333; line-height: 40px; margin: 10px 0px; text-rendering: optimizelegibility;">
<span style="font-family: Times, Times New Roman, serif; font-size: small;">CRAN Repository</span></h3>
<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">Finally, you can set the default CRAN repository for the server using the <code style="background-color: #f7f7f9; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(225, 225, 232); color: #dd1144; padding: 2px 4px;">r-cran-repos</code> setting. For example:</span></div>
<pre class="code" style="background-color: whitesmoke; border-bottom-left-radius: 4px; border-bottom-right-radius: 4px; border-top-left-radius: 4px; border-top-right-radius: 4px; border: 1px solid rgba(0, 0, 0, 0.14902); color: #333333; line-height: 20px; margin-bottom: 10px; padding: 9.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><code style="background-color: transparent; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 0px; color: inherit; padding: 0px;"><span style="font-family: Times, Times New Roman, serif;">r-cran-repos=http://cran.case.edu/
</span></code></pre>
<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">Note again that the above settings should be specified in the <strong><code style="background-color: #f7f7f9; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(225, 225, 232); color: #dd1144; padding: 2px 4px;">/etc/rstudio/rsession.conf</code></strong> file (rather than the aforementioned <code style="background-color: #f7f7f9; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(225, 225, 232); color: #dd1144; padding: 2px 4px;">rserver.conf</code> file).</span></div>
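<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">The three session settings above can live together in one file. The function below is a sketch with a hypothetical name, <code>write_rsession_conf</code>, that writes the example values from this post to a given path (defaulting to <code>/etc/rstudio/rsession.conf</code>). It overwrites any existing file, so merge by hand if you already have one, and run it as root when writing under <code>/etc</code>.</span></div>
<pre class="code" style="background-color: whitesmoke; border-bottom-left-radius: 4px; border-bottom-right-radius: 4px; border-top-left-radius: 4px; border-top-right-radius: 4px; border: 1px solid rgba(0, 0, 0, 0.14902); color: #333333; line-height: 20px; margin-bottom: 10px; padding: 9.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><code># Hypothetical helper: write the example rsession.conf settings.
# Note: tee without -a replaces the whole file.
write_rsession_conf() {
  conf=${1:-/etc/rstudio/rsession.conf}
  mkdir -p "$(dirname "$conf")"
  # Single quotes keep "~" literal so it is written to the file as-is
  # instead of being expanded by this shell.
  printf '%s\n' \
    'session-timeout-minutes=30' \
    'r-libs-user=~/R/packages' \
    'r-cran-repos=http://cran.case.edu/' | tee "$conf"
}
</code></pre>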
</div>
Karthikhttp://www.blogger.com/profile/18220563988883365640noreply@blogger.com1tag:blogger.com,1999:blog-1743557738889130058.post-82761462574895986622013-10-28T03:08:00.002-07:002013-10-28T03:10:15.048-07:00RStudio Server: Managing the Server on Ubuntu<div dir="ltr" style="text-align: left;" trbidi="on">
<h1 style="background-color: white; color: #333333; line-height: 40px; margin: 10px 0px; text-rendering: optimizelegibility;">
<span style="font-family: Times, 'Times New Roman', serif; font-size: small;">Overview</span></h1>
<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">RStudio server management tasks are performed using the <strong><code style="background-color: #f7f7f9; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(225, 225, 232); color: #dd1144; padding: 2px 4px;">rstudio-server</code></strong> utility (installed under <code style="background-color: #f7f7f9; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(225, 225, 232); color: #dd1144; padding: 2px 4px;">/usr/sbin</code> in binary distributions). This utility lets you stop, start, and restart the server, list and suspend user sessions, take the server offline, and hot-upgrade a running version of the server.</span></div>
<h2 style="background-color: white; color: #333333; line-height: 40px; margin: 10px 0px; text-rendering: optimizelegibility;">
<span style="font-family: Times, Times New Roman, serif; font-size: small;">Stopping and Starting</span></h2>
<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">If you installed RStudio using a package manager binary (e.g. a Debian package or RPM), RStudio is automatically registered as a daemon that starts along with the rest of the system. On Ubuntu this registration is performed using an Upstart script at <code style="background-color: #f7f7f9; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(225, 225, 232); color: #dd1144; padding: 2px 4px;">/etc/init/rstudio-server.conf</code>. On other systems an init.d script is installed at <code style="background-color: #f7f7f9; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(225, 225, 232); color: #dd1144; padding: 2px 4px;">/etc/init.d/rstudio-server</code>.</span></div>
<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">To manually stop, start, and restart the server you use the following commands:</span></div>
<pre class="code" style="background-color: whitesmoke; border-bottom-left-radius: 4px; border-bottom-right-radius: 4px; border-top-left-radius: 4px; border-top-right-radius: 4px; border: 1px solid rgba(0, 0, 0, 0.14902); color: #333333; line-height: 20px; margin-bottom: 10px; padding: 9.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><code style="background-color: transparent; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 0px; color: inherit; padding: 0px;"><span style="font-family: Times, Times New Roman, serif;">$ sudo rstudio-server stop
$ sudo rstudio-server start
$ sudo rstudio-server restart
</span></code></pre>
<h2 style="background-color: white; color: #333333; line-height: 40px; margin: 10px 0px; text-rendering: optimizelegibility;">
<span style="font-family: Times, Times New Roman, serif; font-size: small;">Managing Active Sessions</span></h2>
<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">There are a number of administrative commands which allow you to see what sessions are active and request suspension of running sessions (note that session data is not lost during a suspend).</span></div>
<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">To list all currently active sessions:</span></div>
<pre class="code" style="background-color: whitesmoke; border-bottom-left-radius: 4px; border-bottom-right-radius: 4px; border-top-left-radius: 4px; border-top-right-radius: 4px; border: 1px solid rgba(0, 0, 0, 0.14902); color: #333333; line-height: 20px; margin-bottom: 10px; padding: 9.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><code style="background-color: transparent; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 0px; color: inherit; padding: 0px;"><span style="font-family: Times, Times New Roman, serif;">$ sudo rstudio-server active-sessions
</span></code></pre>
<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">To suspend an individual session:</span></div>
<pre class="code" style="background-color: whitesmoke; border-bottom-left-radius: 4px; border-bottom-right-radius: 4px; border-top-left-radius: 4px; border-top-right-radius: 4px; border: 1px solid rgba(0, 0, 0, 0.14902); color: #333333; line-height: 20px; margin-bottom: 10px; padding: 9.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><code style="background-color: transparent; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 0px; color: inherit; padding: 0px;"><span style="font-family: Times, Times New Roman, serif;">$ sudo rstudio-server suspend-session &lt;pid&gt;
</span></code></pre>
<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">To suspend all running sessions:</span></div>
<pre class="code" style="background-color: whitesmoke; border-bottom-left-radius: 4px; border-bottom-right-radius: 4px; border-top-left-radius: 4px; border-top-right-radius: 4px; border: 1px solid rgba(0, 0, 0, 0.14902); color: #333333; line-height: 20px; margin-bottom: 10px; padding: 9.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><code style="background-color: transparent; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 0px; color: inherit; padding: 0px;"><span style="font-family: Times, Times New Roman, serif;">$ sudo rstudio-server suspend-all
</span></code></pre>
<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">Each suspend command also has a "force" variant that sends an interrupt to the session, requesting termination of any running R command:</span></div>
<pre class="code" style="background-color: whitesmoke; border-bottom-left-radius: 4px; border-bottom-right-radius: 4px; border-top-left-radius: 4px; border-top-right-radius: 4px; border: 1px solid rgba(0, 0, 0, 0.14902); color: #333333; line-height: 20px; margin-bottom: 10px; padding: 9.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><code style="background-color: transparent; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 0px; color: inherit; padding: 0px;"><span style="font-family: Times, Times New Roman, serif;">$ sudo rstudio-server force-suspend-session &lt;pid&gt;
$ sudo rstudio-server force-suspend-all
</span></code></pre>
<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">Issue the force-suspend-all command immediately before any reboot so that the data and state of active R sessions are preserved across the restart.</span></div>
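<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;"><span style="font-family: Times, Times New Roman, serif;">The reboot advice above can be wrapped in a small helper so that the machine only reboots once every session has been force-suspended. This is a minimal sketch, not an official RStudio tool: the function name rstudio_safe_reboot is hypothetical, and it assumes the rstudio-server command and sudo rights are available on the host.</span></div>

```shell
#!/usr/bin/env bash
# Sketch: reboot an RStudio Server host only after all R sessions are suspended.
# rstudio_safe_reboot is a hypothetical helper name; assumes sudo and rstudio-server.

rstudio_safe_reboot() {
  # Interrupt any running R commands and suspend every session first,
  # so session data and state survive the restart.
  if ! sudo rstudio-server force-suspend-all; then
    echo "force-suspend-all failed; not rebooting" >&2
    return 1
  fi
  sudo reboot
}
```

<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;"><span style="font-family: Times, Times New Roman, serif;">Source the file and call rstudio_safe_reboot; if the suspend step fails, the host is left running so you can investigate first.</span></div>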
<h2 style="background-color: white; color: #333333; line-height: 40px; margin: 10px 0px; text-rendering: optimizelegibility;">
<span style="font-family: Times, Times New Roman, serif; font-size: small;">Taking the Server Offline</span></h2>
<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">If you need to perform system maintenance and want users to see a friendly message saying the server is offline, issue the following command:</span></div>
<pre class="code" style="background-color: whitesmoke; border-bottom-left-radius: 4px; border-bottom-right-radius: 4px; border-top-left-radius: 4px; border-top-right-radius: 4px; border: 1px solid rgba(0, 0, 0, 0.14902); color: #333333; line-height: 20px; margin-bottom: 10px; padding: 9.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><code style="background-color: transparent; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 0px; color: inherit; padding: 0px;"><span style="font-family: Times, Times New Roman, serif;">$ sudo rstudio-server offline
</span></code></pre>
<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">When the server is available again, issue this command:</span></div>
<pre class="code" style="background-color: whitesmoke; border-bottom-left-radius: 4px; border-bottom-right-radius: 4px; border-top-left-radius: 4px; border-top-right-radius: 4px; border: 1px solid rgba(0, 0, 0, 0.14902); color: #333333; line-height: 20px; margin-bottom: 10px; padding: 9.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><code style="background-color: transparent; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 0px; color: inherit; padding: 0px;"><span style="font-family: Times, Times New Roman, serif;">$ sudo rstudio-server online
</span></code></pre>
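<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;"><span style="font-family: Times, Times New Roman, serif;">The offline and online commands bracket a maintenance window. The sketch below, a hypothetical helper named with_rstudio_offline, runs a maintenance command in between and brings the server back online even if that command fails, then returns the command's exit status. It assumes sudo access and the rstudio-server command shown above.</span></div>

```shell
#!/usr/bin/env bash
# Sketch: run a maintenance command while RStudio Server shows its offline message.
# with_rstudio_offline is a hypothetical helper; assumes sudo and rstudio-server.

with_rstudio_offline() {
  # Show users the offline message; stop if that fails.
  sudo rstudio-server offline || return 1

  # Run the maintenance command, remembering its exit status.
  local status=0
  "$@" || status=$?

  # Always bring the server back online, then report the command's status.
  sudo rstudio-server online
  return "$status"
}
```

<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;"><span style="font-family: Times, Times New Roman, serif;">For example, with_rstudio_offline sudo apt-get -y upgrade upgrades the system packages while users see the offline page.</span></div>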
<h2 style="background-color: white; color: #333333; line-height: 40px; margin: 10px 0px; text-rendering: optimizelegibility;">
<span style="font-family: Times, Times New Roman, serif; font-size: small;">Upgrading to a New Version</span></h2>
<div style="background-color: white; color: #333333; line-height: 22px; margin-bottom: 10px;">
<span style="font-family: Times, Times New Roman, serif;">If you perform an upgrade of RStudio Server using a package manager binary (e.g. a Debian package or RPM) and a version of RStudio Server is currently running, then the upgrade process will also ensure that active sessions are immediately migrated to the new version. This includes the following behavior:</span></div>
<ul style="background-color: white; color: #333333; line-height: 22px; margin: 0px 0px 10px 25px; padding: 0px;">
<li style="line-height: 20px;"><span style="font-family: Times, Times New Roman, serif;">Running R sessions are suspended so that future interactions with the server automatically launch the updated R session binary.</span></li>
<li style="line-height: 20px;"><span style="font-family: Times, Times New Roman, serif;">Currently connected browser clients are notified that a new version is available and automatically refresh themselves.</span></li>
<li style="line-height: 20px;"><span style="font-family: Times, Times New Roman, serif;">The core server binary is restarted.</span></li>
</ul>
</div>
Karthikhttp://www.blogger.com/profile/18220563988883365640noreply@blogger.com0tag:blogger.com,1999:blog-1743557738889130058.post-43396449452914635452013-10-28T02:57:00.004-07:002013-10-28T03:09:01.654-07:00RStudio Server : Installation on Ubuntu<div dir="ltr" style="text-align: left;" trbidi="on">
<h3 style="clear: left; color: #111111; font-weight: 400; line-height: 32px; margin: 20px 0px 2px; text-rendering: optimizelegibility;">
<span style="font-family: Times, Times New Roman, serif; font-size: small;">What is RStudio?</span></h3>
<hr style="border-bottom-color: rgb(255, 255, 255); border-bottom-style: solid; border-left-width: 0px; border-right-width: 0px; border-top-color: rgb(238, 238, 238); border-top-style: solid; box-sizing: content-box; color: #111111; height: 0px; line-height: 23px; margin: 0px 0px 13px !important;" />
<span style="font-family: Times, Times New Roman, serif;"><a href="http://www.rstudio.com/ide" style="color: #0088cc; line-height: 23px; text-decoration: none;">RStudio IDE</a><span style="background-color: white; color: #111111; line-height: 23px;"> is an open source </span><strong style="color: #111111; line-height: 23px;">I</strong><span style="background-color: white; color: #111111; line-height: 23px;">ntegrated </span><strong style="color: #111111; line-height: 23px;">D</strong><span style="background-color: white; color: #111111; line-height: 23px;">evelopment </span><strong style="color: #111111; line-height: 23px;">E</strong><span style="background-color: white; color: #111111; line-height: 23px;">nvironment for R, the statistical computing language. RStudio Server provides a web-based version of the RStudio IDE, which makes it easy to develop on a remote VPS through a browser.</span><br style="color: #111111; line-height: 23px;" /><br style="color: #111111; line-height: 23px;" /><span style="background-color: white; color: #111111; line-height: 23px;">Because many VPS providers bill by the hour, it can be surprisingly cheap to spin up a 24-core instance, crunch some data, and then destroy the VPS.</span></span><br />
<h2 style="color: #111111; font-weight: 500; line-height: 32px; margin: 30px 0px 2px; text-rendering: optimizelegibility;">
<span style="font-family: Times, Times New Roman, serif; font-size: small;">Installing RStudio on a VPS</span></h2>
<hr style="border-bottom-color: rgb(255, 255, 255); border-bottom-style: solid; border-left-width: 0px; border-right-width: 0px; border-top-color: rgb(238, 238, 238); border-top-style: solid; box-sizing: content-box; color: #111111; height: 0px; line-height: 23px; margin: 0px 0px 13px !important;" />
<span style="background-color: white; color: #111111; line-height: 23px;"><span style="font-family: Times, Times New Roman, serif;">First, install R, apparmor, and gdebi.</span></span><br />
<pre style="background-color: whitesmoke; border-bottom-left-radius: 4px; border-bottom-right-radius: 4px; border-top-left-radius: 4px; border-top-right-radius: 4px; border: 1px solid rgba(0, 0, 0, 0.14902); color: #333333; line-height: 16px; margin-bottom: 1em; margin-top: 1em; padding: 7.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><span style="font-family: Times, Times New Roman, serif;">sudo apt-get install r-base libapparmor1 gdebi-core</span></pre>
<span style="font-family: Times, Times New Roman, serif;"><span style="background-color: white; color: #111111; line-height: 23px;">Next, download the package for your architecture. On 32-bit Ubuntu, run the following command:</span></span><br />
<pre style="background-color: whitesmoke; border-bottom-left-radius: 4px; border-bottom-right-radius: 4px; border-top-left-radius: 4px; border-top-right-radius: 4px; border: 1px solid rgba(0, 0, 0, 0.14902); color: #333333; line-height: 16px; margin-bottom: 1em; margin-top: 1em; padding: 7.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><span style="font-family: Times, Times New Roman, serif;">wget http://download2.rstudio.org/rstudio-server-0.97.336-i386.deb -O rstudio.deb</span></pre>
<span style="font-family: Times, Times New Roman, serif;"><span style="background-color: white; color: #111111; line-height: 23px;">On 64-bit Ubuntu, run the following command instead:</span></span><br />
<pre style="background-color: whitesmoke; border-bottom-left-radius: 4px; border-bottom-right-radius: 4px; border-top-left-radius: 4px; border-top-right-radius: 4px; border: 1px solid rgba(0, 0, 0, 0.14902); color: #333333; line-height: 16px; margin-bottom: 1em; margin-top: 1em; padding: 7.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><span style="font-family: Times, Times New Roman, serif;">wget http://download2.rstudio.org/rstudio-server-0.97.336-amd64.deb -O rstudio.deb</span></pre>
<span style="font-family: Times, Times New Roman, serif;"><br style="color: #111111; line-height: 23px;" /><span style="background-color: white; color: #111111; line-height: 23px;">Install the package.</span></span><br />
<pre style="background-color: whitesmoke; border-bottom-left-radius: 4px; border-bottom-right-radius: 4px; border-top-left-radius: 4px; border-top-right-radius: 4px; border: 1px solid rgba(0, 0, 0, 0.14902); color: #333333; line-height: 16px; margin-bottom: 1em; margin-top: 1em; padding: 7.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><span style="font-family: Times, Times New Roman, serif;">sudo gdebi rstudio.deb</span></pre>
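<div><span style="font-family: Times, Times New Roman, serif;">The three installation steps can be combined into one script that picks the 32-bit (i386) or 64-bit (amd64) package automatically. This is a minimal sketch, assuming Ubuntu with dpkg, apt-get, wget and sudo; it pins the 0.97.336 version used in this post, and the function names and the --install flag are hypothetical. gdebi's -n option runs it without the confirmation prompt.</span></div>

```shell
#!/usr/bin/env bash
# Sketch: install R and RStudio Server 0.97.336 on 32- or 64-bit Ubuntu.
# Assumes Ubuntu with dpkg, apt-get, wget and sudo; function names are hypothetical.
set -euo pipefail

RSTUDIO_VERSION="0.97.336"

# Print the package URL for a dpkg architecture name (i386 or amd64).
rstudio_deb_url() {
  local arch="$1"
  case "$arch" in
    i386|amd64)
      echo "http://download2.rstudio.org/rstudio-server-${RSTUDIO_VERSION}-${arch}.deb"
      ;;
    *)
      echo "unsupported architecture: $arch" >&2
      return 1
      ;;
  esac
}

install_rstudio_server() {
  local url
  # Step 1: R, the AppArmor library and gdebi.
  sudo apt-get install -y r-base libapparmor1 gdebi-core
  # Step 2: pick the package for this machine (dpkg prints i386 or amd64 on x86).
  url=$(rstudio_deb_url "$(dpkg --print-architecture)")
  wget "$url" -O rstudio.deb
  # Step 3: install it; -n makes gdebi skip its confirmation prompt.
  sudo gdebi -n rstudio.deb
}

# Install only when asked explicitly: bash install-rstudio.sh --install
if [[ "${1:-}" == "--install" ]]; then
  install_rstudio_server
fi
```

<div><span style="font-family: Times, Times New Roman, serif;">Keeping the URL choice in its own function makes it easy to check which package would be downloaded before anything is installed.</span></div>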
<h2 style="color: #111111; font-weight: 500; line-height: 32px; margin: 30px 0px 2px; text-rendering: optimizelegibility;">
<span style="font-family: Times, Times New Roman, serif; font-size: small;">Creating an RStudio User</span></h2>
<hr style="border-bottom-color: rgb(255, 255, 255); border-bottom-style: solid; border-left-width: 0px; border-right-width: 0px; border-top-color: rgb(238, 238, 238); border-top-style: solid; box-sizing: content-box; color: #111111; height: 0px; line-height: 23px; margin: 0px 0px 13px !important;" />
<span style="background-color: white; color: #111111; line-height: 23px;"><span style="font-family: Times, Times New Roman, serif;">Using the root account with RStudio is not advisable; instead, create a normal user account just for RStudio. The account can have any name, and its password is the one you will enter in the web interface.</span></span><br />
<pre style="background-color: whitesmoke; border-bottom-left-radius: 4px; border-bottom-right-radius: 4px; border-top-left-radius: 4px; border-top-right-radius: 4px; border: 1px solid rgba(0, 0, 0, 0.14902); color: #333333; line-height: 16px; margin-bottom: 1em; margin-top: 1em; padding: 7.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><span style="font-family: Times, Times New Roman, serif;">sudo adduser rstudio</span></pre>
<span style="font-family: Times, Times New Roman, serif;"><br style="color: #111111; line-height: 23px;" /><span style="background-color: white; color: #111111; line-height: 23px;">RStudio uses the user's home directory as its default workspace.</span></span><br />
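<div><span style="font-family: Times, Times New Roman, serif;">Once the package is installed and the user exists, RStudio Server should answer on port 8787, its default port. This sketch checks that from the command line before you open a browser. It assumes curl is installed, rstudio_is_up is a hypothetical name, and it treats any 2xx or 3xx HTTP status as "up" rather than relying on a specific response from the sign-in page.</span></div>

```shell
#!/usr/bin/env bash
# Sketch: check that RStudio Server answers HTTP requests on port 8787.
# rstudio_is_up is a hypothetical helper; assumes curl is installed.

rstudio_is_up() {
  local host="${1:-localhost}" code
  # -s hides progress, -o discards the body, -w prints only the HTTP status code.
  # If curl itself fails (e.g. connection refused), report the server as down.
  code=$(curl -s -o /dev/null -w '%{http_code}' "http://${host}:8787/") || return 1
  # Treat any 2xx or 3xx response (e.g. a redirect to the sign-in page) as up.
  [[ "$code" =~ ^[23][0-9][0-9]$ ]]
}
```

<div><span style="font-family: Times, Times New Roman, serif;">Run rstudio_is_up on the server itself, or pass the server's address as the first argument to check it from another machine.</span></div>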
<h2 style="color: #111111; font-weight: 500; line-height: 32px; margin: 30px 0px 2px; text-rendering: optimizelegibility;">
<span style="font-family: Times, Times New Roman, serif; font-size: small;">Using RStudio</span></h2>
<hr style="border-bottom-color: rgb(255, 255, 255); border-bottom-style: solid; border-left-width: 0px; border-right-width: 0px; border-top-color: rgb(238, 238, 238); border-top-style: solid; box-sizing: content-box; color: #111111; height: 0px; line-height: 23px; margin: 0px 0px 13px !important;" />
<span style="font-family: Times, Times New Roman, serif;"><span style="background-color: white; color: #111111; line-height: 23px;">RStudio Server is accessed through port 8787 (for example, http://your-server-ip:8787). You can sign in with any system user account that has a password. </span><img alt="RStudio sign in" src="https://i.imgur.com/kobMKpU.png" style="border: 0px; color: #111111; height: auto; line-height: 23px; max-width: 100%; vertical-align: middle;" /><br style="color: #111111; line-height: 23px;" /><br style="color: #111111; line-height: 23px;" /><span style="background-color: white; color: #111111; line-height: 23px;">Let's test that RStudio is working correctly by installing a quantitative finance package from </span><a href="http://cran.us.r-project.org/" style="color: #0088cc; line-height: 23px; text-decoration: none;">CRAN</a><span style="background-color: white; color: #111111; line-height: 23px;">, the R package repository.</span><br style="color: #111111; line-height: 23px;" /><br style="color: #111111; line-height: 23px;" /><span style="background-color: white; color: #111111; line-height: 23px;">Run the following command inside RStudio to install </span><a href="http://www.quantmod.com/" style="color: #0088cc; line-height: 23px; text-decoration: none;">quantmod</a><span style="background-color: white; color: #111111; line-height: 23px;">.</span></span><br />
<pre style="background-color: whitesmoke; border-bottom-left-radius: 4px; border-bottom-right-radius: 4px; border-top-left-radius: 4px; border-top-right-radius: 4px; border: 1px solid rgba(0, 0, 0, 0.14902); color: #333333; line-height: 16px; margin-bottom: 1em; margin-top: 1em; padding: 7.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><span style="font-family: Times, Times New Roman, serif;">install.packages("quantmod")</span></pre>
<span style="font-family: Times, Times New Roman, serif;"><br style="color: #111111; line-height: 23px;" /><img alt="Rstudio 1" src="https://i.imgur.com/KyPVGkq.png?1" style="border: 0px; color: #111111; height: auto; line-height: 23px; max-width: 100%; vertical-align: middle;" /><br style="color: #111111; line-height: 23px;" /><br style="color: #111111; line-height: 23px;" /><span style="background-color: white; color: #111111; line-height: 23px;">Next, let's test out RStudio's graphing capabilities by plotting the stock price of Apple. The graph will appear in the bottom right panel of RStudio.</span></span><br />
<pre style="background-color: whitesmoke; border-bottom-left-radius: 4px; border-bottom-right-radius: 4px; border-top-left-radius: 4px; border-top-right-radius: 4px; border: 1px solid rgba(0, 0, 0, 0.14902); color: #333333; line-height: 16px; margin-bottom: 1em; margin-top: 1em; padding: 7.5px; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"><span style="font-family: Times, Times New Roman, serif;">library('quantmod')
# Create a separate environment to hold the downloaded data
data &lt;- new.env()
# Download Apple (AAPL) prices; getSymbols stores them in data as AAPL
getSymbols('AAPL', env = data)
# Plot the price series in the Plots panel
plot(data$AAPL)</span></pre>
<span style="font-family: Times, Times New Roman, serif;"><br style="color: #111111; line-height: 23px;" /><img alt="Rstudio 2" src="https://i.imgur.com/EIa5IVz.png?1" style="border: 0px; color: #111111; height: auto; line-height: 23px; max-width: 100%; vertical-align: middle;" /><br style="color: #111111; line-height: 23px;" /><br style="color: #111111; line-height: 23px;" /><span style="background-color: white; color: #111111; line-height: 23px;">R is a powerful tool, and thousands of useful packages are available from </span><a href="http://cran.us.r-project.org/" style="color: #0088cc; line-height: 23px; text-decoration: none;">CRAN</a><span style="background-color: white; color: #111111; line-height: 23px;">. You can learn the basics of R at </span><a href="http://www.codeschool.com/courses/try-r" style="color: #0088cc; line-height: 23px; text-decoration: none;">Try R</a><span style="background-color: white; color: #111111; line-height: 23px;">.</span></span></div>
Karthikhttp://www.blogger.com/profile/18220563988883365640noreply@blogger.com1tag:blogger.com,1999:blog-1743557738889130058.post-79652737867043509132013-10-25T05:18:00.001-07:002013-10-25T05:22:41.388-07:00Pentaho Data Integration 4.4 and Hadoop 1.0.4<div dir="ltr" style="text-align: left;" trbidi="on">
<h3 style="text-align: left;">
<strong><span style="line-height: 1.5;"><span style="color: blue; font-family: Times, Times New Roman, serif; font-size: small;">Prerequisites:</span></span></strong></h3>
<ul style="text-align: left;">
<li><span style="font-family: Times, 'Times New Roman', serif; line-height: 1.5;">In the /opt/pentaho/design-tools/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/ directory, create a hadoop-104 folder and copy the contents of the hadoop-20 folder into it.</span></li>
<li><span style="font-family: Times, 'Times New Roman', serif; line-height: 1.5;">In the client subfolder of hadoop-104, replace the following JARs with the versions from the Apache Hadoop 1.0.4 distribution:</span></li>
<ul>
<li><span style="font-family: Times, 'Times New Roman', serif; line-height: 19px;">commons-codec-1.4.jar</span></li>
<li><span style="font-family: Times, 'Times New Roman', serif; line-height: 19px;">hadoop-core-1.0.4.jar</span></li>
</ul>
<li><span style="font-family: Times, 'Times New Roman', serif; line-height: 1.5;">Add the following JAR from the Hadoop 1.0.4 distribution to the same client subfolder:</span></li>
<ul>
<li><span style="font-family: Times, 'Times New Roman', serif; line-height: 19px;">commons-configuration-1.6.jar</span></li>
</ul>
<li><span style="font-family: Times, 'Times New Roman', serif; line-height: 1.5;">Then edit the plugin.properties file in the pentaho-big-data-plugin folder so that the active configuration points to the new folder:</span></li>
<ul>
<li><span style="font-family: Times, 'Times New Roman', serif; line-height: 19px;">active.hadoop.configuration=hadoop-104</span></li>
</ul>
<li><span style="font-family: Times, 'Times New Roman', serif; line-height: 1.5;">Start Hadoop as the user created during the Hadoop installation. </span><em style="font-family: Times, 'Times New Roman', serif; line-height: 19px;"><span style="line-height: 1.5;">Note</span></em><span style="font-family: Times, 'Times New Roman', serif; line-height: 1.5;">: the Hadoop credentials are provided on page 4, step 12.</span></li>
<li><span style="font-family: Times, 'Times New Roman', serif; line-height: 1.5;">Start PDI.</span></li>
</ul>
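<div><span style="font-family: Times, 'Times New Roman', serif;">The copy, JAR replacement and property change above can be scripted. This is a minimal sketch under stated assumptions: the client JARs sit in hadoop-104/lib/client, Hadoop 1.0.4 is unpacked at HADOOP_HOME with hadoop-core-1.0.4.jar in its top-level directory and the commons JARs in its lib directory, the properties file is pentaho-big-data-plugin/plugin.properties, and setup_hadoop104 and the --run flag are hypothetical. Check these paths against your installation before running it.</span></div>

```shell
#!/usr/bin/env bash
# Sketch: create a hadoop-104 configuration for the PDI 4.4 big data plugin.
# Assumed layout (verify locally): client JARs in CONFIG/lib/client, Hadoop 1.0.4
# with hadoop-core-1.0.4.jar at the top level and other JARs under lib/.
set -euo pipefail

# Usage: setup_hadoop104 PLUGIN_DIR HADOOP_HOME
setup_hadoop104() {
  local plugin_dir="$1" hadoop_home="$2"
  local configs="$plugin_dir/hadoop-configurations"
  local client="$configs/hadoop-104/lib/client"

  # 1. Copy the hadoop-20 configuration into a new hadoop-104 folder.
  mkdir -p "$configs/hadoop-104"
  cp -r "$configs/hadoop-20/." "$configs/hadoop-104/"

  # 2. Swap the bundled commons-codec and hadoop-core JARs for the 1.0.4 versions.
  rm -f "$client"/commons-codec-*.jar "$client"/hadoop-core-*.jar
  cp "$hadoop_home"/lib/commons-codec-*.jar "$client"/
  cp "$hadoop_home"/hadoop-core-1.0.4.jar "$client"/

  # 3. Add commons-configuration from the Hadoop 1.0.4 distribution.
  cp "$hadoop_home"/lib/commons-configuration-*.jar "$client"/

  # 4. Make hadoop-104 the active configuration (GNU sed in-place edit).
  sed -i 's/^active\.hadoop\.configuration=.*/active.hadoop.configuration=hadoop-104/' \
    "$plugin_dir/plugin.properties"
}

# Run only when asked explicitly: HADOOP_HOME=/path/to/hadoop-1.0.4 bash setup.sh --run
if [[ "${1:-}" == "--run" ]]; then
  setup_hadoop104 \
    /opt/pentaho/design-tools/data-integration/plugins/pentaho-big-data-plugin \
    "${HADOOP_HOME:?set HADOOP_HOME to the Hadoop 1.0.4 directory}"
fi
```

<div><span style="font-family: Times, 'Times New Roman', serif;">The original hadoop-20 folder is left untouched, so switching back only requires resetting active.hadoop.configuration. Because of set -e, the script stops at the first missing JAR instead of producing a half-configured folder.</span></div>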
<div style="text-align: left;">
<h3 style="text-align: left;">
<span style="color: blue; font-family: Times, Times New Roman, serif; font-size: small;"><span style="line-height: 1.5;"><b>Transformation [CSV → Hadoop]:</b></span></span></h3>
</div>
<span style="font-family: Times, Times New Roman, serif;"><span style="background-color: white; line-height: 19px;">Follow the instructions below to begin creating your transformation.</span></span><br />
<ul style="text-align: left;">
<li><span style="font-family: Times, Times New Roman, serif;"><span style="line-height: 1.5;">Click </span><strong><span style="line-height: 1.5;">New</span></strong><span style="line-height: 1.5;"> in the upper left corner of Spoon.</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="line-height: 1.5;">Select </span><strong><span style="line-height: 1.5;">Transformation</span></strong><span style="line-height: 1.5;"> from the list.</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="line-height: 1.5;">Under the </span><strong><span style="line-height: 1.5;">Design</span></strong><span style="line-height: 1.5;"> tab, expand the </span><strong><span style="line-height: 1.5;">Input</span></strong><span style="line-height: 1.5;"> node; then, select and drag a </span><strong><span style="line-height: 1.5;">CSV file input</span></strong><span style="line-height: 1.5;"> step onto the canvas on the right.</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="line-height: 1.5;">Expand the </span><strong><span style="line-height: 1.5;">Big Data</span></strong><span style="line-height: 1.5;"> node; click and drag a </span><strong><span style="line-height: 1.5;">Hadoop File Output</span></strong><span style="line-height: 1.5;"> step onto the canvas.</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="line-height: 1.5;">To connect the steps, add a hop. Hops describe the flow of data between the steps in a transformation. To create the hop, click the </span><strong><span style="line-height: 1.5;">CSV file input</span></strong><span style="line-height: 1.5;"> step, press and hold the &lt;</span><strong><span style="line-height: 1.5;">SHIFT</span></strong><span style="line-height: 1.5;">&gt; key, and draw a line to the </span><strong><span style="line-height: 1.5;">Hadoop File Output</span></strong><span style="line-height: 1.5;"> step.</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="line-height: 1.5;">Double click the </span><strong><span style="line-height: 1.5;">CSV file input</span></strong><span style="line-height: 1.5;"> step to open its edit properties dialog box.</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="line-height: 1.5;">In the </span><strong><span style="line-height: 1.5;">Filename</span></strong><span style="line-height: 1.5;"> field, click the </span><strong><span style="line-height: 1.5;">Browse</span></strong><span style="line-height: 1.5;"> button and navigate to the input file location.</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="line-height: 1.5;">Select the desired input file (e.g. sample.csv).</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="line-height: 1.5;">Click the </span><strong><span style="line-height: 1.5;">Get fields</span></strong><span style="line-height: 1.5;"> button to read the columns of the input file, then click </span><strong><span style="line-height: 1.5;">OK</span></strong><span style="line-height: 1.5;">.</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="line-height: 1.5;">Double click the </span><strong><span style="line-height: 1.5;">Hadoop File Output</span></strong><span style="line-height: 1.5;"> step to open its edit properties dialog box.</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="line-height: 1.5;">In the </span><strong><span style="line-height: 1.5;">Filename</span></strong><span style="line-height: 1.5;"> field, click the </span><strong><span style="line-height: 1.5;">Browse</span></strong><span style="line-height: 1.5;"> button; the Open File dialog box appears.</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="line-height: 1.5;">Enter the following details to connect to HDFS:</span></span></li>
<ul>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="background-color: transparent;">Look In – make sure HDFS is selected</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="background-color: transparent;">In Connection, enter:</span></span></li>
<ul>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="background-color: transparent; color: blue;">Server – localhost</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="background-color: transparent; color: blue;">Port – 54310</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="background-color: transparent; color: blue;">User ID – hduser</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="background-color: transparent; color: blue;">Password – password</span></span></li>
</ul>
</ul>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="line-height: 1.5;">Click the </span><strong><span style="line-height: 1.5;">Connect</span></strong><span style="line-height: 1.5;"> button to connect to HDFS, then select the output directory in the Open File dialog box.</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="line-height: 1.5;">Click </span><strong><span style="line-height: 1.5;">OK</span></strong><span style="line-height: 1.5;">.</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="line-height: 1.5;">In the </span><strong><span style="line-height: 1.5;">Filename</span></strong><span style="line-height: 1.5;"> field, append the desired output file name to the selected path.</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="line-height: 1.5;">Go to the </span><strong><span style="line-height: 1.5;">Fields</span></strong><span style="line-height: 1.5;"> tab, click the </span><strong><span style="line-height: 1.5;">Get Fields</span></strong><span style="line-height: 1.5;"> button to load the columns of the input file, then click </span><strong><span style="line-height: 1.5;">OK</span></strong><span style="line-height: 1.5;">.</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="line-height: 1.5;">Click the </span><strong><span style="line-height: 1.5;">Save</span></strong><span style="line-height: 1.5;"> icon and save the transformation you have created.</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="line-height: 1.5;">Click on the Run icon in the right panel to execute the transformation.</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="background-color: transparent; line-height: 1.5;">The </span><strong style="background-color: transparent;"><span style="line-height: 1.5;">Execute a Transformation</span></strong><span style="background-color: transparent; line-height: 1.5;"> dialog box appears.</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><em style="background-color: transparent;"><span style="line-height: 1.5;">Note</span></em><span style="background-color: transparent; line-height: 1.5;">: </span><strong style="background-color: transparent;"><span style="line-height: 1.5;">Local Execution</span></strong><span style="background-color: transparent; line-height: 1.5;"> is enabled by default. Select </span><strong style="background-color: transparent;"><span style="line-height: 1.5;">Detailed logging</span></strong><span style="background-color: transparent; line-height: 1.5;">.</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="line-height: 1.5;">Click </span><strong><span style="line-height: 1.5;"><span style="line-height: 1.5;">Launch</span></span></strong><span style="line-height: 1.5;">.</span></span></li>
</ul>
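<div><span style="font-family: Times, Times New Roman, serif;">After the transformation finishes, you can confirm from a terminal that the Hadoop File Output step wrote its file. This is a minimal sketch using the Hadoop 1.x command-line client, assuming the hadoop command is on the PATH of the user who started Hadoop and the NameNode address entered above (localhost:54310). The helper hdfs_preview and the example path /user/hduser/sample.txt are hypothetical; pass the path you entered in the Filename field, including any extension configured in the step.</span></div>

```shell
#!/usr/bin/env bash
# Sketch: preview a file written to HDFS by the Hadoop File Output step.
# hdfs_preview is a hypothetical helper; assumes the Hadoop 1.x `hadoop` CLI is on PATH.

# NameNode URI used in the HDFS connection above; override with HDFS_URI=...
HDFS_URI="${HDFS_URI:-hdfs://localhost:54310}"

# Usage: hdfs_preview /path/in/hdfs [number_of_lines]
hdfs_preview() {
  local path="$1" lines="${2:-5}"
  # `hadoop fs -test -e` succeeds only if the path exists.
  if ! hadoop fs -test -e "${HDFS_URI}${path}"; then
    echo "not found in HDFS: ${HDFS_URI}${path}" >&2
    return 1
  fi
  # Print the first few lines of the file.
  hadoop fs -cat "${HDFS_URI}${path}" | head -n "$lines"
}
```

<div><span style="font-family: Times, Times New Roman, serif;">For example, hdfs_preview /user/hduser/sample.txt 10 prints the first ten rows written by the transformation, or an error if the file is missing.</span></div>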
<h3 style="text-align: left;">
<span style="color: blue; font-family: Times, Times New Roman, serif; font-size: small;"><b>Transformation [Hadoop → Text File]:</b></span></h3>
<span style="background-color: white; font-family: Times, 'Times New Roman', serif; line-height: 24px;">Follow the instructions below to begin creating your transformation.</span><br />
<ul style="text-align: left;">
<li><span style="font-family: Times, Times New Roman, serif;"><span style="background-color: white; line-height: 1.5;">Click </span><strong style="background-color: white; line-height: 19px;"><span style="line-height: 1.5;">New</span></strong><span style="background-color: white; line-height: 1.5;"> in the upper left corner of Spoon.</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="background-color: white; line-height: 1.5;">Select </span><strong style="background-color: white; line-height: 19px;"><span style="line-height: 1.5;">Transformation</span></strong><span style="background-color: white; line-height: 1.5;"> from the list.</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="background-color: white; line-height: 1.5;">Under the </span><strong style="background-color: white; line-height: 19px;"><span style="line-height: 1.5;">Design</span></strong><span style="background-color: white; line-height: 1.5;"> tab, expand the </span><strong style="background-color: white; line-height: 19px;"><span style="line-height: 1.5;">Big Data</span></strong><span style="background-color: white; line-height: 1.5;"> node; then, select and drag a </span><strong style="background-color: white; line-height: 19px;"><span style="line-height: 1.5;">Hadoop File Input</span></strong><span style="background-color: white; line-height: 1.5;"> step onto the canvas on the right.</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="background-color: white; line-height: 1.5;">Expand the </span><strong style="background-color: white; line-height: 19px;"><span style="line-height: 1.5;">Output</span></strong><span style="background-color: white; line-height: 1.5;"> node; click and drag a </span><strong style="background-color: white; line-height: 19px;"><span style="line-height: 1.5;">Text file output</span></strong><span style="background-color: white; line-height: 1.5;"> step onto the canvas.</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="background-color: white; line-height: 1.5;">To connect the steps, add a hop. Hops describe the flow of data between the steps in a transformation. To create the hop, click the </span><strong style="background-color: white; line-height: 19px;"><span style="line-height: 1.5;">Hadoop File Input</span></strong><span style="background-color: white; line-height: 1.5;"> step, press and hold the &lt;</span><strong style="background-color: white; line-height: 19px;"><span style="line-height: 1.5;">SHIFT</span></strong><span style="background-color: white; line-height: 1.5;">&gt; key, and draw a line to the </span><strong style="background-color: white; line-height: 19px;"><span style="line-height: 1.5;">Text file output</span></strong><span style="background-color: white; line-height: 1.5;"> step.</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="background-color: white; line-height: 1.5;">Double click the </span><strong style="background-color: white; line-height: 19px;"><span style="line-height: 1.5;">Hadoop File Input</span></strong><span style="background-color: white; line-height: 1.5;"> step to open its edit properties dialog box.</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="background-color: white; line-height: 1.5;">Next to the </span><strong style="background-color: white; line-height: 19px;"><span style="line-height: 1.5;">File or directory</span></strong><span style="background-color: white; line-height: 1.5;"> field, click the </span><strong style="background-color: white; line-height: 19px;"><span style="line-height: 1.5;">Browse</span></strong><span style="background-color: white; line-height: 1.5;"> button; the Open File dialog box appears as shown below.</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="background-color: white; line-height: 1.5;">Enter the following connection details to connect to HDFS:</span></span></li>
<ul>
<li><span style="font-family: Times, Times New Roman, serif;">Look in: make sure HDFS is selected</span></li>
<li><span style="font-family: Times, Times New Roman, serif;">Under Connection, enter:</span></li>
<ul>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="color: blue;">Server: localhost</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="color: blue;">Port: 54310</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="color: blue;">User ID: hduser</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="color: blue;">Password: password</span></span></li>
</ul>
</ul>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="background-color: white; line-height: 1.5;">Click the </span><strong style="background-color: white; line-height: 19px;"><span style="line-height: 1.5;">Connect</span></strong><span style="background-color: white; line-height: 1.5;"> button to connect to HDFS; the Open File dialog box appears as shown below:</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="background-color: white; line-height: 1.5;">Select the desired input file from HDFS and click the </span><strong style="background-color: white; line-height: 19px;"><span style="line-height: 1.5;">OK</span></strong><span style="background-color: white; line-height: 1.5;"> button.</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="background-color: white; line-height: 1.5;">Click the </span><strong style="background-color: white; line-height: 19px;"><span style="line-height: 1.5;">Add</span></strong><span style="background-color: white; line-height: 1.5;"> button next to the </span><strong style="background-color: white; line-height: 19px;"><span style="line-height: 1.5;">File or directory</span></strong><span style="background-color: white; line-height: 1.5;"> field, as shown below.</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="background-color: white; line-height: 1.5;">Navigate to the </span><strong style="background-color: white; line-height: 19px;"><span style="line-height: 1.5;">Fields</span></strong><span style="background-color: white; line-height: 1.5;"> tab, click the </span><strong style="background-color: white; line-height: 19px;"><span style="line-height: 1.5;">Get Fields</span></strong><span style="background-color: white; line-height: 1.5;"> button to retrieve the columns of the input file, and click the </span><strong style="background-color: white; line-height: 19px;"><span style="line-height: 1.5;">OK</span></strong><span style="background-color: white; line-height: 1.5;"> button.</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="background-color: white; line-height: 1.5;">Double click the </span><strong style="background-color: white; line-height: 19px;"><span style="line-height: 1.5;">Text file output</span></strong><span style="background-color: white; line-height: 1.5;"> step to open its edit properties dialog box.</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="background-color: white; line-height: 1.5;">In the Filename field, click the </span><strong style="background-color: white; line-height: 19px;"><span style="line-height: 1.5;">Browse</span></strong><span style="background-color: white; line-height: 1.5;"> button and navigate to the location where the output file should be placed.</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="background-color: white; line-height: 1.5;">Append the desired output file name to the path shown in the </span><strong style="background-color: white; line-height: 19px;"><span style="line-height: 1.5;">Filename</span></strong><span style="background-color: white; line-height: 1.5;"> field.</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="background-color: white; line-height: 1.5;">Navigate to the </span><strong style="background-color: white; line-height: 19px;"><span style="line-height: 1.5;">Fields</span></strong><span style="background-color: white; line-height: 1.5;"> tab, click the </span><strong style="background-color: white; line-height: 19px;"><span style="line-height: 1.5;">Get Fields</span></strong><span style="background-color: white; line-height: 1.5;"> button to load the fields coming from the Hadoop File Input step, and click the </span><strong style="background-color: white; line-height: 19px;"><span style="line-height: 1.5;">OK</span></strong><span style="background-color: white; line-height: 1.5;"> button.</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="background-color: white; line-height: 1.5;">Click the </span><strong style="background-color: white; line-height: 19px;"><span style="line-height: 1.5;">Save</span></strong><span style="background-color: white; line-height: 1.5;"> icon and save the transformation you have created.</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="background-color: white; line-height: 1.5;">Click on the Run icon in the right panel to execute the transformation.</span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="background-color: white; line-height: 1.5;">Click </span><strong style="background-color: white; line-height: 19px;"><span style="line-height: 1.5;">Launch</span></strong><span style="background-color: white; line-height: 1.5;">.</span></span></li>
</ul>
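Once saved, the same transformation can also be run without the Spoon GUI. A minimal sketch, assuming a standard PDI install where pan.sh sits in the data-integration directory; the function name run_ktr and the paths are hypothetical, and -file/-level are Pan's usual command-line options:

```shell
# Sketch: run a saved PDI transformation (.ktr) headlessly with Pan.
# Assumption: $1 is the PDI directory that contains pan.sh.
run_ktr() {
  local pdi_home="$1" ktr="$2" level="${3:-Basic}"
  if [ ! -x "$pdi_home/pan.sh" ]; then
    echo "pan.sh not found or not executable in $pdi_home" >&2
    return 1
  fi
  if [ ! -f "$ktr" ]; then
    echo "transformation file not found: $ktr" >&2
    return 1
  fi
  # Pan takes -option=value arguments; -level sets the log verbosity.
  "$pdi_home/pan.sh" -file="$ktr" -level="$level"
}
```

For example, run_ktr /opt/data-integration /home/hduser/hdfs_to_text.ktr (hypothetical paths). The HDFS server configured in the Hadoop File Input step must be reachable from the machine that runs Pan.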
</div>
Karthikhttp://www.blogger.com/profile/18220563988883365640noreply@blogger.com5tag:blogger.com,1999:blog-1743557738889130058.post-77508181521552823602013-10-25T05:13:00.005-07:002013-10-25T05:13:39.599-07:00MongoDB Installation on Ubuntu<div dir="ltr" style="text-align: left;" trbidi="on">
<ul style="background-color: white; font-family: arial, helvetica, sans-serif; font-size: 13px; line-height: 19px; margin: 0.5em 0px 0px; padding: 0px 0px 0px 3em;">
<li><span style="font-family: Arial, Helvetica, sans-serif;">Open a terminal and run the following command to install MongoDB on Ubuntu:</span><ul style="margin: 0.5em 0px 0px; padding: 0px 0px 0px 3em;">
<li><strong><span style="color: blue; font-family: Arial, Helvetica, sans-serif;">apt-get install mongodb</span></strong></li>
</ul>
</li>
<li><span style="font-family: Arial, Helvetica, sans-serif;">Alternatively, download the tarball from the following location and extract it:</span><ul style="margin: 0.5em 0px 0px; padding: 0px 0px 0px 3em;">
<li><strong><span style="color: blue;"><a class="wiki_link_ext" href="http://downloads.mongodb.org/linux/mongodb-linux-x86_64-2.2.2.tgz" rel="nofollow" style="background-image: url(http://www.wikispaces.com/i/a.gif); background-position: 100% 50%; background-repeat: no-repeat no-repeat; padding-right: 10px;">http://downloads.mongodb.org/linux/mongodb-linux-x86_64-2.2.2.tgz</a></span></strong></li>
<li><strong><span style="color: blue;">tar zxvf mongodb-linux-x86_64-2.2.2.tgz</span></strong></li>
</ul>
</li>
<li>Enable authentication:<ul style="margin: 0.5em 0px 0px; padding: 0px 0px 0px 3em;">
<li><strong><span style="line-height: 1.5;"><span style="color: blue; font-family: Arial, Helvetica, sans-serif;">root@AX-PENTAHO:/usr/local# echo auth=true >> /etc/mongodb.conf</span></span></strong></li>
<li><strong><span style="line-height: 1.5;"><span style="color: blue; font-family: Arial, Helvetica, sans-serif;">root@AX-PENTAHO:/usr/local# grep auth=true /etc/mongodb.conf</span></span></strong></li>
<li><strong><span style="line-height: 1.5;"><span style="color: blue; font-family: Arial, Helvetica, sans-serif;">auth=true</span></span></strong></li>
</ul>
</li>
<li>Start, stop, or restart MongoDB:<ul style="margin: 0.5em 0px 0px; padding: 0px 0px 0px 3em;">
<li><strong><span style="line-height: 1.5;"><span style="color: blue;">/etc/init.d/mongodb start/stop/restart</span></span></strong></li>
</ul>
</li>
<li>Add an admin user to MongoDB:<ul style="margin: 0.5em 0px 0px; padding: 0px 0px 0px 3em;">
<li><strong><span style="color: blue;">root@AX-PENTAHO:/usr/local# mongo</span></strong></li>
<li><strong><span style="color: blue;">MongoDB shell version: 2.2.2</span></strong></li>
<li><strong><span style="color: blue;">connecting to: test</span></strong></li>
<li><strong><span style="color: blue;">> use admin;</span></strong></li>
<li><strong><span style="color: blue;">switched to db admin</span></strong></li>
<li><strong><span style="color: blue;">> db.addUser('admin','test123');</span></strong></li>
<li><strong><span style="color: blue;">{</span></strong></li>
<li><strong><span style="color: blue;">"user" : "admin",</span></strong></li>
<li><strong><span style="color: blue;">"readOnly" : false,</span></strong></li>
<li><strong><span style="color: blue;">"pwd" : "3ebea24ef5a0388efc523a0cb1ed54d1",</span></strong></li>
<li><strong><span style="color: blue;">"_id" : ObjectId("5100f5ffb6b86baa08f17ff5")</span></strong></li>
<li><strong><span style="color: blue;">}</span></strong></li>
</ul>
</li>
<li>Log in to MongoDB as the admin user (use the password you set when adding the user):<ul style="margin: 0.5em 0px 0px; padding: 0px 0px 0px 3em;">
<li><strong><span style="color: blue; line-height: 1.5;">root@AX-PENTAHO:/usr/local# mongo</span></strong></li>
<li><strong><span style="color: blue; line-height: 1.5;">MongoDB shell version: 2.2.2</span></strong></li>
<li><strong><span style="color: blue; line-height: 1.5;">connecting to: test</span></strong></li>
<li><strong><span style="color: blue; line-height: 1.5;">> use admin</span></strong></li>
<li><strong><span style="color: blue; line-height: 1.5;">switched to db admin</span></strong></li>
<li><strong><span style="color: blue; line-height: 1.5;">> db.db.auth('admin','Amtex123');</span></strong></li>
<li><strong><span style="color: blue; line-height: 1.5;">Thu Jan 24 14:22:00 TypeError: db.db.auth is not a function (shell):1</span></strong></li>
<li><strong><span style="color: blue; line-height: 1.5;">> db.auth('admin','Amtex123');</span></strong></li>
<li><strong><span style="color: blue; line-height: 1.5;">1</span></strong></li>
<li><strong><span style="color: blue; line-height: 1.5;">> exit</span></strong></li>
</ul>
</li>
<li><span style="font-size: 11pt; line-height: 1.5;">Make MongoDB listen on all IP addresses:</span><ul style="margin: 0.5em 0px 0px; padding: 0px 0px 0px 3em;">
<li><strong><span style="color: blue;">vim /etc/mongodb.conf</span></strong></li>
<li><strong><span style="color: blue;">Change bind_ip = 127.0.0.1 to bind_ip = 0.0.0.0</span></strong></li>
</ul>
</li>
</ul>
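The echo >> step above appends another auth line every time it is run, and the bind_ip change is made by hand in vim. A minimal sketch of doing both idempotently, assuming GNU sed (as on Ubuntu) and the key=value format of /etc/mongodb.conf; enable_auth and set_bind_ip are hypothetical helper names. Restart MongoDB afterwards (/etc/init.d/mongodb restart). Binding to 0.0.0.0 makes MongoDB reachable from every network interface, so keep authentication enabled.

```shell
# Sketch: idempotent edits to a MongoDB 2.x key=value config file.
# Assumes GNU sed (-i without a suffix argument, -E for extended regex).

# Ensure exactly one auth=true line: rewrite an existing auth line, else append.
enable_auth() {
  local conf="$1"
  if grep -Eq '^[[:space:]]*auth[[:space:]]*=' "$conf"; then
    sed -i -E 's/^[[:space:]]*auth[[:space:]]*=.*/auth=true/' "$conf"
  else
    echo 'auth=true' >> "$conf"
  fi
}

# Set bind_ip to the given address (e.g. 0.0.0.0), appending it if absent.
set_bind_ip() {
  local conf="$1" ip="$2"
  if grep -Eq '^[[:space:]]*bind_ip[[:space:]]*=' "$conf"; then
    sed -i -E "s/^[[:space:]]*bind_ip[[:space:]]*=.*/bind_ip = $ip/" "$conf"
  else
    echo "bind_ip = $ip" >> "$conf"
  fi
}
```

For example: enable_auth /etc/mongodb.conf; set_bind_ip /etc/mongodb.conf 0.0.0.0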
</div>
Karthikhttp://www.blogger.com/profile/18220563988883365640noreply@blogger.com0tag:blogger.com,1999:blog-1743557738889130058.post-62282906571388005592013-10-25T05:13:00.002-07:002013-10-25T05:13:17.096-07:00Migrating the Report, Graphs, Dashboard created in Pentaho User Console (PUC) from v4.8.0 to v4.8.1<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: left;">
</div>
<ul style="text-align: left;">
<li><span style="color: #222222; font-family: Times, 'Times New Roman', serif; line-height: 19px;">Back up the dashboards, reports, graphs, data sources, and data source names from your current Pentaho v4.8.0 installation:</span></li>
<ul>
<li><span style="color: #222222; font-family: Times, 'Times New Roman', serif; line-height: 1.5;">Back up all the reports, graphs, and dashboards from &lt;pentaho&gt;/server/biserver-ee/pentaho-solutions</span></li>
<li><span style="color: #222222; font-family: Times, 'Times New Roman', serif; line-height: 1.5;">Back up all the data source CSV files from <span style="line-height: 1.5;">&lt;pentaho&gt;</span></span><span style="color: #222222; font-family: Times, 'Times New Roman', serif; line-height: 1.5;">/server/biserver-ee/pentaho-solutions/system/metadata/csvfiles</span></li>
<li><span style="color: #222222; font-family: Times, 'Times New Roman', serif; line-height: 1.5;">Back up all the data source names from <span style="line-height: 1.5;">&lt;pentaho&gt;</span></span><span style="color: #222222; font-family: Times, 'Times New Roman', serif; line-height: 1.5;">/server/biserver-ee/pentaho-solutions/system/olap</span></li>
<li><span style="color: #222222; font-family: Times, 'Times New Roman', serif; line-height: 1.5;">Back up all the resource files from </span><span style="color: #222222; font-family: Times, 'Times New Roman', serif; line-height: 1.5;">&lt;pentaho&gt;/server/biserver-ee/pentaho-solutions/admin/resources/metadata</span></li>
</ul>
<li><span style="background-color: white; font-family: Times, 'Times New Roman', serif; line-height: 19px;">Restore the backed-up files to the corresponding locations on the new Pentaho server (v4.8.1) and restart the server.</span></li>
</ul>
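The backup step can be scripted. A minimal sketch, assuming the directory layout listed above; backup_pentaho is a hypothetical helper, and because the folders holding your own reports, graphs, and dashboards vary from site to site, their names under pentaho-solutions are passed as extra arguments rather than guessed:

```shell
# Sketch: archive the Pentaho 4.8.0 artefacts listed above.
# $1 = Pentaho root directory, $2 = archive to create,
# remaining args = your own solution folders under pentaho-solutions.
backup_pentaho() {
  local pentaho="$1" archive="$2"
  shift 2
  local base="$pentaho/server/biserver-ee/pentaho-solutions"
  # Fixed locations from the list above, relative to pentaho-solutions.
  local paths="system/metadata/csvfiles system/olap admin/resources/metadata"
  local p
  # Fail before writing anything if an expected directory is missing.
  for p in $paths "$@"; do
    if [ ! -e "$base/$p" ]; then
      echo "missing: $base/$p" >&2
      return 1
    fi
  done
  # Store relative paths so the archive can be extracted into the
  # pentaho-solutions directory of the new server.
  tar -czf "$archive" -C "$base" $paths "$@"
}
```

For example, backup_pentaho /opt/pentaho /tmp/pentaho-480.tgz steel-wheels (hypothetical folder name). On the v4.8.1 server, extract the archive with tar -xzf into the new pentaho-solutions directory before restarting the server.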
</div>
Karthikhttp://www.blogger.com/profile/18220563988883365640noreply@blogger.com0tag:blogger.com,1999:blog-1743557738889130058.post-29713605668042331252013-10-25T05:11:00.004-07:002013-10-25T05:11:59.930-07:00CTools Integration with Pentaho installed on Ubuntu<div dir="ltr" style="text-align: left;" trbidi="on">
<div class="ws-menu-bar WikiControls WikispacesContent WikispacesBs" style="background-color: white; margin-bottom: 10px; padding-bottom: 0px; position: relative;">
<div style="text-align: left;">
</div>
<ul style="text-align: left;">
<li><span style="color: #222222; font-family: Times, 'Times New Roman', serif; line-height: 1.5;">Download the Marketplace plugin from the following location:</span></li>
<ul>
<li><a class="wiki_link_ext" href="http://ci.analytical-labs.com/view/Webdetails/job/Webdetails-Marketplace/" rel="nofollow" style="background-image: url(http://www.wikispaces.com/i/a.gif); background-position: 100% 50%; background-repeat: no-repeat no-repeat; line-height: 1.5; padding-right: 10px;" target="_blank"><span style="font-family: Times, Times New Roman, serif;">http://ci.analytical-labs.com/view/Webdetails/job/Webdetails-Marketplace/</span></a></li>
</ul>
<li><span style="color: #222222; font-family: Times, 'Times New Roman', serif; line-height: 1.5;">Extract the downloaded archive and place the extracted folder under the pentaho-solutions/system directory.</span></li>
<li><span style="color: #222222; font-family: Times, 'Times New Roman', serif; line-height: 1.5;">Restart the server.</span></li>
<li><span style="color: #222222; font-family: Times, 'Times New Roman', serif; line-height: 1.5;">Open the Pentaho User Console (PUC) and navigate to Tools -> Marketplace.</span></li>
<li><span style="color: #222222; font-family: Times, 'Times New Roman', serif; line-height: 1.5;">Install the tools you need, e.g. CDE.</span></li>
<li><span style="color: #222222; font-family: Times, 'Times New Roman', serif; line-height: 1.5;">Restart the server again.</span></li>
</ul>
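The extraction step can be sketched as a small shell helper. It assumes the download is a .zip or .tar.gz archive with the plugin folder at its top level (check the archive you actually download); install_marketplace is a hypothetical name, and unzip must be installed for .zip files:

```shell
# Sketch: unpack a downloaded plugin archive into pentaho-solutions/system.
# $1 = archive file, $2 = the pentaho-solutions/system directory.
install_marketplace() {
  local archive="$1" system_dir="$2"
  if [ ! -d "$system_dir" ]; then
    echo "no such directory: $system_dir" >&2
    return 1
  fi
  case "$archive" in
    *.zip) unzip -o -q "$archive" -d "$system_dir" ;;
    *.tar.gz|*.tgz) tar -xzf "$archive" -C "$system_dir" ;;
    *) echo "unsupported archive type: $archive" >&2; return 1 ;;
  esac
}
```

Restart the server after running it, as in the steps above.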
</div>
</div>
Karthikhttp://www.blogger.com/profile/18220563988883365640noreply@blogger.com0tag:blogger.com,1999:blog-1743557738889130058.post-26794795447044268402013-10-25T05:08:00.003-07:002013-10-25T05:08:41.298-07:00Adding the Data Cleaner and Data Quality Plugins to Kettle<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: left;">
<span style="font-family: Times, Times New Roman, serif;"><strong><span style="color: blue; line-height: 1.5;">DataCleaner:</span></strong></span><br />
<br />
<ul style="text-align: left;">
<li><a class="wiki_link_ext" href="http://wiki.pentaho.com/display/EAI/Kettle+Data+Profiling+with+DataCleaner" rel="nofollow" style="background-image: url(http://www.wikispaces.com/i/a.gif); background-position: 100% 50%; background-repeat: no-repeat no-repeat; font-family: Times, 'Times New Roman', serif; line-height: 24px; padding-right: 10px;">http://wiki.pentaho.com/display/EAI/Kettle+Data+Profiling+with+DataCleaner</a></li>
</ul>
<span style="font-family: Times, Times New Roman, serif;"><span style="line-height: 1.5;"><b><span style="color: blue;">Data Quality:</span></b></span></span><br />
<ul style="text-align: left;">
<li><span style="font-family: Times, Times New Roman, serif;"><span style="line-height: 1.5;"><span style="background-color: white; line-height: 1.5;">Download the EasyDQ (Easy Data Quality) plugin for Pentaho.</span></span></span></li>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="line-height: 1.5;"><span style="background-color: white; line-height: 1.5;">The plugin source code and downloads are hosted on SourceForge, so the first step is to download it from:</span></span></span></li>
<ul>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="line-height: 1.5;"><a class="wiki_link_ext" href="http://sourceforge.net/projects/easydq-kettle/" rel="nofollow" style="background-image: url(http://www.wikispaces.com/i/a.gif); background-position: 100% 50%; background-repeat: no-repeat no-repeat; line-height: 1.5; padding-right: 10px;">http://sourceforge.net/projects/easydq-kettle/</a></span></span></li>
</ul>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="line-height: 1.5;"><span style="background-color: white; line-height: 1.5;">After downloading, you will have a file named EasyDQ-PDI-plugin.jar.</span></span></span></li>
</ul>
<br />
<strong style="font-family: Times, 'Times New Roman', serif; line-height: 24px;"><span style="color: blue;">Copy the plugin file to Pentaho:</span></strong><br />
<span style="background-color: white; font-family: Times, 'Times New Roman', serif; line-height: 1.5;"><ul style="text-align: left;">
<li><span style="line-height: 1.5;">Copy the EasyDQ-PDI-plugin.jar file to the plugins/ directory of Pentaho Data Integration. The folder will already contain a few other plugins; for instance, it may look like this:</span></li>
</ul>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://www.easydq.com/media/127650/pdi-install-1_500x248.jpg" imageanchor="1" style="font-family: 'Times New Roman'; line-height: normal; margin-left: 1em; margin-right: 1em;"><img alt="external image pdi-install-1_500x248.jpg" border="0" src="http://www.easydq.com/media/127650/pdi-install-1_500x248.jpg" style="border: 0px; padding: 4px;" title="external image pdi-install-1_500x248.jpg" /></a></div>
<div>
</div>
<ul style="text-align: left;">
<li><span style="line-height: 1.5;">If you prefer, you can also create a subdirectory in the plugins/ folder and put the file there.</span></li>
</ul>
<div>
<span style="line-height: normal;"><b><span style="color: blue;">Start Pentaho Data Integration:</span></b></span></div>
<ul style="text-align: left;">
<li><span style="line-height: 1.5;">Start Pentaho Data Integration by executing the spoon.bat file (or spoon.sh on *nix systems). Once the application has started, you should see the "Data Quality" category of steps when you work with transformations.</span></li>
</ul>
</span></div>
<div class="separator" style="clear: both; text-align: center;">
<span style="font-family: Times, Times New Roman, serif; margin-left: 1em; margin-right: 1em;"><img alt="external image pdi-install-2.png" src="http://www.easydq.com/media/127692/pdi-install-2.png" style="border: 0px; padding: 4px;" title="external image pdi-install-2.png" /></span></div>
<div style="text-align: left;">
<br style="background-color: white; font-family: arial, helvetica, sans-serif; font-size: 13px; line-height: 19px;" /></div>
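Copying the plugin can be scripted as well. A minimal sketch, assuming PDI is unpacked in a directory that already contains plugins/; install_pdi_plugin is a hypothetical helper, and its optional third argument creates the subdirectory mentioned above. Restart Spoon afterwards so that it picks up the new plugin.

```shell
# Sketch: copy a plugin jar into PDI's plugins/ folder.
# $1 = plugin jar, $2 = PDI directory, $3 = optional subdirectory name.
install_pdi_plugin() {
  local jar="$1" pdi_home="$2" subdir="${3:-}"
  # ${subdir:+/$subdir} expands to "/name" only when a subdirectory is given.
  local dest="$pdi_home/plugins${subdir:+/$subdir}"
  if [ ! -f "$jar" ]; then
    echo "plugin jar not found: $jar" >&2
    return 1
  fi
  if [ ! -d "$pdi_home/plugins" ]; then
    echo "no plugins/ directory in $pdi_home" >&2
    return 1
  fi
  mkdir -p "$dest"
  cp "$jar" "$dest/"
}
```

For example: install_pdi_plugin EasyDQ-PDI-plugin.jar /opt/data-integration easydq (hypothetical paths).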
</div>
Karthikhttp://www.blogger.com/profile/18220563988883365640noreply@blogger.com1tag:blogger.com,1999:blog-1743557738889130058.post-10308797602366418642013-10-14T10:50:00.001-07:002013-10-25T04:10:13.660-07:00Integration of R/Weka with Pentaho Data Integration (Spoon/Kettle)<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="margin-bottom: 0in; text-align: left;">
<b><span style="color: blue; font-family: Times, Times New Roman, serif;">Weka
Installation and Integration with R:</span></b></div>
<ul style="text-align: left;">
<li><div style="margin-bottom: 0in;">
<span style="font-family: Times, Times New Roman, serif;"><span style="color: #222222;">Download and install Weka from the following link: </span><span style="color: navy;"><span lang="zxx"><u><a href="http://www.cs.waikato.ac.nz/ml/weka/" target="_blank">http://www.cs.waikato.ac.nz/ml/weka/</a></u></span></span></span></div>
</li>
<li><span style="color: #222222; font-family: Times, 'Times New Roman', serif;">Installation location on the AE3 server: /opt/weka-3-7-10</span></li>
<li><span style="color: #222222; font-family: Times, 'Times New Roman', serif;">On Linux, navigate to that directory and run the following command to start Weka:</span></li>
<ul>
<li><span style="color: blue;"><span style="font-family: Times, Times New Roman, serif;">java -Xmx1000M -jar weka.jar</span></span></li>
</ul>
<li><span style="color: #222222;"><span style="font-family: Times, Times New Roman, serif;">In the Weka GUI Chooser, navigate to Tools -> Package Manager.</span></span></li>
<li><span style="color: #222222;"><span style="font-family: Times, Times New Roman, serif;">Install the following dependency packages through the Package Manager:</span></span></li>
<ul>
<li><span style="font-family: Times, Times New Roman, serif;">RPlugin</span></li>
<li><span style="font-family: Times, Times New Roman, serif;">DTNB</span></li>
<li><span style="font-family: Times, Times New Roman, serif;">timeseriesForecasting</span></li>
<li><span style="font-family: Times, Times New Roman, serif;">naiveBayesTree</span></li>
<li><span style="font-family: Times, Times New Roman, serif;">kfKettle</span></li>
<li><span style="font-family: Times, Times New Roman, serif;">multiInstanceFilters</span></li>
<li><span style="font-family: Times, Times New Roman, serif;">UserClassifier</span></li>
</ul>
<li><span style="color: #222222;"><span style="font-family: Times, Times New Roman, serif;">Weka log file location: /root/wekafiles</span></span></li>
</ul>
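Weka 3.7 also exposes its package manager on the command line through the weka.core.WekaPackageManager class. The sketch below loops over a list of package names; it assumes that class's -install-package option and that the names match those shown in the Package Manager (list them with -list-packages all if unsure). install_weka_packages is a hypothetical helper, and the JAVA_CMD override exists only so the function can be tested without Java.

```shell
# Sketch: install Weka packages non-interactively.
# $1 = path to weka.jar, remaining args = package names.
# JAVA_CMD (optional) overrides the java binary.
install_weka_packages() {
  local weka_jar="$1"
  shift
  local java_cmd="${JAVA_CMD:-java}"
  local pkg
  if [ ! -f "$weka_jar" ]; then
    echo "weka.jar not found: $weka_jar" >&2
    return 1
  fi
  for pkg in "$@"; do
    echo "Installing $pkg" >&2
    # Stop at the first package that fails to install.
    "$java_cmd" -cp "$weka_jar" weka.core.WekaPackageManager \
      -install-package "$pkg" || return 1
  done
}
```

For example: install_weka_packages /opt/weka-3-7-10/weka.jar DTNB naiveBayesTree kfKettle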
<span style="color: blue; font-family: Times, Times New Roman, serif;"><b>Weka integration with Pentaho Spoon:</b> </span><br />
<div style="margin-bottom: 0in;">
<div style="margin-bottom: 0in; text-align: left;">
<div style="text-align: left;">
<b><span style="font-family: Times, Times New Roman, serif;">To integrate Weka with Pentaho for data mining, we need a trained model (a serialized Weka model or a PMML model) to score the input data.</span></b></div>
</div>
<ul style="text-align: left;">
<li><span style="font-family: Times, Times New Roman, serif;">Create and export the model in Weka:</span></li>
<li><span style="font-family: Times, Times New Roman, serif;">Open the Weka Explorer.</span></li>
<li><span style="font-family: Times, Times New Roman, serif;">Open the CSV file, navigate to the Classify tab, and choose the J48 decision-tree classifier (Weka's implementation of C4.5) under Choose -> trees -> J48.</span></li>
<li><span style="font-family: Times, Times New Roman, serif;">Click the Start button to build the classifier model.</span></li>
</ul>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhcw7RLjyEo2sUCWfvLgWkVs6G8b8bvH1HNwrMlicylVbLCYJQqV_VloJkPonVzvnDPp_C3D0eBJxNkmyAsdzCZgcu1lUhW_Ja3P0ZsW6S9SEtGIOpcbHLC8v9JgLagUw5HtJt6jNAX2D4/s1600/001.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-family: Times, Times New Roman, serif;"><img border="0" height="286" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhcw7RLjyEo2sUCWfvLgWkVs6G8b8bvH1HNwrMlicylVbLCYJQqV_VloJkPonVzvnDPp_C3D0eBJxNkmyAsdzCZgcu1lUhW_Ja3P0ZsW6S9SEtGIOpcbHLC8v9JgLagUw5HtJt6jNAX2D4/s400/001.JPG" width="400" /></span></a></div>
<div>
<ul style="text-align: left;">
<li><span style="font-family: Times, Times New Roman, serif;">After a successful run, you should see output similar to the screenshot above. The percentage of Correctly Classified Instances should be above 60%.</span></li>
<li><span style="font-family: Times, Times New Roman, serif;">Save the model to a specific location: right-click the entry in the Result list and choose Save model.</span></li>
</ul>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgxNbOrJZxAQBzLuGpTHkJEih4W46xdnQ4MClw8t-KIlWI95BKU1RBlSQkLyFeYDu1By_UizfiOYqfkg9f_439LKIN3_en7GoYGQx7cDqqi_-Q_5zCoE-aNUByjxv1TJhuPYYX2iKjieVE/s1600/002.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><span style="font-family: Times, Times New Roman, serif;"><img border="0" height="241" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgxNbOrJZxAQBzLuGpTHkJEih4W46xdnQ4MClw8t-KIlWI95BKU1RBlSQkLyFeYDu1By_UizfiOYqfkg9f_439LKIN3_en7GoYGQx7cDqqi_-Q_5zCoE-aNUByjxv1TJhuPYYX2iKjieVE/s400/002.JPG" width="400" /></span></a></div>
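The Explorer steps above (train J48 on the CSV file, check the accuracy, save the model) can also be run from the command line. A minimal sketch under these assumptions: weka.jar is the one in /opt/weka-3-7-10, the class attribute is the last column of the CSV (Weka's default), and train_j48_model, correct_pct, and meets_threshold are hypothetical helper names. Whether a model written with -d loads in the Weka Scoring step exactly like one saved from the Explorer is also an assumption to verify.

```shell
# Sketch: train J48 from the command line and check the accuracy.
# -t = training file, -d = file to write the serialized model to;
# the evaluation summary is printed to stdout.
train_j48_model() {
  local weka_jar="$1" csv="$2" model_out="$3"
  local java_cmd="${JAVA_CMD:-java}"
  if [ ! -f "$csv" ]; then
    echo "training file not found: $csv" >&2
    return 1
  fi
  "$java_cmd" -cp "$weka_jar" weka.classifiers.trees.J48 -t "$csv" -d "$model_out"
}

# Read Weka output on stdin and print the percentage from the last
# "Correctly Classified Instances" line (the cross-validation summary
# when only -t is given). Fails if no such line is found.
correct_pct() {
  awk '/^Correctly Classified Instances/ { pct = $(NF - 1) }
       END { if (pct == "") exit 1; print pct }'
}

# Succeed if percentage $1 is at least threshold $2 (e.g. 60).
meets_threshold() {
  awk -v p="$1" -v m="$2" 'BEGIN { exit !(p + 0 >= m + 0) }'
}
```

For example: train_j48_model /opt/weka-3-7-10/weka.jar data.csv j48.model > j48.txt, then meets_threshold "$(correct_pct < j48.txt)" 60 (hypothetical file names).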
<div>
</div>
<ul style="text-align: left;">
<li><span style="font-family: Times, Times New Roman, serif;">Open Pentaho Spoon and create a transformation as shown below:</span></li>
</ul>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiihon2IQh5TKd-YnyM54w_iz2FAotlbD-tfF2Cr-PuPW5z6Ml667DsB5eFrq7RYFFpAC_HAeiGzAhVIy5gRXnPP0EHk6zFMZFhfY-yfV4LpX0PwMoWQNlKuOH2rHjud3oM0Xqy1zYVhgY/s1600/003.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-family: Times, Times New Roman, serif;"><img border="0" height="155" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiihon2IQh5TKd-YnyM54w_iz2FAotlbD-tfF2Cr-PuPW5z6Ml667DsB5eFrq7RYFFpAC_HAeiGzAhVIy5gRXnPP0EHk6zFMZFhfY-yfV4LpX0PwMoWQNlKuOH2rHjud3oM0Xqy1zYVhgY/s640/003.JPG" width="640" /></span></a></div>
<div>
<ul style="text-align: left;">
<li><span style="font-family: Times, Times New Roman, serif;">In the Weka Scoring step, load the model exported from Weka and map the input fields to the model's attributes.</span></li>
</ul>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjos-JUYikq2Ld509wtKFXEA_x5aEqNzVNN2Hzz_qAryocraZcNiV6lvPOryXEBPuxDBY_j1G8bUmYujE9nfembpSdiOQfUSI69UsgoFWEECAR45NtS_mLpNs8b92oFu38qZb3bAS7RGLI/s1600/004.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-family: Times, Times New Roman, serif;"><img border="0" height="235" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjos-JUYikq2Ld509wtKFXEA_x5aEqNzVNN2Hzz_qAryocraZcNiV6lvPOryXEBPuxDBY_j1G8bUmYujE9nfembpSdiOQfUSI69UsgoFWEECAR45NtS_mLpNs8b92oFu38qZb3bAS7RGLI/s400/004.JPG" width="400" /></span></a></div>
<span style="font-family: Times, Times New Roman, serif;"><br /></span>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgFoSbUQDlKQggrH-AEw7ESwgLzCLQIffAuRpAP60sxWA_tVGEsIzhnJGQgqYJZLZ1DgcxGOFXw-CxrGpu2_1jayAjZ-84zQ25FqHjE_vMrNeOubKV-TaauAwJyMD3jnQXz-RmMjIDWU2g/s1600/005.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-family: Times, Times New Roman, serif;"><img border="0" height="198" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgFoSbUQDlKQggrH-AEw7ESwgLzCLQIffAuRpAP60sxWA_tVGEsIzhnJGQgqYJZLZ1DgcxGOFXw-CxrGpu2_1jayAjZ-84zQ25FqHjE_vMrNeOubKV-TaauAwJyMD3jnQXz-RmMjIDWU2g/s400/005.JPG" width="400" /></span></a></div>
</div>
</div>
<div>
<ul style="text-align: left;">
<li><span style="font-family: Times, Times New Roman, serif;">Weka is now integrated with Pentaho: implement your data mining logic and run the transformation.</span></li>
</ul>
</div>
</div>
</div>
Karthikhttp://www.blogger.com/profile/18220563988883365640noreply@blogger.com2tag:blogger.com,1999:blog-1743557738889130058.post-78866824917363885232013-10-09T07:20:00.001-07:002013-10-25T03:31:27.878-07:00How to install and upgrade R up to date on Ubuntu linux<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: left;">
<b><span style="color: blue; font-family: Times, Times New Roman, serif;">Installation of R on Ubuntu:</span></b></div>
<div class="entry" style="text-align: justify;">
<div style="color: #333333;">
<div style="text-align: left;">
<ul style="line-height: 1.4em; text-align: left;">
<li><span style="font-family: Times, 'Times New Roman', serif; line-height: 1.4em;">R is included in the standard Ubuntu distribution and can be installed with the following command:</span></li>
<ul>
<li><b style="color: blue; font-family: Times, 'Times New Roman', serif; line-height: 1.4em;">sudo apt-get install r-base</b></li>
</ul>
<li><span style="font-family: Times, 'Times New Roman', serif; line-height: 16.796875px;">Installation locations of R and its related files:</span></li>
<ul>
<li><span style="line-height: 16.796875px;"><span style="font-family: Times, Times New Roman, serif;"><span style="color: blue;"><b>Installed location: /usr/lib/R</b></span></span></span></li>
<li><span style="line-height: 16.796875px;"><span style="font-family: Times, Times New Roman, serif;"><span style="color: blue;"><b>Other R libraries and files: /usr/local/lib/R, /usr/share/R, /etc/R, /usr/bin/R</b></span></span></span></li>
</ul>
<li><span style="font-family: Times, 'Times New Roman', serif; line-height: 16.796875px;">Open a terminal and type R to start the R shell.</span></li>
</ul>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhzIS3MRJ78rcGmyN-yubc5v2LihHIo1TBdOIMtg30NvTeF6cyI1cJ2FJ2-tUwGW2udIfQt9pMkuANmSmEQfOYxU48uHl3qPN_OpDz1riyHdI8nlYYEmDIoOhJyEqKb8ur7kyFzyxw3SdQ/s1600/R.JPG" imageanchor="1" style="font-family: Times, 'Times New Roman', serif; margin-left: 1em; margin-right: 1em;"><img border="0" height="246" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhzIS3MRJ78rcGmyN-yubc5v2LihHIo1TBdOIMtg30NvTeF6cyI1cJ2FJ2-tUwGW2udIfQt9pMkuANmSmEQfOYxU48uHl3qPN_OpDz1riyHdI8nlYYEmDIoOhJyEqKb8ur7kyFzyxw3SdQ/s400/R.JPG" width="400" /></a></div>
<ul style="line-height: 1.4em; text-align: left;">
<li><span style="font-family: Times, 'Times New Roman', serif; line-height: 16.796875px;">Set the environment variables for R in the ~/.bashrc file:</span></li>
</ul>
<ul style="text-align: left;"><ul>
<li><b style="color: blue; font-family: Times, 'Times New Roman', serif; line-height: 16.796875px;">export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.35</b></li>
<li><b style="color: blue; font-family: Times, 'Times New Roman', serif; line-height: 16.796875px;">export PATH=$JAVA_HOME/bin:$PATH</b></li>
<li><b style="color: blue; font-family: Times, 'Times New Roman', serif; line-height: 16.796875px;">export R_HOME=/usr/lib/R</b></li>
<li><b style="color: blue; font-family: Times, 'Times New Roman', serif; line-height: 16.796875px;">export PATH=$R_HOME/bin:$PATH</b></li>
<li><b style="color: blue; font-family: Times, 'Times New Roman', serif; line-height: 16.796875px;">export LD_LIBRARY_PATH=/u01/app/oracle/product/11.2.0/xe/lib:/usr/lib/R/lib:/usr/lib/jvm/java-6-sun-1.6.0.35/lib:/usr/lib/jvm/java-6-sun-1.6.0.35/jre/lib/amd64/server:/usr/local/lib/R/site-library/rJava/jri</b></li>
<li><b style="color: blue; font-family: Times, 'Times New Roman', serif; line-height: 16.796875px;">export PATH=$LD_LIBRARY_PATH/bin:$PATH</b></li>
</ul>
</ul>
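<div style="text-align: left;"><span style="font-family: Times, Times New Roman, serif;">To confirm that a new shell actually picked up these settings, a Java program started from that shell can read the same variables. The sketch below (hypothetical class LibraryPathCheck) prints JAVA_HOME and R_HOME and checks that the R and JRI directories from the LD_LIBRARY_PATH export above are present; the REQUIRED list is an assumption copied from that line, so adjust it to your own install.</span></div>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: checks that the R and JRI directories are on LD_LIBRARY_PATH. */
final class LibraryPathCheck {

    // Directories taken from the LD_LIBRARY_PATH export above; adjust to your install.
    static final List<String> REQUIRED = Arrays.asList(
            "/usr/lib/R/lib",
            "/usr/local/lib/R/site-library/rJava/jri");

    /** Drops trailing slashes so "/usr/lib/R/lib/" and "/usr/lib/R/lib" compare equal. */
    static String normalize(String dir) {
        String d = dir.trim();
        while (d.length() > 1 && d.endsWith("/")) {
            d = d.substring(0, d.length() - 1);
        }
        return d;
    }

    /** Returns the required directories absent from a colon-separated path value. */
    static List<String> missingEntries(String pathValue, List<String> required) {
        Set<String> present = new HashSet<>();
        if (pathValue != null) {
            for (String entry : pathValue.split(":")) {
                if (!entry.trim().isEmpty()) {
                    present.add(normalize(entry));
                }
            }
        }
        List<String> missing = new ArrayList<>();
        for (String dir : required) {
            if (!present.contains(normalize(dir))) {
                missing.add(dir);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        System.out.println("JAVA_HOME = " + System.getenv("JAVA_HOME"));
        System.out.println("R_HOME    = " + System.getenv("R_HOME"));
        List<String> missing = missingEntries(System.getenv("LD_LIBRARY_PATH"), REQUIRED);
        System.out.println(missing.isEmpty()
                ? "LD_LIBRARY_PATH contains the R and JRI directories."
                : "Missing from LD_LIBRARY_PATH: " + missing);
    }
}
```
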
<ul style="line-height: 1.4em; text-align: left;">
<li><span style="font-family: Times, 'Times New Roman', serif; line-height: 1.4em;">Run the command below to update R's Java configuration:</span></li>
<ul>
<li><b style="color: blue; font-family: Times, 'Times New Roman', serif; line-height: 16.796875px;">sudo R CMD javareconf</b></li>
</ul>
<li><span style="color: #333333; font-family: Times, 'Times New Roman', serif; line-height: 1.4em;">The R packages included in the standard Ubuntu distribution usually lag a little behind the latest release, which is acceptable for most users most of the time. However, R is evolving quite quickly at the moment, and for various reasons I have decided to skip Ubuntu 12.10 (quantal) and stay with Ubuntu 12.04 (precise) for the time being. Since Ubuntu 12.04 ships R 2.14 and I would rather use R 2.15, I want to run the latest R builds on my Ubuntu system.</span></li>
</ul>
<ul style="line-height: 1.4em; text-align: left;">
<li><span style="font-family: Times, Times New Roman, serif;"><span style="color: #333333; line-height: 1.4em;">Fortunately this is easy, because CRAN maintains a repository of Ubuntu builds of R. </span><a href="http://cran.r-project.org/bin/linux/ubuntu/" style="color: #b85b5a; line-height: 1.4em; text-decoration: none;">Full instructions</a><span style="color: #333333; line-height: 1.4em;"> are provided on CRAN; here is a quick summary. First, find your nearest CRAN mirror in the </span><a href="http://cran.r-project.org/mirrors.html" style="color: #b85b5a; line-height: 1.4em; text-decoration: none;">list of mirrors</a><span style="color: #333333; line-height: 1.4em;"> on CRAN. The commands below use the CMU mirror (lib.stat.cmu.edu); substitute your own mirror if it is closer.</span></span></li>
</ul>
</div>
</div>
<div>
<b style="line-height: normal; text-align: left;"><span style="color: blue; font-family: Times, Times New Roman, serif;">Upgrade R on Ubuntu:</span></b><br />
<div style="text-align: left;">
<div style="text-align: left;">
<div style="text-align: left;">
</div>
<ul style="text-align: left;"><ul>
<li><b style="color: blue; font-family: Times, 'Times New Roman', serif;">sudo su</b></li>
<li><b style="color: blue; font-family: Times, 'Times New Roman', serif;">echo "deb http://lib.stat.cmu.edu/R/CRAN/bin/linux/ubuntu precise/" >> /etc/apt/sources.list</b></li>
<li><b style="color: blue; font-family: Times, 'Times New Roman', serif;">apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E084DAB9</b></li>
<li><b style="color: blue; font-family: Times, 'Times New Roman', serif;">apt-get update</b></li>
<li><b style="color: blue; font-family: Times, 'Times New Roman', serif;">apt-get upgrade</b></li>
</ul>
<li><span style="color: #333333; font-family: Times, 'Times New Roman', serif; line-height: 1.4em;">That’s it. R is now updated to the latest version, and your system will check for R updates in the usual way. There are just two things you may need to edit in the echo command (the second line above). </span><span style="color: #333333; font-family: Times, 'Times New Roman', serif; line-height: 1.4em;">The first is the address of the CRAN mirror (here “</span><a href="http://lib.stat.cmu.edu/R/CRAN/bin/linux/ubuntu/precise/" style="font-family: Times, 'Times New Roman', serif; line-height: 1.4em;">http://lib.stat.cmu.edu/R/CRAN</a><span style="color: #333333; font-family: Times, 'Times New Roman', serif; line-height: 1.4em;">”). The second is the codename of the Ubuntu release you are running (here “precise”).</span></li>
</ul>
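<div style="text-align: left;"><span style="font-family: Times, Times New Roman, serif;">The apt source line only depends on the two values mentioned above: the CRAN mirror root and the Ubuntu codename. The small Java sketch below (hypothetical class CranSourceLine) assembles the line in the same shape as the echo command, which makes it easy to generate the correct line for another mirror or release; pass the mirror and codename as arguments, or it falls back to the CMU mirror and precise.</span></div>

```java
/** Hypothetical helper: builds the CRAN apt source line used in the echo command. */
final class CranSourceLine {

    /**
     * Builds "deb MIRROR/bin/linux/ubuntu CODENAME/" from a CRAN mirror root
     * (e.g. http://lib.stat.cmu.edu/R/CRAN) and an Ubuntu codename (e.g. precise).
     */
    static String build(String mirrorRoot, String codename) {
        // Tolerate a trailing slash on the mirror root.
        String root = mirrorRoot.endsWith("/")
                ? mirrorRoot.substring(0, mirrorRoot.length() - 1)
                : mirrorRoot;
        return "deb " + root + "/bin/linux/ubuntu " + codename + "/";
    }

    public static void main(String[] args) {
        // The two values the post says you may need to change.
        String mirror = args.length > 0 ? args[0] : "http://lib.stat.cmu.edu/R/CRAN";
        String codename = args.length > 1 ? args[1] : "precise";
        System.out.println(build(mirror, codename));
    }
}
```
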
</div>
</div>
</div>
</div>
</div>
Karthikhttp://www.blogger.com/profile/18220563988883365640noreply@blogger.com12tag:blogger.com,1999:blog-1743557738889130058.post-67366237611857284972013-10-02T11:54:00.000-07:002014-01-24T05:16:15.521-08:00Elasticsearch, Kettle and the CTools<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="margin-bottom: 0in; page-break-before: always; text-align: left;">
<span style="font-family: Times, Times New Roman, serif;"><span style="font-weight: normal;">When I started working with Elasticsearch in PDI, following the post at
</span><a href="http://pedroalves-bi.blogspot.co.uk/2011/07/elasticsearch-kettle-and-ctools.html?m=1" style="font-weight: normal;">http://pedroalves-bi.blogspot.co.uk/2011/07/elasticsearch-kettle-and-ctools.html?m=1</a><span style="font-weight: normal;">,
I ran into many issues, and while debugging them I could not find much information or support anywhere. This post therefore
describes how to insert bulk data into the Elasticsearch engine using the Elastic Search Bulk Insert step in Kettle, and how to feed the output
of Kettle into CDA.</span></span><br />
<span style="font-family: Times, Times New Roman, serif;"><span style="font-weight: normal;">Currently,
Pentaho cannot be used with newer versions of Elasticsearch such as 0.90.5, mainly because the PDI components were compiled against the 0.16.3 classes.<span style="background-color: white;"> </span></span></span><br />
<span style="font-family: Times, Times New Roman, serif;"><span style="font-weight: normal;"><span style="background-color: white;"><br /></span></span>
<span style="color: blue;"><b>Prerequisites:</b></span></span><br />
<ul style="text-align: left;">
<li><span style="font-family: Times, Times New Roman, serif;">Elastic Search engine - ES 0.19.5</span></li>
<li><span style="font-family: Times, Times New Roman, serif;">Pentaho BA Server - 4.8.0 GA</span></li>
<li><span style="font-family: Times, Times New Roman, serif;">Kettle - 4.4</span></li>
</ul>
<b style="color: blue;"><span style="font-family: Times, Times New Roman, serif;">Installing the Elasticsearch Engine:</span></b></div>
<div style="margin-bottom: 0in; page-break-before: always; text-align: left;">
<ul style="text-align: left;">
<li><span style="font-family: Times, Times New Roman, serif;">Download ES version 0.19.5 from <a href="http://www.elasticsearch.org/downloads/0-19-5/">http://www.elasticsearch.org/downloads/0-19-5/</a></span></li>
<li><span style="font-family: Times, 'Times New Roman', serif;">Extract the elasticsearch-0.19.5.tar file into the /usr/share directory.</span></li>
<li><span style="font-family: Times, 'Times New Roman', serif;">Go to the /usr/share directory and run the following command to install the elasticsearch-head plugin:</span></li>
<ul>
<li><span style="color: blue; font-family: Times, 'Times New Roman', serif;">$ elasticsearch-0.19.5/bin/plugin -install mobz/elasticsearch-head</span></li>
</ul>
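<li><span style="font-family: Times, 'Times New Roman', serif;">Go to the /usr/share/elasticsearch-0.19.5 directory and start ES with one of the following commands (in the 0.19.x releases, -f keeps the process in the foreground):</span></li>
<ul>
<li><span style="color: blue; font-family: Times, 'Times New Roman', serif;">$ bin/elasticsearch</span></li>
<li><span style="color: blue; font-family: Times, 'Times New Roman', serif;">$ bin/elasticsearch -f</span></li>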
</ul>
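<li><span style="font-family: Times, 'Times New Roman', serif;">Open </span><span style="color: blue; font-family: Times, 'Times New Roman', serif;"><u><a href="http://localhost:9200/_plugin/head/">http://localhost:9200/_plugin/head/</a></u></span><span style="font-family: Times, 'Times New Roman', serif;"> in a browser to check that ES and the head plugin are running.</span></li>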
</ul>
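<div style="text-align: left;"><span style="font-family: Times, Times New Roman, serif;">If no browser is available on the server, the same check can be scripted. The Java sketch below (hypothetical class EsHttpProbe) sends a GET to the node root and to the head plugin URL used above and prints the HTTP status code, or -1 when the node cannot be reached. It assumes ES is listening on localhost with the default HTTP port 9200, as in the URL above, and uses only java.net.HttpURLConnection.</span></div>

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

/** Hypothetical helper: checks that an Elasticsearch node answers on its HTTP port. */
final class EsHttpProbe {

    /** Builds the base URL, e.g. http://localhost:9200/ for the default HTTP port. */
    static String baseUrl(String host, int port) {
        return "http://" + host + ":" + port + "/";
    }

    /** Returns the HTTP status code of a GET on url, or -1 if it cannot be reached. */
    static int probe(String url, int timeoutMillis) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            conn.setRequestMethod("GET");
            return conn.getResponseCode();
        } catch (IOException e) {
            // Connection refused, timeout, unknown host or malformed URL.
            return -1;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        String root = baseUrl("localhost", 9200);
        System.out.println("ES node:     " + root + " -> " + probe(root, 2000));
        String head = root + "_plugin/head/";
        System.out.println("head plugin: " + head + " -> " + probe(head, 2000));
    }
}
```

A status of 200 for both URLs means the node and the plugin are up; -1 usually means ES is not running or is bound to a different host or port.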
</div>
<div style="margin-bottom: 0in; text-align: left;">
<span style="font-family: Times, Times New Roman, serif;">
<b style="color: blue;">Inserting Bulk Data into ES with a Kettle Transformation:</b></span></div>
<div style="margin-bottom: 0in; text-align: left;">
<ul style="text-align: left;">
<li><span style="font-family: Times, Times New Roman, serif;">Create a transformation (Table Input -&gt; Elastic Search Bulk Insert):<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgaZWFGtlyWo7V3EAaocP4hYhXnQe7JV83nVSv7-53tdgH3clFS2bPfuUtIUkgUmmCh-1Q1xU4wjfKCLc8_sspYRunNND5_vEhPx6uHMY60c0FcTP4VG1k8WEes1xuzl9B-07xq7eQ1cSg/s1600/Elastic+Search+Bulk.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgaZWFGtlyWo7V3EAaocP4hYhXnQe7JV83nVSv7-53tdgH3clFS2bPfuUtIUkgUmmCh-1Q1xU4wjfKCLc8_sspYRunNND5_vEhPx6uHMY60c0FcTP4VG1k8WEes1xuzl9B-07xq7eQ1cSg/s1600/Elastic+Search+Bulk.JPG" /></a></span></li>
<li><span style="font-family: Times, 'Times New Roman', serif;">Copy the elasticsearch* and lucene* jars from the lib directory of the ES 0.19.5 server into the
.../design-tools/data-integration/lib/elasticsearch directory.</span></li>
<li><span style="font-family: Times, 'Times New Roman', serif;">Copy the patch jar file (es_0.19.4_patch.jar, download link below) into PDI/lib.</span></li>
<ul>
<li><span style="font-family: Times, Times New Roman, serif;"><span style="color: blue;"><b>Download patch file: </b></span><a href="https://sites.google.com/site/filecabinkkarthik21bigdata/uploadfile/es_0.19.4_patch.jar?attredirects=0&d=1">https://sites.google.com/site/filecabinkkarthik21bigdata/uploadfile/es_0.19.4_patch.jar?attredirects=0&d=1</a></span></li>
</ul>
<li><span style="font-family: Times, 'Times New Roman', serif;">Restart PDI.</span></li>
<li><span style="font-family: Times, 'Times New Roman', serif;">In the Elastic Search Bulk Insert step, enter the IP address and port number of the Elasticsearch engine on the Servers tab.</span></li>
</ul>
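<div style="text-align: left;"><span style="font-family: Times, Times New Roman, serif;">Because the whole setup depends on the ES jars on the PDI side matching the 0.19.5 server (PDI itself was compiled against 0.16.3), it helps to check what actually ended up in the directory you copied into. The Java sketch below (hypothetical class EsJarCheck) lists a directory given as the first argument and flags elasticsearch*.jar files whose names do not contain the expected version. It only inspects file names, and it ignores lucene*.jar files because Lucene uses its own version numbers.</span></div>

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: flags elasticsearch*.jar files that do not match the server version. */
final class EsJarCheck {

    /** Returns elasticsearch*.jar names that do not contain the expected version string. */
    static List<String> mismatched(List<String> fileNames, String expectedVersion) {
        List<String> bad = new ArrayList<>();
        for (String name : fileNames) {
            if (name.startsWith("elasticsearch") && name.endsWith(".jar")
                    && !name.contains(expectedVersion)) {
                bad.add(name);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Pass the PDI lib/elasticsearch directory as the first argument.
        File dir = new File(args.length > 0 ? args[0] : ".");
        String[] names = dir.list();
        if (names == null) {
            System.out.println("Not a readable directory: " + dir);
            return;
        }
        List<String> bad = mismatched(Arrays.asList(names), "0.19.5");
        System.out.println(bad.isEmpty()
                ? "All elasticsearch jars match 0.19.5"
                : "Jars from another ES version: " + bad);
    }
}
```
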
</div>
<div style="margin-bottom: 0in; text-align: left;">
<div class="separator" style="clear: both; text-align: center;">
<span style="font-family: Times, Times New Roman, serif; margin-left: 1em; margin-right: 1em;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg9C3GFjHsMWHJeXUJNjTWvZ-G1-u-cuXAPwE5Cyu7TYhnIJu-SuX2QdTaFv-vkVf-kD-dg-ngv5yU7TsOMk5Eq4T77Qjg3I202QtFQdTEqAzOqSVTwUXWbfo4SUaz-sdqHD892tyoVCwA/s1600/001.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg9C3GFjHsMWHJeXUJNjTWvZ-G1-u-cuXAPwE5Cyu7TYhnIJu-SuX2QdTaFv-vkVf-kD-dg-ngv5yU7TsOMk5Eq4T77Qjg3I202QtFQdTEqAzOqSVTwUXWbfo4SUaz-sdqHD892tyoVCwA/s640/001.JPG" height="421" width="640" /></a></span></div>
</div>
<div style="margin-bottom: 0in; text-align: left;">
<ul style="text-align: left;"><ul>
<li><b style="color: red;"><span style="font-family: Times, Times New Roman, serif;">Note: You must select a value for the ID Field.</span></b></li>
</ul>
<li><span style="font-family: Times, 'Times New Roman', serif;">Click Test Connection. If PDI can connect to ES, you should see a message like the one in the screenshot below.</span></li>
</ul>
</div>
<div style="margin-bottom: 0in; text-align: left;">
<div class="separator" style="clear: both; text-align: center;">
<span style="font-family: Times, Times New Roman, serif; margin-left: 1em; margin-right: 1em;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEih15suwWhyphenhyphenIihfwWteWE_OlY-I4-nzpfiK9uG1P7l2EP7G5jdCAfe2eiTWsorLwD9Qi9KRq8B3ETI29ZNfAmcTWtyhwtXZJAcRAUsHVnWD4RkJ_YMgLECaGWEoO9AeeMM6aUUOsYsM1l4/s1600/ES_Test_Connection.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEih15suwWhyphenhyphenIihfwWteWE_OlY-I4-nzpfiK9uG1P7l2EP7G5jdCAfe2eiTWsorLwD9Qi9KRq8B3ETI29ZNfAmcTWtyhwtXZJAcRAUsHVnWD4RkJ_YMgLECaGWEoO9AeeMM6aUUOsYsM1l4/s400/ES_Test_Connection.JPG" height="165" width="400" /></a></span></div>
</div>
<div style="margin-bottom: 0in; text-align: left;">
<ul style="text-align: left;">
<li><span style="font-family: Times, 'Times New Roman', serif;">Run the transformation. It inserts the data into the ES engine in bulk.</span></li>
</ul>
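<div style="text-align: left;"><span style="font-family: Times, Times New Roman, serif;">In the Elasticsearch bulk API, each document is sent as an action line followed by a source line, one JSON object per line, with a trailing newline. The Java sketch below (hypothetical class BulkBodySketch; it is not the code inside the Kettle step) builds such a body by hand to show where the ID Field ends up: as I understand the step's setting, its value becomes the document _id. The sketch treats every column as a string, uses a minimal JSON escaper instead of a JSON library, and the index name cacm, the type doc and the row values are placeholders.</span></div>

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of an Elasticsearch bulk request body. */
final class BulkBodySketch {

    /** Minimal JSON string escaping: quotes, backslashes and control characters. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** Builds one "index" action plus one source line per row; idField supplies _id. */
    static String build(String index, String type, String idField,
                        List<Map<String, String>> rows) {
        StringBuilder body = new StringBuilder();
        for (Map<String, String> row : rows) {
            String id = row.get(idField);
            if (id == null) {
                // This sketch requires an ID value for every row, like the ID Field note.
                throw new IllegalArgumentException("Row has no value for ID field " + idField);
            }
            body.append("{\"index\":{\"_index\":").append(quote(index))
                .append(",\"_type\":").append(quote(type))
                .append(",\"_id\":").append(quote(id))
                .append("}}\n");
            body.append('{');
            boolean first = true;
            for (Map.Entry<String, String> e : row.entrySet()) {
                if (!first) {
                    body.append(',');
                }
                body.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
                first = false;
            }
            body.append("}\n"); // the bulk API expects a newline after every line
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("id", "1");
        row.put("title", "Sample title");
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(row);
        System.out.print(build("cacm", "doc", "id", rows));
    }
}
```
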
<span style="font-family: Times, Times New Roman, serif;">
</span>
<br />
<div class="separator" style="clear: both; text-align: center;">
<span style="font-family: Times, Times New Roman, serif;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjKYGzvote0uoNCJXxN5bnG7kXuDlGsiEGDCT22oBkVeID3x3MKOOsEtqnM4l1r7XmAOIrLtc9q_RQmI125SZ-SrZvNwdF28ysbRXBkODVXnkacPA1wljQPQa7wgrXNoXBmqVGI8BP2X_c/s1600/002.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjKYGzvote0uoNCJXxN5bnG7kXuDlGsiEGDCT22oBkVeID3x3MKOOsEtqnM4l1r7XmAOIrLtc9q_RQmI125SZ-SrZvNwdF28ysbRXBkODVXnkacPA1wljQPQa7wgrXNoXBmqVGI8BP2X_c/s1600/002.JPG" /></a></span></div>
<span style="font-family: Times, Times New Roman, serif;">
</span>
<br />
<span style="font-family: Times, Times New Roman, serif;">
<b style="color: blue;">Elasticsearch Server: Sample JSON Query Input and Output:</b></span></div>
<div style="margin-bottom: 0in; text-align: left;">
<b style="color: blue;"><span style="font-family: Times, Times New Roman, serif;"><br /></span></b></div>
<div style="margin-bottom: 0in; text-align: left;">
<div class="separator" style="clear: both; text-align: center;">
<span style="font-family: Times, Times New Roman, serif; margin-left: 1em; margin-right: 1em;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg0v-hriu0Nesrq-edF_AZSwB022sP2l70vA5w0XgZADfao91WM_lBJ_egtw04bdeu5vNNNEfHEsAmfnXbP909oo6pVkSWmFNTJkDZg3vbRY0edNBQBHMfFMRE-o6c6QOwPXjw95PNim9k/s1600/003.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg0v-hriu0Nesrq-edF_AZSwB022sP2l70vA5w0XgZADfao91WM_lBJ_egtw04bdeu5vNNNEfHEsAmfnXbP909oo6pVkSWmFNTJkDZg3vbRY0edNBQBHMfFMRE-o6c6QOwPXjw95PNim9k/s640/003.JPG" height="508" width="640" /></a></span></div>
</div>
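<div style="text-align: left;"><span style="font-family: Times, Times New Roman, serif;">A query like the one in the screenshot is a JSON body POSTed to an index's _search endpoint. The Java sketch below (hypothetical class EsSearchSketch) builds the endpoint URL and a simple match_all body with from/size paging, which ES 0.19 supports; you can send the result with any HTTP client. The host, port and the index name cacm are placeholders.</span></div>

```java
/** Hypothetical helper: builds a simple ES search request (URL and JSON body). */
final class EsSearchSketch {

    /** URL of the _search endpoint for one index, e.g. http://localhost:9200/cacm/_search */
    static String searchUrl(String host, int port, String index) {
        return "http://" + host + ":" + port + "/" + index + "/_search";
    }

    /** match_all query returning at most size hits, starting at offset from. */
    static String matchAllBody(int from, int size) {
        return "{\"from\":" + from + ",\"size\":" + size
                + ",\"query\":{\"match_all\":{}}}";
    }

    public static void main(String[] args) {
        System.out.println("POST " + searchUrl("localhost", 9200, "cacm"));
        System.out.println(matchAllBody(0, 10));
    }
}
```
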
<div style="margin-bottom: 0in; text-align: left;">
<div style="margin-bottom: 0in;">
<span style="font-family: Times, Times New Roman, serif;"><br />
</span><br />
<div style="text-align: left;">
<span style="font-family: Times, Times New Roman, serif;"><b><span style="color: blue;">Elasticsearch Query Transformation:</span></b></span></div>
<ul style="text-align: left;">
<li><span style="font-family: Times, Times New Roman, serif;">Create a transformation like the one shown below.</span></li>
</ul>
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi2810zzd8nBRCLEJn6hyphenhyphenBh-os0XrQIKjZ0Cusp5SbaeMmsKQ-1QPLbXGEIzcTafwvh3CYGUeyi1hK40wpLS9NRe84HrG6AjFLpDlfHnTi1v0nABqwqiHGNhgSpC9VSMGZxn5K0BO1sh1s/s1600/004.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><span style="font-family: Times, Times New Roman, serif;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi2810zzd8nBRCLEJn6hyphenhyphenBh-os0XrQIKjZ0Cusp5SbaeMmsKQ-1QPLbXGEIzcTafwvh3CYGUeyi1hK40wpLS9NRe84HrG6AjFLpDlfHnTi1v0nABqwqiHGNhgSpC9VSMGZxn5K0BO1sh1s/s640/004.JPG" height="176" width="640" /></span></a></div>
<div style="margin-bottom: 0in;">
<span style="font-family: Times, Times New Roman, serif;"><br /></span></div>
<div style="margin-bottom: 0in;">
<div style="margin-left: 1em; margin-right: 1em;">
<span style="font-family: Times, Times New Roman, serif;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEionnkUeb9c-Ai9WwRG4vG-H6WNzgrWIlY_xOQYWAgU72aQLtqu3WMgPf1-BtoTiRmVCYHnuFGSh25pGbG_RqkSTRDRibPsM7iM1zW9aXQtwrsXve-dyQ8HdHmbom6sMbygvbYkyBB9Mwg/s1600/005.JPG" /></span></div>
<div style="text-align: left;">
<span style="font-family: Times, Times New Roman, serif;"><br /></span>
<br />
<ul style="text-align: left;"><ul>
<li><b style="color: red;"><span style="font-family: Times, Times New Roman, serif;">Note: Here I have used JavaScript to extract all the data from CACM.</span></b></li>
</ul>
</ul>
<div style="text-align: left;">
<span style="font-family: Times, Times New Roman, serif;">
<b style="color: blue;">Kettle Data Source Input in CDE Dashboard:</b></span></div>
<ul style="text-align: left;">
<li><span style="font-family: Times, Times New Roman, serif;">Create a new CDE dashboard in the Pentaho User Console (PUC).</span></li>
<li><span style="font-family: Times, Times New Roman, serif;">Create a Kettle data source in the Data Sources tab.</span></li>
</ul>
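<div style="text-align: left;"><span style="font-family: Times, Times New Roman, serif;">Once the dashboard is saved, the Kettle data source can also be queried directly through CDA, which is handy for checking the data outside CDE. The Java sketch below (hypothetical class CdaQueryUrl) builds such a URL, assuming a 4.x BA server where CDA's doQuery endpoint is served under /content/cda/ and accepts the path, dataAccessId and outputType parameters. The server URL, the .cda file path and the data access ID are placeholders; use the file saved with your dashboard and the name you gave the data source.</span></div>

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper: builds a CDA doQuery URL for a data source defined in CDE. */
final class CdaQueryUrl {

    /** URL-encodes one query parameter value as UTF-8. */
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /**
     * @param server       base URL of the BA server, e.g. http://localhost:8080/pentaho
     * @param cdaPath      repository path of the .cda file saved with the dashboard
     * @param dataAccessId name given to the Kettle data source in CDE
     * @param outputType   output format, e.g. json or csv
     */
    static String build(String server, String cdaPath, String dataAccessId, String outputType) {
        return server + "/content/cda/doQuery"
                + "?path=" + encode(cdaPath)
                + "&dataAccessId=" + encode(dataAccessId)
                + "&outputType=" + encode(outputType);
    }

    public static void main(String[] args) {
        // All values below are placeholders.
        System.out.println(build("http://localhost:8080/pentaho",
                "/public/es_dashboard.cda", "esKettleQuery", "json"));
    }
}
```
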
</div>
<div style="text-align: left;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj99d6GSZ8xis-PPdlL4-QO4JAsp4lxF-30UwxjrdxC1iNnaLaj29Ie67ukSsi3uF3JXnGTZCyEFTXIawcC_pVtmkzsYP7lTt0eBSGdVZmWpTvwz_uhVTZZVI-cvFNylJNfA_T3bCFatw4/s1600/006.JPG" imageanchor="1" style="clear: left; display: inline !important; margin-bottom: 1em; margin-left: 1em; text-align: center;"><span style="font-family: Times, Times New Roman, serif;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj99d6GSZ8xis-PPdlL4-QO4JAsp4lxF-30UwxjrdxC1iNnaLaj29Ie67ukSsi3uF3JXnGTZCyEFTXIawcC_pVtmkzsYP7lTt0eBSGdVZmWpTvwz_uhVTZZVI-cvFNylJNfA_T3bCFatw4/s640/006.JPG" height="268" width="640" /></span></a></div>
<div style="text-align: left;">
<ul style="text-align: left;">
<li><span style="font-family: Times, Times New Roman, serif;">Click Preview to display the values returned by the transformation.</span></li>
</ul>
</div>
<div style="margin-left: 1em; margin-right: 1em;">
<span style="font-family: Times, Times New Roman, serif;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgE7IZGIvP85rw38s3jzyJZwNdMVxKr4R-RUi6NisEfOnWP-UcnEjK0-zfB48k8hFgzXUwUOQje0l3Wz8ocuELCCTtWN8x5h7nKdMesyFeMHIKW1vp1gBdoGggY-HEJ9us74S0rbUH6DEM/s640/007.JPG" height="214" width="640" /></span></div>
</div>
</div>
</div>
Karthikhttp://www.blogger.com/profile/18220563988883365640noreply@blogger.com2Chennai, Tamil Nadu, India13.0524139 80.25082459999998712.5573929 79.605377599999983 13.547434899999999 80.896271599999992