Procedures:
Dataset Preparation
Generate Sequence File
Move the Sequence File to Hadoop Cluster and check the contents of the Sequence File
Plan and Run K-Means clustering algorithm
Export the K-Means
...
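The clustering step in the procedure above can be illustrated with a minimal single-machine sketch of K-Means (Lloyd's iteration) in Python. It is not the distributed job that runs on the Hadoop cluster, and it does not read Sequence Files. The function name `kmeans`, its parameters, and the sample points are assumptions made purely for illustration.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Illustrative Lloyd's iteration over a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Initialise the centroids with k input points picked at random.
    centroids = rng.sample(points, k)
    assignments = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignments


# Hypothetical sample data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, assignments = kmeans(points, k=2)
print(sorted(centroids))
print(assignments)
```

The loop alternates between assigning points to the nearest centroid and recomputing each centroid as the mean of its members, stopping once the assignments no longer change. A distributed run follows the same two steps but spreads the assignment work across the cluster.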
Distributed Filesystem
Description
Apache HDFS
The Hadoop Distributed File System (HDFS) offers a way to store large files across multiple machines. Hadoop and HDFS were derived from the Google File System (GFS) paper. Prior to Hadoop 2.0.0, the NameNode was a single point...