Monday, October 14, 2013

Integration of R/Weka with Pentaho Data Integration (Spoon/Kettle)

Weka Installation and Integration with R:
  • Download and install the software from the following link
  • Installation location on the AE3 server: /opt/weka-3-7-10
  • On Linux, navigate to that directory and run the following command to start Weka:
    • java -Xmx1000M -jar weka.jar
  • In the Weka GUI Chooser, navigate to Tools -> Package Manager
  • Install the following dependency packages through the Package Manager:
    • RPlugin
    • DTNB
    • timeseriesForecasting
    • naiveBayesTree
    • kfKettle
    • multiInstanceFilters
    • UserClassifier
  • Weka log file location: /root/wekafiles
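
On a server without a display, the same setup can be scripted instead of clicked through. The following is a minimal sketch, not a tested installer: it only builds and prints the commands. It assumes the install path given above and assumes that Weka's command-line package manager class, weka.core.WekaPackageManager, with its -install-package option, is available in this Weka version. The function names are hypothetical, and the package names should be checked against the spelling shown in the Package Manager.

```python
import shlex

# Install location named in the post; adjust for your own server.
WEKA_HOME = "/opt/weka-3-7-10"

# Package names as listed above. Assumption: the Package Manager may spell
# some of them differently, so verify them before running the commands.
PACKAGES = (
    "RPlugin",
    "DTNB",
    "timeseriesForecasting",
    "naiveBayesTree",
    "kfKettle",
    "multiInstanceFilters",
    "UserClassifier",
)


def launch_command(weka_home=WEKA_HOME, heap_mb=1000):
    """Build the command that starts the Weka GUI with the given heap size."""
    return ["java", f"-Xmx{heap_mb}M", "-jar", f"{weka_home}/weka.jar"]


def install_commands(packages=PACKAGES, weka_home=WEKA_HOME):
    """Build one package-manager command per package (nothing is executed)."""
    jar = f"{weka_home}/weka.jar"
    return [
        ["java", "-cp", jar, "weka.core.WekaPackageManager",
         "-install-package", name]
        for name in packages
    ]


if __name__ == "__main__":
    # Dry run: print the commands so they can be reviewed before use.
    print(shlex.join(launch_command()))
    for cmd in install_commands():
        print(shlex.join(cmd))
```

Printing the commands first keeps the sketch safe to run anywhere; once they look right, they can be pasted into a shell or passed to subprocess.run.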

Weka integration with Pentaho Spoon:
To use Weka for data mining inside Pentaho, we need a trained model (for example a PMML model or a model exported from Weka) that the transformation can apply to its input data.
  • Creating and exporting the model in Weka:
    • Open Weka Explorer
    • Open the CSV file, go to the Classify tab, and choose the J48 classifier, a widely used decision-tree learner (Weka's implementation of C4.5), under Choose -> trees -> J48
    • Click the Start button to build the classifier model
    • After a successful run, the classifier output shows the evaluation summary. The percentage of Correctly Classified Instances should be above 60%
    • Save the model to a specific location (right-click the entry in the Result list and choose Save model)
  • Open Pentaho Spoon and create a transformation that reads the input data and passes it to a Weka Scoring step
  • In the Weka Scoring step, load the model exported from Weka and map the input fields to the model's attributes

  • Weka is now integrated with Pentaho. Run the transformation to apply your data mining model to the input data
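
The 60% check on Correctly Classified Instances can also be automated if the classifier output is saved as text. The sketch below is one possible way to do that, assuming the summary line has the usual "Correctly Classified Instances  <count>  <percent> %" layout. The function names and the sample text are made up for illustration and are not real results.

```python
import re

# Matches the summary line and captures the instance count and the percentage.
SUMMARY_LINE = re.compile(
    r"^\s*Correctly Classified Instances\s+([\d.]+)\s+([\d.]+)\s*%",
    re.MULTILINE,
)


def correctly_classified_percent(weka_output):
    """Return the 'Correctly Classified Instances' percentage as a float."""
    match = SUMMARY_LINE.search(weka_output)
    if match is None:
        raise ValueError("no 'Correctly Classified Instances' line found")
    return float(match.group(2))


def model_is_acceptable(weka_output, threshold=60.0):
    """Apply the post's rule of thumb: accuracy must be above the threshold."""
    return correctly_classified_percent(weka_output) > threshold


# Illustrative text only, shaped like a Weka evaluation summary.
sample = """
=== Summary ===

Correctly Classified Instances           9               64.2857 %
Incorrectly Classified Instances         5               35.7143 %
"""

if __name__ == "__main__":
    print(correctly_classified_percent(sample))
    print(model_is_acceptable(sample))
```

The threshold is a parameter so that a stricter cut-off than 60% can be used without touching the parsing code.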

1 comment:

  1. Good post. Is there a way RStudio can be used with Pentaho BA Server (CE)? Is there a plugin available?