  • How to prepare data for analysis

    Introduction Data is messy. In most cases it arrives in an unclear and disorganized state, so I believe the most important thing to do before we dive deep into data applications is to reconstruct the data we have. Since the domain of “Machine Learning” is taking off at lightning speed, we are exhausted trying to chase...


  • Spark 2.1.0 setup on YARN environment, along with Zeppelin notebook

    Summary 1. Introduction 2. Architecture 3. Spark Setup 4. Zeppelin Setup 1. Introduction This post details how to set up a Spark environment on a YARN computing cluster, along with an Apache Zeppelin notebook. 2. Architecture The architecture is described in my previous post; please refer to it...


  • HBase 1.2.4 Cluster setup with Zookeeper

    Summary 1. Introduction 2. Architecture 3. HBase Setup 4. Launch and Shutdown HBase Cluster Service 5. Verify the HBase cluster is up and healthy 1. Introduction In this post, I’m going to walk through the HBase cluster setup process (version 1.2.4) on the environment we just built in these posts:...


  • Zookeeper 3.4.9 Cluster Setup for Hadoop

    Summary 1. Introduction 2. Architecture 3. Zookeeper Setup 4. Launch and Shutdown Zookeeper Cluster Service 5. Verify the Zookeeper cluster is up and healthy 1. Introduction This post guides you through setting up a Zookeeper cluster on top of the Hadoop cluster I previously built; please refer to my...


  • Hadoop Cluster 2.6.5 Installation on CentOS 7 in basic version

    Summary 1. Introduction 2. Architecture 3. CentOS Setup 4. Hadoop Setup 5. Launch and Shutdown Hadoop Cluster Service 6. Verify the Hadoop cluster is up and healthy 7. End 1. Introduction This post gives all the related details on how to set up a Hadoop cluster on a CentOS Linux system. Before...