Unskilled Coder

How to prepare data for analysis

Introduction: Data is messy and, in most cases, arrives in an unclear and disorganized state. I believe the most important step before we dive deep into any data application is to restructure the data we have. Since the field of “Machine Learning” is advancing at lightning speed, so that we are exhausted trying to keep up...

May 13, 2017

in Machine-learning

Spark 2.1.0 setup on YARN environment, along with Zeppelin notebook

Summary: 1. Introduction 2. Architecture 3. Spark Setup 4. Zeppelin Setup. 1. Introduction: This post details how to set up a Spark environment on a YARN computing cluster, along with the Apache Zeppelin notebook. 2. Architecture: The architecture is described in my previous post; please refer to it...

February 10, 2017

in Hadoop, Spark

HBase 1.2.4 Cluster setup with Zookeeper

Summary: 1. Introduction 2. Architecture 3. HBase Setup 4. Launch and Shutdown HBase Cluster Service 5. Verify the HBase cluster is up and healthy. 1. Introduction: In this post, I’m going to go through the HBase cluster setup process (version 1.2.4) on the environment we built in these posts:...

December 11, 2016

in Hadoop, Hbase

Zookeeper 3.4.9 Cluster Setup for Hadoop

Summary: 1. Introduction 2. Architecture 3. Zookeeper Setup 4. Launch and Shutdown Zookeeper Cluster Service 5. Verify the Zookeeper cluster is up and healthy. 1. Introduction: This post guides you through setting up a Zookeeper cluster on top of the Hadoop cluster I built previously; please refer to my...

December 10, 2016

in Hadoop, Zookeeper

Hadoop Cluster 2.6.5 Installation on CentOS 7 in basic version

Summary: 1. Introduction 2. Architecture 3. CentOS Setup 4. Hadoop Setup 5. Launch and Shutdown Hadoop Cluster Service 6. Verify the Hadoop cluster is up and healthy 7. End. 1. Introduction: This post gives all the details needed to set up a Hadoop cluster on a CentOS Linux system. Before...

December 10, 2016

in Hadoop