HDFS JAVA Hadoop API Overview




Hadoop API Introduction: The Hadoop API provides a native Java API for file system operations such as creating, renaming, or deleting files and directories, opening, reading, or writing files, and setting permissions. A very basic example of reading and writing files through the Hadoop API can be found on the Apache wiki. This is great for applications running within the […]


Apache Hadoop Yarn Overview




Here we describe Apache Hadoop Yarn, the resource manager built into Hadoop. It is also a stand-alone framework that other applications can use to run across a distributed architecture. We illustrate Yarn by setting up a Hadoop cluster, since Yarn by itself is not much to see. It is not something you work with […]


How to run hadoop – map reduce jobs without a cluster




This document is intended to help Java developers get started with running Hadoop and experimenting with Hadoop MapReduce jobs without setting up a cluster. To follow it, you need basic theoretical knowledge of Hadoop, HDFS, and MapReduce jobs. It is also advisable to have some prior knowledge of […]


Hadoop Distributed File System (HDFS) for Big Data




The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to user applications. In a large cluster, thousands of servers both host directly attached storage and execute user application tasks. By distributing storage and computation across many servers, the resource can grow with demand while remaining […]


Mapreduce Use Case to Calculate PageRank




PageRank is a way of measuring the importance of website pages. PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites. In the general […]
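As a rough illustration of the idea in this excerpt, the sketch below runs a simplified PageRank iteration in plain Java, with the map and reduce steps written as ordinary methods rather than Hadoop `Mapper` and `Reducer` classes. The class name `PageRankSketch`, the damping factor of 0.85, the fixed iteration count, and the tiny example graph are all assumptions made for illustration and are not taken from the post. It also assumes every page has at least one outgoing link and that every link target is itself a key in the graph.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory sketch; a real job would use Hadoop Mapper/Reducer classes.
class PageRankSketch {
    // Assumed damping factor (a common choice, not stated in the post).
    static final double DAMPING = 0.85;

    // "Map" step: each page emits an equal share of its rank to every page it links to.
    static List<Map.Entry<String, Double>> map(Map<String, List<String>> links,
                                               Map<String, Double> ranks) {
        List<Map.Entry<String, Double>> emitted = new ArrayList<>();
        for (Map.Entry<String, List<String>> page : links.entrySet()) {
            List<String> targets = page.getValue();
            if (targets.isEmpty()) {
                continue; // pages without outlinks contribute nothing in this sketch
            }
            double share = ranks.get(page.getKey()) / targets.size();
            for (String target : targets) {
                emitted.add(new AbstractMap.SimpleEntry<>(target, Double.valueOf(share)));
            }
        }
        return emitted;
    }

    // "Reduce" step: sum the shares received by each page and apply the damping factor.
    static Map<String, Double> reduce(Set<String> pages,
                                      List<Map.Entry<String, Double>> emitted) {
        Map<String, Double> sums = new HashMap<>();
        for (Map.Entry<String, Double> e : emitted) {
            sums.merge(e.getKey(), e.getValue(), Double::sum);
        }
        Map<String, Double> next = new HashMap<>();
        int n = pages.size();
        for (String page : pages) {
            next.put(page, (1 - DAMPING) / n + DAMPING * sums.getOrDefault(page, 0.0));
        }
        return next;
    }

    // Start from a uniform rank and repeat map + reduce a fixed number of times.
    static Map<String, Double> pageRank(Map<String, List<String>> links, int iterations) {
        Map<String, Double> ranks = new HashMap<>();
        for (String page : links.keySet()) {
            ranks.put(page, 1.0 / links.size());
        }
        for (int i = 0; i < iterations; i++) {
            ranks = reduce(links.keySet(), map(links, ranks));
        }
        return ranks;
    }

    public static void main(String[] args) {
        // Made-up example graph: A -> B, A -> C, B -> C, C -> A
        Map<String, List<String>> links = new HashMap<>();
        links.put("A", List.of("B", "C"));
        links.put("B", List.of("C"));
        links.put("C", List.of("A"));
        System.out.println(pageRank(links, 50));
    }
}
```

In a real MapReduce job each iteration would typically be a separate job whose output feeds the next one; the loop in `pageRank` stands in for that chaining.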


MRUnit Example for WordCount Algorithm




In this post we discuss a basic MRUnit example for the WordCount algorithm. Tools used in this example: Eclipse 3.8, mrunit-1.0.0-hadoop2.jar. Procedure: 1. Download the MRUnit jar from this link and add it to the Java project build path in Eclipse (File –> Properties –> Java Build Path –> Add External JARs). 2. […]


Steps to change hadoop hive default metastore Derby DB to MySQL DB




Step 1: Install and start MySQL. Step 2: Configure the MySQL service and connector. Download the mysql-connector-java-5.0.5.jar file and copy it to the $HIVE_HOME/lib directory. Step 3: Create the database and user. Create a metastore_db database in MySQL as the root user: $ mysql -u root -p Enter password: mysql> CREATE […]


Java Interface to HDFS File Read Write




This post describes the Java interface for reading and writing HDFS files and is a continuation of the previous post, Java Interface for HDFS I/O. Reading HDFS Files Through the FileSystem API: To read any file in HDFS, we first need to get an instance of the FileSystem underlying the cluster. Then we […]


Mapreduce Program to calculate Missing Count




Use Case Description: This post describes an approach to a use case in which an input file contains columns and their corresponding values as records, but some of these columns may have blanks/nulls instead of actual values, i.e. data is missing for some columns. The developer needs to write a MapReduce program to calculate […]
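The excerpt is cut off before it says exactly what is calculated, so the sketch below assumes the goal is a count of missing values per column. It is plain Java rather than a Hadoop job: the inner loop plays the role of a mapper emitting (column, 1) for each blank field, and the `merge` call plays the role of the reducer summing those counts. The class name `MissingCountSketch`, the comma delimiter, the column names, and treating the literal text "null" as missing are all assumptions for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch; not the post's actual Mapper/Reducer code.
class MissingCountSketch {
    // Counts blank or "null" fields per column across comma-separated records.
    static Map<String, Integer> countMissing(String[] columns, String[] records) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String record : records) {
            // A limit of -1 keeps trailing empty fields such as "3,bob,"
            String[] fields = record.split(",", -1);
            for (int i = 0; i < columns.length; i++) {
                // A record shorter than the column list counts as missing for the rest
                String value = i < fields.length ? fields[i].trim() : "";
                if (value.isEmpty() || value.equalsIgnoreCase("null")) {
                    // "Map" emits (column, 1); "reduce" sums the ones per column
                    counts.merge(columns[i], 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample data
        String[] columns = {"id", "name", "city"};
        String[] records = {"1,alice,paris", "2,,london", "3,bob,", "4,null,"};
        System.out.println(countMissing(columns, records));
    }
}
```

Columns with no missing values do not appear in the result at all, which mirrors how a reducer is only invoked for keys that some mapper emitted.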
