Hadoop Hive Introduction




Hadoop Hive Overview: Hadoop Hive is very similar to Apache Pig. It lets you create tables and load external files into them using SQL. Hive then creates the MapReduce jobs, which you would otherwise write in Java. Java is a very wordy language, so using Pig or Hive is simpler. Some have said that Hadoop Hive is a data warehouse tool (Bluntly put, […]

Read more

Hadoop TeraSort Benchmark Example




Hadoop TeraSort is one of Hadoop’s most widely used benchmarks. The Hadoop distribution contains both the input generator and the sorting implementation: TeraGen generates the input and TeraSort performs the sort. Here, we provide a short tutorial on using the Hadoop TeraSort benchmark. Hadoop TeraSort is modeled on the terabyte (TB) sort competition and benchmarks Hadoop by sorting large data files, e.g. 1 TB or 1 PB (1000 x 1 TB). […]

Read more

Hadoop Cluster Overview




Hadoop cluster: In talking about a Hadoop cluster, we first need to define two terms: cluster and node. A cluster is a collection of nodes. A node is a process running on a virtual or physical machine or in a container. We say process because the machine would be running other programs besides Hadoop. When Hadoop is not running in cluster […]

Read more

HDFS Java Hadoop API Overview




Hadoop API Introduction: The Hadoop API provides a native Java API to support file system operations such as creating, renaming, or deleting files and directories; opening, reading, or writing files; setting permissions; etc. A very basic example of how to read and write files with the Hadoop API can be found on the Apache wiki. This is great for applications running within the […]

Read more

Apache Hadoop YARN Overview




Here we describe Apache Hadoop YARN, which is a resource manager built into Hadoop. It is also a stand-alone programming framework that other applications can use to run across a distributed architecture. We illustrate YARN by setting up a Hadoop cluster, as YARN by itself is not much to see. It is not something you work with […]

Read more

Overview of Hadoop Cassandra




Hadoop Cassandra is a NoSQL open-source database. It was developed by Facebook to handle its unique need to process enormous amounts of data. To say that it is NoSQL does not mean it is unstructured. Data in Hadoop Cassandra is stored in familiar row-and-column datasets, as in a regular SQL database. But there are no relations between […]

Read more

Overview of Cassandra Hadoop




Cassandra Hadoop is a NoSQL open-source database. It was developed by Facebook to handle its unique need to process enormous amounts of data. To say that it is NoSQL does not mean it is unstructured. Data in Cassandra Hadoop is stored in familiar row-and-column datasets, as in a regular SQL database. But there are no relations between […]

Read more

Big Data Testing: Functional & Performance




Big Data Testing: What is Big Data? Big data is a collection of large datasets that cannot be processed using traditional computing techniques. Testing these datasets involves various tools, techniques, and frameworks. Big data relates to data creation, storage, retrieval, and analysis that is remarkable in terms of volume, variety, and velocity. You can learn more […]

Read more

What is Hadoop Sqoop? What is FLUME – Hadoop Tutorial




In this article we are going to learn about Hadoop Sqoop and Hadoop Flume. Before we learn more about Flume and Sqoop, let's study the issues with data load into Hadoop. Analytical processing using Hadoop requires loading huge amounts of data from diverse sources into Hadoop clusters. This process of […]

Read more

Hadoop Sqoop vs Flume vs HDFS in Hadoop




Please see the table below for the differences between Hadoop Sqoop, Flume, and HDFS. Hadoop Sqoop: used for importing data from structured data sources such as RDBMS. Flume: used for moving bulk streaming data into HDFS. HDFS: a distributed file system used by Hadoop […]

Read more