Hadoop MapReduce Streaming Tricks and Techniques

May 09, 2015

I have been using Hadoop a lot lately, and I thought I would write up some of the techniques you can use to get the most out of the Hadoop ecosystem.

Using Shell Scripts to run your Programs

I am not a fan of long bash commands, the kind where you have to spell out the whole path of the jar file and so on. You can organize your workflow effectively by using shell scripts. Shell scripts are not as formidable as they sound. We won't be doing programming per se in these scripts (though they are pretty good at that too); we will just use them to store commands that we need to run sequentially.

Below is a sample of the shell script I use to run my MapReduce code.

#!/bin/bash
#Defining program variables
IP="/data/input"
OP="/data/output"
HADOOP_JAR_PATH="/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.5.0.jar"
MAPPER="test_m.py"
REDUCER="test_r.py"

hadoop fs -rmr -skipTrash $OP
hadoop jar $HADOOP_JAR_PA…
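The listing above is cut off, so what follows is only a sketch of how a script like this might go on to submit the job, not the original command. It reuses the variables defined above and assumes the standard Hadoop Streaming options (-file, -mapper, -reducer, -input, -output). The helper names run and submit_job and the DRY_RUN switch are hypothetical, added here so the command can be printed and reviewed before anything is sent to the cluster.

```shell
#!/bin/bash
# Sketch under assumptions: same variables as the script above; the streaming
# flags are the standard Hadoop Streaming options, not the author's confirmed command.
IP="/data/input"
OP="/data/output"
HADOOP_JAR_PATH="/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.5.0.jar"
MAPPER="test_m.py"
REDUCER="test_r.py"

# Hypothetical helper: print the command when DRY_RUN=1 (the default here),
# otherwise execute it with its arguments kept intact.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: assemble the streaming job from the variables above.
# -file ships each script to the cluster; the scripts are assumed to be
# executable and to start with a shebang line.
submit_job() {
    run hadoop jar "$HADOOP_JAR_PATH" \
        -file "$MAPPER" -mapper "$MAPPER" \
        -file "$REDUCER" -reducer "$REDUCER" \
        -input "$IP" -output "$OP"
}

submit_job
```

Run as is, the script only prints the full hadoop jar command line. Setting DRY_RUN=0 before calling it would submit the job for real, and changing a path or swapping the mapper then means editing one variable at the top instead of retyping the whole command.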

Continue reading this post for free, courtesy of Rahul Agarwal.

MLWhiz | AI Unwrapped
