How to Write Map Reduce Output to Multiple Directories in Hadoop

Steps for Writing Map Reduce Output to Multiple Directories

  • Description:
    In Hadoop, a MapReduce job writes its output to a single output directory by default. To split the output, for example by category or by time period, the job can write to several directories instead. The MultipleOutputs utility supports this: each record is written to a base output path that is resolved relative to the job's output directory, so a base path containing "/" places the file in a subdirectory.
  • Steps:
    • 1. Create a Custom OutputFormat: Implement a custom OutputFormat class that chooses the target directory from the key, the value, or another property of each record.
    • 2. Use MultipleOutputs: Alternatively, use the MultipleOutputs utility in Hadoop to write records to multiple directories based on the key or another condition, as the source code below does.
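The key-based routing in step 2 comes down to building a base output path string for each record. The following is a minimal sketch in plain Java, not part of the original tutorial: the class name OutputPaths, the method basePathFor, the sanitizing rule, and the "unknown" fallback directory are all hypothetical choices made for illustration.

```java
public final class OutputPaths {
    private OutputPaths() {}

    /**
     * Builds a base output path of the form "<directory>/<filePrefix>".
     * Characters outside [A-Za-z0-9._-] in the key are replaced with '_'
     * so that a key cannot introduce extra path separators.
     */
    public static String basePathFor(String key, String filePrefix) {
        String dir = key.trim().replaceAll("[^A-Za-z0-9._-]", "_");
        // Assumed fallback: empty keys and the special names "." and ".."
        // all go to one directory called "unknown".
        if (dir.isEmpty() || dir.equals(".") || dir.equals("..")) {
            dir = "unknown";
        }
        return dir + "/" + filePrefix;
    }
}
```

In a reducer, the result would be passed as the third argument of MultipleOutputs.write, for example basePathFor(key.toString(), "part"). For the key "sports", records would then land in files such as sports/part-r-00000 under the job's output directory.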
Source Code
  • import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
    import java.io.IOException;

    public class ReducerClass extends Reducer<Text, Text, NullWritable, Text> {
        private MultipleOutputs<NullWritable, Text> multioutput;

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            super.setup(context);
            multioutput = new MultipleOutputs<>(context);
        }

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            // Base output path, relative to the job output directory; a "/" in it creates a subdirectory.
            // Constants.FILE_NAME_PREFIX is defined elsewhere in the project and is not shown here.
            String fileName = key.toString() + Constants.FILE_NAME_PREFIX;
            for (Text val : values) {
                multioutput.write(NullWritable.get(), val, fileName);
            }
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            // Closing MultipleOutputs flushes its record writers; skipping this can lose output.
            multioutput.close();
            super.cleanup(context);
        }
    }
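The reducer above still needs a driver that configures and submits the job. The following is a minimal sketch rather than the original author's driver. It assumes tab-separated key/value text input read with KeyValueTextInputFormat, the default identity mapper, and input and output paths passed as the two command-line arguments. The class name MultiDirDriver and the job name are hypothetical. It needs the Hadoop MapReduce libraries and the ReducerClass above on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Hypothetical driver class; the name is not taken from the original tutorial.
public class MultiDirDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "multi-dir-output");
        job.setJarByClass(MultiDirDriver.class);

        // Assumption: each input line is "key<TAB>value", so the default
        // identity mapper emits (Text, Text) pairs to the reducer.
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        job.setReducerClass(ReducerClass.class);
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);

        // LazyOutputFormat creates the default part-r-* files only when a
        // record is written through context.write, which this reducer never
        // does, so no empty default files are left in the output directory.
        LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Using LazyOutputFormat is optional. Without it the job still works, but every reducer also creates an empty default part file next to the per-key output.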
Screenshots
  • Map Reduce Multiple Directories Screenshot 1