Description:
The goal of generating frequent itemsets using MapReduce is to find the itemsets whose occurrence count across the transactions of a dataset meets a predefined minimum support threshold (minsup). The map phase reads the transactions and emits candidate itemsets; the reduce phase counts how often each candidate occurs. The process is iterative: each pass builds larger candidate itemsets from the frequent itemsets found in the previous pass.
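The iterative step can be sketched as follows. This is a minimal illustration that assumes an Apriori-style join, which the description does not name explicitly: two frequent k-itemsets are merged when their union has k+1 items, and the result is kept only if all of its k-item subsets are themselves frequent. The function name next_candidates is hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Set


def next_candidates(frequent: Set[FrozenSet[str]]) -> Set[FrozenSet[str]]:
    """Build (k+1)-item candidates from frequent k-itemsets.

    Assumption: every itemset in `frequent` has the same size k.
    """
    candidates: Set[FrozenSet[str]] = set()
    for a, b in combinations(frequent, 2):
        union = a | b
        # Join step: keep only unions that grow the itemset by exactly one item.
        if len(union) != len(a) + 1:
            continue
        # Prune step: every k-item subset of the candidate must be frequent.
        if all(frozenset(s) in frequent for s in combinations(union, len(a))):
            candidates.add(union)
    return candidates
```

The candidates returned here would then be counted against the transactions in the next map and reduce pass, and the loop stops once no candidate survives.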
Steps for Generating Frequent Itemset Using MapReduce
Step 1: Mapper Function
Map Phase:
Input: Line number and transaction.
Process: Extract items from each transaction and set item count to 1.
Output: <item, 1> (for each item in the transaction)
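A minimal sketch of the mapper in plain Python, not tied to any particular MapReduce framework. It assumes that items in a transaction are separated by whitespace and that an item repeated within one transaction should count once toward support; both are assumptions, and the name map_transaction is hypothetical.

```python
from typing import Iterator, Tuple


def map_transaction(line_no: int, transaction: str) -> Iterator[Tuple[str, int]]:
    """Emit <item, 1> for each distinct item in one transaction.

    line_no is the input key (the line number); it is not used in the output.
    """
    # Assumption: whitespace-separated items; set() drops in-transaction repeats.
    for item in sorted(set(transaction.split())):
        yield item, 1
```

For example, the transaction "bread milk bread" would yield one pair for bread and one pair for milk.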
Step 2: Reducer Function
Reduce Phase:
Input: <item, list of counts>
Process: Sum the counts for each item. If the total meets or exceeds the minimum support (minsup), emit the item as a frequent item.
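A minimal sketch of the reducer in plain Python. It assumes minsup is given as an absolute count rather than a fraction of the transactions, and that the reducer emits the item together with its total; the name reduce_item and the return shape are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def reduce_item(item: str, counts: Iterable[int], minsup: int) -> Optional[Tuple[str, int]]:
    """Sum the counts for one item and keep it only if it meets minsup.

    Assumption: minsup is an absolute count, not a percentage.
    Returns (item, total) for a frequent item, or None otherwise.
    """
    total = sum(counts)
    if total >= minsup:
        return item, total
    return None
```

With minsup set to 2, an item that arrives with the counts [1, 1, 1] is kept with a total of 3, while an item that arrives with [1] is dropped.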