<k1, v1>
k1: Line number
v1: Record
<k2, v2>
k2: Attribute with class label
v2: 1 (for each occurrence of the attribute with class label)
<k2, List<v2>>
<k3, v3>
k3: Attribute with class label
v3: Frequency (count of occurrences)
<k3, v3>
<k4, v4>
k4: Attribute
v4: Entropy, Information Gain, and Split Information
<k4, List<v4>>
<k5, v5>
k5: Decision node (best splitting attribute)
v5: Information Gain Ratio
<k5, v5>
<k6, v6>
k6: Node ID
v6: Elements (attribute values)