MapReduce

1 Assignments

1. Calculate the relative frequencies of the co-occurrences of word pairs. Download the sample code, i.e., pairs stripes.zip, from Moodle, and piece the code together to perform this calculation.
2. In the second implementation of the relative-frequency calculation, remove the specific partitioner method from the MapReduce class. Instead, modify the wordpair class so that all the word pairs sharing the same left word go to the same reduce task.

2 Definitions

1. A word pair is a pair of words that are right next to each other.
2. The relative frequency of a word pair is defined as

   $$f(w_j \mid w_i) = \frac{N(w_i, w_j)}{N(w_i, *)} \tag{1}$$

   where $N(w_i, w_j)$ is the number of co-occurrences of the word pair $(w_i, w_j)$, and $N(w_i, *)$ is the number of co-occurrences of any word pair in which one word is $w_i$.
3. The input is a set of files. Each file contains a sequence of words separated by spaces.
4. The output should be ordered by the left word in a pair. Pairs with the same left word are ordered by the right word.
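To make the definitions concrete, the following is a minimal sketch in plain Python that simulates the "pairs" approach locally. It is not the sample code from the zip file and does not use the Hadoop API. All names in it (`map_pairs`, `partition`, `reduce_partition`, `relative_frequencies`, the `MARGINAL` marker) are hypothetical and chosen for illustration. The sketch makes three assumptions: $N(w_i, *)$ counts the pairs whose left word is $w_i$, each input line is treated as one word sequence with no pairs formed across lines, and the literal word `*` never occurs in the input.

```python
from collections import defaultdict
from itertools import groupby
import zlib

MARGINAL = "*"  # hypothetical marker for the (left, *) key; assumes "*" is never a real word


def map_pairs(line):
    """Emit ((left, *), 1) and ((left, right), 1) for each pair of adjacent words."""
    words = line.split()
    for left, right in zip(words, words[1:]):
        yield (left, MARGINAL), 1
        yield (left, right), 1


def partition(key, num_reducers):
    """Pick a reducer from the left word only, so (left, *) and every
    (left, right) land in the same simulated reduce task."""
    left, _ = key
    return zlib.crc32(left.encode("utf-8")) % num_reducers


def sort_key(key):
    """Order by left word, marginal key first, then by right word."""
    left, right = key
    return (left, right != MARGINAL, right)


def reduce_partition(records):
    """Process one reducer's records in sorted key order.

    The marginal count N(left, *) arrives before any (left, right) key,
    so each pair count can be divided by it immediately.
    """
    results = {}
    marginal = 0
    records = sorted(records, key=lambda kv: sort_key(kv[0]))
    for key, group in groupby(records, key=lambda kv: kv[0]):
        total = sum(count for _, count in group)
        if key[1] == MARGINAL:
            marginal = total
        else:
            results[key] = total / marginal
    return results


def relative_frequencies(lines, num_reducers=3):
    """Run the simulated map, partition, and reduce phases over the input lines."""
    buckets = defaultdict(list)
    for line in lines:
        for key, value in map_pairs(line):
            buckets[partition(key, num_reducers)].append((key, value))
    merged = {}
    for records in buckets.values():
        merged.update(reduce_partition(records))
    # Sort by (left, right) for display; in a real job each reducer's
    # output is sorted only within that reducer.
    return dict(sorted(merged.items()))
```

For example, `relative_frequencies(["a b a c", "a b"])` sees the pairs (a, b) twice, (a, c) once, and (b, a) once, so with $N(a, *) = 3$ the pair (a, b) gets relative frequency 2/3. The `partition` function plays the role that the second assignment moves into the wordpair class: Hadoop's default `HashPartitioner` selects the reduce task from `key.hashCode()`, so a key class whose hash code depends only on the left word should have the same routing effect without a custom partitioner.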