java - Apache spark map-reduce explanation -


i'm wondering how works little snippet:

if have text:

ut quis pretium tellus. fusce quis suscipit ipsum. morbi viverra elit ut malesuada pellentesque. fusce eu ex quis urna lobortis finibus. integer aliquam faucibus neque id cursus. nulla non massa odio. fusce pretium felis felis, @ malesuada felis blandit nec. praesent ligula enim, gravida sit amet scelerisque eget, porta non mi. aenean vitae maximus tortor, ac facilisis orci.

and snippet code count occurences of each words on text above:

        // load  input data.         javardd<string> input = sc.textfile(inputfile);         // split words.         javardd<string> words = input.flatmap(new flatmapfunction<string, string>() {             public iterable<string> call(string x) {                 return arrays.aslist(x.split(" "));             }         });         // transform word , count.         javapairrdd<string, integer> counts = words.maptopair(new pairfunction<string, string, integer>() {             public tuple2<string, integer> call(string x) {                 return new tuple2(x, 1);             }         }).reducebykey(new function2<integer, integer, integer>() {             public integer call(integer x, integer y) {                 return x + y;             }         }); 

it's simple understand line

javardd<string> words = input.flatmap(new flatmapfunction<string, string>() {                 public iterable<string> call(string x) {                     return arrays.aslist(x.split(" "));                 }             }); 

creates dataset containing whole words splitted space

and line gives @ each tuple value of one, example:

javapairrdd<string, integer> counts = words.maptopair(new pairfunction<string, string, integer>() {                 public tuple2<string, integer> call(string x) {                     return new tuple2(x, 1); 

ut,1
quis,1 //go on

i'm confused on how reducebykey works, , how can count occurences of each words?

thanks in advance.

reducebykey groups tuples key (first argument in each tuple) , makes reduce each of group.

like this:

(ut, 1), (quis, 1), ..., (quis, 1), ..., (quis, 1), ... maptopair

               \            /             |                           reducebykey                       +                  (quis, 1+1)              |                        \                 /                          \             /                                   +                             (quis, 2+1) 

Comments

Popular posts from this blog

Magento/PHP - Get phones on all members in a customer group -

php - Bypass Geo Redirect for specific directories -

php - .htaccess mod_rewrite for dynamic url which has domain names -