java - Apache spark map-reduce explanation -

May 15, 2011

i'm wondering how works little snippet:

if have text:

ut quis pretium tellus. fusce quis suscipit ipsum. morbi viverra elit ut malesuada pellentesque. fusce eu ex quis urna lobortis finibus. integer aliquam faucibus neque id cursus. nulla non massa odio. fusce pretium felis felis, @ malesuada felis blandit nec. praesent ligula enim, gravida sit amet scelerisque eget, porta non mi. aenean vitae maximus tortor, ac facilisis orci.

and snippet code count occurences of each words on text above:

        // load  input data.         javardd<string> input = sc.textfile(inputfile);         // split words.         javardd<string> words = input.flatmap(new flatmapfunction<string, string>() {             public iterable<string> call(string x) {                 return arrays.aslist(x.split(" "));             }         });         // transform word , count.         javapairrdd<string, integer> counts = words.maptopair(new pairfunction<string, string, integer>() {             public tuple2<string, integer> call(string x) {                 return new tuple2(x, 1);             }         }).reducebykey(new function2<integer, integer, integer>() {             public integer call(integer x, integer y) {                 return x + y;             }         });

it's simple understand line

javardd<string> words = input.flatmap(new flatmapfunction<string, string>() {                 public iterable<string> call(string x) {                     return arrays.aslist(x.split(" "));                 }             });

creates dataset containing whole words splitted space

and line gives @ each tuple value of one, example:

javapairrdd<string, integer> counts = words.maptopair(new pairfunction<string, string, integer>() {                 public tuple2<string, integer> call(string x) {                     return new tuple2(x, 1);

ut,1
quis,1 //go on

i'm confused on how reducebykey works, , how can count occurences of each words?

thanks in advance.

reducebykey groups tuples key (first argument in each tuple) , makes reduce each of group.

like this:

(ut, 1), (quis, 1), ..., (quis, 1), ..., (quis, 1), ... maptopair

               \            /             |                           reducebykey                       +                  (quis, 1+1)              |                        \                 /                          \             /                                   +                             (quis, 2+1)

Search This Blog

Script

java - Apache spark map-reduce explanation -

Comments

Post a Comment

Popular posts from this blog

javascript - Bootstrap Popover: iOS Safari strange behaviour -

Magento/PHP - Get phones on all members in a customer group -

spring cloud - How to configure SpringCloud Eureka instance to point to https on non standard port -