join - How do I remove rows of an RDD whose key is not in another RDD?
Let's say I have a PairRDD, students (id, name). I would like to keep only the rows whose id is in another RDD, activeStudents (id).
The solution I have is to create a PairRDD from activeStudents, (id, id), and then join it with students.
Is there a more elegant way of doing this?
That's a pretty good solution to start with. If activeStudents is small enough, you could collect its ids into a map and filter students on id presence (this avoids having to shuffle).