Apache Spark - java.lang.ClassCastException: scala.Tuple2 cannot be cast to java.lang.Iterable


I am working with Java in Spark and want to parse a text document called artist_data.txt. First I created a JavaRDD:

    JavaRDD<String> rawArtistData = sc.textFile("src/main/resources/artist_data.txt");

The document is tab-separated, but it has bad lines: a number of lines appear to be corrupted. They don't contain a tab, or they inadvertently include a newline character. So I need to use the flatMap method to parse it.

Now, running the code below, I get this error: java.lang.ClassCastException: scala.Tuple2 cannot be cast to java.lang.Iterable

    JavaRDD<Tuple2<Integer, String>> artistById0 = rawArtistData
            .flatMap(new FlatMapFunction<String, Tuple2<Integer, String>>() {
                private static final long serialVersionUID = 1L;

                @SuppressWarnings("unchecked")
                public Iterable<Tuple2<Integer, String>> call(String s) {
                    String[] sArray = s.split("\t");
                    return (Iterable<Tuple2<Integer, String>>) new Tuple2<Integer, String>(
                            Integer.parseInt(sArray[0]), sArray[1].trim());
                }
            });

    JavaPairRDD<Integer, String> artistById = JavaPairRDD.fromJavaRDD(artistById0);

    System.out.println(artistById.count());

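This is happening because flatMap expects your function to return a collection (an Iterable) for each input element, and it then flattens all of those inner collections into one list. Your function returns a single Tuple2, which is not an Iterable, so the cast fails at runtime. Since you are splitting and parsing in one go, you need a map function that returns the tuple directly.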
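As a minimal sketch of the one-in, one-out case, the example below uses plain `java.util.stream` rather than Spark so that it runs on its own. `Map.Entry` stands in for `scala.Tuple2`, and the class name `MapExample`, the helper `parseLine`, and the sample lines are all assumptions made for illustration. In Spark's Java API the same per-line logic would typically sit inside `map` or `mapToPair`.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MapExample {
    // Hypothetical helper: turns one well-formed "id<TAB>name" line into a pair.
    // Map.Entry stands in for scala.Tuple2 so the sketch runs without Spark.
    static Map.Entry<Integer, String> parseLine(String line) {
        String[] fields = line.split("\t");
        return new SimpleEntry<Integer, String>(
                Integer.parseInt(fields[0]), fields[1].trim());
    }

    public static void main(String[] args) {
        // Made-up sample data; every line here is assumed to be well formed.
        List<String> lines = Arrays.asList("1\tFirst Artist", "2\t Second Artist ");

        // map: exactly one output element per input line, returned directly,
        // with no need to wrap it in a collection.
        List<Map.Entry<Integer, String>> pairs = lines.stream()
                .map(MapExample::parseLine)
                .collect(Collectors.toList());

        System.out.println(pairs.size() + " pairs parsed");
    }
}
```

This only works when every line is guaranteed to be valid, because `map` cannot drop an element.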

A more typical use case of flatMap is to return the array from the split directly. All of the per-line arrays are then flattened into one list, so you end up with all of the words instead of a bunch of separate lists of words.
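The word-splitting case can be sketched the same way. This again uses plain Java streams as a stand-in for Spark's flatMap, since both flatten the per-element results into one sequence. The class name `FlatMapWords` and the sample sentences are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FlatMapWords {
    // Splits every line into words and flattens the per-line arrays into one list.
    static List<String> words(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello big world", "foo bar");
        // prints [hello, big, world, foo, bar]
        System.out.println(words(lines));
    }
}
```

With `map` instead of `flatMap`, the result would be two separate arrays, one per line.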

Per your comment, it sounds like the code sample shown does not display your true use case. If there is a possibility of returning nothing because of bad data, then you want the following:

    // requires: import java.util.ArrayList; import java.util.List;
    JavaRDD<Tuple2<Integer, String>> artistById0 = rawArtistData
            .flatMap(new FlatMapFunction<String, Tuple2<Integer, String>>() {
                private static final long serialVersionUID = 1L;

                public Iterable<Tuple2<Integer, String>> call(String s) {
                    String[] sArray = s.split("\t");
                    // Stays empty for a bad line, so flatMap emits nothing for it.
                    List<Tuple2<Integer, String>> returnList =
                            new ArrayList<Tuple2<Integer, String>>();
                    if (sArray.length >= 2) {
                        returnList.add(new Tuple2<Integer, String>(
                                Integer.parseInt(sArray[0]), sArray[1].trim()));
                    }
                    return returnList;
                }
            });

Notice that the returned list contains an item only if the split produces 2 or more fields; otherwise it is empty and the line is dropped.
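The zero-or-one pattern can be tried without a Spark cluster using the sketch below, which again substitutes Java streams and `Map.Entry` for the RDD and `Tuple2`. The class `SkipBadLines`, its helpers, and the sample lines are hypothetical. Treating a non-numeric id as a bad line is an extra assumption that goes beyond the answer above, which only checks the field count. Also note that on Spark 2.x and later, `FlatMapFunction.call` returns an `Iterator` rather than an `Iterable`, so the Spark version would end with `returnList.iterator()`.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SkipBadLines {
    // Hypothetical helper: returns a one-element list for a good line
    // and an empty list for a bad one.
    static List<Map.Entry<Integer, String>> parseLine(String line) {
        List<Map.Entry<Integer, String>> result = new ArrayList<>();
        String[] fields = line.split("\t");
        if (fields.length >= 2) {
            try {
                result.add(new SimpleEntry<Integer, String>(
                        Integer.parseInt(fields[0].trim()), fields[1].trim()));
            } catch (NumberFormatException e) {
                // Assumption: a non-numeric id also counts as a bad line and is dropped.
            }
        }
        return result;
    }

    // flatMap flattens the per-line lists, so empty lists simply vanish.
    static List<Map.Entry<Integer, String>> parseAll(List<String> lines) {
        return lines.stream()
                .flatMap(line -> parseLine(line).stream())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample: two good lines, one without a tab, one with a bad id.
        List<String> lines = Arrays.asList("1\tA", "no tab here", "x\tB", "2\t C ");
        System.out.println(parseAll(lines).size() + " good lines");
    }
}
```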

