python - DataFrame or sqlCtx (SQLContext) generated "Trying to call a package" error -


I am using Spark 1.3.1. In PySpark, I have created a DataFrame from an RDD and registered its schema, like this:

dataLen = sqlCtx.createDataFrame(myrdd, ["id", "size"])
dataLen.registerTempTable("tbl")

At this point everything is fine and I can run "select" queries against "tbl", for example "select size from tbl where id='abc'".

Then I define a Python function:

def getsize(id):
    total = sqlCtx.sql("select size from tbl where id='" + id + "'")
    return total.take(1)[0].size

At this point there is still no problem: I can call getsize("ab") and it returns a value.

The problem occurs when I invoke getsize inside an RDD operation. I have an RDD named data of (key, value) pairs, and when I do

data.map(lambda x: (x[0], getsize("ab")))

it fails with this error:

py4j.protocol.Py4JError: Trying to call a package

Any idea?

Spark doesn't support nested actions or transformations, and the SQLContext is not accessible outside the driver, so what you're doing here cannot work. It's not entirely clear what you want, but a simple join, either on RDDs or DataFrames, should do the trick.
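As a minimal sketch of the join suggested above, the same lookup can be expressed without calling back into sqlCtx from inside a map. The snippet below is Spark-free and only illustrates the join semantics; the names sizes, data, and size_by_id are illustrative assumptions, not from the question:

```python
# Spark-free sketch of the suggested join. `sizes` stands in for the
# (id, size) rows of "tbl" and `data` for the (key, value) RDD; all
# names here are illustrative, not from the question.

# (id, size) rows, as in the "tbl" table
sizes = [("ab", 10), ("cd", 20)]

# (key, value) pairs, as in the `data` RDD; the key is the id to look up
data = [("ab", "x"), ("cd", "y"), ("ab", "z")]

# Equivalent of an RDD join on the key: pair each key with its size
# instead of running a SQL query per element inside a worker lambda
size_by_id = dict(sizes)
joined = [(key, size_by_id[key]) for key, _ in data if key in size_by_id]

print(joined)  # [('ab', 10), ('cd', 20), ('ab', 10)]
```

In PySpark itself this would correspond to something like data.join(dataLen.rdd.map(lambda r: (r.id, r.size))), built entirely on the driver, so no SQLContext call ever runs inside a worker-side lambda.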

