amazon s3 - Predetermining number of partitions of RDD


1) How can I pre-determine the number of RDD partitions that will be created?
2) What factors does the partitioning of the data depend on? Is it the size of the data, or the way it is stored (compressed, sequence file, etc.)?

For simplicity, assume I have a 6 GB file in HDFS stored as a plain text file.

My cluster is an EC2 cluster with the config below:

1 master node - m3.xlarge (4 cores, 15 GB RAM)

4 core nodes - m3.xlarge (4 cores, 15 GB RAM each)

Update: What happens if the same data is stored in S3, HBase, or a NoSQL store?

The partitions are dependent on the file type. In your case, since it is an HDFS file, the default number of partitions is the number of input splits, and that will depend on your Hadoop setup. If you want to understand how this works, look at how Spark computes the partitions.

From HadoopRDD.getPartitions:

val inputSplits = inputFormat.getSplits(jobConf, minPartitions)
val array = new Array[Partition](inputSplits.size)
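To make the "number of input splits" concrete for the 6 GB file, here is a rough sketch of the split arithmetic that Hadoop's old-API FileInputFormat applies to a single splittable, uncompressed file. It is an approximation for estimating, not Spark's or Hadoop's actual code: the object and method names (SplitEstimate, splitSize, numSplits) are made up for illustration, and the block sizes of 128 MB and 64 MB are assumptions, since the real value depends on your Hadoop setup.

```scala
// Illustrative sketch only: mirrors the split-size rule
// max(minSize, min(goalSize, blockSize)) for one splittable file.
// SplitEstimate and its methods are hypothetical names, not a Spark or Hadoop API.
object SplitEstimate {
  // The last split is allowed to be up to 10% larger than the split size.
  val SplitSlop = 1.1

  def splitSize(totalSize: Long, minPartitions: Int, blockSize: Long, minSize: Long = 1L): Long = {
    // goalSize is the size each split would have if exactly minPartitions were produced.
    val goalSize = totalSize / math.max(minPartitions, 1)
    math.max(minSize, math.min(goalSize, blockSize))
  }

  def numSplits(totalSize: Long, minPartitions: Int, blockSize: Long, minSize: Long = 1L): Int = {
    val size = splitSize(totalSize, minPartitions, blockSize, minSize)
    var remaining = totalSize
    var count = 0
    // Carve off full-size splits while more than SplitSlop splits' worth remains.
    while (remaining.toDouble / size > SplitSlop) {
      remaining -= size
      count += 1
    }
    // Whatever is left over becomes the final split.
    if (remaining != 0) count += 1
    count
  }

  def main(args: Array[String]): Unit = {
    val sixGb = 6L * 1024 * 1024 * 1024
    val mb = 1024L * 1024
    println(numSplits(sixGb, minPartitions = 2, blockSize = 128 * mb))   // 48
    println(numSplits(sixGb, minPartitions = 2, blockSize = 64 * mb))    // 96
    println(numSplits(sixGb, minPartitions = 100, blockSize = 128 * mb)) // 100
  }
}
```

Under these assumptions, the block size caps the split size, so a larger minPartitions can raise the partition count but a smaller one cannot lower it below the number of blocks. On a real cluster, rdd.partitions.length reports the actual count.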
