What is Spark? How do I merge a few specific rows of data read in Spark into a single row? (Newbie question)
Concatenating specified rows of an RDD in Spark, then merging the selected rows into one RDD — a Python example:

```python
from pyspark import SparkContext

sc = SparkContext("local", "myapp")

# Use a single partition so the global line counter below sees the rows
# in order; with multiple partitions each task gets its own copy of the
# counter and this trick breaks.
rows = sc.parallelize([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]], 1)

counter = 0  # running line number
row = 3      # target line number

# Return True only for the line whose running number equals `row`.
def getnum(x):
    global counter
    counter += 1
    return counter == row

# cache() plus count() materializes each filtered RDD before the
# globals are changed for the next selection.
x1 = rows.filter(getnum).cache()
x1.count()

counter = 0
row = 4
x2 = rows.filter(getnum).cache()
x2.count()

# union() produces an RDD containing all elements of both RDDs.
xx = x1.union(x2)
print(xx.collect())  # rows 3 and 4: [[7, 8, 9], [10, 11, 12]]
```