Spark Programming Basics: What Is the Difference Between Spark RDD, DataFrame, and Dataset?
The official documentation describes them as follows.

RDD: "A Resilient Distributed Dataset (RDD), the basic abstraction in Spark." An RDD is a distributed dataset: its data is spread across the machines of a cluster.

DataFrame: "A DataFrame is equivalent to a relational table in Spark SQL, and can be created using various functions in SQLContext." A DataFrame is more like a relational database table. It is arguably a data format specific to Spark, and data in this format can be used with the functions in SQLContext.
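
To make the distinction concrete, here is a minimal Scala sketch that runs the same filter over an RDD, a DataFrame, and a Dataset. It is an illustration under stated assumptions, not code from the documentation quoted above: the `Person` case class, the `adultNames` helper, and the sample data are all hypothetical. It also assumes Spark 2.0 or later, where `SparkSession` is the usual entry point and wraps the older `SQLContext` mentioned in the quote.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical record type, used only for this illustration.
case class Person(name: String, age: Int)

object RddDataFrameDatasetSketch {

  // Returns the names of people aged 18+, computed three ways:
  // via an RDD, via a DataFrame queried with SQL, and via a typed Dataset.
  def adultNames(spark: SparkSession): (Seq[String], Seq[String], Seq[String]) = {
    import spark.implicits._ // enables toDF(), as[T] and the needed encoders

    val people = Seq(Person("Alice", 30), Person("Bob", 17))

    // RDD: a distributed collection of plain JVM objects.
    // Spark sees opaque Person instances and knows nothing about their fields.
    val rdd: RDD[Person] = spark.sparkContext.parallelize(people)
    val viaRdd = rdd.filter(p => p.age >= 18).map(p => p.name).collect().toSeq

    // DataFrame: rows with named, typed columns, like a relational table.
    // Because Spark knows the schema, the data can be queried with SQL.
    val df: DataFrame = rdd.toDF()
    df.createOrReplaceTempView("people")
    val viaDf = spark
      .sql("SELECT name FROM people WHERE age >= 18")
      .as[String]
      .collect()
      .toSeq

    // Dataset: the same columnar data, but with a compile-time element type,
    // so fields are accessed as p.age instead of by column name.
    val ds: Dataset[Person] = df.as[Person]
    val viaDs = ds
      .filter((p: Person) => p.age >= 18)
      .map((p: Person) => p.name)
      .collect()
      .toSeq

    (viaRdd, viaDf, viaDs)
  }

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside this JVM; no cluster is assumed.
    val spark = SparkSession.builder()
      .appName("rdd-df-ds-sketch")
      .master("local[*]")
      .getOrCreate()
    try {
      val (viaRdd, viaDf, viaDs) = adultNames(spark)
      println(s"RDD: $viaRdd, DataFrame: $viaDf, Dataset: $viaDs")
    } finally {
      spark.stop()
    }
  }
}
```

All three paths select the same single record here. What differs is how much Spark knows about the data: nothing about the fields for the RDD, a runtime schema for the DataFrame, and a schema plus a compile-time type for the Dataset.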