A DataFrame in Spark is a distributed collection of data organized into named columns. It's conceptually similar to a table in a relational database or a data frame in R or Python (pandas), but Spark optimizes queries on it with its Catalyst query optimizer. To create a DataFrame manually, use the createDataFrame() and toDF() methods. Between them, these methods let you build a Spark DataFrame from an RDD, an existing DataFrame or Dataset, a List, or a Seq.
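To make the list of sources more concrete, here is a minimal sketch that builds the same employee rows from a Seq and from an RDD, and then renames an existing DataFrame's columns with toDF. It assumes Spark 2.x or later with spark-sql on the classpath. The object name CreateDataFrameSources, its helper methods, and the local[*] master are hypothetical choices for illustration, not part of the original article.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object: each method shows one way to obtain a DataFrame.
object CreateDataFrameSources {
  // Sample rows reused by every example below.
  val rows: Seq[(Int, String, Int)] =
    Seq((101, "John", 10), (102, "Smith", 20), (103, "Alan", 30))

  // Seq -> DataFrame: toDF on a Seq comes from spark.implicits._
  def fromSeq(spark: SparkSession): DataFrame = {
    import spark.implicits._
    rows.toDF("empno", "ename", "locid")
  }

  // RDD -> DataFrame: toDF on an RDD also comes from spark.implicits._
  def fromRdd(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val rdd: RDD[(Int, String, Int)] = spark.sparkContext.parallelize(rows)
    rdd.toDF("empno", "ename", "locid")
  }

  // RDD -> DataFrame via createDataFrame. Tuple fields become columns
  // _1, _2, _3, so toDF is chained afterwards to give meaningful names.
  def fromCreateDataFrame(spark: SparkSession): DataFrame = {
    val rdd = spark.sparkContext.parallelize(rows)
    spark.createDataFrame(rdd).toDF("empno", "ename", "locid")
  }

  // DataFrame -> DataFrame: toDF renames every column in a single call.
  def renamed(df: DataFrame): DataFrame =
    df.toDF("emp_no", "emp_name", "loc_id")

  def main(args: Array[String]): Unit = {
    // Local session for experimentation; spark-shell already provides `spark`.
    val spark = SparkSession.builder()
      .appName("create-dataframe-sketch")
      .master("local[*]")
      .getOrCreate()
    try {
      fromSeq(spark).show()
      renamed(fromRdd(spark)).printSchema()
    } finally spark.stop()
  }
}
```

The Seq and RDD variants rely on the implicit conversions imported from the session, which is why each helper imports spark.implicits._ locally; createDataFrame needs no such import.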
The example below shows how to create a DataFrame in spark-shell, first with a single row and then with three rows.
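// In Spark 2.x and later, spark-shell predefines the SparkSession as spark (Spark 1.x exposed sqlContext instead).
// Single-row DataFrame; toDF renames the default tuple columns _1, _2, _3.
val oneRow = spark.createDataFrame(Seq((101, "John", 10))).toDF("empno", "ename", "locid")
// Three-row DataFrame with the same columns.
val data = spark.createDataFrame(Seq((101, "John", 10), (102, "Smith", 20), (103, "Alan", 30))).toDF("empno", "ename", "locid")
data.show()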
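If you want to fix column types and nullability yourself instead of letting Spark infer them from tuple types, createDataFrame also accepts a java.util.List[Row] together with an explicit StructType schema. The sketch below assumes Spark 2.x or later; CreateDataFrameWithSchema, empSchema and build are hypothetical names chosen for illustration.

```scala
import java.util.Arrays
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical example object: build a DataFrame from a java.util.List[Row].
object CreateDataFrameWithSchema {
  // Explicit schema: names, types and nullability are declared up front.
  val empSchema: StructType = StructType(Seq(
    StructField("empno", IntegerType, nullable = false),
    StructField("ename", StringType, nullable = true),
    StructField("locid", IntegerType, nullable = true)
  ))

  // Uses the createDataFrame(java.util.List[Row], StructType) overload.
  def build(spark: SparkSession): DataFrame = {
    val rows: java.util.List[Row] = Arrays.asList(
      Row(101, "John", 10),
      Row(102, "Smith", 20),
      Row(103, "Alan", 30)
    )
    spark.createDataFrame(rows, empSchema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("create-dataframe-schema-sketch")
      .master("local[*]")
      .getOrCreate()
    try {
      val df = build(spark)
      df.printSchema()
      df.show()
    } finally spark.stop()
  }
}
```

Inside spark-shell you would only need the schema definition and the body of build, passing the predefined spark session. The values in each Row must match the declared types, since Spark checks them against the schema when the rows are converted.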
Hope you find this article helpful.