Creating a DataFrame in Spark SQL

A DataFrame in Spark is a distributed collection of data organized into named columns. It is similar to a table in a relational database or a data frame in R/Python, but Spark optimizes operations on it through its Catalyst query optimizer. To create a DataFrame manually, use the createDataFrame() and toDF() methods. With these methods you can build a Spark DataFrame from an RDD, DataFrame, Dataset, List, or Seq.
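The spark-shell example in this article builds its DataFrame from a Seq. As a minimal sketch of two other sources listed above, the standalone Scala program below builds the same employee rows from an RDD and from a java.util.List. It assumes Spark 2.x or later (where SparkSession is the entry point) with spark-sql on the classpath. The object name EmployeeFrames, its helper methods, and the explicit schema are hypothetical choices for illustration, not part of the original article.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical helper object; the names are illustrative only.
object EmployeeFrames {
  val rows: Seq[(Int, String, Int)] =
    Seq((101, "John", 10), (102, "Smith", 20), (103, "Alan", 30))

  // From an RDD of tuples: createDataFrame infers the columns _1, _2, _3
  // from the tuple type, and toDF renames them.
  def fromRdd(spark: SparkSession): DataFrame = {
    val rdd = spark.sparkContext.parallelize(rows)
    spark.createDataFrame(rdd).toDF("empno", "ename", "locid")
  }

  // From a java.util.List of Rows: the explicit schema (an assumption made
  // for this sketch) supplies column names and types, so toDF is not needed.
  def fromJavaList(spark: SparkSession): DataFrame = {
    val schema = StructType(Seq(
      StructField("empno", IntegerType, nullable = false),
      StructField("ename", StringType, nullable = true),
      StructField("locid", IntegerType, nullable = false)))
    val javaRows: java.util.List[Row] = java.util.Arrays.asList(
      rows.map { case (no, name, loc) => Row(no, name, loc) }: _*)
    spark.createDataFrame(javaRows, schema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("EmployeeFrames")
      .master("local[*]") // run locally for experimentation
      .getOrCreate()
    fromRdd(spark).show()
    fromJavaList(spark).show()
    spark.stop()
  }
}
```

The RDD route relies on schema inference from the tuple type, while the List route trades that convenience for explicit control over column types and nullability.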

The example below shows how to create a DataFrame with spark-shell.

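// A DataFrame with a single row; toDF() replaces the default column names _1, _2, _3
val data = sqlContext.createDataFrame(Seq((101, "John", 10))).toDF("empno", "ename", "locid")

// A DataFrame with three rows (redefining data is allowed in the REPL)
val data = sqlContext.createDataFrame(Seq((101, "John", 10), (102, "Smith", 20), (103, "Alan", 30))).toDF("empno", "ename", "locid")

// Print the rows as a table
data.show()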
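In Spark 2.0 and later, SparkSession is the entry point for DataFrame work, and spark-shell pre-binds one as spark. A minimal sketch of the same three-row example as a standalone program could look like this. It assumes spark-sql is on the classpath; the object name ShellExampleApp and the employees helper are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical standalone version of the spark-shell example above.
object ShellExampleApp {
  def employees(spark: SparkSession): DataFrame =
    spark
      .createDataFrame(Seq((101, "John", 10), (102, "Smith", 20), (103, "Alan", 30)))
      .toDF("empno", "ename", "locid") // replace the default names _1, _2, _3

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ShellExampleApp")
      .master("local[*]") // run locally for experimentation
      .getOrCreate()
    val data = employees(spark)
    data.printSchema() // empno and locid are integers, ename is a string
    data.show()
    spark.stop()
  }
}
```

With import spark.implicits._ in scope, Seq((101, "John", 10)).toDF("empno", "ename", "locid") is an equivalent shorthand that skips the explicit createDataFrame call.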


Hope you find this article helpful.

Subscribing to this site will allow you to receive quick updates on future articles.
