Connecting to Spark in CDH

This post will show you how to use Cloudera’s Hadoop Distribution Quick Start VM to connect to Spark.

To enter Spark and work with it using Scala, type the following in the terminal.

[cloudera@quickstart ~]$ spark-shell

ConnectingToSpark

There are two points to discuss here. As you can see, one of the points is “Failed to obtain database default“. “SQL context is available as sqlContext“, says the second item.

Let’s talk about the issue in fetching ‘default’ database. This happens because Spark attempted to connect to the Hive metastore but was unable to do so. This could be due to a variety of factors, but first, check the hive-server and hive-metastore services. Starting these servers usually solves the problem.

To verify the status of the services, open a terminal and type the commands below.
sudo service hive-metastore status
sudo service hive-server2 status

If the services are not running as shown in the image below, perform the commands below to start them.
sudo service hive-metastore start
sudo service hive-server2 start

Restart the Spark session at this point. Since the Spark session is currently operating on the default port, it will try to find the next available port if you open a new one.

sudo_service_status_start

Let’s move on to the second point about “SqlContext,” which is a class used to setup the Spark SQL functionality. The SparkContext class object (sc) is necessary for the SQLContext class object to be initiated.

The SparkContext is initialized using spark-shell with the line “spark-shell”.

Hope you find this article helpful.

Subscribing to this site will allow you to receive quick updates on future articles.