Parsing Web URL Data In Apache Hive

A lot of data is generated daily by smartphone apps, blogs, social media networks, games, online shopping, electronic payment channels and more. This data can describe customers, user habits, web traffic, demographics and so on. Every piece of it has potential value if it is properly evaluated. Sound data analysis helps in many ways: finding potential clients, increasing customer retention, enhancing customer service, improving marketing efforts, predicting business trends, and so on.

Analyzed data helps companies create baselines, benchmarks and strategies, and keep moving forward.

Among the many ways to gain insight into users' digital behaviour, analyzing the information carried in structured URLs is a simple and efficient way to understand consumer preferences. Converting website URL data into a readable, analyzable format is therefore the first step of the solution.

The purpose of this article is to help you understand URL parsing and how to present website interaction information in a readable format that helps in understanding the business.

Example data:

[Image: ParseURLData]

Let’s create the table in Hive.

-- Column names follow the sample file; `date` is backquoted because DATE
-- is a reserved keyword in recent Hive versions.
CREATE TABLE parseurldata(
id INT,
url STRING,
`date` STRING,
publisher STRING,
advt STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';

-- Load the data into the table
LOAD DATA LOCAL INPATH 'Desktop/parseurldata.txt' INTO TABLE parseurldata;
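To make the effect of ROW FORMAT DELIMITED more concrete, here is a small Python sketch (not Hive code) of how one comma-separated line maps onto the five columns. The function name parse_row and the sample lines are hypothetical, and the handling of missing or extra fields is a simplifying assumption for illustration, not a description of Hive's SerDe. It also shows a practical pitfall: because the field delimiter is a comma, a URL that itself contains a comma gets split across columns.

```python
COLUMNS = ["id", "url", "date", "publisher", "advt"]


def parse_row(line, sep=","):
    """Split one delimited text line into the parseurldata columns (simplified sketch)."""
    fields = line.rstrip("\n").split(sep)
    # Assumption for this sketch: missing trailing fields become None, extra fields are dropped.
    fields = (fields + [None] * len(COLUMNS))[: len(COLUMNS)]
    row = dict(zip(COLUMNS, fields))
    # id is declared INT; a value that is not a valid integer becomes None (like NULL).
    try:
        row["id"] = int(row["id"]) if row["id"] is not None else None
    except ValueError:
        row["id"] = None
    return row


# Hypothetical sample line in the same shape as the example data.
print(parse_row("1,http://www.example.com/search?keyword=shoes&country=IN,2017-01-05,pub1,ad1"))
```

If the real URLs can contain commas, a different field delimiter (for example a tab) is a safer choice for the source file.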

[Image: parseurl_loadeddata]

Now let’s parse this URL data into a readable format.

-- Extract the keyword and country query parameters and the host from each URL
SELECT PARSE_URL(url, 'QUERY', 'keyword') AS keyword,
PARSE_URL(url, 'HOST') AS host,
PARSE_URL(url, 'QUERY', 'country') AS country
FROM parseurldata;
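PARSE_URL(url, partToExtract [, keyToExtract]) returns one part of a URL (for example HOST or QUERY) and, for QUERY, an optional key that selects a single parameter value. As a rough illustration of what the query above computes for each row, the Python sketch below approximates that behaviour with the standard library's urllib.parse. It is a stand-in under stated assumptions, not Hive's implementation: only HOST and QUERY are handled, the function name parse_url and the sample URL are hypothetical, and Python lowercases the host name, whereas Hive may return it as written.

```python
from typing import Optional
from urllib.parse import urlsplit


def parse_url(url: str, part: str, key: Optional[str] = None) -> Optional[str]:
    """Rough Python approximation of Hive's PARSE_URL for the HOST and QUERY parts."""
    parts = urlsplit(url)
    part = part.upper()
    if part == "HOST":
        # hostname drops any user info and port; note that Python also lowercases it.
        return parts.hostname
    if part == "QUERY":
        if key is None:
            # Whole query string, or None when it is empty.
            return parts.query or None
        # Return the raw (not URL-decoded) value of the first key=value pair with this key.
        for pair in parts.query.split("&"):
            name, sep, value = pair.partition("=")
            if sep and name == key:
                return value
        return None
    raise ValueError(f"part not supported in this sketch: {part}")


# Hypothetical URL in the same shape as the example data.
u = "http://www.example.com/search?keyword=shoes&country=IN"
print(parse_url(u, "QUERY", "keyword"), parse_url(u, "HOST"), parse_url(u, "QUERY", "country"))
```

Where the sketch returns None, Hive returns NULL, so rows whose URLs lack a keyword or country parameter get NULL in those columns.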

[Image: parse_url_without_split]

Hope you like this post.

