Most of the entries on this blog are written so that beginners can follow them. Here I will give some practice tasks to complete with Apache Hive, to help you build your skills and gain hands-on experience.
Each assignment comes with a data set but without a solution.
Title: Split the data into columns.
Dataset: Click here.
Assignment: You are given a CSV data file containing movie titles and their release years. Your tasks are as follows:
1) Split the data into two columns: “ReleasedYear” and “MovieTitle”.
2) Identify the duplicate entries.
3) Count how many times the word “red” appears in the movie titles.
Hints: Click here.
All the best.