Apache Hive String Functions

The string functions available in the Apache Hive are as follows. The most of them are also available in Apache Impala. The behavior of the functions in both Apache Hive and Impala would not vary.

Concatenating Columns In Apache Hive & Apache Impala

The CONCAT function in Apache Hive incorporates two or more strings into a single string. This function’s behavior is the same as in relational database systems. Only string values are accepted by this function. If you want to combine integer values, you need to convert them to strings before concatenation. Note: The below specified functions…

Concatenating Rows in Apache Hive

The CONCAT_WS function in Apache Hive incorporates two or more strings into a single string with the specified delimiter. Example: SELECT fname, lname, CONCAT_WS(‘ ‘, fname, lname) FROM firstlastnames; However, this function can be used to combine row values into a single string. Using CONCAT WS along with COLLECT SET we can join the row…

Returning Nth element from the list – Apache Hive

The ELT function in Apache Hive helps in retrieving the Nth element from the list of the strings provided. This is equivalent to ELT in MySQL. This function returns NULL if N is less than 1 or greater than the number of arguments. ELT() is the complement of FIELD(). Syntax: elt(N int,str1 string,str2 string,str3 string,…)…

Formatting number in Apache Hive

Because of the many built-in functions provided by Apache Hive, the job made it easy for analysts, developers and programmers in most instances. Instead of using casting/converting functions to transform the integer values to decimal values, Hive provided a direct function that helps in those situations. Syntax: FORMAT_NUMBER(number x, int d) Formats the number X…

Process and analyze JSON documents – Apache Hive

A JSON file is a file in the JavaScript Object Notation (JSON) format that stores simple data structures and objects, which is a standard format for data interchange. It is used mainly for the transmission of data between a web application and a server. Though the JSON formatted data is is easy for machines to…

LOCATE Function in Apache Hive & Apache Impala

The LOCATE function is also helpful when looking for a substring in a string, similar to the INSTR function. Note: The below specified function is common in both Apache Hive & Apache Impala. Syntax: locate(string substr, string str[, int pos]) Returns the position of the first occurrence of substr in str after position pos. Below…

Find the position of a substring in a string – Apache Hive

INSTR function in Apache Hive helps in finding the position of a substring in a string. It returns only the first first occurrence of the given input. Returns null if either of the arguments are null and returns 0 if the substring could not be found in the string. By default, the first character in string has index 1.…

Finding the length of a string – Apache Hive & Apache Impala

The LENGTH() function returns the length of a string. Both leading and trailing spaces at the begin and end of the string is included when calculating the length.  Note: The below specified function is common in both Apache Hive & Apache Impala. Syntax: LENGTH(String)This accepts both characters and numbers in the input but returns INT…

Case Conversion in Apache Hive & Apache Impala

Apache Hive provides many features that SQL developers are familiar with. The behavior in much of the functions is the same. Such functions help to interpret, translate, quantify and format data. The aim of this post is to let you know the functions that help view data in various character cases, such as Uppercase, Lowercase…

Padding Functions in Apache Hive & Apache Impala

The padding functions are used to fill the left or right side of a string with a specific set of characters. These functions are useful for formatting the output of a query. It is necessary to specify the padding length, which is a number representing the total length of the string returned after padding. Note:…

Trimming Functions in Apache Hive & Apache Impala

Trimming functions help to remove spaces from a given string. These functions are often used in data cleansing operations during the ETL cycle. While this is not a new innovation for SQL lovers, there is something that Cloudera’s Impala is offering that everybody would appreciate.Let’s look into the trimming functions one by one.LTRIM:This function removes…

Parsing Web URL Data In Apache Hive

A lot of data is generated daily from smartphone apps, blogs, social media networks, games, online shopping, electronic payment channels, etc. Data can be for client knowledge, user habits, web traffic, demographics and more. Each and every data is filled with potential if it can be properly evaluated. Proper data analysis can aid in many…

Printf Function in Apache Hive

The PRINTF function returns the formatted input according to the printf-style format string. When using hive scripts, this would be beneficial. Syntax: printf(String format, Obj… args) Example:SELECT PRINTF(‘hello’)This returns “hello” as output. While using the Hive scripts, it can be added as follows:

Regular Expression Extract (Regexp_Extract) In Apache Hive

Regular expression functions identify precise patterns of characters in the given string and returns the matching text item from the string/data. Syntax: REGEXP_EXTRACT(STRING subject, STRING pattern, INT index) Let’s see practically how it works. Below is the sample records from the dataset:ReleaseYear MovieTitle1969 Downhill Racer1970 M*A*S*H1970 The Party at Kitty and Stud’s1970 Lovers and Other…

Regular Expressions – RLIKE in Hive

Either SQL Developers or Data Analysts often use arithmetic operators in their queries such as =, >, <, != Apart from this, LIKE, EXISTS, IN, NOT IN, NOT EXISTS, etc. will also be used very frequently.  These are all will help in fetching only the required data. There is a relational operator (RLIKE) which might…

Best Example For Repeat Function In Apache Hive & Impala

When the new business rules are introduced, it might be necessary to perform any cleanup or transformations on existing data. The below is an use case. In order to have an unified employee ID, a company that took over the current company has adopted a new business rule. Current employee IDs will not be updated,…

TRANSLATE & REPLACE functions in Cloudera’s Impala & Apache Hive

This article introduces the new TRANSLATE and REPLACE string functions available from version 2.9.0 of Cloudera Impala. Both of these functions look identical, shifting letters from one to the other. There is a major difference between them, however. Let’s see how close these functions are: SELECT REPLACE (‘Flat_720’, ‘_’, ‘#’);SELECT TRANSLATE (‘Flat_720′,’_’,’#’); Result:Both returns “Flat#720’…

REVERSE Function in Apache Hive & Apache Impala

REVERSE(STRING a)The REVERSE function accepts a character expression as its argument, and returns a string of the same length, but the ordinal position of each logical character is reversed. However, this only works for STRING.This function and its behaviour is same in both Apache Hive & ImpalaExample:SELECT REVERSE(‘slipknot’);In case if you want to reverse a…

SPACE function in Apache Hive & Apache Impala

There are some functions that hardly come into use. Often this kind of function would have some other replacement that we use. SPACE is one of the functions introduced by Cloudera’s Impala, which returns the concatenated string to the specified number of spaces. This function is available in Apache Hive too. Let’s see what it…

Word Count in HiveQL – Explode and Split Usage

This article aims to explain the usage of the SPLIT function in HiveQL. If you are looking for a similar function in SQL Server, then please click here. Let’s create a staging table to load the data temporarily.CREATE TABLE tempData (col1 STRING); Load the data to the table.LOAD DATA INPATH ‘Desktop/DataFile’ OVERWRITE INTO TABLE tempData;…

Substring_Index Function in Apache Hive

The aim of this post is to let you know how to use the SUBSTRING INDEX() function to get a substring from a string until the delimiter occurs. Anything to the left of the final delimiter (counting from the left) is returned when the count is positive. If the count is negative, it returns everything…

One comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s