Convert Multiple Date Formats into A single Format

Representation of the date and time varies between countries, various applications or application logs. When stored in a table form, we always use a single format to prevent misunderstanding. It is mandatory to be in a single format for querying and analytical purposes. If a dataset has a date column in different date-formats it needs to be converted into a single format. In this article you’ll learn how it can be done in Apache Hive.

Consider the below dataset for the practice.

Empid,EName,  Job, MgrID, HireDate, Sal, Comm, DeptID
7839,KING,PRESIDENT,0,17-Nov-1981,5000,0,10
7698,BLAKE,MANAGER,7839,01-May-1981,2850,0,30
7782,CLARK,MANAGER,7839,Sep-06 1981,2450,0,10
7566,JONES,MANAGER,7839,Feb-04 1981,2975,0,20
7788,SCOTT,ANALYST,7566,13-Jul-1987,3000,0,20
7902,FORD,ANALYST,7566,03/Dec/1981,3000,0,20
7369,SMITH,CLERK,7902,17/Dec/1980,800,0,20
7499,ALLEN,SALESMAN,7698,20/Feb/1981,1600,300,30
7521,WARD,SALESMAN,7698,22/Feb/1981,1250,500,30
7654,MARTIN,SALESMAN,7698,09/28/1981,1250,1400,30
7844,TURNER,SALESMAN,7698,08/09/1981,1500,0,30
7876,ADAMS,CLERK,7788,13/Jul/1987,1100,0,20
7900,JAMES,CLERK,7698,Dec-03 1981,950,0,30
7934,MILLER,CLERK,7782,Jan-23 1982,1300,0,10

If you look at the above dataset, the column “hiredate” has different date formats as shown below: 
>   17-Nov-1981
>   Sep-06 1981
>   03/Dec/1981
>   09/28/1981

Let’s see how it can be converted into a single date format. Below is the implementation done in Apache Hive using regular expressions.

SELECT hiredate,
CASE
     WHEN hiredate REGEXP ‘(\\d{1,2})\\-(\\S{3})\\-(\\d{4})’
          THEN FROM_UNIXTIME(UNIX_TIMESTAMP(hiredate, “dd-MMM-yyyy”), “MM/dd/yyyy”)
      WHEN hiredate REGEXP ‘(\\d{1})\\/(\\d{1,2})\\/(\\d{1,4})’
          THEN FROM_UNIXTIME(UNIX_TIMESTAMP(hiredate, “MM/dd/yyyy”), “MM/dd/yyyy”)
      WHEN hiredate REGEXP ‘(\\d{1,2})\\/(\\S{3})\\/(\\d{4})’
          THEN FROM_UNIXTIME(UNIX_TIMESTAMP(hiredate, “dd/MMM/yyyy”), “MM/dd/yyyy”)
      WHEN hiredate REGEXP ‘(\\S{3})\\-(\\d{1,2})\\s(\\d{4})’
          THEN FROM_UNIXTIME(UNIX_TIMESTAMP(hiredate, “MMM-dd yyyy”), “MM/dd/yyyy”)
ELSE hiredate
END
FROM emp;

regex conversion multiple dates

Almost every article we spoke about a variety of approaches to solve the problem. There is an alternate approach in this case too.

Method-2:
SELECT hiredate, COALESCE(
FROM_UNIXTIME(UNIX_TIMESTAMP(hiredate ,”dd-MMM-yyyy”), “MM/dd/yyyy”),
FROM_UNIXTIME(UNIX_TIMESTAMP(hiredate ,”MM/dd/yyyy”), “MM/dd/yyyy”),
FROM_UNIXTIME(UNIX_TIMESTAMP(hiredate ,”dd/MMM/yyyy”), “MM/dd/yyyy”),
FROM_UNIXTIME(UNIX_TIMESTAMP(hiredate ,”MMM-dd yyyy”), “MM/dd/yyyy”)
) As hiredate2
FROM emp;

Coalesce conversion multiple dates

Hope you find this article helpful.

Please follow us for more interesting updates.

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s