Using XPath_String to read XML in Hive

As stated in the previous article – XPath_Short, XPath_Int, XPath_Long, XPath_Float, XPath_Double, XPath_Number, and XPath_String are the XPath UDFs available in Apache Hive that helps in fetching the required information from the XML document. 

Let’s do the exercise to understand how XPath_String can be used.

Here’s my sample XML data.
<CustInfo>
<CustName>Melanie Pike</CustName>
<CustPhone>00971305374745</CustPhone>
</CustInfo>
<CustInfo>
<CustName>Rose Moore</CustName>
<CustPhone>00971305374746</CustPhone>
</CustInfo>
<CustInfo>
<CustName>Dalillah</CustName>
<CustPhone>00971305374747</CustPhone>
</CustInfo>

My text will look like this-

XML_SampleData

Let’s create a table in Apache Hive and load this data into it.

–Creating the test table to hold the xml document in a single column.
CREATE TABLE TestXML (line STRING);

–Importing the data into the table.
LOAD DATA LOCAL INPATH ‘Desktop/Docs/xmltest.csv’ INTO TABLE TestXML;

Now, let’s use XPath to retrieve needed information.

SELECT XPATH_STRING(line,’//CustPhone’) from TestXML;

This returns the following output-
00971305374745
00971305374746
00971305374747

SELECT  XPATH_STRING(LINE,’//CustName’),
               XPATH_STRING(line,’//CustPhone’)
FROM TestXML;

This returns the following –
Melanie Pike     00971305374745
Rose Moore      00971305374746
Dalillah              00971305374747

xpath_string_xml_scrshot

Hope you find this article helpful.

Please subscribe to receive updates on latest posts.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s