As stated in the previous article – XPath_Short, XPath_Int, XPath_Long, XPath_Float, XPath_Double, XPath_Number, and XPath_String are the XPath UDFs available in Apache Hive that helps in fetching the required information from the XML document.
Let’s do the exercise to understand how XPath_String can be used.
Here’s my sample XML data.
<CustInfo>
<CustName>Melanie Pike</CustName>
<CustPhone>00971305374745</CustPhone>
</CustInfo>
<CustInfo>
<CustName>Rose Moore</CustName>
<CustPhone>00971305374746</CustPhone>
</CustInfo>
<CustInfo>
<CustName>Dalillah</CustName>
<CustPhone>00971305374747</CustPhone>
</CustInfo>
My text will look like this-
Let’s create a table in Apache Hive and load this data into it.
–Creating the test table to hold the xml document in a single column.
CREATE TABLE TestXML (line STRING);
–Importing the data into the table.
LOAD DATA LOCAL INPATH ‘Desktop/Docs/xmltest.csv’ INTO TABLE TestXML;
Now, let’s use XPath to retrieve needed information.
SELECT XPATH_STRING(line,’//CustPhone’) from TestXML;
This returns the following output-
00971305374745
00971305374746
00971305374747
SELECT XPATH_STRING(LINE,’//CustName’),
XPATH_STRING(line,’//CustPhone’)
FROM TestXML;
This returns the following –
Melanie Pike 00971305374745
Rose Moore 00971305374746
Dalillah 00971305374747
Hope you find this article helpful.
Please subscribe to receive updates on latest posts.