XPath UDF’s in Apache Hive

There are other built-in functions that assist in getting essential data from an XML document, in addition to XPath and XPath String.

Despite the fact that each of these functions yields a different outcome, they all work in the same way. Everything is based on the information in the XML document. Although no XML validation is done, incorrect xml generates an error.

The following is a list of functions from the language manual:

  • xpath_boolean returns a boolean. It returns true if the XPath expression evaluates to true, or if a matching node is found.
  • xpath_short returns a short integer.
  • xpath_int returns an integer.
  • xpath_long returns a long integer.
  • xpath_float returns a floating point number.
  • xpath_double,xpath_number returns a double-precision floating point number (xpath_number is an alias for xpath_double).

Here is my data for this exercise:

XML_UDF_Data

—Creating the table
CREATE TABLE xmltest2(line String);

—Loading the data into the table.

LOAD DATA LOCAL INPATH ‘Desktop/Docs/xmltest2.csv’ INTO TABLE xmltest2;

—Here is all UDFs in a single query

SELECT

xpath_short(line, ‘/ProductInfo/ProductID/text()’) as ProductID,
xpath_int(line, ‘/ProductInfo/BranchID/text()’) as BranchID,
xpath_long(line, ‘/ProductInfo/UniqueID/text()’) as UniqueID,
xpath_float(line, ‘/ProductInfo/Price/text()’) as Price,
xpath_double(line, ‘/ProductInfo/TotalSales/text()’) as TotalSales,
xpath_boolean(line, ‘/ProductInfo/UniqueID/text()’) as UniqueIDExists
FROM xmltest2;

The below is the outcome of the query:

ProductID   BranchID        UniqueID              Price      TotalSales     UniqueIDExists

100               1147483647     971305374745     200.5      2405228.59    true
101               1847483647     971305374746      475.5     4941198.59    true
102               1947483647     0                              882.0     9114101.59    false

XML_UDF_All

Hope you find this article helpful.

Please subscribe to get updates on latest posts.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s