Handling “Json” and “Unstructured” Data in SQL

The below is to understand how we can handle JSON data in prior versions of SQL Server 2016.

Sample JSON data is:

  {“accountNumber”: 2020112, “pin”: 2525},
  {“accountNumber”: 2567899, “pin”: 1462}
  {“accountNumber”: 6789925, “pin”: 2614}
  {“accountNumber”: 9925678, “pin”: 6142}

This can be extracted into columns easily in SQL Server 2016 with the help of following code:

DECLARE @json NVARCHAR(MAX)

SET @json = N'[
{“accountNumber”: 2020112, “pin”: 2525},
{“accountNumber”: 2567899, “pin”: 1462},
{“accountNumber”: 6789925, “pin”: 2614},
{“accountNumber”: 9925678, “pin”: 6142}
]’

SELECT *
FROM OPENJSON(@json)
WITH (
Account INT ‘$.accountNumber’,
accountPin INT ‘$.pin’
)

GO

Json
However, what if the data is not in the exact JSON format i.e. no curly braces, no colon (:) to indicate name-value pairs and no square bracket to hold arrays and values separated by comma (,), in addition to this if it is to be worked on prior to SQL Server 2016 versions?

Sample data:

(accountNumber=2020112)(accountPin=2525)(Phone=+12345678)(countryId=121) (DateOfBirth=19810726)(NumberOfCallsMade=381)
(accountNumber=202019)(accountPin=98291)(Phone=)(countryId=1881) (DateOfBirth=19860526)(NumberOfCallsMade=31)

If you look at the data, there are two rows and are not in same lengthy strings since the values lengths are different. No value provided for Phone attribute in the second row.

The reason behind highlighting these points is, recently I got to work with this data and SUBSTRING function in SQL Server alone is not much a help. I managed to create the report at the end by using PATINDEX along with SUBSTRING functions.

This is to let you know that, within the available resources we can sort out the things even if it looked complicated and unsolvable.

Let me add these two rows in a TEMP table.
CREATE TABLE #Temp(StringVal VARCHAR(MAX))
INSERT INTO #Temp
SELECT
‘(accountNumber=2020112)(accountPin=2525)(Phone=+12345678)(countryId=121)(DateOfBirth=19810726)(NumberOfCallsMade=381)’
UNION SELECT
‘(accountNumber=202019)(accountPin=98291)(Phone=)(countryId=1881)(DateOfBirth=19860526)(NumberOfCallsMade=31)’

TempTableValues

Now let’s see how can we convert these strings into columns. 

SELECT
(SELECT LEFT(
     Val,PATINDEX(‘%[^0-9]%’, Val+’a’)-1) from(
 SELECT SUBSTRING(SUBSTRING(StringVal,16,7), PATINDEX(‘%[0-9]%’, SUBSTRING(StringVal,16,7)), LEN(SUBSTRING(StringVal,16,7))) Val
)X)AccountNumber,
(SELECT LEFT(
      Val,PATINDEX(‘%[^0-9]%’, Val+’a’)-1) from(
 SELECT SUBSTRING(SUBSTRING(StringVal,35,5), PATINDEX(‘%[0-9]%’,
SUBSTRING(StringVal,35,5)), LEN(SUBSTRING(StringVal,35,5))) Val
)X)Pin,
(SELECT LEFT(
      Val,PATINDEX(‘%[^0-9]%’, Val+’a’)-1) from(
 SELECT SUBSTRING(SUBSTRING(StringVal,47,10), PATINDEX(‘%[0-9]%’,
SUBSTRING(StringVal,47,10)), LEN(SUBSTRING(StringVal,47,10))) Val
)X)Phone,
(SELECT LEFT(
        Val,PATINDEX(‘%[^0-9]%’, Val+’a’)-1) from(
 SELECT SUBSTRING(SUBSTRING(StringVal,60,15), PATINDEX(‘%[0-9]%’,
SUBSTRING(StringVal,60,15)), LEN(SUBSTRING(StringVal,60,15))) Val
)X)CountryID,
(SELECT LEFT(
          Val,PATINDEX(‘%[^0-9]%’, Val+’a’)-1) from(
 SELECT SUBSTRING(SUBSTRING(StringVal,78,20), PATINDEX(‘%[0-9]%’,
SUBSTRING(StringVal,78,20)), LEN(SUBSTRING(StringVal,78,20))) Val
)X)DateOfBirth,
(SELECT LEFT(
          Val,PATINDEX(‘%[^0-9]%’, Val+’a’)-1) from(
 SELECT SUBSTRING(SUBSTRING(StringVal,105,20), PATINDEX(‘%[0-9]%’,
SUBSTRING(StringVal,105,20)), LEN(SUBSTRING(StringVal,105,20))) Val
)X)NumberOfCallsMade
FROM #Temp

unstructuredCode
Let’s see what substrings are returning.

SELECT SUBSTRING(StringVal,16,7) FROM #Temp
SELECT SUBSTRING(StringVal,35,5) FROM #Temp
SELECT SUBSTRING(StringVal,47,10) FROM #Temp
SELECT SUBSTRING(StringVal,60,15) FROM #Temp
SELECT SUBSTRING(StringVal,78,20) FROM #Temp
SELECT SUBSTRING(StringVal,105,20) FROM #Temp

substring

Substrings are returning invalid or unnecessary characters apart from the desired output. Hence we used PATINDEX to fetch only numbers.

This conversion can be achieved using some user defined function as well by reading row by row and character by character. As I said earlier there will be many ways to solve the issues.

 

2 comments

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s