To split records in one column into multiple columns - sql-server

I am not sure whether this is achievable in SQL Server or not. I have a table ADDRESS and its metadata is given below; SQ_ID is my unique key column:
SQ_ID INT
ADDRESS_LINE_1 varchar(255)
ADDRESS_LINE_2 varchar(255)
ADDRESS_LINE_3 varchar(255)
ADDRESS_LINE_4 varchar(255)
REGION varchar(255)
POSTOCODE varchar(255)
COUNTRY_CODE varchar(255)
Now I have loaded around 2 million records into this table, and the problem is that all of the address details have been loaded into the ADDRESS_LINE_1 column. I am trying to find a way to split this. Given below is a sample set of addresses.
So I want to split the data in ADDRESS_LINE_1 so that, for each record:
the value in addressLine1 is populated to ADDRESS_LINE_1 in the same table
the value in addressLine2 is populated to ADDRESS_LINE_2 in the same table
the value in addressLine3 is populated to ADDRESS_LINE_3 in the same table
the value in primaryTown is populated to ADDRESS_LINE_4 in the same table
the value in provinceOrState is populated to REGION in the same table
the value in isoAlpha2Code is populated to COUNTRY_CODE in the same table
the value in zipOrPostalCode is populated to POSTOCODE in the same table
My expected output is given below.
I am also attaching a set of sample address values for testing:
{'addressLine1': '67 xxxx Road', 'primaryTown': 'HOxxxCHxxCH',
'zipOrPostalCode': 'RM11 1EX', 'addressCountry': {'isoAlpha2Code':
'GB'}}
{'primaryTown': 'MünXXer', 'addressCountry': {'isoAlpha2Code': 'DE'}}
{'addressLine1': '28 THE EXCAC', 'primaryTown': 'PERTH',
'provinceOrState': 'WA', 'addressCountry': {'isoAlpha2Code': 'AU'}}
{'addressLine1': '28 THE ESPLANADE', 'primaryTown': 'PERTH',
'provinceOrState': 'WA', 'addressCountry': {'isoAlpha2Code': 'AU'}}
{'addressLine1': '76 XXX STREET', 'primaryTown': 'MAXDFOT',
'provinceOrState': 'NSW', 'addressCountry': {'isoAlpha2Code': 'AU'}}
{'addressLine1': 'UNIT 56', 'addressLine2': '22 XDFR STREET',
'primaryTown': 'MANLY VALE', 'provinceOrState': 'NSW',
'addressCountry': {'isoAlpha2Code': 'AU'}}
{'addressLine1': 'BjoXCDSaret 15', 'addressLine2': 'Jppsdeheim',
'addressLine3': '', 'primaryTown': 'AKERdwfUS', 'addressCountry':
{'isoAlpha2Code': 'NO'}}

You can't parse JSON by splitting. SQL Server 2016 and later supports JSON though, so you can extract the data you want using functions like JSON_VALUE.
You don't have to do that, though, unless you want to index and filter those values. It may make sense to extract the country or zip code if you intend to filter or group results by them. Address lines, on the other hand, aren't useful for filtering and might as well remain in the JSON string; you can always extract them when necessary with JSON_VALUE.
Putting primaryTown in ADDRESS_LINE_4 looks like a mistake, though. That is a significant attribute that could be used in querying. It should go into its own field, or not be extracted at all.
SQL Server 2016 and later
In any case, you can parse a JSON value with JSON_VALUE, e.g.:
declare @myTable table(json varchar(max))
insert into @myTable
values
('{"addressLine1": "BjoXCDSaret 15", "addressLine2": "Jppsdeheim", "addressLine3": "", "primaryTown": "AKERdwfUS", "addressCountry": {"isoAlpha2Code": "NO"}}'),
('{"addressLine1": "67 xxxx Road", "primaryTown": "HOxxxCHxxCH", "zipOrPostalCode": "RM11 1EX", "addressCountry": {"isoAlpha2Code": "GB"}}'),
('{"primaryTown": "MünXXer", "addressCountry": {"isoAlpha2Code": "DE"}}'),
('{"addressLine1": "28 THE EXCAC", "primaryTown": "PERTH", "provinceOrState": "WA", "addressCountry": {"isoAlpha2Code": "AU"}}'),
('{"addressLine1": "28 THE ESPLANADE", "primaryTown": "PERTH", "provinceOrState": "WA", "addressCountry": {"isoAlpha2Code": "AU"}}'),
('{"addressLine1": "76 XXX STREET", "primaryTown": "MAXDFOT", "provinceOrState": "NSW", "addressCountry": {"isoAlpha2Code": "AU"}}'),
('{"addressLine1": "UNIT 56", "addressLine2": "22 XDFR STREET", "primaryTown": "MANLY VALE", "provinceOrState": "NSW", "addressCountry": {"isoAlpha2Code": "AU"}}')

select
    JSON_VALUE(json,'$.zipOrPostalCode') as ZipCode,
    JSON_VALUE(json,'$.primaryTown') as primaryTown,
    JSON_VALUE(json,'$.addressCountry.isoAlpha2Code') as CountryCode
from @myTable
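Applied to the OP's ADDRESS table, a rough sketch of the actual split might look like the following. This assumes the raw JSON currently sits in ADDRESS_LINE_1 and is standard double-quoted JSON (the sample rows above use single quotes, which SQL Server's JSON functions reject, so those would need a clean-up step first):
UPDATE ADDRESS
SET ADDRESS_LINE_2 = JSON_VALUE(ADDRESS_LINE_1, '$.addressLine2'),
    ADDRESS_LINE_3 = JSON_VALUE(ADDRESS_LINE_1, '$.addressLine3'),
    ADDRESS_LINE_4 = JSON_VALUE(ADDRESS_LINE_1, '$.primaryTown'),
    REGION         = JSON_VALUE(ADDRESS_LINE_1, '$.provinceOrState'),
    POSTOCODE      = JSON_VALUE(ADDRESS_LINE_1, '$.zipOrPostalCode'),
    COUNTRY_CODE   = JSON_VALUE(ADDRESS_LINE_1, '$.addressCountry.isoAlpha2Code'),
    ADDRESS_LINE_1 = JSON_VALUE(ADDRESS_LINE_1, '$.addressLine1')  -- every SET expression reads the original ADDRESS_LINE_1 value
WHERE ISJSON(ADDRESS_LINE_1) = 1  -- skip rows that don't hold valid JSON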
You can use the same expressions to create persisted computed columns on the table and index them, thus speeding queries that need to filter by country or zip code.
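For instance, a hedged sketch of that approach, again assuming the JSON stays in ADDRESS_LINE_1 (COUNTRY_CODE_JSON and the index name are hypothetical):
ALTER TABLE ADDRESS
    ADD COUNTRY_CODE_JSON AS JSON_VALUE(ADDRESS_LINE_1, '$.addressCountry.isoAlpha2Code') PERSISTED
GO
-- index the computed column so queries that filter by country can seek instead of scan
CREATE INDEX IX_ADDRESS_COUNTRY_CODE_JSON ON ADDRESS (COUNTRY_CODE_JSON)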
Older versions
In older versions, perhaps the simplest solution would be to create a SQLCLR function that uses JSON.NET to parse the data and return the values.
Another option is to convert the JSON string to XML using string replacements and use XPATH to retrieve values. This can get really tricky though as the replacements depend on the expected data, are sensitive to whitespace and can easily break when dealing with nested objects.
For example, this flat JSON object:
declare @json varchar(max)='{"addressLine1": "BjoXCDSaret 15", "addressLine2": "Jppsdeheim", "addressLine3": "", "primaryTown": "AKERdwfUS"}'
can be converted to XML with a few replacements. Notice that I fixed the whitespace differences to ensure there is always a space between tokens; otherwise I would have to replace both "," and ", ".
select
cast(
replace(
replace(
replace(
replace(#json,'{"','<it '),
'": "',' ="'),
'", "','" '),
'}',' />')
as xml)
The result is :
<it addressLine1="BjoXCDSaret 15" addressLine2="Jppsdeheim" addressLine3="" primaryTown="AKERdwfUS" />
This can now be queried with .value :
select
cast(replace(replace(replace(replace(@json,'{"','<it '),'": "',' ="'),'", "','" '),'}',' />') as xml)
.value('(/it/@primaryTown)[1]','varchar(20)')
This will return :
AKERdwfUS
This breaks with the nested addressCountry object, though. If you know what the JSON text contains, you could cheat and replace specific attributes, not just separators, e.g.:
declare #json varchar(max)='{"addressLine1": "UNIT 56", "addressLine2": "22 XDFR STREET", "primaryTown": "MANLY VALE", "provinceOrState": "NSW", "addressCountry": {"isoAlpha2Code": "AU"}}'
select
cast(
replace(
replace(
replace(
replace(
replace(
replace(#json,'", "addressCountry": {"','"><addressCountry '),
'}}','/></it>'),
'{"','<it '),
'": "',' ="'),
'", "','" ')
,'}',' />')
as xml).value('(/it/addressCountry/@isoAlpha2Code)[1]','varchar(20)')
This returns AU.
That's some serious cheating though that can only work through trial and error. In this case, the addressCountry attribute and the following separators are converted to an element. }} is expected to appear only at the end of the string, so it gets special treatment.
Use a client script
It's probably easier to use a small .NET program that reads the data, parses it with JSON.NET and extracts the desired values. 2M rows aren't a lot, so parsing the data once in a while won't be a big problem.

Related

Querying Json whose root is an array of objects in SQL Server

I have a column in a SQL table that has a JSON value like the one below:
[
{"address":{"value":"A9"},
"value":{"type":11,"value":"John"}},
{"address":{"value":"A10"},
"value":{"type":11,"value":"Doe"}}]
The MSDN examples for JSON_VALUE or JSON_QUERY require a JSON object at the root. How can I query the above to return rows that have "address" as A9 and "value" as John? I'm using SQL Azure.
Something like this:
declare @json nvarchar(max) = '[
{"address":{"value":"A9"},
"value":{"type":11,"value":"John"}},
{"address":{"value":"A10"},
"value":{"type":11,"value":"Doe"}}]'
select a.*
from openjson(@json) r
cross apply openjson(r.value)
with (
address nvarchar(200) '$.address.value',
name nvarchar(200) '$.value.value'
) a
where address = N'A9'
and name = N'John'
outputs
address name
------- -----
A9 John
(1 row affected)
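Against an actual table column, a similar hedged sketch works too (dbo.MyTable, Id and JsonColumn are hypothetical names); with an explicit schema, OPENJSON applies the WITH paths to each element of the root array directly:
select t.Id, a.address, a.name
from dbo.MyTable t
cross apply openjson(t.JsonColumn)
with (
    address nvarchar(200) '$.address.value',
    name nvarchar(200) '$.value.value'
) a
where a.address = N'A9'
and a.name = N'John'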
It may not be entirely relevant to the OP's post, as the usage is different; however, it is possible to retrieve arbitrary items from a root-level unnamed JSON array, e.g.:
declare @json nvarchar(max) = '[
{"address":
{"value":"A9"},
"value":
{"type":11,"value":"John"}
},
{"address":
{"value":"A10"},
"value":
{"type":11,"value":"Doe"}
}
]'
select
JSON_VALUE(
JSON_QUERY(@json, '$[0]'),
'$.address.value') as 'First address.value',
JSON_VALUE(
JSON_QUERY(@json, '$[1]'),
'$.address.value') as 'Second address.value'
Output :
First address.value Second address.value
A9 A10

Can we stop SQL Server EXCEPT from ignoring trailing spaces in values

I am auditing values in 2 identical structure tables. The T-SQL EXCEPT statement is ignoring the trailing space on a value in one table, so the values don't match, but also do not show up in our audit.
I have tried searching for ways to change how SQL is comparing the columns. I did something similar to ensure it was case sensitive, but couldn't find something that would make it include the white space/padding in the field value.
Example data would have the value in MyTable as "Product Name ", while the RemoteTable has the value "Product Name".
To quickly reproduce, here is a slimmed down version of what I'm doing now:
DECLARE @SampleLocal TABLE(ProductName varchar(50))
DECLARE @RemoteTable TABLE(ProductName varchar(50))
INSERT INTO @SampleLocal (ProductName) VALUES ('Product Name')
INSERT INTO @RemoteTable (ProductName) VALUES ('Product Name ')
SELECT ProductName COLLATE SQL_Latin1_General_CP1_CS_AS ProductName
FROM @SampleLocal
EXCEPT
SELECT ProductName COLLATE SQL_Latin1_General_CP1_CS_AS ProductName
FROM @RemoteTable
This currently returns no results, showing that the values are the same. But the value in the second table has a space at the end.
I would expect to get a result back that has "Product Name"
When I needed to compare things with case sensitivity I was able to add
COLLATE SQL_Latin1_General_CP1_CS_AS
Is there something similar that would show the value being different because of the blank space?
According to this article (https://support.microsoft.com/en-us/help/316626/inf-how-sql-server-compares-strings-with-trailing-spaces) :
The ANSI standard requires padding for the character strings used in comparisons so that their lengths match before comparing them. The padding directly affects the semantics of WHERE and HAVING clause predicates and other Transact-SQL string comparisons. For example, Transact-SQL considers the strings 'abc' and 'abc ' to be equivalent for most comparison operations.
This behavior is intended.
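A quick sketch that illustrates the padding rule, and why the workaround below compares DATALENGTH rather than LEN:
SELECT CASE WHEN 'abc' = 'abc ' THEN 'equal' ELSE 'different' END  -- returns 'equal': the shorter value is padded before comparing
SELECT LEN('abc '), DATALENGTH('abc ')                             -- returns 3 and 4: LEN ignores trailing spaces, DATALENGTH does not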
You can use a slower method to achieve what you wanted:
SELECT innerItems.ProductName
FROM
(
SELECT DATALENGTH(ProductName) as realLength, ProductName COLLATE SQL_Latin1_General_CP1_CS_AS as ProductName
FROM @SampleLocal
EXCEPT
SELECT DATALENGTH(ProductName) as realLength, ProductName COLLATE SQL_Latin1_General_CP1_CS_AS as ProductName
FROM @RemoteTable
) innerItems
Comparing the values and their real lengths together does the magic here. (LEN would give the 'wrong' result in this case, since it ignores trailing spaces, while DATALENGTH does not.)
This is old and already answered, but I was struggling with this as a result of a comparison in a JOIN condition too. While it is true that SQL Server ignores trailing spaces, it doesn't ignore leading spaces. Therefore, for my JOIN condition, I compared forwards and backwards (using the REVERSE function), and that gave me a more accurate set of results.
In the example below, I want rows with a trailing space in one table to only match rows with a trailing space in the JOINed table. The REVERSE function helped with this.
DECLARE @DbData AS TABLE (
Id INT,
StartDate DATETIME,
OrgName VARCHAR(100)
)
DECLARE @IncomingData AS TABLE (
Id INT,
StartDate DATETIME,
OrgName VARCHAR(100)
)
INSERT INTO @DbData (Id, StartDate, OrgName)
SELECT 1, CAST('1 Jan 2022' AS DATE), 'Test ' UNION ALL
SELECT 2, CAST('1 Jan 2022' AS DATE), 'Test' UNION ALL
SELECT 3, CAST('1 Jan 2022' AS DATE), 'Other Test' UNION ALL
SELECT 4, CAST('1 Jan 2022' AS DATE), 'Other Test '
INSERT INTO @IncomingData (Id, StartDate, OrgName)
SELECT 1, CAST('1 Jan 2022' AS DATE), 'Test ' UNION ALL
SELECT 2, CAST('1 Jan 2022' AS DATE), 'Test' UNION ALL
SELECT 3, CAST('1 Jan 2022' AS DATE), 'Other Test'
SELECT '~' + dd.OrgName + '~', '~' + id.OrgName + '~', *
FROM @DbData dd
JOIN @IncomingData id ON id.StartDate = dd.StartDate
AND dd.OrgName = id.OrgName
AND REVERSE(dd.OrgName) = REVERSE(id.OrgName) -- Try the query with and without this line to see the difference it makes

OPENJSON unable to parse Chinese characters

I'm trying to convert my JSON data into a table format in SQL Server.
Following are my JSON data:
[{
"emp_no": "001",
"emp_designation":"Data Admin",
"emp_name": "Peter",
"emp_name2": "彼特"
},
{
"emp_no": "002",
"emp_designation":"Software Engineer",
"emp_name": "Lee",
"emp_name2": "李"
}]
What I have tried:
DECLARE @JSON NVARCHAR(MAX)
set @JSON='[{
"emp_no": "001",
"emp_designation":"Data Admin",
"emp_name": "Peter",
"emp_name2": "彼特"},
{
"emp_no": "002",
"emp_designation":"Software Engineer",
"emp_name": "Lee",
"emp_name2": "李"
}]'
--Method 1
SELECT * INTO #emp_temp FROM OPENJSON(@JSON)
WITH (emp_no varchar(20),
emp_designation varchar(50),
emp_name NVARCHAR(100),
emp_name2 NVARCHAR(100))
SELECT * FROM #Emp_temp
DROP TABLE #Emp_temp
--Method 2
SELECT
JSON_Value (EMP.VALUE, '$.emp_no') as emp_no,
JSON_Value (EMP.VALUE, '$.emp_designation') as emp_designation,
JSON_Value (EMP.VALUE, '$.emp_name') as emp_name,
JSON_Value (EMP.VALUE, '$.emp_name2') as emp_name2
INTO #Emp_temp2
FROM OPENJSON (@JSON) as EMP
SELECT * FROM #Emp_temp2
DROP TABLE #Emp_temp2
However, both temp tables return the following result, with the Chinese characters reduced to question marks:
Temp table select result:
emp_no | emp_designation   | emp_name | emp_name2
001    | Data Admin        | Peter    | ??
002    | Software Engineer | Lee      | ?
Any idea how to preserve the original Chinese characters after parsing the data into the temp table?
Thanks.
*Edit:
I know it can work by putting an extra N in front of the JSON literal:
set @JSON=N'[
{ "emp_no": "001...
.....
But the JSON is actually a parameter of a stored procedure, so I cannot simply add an N like set @JSON = 'N' + @JSON; that would jeopardize the format of the JSON data and cause an error.
ALTER PROCEDURE [dbo].[SP_StoreEmpInfo]
@JSON NVARCHAR(max)
@JSON = 'N' + @JSON
/*Will cause invalid JSON format error */
SELECT
JSON_Value (EMP.VALUE, '$.emp_no') as.....
Try adding N before your SQL string literal to indicate that Unicode characters are contained within, like this:
DECLARE @JSON NVARCHAR(MAX)
set @JSON=N'[{
"emp_no": "001",
"emp_designation":"Data Admin",
"emp_name": "Peter",
"emp_name2": "彼特"},
{
"emp_no": "002",
"emp_designation":"Software Engineer",
"emp_name": "Lee",
"emp_name2": "李"
}]'
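Regarding the stored procedure concern in the edit: since the parameter is already declared as NVARCHAR(MAX), the N prefix belongs at the call site where the literal is written, not inside the procedure. A hedged sketch of such a call:
EXEC dbo.SP_StoreEmpInfo @JSON = N'[{"emp_no": "001", "emp_designation": "Data Admin", "emp_name": "Peter", "emp_name2": "彼特"}]'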
This question may provide useful background:
What does N' stands for in a SQL script ? (the one used before characters in insert script)

Get 3rd part of text field in SQL Server

I have a table in my SQL Server database with a column of type NTEXT. The column holds address information, like this:
"Company name
Street + number
Postalcode + City
Country"
Sometimes the country is not there, but the first 3 lines are always there.
How would I go about selecting only line 3?
Each line is separated with CR + LF (\r\n)
I need this as part of a SQL Server stored procedure and the column I use is called RecipientAddress
The reason why I need this to be done within an SP is that I use the data to create a Crystal Report.
Or is there a way to do this within a formula in Crystal Reports?
EDIT: The datatypes used for the fields cannot be changed at the moment, since fields are part of an ERP system, where we are not able to change the datatypes.
I didn't use PATINDEX because I suppose you are not using SQL Server 2014/2016.
This works on a set of addresses, but it can be changed to work on only one value at a time using a variable.
I used two CTEs because it is easier to read and write the query this way. You can use intermediary variables instead if you work on only one address in a variable (see the sketch after the code below).
The code below can be tested in Management Studio (it creates one address with a country and one without):
declare @delim varchar(10) = char(13)+char(10)
declare @table table(ID int, Address varchar(max))
insert into @table(ID, Address) values(0, 'Company name1
Street 1000
92345 City
Country')
insert into @table(ID, Address) values(1, 'Company name
Street 1000
92345 City')
; With row_start as(
Select ID, Address, pos_start = charindex(@delim, Address, charindex(@delim, Address, 0)+1)+len(@delim)
From @table
)
, row_end as (
Select ID, Address, pos_start, pos_end = charindex(@delim, Address, pos_start+1)
From row_start
)
Select ID, Address
, Zip_City = substring(Address, pos_start, (case when pos_end = 0 then len(Address)+1 else pos_end end) - pos_start)
From row_end
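A rough sketch of the single-value variant mentioned above, using intermediary variables instead of CTEs (@Address is a hypothetical local variable holding one address):
declare @delim varchar(10) = char(13)+char(10)
declare @Address varchar(max) = 'Company name' + @delim + 'Street 1000' + @delim + '92345 City' + @delim + 'Country'
-- position right after the second CR+LF, i.e. the start of the third line
declare @pos_start int = charindex(@delim, @Address, charindex(@delim, @Address, 0) + 1) + len(@delim)
-- position of the next CR+LF, or 0 when the third line is the last one
declare @pos_end int = charindex(@delim, @Address, @pos_start + 1)
select Zip_City = substring(@Address, @pos_start, (case when @pos_end = 0 then len(@Address) + 1 else @pos_end end) - @pos_start)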

How to switch data in a column and update the column with result information in SQL Server

I have a table USERS with a column FULLNAME that contains full user names in the format FirstName LastName, and I need to switch the data and update it in the same column, using the format LastName, FirstName.
Ex. James Brown needs to be switched to Brown, James and updated in the same column (FULLNAME)
Is there any way to do it?
Thanks.
The best solution would be to split the elements into two columns (LastName, FirstName). However, if you want to try and squeeze it into one column, and you assume that every name is separated by a single space:
DECLARE @Users TABLE ( Name VARCHAR(100) )
INSERT INTO @Users ( Name )
SELECT 'James Brown'
UNION ALL
SELECT 'Mary Ann Watson'

SELECT RIGHT(Name, CHARINDEX(' ', REVERSE(Name)) - 1) + ', '
       + LEFT(Name, LEN(Name) - CHARINDEX(' ', REVERSE(Name)))
FROM @Users
This also assumes that if they have more than two names separated by a space, then the last word is the last name and everything else is the first name. It works for "Mary Ann Watson", but not for "George Tucker Jones" if "Tucker Jones" is a last name.
Assuming they only have 1 space in the name:
UPDATE USERS
SET FULLNAME = RIGHT(FULLNAME,len(FULLNAME) - CHARINDEX(' ',FULLNAME)) +', '+ LEFT(FULLNAME,charindex(' ',FULLNAME)-1)
WHERE LEN(FULLNAME) - LEN(REPLACE(FULLNAME, ' ', '')) = 1
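The WHERE clause counts the spaces by comparing the length with and without them; a quick hedged check of that trick:
SELECT LEN('James Brown') - LEN(REPLACE('James Brown', ' ', ''))          -- 1: exactly one space, so the row gets updated
SELECT LEN('Mary Ann Watson') - LEN(REPLACE('Mary Ann Watson', ' ', ''))  -- 2: more than one space, so the row is skipped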
