Creating an index on a specific JSON value inside an object array - sql-server

So let's say I have a varchar column in a table with some structure like:
{
  "Response": {
    "DataArray": [
      {
        "Type": "Address",
        "Value": "123 Fake St"
      },
      {
        "Type": "Name",
        "Value": "John Doe"
      }
    ]
  }
}
And I want to create a persisted computed column on the "Value" field of the "DataArray" array element whose "Type" field equals "Name". (I hope I explained that properly. Basically, I want to index the people's names in that structure.)
The problem is that, unlike with other JSON objects, I can't use the JSON_VALUE function in a straightforward way to extract that value. I have no idea whether this can be done; I've been dabbling with JSON_QUERY, but so far I have no idea what to do.
Any ideas and help appreciated. Thanks!

You could achieve it using a function:
CREATE FUNCTION dbo.my_func(@s NVARCHAR(MAX))
RETURNS NVARCHAR(100)
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @r NVARCHAR(100);

    SELECT @r = [Value]
    FROM OPENJSON(@s, '$.Response.DataArray')
         WITH ([Type] NVARCHAR(100) '$.Type', [Value] NVARCHAR(100) '$.Value')
    WHERE [Type] = 'Name';

    RETURN @r;
END;
Defining the table:
CREATE TABLE tab(
    val NVARCHAR(MAX) CHECK (ISJSON(val) = 1),
    col1 AS dbo.my_func(val) PERSISTED  -- calculated column
);
Sample data:
INSERT INTO tab(val) VALUES (N'{
  "Response":{
    "DataArray":[
      {
        "Type":"Address",
        "Value":"123 Fake St"
      },
      {
        "Type":"Name",
        "Value":"John Doe"
      }
    ]
  }
}');
CREATE INDEX idx ON tab(col1); -- creating index on calculated column
SELECT * FROM tab;
db<>fiddle demo
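With the column persisted and indexed, a search on the extracted name can seek the index; a quick usage sketch (not part of the original answer):
SELECT val FROM tab WHERE col1 = N'John Doe';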

You could use a computed column with PATINDEX and index that:
CREATE TABLE foo (
    a varchar(4000),
    a_ax AS (IIF(PATINDEX('%bar%', a) > 0, SUBSTRING(a, PATINDEX('%bar%', a), 42), ''))
);
CREATE INDEX foo_x ON foo(a_ax);
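A hedged usage sketch: the computed column holds the 42 characters starting at the first 'bar', so a prefix search against it can seek the index (sample values are made up):
INSERT INTO foo (a) VALUES ('see barcode 123 here');
SELECT a FROM foo WHERE a_ax LIKE 'barcode 123%';  -- can seek foo_x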

You could use a scalar function as @Lukasz Szozda posted - it's a good solution for this.
The problem, however, with T-SQL scalar UDFs in computed columns is that they destroy the performance of any query that table is involved in. Not only does data modification (inserts, updates, deletes) slow down; execution plans for queries that involve that table cannot leverage parallelism. This is the case even when the computed column is not referenced in the query. Even index builds lose the ability to leverage a parallel execution plan. See this article: Another reason why scalar functions in computed columns is a bad idea by Erik Darling.
This is not as pretty, but if performance is important then this will get you the results you need without the drawbacks of a scalar UDF.
CREATE TABLE dbo.jsonStrings
(
    jsonString VARCHAR(8000) NOT NULL,
    nameTxt AS (
        SUBSTRING(
            SUBSTRING(jsonString,
                CHARINDEX('"Value":"', jsonString,
                    CHARINDEX('"Type":"Name",', jsonString,
                        CHARINDEX('"DataArray":[', jsonString) + 12)) + 9, 8000), 1,
            CHARINDEX('"',
                SUBSTRING(jsonString,
                    CHARINDEX('"Value":"', jsonString,
                        CHARINDEX('"Type":"Name",', jsonString,
                            CHARINDEX('"DataArray":[', jsonString) + 12)) + 9, 8000)) - 1)) PERSISTED
);
INSERT dbo.jsonStrings(jsonString)
VALUES
('{
  "Response":{
    "DataArray":[
      {
        "Type":"Address",
        "Value":"123 Fake St"
      },
      {
        "Type":"Name",
        "Value":"John Doe"
      }
    ]
  }
}');
Note that this works well for the structure you posted; it may need to be tweaked depending on what the JSON can look like.
A second (better but more complex) solution would be to take the JSON path logic from Lukasz Szozda's scalar UDF and move it into a CLR function. CLR scalar UDFs, when written correctly, do not have the aforementioned problems that T-SQL scalar UDFs do.

How to add fields dynamically to snowflake's object_construct function

I have a large table of data in Snowflake that contains many fields with a name prefix of 'RAW_'. In order to make my table more manageable, I wish to condense all of these 'RAW_' fields into just one field called 'RAW_KEY_VALUE', condensing all of it into a key-value object store.
It initially appeared that Snowflake's 'OBJECT_CONSTRUCT' function was going to be my perfect solution here. However, the issue with this function is that it requires a manual input/hard coding of the fields you wish to convert to a key-value object. This is problematic for me as I have anywhere from 90-120 fields I would need to manually place in this function. Additionally, these fields with a 'RAW_' prefix change all the time. It is therefore critical that I have a solution that allows me to dynamically add these fields and convert them to a key-value store. (I haven't tried creating a stored procedure for this yet but will if all else fails)
Here is a snippet of the data in question
create or replace table reviews(name varchar(50), acting_rating int, raw_comments varchar(50), raw_rating int, raw_co varchar(50));
insert into reviews values
('abc', 4, NULL, 1, 'NO'),
('xyz', 3, 'some', 1, 'haha'),
('lmn', 1, 'what', 4, NULL);
Below is the output I'm trying to achieve (using the manual input/hard coding approach with object_construct)
select
name ,
acting_rating ,
object_construct_keep_null ('raw_comments',raw_comments,'raw_rating',raw_rating,'raw_co',raw_co) as RAW_KEY_VALUE
from reviews;
The above produces the desired output: each row keeps name and acting_rating, with the three RAW_ fields condensed into a single RAW_KEY_VALUE object.
Please let me know if there are any other ways to approach here. I think if I was able to work out a way to add the relevant fields to the object_construct function dynamically, that would solve my problem.
You can do this with a JS UDF and object_construct(*):
create or replace function obj_with_prefix(PREFIX string, A variant)
returns variant
language javascript
as $$
    let result = {};
    for (let key in A) {
        if (key.startsWith(PREFIX)) {
            result[key] = A[key];
        }
    }
    return result;
$$;
Test:
with data(aa_1, aa_2, bb_1, aa_3) as (
    select 1, 2, 3, 4
)
select obj_with_prefix('AA', object_construct(*))
from data;
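Applied to the reviews table from the question, it would look something like this (untested sketch; unquoted Snowflake column names are stored uppercase, so the prefix is matched as 'RAW_', and object_construct_keep_null(*) is used to preserve NULLs like the question's hard-coded version - plain object_construct(*) drops them):
select
    name,
    acting_rating,
    obj_with_prefix('RAW_', object_construct_keep_null(*)) as raw_key_value
from reviews;
The non-RAW_ columns such as NAME and ACTING_RATING are excluded from the object automatically, because the UDF keeps only keys starting with the prefix.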

Issue where clause FOR JSON AUTO has generated the incomplete answer [duplicate]

This question already has answers here:
FOR JSON PATH results in SSMS truncated to 2033 characters
(10 answers)
SQL Server json truncated (even when using NVARCHAR(max) )
(10 answers)
Closed 9 months ago.
Getting JSON from SQL Server is great, but I ran into a problem.
Example. I have a LithologySamples table with a very basic structure:
[Id] [uniqueidentifier],
[Depth1] [real],
[Depth2] [real],
[RockId] [nvarchar](8),
In the database there are more or less 600 records in this table. I want to generate JSON to transport data to another database, so I use FOR JSON AUTO, which has worked perfectly with other tables with fewer records. But in this case I see that the response is generated incomplete, which has me baffled. I noticed it when examining the output:
[{
"Id": "77769039-B2B7-E511-8279-DC85DEFBF2B6",
"Depth1": 4.2000000e+001,
"Depth2": 5.8000000e+001,
"RockId": "MIC SST"
}, {
"Id": "78769039-B2B7-E511-8279-DC85DEFBF2B6",
"Depth1": 5.8000000e+001,
"Depth2": 6.3000000e+001,
"RockId": "CGL"
}, {
"Id": "79769039-B2B7-E511-8279-DC85DEFBF2B6",
"Depth1": 6.3000000e+001,
"Depth2": 8.3000000e+001,
"RockId": "MIC SST"
}, {
// ... continues fine, but it breaks off towards the end:
}, {
"Id": "85769039-B2B7-E511-8279-DC85DEFBF2B6",
"Depth1": 2.0500000e+002,
"Depth2": 2.1500000e+002,
"RockId": "MIC SST"
}, {
"Id": "86769039-
// inexplicably it cuts here !?
I've searched and I can't find any options for the answer to come out complete.
The SQL query is as follows:
SELECT * FROM LithologySamples FOR JSON AUTO;
AUTO or PATH give the same result.
Does anyone know what I should do so that the statement generates the JSON of the entire table?
But in this case I see that the response is generated incomplete.
If you are checking this in SSMS, it truncates text in various ways depending on the output method you're using (PRINT, SELECT, results to text/grid). The string is complete, it's just the output that has been mangled.
One way to validate that the string is in fact complete is to:
SELECT * INTO #foo FROM
(SELECT * FROM LithologySamples FOR JSON AUTO) x(y);
Then check LEN(y), DATALENGTH(y), RIGHT(y, 50) (see example db<>fiddle), or select from that table using CONVERT(xml, y) (see this article for more info).
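For example, a quick sanity check against the temp table (sketch):
SELECT LEN(y) AS len_chars, DATALENGTH(y) AS len_bytes, RIGHT(y, 50) AS tail
FROM #foo;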
In your case it seems the problem is coming from how C# is consuming the output. If the consumer is treating the JSON as multiple rows, then assigning a variable there will ultimately assign one arbitrary row of <= 2033 characters, not the whole value. I talked about this briefly back in 2015. Let's say you are using reader[0] or similar to test:
CREATE TABLE dbo.Samples
(
[Id] [uniqueidentifier] NOT NULL DEFAULT NEWID(),
[Depth1] [real] NOT NULL DEFAULT 5,
[Depth2] [real] NOT NULL DEFAULT 5,
[RockId] [nvarchar](8)
);
INSERT dbo.Samples(RockId) SELECT TOP (100) LEFT(name, 8) FROM sys.all_columns;
-- pretend this is your C# reader:
SELECT * FROM dbo.Samples FOR JSON AUTO;
-- reader[0] here would be something like this:
-- [{"Id":"054EC9A2-760B-4EBA-BF06-...,"RockId":"ser
-- which is the first 2,033 characters
SELECT LEN('[{"Id":"054EC9A2-760B-4EBA-BF06-..."RockId":"ser')
-- instead, since you want C# to assign a scalar,
-- assign output to a scalar first:
DECLARE @json nvarchar(max) = (SELECT * FROM dbo.Samples FOR JSON AUTO);
SELECT json = @json;
-- now reader[0] will be the whole thing
Example db<>fiddle
The 2033 comes from the same place it comes from for XML (since SQL Server's JSON implementation is just a pretty wrapper over existing underlying XML functionality); as Charlie points out, Martin explained it here:
SELECT FOR XML AUTO and return datatypes

MSSQL Data type conversion

I have a pair of databases (one MSSQL and one Oracle), run by different teams. Some data are now being synchronized regularly by a stored procedure in the MSSQL database. This stored procedure calls a very large
MERGE [mssqltable].[Mytable] as s
USING THEORACLETABLE.BLA as t
ON t.[R_ID] = s.[R_ID]
WHEN MATCHED THEN UPDATE SET [Field1] = s.[Field1], ..., [Brokenfield] = s.[BrokenField]
WHEN NOT MATCHED BY TARGET THEN
... another big statement
Field Brokenfield was a numeric one until today, and could take the values NULL, 0, 1, ..., 24.
Now, the oracle team introduced a breaking change today for some reason, changed the type of the column to string and now has values NULL, "", "ALFA", "BRAVO"... in the column. Of course, the sync got broken.
What is the easiest way to fix the sync here? I (MySQL team lead, frontend expert, but not so much in databases) would usually bring in one of our database experts, but all of them are ill right now, and the fix must go online today....
I thought of a stored procedure like CONVERT_BROKENFIELD_INT_TO_STRING or so, based on some switch-case, which could be called in that merge statement, but I'm not sure how to do that.
Edit/Clarification:
What I need is a way to make a chunk of SQL code (stored procedure) that takes an input of "ALFA" and returns 1, "BRAVO" -> 2, etc., and which can be reused, to avoid writing huge IFs in more than one place.
If you cannot simplify the logic for correct values the way @RichardHansell described, you can create a crosswalk table mapping BrokenField to the correct values. Then you can use a common table expression or subquery with a left join to that crosswalk in the merge.
create table dbo.BrokenField_Crosswalk (
BrokenField varchar(32) not null primary key
, CorrectedValue int
);
insert into dbo.BrokenField_Crosswalk (BrokenField,CorrectedValue) values
('ALFA', 1)
, ('ALPHA', 1)
, ('BRAVO', 2)
...
go
And your code for the merge would look something like this:
;with cte as (
    select o.R_ID
         , o.Field1
         , BrokenField = cast(isnull(c.CorrectedValue, o.BrokenField) as int)
         ....
    from oracle_table.bla as o
        left join dbo.BrokenField_Crosswalk as c
            on c.BrokenField = o.BrokenField
)
merge into [mssqltable].[Mytable] t
using cte as s
on t.[R_ID] = s.[R_ID]
when matched
    then update set
        [Field1] = s.[Field1]
        , ...
        , [Brokenfield] = s.[BrokenField]
when not matched by target
    then
    ...
If they are using names with a letter at the start that goes in a sequence:
A = 1
B = 2
C = 3
etc.
Then you could do something like this:
MERGE [mssqltable].[Mytable] as s
USING THEORACLETABLE.BLA as t
ON ASCII(LEFT(t.[R_ID], 1)) - ASCII('A') + 1 = s.[R_ID]
WHEN MATCHED THEN UPDATE SET [Field1] = s.[Field1], ..., [Brokenfield] = s.[BrokenField]
WHEN NOT MATCHED BY TARGET THEN
... another big statement
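The key expression converts the leading letter to its position in the alphabet; for example:
SELECT ASCII(LEFT('BRAVO', 1)) - ASCII('A') + 1;  -- returns 2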
Edit: but actually I re-read your question and you are talking about [Brokenfield] being the problem column, so my solution wouldn't work.
I don't really understand now, as it seems as though the MERGE statement is updating the Oracle table with numbers, so surely you need the mapping to work the other way, i.e. 1 -> ALFA, 2 -> BRAVO, etc.?

How to get around the parameters limit in Linq-to-SQL's .Contains

I have a time-consuming Linq-to-SQL query which looks like: database.GetTable<....>().Where(.....).Join(.......).Join(.......).Join(........).Select(a => new XResult(.......)).ToArray(). It works quite slowly, and I tried to speed it up by caching all XResults in a static List.
For that purpose I added .Where(a => !cachedResults.Contains(a)) into the query sequence, but I faced the problem: "The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Too many parameters were provided in this RPC request. The maximum is 2100.".
So... is it somehow possible to redesign my query in order to get around the parameters limit in SQL? Maybe temporary table? Or somehow redesign the caching mechanism? Any ideas?
UPDATE: I tried to merge all cached XResults into a string and then do .Where(a => SqlMethods.Like(mergedResults, "%|" + a.Id.ToString() + "|%")). It does not crash, and maybe it even works, but I could not get the result - I get a SQL timeout. So it is not an acceptable solution. Any other ideas?
My final solution.
In database create table-valued function, which receives merged ids as VARBINARY(MAX), splits them and returns table of INTs:
CREATE FUNCTION [dbo].[SplitIds]
(
    @data VARBINARY(MAX)
)
RETURNS @result TABLE(Id INT)
AS
BEGIN
    IF @data IS NULL
        RETURN;

    -- each id is packed into 4 bytes, big-endian
    DECLARE @ptr INT = 0, @size INT = 4;

    WHILE (@ptr) * @size < LEN(@data)
    BEGIN
        INSERT INTO @result(Id)
        VALUES (SUBSTRING(@data, (@ptr * @size) + 1, @size));  -- varbinary(4) converts to INT
        SET @ptr += 1;
    END;

    RETURN;
END
Drag it from Server Explorer onto the DBML designer.
Use it in the Linq-to-SQL query like any other built-in method; for example, I used it as follows: database.GetTable<....>().Join(database.SplitIds(ConvertIdsToBinary(Ids)), ......).Where(.....).Join(.......).Join(.......).Join(........).Select(a => new XResult(.......)).ToArray()
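To sanity-check the split function from T-SQL (illustrative input; the ids 1, 2 and 3 packed as 4-byte big-endian integers):
SELECT Id FROM dbo.SplitIds(0x000000010000000200000003);
-- expected: 1, 2, 3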

In SQL Server can I insert multiple nodes into XML from a table?

I want to generate some XML in a stored procedure based on data in a table.
The following insert allows me to add many nodes but they have to be hard-coded or use variables (sql:variable):
SET @MyXml.modify('
    insert
        <myNode>
            {sql:variable("@MyVariable")}
        </myNode>
    into (/root[1])')
So I could loop through each record in my table, put the values I need into variables, and execute the above statement.
But is there a way I can do this by just combining with a select statement and avoiding the loop?
Edit: I have used SELECT ... FOR XML to do similar stuff before, but I always find it hard to read when working with a hierarchy of data from multiple tables. I was hoping there would be something using modify() where the generated XML is more explicit and more controllable.
Have you tried nesting FOR XML PATH scalar valued functions?
With the nesting technique, you can break your SQL into very manageable/readable elemental pieces.
Disclaimer: the following, while adapted from a working example, has not itself been literally tested
Some reference links for the general audience
http://msdn2.microsoft.com/en-us/library/ms178107(SQL.90).aspx
http://msdn2.microsoft.com/en-us/library/ms189885(SQL.90).aspx
The simplest, lowest level nested node example
Consider the following invocation
DECLARE @NestedInput_SpecificDogNameId int
SET @NestedInput_SpecificDogNameId = 99

SELECT [dbo].[udfGetLowestLevelNestedNode_SpecificDogName]
    (@NestedInput_SpecificDogNameId)
Let's say udfGetLowestLevelNestedNode_SpecificDogName had been written without the FOR XML PATH clause; for @NestedInput_SpecificDogNameId = 99 it would return the single rowset record:
@SpecificDogNameId  DogName
99                  Astro
But with the FOR XML PATH clause,
CREATE FUNCTION dbo.udfGetLowestLevelNestedNode_SpecificDogName
(
    @NestedInput_SpecificDogNameId int
)
RETURNS XML
AS
BEGIN
    -- Declare the return variable here
    DECLARE @ResultVar XML

    -- Add the T-SQL statements to compute the return value here
    SET @ResultVar =
    (
        SELECT
            t.SpecificDogNameId as "@SpecificDogNameId",
            t.DogName as "text()"
        FROM tblDogs t
        WHERE t.SpecificDogNameId = @NestedInput_SpecificDogNameId
        FOR XML PATH('Dog')
    )

    -- Return the result of the function
    RETURN @ResultVar
END
the user-defined function produces the following XML (the @ sign causes the SpecificDogNameId field to be returned as an attribute, and the "text()" alias emits DogName as the element's text):
<Dog SpecificDogNameId="99">Astro</Dog>
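Without the @ prefix, the same columns come back as child elements instead; a quick illustrative comparison (not from the original answer):
SELECT 99 AS "SpecificDogNameId", 'Astro' AS "DogName" FOR XML PATH('Dog')
-- <Dog><SpecificDogNameId>99</SpecificDogNameId><DogName>Astro</DogName></Dog>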
Nesting User-defined Functions of XML Type
User-defined functions such as the above udfGetLowestLevelNestedNode_SpecificDogName can be nested to provide a powerful method to produce complex XML.
For example, the function
CREATE FUNCTION [dbo].[udfGetDogCollectionNode]()
RETURNS XML
AS
BEGIN
    -- Declare the return variable here
    DECLARE @ResultVar XML

    -- Add the T-SQL statements to compute the return value here
    SET @ResultVar =
    (
        SELECT
            [dbo].[udfGetLowestLevelNestedNode_SpecificDogName]
                (t.SpecificDogNameId)
        FROM tblDogs t
        FOR XML PATH('DogCollection'), ELEMENTS
    )

    -- Return the result of the function
    RETURN @ResultVar
END
when invoked as
SELECT [dbo].[udfGetDogCollectionNode]()
might produce the complex XML node (given the appropriate underlying data)
<DogCollection>
<Dog SpecificDogNameId="88">Dino</Dog>
<Dog SpecificDogNameId="99">Astro</Dog>
</DogCollection>
From here, you could keep working upwards in the nested tree to build as complex an XML structure as you please
CREATE FUNCTION [dbo].[udfGetAnimalCollectionNode]()
RETURNS XML
AS
BEGIN
    DECLARE @ResultVar XML

    SET @ResultVar =
    (
        SELECT
            dbo.udfGetDogCollectionNode(),
            dbo.udfGetCatCollectionNode()
        FOR XML PATH('AnimalCollection'), ELEMENTS XSINIL
    )

    RETURN @ResultVar
END
when invoked as
SELECT [dbo].[udfGetAnimalCollectionNode]()
the udf might produce the more complex XML node (given the appropriate underlying data)
<AnimalCollection>
<DogCollection>
<Dog SpecificDogNameId="88">Dino</Dog>
<Dog SpecificDogNameId="99">Astro</Dog>
</DogCollection>
<CatCollection>
<Cat SpecificCatNameId="11">Sylvester</Cat>
<Cat SpecificCatNameId="22">Tom</Cat>
<Cat SpecificCatNameId="33">Felix</Cat>
</CatCollection>
</AnimalCollection>
Use sql:column instead of sql:variable. You can find detailed info here: http://msdn.microsoft.com/en-us/library/ms191214.aspx
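For instance, a minimal sketch (hypothetical table dbo.Docs with an xml column MyXml and a per-row source column NewValue; one node is inserted per row, pulled from that row's own column rather than from a variable):
UPDATE dbo.Docs
SET MyXml.modify('
    insert <myNode>{sql:column("NewValue")}</myNode>
    into (/root[1])');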
Could you tell us a bit more about what exactly you are planning to do?
Is it simply generating XML data based on the content of the table,
or adding some data from the table to an existing XML structure?
There is a great series of articles on the subject of XML in SQL Server written by Jacob Sebastian; it starts with the basics of generating XML from the data in a table.
