XPath query: parsing multiple paths using the same query (Cross Apply / .nodes() ) - sql-server

I have a rather big and structured XML receipt which one I want to parse into a relational database. There are some equal structures on different levels, so it'd be very good to parse them using the same SQL statement. Like:
DECLARE #XMLPath varchar(127)
SET #XMLPath = 'atag/btag/item'
INSERT INTO XMLReadItems
SELECT ci.InvoiceID,
T.c.value('productname[1]', 'varchar(63)') AS InvoiceTarget,
T.c.value('unit[1]', 'varchar(15)') AS Unit,
FROM #XMLItems ci CROSS APPLY XMLCol.nodes(*[local-name()=sql:variable("#XMLPath")]') T(c)
Where #XMLPath could be a string from a variable or even a field from a table (what about using sql:column()?). But any of them I couldn't make work.
I can only use a static string in XMLCol.nodes().

There is no way you can construct XQuery parameter dynamically as it is limited to literal string only. See what MSDN says about the parameter of nodes() XML method :
XQuery
Is a string literal, an XQuery expression. If the query expression constructs nodes, these constructed nodes are exposed in the resulting rowset. If the query expression results in an empty sequence, the rowset will be empty. If the query expression statically results in a sequence that contains atomic values instead of nodes, a static error is raised.
Forcing to pass SQL variable to nodes() method would trigger error :
The argument 1 of the XML data type method "nodes" must be a string literal.
The trick you're trying to implement only works for matching element by name dynamically, not constructing the entire XPath dynamically. For example, the following should work fine to shred on item elements :
SET #elementName = 'item'
SELECT .....
FROM #XMLItems ci
CROSS APPLY XMLCol.nodes('//*[local-name()=sql:variable("#elementName")]') T(c)
In the end there is no workaround to this limitation as far as I can see, unless you want to go farther to construct the entire query dynamically (see: sp_executesql).

Related

SQL Server Data Masking bug with "FOR JSON PATH" clause

I'm working with a masked database on my QA server using SQL Server Standard (64-bit) 14.0.1000.169. This is my structure:
CREATE TABLE [dbo].[Test](
[Column1] [VARCHAR(64)] NULL,
[Column2] [VARCHAR(64)] NULL
)
GO
INSERT INTO [dbo].[Test]
VALUES ('ABCDEFG', 'HIJKLMN')
I've masked the column with the following code:
ALTER TABLE [dbo].[Test]
ALTER COLUMN [Column1] VARCHAR(64) MASKED WITH (FUNCTION = 'default()');
It works as expected when I perform the following query using a non-allowed user:
SELECT [Column1], [Column2]
FROM [dbo].[Test]
FOR JSON PATH
-- RESULT: '[{"Column1":"xxxx", "Column2":"HIJKLMN"}]'
But it doesn't work when the same non-allowed user saves the result in variable (the main goal):
DECLARE #var VARCHAR(64)
SET #var = (SELECT [Column1], [Column2] FROM [dbo].[Test] FOR JSON PATH)
SELECT #var --it should show a valid JSON...
-- RESULT: 'xxxx' <-- JSON LOSES ITS STRUCTURE
-- DESIRED RESULT: '[{"Column1":"xxxx", "Column2":"HIJKLMN"}]' <-- VALID JSON
Main problem: JSON looses its structure when a masked column appear in the SELECT and "FOR JSON PATH" clause is present.
We want to get a valid JSON even if the data column is masked or not, or even if sa user or not.
I've tested using NVARCHAR or doing a CAST in the masked column, but the only way we get the desired result is using a #tempTable before use the "FOR JSON PATH" clause.
How can I do for SELECT a masked column and save it to
VARCHAR variables without loose JSON structure?
Any help will be appreciated.
NOTE: The SA user is default allowed to see unmasked data (so the JSON doesn't loose its structure), but we want to execute it on a non-allowed user and return a valid JSON, not only 'xxxx'.
It does indeed appear to be a bug. Repro is here. Although see below, not so sure.
When using FOR JSON, or for that matter FOR XML, as a top level SELECT construct, a different code path is used as compared to placing it in a subquery or assigning it to a variable. This is one of the reasons for the 2033-byte limit per row in a bare FOR JSON.
What appears to be happening is that in the case of a bare FOR JSON, the data masking happens at the top of the plan, in a Compute Scalar operator just before the JSON SELECT operator. So the masking happens on just the one column.
PasteThePlan
Whereas when putting inside a subquery, a UDX function operator is used. The problem is that the Compute Scalar is happening after the UDX has created the JSON or XML, whereas it should have been pushed down below the UDX in the plan.
PasteThePlan
I suggest you file the bug with Microsoft, on the Azure Feedback site.
Having gone over this a little, I actually think now that it's not a bug. What does seem to be a bug is the case without nesting.
From the documentation:
Whenever you project an expression referencing a column for which a data masking function is defined, the expression will also be masked. Regardless of the function (default, email, random, custom string) used to mask the referenced column, the resulting expression will always be masked with the default function.
Therefore, when you select any masked column, even in a normal SELECT, if you use a function on the column then the masking always happens after any other functions. In other words, the masking is not applied when the data is read, it is applied when it is finally output to the client.
When using a subquery, the data is fed into a UDX function operator. The compiler now senses that the final resultset is a normal SELECT, just that it needs to mask any final result that came from the masked column. So the whole JSON is masked as one blob, similar to if you did UPPER(yourMaskedColumn). See the XML plan in this fiddle for an example of that.
But when using a bare FOR JSON, it appears to the compiler as a normal SELECT, just that the final output is changed to JSON (the top-level SELECT operator is different). So the masking happens before that point. This seems to me to be a bug.
The bug is even more egregious when you use FOR XML, which uses the same mechanisms. If you use a nested FOR XML ..., TYPE then you get just <masked /> irrespective of whether you nest it or not. Again this is because the query plan shows the masking happening after the UDX. Whereas if you don't use , TYPE then it depends if you nest it. See fiddle.

Binding for in clause from inside snowflake procedure

I want to execute a query
select *
from table
where column1 in (?)
where the value of ? is a java script array object. How can I bind this array to a clause value?
I am looking a cleaner way available out of the box from snowflake, instead of preparing the in clause value by iterating the array and building the string.
You cannot bind a Javascript array object as currently only Javascript variables of type number, string and SfDate can be bound.
For more information see this link.

How to query multiple JSON document schemas in Snowflake?

Could anyone tell me how to change the Stored Procedure in the article below to recursively expand all the attributes of a json file (multiple JSON document schemas)?
https://support.snowflake.net/s/article/Automating-Snowflake-Semi-Structured-JSON-Data-Handling-part-2
Craig Warman's stored procedure posted in that blog is a great idea. I asked him if it was okay to refactor his code, and he agreed. I've used the refactored version in the field, so I know the SP well as well as how it works.
It may be possible to modify the SP to work on your JSON. It will depend on whether or not Snowflake types the JSON in your variant column. The way you have it structured, it may not type everything. You can check by running this SQL and seeing if the result set includes all the columns you need:
set VARIANT_TABLE = 'WEATHER';
set VARIANT_COLUMN = 'V';
with MAIN_TABLE as
(
select * from identifier($VARIANT_TABLE) sample (1000 rows)
)
select distinct REGEXP_REPLACE(REGEXP_REPLACE(f.path, '\\[(.+)\\]'),'[^a-zA-Z0-9]','_') AS path_name, -- This generates paths with levels enclosed by double quotes (ex: "path"."to"."element"). It also strips any bracket-enclosed array element references (like "[0]")
typeof(f.value) AS attribute_type, -- This generates column datatypes.
path_name AS alias_name -- This generates column aliases based on the path
from
MAIN_TABLE,
LATERAL FLATTEN(identifier($VARIANT_COLUMN), RECURSIVE=>true) f
where TYPEOF(f.value) != 'OBJECT'
AND NOT contains(f.path, '[');
Be sure to replace the variables to your table and column names. If this picks up the type information for the columns in your JSON, then it's possible to modify this SP to do what you need. If it doesn't but there's a way to modify the query to get it to pick up the columns, that would work too.
If it doesn't pick up the columns, based on Craig's idea I decided to write type inference for non variant (such as strings from CSV log files without type information). Try the SQL above and see what results first.

Use String parameter for RegEx in query

In my query (the database is a sql server) I use a RegEx for a select command like this:
SELECT * FROM test WHERE id LIKE '1[2,3]'
(This query is tested and returns the data I want)
I want to use a paramter for this RegEx. For that I definded the Paramter in iReport $P{id} as a string and the value is "1[2,3]".
In my query I use now this parameter like this:
SELECT * FROM test WHERE id LIKE $P{id}
As result I get a blank page. I think the problem is that the value of the parameter is defined with " ". But with ' ' I get a compiler error that the paramter isn't a string.
I hope someone can help me.
LIKE applies to text values, not to numeric values. Since id is numeric use something like this:
SELECT * FROM test WHERE id IN (12, 13)
with the parameter
SELECT * FROM test WHERE id IN ($P!{id_list})
and supply a comma separated list of ids for the parameter. The bang (!) makes sure that the parameter will be inserted as-is, without string delimiters.
Btw: LIKE (Transact-SQL) uses wildcards, not regex.
You can still use LIKE since there exists an implicit conversion from numeric types to text in T-SQL, but this will result in a (table or index) scan, where as the IN clause can take advantage of indexes.
The accepted answer works but it is using String replacement, read more about sql-injection, to understand why this is not good practice.
The correct way to execute this IN query in jasper report (using prepared statement) is:
SELECT * FROM test WHERE $X{IN, id, id_list}
For more information as the use of NOTIN, BETWEEN ecc. see JasperReports sample reference for query

Custom Sorting Function - of SQL Server

I have a column in SQL Server 2005 that stores a version number as a string that i would like to sort by. I have been unable to find out how to sort this column, although i am guessing it would be some kind of custom function or compare algorithm.
Can anyone point me in the right direction of where to start? I may be googling the wrong stuff.
Cheers
Tris
I'd use separate int columns (e.g. MajorCol + MinorCol if you are tracking major + minor versions) and run something like
order by MajorCol, MinorCol
in my query.
I would look at using a persisted computed column which processes the string into an int or string or something appropriate for sorting in the native SQL Server sort - i.e.
'1.1.1.1' -> '001.001.001.001'
'10.10.10.10' -> '010.010.010.010'
'1.10.1.10' -> '001.010.001.010'
So that you can sort by the computed column and get expected results.
Alternatively, you can use such an operation inline, but it might be very slow. In addition scalar UDFs are extremely slow.
Creating an SQL CLR function is the way to go. They're extremely fast and powerful. It would be quick and effective as you wouldn't have to change any existing code, and you could specify all the information you need right in your SQL statements.
The SQL CLR function could accept an input string, as well as other parameters specifying which piece of information you'd like to extract from the input string. You could then sort on the return values of the function.
Specifically, I'd create a generic function that accepts three parameters: an input string, a regular expression, and a group name. That function would allow you to pass your database field and a regular expression with named groups right in the SQL statement.
The SQL CLR function would create a Regex, test it against the string, and would ultimately return the matched group's value or null if there was no match or the group was not matched (if the group was optional). Ideally, you'd want to pass the same regular expression to each call (perhaps as a variable like #regex), to take advantage of any CLR caching of the compiled regular expression. The end result would be very flexible and fast.
Regular expression options can be specified inline in the pattern like so: "(?imnsx-imnsx:subexpression)". See: msdn.microsoft.com/en-us/library/yd1hzczs.aspx
The code for such a function would look something like this:
[SqlFunction(IsDeterministic=true,IsPrecise=true,DataAccess=DataAccessKind.None,SystemDataAccess=SystemDataAccessKind.None)]
public static SqlString RegexMatchNamedGroup( SqlChars input, SqlString pattern, SqlString group_name )
{
Regex regex = new Regex( pattern.Value );
Match match = regex.Match( new string( input.Value ) );
if (!match.Success) //return null if match failed
return SqlString.Null;
if (group_name.IsNull) //return entire string if matched when no group name is specified
return match.Value;
Group group = match.Groups[group_name.Value];
if (group.Success)
return group.Value; //return matched group value
else
return SqlString.Null; //return null if named group was not matched
}
Your SQL statement could then sort on pieces of your information like so:
declare #regex nvarchar(2000) = '^(?<Major>\d{1,3})\.(?<Minor>\d{1,3})';
select VersionNumber
from YourTable
order by
Cast(RegexMatchNamedGroup( VersionNumber, #regex, 'Major') as int),
Cast(RegexMatchNamedGroup( VersionNumber, #regex, 'Minor') as int)

Resources