Snowflake JSON with foreign language to tabular format dynamically

Snowflake JSON with foreign language to tabular format dynamically - snowflake-cloud-data-platform

I read through snowflake documentation and the web and found only one solution to my problem by https://stackoverflow.com/users/12756381/greg-pavlik which can be found here Snowflake JSON to tabular
This doesn't work on data with Russian attribute names and attribute values. What modifications can be made for this to fit my case?
Here is an example:
create or replace table target_json_table(
v variant
);
INSERT INTO target_json_table SELECT parse_json('{
"at": {
"cf": "NV"
},
"pd": {
"мо": "мо",
"ä": "ä",
"retailerName": "retailer",
"productName":"product"
}
}');
call create_view_over_json('target_json_table', 'V', 'MY_VIEW');
ERROR: Encountered an error while creating the view. SQL compilation error: syntax error line 7 at position 7 unexpected 'ä:'. syntax error line 8 at position 7 unexpected 'мо'.

There was a bug in the original SQL used as a basis for the creation of the stored procedure. I have corrected that. You can get an update on the Github page. The changed section is here:
sql =
`
SELECT DISTINCT '"' || array_to_string(split(f.path, '.'), '"."') || '"' AS path_nAme, -- This generates paths with levels enclosed by double quotes (ex: "path"."to"."element"). It also strips any bracket-enclosed array element references (like "[0]")
DECODE (substr(typeof(f.value),1,1),'A','ARRAY','B','BOOLEAN','I','FLOAT','D','FLOAT','STRING') AS attribute_type, -- This generates column datatypes of ARRAY, BOOLEAN, FLOAT, and STRING only
'"' || array_to_string(split(f.path, '.'), '.') || '"' AS alias_name -- This generates column aliases based on the path
FROM
#~TABLE_NAME~#,
LATERAL FLATTEN(#~COL_NAME~#, RECURSIVE=>true) f
WHERE TYPEOF(f.value) != 'OBJECT'
AND NOT contains(f.path, '[') -- This prevents traversal down into arrays
limit ${ROW_SAMPLE_SIZE}
`;
Previously this SQL simply replaced non-ASCII characters with underscores. The updated SQL will wrap key names in double quotes to create non-ASCII key names.
Be sure that's what you want it to do. Also, the keys are nested. I decided that the best way to handle that is to create column names in the view with dot notation, for example one column name is pd.ä. That will require wrapping the column name with double quotes, such as:
select * from MY_VIEW where "pd.ä" = 'ä';
Final note: The name of your stored procedure is create_view_over_json, however, in the Github project the name is create_view_over_variant. When you update, be sure to call the right procedure.

Related

ABAP SQL preserve OR pad trailing spaces

I am trying to find a way to preserve a space within SQL concatenation.
For context: A table I am selecting from a table with a single concatenated key column. Concatenated keys respect spaces.
Example: BUKRS(4) = 'XYZ ', WERKS(4) = 'ABCD' is represented as key XYZ ABCD.
I am trying to form the same value in SQL, but it seems like ABAP SQL auto-trims all trailing spaces.
Select concat( rpad( tvko~bukrs, 4, (' ') ), t001w~werks ) as key, datab, datbi
from t001w
inner join tvko on tvko~vkorg = t001w~vkorg
left join ztab on ztab~key = concat( rpad( tvko~bukrs, 4, (' ') ), t001w~werks ) "This is why I need the concat
rpad( tvko~bukrs, 4, ' ' ) in this example returns XYZ, instead of XYZ , which leads to concatenated value being XYZABCD, rather than XYZ ABCD.
lpad seems to work just fine (returning XYZ), which leads me to believe I'm doing something wrong.
SQL functions don't accept string literals or variables (which preserve spaces in the same circumstances in ABAP) as they are non-elementary types.
Is there any way to pad/preserve the spaces in ABAP SQL (without pulling data and doing it in application server)?
Update: I solved my problem by splitting key selection from data selection and building the key in ABAP AS. It's a workaround that avoids the problem instead of solving it, so I'll keep the question open in case an actual solution appears.

EDIT: this post doesn't answer the question of inserting a number of characters which vary based on values in some table columns e.g. LENGTH function is forbidden in RPAD( tvko~bukrs, LENGTH( ... ), (' ') ). It's only starting from ABAP 7.55 that you can indicate SQL expressions instead of fixed numbers. You can't do it in ABAP before that. Possible workarounds are to mix ABAP SQL and ABAP (e.g. LIKE 'part1%part2' and then filtering out using ABAP) or to use native SQL directly (ADBC, AMDP, etc.)
Concerning how the trailing spaces are managed in OpenSQL/ABAP SQL, they seem to be ignored, the same way as they are ignored with ABAP fixed-length character variables.
Demonstration: I simplified your example to extract the line Walldorf plant:
These ones don't work (no line returned):
SELECT * FROM t001w
WHERE concat( 'Walldorf ' , 'plant' ) = t001w~name1
INTO TABLE #DATA(itab_1).
SELECT * FROM t001w
WHERE concat( rpad( 'Walldorf', 1, ' ' ), 'plant' ) = t001w~name1
INTO TABLE #DATA(itab_2).
These 2 ones work, one with leading space(s), one using concat_with_space:
SELECT * FROM t001w
WHERE concat( 'Walldorf', ' plant' ) = t001w~name1
INTO TABLE #DATA(itab_3).
SELECT * FROM t001w
WHERE concat_with_space( 'Walldorf', 'plant', 1 ) = t001w~name1
INTO TABLE #DATA(itab_4).
General information: ABAP documentation - SQL string functions
EDIT: working example added, using leading space(s).

How can I pull out multiple substrings from within a single string

For example, I have this string in an SSMS DB:
Sent for processing:1 ;DK A/C-WestlySnipes:ACCT NOT FOUND ;DK A/C-SonyaBlade:ACCT NOT FOUND
What I want to be able to do is pull out WestleySnipes and SonyaBlade from within that string and store it in a temp table. The names can be different and there can be more than one name within a particular string.
I've tried using substrings combined with CharIndex but I can only pull out 1 value. Not sure how to pull out multiple names from within the same string where the names can change in any given string.

In SQL Server 2016 and later, STRING_SPLIT(string, delimiter) is a table valued function. You can split your text string with a ; delimiter like so. (fiddle).
SELECT value FROM STRING_SPLIT(
'Sent for processing:1 ;DK A/C-WestlySnipes:ACCT NOT FOUND ;DK A/C-SonyaBlade:ACCT NOT FOUND',
';')
You get back this:
value
Sent for processing:1
DK A/C-WestlySnipes:ACCT NOT FOUND
DK A/C-SonyaBlade:ACCT NOT FOUND
Then you can use ordinary string-editing functions on value in your query to extract the precise substrings you need. Something like this will work for your specific example. (fiddle.)
SELECT
value,
REPLACE(REPLACE(value, 'DK A/C-', ''), ':ACCT NOT FOUND', '') val
FROM STRING_SPLIT(
'Sent for processing:1 ;DK A/C-WestlySnipes:ACCT NOT FOUND ;DK A/C-SonyaBlade:ACCT NOT FOUND',
';')
WHERE value LIKE 'DK %'

presto syntax for csv external table with array in one of the fields

I'm having trouble creating a table in Athena - that points at files with the following format:
string, string, string, array.
when I wrote the file - I delimited the array items with '|'.
I delimited each line with '\n' and each column with ','.
so for example, a row in my CSV would look like that:
Garfield, 15, orange, fish|milk|lasagna
in hive (according to the documentation i read)- when creating a table with a row delimited format - while stating the delimiters you can state a 'collection items' delimiter - that states the delimiter between elements in array columns.
I could not find an equivalent for Presto in the documentation,
Is anyone aware if it's possible, if so - what is the format, or where can I find it?
i tried "guessing" many forms, including 'collection items', none seem to work.
CREATE EXTERNAL TABLE `cats`(
`name` string,
`age` string,
`color` string,
`foods` array<string>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
COLLECTION ITEMS TERMINATED BY '|'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'some-location'
Would really appreciate any insight, Thanks! :)

According to AWS Athena docs on using SerDe, your guess was 100% correct.
In general, Athena uses the LazySimpleSerDe if you do not specify a ROW FORMAT, or if you specify ROW FORMAT DELIMITED
ROW FORMAT
DELIMITED FIELDS TERMINATED BY ','
ESCAPED BY '\\'
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY ':'
Now, when I simply tried your DDL statement, I would get
line 1:8: no viable alternative at input 'create external'
However by deleting LINES TERMINATED BY '\n', I was able to create table schema in meta catalog
CREATE EXTERNAL TABLE `cats`(
`name` string,
`age` string,
`color` string,
`foods` array<string>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'some-location'
Sample file with lines as shown in your file would get parsed correctly and I was able to do UNNEST on foods column:
SELECT *
FROM "cats"
CROSS JOIN UNNEST(foods) as t(food)
which resulted in
Moreover, it was also enough to simply swap lines LINES TERMINATED BY '\n' and COLLECTION ITEMS TERMINATED BY '|' for query to work (although I don't have an explanation for it)
CREATE EXTERNAL TABLE `cats`(
`name` string,
`age` string,
`color` string,
`foods` array<string>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'some-location'

(Note: this answer is applicable to Presto in general, but not to Athena)
Currently you cannot set collection delimiter in Presto.
Please create a feature request # https://github.com/prestosql/presto/issues/
Note, we plan to provide generic support for table properties to address cases like this holistically -- https://github.com/prestosql/presto/issues/954. You can track the issue and associated pull request for updates.

I'm use presto engine creating a hive table , set collection delimiter in Presto, for example:
CREATE TABLE IF NOT EXISTS test (
id bigint COMMENT 'ID',
type varchar COMMENT 'TYPE',
content varchar COMMENT 'CONTENT',
create_time timestamp(3) COMMENT 'CREATE TIME',
pt varchar
)
COMMENT 'create time 2021/11/04 11:27:53'`
WITH (
format = 'TEXTFILE',
partitioned_by = ARRAY['pt'],
textfile_field_separator = U&'\0001'
)

SQL Server 2014 - XQuery - get comma-separated List

I have a database table in SQL Server 2014 with only an ID column (int) and a column xmldata of type XML.
This xmldata column contains for example:
<book>
<title>a nice Novel</title>
<author>Maria</author>
<author>Peter</author>
</book>
As expected, I have multiple books, therefore multiple rows with xmldata.
I now want to execute a query for all books, where Peter is an Author. I tried this in some xPath2.0 testers and got to the conclusion that:
/book/author/concat(text(), if(position() != last())then ',' else '')
works.
If you try to port this success into SQL Server 2014 Express it looks like this, which is correctly escaped syntax etc.:
SELECT id
FROM books
WHERE 'Peter' IN (xmldata.query('/book/author/concat(text(), if(position() != last())then '','' else '''')'))
SQL Server however does not seem to support a construction like /concat(...) because of:
The XQuery syntax '/function()' is not supported.
I am at a loss then however, why /text() would work in:
SELECT id, xmldata.query('/book/author/text()')
FROM books
which it does.
My constraints:
I am bound to use SQL Server
I am bound to xpath or something else that can be "injected" as the statement above (if the structure of the xml or the database changes, the xpath above could be changed isolated and the application logic above that constructs the Where clause will not be touched) SEE EDIT
Is there a way to make this work?
regards,
BillDoor
EDIT:
My second constraint boils down to this:
An Application constructs the Where clause by
expression <operator> value(s)
expression is stored in a database and is mapped by the xmlTag eg.:
| tokenname| querystring
| "author" | "xmldata.query(/book/author/text())"
the values are presented by the Requesting user. so if the user asks for the author "Peter" with operator "EQUALS" the application constructs:
xmaldata.query(/book/author/text()) = "Peter"
as where clause.
If the customer now decides that author needs to be nested in an <authors> element, i can simply change the expression in the construction-database and the whole machine keeps running without any changes to code, simply manageable.
So i need a way to achieve that
<xPath> <operator> "Peter"
or any other combination of this three isolated components (see above: "Peter" IN <xPath>...) gets me all of Peters' books, even if there are multiple unsorted authors.
This would not suffice either (its not sqlserver syntax, but you get the idea):
WHERE xmldata.exist('/dossier/client[text() = "$1"]', "Peter") = 1;
because the operator is still nested in the expression, i could not request <> "Peter".
I know this is strange, please don't question the concept as a whole - it has a history :/
EDIT: further clarification:
The filter-rules come into the app in an XML structure basically:
Operator: "EQ"
field: "name"
value "Peter"
evaluates to:
expression = lookupExpressionForField("name") --> "table2.xmldata.value('book/author/name[1]', 'varchar')"
operator = lookUpOperatorMapping("EQ") --> "="
value = FormatValues("Peter") --> "Peter" (if multiple values are passed FormatValues cosntructs a comma seperated list)
the application then builds:
- constructClause(String expression,String operator,String value)
"table2.xmldata.value('book/author/name[1]', 'varchar')" + "=" + "Peter"
then constructs a Select statement with the result as WHERE clause.
it does not build it like this, unescaped, unfiltered for injection etc, but this is the basic idea.
i can influence how the input is Transalted, meaning I can implement the methods:
lookupExpressionForField(String field)
lookUpOperatorMapping(String operator)
Formatvalues(List<String> values) | Formatvalues(String value)
constructClause(String expression,String operator,String value)
however i choose to do, i can change the parameter types, I can freely implement them. The less the better of course. So simply constructing a comma-seperated list with xPath would be optimal (like if i could somewhere just tick "enable /function()-syntax in xPath" in sqlserver and the /concat(if...) would work)

How about something like this:
SET NOCOUNT ON;
DECLARE #Books TABLE (ID INT NOT NULL IDENTITY(1, 1) PRIMARY KEY, BookInfo XML);
INSERT INTO #Books (BookInfo)
VALUES (N'<book>
<title>a nice Novel</title>
<author>Maria</author>
<author>Peter</author>
</book>');
INSERT INTO #Books (BookInfo)
VALUES (N'<book>
<title>another one</title>
<author>Bob</author>
</book>');
SELECT *
FROM #Books bk
WHERE bk.BookInfo.exist('/book/author[text() = "Peter"]') = 1;
This returns only the first "book" entry. From there you can extract any portion of the XML field using the "value" function.
The "exist" function returns a boolean / BIT. This will scan through all "author" nodes within "book", so there is no need to concat into a comma-separated list only for use in an IN list, which wouldn't work anyway ;-).
For more info on the "value" and "exist" functions, as well as the other functions for use with XML data, please see:
xml Data Type Methods

Building dynamic query for Sql Server 2008 when table name contains " ' "

I need to fetch Table's TOP_PK, IDENT_CURRENT, IDENT_INCR, IDENT_SEED for which i am building dynamic query as below:
sGetSchemaCommand = String.Format("SELECT (SELECT TOP 1 [{0}] FROM [{1}]) AS TOP_PK, IDENT_CURRENT('[{1}]') AS CURRENT_IDENT, IDENT_INCR('[{1}]') AS IDENT_ICREMENT, IDENT_SEED('[{1}]') AS IDENT_SEED", pPrimaryKey, pTableName)
Here pPrimaryKey is name of Table's primary key column and pTableName is name of Table.
Now, i am facing problem when Table_Name contains " ' " character.(For Ex. KIN'1)
When i am using above logic and building query it would be as below:
SELECT (SELECT TOP 1 [ID] FROM [KIL'1]) AS TOP_PK, IDENT_CURRENT('[KIL'1]') AS CURRENT_IDENT, IDENT_INCR('[KIL'1]') AS IDENT_ICREMENT, IDENT_SEED('[KIL'1]') AS IDENT_SEED
Here, by executing above query i am getting error as below:
Incorrect syntax near '1'.
Unclosed quotation mark after the character string ') AS IDENT_SEED'.
So, can anyone please show me the best way to solve this problem?

Escape a single quote by doubling it: KIL'1 becomes KIL''1.
If a string already has adjacent single quotes, two becomes four, or four becomes eight... it can get a little hard to read, but it works :)
Using string methods from .NET, your statement could be:
sGetSchemaCommand = String.Format("SELECT (SELECT TOP 1 [{0}] FROM [{1}]) AS TOP_PK, IDENT_CURRENT('[{2}]') AS CURRENT_IDENT, IDENT_INCR('[{2}]') AS IDENT_ICREMENT, IDENT_SEED('[{2}]') AS IDENT_SEED", pPrimaryKey, pTableName, pTableName.Replace("'","''"))
EDIT:
Note that the string replace is now only on a new, third substitution string. (I've taken out the string replace for pPrimaryKey, and for the first occurrence of pTableName.) So now, single quotes are only doubled, when they will be within other single quotes.

You need to replace every single quote into two single quotes http://beyondrelational.com/modules/2/blogs/70/posts/10827/understanding-single-quotes.aspx

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

Snowflake JSON with foreign language to tabular format dynamically - snowflake-cloud-data-platform

Related

ABAP SQL preserve OR pad trailing spaces

How can I pull out multiple substrings from within a single string

presto syntax for csv external table with array in one of the fields

SQL Server 2014 - XQuery - get comma-separated List

Building dynamic query for Sql Server 2008 when table name contains " ' "

Categories

Resources