handle incorrect values in snowflake - snowflake-cloud-data-platform

Hi, I have a question about Snowflake: how do I handle non-ASCII values in Snowflake?
table: emp
Empno | Empname
1     | ravÉi
2     | banu raju
3     | raḠu kumar
Based on the above data, I want output like below:
Empno | Empname
1     | ravEi
2     | banu raju
3     | raGu kumar
I have tried the following:
select empno, unicode(empname, encoding='utf-8') lname from emp
but the above query throws an error:
SQL compilation error: error line 1 at position 27 invalid identifier 'encoding'
Can you please tell me how to write a query that achieves this in Snowflake?

Did you try it? It should work; the UNICODE function returns the Unicode code point for the first Unicode character in a string.
select empno, empname from emp;
select unicode('ravÉi') as lname; -- returns the code point of the first character (114, for 'r')

Given your sample input and output, what you really want is to replace Unicode characters with their closest ASCII equivalents. You can solve this with a JavaScript UDF in Snowflake:
CREATE OR REPLACE FUNCTION normalize_js(S string)
RETURNS string
LANGUAGE JAVASCRIPT
-- NFD-decompose, then strip the combining diacritical marks
AS 'return S.normalize("NFD").replace(/\p{Diacritic}/gu, "")'
;
select normalize_js('áéÉña');
-- 'aeEna'
See https://stackoverflow.com/a/66606937/132438.
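Applied to the sample emp table from the question (a sketch, assuming the table and the UDF above):
select empno, normalize_js(empname) as empname from emp;
-- 1 | ravEi
-- 2 | banu raju
-- 3 | raGu kumar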

Related

Snowflake - Setting Date Time format in result of Query

I'm running a query in Snowflake to then export. I need to convert a date value such as 2022-02-23 16:23:58.805 to the format 'yyyy-MM-ddThh:mm:ss'.
I'm not sure what the best way to convert the date format is. I've tried using TO_TIMESTAMP, but keep getting the following error: 'too many arguments for function [TO_TIMESTAMP(FSA.LAST_UPDATED, 'yyyy-MM-ddThh:mm:ss')] expected 1, got 2'
This looks like a conversion issue. Check the datatype of your last_updated column: TO_TIMESTAMP with a format argument expects a string input, so passing a value that is already a TIMESTAMP produces the "too many arguments" error. There also seems to be a typo in your format string: for the minutes portion, use mi (hh:mi:ss), since mm means month.
Refer below -
select to_timestamp('2022-02-23 16:23:58.805'::TIMESTAMP,'yyyy-mm-dd hh:mi:ss.ff');

000939 (22023): SQL compilation error: error line 1 at position 7
too many arguments for function
[TO_TIMESTAMP(TO_TIMESTAMP_NTZ('2022-02-23 16:23:58.805'), 'yyyy-mm-dd hh:mi:ss.ff')] expected 1, got 2
select to_timestamp('2022-02-23 16:23:58.805'::string,'yyyy-mm-dd hh:mi:ss.ff');
TO_TIMESTAMP('2022-02-23 16:23:58.805'::STRING,'YYYY-MM-DD HH:MI:SS.FF')
2022-02-23 16:23:58.805
TO_TIMESTAMP is for string -> timestamp; TO_CHAR is for timestamp -> string, of which the TO_CHAR( <date_or_time_expr> [, '<format>' ] ) form is the one you seem to want.
This SQL shows string -> timestamp -> formatted string (note mi for minutes, since mm means month):
SELECT
    '2022-02-23 16:23:58.805' as time_string,
    to_timestamp(time_string) as a_timestamp,
    to_char(a_timestamp, 'yyyy-mm-ddThh:mi:ss') as formatted_string;

TIME_STRING             | A_TIMESTAMP             | FORMATTED_STRING
2022-02-23 16:23:58.805 | 2022-02-23 16:23:58.805 | 2022-02-23T16:23:58
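Putting it together for the literal from the question (a one-liner sketch; substitute your last_updated column if it is already a TIMESTAMP):
select to_char('2022-02-23 16:23:58.805'::timestamp, 'yyyy-mm-ddThh:mi:ss');
-- 2022-02-23T16:23:58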

Split data from strings into columns

I have a column with a long string. The data needs to be split into columns; the strings are of variable length and don't always contain the same number of columns. I'm not exactly sure how to do this, so I'm looking for some advice here.
Let's say I have this string:
VS5~MedCond1~35.4|VS4~MedCond2~16|VS1~MedCond3~155|VS2~MedCond4~70|SPO2~MedCond5~100|VS3~MedCond6~64|FiO2~MedCond7~21|MAP~MedCond8~98|
In some cases the string might not have all the medical conditions, just some of them.
I need to split this into columns where the column name is the part between the tildes, i.e. MedCond1, and the value is the part to the right of that tilde but before the pipe, ending up like this:
MedCond1 MedCond2 MedCond3 MedCond4 MedCond5 MedCond6 MedCond7 MedCond8
======== ======== ======== ======== ======== ======== ======== ========
35.4     16       155      70       100      64       21       98
I need to do this for a lot of rows within a large table, and as I said not all the columns are always present, but the names won't differ: one row might have MedCond1-8, another only MedCond3, 4, and 7.
Here is a query I created that is close to what I want, but it is not dynamic, so it picks up the values along with extra bits of the string:
select MainCol,
       case when charindex('MedCond1', MainCol) > 0
            then substring(MainCol, charindex('MedCond1', MainCol) + 9, 4)
       end as [MedCond1]
from MedTable
Will return
MedCond1
========
35.3
40.2
33.6
33|V <--- Problem
As you can see, the numeric value is sometimes picked up together with an additional part of the string, due to the hard-coded charindex length. The value is sometimes 4 characters long with a decimal place, sometimes 2 long with no decimal place. I would like to make this dynamic: the pipe defines the end of the data I need, and the start is defined by the tilde at the end of the column name.
Thanks for any thoughts on making this dynamic
Andrew
This data looks like a table itself. It could have been stored in SQL Server as XML: SQL Server supports xml columns and allows querying them. In fact, one can convert this string to XML and then query it:
declare @medTable table (item nvarchar(2000))
insert into @medTable
values ('VS5~MedCond1~35.4|VS4~MedCond2~16|VS1~MedCond3~155|VS2~MedCond4~70|SPO2~MedCond5~100|VS3~MedCond6~64|FiO2~MedCond7~21|MAP~MedCond8~98|');
-- Step 1: Replace `|` with </item><item> boundaries and `~` with <tag> boundaries.
-- This returns an xml value for each medTable row.
with items as (
    select xmlField = cast('<item><tag>'
        + replace(
            replace(item, '|', '</tag></item><item><tag>'),
            '~', '</tag><tag>')
        + '</tag></item>' as xml)
    from @medTable
)
-- Step 2: Select the different tags and display them as fields
select
    y.item.value('(tag/text())[1]', 'nvarchar(20)'),
    y.item.value('(tag/text())[2]', 'nvarchar(20)'),
    y.item.value('(tag/text())[3]', 'nvarchar(20)')
from items outer apply xmlField.nodes('item') as y(item)
The result is :
-------------------- -------------------- -------
VS5 MedCond1 35.4
VS4 MedCond2 16
VS1 MedCond3 155
VS2 MedCond4 70
SPO2 MedCond5 100
VS3 MedCond6 64
FiO2 MedCond7 21
MAP MedCond8 98
NULL NULL NULL
It would be better to perform this conversion when loading the data, though. It's easier, for example, to make the replacements in C# or SSIS and store a complete xml value in the database.
You can also modify this query to generate the xml value and store it in the database:
declare @medTable2 table (xmlField xml)

with items as (
    select xmlField = cast('<item><tag>' + replace(replace(item, '|', '</tag></item><item><tag>'), '~', '</tag><tag>') + '</tag></item>' as xml)
    from @medTable
)
insert into @medTable2
select items.xmlField
from items

-- Query the new table from now on
select
    y.item.value('(tag/text())[1]', 'nvarchar(20)'),
    y.item.value('(tag/text())[2]', 'nvarchar(20)'),
    y.item.value('(tag/text())[3]', 'nvarchar(20)')
from @medTable2 outer apply xmlField.nodes('item') as y(item)
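From there, conditional aggregation can pivot the tag triples into the exact column layout asked for (a sketch against the @medTable2 variable above; repeat the pattern for MedCond3 through MedCond8, and group by a row key if you have more than one source row):
select
    max(case when y.item.value('(tag/text())[2]', 'nvarchar(20)') = 'MedCond1'
             then y.item.value('(tag/text())[3]', 'nvarchar(20)') end) as MedCond1,
    max(case when y.item.value('(tag/text())[2]', 'nvarchar(20)') = 'MedCond2'
             then y.item.value('(tag/text())[3]', 'nvarchar(20)') end) as MedCond2
from @medTable2 outer apply xmlField.nodes('item') as y(item)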
OK, let me take a stab at this. The solution I'm outlining is not purely SQL Server; it uses a round-trip via a text file.
The approach uses the following steps:
Unpivot the data delimited by the pipe symbols (to create more than one line of output for each line of input)
Round-trip the data from SQL Server to a text file and back
Separate the data into columns on the tilde ~ symbol delimiter
Pivot the data back into columns
The key benefit of this approach is the unpivot operation, which allows you to handle missing columns like MedCond2 naturally by the absence of an equivalent row. It also eliminates nearly all string manipulation, save for the one REPLACE function in step 1 below.
Given a single row's contents like the following:
VS5~MedCond1~35.4|VS4~MedCond2~16|VS1~MedCond3~155|VS2~MedCond4~70|SPO2~MedCond5~100|VS3~MedCond6~64|FiO2~MedCond7~21|MAP~MedCond8~98|
Step 1 (Unpivot): Find and replace all instances of the pipe symbol with a newline character. So, REPLACE(column, '|', CHAR(13)) will give you the following lines of text (i.e. multiple lines of text in a single database row) for a single input row:
VS5~MedCond1~35.4
VS4~MedCond2~16
VS1~MedCond3~155
VS2~MedCond4~70
SPO2~MedCond5~100
VS3~MedCond6~64
FiO2~MedCond7~21
MAP~MedCond8~98
Step 2 (Round-trip): Write the above output to a text file, using your tool of choice (SSIS, SQLCMD, etc.), ensuring that the newline character used is the same as in the REPLACE command in step 1.
The purpose of this step is to flatten the multiple lines within each row into ordinary rows alongside the lines from every other row.
Note that step 1 can be eliminated by defining the row delimiter for steps 2 and 3 as the pipe symbol; I've added step 1 with newlines only to make the process easier to understand and debug.
Step 3 (Separate columns): Import the text file back into SQL Server using the same tool, defining the column delimiter as the tilde ~ symbol and the row delimiter as in steps 1/2. This gives you:
ColA MedCondTitle MedCondValue
------ ------------- -------------
VS5 MedCond1 35.4
VS4 MedCond2 16
VS1 MedCond3 155
VS2 MedCond4 70
SPO2 MedCond5 100
VS3 MedCond6 64
FiO2 MedCond7 21
MAP MedCond8 98
Step 4 (Pivot): Now you have the trivially simple step of pivoting rows back into columns, which can be achieved with expressions of the form:
SUM(CASE WHEN MedCondTitle = 'MedCond1' THEN MedCondValue ELSE 0 END) AS MedCond1
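A fuller sketch of step 4, assuming the re-imported table is named ImportedMedData, the values were imported as a numeric type, and each source row carries an identifier RowId (all three names are hypothetical):
SELECT RowId,
       SUM(CASE WHEN MedCondTitle = 'MedCond1' THEN MedCondValue ELSE 0 END) AS MedCond1,
       SUM(CASE WHEN MedCondTitle = 'MedCond2' THEN MedCondValue ELSE 0 END) AS MedCond2
       -- ...repeat through MedCond8
FROM ImportedMedData
GROUP BY RowId;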

Postgres function with jsonb parameters

I have seen a similar post here, but my situation is slightly different from anything I've found so far. I am trying to call a Postgres function with parameters that I can use in the function's jsonb query logic. Here is an example of the query I'm trying to recreate with parameters:
SELECT *
from edit_data
where ( "json_field"#>'{Attributes}' )::jsonb @>
      '{"issue_description":"my description",
        "reporter_email":"user@generic.com"}'::jsonb
I can run this query just fine in pgAdmin, but all my attempts so far to run it inside a function, with parameters for the "my description" and "user@generic.com" values, have failed. Here is a simple example of the function I'm trying to create:
CREATE OR REPLACE FUNCTION get_Features(
    p1 character varying,
    p2 character varying)
RETURNS SETOF edit_metadata AS
$BODY$
  SELECT * from edit_metadata
  where ("geo_json"#>'{Attributes}')::jsonb @>
        '{"issue_description":$p1, "reporter_email":$p2}'::jsonb;
$BODY$
LANGUAGE sql VOLATILE
COST 100
ROWS 1000;
I know that the syntax is incorrect, and I've been struggling with this for a day or two. Can anyone help me understand how best to deal with the double quotes around the values and use parameters here?
TIA
You could use the function json_build_object:
select json_build_object(
    'issue_description', 'my description',
    'reporter_email', 'user@generic.com');
And you get:
                                json_build_object
----------------------------------------------------------------------------------
 {"issue_description" : "my description", "reporter_email" : "user@generic.com"}
(1 row)
That way there is no way to produce invalid syntax (no hassle with quoting strings), and you can swap the literal values for parameters.
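A sketch of the function rewritten this way, using jsonb_build_object so the result is jsonb and works directly with the @> containment operator (table and column names taken from the question):
CREATE OR REPLACE FUNCTION get_Features(
    p1 character varying,
    p2 character varying)
RETURNS SETOF edit_metadata AS
$BODY$
  SELECT *
  FROM edit_metadata
  WHERE ("geo_json"#>'{Attributes}')::jsonb @>
        jsonb_build_object('issue_description', p1,
                           'reporter_email', p2);
$BODY$
LANGUAGE sql STABLE;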

Parse json arrays using HIVE

I have many json arrays stored in a table (jt) that looks like this:
[{"ts":1403781896,"id":14,"log":"show"},{"ts":1403781896,"id":14,"log":"start"}]
[{"ts":1403781911,"id":14,"log":"press"},{"ts":1403781911,"id":14,"log":"press"}]
Each array is one record.
I would like to parse this table into a new table (logs) with 3 fields: ts, id, log.
I tried the get_json_object method, but it seems that method is not compatible with JSON arrays, because I only get null values.
This is the code I have tested:
CREATE TABLE logs AS
SELECT get_json_object(jt.value, '$.ts') AS ts,
       get_json_object(jt.value, '$.id') AS id,
       get_json_object(jt.value, '$.log') AS log
FROM jt;
I tried to use other functions but they seem really complicated.
Thank you! :)
Update!
I solved my issue by performing a regexp:
CREATE TABLE jt_reg AS
select regexp_replace(regexp_replace(value, '\\}\\,\\{', '\\}\\\n\\{'), '\\[|\\]', '') as valuereg
from jt;

CREATE TABLE logs AS
SELECT get_json_object(jt_reg.valuereg, '$.ts') AS ts,
       get_json_object(jt_reg.valuereg, '$.id') AS id,
       get_json_object(jt_reg.valuereg, '$.log') AS log
FROM jt_reg;
I just ran into this problem, with the JSON array stored as a string in the Hive table.
The solution is a bit hacky and ugly, but it works and doesn't require serdes or external UDFs:
SELECT
    get_json_object(single_json_table.single_json, '$.ts') AS ts,
    get_json_object(single_json_table.single_json, '$.id') AS id,
    get_json_object(single_json_table.single_json, '$.log') AS log
FROM (SELECT explode(
          split(regexp_replace(substr(json_array_col, 2, length(json_array_col)-2),
                               '"}","', '"}",,,,"'), ',,,,')
      ) AS single_json
      FROM src_table) single_json_table;
I broke the lines up so that it would be a little easier to read.
I'm using substr() to strip the first and last characters, removing [ and ]. I'm then using regexp_replace to match the separator between records in the JSON array, changing the separator to something unique that can then be used easily with split() to turn the string into a Hive array of JSON objects, which can then be used with explode() as described in the previous solution.
Note that the separator regex used here ( "}"," ) wouldn't work with the original data set; there the regex would have to be ( "},\{" ) and the replacement "},,,,{" , e.g.:
split(regexp_replace(substr(json_array_col, 2, length(json_array_col)-2),
'"},\\{"', '"},,,,{"'), ',,,,')
Use the explode() function:
hive (default)> CREATE TABLE logs AS
> SELECT get_json_object(single_json_table.single_json, '$.ts') AS ts,
> get_json_object(single_json_table.single_json, '$.id') AS id,
> get_json_object(single_json_table.single_json, '$.log') AS log
> FROM
> (SELECT explode(json_array_col) as single_json FROM jt) single_json_table ;
Automatically selecting local only mode for query
Total MapReduce jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
hive (default)> select * from logs;
OK
ts id log
1403781896 14 show
1403781896 14 start
1403781911 14 press
1403781911 14 press
Time taken: 0.118 seconds, Fetched: 4 row(s)
hive (default)>
where json_array_col is the column in jt which holds your array of JSONs.
hive (default)> select json_array_col from jt;
json_array_col
["{"ts":1403781896,"id":14,"log":"show"}","{"ts":1403781896,"id":14,"log":"start"}"]
["{"ts":1403781911,"id":14,"log":"press"}","{"ts":1403781911,"id":14,"log":"press"}"]
Because get_json_object doesn't support a JSON array string at the top level, you can concatenate it into a JSON object, like this:
SELECT
get_json_object(concat(concat('{"root":', jt.value), '}'), '$.root')
FROM jt;
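From there you can index into the wrapped array with the bracket syntax that get_json_object does support (a sketch against the sample data; $.root[0] addresses the first element of each array):
SELECT get_json_object(concat('{"root":', jt.value, '}'), '$.root[0].ts')  AS ts,
       get_json_object(concat('{"root":', jt.value, '}'), '$.root[0].id')  AS id,
       get_json_object(concat('{"root":', jt.value, '}'), '$.root[0].log') AS log
FROM jt;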

How convert string "20080926112720" to datetime in SSIS?

I am getting the value 20080926172720 from XML. I have loaded it using SSIS as a string column in SQL Server, but I need to store it as a datetime, like 09/26/2008 17:27:20. I tried:
convert(datetime, '20080926112720', 101)
I need output like this:
09/26/2008 11:27:20
I'm using SQL Server 2005.
It's not the cleanest way, but SQL Server has trouble parsing the HH:mm:ss portion without the colon separators. This approach gets you the formatting you want, though I strongly suggest finding a way to format the input to the convert function as 'CCYYMMDD HH:mm:ss'.
DECLARE @MyDate char(14)
SET @MyDate = '20080926112720'
SELECT SUBSTRING(@MyDate, 5, 2) + '/' + SUBSTRING(@MyDate, 7, 2) + '/' + SUBSTRING(@MyDate, 1, 4)
     + SPACE(1)
     + SUBSTRING(@MyDate, 9, 2) + ':' + SUBSTRING(@MyDate, 11, 2) + ':' + SUBSTRING(@MyDate, 13, 2)
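Since the goal is to store an actual datetime rather than a display string, note that SQL Server parses the unseparated 'yyyymmdd hh:mi:ss' form directly, so a variation of the same substring idea yields a real datetime (a sketch using the variable above):
SELECT CONVERT(datetime,
       SUBSTRING(@MyDate, 1, 8) + ' ' +
       SUBSTRING(@MyDate, 9, 2) + ':' +
       SUBSTRING(@MyDate, 11, 2) + ':' +
       SUBSTRING(@MyDate, 13, 2))
-- 2008-09-26 11:27:20.000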
I can't believe I put the amount of time into this that I did, but you need to create a new variable and evaluate it as an expression to give it the proper formatting; right now SSIS has no clue how to parse the blob of text you are giving it. In the expression builder you would build something like this:
SUBSTRING( "20080926112720",1, 4 )+"-"+SUBSTRING( "20080926112720",5, 2 )+"-"+SUBSTRING( "20080926112720",7, 2 )+ " "+SUBSTRING( "20080926112720",9, 2 )+ ":"+SUBSTRING( "20080926112720",11, 2 )+":" +SUBSTRING( "20080926112720",13, 2 )
This will evaluate to the value 2008-09-26 11:27:20, and you can then set that variable's type to DateTime.
You will of course need to replace the literal with your variable:
SUBSTRING(@[User::YourDateVariable], 1, 4) + "-" + SUBSTRING(@[User::YourDateVariable], 5, 2) + "-" + SUBSTRING(@[User::YourDateVariable], 7, 2) + " " + SUBSTRING(@[User::YourDateVariable], 9, 2) + ":" + SUBSTRING(@[User::YourDateVariable], 11, 2) + ":" + SUBSTRING(@[User::YourDateVariable], 13, 2)
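If the target variable is typed DateTime, you can also cast the whole expression inside SSIS (a sketch; DT_DBTIMESTAMP is the SSIS expression cast for datetime values, and it accepts the 'yyyy-mm-dd hh:mm:ss' string built above):
(DT_DBTIMESTAMP)(SUBSTRING(@[User::YourDateVariable], 1, 4) + "-" + SUBSTRING(@[User::YourDateVariable], 5, 2) + "-" + SUBSTRING(@[User::YourDateVariable], 7, 2) + " " + SUBSTRING(@[User::YourDateVariable], 9, 2) + ":" + SUBSTRING(@[User::YourDateVariable], 11, 2) + ":" + SUBSTRING(@[User::YourDateVariable], 13, 2))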
