There is a table Account_info.
Name Email salary
david dv#some.com 10,0000
jimmy jk#new.com 20,0000
....
I need to update the table so that it contains the data in scrambled form, like this:
Name Email salary
xrfds le#xxx.com 99,0000
aswss ad#yry.com 11,0000
....
I tried to use the row generator, but that doesn't seem to help.
Any suggestion would help.
You could use randstr(), random() and uniform() and do something like this:
select randstr(5, random()) as name
, lower(concat(randstr(2, random()), '#', randstr(3, random()), '.com')) as email
, uniform(1000, 100000, random()) as salary;
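The same scrambling logic can be sketched in Python if you want to generate the values outside Snowflake (the helper names here are my own stand-ins for RANDSTR and UNIFORM, not library functions):

```python
import random
import string

def randstr(n):
    # Random lowercase string of length n, analogous to Snowflake's RANDSTR
    return ''.join(random.choice(string.ascii_lowercase) for _ in range(n))

def scramble_row():
    # Build one scrambled (name, email, salary) tuple
    name = randstr(5)
    email = randstr(2) + '#' + randstr(3) + '.com'
    salary = random.randint(1000, 100000)  # like UNIFORM(1000, 100000, RANDOM())
    return name, email, salary

name, email, salary = scramble_row()
```

Note that the SELECT above only generates values; to actually scramble the table in place you would put those same expressions on the right-hand side of an UPDATE ... SET statement.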
I'm trying to retrieve the health value from Snowflake semi structured data in a variant column called extra from table X.
An example of the data can be seen below:
[
{
"party":
"[{\"class\":\"Farmer\",\"gender\":\"Female\",\"ethnicity\":\"NativeAmerican\",\"health\":2},
{\"class\":\"Adventurer\",\"gender\":\"Male\",\"ethnicity\":\"White\",\"health\":3},
{\"class\":\"Farmer\",\"gender\":\"Male\",\"ethnicity\":\"White\",\"health\":0},
{\"class\":\"Banker\",\"gender\":\"Female\",\"ethnicity\":\"White\",\"health\":0}
}
]
I have tried reading the Snowflake documentation from https://community.snowflake.com/s/article/querying-semi-structured-data
I have also tried the following queries to flatten the data:
SELECT result.value:health AS PartyHealth
FROM X
WHERE value = 'Trail'
AND name = 'Completed'
AND PartyHealth > 0,
TABLE(FLATTEN(X, 'party')) result
and:
SELECT [0]['party'][0]['health'] AS Health
FROM X
WHERE value = 'Trail'
AND name = 'Completed'
AND PH > 0;
I am trying to retrieve the health value from table X, from the column extra, which contains the variant party with 4 repeating values [0-3]. I'm not sure how to do this. Is someone able to tell me how to query semi-structured data in Snowflake, considering the documentation doesn't make much sense?
First, the JSON value you posted seems wrongly formatted (might be a copy-paste issue).
Here's an example that works:
First, your JSON, properly formatted:
[{ "party": [ {"class":"Farmer","gender":"Female","ethnicity":"NativeAmerican","health":2}, {"class":"Adventurer","gender":"Male","ethnicity":"White","health":3}, {"class":"Farmer","gender":"Male","ethnicity":"White","health":0}, {"class":"Banker","gender":"Female","ethnicity":"White","health":0} ] }]
create a table to test:
CREATE OR REPLACE TABLE myvariant (v variant);
insert the JSON value into this table:
INSERT INTO myvariant SELECT PARSE_JSON('[{ "party": [ {"class":"Farmer","gender":"Female","ethnicity":"NativeAmerican","health":2}, {"class":"Adventurer","gender":"Male","ethnicity":"White","health":3}, {"class":"Farmer","gender":"Male","ethnicity":"White","health":0}, {"class":"Banker","gender":"Female","ethnicity":"White","health":0} ] }]');
Now, to select a value, you start from the column name (in my case v). As your JSON is wrapped in an array, I first select element [0] and expand from there, like this:
SELECT v[0]:party[0].health FROM myvariant;
The above gives me 2.
For the other rows you can simply do:
SELECT v[0]:party[1].health FROM myvariant;
SELECT v[0]:party[2].health FROM myvariant;
SELECT v[0]:party[3].health FROM myvariant;
Another option might be to make the data more like a table ... I find it easier to work with than the JSON :-)
Code below - just copy/paste and it runs in Snowflake.
Key doco is LATERAL FLATTEN.
SELECT d4.path, d4.value
from
lateral flatten(input=>PARSE_JSON('[{ "party": [ {"class":"Farmer","gender":"Female","ethnicity":"NativeAmerican","health":2}, {"class":"Adventurer","gender":"Male","ethnicity":"White","health":3}, {"class":"Farmer","gender":"Male","ethnicity":"White","health":0}, {"class":"Banker","gender":"Female","ethnicity":"White","health":0} ] }]') ) as d ,
lateral flatten(input=> d.value) as d2 ,
lateral flatten(input=> d2.value) as d3 ,
lateral flatten(input=> d3.value) as d4
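The chained FLATTEN calls walk every nesting level and emit a (path, value) pair per leaf. A small Python analogue of that idea (my own sketch, not Snowflake code) makes the behaviour easy to see:

```python
import json

def flatten(value, path=''):
    # Emit (path, value) pairs for every leaf, like chained LATERAL FLATTENs
    if isinstance(value, dict):
        for k, v in value.items():
            yield from flatten(v, path + '.' + k if path else k)
    elif isinstance(value, list):
        for i, v in enumerate(value):
            yield from flatten(v, '%s[%d]' % (path, i))
    else:
        yield path, value

doc = json.loads('[{"party": [{"class": "Farmer", "health": 2}]}]')
pairs = dict(flatten(doc))
# pairs['[0].party[0].health'] == 2
```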
I want to search for rows whose column email has only the X character.
For example, if the email is XXXX, it should fetch the row but if the email is XXX#XXX.COM it should not be fetched.
I have tried something like this, but it returns all emails that contain the character X:
select *
from STUDENTS
where EMAIL like '%[X+]%';
Any idea what is wrong with my query?
Thanks
Try the query below:
select *
from STUDENTS
where LEN(EMAIL) > 0 AND LEN(REPLACE(EMAIL,'X','')) = 0;
I would use PATINDEX:
SELECT * FROM STUDENTS WHERE PATINDEX('%[^X]%', Email)=0
Only X means no other characters than X.
To handle NULLs and empty strings you should consider additional conditions. See demo below:
WITH STUDENTS AS
(
SELECT * FROM (VALUES ('XXXX'),('XXX#XXX.COM'),(NULL),('')) T(Email)
)
SELECT *
FROM STUDENTS
WHERE PATINDEX('%[^X]%', Email)=0 AND LEN(Email)>0
This will find all rows where email contains only 1 or more X and no other characters.
SELECT *
FROM STUDENTS
WHERE Email not like '%[^X]%' and Email <> ''
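The `[^X]` pattern in these answers means "any character that is not X", so "no hit for that pattern" is equivalent to "only X". The same predicate expressed as a regular expression, sketched in Python for comparison:

```python
import re

def only_x(email):
    # True when the email consists of one or more 'X' characters and nothing else
    return email is not None and re.fullmatch('X+', email) is not None

# Same sample rows as the demo above
rows = ['XXXX', 'XXX#XXX.COM', None, '']
matches = [e for e in rows if only_x(e)]
# matches == ['XXXX']
```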
I have a table with a string column containing values like this: 'ID: 1, name: john doe, occupation: salesmen'. I want to convert this into a column of JSON objects like this: {"ID" : "1", "name" : "john doe", "occupation" : "salesmen"}
For now my solution is:
WITH
lv1 as(SELECT regexp_split_to_table('ID: 1, name: john doe, occupation: salesmen', ', ') record)
, lv2 as (SELECT regexp_split_to_array(record, ': ') arr from lv1)
SELECT
json_object(
array_agg(arr[1])
, array_agg(arr[2])
)
FROM lv2
The problem is that the string actually contains nearly 100 key-value pairs and the table has millions of rows, so using regexp_split_to_table will make this table explode. Is there any efficient way to do this in PostgreSQL?
You don't necessarily need regular expression functions here, e.g.:
db=# with c as (select unnest('{ID: 1, name: john doe, occupation: salesmen}'::text[]))
select string_to_array(unnest,': ') from c;
string_to_array
-----------------------
{ID,1}
{name,"john doe"}
{occupation,salesmen}
(3 rows)
Not sure which will be faster, though.
Regarding built-in JSON formatting - I think you HAVE to provide either a row or formatted JSON - no parsers are currently available...
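If preprocessing outside the database is an option, the conversion itself is cheap. A minimal Python sketch, assuming (as the original query does) that ', ' and ': ' never occur inside values:

```python
import json

def to_json(record):
    # Split "k: v, k: v" pairs and emit a JSON object string
    pairs = (item.split(': ', 1) for item in record.split(', '))
    return json.dumps({k: v for k, v in pairs})

result = to_json('ID: 1, name: john doe, occupation: salesmen')
# result == '{"ID": "1", "name": "john doe", "occupation": "salesmen"}'
```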
highscore= score
cursor.execute("insert into tble values (hscore) hishscore.getvalue"):
Question: the score will be saved into the variable highscore. That highscore needs to be saved to the database, in the field hscore. What is the correct code for inserting and getting the value?
You want to bind the parameter using the ? placeholder:
cursor.execute("INSERT INTO tble (hscore) VALUES (?)", highscore)
If you wanted to insert multiple values, here's a longer form:
cursor.execute(
"""
INSERT INTO table_name
(column_1, column_2, column_3)
VALUES (?, ?, ?)
""",
(value_1, value_2, value_3)
)
Your order of VALUES was out of place as well. Good luck!
cursor.execute("insert into tablename(column1,column2) values (?,?);",var1,var2)
I needed the semicolon for it to work for me.
Assuming the column name is 'hscore', and the variable with the value to be inserted is 'highscore':
cursor.execute("insert into tablename([hscore]) values(?)", highscore)
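The ? placeholder works the same way in Python's built-in sqlite3 module, so the binding pattern can be tried without a server (table and column names taken from the question; note sqlite3 wants the single parameter wrapped in a tuple, whereas pyodbc also accepts a bare value):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute("CREATE TABLE tble (hscore INTEGER)")

highscore = 42
# The value is bound to the ? placeholder instead of being pasted into the SQL string
cursor.execute("INSERT INTO tble (hscore) VALUES (?)", (highscore,))
conn.commit()

cursor.execute("SELECT hscore FROM tble")
row = cursor.fetchone()
# row == (42,)
```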
You can follow the code below; it writes column values from a CSV file - a good example for your use case:
import pyodbc
import io

# credential file with server,database,username,password
with io.open('cred.txt', 'r', encoding='utf-8') as f2:
    cred_data = f2.read()
cred_data = cred_data.split(',')
server = cred_data[0]
database = cred_data[1]
username = cred_data[2]
pwd = cred_data[3]

con_obj = pyodbc.connect("DRIVER={SQL Server};SERVER=" + server + ";DATABASE=" + database + ";UID=" + username + ";PWD=" + pwd)
data_obj = con_obj.cursor()

# data file with 5 columns; skip the header row
with io.open('data.csv', 'r', encoding='utf-8') as f1:
    data = f1.read()
data = data.split('\n')[1:]

i = 1001
for row in data:
    lines = row.split(',')
    emp = i
    fname = lines[0].split(' ')[0]
    sname = lines[0].split(' ')[1]
    com = lines[1]
    dep = lines[2]
    job = lines[3]
    email = lines[4]
    data_obj.execute("insert into dbo.EMP(EMPID,FNAME,SNAME,COMPANY,DEPARTMENT,JOB,EMAIL) values(?,?,?,?,?,?,?)", emp, fname, sname, com, dep, job, email)
    con_obj.commit()
    i = i + 1
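A tighter variant of that loop uses the csv module and executemany; here is a sketch with sqlite3 standing in for pyodbc (and an in-memory CSV standing in for data.csv) so it runs anywhere, with the same hypothetical EMP columns:

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("CREATE TABLE EMP (EMPID, FNAME, SNAME, COMPANY, DEPARTMENT, JOB, EMAIL)")

# Stand-in for data.csv
data = io.StringIO("name,company,department,job,email\n"
                   "John Doe,Acme,Sales,Rep,jd#acme.com\n")
reader = csv.reader(data)
next(reader)  # skip the header row

rows = []
for i, line in enumerate(reader, start=1001):
    first, last = line[0].split(' ', 1)
    rows.append((i, first, last, line[1], line[2], line[3], line[4]))

# One round trip for all rows instead of one execute per row
cur.executemany("INSERT INTO EMP VALUES (?,?,?,?,?,?,?)", rows)
conn.commit()
```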
I have a table with a column vouchn. The records in this column look like RV103 for a receipt voucher and PV99 for a payment voucher. I use this SQL for getting the max record:
SELECT MAX(REPLACE(vouchn, 'RV', '')) AS vcno
FROM dbo.dayb
WHERE (vouchn LIKE '%RV%')
It is OK until I reach RV999. After that, even when the record RV1000 is there, the above SQL retrieves RV999. What is the error in the above code?
If REPLACE(vouchn, 'RV', '') always returns a numeric result, you can do as below:
SELECT MAX(REPLACE(vouchn, 'RV', '') * 1) AS vcno
FROM dbo.dayb
WHERE (vouchn LIKE '%RV%')
Add CONVERT to numeric type:
SELECT MAX(CONVERT(BIGINT,REPLACE(vouchn, 'RV', ''))) AS vcno
FROM dbo.dayb
WHERE (vouchn LIKE '%RV%')
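The underlying problem is that MAX over the stripped text compares strings lexicographically, so '999' sorts above '1000'. A quick Python illustration of why the numeric conversion fixes it:

```python
# String comparison is lexicographic, which is why MAX over 'RV...' text fails
vouchers = ['RV103', 'RV999', 'RV1000']

string_max = max(v.replace('RV', '') for v in vouchers)       # compares text
numeric_max = max(int(v.replace('RV', '')) for v in vouchers)  # compares numbers
# string_max == '999', numeric_max == 1000
```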