I have a table t1 with one record:
name   Reg     address
david  12 a23  Carl Marx
Now I need to create 100 records using a row generator and insert them into the table, so that t1 then has 101 records: 100 rows of ad hoc data plus the original row.
I can write a query like:
insert into t1
select randstr(5, random()) name,
       uniform(1, 10, random(12)) reg,
       randstr(20, random()) address
from table(generator(rowcount => 100));
However, I have 40 tables like t1, each with 30+ columns, so I need to use the datatypes of the table columns to generate the ad hoc data.
Is there a better way to use a table's datatypes to generate ad hoc data?
(Edited with usage instructions below.) The Gods of SQL Purity may scorn me for this answer, but I propose creating a JavaScript UDF that takes the output of GET_DDL, parses the result, and returns a SQL statement, which you can then copy/paste and run.
Here is one such UDF, which worked against the table I tried it on via select prepare_seed_stmt('my_tbl', get_ddl('table', 'my_tbl'));
Pay close attention to the expressions object to make sure it does what you want, and note that many datatypes (VARIANT, etc.) are missing.
create or replace function prepare_seed_stmt(t string, ddl string)
RETURNS string
LANGUAGE JAVASCRIPT
AS
$$
// Matches each column line of GET_DDL output: "\t<name> <type>..."
const regex = /^\t(.*?) (.*?)(?: |,|\n)/gm;
// Random-data expression emitted for each supported type
const expressions = {
    "string": "randstr(5, random())",
    "number": "uniform(1, 10, random(12))",
    "object": "object_construct(randstr(5, random()), uniform(1, 10, random(12)))",
    "ts_ntz": "dateadd(second, uniform(-1e6, 1e6, random()), current_timestamp::timestamp_ntz)",
    "array": "array_construct(uniform(1, 10, random(12)))"
}
// Maps the Snowflake type name (with precision stripped) to an expression key
const mapping = {
    "FLOAT": "number",
    "NUMBER": "number",
    "VARCHAR": "string",
    "OBJECT": "object",
    "ARRAY": "array",
    "TIMESTAMP_NTZ": "ts_ntz",
}
let m;
let sql = `insert into ${T}\nselect \n`;
while ((m = regex.exec(DDL)) !== null) {
    // Avoid an infinite loop on zero-length matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    // m[1] = column name, m[2] = type; strip "(38,0)"-style precision
    sql += `${expressions[mapping[m[2].replace(/\(|\)|[0-9]/g, '')]]} ${m[1]},\n`
}
sql = sql.slice(0, -2)
sql += "\nfrom table(generator(rowcount => 10));"
return sql
$$
;
Usage
Say you have a table 'foo' and you want to generate a SQL statement of the form insert into foo select <for_each_column_an_expression_for_random_data> from table(generator(rowcount => 10)).
FOO is defined as follows:
select get_ddl('table', 'foo');
--returns:
create or replace TABLE FOO (
a NUMBER(38,0),
b VARCHAR(16777216),
c VARCHAR(16777216),
d VARCHAR(16777216),
e TIMESTAMP_NTZ(9),
f VARCHAR(16777216),
g FLOAT,
h OBJECT,
i OBJECT,
j OBJECT,
k OBJECT,
l VARCHAR(16777216),
m VARCHAR(16777216),
n OBJECT,
o OBJECT,
p VARCHAR(16777216),
q NUMBER(38,0),
r VARCHAR(16777216),
s NUMBER(38,0),
t TIMESTAMP_NTZ(9) DEFAULT CAST(CONVERT_TIMEZONE('UTC', CAST(CURRENT_TIMESTAMP() AS TIMESTAMP_TZ(9))) AS TIMESTAMP_NTZ(9))
);
Run select prepare_seed_stmt('foo', get_ddl('table', 'foo')); to return the following, which can then be executed.
insert into foo
select
uniform(1, 10, random(12)) a,
randstr(5, random()) b,
randstr(5, random()) c,
randstr(5, random()) d,
dateadd(second, uniform(-1e6, 1e6, random()), current_timestamp::timestamp_ntz) e,
randstr(5, random()) f,
uniform(1, 10, random(12)) g,
object_construct(randstr(5, random()), uniform(1, 10, random(12))) h,
object_construct(randstr(5, random()), uniform(1, 10, random(12))) i,
object_construct(randstr(5, random()), uniform(1, 10, random(12))) j,
object_construct(randstr(5, random()), uniform(1, 10, random(12))) k,
randstr(5, random()) l,
randstr(5, random()) m,
object_construct(randstr(5, random()), uniform(1, 10, random(12))) n,
object_construct(randstr(5, random()), uniform(1, 10, random(12))) o,
randstr(5, random()) p,
uniform(1, 10, random(12)) q,
randstr(5, random()) r,
uniform(1, 10, random(12)) s,
dateadd(second, uniform(-1e6, 1e6, random()), current_timestamp::timestamp_ntz) t
from table(generator(rowcount => 10));
Related
I want to update and insert into the Stock, InvM, and Invoice tables with OPENJSON(). I am new to OPENJSON() in SQL Server. I have an array of objects, and I want to insert each object as a new row in the tables.
I want to iterate through every object and insert or update it using a WHERE clause and OPENJSON():
Array of Objects:
DECLARE @files NVARCHAR(MAX) = N'[{
"Batch": "CP008",
"Bonus": -26,
"Code": 002,
"Cost": 50,
"Disc1": 0,
"Name": "Calpax_D Syp 120Ml",
"Price": undefined,
"Quantity": "1",
"SNO": 9,
"STP": 153,
"Stax": 0,
"TP": 50,
"Total": 50,
"invoiceno": 71,
"profit": 156,
"randomnumber": "3MO0FMDLUX0D9P1N7HGV",
"selected": false,
},
{
"Batch": "P009",
"Bonus": 0,
"Code": 823,
"Cost": 237.14999389648438,
"Disc1": 0,
"Name": "PENZOL TAB 40 MG",
"Price": undefined,
"Quantity": "2",
"SNO": 94,
"STP": 263.5,
"Stax": 0,
"TP": 263.5,
"Total": 527,
"invoiceno": 71,
"profit": 156,
"randomnumber": "3MO0FMDLUX0D9P1N7HGV",
"selected": false,
}
]'
How do I update the Stock table and reduce the quantity with a WHERE condition if the Name of the medicine in the object array matches the medicine in the Stock table? (I came up with this, but it is not working correctly):
UPDATE Stock
SET Qty = Qty - qty
from OPENJSON(@files)
with(qty INT '$.Quantity', Name12 VARCHAR(55) '$.Name')
where Stock.Name = Name12
The same goes for the InvM and Invoice tables: I want to insert new rows with a WHERE condition.
insert into InvM (RNDT, Dat, SMID, CID, Total, USR, RefNo, SRID, Txt,InvTime)
select RNDT, getdate(), Salesman.SMID, Customer.CID,#total, USR.Name, 0,
0,Salesman.Name,CURRENT_TIMESTAMP
from Salesman, USR, Customer, OPENJSON(@files)
with(
RNDT NVARCHAR(max) '$.randomnumber'
)
where USR.Name = 'moiz'
insert into Invoice (SNO, RNDT, Invno, Code, Name, Batch, STP, Qty, Bon, Disc, Stax, NET,
TP, Cost, Profit)
select SNO, RNDT, InvNo, Code, Name, Batch, STP, Qty, Bon, Disc, Stax, NET, TP,
Cost,profit
from OPENJSON(@files)
with (
Batch INT '$.Batch',
Bon INT '$.Bouns',
Code INT '$.Code',
Cost INT '$.Cost',
Disc INT '$.Disc1',
Name NVARCHAR(Max) '$.Name',
STP INT '$.STP',
Qty INT '$.Quantity',
SNO INT '$.SNO',
Stax INT '$.Stax',
RNDT NVARCHAR(max) '$.randomnumber',
InvNo INT '$.invoiceno',
TP INT '$.TP',
NET INT '$.Total',
profit INT '$.profit'
)
You need to parse the input JSON with OPENJSON() and update the table using an appropriate JOIN. The following example is a possible solution to your problem:
Sample data:
SELECT *
INTO Stock
FROM (VALUES
('PENZOL TAB 40 MG', 100),
('Calpax_D Syp 120Ml', 100)
) v (Name, Quantity)
JSON:
DECLARE @files NVARCHAR(MAX) = N'[
{
"Batch":"CP008",
"Bonus":-26,
"Code":2,
"Cost":50,
"Disc1":0,
"Name":"Calpax_D Syp 120Ml",
"Price":"undefined",
"Quantity":"1",
"SNO":9,
"STP":153,
"Stax":0,
"TP":50,
"Total":50,
"invoiceno":71,
"profit":156,
"randomnumber":"3MO0FMDLUX0D9P1N7HGV",
"selected":false
},
{
"Batch":"P009",
"Bonus":0,
"Code":823,
"Cost":237.14999389648438,
"Disc1":0,
"Name":"PENZOL TAB 40 MG",
"Price":"undefined",
"Quantity":"2",
"SNO":94,
"STP":263.5,
"Stax":0,
"TP":263.5,
"Total":527,
"invoiceno":71,
"profit":156,
"randomnumber":"3MO0FMDLUX0D9P1N7HGV",
"selected":false
}
]';
UPDATE statement:
UPDATE s
SET s.Quantity = s.Quantity - j.Quantity
FROM Stock s
JOIN OPENJSON(@files) WITH (
Name varchar(100) '$.Name',
Quantity int '$.Quantity'
) j ON s.Name = j.Name
Result:
Name                 Quantity
PENZOL TAB 40 MG     98
Calpax_D Syp 120Ml   99
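The inserts work the same way: project the columns you need in the WITH clause and feed them straight into an INSERT ... SELECT. A minimal sketch for the Invoice table, based on the column list in your question (note the original used '$.Bouns', which matches nothing since the JSON key is 'Bonus', and declared Batch as int although the data holds strings like "CP008"):
INSERT INTO Invoice (SNO, RNDT, InvNo, Code, Name, Batch, STP, Qty, Bon, Disc, Stax, NET, TP, Cost, Profit)
SELECT SNO, RNDT, InvNo, Code, Name, Batch, STP, Qty, Bon, Disc, Stax, NET, TP, Cost, Profit
FROM OPENJSON(@files) WITH (
    SNO    int           '$.SNO',
    RNDT   nvarchar(max) '$.randomnumber',
    InvNo  int           '$.invoiceno',
    Code   int           '$.Code',
    Name   nvarchar(max) '$.Name',
    Batch  nvarchar(50)  '$.Batch',   -- string in the data ("CP008"), not int
    STP    int           '$.STP',
    Qty    int           '$.Quantity',
    Bon    int           '$.Bonus',   -- the question's '$.Bouns' path returns NULL
    Disc   int           '$.Disc1',
    Stax   int           '$.Stax',
    NET    int           '$.Total',
    TP     int           '$.TP',
    Cost   int           '$.Cost',
    Profit int           '$.profit'
);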
I would like to join multiple dimension tables to a fact table. However, instead of returning all the results, I want to avoid a one-to-many result set: I would like to take just some of the data from one table, concatenate all of its matching values into a single column, and keep the expected result of one row per fact.
If I pair the answer How to concatenate text from multiple rows into a single text string in SQL Server with additional columns of interest, I get an error saying: "... is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause".
-- sample values
-- bridge
dim2Key, groupKey
1, 1
2, 1
3, 1
4, 2
-- dim2
dim2Key, attributeOne
1, 'A'
2, 'B'
3, 'C'
4, 'A'
-- dim1
dim1Key, attributeTwo, attributeThree,
1, 35, 'val1'
2, 25, 'val2'
3, 45, 'val3'
4, 55, 'val1'
-- fact1
dim1Key, factvalue1, groupKey,
1, 5, 1
2, 25, 1
3, 55, -1
4, 99, 2
-- expected values
-- fact1
dim1Key, factvalue1, groupKey, attributeTwo, attributeThree, attributeOne
1, 5, 1, 35, 'val1', 'A, B, C'
...
4, 99, 2, 55, 'val1', 'A'
It's not entirely clear what your schema and joins should be, but it seems you want to aggregate dim2 per row of fact1.
You can aggregate dim2 inside a correlated subquery. It's often nicer to put this in an APPLY, but you could also put it directly in the SELECT (see the sketch after the query below):
SELECT
fact1.dim1Key,
fact1.factvalue1,
fact1.groupKey,
dim1.attributeTwo,
dim1.attributeThree,
dim2.attributeOne
FROM fact1
JOIN dim1 ON dim1.dim1key = fact1.dim1key
CROSS APPLY (
SELECT attributeOne = STRING_AGG(dim2.attributeOne, ', ')
FROM bridge b
JOIN dim2 ON dim2.dim2key = b.dim2key
WHERE b.groupKey = fact1.groupKey
) dim2
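For reference, here is the inline variant mentioned above: the same aggregation, just moved from the APPLY into the select list (a sketch against the same assumed table names):
SELECT
    fact1.dim1Key,
    fact1.factvalue1,
    fact1.groupKey,
    dim1.attributeTwo,
    dim1.attributeThree,
    (
        -- correlated subquery doing the same STRING_AGG per fact row
        SELECT STRING_AGG(dim2.attributeOne, ', ')
        FROM bridge b
        JOIN dim2 ON dim2.dim2key = b.dim2key
        WHERE b.groupKey = fact1.groupKey
    ) AS attributeOne
FROM fact1
JOIN dim1 ON dim1.dim1key = fact1.dim1key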
If you don't have access to the STRING_AGG function in your version of SQL Server (it requires 2017 or later), you can use FOR XML PATH to achieve the same thing.
CREATE TABLE #bridge (dim2Key int, groupKey int)
INSERT #bridge (dim2Key, groupKey)
VALUES (1, 1)
,(2, 1)
,(3, 1)
,(4, 2)
CREATE TABLE #dim2 (dim2Key int, attributeOne varchar(5))
INSERT #dim2 (dim2Key, attributeOne)
VALUES (1, 'A')
,(2, 'B')
,(3, 'C')
,(4, 'A')
CREATE TABLE #dim1 (dim1Key int, attributeTwo int, attributeThree varchar(5))
INSERT #dim1 (dim1Key, attributeTwo, attributeThree)
VALUES (1, 35, 'val1')
,(2, 25, 'val2')
,(3, 45, 'val3')
,(4, 55, 'val1')
CREATE TABLE #fact1 (dim1Key int, factvalue1 int, groupKey int)
INSERT #fact1 (dim1Key, factvalue1, groupKey)
VALUES (1, 5, 1)
,(2, 25, 1)
,(3, 55, -1)
,(4, 99, 2)
GO
;WITH pvt (groupKey, attributeOne)
AS
(
SELECT b.groupKey, d2.attributeOne
FROM #dim2 d2
JOIN #bridge b
ON d2.dim2Key = b.dim2Key
)
, dim2 AS
(
SELECT DISTINCT a.groupKey
,LEFT(r.attributeOne.value('text()[1]','nvarchar(max)') , LEN(r.attributeOne.value('text()[1]','nvarchar(max)'))-1) attributeOne
FROM pvt a
CROSS APPLY
(
SELECT attributeOne + ', '
FROM pvt r
WHERE a.groupKey = r.groupKey
FOR XML PATH(''), TYPE
) r (attributeOne)
)
SELECT f1.dim1Key, factvalue1, f1.groupKey, attributeTwo, attributeThree, attributeOne
FROM #fact1 f1
LEFT JOIN #dim1 d1
ON f1.dim1Key = d1.dim1Key
LEFT JOIN dim2 d2
ON f1.groupKey = d2.groupKey
I'm creating a new table (my_new_table) from another table (my_existing_table) that has 4 columns; PRODUCT and MONTHLY_BUDGETS hold nested values that I'm trying to extract.
The PRODUCT column holds a single object like this:
{"name": "Display", "full_name": "Ad Bundle"}
MONTHLY_BUDGETS is a list of several objects; the column looks like this:
[{"id": 123, "quantity_booked": "23", "budget_booked": "0.0", "budget_booked_loc": "0.0"} ,
{"id": 234, "quantity_booked": "34", "budget_booked": "0.0", "budget_booked_loc": "0.0"},
{"id": 455, "quantity_booked": "44", "budget_booked": "0.0", "budget_booked_loc": "0.0"}]
Below is what I'm doing to create the new table and unnest the data from the other table:
CREATE OR REPLACE TABLE my_new_table as (
with og_table as (
select
id,
parse_json(product) as PRODUCT,
IO_NAME,
parse_json(MONTHLY_BUDGETS) as MONTHLY_BUDGETS
from my_existing_table
)
select
id,
PRODUCT:name::string as product_name,
PRODUCT:full_name::string as product_full_name,
IO_NAME,
MONTHLY_BUDGETS:id::integer as monthly_budgets_id,
MONTHLY_BUDGETS:quantity_booked::float as monthly_budgets_quantity_booked,
MONTHLY_BUDGETS:budget_booked_loc::float as monthly_budgets_budget_booked_loc
from og_table,
lateral flatten( input => PRODUCT) as PRODUCT,
lateral flatten( input => MONTHLY_BUDGETS) as MONTHLY_BUDGETS);
However, once my new table is created and I run this:
select distinct id, count(*)
from my_new_table
where id = '123'
group by 1;
I see 18 under the count(*) column when I should only have 1. It looks like there are a lot of duplicates, but why? And how do I prevent this?
LATERAL FLATTEN produces a CROSS JOIN between the input row and the flatten results.
So if we have this data:
Id, Array
1, [10,20,30]
2, [40,50,60]
and you do a flatten on Array, via something like:
SELECT d.id,
       d.array,
       f.value AS val
FROM data d,
     LATERAL FLATTEN(input => d.array) f
you get:
Id, Array, val
1, [10,20,30], 10
1, [10,20,30], 20
1, [10,20,30], 30
2, [40,50,60], 40
2, [40,50,60], 50
2, [40,50,60], 60
So for your case: given you are doing two flattens, each ID will appear on many duplicated rows.
Just like above, if I ran SELECT id, count(*) FROM output GROUP BY 1 on my output, I would get the values (1, 3) and (2, 3).
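Applying that to your query: PRODUCT holds a single object, so it needs no FLATTEN at all, and the flattened MONTHLY_BUDGETS elements should be read through the flatten's value column. A sketch under those assumptions (table and column names taken from your question):
CREATE OR REPLACE TABLE my_new_table AS (
  WITH og_table AS (
    SELECT id,
           parse_json(product) AS product,
           io_name,
           parse_json(monthly_budgets) AS monthly_budgets
    FROM my_existing_table
  )
  SELECT id,
         product:name::string      AS product_name,
         product:full_name::string AS product_full_name,
         io_name,
         -- each array element arrives as mb.value
         mb.value:id::integer              AS monthly_budgets_id,
         mb.value:quantity_booked::float   AS monthly_budgets_quantity_booked,
         mb.value:budget_booked_loc::float AS monthly_budgets_budget_booked_loc
  FROM og_table,
       LATERAL FLATTEN(input => monthly_budgets) mb
);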
I want to JOIN a different table that has DATE values in it, and I only want the most recent Date to be added, along with the most recent Value that corresponds with that Date.
I have a table in which certain RENTALOBJECTS in the RENTALOBJECTTABLE have an N:1 relationship with the OBJECTTABLE.
RENTALOBJECTTABLE:
RENTALOBJECTID, OBJECTID
1, 1
2, 1
3, 2
4, 3
5, 4
6, 4
OBJECTTABLE:
OBJECTID
1
2
3
4
Every OBJECTID can have (and usually has) more than one VALUE.
VALUETABLE:
OBJECTID, VALUE, VALIDFROM, VALIDTO, CODE
1, 2000, 1-1-1950, 31-12-1980, A
1, 3000, 1-1-1981, 31-12-2010, A
1, 4000, 1-1-2013, NULL, A
2, 1000, 1-1-1970, NULL, A
3, 2000, 1-1-2010, NULL, A
4, 2000, 1-1-2000, 31-12-2009, A
4, 3100, 1-1-2010, NULL, B
4, 3000, 1-1-2010, NULL, A
Combined, I want the most recent VALUE to be shown for every RentalObject. Expected end result:
RENTALOBJECTTABLE_WITHVALUE:
RENTALOBJECTID, OBJECTID, VALUE, VALIDFROM, VALIDTO, CODE
1, 1, 4000, 1-1-2013, NULL, A
2, 1, 4000, 1-1-2013, NULL, A
3, 2, 1000, 1-1-1970, NULL, A
4, 3, 2000, 1-1-2010, NULL, A
5, 4, 3000, 1-1-2010, NULL, A
6, 4, 3000, 1-1-2010, NULL, A
So far I have managed to get the most recent date joined to the table with the code below. However, as soon as I include VALUETABLE.VALUE, the rowcount goes from 5000 (what the original dataset has) to 48000.
SELECT
RENTALOBJECTTABLE.RENTALOBJECTID
FROM RENTALOBJECTTABLE
LEFT JOIN OBJECTTABLE
ON OBJECTTABLE.OBJECTID = RENTALOBJECTTABLE.OBJECTID
LEFT JOIN (
SELECT
OBJECTID,
CODE,
VALUE, --without this one it gives the same rows as the original table
MAX(VALIDFROM) VALIDFROM
FROM VALUETABLE
LEFT JOIN PMETYPE
ON VALUETABLE.CODE = PMETYPE.RECID
AND PMETYPE.REGISTERTYPENO = 6
WHERE PMETYPE.[NAME] = 'WOZ'
GROUP BY OBJECTID, CODE, VALUE
) VALUETABLE ON OBJECTTABLE.OBJECTID = VALUETABLE.OBJECTID
When I include MAX(VALUE) next to MAX(VALIDFROM), it returns the original 5000 rows again, but now it selects the most recent date plus the highest value, which is not always the correct combination.
Does anyone have a clue how to solve this issue?
I think I'm missing something very obvious.
This gets you close:
WITH cte AS
(
SELECT
o.OBJECTID,
v.VALUE,
v.VALIDFROM,
v.VALIDTO,
v.CODE,
ROW_NUMBER() OVER (PARTITION BY o.OBJECTID ORDER BY v.VALIDFROM DESC ) rn
FROM dbo.OBJECTTABLE o
INNER JOIN dbo.VALUETABLE v ON v.OBJECTID = o.OBJECTID
)
SELECT ro.RENTALOBJECTID,
ro.OBJECTID,
cte.OBJECTID,
cte.VALUE,
cte.VALIDFROM,
cte.VALIDTO,
cte.CODE
FROM dbo.RENTALOBJECTTABLE ro
INNER JOIN cte ON cte.OBJECTID = ro.OBJECTID
AND rn=1;
However, this might pull out the 3100 value for object 4 - there is nothing to separate the two values that share the same VALIDFROM. If you have (or can add) an identity column on the value table, you can use it in the ORDER BY of the partitioning to select the row you want, as sketched below.
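For example, with a hypothetical ValueId identity column on VALUETABLE, the tie-break becomes deterministic:
WITH cte AS
(
    SELECT
        o.OBJECTID,
        v.VALUE,
        v.VALIDFROM,
        v.VALIDTO,
        v.CODE,
        -- ValueId is a hypothetical identity column, used only to
        -- break ties between rows sharing the same VALIDFROM
        ROW_NUMBER() OVER (PARTITION BY o.OBJECTID
                           ORDER BY v.VALIDFROM DESC, v.ValueId DESC) rn
    FROM dbo.OBJECTTABLE o
    INNER JOIN dbo.VALUETABLE v ON v.OBJECTID = o.OBJECTID
)
SELECT ro.RENTALOBJECTID, ro.OBJECTID, cte.VALUE, cte.VALIDFROM, cte.VALIDTO, cte.CODE
FROM dbo.RENTALOBJECTTABLE ro
INNER JOIN cte ON cte.OBJECTID = ro.OBJECTID AND rn = 1;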
Your Sample Data
select * into #RENTALOBJECTTABLE from (
SELECT 1 AS RENTALOBJECTID, 1 OBJECTID
UNION ALL SELECT 2,1
UNION ALL SELECT 3,2
UNION ALL SELECT 4,3
UNION ALL SELECT 5,4
UNION ALL SELECT 6,4) A
SELECT * INTO #OBJECTTABLE FROM(
SELECT
1 OBJECTID
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4)AS B
SELECT * INTO #VALUETABLE FROM (
SELECT 1 OBJECTID, 2000 VALUE, '1-1-1950' VALIDFROM, '31-12-1980' VALIDTO, 'A' CODE
UNION ALL SELECT 1,3000,'1-1-1981','31-12-2010', 'A'
UNION ALL SELECT 1,4000,'1-1-2013',NULL, 'A'
UNION ALL SELECT 2,1000,'1-1-1970',NULL, 'A'
UNION ALL SELECT 3,2000,'1-1-2010',NULL, 'A'
UNION ALL SELECT 4,2000,'1-1-2000','31-12-2009', 'A'
UNION ALL SELECT 4,3100,'1-1-2010',NULL, 'B'
UNION ALL SELECT 4,3000,'1-1-2010',NULL, 'A'
) AS C
Query:
;WITH CTE AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY OBJECTID ORDER BY VALIDFROM) RN -- ordered so MAX(RN) lands on the most recent VALIDFROM
FROM #VALUETABLE
)
SELECT RO.RENTALOBJECTID, RO.OBJECTID, C.VALUE, C.VALIDFROM, C.VALIDTO, C.CODE
FROM CTE C
CROSS APPLY (SELECT OBJECTID, MAX(RN) RN FROM CTE C1 WHERE C.OBJECTID = C1.OBJECTID GROUP BY OBJECTID) AS B
INNER JOIN #RENTALOBJECTTABLE RO ON RO.OBJECTID = C.OBJECTID
WHERE C.OBJECTID = B.OBJECTID AND C.RN = B.RN
Output data:
RENTALOBJECTID, OBJECTID, VALUE, VALIDFROM, VALIDTO, CODE
1, 1, 4000, 1-1-2013, NULL, A
2, 1, 4000, 1-1-2013, NULL, A
3, 2, 1000, 1-1-1970, NULL, A
4, 3, 2000, 1-1-2010, NULL, A
5, 4, 3000, 1-1-2010, NULL, A
6, 4, 3000, 1-1-2010, NULL, A
I would like to find a way to split a column into rows, but most of the solutions I can find use a function.
I am just curious: can we do this without a function?
Here is my sample data:
id
a, b, c, d, e
This is one column named id with the value 'a, b, c, d, e'.
I want to make them like this:
row 1 a
row 2 b
row 3 c
row 4 d
row 5 e
Use UNPIVOT
Example:
DECLARE #PIVOTED TABLE
( KEYVAL VARCHAR(50)
, A INT
, B INT
, C INT
, D INT
, E INT)
INSERT INTO #PIVOTED
( KEYVAL, A, B, C, D, E )
VALUES
('Test 1', 1, 7, 6, 4, 3),('Test 2', 9, 3, 5, 4, 6)
SELECT
KEYVAL
, ABCKeys
, ABCVals
FROM
( SELECT *
FROM #PIVOTED
) p UNPIVOT
( ABCVals FOR ABCKeys IN ( A, B, C, D, E )
) AS unpvt
ORDER BY KEYVAL;
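Side note: if the data really is one delimited string (as in the question) rather than separate columns, SQL Server 2016+ ships STRING_SPLIT, which turns it into rows without writing your own function:
DECLARE @id VARCHAR(50) = 'a, b, c, d, e';

SELECT LTRIM(value) AS id   -- LTRIM strips the space after each comma
FROM STRING_SPLIT(@id, ',');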