I'm trying to make a script that reads from a NoSQL database and inserts into a SQL Server database.
That said, I'm reading collections dynamically, so I need a way to do something like:
var columns = [ 1, 2, 3, 4 ...]
var values = [a, b, c ,4 ...]
request.query("INSERT INTO TABLE (" + columns + ") VALUES (" + values + ");");
I have some collections with up to 27 columns, and I can't hog the database by inserting each value individually, as I have about 20,000,000 records to process. I can't find anything that can do that inside a transaction, so I would appreciate any suggestions.
var columns = [ 1, 2, 3, 4 ...]
var values = [a, b, c ,4 ...]
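request.query(`INSERT INTO TABLE (${columns}) VALUES ?`, [[values]])
columns is an array, so it has to be converted into a string without the '[' and ']' brackets. Inside a template literal this happens automatically, because ${columns} yields the elements joined by commas.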
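The idea of building the column list dynamically while keeping the values as bound parameters can be sketched in Python. This is only an illustration under stated assumptions: the standard-library sqlite3 module stands in for SQL Server, and insert_rows, the target table, and the batch size are hypothetical names chosen for the example.

```python
import sqlite3

def insert_rows(conn, table, columns, rows, batch_size=1000):
    """Insert rows in batches inside a single transaction (hypothetical helper)."""
    # Identifiers cannot be bound as parameters, so they are quoted explicitly.
    col_list = ", ".join('"' + c.replace('"', '""') + '"' for c in columns)
    # One placeholder per column; the values themselves are always bound.
    placeholders = ", ".join("?" for _ in columns)
    sql = f'INSERT INTO "{table}" ({col_list}) VALUES ({placeholders})'
    with conn:  # commits on success, rolls back if an exception is raised
        for start in range(0, len(rows), batch_size):
            conn.executemany(sql, rows[start:start + batch_size])

# sqlite3 is used here only so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE target ("c1", "c2", "c3")')
columns = ["c1", "c2", "c3"]
rows = [("a", "b", 4), ("d", "e", 5), ("f", "g", 6)]
insert_rows(conn, "target", columns, rows, batch_size=2)
count = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
print(count)
```

Sending rows in batches rather than one statement per value keeps the number of round trips low, which is the main concern with millions of records.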
I am new to Spark SQL and I have a column of type array with data like this:
[{"X":"A11"},{"X":"A12"},{"X":"A13"}]
The output I am looking for is a string field as
A11, A12, A13
I cannot explode the array as I need the data in one row.
Since the maximum length of the array in my case is 6, I got it to work using the case statement below.
case
when size(arr)=1 then array_join(map_values(map_concat(arr[0])),',')
when size(arr)=2 then array_join(map_values(map_concat(arr[0],arr[1])),',')
when size(arr)=3 then array_join(map_values(map_concat(arr[0],arr[1],arr[2])),',')
when size(arr)=4 then array_join(map_values(map_concat(arr[0],arr[1],arr[2],arr[3])),',')
when size(arr)=5 then array_join(map_values(map_concat(arr[0],arr[1],arr[2],arr[3],arr[4])),',')
when size(arr)=6 then array_join(map_values(map_concat(arr[0],arr[1],arr[2],arr[3],arr[4],arr[5])),',')
else
null
end
Is there a better way to do this?
Assuming that the source and result columns are col and values respectively, it can be implemented as follows:
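from pyspark.sql import SparkSession, functions as F
spark = SparkSession.builder.getOrCreate()
data = [
    ([{"X": "A11"}, {"X": "A12"}, {"X": "A13"}],)
]
df = spark.createDataFrame(data, ['col'])
# Take each map's values, flatten the nested arrays, and join them into one string.
df = df.withColumn('values', F.array_join(F.flatten(F.transform('col', lambda x: F.map_values(x))), ','))
df.show(truncate=False)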
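For readers who want to see what the transform, map_values, flatten, and array_join chain computes, here is a plain-Python sketch of the same logic on a single row. It does not use Spark, and join_map_values is a hypothetical name used only for illustration.

```python
def join_map_values(arr, sep=","):
    """Plain-Python model of the transform/map_values/flatten/array_join chain."""
    # Collect the values of every map in order, then join them into one string.
    values = [v for m in arr for v in m.values()]
    return sep.join(values)

result = join_map_values([{"X": "A11"}, {"X": "A12"}, {"X": "A13"}])
print(result)  # A11,A12,A13
```

Unlike the case statement in the question, this does not depend on the array length.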
I need a database to store pairs of key - array rows like below:
===== TABLE: shoppingCart =====
user_id - product_ids
1 - [1, 2, 3, 4]
2 - [100, 200, 300, 400]
and I want to be able to update a row by merging a new array into the old one while skipping duplicate values, i.e., I want operations like:
UPDATE shoppingCart SET product_ids = UNION(product_ids, [4, 5, 6]) WHERE user_id = 1
to make the first row's product_ids column become:
[1, 2, 3, 4, 5, 6]
I also need operations like selecting a sub-array, e.g.:
SELECT product_ids[0:2] from shoppingCart
which should return:
[1,2]
Any suggestions for the best database for such purposes?
The arrays I need to work with are usually long (about 1,000 to 10,000 long integers, or string versions of long integers).
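To pin down the semantics being asked for, independent of any particular database, the two operations can be sketched in Python. The helper name union_keep_order and the in-memory cart dictionary are hypothetical and only model the table shown above.

```python
def union_keep_order(existing, new):
    """Append items from new that are not already present, preserving order."""
    seen = set(existing)
    merged = list(existing)
    for item in new:
        if item not in seen:
            seen.add(item)
            merged.append(item)
    return merged

# In-memory stand-in for the shoppingCart table: user_id -> product_ids.
cart = {1: [1, 2, 3, 4], 2: [100, 200, 300, 400]}

# Models: UPDATE shoppingCart SET product_ids = UNION(product_ids, [4, 5, 6]) WHERE user_id = 1
cart[1] = union_keep_order(cart[1], [4, 5, 6])

# Models: SELECT product_ids[0:2], with a zero-based start and an exclusive end as in the question.
first_two = cart[1][0:2]
print(cart[1], first_two)
```

Using a set for membership checks keeps the merge linear in the array length, which matters for arrays of 1,000 to 10,000 values.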
How do I convert this PostgreSQL expression to Snowflake?
UNNEST(array[
'x' || to_char(date_trunc('MONTH', max(date)), 'Mon YYYY'),
'y' || to_char(date_trunc('MONTH', max(date)), 'Mon YYYY')
])
You can use "flatten" to break out values from the array, and then "table" to convert the values into a table:
-- Use an array for testing:
select array_construct(1, 2, 3, 4, 5);
-- Flattens into a table with metadata for each row:
select * from table(flatten(input => array_construct(1, 2, 3, 4, 5)));
-- Pulls out just the values from the array:
select value::integer from table(flatten(input => array_construct(1, 2, 3, 4, 5)));
The "::integer" part casts the values to the data type you want from the array. It's optional but recommended.
You can approximate the syntax of unnest by creating a user defined table function:
create or replace function UNNEST(V array)
returns table ("VALUE" variant)
language SQL
as
$$
select VALUE from table(flatten(input => V))
$$;
You would call it like this:
select * from table(unnest(array_construct(1, 2, 3, 4, 5)));
This returns a table with a single column named VALUE of type variant. You can make a version that returns strings, integers, etc.
I have two arrays [1,2,3,4,7,6] and [2,3,7] in PostgreSQL which may have common elements. What I am trying to do is to exclude from the first array all the elements that are present in the second.
So far I have achieved the following:
SELECT array
(SELECT unnest(array[1, 2, 3, 4, 7, 6])
EXCEPT SELECT unnest(array[2, 3, 7]));
However, the ordering is not correct as the result is {4,6,1} instead of the desired {1,4,6}.
How can I fix this?
I finally created a custom function with the following definition (taken from here) which resolved my issue:
create or replace function array_diff(array1 anyarray, array2 anyarray)
returns anyarray language sql immutable as $$
select coalesce(array_agg(elem), '{}')
from unnest(array1) elem
where elem <> all(array2)
$$;
I would use the WITH ORDINALITY option of UNNEST and put an ORDER BY in the array_agg function when converting the result back to an array. NOT EXISTS is preferable to EXCEPT here because it keeps the query simple.
SELECT array_agg(e order by id)
FROM unnest( array[1, 2, 3, 4, 7, 6] ) with ordinality as s1(e,id)
WHERE not exists
(
SELECT 1 FROM unnest(array[2, 3, 7]) as s2(e)
where s2.e = s1.e
)
Simpler, with NULL support, and probably faster:
select array(
select v
from unnest(array[2,2,null,1,3,3,4,5,null]) with ordinality as t(v, pos)
where array_position(array[3,3,5,5], v) is null
order by pos
);
Result: {2,2,null,1,4,null}
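The behaviour of the order-preserving difference above can be modelled in a few lines of Python, with None standing in for NULL. This is only a sketch of the semantics, not of how PostgreSQL evaluates the query, and ordered_diff is a hypothetical name.

```python
def ordered_diff(first, second):
    """Keep the elements of first, in their original order, that do not occur in second."""
    excluded = set(second)
    return [v for v in first if v not in excluded]

simple = ordered_diff([1, 2, 3, 4, 7, 6], [2, 3, 7])
with_nulls = ordered_diff([2, 2, None, 1, 3, 3, 4, 5, None], [3, 3, 5, 5])
print(simple)      # [1, 4, 6]
print(with_nulls)  # [2, 2, None, 1, 4, None]
```

Duplicates and None values in the first list are kept unless they also appear in the second, which matches the result shown above.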
Function array_diff() with tests.
Postgres unfortunately lacks this functionality. In my case, what I really needed was to detect cases where the array difference was not empty. In that specific case you can use the @> operator, which means "Does the first array contain the second?"
ARRAY[1,4,3] @> ARRAY[3,1,3] → t
See doc
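The containment check can be modelled with Python sets. This is a sketch that assumes the arrays contain no NULLs, and the function names contains and has_difference are hypothetical.

```python
def contains(a, b):
    """True if every element of b occurs in a; duplicates are ignored."""
    return set(b) <= set(a)

def has_difference(a, b):
    """True if a has at least one element that is not in b."""
    return not contains(b, a)

print(contains([1, 4, 3], [3, 1, 3]))                   # True
print(has_difference([1, 2, 3, 4, 7, 6], [2, 3, 7]))    # True
print(has_difference([2, 3], [2, 3, 7]))                # False
```

The difference of a and b is non-empty exactly when b does not contain a, so the containment test answers the question without building the difference.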
I have a dataset where we track engagement per-percent (so 8 people are active at 38%, 7 people are active at 39%, etc.). This gives an array with 100 values, filled with integers.
I need to store this in a postgres table. The only/major requirement is that I need to be able to sum the values for each index to form a new array. Example:
Row 1: [5, 3, 5, ... 7]
Row 2: [2, 5, 3, ... 1]
Sum: [7, 8, 8, ... 8]
The naive way to save these would be 100 individual (BIG)INT columns, which would allow you to sum the values per-column over multiple rows. However, this makes the table very wide (and does not seem like the most efficient way to do it). I have looked into (BIG)INT[100] columns, but I cannot seem to find a good, native way to sum the values. Same thing with json(b) columns (with a native JSON array).
Have I overlooked something? Is there a good, efficient way to do this without completely bloating a table?
The solution using unnest() with ordinality:
with the_table(intarr) as (
values
(array[1, 2, 3, 4]),
(array[1, 2, 3, 4]),
(array[1, 2, 3, 4])
)
select array_agg(sum order by ordinality)
from (
select ordinality, sum(unnest)
from the_table,
lateral unnest(intarr) with ordinality
group by 1
) s;
array_agg
------------
{3,6,9,12}
(1 row)
Here is one method that seems to work:
select array_agg(sum_aval order by ind)
from (select ind, sum(aval) sum_aval
from (select id, unnest(a) as aval, generate_series(1, 3) as ind
from (values (1, array[1, 2, 3]), (2, array[3, 4, 5])) v(id, a)
) x
group by ind
) x;
That is, unnest the arrays and generate indexes for them using generate_series(). Then you can aggregate at the index level and then re-combine into an array (using two separate aggregations).
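The unnest, group by index, and re-aggregate pattern is easy to check against a small Python model. This sketch assumes every row has the same length, and sum_by_index is a hypothetical name.

```python
def sum_by_index(rows):
    """Element-wise sum across rows: group values by position, then sum each group."""
    # zip(*rows) plays the role of unnest plus group-by-index.
    return [sum(col) for col in zip(*rows)]

two_rows = sum_by_index([[1, 2, 3], [3, 4, 5]])
three_rows = sum_by_index([[1, 2, 3, 4]] * 3)
print(two_rows)    # [4, 6, 8]
print(three_rows)  # [3, 6, 9, 12]
```

Note that zip stops at the shortest row, so rows of unequal length would be silently truncated in this model.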