Snowflake Overwrites Column Names using COPY INTO

I have a single query that fails to write the column headers to the output when executed through COPY INTO. I have HEADER=TRUE and I'm assigning explicit names using AS. The same thing happens when I don't assign names explicitly.
If the query is run without COPY INTO, the results come back with the correct column names.
Query
COPY INTO 's3://<my_bucket>'
FROM (
SELECT
col1 AS "alpha",
col2 AS "beta",
col3 AS "charlie",
col4 AS "delta"
FROM
"table_1"
INNER JOIN
"table_2"
ON
col1 = col3
AND
col2 = 'foo'
AND
col4 > 1234
)
FILE_FORMAT=(TYPE='PARQUET' FIELD_DELIMITER=',' record_delimiter = '\n' field_optionally_enclosed_by='"')
HEADER=TRUE
SINGLE=FALSE
OVERWRITE=TRUE
MAX_FILE_SIZE = 5368709120
credentials=(AWS_KEY_ID='<my_id>'
AWS_SECRET_KEY='<my_key>');
Results
ROW C0 C1 C2 C3
1 foo bar baz beetle
Desired Results
ROW alpha beta charlie delta
1 foo bar baz beetle

Yes, COPY INTO with headers included should work. Here's an example for parquet:
https://docs.snowflake.net/manuals/user-guide/data-unload-considerations.html#unloading-a-relational-table-to-parquet-with-multiple-columns
Just for testing purposes, does the example work?
One thing to check is the file format options. Several of the options specified in your code above (FIELD_DELIMITER, RECORD_DELIMITER, FIELD_OPTIONALLY_ENCLOSED_BY) are CSV options and do not apply to TYPE='PARQUET'.
https://docs.snowflake.net/manuals/sql-reference/sql/create-file-format.html#type-parquet


Replacing elements in an array column in snowflake

I have sample data as follows:
team_id  mode
123      [1,2]
Here mode is an array. The goal is to replace the values in the mode column with literal values, where 1 stands for Ocean and 2 stands for Air.
Expected Output
team_id  mode
123      [Ocean,Air]
Present Approach
As an attempt, I first flattened the data into multiple rows:
team_id  mode
123      1
123      2
Then we can define a new column that assigns literal values to the mode column using a CASE statement, and aggregate the values back into an array to get the desired output.
Can I get some help doing the replacement directly in the array? Thanks in advance.
Using FLATTEN and ARRAY_AGG:
CREATE OR REPLACE TABLE tab(team_id INT, mode ARRAY) AS SELECT 123, [1,2];
SELECT TEAM_ID,
ARRAY_AGG(CASE f.value::INT
WHEN 1 THEN 'Ocean'
WHEN 2 THEN 'Air'
ELSE 'Unknown'
END) WITHIN GROUP(ORDER BY f.index) AS new_mode
FROM tab
,LATERAL FLATTEN(tab.mode) AS f
GROUP BY TEAM_ID;
Output:
TEAM_ID  NEW_MODE
123      [ "Ocean", "Air" ]
For an alternative solution with easy array manipulation, you could create a JS UDF:
create or replace function replace_vals_in_array(A variant)
returns variant
language javascript
as $$
dict = {1:'a', 2:'b', 3:'c', 4:'d'};
return A.map(x => dict[x]);
$$;
Then to update your table:
update arrs
set arr = replace_vals_in_array(arr);
Example setup:
create or replace temp table arrs as (
select 1 id, [1,2,3] arr
union all select 2, [2,4]
);
select *, replace_vals_in_array(arr)
from arrs;
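Outside the database, the same per-element replacement can be sketched in a few lines of Python. This is only an illustration of the mapping logic, not Snowflake code: the names MODE_LABELS and replace_modes are made up here, and the 'Unknown' fallback mirrors the ELSE branch of the CASE expression above.

```python
# Hypothetical lookup table; the codes and labels are the ones from the question.
MODE_LABELS = {1: "Ocean", 2: "Air"}


def replace_modes(modes, labels=MODE_LABELS, default="Unknown"):
    """Return a new list with each code replaced by its label, order preserved."""
    return [labels.get(code, default) for code in modes]


rows = [{"team_id": 123, "mode": [1, 2]}]
for row in rows:
    print(row["team_id"], replace_modes(row["mode"]))
```

Note that the JS UDF above has no fallback for codes missing from its dictionary, whereas the CASE version and this sketch both map them to 'Unknown'.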

Comparing values between records in a table using Informatica PowerCenter

Consider a table with the following records in a Database:
>>> Table A:
Col_1 Col_2 Col_3
GGG 123 -
GGG 123 X
GGG 123 Y
KKK 786 X
MMM 999 Y
DDD 456 X
DDD 456 U
Wherever we have records with matching values in col_1 and col_2, and we have values X and Y in col_3, the records with X and Y must be deleted. In other cases, we should keep the records.
For example in the above table, the output should look like this:
>>> Output_Table:
Col_1 Col_2 Col_3
GGG 123 -
KKK 786 X
MMM 999 Y
DDD 456 X
DDD 456 U
How can this scenario be implemented (using expression transformation, variable ports, lookup and so on...)? Any help would be greatly appreciated.
There can be multiple scenarios, and I am not sure your issue is exactly as you described, but I will answer the question as asked.
Assuming 'X' and 'Y' in Col_3 are the hardcoded values you want to remove:
1. First sort the data on Col_1, Col_2.
2. Then use an EXP transformation and create 7 ports as below. Here we compare each row with the previous row to see whether they have the same key. If they do, concatenate col3 into one single column.
col1
col2
in_col3
v_col3= iif(v_prev_col1=col1 and v_prev_col2=col2,v_col3||''||in_col3,in_col3)
v_prev_col1=col1
v_prev_col2=col2
o_col3=v_col3
3. After that use an aggregator with col1, col2 as the group-by ports, and set col3 to MAX(o_col3) from the expression before. The aggregator stamps the concatenated col3 into one single column.
4. Then add a filter like the one below to check whether you have XY or YX for duplicate rows.
iif(max_col3='XY' or reverse(max_col3)='XY',FALSE,TRUE) -- You can place any hardcoded values here.
EDIT:
5. Now, if you want to get the original data (as in the comments) excluding the XY combination, use a joiner. Join the output from step 4 with the output of step 1. It will be a normal join on Col_1, Col_2.
The output of the joiner will have no XY combination.
Whole mapping should look like this
|->2.EXP-->3.AGG-->4.FIL--|
-->1.SRT ->|------------------------>|->5.JNR--...--> TGT
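To make the target behavior concrete, here is a minimal Python sketch of the rule stated in the question: within each (Col_1, Col_2) group, drop the X and Y rows only when both are present. It is not a PowerCenter mapping; the function name drop_xy_pairs and the tuple layout are assumptions made for illustration.

```python
from collections import defaultdict


def drop_xy_pairs(rows, pair=("X", "Y")):
    """Drop rows whose col_3 is in `pair` when their (col_1, col_2) group holds every value in `pair`."""
    seen = defaultdict(set)
    for col_1, col_2, col_3 in rows:
        seen[(col_1, col_2)].add(col_3)
    wanted = set(pair)
    return [
        row for row in rows
        if not (wanted <= seen[(row[0], row[1])] and row[2] in wanted)
    ]


table_a = [
    ("GGG", 123, "-"), ("GGG", 123, "X"), ("GGG", 123, "Y"),
    ("KKK", 786, "X"), ("MMM", 999, "Y"),
    ("DDD", 456, "X"), ("DDD", 456, "U"),
]
for row in drop_xy_pairs(table_a):
    print(row)
```

Run against Table A from the question, this keeps exactly the rows listed in Output_Table, which gives a reference result to compare the mapping's output against.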

How to load a csv into a table in Q?

Very new to Q and I am having some issues loading my data into a table following the examples on the documentation.
I am running the following code:
table1: get `:pathname.csv
While it doesn't throw an error, when I run the following command nothing comes up:
select * from table1
Or when selecting a specific column:
select col1 from table1
If anyone could guide me in the right direction, that would be great!
Edit: This seems to work and retain all my columns:
table1: (9#"S";enlist csv) 0: `:data.CSV
You're going to need to use 0: https://code.kx.com/q/ref/filenumbers/#load-csv
The exact usage will depend on your csv, as you need to define the datatypes to load each column as.
As an example, here I have a CSV with a long, char & float column:
(kdb) chronos@localhost ~/Downloads $ more example.csv
abc,def,ghi
1,a,3.4
2,b,7.5
3,c,88
(kdb) chronos@localhost ~/Downloads $ q
KDB+ 3.6 2018.10.23 Copyright (C) 1993-2018 Kx Systems
l64/ 4()core 3894MB chronos localhost 127.0.0.1 EXPIRE 2019.06.15 jonathon.mcmurray#aquaq.co.uk KOD #5000078
q)("JCF";enlist",")0:`:example.csv
abc def ghi
-----------
1 a 3.4
2 b 7.5
3 c 88
q)meta ("JCF";enlist",")0:`:example.csv
c | t f a
---| -----
abc| j
def| c
ghi| f
q)
I use the chars "JCF" to define the datatypes long, character & float respectively.
I enlist the delimiter (",") to indicate that the first row of the CSV contains the headers for the columns. (Otherwise, these can be supplied in your code & the table constructed)
As a side note, in q-sql the * is not necessary as it is in standard SQL; you can simply write select from table1 to query all columns.
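For readers more at home outside q, the idea behind ("JCF";enlist",") can be sketched in Python: one type code per column, plus a header row that supplies the column names. This is only an analogy, not q semantics; the CONVERTERS table and the load_csv helper are assumptions made for illustration.

```python
import csv
import io

# Rough Python counterparts of the q type characters used above (illustrative only).
CONVERTERS = {"J": int, "C": str, "F": float}


def load_csv(text, types, delimiter=","):
    """Read CSV text whose first row is a header; cast column i with types[i]."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    casts = [CONVERTERS[t] for t in types]
    return [
        {name: cast(value) for name, cast, value in zip(header, casts, record)}
        for record in reader
    ]


example = "abc,def,ghi\n1,a,3.4\n2,b,7.5\n3,c,88\n"
table = load_csv(example, "JCF")
print(table[2])
```

As in the q example, the number of type characters has to match the number of columns, which is why the edit in the question uses 9#"S" for a nine-column file.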

Changing character in a string of characters

I was wondering how to edit the following column value that exists in an Oracle DB:
PPPPFPPPPPPPPPPPPPPPPPPPPPPPPFPPPPPPPP
I want to set only the 5th character (currently F) to P without affecting the rest of the string.
I have around 700 records and I want to change that position (5th) to P for all users.
I was thinking of PL/SQL instead of a query, so could you please advise.
Thanks
Use REGEXP_REPLACE:
> SELECT REGEXP_REPLACE('PPPPFPPPPPPPPPPPPPPPPPPPPPPPPFPPPPPPPP', '^(\w{4}).(.*)', '\1P\2') AS COL_REGX FROM dual
COL_REGX
--------------------------------------
PPPPPPPPPPPPPPPPPPPPPPPPPPPPPFPPPPPPPP
Klashxx's answer is a good one - REGEXP_REPLACE is the way to go. Here is the old-fashioned way, built up bit by bit so you can see what's going on:
WITH
test_data (text)
AS (SELECT '1234F1234F1234F1234F1234F1234F1234' FROM DUAL
)
SELECT
text
,INSTR(text,'F',1,5) --fifth occurrence
,SUBSTR(text,1,INSTR(text,'F',1,5)-1) --substr up to that point
,SUBSTR(text,1,INSTR(text,'F',1,5)-1)||'P' --add P
,SUBSTR(text,1,INSTR(text,'F',1,5)-1)||'P'||SUBSTR(text,INSTR(text,'F',1,5)+1) --add remainder of string
FROM
test_data
;
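So what you're trying to do would be something like
UPDATE <your table>
SET <your column> = SUBSTR(<your column>,1,INSTR(<your column>,'F',1,5)-1)||'P'||SUBSTR(<your column>,INSTR(<your column>,'F',1,5)+1)
WHERE INSTR(<your column>,'F',1,5) > 0
..assuming you want to update every row that has a fifth F. Without the WHERE clause, INSTR returns 0 for rows with fewer than five F's and the expression would prepend a P to them.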
The solution below looks for the first five characters at the beginning of the input string. If found, it keeps the first four unchanged and it replaces the fifth with the letter P. Note that if the input string is four characters or less, it is left unchanged. (This includes NULL as the input string, shown in the WITH clause which creates sample strings and also in the output - note that the output has FIVE rows, even though there is nothing visible in the last one.)
with
test_data ( str ) as (
select 'ABCDEFGH' from dual union all
select 'PPPPF' from dual union all
select 'PPPPP' from dual union all
select '1234' from dual union all
select null from dual
)
select str, regexp_replace(str, '^(.{4}).', '\1P') as replaced
from test_data
;
STR REPLACED
-------- --------
ABCDEFGH ABCDPFGH
PPPPF PPPPP
PPPPP PPPPP
1234 1234
5 rows selected.
Flip the 5th 'bit' to a 'P' where it's currently an 'F'.
update table
set column = regexp_replace(column , '^(.{4}).', '\1P')
where regexp_like(column , '^.{4}F');
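The answers above actually implement two different readings of the question: the REGEXP_REPLACE versions change the character at position 5, while the INSTR/SUBSTR version changes the fifth occurrence of F. A small Python sketch (not Oracle code; both function names are made up for illustration) shows the difference:

```python
import re


def set_fifth_char(s, new="P"):
    """Replace the character at position 5; strings shorter than 5 are unchanged."""
    return re.sub(r"^(.{4}).", lambda m: m.group(1) + new, s, count=1)


def set_fifth_occurrence(s, old="F", new="P"):
    """Replace the fifth occurrence of `old`; unchanged if there are fewer than five."""
    pos = -1
    for _ in range(5):
        pos = s.find(old, pos + 1)
        if pos == -1:
            return s
    return s[:pos] + new + s[pos + 1:]


print(set_fifth_char("PPPPFPPPPPPPPPPPPPPPPPPPPPPPPFPPPPPPPP"))
print(set_fifth_occurrence("1234F1234F1234F1234F1234F1234F1234"))
```

On the sample value from the question, only the positional version changes anything, because that string contains just two F characters.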

Find valid combinations based on matrix

I have the following matrix in Calc: the first row (1) contains employee numbers, the first column (A) contains product codes.
Everywhere there is an X, that product was sold by the corresponding employee above.
| 0302 | 0303 | 0304 | 0402 |
1625 | X | | X | X |
1643 | | X | X | |
...
We see that product 1643 was sold by employees 0303 and 0304
What I would like to see is a list of what product was sold by which employees but formatted like this:
1625 | 0302, 0304, 0402 |
1643 | 0303, 0304 |
The reason for this is that we need this matrix ultimately imported into an SQL SERVER table. We have no access to the origins of this matrix. It contains about 50 employees and 9000+ products.
Thanx for thinking with us!
Try something like this:
;with data as
(
SELECT *
FROM ( VALUES (1625,'X',NULL,'X','X'),
(1643,NULL,'X','X',NULL))
cs (col1, [0302], [0303], [0304], [0402])
),cte
AS (SELECT col1,
col
FROM data
CROSS apply (VALUES ('0302',[0302]),
('0303',[0303]),
('0304',[0304]),
('0402',[0402])) cs (col, val)
WHERE val IS NOT NULL)
SELECT col1,
LEFT(cs.col, Len(cs.col) - 1) AS col
FROM cte a
CROSS APPLY (SELECT col + ','
FROM cte B
WHERE a.col1 = b.col1
FOR XML PATH('')) cs (col)
GROUP BY col1,
LEFT(cs.col, Len(cs.col) - 1)
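The query above does two things: it unpivots the matrix (CROSS APPLY ... VALUES keeps one row per X) and then joins the employee numbers per product (FOR XML PATH). The same two steps can be sketched in Python to check the expected output on the sample rows; the function name unpivot and the in-memory layout are assumptions for illustration, not part of the SQL solution.

```python
def unpivot(header, rows, mark="X"):
    """Turn a product-by-employee matrix into {product: [employees who sold it]}."""
    result = {}
    for product, *cells in rows:
        result[product] = [emp for emp, cell in zip(header, cells) if cell == mark]
    return result


employees = ["0302", "0303", "0304", "0402"]
matrix = [
    ("1625", "X", None, "X", "X"),
    ("1643", None, "X", "X", None),
]
for product, sellers in unpivot(employees, matrix).items():
    print(product, "|", ", ".join(sellers))
```

Keeping the employee numbers as strings preserves the leading zeros, which is also why the SQL version quotes them as '0302' and so on.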
I think there are two problems to solve:
get the employee numbers for the X marks;
concatenate them into a single, comma-separated string.
I can't offer a solution for both issues in one step, but you may handle both issues separately.
1.
To replace the X marks with the respective employee numbers, you could use an array formula to create a second table (matrix). To do so, create a new sheet, copy the first column / first row, and enter the following formula in cell B2:
=IF($B2:$E3="X";$B$1:$E$1;"")
You'll have to adapt the formula so it covers your complete input data (if your last data cell is Z9999, it would be =IF($B2:$Z9999="X";$B$1:$Z$1;"")). My example just covers two rows and four columns.
After modifying it, confirm with CTRL+SHIFT+ENTER to apply it as an array formula.
2.
Now, you'll have to concatenate the employee numbers. LO Calc lacks a feature to concatenate an array, but you could use a simple user-defined function. For such a string-join function, see this answer. Just create a new macro with the StarBasic code provided there and save it. Now, you have a STRJOIN() function at hand that accepts an array and concatenates its values, leaving empty values out.
You could add that function using a helper column on the second sheet and apply it by dragging it down. Finally, to get rid of the cells with the single employee numbers, copy the complete second sheet and paste special into a third sheet, pasting only the values. Now, you can remove all columns except the first one (product codes) and the last one (with the concatenated employee numbers).
I created a table in sql for holding the data:
CREATE TABLE [dbo].[mydata](
[prod_code] [nvarchar](8) NULL,
[0100] [nvarchar](10) NULL,
[0101] [nvarchar](10) NULL,
[and so on...]
I created the list of columns in Calc by copying and pasting them transposed. After that I used the CONCATENATE function to create the column list + data type for the CREATE TABLE statement.
I cleaned up the worksheet and imported it into this table using SQL Server's import wizard. Cleaning meant removing unnecessary rows/columns. Since the column names were identical, 99% of the mapping was done correctly.
Now I had the data in SQL Server.
I adapted the code MM93 suggested a bit:
;with data as
(
SELECT *
FROM dbo.mydata -- here I simply referenced the whole table
),cte
In the next part I used the same 'worksheet' trick to list and format all the column names and pasted them in:
),cte
AS (SELECT prod_code, -- had to replace col1 with 'prod_code'
col
FROM data
CROSS apply (VALUES ('0100',[0100]),
('0101', [0101] ),
(and so on... ),
The result of this query was inserted into a new table and my colleagues and I are querying our hearts out :)
PS: removing the 'FOR XML' clause resulted in a table with two columns:
prodcode | employee
which contains all the unique combinations of product code + employee number, which is a lot faster and much more practical to query.
