snowflake: How to mention date formats when loading data from stage

I have a CSV file in an S3 bucket with DATE-type columns in different formats, and I need to insert them into a Snowflake table (e.g. "DB_NAME-XX"."SCHEMA_NAME-XX"."TABLE_NAME-XX") where all columns are of DATE type.
Example (sample data from the stage, formatted here for readability):
col_1 | col_2 | col_3
2017/12/01 | 1996 | 20101201
After inserting into the Snowflake table ("DB_NAME-XX"."SCHEMA_NAME-XX"."TABLE_NAME-XX"), it should look like this:
COL_1 | COL_2 | COL_3
2017-12-01 | 1996-01-01 | 2010-12-01
I am using the following command
copy into "DB_NAME-XX"."SCHEMA_NAME-XX"."TABLE_NAME-XX"
from s3://bucket_name_xx/folder_name_xx/ credentials=(aws_key_id='xxxxxxxxxxxxxx' aws_secret_key='yyyyyyyyyyyyyyyyyyyyyy')
file_format = (type = csv field_delimiter = '|' skip_header = 1)
validation_mode = 'RETURN_1000000_ROWS';
I am not sure how to specify the date formats in the above command.
How can I achieve my goal of loading data with different date formats per column into Snowflake?

I suggest you create a named external stage first, e.g.:
CREATE STAGE my_ext_stage
URL='s3://bucket_name_xx/folder_name_xx/'
credentials=(aws_key_id='xxxxxxxxxxxxxx' aws_secret_key='yyyyyyyyyyyyyyyyyyyyyy');
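Once the stage exists, you can sanity-check that Snowflake sees your files with LIST (using the stage name from above):
list @my_ext_stage;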
Then, assuming each column uses one consistent date format for all of its values throughout the file, you can apply the TO_DATE() function with the matching format string to each column:
copy into "DB_NAME-XX"."SCHEMA_NAME-XX"."TABLE_NAME-XX"
from (select to_date($1, 'YYYY/MM/DD'), to_date($2, 'YYYY'), to_date($3, 'YYYYMMDD') FROM #my_ext_stage)
file_format = (type = csv field_delimiter = '|'skip_header = 1)
validation_mode = 'RETURN_1000000_ROWS';
Expected result:
COL_1 | COL_2 | COL_3
2017-12-01 | 1996-01-01 | 2010-12-01
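As an aside: if every DATE column shared a single format, the CSV file format option DATE_FORMAT would be enough on its own; it is the per-column differences that force the TO_DATE() transformation:
file_format = (type = csv field_delimiter = '|' skip_header = 1 date_format = 'YYYY/MM/DD')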

Related

snowflake: mention date format staged data files

I have a TSV as below (just formatted for better representation) in an S3 bucket:
col_1 | col_2 | col_3
2017/12/01 | 1996 | 20101201
.. | .. | ..
All the above columns are of DATE type
I create a stage to load this file from s3
Now I have a table created:
CREATE OR REPLACE TABLE
"test"
(
col_1 Date,
col_2 Date,
col_3 Date
);
Now I want to ingest the above file into this table:
-- create file_format
create or replace file format my_file_format
type = csv
field_delimiter = '|'
skip_header = 1;
-- create stage
CREATE or replace STAGE my_stage
URL='s3://xxxx/yyyy'
CREDENTIALS=(AWS_KEY_ID='XXXXXXXXXXXXXX' AWS_SECRET_KEY='YYYYY');
-- copy into
copy into "TEST"
from #my_stage
file_format = (format_name = my_file_format);
-- or insert into
insert INTO "test" (select $1,$2,$3 from #my_stage (file_format => my_file_format));
I get the error:
Can't parse '2017/12/01' as date with format 'AUTO'
I can't change the file. Is there any way I can specify the date format for each column while ingesting?
Can you try using the proper date format for each column based on its values?
insert INTO "test" (select TO_DATE($1,'YYYY/MM/DD'), TO_DATE($2,'YYYY'),
TO_DATE($3,'YYYYMMDD') from @my_stage (file_format => my_file_format));
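If you prefer COPY over INSERT, the same per-column conversion also works as a COPY transformation (mirroring the pattern from the question above):
copy into "test"
from (select TO_DATE($1,'YYYY/MM/DD'), TO_DATE($2,'YYYY'), TO_DATE($3,'YYYYMMDD')
from @my_stage)
file_format = (format_name = my_file_format);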

How to add lots of values to a Postgres column with a where statement

I have a table with 1000 rows and have added a new column, but now I need to populate it with data. Below is an example of my table.
location | name | display_name
-----------------+--------+-------
liverpool | Dan |
london | Louise |
stoke-on-trent | Amel |
itchen-hampshire| Mark |
I then have a CSV that looks like this, containing the extra data:
location,name,display_name
Liverpool,Dan,Liverpool
London,Louise,London
stoke-on-trent,Amel,Stoke on Trent
itchen-hampshire,Mark,itchen (hampshire)
I know how to update a single row, but I'm not sure how to do it for the 1000 rows of data I have.
Updating a single row:
UPDATE info_table
SET display_name = 'Itchen (Hampshire)'
WHERE id = 'itchen-hampshire';
You should first load that CSV data into another table and then do an update join on the first table:
UPDATE yourTable t1
SET display_name = t2.display_name
FROM csvTable t2
WHERE t2.location = t1.location;
If you only want to update display names which are currently null, add a filter:
WHERE t2.location = t1.location AND t1.display_name IS NULL;
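For completeness, the staging table itself could be created and filled along these lines (a sketch; the table name csvTable and the file name are assumptions, and \copy is psql's client-side copy command):
CREATE TEMP TABLE csvTable (
    location     text,
    name         text,
    display_name text
);
-- client-side copy from psql; assumes the file sits on your machine
\copy csvTable FROM 'display_names.csv' WITH (FORMAT csv, HEADER true);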
To update more than one column, you can use this generalized query:
update test as t set
column_a = c.column_a
from (values
('123', 1),
('345', 2)
) as c(column_b, column_a)
where c.column_b = t.column_b;
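If Postgres cannot infer the types in the VALUES list from context, add explicit casts on the first row (a minor variation of the query above; text/int are assumptions about your column types):
update test as t set
    column_a = c.column_a
from (values
    ('123'::text, 1::int),
    ('345', 2)
) as c(column_b, column_a)
where c.column_b = t.column_b;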

Unload Snowflake table data into S3 in Parquet format

I see all the values as NULLs under the columns after unloading from the Snowflake table into the S3 bucket.
Below is the code I have used:
create or replace stage STG_LOAD
url='s3://bucket/folder'
credentials=(aws_key_id='xxxx' aws_secret_key='xxxx')
file_format = (type = PARQUET);
copy into @STG_LOAD from
(select OBJECT_CONSTRUCT(country_cd, source)
from table_1)
file_format = (type = 'parquet')
header = true;
Please let me know if I am missing something here.
It's normal to see NULL values; this is expected behaviour. OBJECT_CONSTRUCT skips any key-value pair whose value is NULL, so such rows produce an empty object, which is then written out as NULL:
create or replace table table_1
( country_cd varchar, source varchar ) as
select * from values
('US','Jack'),
('UK','Joe'),
('NL','Jim'),
('EU', null);
select OBJECT_CONSTRUCT(country_cd,source) output
from table_1
+------------------+
| OUTPUT |
+------------------+
| { "US": "Jack" } |
| { "UK": "Joe" } |
| { "NL": "Jim" } |
| {} |
+------------------+
If you don't want to write NULL values, you can filter the empty objects out in your SELECT:
copy into @STG_LOAD from
(select OBJECT_CONSTRUCT(country_cd,source ) output
from table_1
where output != OBJECT_CONSTRUCT() ) -- no empty objects
file_format = (type=parquet)
overwrite = true;
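To sanity-check the unloaded files, you can read the Parquet back from the stage (a sketch; my_parquet_format is a name introduced here, not from the question):
create or replace file format my_parquet_format type = parquet;
select $1 from @STG_LOAD (file_format => 'my_parquet_format');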

Decimal Field In Hive Only Returning Results When Encapsulating Value In Single Quotes

I ingested a table from a SQL Server database into a Hive database. When I attempt to look up a value in Hive, I have to surround the value with single quotes in order to find it.
In SQL Server, the datatype for the column is numeric(21,0). The front-end tool we used to migrate the data (as well as provide reporting, etc.), Solix CDP, has mapped this to decimal(38,13) in Hive.
The following is a matrix of anonymized search values and results:
select * from [table] where num = 1;
(no rows)
select * from [table] where num = '1';
| 1 |
select * from [table] where num like 1;
(no rows)
select * from [table] where num like '1';
| 1 |
The Solix tool does not support wrapping the value in single quotes in a report, so I need to figure out a way to return a result "traditionally", i.e. without single quotes.
What would be causing this issue?
Thanks.
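One untested workaround sketch: force the literal to the column's declared type so the comparison happens entirely in DECIMAL, in case an implicit conversion of the bare integer literal is to blame (that cause is an assumption):
-- [table] kept as the anonymized placeholder from the question
select * from [table] where num = cast(1 as decimal(38,13));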

Split Single Column into multiple and Load it to a Table or a View

I'm using SQL Server 2008. I have a source table with a few columns (A, B) containing string data to be split into multiple columns. I already have a function written that does the split.
The data from the Source table (whose format cannot be modified) is used in a View being created. I need the View to already contain the split data from columns A and B of the Source table, so the View will have extra columns that are not in the Source table.
The View populated from the Source table is then merged with the other table.
There are two questions here:
Can I split columns A and B from the Source table when creating a View, without changing the Source table?
How to use my existing User Defined Function in the View "Select" statement to accomplish this task?
The idea in short:
The string to split is also shown in the example in the commented-out section of the function below. In essence, I have a Destination table, a vStandardizedData View, and an SP that uses the View data to merge into the tblStandardizedData table. So, in my Source columns A and B, I need to split the data before loading it into the tblStandardizedData table.
There are five objects that I'm working on:
Source File
Destination Table
vStandardizedData View
tblStandardizedData table
Stored procedure that does the merge
(Update and Insert) from the vStandardizedData View.
Note: all five objects are listed in the order in which they are supposed to be created and loaded.
Separately from this, there is an existing user-defined function that can split the string, which I was told to use.
Example of the string in column A (column B has the same format data) to be split:
6667 Mission Street, 4567 7rd Street, 65 Sully Pond Park
Desired result: each address in its own column (e.g. Address1 = 6667 Mission Street, Address2 = 4567 7rd Street, Address3 = 65 Sully Pond Park).
The user-defined function returns a table variable:
CREATE FUNCTION [Schema].[udfStringDelimeterfromTable]
(
@sInputList VARCHAR(MAX) -- List of delimited items
, @Delimiter CHAR(1) = ',' -- delimiter that separates items
)
RETURNS @List TABLE (Item VARCHAR(MAX)) WITH SCHEMABINDING
/*
* Returns a table of strings that have been split by a delimiter.
* Similar to the Visual Basic (or VBA) SPLIT function. The
* strings are trimmed before being returned. Null items are not
* returned so if there are multiple separators between items,
* only the non-null items are returned.
* Space is not a valid delimiter.
*
* Example:
SELECT * FROM [Schema].[udfStringDelimeterfromTable]('abcd,123, 456, efh,,hi', ',')
*
* Test:
DECLARE @Count INT, @Delim CHAR(10), @Input VARCHAR(128)
SELECT @Count = Count(*)
FROM [Schema].[udfStringDelimeterfromTable]('abcd,123, 456', ',')
PRINT 'TEST 1 3 lines:' + CASE WHEN @Count=3
THEN 'Worked' ELSE 'ERROR' END
SELECT @Delim=CHAR(10)
, @Input = 'Line 1' + @Delim + 'line 2' + @Delim
SELECT @Count = Count(*)
FROM [Schema].[udfStringDelimeterfromTable](@Input, @Delim)
PRINT 'TEST 2 LF :' + CASE WHEN @Count=2
THEN 'Worked' ELSE 'ERROR' END
*/
What I'd ask you is to read this: How to create a Minimal, Complete, and Verifiable example.
In general: if you use your UDF, you'll get the split items row-wise. It would be best if your UDF returned each item together with a running number. Otherwise you'll first need ROW_NUMBER() OVER(...) to create a position number, in order to create your target column names via string concatenation. Then use PIVOT to get the columns side by side, as in the sketch below.
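A minimal sketch of that route, assuming the demo table @tbl declared further down and that the UDF returns one Item per row (the UDF gives no ordering guarantee, so the ORDER BY inside ROW_NUMBER() is best-effort only):
WITH Numbered AS
(
    SELECT t.ID
          ,s.Item
          ,ROW_NUMBER() OVER (PARTITION BY t.ID ORDER BY (SELECT NULL)) AS Pos
    FROM @tbl AS t
    CROSS APPLY [Schema].[udfStringDelimeterfromTable](t.YourValues, ',') AS s
)
SELECT p.ID, p.[1] AS Address1, p.[2] AS Address2, p.[3] AS Address3
FROM Numbered
PIVOT (MAX(Item) FOR Pos IN ([1],[2],[3])) AS p;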
An easier approach could be a string split via XML, like in this answer.
A quick proof of concept to show the principles:
DECLARE @tbl TABLE(ID INT,YourValues VARCHAR(100));
INSERT INTO @tbl VALUES
(1,'6667 Mission Street, 4567 7rd Street, 65 Sully Pond Park')
,(2,'Other addr1, one more addr, and another one, and even one more');
WITH Casted AS
(
SELECT *
,CAST('<x>' + REPLACE(YourValues,',','</x><x>') + '</x>' AS XML) AS AsXml
FROM @tbl
)
SELECT *
,LTRIM(RTRIM(AsXml.value('/x[1]','nvarchar(max)'))) AS Address1
,LTRIM(RTRIM(AsXml.value('/x[2]','nvarchar(max)'))) AS Address2
,LTRIM(RTRIM(AsXml.value('/x[3]','nvarchar(max)'))) AS Address3
,LTRIM(RTRIM(AsXml.value('/x[4]','nvarchar(max)'))) AS Address4
,LTRIM(RTRIM(AsXml.value('/x[5]','nvarchar(max)'))) AS Address5
FROM Casted
If your values might include forbidden characters (especially <, > and &), you can find an approach to deal with this in the linked answer.
The result
+----+---------------------+-----------------+--------------------+-------------------+----------+
| ID | Address1 | Address2 | Address3 | Address4 | Address5 |
+----+---------------------+-----------------+--------------------+-------------------+----------+
| 1 | 6667 Mission Street | 4567 7rd Street | 65 Sully Pond Park | NULL | NULL |
+----+---------------------+-----------------+--------------------+-------------------+----------+
| 2 | Other addr1 | one more addr | and another one | and even one more | NULL |
+----+---------------------+-----------------+--------------------+-------------------+----------+
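And to question 1 above: yes, this pattern can live inside a view over the unchanged Source table (a sketch; dbo.SourceTable and its columns A and B are assumed names taken from the question's description):
CREATE VIEW dbo.vStandardizedData
AS
SELECT s.A
      ,s.B
      ,LTRIM(RTRIM(ax.AsXml.value('/x[1]','nvarchar(max)'))) AS A_Part1
      ,LTRIM(RTRIM(ax.AsXml.value('/x[2]','nvarchar(max)'))) AS A_Part2
FROM dbo.SourceTable AS s
CROSS APPLY (SELECT CAST('<x>' + REPLACE(s.A,',','</x><x>') + '</x>' AS XML)) AS ax(AsXml);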
