Cast variant to boolean in external table - snowflake-cloud-data-platform

I have an external table on top of a CSV file:
create or replace external table PAGES
(
near VARIANT as (nullif(value:c1,null)::VARIANT)
)
with location = @test_stage
file_format = test_file_format
pattern = '.*[.]csv';
select * from PAGES;
It gives me these results:
"None"
"True"
"False"
"None"
I want my table to be like:
create or replace external table PAGES
(
near BOOLEAN as (nullif(value:c1,null)::BOOLEAN)
)
with location = @test_stage
file_format = test_file_format
pattern = '.*[.]csv';
select * from PAGES;
but it gives me error:
Failed to cast variant value "None" to BOOLEAN
How can I make it work with BOOLEAN?
My stage and file format look like this:
create or replace file format test_file_format type = 'csv' field_delimiter = ','
SKIP_HEADER = 1
FIELD_OPTIONALLY_ENCLOSED_BY = '"' ESCAPE = '\\'
empty_field_as_null=TRUE;
create or replace stage oncrawl_stage url='s3://unload-dev/'
file_format = test_file_format
storage_integration=snowflake_s3_integration;

One of the rows must contain the literal string "None" (e.g. a serialized Python None, not a SQL NULL) to produce this error. Otherwise the DDL should work:
select VALUE from pages;
+---------------------+
| VALUE               |
+---------------------+
| { "c1": "True" }    |
| { "c1": "False" }   |
| { "c1": "None" }    |
+---------------------+
select * from pages;
Failed to cast variant value "None" to BOOLEAN
Make sure you set USE_CACHED_RESULT to false to prevent unintended caching in your tests!
PS: Why do you use nullif(value:c1,null)? It means "return NULL if value:c1 equals NULL", which is effectively a no-op!
Can you try this one?
create or replace external table PAGES
(
near BOOLEAN as (decode(value:c1,'None',False,value:c1)::BOOLEAN)
)
with location = @test_stage
file_format = test_file_format
;

Related

Can I use a variable in a stage path?

I'm using a COPY INTO statement to load some tables into S3:
COPY INTO 's3://sandbox-staging/US/'
FROM US
storage_integration = sandbox
FILE_FORMAT = (
type = 'parquet'
)
header = true
overwrite = true;
I have to do a migration like this for every state. To save some time and protect against human error, I'd love to set the table name as a variable, so that I can use it in both the COPY INTO and FROM clauses. For example:
SET loc = 'US_NY';
SET staging_path = 's3://sandbox-staging/' || $loc || '/';
COPY INTO $staging_path
FROM table($loc)
storage_integration = sandbox
FILE_FORMAT = (
type = 'parquet'
)
header = true
overwrite = true;
The FROM clause works, it's the COPY INTO I can't seem to get right. In the same sense that there's a table function for table literals, is there any literal function I can use for staging paths?
You can try using a variable with execute immediate to dynamically generate the command. https://docs.snowflake.com/en/sql-reference/sql/execute-immediate.html
SET loc = 'US_NY';
SET staging_path = '''s3://sandbox-staging/' || $loc || '/''' ;
SET copy_command=
'COPY INTO ' || $staging_path ||
' FROM ' || $loc ||
' storage_integration = sandbox
FILE_FORMAT = (
type = \'parquet\'
)
header = true
overwrite = true;';
EXECUTE IMMEDIATE $copy_command;
To view the copy command code you can run:
SELECT $copy_command;
Output:
COPY INTO 's3://sandbox-staging/US_NY/' FROM US_NY storage_integration = sandbox FILE_FORMAT = ( type = 'parquet' ) header = true overwrite = true;
Going back to the original requirement - running this for every state - this is tailor-made for a SQL generator.
create or replace table TABLE_LIST(NAME string);
insert into TABLE_LIST (NAME) values ('US_NY'), ('US_CA'), ('US_NC'), ('US_FL');
select $$
COPY INTO 's3://sandbox-staging/$$ || NAME || $$/'$$ || $$
FROM $$ || NAME || $$
storage_integration = sandbox
FILE_FORMAT = (
type = 'parquet'
)
header = true
overwrite = true
$$ as SQL_COMMAND
from TABLE_LIST;
That will generate all the SQL commands in a table. If you want to automate running them, you can use a stored procedure; one that runs generated SQL statements already exists:
https://snowflake.pavlik.us/index.php/2019/08/22/executing-multiple-sql-statements-in-a-stored-procedure/
You can then call it like this:
call RunBatchSQL($$
select 'COPY INTO ''s3://sandbox-staging/' || NAME || '\'' ||
' FROM ' || NAME ||
' storage_integration = sandbox
FILE_FORMAT = (
type = ''parquet''
)
header = true
overwrite = true'
as SQL_COMMAND
from TABLE_LIST;
$$);
It may be cleaner to write a stored procedure from scratch, but this allows you to run any generated SQL statements.
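If your account supports Snowflake Scripting, the generate-and-run pattern can also be collapsed into a single anonymous block with a cursor FOR loop. A sketch, assuming the TABLE_LIST table and the sandbox integration from the examples above:

```sql
declare
  c1 cursor for select NAME from TABLE_LIST;
begin
  for rec in c1 do
    -- Build and run one COPY INTO per state table
    execute immediate
      'COPY INTO ''s3://sandbox-staging/' || rec.NAME || '/''' ||
      ' FROM ' || rec.NAME ||
      ' storage_integration = sandbox' ||
      ' FILE_FORMAT = (type = ''parquet'')' ||
      ' header = true overwrite = true';
  end for;
end;
```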

snowflake handle null while reading csv file

I am trying to load a CSV file from S3 that has the literal value 'null' in an integer-type field of the Snowflake table.
So I tried the IFNULL function, but I get the error:
Numeric value 'null' is not recognized.
For example when I try
select IFNULL(null,0)
I get the answer as 0.
but the same thing won't work when reading the CSV file:
select $1,$2,ifnull($2,0)
from
@stage/path
(file_format => csv)
I get the "null is not recognized" error whenever $2 is null.
My CSV file format is as follows:
create FILE FORMAT CSV
COMPRESSION = 'AUTO' FIELD_DELIMITER = ','
RECORD_DELIMITER = '\n' SKIP_HEADER = 0
FIELD_OPTIONALLY_ENCLOSED_BY = '\042'
TRIM_SPACE = FALSE
ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE ESCAPE = '\134'
ESCAPE_UNENCLOSED_FIELD = '\134' DATE_FORMAT = 'AUTO'
TIMESTAMP_FORMAT = 'AUTO' NULL_IF = ('\\N');
Basically, I am just trying to convert null to 0, when reading from the stage.
The null string literal could be handled by setting NULL_IF:
CREATE FILE FORMAT CSV
...
NULL_IF = ('null', '\\N');
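With NULL_IF set that way, the literal string 'null' in the file arrives as a real SQL NULL, so the original IFNULL query should work. A sketch, using the stage path and column positions from the question:

```sql
-- 'null' in the file becomes SQL NULL via NULL_IF,
-- and IFNULL then replaces it with 0.
select $1, $2, ifnull($2, 0)
from @stage/path (file_format => csv);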
I used the second option listed in the Snowflake documentation: specifying FIELD_OPTIONALLY_ENCLOSED_BY = NONE and EMPTY_FIELD_AS_NULL = FALSE, in which case I need to provide a value to be used for NULLs (NULL_IF = ('NULL')).
https://docs.snowflake.com/en/user-guide/data-unload-considerations.html
"Leave string fields unenclosed by setting the FIELD_OPTIONALLY_ENCLOSED_BY option to NONE (default), and set the EMPTY_FIELD_AS_NULL value to FALSE to unload empty strings as empty fields.
If you choose this option, make sure to specify a replacement string for NULL data using the NULL_IF option, to distinguish NULL values from empty strings in the output file. If you later choose to load data from the output files, you will specify the same NULL_IF value to identify the NULL values in the data files."
So my query looked something like the following:
COPY INTO @~/unload/table FROM (
SELECT * FROM table
)
FILE_FORMAT = (TYPE = 'CSV' COMPRESSION = 'GZIP'
FIELD_DELIMITER = '\u0001'
EMPTY_FIELD_AS_NULL = FALSE
FIELD_OPTIONALLY_ENCLOSED_BY = NONE
NULL_IF=('NULL'))
OVERWRITE = TRUE;

Create Snowpipe COPY INTO using a file from a stage fails

When I try to load the data below from a staged file, it fails with an invalid date error. Is there a way to resolve this without changing the source file?
I am trying to set up a Snowpipe.
Orig_Int_Date
04-21-2020
create or replace file format Ally_format
type = csv
field_delimiter = '|'
skip_header = 1
empty_field_as_null = true
REPLACE_INVALID_CHARACTERS = TRUE
DATE_FORMAT = '<MM-DD-YYYY>';
Copy into NAM_FIN_DB.FIN_PUBLIC.ALLY
from @NAM_FIN_DB.PUBLIC.FP_FINANCE
file_format = Ally_format
pattern = 'ALLY.*';
I think your date format line should be:
DATE_FORMAT = 'MM-DD-YYYY'
not
DATE_FORMAT = '<MM-DD-YYYY>'
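The angle brackets in the documentation are placeholder markers, not part of the format string. You can sanity-check the corrected format against the sample value before rebuilding the pipe:

```sql
-- Should parse the sample from the question as April 21, 2020
select to_date('04-21-2020', 'MM-DD-YYYY');
```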

Snowflake copy command to put default values in place of null

I am copying the data into snowflake table which has three columns: ID, DATA and ETL_LOAD_TIMESTAMP.
I have a column ETL_LOAD_TIMESTAMP in snowflake of type TIMESTAMP_TZ(9) and I have set its default value as CURRENT_TIMESTAMP().
I get my data from a CSV file, which is of type:
ID, DATA
1, Dummy
I download the CSV file to a local tmpdir and load its data into Snowflake as:
create_cmd = "CREATE TEMPORARY STAGE temp123 COMMENT = 'TEMPORARY STAGE FOR TEST_TABLE1 DATA LOAD'"
self.connection.execute("ALTER SESSION SET TIMEZONE = 'UTC';")
self.connection.execute(create_cmd)
self.connection.execute("put file://tmpdir/* @temp123 PARALLEL=8")
self.connection.execute("COPY INTO TEST_TABLE1 FROM @temp123 PURGE = TRUE FILE_FORMAT = (TYPE = 'CSV' field_delimiter = ',' FIELD_OPTIONALLY_ENCLOSED_BY = '\"' ESCAPE_UNENCLOSED_FIELD = None error_on_column_count_mismatch=false SKIP_HEADER = 1)")
I get the values of ID and Data but the ETL_LOAD_TIMESTAMP is null.
How do I modify this copy command so that I get the default value of ETL_LOAD_TIMESTAMP which is current timestamp instead of null?
You can use DEFAULT current_timestamp() while defining the column, or set the value explicitly (e.g. with current_timestamp() or to_timestamp) in a transformation during the COPY:
https://docs.snowflake.com/en/user-guide/data-load-transform.html#current-time-current-timestamp-default-column-values
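Because the CSV has fewer columns than the table, the missing column is loaded as NULL rather than picking up the column default. One way around this is to name the target columns and supply the timestamp in a COPY transformation. A sketch, using the table and stage names from the question:

```sql
copy into TEST_TABLE1 (ID, DATA, ETL_LOAD_TIMESTAMP)
from (
  -- $1/$2 come from the CSV; the timestamp is generated at load time
  select t.$1, t.$2, current_timestamp()
  from @temp123 t
)
purge = true
file_format = (type = 'csv'
               field_delimiter = ','
               field_optionally_enclosed_by = '"'
               skip_header = 1);
```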

Loading CSV data to Snowflake table

A column splits into multiple columns when I try to load the following data into a Snowflake table, since it is a CSV file.
Column Data :
{"Department":"Mens
Wear","Departmentid":"10.1;20.1","customername":"john4","class":"tops wear","subclass":"sweat shirts","product":"North & Face 2 Bangle","style":"Sweat shirt hoodie - Large - Black"}
Is there any way to load the data into a single column?
The best solution would be to use a different delimiter instead of a comma in your CSV file. If that's not possible, you can ingest the data using a non-existent delimiter to get the whole line as one column, and then parse it. Of course it won't be as efficient as native loading:
cat test.csv
1,2020-10-12,Gokhan,{"Department":"Mens Wear","Departmentid":"10.1;20.1","customername":"john4","class":"tops wear","subclass":"sweat shirts","product":"North & Face 2 Bangle","style":"Sweat shirt hoodie - Large - Black"}
create file format csvfile type=csv FIELD_DELIMITER='NONEXISTENT';
select $1 from @my_stage (file_format => csvfile );
create table testtable( id number, d1 date, name varchar, v variant );
copy into testtable from (
select
split( split($1,',{')[0], ',' )[0],
split( split($1,',{')[0], ',' )[1],
split( split($1,',{')[0], ',' )[2],
parse_json( '{' || split($1,',{')[1] )
from @my_stage (file_format => csvfile )
);
select * from testtable;
+----+------------+--------+-----------------------------------------------------------------+
| ID | D1         | NAME   | V                                                               |
+----+------------+--------+-----------------------------------------------------------------+
|  1 | 2020-10-12 | Gokhan | { "Department": "Mens Wear", "Departmentid": "10.1;20.1", ... } |
+----+------------+--------+-----------------------------------------------------------------+