SNOWFLAKE - Single file generation even when the max file size is exceeded

Team,
I have a question to discuss: table-to-file creation in Snowflake.
We use MAX_FILE_SIZE together with compression, but at times the data being retrieved exceeds the maximum size the file can hold.
If we remove SINGLE = TRUE, multiple files are generated, and it takes a lot of time to fix and merge them.
Is there a way to still produce the file when the maximum file size is exceeded?
copy into @SNOWFLAKE_AZURE_STAGE/data/load/30jan/etl_file_20210223.dat.csv.gz
from DB.SOURCE_TABLE
file_format = (
type = csv
COMPRESSION = 'gzip'
field_delimiter = '|'
field_optionally_enclosed_by = NONE
empty_field_as_null = FALSE
RECORD_DELIMITER = '\n'
ESCAPE = NONE
)
OVERWRITE = TRUE
MAX_FILE_SIZE = 5368706371
SINGLE = TRUE
HEADER = TRUE;
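For context (not stated in the thread): COPY INTO a stage cannot write a single file larger than 5 GB on cloud storage, so no parameter will keep SINGLE = TRUE working once the unload passes that cap, and the MAX_FILE_SIZE above is already at the limit. The practical route is the multi-file unload; a minimal sketch with an assumed 1 GB part size (gzip parts can later be concatenated into a single valid .gz file without decompressing them):

-- Multi-file unload: drop SINGLE = TRUE and let Snowflake split the output,
-- appending a numeric suffix to each part; every part stays under MAX_FILE_SIZE
copy into @SNOWFLAKE_AZURE_STAGE/data/load/30jan/etl_file_20210223_
from DB.SOURCE_TABLE
file_format = (type = csv COMPRESSION = 'gzip' field_delimiter = '|')
OVERWRITE = TRUE
MAX_FILE_SIZE = 1073741824;  -- 1 GB per part (assumed value)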

Related

Snowflake: handle null while reading a CSV file

I am trying to load a CSV file from S3 which has a null value in an integer-type field of the Snowflake table.
So I try to use the IFNULL function but get the error:
Numeric value 'null' is not recognized.
For example, when I try
select IFNULL(null, 0)
I get the answer 0,
but the same thing won't work while reading the CSV file:
select $1, $2, ifnull($2, 0)
from @stage/path
(file_format => 'csv')
I get the 'null' not recognized error, and it fails when $2 is null.
My CSV format is as below.
create FILE FORMAT CSV
COMPRESSION = 'AUTO'
FIELD_DELIMITER = ','
RECORD_DELIMITER = '\n'
SKIP_HEADER = 0
FIELD_OPTIONALLY_ENCLOSED_BY = '\042'
TRIM_SPACE = FALSE
ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE
ESCAPE = '\134'
ESCAPE_UNENCLOSED_FIELD = '\134'
DATE_FORMAT = 'AUTO'
TIMESTAMP_FORMAT = 'AUTO'
NULL_IF = ('\\N');
Basically, I am just trying to convert null to 0 when reading from the stage.
The null string literal could be handled by setting NULL_IF:
CREATE FILE FORMAT CSV
...
NULL_IF = ('null', '\\N');
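A minimal sketch of the fix applied end to end (stage path and format name as in the question; the other format options are omitted for brevity):

-- Treat the literal strings 'null' and \N in the file as SQL NULL
CREATE OR REPLACE FILE FORMAT CSV
COMPRESSION = 'AUTO'
FIELD_DELIMITER = ','
NULL_IF = ('null', '\\N');

-- $2 now arrives as a real NULL, so IFNULL can substitute 0
SELECT $1, $2, IFNULL($2, 0)
FROM @stage/path (FILE_FORMAT => 'CSV');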
I used the second option listed in the Snowflake documentation: specifying FIELD_OPTIONALLY_ENCLOSED_BY = NONE and EMPTY_FIELD_AS_NULL = FALSE, in which case I need to provide a value to be used for NULLs (NULL_IF = ('NULL')).
https://docs.snowflake.com/en/user-guide/data-unload-considerations.html
"Leave string fields unenclosed by setting the FIELD_OPTIONALLY_ENCLOSED_BY option to NONE (default), and set the EMPTY_FIELD_AS_NULL value to FALSE to unload empty strings as empty fields.
If you choose this option, make sure to specify a replacement string for NULL data using the NULL_IF option, to distinguish NULL values from empty strings in the output file. If you later choose to load data from the output files, you will specify the same NULL_IF value to identify the NULL values in the data files."
So my query looked something like the following:
COPY INTO @~/unload/table FROM (
SELECT * FROM table
)
FILE_FORMAT = (TYPE = 'CSV' COMPRESSION = 'GZIP'
FIELD_DELIMITER = '\u0001'
EMPTY_FIELD_AS_NULL = FALSE
FIELD_OPTIONALLY_ENCLOSED_BY = NONE
NULL_IF = ('NULL'))
OVERWRITE = TRUE;
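As the quoted passage says, loading those files back requires the same NULL_IF value; a minimal sketch of the reload side (table, stage, and options reused from the unload above):

-- Reload: the same NULL_IF turns the 'NULL' replacement string back into a
-- SQL NULL, keeping it distinct from empty strings
COPY INTO table FROM @~/unload/table
FILE_FORMAT = (TYPE = 'CSV' COMPRESSION = 'GZIP'
FIELD_DELIMITER = '\u0001'
FIELD_OPTIONALLY_ENCLOSED_BY = NONE
NULL_IF = ('NULL'));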

Create Snowpipe: COPY INTO using a file from a stage fails

When I try to load the data below from a staged file, it fails with an invalid date. Is there a way to resolve this issue without changing the source file?
I am trying to set up a Snowpipe.
Orig_Int_Date
04-21-2020
create or replace file format Ally_format
type = csv
field_delimiter = '|'
skip_header = 1
empty_field_as_null = true
REPLACE_INVALID_CHARACTERS = TRUE
DATE_FORMAT = '<MM-DD-YYYY>'
EMPTY_FIELD_AS_NULL = TRUE;
copy into NAM_FIN_DB.FIN_PUBLIC.ALLY
from @NAM_FIN_DB.PUBLIC.FP_FINANCE
file_format = Ally_format
pattern = 'ALLY.*';
I think your date format line should be:
DATE_FORMAT = 'MM-DD-YYYY'
not
DATE_FORMAT = '<MM-DD-YYYY>'
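Applied to the file format from the question, a minimal sketch of the fix (the other options unchanged):

-- The angle brackets in '<MM-DD-YYYY>' are placeholder notation from the
-- docs, not literal syntax
create or replace file format Ally_format
type = csv
field_delimiter = '|'
skip_header = 1
empty_field_as_null = true
REPLACE_INVALID_CHARACTERS = TRUE
DATE_FORMAT = 'MM-DD-YYYY';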

Snowflake copy command to put default values in place of null

I am copying data into a Snowflake table which has three columns: ID, DATA and ETL_LOAD_TIMESTAMP.
The ETL_LOAD_TIMESTAMP column is of type TIMESTAMP_TZ(9), and I have set its default value to CURRENT_TIMESTAMP().
I get my data from a CSV file of the form:
ID, DATA
1, Dummy
I download the CSV file to a local tmpdir and load its data into Snowflake as follows:
create_cmd = "CREATE TEMPORARY STAGE temp123 COMMENT = 'TEMPORARY STAGE FOR TEST_TABLE1 DATA LOAD'"
self.connection.execute("ALTER SESSION SET TIMEZONE = 'UTC';")
self.connection.execute(create_cmd)
self.connection.execute(f"put file://tmpdir/* @temp123 PARALLEL=8")
self.connection.execute("COPY INTO TEST_TABLE1 FROM @temp123 PURGE = TRUE FILE_FORMAT = (TYPE = 'CSV' field_delimiter = ',' FIELD_OPTIONALLY_ENCLOSED_BY = '\"' ESCAPE_UNENCLOSED_FIELD = NONE error_on_column_count_mismatch=false SKIP_HEADER = 1)")
I get the values of ID and DATA, but ETL_LOAD_TIMESTAMP is null.
How do I modify this COPY command so that ETL_LOAD_TIMESTAMP gets its default value, the current timestamp, instead of null?
You can use DEFAULT current_timestamp() when defining the column, or an explicit to_timestamp() in a transforming COPY; see:
https://docs.snowflake.com/en/user-guide/data-load-transform.html#current-time-current-timestamp-default-column-values
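Sketching that documented pattern with the names from this question (assuming the file holds only the two columns shown): listing only the file's columns in a transforming COPY lets the omitted column fall back to its default.

-- ETL_LOAD_TIMESTAMP is omitted from the column list, so Snowflake fills it
-- with its DEFAULT current_timestamp() at load time
COPY INTO TEST_TABLE1 (ID, DATA)
FROM (SELECT $1, $2 FROM @temp123)
PURGE = TRUE
FILE_FORMAT = (TYPE = 'CSV' field_delimiter = ',' FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1);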

S3 to Snowflake (loading CSV data in S3 to a Snowflake table throws the following error)

I am trying to load .csv file data into a Snowflake table using the following command:
COPY INTO MYTABLE
FROM @S3PATH PATTERN = '.*TEST.csv'
FILE_FORMAT = (type = csv skip_header = 1) ON_ERROR = CONTINUE PURGE = TRUE FORCE = TRUE;
This is the scenario I am seeing:
1) If even one column of the table is numeric, it throws the error
Numeric value '""' is not recognized
2) If I change all the columns' data types to varchar, it loads the data, but it populates every column wrapped in double quotes ("15" instead of 15).
Thanks in advance for your response!
You're likely missing FIELD_OPTIONALLY_ENCLOSED_BY = '\042' in your file_format. Add that in and try.
https://docs.snowflake.com/en/sql-reference/sql/create-file-format.html#type-csv
https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html
Thanks CodeMonkey!
One issue is solved.
Current scenario:
One column is defined as NUMBER in the SF table, and only the rows where the CSV file has a value populated for that column were loaded. Basically, if the numeric column in the CSV file is null (or blank), those records are not loaded.
I also tried using
EMPTY_FIELD_AS_NULL = TRUE
and still got the same result as above, with the "first_error" message: Numeric value '' is not recognized
Here is what I did, and it is working (COPY target, source, and pattern as in the question above):
COPY INTO MYTABLE
FROM @S3PATH PATTERN = '.*TEST.csv'
FILE_FORMAT = (type = csv field_delimiter = ',' skip_header = 1
FIELD_OPTIONALLY_ENCLOSED_BY = '\042'
EMPTY_FIELD_AS_NULL = TRUE
NULL_IF = ('NULL', 'null', ''))
ON_ERROR = CONTINUE PURGE = TRUE FORCE = TRUE;

How to receive an error if the S3 prefix is incorrect when copying files?

If the S3 prefix is incorrect, Snowflake doesn't throw an error. Is there a way to make the COPY command fail using appropriate parameters?
COPY INTO <table>
FROM 's3://<valid-Bucket>/<INVALIDPREFIX>'
FILE_FORMAT = (
COMPRESSION = 'AUTO'
FIELD_DELIMITER = '\\001'
RECORD_DELIMITER = '\n'
ESCAPE_UNENCLOSED_FIELD = NONE
TRIM_SPACE = FALSE
)
ON_ERROR = ABORT_STATEMENT
[Screenshot: message received in the Snowflake UI]
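One possible pre-check (my assumption, not from the thread): wrap the bucket in a named external stage and LIST the prefix before copying, since COPY itself completes with zero files processed rather than failing:

-- Hypothetical pre-check: my_stage is an assumed external stage pointing at
-- the bucket; LIST returns zero rows for a bad prefix, which the calling
-- script can treat as an error
LIST @my_stage/<INVALIDPREFIX>;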
