Cannot insert Array in Snowflake

I have a CSV file with the following data:
eno | phonelist | shots
"1" | "['1112223333','6195551234']" | "[[11,12]]"
The DDL statement I used to create the table in Snowflake is as follows:
CREATE TABLE ArrayTable (eno INTEGER, phonelist ARRAY, shots ARRAY);
I need to insert the data from the CSV into the Snowflake table, and the method I used is:
create or replace stage ArrayTable_stage file_format = (TYPE=CSV);
put file://ArrayTable @ArrayTable_stage auto_compress=true;
copy into ArrayTable from @ArrayTable_stage/ArrayTable.gz
file_format = (TYPE=CSV FIELD_DELIMITER='|' FIELD_OPTIONALLY_ENCLOSED_BY='\"\');
But when I try to run the code, I get the error:
Copy to table failed: 100069 (22P02): Error parsing JSON:
('1112223333','6195551234')
How to resolve this?

FIELD_OPTIONALLY_ENCLOSED_BY='\"\': based on the row you have, that should just be '\"'.
select parse_json('[\'1112223333\',\'6195551234\']');
works (the backslashes are there to get past the SQL parser),
but your output has parens ( and ), which is different.
-- assumes a named file format, e.g.:
-- create file format my_csv_format type = csv field_delimiter = '|' field_optionally_enclosed_by = '"';
SELECT $2 as column2, TRY_PARSE_JSON($2) as j
FROM @ArrayTable_stage/ArrayTable.gz (file_format => 'my_csv_format')
WHERE j is null;
will show which values are failing to parse.
Failing that, you might want to use TO_ARRAY to parse column2 and insert the SELECTed/transformed data into your table, since it is failing to auto-transform. A sketch of that copy-transformation follows.
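A minimal sketch of that copy-transformation, assuming the stage and file from the question; TRY_PARSE_JSON (or TO_ARRAY over the parsed value) does the parsing explicitly instead of relying on the automatic cast:
copy into ArrayTable
from (
    select $1::integer,        -- eno
           try_parse_json($2), -- phonelist; or to_array(try_parse_json($2))
           try_parse_json($3)  -- shots
    from @ArrayTable_stage/ArrayTable.gz
)
file_format = (TYPE=CSV FIELD_DELIMITER='|' FIELD_OPTIONALLY_ENCLOSED_BY='"');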

Related

Snowflake copy into not recognising timestamp

Any suggestions on the below? I am trying to use COPY INTO to move Parquet files from S3 into a Snowflake table. Col1 is a timestamp and the rest are strings.
copy into table1 from
(select $1:col1, $1:col2, $1:col3
from @stage/path)
file_format = (format_name = parquet_format);
and getting the following error
Failed to cast variant value "20050111 00:00:00" to TIMESTAMP_NTZ
I have tried
copy into table1 from
(select to_timestamp($1:col1, 'yyyymmdd hh:mi:ss'), $1:col2, $1:col3
from @stage/path)
file_format = (format_name = parquet_format);
but getting the error
Error: too many arguments for function [TO_TIMESTAMP(GET(STAGE.$1, 'col1'), 'yyyymmdd hh:mi:ss')] expected 1, got 2 (line 130)
Any ideas?
Here's what's happening in this expression:
(select to_timestamp($1:col1, 'yyyymmdd hh:mi:ss')
This part, $1:col1, resolves to a VARIANT rather than a primitive data type. TO_TIMESTAMP applied to a VARIANT accepts only one argument, which is what leads to the error message about getting 2 arguments instead of one.
Changing this part of the expression to $1:col1::string casts the expression to a primitive type, string/varchar. That is the type of parameter the TO_TIMESTAMP function expects alongside a format string. The final statement should be:
copy into table1 from
(select to_timestamp($1:col1::string, 'yyyymmdd hh:mi:ss'), $1:col2, $1:col3
from @stage/path)
file_format = (format_name = parquet_format);
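Before running the COPY, you can sanity-check the cast by querying the stage directly; a small sketch, assuming parquet_format is the named file format from the question:
select $1:col1::string as raw_ts,
       try_to_timestamp($1:col1::string, 'yyyymmdd hh:mi:ss') as parsed_ts
from @stage/path (file_format => 'parquet_format')
limit 10;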

Converting a table in one form to another using Snowflake

I am trying to load a CSV file into Snowflake. The sample format of the input CSV table in the S3 location is as follows (with 2 columns: ID, Location_count):
[input CSV table screenshot]
I need to transform it into the below format (with 3 columns: ID, Location, Count):
[output CSV table screenshot]
However, when I try to load the input file using the following query (after creating the database, external stage, and file format), it returns LOAD_FAILED:
create or replace table table_name
(
id integer,
Location_count variant
);
select parse_json(Location_count) as c;
list @stage_name;
copy into table_name from @stage_name file_format = 'fileformatname' on_error = 'continue';
You will probably need to PARSE_JSON that 2nd column as part of a copy transformation. For example:
create file format myformat
type = csv field_delimiter = ','
FIELD_OPTIONALLY_ENCLOSED_BY = '"';
create or replace stage csv_stage file_format = (format_name = myformat);
copy into @csv_stage from
( select '1',
'{"SHS-TRN":654738,"PRN-UTN":78956,"NCT-JHN":96767}') ;
create or replace table blah (id integer, something variant);
copy into blah from (select $1, parse_json($2) from @csv_stage);
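From there, a LATERAL FLATTEN over the variant column turns each key/value pair into its own row, which gives the three-column ID, Location, Count shape the question asks for; a minimal sketch against the blah table above:
select b.id,
       f.key            as location,
       f.value::integer as count
from blah b,
     lateral flatten(input => b.something) f;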

String delimiter present in string not permitted in Polybase?

I'm creating an external table using a CSV stored in an Azure Data Lake Storage and populating the table using Polybase in SQL Server.
However, I ran into the following problem, and I figured it may be because one particular column contains double quotes within the string, while the string delimiter has been specified as " in Polybase (STRING_DELIMITER = '"').
HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: HadoopExecutionException: Could not find a delimiter after string delimiter
I have done quite extensive research on this and found that this issue has been around for years, but I have yet to see any solutions given.
Any help will be appreciated.
I think the easiest way to fix this, since you are in charge of the .csv creation, is to use a delimiter which is not a comma and leave off the string delimiter. Use a separator which you know will not appear in the file. I've used a pipe in my example, and I clean up the string once it is imported into the database.
A simple example:
IF EXISTS ( SELECT * FROM sys.external_tables WHERE name = 'delimiterWorking' )
DROP EXTERNAL TABLE delimiterWorking
GO
IF EXISTS ( SELECT * FROM sys.tables WHERE name = 'cleanedData' )
DROP TABLE cleanedData
GO
IF EXISTS ( SELECT * FROM sys.external_file_formats WHERE name = 'ff_delimiterWorking' )
DROP EXTERNAL FILE FORMAT ff_delimiterWorking
GO
CREATE EXTERNAL FILE FORMAT ff_delimiterWorking
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = '|',
        --STRING_DELIMITER = '"',
        FIRST_ROW = 2,
        ENCODING = 'UTF8'
    )
);
GO
CREATE EXTERNAL TABLE delimiterWorking (
    id INT NOT NULL,
    body VARCHAR(8000) NULL
)
WITH (
    LOCATION = 'yourLake/someFolder/delimiterTest6.txt',
    DATA_SOURCE = ds_azureDataLakeStore,
    FILE_FORMAT = ff_delimiterWorking,
    REJECT_TYPE = VALUE,
    REJECT_VALUE = 0
);
GO
SELECT *
FROM delimiterWorking
GO
-- Fix up the data
CREATE TABLE cleanedData
WITH (
    CLUSTERED COLUMNSTORE INDEX,
    DISTRIBUTION = ROUND_ROBIN
)
AS
SELECT
    id,
    body AS originalCol,
    SUBSTRING(body, 2, LEN(body) - 2) AS cleanBody
FROM delimiterWorking
GO
SELECT *
FROM cleanedData
My results: (screenshot omitted)
The string delimiter issue can be avoided by converting the data lake flat file to Parquet format.
Input:
"ID","NAME","COMMENTS"
"1","DAVE","Hi "I am Dave" from"
"2","AARO","AARO"
Steps:
1. Convert the flat file to Parquet format (using Azure Data Factory).
2. Create an external file format in the data lake (assuming the master key and scoped credentials are available):
CREATE EXTERNAL FILE FORMAT PARQUET_CONV
WITH (FORMAT_TYPE = PARQUET,
DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
);
3. Create the external table with FILE_FORMAT = PARQUET_CONV, as sketched below.
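A sketch of step 3, reusing the data source name from the earlier answer; the table name, location, and column definitions here are assumptions for illustration:
CREATE EXTERNAL TABLE parquetNoDelimiter (
    id INT NULL,
    name VARCHAR(100) NULL,
    comments VARCHAR(8000) NULL
)
WITH (
    LOCATION = 'yourLake/someFolder/delimiterTest.parquet', -- assumed path
    DATA_SOURCE = ds_azureDataLakeStore,                    -- from the earlier answer
    FILE_FORMAT = PARQUET_CONV
);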
Output: (screenshot omitted)
I believe this is the best option, as Microsoft currently doesn't have a solution for handling a string delimiter occurring within the data for external tables.

Snowflake - CREATE table with a date variable in its name

I would like to create a table in Snowflake, appending the date to the end of its name. What is the best way to do that?
original table = "DB"."SCHEMA"."CLONEME"
desired new table = "DB"."SCHEMA"."CLONEME_20200812BKP"
Tried setting the date variables, but it didn't work.
First attempt:
set var1= (SELECT TO_CHAR(DATE_TRUNC('DAY',CONVERT_TIMEZONE('UTC', CURRENT_DATE())),'YYYYMMDD'));
set var2 = concat('DB.SCHEMA.CLONEME_',$var1);
create table $var2 clone DB.SCHEMA.CLONEME;
-- and got the following error:
-- SQL compilation error: syntax error line 1 at position 13 unexpected '$var2'.
I'd recommend using the IDENTIFIER function:
https://docs.snowflake.com/en/sql-reference/identifier-literal.html
Example:
CREATE OR REPLACE TABLE CLONEME(
src_string VARCHAR(20));
INSERT INTO CLONEME
VALUES('JKNHJYGHTFGRTYGHJ'), ('ABC123'), (null), ('0123456789');
set var1= (SELECT TO_CHAR(DATE_TRUNC('DAY',CONVERT_TIMEZONE('UTC', CURRENT_DATE())),'YYYYMMDD'));
set var2 = concat('CLONEME_',$var1);
SELECT getvariable('VAR1'), getvariable('VAR2');
--20200812 CLONEME_20200812
create table identifier($var2) clone CLONEME;
--Table CLONEME_20200812 successfully created
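Applied back to the question's fully qualified names (including the BKP suffix), the same pattern would be:
set var1 = (SELECT TO_CHAR(DATE_TRUNC('DAY', CONVERT_TIMEZONE('UTC', CURRENT_DATE())), 'YYYYMMDD'));
set var2 = concat('DB.SCHEMA.CLONEME_', $var1, 'BKP');
create table identifier($var2) clone DB.SCHEMA.CLONEME;
--Table CLONEME_20200812BKP successfully created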

How to load .jsonl into a snowflake table variant?

How can I load a .jsonl file into a Snowflake table VARIANT column as JSON? My attempt:
create or replace table sampleColors (v variant);
insert into sampleColors
select parse_json(column1) as v
from values
  ('{r:255,g:12,b:0} {r:0,g:255,b:0} {r:0,g:0,b:255}') v;
select * from sampleColors;
Error parsing JSON: more than one document in the input
If you want each RGB value in its own row, you need to split the JSONL into a table with one row per JSON document, using a table function like this:
insert into sampleColors
select parse_json(VALUE)
from table(split_to_table('{r:255,g:12,b:0} {r:0,g:255,b:0} {r:0,g:0,b:255} {c:0,m:1,y:1,k:0} {c:1,m:0,y:1,k:0} {c:1,m:1,y:0,k:0}', ' '));
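If the JSONL lives in a file on a stage rather than in a string literal, a COPY with a JSON file format loads one row per document without any manual splitting; a minimal sketch, assuming a stage named @jsonl_stage holding a file colors.jsonl:
create or replace file format jsonl_format type = json;
copy into sampleColors
from @jsonl_stage/colors.jsonl
file_format = (format_name = 'jsonl_format');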
