SnowSQL JSONL vs JSON - snowflake-cloud-data-platform

Is it possible to load JSONL using the JSON file format type='JSON'? Or would I need to convert the JSONL to JSON?
As it stands right now, I am able to stage the data, but when I try to copy it into a table, the query errors out stating there is a data error.

You can use a CSV file format to read each line of the JSONL (with a field delimiter that will not appear in the data), then parse the lines using PARSE_JSON.
Sample test.jsonl:
{ "id":1, "name":"Gokhan"}
{ "id":2, "name":"Jack"}
{ "id":3, "name":"Joe"}
Sample file format object:
create file format jsonl type=CSV field_delimiter = '*xyz*';
Reading using parse_json:
select parse_json($1) js, js:id, js:name from @my_stage (file_format=>'jsonl');
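For completeness, here is a minimal sketch of landing the parsed lines in a table; the table name people and the test.jsonl path are assumptions, not part of the original question:
-- One VARIANT column per JSONL line.
create or replace table people (raw variant);
-- Each line arrives as a single CSV field ($1) and is parsed into a VARIANT.
insert into people
select parse_json($1)
from @my_stage/test.jsonl (file_format=>'jsonl');
-- Typed access afterwards.
select raw:id::int as id, raw:name::string as name from people;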

Related

insert csv file into snowflake as variant not working

I am trying to copy CSV data into a Snowflake table with only one column (variant). When I run the COPY INTO statement, my variant column only shows data from the first column of the CSV. I'm not sure what the problem is. Please help.
Create or replace table name(
RAW variant )
COPY INTO db_name.table_name
FROM (SELECT s.$1::VARIANT FROM @stage_name.csv s);
Assuming your .csv file consists of multiple valid JSON documents, try using a file format of type JSON instead of CSV. See https://docs.snowflake.com/en/sql-reference/sql/create-file-format.html.
Alternatively, use PARSE_JSON in your SELECT.
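A minimal sketch of the first suggestion, with the file format name assumed and the stage and table names taken from the question:
-- A JSON file format parses each staged document directly into the single
-- VARIANT column, so no SELECT transformation is needed.
create or replace file format my_json_format type = JSON;
copy into db_name.table_name
from @stage_name
file_format = (format_name = 'my_json_format');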

Snowflake JSON import into Stage with double quote in column value

I've set the stage file format to JSON and I'm trying to import the data below from an Azure blob, but I'm getting an error. See the second line: the last SQLTableName value contains a double quote.
{"SQLDatabaseName":"TV","SQLSchemaName":"sec","SQLTableName":"UserProfile"}
{"SQLDatabaseName":"LW","SQLSchemaName":"sec","SQLTableName":"User"Profile"}
In SQL Server I am exporting each row as JSON in the format above in order to move the data into Snowflake.
Your data is not valid JSON; the second line fails validation.
Check your sample data on this page: JSON Formatter & Validator
You should use an escape character when exporting.
It does not validate:
SELECT PARSE_JSON('{"SQLDatabaseName":"LW","SQLSchemaName":"sec","SQLTableName":"User"Profile"}');
If you change your data and add an escape character, it will validate:
SELECT PARSE_JSON('{"SQLDatabaseName":"LW","SQLSchemaName":"sec","SQLTableName":"User\'Profile"}');
SELECT PARSE_JSON('{"SQLDatabaseName":"LW","SQLSchemaName":"sec","SQLTableName":"User\\"Profile"}');
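On the export side, a hedged sketch of producing correctly escaped JSON from SQL Server (requires SQL Server 2016 or later; the table name dbo.SourceTable is a placeholder). FOR JSON escapes embedded double quotes as \" automatically:
-- One JSON object per source row; embedded quotes in SQLTableName come out as \".
SELECT (SELECT t.SQLDatabaseName,
               t.SQLSchemaName,
               t.SQLTableName
        FOR JSON PATH, WITHOUT_ARRAY_WRAPPER) AS json_line
FROM dbo.SourceTable AS t;
-- STRING_ESCAPE(col, 'json') is an alternative if the JSON string is built by hand.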

Snowpipe fails to read AVRO compressed by DEFLATE exported from BigQuery

I am trying to import data exported from BigQuery as AVRO and compressed with DEFLATE. Besides NONE, DEFLATE is the only compression codec common to both BigQuery and Snowflake.
I am exporting one of the publicly available datasets, bigquery-public-data:covid19_open_data.covid19_open_data, which has 13,343,598 rows, using the following command:
bq extract --destination_format=AVRO --compression=DEFLATE bigquery-public-data:covid19_open_data.covid19_open_data gs://staging/covid19_open_data/avro_deflate/covid19_open_data_2_*.avro
The command creates 17 files in GCP. When I query the data in the files with command:
SELECT count(*) FROM @shared.data_warehouse_ext_stage/covid19_open_data/avro_deflate;
I only get a count of 6,845,021 rows. To troubleshoot the error in the pipe, I issue the command:
SELECT * from table(information_schema.copy_history(table_name=>'covid19_open_data', start_time=> dateadd(hours, -1, current_timestamp())));
The error reported by the pipeline is as follows:
Invalid data encountered during decompression for file: 'covid19_open_data_3_000000000006.avro',compression type used: 'DEFLATE', cause: 'data error'
The SQL for the File Format command is:
CREATE OR REPLACE FILE FORMAT monitoring_blocking.dv_avro_deflate_format TYPE = AVRO COMPRESSION = DEFLATE;
I know the problem is related only to the DEFLATE compression. There are only two AVRO compression codecs common to both BigQuery and Snowflake: NONE and DEFLATE. I also created two other pipes, one with an AVRO file format and compression NONE and the second with CSV and GZIP; both load data into the table. The two AVRO pipes mirror each other except for the file format. Here is a snippet of the SQL for the pipe:
CREATE OR REPLACE PIPE covid19_open_data_avro
AUTO_INGEST = TRUE
INTEGRATION = 'GCS_PUBSUB_DATA_WAREHOUSE_NOTIFICATION_INT' AS
COPY INTO covid19_open_data(
location_key
,date
,place_id
,wikidata_id
...
)
FROM
(SELECT
$1:location_key
,$1:date AS date
,$1:place_id AS place_id
,$1:wikidata_id AS wikidata_id
...
FROM @shared.staging/covid19_open_data/avro_deflate)
FILE_FORMAT = monitoring_blocking.dv_avro_deflate_format;
The problem lay within Snowflake. When we changed the compression in the FILE FORMAT definition from DEFLATE to AUTO, it worked. We changed:
CREATE OR REPLACE FILE FORMAT my_schema.avro_compressed_format
TYPE = AVRO
COMPRESSION = DEFLATE;
to
CREATE OR REPLACE FILE FORMAT my_schema.avro_compressed_format
TYPE = AVRO
COMPRESSION = AUTO;
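After switching the file format to COMPRESSION = AUTO, a hedged sketch of getting the already-staged files loaded and verifying the result, reusing the pipe and table names from the question (if the pipe has already recorded the failed files, re-staging them or recreating the pipe may be needed instead):
-- Queue any staged files the pipe has not loaded yet (files staged within the last 7 days).
ALTER PIPE covid19_open_data_avro REFRESH;
-- Confirm the files now load without the decompression error.
SELECT *
FROM TABLE(information_schema.copy_history(
    table_name => 'covid19_open_data',
    start_time => dateadd(hours, -1, current_timestamp())));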

SNOWFLAKE Error parsing JSON: incomplete object value when parsing a Json file in snowflake worksheet. (json file is verified and correct)

The problem is that I have a JSON file stored in a stage in one of my databases (newly created). I am not performing any database-related activities; I'm just trying to query the JSON data using:
SELECT parse_json($1):order_id FROM @my_stage/mahdi_test.json.gz t;
and here is a sample from mahdi_test.json:
{"order_id": 67, "file_id": *****, "file_name": "name.pdf", "file_type": "pdf", "language_id": 1, "created_dt": "2030-11-17 19:39:25", "delivery_id": *****},
(the "*" are just to avoid showing actual data.)
The JSON file contains multiple lines just like the sample above, but the result of the query is:
"Error parsing JSON: incomplete object value, pos 17"
The trickiest part is that I copied the same JSON file into another database's stage (a database that was created earlier, and not by me), switched the database in the panel at the top right of the worksheet to that older database, and ran the exact same query against the exact same JSON file. This time it worked and showed me the results.
What is causing this problem, and how can I make the new database behave like the other one? Clearly there is nothing wrong with the JSON file itself, because it worked on the legacy database.
The answer to this question ended up being that the stage did not have a file format specified. Adding a file format that specified a JSON format to the stage fixed the issue.
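A minimal sketch of that fix, using the stage and file names from the question:
-- Give the stage a JSON file format so each staged document is read as JSON.
ALTER STAGE my_stage SET FILE_FORMAT = (TYPE = 'JSON');
-- With a JSON file format, $1 is already a VARIANT, so the field can be read
-- directly (PARSE_JSON is no longer required):
SELECT $1:order_id FROM @my_stage/mahdi_test.json.gz t;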

Stream Analytics GetArrayElements as String

I have a Stream Analytics job that gets data from an external source (I do not have a say in how the data is formatted). I am trying to import the data into my data lake, storing it as JSON. This works fine, but I also want the output as CSV, and this is where I am having trouble.
As the input data has an array in one of the columns, the JSON output recognizes it and writes the values correctly, i.e. placed in brackets [A, B, C], but in the CSV output the column is just represented by the word "Array". I thought I would convert it to XML, use STUFF and get the values on one line, but Stream Analytics does not accept a SELECT statement inside a CROSS APPLY.
Has anyone worked with Stream Analytics exporting data to CSV when it has an array column? If so, how did you manage to export the array values?
Sample data:
[
{"GID":"10","UID":1,"SID":"5400.0","PG:["75aef","e5f8e"]},
{"GID":"10","UID":2,"SID":"4400.0","PG:["75aef","e5f8e","6d793"]}
]
PG is the column I am trying to extract, so the output CSV should look something like.
GID|UID|SID|PG
10|1|5400.0|75aef,e5f8e
10|2|4400.0|75aef,e5f8e,6d793
This is the query I am using,
SELECT
D.GID ,
D.UID ,
D.SID ,
A.ArrayValue
FROM
dummy AS D
CROSS APPLY GetArrayElements(D.PG) AS A
As you could imagine, this gives me results in this format.
GID|UID|SID|PG
10|1|5400.0|75aef
10|1|5400.0|e5f8e
10|2|4400.0|75aef
10|2|4400.0|e5f8e
10|2|4400.0|6d793
As Pete M said, you could try creating a JavaScript user-defined function that converts the array to a string, and then call this user-defined function in your query.
JavaScript user-defined function:
function main(inputobj) {
    // Array.prototype.toString() joins the elements with commas,
    // e.g. ["75aef","e5f8e"] -> "75aef,e5f8e"
    var outstring = inputobj.toString();
    return outstring;
}
Call UDF in query:
SELECT
TI.GID,TI.UID,TI.SID,udf.extractdatafromarray(TI.PG)
FROM
[TEST-SA-DEMO-BLOB-Input] as TI
Result: the PG column now comes back as a comma-separated string (e.g. 75aef,e5f8e), matching the desired output above.
