Query internal stage Snowflake - snowflake-cloud-data-platform

Following the steps in the documentation, I created a stage and a file format in Snowflake, then staged a CSV file with PUT:
USE IA;
CREATE OR REPLACE STAGE csv_format_2;
CREATE OR REPLACE FILE FORMAT csvcol26 type='csv' field_delimiter='|';
PUT file://H:\\CSV_SWF_file_format_stage.csv @IA.public.csv_format_2;
When I tried to query the staged object
SELECT a.$1 FROM @csv_format_2 (FORMAT=>'csvcol26', PATTERN=>'CSV_SWF_file_format_stage.csv.gz') a
I got:
SQL Error [2] [0A000]: Unsupported feature 'TABLE'.
Any idea on this error?

The first argument should be FILE_FORMAT instead of FORMAT:
SELECT a.$1
FROM @csv_format_2 (FILE_FORMAT=>'csvcol26', PATTERN=>'CSV_SWF_file_format_stage.csv.gz') a;
Related: Querying Data in Staged Files
Query staged data files using a SELECT statement with the following syntax:
SELECT [<alias>.]$<file_col_num>[.<element>] [ , [<alias>.]$<file_col_num>[.<element>] , ... ]
FROM { <internal_location> | <external_location> }
[ ( FILE_FORMAT => '<namespace>.<named_file_format>', PATTERN => '<regex_pattern>' ) ]
[ <alias> ]

Found character ':' instead of field delimiter ','

Again I am facing an issue with loading a file into Snowflake.
My file format is:
TYPE = CSV
FIELD_DELIMITER = ','
FIELD_OPTIONALLY_ENCLOSED_BY = '\042'
NULL_IF = ''
ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE
[ COMMENT = '<string_literal>' ]
Now, when I run:
copy into trips from @citibike_trips
file_format=CSV;
I am receiving the following error:
Found character ':' instead of field delimiter ','
File 'citibike-trips-json/2013-06-01/data_01a304b5-0601-4bbe-0045-e8030021523e_005_7_2.json.gz', line 1, character 41
Row 1, column "TRIPS"["STARTTIME":2]
If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client.
I am a little confused about the file I am trying to load. I got the file from a tutorial on YouTube, and in the video it loads properly. However, the stage contains not only CSV data sets but also JSON and Parquet. I think this could be the problem, but I am not sure how to solve it, since the command above already specifies file_format = CSV.
Remove FIELD_OPTIONALLY_ENCLOSED_BY = '\042', recreate the file format, and run the copy statement again.
You're trying to import a JSON file using a CSV file format. In most cases all you need to do is specify JSON as the file type in the COPY INTO statement.
FILE_FORMAT = ( { FORMAT_NAME = '[<namespace>.]<file_format_name>' |
TYPE = { CSV | JSON | AVRO | ORC | PARQUET | XML } [ formatTypeOptions ] } ) ]
You're using CSV, but it should be JSON:
FILE_FORMAT = (TYPE = JSON)
If you're more comfortable using a named file format, use the builder to create a named file format that's of type JSON:
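For example, a minimal sketch (the format name my_json_format is illustrative, not from the original post):

```sql
-- hypothetical name; equivalent to the inline FILE_FORMAT = (TYPE = JSON)
CREATE OR REPLACE FILE FORMAT my_json_format TYPE = JSON;

COPY INTO trips FROM @citibike_trips
  FILE_FORMAT = (FORMAT_NAME = 'my_json_format');
```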
I found a thread in the Snowflake Community forum that explains what I think you might have been facing. There are three different kinds of files in the stage - CSV, Parquet, and JSON. The copy process given in the tutorial expects only CSV. You can use this syntax to exclude the non-CSV files from the copy:
copy into trips from @citibike_trips
on_error = skip_file
pattern = '.*\.csv\.gz$'
file_format = csv;
Using the PATTERN option with a regular expression, you can filter the load down to just the CSV files.
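If you want to check which files the pattern matches before loading, you can list the stage with the same regular expression (LIST also accepts a PATTERN option):

```sql
-- shows only files ending in .csv.gz; the JSON and Parquet files are excluded
LIST @citibike_trips PATTERN = '.*\.csv\.gz$';
```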
https://community.snowflake.com/s/feed/0D53r0000AVKgxuCQD
And if you also run into an error related to timestamps, you will want to set this file format before you do the copy:
create or replace file format citibike.public.csv
  type = 'csv'
  field_optionally_enclosed_by = '\042';

Load JSON data into a Snowflake table

My data is as follows:
[ {
"InvestorID": "10014-49",
"InvestorName": "Blackstone",
"LastUpdated": "11/23/2021"
},
{
"InvestorID": "15713-74",
"InvestorName": "Bay Grove Capital",
"LastUpdated": "11/19/2021"
}]
So far I have tried:
CREATE OR REPLACE TABLE STG_PB_INVESTOR (
  Investor_ID string, Investor_Name string, Last_Updated DATETIME
);  -- created table
create or replace file format investorformat
  type = 'JSON'
  strip_outer_array = true;  -- created file format
create or replace stage investor_stage
  file_format = investorformat;  -- created stage
copy into STG_PB_INVESTOR from @investor_stage
I am getting an error:
SQL compilation error: JSON file format can produce one and only one column of type variant or object or array. Use CSV file format if you want to load more than one column.
You should be loading your JSON data into a table with a single column that is a VARIANT. Once in Snowflake you can either flatten that data out with a view or a subsequent table load. You could also flatten it on the way in using a SELECT on your COPY statement, but that tends to be a little slower.
Try something like this:
CREATE OR REPLACE TABLE STG_PB_INVESTOR_JSON (
var variant
);
create or replace file format investorformat
type = 'JSON'
strip_outer_array = true;
create or replace stage investor_stage
file_format = investorformat;
copy into STG_PB_INVESTOR_JSON from @investor_stage;
create or replace table STG_PB_INVESTOR as
SELECT
var:InvestorID::string as Investor_id,
var:InvestorName::string as Investor_Name,
TO_DATE(var:LastUpdated::string,'MM/DD/YYYY') as last_updated
FROM STG_PB_INVESTOR_JSON;

SQL Compilation error while loading CSV file from S3 to Snowflake

We are facing the below issue while loading a CSV file from S3 to Snowflake:
SQL Compilation error: Insert column value list does not match column list expecting 7 but got 6
We tried removing a column from the table and running the load again, but this time it showed expecting 6 but got 5.
Below are the commands we used to create the stage and run the copy.
create or replace stage mystage
url='s3://test/test'
STORAGE_INTEGRATION = test_int
file_format = (type = csv FIELD_OPTIONALLY_ENCLOSED_BY='"' COMPRESSION=GZIP);
copy into mytable
from @mystage
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY='"' COMPRESSION=GZIP error_on_column_count_mismatch=false TRIM_SPACE=TRUE NULL_IF=(''))
FORCE = TRUE
ON_ERROR = Continue
PURGE=TRUE;
You cannot use MATCH_BY_COLUMN_NAME for CSV files; that is why you get this error.
This copy option is supported for the following data formats:
JSON
Avro
ORC
Parquet
https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html
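A common workaround for CSV, sketched under the assumption that the file's column order is known (the column names here are illustrative), is to select the columns by position in the COPY statement instead:

```sql
-- $1, $2, ... refer to the CSV columns by position in the staged file
copy into mytable (col_a, col_b, col_c)
from (select t.$1, t.$2, t.$3 from @mystage t)
file_format = (type = csv FIELD_OPTIONALLY_ENCLOSED_BY='"' COMPRESSION=GZIP);
```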

Unloading Snowflake table data into S3 in Parquet format

I am trying to unload Snowflake table data into an S3 bucket in Parquet format, but I am getting the below error:
`SQL compilation error: COPY statement only supports simple SELECT from stage statements for import.`
Below is the syntax of my copy statement:
create or replace stage STG_LOAD
url='s3://bucket/foler'
credentials=(aws_key_id='xxxx',aws_secret_key='xxxx')
file_format = (type = PARQUET);
copy into STG_LOAD from
(select OBJECT_CONSTRUCT(country_cd,source)
from table_1
file_format = (type='parquet')
header='true';
Please let me know if I am missing anything here.
You have to reference named stages using the @ symbol. Also, the header option should be true rather than 'true':
copy into @STG_LOAD from
(select OBJECT_CONSTRUCT(country_cd,source)
from table_1 )
file_format = (type='parquet')
header=true;

How to use inline file format to query data from stage in Snowflake data warehouse

Is there any way to query data from a stage with an inline file format without copying the data into a table?
When using a COPY INTO table statement, I can specify an inline file format:
COPY INTO <table>
FROM (
SELECT ...
FROM @my_stage/some_file.csv
)
FILE_FORMAT = (
TYPE = CSV,
...
);
However, the same thing doesn't work when running the same select query directly, outside of the COPY INTO command:
SELECT ...
FROM @my_stage/some_file.csv
(FILE_FORMAT => (
TYPE = CSV,
...
));
Instead, the best I can do is to use a pre-existing file format:
SELECT ...
FROM @my_stage/some_file.csv
(FILE_FORMAT => 'my_file_format');
But this doesn't allow me to programmatically change the file format when creating the query. I've tried every syntax variation possible, but this just doesn't seem to be supported right now.
I don't believe it is possible, but as a workaround, can't you create the file format programmatically, use that named file format in your SQL, and then, if necessary, drop it?
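A sketch of that workaround (the format name tmp_csv_format is illustrative):

```sql
-- create the format with whatever options you need at query time
CREATE OR REPLACE FILE FORMAT tmp_csv_format
  TYPE = CSV FIELD_DELIMITER = ',';

SELECT t.$1, t.$2
FROM @my_stage/some_file.csv (FILE_FORMAT => 'tmp_csv_format') t;

-- drop it once the query is done
DROP FILE FORMAT IF EXISTS tmp_csv_format;
```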
