Data Load into Snowflake table - Geometry data

I have a requirement to load a csv file which contains geometry data into a Snowflake table.
I am using the data load option that is available in the Snowflake WebGUI.
The sample geometry data is as below.
LINESTRING (-118.808186210713 38.2287933407744, -118.808182249848 38.2288155788245, -118.807079844554 38.2293234553217, -118.806532314702 38.229961732287, -118.80625724007 38.2306350645631, -118.805071970015 38.231849721603, -118.804097093763 38.2325380450286, -118.803504299857 38.2328501734747, -118.802726055048 38.2332839062976, -118.802126140311 38.2334442483131, -118.801758172942 38.233542312624)
Since commas are present in the geometry data, the data load option is treating them as column separators and throwing an error.
I tried updating the csv file to wrap the value in the TO_GEOGRAPHY function, as below, but still no luck.
TO_GEOGRAPHY(LINESTRING (-118.808186210713 38.2287933407744, -118.808182249848 38.2288155788245, -118.807079844554 38.2293234553217, -118.806532314702 38.229961732287, -118.80625724007 38.2306350645631, -118.805071970015 38.231849721603, -118.804097093763 38.2325380450286, -118.803504299857 38.2328501734747, -118.802726055048 38.2332839062976, -118.802126140311 38.2334442483131, -118.801758172942 38.233542312624))
Any pointers on this would be appreciated. The full content of the csv file is below:
ID," GEOGRAPHIC_ROUTE",Name
12421,"LINESTRING (-118.808186210713 38.2287933407744, -118.808182249848 38.2288155788245, -118.807079844554 38.2293234553217, -118.806532314702 38.229961732287, -118.80625724007 38.2306350645631, -118.805071970015 38.231849721603, -118.804097093763 38.2325380450286, -118.803504299857 38.2328501734747, -118.802726055048 38.2332839062976, -118.802126140311 38.2334442483131, -118.801758172942 38.233542312624)",Winston

As I see it, the fields are enclosed in double quotes to prevent misinterpretation of the comma characters in the geographic data (which is good!)
Could you set FIELD_OPTIONALLY_ENCLOSED_BY to '"' (double quote) for your file format and try to re-import the file?
https://docs.snowflake.com/en/sql-reference/sql/create-file-format.html#format-type-options-formattypeoptions
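A minimal sketch of such a named file format (the format name here is just an illustration):
-- hypothetical name; adjust to your naming convention
create or replace file format my_csv_format
  type = csv
  field_optionally_enclosed_by = '"'
  skip_header = 1;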
I am able to ingest the sample data using the following COPY command:
copy into TEST_TABLE from @my_stage
FILE_FORMAT = (type = csv, FIELD_OPTIONALLY_ENCLOSED_BY = '"', skip_header = 1);
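If the target column is of type GEOGRAPHY rather than text, one pattern is to land the WKT string in a varchar column first and convert it afterwards with TO_GEOGRAPHY; a minimal sketch with hypothetical table names:
-- hypothetical staging table: the WKT lands as plain text first
create or replace table test_table_raw (id number, geographic_route varchar, name varchar);

copy into test_table_raw from @my_stage
file_format = (type = csv field_optionally_enclosed_by = '"' skip_header = 1);

-- convert the WKT text to GEOGRAPHY on the way into the final table
insert into TEST_TABLE (id, geographic_route, name)
select id, to_geography(geographic_route), name from test_table_raw;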

Related

How to read the headers of a CSV file in a Snowflake stage

I am learning Snowflake. I was trying to read the headers of a CSV file stored in an AWS bucket. I used the metadata fields, which required me to input $1, $2, and so on as column names to obtain the headers (for the COPY INTO table creation).
Is there a better alternative to this?
Statement :
select
Top 100 metadata$filename,
metadata$file_row_number,
t.$1,
t.$2,
t.$3,
t.$4,
t.$5,
t.$6
from
@aws_stage t
where
metadata$filename = 'OrderDetails.csv'

How to solve the error "Field delimiter ',' found while expecting record delimiter '\n'" while loading JSON data to the stage

I am trying to use the COPY INTO command to load data from S3 into Snowflake.
Below are the steps I followed to create the stage and load the file from the stage into Snowflake.
JSON file
{
"Name":"Umesh",
"Desigantion":"Product Manager",
"Location":"United Kingdom"
}
create or replace stage emp_json_stage
url='s3://mybucket/emp.json'
credentials=(aws_key_id='my id' aws_secret_key='my key');
-- create the table with a variant column
CREATE TABLE emp_json_raw (
json_data_raw VARIANT
);
-- load data from the stage into Snowflake
COPY INTO emp_json_raw from @emp_json_stage;
I am getting the below error:
Field delimiter ',' found while expecting record delimiter '\n' File
'emp.json', line 2, character 18 Row 2, column
"emp_json_raw"["JSON_DATA_RAW":1]
I am using a simple JSON file, and I don't understand this error.
What causes it and how can I solve it?
The file format is not specified, so it defaults to CSV, hence the error.
Try this:
COPY INTO emp_json_raw
from @emp_json_stage
file_format=(TYPE=JSON);
There are other options besides TYPE that can be specified with file_format. Refer to the documentation here: https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html#type-json
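For example, if the file held a JSON array of objects rather than one object per record, the STRIP_OUTER_ARRAY option would load each array element as a separate row; a minimal sketch:
COPY INTO emp_json_raw
from @emp_json_stage
file_format=(TYPE=JSON STRIP_OUTER_ARRAY=TRUE);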
try:
file_format = (type = csv field_optionally_enclosed_by='"')
The default settings do not expect the " wrapping around your data.
So you could strip all the " characters, or just set field_optionally_enclosed_by to a ". This does mean that if your data contains " characters, things get messy.
https://docs.snowflake.com/en/user-guide/getting-started-tutorial-copy-into.html
https://docs.snowflake.com/en/sql-reference/sql/create-file-format.html#type-csv
Also, it is standard practice to specify the file type explicitly, whether it is CSV, JSON, Avro, Parquet, etc.
https://docs.snowflake.com/en/sql-reference/sql/create-file-format.html
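To illustrate the "messy" case: with FIELD_OPTIONALLY_ENCLOSED_BY = '"', Snowflake expects an embedded double quote inside an enclosed field to be escaped by doubling it, so a row like the first one below loads cleanly, while the second does not:
1,"he said ""hello""",ok
2,"he said "hello"",fails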

Data not loaded correctly into snowflake table from S3 bucket

I am having trouble loading data from an Amazon S3 bucket into a Snowflake table. This is my command:
copy into myTableName
from 's3://dev-allocation-storage/data_feed/'
credentials=(aws_key_id='***********' aws_secret_key='**********')
PATTERN='.*/.*/.*/.*'
file_format = (type = csv field_delimiter = '|' skip_header = 1 error_on_column_count_mismatch=false );
I have 3 CSV files in my bucket, and they are all being loaded into the table. But my target table has 8 columns, and all of the data is being loaded into the first column as one value.
Check that you do not have each row enclosed in double quotes, i.e. something like "f1|f2|...|f8". A fully quoted row is treated as one single column value, unlike "f1"|"f2"|...|"f8".
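One way to verify is to query the staged files directly before loading; a minimal sketch, assuming a hypothetical named stage and file format over the same bucket:
create or replace file format pipe_csv_format
  type = csv field_delimiter = '|' skip_header = 1;

create or replace stage dev_allocation_stage
  url = 's3://dev-allocation-storage/data_feed/'
  credentials = (aws_key_id='***********' aws_secret_key='**********')
  file_format = (format_name = 'pipe_csv_format');

-- if $2 and $3 come back NULL on every row, each line is being parsed as one field
select $1, $2, $3 from @dev_allocation_stage limit 10;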

Snowflake-Internal Stage data load error: How to load "\" character

In a file, a few of the rows have \ in a column value. For example, I have rows in the below format:
101,Path1,Z:\VMC\PSPS,abc
102,Path5,C:\wintm\PSPS,abc
I was wondering how to load the \ character. This is my current command:
COPY INTO TEST_TABLE from @database.schema.stage_name FILE_FORMAT = ( TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '\"' SKIP_HEADER = 1 );
Is there anything I can add to the file_format line?
Are you still getting this error? I just tried to recreate it by creating a CSV based on your sample data and a test table. I loaded the CSV into an internal stage and then ran your COPY command, and it worked for me.
Could you provide more details on the error you are facing? Perhaps there was something off with your table definition.
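If the backslashes were being dropped rather than raising an error, the option to check is ESCAPE_UNENCLOSED_FIELD: its default is the backslash itself, so \ in unenclosed fields is consumed as an escape character. Setting it to NONE preserves the literal backslashes; a minimal sketch:
COPY INTO TEST_TABLE from @database.schema.stage_name
FILE_FORMAT = ( TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '\"' ESCAPE_UNENCLOSED_FIELD = NONE SKIP_HEADER = 1 );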

How to get the header names of a staged csv file on snowflake?

Is it possible to get the header of a staged csv file on Snowflake into an array?
I need to loop over all fields to insert data into our data vault model, and I really need to get these column names into an array.
Actually it was solved by using the following query over a staged file in a JavaScript stored procedure:
var get_length_and_columns_array = "select array_size(split($1,',')) as NO_OF_COL, "+
"split($1,',') as COLUMNS_ARRAY from "+FILE_FULL_PATH+" "+
"(file_format=>"+ONE_COLUMN_FORMAT_FILE+") limit 1";
The ONE_COLUMN_FORMAT_FILE format puts all fields into one column in order to make this query work:
CREATE FILE FORMAT ONE_COLUMN_FILE_FORMAT
TYPE = 'CSV' COMPRESSION = 'AUTO' FIELD_DELIMITER = '|' RECORD_DELIMITER = '\n'
SKIP_HEADER = 0
FIELD_OPTIONALLY_ENCLOSED_BY = 'NONE'
TRIM_SPACE = FALSE ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE
ESCAPE = 'NONE' ESCAPE_UNENCLOSED_FIELD = '\134'
DATE_FORMAT = 'AUTO' TIMESTAMP_FORMAT = 'AUTO' NULL_IF = ('\\N');
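Outside the stored procedure, the same query can be run directly against a staged file; a minimal sketch, assuming a file at @my_stage/data.csv:
select array_size(split($1, ',')) as NO_OF_COL,
       split($1, ',') as COLUMNS_ARRAY
from @my_stage/data.csv (file_format => 'ONE_COLUMN_FILE_FORMAT')
limit 1;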
Yes, you can query the following metadata of your staged files:
METADATA$FILENAME: Name of the staged data file the current row belongs to. Includes the path to the data file in the stage.
METADATA$FILE_ROW_NUMBER: Row number for each record in the staged data file.
So the metadata alone does not give you the header names. But there is the parameter SKIP_HEADER that can be used in your COPY INTO command, so my suggestion for a workaround (sketched below) is:
Copy your data into a table with SKIP_HEADER = 0, thus also loading your header into your table as regular column values
Query the first row, which contains the column names
Use this as input for further processing
More info about the parameter is in the COPY INTO documentation: https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html
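A minimal sketch of that workaround, with hypothetical table and stage names:
-- land every line, including the header, as regular rows,
-- keeping the file row number so the header row can be found again
create or replace table raw_lines (row_num number, c1 string, c2 string, c3 string);

copy into raw_lines
from (select metadata$file_row_number, $1, $2, $3 from @my_stage/data.csv)
file_format = (type = csv skip_header = 0);

-- the header is the record with row number 1
select c1, c2, c3 from raw_lines where row_num = 1;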
Dynamically generating the column list from a csv file is not currently available in Snowflake or most platforms, AFAIK.
csv is not the ideal format for this kind of schema-on-read operation.
If you are able to work with your input files, I would suggest converting the csv to JSON. If you use JSON instead, you can then use Snowflake to process the file.
Here is some context:
Load CSV files with dynamic column headers Not Supported
Example of loading json data
Example Converting CSV to JSON with Pandas
import pandas as pd

filepath = '/home/username/data/sales.csv'
jsonfilepath = filepath.replace('.csv', '.json')
df = pd.read_csv(filepath)
# alternative: df.to_json(jsonfilepath, orient="table", date_format="iso", index=False)
df.to_json(jsonfilepath, orient="records", date_format="iso")
print("Input File: {}\r\nOutput File: {}".format(filepath, jsonfilepath))
Example Converting CSV to JSON with csvkit
csvjson -i 4 '/home/username/data/sales.csv' > '/home/username/data/sales.csvkit.json'
Querying Semi-Structured Data in Snowflake
Loading JSON Data into Snowflake
/* Create a target relational table for the JSON data. The table is temporary, meaning it persists only for */
/* the duration of the user session and is not visible to other users. */
create or replace temporary table home_sales (
city string,
zip string,
state string,
type string default 'Residential',
sale_date timestamp_ntz,
price string
);
/* Create a named file format with the file delimiter set as none and the record delimiter set as the new */
/* line character. */
/* */
/* When loading semi-structured data (e.g. JSON), you should set CSV as the file format type (default value). */
/* You could use the JSON file format, but any error in the transformation would stop the COPY operation, */
/* even if you set the ON_ERROR option to continue or skip the file. */
create or replace file format sf_tut_csv_format
field_delimiter = none
record_delimiter = '\\n';
/* Create a temporary internal stage that references the file format object. */
/* Similar to temporary tables, temporary stages are automatically dropped at the end of the session. */
create or replace temporary stage sf_tut_stage
file_format = sf_tut_csv_format;
/* Stage the data file. */
/* */
/* Note that the example PUT statement references the macOS or Linux location of the data file. */
/* If you are using Windows, execute the following statement instead: */
-- PUT file://%TEMP%/sales.json @sf_tut_stage;
put file:///tmp/sales.json @sf_tut_stage;
/* Load the JSON data into the relational table. */
/* */
/* A SELECT query in the COPY statement identifies a numbered set of columns in the data files you are */
/* loading from. Note that all JSON data is stored in a single column ($1). */
copy into home_sales(city, state, zip, sale_date, price)
from (select substr(parse_json($1):location.state_city,4), substr(parse_json($1):location.state_city,1,2),
parse_json($1):location.zip, to_timestamp_ntz(parse_json($1):sale_date), parse_json($1):price
from @sf_tut_stage/sales.json.gz t)
on_error = 'continue';
/* Query the relational table */
select * from home_sales;
