File format creation using Python in Snowflake

We are using Python for data loading, so we need to create a file format in Snowflake using Python. I tried creating the file format through Python, but it errored out.
Could someone please share a sample Python script for creating a file format?

You can execute your file format DDL statement via a Python connector cursor:
import snowflake.connector as sfc

# Snowflake connection setup (pwd holds the account password)
cnx = sfc.connect(
    user='user',
    password=pwd,
    account='account',
    warehouse='warehouse',
    database='database',
    schema='schema',
    role='role')

# the DDL is passed to execute() as a single string
cnx.cursor().execute(
    "create or replace file format mycsvformat "
    "type = 'CSV' field_delimiter = '|' skip_header = 1;")
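If you want to confirm the format was created, you can run a quick check through the same cursor; the name below simply matches the example above:
show file formats like 'mycsvformat';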

Related

Found character ':' instead of field delimiter ','

Again, I am facing an issue with loading a file into Snowflake.
My file format is:
TYPE = CSV
FIELD_DELIMITER = ','
FIELD_OPTIONALLY_ENCLOSED_BY = '\042'
NULL_IF = ''
ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE
[ COMMENT = '<string_literal>' ]
Now, by running:
copy into trips from @citibike_trips
file_format=CSV;
I am receiving the following error:
Found character ':' instead of field delimiter ','
File 'citibike-trips-json/2013-06-01/data_01a304b5-0601-4bbe-0045-e8030021523e_005_7_2.json.gz', line 1, character 41
Row 1, column "TRIPS"["STARTTIME":2]
If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client.
I am a little confused about the file I am trying to load. I got the file from a tutorial on YouTube, and in the video it works properly. However, there are not only CSV datasets inside, but also JSON and Parquet. I think this could be the problem, but I am not sure how to solve it, since the command above has file_format = CSV.
Remove FIELD_OPTIONALLY_ENCLOSED_BY = '\042', recreate the file format, and run the COPY statement again.
You're trying to import a JSON file using a CSV file format. In most cases all you need to do is specify JSON as the file type in the COPY INTO statement.
FILE_FORMAT = ( { FORMAT_NAME = '[<namespace>.]<file_format_name>' |
                  TYPE = { CSV | JSON | AVRO | ORC | PARQUET | XML } [ formatTypeOptions ] } )
You're using CSV, but it should be JSON:
FILE_FORMAT = (TYPE = JSON)
If you're more comfortable using a named file format, use the builder (or plain DDL) to create a named file format of type JSON and reference it by name in the COPY statement, for example:
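A minimal sketch; the format name my_json_format is a placeholder, and the stage is the one from the question:
create or replace file format my_json_format
type = 'JSON';
copy into trips from @citibike_trips
file_format = (format_name = 'my_json_format');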
I found a thread in the Snowflake Community forum that explains what I think you might have been facing. There are now three different kinds of files in the stage: CSV, Parquet, and JSON. The copy process given in the tutorial expects there to be only CSV. You can use this syntax to exclude non-CSV files from the copy:
copy into trips from @citibike_trips
on_error = skip_file
pattern = '.*\.csv\.gz$'
file_format = csv;
Using the PATTERN option with a regular expression, you can filter the load so that only the CSV files are picked up.
https://community.snowflake.com/s/feed/0D53r0000AVKgxuCQD
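If you want to see which files the pattern will match before running the copy, Snowflake's LIST command accepts the same PATTERN option; a quick check against the stage from the question:
list @citibike_trips pattern = '.*\.csv\.gz$';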
And if you also run into an error related to timestamps, you will want to set this file format before you do the copy:
create or replace file format citibike.public.csv
type = 'csv'
field_optionally_enclosed_by = '\042';
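If the timestamp values themselves are in a non-default layout, the same file format can also carry a TIMESTAMP_FORMAT option; a hedged variant of the statement above, where the format string is only illustrative:
create or replace file format citibike.public.csv
type = 'csv'
field_optionally_enclosed_by = '\042'
timestamp_format = 'YYYY-MM-DD HH24:MI:SS';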
S3 to Snowflake (loading CSV data in S3 to a Snowflake table throws the following error)

How to solve error "Field delimiter ',' found while expecting record delimiter '\n'" while loading json data to the stage

I am trying to use the COPY INTO command to load data from S3 into Snowflake.
Below are the steps I followed to create the stage and load the file from the stage into Snowflake.
JSON file
{
"Name":"Umesh",
"Desigantion":"Product Manager",
"Location":"United Kingdom"
}
create or replace stage emp_json_stage
url='s3://mybucket/emp.json'
credentials=(aws_key_id='my id' aws_secret_key='my key');
-- create the table with a VARIANT column
CREATE TABLE emp_json_raw (
json_data_raw VARIANT
);
-- load data from the stage into Snowflake
COPY INTO emp_json_raw from @emp_json_stage;
I am getting the error below:
Field delimiter ',' found while expecting record delimiter '\n' File
'emp.json', line 2, character 18 Row 2, column
"emp_json_raw"["JSON_DATA_RAW":1]
I am using a simple JSON file, and I don't understand this error.
What causes it and how can I solve it?
The file format is not specified, so it defaults to CSV, hence the error.
Try this:
COPY INTO emp_json_raw
from @emp_json_stage
file_format=(TYPE=JSON);
There are other options besides TYPE that can be specified with file_format. Refer to the documentation here: https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html#type-json
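Once the load succeeds, you can pull individual fields out of the VARIANT column with Snowflake's path syntax; a small sketch using the table and field names from the question:
select json_data_raw:"Name"::string as name,
json_data_raw:"Location"::string as location
from emp_json_raw;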
try:
file_format = (type = csv field_optionally_enclosed_by='"')
The default settings do not expect the " wrapping around your data.
So you could strip all the " characters, or just set field_optionally_enclosed_by to ". This does mean that if your data has " in it, things get messy.
https://docs.snowflake.com/en/user-guide/getting-started-tutorial-copy-into.html
https://docs.snowflake.com/en/sql-reference/sql/create-file-format.html#type-csv
Also, make it a standard practice to specify the type of the file, whether it is CSV, JSON, AVRO, Parquet, etc.
https://docs.snowflake.com/en/sql-reference/sql/create-file-format.html

Load data into Snowflake using Pentaho

I am using Pentaho 7.1 and trying to load data into Snowflake. The SQL runs fine on Snowflake, but in Pentaho I am getting the error:
Couldn't execute SQL: copy into "DEMO_DB"."PUBLIC"."STG_DIMACTIVITY" Table 'DEMO_DB.PUBLIC.STG_DIMACTIVITY' does not exist
The SQL used is:
copy into "DEMO_DB"."PUBLIC"."STG_DIMACTIVITY"
from @my_s3_stage
FILES = ('MB_ACTIVITY.txt_0')
--pattern='.*MB_ACTIVITY.txt_0.*'
file_format = (type = csv field_delimiter = '|' skip_header = 1)
force=true;
Please let me know what I am missing here. Any help is much appreciated.

C# Filestream to SQL Server database

I want to create a file in SQL Server from a string. I can't figure out how to put it into the database. After reading, it seems it has something to do with FILESTREAM. If so, then once the stream is created, how do I put that into my DB as a file?
FileStream fs1 = new FileStream("somefilename", FileMode.Create, FileAccess.Write);
StreamWriter writer = new StreamWriter(fs1);
writer.WriteLine("file content line 1");
writer.Close();
What I am trying to achieve is to create a file from a string. I believe that my DB is already set up for files, as we have a SaveFile method that works:
HttpPostedFile file = uploadedFiles[i];
if (file.ContentLength < 30000000)
{
//DOFileUpload File = CurrentBRJob.SaveFile(CurrentSessionContext.Owner.ContactID, Job.JobID, fileNew.PostedFile);
DOFileUpload File = CurrentBRJob.SaveFile(CurrentSessionContext.Owner.ContactID, Job.JobID, file, file.ContentLength, CurrentSessionContext.CurrentContact.ContactID);
DOJobFile jf = CurrentBRJob.CreateJobFile(CurrentSessionContext.Owner.ContactID, Job.JobID, File.FileID);
CurrentBRJob.SaveJobFile(jf);
}
What I want to do is this: instead of the user selecting a file for us to save to the DB, I want to create that file internally from strings and then save it to the DB.
Create a column of any one of the types below, then use an ADO.NET SqlCommand to write the value to the database.
varbinary(max) - for binary data
nvarchar(max) - for Unicode text data (i.e. if the text involves Unicode characters)
varchar(max) - for non-Unicode text data
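As a minimal sketch of the database side (the table and column names here are made up for illustration; the insert itself would be issued from C# through a parameterized SqlCommand):
-- column that can hold the generated file content as text
ALTER TABLE JobFiles ADD FileContent nvarchar(max);
-- the parameterized statement the SqlCommand would execute
INSERT INTO JobFiles (JobID, FileName, FileContent)
VALUES (@JobID, @FileName, @FileContent);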

How can I import a PostgreSQL .pgc file into a Postgres DB

I am expecting a data set to be supplied for data migration into a new system. The legacy vendor has supplied me with a .pgc file.
What is this? Is this a data file? Google tells me it's an embedded SQL program.
How can I import this to my local Postgres DB to get at the data set?
The output of the command file filename.pgc is:
file energyresourcingprod.pgc
energyresourcingprod.pgc: PostgreSQL custom database dump - v1.12-0
The first few lines from text editor are:
PGDMPrenergyresourcingprod9.2.49.2.4∑±00ENCODINGENCODINGSET client_encoding = 'UTF8';
false≤00
STDSTRINGS
STDSTRINGS)SET standard_conforming_strings = 'off';
false≥126214581287energyresourcingprodDATABASErCREATE DATABASE energyresourcingprod WITH TEMPLATE = template0 ENCODING = 'UTF8' LC_COLLATE = 'C' LC_CTYPE = 'C';
$DROP DATABASE energyresourcingprod;
carerixfalse26152200publicSCHEMACREATE SCHEMA public;
DROP SCHEMA public;
The file is 300 MB and the majority of it contains hashed/base64(?) content:
ßû+˜)™yä⁄%(»j9≤\§^¸S∏Cîó|%ëflsfi∆†p1ñºúíñ Í∆î≈3õµ=qn
Mµ¢©]Q,uÆ<*Å™ííP’ÍOõ…∫U1Eu͡ IîfiärJ¥›•$ø...
...
Many Thanks
It's just a plain PostgreSQL dump.
Use pg_restore to load it into a database.
It's weird that they used that filename, but ultimately insignificant.
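A minimal sketch of the restore, assuming a local target database named mydb (any empty database will do):
createdb mydb
pg_restore -d mydb energyresourcingprod.pgc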
