Using pattern while loading a CSV file from S3 to Snowflake

The below COPY command is not working; please correct me if something is wrong.
copy into mytable from @mystage pattern='20.*csv.gz'
Here I am trying to load the files whose names start with 20. There is a mix of files with names like 2021myfile.csv.gz and myfile202109.csv.gz, and the above command is not loading any files even though there are files that start with 20.
If I use pattern='.*20.*csv.gz', it picks up all the files, which is wrong; I need to load only the files that start with 20.
Thanks!

This is because the pattern clause is a regex expression, and Snowflake matches it against the full relative path of each file, not just the file name.
Try this:
copy into mytable from @mystage pattern='(.*/)?20[^/]*\\.csv\\.gz'
The optional (.*/)? prefix covers files sitting under a folder inside the stage, and 20[^/]* anchors the 20 to the start of the file name itself.
Reference: Loading Using Pattern Matching
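You can also sanity-check the regex with LIST before running the COPY (a sketch, reusing @mystage from the question):
list @mystage pattern='(.*/)?20[^/]*\\.csv\\.gz';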

Related

Dynamic stage path in Snowflake

I have a stage path as below:
copy into table1 from (
select $1:InvestorID::varchar as Investor_ID from @company_stage/pbook/2022-03-10/Invor/part-00000-33cbc68b-69c1-40c0-943c-f586dfab3f49-c000.snappy.parquet
)
This is my S3 location: company_stage/pbook/2022-03-10/Invor.
I need to make this dynamic:
I) I need to change the "2022-03-10" folder to the current date.
II) It must take all parquet files in the folder automatically, without me mentioning a filename. How do I achieve this?
Here is one approach. Your stage shouldn't include the date as part of the stage name because if it did, you would need a new stage every day. Better to define the stage as company_stage/pbook/.
To make it dynamic, I suggest using the pattern option together with the COPY INTO command. You could create a variable with the regex pattern expression using current_date(), something like this:
set mypattern = '.*'||to_char(current_date(), 'YYYY-MM-DD')||'.*';
Then use this variable in your COPY INTO command like this:
copy into table1 from (
select $1:InvestorID::varchar as Investor_ID from @company_stage/pbook/
)
pattern = $mypattern
Of course you can adjust your pattern matching as you see fit.
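As a quick sanity check (a sketch), you can inspect the resolved variable and dry-run it against the stage with LIST, pasting the resolved value as the pattern literal:
select $mypattern;
-- e.g. '.*2022-03-10.*'
list @company_stage/pbook/ pattern = '.*2022-03-10.*';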

Stage command metadata$filename shows hundreds of occurrences of a single file

I have files in my stage that I want to query; as I want to include filenames in the result, I use the metadata$filename column.
My stage is on Azure ADLS Gen2.
I have only one file matching the following regexp in my stage: .*regular.*[.]png.
When I run the command
SELECT
metadata$filename
FROM
@dev_silver_db.common.stage_bronze/DEV/BRONZE/<CENSORED>/S21/2715147 (
PATTERN => $pattern_png
)
AS t
I have 562 occurrences of the same file in my result.
I thought it was a bug in my IDE at first, but I double-checked in Snowflake's query history, and this is the actual result of the request.
If I run LIST, the proper dataset (1 result only) is returned.
If I run the following command (the same query with a UNION added; any union behaves the same):
SELECT $pattern_png
UNION
SELECT
metadata$filename
FROM
@dev_silver_db.common.stage_bronze/DEV/BRONZE/<CENSORED>/S21/2715147 (
PATTERN => $pattern_png
)
AS t
I get the following result.
In my opinion, this behavior should be considered a bug, but I may have missed something.
For now I will just use TOP(1) because this is fine in my case, but it may become a problem in other contexts.
Thank you in advance for your insights.
When you SELECT from a stage, you are actually reading the content of the file using a FILE FORMAT. When none is specified, the CSV file format is used by default.
I think what you're actually seeing is the metadata$filename information duplicated on every "row" that Snowflake manages to parse out of your file: the PNG's binary content, read as CSV, yields hundreds of such rows.
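A minimal workaround sketch for getting one row per file, reusing the stage path and $pattern_png variable from the question: deduplicate the parsed rows.
SELECT DISTINCT
metadata$filename
FROM
@dev_silver_db.common.stage_bronze/DEV/BRONZE/<CENSORED>/S21/2715147 (
PATTERN => $pattern_png
)
AS t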

Query a Snowflake S3 external file

I've created an S3 [external] stage and uploaded CSV files into the \stage*.csv folder.
I can see the stage content by doing list @my_stage.
If I query the stage with
select $1,$2,$3,$4,$5,$6 from @my_s3_stage
it looks like I'm randomly picking up files.
So I'm trying to select from a specific file by adding a pattern:
PATTERN => job.csv
This returns no results.
Note: I've used Snowflake for all of 5 hours, so I'm pretty new to the syntax.
For a pattern you can use:
select t.$1, t.$2 from @mystage1 (file_format => 'myformat', pattern=>'.*data.*[.]csv.gz') t;
The pattern is a regex expression.
For a specific file you have to add the file name to the query, like this:
select t.$1, t.$2 from @mystage/data1.csv.gz t;
If the file format is set in your stage definition, you don't need the file_format parameter.
More info can be found here: https://docs.snowflake.com/en/user-guide/querying-stage.html
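Applied to the job.csv case above, a hedged sketch (assuming the file sits somewhere under @my_s3_stage and the stage has a file format attached): the pattern must be a quoted regex matching the full relative path, so a bare job.csv matches nothing.
select t.$1, t.$2 from @my_s3_stage (pattern => '.*job[.]csv') t;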

How to give pattern in snowflake external table

I have 2k files in S3 under a specific path (where only these files are present) with names following the pattern emp_test_user_1.csv, emp_test_user_2.csv, and so on.
I have to create an external table and load the data there. When I try to create it with a pattern like pattern='emp_test_user*.csv' and then query the table, no data gets loaded into it.
Could you please help me?
Try changing the pattern to match the whole path, not just the filename, and note that the pattern is a regex, not a glob (a bare * is not a wildcard in regex). For example:
pattern='.*emp_test_user.*\\.csv'
The .* matches zero or more chars. To match a literal dot you can use [.] or \\..
UPDATE
You can test your pattern with LIST:
list @mystage pattern='.*emp_test_user.*[.]csv'
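For the external table itself, a hedged sketch assuming a stage named @mystage and CSV source files (stage and table names are placeholders, not from the question):
create or replace external table emp_test_user_ext
with location = @mystage
pattern = '.*emp_test_user.*[.]csv'
file_format = (type = csv);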

Using pg_read_file to read a file on the desktop in PostgreSQL

I wanted to know how to read a file on my desktop using pg_read_file in PostgreSQL.
pg_read_file(filename text [, offset bigint, length bigint])
My query:
select pg_read_file('/root/desktop/new.txt', 0, 1000000);
Error:
ERROR: absolute path not allowed
UPDATE
pg_read_file can read files only from the data directory path. If you would like to know your data directory path, use:
SHOW data_directory;
I think you can resolve your problem by looking at this post.
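For example (a sketch): if you copy the file into the data directory first, a relative path works, since pg_read_file resolves relative paths against the data directory.
select pg_read_file('new.txt', 0, 1000000);
-- 'new.txt' placed in the data directory beforehand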
If you're using psql you can use \lo_import to create a large object from a local file.
The pg_read_file function only allows reads from server-side files.
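For example, from psql (a sketch; the OID is made up, psql prints the real one after the import):
\lo_import '/root/desktop/new.txt'
-- psql replies: lo_import 16391
select convert_from(lo_get(16391), 'UTF8');
-- read the object back as text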
To read the content of a file from PostgreSQL you can use this:
CREATE TABLE demo(t text);
COPY demo from '[FILENAME]';
SELECT * FROM demo;
Each text line becomes a SQL row. Useful for temporary transfers.
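Note that plain COPY reads the file on the server, so a file on your client machine's desktop won't be found; from psql you can use the client-side \copy variant instead (a sketch for the file from the question):
CREATE TABLE demo(t text);
\copy demo from '/root/desktop/new.txt'
SELECT * FROM demo;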
lo_import(file path) will generate an OID. This may solve your problem. You can import any type of file using this (even an image).
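For example (a sketch): the server-side lo_import function, like COPY, reads from the server's file system and needs appropriate privileges.
select lo_import('/root/desktop/new.txt');
-- returns the new large object's OID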
