I'm trying to use the "test" folder name as a dynamic folder path that comes after the S3 URL of my Snowflake stage.
The COPY command runs but loads zero records.
create or replace stage MYSQL_S3 url='s3://myproject/product/BackEnd/'
credentials=(aws_key_id='x' aws_secret_key='s' AWS_TOKEN='s')
file_format = myformat.format_csv;
copy into test from @MYSQL_S3 pattern = 'test/'
Am I missing anything?
Try pattern='.*test/.*'. The pattern is a regex and has to match the full path (S3 key), including the filename. You can use LIST to see what a pattern matches.
Or, to append test/ to the stage URL instead, try copy into test from @MYSQL_S3/test/;
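A minimal sketch pulling both suggestions together (stage and table names taken from the question; run the LIST first to see what the regex matches):
-- Check which files the pattern matches before copying:
list @MYSQL_S3 pattern = '.*test/.*';
-- Either match the full S3 key with a regex...
copy into test from @MYSQL_S3 pattern = '.*test/.*';
-- ...or point the COPY at the subfolder directly:
copy into test from @MYSQL_S3/test/;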
I have a stage path as below
copy into table1 from (
select $1:InvestorID::varchar as Investor_ID from @company_stage/pbook/2022-03-10/Invor/part-00000-33cbc68b-69c1-40c0-943c-f586dfab3f49-c000.snappy.parquet
)
My S3 location is company_stage/pbook/2022-03-10/Invor, and I need to make it dynamic:
I) the "2022-03-10" folder needs to become the current date;
II) it must pick up all Parquet files in the folder automatically, without my naming each file. How can I achieve this?
Here is one approach. Your stage shouldn't include the date as part of its location, because then you would need a new stage every day. Better to define the stage to point at company_stage/pbook/.
To make it dynamic, I suggest using the pattern option together with the COPY INTO command. You could create a variable with the regex pattern expression using current_date(), something like this:
set mypattern = '.*'||to_char(current_date(), 'YYYY-MM-DD')||'.*';
Then use this variable in your COPY INTO command like this:
copy into table1 from (
select $1:InvestorID::varchar as Investor_ID from @company_stage/pbook/
)
pattern = $mypattern
Of course you can adjust your pattern matching as you see fit.
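If it helps, a quick sanity check before running the COPY above (stage path from the question; the literal shown is what the variable would resolve to for the date in the question):
-- See what the variable resolves to, e.g. '.*2022-03-10.*':
select $mypattern;
-- Confirm which files it matches; paste the resolved value as a literal if the
-- session variable is not accepted here:
list @company_stage/pbook/ pattern = '.*2022-03-10.*';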
The COPY command below is not working; please correct me if something is wrong.
copy into mytable from @mystage pattern='20.*csv.gz'
Here I am trying to load only the files whose names start with 20. There is a mix of files with names like 2021myfile.csv.gz and myfile202109.csv.gz, and the command above does not load any files even though some of them do start with 20.
If I use pattern='.*20.*csv.gz' it picks up all the files, which is wrong; I need to load only the files whose names start with 20.
Thanks!
This is because the pattern clause is a regex expression that has to match the whole path, not just the file name, so 20.*csv.gz matches nothing when the files sit under a folder prefix.
Try anchoring the 20 to the start of the file name instead:
copy into mytable from @mystage pattern = '.*/20[^/]*[.]csv[.]gz'
Reference: Loading Using Pattern Matching
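If useful, a quick check that the anchored pattern only picks up the 20-prefixed files (stage name from the question):
-- Expect 2021myfile.csv.gz in the output, but not myfile202109.csv.gz;
-- drop the leading '.*/' if the files sit directly under the stage URL.
list @mystage pattern = '.*/20[^/]*[.]csv[.]gz';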
I've created an S3 [external] stage and uploaded csv files into \stage*.csv folder.
I can see the stage contents by running list @my_stage.
If I query the stage with
select $1,$2,$3,$4,$5,$6 from @my_s3_stage
it looks like I'm randomly picking up files.
So I'm trying to select from a specific file by adding a pattern:
PATTERN => job.csv
This returns no results.
Note: I've used snowflake for all of 5 hours so pretty new to syntax
For a pattern you can use:
select t.$1, t.$2 from @mystage1 (file_format => 'myformat', pattern=>'.*data.*[.]csv.gz') t;
The pattern is a regex expression.
For a specific file, add the file name to the stage path in the query like this:
select $1, $2 from @mystage/data1.csv.gz;
If your file format is set in your stage definition, you don't need the file_format parameter.
More info can be found here: https://docs.snowflake.com/en/user-guide/querying-stage.html
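If it helps to see which file each row is coming from, the METADATA$FILENAME pseudocolumn can be selected alongside the data when querying a stage. A small sketch reusing the stage and format names above, with a regex for the job.csv file mentioned in the question:
-- Show the source file next to the first two columns of every row:
select metadata$filename, t.$1, t.$2
from @mystage1 (file_format => 'myformat', pattern => '.*job[.]csv') t;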
I have 2k files in S3 under a specific path (where only these files are present), named like emp_test_user_1.csv, emp_test_user_2.csv, and so on.
I have to create an external table over them and load the data there. When I create it with a pattern like pattern='emp_test_user*.csv' and query the table, no data gets loaded into it.
Could you please help me?
Try changing the pattern to match on the whole path, not just the filename. For example:
pattern='.*emp_test_user.*\\.csv'.
The .* matches zero or more chars. To match a dot you can use [.] or \\..
UPDATE
You can test your pattern with LIST:
list @mystage pattern='.*emp_test_user.*[.]csv'
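For reference, a minimal sketch of the external table definition with that pattern applied (table name, columns, and stage path are hypothetical placeholders, assuming plain CSV files):
create or replace external table emp_test_user_ext (
  -- hypothetical columns; for CSV the fields are exposed as value:c1, value:c2, ...
  emp_id   varchar as (value:c1::varchar),
  emp_name varchar as (value:c2::varchar)
)
with location = @mystage/
auto_refresh = false  -- refresh manually with ALTER EXTERNAL TABLE ... REFRESH
pattern = '.*emp_test_user.*[.]csv'
file_format = (type = csv);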
I need to copy the files from the User1 and User2 folders to the "process" folder, but I need to skip any files that may be dropped by a user directly into the Root folder.
-Root
+ User1
+ User2
Is there any way an expression can be used to skip moving those files, or can the files directly under the Root folder be excluded from processing?
Thanks,
Rahul
You can do this by loading your result file list from a transform and then using a Process result filenames step; the job itself just runs the transform followed by that step.
The transform is where the logic happens. Add two Get File Names steps: one for the User1 and User2 files, and one for the exception (Root) files. When configuring these, be sure to click the 'Filters' tab and uncheck 'Add filenames to result'. Read the exceptions directory with a Stream lookup step, comparing the short_filename field from both input steps and specifying 'exists' as your lookup field.
The Filter Rows step diverts all the files that exist in the exceptions directory, and the Set files in result step puts only the ones that don't exist in the exceptions directory into the job's file results. Be sure to use the 'filename' field rather than the 'short_filename' field here.
Then, in the job, the Process result filenames step can be configured to do what you want with the files (move/copy/delete, etc.).