I have 2k files in S3 under a specific path (only these files are present there), named with the pattern emp_test_user_1.csv, emp_test_user_2.csv, and so on.
I have to create an external table and load the data into it. When I create the table with pattern='emp_test_user*.csv' and then load it, no data gets loaded.
Could you please help me.
Try changing the pattern to match on the whole path, not just the filename. For example:
pattern='.*emp_test_user.*\\.csv'.
The .* matches zero or more characters. To match a literal dot you can use [.] or \\..
UPDATE
You can test your pattern with LIST:
list @mystage pattern='.*emp_test_user.*[.]csv'
Related
The below COPY command is not working; please correct me if something is wrong.
copy into mytable from @mystage pattern='20.*csv.gz'
Here I am trying to load the files whose names start with 20. There is a mix of files with names like 2021myfile.csv.gz and myfile202109.csv.gz, and the above command is not loading any files even though there are files that start with 20.
If I use pattern='.*20.*csv.gz', it takes all the files, which is wrong; I need to load only the files whose names start with 20.
Thanks!
This is because the PATTERN clause is a regular expression, and it is matched against the whole file path, not just the file name, so the 20 has to be anchored to the start of the file name (the part after the last slash).
Try this:
copy into mytable from @mystage pattern = '.*/20[^/]*\\.csv\\.gz'
Reference: Loading Using Pattern Matching
I have a list of log files which all have the same pattern, for example:
"http://textuploader.com/d02at!"
As you can see, it is divided into various columns. I want to extract certain information from each column, i.e. the EBCDIC, BINARY, and TRACE HEADERS, and display it column-wise for each sequence.
I have already written a working script to do so:
http://textuploader.com/d0z0u
It generates the desired output in the following format:
EBCDIC Header info
"http://textuploader.com/d02aq!"
Binary Header Info
"http://textuploader.com/d02am!"
Trace Header parts
"http://textuploader.com/d02ap!"
...and similar extractions for the other headers, based on the first column in the log file.
What I want to do is get away from so much "grep" work in the script and instead store all the attributes that I want to grep in some sort of array.
Then I would iterate over these array elements to extract the information.
Thanks
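The original script and log files are only reachable through the (now dead) textuploader links above, so here is a minimal, hypothetical sketch of the array-driven approach the question asks for, written in Java. The file name sequence.log, the keyword list, and the assumption that the attribute name sits in the first column are illustrative, not taken from the original script.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class HeaderExtractor {
    public static void main(String[] args) throws IOException {
        // Hypothetical attribute list; fill in every header keyword you currently grep for.
        List<String> headers = List.of("EBCDIC", "BINARY", "TRACE");
        // Hypothetical log file name; the real files sit behind the textuploader links.
        List<String> lines = Files.readAllLines(Path.of("sequence.log"));

        // One loop over the attribute array replaces the repeated grep calls.
        for (String header : headers) {
            System.out.println(header + " Header info");
            lines.stream()
                 .filter(line -> line.startsWith(header)) // match on the first column
                 .forEach(System.out::println);
            System.out.println();
        }
    }
}

Adding or removing an attribute then only means editing the headers list, rather than adding another grep call.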
I am working on my first SSIS package. I have a view with data that looks something like:
Loc Data
1 asd
1 qwe
2 zxc
3 jkl
And I need all of the rows to go to different files based on the Loc value. So all of the data rows where Loc = 1 should end up in the file named Loc1.txt, and the same for each other Loc.
It seems like this can be accomplished with a Conditional Split to flat file destinations, but that would require a destination for each Location. I have a lot of Locations, and they will all be handled the same way other than being split into different files.
Is there a built-in way to do this without creating a bunch of destination components? Or can I at least use the Script Component to do this?
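For reference, the desired end result can be sketched in plain Java (hypothetical file names and a tab-separated two-column input are assumed); this only illustrates the grouping itself, while the answers below achieve it inside SSIS with a loop and an expression-based connection string.

import java.io.IOException;
import java.nio.file.*;
import java.util.*;

public class SplitByLoc {
    public static void main(String[] args) throws IOException {
        // Hypothetical input: one "Loc<TAB>Data" row per line, as in the view above.
        List<String> rows = List.of("1\tasd", "1\tqwe", "2\tzxc", "3\tjkl");

        // Group the Data values by Loc.
        Map<String, List<String>> byLoc = new LinkedHashMap<>();
        for (String row : rows) {
            String[] parts = row.split("\t", 2);
            byLoc.computeIfAbsent(parts[0], k -> new ArrayList<>()).add(parts[1]);
        }

        // Write one Loc<N>.txt file per distinct Loc value.
        for (Map.Entry<String, List<String>> e : byLoc.entrySet()) {
            Files.write(Path.of("Loc" + e.getKey() + ".txt"), e.getValue());
        }
    }
}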
You should be able to set an expression on the connection string using a variable. Define your path up to the directory and then set the variable equal to that column.
You'll need an Execute SQL task to return a result set, and a Foreach Loop container to loop over every row in that result set.
I don't have access at the moment to post screenshots, but this link should help outline the steps.
So when your package runs, the expression will look like:
"C:\\Documents\\MyPath\\location" + @[User::LocationColumn] + ".txt"
It should end up feeding your directory with files according to location.
Set User::LocationColumn equal to the Location column in your result set, and write your result-set query to group by Location, so that all the records for a Location are written to a single file.
I spent some time trying to complete this task using the method @Phoenix suggested, but stumbled upon this video along the way.
I ended up going with the method shown in the video. I was hoping I wouldn't have to separate it into multiple SELECT statements for each location plus an extra one to grab the distinct locations, but I thought the SSIS implementation in the video was much cleaner than the alternative.
Change the connection manager's connection string to an expression that uses a variable.
By varying the variable, the destination file also changes.
The connection string expression is:
"C:\\Documents\\ABC\\Files\\" + @[User::data] + ".txt"
Vote for this if it helps you.
I have a requirement in which one source is a table and the other source is a file. I need to join the two on a column. The problem is that I can do this for one table with one transformation, but I need to do it for multiple sets of tables and files, loading the results into a corresponding set of target files, using the same transformation.
Breaking down my requirement more specifically :
Source Table Source File Target File
VOICE_INCR_REVENUE_PROFILE_0 VoiceRevenue0 ProfileVoice0
VOICE_INCR_REVENUE_PROFILE_1 VoiceRevenue1 ProfileVoice1
VOICE_INCR_REVENUE_PROFILE_2 VoiceRevenue2 ProfileVoice2
VOICE_INCR_REVENUE_PROFILE_3 VoiceRevenue3 ProfileVoice3
VOICE_INCR_REVENUE_PROFILE_4 VoiceRevenue4 ProfileVoice4
VOICE_INCR_REVENUE_PROFILE_5 VoiceRevenue5 ProfileVoice5
VOICE_INCR_REVENUE_PROFILE_6 VoiceRevenue6 ProfileVoice6
VOICE_INCR_REVENUE_PROFILE_7 VoiceRevenue7 ProfileVoice7
VOICE_INCR_REVENUE_PROFILE_8 VoiceRevenue8 ProfileVoice8
VOICE_INCR_REVENUE_PROFILE_9 VoiceRevenue9 ProfileVoice9
The table and file names always correspond, i.e. VOICE_INCR_REVENUE_PROFILE_0 should always be joined with VoiceRevenue0 and the result should be stored in ProfileVoice0. There should be no mismatches in this case. I tried setting variables for the table names and file names, but a variable only takes one value at a time.
All table names and file names are constant. Is there any other way to get around this? Any help would be appreciated.
Try using "Copy rows to result" step. It will store all the incoming rows (in your case the table and file names) into a memory. And for every row, it will try to execute your transformation. In this way, you can read multiple filenames at one go.
Try reading this link. Its not the exact answer, but similar.
I have created a sample here. Please check if this is what is required.
In the first transformation, I read the table names and file names and loaded them into memory. After that I used the Get Variables step to read all the file and table names and generate the output. [Note: I have not used a Table Input as a source anywhere; I used TablesNames instead. You can replace it with your table input data.]
Hope it helps :)
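As a plain-Java illustration of the "execute once per input row" idea described above (not PDI itself), the constant name triples from the question can be built and looped over like this; the Row record and the printed join line are purely illustrative.

import java.util.ArrayList;
import java.util.List;

public class RunPerRow {
    // One "row" per (source table, source file, target file) triple.
    record Row(String table, String file, String target) {}

    public static void main(String[] args) {
        // The constant names from the question, indices 0 through 9.
        List<Row> rows = new ArrayList<>();
        for (int i = 0; i <= 9; i++) {
            rows.add(new Row("VOICE_INCR_REVENUE_PROFILE_" + i, "VoiceRevenue" + i, "ProfileVoice" + i));
        }
        // "Copy rows to result" + "execute for every input row": the same
        // join/load logic runs once per row with these three names plugged in.
        for (Row r : rows) {
            System.out.printf("join %s with %s -> %s%n", r.table, r.file, r.target);
        }
    }
}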
I looked through the docs for a way to use Camel for ETL just as in the site's examples, except with these additional conditions based on an MD5 match.
Like the Camel example, myetl/myinputdir would be monitored for any new file, and if one is found, the file ${filename} would be processed.
Except it would first wait for ${filename}.md5 to show up, which would contain the expected MD5. If ${filename}.md5 never showed up, it would simply ignore the file until it did.
And if ${filename}.md5 did show up but the MD5 didn't match, the file would still be processed, but with an error condition.
I found suggestions to use crypto for matching, but I have not figured out how to ignore the file until the matching .md5 file shows up. Really, these two files need to be processed as a matched pair for everything to work properly, and they may not arrive in the input directory at the exact same millisecond. Alternatively, the md5 file might show up a few milliseconds before the data file.
You could use an aggregator to combine the two files based on their file name. If your files are suitably named, then you can use the file name (without the extension) as the correlation ID, and continue the route once completionSize equals 2. If you set groupExchanges to true, then in the next route step you have access both to the file to compute the hash value for and to the contents of the md5 file to compare that hash value against. Or, if the md5 or content file never arrives within completionTimeout, you can trigger whatever action is appropriate for your scenario.
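A minimal Java DSL sketch of the aggregator approach described above, assuming a recent Camel release; the pairKey header name, the endpoint URI, and the 60-second timeout are illustrative, and GroupedExchangeAggregationStrategy is used here as the equivalent of setting groupExchanges to true.

import java.util.List;

import org.apache.camel.Exchange;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.processor.aggregate.GroupedExchangeAggregationStrategy;

public class Md5PairRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("file:myetl/myinputdir")
            // Derive one correlation key per pair by stripping a trailing ".md5"
            // from the file name ("pairKey" is an illustrative header name).
            .process(exchange -> {
                String name = exchange.getIn().getHeader(Exchange.FILE_NAME, String.class);
                exchange.getIn().setHeader("pairKey", name.replaceFirst("\\.md5$", ""));
            })
            // Hold each file until its partner arrives (completionSize = 2) or the
            // illustrative 60-second timeout fires for an incomplete pair.
            .aggregate(header("pairKey"), new GroupedExchangeAggregationStrategy())
                .completionSize(2)
                .completionTimeout(60_000)
            .process(exchange -> {
                // With GroupedExchangeAggregationStrategy the aggregated body is the
                // list of both exchanges: the data file and its .md5 companion.
                List<Exchange> pair = exchange.getIn().getBody(List.class);
                // Compute the data file's MD5 here (java.security.MessageDigest or
                // camel-crypto) and compare it with the .md5 file's contents, then
                // continue the ETL or flag the error condition described above.
            });
    }
}

The correlation key is the file name with any trailing .md5 stripped, so a data file and its checksum file always land in the same aggregation group regardless of which one arrives first.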