Can I use a subquery in a Snowflake COPY INTO command?

I have a CSV file in an S3 bucket, and I want to import that data into Snowflake. I built a Snowflake stage and a Snowpipe using this command:
create or replace pipe pipe_name as
copy into target_table
from @my_stage
FILE_FORMAT = (FIELD_OPTIONALLY_ENCLOSED_BY = '0x22');
This query worked perfectly.
However, I would like to add a new column called insert_date to track when the rows are loaded into the table. Would this query work?
copy into target_table
from (select *, current_date()
from @my_stage)
FILE_FORMAT = (FIELD_OPTIONALLY_ENCLOSED_BY = '0x22');
Note: I do not have sysadmin privileges to run this query; I ask a colleague who has the privileges to run it, so I want to make sure the query will work before I ask him. Any help is highly appreciated.

You cannot query the files in the stage using select *; you need to specify the columns. For CSV files it is easy to do this by position, as in the example below. Assuming your file has two columns:
copy into target_table
from (select $1, $2, current_date()
from @my_stage)
FILE_FORMAT = (FIELD_OPTIONALLY_ENCLOSED_BY = '0x22');
You will need to add more columns to your select $1, $2, ... depending on your source file. Then it will work.

Here is a documentation page with some examples
https://docs.snowflake.com/en/user-guide/data-load-transform.html#transforming-csv-data
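If the target table has named columns, you can also list them explicitly in the COPY INTO; a minimal sketch, assuming hypothetical column names col_a and col_b plus the new insert_date column:
-- Map file columns by position to named table columns, adding the load date.
copy into target_table (col_a, col_b, insert_date)
from (select $1, $2, current_date()
from @my_stage)
FILE_FORMAT = (FIELD_OPTIONALLY_ENCLOSED_BY = '0x22');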

Related

How do I put a CSV into an external table in Snowflake?

I have a staged file and I am trying to query the first line/row of it because it contains the column headers of the file. Is there a way I can create an external table using this file so that I can query the first line?
I am able to query the staged file using
SELECT a.$1
FROM @my_stage (FILE_FORMAT=>'my_file_format',PATTERN=>'my_file_path') a
and then to create the table I tried doing
CREATE EXTERNAL TABLE MY_FILE_TABLE
WITH
LOCATION='my_file_path'
FILE_FORMAT = my_file_format;
Reading headers from CSV is not supported; however, this answer from Stack Overflow gives a workaround.
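The linked workaround is not reproduced here, but one common approach is to filter the staged-file query on the file row number metadata column; a minimal sketch, assuming the file format does not skip the header row:
-- Return only the first physical row of each matching file (the header line).
SELECT a.$1, a.$2
FROM @my_stage (FILE_FORMAT=>'my_file_format', PATTERN=>'my_file_path') a
WHERE METADATA$FILE_ROW_NUMBER = 1;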

Snowflake data warehouse - querying data in staged files

On running a query of staged data files in Snowflake, I have noticed that the filename effectively has an implicit glob on the end.
In other words,
SELECT COUNT(*)
FROM @MASTERCATALOGUE.CUSTOMERS.USAGE_STAGE/4089.jsonl.gz
is actually
SELECT COUNT(*)
FROM @MASTERCATALOGUE.CUSTOMERS.USAGE_STAGE/4089.jsonl.gz*
For example, I have two files in the stage named 4089.jsonl.gz and 4089.jsonl.gz.1.gz
On running the following:
SELECT COUNT(*)
FROM @MASTERCATALOGUE.CUSTOMERS.USAGE_STAGE/4089.jsonl.gz
I would expect to get the count of just 4089.jsonl.gz. However, I get the count of both added together as the implicit glob ends up matching both files.
There is no mention of this in the documentation.
Querying data in staged files
I have tried putting single and double quotes around the filename, but this makes no difference.
Any ideas of the notation that will not add this implicit glob?
Thanks.
You can limit the results by filtering on the METADATA$FILENAME metadata column:
SELECT COUNT(*)
FROM @MASTERCATALOGUE.CUSTOMERS.USAGE_STAGE/4089.jsonl.gz
WHERE METADATA$FILENAME = '4089.jsonl.gz'
https://docs.snowflake.net/manuals/user-guide/querying-metadata.html#

CTAS the output from COPY INTO

The COPY INTO command returns an output dataset.
CTAS can create a table from the results of a query.
Combining the two, we would expect to get the list of loaded files into a new table.
CREATE TABLE MY_LOADED_FILES
AS
COPY INTO mytable
FROM @my_int_stage;
However, this returns:
SQL compilation error: syntax error line 3 at position 0 unexpected 'copy'.
What am I doing wrong?
Unfortunately, it doesn't look like you can put a COPY INTO statement inside another statement. There is a way to do this, however, by using the result_scan function to return the results of the previous query.
copy into test_database.public.test_table from @my_int_stage;
create temporary table test_database.public.test_table_results as (
select * from table(result_scan(LAST_QUERY_ID()))
);
Of course, you need to make sure the second query runs in the same session as the copy statement, and that it runs directly after it. Alternatively, you can pass the query ID to result_scan.
If you want to see which files were loaded, why not just look at the copy_history of the table?
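For example, here is a minimal sketch using the COPY_HISTORY table function against the table loaded above (the one-hour time window is an assumption):
-- Show which files were loaded into the table in the last hour.
SELECT file_name, status, row_count, last_load_time
FROM TABLE(information_schema.copy_history(
TABLE_NAME => 'TEST_TABLE',
START_TIME => DATEADD(hours, -1, CURRENT_TIMESTAMP())));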

How to export a query result as CSV from a remote database with SQL Server?

I’m doing the following query on a remote database:
DECLARE @SQL varchar(max) =
'SELECT ts, value
FROM history
WHERE name = ''SOME_ID''';
EXEC (@SQL) AT SOME_LINKED_SERVER;
So the expected output is like that:
ts value
----------------
ts1 value1
ts2 value2
… …
I’m running this query for almost 100 different names and want to save a different CSV for each output. I know I can do it manually by right-clicking on the query’s output and selecting “Save Result As...”, but it would take too long, especially because each query takes about 10 minutes to finish.
So I’d like to do it automatically, making the procedure export all the different CSVs after getting the data. My idea is to loop through an array of names, run the query for each, and export the output as a CSV.
How can I do that? Before trying to loop through the array, I'm already struggling to output a CSV for a single query result.
Why do it one name at a time when you can do this:
SELECT [ts], [value], [name]
FROM [SOME_LINKED_SERVER].[database_name].[dbo].[history] H (NOLOCK)
WHERE [name] IN (<name list>)
If you want to store the result on the local end, then insert the data into a local table and work with the data there. Like so:
SELECT [ts], [value], [name]
INTO [local_history]
FROM [SOME_LINKED_SERVER].[database_name].[dbo].[history] H (NOLOCK)
WHERE [name] IN (<name list>)
SELECT * FROM [local_history] -- WHERE [name] = 'SOME_ID'
Then export, either by using Save Results As..., or by copying and pasting the results into Excel and doing a Save As... (F12).
Even better: if you have SSIS installed, use that to build a package that runs the query and does the export for you. SSIS can even loop through the list of names if it comes down to that. If you don't have SSIS installed, install it; it comes with SQL Server. SSIS can be made to automate this process entirely.
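If SSIS is not an option, another route is to shell out to bcp from T-SQL; a rough sketch, assuming xp_cmdshell is enabled, the local_history table from above exists, and C:\exports is writable:
-- Export one name's rows to CSV by invoking bcp from within T-SQL.
DECLARE @cmd varchar(4000) =
'bcp "SELECT ts, value FROM database_name.dbo.local_history WHERE name = ''SOME_ID''" '
+ 'queryout "C:\exports\SOME_ID.csv" -c -t, -S localhost -T';
EXEC master..xp_cmdshell @cmd;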

Ignore errors in SSDT Post-Deployment script

I need to populate codelists after publishing my database using SSDT. So I've added a new post-deployment script to the project, and from it I call other scripts using the SQLCMD :r command, each inserting data into one table. But if a table is already filled, primary key constraints are violated and the whole deployment is broken.
How can I suppress errors in the post-deployment script? The SQLCMD command :on error ignore is not supported.
Here's a good example of how to achieve what you're looking for using the MERGE statement instead of raw INSERTs.
http://blogs.msdn.com/b/ssdt/archive/2012/02/02/including-data-in-an-sql-server-database-project.aspx
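A minimal sketch of that idempotent pattern, assuming a hypothetical codelist table dbo.Codelist(Id, Name):
-- Insert only the codelist rows that are not already present, so reruns do not violate the primary key.
MERGE INTO dbo.Codelist AS target
USING (VALUES (1, 'Active'), (2, 'Inactive')) AS source (Id, Name)
ON target.Id = source.Id
WHEN NOT MATCHED BY TARGET THEN
INSERT (Id, Name) VALUES (source.Id, source.Name);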
Why don't you modify your script to avoid reinserting existing values? Using a common table expression, you would have something resembling:
;with cte as (select *, row_number() over (partition by ... order by ...) as Row from ... )
insert into ...
select ...
from cte where not exists (...) and cte.Row = 1
I cannot be more explicit without having your table definition.
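For illustration only, with the same hypothetical dbo.Codelist(Id, Name) table as above, the pattern might look like:
-- Row = 1 removes duplicates within the source values; NOT EXISTS skips rows already in the table.
;with cte as (
select v.Id, v.Name,
row_number() over (partition by v.Id order by v.Name) as Row
from (values (1, 'Active'), (2, 'Inactive')) as v (Id, Name)
)
insert into dbo.Codelist (Id, Name)
select cte.Id, cte.Name
from cte
where not exists (select 1 from dbo.Codelist c where c.Id = cte.Id)
and cte.Row = 1;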
