I am using Snowpipe with AUTO_INGEST = TRUE. I have followed all the steps, but when I upload files to S3, they are not loaded into Snowflake. I checked the status with the query below:
select system$pipe_status('DB.PUBLIC.mypipe');
{"executionState":"STOPPED_MISSING_TABLE","pendingFileCount":0,"notificationChannelName":"arn:aws:sqs:region:xxxx:xxxx","numOutstandingMessagesOnChannel":0,"lastReceivedMessageTimestamp":"2021-03-29T13:00:07.443Z","lastForwardedMessageTimestamp":"2021-03-29T13:00:07.443Z"}
It looks like the pipe cannot find the target table, but the table exists and the current user has access to it. When I run just the underlying COPY statement, it works fine and the files are loaded. Can someone suggest what the issue might be?
create or replace pipe DB.public.mypipe auto_ingest=true as
copy into DB.public.table
from @DB.public.table
file_format = (type = 'CSV' error_on_column_count_mismatch=false) ON_ERROR="CONTINUE";
The STOPPED_MISSING_TABLE execution state usually indicates that the role used by the pipe lacks sufficient privileges on the table in question.
The privileges the role needs to perform these tasks are listed in the document linked below:
https://docs.snowflake.com/en/user-guide/data-load-snowpipe-auto-s3.html#step-3-configure-security
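For example, a minimal sketch of those grants could look like the following (SNOWPIPE_ROLE, MY_TABLE, MY_STAGE and MYPIPE are placeholder names, not taken from your setup):
GRANT USAGE ON DATABASE DB TO ROLE SNOWPIPE_ROLE;
GRANT USAGE ON SCHEMA DB.PUBLIC TO ROLE SNOWPIPE_ROLE;
GRANT INSERT, SELECT ON TABLE DB.PUBLIC.MY_TABLE TO ROLE SNOWPIPE_ROLE;
GRANT USAGE ON STAGE DB.PUBLIC.MY_STAGE TO ROLE SNOWPIPE_ROLE;
GRANT OWNERSHIP ON PIPE DB.PUBLIC.MYPIPE TO ROLE SNOWPIPE_ROLE;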
Related
I'm testing out a trial version of Snowflake. I created a table and want to load a local CSV called "food" but I don't see any "load" data option as shown in tutorial videos.
What am I missing? Do I need to use a PUT command somewhere?
I don't think Snowsight has that option in the UI. It's available in the classic UI, though: go to the Databases tab, select a database, then go to the Tables tab and select a table; the load option will be at the top.
If the classic UI is limiting you or you are already using Snowsight and don't want to switch back, then here is another way to upload a CSV file.
A prerequisite is that you have installed SnowSQL on your device (https://docs.snowflake.com/en/user-guide/snowsql-install-config.html).
Start SnowSQL and perform the following steps:
Use the database you want to upload the file to. You need various privileges for creating a stage, a file format, and a table. E.g. USE MY_TEST_DB;
Create the file format you want to use for uploading your CSV file. E.g.
CREATE FILE FORMAT "MY_TEST_DB"."PUBLIC".MY_FILE_FORMAT TYPE = 'CSV';
If you don't configure the RECORD_DELIMITER, the FIELD_DELIMITER, and other stuff, Snowflake uses some defaults. I suggest you have a look at https://docs.snowflake.com/en/sql-reference/sql/create-file-format.html. Some of the auto detection stuff can make your life hard and sometimes it is better to disable it.
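For instance, a sketch with a few of those options spelled out explicitly (the values are only examples; adjust them to your file):
CREATE OR REPLACE FILE FORMAT "MY_TEST_DB"."PUBLIC".MY_FILE_FORMAT
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  RECORD_DELIMITER = '\n'
  SKIP_HEADER = 1
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE;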
Create a stage using the previously created file format
CREATE STAGE MY_STAGE file_format = "MY_TEST_DB"."PUBLIC".MY_FILE_FORMAT;
Now you can put your file to this stage
PUT file://<file_path>/file.csv @MY_STAGE;
You can find documentation for configuring the stage at https://docs.snowflake.com/en/sql-reference/sql/create-stage.html
You can check the upload with
SELECT d.$1, ..., d.$N FROM @MY_STAGE/file.csv d;
Then, create your table.
CREATE TABLE MY_TABLE (col1 varchar, ..., colN varchar);
Personally, I prefer creating first a table with only varchar columns and then create a view or a table with the final types. I love the try_to_* functions in snowflake (e.g. https://docs.snowflake.com/en/sql-reference/functions/try_to_decimal.html).
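For example, a sketch of that two-step pattern (the column names and target types are made up):
CREATE OR REPLACE VIEW MY_TABLE_TYPED AS
SELECT
    col1 AS id,
    TRY_TO_NUMBER(col2) AS amount,              -- NULL instead of an error on bad values
    TRY_TO_DATE(col3, 'YYYY-MM-DD') AS load_date
FROM MY_TABLE;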
Then, copy the content from your stage to your table. If you want to transform your data at this point, you have to use an inner select. If not then the following command is enough.
COPY INTO MY_TABLE FROM @MY_STAGE/file.csv;
I suggest doing this without the inner SELECT because then the option ERROR_ON_COLUMN_COUNT_MISMATCH works.
Be aware that the schema of the table must match the format. As mentioned above, if you go with all columns as varchars first and then transform the columns of interest in a second step, you should be fine.
You can find documentation for copying the staged file into a table at https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html
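For completeness, a sketch of the transforming variant with an inner SELECT (the column positions here are made up):
COPY INTO MY_TABLE
FROM (
    SELECT s.$2, s.$1, s.$3   -- e.g. reorder columns while loading
    FROM @MY_STAGE/file.csv s
);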
You can check the dropped lines as follows:
SELECT error, line, character, rejected_record FROM table(validate("MY_TEST_DB"."MY_SCHEMA"."MY_CSV_TABLE", job_id=>'xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'))
Details can be found at https://docs.snowflake.com/en/sql-reference/functions/validate.html.
If you want to add those lines to your success table, you can copy the dropped lines to a new table and transform the data until the schema matches the schema of the success table. Then you can UNION both tables, as sketched below.
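A minimal sketch of that last step (REJECTED_FIXED is a placeholder for the table holding the corrected rejects, reshaped to match MY_TABLE):
-- Append the corrected rows to the success table
INSERT INTO MY_TABLE
SELECT * FROM REJECTED_FIXED;
-- or, leaving both tables as they are, query them together
SELECT * FROM MY_TABLE
UNION ALL
SELECT * FROM REJECTED_FIXED;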
You can see that there is quite a lot to do just to load a simple CSV file into Snowflake. It becomes even more complicated when you take into account that every step can cause specific failures and that your file might contain erroneous lines. This is why my team and I are working at Datameer to make these types of tasks easier. We aim for a simple drag-and-drop solution that does most of the work for you. We would be happy if you would try it out here: https://www.datameer.com/upload-csv-to-snowflake/
I create an external stage in Snowflake via (I've tried with a public bucket too)
CREATE OR REPLACE stage "DATABASE"."SCHEMA"."STAGE_NAME"
url='s3://bucket'
CREDENTIALS=(AWS_KEY_ID='xxxxxxxxxxxx' AWS_SECRET_KEY='xxxxxxxxxxxx');
I could view the parameters of this stage via
SHOW STAGES
DESC STAGE "DATABASE"."SCHEMA"."STAGE_NAME"
However, I'm getting the error below whenever I try to interact with this stage (e.g., LIST @STAGE_NAME or load a file).
SQL compilation error: Stage 'DATABASE.SCHEMA.STAGE_NAME' does not exist or not authorized.
I've tried different Snowflake roles but can't make it work. Could anyone point me to where I should look? Perhaps I have to assign some permissions to the stage?
Have a look at the stage privileges: https://docs.snowflake.com/en/user-guide/security-access-control-privileges.html#stage-privileges
For COPY, LIST and others you need the privileges mentioned there (USAGE, READ and possibly WRITE).
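For example (MY_ROLE is a placeholder role name; USAGE is the relevant privilege for external stages, while READ and WRITE apply to internal stages):
GRANT USAGE ON STAGE "DATABASE"."SCHEMA"."STAGE_NAME" TO ROLE MY_ROLE;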
It's pretty weird, but I can only list a stage if the name consists of capital letters. No additional permissions are needed.
This works fine:
CREATE OR REPLACE stage "DATABASE"."SCHEMA"."STAGE_NAME"
url='s3://bucket'
CREDENTIALS=(AWS_KEY_ID='xxxxxxxxxxxx' AWS_SECRET_KEY='xxxxxxxxxxxx');
LIST @STAGE_NAME
This returns "Stage does not exist or not authorized":
CREATE OR REPLACE stage "DATABASE"."SCHEMA"."Stage_Name"
url='s3://bucket'
CREDENTIALS=(AWS_KEY_ID='xxxxxxxxxxxx' AWS_SECRET_KEY='xxxxxxxxxxxx');
LIST @Stage_Name
At the same time, I see all Stages while running the "SHOW STAGES" command.
Are there any constraints on the naming? I haven't found any so far.
If the stage DDL has the name enclosed in double quotes (CREATE OR REPLACE STAGE "DATABASE"."SCHEMA"."STAGE_NAME"), the name becomes case-sensitive, which is why an unquoted reference such as LIST @Stage_Name cannot resolve it. Do not enclose the stage name in quotes when creating it and you should be able to reference it regardless of the case.
https://docs.snowflake.com/en/sql-reference/sql/create-stage.html#required-parameters
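A short sketch of that behaviour, using an unqualified stage name in the current database and schema:
-- Created with quotes and mixed case, so the exact case is preserved
CREATE OR REPLACE STAGE "Stage_Name" url='s3://bucket';
LIST @Stage_Name;     -- fails: unquoted identifiers are folded to STAGE_NAME
LIST @"Stage_Name";   -- works: the quoted reference keeps the exact case
-- Created without quotes, the name is stored as STAGE_NAME
CREATE OR REPLACE STAGE Stage_Name url='s3://bucket';
LIST @stage_name;     -- works: any unquoted casing resolves to STAGE_NAME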
I have created an internal temporary stage in Snowflake, and after consuming data from it I want to remove the stage.
I have created stage as:
CREATE TEMPORARY STAGE TEST_STAGE COMMENT = 'TEMPORARY STAGE FOR USER DATA LOAD'
When I do:
SHOW STAGES IN ACCOUNT;
I see:
name        database_name  schema_name  type
TEMP_STAGE  test           schema       internal_temporary
All other fields related to S3 are null since it is internal storage.
I have tried
DROP STAGE "test"."schema"."TEMP_STAGE"
remove @%USER;
Neither of them worked; I still see this stage with the SHOW STAGES command, and I have the proper rights to drop objects in the schema.
A stage created as temporary will be dropped at the end of the session in which it was created.
Please note:
When a temporary external stage is dropped, only the stage itself is dropped; the data files are not removed.
When a temporary internal stage is dropped, all of the files in the stage are purged from Snowflake, regardless of their load status. This prevents files in temporary internal stages from using data storage and, consequently, accruing storage charges. However, this also means that the staged files cannot be recovered through Snowflake once the stage is dropped.
Tip: If you plan to create and use temporary internal stages, you should maintain copies of your data files outside of Snowflake.
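For what it's worth, a temporary stage can also be dropped explicitly within the session that created it; a minimal sketch:
CREATE TEMPORARY STAGE TEST_STAGE COMMENT = 'TEMPORARY STAGE FOR USER DATA LOAD';
-- ... load and consume the data ...
DROP STAGE IF EXISTS TEST_STAGE;   -- in the same session; otherwise it is dropped automatically at session end
SHOW STAGES LIKE 'TEST_STAGE';     -- should no longer list it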
Huge edit: I removed the ';' characters and replaced them with 'GO' and ... the secondary key and URL worked, except I got this:
Cannot bulk load. The file "06May2013_usr_tmp_cinmachI.csv" does not exist or you don't have file access rights.
BTW, this can't be true :) I'm able to use PowerShell to upload the file, so I'm sure it's not my account credentials. Here is the code I'm using now; again, it won't fit into a {} block no matter what I do with this editor, sorry for the inconvenience.
The docs say CREATE MASTER KEY is used to encrypt the SECRET later on, but there's no obvious link between the two; I assumed this is all handled under the hood. Is that right? If not, maybe that's what's causing my access error.
So, the issue with the data source not existing was errant syntax: evidently one can't use ';' to terminate these blocks of SQL, but 'GO' works.
The CSV file does exist:
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'S0me!nfo'
GO
CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'removed'
GO
CREATE EXTERNAL DATA SOURCE myDataSource
WITH (TYPE = BLOB_STORAGE, LOCATION = 'https://dtstestcsv.blob.core.windows.net/sunsource', CREDENTIAL = AzureStorageCredential)
GO
BULK INSERT dbo.ISSIVISFlatFile
FROM '06May2013_usr_tmp_cinmachI.csv'
WITH (DATA_SOURCE = 'myDataSource', FORMAT = 'CSV')
I feel obliged to post at least some info even if it's not a full answer.
I was getting this error:
Msg 4860, Level 16, State 1, Line 58
Cannot bulk load. The file "container/folder/file.txt" does not exist or you don't have file access rights.
I believe the problem might have been that I generated my SAS key with a start time of right now, but that is UTC time, meaning that here in Australia the key would only become valid in ten hours. So I generated a new key whose start date was a month earlier, and it worked.
The SAS (Shared Access Signature) is a big string that is created as follows:
In Azure portal, go to your storage account
Press Shared Access Signature
Fill in fields (make sure your start date is a few days prior, and you can leave Allowed IP addresses blank)
Press Generate SAS
Copy the string in the SAS Token field
Remove the leading ? before pasting it into your SQL script
Below is my full script with comments.
-- Target staging table
IF object_id('recycle.SampleFile') IS NULL
CREATE TABLE recycle.SampleFile
(
Col1 VARCHAR(MAX)
);
-- more info here
-- https://blogs.msdn.microsoft.com/sqlserverstorageengine/2017/02/23/loading-files-from-azure-blob-storage-into-azure-sql-database/
-- You can use this to conditionally create the master key
select * from sys.symmetric_keys where name like '%DatabaseMasterKey%'
-- Run once to create a database master key
-- Can't create credentials until a master key has been generated
-- Here, zzz is a password that you make up and store for later use
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'zzz';
-- Create a database credential object that can be reused for external access to Azure Blob
CREATE DATABASE SCOPED CREDENTIAL BlobTestAccount
WITH
-- Must be SHARED ACCESS SIGNATURE to access blob storage
IDENTITY= 'SHARED ACCESS SIGNATURE',
-- Generated from Shared Access Signature area in Storage account
-- Make sure the start date is at least a few days before
-- otherwise UTC can mess you up because it might not be valid yet
-- Don't include the ? or the endpoint. It starts with 'sv=', NOT '?' or 'https'
SECRET = 'sv=2016-05-31&zzzzzzzzzzzz';
-- Create the external data source
-- Note location starts with https. I've seen examples without this but that doesn't work
CREATE EXTERNAL DATA SOURCE BlobTest
WITH (
TYPE = BLOB_STORAGE,
LOCATION = 'https://yourstorageaccount.blob.core.windows.net',
CREDENTIAL= BlobTestAccount);
BULK INSERT recycle.SampleFile
FROM 'container/folder/file'
WITH ( DATA_SOURCE = 'BlobTest');
-- If you're fancy you can use these to work out if your things exist first
select * from sys.database_scoped_credentials
select * from sys.external_data_sources
DROP EXTERNAL DATA SOURCE BlobTest;
DROP DATABASE SCOPED CREDENTIAL BlobTestAccount;
One thing that this won't do, that ADF does, is pick up a file based on a wildcard.
That is: if I have a file called ABC_20170501_003.TXT, I need to explicitly list it in the bulk insert load script, whereas in ADF I can just specify ABC_20170501 and it automatically wildcards the rest.
Unfortunately there is no (easy) way to enumerate files in blob storage from SQL Server. I eventually got around this by using Azure Automation to run a PowerShell script that enumerates the files and registers them in a table that SQL Server can see. This seems complicated, but Azure Automation is actually a very useful tool to learn and use, and it works very reliably.
More opinions on ADF:
I couldn't find a way to pass the filename that I loaded (or other info) into the database.
Do not use ADF if you need data to be loaded in the order it appears in the file (i.e. as captured by an identity field). ADF will try and do things in parallel. In fact, my ADF did insert things in order for about a week (i.e. as recorded by the identity) then one day it just started inserting stuff out of order.
The timeslice concept is useful in limited circumstances (when you have cleanly delineated data in cleanly delineated files that you want to drop neatly into a table). In any other circumstances it is complicated, unwieldy and difficult to understand and use. In my experience real world data needs more complicated rules to work out and apply the correct merge keys.
I don't know the cost difference between importing files via ADF and via BULK INSERT, but ADF is slow. I don't have the patience to hack through Azure blades to find metrics right now, but you're talking 5 minutes in ADF vs 5 seconds with BULK INSERT.
UPDATE:
Try Azure Data Factory V2. It is vastly improved, and you are no longer bound to timeslices.
In DB2, data from a file can be imported with 'import' using INSERT_UPDATE mode, which inserts a record if it doesn't exist and updates it if it does.
Is there a way to import/load data from a file into a table such that records from the file are inserted if they do not exist and updated if they do?
The only way I could figure out is to bulk load into an intermediate/temporary table and then use a merge from that table to insert or update the target table.
With this approach there may be a performance issue, as all the data is first loaded into the temporary table. Please advise if there is a way to do this without creating a temporary table.
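For reference, a sketch of the staging-table approach described above, assuming SQL Server as the target (as in the SSIS answer below); the table and column names are placeholders, and FORMAT = 'CSV' requires SQL Server 2017 or later:
-- 1. Bulk load the file into a staging table
BULK INSERT dbo.StagingTable
FROM 'C:\data\input.csv'
WITH (FORMAT = 'CSV', FIRSTROW = 2);
-- 2. Upsert from the staging table into the target table
MERGE dbo.TargetTable AS t
USING dbo.StagingTable AS s
    ON t.Id = s.Id
WHEN MATCHED THEN
    UPDATE SET t.Val = s.Val
WHEN NOT MATCHED THEN
    INSERT (Id, Val) VALUES (s.Id, s.Val);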
You could use SSIS.
In your data flow you would perform a lookup to see if the record already exists; if so, send it down an update code path (which will probably involve a staging table for the updates and then joining the two). If it doesn't exist, perform an insert.
Along the lines of https://social.msdn.microsoft.com/forums/sqlserver/en-US/9e14507d-2a30-403b-98f5-a6d2468b384e/update-else-insert-ssis-record