I am loading files into Snowflake tables using Snowpipe. I have already loaded a file named employees1 into a Snowflake table named Employee_table. I truncated the table and want to load that file again. Because Snowpipe keeps load history in metadata for 14 days, it will not allow the same file to be loaded into the table again. I am not able to use FORCE=TRUE in the Snowpipe COPY INTO command. Is there any way to load the file without creating another table?
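For what it's worth, one commonly suggested workaround is to run a manual COPY INTO (outside the pipe) against the same stage and table, since the manual command does accept FORCE = TRUE. A minimal sketch, with placeholder stage and file names:

    -- Manual COPY INTO accepts FORCE = TRUE, which reloads a file even
    -- though it already appears in the 14-day load history metadata.
    COPY INTO Employee_table
      FROM @my_stage/employees1.csv
      FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
      FORCE = TRUE;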
Related
Every time my Azure pipeline runs, a new file gets created in Azure Data Lake Storage. I now want the external table I have already created for this data to point to the latest file created in the data lake.
I have multiple Parquet files for the same table in Blob Storage, and we want to read the latest Parquet file through an external table in Snowflake.
Have you checked out this section in the Snowflake documentation? It covers the steps required to configure automatic refresh of external tables using Azure Event Grid. If this is not suitable for your use case, can you provide more detail on your issue and why?
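As a rough sketch of what that configuration looks like in SQL (the integration name, queue URI, tenant ID, stage, and table names below are all placeholders):

    -- Notification integration pointing at the Azure storage queue that
    -- Event Grid delivers blob-created events to.
    CREATE NOTIFICATION INTEGRATION azure_blob_events
      ENABLED = TRUE
      TYPE = QUEUE
      NOTIFICATION_PROVIDER = AZURE_STORAGE_QUEUE
      AZURE_STORAGE_QUEUE_PRIMARY_URI = 'https://myaccount.queue.core.windows.net/myqueue'
      AZURE_TENANT_ID = '<tenant-id>';

    -- External table whose file-level metadata refreshes automatically
    -- whenever a new file lands in the stage location, so queries always
    -- see the latest files.
    CREATE EXTERNAL TABLE my_ext_table
      WITH LOCATION = @my_azure_stage/
      AUTO_REFRESH = TRUE
      INTEGRATION = 'AZURE_BLOB_EVENTS'
      FILE_FORMAT = (TYPE = PARQUET);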
I'm trying to create an external table in Snowflake using an external stage in Azure Blob Storage. The query runs without errors, but since there are several files in the external stage, I cannot successfully create a table with data from the specific file I want to load. Even though I have tried specifying the file name with different paths, I still don't get the table with the right data.
Does anyone know how to specify a file among many files in an external stage?
I would also like to update the table in Snowflake every time the file is updated in Azure Blob Storage. How would you do that?
Thank you in advance!
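One thing worth trying, sketched here with a placeholder stage path and file name: the PATTERN clause on CREATE EXTERNAL TABLE takes a regular expression matched against the file paths in the stage, which lets you pin the table to a single file.

    -- Restrict the external table to the one file of interest; PATTERN is
    -- applied as a regular expression against the relative file paths.
    CREATE OR REPLACE EXTERNAL TABLE my_ext_table
      WITH LOCATION = @my_azure_stage/some/path/
      PATTERN = '.*employees1[.]csv'
      FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);

For keeping the table current as the file changes, the Event Grid auto-refresh setup mentioned in the earlier answer should cover that.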
I frequently need to validate CSVs submitted by clients to make sure that the headers and values in the file meet our specifications. Typically I do this by using the Import/Export Wizard and having the wizard create the table based on the CSV (the file name becomes the table name, and the headers become the column names). Then we run a set of stored procedures that check the information_schema for said table(s) and match that up with our specs, etc.
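As an illustration of that validation step, a simplified version of the kind of check involved (the table and spec-table names here are hypothetical):

    -- Flag columns that exist in the imported table but not in the spec.
    SELECT c.COLUMN_NAME, c.DATA_TYPE
    FROM INFORMATION_SCHEMA.COLUMNS AS c
    LEFT JOIN dbo.ClientFileSpec AS s
        ON s.ColumnName = c.COLUMN_NAME
    WHERE c.TABLE_NAME = 'Client_Employees'
        AND s.ColumnName IS NULL;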
Most of the time, this involves loading multiple files at a time for a client, which becomes very time consuming and laborious when using the Import/Export Wizard. I tried using an xp_cmdshell SQL script to load everything from a path at once to achieve the same result, but xp_cmdshell is not supported by Azure SQL DB.
https://learn.microsoft.com/en-us/azure/azure-sql/load-from-csv-with-bcp
The above says that one can load using bcp, but it also requires the table to exist before the import... I need the table structure to mimic the CSV. Any ideas here?
Thanks
If you want to load the data into your target SQL DB, you can use Azure Data Factory (ADF) to upload your CSV files to Azure Blob Storage, and then use a Copy Data activity to load the data from the CSV files into Azure SQL DB tables, without creating those tables upfront.
ADF supports 'auto create' of sink tables. See this and this.
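For reference, the setting lives on the sink of the copy activity; a minimal fragment of the pipeline JSON, assuming an Azure SQL sink (a sketch, not a full pipeline definition):

    "sink": {
        "type": "AzureSqlSink",
        "tableOption": "autoCreate"
    }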
Is there a way to import data from a CSV file into a temporary table in Snowflake? Preferably using the web interface.
It is not possible to load a CSV file into a temp table from the UI. It would be interesting to know the use case behind the question.
Keep in mind that temporary tables only exist within the session in which they were created and persist only for the remainder of the session. As such, they are not visible to other users or sessions. Once the session ends, data stored in the table is purged completely from the system and is therefore not recoverable, either by the user who created the table or by Snowflake.
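If the web interface is a hard requirement this won't help, but for completeness, a sketch of doing it from SnowSQL within a single session (the file path and table definition are placeholders):

    -- The temp table exists only for this session.
    CREATE TEMPORARY TABLE my_temp (id INT, name STRING);

    -- Upload the local file to the user stage (PUT compresses to .gz by
    -- default), then copy it into the temp table in the same session.
    PUT file:///tmp/my_data.csv @~/staged/;
    COPY INTO my_temp
      FROM @~/staged/my_data.csv.gz
      FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);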
I am loading CSV files from Amazon S3 into Snowflake using the COPY command, via a Snowflake external stage pointing at Amazon S3.
Is it possible to identify files that have already been processed by Snowflake?
I have explored listing the files in the external stage and querying the stage metadata.
Ideally, this data would be exposed as a flag that I can query in SQL.
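For context, that exploration looks roughly like this (the stage and file format names are placeholders), though it only shows what is in the stage, not what has been loaded:

    -- Files currently visible in the stage (not the same as files loaded).
    LIST @my_s3_stage;

    -- Or pull the file name per row via the stage metadata columns.
    SELECT DISTINCT METADATA$FILENAME
    FROM @my_s3_stage (FILE_FORMAT => 'my_csv_format');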
The view INFORMATION_SCHEMA.LOAD_HISTORY should contain relevant information on data loading.
Its columns include SCHEMA_NAME, FILE_NAME, TABLE_NAME, LAST_LOAD_TIME, STATUS, ROW_COUNT, etc.
Documented here.
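A sketch of how that view might be queried to produce the kind of flag described above (the table name is a placeholder):

    -- One row per load attempt; flag files that loaded successfully.
    SELECT FILE_NAME,
           LAST_LOAD_TIME,
           STATUS,
           ROW_COUNT,
           IFF(STATUS = 'LOADED', TRUE, FALSE) AS already_loaded
    FROM INFORMATION_SCHEMA.LOAD_HISTORY
    WHERE TABLE_NAME = 'MY_TABLE'
    ORDER BY LAST_LOAD_TIME DESC;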