Common practice for stages - snowflake-cloud-data-platform

Snowflake allows you to put files of different structures into a single stage by using different paths.
On the other hand, we can also put files of the same structure into a separate stage.
Is a stage meant to store files for several tables of a schema, or is it a means of storing data for a single (partitioned) table?
What is the usual practice?

There are a few different types of stages in Snowflake:
Internal Stages (Named, User and Table): With these types of stages, you upload the files directly to Snowflake. If you want to load data into multiple tables from a single stage, you can use either a "Named" or a "User" stage. A "Table" stage is created automatically when you create a table and is meant for loading data into that single table only. With all internal stages, you typically upload data to Snowflake with the PUT command in SnowSQL from your local machine or a server, and then run a COPY command to load it into a table.
External Stages: In my experience, external stages are the most common. You create a stage inside Snowflake that points to a cloud provider's blob storage service (Amazon S3, Google Cloud Storage, or Azure Blob Storage). Unlike with an internal stage, the files are not stored in Snowflake; they stay in S3 (or whichever service you use), and you can run COPY commands to load them into any table.
There is no single right answer: you can use either internal (Named or User) or external stages to load into multiple tables. My preference is an external stage, because the data then resides outside of Snowflake and can be loaded into other tools too if necessary.
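As a rough illustration of the "one stage, many tables" layout, the Python sketch below only builds the SQL text for a single named external stage whose sub-paths each feed a different table. All names (the stage, the Azure URL, the storage integration my_azure_int, and the tables) are hypothetical, and the statements assume the storage integration already exists.

```python
def stage_load_statements(stage, url, integration, table_paths, file_type="CSV"):
    """Build SQL for one external stage that feeds several tables.

    table_paths maps a target table name to a sub-path inside the stage.
    All identifiers are assumed to be trusted (no quoting or escaping is done).
    """
    statements = [
        # One named external stage pointing at the container root.
        f"CREATE STAGE IF NOT EXISTS {stage} "
        f"URL = '{url}' STORAGE_INTEGRATION = {integration};"
    ]
    for table, path in table_paths.items():
        # Each table is loaded from its own folder within the same stage.
        statements.append(
            f"COPY INTO {table} FROM @{stage}/{path.strip('/')}/ "
            f"FILE_FORMAT = (TYPE = {file_type});"
        )
    return statements


if __name__ == "__main__":
    for sql in stage_load_statements(
        stage="raw_stage",
        url="azure://myaccount.blob.core.windows.net/raw/",
        integration="my_azure_int",
        table_paths={"orders": "orders", "customers": "/customers/"},
    ):
        print(sql)
```

Creating one stage per table works just as well; the folder-per-table layout simply keeps a single stage definition (and a single set of credentials) for the whole schema.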

Related

External table in Snowflake should automatically read and point to the latest files created in Azure Data Lake Storage

Every time my Azure pipeline runs, a new file gets created in Azure Data Lake Storage, so I want the external table I already created for this data to point to the latest file in the data lake.
I have multiple Parquet files for the same table in Blob Storage, and we want the external table in Snowflake to read only the latest Parquet file.
Have you checked out this section of the Snowflake documentation? It covers the steps required to configure automatic refresh of external tables using Azure Event Grid. If that is not suitable for your use case, can you provide more detail about your issue and why?
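Automatic refresh keeps the external table's list of files current, but the table still exposes every matching file. If you only want the newest one, a possible approach is to pick it yourself from a stage listing. The Python sketch below assumes the rows look like the output of LIST @stage, i.e. (name, size, md5, last_modified) with last_modified formatted like 'Tue, 10 Jan 2023 12:00:00 GMT'; that shape is an assumption, so adjust the index and parsing to what your driver actually returns.

```python
from email.utils import parsedate_to_datetime


def latest_file(listing):
    """Return the name of the most recently modified file in a stage listing.

    Assumes rows shaped like (name, size, md5, last_modified), where
    last_modified is an RFC 1123-style date string.
    """
    rows = list(listing)
    if not rows:
        raise ValueError("stage listing is empty")
    # Compare parsed datetimes, not raw strings: string order is not time order.
    newest = max(rows, key=lambda row: parsedate_to_datetime(row[3]))
    return newest[0]
```

The returned name could then drive a PATTERN or a filter on the METADATA$FILENAME pseudocolumn; check how the path is represented in each place before matching on it, since a listing name may include the full storage URL.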

Creating a table in Snowflake from an external stage when there are several files in Blob Storage

I'm trying to create an external table in Snowflake using an external stage in Azure Blob Storage. The query runs without errors, but since there are several files in the external stage, I cannot create a table containing the data from the one file I want to load. Even though I have tried specifying the file name by writing different paths, I still don't get the table with the right data.
Does anyone know how to specify one file among many in an external stage?
I would also like to update the table in Snowflake every time the file is updated in Azure Blob Storage. How would you do that?
Thank you in advance!
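One way to restrict an external table to a single file is the PATTERN option of CREATE EXTERNAL TABLE, and AUTO_REFRESH = TRUE lets Snowflake pick up changes to matching files; on Azure this relies on a notification integration wired to Event Grid, as described in the documentation referenced in the previous answer. The Python sketch below only builds the DDL text. The table, stage, folder, file and integration names are hypothetical placeholders, and the integration is assumed to exist already.

```python
import re

# File names are limited to a conservative character set so they can be
# embedded in a regex and a SQL string literal without further escaping.
_SAFE_FILE_NAME = re.compile(r"^[A-Za-z0-9_.-]+$")


def single_file_external_table_ddl(table, stage, folder, file_name,
                                   integration, file_type="CSV"):
    """Build CREATE EXTERNAL TABLE SQL restricted to one file via PATTERN."""
    if not _SAFE_FILE_NAME.match(file_name):
        raise ValueError(f"unsupported characters in file name: {file_name!r}")
    # '[.]' matches a literal dot without needing a backslash in the SQL literal.
    pattern = ".*" + file_name.replace(".", "[.]")
    location = f"@{stage}/{folder.strip('/')}/"
    return (
        f"CREATE OR REPLACE EXTERNAL TABLE {table}\n"
        f"  INTEGRATION = '{integration}'\n"
        f"  WITH LOCATION = {location}\n"
        f"  AUTO_REFRESH = TRUE\n"
        f"  PATTERN = '{pattern}'\n"
        f"  FILE_FORMAT = (TYPE = {file_type});"
    )


if __name__ == "__main__":
    print(single_file_external_table_ddl(
        "sales_ext", "blob_stage", "/exports/", "sales_2023.csv",
        "MY_AZURE_NOTIFICATION_INT",
    ))
```

Because the pattern starts with `.*`, any file whose name ends with the same text (for example old_sales_2023.csv) would match too; tighten the regex if your folder can contain such names. For a one-off load into a regular table, COPY INTO with its FILES option is another way to name a single file.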

How to pre-check column positions before a COPY command

We have 900+ columns coming into a stage. Before processing, I want to check that the positions of the columns have not changed in the inbound files. What would be the best way to do this?
Snowflake supports using standard SQL to query data files located in an internal (i.e. Snowflake) stage or named external (Amazon S3, Google Cloud Storage, or Microsoft Azure) stage. This can be useful for inspecting/viewing the contents of the staged files, particularly before loading or after unloading data.
Details & Syntax: https://docs.snowflake.com/en/user-guide/querying-stage.html#querying-data-in-staged-files
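Once the header row of a staged file has been fetched (for example by querying its first row as the linked page describes, where columns are addressed positionally as $1, $2, ...), the position check itself is plain comparison logic. The Python sketch below is one way to do it; the function names are hypothetical, names are compared case-insensitively after trimming, and positions are reported 1-based to line up with $1, $2, ....

```python
import csv
import io


def header_from_csv(text, delimiter=","):
    """Return the first row of CSV text as a list of column names."""
    return next(csv.reader(io.StringIO(text), delimiter=delimiter))


def column_position_mismatches(expected, actual):
    """List (position, expected, actual) for every position that differs.

    Positions are 1-based. A column missing on either side is reported as None.
    """
    def norm(name):
        return None if name is None else name.strip().upper()

    mismatches = []
    for i in range(max(len(expected), len(actual))):
        exp = expected[i] if i < len(expected) else None
        act = actual[i] if i < len(actual) else None
        if norm(exp) != norm(act):
            mismatches.append((i + 1, exp, act))
    return mismatches
```

An empty result means the inbound file can go on to the COPY step; a non-empty one lists exactly which of the 900+ positions moved, which is easier to act on than a failed load.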

Can I access data COPY'ed into a Snowflake table on S3 directly?

I loaded data into a table from an external stage using the COPY command. I know Snowflake compresses, encrypts, and saves all data to its "Storage Layer", which is shared across multiple virtual warehouses. Can I access my table's data directly in the S3 storage layer?
I am not considering the unloading option.
The data is not stored in virtual warehouses but in an underlying storage account.
You cannot access the files used by Snowflake after the data has been ingested.
(You can upload files to an internal Snowflake stage using PUT with SnowSQL, and later download them using GET.)

What is the difference between external tables and global temporary tables in Oracle?

I have worked with external tables in Oracle; they can be created on a file containing data (with many other conditions). How, then, are global temporary tables different from external tables?
An external table gets its content from, e.g., a CSV file. The database itself does not store any data. Its content is visible to all sessions (= connections) to the server, provided the necessary access privileges exist. The data exists independently of the database and is only deleted (or changed) if the file is changed externally. (As far as I know, Oracle cannot write to an external table, only read from it, but I haven't used them for ages, so maybe this changed in Oracle 18 or later.)
The data for a temporary table is stored and managed inside the database, but each session keeps its own copy of the data in the table. Oracle removes the data automatically when the session disconnects or when the transaction ends (depending on the definition of the temporary table). Data in a temporary table never survives a restart of the database server.
Broadly, an external table is a placeholder definition that points to a file somewhere on the OS. External tables are generally (but not only) used when an external interface sends you data in files. You can either load the data into a normal table using SQL*Loader (sqlldr), or use an external table that points to the file itself, so that you can simply query the table to read from the file. There are some limitations, though; for example, you cannot update an external table.
GTT: global temporary tables are used when you want to keep some on-the-fly information in a table such that it is visible only in the current session. There are good articles on both kinds of tables if you want to go into more detail.
One more thing: accessing a GTT is generally faster than accessing an external table.
