snowflake external table partition multiple columns - snowflake-cloud-data-platform

Can I create multi columns partition in snowflake external table
partition by ( date, time)?
also is it possible to create partition table from table columns( such name, create date etc). all the document is indicating use metdata$filename column to create the partition column.

A partition column definition is an expression that parses the column metadata in the internal (hidden) METADATA$EXTERNAL_TABLE_PARTITION column. As of now if you want to use any column from your data files, you need to define the column in the form of Metadata$external_table.
https://docs.snowflake.com/en/sql-reference/sql/create-external-table.html#partitioning-parameters
You can create multiple partitions like data and time however, you need to be selective to make sure that the partition you are specifying is more performant.

External table partition is based on the file paths, which is stored as metadata$filename to the external table.
Yes, you can define multiple columns as partitions for Snowflake's external tables like "partition by (date, time)", but they need to be from the metadata, not the actual data column in the external table file, because they are not used to partition the actual files.

Related

How partiton pruning works on integer column in snowflake table

I have a table in snowflake with around 1000 columns, i have an id column which is of integer type
when i run query like
select * from table where id=12
it is scanning all the micro-paritions .I am expecting that snowflake will maintain metadata of min/max of id column and based on that it should scan only one partition rather than all the partition.
In this doc https://docs.snowflake.com/en/user-guide/tables-clustering-micropartitions.html its mentioned that they maintain min/max , disticnt value of columns in each micro-partition.
How can i take advantage of partititon pruning in this scenario?Currently even for unique id snowflake is scanning all the partitions.
It's a little more complicated than that unfortunately. Snowflake would only scan a single partition if your table was perfectly clustered by your id column, which it probably isn't, nor should it be. Snowflake is a data warehouse and isn't ideal for single-row lookups.
You could always cluster your table by your id column but you usually don't want to do this in a data warehouse. I would recommend reading this document to understand how table clustering works.

SSIS flat file with joins

I have a flat file which has following columns
Device Name
Device Type
Device Location
Device Zone
Which I need to insert into SQL Server table called Devices.
Devices table has following structure
DeviceName
DeviceTypeId (foreign key from DeviceType table)
DeviceLocationId (foreign key from DeviceLocation table)
DeviceZoneId (foreign key from DeviceZone table)
DeviceType, DeviceLocation and DeviceZone tables are already prepopulated.
Now I need to write ETL which reads flat file and for each row get DeviceTypeId, DeviceLocationId and DeviceZoneId from corresponding tables and insert into Devices table.
I am sure this is not new but its being a while I worked on such SSIS packages and help would be appreciated.
Load the flat content into a staging table and write a stored procedure to handle the inserts and updates in T-SQL.
Having FK relationships between the destination tables, can probably make a lot of trouble with a single data flow and a multicast.
The problem is that you have no control over the order of the inserts so the child record could be inserted before the parent.
Also, for identity columns on the tables, you cannot retrieve the identity value from one stream and use it in another without using subsequent merge joins.
The simplest way to do that, is by using Lookup Transformation to get the ID for each value. You must be aware that duplicates may lead to a problem, you have to make sure that the value is not found multiple times in the foreign tables.
Also, make sure to redirect rows that have no match into a staging table to check them later.
You can refer to the following article for a step by step guide to Lookup Transformation:
An Overview of the LOOKUP TRANSFORMATION in SSIS

SQL Server partitioniong

I am working on a heavy record set database in MS SQL 2016. So I want to use row table partition feature to improve speed.
As we know partition feature is working on partition column of a table. Let's say [Date Column] of a table. In our scenario, have many tables that need to partition because of heaver record set in 5 to 7 tables. Each table not have that [Date column]. Also not possible to add that column in each table.
So is there any way I can select partition column of another table or something else.
The best option is to add a common column to all tables that you will then use to partition by.
You must already have a way of relating the different tables to each other so you can use this to tag each table with the correct Partition column.
This column could be as simple as an int with YYYYMM as values for monthly partitions.
You also need to make sure your queries are "Partition Aware".
This means that you should include this column in your WHERE Clause and also your JOIN Clauses for any queries.
Use Query Plans to make sure you are getting Partition Elimination on your queries.
If you can't change the model (but can add partitions???) then you could implement the partitioning with different columns in each table provided you have a single column in each table that you can partition on named ranges - but if you have 1-many relationships then it is unlikely that the child tables keys will be consecutive relative to the parent table. Note that this approach will make your "partition aware" queries more complex to craft.

Create persistent key using two unique id's as the source

I'm extracting data from a system that is using uniqueidentifier as the field type for it's primary keys.
On the system I'm extracting from, I've been given access to a single table that's been derived. That table has been made by joining one table to a one to many table resulting in me needing to use two of these uniqueidentifier columns to get uniqueness.
Is there a way for me to create a simple persistent key using these two columns?
The only idea I have at the moment is to create an identity column on my table, and upsert any future extractions (daily) into my table.
Is there a better method than this?
You can add what is known as a 'composite key'.
ALTER TABLE dbo.yourtablename
ADD CONSTRAINT uq_yourtablename UNIQUE(column1,column2);

How to create composite partition in SQL Server?

I want to create a sample database using composite partition. I know about Range Partition and List Partition. But, I don't have enough knowledge about Hash Values and how to create Hash Partition in my database?. So, I have decided that I should make a sample database using Composite Partition and I want to use Range Partition and Hash Partition in it. Can anybody describe it more and in easy word so, i can understand well about Hash Partition as well as Composite Partition.
I have also read some documents on internet. But, I could not understand how to create Hash Partition and How to create Composite Partition in my database. Actually I don't have enough knowledge about Hash Value and Hash Functoin. I have read about it but, I could not understand very well. I need a simple definition.
Definition of Horizontal Partition & Vertical Partition
Partition (database)
Hash Functions
Composite Partitioning feature is not available in SQL Server 2008. Only Range Partitioning is available in SQL Server.
Although the partitioning column must be a single column, it does not need to be numeric and it can be calculated so that the range can include multiple columns.
For instance it is common to partition on datetime data by month. This will work well, because that data is usually in a single column, but what do you do if you have data for multiple companies and you also want to partition by company? For this you could use a computed column for the partitioning column. This will create a computed column using the ‘company id’ and ‘order month’ which is then used for the partitions. It will partition three companies for the first three months of 2007.
the computed column must be persisted to form the partitioning column.
CREATE PARTITION FUNCTION MyPartitionRange (INT) AS RANGE LEFT FOR VALUES (1200701,1200702,1200703,2200701,2200702,2200703,3200701,3200702,3200703)
CREATE PARTITION SCHEME MyPartitionScheme AS PARTITION MyPartitionRange ALL TO ([PRIMARY])
CREATE TABLE CompanyOrders
( Company_id INT ,
OrderDate datetime ,
Item_id INT ,
Quantity INT ,
OrderValue decimal(19,5) ,
PartCol AS Company_id * 10000 + CONVERT(VARCHAR(4),OrderDate,112) persisted
) ON MyPartitionScheme (PartCol)

Resources