Foreign key verification is very slow while creating empty partitions

I am using Postgres 14 and dealing with multi-level partitioning.
A sample table design looks like this:
Table A:
CREATE TABLE issue (
    id bigserial,
    catalog_id bigint NOT NULL,
    submit_time timestamp WITH TIME ZONE NOT NULL,
    PRIMARY KEY (id, catalog_id, submit_time)
) PARTITION BY LIST (catalog_id);
Table B:
CREATE TABLE issue_detail (
    id bigserial,
    catalog_id bigint NOT NULL,
    issue_id bigint NOT NULL,
    submit_time timestamp WITH TIME ZONE NOT NULL,
    PRIMARY KEY (id, catalog_id, submit_time),
    FOREIGN KEY (catalog_id, submit_time, issue_id) REFERENCES issue (catalog_id, submit_time, id)
) PARTITION BY LIST (catalog_id);
So the partition key for the first level is catalog_id (partition by list) and for the second level it is submit_time (partition by range, on a weekly basis).
Second-level partitioning definitions:
For Table A:
CREATE TABLE issue_catalog1 PARTITION OF issue FOR VALUES IN (1) PARTITION BY RANGE (submit_time);
For Table B:
CREATE TABLE issue_detail_catalog1 PARTITION OF issue_detail FOR VALUES IN (1) PARTITION BY RANGE (submit_time);
Similarly, child partitions are created by range, one per week, covering the past 3 years.
The first-level partitioned tables are created incrementally: first the partitioned table for catalog_id = 1 is created together with its partitions, then the one for catalog_id = 2, and so on. So for catalog_id = 1 there are around 166 partitions (range partitions, one per week over the past 3 years), and the same 166 partitions are created for each subsequent catalog_id.
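Concretely, each weekly leaf partition is created like this (a sketch; the partition names and week boundaries here are illustrative):
CREATE TABLE issue_catalog1_w01 PARTITION OF issue_catalog1
    FOR VALUES FROM ('2023-01-02') TO ('2023-01-09');
CREATE TABLE issue_detail_catalog1_w01 PARTITION OF issue_detail_catalog1
    FOR VALUES FROM ('2023-01-02') TO ('2023-01-09');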
While defining the partitions, the time to create the empty partitions of the issue_detail table keeps growing (by roughly 30-50% between consecutive catalog_ids). Looking at the Postgres server log, I found that it is the foreign key referential constraint verification that takes the time. To double-check, I measured the empty partition creation time without the foreign key, and in that case it was very fast (a couple of seconds).
It is very strange that creating an empty partition for issue_detail takes more than 10 minutes beyond catalog_id = 40. How can creating empty partitions take that much time? Why is foreign key integrity verification so slow on an empty table?

Don't create foreign keys between the partitioned tables. They will prevent you from dropping or detaching partitions. Instead, define the foreign keys between the actual partitions. That will probably also get rid of your performance problem.
Don't create too many partitions if you want good performance.
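For example, a minimal sketch of a per-partition foreign key between the matching weekly leaf partitions, assuming the ranges line up on both sides (the names are illustrative):
ALTER TABLE issue_detail_catalog1_w01
    ADD CONSTRAINT issue_detail_catalog1_w01_issue_fk
    FOREIGN KEY (catalog_id, submit_time, issue_id)
    REFERENCES issue_catalog1_w01 (catalog_id, submit_time, id);
Detaching or dropping a week then only requires dropping that one constraint and the two leaf partitions, and validating a new constraint only touches the two leaf partitions involved rather than the whole hierarchy.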

Related

Index and primary key in large table that doesn't have an Id column

I'm looking for guidance on the best practice for adding indexes / primary key for the following table in SQL Server.
My goal is to maximize performance mostly on selecting data, but also in inserts.
CREATE TABLE [IndicatorValue]
(
    [IndicatorId] [uniqueidentifier] NOT NULL, -- this is a foreign key
    [UnixTime] [bigint] NOT NULL,
    [Value] [decimal](15,4) NOT NULL,
    [Interval] [int] NOT NULL
)
The table will have over 10 million rows. Data is batch inserted between 5-10 thousand rows at a time.
I frequently query the data and retrieve the same 5-10 thousand rows at a time with SQL similar to
SELECT [UnixTime]
FROM [IndicatorValue]
WHERE [IndicatorId] = 'xxx GUID xxx'
AND [Interval] = 2
ORDER BY [UnixTime]
or
SELECT [UnixTime], [Value]
FROM [IndicatorValue]
WHERE [IndicatorId] = 'xxx GUID xxx'
AND [Interval] = 2
ORDER BY [UnixTime]
Based on my limited knowledge of SQL indexes, I think:
I should have a clustered index on IndicatorId and Interval. Because of the ORDER BY, should it also include UnixTime?
As I don't have an identity column (I didn't create one because I wouldn't use it), I could have a non-clustered primary key on IndicatorId, UnixTime and Interval, because I read that it's always good to have a PK on every table.
Also, the data is very rarely deleted, and there are not many updates, but when they happen it's only on 1 row.
Any insight on best practices would be much appreciated.
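In DDL terms, what I have in mind is roughly this (a sketch; the index and constraint names are placeholders):
CREATE CLUSTERED INDEX IX_IndicatorValue
    ON [IndicatorValue] ([IndicatorId], [Interval], [UnixTime]);
ALTER TABLE [IndicatorValue]
    ADD CONSTRAINT PK_IndicatorValue
    PRIMARY KEY NONCLUSTERED ([IndicatorId], [UnixTime], [Interval]);
With the clustered key ordered (IndicatorId, Interval, UnixTime), both sample queries become a single range seek that returns rows already sorted by UnixTime.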

Are global indexes used when inserting directly into partition?

In Oracle 12c 12.1, when inserting directly into a specific partition, are any global unique indexes I have across multiple partitions on the same table still used? Will the uniqueness constraint continue to be maintained across partitions? If not, what is the benefit of having the global index in the first place?
Why not run a simple test if you are in doubt?
create table test_tab (
    id int,
    trans_date date
)
PARTITION BY RANGE (trans_date)
INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
(
    PARTITION p_2020 VALUES LESS THAN (DATE '2021-01-01'),
    PARTITION p_2021 VALUES LESS THAN (DATE '2022-01-01')
);
create unique index test_tab_ux1 on test_tab(id);
insert into test_tab partition (p_2020)
(id, trans_date) values (1, DATE'2020-01-01');
insert into test_tab partition (p_2021)
(id, trans_date) values (1, DATE'2021-01-01');
ORA-00001: unique constraint (ZZZ.TEST_TAB_UX1) violated
So, you see, the index maintains uniqueness across the partitions, which is expected.
You should know that this has its price: any time you drop or truncate a partition, the index becomes invalid and must be rebuilt (either manually, or automatically as part of the operation by using UPDATE INDEXES).
So basically you either try to avoid unique constraints on partitioned tables altogether (and enforce the consistency in the maintenance process), or you at least make the partition key part of the unique key - a case that can be covered with a local index.
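For instance, a local unique index is only allowed when the partition key is part of the index key; against the test table above it might look like this (the index name is illustrative):
create unique index test_tab_lux1 on test_tab (id, trans_date) local;
Each index partition then covers exactly one table partition, so dropping or truncating a partition no longer invalidates the rest of the index.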

Find distinct values in SQL table without scan

A SQL Server table with >200 million records is divided into ~100 partitions (not true SQL Server partitions - it's not running on a compatible edition of SQL Server) by adding a column PartitionID. PartitionID is the first half of the table's clustered index definition; the other half is a unique auto-incrementing integer ID. PartitionID is also a foreign key into the Partition table. No record from Example is ever accessed without knowing its PartitionID; records are usually accessed in ranges associated with a single PartitionID (or a small number of PartitionIDs).
CREATE TABLE Example (
    ID BIGINT IDENTITY(1, 1) NOT NULL,
    PartitionID DECIMAL(18, 0) NOT NULL,
    -- Other columns omitted for brevity
    CONSTRAINT PK_Example PRIMARY KEY NONCLUSTERED (ID),
    CONSTRAINT FK_Example_Partition FOREIGN KEY (PartitionID) REFERENCES Partition (ID)
)
CREATE UNIQUE CLUSTERED INDEX IX_Example ON Example(PartitionID, ID)
Partition rows are kept indefinitely, but Example rows are frequently purged by issuing a DELETE statement against a range with the same PartitionID. Over time, this leads to Partition rows that are not referenced by any Example rows. This is not the problem; the problem is identifying the Partition rows that are still referenced.
Without resorting to user-level management techniques like adding and manually maintaining a ReferenceCount field in the Partition table, or adding and manually maintaining a list of in-use PartitionIDs, is there a system-level technique we could use to discover the set of PartitionIDs that are still in use - without scanning all the rows in table Example?
SELECT DISTINCT PartitionID FROM Example
The above query takes tens of seconds to return 100 values because it's scanning hundreds of millions of rows in the clustered index. Adding another very narrow index on PartitionID alone might reduce the I/O and halve the time, but essentially SQL Server is still scanning that index too.
CREATE NONCLUSTERED INDEX IX_Example_PartitionID ON Example(PartitionID)
I should probably also avoid joining Partition with Example (performing a number of clustered index seeks instead of an index scan), because the number of seeks will increase (and performance will decrease) over time.
SELECT p.ID FROM Partition p WHERE EXISTS (
    SELECT 1 FROM Example e WHERE e.PartitionID = p.ID
)

MSSQL - Foreign key to the same column IF other column is not equal to the referring row

I have a database in which I have two tables:
CREATE TABLE Accounts (
    ID BIGINT IDENTITY(1,11) NOT NULL,
    Balance BIGINT NOT NULL,
    CONSTRAINT PK_Accounts PRIMARY KEY (ID)
);
CREATE TABLE Transactions (
    ID BIGINT IDENTITY(1,1) NOT NULL,
    AccountID BIGINT NOT NULL,
    Amount BIGINT NOT NULL,
    CONSTRAINT PK_Transactions PRIMARY KEY CLUSTERED (ID ASC, AccountID ASC),
    CONSTRAINT FK_Transaction_Account FOREIGN KEY (AccountID) REFERENCES Accounts(ID)
);
Transactions are inserted into their table by a stored procedure I wrote, so that two rows are generated when account 1 transfers 25 "coins" to account 21:
ID | AccountID | Amount
---+-----------+-------
 1 |         1 |    -25
 1 |        21 |     25
In the above schema, I want the first row to reference the bottom row based on the ID, with the AccountID being unequal to the AccountID of the bottom row, and vice versa.
What I want to do would look something like this:
CONSTRAINT FK_Transaction_Counterpart FOREIGN KEY (ID) REFERENCES Transactions(ID) WHERE thisRow.AccountID != referencedRow.AccountID
I haven't found this possibility in the documentation on table constraints.
So, both out of curiosity and with the intent to use it, I ask: is this possible? And if yes, how?
Edit:
The answers reflect that this is not possible and that I should adjust my design or intentions.
I think I will settle for assigning the two transaction rows to each other in the application code.
A traditional foreign key can't be conditional (i.e. there is no WHERE clause attached). In your case, I'd probably just make sure the inserts are atomic (in the same transaction), so that there is no possibility of only one of them being inserted.
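For example, setting aside how the shared transaction ID is generated, the pairing can be protected by wrapping both inserts in a single transaction (a sketch; the @-variables are hypothetical):
SET XACT_ABORT ON;   -- any run-time error rolls back the open transaction
BEGIN TRANSACTION;
    INSERT INTO Transactions (AccountID, Amount) VALUES (@FromAccountID, -@Amount);
    INSERT INTO Transactions (AccountID, Amount) VALUES (@ToAccountID, @Amount);
COMMIT TRANSACTION;  -- both rows become visible together, or neither does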
If the data model you are trying to implement is:
One transaction (ID) has two and only two entries in table Transactions
For the two rows of a given Transaction ID, the AccountIDs cannot be the same
Then one perhaps overly-complex way you could enforce this business rule within the database table structures would be as follows:
Table Accounts, as you have defined
Table Transactions, as you have defined
New table TransactionPair with:
Columns (all are NOT NULL)
ID
LowAccountID
HighAccountID
Constraints
Primary key on ID (only one entry per Transaction ID)
Foreign key on (ID, LowAccountID) into Transactions
Foreign key on (ID, HighAccountID) into Transactions
Check constraint on the row such that LowAccountID < HighAccountID
Process:
Add pair of rows to Transactions table
Add single row to TransactionPair referencing the rows just added
If that row cannot be added, something failed, roll everything back
Seems neat and tidy, but quite possibly overly complex. Your mileage may vary.
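In DDL, the extra table might look like this (a sketch; the constraint names are illustrative, and both foreign keys rely on the existing primary key on Transactions (ID, AccountID)):
CREATE TABLE TransactionPair (
    ID BIGINT NOT NULL,
    LowAccountID BIGINT NOT NULL,
    HighAccountID BIGINT NOT NULL,
    CONSTRAINT PK_TransactionPair PRIMARY KEY (ID),
    CONSTRAINT FK_TransactionPair_Low FOREIGN KEY (ID, LowAccountID)
        REFERENCES Transactions (ID, AccountID),
    CONSTRAINT FK_TransactionPair_High FOREIGN KEY (ID, HighAccountID)
        REFERENCES Transactions (ID, AccountID),
    CONSTRAINT CK_TransactionPair_Accounts CHECK (LowAccountID < HighAccountID)
);
The CHECK makes equal AccountIDs impossible, and the two foreign keys force both referenced transaction rows to exist.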

designing new table for daily uploads - use unique constraint

I am using SQL Server 2012 and am creating a table that will have 8 columns, with the types below:
datetime
varchar(12)
varchar(6)
varchar(100)
float
float
int
datetime
Once a day (normally) there will be an upload of approximately 10,000 rows of data. Going forward it's possible it could be 100,000.
The rows will be unique if I group on the first three columns listed above. I have read that I can put a unique constraint on multiple columns, which will guarantee the rows are unique.
I think I'm correct in saying that a unique constraint by default sets up a non-clustered index. Would a clustered index be better, and when the table starts to contain millions of rows, will that cause any issues?
My last question: am I right to say that with the unique constraint applied, querying the data will be quicker than without it (because of the non-clustered or clustered index), and that uploading the data will be slower (which is fine)?
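For reference, here is the sort of DDL I have in mind (a sketch; the table and column names below are placeholders for the eight columns listed above):
CREATE TABLE DailyUpload (
    UploadDate datetime NOT NULL,
    Code varchar(12) NOT NULL,
    SubCode varchar(6) NOT NULL,
    Description varchar(100) NULL,
    Value1 float NULL,
    Value2 float NULL,
    Quantity int NULL,
    LoadedAt datetime NULL
);
-- unique constraint: backed by a non-clustered index by default
ALTER TABLE DailyUpload ADD CONSTRAINT UQ_DailyUpload
    UNIQUE (UploadDate, Code, SubCode);
-- or, alternatively, explicitly clustered:
-- ALTER TABLE DailyUpload ADD CONSTRAINT UQ_DailyUpload
--     UNIQUE CLUSTERED (UploadDate, Code, SubCode);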
A unique index can be non-clustered.
A primary key is unique and can be clustered.
A clustered index is not unique by default.
A unique clustered index is unique :)
More information is available in this guide.
So, we should separate uniqueness from index keys.
If you need to keep data unique by some columns, create a unique constraint (unique index). You'll protect your data.
Also, you can create a primary key (PK) on your columns - they will be unique too. But there is a difference: all non-clustered indexes use the clustered key (by default the PK) for referencing rows, so it must be as short as possible. So, my advice: create an identity column (int or bigint) and put the PK on it. Then create a unique index on your unique columns.
Querying data may become faster if you query on your unique columns; for queries on other columns you need to create other, specific indexes.
So: unique keys for data consistency, indexes for queries.
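A sketch of that advice applied to the table in the question (the names remain placeholders):
CREATE TABLE DailyUpload (
    ID bigint IDENTITY(1,1) NOT NULL,
    UploadDate datetime NOT NULL,
    Code varchar(12) NOT NULL,
    SubCode varchar(6) NOT NULL,
    -- remaining columns as before
    CONSTRAINT PK_DailyUpload PRIMARY KEY CLUSTERED (ID),          -- short, ever-increasing clustered key
    CONSTRAINT UQ_DailyUpload UNIQUE (UploadDate, Code, SubCode)   -- protects against duplicate rows
);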
"I think I'm correct in saying that the unique constraint by default sets up non-clustered index"
True.
"Would a clustered index be better & assuming when the table starts to contain millions of rows this won't cause any issues?"
(1) If you need (datetime, varchar(12), varchar(6)) to be unique, and
(2) if your application will access rows using datetime, or datetime + varchar(12), or datetime + varchar(12) + varchar(6) in the WHERE condition all the time,
then put the primary key on (datetime, varchar(12), varchar(6)); by default that enforces uniqueness and creates a clustered index on all three columns.
But as you commented above:
"the queries will vary to be honest. I imagine most queries will make use of the first datetime column"
and you will deal with huge data and might join this table with other tables, so it is better to have a surrogate key (an ever-increasing unique identifier) in the table and, to satisfy your SELECTs, non-clustered indexes.
Surrogate Key vs Business Key
NON-CLUSTERED INDEX
