Azure SQL table partitions Ignored when queries contain simple function - sql-server

I'm using an Azure SQL Server database. I have a few tables partitioned on a 90-day date boundary. We have a stored procedure that shifts data to maintain the proper partition breakpoint/range. I'm using a small function to provide the proper date breakpoint for my queries so I don't have to constantly update all my views.
But just by virtue of using that function in my queries, partitioning is ignored. Do I have no choice but to put hard-coded values in my queries everywhere and constantly modify them?
Here is a sample that reproduces the problem.
Update: After changing the PartitionDate function below according to the marked answer, it was fine for a short time (partition elimination occurred). Then, queries started sucking again. When I ran simple queries filtered by the date function, partitions were no longer eliminated.
------------------------------- setup
-- Create functions PartitionDate and PartitionQueryDate
create function PartitionDate() returns date as
begin
return GETDATE() - 91 -- returns 1/4/2019 today
end
go
create function PartitionQueryDate() returns date as
begin
return GETDATE() - 90 -- returns 1/5/2019
end
go
-- Create partition func and scheme using above functions
CREATE PARTITION FUNCTION order_pf (smalldatetime) AS RANGE RIGHT FOR VALUES (dbo.PartitionDate())
CREATE PARTITION SCHEME order_ps AS PARTITION order_pf ALL TO ([PRIMARY])
-- Create Order (pk, OrderDate, Fk), Customer (pk) tables. Order is partitioned
create table Customer
(
id int primary key identity(1,1),
FirstName varchar(255) not null
)
create table [Order]
(
id int identity(1,1), OrderDate smalldatetime not null,
CustomerId int not null,
CONSTRAINT [FK_Orders_Customer] FOREIGN KEY ([CustomerId]) REFERENCES Customer([id])
) on order_ps(OrderDate);
-- Add in indexes to Order: only OrderDate on the partition func
CREATE CLUSTERED INDEX [Order_OrderDate] ON [Order]([OrderDate] ASC) ON [order_ps] ([OrderDate]);
CREATE NONCLUSTERED INDEX [FK_Order_Customer] ON [Order](CustomerId, OrderDate) ON [order_ps] ([OrderDate]) -- seems to work the same with or without the partition reference.
go
-- Add some data before and after the partition break
insert Customer values ('bob')
insert [Order] values('12-31-2018', SCOPE_IDENTITY())
insert Customer values ('hank')
insert [Order] values('1-6-2019', SCOPE_IDENTITY())
---------------------------- test
-- verify a row per partition:
SELECT $PARTITION.order_pf(OrderDate) as Partition_Number, COUNT(*) as Row_Count
FROM [Order]
GROUP BY $PARTITION.order_pf(OrderDate)
-- Simple queries with actual execution plan turned on. The queries are logically equivalent.
select COUNT(1) from [Order] where OrderDate > '1-5-2019' -- Index seek Order_OrderDate; actual partition count 1
select COUNT(1) from [Order] where OrderDate > dbo.PartitionQueryDate() -- Index seek Order_OrderDate; actual partition count 2
-- Cleanup
drop table if exists [Order]
drop table if exists Customer
drop partition scheme order_ps
drop partition function order_pf
drop function if exists PartitionDate
drop function if exists PartitionQueryDate

One workaround would be to assign the function result to a variable first.
declare @pqd smalldatetime = dbo.PartitionQueryDate();
select COUNT(1) from [Order] where OrderDate > @pqd
Another option would be to use an inline TVF
CREATE FUNCTION dbo.PartitionQueryDateTVF ()
RETURNS TABLE
AS
RETURN
(
SELECT CAST(CAST( GETDATE() - 90 AS DATE) AS SMALLDATETIME) AS Date
)
GO
SELECT COUNT(1) from [Order] where OrderDate > (SELECT Date FROM dbo.PartitionQueryDateTVF())
This may be something that is improved with inline scalar UDFs, but I'm not in a position to test this at the moment.
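Whether scalar UDF inlining applies can be checked from the catalog rather than guessed. The query below is a minimal sketch, assuming a platform that exposes the `is_inlineable` column of `sys.sql_modules` (SQL Server 2019 and later, and Azure SQL Database). As far as I know, the documented inlining requirements exclude functions that call time-dependent intrinsics such as GETDATE(), so `dbo.PartitionQueryDate` from the question would likely report 0.

```sql
-- Report whether the engine considers the scalar UDF from the question inlineable.
-- Assumes sys.sql_modules exposes is_inlineable (SQL Server 2019+ / Azure SQL Database).
SELECT
    OBJECT_SCHEMA_NAME(m.object_id) AS schema_name,
    OBJECT_NAME(m.object_id)        AS function_name,
    m.is_inlineable
FROM sys.sql_modules AS m
WHERE m.object_id = OBJECT_ID(N'dbo.PartitionQueryDate');
```

If the column reports 0, the variable and inline TVF workarounds above remain the options to try.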

Related

Partition SQL Server tables based on column not in the primary key?

Let's say I have a table like this:
create table test_partitions
(
pk_id int not null,
col1 nvarchar(20),
col2 nvarchar(100),
constraint pk_test_partitions primary key (pk_id, col1)
);
I want to partition this table to improve query performance so that I don't have to look through the whole table every time I need something. So I added a calculated column:
create table test_partitions
(
pk_id int not null,
partition_id as pk_id % 10 persisted not null,
col1 nvarchar(20),
col2 nvarchar(100),
constraint pk_test_partitions primary key (pk_id, col1)
);
So every time I do select * from test_partitions where pk_id = 123 I want SQL Server to look in only 1/10th of the entire table. I don't want to add the partition_id column to the primary key because it will never be part of the where clause. How do I partition my table on partition_id?
I just tested this and found a solution:
SELECT TOP ((SELECT COUNT(*) FROM [TABLE_NAME]) / 10)
* FROM [TABLE_NAME]
It returns 1/10th of your records (an arbitrary tenth, since there is no ORDER BY).
To improve your query performance you can apply pagination; that is, you can use an OFFSET ... FETCH NEXT query to return only the rows you need:
SELECT column-names FROM table-name
ORDER BY column-names
OFFSET n ROWS
FETCH NEXT m ROWS ONLY
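Neither snippet above actually partitions the table. For the question as asked, here is a minimal sketch of partitioning on the persisted computed column. The names `pf_mod10` and `ps_mod10` are hypothetical, and the sketch assumes the usual rule that the partitioning column must be part of the key of a unique index aligned with the partition scheme, so `partition_id` is appended to the primary key even though the question hoped to avoid that.

```sql
-- Hypothetical names: pf_mod10 / ps_mod10. Ten partitions, one per remainder 0..9.
CREATE PARTITION FUNCTION pf_mod10 (int)
    AS RANGE LEFT FOR VALUES (0, 1, 2, 3, 4, 5, 6, 7, 8);

CREATE PARTITION SCHEME ps_mod10
    AS PARTITION pf_mod10 ALL TO ([PRIMARY]);

CREATE TABLE test_partitions
(
    pk_id        int not null,
    partition_id as pk_id % 10 persisted not null,  -- must be PERSISTED to partition on it
    col1         nvarchar(20) not null,
    col2         nvarchar(100),
    -- partition_id is included so the clustered primary key can live on the partition scheme
    constraint pk_test_partitions primary key (pk_id, col1, partition_id)
) ON ps_mod10 (partition_id);

-- Row count per partition, to verify where the data lands
SELECT $PARTITION.pf_mod10(partition_id) AS partition_number, COUNT(*) AS row_count
FROM test_partitions
GROUP BY $PARTITION.pf_mod10(partition_id);

-- Filtering on the partitioning column as well should let the optimizer touch one partition
SELECT * FROM test_partitions WHERE pk_id = 123 AND partition_id = 123 % 10;
```

Note that a seek on a primary key that starts with pk_id is already selective, so it is worth comparing execution plans before and after to see whether partitioning buys anything here.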

Can I grab the inserted IDs when doing multiple inserts?

In my head this sounds improbable, but I'd like to know if I can do it:
INSERT INTO MyTable (Name)
VALUES ('First'),
('Second'),
('Third'),
('Fourth'),
('Fifth');
SELECT INSERTED Name, ID FROM TheAboveQuery
Where ID is an auto-indexed column?
Just to clarify, I want to select ONLY the newly inserted rows.
Starting with SQL Server 2008 you can use the OUTPUT clause with an INSERT statement:
DECLARE @T TABLE (ID INT, Name NVARCHAR(100))
INSERT INTO MyTable (Name)
OUTPUT INSERTED.ID, INSERTED.Name INTO @T
VALUES
('First'),
('Second'),
('Third'),
('Fourth'),
('Fifth');
SELECT Name, ID FROM @T;
UPDATE: if the table has no triggers, you can return the rows directly:
INSERT INTO MyTable (Name)
OUTPUT INSERTED.ID, INSERTED.Name
VALUES
('First'),
('Second'),
('Third'),
('Fourth'),
('Fifth');
Sure, you can use an IDENTITY property on your ID field, and create the CLUSTERED INDEX on it
create table MyTable ( ID int identity(1,1),
[Name] varchar(64),
constraint [PK_MyTable] primary key clustered (ID asc) on [Primary]
)
--suppose this data already existed...
INSERT INTO MyTable (Name)
VALUES
('First'),
('Second'),
('Third'),
('Fourth'),
('Fifth');
--now we insert some more... and then only return these rows
INSERT INTO MyTable (Name)
VALUES
('Sixth'),
('Seventh')
select top (@@ROWCOUNT)
ID,
Name
from MyTable
order by ID desc
@@ROWCOUNT returns the number of rows affected by the last statement executed. You can always see this in the Messages tab of SQL Server Management Studio. Thus, we get the number of rows inserted and combine it with TOP, which limits the rows returned by a query to the specified number of rows (or percentage, if you use PERCENT). It is important to use ORDER BY with TOP; otherwise your results aren't guaranteed to be the same.
From my previous edited answer...
If you are trying to see what values were inserted, then I assume you are inserting them a different way and this is usually handled with an OUTPUT clause, TRIGGER if you are trying to do something with these records after the insert, etc... more information would be needed.

SQL Server 2014 - parallel processes Insert same value in table with unique index

I have a table called dbo.mtestUnique with two columns, id and desc, and a unique index on "desc". Two processes insert data into this table at the same time. How can I avoid inserting a duplicate value and violating the unique index?
NOT EXISTS and LEFT JOIN don't work.
To replicate this, you can create a table in a test database:
CREATE TABLE mtestUnique
(
id INT ,
[DESC] varchar(50),
UNIQUE([DESC])
)
and then run the following script in two different query windows in SSMS.
SET XACT_ABORT ON;
DECLARE @time VARCHAR(50)
WHILE (1=1)
BEGIN
IF OBJECT_ID('tempdb..#t') IS NOT NULL
DROP TABLE #t
SELECT @time = CAST(DATEPART(HOUR , GETDATE()) AS VARCHAR(10)) + ':' + RIGHT('00' +CAST(DATEPART(MINUTE , GETDATE())+1 AS VARCHAR(2)),2)
SELECT MAX(id) + 1 id , 'test' + @time [DESC]
INTO #t
FROM dbo.mtestUnique
-- to insert at the exact same time
WAITFOR TIME @time
INSERT INTO dbo.mtestUnique
( id, [DESC] )
SELECT *
FROM #t t
WHERE NOT EXISTS (
SELECT 1
FROM dbo.mtestUnique u
WHERE u.[DESC] = t.[Desc]
)
END
I even put the insert in a TRAN, but no luck.
Thanks in advance for your help.
The only way to prevent a unique constraint violation is to not insert duplicate values for the column. If you have the unique constraint, it will throw an error when you try to insert a duplicate description, but it will not control what descriptions are attempted to be inserted.
With that said, if you only need a unique identifier I would highly recommend using the ID instead. Set it to an auto-incrementing integer and do not insert it manually. Just provide the description and SQL Server will populate the ID for you, avoiding duplicates.
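If the description itself has to stay unique under concurrent inserts, one commonly used pattern is to make the existence check hold its lock until the insert completes, and to treat a duplicate-key error as "someone else got there first". The following is a minimal sketch under those assumptions, not a tested fix for the script above. `@id` and `@desc` are hypothetical stand-ins for the values the question reads from #t, and THROW assumes SQL Server 2012 or later.

```sql
-- Hypothetical values for illustration; in the question they come from #t.
DECLARE @id int = 1, @desc varchar(50) = 'test12:34';

BEGIN TRY
    INSERT INTO dbo.mtestUnique (id, [DESC])
    SELECT @id, @desc
    WHERE NOT EXISTS (
        SELECT 1
        -- UPDLOCK + HOLDLOCK keep the checked key range locked until the statement
        -- finishes, so a second session cannot pass the same check concurrently
        FROM dbo.mtestUnique WITH (UPDLOCK, HOLDLOCK)
        WHERE [DESC] = @desc
    );
END TRY
BEGIN CATCH
    -- 2627 = unique constraint violation, 2601 = duplicate key in a unique index.
    -- Swallow those two (the row already exists); re-raise anything else.
    IF ERROR_NUMBER() NOT IN (2627, 2601)
        THROW;
END CATCH;
```

The lock hints serialize the two sessions on the key being checked, and the CATCH block is a safety net in case a duplicate still slips through.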

Stored proc to copy relational data (SQL Server 2000)

I've got the following tables (only key columns shown):
Order     OrderItem     OrderItemDoc   Document
=======   ===========   ============   ==========
OrderId   OrderItemId   OrderItemId    DocumentId
--etc--   OrderId       DocumentId     --etc--
          --etc--
I'm writing a stored procedure to 'clone' an Order (takes an existing OrderId as a parameter, copies the Order and all related items, then returns the new OrderId). I'm stuck on the 'OrderItemDoc' joining table as it will be joining two sets of newly created records. I'm thinking I'll need to loop round a temporary table that maps the old IDs to the new ones. Is that the right direction to go in? It's running on MS-SQL 2000.
There are many efficient ways of doing this in SQL 2005 and 2008. Here's a way to do it using SQL 2000.
You need to declare a variable to hold the cloned OrderId and create a temp table to hold the cloned records that will go in the OrderItemDoc table.
Here's some sample code showing how to do that. It relies on the sequence to link the old OrderItems to the new ones in the OrderItemDoc table.
CREATE PROCEDURE CloneOrder
(
    @OrderId int
)
AS
DECLARE @NewOrderId int
--create the cloned order
INSERT [Order](...OrderColumnList...)
SELECT ...OrderColumnList... FROM [Order] WHERE OrderId = @OrderId;
-- Get the new OrderId
SET @NewOrderId = SCOPE_IDENTITY();
-- create the cloned OrderItems, in the same sequence as the originals
INSERT OrderItem(OrderId,...OrderItemColumns...)
SELECT @NewOrderId, ...OrderItemColumns...
FROM OrderItem WHERE OrderId = @OrderId
ORDER BY OrderItemId
-- Now for the tricky part
-- Number the original and the cloned OrderItems by sequence
-- (assumes OrderItemId is an identity column, so the clones were numbered in the order above)
CREATE TABLE #OldOrderItems (Seq int IDENTITY(1,1), OrderItemId int)
CREATE TABLE #NewOrderItems (Seq int IDENTITY(1,1), OrderItemId int)
INSERT #OldOrderItems(OrderItemId)
SELECT OrderItemId FROM OrderItem WHERE OrderId = @OrderId ORDER BY OrderItemId
INSERT #NewOrderItems(OrderItemId)
SELECT OrderItemId FROM OrderItem WHERE OrderId = @NewOrderId ORDER BY OrderItemId
-- Create a temp table to hold the OrderItemIds and DocumentIds
CREATE TABLE #TempOrderItemDocs
(
    OrderItemId int,
    DocumentId int
)
-- Insert the DocumentIds associated with the original Order,
-- paired with the cloned OrderItem that has the same sequence number
INSERT #TempOrderItemDocs(OrderItemId, DocumentId)
SELECT
    n.OrderItemId, od.DocumentId
FROM
    OrderItemDoc od
    JOIN #OldOrderItems o ON o.OrderItemId = od.OrderItemId
    JOIN #NewOrderItems n ON n.Seq = o.Seq
-- Now to complete the Cloning process
INSERT OrderItemDoc(OrderItemId, DocumentId)
SELECT
    OrderItemId, DocumentId
FROM
    #TempOrderItemDocs
Yes, a memory table or a temp table would be your best options. If your PK's are identity columns then you could also make assumptions about ID's being contiguous based on an offset (ie, you could assume that your new OrderItemId is equal to the existing Max(OrderItemId) in the table + the relative offset of the Item in the Order, but I don't like making assumptions like that and it becomes a pain going more than one level deep).
Drat, I wrote this up and then saw you were on 2000... (SQL Server 2000 doesn't have the trick that this uses...)
No loop is necessary in SQL 2005:
INSERT INTO [Order] (.....) ----assuming OrderID is an identity
SELECT
.....
FROM [Order]
WHERE OrderId=@OrderId
DECLARE @y TABLE (RowID int identity(1,1) primary key not null, OldID int, NewID int)
INSERT INTO OrderItem (OrderId ......) ---assuming OrderItemId is an identity
-- Note: in a plain INSERT, OUTPUT can reference only INSERTED columns, so capturing
-- the source OrderItemId as written here requires MERGE (SQL Server 2008 and later)
OUTPUT OrderItem.OrderItemId, INSERTED.OrderItemId
INTO @y (OldID, NewID)
SELECT
OrderId .....
FROM OrderItem
WHERE OrderId=@OrderId
INSERT INTO OrderItemDoc (OrderItemId ....) ---assuming DocumentId is an identity
SELECT
y.NewID .....
FROM OrderItem
INNER JOIN @y y ON OrderItem.OrderItemId=y.OldID
do document the same way, make a new #temp table, etc...
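On SQL Server 2008 and later, the old-to-new ID mapping can be captured in one statement with MERGE, whose OUTPUT clause may reference source columns. The sketch below is self-contained and uses a hypothetical minimal schema (temp tables and a `Sku` column that are not in the question) purely to show the shape of the technique. In the real procedure `@NewOrderId` would come from SCOPE_IDENTITY() after cloning the Order row.

```sql
-- Hypothetical minimal schema, for illustration only.
CREATE TABLE #OrderItem
(
    OrderItemId int IDENTITY(1,1) PRIMARY KEY,
    OrderId     int NOT NULL,
    Sku         varchar(20) NOT NULL
);
CREATE TABLE #OrderItemDoc (OrderItemId int NOT NULL, DocumentId int NOT NULL);

INSERT #OrderItem (OrderId, Sku) VALUES (1, 'A'), (1, 'B');
INSERT #OrderItemDoc (OrderItemId, DocumentId) VALUES (1, 100), (2, 200), (2, 201);

DECLARE @OrderId int = 1, @NewOrderId int = 2;  -- @NewOrderId is assumed here
DECLARE @map TABLE (OldId int NOT NULL, NewId int NOT NULL);

-- ON 1 = 0 never matches, so every source row is inserted;
-- OUTPUT pairs each source OrderItemId with the identity value of its clone.
MERGE INTO #OrderItem AS tgt
USING (SELECT OrderItemId, Sku FROM #OrderItem WHERE OrderId = @OrderId) AS src
    ON 1 = 0
WHEN NOT MATCHED THEN
    INSERT (OrderId, Sku) VALUES (@NewOrderId, src.Sku)
OUTPUT src.OrderItemId, INSERTED.OrderItemId INTO @map (OldId, NewId);

-- Clone the join rows by translating old OrderItemIds through the map.
INSERT #OrderItemDoc (OrderItemId, DocumentId)
SELECT m.NewId, d.DocumentId
FROM #OrderItemDoc AS d
JOIN @map AS m ON m.OldId = d.OrderItemId;

SELECT * FROM #OrderItemDoc ORDER BY OrderItemId, DocumentId;
```

Because the map is produced by the same statement that creates the clones, it does not depend on identity values being contiguous or assigned in any particular order.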

Does the query plan optimizer work well with joined/filtered table-valued functions?

In SQL Server 2005, I'm using table-valued functions as a convenient way to perform arbitrary aggregation on a subset of data from a large table (passing a date range or similar parameters).
I'm using these inside larger queries as joined computations, and I'm wondering whether the query plan optimizer works well with them in every condition or whether I'd be better off unnesting such computations in my larger queries.
1) Does the query plan optimizer unnest table-valued functions if it makes sense?
2) If it doesn't, what do you recommend to avoid the code duplication that would occur by manually unnesting them?
3) If it does, how do you identify that from the execution plan?
code sample:
create table dbo.customers (
[key] uniqueidentifier
, constraint pk_dbo_customers
primary key ([key])
)
go
/* assume large amount of data */
create table dbo.point_of_sales (
[key] uniqueidentifier
, customer_key uniqueidentifier
, constraint pk_dbo_point_of_sales
primary key ([key])
)
go
create table dbo.product_ranges (
[key] uniqueidentifier
, constraint pk_dbo_product_ranges
primary key ([key])
)
go
create table dbo.products (
[key] uniqueidentifier
, product_range_key uniqueidentifier
, release_date datetime
, constraint pk_dbo_products
primary key ([key])
, constraint fk_dbo_products_product_range_key
foreign key (product_range_key)
references dbo.product_ranges ([key])
)
go
/* assume large amount of data */
create table dbo.sales_history (
[key] uniqueidentifier
, product_key uniqueidentifier
, point_of_sale_key uniqueidentifier
, accounting_date datetime
, amount money
, quantity int
, constraint pk_dbo_sales_history
primary key ([key])
, constraint fk_dbo_sales_history_product_key
foreign key (product_key)
references dbo.products ([key])
, constraint fk_dbo_sales_history_point_of_sale_key
foreign key (point_of_sale_key)
references dbo.point_of_sales ([key])
)
go
create function dbo.f_sales_history_..snip.._date_range
(
@accountingdatelowerbound datetime,
@accountingdateupperbound datetime
)
returns table as
return (
select
pos.customer_key
, sh.product_key
, sum(sh.amount) amount
, sum(sh.quantity) quantity
from
dbo.point_of_sales pos
inner join dbo.sales_history sh
on sh.point_of_sale_key = pos.[key]
where
sh.accounting_date between
@accountingdatelowerbound and
@accountingdateupperbound
group by
pos.customer_key
, sh.product_key
)
go
-- TODO: insert some data
-- this is a table containing a selection of product ranges
declare @selectedproductranges table([key] uniqueidentifier)
-- this is a table containing a selection of customers
declare @selectedcustomers table([key] uniqueidentifier)
declare @low datetime
, @up datetime
-- TODO: set top query parameters
select
saleshistory.customer_key
, saleshistory.product_key
, saleshistory.amount
, saleshistory.quantity
from
dbo.products p
inner join @selectedproductranges productrangeselection
on p.product_range_key = productrangeselection.[key]
inner join @selectedcustomers customerselection on 1 = 1
inner join
dbo.f_sales_history_..snip.._date_range(@low, @up) saleshistory
on saleshistory.product_key = p.[key]
and saleshistory.customer_key = customerselection.[key]
I hope the sample makes sense.
Much thanks for your help!
In this case, it's an "inline table-valued function".
The optimiser simply expands (unnests) it, as it would a view, if that's useful.
If the function is treated as a "black box" by the outer query, the quickest way to tell is to compare the IO shown in SSMS with the IO shown in Profiler.
Profiler captures "black box" IO that SSMS does not.
Blog post by Adam Machanic (his book is in my drawer at work)
1) Yes, using your syntax, it does. If you happened to use a UDF that returned a table which had conditional logic in it, it would not, though.
3) The optimizer won't point out what part of your query it's optimizing, because it may see fit to combine chunks of the plan with your function, or to optimize bits away.
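To make the distinction in 1) concrete, here is a small sketch contrasting the two kinds of table-valued function. All object names (`dbo.demo_sales`, `dbo.demo_sales_inline`, `dbo.demo_sales_multi`) are hypothetical and not part of the question's schema.

```sql
-- Hypothetical table and function names, for illustration only.
CREATE TABLE dbo.demo_sales
(
    id              int PRIMARY KEY,
    accounting_date datetime NOT NULL,
    amount          money NOT NULL
);
GO
-- Inline TVF: the body is a single RETURN (SELECT ...).
-- The optimizer can expand it into the calling query, much like a view.
CREATE FUNCTION dbo.demo_sales_inline (@low datetime, @up datetime)
RETURNS TABLE
AS
RETURN
(
    SELECT id, amount
    FROM dbo.demo_sales
    WHERE accounting_date BETWEEN @low AND @up
);
GO
-- Multi-statement TVF: a BEGIN...END body (here with conditional logic) fills a table variable.
-- The calling query sees only the returned table, not the statements inside.
CREATE FUNCTION dbo.demo_sales_multi (@low datetime, @up datetime)
RETURNS @result TABLE (id int, amount money)
AS
BEGIN
    IF @low <= @up
        INSERT @result (id, amount)
        SELECT id, amount
        FROM dbo.demo_sales
        WHERE accounting_date BETWEEN @low AND @up;
    RETURN;
END;
GO
-- Compare the actual execution plans of these two queries.
SELECT id, amount FROM dbo.demo_sales_inline('20090101', '20091231');
SELECT id, amount FROM dbo.demo_sales_multi('20090101', '20091231');
```

In the plan for the inline version you should see dbo.demo_sales accessed directly, with no trace of the function, which is one way to tell it was unnested. The multi-statement version typically shows up as a separate Table Valued Function operator.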

Resources