SQL Server rows not in order of clustered index - sql-server

I have a table that has a clustered index on the id
[SomeID] [bigint] IDENTITY(1,1) NOT NULL,
When I run:
select top 1000 * from some where date > '20150110'
my records are not in order.
When I run:
select top 1000 * from some where date > '20150110' and date < '20150111'
they are in order?
The index is:
CONSTRAINT [PK_Some] PRIMARY KEY CLUSTERED
(
[SomeID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
I have never come across this before. Does anyone have an idea of what is happening and how I can fix it?
Thanks

You can't rely on an order if you do not specify one. Add an ORDER BY clause.
Otherwise the database returns rows in whatever order the chosen plan produces them fastest, and that is not always the order of the clustered index.
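As a minimal sketch of that advice, the first query from the question can be given an explicit sort on the clustered key. The brackets around the table and column names are a precaution added here (SOME is a reserved word in T-SQL); everything else comes from the question.

```sql
-- Without ORDER BY, TOP returns whichever 1000 qualifying rows the plan produces first.
-- With ORDER BY on the unique key, both the rows chosen and their order are deterministic.
SELECT TOP (1000) *
FROM [some]
WHERE [date] > '20150110'
ORDER BY [SomeID] ASC;
```

Because the sort column is also the clustered index key, the sort itself should usually be cheap when the plan scans that index.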

Related

Explicit conversion of column in Table Partition in SQL Server

I have table like below:
CREATE TABLE [dbo].[PartitionExample]
(
[dateTimeColumn1] [datetime] NOT NULL,
CONSTRAINT [PK_PartitionExample] PRIMARY KEY CLUSTERED
(
[dateTimeColumn1] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY =
OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
I have created Partition Function like below:
CREATE PARTITION FUNCTION DateRangePF (INT)
AS RANGE RIGHT FOR VALUES ( 20180601,20180901,20181201,20190301)
Then, I have created Partition Scheme for it:
CREATE PARTITION SCHEME DateRangePS
AS PARTITION DateRangePF TO
(FG032018_SampleDB,FG062018_SampleDB,FG092018_SampleDB,
FG122018_SampleDB,FG032019_SampleDB);
Now, when applying the partition scheme to this table, I want to explicitly convert the [dateTimeColumn1] column from the datetime data type to INT. But when I tried it, I got a syntax error:
ALTER TABLE [dbo].[PartitionExample] ADD
CONSTRAINT [PK_PartitionExample] PRIMARY KEY CLUSTERED
(
dateTimeColumn1 ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, ALLOW_ROW_LOCKS =
ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90)
ON DateRangePS(
convert(INT, CONVERT(CHAR(8), dateTimeColumn1, 112));
Can you please let me know how explicit conversion can be implemented in such scenarios?
Also, would it perform better to explicitly convert the datetime column, or a char(8) column, to an INT column for partitioning?
Thank you for your help.

Optimization for Date Correlation doesn’t change plan

I have a reporting requirement for the following tables. I created a new database with these tables and imported data from the live database for reporting purposes.
The report parameter is a date range. I read the references below and found that DATE_CORRELATION_OPTIMIZATION can make the query faster by using a seek instead of a scan. I made the required settings, but the query still uses the same old plan and has the same execution time. What additional changes are needed for the query to use the date correlation?
Note: I am using SQL Server 2005
REFERENCES
Optimizing Queries That Access Correlated datetime Columns
The Query Optimizer: Date Correlation Optimisation
SQL
--Database change made for date correlation
ALTER DATABASE BISourcingTest
SET DATE_CORRELATION_OPTIMIZATION ON;
GO
--Settings made
SET ANSI_NULLS ON
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON
SET ARITHABORT ON
SET CONCAT_NULL_YIELDS_NULL ON
SET QUOTED_IDENTIFIER ON
SET NUMERIC_ROUNDABORT OFF
GO
--Test Setting
IF ( (sessionproperty('ANSI_NULLS') = 1) AND
(sessionproperty('ANSI_PADDING') = 1) AND
(sessionproperty('ANSI_WARNINGS') = 1) AND
(sessionproperty('ARITHABORT') = 1) AND
(sessionproperty('CONCAT_NULL_YIELDS_NULL') = 1) AND
(sessionproperty('QUOTED_IDENTIFIER') = 1) AND
(sessionproperty('NUMERIC_ROUNDABORT') = 0)
)
PRINT 'Everything is set'
ELSE
PRINT 'Different Setting'
--Query
SELECT C.ContainerID, C.CreatedOnDate,OLIC.OrderID
FROM ContainersTest C
INNER JOIN OrderLineItemContainers OLIC
ON OLIC.ContainerID = C.ContainerID
WHERE C.CreatedOnDate > '1/1/2015'
AND C.CreatedOnDate < '2/01/2015'
TABLES
CREATE TABLE [dbo].[ContainersTest](
[ContainerID] [varchar](20) NOT NULL,
[Weight] [decimal](9, 2) NOT NULL DEFAULT ((0)),
[CreatedOnDate] [datetime] NOT NULL DEFAULT (getdate()),
CONSTRAINT [XPKContainersTest] PRIMARY KEY CLUSTERED
(
[CreatedOnDate] ASC,
[ContainerID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[OrderLineItemContainers](
[OrderID] [int] NOT NULL,
[LineItemID] [int] NOT NULL,
[ContainerID] [varchar](20) NOT NULL,
[CreatedOnDate] [datetime] NOT NULL DEFAULT (getdate()),
CONSTRAINT [PK_POLineItemContainers] PRIMARY KEY CLUSTERED
(
[OrderID] ASC,
[LineItemID] ASC,
[ContainerID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
CONSTRAINT [IX_OrderLineItemContainers] UNIQUE NONCLUSTERED
(
[ContainerID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
ALTER TABLE [dbo].[OrderLineItemContainers] WITH CHECK ADD CONSTRAINT [FK_POLineItemContainers_Containers] FOREIGN KEY([ContainerID])
REFERENCES [dbo].[Containers] ([ContainerID])
GO
ALTER TABLE [dbo].[OrderLineItemContainers] CHECK CONSTRAINT [FK_POLineItemContainers_Containers]
According to the docs:
https://technet.microsoft.com/en-us/library/ms177416(v=sql.105).aspx
If any one of the datetime columns for which correlation statistics are maintained is not the first or only key of a clustered index, consider creating a clustered index on it. Doing this generally leads to better performance on the types of queries covered by correlation statistics. If a clustered index already exists on the primary key columns, you can modify a table so that the clustered index and primary key use different column sets.
Since your OrderLineItemContainers table has no suitable index for filtering on the date, the optimizer can't do anything with the correlation. Try adding a nonclustered index on OrderLineItemContainers.CreatedOnDate to see whether the plan changes.
A clustered index would be better, but there are other considerations. Note that you could make the primary key nonclustered and use the clustered index for the date column, if this is the dominant query and that makes it worthwhile.
So this would be the preferred layout:
CREATE TABLE [dbo].[OrderLineItemContainers](
[OrderID] [int] NOT NULL,
[LineItemID] [int] NOT NULL,
[ContainerID] [varchar](20) NOT NULL,
[CreatedOnDate] [datetime] NOT NULL DEFAULT (getdate()),
CONSTRAINT [PK_POLineItemContainers] PRIMARY KEY NONCLUSTERED -- NONCLUSTERED PRIMARY KEY!!
(
[OrderID] ASC,
[LineItemID] ASC,
[ContainerID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
CONSTRAINT [IX_OrderLineItemContainers] UNIQUE NONCLUSTERED
(
[ContainerID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
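-- CREATE INDEX requires an index name; the names used here are illustrative.
CREATE CLUSTERED INDEX [CIX_OrderLineItemContainers_CreatedOnDate] ON OrderLineItemContainers(CreatedOnDate)
Or you could just try a new NONCLUSTERED index:
CREATE NONCLUSTERED INDEX [IX_OrderLineItemContainers_CreatedOnDate] ON OrderLineItemContainers(CreatedOnDate)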
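Before comparing plans, it may also be worth confirming that the database option actually took effect. A small check, assuming the database name from the question; the is_date_correlation_on column of sys.databases reports the setting:

```sql
-- is_date_correlation_on = 1 means DATE_CORRELATION_OPTIMIZATION is ON for that database.
SELECT name, is_date_correlation_on
FROM sys.databases
WHERE name = N'BISourcingTest';
```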

Updating a table after adding Index

I am designing a database using SQLExpress.
I have a table which has three columns. The table looks as below.
CREATE TABLE [dbo].[dummy](
[id] [int] IDENTITY(1,1) NOT NULL,
[someLongString] [text] NOT NULL,
[someLongText_Hash] [binary](20) NOT NULL,
CONSTRAINT [PK_dummy] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
I already have some data in this table. Whenever I want to add a new row, I first compute a hash on someLongString and query the table to see if a row with this hash already exists. As the table grows, this query takes longer, so I plan to index the table on the someLongText_Hash column.
Can someone please suggest how to do this in SQL Server Management Studio? Also, after adding this index, how do I index the existing rows in this table?
Why can't you just set the 'someLongString' field to be unique? That way you don't need to keep a hash and an extra primary key.
You could try using a CHECKSUM.
CREATE TABLE [dbo].[dummy](
[id] [int] IDENTITY(1,1) NOT NULL,
[someLongString] [text] NOT NULL,
[someLongText_CheckSum] [int] NOT NULL, -- CHECKSUM returns int
CONSTRAINT [UC_someLongText_CheckSum] UNIQUE (someLongText_CheckSum),
CONSTRAINT [PK_dummy] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
See here for further explanation
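For the original question about indexing the hash column, a minimal sketch follows. The index name is hypothetical, and the HASHBYTES call is only a stand-in for however the application computes its 20-byte hash. Creating an index builds it over all rows already in the table, so no separate step is needed for existing data.

```sql
-- Hypothetical index name; the table and column come from the question.
CREATE NONCLUSTERED INDEX [IX_dummy_someLongText_Hash]
    ON [dbo].[dummy] ([someLongText_Hash]);

-- Existence check that can now seek on the index instead of scanning the table.
-- The hash value here is a stand-in; SHA1 is assumed because the column is binary(20).
DECLARE @hash binary(20);
SET @hash = HASHBYTES('SHA1', 'example text');

SELECT [id]
FROM [dbo].[dummy]
WHERE [someLongText_Hash] = @hash;
```

A non-unique index is used here on the assumption that two different strings could, in principle, share a hash; the application can compare someLongString for any rows returned.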

primary key name is required field?

Is there any difference between the below 2 CREATE TABLE statements in SQL Server 200x/2012? I generated this script from two different tables, one had a Key name defined (PK_Table1) whereas the other had some kind of randomly generated number associated to it (PK_Table1_1084F446).
CREATE TABLE [dbo].[Table1](
[ID] [uniqueidentifier] NOT NULL,
<<Other Column declaration here>>
PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
Few more non-clustered indexes declaration here
CREATE TABLE [dbo].[Table1](
[ID] [uniqueidentifier] NOT NULL,
<<Other Column declaration here>>
CONSTRAINT [PK_Table1] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
Few more non-clustered indexes declaration here
They work the same way, but meaningful names are more convenient:
1) when altering a constraint, you can easily refer to it (if you gave it a sensible name);
2) when a query fails because of a constraint, the name of that constraint is shown, so you can easily see what caused the error (if you gave it a sensible name).
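To illustrate the first point, here is a small sketch using the table from the question. With a system-generated name you first have to look the name up in the catalog, whereas an explicit name can be used directly in scripts that run against any copy of the database.

```sql
-- Find the primary key constraint name, whether explicit or system-generated.
SELECT name
FROM sys.key_constraints
WHERE parent_object_id = OBJECT_ID(N'dbo.Table1')
  AND type = 'PK';

-- With an explicit name, the constraint can be referenced without a lookup.
ALTER TABLE [dbo].[Table1] DROP CONSTRAINT [PK_Table1];
```

System-generated names differ between databases, so a script that hard-codes one will usually not work elsewhere.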

Outer join on two tables with sequential guid stalls

I'm attempting to perform a full outer join on two tables that are not related. Each table has a location_id which will eventually form the primary/foreign key relationship (once I figure out this performance issue). When executing the outer join, it just clocks away. Queries and triggers performed against each table on its own complete in less than a second.
This table has 21000 records:
CREATE TABLE [dbo].[TBL_LOCATIONS](
[OBJECTID] [int] NOT NULL,
[Loc_Name] [nvarchar](100) NULL,
[Location_ID] [uniqueidentifier] NULL,
[SHAPE] [geometry] NULL,
CONSTRAINT [R33_pk] PRIMARY KEY CLUSTERED
(
[OBJECTID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 75) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[TBL_LOCATIONS] WITH CHECK ADD CONSTRAINT [g17_ck] CHECK (([SHAPE].[STSrid]=(26917)))
GO
ALTER TABLE [dbo].[TBL_LOCATIONS] ADD CONSTRAINT [DF_TBL_LOCATIONS_Location_ID] DEFAULT (newsequentialid()) FOR [Location_ID]
GO
CREATE SPATIAL INDEX [S17_idx] ON [dbo].[TBL_LOCATIONS]
(
[SHAPE]
)USING GEOMETRY_GRID
WITH (
BOUNDING_BOX =(224827, 3923750, 323464, 3967780), GRIDS =(LEVEL_1 = HIGH,LEVEL_2 = HIGH,LEVEL_3 = HIGH,LEVEL_4 = HIGH),
CELLS_PER_OBJECT = 16, PAD_INDEX = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE UNIQUE NONCLUSTERED INDEX [UUID_OID_33] ON [dbo].[TBL_LOCATIONS]
(
[Location_ID] ASC,
[OBJECTID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 75) ON [PRIMARY]
GO
This table has 53000 records
CREATE TABLE [dbo].[TBL_EVENTS](
[OBJECTID] [int] NOT NULL,
[Event_ID] [uniqueidentifier] NULL,
[Location_ID] [uniqueidentifier] NULL,
CONSTRAINT [PK_TBL_EVENTS] PRIMARY KEY CLUSTERED
(
[OBJECTID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[TBL_EVENTS] ADD CONSTRAINT [DF_TBL_EVENTS_Event_ID] DEFAULT (newsequentialid()) FOR [Event_ID]
GO
CREATE UNIQUE NONCLUSTERED INDEX [R36_SDE_ROWID_UK] ON [dbo].[TBL_EVENTS]
(
[OBJECTID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 75) ON [PRIMARY]
GO
And here is the query that is running....and running...1 hour and no results.
SELECT
TBL_LOCATIONS.Loc_Name,
TBL_LOCATIONS.Location_ID,
TBL_LOCATIONS.SHAPE,
TBL_EVENTS.Event_ID
FROM
TBL_EVENTS
FULL OUTER JOIN
TBL_LOCATIONS ON TBL_EVENTS.Location_ID = TBL_LOCATIONS.Location_ID
I've tried every permutation of attribute indexes on both tables, rebuilding and reorganizing them, nothing affects the performance. The use of ObjectID as PK is mandated by the application, as is the sequentialGUID. I don't think those are factors here, as both these tables perform splendidly outside of this query. SQL Server 2008 SP1 64BIT on RAID 10/48 GB RAM.
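A FULL JOIN works well when the values in the columns used to link the tables are unique.
For rows with duplicated join values, a FULL JOIN pairs every matching row on one side with every matching row on the other, much like a CROSS JOIN within that value, and this can cause performance issues.
So the bottleneck probably comes from duplicates in the Location_ID column.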
Maybe you need to consider turning off Transaction Logging whilst doing all that.
If the values in the join column (Location_ID) are not very unique, the result set can become very large.
In an extreme example, if Location_ID held a single value in every row of both tables, the result would be the full cross join: 1,113,000,000 rows (21,000 * 53,000). A query of that size (over a billion rows) will take a long time to run.
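One way to test the duplicate theory before running the join is to count rows per Location_ID on each side and multiply the counts. This is a diagnostic sketch, not a fix; it uses only the tables and columns from the question.

```sql
-- Most frequent Location_ID values in TBL_EVENTS (the same check applies to TBL_LOCATIONS).
SELECT TOP (20) Location_ID, COUNT(*) AS row_count
FROM dbo.TBL_EVENTS
GROUP BY Location_ID
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC;

-- Number of matched rows the join would produce: for each shared Location_ID,
-- (rows in TBL_EVENTS) * (rows in TBL_LOCATIONS). NULL values never match on
-- equality, so they are excluded here and appear only as unmatched rows in a FULL JOIN.
SELECT SUM(CAST(e.row_count AS bigint) * l.row_count) AS matched_rows
FROM (SELECT Location_ID, COUNT(*) AS row_count
      FROM dbo.TBL_EVENTS
      GROUP BY Location_ID) AS e
JOIN (SELECT Location_ID, COUNT(*) AS row_count
      FROM dbo.TBL_LOCATIONS
      GROUP BY Location_ID) AS l
  ON e.Location_ID = l.Location_ID;
```

If matched_rows comes back in the hundreds of millions or more, the row explosion described above is the likely cause of the stall.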