speed up long running query on huge tables - sql-server

I am facing with problematic query on Azure SQL database, which I need to speed up.
This is my query:
SELECT
[Incidents].[Incident_Number],
[Incidents].[Incidentinteraction],
[Incidents].[Incidentid],
[Address].[Ads_Sk]
FROM
[schema1].[Address] AS Address --3529046 rows
JOIN
[schema2].[Incidents] AS Incidents --3268375 rows
ON Incidents.[Ads_Sk_Incidentaddress] = Address.[Ads_Sk]
Address table has 2 indexes:
ALTER TABLE [schema1].[ADDRESS]
ADD PRIMARY KEY CLUSTERED ([ADS_SK] ASC, [ISCURRENTRECORD] ASC, [RECORDSTARTDATE] ASC)
WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [PRIMARY]
and a non-clustered index:
CREATE NONCLUSTERED INDEX [nci_ADDRESS_ADS_CURRENT]
ON [PROMISE_CDW].[ADDRESS] ([ADS_SK] ASC, [ISCURRENTRECORD] ASC)
WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
Incident table has also two indexes:
ALTER TABLE [schema2].[INCIDENTS]
ADD PRIMARY KEY CLUSTERED ([INCIDENTID] ASC, [ISCURRENTRECORD] ASC, [RECORDSTARTDATE] ASC)
WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [PRIMARY]
and
CREATE NONCLUSTERED INDEX [nci_ADDRESS_ADS_SK_INCIDENT_NUMBER]
ON [schema2].[INCIDENTS] ([ADS_SK_INCIDENTADDRESS] ASC, [INCIDENT_NUMBER] ASC)
INCLUDE ([INCIDENTID], [INCIDENTINTERACTION])
WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
GO
I got my results after 22 seconds and this is unacceptable for business users.
How can I speed up this query?
Thank you in advance for any hint

That query doesn't need a join, all the fields you're selecting belong to the Incidents table.
You're only using the Address table to filter rows that don't have Incidents.[Ads_Sk_Incidentaddress] as part of Address.[Ads_Sk], which is something you can easily do in a where clause with an in.

Related

Composite key in where condition - SQL Server

I'm involved in some data cleaning activities. My table does not have any unique identifier. So I decided to take 3 columns and make the index like below:
ALTER TABLE [dbo].[ISFTX]
ADD CONSTRAINT ISFTX_UNQ UNIQUE (IDNO,ACCTNUM,PANNO)
After altering table, a part of my alter script looks like this:
CONSTRAINT [ISFTX_UNQ] UNIQUE NONCLUSTERED
(
[IDNO] ASC,
[ACCTNUM] ASC,
[PANNO] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
I want to use the combination of 3 columns as ONE KEY and put it in where condition so that I can search for other column values in the same table using this where condition(ONE KEY). How can I do this?
I would prefer any example of "Where" condition and this will help? OR Please suggest me any better way of doing this? Thanks.

Why FREETEXTTABLE scan entire table if there is a fulltext index available?

There is a simple table of cities
CREATE TABLE [dbo].[cities](
[id] [numeric](6, 0) NOT NULL,
[cityname] [varchar](48) NOT NULL,
)
and a full-text catalog which index not only names of cities:
The table has 6259 rows.
I have a query:
SELECT * FROM FREETEXTTABLE([cities],[cityname],
N'Hello, I am looking for cities inside this text,
my house is not -far from London PJ-')
The results contains 11 rows and the query execution takes about 1-2 seconds.
According to execution plan, 6092 rows were scanned in the index.
Why so many rows are scanned if there is index?
I Use SQL Server 2012.
UPDATE:
There is no special index on table cities. The only index from object explorer looks like:
Create TABLE [dbo].[cities] ADD CONSTRAINT [xpkruianobec] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
GO

SQL Server rows not in order of clustered index

I have a table that has a clustered index on the id
[SomeID] [bigint] IDENTITY(1,1) NOT NULL,
When I do
select top 1000 * from some where date > '20150110'
My records are not in order
When I do:
select top 1000 * from some where date > '20150110' and date < '20150111'
They are in order?
Index is :
CONSTRAINT [PK_Some] PRIMARY KEY CLUSTERED
(
[SomeID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
I have never come across this before, does anyone have an idea of what is happening and how I can fix this.
Thanks
You can't rely on an order if you do not specify one. Add an order by clause.
Otherwise the DB will just grab the result as fast as possible and that is not always in the order of the index.

SQL Server not using index

I'm trying to write a query in SQL Server, but it's doing a table scan on a table with about 30 million rows (TGS_INFO), so the query runs very slowly.
The actual query is more complex but I've reduced it to a simpler version that still exhibits the same issue.
SELECT DISTINCT UNIT_ITEMS.DBKEY,
UNIT_ITEMS.ID,
UNIT_ITEMS.LOCATION1,
UNIT_ITEMS.LOCATION2
FROM UNIT_ITEMS
INNER JOIN TGS.dbo.TGS_INFO
ON UNIT_ITEMS.UNIT_ID = TGS_INFO.UNIT_ID AND
UNIT_ITEMS.ITEM_ID = TGS_INFO.ITEM_ID AND
UNIT_ITEMS.LOCATION1 = TGS_INFO.LOCATION1 AND
UNIT_ITEMS.LOCATION2 = TGS_INFO.LOCATION2
Here is the execution plan.
StmtText
|--Sort(DISTINCT ORDER BY:([DbName].[dbo].[UNIT_ITEMS].[DBKEY] ASC, [DbName].[dbo].[UNIT_ITEMS].[ITEM_ID] ASC, [DbName].[dbo].[UNIT_ITEMS].[LOCATION1] ASC, [DbName].[dbo].[UNIT_ITEMS].[LOCATION2] ASC))
|--Hash Match(Inner Join, HASH:([DbName].[dbo].[UNIT_ITEMS].[UNIT_ID], [DbName].[dbo].[UNIT_ITEMS].[ITEM_ID], [DbName].[dbo].[UNIT_ITEMS].[LOCATION1], [DbName].[dbo].[UNIT_ITEMS].[LOCATION2])=([Expr1008], [Expr1009], [Expr1010], [Expr1011]), RESIDUAL:([DbName].[dbo].[UNIT_ITEMS].[UNIT_ID]=[Expr1008] AND [DbName].[dbo].[UNIT_ITEMS].[ITEM_ID]=[Expr1009] AND [DbName].[dbo].[UNIT_ITEMS].[LOCATION1]=[Expr1010] AND [DbName].[dbo].[UNIT_ITEMS].[LOCATION2]=[Expr1011]))
|--Table Scan(OBJECT:([DbName].[dbo].[UNIT_ITEMS]))
|--Compute Scalar(DEFINE:([Expr1008]=CONVERT_IMPLICIT(int,[TGS].[dbo].[TGS_INFO].[UNIT_ID],0), [Expr1009]=CONVERT_IMPLICIT(nvarchar(50),[TGS].[dbo].[TGS_INFO].[ITEM_ID],0), [Expr1010]=CONVERT_IMPLICIT(nvarchar(50),[TGS].[dbo].[TGS_INFO].[LOCATION1],0), [Expr1011]=CONVERT_IMPLICIT(int,[TGS].[dbo].[TGS_INFO].[LOCATION2],0)))
|--Table Scan(OBJECT:([TGS].[dbo].[TGS_INFO]))
TGS_INFO and UNIT_ITEMS both have nonclustered indexes on UNIT_ID and ITEM_ID. As mentioned, TGS_INFO has about 30 million rows but they are evenly distributed around about a thousand different UNIT_IDs. UNIT_ITEMS always contains only one UNIT_ID.
Here are the indexes:
CREATE NONCLUSTERED INDEX [IX_UNIT_ID_ITEM_ID] ON [dbo].[TGS_INFO]
(
[UNIT_ID] ASC,
[ITEM_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [IX_UNIT_ID_ITEM_ID] ON [dbo].[UNIT_ITEMS]
(
[UNIT_ID] ASC,
[ITEM_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
As I mentioned in the comments, all the columns are VARCHAR(50) in TGS_INFO. All the columns in UNIT_ITEMS are ints.
For the record, I didn't design the schema of TGS_INFO.
If you don't include LOCATION1 and LOCATION2 in your indexes the join cannot be satisfied from an index alone. Add these columns to the indexes on both tables.
You probably have to include all other columns that are referenced in your query, too.
I notice the execution plan shows the following:
|--Compute Scalar(DEFINE:([Expr1008]=CONVERT_IMPLICIT(int,[TGS].[dbo].[TGS_INFO].[UNIT_ID],0), [Expr1009]=CONVERT_IMPLICIT(nvarchar(50),[TGS].[dbo].[TGS_INFO].[ITEM_ID],0), [Expr1010]=CONVERT_IMPLICIT(nvarchar(50),[TGS].[dbo].[TGS_INFO].[LOCATION1],0), [Expr1011]=CONVERT_IMPLICIT(int,[TGS].[dbo].[TGS_INFO].[LOCATION2],0)))
I can't think of a good reason for the query engine to do an implicit data type conversion on these columns unless the data types between the two tables don't match on the columns you're using for the join.
You may also try moving UNIT_ITEMS.LOCATION1 = TGS_INFO.LOCATION1 AND UNIT_ITEMS.LOCATION2 = TGS_INFO.LOCATION2 to the WHERE clause since they're not covered by an index. The query engine is typically smart enough to account for this, but it's something to try.

Outer join on two tables with sequential guid stalls

I'm attempting to perform a full outer join on two tables that are not related. Each table has a location_id which will eventually form the primary/foreign key relationship (once I figure out this performance issue). When executing the outer join, it just clocks away. Queries and triggers performed against each table on its own complete in less than a second.
This table has 21000 records:
CREATE TABLE [dbo].[TBL_LOCATIONS](
[OBJECTID] [int] NOT NULL,
[Loc_Name] [nvarchar](100) NULL,
[Location_ID] [uniqueidentifier] NULL,
[SHAPE] [geometry] NULL,
CONSTRAINT [R33_pk] PRIMARY KEY CLUSTERED
(
[OBJECTID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 75) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[TBL_LOCATIONS] WITH CHECK ADD CONSTRAINT [g17_ck] CHECK (([SHAPE].[STSrid]=(26917)))
GO
ALTER TABLE [dbo].[TBL_LOCATIONS] ADD CONSTRAINT [DF_TBL_LOCATIONS_Location_ID] DEFAULT (newsequentialid()) FOR [Location_ID]
GO
CREATE SPATIAL INDEX [S17_idx] ON [dbo].[TBL_LOCATIONS]
(
[SHAPE]
)USING GEOMETRY_GRID
WITH (
BOUNDING_BOX =(224827, 3923750, 323464, 3967780), GRIDS =(LEVEL_1 = HIGH,LEVEL_2 = HIGH,LEVEL_3 = HIGH,LEVEL_4 = HIGH),
CELLS_PER_OBJECT = 16, PAD_INDEX = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE UNIQUE NONCLUSTERED INDEX [UUID_OID_33] ON [dbo].[TBL_LOCATIONS]
(
[Location_ID] ASC,
[OBJECTID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 75) ON [PRIMARY]
GO
This table has 53000 records
CREATE TABLE [dbo].[TBL_EVENTS](
[OBJECTID] [int] NOT NULL,
[Event_ID] [uniqueidentifier] NULL,
[Location_ID] [uniqueidentifier] NULL,
CONSTRAINT [PK_TBL_EVENTS] PRIMARY KEY CLUSTERED
(
[OBJECTID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[TBL_EVENTS] ADD CONSTRAINT [DF_TBL_EVENTS_Event_ID] DEFAULT (newsequentialid()) FOR [Event_ID]
GO
ALTER TABLE [dbo].[TBL_EVENTS] ADD CONSTRAINT [DF_TBL_EVENTS_Event_ID] DEFAULT (newsequentialid()) FOR [Event_ID]
GO
CREATE UNIQUE NONCLUSTERED INDEX [R36_SDE_ROWID_UK] ON [dbo].[TBL_EVENTS]
(
[OBJECTID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 75) ON [PRIMARY]
GO
And here is the query that is running....and running...1 hour and no results.
SELECT
TBL_LOCATIONS.Loc_Name,
TBL_LOCATIONS.Location_ID,
TBL_LOCATIONS.SHAPE,
TBL_EVENTS.Event_ID
FROM
TBL_EVENTS
FULL OUTER JOIN
TBL_LOCATIONS ON TBL_EVENTS.Location_ID = TBL_LOCATIONS.Location_ID
I've tried every permutation of attribute indexes on both tables, rebuilding and reorganizing them, nothing affects the performance. The use of ObjectID as PK is mandated by the application, as is the sequentialGUID. I don't think those are factors here, as both these tables perform splendidly outside of this query. SQL Server 2008 SP1 64BIT on RAID 10/48 GB RAM.
FULL JOIN works well when data in columns used to links tables are unique.
For rows containing duplicated data FULL JOIN behaves like CROSS JOIN and can cause performace issues.
So probably bottleneck comes from duplicates in LOCATION_ID column.
Maybe you need to consider turning off Transaction Logging whilst doing all that.
If the linked field values are not all that unique (location), the query size could approach quite a large number.
In an extreme example, if location only had the value of "1" in both tables, the total rows would be close to the cross join size, about 1,113,000,000 rows (21,000 * 53,000). A query of this size (over a billion rows) will take a long time to run.
EDIT - updating incorrect statement as pointed out in comments

Resources