How can I optimize this SQL outer join query? - sql-server
SCENARIO
I need to select records from test_userData based on a 1-to-1 match from test_userCheck on the columns customer or account_info. The code below creates a mock-up of the tables and populates them with random data for the purpose of my question. With this data, the query looks for any records where test_userData.customer = 'Guerrero, Unity' or test_userData.account_info = 'XXXXXXXXXXXXXXXX0821', and should return three rows (confirmation_id = 6836985, 5502798, and 3046441).
PROBLEM
As it stands, the query returns what I need; however, my real userData table has almost 2 million records, and the userCheck table has about 10,000. The query takes about 7 seconds as it is, and I think that's way too long. I'm also worried because the userData table will start to grow quickly (by tens of thousands of unique records a day), and I expect my current method to become unmanageable.
QUESTION
Any ideas on how I can optimize this to scale to millions of records? The data resides on a shared SQL Server 2008 instance with limited permissions.
--setup temporary testing tables
IF EXISTS
(
SELECT * FROM dbo.sysobjects
WHERE id = object_id(N'[dbo].[test_userData]')
AND OBJECTPROPERTY(id, N'IsUserTable') = 1
)
DROP TABLE [dbo].[test_userData]
GO
IF EXISTS
(
SELECT * FROM dbo.sysobjects
WHERE id = object_id(N'[dbo].[test_userCheck]')
AND OBJECTPROPERTY(id, N'IsUserTable') = 1
)
DROP TABLE [dbo].[test_userCheck]
GO
CREATE TABLE [dbo].[test_userData](
[id] [int] IDENTITY(1,1) NOT NULL,
[merchant_id] [int] NOT NULL,
[sales_date] [datetime] NOT NULL,
[confirmation_id] [int] NOT NULL,
[customer] [nvarchar](max) NOT NULL,
[total] [smallmoney] NOT NULL,
[account_info] [nvarchar](max) NOT NULL,
[email_address] [nvarchar](max) NOT NULL,
CONSTRAINT [PK_test_userData] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[test_userCheck](
[confirmation_id] [int] NOT NULL,
[customer] [nvarchar](max) NOT NULL,
[total] [smallmoney] NOT NULL,
[account_info] [nvarchar](max) NOT NULL,
CONSTRAINT [PK_test_userCheck] PRIMARY KEY CLUSTERED
(
[confirmation_id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
--insert some random user transactions
INSERT INTO [dbo].[test_userData] (merchant_id,sales_date,confirmation_id,customer,total,account_info,email_address) VALUES
('99','03/25/2010','3361424','Soto, Ahmed','936','XXXXXXXXXXXXXXXX8744','Donec.egestas@NullainterdumCurabitur.ca'),
('17','09/12/2010','6710165','Holcomb, Eden','1022','XXXXXXXXXXXXXXXX6367','Curabitur@dolortempus.org'),
('32','05/04/2010','4489509','Foster, Nasim','1463','XXXXXXXXXXXXXXXX7115','augue.eu.tellus@ullamcorperviverraMaecenas.ca'),
('95','01/02/2011','5384061','Browning, Owen','523','XXXXXXXXXXXXXXXX0576','sed.dictum.eleifend@accumsaninterdum.edu'),
('91','08/21/2010','6075234','Dawson, McKenzie','141','XXXXXXXXXXXXXXXX3580','dolor.sit.amet@etmagnis.org'),
('63','01/29/2010','1055619','Mathews, Keefe','1110','XXXXXXXXXXXXXXXX2682','ligula@Sednuncest.edu'),
('27','10/20/2010','1819662','Clarke, Briar','1474','XXXXXXXXXXXXXXXX7481','Donec.non.justo@malesuada.org'),
('82','03/05/2010','3184936','Holman, Dana','560','XXXXXXXXXXXXXXXX7080','Aenean.eget.magna@accumsan.edu'),
('24','06/11/2010','1007427','Kirk, Desiree','206','XXXXXXXXXXXXXXXX3681','parturient@at.com'),
('49','06/17/2010','6137066','Foley, Sopoline','1831','XXXXXXXXXXXXXXXX1718','ac.urna.Ut@pellentesqueafacilisis.org'),
('22','05/08/2010','3545367','Howell, Uriel','638','XXXXXXXXXXXXXXXX1945','ad.litora@arcuvelquam.ca'),
('5','10/25/2010','6836985','Little, Caryn','743','XXXXXXXXXXXXXXXX0821','Suspendisse.aliquet@auctor.org'),
('91','06/16/2010','6852582','Buckner, Chiquita','99','XXXXXXXXXXXXXXXX1533','tellus.sem@semvitaealiquam.edu'),
('63','06/12/2010','7930230','Nolan, Wyoming','1192','XXXXXXXXXXXXXXXX1291','Sed@diam.org'),
('32','02/01/2010','8407102','Cummings, Deacon','1315','XXXXXXXXXXXXXXXX4375','a.odio.semper@massaSuspendisseeleifend.ca'),
('75','06/29/2010','5502798','Guerrero, Unity','858','XXXXXXXXXXXXXXXX8000','eget@lectus.edu'),
('50','09/13/2010','8312525','Russo, Yvette','1680','XXXXXXXXXXXXXXXX2046','In.mi@eu.com'),
('11','04/13/2010','6204132','Small, Calista','426','XXXXXXXXXXXXXXXX0269','lacus@Cumsociisnatoque.org'),
('16','01/01/2011','7522507','Mosley, Thor','1459','XXXXXXXXXXXXXXXX8451','netus.et@Pellentesqueutipsum.com'),
('5','01/27/2010','1472120','Case, Kiona','1419','XXXXXXXXXXXXXXXX7097','Duis@duilectusrutrum.edu'),
('70','02/17/2010','1095935','Snyder, Tanner','1655','XXXXXXXXXXXXXXXX8556','metus.sit.amet@inconsequatenim.edu'),
('63','11/10/2010','3046441','Guerrero, Unity','629','XXXXXXXXXXXXXXXX0807','nonummy.ac.feugiat@Phasellusdapibus.org'),
('22','08/19/2010','5435100','Turner, Patrick','1133','XXXXXXXXXXXXXXXX6734','pede@Duis.edu'),
('96','10/05/2010','6381992','May, Dominic','1858','XXXXXXXXXXXXXXXX7227','hymenaeos@etcommodo.edu'),
('96','02/26/2010','8630748','Chandler, Olympia','1016','XXXXXXXXXXXXXXXX4001','sed.dui.Fusce@pellentesqueSed.com');
--insert a random fraud transaction to check against (based on customer and account_info only)
INSERT INTO [dbo].[test_userCheck] (confirmation_id, customer, total, account_info) VALUES
('2055015', 'Guerrero, Unity', '20.02', 'XXXXXXXXXXXXXXXX0821');
--get result, which is correct
SELECT a.confirmation_id, a.customer, a.total, a.account_info, a.email_address
FROM dbo.test_userData AS a RIGHT OUTER JOIN
dbo.test_userCheck AS b ON a.customer = b.customer OR a.account_info = b.account_info;
DROP TABLE [dbo].[test_userData];
DROP TABLE [dbo].[test_userCheck];
Create the appropriate index or indexes. Based on your question alone, I'd suggest two indexes: one on test_userData.customer and a second on test_userData.account_info.
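One caveat with those two indexes: SQL Server can't use nvarchar(max) columns as index key columns, so customer and account_info would first need a bounded length. Below is a minimal sketch on temp-table copies of a few rows from the question, assuming nvarchar(100) and nvarchar(20) (hypothetical lengths) fit the real data. It also splits the OR join into a UNION of two equality joins, because an OR spanning two different columns often keeps the optimizer from seeking on either index; compare the actual execution plans on the real data before adopting it.

```sql
-- Sketch only: temp copies of the question's tables with bounded (assumed) lengths,
-- because nvarchar(max) columns cannot be index key columns.
CREATE TABLE #userData (
    id              int IDENTITY(1,1) PRIMARY KEY,
    confirmation_id int           NOT NULL,
    customer        nvarchar(100) NOT NULL,  -- assumed maximum length
    total           smallmoney    NOT NULL,
    account_info    nvarchar(20)  NOT NULL,  -- assumed maximum length
    email_address   nvarchar(255) NOT NULL
);
CREATE TABLE #userCheck (
    confirmation_id int           NOT NULL PRIMARY KEY,
    customer        nvarchar(100) NOT NULL,
    account_info    nvarchar(20)  NOT NULL
);

INSERT INTO #userData (confirmation_id, customer, total, account_info, email_address) VALUES
    (6836985, N'Little, Caryn',   743, N'XXXXXXXXXXXXXXXX0821', N'Suspendisse.aliquet@auctor.org'),
    (5502798, N'Guerrero, Unity', 858, N'XXXXXXXXXXXXXXXX8000', N'eget@lectus.edu'),
    (3046441, N'Guerrero, Unity', 629, N'XXXXXXXXXXXXXXXX0807', N'nonummy.ac.feugiat@Phasellusdapibus.org'),
    (3361424, N'Soto, Ahmed',     936, N'XXXXXXXXXXXXXXXX8744', N'Donec.egestas@NullainterdumCurabitur.ca');
INSERT INTO #userCheck (confirmation_id, customer, account_info) VALUES
    (2055015, N'Guerrero, Unity', N'XXXXXXXXXXXXXXXX0821');

-- One index per lookup column, covering the columns the query returns
CREATE NONCLUSTERED INDEX IX_userData_customer
    ON #userData (customer) INCLUDE (confirmation_id, total, account_info, email_address);
CREATE NONCLUSTERED INDEX IX_userData_account_info
    ON #userData (account_info) INCLUDE (confirmation_id, total, customer, email_address);

-- Each branch is a plain equality join, so each can seek on one of the indexes above
SELECT a.confirmation_id, a.customer, a.total, a.account_info, a.email_address
FROM #userData AS a
JOIN #userCheck AS b ON a.customer = b.customer
UNION
SELECT a.confirmation_id, a.customer, a.total, a.account_info, a.email_address
FROM #userData AS a
JOIN #userCheck AS b ON a.account_info = b.account_info;
-- Returns confirmation_id 6836985, 5502798 and 3046441 (in no guaranteed order)
```

Unlike the RIGHT OUTER JOIN, these are inner joins, so a check row with no match produces no all-NULL row. UNION also removes duplicates, so a userData row matched by several check rows is listed once.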
Creating indexes would probably help, but have you considered another design that complies with normal forms? It would be better to access the data through an index on an integer column instead of a string column.
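One way to read this suggestion: store each distinct customer name and account string once in a lookup table, and have the transaction and check rows reference them by int key so that the matching joins compare integers. The sketch below makes that assumption; #customer, #account, #sale, #sale_check and their columns are hypothetical names, not part of the question.

```sql
-- Hypothetical normalized layout: each distinct string stored once, referenced by int key
CREATE TABLE #customer (
    customer_id int IDENTITY(1,1) PRIMARY KEY,
    name        nvarchar(100) NOT NULL UNIQUE
);
CREATE TABLE #account (
    account_id   int IDENTITY(1,1) PRIMARY KEY,
    account_info nvarchar(20) NOT NULL UNIQUE
);
CREATE TABLE #sale (
    id              int IDENTITY(1,1) PRIMARY KEY,
    confirmation_id int NOT NULL,
    customer_id     int NOT NULL,   -- references #customer
    account_id      int NOT NULL    -- references #account
);
CREATE TABLE #sale_check (
    confirmation_id int NOT NULL PRIMARY KEY,
    customer_id     int NOT NULL,
    account_id      int NOT NULL
);
CREATE INDEX IX_sale_customer_id ON #sale (customer_id);
CREATE INDEX IX_sale_account_id  ON #sale (account_id);

INSERT INTO #customer (name) VALUES (N'Little, Caryn'), (N'Guerrero, Unity'), (N'Soto, Ahmed');
INSERT INTO #account (account_info) VALUES
    (N'XXXXXXXXXXXXXXXX0821'), (N'XXXXXXXXXXXXXXXX8000'), (N'XXXXXXXXXXXXXXXX0807'), (N'XXXXXXXXXXXXXXXX8744');

-- Resolve the strings to their keys once, at insert time
INSERT INTO #sale (confirmation_id, customer_id, account_id)
SELECT v.confirmation_id, c.customer_id, a.account_id
FROM (VALUES (6836985, N'Little, Caryn',   N'XXXXXXXXXXXXXXXX0821'),
             (5502798, N'Guerrero, Unity', N'XXXXXXXXXXXXXXXX8000'),
             (3046441, N'Guerrero, Unity', N'XXXXXXXXXXXXXXXX0807'),
             (3361424, N'Soto, Ahmed',     N'XXXXXXXXXXXXXXXX8744')) AS v (confirmation_id, customer, account_info)
JOIN #customer AS c ON c.name = v.customer
JOIN #account  AS a ON a.account_info = v.account_info;

INSERT INTO #sale_check (confirmation_id, customer_id, account_id)
SELECT 2055015, c.customer_id, a.account_id
FROM #customer AS c CROSS JOIN #account AS a
WHERE c.name = N'Guerrero, Unity' AND a.account_info = N'XXXXXXXXXXXXXXXX0821';

-- The matching compares ints; the strings are joined back only for display
SELECT s.confirmation_id, c.name AS customer, a.account_info
FROM (SELECT x.id FROM #sale AS x JOIN #sale_check AS k ON x.customer_id = k.customer_id
      UNION
      SELECT x.id FROM #sale AS x JOIN #sale_check AS k ON x.account_id = k.account_id) AS m
JOIN #sale     AS s ON s.id = m.id
JOIN #customer AS c ON c.customer_id = s.customer_id
JOIN #account  AS a ON a.account_id  = s.account_id;
```

Whether this beats indexing narrowed string columns depends on the data; the lookup tables add a join back whenever the strings are needed.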
Related
MSSQL: How to join only if its children do not exist
I want to left join, but only if the row does not have a specific child. Here is my current query:

SELECT "chatRoom"."id" as id, "chatRoom"."name" as name, "chatRoom"."type" as type,
"chatRoom"."description" as description, "chatRoom"."thumbnail" as thumbnail,
"chatRoom"."status" as status, chats.[unreadCount] as unreadCount
FROM "chat_room" "chatRoom"
LEFT JOIN "chat_room_participant" "participants" ON "participants"."chatRoomId"="chatRoom"."id"
LEFT JOIN (
SELECT "chatRoom"."id" AS "chatRoomId", COUNT(readBy.id) AS "unreadCount"
FROM "chat" "chat"
LEFT JOIN "chat_room" "chatRoom" ON "chatRoom"."id"="chat"."chatRoomId"
LEFT JOIN "chat_read_by_chat_room_participant" "chat_readBy" ON "chat_readBy"."chatId"="chat"."id"
LEFT JOIN "chat_room_participant" "readBy" ON "readBy"."id"="chat_readBy"."chatRoomParticipantId"
WHERE NOT(readBy.UserId IN ('ca774a5f-a04d-ec11-ae58-74d83e04f9d3')) OR "readBy"."id" IS NULL
GROUP BY "chatRoom"."id"
) "chats" ON chats.chatRoomId = "chatRoom"."id"
WHERE ('ALL' = 'ALL' OR "chatRoom"."status" = 'ALL')
AND "chatRoom"."applicationId" = '4ac752e9-004c-ec11-ae53-74d83e04f9d3'
AND "participants"."userId" = 'A97D66C4-014C-EC11-AE53-74D83E04F9D3'
ORDER BY "chatRoom"."lastUpdate" DESC

This query returns the correct value only if readBy is null or contains exactly one entry, equal to the given userId. Given the userId, I want the count of unread chats in each chatRoom that has the user as a participant.
The schema is like this:
- chatRoom has many chats
- chatRoom has many participants
- chats have a many-to-many `readBy` relation with participants

So a junction table is created automatically for the many-to-many relation: chat_read_by_chat_room_participant, containing |chatId|chatRoomParticipantId|.

In my query above, the left join picks up any readBy from another user:

WHERE NOT(readBy.UserId IN ('ca774a5f-a04d-ec11-ae58-74d83e04f9d3')) OR "readBy"."id" IS NULL
GROUP BY "chatRoom"."id") "chats" ON chats.chatRoomId = "chatRoom"."id"

which I do not want. It returns the entry if the user has already read the chat but other users have read it as well. I want to return the entry only if the user has not read the chat. How can I do this?

CREATION QUERY

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[chat](
[id] [uniqueidentifier] NOT NULL,
[sentAt] [bigint] NOT NULL,
[type] [nvarchar](255) NOT NULL,
[status] [nvarchar](255) NOT NULL,
[message] [nvarchar](max) NOT NULL,
[filePath] [nvarchar](max) NULL,
[chatRoomId] [uniqueidentifier] NOT NULL,
[senderId] [nvarchar](255) NOT NULL,
[userId] [uniqueidentifier] NULL,
CONSTRAINT [PK_9d0b2ba74336710fd31154738a5] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
ALTER TABLE [dbo].[chat] ADD CONSTRAINT [DF_9d0b2ba74336710fd31154738a5] DEFAULT (newsequentialid()) FOR [id]
GO
ALTER TABLE [dbo].[chat] WITH CHECK ADD CONSTRAINT [FK_52af74c7484586ef4bdfd8e4dbb] FOREIGN KEY([userId])
REFERENCES [dbo].[chat_room_participant] ([id])
GO
ALTER TABLE [dbo].[chat] CHECK CONSTRAINT [FK_52af74c7484586ef4bdfd8e4dbb]
GO
ALTER TABLE [dbo].[chat] WITH CHECK ADD CONSTRAINT [FK_e49029a11d5d42ae8a5dd9919a2] FOREIGN KEY([chatRoomId])
REFERENCES [dbo].[chat_room] ([id])
GO
ALTER TABLE [dbo].[chat] CHECK CONSTRAINT [FK_e49029a11d5d42ae8a5dd9919a2]
GO
CREATE TABLE [dbo].[chat_read_by_chat_room_participant](
[chatId] [uniqueidentifier] NOT NULL,
[chatRoomParticipantId] [uniqueidentifier] NOT NULL,
CONSTRAINT [PK_f3dd24628d4644dd6e79bcd03d1] PRIMARY KEY CLUSTERED
(
[chatId] ASC,
[chatRoomParticipantId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[chat_read_by_chat_room_participant] WITH CHECK ADD CONSTRAINT [FK_011624ccd7e7b0281ef629f2930] FOREIGN KEY([chatId])
REFERENCES [dbo].[chat] ([id])
ON UPDATE CASCADE
ON DELETE CASCADE
GO
ALTER TABLE [dbo].[chat_read_by_chat_room_participant] CHECK CONSTRAINT [FK_011624ccd7e7b0281ef629f2930]
GO
ALTER TABLE [dbo].[chat_read_by_chat_room_participant] WITH CHECK ADD CONSTRAINT [FK_2e33e3de9d7c91d426c09a24810] FOREIGN KEY([chatRoomParticipantId])
REFERENCES [dbo].[chat_room_participant] ([id])
ON UPDATE CASCADE
ON DELETE CASCADE
GO
ALTER TABLE [dbo].[chat_read_by_chat_room_participant] CHECK CONSTRAINT [FK_2e33e3de9d7c91d426c09a24810]
GO
CREATE TABLE [dbo].[chat_room](
[id] [uniqueidentifier] NOT NULL,
[name] [nvarchar](255) NULL,
[type] [nvarchar](255) NOT NULL,
[thumbnail] [nvarchar](max) NULL,
[description] [nvarchar](max) NULL,
[status] [nvarchar](255) NOT NULL,
[applicationId] [uniqueidentifier] NOT NULL,
[lastUpdate] [bigint] NULL,
CONSTRAINT [PK_8aa3a52cf74c96469f0ef9fbe3e] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
ALTER TABLE [dbo].[chat_room] ADD CONSTRAINT [DF_8aa3a52cf74c96469f0ef9fbe3e] DEFAULT (newsequentialid()) FOR [id]
GO
ALTER TABLE [dbo].[chat_room] WITH CHECK ADD CONSTRAINT [FK_2226638e6b7665ec0259d246b2b] FOREIGN KEY([applicationId])
REFERENCES [dbo].[application] ([id])
GO
ALTER TABLE [dbo].[chat_room] CHECK CONSTRAINT [FK_2226638e6b7665ec0259d246b2b]
GO
CREATE TABLE [dbo].[chat_room_participant](
[id] [uniqueidentifier] NOT NULL,
[joinedAt] [bigint] NOT NULL,
[privilege] [nvarchar](255) NOT NULL,
[chatRoomId] [uniqueidentifier] NOT NULL,
[userId] [uniqueidentifier] NOT NULL,
CONSTRAINT [PK_15913cf37a762fce4c8d6a32a42] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY],
CONSTRAINT [UQ_ae9630d66f6c5d12afd1a991fec] UNIQUE NONCLUSTERED
(
[chatRoomId] ASC,
[userId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[chat_room_participant] ADD CONSTRAINT [DF_15913cf37a762fce4c8d6a32a42] DEFAULT (newsequentialid()) FOR [id]
GO
ALTER TABLE [dbo].[chat_room_participant] WITH CHECK ADD CONSTRAINT [FK_9f718459ea81b5130f81980ca08] FOREIGN KEY([userId])
REFERENCES [dbo].[user] ([id])
GO
ALTER TABLE [dbo].[chat_room_participant] CHECK CONSTRAINT [FK_9f718459ea81b5130f81980ca08]
GO
ALTER TABLE [dbo].[chat_room_participant] WITH CHECK ADD CONSTRAINT [FK_fb664f48a4ec615ec5cce90a25d] FOREIGN KEY([chatRoomId])
REFERENCES [dbo].[chat_room] ([id])
GO
ALTER TABLE [dbo].[chat_room_participant] CHECK CONSTRAINT [FK_fb664f48a4ec615ec5cce90a25d]
GO
It looks like you just need a WHERE NOT EXISTS correlated subquery. Don't be tempted to use LEFT JOIN ... IS NULL syntax instead: it is generally less efficient, as it hides from the optimizer that you are doing an anti-join. Occasionally it can be useful, though.

Further notes:
- Don't quote column and table names unless you have to, and generally avoid names that need quoting.
- Don't alias columns that don't need aliasing.
- Choose short, meaningful table aliases.
- Basic formatting and good use of whitespace help readability.
- Your LEFT JOIN chat_room_participant logically becomes an INNER JOIN because of the WHERE clause.
- You don't need to re-join chat_room in the grouped subquery; you can just group by the join column.
- You may want to use a grouped OUTER APPLY instead of that subquery. It is unlikely to get a different query plan, but it can be easier to read.
- COUNT(SomeNotNullValue) is the same as COUNT(*).
- 'ALL' = 'ALL' is always true, so ('ALL' = 'ALL' OR cr.status = 'ALL') filters nothing and can simply be dropped. If it was meant as a parameter check, such as (@status = 'ALL' OR cr.status = @status), pass the status as a parameter instead of inlining it.

SELECT cr.id, cr.name, cr.type, cr.description, cr.thumbnail, cr.status, c.unreadCount
FROM chat_room cr
JOIN chat_room_participant p ON p.chatRoomId = cr.id
LEFT JOIN (
    SELECT c.chatRoomId, COUNT(*) AS unreadCount
    FROM chat c
    WHERE NOT EXISTS (SELECT 1
        FROM chat_read_by_chat_room_participant chat_readBy
        JOIN chat_room_participant readBy ON readBy.id = chat_readBy.chatRoomParticipantId
        WHERE chat_readBy.chatId = c.id
          AND readBy.UserId IN ('ca774a5f-a04d-ec11-ae58-74d83e04f9d3')
    )
    GROUP BY c.chatRoomId
) c ON c.chatRoomId = cr.id
WHERE cr.applicationId = '4ac752e9-004c-ec11-ae53-74d83e04f9d3'
  AND p.userId = 'A97D66C4-014C-EC11-AE53-74D83E04F9D3'
ORDER BY cr.lastUpdate DESC;
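The OUTER APPLY shape mentioned in the notes could look roughly like this. It is a sketch run against hypothetical cut-down temp copies of the question's tables (int keys instead of uniqueidentifier, and only the columns the query needs), and it assumes the participant filter and the read-by filter refer to the same user, held in the made-up variables @userId and @applicationId. Because the aggregate inside the APPLY has no GROUP BY, it always returns one row, so a room with nothing unread gets 0 rather than NULL; add GROUP BY c.chatRoomId if you want the NULL behaviour of the derived-table version.

```sql
-- Hypothetical cut-down copies of the question's tables (int keys instead of uniqueidentifier)
CREATE TABLE #chat_room (id int PRIMARY KEY, name nvarchar(255) NULL, applicationId int NOT NULL, lastUpdate bigint NULL);
CREATE TABLE #chat_room_participant (id int PRIMARY KEY, chatRoomId int NOT NULL, userId int NOT NULL);
CREATE TABLE #chat (id int PRIMARY KEY, chatRoomId int NOT NULL);
CREATE TABLE #chat_read_by (chatId int NOT NULL, chatRoomParticipantId int NOT NULL,
                            PRIMARY KEY (chatId, chatRoomParticipantId));

INSERT INTO #chat_room VALUES (100, N'room A', 10, 2), (200, N'room B', 10, 1);
INSERT INTO #chat_room_participant VALUES (1, 100, 1), (2, 100, 2), (3, 200, 1);  -- user 1 is in both rooms
INSERT INTO #chat VALUES (1000, 100), (1001, 100), (1002, 100), (2000, 200);
INSERT INTO #chat_read_by VALUES (1000, 1), (1001, 2), (2000, 3);  -- user 1 read 1000 and 2000; user 2 read 1001

DECLARE @userId int = 1, @applicationId int = 10;

SELECT cr.id, cr.name, uc.unreadCount
FROM #chat_room AS cr
JOIN #chat_room_participant AS p ON p.chatRoomId = cr.id
OUTER APPLY (
    -- No GROUP BY: always exactly one row per room, so "nothing unread" shows as 0
    SELECT COUNT(*) AS unreadCount
    FROM #chat AS c
    WHERE c.chatRoomId = cr.id
      AND NOT EXISTS (SELECT 1
                      FROM #chat_read_by AS rb
                      JOIN #chat_room_participant AS readBy ON readBy.id = rb.chatRoomParticipantId
                      WHERE rb.chatId = c.id
                        AND readBy.userId = @userId)
) AS uc
WHERE cr.applicationId = @applicationId
  AND p.userId = @userId
ORDER BY cr.lastUpdate DESC;
-- Expected: room 100 with 2 unread (chats 1001 and 1002), then room 200 with 0
```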
It seems that what you want is an anti-semi-join, that is, a join that demonstrates a row does not exist. The two typical methods are...

main_table AS m
LEFT JOIN other_table AS o ON o.m_id = m.id AND o.[user] = 'xyz'
WHERE o.id IS NULL

Or...

main_table AS m
WHERE NOT EXISTS (
SELECT * FROM other_table AS o WHERE o.m_id = m.id AND o.[user] = 'xyz'
)

Exactly how to apply this to your example is unclear, as you have cluttered your question with too many other details. (It is not a Minimal Verifiable Example.)
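A small, self-contained demonstration of both forms, plus a query shaped like the question's WHERE clause for contrast. The table names and the [user] column follow the answer's placeholders; the rows are made up. The detail that matters in the LEFT JOIN form is that the user filter sits in the ON clause: if the user is instead excluded in WHERE, as the question does, a row the user did see still comes back through some other user's row.

```sql
-- Placeholder tables named after the answer's example; rows are made up
CREATE TABLE #main_table  (id int PRIMARY KEY);
CREATE TABLE #other_table (id int PRIMARY KEY, m_id int NOT NULL, [user] varchar(20) NOT NULL);

INSERT INTO #main_table  VALUES (1), (2), (3);
INSERT INTO #other_table VALUES (10, 1, 'xyz'), (11, 2, 'abc'), (12, 2, 'xyz'), (13, 3, 'abc');

-- Form 1: LEFT JOIN ... IS NULL, with the user filter in the ON clause. Returns 3.
SELECT m.id
FROM #main_table AS m
LEFT JOIN #other_table AS o ON o.m_id = m.id AND o.[user] = 'xyz'
WHERE o.id IS NULL;

-- Form 2: NOT EXISTS. Returns 3.
SELECT m.id
FROM #main_table AS m
WHERE NOT EXISTS (SELECT * FROM #other_table AS o WHERE o.m_id = m.id AND o.[user] = 'xyz');

-- Contrast: excluding the user in WHERE instead. Returns 2 and 3;
-- row 2 slips through via user 'abc' even though 'xyz' has a row for it.
SELECT DISTINCT m.id
FROM #main_table AS m
LEFT JOIN #other_table AS o ON o.m_id = m.id
WHERE o.[user] <> 'xyz' OR o.id IS NULL;
```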
How to multiply two columns and assign their output to the column of another table?
I am a database beginner using Microsoft SQL Server Management Studio. I want to multiply two columns of one table and assign the result to a column of another table, but I don't know how to do that. Can anyone help? I have two columns, UnitPrice and Quantity, in PurchasesTable, and I want to insert their product into the TotalAmount column of another table named Dues. Thanks in advance.
/****** Object: Table [dbo].[Sample1] Script Date: 5/9/2018 3:59:09 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Sample1](
[ID] [INT] IDENTITY(1,1) NOT NULL,
[UnitPrice] [DECIMAL](16, 2) NOT NULL,
[Quantity] [DECIMAL](10, 2) NOT NULL,
[TotalAmount] AS ([UnitPrice]*[Quantity]),
CONSTRAINT [PK_Sample1] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
INSERT INTO dbo.Sample1 ( UnitPrice, Quantity )
VALUES ( 100, -- UnitPrice - decimal
25 -- Quantity - decimal
)
SELECT * FROM dbo.Sample1

---OUTPUT------------
ID  UnitPrice  Quantity  TotalAmount
1   100.00     25.00     2500.0000
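The computed column above keeps the product in the same table. To write the product into the other table, as the question asks, an INSERT ... SELECT is enough. This is a sketch with hypothetical cut-down versions of PurchasesTable and Dues; the PurchaseID link column and the decimal(27, 4) type for TotalAmount are assumptions (decimal(27, 4) is the result type SQL Server gives decimal(16, 2) * decimal(10, 2)).

```sql
-- Hypothetical minimal versions of the asker's tables
CREATE TABLE #PurchasesTable (
    PurchaseID int IDENTITY(1,1) PRIMARY KEY,
    UnitPrice  decimal(16, 2) NOT NULL,
    Quantity   decimal(10, 2) NOT NULL
);
CREATE TABLE #Dues (
    DueID       int IDENTITY(1,1) PRIMARY KEY,
    PurchaseID  int NOT NULL,              -- assumed link back to the purchase
    TotalAmount decimal(27, 4) NOT NULL
);

INSERT INTO #PurchasesTable (UnitPrice, Quantity) VALUES (100, 25), (19.99, 3);

-- Multiply per row and write the product into the other table
INSERT INTO #Dues (PurchaseID, TotalAmount)
SELECT PurchaseID, UnitPrice * Quantity
FROM #PurchasesTable;

SELECT PurchaseID, TotalAmount FROM #Dues;  -- 2500.0000 and 59.9700
```

This copies the values once: if UnitPrice or Quantity change later, Dues won't follow unless it is updated again, which is the advantage of the computed-column approach (or a view).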
UNIQUE constraint exception thrown on empty table INSERT [sql-server]
My INSERT statement fails while trying to add a new record to an empty table (Attribute) that has no records yet. I am surprised by the error raised by the system:

Violation of UNIQUE KEY constraint 'CK_Attribute_Name_IDproject'. Cannot insert duplicate key in object 'dbo.Attribute'. The duplicate key value is (dummy, 55).

The creation script for this table looks like this:

CREATE TABLE [dbo].[Attribute](
[ID] [int] IDENTITY(1,1) NOT NULL,
[IDproject] [int] NOT NULL,
[IDtype] [int] NOT NULL,
[IDgroup] [int] NOT NULL,
[name] [varchar](50) NOT NULL,
[color] [int] NULL,
[protected] [tinyint] NULL,
[datemodified] [datetime] NOT NULL,
PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
CONSTRAINT [CK_Attribute_Name_IDproject] UNIQUE NONCLUSTERED
(
[name] ASC,
[IDproject] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

I skipped the foreign key references and default values, which do not seem relevant in this context. The UNIQUE constraint applies to [name] and [IDproject].

When running the following statements

SELECT * FROM [dbo].[Attribute]
GO
SELECT * FROM [dbo].[Project]
GO

I get the results

(0 row(s) affected)
(2 row(s) affected)

The first result indicates that the Attribute table is empty; the second, that there are 2 projects. Then the following INSERT into table Attribute fails with the UNIQUE constraint error mentioned above:

INSERT INTO [dbo].[Attribute] ([IDproject], [name], [IDtype], [IDgroup], [color], [protected], [datemodified])
SELECT DISTINCT p.[ID],'dummy',t.[ID],g.[ID],-1,0,getdate()
FROM [dbo].[Project] p
INNER JOIN [dbo].[Group] g ON g.[name]='none' AND g.[IDproject] = p.[ID]
INNER JOIN [dbo].[AttributeType] t ON t.[format]='text' AND g.[IDproject] = p.[ID]
WHERE p.[name]='TESTPROJ'
GO

How can I get such an error on an empty table?
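I found the solution myself: the SELECT feeding the INSERT returns 2 rows with 'dummy' because of a duplicate in one of the joined tables, AttributeType, with which the INNER JOIN is performed. The two rows differ only in t.[ID], so DISTINCT does not collapse them, yet both carry the same (name, IDproject) pair, and the second one violates the UNIQUE constraint.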
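Why the duplicate multiplies the rows: the second INNER JOIN's condition, g.[IDproject] = p.[ID], doesn't mention t at all, so every AttributeType row with format 'text' joins to every project/group row. The cleanest fix is probably to remove the duplicate AttributeType row; if both rows are legitimate, one option is to pick a single type deterministically. A sketch of that option on hypothetical cut-down temp tables (only the columns the query touches; project 55 mirrors the error message, and choosing the lowest ID is an arbitrary rule):

```sql
-- Hypothetical cut-down tables: only the columns the INSERT touches
CREATE TABLE #Project (ID int PRIMARY KEY, name varchar(50) NOT NULL);
CREATE TABLE #Group (ID int PRIMARY KEY, IDproject int NOT NULL, name varchar(50) NOT NULL);
CREATE TABLE #AttributeType (ID int PRIMARY KEY, [format] varchar(20) NOT NULL);
CREATE TABLE #Attribute (
    ID        int IDENTITY(1,1) PRIMARY KEY,
    IDproject int NOT NULL,
    IDtype    int NOT NULL,
    IDgroup   int NOT NULL,
    name      varchar(50) NOT NULL,
    UNIQUE (name, IDproject)
);

INSERT INTO #Project VALUES (55, 'TESTPROJ'), (56, 'OTHER');
INSERT INTO #Group VALUES (1, 55, 'none');
INSERT INTO #AttributeType VALUES (1, 'text'), (2, 'text');  -- the duplicate

-- The original join shape: t is filtered only by format, so both 'text' rows survive.
-- Returns 2 rows with the same (name, IDproject) = ('dummy', 55); DISTINCT keeps both because IDtype differs.
SELECT DISTINCT p.ID, 'dummy' AS name, t.ID AS IDtype
FROM #Project AS p
JOIN #Group AS g ON g.name = 'none' AND g.IDproject = p.ID
JOIN #AttributeType AS t ON t.[format] = 'text' AND g.IDproject = p.ID
WHERE p.name = 'TESTPROJ';

-- One deterministic type per project (lowest ID, an arbitrary choice)
INSERT INTO #Attribute (IDproject, name, IDtype, IDgroup)
SELECT p.ID, 'dummy', t.ID, g.ID
FROM #Project AS p
JOIN #Group AS g ON g.name = 'none' AND g.IDproject = p.ID
CROSS APPLY (SELECT TOP (1) tt.ID
             FROM #AttributeType AS tt
             WHERE tt.[format] = 'text'
             ORDER BY tt.ID) AS t
WHERE p.name = 'TESTPROJ';
```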
SQL Server what indexes to create
I have a simple table:

CREATE TABLE DocModHistory(
[ID] [int] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
[DocID] [int] NOT NULL,
[BranchID] [int] NOT NULL,
[UserID] [int] NOT NULL,
[InsDate] [datetime] NOT NULL,
[Type] [int] NOT NULL,
CONSTRAINT [PK_DocModHistory] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

and I have two select queries:

SELECT dh.BranchID, MAX(dh.ID) as MaxID
FROM DocModHistory dh
WHERE dh.UserID = @p_UserID
GROUP BY dh.BranchID

and

SELECT dh.DocID, MAX(dh.ID) as MaxID
FROM DocModHistory dh
WHERE dh.UserID = @p_UserID
GROUP BY dh.DocID

Could you tell me which indexes I should create? Should I create individual indexes for UserID, BranchID, and DocID, or do I need multi-column indexes? Thank you!
Create an index on UserID with included columns BranchID, ID, and DocID. Something like:

CREATE INDEX IX_UserID ON DocModHistory (UserID) INCLUDE (BranchID, ID, DocID);
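A side note on that index: ID is the clustered primary key, so SQL Server already carries it in every nonclustered index, and listing it in INCLUDE is harmless but redundant. An alternative worth testing is one composite index per query. For a non-unique nonclustered index, the clustering key is appended to the index key, so (UserID, BranchID) is effectively ordered by (UserID, BranchID, ID), which may let MAX(ID) per group be computed without a sort. A sketch on a temp copy of the table with made-up rows; the index names are hypothetical, and the actual execution plans on real data should decide.

```sql
-- Temp copy of the question's table (made-up rows below)
CREATE TABLE #DocModHistory (
    ID       int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    DocID    int NOT NULL,
    BranchID int NOT NULL,
    UserID   int NOT NULL,
    InsDate  datetime NOT NULL,
    [Type]   int NOT NULL
);

-- One index per query; ID (the clustering key) is implicitly appended to each key
CREATE INDEX IX_DocModHistory_User_Branch ON #DocModHistory (UserID, BranchID);
CREATE INDEX IX_DocModHistory_User_Doc    ON #DocModHistory (UserID, DocID);

INSERT INTO #DocModHistory (DocID, BranchID, UserID, InsDate, [Type]) VALUES
    (1, 10, 7, GETDATE(), 0), (2, 10, 7, GETDATE(), 0), (1, 20, 7, GETDATE(), 0), (3, 20, 8, GETDATE(), 0);

DECLARE @p_UserID int = 7;

SELECT dh.BranchID, MAX(dh.ID) AS MaxID
FROM #DocModHistory AS dh
WHERE dh.UserID = @p_UserID
GROUP BY dh.BranchID;   -- two groups for user 7: branches 10 and 20

SELECT dh.DocID, MAX(dh.ID) AS MaxID
FROM #DocModHistory AS dh
WHERE dh.UserID = @p_UserID
GROUP BY dh.DocID;      -- two groups for user 7: docs 1 and 2
```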
Display rows when scrolling like Twitter, using a stored procedure
I have a site that displays posts. I want the site's scrolling to behave like Twitter's: scrolling down displays more and more posts, endlessly.

Suppose I have the following tables:

A Post table to hold all the posts. Every post is related to a single person.

CREATE TABLE [dbo].[Post](
[Id] [bigint] IDENTITY(1,1) NOT NULL,
[PersonId] [int] NOT NULL,
[PublishDate] [datetime] NOT NULL,
CONSTRAINT [PK_Post] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

A PostTag table to hold the related tags of each post.

CREATE TABLE [dbo].[PostTag](
[PostId] [bigint] NOT NULL,
[TagId] [int] NOT NULL,
CONSTRAINT [PK_PostTag] PRIMARY KEY CLUSTERED
(
[PostId] ASC,
[TagId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

For each user of the site, the UserPersonStatistics table holds the number of times he showed interest in a post related to a person.

CREATE TABLE [dbo].[UserPersonStatistics](
[UserId] [bigint] NOT NULL,
[PersonId] [int] NOT NULL,
[Counter] [bigint] NOT NULL,
CONSTRAINT [PK_UserPersonStatistics] PRIMARY KEY CLUSTERED
(
[UserId] ASC,
[PersonId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

For each user of the site, the UserPostStatistics table holds the number of times he showed interest in a post.
CREATE TABLE [dbo].[UserPostStatistics](
[UserId] [bigint] NOT NULL,
[PostId] [bigint] NOT NULL,
[Counter] [bigint] NOT NULL,
CONSTRAINT [PK_UserPostStatistics] PRIMARY KEY CLUSTERED
(
[UserId] ASC,
[PostId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

For each user of the site, the UserTagStatistics table holds the number of times he showed interest in a post related to a tag.

CREATE TABLE [dbo].[UserTagStatistics](
[UserId] [bigint] NOT NULL,
[TagId] [int] NOT NULL,
[Counter] [bigint] NOT NULL,
CONSTRAINT [PK_UserTagStatistics] PRIMARY KEY CLUSTERED
(
[UserId] ASC,
[TagId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

What I need is a stored procedure that, for each user, returns 35 different posts each time and "remembers" the last 35 posts so that it does not return the same posts again. The 35 posts should consist of:

- 15 posts for the most popular tag (UserTagStatistics)
- 15 posts for the most popular person (UserPersonStatistics)
- 5 most popular posts (UserPostStatistics)

One problem is that the procedure should return 35 different posts each time. Another problem is that a post can come back once as the most popular post, once as a post of the most popular tag, and once as a post of the most popular person. Such a post should be counted once, not three times.

The performance of the stored procedure is crucial. I know it's a very complicated question. Any thoughts are appreciated.
kruvi
Add a "LastViewed" datetime column to each of the three statistics tables, then use a proc like this. For performance, just make sure to have an index on UserId+LastViewed+Counter and on UserId plus the table's own key (PersonId, PostId or TagId) for each of the three tables, and it should scream. Actually, since UserId+LastViewed+Counter is practically the whole table, if possible I'd recommend you make it the clustered index on each of your tables, so that you avoid creating that second index, which would basically be the same size as the raw table.

create proc GetInfo(@UserId bigint)
as
begin
update UserPersonStatistics set LastViewed = getdate()
where UserId = @UserId and PersonId in (
select top 15 PersonId from UserPersonStatistics
where UserId = @UserId
and (LastViewed is null
or LastViewed != (select max(LastViewed) from UserPersonStatistics where UserId = @UserId))
order by Counter desc
)

select * from UserPersonStatistics
where UserId = @UserId
and LastViewed = (select max(LastViewed) from UserPersonStatistics where UserId = @UserId)

--**Repeat the above code for UserPostStatistics and UserTagStatistics
end

REVISED PROC BASED ON INPUT:

create proc GetInfo(@UserId bigint)
as
begin
declare @lastviewed datetime
declare @results table (
StatType varchar(10),
Counter bigint,
PersonId int null,  -- set for 'Person' rows
PostId bigint null, -- set for 'Post' rows
TagId int null      -- set for 'Tag' rows
)
set @lastviewed = getdate()  -- not used below yet; presumably for stamping LastViewed on the rows finally chosen

--Person
insert into @results(StatType, Counter, PersonId)
select 'Person', Counter, PersonId from UserPersonStatistics
where UserId = @UserId and PersonId in (
select top 35 PersonId from UserPersonStatistics
where UserId = @UserId
and (LastViewed is null
or LastViewed != (select max(LastViewed) from UserPersonStatistics where UserId = @UserId))
order by Counter desc
)

--Post
insert into @results(StatType, Counter, PostId)
select 'Post', Counter, PostId from UserPostStatistics
where UserId = @UserId and PostId in (
select top 35 PostId from UserPostStatistics
where UserId = @UserId
and (LastViewed is null
or LastViewed != (select max(LastViewed) from UserPostStatistics where UserId = @UserId))
order by Counter desc
)

--Tag
insert into @results(StatType, Counter, TagId)
select 'Tag', Counter, TagId from UserTagStatistics
where UserId = @UserId and TagId in (
select top 35 TagId from UserTagStatistics
where UserId = @UserId
and (LastViewed is null
or LastViewed != (select max(LastViewed) from UserTagStatistics where UserId = @UserId))
order by Counter desc
)

--At this point you could have 105 rows of the various types (35*3).
--Person and Tag rows identify a person or a tag, not a post: join to Post (PersonId) or PostTag (TagId) to get posts.
--You can use whatever algorithm you need to decide the top 35.
--That may include some weighting.
--You may want to consider using the Rank() function.
end

If your algorithm should consider the #1 top counter from each category before the #2s, take a look at the Rank() function.
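The final step, choosing 35 distinct posts from the pooled rows, could look like the sketch below, assuming the Person and Tag rows have already been turned into post rows (via Post.PersonId and PostTag.TagId). ROW_NUMBER keeps one row per PostID, so a post that arrives from several sources is counted once. The #candidates table and its rows are hypothetical, comparing raw counters across the three sources is itself an assumption (weight them as needed), and enforcing the 15/15/5 split from the question would need an additional ROW_NUMBER partitioned by StatType.

```sql
-- Hypothetical pooled candidates; the same post can arrive from several sources
CREATE TABLE #candidates (StatType varchar(10) NOT NULL, PostID bigint NOT NULL, [Counter] bigint NOT NULL);
INSERT INTO #candidates VALUES
    ('Tag', 1, 50), ('Person', 1, 40), ('Post', 1, 5),  -- post 1 from all three sources
    ('Tag', 2, 30), ('Person', 3, 20), ('Post', 4, 9);

DECLARE @pageSize int = 35;

WITH ranked AS (
    -- rn = 1 marks the strongest source for each post, so duplicates collapse to one row
    SELECT StatType, PostID, [Counter],
           ROW_NUMBER() OVER (PARTITION BY PostID ORDER BY [Counter] DESC) AS rn
    FROM #candidates
)
SELECT TOP (@pageSize) PostID, StatType, [Counter]
FROM ranked
WHERE rn = 1
ORDER BY [Counter] DESC;
-- Returns posts 1 (Tag, 50), 2, 3 and 4: four distinct posts, post 1 counted once
```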