SQL Server : optimize query. Lots of data - sql-server

At the beginning I apologize for not being word-perfect in English.
I have two tables in my database: one contains questions, and the second contains users' answers to those questions (for statistics).
TableA - questions
___________
| ID | Name |
TableB - Statistics
___________________________________
| ID | A_ID | U_ID| IsCorrect | Date|
A user can answer a question several times. For example, if the user with ID = 2 answers the question with ID = 1 four times, we add 4 rows to TableB:
___________________________________
| ID | A_ID | U_ID| IsCorrect | Date|
-------------------------------------
| 1 | 1 | 2 | True | Date|
| 2 | 1 | 2 | False | Date|
| 3 | 1 | 2 | False | Date|
| 4 | 1 | 2 | True | Date|
In the end, I need to query for the questions (TableA) that the user has either not answered at all or has answered the fewest times (a user is able to answer all questions).
My query (procedure) looks like this:
DECLARE @max int
SET @max = (SELECT TOP 1 Count(A_ID) as QuestionCount FROM [TableB]
Where User_id = 1
GROUP BY A_ID
ORDER BY QuestionCount DESC)
SELECT TOP 40 ID
FROM [dbo].[TableA]
WHERE ID NOT IN (SELECT A_ID
FROM [dbo].[TableB]
WHERE User_id = 1
GROUP BY A_ID
HAVING Count(A_ID) = @max)
ORDER BY NewID()
First, I query for the maximum number of times the user has answered any single question: if the user answered some question 4 times, @max will be 4.
The second query then selects the questions that have not yet reached that count (that is, not yet answered in the current round).
The question is: how can I optimize this query (or should I change my tables)? TableB currently has almost one million rows, and because of that the query isn't fast enough.

With SQL Server (>= 2008) you can use the OVER clause (https://msdn.microsoft.com/en-us/library/ms189461.aspx), which gives you grouped aggregates.
EDIT: I just noticed your ORDER BY NewID(). Why do you do this? NewID() is very bad to sort on... One million rows is not that many in fact, but one million GUIDs without an index are a lot to sort...
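A minimal sketch of how the OVER clause could replace the two separate scans of TableB. It assumes the user column is U_ID as in the table layout (the question's query calls it User_id); the variable @userId and the CTE names are hypothetical:

```sql
DECLARE @userId int = 1;   -- hypothetical parameter, the user being served

WITH AnswerCounts AS (
    -- One row per question; Answered is 0 for questions the user never answered.
    SELECT a.ID,
           COUNT(b.ID) AS Answered
    FROM dbo.TableA AS a
    LEFT JOIN dbo.TableB AS b
           ON b.A_ID = a.ID
          AND b.U_ID = @userId      -- assumed column name
    GROUP BY a.ID
),
Ranked AS (
    -- Windowed aggregate: the user's highest per-question count on every row.
    SELECT ID,
           Answered,
           MAX(Answered) OVER () AS MaxAnswered
    FROM AnswerCounts
)
SELECT TOP (40) ID
FROM Ranked
WHERE Answered < MaxAnswered        -- same effect as NOT IN (... HAVING COUNT = @max)
   OR MaxAnswered = 0               -- user has not answered anything yet
ORDER BY NEWID();
```

This reads TableB once instead of twice and only sorts the surviving question IDs by NEWID(). If TableB has no index that starts with the user column, an index on (U_ID, A_ID) would probably matter more than the rewrite itself; that is an assumption about the schema, not something stated in the question.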

Related

Displaying data in a different manner

So I have a table that binds ProductId and GroupId. A product can be assigned to any of 5 groups (1-5).
If the product doesn't exist in the table, it's not assigned to any of the groups.
ProductId | GroupId
-------------------
100 | 1
100 | 2
200 | 1
200 | 2
200 | 3
200 | 4
200 | 5
Taking a look at this table, we know that Product that goes by id 100 is assigned to 2 groups (1,2) and the product of id 200 is assigned to 5 groups (1-5).
I'm trying to write a query that will display each product in separate row, together with columns for all of the 5 groups and a bit value that contains information if the product belongs to the group or not (0,1). A visualization of the result I need:
ProductId | IsGroup1 | IsGroup2 | IsGroup3 | IsGroup4 | IsGroup5
-----------------------------------------------------------------
100 | 1 | 1 | 0 | 0 | 0 -- this belongs to groups 1, 2
200 | 1 | 1 | 1 | 1 | 1 -- this belongs to all of the groups
I know I could probably solve it using a self join 5 times on each distinct product, but I'm wondering if there's a more elegant way of solving it?
Any tips will be strongly appreciated
You could use a pivot. Since you only have 5 groups you don't need a dynamic pivot.
select
ProductId
,IsGroup1 = iif([1] is null,0,1)
,IsGroup2 = iif([2] is null,0,1)
,IsGroup3 = iif([3] is null,0,1)
,IsGroup4 = iif([4] is null,0,1)
,IsGroup5 = iif([5] is null,0,1)
from
(select ProductID, GroupId from mytable) x
pivot
(max(GroupId) for GroupId in ([1],[2],[3],[4],[5])) p
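For comparison, the same result can be sketched with conditional aggregation instead of PIVOT. This is a different technique from the one in the answer, shown against the sample rows from the question; the table variable @mytable stands in for the real table:

```sql
-- Sample data from the question, loaded into a table variable.
DECLARE @mytable TABLE (ProductId int, GroupId int);

INSERT INTO @mytable (ProductId, GroupId)
VALUES (100, 1), (100, 2),
       (200, 1), (200, 2), (200, 3), (200, 4), (200, 5);

-- One row per product; each column is 1 if any row exists for that group, else 0.
SELECT ProductId,
       IsGroup1 = MAX(CASE WHEN GroupId = 1 THEN 1 ELSE 0 END),
       IsGroup2 = MAX(CASE WHEN GroupId = 2 THEN 1 ELSE 0 END),
       IsGroup3 = MAX(CASE WHEN GroupId = 3 THEN 1 ELSE 0 END),
       IsGroup4 = MAX(CASE WHEN GroupId = 4 THEN 1 ELSE 0 END),
       IsGroup5 = MAX(CASE WHEN GroupId = 5 THEN 1 ELSE 0 END)
FROM @mytable
GROUP BY ProductId
ORDER BY ProductId;
```

The columns come back as int; wrap each expression in CAST(... AS bit) if a true bit column is needed.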

SQL Server delete on multiple foreign keyed tables - performance

I am trying to remove old data from a SQL Server database, given a list of ID's, but I'm trying to figure out how to get it to run faster. Currently deleting a list of 250 ID's takes around 1 hour. These ID's are attached to our 'root' objects, example below. Each of these has foreign key constraints.
Products
| productID | description | price |
+-----------------+-------------------+-------------+
| 1 | item 1 | 5.00 |
| 2 | item 2 | 5.00 |
| 3 | item 3 | 5.00 |
| ... | ... | ... |
Sales
| saleID | productID |
+-----------------+-------------------+
| 4 | 1 |
| 5 | 2 |
| 6 | 3 |
| ... | ... |
Taxes
| taxID | saleID |
+-----------------+-------------------+
| 7 | 4 |
| 8 | 5 |
| 9 | 6 |
| ... | ... |
Currently, we are just passing a list of product ID's and cascading through manually, such as
DECLARE @ProductIDsRemoval AS TABLE ( id int )
INSERT INTO @ProductIDsRemoval VALUES (1)
DELETE t
FROM dbo.Taxes t
INNER JOIN dbo.Sales s ON (s.saleID = t.saleID)
INNER JOIN @ProductIDsRemoval p ON (s.productID = p.id)
DELETE s
FROM dbo.Sales s
INNER JOIN @ProductIDsRemoval p ON (s.productID = p.id)
DELETE p
FROM dbo.Products p
INNER JOIN @ProductIDsRemoval p2 ON (p.productID = p2.id)
This works fine, however my issue is that my table structure has ~70 tables and at least a couple thousand rows in each to remove, if not a couple million. Currently, my query takes anywhere from 1 to 6 hours to run, depending on the number of base ID's we're removing (my structure doesn't actually use Products/Taxes/Sales, but it's a decent analogy, and the number we're aiming to remove is ~750 base ids, which we are estimating 3-5 hours for runtime)
I've seen other Stack Overflow answers saying to drop all constraints, re-create them with ON DELETE CASCADE, and then restore the original constraints afterwards, but this is also taking quite a long time, as I would need to: 1. Drop the constraints. 2. Rebuild them with ON DELETE CASCADE. 3. Run my query. 4. Drop the constraints again. 5. Re-add them without ON DELETE CASCADE.
I've also been looking at possibly just selecting everything I need into temp tables, truncating all of the other tables, and then re-inserting all of my values back and re-setting the indexes based on the last item I added, but again I would need to edit all foreign keys, which I would prefer to not do.
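For reference, the drop-and-recreate steps listed in the question would look roughly like this for the three example tables. The constraint names FK_Sales_Products and FK_Taxes_Sales are hypothetical, and the sketch assumes the example schema rather than the real ~70-table structure:

```sql
-- Steps 1 and 2: replace each foreign key with a cascading version.
ALTER TABLE dbo.Taxes DROP CONSTRAINT FK_Taxes_Sales;      -- hypothetical name
ALTER TABLE dbo.Sales DROP CONSTRAINT FK_Sales_Products;   -- hypothetical name

ALTER TABLE dbo.Sales WITH CHECK
    ADD CONSTRAINT FK_Sales_Products
    FOREIGN KEY (productID) REFERENCES dbo.Products (productID)
    ON DELETE CASCADE;

ALTER TABLE dbo.Taxes WITH CHECK
    ADD CONSTRAINT FK_Taxes_Sales
    FOREIGN KEY (saleID) REFERENCES dbo.Sales (saleID)
    ON DELETE CASCADE;

-- Step 3: a single delete on the root table now removes Sales and Taxes rows too.
DECLARE @ProductIDsRemoval TABLE (id int PRIMARY KEY);
INSERT INTO @ProductIDsRemoval (id) VALUES (1);

DELETE p
FROM dbo.Products AS p
INNER JOIN @ProductIDsRemoval AS r ON p.productID = r.id;

-- Steps 4 and 5 would repeat the DROP / ADD pair without ON DELETE CASCADE.
```

Either way, SQL Server still has to find the child rows for every deleted parent, and it does not create indexes on foreign key columns automatically. Whether Sales.productID and Taxes.saleID are indexed is not stated in the question, so that is worth checking before restructuring the constraints.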

SQL Query to select one set when there are multiple entries

I asked this question (SQL Query to select one set when there are duplicates) last year and got the solution to count the SLAs. Basically, count the number of minimum SLA for each application. However, I have a follow-up question. I want a query that will return the rows of the minimum SLA and earliest date for each REF_ID (or APP_ID)
ID | REF_ID | APP_ID | FIRST_DATE | SECOND_DTE | SLA |
1 | 11 | 101 | 2016/10/01 | 2016/10/02 | 1 |
2 | 12 | 102 | 2016/10/01 | 2016/10/04 | 2 |
3 | 12 | 102 | 2016/10/01 | 2016/10/05 | 2 |
So the query should return the first and second row.
I would very much appreciate if someone could provide a solution.
I have updated the query based on User726720 answer. This does not return entire rows but sufficient data.
SELECT REF_ID, MIN(SECOND_DTE), MIN(SLA) FROM TABLE WHERE FIRST_DTE > '2016-10-01' AND FIRST_DTE < '2016-11-01' GROUP BY REF_ID
This should do the job:
Select * from table
WHERE SLA = ( SELECT MIN(SLA) FROM table)
and SECOND_DTE = ( SELECT MIN(SECOND_DTE ) FROM table)
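Note that the two subqueries above take the minimum over the whole table, not per REF_ID. A minimal sketch of a per-REF_ID variant using ROW_NUMBER() follows; the table variable @Sla and its column types are assumptions standing in for the real table, and the date column is named FIRST_DTE as in the question's own query:

```sql
-- Sample rows from the question.
DECLARE @Sla TABLE (ID int, REF_ID int, APP_ID int,
                    FIRST_DTE date, SECOND_DTE date, SLA int);

INSERT INTO @Sla (ID, REF_ID, APP_ID, FIRST_DTE, SECOND_DTE, SLA)
VALUES (1, 11, 101, '20161001', '20161002', 1),
       (2, 12, 102, '20161001', '20161004', 2),
       (3, 12, 102, '20161001', '20161005', 2);

WITH Ranked AS (
    -- Number the rows within each REF_ID: lowest SLA first, then earliest SECOND_DTE.
    SELECT ID, REF_ID, APP_ID, FIRST_DTE, SECOND_DTE, SLA,
           ROW_NUMBER() OVER (PARTITION BY REF_ID
                              ORDER BY SLA, SECOND_DTE) AS rn
    FROM @Sla
)
SELECT ID, REF_ID, APP_ID, FIRST_DTE, SECOND_DTE, SLA
FROM Ranked
WHERE rn = 1
ORDER BY ID;
```

On the sample data this keeps the rows with ID 1 and ID 2, which is the result the question asks for. Partition by APP_ID instead if that is the grouping you need.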

Storing Users Location Questionnaire Site

I'm building a website that has a questionnaire that users fill out. Currently my DB looks something like the design below.
Website Questionnaire Consists of
25 Questions
4 to 6 Answers per questions the user can choose from.
Issue
I want to add the user's country, state/province, and city.
I need to incorporate this into my search function. See the SQL statement below.
The client provided me with a list of 23 countries, 750 states/provinces, and about 6000 cities to store.
Where should this go in my DB? I'm completely lost on this one.
Current DB Design
See fiddle http://sqlfiddle.com/#!6/bf068/1
User_Table
ID | UserName
0 | Jack
...
User Questionnaire_Questions_Answer
ID | user_id | question_id | answer_id
0 | 0 | 0 | 0
1 | 0 | 1 | 3
...
Questionnaire_Questions
ID | Question
0 | What type of music do you like?
1 | What is your favorite sport ?
...
Questionnaire_Answers
ID | Answer
0 | Rock
1 | Rap
2 | Basketball
3 | Soccer
...
SQL STATEMENT FOR SEARCH
Searches for the best questionnaire results based on the preferences the user is looking for, sorted by the highest totalMatches count.
SELECT
User_Table.id,
User_Table.UserName,
COUNT(User_Table.id) as totalMatches
FROM User_Table
INNER JOIN Questionnaire_Questions_Answer ON User_Table.id = Questionnaire_Questions_Answer.user_ID
INNER JOIN Questionnaire_Questions ON Questionnaire_Questions.id = Questionnaire_Questions_Answer.question_ID
INNER JOIN Questionnaire_Answers on Questionnaire_Answers.id = Questionnaire_Questions_Answer.answer_ID
WHERE
--Q and A Requested to Match
(Questionnaire_Questions.id = '0' and Questionnaire_Answers.id = '0')
OR
(Questionnaire_Questions.id = '1' and Questionnaire_Answers.id = '3')
-- SQL Server requires every non-aggregated selected column in GROUP BY
GROUP BY User_Table.id, User_Table.UserName
ORDER BY totalMatches DESC
Example Results
ID | Name | totalMatches
0 | Jack | 2
Create a location table like so:
create table dbo.location (id int identity,
country varchar(200),
state_province varchar(100),
city varchar(100))
When you add a user, add the id of the location to the user table.
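A minimal sketch of how the user table could point at that location table and how the search could filter on it. The column name location_id, the constraint names, and the example country and state values are all hypothetical; the sketch also assumes the tables live in the dbo schema:

```sql
-- Assumption: dbo.location.id must be a key before another table can reference it.
ALTER TABLE dbo.location
    ADD CONSTRAINT PK_location PRIMARY KEY (id);

-- Hypothetical column linking each user to one row in dbo.location.
ALTER TABLE dbo.User_Table
    ADD location_id int NULL
        CONSTRAINT FK_User_Table_location REFERENCES dbo.location (id);
GO  -- batch separator (SSMS / sqlcmd) so the new column is visible to the query below

-- Restrict a search to users in a given country and state/province.
SELECT u.ID, u.UserName
FROM dbo.User_Table AS u
INNER JOIN dbo.location AS l ON l.id = u.location_id
WHERE l.country = 'Canada'            -- example values only
  AND l.state_province = 'Ontario';
```

The same join and WHERE conditions can be added to the existing totalMatches query so that matching is limited to users in the chosen area.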

SQL Query to get data based on multiple filters

I have following Product table and ProductTag tables -
ID | Product
--------------
1 | Product_A
2 | Product_B
3 | Product_C
TagID | ProductID
----------------------
1 | 2
1 | 3
2 | 1
2 | 2
2 | 3
3 | 1
3 | 2
Now I need a SQL query that returns all products that have both Tag 1 and Tag 2. The result should be as given below -
ProductID | Product
------------------------
2 | Product_B
3 | Product_C
Please suggest how I can write an MS SQL query for this.
SELECT p.ID, p.Product
FROM Product p
INNER JOIN ProductTag pt
ON p.ID = pt.ProductID
WHERE pt.TagID IN (1, 2) -- <== Tags you want to find
GROUP BY p.ID, p.Product
HAVING COUNT(*) = 2 -- <== number of tags listed in the WHERE clause
However, if a TagID can appear more than once for the same product, you need to count only the distinct tags:
HAVING COUNT(DISTINCT pt.TagID) = 2
More on: SQL of Relational Division
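The query can be tried against the sample data from the question. In this sketch the table variables @Product and @ProductTag stand in for the real tables, and the COUNT(DISTINCT ...) form is used so duplicate tag rows would not change the result:

```sql
-- Sample data from the question.
DECLARE @Product TABLE (ID int PRIMARY KEY, Product varchar(50));
DECLARE @ProductTag TABLE (TagID int, ProductID int);

INSERT INTO @Product (ID, Product)
VALUES (1, 'Product_A'), (2, 'Product_B'), (3, 'Product_C');

INSERT INTO @ProductTag (TagID, ProductID)
VALUES (1, 2), (1, 3),
       (2, 1), (2, 2), (2, 3),
       (3, 1), (3, 2);

-- Keep only products that carry every tag in the IN list.
SELECT p.ID AS ProductID, p.Product
FROM @Product AS p
INNER JOIN @ProductTag AS pt ON p.ID = pt.ProductID
WHERE pt.TagID IN (1, 2)
GROUP BY p.ID, p.Product
HAVING COUNT(DISTINCT pt.TagID) = 2   -- must equal the number of tags in the IN list
ORDER BY p.ID;
```

On this data the query returns Product_B and Product_C, matching the expected result in the question.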

Resources