SQL Server outer join issue

I'm having a bit of trouble getting an outer join to work. Outer joins have worked as I expected in MS Access in the past, but getting the same behaviour in SQL Server is giving me issues.
I have a table of scores that apply to each student like:
+-------------+------------+-------+
| StudentID | StandardID | Score |
+-------------+------------+-------+
| 100 | 1011 | 1 |
| 100 | 1012 | 2 |
| 101 | 1011 | 3 |
Each student may have many scores, and each score is related to one Standard. Additionally, each student may belong to one or more groups, which are contained within another table, groups:
+-------------+------------+
| StudentID | GroupID |
+-------------+------------+
| 100 | 83 |
| 101 | 83 |
What I want to do is extract the score information and filter it by group; this dataset will then be matched up by StudentID to the correct record elsewhere. However, each retrieved dataset for any given student needs to have exactly the same number of rows: one for each standard. Ideally, for the data above, the result would be:
StudentID = 100
+------------+-------------+------------+-------+
| StandardID | StudentID | GroupID | Score |
+------------+-------------+------------+-------+
| 1011 | 100 | 83 | 1 |
| 1012 | 100 | 83 | 2 |
StudentID = 101
+------------+-------------+------------+-------+
| StandardID | StudentID | GroupID | Score |
+------------+-------------+------------+-------+
| 1011 | 101 | 83 | 3 |
| 1012 | 101 | 83 | NULL | <--Can't get this to happen
I can pull up the list that I want, but there are no NULL rows in it. As a further example, if one student has 4 scores but another has only 1, I still need the query to return 4 rows for each of them, with NULLs for the scores they don't have.
This is what I have tried so far (a bit more verbose, but in essence):
SELECT Standards.StandardID, scores.StudentID, scores.TestDate, scores.Score,
scores.Assessment
FROM scores RIGHT OUTER JOIN
(SELECT scores_1.StandardID
FROM scores AS scores_1 INNER JOIN studentGroups
ON scores_1.StudentID = studentGroups.StudentID
WHERE (studentGroups.GroupID = 83)
GROUP BY scores_1.StandardID) AS Standards
ON scores.StandardID = Standards.StandardID
WHERE scores.StudentID = 100
Any help would be amazing!

Can you provide the database structure? To return the same number of rows for all students, you need to create a temp table with the distinct StandardIDs and then use an outer join against it.
Please provide the table structure so we can give a more appropriate answer.

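I use scores and groups as the two tables described above. You used many more terms, so I got (and maybe am still) a bit confused. However, this should work:
select AllStandards.StandardID,
groups.StudentID,
groups.GroupID,
scores.Score
from (select distinct StandardID from scores) AllStandards
cross join groups
left join scores
on scores.StandardID = AllStandards.StandardID
and scores.StudentID = groups.StudentID
where groups.StudentID=100
I first create a list of all available StandardIDs and cross join it with the students in groups, so every student gets one row per standard. I then left join the scores on both StandardID and StudentID, so any standard a student has no score for comes back with a NULL Score. Because the StudentID filter is applied to groups rather than to scores, it does not throw those NULL rows away.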
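Below is a minimal runnable sketch of the idea "cross all standards with all students, then outer join the scores", using the sample data from the question. It runs Python's built-in sqlite3 module against an in-memory SQLite database, which stands in for SQL Server so the example is self-contained. The query uses only standard CROSS JOIN / LEFT JOIN syntax and should carry over to T-SQL. There, the `?` placeholders would become parameters such as `@StudentID`. The table name studentGroups is taken from the question's own query.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption: the join
# logic uses only standard SQL and carries over to T-SQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scores (StudentID INT, StandardID INT, Score INT);
    CREATE TABLE studentGroups (StudentID INT, GroupID INT);
    INSERT INTO scores VALUES (100, 1011, 1), (100, 1012, 2), (101, 1011, 3);
    INSERT INTO studentGroups VALUES (100, 83), (101, 83);
""")

# Build every (standard, student-in-group) pair with a CROSS JOIN, so each
# student gets one row per standard, then LEFT JOIN the scores. Score stays
# NULL where the student has no score for that standard.
QUERY = """
    SELECT st.StandardID, g.StudentID, g.GroupID, s.Score
    FROM (SELECT DISTINCT StandardID FROM scores) AS st
    CROSS JOIN studentGroups AS g
    LEFT JOIN scores AS s
           ON s.StandardID = st.StandardID
          AND s.StudentID  = g.StudentID
    WHERE g.GroupID = ? AND g.StudentID = ?
    ORDER BY st.StandardID
"""

def rows_for(student_id, group_id=83):
    """Return (StandardID, StudentID, GroupID, Score) rows for one student."""
    return conn.execute(QUERY, (group_id, student_id)).fetchall()

for sid in (100, 101):
    print(sid, rows_for(sid))
```

The student filter sits on the cross-joined side (`g.StudentID`), not on the scores table. That way it cannot discard the NULL-score rows produced by the LEFT JOIN.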

Related

Running count of duplicate values

I have a table showing pallets and the amount of product ("units") on those pallets. Individual pallets can have multiple records due to multiple possible defect codes. This means when I am trying to sum the total units on all pallets, the same pallet could get counted more than once, which is undesirable. I would like (but don't know how) to add a running tally column to show how many times a specific pallet ID has appeared so that I can filter out any record where the count is greater than 1:
| Pallet_ID | Units | Defect_Code | COUNT |
+-----------+-------+-------------+-------+
| A1 | 100 | 03 | 1 |
| A1 | 100 | 05 | 2 |
| B1 | 95 | 03 | 1 |
| C1 | 300 | 05 | 1 |
| C1 | 300 | 06 | 2 |
| D1 | 210 | 03 | 1 |
| A1 | 100 | 10 | 3 |
| D1 | 210 | 03 | 2 |
In the above example, the correct sum total of units should be 705. A solution in SQL or in DAX would work (although I lean towards SQL). I have searched for a long time but could not find a solution that fits this particular scenario. Many thanks in advance for your time and consideration!
You may use the windowing function row_number() with the over clause where you partition by the pallet. Within each partition you can control which row is assigned the number 1 by using the order by inside the over clause.
select
*
from (
select
Pallet_ID
, Units
, Defect_Code
, row_number() over(partition by Pallet_ID order by defect_code) as count_of
from yourtable
) as numbered
where count_of = 1
Note that I have arbitrarily used the column defect_code to order by, as I don't know what other columns exist. If your table has a date/time value recording when the row was created, you could use that instead, or perhaps the table's unique key.
Side note:
I would not recommend using "count" as a column alias, as it's a SQL reserved word.
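Here is a minimal sketch that checks the ROW_NUMBER() approach against the question's sample rows. Python's sqlite3 module again stands in for SQL Server; window functions need SQLite 3.25 or later. The table name pallets is hypothetical. Keeping only the rows numbered 1 and summing their Units should give the 705 quoted in the question.

```python
import sqlite3

# SQLite stands in for SQL Server; ROW_NUMBER() OVER (PARTITION BY ...)
# is written the same way in T-SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pallets (Pallet_ID TEXT, Units INT, Defect_Code TEXT);
    INSERT INTO pallets VALUES
        ('A1', 100, '03'), ('A1', 100, '05'), ('B1', 95, '03'),
        ('C1', 300, '05'), ('C1', 300, '06'), ('D1', 210, '03'),
        ('A1', 100, '10'), ('D1', 210, '03');
""")

# Number the rows within each pallet, keep only the first one, then sum.
TOTAL_UNITS = """
    SELECT SUM(Units)
    FROM (
        SELECT Units,
               ROW_NUMBER() OVER (PARTITION BY Pallet_ID
                                  ORDER BY Defect_Code) AS count_of
        FROM pallets
    ) AS numbered
    WHERE count_of = 1
"""

total = conn.execute(TOTAL_UNITS).fetchone()[0]
print(total)  # 705
```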

Maximum Daisy Chain Length

I have a bunch of value pairs (Before, After) by users in a table. In ideal scenarios these values should form an unbroken chain. e.g.
| UserId | Before | After |
|--------|--------|-------|
| 1 | 0 | 10 |
| 1 | 10 | 20 |
| 1 | 20 | 30 |
| 1 | 30 | 40 |
| 1 | 40 | 30 |
| 1 | 30 | 52 |
| 1 | 52 | 0 |
Unfortunately, these records originate in multiple different tables and are imported into my investigation table. The other values in the table do not lend themselves to ordering (e.g. CreatedDate) due to some quirks in the system saving them out of order.
I need to produce a list of users with gaps in their data. e.g.
| UserId | Before | After |
|--------|--------|-------|
| 1 | 0 | 10 |
| 1 | 10 | 20 |
| 1 | 20 | 30 |
// Row Deleted (30->40)
| 1 | 40 | 30 |
| 1 | 30 | 52 |
| 1 | 52 | 0 |
I've looked at the other daisy-chaining questions on SO (and online in general), but they all appear to be tied to a specific problem space in which one value in the pair is always lower than the other in a predictable fashion. In my case, the values can increase or decrease.
Is there a way to quickly calculate the longest chain that can be created? I do have a CreatedAt column that would provide some (very rough) relative ordering: when the dates are more than about 10 seconds apart, we could consider them orderable.
Aren't you therefore simply after this, to get the first row where the "chain" is broken?
SELECT UserID, Before, After
FROM dbo.YourTable YT
WHERE NOT EXISTS (SELECT 1
FROM dbo.YourTable NE
WHERE NE.UserID = YT.UserID
AND NE.After = YT.Before)
AND YT.Before != 0;
If you want the last row before the point where the "chain" is broken, just swap the aliases on the columns in the WHERE clause of the NOT EXISTS (i.e. NE.Before = YT.After).
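Below is a minimal sketch of the NOT EXISTS check, run against the question's example with the 30->40 row removed. It uses sqlite3 as a stand-in for SQL Server and a hypothetical table name, chain. It also correlates on UserId, which assumes each user's chain is independent, so one user's After values cannot hide another user's gap. The square-bracket quoting is accepted by both SQLite and SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE chain (UserId INT, [Before] INT, [After] INT);
    -- The question's example with the 30 -> 40 row deleted.
    INSERT INTO chain VALUES
        (1, 0, 10), (1, 10, 20), (1, 20, 30),
        (1, 40, 30), (1, 30, 52), (1, 52, 0);
""")

# A row starts a new segment when no row for the same user ends on its
# Before value; Before = 0 is treated as the legitimate start of a chain.
GAP_STARTS = """
    SELECT yt.UserId, yt.[Before], yt.[After]
    FROM chain AS yt
    WHERE NOT EXISTS (SELECT 1
                      FROM chain AS ne
                      WHERE ne.UserId = yt.UserId
                        AND ne.[After] = yt.[Before])
      AND yt.[Before] <> 0
"""

print(conn.execute(GAP_STARTS).fetchall())  # [(1, 40, 30)]
```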
The following performs hierarchical recursion on your example data and calculates a "chain" count column called 'h_level'.
;with recur_cte([UserId], [Before], [After], h_level) as (
select [UserId], [Before], [After], 0
from dbo.test_table
where [Before] is null
union all
select tt.[UserId], tt.[Before], tt.[After], rc.h_level+1
from dbo.test_table tt join recur_cte rc on tt.UserId=rc.UserId
and tt.[Before]=rc.[After]
where tt.[Before]<tt.[after])
select * from recur_cte;
Results:
UserId Before After h_level
1 NULL 10 0
1 10 20 1
1 20 30 2
1 30 40 3
1 30 52 3
Is this helpful? Could you further define which rows to exclude?
If you want users that have more than one chain:
select t.UserID
from <T> as t left outer join <T> as t2
on t2.UserID = t.UserID and t2.Before = t.After
where t2.UserID is null
group by t.UserID
having count(*) > 1;

SQL Server delete on multiple foreign keyed tables - performance

I am trying to remove old data from a SQL Server database, given a list of IDs, and I'm trying to figure out how to make it run faster. Currently, deleting a list of 250 IDs takes around 1 hour. These IDs are attached to our 'root' objects (example below), and each of these tables has foreign key constraints.
Products
| productID | description | price |
+-----------------+-------------------+-------------+
| 1 | item 1 | 5.00 |
| 2 | item 2 | 5.00 |
| 3 | item 3 | 5.00 |
| ... | ... | ... |
Sales
| saleID | productID |
+-----------------+-------------------+
| 4 | 1 |
| 5 | 2 |
| 6 | 3 |
| ... | ... |
Taxes
| taxID | saleID |
+-----------------+-------------------+
| 7 | 4 |
| 8 | 5 |
| 9 | 6 |
| ... | ... |
Currently, we are just passing a list of product IDs and cascading through manually, such as:
CREATE TABLE #ProductIDsRemoval (id int)
INSERT INTO #ProductIDsRemoval VALUES (1)
DELETE t
FROM dbo.Taxes t
INNER JOIN dbo.Sales s ON (s.saleID = t.saleID)
INNER JOIN #ProductIDsRemoval p ON (s.productID = p.id)
DELETE s
FROM dbo.Sales s
INNER JOIN #ProductIDsRemoval p ON (s.productID = p.id)
DELETE p
FROM dbo.Products p
INNER JOIN #ProductIDsRemoval p2 ON (p.productID = p2.id)
This works fine; however, my issue is that my table structure has ~70 tables, with at least a couple thousand rows in each to remove, if not a couple million. Currently, my query takes anywhere from 1 to 6 hours to run, depending on the number of base IDs we're removing. (My structure doesn't actually use Products/Taxes/Sales, but it's a decent analogy. The number we're aiming to remove is ~750 base IDs, for which we are estimating 3-5 hours of runtime.)
I've seen other Stack Overflow answers suggesting that I drop all constraints, re-add them with ON DELETE CASCADE, and then restore the originals. However, this also takes quite a long time, as I would need to: 1. drop the constraints, 2. re-create them with ON DELETE CASCADE, 3. run my query, 4. drop the constraints again, and 5. re-add them without cascading.
I've also been looking at possibly just selecting everything I need into temp tables, truncating all of the other tables, re-inserting all of my values, and re-setting the indexes based on the last item I added. Again, though, I would need to edit all of the foreign keys, which I would prefer not to do.
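The child-first delete order from the question can be sketched against the Products/Sales/Taxes analogy, with foreign keys enforced. The sketch uses Python's sqlite3 module as a self-contained stand-in for SQL Server. SQLite does not support SQL Server's `DELETE alias FROM ... JOIN` form, so the sketch uses the equivalent `WHERE ... IN (subquery)` form, which SQL Server also accepts. It only illustrates why the order matters; it does not address the performance question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
    CREATE TABLE Products (productID INT PRIMARY KEY, description TEXT, price REAL);
    CREATE TABLE Sales (saleID INT PRIMARY KEY,
                        productID INT REFERENCES Products(productID));
    CREATE TABLE Taxes (taxID INT PRIMARY KEY,
                        saleID INT REFERENCES Sales(saleID));
    INSERT INTO Products VALUES (1, 'item 1', 5.00), (2, 'item 2', 5.00), (3, 'item 3', 5.00);
    INSERT INTO Sales VALUES (4, 1), (5, 2), (6, 3);
    INSERT INTO Taxes VALUES (7, 4), (8, 5), (9, 6);
    CREATE TEMP TABLE ProductIDsRemoval (id INT);
    INSERT INTO ProductIDsRemoval VALUES (1);
""")

# Delete from the deepest child table first, then work up to the root,
# so no FOREIGN KEY constraint is violated along the way.
conn.executescript("""
    DELETE FROM Taxes
    WHERE saleID IN (SELECT s.saleID
                     FROM Sales AS s
                     JOIN ProductIDsRemoval AS p ON s.productID = p.id);
    DELETE FROM Sales
    WHERE productID IN (SELECT id FROM ProductIDsRemoval);
    DELETE FROM Products
    WHERE productID IN (SELECT id FROM ProductIDsRemoval);
""")

def row_count(table):
    """Number of rows left in one of the example tables."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

print({t: row_count(t) for t in ("Products", "Sales", "Taxes")})
```

Deleting a parent row that still has children, such as product 2 here, fails with a foreign key error. That is why the manual cascade has to run from child to parent.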

SQL Query to select one set when there are multiple entries

I asked this question (SQL Query to select one set when there are duplicates) last year and got a solution to count the SLAs: basically, count the number of minimum SLAs for each application. However, I have a follow-up question. I want a query that will return the rows with the minimum SLA and earliest date for each REF_ID (or APP_ID).
ID | REF_ID | APP_ID | FIRST_DATE | SECOND_DTE | SLA |
1 | 11 | 101 | 2016/10/01 | 2016/10/02 | 1 |
2 | 12 | 102 | 2016/10/01 | 2016/10/04 | 2 |
3 | 12 | 102 | 2016/10/01 | 2016/10/05 | 2 |
So the query should return the first and second rows.
I would very much appreciate if someone could provide a solution.
I have updated the query based on User726720's answer. It does not return entire rows, but it returns sufficient data.
SELECT REF_ID, MIN(SECOND_DTE), MIN(SLA) FROM TABLE WHERE FIRST_DTE > '2016-10-01' AND FIRST_DTE < '2016-11-01' GROUP BY REF_ID
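This should do the job. The subqueries are correlated on REF_ID so that the minimums are taken per REF_ID rather than over the whole table:
SELECT * FROM YourTable t
WHERE t.SLA = (SELECT MIN(t2.SLA) FROM YourTable t2 WHERE t2.REF_ID = t.REF_ID)
AND t.SECOND_DTE = (SELECT MIN(t2.SECOND_DTE) FROM YourTable t2
WHERE t2.REF_ID = t.REF_ID AND t2.SLA = t.SLA)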
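The ROW_NUMBER() windowing function from the pallet answer above offers an alternative that returns exactly one full row per REF_ID, even when several rows tie on both SLA and date. It ranks the rows within each REF_ID. Below is a minimal sketch with the question's three rows, using sqlite3 as a stand-in for SQL Server. The table name sla_rows is hypothetical, and the dates are stored as ISO-8601 text so that sorting them as strings is chronological.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sla_rows (ID INT, REF_ID INT, APP_ID INT,
                           FIRST_DATE TEXT, SECOND_DTE TEXT, SLA INT);
    INSERT INTO sla_rows VALUES
        (1, 11, 101, '2016-10-01', '2016-10-02', 1),
        (2, 12, 102, '2016-10-01', '2016-10-04', 2),
        (3, 12, 102, '2016-10-01', '2016-10-05', 2);
""")

# Rank rows within each REF_ID by lowest SLA, then earliest SECOND_DTE
# (ties broken by ID), and keep the top-ranked row of each group.
BEST_PER_REF = """
    SELECT ID, REF_ID, APP_ID, FIRST_DATE, SECOND_DTE, SLA
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY REF_ID
                                  ORDER BY SLA, SECOND_DTE, ID) AS rn
        FROM sla_rows AS t
    ) AS ranked
    WHERE rn = 1
    ORDER BY REF_ID
"""

for row in conn.execute(BEST_PER_REF):
    print(row)
```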

TSQL Multiple column unpivot with named rows possible?

I know there are several UNPIVOT / CROSS APPLY discussions here, but I was not able to find one that covers my problem. What I've got so far is the following:
SELECT Perc, Salary
FROM (
SELECT jobid, Salary_10 AS Perc10, Salary_25 AS Perc25, [Salary_Median] AS Median
FROM vCalculatedView
WHERE JobID = '1'
GROUP BY JobID, SourceID, Salary_10, Salary_25, [Salary_Median]
) a
UNPIVOT (
Salary FOR Perc IN (Perc10, Perc25, Median)
) AS calc1
Now, I would like to add several other columns, e.g. one named Bonus, which I also want to put into the Perc10, Perc25 and Median rows.
As an alternative, I also wrote a query with CROSS APPLY, but there it seems you cannot "force" the row order the way you can with UNPIVOT. In other words, I cannot have a custom sort, only a sort based on a value within the table, if I am correct? At least here I do get the result I want, but the rows are in the wrong order and I do not have row names like Perc10 etc., which would be nice.
SELECT crossapplied.Salary,
crossapplied.Bonus
FROM vCalculatedView v
CROSS APPLY (
VALUES
(Salary_10, Bonus_10)
, (Salary_25, Bonus_25)
, (Salary_Median, Bonus_Median)
) crossapplied (Salary, Bonus)
WHERE JobID = '1'
GROUP BY crossapplied.Salary,
crossapplied.Bonus
Perc stands for Percentile here.
Output is intended to be something like this:
+--------------+---------+-------+
| Calculation | Salary | Bonus |
+--------------+---------+-------+
| Perc10 | 25 | 5 |
| Perc25 | 35 | 10 |
| Median | 27 | 8 |
+--------------+---------+-------+
Am I missing something, or did I do something wrong? I'm using MSSQL 2014, and the output is going into SSRS. Thanks a lot in advance for any hints!
Edit for clarification: The Unpivot-Method gives the following output:
+--------------+---------+
| Calculation | Salary |
+--------------+---------+
| Perc10 | 25 |
| Perc25 | 35 |
| Median | 27 |
+--------------+---------+
so it lacks the column "Bonus" here.
The Cross-Apply-Method gives the following output:
+---------+-------+
| Salary | Bonus |
+---------+-------+
| 35 | 10 |
| 25 | 5 |
| 27 | 8 |
+---------+-------+
So if you compare it to the intended output, you'll notice that the column "Calculation" is missing and the row sorting is wrong (note that the line 25 | 5 is in the second row instead of the first).
Edit 2: View's definition and sample data:
The view basically just adds computed columns to the table. In the table, I've got columns like Salary and Bonus for each JobID. The view then just computes the percentiles like this:
Select
Percentile_Cont(0.1)
within group (order by Salary)
over (partition by jobID) as Salary_10,
Percentile_Cont(0.25)
within group (order by Salary)
over (partition by jobID) as Salary_25
from Tabelle
So the output is like:
+----+-------+---------+-----------+-----------+
| ID | JobID | Salary | Salary_10 | Salary_25 |
+----+-------+---------+-----------+-----------+
| 1 | 1 | 100 | 60 | 70 |
| 2 | 1 | 100 | 60 | 70 |
| 3 | 2 | 150 | 88 | 130 |
| 4 | 3 | 70 | 40 | 55 |
+----+-------+---------+-----------+-----------+
In the end, the view will be parameterized in a stored procedure.
Might this be your approach?
After your edits, I understand that your CROSS APPLY solution comes back with the right data, but not in the right output format. You can add constant values to your VALUES list and do the sorting in a wrapping SELECT:
SELECT wrapped.Calculation,
wrapped.Salary,
wrapped.Bonus
FROM
(
SELECT crossapplied.*
FROM vCalculatedView v
CROSS APPLY (
VALUES
(1,'Perc10',Salary_10, Bonus_10)
, (2,'Perc25',Salary_25, Bonus_25)
, (3,'Median',Salary_Median, Bonus_Median)
) crossapplied (SortOrder,Calculation,Salary, Bonus)
WHERE JobID = '1'
GROUP BY crossapplied.SortOrder,
crossapplied.Calculation,
crossapplied.Salary,
crossapplied.Bonus
) AS wrapped
ORDER BY wrapped.SortOrder
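SQLite has no CROSS APPLY, so the sketch below, run through Python's sqlite3 module, swaps in a CASE-based unpivot over a cross join with a constant list. It shows the same idea as the answer: each constant row carries a sort key and a label, and ORDER BY on the sort key fixes the row order. The single-table vCalculatedView and its values are hypothetical, taken from the intended output in the question. The row is duplicated to mimic the view repeating per-job percentiles, which DISTINCT then collapses (the answer uses GROUP BY for the same purpose).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vCalculatedView (JobID TEXT, Salary_10 INT, Salary_25 INT,
                                  Salary_Median INT, Bonus_10 INT,
                                  Bonus_25 INT, Bonus_Median INT);
    -- Two identical rows, as the view repeats the per-job percentiles.
    INSERT INTO vCalculatedView VALUES
        ('1', 25, 35, 27, 5, 10, 8),
        ('1', 25, 35, 27, 5, 10, 8);
""")

# Each constant row carries a sort key and a label; CASE picks the matching
# Salary/Bonus columns, and ORDER BY on the sort key fixes the row order.
UNPIVOTED = """
    WITH calc(SortOrder, Calculation) AS (
        VALUES (1, 'Perc10'), (2, 'Perc25'), (3, 'Median')
    )
    SELECT DISTINCT c.SortOrder, c.Calculation,
           CASE c.SortOrder WHEN 1 THEN v.Salary_10
                            WHEN 2 THEN v.Salary_25
                            ELSE v.Salary_Median END AS Salary,
           CASE c.SortOrder WHEN 1 THEN v.Bonus_10
                            WHEN 2 THEN v.Bonus_25
                            ELSE v.Bonus_Median END AS Bonus
    FROM vCalculatedView AS v
    CROSS JOIN calc AS c
    WHERE v.JobID = '1'
    ORDER BY c.SortOrder
"""

for sort_order, calculation, salary, bonus in conn.execute(UNPIVOTED):
    print(calculation, salary, bonus)
```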
