SQL join conditional either or not both? - sql-server

I have 3 tables that I'm joining and 2 variables that I'm using in one of the joins.
What I'm trying to do is figure out how to join based on either of the statements but not both.
Here's the current query:
SELECT DISTINCT
WR.Id,
CAL.Id as 'CalendarId',
T.[First Of Month],
T.[Last of Month],
WR.Supervisor,
WR.cd_Manager as [Manager], --Added to search by the Manager--
WR.[Shift] as 'ShiftId'
INTO #Workers
FROM #T T
--Calendar
RIGHT JOIN [dbo].[Calendar] CAL
ON CAL.StartDate <= T.[Last of Month]
AND CAL.EndDate >= T.[First of Month]
--Workers
--This is the problem join
RIGHT JOIN [dbo].[Worker_Filtered] WR
ON WR.Supervisor IN (SELECT Id FROM [dbo].[User] WHERE FullName IN (@Supervisors))
OR (WR.Supervisor IN (SELECT Id FROM [dbo].[User] WHERE FullName IN (@Supervisors))
AND WR.cd_Manager IN (SELECT Id FROM [dbo].[User] WHERE FullName IN (@Manager))) --Added to search by the Manager--
AND WR.[Type] = '333E7907-EB80-4021-8CDB-5380F0EC89FF' --internal
WHERE CAL.Id = WR.Calendar
AND WR.[Shift] IS NOT NULL
What I want to do is have the result based on the Worker_Filtered table matching either the @Supervisor alone, or (but not both) matching both the @Supervisor and the @Manager.
The way it is now, if a row matches either condition it is returned. It should instead limit the returned results to workers that have both the Supervisor and the Manager, which would be a smaller data set than matching on the Supervisor alone.
UPDATE
The query that I have above is part of a greater whole that pulls data for a supervisor's workers.
I want to also limit it to managers that are under a particular supervisor.
For example, if @Supervisor = John Doe and @Manager = Jane Doe, and John has 9 workers, 8 of whom are under Jane's management, then I would expect the end result to show only 8 workers for each month. With the current query, it is still showing all 9 for each month.
If I change part of the RIGHT JOIN to:
WR.Supervisor IN (SELECT Id FROM [dbo].[User] WHERE FullName IN (@Supervisors))
AND WR.cd_Manager IN (SELECT Id FROM [dbo].[User] WHERE FullName IN (@Manager))
Then it just returns 12 rows of NULL.
UPDATE 2
Sorry it has taken so long to get a sample up. I could not get SQL Fiddle to work for SQL Server 2008/2014, so I am using Rextester instead:
Sample
This shows the results as 108 lines. But what I want to show is just the first 96 lines.
UPDATE 3
I have made a slight update to the Sample. This does get the results that I want: I can set @Manager to NULL and it will pull all 108 records, or I can put the correct Manager name in there and it will only pull those that match both Supervisor and Manager.
However, I'm doing this with an IF ELSE and I was hoping to avoid doing that as it duplicates code for the insert into the Worker table.

The description of expected results in update 3 makes it all clear now, thanks. Your 'problem' join needs to be:
RIGHT JOIN Worker_Filtered wr on (wr.Supervisor in (@Supervisors)
and case when @Manager is null then 1
else case when wr.cd_Manager in (@Manager) then 1 else 0 end
end = 1)
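An equivalent sketch of the same idea without the nested CASE (assuming, as above, that a NULL @Manager means "don't filter on manager at all"):
RIGHT JOIN Worker_Filtered wr on (wr.Supervisor in (@Supervisors)
and (@Manager is null or wr.cd_Manager in (@Manager)))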
By the way, I don't know what you are expecting the in (@Supervisors) to achieve, but if you're hoping to supply a comma-separated list of supervisors as a single string and have wr.Supervisor match any one of them, then you're going to be disappointed. This query works exactly the same if you have = @Supervisors instead.
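If you actually need @Supervisors to carry a comma-separated list of names, one hedged sketch is to split the string and join on the pieces. Note this uses STRING_SPLIT, which needs SQL Server 2016+ (compatibility level 130); on the 2008/2014 versions mentioned above you would need a user-defined split function instead:
RIGHT JOIN [dbo].[Worker_Filtered] WR
ON WR.Supervisor IN (SELECT U.Id
                     FROM [dbo].[User] U
                     INNER JOIN STRING_SPLIT(@Supervisors, ',') S
                         ON U.FullName = LTRIM(RTRIM(S.value)))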

Related

How to avoid using a cursor in this T-SQL example

I'm converting a VBA application to run natively in a set of Microsoft SQL Server stored procedures. Since I'm using recordsets in VBA the direct translation would be to use a cursor in SQL Server. Unlike some other databases, SQL Server has terrible performance with cursors and I see advice to avoid them like the plague. So I'm looking for advice on a direct way to code this problem.
I have a set of tasks and a set of people to work those tasks. There are rules so that only certain people can work certain tasks. The goal is to distribute the tasks as evenly as possible to the people.
The VBA algorithm is:
1. Select all the tasks for a given rule.
2. Select the single person who matches that rule and has the lowest number of tasks assigned to them, re-querying each time to make sure I get the person with the lowest count after each update.
3. Assign that person to that task.
4. Increment the person's count of assigned tasks.
5. Move to the next task.
6. If not at the end of the tasks, go to step 2.
The only ways I see to do this are with a cursor or a really ugly WHILE loop. With a cursor, the logic carries over with the same steps as the VBA application.
The WHILE loop I envision, in pseudo-code:
While exists unassigned tasks
    select a single task
    select the person with the least tasks that can work that task type
    assign person to task
    increment person's count
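In T-SQL that pseudo-code might look roughly like the following. This is only a sketch, assuming hypothetical Task, Person, and AssignedTask tables (like the ones used in the answer below) and ignoring the rules for which person can work which task:
DECLARE @TaskId int, @PersonId int;
-- loop while any task is still unassigned
WHILE EXISTS (SELECT 1 FROM Task t
              WHERE NOT EXISTS (SELECT 1 FROM AssignedTask a WHERE a.idTask = t.id))
BEGIN
    -- pick one unassigned task
    SELECT TOP (1) @TaskId = t.id
    FROM Task t
    WHERE NOT EXISTS (SELECT 1 FROM AssignedTask a WHERE a.idTask = t.id)
    ORDER BY t.id;
    -- pick the person with the fewest assigned tasks (eligibility rules omitted)
    SELECT TOP (1) @PersonId = p.id
    FROM Person p
    LEFT JOIN AssignedTask a ON a.idPerson = p.id
    GROUP BY p.id
    ORDER BY COUNT(a.id);
    -- assign; the per-person count is implicit in AssignedTask, so no separate counter is needed
    INSERT INTO AssignedTask (idPerson, idTask) VALUES (@PersonId, @TaskId);
END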
Does anyone have any better suggestions?
Not everything should be done in a stored procedure. This sounds like a scenario where you might select all the data sets, process the assignments on the client side (or in SQLCLR), then push the results back.
Unless the problem is so massive as to prevent it (millions of people, tasks, and rules), keeping it outside of the database should keep it more maintainable.
I'm new here but have been writing SQL/stored procedures like this for a few years.
Anyway, I've found that there are times you cannot avoid a cursor (or WHILE loop, etc.), particularly when the results of the next iteration depend on the results of the current one. This applies in your case, as you're assigning tasks based on how many tasks each person already has.
The good news is that, based on what you're doing (assigning people to tasks), I'm guessing you're only talking about a very small number of loops and/or running it infrequently. As such, a cursor should still work fine.
The key thing to be mindful of is to keep the code within the cursor as simple as possible and to ensure you don't keep doing big reads inside it. Instead, do any heavier work outside the cursor to get the data into a reasonable format (e.g., in temp tables), then run the cursor over the top.
Since you do not give us the structure of the tables, it's hard to give you a solution that will suit your needs exactly. This is only an example of how you can rethink your logic; it's up to you to adapt it, especially based on the volume of your data set (which I assume must be small), and to fill in the gap for how you apply the rules for which person can be assigned to which task.
But here's a strategy you can use (only if you understand it, since you'll have to maintain it!).
Let's say you have these tables:
create table Task (id int identity(1,1) not null, taskname varchar(30));
create table Person (id int identity(1,1), name varchar(30) not null);
create table AssignedTask (id int identity(1,1) not null, idPerson int not null, idTask int not null)
with these values:
insert into Task (taskname) values ('Task1'),('Task2'),('Task3'), ('Task4'), ('Task5'), ('Task6'),('Task7'),('Task8'),('Task9'),('Task10')
insert into Person (name) values ('Person1'),('Person2'), ('Person3'), ('Person4')
insert into AssignedTask (idPerson, idTask) values (1,3),(2,4),(4,1),(4,5)
The current situation is:
select p.id, p.name, 'is assigned to', t.id as TaskId, t.taskname
from AssignedTask at
inner join Person p on p.id = at.idPerson
inner join Task t on t.id = at.idTask
order by p.name
id    name        (No column name)   TaskId  taskname
----- ----------- ------------------ ------- ---------
1     Person1     is assigned to     3       Task3
2     Person2     is assigned to     4       Task4
4     Person4     is assigned to     1       Task1
4     Person4     is assigned to     5       Task5
The idea behind this logic is to "pre-compute" the order in which the tasks will be assigned, by computing the number of already-assigned tasks plus a row number and then sorting the data ascending on that value.
-- Unassigned tasks, with a TaskSeq so we can later join on a row number ordered by NbAssignedTaskIfAssigned
;with UnassignedTask as (
select row_number() over (order by id) as TaskSeq, id
from task
where id not in (select idTask from AssignedTask)
)
-- Get the number of tasks assigned per person
, PersonTask as (
select idPerson, count(1) as NbAssignedTask
from AssignedTask
group by idPerson
)
-- this is where the logic is
, ExpandThis as (
select p.id as PersonId, coalesce(pt.nbAssignedTask,0) as nbAssignedTask
-- this allows us to prioritize which person should be assigned a task
, coalesce(pt.nbAssignedTask,0) + row_number() over (partition by idPerson order by idperson) as NbAssignedTaskIfAssigned
from Person p
left join PersonTask pt on pt.idPerson = p.id
-- this is where the magic happens. Since you did not tell us how
-- the assignment rules are enforced (either in a table or in code),
-- I chose not to apply any rule at all, so every person can be assigned
-- to any task. Change this line to a "cross apply ( select .... from
-- MyRules where ... ) x" to fit your needs.
cross apply UnassignedTask x
)
, Prioritize as (
select *
-- Just use a basic row_number to order which row must be assigned first
, ROW_NUMBER() over (order by NbAssignedTaskIfAssigned) as [Priority]
from ExpandThis et
)
select p.PersonId, ps.name, t.id as TaskId, t.taskname
from Prioritize p
inner join UnassignedTask ut on ut.TaskSeq = p.[Priority]
inner join Person ps on ps.id = p.PersonId
inner join Task t on t.id = ut.id
order by p.[Priority]
and the result is:
PersonId name TaskId taskname
----------- --------------- -------- ------------------------------
3 Person3 2 Task2
3 Person3 6 Task6
1 Person1 7 Task7
2 Person2 8 Task8
2 Person2 9 Task9
4 Person4 10 Task10
Person3 did not have any assigned tasks, so he comes first. The second, third, and fourth rows are all tied (assuming task 2 is assigned to Person3), so they get assigned the next three tasks. Finally, all four persons now have an equal number of assigned tasks but only two tasks remain, so the first two persons get them. You can tweak this by modifying the ORDER BY clause.
Now you can insert this result into AssignedTask; there is no need for a cursor or a loop to perform this task.
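To persist the assignments, one sketch is to keep the CTEs above exactly as they are and just swap the final SELECT for an INSERT ... SELECT, for example:
insert into AssignedTask (idPerson, idTask)
select p.PersonId, t.id
from Prioritize p
inner join UnassignedTask ut on ut.TaskSeq = p.[Priority]
inner join Task t on t.id = ut.id;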

Field is being updated with same value

I have a table that has a new column, and I am updating the values that should go in the new column. For simplicity's sake I am reducing my example table structure as well as my query. Below is how I want my output to look.
IDNumber NewColumn
1 1
2 1
3 1
4 2
5 2
WITH CTE_Split
AS
(
select
*,ntile(2) over (order by newid()) as Split
from TableA
)
Update a
set NewColumn = a.Split
from CTE_Split a
Now when I do this and look at my table, it looks like this:
IDNumber NewColumn
1 1
2 1
3 1
4 1
5 1
However, when I run just the SELECT I can see that I get the desired output. I have done this before to split result sets into multiple groups, and everything works within the SELECT, but now that I need to update the table I am getting this weird result. I'm not quite sure what I'm doing wrong; can anyone provide any feedback?
So after a whole day of frustration I was able to compare this code and table to another one on which I had already done this process. The reason this table was getting updated to all 1s is that, as it turns out, whoever made the table thought the column was supposed to be a bit flag. In reality it should be an int, because in this case there happen to be only two possible values, but in other cases there could be more than two.
Thank you for all your suggestions and help; this should teach me to check the data types of tables before using the NTILE function.
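For completeness, a minimal sketch of the data-type fix, assuming the TableA/NewColumn names from the example above:
-- NewColumn was created as bit, so the NTILE value 2 was silently converted to 1;
-- widen the column so it can hold more than two group numbers
ALTER TABLE TableA ALTER COLUMN NewColumn int NULL;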
Try updating your table directly rather than updating your CTE. This makes it clearer what your UPDATE statement does.
Here is an example:
WITH CTE_Split AS
(
SELECT
*,
ntile(2) over (order by newid()) as Split
FROM TableA
)
UPDATE a
SET NewColumn = c.Split
FROM
TableA a
INNER JOIN CTE_Split c ON a.IDNumber = c.IDNumber
I assume that what you want to achieve is to group your records into two randomized partitions. This statement seems to do the job.

Duplicate SQL Record Entries within 3 days

Table has following structure
ID, OrderNumber, PFirstName, PLastName, Product, LastDateModified
This information is populated into my SQL Server database by an XML import file and is created when the front-end user hits 'Enter'. But someone on the front end has been seeing an error, hitting Cancel, and re-submitting the order with new information.
Now the first order is in the database because they didn't cancel it out on the back end first.
How can I find any duplicate OrderNumber, PFirstName, PLastName, Product within 3 days of any LastDateModified entry?
A self join with a simple WHERE clause, assuming the order numbers are not duplicated and that is what you're looking for.
SELECT A.ID as A_ID
, A.orderNumber as OriginalOrder
, B.ID as B_ID
, B.OrderNumber as PossibleDuplicatedOrder
FROM TBL A
INNER JOIN TBL B
on A.PFirstName = B.PfirstName
AND A.PLastName = B.PLastName
AND A.Product = B.Product
AND A.LastDateModified < B.LastDateModified
WHERE datediff(day,A.LastDateModified,B.LastDateModified) <=3
Logically this self-joins, and to eliminate the A-->B and B-->A duplication caused by a self join we use a < so that all of the records in alias A have an earlier date than those in B when the other fields are equal; then we simply look for those that have a datediff of <= 3.
However if multiple duplicates exist for the same order such as
A-->B
B-->C
You'll see duplication in the results such as (but only if all 3 are within 3 days)
A-->B
B-->C
A-->C
But I don't see this as a bad thing given what you're attempting to recover from.
I'm not sure how to determine whether an order has been cancelled or backed out, so you'll have to set other limits for that, as they weren't specified in the question.
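If you only want the later rows of each chain (the candidates for cleanup), a hedged sketch using the same hypothetical TBL columns as above:
SELECT B.*
FROM TBL B
WHERE EXISTS (SELECT 1
              FROM TBL A
              WHERE A.PFirstName = B.PFirstName
                AND A.PLastName = B.PLastName
                AND A.Product = B.Product
                AND A.LastDateModified < B.LastDateModified
                AND DATEDIFF(day, A.LastDateModified, B.LastDateModified) <= 3);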

Updating duplicate records so they are filtered

I've found that our website-to-ERP integration tool will duplicate inserts if there is an error during the sync. Until the error is resolved, the records will be duplicated every time the sync retries, which is usually every 5 minutes.
I'm trying to find an effective way to update the duplicate records so that, when queried for a view, the duplicates are filtered out. The challenge I am having is that a duplicate will have some columns that are different.
For example, looking at the SalesOrderDetail table, an order had 120 line items. However, because of a sync issue, each line was duplicated.
I've tried using the following to test for the past month:
WITH cte AS (
SELECT SOHD.[salesorderno], [itemcode],[CommentText], unitofmeasure, itemcodedesc, quantityorderedoriginal, quantityshipped,
row_number() OVER(PARTITION BY SOHD.[salesorderno], [itemcode], unitofmeasure, itemcodedesc, quantityorderedoriginal, quantityshipped ORDER BY SOHD.[Linekey] desc) AS [rn]
FROM [dbo].[SO_SalesOrderHistoryDetail] SOHD
inner join [dbo].[SO_SalesOrderHistoryHeader] SOHH on SOHH.Salesorderno = SOHD.Salesorderno
Where year(orderdate) = '2016'
and month(orderdate) = '08'
--Only Look at completed orders, ignore quotes & deleted orders
and SOHH.Orderstatus in ('C')
--Only looks for item lines where something did not ship (prevent removing a "good" entry)
and [quantityshipped] = '0'
)
Select *
from cte
However, I keep finding issues with using this, because if I were to run an update command with it, it would update some records it shouldn't. And if I add more columns to make it more specific, it wouldn't edit some of the records that it needs to.
For example, if I don't add where rn > 1, then I inadvertently edit records that are not duplicates; but if I do add where rn > 1, then the first row of each set of duplicates won't be updated.
Feeling stuck, but not sure what to do.
Adding more info from the comment section. I think my CTE to find the duplicates and the update command might have to be somewhat different. Example data:
Order# Itemcode CommentText UnitofMeasure itemcodedesc qtyordered qtyshipped
12345 abc null each candy 5 0
12345 abc null each candy 5 5
12345 xyz null case slinky 25 0
12345 xyz null case slinky 25 25
So they are not duplicates if I include the qtyshipped column, but what I want to do is update only the records where qtyshipped = 0. The update I plan to do is set commenttext = 'delete'.
Change ROW_NUMBER to a COUNT() OVER() window function:
WITH cte
AS (SELECT SOHD.[salesorderno],
[itemcode],
[commenttext],
unitofmeasure,
itemcodedesc,
quantityorderedoriginal,
quantityshipped,
Count(1)
OVER(partition BY SOHD.[salesorderno], [itemcode], unitofmeasure,itemcodedesc) AS [rn]
FROM [dbo].[so_salesorderhistorydetail] SOHD
..........)
SELECT *
FROM cte
WHERE rn > 1
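If the goal is the update itself, here is a hedged sketch that flags only the zero-shipped duplicates (same tables as above, order-date filter omitted for brevity, and using the 'delete' comment convention from the question):
WITH cte AS (
    SELECT SOHD.*,
           COUNT(*) OVER (PARTITION BY SOHD.salesorderno, SOHD.itemcode,
                          SOHD.unitofmeasure, SOHD.itemcodedesc) AS rn
    FROM [dbo].[SO_SalesOrderHistoryDetail] SOHD
    INNER JOIN [dbo].[SO_SalesOrderHistoryHeader] SOHH
        ON SOHH.Salesorderno = SOHD.Salesorderno
    WHERE SOHH.Orderstatus = 'C'
)
UPDATE cte
SET CommentText = 'delete'
WHERE rn > 1
  AND quantityshipped = 0;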

Computed column expression

I have a specific need for a computed column called ProductCode
ProductId | SellerId | ProductCode
1         | 1        | 000001
2         | 1        | 000002
3         | 2        | 000001
4         | 1        | 000003
ProductId is identity, increments by 1.
SellerId is a foreign key.
So my computed column ProductCode must look at how many products the Seller has, and be in the format 000000. The problem here is how to know which Seller's products to count.
I've written some T-SQL, but it doesn't look at how many products a seller has:
ALTER TABLE dbo.Product
ADD ProductCode AS RIGHT('000000' + CAST(ProductId AS VARCHAR(6)) , 6) PERSISTED
You cannot have a computed column based on data outside of the current row that is being updated. The best you can do to make this automatic is to create an after-trigger that queries the entire table to find the next value for the product code. But in order to make this work you'd have to use an exclusive table lock, which will utterly destroy concurrency, so it's not a good idea.
I also don't recommend using a view, because it would have to calculate the ProductCode every time you read the table. This would be a huge performance killer as well. And unless the value is saved in the database, never to be touched again, your product codes would be subject to spurious changes (as in the case of deleting an erroneously entered and never-used product).
Here's what I recommend instead. Create a new table:
dbo.SellerProductCode
SellerID LastProductCode
-------- ---------------
1 3
2 1
This table reliably records the last-used product code for each seller. On INSERT to your Product table, a trigger will update the LastProductCode in this table appropriately for all affected SellerIDs, and then update all the newly-inserted rows in the Product table with appropriate values. It might look something like the below.
See this trigger working in a Sql Fiddle
CREATE TRIGGER TR_Product_I ON dbo.Product FOR INSERT
AS
SET NOCOUNT ON;
SET XACT_ABORT ON;
DECLARE @LastProductCode TABLE (
SellerID int NOT NULL PRIMARY KEY CLUSTERED,
LastProductCode int NOT NULL
);
WITH ItemCounts AS (
SELECT
I.SellerID,
ItemCount = Count(*)
FROM
Inserted I
GROUP BY
I.SellerID
)
MERGE dbo.SellerProductCode C
USING ItemCounts I
ON C.SellerID = I.SellerID
WHEN NOT MATCHED BY TARGET THEN
INSERT (SellerID, LastProductCode)
VALUES (I.SellerID, I.ItemCount)
WHEN MATCHED THEN
UPDATE SET C.LastProductCode = C.LastProductCode + I.ItemCount
OUTPUT
Inserted.SellerID,
Inserted.LastProductCode
INTO @LastProductCode;
WITH P AS (
SELECT
NewProductCode =
L.LastProductCode + 1
- Row_Number() OVER (PARTITION BY I.SellerID ORDER BY P.ProductID DESC),
P.*
FROM
Inserted I
INNER JOIN dbo.Product P
ON I.ProductID = P.ProductID
INNER JOIN @LastProductCode L
ON P.SellerID = L.SellerID
)
UPDATE P
SET P.ProductCode = Right('00000' + Convert(varchar(6), P.NewProductCode), 6);
Note that this trigger works even if multiple rows are inserted. There is no need to preload the SellerProductCode table, either--new sellers will automatically be added. This will handle concurrency with few problems. If concurrency problems are encountered, proper locking hints can be added without deleterious effect as the table will remain very small and ROWLOCK can be used (except for the INSERT which will require a range lock).
Please do see the Sql Fiddle for working, tested code demonstrating the technique. Now you have real product codes that have no reason to ever change and will be reliable.
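A quick usage sketch (assuming, for illustration only, that SellerId is the only column you must supply besides the identity):
-- insert three products for two sellers in one statement;
-- the trigger assigns each one the next per-seller ProductCode
INSERT INTO dbo.Product (SellerId) VALUES (1), (2), (1);
SELECT ProductId, SellerId, ProductCode
FROM dbo.Product
ORDER BY SellerId, ProductCode;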
I would normally recommend using a view to do this type of calculation. The view could even be indexed if select performance is the most important factor (I see you're using persisted).
You cannot have a subquery in a computed column, which essentially means that you can only access the data in the current row. The only ways to get this count would be to use a user-defined function in your computed column, or triggers to update a non-computed column.
A view might look like the following:
create view ProductCodes as
select p.ProductId, p.SellerId,
(
select right('000000' + cast(count(*) as varchar(6)), 6)
from Product
where SellerID = p.SellerID
and ProductID <= p.ProductID
) as ProductCode
from Product p
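Usage is then just a query against the view, for example:
-- all codes for one seller, in sequence
SELECT ProductId, SellerId, ProductCode
FROM ProductCodes
WHERE SellerId = 1
ORDER BY ProductCode;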
One big caveat to your product numbering scheme, and a downfall for both the view and UDF options, is that we're relying upon a count of rows with a lower ProductId. This means that if a Product is inserted in the middle of the sequence, it would actually change the ProductCodes of existing Products with a higher ProductId. At that point, you must either:
1. Guarantee the sequencing of ProductId (identity alone does not do this).
2. Rely upon a different column that has a guaranteed sequence (still dubious, but maybe CreateDate?).
3. Use a trigger to get a count at insert time, which is then never changed.
