Conditionally Change String Name - sql-server

I have a large data source that is automatically uploaded into a SQL Server table, so I am unable to change the data manually. Every now and then there are records that are mislabeled. 98% of the dataset contains unique Patient_fin values; however, for patients who have been to both locations (ED and EDU), the Patient_fin is duplicated, which is fine. For example,
Patient_fin  CHECKIN_DATE_TIME    TRACKING_GROUP
1            2018-01-01 01:37:00  EDU
1            2018-01-01 04:37:00  ED
I'm running into issues when the patient's tracking group is not correctly labeled (both labels are the same even though the CHECKIN_DATE_TIMEs are different). For example, I can tell from the CHECKIN_DATE_TIME that the patient has been to two different locations, ED and EDU, yet the tracking group is the same. In the second row for Patient_fin 1, the tracking group should read 'ED':
Patient_fin  CHECKIN_DATE_TIME    TRACKING_GROUP
1            2018-01-01 01:37:00  EDU
1            2018-01-01 04:37:00  EDU
For instances where the TRACKING_GROUP is incorrect, is there a way in SQL to recode the record with the later CHECKIN_DATE_TIME so that its TRACKING_GROUP reads ED? A priori knowledge tells me the later CHECKIN_DATE_TIME will always be associated with ED and not EDU.

This works only if there will never be more than two records with the same Patient_fin. You may also need to account for the first record already being ED: in that case the update below leaves you with two records having TRACKING_GROUP = 'ED'.
--This does pretty much what Sean Lange described, except that instead of a CTE it uses
--a subquery to number the records, partitioned by Patient_fin and ordered by CHECKIN_DATE_TIME.
--It then joins this to the table on Patient_fin and CHECKIN_DATE_TIME and updates the second record for each Patient_fin.
UPDATE st
SET TRACKING_GROUP = 'ED'
FROM dbo.SomeTable AS st
INNER JOIN
(
    SELECT Patient_fin, CHECKIN_DATE_TIME, ROW_NUMBER() OVER(PARTITION BY Patient_fin ORDER BY CHECKIN_DATE_TIME) AS [RowNum]
    FROM dbo.SomeTable
) AS x
ON x.CHECKIN_DATE_TIME = st.CHECKIN_DATE_TIME AND x.Patient_fin = st.Patient_fin
WHERE x.RowNum = 2;
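The concern about the first record already being ED can be handled by updating the later row only when it carries the same label as the earlier one. A minimal sketch, assuming SQL Server 2012 or later (for LAG), the placeholder table name dbo.SomeTable from the answer, and at most two rows per Patient_fin:

```sql
-- Sketch only: dbo.SomeTable is the placeholder table name used in the answer.
WITH ordered AS (
    SELECT TRACKING_GROUP,
           ROW_NUMBER() OVER (PARTITION BY Patient_fin ORDER BY CHECKIN_DATE_TIME) AS RowNum,
           LAG(TRACKING_GROUP) OVER (PARTITION BY Patient_fin ORDER BY CHECKIN_DATE_TIME) AS PrevGroup
    FROM dbo.SomeTable
)
UPDATE ordered
SET TRACKING_GROUP = 'ED'
WHERE RowNum = 2              -- the later check-in
  AND TRACKING_GROUP = 'EDU'  -- currently labeled EDU...
  AND PrevGroup = 'EDU';      -- ...just like the earlier check-in
```

Pairs that are already labeled correctly (EDU followed by ED) are left untouched. Replacing the UPDATE with a SELECT from the CTE using the same WHERE clause previews the rows that would change.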

Related

SQL Server - deducting from available credit

I have an invoicing solution that uses Azure SQL to store and calculate invoice data. I have been asked to provide 'credit' functionality, so that rather than recovering customers' charges, the totals are deducted from an amount of available credit and reflected in the invoice (solution xyz may have 1,500 worth of charges, but deducting that from an available credit of 10,000 means it is effectively zeroed and leaves 8,500 credit remaining). Unfortunately, after several days I haven't been able to work out how to do this.
I am able to get a list of items and their costs from sql easily:
invoice_id  contact_id  solution_id  total     date
----------------------------------------------------------
202104-015  52          10000        30317.27  2021-05-22
202104-015  52          10001        2399.90   2021-05-22
202104-015  52          10005        8302.27   2021-05-22
202104-015  52          10060        3625.22   2021-05-22
202104-015  52          10111        22.87     2021-05-22
202104-015  52          10115        435.99    2021-05-22
I have another table that shows the credit available for the given contact:
id  credit_id  owner_id  total_applied  date_applied
------------------------------------------------------
1   C00001     52        500000.00      2021-05-14
I have tried using the following SQL statement, based on another stackoverflow question to subtract from the previous row, thinking each row would then reflect the remaining credit:
Select
    i.invoice_id,
    i.solution_id,
    i.total,
    cr.total_applied - coalesce(lag(i.total) over (order by i.solution_id), 0) as credit_available,
    i.date
from
    invoices i
    left join credits cr on
        cr.credit_id = 'C00001'
Whilst this does subtract, it only subtracts from the row above it, not all of the rows above it:
invoice_id  solution_id  total     credit_available  date
----------------------------------------------------------------
202104-015  10000        30317.27  500000.00         2021-05-22
202104-015  10001        2399.90   469682.73         2021-05-22
202104-015  10005        8302.27   497600.10         2021-05-22
202104-015  10060        3625.22   491697.73         2021-05-22
202104-015  10111        22.87     496374.78         2021-05-22
202104-015  10115        435.99    499977.13         2021-05-22
I've also tried various queries with a mess of case statements.
I'm at the point where I am contemplating using PowerShell or similar to do the task instead (loop through each solution, check whether there is enough available credit, update a deduction table, go to the next, etc.), but I'd rather keep it all in SQL if I can.
Anyone have some pointers for this beginner?
You don't need window functions; a subquery that sums the totals of the previous invoices will do. Be sure to index the table appropriately so that performance is not a problem.
There are two sub-queries, one for the previous total sum and another to get the date of the next credit for contact_id.
SELECT [inv].[invoice_id],
[inv].[solution_id],
[inv].[total],
-- subquery that sums the previous totals
[cr].[total_applied] - COALESCE((
SELECT SUM([inv_inner].[total])
FROM [dbo].[invoices] AS [inv_inner]
WHERE [inv_inner].[solution_id] < [inv].[solution_id]
), 0) AS [credit_available],
[inv].[date]
FROM [dbo].[invoices] [inv]
LEFT JOIN [dbo].[credits] [cr]
ON [cr].[owner_id] = [inv].[contact_id]
-- here, we make sure that the credit is available for the correct period
-- invoice date >= credit date_applied
AND [inv].[date] >= [cr].[date_applied]
-- and invoice date < next date_applied or tomorrow, in case there are no next date_applied
AND [inv].[date] < COALESCE((
SELECT MIN([cr2].[date_applied])
FROM [dbo].[credits] [cr2]
WHERE [cr2].[owner_id] = [cr].[owner_id]
AND [cr2].[date_applied] > [cr].[date_applied]
), GETDATE()+1)
AND [cr].[credit_id] = 'C00001';
This query works, but it is for this question only. Please study it and adapt to your real world problem.
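For comparison, the same running deduction can also be written with a windowed SUM, which SQL Server 2012 and later and Azure SQL support. A sketch, assuming the invoices and credits tables shown in the question and a single credit row for the contact (the date-range logic of the query above is left out for brevity):

```sql
-- Sketch only: assumes one matching row in credits (credit_id 'C00001').
SELECT inv.invoice_id,
       inv.solution_id,
       inv.total,
       -- credit left after this line and every earlier line of the same contact
       cr.total_applied
         - SUM(inv.total) OVER (PARTITION BY inv.contact_id
                                ORDER BY inv.solution_id
                                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS credit_remaining,
       inv.[date]
FROM dbo.invoices AS inv
LEFT JOIN dbo.credits AS cr
       ON cr.owner_id = inv.contact_id
      AND cr.credit_id = 'C00001'
ORDER BY inv.solution_id;
```

Unlike the subquery version, which uses < and so reports the credit before each line, this frame includes the current row, so the first line shows 469682.73 (500000.00 minus 30317.27). Ending the frame at 1 PRECEDING and wrapping the SUM in COALESCE(..., 0) would give the "before this line" figure instead.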
This is a pretty complex scenario. Sadly, I cannot spend the time to offer a complete solution here, but I can provide some tips and points of attention:
Be sure to determine the actual remaining credit based on the complete invoice history. If you introduce filtering (in a WHERE-clause, for example, or by including joins with other tables), the results should not be affected by it. You should probably pre-calculate the available credit per invoice detail record in a temporary table or in a CTE and use that data in your main query.
Make sure that you regard the date_applied value of the credit. Before a credit is applied to a customer, that customer should probably have less credit or no credit at all. That should be reflected correctly on historical invoices, I guess.
Make sure you determine the correct amount of total credit. It is unclear from the information provided in your question how that should be determined or calculated. Is only the latest total_applied value from the credits table active? Or should all the historical total_applied values be summed to get the total available credit?
Include a correct join between your invoices table and your credits table. Currently, this join is hard coded in your query.
Also regard actual payments by customers. Payments have effect on the available credit, I assume. Also note that, unless you are OK with a history that changes, you need to regard the payment dates as well (just like the credit change dates).
I'm not sure how you would solve your scenario using PowerShell... I do know for sure, that this can be tackled with SQL.
I cannot say anything about the resulting performance, however. These kinds of calculations surely come with a price tag attached in that regard. If you need high performance, I guess it might be more practical to include columns in your invoices table to physically store the available credit with each invoice detail record.
Edit
I have experimented a little with your scenario and your additional comments.
My solution implementation uses two CTEs:
The first CTE (cte_invoice_credit_dates) retrieves the date of the active credit record for specific invoice IDs.
The second CTE (cte_contact_invoice_summarized_totals) calculates the invoice totals of all the invoices of a specific contact. Since you want to summarize on solution detail per invoice as well, I also included the solution ID per invoice in the querying logic.
The main query selects all columns from the invoices table and uses the data from the two CTEs to calculate three additional columns in the result set:
Column credit_assigned represents the total assigned credit at the invoice's date.
Column summarized_total shows the contact's cumulative invoice total.
Column credit_available shows the remaining credit.
WITH
[cte_invoice_credit_dates] AS (
SELECT DISTINCT
I.[invoice_id],
C.[date_applied]
FROM
[invoices] AS I
OUTER APPLY (SELECT TOP (1) [date_applied]
FROM [credits]
WHERE
[owner_id] = I.[contact_id] AND
[date_applied] <= I.[date]
ORDER BY [date_applied] DESC) AS C
),
[cte_contact_invoice_summarized_totals] AS (
SELECT
I.[contact_id],
I.[invoice_id],
I.[solution_id],
SUM(H.[total]) AS [total]
FROM
[invoices] AS I
INNER JOIN [invoices] AS H ON
H.[contact_id] = I.[contact_id] AND
H.[invoice_id] = I.[invoice_id] AND
H.[solution_id] <= I.[solution_id] AND
H.[date] <= I.[date]
GROUP BY
I.[contact_id],
I.[invoice_id],
I.[solution_id]
)
SELECT
I.[invoice_id],
I.[contact_id],
I.[solution_id],
I.[total],
I.[date],
COALESCE(C.[total_applied], 0) AS [credit_assigned],
H.[total] AS [summarized_total],
COALESCE(C.[total_applied] - H.[total], 0) AS [credit_available]
FROM
[invoices] AS I
INNER JOIN [cte_contact_invoice_summarized_totals] AS H ON
H.[contact_id] = I.[contact_id] AND
H.[invoice_id] = I.[invoice_id] AND
H.[solution_id] = I.[solution_id]
LEFT JOIN [cte_invoice_credit_dates] AS CD ON
CD.[invoice_id] = I.[invoice_id]
LEFT JOIN [credits] AS C ON
C.[owner_id] = I.[contact_id] AND
C.[date_applied] = CD.[date_applied]
ORDER BY
I.[invoice_id],
I.[solution_id];

How to accumulate two datetime in two tables as VIEW in SQL Server 2014?

How to query to accumulate two datetime columns in two tables in SQL Server 2014? This is an example for your reference:
Check-In table
InID UserID CheckInTime
---------------------------------
IN-001 1 2018-11-10 08:00:00
IN-002 2 2018-11-15 07:00:00
Check-Out table
OutID UserID CheckOutTime
----------------------------------
OUT-001 1 2018-11-10 12:00:00
OUT-002 2 2018-11-15 14:00:00
Result set (expected)
ResultID UserID InID OutID WorkTimeinHour
--------------------------------------------------------
1 1 IN-001 OUT-001 4
2 2 IN-002 OUT-002 7
Similar to @PSK, I used the STUFF function to strip the "IN-" and "OUT-" prefixes.
But since these operations are in the JOIN condition, they will cause a performance loss.
It would be better to have a numeric column in both tables instead of string columns carrying the "IN-" and "OUT-" prefixes.
select
i.UserId, i.InID, CheckInTime, o.OutID, CheckOutTime,
dbo.fn_CreateTimeFromSeconds(DATEDIFF(ss, CheckInTime, CheckOutTime)) as TotalTime
from CheckIn i
inner join CheckOut o
on i.UserId = o.UserId and
STUFF (i.InID,1,3,'') = STUFF (o.OutID,1,4,'')
Additionally, I used a custom user-defined fn_CreateTimeFromSeconds function to format time for HH:MI:SS format
Hope it helps
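The body of fn_CreateTimeFromSeconds is not shown in the answer. As an illustration only, a duration shorter than 24 hours can be formatted as HH:MI:SS with built-in functions; the sketch below uses the first sample row from the question as inline data:

```sql
-- Sketch only: an inline alternative to the custom function, valid for durations under 24 hours.
SELECT CONVERT(varchar(8),
               DATEADD(second, DATEDIFF(second, t.CheckInTime, t.CheckOutTime), 0),
               108) AS TotalTime   -- style 108 is hh:mi:ss
FROM (VALUES (CAST('2018-11-10 08:00:00' AS datetime),
              CAST('2018-11-10 12:00:00' AS datetime))) AS t (CheckInTime, CheckOutTime);
```

For this row the result is 04:00:00. Durations of 24 hours or more wrap around with this approach, so they need separate handling.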
For your current scenario, you can try the following.
This assumes that the part of the IN and OUT IDs after the "-" is the same for one entry.
SELECT ROW_NUMBER()
         OVER(
           ORDER BY (SELECT NULL)) AS ResultID,
       T1.userid,
       T1.inid,
       T2.outid,
       DATEDIFF(hh, T1.checkintime, T2.checkouttime) AS WorkTimeinHour
FROM checkin T1
INNER JOIN checkout T2
ON REPLACE(T1.inid, 'IN-', '') = REPLACE(T2.outid, 'OUT-', '')
This query will not perform well on large data sets because REPLACE is used in the JOIN condition. Ideally you should have a single identifier that ties the IN and OUT transactions together.
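Since the question asks for a view, the join can be wrapped in one. A sketch, assuming the tables are named CheckIn and CheckOut as in the answers; the view name vw_WorkTime is hypothetical:

```sql
-- Sketch only: dbo.vw_WorkTime is a made-up name; CheckIn/CheckOut follow the answers.
CREATE VIEW dbo.vw_WorkTime
AS
SELECT ROW_NUMBER() OVER (ORDER BY i.InID) AS ResultID,
       i.UserID,
       i.InID,
       o.OutID,
       DATEDIFF(hour, i.CheckInTime, o.CheckOutTime) AS WorkTimeinHour
FROM dbo.CheckIn AS i
INNER JOIN dbo.CheckOut AS o
        ON o.UserID = i.UserID
       AND STUFF(o.OutID, 1, 4, '') = STUFF(i.InID, 1, 3, '');  -- compare the parts after "OUT-" and "IN-"
```

Note that DATEDIFF(hour, ...) counts hour boundaries crossed rather than elapsed hours, so 08:59 to 09:01 yields 1. If fractional hours matter, DATEDIFF(minute, ...) / 60.0 is one option.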

SQL join conditional either or not both?

I have 3 tables that I'm joining and 2 variables that I'm using in one of the joins.
What I'm trying to do is figure out how to join based on either of the statements but not both.
Here's the current query:
SELECT DISTINCT
WR.Id,
CAL.Id as 'CalendarId',
T.[First Of Month],
T.[Last of Month],
WR.Supervisor,
WR.cd_Manager as [Manager], --Added to search by the Manager--
WR.[Shift] as 'ShiftId'
INTO #Workers
FROM #T T
--Calendar
RIGHT JOIN [dbo].[Calendar] CAL
ON CAL.StartDate <= T.[Last of Month]
AND CAL.EndDate >= T.[First of Month]
--Workers
--This is the problem join
RIGHT JOIN [dbo].[Worker_Filtered] WR
ON WR.Supervisor IN (SELECT Id FROM [dbo].[User] WHERE FullName IN(@Supervisors))
or (WR.Supervisor IN (SELECT Id FROM [dbo].[User] WHERE FullName IN(@Supervisors))
AND WR.cd_Manager IN(SELECT Id FROM [dbo].[User] WHERE FullName IN(@Manager))) --Added to search by the Manager--
AND WR.[Type] = '333E7907-EB80-4021-8CDB-5380F0EC89FF' --internal
WHERE CAL.Id = WR.Calendar
AND WR.[Shift] IS NOT NULL
What I want to do is either have the result based on the Worker_Filtered table matching the @Supervisor or (but not both) have it matching both the @Supervisor and @Manager.
The way it is now if it matches either condition it will be returned. This should be limiting the returned results to Workers that have both the Supervisor and Manager which would be a smaller data set than if they only match the Supervisor.
UPDATE
The query that I have above is part of a greater whole that pulls data for a supervisor's workers.
I want to also limit it to managers that are under a particular supervisor.
For example, if @Supervisor = John Doe and @Manager = Jane Doe and John has 9 workers, 8 of which are under Jane's management, then I would expect the end result to show that there are only 8 workers for each month. With the current query, it is still showing all 9 for each month.
If I change part of the RIGHT JOIN to:
WR.Supervisor IN (SELECT Id FROM [dbo].[User] WHERE FullName IN (@Supervisors))
AND WR.cd_Manager IN(SELECT Id FROM [dbo].[User] WHERE FullName IN(@Manager))
Then it just returns 12 rows of NULL.
UPDATE 2
Sorry, this has taken so long to get a sample up. I could not get SQL Fiddle to work for SQL Server 2008/2014 so I am using rextester instead:
Sample
This shows the results as 108 lines. But what I want to show is just the first 96 lines.
UPDATE 3
I have made a slight update to the Sample. This does get the results that I want. I can set @Manager to NULL and it will pull all 108 records, or I can have the correct Manager name in there and it'll only pull those that match both Supervisor and Manager.
However, I'm doing this with an IF ELSE and I was hoping to avoid doing that as it duplicates code for the insert into the Worker table.
The description of expected results in update 3 makes it all clear now, thanks. Your 'problem' join needs to be:
RIGHT JOIN Worker_Filtered wr on (wr.Supervisor in(@Supervisors)
    and case when @Manager is null then 1
             else case when wr.Manager in(@Manager) then 1 else 0 end
        end = 1)
By the way, I don't know what you are expecting the in(@Supervisors) to achieve, but if you're hoping to supply a comma-separated list of supervisors as a single string and have wr.Supervisor match any one of them, then you're going to be disappointed. This query works exactly the same if you have = @Supervisors instead.
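If several supervisors do need to be matched, one option that works on older versions (which lack STRING_SPLIT) is to pass the names as rows rather than as one string. A sketch, assuming the Worker_Filtered and User tables from the question; the table variable @SupervisorNames, the nvarchar(200) lengths, and the name 'Jim Roe' are made up for illustration:

```sql
-- Sketch only: @SupervisorNames and 'Jim Roe' are hypothetical.
DECLARE @Manager nvarchar(200) = NULL;  -- NULL means "do not filter by manager"
DECLARE @SupervisorNames TABLE (FullName nvarchar(200) NOT NULL);

INSERT INTO @SupervisorNames (FullName)
VALUES (N'John Doe'), (N'Jim Roe');

SELECT WR.Id, WR.Supervisor, WR.cd_Manager
FROM dbo.Worker_Filtered AS WR
WHERE WR.Supervisor IN (SELECT U.Id
                        FROM dbo.[User] AS U
                        INNER JOIN @SupervisorNames AS S
                                ON S.FullName = U.FullName)
  AND (@Manager IS NULL   -- no manager given: supervisor match is enough
       OR WR.cd_Manager IN (SELECT U.Id
                            FROM dbo.[User] AS U
                            WHERE U.FullName = @Manager));
```

The "@Manager IS NULL OR ..." predicate expresses the same optional filter as the nested CASE expression, and it can be placed in a join condition in the same way.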

Joining poorly designed SQL tables?

I've tried searching for information on joining tables without foreign keys, but it seems the answer is always to create the foreign key. I cannot modify the tables in question to do this, and I must report on data that is already in production. The following is a portion of the data in the tables involved, to illustrate the issue.
Table A
Journal Account Debit Credit Sequence
--------------------------------------------------
87041 150-00 100.00 0.00 16384
87041 150-10 0.00 100.00 32768
87041 150-00 50.0 0.0 49152
87041 210-90 0.0 50.0 65536
Then the second table, tracking additional bits of information, is largely the same but missing the Sequence number that would tie the line items together properly. It has its own Sequence Number that is unrelated.
Table B
Journal Account Label Artist Sequence
--------------------------------------------------
87041 150-00 Label02 Artist12 1
87041 150-10 Label09 Artist03 2
87041 150-00 Label04 Artist01 3
87041 210-90 Label01 Artist05 4
At present the best I can come up with is to join on Journal and Account, but that duplicates records. I have gotten close by playing around with grouping and max() on the sequence number, but the result has been that not all duplicates are removed for journal entries with a very large number of rows, and the first match from the second table is always displayed for lines that have the same account.
Closest - but bad - result
Journal Account Debit Credit Sequence Label Artist
----------------------------------------------------------------------
87041 150-00 100.00 0.00 16384 Label02 Artist12
87041 150-10 0.00 100.00 32768 Label09 Artist03
87041 150-00 50.0 0.0 49152 Label02 Artist12 <-- wrong
87041 210-90 0.0 50.0 65536 Label01 Artist05
How can I join the tables such that duplicates are excluded but also so that the correct Label and Artist are displayed? It sort of feels like I have to produce a query which knows that one of the records from Table B has already been used when the 49152 record from Table A comes looking for a match.
EDIT:
@Justin Crabtree A.Sequence will be the order in which the line items were entered. So a user could have entered the last line in the example first, then the first line, then the third, and finally the second.
@Edper Microsoft SQL Server...hmm, I cannot remote into the client's machine this morning...otherwise I would provide the version.
@Abe Miessler yes, you are correct.
As soon as I can get back into the server I will try your suggestion @pkuderov
Try this
;WITH a AS
(
SELECT Journal,
Account,
Debit,
Credit,
Sequence,
Id = ROW_NUMBER() OVER(PARTITION BY Journal ORDER BY Sequence)
FROM dbo.tablea
)
, b AS
(
SELECT Journal,
Account,
Label,
Artist,
Id = ROW_NUMBER() OVER(PARTITION BY Journal ORDER BY Sequence)
FROM dbo.tableb
)
SELECT a.Journal,
a.Account,
a.Debit,
a.Credit,
a.Sequence,
b.Label,
b.Artist
FROM a
JOIN b ON b.Journal = a.Journal
AND b.Account = a.Account
AND b.Id = a.Id
Hi, that's just an idea:
select
a.Journal, a.Account, a.Debit, a.Credit, a.Sequence, b.Label, b.Artist
from (
select
*,
row_number() over(partition by Journal, Account order by Sequence) as idInGroup
from a
) as a
join (
select
*,
row_number() over(partition by Journal, Account order by Sequence) as idInGroup
from b
) as b on
a.Journal = b.Journal
and a.Account = b.Account
and a.idInGroup = b.idInGroup
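Here I assume that the rows were entered in the same relative order in both tables, so that each table's own Sequence gives the same ordering within a Journal and Account. That assumption is the basis for the join.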
If you ordered the 2 table rows by their own sequence numbers, will the rows align in the same order?
If so, this is a possible solution for SQL server:
You can create 2 CTEs, one for each table, with ROW_NUMBER column, and that way, both tables will have a matching row number column that you can use to join. Let me know if you need an example.
If I'm reading your requirements correctly and you want all rows from Table A, but only the first matching row from Table B, your best bet would be to do an OUTER APPLY with a TOP(1). That would look something like this:
select *
from TableA
OUTER APPLY
(select TOP(1) Journal, Account, Label, Artist, Sequence
FROM TableB
WHERE Journal = TableA.Journal AND Account = TableA.Account
ORDER BY Sequence) as B
(Definitely pseudo-code, but that should be somewhat close.)
If it comes down to it, you could use ROW_NUMBER(), partition that by Journal and Account and then match on those Row_Number values for each result set. You'd generate one sub-query/CTE for TableA and another CTE for TableB - each with a RowNumber value that would be essentially a new sequence integer. The first row in TableA would match the first row in TableB, Second row in TableA would match the second in TableB, etc. Of course, you'd run into some issues if there are more rows for Journal/Account in "A" than there are in "B".
A better question might be - "How does your code determine all matches between TableA and TableB if they can't use any data columns to tie them together?"
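The caveat about "A" holding more rows than "B" for a Journal/Account can be softened by using a LEFT JOIN on the row numbers, so that unmatched Table A rows are kept with NULL Label and Artist. A sketch, assuming the tables are named dbo.TableA and dbo.TableB and that each table's Sequence reflects the same entry order:

```sql
-- Sketch only: dbo.TableA / dbo.TableB are assumed names for Table A and Table B.
WITH a AS (
    SELECT Journal, Account, Debit, Credit, Sequence,
           ROW_NUMBER() OVER (PARTITION BY Journal, Account ORDER BY Sequence) AS idInGroup
    FROM dbo.TableA
),
b AS (
    SELECT Journal, Account, Label, Artist,
           ROW_NUMBER() OVER (PARTITION BY Journal, Account ORDER BY Sequence) AS idInGroup
    FROM dbo.TableB
)
SELECT a.Journal, a.Account, a.Debit, a.Credit, a.Sequence, b.Label, b.Artist
FROM a
LEFT JOIN b
       ON b.Journal   = a.Journal
      AND b.Account   = a.Account
      AND b.idInGroup = a.idInGroup   -- n-th row of an account in A pairs with n-th row in B
ORDER BY a.Journal, a.Sequence;
```

With the sample data, the second 150-00 line (Sequence 49152) pairs with the second 150-00 row of Table B, giving Label04 / Artist01 instead of repeating Label02 / Artist12.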

Filtering a complex SQL Query

Unit - hmy, scode, hProperty
InsurancePolicy - hmy, hUnit, dtEffective, sStatus
Select MAX(i2.dtEffective) as maxdate, u.hMy, MAX(i2.hmy) as InsuranceId,
i2.sStatus
from unit u
left join InsurancePolicy i2 on i2.hUnit = u.hMy
and i2.sStatus in ('Active', 'Cancelled', 'Expired')
where u.hProperty = 2
Group By u.hmy, i2.sStatus
order by u.hmy
This query will return values for the Insurance Policy with the latest Effective Date (Max(dtEffective)). I added Max(i2.hmy) so if there was more than one Insurance Policy for the latest Effective Date, it will return the one with the highest ID (i2.hmy) in the database.
Suppose there was a Unit that had 3 Insurance Policies attached with the same latest effective date and all have different sStatus'.
The result would look like this:
maxdate UnitID InsuranceID sStatus
1/23/12 2949 1938 'Active'
1/23/12 2949 2343 'Cancelled'
1/23/12 2949 4323 'Expired'
How do I filter the results so that if there are multiple Insurance Policies with different statuses for the same unit and same date, we choose the Insurance Policy with the 'Active' status first; if one doesn't exist, choose 'Cancelled'; and if that doesn't exist, choose 'Expired'?
This seems to be a matter of proper ranking of InsurancePolicy's rows and then joining Unit to the set of the former's top-ranked rows:
;
WITH ranked AS (
SELECT
*,
rnk = ROW_NUMBER() OVER (
PARTITION BY hUnit
ORDER BY dtEffective DESC, sStatus, hmy DESC
)
FROM InsurancePolicy
)
SELECT
i2.dtEffective AS maxdate,
u.hMy,
i2.hmy AS InsuranceId,
i2.sStatus
FROM Unit u
LEFT JOIN ranked i2 ON i2.hUnit = u.hMy AND i2.rnk = 1
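The ORDER BY above gets the desired priority because 'Active', 'Cancelled' and 'Expired' happen to sort alphabetically in that order. A variant that spells the priority out with a CASE expression, and that also carries over the status list and the hProperty = 2 filter from the question, might look like this (a sketch, not tested against the real schema):

```sql
-- Sketch only: same tables and columns as the question; explicit status priority.
WITH ranked AS (
    SELECT hmy, hUnit, dtEffective, sStatus,
           ROW_NUMBER() OVER (
               PARTITION BY hUnit
               ORDER BY dtEffective DESC,
                        CASE sStatus WHEN 'Active'    THEN 1
                                     WHEN 'Cancelled' THEN 2
                                     WHEN 'Expired'   THEN 3
                        END,
                        hmy DESC
           ) AS rnk
    FROM InsurancePolicy
    WHERE sStatus IN ('Active', 'Cancelled', 'Expired')
)
SELECT i2.dtEffective AS maxdate,
       u.hMy,
       i2.hmy AS InsuranceId,
       i2.sStatus
FROM Unit AS u
LEFT JOIN ranked AS i2
       ON i2.hUnit = u.hMy AND i2.rnk = 1
WHERE u.hProperty = 2
ORDER BY u.hMy;
```

Keeping the priority in a CASE means a renamed or newly added status cannot silently change which policy wins.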
You could make this work with one SQL statement but it will be nearly unreadable to your everyday t-sql developer. I would suggest breaking this query up into a few steps.
First, I would declare a table variable and place all the records that require no manipulation into this table (ie - Units that do not have multiple statuses for the same date = good records).
Then, get a list of your records that need work done on them (multiple statuses on the same date for the same UnitID) and place them in a table variable. I would create a "rank" column within this table variable using a case statement as illustrated here:
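Pseudocode: CASE sStatus WHEN 'Active' THEN 1 WHEN 'Cancelled' THEN 2 WHEN 'Expired' THEN 3 END
Then delete the records ranked 2 or 3 wherever a record ranked 1 exists for the same unit and date.
Then delete the records ranked 3 wherever a record ranked 2 exists for the same unit and date.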
Finally, merge this updated table variable with your table variable containing your "good" records.
It is easy to get sucked into trying to do too much within one SQL statement. Break up the tasks to make it easier for you to develop and more manageable in the future. If you have to edit this SQL in a few years time you will be thanking yourself, not to mention any other developers that may have to take over your code.

Resources