select distinct patientID
from [dbo]..HPrecords
where ptprocess = 'refill'
and pTDescription <> 'Success'
and patientID is not null
and patiententrytime > '2021-04-06'
and patientID not in (
select distinct patientID
from [dbo]..HPrecords
where ptprocess = 'embossing'
and ptDescription = 'Success'
and patientID is not null
and patiententrytime > '2021-04-06'
)
So I want to use SQL's NOT IN to filter out the patients who haven't received their refill medication yet. A patient can be refilled multiple times: the first attempt can fail, but the second or third can succeed, so there can be multiple rows per patient.
I just want to write a query that returns the patientIDs that DID NOT SUCCEED in getting a refill at all, no matter how many attempts were made.
Is this the best way to write it? My current query is still running, and I think the logic is wrong.
I want to try to write this query without a CTE or temp table, just as an exercise.
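An alternative I'm wondering about is NOT EXISTS, which as I understand sidesteps the NULL pitfalls of NOT IN and often optimizes better; a sketch of the same logic:
select distinct r.patientID
from [dbo]..HPrecords r
where r.ptprocess = 'refill'
and r.ptDescription <> 'Success'
and r.patientID is not null
and r.patiententrytime > '2021-04-06'
and not exists (
    select 1
    from [dbo]..HPrecords s
    where s.patientID = r.patientID
    and s.ptprocess = 'embossing'
    and s.ptDescription = 'Success'
    and s.patiententrytime > '2021-04-06'
)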
Sample output:
PatientID
151761
151759
151757
151764
I personally prefer joins over NOT IN. It looks neater, reads better, and lets you access information from both tables if you need to analyse anomalies. A colleague and I once did some very basic performance comparisons and there was no notable difference.
Here's my take on it:
select distinct hpr.patientID
from [dbo].HPrecords hpr
LEFT OUTER JOIN
[dbo].HPrecords hpr_val ON
hpr.patientID = hpr_val.patientID
AND hpr_val.ptprocess = 'embossing'
AND hpr_val.ptDescription = 'Success'
and hpr_val.patiententrytime > '2021-04-06'
where hpr.ptprocess = 'refill'
and hpr.pTDescription <> 'Success'
--and hpr.patientID is not null -- Not necessary because you will always have records in this table in this scenario
and hpr.patiententrytime > '2021-04-06'
AND hpr_val.patientID IS NULL
As for the extra dot between the schema and table name: as Smor pointed out, it is not necessary (it might even break the query). That syntax is used when you want to point at a database and table without naming the schema:
Long way: [database].[schema].[table] -- Example [MyDatabase].[dbo].[MyTable]
Short way: [database]..[table] -- Example [MyDatabase]..[MyTable]
Related
I have two tables and want to make a left join and get the latest data, using the date from both tables. It doesn't pull all the data from the left table.
SELECT Firsttable.Username, Secondtable.city
FROM Firsttable
LEFT JOIN Secondtable ON (Firsttable.Username = Secondtable.Account_Name)
WHERE (
Firsttable.Questions <> 5
AND Firsttable.CreateDate = '2018-02-06 09:41:38.000'
AND Secondtable.CreateDate = '2018-02-06 09:07:47.000'
)
OR (
Firsttable.Questions <> 5
AND Firsttable.CreateDate = '2018-02-06 09:41:38.000'
AND Secondtable.CreateDate IS NULL
)
Since you didn't provide us with any data, you are going to have to do your own testing.
First, break your query down into smaller parts and see if the data you want actually exists.
SELECT Firsttable.Username
FROM Firsttable
LEFT JOIN Secondtable ON
(Firsttable.Username = Secondtable.Account_Name
AND Firsttable.Questions <> 5
AND Firsttable.CreateDate = '2018-02-06 09:41:38.000'
)
This will show whether you have valid data in the first table. If you don't get anything, you may need to relax your date comparison by eliminating the time portion; only you know how restrictive your data is. It may also mean your Secondtable.Account_Name doesn't match up to your Firsttable.Username like you thought. Play around with this query until you get the good data you were expecting.
Once you know you have good data in your first table, then add on selection criteria for your second table:
SELECT Firsttable.Username, Secondtable.city
FROM Firsttable
LEFT JOIN Secondtable ON
(Firsttable.Username = Secondtable.Account_Name
AND Firsttable.Questions <> 5
AND Firsttable.CreateDate = '2018-02-06 09:41:38.000'
)
WHERE (
Secondtable.CreateDate = '2018-02-06 09:07:47.000'
OR Secondtable.CreateDate IS NULL
)
If this drops all of your data, then you know something doesn't match up with your Secondtable.CreateDate. Your use of time may be too constricting; match on the date only and see if that helps. But if you know time is important, go back to the first query above and print out all the distinct Secondtable.CreateDate values to see if your date is part of the result set. Being off by one second could cause your data to not match.
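For example, a date-only comparison might look like this (a sketch; casting to date assumes SQL Server 2008 or later):
WHERE (
    CAST(Secondtable.CreateDate AS date) = '2018-02-06'
    OR Secondtable.CreateDate IS NULL
)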
Play around with both of these queries until you find the combination that brings your data out.
If you still have trouble, you'll have to post some example data for each table so we can see how to help you better.
I'm using MS SQL Server 2008, and I have a table 'Users'. This table has a key field ID of type bigint, and also a field Parents of type varchar which encodes the whole chain of the user's parent IDs.
For example:
User table:
ID | Parents
1 | null
2 | ..
3 | ..
4 | 3,2,1
Here user 1 has no parents and user 4 has the parent chain 3->2->1. I created a function which parses the user's Parents field and returns a result table with user IDs of type bigint.
Now I need a query which will select and join the IDs of some requested users and their parents (the order of users and their parents is not important). I'm not an SQL expert, so all I could come up with is the following:
WITH CTE AS(
SELECT
ID,
Parents
FROM
[Users]
WHERE
(
[Users].Name = 'John'
)
UNION ALL
SELECT
[Users].Id,
[Users].Parents
FROM [Users], CTE
WHERE
(
[Users].ID in (SELECT * FROM GetUserParents(CTE.ID, CTE.Parents) )
))
SELECT * FROM CTE
And basically it works. But the performance of this query is very poor. I believe the WHERE .. IN .. expression here is the bottleneck. As I understand it, instead of just joining the first subquery of the CTE (the IDs of the found users) with the results of GetUserParents (the IDs of the users' parents), it has to enumerate all users in the Users table and check whether each of them is part of the function's result. Judging by the execution plan, SQL Server does a distinct sort of the result to improve the performance of the WHERE .. IN .. statement, which is logical in itself but is not required for my goal, and this distinct sort takes 70% of the query's execution time. So I wonder how this query could be improved, or perhaps somebody could suggest another approach to this problem altogether?
Thanks for any help!
The recursive query in the question looks redundant, since you already form the list of IDs needed in GetUserParents. Maybe change this into a SELECT from Users and GetUserParents() with a WHERE/JOIN:
select Users.*
from Users join
    (select ParentId
     from (SELECT * FROM Users where Users.Name = 'John') as U
     cross apply GetUserParents(U.ID, U.Parents))
    as gup
on Users.ID = gup.ParentId
Since GetUserParents expects scalars and SELECT ... WHERE produces a table, we need to apply the function to each row of the table (even if we "know" there's only one). That's what APPLY does.
I used indents to emphasize the conceptual parts of the query. (select...) as gup is the entity that Users is joined with; (select...) as U cross apply fn() is the argument of the FROM.
The key to understanding this query is knowing how CROSS APPLY works:
it's part of the FROM clause (quite unexpectedly; the syntax is documented under FROM (Transact-SQL))
it transforms the table expression to the left of it, and the result becomes the argument of the FROM (I emphasized this with indentation)
The transformation is: for each row, it
runs the table expression to the right of it (in this case, a call to a table-valued function), using that row
adds to the result set the columns from the row, followed by the columns from the call. (In our case, the table returned from the function has a single column named ParentId)
So, if the call returns multiple rows, the added records will be the same row from the table appended with each row from the function.
This is a cross apply so rows will only be added if the function returns anything. If this was the other flavor, outer apply, a single row would be added anyway, followed by a NULL in the function's column if it returned nothing.
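To illustrate the difference, a minimal sketch using the question's table and function (assuming, as above, that GetUserParents returns a ParentId column):
-- CROSS APPLY: only users for whom the function returns rows appear in the result
-- OUTER APPLY: every user appears; ParentId is NULL when the function returns nothing
SELECT u.ID, gup.ParentId
FROM Users u
OUTER APPLY GetUserParents(u.ID, u.Parents) AS gup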
This "parsing" thing violates even the 1NF. Make Parents field contain only the immediate parent (preferably, a foreign key), then an entire subtree can be retrieved with a recursive query.
I have a large table with ID, date, and some other columns. ID is indexed and sequential.
I want to select all rows after a certain date. Given that the IDs are sequential, if the rows are ordered by ID in decreasing order, then once the first row fails the date test there's no need to keep checking. How can I make use of the index to optimise this?
You could do something like this:
With FirstFailDate AS
(
-- You start by selecting the first fail date
SELECT TOP 1 * FROM YOUR_TABLE WHERE /* DATE TEST FAILING */ ORDER BY ID DESC
)
SELECT *
FROM YOUR_TABLE t
-- Then, you join your table with the first fail date, and keep all the records
-- that come after it (by ID)
JOIN FirstFailDate f
ON t.ID > f.ID
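Equivalently, without the CTE (a sketch under the same assumption that sequential IDs track the date):
SELECT *
FROM YOUR_TABLE
WHERE ID > (SELECT MAX(ID) FROM YOUR_TABLE WHERE /* DATE TEST FAILING */)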
I don't think there is a good "legal" way to do this without actually indexing date.
However, you could try something like this:
1. Issue the following query to the DBMS: SELECT * FROM YOUR_TABLE ORDER BY ID DESC.
2. Start fetching the rows in your client application.
3. As you fetch, check the date.
4. Stop fetching (and close the cursor) when the date passes the limit.
The idea is that the DBMS sometimes doesn't have to finish the whole query before starting to send partial results to the client. In this case, the hope is that the DBMS will perform an index scan on ID (due to the ORDER BY ID DESC), so you'll be able to get the results as they are produced and then stop the query before it has even finished.
NOTE: If your DBMS gives you an option to balance between getting the first row fast versus getting the whole result fast, pick the first option (such as the /*+ FIRST_ROWS */ hint under Oracle).
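In SQL Server, for instance, the analogous hint is OPTION (FAST n); a sketch reusing the query above:
SELECT *
FROM YOUR_TABLE
ORDER BY ID DESC
OPTION (FAST 1) -- optimise for returning the first row quickly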
Of course, perform measurements on realistic amounts of data, to make sure this actually works in your particular situation.
Well, I have a table with 40,000,000+ records, but when I try to execute a simple query, it takes ~3 minutes to finish. Since I use the same query in my C# solution, where it needs to execute 100+ times, the overall performance of the solution takes a serious hit.
This is the query I am using in a proc:
DECLARE @Id bigint
SELECT @Id = MAX(ExecutionID) from ExecutionLog where TestID = 50881
select @Id
Any help to improve the performance would be great. Thanks.
What indexes do you have on the table? It sounds like you don't have anything even close to useful for this particular query, so I'd suggest trying to do:
CREATE INDEX IX_ExecutionLog_TestID ON ExecutionLog (TestID, ExecutionID)
...at the very least. Your query filters by TestID, so this needs to be the leading column in the composite index: if you have no index on TestID, SQL Server will resort to scanning the entire table to find the rows where TestID = 50881.
It may help to think of indexes on SQL tables in the same way as those you'd find in the back of a big book: hierarchical and multi-level. If you were looking something up, you'd manually look under 'T' for TestID, and there'd be a sub-heading under TestID for ExecutionID. Without an index entry for TestID, you'd have to read through the entire book looking for TestID, then see if there's a mention of ExecutionID with it. This is effectively what SQL Server has to do.
If you don't have any indexes at all, you'll find it useful to review all the queries that hit the table, and to ensure that one of the indexes you create is a clustered index (rather than non-clustered).
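For example (just a sketch: a table can have only one clustered index, so pick the column set that dominates your workload):
CREATE CLUSTERED INDEX CIX_ExecutionLog ON ExecutionLog (TestID, ExecutionID)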
Try to re-work everything into something that works in a set-based manner.
So, for instance, you could write a select statement like this:
;With OrderedLogs as (
Select ExecutionID,TestID,
ROW_NUMBER() OVER (PARTITION BY TestID ORDER By ExecutionID desc) as rn
from ExecutionLog
)
select * from OrderedLogs where rn = 1 and TestID in (50881, 50882, 50883)
This would then find the maximum ExecutionID for 3 different tests simultaneously.
You might need to store that result in a table variable or temp table, but hopefully you can instead keep building up a single, larger query that processes all of the results in parallel.
This is the sort of processing that SQL is meant to be good at - don't cripple the system by iterating through the TestIDs in your code.
If you need to pass many test IDs into a stored procedure for this sort of query, look at Table Valued Parameters.
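A minimal sketch of that approach, assuming SQL Server 2008 or later (the type and procedure names here are hypothetical):
CREATE TYPE TestIdList AS TABLE (TestID int PRIMARY KEY)
GO
CREATE PROCEDURE GetLatestExecutions
    @TestIDs TestIdList READONLY
AS
BEGIN
    ;WITH OrderedLogs AS (
        SELECT ExecutionID, TestID,
            ROW_NUMBER() OVER (PARTITION BY TestID ORDER BY ExecutionID DESC) AS rn
        FROM ExecutionLog
        WHERE TestID IN (SELECT TestID FROM @TestIDs)
    )
    SELECT TestID, ExecutionID FROM OrderedLogs WHERE rn = 1
END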
I want to get a list of people affiliated with a blog. The table [BlogAffiliates] has:
BlogID
UserID
Privelage
and if a person associated with that blog has a lower or equal privilege, they cannot edit [bit field canedit].
Is this query the most efficient way of doing this, or are there better ways to derive this information?
I wonder if it can be done in a single query.
Can it be done without that CONVERT in some more clever way?
declare @privelage tinyint
select @privelage = (select Privelage from BlogAffiliates
    where UserID = @UserID and BlogID = @BlogID)

select aspnet_Users.UserName as username,
    BlogAffiliates.Privelage as privelage,
    Convert(Bit, Case When @privelage > blogaffiliates.privelage
        Then 1 Else 0 End) As canedit
from BlogAffiliates, aspnet_Users
where BlogAffiliates.BlogID = @BlogID and BlogAffiliates.Privelage >= 2
and aspnet_Users.UserId = BlogAffiliates.UserID
Some of this would depend on the indexes and the size of the tables involved. If, for example, the most costly portion of the query when you profiled it was a seek on the "BlogAffiliates.BlogID" column, then you could do one select into a table variable and then do both calculations from there.
However, I think the query you have stated is probably close to the most efficient. The only possible duplicated work is that you are seeking twice on the "BlogAffiliates.BlogID" field because of the two queries.
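A sketch of that table-variable approach (the column types here are assumed; adjust them to your schema):
declare @Affiliates table (UserID uniqueidentifier, Privelage tinyint)

insert into @Affiliates (UserID, Privelage)
select UserID, Privelage
from BlogAffiliates
where BlogID = @BlogID

declare @privelage tinyint
select @privelage = Privelage from @Affiliates where UserID = @UserID

select u.UserName as username, a.Privelage as privelage,
    Convert(Bit, Case When @privelage > a.Privelage Then 1 Else 0 End) as canedit
from @Affiliates a
join aspnet_Users u on u.UserId = a.UserID
where a.Privelage >= 2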
You can try the query below:
Select aspnet_Users.UserName as username, Blog.Privelage as privelage,
    Convert(Bit, Case When @privelage > Blog.privelage
        Then 1 Else 0 End) As canedit
From
(
    Select UserID, Privelage
    From BlogAffiliates
    Where BlogID = @BlogID and Privelage >= 2
) Blog
Inner Join aspnet_Users on aspnet_Users.UserId = Blog.UserID
In my understanding, you should not use a table variable when you are joining it with another table, as this can reduce performance. But if there are only a few records, you should go for it. You can also use local temporary tables for this purpose.
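For example, the local temporary table equivalent of the sketch above (same assumed types) would be:
create table #Affiliates (UserID uniqueidentifier, Privelage tinyint)

insert into #Affiliates (UserID, Privelage)
select UserID, Privelage
from BlogAffiliates
where BlogID = @BlogID

-- unlike a table variable, #Affiliates gets column statistics,
-- which can help the optimizer when joining to other tables
select u.UserName, a.Privelage
from #Affiliates a
join aspnet_Users u on u.UserId = a.UserID
where a.Privelage >= 2

drop table #Affiliates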