SQL Server: do I need two queries and a function? (efficiency question)

I want to get a list of people affiliated with a blog. The table [BlogAffiliates] has:
BlogID
UserID
Privelage
and if the people associated with that blog have a lower or equal Privelage, they cannot edit (bit field canedit).
Is this query the most efficient way of doing this, or are there better ways to derive this information?
Can it be done in a single query?
Can it be done without that Convert in some cleverer way?
declare @privelage tinyint

select @privelage = (select Privelage
                     from BlogAffiliates
                     where UserID = @UserID and BlogID = @BlogID)

select aspnet_Users.UserName as username,
       BlogAffiliates.Privelage as privelage,
       Convert(Bit, Case When @privelage > BlogAffiliates.Privelage
                         Then 1 Else 0 End) As canedit
from BlogAffiliates, aspnet_Users
where BlogAffiliates.BlogID = @BlogID
  and BlogAffiliates.Privelage >= 2
  and aspnet_Users.UserId = BlogAffiliates.UserID

Some of this depends on the indexes and the size of the tables involved. If, for example, the most costly portion of the query when you profile it is the seek on the "BlogAffiliates.BlogID" column, then you could do one select into a table variable and then do both calculations from there.
However, I think the query you have stated is probably close to the most efficient. The only possible duplicated work is that you seek twice on the "BlogAffiliates.BlogID" column because of the two queries.
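If profiling did show the repeated seek to be the bottleneck, the table-variable idea could look roughly like this (a sketch only; the UserID type is assumed to be uniqueidentifier, as in the standard aspnet_Users schema):

declare @aff table (UserID uniqueidentifier, Privelage tinyint)

-- one seek on BlogID populates the table variable
insert into @aff (UserID, Privelage)
select UserID, Privelage
from BlogAffiliates
where BlogID = @BlogID

-- both the caller's privilege and the result set come from @aff
declare @privelage tinyint
select @privelage = Privelage from @aff where UserID = @UserID

select u.UserName as username,
       a.Privelage as privelage,
       Convert(Bit, Case When @privelage > a.Privelage Then 1 Else 0 End) As canedit
from @aff a
inner join aspnet_Users u on u.UserId = a.UserID
where a.Privelage >= 2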

You can try the query below.
Select aspnet_Users.UserName as username, Blog.Privelage as privelage,
       Convert(Bit, Case When @privelage > Blog.Privelage
                         Then 1 Else 0 End) As canedit
From
(
    Select UserID, Privelage
    From BlogAffiliates
    Where BlogID = @BlogID and Privelage >= 2
) Blog
Inner Join aspnet_Users on aspnet_Users.UserId = Blog.UserID
In my understanding you should not use a table variable if you are joining it with another table, as this can reduce performance. But if there are only a few records, it is fine. You can also use a local temporary table for this purpose.
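If the goal is a single statement with no variable at all, the privilege lookup can also be folded in as a correlated subquery. This is only a sketch based on the query above, not something either answer posted:

select u.UserName as username,
       ba.Privelage as privelage,
       Convert(Bit, Case When (select ba2.Privelage
                               from BlogAffiliates ba2
                               where ba2.UserID = @UserID
                                 and ba2.BlogID = @BlogID) > ba.Privelage
                         Then 1 Else 0 End) As canedit
from BlogAffiliates ba
inner join aspnet_Users u on u.UserId = ba.UserID
where ba.BlogID = @BlogID
  and ba.Privelage >= 2

The optimizer still has to seek BlogAffiliates twice, so this is not necessarily faster than the two-step version; it only answers the "single query" part of the question.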


Use NOT IN instead of CTE or temp table

select distinct patientID
from [dbo]..HPrecords
where ptprocess = 'refill'
and pTDescription <> 'Success'
and patientID is not null
and patiententrytime > '2021-04-06'
and patientID not in (
select distinct patientID
from [dbo]..HPrecords
where ptprocess = 'embossing'
and ptDescription = 'Success'
and patientID is not null
and patiententrytime > '2021-04-06'
)
I want to use NOT IN in SQL to filter out the patients that haven't received their refill medication yet. A patient can be refilled multiple times: the first attempt can fail, but the second or third can succeed, so there can be multiple rows per patient.
I just want to write a query that gets me the patientIDs that DID NOT SUCCEED in getting a refill at all, no matter how many attempts were made.
Is this the best way to write it? My current query is still running, and I think the logic is wrong.
I want to try to write this query without a CTE or temp table, just as an exercise.
Sample output:
PatientID
151761
151759
151757
151764
I personally prefer joins over NOT IN. It looks neater, reads better, and lets you access information from both tables if you need to analyse anomalies. A colleague and I once did some very basic performance comparisons and there was no notable difference.
Here's my take on it:
select distinct hpr.patientID
from [dbo].HPrecords hpr
LEFT OUTER JOIN [dbo].HPrecords hpr_val
    ON hpr.patientID = hpr_val.patientID
    AND hpr_val.ptprocess = 'embossing'
    AND hpr_val.ptDescription = 'Success'
    AND hpr_val.patiententrytime > '2021-04-06'
where hpr.ptprocess = 'refill'
  and hpr.ptDescription <> 'Success'
  --and hpr.patientID is not null -- Not necessary: in this scenario you will always have records in this table
  and hpr.patiententrytime > '2021-04-06'
  AND hpr_val.patientID IS NULL
And on the extra dot between the schema and table name: as Smor pointed out, it is not necessary (it might even break the query). It is used when you do not want to reference the schema when pointing to a database and table.
Long way: [database].[schema].[table] -- Example [MyDatabase].[dbo].[MyTable]
Short way: [database]..[table] -- Example [MyDatabase]..[MyTable]
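For completeness, a NOT EXISTS version (only a sketch, not part of the answer above) gives the same result as the LEFT JOIN / IS NULL pattern and, unlike NOT IN, is not tripped up by NULLs coming back from the subquery:

select distinct hpr.patientID
from [dbo].HPrecords hpr
where hpr.ptprocess = 'refill'
  and hpr.ptDescription <> 'Success'
  and hpr.patiententrytime > '2021-04-06'
  and not exists (
        select 1
        from [dbo].HPrecords hpr_val
        where hpr_val.patientID = hpr.patientID
          and hpr_val.ptprocess = 'embossing'
          and hpr_val.ptDescription = 'Success'
          and hpr_val.patiententrytime > '2021-04-06'
      )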

Select a random row from Oracle DB in a performant way

Using:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0
I am trying to fetch a random row. As suggested in other Stack Overflow questions, I used DBMS_RANDOM.VALUE like this:
SELECT column FROM
( SELECT column
FROM table
WHERE COLUMN_VALUE = 'Y' -- value of COLUMN_VALUE
ORDER BY dbms_random.value
)
WHERE rownum <= 1
But this query isn't performant when the number of requests increases.
So I am looking for an alternative.
SAMPLE wouldn't work for me because the sample picked up through the clause wouldn't have a dataset that matches my WHERE clause. The query looked like this -
SELECT column FROM table SAMPLE(1) WHERE COLUMN_VALUE = 'Y'
Because the SAMPLE is applied before my WHERE clause, most times this returns no data.
P.S.: I am OK with moving part of the logic to the application layer (though I am definitely not looking for answers that suggest loading everything into memory).
The performance problems consist of two aspects:
selecting the data with column_value = 'Y' and
sorting this subset to get a random record
You didn't say whether the subset of your table with column_value = 'Y' is large or small. This is important and will drive your strategy.
If there are lots of records with column_value = 'Y', use SAMPLE to limit the rows to be sorted.
You are right that this can lead to an empty result; in that case repeat the query (you may additionally add logic that increases the sample percent to avoid a lot of repeating). This boosts performance because you only sort a sample of the data:
select id from (
select id from tt SAMPLE(1) where column_value = 'Y' order by dbms_random.value )
where rownum <= 1;
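The "repeat and widen the sample" idea could be wrapped up roughly like this in PL/SQL (a sketch only, using the same assumed table tt and column id; the SAMPLE percentage must be a literal, hence the dynamic SQL):

declare
  v_id       tt.id%type;
  v_percent  number      := 1;
  v_attempts pls_integer := 0;
begin
  loop
    begin
      execute immediate
        'select id from (
           select id from tt sample (' || v_percent || ')
           where column_value = ''Y''
           order by dbms_random.value
         ) where rownum <= 1'
        into v_id;
      exit;                                     -- got a row, done
    exception
      when no_data_found then
        v_attempts := v_attempts + 1;
        exit when v_attempts > 10;              -- give up if the subset may be empty
        v_percent  := least(v_percent * 2, 99); -- widen the sample and retry
    end;
  end loop;
end;
/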
If there are only a few records with column_value = 'Y', define an index on this column (or a separate partition); this enables effective access to the records. Use the order by dbms_random.value approach. The sort will not degrade performance for a small number of rows.
select id from (
select id from tt where column_value = 'Y' order by dbms_random.value )
where rownum <= 1;
Basically, both approaches keep the set of sorted rows small. The first approach performs a table access comparable to a FULL TABLE SCAN; the second performs an INDEX ACCESS for the selected column_value.

Slow running BIDS report query...OR clauses in WHERE branch using IN sub queries

I have a report with 3 subreports and several queries to optimize in each. The first has several OR clauses in the WHERE branch, and the ORs filter through IN clauses that pull from sub-queries.
I say this mostly from reading this SO post. Specifically LBushkin's second point.
I'm not the greatest at TSQL but I know enough to think this is very inefficient. I think I need to do two things.
I know I need to add indexes to the tables involved.
I think the query can be greatly enhanced.
So it seems that my first step would be to improve the query. From there I can look at what columns and tables are involved and thus determine the indexes.
At this point I haven't posted table schemas, as I'm looking more for options/considerations, such as using a CTE to replace all the IN sub-queries.
If needed I will definitely post whatever would be helpful such as physical reads etc.
SELECT DISTINCT
auth.icm_authorizationid,
auth.icm_documentnumber
FROM
Filteredicm_servicecost AS servicecost
INNER JOIN Filteredicm_authorization AS auth ON
auth.icm_authorizationid = servicecost.icm_authorizationid
INNER JOIN Filteredicm_service AS service ON
service.icm_serviceid = servicecost.icm_serviceid
INNER JOIN Filteredicm_case AS cases ON
service.icm_caseid = cases.icm_caseid
WHERE
(cases.icm_caseid IN
(SELECT icm_caseid FROM Filteredicm_case AS CRMAF_Filteredicm_case))
OR (service.icm_serviceid IN
(SELECT icm_serviceid FROM Filteredicm_service AS CRMAF_Filteredicm_service))
OR (servicecost.icm_servicecostid IN
(SELECT icm_servicecostid FROM Filteredicm_servicecost AS CRMAF_Filteredicm_servicecost))
OR (auth.icm_authorizationid IN
(SELECT icm_authorizationid FROM Filteredicm_authorization AS CRMAF_Filteredicm_authorization))
EXISTS is usually much faster than IN as the query engine is able to optimize it better.
Try this:
WHERE EXISTS (SELECT 1 FROM Filteredicm_case WHERE icm_caseid = cases.icm_caseid)
OR EXISTS (SELECT 1 FROM Filteredicm_service WHERE icm_serviceid = service.icm_serviceid)
OR EXISTS (SELECT 1 FROM Filteredicm_servicecost WHERE icm_servicecostid = servicecost.icm_servicecostid)
OR EXISTS (SELECT 1 FROM Filteredicm_authorization WHERE icm_authorizationid = auth.icm_authorizationid)
Furthermore, an index on Filteredicm_case.icm_caseid, an index on Filteredicm_service.icm_serviceid, an index on Filteredicm_servicecost.icm_servicecostid, and an index on Filteredicm_authorization.icm_authorizationid will increase performance of this query. They look like they should be keys already, however, so I suspect that indices already exist.
However, unless I'm misreading, there's no way this WHERE clause will ever evaluate to anything other than true.
The clause you wrote says WHERE cases.icm_caseid IN (SELECT icm_caseid FROM Filteredicm_case AS CRMAF_Filteredicm_case). However, cases is an alias to Filteredicm_case. That's the equivalent of WHERE Filteredicm_case.icm_caseid IN (SELECT icm_caseid FROM Filteredicm_case AS CRMAF_Filteredicm_case). That will be true as long as Filteredicm_case.icm_caseid isn't NULL.
The same error in logic exists for the remaining portions in the WHERE clause:
(service.icm_serviceid IN (SELECT icm_serviceid FROM Filteredicm_service AS CRMAF_Filteredicm_service))
service is an alias for Filteredicm_service. This is always true as long as icm_serviceid is not null
(servicecost.icm_servicecostid IN (SELECT icm_servicecostid FROM Filteredicm_servicecost AS CRMAF_Filteredicm_servicecost))
servicecost is an alias for Filteredicm_servicecost. This is always true as long as icm_servicecostid is not null.
(auth.icm_authorizationid IN (SELECT icm_authorizationid FROM Filteredicm_authorization AS CRMAF_Filteredicm_authorization))
auth is an alias for Filteredicm_authorization. This is always true as long as icm_authorizationid is not null.
I don't understand what you're trying to accomplish.

The fastest way to check if some records exist in a database table?

I have a huge table to work with. I want to check if there are any records whose parent_id equals the value I pass in.
Currently I implement this by using "select count(*) from mytable where parent_id = :id"; if the result is > 0, they exist.
Because this is a very large table, and I don't care about the exact number of records but only whether any exist, I think count(*) is a bit inefficient.
How do I implement this requirement in the fastest way? I am using Oracle 10.
According to Hibernate Tips & Tricks (https://www.hibernate.org/118.html#A2)
it suggests writing it like this:
Integer count = (Integer) session.createQuery("select count(*) from ....").uniqueResult();
I don't know what the magic of uniqueResult() is here. Why does it make this fast?
Compared to "select 1 from mytable where parent_id = passingId and rownum < 2", which is more efficient?
An EXISTS query is the one to go for if you're not interested in the number of records:
select 'Y' from dual where exists (select 1 from mytable where parent_id = :id)
This will return 'Y' if a record exists and nothing otherwise.
[In terms of your question on Hibernate's "uniqueResult" - all this does is return a single object when there is only one object to return - instead of a set containing 1 object. If multiple results are returned the method throws an exception.]
There's no real difference between:
select 'y'
from dual
where exists (select 1
from child_table
where parent_key = :somevalue)
and
select 'y'
from mytable
where parent_key = :somevalue
and rownum = 1;
... at least in Oracle 10gR2 and up. Oracle's smart enough in that release to do a FAST DUAL operation where it zeroes out any real activity against it. The second query would be easier to port if that's ever a consideration.
The real performance differentiator is whether or not the parent_key column is indexed. If it's not, then you should run something like:
select 'y'
from dual
where exists (select 1
from parent_table
where parent_key = :somevalue)
select count(*) should be lightning fast if you have an index, and if you don't, allowing the database to abort after the first match won't help much.
But since you asked:
boolean exists = session.createQuery("select parent_id from Entity where parent_id=?")
.setParameter(...)
.setMaxResults(1)
.uniqueResult()
!= null;
(Some syntax errors to be expected, since I don't have a hibernate to test against on this computer)
For Oracle, maxResults is translated into rownum by hibernate.
As for what uniqueResult() does, read its JavaDoc! Using uniqueResult instead of list() has no performance impact; if I recall correctly, the implementation of uniqueResult delegates to list().
First of all, you need an index on mytable.parent_id.
That should make your query fast enough, even for big tables (unless there are also a lot of rows with the same parent_id).
If not, you could write
select 1 from mytable where parent_id = :id and rownum < 2
which would return a single row containing 1, or no row at all. It does not need to count the rows, just find one and then quit. But this is Oracle-specific SQL (because of rownum), so you should prefer not to use it.
For DB2 there is something like select * from mytable where parent_id = ? fetch first 1 row only. I assume that something similar exists for Oracle.
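(For what it's worth, Oracle gained an equivalent row-limiting clause in 12c; on the asker's Oracle 10 you still need the rownum form. A sketch:)

-- Oracle 12c and later only; not available on Oracle 10
select 1
from mytable
where parent_id = :id
fetch first 1 row only;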
This query will return 1 if any matching record exists and 0 otherwise:
SELECT COUNT(1) FROM (SELECT 1 FROM mytable WHERE parent_id = :id AND ROWNUM < 2);
It can help when you need a quick existence check regardless of table size, without performance problems.

Efficient checking of possible duplicate entities

I have a requirement to produce a list of possible duplicates before a user saves an entity to the database and warn them of the possible duplicates.
There are 7 criteria on which we should check for duplicates, and if at least 3 match we should flag this up to the user.
The criteria will all match on ID, so there is no fuzzy string matching needed, but my problem comes from the fact that there are many possible ways (99 ways, if I've done my sums correctly) for at least 3 items to match from the list of 7 possibles.
I don't want to have to do 99 separate db queries to find my search results and nor do I want to bring the whole lot back from the db and filter on the client side. We're probably only talking of a few tens of thousands of records at present, but this will grow into the millions as the system matures.
Anyone got any thoughts on a nice efficient way to do this?
I was considering a simple OR query to get the records where at least one field matches from the db and then doing some processing on the client to filter it some more, but a few of the fields have very low cardinality and won't actually reduce the numbers by a huge amount.
Thanks
Jon
OR and CASE summing will work but are quite inefficient, since they don't use indexes.
You need to use UNION for the indexes to be usable.
If a user enters name, phone, email and address into the database, and you want to check all records that match at least 3 of these fields, you issue:
SELECT i.*
FROM (
    SELECT id, COUNT(*)
    FROM (
        SELECT id
        FROM t_info t
        WHERE name = 'Eve Chianese'
        UNION ALL
        SELECT id
        FROM t_info t
        WHERE phone = '+15558000042'
        UNION ALL
        SELECT id
        FROM t_info t
        WHERE email = '42@example.com'
        UNION ALL
        SELECT id
        FROM t_info t
        WHERE address = '42 North Lane'
    ) q
    GROUP BY id
    HAVING COUNT(*) >= 3
) dq
JOIN t_info i ON i.id = dq.id
This will use indexes on these fields and the query will be fast.
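For the UNION approach to give index seeks, each of the four columns needs its own index; something along these lines (a sketch using the sample table and column names above):

CREATE INDEX ix_t_info_name    ON t_info (name);
CREATE INDEX ix_t_info_phone   ON t_info (phone);
CREATE INDEX ix_t_info_email   ON t_info (email);
CREATE INDEX ix_t_info_address ON t_info (address);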
See this article in my blog for details:
Matching 3 of 4: how to match a record which matches at least 3 of 4 possible conditions
Also see this question the article is based upon.
If you want to have a list of DISTINCT values in the existing data, you just wrap this query into a subquery:
SELECT i1.*
FROM t_info i1
WHERE EXISTS
(
    SELECT 1
    FROM (
        SELECT id
        FROM t_info t
        WHERE name = i1.name
        UNION ALL
        SELECT id
        FROM t_info t
        WHERE phone = i1.phone
        UNION ALL
        SELECT id
        FROM t_info t
        WHERE email = i1.email
        UNION ALL
        SELECT id
        FROM t_info t
        WHERE address = i1.address
    ) q
    GROUP BY id
    HAVING COUNT(*) >= 3
)
Note that this DISTINCT is not transitive: if A matches B and B matches C, this does not mean that A matches C.
You might want something like the following:
SELECT id
FROM
(SELECT id, CASE fld1 WHEN input1 THEN 1 ELSE 0 END AS rule1,
            CASE fld2 WHEN input2 THEN 1 ELSE 0 END AS rule2,
            ...,
            CASE fld7 WHEN input7 THEN 1 ELSE 0 END AS rule7
 FROM table) t
WHERE rule1 + rule2 + rule3 + ... + rule7 >= 3
This isn't tested, but it shows a way to tackle this.
What DBMS are you using? Some support such checks through server-side code.
Have you considered using a stored procedure with a cursor? You could then do your OR query and then step through the records one-by-one looking for matches. Using a stored procedure would allow you to do all the checking on the server.
However, I think a table scan over millions of records is always going to be slow. You should work out which of the 7 fields are most likely to match and make sure these are indexed.
I'm assuming your system is trying to match tag IDs of a certain post, or something similar. This is a many-to-many relationship and you should have three tables to handle it: one for the posts, one for the tags, and one for the post-tag relationship.
If my assumptions are correct then the best way to handle this is:
SELECT postid, count(tagid) as common_tag_count
FROM posts_to_tags
WHERE tagid IN (tag1, tag2, tag3, ...)
GROUP BY postid
HAVING count(tagid) >= 3;
