WHERE inside CASE statement - sql-server

I have seen lots of posts about putting CASE statements inside WHERE clauses. However, in my case, I want to put a WHERE statement inside a CASE statement, which I don't think can be done.
My situation is that I have a view that is going on many, many databases. In half of them, I need to have a WHERE clause. In the other half, I don't need the WHERE clause. Leaving it in isn't harmful, in that I don't get bad data, but it slows the query down considerably given that it has to read and sort a large table when it is unnecessary.
Status_Table
ID Status
1 Active
2 Inactive
3 Unknown --this row of data will not exist in some DBs
Item_table
Item_ID Status value1 value2
1001 3 ..... .....
1002 1 ..... .....
What I want to do is something like this.
-- big nasty ugly query with various CTEs, selects, and joins
CASE WHEN (SELECT MAX(ID) FROM Status_table) = 3
THEN WHERE Item_Table.Status != 3
ELSE WHERE 1=1
END
Ultimately, I can probably swing this by constructing the query using dynamic SQL, but I was hoping for a more elegant solution.

You're trying to build a conditional WHERE clause, yes? I'd think something like this would work pretty well for what you want to do:
WHERE (EXISTS (SELECT 1 FROM Status_table WHERE ID = 3)
AND Item_Table.Status != 3)
OR
NOT EXISTS (SELECT 1 FROM Status_table WHERE ID = 3)

SQL Fiddle
MS SQL Server 2014 Schema Setup:
CREATE TABLE Status_Table ( ID int, [Status] varchar(30) ) ;
INSERT INTO Status_Table ( ID, [Status] )
VALUES
( 1,'Active' ), (2,'Inactive'), (3,'Unknown')
;
CREATE TABLE Item_table ( Item_ID int, [Status] int, value1 varchar(10), value2 varchar(10) ) ;
/* NOTE: Item_table.[Status] should be a descriptive name, like [StatusID], or something not the same name as a different datatype column in the relation table. */
INSERT INTO Item_Table ( Item_ID, [Status], value1, value2 )
VALUES (1001,3,'Bill','Exclude')
,(1002,1,'Ted','Include')
,(1003,3,'Rufus','Exclude')
,(1004,2,'Jay','Include')
,(1005,5,'Bob','BadRecord')
;
Query 1:
SELECT s1.Item_ID, s1.Status, s1.Value1, s1.Value2
FROM (
SELECT i.Item_ID, i.Status, i.Value1, i.Value2
, RANK() OVER (ORDER BY s.ID DESC) AS rn
FROM Item_Table i
RIGHT OUTER JOIN Status_Table s ON i.[Status] = s.ID
) s1
WHERE s1.rn <> 1
Results:
| Item_ID | Status | Value1 | Value2 |
|---------|--------|--------|---------|
| 1004 | 2 | Jay | Include |
| 1002 | 1 | Ted | Include |

If its performance your after start using DECLARE and SET.
It only needs to execute ONCE!!!
DECLARE #UnkownExists AS INT
SET #UnkownExists = (select count(*) from Status_table where ID = 3)
-- big nasty ugly query with various CTEs, selects, and joins
WHERE (#UnkownExists=1 AND Item_Table.Status != 3) OR #NoUnknown=0

You can try this:
CASE WHEN (SELECT MAX(ID) FROM Status_table) = 3
THEN 3
ELSE (SELECT MAX(ID)+1 FROM Status_table) --Invalid value that will not exist
END <> Item_Table.Status

Related

Part Id 3900 take wrong technology id as 7 and it must Be 2 because Feature Name and Value Exist?

I work on sql server 2017 I have table #partsfeature already exist as below
create table #partsfeature
(
PartId int,
FeatureName varchar(300),
FeatureValue varchar(300),
TechnologyId int
)
insert into #partsfeature(PartId,FeatureName,FeatureValue,TechnologyId)
values
(1211,'AC','5V',1),
(2421,'grail','51V',2),
(6211,'compress','33v',3)
my issue Done For Part id 3900 it take wrong
Technology Id 7 and Correct Must be 2
Because Feature name and Feature Value Exist
So it Must Take Same TechnologyId Exist
on Table #partsfeature as Technology Id 2 .
correct will be as Below
+--------+--------------+---------------+-------------
| PartID | FeatureName | FeatureValue | TechnologyId
+--------+--------------+---------------+-------------
| 3900 | grail | 51V | 2
+--------+--------------+---------------+-------
what I try is
insert into #partsfeature(PartId,FeatureName,FeatureValue,TechnologyId)
select PartId,FeatureName,FeatureValue,
TechnologyId = dense_rank() over (order by FeatureName,FeatureValue)
+ (select max(TechnologyId) from #partsfeature)
from
(
values
(3900,'grail','51V',NULL),
(5442,'compress','30v',NULL),
(7791,'AC','59V',NULL),
(8321,'Angit','50V',NULL)
) s (PartId,FeatureName,FeatureValue,TechnologyId)
Expected Result For Parts Inserted
Use NOT EXISTS() to check for existance of FeatureName and FeatureValue.
Sub query to get existing maximum TechnologyId from table and row_number() to generate a running sequence
insert into #partsfeature(PartId,FeatureName,FeatureValue,TechnologyId)
select PartId,FeatureName,FeatureValue,
TechnologyId = row_number() over (order by PartId)
+ (select max(TechnologyId) from #partsfeature)
from
(
values
(3900,'grail','51V',NULL),
(5442,'compress','30v',NULL),
(7791,'AC','59V',NULL),
(8321,'Angit','50V',NULL)
) s (PartId,FeatureName,FeatureValue,TechnologyId)
where not exists
(
select *
from #partsfeature x
where x.FeatureName = s.FeatureName
and x.FeatureValue = s.FeatureValue
)

what to use instead of union to join same results based on two where clauses

I have two queries that work as expected for example
Query 1
select Name,ID,Product,Question
from table 1
where Id= 9 and ProductID=30628
table output
Name | ID | Product | QUestion
0659e103-b33d-4603 |12356|Apple | is it picked up?
0659e103-b33d-4603 |12456|Apple |Available in store?
0659e103-b33d-4603 |12458|Apple |confirm order?
query 2
select Name,ID,Product,Question
from table 1
where Id= 9 and TypeID=2
table output
Name | ID | Product | QUestion
0659e103-b33d-4603 |12347|Apple | Problem at store?
as you can see in query 1 i use a ProductID and in query 2 i use a TypeID these two values gives me different out puts
so i used a union to join both as follows
select Name,ID,Product,Question
from table 1
where Id= 9 and ProductID=30628
union
select Name,ID,Product,Question
from table 1
where Id= 9 and TypeID=2
which i get the desired output
Name | ID | Product | QUestion
0659e103-b33d-4603 |12356|Apple | is it picked up?
0659e103-b33d-4603 |12456|Apple |Available in store?
0659e103-b33d-4603 |12458|Apple |confirm order?
0659e103-b33d-4603 |12347|Apple | Problem at store?
is their a better way to do this because my query will grow and i would not like to repeat the same thing over again. is their a better way to optimize the query?
NOte i can not use ProductID and TypeID on the same line because they do not result in accurate results
You could use OR since you are querying the same table.
SELECT Name
,ID
,Product
,Question
FROM TABLE1
WHERE (
Id = 9
AND ProductID = 30628
)
OR (
Id = 9
AND TypeID = 2
)
If you have a growing number of OR conditions you could use a temp table/variable and inner join to profit from a set based operation.
The inner join will only return matching rows.
CREATE TABLE #SomeTable(Id INT NOT NULL, ProductID INT NULL, TypeID INT NULL)
-- Insert all conditions you want to match.
INSERT INTO #SomeTable(Id, ProductID, TypeId)
VALUES (9, 30628, NULL)
, (9, NULL, 2)
SELECT Name
,ID
,Product
,Question
FROM TABLE1 x
INNER JOIN #SomeTable y ON
x.ID = y.ID -- Since ID is Not null in the temp table
AND (y.ProductID IS NULL OR y.ProductID = x.ProductID)
AND (y.TypeID IS NULL OR y.TypeID = x.TypeID)
You can use cas-when clause with a self join.
Case-when something like this:
SELECT t1_2.Name,
t1_2.ID,
t1_2.Product,
t1_2.Question,
(CASE WHEN (t1.Id= 9 and t1.ProductID=30628) THEN ID
WHEN (t1.Id= 9 and t1.TypeID=2) THEN ID
ELSE NULL) AS IDcalc
FROM table_1 t1 LEFT JOIN table_1 t1_2
ON t1.ID = t1_2.ID
WHERE (CASE WHEN (t1.Id= 9 and t1.ProductID=30628) THEN ID
WHEN (t1.Id= 9 and t1.TypeID=2) THEN ID
ELSE NULL) IS NOT NULL
You can use any table in the join.
In comparison of query performance the OR is much better until you have only one table, if you have more tables, then you should use temp table or case-when in your query.

Simplify multiple joins

I have a Claims table with 70 columns, 16 of which contain diagnosis codes. The codes mean nothing, so I need to pull the descriptions for each code located in a separate table.
There has to be a simpler way of pulling these code descriptions:
-- This is the claims table
FROM
[database].[schema].[claimtable] AS claim
-- [StagingDB].[schema].[Diagnosis] table where the codes located
-- [ICD10_CODE] column contains the code
LEFT JOIN
[StagingDB].[schema].[Diagnosis] AS diag1 ON claim.[ICDDiag1] = diag1.[ICD10_CODE]
LEFT JOIN
[StagingDB].[schema].[Diagnosis] AS diag2 ON claim.[ICDDiag2] = diag2.[ICD10_CODE]
LEFT JOIN
[StagingDB].[schema].[Diagnosis] AS diag3 ON claim.[ICDDiag3] = diag3.[ICD10_CODE]
-- and so on, up to ....
LEFT JOIN
[StagingDB].[schema].[Diagnosis]AS diag16 ON claim.[ICDDiag16] = diag16.[ICD10_CODE]
-- reported column will be [code_desc]
-- ie:
-- diag1.[code_desc] AS Diagnosis1
-- diag2.[code_desc] AS Diagnosis2
-- diag3.[code_desc] AS Diagnosis3
-- diag4.[code_desc] AS Diagnosis4
-- etc.
I think what you are doing is already correct in given scenario.
Another way can be from programming point of view or you can give try and compare ther performace.
i) Pivot Claim table on those 16 description columns.
ii) Join the Pivoted column with [StagingDB].[schema].[Diagnosis]
Another way can be to put [StagingDB].[schema].[Diagnosis] table in some #temp table
instead of touching large Staging 16 times.
But for data analysis has to be done to decide if there is any way.
You can go for UNPIVOT of the claimTable and then join with Diagnosis table.
TEST SETUP
create table #claimTable(ClaimId INT, Diag1 VARCHAR(10), Diag2 VARCHAR(10))
CREATE table #Diagnosis(code VARCHAR(10), code_Desc VARCHAR(255))
INSERT INTO #ClaimTable
VALUES (1, 'Fever','Cold'), (2, 'Headache','toothache');
INSERT INTO #Diagnosis
VALUEs ('Fever','Fever Desc'), ('cold','cold desc'),('headache','headache desc'),('toothache','toothache desc');
Query to Run
;WITH CTE_Claims AS
(SELECT ClaimId,DiagnosisNumeral, code
FROM #claimTable
UNPIVOT
(
code FOR DiagnosisNumeral in ([Diag1],[Diag2])
) as t
)
SELECT c.ClaimId, c.code, d.code_Desc
FROM CTE_Claims AS c
INNER JOIN #Diagnosis as d
on c.code = d.code
ResultSet
+---------+-----------+----------------+
| ClaimId | code | code_Desc |
+---------+-----------+----------------+
| 1 | Fever | Fever Desc |
| 1 | Cold | cold desc |
| 2 | Headache | headache desc |
| 2 | toothache | toothache desc |
+---------+-----------+----------------+

SQL Select Records ONLY When a Column Value Is In More Than Once

I have a stored procedure in SQL Server, I am trying to select only the records where a column's value is in there more than once, This may seem a bit of an odd request but I can't seem to figure it out, I have tried using HAVING clauses but had no luck..
I want to be able to only select records that have the ACCOUNT in there more than once, So for example:
ACCOUNT | PAYDATE
-------------------
B066 | 15
B066 | OUTSTAND
B027 | OUTSTAND <--- **SHOULD NOT BE IN THE SELECT**
B039 | 09
B039 | OUTSTAND
B052 | 09
B052 | 15
B052 | OUTSTAND
BO27 should NOT show in my select, and the rest of the ACCOUNTS should.
here is my start and end of the Stored Procedure:
Select * from (
*** SELECTS ARE HERE ***
) X where O_STAND <> 0.0000
group by X.ACCOUNT, X.ACCT_NAME , X.DAYS_CR, X.PAYDATE, X.O_STAND
order by X.ACCOUNT
I have been struggling with this for a while, any help or advice would be appreciated. Thank you in advance.
you could replace the first string with
Select *, COUNT(*) OVER (PARTITION BY ACCOUNT) cnt FROM (
and then wrap your query as subquery once more
SELECT cols FROM ( query ) q WHERE cnt>1
Yes, the having clause is for solving exactly this kind of tasks. Basically, it's like where, but allows to filter not only by column values, but also by aggregate functions' results:
declare #t table (
Id int identity(1,1) primary key,
AccountId varchar(20)
);
insert into #t (AccountId)
values
('B001'),
('B002'),
('B015'),
('B015'),
('B002');
-- Get all rows for which AccountId value is encountered more than once in the table
select *
from #t t
where exists (
select 0
from #t h
where h.AccountId = t.AccountId
group by h.AccountId
having count(h.AccountId) > 1
);

More efficient alternative to subquery

I've got many "actual" and "history companion tables" (so to speak) with structure of last like this:
values| date_deal | type_deal | num (autoinc)
value1| 01.01.2012 | i | 1
value1| 02.01.2012 | u | 2
value2| 02.01.2012 | i | 3
value2| 03.01.2012 | u | 4
value1| 04.01.2012 | d | 5
value2| 05.01.2012 | u | 6
value2| 08.01.2012 | u | 7
If I insert (or update or delete) record in "actual" table, trigger puts affected record into "history table" with date_deal = Geddate(), type_deal = i|u|d (for insert, update and delete triggers respectivly) and num as autoinc unique value
So the question is how to get last record for each distinct value valid on certain date and excluding from final result records which type_deal = 'd' (since that record was deleted from actual table by that time and we don't want to have anything assosiated with it)
The way I do it most of the time:
SELECT *
FROM t_table1 t1
WHERE t1.num = ( SELECT MAX(num)
FROM t_table1 t2
WHERE t2.[values] = t1.[values]
AND t2.[date_deal] < #dt)
AND t1.[type_deal] <> 'D'
But that works very slow sometimes. I'm looking for more efficient alternative. Please, help
So, an update.
Thanks for replies, friends.
I've made some testing on both actual and testing servers.
In order to put these different approaches into same league I've decided that we should take all fields from source table.
Testing server has bellow 200K records and I also had a luxury of using DBCC FreeProcCache
and DBCC DropCleanbuffers directives. Actual working server has over 2.3M records and also no option for droping buffs or cache since.. well.. it is in use by real users. So it was droped only once and i've got results right after that.
Here is actual queries and time it took on both servers:
Original:
DECLARE #dt datetime = CONVERT(datetime, '01.08.2013', 104)
SELECT *
FROM [CLIENTS_HISTORY].[dbo].[Clients_all_h] c
WHERE c.num = ( SELECT MAX(num)
FROM [CLIENTS_HISTORY].[dbo].[Clients_all_h] c2
WHERE c2.[AccountSys] = c.[AccountSys]
AND date_deal <= #dt)
AND c.type_deal <> 'D'
61sec # 2'316'890rec on real one, 4sec # 191'533 on test
Rahul's:
SELECT *
FROM [CLIENTS_HISTORY].[dbo].[Clients_all_h] c
GROUP BY [all_fields]
HAVING c.num = ( SELECT MAX(num)
FROM [CLIENTS_HISTORY].[dbo].[Clients_all_h] c2
WHERE c2.[AccountSys] = c.[AccountSys]
AND date_deal <= #dt)
AND c.type_deal <> 'D'
62sec # 2'316'890rec on real one, 4sec # 191'533 on test
Almost equal
George's (with some major changes):
SELECT * FROM
(
SELECT *,
ROW_NUMBER() OVER(PARTITION BY accountsys ORDER BY num desc) AS aa
FROM [CLIENTS_HISTORY].[dbo].[Clients_all_h] c
WHERE c.date_deal < #dt) as a
WHERE aa=1
AND type_deal <> 'D'
76sec # 2'316'890rec on real one, 5sec # 191'533 on test
So far original and Rahul's are fastest and George's is not so fast.
Try using GROUP BY..HAVING CLAUSE
SELECT *
FROM t_table1 t1
GROUP BY [column_names]
HAVING t1.num = ( SELECT MAX(num)
FROM t_table1 t2
WHERE t2.[values] = t1.[values]
AND t2.[date_deal] < #dt)
AND t1.[type_deal] <> 'D'
I think row_num() could be usefull for you as follows:
select
*
from
(
select
*,
row_number() over( partition by date_deal order by num) as aa
from
t_table1 t1
where
t1.[type_deal] <> 'D'
) as a
where
aa=1

Resources