SQL Server data grouping - sql-server

Good evening folks,
This is probably simple for a lot of folks out there, but I have been struggling with this problem for a couple of days now.
I have a table, let's call it dbo.RawDump. It has the columns STKNBR and SaleTypeID.
Sample Data:
STKNBR SaleTypeID
1010186732 2
1010186732 1
1010188780 2
1010190707 1
1010190707 2
1010190350 2
1010190446 2
1010190647 2
What I am trying to figure out is how to pick out only the STKNBRs whose only SaleTypeID is 2. I don't want to pick up the ones that have both a SaleTypeID of 1 and a SaleTypeID of 2. The result should give me only those STKNBRs that have a SaleTypeID of 2 and nothing else.
What I have tried so far:
SELECT STKNBR, SaleTypeID FROM dbo.RawDump lm
WHERE lm.SaleTypeID = 2 AND lm.SaleTypeID <> 1
I understand that this is probably a silly question, but any help is appreciated to overcome this.
Thanks for reading!
RV

It's fairly simple:
SELECT
STKNBR
FROM dbo.RawDump
GROUP BY STKNBR
HAVING MIN(SaleTypeId) = 2 AND MAX(SaleTypeId) = 2

If rows with other values such as SaleTypeId = 3 can exist and should not disqualify a STKNBR, you may want this instead:
SELECT
rd.STKNBR
FROM dbo.RawDump rd
GROUP BY
rd.STKNBR
HAVING COUNT(CASE WHEN rd.SaleTypeId = 1 THEN 1 END) = 0
AND COUNT(CASE WHEN rd.SaleTypeId = 2 THEN 1 END) > 0;
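To see how the two HAVING variants differ, here is a minimal sketch that runs both against the sample data. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made only so the example is runnable), and the stock number 1010199999 with SaleTypeID 2 and 3 is a hypothetical row added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE RawDump (STKNBR INTEGER, SaleTypeID INTEGER)")
conn.executemany(
    "INSERT INTO RawDump VALUES (?, ?)",
    [(1010186732, 2), (1010186732, 1), (1010188780, 2), (1010190707, 1),
     (1010190707, 2), (1010190350, 2), (1010190446, 2), (1010190647, 2)],
)

# Variant 1: every row for the STKNBR must have SaleTypeID = 2.
ONLY_TWO = """
    SELECT STKNBR FROM RawDump
    GROUP BY STKNBR
    HAVING MIN(SaleTypeID) = 2 AND MAX(SaleTypeID) = 2
    ORDER BY STKNBR
"""
# Variant 2: no row with SaleTypeID = 1 and at least one row with SaleTypeID = 2.
NO_ONE_SOME_TWO = """
    SELECT STKNBR FROM RawDump
    GROUP BY STKNBR
    HAVING COUNT(CASE WHEN SaleTypeID = 1 THEN 1 END) = 0
       AND COUNT(CASE WHEN SaleTypeID = 2 THEN 1 END) > 0
    ORDER BY STKNBR
"""

def run(sql):
    return [row[0] for row in conn.execute(sql)]

before_v1, before_v2 = run(ONLY_TWO), run(NO_ONE_SOME_TWO)

# Hypothetical extra stock number that has types 2 and 3 (not in the original data).
conn.executemany("INSERT INTO RawDump VALUES (?, ?)", [(1010199999, 2), (1010199999, 3)])
after_v1, after_v2 = run(ONLY_TWO), run(NO_ONE_SOME_TWO)

print(before_v1 == before_v2)   # both variants agree on the original sample
print(1010199999 in after_v1)   # variant 1 rejects the 2-and-3 stock number
print(1010199999 in after_v2)   # variant 2 accepts it
```

On the original sample both variants agree; they only diverge once a SaleTypeID other than 1 or 2 shows up.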

Related

SQL: Trying to find if gift was given year following first gift year

I'm using SSMS version 18.9.2 and I'm trying to get a list of IDs who gave a gift in the year following the year of their FIRST gift. For example, if one person's first gift was in 2019 and they gave another gift in 2020, the row count is 1; if the next person's first gift was also in 2019 but they did NOT give a gift in 2020, the row count stays at 1 even though we have reviewed two people. Hope that makes sense.
Using the sample data below, I would expect a row count of 1, returning only ID 2:
ID Date
1 3/8/1981
1 2/11/1988
1 2/15/1995
2 2/22/1982
2 2/24/1983
2 3/15/1983
2 2/17/1984
3 2/16/1984
3 3/13/1984
3 6/13/1986
4 2/2/1983
4 3/11/1985
4 3/21/1986
This is the closest I've gotten. Notice the two different HAVING clauses: the first works but the second fails, and the second is the one I need to work:
SELECT DISTINCT
gifts1.giftid,
YEAR(gifts2.gifteffdat) AS 'MINYR',
YEAR(MIN(gifts1.gifteffdat)) AS 'MINYR+1'
FROM
gifts AS gifts1
INNER JOIN
gifts AS gifts2 ON gifts1.giftid = gifts2.giftid
AND DATEDIFF(year, gifts2.gifteffdat, gifts1.gifteffdat) = 1
GROUP BY
gifts1.giftid, gifts1.gifteffdat, gifts2.gifteffdat
-- THIS HAVING WORKS
HAVING
(YEAR(gifts2.gifteffdat) = 1982)
-- THIS HAVING DOESNT WORK
-- HAVING YEAR(gifts2.gifteffdat) = YEAR(MIN(gifts1.gifteffdat)) +1
I appreciate any help! Thank you!
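No answer was recorded for this question, so the following is only one possible approach, not the thread's solution: find each ID's first gift year with MIN, then keep the IDs that have any gift in that year plus one. The sketch uses Python's sqlite3 as a stand-in for SQL Server; the column names giftid and gifteffdat come from the question, the dates are rewritten in ISO format for SQLite, and strftime('%Y', ...) plays the role that YEAR() would play in T-SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gifts (giftid INTEGER, gifteffdat TEXT)")
conn.executemany(
    "INSERT INTO gifts VALUES (?, ?)",
    [(1, "1981-03-08"), (1, "1988-02-11"), (1, "1995-02-15"),
     (2, "1982-02-22"), (2, "1983-02-24"), (2, "1983-03-15"), (2, "1984-02-17"),
     (3, "1984-02-16"), (3, "1984-03-13"), (3, "1986-06-13"),
     (4, "1983-02-02"), (4, "1985-03-11"), (4, "1986-03-21")],
)

sql = """
    WITH first_gift AS (
        -- first gift year per ID
        SELECT giftid,
               MIN(CAST(strftime('%Y', gifteffdat) AS INTEGER)) AS first_year
        FROM gifts
        GROUP BY giftid
    )
    SELECT f.giftid
    FROM first_gift AS f
    WHERE EXISTS (
        -- any gift by the same ID in the year right after the first gift year
        SELECT 1
        FROM gifts AS g
        WHERE g.giftid = f.giftid
          AND CAST(strftime('%Y', g.gifteffdat) AS INTEGER) = f.first_year + 1
    )
    ORDER BY f.giftid
"""
ids_with_follow_up = [row[0] for row in conn.execute(sql)]
print(ids_with_follow_up)
```

Using EXISTS means an ID with several gifts in the following year (like ID 2 in 1983) is still returned once.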

Performing COUNT() on a computed Column from a VIEW

All I want is a view that shows how many kids aged 5 to 18 (inclusive) are in each family. I AM USING SQL SERVER.
The view I have written to get the family members' ages is:
CREATE VIEW VActiveMembers
AS
SELECT
TM.intMemberID AS intMemberID,
TM.strFirstName AS strFirstName,
TM.strLastName AS strLastName,
TM.strEmailAddress AS strEmailAddress,
TM.dtmDateOfBirth AS dtmDateOfBirth,
FLOOR(DATEDIFF(DAY, dtmDateOfBirth, GETDATE()) / 365.25) AS intMemberAge
FROM
TMembers AS TM
WHERE
TM.intStatusFlagID = 1
intStatusFlagID = 1 is just a flag that means the member is active.
I have spent about three hours trying to figure this out without success. Here is an attempt where, instead of trying to get the solution in one fell swoop, I went step by step, but I still didn't get the result I wanted.
As you can see, I didn't use the age already calculated in the view, because doing so raised the error "The multi-part identifier could not be bound". I have seen that error before, but I couldn't get it to go away in this case. Ideally the count would be performed on the VIEW instead of recalculating the ages all over again.
CREATE VIEW VActiveFamilyMembersK12Count
AS
SELECT
TF.intParishFamilyID,
COUNT(DATEDIFF(DAY, dtmDateOfBirth, GETDATE()) / 365) AS intMemberAgeCount
FROM
TFamilies AS TF
INNER JOIN
TFamilyMembers AS TFM
INNER JOIN
VActiveMembers AS vAM ON (TFM.intMemberID = vAM.intMemberID)
ON (TFM.intParishFamilyID = TF.intParishFamilyID)
WHERE
TF.intStatusFlagID = 1
GROUP BY
TF.intParishFamilyID
I wanted to just get a count using the age calculation to see if I could get a correct count of members in a family; then I could build on that to get a count of members of a certain age. The result I get back is 2, but each family is guaranteed to have 3 members.
The result I am looking for is this:
Family_ID | K12Count
-----------------------------
1001 | 2
1002 | 0
1003 | 1
1004 | 0
Here is a list of resources I looked up trying to figure this out, maybe one of them is in fact the answer and I just don't see it, but I am at a loss at the moment.
SQL Select Count from below a certain age
How to get count of people based on age groups using SQL query in Oracle database?
Count number of user in a certain age's range base on date of birth
Conditional Count on a field
http://timmurphy.org/2010/10/10/conditional-count-in-sql/
*** EDIT ***
CREATE VIEW VActiveFamilyMembersK12Count
AS
SELECT
TF.intParishFamilyID,
SUM(CASE WHEN intMemberAge >= 5 AND intMemberAge <= 18 THEN 1 ELSE 0 END) AS intK12Count
FROM
TFamilies AS TF
INNER JOIN TFamilyMembers AS TFM
INNER JOIN VActiveMembers AS vAM
ON (TFM.intMemberID = vAM.intMemberID)
ON (TFM.intParishFamilyID = TF.intParishFamilyID)
WHERE
TF.intStatusFlagID = 1
GROUP BY
TF.intParishFamilyID
GO
THIS IS THE SOLUTION ABOVE.
Conditional count is the way to go.
Something like:
SELECT TF.intParishFamilyID,
-- COUNT skips only NULLs, so the CASE must not have ELSE 0 (or use SUM with ELSE 0)
COUNT(CASE WHEN vAM.intMemberAge >= 5 AND vAM.intMemberAge <= 18 THEN 1 END) AS intK12Count
FROM
TFamilies AS TF
INNER JOIN TFamilyMembers AS TFM
INNER JOIN VActiveMembers AS vAM
ON (TFM.intMemberID = vAM.intMemberID)
ON (TFM.intParishFamilyID = TF.intParishFamilyID)
WHERE
TF.intStatusFlagID = 1
GROUP BY
TF.intParishFamilyID
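One detail of conditional counting is easy to get wrong: COUNT counts every non-NULL value, so a CASE with ELSE 0 inside COUNT counts every row, while SUM with ELSE 0 (or COUNT with no ELSE) counts only the matching rows. A minimal sketch of the difference, using Python's sqlite3 as a stand-in for SQL Server; the Members table and its ages are hypothetical, chosen so each family has three members.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened table: one row per family member with a precomputed age.
conn.execute("CREATE TABLE Members (family_id INTEGER, age INTEGER)")
conn.executemany(
    "INSERT INTO Members VALUES (?, ?)",
    [(1001, 7), (1001, 12), (1001, 40),
     (1002, 2), (1002, 35), (1002, 38),
     (1003, 16), (1003, 45), (1003, 50)],
)

sql = """
    SELECT family_id,
           COUNT(CASE WHEN age BETWEEN 5 AND 18 THEN 1 ELSE 0 END) AS count_else_0,
           COUNT(CASE WHEN age BETWEEN 5 AND 18 THEN 1 END)        AS count_no_else,
           SUM(CASE WHEN age BETWEEN 5 AND 18 THEN 1 ELSE 0 END)   AS sum_else_0
    FROM Members
    GROUP BY family_id
    ORDER BY family_id
"""
results = conn.execute(sql).fetchall()
for family_id, count_else_0, count_no_else, sum_else_0 in results:
    # count_else_0 is always the family size; the other two are the K-12 count.
    print(family_id, count_else_0, count_no_else, sum_else_0)
```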

Advice on how best to manage this dataset?

New to SAS and would appreciate advice and help on how best to handle this data management situation.
I have a dataset in which each observation represents a client. Each client has a "description" variable which could include either a comprehensive assessment, treatment or discharge. I have created 3 new variables to flag each observation if they contain one of these.
So for example:
treatment_yes = 1 if description contains "tx", "treatment"
dc_yes = 1 if description contains "dc", "d/c" or "discharge"
ca_yes = 1 if description contains "comprehensive assessment" or "ca" or "comprehensive ax"
My end goal is to have a new dataset of clients that have gone through a Comprehensive Assessment, Treatment and Discharge.
I'm a little stumped as to what my next move should be here. I have all my variables flagged for clients. But there could be duplicate observations just because a client could have come in many times. So for example:
Client_id treatment_yes ca_yes dc_yes
1234 0 1 1
1234 1 0 0
1234 1 0 1
All I really care about is whether, for a particular client, the variables treatment_yes, ca_yes and dc_yes are each flagged at least once (i.e., each has at least one "1"; having more than one "1" is fine).
I was thinking my next step might be to collapse the data (how do you do this?) for each unique client ID and sum treatment_yes, dc_yes and ca_yes for each client.
Does that work?
If so, how the heck do I accomplish this? Where do I start?
thanks everyone!
I think the easiest thing to do at this point is to use a proc sql step to find the max value of each of your three variables, aggregated by client_id:
data temp;
input Client_id $ treatment_yes ca_yes dc_yes;
datalines;
1234 0 1 1
1234 1 0 0
1234 1 0 1
;
run;
proc sql;
create table temp_collapse as select distinct
client_id, max(treatment_yes) as treatment_yes,
max(ca_yes) as ca_yes, max(dc_yes) as dc_yes
from temp
group by client_id;
quit;
A better overall approach would be to use the dataset you used to create the _yes variables and do something like max(case when desc = "tx" then 1 else 0 end) as treatment_yes etc., but since you're still new to SAS and understand what you've done so far, I think the above approach is totally sufficient.
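The collapse-by-MAX idea can also be taken one step further to produce the list of clients who have all three flags. The following is a rough sketch in Python's sqlite3 rather than SAS (a swapped-in tool, used only so the example is runnable); the table name visits and client 5678 are hypothetical additions, and the same GROUP BY and HAVING clauses should carry over to proc sql.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visits (client_id TEXT, treatment_yes INTEGER, "
    "ca_yes INTEGER, dc_yes INTEGER)"
)
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?, ?)",
    [("1234", 0, 1, 1), ("1234", 1, 0, 0), ("1234", 1, 0, 1),
     # Hypothetical second client who was never discharged.
     ("5678", 1, 1, 0), ("5678", 1, 0, 0)],
)

sql = """
    SELECT client_id
    FROM visits
    GROUP BY client_id
    -- MAX of a 0/1 flag is 1 exactly when the flag was set on at least one row
    HAVING MAX(treatment_yes) = 1
       AND MAX(ca_yes) = 1
       AND MAX(dc_yes) = 1
    ORDER BY client_id
"""
complete_clients = [row[0] for row in conn.execute(sql)]
print(complete_clients)
```

MAX is used instead of SUM because only "flagged at least once" matters, not how many times.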
The following code allows you to preserve other variables from your original dataset. I have added two variables (var1 and var2) for illustrative purposes:
data temp;
input Client_id $ treatment_yes ca_yes dc_yes var1 var2 $;
datalines;
1234 0 1 1 10 A
1234 1 0 0 11 B
1234 1 0 1 12 C
;
run;
Join the dataset with itself so that each row of a client_id in the original dataset is merged with its corresponding row in an aggregated dataset constructed in a subquery.
proc sql;
create table want as
select *
from temp as a
left join (select client_id,
max(treatment_yes) as max_treat,
max(ca_yes) as max_ca,
max(dc_yes) as max_dc
from temp
group by client_id) as b
on a.client_id=b.client_id;
quit;

TSQL Least number of appearances

I want to find the "Balie" with the least number of "Maatschappijen" booked on it. So far I have this query, which displays all "Balies" and all the "Maatschappijen" with them. The wanted result is one "balienummer" record with the least number of "maatschappijen" booked on it.
Query
SELECT [Balie].[balienummer], [IncheckenBijMaatschappij].[balienummer], [IncheckenBijMaatschappij].[maatschappijcode]
FROM [Balie]
JOIN [IncheckenBijMaatschappij]
ON [Balie].[balienummer] = [IncheckenBijMaatschappij].[balienummer]
Query result
balienummer balienummer maatschappijcode
1 1 BA
1 1 TR
2 2 AF
2 2 NZ
3 3 KL
4 4 KL
LRS: https://www.dropbox.com/s/f2l9a874d5witpt/LRS_CasusGelreAirport.pdf
SELECT [Balie].[balienummer], count([IncheckenBijMaatschappij].[maatschappijcode])
FROM [Balie]
JOIN [IncheckenBijMaatschappij]
ON [Balie].[balienummer] = [IncheckenBijMaatschappij].[balienummer]
GROUP BY [Balie].[balienummer]
ORDER BY count([IncheckenBijMaatschappij].[maatschappijcode])
First record should be your answer.
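A minimal sketch of the ordered count on the sample rows, using Python's sqlite3 as a stand-in for SQL Server. Two things are assumptions added here and not part of the answer: the tie-breaker on balienummer, and the handling of ties. In SQL Server, taking only the first record would typically be done with TOP (1), or TOP (1) WITH TIES if all equally low counters are wanted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Balie (balienummer INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE IncheckenBijMaatschappij (balienummer INTEGER, maatschappijcode TEXT)"
)
conn.executemany("INSERT INTO Balie VALUES (?)", [(1,), (2,), (3,), (4,)])
conn.executemany(
    "INSERT INTO IncheckenBijMaatschappij VALUES (?, ?)",
    [(1, "BA"), (1, "TR"), (2, "AF"), (2, "NZ"), (3, "KL"), (4, "KL")],
)

sql = """
    SELECT b.balienummer, COUNT(i.maatschappijcode) AS aantal
    FROM Balie AS b
    JOIN IncheckenBijMaatschappij AS i
      ON b.balienummer = i.balienummer
    GROUP BY b.balienummer
    ORDER BY aantal, b.balienummer   -- tie-breaker added for a stable order
"""
counts = conn.execute(sql).fetchall()

first_record = counts[0]
# Every counter that shares the lowest count, in case of ties.
least_booked = [nr for nr, aantal in counts if aantal == first_record[1]]
print(first_record, least_booked)
```

In the sample data counters 3 and 4 are tied with one booking each, so which one comes "first" depends on the tie-breaker. Note also that an inner join leaves out any counter with no bookings at all.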

Ensure data integrity in SQL Server

I have to make some changes in a small system that stores data in one table as following:
TransId TermId StartDate EndDate IsActiveTerm
------- ------ ---------- ---------- ------------
1 1 2007-01-01 2007-12-31 0
1 2 2008-01-01 2008-12-31 0
1 3 2009-01-01 2009-12-31 1
1 4 2010-01-01 2010-12-31 0
2 1 2008-08-05 2009-08-04 0
2 2 2009-08-05 2010-08-04 1
3 1 2009-07-31 2010-07-30 1
3 2 2010-07-31 2011-07-30 0
where the rules are:
StartDate must be the previous term's EndDate + 1 day (terms cannot overlap)
there are many terms per transaction
term length is from 1 to n days (I made it 1 year to keep this example simple)
NOTE: IsActiveTerm is a computed column which depends on the current date, so it is not deterministic
I need to ensure that terms do not overlap. In other words, I want to enforce this condition even when inserting or updating multiple rows.
What I am thinking of is adding "INSTEAD OF" triggers (for both Insert and Update), but this would require cursors because I need to cope with multiple rows.
Does anyone have a better idea?
You can find pretty much everything about temporal databases in: Richard T. Snodgrass, "Developing Time-Oriented Database Applications in SQL", Morgan Kaufmann (2000), which I believe is out of print but can be downloaded via the link on his publication list.
I've got working solution:
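CREATE TRIGGER TransTerms_EnsureCon ON TransTerms
FOR INSERT, UPDATE, DELETE AS
BEGIN
    -- Look for consecutive terms of one transaction where the next term
    -- does not start the day after the previous term ends.
    -- Column names follow the sample table (TransId, TermId).
    IF EXISTS (SELECT *
               FROM TransTerms pT
               INNER JOIN TransTerms nT
                   ON pT.TransId = nT.TransId
                  AND nT.TermId = pT.TermId + 1
               WHERE nT.StartDate != DATEADD(d, 1, pT.EndDate)
                 AND pT.EndDate > pT.StartDate
                 AND nT.EndDate > nT.StartDate)
    BEGIN
        -- BEGIN...END keeps the rollback conditional; severity 16 raises a real error.
        RAISERROR('Transaction violates sequenced CONSTRAINT', 16, 2)
        ROLLBACK TRANSACTION
    END
END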
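The core of the trigger is a set-based self-join, which is why no cursor is needed: one query checks every pair of consecutive terms at once. Here is a rough sketch of that check on the sample rows, using Python's sqlite3 as a stand-in for SQL Server. The column names TransId and TermId follow the sample table, date(..., '+1 day') stands in for DATEADD, and the term (3, 3) with a gap before it is a hypothetical row added to show a violation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TransTerms (TransId INTEGER, TermId INTEGER, "
    "StartDate TEXT, EndDate TEXT)"
)
conn.executemany(
    "INSERT INTO TransTerms VALUES (?, ?, ?, ?)",
    [(1, 1, "2007-01-01", "2007-12-31"), (1, 2, "2008-01-01", "2008-12-31"),
     (1, 3, "2009-01-01", "2009-12-31"), (1, 4, "2010-01-01", "2010-12-31"),
     (2, 1, "2008-08-05", "2009-08-04"), (2, 2, "2009-08-05", "2010-08-04"),
     (3, 1, "2009-07-31", "2010-07-30"), (3, 2, "2010-07-31", "2011-07-30")],
)

# Pairs each term (pT) with its successor (nT) and reports successors that
# do not start exactly one day after the previous term ends.
CHECK = """
    SELECT nT.TransId, nT.TermId
    FROM TransTerms AS pT
    JOIN TransTerms AS nT
      ON nT.TransId = pT.TransId
     AND nT.TermId = pT.TermId + 1
    WHERE nT.StartDate <> date(pT.EndDate, '+1 day')
    ORDER BY nT.TransId, nT.TermId
"""

violations_before = conn.execute(CHECK).fetchall()

# Hypothetical bad row: term 3 of transaction 3 starts a few days too late.
conn.execute("INSERT INTO TransTerms VALUES (3, 3, '2011-08-05', '2012-08-04')")
violations_after = conn.execute(CHECK).fetchall()

print(violations_before, violations_after)
```

Because the check reads the whole table after the statement has run, it covers multi-row inserts and updates in the same way as single-row ones.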
P.S. Many thanks wallenborn!
