T-SQL Selecting TOP 1 In A Query With Aggregates/Groups - sql-server

I'm still fairly new to SQL. This is a stripped down version of the query I'm trying to run. This query is suppose to find those customers with more than 3 cases and display either the top 1 case or all cases but still show the overall count of cases per customer in each row in addition to all the case numbers.
The TOP 1 subquery approach didn't work but is there another way to get the results I need? Hope that makes sense.
Here's the code:
SELECT t1.StoreID, t1.CustomerID, t2.LastName, t2.FirstName
,COUNT(t1.CaseNo) AS CasesCount
,(SELECT TOP 1 t1.CaseNo)
FROM MainDatabase t1
INNER JOIN CustomerDatabase t2
ON t1.StoreID = t2.StoreID
WHERE t1.SubmittedDate >= '01/01/2017' AND t1.SubmittedDate <= '05/31/2017'
GROUP BY t1.StoreID, t1.CustomerID, t2.LastName, t2.FirstName
HAVING COUNT (t1.CaseNo) >= 3
ORDER BY t1.StoreID, t1.PatronID
I would like it to look something like this, either one row with just the most recent case and detail or several rows showing all details of each case in addition to the store id, customer id, last name, first name, and case count.
Data Example

For these I usually like to make a temp table of aggregates:
DROP TABLE IF EXISTS #tmp;
CREATE TABLE #tmp (
CustomerlD int NOT NULL DEFAULT 0,
case_count int NOT NULL DEFAULT 0,
case_max int NOT NULL DEFAULT 0,
);
INSERT INTO #tmp
(CustomerlD, case_count, case_max)
SELECT CustomerlD, COUNT(tl.CaseNo), MAX(tl.CaseNo)
FROM MainDatabase
GROUP BY CustomerlD;
Then you can join this "tmp" table back to any other table you want to display the number of cases on, or the max case number on. And you can limit it to customers that have more than 3 cases with WHERE case_count > 3

Related

SQL - Attain Previous Transaction Informaiton [duplicate]

I need to calculate the difference of a column between two lines of a table. Is there any way I can do this directly in SQL? I'm using Microsoft SQL Server 2008.
I'm looking for something like this:
SELECT value - (previous.value) FROM table
Imagining that the "previous" variable reference the latest selected row. Of course with a select like that I will end up with n-1 rows selected in a table with n rows, that's not a probably, actually is exactly what I need.
Is that possible in some way?
Use the lag function:
SELECT value - lag(value) OVER (ORDER BY Id) FROM table
Sequences used for Ids can skip values, so Id-1 does not always work.
SQL has no built in notion of order, so you need to order by some column for this to be meaningful. Something like this:
select t1.value - t2.value from table t1, table t2
where t1.primaryKey = t2.primaryKey - 1
If you know how to order things but not how to get the previous value given the current one (EG, you want to order alphabetically) then I don't know of a way to do that in standard SQL, but most SQL implementations will have extensions to do it.
Here is a way for SQL server that works if you can order rows such that each one is distinct:
select rank() OVER (ORDER BY id) as 'Rank', value into temp1 from t
select t1.value - t2.value from temp1 t1, temp1 t2
where t1.Rank = t2.Rank - 1
drop table temp1
If you need to break ties, you can add as many columns as necessary to the ORDER BY.
WITH CTE AS (
SELECT
rownum = ROW_NUMBER() OVER (ORDER BY columns_to_order_by),
value
FROM table
)
SELECT
curr.value - prev.value
FROM CTE cur
INNER JOIN CTE prev on prev.rownum = cur.rownum - 1
Oracle, PostgreSQL, SQL Server and many more RDBMS engines have analytic functions called LAG and LEAD that do this very thing.
In SQL Server prior to 2012 you'd need to do the following:
SELECT value - (
SELECT TOP 1 value
FROM mytable m2
WHERE m2.col1 < m1.col1 OR (m2.col1 = m1.col1 AND m2.pk < m1.pk)
ORDER BY
col1, pk
)
FROM mytable m1
ORDER BY
col1, pk
, where COL1 is the column you are ordering by.
Having an index on (COL1, PK) will greatly improve this query.
LEFT JOIN the table to itself, with the join condition worked out so the row matched in the joined version of the table is one row previous, for your particular definition of "previous".
Update: At first I was thinking you would want to keep all rows, with NULLs for the condition where there was no previous row. Reading it again you just want that rows culled, so you should an inner join rather than a left join.
Update:
Newer versions of Sql Server also have the LAG and LEAD Windowing functions that can be used for this, too.
select t2.col from (
select col,MAX(ID) id from
(
select ROW_NUMBER() over(PARTITION by col order by col) id ,col from testtab t1) as t1
group by col) as t2
The selected answer will only work if there are no gaps in the sequence. However if you are using an autogenerated id, there are likely to be gaps in the sequence due to inserts that were rolled back.
This method should work if you have gaps
declare #temp (value int, primaryKey int, tempid int identity)
insert value, primarykey from mytable order by primarykey
select t1.value - t2.value from #temp t1
join #temp t2
on t1.tempid = t2.tempid - 1
Another way to refer to the previous row in an SQL query is to use a recursive common table expression (CTE):
CREATE TABLE t (counter INTEGER);
INSERT INTO t VALUES (1),(2),(3),(4),(5);
WITH cte(counter, previous, difference) AS (
-- Anchor query
SELECT MIN(counter), 0, MIN(counter)
FROM t
UNION ALL
-- Recursive query
SELECT t.counter, cte.counter, t.counter - cte.counter
FROM t JOIN cte ON cte.counter = t.counter - 1
)
SELECT counter, previous, difference
FROM cte
ORDER BY counter;
Result:
counter
previous
difference
1
0
1
2
1
1
3
2
1
4
3
1
5
4
1
The anchor query generates the first row of the common table expression cte where it sets cte.counter to column t.counter in the first row of table t, cte.previous to 0, and cte.difference to the first row of t.counter.
The recursive query joins each row of common table expression cte to the previous row of table t. In the recursive query, cte.counter refers to t.counter in each row of table t, cte.previous refers to cte.counter in the previous row of cte, and t.counter - cte.counter refers to the difference between these two columns.
Note that a recursive CTE is more flexible than the LAG and LEAD functions because a row can refer to any arbitrary result of a previous row. (A recursive function or process is one where the input of the process is the output of the previous iteration of that process, except the first input which is a constant.)
I tested this query at SQLite Online.
You can use the following funtion to get current row value and previous row value:
SELECT value,
min(value) over (order by id rows between 1 preceding and 1
preceding) as value_prev
FROM table
Then you can just select value - value_prev from that select and get your answer

postgresql: Insert two values in table b if both values are not in table a

I'm doing an assignment where I am to make an sql-database of a tournament result. Players can be added by their name, and when the database has at least two or more players who has not already been assigned to a match, two players should be matched against each other.
For instance, if the tables currently are empty I add Joe as a player. I then also add James and since the table then has two players, who also are not in the matches-table, a new row in the matches-table is created with their p_id set to left_player_P_id and right_player_P_id.
I thought it would be a good idea to create a function and a trigger so that every time a row is added to the player-table, the sql-code would run and create the row in the matches as needed. I am open to other ways of doing this.
I've tried multiple different approaches including SQL - Insert if the number of rows is greater than and Using IF ELSE statement based on Count to execute different Insert statements but I am now at a loss.
Problematic code:
This approach returns a syntax error.
IF ((select count(*) from players_not_in_any_matches) >= 2)
begin
insert into matches values (
(select p_id from players_not_in_any_matches limit 1),
(select p_id from players_not_in_any_matches limit 1 offset 1)
)
end;
Alternative approach (still problematic code):
This approach seems more promising (but less readable). However, it inserts even if there are no rows returned inside the where not exists.
insert into matches (left_player_p_id, right_player_p_id)
select
(select p_id from players_not_in_any_matches limit 1),
(select p_id from players_not_in_any_matches limit 1 offset 1)
where not exists (
select * from players_not_in_any_matches offset 2
);
Tables
CREATE TABLE players (
p_id serial PRIMARY KEY,
full_name text
);
CREATE TABLE matches(
left_player_P_id integer REFERENCES players,
right_player_P_id integer REFERENCES players,
winner integer REFERENCES players
);
Views
-- view for getting all players not currently assigned to a match
create view players_not_in_any_matches as
select * from players
where p_id not in (
select left_player_p_id from matches
) and
p_id not in (
select right_player_p_id from matches
);
Try:
insert into matches (left_player_p_id, right_player_p_id)
select p1.p_id, p2.p_id
from players p1
join players p2
on p1.p_id <> p2.p_id
and not exists(
select 1 from matches m
where p1.p_id in (m.left_player_p_id, m.right_player_p_id)
)
and not exists(
select 1 from matches m
where p2.p_id in (m.left_player_p_id, m.right_player_p_id)
)
limit 1
Anti joins (not-exists operators) in the above query could be further simplified a bit using LEFT JOINs:
insert into matches (left_player_p_id, right_player_p_id)
select p1.p_id, p2.p_id
from players p1
join players p2
left join matches m1
on p1.p_id in (m1.left_player_p_id, m1.right_player_p_id)
left join matches m2
on p2.p_id in (m2.left_player_p_id, m2.right_player_p_id)
where m1.left_player is null
and m2.left_player is null
limit 1
but in my opinion the former query is more readable, while the latter one looks tricky.

SQL Join one-to-many tables, selecting only most recent entries

This is my first post - so I apologise if it's in the wrong seciton!
I'm joining two tables with a one-to-many relationship using their respective ID numbers: but I only want to return the most recent record for the joined table and I'm not entirely sure where to even start!
My original code for returning everything is shown below:
SELECT table_DATES.[date-ID], *
FROM table_CORE LEFT JOIN table_DATES ON [table_CORE].[core-ID] = table_DATES.[date-ID]
WHERE table_CORE.[core-ID] Like '*'
ORDER BY [table_CORE].[core-ID], [table_DATES].[iteration];
This returns a group of records: showing every matching ID between table_CORE and table_DATES:
table_CORE date-ID iteration
1 1 1
1 1 2
1 1 3
2 2 1
2 2 2
3 3 1
4 4 1
But I need to return only the date with the maximum value in the "iteration" field as shown below
table_CORE date-ID iteration Additional data
1 1 3 MoreInfo
2 2 2 MoreInfo
3 3 1 MoreInfo
4 4 1 MoreInfo
I really don't even know where to start - obviously it's going to be a JOIN query of some sort - but I'm not sure how to get the subquery to return only the highest iteration for each item in table 2's ID field?
Hope that makes sense - I'll reword if it comes to it!
--edit--
I'm wondering how to integrate that when I'm needing all the fields from table 1 (table_CORE in this case) and all the fields from table2 (table_DATES) joined as well?
Both tables have additional fields that will need to be merged.
I'm pretty sure I can just add the fields into the "SELECT" and "GROUP BY" clauses, but there are around 40 fields altogether (and typing all of them will be tedious!)
Try using the MAX aggregate function like this with a GROUP BY clause.
SELECT
[ID1],
[ID2],
MAX([iteration])
FROM
table_CORE
LEFT JOIN table_DATES
ON [table_CORE].[core-ID] = table_DATES.[date-ID]
WHERE
table_CORE.[core-ID] Like '*' --LIKE '%something%' ??
GROUP BY
[ID1],
[ID2]
Your example field names don't match your sample query so I'm guessing a little bit.
Just to make sure that I have everything you’re asking for right, I am going to restate some of your question and then answer it.
Your source tables look like this:
table_core:
table_dates:
And your outputs are like this:
Current:
Desired:
In order to make that happen all you need to do is use a subquery (or a CTE) as a “cross-reference” table. (I used temp tables to recreate your data example and _ in place of the - in your column names).
--Loading the example data
create table #table_core
(
core_id int not null
)
create table #table_dates
(
date_id int not null
, iteration int not null
, additional_data varchar(25) null
)
insert into #table_core values (1), (2), (3), (4)
insert into #table_dates values (1,1, 'More Info 1'),(1,2, 'More Info 2'),(1,3, 'More Info 3'),(2,1, 'More Info 4'),(2,2, 'More Info 5'),(3,1, 'More Info 6'),(4,1, 'More Info 7')
--select query needed for desired output (using a CTE)
; with iter_max as
(
select td.date_id
, max(td.iteration) as iteration_max
from #table_dates as td
group by td.date_id
)
select tc.*
, td.*
from #table_core as tc
left join iter_max as im on tc.core_id = im.date_id
inner join #table_dates as td on im.date_id = td.date_id
and im.iteration_max = td.iteration
select *
from
(
SELECT table_DATES.[date-ID], *
, row_number() over (partition by table_CORE date-ID order by iteration desc) as rn
FROM table_CORE
LEFT JOIN table_DATES
ON [table_CORE].[core-ID] = table_DATES.[date-ID]
WHERE table_CORE.[core-ID] Like '*'
) tt
where tt.rn = 1
ORDER BY [core-ID]

Getting Random Number for each row

I have a table with some names in a row. For each row I want to generate a random name. I wrote the following query to:
BEGIN transaction t1
Create table TestingName
(NameID int,
FirstName varchar(100),
LastName varchar(100)
)
INSERT INTO TestingName
SELECT 0,'SpongeBob','SquarePants'
UNION
SELECT 1, 'Bugs', 'Bunny'
UNION
SELECT 2, 'Homer', 'Simpson'
UNION
SELECT 3, 'Mickey', 'Mouse'
UNION
SELECT 4, 'Fred', 'Flintstone'
SELECT FirstName from TestingName
WHERE NameID = ABS(CHECKSUM(NEWID())) % 5
ROLLBACK Transaction t1
The problem is the "ABS(CHECKSUM(NEWID())) % 5" portion of this query sometime returns more than 1 row and sometimes returns 0 rows. I must be missing something but I can't see it.
If I change the query to
DECLARE #n int
set #n= ABS(CHECKSUM(NEWID())) % 5
SELECT FirstName from TestingName
WHERE NameID = #n
Then everything works and I get a random number per row.
If you take the query above and paste it into SQL management studio and run the first query a bunch of times you will see what I am attempting to describe.
The final update query will look like
Update TableWithABunchOfNames
set [FName] = (SELECT FirstName from TestingName
WHERE NameID = ABS(CHECKSUM(NEWID())) % 5)
This does not work because sometimes I get more than 1 row and sometimes I get no rows.
What am I missing?
The problem is that you are getting a different random value for each row. That is the problem. This query is probably doing a full table scan. The where clause is executed for each row -- and a different random number is generated.
So, you might get a sequence of random numbers where none of the ids match. Or a sequence where more than one matches. On average, you'll have one match, but you don't want "on average", you want a guarantee.
This is when you want rand(), which produces only one random number per query:
SELECT FirstName
from TestingName
WHERE NameID = floor(rand() * 5);
This should get you one value.
Why not use top 1?
Select top 1 firstName
From testingName
Order by newId()
This worked for me:
WITH
CTE
AS
(
SELECT
ID
,FName
,CAST(5 * (CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967295.0 + 0.5) AS int) AS rr
FROM
dbo.TableWithABunchOfNames
)
,CTE_ForUpdate
AS
(
SELECT
CTE.ID
, CTE.FName
, dbo.TestingName.FirstName AS RandomName
FROM
CTE
LEFT JOIN dbo.TestingName ON dbo.TestingName.NameID = CTE.rr
)
UPDATE CTE_ForUpdate
SET FName = RandomName
;
This solution depends on how smart optimizer is.
For example, if I use INNER JOIN instead of LEFT JOIN (which is the correct choice for this query), optimizer would move calculation of random numbers outside the join loop and end result would be not what we expect.
I created a table TestingName with 5 rows as in the question and a table TableWithABunchOfNames with 100 rows.
Here is the execution plan with LEFT JOIN. You can see the Compute scalar that calculates random numbers is done before the join loop. You can see that 100 rows were updated:
Here is the execution plan with INNER JOIN. You can see the Compute scalar that calculates random numbers is done after the join loop and with extra filter. This query may update not all rows in TableWithABunchOfNames and some rows in TableWithABunchOfNames may be updated several times. You can see that Filter left 102 rows and Stream aggregate left only 69 rows. It means that only 69 rows were eventually updated and also there were multiple matches for some rows (102 - 69 = 33).
To guarantee that the result is what you expect you should generate random number for each row in TableWithABunchOfNames and explicitly remember the result, i.e. materialize the CTE shown above. Then use this temporary result to join with the table TestingName.
You can add a column to TableWithABunchOfNames to store generated random numbers or save CTE to a temp table or table variable.

Sqlite select query - returning default rows?

I have a table Emp like this for example.
----------------------
eName | eId
----------------------
Anusha 1
Sunny 2
Say i am looking for an entry whose id is 3.I want to write a query which finds the row and displays it.But if it doesnt find it it is expected to display a default row (temp,999)
select case
when (total != 0) then (select eName from Emp where eId = 3)
when (total == 0) then "temp"
end as eName,
case
when (total != 0) then (select eId from Emp where eId = 3)
when (total == 0) then 999
end as eId
from Emp,(select count(*) as total from Emp where eId = 3);
Using this query that i wrote it gives me two rows as a result.
temp 999
temp 999
I assume it is because of
(select count(*) as total from Emp where eId = 3) this query in the from list of the query.
I tried using the distinct clause and it gives me just a single row. But i am a little doubtful if i am messing the query and only trying to probably employ a hack to do it.Please suggest if there is a better way to do this or if i am wrong.
I'll get to how to do this right, but first let me give you a long answer to maybe help you with your understanding of SQL. What's happening to use is this:
Your select clause does not affect the number of records you get. So to understand what's happenning, let's simpify the query a little. Let's change it to,
select * from emp, (select count(*) as total from emp where eid=3)
I'm not sure what you think the comma after "emp" does here, but SQL see this as an old-style join on two tables: emp and the temporary table created by the select count(*), etc. There is no WHERE clause, so this is a cross join, but the second table only has one record anyway, so that part doesn't matter. But the fact that there is no WHERE clause means that you will get every record in emp, joined to the count. So the output of this query is:
ename eid count(*)
Anusha 1 0
Sunny 2 0
If you had 100 records, you would get 100 results.
Frankly there is no really clean way to do what you want in SQL. It's the sort of thing that's cleaner to do in code: do a plain "select ... where eid=3", and if you get no records, fill in the default at run-time.
But assuming that you need to do it in SQL for some reason, I think the simplest way would be:
select eid, ename from emp where eid=3
union
select 999 as eid, 'temp' as ename
where not exists (select 1 from emp where eid=3)
In some versions of SQL you need to give a dummy table name on the second select, like Oracle requires you to say "from dual".

Resources