We're running SQL Server 2016. I'm building a table that will hold employee data for the various companies we deal with. It will be populated with rows of the form:
Name       Company   Status  DateTime
Person 01  Company1          11/08/2021 07:01:00.00
Person 02  Company1          11/08/2021 07:02:00.00
Person 03  Company1          11/08/2021 07:03:00.00
Person 04  Company1          11/08/2021 07:04:00.00
Person 05  Company1          11/08/2021 07:05:00.00
Person 06  Company1          11/08/2021 07:06:00.00
Person 07  Company1          11/08/2021 07:07:00.00
Person 08  Company1          11/08/2021 07:08:00.00
Person 09  Company1          11/08/2021 07:09:00.00
Person 10  Company1          11/08/2021 07:10:00.00
Person 11  Company1          11/08/2021 07:11:00.00
Person 12  Company1          11/08/2021 07:12:00.00
Person 13  Company2          11/08/2021 07:13:00.00
Person 14  Company2          11/08/2021 07:14:00.00
Person 15  Company2          11/08/2021 07:15:00.00
Person 16  Company2          11/08/2021 07:16:00.00
We'll have a lot more data than that. What we're trying to achieve is this: every day we want to select the first 50 rows with a blank Status and group them together.
Once we finish with them for the day, the Status is updated so we can select the next 50 easily.
This part is easy; however, the bit I'm unsure about is that we only want to select up to 10 entries from the same company each day.
For example, we want to retrieve 50 results, but only up to 10 can have the same company.
The 50 results may look like:
10 from Company 1
10 from Company 2
8 from Company 3 (they either didn't submit more than 8, or submitted them later, so they're no longer next in line)
2 from Company 4
10 from Company 5
10 from Company 6
I've gotten this far, but I don't know how to only select up to 10 from each company:
select top 50 *
from EmployeeData
where isnull(Status, '') = ''
order by DateTime
Any help would be greatly appreciated.
Thanks
Luke
Something like this maybe:
select top 50 [Name], [Company], [Status], [DateTime]
from (
    select [Name], [Company], [Status], [DateTime],
           row_number() over (partition by [Company] order by [DateTime]) as rownum
    from EmployeeData
    where isnull([Status], '') = ''
) tbl
where rownum <= 10
order by [DateTime]
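For the second part of the workflow (updating Status once the day's batch is done), here is a minimal sketch. It assumes a hypothetical key column EmployeeDataID and a hypothetical marker value 'Processed', neither of which comes from the original post:
;with Ranked as (
    select EmployeeDataID, [DateTime],
           row_number() over (partition by [Company] order by [DateTime]) as rownum
    from EmployeeData
    where isnull([Status], '') = ''
)
update ed
set ed.[Status] = 'Processed'                  -- hypothetical marker value
from EmployeeData ed
join (select top (50) EmployeeDataID           -- same 50-row batch as the select above
      from Ranked
      where rownum <= 10
      order by [DateTime]) batch
  on batch.EmployeeDataID = ed.EmployeeDataID;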
This query has two CTEs. First, the unique company names are selected as unq_cte. Second, up to 10 random rows are selected for each company as all_cte. Last, from all_cte (a collection of at most 10 rows per company) the query selects 50 rows at random.
;with
unq_cte as (
    select distinct Company
    from EmployeeData
),
all_cte as (
    select top10.*
    from unq_cte uc
    cross apply (select top (10) *
                 from EmployeeData ed
                 where uc.Company = ed.Company
                 order by newid()) top10([Name], Company, [Status], [DateTime])
)
select top (50) *
from all_cte
order by newid();
Let's say I have a table testpivot that looks like this, with (district, metric) as a unique key:
id  district  metric  value
1   a         work    40
2   a         hours   80
3   b         work    50
4   b         hours   85
I create a view:
CREATE VIEW vpivot
AS
SELECT *
FROM (
    SELECT district, metric, value
    FROM testpivot
) t
PIVOT (SUM(value) FOR metric IN (work, hours)) AS p
So querying from the view looks like this:
district  work  hours
a         40    80
b         50    85
Is there a way to make an insert query like this work:
INSERT INTO vpivot SELECT 'c',20,80
Select * from vpivot
UNION
SELECT 'c',20,80
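That UNION only adds the row at query time, though. A view defined over PIVOT (which aggregates with SUM) is not updatable, so an INSERT into vpivot itself won't work; the usual alternative is to insert the unpivoted rows into the base table and let the view pick them up. A rough sketch, assuming testpivot is the base table shown above and its id column is generated automatically (e.g. an IDENTITY column):
-- Insert the new district's values in unpivoted form; vpivot will then show them.
INSERT INTO testpivot (district, metric, value)
VALUES ('c', 'work', 20),
       ('c', 'hours', 80);

SELECT district, work, hours
FROM vpivot
WHERE district = 'c';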
I have a lot of tables that need to be merged, but first I need to check whether a student is already in the table.
Table 1 (English)
FirstName LastName BirthDate English
Kyle Fernandez 01/05/2002 82
Bill Cruz 08/24/2003 88
Table 2 (Math)
FirstName LastName BirthDate Math
Kyle Fernandez 01/05/2002 79
Bill Cruz 08/24/2003 83
Mae Sol 03/26/2002 87
Now what I need to do is merge those tables without repeating the same student, like this:
Table Report Card
FirstName LastName BirthDate English Math
Kyle Fernandez 01/05/2002 82 79
Bill Cruz 08/24/2003 88 83
Mae Sol 03/26/2002 0 87
I need to insert and update in bulk. Thank you
I think you want a full outer join here:
SELECT
COALESCE(t1.FirstName, t2.FirstName) AS FirstName,
COALESCE(t1.LastName, t2.LastName) AS LastName,
COALESCE(t1.Birthdate, t2.Birthdate) AS Birthdate,
COALESCE(t1.English, 0) AS English,
COALESCE(t2.Math, 0) AS Math
FROM Table1 t1
FULL OUTER JOIN Table2 t2
ON t2.FirstName = t1.FirstName AND
t2.LastName = t1.LastName
The logic here is to retain a student even if they appear in only one of the two tables. We report the English or Math score as zero when a given student has no value in the corresponding table.
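Since you also need to insert and update in bulk, the same full outer join can feed a MERGE. This is only a sketch: the target table name ReportCard and its key (first name, last name, birth date) are assumptions, not something taken from the post.
;WITH Combined AS (
    SELECT
        COALESCE(t1.FirstName, t2.FirstName) AS FirstName,
        COALESCE(t1.LastName,  t2.LastName)  AS LastName,
        COALESCE(t1.BirthDate, t2.BirthDate) AS BirthDate,
        COALESCE(t1.English, 0) AS English,
        COALESCE(t2.Math, 0)    AS Math
    FROM Table1 t1
    FULL OUTER JOIN Table2 t2
        ON  t2.FirstName = t1.FirstName
        AND t2.LastName  = t1.LastName
)
MERGE ReportCard AS target                    -- assumed target table
USING Combined AS src
    ON  target.FirstName = src.FirstName
    AND target.LastName  = src.LastName
    AND target.BirthDate = src.BirthDate
WHEN MATCHED THEN
    UPDATE SET target.English = src.English,
               target.Math    = src.Math
WHEN NOT MATCHED BY TARGET THEN
    INSERT (FirstName, LastName, BirthDate, English, Math)
    VALUES (src.FirstName, src.LastName, src.BirthDate, src.English, src.Math);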
I have three tables. Table Cust has a custID field, plus various other values (name, address etc)
Table List has a single column ID. Each ID is a custID in the Cust table
Edit: the purpose of this is to filter the records, restricting the results to ones where the CustID appears in the List table.
All three tables are indexed.
Table Trans has a TransactionID field, a Cust field that holds a customer ID, and other transaction fields.
Edit: I should have mentioned that in some cases there will be no transaction record. In this case I want one row of Customer info with the transaction fields null or blank.
I want a query to return cust and transaction ID for each ID in the List table. If there is more than one matching row in the transaction table, I want each included along with the matching cust info. So if the tables look like this:
Cust
ID Name
01 John
02 Mary
03 Mike
04 Jane
05 Sue
06 Frank
List
ID
01
03
05
06
Transact
TransID CustId Msg
21 01 There
22 01 is
23 02 a
24 03 tide
25 04 in
26 04 the
27 05 affairs
28 05 of
29 05 men
I want the result set to be:
CustID Name TransID Msg
01 John 21 There
01 John 22 is
03 Mike 24 tide
05 Sue 27 affairs
05 Sue 28 of
05 Sue 29 men
06 Frank -- --
(Where -- represents NULL or BLANK)
Obviously the actual tables are much larger (millions of rows), but that shows the pattern: one row for every item in table Transact that matches any of the items in the List table, with matching fields from the Cust table. If there is no matching transaction, one row of customer info for each ID in the List table. CustID is unique in the Cust and List tables, but not in the transaction table.
This needs to work on any version of SQL server from 2005 onward, if that matters.
Any suggestions?
Unless I'm missing something, this is all you need to do:
Select T.CustID, C.Name, T.TransID, T.Msg
From Transact T
Join Cust C On C.Id = T.CustId
Join List L On L.Id = C.Id
Order By T.CustID, T.TransID
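One caveat: with inner joins, customers that have no transaction rows at all (Frank in the sample data) drop out. A sketch of the same query adjusted for that edit, starting from List and using a LEFT JOIN to Transact, so those customers keep one row with NULL transaction columns:
Select C.Id As CustID, C.Name, T.TransID, T.Msg
From List L
Join Cust C On C.Id = L.Id
Left Join Transact T On T.CustId = C.Id
Order By C.Id, T.TransID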
;with cust (id, name) as
(
select 1, 'John' union all
select 2, 'Mary' union all
select 3, 'Mike' union all
select 4, 'Jane' union all
select 5, 'Sue'
), list (id) as
(
select 1 union all
select 3 union all
select 5
), transact (TransId, CustId, Msg) as
(
select 21, 1, 'There '
union all select 22, 1, 'is'
union all select 23, 2, 'a'
union all select 24, 3, 'tide'
union all select 25, 4, 'in'
union all select 26, 4, 'the'
union all select 27, 5, 'affairs'
union all select 28, 5, 'of'
union all select 29, 5, 'men'
)
select
CustId = c.id,
Name = c.Name,
TransId = t.TransId,
Msg = t.Msg
from cust c
inner join list l
on c.id = l.id
inner join transact t
on l.id = t.custid
yields:
CustId Name TransId Msg
----------- ---- ----------- -------
1 John 21 There
1 John 22 is
3 Mike 24 tide
5 Sue 27 affairs
5 Sue 28 of
5 Sue 29 men
I'm trying to group a set of data, and for some of the fields I need to select a specific value based on the ttype. For example, I have the following rows:
caseid age iss gcs ttype
00170 64 25 17 Transfer Out
00170 64 27 15 Transfer In
00201 24 14 40 Transfer In
If a caseID has ttype 'Transfer Out' I want to use the ISS and GCS values from this row, otherwise use the values from the 'Transfer In' row.
My desired output based on the above example would be:
caseid age iss gcs
00170 64 25 17
00201 24 14 40
My current select statement is:
select caseid, max(age), max(iss), max(gcs)
from Table1
group by caseid
Which I know is incorrect but how do I specify the values for ISS and GCS from a specific row?
Thanks
Edit: I will not always need to select from row 1; here is a table with expanded data:
caseid age iss gcs los ttype disdate
170 64 25 17 5 Transfer Out 2014-01-02 00:00:00.000
170 64 27 15 1 Transfer In 2014-01-04 00:00:00.000
201 24 14 40 4 Transfer In 2014-01-04 00:00:00.000
In this case, I want the max age and the ISS and GCS figures from row 1 as before, but I need to sum the LOS and select the disdate from row 2 (i.e. the latest date), so my output would be:
caseid age iss gcs los disdate
170 64 25 17 6 2014-01-04
201 24 14 40 4 2014-01-04
Is this possible?
You can use a CTE with ROW_NUMBER and the OVER clause (edited according to your updated question):
WITH CTE AS
(
SELECT caseid, age, iss, gcs, los, ttype, disdate,
SumLos = SUM(los) OVER (PARTITION BY caseid),
LatestDisDate = MAX(disdate) OVER (PARTITION BY caseid),
rn = ROW_NUMBER() OVER (PARTITION BY caseid
ORDER BY CASE WHEN ttype = 'Transfer Out'
THEN 0 ELSE 1 END ASC, disdate ASC)
FROM dbo.Table1
)
SELECT caseid, age, iss, gcs, los = SumLos, disdate = LatestDisDate
FROM CTE
WHERE rn = 1
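If you want to check this against the sample rows from the question, a throwaway test could look like this (the inline data simply stands in for dbo.Table1):
-- Sample rows from the question standing in for dbo.Table1.
;WITH Table1 (caseid, age, iss, gcs, los, ttype, disdate) AS
(
    SELECT 170, 64, 25, 17, 5, 'Transfer Out', CAST('2014-01-02' AS datetime) UNION ALL
    SELECT 170, 64, 27, 15, 1, 'Transfer In',  CAST('2014-01-04' AS datetime) UNION ALL
    SELECT 201, 24, 14, 40, 4, 'Transfer In',  CAST('2014-01-04' AS datetime)
),
CTE AS
(
    SELECT caseid, age, iss, gcs, los, ttype, disdate,
           SumLos        = SUM(los)     OVER (PARTITION BY caseid),
           LatestDisDate = MAX(disdate) OVER (PARTITION BY caseid),
           rn = ROW_NUMBER() OVER (PARTITION BY caseid
                                   ORDER BY CASE WHEN ttype = 'Transfer Out'
                                                 THEN 0 ELSE 1 END ASC, disdate ASC)
    FROM Table1
)
SELECT caseid, age, iss, gcs, los = SumLos, disdate = LatestDisDate
FROM CTE
WHERE rn = 1
-- Expected: (170, 64, 25, 17, 6, 2014-01-04) and (201, 24, 14, 40, 4, 2014-01-04).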
I think this is what you need -
;WITH CTE AS
(
    SELECT caseid, age, iss, gcs,
           ROW_NUMBER() OVER (PARTITION BY caseid
                              ORDER BY CASE WHEN ttype = 'Transfer Out'
                                            THEN 0 ELSE 1 END) Rn
    FROM YOUR_TABLE_NAME
)
SELECT caseid, age, iss, gcs
FROM CTE
WHERE Rn = 1
I'll try to describe the real situation. In our company we have a reservation system with a table, let's call it Customers, where e-mail and phone contacts are saved with each incoming order - that's the part of the system I can't change. I'm facing the problem of how to get a count of unique customers. By a unique customer I mean a group of people who share either the same e-mail or the same phone number.
Example 1: As an example from real life, imagine Tom and Sandra, who are married. Tom, who ordered 4 products, entered 3 different e-mail addresses and 2 different phone numbers in our reservation system; one of the phone numbers he shares with Sandra (as a home phone), so I can presume they are connected somehow. Besides this shared phone number, Sandra also entered her private one, and for both of her orders she used only one e-mail address. For me this means all of the following rows count as one unique customer. So in fact this unique customer may grow into a whole family.
ID E-mail Phone Comment
---- ------------------- -------------- ------------------------------
0 tom#email.com +44 111 111 First row
1 tommy#email.com +44 111 111 Same phone, different e-mail
2 thomas#email.com +44 111 111 Same phone, different e-mail
3 thomas#email.com +44 222 222 Same e-mail, different phone
4 sandra#email.com +44 222 222 Same phone, different e-mail
5 sandra#email.com +44 333 333 Same e-mail, different phone
As ypercube said, I will probably need recursion to count all of these unique customers.
Example 2: Here is an example of what I want to do. Is it possible to get the count of unique customers without using recursion, for instance by using a cursor or something, or is recursion necessary?
ID E-mail Phone Comment
---- ------------------- -------------- ------------------------------
0 linsey#email.com +44 111 111 ─┐
1 louise#email.com +44 111 111 ├─ 1. unique customer
2 louise#email.com +44 222 222 ─┘
---- ------------------- -------------- ------------------------------
3 steven#email.com +44 333 333 ─┐
4 steven#email.com +44 444 444 ├─ 2. unique customer
5 sandra#email.com +44 444 444 ─┘
---- ------------------- -------------- ------------------------------
6 george#email.com +44 555 555 ─── 3. unique customer
---- ------------------- -------------- ------------------------------
7 xavier#email.com +44 666 666 ─┐
8 xavier#email.com +44 777 777 ├─ 4. unique customer
9 xavier#email.com +44 888 888 ─┘
---- ------------------- -------------- ------------------------------
10 robert#email.com +44 999 999 ─┐
11 miriam#email.com +44 999 999 ├─ 5. unique customer
12 sherry#email.com +44 999 999 ─┘
---- ------------------- -------------- ------------------------------
----------------------------------------------------------------------
Result ∑ = 5 unique customers
----------------------------------------------------------------------
I've tried a query with GROUP BY, but I don't know how to group the result by either the first or the second column. I'm looking for, let's say, something like:
SELECT COUNT(*) FROM Customers
GROUP BY Email OR Phone
Thanks again for any suggestions
P.S.
I really appreciate the answers given to this question before the complete rephrase. The answers here may not correspond to the update, so please don't downvote them if you're going to downvote anything (except the question, of course :). I completely rewrote this post. Thanks, and sorry for my wrong start.
Here is a full solution using a recursive CTE.
;WITH Nodes AS
(
SELECT DENSE_RANK() OVER (ORDER BY Part, PartRank) SetId
, [ID]
FROM
(
SELECT [ID], 1 Part, DENSE_RANK() OVER (ORDER BY [E-mail]) PartRank
FROM dbo.Customer
UNION ALL
SELECT [ID], 2, DENSE_RANK() OVER (ORDER BY Phone) PartRank
FROM dbo.Customer
) A
),
Links AS
(
SELECT DISTINCT A.Id, B.Id LinkedId
FROM Nodes A
JOIN Nodes B ON B.SetId = A.SetId AND B.Id < A.Id
),
Routes AS
(
SELECT DISTINCT Id, Id LinkedId
FROM dbo.Customer
UNION ALL
SELECT DISTINCT Id, LinkedId
FROM Links
UNION ALL
SELECT A.Id, B.LinkedId
FROM Links A
JOIN Routes B ON B.Id = A.LinkedId AND B.LinkedId < A.Id
),
TransitiveClosure AS
(
SELECT Id, Id LinkedId
FROM Links
UNION
SELECT LinkedId Id, LinkedId
FROM Links
UNION
SELECT Id, LinkedId
FROM Routes
),
UniqueCustomers AS
(
SELECT Id, MIN(LinkedId) UniqueCustomerId
FROM TransitiveClosure
GROUP BY Id
)
SELECT A.Id, A.[E-mail], A.Phone, B.UniqueCustomerId
FROM dbo.Customer A
JOIN UniqueCustomers B ON B.Id = A.Id
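Since the question ultimately asks for the count, the final SELECT of the chain above can be replaced with an aggregate over the same UniqueCustomers CTE:
SELECT COUNT(DISTINCT UniqueCustomerId) AS UniqueCustomerCount
FROM UniqueCustomers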
Finding groups that share the same Phone:
SELECT
ID
, Name
, Phone
, DENSE_RANK() OVER (ORDER BY Phone) AS GroupPhone
FROM
MyTable
ORDER BY
GroupPhone
, ID
Finding groups that share the same Name:
SELECT
ID
, Name
, Phone
, DENSE_RANK() OVER (ORDER BY Name) AS GroupName
FROM
MyTable
ORDER BY
GroupName
, ID
Now, for the (complex) query you describe, let's say we have a table like this instead:
ID Name Phone
---- ------------- -------------
0 Kate +44 333 333
1 Sandra +44 000 000
2 Thomas +44 222 222
3 Robert +44 000 000
4 Thomas +44 444 444
5 George +44 222 222
6 Kate +44 000 000
7 Robert +44 444 444
--------------------------------
Should all these be in one group? They all share a name or phone with someone else, forming a "chain" of related persons:
0-6 same name
6-1-3 same phone
3-7 same name
7-4 same phone
4-2 same name
2-5 same phone
For the dataset in the example you could write something like this:
;WITH Temp AS (
SELECT Name, Phone,
DENSE_RANK() OVER (ORDER BY Name) AS NameGroup,
DENSE_RANK() OVER (ORDER BY Phone) AS PhoneGroup
FROM MyTable)
SELECT MAX(Phone), MAX(Name), COUNT(*)
FROM Temp
GROUP BY NameGroup, PhoneGroup
I don't know if this is the best solution, but here it is:
SELECT
MyTable.ID, MyTable.Name, MyTable.Phone,
CASE WHEN N.No = 1 AND P.No = 1 THEN 1
WHEN N.No = 1 AND P.No > 1 THEN 2
WHEN N.No > 1 OR P.No > 1 THEN 3
END as GroupRes
FROM
MyTable
JOIN (SELECT Name, count(Name) No FROM MyTable GROUP BY Name) N on MyTable.Name = N.Name
JOIN (SELECT Phone, count(Phone) No FROM MyTable GROUP BY Phone) P on MyTable.Phone = P.Phone
The problem is that some of the joins here are made on varchars, which could increase execution time.
Here is my solution:
SELECT p.LastName, P.FirstName, P.HomePhone,
CASE
WHEN ph.PhoneCount=1 THEN
CASE
WHEN n.NameCount=1 THEN 'unique name and phone'
ELSE 'common name'
END
ELSE
CASE
WHEN n.NameCount=1 THEN 'common phone'
ELSE 'common phone and name'
END
END
FROM Contacts p
INNER JOIN
(SELECT HomePhone, count(LastName) as PhoneCount
FROM Contacts
GROUP BY HomePhone) ph ON ph.HomePhone = p.HomePhone
INNER JOIN
(SELECT FirstName, count(LastName) as NameCount
FROM Contacts
GROUP BY FirstName) n ON n.FirstName = p.FirstName
LastName    FirstName  Phone       Comment
Hoover      Brenda     8138282334  unique name and phone
Washington  Brian      9044563211  common name
Roosevelt   Brian      7737653279  common name
Reagan      Charles    7734567869  unique name and phone