SQL performance problem: select N rows until 200 distinct customers are found - sql-server

I have table ORDERS with these columns:
id | Customer | product
I want to calculate N, a number of rows in the ORDERS table. N should be big enough that the last N rows (counted from the bottom of the table) contain 200 distinct CUSTOMER values. I have written the following query using max(id), but it takes many seconds to run. I think it is not optimized, because the table has thousands of rows and every time I have to GROUP BY over the whole table just to find a single id:
select count(*) as N from ORDERS where id > (
select top 1 maxid from
(select top 200 CUSTOMER, max(id) as maxid from ORDERS group by CUSTOMER order by maxid desc) x
order by maxid asc
)
Is there another way to handle this with better performance?

This is a huge guess, but this will at least return a result:
WITH Customers AS(
SELECT TOP 200
CUSTOMER,
MAX(id) AS MaxID
FROM ORDERS
GROUP BY CUSTOMER
ORDER BY MaxID DESC)
SELECT COUNT(*) AS N
FROM ORDERS O
-- >= so that the row introducing the 200th customer is counted as well
WHERE O.id >= (SELECT MIN(C.MaxID)
FROM Customers C);
If it returns the correct results but still runs slowly, post the DDL of your table, including the DDL for your indexes. I also suggest posting the query plan using Paste the Plan.
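Since the question mentions that the GROUP BY touches the whole table, one thing worth checking is whether an index supports the grouping. A minimal sketch, assuming ORDERS has no index on CUSTOMER yet; the index name IX_ORDERS_Customer_id is hypothetical:

```sql
-- Hypothetical index name; assumes ORDERS(CUSTOMER, id) is not already indexed.
-- With CUSTOMER leading and id second, MAX(id) per CUSTOMER can be read
-- from this narrow index instead of from the full table.
CREATE NONCLUSTERED INDEX IX_ORDERS_Customer_id
    ON ORDERS (CUSTOMER, id);
```

Whether this helps depends on the actual plan, so it is worth comparing the query plan before and after creating it.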

Related

Get top 10 unique vendor data based on sub query which return vendor id

I have two tables FileMaster and VendorMaster.
In VendorMaster I have the vendor id and other details. FileMaster holds file-related data, and its 'Vendorid' column is a foreign key to VendorMaster. I want to fetch the top 10 rows from FileMaster, with one record per vendor.
I have tried the query below, but it returns 10 records with duplicate VendorId values:
select top 10 * from FileMaster where VendorId in (select top 10 VendorId from VendorMaster)
You can use ROW_NUMBER. I assumed a FileID column as the identity of FileMaster. By the way, you don't need the subquery on VendorMaster:
SELECT TOP 10 * FROM (
select *,
ROW_NUMBER() OVER(PARTITION BY VendorID ORDER BY FileID) AS RN
FROM FileMaster ) AS T
WHERE RN = 1
ORDER BY FileID
Alternatively, you can use a simple join instead of the subquery (a CTE is another option):
SELECT DISTINCT FM.*
FROM FileMaster FM WITH(NOLOCK)
INNER JOIN [dbo].[VendorMaster] VM WITH(NOLOCK)
ON FM.VendorId = VM.VendorId
ORDER BY FM.VendorId ASC
OFFSET 0 ROWS
FETCH NEXT 10 ROWS ONLY
For more details on OFFSET and FETCH, see the SQL Server documentation.

SQL Server: select random rows with distinct id from table where id is not distinct

I have a simple table named Tickets with the following columns:
ticketId, userId
where ticketId is the primary key and userId is not unique.
A user can therefore have several tickets, each with a unique ticketId.
I'm struggling to find a solution to my problem: I need to select 5 random tickets belonging to 5 unique userIds.
I know how to select the random tickets by using the following query:
SELECT TOP 5 *
FROM Tickets
ORDER BY RAND(CHECKSUM(*) * RAND())
Which returns something like:
TicketId  UserId
--------  ------
10        1
25        1
31        2
42        2
56        3
My question is: what do I need to add to the query so that it selects random rows across distinct userIds and never returns more than one ticket per user?
Note that I need the best-performing correct solution, since the table could potentially hold millions of rows in the long run.
Thanks in advance,
Christian
Edit:
The more tickets a user has, the higher the chances of selection. However it should still be randomly selected and not just select the user with the highest amount of tickets. Just like in a lottery.
In other words it should select 5 random rows between all rows, but ensure that the 5 rows have a unique userId.
Try something like this, using NEWID():
Select k.UserId, u.TicketId
from
(
SELECT TOP 5 UserId
FROM Tickets
ORDER BY NEWID()
)k
CROSS APPLY
(
select top 1 TicketId
from Tickets T WHERE T.UserId = k.UserId
ORDER BY NEWID()
)u
Edit: As pointed out in the comments, this solution doesn't properly weight the users by number of tickets (so a user with 1000 tickets incorrectly has the same chance of winning as a user with 1 ticket). This was particularly dumb of me, since I pointed out this problem on other answers.
Given that Steve now has his solution working, I think that is the better answer.
Original answer:
I think something like the following works:
SELECT top 5 ticketid, userid
FROM
(
SELECT ticketid, userid, ROW_NUMBER() OVER (PARTITION BY userid ORDER BY NEWID()) as nid
FROM tickets
) a
WHERE nid = 1
ORDER BY NEWID()
Here's an sql fiddle to play around with it.
Credit where credit is due: I based this on Steve's solution which I don't think works correctly as written.
Something like the following I think.
Please note this code is untested, so please excuse any small syntax errors.
WITH randomised_tickets AS
(
SELECT
*
,ROW_NUMBER() OVER (ORDER BY NEWID() ASC) AS random_order
FROM Tickets
)
,ordered_winning_tickets AS
(
SELECT
*
,ROW_NUMBER() OVER (PARTITION BY userId ORDER BY random_order ASC) AS user_win_order
FROM randomised_tickets
)
SELECT TOP 5
*
FROM
ordered_winning_tickets
WHERE
user_win_order = 1 --eliminate 2nd wins from the list
ORDER BY
random_order
You could try something like this, using ignore_dup_key on a temp table to eliminate duplicates for a user:
drop table if exists #WinningTickets
create table #WinningTickets(PickId int identity primary key, TicketId int, UserId int)
create unique index ix_unique_user on #WinningTickets(UserId) with (ignore_dup_key=on)
while ( select count(*) from #WinningTickets ) < 5
begin
insert into #WinningTickets
select top 10 TicketId, UserId
from Tickets
order by newid()
end
select top 5 *
from #WinningTickets
order by PickId

Unique vs MAX in SQL statement

I have a table with three columns:
PERSON
VISITOR
DATE
The table is basically a transactional table. The following is true:
There are multiple rows per person
There are multiple rows per visitor
There are multiple rows of a given person/visitor combination.
Assumed unique person/date combination
What I need:
For each person, the visitor on that person's MAX date.
The output cannot contain a person more than once; person must be unique.
Visitor may repeat.
I have tried:
SELECT
ROW_NUMBER() OVER (PARTITION BY PERSON, VISITOR ORDER BY Date DESC) row_num,
PERSON,
VISITOR as VISITOR
FROM
`TABLE`
ORDER BY
PERSON
Maybe this; I'm not sure I fully understand the question. Sample data and expected results would help.
You said you want one row per person, with the visitor on that person's max date, so the row with row_num = 1 is the record with the max date. Since we partition by person only, it does not matter if person A had 3 visitors: only the person and their most recent visitor are listed.
WITH cte as (
SELECT ROW_NUMBER() OVER (PARTITION BY PERSON ORDER BY Date DESC) row_num
, PERSON
, VISITOR as VISITOR
FROM `TABLE`)
SELECT *
FROM cte
WHERE row_Num = 1
I think this can be done with a CROSS APPLY too, though I'm not as good at using them yet (this form uses SQL Server syntax):
SELECT P.Person, C.Visitor, C.Date
FROM (SELECT DISTINCT Person FROM [TABLE]) P
CROSS APPLY (SELECT TOP 1 B.Visitor, B.Date
FROM [TABLE] B
WHERE B.Person = P.Person
ORDER BY B.Date DESC) C
Essentially, the inner query runs once for each person from the outer query and returns only the topmost record, which is the one with the newest date.
select a.* from myTable as a inner join (
SELECT person, max(date) as maxDate from myTable group by person
) as b
on a.date = b.maxDate
and a.person = b.person;
I am weak in reading and writing English.
In my opinion the answer may be:
SELECT `PERSON`, `VISITOR`, MAX(`DATE`) AS `DATE`
FROM `TABLE`
GROUP BY `PERSON`, `VISITOR`;

SQL Server: group by a column and get the first group

I have a table for sending scheduled SMS messages. Some texts have multiple receivers, and records with the same text share the same GroupID. Each time, I should select a maximum of 100 receivers, but all of them must have the same GroupID. For example, if there are 500 records with the same GroupID, I should select 100 records of that group, but if there are only 10 records with that GroupID, I should select just those 10.
I can simply select top 100 to cap the maximum; the problem is that I don't know how to avoid selecting records with other GroupIDs.
I came up with this solution. What do you think?
select top 100 * from ScheduledSms
where GroupID = (select top 1 GroupID from ScheduledSms order by DateAdded)
SELECT TOP 100 receiver
FROM ScheduledSms
WHERE groupid = '...'
Well I used my own solution and it works fine:
select top 100 * from ScheduledSms
where GroupID = (select top 1 GroupID from ScheduledSms order by DateAdded)

Top results from multiple elections in T-SQL

I have a table with votes from multiple sectors (think of each sector as a state or similar) on multiple candidates. Each sector has multiple candidates, each with a different vote count.
Here is my table (simplified)
CREATE TABLE [Results]
(
[SectorID] BIGINT,
[CandidateID] BIGINT,
[VoteCount] BIGINT,
[Newness] DATETIME
)
I obviously keep sector and candidate meta data in another table, but I need to find the highest voted candidate for each sector, so I can join those tables into a view.
The best candidate in each sector, is determined by [VoteCount], and if there are two with the same number of votes, it is determined by [Newness]. There must remain exactly one line per sector, and I have to be able to use it in a view, joined together with the meta data.
How do I obtain the highest voted candidate from each sector?
Assuming that you are using SQL Server 2005 or greater, you want to do this with row_number():
select r.*
from (select r.*,
row_number() over (partition by SectorId order by VoteCount desc, Newness desc) as seqnum
from Results r
) r
where seqnum = 1
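Because the question asks for something usable in a view, the query above can be wrapped directly. A minimal sketch, assuming the Results table from the question and assuming the candidate column is named CandidateID, as in the answers; the view name SectorWinners is hypothetical:

```sql
-- Hypothetical view name; assumes Results(SectorID, CandidateID, VoteCount, Newness).
CREATE VIEW SectorWinners
AS
SELECT r.SectorID, r.CandidateID, r.VoteCount, r.Newness
FROM (SELECT res.SectorID, res.CandidateID, res.VoteCount, res.Newness,
             ROW_NUMBER() OVER (PARTITION BY res.SectorID
                                ORDER BY res.VoteCount DESC, res.Newness DESC) AS seqnum
      FROM Results res
     ) r
WHERE r.seqnum = 1;  -- keeps exactly one row per sector
```

The view can then be joined to the sector and candidate metadata tables on SectorID and CandidateID.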
This would work and can be joined:
SELECT
TR.SectorID, TR.CandidateID
FROM
tblResults TR
INNER JOIN ...
INNER JOIN ...
WHERE CandidateID =
(
SELECT TOP 1 CandidateID
FROM tblResults TRSUB
WHERE TRSUB.SectorID = TR.SectorID
ORDER BY VoteCount DESC, Newness DESC
)
