SQL Server: group by a column and get the first group


I have a table for sending scheduled SMS messages. Some texts have multiple receivers, and records with the same text share the same GroupID. Each time, I should select a maximum of 100 receivers, but all of them must have the same GroupID. For example, if there are 500 records with the same GroupID I should select 100 records of that group, but if there are only 10 records with that GroupID I should select just those 10 records.
Well, I can simply use SELECT TOP 100 to enforce the maximum; the problem is that I don't know how to avoid selecting records with other GroupIDs.
I came up with this solution. What do you think?
select top 100 * from ScheduledSms
where GroupID = (select top 1 GroupID from ScheduledSms order by DateAdded)

SELECT TOP 100 receiver
FROM ScheduledSms
WHERE GroupID = '...'

Well, I used my own solution and it works fine:
select top 100 * from ScheduledSms
where GroupID = (select top 1 GroupID from ScheduledSms order by DateAdded)
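To try the asker's query without a SQL Server instance, here is a minimal sketch using Python's built-in `sqlite3` as a stand-in. Assumptions: SQLite's `LIMIT` plays the role of `TOP`, and the `Id` and `Receiver` columns plus all data are hypothetical (the question never shows the full schema). An outer `ORDER BY Id` is added so the "first 100" of a large group is deterministic.

```python
import sqlite3

def first_group_batch(conn, limit=100):
    """Return up to `limit` rows that all share the GroupID of the oldest row."""
    # Same shape as the T-SQL query: outer TOP 100 -> LIMIT ?,
    # inner TOP 1 ... ORDER BY DateAdded -> ORDER BY DateAdded LIMIT 1.
    sql = """
        SELECT Id, GroupID, Receiver
        FROM ScheduledSms
        WHERE GroupID = (SELECT GroupID FROM ScheduledSms
                         ORDER BY DateAdded LIMIT 1)
        ORDER BY Id
        LIMIT ?
    """
    return conn.execute(sql, (limit,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE ScheduledSms (
    Id INTEGER PRIMARY KEY, GroupID INTEGER, Receiver TEXT, DateAdded TEXT)""")
# Hypothetical data: group 7 was added first and has 250 receivers,
# group 8 was added later and has 10 receivers.
rows = [(7, f"r{i}", "2020-01-01") for i in range(250)]
rows += [(8, f"s{i}", "2020-01-02") for i in range(10)]
conn.executemany(
    "INSERT INTO ScheduledSms (GroupID, Receiver, DateAdded) VALUES (?, ?, ?)", rows)

batch = first_group_batch(conn)
print(len(batch), {g for _, g, _ in batch})  # 100 {7}
```

Once the oldest group has been sent and deleted, the same call returns the next group, with fewer than 100 rows if that group is small.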

Related

Get top 10 unique vendor data based on sub query which return vendor id

I have two tables, FileMaster and VendorMaster.
VendorMaster holds the vendor ID and other details. FileMaster holds file-related data, and VendorId is a foreign key in FileMaster. Now I want to fetch the top 10 rows from FileMaster, with one record per vendor.
I have tried the query below, but it returns 10 records with duplicate VendorIds:
select top 10 * from FileMaster where VendorId in (select top 10 VendorId from VendorMaster)

You can use ROW_NUMBER. I assumed a FileID column as the identity column of FileMaster. By the way, you don't need any subquery:
SELECT TOP 10 * FROM (
select *,
ROW_NUMBER() OVER(PARTITION BY VendorID ORDER BY FileID) AS RN
FROM FileMaster ) AS T
WHERE RN = 1
ORDER BY FileID
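The ROW_NUMBER answer can be checked with a small sketch in Python's `sqlite3`, used here only as a stand-in for SQL Server (`LIMIT 10` replaces `TOP 10`; window functions need SQLite 3.25 or later). The `Name` column and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FileMaster (FileID INTEGER PRIMARY KEY, VendorID INTEGER, Name TEXT)")
# Hypothetical data: 15 vendors with 3 files each. The first pass inserts
# one file per vendor, so FileIDs 1..15 belong to vendors 1..15.
conn.executemany(
    "INSERT INTO FileMaster (VendorID, Name) VALUES (?, ?)",
    [(v, f"file-{v}-{n}") for n in range(3) for v in range(1, 16)],
)

# ROW_NUMBER restarts at 1 for every VendorID, so RN = 1 keeps exactly one
# file (the lowest FileID) per vendor; LIMIT 10 plays the role of TOP 10.
sql = """
    SELECT FileID, VendorID FROM (
        SELECT FileID, VendorID,
               ROW_NUMBER() OVER (PARTITION BY VendorID ORDER BY FileID) AS RN
        FROM FileMaster) AS T
    WHERE RN = 1
    ORDER BY FileID
    LIMIT 10
"""
result = conn.execute(sql).fetchall()
vendor_ids = [v for _, v in result]
print(vendor_ids)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

Each vendor appears once, which is exactly what the `IN (select top 10 VendorId ...)` attempt in the question could not guarantee.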

Here is a simpler way, without a subquery (alternatively, you can use a CTE):
SELECT DISTINCT FM.*
FROM FileMaster FM WITH(NOLOCK)
INNER JOIN [dbo].[VendorMaster] VM WITH(NOLOCK)
ON FM.VendorId = VM.VendorId
ORDER BY FM.VendorId ASC
OFFSET 0 ROWS
FETCH NEXT 10 ROWS ONLY
For more details, see the SQL Server documentation on OFFSET and FETCH.

SQL performance problem: Select N rows until find distinct 200 customer

I have a table ORDERS with these columns:
id | Customer | product
I want to calculate the number N of rows in the ORDERS table, where N is just large enough to contain 200 distinct customers counting from the bottom of the table. I have written the following query using MAX(id), but it takes many seconds to run. I don't think it is optimized: I have thousands of rows, and every time I have to GROUP BY the whole table just to find one id:
select count(*) as N from ORDERS where id > (
select top 1 maxid from
(select distinct top 200 CUSTOMER,max(id) as maxid from ORDERS group by CUSTOMER order by maxid desc) x
order by maxid asc
)
Is there another way to handle this with better performance?

This is a huge guess, but this will at least return a result:
WITH Customers AS(
SELECT TOP 200
CUSTOMER,
MAX(id) AS MaxID
FROM ORDERS
GROUP BY CUSTOMER
ORDER BY MaxID DESC)
SELECT COUNT(*)
FROM ORDERS O
WHERE EXISTS (SELECT 1
FROM Customers C
WHERE C.MaxID = O.id);
If it returns the correct results but still runs slowly, post the DDL for your table and its indexes. I also suggest sharing the query plan using Paste the Plan.
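To make the target value concrete, here is a sketch of what the question's query is trying to compute, run in Python's `sqlite3` as a stand-in for SQL Server (`LIMIT 1 OFFSET k-1` picks the k-th newest per-customer id, in place of the nested `TOP`). It illustrates the logic only; it is not a performance fix, and the data is hypothetical. One deliberate difference, stated as an assumption about intent: it compares with `>=`, because the question's `>` leaves out the row that brings in the 200th customer.

```python
import sqlite3

def rows_needed(conn, k):
    """Rows to count back from the highest id until k distinct customers are
    included; 0 if the table has fewer than k customers."""
    sql = """
        WITH latest AS (        -- each customer's newest order id
            SELECT Customer, MAX(id) AS maxid
            FROM ORDERS
            GROUP BY Customer),
        cutoff AS (             -- the k-th newest of those ids
            SELECT maxid FROM latest
            ORDER BY maxid DESC
            LIMIT 1 OFFSET ?)
        SELECT COUNT(*) FROM ORDERS
        WHERE id >= (SELECT maxid FROM cutoff)
    """
    return conn.execute(sql, (k - 1,)).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ORDERS (id INTEGER PRIMARY KEY, Customer TEXT, product TEXT)")
# Hypothetical orders with ids 1..10.
customers = ["a", "b", "a", "c", "b", "d", "a", "e", "e", "b"]
conn.executemany("INSERT INTO ORDERS (Customer, product) VALUES (?, 'x')",
                 [(c,) for c in customers])
print(rows_needed(conn, 3))  # 4: ids 10, 9, 8, 7 cover customers b, e and a
```

With k = 200 this is the N the question asks for; when the cutoff CTE is empty, the comparison is against NULL and the count is 0.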

SQL Server: select random rows with distinct id from table where id is not distinct

I have a simple table named Tickets with the following columns:
ticketId, userId
where ticketId is the primary key and userId is not unique.
A user can therefore have several tickets, each with a unique ticketId.
I'm struggling to solve the following problem: I need to select 5 random tickets belonging to 5 different userIds.
I know how to select the random tickets by using the following query:
SELECT TOP 5 *
FROM Tickets
ORDER BY RAND(CHECKSUM(*) * RAND())
Which returns something like:
TicketId  UserId
--------  ------
10        1
25        1
31        2
42        2
56        3
My question is: what do I need to add to the query so that it selects random rows across distinct userIds, i.e. so that it never returns more than one ticket for the same user?
Keep in mind that I need the best-performing solution, since the table could eventually contain millions of rows.
Thanks in advance,
Christian
Edit:
The more tickets a user has, the higher their chance of being selected. However, the selection should still be random, not simply the users with the most tickets, just like in a lottery.
In other words, it should select 5 random rows among all rows while ensuring that the 5 rows have unique userIds.

Please try something like this, using NEWID():
Select k.UserId, u.TicketId
from
(
SELECT TOP 5 UserId
FROM Tickets
ORDER BY NEWID()
)k
CROSS APPLY
(
select top 1 TicketId
from Tickets T WHERE T.UserId = k.UserId
ORDER BY NEWID()
)u

Edit: As pointed out in the comments, this solution doesn't properly weight the users by number of tickets (so a user with 1000 tickets incorrectly has the same chance of winning as a user with 1 ticket). This was particularly dumb of me, since I had pointed out this problem on other answers.
Given that Steve now has his solution working, I think that is the better answer.
Original answer:
I think something like the following works:
SELECT top 5 ticketid, userid
FROM
(
SELECT ticketid, userid, ROW_NUMBER() OVER (PARTITION BY userid ORDER BY NEWID()) as nid
FROM tickets
) a
WHERE nid = 1
ORDER BY NEWID()
Credit where credit is due: I based this on Steve's solution, which I don't think works correctly as written.

Something like the following, I think.
Please note this code is untested, so please excuse any small syntax errors.
WITH randomised_tickets AS
(
SELECT
*
,ROW_NUMBER() OVER (ORDER BY NEWID() ASC) AS random_order
FROM Tickets
)
,ordered_winning_tickets AS
(
SELECT
*
,ROW_NUMBER() OVER (PARTITION BY userId ORDER BY random_order ASC) AS user_win_order
FROM randomised_tickets
)
SELECT TOP 5
*
FROM
ordered_winning_tickets
WHERE
user_win_order = 1 --eliminate 2nd wins from the list
ORDER BY
random_order
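Here is a runnable sketch of the same two-CTE idea in Python's `sqlite3`, used only as a stand-in for SQL Server: `random()` replaces `NEWID()`, `LIMIT` replaces `TOP`, and window functions need SQLite 3.25 or later. Every ticket gets a random position; a user's first position is the best of all their tickets' positions, so users with more tickets tend to appear earlier, which gives the lottery-style weighting asked for. The data is hypothetical, and this says nothing about performance on millions of rows in SQL Server.

```python
import sqlite3

def draw_winners(conn, n=5):
    """Pick up to n tickets with distinct users, weighted by ticket count."""
    sql = """
        WITH randomised_tickets AS (
            -- every ticket gets a random position in the draw
            SELECT ticketId, userId,
                   ROW_NUMBER() OVER (ORDER BY random()) AS random_order
            FROM Tickets),
        ordered_winning_tickets AS (
            -- 1 = the user's earliest-drawn ticket; higher numbers are 2nd wins
            SELECT ticketId, userId, random_order,
                   ROW_NUMBER() OVER (PARTITION BY userId
                                      ORDER BY random_order) AS user_win_order
            FROM randomised_tickets)
        SELECT ticketId, userId
        FROM ordered_winning_tickets
        WHERE user_win_order = 1
        ORDER BY random_order
        LIMIT ?
    """
    return conn.execute(sql, (n,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tickets (ticketId INTEGER PRIMARY KEY, userId INTEGER)")
# Hypothetical data: user u owns u tickets (users 1..8, 36 tickets in total).
conn.executemany("INSERT INTO Tickets (userId) VALUES (?)",
                 [(u,) for u in range(1, 9) for _ in range(u)])
winners = draw_winners(conn)
print(winners)
```

The draw is equivalent to pulling tickets out of a hat one at a time and skipping any user who has already won.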

You could try something like this, using IGNORE_DUP_KEY on a temp table to eliminate duplicates for a user:
drop table if exists #WinningTickets
create table #WinningTickets(PickId int identity primary key, TicketId int, UserId int)
create unique index ix_unique_user on #WinningTickets(UserId) with (ignore_dup_key=on)
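-- keep drawing random batches until 5 different users have been kept
-- (note: this never finishes if Tickets has fewer than 5 distinct users)
while ( select count(*) from #WinningTickets ) < 5
begin
insert into #WinningTickets (TicketId, UserId)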
select top 10 TicketId, UserId
from Tickets
order by newid()
end
select top 5 *
from #WinningTickets
order by PickId
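The temp-table loop can be mirrored in plain Python to see how it behaves. This is a hypothetical sketch, not the answer's T-SQL: a dict stands in for `#WinningTickets`, `setdefault` plays the role of the `IGNORE_DUP_KEY` unique index (a later ticket for an already-kept user is silently dropped), and dict insertion order stands in for `PickId`. Unlike the T-SQL loop, it refuses to run when there are fewer distinct users than winners.

```python
import random

def draw_unique_users(tickets, n=5, batch=10, rng=random):
    """tickets: list of (ticket_id, user_id). Repeatedly sample a batch of
    random tickets and keep only the first ticket seen for each user,
    until n users have been picked."""
    if len({user for _, user in tickets}) < n:
        # the T-SQL while loop would spin forever in this case
        raise ValueError("fewer distinct users than winners requested")
    winners = {}  # user_id -> ticket_id; insertion order acts like PickId
    while len(winners) < n:
        # like "select top 10 ... order by newid()": a sample without replacement
        for ticket_id, user_id in rng.sample(tickets, min(batch, len(tickets))):
            winners.setdefault(user_id, ticket_id)  # duplicates are ignored
    # like "select top 5 ... order by PickId"
    return [(t, u) for u, t in list(winners.items())[:n]]

tickets = [(101, 1), (102, 1), (103, 1), (104, 2), (105, 3), (106, 4), (107, 5), (108, 6)]
print(draw_unique_users(tickets, rng=random.Random(42)))
```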

How to find and delete all duplicates from SQL Server database

I'm new to SQL in general and I need to delete all duplicates in a given database.
For the moment, I'm using this database to experiment.
I know I can find all duplicates using this query:
SELECT COUNT(*) AS NBR_DOUBLES, Name, Owner
FROM dbo.animals
GROUP BY Name, Owner
HAVING COUNT(*) > 1
but I'm having a lot of trouble finding an up-to-date solution that not only finds all the duplicates but also deletes them, leaving only one of each.
Thanks a lot for taking the time to help me.

;WITH numbered AS (
SELECT ROW_NUMBER() OVER(PARTITION BY Name, Owner ORDER BY Name, Owner) AS _dupe_num
FROM dbo.Animals
)
DELETE FROM numbered WHERE _dupe_num > 1;
This will delete all but one occurrence of each Name & Owner combination. If you need it to be more specific, extend the PARTITION BY clause; if you want it to take the entire record into account, add all your columns.
The record left behind is currently arbitrary, since you don't seem to have any column to order on.

What you want to do is use a projection that numbers each record within a given duplicate set. You can do that with a Windowing Function, like this:
SELECT Name, Owner
,Row_Number() OVER ( PARTITION BY Name, Owner ORDER BY Name, Owner, Birth) AS RowNum
FROM dbo.animals
ORDER BY Name, Owner
This should give you results like this:
Name     Owner  RowNum
Ecstasy  Sacha  1
Ecstasy  Sacha  2
Ecstasy  Sacha  3
Gremlin  Max    1
Gremlin  Max    2
Gremlin  Max    3
Outch    Max    1
Outch    Max    2
Outch    Max    3
Now you want to convert this to a DELETE statement that has a WHERE clause targeting rows with RowNum > 1. The way to use a windowing function with a DELETE is to first include the windowing function as part of a common table expression (CTE), like this:
WITH dupes AS
(
SELECT Name, Owner,
Row_Number() OVER ( PARTITION BY Name, Owner ORDER BY Name, Owner, Birth) AS RowNum
FROM dbo.animals
)
DELETE FROM dupes WHERE RowNum > 1;
This will delete later duplicates, but leave row #1 for each group intact. The only trick now is to make sure row #1 is the correct row, since not all of your duplicates have the same values for the Birth or Death columns. This is the reason I included the Birth column in the windowing function, while other answers (so far) have not. You need to decide if you want to keep the oldest animal or the youngest, and optionally change the Birth order in the OVER clause to match your needs.
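The numbered-delete pattern can be tried in Python's `sqlite3` as a stand-in for SQL Server. One real difference: SQLite cannot `DELETE` through a CTE the way SQL Server can, so this sketch numbers the rows in a subquery and matches them back by `rowid`. Ordering by `Birth` keeps the oldest animal in each group, as discussed above; the names come from the sample output, but the birth dates are hypothetical. Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE animals (Name TEXT, Owner TEXT, Birth TEXT)")
# Hypothetical rows; the dates are invented for the example.
conn.executemany("INSERT INTO animals VALUES (?, ?, ?)", [
    ("Ecstasy", "Sacha", "2015-03-01"),
    ("Ecstasy", "Sacha", "2012-06-10"),
    ("Gremlin", "Max", "2018-01-05"),
    ("Gremlin", "Max", "2016-09-20"),
    ("Gremlin", "Max", "2017-11-30"),
    ("Outch", "Max", "2019-04-04"),
])

# Number each row inside its (Name, Owner) group, oldest Birth first,
# then delete every row that is not number 1.
conn.execute("""
    DELETE FROM animals
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (PARTITION BY Name, Owner
                                      ORDER BY Birth) AS RowNum
            FROM animals)
        WHERE RowNum > 1)
""")
remaining = conn.execute("SELECT Name, Owner, Birth FROM animals ORDER BY Name").fetchall()
print(remaining)
# [('Ecstasy', 'Sacha', '2012-06-10'), ('Gremlin', 'Max', '2016-09-20'), ('Outch', 'Max', '2019-04-04')]
```

Switching to `ORDER BY Birth DESC` would keep the youngest animal instead.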

Use a CTE. Here is a sample:
Create table #Table1(Field1 varchar(100));
Insert into #Table1 values
('a'),('b'),('f'),('g'),('a'),('b');
Select * from #Table1;
WITH CTE AS(
SELECT Field1,
RN = ROW_NUMBER()OVER(PARTITION BY Field1 ORDER BY Field1)
FROM #Table1
)
--SELECT * FROM CTE WHERE RN > 1
DELETE FROM CTE WHERE RN > 1
What I'm doing is numbering the rows: rows that are duplicates on the PARTITION BY columns are numbered sequentially, and unique rows get 1.
Then I delete the rows whose number is greater than 1.
I won't spoon-feed you the solution, so you will have to play with PARTITION BY to get your output.
Output:
Select * from #Table1;
Field1
---------
a
b
f
g
a
b
/*with cte as (...) SELECT * FROM CTE;*/
Field1 RN
------- -----
a 1
a 2
b 1
b 2
f 1
g 1

If dbo.animals had an ID column, I believe you could use this:
DELETE FROM dbo.animals WHERE ID NOT IN
(
SELECT MAX(ID) -- the one row kept for each Name/Owner pair
FROM dbo.animals
GROUP BY Name, Owner
)
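A quick check of the ID-based idea in Python's `sqlite3` (a stand-in for SQL Server). It assumes a hypothetical integer `ID` column, which the question's table may not have. Deleting every row whose ID is `NOT IN` the per-group `MAX(ID)` keeps exactly one row per (Name, Owner); deleting only the rows `IN (SELECT MAX(ID) ...)` would remove just one copy per group per run, so a group with three copies would still have two.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema with an explicit integer ID.
conn.execute("CREATE TABLE animals (ID INTEGER PRIMARY KEY, Name TEXT, Owner TEXT)")
conn.executemany("INSERT INTO animals (Name, Owner) VALUES (?, ?)", [
    ("Ecstasy", "Sacha"), ("Ecstasy", "Sacha"), ("Ecstasy", "Sacha"),  # IDs 1-3
    ("Gremlin", "Max"),                                                # ID 4
    ("Outch", "Max"), ("Outch", "Max"),                                # IDs 5-6
])

# Keep the highest ID in each (Name, Owner) group and delete everything else.
conn.execute("""
    DELETE FROM animals
    WHERE ID NOT IN (SELECT MAX(ID) FROM animals GROUP BY Name, Owner)
""")
kept = conn.execute("SELECT ID, Name, Owner FROM animals ORDER BY ID").fetchall()
print(kept)  # [(3, 'Ecstasy', 'Sacha'), (4, 'Gremlin', 'Max'), (6, 'Outch', 'Max')]
```

No `HAVING COUNT(*) > 1` is needed here: a group with a single row simply keeps that row.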

How to return top 100 rows by a column value and then randomize these top 100 rows?

I'm using MS SQL, and I've managed to create a query that selects 100 random rows like this:
SELECT TOP 100 * FROM Inlagg ORDER BY NEWID()
I've also managed to create a query that returns the top 100 rows according to the Likes column, like this:
SELECT TOP 100 * FROM Inlagg ORDER BY Likes DESC
My question now is: how can I take these top 100 rows by Likes and then randomize them?
Any help or input is highly appreciated, thanks!

You can use something like
SELECT *
FROM (SELECT TOP 100 * FROM Inlagg ORDER BY Likes DESC) as T
ORDER BY NEWID()
or (for those who prefer common table expressions to subqueries)
WITH CTE_TOP as (SELECT TOP 100 * FROM Inlagg ORDER BY Likes DESC)
SELECT * FROM CTE_TOP ORDER BY NEWID();
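The two-step idea (pick the top rows in an inner query, shuffle only those in the outer query) can be run in Python's `sqlite3`, used here as a stand-in for SQL Server: `LIMIT 100` replaces `TOP 100` and `random()` replaces `NEWID()`. The `InlaggId` column and the data are hypothetical.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Inlagg (InlaggId INTEGER PRIMARY KEY, Likes INTEGER)")
# Hypothetical data: 300 posts with distinct like counts 0..299, in shuffled order.
likes = list(range(300))
random.shuffle(likes)
conn.executemany("INSERT INTO Inlagg (Likes) VALUES (?)", [(n,) for n in likes])

# Inner query: the 100 most-liked posts (TOP 100 ... ORDER BY Likes DESC).
# Outer query: shuffle just those rows (ORDER BY NEWID() becomes random()).
sql = """
    SELECT InlaggId, Likes
    FROM (SELECT InlaggId, Likes FROM Inlagg ORDER BY Likes DESC LIMIT 100) AS T
    ORDER BY random()
"""
rows = conn.execute(sql).fetchall()
print(len(rows), min(l for _, l in rows))  # 100 200
```

The set of rows never changes between runs; only their order does.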

Maybe this also works:
select *
from Inlagg t1
inner join
(
select distinct top 100 Likes
from Inlagg
order by Likes desc
) t2
on t1.Likes = t2.Likes -- may return more than 100 rows when posts share a Likes value
order by newid()
Sorry, I'm unable to post comments; maybe the Java API isn't supported by my browser. Why wouldn't it work? It returns the top records based on the ORDER BY. As for performance, if this column has a clustered or nonclustered index, scans and lookups will be reduced. I'm just suggesting another way, not the exact solution.
