SQL: SELECT n% with pictures, (100-n)% without pictures - sql-server

We have a DB which stores users who may have pictures.
I am looking for an elegant way in SQL to get the following results:
Select n users. Of those n users, e.g. 60% should have an associated picture and 40% should not. If fewer than 60% of the users have a picture, the result should be filled up with users without a picture.
Is there some elegant way to do this in SQL without firing multiple SELECTs at the DB?
Thank you very much.

So you provide @n, being the number of users you want.
You provide @x, being the percentage of those users who should have pictures.
select top (@n) *
from
(
select top (@n * @x / 100) *
from users
where picture is not null
union all
select top (@n) *
from users
where picture is null
) u
order by case when picture is not null then 1 else 2 end;
So... you want at most @n * @x / 100 users who have pictures, and the rest have to be people who don't have pictures. So I'm doing a 'union all' between my @n*@x/100 picture-people and enough others to complete my @n. Then I'm selecting them back, ordering my TOP to make sure that I keep the people who have a picture.
Rob
Edited: Actually, this would be better:
select top (@n) *
from
(
select top (@n * @x / 100) *, 0 as NoPicture
from users
where picture is not null
union all
select top (@n) *, 1 as NoPicture
from users
where picture is null
) u
order by NoPicture;
...because it removes the impact of the ORDER BY.
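The approach above can be tried end to end without a SQL Server instance. What follows is only a minimal sketch using Python's built-in sqlite3 module, where LIMIT stands in for TOP. The users(id, picture) schema and the names pick_users and make_db are assumptions made for illustration; they are not part of the original answer.

```python
import sqlite3

def pick_users(conn, n, pct):
    """Return up to n users, at most pct percent of them with a picture."""
    with_picture = n * pct // 100  # integer cap, like TOP (@n * @x / 100)
    sql = """
        SELECT id, picture
        FROM (
            -- capped set of users that have a picture
            SELECT * FROM (
                SELECT id, picture, 0 AS no_picture
                FROM users WHERE picture IS NOT NULL
                ORDER BY id LIMIT ?
            )
            UNION ALL
            -- enough picture-less users to fill all n slots if needed
            SELECT * FROM (
                SELECT id, picture, 1 AS no_picture
                FROM users WHERE picture IS NULL
                ORDER BY id LIMIT ?
            )
        )
        ORDER BY no_picture, id   -- picture users are kept first
        LIMIT ?
    """
    return conn.execute(sql, (with_picture, n, n)).fetchall()

def make_db(with_pic, without_pic):
    """Hypothetical demo table: users(id, picture)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, picture TEXT)")
    rows = [("p%d.png" % i,) for i in range(with_pic)] + [(None,)] * without_pic
    conn.executemany("INSERT INTO users (picture) VALUES (?)", rows)
    return conn
```

With 10 picture users and 10 others, pick_users(conn, 10, 60) returns 6 rows with a picture and 4 without. With only 3 picture users, the remaining 7 slots are filled by users without a picture.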

Ugly code:
SELECT TOP (@n) * FROM
(
-- We start by selecting users who have a picture (ordered by HasPicture).
-- If there are no more users with a picture, this query will fill the
-- remaining rows with users without a picture.
SELECT * FROM
(
SELECT TOP 60 PERCENT * FROM tbUser
ORDER BY HasPicture DESC
) p
UNION
-- This is to make sure that we select at least 40% users without a picture.
-- AT LEAST because in the first query it is possible that users without a
-- picture have been selected.
SELECT TOP 40 PERCENT * FROM tbUser
WHERE HasPicture = 0
-- We need to avoid duplicates because in the first select query we haven't
-- specified HasPicture = 1 (and we didn't want to).
AND UserID NOT IN
(
SELECT TOP 60 PERCENT UserID FROM tbUser
ORDER BY HasPicture DESC
)
) u

SELECT TOP(n) HasPicture --should be 0 or 1 to allow ORDER
FROM Users
ORDER BY 1

Use a CASE expression in the SELECT for this type of requirement.

Related

Access SubQuery: SHOW TOP (count from select query) Table

Is it possible to use a Count() or number from another Select query to SELECT TOP a number of rows in a different query?
Below is a sample of the update query I'm trying to use but would like to take the count from another query to replace "10".
...
WHERE Frames.Package IN (
SELECT TOP 10 Frames
FROM Frames.Package WHERE Package = "100"
ORDER BY Frames.ReferenceNumber
)
So, for example, I've tried to do
SELECT TOP SelectQuery.RecordCount Frames
Sample SelectQuery.RecordCount
SELECT COUNT(Frames.Package) AS RecordCount
FROM Frames
HAVING Frames.Package = "100";
Any assistance would be appreciated...
Access does not support using a parameter for SELECT TOP. You must write a literal value into the text of the SQL statement.
From another answer: Select TOP N not working in MS Access with parameter
On that note, your two queries appear to be just interchanging HAVING and WHERE clauses to get the record count. It doesn't seem to be doing anything more, thus why bother with the TOP clause and simply SELECT * FROM Frames WHERE [..]?
Am I missing something?

Average data in its own row

I have data that returns the same value multiple times in one column, I only want to include the first value or even average the group, since they are all the same value. The group itself might have 3 rows of payments, but the payments are the same. I just want the three rows to show, but only the one payment in its own column.
In the data below I would like to add another column that averages Rich and Bob's value and inputs the amount in the top row for Rich and Bob.
Sample Data:
1 Rich 300
2 Rich 300
3 Rich 300
4 Bob 250
5 Bob 250
You probably want something like this:
Just paste this into an empty query window and execute. Adapt to your needs...
DECLARE @tbl TABLE(ID INT, PersonName VARCHAR(100),Amount DECIMAL(6,2))
INSERT INTO @tbl VALUES
(1,'Rich',300)
,(2,'Rich',300)
,(3,'Rich',300)
,(4,'Bob',250)
,(5,'Bob',250);
WITH NumberedPerson AS
(
SELECT tbl.*
,ROW_NUMBER() OVER(PARTITION BY PersonName ORDER BY ID) PersonID
,AVG(Amount) OVER(PARTITION BY PersonName) PersonAvg
FROM @tbl AS tbl
)
SELECT *
,CASE WHEN PersonID=1 THEN PersonAvg ELSE NULL END AS AverageInFirstRow
FROM NumberedPerson
ORDER BY ID
But, to be honest, that is absolutely not how this should be done...

Stuck on a ranking algorithm

I'm stuck on an algorithm that I have been working on for a few days. It's something like this:
I have lots of posts, and people may like or dislike them. On a scale from 0 to 100, the algorithm shows the most liked posts first. But when new posts arrive, they don't have any score yet, so they end up at the bottom of the ranking. What I did: when a post doesn't have any votes, I give it a default score (for example, 75).
When the first user likes this new post, it gets the top score (100), but when the user dislikes it, it goes to the end of the list (score 0).
What can I do to rank liked posts based on the total number of users who liked them?
If I wasn't clear enough, please tell me.
Any help will be appreciated.
What I had done so far:
select id,(
(select cast(count(1) as float) from posts p1 where p1.id = p.id and liked = 1) /
(select cast(count(1) as float) from posts p2 where p2.id = p.id)
)*100 AS value
from posts p
group by id
My solution to this problem is to subtract a standard error from the estimated values. I would treat the variable in question as the proportion of likes among all responses to the post. The standard error is: sqrt(plikes * (1 - plikes)/(likes + notlikes)).
In SQL, this would be something like:
select id,
(avg(liked * 1.0) - sqrt(avg(liked * 1.0) * avg(1.0 - liked) / count(*))) as like_lowerbound
from posts
group by id;
Subtracting one standard error is somewhat arbitrary, although there is a statistical basis for it. I have found that this works pretty well in practice.
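To see how this lower bound behaves for different vote counts, here is a small sketch in Python of the same formula. The function name like_lower_bound and the choice to return 0.0 for a post with no votes are assumptions for illustration, not part of the answer above.

```python
import math

def like_lower_bound(likes, dislikes):
    """Proportion of likes minus one standard error of that proportion."""
    total = likes + dislikes
    if total == 0:
        return 0.0  # assumption: unrated posts get the lowest score
    p = likes / total
    standard_error = math.sqrt(p * (1 - p) / total)
    return p - standard_error

# Same 90% like rate, but more votes give a tighter (higher) bound.
few_votes = like_lower_bound(9, 1)
many_votes = like_lower_bound(90, 10)
```

Note that when every vote is a like (or every vote is a dislike) the standard error is zero, so a post with a single like still scores 1.0. If that matters, it would need to be combined with a minimum vote count or a similar safeguard.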
I don't know if I understand correctly what you want, but in any case, here is my answer. A ranking system may be based on the proportion of positive votes (likes), that is, rank = number_of_likes / (number_of_likes + number_of_dislikes).
In SQL, you have something like this:
SELECT id, (likes * 1.0 / (likes + dislikes)) as rank FROM posts order by rank desc;
You can multiply by 100 if you need the result to be in [0, 100] instead of [0, 1].
I would write two queries combined with UNION ALL instead of something so complicated.
-- First query: select the fresh posts.
-- Also include a column that can be used in the ORDER BY clause,
-- or use your own indicator.
Select col1,col2 .......,1 Indicator from post where blah blah
Union all
-- Second query: select the most popular posts.
Select col1,col2 .......,2 Indicator from post where blah blah
Then in the front end you can easily identify the two groups and filter them.
It is also easy to maintain and quite fast.
Thanks for the help, but I solved my problem this way:
I kept the original query and added a clause:
select id,(
(select cast(count(1) as float) from posts p1 where p1.id = p.id and liked = 1) /
(select cast(count(1) as float) from posts p2 where p2.id = p.id)
)*100 AS value,
(select count(1) from posts p3 where p3.id = p.id) as qty
from posts p
group by id
having count(1) > 5
So, if a new post comes in, it will keep the default value until the fifth user rates it. If the content is truly bad, it goes to the end of the list; otherwise it will stay on top until other users rate it down.
It may not be the perfect solution, but it worked for me.

Show records where most recent 'x' records meet criteria

Here's a simplified SQLFiddle example of data
Basically, I'm looking to identify records in a login audit table where the most recent 'x' records for each user (let's say 3, for this example) are all failed logins.
I am able to get this data for individual users by doing a SELECT TOP 3 and ordering by the log date in descending order and evaluating those records, but I know there's got to be a better way to do this.
I have tried a few queries using ROW_NUMBER(), partitioning by UserName and Success and ordering by LogDate, but I can't quite get it to do what I want. Essentially, every time a successful login occurs, I want the failed login counter to be reset.
Try this code:
select * from (
select distinct a.UserName,
(select sum(cast(Success as int)) from (
SELECT TOP 3 Success --- here 3, change it to your number
FROM tbl as b
WHERE b.UserName=a.UserName
ORDER BY LogDate DESC
) as q
having count(*) >= 3 --- this line removes users who made fewer than 3 attempts
) as cnts
from tbl as a
) as q2
where q2.cnts=0
It shows users whose last 3 attempts all failed. With small modifications, you can use this approach to find how many successful or failed attempts were made during the last N rows.
NOTE: this query works, but it is not optimal. "from tbl as a" should be changed to a table where only users are stored, so you can get rid of the DISTINCT. Also, store the user ID instead of the username in tbl.
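The question mentions ROW_NUMBER(), so here is one possible way to express the same check with a window function. This is a sketch rather than the method of the answer above. It runs against Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer) and assumes a table tbl(UserName, LogDate, Success) with Success stored as 0 or 1. The function and helper names are hypothetical.

```python
import sqlite3

def users_with_recent_failures(conn, x):
    """Users whose x most recent login attempts all failed."""
    sql = """
        SELECT UserName
        FROM (
            SELECT UserName, Success,
                   ROW_NUMBER() OVER (
                       PARTITION BY UserName ORDER BY LogDate DESC
                   ) AS rn            -- 1 = most recent attempt
            FROM tbl
        )
        WHERE rn <= ?                 -- keep only the last x attempts
        GROUP BY UserName
        HAVING COUNT(*) = ?           -- user has at least x attempts
           AND SUM(Success) = 0       -- and none of them succeeded
        ORDER BY UserName
    """
    return [row[0] for row in conn.execute(sql, (x, x))]

def make_audit_db(rows):
    """Hypothetical demo table matching the assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tbl (UserName TEXT, LogDate TEXT, Success INTEGER)"
    )
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
    return conn
```

Because only the latest x rows per user are counted, any successful login within that window excludes the user, which gives the "reset on success" behaviour asked for in the question.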

How to limit results based on the fields

How do I use LIMIT in a MySQL query on a per-field basis? Say a user has 10 phone numbers and I would like to get only 5 phone numbers for that user.
I would like to get only 5 phone numbers per user, not just 5 results from the database.
Hmmm... this should be the same (regarding your last sentence):
SELECT phone_number FROM phone_numbers WHERE user_id = XX LIMIT 5
But if you're looking for a sub-select within another query, then you will have to use a JOIN and can't LIMIT this JOIN. So the way to go would be to first select the users and then - on a per-user-basis - select 5 phone numbers for every user.
You can use a MySQL simulation of "ROW_NUMBER() OVER (PARTITION BY ...)" for the situation Augenfeind described (if you make a join and select no more than 5 phone numbers for each user).
See here to understand the logic: http://www.xaprb.com/blog/2005/09/27/simulating-the-sql-row_number-function/
This would be something like:
SELECT * FROM
users u JOIN
(
select l.user_id, l.phone, count(*) as num
from phones as l
left outer join phones as r
on l.user_id = r.user_id
and l.phone >= r.phone
group by l.user_id, l.phone
) p
ON u.user_id=p.user_id AND p.num<=5
Hope that helps.
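To check how the self-join ranking behaves, the query above can be run against a small in-memory database. This is a minimal sketch using Python's built-in sqlite3 module rather than MySQL. The users(user_id) and phones(user_id, phone) schemas and the function names are assumptions, and the ranking only works cleanly if a user's phone numbers are distinct.

```python
import sqlite3

def first_phones_per_user(conn, limit):
    """Return at most `limit` phone numbers per user (smallest first)."""
    sql = """
        SELECT u.user_id, p.phone
        FROM users AS u
        JOIN (
            -- num = how many of the user's phones sort at or before this one
            SELECT l.user_id, l.phone, COUNT(*) AS num
            FROM phones AS l
            LEFT OUTER JOIN phones AS r
              ON l.user_id = r.user_id AND l.phone >= r.phone
            GROUP BY l.user_id, l.phone
        ) AS p
          ON u.user_id = p.user_id AND p.num <= ?
        ORDER BY u.user_id, p.phone
    """
    return conn.execute(sql, (limit,)).fetchall()

def make_phone_db(phone_counts):
    """Hypothetical demo data: {user_id: number_of_phones}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE phones (user_id INTEGER, phone TEXT)")
    for user_id, count in phone_counts.items():
        conn.execute("INSERT INTO users VALUES (?)", (user_id,))
        conn.executemany(
            "INSERT INTO phones VALUES (?, ?)",
            [(user_id, "555-%04d" % i) for i in range(1, count + 1)],
        )
    return conn
```

For a user with 7 numbers and another with 2, first_phones_per_user(conn, 5) returns 5 rows for the first user and 2 for the second. The self-join compares every pair of a user's numbers, so it may get slow for users with very many rows.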
