I've been stuck for a few days on an algorithm I'm working on. It goes something like this:
I have lots of posts, and people may like or dislike them. On a scale from 0 to 100, the algorithm shows the most liked posts first. But when new posts arrive, they don't have any score yet, so they end up at the bottom of the ranking. What I did: when a post doesn't have any votes, I give it a default score (for example, 75).
When the first user likes the new post, it gets the full score (100), but if that user dislikes it, it drops to the bottom of the list (score 0).
How can I rank liked posts so that the ranking also takes into account the total number of users who liked them?
If I wasn't clear enough, please tell me.
Any help will be appreciated.
What I have done so far:
select id,(
(select cast(count(1) as float) from posts p1 where p1.id = p.id and liked = 1) /
(select cast(count(1) as float) from posts p2 where p2.id = p.id)
)*100 AS value
from posts p
group by id
My solution to this problem is to subtract one standard error from the estimated value. I would treat the variable in question as the proportion of likes among all responses to the post. The standard error is sqrt(plikes * (1 - plikes) / (likes + notlikes)), where plikes = likes / (likes + notlikes).
In SQL, this would be something like:
select id,
       avg(liked * 1.0) - sqrt(avg(liked * 1.0) * avg(1.0 - liked) / count(*)) as like_lowerbound
from posts
group by id
order by like_lowerbound desc;
Subtracting one standard error is somewhat arbitrary, although there is a statistical basis for it. I have found that this works pretty well in practice.
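To see the effect, here is a minimal sketch that runs the lower-bound query against an in-memory SQLite database from Python. It assumes, as in the question's query, a posts table with one row per vote (id of the post, liked = 1 or 0); the votes are made up. Not every SQLite build includes a sqrt() function, so the sketch registers Python's math.sqrt under that name.

```python
import math
import sqlite3

# In-memory database; assumes one row per vote: posts(id, liked), liked = 1 or 0.
conn = sqlite3.connect(":memory:")
conn.create_function("sqrt", 1, math.sqrt)  # not every SQLite build ships sqrt()
conn.execute("CREATE TABLE posts (id INTEGER, liked INTEGER)")

# Made-up votes: both posts are liked 75% of the time,
# but post 1 has only 4 votes while post 2 has 80.
votes = [(1, 1)] * 3 + [(1, 0)] + [(2, 1)] * 60 + [(2, 0)] * 20
conn.executemany("INSERT INTO posts (id, liked) VALUES (?, ?)", votes)

# Proportion of likes minus one standard error sqrt(p * (1 - p) / n).
ranking = conn.execute("""
    SELECT id,
           AVG(liked * 1.0)
             - sqrt(AVG(liked * 1.0) * AVG(1.0 - liked) / COUNT(*)) AS like_lowerbound
    FROM posts
    GROUP BY id
    ORDER BY like_lowerbound DESC
""").fetchall()

for post_id, bound in ranking:
    print(post_id, round(bound, 3))
```

Post 2 ranks first (about 0.70 versus 0.53) even though both posts have the same 75% like ratio, because its larger sample gives a smaller standard error. One caveat of this formula: a post whose votes are all likes has a standard error of 0, so a single like still produces a bound of 1.0.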
I'm not sure I understand exactly what you want, but here is my answer anyway. A ranking system can be based on the proportion of positive votes (likes), that is, rank = number_of_likes / (number_of_likes + number_of_dislikes).
In SQL, you have something like this:
SELECT id, likes * 1.0 / NULLIF(likes + dislikes, 0) AS score FROM posts ORDER BY score DESC;
You can multiply by 100 if you need the result to be between [0, 100], instead of [0,1].
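One thing to watch with this formula: if likes and dislikes are integer columns, several databases (for example SQL Server, PostgreSQL and SQLite) perform integer division, so likes / (likes + dislikes) truncates to 0 or 1. A minimal SQLite sketch, assuming a hypothetical posts table that stores per-post likes and dislikes counts, shows the difference. It also wraps the denominator in NULLIF so a post with no votes yields NULL rather than a division-by-zero error in databases that raise one (SQL Server does).

```python
import sqlite3

# Assumes a hypothetical posts table holding per-post vote counts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER, likes INTEGER, dislikes INTEGER)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [(1, 3, 1), (2, 9, 1), (3, 0, 0)],  # post 3 has no votes yet
)

# Integer division: 3 / 4 truncates to 0.
truncated = conn.execute(
    "SELECT id, likes / (likes + dislikes) FROM posts WHERE id = 1"
).fetchone()

# Multiplying by 1.0 forces real division; NULLIF turns a zero denominator into NULL.
scores = conn.execute("""
    SELECT id, likes * 1.0 / NULLIF(likes + dislikes, 0) AS score
    FROM posts
    ORDER BY score DESC
""").fetchall()

print(truncated)
print(scores)
```

The unvoted post sorts last here, which is exactly the problem described in the question: the ratio alone does not solve it.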
I would write two queries combined with UNION ALL instead of going into so much complication.
-- First query: select fresh posts
-- also include one column that can be used in the ORDER BY clause
-- (or use your own indicator)
Select col1, col2 ......., 1 AS Indicator from post where blah blah
Union all
-- Second query: select the most popular posts
Select col1, col2 ......., 2 AS Indicator from post where blah blah
Then in the front end you can easily identify and filter them.
It is also easy to maintain and quite fast.
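A minimal runnable sketch of this idea in SQLite, from Python. The post table, its vote_count and score columns, and the rule "fresh means no votes yet" are all hypothetical stand-ins for the elided where blah blah conditions.

```python
import sqlite3

# Hypothetical post table: vote_count and score (0-100) per post.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER, vote_count INTEGER, score REAL)")
conn.executemany(
    "INSERT INTO post VALUES (?, ?, ?)",
    [(1, 10, 40.0), (2, 0, None), (3, 25, 90.0), (4, 0, None)],
)

rows = conn.execute("""
    -- First query: fresh posts (assumed here to mean no votes yet), indicator 1
    SELECT id, score, 1 AS Indicator FROM post WHERE vote_count = 0
    UNION ALL
    -- Second query: rated posts, indicator 2
    SELECT id, score, 2 AS Indicator FROM post WHERE vote_count > 0
    ORDER BY Indicator, score DESC, id
""").fetchall()

for row in rows:
    print(row)
```

Whether fresh posts come first, are interleaved, or go in a separate section is then a front-end decision; the indicator column just makes that choice easy.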
Thanks for the help, but I solved my problem this way:
I kept the original query with an additional clause:
select id,(
(select cast(count(1) as float) from posts p1 where p1.id = p.id and liked = 1) /
(select cast(count(1) as float) from posts p2 where p2.id = p.id)
)*100 AS value,
(select count(1) from posts p3 where p3.id = p.id) as qty
from posts p
group by id
having count(1) > 5
So if a new post comes in, it keeps the default value until more than five users have rated it. If the content is truly bad, it goes to the bottom of the list; otherwise it stays on top until other users rate it down.
It may not be the perfect solution, but it worked for me.
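If the goal is for posts with few votes to keep the default score instead of being filtered out of the result, a CASE expression can apply the default inside the same grouped query. A minimal SQLite sketch, assuming the question's one-row-per-vote posts(id, liked) layout, the default of 75 and a threshold of more than five votes (matching qty > 5); the votes are made up.

```python
import sqlite3

DEFAULT_SCORE = 75   # default score from the question
MIN_VOTES = 5        # a post needs more than this many votes to be scored

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER, liked INTEGER)")
votes = (
    [(1, 0)] * 2                      # post 1: only 2 votes, keeps the default
    + [(2, 1)] * 7 + [(2, 0)]         # post 2: 7 likes out of 8 votes
    + [(3, 0)] * 6                    # post 3: 6 dislikes
)
conn.executemany("INSERT INTO posts VALUES (?, ?)", votes)

rows = conn.execute("""
    SELECT id,
           CASE WHEN COUNT(*) > ? THEN AVG(liked * 1.0) * 100
                ELSE ? END AS value,
           COUNT(*) AS qty
    FROM posts
    GROUP BY id
    ORDER BY value DESC
""", (MIN_VOTES, DEFAULT_SCORE)).fetchall()

for row in rows:
    print(row)
```

Post 1 keeps the default 75 and stays near the top, while post 3, disliked by six users, drops to 0.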
I am trying to figure out how to tag the SumAmounts as Good projects and Great projects, based on the question asked.
Here is the question and hints provided:
Exercise
Among successful projects, those that raised 100% to 150% of the minimum amount are good projects, whereas those that raised more than 150% are great projects. Show the number of projects (name the column Count) along with a string representing how good the project is (good projects or great projects). Name that column Tag.
The result should look similar to this:
Count Tag
16 Good projects
7 Great projects
Stuck? Here's a hint!
You will need two CTEs: one for those projects which raised 100-150% and another one for those which raised more than 150%. Use HAVING clauses to check the conditions. Use UNION ALL to show the results from both queries.
To create the Tag column use:
N'Good projects' AS Tag
and
N'Great projects' AS Tag
Mind the N – the column should be nvarchar type.
Here is where my code is at:
WITH GoodP AS(
SELECT d.ProjectId, p.MinimalAmount,SUM(d.Amount)AS SumAmount
FROM Donation d
JOIN Project p
ON p.Id = d.ProjectId
GROUP BY d.ProjectId, p.MinimalAmount
HAVING SUM(d.Amount)>= p.MinimalAmount AND SUM(d.Amount) <= p.MinimalAmount*1.5
),
GreatP AS(
SELECT d.ProjectId, p.MinimalAmount,SUM(d.Amount)AS SumAmount
FROM Donation d
JOIN Project p
ON p.Id = d.ProjectId
GROUP BY d.ProjectId, p.MinimalAmount
HAVING SUM(d.Amount)> p.MinimalAmount*1.5
)
SELECT COUNT(ProjectId) AS Count, CASE WHEN SumAmount IS NULL THEN '0' ELSE N'Good projects' END AS Tag
FROM GoodP
Group by SumAmount
UNION ALL
SELECT COUNT(ProjectId) AS Count, CASE WHEN SumAmount IS NULL THEN '0' ELSE N'Great projects' END AS Tag
FROM GreatP
Group by SumAmount
My outer query is producing Good projects and Great projects but my counts are all single counts in the left column. The truth is I am not sure how to create the tags from SumAmount or if I should.
I guess I really shouldn't call it homework as I took it upon myself to learn SQL to help advance me in my job. I work on it when I can because SQL has helped me with some of the work I do at my job.
I had posted the question, the hint, and the best code I could come up with (what I had after several attempts) to answer the question.
With that being said, I just took the hint too literally and needed to do the following in the outer query:
SELECT COUNT(ProjectId) AS Count, N'Good projects' AS Tag
FROM GoodP
UNION ALL
SELECT COUNT(ProjectId) AS Count, N'Great projects' AS Tag
FROM GreatP;
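Putting the CTEs from the question together with that outer query, here is a minimal SQLite sketch run from Python, with made-up rows. SQLite has no nvarchar type, so the N prefix from the exercise is dropped here; in SQL Server you would keep N'Good projects' and N'Great projects'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Project (Id INTEGER, MinimalAmount REAL);
    CREATE TABLE Donation (ProjectId INTEGER, Amount REAL);
    INSERT INTO Project VALUES (1, 100), (2, 100), (3, 100), (4, 200);
    -- 1: 120% (good), 2: 200% (great), 3: 50% (not successful), 4: exactly 150% (good)
    INSERT INTO Donation VALUES (1, 120), (2, 200), (3, 50), (4, 100), (4, 200);
""")

rows = conn.execute("""
    WITH GoodP AS (
        SELECT d.ProjectId, p.MinimalAmount, SUM(d.Amount) AS SumAmount
        FROM Donation d
        JOIN Project p ON p.Id = d.ProjectId
        GROUP BY d.ProjectId, p.MinimalAmount
        HAVING SUM(d.Amount) >= p.MinimalAmount
           AND SUM(d.Amount) <= p.MinimalAmount * 1.5
    ),
    GreatP AS (
        SELECT d.ProjectId, p.MinimalAmount, SUM(d.Amount) AS SumAmount
        FROM Donation d
        JOIN Project p ON p.Id = d.ProjectId
        GROUP BY d.ProjectId, p.MinimalAmount
        HAVING SUM(d.Amount) > p.MinimalAmount * 1.5
    )
    SELECT COUNT(ProjectId) AS Count, 'Good projects' AS Tag FROM GoodP
    UNION ALL
    SELECT COUNT(ProjectId) AS Count, 'Great projects' AS Tag FROM GreatP
""").fetchall()

for count, tag in rows:
    print(count, tag)
```

The key difference from the question's first attempt is that the outer query no longer groups by SumAmount: grouping by the amount produced one row per distinct amount, which is why every count came out as 1.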
This is my first Stack Overflow question; I hope someone can help me out with this. I am completely lost and a newbie at SQL.
I have two tables (which I overly simplified for this question), the first one has the customer info and the car tire that they need. The second one is simply filled with a tire id, and all of the information for the tires. I am trying to input only the customer ID and return the one closest tire that matches the input along with the values of both the selected tire and the customer's tire. The matches also need to be prioritized in that order (size most important, width next most important, ratio is least important). Any suggestions on how to do this or where to start? Is there anything I can look at to help me solve this problem? I have been trying many different procedures, and some nested selects, but nothing is getting me close. Thank you.
customertable (custno, custsize, custwidth, custratio)
1,17,255,50
2,16,235,50
etc...
tirecollection (tireid, tiresize, tirewidth, tireratio)
1,15,225,40
2,16,225,50
3,17,250,55
4,17,235,30
5,18,255,40
etc...
This is not a 100% complete solution, but may work towards coming up with a solution. The approach here is combining the tyre dimensions into one value and then ranking them within a tyre size partition. You could then pass in the customer tyre dimensions to get the closest match.
with CTE
as
(
select *, TyreSize + TyreWidth as [TyreDimensions]
from tblTyres
)
select TC.CustId, C.TyreId, C.TyreSize, C.TyreWidth, C.[TyreDimensions],
rank() over(partition by C.TyreSize order by C.[TyreDimensions]) as [RNK]
from tblTyreCustomer as TC
join CTE as C
on TC.CustTyreSize = C.TyreSize
Assuming you're running SQL Server 2008 or later, this should work (this assumes you want to get a result for a single customer on a case-by-case basis):
CREATE FUNCTION udf.GetClosestTireMatch
(
    @CustomerNo int
)
RETURNS TABLE
AS RETURN
SELECT custno, tireid, tiresize, tirewidth, tireratio
FROM (
    SELECT ROW_NUMBER() OVER (
               ORDER BY ABS(c.custsize - t.tiresize),    -- size matters most
                        ABS(c.custwidth - t.tirewidth),  -- then width
                        ABS(c.custratio - t.tireratio)   -- then ratio
           ) AS rownum
         , c.custno, c.custsize, c.custwidth, c.custratio, t.tireid, t.tiresize, t.tirewidth, t.tireratio
         , ABS(c.custsize - t.tiresize) AS sizediff, ABS(c.custwidth - t.tirewidth) AS widthdiff, ABS(c.custratio - t.tireratio) AS ratiodiff
    FROM (SELECT * FROM customertable WHERE custno = @CustomerNo) c
    CROSS JOIN tirecollection t
) sub
WHERE rownum = 1
GO
Then you run the function with:
SELECT * FROM udf.GetClosestTireMatch(5)
(where 5=the customernumber you're querying).
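To try the matching logic without creating the function, here is a minimal SQLite sketch using the sample rows from the question. It orders by the same three absolute differences (size, then width, then ratio) and keeps the first row with LIMIT 1, which plays the role of ROW_NUMBER() ... WHERE rownum = 1. The helper name closest_tire is hypothetical, and the extra tireid sort key is an added assumption to make ties deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customertable (custno INTEGER, custsize INTEGER, custwidth INTEGER, custratio INTEGER);
    CREATE TABLE tirecollection (tireid INTEGER, tiresize INTEGER, tirewidth INTEGER, tireratio INTEGER);
    INSERT INTO customertable VALUES (1, 17, 255, 50), (2, 16, 235, 50);
    INSERT INTO tirecollection VALUES
        (1, 15, 225, 40), (2, 16, 225, 50), (3, 17, 250, 55), (4, 17, 235, 30), (5, 18, 255, 40);
""")

def closest_tire(custno):
    """Return (custno, tireid, tiresize, tirewidth, tireratio) of the best match."""
    return conn.execute("""
        SELECT c.custno, t.tireid, t.tiresize, t.tirewidth, t.tireratio
        FROM customertable c
        CROSS JOIN tirecollection t
        WHERE c.custno = ?
        ORDER BY ABS(c.custsize - t.tiresize),    -- size matters most
                 ABS(c.custwidth - t.tirewidth),  -- then width
                 ABS(c.custratio - t.tireratio),  -- then ratio
                 t.tireid                         -- deterministic tie-break
        LIMIT 1
    """, (custno,)).fetchone()

print(closest_tire(1))
print(closest_tire(2))
```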
First of all, sorry for that weird title. Here is the thing:
I work for an online shop that sells products on Amazon. Since we sell sets of different items, the same item can end up in multiple sets that we send to Amazon FBA. To get the total quantity of one item across all sets, I wrote the following query:
SELECT
SUM(nQuantity)
AS [total]
FROM [amazon_fba]
INNER JOIN (SELECT
[cArtNr]
FROM [tArtikel]
INNER JOIN (SELECT
[kStueckliste]
FROM [tStueckliste]
WHERE [kArtikel] = (SELECT
[kArtikel]
FROM [tArtikel]
WHERE [cHAN] = 12345)) [bar]
ON [tArtikel].[kStueckliste] = [bar].[kStueckliste]) [foo]
ON [amazon_fba].[cSellerSKU] = [foo].[cArtNr]
The cHAN=12345 part is just used to pick one specific item for which we want to know the total number of items. This query itself works fine, so this is not the problem.
However, I also know that all products that are part of sets have [tArtikel].[kStueckliste]=0, which, in theory, makes identifying them pretty easy. This gave me the idea that I could use this query to instantly generate a list of all these products with their respective totals, like:
kArtikel | total
=================
01234 | 23
56789 | 42
So basically I needed something like
foreach (
select [kArtikel]
from [tArtikel]
where [tArtikel].[kStueckliste]=0
) do (
< the query I made >
)
Thus I tried the following statement:
SELECT
SUM(nQuantity)
AS [total]
FROM [amazon_fba]
INNER JOIN (SELECT
[cArtNr]
FROM [tArtikel]
INNER JOIN (SELECT
[kStueckliste]
FROM [tStueckliste]
INNER JOIN (SELECT
[kArtikel]
FROM [tArtikel]
WHERE [tArtikel].[tStueckliste] = 0) [baz]
ON [tStueckliste].[kArtikel] = [baz].[kArtikel]) [bar]
ON [tArtikel].[kStueckliste] = [bar].[kStueckliste]) [foo]
ON [amazon_fba].[cSellerSKU] = [foo].[cArtNr]
This did not return a list of sums, as I had hoped; instead it gave me the total of all the sums I wanted to create.
Since I am pretty new to SQL (about two weeks in, maybe), I have no idea what to do, where my mistake is, or what phrasing to use to google my way around, hence the weird title of this post. So if anyone could help me with that and/or point me in the right direction, I'd be really happy :)
I write MySQL rather than SQL Server's T-SQL, but I believe they're very similar apart from a few functions and syntax details. Here's what I think should work for you:
select comp.kArtikel, sum(am.nQuantity) as total
from tArtikel comp
join tStueckliste st on st.kArtikel = comp.kArtikel
join tArtikel ar on ar.kStueckliste = st.kStueckliste
join amazon_fba am on am.cSellerSKU = ar.cArtNr
where comp.kStueckliste = 0
group by comp.kArtikel;
Adding the GROUP BY splits the total out per article. Flattening the nested brackets (in this instance, derived tables) into plain joins mainly makes the query easier to read; whether it also speeds things up depends on the optimizer and on your indexes. Again, this is how I would do it in MySQL, and the only other query language I have experience with is BigQuery, which won't help here.
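A minimal SQLite sketch of a grouped version that follows the join path of the asker's original single-item query (component article, then tStueckliste, then set article, then amazon_fba by SKU) and groups by the component's kArtikel. The table and column names come from the question, but the reduced schema and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed minimal schema; only the columns the query needs.
    CREATE TABLE tArtikel (kArtikel INTEGER, cArtNr TEXT, kStueckliste INTEGER);
    CREATE TABLE tStueckliste (kStueckliste INTEGER, kArtikel INTEGER);
    CREATE TABLE amazon_fba (cSellerSKU TEXT, nQuantity INTEGER);

    -- Single items 1 and 2 (kStueckliste = 0); sets SET1 and SET2.
    INSERT INTO tArtikel VALUES (1, 'A1', 0), (2, 'A2', 0), (10, 'SET1', 100), (11, 'SET2', 101);
    -- SET1 contains items 1 and 2, SET2 contains item 1.
    INSERT INTO tStueckliste VALUES (100, 1), (100, 2), (101, 1);
    INSERT INTO amazon_fba VALUES ('SET1', 5), ('SET2', 7);
""")

rows = conn.execute("""
    SELECT comp.kArtikel, SUM(fba.nQuantity) AS total
    FROM tArtikel comp                                             -- the single item
    JOIN tStueckliste st ON st.kArtikel = comp.kArtikel            -- sets it belongs to
    JOIN tArtikel setart ON setart.kStueckliste = st.kStueckliste  -- the set articles
    JOIN amazon_fba fba ON fba.cSellerSKU = setart.cArtNr          -- stock per set SKU
    WHERE comp.kStueckliste = 0
    GROUP BY comp.kArtikel
    ORDER BY comp.kArtikel
""").fetchall()
print(rows)
```

Item 1 sits in both sets, so its total is 5 + 7; item 2 sits only in SET1.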
What's the best way to 'SELECT' a 'DISTINCT' list of a field from a table / view (with 'WHERE' criteria) and alongside that count the number of times that that field content repeats in the table / view?
In other words, I have an initial view that looks a bit like this:
I'd like a single SQL query to filter it (SELECT...WHERE...) so that we are only considering records where [ORDER COMPLETE] = False and [PERSONAL] = Null...
...and then create a distinct list of names with counts of the number of times each name appears in the previous table:
*Displaying the [ORDER COMPLETE] and [PERSONAL] fields is redundant by this point and could be dropped to simplify.
I can do the steps individually as above, but struggling to get a single query to do it all... any help appreciated!
Thanks in advance,
-Tim
This should just be the following:
SELECT dbo.tblPerson.Person,
COUNT(dbo.tblPerson.Person) AS Count
FROM dbo.tblPerson
INNER JOIN dbo.tblNotifications ON dbo.tblPerson.PersonID = dbo.tblNotifications.AddresseeID
WHERE dbo.tblNotifications.Complete = 'False'
AND dbo.tblNotifications.Personal IS NULL
GROUP BY dbo.tblPerson.Person
ORDER BY COUNT(dbo.tblPerson.Person) DESC
You don't need your DISTINCT (the GROUP BY already returns one row per person) or your TOP 100 PERCENT.
Here is a simplified fiddle
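The same filter, group and count pattern in a minimal SQLite sketch with made-up rows. SQLite has no bit type, so Complete is stored as 0/1 and compared with 0 rather than 'False', and the dbo. schema prefix is dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblPerson (PersonID INTEGER, Person TEXT);
    CREATE TABLE tblNotifications (AddresseeID INTEGER, Complete INTEGER, Personal TEXT);
    INSERT INTO tblPerson VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO tblNotifications VALUES
        (1, 0, NULL), (1, 0, NULL),   -- counted for Alice
        (1, 1, NULL),                 -- complete: filtered out
        (2, 0, NULL),                 -- counted for Bob
        (2, 0, 'yes');                -- personal: filtered out
""")

counts = conn.execute("""
    SELECT p.Person, COUNT(p.Person) AS CountPerson
    FROM tblPerson p
    JOIN tblNotifications n ON p.PersonID = n.AddresseeID
    WHERE n.Complete = 0 AND n.Personal IS NULL
    GROUP BY p.Person
    ORDER BY CountPerson DESC
""").fetchall()
print(counts)
```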
Well I got downvoted into oblivion (probably for displaying the full extent of my own ignorance!), but just in case someone from the future experiences the same problem as me and stumbles across this question while Googling (or whatever verb you use for "searching all digitised human knowledge" in the distant future), here's some sanitised code of the query I managed to get to work in the end - thanks to Mark Sinkinson's snippet for helping me realise the obvious...
SELECT DISTINCT TOP (100) PERCENT dbo.tblPerson.Person, COUNT(dbo.tblPerson.Person) AS CountPerson
FROM dbo.tblPerson INNER JOIN
dbo.tblNotifications ON dbo.tblPerson.PersonID = dbo.tblNotifications.AddresseeID
WHERE (dbo.tblNotifications.Complete = 'False') AND (dbo.tblNotifications.Personal IS NULL)
GROUP BY dbo.tblPerson.Person
ORDER BY CountPerson DESC
I have a requirement to produce a list of possible duplicates before a user saves an entity to the database and warn them of the possible duplicates.
There are 7 criteria on which we should check for duplicates, and if at least 3 match we should flag this to the user.
The criteria will all match on ID, so no fuzzy string matching is needed, but my problem is that there are many possible ways (99 ways, if I've done my sums correctly) for at least 3 items to match from the list of 7 possible criteria.
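The figure of 99 checks out: it is the number of ways to pick 3, 4, 5, 6 or all 7 of the 7 criteria.

$$
\sum_{k=3}^{7}\binom{7}{k} = 35 + 35 + 21 + 7 + 1 = 99 = 2^7 - \binom{7}{0} - \binom{7}{1} - \binom{7}{2}
$$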
I don't want to have to do 99 separate db queries to find my search results and nor do I want to bring the whole lot back from the db and filter on the client side. We're probably only talking of a few tens of thousands of records at present, but this will grow into the millions as the system matures.
Anyone got any thoughts on a nice, efficient way to do this?
I was considering a simple OR query to get the records where at least one field matches from the db and then doing some processing on the client to filter it some more, but a few of the fields have very low cardinality and won't actually reduce the numbers by a huge amount.
Thanks
Jon
OR and CASE summing will work but are quite inefficient, since they generally can't use indexes.
You need to use UNION so that the indexes can be used.
If a user enters name, phone, email and address into the database, and you want to check all records that match at least 3 of these fields, you issue:
SELECT i.*
FROM (
SELECT id, COUNT(*) AS cnt
FROM (
SELECT id
FROM t_info t
WHERE name = 'Eve Chianese'
UNION ALL
SELECT id
FROM t_info t
WHERE phone = '+15558000042'
UNION ALL
SELECT id
FROM t_info t
WHERE email = '42@example.com'
UNION ALL
SELECT id
FROM t_info t
WHERE address = '42 North Lane'
) q
GROUP BY
id
HAVING COUNT(*) >= 3
) dq
JOIN t_info i
ON i.id = dq.id
This will use indexes on these fields and the query will be fast.
See this article in my blog for details:
Matching 3 of 4: how to match a record which matches at least 3 of 4 possible conditions
Also see this question the article is based upon.
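A minimal SQLite sketch of this UNION ALL technique with made-up rows. It creates one index per searched column (an assumption about how the table would be indexed) and passes the entered values as parameters; find_matches is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_info (id INTEGER PRIMARY KEY, name TEXT, phone TEXT, email TEXT, address TEXT);
    CREATE INDEX ix_name ON t_info (name);
    CREATE INDEX ix_phone ON t_info (phone);
    CREATE INDEX ix_email ON t_info (email);
    CREATE INDEX ix_address ON t_info (address);
    INSERT INTO t_info VALUES
        (1, 'Eve Chianese', '+15558000042', '42@example.com', '42 North Lane'),
        (2, 'Eve Chianese', '+15558000042', 'eve@example.com', '42 North Lane'),
        (3, 'Eve Chianese', '+15550000000', '42@example.com', '1 South Road');
""")

def find_matches(name, phone, email, address, min_matches=3):
    """Return ids of rows matching at least min_matches of the four fields."""
    rows = conn.execute("""
        SELECT i.id
        FROM (
            SELECT id FROM (
                SELECT id FROM t_info WHERE name = ?      -- each branch can use its index
                UNION ALL
                SELECT id FROM t_info WHERE phone = ?
                UNION ALL
                SELECT id FROM t_info WHERE email = ?
                UNION ALL
                SELECT id FROM t_info WHERE address = ?
            ) q
            GROUP BY id
            HAVING COUNT(*) >= ?                          -- one row per matching field
        ) dq
        JOIN t_info i ON i.id = dq.id
        ORDER BY i.id
    """, (name, phone, email, address, min_matches)).fetchall()
    return [r[0] for r in rows]

print(find_matches('Eve Chianese', '+15558000042', '42@example.com', '42 North Lane'))
```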
If you want to have a list of DISTINCT values in the existing data, you just wrap this query into a subquery:
SELECT i1.*
FROM t_info i1
WHERE EXISTS
(
SELECT 1
FROM (
SELECT id
FROM t_info t
WHERE name = i1.name
UNION ALL
SELECT id
FROM t_info t
WHERE phone = i1.phone
UNION ALL
SELECT id
FROM t_info t
WHERE email = i1.email
UNION ALL
SELECT id
FROM t_info t
WHERE address = i1.address
) q
WHERE q.id <> i1.id
GROUP BY
id
HAVING COUNT(*) >= 3
)
Note that this DISTINCT is not transitive: if A matches B and B matches C, this does not mean that A matches C.
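A minimal SQLite sketch of this "rows that have a likely duplicate" query on made-up data. Because each row trivially matches itself on all four fields, the sketch excludes the row's own id (q.id <> i1.id) before grouping; without that filter, every row would be returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_info (id INTEGER PRIMARY KEY, name TEXT, phone TEXT, email TEXT, address TEXT);
    INSERT INTO t_info VALUES
        (1, 'Eve Chianese', '+15558000042', '42@example.com', '42 North Lane'),
        (2, 'Eve Chianese', '+15558000042', 'eve@example.com', '42 North Lane'),
        (3, 'Adam Smith', '+15550000000', 'adam@example.com', '1 South Road');
""")

rows = conn.execute("""
    SELECT i1.id
    FROM t_info i1
    WHERE EXISTS (
        SELECT 1
        FROM (
            SELECT id FROM t_info t WHERE t.name = i1.name
            UNION ALL
            SELECT id FROM t_info t WHERE t.phone = i1.phone
            UNION ALL
            SELECT id FROM t_info t WHERE t.email = i1.email
            UNION ALL
            SELECT id FROM t_info t WHERE t.address = i1.address
        ) q
        WHERE q.id <> i1.id          -- ignore the row matching itself
        GROUP BY q.id
        HAVING COUNT(*) >= 3
    )
    ORDER BY i1.id
""").fetchall()
duplicates = [r[0] for r in rows]
print(duplicates)
```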
You might want something like the following:
SELECT id
FROM
(SELECT id,
        CASE fld1 WHEN input1 THEN 1 ELSE 0 END AS rule1,
        CASE fld2 WHEN input2 THEN 1 ELSE 0 END AS rule2,
        ...,
        CASE fld7 WHEN input7 THEN 1 ELSE 0 END AS rule7
 FROM table) r
WHERE rule1 + rule2 + rule3 + ... + rule7 >= 3
This isn't tested, but it shows a way to tackle this.
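A runnable version of the CASE idea, cut down to four fields so it fits in a short SQLite sketch. The t_info table, its columns and the sample rows are assumptions, and the entered values are passed as parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_info (id INTEGER PRIMARY KEY, name TEXT, phone TEXT, email TEXT, address TEXT);
    INSERT INTO t_info VALUES
        (1, 'Eve Chianese', '+15558000042', '42@example.com', '42 North Lane'),
        (2, 'Eve Chianese', '+15558000042', 'eve@example.com', '42 North Lane'),
        (3, 'Eve Chianese', '+15550000000', '42@example.com', '1 South Road');
""")

entered = ('Eve Chianese', '+15558000042', '42@example.com', '42 North Lane')

rows = conn.execute("""
    SELECT id
    FROM (
        SELECT id,
               CASE name WHEN ? THEN 1 ELSE 0 END AS rule1,
               CASE phone WHEN ? THEN 1 ELSE 0 END AS rule2,
               CASE email WHEN ? THEN 1 ELSE 0 END AS rule3,
               CASE address WHEN ? THEN 1 ELSE 0 END AS rule4
        FROM t_info
    ) r
    WHERE rule1 + rule2 + rule3 + rule4 >= 3   -- at least 3 of the 4 fields match
    ORDER BY id
""", entered).fetchall()
matches = [r[0] for r in rows]
print(matches)
```

As the UNION answer points out, this form evaluates every row of the table, so it cannot take advantage of per-column indexes.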
What DBMS are you using? Some support enforcing such constraints with server-side code.
Have you considered using a stored procedure with a cursor? You could then do your OR query and then step through the records one-by-one looking for matches. Using a stored procedure would allow you to do all the checking on the server.
However, I think a table scan over millions of records is always going to be slow. I think you should work out which of the 7 fields are most likely to match and make sure these are indexed.
I'm assuming your system is trying to match tag ids of a certain post, or something similar. This is a many-to-many relationship, and you should have three tables to handle it: one for posts, one for tags, and one for the relationship between posts and tags.
If my assumptions are correct then the best way to handle this is:
SELECT postid, count(tagid) as common_tag_count
FROM posts_to_tags
WHERE tagid IN (tag1, tag2, tag3, ...)
GROUP BY postid
HAVING count(tagid) >= 3;
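A minimal SQLite sketch of this tag-overlap query with made-up data. The posts_to_tags layout (one row per post/tag pair) follows the answer, the tag ids are arbitrary, and the sketch uses >= 3 for "at least three tags in common".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts_to_tags (postid INTEGER, tagid INTEGER, PRIMARY KEY (postid, tagid))"
)
conn.executemany(
    "INSERT INTO posts_to_tags VALUES (?, ?)",
    [(1, 10), (1, 11), (1, 12), (1, 13),   # shares 4 tags with the input
     (2, 10), (2, 11), (2, 12),            # shares 3
     (3, 10), (3, 99)],                    # shares 1
)

input_tags = [10, 11, 12, 13]
placeholders = ", ".join("?" for _ in input_tags)  # one ? per tag id
rows = conn.execute(f"""
    SELECT postid, COUNT(tagid) AS common_tag_count
    FROM posts_to_tags
    WHERE tagid IN ({placeholders})
    GROUP BY postid
    HAVING COUNT(tagid) >= 3
    ORDER BY postid
""", input_tags).fetchall()
print(rows)
```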