Bayesian ratings in postgresql - database

I have the following table in my db:
Name   | total_stars | total_reviews
Item A | 27          | 7
Item B | 36          | 9
Item C | 27          | 7
Item D | 30          | 6
Item E | 0           | 0
Item F | 0           | 0
Item F | 15          | 3
I was looking at this article and trying to implement Bayesian rankings in a PostgreSQL database.
The formula given for the rank is
br = ( (avg_num_votes * avg_rating) + (this_num_votes * this_rating) ) /
(avg_num_votes + this_num_votes)
where:
avg_num_votes: The average number of votes of all items that have num_votes>0
avg_rating: The average rating of each item (again, of those that have num_votes>0)
this_num_votes: number of votes for this item
this_rating: the rating of this item
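As a quick sanity check of the formula, here is a minimal Python sketch that applies it to the rows of the table above. The function name bayesian_rating is made up for illustration, and it relies on this_num_votes * this_rating being equal to total_stars.

```python
# Rows copied from the table in the question: (name, total_stars, total_reviews).
items = [
    ("Item A", 27, 7),
    ("Item B", 36, 9),
    ("Item C", 27, 7),
    ("Item D", 30, 6),
    ("Item E", 0, 0),
    ("Item F", 0, 0),
    ("Item F", 15, 3),
]

# Only items with at least one vote feed the two global averages.
voted = [(stars, votes) for _, stars, votes in items if votes > 0]
avg_num_votes = sum(votes for _, votes in voted) / len(voted)
avg_rating = sum(stars / votes for stars, votes in voted) / len(voted)


def bayesian_rating(total_stars, total_reviews):
    """Hypothetical helper: br from the formula, with
    this_num_votes * this_rating written as total_stars."""
    return (avg_num_votes * avg_rating + total_stars) / (avg_num_votes + total_reviews)


for name, stars, votes in items:
    print(name, round(bayesian_rating(stars, votes), 4))
```

An item with no votes ends up with exactly avg_rating, and items with few votes are pulled towards it, which is the point of the formula.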
This is the query I came up with, but it is not working:
with avg_num_votes as (
select AVG(total_reviews)
from business
where total_reviews != 0),
avg_rating as (
select AVG(total_stars/total_reviews)
from business
where total_reviews != 0)
select * from business
order by ((avg_num_votes * avg_rating) + (total_stars)) / (avg_num_votes + total_reviews);
I am getting: ERROR: column "avg_num_votes" does not exist

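In Postgres, WITH only defines auxiliary named queries (CTEs) for use in the main query. Each one behaves like a table, so the main query still has to list it in FROM and refer to its column by name; referring to avg_num_votes as if it were a column is what raises the error.
This query uses sub-selects in the FROM clause instead and works as you expect: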
SELECT business.* FROM business,
(SELECT avg(total_reviews) AS v
FROM business
WHERE total_reviews != 0
) AS avg_num_votes,
(SELECT avg(total_stars * 1.0 / total_reviews) AS v -- * 1.0 avoids integer division if the columns are integers
FROM business
WHERE total_reviews != 0
) AS avg_rating
ORDER BY ((avg_num_votes.v * avg_rating.v) + (total_stars)) / (avg_num_votes.v + total_reviews)
EDIT: Actually, using WITH is also possible, but it is no shorter than the first form. It is also less portable: the first solution will work on MySQL, but this one will not on MySQL versions before 8.0, which lack WITH:
WITH
avg_num_votes AS (
SELECT avg(total_reviews) AS v
FROM business
WHERE total_reviews != 0
),
avg_rating AS (
SELECT avg(total_stars * 1.0 / total_reviews) AS v -- * 1.0 avoids integer division if the columns are integers
FROM business
WHERE total_reviews != 0
)
SELECT business.*
FROM business, avg_num_votes, avg_rating
ORDER BY ((avg_num_votes.v * avg_rating.v) + (total_stars)) / (avg_num_votes.v + total_reviews)
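To try the WITH form end to end, here is a rough sketch using Python's built-in sqlite3 as a stand-in for Postgres (an assumption: SQLite accepts the same CTE syntax and, like Postgres, truncates integer division). The column types, the br alias, and the DESC ordering are choices made for this example, not something the question specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE business (name TEXT, total_stars INTEGER, total_reviews INTEGER)"
)
conn.executemany(
    "INSERT INTO business VALUES (?, ?, ?)",
    [("Item A", 27, 7), ("Item B", 36, 9), ("Item C", 27, 7), ("Item D", 30, 6),
     ("Item E", 0, 0), ("Item F", 0, 0), ("Item F", 15, 3)],
)

# With integer columns, total_stars / total_reviews truncates (27 / 7 gives 3),
# so the two averages below differ.
int_avg, real_avg = conn.execute("""
    SELECT avg(total_stars / total_reviews),
           avg(total_stars * 1.0 / total_reviews)
    FROM business
    WHERE total_reviews != 0
""").fetchone()

# The CTEs are listed in FROM and their column v is referenced by name.
ranked = conn.execute("""
    WITH avg_num_votes AS (
        SELECT avg(total_reviews) AS v FROM business WHERE total_reviews != 0
    ),
    avg_rating AS (
        SELECT avg(total_stars * 1.0 / total_reviews) AS v
        FROM business WHERE total_reviews != 0
    )
    SELECT business.name,
           (avg_num_votes.v * avg_rating.v + total_stars)
               / (avg_num_votes.v + total_reviews) AS br
    FROM business, avg_num_votes, avg_rating
    ORDER BY br DESC, business.name
""").fetchall()

print(int_avg, real_avg)
for name, br in ranked:
    print(name, round(br, 4))
```

Ordering by br DESC puts the best-rated items first; drop DESC if you want the ascending order used in the queries above.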

with av as (
select avg(total_reviews) avg_num_votes
from business
where total_reviews > 0
), ar as (
select name, avg(total_stars * 1.0 / total_reviews) avg_rating
from business
where total_reviews > 0
group by name
)
select b.*, avg_rating, avg_num_votes,
(avg_num_votes * avg_rating + total_stars)
/
(avg_num_votes + total_reviews) br
from
business b
left join
ar on ar.name = b.name
inner join
av on true
order by br, b.name
;
name | total_stars | total_reviews | avg_rating | avg_num_votes | br
--------+-------------+---------------+--------------------+--------------------+------------------------------------
Item A | 27 | 7 | 3.8571428571428571 | 6.4000000000000000 | 3.85714285714285712238805970149254
Item C | 27 | 7 | 3.8571428571428571 | 6.4000000000000000 | 3.85714285714285712238805970149254
Item B | 36 | 9 | 4.0000000000000000 | 6.4000000000000000 | 4.00000000000000000000000000000000
Item D | 30 | 6 | 5.0000000000000000 | 6.4000000000000000 | 5.00000000000000000000000000000000
Item F | 0 | 0 | 5.0000000000000000 | 6.4000000000000000 | 5.00000000000000000000000000000000
Item F | 15 | 3 | 5.0000000000000000 | 6.4000000000000000 | 5.00000000000000000000000000000000
Item E | 0 | 0 | | 6.4000000000000000 |

Related

Sum two columns in SQL and optimise SQL query

I have the following query, which returns two columns. I want to sum both columns and create a third column from the result.
Is there any way I can recreate the below query by removing the subquery? Any way I can achieve the same through joins?
SELECT
IIF(c2.isdeleted = 1 OR c2.approved = 0, 0, 1) AS Contentcount,
(SELECT COUNT(c1.content)
FROM comments c1
WHERE c1.parentcommentid = c2.id
AND c1.isdeleted = 0
AND c1.approved = 1) ChildContentcount --Anyway to remove the subquery
FROM
comments c2
WHERE
c2.discussionid = '257943'
AND c2.parentcommentid IS NULL
ORDER BY
c2.pinned DESC,
c2.createddate
Sample data:
+----------+--------------+
| content | childcontent |
+----------+--------------+
| 1 | 8 |
| 0 | 0 |
| 1 | 3 |
+----------+--------------+
Expected output:
+----------+----------------+---------+
| content | childcontent | sumdata |
+----------+----------------+---------+
| 1 | 8 | 9 |
| 0 | 0 | 0 |
| 1 | 3 | 4 |
| 1 | 8 | 9 |
+----------+----------------+---------+
You can use CROSS APPLY or OUTER APPLY instead of a correlated subquery.
Then you can re-use the values.
select c.pinned, c.createddate
, c.discussionid
, ca1.content
, ca2.childcontent
, (ca1.content + ca2.childcontent) AS sumdata
FROM comments c
CROSS APPLY
(
SELECT CASE
WHEN c.isdeleted = 1 OR c.approved = 0 THEN 0
ELSE 1
END AS content
) ca1
CROSS APPLY
(
SELECT COUNT(c2.content) AS childcontent
FROM comments c2
WHERE c2.parentcommentid = c.id
AND c2.isdeleted = 0
AND c2.approved = 1
) ca2
WHERE c.discussionid = '257943'
AND c.parentcommentid IS NULL
ORDER BY
c.pinned DESC,
c.createddate;
Wrap the query in a subquery and sum the columns:
select tbl.* , Contentcount+ChildContentcount third_sum from
(
select IIF(c2.isdeleted = 1 OR c2.approved = 0, 0, 1) AS Contentcount,
(SELECT COUNT(c1.content)
FROM comments c1
WHERE c1.parentcommentid = c2.id
AND c1.isdeleted = 0
AND c1.approved = 1) ChildContentcount
FROM
comments c2
WHERE
c2.discussionid = '257943'
AND c2.parentcommentid IS NULL ) tbl
If you supply a SQL Fiddle, we can try alternative ways of writing it.
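Since the question also asks whether joins can replace the subquery, here is a hedged sketch of one way: a LEFT JOIN of comments to itself plus GROUP BY. It runs on Python's built-in sqlite3 rather than SQL Server (an assumption made so the example is self-contained), uses CASE where the question uses IIF, and the five sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY, parentcommentid INTEGER, discussionid TEXT,
        isdeleted INTEGER, approved INTEGER, content TEXT,
        pinned INTEGER, createddate TEXT)
""")
# Invented sample rows: two top-level comments, three replies to the first.
conn.executemany(
    "INSERT INTO comments VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, None, "257943", 0, 1, "parent one", 1, "2020-01-01"),
        (2, None, "257943", 1, 1, "deleted parent", 0, "2020-01-02"),
        (3, 1, "257943", 0, 1, "reply", 0, "2020-01-03"),
        (4, 1, "257943", 0, 1, "reply", 0, "2020-01-04"),
        (5, 1, "257943", 1, 1, "deleted reply", 0, "2020-01-05"),
    ],
)

# The join conditions on c replace the WHERE clause of the correlated subquery;
# LEFT JOIN keeps parents that have no qualifying replies (their count is 0).
result = conn.execute("""
    SELECT CASE WHEN p.isdeleted = 1 OR p.approved = 0 THEN 0 ELSE 1 END
               AS contentcount,
           COUNT(c.content) AS childcontentcount,
           CASE WHEN p.isdeleted = 1 OR p.approved = 0 THEN 0 ELSE 1 END
               + COUNT(c.content) AS sumdata
    FROM comments p
    LEFT JOIN comments c
           ON c.parentcommentid = p.id
          AND c.isdeleted = 0
          AND c.approved = 1
    WHERE p.discussionid = '257943'
      AND p.parentcommentid IS NULL
    GROUP BY p.id, p.isdeleted, p.approved, p.pinned, p.createddate
    ORDER BY p.pinned DESC, p.createddate
""").fetchall()

print(result)
```

Whether this is faster than the correlated subquery depends on the data and indexes, so it is worth comparing the execution plans before switching.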

SQL Server Pivot Table and Group

I'm attempting to pivot some of my rows. Currently my query is this:
SELECT count(distinct users.userName) as TotalUsers, products.productNameCommon, employees.Division, products.productType
FROM FlexLM_users users
INNER JOIN Org.Employees employees ON employees.Username=users.userName
INNER JOIN FlexLM_history history ON users.userID=history.userID
INNER JOIN FlexLM_products products ON products.productID=history.productID
where products.productType = 'Base'
GROUP BY products.productNameCommon, employees.Division, products.productType
ORDER BY TotalUsers DESC
And outputs this:
TotalUsers| productNameCommon | Division | productType
------------------------------------------------------------------------
16 | Standard | Disease Control | base
12 | Basic | Epidemiology | base
10 | Standard | Prevention | base
8 | Advanced | Epidemiology | base
6 | Basic | Disease Control | base
2 | Advanced | Prevention | base
What I am looking to do is this:
Division | Basic | Standard | Advanced | TotalUsers
----------------------------------------------------------
Disease Control| 6 | 16 | 0 | 22
Epidemiology | 12 | 0 | 8 | 20
Prevention | 0 | 10 | 2 | 12
SELECT Division
, ISNULL([Basic] , 0) AS [Basic]
, ISNULL([Standard], 0) AS [Standard]
, ISNULL([Advanced], 0) AS [Advanced]
, ISNULL([Basic] , 0)
+ ISNULL([Standard], 0)
+ ISNULL([Advanced], 0) AS TotalUsers
FROM (
SELECT TotalUsers , productNameCommon , Division
FROM (
-- Your Existing Query here
)a
) t
PIVOT (
SUM (TotalUsers)
FOR productNameCommon
IN ([Basic], [Standard] , [Advanced])
) p
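If PIVOT is not available, the same reshaping can be approximated with conditional aggregation (one SUM(CASE ...) per product). The sketch below is only an illustration: it runs on Python's built-in sqlite3, and the table name product_usage is hypothetical, loaded with the six rows of the query output shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# product_usage is a hypothetical table holding the output of the existing query.
conn.execute(
    "CREATE TABLE product_usage (TotalUsers INTEGER, productNameCommon TEXT, Division TEXT)"
)
conn.executemany(
    "INSERT INTO product_usage VALUES (?, ?, ?)",
    [
        (16, "Standard", "Disease Control"),
        (12, "Basic", "Epidemiology"),
        (10, "Standard", "Prevention"),
        (8, "Advanced", "Epidemiology"),
        (6, "Basic", "Disease Control"),
        (2, "Advanced", "Prevention"),
    ],
)

# One SUM(CASE ...) per product name; ELSE 0 plays the role of ISNULL(..., 0).
pivoted = conn.execute("""
    SELECT Division,
           SUM(CASE WHEN productNameCommon = 'Basic'    THEN TotalUsers ELSE 0 END) AS Basic,
           SUM(CASE WHEN productNameCommon = 'Standard' THEN TotalUsers ELSE 0 END) AS Standard,
           SUM(CASE WHEN productNameCommon = 'Advanced' THEN TotalUsers ELSE 0 END) AS Advanced,
           SUM(TotalUsers) AS TotalUsers
    FROM product_usage
    GROUP BY Division
    ORDER BY Division
""").fetchall()

for row in pivoted:
    print(row)
```

As with the PIVOT version, the product names are hard-coded, so a new product needs a new column in the query.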

Combine Parent-Child Rows - TSQL

I am trying to flatten/combine rows from a table with a parent-child hierarchy. I'm trying to identify the beginning and the end of each 'link' - so if a is linked to b, b is linked to c, and then c is linked to d, I want the output to link a to d.
I'm trying my best to avoid using a procedure with loops, so any advice would be much appreciated!
The original dataset and the required output are as follows:
personID | form | linkedform
---------|---------|---------
1 | a | b
1 | b | c
1 | c | d
1 | d | NULL
2 | e | f
2 | f | g
2 | g | NULL
2 | h | i
2 | i | NULL
3 | j | NULL
3 | k | l
3 | l | NULL
Desired output:
personID | form | linkedform
---------|---------|---------
1 | a | d
2 | e | g
2 | h | i
3 | j | NULL
3 | k | l
Each personID can have multiple links, and a link can be made of just one or multiple forms.
-- use a recursive cte to build the hierarchy
-- start with [linkedform] = null and work your way up
;WITH cte AS
(
SELECT *, [form] AS [root],
1 AS [Level]
FROM Table1
WHERE [linkedform] IS NULL
UNION ALL
SELECT t1.*,
[root],
[Level] + 1
FROM Table1 t1
JOIN cte ON cte.form = t1.linkedform
)
-- level 1 will be the last element, use row_number to get the first element
-- join the two together based on last and first level, that have the same personid and root ([linkedform] = null)
SELECT cte.personId,
cte2.form,
cte.form
FROM cte
JOIN ( SELECT *,
ROW_NUMBER() OVER (PARTITION BY personId, [root] ORDER BY Level DESC) Rn
FROM cte) cte2
ON cte2.Rn = cte.Level
AND cte2.personId = cte.personId
AND cte2.root = cte.root
WHERE cte.[Level] = 1
ORDER BY cte.personId, cte2.form
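For comparison, here is a sketch of a different traversal: start at the head of each chain (a form that no other row of the same person links to) and walk down to the row whose linkedform is NULL. It is not the query above; it runs on Python's built-in sqlite3 as a stand-in for SQL Server, and the CTE name walk and column head are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (personID INTEGER, form TEXT, linkedform TEXT)")
# Rows copied from the dataset in the question.
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [
        (1, "a", "b"), (1, "b", "c"), (1, "c", "d"), (1, "d", None),
        (2, "e", "f"), (2, "f", "g"), (2, "g", None),
        (2, "h", "i"), (2, "i", None),
        (3, "j", None), (3, "k", "l"), (3, "l", None),
    ],
)

links = conn.execute("""
    WITH RECURSIVE walk(personID, head, form, linkedform) AS (
        -- anchor: forms that nothing else links to are the heads of chains
        SELECT t.personID, t.form, t.form, t.linkedform
        FROM Table1 t
        WHERE NOT EXISTS (SELECT 1 FROM Table1 p
                          WHERE p.personID = t.personID
                            AND p.linkedform = t.form)
        UNION ALL
        -- step: follow linkedform to the next row, carrying the head along
        SELECT w.personID, w.head, t.form, t.linkedform
        FROM walk w
        JOIN Table1 t ON t.personID = w.personID AND t.form = w.linkedform
    )
    -- the row with linkedform IS NULL is the end of its chain;
    -- a chain of length one reports NULL as its end
    SELECT personID, head,
           CASE WHEN form = head THEN NULL ELSE form END AS linkedform
    FROM walk
    WHERE linkedform IS NULL
    ORDER BY personID, head
""").fetchall()

for row in links:
    print(row)
```

On SQL Server the keyword RECURSIVE is not used (a plain WITH is recursive when it references itself), so the query would need that small adjustment there.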

Sql server join by group?

I have this table :
id | type | date
1 | a | 01/1/2012
2 | b | 01/1/2012
3 | b | 01/2/2012
4 | b | 01/3/2012
5 | a | 01/5/2012
6 | b | 01/5/2012
7 | b | 01/9/2012
8 | a | 01/10/2012
The point of view is per date: if 2 rows contain the same date, both should be visible on the same line (left join).
The same date can be shared by at most 2 rows,
so this situation can't occur:
1 | a | 01/1/2012
2 | b | 01/1/2012
3 | a | 01/1/2012
If a date has both group a and group b, show both of them on a single line using a left join.
If a date has only group a, show it as a single line (with NULL on the right side).
If a date has only group b, show it as a single line (with NULL on the left side).
Desired result :
Date |typeA|typeB |a'id|b'id
01/1/2012 | a | b | 1 | 2
01/2/2012 | | b | | 3
01/3/2012 | | b | | 4
01/5/2012 | a | b | 5 | 6
01/9/2012 | | b | | 7
01/10/2012 | a | | 8 |
I know this is supposed to be simple, but the main anchor of the join here is the date.
The problem I've encountered: when I read row 1, I search the table for all rows with the same date, which is fine.
But when I read the second row, I do the same, and it yields the first row, which was already counted.
Any help?
Here is the SQL fiddle:
https://data.stackexchange.com/stackoverflow/query/edit/82605
I think you want a pivot
select
[date],
case when [a] IS null then null else 'a' end typea,
case when [b] IS null then null else 'b' end typeb,
a as aid,
b as bid
from yourtable src
pivot (max(id) for type in ([a],[b]))p
If you want to do it with joins..
select ISNULL(a.date, b.date), a.type,b.type, a.id,b.id
from
(select * from yourtable where type='a') a
full outer join
(select * from yourtable where type='b') b
on a.date = b.date
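What the pivot does here can also be written as a plain GROUP BY on the date with one MAX(CASE ...) per type. The sketch below runs that variant on Python's built-in sqlite3 (an assumption made to keep it self-contained; it is not SQL Server) against the rows from the question, and orders by the lowest id per date only to reproduce the order of the desired result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (id INTEGER, type TEXT, date TEXT)")
# Rows copied from the question; dates are kept as the original strings.
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [
        (1, "a", "01/1/2012"), (2, "b", "01/1/2012"), (3, "b", "01/2/2012"),
        (4, "b", "01/3/2012"), (5, "a", "01/5/2012"), (6, "b", "01/5/2012"),
        (7, "b", "01/9/2012"), (8, "a", "01/10/2012"),
    ],
)

# Each date collapses to one row; a missing type yields NULL on that side.
per_date = conn.execute("""
    SELECT date,
           MAX(CASE WHEN type = 'a' THEN 'a' END) AS typea,
           MAX(CASE WHEN type = 'b' THEN 'b' END) AS typeb,
           MAX(CASE WHEN type = 'a' THEN id END)  AS aid,
           MAX(CASE WHEN type = 'b' THEN id END)  AS bid
    FROM yourtable
    GROUP BY date
    ORDER BY MIN(id)
""").fetchall()

for row in per_date:
    print(row)
```

This relies on the rule from the question that a date holds at most one row per type; with more, MAX would silently keep only the highest id.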

Retrieving the most recent records within a query

I have the following tables:
tblPerson:
PersonID | Name
---------------------
1 | John Smith
2 | Jane Doe
3 | David Hoshi
tblLocation:
LocationID | Timestamp | PersonID | X | Y | Z | More Columns...
---------------------------------------------------------------
40 | Jan. 1st | 3 | 0 | 0 | 0 | More Info...
41 | Jan. 2nd | 1 | 1 | 1 | 0 | More Info...
42 | Jan. 2nd | 3 | 2 | 2 | 2 | More Info...
43 | Jan. 3rd | 3 | 4 | 4 | 4 | More Info...
44 | Jan. 5th | 2 | 0 | 0 | 0 | More Info...
I can produce an SQL query that gets the Location records for each Person like so:
SELECT LocationID, Timestamp, Name, X, Y, Z
FROM tblLocation
JOIN tblPerson
ON tblLocation.PersonID = tblPerson.PersonID;
to produce the following:
LocationID | Timestamp | Name | X | Y | Z |
--------------------------------------------------
40 | Jan. 1st | David Hoshi | 0 | 0 | 0 |
41 | Jan. 2nd | John Smith | 1 | 1 | 0 |
42 | Jan. 2nd | David Hoshi | 2 | 2 | 2 |
43 | Jan. 3rd | David Hoshi | 4 | 4 | 4 |
44 | Jan. 5th | Jane Doe | 0 | 0 | 0 |
My issue is that we're only concerned with the most recent Location record. As such, we're only really interested in the following Rows: LocationID 41, 43, and 44.
The question is: How can we query these tables to give us the most recent data on a per-person basis? What special grouping needs to happen to produce the desired result?
MySQL before version 8.0 doesn't have ranking/analytical/windowing functionality, so there you need a join against the per-person maximum:
SELECT tl.locationid, tl.timestamp, tp.name, X, Y, Z
FROM tblPerson tp
JOIN tblLocation tl ON tl.personid = tp.personid
JOIN (SELECT t.personid,
MAX(t.timestamp) AS max_date
FROM tblLocation t
GROUP BY t.personid) x ON x.personid = tl.personid
AND x.max_date = tl.timestamp
SQL Server 2005+ and Oracle 9i+ support analytics, so you could use:
SELECT x.locationid, x.timestamp, x.name, x.X, x.Y, x.Z
FROM (SELECT tl.locationid, tl.timestamp, tp.name, X, Y, Z,
ROW_NUMBER() OVER (PARTITION BY tp.name ORDER BY tl.timestamp DESC) AS rank
FROM tblPerson tp
JOIN tblLocation tl ON tl.personid = tp.personid) x
WHERE x.rank = 1
Using a variable to get the same functionality as ROW_NUMBER on MySQL:
SELECT x.locationid, x.timestamp, x.name, x.X, x.Y, x.Z
FROM (SELECT tl.locationid, tl.timestamp, tp.name, X, Y, Z,
CASE
WHEN @name != tp.name THEN
@rownum := 1
ELSE @rownum := @rownum + 1
END AS rank,
@name := tp.name
FROM tblLocation tl
JOIN tblPerson tp ON tp.personid = tl.personid
JOIN (SELECT @rownum := NULL, @name := '') r
ORDER BY tp.name, tl.timestamp DESC) x
WHERE x.rank = 1
As @Mark Byers mentions, this problem comes up frequently on Stack Overflow.
Here's the solution I most frequently recommend, given your tables:
SELECT p.*, l1.*
FROM tblPerson p
JOIN tblLocation l1 ON p.PersonID = l1.PersonID
LEFT OUTER JOIN tblLocation l2 ON p.PersonID = l2.PersonID AND
(l1.timestamp < l2.timestamp OR l1.timestamp = l2.timestamp AND l1.LocationId < l2.LocationId)
WHERE l2.LocationID IS NULL;
To see other examples, follow the tag greatest-n-per-group, which I added to your question.
This is a classic 'max per group' question that comes up on Stack Overflow almost every day. There are many ways to solve it and you can find example solutions by searching Stack Overflow. Here is one way that you can do it in MySQL:
SELECT
location.LocationId,
location.Timestamp,
person.Name,
location.X,
location.Y,
location.Z
FROM (
SELECT
LocationID,
@rn := CASE WHEN @prev_PersonID = PersonID
THEN @rn + 1
ELSE 1
END AS rn,
@prev_PersonID := PersonID
FROM (SELECT @prev_PersonID := NULL) vars, tblLocation
ORDER BY PersonID, Timestamp DESC
) T1
JOIN tblLocation location ON location.LocationID = T1.LocationId
JOIN tblPerson person ON person.PersonID = location.PersonID
WHERE rn = 1
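On engines with window functions (MySQL 8.0 and later included), ROW_NUMBER() can do the job of the user variables. Below is a minimal sketch of that approach, run on Python's built-in sqlite3 as a stand-in, which assumes a bundled SQLite of 3.25 or newer. The ISO dates (including the year) and the LocationID tie-breaker are assumptions added for the example; the question only gives dates like "Jan. 2nd".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblPerson (PersonID INTEGER, Name TEXT)")
conn.execute(
    "CREATE TABLE tblLocation (LocationID INTEGER, Timestamp TEXT, "
    "PersonID INTEGER, X INTEGER, Y INTEGER, Z INTEGER)"
)
conn.executemany(
    "INSERT INTO tblPerson VALUES (?, ?)",
    [(1, "John Smith"), (2, "Jane Doe"), (3, "David Hoshi")],
)
# Dates rewritten as ISO strings so they sort correctly; the year is assumed.
conn.executemany(
    "INSERT INTO tblLocation VALUES (?, ?, ?, ?, ?, ?)",
    [
        (40, "2020-01-01", 3, 0, 0, 0),
        (41, "2020-01-02", 1, 1, 1, 0),
        (42, "2020-01-02", 3, 2, 2, 2),
        (43, "2020-01-03", 3, 4, 4, 4),
        (44, "2020-01-05", 2, 0, 0, 0),
    ],
)

# Number each person's rows from newest to oldest and keep row 1.
# LocationID breaks ties between rows with the same timestamp.
latest = conn.execute("""
    SELECT LocationID, Timestamp, Name, X, Y, Z
    FROM (
        SELECT l.LocationID, l.Timestamp, p.Name, l.X, l.Y, l.Z,
               ROW_NUMBER() OVER (
                   PARTITION BY l.PersonID
                   ORDER BY l.Timestamp DESC, l.LocationID DESC
               ) AS rn
        FROM tblLocation l
        JOIN tblPerson p ON p.PersonID = l.PersonID
    ) ranked
    WHERE rn = 1
    ORDER BY LocationID
""").fetchall()

for row in latest:
    print(row)
```

Partitioning by PersonID rather than by name avoids merging two people who happen to share a name.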
