Distinct rows from three tables using joins - sql-server

I have three tables related to the article section of my website. I need to show the top authors based on the number of times each author's articles were read. I use three basic tables to store this information.
Articles has all the details related to articles, author information is stored in Authors, and when a user views a particular article I update or insert a record in Popularity.
Below is sample data:
Articles
ArticleID Title Desc AuthorID
--------- ---------------- ---- --------
1 Article One .... 100
2 Article Two .... 200
3 Article Three .... 100
4 Article Four .... 300
5 Article Five .... 100
6 Article Six .... 300
7 Article Seven .... 500
8 Article Eight .... 100
9 Article Nine .... 600
Authors
AuthorID AuthorName
-------- ------------
100 Author One
200 Author Two
300 Author Three
400 Author Four
500 Author Five
600 Author Six
Popularity
ID ArticleID Hits
-- --------- ----
1 1 20
2 2 50
3 5 100
4 3 11
5 4 21
I am trying to use following query to get the TOP 10 authors:
SELECT TOP 10 AuthorID
,au.AuthorName
,ArticleHits
,SUM(ArticleHits)
FROM Authors au
JOIN Articles ar
ON au.AuthorID = ar.ArticleAuthorID
JOIN Popularity ap
ON ap.ArticleID = ar.ArticleID
GROUP BY AuthorID,1,1,1
But this generates the following error:
Msg 164, Level 15, State 1, Line 12
Each GROUP BY expression must contain at least one column that is not an outer reference.

SQL Server requires that every column in the SELECT list either appears in the GROUP BY clause or is wrapped in an aggregate function. The following query appears to work; as you can see, I included GROUP BY au.AuthorID, au.AuthorName, which covers both columns in the SELECT list that are not inside an aggregate function:
SELECT top 10 au.AuthorID
,au.AuthorName
,SUM(Hits) TotalHits
FROM Authors au
JOIN Articles ar
ON au.AuthorID = ar.AuthorID
JOIN Popularity ap
ON ap.ArticleID = ar.ArticleID
GROUP BY au.AuthorID, au.AuthorName
order by TotalHits desc
See SQL Fiddle with Demo.
I am not sure if you want the Hits in the SELECT statement because you will then have to GROUP BY it. This could alter the Sum(Hits) for each article because if the hits are different in each entry you will not get an accurate sum.
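To check the point about grouping by Hits, here is a minimal sketch that loads the sample data into an in-memory SQLite database through Python's sqlite3 module. SQLite is an assumption made so the example runs anywhere: it uses LIMIT where SQL Server uses TOP, but the GROUP BY behaviour shown is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Authors (AuthorID INTEGER PRIMARY KEY, AuthorName TEXT);
CREATE TABLE Articles (ArticleID INTEGER PRIMARY KEY, Title TEXT, AuthorID INTEGER);
CREATE TABLE Popularity (ID INTEGER PRIMARY KEY, ArticleID INTEGER, Hits INTEGER);
INSERT INTO Authors VALUES
  (100,'Author One'),(200,'Author Two'),(300,'Author Three'),
  (400,'Author Four'),(500,'Author Five'),(600,'Author Six');
INSERT INTO Articles VALUES
  (1,'Article One',100),(2,'Article Two',200),(3,'Article Three',100),
  (4,'Article Four',300),(5,'Article Five',100),(6,'Article Six',300),
  (7,'Article Seven',500),(8,'Article Eight',100),(9,'Article Nine',600);
INSERT INTO Popularity VALUES (1,1,20),(2,2,50),(3,5,100),(4,3,11),(5,4,21);
""")

# One row per author: Hits is only used inside SUM().
per_author = conn.execute("""
    SELECT au.AuthorID, au.AuthorName, SUM(ap.Hits) AS TotalHits
    FROM Authors au
    JOIN Articles ar ON au.AuthorID = ar.AuthorID
    JOIN Popularity ap ON ap.ArticleID = ar.ArticleID
    GROUP BY au.AuthorID, au.AuthorName
    ORDER BY TotalHits DESC
    LIMIT 10
""").fetchall()

# Adding Hits to GROUP BY splits each author into one row per distinct Hits value.
per_author_and_hits = conn.execute("""
    SELECT au.AuthorID, au.AuthorName, SUM(ap.Hits) AS TotalHits
    FROM Authors au
    JOIN Articles ar ON au.AuthorID = ar.AuthorID
    JOIN Popularity ap ON ap.ArticleID = ar.ArticleID
    GROUP BY au.AuthorID, au.AuthorName, ap.Hits
    ORDER BY au.AuthorID, ap.Hits
""").fetchall()

print(per_author)
print(per_author_and_hits)
```

With the sample data, the first query gives Author One a single total of 131, while the second returns three separate rows for Author One (11, 20 and 100), which is no longer a per-author sum.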

I would do it this way. First figure out who your top ten authors are, then go get the name (and any other columns you want to pull along). For this query it's not a huge difference but all that grouping can become more complex and expensive as your output list requirements increase.
;WITH TopAuthors(AuthorID, ArticleHits) AS
(
SELECT TOP (10) a.AuthorID, SUM(p.Hits)
FROM dbo.Authors AS a
INNER JOIN dbo.Articles AS ar
ON a.AuthorID = ar.AuthorID
INNER JOIN dbo.Popularity AS p
ON ar.ArticleID = p.ArticleID
GROUP BY a.AuthorID -- only the key is grouped; names are fetched afterwards
ORDER BY SUM(p.Hits) DESC
)
SELECT t.AuthorID, a.AuthorName, t.ArticleHits
FROM TopAuthors AS t
INNER JOIN dbo.Authors AS a
ON t.AuthorID = a.AuthorID
ORDER BY t.ArticleHits DESC;
For this specific query bluefeet's version is likely to be more efficient. But if you add additional columns to the output (e.g. more info from the authors table) the grouping might outweigh the additional seek or scan I have presented.
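As a rough illustration of the "aggregate first, look up the names afterwards" idea, here is a sketch that runs against an in-memory SQLite database via Python's sqlite3 module. SQLite and the LIMIT 2 cut-off are assumptions chosen to keep the example small and runnable; on SQL Server the CTE would use TOP as in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Authors (AuthorID INTEGER PRIMARY KEY, AuthorName TEXT);
CREATE TABLE Articles (ArticleID INTEGER PRIMARY KEY, Title TEXT, AuthorID INTEGER);
CREATE TABLE Popularity (ID INTEGER PRIMARY KEY, ArticleID INTEGER, Hits INTEGER);
INSERT INTO Authors VALUES
  (100,'Author One'),(200,'Author Two'),(300,'Author Three'),
  (400,'Author Four'),(500,'Author Five'),(600,'Author Six');
INSERT INTO Articles VALUES
  (1,'Article One',100),(2,'Article Two',200),(3,'Article Three',100),
  (4,'Article Four',300),(5,'Article Five',100),(6,'Article Six',300),
  (7,'Article Seven',500),(8,'Article Eight',100),(9,'Article Nine',600);
INSERT INTO Popularity VALUES (1,1,20),(2,2,50),(3,5,100),(4,3,11),(5,4,21);
""")

top_authors = conn.execute("""
    WITH TopAuthors AS (
        -- Step 1: group on the key only and keep the top N authors.
        SELECT ar.AuthorID, SUM(p.Hits) AS ArticleHits
        FROM Articles AS ar
        JOIN Popularity AS p ON p.ArticleID = ar.ArticleID
        GROUP BY ar.AuthorID
        ORDER BY ArticleHits DESC
        LIMIT 2
    )
    -- Step 2: pull the descriptive columns for just those authors.
    SELECT t.AuthorID, a.AuthorName, t.ArticleHits
    FROM TopAuthors AS t
    JOIN Authors AS a ON a.AuthorID = t.AuthorID
    ORDER BY t.ArticleHits DESC
""").fetchall()

print(top_authors)
```

Any further author columns can be added to the outer SELECT without touching the GROUP BY inside the CTE.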

Every column in the SELECT list that is not wrapped in an aggregate function has to be present in the GROUP BY clause. In your case, AuthorID, au.AuthorName and ArticleHits should all be present. Hence the GROUP BY clause would become
GROUP BY AuthorID, au.AuthorName, ArticleHits
This should help.

Related

How do I get multiple values from a table in SQLite?

I have three tables:
authors
id name
1 Albert
2 Bobby
3 Carl
4 Dan
authors_musicals
rowid author_id musical_id
1 1 1
2 2 1
3 1 2
4 1 3
musicals
id title year
1 Brigadoon 1947
2 My Fair Lady 1956
3 Oklahoma! 1943
4 Camelot 1960
I need to get all the titles belonging to Albert (his id (1) from authors corresponds to musical_id (1, 2, 3) in authors_musicals which each correspond to title (Brigadoon, My Fair Lady, Oklahoma!) in musicals). I thought the following would work:
SELECT title FROM musicals WHERE id=(SELECT musical_id FROM authors_musicals WHERE author_id=(SELECT id FROM authors WHERE name="Albert"));
This only gives me the first listing. How can I get all three and since these tables are linked, is there a simpler way of getting what I want?
JOIN the tables:
SELECT musicals.title
FROM musicals
JOIN authors_musicals ON (musicals.id = authors_musicals.musical_id)
JOIN authors ON (authors.id = authors_musicals.author_id)
WHERE authors.name = "Albert"
I don't use SQLite but I would assume that it's basically the same as using SQL for any other database. When you use SomeColumn = SomeValue you can only have one value on the right-hand side. Even if your subqueries produce multiple results, only the first will be used because you're using =.
You should be able to keep your current SQL structure and make it work by replacing = with IN, assuming that SQLite supports that operator. Then you'll be comparing to all the results instead of just one.
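For what it's worth, SQLite does support IN. The sketch below, which assumes Python's sqlite3 module and an in-memory copy of the sample tables (the implicit rowid column is left out), runs the original nested query once with = and once with IN to show the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE musicals (id INTEGER PRIMARY KEY, title TEXT, year INTEGER);
CREATE TABLE authors_musicals (author_id INTEGER, musical_id INTEGER);
INSERT INTO authors VALUES (1,'Albert'),(2,'Bobby'),(3,'Carl'),(4,'Dan');
INSERT INTO musicals VALUES
  (1,'Brigadoon',1947),(2,'My Fair Lady',1956),(3,'Oklahoma!',1943),(4,'Camelot',1960);
INSERT INTO authors_musicals VALUES (1,1),(2,1),(1,2),(1,3);
""")

# Same nested structure as the question; only the comparison operator varies.
QUERY = """
    SELECT title FROM musicals
    WHERE id {op} (SELECT musical_id FROM authors_musicals
                   WHERE author_id {op} (SELECT id FROM authors WHERE name = ?))
    ORDER BY id
"""

# "=" compares against a single value taken from the subquery.
with_equals = [row[0] for row in conn.execute(QUERY.format(op="="), ("Albert",))]
# "IN" compares against every row the subquery returns.
with_in = [row[0] for row in conn.execute(QUERY.format(op="IN"), ("Albert",))]

print(with_equals)
print(with_in)
```

The = version yields a single title, while the IN version yields all three of Albert's musicals.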
That said, I don't think that you should be using subqueries at all. It seems more appropriate to be using joins there. Again, there might be some small syntax difference but something like this should work:
SELECT title FROM musicals INNER JOIN authors_musicals
ON musicals.id = authors_musicals.musical_id INNER JOIN authors
ON authors.id = authors_musicals.author_id
WHERE authors.name = 'Albert'
Combine the info between tables and get what you need:
SELECT title
FROM authors, authors_musicals, musicals
WHERE name="Albert" and authors.id=authors_musicals.author_id and musical_id = musicals.id;

MSAccess/SQL lookup table for match field based on sum of current table.field

I've been battling this for the last week with many attempted solutions. I want to return the unique names in a table with the sum of their points and their current dance level based on that sum. Ultimately I want to compare the returned dance level with the one stored against the customer in the customer table, and show only the records where the two dance levels are different (the stored dance level and the calculated dance level based on the current sum of the points).
The final solution will be a web page using ADODB connection to MSAccess DB (2013). But for starters just want it to work in MSAccess.
I have a MSAccess DB (2013) with the following tables.
PointsAllocation
CustomerID Points
100 2
101 1
102 1
100 1
101 4
DanceLevel
DLevel Threshold
Beginner 2
Intermediate 4
Advanced 6
Customer
CID Firstname Dancelevel1
100 Bob Beginner
101 Mary Beginner
102 Jacqui Beginner
I want to find the current DLevel for each customer by using the SUM of their Points in the first table. I have this first...
SELECT SUM(Points), CustomerID FROM PointsAllocation GROUP BY CustomerID
This works well and gives me the total points per customer. I can then INNER JOIN this to the customer table to get the person's name. Perfect.
Now I want to add the DLevel from the DanceLevel table to the results where the SUM total is used to lookup the Threshold and not exceed the value so I get the following:
(1) (2) (3) (4)
Bob 3 Beginner Intermediate
Mary 5 Beginner Advanced
Where...
(1) Customer.Firstname
(2) SUM(PointsAllocation.Points)
(3) Customer.Dancelevel1
(4) Dancelevel.DLevel
Jacqui is not shown because her SUM of Points is less than or equal to 2, giving her a calculated dance level of Beginner, which already matches her Dancelevel1 in the Customer table.
Any ideas anyone?
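You can start from the Customer table because you want to list every customer. Then left join it with a subquery that calculates the point totals and dance levels. The innermost subquery totals the points per customer; the next level joins those totals to every dance level whose threshold is at least the total and keeps the minimum such threshold. Finally, left join the DanceLevel table again on that threshold value to get the level's description. To show only the customers whose stored level differs from the calculated one, a final WHERE Customer.Dancelevel1 <> DanceLevel.DLevel can be added.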
Select Customer.Firstname,
CustomerDanceLevels.Points,
Customer.Dancelevel1,
Dancelevel.DLevel
from Customer
left join
(select CustomerID, Points, Min(Threshold) Threshold
from
(select CustomerID, sum(Points) Points
from PointsAllocation
group by CustomerID
) PointsTotal
left join DanceLevel
on PointsTotal.Points <= DanceLevel.Threshold
group by CustomerID, Points
) CustomerDanceLevels
on Customer.CID = CustomerDanceLevels.CustomerID
left join DanceLevel
on CustomerDanceLevels.Threshold = DanceLevel.Threshold
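The lookup can be checked outside Access with a small sketch. The version below assumes SQLite (through Python's sqlite3 module) rather than Access, so the syntax may need adjusting there, and it adds the "levels differ" filter the question asks for. The alias Total is introduced here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PointsAllocation (CustomerID INTEGER, Points INTEGER);
CREATE TABLE DanceLevel (DLevel TEXT, Threshold INTEGER);
CREATE TABLE Customer (CID INTEGER PRIMARY KEY, Firstname TEXT, Dancelevel1 TEXT);
INSERT INTO PointsAllocation VALUES (100,2),(101,1),(102,1),(100,1),(101,4);
INSERT INTO DanceLevel VALUES ('Beginner',2),('Intermediate',4),('Advanced',6);
INSERT INTO Customer VALUES
  (100,'Bob','Beginner'),(101,'Mary','Beginner'),(102,'Jacqui','Beginner');
""")

mismatches = conn.execute("""
    SELECT c.Firstname, t.Total, c.Dancelevel1, d.DLevel
    FROM Customer AS c
    JOIN (SELECT CustomerID, SUM(Points) AS Total
          FROM PointsAllocation
          GROUP BY CustomerID) AS t
      ON t.CustomerID = c.CID
    -- Pick the level with the smallest threshold that the total does not exceed.
    JOIN DanceLevel AS d
      ON d.Threshold = (SELECT MIN(Threshold)
                        FROM DanceLevel
                        WHERE Threshold >= t.Total)
    -- Keep only customers whose stored level is out of date.
    WHERE d.DLevel <> c.Dancelevel1
    ORDER BY c.CID
""").fetchall()

print(mismatches)
```

Jacqui's total of 1 maps to Beginner, which matches her stored level, so only Bob and Mary are returned.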

OLTP-Database design

I need help. I have 2 tables Books and Authors
One book can have multiple Authors
One Author can write multiple Books
So I designed Mapping/Junction table to maintain this relation
My requirement: I want to get the Book ID and Name for a given combination of Authors.
For example, Book 'B3' (103) below is written by Authors A2 and A3. So my input will be 302 and 303 (the IDs of A2 and A3) and the query should give me 103 (the book ID).
Please suggest schema changes if required.
Here is sample code for SQL Server (the multi-row VALUES lists need SQL Server 2008 or above):
declare @tbl_Books TABLE (Book_ID INT, Book_Name VARCHAR(500))
declare @tbl_Authors TABLE (Author_ID INT, Author_Name VARCHAR(50))
declare @tbl_Mapping TABLE (Mapping_ID INT IDENTITY(1,1), Book_ID INT, Author_ID INT)
insert into @tbl_Books VALUES (101,'B1'),(102,'B2'),(103,'B3')
insert into @tbl_Authors VALUES (301,'A1'),(302,'A2'),(303,'A3')
insert into @tbl_Mapping VALUES (101,301),(101,302),(102,301),(102,302),(101,303),(103,302),(103,303)
select * from @tbl_Books
select * from @tbl_Authors
select * from @tbl_Mapping
Table : tbl_Books
==========
Book_ID Book_Name
101 B1
102 B2
103 B3
Table: tbl_Authors
===================
Author_ID Author_name
301 A1
302 A2
303 A3
Table:tbl_Mapping
==============
Mapping_ID Book_ID Author_ID
1 101 301
2 101 302
3 102 301
4 102 302
5 102 303
6 103 302
7 103 303
This isn't pretty but it works:
SELECT x.book_id, b.book_name
FROM (SELECT book_id, COUNT(*) AS num FROM tbl_mapping GROUP BY book_id) x --Get all books with a count of their authors
INNER JOIN (SELECT book_id FROM tbl_mapping WHERE author_id IN (302,303)) y --Get all books which involve the specified authors
ON y.book_id = x.book_id
INNER JOIN tbl_books b
ON b.book_id = x.book_id
WHERE x.num = 2 --Filter for books which have exactly the required number of authors
GROUP BY x.book_id, b.book_name
HAVING COUNT(*) = 2 --Filter for how many times each book appears in the results. We want those that appear as many times as there are authors being searched
To make it less static, you would have to build the IN clause from the list of author IDs you supply, and change both occurrences of = 2 to the number of authors being searched for.
I tested it by adding another book to your example data written by only one author and adjusting the query accordingly. It returned what I expected. Also tried the book with three authors which works too. This hardly constitutes robust testing but it proves the basic concept. I'm certain there's a nicer way to do this possibly using window functions but frankly it's my dinner time and I'm starving so I can't think of it!
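One way to make the author list dynamic is to generate the placeholders in application code. The sketch below is an assumption-laden illustration: it uses Python's sqlite3 module with an in-memory copy of the data from the question's insert statements, a hypothetical helper named books_by_exact_authors, and a single-pass variant of the same counting idea. It assumes tbl_Mapping never holds the same (book, author) pair twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_Books (Book_ID INTEGER PRIMARY KEY, Book_Name TEXT);
CREATE TABLE tbl_Mapping (Mapping_ID INTEGER PRIMARY KEY, Book_ID INTEGER, Author_ID INTEGER);
INSERT INTO tbl_Books VALUES (101,'B1'),(102,'B2'),(103,'B3');
INSERT INTO tbl_Mapping (Book_ID, Author_ID) VALUES
  (101,301),(101,302),(102,301),(102,302),(101,303),(103,302),(103,303);
""")

def books_by_exact_authors(conn, author_ids):
    """Return (Book_ID, Book_Name) rows whose author set equals author_ids."""
    ids = sorted(set(author_ids))
    placeholders = ",".join("?" for _ in ids)  # one "?" per author ID
    sql = f"""
        SELECT b.Book_ID, b.Book_Name
        FROM tbl_Books AS b
        JOIN tbl_Mapping AS m ON m.Book_ID = b.Book_ID
        GROUP BY b.Book_ID, b.Book_Name
        -- the book has exactly N authors ...
        HAVING COUNT(*) = ?
        -- ... and all N of them are in the requested list
           AND SUM(CASE WHEN m.Author_ID IN ({placeholders}) THEN 1 ELSE 0 END) = ?
        ORDER BY b.Book_ID
    """
    return conn.execute(sql, [len(ids), *ids, len(ids)]).fetchall()

print(books_by_exact_authors(conn, [302, 303]))
print(books_by_exact_authors(conn, [301, 302]))
```

Only the placeholders are interpolated into the SQL text; the IDs themselves are passed as bound parameters.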
So you are looking for Book ID and Book names for a given set of authors.
You could try something like the following (the bracketed placeholders have to be filled in by the caller):
select tb.Book_ID, tb.Book_Name, COUNT(tm.Author_ID) as authors FROM tbl_Mapping tm
INNER JOIN tbl_Books tb on tb.Book_ID = tm.Book_ID
WHERE tm.Author_ID IN ( <your list of authors>)
GROUP BY tb.Book_ID, tb.Book_Name
HAVING COUNT(tm.Author_ID) = (<the number of authors passed in>)
A column alias such as authors cannot be used as a filter in the WHERE clause, so the count is checked in HAVING instead. Note that this also returns books that have further authors besides the ones listed.
A programmatic approach, however, would be to have a query like:
select Book_ID from tbl_Mapping WHERE Author_ID = <One author ID>
And put it in a loop. The above query is the first execution, and later queries also have
AND BOOK_ID IN (<List of Book IDs returned by previous loops>)
You loop until you run out of authors, and then you run those IDs through a query to get the name (or you tack the name on to the previous queries and also track it).

Viable ways to have an AutoNumber per user (or per entity)?

Let's say you have a web application that manages books for book sellers, and it is built on a multi-tenant database with a single books table that contains books from several book sellers.
Now let's say that each book seller really wants each of their books to have a unique number associated with it so they can look books up by that number, but it's important to them that the number is roughly consecutive for them. (It's OK if there are small breaks in the sequence due to deleted books and other events that cause an AutoNumber to get consumed but not used).
Obviously each book already has a unique number (primary key) associated with it that is generated via AutoNumber and is unique across book sellers. That is not what I am discussing here.
Let's just assume SQL-Server from here on, but the discussion applies equally to Oracle (except that Oracle uses Sequences that are independent of tables, and the current version of SQL Server must use a table to accomplish the same thing).
We want a number that increments safely in the context of a book seller. We want to maintain the benefits of using AutoNumber, but we want there to be one sequence per book seller. It seems like there are two options, and neither is very good:
1. Create one single-column table per book seller. This scares me because I can't think of another example of dynamically changing the schema (adding a new table whenever a new book seller is added to the system via the web application) in a web request. It also seems really heavyweight to have one table per book seller. I know a future version of SQL-server will support Sequences, but even that would still be a schema change at run time.
2. Roll your own auto-numbering behavior. This seems really risky because databases' built-in AutoNumber features take care of a lot of stuff for you, and giving that up is a big deal. Attempts to re-implement it yourself are probably error-prone and may cause poorer concurrency than the built-in AutoNumber.
Hopefully there are additional options that I'm missing. Has anyone successfully dealt with a similar situation? Thanks.
Is there a reason you couldn't have a 2 field table with:
BookSeller_ID, BookID
You wouldn't need to change schema as you add sellers, and it would be trivial to track per seller:
SELECT MAX(BookID)
FROM ThisTable
WHERE BookSeller_ID = 123
For additional info you could also add a Universal_BookID field that linked to your unique ID referenced in the 3rd paragraph.
EDIT:
To clarify, if you have sellers 1 2 and 3 you could have a table like:
SellerID BookID BookUniversalID
1 1 123
2 1 456
3 1 999
1 2 1234
1 3 8798
1 4 999
1 5 10000001
3 2 123
3 3 456
You keep track of which seller has which IDs assigned and which actual book each one links to, and to determine the next ID for a seller you just query
SELECT MAX(bookid) FROM ThisTable WHERE SellerID = 1
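A rough sketch of how the MAX-based lookup might be turned into an insert is shown below. It assumes SQLite via Python's sqlite3 module, a table named SellerBooks and a helper named add_book, all of which are hypothetical. The UNIQUE constraint is the safety net: with several concurrent writers on a server database, two sessions could compute the same next number, and one of them would then fail and need to retry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE SellerBooks (
        SellerID        INTEGER NOT NULL,
        BookID          INTEGER NOT NULL,  -- per-seller running number
        BookUniversalID INTEGER NOT NULL,  -- link to the global primary key
        UNIQUE (SellerID, BookID)
    )
""")

def add_book(conn, seller_id, universal_id):
    """Insert a book for a seller and return the per-seller number it received."""
    with conn:  # one transaction: commits on success, rolls back on error
        # Compute MAX + 1 and insert in a single statement;
        # COALESCE covers a seller's first book.
        conn.execute(
            """INSERT INTO SellerBooks (SellerID, BookID, BookUniversalID)
               SELECT ?, COALESCE(MAX(BookID), 0) + 1, ?
               FROM SellerBooks
               WHERE SellerID = ?""",
            (seller_id, universal_id, seller_id),
        )
        row = conn.execute(
            "SELECT MAX(BookID) FROM SellerBooks WHERE SellerID = ?",
            (seller_id,),
        ).fetchone()
    return row[0]

assigned = [
    add_book(conn, 1, 123),
    add_book(conn, 2, 456),
    add_book(conn, 1, 1234),
    add_book(conn, 1, 8798),
]
print(assigned)
```

Each seller gets its own run of numbers starting at 1, independent of the other sellers.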
DENSE_RANK, works in SQL Server and Oracle
Assuming your table looks vaguely thus
CREATE TABLE dbo.BOOKS
(
internal_book_id int identity(1,1) primary key
, seller_id int NOT NULL
, title varchar(50) NOT NULL
)
Whenever you present the identity value to the seller, use the dense_rank() function to generate the surrogate values.
CREATE VIEW dbo.BOOK_TO_SELLER_MAP
AS
SELECT
B.*
, DENSE_RANK() OVER (PARTITION BY B.seller_id ORDER BY B.internal_book_id ASC) AS unique_book_id_for_seller
FROM
dbo.BOOKS B
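-- A view cannot take a parameter, so filter when querying it instead,
-- e.g. SELECT ... FROM dbo.BOOK_TO_SELLER_MAP WHERE seller_id = @sellerId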
For the combination of seller_id and the generated id, you ought to always match back to the true id (assuming no physical deletes).
Demo code
;
WITH BOOKS (internal_book_id, seller_id, title)
AS
(
SELECT 1, 100, 'Secret of NIMH'
UNION ALL SELECT 2, 400, 'Once and Future King'
UNION ALL SELECT 7, 88, 'Microsoft SQL Server 2008'
UNION ALL SELECT 8, 100, 'Bonfire of the Vanities'
UNION ALL SELECT 9, 100, 'Canary Row'
UNION ALL SELECT 10, 400, '1916'
UNION ALL SELECT 11, 100, 'The Picture of Dorian Gray'
UNION ALL SELECT 12, 88, 'The Disasters of War'
)
, BOOK_TO_SELLER_MAP AS
(
SELECT
B.*
, DENSE_RANK() OVER (PARTITION BY B.seller_id ORDER BY B.internal_book_id ASC) AS unique_book_id_for_seller
FROM
BOOKS B
)
SELECT
*
FROM
BOOK_TO_SELLER_MAP V
ORDER BY
V.seller_id
, V.unique_book_id_for_seller
Results
internal_book_id seller_id title unique_book_id_for_seller
7 88 Microsoft SQL Server 2008 1
12 88 The Disasters of War 2
-------------------------------------------------------------------------------
1 100 Secret of NIMH 1
8 100 Bonfire of the Vanities 2
9 100 Canary Row 3
11 100 The Picture of Dorian Gray 4
-------------------------------------------------------------------------------
2 400 Once and Future King 1
10 400 1916 2
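The "assuming no physical deletes" caveat can be seen in a small sketch. It assumes SQLite 3.25 or later (the first version with window functions) through Python's sqlite3 module, and uses only seller 100's rows from the demo data. The helper seller_numbers is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BOOKS (
    internal_book_id INTEGER PRIMARY KEY,
    seller_id        INTEGER NOT NULL,
    title            TEXT NOT NULL
);
INSERT INTO BOOKS VALUES
  (1,100,'Secret of NIMH'),(8,100,'Bonfire of the Vanities'),
  (9,100,'Canary Row'),(11,100,'The Picture of Dorian Gray');
""")

def seller_numbers(conn, seller_id):
    """Map each title of one seller to its DENSE_RANK-generated number."""
    rows = conn.execute(
        """SELECT title,
                  DENSE_RANK() OVER (PARTITION BY seller_id
                                     ORDER BY internal_book_id) AS seller_book_no
           FROM BOOKS
           WHERE seller_id = ?
           ORDER BY internal_book_id""",
        (seller_id,),
    ).fetchall()
    return dict(rows)

before = seller_numbers(conn, 100)
conn.execute("DELETE FROM BOOKS WHERE internal_book_id = 8")  # physical delete
after = seller_numbers(conn, 100)

print(before["Canary Row"], after["Canary Row"])
```

Because the number is computed at query time, deleting an earlier row shifts every later book of that seller down by one, so the generated numbers are only stable if rows are never physically removed.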
OMG Ponies is correct that sequences are the only correct way to achieve this. There isn't really another viable option.

Outputting Results from complicated database structure (SQL Server)

This will be a long question so I'll try and explain it as best as I can.
I've developed a simple reporting tool in which a number of results are stored and given a report id, these results were generated from a particular quote being used on the main system, with a huge list of these being stored in a quotes table. Here are the current batch:
REPORTS
REP_ID DESC QUOTE_ID
-----------------------------------
1 Test 1
2 Today 1
3 Last Week 2
RESULTS
RES_ID TITLE REFERENCE REP_ID
---------------------------------------------------
1 Equipment Toby 1
2 Inventory Carl 1
3 Stocks Guest 2
4 Portfolios Guest 3
QUOTE
QUOTE_ID QUOTE
------------------------------------
1 Booking a meeting room
2 Car Park Policy
3 New User Guide
So far, so good, a simple stored procedure was able to pull all the information necessary.
Now, the feature list has been upped to include categories and groups of the quotes. In the Reports table quote_id has been changed to group_id to link to the following tables.
REPORTS
- REPORT_ID
- DESC
- GROUP_ID
GROUP
- GROUP_ID
- GROUP
GROUP_CAT_JOIN
- GCJ_ID
- CAT_ID
- GROUP_ID
CATEGORIES
- CAT_ID
- CATEGORY
CAT_QUOTE_JOIN
- CQJ_ID
- CAT_ID
- QUOTE_ID
The idea of these changes is so that instead of running a report on a quote I should now write a report for a group where a group is a set of quotes for certain occasions. I should also be able to run a report on a category where a category is also a set of quotes for certain departments. The trick is that several categories can fall into one group.
To explain it further, the results table has a report_id that links to reports, reports has a group_id that links to groups, groups and categories are linked through a group_cat_join table, the same with categories and quotes through a cat_quote_join table.
In basic terms I should be able to pull all the results from either a group of quotes or a category of quotes. The query will aim to pull all the results from a certain report under either a certain category, a group or both. This puzzle has left me stumped for days now as inner joins don't appear to be working and I'm struggling to find other ways to solve the problem using SQL.
Can anyone here help me?
Here's some extra clarification.
I want to be able to return all the results within a category, but as of right now the solution below and the ones I've tried always output every solution within a description, which is not what I want.
Here's an example of the data I have in there at the moment
Results
RES_ID TITLE REFERENCE REP_ID
---------------------------------------------------
1 Equipment Toby 1
2 Inventory Carl 1
3 Stocks Guest 2
4 Portfolios Guest 3
Reports
REP_ID DESC GROUP_ID
-----------------------------------
1 Test 1
2 Today 1
3 Last Week 2
GROUP
GROUP_ID GROUP
---------------------------------
1 Standard
2 Target Week
GROUP_CAT_JOIN
GCJ_ID GROUP_ID CAT_ID
----------------------------------
1 1 1
2 1 2
3 2 3
CATEGORIES
CAT_ID CAT
-------------------------------
1 York Office
2 Glasgow Office
3 Aberdeen Office
CAT_QUOTE_JOIN
CQJ_ID CAT_ID QUOTE_ID
-----------------------------------
1 1 1
2 2 2
3 3 3
QUOTE
QUOTE_ID QUOTE
------------------------------------
1 Booking a meeting room
2 Car Park Policy
3 New User Guide
This is the test data I am using at the moment and to my knowledge it is similar to what will be run through once this is done. In all honesty I'm still trying to get my head around this structure.
The result I am looking for is if I choose to search by group I'll get everything within a group, if I choose everything inside a category I get everything just inside that category, and if I choose something from a category in a group I get everything inside that category. The problem at the moment is that whenever the group is referenced everything inside every category that's linked to the group is pulled.
The following will get the necessary rows from the results:
select
a.*
from
results a
inner join reports b on
a.rep_id = b.rep_id
and (-1 = @GroupID or
b.group_id = @GroupID)
and (-1 = @CatID or
b.cat_id = @CatID)
Note that I used -1 as the placeholder for all Groups and Categories. Obviously, use a value that makes sense to you. However, this way, you can specify a specific group_id or a specific cat_id and get the results that you want.
Additionally, if you want Group/Category/Quote details, you can always append more inner joins to get that info.
Also note that I added the Group_ID and Cat_ID conditions to the Reports table. This would be the SQL necessary if and only if you add a Cat_ID column to the Reports table. I know that your current table structure doesn't support this, but it needs to. Otherwise, as my grandfather used to say, "Boy, you can't get there from here." The issue here is that you want to limit reports by group and category, but reports only knows about group. Therefore, we need to tie something to the category from reports. Otherwise, it will never, ever, ever limit reports by category. The only thing that you can limit by both group and category is quotes. And that doesn't seem to be your requirement.
As an addendum: If you add cat_id to results instead of reports, the join condition should be:
and (-1 = @CatID or
a.cat_id = @CatID)
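To see why the current structure cannot limit reports by category, here is a sketch that assumes SQLite via Python's sqlite3 module and loads the question's test data. The reserved column name DESC is renamed to description, and the helper result_ids_for_category is hypothetical. It filters by category the only way the existing tables allow, by going through GROUP_CAT_JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE results (res_id INTEGER PRIMARY KEY, title TEXT, reference TEXT, rep_id INTEGER);
CREATE TABLE reports (rep_id INTEGER PRIMARY KEY, description TEXT, group_id INTEGER);
CREATE TABLE group_cat_join (gcj_id INTEGER PRIMARY KEY, group_id INTEGER, cat_id INTEGER);
INSERT INTO results VALUES
  (1,'Equipment','Toby',1),(2,'Inventory','Carl',1),
  (3,'Stocks','Guest',2),(4,'Portfolios','Guest',3);
INSERT INTO reports VALUES (1,'Test',1),(2,'Today',1),(3,'Last Week',2);
INSERT INTO group_cat_join VALUES (1,1,1),(2,1,2),(3,2,3);
""")

def result_ids_for_category(conn, cat_id):
    """Result IDs reachable from a category via its group (the only path available)."""
    rows = conn.execute(
        """SELECT DISTINCT r.res_id
           FROM results AS r
           JOIN reports AS rp ON rp.rep_id = r.rep_id
           JOIN group_cat_join AS g ON g.group_id = rp.group_id
           WHERE g.cat_id = ?
           ORDER BY r.res_id""",
        (cat_id,),
    )
    return [row[0] for row in rows]

york = result_ids_for_category(conn, 1)
glasgow = result_ids_for_category(conn, 2)
aberdeen = result_ids_for_category(conn, 3)
print(york, glasgow, aberdeen)
```

York Office and Glasgow Office both belong to group 1, so they return exactly the same results: nothing in the data ties a report or a result to one category rather than the other.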
Is this what you are looking for?
SELECT a.*
FROM Results a
JOIN Reports b ON a.REP_Id = b.REP_Id
WHERE EXISTS (
SELECT * FROM CAT_QUOTE_JOIN c
WHERE c.QUOTE_ID = b.QUOTE_ID -- correlation to the outer query
AND c.CAT_ID = @CAT_ID -- parameterization
)
OR EXISTS (
-- note that subquery table aliases are not visible to other subqueries
-- so we can reuse the same letters
SELECT * FROM CAT_QUOTE_JOIN c, GROUP_CAT_JOIN d
WHERE c.CAT_ID = d.CAT_ID -- subquery join
AND c.QUOTE_ID = b.QUOTE_ID -- correlation to the outer query
AND d.GROUP_ID = @GROUP_ID -- parameterization
)
