Get rows from different table in a column in select query - sql-server

I want to see the matching rows from a different table in a single column.
This is the same problem as the question "How to select row/s from an other table using a comma-separated value from the first table?" (itself a duplicate of "MySQL Join two tables with comma separated values"), but for SQL Server.
Tables from the linked question:
Table 1. faculty
subject
--------
101, 102
104
103, 105
Table 2. subject
code subject
---- ---------
101  subject 1
102  subject 2
103  subject 3
104  subject 4
105  subject 5
Expected Output:
subject   subject
--------  --------------------
101, 102  subject 1, subject 2
104       subject 4
103, 105  subject 3, subject 5
Basically, I don't want multiple columns from the joins, but all corresponding values in one column, comma separated.

In newer versions of SQL Server you can split a delimited string into rows with STRING_SPLIT() in a CROSS APPLY. After joining, STRING_AGG aggregates the strings back into delimited form. Putting the same pieces together as in the linked question gives the query below. NOTE that this requires a version of SQL Server that supports both functions: STRING_SPLIT was added in SQL Server 2016 (compatibility level 130 or higher) and STRING_AGG in SQL Server 2017. The split values are trimmed because the lists are separated by a comma followed by a space.
WITH faculty_split AS
(
SELECT faculty.subject, LTRIM(RTRIM(fac.value)) AS code
FROM faculty
CROSS APPLY STRING_SPLIT(faculty.subject, ',') fac
)
SELECT faculty_split.subject AS codes, STRING_AGG(subject.subject, ', ') AS subjects
FROM faculty_split
INNER JOIN subject ON faculty_split.code = subject.code
GROUP BY faculty_split.subject
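For anyone who wants to try the split-and-aggregate approach without creating tables, here is a minimal self-contained sketch. It assumes SQL Server 2017 or later; the table variables @faculty and @subject, their column types, and the alias part are illustrative choices, not taken from the original schema. The WITHIN GROUP clause is added only to make the order inside each list predictable.

```sql
-- Sample data copied from the question; table variables keep the script self-contained.
DECLARE @faculty TABLE (subject varchar(50));
DECLARE @subject TABLE (code int, subject varchar(50));

INSERT INTO @faculty (subject)
VALUES ('101, 102'), ('104'), ('103, 105');

INSERT INTO @subject (code, subject)
VALUES (101, 'subject 1'), (102, 'subject 2'), (103, 'subject 3'),
       (104, 'subject 4'), (105, 'subject 5');

-- Split each list into one row per code, look up the name, then glue the names back together.
SELECT f.subject AS codes,
       STRING_AGG(s.subject, ', ') WITHIN GROUP (ORDER BY s.code) AS subjects
FROM @faculty AS f
CROSS APPLY STRING_SPLIT(f.subject, ',') AS part
INNER JOIN @subject AS s
    ON s.code = CAST(LTRIM(RTRIM(part.value)) AS int)  -- trim the space left after each comma
GROUP BY f.subject;
```

Because the grouping key is the original list text, two faculty rows holding exactly the same list would collapse into one output row; grouping by a real key column avoids that if the table has one.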

Related

SELECT INTO returning multiple records when it should return 1

EDIT:
I've since edited the query using JOINS instead of a WHERE clause in light
of suggested comments. I was using a WHERE clause instead of JOIN because I
couldn't get it to work across three tables but have figured it out. I've
also inserted SELECT DISTINCT because it does solve the problem.
Thanks @MichaelEvanchik and @SeanLange for the help. I'm still learning and hope I don't frustrate you guys too much.
I've looked over many of the multiple return threads and don't seem to find an answer that helps me.
I have 4 tables.
table1
ID Cat1_Name1 Cat1_Name2 Cat2_Name1 Cat2_Name2
12 Mike Mike George Mike
13 Jen Jen Amy Amy
14 Jeff Jen Mike Ben
15 Jeff Jeff Fred Tom
16 George Jen Luke Amy
table2
ID Cat1_Name1 Cat1_Name2 Cat2_Name1 Cat2_Name2
25 Mike Mike Jen George
table3
Name Cat1_Value Cat2_Value
Mike 6.5 20.25
Jen 10.2 0.5
Jeff 11.5 1.5
George 8.0 27.1
table4
Name Cat1_Value Cat2_Value
Mike 7.8 20.0
Jen 6.0 13.0
Jeff 13.2 5.0
George 8.0 1.2
Before anyone asks, the set of names in table2 must stay separate from table1. It isn't duplicate information, but a SINGLE UNKNOWN SET that will be compared to every record in table1, which can contain millions of known sets (i.e., no IDs in table1 will ever match the ID in table2). If you look at the tables, you can see that the set of names CAN match between table1 and table2 but does not have to. For example, the names for cat1 match between tables 1 and 2 for IDs 12 and 25 (all 4 are Mike) but don't match between IDs 13, 14, 15, 16 and 25 (only the two in 25 are Mike). For cat2, IDs 12 and 25 match partially (i.e., the cat2 names in tables 1 and 2 both contain George but do not match in the second name). Here I show two categories. There will be upwards of 30 categories of names for one record, but for now I am focusing on one to solve this particular problem: Cat1_Name1, Cat1_Name2. I will worry about aggregating the different categories and logical name combinations with JOINs and UNIONs, and answer my other question, later... hopefully.
I want to create a new table that returns the ID from table1, with the associated value for each category depending on how many names match in the category. For example, since Cat1_Name1 and Cat1_Name2 in table1 are Mike, Mike AND Cat1_Name1 and Cat1_Name2 in table2 are Mike, Mike, return the ID from table1 (12) and the value in table3 for cat1 (6.5). Different sets of matching names would return values from different tables (i.e., the partially matching set in cat2 between 12 and 25 might return the value from table4, etc.). I asked a similar question previously, but the problem was different:
Returning Results from different tables depending on conditions from two other tables
I have a partial answer for it but now have a different problem. I plan on posting an answer to the first once I figure out this problem (hopefully with a little help).
Here’s my query:
SELECT DISTINCT dbo.table1.ID, dbo.table3.Cat1_Value
INTO Cat1Table
FROM dbo.table2
INNER JOIN dbo.table3 ON (dbo.table1.Cat1_Name1 = dbo.table3.Name ) AND
(dbo.table1.Cat1_Name2 = dbo.table3.Name )
INNER JOIN dbo.table1 ON (dbo.table2.Cat2_Name1 = dbo.table3.Name ) AND
(dbo.table2.Cat2_Name2 = dbo.table3.Name )
Result table that I want:
Cat1Table
ID Cat1_Value
12 6.5
What I’m getting:
Cat1Table
ID Cat1_Value
12 6.5
12 6.5
Why am I getting a duplicate? Is it my logic or am I missing something else more simple? If I use SELECT DISTINCT it gives me the correct return but I'm thinking there might be a more efficient way because this will be expanded to millions of records. Wouldn't SELECT DISTINCT slow everything down?
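As a starting point, here is a minimal sketch of one possible reading of the cat1 requirement: both cat1 names in table1 equal both cat1 names in table2, and the two names are the same person, so a single lookup in table3 suffices. The table variables, column types and aliases are assumptions made for illustration, and only the cat1 columns are included. Every table is introduced in FROM/JOIN before its columns are referenced.

```sql
-- Sample data from the question, reduced to the cat1 columns.
DECLARE @table1 TABLE (ID int, Cat1_Name1 varchar(20), Cat1_Name2 varchar(20));
DECLARE @table2 TABLE (ID int, Cat1_Name1 varchar(20), Cat1_Name2 varchar(20));
DECLARE @table3 TABLE (Name varchar(20), Cat1_Value decimal(5, 2));

INSERT INTO @table1 VALUES
    (12, 'Mike', 'Mike'), (13, 'Jen', 'Jen'), (14, 'Jeff', 'Jen'),
    (15, 'Jeff', 'Jeff'), (16, 'George', 'Jen');
INSERT INTO @table2 VALUES (25, 'Mike', 'Mike');
INSERT INTO @table3 VALUES ('Mike', 6.5), ('Jen', 10.2), ('Jeff', 11.5), ('George', 8.0);

SELECT t1.ID, t3.Cat1_Value
FROM @table1 AS t1
INNER JOIN @table2 AS t2
    ON  t2.Cat1_Name1 = t1.Cat1_Name1   -- first cat1 name matches the unknown set
    AND t2.Cat1_Name2 = t1.Cat1_Name2   -- second cat1 name matches the unknown set
INNER JOIN @table3 AS t3
    ON t3.Name = t1.Cat1_Name1          -- look up the value for the shared name
WHERE t1.Cat1_Name1 = t1.Cat1_Name2;    -- both names are the same person
```

With this sample data only ID 12 qualifies. In this shape a table1 row can repeat only if table2 holds more than one matching row or table3 holds the same Name more than once, so DISTINCT is not needed as long as those two conditions hold.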

SSIS - Merging rows with aggregate determinations

I'm attempting to determine the best way, either in SSIS or straight TSQL, to merge two rows based on a given key, but taking specific data from each row based on various aggregate rules (MAX and SUM specifically). As an example, given the following dataset:
Customer  Name  Total  Date       Outstanding
12345     A     100    7/15/2015  500
12345     NULL  200    1/1/2015   300
456       B     500    1/2/2010   100
456       B     250    2/1/2015   900
78        C     100    9/15/2015  500
I wish to consolidate those to a single row per customer key, with the following rules as an example:
If any name is null, use a corresponding value for that customer that is not null
MAX(Total)
MAX(Date)
SUM(Outstanding)
The result set would be:
Customer Name Total Date Outstanding
12345 A 200 7/15/2015 800
456 B 500 2/1/2015 1000
78 C 100 9/15/2015 500
What's the best approach here? My first instinct is to query the table to join to itself on customer to get all values on a single row, and then use formulas in a Derived Column task in SSIS to determine the values to use. My concern there is that is not scalable - it works fine if I have a customer occur only twice in the main dataset, but the goal would be for the logic to work for N number of rows without needing to do a ton of rework. I'm sure there's also a TSQL approach that I'm missing here. Any help would be appreciated.
If the Name column is never NULL, a single query with aggregate functions is enough:
DECLARE @Customer TABLE
(
Customer INT, Name varchar(10), Total INT, PurchaseDate DATE, Outstanding INT
)
INSERT INTO @Customer
SELECT 12345,'A',100,'7/15/2015',500 UNION ALL
SELECT 12345,'A',200,'1/1/2015',300 UNION ALL
SELECT 456,'B',500,'1/2/2010',100 UNION ALL
SELECT 456,'B',250,'2/1/2015',900 UNION ALL
SELECT 78,'C',100,'9/15/2015',500
SELECT Customer, Name, MAX(Total) AS Total, MAX(PurchaseDate) AS PurchaseDate, SUM(Outstanding) AS Outstanding
FROM @Customer
GROUP BY Customer, Name
If Name is NULL in some rows, as in your example, you can first update those rows with the correct name for that customer and then run the same query.
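An alternative to updating the rows first, sketched here under the assumption that each customer has at most one distinct non-NULL name: group by Customer only and take MAX(Name). Aggregate functions such as MAX skip NULLs, so the non-NULL name wins. The table variable and the unambiguous 'YYYYMMDD' date literals are illustrative choices; this is a different technique from the update suggested above.

```sql
DECLARE @Customer TABLE
(
    Customer int, Name varchar(10), Total int, PurchaseDate date, Outstanding int
);

-- Same rows as in the question, including the row with a missing name.
INSERT INTO @Customer (Customer, Name, Total, PurchaseDate, Outstanding)
VALUES (12345, 'A',  100, '20150715', 500),
       (12345, NULL, 200, '20150101', 300),
       (456,   'B',  500, '20100102', 100),
       (456,   'B',  250, '20150201', 900),
       (78,    'C',  100, '20150915', 500);

SELECT Customer,
       MAX(Name)         AS Name,          -- NULLs are ignored, so 'A' is returned for 12345
       MAX(Total)        AS Total,
       MAX(PurchaseDate) AS PurchaseDate,
       SUM(Outstanding)  AS Outstanding
FROM @Customer
GROUP BY Customer;
```

This works for any number of rows per customer, which addresses the scalability concern with the self-join idea.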

SQL Server 2008 View Group By

I have created the following view in SQL Server 2008 to create mailing lists for land owners:
SELECT
dbo.parcel.featid,
CAST(mms_db.dbo.TR_Roll_Master.FMT_ROLL_NO AS decimal(11, 3)) AS Roll,
dbo.parcel.survey, mms_db.dbo.Central_Name_Database.NAME AS Owner,
mms_db.dbo.Central_Name_Database.NAME_2 AS Owner2,
mms_db.dbo.Central_Name_Database.BOX_NUM,
mms_db.dbo.Central_Name_Database.APT_NUM,
mms_db.dbo.Central_Name_Database.FMT_STREET AS House_num,
mms_db.dbo.Central_Name_Database.CITY AS Town,
mms_db.dbo.Central_Name_Database.PROV_CD AS Prov,
mms_db.dbo.Central_Name_Database.POST_CD AS Post_code,
mms_db.dbo.TR_Roll_Number_Owners.NAME_CODE
FROM
mms_db.dbo.TR_Roll_Master
INNER JOIN
dbo.parcel ON mms_db.dbo.TR_Roll_Master.ROLL_NO = dbo.parcel.roll_no COLLATE SQL_Latin1_General_CP1_CI_AS
INNER JOIN
mms_db.dbo.TR_Roll_Number_Owners ON mms_db.dbo.TR_Roll_Master.ROLL_NO = mms_db.dbo.TR_Roll_Number_Owners.ROLL_NO
INNER JOIN
mms_db.dbo.Central_Name_Database ON mms_db.dbo.TR_Roll_Number_Owners.NAME_CODE = mms_db.dbo.Central_Name_Database.NAME_CODE
WHERE
(mms_db.dbo.TR_Roll_Master.DEL_ROLL NOT LIKE '%Y%') AND
(mms_db.dbo.TR_Roll_Master.ROLL_NO NOT LIKE 'P%') OR
(mms_db.dbo.TR_Roll_Master.DEL_ROLL IS NULL) AND (mms_db.dbo.TR_Roll_Master.ROLL_NO NOT LIKE 'P%') OR
(mms_db.dbo.TR_Roll_Master.DEL_ROLL NOT LIKE '%I%') AND
(mms_db.dbo.TR_Roll_Master.ROLL_NO NOT LIKE 'P%')
The view works fine; however, there are often duplicates, as many people own more than one piece of land. I would like to group by Name_Code to eliminate the duplicates.
When I add:
Group by mms_db.dbo.TR_Roll_Number_Owners.NAME_CODE
to the end of the query, I get the following response:
SQL Execution Error.
Executed SQL statement: SELECT dbo.parcel.featid, CAST(mms_db.dbo.TR_Roll_Master.FMT_ROLL_NO AS decimal(11,3)) AS Roll,
dbo.parcel.survey,
mms_db.dbo.Central_NameDatabase.Name AS Owner,
mms_db.dbo.Central_Name_Database.NAME_2 AS Owner2,
mms_db.dbo.Central_Name_Database.B...
Error Source: .Net SQLClient Data Provider
Error Message: Column 'dbo.parcel.featid' is invalid in the select list
because it is not contained in either an aggregate function or the
GROUP BY clause.
I'm not sure what I need to change to make this work.
--Edit--
As a sample data, here is a condensed sample of what I would like to achieve
Roll Owner Box_Num Town Prov Post_code Name_Code
100 John Smith 50 Somewhere MB R3W 9T7 00478
200 John Smith 50 Somewhere MB R3W 9T7 00478
300 Peter Smith 72 Somewhere MB R3W 9T9 00592
400 John Smith 90 OtherPlace MB R2R 8V7 00682
John Smith has the name code of 00478. He owns both Roll 100 & 200, Peter Smith owns 300 and another person with the name of John Smith owns 400. Based on different Name_Code values I know that the two John Smith's are different people. I would like an output that would list John Smith with Name_Code 00478 1 time only while also listing Peter Smith and the other John Smith. Name_Code is the only value I can use for grouping as the rest could represent different people with the same name.
If you just want to eliminate duplicates, use DISTINCT and exclude from your query the columns that differ between the pieces of land a person owns, viz:
SELECT DISTINCT
NAME_CODE,
{column2},
{column3}
FROM
[MyView]
However, if you wish to perform aggregation of some sort, or show an arbitrary one of the values for "people on more than one piece of land", then you will need the GROUP BY. All non-aggregated columns in the select need to appear in the group by:
SELECT
NAME_CODE,
-- ... other non-aggregated columns here
COUNT(featid) AS NumFeatIds,
MIN(Owner2) AS FirstOwner
-- ... etc. (other aggregated columns)
FROM [MyView]
GROUP BY
NAME_CODE
-- ... plus all other non-aggregated columns in the select
Edit
To get the table listed in your edit, you would just need to ORDER BY Name_Code.
However, to get just one row for John Smith #00478, you need to compromise on the non-unique columns by either eliminating them entirely, using GROUP BY and aggregates on the rows, doing a GROUP_CONCAT-type hack to e.g. comma-separate them, or pivoting the duplicate rows' columns into extra columns on the one row.
Since you've mentioned GROUP repeatedly, it seems the aggregation route is necessary. John Smith #00478 has 2 properties, hence 2 distinct Roll values, so Roll can't appear as-is in the aggregated result. Instead, you can return e.g. a count of the Rolls, or the MIN or MAX Roll, but not both rows*. The other (address-related) columns are probably constant for all properties (assuming John Smith 00478 has one address), but unfortunately SQL Server will require you to include them in the GROUP BY.
I would suggest you try:
SELECT
COUNT(Roll) AS NumPropertiesOwned,
Owner,
Box_Num,
Town,
Prov,
Post_code,
Name_Code
FROM [MyNewView]
GROUP BY
Owner, Box_Num, Town, Prov, Post_code, Name_Code
ORDER BY Name_Code;
i.e. all the non-aggregated columns must be repeated in the GROUP BY
* unless you use the GROUP_CONCAT hack or the pivot route
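For completeness, here is a rough sketch of what the comma-separating hack can look like on SQL Server 2008, which has no GROUP_CONCAT or STRING_AGG. It uses the common FOR XML PATH('') plus STUFF idiom. The table variable @v stands in for the view and holds only three of the sample columns; both are assumptions for illustration.

```sql
-- Condensed sample data from the question's edit.
DECLARE @v TABLE (Roll int, Owner varchar(50), Name_Code char(5));

INSERT INTO @v (Roll, Owner, Name_Code)
VALUES (100, 'John Smith',  '00478'),
       (200, 'John Smith',  '00478'),
       (300, 'Peter Smith', '00592'),
       (400, 'John Smith',  '00682');

SELECT o.Name_Code,
       o.Owner,
       -- Build ', 100, 200' for each owner, then STUFF removes the leading ', '.
       STUFF((SELECT ', ' + CAST(r.Roll AS varchar(10))
              FROM @v AS r
              WHERE r.Name_Code = o.Name_Code
              ORDER BY r.Roll
              FOR XML PATH('')), 1, 2, '') AS Rolls
FROM @v AS o
GROUP BY o.Name_Code, o.Owner
ORDER BY o.Name_Code;
```

This returns one row per Name_Code, with both John Smiths kept apart. If the concatenated values were text rather than numbers, characters such as & or < would come back XML-encoded unless the idiom is extended to handle them.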
It's telling you what to do:
"Error Message: Column 'dbo.parcel.featid' is invalid in the select list
because it is not contained in either an aggregate function or the
GROUP BY clause."
This means you have to group the other (non-aggregated) fields too.

Distinct rows from three tables using joins

I have three tables related to the article section of my website. I need to show the top authors based on the number of times their articles were read. I use three basic tables to store this information.
Articles has all the details related to articles, author information is stored in Authors, and when a user views a particular article I update or insert a record in Popularity.
Below is sample data:
Articles
ArticleID Title Desc AuthorID
--------- ---------------- ---- --------
1 Article One .... 100
2 Article Two .... 200
3 Article Three .... 100
4 Article Four .... 300
5 Article Five .... 100
6 Article Six .... 300
7 Article Seven .... 500
8 Article Eight .... 100
9 Article Nine .... 600
Authors
AuthorID AuthorName
-------- ------------
100 Author One
200 Author Two
300 Author Three
400 Author Four
500 Author Five
600 Author Six
Popularity
ID ArticleID Hits
-- --------- ----
1 1 20
2 2 50
3 5 100
4 3 11
5 4 21
I am trying to use following query to get the TOP 10 authors:
SELECT TOP 10 AuthorID
,au.AuthorName
,ArticleHits
,SUM(ArticleHits)
FROM Authors au
JOIN Articles ar
ON au.AuthorID = ar.ArticleAuthorID
JOIN Popularity ap
ON ap.ArticleID = ar.ArticleID
GROUP BY AuthorID,1,1,1
But this generates the following error:
Msg 164, Level 15, State 1, Line 12
Each GROUP BY expression must contain at least one column that is not an outer reference.
SQL Server requires that any column in the SELECT list be either in the GROUP BY clause or inside an aggregate function. The following query appears to be working. As you can see, I included a GROUP BY au.AuthorID, au.AuthorName, which contains both columns in the SELECT list that are not in an aggregate function:
SELECT top 10 au.AuthorID
,au.AuthorName
,SUM(Hits) TotalHits
FROM Authors au
JOIN Articles ar
ON au.AuthorID = ar.AuthorID
JOIN Popularity ap
ON ap.ArticleID = ar.ArticleID
GROUP BY au.AuthorID, au.AuthorName
order by TotalHits desc
See SQL Fiddle with Demo.
I am not sure if you want the Hits in the SELECT statement because you will then have to GROUP BY it. This could alter the Sum(Hits) for each article because if the hits are different in each entry you will not get an accurate sum.
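To check the grouped query against the sample data, here is a small self-contained sketch. The table variables and column types are assumptions; the Desc column is left out because it is not needed here (and DESC is a reserved word).

```sql
DECLARE @Articles   TABLE (ArticleID int, Title varchar(50), AuthorID int);
DECLARE @Authors    TABLE (AuthorID int, AuthorName varchar(50));
DECLARE @Popularity TABLE (ID int, ArticleID int, Hits int);

INSERT INTO @Articles VALUES
    (1, 'Article One', 100),   (2, 'Article Two', 200),   (3, 'Article Three', 100),
    (4, 'Article Four', 300),  (5, 'Article Five', 100),  (6, 'Article Six', 300),
    (7, 'Article Seven', 500), (8, 'Article Eight', 100), (9, 'Article Nine', 600);
INSERT INTO @Authors VALUES
    (100, 'Author One'),  (200, 'Author Two'),  (300, 'Author Three'),
    (400, 'Author Four'), (500, 'Author Five'), (600, 'Author Six');
INSERT INTO @Popularity VALUES
    (1, 1, 20), (2, 2, 50), (3, 5, 100), (4, 3, 11), (5, 4, 21);

-- One row per author: every non-aggregated column is listed in GROUP BY.
SELECT TOP (10) au.AuthorID, au.AuthorName, SUM(ap.Hits) AS TotalHits
FROM @Authors AS au
JOIN @Articles AS ar   ON au.AuthorID = ar.AuthorID
JOIN @Popularity AS ap ON ap.ArticleID = ar.ArticleID
GROUP BY au.AuthorID, au.AuthorName
ORDER BY TotalHits DESC;
-- Author One 131, Author Two 50, Author Three 21
```

Because these are inner joins, authors whose articles have no Popularity rows (or who have no articles at all) do not appear in the result.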
I would do it this way. First figure out who your top ten authors are, then go get the name (and any other columns you want to pull along). For this query it's not a huge difference but all that grouping can become more complex and expensive as your output list requirements increase.
;WITH TopAuthors(AuthorID, ArticleHits) AS
(
SELECT TOP (10) a.AuthorID, SUM(p.Hits)
FROM dbo.Authors AS a
INNER JOIN dbo.Articles AS ar
ON a.AuthorID = ar.AuthorID
INNER JOIN dbo.Popularity AS p
ON ar.ArticleID = p.ArticleID
GROUP BY a.AuthorID -- required: AuthorID is selected next to an aggregate
ORDER BY SUM(p.Hits) DESC
)
SELECT t.AuthorID, a.AuthorName, t.ArticleHits
FROM TopAuthors AS t
INNER JOIN dbo.Authors AS a
ON t.AuthorID = a.AuthorID
ORDER BY t.ArticleHits DESC;
For this specific query bluefeet's version is likely to be more efficient. But if you add additional columns to the output (e.g. more info from the authors table) the grouping might outweigh the additional seek or scan I have presented.
Every column that appears in the SELECT list outside an aggregate function has to be present in the GROUP BY clause. In your case, AuthorID, au.AuthorName and ArticleHits should all be present. Hence the GROUP BY clause would become
GROUP BY AuthorID, au.AuthorName, ArticleHits
This would help.

Binding literal list to SQL query as column that would return 1 row?

I have an SQL query that returns a single row. I also have a list of numbers that I need returned as individual rows, each carrying the data from that single row.
For example, here's what I'm trying to do:
select a,b,c, barcode
from database
join ('12345', '67890',...) as barcode
where a=1 and b=2 and c=3
I need to do it this way because I'm modifying some code that expects a specific format from the query, and changing the code to work with the literal list I have is far more difficult than doing something like this.
Example Output:
a b c barcode
- - - -------
1 2 3 12345
1 2 3 67890
1 2 3 ....
1 2 3 ....
...
The easiest method would be to create a barcode table with a single column, insert the values you want one at a time, then join to that table.
You could use a union to fudge it as well. The problem with join ('484','48583',...) is that you are joining to a single row with multiple columns, when you want one row per value.
Pseudo-coded:
select t.a, t.b, t.c, bc.barcode
from yourtable t
cross join (select 12345 as barcode union all select 289384 /* union all ... */) as bc
where t.a=1 and t.b=2 and t.c=3
Basically, you could pass the list as a single CSV string and transform it into a row set of items. A table-valued function is often used in such cases, but there are actually many options to explore. A very comprehensive collection of various methods and their tests can be found in the set of articles by Erland Sommarskog: Arrays and Lists in SQL Server.
If it was e.g. a function, your query might look like this:
SELECT
t.a,
t.b,
t.c,
s.Value AS barcode
FROM yourtable t
CROSS JOIN dbo.Split('12345,67890', ',') s
WHERE t.a = 1
AND t.b = 2
AND t.c = 3
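Note that dbo.Split in the query above is a user-defined function that has to be created first; it is not built in. If the list is short and can be written inline, a table value constructor (available from SQL Server 2008) gives the same shape without any function. A minimal sketch, where the table variable @t and the alias bc(barcode) are placeholders for illustration:

```sql
DECLARE @t TABLE (a int, b int, c int);
INSERT INTO @t (a, b, c) VALUES (1, 2, 3), (4, 5, 6);

-- Each literal becomes its own row; CROSS JOIN repeats the single matching row per barcode.
SELECT t.a, t.b, t.c, bc.barcode
FROM @t AS t
CROSS JOIN (VALUES ('12345'), ('67890')) AS bc(barcode)
WHERE t.a = 1
  AND t.b = 2
  AND t.c = 3;
```

With the sample row this yields two rows, one per barcode, both carrying a = 1, b = 2, c = 3.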
