SQL Server 2000: Ideas for performing concatenation aggregation subquery
I have a query that returns the rows I want, e.g.:
QuestionID  QuestionTitle  UpVotes  DownVotes
==========  =============  =======  =========
2142075     Win32: Cre...  0        0
2232727     Win32: How...  2        0
1870139     Windows Ae...  12       0
Now I want a column returned that contains a comma-separated list of "Authors" (i.e. the original poster and any editors), e.g.:
QuestionID  QuestionTitle  UpVotes  DownVotes  Authors
==========  =============  =======  =========  ==========
2142075     Win32: Cre...  0        0          Ian Boyd
2232727     Win32: How...  2        0          Ian Boyd, roygbiv
1870139     Windows Ae...  12       0          Ian Boyd, Aaron Klotz, Jason Diller, danbystrom
Faking It
Since SQL Server 2000 does not have a CONCAT(AuthorName, ', ') aggregate, I've been faking it by performing simple sub-selects for the TOP 1 author and the author count:
QuestionID  QuestionTitle  UpVotes  DownVotes  FirstAuthor  AuthorCount
==========  =============  =======  =========  ===========  ===========
2142075     Win32: Cre...  0        0          Ian Boyd     1
2232727     Win32: How...  2        0          Ian Boyd     2
1870139     Windows Ae...  12       0          Ian Boyd     3
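For illustration, the faked query has roughly this shape (a sketch only, not my exact code; the Questions/QuestionEdits schema is borrowed from the accepted answer below):

SELECT
    q.QuestionID, q.QuestionTitle, q.UpVotes, q.DownVotes,
    (SELECT TOP 1 dt.AuthorName          -- earliest author wins
     FROM (SELECT QuestionID, AuthorName, QuestionDate AS AuthorDate FROM Questions
           UNION
           SELECT QuestionID, EditorName, EditDate FROM QuestionEdits) dt
     WHERE dt.QuestionID = q.QuestionID
     ORDER BY dt.AuthorDate) AS FirstAuthor,
    (SELECT COUNT(*)                     -- total number of authors
     FROM (SELECT QuestionID, AuthorName FROM Questions
           UNION
           SELECT QuestionID, EditorName FROM QuestionEdits) dt
     WHERE dt.QuestionID = q.QuestionID) AS AuthorCount
FROM Questions q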
If there is more than one author, I show the user an ellipsis ("…") to indicate there is more than one, e.g. the user would see:
QuestionID  QuestionTitle  UpVotes  DownVotes  Authors
==========  =============  =======  =========  ==========
2142075     Win32: Cre...  0        0          Ian Boyd
2232727     Win32: How...  2        0          Ian Boyd, …
1870139     Windows Ae...  12       0          Ian Boyd, …
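The displayed Authors column is then just an expression over those two columns, something like this (assuming the FirstAuthor/AuthorCount names from the sketch above; the N'' literals keep the ellipsis intact as Unicode):

FirstAuthor + CASE WHEN AuthorCount > 1 THEN N', …' ELSE N'' END AS Authors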
And that works well enough, since normally a question isn't edited, which means I'm supporting the 99% case perfectly and the 1% case only half-assed.
Threaded Re-query
As a more complicated, and bug-prone, solution, I was thinking of iterating over the displayed list, spinning up a thread-pool worker thread for each "question" in the list, performing a query against the database to get its list of authors, then aggregating the list in memory. The list would fill first in the (native) application; I would then issue a few thousand individual queries afterwards.
But that would be horribly, horrendously, terribly slow, not to mention bug-riddled, since it would be threaded work.
Yeah yeah yeah
Adam Machanic says quite plainly:
Don't concatenate rows into delimited strings in SQL Server. Do it client side.
Tell me how, and I'll do it.
/cry
Can anyone think of a better solution, one that is as fast (say, within an order of magnitude) as my original "TOP 1 plus ellipsis" solution?
For example, is there a way to return a result set where each row has an associated result set? That way, for each "master" row, I could get at a "detail" result set that contains the list.
Code for best answer
Cade's link to Adam Machanic's solution is the one I like best: a user-defined function that seems to operate via magic:
CREATE FUNCTION dbo.ConcatAuthors(@QuestionID int)
RETURNS VARCHAR(8000)
AS
BEGIN
    DECLARE @Output VARCHAR(8000)
    SET @Output = ''

    -- The "magic": the assignment runs once per row, so each row
    -- appends its AuthorName (in AuthorDate order) to @Output.
    SELECT @Output = CASE @Output
                        WHEN '' THEN AuthorName
                        ELSE @Output + ', ' + AuthorName
                     END
    FROM (
        SELECT QuestionID, AuthorName, QuestionDate AS AuthorDate FROM Questions
        UNION
        SELECT QuestionID, EditorName, EditDate FROM QuestionEdits
    ) dt
    WHERE dt.QuestionID = @QuestionID
    ORDER BY AuthorDate

    RETURN @Output
END
With a T-SQL usage of:
SELECT QuestionID, QuestionTitle, UpVotes, DownVotes, dbo.ConcatAuthors(QuestionID) AS Authors
FROM Questions
Have a look at these articles:
http://dataeducation.com/rowset-string-concatenation-which-method-is-best/
http://www.simple-talk.com/sql/t-sql-programming/concatenating-row-values-in-transact-sql/ (See Phil Factor's cross join solution in the responses - which will work in SQL Server 2000)
Obviously in SQL Server 2005, the FOR XML trick is easiest, most flexible and generally most performant.
As for returning a rowset for each row: if you still want to do that for some reason, you can do it in a stored procedure, but the client will need to consume all the rows in the first rowset, then go to the next rowset and associate it with the first row of the first rowset, and so on. Your SP would need to open a cursor on the same set it returned as the first rowset and run multiple selects in sequence to generate all the child rowsets. It's a technique I've used, but only where ALL the data actually was needed (for instance, in a fully-populated tree view).
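For instance, such a procedure might be shaped like this sketch (the procedure name is hypothetical, and the Questions/QuestionEdits names are the assumed schema from the question; the client pairs child rowset N with row N of the first rowset):

CREATE PROCEDURE dbo.GetQuestionsWithAuthors
AS
BEGIN
    -- Rowset 1: the master rows, in a known order.
    SELECT QuestionID, QuestionTitle, UpVotes, DownVotes
    FROM Questions
    ORDER BY QuestionID

    -- Then one child rowset per master row, produced by cursoring
    -- over the same set in the same order.
    DECLARE @QuestionID int
    DECLARE master_cur CURSOR LOCAL FAST_FORWARD FOR
        SELECT QuestionID FROM Questions ORDER BY QuestionID
    OPEN master_cur
    FETCH NEXT FROM master_cur INTO @QuestionID
    WHILE @@FETCH_STATUS = 0
    BEGIN
        SELECT AuthorName, AuthorDate
        FROM (SELECT QuestionID, AuthorName, QuestionDate AS AuthorDate FROM Questions
              UNION
              SELECT QuestionID, EditorName, EditDate FROM QuestionEdits) dt
        WHERE dt.QuestionID = @QuestionID
        ORDER BY AuthorDate
        FETCH NEXT FROM master_cur INTO @QuestionID
    END
    CLOSE master_cur
    DEALLOCATE master_cur
END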
And regardless of what people say, doing it client-side is often a very big waste of bandwidth: returning all the rows and doing the looping and breaking on the client side means that a huge number of identical columns are transferred at the start of each row just to get the changing column at the end of the row.
Wherever you do it, it should be an informed decision based on your use case.
I tried three approaches to this problem: the subquery method posted here, ActiveX scripting, and a UDF.
Bizarrely, the most effective (speed-wise) for me was the ActiveX script running multiple queries to fetch the additional data to concatenate.
The UDF took an average of 22 minutes to transform, the subquery method (posted here) took around 5m, and the ActiveX script took 4m30s, much to my annoyance, since that was the script I was hoping to ditch. I'll have to see if I can iron out a few more efficiencies elsewhere.
I think the extra 30s is spent in tempdb, which is used to store the data, since my script requires an ORDER BY.
It should be noted that I am concatenating huge quantities of textual data.
You can also take a look at this script. It's basically the cross-join approach that Cade Roux also mentioned in his post.
The approach looks very clean: you first create a view, and then build a statement based on the values in the view. The second SQL statement can be built dynamically in your code, so it should be straightforward to use.
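I haven't run the linked script, but the general shape would be something like this sketch (all object names here are assumptions): the view numbers each author within its question (SQL Server 2000 has no ROW_NUMBER(), so a correlated COUNT(*) over earlier dates does the numbering), and your code then builds the second statement dynamically, with one term per author position:

-- Hypothetical sketch; Questions/QuestionEdits is the assumed schema.
CREATE VIEW dbo.NumberedAuthors
AS
SELECT dt.QuestionID, dt.AuthorName,
       (SELECT COUNT(*)
        FROM (SELECT QuestionID, AuthorName, QuestionDate AS AuthorDate FROM Questions
              UNION
              SELECT QuestionID, EditorName, EditDate FROM QuestionEdits) earlier
        WHERE earlier.QuestionID = dt.QuestionID
          AND earlier.AuthorDate <= dt.AuthorDate) AS Seq
FROM (SELECT QuestionID, AuthorName, QuestionDate AS AuthorDate FROM Questions
      UNION
      SELECT QuestionID, EditorName, EditDate FROM QuestionEdits) dt
GO

-- The dynamically generated statement then looks like this when the
-- widest question has three authors (one MAX(CASE...) term per Seq):
SELECT QuestionID,
       MAX(CASE WHEN Seq = 1 THEN AuthorName ELSE '' END)
     + MAX(CASE WHEN Seq = 2 THEN ', ' + AuthorName ELSE '' END)
     + MAX(CASE WHEN Seq = 3 THEN ', ' + AuthorName ELSE '' END) AS Authors
FROM dbo.NumberedAuthors
GROUP BY QuestionID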
I'm not sure if this works in SQL Server 2000, but you can try it:
--combine parent and child, children are CSV onto parent row
CREATE TABLE #TableA (RowID int, Value1 varchar(5), Value2 varchar(5))
INSERT INTO #TableA VALUES (1,'aaaaa','A')
INSERT INTO #TableA VALUES (2,'bbbbb','B')
INSERT INTO #TableA VALUES (3,'ccccc','C')
CREATE TABLE #TableB (RowID int, TypeOf varchar(10))
INSERT INTO #TableB VALUES (1,'wood')
INSERT INTO #TableB VALUES (2,'wood')
INSERT INTO #TableB VALUES (2,'steel')
INSERT INTO #TableB VALUES (2,'rock')
INSERT INTO #TableB VALUES (3,'plastic')
INSERT INTO #TableB VALUES (3,'paper')
SELECT
    a.*, dt.CombinedValue
FROM #TableA a
LEFT OUTER JOIN (
    SELECT
        c1.RowID
        ,STUFF(
            (SELECT
                 ', ' + TypeOf
             FROM (SELECT
                       a.RowID, a.Value1, a.Value2, b.TypeOf
                   FROM #TableA a
                   LEFT OUTER JOIN #TableB b ON a.RowID = b.RowID
                  ) c2
             WHERE c2.RowID = c1.RowID
             ORDER BY c1.RowID, TypeOf
             FOR XML PATH('')
            )
            ,1, 2, ''
        ) AS CombinedValue
    FROM (SELECT
              a.RowID, a.Value1, a.Value2, b.TypeOf
          FROM #TableA a
          LEFT OUTER JOIN #TableB b ON a.RowID = b.RowID
         ) c1
    GROUP BY RowID
) dt ON a.RowID = dt.RowID
OUTPUT from SQL Server 2005:
RowID       Value1 Value2 CombinedValue
----------- ------ ------ ------------------
1           aaaaa  A      wood
2           bbbbb  B      rock, steel, wood
3           ccccc  C      paper, plastic
(3 row(s) affected)
EDIT: here is a query that replaces FOR XML PATH with FOR XML RAW, so it should work on SQL Server 2000. The nested REPLACE calls strip the <row value=" prefix and "/> suffix that FOR XML RAW wraps around each value, and STUFF then removes the leading ', ':
SELECT
    a.*, dt.CombinedValue
FROM #TableA a
LEFT OUTER JOIN (
    SELECT
        c1.RowID
        ,STUFF(REPLACE(REPLACE(
            (SELECT
                 ', ' + TypeOf AS value
             FROM (SELECT
                       a.RowID, a.Value1, a.Value2, b.TypeOf
                   FROM #TableA a
                   LEFT OUTER JOIN #TableB b ON a.RowID = b.RowID
                  ) c2
             WHERE c2.RowID = c1.RowID
             ORDER BY c1.RowID, TypeOf
             FOR XML RAW
            )
            ,'<row value="', ''), '"/>', '')
            ,1, 2, '') AS CombinedValue
    FROM (SELECT
              a.RowID, a.Value1, a.Value2, b.TypeOf
          FROM #TableA a
          LEFT OUTER JOIN #TableB b ON a.RowID = b.RowID
         ) c1
    GROUP BY RowID
) dt ON a.RowID = dt.RowID
OUTPUT: same as the original query.