Formatting link lists using TSQL - sql-server

Shog9 keeps on making my link lists look awesome.
Essentially, I write a bunch of queries that pull results from the Stack Overflow data dump. However, my link lists look very ugly and are hard to understand.
Using some formatting magic Shog9 manages to make the link lists look a lot nicer.
So, for example, I will write a query that returns the following:
question id, title, user id, user name, other info
4,When setting a form’s opacity should I use a decimal or double?,8,Eggs McLaren, some other stuff lots of text
And I want it to paste it into an answer on meta and make it look like this:
Question Id User Name Other Info
When setting a form’s opacity... Eggs McLaren Some other stuff...
So assuming my starting point is the query that returns the start info.
What is the smallest number of steps I can run in Query Analyzer to turn the results into:
<h3> Question Id User Name Other Info </h3>
<pre>
When setting a form’s opacity... Eggs McLaren Some other stuff...
</pre>
My initial thoughts are to insert the results into a temp table and then run a stored proc that will iron the data into my desired structure. Run the proc, cut and paste and be done.
Any candidate TSQL based solutions to this problem?
EDIT: Accepting my answer, it's the only solution with an implementation.

Not sure of your exact requirements, but have you considered selecting the data as XML and then applying an XSLT transform to the results?
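As a rough illustration of the first half of that idea, here is a minimal sketch that returns the rows as XML, which an XSLT stylesheet could then turn into the desired markup. The table and column names (Posts.Id, Posts.Title, Posts.OwnerUserId, Posts.Score) are assumptions based on the data dump schema, not taken from the question:
select top 20
    p.Id          as QuestionId,
    p.Title,
    p.OwnerUserId as UserId
from Posts p
where p.ParentId is null   -- questions only
order by p.Score desc
for xml raw('question'), root('questions'), elements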

I'll update this post with my progress as I refine my proc:
Example:
select top 20
    UserId = u.Id,
    UserName = u.DisplayName,
    u.Reputation,
    sum(case when p.ParentId is null then 1 else 0 end) as Questions,
    sum(case when p.ParentId is not null then 1 else 0 end) as Answers
into #t
from Users u
join Posts p on p.OwnerUserId = u.Id
where p.CommunityOwnedDate is null and p.ClosedDate is null
group by u.Id, u.DisplayName, u.Reputation
having sum(case when p.ParentId is not null then 1 else 0 end)
     < sum(case when p.ParentId is null then 1 else 0 end) / 6
order by Reputation desc

exec spShog9
Results:
User              Reputation  Questions  Answers
Edward Tanguay 8317 465 24
me 5767 311 29
Joan Venge 4844 226 14
Blankman 4546 310 1
acidzombie24 4359 371 32
Thanks 4350 416 21
Masi 4193 555 74
LazyBoy 3230 94 12
KingNestor 3187 92 11
Nick 2084 79 6
George2 1973 263 1
Xaisoft 1944 174 12
John 1929 160 24
danmine 1901 53 3
zsharp 1771 145 16
carrier 1742 56 8
JC Grubbs 1550 50 5
vg1890 1534 56 2
Coocoo4Cocoa 1514 143 0
Keand64 1513 83 5
The proc is on gist: http://gist.github.com/165544

You could do something like:
with
data (question_id, title, user_id, username, other_info) as
(
select 4,'When setting a form''s opacity should I use a decimal or double?',8,'Eggs McLaren', 'some other stuff lots of text'
union all
select 5,'Another q title',9,'OtherUsername', 'some other stuff lots of text')
select
(select 'http://stackoverflow.com/questions/' + cast(question_id as varchar(10)) as [#href], title as [*] for xml path('a')) as questioninfo
,(select 'http://stackoverflow.com/users/' + cast(user_id as varchar(10)) + '/' + replace(username, ' ', '-') as [#href], username as [*] for xml path('a')) as userinfo
, other_info
from data
...but see how you go. I personally find that FOR XML PATH is very powerful for getting marked-up results in a way that suits me.
Rob

Related

Calculating a column in SQL using the column's own output as input

I have a problem that I find very hard to solve:
I need to calculate a column R_t in SQL where, for each row, the sum of the "previous" calculated values, SUM(R_{t-1}), is required as input. The calculation is done grouped over a ProjectID column. I have no clue how to proceed.
The formula for the calculation I am trying to achieve is R_t = ([Contract Value]_t - SUM(R_{t-1})) / [Remaining Hours]_t * [Hours Registered]_t, where t denotes time and SUM(R_{t-1}) is the sum of R from t = 0 to t-1.
Time is always consecutive and always begins at t = 0, but the number of time periods may differ across [ProjectID], i.e. one project having t = {0,1,2} and another t = {0,1,2,3,4,5}. The time period will never "jump" from 5 to 7.
The expected output (using the data below) for ProjectID 101 is:
R_0 = (500,000 - 0) / 500 * 65 = 65,000
R_1 = (500,000 - (65,000)) / 435 * 100 = 100,000
R_2 = (500,000 - (65,000 + 100,000)) / 335 * 85 = 85,000
R_3 = (500,000 - (65,000 + 100,000 + 85,000)) / 250 * 69 = 69,000
etc...
This calculation is done for each ProjectID.
My question is how to formulate this in a SQL query? My first thought was to create a recursive CTE, but I am actually not sure it is the right way to proceed. A recursive CTE is (from my understanding) made for handling more hierarchy-like structures, which this isn't really.
My other thought was to calculate SUM(R_{t-1}) using windowed functions, i.e. SUM() OVER (PARTITION BY ... ORDER BY ...) with a LAG, but the recursiveness really gives me trouble and I keep running my head against the wall when I try.
Below is a query for creating the input data:
CREATE TABLE [dbo].[InputForRecursiveCalculation]
(
[Time] int NULL,
ProjectID [int],
ContractValue float,
ContractHours float,
HoursRegistered float,
RemainingHours float
)
GO
INSERT INTO [dbo].[InputForRecursiveCalculation]
(
[Time]
,[ProjectID]
,[ContractValue]
,[ContractHours]
,[HoursRegistered]
,[RemainingHours]
)
VALUES
(0,101,500000,500,65,500),
(1,101,500000,500,100,435),
(2,101,500000,500,85,335),
(3,101,500000,500,69,250),
(4,101,450000,650,100,331),
(5,101,450000,650,80,231),
(6,101,450000,650,90,151),
(7,101,450000,650,45,61),
(8,101,450000,650,16,16),
(0,110,120000,90,10,90),
(1,110,120000,90,10,80),
(2,110,130000,90,10,70),
(3,110,130000,90,10,60),
(4,110,130000,90,10,50),
(5,110,130000,90,10,40),
(6,110,130000,90,10,30),
(7,110,130000,90,10,20),
(8,110,130000,90,10,10)
GO
For those of you who dare to download something from a complete stranger, I have created an Excel file demonstrating the calculation (please download the file, as you will not be able to see the actual formula in the HTML representation shown when first clicking the link):
https://www.dropbox.com/s/3rxz72lbvooyc4y/Calculation%20example.xlsx?dl=0
Best regards,
Victor
I think this will be useful for you. There is an additional column, SumR, that holds the running sum of the previous rows (per ProjectID):
;with recu as
(
select
Time,
ProjectId,
ContractValue,
ContractHours,
HoursRegistered,
RemainingHours,
cast((ContractValue - 0)*HoursRegistered/RemainingHours as numeric(15,0)) as R,
cast((ContractValue - 0)*HoursRegistered/RemainingHours as numeric(15,0)) as SumR
from
InputForRecursiveCalculation
where
Time=0
union all
select
input.Time,
input.ProjectId,
input.ContractValue,
input.ContractHours,
input.HoursRegistered,
input.RemainingHours,
cast((input.ContractValue - prev.SumR)*input.HoursRegistered/input.RemainingHours as numeric(15,0)),
cast((input.ContractValue - prev.SumR)*input.HoursRegistered/input.RemainingHours + prev.SumR as numeric(15,0))
from
recu prev
inner join
InputForRecursiveCalculation input
on input.ProjectId = prev.ProjectId
and input.Time = prev.Time + 1
)
select
*
from
recu
order by
ProjectID,
Time
RESULTS:
Time ProjectId ContractValue ContractHours HoursRegistered RemainingHours R SumR
----------- ----------- ---------------------- ---------------------- ---------------------- ---------------------- --------------------------------------- ---------------------------------------
0 101 500000 500 65 500 65000 65000
1 101 500000 500 100 435 100000 165000
2 101 500000 500 85 335 85000 250000
3 101 500000 500 69 250 69000 319000
4 101 450000 650 100 331 39577 358577
5 101 450000 650 80 231 31662 390239
6 101 450000 650 90 151 35619 425858
7 101 450000 650 45 61 17810 443668
8 101 450000 650 16 16 6332 450000
0 110 120000 90 10 90 13333 13333
1 110 120000 90 10 80 13333 26666
2 110 130000 90 10 70 14762 41428
3 110 130000 90 10 60 14762 56190
4 110 130000 90 10 50 14762 70952
5 110 130000 90 10 40 14762 85714
6 110 130000 90 10 30 14762 100476
7 110 130000 90 10 20 14762 115238
8 110 130000 90 10 10 14762 130000
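One caveat worth adding (a note of mine, not part of the original answer): a recursive CTE in SQL Server stops at 100 recursion levels by default, so if any ProjectID could ever have more than about 100 Time values, the final SELECT needs a MAXRECURSION hint, for example:
select *
from recu
order by ProjectID, Time
option (maxrecursion 0)   -- 0 removes the default 100-level limit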

Summing Previous N Rows using Where Statement

I'm having some problems finding an answer to what I think is a simple query, but I'm very green with SQL:
YR MO ID FLAG RETURN
2001 01 1 1 3.00
2001 02 1 2 4.00
2001 03 1 3 -1.00
2001 04 1 4 1.00
2001 05 1 5 1.00
2001 06 1 6 1.00
2001 07 1 7 1.00
2001 08 1 8 1.00
2001 09 1 9 1.00
2001 10 1 10 1.00
2001 11 1 11 2.00
2001 12 1 12 1.00
2002 12 2 3 1.00
2002 04 2 0 0.05
I'd like a new column that sums the previous 12 RETURN values where FLAG = 12. Any help is greatly appreciated!
The data will be sorted by ID, then year, then month, so it should be ordered sequentially.
The output would be (3 + 4 - 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 2 + 1) = 16.
I'd like the output (16) in the FLAG = 12 row.
Maybe a Windowed Function would fit the bill here:
SELECT *,
       CASE WHEN FLAG = 12
            THEN SUM([RETURN]) OVER (PARTITION BY ID ORDER BY YR, MO
                                     ROWS BETWEEN 11 PRECEDING AND CURRENT ROW) -- 11 preceding rows plus the current row = exactly 12 rows
            ELSE NULL END
FROM SomeTable
ORDER BY ID, YR, MO
So, there are a couple of issues with what you are attempting. First, you will need to either programmatically or administratively (through the UI) create the new column; the SELECT call will not do this for you. Next, be sure you want that data in your schema, as it will be very 'odd' to have a column that sums flagged values. It seems as if you want to know that result but don't necessarily need to store it. If that is true (or can be made true), then I would suggest creating a SELECT call that uses SUM, ORDER BY ... DESC (this means you need to know the ordering) and LIMIT 12. Given any row where the FLAG is 12, you should be able to get the result you want with a single call.
Just another note: since you've mentioned two different DBMSs, make sure you validate the SQL against both; I'm fairly certain you can find a generic query that will work in both systems. Good luck.
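A rough T-SQL sketch of that suggestion (SQL Server uses TOP rather than LIMIT; to keep the query self-contained this version bounds the correlated subquery with month arithmetic instead, assuming the SomeTable name from the other answer, one row per consecutive month, and a made-up Prev12Sum alias):
-- For each FLAG = 12 row, sum the RETURN values of that month and the 11 before it.
SELECT t.*,
       CASE WHEN t.FLAG = 12
            THEN (SELECT SUM(s.[RETURN])
                  FROM SomeTable s
                  WHERE s.ID = t.ID
                    AND (t.YR * 12 + t.MO) - (s.YR * 12 + s.MO) BETWEEN 0 AND 11)
       END AS Prev12Sum
FROM SomeTable t
ORDER BY t.ID, t.YR, t.MO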

Sum of multiple variables by group

I have a dataset with over 900 observations, each observation represents the population of a sub-geographical area for a given year by gender (male, female, all) and 20 different age groups.
I have dropped the variable for the sub-geographical area, and I want to collapse into the greater geographical area (called Geo).
I am having a difficult time doing a SUM or PROC MEANS because I have so many age groups to sum up, and I am trying to avoid writing them all out. I want to collapse across the grouping variables year, geo, and sex so that I only have 3 observations per Geo (my raw data could have as many as 54 observations).
This is an example of what a tiny section of the raw data looks like:
Year Geo Sex Age0005 Age0610 Age1115 (etc)
2010 1 1 92 73 75
2010 1 2 57 81 69
2010 1 3 159 154 144
2010 1 1 41 38 43
2010 1 2 52 41 39
2010 1 3 93 79 82
2010 2 1 71 66 68
2010 2 2 63 64 70
2010 2 3 134 130 138
2010 2 1 32 35 34
2010 2 2 29 31 36
2010 2 3 61 66 70
This is how I want it to look:
Year Group Sex Age0005 Age0610 Age1115 (etc)
2010 1 1 133 111 118
2010 1 2 109 122 108
2010 1 3 252 233 226
2010 2 1 103 101 102
2010 2 2 92 95 106
2010 2 3 195 196 208
Any ideas? Please help!
You don't have to write out each variable name individually - there are ways of getting around that. E.g. if all of the age group variables that need to be summed up start with age then you can use a : wildcard to match them:
proc summary nway data = have;
var age:;
class year geo sex;
output out = want sum=;
run;
If your variables don't have a common prefix, but are all next to each other in one big horizontal group in your dataset, you can use a double dash list instead:
proc summary nway data = have;
var age0005--age1115; /*Includes all variables between these two*/
class year geo sex;
output out = want sum=;
run;
Note also the use of sum= - this means that each summarised variable is reproduced with its original name in the output dataset.
I personally like to use proc sql for this, since it makes it very clear what you're summing and grouping by.
data old ;
input Year Geo Sex Age0005 Age0610 Age1115 ;
datalines;
2010 1 1 92 73 75
2010 1 2 57 81 69
2010 1 3 159 154 144
2010 1 1 41 38 43
2010 1 2 52 41 39
2010 1 3 93 79 82
2010 2 1 71 66 68
2010 2 2 63 64 70
2010 2 3 134 130 138
2010 2 1 32 35 34
2010 2 2 29 31 36
2010 2 3 61 66 70
;
run;
proc sql ;
create table new as select
year
, geo label = 'Group'
, sex
, sum(age0005) as age0005
, sum(age0610) as age0610
, sum(age1115) as age1115
from old
group by geo, year, sex ;
quit;

Concatenating rows into single column when taking value from two different tables [duplicate]

Possible Duplicate:
Concatenate values based on ID
I have two tables
table1 contains
pkUserSubjectid UserId fkSubjectId
15 146 1
16 146 2
17 146 4
18 147 1
19 147 3
20 148 1
21 148 3
22 149 1
23 149 3
table2 contains
pkSubjectId SubjectName
1 Maths
2 English
3 Physics
4 Chemistry
5 Computer
I want my result in this format
UserId SubjectName
146 Maths, English, Chemistry
147 Maths, Physics
and so on
Please suggest a SQL query for this.
Consider building a CLR aggregate function. The MSDN example function would work for this.
http://msdn.microsoft.com/en-us/library/ms131056(v=sql.100).aspx
You could then do something like
SELECT a.[UserId], dbo.MyAgg(b.[SubjectName]) as [SubjectName]
FROM table1 as a
LEFT OUTER JOIN table2 as b ON a.[fkSubjectId] = b.[pkSubjectId]
GROUP BY a.[UserId]
The example uses a single parameter and uses "," as the delimiter. You could also create a 2 parameter function as in the second example to pass in the delimiter.
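If deploying CLR is not an option, a similar result can be had with the FOR XML PATH / STUFF trick instead (a sketch against the table and column names from the question; this is a different technique than the CLR aggregate above):
SELECT a.[UserId],
       STUFF((SELECT ', ' + b.[SubjectName]
              FROM table1 AS x
              JOIN table2 AS b ON x.[fkSubjectId] = b.[pkSubjectId]
              WHERE x.[UserId] = a.[UserId]
              ORDER BY x.[pkUserSubjectid]
              FOR XML PATH('')), 1, 2, '') AS [SubjectName]
FROM table1 AS a
GROUP BY a.[UserId]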

Finding bigram in a location index

I have a table which indexes the locations of words in a bunch of documents.
I want to identify the most common bigrams in the set.
How would you do this in MSSQL 2008?
the table has the following structure:
LocationID -> DocID -> WordID -> Location
I have thought about trying to do some kind of complicated join... and it's just doing my head in.
Is there a simple way of doing this?
I think I'd better edit this on Monday in order to bump it up in the questions.
Sample Data
LocationID DocID WordID Location
21952 534 27 155
21953 534 109 156
21954 534 4 157
21955 534 45 158
21956 534 37 159
21957 534 110 160
21958 534 70 161
It's been years since I've written SQL, so my syntax may be a bit off; however, I believe the logic is correct.
-- [index] is bracketed because INDEX is a reserved word; SQL Server 2008 has no CONCAT(), so CAST and + are used instead.
SELECT CAST(i.WordID AS varchar(10)) + '|' + CAST(j.WordID AS varchar(10)) AS bigram,
       COUNT(*) AS freq
FROM [index] AS i
JOIN [index] AS j
  ON j.DocID = i.DocID
 AND j.Location = i.Location + 1
GROUP BY CAST(i.WordID AS varchar(10)) + '|' + CAST(j.WordID AS varchar(10))
ORDER BY freq DESC
You can also add the actual word IDs to the select list if that's useful, and add a join to whatever table you've got that dereferences WordID to actual words.
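A sketch of that last step, assuming a hypothetical Words(WordID, Word) lookup table (the question does not name one) alongside the corrected query above:
-- Words is an assumed lookup table mapping WordID to the actual word.
SELECT w1.Word + ' ' + w2.Word AS bigram, COUNT(*) AS freq
FROM [index] AS i
JOIN [index] AS j ON j.DocID = i.DocID AND j.Location = i.Location + 1
JOIN Words AS w1 ON w1.WordID = i.WordID
JOIN Words AS w2 ON w2.WordID = j.WordID
GROUP BY w1.Word + ' ' + w2.Word
ORDER BY freq DESC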

Resources