SQL Server: conditional aggregation - sql-server

I have this table:
What I want to do is aggregate these so each instructor has one line, so I used this SQL:
Select
TermCode, SubjectCode, course, QuestionNbr, InstructorName,
Sum(TotalStudents) as TotalStudents, Avg(Mean) as Mean,
StDev(StdDev) as StdDev
From
#MyTable
Group By
TermCode, SubjectCode, course, QuestionNbr, InstructorName
And I get this:
The problem is that any instructor with just one entry will have a null StdDev, which is to be expected. What I want is in those cases to use the StdDev value from the original table, so I would get this:
Is there a way to do this?

One approach would be to use COALESCE with an aggregate function that won't return NULL - such as Max:
COALESCE(StDev(StdDev),Max(StdDev)) as StdDev
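To see the fallback in action, here is a minimal sketch. The table variable and its values are invented for illustration, and only a subset of the question's columns is used. With a single row in a group, Max(StdDev) is simply that row's own value, which is why it works as the fallback.

```sql
-- Hypothetical sample data: column names follow the question, values are invented.
DECLARE @MyTable TABLE (
    InstructorName varchar(50),
    TotalStudents  int,
    Mean           float,
    StdDev         float
);

INSERT INTO @MyTable VALUES
    ('Smith', 20, 4.0, 0.5),
    ('Smith', 30, 4.4, 0.7),
    ('Jones', 25, 3.8, 0.9);   -- Jones has only one row

SELECT
    InstructorName,
    SUM(TotalStudents) AS TotalStudents,
    AVG(Mean)          AS Mean,
    -- STDEV is NULL for a single-row group, so fall back to that row's own value.
    COALESCE(STDEV(StdDev), MAX(StdDev)) AS StdDev
FROM @MyTable
GROUP BY InstructorName;
```

In this sketch Jones keeps the original 0.9, while Smith gets the computed standard deviation of 0.5 and 0.7.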

Related

SQL Server: random number in WHERE clause

As far as I am aware, the only way to get a random value per row in a SELECT statement is by using the newid() function, as the rand() function doesn’t generate a new value for each row.
This leads to the following awkward construction to get a random number from, say 0 - 9:
abs(checksum(newid())) % 10
If I use this expression in the SELECT clause, it behaves as expected. However, if I try something like the following:
select *
from table
where abs(checksum(newid())) % 10>4;
I should have thought that I would get roughly half the rows. Instead I get all or none of them. Apparently newid() is only evaluated once, instead of for each row.
The question is, how can I use a random number in the WHERE clause?
More
There is a similar question which asks for fixed number of rows at random. In the above example I could have used:
select top 50 percent * from table order by newid();
which will get me what I am looking for.
The question remains, how can I use a random number in the WHERE clause. For example, is it possible to do something like this?
select *
from table
where code={random number};
Here is one way to get around the problem
SELECT *
FROM (SELECT *,
Abs(Checksum(Newid())) % 10 AS ran
FROM yourtable) a
WHERE ran > 4;
For some reason, when newid() is used in the WHERE clause it is executed only once and the result is compared as a constant.
When I check the execution plans, your query has no Compute Scalar operator, whereas my query does.
The function newid() is calculated only once in the WHERE clause, not row by row. The trick is to force it to run row by row.
Of course it is possible to include it in a SELECT clause, and, in turn, include that in a CTE or a subquery, as per the other answers.
Microsoft offer a solution here: https://learn.microsoft.com/en-us/previous-versions/sql/sql-server-2008-r2/ms189108(v=sql.105)?redirectedfrom=MSDN
The trick is to force newid() to recalculate by combining it with some row value. This is easily done in the checksum() function.
For example:
SELECT *
FROM table
WHERE abs(checksum(newid(),id)) % 10>4;
I should have thought that I would get roughly half the rows. Instead I get all or none of them
You may get all of the rows or none of them, since NEWID() is executed once per query when you use it in the WHERE clause. This is explained here by Conor Cunningham; the technical term for this is runtime constants.
You can look at your execution plan and look for the expression below:
Const ConstValue
which you can see is calculated once and used throughout. In the end you are just doing a boolean comparison, so you will end up with all rows or none.
You have to use a CTE like the one in another answer, or use TOP with ORDER BY newid(), or TABLESAMPLE, to return random rows.
You may find the TABLESAMPLE option more helpful, since it may not have to go through all the table data to get a sample set of rows, unlike newid().
Below is one example on a table having 1000000 rows:
select * from Orders
TABLESAMPLE (50 PERCENT)

Using where on a column resulting from UDF

In my SELECT I have a column that results from a UDF. The same column should also be part of the WHERE clause. Other than calling the UDF twice, are there any other options? About 15K rows are returned for a user search, so I would like to call the UDF only once per row, as it is slowing performance. Any advice on how to achieve this? I would like something similar to this:
SELECT EMP
, SAL
, Location
, dbo.GetCompValue(EMP, SAL) AS CompValue
FROM tblEmpSal
WHERE CompValue > 5000;
You could use CROSS/OUTER APPLY:
SELECT EMP, SAL,Location, s.CompValue
From tblEmpSal
OUTER APPLY (SELECT dbo.GetCompValue(EMP,SAL) AS CompValue) s
WHERE s.CompValue > 5000
UDFs are always a pig when it comes to performance. You can change the function's logic to make it an inline table-valued function; performance will be much better, and in some cases it will also make use of cached execution plans.
SELECT EMP, SAL,Location, CompValue
FROM (
SELECT EMP, SAL,Location, dbo.GetCompValue(EMP,SAL) AS CompValue
From tblEmpSal
)A
WHERE A.CompValue > 5000
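The inline table-valued function suggested above might look roughly like the sketch below. The real logic of GetCompValue is not shown in the question, so the function name GetCompValueInline, the parameter types, and the SAL * 1.1 formula are all placeholders.

```sql
-- Hypothetical rewrite: the formula and parameter types are assumptions,
-- because the original GetCompValue body is not shown in the question.
CREATE FUNCTION dbo.GetCompValueInline (@EMP int, @SAL decimal(18, 2))
RETURNS TABLE
AS
RETURN
(
    SELECT @SAL * 1.1 AS CompValue   -- placeholder calculation
);
GO

-- CROSS APPLY exposes CompValue as a column, so it can be selected and filtered.
SELECT e.EMP, e.SAL, e.Location, c.CompValue
FROM tblEmpSal AS e
CROSS APPLY dbo.GetCompValueInline(e.EMP, e.SAL) AS c
WHERE c.CompValue > 5000;
```

Because the function body is a single SELECT, the optimizer can expand it into the calling query instead of invoking it row by row, which is where the performance gain usually comes from.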
If the UDF's parameters are from a single table, one option (depending on how that table is used and the nature of the UDF) would be to add a persisted computed column to that table and then use that column in your SQL statements instead of repeatedly calling the UDF. This column could also potentially be indexed, although whether that has any benefit will depend on what else your query is doing.

Grouping by single column but returning all the columns without including other columns in aggregate function

I am working on an SQL query which should group by a column bidBroker and return all the columns in the table.
I tried it using the following query
select Product,
Term,
BidBroker,
BidVolume,
BidCP,
Bid,
Offer,
OfferCP,
OfferVolume,
OfferBroker,
ProductID,
TermID
from canadiancrudes
group by BidBroker
The above query threw me an error as follows
Column 'canadiancrudes.Product' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Is there any other way to return all the data grouped by BidBroker without changing the order of the data coming from canadiancrudes?
First, if you are going to aggregate, you should learn about aggregate functions.
Then grouping becomes much more obvious.
I think you should explain what you are trying to accomplish here, because I suspect that you are trying to SORT by BidBroker, rather than group.
If you mean you want to sort by BidBroker, you can use:
SELECT Product,Term,BidBroker,BidVolume,BidCP,Bid,Offer,OfferCP,OfferVolume,OfferBroker,ProductID,TermID
FROM canadiancrudes
ORDER BY BidBroker
If you want to GROUP BY and return one example row per BidBroker, you can use:
SELECT c1.Product,c1.Term,c1.BidBroker,c1.BidVolume,c1.BidCP,c1.Bid,c1.Offer,c1.OfferCP,c1.OfferVolume,c1.OfferBroker,c1.ProductID,c1.TermID
FROM canadiancrudes c1
WHERE c1.YOURPRIMARYKEY IN (
select MIN(c2.YOURPRIMARYKEY) from canadiancrudes c2 group by c2.BidBroker
)
Replace YOURPRIMARYKEY with the column that holds your unique row id.
As others have said, don't use "group by" if you don't want to aggregate something. If you do want to aggregate by one column but include others as well, consider researching "partition."
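As a rough illustration of the "partition" suggestion, an aggregate with an OVER (PARTITION BY ...) clause returns every row and adds the per-group result alongside it. The table variable and its values below are invented, and only three of the question's columns are used.

```sql
-- Hypothetical sample rows: only three of the question's columns are included.
DECLARE @canadiancrudes TABLE (
    Product   varchar(20),
    BidBroker varchar(20),
    BidVolume int
);

INSERT INTO @canadiancrudes VALUES
    ('WCS', 'BrokerA', 100),
    ('SYN', 'BrokerB', 250),
    ('MSW', 'BrokerA', 300);

-- Every input row is returned; the OVER clause computes the aggregate per BidBroker.
SELECT Product, BidBroker, BidVolume,
       COUNT(*)       OVER (PARTITION BY BidBroker) AS RowsForBroker,
       SUM(BidVolume) OVER (PARTITION BY BidBroker) AS TotalBidVolume
FROM @canadiancrudes;
```

Note that without an ORDER BY clause the order of the returned rows is not guaranteed.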

SQL Server strange distinct query

I use SQL Server 2008 and C#. I have a table with about 20000 rows, many of them similar; there are about 900 distinct rows. This is my table structure:
tblCourse
courselevel, coursecode, coursename, branchcode...
For example, I have 20 rows with the same coursecode/coursename but a different branchcode or courselevel. I want a result that contains only one row per unique coursecode.
here is a little sample of my table:
... courselevel=1,coursecode=1200,coursename=A,branchcode=200...
... courselevel=2,coursecode=1200,coursename=A,branchcode=200...
... courselevel=1,coursecode=1200,coursename=A,branchcode=220...
... courselevel=1,coursecode=1200,coursename=A,branchcode=230...
... courselevel=1,coursecode=1200,coursename=A,branchcode=240...
... courselevel=1,coursecode=1200,coursename=A,branchcode=250...
... courselevel=2,coursecode=1200,coursename=A,branchcode=251...
... courselevel=1,coursecode=1200,coursename=A,branchcode=225...
I want to have only the first row:
... courselevel=1,coursecode=1200,coursename=A,branchcode=200...
because all rows have similar coursecode,
What should I do?
How should I write my select query string?
I have tested different methods (group by, distinct, max(ID)...) with no luck, please help me!
thanks
You can GROUP BY the similar columns and use an aggregate function on each of the other columns so that only one record is returned per group. What that one value is depends entirely on the aggregate function you use.
Aggregate Functions
Aggregate functions perform a calculation on a set of values and
return a single value. Except for COUNT(*), aggregate functions ignore
null values. Aggregate functions are frequently used with the GROUP BY
clause of the SELECT statement.
In this example, I have used the min/max and avg aggregate functions.
SELECT courselevel
, coursecode
, coursename
, MIN(branchcode)
, MAX(othercolumn)
, AVG(numberColumn)
, ...
FROM yourTable
GROUP BY
courselevel
, coursecode
, coursename
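If the goal is to keep one complete original row per coursecode, rather than a mix of aggregated values, a ROW_NUMBER() sketch such as the following may be closer to what was asked. It assumes an ID column that defines which row counts as "first" (the question mentions max(ID), but the column itself is an assumption), and the sample rows are a subset of those in the question.

```sql
-- Hypothetical table: ID is an assumed column that defines which row is "first".
DECLARE @tblCourse TABLE (
    ID int, courselevel int, coursecode int,
    coursename varchar(50), branchcode int
);

INSERT INTO @tblCourse VALUES
    (1, 1, 1200, 'A', 200),
    (2, 2, 1200, 'A', 200),
    (3, 1, 1200, 'A', 220);

WITH Ranked AS (
    SELECT courselevel, coursecode, coursename, branchcode,
           ROW_NUMBER() OVER (PARTITION BY coursecode ORDER BY ID) AS rn
    FROM @tblCourse
)
SELECT courselevel, coursecode, coursename, branchcode
FROM Ranked
WHERE rn = 1;   -- keeps the lowest-ID row for each coursecode
```

With this sample data the query returns the single row courselevel=1, coursecode=1200, coursename=A, branchcode=200.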

SQL Server Reference a Calculated Column

I have a select statement with calculated columns and I would like to use the value of one calculated column in another. Is this possible? Here is a contrived example to show what I am trying to do.
SELECT [calcval1] = CASE Statement, [calcval2] = [calcval1] * .25
No.
All the results of a single row from a select are atomic. That is, you can view them all as if they occur in parallel and cannot depend on each other.
If you're referring to computed columns, then you need to update the formula's input for the result to change during a select.
Think of computed columns as macros or mini-views which inject a little calculation whenever you call them.
For example, these columns will be identical, always:
-- assume that 'Calc' is a computed column equal to Salary*.25
SELECT Calc, Salary*.25 Calc2 FROM YourTable
Also keep in mind that the persisted option doesn't change any of this. It keeps the value around which is nice for indexing, but the atomicity doesn't change.
Unfortunately not really, but a workaround that is sometimes worth it is
SELECT [calcval1], [calcval1] * .25 AS [calcval2]
FROM (SELECT [calcval1] = CASE Statement FROM whatever WHERE whatever) AS t
Yes it's possible.
Use the WITH Statement for nested selects:
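A minimal sketch of the WITH (common table expression) approach, using an invented table and a placeholder CASE expression in place of the question's "CASE Statement":

```sql
-- Hypothetical data and CASE logic, for illustration only.
DECLARE @Employees TABLE (Name varchar(50), Salary decimal(18, 2));
INSERT INTO @Employees VALUES ('Ann', 40000), ('Bob', 80000);

-- The CTE defines calcval1 once; the outer SELECT can then reuse it by name.
WITH Calc AS (
    SELECT Name,
           CASE WHEN Salary >= 50000 THEN Salary ELSE 0 END AS calcval1
    FROM @Employees
)
SELECT Name, calcval1, calcval1 * .25 AS calcval2
FROM Calc;
```

The statement before WITH must be terminated with a semicolon, as it is here.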
Two ways I can think of to do that. First, understand that the calcval1 column does not exist as far as SQL Server is concerned until the statement has run, so it cannot be used directly as shown in your example. So you can put the calculation in there twice: once for calcval1, and once as a substitute for calcval1 in the calcval2 calculation.
The other way is to make a derived table with calcval1 in it and then calculate calcval2 outside the derived table, something like:
select calcval1*.25 as calcval2, calcval1, field1, field2
from (select CASE Statement as calcval1, field1, field2 from mytable) a
You'll need to test both for performance.
You should use an outer apply instead of a subselect:
select V.calc,V.calc*0.25 from FOO outer apply (select case Statement as calc) V
You can't "reset" the value of a calculated column in a Select clause, if that's what you're trying to do... The value of a calculated column is based on the calculated column's formula, which CAN include the value of another calculated column... but you can't reset the formula in a Select clause. If all you want to do is "output" the value based on two calculated columns (as the syntax in your question reads), then the
"[calcval2]"
in
SELECT [calcval1] = CASE Statement, [calcval2] = [calcval1] * .25
would just become a column alias in the output of the Select Clause.
Or are you asking how to define the formula for one calculated column to be based on another?
