Aggregate functions in one SQL query - sql-server

My task is to use min, max and sum in one query. The main problem is that I also have to add three more aggregates: a sum of only the positive values, a sum of only the negative values, and a distinct count.
I made a simple query:
select
min(amount) as [min],
max(amount) as [max],
sum(amount) as [sum],
sum(case when amount > 0 then amount else 0 end) as sum_pos,
sum(case when amount < 0 then amount else 0 end) as sum_neg,
count(distinct amount) as count_dist
from sometable
It gave me the expected results but I wonder if this task has a more professional approach?

As many of the commenters have pointed out, this query satisfies the requirements as stated with the least amount of work to be performed by the query optimizer and database engine.
In a practical application, this query may over-simplify the data in the hypothetical sometable. Additional context could be added with a GROUP BY and displaying the aggregates on a per-entity/key basis, but that does not bear on the "professionalism" or functionality of the original question as scoped.
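To illustrate the GROUP BY remark, here is a minimal sketch that runs the same six aggregates per key. It uses Python's built-in sqlite3 module rather than SQL Server so that it runs standalone; the `account` column and the sample rows are assumptions made up for the example and are not part of the original question.

```python
import sqlite3

# Hypothetical data: the question only names `sometable` and `amount`;
# the `account` grouping column is an assumption added for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sometable (account TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sometable VALUES (?, ?)",
    [("a", 10), ("a", -4), ("a", 10), ("b", -7), ("b", 3)],
)

# Same aggregates as the original query, computed once per account.
per_account = conn.execute(
    """
    SELECT account,
           MIN(amount),
           MAX(amount),
           SUM(amount),
           SUM(CASE WHEN amount > 0 THEN amount ELSE 0 END) AS sum_pos,
           SUM(CASE WHEN amount < 0 THEN amount ELSE 0 END) AS sum_neg,
           COUNT(DISTINCT amount) AS count_dist
    FROM sometable
    GROUP BY account
    ORDER BY account
    """
).fetchall()

for row in per_account:
    print(row)
```

The query body is unchanged apart from the grouping column, which is the point of the answer: the conditional aggregates already work per group without any restructuring.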

Related

Sql Server Statement to calculate accommodation utilisation

I want to calculate the total number of occupied accommodation units as a percentage of the total accommodation available. I am using the query below. I don't think the query is right: it returns an error near FROM.
Please can anyone help
SELECT ((COUNT(Accommodation.dbo.House.HouseID) / COUNT(B.HouseID)) * 100) AS [Accommodation Utilisation %]
FROM (SELECT COUNT(B.HouseID)
FROM Accommodation.dbo.House B
WHERE Accommodation.dbo.House.STATUS = 'Occupied')
FROM Accommodation.dbo.House
Seems like a much easier way to do this would be to use a conditional AVG:
SELECT AVG(CASE H.STATUS WHEN 'Occupied' THEN 1. ELSE 0. END) AS Utilisation --Don't use names that need to be delimited.
FROM dbo.House H; --You should already be connected to the Accommodation database, so it doesn't need to appear here
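As a rough check of the conditional-AVG idea, the sketch below builds a four-row House table in SQLite (via Python's sqlite3 module, not SQL Server) and multiplies the result by 100 to get the percentage the question asked for. The sample rows are made up. One difference to keep in mind: in SQL Server, AVG over integer values returns an integer, which is why the answer writes the literals as 1. and 0.; SQLite's avg() always returns a floating-point value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE House (HouseID INTEGER PRIMARY KEY, Status TEXT)")
# Hypothetical sample: three occupied houses out of four.
conn.executemany(
    "INSERT INTO House (Status) VALUES (?)",
    [("Occupied",), ("Vacant",), ("Occupied",), ("Occupied",)],
)

# Averaging a 1/0 flag gives the fraction of rows where the flag is 1.
(utilisation,) = conn.execute(
    "SELECT AVG(CASE Status WHEN 'Occupied' THEN 1.0 ELSE 0.0 END) FROM House"
).fetchone()

utilisation_pct = utilisation * 100
print(f"{utilisation_pct:.1f}%")
```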

SQL Get Second Record

I am looking to retrieve only the second (duplicate) record from a data set. For example in the following picture:
Inside the UnitID column there are two separate records for 105. I only want the returned data set to contain the second 105 record. Additionally, I want this query to return the second record for all duplicates, not just 105.
I have tried everything I can think of, although I am not that experienced, and I cannot figure it out. Any help would be greatly appreciated.
You need to use GROUP BY for this.
Here's an example (I can't read your first column name, so I'm calling it JobUnitK):
SELECT MAX(JobUnitK), Unit
FROM JobUnits
WHERE DispatchDate = 'oct 4, 2015'
GROUP BY Unit
HAVING COUNT(*) > 1
I'm assuming JobUnitK is your ordering/id field. If it's not, just replace MAX(JobUnitK) with MAX(FieldIOrderWith).
Use the RANK function. Rank the rows within each UnitID partition (OVER with PARTITION BY UnitID) and pick the rows with rank 2.
For reference -
https://msdn.microsoft.com/en-IN/library/ms176102.aspx
Assuming SQL Server 2005 and up, you can use the Row_Number windowing function:
WITH DupeCalc AS (
   SELECT
      DupID = Row_Number() OVER (PARTITION BY UnitID ORDER BY JobUnitKeyID),
      *
   FROM JobUnits
   WHERE DispatchDate = '20151004'
)
SELECT *
FROM DupeCalc
WHERE DupID >= 2
ORDER BY UnitID DESC -- sorting belongs here; ORDER BY is not allowed inside the CTE
;
This is better than a solution that uses Max(JobUnitKeyID) for multiple reasons:
There could be more than one duplicate, in which case you have to use Min(JobUnitKeyID) together with UnitID and join back on the UnitID where JobUnitKeyID <> MinJobUnitKeyID.
Either way, using Min or Max requires you to join back to the same data (which will be inherently slower).
If the ordering key you use turns out to be non-unique, you won't be able to pull the right number of rows with either one.
If the ordering key consists of multiple columns, the query using Min or Max explodes in complexity.
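The first point above (more than one duplicate per unit) can be checked with a small sketch. It runs the same Row_Number pattern against SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later); the five sample rows are invented, and the DispatchDate filter is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE JobUnits (JobUnitKeyID INTEGER PRIMARY KEY, UnitID INTEGER)"
)
# Hypothetical rows: unit 105 appears three times, the others once.
conn.executemany(
    "INSERT INTO JobUnits VALUES (?, ?)",
    [(1, 101), (2, 105), (3, 105), (4, 110), (5, 105)],
)

# Number the rows inside each UnitID; anything numbered 2 or higher
# is a repeat of an earlier row for the same unit.
later_dupes = conn.execute(
    """
    WITH DupeCalc AS (
        SELECT JobUnitKeyID,
               UnitID,
               ROW_NUMBER() OVER (
                   PARTITION BY UnitID ORDER BY JobUnitKeyID
               ) AS DupID
        FROM JobUnits
    )
    SELECT JobUnitKeyID, UnitID, DupID
    FROM DupeCalc
    WHERE DupID >= 2
    ORDER BY JobUnitKeyID
    """
).fetchall()

print(later_dupes)
```

With DupID = 2 instead of DupID >= 2 the query would return only the second record per unit, which is the literal reading of the question; a Max-based GROUP BY query would instead return the last row (key 5) for unit 105.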

PostgreSQL Crosstab - variable number of columns

A common beef I get when trying to evangelize the benefits of learning freehand SQL to MS Access users is the complexity of creating the effects of a crosstab query in the manner Access does it. I realize that strictly speaking, in SQL it doesn't work that way -- the reason it's possible in Access is that Access is handling the rendering of the data.
Specifically, when I have a table with entities, dates and quantities, it's frequent that we want to see a single entity on one line with the dates represented as columns:
This:
entity      date      qty
----------  --------  ---
278700-002  1/1/2016    5
278700-002  2/1/2016    3
278700-002  2/1/2016    8
278700-002  3/1/2016    1
278700-003  2/1/2016   12
Becomes this:
Entity      1/1/16  2/1/16  3/1/16
----------  ------  ------  ------
278700-002       5      11       1
278700-003              12
That said, the common way we've approached this is something similar to this:
with vals as (
select
entity,
case when order_date = '2016-01-01' then qty else 0 end as q16_01,
case when order_date = '2016-02-01' then qty else 0 end as q16_02,
case when order_date = '2016-03-01' then qty else 0 end as q16_03
from mydata
)
select
entity, sum (q16_01) as q16_01, sum (q16_02) as q16_02, sum (q16_03) as q16_03
from vals
group by entity
This is radically oversimplified, but I believe most people will get my meaning.
The main problem with this is not the limit on the number of columns -- the data is typically bounded, and I can make do with a fixed number of date columns -- 36 months, or whatever, depending on the context of the data. My issue is the fact that I have to change the dates every month to make this work.
I had an idea that I could leverage arrays to dynamically assign the quantity to the index of the array, based on the month away from the current date. In this manner, my data would end up looking like this:
Entity      Values
----------  --------
278700-002  {5,11,1}
278700-003  {0,12,0}
This would be quite acceptable, as I could manage the rendering of the actual columns within whatever rendering tool I was using (Excel, for example).
The problem is I'm stuck... how do I get from my data to this. If this were Perl, I would loop through the data and do something like this:
foreach my $ref (@data) {
    my ($entity, $month_offset, $qty) = @$ref;
    $values{$entity}->[$month_offset] += $qty;
}
But this isn't Perl... so far, this is what I have, and now I'm at a mental impasse.
with offset as (
select
entity, order_date, qty,
(extract (year from order_date ) - 2015) * 12 +
extract (month from order_date ) - 9 as month_offset,
array[]::integer[] as values
from mydata
)
select
prod_id, playgrd_dte, -- oh my... how do I load into my array?
from fcst
The "2015" and the "9" are not really hard-coded -- I put them there for simplicity sake for this example.
Also, if my approach or my assumptions are totally off, I trust someone will set me straight.
As with all things imaginable and unimaginable, there is a way to do this with PostgreSQL. It looks like this:
WITH cte AS (
WITH minmax AS (
SELECT min(extract(month from order_date))::int,
max(extract(month from order_date))::int
FROM mytable
)
SELECT entity, mon, 0 AS qty
FROM (SELECT DISTINCT entity FROM mytable) entities,
(SELECT generate_series(min, max) AS mon FROM minmax) allmonths
UNION ALL -- keep every real row; a plain UNION would collapse rows with identical entity, month and qty
SELECT entity, extract(month from order_date)::int, qty FROM mytable
)
SELECT entity, array_agg(sum ORDER BY mon) AS values -- order the elements by month explicitly
FROM (
SELECT entity, mon, sum(qty) FROM cte
GROUP BY 1, 2) sub
GROUP BY 1
ORDER BY 1;
A few words of explanation:
The standard way to produce an array inside a SQL statement is to use the array_agg() function. Your problem is that you have months without data and then array_agg() happily produces nothing, leaving you with arrays of unequal length and no information on where in the time period the data comes from. You can solve this by adding 0's for every combination of 'entity' and the months in the period of interest. That is what this snippet of code does:
SELECT entity, mon, 0 AS qty
FROM (SELECT DISTINCT entity FROM mytable) entities,
(SELECT generate_series(min, max) AS mon FROM minmax) allmonths
All those 0's are UNIONed to the actual data from 'mytable' and then (in the main query) you can first sum up the quantities by entity and month and subsequently aggregate those sums into an array for each entity. Since it is a double aggregation you need the sub-query. (You could also sum the quantities in the UNION but then you would also need a sub-query because UNIONs don't allow aggregation.)
The minmax CTE can be adjusted to include the year as well (your sample data doesn't need it). Do note that the actual min and max values are immaterial to the index in the array: if min is 743 it will still occupy the first position in the array; those values are only used for GROUPing, not indexing.
For ease of use you could wrap this query up in a SQL language function with parameters for the starting and ending month. Adjust the minmax CTE to produce appropriate min and max values for the generate_series() call and in the UNION filter the rows from 'mytable' to be considered.
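For comparison, the same zero-fill-then-sum idea can be sketched outside the database, mirroring the Perl loop from the question. This is only an illustration in Python using the question's sample rows; the `month_index` helper is a name made up for this example, and it counts months across years so the year does not need separate handling.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (entity, order_date, qty).
rows = [
    ("278700-002", date(2016, 1, 1), 5),
    ("278700-002", date(2016, 2, 1), 3),
    ("278700-002", date(2016, 2, 1), 8),
    ("278700-002", date(2016, 3, 1), 1),
    ("278700-003", date(2016, 2, 1), 12),
]


def month_index(d):
    """Hypothetical helper: months since year 0, so offsets span years."""
    return d.year * 12 + d.month


first = min(month_index(d) for _, d, _ in rows)
last = max(month_index(d) for _, d, _ in rows)

# Every entity starts with a zero for each month in the period, which
# plays the role of the generated zero rows in the SQL answer.
values = defaultdict(lambda: [0] * (last - first + 1))
for entity, order_date, qty in rows:
    values[entity][month_index(order_date) - first] += qty

for entity in sorted(values):
    print(entity, values[entity])
```

As in the SQL version, the position in the array depends only on the distance from the first month, not on the month number itself.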

Compute sum for distinct order numbers in ssrs report

I'm using a SQL Server 2008R2 Database and SSRS Report Builder 3.0
I'm trying to compute the sum of the amount owed for each order id (I need to show the item ids), but when I do, the amount owed shows 400 instead of 200 on line 4 and 100 instead of 50 on line 7; line 9 is correct. As a result the Total line is way off.
=Sum(Fields!owe.Value)
The report is grouped by the campus.
I understand that SSRS is probably not the best place to do this computation, but I don't know how to do it outside of SSRS. I have tried DISTINCT and GROUP BY so far with no results.
Below is how I need the report to show like....
Thanks in advance.
Incorrect amounts are
Another example as it should display the subtotals
I would modify the SQL to produce an extra column just for purposes of summing the Owe on an OrderId. Use the Row Number to get the first item in each order, and only supply the Owe value for that item for each order:
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY OrderId ORDER BY ItemId) AS rn
FROM MyTable
WHERE (whatever filters you use)
)
SELECT *,
CASE WHEN rn=1 THEN Owe ELSE 0 END AS OrderOwe
FROM cte
ORDER BY Campus, CustomerId, OrderId, ItemId
Then simply change the expression for the "Owe" textbox in your SubTotal row to this:
=Sum(Fields!OrderOwe.Value)
And you will get the sum of the Owe per order instead of per item.
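A quick sketch of why the extra column fixes the total, again run against SQLite through Python's sqlite3 module rather than SQL Server. The table contents are invented to resemble the numbers in the question (an order owing 200 spread over two item rows, one owing 50 over two rows, and a single-item order), and the Campus and CustomerId columns are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (OrderId INTEGER, ItemId INTEGER, Owe INTEGER)")
# Hypothetical rows: Owe is an order-level amount repeated on every item row.
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [(1, 10, 200), (1, 11, 200), (2, 20, 50), (2, 21, 50), (3, 30, 75)],
)

# Summing Owe directly counts each order once per item row.
(naive_total,) = conn.execute("SELECT SUM(Owe) FROM MyTable").fetchone()

# Keeping Owe only on the first item of each order counts it once.
(order_total,) = conn.execute(
    """
    WITH cte AS (
        SELECT Owe,
               ROW_NUMBER() OVER (PARTITION BY OrderId ORDER BY ItemId) AS rn
        FROM MyTable
    )
    SELECT SUM(CASE WHEN rn = 1 THEN Owe ELSE 0 END) FROM cte
    """
).fetchone()

print(naive_total, order_total)
```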
Well if your owe is always the same for each item in the group you could add a Sum/Count of the item in the group which would give you the correct results in all the cases above.

Performant way to get the maximum value of a running total in TSQL

We have a table of transactions which is structured like the following :
TranxID int (PK and Identity field)
ItemID int
TranxDate datetime
TranxAmt money
TranxAmt can be positive or negative, so the running total of this field (for any ItemID) will go up and down as time goes by. Getting the current total is obviously simple, but what I'm after is a performant way of getting the highest value of the running total and the TranxDate when this occurred. Note that TranxDate is not unique, and due to some backdating the ID field is not necessarily in the same sequence as TranxDate for a given Item.
Currently we're doing something like this (@tblTranx is a table variable containing just the transactions for a given Item):
SELECT TOP 1 @HighestTotal = z.TotalToDate, @DateHighest = z.TranxDate
FROM
(SELECT a.TranxDate, a.TranxID, Sum(b.TranxAmt) AS TotalToDate
FROM @tblTranx AS a
INNER JOIN @tblTranx AS b ON a.TranxDate >= b.TranxDate
GROUP BY a.TranxDate, a.TranxID) AS z
ORDER BY z.TotalToDate DESC
(The TranxID grouping removes the issue caused by duplicate date values)
This, for one Item, gives us the HighestTotal and the TranxDate when this occurred. Rather than run this on the fly for tens of thousands of entries, we only calculate this value when the app updates the relevant entry and record the value in another table for use in reporting.
The question is, can this be done in a better way so that we can work out these values on the fly (for multiple items at once) without falling into the RBAR trap (some ItemIDs have hundreds of entries). If so, could this then be adapted to get the highest values of subsets of transactions (based on a TransactionTypeID not included above). I'm currently doing this with SQL Server 2000, but SQL Server 2008 will be taking over soon here so any SQL Server tricks can be used.
SQL Server sucks in calculating running totals.
Here's a solution for your very query (which groups by dates):
WITH q AS
(
SELECT TranxDate, SUM(TranxAmt) AS TranxSum
FROM t_transaction
GROUP BY
TranxDate
),
m (TranxDate, TranxSum) AS
(
SELECT MIN(TranxDate), SUM(TranxAmt)
FROM (
SELECT TOP 1 WITH TIES *
FROM t_transaction
ORDER BY
TranxDate
) q
UNION ALL
SELECT DATEADD(day, 1, m.TranxDate),
m.TranxSum + q.TranxSum
FROM m
CROSS APPLY
(
SELECT TranxSum
FROM q
WHERE q.TranxDate = DATEADD(day, 1, m.TranxDate)
) q
WHERE m.TranxDate <= GETDATE()
)
SELECT TOP 1 *
FROM m
ORDER BY
TranxSum DESC
OPTION (MAXRECURSION 0)
You need to have an index on TranxDate for this to work fast.
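A different technique, not the recursive CTE above: on engines that support SUM() OVER (ORDER BY ...), which includes SQL Server 2012 and later but not the 2000 and 2008 versions mentioned in the question, the running total and its peak can be computed for all items in one pass. The sketch below is a swapped-in illustration using SQLite (3.25 or later) through Python's sqlite3 module; the sample rows are made up and amounts are stored as integers for simplicity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_transaction ("
    "TranxID INTEGER PRIMARY KEY, ItemID INTEGER, TranxDate TEXT, TranxAmt INTEGER)"
)
# Hypothetical rows; item 1 has two transactions on the same date.
conn.executemany(
    "INSERT INTO t_transaction VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2009-01-01", 100),
        (2, 1, "2009-01-02", -30),
        (3, 1, "2009-01-02", 50),
        (4, 1, "2009-01-03", -80),
        (5, 2, "2009-01-01", 10),
        (6, 2, "2009-01-05", 5),
    ],
)

peaks = conn.execute(
    """
    WITH running AS (
        -- With ORDER BY and no explicit frame, the default frame includes
        -- every row sharing the current TranxDate, so the total is
        -- "everything up to and including this date", like the
        -- a.TranxDate >= b.TranxDate self-join in the question.
        SELECT ItemID,
               TranxDate,
               SUM(TranxAmt) OVER (
                   PARTITION BY ItemID ORDER BY TranxDate
               ) AS TotalToDate
        FROM t_transaction
    ),
    ranked AS (
        SELECT ItemID, TranxDate, TotalToDate,
               ROW_NUMBER() OVER (
                   PARTITION BY ItemID ORDER BY TotalToDate DESC, TranxDate
               ) AS rn
        FROM running
    )
    SELECT ItemID, TranxDate, TotalToDate
    FROM ranked
    WHERE rn = 1
    ORDER BY ItemID
    """
).fetchall()

print(peaks)
```

Ties on the highest total are broken by the earliest date here; that tie-break is an assumption, since the question does not say which date to report when the peak is reached more than once.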

Resources