Aggregate function on one column, group by on another, leave a third unaffected - sql-server

I feel like this shouldn't be a hard problem, but I've spent the better part of the day looking for a solution to no avail. The other answers I've found deal with selecting non-unique columns alongside a group by and an aggregate function, which doesn't seem to help here.
The problem
I have a table of historical data as follows:
ID | source | value | date
---+--------+-------+-----------
1 | 12 | 10 | 2016-11-16
2 | 12 | 20 | 2015-11-16
3 | 12 | 30 | 2014-11-16
4 | 13 | 40 | 2016-11-16
5 | 13 | 50 | 2015-11-16
6 | 13 | 60 | 2014-11-16
I'm trying to get the data before a certain date (inside a loop that steps through different ranges) and then sum the values grouped by source. For example: "get all records from before 30 days ago, and sum the values across the unique sources, using the most recently dated entry for each".
The first step is to remove entries whose dates fall outside the range, for example with a simple where date < getdate() - 30, which gives:
ID | source | value | date
---+--------+-------+-----------
2 | 12 | 20 | 2015-11-16
3 | 12 | 30 | 2014-11-16
5 | 13 | 50 | 2015-11-16
6 | 13 | 60 | 2014-11-16
Now my issue is finding a way to group by source and take the max date, and then sum up the result across all sources. The idea here is that we don't know when the last entry was made, so we take all records before the specified date, keep the newest entry for each unique source, and sum those up to get the total value at that time.
So the next step would be to group by source using the max of date, resulting in:
ID | source | value | date
---+--------+-------+-----------
2 | 12 | 20 | 2015-11-16
5 | 13 | 50 | 2015-11-16
The final step would be to sum the values. This process is then repeated to get the summed value for multiple dates, so this iteration would produce the row
value | date
-------+-----------
70 | getdate() - 30
to use for the rest.
Where I'm stuck
I'm trying to group by source and use the max of date to get the most recent entry for each unique source, but if I use the aggregate function or group by, then I can't preserve the ID or value columns to stick with the chosen max row. It's totally possible I'm just misunderstanding how aggregate functions work.
Progress so far
The best place I've gotten to yet is something like
with dataInDateRange as (
select *
from #historicalData hd
where hd.date < getdate() - 30
)
select ???, max(date)
from dataInDateRange
group by source
But I'm not seeing how I can do this without somehow preserving a unique ID for the row that has the max date for each source so then I can go back and sum up the numbers.
Thank you great people for any help/guidance/lessons

Use row_number() to number the rows of each source from newest to oldest, then keep only row 1:
with dataInDateRange as (
    select *
    from #historicalData hd
    where hd.date < getdate() - 30
), rows as (
    select *,
           row_number() over (partition by source
                              order by date desc) as rn
    from dataInDateRange
)
SELECT *
FROM rows
WHERE rn = 1
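
The query above returns the newest row per source; the question also asks for the sum of those rows. A minimal sketch of that last step, assuming the sample data from the question and a fixed cutoff variable (@cutoff is a hypothetical stand-in for getdate() - 30, pinned to a date so the result matches the example):

```sql
-- Sample data from the question, loaded into a temp table.
create table #historicalData (ID int, source int, value int, [date] date);
insert into #historicalData (ID, source, value, [date]) values
    (1, 12, 10, '2016-11-16'),
    (2, 12, 20, '2015-11-16'),
    (3, 12, 30, '2014-11-16'),
    (4, 13, 40, '2016-11-16'),
    (5, 13, 50, '2015-11-16'),
    (6, 13, 60, '2014-11-16');

-- Assumption: a fixed cutoff instead of getdate() - 30, so the example is repeatable.
declare @cutoff date = '2016-10-17';

with ranked as (
    select hd.source,
           hd.value,
           -- rn = 1 marks the newest row of each source before the cutoff
           row_number() over (partition by hd.source
                              order by hd.[date] desc) as rn
    from #historicalData hd
    where hd.[date] < @cutoff
)
select sum(value) as value,
       @cutoff    as [date]
from ranked
where rn = 1;
-- value = 70 (20 from source 12 + 50 from source 13)
```

To cover several dates, the same statement could be run once per cutoff inside the existing loop.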

Related

Running count of duplicate values

I have a table showing pallets and the amount of product ("units") on those pallets. Individual pallets can have multiple records due to multiple possible defect codes. This means when I am trying to sum the total units on all pallets, the same pallet could get counted more than once, which is undesirable. I would like (but don't know how) to add a running tally column to show how many times a specific pallet ID has appeared so that I can filter out any record where the count is greater than 1:
| Pallet_ID | Units | Defect_Code | COUNT |
+-----------+-------+-------------+-------+
| A1 | 100 | 03 | 1 |
| A1 | 100 | 05 | 2 |
| B1 | 95 | 03 | 1 |
| C1 | 300 | 05 | 1 |
| C1 | 300 | 06 | 2 |
| D1 | 210 | 03 | 1 |
| A1 | 100 | 10 | 3 |
| D1 | 210 | 03 | 2 |
In the above example, the correct sum total of units should be 705. A solution in SQL or in DAX would work (although I lean towards SQL). I have searched for a long time but could not find a solution that fits this particular scenario. Many thanks in advance for your time and consideration!
You may use the windowing function row_number() with the over clause where you partition by the pallet. Within each partition you can control which row is assigned the number 1 by using the order by inside the over clause.
select
    *
from (
    select
        Pallet_ID
      , Units
      , Defect_Code
      , row_number() over(partition by Pallet_ID order by defect_code) as count_of
    from yourtable
) as t -- a derived table needs an alias in SQL Server
where count_of = 1
Note that I have arbitrarily used the column defect_code to order by, as I don't know what other columns may exist. If your table has a date/time value for when the row was created you could use that instead, or perhaps the unique key of the table.
Side note:
I would not recommend using "count" as a column alias, as it's a SQL reserved word.
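
To get the total the question asks for, the filtered rows can be summed directly. A sketch using the sample rows from the question (the temp table #pallets is a hypothetical stand-in for yourtable):

```sql
-- Sample rows from the question.
create table #pallets (Pallet_ID varchar(10), Units int, Defect_Code char(2));
insert into #pallets (Pallet_ID, Units, Defect_Code) values
    ('A1', 100, '03'), ('A1', 100, '05'), ('B1',  95, '03'),
    ('C1', 300, '05'), ('C1', 300, '06'), ('D1', 210, '03'),
    ('A1', 100, '10'), ('D1', 210, '03');

select sum(Units) as total_units
from (
    select Units,
           -- number the rows of each pallet; only row 1 is counted
           row_number() over (partition by Pallet_ID
                              order by Defect_Code) as count_of
    from #pallets
) as t
where count_of = 1;
-- total_units = 705 (100 + 95 + 300 + 210)
```

The two D1 rows tie on Defect_Code, so which of them receives row number 1 is arbitrary; the sum is unaffected because both carry the same Units.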

Sum running total in sql

I am trying to insert a running total column into a SQL Server table as part of a stored procedure. I need this for a financial database, so I am dealing with accounts and departments. For example, let's say I have this data set:
Account | Dept | Date | Value | Running_Total
--------+--------+------------+----------+--------------
5000 | 40 | 2018-02-01 | 10 | 15
5000 | 40 | 2018-01-01 | 5 | 5
4000 | 40 | 2018-02-01 | 10 | 30
5000 | 30 | 2018-02-01 | 15 | 15
4000 | 40 | 2017-12-01 | 20 | 20
The Running_Total column provides a historical sum of dates less than or equal to each row's date value. However, the account and dept must match for this to be the case.
I was able to get close by using
SUM(Value) OVER (PARTITION BY Account, Dept, Date)
but it does not go back and get the previous months...
Any ideas? Thanks!
You are close. You need an order by:
Sum(Value) over (partition by Account, Dept order by Date)
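
Putting Date in the ORDER BY instead of the PARTITION BY turns the per-date sum into a cumulative one within each Account/Dept pair. A sketch against the sample data (the temp table #ledger is a hypothetical name; ORDER BY inside an aggregate's OVER clause requires SQL Server 2012 or later):

```sql
-- Sample rows from the question, without the precomputed Running_Total.
create table #ledger (Account int, Dept int, [Date] date, Value int);
insert into #ledger (Account, Dept, [Date], Value) values
    (5000, 40, '2018-02-01', 10),
    (5000, 40, '2018-01-01',  5),
    (4000, 40, '2018-02-01', 10),
    (5000, 30, '2018-02-01', 15),
    (4000, 40, '2017-12-01', 20);

select Account, Dept, [Date], Value,
       -- cumulative sum per Account/Dept, in date order
       sum(Value) over (partition by Account, Dept
                        order by [Date]) as Running_Total
from #ledger
order by Account, Dept, [Date];
-- 4000 | 40 | 2017-12-01 | 20 | 20
-- 4000 | 40 | 2018-02-01 | 10 | 30
-- 5000 | 30 | 2018-02-01 | 15 | 15
-- 5000 | 40 | 2018-01-01 |  5 |  5
-- 5000 | 40 | 2018-02-01 | 10 | 15
```

With no explicit frame, the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so rows sharing the same date within a partition all receive the same total, which matches the "less than or equal to each row's date" requirement.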

SQL Server Query for Partitioning Data

I need to assign sequential numbers to students. The problem is that the data must first be partitioned by course, and then the numbers must be assigned starting from, say, 1 up to, say, 1000.
Each course should be followed by a gap of, say, 20 numbers (the size may differ), so that a student of the same course who has been left out for now can be accommodated later.
and so on.
I have tried partitioning and a recursive CTE but haven't managed to produce this kind of series for finally assigning the RollNumber.
Any help would be very much appreciated.
Thank You.
You can do this in two steps with a subquery. First compute row_number() partitioned by course and ordered by student id. Then shift each partition by 20 by keeping a running count of the rows where row_number() returned 1 (one per course started so far) and multiplying that count by 20.
SELECT
    s_no,
    course,
    rownumber
        + (SUM(CASE WHEN rownumber = 1 THEN 1 ELSE 0 END)
               OVER (ORDER BY course, s_no
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) * 20)
        - 20 AS rownumber
FROM
(
    SELECT
        s_no,
        course,
        ROW_NUMBER() OVER (PARTITION BY course ORDER BY s_no) rownumber
    FROM test
) sub
ORDER BY course, s_no;
+------+--------+-----------+
| s_no | course | rownumber |
+------+--------+-----------+
| 1 | A | 1 |
| 2 | A | 2 |
| 3 | A | 3 |
| 1 | B | 21 |
| 2 | B | 22 |
| 3 | B | 23 |
| 1 | C | 41 |
| 2 | C | 42 |
| 3 | C | 43 |
+------+--------+-----------+
This isn't exactly your desired output, but I think it's what you are after. You can adjust the math in the main query to move each partition's starting position to whatever you want.
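
As an alternative sketch (not the approach used in the answer above), the course offset can also be derived from DENSE_RANK(), which numbers the courses 1, 2, 3, ... directly, so no running count is needed. The temp table #test and its rows are assumed from the output shown above:

```sql
-- Sample rows assumed from the answer's output table.
create table #test (s_no int, course char(1));
insert into #test (s_no, course) values
    (1, 'A'), (2, 'A'), (3, 'A'),
    (1, 'B'), (2, 'B'), (3, 'B'),
    (1, 'C'), (2, 'C'), (3, 'C');

select s_no,
       course,
       -- position inside the course + 20 for every earlier course
       row_number() over (partition by course order by s_no)
           + (dense_rank() over (order by course) - 1) * 20 as rownumber
from #test
order by course, s_no;
-- A: 1, 2, 3   B: 21, 22, 23   C: 41, 42, 43
```

Both versions give each course a fixed block of 20 numbers, so a course with more than 20 students would run into the next course's block; the multiplier would need to be at least the largest course size plus the desired gap.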

SQL Server loop to run a script based off a date?

Not sure where to start searching. Basically, I have a script that returns multiple metrics from tables. It uses an as-of date (each Monday). I was able to collect the past year's Mondays as "as-of dates". Now I want to write a script that uses those dates instead of running it manually 52 times.
The end table looks like this:
Office | Metric_1| Metric_2|As_of_Date|
12 | 2000000 | 1 |2017-06-28|
15 | 4000000 | 2 |2017-06-28|
20 | 8000000 | 4 |2017-06-28|
I'd greatly appreciate any direction or help.
Thank you
The end result table would look like this:
Office | Metric_1| Metric_2|As_of_Date|
12 | 2000000 | 1 |2017-06-28|
15 | 4000000 | 2 |2017-06-28|
20 | 8000000 | 4 |2017-06-28|
12 | 2000000 | 1 |2017-05-15|
15 | 4000000 | 2 |2017-05-15|
20 | 8000000 | 4 |2017-05-15|
If I understand you correctly, you need all the data whose as_of_date falls in this year, so you only need to restrict the date to this year. Some examples of how to do that are below:
select * from table where as_of_date >= to_date('2017-01-01','yyyy-mm-dd') and as_of_date <= to_date('2017-12-31','yyyy-mm-dd')
or the better way
select * from table where EXTRACT(YEAR FROM as_of_date) = 2017
UPD: as you stated in your comment, to get the data from DataTable for the dates in the DateTable you mentioned, you can use a join like the one below:
SELECT A.* FROM DATATABLE A RIGHT JOIN DATETABLE B ON A.AS_OF_DATE = B.AS_OF_DATE
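
The snippets above use to_date and EXTRACT, which are Oracle/PostgreSQL-style functions that SQL Server (named in the question title) does not provide. A minimal T-SQL sketch of the same idea, assuming hypothetical temp tables #DataTable and #DateTable standing in for the tables mentioned in the answer, with made-up rows:

```sql
-- Hypothetical stand-ins for DataTable and DateTable; the rows are illustrative only.
create table #DataTable (Office int, Metric_1 int, Metric_2 int, As_of_Date date);
create table #DateTable (As_of_Date date);

insert into #DataTable (Office, Metric_1, Metric_2, As_of_Date) values
    (12, 2000000, 1, '2017-06-28'),
    (12, 2000000, 1, '2017-05-15'),
    (12, 1500000, 1, '2016-12-26');

insert into #DateTable (As_of_Date) values
    ('2017-06-28'), ('2017-05-15'), ('2016-12-26');

select d.Office, d.Metric_1, d.Metric_2, d.As_of_Date
from #DataTable as d
inner join #DateTable as m
        on m.As_of_Date = d.As_of_Date       -- keep only the collected as-of dates
where d.As_of_Date >= '20170101'             -- T-SQL counterpart of the to_date range
  and d.As_of_Date <  '20180101'
order by d.As_of_Date desc, d.Office;
-- Returns the two 2017 rows; the 2016-12-26 row is filtered out.
```

YEAR(d.As_of_Date) = 2017 would be the T-SQL counterpart of the EXTRACT version, although the range predicate is usually the form that can make use of an index on As_of_Date.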

TSQL Multiple column unpivot with named rows possible?

I know there are several unpivot / cross apply discussions here but I was not able to find any discussion that covers my problem. What I've got so far is the following:
SELECT Perc, Salary
FROM (
SELECT jobid, Salary_10 AS Perc10, Salary_25 AS Perc25, [Salary_Median] AS Median
FROM vCalculatedView
WHERE JobID = '1'
GROUP BY JobID, SourceID, Salary_10, Salary_25, [Salary_Median]
) a
UNPIVOT (
Salary FOR Perc IN (Perc10, Perc25, Median)
) AS calc1
Now, what I would like is to add several other columns, eg. one named Bonus which I also want to put in Perc10, Perc25 and Median Rows.
As an alternative, I also made a query with cross apply, but here, it seems as if you can not "force" sort the rows like you can with unpivot. In other words, I can not have a custom sort, but only a sort that is according to a number within the table, if I am correct? At least, here I do get the result like I wish to have, but the rows are in a wrong order and I do not have the rows names like Perc10 etc. which would be nice.
SELECT crossapplied.Salary,
crossapplied.Bonus
FROM vCalculatedView v
CROSS APPLY (
VALUES
(Salary_10, Bonus_10)
, (Salary_25, Bonus_25)
, (Salary_Median, Bonus_Median)
) crossapplied (Salary, Bonus)
WHERE JobID = '1'
GROUP BY crossapplied.Salary,
crossapplied.Bonus
Perc stands for Percentile here.
Output is intended to be something like this:
+--------------+---------+-------+
| Calculation | Salary | Bonus |
+--------------+---------+-------+
| Perc10 | 25 | 5 |
| Perc25 | 35 | 10 |
| Median | 27 | 8 |
+--------------+---------+-------+
Am I missing something, or did I do something wrong? I'm using MSSQL 2014, and the output is going into SSRS. Thanks a lot in advance for any hint!
Edit for clarification: The Unpivot-Method gives the following output:
+--------------+---------+
| Calculation | Salary |
+--------------+---------+
| Perc10 | 25 |
| Perc25 | 35 |
| Median | 27 |
+--------------+---------+
so it lacks the column "Bonus" here.
The Cross-Apply-Method gives the following output:
+---------+-------+
| Salary | Bonus |
+---------+-------+
| 35 | 10 |
| 25 | 5 |
| 27 | 8 |
+---------+-------+
So if you compare it to the intended output, you'll notice that the column "Calculation" is missing and the row sorting is wrong (note that the line 25 | 5 is in the second row instead of the first).
Edit 2: View's definition and sample data:
The view basically just adds computed columns of the table. In the table, I've got Columns like Salary and Bonus for each JobID. The View then just computes the percentiles like this:
Select
Percentile_Cont(0.1)
within group (order by Salary)
over (partition by jobID) as Salary_10,
Percentile_Cont(0.25)
within group (order by Salary)
over (partition by jobID) as Salary_25
from Tabelle
So the output is like:
+----+-------+---------+-----------+-----------+
| ID | JobID | Salary | Salary_10 | Salary_25 |
+----+-------+---------+-----------+-----------+
| 1 | 1 | 100 | 60 | 70 |
| 2 | 1 | 100 | 60 | 70 |
| 3 | 2 | 150 | 88 | 130 |
| 4 | 3 | 70 | 40 | 55 |
+----+-------+---------+-----------+-----------+
In the end, the view will be parameterized in a stored procedure.
Might this be your approach?
After your edits I understand that your solution with CROSS APPLY comes back with the right data, but not in the right shape. You can add constant values to your VALUES and do the sorting in a wrapper SELECT:
SELECT wrapped.Calculation,
       wrapped.Salary,
       wrapped.Bonus
FROM
(
    SELECT crossapplied.*
    FROM vCalculatedView v
    CROSS APPLY (
        VALUES
          (1,'Perc10',Salary_10, Bonus_10)
        , (2,'Perc25',Salary_25, Bonus_25)
        , (3,'Median',Salary_Median, Bonus_Median)
    ) crossapplied (SortOrder,Calculation,Salary, Bonus)
    WHERE JobID = '1'
    GROUP BY crossapplied.SortOrder,
             crossapplied.Calculation,
             crossapplied.Salary,
             crossapplied.Bonus
) AS wrapped
ORDER BY wrapped.SortOrder
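
For trying the technique without the view, here is a self-contained sketch. The temp table #calc is a hypothetical stand-in for vCalculatedView, filled with the values from the intended output in the question; SELECT DISTINCT replaces the GROUP BY, and SortOrder stays in the select list so it can be used in the ORDER BY:

```sql
-- Hypothetical stand-in for vCalculatedView; two identical rows mimic the
-- per-row repetition of the windowed percentile columns.
create table #calc (
    JobID int,
    Salary_10 int, Salary_25 int, Salary_Median int,
    Bonus_10 int,  Bonus_25 int,  Bonus_Median int
);
insert into #calc values
    (1, 25, 35, 27, 5, 10, 8),
    (1, 25, 35, 27, 5, 10, 8);

select distinct x.SortOrder, x.Calculation, x.Salary, x.Bonus
from #calc as c
cross apply (
    values
      (1, 'Perc10', c.Salary_10,     c.Bonus_10)
    , (2, 'Perc25', c.Salary_25,     c.Bonus_25)
    , (3, 'Median', c.Salary_Median, c.Bonus_Median)
) as x (SortOrder, Calculation, Salary, Bonus)
where c.JobID = 1
order by x.SortOrder;
-- 1 | Perc10 | 25 |  5
-- 2 | Perc25 | 35 | 10
-- 3 | Median | 27 |  8
```

The wrapper SELECT in the answer is still the way to go if SortOrder should not appear in the final result.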
