How to add a calculated column of certain rows in SQL - sql-server

I want to add a persisted calculated column that holds the total of Amount across all rows in the same group, such as the sales order below. How would you do this in SQL Server?
SalesOrder Amount Total(calculated)
100 10 25
100 15 25
101 20 45
101 25 45
102 30 65
102 35 65

The best mechanism for storing pre-calculated aggregates that are automatically maintained is an indexed view. It is not possible with a persisted computed column. (You could use a scalar UDF in a computed column to calculate the result, but this can't be persisted, and such computed columns are generally bad for performance, both because they force RBAR (row-by-agonizing-row) evaluation and because they block parallelism.)
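CREATE VIEW dbo.AggregatedSales
WITH SCHEMABINDING
AS
SELECT SalesOrder,
SUM(Amount) AS Total, -- Amount must be NOT NULL (or wrapped in ISNULL) in an indexed view
COUNT_BIG(*) AS RowCnt -- required when an indexed view uses GROUP BY
FROM dbo.YourTable
GROUP BY SalesOrder
GO
CREATE UNIQUE CLUSTERED INDEX UIX ON dbo.AggregatedSales(SalesOrder)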
Then the aggregates will be precalculated and stored in the view. Your queries will need to join to the view. You may need to use the NOEXPAND hint to ensure that the precalculated aggregates are actually used rather than recalculated at runtime.

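For SQL Server 2012 (the windowed SUM below also works on SQL Server 2005 and later), you can compute the total at query time: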
CREATE TABLE #t (saleOrder int , amount int)
INSERT INTO #t VALUES
(100,10)
,(100,15)
,(101,20)
,(101,25)
,(102,30)
,(102,35)
SELECT *
,SUM(amount) OVER (PARTITION BY saleorder) as [total]
FROM #t
Result :
saleOrder | amount | total
==========================
100 | 10 | 25
100 | 15 | 25
101 | 20 | 45
101 | 25 | 45
102 | 30 | 65
102 | 35 | 65

Related

Calculate row difference within groups

I'm looking for help with calculating the difference between consecutive ordered rows within groups in SQL (Microsoft SQL server).
I have a table like this:
ID School_ID Enrollment_Start_Date Order
1 56 1/1/2018 10
1 56 5/5/2018 24
1 56 7/7/2018 35
1 103 4/4/2019 26
1 103 3/3/2019 19
I want to calculate the difference in Order between consecutive rows, grouped by ID and School_ID and ordered by Enrollment_Start_Date.
So I want something like this:
ID School_ID Enrollment_Start_Date Order Diff
1 56 1/1/2018 10 10 # nothing to be subtracted from 10
1 56 5/5/2018 24 14 # 24-10
1 56 7/7/2018 35 11 # 35-24
1 103 3/3/2019 19 19 # nothing to be subtracted from 19
1 103 4/4/2019 26 7 # 26-19
I have hundreds of IDs, and each ID can have at most 6 Enrollment_Start_Date values, so I'm looking for a generalizable implementation.
Use the LAG(<column>) analytic function to obtain the "previous" value of a column, as defined by the OVER clause, then subtract the current value from it and multiply the result by -1 to make it positive. If there is no previous value (it is null), take the current value instead.
Pseudo code would be:
If previous_order_value exists:
-1 * (previous_order_value - current_order_value)
Else
current_order_value
where previous_order_value is based on the same id & school_id and is sorted by enrollment_start_date in ascending order
SQL Code:
select
id,
school_id,
enrollment_start_date,
[order],
coalesce(-1 * (lag([order]) over (partition by id, school_id order by enrollment_start_date ) - [order]), [order]) as diff
from yourtable
Also note that ORDER is a reserved keyword in SQL Server, which is why the column name has to be wrapped in [ ]. I suggest using a different name for this column, if possible.
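As a side note, LAG also accepts an offset and a default value, which can replace the COALESCE and the multiplication by -1. Below is a minimal runnable sketch of that variant. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption, so that it runs without a server; it needs SQLite 3.25 or later for window functions), and the names enrollment, start_date and ord are hypothetical. The LAG(ord, 1, 0) expression itself should carry over to SQL Server 2012+ unchanged.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption made so
# the sketch is runnable anywhere); table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrollment (id INT, school_id INT, start_date TEXT, ord INT)"
)
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?, ?, ?)",
    [
        (1, 56, "2018-01-01", 10),
        (1, 56, "2018-05-05", 24),
        (1, 56, "2018-07-07", 35),
        (1, 103, "2019-04-04", 26),
        (1, 103, "2019-03-03", 19),
    ],
)

# LAG(ord, 1, 0): look one row back within the partition and fall back to 0
# when there is no previous row, so the first row of a group keeps its value.
rows = conn.execute(
    """
    SELECT id, school_id, start_date, ord,
           ord - LAG(ord, 1, 0) OVER (
               PARTITION BY id, school_id ORDER BY start_date
           ) AS diff
    FROM enrollment
    ORDER BY school_id, start_date
    """
).fetchall()

for row in rows:
    print(row)
```

With the sample data from the question, the diff column comes out as 10, 14, 11 for school 56 and 19, 7 for school 103.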
Use the lag() analytic function to get the difference between two rows, and CASE WHEN to return the original value of the order column where there is no previous row to subtract.
with cte as
(
select 1 as id, 56 as sclid, '2018-01-01' as s_date, 10 as orders
union all
select 1,56,'2018-05-05',24 union all
select 1,56,'2018-07-07',35 union all
select 1,103,'2019-04-04',26 union all
select 1,103,'2019-03-03',19
) select t.*,
case when ( lag([orders])over(partition by id,sclid order by s_date ) -[orders] )
is null then [orders] else
( lag([orders])over(partition by id,sclid order by s_date ) -[orders] )*(-1) end
as diff
from cte t
Output:
id sclid s_date orders diff
1 56 2018-01-01 10 10
1 56 2018-05-05 24 14
1 56 2018-07-07 35 11
1 103 2019-03-03 19 19
1 103 2019-04-04 26 7
Use LAG(COLUMN_NAME)
Query
SELECT id, School_ID, Enrollment_Start_Date, cOrder,
ISNULL((cOrder - (LAG(cOrder) OVER(PARTITION BY id, School_ID ORDER BY Enrollment_Start_Date))),cOrder)Diff
FROM Table1
Sample Output
| id | School_ID | Enrollment_Start_Date | cOrder | Diff |
|----|-----------|-----------------------|--------|------|
| 1 | 56 | 2018-01-01 | 10 | 10 |
| 1 | 56 | 2018-05-05 | 24 | 14 |
| 1 | 56 | 2018-07-07 | 35 | 11 |
| 1 | 103 | 2019-03-03 | 19 | 19 |
| 1 | 103 | 2019-04-04 | 26 | 7 |

Aggregate function on one column, group by on another, leave a third unaffected

I feel like this isn't too hard a problem, but I've been looking for a solution for the greater part of the day to no avail. I've seen plenty of other solutions, but they don't seem to help me; they deal with getting columns that aren't unique values along with a group by and an aggregate function.
The problem
I have a table of historical data as follows:
ID | source | value | date
---+--------+-------+-----------
1 | 12 | 10 | 2016-11-16
2 | 12 | 20 | 2015-11-16
3 | 12 | 30 | 2014-11-16
4 | 13 | 40 | 2016-11-16
5 | 13 | 50 | 2015-11-16
6 | 13 | 60 | 2014-11-16
I'm trying to get data before a certain date (within a loop that goes over different ranges), then get the sum of the values grouped by source. So as an example: "get all records before 30 days ago, and get the sum of the values of the unique sources, using the most recent dated entry for each".
So the first step was to remove entries with dates not in the range, an easy where date < getdate()-30 for example to get:
ID | source | value | date
---+--------+-------+-----------
2 | 12 | 20 | 2015-11-16
3 | 12 | 30 | 2014-11-16
5 | 13 | 50 | 2015-11-16
6 | 13 | 60 | 2014-11-16
Now my issue is finding a way to group by source and take the max date, and then sum up the result across all sources. The idea here is that we don't know when the last entry is, so we get all records before the specified date, then take the newest entry for each unique source, and sum those up to get the total value at that time.
So the next step would be to group by source using the max of date, resulting in :
ID | source | value | date
---+--------+-------+-----------
2 | 12 | 20 | 2015-11-16
5 | 13 | 50 | 2015-11-16
And then the final step would be to sum the values, and then this process is repeated to get the sum value for multiple dates, so this would result in the row
value | date
-------+-----------
70 | getdate() - 30
to use for the rest.
Where I'm stuck
I'm trying to group by source and use the max of date to get the most recent entry for each unique source, but if I use the aggregate function or group by, then I can't preserve the ID or value columns to stick with the chosen max row. It's totally possible I'm just misunderstanding how aggregate functions work.
Progress so far
The best place I've gotten to yet is something like
with dataInDateRange as (
select *
from #historicalData hd
where hd.date < getdate() - 30
)
select ???, max(date)
from dataInDateRange
group by source
But I'm not seeing how I can do this without somehow preserving a unique ID for the row that has the max date for each source so then I can go back and sum up the numbers.
Thank you great people for any help/guidance/lessons
Use row_number() to pick the most recent row for each source:
with dataInDateRange as (
select *
from #historicalData hd
where hd.date < getdate() - 30
), rows as (
select *,
row_number() over (partition by source
order by date desc) as rn
from dataInDateRange
)
SELECT *
FROM rows
WHERE rn = 1
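The query above returns the newest row per source; the question also asks for the sum of those rows. A minimal sketch of that last step follows, assuming Python's built-in sqlite3 module as a runnable stand-in for SQL Server (SQLite 3.25+ for window functions) and a fixed, hypothetical cutoff date in place of getdate() - 30. In T-SQL the same shape should work by replacing SELECT * with SELECT SUM(value) in the final query.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE historicalData (id INT, source INT, value INT, date TEXT)"
)
conn.executemany(
    "INSERT INTO historicalData VALUES (?, ?, ?, ?)",
    [
        (1, 12, 10, "2016-11-16"),
        (2, 12, 20, "2015-11-16"),
        (3, 12, 30, "2014-11-16"),
        (4, 13, 40, "2016-11-16"),
        (5, 13, 50, "2015-11-16"),
        (6, 13, 60, "2014-11-16"),
    ],
)

# Hypothetical fixed cutoff, used here instead of getdate() - 30.
cutoff = "2016-10-17"

# Rank rows per source, newest first, then sum only the newest row of each.
total = conn.execute(
    """
    WITH ranked AS (
        SELECT value,
               ROW_NUMBER() OVER (
                   PARTITION BY source ORDER BY date DESC
               ) AS rn
        FROM historicalData
        WHERE date < ?
    )
    SELECT SUM(value) FROM ranked WHERE rn = 1
    """,
    (cutoff,),
).fetchone()[0]

print(total)  # 70 for the sample data (20 from source 12, 50 from source 13)
```

Running this once per cutoff date gives one total per date, which matches the "value | date" row described in the question.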

SQL Query to fill missing gaps across time and get last non-null value

I have the following table in my database:
Month|Year | Value
1 |2013 | 100
4 |2013 | 101
8 |2013 | 102
2 |2014 | 103
4 |2014 | 104
How can I fill in "missing" rows from the data, so that if I query from 2013-03 through 2014-03, I would get:
Month|Year | Value
3 |2013 | 100
4 |2013 | 101
5 |2013 | 101
6 |2013 | 101
7 |2013 | 101
8 |2013 | 102
9 |2013 | 102
10 |2013 | 102
11 |2013 | 102
12 |2013 | 102
1 |2014 | 102
2 |2014 | 103
3 |2014 | 103
As you can see I want to repeat the previous Value for a missing row.
I have created a SQL Fiddle of this solution for you to play with.
Essentially, it creates a work table #Months and then cross joins it with all the years in your data set. This produces a complete list of all months for all years. I then left join the test data provided in your example (a table named TEST; see the SQL Fiddle for the schema) back onto this list to get a complete list with values for the months that have them. The next issue to overcome was using the last month's value if the current month didn't have one. For that, I used a correlated sub-query, i.e. I joined tblValues back to itself only where it matched the maximum rank of an earlier row that has a value. This then gives a complete result set!
If you want to filter by year/month, you can add this in a WHERE clause just before the final ORDER BY.
Enjoy!
Test Schema
CREATE TABLE TEST( Month tinyint, Year int, Value int)
INSERT INTO TEST(Month, Year, Value)
VALUES
(1,2013,100),
(4,2013,101),
(8,2013,102),
(2,2014,103),
(4,2014,104)
Query
CREATE TABLE #Months (Month tinyint)
Insert into #Months(Month)Values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12);
With tblValues as (
select Rank() Over (ORDER BY y.Year, m.Month) as [Rank],
m.Month,
y.Year,
t.Value
from #Months m
CROSS JOIN ( Select Distinct Year from Test ) y
LEFT JOIN Test t on t.Month = m.Month and t.Year = y.Year
)
Select t.Month, t.Year, COALESCE(t.Value, t1.Value) as Value
from tblValues t
left join tblValues t1 on t1.Rank = (
Select Max(tmax.Rank)
From tblValues tmax
Where tmax.Rank < t.Rank AND tmax.Value is not null)
Order by t.Year, t.Month
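For comparison, here is a sketch of a different technique for the same result: generate the requested months with a recursive CTE, then look up the last known value with a correlated subquery. This is not the approach used in the answer above. It is written against Python's built-in sqlite3 module as a runnable stand-in for SQL Server (an assumption), and the running month index (year * 12 + month - 1) is a helper introduced only for this illustration.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (month INT, year INT, value INT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [(1, 2013, 100), (4, 2013, 101), (8, 2013, 102), (2, 2014, 103), (4, 2014, 104)],
)

# Requested range as a running month index (year * 12 + month - 1).
start = 2013 * 12 + 3 - 1  # 2013-03
end = 2014 * 12 + 3 - 1    # 2014-03

rows = conn.execute(
    """
    WITH RECURSIVE months(idx) AS (
        SELECT ?                       -- first month of the range
        UNION ALL
        SELECT idx + 1 FROM months WHERE idx < ?
    )
    SELECT m.idx % 12 + 1 AS month,
           m.idx / 12     AS year,
           -- most recent stored value at or before this month
           (SELECT t.value
            FROM test t
            WHERE t.year * 12 + t.month - 1 <= m.idx
            ORDER BY t.year DESC, t.month DESC
            LIMIT 1)      AS value
    FROM months m
    ORDER BY m.idx
    """,
    (start, end),
).fetchall()

for row in rows:
    print(row)
```

In T-SQL the recursive CTE is written without the RECURSIVE keyword, and the LIMIT 1 lookup would become SELECT TOP 1 ... ORDER BY.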

How can calculate transaction data

I have a table like this. Consider that I have around 5 million records.
Transaction id|Amount|CustomerId|date
1 | 100 | 20 |1/1/2012
2 | 230 | 30 |2/2/2012
3 | 320 | 20 |2/3/2012
etc...
How can I find the total amount for the last 5 transactions of each customer in each quarter of 2012?
Output: Quarter|Customerid|totalAmount
1 | 20 | 40000
1 | 30 | 300000
2 ...etc...
Please suggest an efficient method.
You should post the DDL, but you can try something like this. It should work.
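with mycte as
(
-- number each customer's transactions within a quarter, newest first
select customerid, datepart(qq,dt) as qtr, amount,
row_number() over(partition by
datepart(qq,dt), customerid order by dt desc, [transaction id] desc) as rn
from YourTable -- placeholder table name
where dt >= '20120101' and dt < '20130101' -- 2012 only
)
select qtr, customerid, sum(amount) as amt
from mycte
where rn <= 5 -- keep the last 5 transactions per customer per quarter
group by qtr, customerid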
If you want someone else to write efficient queries for you, then you have to do some of the hard work yourself by providing the DDL, indexes, some sample data, and the approaches you have tried so far.
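Since no DDL was posted, here is a small runnable sketch of the same "rank within quarter, keep the top 5, then sum" idea on made-up data. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server (SQLite 3.25+), hypothetical names txn, txn_id, customer_id and dt, and it derives the quarter from the month because SQLite has no datepart.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption);
# table name, column names and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txn (txn_id INT, amount INT, customer_id INT, dt TEXT)")

# Customer 20: seven Q1 transactions with amounts 1..7, plus one in Q2.
txns = [(i, i, 20, f"2012-01-{i:02d}") for i in range(1, 8)]
# Customer 30: two Q1 transactions.
txns += [(8, 10, 30, "2012-02-01"), (9, 20, 30, "2012-03-15")]
txns += [(10, 100, 20, "2012-04-02")]
conn.executemany("INSERT INTO txn VALUES (?, ?, ?, ?)", txns)

rows = conn.execute(
    """
    WITH ranked AS (
        SELECT customer_id,
               (CAST(strftime('%m', dt) AS INTEGER) + 2) / 3 AS qtr,
               amount,
               ROW_NUMBER() OVER (
                   PARTITION BY customer_id,
                                (CAST(strftime('%m', dt) AS INTEGER) + 2) / 3
                   ORDER BY dt DESC, txn_id DESC
               ) AS rn
        FROM txn
        WHERE dt >= '2012-01-01' AND dt < '2013-01-01'
    )
    SELECT qtr, customer_id, SUM(amount) AS total_amount
    FROM ranked
    WHERE rn <= 5          -- last 5 transactions per customer per quarter
    GROUP BY qtr, customer_id
    ORDER BY qtr, customer_id
    """
).fetchall()

print(rows)
```

For customer 20 in quarter 1, only the five newest amounts (3 through 7) are summed, giving 25; groups with fewer than five transactions are summed in full.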

sql query to delete only one duplicate row

I have a table with some duplicate rows in it. I want to delete only one duplicate row.
For example, if I have 9 duplicate rows, only one row should be deleted, leaving 8 rows.
Example:
date calling called duration timestampp
2012-06-19 10:22:45.000 165 218 155 1.9 121
2012-06-19 10:22:45.000 165 218 155 1.9 121
2012-06-19 10:22:45.000 165 218 155 1.9 121
2012-06-19 10:22:45.000 165 218 155 1.9 121
From the above data, only one row should be deleted, leaving 3 rows.
2012-06-19 10:22:45.000 165 218 155 1.9 100
2012-06-19 10:22:45.000 165 218 155 1.9 100
2012-06-19 10:22:45.000 165 218 155 1.9 100
From the above data, only one row should be deleted, leaving 2 rows.
How can I do this?
This solution allows you to delete one row from each set of duplicates (rather than just handling a single block of duplicates at a time):
;WITH x AS
(
SELECT [date], rn = ROW_NUMBER() OVER (PARTITION BY
[date], calling, called, duration, [timestamp]
ORDER BY [date])
FROM dbo.UnspecifiedTableName
)
DELETE x WHERE rn = 2;
As an aside, both [date] and [timestamp] are terrible choices for column names...
For SQL Server 2005+ you can do the following:
;WITH CTE AS
(
SELECT *,
ROW_NUMBER() OVER(PARTITION BY [date], calling, called, duration, [timestamp] ORDER BY (SELECT NULL)) RN -- window functions do not accept ORDER BY 1
FROM YourTable
)
DELETE FROM CTE
WHERE RN = 2
Do you have a primary key on the table?
What makes a row a duplicate? Same time? same date? all columns being the same?
If you have a primary key, you can use the TOP clause to select only one record and delete that one row:
Delete from [tablename] where id in (select top 1 id from [tablename] where [clause])
If you don't mind which of these rows is deleted, SQL Server has a command for this:
DELETE TOP (numberOfRowsToDelete) FROM db.tablename WHERE {condition for ex id = 5};
Since I don't have the schema, here is a possible solution in steps:
Apply a row number to the select of all columns
Make a group by with those columns and delete the min(rownumber) in each group
Edit:
The row number is assigned in an inner query and increments across all rows. In the outer query, I group the inner query's rows by those columns and select the min(rownumber) for each group. Since each group is made up of duplicated rows, I then delete the row with the min(rownumber) in each group.
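The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: it runs on Python's built-in sqlite3 module rather than SQL Server, it uses SQLite's implicit rowid in place of an explicit row number, and the table name calls, the column ts and the sample values are hypothetical. The HAVING COUNT(*) > 1 filter is an addition so that rows without duplicates are left alone.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption);
# table name, column names and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE calls (date TEXT, calling INT, called INT, duration REAL, ts INT)"
)
base = ("2012-06-19 10:22:45.000", 165, 218, 1.9)
data = [base + (121,)] * 4 + [base + (100,)] * 3 + [base + (50,)]
conn.executemany("INSERT INTO calls VALUES (?, ?, ?, ?, ?)", data)

# Group identical rows, pick the smallest rowid of every group that really
# has duplicates, and delete exactly that one row per group.
conn.execute(
    """
    DELETE FROM calls
    WHERE rowid IN (
        SELECT MIN(rowid)
        FROM calls
        GROUP BY date, calling, called, duration, ts
        HAVING COUNT(*) > 1
    )
    """
)

remaining = conn.execute(
    "SELECT ts, COUNT(*) FROM calls GROUP BY ts ORDER BY ts"
).fetchall()
print(remaining)
```

After the delete, the group of 4 has 3 rows left, the group of 3 has 2, and the single non-duplicated row is untouched.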
In MySQL, using LIMIT 1 will let you delete only 1 row that matches your DELETE query:
DELETE FROM `table_name` WHERE `column_name`='value' LIMIT 1;
BEFORE:
+----------------------+
| id | column_name |
+-----+----------------+
| 1 | value |
+-----+----------------+
| 2 | value |
+-----+----------------+
| 3 | value |
+-----+----------------+
| 4 | value |
+-----+----------------+
AFTER:
+----------------------+
| id | column_name |
+-----+----------------+
| 1 | value |
+-----+----------------+
| 2 | value |
+-----+----------------+
| 3 | value |
+-----+----------------+
