Custom aggregations in SQL - sql-server

I have a table named industry with 6 fields; the schema is given below.
I need to perform custom aggregations. There are 22 areas in the database, and two custom aggregations need to be made:
Areas 1-17 need to be combined into a new area with code value 00.
Areas 20 and 21 need to be combined into another area with code value 99.
Below is my attempt at an overall framework for this. I am assuming that creating a new table is the simplest way to accomplish this. At the bottom is a very short example of the intended result.
create table industry2
(
year char(4),
qtr char(2),
area char(6),
industry char(3),
ownership char(2),
employment numeric(8,0)
);
INSERT INTO Industry2
(year, qtr, area, industry, ownership, employment)
SELECT year, qtr, area, (select sum (employment) from dbo.industry where area
= '01' or area = '02' and so on):
2017 01 01 123000 1 456
2017 01 02 123000 1 101
2017 01 03 123000 1 103
2017 01 01 134000 1 6
2017 01 02 134000 1 7
2017 01 03 134000 1 12
2017 01 09 134000 1 1
2017 01 01 144000 1 14
2017 01 20 134000 1 7
2017 01 21 134000 1 8
Intended result
2017 01 00 123000 1 660
2017 01 00 134000 1 26
2017 01 00 144000 1 14
2017 01 99 134000 1 15

You can define your custom grouping with a CASE expression in the GROUP BY clause:
select [year],
[qtr],
-- areas outside both ranges keep their own code instead of collapsing into a single NULL group
case when [area] in('20','21') then '99' when [area] between 1 and 17 then '00' else [area] end as [area],
[industry],
[ownership],
sum([employment]) as [employment_sum]
from industry
group by
[year],
[qtr],
case when [area] in('20','21') then '99' when [area] between 1 and 17 then '00' else [area] end,
[industry],
[ownership]
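If the combined rows should end up in the new table described in the question, the same CASE expression can feed an INSERT ... SELECT. This is only a sketch under a few assumptions: the source table is dbo.industry with the columns listed in the question, area codes are stored as two-digit zero-padded strings as in the sample data, and areas outside the two ranges keep their own code (the question does not say what should happen to them).

```sql
-- Sketch: load the aggregated rows into industry2.
-- Assumes dbo.industry has the same columns as industry2 and that
-- area codes are two-digit, zero-padded strings ('01' ... '22').
INSERT INTO industry2 ([year], qtr, area, industry, ownership, employment)
SELECT
    i.[year],
    i.qtr,
    m.new_area,
    i.industry,
    i.ownership,
    SUM(i.employment)
FROM dbo.industry AS i
CROSS APPLY (VALUES (
    CASE
        WHEN i.area IN ('20', '21')       THEN '99'   -- areas 20 and 21
        WHEN i.area BETWEEN '01' AND '17' THEN '00'   -- areas 1 through 17
        ELSE i.area                                   -- assumption: leave the rest as-is
    END)) AS m (new_area)
GROUP BY i.[year], i.qtr, m.new_area, i.industry, i.ownership;
```

Computing the mapped code once in CROSS APPLY avoids repeating the CASE expression in both the SELECT list and the GROUP BY.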

Related

Grouping items and setting a flag

I have a table structured as follows:
order_yr acct_id indv_id age
2019 323 01 38
2019 323 02 37
2019 323 03 16
2019 323 04 5
2019 325 01 38
2019 326 01 64
2019 326 02 63
What I need to do is add a flag for each order_yr and acct_id combination that indicates whether anyone in that group has age <= 17.
The result would be like this:
order_yr acct_id indv_id age child_flg
2019 323 01 38 1
2019 323 02 37 1
2019 323 03 16 1
2019 323 04 5 1
2019 325 01 38 0
2019 326 01 64 0
2019 326 02 63 0
I know I have to partition by order_yr and acct_id, but not sure how to get the result in one inline script.
Any help would be appreciated.
BTW this is an individual level extract with a number of other columns associated with each indv.
I've not gotten very far - I have this:
,ROW_NUMBER() OVER(PARTITION BY order_yr, acct_id ORDER BY (CASE WHEN age <=17 THEN 'Y' ELSE 'N' END) desc) AS CHILD_flg
You have some options here. One is using a subquery to find out if a row exists that belongs to a group and meets your condition:
select *
, case
when exists (select *
from #data sub
where sub.order_yr = d.order_yr
and sub.acct_id = d.acct_id
and sub.age <= 17)
then 1
else 0
end as flag
from #data d
You can also go with a window function like you planned:
select *
, max(case when age <= 17 then 1 else 0 end) over (partition by order_yr, acct_id) as flag
from #data d
Working demo on dbfiddle

Count Consecutive Days where value greater than 0

Using SQL Server 2012, I am trying to create a query that provides me with, say, the top 10 longest wet (or dry) periods from a climate database.
My temp table provides the following data output:
select monthid as [id], date, rain_today
from #raindays
order by monthid asc, date asc
Output:
id date rain_today
-------------------------------
1 24 Dec 2014 2.4
1 25 Dec 2014 0
1 26 Dec 2014 8.7
1 27 Dec 2014 1.8
1 28 Dec 2014 0.3
1 29 Dec 2014 0
1 30 Dec 2014 0
1 31 Dec 2014 0.3
2 01 Jan 2015 0.3
2 02 Jan 2015 0.3
2 03 Jan 2015 18.3
2 04 Jan 2015 0.3
etc. etc.
I would like to return a ranked table that would count the period where rain_today is > 0, (or rain_today = 0) i.e:
Rank Start_Date End_Date Wet Period
----------------------------------------
1 31 Dec 2014 04 Jan 2015 5
2 26 Dec 2014 28 Dec 2014 3
...
The closest I have got from reviewing other similar queries is the following (this is for dry days):
select
#raindays.monthid as id,
min(#raindays.date) as [FirstDryDay],
max(#raindays.date) as [LatestDryDay],
count(*) as countdays
from
(select
monthid,
coalesce(max(case
when rain_today > '0'
then #raindays.date end), '19000101') as latestdry
from
#raindays
group by
monthid) g
join
#raindays on #raindays.monthid = g.monthid
and #raindays.date > g.latestdry
group by
#raindays.monthid
order by
countdays desc
Output:
id FirstDryDay LatestDryDay countdays
-----------------------------------------------
23 21 Oct 2016 31 Oct 2016 11
21 23 Aug 2016 31 Aug 2016 9
**15 23 Feb 2016 29 Feb 2016 7**
10 25 Sep 2015 30 Sep 2015 6
8 28 Jul 2015 31 Jul 2015 4
24 28 Nov 2016 30 Nov 2016 3
29 29 Apr 2017 30 Apr 2017 2
30 30 May 2017 31 May 2017 2
31 29 Jun 2017 30 Jun 2017 2
20 30 Jul 2016 31 Jul 2016 2
7 29 Jun 2015 30 Jun 2015 2
5 30 Apr 2015 30 Apr 2015 1
11 31 Oct 2015 31 Oct 2015 1
17 30 Apr 2016 30 Apr 2016 1
22 30 Sep 2016 30 Sep 2016 1
As you can see, I don't really want to group by id, because I want periods to be able to span different months, and I'm missing other periods that occur earlier in the month. The count itself seems to be working correctly; checking the period highlighted above:
id date rain_today
15 22 Feb 2016 3.9
15 23 Feb 2016 0
15 24 Feb 2016 0
15 25 Feb 2016 0
15 26 Feb 2016 0
15 27 Feb 2016 0
15 28 Feb 2016 0
15 29 Feb 2016 0
16 01 Mar 2016 3
Thanks in advance for any help!
Is this what you want???
IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL
DROP TABLE #TestData;
CREATE TABLE #TestData (
id INT NOT NULL ,
[Date] DATE NOT NULL,
Rain_Today DECIMAL(9,2) NOT NULL
);
INSERT #TestData (id, Date, Rain_Today) VALUES
(1, '24 Dec 2014', 2.4),
(1, '25 Dec 2014', 0),
(1, '26 Dec 2014', 8.7),
(1, '27 Dec 2014', 1.8),
(1, '28 Dec 2014', 0.3),
(1, '29 Dec 2014', 0),
(1, '30 Dec 2014', 0),
(1, '31 Dec 2014', 0.3),
(2, '01 Jan 2015', 0.3),
(2, '02 Jan 2015', 0.3),
(2, '03 Jan 2015', 18.3),
(2, '04 Jan 2015', 0.3);
--======================================
WITH
cte_AddRankGroup AS (
SELECT
td.id,
td.Date,
td.Rain_Today,
hr.HasRain,
RankGroup = DENSE_RANK() OVER (PARTITION BY td.id ORDER BY td.Date) -
DENSE_RANK() OVER (PARTITION BY td.id, hr.HasRain ORDER BY td.Date)
FROM
#TestData td
CROSS APPLY ( VALUES (IIF(td.Rain_Today = 0, 0, 1)) ) hr (HasRain)
)
SELECT
arg.id,
BegDate = MIN(arg.Date),
EndDate = MAX(arg.Date),
WetPeriod = IIF(arg.HasRain = 1, 'Wet', 'Dry'),
ConsecutiveDays = COUNT(1)
FROM
cte_AddRankGroup arg
GROUP BY
arg.id,
arg.HasRain,
arg.RankGroup
ORDER BY
arg.id,
MIN(arg.Date);
Results...
id BegDate EndDate WetPeriod ConsecutiveDays
----------- ---------- ---------- --------- ---------------
1 2014-12-24 2014-12-24 Wet 1
1 2014-12-25 2014-12-25 Dry 1
1 2014-12-26 2014-12-28 Wet 3
1 2014-12-29 2014-12-30 Dry 2
1 2014-12-31 2014-12-31 Wet 1
2 2015-01-01 2015-01-04 Wet 4
Edit: Code version using CASE expression in place of IIF...
--======================================
WITH
cte_AddRankGroup AS (
SELECT
td.id,
td.Date,
td.Rain_Today,
hr.HasRain,
RankGroup = DENSE_RANK() OVER (PARTITION BY td.id ORDER BY td.Date) -
DENSE_RANK() OVER (PARTITION BY td.id, hr.HasRain ORDER BY td.Date)
FROM
#TestData td
CROSS APPLY ( VALUES (CASE WHEN td.Rain_Today = 0 THEN 0 ELSE 1 END) ) hr (HasRain)
)
SELECT top 10
arg.id,
BegDate = MIN(arg.Date),
EndDate = MAX(arg.Date),
WetPeriod = CASE WHEN arg.HasRain = 1 THEN 'Wet' ELSE 'Dry' END,
ConsecutiveDays = COUNT(1)
FROM
cte_AddRankGroup arg
WHERE
arg.HasRain = '0' -- Top 10 Dry
--arg.HasRain = '1' -- Top 10 Wet
GROUP BY
arg.id,
arg.HasRain,
arg.RankGroup
ORDER BY
ConsecutiveDays desc, MIN(arg.Date);
Modified original script to produce the Top 10 by each period type which was my ultimate aim (output is from the full dataset):
id BegDate EndDate WetPeriod ConsecutiveDays
31 10 Jun 2017 26 Jun 2017 Dry 17
4 02 Mar 2015 14 Mar 2015 Dry 13
5 12 Apr 2015 24 Apr 2015 Dry 13
20 15 Jul 2016 26 Jul 2016 Dry 12
29 01 Apr 2017 11 Apr 2017 Dry 11
26 17 Jan 2017 27 Jan 2017 Dry 11
23 21 Oct 2016 31 Oct 2016 Dry 11
25 01 Dec 2016 09 Dec 2016 Dry 9
21 10 Aug 2016 18 Aug 2016 Dry 9
21 23 Aug 2016 31 Aug 2016 Dry 9
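The question also asked for periods that cross month boundaries (31 Dec 2014 to 04 Jan 2015 counted as one five-day wet period), which the PARTITION BY td.id in the queries above prevents. A possible variation, offered as a sketch rather than a tested replacement: drop the partition on id and number the rows over the whole table. It assumes the #TestData table from the answer and one row per calendar day; the names cte_Runs and RunGroup are made up for this example.

```sql
-- Sketch: wet/dry runs that may cross month boundaries.
-- Assumes #TestData from the answer above, with exactly one row per calendar day.
WITH cte_Runs AS (
    SELECT
        td.[Date],
        hr.HasRain,
        -- Rows in the same unbroken run share the same difference.
        RunGroup = ROW_NUMBER() OVER (ORDER BY td.[Date])
                 - ROW_NUMBER() OVER (PARTITION BY hr.HasRain ORDER BY td.[Date])
    FROM #TestData td
    CROSS APPLY ( VALUES (CASE WHEN td.Rain_Today = 0 THEN 0 ELSE 1 END) ) hr (HasRain)
)
SELECT TOP (10)
    [Rank]          = ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC, MIN(r.[Date])),
    Start_Date      = MIN(r.[Date]),
    End_Date        = MAX(r.[Date]),
    ConsecutiveDays = COUNT(*)
FROM cte_Runs r
WHERE r.HasRain = 1          -- use 0 for dry periods
GROUP BY r.RunGroup
ORDER BY ConsecutiveDays DESC, Start_Date;
```

On the twelve sample rows this should rank the 31 Dec 2014 to 04 Jan 2015 run (5 days) first and the 26 to 28 Dec 2014 run (3 days) second, which is the output the question asked for.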
This problem can be resolved by recursion in this way:
-- this variable is needed to stop the recursion
declare @numrows int=(select count(1) from #raindays)
-- add a row number to the table creating a new table as "tabseq"
;WITH tabseq as (select row_number() over(order by date) as rownum, * from #raindays),
-- apply recursion to tabseq, keeping running totals of wet and dry days that reset whenever the weather toggles
CTE as
(
select *,
(case when rain_today=0 then 1 else 0 end) as dry,
(case when rain_today>0 then 1 else 0 end) as wet
from tabseq where rownum=1
union all
select s.*,
(case when s.rain_today=0 then cte.dry+1 else 0 end) as dry,
(case when s.rain_today>0 then cte.wet+1 else 0 end) as wet
from tabseq s
join cte on s.rownum=cte.rownum+1
where s.rownum<=@numrows
)
select * from cte
-- the default recursion limit is 100 levels; lift it for tables with more rows
option (maxrecursion 0)
Once you have the table (cte) with the dry/wet accumulators you can order and select from it to suit your output requirements.
Please note that this is assuming consecutive days on the table, if there are gaps then instead of adding +1 on the case statement you may need to add a datediff to one side or the other depending how you consider the missing dates (wet or dry).

recursively group and count/max records for a given datetime interval

I had another question open, but this one is hopefully clearer and more targeted at what I need assistance with.
Sample data: (SQL Fiddle link included below)
groupid custid cust_type cust_date data_total_1 data_total_2
CA123 ABC12345 SLE January, 01 2014 5 10
CA123 ABC12345 SLE February, 01 2014 2 5
CA123 ABC12345 SLE March, 01 2014 7 11
CA123 ABC12345 SLE April, 01 2014 7 4
FL444 BBB22222 SLE January, 01 2014 2 3
FL444 BBB22222 SLE March, 01 2014 7 21
FL444 BBB22222 SLE July, 01 2014 3 9
WA999 ZZZ99909 NSLE April, 01 2014 2 10
WA999 ZZZ99909 NSLE May, 01 2014 4 9
For each given groupid, custid, cust_type combination, I need to evaluate the records within a given time interval (3 months). For each record, I need to count the number of records and get the max data_total_x values that exist within that record's "range".
My expected output looks similar to this:
groupid custid cust_type cust_date custid_count max_data_total_1 max_data_total_2
CA123 ABC12345 SLE January, 01 2014 4 7 11
CA123 ABC12345 SLE February, 01 2014 3 7 11
CA123 ABC12345 SLE March, 01 2014 2 7 11
CA123 ABC12345 SLE April, 01 2014 1 7 4
FL444 BBB22222 SLE January, 01 2014 2 7 21
FL444 BBB22222 SLE March, 01 2014 1 7 21
FL444 BBB22222 SLE July, 01 2014 1 3 9
WA999 ZZZ99909 NSLE April, 01 2014 2 4 10
WA999 ZZZ99909 NSLE May, 01 2014 1 4 9
SQL Fiddle that includes sample data and my attempt at it:
http://sqlfiddle.com/#!6/ba5a53/10/0
Any assistance would be appreciated.
I think this should do it:
select
r.groupid,
r.custid,
r.cust_type,
r.cust_date,
f.custid_count,
f.max_data_total_1,
f.max_data_total_2
from records r
outer apply (
select
count(*) as custid_count,
max(data_total_1) as max_data_total_1,
max(data_total_2) as max_data_total_2
from
records r2
where
r.groupid = r2.groupid and
r.custid = r2.custid and
r.cust_type = r2.cust_type and
r2.cust_date <= dateadd(month, 3, r.cust_date) and
r2.cust_date >= r.cust_date
) f
SQL Fiddle: http://sqlfiddle.com/#!6/ba5a53/14

Query to compare two year sales

I have a table in SQL Server with these columns:
Year
Month
Product
Qty
Example:
Year Month Product Qty
2011 1 XYZQW 45
This table stores all product sales.
I need to build a query that compares one year with the previous one to produce this report:
Year JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC
-------------------------------------------------------
2011 12 23 56 54 14 11 15 18 89 87 48 98
2012 19 21 55 50 24 10 19 17 88 81 45 90
Is there a way to do this without creating a temporary table?
This is simple, try this:
WITH DT(AMonth,AYear,AQty)
AS (SELECT Month, Year, Qty
FROM YourTable)
SELECT pvt.*
FROM DT cte
PIVOT
(SUM(AQty)
FOR AYear IN ( [2011],[2012],[2013],[2014] ) ) AS pvt
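Note that the query above turns the years into columns, while the report in the question has one row per year and one column per month. A sketch of the pivot in that orientation follows; it assumes the table is called YourTable as in the answer, that [Month] is stored as a number from 1 to 12, and that only 2011 and 2012 are wanted.

```sql
-- Sketch: one row per year, one column per month.
-- Assumes YourTable([Year], [Month], Product, Qty) with [Month] stored as 1-12.
SELECT
    pvt.[Year],
    pvt.[1]  AS [JAN], pvt.[2]  AS [FEB], pvt.[3]  AS [MAR], pvt.[4]  AS [APR],
    pvt.[5]  AS [MAY], pvt.[6]  AS [JUN], pvt.[7]  AS [JUL], pvt.[8]  AS [AUG],
    pvt.[9]  AS [SEP], pvt.[10] AS [OCT], pvt.[11] AS [NOV], pvt.[12] AS [DEC]
FROM (
    -- Product is left out so that it does not become a grouping column.
    SELECT [Year], [Month], Qty
    FROM YourTable
    WHERE [Year] IN (2011, 2012)
) AS src
PIVOT (
    SUM(Qty) FOR [Month] IN ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12])
) AS pvt
ORDER BY pvt.[Year];
```

Every column of the derived table that is not aggregated or pivoted becomes an implicit grouping column, which is why the inner SELECT lists only [Year], [Month] and Qty.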

Filling null row values with last non-null values - SQL table

I'm struggling with the following problem. Consider the example table posted below. What I need to do is to update the table, specifically the NULL values on each row with the "last" non-NULL values. For example, the NULL values on rows 3 and 4 should be updated with the values of row 2 of the same column, that is
2 007585102 2001 03 31 2001 04 12 2 154980 6300 154980 6300
3 007585102 2001 03 31 2001 04 19 2 154980 6300 154980 6300
4 007585102 2001 03 31 2001 04 26 2 154980 6300 154980 6300
and NULL values on rows 9 to 15 updated with the values of row 8 and so on.
I honestly have no idea how to do this and I will greatly appreciate any help. Thanks in advance.
Sorry about the extremely poor formatting of the table but I can't post anything but plain text.
EXAMPLE TABLE
1 007585102 2001 03 31 2001 04 05 2 543660 22100 543660 22100
2 007585102 2001 03 31 2001 04 12 2 154980 6300 154980 6300
3 007585102 NULL 2001 04 19 NULL NULL NULL NULL NULL
4 007585102 NULL 2001 04 26 NULL NULL NULL NULL NULL
5 007585102 2001 03 31 2001 05 03 2 2726664 110840 2726664 110840
6 007585102 2001 03 31 2001 05 10 2 836400 34000 836400 34000
7 007585102 2001 03 31 2001 05 17 2 534804 21740 7634364 310340
8 007585102 2001 03 31 2001 05 24 2 4920 200 4920 200
9 007585102 NULL 2001 05 31 NULL NULL NULL NULL NULL
10 007585102 NULL 2001 06 07 NULL NULL NULL NULL NULL
11 007585102 NULL 2001 06 14 NULL NULL NULL NULL NULL
12 007585102 NULL 2001 06 21 NULL NULL NULL NULL NULL
13 007585102 NULL 2001 06 28 NULL NULL NULL NULL NULL
14 007585102 NULL 2001 07 05 NULL NULL NULL NULL NULL
15 007585102 NULL 2001 07 12 NULL NULL NULL NULL NULL
16 007585102 2001 06 30 2001 07 19 2 2693301 118300 2693301 118300
17 007585102 2001 06 30 2001 07 26 2 232220 10200 NULL NULL
I'm not very proud of my answer, but at least it works. Find more elegant way on your own. I'd suggest recursive cte.
drop table #temp
GO
select
*
into #temp
from (
select 1 as id, '2001 03 31' as dat union all
select 2, '2001 03 31' union all
select 3, null union all
select 4, null union all
select 5, '2001 03 31' union all
select 6, '2001 03 31' union all
select 7, '2001 03 31' union all
select 8, '2001 03 31' union all
select 9, null union all
select 10, null union all
select 11, null union all
select 12, null union all
select 13, null union all
select 14, null union all
select 15, null union all
select 16, '2001 06 30' union all
select 17, '2001 06 30'
) x
update t
set
t.dat = t2.dat
from #temp t
join (
select
t1.id, max(t2.id) as maxid
from #temp t1
join #temp t2
on t1.id>t2.id
and t2.dat is not null
and t1.dat is null
group by
t1.id
) x
on t.id=x.id
join #temp t2
on t2.id=x.maxid
select * from #temp
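A somewhat shorter way to express the same lookup, as a sketch rather than a definitive version: for each NULL row, fetch the closest earlier row that has a value with TOP (1) inside an APPLY. It assumes the #temp (id, dat) table built above and that id defines the row order.

```sql
-- Sketch: fill each NULL dat from the nearest earlier row that has a value.
-- Assumes the #temp (id, dat) table built above, with id defining row order.
UPDATE t
SET    t.dat = prev.dat
FROM   #temp AS t
CROSS APPLY (
    SELECT TOP (1) p.dat
    FROM   #temp AS p
    WHERE  p.id < t.id
      AND  p.dat IS NOT NULL
    ORDER BY p.id DESC          -- closest earlier row first
) AS prev
WHERE  t.dat IS NULL;
```

CROSS APPLY skips rows that have no earlier non-NULL value, so leading NULLs (if any) are left untouched.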
I have explained this in detail here:
https://koukia.ca/common-sql-problems-filling-null-values-with-preceding-non-null-values-ad538c9e62a6#.k0dxirgwu
Here is the T-SQL you need:
SELECT *
INTO #Temp
FROM ImportedSales;
;With CTE
As
(
SELECT ProductName
, Id
, COUNT(ProductName) OVER(ORDER BY Id ROWS UNBOUNDED PRECEDING) As MyGroup
FROM #Temp
),
GetProduct
AS
(
SELECT [ProductName]
, First_Value(ProductName) OVER(PARTITION BY MyGroup ORDER BY Id ROWS UNBOUNDED PRECEDING) As UpdatedProductName
FROM CTE
)
UPDATE GetProduct
Set ProductName = UpdatedProductName;
SELECT *
FROM #Temp;
In Redshift, and I think other SQL flavors, you can combine nvl() and lag(), being sure to use the ignore nulls option when using lag().
I adapted this from https://blog.jooq.org/2015/12/17/how-to-fill-sparse-data-with-the-previous-non-empty-value-in-sql/ .
I'm going to call the second date field in your example "date_field" since I didn't see a column header.
Let's assume you have a column called "rownumber" on which you can correctly order your values.
Then the example using your above data would be like
select nvl(date_field, lag(date_field, 1) ignore nulls over ([partition by whatever] order by rownumber))
This should grab the nearest non-null value above in the column (ordered by whatever columns you specified) and use it to replace the nulls until it hits the next non-null value.
Ignore nulls is the key, because otherwise lag() just returns the value from the previous row, null or not, so only the first null after each non-null value gets replaced.
HTH.
