SQL: GROUP BY where a column is unique


So I have one giant table, e.g.:
PROD_ID GEOG_ID TIME_ID VALUE1
1 MT JAN 100
1 MT FEB 100
2 MT JAN 100
2 MT FEB 100
3 TT MARCH 100
I want only the JAN and FEB data for the geography MT, with VALUE1 summed for each PROD_ID.
So the end result is:
PROD_ID GEOG_ID VALUE1
1 MT 200
2 MT 200
I have managed to filter the data by TIME_ID using:
SELECT PROD_ID, GEOG_ID, TIME_ID, VALUE1 FROM database WHERE GEOG_ID = 'MT' AND TIME_ID IN ('JAN', 'FEB')
so I have:
PROD_ID GEOG_ID TIME_ID VALUE1
1 MT JAN 100
1 MT FEB 100
2 MT JAN 100
2 MT FEB 100
but now I am unsure how to use GROUP BY on PROD_ID, since TIME_ID is different on every row.
Any thoughts?
Many thanks!

You can use TIME_ID in the WHERE clause without selecting it, so it doesn't have to appear in the GROUP BY clause.
SELECT PROD_ID, GEOG_ID, SUM(VALUE1) AS TOTAL
FROM database
WHERE GEOG_ID = 'MT'
AND TIME_ID IN ('JAN', 'FEB')
GROUP BY PROD_ID, GEOG_ID
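A minimal sketch to check the query end to end. It uses Python's built-in sqlite3 module as a stand-in engine, so small dialect differences from the original database are possible. The table name sales is a hypothetical stand-in for the question's table.

```python
import sqlite3

# In-memory SQLite database standing in for the real one.
conn = sqlite3.connect(":memory:")
# Hypothetical table name "sales"; columns follow the question.
conn.execute("CREATE TABLE sales (PROD_ID INTEGER, GEOG_ID TEXT, TIME_ID TEXT, VALUE1 INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [(1, "MT", "JAN", 100), (1, "MT", "FEB", 100),
     (2, "MT", "JAN", 100), (2, "MT", "FEB", 100),
     (3, "TT", "MARCH", 100)],
)

# TIME_ID only filters rows in WHERE; it is not selected, so it is not grouped.
rows = conn.execute("""
    SELECT PROD_ID, GEOG_ID, SUM(VALUE1) AS TOTAL
    FROM sales
    WHERE GEOG_ID = 'MT' AND TIME_ID IN ('JAN', 'FEB')
    GROUP BY PROD_ID, GEOG_ID
    ORDER BY PROD_ID
""").fetchall()
print(rows)  # [(1, 'MT', 200), (2, 'MT', 200)]
```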

Related

Timeseries chart with aggregation over overlapping intervals

My dataset is of the following form:
Start Date End Date Count Contact
2019-01-20 2019-05-10 50 A
2019-03-05 2019-06-07 20 A
2019-03-05 2019-06-07 20 B
....
I want a timeseries chart where the X axis is months, and the Y axis is the total count.
For example, the entries would be:
Month TotalCount Contact
Jan 50 A
Jan 0 B
Feb 50 A
Feb 0 B
Mar 70 A
Mar 20 B
Apr 70 A
Apr 20 B
May 70 A
May 20 B
Jun 20 A
Jun 20 B
Jul 0 A
Jul 0 B
...
How can I achieve this in Data Studio? The data comes from BigQuery.

You can't do this with Data Studio's visual tools alone; you need to reshape the dataset first.
You could instead use a custom query to generate a range of month-start dates (https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#generate_date_array) and then join each month to the rows from your dataset whose start-to-end date range overlaps that month.
Example code using your sample data:
WITH CTE AS (
SELECT date('2019-01-20') start_date, date('2019-05-10') end_date, 50 count, 'A' contact
UNION ALL SELECT date('2019-03-05'), date('2019-06-07'), 20, 'A'
UNION ALL SELECT date('2019-03-05'), date('2019-06-07'), 20, 'B'
), monthDates AS (
SELECT months FROM
UNNEST(GENERATE_DATE_ARRAY('2019-01-01', '2019-07-01', INTERVAL 1 MONTH)) months
), monthContact AS (
SELECT months, contact
FROM monthDates m
CROSS JOIN CTE
GROUP BY 1,2
)
SELECT months, ifnull(sum(count),0) count, m.contact
FROM monthContact m
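-- Truncate start_date to the first of its month so a range that starts mid-month still counts for that month
LEFT JOIN CTE c ON m.months BETWEEN DATE_TRUNC(c.start_date, MONTH) AND c.end_date AND m.contact = c.contact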
GROUP BY 1,3
ORDER BY 1,3
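The month-overlap logic can be checked locally with a sketch in Python's sqlite3 module. SQLite has no GENERATE_DATE_ARRAY or DATE_TRUNC, so this version builds the month list with a recursive CTE and truncates with date(x, 'start of month'). The table name ranges and the column cnt are hypothetical. A row counts toward a month when its start date, truncated to the month, is on or before the month start and its end date is on or after it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table "ranges"; ISO date strings compare correctly as text.
conn.execute("CREATE TABLE ranges (start_date TEXT, end_date TEXT, cnt INTEGER, contact TEXT)")
conn.executemany("INSERT INTO ranges VALUES (?, ?, ?, ?)", [
    ("2019-01-20", "2019-05-10", 50, "A"),
    ("2019-03-05", "2019-06-07", 20, "A"),
    ("2019-03-05", "2019-06-07", 20, "B"),
])

query = """
WITH RECURSIVE months(m) AS (
    -- month starts from 2019-01-01 through 2019-07-01
    SELECT '2019-01-01'
    UNION ALL
    SELECT date(m, '+1 month') FROM months WHERE m < '2019-07-01'
),
month_contact AS (
    -- every (month, contact) pair, so months with no data still appear
    SELECT DISTINCT m, contact FROM months CROSS JOIN ranges
)
SELECT mc.m, COALESCE(SUM(r.cnt), 0) AS total, mc.contact
FROM month_contact mc
LEFT JOIN ranges r
  ON mc.contact = r.contact
 AND mc.m BETWEEN date(r.start_date, 'start of month') AND r.end_date
GROUP BY mc.m, mc.contact
ORDER BY mc.m, mc.contact
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```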

Use join with CTE

I have a table into which I insert some records every week. It has a Date column. I want to compare last week's data with this week's data using the Key column. Below is my table:
Name Date Key
ABC 07 June 1
BAC 07 June 2
WSD 07 June 3
QWE 14 June 9
QWT 14 June 2
DEF 14 June 1
CXZ 14 June 6
I want the 14 June rows whose Key also appears in the 07 June data.
Desired output:
Name Date Key
QWT 14 June 2
DEF 14 June 1
I am using a CTE to join, but I am not getting the desired results:
;WITH T1
AS
(SELECT * FROM [Table] where [Date]= '07 June'),
T2
AS
(SELECT * FROM [Table] where [Date]= '14 June')
SELECT *
FROM T2
INNER JOIN T1 ON T1.[KEY] = T2.[KEY];

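What you have should return the rows you want. Note that SELECT * also returns the matching T1 (07 June) columns next to the T2 columns; use SELECT T2.* if you only want the 14 June rows. I would maybe simplify this a little to a single query with a self join, something like this: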
select t2.*
from [Table] t1
join [Table] t2 on t1.[Key] = t2.[Key]
where t1.[Date] = '07 June'
and t2.[Date] = '14 June'
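A runnable sketch of the self join, using Python's sqlite3 as a stand-in for SQL Server. SQLite also accepts [bracket] quoting, so the question's table and column names carry over unchanged. One design note: if a key appeared more than once on 07 June, the join would return the matching 14 June row once per match; IN or EXISTS avoids that duplication.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite accepts [bracket] identifiers, so [Table], [Date] and [Key] work as in T-SQL.
conn.execute("CREATE TABLE [Table] (Name TEXT, [Date] TEXT, [Key] INTEGER)")
conn.executemany("INSERT INTO [Table] VALUES (?, ?, ?)", [
    ("ABC", "07 June", 1), ("BAC", "07 June", 2), ("WSD", "07 June", 3),
    ("QWE", "14 June", 9), ("QWT", "14 June", 2), ("DEF", "14 June", 1),
    ("CXZ", "14 June", 6),
])

# Self join: pair each 07 June row with the 14 June row sharing its key.
rows = conn.execute("""
    SELECT t2.*
    FROM [Table] t1
    JOIN [Table] t2 ON t1.[Key] = t2.[Key]
    WHERE t1.[Date] = '07 June'
      AND t2.[Date] = '14 June'
    ORDER BY t2.Name
""").fetchall()
print(rows)  # [('DEF', '14 June', 1), ('QWT', '14 June', 2)]
```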

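If you want the 14 June rows whose key also appears in the 07 June data, you can use INTERSECT on the key column. Intersecting whole rows returns nothing here, because Name and Date always differ between the two weeks.
select t2.*
from [Table] as t2
where t2.[Date] = '14 June'
and t2.[Key] in (
select [Key] from [Table] where [Date] = '14 June'
intersect
select [Key] from [Table] where [Date] = '07 June'
)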
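To show why the key column matters, this sketch compares a whole-row INTERSECT with an INTERSECT on Key only. It again uses SQLite via Python as a stand-in for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE [Table] (Name TEXT, [Date] TEXT, [Key] INTEGER)")
conn.executemany("INSERT INTO [Table] VALUES (?, ?, ?)", [
    ("ABC", "07 June", 1), ("BAC", "07 June", 2), ("WSD", "07 June", 3),
    ("QWE", "14 June", 9), ("QWT", "14 June", 2), ("DEF", "14 June", 1),
    ("CXZ", "14 June", 6),
])

# Whole rows never match: Name and Date differ between the two weeks.
whole_rows = conn.execute("""
    SELECT * FROM [Table] WHERE [Date] = '07 June'
    INTERSECT
    SELECT * FROM [Table] WHERE [Date] = '14 June'
""").fetchall()
print(whole_rows)  # []

# Intersect only the keys, then fetch the full 14 June rows for those keys.
keyed = conn.execute("""
    SELECT *
    FROM [Table]
    WHERE [Date] = '14 June'
      AND [Key] IN (
          SELECT [Key] FROM [Table] WHERE [Date] = '14 June'
          INTERSECT
          SELECT [Key] FROM [Table] WHERE [Date] = '07 June'
      )
    ORDER BY Name
""").fetchall()
print(keyed)  # [('DEF', '14 June', 1), ('QWT', '14 June', 2)]
```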

You can also make this dynamic, as below. It works regardless of the actual dates: each row is compared with the row that has the same key and a date 7 days earlier.
SELECT A.*
FROM your_table A
INNER JOIN your_table B
ON A.[Key] = B.[Key] AND DATEADD(DD,7,B.[Date]+ ' 2019') = A.[Date] + ' 2019'
-- Append ' 2019' so the 'dd Month' strings convert to dates (assumes every row is from 2019)
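A sketch of the same 7-day comparison in SQLite via Python. Unlike the question, it assumes dates are stored as ISO YYYY-MM-DD text, which removes the need for the string-concatenation trick. Here date(x, '+7 days') plays the role of DATEADD. The table name your_table comes from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: ISO 'YYYY-MM-DD' dates instead of '07 June' strings.
conn.execute("CREATE TABLE your_table (Name TEXT, [Date] TEXT, [Key] INTEGER)")
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", [
    ("ABC", "2019-06-07", 1), ("BAC", "2019-06-07", 2), ("WSD", "2019-06-07", 3),
    ("QWE", "2019-06-14", 9), ("QWT", "2019-06-14", 2), ("DEF", "2019-06-14", 1),
    ("CXZ", "2019-06-14", 6),
])

# Keep row a when a row b with the same key is dated exactly 7 days earlier.
rows = conn.execute("""
    SELECT a.*
    FROM your_table a
    JOIN your_table b
      ON a.[Key] = b.[Key]
     AND date(b.[Date], '+7 days') = a.[Date]
    ORDER BY a.Name
""").fetchall()
print(rows)  # [('DEF', '2019-06-14', 1), ('QWT', '2019-06-14', 2)]
```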

Custom ordering of records in SQL Server

I have a table (http://sqlfiddle.com/#!6/e4f6f) whose records need to be reported in Excel using SSIS.
However, the records need a custom order, such as the one below:
AID BID CID CurrencyID ClassID Year MetricID Value ReferenceID
220 1 3 6 1147 2012 C1 653465.751842658967 V001
220 1 3 6 1147 2012 C2 0.000000000000 V001
220 1 3 6 1156 2012 C1 1151019.50078003120 V001
220 1 3 6 1156 2012 C2 0.000000000000 V001
As you can see, the records are ordered by ReferenceID first and then by all the other dimension keys, with MetricID last, so the metrics for each key combination appear together. Any help is much appreciated.

Put MetricID as the last column in your ORDER BY:
SELECT *
FROM [FactValidationResult]
ORDER BY
ReferenceID,
AID,
BID,
CID,
CurrencyID,
Year,
ClassID,
MetricID
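A quick check of the ORDER BY in SQLite via Python. The rows are inserted deliberately out of order, and the query should put each ClassID's C1 and C2 metrics next to each other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE FactValidationResult (
    AID INTEGER, BID INTEGER, CID INTEGER, CurrencyID INTEGER, ClassID INTEGER,
    Year INTEGER, MetricID TEXT, Value REAL, ReferenceID TEXT)""")
# Sample rows from the question, inserted out of the desired order.
conn.executemany("INSERT INTO FactValidationResult VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", [
    (220, 1, 3, 6, 1156, 2012, "C2", 0.0, "V001"),
    (220, 1, 3, 6, 1147, 2012, "C2", 0.0, "V001"),
    (220, 1, 3, 6, 1156, 2012, "C1", 1151019.50078003120, "V001"),
    (220, 1, 3, 6, 1147, 2012, "C1", 653465.751842658967, "V001"),
])

# MetricID is the last sort key, so metrics cluster under each dimension combination.
rows = conn.execute("""
    SELECT ClassID, MetricID
    FROM FactValidationResult
    ORDER BY ReferenceID, AID, BID, CID, CurrencyID, Year, ClassID, MetricID
""").fetchall()
print(rows)  # [(1147, 'C1'), (1147, 'C2'), (1156, 'C1'), (1156, 'C2')]
```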

SQL Server: How to get a rolling sum over 3 days for different customers within same table

This is the input table:
Customer_ID Date Amount
1 4/11/2014 20
1 4/13/2014 10
1 4/14/2014 30
1 4/18/2014 25
2 5/15/2014 15
2 6/21/2014 25
2 6/22/2014 35
2 6/23/2014 10
The table holds data for multiple customers, and I want a rolling sum over a 3-day window (the current day and the two days before it) for each customer.
The solution should be as below:
Customer_ID Date Amount Rolling_3_Day_Sum
1 4/11/2014 20 20
1 4/13/2014 10 30
1 4/14/2014 30 40
1 4/18/2014 25 25
2 5/15/2014 15 15
2 6/21/2014 25 25
2 6/22/2014 35 60
2 6/23/2014 10 70
The biggest issue is that I don't have transactions for every day, so a row-number-based window (ROWS BETWEEN ...) doesn't work.
The closest example I found on SO was:
SQL Query for 7 Day Rolling Average in SQL Server
but in that case there were transactions every day, which is what made the ROW_NUMBER()-based solutions work.
The rownumber query is as follows:
select customer_id, Date, Amount,
Rolling_3_day_sum = CASE WHEN ROW_NUMBER() OVER (partition by customer_id ORDER BY Date) > 2
THEN SUM(Amount) OVER (partition by customer_id ORDER BY Date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
END
from #tmp_taml9
order by customer_id
I was wondering if there is a way to replace "BETWEEN 2 PRECEDING AND CURRENT ROW" with "BETWEEN [DATE - 2] AND [DATE]".

One option would be to use a calendar table (or something similar) to get the complete range of dates, left join your table to it, and then use the row-number-based solution.
Another option that might work (not sure about performance) would be to use an apply query like this:
select t1.customer_id, t1.Date, t1.Amount, coalesce(o.Rolling_3_day_sum, t1.Amount) Rolling_3_day_sum
from #tmp_taml9 t1
cross apply (
-- sum this customer's rows dated from two days before t1.Date through t1.Date
select sum(t2.amount) Rolling_3_day_sum
from #tmp_taml9 t2
where t2.Customer_ID = t1.Customer_ID
and datediff(day, t2.date, t1.date) between 0 and 2
) o
order by t1.customer_id, t1.Date;
I suspect performance might not be great though.
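For comparison, here is a sketch of the date-based window in SQLite via Python, written as a correlated subquery because SQLite has no CROSS APPLY. It assumes ISO dates and uses the hypothetical names tx and tx_date. The window covers the current date and the two days before it, which reproduces the expected Rolling_3_Day_Sum column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table "tx" with ISO dates so date arithmetic works.
conn.execute("CREATE TABLE tx (customer_id INTEGER, tx_date TEXT, amount INTEGER)")
conn.executemany("INSERT INTO tx VALUES (?, ?, ?)", [
    (1, "2014-04-11", 20), (1, "2014-04-13", 10), (1, "2014-04-14", 30), (1, "2014-04-18", 25),
    (2, "2014-05-15", 15), (2, "2014-06-21", 25), (2, "2014-06-22", 35), (2, "2014-06-23", 10),
])

# For each row, sum the same customer's amounts dated within [date - 2 days, date].
query = """
SELECT t1.customer_id, t1.tx_date, t1.amount,
       (SELECT SUM(t2.amount)
        FROM tx t2
        WHERE t2.customer_id = t1.customer_id
          AND t2.tx_date BETWEEN date(t1.tx_date, '-2 days') AND t1.tx_date
       ) AS rolling_3_day_sum
FROM tx t1
ORDER BY t1.customer_id, t1.tx_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the window is defined on dates rather than row counts, missing days simply contribute nothing.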

T-SQL Query to remove duplicate records in the output based on one particular column

I am running SQL Server 2014 and I have the following T-SQL query:
USE MYDATABASE
SELECT *
FROM RESERVATIONLIST
WHERE [MTH] IN ('JANUARY 2015','FEBRUARY 2015')
RESERVATIONLIST mentioned in the code above is a view. The query gives me the following output (extract):
ID NAME DOA DOD Nights Spent MTH
--------------------------------------------------------------------
251 AH 2015-01-12 2015-01-15 3 JANUARY 2015
258 JV 2015-01-28 2015-02-03 4 JANUARY 2015
258 JV 2015-01-28 2015-02-03 2 FEBRUARY 2015
The above output consists of around 12,000 records.
I need to modify my query so that it eliminates duplicate IDs and gives me the following results:
ID NAME DOA DOD Nights Spent MTH
--------------------------------------------------------------------
251 AH 2015-01-12 2015-01-15 3 JANUARY 2015
258 JV 2015-01-28 2015-02-03 4 JANUARY 2015
I tried something like this, but it's not working:
USE MYDATABASE
SELECT *
FROM RESERVATIONLIST
WHERE [MTH] IN ('JANUARY 2015', 'FEBRUARY 2015')
GROUP BY [ID]
HAVING COUNT ([MTH]) > 1

The following query returns one row per ID:
SELECT * FROM
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY (SELECT NULL)) rn FROM RESERVATIONLIST
WHERE [MTH] IN ('JANUARY 2015','FEBRUARY 2015')
) T
WHERE rn = 1
Note: this returns an arbitrary row from among the rows sharing the same ID. If you want a specific row, you have to define it in the ORDER BY, e.g.:
SELECT * FROM
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY DOA DESC) rn FROM RESERVATIONLIST
WHERE [MTH] IN ('JANUARY 2015','FEBRUARY 2015')
) T
WHERE rn = 1
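This returns the row with the latest DOA for each ID. If several rows share that DOA (as both rows for ID 258 do), which one you get is still arbitrary, so add a tiebreaker column to the ORDER BY.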
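A sketch of the ROW_NUMBER() approach in SQLite via Python. Window functions need SQLite 3.25 or later. The view is modelled as a plain table with hypothetical lowercase column names. Ordering by nights_spent DESC is a hypothetical tiebreaker, chosen so that ID 258 keeps its JANUARY 2015 row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The RESERVATIONLIST view is modelled as a plain table here.
conn.execute("""CREATE TABLE reservationlist
    (id INTEGER, name TEXT, doa TEXT, dod TEXT, nights_spent INTEGER, mth TEXT)""")
conn.executemany("INSERT INTO reservationlist VALUES (?, ?, ?, ?, ?, ?)", [
    (251, "AH", "2015-01-12", "2015-01-15", 3, "JANUARY 2015"),
    (258, "JV", "2015-01-28", "2015-02-03", 4, "JANUARY 2015"),
    (258, "JV", "2015-01-28", "2015-02-03", 2, "FEBRUARY 2015"),
])

# Number rows within each id; the ORDER BY decides which row gets rn = 1.
query = """
SELECT id, name, nights_spent, mth FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY nights_spent DESC) AS rn
    FROM reservationlist
    WHERE mth IN ('JANUARY 2015', 'FEBRUARY 2015')
) AS t
WHERE rn = 1
ORDER BY id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(251, 'AH', 3, 'JANUARY 2015'), (258, 'JV', 4, 'JANUARY 2015')]
```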

You are trying to use GROUP BY, which IMHO is the right way to go. Group by all the columns that are constant for an ID, and aggregate the others. Depending on whether DOA and DOD can vary for the same ID, I can see two solutions:
SELECT ID,NAME,DOA,DOD,SUM([Nights Spent]) as Nights,
min(MTH) as firstRes, max(MTH) as lastRes
FROM RESERVATIONLIST
GROUP BY ID,NAME,DOA,DOD
OR
SELECT ID,NAME,min(DOA) as firstDOA,max(DOD) as lastDOD,SUM([Nights Spent]) as Nights,
min(MTH) as firstRes, max(MTH) as lastRes
FROM RESERVATIONLIST
GROUP BY ID,NAME
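A sketch of the second GROUP BY variant in SQLite via Python, with hypothetical lowercase column names. Note that MIN(mth) and MAX(mth) compare the month names as text, so 'FEBRUARY 2015' sorts before 'JANUARY 2015'. Getting chronological first and last values would need a real date column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE reservationlist
    (id INTEGER, name TEXT, doa TEXT, dod TEXT, nights_spent INTEGER, mth TEXT)""")
conn.executemany("INSERT INTO reservationlist VALUES (?, ?, ?, ?, ?, ?)", [
    (251, "AH", "2015-01-12", "2015-01-15", 3, "JANUARY 2015"),
    (258, "JV", "2015-01-28", "2015-02-03", 4, "JANUARY 2015"),
    (258, "JV", "2015-01-28", "2015-02-03", 2, "FEBRUARY 2015"),
])

# Group by the per-reservation constants; aggregate everything else.
rows = conn.execute("""
    SELECT id, name, MIN(doa) AS first_doa, MAX(dod) AS last_dod,
           SUM(nights_spent) AS nights,
           MIN(mth) AS first_res, MAX(mth) AS last_res
    FROM reservationlist
    GROUP BY id, name
    ORDER BY id
""").fetchall()
for row in rows:
    print(row)
```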
