how to find the max number of days for each city? - sql-server

As you can see below, there are many cities, each with several rainy periods (days is the length of each period).
I want to find the rows with the maximum days for each city, including ties.
For example, if the maximum is 3 days and two rows have 3 days, then I want to print both rows.
Possible output would be:
Auckland 2013-11-30 2013-11-30 5
Christchurch 2013-11-10 2013-11-50 4
If there are only 5 cities, there might be anywhere from 5 to 10 rows, depending on how many rows tie for the maximum days.
I want to use SELECT, IF or CASE, and MAX or COUNT, as this is one part of a larger, more complex query.
Thank you.
SQL version:
Microsoft SQL Server 2016 (RTM-GDR) (KB4019088) - 13.0.1742.0 (X64) Jul 5 2017 23:41:17 Copyright (c) Microsoft Corporation Developer Edition (64-bit) on Windows Server 2012 R2 Standard 6.3 (Build 9600: ) (Hypervisor)
This is the example data:
CREATE TABLE practice10 (
    station VARCHAR(50),
    start_date DATE,
    end_date DATE,
    days INT
)
INSERT INTO practice10 VALUES
    ('Auckland','2013-10-5','2013-10-10', 5),
    ('Auckland','2013-10-15','2013-10-17', 2),
    ('Auckland','2013-10-20','2013-10-23', 3),
    ('Manchester','2015-9-1','2013-9-4', 3),
    ('Manchester','2013-10-3','2013-10-3', 0),
    ('Manchester','2013-10-20','2013-10-29', 9);

Rank the rows within each station by days using dense_rank(), then use TOP 1 WITH TIES to keep every row that ranks first (practice10 is the table from the question):
select
    top 1 with ties *
from
    practice10
order by dense_rank() over (partition by station order by days desc)
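
For readers without a SQL Server instance at hand, the same ranking idea can be tried out with a minimal sketch in Python's standard-library sqlite3 module. This is an illustration under stated assumptions, not the answer's exact query: SQLite has no TOP ... WITH TIES, so the sketch filters on the rank in a subquery instead, the dates are zero-padded, and one extra Manchester row (not in the question) is added so that a tie actually occurs.

```python
import sqlite3

# In-memory SQLite stand-in for the SQL Server table from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE practice10 (station TEXT, start_date TEXT, end_date TEXT, days INTEGER)"
)
conn.executemany(
    "INSERT INTO practice10 VALUES (?, ?, ?, ?)",
    [
        ("Auckland", "2013-10-05", "2013-10-10", 5),
        ("Auckland", "2013-10-15", "2013-10-17", 2),
        ("Auckland", "2013-10-20", "2013-10-23", 3),
        ("Manchester", "2015-09-01", "2013-09-04", 3),
        ("Manchester", "2013-10-03", "2013-10-03", 0),
        ("Manchester", "2013-10-20", "2013-10-29", 9),
        # Assumption: extra row, not in the question, to produce a tie.
        ("Manchester", "2013-11-01", "2013-11-10", 9),
    ],
)

# Rank rows per station by days (highest first) and keep every row ranked 1.
QUERY = """
SELECT station, start_date, end_date, days
FROM (
    SELECT *,
           DENSE_RANK() OVER (PARTITION BY station ORDER BY days DESC) AS rnk
    FROM practice10
) AS ranked
WHERE rnk = 1
ORDER BY station, start_date
"""

top_rows = conn.execute(QUERY).fetchall()
for row in top_rows:
    print(row)
```

The subquery form also works on SQL Server, which may be convenient when the result has to be embedded in a larger query.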

Related

Linear Interpolation in MS SQL Server

I work with annual mileage for each customer. Years fall in the range 2009-2022 and are consecutive for every customer: some customers may only have records from 2009 to 2020, for instance, but there are never gaps in the years. The issue is that Annual_Mlg contains NULL values from time to time, and they appear at random: there can be 3 consecutive NULLs at the beginning of the range for a certain customer, 4 NULLs in the middle, or NULLs in the last 3 values. I need a mileage for every single year. A mileage of 0 is fine and should be used in the computation: it means the customer did not travel at all that year, so it is as acceptable as any other valid mileage. By valid mileage I mean non-negative mileage, and the table I work with contains only non-negative mileage. I need to get rid of the NULLs completely by using the previous record, the next record, or interpolation. There should be no default value. I hope there is better code to fix my problem. If there is an easier way to do this in R, I am all ears; I have not attempted it in R because I am weaker in R than in MS SQL.
I created Prev_Mlg and Next_Mlg in order to interpolate when a NULL value is sandwiched between 2 non-NULL values, and that seems to work. Annual_Mlg is the field holding the original values; Final_Mlg is where I need every value to be non-NULL. I used COALESCE to get rid of NULLs to some extent. I am relatively new to SQL and do not seem to have the coding skills to tackle this type of problem on my own. Here is the logic behind what I put inside COALESCE.
The 3rd value inside COALESCE deals with the case of 2 consecutive NULLs somewhere in the middle of the 2009-2022 range: when the code reaches the first NULL it takes the previous mileage (Prev_Mlg), and for the 2nd NULL it takes the value it just produced and averages it with the next non-NULL value.
The many lead() and lag() calls are there to account for runs of NULL records at either the beginning or the end. This code obviously does not solve all the problems and looks like a joke.
I am looking for more meaningful code. Note: there should not be a default value; all NULLs should be replaced, using the following logic:
If a NULL is sandwiched between 2 non-NULLs, the code should interpolate.
If there is a NULL at the beginning of the range (by range I mean the years 2009-2022), it should be replaced with the first non-NULL value in the range.
If there is a NULL at the end of the range, it should be replaced by the first non-NULL value counting from the end of the range.
Once a NULL value has been replaced, the replacement should be used in further interpolation. I will provide examples of the kind of result I need.
Any help is greatly appreciated! Thank you so much, I am really struggling.
SELECT
    Customer,
    Year,
    Final_Mlg = COALESCE(Annual_Mlg, (Prev_Mlg + Next_Mlg)/2,
        (lag(Final_Mlg) over (partition by Customer order by Year) + Next_Mlg)/2, Prev_Mlg, Next_Mlg,
        lead(Final_Mlg) over (partition by Customer order by Year),
        lead(Final_Mlg,2) over (partition by Customer order by Year),
        lead(Final_Mlg,3) over (partition by Customer order by Year),
        lead(Final_Mlg,4) over (partition by Customer order by Year),
        lead(Final_Mlg,5) over (partition by Customer order by Year),
        lead(Final_Mlg,6) over (partition by Customer order by Year),
        lead(Final_Mlg,7) over (partition by Customer order by Year),
        lead(Final_Mlg,8) over (partition by Customer order by Year),
        lead(Final_Mlg,9) over (partition by Customer order by Year),
        lead(Final_Mlg,10) over (partition by Customer order by Year),
        lead(Final_Mlg,11) over (partition by Customer order by Year),
        lag(Final_Mlg) over (partition by Customer order by Year),
        lag(Final_Mlg,2) over (partition by Customer order by Year),
        lag(Final_Mlg,3) over (partition by Customer order by Year),
        lag(Final_Mlg,4) over (partition by Customer order by Year),
        lag(Final_Mlg,5) over (partition by Customer order by Year),
        lag(Final_Mlg,6) over (partition by Customer order by Year),
        lag(Final_Mlg,7) over (partition by Customer order by Year),
        lag(Final_Mlg,8) over (partition by Customer order by Year),
        lag(Final_Mlg,9) over (partition by Customer order by Year),
        lag(Final_Mlg,10) over (partition by Customer order by Year),
        lag(Final_Mlg,11) over (partition by Customer order by Year)),
    Annual_Mlg,
    Prev_Mlg,
    Next_Mlg
FROM #table2
ORDER BY
    Customer,
    Year
CASE 1: A couple of consecutive values are NULL at the BEGINNING of the range. The desired result is below.
Year Customer Annual_Mileage
2009 A NULL (should be replaced by 3)
2010 A NULL (should be replaced by 3)
2011 A NULL (should be replaced by 3)
2012 A 3
2013 A 4
2014 A 5
2015 A 6
2016 A 7
2017 A 8
2018 A 9
2019 A 10
2020 A 11
2021 A 12
2022 A 13
CASE 2: A couple of consecutive values are NULL at the END of the range.
Year Customer Annual_Mileage
2009 A 3
2010 A 3
2011 A 3
2012 A 3
2013 A 4
2014 A 5
2015 A 6
2016 A 7
2017 A 8
2018 A 9
2019 A 10
2020 A NULL (should be replaced by 10)
2021 A NULL (should be replaced by 10)
2022 A NULL (should be replaced by 10)
CASE 3: There are some NULLs in between non-NULL values.
Year Customer Annual_Mileage
2009 A 1
2010 A NULL (should be replaced by 1.5)
2011 A 2
2012 A 3
2013 A 4
2014 A NULL (should be replaced by 4.5)
2015 A 5
2016 A NULL (should be replaced by 5.5)
2017 A 6
2018 A NULL (should be replaced by 6)
2019 A NULL (should be replaced by 6.5)
2020 A 7
2021 A 8
2022 A 8
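
The three fill rules listed in the question can be sketched outside SQL as a small Python function (Python rather than R or T-SQL is a choice made here, and fill_mileage is a hypothetical name). It takes one customer's mileage values ordered by year, copies the first known value backwards over leading NULLs, copies the last known value forwards over trailing NULLs, and interpolates linearly across interior gaps. Note one assumption that differs from CASE 3 above: for a run of two NULLs between 6 and 7, straight linear interpolation gives roughly 6.33 and 6.67, not the 6 and 6.5 shown in the question's table.

```python
from typing import List, Optional


def fill_mileage(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill None entries in one customer's year-ordered mileage list."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        # No non-NULL value to anchor on, so nothing can be filled.
        return list(values)

    filled = list(values)
    first, last = known[0], known[-1]

    # Leading NULLs take the first non-NULL value in the range.
    for i in range(first):
        filled[i] = values[first]

    # Trailing NULLs take the last non-NULL value in the range.
    for i in range(last + 1, len(values)):
        filled[i] = values[last]

    # Interior NULLs lie on a straight line between their known neighbours.
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)

    return filled


# CASE 3 input from the question, years 2009-2022.
case3 = [1, None, 2, 3, 4, None, 5, None, 6, None, None, 7, 8, 8]
print(fill_mileage(case3))
```

Applying it per customer would mean grouping the rows by Customer and sorting by Year before calling the function on each group.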

Google Data Studio Table: Dividing Data that has 2 different Years

I need to produce a table that shows the quote win %. The formula is #won divided by #sent.
My problem is that some quotes were sent in one year but won in a different year.
(My data comes from BigQuery.)
The data looks like this:
Sale Sent Won
sale1 2019 2020
sale2 2019 2020
sale3 2016 2017
sale4 2017 2019
sale5 2020 2020
sale6 2020 2020
sale7 2018 2018
sale8 2016 2016
sale9 2015 2016
sale10 2016 2017
sale11 2016 2018
sale12 2018 2019
I'd like to be able to create a table in Data Studio like this:
Year SENT WON WIN%
2016 4 2 50%
2017 1 2 200%
2018 2 2 100%
2019 2 2 100%
2020 2 4 200%
I would love to know whether this is possible in Google Data Studio. Any suggestion is highly appreciated.
Added a Google Data Studio Report to demonstrate, as well as a GIF showing the process below.
One approach is to restructure the Data at the Data Set and use Calculated Fields in a Table:
1) Data Transformation
The data needs to be transformed from the current Wide structure to a Long data structure. One way it can be achieved in Google Sheets is by using the formula below (Sheet1 represents the input sheet; consult embedded Google Sheet for clarification):
=ArrayFormula(QUERY({
{Sheet1!A:A,IF(LEN(Sheet1!A:A),"Sent",""),Sheet1!B:B};
{Sheet1!A:A,IF(LEN(Sheet1!A:A),"Won",""),Sheet1!C:C}
},"Select * Where Col3 is not null Label Col2 'Dimension', Col3 'Year'",1))
2) Table
- Dimension: Year
- Sort: Year in Ascending order
- Metrics: Add the 3 calculated fields below:
3) Calculated Fields
The formulas below create the metrics used in the Table above (Formula 3.1 and 3.2 need to be added at the Data Source-level, while 3.3 can be added at the Chart-level if required):
3.1) SENT
COUNT(CASE
WHEN REGEXP_MATCH(Dimension, "Sent") THEN Year
ELSE NULL END)
3.2) WON
COUNT(CASE
WHEN REGEXP_MATCH(Dimension, "Won") THEN Year
ELSE NULL END)
3.3) WIN%
WON / SENT
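
To check the logic of the reshaping and the three calculated fields independently of Data Studio, here is a minimal Python sketch. It uses the twelve rows from the question; the variable names (sales, long_rows, report) are made up for the illustration. Each sale is split into one "Sent" record and one "Won" record, which is the wide-to-long step, and the records are then counted per year.

```python
from collections import Counter

# (sale, sent_year, won_year) rows copied from the question.
sales = [
    ("sale1", 2019, 2020), ("sale2", 2019, 2020), ("sale3", 2016, 2017),
    ("sale4", 2017, 2019), ("sale5", 2020, 2020), ("sale6", 2020, 2020),
    ("sale7", 2018, 2018), ("sale8", 2016, 2016), ("sale9", 2015, 2016),
    ("sale10", 2016, 2017), ("sale11", 2016, 2018), ("sale12", 2018, 2019),
]

# Wide -> long: one (year, dimension) record per event.
long_rows = []
for _sale, sent_year, won_year in sales:
    long_rows.append((sent_year, "Sent"))
    long_rows.append((won_year, "Won"))

# Equivalent of the SENT and WON calculated fields, grouped by year.
sent = Counter(year for year, dim in long_rows if dim == "Sent")
won = Counter(year for year, dim in long_rows if dim == "Won")

# Equivalent of WIN% = WON / SENT; None when nothing was sent that year.
report = {}
for year in sorted(set(sent) | set(won)):
    rate = won[year] / sent[year] if sent[year] else None
    report[year] = (sent[year], won[year], rate)

for year, (n_sent, n_won, rate) in report.items():
    print(year, n_sent, n_won, rate)
```

The counts for 2016 to 2020 match the table the question asks for; 2015 also appears, with one quote sent and none won, because the data contains a quote sent in that year.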

Query to find records with more than one date

I'm trying to build a query for an MS SQL database that will find records for models that have more than one year, but not records for models with only one year.
Let's say I have a car dealership and I have 1 Chevy from 2015 and 2 from 2017. Then I would want to find Chevy 2015 1 and Chevy 2017 2. But if I have three Fords from 2018, and only 2018, then I don't want them at all.
I have tweaked groups and joins but I don't get anywhere. So I need a SELECT from the table of some kind. I'm leaning toward a pivot table but am not sure what to do. Thanks for the help.
MyTable Contents
Model year count
Chevy 2012 1
Chevy 2012 1
Chevy 2015 1
Ford 2018 1
Ford 2018 1
Ford 2018 1
Buick 2017 1
Lexus 2017 1
Lexus 2015 1
Desired Result Set
Chevy 2012 2
Chevy 2015 1
Lexus 2017 1
Lexus 2015 1
These rows qualify because each model has 2 different years.
The query below should help you; there is no need to hardcode model values.
Select T.Model,T.[year] ,count(T.[year])
from T
join (select distinct * from T) S on T.model = S.model and T.year!=S.year
group by T.Model,T.[year]
You need to use the SUM function with GROUP BY in a subquery, because the count column may hold values other than 1. Then join the result back to the table and use DISTINCT to exclude duplicate rows.
Select distinct t1.*
from (
SELECT Model,[year] ,sum([count]) totle
FROM T
group by Model,[year]
) t1
inner join T t2 on t1.Model = t2.Model and t1.[year] !=t2.[year]
sqlfiddle: http://sqlfiddle.com/#!18/e8756/55
Note: [table] and [year] are keywords in SQL; avoid using them as column names.
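
As a cross-check of the desired result set, here is a hedged sketch that runs against an in-memory SQLite database through Python's sqlite3 module. It uses a different technique from the two answers above: instead of a self-join, it keeps only the models whose COUNT(DISTINCT year) is greater than 1 and then sums per model and year. The count column is renamed cnt here to avoid the keyword clash; that name is an assumption of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (Model TEXT, year INTEGER, cnt INTEGER)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [
        ("Chevy", 2012, 1), ("Chevy", 2012, 1), ("Chevy", 2015, 1),
        ("Ford", 2018, 1), ("Ford", 2018, 1), ("Ford", 2018, 1),
        ("Buick", 2017, 1),
        ("Lexus", 2017, 1), ("Lexus", 2015, 1),
    ],
)

# Keep models that appear with more than one distinct year,
# then total the counts for each model/year pair.
QUERY = """
SELECT Model, year, SUM(cnt) AS total
FROM T
WHERE Model IN (
    SELECT Model
    FROM T
    GROUP BY Model
    HAVING COUNT(DISTINCT year) > 1
)
GROUP BY Model, year
ORDER BY Model, year
"""

multi_year = conn.execute(QUERY).fetchall()
for row in multi_year:
    print(row)
```

The query uses only standard GROUP BY and HAVING, so it should carry over to SQL Server with the column names adjusted.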

sum up every 12 months in a table

I have a table for calculating installments, and I save all the related data in it. For example, if I calculate 60 installments and save the data, that covers 60 months. Now I need to sum the values of one column for every 12 months. Sometimes we start paying the installments in the middle of the year.
My DB looks like this. The highlighted column must be summed for every 12 months. The two images show a single table.
Suppose I have 30 installments and start paying in Jun 2012. Then the installments from Jun 2012 to May 2013 should be summed together. We can't use GROUP BY year. I must sum like this:
sum jun 2012 to may 2013
sum jun 2013 to may 2014
sum jun 2014 to nov 2014 ( only 6 months left)
You can use ROW_NUMBER to generate a group of 12 months:
WITH Cte AS(
    SELECT *,
        RN = (ROW_NUMBER() OVER(ORDER BY InstallmentMonth) - 1)/ 12
    FROM your_table
)
SELECT
    SUM(InteresetPerInstallment)
FROM Cte
GROUP BY RN
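
The grouping trick can be tried out with a small sketch in Python's sqlite3 module. The table name installments, the column Interest, and the flat amount of 100 per month are all assumptions made for the illustration; the 30 months starting in June 2012 mirror the scenario in the question. Integer division of the zero-based row number by 12 puts rows 0-11 in group 0, rows 12-23 in group 1, and so on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE installments (InstallmentMonth TEXT, Interest INTEGER)")

# 30 hypothetical monthly installments of 100 each, starting June 2012.
rows = []
year, month = 2012, 6
for _ in range(30):
    rows.append((f"{year:04d}-{month:02d}-01", 100))
    month += 1
    if month > 12:
        month = 1
        year += 1
conn.executemany("INSERT INTO installments VALUES (?, ?)", rows)

# (row number - 1) / 12 is integer division, so every 12 consecutive
# months share one group number regardless of the calendar year.
QUERY = """
WITH numbered AS (
    SELECT InstallmentMonth,
           Interest,
           (ROW_NUMBER() OVER (ORDER BY InstallmentMonth) - 1) / 12 AS grp
    FROM installments
)
SELECT MIN(InstallmentMonth), MAX(InstallmentMonth), SUM(Interest)
FROM numbered
GROUP BY grp
ORDER BY grp
"""

yearly_sums = conn.execute(QUERY).fetchall()
for first_month, last_month, total in yearly_sums:
    print(first_month, last_month, total)
```

Selecting the first and last month of each group alongside the sum makes it easy to confirm that the final group only covers the remaining 6 months.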

One to many Relationship in SQL Server Analysis Services

I have these tables:
DimDate (PK: DateKey, other attributes)
FactActivationCodes (PK: ActivationCode, IssuedDateKey (FK to DimDate))
FactExpirations (PK: ActivationCode + ExpirationType, FK: ActivationCode to FactActivationCodes)
I set up these measures:
Issued Count (count of rows in FactActivationCodes)
Expired Count (count of distinct ActivationCodes in FactExpirations)
The idea is that FactActivationCodes has one row per activation code, with the date when it was issued. An activation code can expire year after year (and then be renewed), so it would have one expiration row per year in FactExpirations.
I put some test rows in the tables: 3 rows in FactActivationCodes (a different IssuedDate for each) and only 2 in FactExpirations. When I browse the cube with the Issued Count on columns and the Issued Date (dimension) on rows, it looks like this:
Issued Date
January 2008 1
February 2008 1
March 2008 1
But then, when I add the Expired Count, I was hoping the expired column would count only the rows that match the Activation Code, like so, because of the one-to-many relationship between the two fact tables:
Issued Date Expired Date
January 2008 1 1
February 2008 1 1
March 2008 1 0
But instead, I get a cross join of everything, with the total of expired on every row, like so:
Issued Date Expired Date
January 2008 1 2
February 2008 1 2
March 2008 1 2
April 2008 2
May 2008 2
June 2008 2
And so on, for every date entry in my Date dimension. I guess I'm not setting up the relationship correctly. How can I get the expected result?
The answer is to use a referenced relationship: http://technet.microsoft.com/en-us/library/ms166704.aspx
