I work with a Sales table, and the problem is that it does not have a record for every client for every year; records are missing at random. For my analysis I need those years to be present, with 0 as the sales value.
I have limited knowledge of SQL. Can anybody help with this? What I have now and what I would like to have are shown below.
I thought about using the LAG() function, but records can be missing for 2 or 3 years in a row, so I am not sure how to tackle the problem.
What I have now:
Client_ID  SalesYear  Sales
1          2010       12
1          2012       20
1          2013       21
1          2016       14
What I need to have:
Client_ID  SalesYear  Sales
1          2010       12
1          2011       0
1          2012       20
1          2013       21
1          2014       0
1          2015       0
1          2016       14
You need a complete list of years to outer-join with.
You can do this a number of ways, the basic principle would be:
with y as (
select * from (values(2010),(2011),(2012),(2013),(2014),(2015),(2016))y(y)
)
insert into t (Client_Id, SalesYear, Sales)
select 1, y.y, 0
from y
where not exists (select * from t where t.SalesYear = y.y);
Something like this might help:
-- Table variables are declared with @ (a # prefix is for temp tables created with CREATE TABLE)
DECLARE @Sales TABLE
(Client_ID int, SalesYear int, Sales money)
INSERT INTO @Sales(Client_ID, SalesYear, Sales) SELECT 1, 2010, 12
INSERT INTO @Sales(Client_ID, SalesYear, Sales) SELECT 1, 2012, 20
INSERT INTO @Sales(Client_ID, SalesYear, Sales) SELECT 1, 2013, 21
INSERT INTO @Sales(Client_ID, SalesYear, Sales) SELECT 1, 2016, 14;
-- Recursive CTE that generates one row per year up to the current year
with years as
(
select 2000 as theYear
UNION ALL
select y.theYear + 1 as theYear
from years y
where y.theYear + 1 <= YEAR(GetDate())
)
-- Years without a sale have no match in @Sales, so Sales is replaced by 0 (Client_ID stays NULL)
select
Y.theYear, S.Client_ID, ISNULL(S.Sales, 0) AS Sales
FROM
Years Y
LEFT JOIN
@Sales S ON S.SalesYear = Y.theYear
option (maxrecursion 0)
You can change "2000" to something more appropriate.
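With more than one client, the same idea can be extended by cross-joining the list of clients with the list of years before the outer join, so that missing years also get a Client_ID. The following is only a sketch: the second client and the CTE names Bounds and Clients are made up for illustration, and it assumes every client should cover the overall first-to-last year range found in the table.

```sql
-- Sample data from the question, plus a hypothetical second client.
DECLARE @Sales TABLE (Client_ID int, SalesYear int, Sales money);
INSERT INTO @Sales (Client_ID, SalesYear, Sales)
SELECT 1, 2010, 12 UNION ALL
SELECT 1, 2012, 20 UNION ALL
SELECT 1, 2013, 21 UNION ALL
SELECT 1, 2016, 14 UNION ALL
SELECT 2, 2011, 5  UNION ALL   -- illustration only, not in the question
SELECT 2, 2014, 7;

WITH Bounds AS (
    -- First and last year present anywhere in the data
    SELECT MIN(SalesYear) AS FirstYear, MAX(SalesYear) AS LastYear
    FROM @Sales
),
Years AS (
    -- Recursive CTE: one row per year from FirstYear to LastYear
    SELECT FirstYear AS theYear, LastYear FROM Bounds
    UNION ALL
    SELECT theYear + 1, LastYear FROM Years WHERE theYear < LastYear
),
Clients AS (
    SELECT DISTINCT Client_ID FROM @Sales
)
SELECT C.Client_ID,
       Y.theYear AS SalesYear,
       COALESCE(S.Sales, 0) AS Sales   -- 0 where no sales row exists
FROM Clients C
CROSS JOIN Years Y
LEFT JOIN @Sales S
       ON S.Client_ID = C.Client_ID
      AND S.SalesYear = Y.theYear
ORDER BY C.Client_ID, Y.theYear;
```

Joining on both Client_ID and SalesYear is what keeps one client's sales from filling another client's gaps.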
I have a single table of payments called PYMT, and am trying to wrap my head around using a PIVOT if possible to get a certain arrangement for an output and befuddled as how to do this. In the table are pymt_amount and pymt_date (other columns too, but not necessary). I wish to see the output to look like so:
PayMonth 2007 2008 2009 2010 2011
1 26044.12 82663.79 83583.17 35963.49 100865.94
2 60145.61 35245.06 19173.08 14417.98 21502.71
3 68138.88 88670.16 85319.66 40850.39 31595.43
4 228835.04 215258.84 157905.56 136551.46 166027.30
5 395877.88 348307.58 348506.09 363460.24 298488.22
6 618013.05 662869.88 522233.48 472174.95 385879.94
7 557751.27 363659.66 305363.68 304606.98 349173.75
8 355639.91 173107.60 266235.54 147731.54 251878.49
9 131440.63 173338.90 133869.36 140035.13 109595.83
10 168148.90 127356.25 114818.69 119082.52 139201.50
11 139543.35 138151.22 128667.58 137351.77 107807.27
12 142286.06 136670.64 116980.04 69609.22 85670.84
To get the first column of payment totals is easy - it's doing it for the other years that I can't figure out - I know how to do a basic PIVOT table.
The query for the first 2 columns is
SELECT DATEPART(MM, pymt_Date) AS PayMonth, SUM(pymt_Amount) AS [2007]
FROM PYMT
GROUP BY DATEPART(MM, pymt_Date) , DATEPART(YY, pymt_Date)
HAVING (DATEPART(YY, pymt_Date) = 2007)
ORDER BY PayMonth
How do I add the other years (each PayMonth is a sum of all payments for the month)? I don't wish to pivot by month, just the years by month as shown in the output above.
I could run a separate query in a cursor per month
Here's an example, but showing the grand total for the year (but it needs to be separated by month)
SELECT * FROM
(SELECT DATEPART(yy, pymt_Date) AS PayYear, pymt_Amount
FROM PYMT
) tbl1
PIVOT
(SUM(pymt_Amount)
FOR
PayYear in ([2007],[2008],[2009],[2010],[2011])
) tbl2
which yields
2007 2008 2009 2010 2011
2891864.70 2545299.58 2282655.93 1981835.67 2047687.22
As you can see - this isn't broken down by month rows
Any ideas?
You missed this DATEPART(month, pymt_Date) AS PayMonth in the tbl1 query.
The complete query should be
SELECT *
FROM
(
SELECT DATEPART(year, pymt_Date) AS PayYear,
DATEPART(month, pymt_Date) AS PayMonth,
pymt_Amount
FROM PYMT
) tbl1
PIVOT
(
SUM(pymt_Amount)
FOR PayYear in ([2007],[2008],[2009],[2010],[2011])
) tbl2
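As an alternative sketch, the same month-by-year layout can be produced without PIVOT by using conditional aggregation. This assumes the PYMT table and column names from the question and hard-codes the same five years.

```sql
-- One row per month; each CASE keeps only the payments of one year,
-- so every SUM becomes that year's column.
SELECT DATEPART(month, pymt_Date) AS PayMonth,
       SUM(CASE WHEN DATEPART(year, pymt_Date) = 2007 THEN pymt_Amount END) AS [2007],
       SUM(CASE WHEN DATEPART(year, pymt_Date) = 2008 THEN pymt_Amount END) AS [2008],
       SUM(CASE WHEN DATEPART(year, pymt_Date) = 2009 THEN pymt_Amount END) AS [2009],
       SUM(CASE WHEN DATEPART(year, pymt_Date) = 2010 THEN pymt_Amount END) AS [2010],
       SUM(CASE WHEN DATEPART(year, pymt_Date) = 2011 THEN pymt_Amount END) AS [2011]
FROM PYMT
GROUP BY DATEPART(month, pymt_Date)
ORDER BY PayMonth;
```

A month with no payments in a given year shows NULL in that column, which matches what PIVOT returns in the same situation.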
I am running SQL Server 2014 and I have the following T-SQL query:
USE MYDATABASE
SELECT *
FROM RESERVATIONLIST
WHERE [MTH] IN ('JANUARY 2015','FEBRUARY 2015')
RESERVATIONLIST mentioned in the code above is a view. The query gives me the following output (extract):
ID NAME DOA DOD Nights Spent MTH
--------------------------------------------------------------------
251 AH 2015-01-12 2015-01-15 3 JANUARY 2015
258 JV 2015-01-28 2015-02-03 4 JANUARY 2015
258 JV 2015-01-28 2015-02-03 2 FEBRUARY 2015
The above output consists of around 12,000 records.
I need to modify my query so that it eliminates all duplicate IDs and gives me the following results:
ID NAME DOA DOD Nights Spent MTH
--------------------------------------------------------------------
251 AH 2015-01-12 2015-01-15 3 JANUARY 2015
258 JV 2015-01-28 2015-02-03 4 JANUARY 2015
I tried something like this, but it's not working:
USE MYDATABASE
SELECT *
FROM RESERVATIONLIST
WHERE [MTH] IN ('JANUARY 2015', 'FEBRUARY 2015')
GROUP BY [ID]
HAVING COUNT ([MTH]) > 1
The following query will return one row per ID:
SELECT * FROM
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY (SELECT NULL)) rn FROM RESERVATIONLIST
WHERE [MTH] IN ('JANUARY 2015','FEBRUARY 2015')
) T
WHERE rn = 1
Note: this returns an arbitrary row from among the rows that share an ID. If you want a specific row, you have to define that in the ORDER BY. For example:
SELECT * FROM
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY DOA DESC) rn FROM RESERVATIONLIST
WHERE [MTH] IN ('JANUARY 2015','FEBRUARY 2015')
) T
WHERE rn = 1
This returns, for each ID, the row with the latest DOA (if several rows tie on DOA, one of them is picked arbitrarily).
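In the sample data both rows for ID 258 share the same DOA, so to keep the earliest month (as in the desired output) the ordering has to come from MTH itself. Here is a sketch under two assumptions: MTH always has the form 'MONTH YYYY', and the session language is English so the month name can be converted to a date. The table variable stands in for the RESERVATIONLIST view.

```sql
-- Stand-in for the RESERVATIONLIST view, using the rows shown in the question.
DECLARE @ReservationList TABLE
    (ID int, NAME varchar(10), DOA date, DOD date, [Nights Spent] int, MTH varchar(20));

INSERT INTO @ReservationList VALUES
    (251, 'AH', '2015-01-12', '2015-01-15', 3, 'JANUARY 2015'),
    (258, 'JV', '2015-01-28', '2015-02-03', 4, 'JANUARY 2015'),
    (258, 'JV', '2015-01-28', '2015-02-03', 2, 'FEBRUARY 2015');

SELECT ID, NAME, DOA, DOD, [Nights Spent], MTH
FROM
(
    SELECT *,
           -- '1 JANUARY 2015' converts to a date, so rows sort chronologically
           -- (assumes an English session language)
           ROW_NUMBER() OVER (PARTITION BY ID
                              ORDER BY CONVERT(datetime, '1 ' + MTH)) AS rn
    FROM @ReservationList
    WHERE MTH IN ('JANUARY 2015', 'FEBRUARY 2015')
) T
WHERE rn = 1;
```

Ordering by MTH directly would not work here, because as text 'FEBRUARY 2015' sorts before 'JANUARY 2015'.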
You are trying to do a GROUP BY, which IMHO is the right way to go. You should group by all the columns that are constant for a reservation and aggregate the others. Depending on the values of DOD and DOA, I can see two solutions:
SELECT ID,NAME,DOA,DOD,SUM([Nights Spent]) as Nights,
min(MTH) as firstRes, max(MTH) as lastRes
FROM RESERVATIONLIST
GROUP BY ID,NAME,DOA,DOD
OR
SELECT ID,NAME,min(DOA) as firstDOA,max(DOD) as lastDOD,SUM([Nights Spent]) as Nights,
min(MTH) as firstRes, max(MTH) as lastRes
FROM RESERVATIONLIST
GROUP BY ID,NAME
Note that MTH is text, so min(MTH) and max(MTH) compare alphabetically rather than chronologically: 'FEBRUARY 2015' sorts before 'JANUARY 2015'.
I know that there are many topics discussing nested queries, however I am getting errors on my nested query due to the functions I am using.
Sample Data:
Sample TestDate Column:
2015-05-13 13:45:14.000
2015-05-15 07:33:13.000
2015-05-18 06:07:11.000
2015-05-19 02:58:13.000
2015-05-22 14:08:42.000
2015-05-26 11:01:29.000
2015-05-26 11:01:50.000
2015-05-27 07:19:32.000
2015-05-15 08:04:28.000
2015-05-15 10:32:23.000
2015-05-22 14:11:26.000
2015-05-27 07:16:57.000
2015-05-29 08:50:36.000
2015-05-15 10:38:23.000
2015-05-19 03:08:53.000
2015-05-27 13:41:47.000
2015-05-29 08:47:56.000
2015-05-15 07:50:04.000
2015-05-18 06:20:28.000
2015-05-19 06:32:24.000
2015-05-26 11:00:58.000
2015-05-22 14:12:15.000
2015-05-26 10:57:17.000
I am looking to query the last 7 DATES with data (may not be the last 7 days).
My query to return the last 7 Dates with data works well.
-- Set the return record count to the last 7 days
SET ROWCOUNT 7
--Get the Distinct Dates
SELECT DISTINCT(CONVERT(VARCHAR, CONVERT(DATETIME,[TestDate]),23)) AS DT
FROM [SERVER].[dbo].[TABLE]
--Get the last 60 days
WHERE [TestDate] BETWEEN (Getdate() - 60) AND Getdate()
--Start at the current date and go backwards.
ORDER BY DT DESC
-- reset the return record count to prevent issues with further queries.
SET ROWCOUNT 0
This provides the following result:
DT
2015-05-29
2015-05-27
2015-05-26
2015-05-22
2015-05-19
2015-05-18
2015-05-15
Now, I want to use those 7 entries to pull the data for those dates.
Usually I would do a
SELECT * WHERE [TestDate] >= '2015-05-29' AND [TestDate] <= '2015-05-30'
for example (cumbersome I know).
A) I get errors with the SET function in a nested query.
B) How to make the proper WHERE statement. One option is to use the first and last result (2015-05-29 and 2015-05-15) from the query
(WHERE [TestDate] >= 'FIRST_RESULT' AND [TestDate] <= 'LAST_RESULT')
EDIT:
So from the table I added above, I would want data from 2015-05-15 to 2015-05-29 (i.e. the results from the query), but not the data from 2015-05-13, since the 13th is the 8th date.
This would give you the last 7 dates with data without having to do what you've done in your sample code:
SELECT DISTINCT TOP 7
CAST([TestDate] AS DATE) DT
FROM YourTable
ORDER BY CAST([TestDate] AS DATE) DESC
I've cast them to DATE to get the date portion.
You can use this to JOIN on to, which will restrict the output to rows with a matching date:
SELECT *
FROM YourTable t1
INNER JOIN ( SELECT DISTINCT TOP 7
CAST(TestDate AS DATE) DT
FROM YourTable
ORDER BY CAST(TestDate AS DATE) DESC
) dts ON dts.DT = CAST(t1.TestDate AS DATE)
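Another way to express the same filter, shown here only as a sketch, is to rank the distinct dates with DENSE_RANK() and keep the first seven ranks, which avoids the join. The table variable and its rows are a cut-down, hypothetical stand-in for the real table; DateRank is an illustrative name.

```sql
-- Small stand-in for the real table: 9 rows spread over 8 distinct dates.
DECLARE @Tests TABLE (TestDate datetime);

INSERT INTO @Tests (TestDate) VALUES
    ('2015-05-13T13:45:14'), ('2015-05-15T07:33:13'), ('2015-05-18T06:07:11'),
    ('2015-05-19T02:58:13'), ('2015-05-22T14:08:42'), ('2015-05-26T11:01:29'),
    ('2015-05-26T11:01:50'), ('2015-05-27T07:19:32'), ('2015-05-29T08:50:36');

SELECT TestDate
FROM
(
    SELECT t.TestDate,
           -- Rows on the same calendar date share a rank; the latest date is rank 1
           DENSE_RANK() OVER (ORDER BY CAST(t.TestDate AS DATE) DESC) AS DateRank
    FROM @Tests t
) ranked
WHERE DateRank <= 7          -- the 7 most recent dates that have data
ORDER BY TestDate DESC;
```

With this sample, the row from 2015-05-13 falls on the eighth date and is left out, while both rows from 2015-05-26 are kept.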
I have a query that pulls out month/year totals for customers, and add the ntile ranking. If I were to be able to pull out the max subtotal for ntile 1, 2, 3, 4, and 5, I would ALMOST get what I'm after, but I do not know how to proceed.
For example, the result I want would look something like:
Month Year CustomerCode SubTotal ntile
1 2012 CCC 131.45 1
1 2012 CCC 342.95 2
1 2012 ELITE 643.92 3
1 2012 CCC 1454.05 4
1 2012 CCC 12971.78 5
2 2012 CCC 135.99 1
2 2012 CCI 370.47 2
2 2012 NOC 766.84 3
2 2012 ELITE 1428.26 4
2 2012 VBC 5073.20 5
3 2012 CCC 119.02 1
3 2012 CCC 323.78 2
3 2012 HUCC 759.66 3
3 2012 ELITE 1402.95 4
3 2012 CCC 7964.20 5
EXCEPT - I would expect ranking to be different customers like for month 2, but my base query isn't giving me that result - and I obviously don't know how to get it in T-SQL on SQL SERVER 2005 - in fact I'm not sure what I'm getting.
My next option is to pull a DataTable in C# and do some gymnastics to get there, but there has to be an easier way :)
My base query is
SELECT
i.DateOrdered
,LTRIM(STR(DATEPART(MONTH,i.DateOrdered))) AS [Month]
,LTRIM(STR(YEAR(i.Dateordered))) AS [Year]
,c.CustomerCode
,SUM(i.Jobprice) AS Subtotal
,NTILE(5) OVER(ORDER BY SUM(i.JobPrice)) AS [ntile]
FROM Invoices i
JOIN
Customers c
ON i.CustomerID = c.ID
WHERE i.DateOrdered >= '1/1/2012'
AND i.DateOrdered <= '9/30/2012'
GROUP BY YEAR(i.DateOrdered), MONTH(i.DateOrdered), i.DateOrdered, c.CustomerCode
ORDER BY LTRIM(STR(DATEPART(MONTH,i.DateOrdered))),
TRIM(STR(YEAR(i.Dateordered))),
SUM(i.JobPrice), c.CustomerCode ASC
I'd really appreciate help getting this right.
Thanks in advance
Cliff
If I read you correctly, what you are after is
For each month in the range,
Show 5 customers who have the greatest SUMs in that month
And against each customer, show the corresponding SUM.
In that case, this SQL Fiddle creates a sample table and runs the query that gives you the output described above. If you wanted to see what's in the created tables, just do simple SELECTs on the right panel.
The query is:
; WITH G as -- grouped by month and customer
(
SELECT DATEADD(D,1-DAY(i.DateOrdered),i.DateOrdered) [Month],
c.CustomerCode,
SUM(i.Jobprice) Subtotal
FROM Invoices i
JOIN Customers c ON i.CustomerID = c.ID
WHERE i.DateOrdered >= '1/1/2012' AND i.DateOrdered <= '9/30/2012'
GROUP BY DATEADD(D,1-DAY(i.DateOrdered),i.DateOrdered), c.CustomerCode
)
SELECT MONTH([Month]) [Month],
YEAR([Month]) [Year],
CustomerCode,
SubTotal,
Rnk [Rank]
FROM
(
SELECT *, RANK() OVER (partition by [Month] order by Subtotal desc) Rnk
FROM G
) X
WHERE Rnk <= 5
ORDER BY Month, Rnk
To explain, the first part (WITH block) is just a fancy way of writing a subquery, that GROUPs the data by month and Customer. The expression DATEADD(D,1-DAY(i.DateOrdered),i.DateOrdered) turns every date into the FIRST day of that month, so that the data can be easily grouped by month. The next subquery written in traditional form adds a RANK column within each month by the subtotal, which is finally SELECTed to give the top 5*.
Note that RANK allows for equal rankings, which may end up showing 6 customers for a month if 3 of them are ranked equally at position 4. If that is not what you want, change RANK to ROW_NUMBER, which breaks ties between equal Subtotals arbitrarily.
The query needs to be modified to only get the month and year dateparts. The issue you are having with the same customer showing multiple times in the same month is due to the inclusion of i.DateOrdered in the select and group by clauses.
The following query should give you what you need. Also, I suspect the TRIM on the next-to-last line of your query is a typo: T-SQL on SQL Server 2005 has no TRIM() function, only LTRIM and RTRIM (TRIM() was added in SQL Server 2017).
SELECT
LTRIM(STR(DATEPART(MONTH,i.DateOrdered))) AS [Month]
,LTRIM(STR(YEAR(i.Dateordered))) AS [Year]
,c.CustomerCode
,SUM(i.Jobprice) AS Subtotal
,NTILE(5) OVER(ORDER BY SUM(i.JobPrice)) AS [ntile]
FROM Invoices i
JOIN
Customers c
ON i.CustomerID = c.ID
WHERE i.DateOrdered >= '1/1/2012'
AND i.DateOrdered <= '9/30/2012'
GROUP BY YEAR(i.DateOrdered), MONTH(i.DateOrdered), c.CustomerCode
ORDER BY LTRIM(STR(DATEPART(MONTH,i.DateOrdered))),
LTRIM(STR(YEAR(i.Dateordered))),
SUM(i.JobPrice), c.CustomerCode ASC
This gives these results
Month Year CustomerCode Subtotal ntile
1 2012 ELITE 643.92 2
1 2012 CCC 14900.23 5
2 2012 CCC 135.99 1
2 2012 CCI 370.47 1
2 2012 NOC 766.84 3
2 2012 ELITE 1428.26 4
2 2012 VBC 5073.20 4
3 2012 HUCC 759.66 2
3 2012 ELITE 1402.95 3
3 2012 CCC 8407.00 5
Try this:
-- Table variables are declared with @, not #
declare @tab table
(
[month] int,
[year] int,
CustomerCode varchar(20),
SubTotal float
)
insert into @tab
select 1,2012,'ccc',131.45 union all
select 1,2012,'ccc',343.45 union all
select 1,2012,'ELITE',643.92 union all
select 2,2012,'ccc',131.45 union all
select 2,2012,'ccc',343.45 union all
select 2,2012,'ELITE',643.92 union all
select 3,2012,'ccc',131.45 union all
select 3,2012,'ccc',343.45 union all
select 3,2012,'ELITE',643.92
-- Restart the tiles for every month and order by SubTotal inside each month
-- (ordering by [month] within a month partition would leave the order undefined)
;with cte as
(
select NTILE(3) OVER(partition by [month] ORDER BY SubTotal) AS [ntile],* from @tab
)
select * from cte
Even in your base query you need to add partition by, so that you will get correct output.
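Applied to the base query from the question, the partitioning might look like the sketch below. It assumes the Invoices and Customers tables from the question; the unambiguous date literals and the half-open upper bound (everything before 1 October) are my own choices, not part of the original query.

```sql
SELECT MONTH(i.DateOrdered) AS [Month],
       YEAR(i.DateOrdered)  AS [Year],
       c.CustomerCode,
       SUM(i.JobPrice)      AS Subtotal,
       -- PARTITION BY restarts the five tiles for every month of every year
       NTILE(5) OVER (PARTITION BY YEAR(i.DateOrdered), MONTH(i.DateOrdered)
                      ORDER BY SUM(i.JobPrice)) AS [ntile]
FROM Invoices i
JOIN Customers c
  ON i.CustomerID = c.ID
WHERE i.DateOrdered >= '20120101'
  AND i.DateOrdered <  '20121001'
GROUP BY YEAR(i.DateOrdered), MONTH(i.DateOrdered), c.CustomerCode
ORDER BY [Year], [Month], Subtotal, c.CustomerCode;
```

Because DateOrdered itself is no longer in the GROUP BY, each customer appears once per month, and the tile numbers are assigned within that month only.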
I can't see how to solve this problem without double ranking:
You need to get the largest sums per customer & month.
You then need, for every month, to retrieve the top five of the found sums.
Here's how I would approach this:
;
WITH MaxSubtotals AS (
-- Largest per-date subtotal for each customer and month
SELECT DISTINCT
CustomerID,
MonthDate = DATEADD(MONTH, DATEDIFF(MONTH, 0, DateOrdered), 0),
Subtotal = MAX(SUM(JobPrice)) OVER (
PARTITION BY CustomerID, DATEADD(MONTH, DATEDIFF(MONTH, 0, DateOrdered), 0)
)
FROM Invoices
GROUP BY
CustomerID,
DateOrdered
),
TotalsRanked AS (
-- Rank the subtotals within each month, largest first
SELECT
CustomerID,
MonthDate,
Subtotal,
Ranking = ROW_NUMBER() OVER (PARTITION BY MonthDate ORDER BY Subtotal DESC)
FROM MaxSubtotals
)
SELECT
Month = MONTH(i.MonthDate),
Year = YEAR(i.MonthDate),
c.CustomerCode,
i.Subtotal,
i.Ranking
FROM TotalsRanked i
INNER JOIN Customers c ON i.CustomerID = c.ID
WHERE i.Ranking <= 5
;
The first CTE, MaxSubtotals, determines the maximum subtotals per customer & month. Involving DISTINCT and a window aggregating function, it is essentially a "shortcut" for the following two-step query:
SELECT
CustomerID,
MonthDate,
Subtotal = MAX(Subtotal)
FROM (
SELECT
CustomerID,
MonthDate = DATEADD(MONTH, DATEDIFF(MONTH, 0, DateOrdered), 0),
Subtotal = SUM(JobPrice)
FROM Invoices
GROUP BY
CustomerID,
DateOrdered
) s
GROUP BY
CustomerID,
MonthDate
The other CTE, TotalsRanked, simply adds ranking numbers for the found subtotals, partitioning by month and ordering by subtotal in descending order. As a final step, you only need to limit the rows to those that have rankings not greater than 5 (or whatever you might choose another time).
Note that using ROW_NUMBER() to rank the rows in this case guarantees that you'll get no more than 5 rows per month with the Ranking <= 5 filter. If there were two or more rows with the same subtotal, they would get distinct rankings, and in the end you might end up with an output like this:
Month Year CustomerCode Subtotal Ranking
----- ---- ------------ -------- -------
1 2012 CCC 1500.00 1
1 2012 ELITE 1400.00 2
1 2012 NOC 900.00 3
1 2012 VBC 700.00 4
1 2012 HUCC 700.00 5
-- 1 2012 ABC 690.00 6 -- not returned
-- 1 2012 ... ... ...
Even though there might be other customers with Subtotals of 700.00 for the same month, they wouldn't be returned, because they would be assigned rankings after 5.
You could use RANK() instead of ROW_NUMBER() to account for that. But note that you might end up with more than 5 rows per month then, with an output like this:
Month Year CustomerCode Subtotal Ranking
----- ---- ------------ -------- -------
1 2012 CCC 1500.00 1
1 2012 ELITE 1400.00 2
1 2012 NOC 900.00 3
1 2012 VBC 700.00 4
1 2012 HUCC 700.00 4
1 2012 ABC 700.00 4
-- 1 2012 DEF 690.00 7 -- not returned
-- 1 2012 ... ... ...
Customers with subtotals less than 700.00 wouldn't make it to the output because they would have rankings starting with 7, which would correspond to the ranking of the first under-700.00 sum if ranked by ROW_NUMBER().
And there's another option, DENSE_RANK(). You might want to use it if you want up to 5 distinct sums per month in your output. With DENSE_RANK() your output might contain even more rows per month than it would have with RANK(), but the number of distinct subtotals would be exactly 5 (or fewer if the original dataset can't provide you with 5). That is, your output might then look like this:
Month Year CustomerCode Subtotal Ranking
----- ---- ------------ -------- -------
1 2012 CCC 1500.00 1
1 2012 ELITE 1400.00 2
1 2012 NOC 900.00 3
1 2012 VBC 700.00 4
1 2012 HUCC 700.00 4
1 2012 ABC 700.00 4
1 2012 DEF 650.00 5
1 2012 GHI 650.00 5
1 2012 JKL 650.00 5
-- 1 2012 MNO 600.00 6 -- not returned
-- 1 2012 ... ... ...
Like RANK(), the DENSE_RANK() function assigns same rankings to identical values, but, unlike RANK(), it doesn't produce gaps in the ranking sequence.
References:
OVER Clause (Transact-SQL)
Ranking Functions (Transact-SQL)