update stats from nested query based on isnull and aggregate values - sql-server

I have a parent system table called lt_program_data. It contains customer data, the percents tracked for each customer, and a year field, because the percents are tracked on a yearly basis.
The percents are populated from a localized table based on some criteria, and then the parent table lt_program_data is updated based on the year and customer values.
However, in some cases we only have past data. The user's request is that when we have customer data but no percent for the current season, we use the value from the latest (max) season we do have.
Our logic looks like this for now:
update lt_program_data
set percent = ( select percent
from #percent b
where b.year = a.fyear and b.customer = a.customer)
from lt_program_data a
This works great, but now we need to say something like:
if b.year is null, select the max year for that customer from the data we have.
select *
From #lt_program_data a
join #percent b on b.fyear= isnull(a.fyear,max(a.fyear)) and b.customer = a.customer
I tried to write a select and then an update, but I get the following message:
Msg 1015, Level 15, State 1, Line 3
An aggregate cannot appear in an ON clause unless it is in a subquery contained in a HAVING clause or select list, and the column being aggregated is an outer reference.
Please help sort this out.
Here is a sample of our output
lt_program_data
customer year .. percent .. Other columns
1 2016 ..
2 2016
1 2017
2 2017
3 2017
etc.
The percent table looks like this:
customer year percent
1 2016 40
2 2016 64
3 2016 11
The expected result in lt_program_data is:
customer year percent
1 2016 40
1 2017 40
2 2016 64
2 2017 64
3 2017 NULL
It matches the customer number and the percent for the given year that exists in the percent table (so the value for customer 1 becomes 40 and customer 2 becomes 64). Since no data exists for those customers for the 2017 season, it uses the max existing data for the respective customers, which is from the 2016 season. In the case of customer 3, since there is nothing, it is left NULL.
The percent table goes back to 2016, so what we want to say is: since the max data we have for our customers is from 2016, we will populate the 2017 value in lt_program_data for customer 1 with the 2016 value of 40.

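I hope this query will work for you. It left joins both tables on year and customer, and falls back to the latest available year for the customer when no match is found.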
update lt_program_data
set percent_poverty = case
when b.year is null -- year in #percent is null (no join found)
then (select top 1 poverty_percent -- then get first percent by ordering year descending
from #percent
where customer = a.customer
order by year desc)
else b.poverty_percent -- else get the percent
end
from lt_program_data a -- lets left join both tables on year and customer
left join #percent b on b.year = a.fyear and b.customer = a.customer
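The same fallback can also be written with OUTER APPLY, which avoids the CASE expression. This is a minimal sketch on made-up sample data, not the original tables: #program_data, #pct and the column name pct are hypothetical stand-ins, and customer 3 is deliberately given no rows in #pct.

```sql
-- Hypothetical sample tables standing in for lt_program_data and #percent.
CREATE TABLE #program_data (customer int, fyear int, pct int NULL);
CREATE TABLE #pct (customer int, [year] int, pct int);

INSERT INTO #program_data (customer, fyear)
VALUES (1, 2016), (2, 2016), (1, 2017), (2, 2017), (3, 2017);

INSERT INTO #pct (customer, [year], pct)
VALUES (1, 2016, 40), (2, 2016, 64);

-- For each row, pick one #pct row for the same customer:
-- an exact year match sorts first, otherwise the latest year wins.
UPDATE a
SET a.pct = p.pct
FROM #program_data AS a
OUTER APPLY (
    SELECT TOP (1) b.pct
    FROM #pct AS b
    WHERE b.customer = a.customer
    ORDER BY CASE WHEN b.[year] = a.fyear THEN 0 ELSE 1 END,
             b.[year] DESC
) AS p;

SELECT customer, fyear, pct
FROM #program_data
ORDER BY customer, fyear;
```

With this sample data, customers 1 and 2 get 40 and 64 for both years, and customer 3 stays NULL because OUTER APPLY finds no row for it.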

Related

Need to generate rows with missing data in a large dataset - SQL

We are comparing values between months over multiple years. As time moves on the number of years and months in the dataset increases. We are only interested in months where there were values for every year, i.e. a full set.
Consider the following example for 1 month (1) over 3 years (1,2,3) and two activities (101, 102)
Dataset:
Activity Month year Count
------- ---- ------ ------
101 1 1 2
101 1 2 3
101 1 3 1
102 1 1 1
102 1 2 1
In the example above only activity 101 will come into consideration as it satisfies the condition that there must be a count for the activity for month 1 IN year 1, 2 and 3.
Activity 102 doesn't qualify for further analysis as it has no record for year 3.
I would like to generate the missing rows so that I can then evaluate this. For the example above, that means generating the missing row (in this case 102, 1, 3, 0) to complete the dataset:
Activity Month year Count
------- ---- ------ ------
102 1 3 0
We find the problem difficult because the data keeps growing, the number of activities keeps expanding, and it is the combination of activity, year and month that needs to be evaluated.
An elegant solution will be appreciated.
As I mention in my comment, presumably you have both an Activity table and some kind of Calendar table with details of your activities and the years in your system. As such you can therefore do a CROSS JOIN between these 2 objects and then LEFT JOIN to your table to get the data set you want:
--Create sample objects/data
CREATE TABLE dbo.Activity (Activity int); --Obviously your table has more columns
INSERT INTO dbo.Activity (Activity)
VALUES (101),(102);
GO
CREATE TABLE dbo.Calendar (Year int,
Month int);--Likely your table has more columns
INSERT INTO dbo.Calendar (Year, Month)
VALUES(1,1),
(2,1),
(3,1);
GO
CREATE TABLE dbo.YourTable (Activity int,
Year int,
Month int,
[Count] int);
INSERT INTO dbo.YourTable (Activity,Month, Year, [Count])
VALUES(101,1,1,2),
(101,1,2,3),
(101,1,3,1),
(102,1,1,1),
(102,1,2,1);
GO
--Solution
SELECT A.Activity,
C.Month,
C.Year,
ISNULL(YT.[Count],0) AS [Count]
FROM dbo.Activity A
CROSS JOIN dbo.Calendar C
LEFT JOIN dbo.YourTable YT ON A.Activity = YT.Activity
AND C.[Year] = YT.[Year]
AND C.[Month] = YT.[Month]
WHERE C.Month = 1; --not sure if this is needed
If you don't have an Activity and Calendar table (I suggest, however, you should), then you can use subqueries with a DISTINCT, but note this will be far from performant with large data sets:
SELECT A.Activity,
C.Month,
C.Year,
ISNULL(YT.[Count],0) AS [Count]
FROM (SELECT DISTINCT Activity FROM dbo.YourTable) A
CROSS JOIN (SELECT DISTINCT Year, Month FROM dbo.YourTable) C
LEFT JOIN dbo.YourTable YT ON A.Activity = YT.Activity
AND C.[Year] = YT.[Year]
AND C.[Month] = YT.[Month]
WHERE C.Month = 1; --not sure if this is needed
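Since the goal is to keep only activities with a full set of years, the completeness check itself can also be sketched directly. This assumes the dbo.YourTable and dbo.Calendar sample objects created above, and that a row exists in dbo.YourTable only when there was a count for that activity, month and year.

```sql
-- Keep an (Activity, Month) pair only when it has a row for every
-- year that dbo.Calendar lists for that month.
SELECT YT.Activity,
       YT.[Month]
FROM dbo.YourTable AS YT
GROUP BY YT.Activity,
         YT.[Month]
HAVING COUNT(DISTINCT YT.[Year]) =
       (SELECT COUNT(DISTINCT C.[Year])
        FROM dbo.Calendar AS C
        WHERE C.[Month] = YT.[Month]);
```

On the sample data this returns only activity 101 for month 1, because activity 102 has rows for two of the three years.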

subquery returned more than 1 value table join

I am writing a course project (something like a program for the hotel manager) and I need a little help. I have tables Reservations and Rooms and I need to calculate the amount of payment after the client leaves the room ((End_date - Start_date) * price_per_day), but I'm having trouble getting the price_per_day from the table Rooms.
My query only works if there is one record in the Reservations table. If there are 2 or more, I get the error "subquery returned more than 1 value" and I don't know how to fix it (the problem is in this part of the query: SELECT price_per_day FROM Rooms AS ro JOIN Reservations AS re ON ro.room_id = re.room_id).
I'm using visual studio 2019 + SQL Server Express LocalDB.
I will be grateful for any help or hint!
UPDATE Reservations
SET Amount_payable = (
DATEDIFF(day, CONVERT(datetime, Start_date, 104), CONVERT(datetime, End_date, 104) * (SELECT price_per_day FROM Rooms AS ro JOIN Reservations AS re ON ro.room_id = re.room_id))
)
WHERE Status = 'Archived'
Table Reservations
reservation_id customer_id room_id start_date end_date status Amount_payable
1 3 3 12.04.2020 05.06.2020 Archived 0
2 2 4 11.04.2020 30.05.2020 Active 0
Table Rooms
reservation_id room_id number_of_persons room_type price_per_day
0 1 3 Double 300
0 2 4 Triple 600
0 3 3 Studio 400
2 4 2 Single 444
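You need a slightly different approach. The subquery in your statement is not correlated with the row being updated, so it returns one price_per_day for every reservation that has a matching room, which fails as soon as there is more than one.
Join Rooms directly in the UPDATE instead: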
UPDATE re
SET
Amount_payable = (DATEDIFF(day, CONVERT(DATETIME, Start_date, 104), CONVERT(DATETIME, End_date, 104)) * price_per_day)
FROM Reservations re
JOIN Rooms AS ro ON ro.room_id = re.room_id
WHERE STATUS = 'Archived';
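For comparison, the original subquery form can be kept if the subquery is correlated with the row being updated. This is only a sketch: it assumes room_id is unique in Rooms, otherwise the same error would come back.

```sql
-- The subquery now looks up the price for the room of the reservation
-- row being updated, so it returns a single value per row.
UPDATE Reservations
SET Amount_payable =
    DATEDIFF(day,
             CONVERT(datetime, Start_date, 104),
             CONVERT(datetime, End_date, 104))
    * (SELECT ro.price_per_day
       FROM Rooms AS ro
       WHERE ro.room_id = Reservations.room_id)
WHERE Status = 'Archived';
```

Note that the multiplication sits outside the DATEDIFF call here, whereas in the question it was inside the third argument.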

I want to get data if records are not present in specific month

I wrote a SQL query to count the records logged in each month:
select month(loggingdate),Count(id) from communicationlogs
where clientid=20154 and month(loggingdate) in (1,2,3,4,5,6,7,8,9)
group by month(loggingdate)
7 65
8 5
Here, records are present only in the 7th and 8th months. I want to get a 0 value for the other month numbers, like this:
1 0
2 0
3 0
4 0
...
This is a standard problem where a calendar table comes in handy. A calendar table, as the name implies, is a table which just stores a sequence of dates. In your particular case, we only need the digits corresponding to the 12 months. Begin the query with the calendar table and then left join to your aggregation query as a subquery.
Note the use of COALESCE below. If a given month appears nowhere in your original query, then its count would show up as NULL in the join, in which case we report zero for that month.
WITH calendar_month AS (
SELECT 1 AS month
UNION ALL
SELECT month +1
FROM
calendar_month
WHERE month +1 <= 12
)
SELECT
t1.month,
COALESCE(t2.cnt, 0) AS cnt
FROM calendar_month t1
LEFT JOIN
(
SELECT
MONTH(loggingdate) as month,
COUNT(id) AS cnt
FROM communicationlogs
WHERE
clientid = 20154 AND
MONTH(loggingdate) IN (1,2,3,4,5,6,7,8,9)
GROUP BY MONTH(loggingdate)
) t2
ON t1.month = t2.month
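When only the twelve month numbers are needed, the recursive CTE can be swapped for a VALUES table constructor. The following is a sketch of that variant, assuming id is never NULL in communicationlogs; the alias names m and cl are arbitrary.

```sql
SELECT m.[month],
       COUNT(cl.id) AS cnt   -- COUNT skips NULLs, so unmatched months give 0
FROM (VALUES (1),(2),(3),(4),(5),(6),
             (7),(8),(9),(10),(11),(12)) AS m([month])
LEFT JOIN communicationlogs AS cl
       ON MONTH(cl.loggingdate) = m.[month]
      AND cl.clientid = 20154   -- filter in ON, not WHERE, to keep empty months
GROUP BY m.[month]
ORDER BY m.[month];
```

Putting the clientid filter in the ON clause matters: moving it to WHERE would discard the months that have no matching rows.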

SQL Server: How to get a rolling sum over 3 days for different customers within same table

This is the input table:
Customer_ID Date Amount
1 4/11/2014 20
1 4/13/2014 10
1 4/14/2014 30
1 4/18/2014 25
2 5/15/2014 15
2 6/21/2014 25
2 6/22/2014 35
2 6/23/2014 10
There is information pertaining to multiple customers and I want to get a rolling sum across a 3 day window for each customer.
The solution should be as below:
Customer_ID Date Amount Rolling_3_Day_Sum
1 4/11/2014 20 20
1 4/13/2014 10 30
1 4/14/2014 30 40
1 4/18/2014 25 25
2 5/15/2014 15 15
2 6/21/2014 25 25
2 6/22/2014 35 60
2 6/23/2014 10 70
The biggest issue is that I don't have transactions for every day, so a window based on row numbers doesn't work.
The closest example I found on SO was:
SQL Query for 7 Day Rolling Average in SQL Server
but even in that case there were transactions every day, which accommodated the ROW_NUMBER()-based solutions.
The rownumber query is as follows:
select customer_id, Date, Amount,
Rolling_3_day_sum = CASE WHEN ROW_NUMBER() OVER (partition by customer_id ORDER BY Date) > 2
THEN SUM(Amount) OVER (partition by customer_id ORDER BY Date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
END
from #tmp_taml9
order by customer_id
I was wondering if there is a way to replace "BETWEEN 2 PRECEDING AND CURRENT ROW" with "BETWEEN [DATE - 2] and [DATE]".
One option would be to use a calendar table (or something similar) to get the complete range of dates and left join your table with that and use the row_number based solution.
Another option that might work (not sure about performance) would be to use an apply query like this:
select customer_id, Date, Amount, coalesce(Rolling_3_day_sum, Amount) Rolling_3_day_sum
from #tmp_taml9 t1
cross apply (
select sum(amount) Rolling_3_day_sum
from #tmp_taml9
where Customer_ID = t1.Customer_ID
and datediff(day, date, t1.date) <= 2 -- 3-day window: the current day plus the two days before it
and t1.Date >= date
) o
order by customer_id;
I suspect performance might not be great though.
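The window "[DATE - 2] to [DATE]" from the question can also be expressed with DATEADD on the outer row, which leaves the inner Date column unwrapped. This is a sketch that rebuilds the question's sample data in #tmp_taml9 and assumes Date is a date column with no time part.

```sql
CREATE TABLE #tmp_taml9 (Customer_ID int, [Date] date, Amount int);

INSERT INTO #tmp_taml9 (Customer_ID, [Date], Amount)
VALUES (1, '20140411', 20), (1, '20140413', 10),
       (1, '20140414', 30), (1, '20140418', 25),
       (2, '20140515', 15), (2, '20140621', 25),
       (2, '20140622', 35), (2, '20140623', 10);

SELECT t1.Customer_ID,
       t1.[Date],
       t1.Amount,
       o.Rolling_3_Day_Sum
FROM #tmp_taml9 AS t1
CROSS APPLY (
    -- Sum this customer's amounts from two days before the row's date
    -- up to and including the row's date.
    SELECT SUM(t2.Amount) AS Rolling_3_Day_Sum
    FROM #tmp_taml9 AS t2
    WHERE t2.Customer_ID = t1.Customer_ID
      AND t2.[Date] BETWEEN DATEADD(day, -2, t1.[Date]) AND t1.[Date]
) AS o
ORDER BY t1.Customer_ID, t1.[Date];
```

The inner query always includes the current row, so the sum is never NULL and no COALESCE is needed. On the sample data this reproduces the Rolling_3_Day_Sum column shown in the question.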

T sql group by month

I'm trying to group by month from a datetime column.
I run the query below:
select cf.flow_name as 'Process', COUNT(c.case_ID) as 'Case', CONVERT(VARCHAR(10),c.xdate,104) as 'Date'
from cases c inner join case_flow cf on c.case_flow_ID=cf.CF_ID
where project_ID=1 and c.subject_ID=1
group by cf.flow_name,c.xdate
Columns data types as below
flow_name varchar(100)
case_ID int
xdate datetime
Result displays like below if i run above query
Process - Case - Date
Test 1 30.01.2015
Test 1 30.01.2015
analysis 1 19.03.2015
analysis 1 30.03.2015
analysis 1 13.04.2015
analysis 1 16.04.2015
Question:
I need to group as below (by month of xdate)
Correct Result should be as below
Process - Case - Date
Test 2 30.01.2015 (Because Test has 2 data from 01 month)
analysis 2 19.03.2015 (Because analysis has 2 data from 03 month)
analysis 2 13.04.2015 (Because analysis has 2 data from 04 month)
As shown above, all results should be grouped by month. How can I do this with my query?
I hope my English is understandable. Thanks.
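Group by the first day of each month, which Dateadd(month, Datediff(month, 0, xdate), 0) computes, instead of by xdate itself:
SELECT cf.flow_name,
Count(*),
Min(c.xdate)
FROM cases c
INNER JOIN case_flow cf
ON c.case_flow_id = cf.cf_id
WHERE project_id = 1
AND c.subject_id = 1
GROUP BY cf.flow_name,
Dateadd(month, Datediff(month, 0, c.xdate), 0)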
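To see why that GROUP BY expression buckets rows by month, here is a small sketch on three of the dates from the question. The derived table d is only for illustration.

```sql
-- DATEDIFF counts whole months since date 0 (1900-01-01);
-- DATEADD adds them back, giving the first day of the row's month.
SELECT d.xdate,
       DATEADD(month, DATEDIFF(month, 0, d.xdate), 0) AS month_start
FROM (VALUES (CAST('20150319' AS datetime)),
             (CAST('20150330' AS datetime)),
             (CAST('20150413' AS datetime))) AS d(xdate);
```

The two March dates both map to 2015-03-01 and the April date maps to 2015-04-01, so rows from the same month and year fall into the same group.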
