SQL query to insert stats of students to a table - sql-server

I would like some help writing a SQL Server query that inserts the monthly stats of students into a table.
My monthly stats table is something like this:
| StudentID | Year | Month | Grade1 | Grade2 | Absences |
I also have a StudentDetails table with columns such as StudentID and name, plus several other tables with grades, attendance, etc.
My goal is to select all StudentIDs from StudentDetails and insert them into the Monthly_Stats table, while calculating Grade1, Grade2 and Absences from the other tables.
What is the best way to write such a query?
Should I first insert the StudentIDs, Year and Month columns with an INSERT ... SELECT query, and then somehow iterate through every inserted StudentID, running UPDATE queries (to calculate the remaining columns) for each StudentID for the specified month and year?
I just need an example or some logic on how to achieve this.
For the first part, inserting the StudentIDs, I have this:
DECLARE @maindate date = '20230101';
INSERT INTO Monthly_Stats (StudentID, [Year], [Month])
SELECT StudentID, AllocatedYear, AllocatedMonth
FROM Students_Allocation
WHERE AllocatedMonth = DATEPART(MONTH, @maindate)
AND AllocatedYear = DATEPART(YEAR, @maindate)
AND Active = 1;
After the insert, I would like to update every other column (Grade1, Grade2, Absences, ...) from multiple other tables for each StudentID for that month and year.
Any ideas?

This is how I usually perform a batch update:
UPDATE MS
-- ISNULL: a student with no row in TABLE_2/TABLE_3 would otherwise get NULL for the whole sum
SET MS.Grade1 = T1.Somedata + ISNULL(T2.Somedata, 0) + ISNULL(T3.Somedata, 0)
FROM
Monthly_Stats AS MS
INNER JOIN TABLE_1 AS T1
ON MS.StudentID = T1.StudentID AND MS.[Year] = T1.[Year] AND MS.[Month] = T1.[Month]
LEFT JOIN TABLE_2 AS T2 ON T1.StudentID = T2.StudentID AND T1.[Year] = T2.[Year] AND T1.[Month] = T2.[Month]
LEFT JOIN TABLE_3 AS T3 ON T1.StudentID = T3.StudentID AND T1.[Year] = T3.[Year] AND T1.[Month] = T3.[Month];
Be careful with the two LEFT JOINs. Depending on how your database is normalized, you may need more conditions in the ON clauses: if TABLE_2 or TABLE_3 can hold several rows for the same student and month, the join produces several candidate rows per student and SQL Server applies only one of them (unpredictably), so aggregate those tables first.
Hope it helps
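To illustrate that no per-student loop is needed, here is a minimal sketch of the two set-based steps (one INSERT ... SELECT, then one UPDATE covering every inserted row). It runs on SQLite through Python's sqlite3 module as a stand-in for SQL Server, and the Grades and Attendance tables with their columns are hypothetical. Because older SQLite versions lack UPDATE ... FROM, the sketch uses correlated subqueries instead of the join syntax above; aggregating inside them also sidesteps the duplicate-row issue just mentioned.

```python
import sqlite3

# Hypothetical schema: Grades and Attendance (and their columns) are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students_Allocation (StudentID INT, AllocatedYear INT, AllocatedMonth INT, Active INT);
CREATE TABLE Monthly_Stats (StudentID INT, Year INT, Month INT, Grade1 REAL, Absences INT);
CREATE TABLE Grades (StudentID INT, Year INT, Month INT, Score REAL);
CREATE TABLE Attendance (StudentID INT, Year INT, Month INT, Absent INT);

INSERT INTO Students_Allocation VALUES (1, 2023, 1, 1), (2, 2023, 1, 1), (3, 2023, 1, 0);
INSERT INTO Grades VALUES (1, 2023, 1, 8), (1, 2023, 1, 6), (2, 2023, 1, 9);
INSERT INTO Attendance VALUES (1, 2023, 1, 1), (1, 2023, 1, 1);
""")

year, month = 2023, 1

# Step 1: a single INSERT ... SELECT for every active student of the month.
conn.execute("""
INSERT INTO Monthly_Stats (StudentID, Year, Month)
SELECT StudentID, AllocatedYear, AllocatedMonth
FROM Students_Allocation
WHERE AllocatedYear = ? AND AllocatedMonth = ? AND Active = 1
""", (year, month))

# Step 2: a single UPDATE for all inserted rows. Each subquery aggregates,
# so duplicate rows cannot multiply; COALESCE turns "no rows" into 0.
conn.execute("""
UPDATE Monthly_Stats
SET Grade1 = (SELECT AVG(g.Score) FROM Grades AS g
              WHERE g.StudentID = Monthly_Stats.StudentID
                AND g.Year = Monthly_Stats.Year AND g.Month = Monthly_Stats.Month),
    Absences = COALESCE((SELECT SUM(a.Absent) FROM Attendance AS a
              WHERE a.StudentID = Monthly_Stats.StudentID
                AND a.Year = Monthly_Stats.Year AND a.Month = Monthly_Stats.Month), 0)
WHERE Year = ? AND Month = ?
""", (year, month))

rows = conn.execute(
    "SELECT StudentID, Grade1, Absences FROM Monthly_Stats ORDER BY StudentID"
).fetchall()
print(rows)
```

Student 3 is inactive and never inserted, and student 2 has no attendance rows, so Absences falls back to 0.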

Related

Calculation of Average of distinct values

I have data in the format below:
Now I want to display the latest Post Date of each Account No, its corresponding Amount, and the average Amount over all previous dates including the latest date. The catch is that I need to pick the latest Post Date row based on ShareID.
I need the Amount of the row with the minimum ShareID, and when taking the average I have to omit the amounts of duplicate rows that share the max(PostDate) date.
I need the output in the format below:
You can try:
WITH
T1 AS
(
SELECT AccountNo, FundNo, MAX(PostDate) AS LastPostDate
FROM MyTable
GROUP BY AccountNo, FundNo
),
T2 AS
(
SELECT DISTINCT AccountNo, FundNo, Amount
FROM MyTable
)
SELECT T1.AccountNo, T1.FundNo, T1.LastPostDate, AVG(T2.Amount) AS AvgAmount
FROM T1
JOIN T2 ON T1.AccountNo = T2.AccountNo AND T1.FundNo = T2.FundNo
GROUP BY T1.AccountNo, T1.FundNo, T1.LastPostDate;
Please post the DDL of your table and some sample data as INSERT statements so we can give you a correct solution.
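Since the question's sample data did not survive, here is a sketch on invented rows that runs the CTE approach (with its syntax errors fixed) and adds the "Amount of the minimum ShareID on the latest date" column the question asks for. It uses SQLite via Python's sqlite3 as a stand-in for SQL Server, so the correlated subquery uses LIMIT 1 where SQL Server would use TOP 1. Note that SELECT DISTINCT on Amount also collapses equal amounts posted on different dates, so it only approximates "omit duplicates on the latest date".

```python
import sqlite3

# Invented sample rows; the column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MyTable (AccountNo INT, FundNo INT, ShareID INT, PostDate TEXT, Amount REAL);
INSERT INTO MyTable VALUES
  (100, 1, 1, '2020-01-01', 10),
  (100, 1, 2, '2020-02-01', 20),
  (100, 1, 3, '2020-02-01', 20),  -- duplicate amount on the latest date
  (200, 1, 4, '2020-03-01', 50);
""")

query = """
WITH T1 AS (
  SELECT AccountNo, FundNo, MAX(PostDate) AS LastPostDate
  FROM MyTable
  GROUP BY AccountNo, FundNo
),
T2 AS (
  SELECT DISTINCT AccountNo, FundNo, Amount
  FROM MyTable
)
SELECT T1.AccountNo, T1.FundNo, T1.LastPostDate,
       -- Amount of the lowest ShareID on the latest date (TOP 1 in SQL Server)
       (SELECT m.Amount FROM MyTable AS m
        WHERE m.AccountNo = T1.AccountNo AND m.FundNo = T1.FundNo
          AND m.PostDate = T1.LastPostDate
        ORDER BY m.ShareID LIMIT 1) AS LatestAmount,
       AVG(T2.Amount) AS AvgAmount
FROM T1
JOIN T2 ON T1.AccountNo = T2.AccountNo AND T1.FundNo = T2.FundNo
GROUP BY T1.AccountNo, T1.FundNo, T1.LastPostDate
ORDER BY T1.AccountNo
"""
result = conn.execute(query).fetchall()
print(result)
```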

SQL Server : update every 5 records with Past Months

I want to update 15 records: the first 5 should get a date in June 2019, the next 5 July 2019 and the last 5 August 2019, based on employee ID. Can anyone tell me how to write this kind of query in SQL Server Management Studio v17.7? I tried the query below but couldn't get it to handle the next 5 rows:
Update TOP(5) emp.employee(nolock) set statusDate=GETDATE()-31 where EMPLOYEEID='XCXXXXXX';
To update only a certain number of rows in a table, include a FROM clause and join a subquery that limits the number of rows. I would suggest OFFSET and FETCH instead of TOP, so that you can skip a given number of rows.
You will also want to use the DATEADD function instead of subtracting a number directly from GETDATE(). In SQL Server, subtracting an integer from a datetime subtracts whole days, so GETDATE()-31 goes back 31 days rather than one calendar month; if you intend to go back a month, use DATEADD(MONTH, -1, GETDATE()). Alternatively, it might be easier to specify an exact date such as '2019-06-01'.
For example:
TableA
- TableAID INT PK
- EmployeeID INT FK
- statusDate DATETIME
UPDATE TableA
SET statusDate = '2019-06-01'
FROM TableA
INNER JOIN
(
SELECT TableAID
FROM TableA
WHERE EmployeeID = ''
ORDER BY TableAID
OFFSET 0 ROWS
FETCH NEXT 5 ROWS ONLY
) T1 ON TableA.TableAID = T1.TableAID
Right now it looks like your original query updates the employee table rather than a purchases table. Replace TableA with whichever table you are updating and TableAID with that table's primary key column.
You can use ROW_NUMBER to rank the rows per employee, then update only the first 15 rows of each employee.
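The OFFSET/FETCH idea can be run in three batches of 5. This sketch uses SQLite via Python's sqlite3 as a stand-in, so LIMIT ... OFFSET plays the role of OFFSET ... FETCH NEXT; TableA follows the example schema above, and update_batch and the employee ID 7 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (TableAID INTEGER PRIMARY KEY, EmployeeID INT, statusDate TEXT)")
conn.executemany("INSERT INTO TableA (EmployeeID, statusDate) VALUES (?, NULL)", [(7,)] * 15)

def update_batch(offset, new_date, employee_id=7, batch=5):
    # Same idea as OFFSET n ROWS FETCH NEXT 5 ROWS ONLY: pick the next 5 keys
    # in a fixed order (TableAID), then update only those rows.
    conn.execute("""
        UPDATE TableA SET statusDate = ?
        WHERE TableAID IN (
            SELECT TableAID FROM TableA
            WHERE EmployeeID = ?
            ORDER BY TableAID
            LIMIT ? OFFSET ?)
    """, (new_date, employee_id, batch, offset))

for i, d in enumerate(["2019-06-01", "2019-07-01", "2019-08-01"]):
    update_batch(i * 5, d)

dates = [r[0] for r in conn.execute("SELECT statusDate FROM TableA ORDER BY TableAID")]
print(dates)
```

The ORDER BY inside the subquery is what makes "the next 5" well defined; without it, each batch could pick rows in any order.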
;WITH EmployeeRowsWithRowNumbers AS
(
SELECT
T.*,
RowNumberByEmployee = ROW_NUMBER() OVER (
PARTITION BY
T.EmployeeID -- Generate a ranking by each different EmployeeID
ORDER BY
(SELECT NULL)) -- ... in no particular order (you should supply one if you have an ordering column)
FROM
emp.employee AS T
)
UPDATE E SET
statusDate = CASE
WHEN E.RowNumberByEmployee <= 5 THEN '2019-06-01'
WHEN E.RowNumberByEmployee BETWEEN 6 AND 10 THEN '2019-07-01'
ELSE '2019-08-01' END
FROM
EmployeeRowsWithRowNumbers AS E
WHERE
E.RowNumberByEmployee <= 15
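A sketch of the same ROW_NUMBER bucketing, runnable on SQLite via Python's sqlite3 as a stand-in for SQL Server. SQLite cannot update through a CTE, so the CASE is applied through a correlated subquery instead. RowID is a hypothetical ordering key used in place of ORDER BY (SELECT NULL), so the buckets are deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (RowID INTEGER PRIMARY KEY, EmployeeID TEXT, statusDate TEXT)")
# Employee A has 16 rows (one too many), employee B only 3.
conn.executemany("INSERT INTO employee (EmployeeID) VALUES (?)", [("A",)] * 16 + [("B",)] * 3)

conn.execute("""
WITH Ranked AS (
    SELECT RowID,
           ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY RowID) AS rn
    FROM employee
)
UPDATE employee
SET statusDate = (
    SELECT CASE
             WHEN rn <= 5 THEN '2019-06-01'
             WHEN rn <= 10 THEN '2019-07-01'
             ELSE '2019-08-01'
           END
    FROM Ranked WHERE Ranked.RowID = employee.RowID)
WHERE RowID IN (SELECT RowID FROM Ranked WHERE rn <= 15)
""")

rows = conn.execute("SELECT EmployeeID, statusDate FROM employee ORDER BY RowID").fetchall()
print(rows)
```

The 16th row of employee A keeps its NULL date because the WHERE clause limits the update to the first 15 ranks.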

Optimizing Large Table Join in PySpark

I have a large fact table, roughly 500M rows per day. The table is partitioned by region_date.
I have to scan through 6 months of data every day, left outer join it with a much smaller table (1M rows) on an id and a date column, and calculate two aggregates: sum(fact) where the id exists in the right table, and sum(fact) overall.
My SparkSQL looks like this:
SELECT
a.region_date,
SUM(case
when t4.id is null then 0
else a.duration_secs
end) AS matching_duration_secs,
SUM(a.duration_secs) AS total_duration_secs
FROM fact_table a LEFT OUTER JOIN id_lookup t4
ON a.id = t4.id
and a.region_date = t4.region_date
WHERE a.region_date >= CAST(date_format(DATE_ADD(CURRENT_DATE,-180), 'yyyyMMdd') AS BIGINT)
AND a.is_test = 0
AND a.desc = 'VIDEO'
GROUP BY a.region_date
What is the best way to optimize and distribute/partition the data? The query now runs for more than 3 hours. I tried setting spark.sql.shuffle.partitions = 700.
If I roll up the daily data at the "id" level, it's about 5M rows per day. Should I roll up the data first and then do the join?
Thanks,
Ram.
Because your query has several filter conditions, you could split it into two steps and reduce the amount of data first:
CACHE TABLE table1 AS
SELECT * FROM fact_table a
WHERE a.region_date >= CAST(date_format(DATE_ADD(CURRENT_DATE, -180), 'yyyyMMdd') AS BIGINT)
AND a.is_test = 0
AND a.desc = 'VIDEO';
Then join table1, which is much smaller than the original table, with the id_lookup table.
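To check the roll-up idea from the question (aggregate to one row per id and day before joining), here is a small equivalence check on invented data. It uses SQLite through Python's sqlite3 as a stand-in for Spark SQL, drops the 180-day filter so the result is deterministic, and assumes id_lookup has at most one row per (id, region_date). It only shows that both shapes return the same sums; any speed-up on Spark would come from shuffling the rolled-up rows (about 5M per day, per the question) instead of the raw ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_table (region_date INT, id INT, is_test INT, "desc" TEXT, duration_secs INT);
CREATE TABLE id_lookup (region_date INT, id INT);  -- assumed unique per (id, region_date)
INSERT INTO fact_table VALUES
  (20240101, 1, 0, 'VIDEO', 10), (20240101, 1, 0, 'VIDEO', 5),
  (20240101, 2, 0, 'VIDEO', 7),  (20240101, 3, 1, 'VIDEO', 99),
  (20240102, 1, 0, 'AUDIO', 50), (20240102, 2, 0, 'VIDEO', 3);
INSERT INTO id_lookup VALUES (20240101, 1), (20240102, 2);
""")

# Original shape: join every filtered fact row, then aggregate.
direct = conn.execute("""
SELECT a.region_date,
       SUM(CASE WHEN t4.id IS NULL THEN 0 ELSE a.duration_secs END) AS matching_duration_secs,
       SUM(a.duration_secs) AS total_duration_secs
FROM fact_table a
LEFT OUTER JOIN id_lookup t4 ON a.id = t4.id AND a.region_date = t4.region_date
WHERE a.is_test = 0 AND a."desc" = 'VIDEO'
GROUP BY a.region_date ORDER BY a.region_date
""").fetchall()

# Roll-up first: filter, collapse to one row per (region_date, id), then join.
rolled_up = conn.execute("""
WITH daily AS (
  SELECT region_date, id, SUM(duration_secs) AS secs
  FROM fact_table
  WHERE is_test = 0 AND "desc" = 'VIDEO'
  GROUP BY region_date, id
)
SELECT d.region_date,
       SUM(CASE WHEN t4.id IS NULL THEN 0 ELSE d.secs END) AS matching_duration_secs,
       SUM(d.secs) AS total_duration_secs
FROM daily d
LEFT OUTER JOIN id_lookup t4 ON d.id = t4.id AND d.region_date = t4.region_date
GROUP BY d.region_date ORDER BY d.region_date
""").fetchall()

print(direct, rolled_up)
```

If id_lookup could contain duplicate (id, region_date) pairs, both versions would inflate the sums the same way, so deduplicate it before relying on either.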

How to add column A (date column) to column B (number of business days) in Teradata to get the new date?

Here's my data:
table A.pickup_date is a date column
table A.biz_days is the number of business days I want to add to A.pickup_date
table B.date
table B.is_weekend (Y or N)
table B.is_holiday (Y or N)
From table B I know, for each date, whether it is a business day. Now I want a third column in table A with the exact date I get after adding A.biz_days to A.pickup_date.
Can anyone provide me with either a CASE WHEN expression or a stored procedure for this? Unfortunately we are not allowed to write our own functions in Teradata.
This is pretty darned ugly, but I think it should get you started.
First I created a volatile table to represent your table a:
CREATE VOLATILE TABLE vt_pickup AS
(SELECT CURRENT_DATE AS pickup_date,
8 AS Biz_Days) WITH DATA PRIMARY INDEX(pickup_date)
ON COMMIT PRESERVE ROWS;
INSERT INTO vt_pickup VALUES (DATE '2015-02-24', 5);
Then I joined that with sys_calendar.calendar to get the days of the week:
CREATE VOLATILE TABLE VT_Days AS
(
SELECT
p.pickup_date,
day_of_week
FROM
vt_pickup p
INNER JOIN sys_calendar.CALENDAR c
ON c.calendar_date >= p.pickup_date
AND c.calendar_date < (p.pickup_date + Biz_Days)
) WITH DATA
PRIMARY INDEX(pickup_date)
ON COMMIT PRESERVE ROWS;
Then I can use all that to generate the actual delivery date:
SELECT
p.pickup_date,
p.biz_days,
SUM(CASE WHEN d.day_of_week = 1 THEN 1 ELSE 0 END) AS Suns,
SUM(CASE WHEN d.day_of_week = 7 THEN 1 ELSE 0 END) AS Sats,
p.biz_days + Suns + Sats AS TotalDays,
p.pickup_date + TotalDays AS Delivery_Date
FROM
vt_pickup p
LEFT JOIN vt_days AS d ON
p.pickup_date = d.pickup_date
AND d.day_of_week IN (1, 7) -- 1 = Sunday, 7 = Saturday in sys_calendar
GROUP BY 1,2;
Counting Sundays and Saturdays from a single join, rather than joining vt_days twice, keeps the two counts from multiplying each other. You should be able to extend the same logic with another CASE for your holidays.
The easiest way to do this is to calculate a sequential business-day number (add it as a new column to your calendar table if this is a recurring operation; otherwise compute it in a WITH clause):
SUM(CASE WHEN is_weekend = 'Y' OR is_holiday = 'Y' THEN 0 ELSE 1 END)
OVER (ORDER BY calendar_date
ROWS UNBOUNDED PRECEDING) AS biz_day#
Then you need two joins:
SELECT ..., c2.calendar_date
FROM tableA AS a
JOIN tableB AS c1
ON a.pickup_date = c1.calendar_date
JOIN tableB AS c2
ON c2.biz_day# = c1.biz_day# + a.biz_days
AND c2.is_weekend = 'N'
AND c2.is_holiday = 'N'
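The running business-day number and the two lookups can be sanity-checked outside Teradata. The Python sketch below builds a small calendar (weekends plus one assumed holiday, 16 Feb 2015), numbers it the same way as the SUM(...) OVER (ORDER BY calendar_date ROWS UNBOUNDED PRECEDING) expression, and performs the lookups the c1 and c2 joins do. build_calendar and add_business_days are hypothetical helper names.

```python
from datetime import date, timedelta

def build_calendar(start, days, holidays):
    """Rows of (calendar_date, is_business_day, biz_day_no): the Python
    analogue of the running SUM(CASE ... THEN 0 ELSE 1 END) OVER (...)."""
    rows, running = [], 0
    for i in range(days):
        d = start + timedelta(days=i)
        is_business = d.weekday() < 5 and d not in holidays  # Mon-Fri, not a holiday
        running += 1 if is_business else 0
        rows.append((d, is_business, running))
    return rows

def add_business_days(pickup, biz_days, calendar):
    # Join c1: the biz_day_no of the pickup date.
    start_no = next(n for d, _, n in calendar if d == pickup)
    # Join c2: the business day numbered start_no + biz_days. The is_business
    # filter matters because weekend/holiday rows repeat the previous number.
    return next(d for d, is_business, n in calendar
                if n == start_no + biz_days and is_business)

cal = build_calendar(date(2015, 2, 1), 60, holidays={date(2015, 2, 16)})
print(add_business_days(date(2015, 2, 24), 5, cal))  # Tuesday + 5 business days
print(add_business_days(date(2015, 2, 13), 1, cal))  # Friday, then weekend and holiday Monday
```

The c2 filter on is_weekend and is_holiday is what prevents a Saturday, which carries the same number as the preceding Friday, from being returned as the delivery date.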

How to query a Master database using Inner Join links to 2 sub-databases that are identical to each other

I have an Inventory table containing Master file info and 2 Movement History tables (Current Year and Last Year).
I want to use a query to extract the movements from (say) June LAST year to March THIS year, sorted by Code and then Date.
I am relatively new to SQL and have tried to use the following INNER JOIN structure to do this:
SELECT Code, Descrip, Category, MLast.Date, MLast.DocNo, MCurr.Date, MCurr.DocNo
FROM Stock AS S
INNER JOIN MoveTrnArc MLast ON MLast.Stockcode = S.Code
AND MLast.Date >='2011/06/01' AND MLast.Date <='2012/03/31'
INNER JOIN MoveTrn MCurr ON MCurr.Stockcode = S.Code
AND MCurr.Date >='2011/06/01' AND MCurr.Date <='2012/03/31'
ORDER BY S.Code
This creates a Query Table with the following column structure:
Code | Descrip | Category | Date | DocNo | Date | DocNo |
...where the data from the LAST Year table appears in the first Date/DocNo columns and the CURRENT Year data appears in the second Date/DocNo columns.
What must I do to the Query to have each Movement in its own row or is there a better, more efficient Query to achieve this?
Also, I need the Movements listed in Code followed by Date sequence.
Use UNION ALL instead of joins:
select s.Code, s.Descrip, s.Category, t.Date, t.DocNo
from
(
select Stockcode, Date, DocNo from MoveTrnArc
union all
select Stockcode, Date, DocNo from MoveTrn
) t join Stock s on s.Code = t.Stockcode
where t.Date >= '2011/06/01' AND t.Date <= '2012/03/31'
order by s.Code, t.Date
Also be careful when comparing dates: if the Date column is of type datetime and includes a time part, change t.Date <= '2012/03/31' to t.Date < '2012/04/01' to include all the rows from 31 March, because '2012/03/31' is converted to '2012/03/31 00:00:00.000'.
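A small sketch of both points (UNION ALL plus ORDER BY, and the inclusive upper bound silently dropping rows with a time part) on invented data. It uses SQLite via Python's sqlite3 as a stand-in for SQL Server and stores the dates as ISO-8601 text, which is an assumption: in SQLite the boundary effect comes from string comparison, while in SQL Server it comes from '2012/03/31' converting to midnight, but the outcome is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Stock (Code TEXT, Descrip TEXT, Category TEXT);
CREATE TABLE MoveTrnArc (Stockcode TEXT, Date TEXT, DocNo TEXT);
CREATE TABLE MoveTrn (Stockcode TEXT, Date TEXT, DocNo TEXT);
INSERT INTO Stock VALUES ('A1', 'Widget', 'W'), ('B2', 'Bolt', 'B');
INSERT INTO MoveTrnArc VALUES ('A1', '2011-07-15 09:00:00', 'D1'), ('B2', '2011-05-01 00:00:00', 'D0');
INSERT INTO MoveTrn VALUES ('A1', '2012-03-31 14:30:00', 'D3'), ('A1', '2012-01-10 00:00:00', 'D2');
""")

# Both movement tables stacked with UNION ALL, one row per movement.
BASE = """
SELECT s.Code, t.Date, t.DocNo
FROM (
    SELECT Stockcode, Date, DocNo FROM MoveTrnArc
    UNION ALL
    SELECT Stockcode, Date, DocNo FROM MoveTrn
) AS t
JOIN Stock AS s ON s.Code = t.Stockcode
WHERE t.Date >= '2011-06-01' AND {upper}
ORDER BY s.Code, t.Date
"""

inclusive = conn.execute(BASE.format(upper="t.Date <= '2012-03-31'")).fetchall()
half_open = conn.execute(BASE.format(upper="t.Date < '2012-04-01'")).fetchall()
print([r[2] for r in inclusive])
print([r[2] for r in half_open])
```

The <= version loses the 14:30 movement on 31 March; the half-open range keeps it.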
