SQL cross-table calculations - sql-server

Hi, I need to write a query that does multiple things. I have it working so that it gets the details of orders within a certain time frame, placed by customers aged between 20 and 30. However, I also need to check whether an order's products cost more than a set amount, and that data is spread across multiple tables.
One table has the order ID, the product code, and the quantity, while another table has the product information such as code and price, and the customer details come from a third table.
So I need to use the product code and quantity to look up the price and do a cross-table calculation to see if the order total is above 100, and I'm trying to do this with an AND condition in the WHERE clause.
So if I have 3 tables:
Orderplaced table with OID, odate, custno, paid
Ordered table with OID, itemid, quant
Items table with itemid, itemname, price
and I need to do a calculation across those tables in my query:
SELECT DISTINCT Orderplaced.OID, Orderplaced.odate, Orderplaced.custno, Orderplaced.paid
FROM Cust, Orderplaced, items, Ordered
WHERE Orderplaced.odate BETWEEN '01-JUL-14' AND '31-DEC-14'
AND Floor((sysdate-Cust.DOB) / 365.25) Between '20' AND '30'
AND Cust.SEX='M'
AND items.itemid=ordered.itemid
AND $sum(ordered.quan*item.PRICE) >100;
No matter which way I try to get the calculation to work, it always returns the same result, even on orders under 100 dollars.
Any advice on this would be good; it's for my studies but is troubling me a lot.

I think this is what you want. (I'm not familiar with $sum, so I've replaced it with SUM().)
SELECT
    Orderplaced.OID,
    Orderplaced.odate,
    Orderplaced.custno,
    Orderplaced.paid,
    SUM(Ordered.quant * items.price)
FROM
    Cust
    JOIN Orderplaced ON Cust.custno = Orderplaced.custno
    JOIN Ordered ON Ordered.OID = Orderplaced.OID
    JOIN items ON items.itemid = Ordered.itemid
WHERE
    Orderplaced.odate BETWEEN '2014-07-01' AND '2014-12-31'
    AND FLOOR(DATEDIFF(DAY, Cust.DOB, GETDATE()) / 365.25) BETWEEN 20 AND 30
    AND Cust.SEX = 'M'
GROUP BY
    Orderplaced.OID,
    Orderplaced.odate,
    Orderplaced.custno,
    Orderplaced.paid
HAVING
    SUM(Ordered.quant * items.price) > 100;

I think you want to try something like this...
SELECT DISTINCT Orderplaced.OID, Orderplaced.odate, Orderplaced.custno, Orderplaced.paid
FROM Cust
JOIN Orderplaced ON
    Cust.<SOMEID> = Orderplaced.<CustId>
    AND Orderplaced.odate BETWEEN '2014-07-01' AND '2014-12-31'
WHERE FLOOR(DATEDIFF(DAY, Cust.DOB, GETDATE()) / 365.25) BETWEEN 20 AND 30
AND Cust.SEX = 'M'
AND (
    SELECT SUM(Ordered.quant * items.price)
    FROM Ordered
    JOIN items ON items.itemid = Ordered.itemid
    WHERE Ordered.<SomeId> = Orderplaced.<SomeId>) > 100
A couple of pointers:
1. FLOOR returns a number, but you are comparing it to a string.
2. Typically, when referencing a table in a query, the table has to be joined on its key columns; in your query you're referencing items and Ordered without joining either of those tables on any key columns.
Hope that helps.
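To see the second pointer concretely: in the original comma-style FROM list, every table needs a join predicate; otherwise the unjoined tables form a cross join, which inflates the SUM past 100 for every order and explains why the query always returns the same result. A minimal sketch of the missing predicates (assuming custno links Cust to Orderplaced, as in the answers above):
SELECT DISTINCT Orderplaced.OID
FROM Cust, Orderplaced, items, Ordered
WHERE Cust.custno = Orderplaced.custno   -- customer to order header
  AND Orderplaced.OID = Ordered.OID      -- order header to order lines
  AND Ordered.itemid = items.itemid      -- order lines to items
  AND Orderplaced.odate BETWEEN '2014-07-01' AND '2014-12-31';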

Related

SQL Server : Join If Between

I have 2 tables:
Query1 contains 3 columns: Due_Date, Received_Date, Diff, where Diff is the difference between the two dates in days.
QueryHol has 2 columns: Date, Count. This has a list of dates, and the count is set to 1 for everything. All these dates represent public holidays.
I want to be able to get the sum of QueryHol["Count"] where QueryHol["Date"] is between Query1["Due_Date"] and Query1["Received_Date"].
Result wanted: a column joined onto Query1 stating how many public holidays fell into the date range, so they can be subtracted from the Query1["Diff"] column to give a reflection of working days.
Because 01-01-19 is a bank holiday, I would want to subtract it from Diff to end up with results like below.
Let me know if you require any more info.
Here's an option:
SELECT query1.due_date
     , query1.received_date
     , query1.diff
     , queryhol.count
     , COALESCE(query1.diff - queryhol.count, query1.diff) AS DiffCount
FROM Query1
OUTER APPLY (
    SELECT COUNT(*) AS count
    FROM QueryHol
    WHERE QueryHol.Date <= Query1.Received_Date
      AND QueryHol.Date >= Query1.Due_Date
) AS queryhol
You may need to play around with the join condition; as written, it assumes that Received_Date is always later than Due_Date, and there is not enough data here to know all of the use cases. A variant that tolerates either ordering is sketched below.
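If the two dates can come in either order, one hedged variant (assuming the same Query1/QueryHol layout as above) is to bound the range with CASE expressions:
SELECT query1.due_date
     , query1.received_date
     , query1.diff
     , queryhol.count
FROM Query1
OUTER APPLY (
    SELECT COUNT(*) AS count
    FROM QueryHol
    WHERE QueryHol.Date >= CASE WHEN Query1.Due_Date <= Query1.Received_Date
                                THEN Query1.Due_Date ELSE Query1.Received_Date END
      AND QueryHol.Date <= CASE WHEN Query1.Due_Date <= Query1.Received_Date
                                THEN Query1.Received_Date ELSE Query1.Due_Date END
) AS queryhol;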
If I understand your problem, I think this is a possible solution:
select due_date,
       received_date,
       diff,
       (select sum(table2.count)
        from table2
        where table2.date between table1.due_date and table1.received_date) sum_holi,
       table1.diff - (select sum(table2.count)
                      from table2
                      where table2.date between table1.due_date and table1.received_date) diff_holi
from table1
where [...] --here your conditions over table1

SQL Server : update every 5 records with Past Months

I want to update 15 records so that the first 5 records' date is June 2019, the next 5 records' July 2019, and the last 5 records' August 2019, based on employee ID. Can anyone tell me how to write this type of query in SQL Server Management Studio v17.7? I've tried the query below but am unable to do it for the next 5 rows:
Update TOP(5) emp.employee(nolock) set statusDate=GETDATE()-31 where EMPLOYEEID='XCXXXXXX';
To update only a certain number of rows of a table you will need to include a FROM clause and join a sub-query which limits the number of rows. I would suggest using OFFSET and FETCH instead of TOP so that you can skip X number of rows.
You will also want to use the DATEADD function instead of directly subtracting a number from GETDATE(): with a datetime value, subtracting an integer subtracts that many days, which may not be what you intend, and the arithmetic fails outright on the newer date types. If you intend to go back a month, I would suggest subtracting a month rather than 31 days. Alternatively, it might be easier to specify an exact date like '2019-06-01'.
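As a minimal sketch of the DATEADD approach (reusing the table and placeholder employee ID from the question; this only illustrates the date arithmetic, not the 5-row limiting):
-- go back one calendar month rather than 31 days
UPDATE emp.employee
SET statusDate = DATEADD(MONTH, -1, GETDATE())
WHERE EMPLOYEEID = 'XCXXXXXX';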
For example:
TableA
- TableAID INT PK
- EmployeeID INT FK
- statusDate DATETIME
UPDATE TableA
SET statusDate = '2019-06-01'
FROM TableA
INNER JOIN
(
SELECT TableAID
FROM TableA
WHERE EmployeeID = ''
ORDER BY TableAID
OFFSET 0 ROWS
FETCH NEXT 5 ROWS ONLY
) T1 ON TableA.TableAID = T1.TableAID
Right now it looks like your original query is updating the table employee rather than a purchases table. You will want to replace my TableA with whichever table it is you're updating and replace TableAID with the PK field of it.
You can use a ROW_NUMBER to get a ranking by employee, then just update the first 15 rows.
;WITH EmployeeRowsWithRowNumbers AS
(
SELECT
T.*,
RowNumberByEmployee = ROW_NUMBER() OVER (
PARTITION BY
T.EmployeeID -- Generate a ranking by each different EmployeeID
ORDER BY
(SELECT NULL)) -- ... in no particular order (you should supply one if you have an ordering column)
FROM
emp.employee AS T
)
UPDATE E SET
statusDate = CASE
WHEN E.RowNumberByEmployee <= 5 THEN '2019-06-01'
WHEN E.RowNumberByEmployee BETWEEN 6 AND 10 THEN '2019-07-01'
ELSE '2019-08-01' END
FROM
EmployeeRowsWithRowNumbers AS E
WHERE
E.RowNumberByEmployee <= 15
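If, as in the original query, only a single employee's rows should be touched, a hedged variation is to filter inside the CTE before numbering (the EMPLOYEEID value is the placeholder from the question):
;WITH EmployeeRowsWithRowNumbers AS
(
    SELECT
        T.*,
        RowNumberByEmployee = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -- supply a real ordering column if you have one
    FROM
        emp.employee AS T
    WHERE
        T.EmployeeID = 'XCXXXXXX' -- number only this employee's rows
)
UPDATE E SET
    statusDate = CASE
        WHEN E.RowNumberByEmployee <= 5 THEN '2019-06-01'
        WHEN E.RowNumberByEmployee BETWEEN 6 AND 10 THEN '2019-07-01'
        ELSE '2019-08-01' END
FROM
    EmployeeRowsWithRowNumbers AS E
WHERE
    E.RowNumberByEmployee <= 15;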

Optimizing Large Table Join in PySpark

I have a large fact table, roughly 500M rows per day. The table is partitioned by region_date.
I have to scan through 6 months of data every day, left outer join it with another, smaller subset (1M rows) based on an id and date column, and calculate two aggregate values: sum(fact) where the id exists in the right table, and sum(fact) overall.
My SparkSQL looks like this:
SELECT
a.region_date,
SUM(case
when t4.id is null then 0
else a.duration_secs
end) matching_duration_secs,
SUM(a.duration_secs) total_duration_secs
FROM fact_table a LEFT OUTER JOIN id_lookup t4
ON a.id = t4.id
and a.region_date = t4.region_date
WHERE a.region_date >= CAST(date_format(DATE_ADD(CURRENT_DATE,-180), 'yyyyMMdd') AS BIGINT)
AND a.is_test = 0
AND a.desc = 'VIDEO'
GROUP BY a.region_date
What is the best way to optimize and distribute/partition the data? The query runs for more than 3 hours now. I tried spark.sql.shuffle.partitions = 700
If I roll-up the daily data at "id" level, it's about 5M rows per day. Should I rollup the data first and then do the join?
Thanks,
Ram.
Because there are some filter conditions in your query, you could split it into two queries to decrease the amount of data first.
table1 = spark.sql("""
    SELECT *
    FROM fact_table a
    WHERE a.region_date >= CAST(date_format(DATE_ADD(CURRENT_DATE, -180), 'yyyyMMdd') AS BIGINT)
      AND a.is_test = 0
      AND a.desc = 'VIDEO'
""")
Then you can use the new table, which is much smaller than the original, to join the id_lookup table. A sketch of the combined two-step version follows.
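Putting both steps together in Spark SQL might look like the sketch below (a sketch, not a tuned solution: the view name filtered_fact is made up, and the BROADCAST hint is an added suggestion that assumes the ~1M-row id_lookup table fits in executor memory):
CREATE OR REPLACE TEMPORARY VIEW filtered_fact AS
SELECT a.region_date, a.id, a.duration_secs
FROM fact_table a
WHERE a.region_date >= CAST(date_format(DATE_ADD(CURRENT_DATE, -180), 'yyyyMMdd') AS BIGINT)
  AND a.is_test = 0
  AND a.desc = 'VIDEO';

SELECT /*+ BROADCAST(t4) */
       a.region_date,
       SUM(CASE WHEN t4.id IS NULL THEN 0 ELSE a.duration_secs END) AS matching_duration_secs,
       SUM(a.duration_secs) AS total_duration_secs
FROM filtered_fact a
LEFT OUTER JOIN id_lookup t4
  ON a.id = t4.id
 AND a.region_date = t4.region_date
GROUP BY a.region_date;
Broadcasting the small side avoids shuffling the 500M-row-per-day fact table for the join; whether it helps here depends on cluster memory.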

Summation in SQL over a computed column

I have a Transact-SQL question concerning summation over a computed column. I am having a problem with double-counting of these computed values.
Usually I would extract all the raw data and post-process it in Perl, but I can't do that on this occasion due to the particular reporting system we need to use. I'm relatively inexperienced with the intricacies of SQL, so I thought I'd refer this to the experts.
My data is arranged in the following tables (highly simplified and reduced for the purposes of clarity):
Patient table:
PatientId
PatientSer
Course table
PatientSer
CourseSer
CourseId
Diagnosis table
PatientSer
DiagnosisId
Plan table
PlanSer
CourseSer
PlanId
Field table
PlanSer
FieldId
FractionNumber
FieldDateTime
What I would like to do is find the difference between the maximum fraction number and the minimum fraction number over a range of dates in the FieldDateTime in the FieldTable. I would like to then sum these values over the possible plan ids associated with a course, but I do not want to double count over the two particular diagnosis ids (A or B or both) that I may encounter for a patient.
So, for a patient with two diagnosis codes (A and B) and two plans in the same course of treatment (Plan1 and Plan2), with a fraction-number difference of 24 for the first plan and 5 for the second, what I would like to get out is something like this:
PatientId  CourseId  PlanId  DiagnosisId  FractionNumberDiff  Sum
AB1234     1         Plan1   A            24                  29
AB1234     1         Plan1   B            *                   *
AB1234     1         Plan2   A            5                   *
AB1234     1         Plan2   B            *                   *
I've racked my brains about how to do this, and I've tried the following:
SELECT
Patient.PatientId,
Course.CourseId,
Plan.PlanId,
MAX(fractionnumber OVER PARTITION(Plan.PlanSer)) - MIN(fractionnumber OVER PARTITION(Plan.PlanSer)) AS FractionNumberDiff,
SUM(FractionNumberDiff OVER PARTITION(Course.CourseSer)
FROM
Patient P
INNER JOIN
Course C ON (P.PatientSer = C.PatientSer)
INNER JOIN
Plan Pl ON (Pl.CourseSer = C.CourseSer)
INNER JOIN
Diagnosis D ON (D.PatientSer = P.PatientSer)
INNER JOIN
Field F ON (F.PlanSer = Pl.PlanSer)
WHERE
FieldDateTime > [Start Date]
AND FieldDateTime < [End Date]
But this just double-counts over the diagnosis codes, meaning that I end up with 58 instead of 29.
Any ideas about what I can do?
Change the FractionNumberDiff to
MAX(fractionnumber) OVER (PARTITION BY Plan.PlanSer) -
MIN(fractionnumber) OVER (PARTITION BY Plan.PlanSer) AS FractionNumberDiff
and remove the "SUM(FractionNumberDiff OVER PARTITION(Course.CourseSer)" expression.
Then make the existing query a derived table and calculate the SUM(FractionNumberDiff) there:
SELECT *, SUM(FractionNumberDiff) OVER (PARTITION BY d.CourseSer)
FROM
(
    <the modified existing query here, which should also select Course.CourseSer>
) AS d
As for the double-counting issue, please post some sample data and the expected result. In the meantime, a sketch of the assembled query follows.
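For concreteness, the two fixes assembled into one statement might look like this (a sketch, assuming the table layout from the question; [Start Date] and [End Date] are the question's placeholders, the Diagnosis join is dropped because it only multiplies rows, and DISTINCT collapsing to one row per plan is an assumption that may not match the intended per-diagnosis output):
SELECT d.*,
       SUM(d.FractionNumberDiff) OVER (PARTITION BY d.CourseSer) AS [Sum]
FROM
(
    SELECT DISTINCT
        P.PatientId,
        C.CourseId,
        C.CourseSer,
        Pl.PlanId,
        MAX(F.FractionNumber) OVER (PARTITION BY Pl.PlanSer) -
        MIN(F.FractionNumber) OVER (PARTITION BY Pl.PlanSer) AS FractionNumberDiff
    FROM Patient P
    INNER JOIN Course C ON P.PatientSer = C.PatientSer
    INNER JOIN [Plan] Pl ON Pl.CourseSer = C.CourseSer
    INNER JOIN Field F ON F.PlanSer = Pl.PlanSer
    WHERE F.FieldDateTime > [Start Date]
      AND F.FieldDateTime < [End Date]
) AS d;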

Flatten/merge overlapping time intervals

I have a 'Service' table with millions of rows. Each row corresponds to a service provided by a staff member in a given date and time interval (each row has a unique ID). There are cases where a staff member provides services in overlapping time frames. I need to write a query that merges overlapping time intervals and returns the data in the format shown below.
I tried grouping by the StaffID and Date fields and taking the MIN of BeginTime and the MAX of EndTime, but that does not account for the non-overlapping time frames. How can I accomplish this? Again, the table contains several million records, so a recursive CTE approach might have performance issues. Thanks in advance.
Service Table
ID StaffID Date BeginTime EndTime
1 101 2014-01-01 08:00 09:00
2 101 2014-01-01 08:30 09:30
3 101 2014-01-01 18:00 20:30
4 101 2014-01-01 19:00 21:00
Output
StaffID Date BeginTime EndTime
101 2014-01-01 08:00 09:30
101 2014-01-01 18:00 21:00
Here is another sample data set with a query proposed by a contributor.
http://sqlfiddle.com/#!6/bfbdc/3
The first two rows in the results set should be merged into one row (06:00-08:45) but it generates two rows (06:00-08:30 & 06:00-08:45)
I only came up with a CTE query, as the problem is that there may be a chain of overlapping times, e.g. record 1 overlaps with record 2, record 2 with record 3, and so on. This is hard to resolve without a CTE or some other kind of loop. Please give it a go anyway.
The first part of the CTE query gets the services that start a new group and do not have the same starting time as some other service (I need to have just one record that starts a group). The second part gets those that start a group but where there is more than one with the same start time; again, I need just one of them. The last part recursively builds up on the starting group, taking all overlapping services.
Here is SQLFiddle with more records added to demonstrate different kinds of overlapping and duplicate times.
I couldn't use ServiceID as it would have to be ordered in the same way as BeginTime.
;with flat as
(
select StaffID, ServiceDate, BeginTime, EndTime, BeginTime as groupid
from services S1
where not exists (select * from services S2
where S1.StaffID = S2.StaffID
and S1.ServiceDate = S2.ServiceDate
and S2.BeginTime <= S1.BeginTime and S2.EndTime <> S1.EndTime
and S2.EndTime > S1.BeginTime)
union all
select StaffID, ServiceDate, BeginTime, EndTime, BeginTime as groupid
from services S1
where exists (select * from services S2
where S1.StaffID = S2.StaffID
and S1.ServiceDate = S2.ServiceDate
and S2.BeginTime = S1.BeginTime and S2.EndTime > S1.EndTime)
and not exists (select * from services S2
where S1.StaffID = S2.StaffID
and S1.ServiceDate = S2.ServiceDate
and S2.BeginTime < S1.BeginTime
and S2.EndTime > S1.BeginTime)
union all
select S.StaffID, S.ServiceDate, S.BeginTime, S.EndTime, flat.groupid
from flat
inner join services S
on flat.StaffID = S.StaffID
and flat.ServiceDate = S.ServiceDate
and flat.EndTime > S.BeginTime
and flat.BeginTime < S.BeginTime and flat.EndTime < S.EndTime
)
select StaffID, ServiceDate, MIN(BeginTime) as begintime, MAX(EndTime) as endtime
from flat
group by StaffID, ServiceDate, groupid
order by StaffID, ServiceDate, begintime, endtime
Elsewhere I've answered a similar date-packing question with a geometric strategy. Namely, I interpret the date ranges as lines and utilize geometry::UnionAggregate to merge the ranges.
Your question has two peculiarities, though. First, it calls for sql-server-2008, where geometry::UnionAggregate is not available. However, you can download the Microsoft library at https://github.com/microsoft/SQLServerSpatialTools, load it into your instance as a CLR assembly, and you have it available as dbo.GeometryUnionAggregate.
But the real peculiarity that has my interest is the concern that you have several million rows to work with. So I thought I'd repeat the strategy here, but with an added technique to improve its performance. This technique will work well if a lot of your StaffID/date subsets are the same.
First, let's build a numbers table. Swap this out with your favorite way to do it:
select i = row_number() over (order by (select null))
into #numbers
from #services; -- where i put your data
Then convert the dates to floats and use those floats to create geometrical points. These points can then be turned into lines via STUnion and STEnvelope. With your ranges now represented as geometric lines, merge them via UnionAggregate. The resulting geometry object 'lines' might contain multiple lines, but any overlapping lines turn into one line.
select s.StaffID,
s.Date,
linesWKT = geometry::UnionAggregate(line).ToString()
-- If you have SQLSpatialTools installed then:
-- linesWKT = dbo.GeometryUnionAggregate(line).ToString()
into #aggregateRangesToGeo
from #services s
cross apply (select
beginTimeF = convert(float, convert(datetime,beginTime)),
endTimeF = convert(float, convert(datetime,endTime))
) prepare
cross apply (select
beginPt = geometry::Point(beginTimeF, 0, 0),
endPt = geometry::Point(endTimeF, 0, 0)
) pointify
cross apply (select
line = beginPt.STUnion(endPt).STEnvelope()
) lineify
group by s.StaffID,
s.Date;
You have one 'lines' object for each StaffID/date combo. But depending on your dataset, there may be many 'lines' objects that are the same between these combos. This may very well be true if staff are expected to follow a routine and data is recorded to the nearest whatever. So get a distinct listing of 'lines' objects. This should improve performance.
From this, extract the individual lines inside 'lines'. Envelope the lines, which ensures that they are stored only as their endpoints. Read the endpoint x values and convert them back to their time representations. Keep the WKT representation to join it back to the combos later on.
select lns.linesWKT,
beginTime = convert(time, convert(datetime, ap.beginTime)),
endTime = convert(time, convert(datetime, ap.endTime))
into #parsedLines
from (select distinct linesWKT from #aggregateRangesToGeo) lns
cross apply (select
lines = geometry::STGeomFromText(linesWKT, 0)
) geo
join #numbers n on n.i between 1 and geo.lines.STNumGeometries()
cross apply (select
line = geo.lines.STGeometryN(n.i).STEnvelope()
) ln
cross apply (select
beginTime = ln.line.STPointN(1).STX,
endTime = ln.line.STPointN(3).STX
) ap;
Now just join your parsed data back to the StaffId/Date combos.
select ar.StaffID,
ar.Date,
pl.beginTime,
pl.endTime
from #aggregateRangesToGeo ar
join #parsedLines pl on ar.linesWKT = pl.linesWKT
order by ar.StaffID,
ar.Date,
pl.beginTime;