Show Quantity with Quantity Breakdown by Batch numbers - sql-server

I have written a query in T-SQL that extracts StockNo, StockQty, CostPrice, BatchNo, BatchQty and StockValue, something similar to a Stock Audit report. BatchNo and BatchQty exist in TableB, while StockNo, StockQty, CostPrice and StockValue come from TableA. The BatchQty is the breakdown of the StockQty for batched items, so the query returns the StockNo and its relevant batch quantity where applicable. I am trying to write the query so that it shows the StockQty and, next to it, the BatchQty breakdown without duplicating or bringing in the StockValue again.
I am using SQL 2016.
I have tried different join types, temp tables as well as a union. I have also researched different SQL forums to see if there is a possible solution that I can use.
SELECT
T0.TRANS_ID,
T0.STOCKNO,
BATCHMANAGED,
BATCHNO,
STOCKQTY,
BATCHQTY,
COSTPRICE,
STOCKVALUE
FROM TABLE_A T0
FULL JOIN TABLE_B T1 ON T0.STOCKNO = T1.STOCKNO AND T0.TRANS_ID = T1.BASE_ID
I had to go with a full join as I needed Stock items that were both Batched and Non-Batched. Any other join I tried, omitted certain stock items and the total Stock Value didn't balance back to the Trial Balance.
Below is the result I currently get, which is extracting the StockQty and StockValue twice and in certain cases more than twice depending on the BatchNo and BatchQty:
StockNo  BatchManaged  BatchNo  StockQty  BatchQty  CostPrice  StockValue
Item1    Y             BN0001   20        10        10.00      200
Item1    Y             BN0002   20        10        10.00      200
Item2    Y             BN0003   40        24        30.00      1200
Item2    Y             BN0004   20        16        30.00      1200
Item3    N                      50                  20.00      1000
Total                                                          3800
This is the output I am trying to achieve, where the StockQty and StockValue are not shown for every batch number. They need to show just once:
StockNo  BatchManaged  BatchNo  StockQty  BatchQty  CostPrice  StockValue
Item1    Y             BN0001   20        10        10.00      200
Item1    Y             BN0002             10        10.00
Item2    Y             BN0003   40        24        30.00      1200
Item2    Y             BN0004             16        30.00
Item3    N                      50                  20         1000
Total                                                          2400
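One possible way to get the "show once" layout is to number the batch rows per stock item and blank the repeated columns on every row after the first. The sketch below is not from the thread. It runs the idea through Python's sqlite3 module so it can be executed anywhere; the column placement in TABLE_A and TABLE_B (including BASE_ID) is assumed from the question, and a LEFT JOIN stands in for the FULL JOIN on the assumption that every batch row has a parent row in TABLE_A. ROW_NUMBER() and CASE are also available in SQL Server 2016.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data, reconstructed from the tables in the question.
conn.executescript("""
CREATE TABLE TABLE_A (TRANS_ID INT, STOCKNO TEXT, BATCHMANAGED TEXT,
                      STOCKQTY INT, COSTPRICE REAL, STOCKVALUE REAL);
CREATE TABLE TABLE_B (BASE_ID INT, STOCKNO TEXT, BATCHNO TEXT, BATCHQTY INT);
INSERT INTO TABLE_A VALUES
  (1, 'Item1', 'Y', 20, 10.0, 200),
  (2, 'Item2', 'Y', 40, 30.0, 1200),
  (3, 'Item3', 'N', 50, 20.0, 1000);
INSERT INTO TABLE_B VALUES
  (1, 'Item1', 'BN0001', 10), (1, 'Item1', 'BN0002', 10),
  (2, 'Item2', 'BN0003', 24), (2, 'Item2', 'BN0004', 16);
""")

QUERY = """
WITH numbered AS (
    SELECT T0.STOCKNO, T0.BATCHMANAGED, T1.BATCHNO,
           T0.STOCKQTY, T1.BATCHQTY, T0.COSTPRICE, T0.STOCKVALUE,
           -- rn = 1 marks the first batch row of each stock item
           ROW_NUMBER() OVER (PARTITION BY T0.TRANS_ID
                              ORDER BY T1.BATCHNO) AS rn
    FROM TABLE_A T0
    LEFT JOIN TABLE_B T1
           ON T0.STOCKNO = T1.STOCKNO AND T0.TRANS_ID = T1.BASE_ID
)
SELECT STOCKNO, BATCHMANAGED, BATCHNO,
       CASE WHEN rn = 1 THEN STOCKQTY END   AS STOCKQTY,   -- NULL on repeats
       BATCHQTY, COSTPRICE,
       CASE WHEN rn = 1 THEN STOCKVALUE END AS STOCKVALUE  -- NULL on repeats
FROM numbered
ORDER BY STOCKNO, rn
"""

rows = conn.execute(QUERY).fetchall()
# NULLs come back as None, so they are skipped when totalling.
total_value = sum(r[6] or 0 for r in rows)
for r in rows:
    print(r)
print("Total", total_value)
```

Because StockValue is NULL on the repeated rows, summing the column now balances to 2400 instead of 3800.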

Related

SQL Server, select with group by and date column

I have tbl_Orders:
OrderNumber ProductCode Qty OrderDate SentDate
---------------------------------------------------------------
1000 A 100 2018-03-01 00:30:51 2018-03-02
1000 A 150 2018-03-12 04:31:54 2018-03-13
1000 B 150 2018-03-11 03:34:51 2018-03-14
1001 C 200 2018-03-01 00:30:51 2018-04-02
1002 D 200 2018-03-01 00:30:51 2018-04-15
I want to write a query to get result like:
OrderNumber Qty MAXOrderDate MAXSentDate
------------------------------------------------
1000 400 2018-03-12 04:31:54 2018-03-14
1001 200 2018-03-01 00:30:51 2018-04-02
1002 200 2018-03-01 00:30:51 2018-04-15
From my newbie perspective it looks like I need 'group by OrderNumber'. But I am not sure what else. And the second problematic thing is "MAXOrderDate" or "MAXSentDate". I also don't know how to select max value from Date.
I really appreciate any help.
Simple. Any column wrapped in an aggregate function (min, max, avg, sum, etc.) does not need to appear in the GROUP BY, so your original idea of grouping by OrderNumber is sufficient.
Try this SQL:
select
tbl_Orders.ordernumber as OrderNumber
, sum(tbl_Orders.qty) as qty
,max(tbl_Orders.orderdate) as orderdate
,max(tbl_Orders.sentdate) as sentdate
from tbl_Orders
group by
tbl_Orders.ordernumber
You have to sum Qty, take the max of OrderDate and the max of SentDate, and then group by OrderNumber, like:
SELECT OrderNumber, sum(Qty) AS Qty, max(OrderDate) AS MAXOrderDate, max(SentDate) AS MAXSentDate
FROM tbl_Orders WITH (NOLOCK)
GROUP BY OrderNumber
I would use NOLOCK to avoid blocking other queries in high-performance environments. Be aware that NOLOCK can return rows from transactions that have not yet been committed.
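As a quick check of the grouped query against the sample rows from the question, here is a minimal sketch that runs it through Python's sqlite3 module. SQLite is only a stand-in for SQL Server here, the dates are stored as ISO-formatted text (an assumption made for the demo), and the NOLOCK hint is left out because SQLite has no equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_Orders (OrderNumber INT, ProductCode TEXT, Qty INT,
                         OrderDate TEXT, SentDate TEXT);
INSERT INTO tbl_Orders VALUES
  (1000, 'A', 100, '2018-03-01 00:30:51', '2018-03-02'),
  (1000, 'A', 150, '2018-03-12 04:31:54', '2018-03-13'),
  (1000, 'B', 150, '2018-03-11 03:34:51', '2018-03-14'),
  (1001, 'C', 200, '2018-03-01 00:30:51', '2018-04-02'),
  (1002, 'D', 200, '2018-03-01 00:30:51', '2018-04-15');
""")

# Only OrderNumber is grouped; every other column is aggregated.
QUERY = """
SELECT OrderNumber,
       SUM(Qty)       AS Qty,
       MAX(OrderDate) AS MAXOrderDate,
       MAX(SentDate)  AS MAXSentDate
FROM tbl_Orders
GROUP BY OrderNumber
ORDER BY OrderNumber
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

Note that the two MAX values are computed independently, so MAXOrderDate and MAXSentDate for order 1000 come from different rows.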

Error on Group by method

I wrote a query to combine records from multiple tables, named Purchase Order and Purchase Order Item.
[Note: the column names are not the original names; this is just model data.]
The Purchase Order table holds the order details like this:
id date vendorid totalitems totalqty grossamnt netamnt taxamt
----------------------------------------------------------------------------
1 03/10/17 00001 2 6 12000 13000 1000
The Purchase Order Item table holds the item details like this:
poid id productcode qty rate tax(%) taxamnt total
--------------------------------------------------------
1 1 12001 3 6000 2.5 500 6500
2 1 12000 3 6000 2.5 500 6500
My Query is,
select po.POID,po.SupplierId,po.TotalItems from
PurchaseOrder po, PurchaseOrderItem poi where po.POID=poi.POID group by
po.POID, po.SupplierId,po.TotalItems
Query returns,
id vendorid totalitems
--------------------------
1 00001 2
1 00001 2
Expected Output is,
id vendorid totalitems
------------------------
1 00001 2
You are using an outdated join method, have a read here:
ANSI vs. non-ANSI SQL JOIN syntax
You are also joining to another table, but never use it:
select po.POID,po.SupplierId,po.TotalItems
from PurchaseOrder po, PurchaseOrderItem poi
where po.POID=poi.POID
group by po.POID, po.SupplierId,po.TotalItems
Can just be:
select po.POID,po.SupplierId,po.TotalItems
from PurchaseOrder po
group by po.POID, po.SupplierId,po.TotalItems
OR
select DISTINCT
po.POID,
po.SupplierId,
po.TotalItems
from PurchaseOrder po

SQL Server - Group by day for the top N of the range

What I need to do is get a Cost breakout for each grouping, aggregated by day. Also, only taking the top N per the whole date range. I'm probably not explaining this well so let me give examples. Say my table schema and data looks like this:
SoldDate Product State Cost
----------------------- --------------------- --------- ------
2017-07-11 01:00:00.000 Apple NY 6
2017-07-11 07:00:00.000 Banana NY 1
2017-07-11 07:00:00.000 Banana NY 1
2017-07-12 01:00:00.000 Pear NY 2
2017-07-12 03:00:00.000 Olive TX 1
2017-07-12 16:00:00.000 Banana NY 1
2017-07-13 22:00:00.000 Apple NY 6
2017-07-13 22:00:00.000 Apple NY 6
2017-07-13 23:00:00.000 Banana NY 1
Call this table SoldProduce.
Now what I'm looking for is to group by Day, Product and State but for each day, only take the top two of the group NOT the top of that particular day. Anything else gets lumped under 'other'.
So in this case, our top two groups with the greatest Cost are Apple-NY and Banana-NY. So those are the two that should show up in the output only. Anything else is under 'Other'
So in the end this is the desired output:
SoldDay Product State Total Cost
----------------------- --------------------- --------- ------
2017-07-11 00:00:00.000 Apple NY 6
2017-07-11 00:00:00.000 Banana NY 2
2017-07-11 00:00:00.000 OTHER OTHER 0
2017-07-12 00:00:00.000 OTHER OTHER 3
2017-07-12 00:00:00.000 Banana NY 1
2017-07-13 00:00:00.000 Apple NY 12
2017-07-13 00:00:00.000 Banana NY 1
2017-07-13 00:00:00.000 OTHER OTHER 0
Note how on the 12th Pear and Olive were lumped under other. Even though it outsold Banana on that day. This is because I want the Top N selling groups for the whole range, not just on a day by day basis.
I did a lot of googling for a way to write a query that gets this data, but I'm not sure if it's the best way:
WITH TopX AS
(
SELECT
b.Product,
b.State,
b.SoldDate,
b.Cost,
DENSE_RANK() OVER (ORDER BY GroupedCost DESC) as [Rank]
FROM
(
SELECT
b.Product,
b.State,
b.SoldDate,
b.Cost,
SUM(b.Cost) OVER (PARTITION BY b.Product, b.State) as GroupedCost
FROM
SoldProduce b WITH (NOLOCK)
) as b
)
SELECT
DATEADD(d,DATEDIFF(d,0,SoldDate),0),
b.Product,
b.State,
SUM(b.Cost)
FROM
TopX b
WHERE
[Rank] <= 2
GROUP BY
DATEADD(d,DATEDIFF(d,0,SoldDate),0),
b.Product,
b.State
UNION ALL
SELECT
DATEADD(d,DATEDIFF(d,0,SoldDate),0),
null,
null,
SUM(b.Cost)
from
TopX b
WHERE
[Rank] > 2
GROUP BY
DATEADD(d,DATEDIFF(d,0,SoldDate),0)
Step 1) Create a common query that first projects the cost each row would have had we just grouped by Product and State. It then does a second projection to rank that cost 1-N, where 1 has the greatest grouped cost.
Step 2) Call upon the common query, grouping by day and restricting to rows with rank <= 2. These are the top elements. Then union the 'other' category onto this, i.e. anything ranked > 2.
What do you guys think? Is this an efficient solution? Could I do this better?
Edit:
FuzzyTree's suggestion benchmarks better than mine.
Final query used:
WITH TopX AS
(
SELECT
TOP(2)
b.Product,
b.State
FROM
SoldProduce b
GROUP BY
b.Product,
b.State
ORDER BY
SUM(b.Cost) DESC
)
SELECT
DATEADD(d,DATEDIFF(d,0,a.SoldDate),0) AS SoldDay,
coalesce(b.Product, 'Other') Product,
coalesce(b.State, 'Other') State,
SUM(a.Cost) AS TotalCost
FROM
SoldProduce a
LEFT JOIN TopX b ON
(a.Product = b.Product OR (a.Product IS NULL AND b.Product IS NULL)) AND
(a.State = b.State OR (a.State IS NULL AND b.State IS NULL))
GROUP BY
DATEADD(d,DATEDIFF(d,0,a.SoldDate),0),
coalesce(b.Product, 'Other'),
coalesce(b.State, 'Other')
ORDER BY DATEADD(d,DATEDIFF(d,0,a.SoldDate),0)
-- ORDER BY is optional, just for display purposes.
-- It is more efficient to sort in code for the final product.
-- Don't use I/O if you don't have to :)
I suggest using a plain group by without window functions for your TopX CTE:
With TopX AS
(
select top 2 Product, State
from SoldProduce
group by Product, State
order by sum(cost) desc
)
Then you can left join to your TopX CTE and use coalesce to determine which products fall into the Other group:
select
coalesce(TopX.Product, 'Other') Product,
coalesce(TopX.State, 'Other') State,
sum(Cost),
sp.SoldDate
from SoldProduce sp
left join TopX on TopX.Product = sp.Product
and TopX.State = sp.State
group by
coalesce(TopX.Product, 'Other'),
coalesce(TopX.State, 'Other'),
SoldDate
order by SoldDate
Note: This query will not return rows with a zero total; a day only shows a group if at least one sale falls into it.
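If the zero rows are wanted, one option is to build a grid of every day paired with every reporting group and left join the totals onto it. The sketch below is an illustration rather than part of the answer: it runs through Python's sqlite3 module, the CTE names Labelled and Grid are made up for the example, and SQLite's LIMIT and date() stand in for T-SQL's TOP and a cast to date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SoldProduce (SoldDate TEXT, Product TEXT, State TEXT, Cost INT);
INSERT INTO SoldProduce VALUES
  ('2017-07-11 01:00:00', 'Apple',  'NY', 6),
  ('2017-07-11 07:00:00', 'Banana', 'NY', 1),
  ('2017-07-11 07:00:00', 'Banana', 'NY', 1),
  ('2017-07-12 01:00:00', 'Pear',   'NY', 2),
  ('2017-07-12 03:00:00', 'Olive',  'TX', 1),
  ('2017-07-12 16:00:00', 'Banana', 'NY', 1),
  ('2017-07-13 22:00:00', 'Apple',  'NY', 6),
  ('2017-07-13 22:00:00', 'Apple',  'NY', 6),
  ('2017-07-13 23:00:00', 'Banana', 'NY', 1);
""")

QUERY = """
WITH TopX AS (                    -- top two groups over the whole range
    SELECT Product, State
    FROM SoldProduce
    GROUP BY Product, State
    ORDER BY SUM(Cost) DESC
    LIMIT 2                       -- T-SQL: SELECT TOP (2) ...
),
Labelled AS (                     -- tag every sale with its reporting group
    SELECT date(sp.SoldDate)            AS SoldDay,
           COALESCE(t.Product, 'OTHER') AS Product,
           COALESCE(t.State,   'OTHER') AS State,
           sp.Cost
    FROM SoldProduce sp
    LEFT JOIN TopX t ON t.Product = sp.Product AND t.State = sp.State
),
Grid AS (                         -- every day paired with every group
    SELECT d.SoldDay, g.Product, g.State
    FROM (SELECT DISTINCT SoldDay FROM Labelled) d
    CROSS JOIN (SELECT Product, State FROM TopX
                UNION ALL SELECT 'OTHER', 'OTHER') g
)
SELECT g.SoldDay, g.Product, g.State,
       COALESCE(SUM(l.Cost), 0) AS TotalCost   -- 0 when nothing matched
FROM Grid g
LEFT JOIN Labelled l
       ON l.SoldDay = g.SoldDay
      AND l.Product = g.Product
      AND l.State   = g.State
GROUP BY g.SoldDay, g.Product, g.State
ORDER BY g.SoldDay, g.Product
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

A full grid also yields an Apple row with a total of 0 for 2017-07-12, which the desired output in the question does not list; filter it out if only the OTHER group should be zero-filled.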

In SQL Server how to get some meaningful data from a chronicle alike table?

Say I have a table let's call it purchase table in SQL Server that represents user purchasing.
Table name: purchase
purchase_id buyer_member_id song_id
1 101 1001
2 101 1002
3 102 1001
4 102 1003
5 103 1001
6 103 1003
7 103 1004
Now I tried to make some stats out of this table. I want to know who has purchased both song 1001 and 1003.
select distinct buyer_member_id from purchase where
buyer_member_id in (select buyer_member_id from purchase where song_id = 1001)
and buyer_member_id in (select buyer_member_id from purchase where song_id = 1003)
This works, but as we add more and more criteria it becomes slower and slower. It's nearly impossible to run a search for something like: find people who bought a, b and c but not d or f. I understand that, given the nature of this, the use of "where someid in (select someid from table where something)" is probably not the best way to do it.
Question is, is there a better way?
I call these "set-within-a-set" queries, and like to approach them using group by and having:
select buyer_member_id
from purchase p
group by buyer_member_id
having sum(case when song_id = 1001 then 1 else 0 end) > 0 and
sum(case when song_id = 1003 then 1 else 0 end) > 0;
The sum() counts the number of purchases that match each song. The > 0 says there is at least 1. And = 0 would say there are none.
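To illustrate the = 0 case on the sample data, here is a small sketch that finds buyers of both 1001 and 1003 who did not buy 1004. It uses Python's sqlite3 module purely as a convenient way to run the query; the choice of 1004 as the excluded song is an assumption made for the example, and the HAVING clause itself is plain SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchase (purchase_id INT, buyer_member_id INT, song_id INT);
INSERT INTO purchase VALUES
  (1, 101, 1001), (2, 101, 1002),
  (3, 102, 1001), (4, 102, 1003),
  (5, 103, 1001), (6, 103, 1003), (7, 103, 1004);
""")

# One pass over the table: each SUM counts the rows for one song,
# "> 0" requires the song, "= 0" forbids it.
QUERY = """
SELECT buyer_member_id
FROM purchase
GROUP BY buyer_member_id
HAVING SUM(CASE WHEN song_id = 1001 THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN song_id = 1003 THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN song_id = 1004 THEN 1 ELSE 0 END) = 0
ORDER BY buyer_member_id
"""

buyers = [row[0] for row in conn.execute(QUERY)]
print(buyers)
```

Member 103 bought both required songs but is excluded because of the purchase of 1004; adding further criteria only adds more conditions to the HAVING clause, not more subqueries.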

SQL Server - cumulative sum on overlapping data - getting date that sum reaches a given value

In our company, our clients perform various activities that we log in different tables - Interview attendance, Course Attendance, and other general activities.
I have a database view that unions data from all of these tables giving us the ActivityView that looks like this.
As you can see some activities overlap - for example while attending an interview, a client may have been performing a CV update activity.
+----------------------+---------------+---------------------+-------------------+
| activity_client_id | activity_type | activity_start_date | activity_end_date |
+----------------------+---------------+---------------------+-------------------+
| 112 | Interview | 2015-06-01 09:00 | 2015-06-01 11:00 |
| 112 | CV updating | 2015-06-01 09:30 | 2015-06-01 11:30 |
| 112 | Course | 2015-06-02 09:00 | 2015-06-02 16:00 |
| 112 | Interview | 2015-06-03 09:00 | 2015-06-03 10:00 |
+----------------------+---------------+---------------------+-------------------+
Each client has a "Sign Up Date", recorded on the client table, which is when they joined our programme. Here it is for our sample client:
+-----------+---------------------+
| client_id | client_sign_up_date |
+-----------+---------------------+
| 112 | 2015-05-20 |
+-----------+---------------------+
I need to create a report that will show the following columns:
+-----------+---------------------+--------------------------------------------+
| client_id | client_sign_up_date | date_client_completed_5_hours_of_activity |
+-----------+---------------------+--------------------------------------------+
We need this report in order to see how effective our programme is. An important aim of the programme is that we get every client to complete at least 5 hours of activity as quickly as possible.
So this report will tell us how long from sign up does it take each client to achieve this figure.
What makes this even trickier is that when we calculate 5 hours of total activity, we must discount overlapping activities:
In the sample data above the client attended an interview between 09:00 and 11:00.
On the same day they also performed CV updating activity from 09:30 to 11:30.
For our calculation, this would give them total activity for the day of 2.5 hours (150 minutes) - we would only count 30 minutes of the CV updating as the Interview overlaps it up to 11:00.
So the report for our sample client would give the following result:
+-----------+---------------------+--------------------------------------------+
| client_id | client_sign_up_date | date_client_completed_5_hours_of_activity |
+-----------+---------------------+--------------------------------------------+
| 112 | 2015-05-20 | 2015-06-02 |
+-----------+---------------------+--------------------------------------------+
So my question is how can I create the report using a select statement ?
I can work out how to do this by writing a stored procedure that will loop through the view and write the result to a report table.
But I would much prefer to avoid a stored procedure and have a select statement that will give me the report on the fly.
I am using SQL Server 2005.
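For reference, the overlap rule and the expected result for the sample client can be checked outside the database. The following is a minimal Python sketch, not a replacement for the select statement being asked for; the helper names merge_intervals, time_goal_reached and ts are made up for the illustration.

```python
from datetime import datetime, timedelta

def ts(text):
    """Parse the 'YYYY-MM-DD HH:MM' timestamps used in the sample data."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M")

def merge_intervals(intervals):
    """Merge overlapping (start, end) pairs into disjoint, sorted intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it instead of adding.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def time_goal_reached(intervals, goal=timedelta(hours=5)):
    """Return the moment cumulative non-overlapping activity hits the goal."""
    done = timedelta()
    for start, end in merge_intervals(intervals):
        if done + (end - start) >= goal:
            return start + (goal - done)
        done += end - start
    return None  # goal never reached

activities = [
    (ts("2015-06-01 09:00"), ts("2015-06-01 11:00")),  # Interview
    (ts("2015-06-01 09:30"), ts("2015-06-01 11:30")),  # CV updating
    (ts("2015-06-02 09:00"), ts("2015-06-02 16:00")),  # Course
    (ts("2015-06-03 09:00"), ts("2015-06-03 10:00")),  # Interview
]

day_one = merge_intervals(activities[:2])
day_one_minutes = sum((e - s).total_seconds() for s, e in day_one) / 60
reached = time_goal_reached(activities)
print(day_one_minutes, reached)
```

The first day merges into a single 09:00 to 11:30 block of 150 minutes, and the remaining 150 minutes are completed during the course on 2015-06-02, which matches the expected report row.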
See SQL Fiddle here.
with tbl as (
-- this will generate daily merged overlapping time
select distinct
a.id
,(
select min(x.starttime)
from act x
where x.id=a.id and ( x.starttime between a.starttime and a.endtime
or a.starttime between x.starttime and x.endtime )
) start1
,(
select max(x.endtime)
from act x
where x.id=a.id and ( x.endtime between a.starttime and a.endtime
or a.endtime between x.starttime and x.endtime )
) end1
from act a
), tbl2 as
(
-- this will add minute and total minute column
select
*
,datediff(mi,t.start1,t.end1) mi
,(select sum(datediff(mi,x.start1,x.end1)) from tbl x where x.id=t.id and x.end1<=t.end1) totalmi
from tbl t
), tbl3 as
(
-- final query showing starttime and endtime for 5 hours, otherwise null if 5 hours (300 minutes) were not completed
select
t.id
,min(t.start1) starttime
,min(case when t.totalmi>300 then t.end1 else null end) endtime
from tbl2 t
group by t.id
)
-- final result
select *
from tbl3
where endtime is not null
This is one way to do it:
;WITH CTErn AS (
SELECT activity_client_id, activity_type,
activity_start_date, activity_end_date,
ROW_NUMBER() OVER (PARTITION BY activity_client_id
ORDER BY activity_start_date) AS rn
FROM activities
),
CTEdiff AS (
SELECT c1.activity_client_id, c1.activity_type,
x.activity_start_date, c1.activity_end_date,
DATEDIFF(mi, x.activity_start_date, c1.activity_end_date) AS diff,
ROW_NUMBER() OVER (PARTITION BY c1.activity_client_id
ORDER BY x.activity_start_date) AS seq
FROM CTErn AS c1
LEFT JOIN CTErn AS c2 ON c1.rn = c2.rn + 1
CROSS APPLY (SELECT CASE
WHEN c1.activity_start_date < c2.activity_end_date
THEN c2.activity_end_date
ELSE c1.activity_start_date
END) x(activity_start_date)
)
SELECT TOP 1 client_id, client_sign_up_date, activity_start_date,
hoursOfActivicty
FROM CTEdiff AS c1
INNER JOIN clients AS c2 ON c1.activity_client_id = c2.client_id
CROSS APPLY (SELECT SUM(diff) / 60.0
FROM CTEdiff AS c3
WHERE c3.seq <= c1.seq) x(hoursOfActivicty)
WHERE hoursOfActivicty >= 5
ORDER BY seq
Common Table Expressions and ROW_NUMBER() were introduced with SQL Server 2005, so the above query should work for that version.
Demo here
The first CTE, i.e. CTErn, produces the following output:
client_id activity_type start_date end_date rn
112 Interview 2015-06-01 09:00 2015-06-01 11:00 1
112 CV updating 2015-06-01 09:30 2015-06-01 11:30 2
112 Course 2015-06-02 09:00 2015-06-02 16:00 3
112 Interview 2015-06-03 09:00 2015-06-03 10:00 4
The second CTE, i.e. CTEdiff, uses the above table expression in order to calculate the time difference for each record, taking into consideration any overlaps with the previous record:
client_id activity_type start_date end_date diff seq
112 Interview 2015-06-01 09:00 2015-06-01 11:00 120 1
112 CV updating 2015-06-01 11:00 2015-06-01 11:30 30 2
112 Course 2015-06-02 09:00 2015-06-02 16:00 420 3
112 Interview 2015-06-03 09:00 2015-06-03 10:00 60 4
The final query calculates the cumulative sum of time difference and selects the first record that exceeds 5 hours of activity.
The above query will work for simple interval overlaps, i.e. when just the end date of an activity overlaps the start date of the next activity.
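To make that limitation concrete, the sketch below re-creates the previous-row rule from CTEdiff in Python and compares it with a full interval merge on a made-up nested case, where one long activity contains two shorter ones. The function names and the sample intervals are hypothetical and exist only for this illustration.

```python
from datetime import datetime

def ts(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M")

def previous_row_minutes(intervals):
    """Total minutes when each row is clipped only against the row before it."""
    total = 0.0
    prev_end = None
    for start, end in sorted(intervals):
        # Same rule as the CASE expression: start at the previous end
        # if this row begins before the previous row finished.
        if prev_end is not None and start < prev_end:
            effective_start = prev_end
        else:
            effective_start = start
        total += (end - effective_start).total_seconds() / 60
        prev_end = end
    return total

def merged_minutes(intervals):
    """Total minutes after merging all overlapping intervals."""
    total = 0.0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += (cur_end - cur_start).total_seconds() / 60
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += (cur_end - cur_start).total_seconds() / 60
    return total

# Hypothetical nested case: two short activities inside one long one.
nested = [
    (ts("2015-06-01 09:00"), ts("2015-06-01 12:00")),
    (ts("2015-06-01 09:30"), ts("2015-06-01 10:00")),
    (ts("2015-06-01 10:30"), ts("2015-06-01 11:00")),
]

naive = previous_row_minutes(nested)
correct = merged_minutes(nested)
print(naive, correct)
```

Here the second row ends before the first one does, so its clipped duration goes negative and the previous-row total comes out at 90 minutes instead of the 180 minutes actually covered.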
A Geometric Approach
For another issue, I've taken a geometric approach to date
packing. Namely, I convert dates and times to a sql geometry
type and utilize geometry::UnionAggregate to merge the ranges.
I don't believe this will work in sql-server 2005. But your
problem was such an interesting puzzle that I wanted to see
whether the geometrical approach would work. So any future
users running into this problem that have access to a later
version can consider it.
Code Description
In 'numbers':
I build a table representing a sequence
Swap it out with your favorite way to make a numbers table.
For a union operation, you won't ever need more rows than in
your original table, so I just use it as the base to build it.
In 'mergeLines':
I convert the dates to floats and use those floats
to create geometrical points.
I then connect these points via STUnion and STEnvelope.
Finally, I merge all these lines via UnionAggregate. The resulting
'lines' geometry object might contain multiple lines, but if they
overlap, they turn into one line.
In 'redate':
I use the numbers CTE to extract the individual lines inside 'lines'.
I envelope the lines which here ensures that the lines are stored
only as its two endpoints.
I read the endpoint x values and convert them back to their time
representations (This is usually the end goal, but you need more).
I calculate the difference in minutes between activity start and
end dates (I do this first in seconds then divide by 60 for the
sake of a precision issue).
I calculate the cumulative sum of these minutes for each row.
In the outer query:
I align the previous cumulative minutes sum with each current row
I filter for the row where the 5hr goal was met but where the
previous minutes shows that the 5hr goal for the previous row
was not met.
I then calculate where in the current row's range the user has
met the 5 hours, to not only arrive at the date the five hour
goal was met, but the exact time.
The Code
with
numbers as (
select row_number() over (order by (select null)) i
from #activities -- where I put your data
),
mergeLines as (
select activity_client_id,
lines = geometry::UnionAggregate(line)
from #activities
cross apply (select
startP = geometry::Point(convert(float,activity_start_date), 0, 0),
stopP = geometry::Point(convert(float,activity_end_date), 0, 0)
) pointify
cross apply (select line = startP.STUnion(stopP).STEnvelope()) lineify
group by activity_client_id
),
redate as (
select client_id = activity_client_id,
activities_start_date,
activities_end_date,
minutes,
rollingMinutes = sum(minutes) over(
partition by activity_client_id
order by activities_start_date
rows between unbounded preceding and current row
)
from mergeLines ml
join numbers n on n.i between 1 and ml.lines.STNumGeometries()
cross apply (select line = ml.lines.STGeometryN(i).STEnvelope()) l
cross apply (select
activities_start_date = convert(datetime, l.line.STPointN(1).STX),
activities_end_date = convert(datetime, l.line.STPointN(3).STX)
) unprepare
cross apply (select minutes =
round(datediff(s, activities_start_date, activities_end_date) / 60.0,0)
) duration
)
select client_id,
activities_start_date,
activities_end_date,
met_5hr_goal = dateadd(minute, (60 * 5) - prevRoll, activities_start_date)
from (
select *,
prevRoll = lag(rollingMinutes) over (
partition by client_id
order by rollingMinutes
)
from redate
) ranker
where rollingMinutes >= 60 * 5
and prevRoll < 60 * 5;

Resources