Grouping rows to minimise deviation - sql-server

I have an Employee Wages table like this, with each employee's EmpId and wages:
EmpId | Wages
================
101 | 1280
102 | 1600
103 | 1400
104 | 1401
105 | 1430
106 | 1300
I need to write a stored procedure in SQL Server that groups the employees by wage, so that people with similar salaries end up in the same group and the deviation within each group is as small as possible.
No other conditions or rules were given.
The output should look like this:
EmpId | Wages | Group
=======================
101 | 1280 | 1
106 | 1300 | 1
103 | 1400 | 2
104 | 1401 | 2
105 | 1430 | 2
102 | 1600 | 3

You can use a query like the following:
SELECT EmpId, Wages,
DENSE_RANK() OVER (ORDER BY CAST(Wages - t.min_wage AS INT) / 100) AS grp
FROM mytable
CROSS JOIN (SELECT MIN(Wages) AS min_wage FROM mytable) AS t
The query calculates the distance of each wage from the minimum wage and then uses integer division by 100 to place records in slices. All records that are 0 to 99 above the minimum wage go into the first slice, records that are 100 to 199 above it go into the second slice, and so on. DENSE_RANK then numbers the non-empty slices consecutively, so an empty slice does not leave a gap in the group numbers.
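The slicing arithmetic can be checked outside SQL Server with a short Python sketch that mirrors the query in memory. Python is used here only to illustrate the logic; `slice_groups` and its `width` parameter are hypothetical names, and integer wages are assumed, as in the sample table.

```python
def slice_groups(rows, width=100):
    """Return (emp_id, wage, group) tuples sorted by wage.

    Mirrors DENSE_RANK() OVER (ORDER BY (Wages - min_wage) / width):
    each wage falls into a fixed-width slice measured from the minimum
    wage, and the non-empty slices are numbered 1, 2, 3, ...
    """
    min_wage = min(wage for _, wage in rows)
    # Integer division: 0 for 0..99 above the minimum, 1 for 100..199, etc.
    slices = {emp: (wage - min_wage) // width for emp, wage in rows}
    # DENSE_RANK: consecutive numbers over the distinct slice values.
    rank = {s: i + 1 for i, s in enumerate(sorted(set(slices.values())))}
    return [(emp, wage, rank[slices[emp]])
            for emp, wage in sorted(rows, key=lambda r: r[1])]


wages = [(101, 1280), (102, 1600), (103, 1400), (104, 1401), (105, 1430), (106, 1300)]
for row in slice_groups(wages):
    print(row)
```

Because the slice boundaries are fixed relative to the minimum, two wages only 1 apart can still land in different groups if they straddle a boundary (for example 99 and 100 above the minimum).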

For a +/-30 deviation, you can do the following:
DECLARE @Tbl TABLE (EmpId INT, Wages INT)
INSERT INTO @Tbl
VALUES
(99, 99),
(100, 101),
(101, 1280),
(102, 1600),
(103, 1400),
(104, 1401),
(105, 1430),
(106, 1300)
;WITH CTE AS ( SELECT *, ROW_NUMBER() OVER (ORDER BY Wages) AS RowId FROM @Tbl )
SELECT
A.EmpId ,
A.Wages ,
DENSE_RANK() OVER (ORDER BY MIN(B.RowId)) [Group]
FROM
CTE A CROSS JOIN CTE B
WHERE
ABS(B.Wages - A.Wages) BETWEEN 0 AND 30 -- Here +-30
GROUP BY A.EmpId, A.Wages
ORDER BY A.Wages
Result:
EmpId Wages Group
----------- ----------- --------------------
99 99 1
100 101 1
101 1280 2
106 1300 2
103 1400 3
104 1401 3
105 1430 3
102 1600 4
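The self-join above can be mirrored in Python to see how the groups are formed. This is a sketch only; `neighbour_groups` and `tolerance` are hypothetical names. Each row is assigned the smallest row number among all rows within +/-30 of it, and those minimums are then dense-ranked.

```python
def neighbour_groups(rows, tolerance=30):
    """Return (emp_id, wage, group) tuples sorted by wage.

    Mirrors the CTE self-join: for each row A, take MIN(B.RowId) over all
    rows B with ABS(B.Wages - A.Wages) <= tolerance, then number the
    distinct minimums with DENSE_RANK.
    """
    ordered = sorted(rows, key=lambda r: r[1])
    # ROW_NUMBER() OVER (ORDER BY Wages), 1-based
    row_ids = {emp: i + 1 for i, (emp, _) in enumerate(ordered)}
    min_neighbour = {}
    for emp_a, wage_a in ordered:
        min_neighbour[emp_a] = min(
            row_ids[emp_b]
            for emp_b, wage_b in ordered
            if abs(wage_b - wage_a) <= tolerance
        )
    # DENSE_RANK() OVER (ORDER BY MIN(B.RowId))
    rank = {m: i + 1 for i, m in enumerate(sorted(set(min_neighbour.values())))}
    return [(emp, wage, rank[min_neighbour[emp]]) for emp, wage in ordered]


tbl = [(99, 99), (100, 101), (101, 1280), (102, 1600),
       (103, 1400), (104, 1401), (105, 1430), (106, 1300)]
for row in neighbour_groups(tbl):
    print(row)
```

One consequence of this design is that grouping is not transitive: with wages 1400, 1420 and 1440, the first two share a group but 1440 starts a new one, because the lowest row within 30 of it is 1420 rather than 1400.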

Related

Count number of sales by order amount

I'm using SQL Server 2008 R2 and doing an analysis on a table that contains CustomerID, OrderAmount and RegionID. I need to count the number of orders in different categories, based on OrderAmount, in each region. If there are no sales in a category, the count should be 0.
Sample of data:
CustomerID | OrderAmount | RegionID
10001 | 50 | 801
10002 | 25 | 801
10003 | 200 | 802
10001 | 100 | 802
10002 | 20 | 802
...
And my expected result is:
RegionID | CategoryID | Num_of_Sales
801 | 1 | 2 -----Below 100
801 | 2 | 0 -----100-200
802 | 1 | 2 -----Below 100
802 | 2 | 1 -----100-200
...
My questions are:
1. How do I return 0 for a category that is empty?
2. Is there a better way to write the code (without using UNION)?
WITH Category1 AS(
SELECT * FROM Sales_Table
WHERE NewAmount <= 100
)
, Category2 AS(
SELECT * FROM Sales_Table
WHERE NewAmount BETWEEN 101 AND 200
)
, [...]
SELECT Region_ID, CategoryID, Num_of_Sales
FROM (
SELECT Region_ID, COUNT(*) AS [Num_of_Sales], 1 AS CategoryID
FROM Category1
GROUP BY Region_ID
UNION
SELECT Region_ID, COUNT(*) AS [Num_of_Sales], 2 AS CategoryID
FROM Category2
GROUP BY Region_ID
UNION
[...]
)z
ORDER BY Region_ID, CategoryID
I used this code and got a result, but the count did not return 0 for the 100-200 category in region 801.
A table holding RegionID and CategoryID is needed for what you are trying to achieve. Then we can use that table to do a join as shown below.
With RegCatSales as
(
select RegionID,C,COUNT(*) AS [Num_of_Sales]
from
(
select RegionID,OrderAmount,
CASE
WHEN OrderAmount <= 100 THEN 1
WHEN OrderAmount BETWEEN 101 AND 200 THEN 2
END as C
from Sales_Table x
) xx
group by RegionID, C
),
Regions as
(
select distinct RegionID from RegCatSales
),
Categories as
(
select distinct C from RegCatSales
),
RegCat AS(
select RegionID, C as CategoryID from Regions CROSS JOIN Categories
)
select rc.RegionID,rc.CategoryID, ISNULL([Num_of_Sales],0) NUM_Of_Sales from
RegCatSales rcs
right join RegCat rc
on rc.RegionID= rcs.RegionID and rc.CategoryID = rcs.C
order by rc.RegionID, rc.CategoryID
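The zero-filling idea (build every region/category pair, then left join the counts and replace missing counts with 0) can be sketched in Python. Unlike the answer, which derives the category list from the data, this sketch takes an explicit `categories` tuple; that is an assumption, and it also keeps a category that has no sales in any region. `category` and `sales_by_category` are hypothetical names.

```python
from collections import Counter
from itertools import product


def category(amount):
    """Same buckets as the CASE expression in the answer."""
    if amount <= 100:
        return 1
    if 101 <= amount <= 200:
        return 2
    return None  # the CASE has no ELSE, so other amounts get NULL


def sales_by_category(orders, categories=(1, 2)):
    """orders: (customer_id, order_amount, region_id) tuples.

    Returns (region_id, category_id, num_of_sales) for every region and
    category, with 0 where there were no sales.
    """
    # GROUP BY RegionID, C
    counts = Counter((region, category(amount)) for _, amount, region in orders)
    regions = sorted({region for _, _, region in orders})
    # CROSS JOIN regions x categories, LEFT JOIN counts, ISNULL(count, 0)
    return [(r, c, counts.get((r, c), 0)) for r, c in product(regions, categories)]


orders = [(10001, 50, 801), (10002, 25, 801), (10003, 200, 802),
          (10001, 100, 802), (10002, 20, 802)]
for row in sales_by_category(orders):
    print(row)
```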

How to find first positive value and third consecutive positive values in SQL?

I have tried using ROW_NUMBER but haven't quite got it. Any ideas on the best way to achieve this?
I am looking to find:
- In which month did each office first have positive cash flow?
- In which month did each office average 3 months of positive cash flow?
Sample Data:
Office ,Balance & Year month
------------------------------
| Office | Balance | YrMo |
| 12 | 111 | 201510 |
| 12 | 222 | 201511 |
| 12 | -444 | 201512 |
| 12 | -777 | 201601 |
| 12 | 555 | 201602 |
| 12 | 666 | 201603 |
| 12 | -888 | 201604 |
| 12 | 777 | 201605 |
| 40 | -555 | 201510 |
| 40 | -200 | 201511 |
| 40 | 0 | 201512 |
| 40 | 100 | 201601 |
| 40 | -555 | 201602 |
| 40 | 666 | 201603 |
| 40 | 777 | 201604 |
| 40 | 888 | 201605 |
| 40 | 999 | 201606 |
The first Positive Balances would be:
-office 12 , Balance 111 , YrMo 201510
-office 40 , Balance 100 , YrMo 201601
The first month the office averaged 3 positive balances:
-office 40 , Balance 999 , YrMo 201606
Here is the #test table script:
IF OBJECT_ID('tempdb..#test') IS NOT NULL
DROP TABLE #test
GO
CREATE TABLE #test (office INT , Balance INT, YrMo INT ) ;
INSERT INTO #test VALUES (12 , 111 , 201510) ;
INSERT INTO #test VALUES (12 , 222 , 201511) ;
INSERT INTO #test VALUES (12 , -444 , 201512) ;
INSERT INTO #test VALUES (12 , -777 , 201601) ;
INSERT INTO #test VALUES (12 , 555 , 201602) ;
INSERT INTO #test VALUES (12 , 666 , 201603) ;
INSERT INTO #test VALUES (12 , -888 , 201604) ;
INSERT INTO #test VALUES (12 , 777 , 201605) ;
INSERT INTO #test VALUES (40 , -555 , 201510) ;
INSERT INTO #test VALUES (40 , -200 , 201511) ;
INSERT INTO #test VALUES (40 , 0 , 201512) ;
INSERT INTO #test VALUES (40 , 100 , 201601) ;
INSERT INTO #test VALUES (40 , -555 , 201602) ;
INSERT INTO #test VALUES (40 , 666 , 201603) ;
INSERT INTO #test VALUES (40, 777 , 201604) ;
INSERT INTO #test VALUES (40 , 888 , 201605) ;
INSERT INTO #test VALUES (40 , 999 , 201606) ;
Thanks in advance
;with cteFirst as (
Select *
,FirstPos=Row_Number() over (Partition By Office Order By YrMo,Balance) from #Test Where Balance>0
),
cteCons as (
Select *
,TestCons=Lag(IIf(balance>0,1,0),1,0) over (Partition By Office Order By YrMo)
+Lag(IIf(balance>0,1,0),2,0) over (Partition By Office Order By YrMo)
+Lag(IIf(balance>0,1,0),3,0) over (Partition By Office Order By YrMo)
from #Test
)
Select *,Status='First Positive' from cteFirst where FirstPos=1
Union All
Select *,Status='3 Cons' from cteCons where TestCons=3
Returns:
office Balance YrMo FirstPos Status
12 111 201510 1 First Positive
40 100 201601 1 First Positive
40 999 201606 3 3 Cons
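The two flags can be restated in Python to make the window logic explicit. This is a sketch, not the answer's code, and `flag_offices` is a hypothetical name. ROW_NUMBER over the positive balances picks the first positive month, and the sum of LAG 1..3 (default 0) equals 3 only when the three preceding rows were all positive.

```python
from itertools import groupby


def flag_offices(rows):
    """rows: (office, balance, yrmo) tuples.

    Returns (office, balance, yrmo, status) for each office's first
    positive month, and for every month whose three preceding rows
    (in YrMo order) all had a positive balance.
    """
    results = []
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    for _, grp in groupby(ordered, key=lambda r: r[0]):
        grp = list(grp)
        # IIf(balance > 0, 1, 0) for each row of this office
        pos = [1 if balance > 0 else 0 for _, balance, _ in grp]
        first = next((i for i, p in enumerate(pos) if p), None)
        if first is not None:
            results.append(grp[first] + ("First Positive",))
        for i in range(len(grp)):
            # LAG(x, k, 0) for k = 1..3: rows before the partition start count as 0
            lagged = sum(pos[i - k] if i - k >= 0 else 0 for k in (1, 2, 3))
            if lagged == 3:
                results.append(grp[i] + ("3 Cons",))
    return results


test_rows = [
    (12, 111, 201510), (12, 222, 201511), (12, -444, 201512), (12, -777, 201601),
    (12, 555, 201602), (12, 666, 201603), (12, -888, 201604), (12, 777, 201605),
    (40, -555, 201510), (40, -200, 201511), (40, 0, 201512), (40, 100, 201601),
    (40, -555, 201602), (40, 666, 201603), (40, 777, 201604), (40, 888, 201605),
    (40, 999, 201606),
]
for row in flag_offices(test_rows):
    print(row)
```

Like the query, this flags every qualifying month rather than only the first, and the current month's own balance is not part of the three-month test.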
I added another example. This one catches gaps in the dates.
If you want to see all the flags and how the data progresses, remove
the -- before Select * from cteFinal Order by Office,YrMo
I added another office that has 3 positive balances in a row, but in months that are NOT consecutive (there is no June). Notice that Office 99 fails to meet the consecutive-months criterion.
office Balance YrMo
99 199 201605
99 299 201607
99 399 201608
The updated query is as follows
;with cteBase as (
Select *
,RowNr = Row_Number() over (Partition By Office Order By Office,YrMo,Balance)
,MthSeq = case when cast(YrMo as int)-Lag(YrMo,1,YrMo-1) over (Partition By Office Order By YrMo) in (1,89) then 1 else 0 end
,IsPos = IIf(Balance>0,1,null)
from #Test
)
,cteFinal as (
Select *
,PosRowNr = min(RowNr*IsPos) over (Partition By Office Order By RowNr)
,TestCons = MthSeq * (
Lag(IIf(balance>0,1,0),1,0) over (Partition By Office Order By YrMo)
+Lag(IIf(balance>0,1,0),2,0) over (Partition By Office Order By YrMo)
+Lag(IIf(balance>0,1,0),3,0) over (Partition By Office Order By YrMo)
)
From cteBase
)
--Select * from cteFinal Order by Office,YrMo
Select Office
,Balance
,YrMo
,Status = IIf(RowNr=PosRowNr,'First Positive','')+IIf(TestCons=3,'Consecutive Months','')
From cteFinal
Where TestCons=3 or RowNr=PosRowNr
Order by Status Desc,Office,YrMo
The Results are
Office Balance YrMo Status
12 111 201510 First Positive
40 100 201601 First Positive
99 199 201605 First Positive
40 999 201606 Consecutive Months
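The `in (1,89)` test relies on YrMo being a YYYYMM integer: consecutive months in the same year differ by 1, and December to January differs by 89 (for example 201601 - 201512). A Python sketch comparing that trick with the more explicit year * 12 + month index; both helper names are hypothetical.

```python
def month_index(yrmo):
    """201605 -> 2016 * 12 + 5, so consecutive months always differ by 1."""
    year, month = divmod(yrmo, 100)
    return year * 12 + month


def is_next_month(prev_yrmo, yrmo):
    return month_index(yrmo) - month_index(prev_yrmo) == 1


def is_next_month_sql_style(prev_yrmo, yrmo):
    # The query's trick: raw difference of 1 (same year) or 89 (Dec -> Jan)
    return yrmo - prev_yrmo in (1, 89)


print(is_next_month(201512, 201601))  # December -> January
print(is_next_month(201605, 201607))  # June is missing
```

For valid YYYYMM values the two checks agree: a raw difference of 89 can only come from December of one year to January of the next.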

Sum one column and subtract over second column

I want to display the result of subtracting two columns: I need the sum of all values in the first column, minus the running total of the second column.
This is the table structure:
id | name | col1 | col2 | date
------------------------------------
432| xxx | 0 | 15 |2015-11-17
432| yyy | 10 | 30 |2015-11-19
432| zzz | 60 | 40 |2015-11-20
433| aaa | 0 | 60 |2015-11-17
433| bbb | 80 | 20 |2015-11-19
433| ccc | 60 | 10 |2015-11-20
Formula should go:
sum(col1) = 70 (WHERE ID = 432)
70 - running sum(col2) = col3
-------------------------
70 - 15 = 55
70 - (30 + 15) = 25
70 - (40 + 45) = -15
-------------------------
sum(col1) = 140 (WHERE ID = 433)
140 - running sum(col2) = col3
-------------------------
140 - 60 = 80
140 - (60 + 20) = 60
140 - (10 + 80) = 50
The result is col3, and the output should look like this:
id | name | col1 | col2 | col3 | date
-------------------------------------------
432| xxx | 0 | 15 | 55 | 2015-11-17
432| yyy | 10 | 30 | 25 | 2015-11-19
432| zzz | 60 | 40 | -15 | 2015-11-20
433| aaa | 0 | 60 | 80 | 2015-11-17
433| bbb | 80 | 20 | 60 | 2015-11-19
433| ccc | 60 | 10 | 50 | 2015-11-20
EDIT: What if I need the values to be calculated separately for each group, i.e. for each id (432 and 433)?
Schema Info
DECLARE @TEST TABLE
(
id INT,
name VARCHAR(10),
col1 INT,
col2 INT,
[date] DATE
)
INSERT INTO @TEST VALUES
(432,'xxx',0, 15, '2015-11-17'),
(432,'yyy',10, 30, '2015-11-19'),
(432,'zzz',60, 40, '2015-11-20'),
(433,'aaa',0, 60, '2015-11-17'),
(433,'bbb',80, 20, '2015-11-19'),
(433,'ccc',60, 10, '2015-11-20')
Query
SELECT T.id ,
T.name ,
T.col1 ,
T.col2 ,
-- total of col1 per id, minus the running total of col2 ordered by date (ORDER BY id alone leaves the row order arbitrary)
SUM(T.col1) OVER (PARTITION BY T.id)
- SUM(T.col2) OVER (PARTITION BY T.id ORDER BY T.[date] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS col3
FROM @TEST T;
Results
id | name | col1 | col2 | col3 |
---------------------------------
432 | xxx | 0 | 15 | 55 |
432 | yyy | 10 | 30 | 25 |
432 | zzz | 60 | 40 | -15 |
433 | aaa | 0 | 60 | 80 |
433 | bbb | 80 | 20 | 60 |
433 | ccc | 60 | 10 | 50 |
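The per-id calculation (total of col1 for the id, minus a running total of col2 ordered by date) can be sketched in Python to check the expected col3 values. `col3_per_id` is a hypothetical name, and ISO date strings are assumed so that they sort chronologically.

```python
from itertools import groupby


def col3_per_id(rows):
    """rows: (id, name, col1, col2, date) tuples with ISO date strings.

    Returns the rows with col3 inserted before date, where
    col3 = SUM(col1) for the id - running SUM(col2) up to this row.
    """
    out = []
    ordered = sorted(rows, key=lambda r: (r[0], r[4]))
    for _, grp in groupby(ordered, key=lambda r: r[0]):
        grp = list(grp)
        total = sum(r[2] for r in grp)  # SUM(col1) OVER (PARTITION BY id)
        running = 0
        for r in grp:
            running += r[3]  # SUM(col2) ... ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
            out.append((r[0], r[1], r[2], r[3], total - running, r[4]))
    return out


rows = [(432, 'xxx', 0, 15, '2015-11-17'), (432, 'yyy', 10, 30, '2015-11-19'),
        (432, 'zzz', 60, 40, '2015-11-20'), (433, 'aaa', 0, 60, '2015-11-17'),
        (433, 'bbb', 80, 20, '2015-11-19'), (433, 'ccc', 60, 10, '2015-11-20')]
for row in col3_per_id(rows):
    print(row)
```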
This should work:
declare @total int = (select sum(col1) from [Table])
select id, name, col1, col2, @total - (select sum(col2) from [Table] where date <= T.date) as col3, date from [Table] T
I assumed you want to subtract, each time, the running total of col2 up to the current row's date. I hope this is OK.
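You can use a simple SELECT query with CROSS APPLY:
SELECT ID
,NAME
,COL1
,COL2
,A.C1 - (
SUM(COL2) OVER (
ORDER BY ID, [DATE] ROWS UNBOUNDED PRECEDING -- running total; ORDER BY ID alone gives rows with the same ID the same sum
)
) AS COL3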
FROM TABLE1 T1
CROSS APPLY (
SELECT SUM(COL1) AS C1
FROM TABLE1 T2
) A
You can use two subqueries in the SELECT list.
The first gets the sum of col1 over all rows of your table (named yourtable here); the second gets the sum of col2 over all rows up to and including the current one. Then you subtract the two values.
Try this:
SELECT T.id, T.name, T.col1, T.col2,
ISNULL(
(SELECT SUM(T2.col1) FROM yourtable T2)
,0) -
ISNULL(
(SELECT SUM(T3.col2) FROM yourtable T3
WHERE T3.id <= T.id)
,0) as col3,
t.date
FROM yourtable T
EDIT
SELECT T.id, T.name, T.col1, T.col2,
ISNULL(
(SELECT SUM(T2.col1) FROM yourtable T2 where T2.id = T.id)
,0) -
ISNULL(
(SELECT SUM(T3.col2) FROM yourtable T3
WHERE T3.id = T.id AND T3.date <= T.date)
,0) as col3,
t.date
FROM yourtable T
Note: a substantial edit can turn this into a different question; the two queries are different.
Note: it's not good practice to have a column named ID with repeated values.

SQL Query to fill missing gaps across time and get last non-null value

I have the following table in my database:
Month|Year | Value
1 |2013 | 100
4 |2013 | 101
8 |2013 | 102
2 |2014 | 103
4 |2014 | 104
How can I fill in "missing" rows from the data, so that if I query from 2013-03 through 2014-03, I would get:
Month|Year | Value
3 |2013 | 100
4 |2013 | 101
5 |2013 | 101
6 |2013 | 101
7 |2013 | 101
8 |2013 | 102
9 |2013 | 102
10 |2013 | 102
11 |2013 | 102
12 |2013 | 102
1 |2014 | 102
2 |2014 | 103
3 |2014 | 103
As you can see I want to repeat the previous Value for a missing row.
I have created a SQL Fiddle of this solution for you to play with.
Essentially, it creates a work table @Months and cross joins it with all years in your data set. This produces a complete list of all months for all years. I then left join the test data from your example (a table named TEST; see the SQL Fiddle for the schema) back onto this list, which gives a complete list with values for the months that have them. The next issue was using the last month's value when a month has none. For that, I used a correlated subquery, i.e. joined tblValues back onto itself only where it matched the maximum Rank of an earlier row that has a value. This gives a complete result set.
If you want to filter by year/month, you can add a WHERE clause just before the final ORDER BY.
Enjoy!
Test Schema
CREATE TABLE TEST( Month tinyint, Year int, Value int)
INSERT INTO TEST(Month, Year, Value)
VALUES
(1,2013,100),
(4,2013,101),
(8,2013,102),
(2,2014,103),
(4,2014,104)
Query
DECLARE @Months Table(Month tinyint)
Insert into @Months(Month) Values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12);
With tblValues as (
select Rank() Over (ORDER BY y.Year, m.Month) as [Rank],
m.Month,
y.Year,
t.Value
from @Months m
CROSS JOIN ( Select Distinct Year from Test ) y
LEFT JOIN Test t on t.Month = m.Month and t.Year = y.Year
)
Select t.Month, t.Year, COALESCE(t.Value, t1.Value) as Value
from tblValues t
left join tblValues t1 on t1.Rank = (
Select Max(tmax.Rank)
From tblValues tmax
Where tmax.Rank < t.Rank AND tmax.Value is not null)
Order by t.Year, t.Month
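The same gap-filling idea (build the full month calendar, then carry the last non-null value forward) can be sketched in Python with a hypothetical `fill_months` helper, which assumes the data has at least one row. Starting the scan at the earliest month in the data matters: the value for 2013-03 comes from 2013-01, which lies before the requested range.

```python
def fill_months(rows, start, end):
    """rows: (month, year, value) tuples; start/end: (year, month), inclusive.

    Emits (month, year, value) for every month in [start, end], repeating
    the last known value for months that have no row.
    """
    known = {(year, month): value for month, year, value in rows}
    # Begin at the earliest data month so a value before `start` can carry forward.
    year, month = min(min(known), start)
    last = None
    out = []
    while (year, month) <= end:
        last = known.get((year, month), last)  # keep the previous value if missing
        if (year, month) >= start:
            out.append((month, year, last))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return out


data = [(1, 2013, 100), (4, 2013, 101), (8, 2013, 102), (2, 2014, 103), (4, 2014, 104)]
for row in fill_months(data, (2013, 3), (2014, 3)):
    print(row)
```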

The highest value from list-distinct

Can anyone help me with a query? I have this table:
vendorid, agreementid, sales
12001 1004 700
5291 1004 20576
7596 1004 1908
45 103 345
41 103 9087
What is the goal?
When agreementid > 1, show me the row where sales is the highest:
vendorid agreementid sales
5291 1004 20576
41 103 9087
Any ideas?
Thanks
Well, you could try using a CTE and ROW_NUMBER, something like:
;WITH Vals AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY AgreementID ORDER BY Sales DESC) RowID
FROM MyTable
WHERE AgreementID > 1
)
SELECT *
FROM Vals
WHERE RowID = 1
This avoids returning multiple records per agreement when two rows share the highest sales value.
If returning those ties is OK, you could try something like:
SELECT *
FROM MyTable mt INNER JOIN
(
SELECT AgreementID, MAX(Sales) MaxSales
FROM MyTable
WHERE AgreementID > 1
GROUP BY AgreementID
) MaxVals ON mt.AgreementID = MaxVals.AgreementID AND mt.Sales = MaxVals.MaxSales
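The difference between the two queries above is how ties are handled. A Python sketch of both behaviours, using a hypothetical `top_sales` helper: with `with_ties=True` it keeps every row matching the maximum, like the MAX/JOIN query; with `with_ties=False` it keeps one row per agreement, like ROW_NUMBER() = 1. Where SQL Server may pick any of the tied rows, this sketch keeps the first in input order.

```python
def top_sales(rows, with_ties=True):
    """rows: (vendorid, agreementid, sales) tuples.

    Keeps agreements with agreementid > 1 and, per agreement, the row(s)
    with the highest sales.
    """
    best = {}
    for _, agreement, sales in rows:
        if agreement > 1:
            best[agreement] = max(best.get(agreement, sales), sales)
    out, seen = [], set()
    for vendor, agreement, sales in rows:
        if agreement > 1 and sales == best[agreement]:
            # Without ties, only the first matching row per agreement is kept.
            if with_ties or agreement not in seen:
                out.append((vendor, agreement, sales))
                seen.add(agreement)
    return out


sales_rows = [(12001, 1004, 700), (5291, 1004, 20576), (7596, 1004, 1908),
              (45, 103, 345), (41, 103, 9087)]
print(top_sales(sales_rows))
```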
SELECT TOP 1 WITH TIES *
FROM MyTable
ORDER BY DENSE_RANK() OVER(PARTITION BY agreementid ORDER BY SIGN (SIGN (agreementid - 2) + 1) * sales DESC)
Explanation
We break table MyTable into partitions by agreementid.
For each partition we construct a ranking of its rows.
If agreementid is greater than 1, the ranking is equivalent to ORDER BY sales DESC.
Otherwise every row in the partition gets the same rank, as if ordered by ORDER BY 0 DESC.
Here is what it looks like:
SELECT *
, SIGN (SIGN (agreementid - 2) + 1) * sales AS x
, DENSE_RANK() OVER(PARTITION BY agreementid ORDER BY SIGN (SIGN (agreementid - 2) + 1) * sales DESC) AS rnk
FROM MyTable
+----------+-------------+-------+-------+-----+
| vendorid | agreementid | sales | x | rnk |
+----------|-------------|-------+-------+-----+
| 0 | 0 | 3 | 0 | 1 |
| -1 | 0 | 7 | 0 | 1 |
| 0 | 1 | 3 | 0 | 1 |
| -1 | 1 | 7 | 0 | 1 |
| 41 | 103 | 9087 | 9087 | 1 |
| 45 | 103 | 345 | 345 | 2 |
| 5291 | 1004 | 20576 | 20576 | 1 |
| 7596 | 1004 | 1908 | 1908 | 2 |
| 12001 | 1004 | 700 | 700 | 3 |
+----------+-------------+-------+-------+-----+
Then using TOP 1 WITH TIES construction we leave only rows where rnk equals 1.
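The SIGN(SIGN(agreementid - 2) + 1) factor is 1 for integer agreementid values of 2 or more and 0 otherwise, which is what makes every row of an agreementid <= 1 partition tie at rank 1. A small Python sketch (hypothetical helper names) reproducing the rnk = 1 rows of the table above:

```python
def sign(x):
    # Same convention as T-SQL SIGN for numbers: -1, 0 or 1
    return (x > 0) - (x < 0)


def multiplier(agreementid):
    # SIGN(SIGN(agreementid - 2) + 1): 1 when agreementid >= 2, else 0
    return sign(sign(agreementid - 2) + 1)


def top_with_ties(rows):
    """rows: (vendorid, agreementid, sales) tuples.

    Keeps rows whose key multiplier * sales equals the maximum key in
    their agreementid partition, i.e. DENSE_RANK() = 1 under ORDER BY key DESC.
    """
    best = {}
    for _, agreement, sales in rows:
        key = multiplier(agreement) * sales
        best[agreement] = max(best.get(agreement, key), key)
    return [r for r in rows if multiplier(r[1]) * r[2] == best[r[1]]]


rows = [(0, 0, 3), (-1, 0, 7), (0, 1, 3), (-1, 1, 7), (41, 103, 9087),
        (45, 103, 345), (5291, 1004, 20576), (7596, 1004, 1908), (12001, 1004, 700)]
print(top_with_ties(rows))
```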
You can try this:
SELECT TOP 1 sales FROM MyTable WHERE agreementid > 1 ORDER BY sales DESC
I really do not know the business logic behind agreement_id > 1. It looks to me like you want the max sales (with ties) by agreement id regardless of vendor_id.
First, let's create a simple sample database.
-- Sample table
create table #sales
(
vendor_id int,
agreement_id int,
sales_amt money
);
-- Sample data
insert into #sales values
(12001, 1004, 700),
(5291, 1004, 20576),
(7596, 1004, 1908),
(45, 103, 345),
(41, 103, 9087);
Second, let's solve this problem using a common table expression to get a result set that has each row paired with the max sales by agreement id.
The select statement just applies the business logic to filter the data to get your answer.
-- CTE = max sales for each agreement id
;
with cte_sales as
(
select
vendor_id,
agreement_id,
sales_amt,
max(sales_amt) OVER(PARTITION BY agreement_id) AS max_sales
from
#sales
)
-- Filter by your business logic
select * from cte_sales where sales_amt = max_sales and agreement_id > 1;
The screen shot below shows the exact result you wanted.

Resources