Group by and count appearances in other column - sql-server

My table looks like this:
----------------------------------------------
|id | action | building | date |
----------------------------------------------
|1 | IN | 1000 | 01-01-2015 |
|2 | OUT | 1000 | 01-01-2015 |
|3 | OUT | 1000 | 05-01-2015 |
|4 | IN | 2000 | 01-01-2015 |
----------------------------------------------
I would like to group the result by building and count how many IN and OUT actions exist. Date and id don't matter in the result. The result should look like:
-------------------------
| Building | IN | OUT |
-------------------------
| 1000 | 1 | 2 |
| 2000 | 1 | 0 |
-------------------------
The action column can only contain IN and OUT.
My best attempt is:
select distinct (action), building, count(*)
from table
group by action, building
Output:
-------------------------------------
| action | Building | count(*) |
-------------------------------------
| IN | 1000 | 1 |
| OUT | 1000 | 2 |
| IN | 2000 | 1 |
-------------------------------------

Do it with conditional aggregation:
select Building,
sum(case when action = 'IN' then 1 else 0 end) as [IN],
sum(case when action = 'OUT' then 1 else 0 end) as [OUT]
from TableName
group by Building

You need to use conditional aggregation:
select building,
count(CASE WHEN action = 'IN' THEN 1 END) AS 'IN',
count(CASE WHEN action = 'OUT' THEN 1 END) AS 'OUT'
from table
group by building
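
To check conditional aggregation against the question's sample data, here is a minimal T-SQL sketch. The table variable name @actions is hypothetical, and the dates are assumed to be day-month-year (the date format does not affect the counts).

```sql
-- Sample data from the question, held in a table variable.
-- @actions is a hypothetical name; dates are assumed to be day-month-year.
DECLARE @actions TABLE (
    id       int,
    action   varchar(3),
    building int,
    [date]   date
);

INSERT INTO @actions (id, action, building, [date]) VALUES
    (1, 'IN',  1000, '20150101'),
    (2, 'OUT', 1000, '20150101'),
    (3, 'OUT', 1000, '20150105'),
    (4, 'IN',  2000, '20150101');

-- One output row per building; each SUM adds 1 only for rows matching its action.
SELECT building,
       SUM(CASE WHEN action = 'IN'  THEN 1 ELSE 0 END) AS [IN],
       SUM(CASE WHEN action = 'OUT' THEN 1 ELSE 0 END) AS [OUT]
FROM @actions
GROUP BY building
ORDER BY building;

-- building | IN | OUT
-- 1000     | 1  | 2
-- 2000     | 1  | 0
```

Building 2000 still gets an OUT value of 0 because the CASE expression is evaluated for every row in the group, not only for rows that match.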

Related

Sum Consecutive Months Based on Groups with Criteria

I am having trouble narrowing down sales in top regions that occurred in consecutive months. I know I need to use some form of window function with Row_Number or Dense_Rank, but I am having trouble getting the final output.
Here is my source data:
+--------+-----------+------------+
| Fruit | SaleDate | Top_Region |
+--------+-----------+------------+
| Apple | 1/1/2017 | 1 |
| Apple | 2/1/2017 | 1 |
| Apple | 3/1/2017 | 1 |
| Apple | 4/1/2017 | 0 |
| Apple | 5/1/2017 | 0 |
| Apple | 6/1/2017 | 0 |
| Apple | 7/1/2017 | 1 |
| Apple | 8/1/2017 | 1 |
| Apple | 9/1/2017 | 1 |
| Apple | 10/1/2017 | 1 |
| Apple | 11/1/2017 | 0 |
| Apple | 12/1/2017 | 0 |
| Banana | 1/1/2017 | 0 |
| Banana | 2/1/2017 | 0 |
| Banana | 3/1/2017 | 1 |
| Banana | 4/1/2017 | 1 |
| Banana | 5/1/2017 | 1 |
| Banana | 6/1/2017 | 1 |
| Banana | 7/1/2017 | 1 |
| Banana | 8/1/2017 | 1 |
| Banana | 9/1/2017 | 0 |
| Banana | 10/1/2017 | 1 |
| Banana | 11/1/2017 | 1 |
| Banana | 12/1/2017 | 0 |
+--------+-----------+------------+
This is the expected output:
+--------+-----------+-----------+-------+
| Fruit | Start | End | Total |
+--------+-----------+-----------+-------+
| Apple | 1/1/2017 | 3/1/2017 | 3 |
| Apple | 7/1/2017 | 10/1/2017 | 4 |
| Banana | 3/1/2017 | 8/1/2017 | 6 |
| Banana | 10/1/2017 | 11/1/2017 | 2 |
+--------+-----------+-----------+-------+
The goal is to find each run of consecutive months with top region sales; a month without a top region sale ends the run.
So far I have tried a few different combinations, with this being the closest.
SELECT fruit,
MIN(saledate) AS spanStart ,
MAX(saledate) AS spanEnd,
COUNT(*) AS spanLength
FROM ( SELECT s.* ,
( ROW_NUMBER() OVER ( ORDER BY month )
- ROW_NUMBER() OVER ( PARTITION BY fruit, topregion ORDER BY month ) ) AS fruits
FROM #salesdata s
) s
GROUP BY fruit,fruits ,
topregion
HAVING topregion = 1
ORDER BY COUNT(*) DESC;
Any help would be greatly appreciated

This is a typical gaps-and-islands problem. One strategy is to identify the groups of adjacent rows by computing the difference between two row_number()s. We can then filter on groups having top_region = 1 and use aggregation to get the start date, end date and number of records per group.
Your query is really close, but the first row_number() is missing a partition by fruit in its over() clause. I also find that aliasing that column fruits when another column is called fruit is error-prone.
select
fruit,
min(sale_date) start_date,
max(sale_date) end_date,
count(*) total
from (
select
t.*,
row_number() over(partition by fruit order by sale_date) rn1,
row_number() over(partition by fruit, top_region order by sale_date) rn2
from mytable t
) t
where top_region = 1
group by fruit, rn1 - rn2
order by fruit, start_date
You can run the inner query separately to see the result it produces.
Demo on DB Fiddle:
fruit | start_date | end_date | total
:----- | :--------- | :--------- | ----:
Apple | 2017-01-01 | 2017-01-03 | 3
Apple | 2017-01-07 | 2017-01-10 | 4
Banana | 2017-01-03 | 2017-01-08 | 6
Banana | 2017-01-10 | 2017-01-11 | 2
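
To see why the difference of the two row numbers identifies each island, here is a minimal sketch of the inner query run on the Apple rows only. The table variable @sales is a hypothetical name, the column names follow the answer's sale_date / top_region spelling, and the dates are assumed to be the first day of each month.

```sql
-- Apple rows from the question. @sales is a hypothetical name;
-- dates are assumed to be the first day of each month of 2017.
DECLARE @sales TABLE (fruit varchar(10), sale_date date, top_region int);

INSERT INTO @sales (fruit, sale_date, top_region) VALUES
    ('Apple', '20170101', 1), ('Apple', '20170201', 1), ('Apple', '20170301', 1),
    ('Apple', '20170401', 0), ('Apple', '20170501', 0), ('Apple', '20170601', 0),
    ('Apple', '20170701', 1), ('Apple', '20170801', 1), ('Apple', '20170901', 1),
    ('Apple', '20171001', 1), ('Apple', '20171101', 0), ('Apple', '20171201', 0);

-- rn1 numbers all rows of a fruit; rn2 restarts per (fruit, top_region).
-- Within one unbroken run both counters advance together, so rn1 - rn2 is constant.
SELECT t.fruit, t.sale_date, t.top_region, t.rn1, t.rn2, t.rn1 - t.rn2 AS grp
FROM (
    SELECT fruit, sale_date, top_region,
           ROW_NUMBER() OVER (PARTITION BY fruit ORDER BY sale_date) AS rn1,
           ROW_NUMBER() OVER (PARTITION BY fruit, top_region ORDER BY sale_date) AS rn2
    FROM @sales
) AS t
ORDER BY t.sale_date;

-- sale_date  | top_region | rn1 | rn2 | grp
-- 2017-01-01 | 1          | 1   | 1   | 0
-- 2017-02-01 | 1          | 2   | 2   | 0
-- 2017-03-01 | 1          | 3   | 3   | 0
-- 2017-04-01 | 0          | 4   | 1   | 3
-- 2017-05-01 | 0          | 5   | 2   | 3
-- 2017-06-01 | 0          | 6   | 3   | 3
-- 2017-07-01 | 1          | 7   | 4   | 3
-- 2017-08-01 | 1          | 8   | 5   | 3
-- 2017-09-01 | 1          | 9   | 6   | 3
-- 2017-10-01 | 1          | 10  | 7   | 3
-- 2017-11-01 | 0          | 11  | 4   | 7
-- 2017-12-01 | 0          | 12  | 5   | 7
```

Note that grp = 3 appears both for April to June (top_region = 0) and for July to October (top_region = 1). The difference alone is therefore not a unique island key, which is why the outer query filters on top_region = 1 before grouping.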

What's an efficient way to count "previous" rows in SQL?

Hard to phrase the title for this one.
I have a table of data which contains a row per invoice. For example:
| Invoice ID | Customer Key | Date | Value | Something |
| ---------- | ------------ | ---------- | ------| --------- |
| 1 | A | 08/02/2019 | 100 | 1 |
| 2 | B | 07/02/2019 | 14 | 0 |
| 3 | A | 06/02/2019 | 234 | 1 |
| 4 | A | 05/02/2019 | 74 | 1 |
| 5 | B | 04/02/2019 | 11 | 1 |
| 6 | A | 03/02/2019 | 12 | 0 |
I need to add another column that counts the number of previous rows per CustomerKey, but only if "Something" is equal to 1, so that it returns this:
| Invoice ID | Customer Key | Date | Value | Something | Count |
| ---------- | ------------ | ---------- | ------| --------- | ----- |
| 1 | A | 08/02/2019 | 100 | 1 | 2 |
| 2 | B | 07/02/2019 | 14 | 0 | 1 |
| 3 | A | 06/02/2019 | 234 | 1 | 1 |
| 4 | A | 05/02/2019 | 74 | 1 | 0 |
| 5 | B | 04/02/2019 | 11 | 1 | 0 |
| 6 | A | 03/02/2019 | 12 | 0 | 0 |
I know I can do this using a correlated subquery like this...
(
select
count(*)
from table
where
[Customer Key] = t.[Customer Key]
and [Date] < t.[Date]
and Something = 1
)
But I have a lot of data and that's pretty slow. I know I can also use cross apply to achieve the same thing, but as far as I can tell that doesn't perform any better than the subquery.
So, is there a more efficient means of achieving this, or do I just suck it up?
EDIT: I originally posted this without the requirement that only rows where Something = 1 are counted. Mea culpa - I asked it in a hurry. Unfortunately I think that this means I can't use row_number() over (partition by [Customer Key])

Assuming you're using SQL Server 2012+, you can use window functions. End the frame at 1 PRECEDING so that the current row is never counted, whatever its Something value:
COUNT(CASE WHEN Something = 1 THEN 1 END) OVER (PARTITION BY CustomerKey ORDER BY [Date]
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS [Count]
Old answer before new required logic:
COUNT(CustomerKey) OVER (PARTITION BY CustomerKey ORDER BY [Date]
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) -1 AS [Count]

If you're not using 2012 an alternative is to use ROW_NUMBER
ROW_NUMBER() OVER (PARTITION BY CustomerKey ORDER BY [Date]) - 1 AS Count
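
To check a conditional running count against the question's sample data, here is a minimal sketch. The table variable @invoices is a hypothetical name, the bracketed column names follow the question's own query, and the dates are assumed to be day/month/year. The frame ends at 1 PRECEDING, so the current row is excluded regardless of its Something value.

```sql
-- Sample data from the question. @invoices is a hypothetical name;
-- dates are assumed to be day/month/year (08/02/2019 = 8 February 2019).
DECLARE @invoices TABLE (
    [Invoice ID]   int,
    [Customer Key] char(1),
    [Date]         date,
    [Value]        int,
    Something      int
);

INSERT INTO @invoices ([Invoice ID], [Customer Key], [Date], [Value], Something) VALUES
    (1, 'A', '20190208', 100, 1),
    (2, 'B', '20190207',  14, 0),
    (3, 'A', '20190206', 234, 1),
    (4, 'A', '20190205',  74, 1),
    (5, 'B', '20190204',  11, 1),
    (6, 'A', '20190203',  12, 0);

-- Count earlier rows of the same customer where Something = 1.
-- CASE yields NULL for non-matching rows, and COUNT ignores NULLs.
SELECT [Invoice ID], [Customer Key], [Date], [Value], Something,
       COUNT(CASE WHEN Something = 1 THEN 1 END)
           OVER (PARTITION BY [Customer Key]
                 ORDER BY [Date]
                 ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS [Count]
FROM @invoices
ORDER BY [Invoice ID];

-- Invoice ID | Customer Key | Something | Count
-- 1          | A            | 1         | 2
-- 2          | B            | 0         | 1
-- 3          | A            | 1         | 1
-- 4          | A            | 1         | 0
-- 5          | B            | 1         | 0
-- 6          | A            | 0         | 0
```

If a customer can have several invoices on the same date, a ROWS frame orders those ties arbitrarily; adding a tiebreaker such as [Invoice ID] to the ORDER BY would make the result deterministic.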

How to find difference between two tables in MSSQL

I have two tables, both named 'Customer'.
The first one:
ID | UserID | Date
1. | 1 | 2018-05-01
2. | 1 | 2018-05-02
The second one:
ID | UserID | Date
1. | 1 | 2018-05-01
2. | 1 | 2018-05-02
3. | 1 | 2018-05-03
So, as you can see in the second table, there is one row more.
So far I have written this code:
;with cte_table1 as (
select UserID, count(id) cnt from db1.Customer group by UserID
),
cte_table2 as (
select UserID, count(id) cnt from db2.Customer group by UserID
)
select * from cte_table1 t1
join cte_table2 t2 on t2.UserID = t1.UserID
where t1.cnt <> t2.cnt
and this gives me expected result:
UserID | cnt | UserID | cnt
1 | 2 | 1 | 3
And so far, everything is fine. The thing is, these two tables have many rows and I'd like to have a result with dates, where cnt does not match.
In other words, I'd like to have something like this:
UserID | cnt | Date | UserID | cnt | Date
1 | 2 | 2018-05-01 | 1 | 3 | 2018-05-01
1 | 2 | 2018-05-02 | 1 | 3 | 2018-05-01
1 | 2 | NULL | 1 | 3 | 2018-05-03
The best solution would be a result set where both CTEs are joined to give this:
UserID | cnt | Date | UserID | cnt | Date
1 | 2 | 2018-05-01 | 1 | 3 | 2018-05-01
1 | 2 | 2018-05-02 | 1 | 3 | 2018-05-01
1 | 2 | NULL | 1 | 3 | 2018-05-03
1 | 2 | 2018-05-30 | 1 | 3 | NULL

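You should do a FULL OUTER JOIN on UserID and Date. Customer itself has no cnt column, so compute the per-user count with a window function first:
;with t1 as (
select UserID, [Date], count(*) over (partition by UserID) as cnt
from db1.Customer
),
t2 as (
select UserID, [Date], count(*) over (partition by UserID) as cnt
from db2.Customer
)
select
t1.UserID,
t1.cnt,
t1.[Date],
t2.UserID,
t2.cnt,
t2.[Date]
from t1
full outer join t2
on t1.UserID = t2.UserID and t1.[Date] = t2.[Date]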
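
To list only the rows that exist in one table but not the other, a FULL OUTER JOIN can be filtered on the NULL side. Here is a minimal sketch using the question's sample data; the table variables @customer1 and @customer2 are hypothetical stand-ins for db1.Customer and db2.Customer, and UserID is assumed to be non-NULL in both tables.

```sql
-- Hypothetical stand-ins for db1.Customer and db2.Customer.
DECLARE @customer1 TABLE (ID int, UserID int, [Date] date);
DECLARE @customer2 TABLE (ID int, UserID int, [Date] date);

INSERT INTO @customer1 (ID, UserID, [Date]) VALUES
    (1, 1, '20180501'),
    (2, 1, '20180502');

INSERT INTO @customer2 (ID, UserID, [Date]) VALUES
    (1, 1, '20180501'),
    (2, 1, '20180502'),
    (3, 1, '20180503');

-- Rows matched on (UserID, Date) have both sides filled in.
-- Keeping only rows where one side is NULL leaves the differences.
-- This relies on UserID never being NULL in the source data (an assumption).
SELECT C1.UserID AS UserID1, C1.[Date] AS Date1,
       C2.UserID AS UserID2, C2.[Date] AS Date2
FROM @customer1 AS C1
FULL OUTER JOIN @customer2 AS C2
    ON C1.UserID = C2.UserID AND C1.[Date] = C2.[Date]
WHERE C1.UserID IS NULL OR C2.UserID IS NULL;

-- UserID1 | Date1 | UserID2 | Date2
-- NULL    | NULL  | 1       | 2018-05-03
```

A row missing from the second table would show up the other way round, with the right-hand columns NULL.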

SQL Server : update sequence number across multiple groups

I would like to update a table:
| id | type_id | created_at | sequence |
|----|---------|------------|----------|
| 1 | 1 | 2010-04-26 | NULL |
| 2 | 1 | 2010-04-27 | NULL |
| 3 | 2 | 2010-04-28 | NULL |
| 4 | 3 | 2010-04-28 | NULL |
To this (note that created_at is used for ordering, and sequence is "grouped" by type_id):
| id | type_id | created_at | sequence |
|----|---------|------------|----------|
| 1 | 1 | 2010-04-26 | 1 |
| 2 | 1 | 2010-04-27 | 2 |
| 3 | 2 | 2010-04-28 | 1 |
| 4 | 3 | 2010-04-28 | 1 |
Same question has been raised but for SQL Server.
Link
Thanks.

You can use ROW_NUMBER() to get a sequence number within each type_id partition. Updating through a CTE makes the UPDATE simpler:
;WITH ToUpdate AS (
SELECT id, type_id, created_at, sequence,
ROW_NUMBER() OVER (PARTITION BY type_id ORDER BY created_at) AS newSeq
FROM mytable
)
UPDATE ToUpdate
SET sequence = newSeq
Demo here
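
To try the update end to end, here is a minimal sketch on a temporary table loaded with the question's rows. The name #mytable is hypothetical, and id is added to the ORDER BY as a tiebreaker, which is an assumption beyond the original answer.

```sql
-- Temporary table with the question's rows; #mytable is a hypothetical name.
CREATE TABLE #mytable (
    id         int PRIMARY KEY,
    type_id    int,
    created_at date,
    [sequence] int NULL
);

INSERT INTO #mytable (id, type_id, created_at) VALUES
    (1, 1, '20100426'),
    (2, 1, '20100427'),
    (3, 2, '20100428'),
    (4, 3, '20100428');

-- The CTE exposes the target column next to the computed row number,
-- so the UPDATE can assign one to the other row by row.
-- id breaks ties between rows sharing a created_at value (an assumption).
WITH ToUpdate AS (
    SELECT [sequence],
           ROW_NUMBER() OVER (PARTITION BY type_id ORDER BY created_at, id) AS newSeq
    FROM #mytable
)
UPDATE ToUpdate
SET [sequence] = newSeq;

SELECT id, type_id, created_at, [sequence]
FROM #mytable
ORDER BY id;

-- id | type_id | created_at | sequence
-- 1  | 1       | 2010-04-26 | 1
-- 2  | 1       | 2010-04-27 | 2
-- 3  | 2       | 2010-04-28 | 1
-- 4  | 3       | 2010-04-28 | 1

DROP TABLE #mytable;
```

The update works because the CTE reads from a single base table, so each CTE row maps back to exactly one row of #mytable.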

How to check whether a date falls within the fdate and tdate range in SQL Server

Hi friends, I have a question about SQL Server.
For each row with status s, I want to check its date against the rows with status b that have the same id.
How do I write this query in SQL Server?
Table :emp
id |status |date(mm-dd-yy) |fdate(mm-dd-yy) |tdate(mm-dd-yy)
1 | S |03-16-11 | |
1 | b | | 03-15-11 |03-18-11
1 | s |03-17-11 | |
1 | b | | 04-20-12 |04-30-12
1 | S |04-20-12 | |
1 | s |04-10-12 | |
1 | s |10-01-14 | |
1 | b | |10-02-14 |10-25-14
2 | s |01-18-12 | |
2 | b | |01-18-12 |01-28-12
2 | b | |03-10-13 |03-24-13
2 | s |03-16-13 | |
2 | s |03-10-13 | |
2 | s |03-23-13 | |
2 | b | |04-20-13 |04-27-13
2 | s |07-01-14 | |
For each row with status = s, compare its date with the fdate to tdate range of every status = b row that has the same id.
If the date falls within any such range, Billing is yes; otherwise Billing is no.
Expected output:
id |status |date(mm-dd-yy) |fdate(mm-dd-yy) |tdate(mm-dd-yy) |Billing
1 | S |03-16-11 | | |yes
1 | s |03-17-11 | | |yes
1 | S |04-20-12 | | |yes
1 | s |04-10-12 | | |no
1 | s |10-01-14 | | |no
2 | s |01-18-12 | | |yes
2 | s |03-16-13 | | |yes
2 | s |03-10-13 | | |yes
2 | s |03-23-13 | | |yes
2 | s |07-01-14 | | |no
I tried the query below:
select *
from ( select * from emp a where status ='s') a
inner join (select * from emp b where status='b') b
on a.pn=b.pn
where a.date<=b.date1 and a.date>=b.date2
It does not give the exact result.
Please tell me how to write this query in SQL Server.

Try
select a.Id,
a.status,
a.date,
a.fdate,
a.tdate,
max(IsNull(case when a.date between b.fDate and b.tDate
then 'yes'
else 'no'
end, 'no')) Billing
from emp a
left join emp b
on a.Id=b.Id
and b.status = 'b'
where a.status ='s'
group by a.Id,
a.status,
a.date,
a.fdate,
a.tdate
Some questions/comments:
What are the fields: pn, date1 and date2?
date1 in your query is, I guess, bigger than date2
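
As an alternative to the join plus group by, the same yes/no flag can be sketched with EXISTS, which avoids multiplying rows and needs no aggregation. This swaps in a different technique from the answer above. The table variable @emp is a hypothetical name, only a subset of the question's rows is loaded, and the status values are stored in lower case so the result does not depend on a case-insensitive collation.

```sql
-- Subset of the question's rows; @emp is a hypothetical name.
-- Dates in the question are month-day-year (03-16-11 = 16 March 2011).
DECLARE @emp TABLE (id int, status char(1), [date] date, fdate date, tdate date);

INSERT INTO @emp (id, status, [date], fdate, tdate) VALUES
    (1, 's', '20110316', NULL,       NULL),
    (1, 'b', NULL,       '20110315', '20110318'),
    (1, 's', '20120410', NULL,       NULL),
    (1, 'b', NULL,       '20120420', '20120430'),
    (2, 's', '20140701', NULL,       NULL),
    (2, 'b', NULL,       '20130420', '20130427');

-- For each status 's' row, look for any status 'b' row of the same id
-- whose fdate..tdate range contains the date.
SELECT a.id, a.status, a.[date],
       CASE WHEN EXISTS (SELECT 1
                         FROM @emp AS b
                         WHERE b.id = a.id
                           AND b.status = 'b'
                           AND a.[date] BETWEEN b.fdate AND b.tdate)
            THEN 'yes' ELSE 'no'
       END AS Billing
FROM @emp AS a
WHERE a.status = 's'
ORDER BY a.id, a.[date];

-- id | status | date       | Billing
-- 1  | s      | 2011-03-16 | yes
-- 1  | s      | 2012-04-10 | no
-- 2  | s      | 2014-07-01 | no
```

An id with no status b rows at all still appears in the result with Billing = no, because the outer query never joins to b.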