SQL group data by 2 column - sql-server

I am having an issue grouping by 2 columns; I keep getting this error:
Column '#TEM1.STATUS' is invalid in the select list because it is not
contained in either an aggregate function or the GROUP BY clause.
Below is my stored procedure code:
SELECT COUNT(Employee_ID),Roster_Code,STATUS
FROM #TEM1 GROUP BY Roster_Code,Department
#TEM1 is actually a temporary table. I want to group its rows by department and roster code.
Below is my #TEM1 data:
My expected output:

Specifying "GROUP BY Roster_Code,Department" basically means that you expect to see one row in the output for each different combination of Roster_Code and Department which exists in the table.
For example, the output would contain one row for Roster_Code=A, Department=HRS. But some rows in this group have STATUS=IN and others have STATUS=ABSENT, so it is not clear which value you expect to be displayed in the output, and that is why you see an error message.
This is why you cannot simply select 'STATUS' - you either need to include it in the GROUP BY clause (so you would have different rows in the output for each STATUS) or to use some aggregate function which tells SQL Server how to combine the multiple values into a single value which it can output.
It looks like what you are actually trying to do is to count the number of employees with 'IN' statuses, and the total number of employees. This probably means that you will want to use the COUNT() aggregate function.
Here is a step towards the output you want:
SELECT Department,
Roster_Code,
COUNT(CASE WHEN STATUS='IN' THEN 1 ELSE NULL END) IN_STATUSES,
COUNT(*) TOTAL_STATUSES
FROM #TEM1 GROUP BY Department, Roster_Code
It looks like you also want to classify any roster code other than A/B/D/E as 'Other', so we can add another step to do that:
SELECT Department,
Roster_Code,
COUNT(CASE WHEN STATUS='IN' THEN 1 ELSE NULL END) IN_STATUSES,
COUNT(*) TOTAL_STATUSES
FROM
(
SELECT Department,
CASE WHEN Roster_Code IN ('A','B','D','E') THEN Roster_Code ELSE 'Other' END Roster_Code,
STATUS
FROM #TEM1
) x
GROUP BY Department, Roster_Code
It looks like you also want to perform a "pivot" operation, which will take the separate rows we currently have for A/B/D/E/Other and convert these into their own columns in a single row. Then, you will want to combine the status counts we currently have into strings of the form "3/4". This is just a case of concatenating them, but the counts are integers, so they have to be cast to strings first (e.g. CAST(IN_STATUSES AS varchar(10)) + '/' + CAST(TOTAL_STATUSES AS varchar(10))).
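The pivot and the "3/4" formatting can be sketched with conditional aggregation. The snippet below is only an illustration: it runs against SQLite through Python's sqlite3 module as a stand-in for SQL Server, the sample rows are hypothetical (the real #TEM1 data is not shown), and only the A, B and Other columns are built; D and E would follow the same pattern.

```python
import sqlite3

# Hypothetical sample rows; the real #TEM1 data is not shown in the question.
rows = [
    (1, "A", "HRS", "IN"),
    (2, "A", "HRS", "ABSENT"),
    (3, "B", "HRS", "IN"),
    (4, "C", "HRS", "IN"),
    (5, "A", "IT", "IN"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TEM1 (Employee_ID INTEGER, Roster_Code TEXT, Department TEXT, STATUS TEXT)"
)
conn.executemany("INSERT INTO TEM1 VALUES (?, ?, ?, ?)", rows)

# One output column per roster bucket, formatted as 'in/total'.
# SQLite concatenates with ||; SQL Server would use + together with CAST(... AS varchar).
query = """
SELECT Department,
       COUNT(CASE WHEN Roster_Code = 'A' AND STATUS = 'IN' THEN 1 END) || '/' ||
       COUNT(CASE WHEN Roster_Code = 'A' THEN 1 END) AS A,
       COUNT(CASE WHEN Roster_Code = 'B' AND STATUS = 'IN' THEN 1 END) || '/' ||
       COUNT(CASE WHEN Roster_Code = 'B' THEN 1 END) AS B,
       COUNT(CASE WHEN Roster_Code NOT IN ('A','B','D','E') AND STATUS = 'IN' THEN 1 END) || '/' ||
       COUNT(CASE WHEN Roster_Code NOT IN ('A','B','D','E') THEN 1 END) AS Other
FROM TEM1
GROUP BY Department
ORDER BY Department
"""
pivoted = conn.execute(query).fetchall()
for row in pivoted:
    print(row)
```

Grouping by Department alone is what collapses the roster codes into a single row; each CASE expression then decides which rows count towards which column.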

Related

how to select first rows distinct by a column name in a sub-query in sql-server?

Actually I am building a Skype-like tool in which I have to show the last 10 distinct users who have logged in to my web application.
I maintain a table in SQL Server that has a field called last_active_time. My requirement is to sort the table by last_active_time and show all the columns for the last 10 distinct users.
There is another field called WWID which uniquely identifies a user.
I am able to find the distinct WWIDs, but I am not able to select all the columns of those rows.
I am using the query below to find the distinct WWIDs:
select distinct(wwid) from(select top 100 * from dbo.rvpvisitors where last_active_time!='' order by last_active_time DESC) as newView;
But how do I find those distinct rows? I want to show how long each user has been away from the web app, using the difference between the current time and the last active time.
I am new to SQL; maybe the question is naive, but I am struggling to get it right.
If you are using proper data types for your columns, you won't need a subquery to get that result. The following query should do the trick:
SELECT TOP 10
[wwid]
,MAX([last_active_time]) AS [last_active_time]
FROM [dbo].[rvpvisitors]
WHERE
[last_active_time] != ''
GROUP BY
[wwid]
ORDER BY
[last_active_time] DESC
If the column [last_active_time] is of type varchar/nvarchar (which is probably the case, since you check for empty strings in the WHERE clause), you might need to use CAST or CONVERT to treat it as an actual date, so that functions like MIN/MAX and the ORDER BY compare the values chronologically rather than as text.
In general, I suggest using proper data types for your columns: if you have date or timestamp data, use the "date" or "datetime2" data types.
Edit:
The query aggregates the data based on the column [wwid], and for each returns the maximum [last_active_time].
The result is then sorted and filtered.
In order to add more columns "as-is" (without aggregating them), add them to both the SELECT and GROUP BY clauses.
If you need more aggregated columns, add them to the SELECT with the appropriate aggregate function (MIN/MAX/SUM/etc.).
I suggest you have a look at GROUP BY on W3
To know more about the "execution order" of the instruction you can have a look here
You can solve a problem like this by rank-ordering the rows within each key and keeping only the most recent row per key; this removes duplicates while preserving the key order.
;
WITH RankOrdered AS
(
SELECT
*,
wwidRank = ROW_NUMBER() OVER (PARTITION BY wwid ORDER BY last_active_time DESC )
FROM
dbo.rvpvisitors
where
last_active_time!=''
)
SELECT TOP(10) * FROM RankOrdered WHERE wwidRank = 1 ORDER BY last_active_time DESC
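To see the ranking approach return whole rows, here is a small sketch run against SQLite through Python's sqlite3 module (a stand-in for SQL Server, so LIMIT replaces TOP; window functions need SQLite 3.25 or later). The page column and the sample rows are hypothetical, added only so there is an extra column to carry along.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 'page' is a hypothetical extra column; the real table layout is not shown.
conn.execute("CREATE TABLE rvpvisitors (wwid TEXT, page TEXT, last_active_time TEXT)")
conn.executemany(
    "INSERT INTO rvpvisitors VALUES (?, ?, ?)",
    [
        ("u1", "home", "2015-10-04 09:00:00"),
        ("u1", "chat", "2015-10-04 11:30:00"),
        ("u2", "home", "2015-10-04 10:15:00"),
        ("u3", "help", ""),  # no activity recorded; removed by the WHERE clause
    ],
)

latest = conn.execute(
    """
    WITH RankOrdered AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY wwid ORDER BY last_active_time DESC) AS wwidRank
        FROM rvpvisitors
        WHERE last_active_time != ''
    )
    SELECT wwid, page, last_active_time
    FROM RankOrdered
    WHERE wwidRank = 1              -- newest row of each user
    ORDER BY last_active_time DESC  -- most recently active users first
    LIMIT 10
    """
).fetchall()
for row in latest:
    print(row)
```

Each user appears once, with every column taken from that user's newest row, which is what a plain DISTINCT on wwid cannot provide.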
If my understanding is right, the query below will give the desired output.
You can add conditions according to your needs.
select top 10 wwid from dbo.rvpvisitors group by wwid order by max(last_active_time) desc

What did I do wrong with this subquery for SQL Server?

I've got a table called tblEventLocationStock. It stores sales information for stock at a certain location and event. I'm trying to get a list of items that have a different starting count than the end count from the previous event. I've got this query, but I get the "subquery returned more than 1 value" error:
SELECT ID,EventID,LocationID,StockID,StartQty,UnitPrice,PhysicalSalesQty,PhysicalSalesValue,PhysicalEndQty,TillSoldQty,TillSoldValue
FROM tblEventLocationStock ELS
where StartQty <> (
select PhysicalEndQty from tblEventLocationStock ELSO
where ELS.StockID=ELSO.StockID
and ELS.LocationID=ELSO.LocationID
and ELS.EventID=(ELSO.EventID+1000))
ORDER BY ID desc
I use ELS.EventID=ELSO.EventID+1000 because the event IDs go up in intervals of 1000.
What's odd is that even though I get the "subquery returned more than 1 value" error, I still get 10 rows in the results tab. Those 10 results do appear to have a different starting count for the items than the same item at the same location from the previous event. Also, I get no results if I use an order by, but I still get 10 results if I don't use an order by.
What's even more odd is that I get those 10 results if I run the query with some joins to some other tables so I can get names of the stock items and locations instead of just IDs, but if I do it without the joins, I get no results.
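Try this. EXISTS only checks whether at least one matching row exists, so it does not raise the "subquery returned more than 1 value" error when several rows match:
SELECT ID, EventID, LocationID, StockID, StartQty, UnitPrice, PhysicalSalesQty,
PhysicalSalesValue, PhysicalEndQty, TillSoldQty, TillSoldValue
FROM tblEventLocationStock ELS
WHERE EXISTS (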
SELECT 1
FROM tblEventLocationStock ELSO
WHERE ELS.StockID = ELSO.StockID AND
ELS.StartQty <> ELSO.PhysicalEndQty AND
ELS.LocationID = ELSO.LocationID AND
ELS.EventID = (ELSO.EventID+1000)
)
ORDER BY ID DESC
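A minimal sketch of an EXISTS-based check for this problem, run against SQLite through Python's sqlite3 module as a stand-in for SQL Server. The table is reduced to the columns the check needs and the rows are hypothetical; rows 1 and 2 deliberately duplicate the same event, location and stock, which is the situation in which a scalar subquery compared with <> fails in SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblEventLocationStock "
    "(ID INTEGER, EventID INTEGER, LocationID INTEGER, StockID INTEGER, "
    "StartQty INTEGER, PhysicalEndQty INTEGER)"
)
conn.executemany(
    "INSERT INTO tblEventLocationStock VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1000, 1, 10, 50, 40),  # previous event, stock 10
        (2, 1000, 1, 10, 50, 38),  # second row for the same event/location/stock
        (3, 2000, 1, 10, 40, 25),  # StartQty matches row 1 but not row 2
        (4, 1000, 1, 20, 30, 20),  # previous event, stock 20
        (5, 2000, 1, 20, 20, 5),   # StartQty equals the previous end count
    ],
)

# Keep a row when at least one row of the previous event (EventID - 1000)
# for the same stock and location ended with a different quantity.
mismatched = conn.execute(
    """
    SELECT ID
    FROM tblEventLocationStock ELS
    WHERE EXISTS (
        SELECT 1
        FROM tblEventLocationStock ELSO
        WHERE ELS.StockID = ELSO.StockID
          AND ELS.LocationID = ELSO.LocationID
          AND ELS.EventID = ELSO.EventID + 1000
          AND ELS.StartQty <> ELSO.PhysicalEndQty
    )
    ORDER BY ID DESC
    """
).fetchall()
print(mismatched)
```

Only row 3 is reported: its starting count disagrees with one of the two rows recorded for the previous event, while row 5 starts exactly where the previous event ended.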

SQL Get Second Record

I am looking to retrieve only the second (duplicate) record from a data set. For example, in the following picture:
Inside the UnitID column there are two separate records for 105. I only want the returned data set to contain the second 105 record. Additionally, I want this query to return the second record for all duplicates, not just 105.
I have tried everything I can think of, although I am not that experienced, and I cannot figure it out. Any help would be greatly appreciated.
You need to use GROUP BY for this.
Here's an example (I can't read your first column name, so I'm calling it JobUnitK):
SELECT MAX(JobUnitK), Unit
FROM JobUnits
WHERE DispatchDate = 'oct 4, 2015'
GROUP BY Unit
HAVING COUNT(*) > 1
I'm assuming JobUnitK is your ordering/id field. If it's not, just replace MAX(JobUnitK) with MAX(FieldIOrderWith).
Use the RANK function: rank the rows within each UnitId (OVER with PARTITION BY UnitId and an ORDER BY on your ordering column) and pick the rows with rank 2.
For reference -
https://msdn.microsoft.com/en-IN/library/ms176102.aspx
Assuming SQL Server 2005 and up, you can use the Row_Number windowing function:
WITH DupeCalc AS (
SELECT
DupID = Row_Number() OVER (PARTITION BY UnitID ORDER BY JobUnitKeyID),
*
FROM JobUnits
WHERE DispatchDate = '20151004'
)
SELECT *
FROM DupeCalc
WHERE DupID >= 2
ORDER BY UnitID DESC
;
This is better than a solution that uses Max(JobUnitKeyID) for multiple reasons:
There could be more than one duplicate, in which case you have to use Min(JobUnitKeyID) together with UnitID and join back on UnitID where JobUnitKeyID <> MinJobUnitKeyID.
Using Min or Max requires you to join back to the same data (which will be inherently slower).
If the ordering key you use turns out to be non-unique, you won't be able to pull the right number of rows with either one.
If the ordering key consists of multiple columns, the query using Min or Max explodes in complexity.
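The first point can be checked with a unit that appears three times. This is a sketch only: it runs against SQLite through Python's sqlite3 module as a stand-in for SQL Server (window functions need SQLite 3.25 or later), the rows are hypothetical, and JobUnitKeyID is assumed to be the ordering column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE JobUnits (JobUnitKeyID INTEGER, UnitID INTEGER, DispatchDate TEXT)")
conn.executemany(
    "INSERT INTO JobUnits VALUES (?, ?, ?)",
    [
        (1, 101, "20151004"),  # appears once
        (2, 105, "20151004"),
        (3, 105, "20151004"),  # one repeat
        (4, 107, "20151004"),
        (5, 107, "20151004"),  # first repeat
        (6, 107, "20151004"),  # second repeat
    ],
)

# Number the rows of each unit in key order; everything numbered 2 or higher is a repeat.
repeats = conn.execute(
    """
    WITH DupeCalc AS (
        SELECT ROW_NUMBER() OVER (PARTITION BY UnitID ORDER BY JobUnitKeyID) AS DupID,
               JobUnitKeyID,
               UnitID
        FROM JobUnits
        WHERE DispatchDate = '20151004'
    )
    SELECT UnitID, JobUnitKeyID, DupID
    FROM DupeCalc
    WHERE DupID >= 2
    ORDER BY UnitID, JobUnitKeyID
    """
).fetchall()
for row in repeats:
    print(row)
```

Both repeats of unit 107 come back, whereas a MAX-based query would return only the last one; changing the filter to DupID = 2 would return exactly the second record of each unit.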

Postgresql inner select with distinct

I'm using PostgreSQL 9.2 and have a simple students table as follows:
id | proj_id | mark | name | test_date
I have 2 queries, which are shown below:
select * from (select distinct on (proj_id) proj_id , mark, name,
test_date from students )
t
where t.mark <= 1000
VS
select distinct on (proj_id) proj_id , mark, name, test_date from
students where mark <= 1000
When I run each query against more than 10000 records, the two queries return different results, in particular different row counts, although for fewer than 3000 records the results are the same.
Is this a PostgreSQL 9.2 bug, or am I missing something?
Your queries produce two different sets of results because they apply the logic in a different order.
The first query gets a distinct set of rows and then applies the 'mark' filter.
The second query applies the 'mark' filter and then gets a distinct set of rows.
As you don't have any ordering applied, DISTINCT ON keeps an arbitrary row for each proj_id, so the first query could potentially return a different number of rows each time it is run: the row kept for a proj_id could carry any of the mark values that belong to that proj_id, and it may or may not pass the filter.
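The difference in evaluation order can be reproduced on three rows. SQLite has no DISTINCT ON, so this sketch (Python's sqlite3 module, SQLite 3.25 or later) emulates it with ROW_NUMBER and deliberately keeps the row with the lowest id per project; in PostgreSQL the kept row is unpredictable without an ORDER BY. The sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER, proj_id INTEGER, mark INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1500, "ann"),  # first row of project 1 fails the mark filter
        (2, 1, 900, "bob"),
        (3, 2, 800, "cid"),
    ],
)

# Emulates DISTINCT ON (proj_id): keep one row per proj_id (here: the lowest id).
FIRST_PER_PROJECT = """
    SELECT proj_id, mark, name FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY proj_id ORDER BY id) AS rn
        FROM {source}
    ) WHERE rn = 1
"""

# Query 1: pick one row per project first, then filter on mark.
distinct_then_filter = conn.execute(
    "SELECT * FROM (" + FIRST_PER_PROJECT.format(source="students") + ") t "
    "WHERE t.mark <= 1000 ORDER BY proj_id"
).fetchall()

# Query 2: filter on mark first, then pick one row per project.
filter_then_distinct = conn.execute(
    FIRST_PER_PROJECT.format(source="(SELECT * FROM students WHERE mark <= 1000)")
    + " ORDER BY proj_id"
).fetchall()

print(distinct_then_filter)
print(filter_then_distinct)
```

Project 1 disappears from the first result because the single row kept for it has a mark above 1000, while the second query still finds a qualifying row for it.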

Obtain Duplicated Data

Please suggest an SQL query to find duplicate customers across different stores, e.g. customer table has id, name, phone, storeid in it, I need to write queries for the following:
Duplicate customers within a store
Duplicate customers across different stores
Table data:
id name phone storeid
-----------------------------------
1 abc 123 4
2 abc 123 4
3 abc 123 5
The first query should show only first 2 records, and the second query should show all 3 records.
You can do something like the following:
SELECT Name,Phone, COUNT(Id) NumberOfTimes, StoreID
FROM Customers
GROUP BY Name,Phone,StoreID
HAVING COUNT(Id) > 1
ORDER BY StoreID
Hope this helps.
Solution
You can try this for the first query:
SELECT *
FROM customer c
WHERE 1 < (
SELECT COUNT(name)
FROM customer
WHERE name = c.name
) AND
1 < (
SELECT COUNT(storeid)
FROM customer
WHERE storeid = c.storeid AND name = c.name
);
Now, for the second query, use the above one, but remove everything after and including the AND.
Explanation
Let's look at the query step-by-step:
SELECT *
FROM customer c
This states that you want all the columns from the customer table. The alias c lets the subqueries refer to the current outer row.
WHERE 1 < (
SELECT COUNT(name)
FROM customer
WHERE name = c.name
)
This is a fairly long condition, so let's look at it from the inside outward.
WHERE name = c.name
This restricts the subquery to the rows whose name matches the name of the current outer row.
SELECT COUNT(name)
FROM customer
This counts the rows in the customer table that satisfy that WHERE clause, i.e. the number of times the current name appears.
WHERE 1 < (
....
)
Here, we compare the result of the subquery (the number of rows with this name) with 1 to check whether it is greater than 1 (i.e., there is a duplicate).
AND
.....
The AND keyword indicates that the second condition must be true in addition to the first one. It works the same way, but counts the rows that share both the name and the store id of the current row.
The full query returns all entries whose name is duplicated within the same store; if you remove everything including and after the AND, the result is all entries that share a name, whether or not they belong to the same store.
Notes
The other two answers are suggesting grouping duplicated data, but in your particular case, I think you do want the duplicated entries as per your expected results (albeit you should add more expected output info than that).
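To check the expected output from the question (two rows within a store, all three rows across stores), here is a small sketch using the question's sample table. It runs against SQLite through Python's sqlite3 module, and it assumes that two customers are duplicates when both name and phone match; that definition is an assumption, since the question does not spell it out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT, phone TEXT, storeid INTEGER)")
# Sample rows taken from the question.
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [(1, "abc", "123", 4), (2, "abc", "123", 4), (3, "abc", "123", 5)],
)

# Duplicates within a store: another row with the same name, phone and store exists.
within_store = conn.execute(
    """
    SELECT id FROM customer c
    WHERE 1 < (SELECT COUNT(*) FROM customer d
               WHERE d.name = c.name AND d.phone = c.phone AND d.storeid = c.storeid)
    ORDER BY id
    """
).fetchall()

# Duplicates across stores: the same name and phone appear more than once anywhere.
across_stores = conn.execute(
    """
    SELECT id FROM customer c
    WHERE 1 < (SELECT COUNT(*) FROM customer d
               WHERE d.name = c.name AND d.phone = c.phone)
    ORDER BY id
    """
).fetchall()

print(within_store)   # ids of the duplicated rows inside one store
print(across_stores)  # ids of the duplicated rows in any store
```

Because each subquery is correlated with the outer row, the queries return the duplicated rows themselves rather than one summary row per group.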
SELECT storeName, customerName FROM customer
WHERE storeid IN (
SELECT c.storeid
FROM customer c
RIGHT JOIN store s ON (c.storeid = s.id)
GROUP BY c.storeid
HAVING COUNT(*) > 1
)
Basically, we are grouping by storeids, which allows us to count the times they occur in the customer table. We get the id of a case where there are multiple occurrences, and we select the storeName and CustomerName from the customer table that contains the id we got from the inner query.
