SQL Server how can I use COUNT DISTINCT(*) in HAVING clause? - sql-server

I have a procedure that counts all the unique [customerid] values and displays them in a SELECT list. I'm trying to filter the results down to the [customerid] values whose count is greater than 1 by using a HAVING clause, but SQL Server won't let me use DISTINCT COUNT inside the HAVING. In my mind it makes sense that HAVING should work with the COUNT, but it does not:
USE MyCompany;
GO
SELECT DISTINCT COUNT(customerid) AS NumberOfOrdersMade, customerid AS
CustomerID
FROM tblItems_Ordered
GROUP BY customerid
HAVING DISTINCT COUNT(customerid) > 1
GO

You probably want SELECT COUNT(DISTINCT orderid) instead of DISTINCT COUNT(customerid):
USE MyCompany;
GO
SELECT COUNT(DISTINCT orderid) AS NumberOfOrdersMade, customerid AS
CustomerID
FROM tblItems_Ordered
GROUP BY customerid
HAVING COUNT(DISTINCT orderid) > 1
GO
When outside of the COUNT, the DISTINCT will eliminate duplicate rows from a result set, which will have no effect in your query because you are doing a GROUP BY. When inside the COUNT, DISTINCT will limit the count to unique values of the column that you pass to the count function. Thus, it makes more sense to use an orderid column instead of customerid when you're aliasing it as NumberOfOrdersMade.
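To see the difference in practice, here is a minimal sketch that runs the corrected query against made-up data. It uses SQLite through Python's sqlite3 module as a stand-in for SQL Server, and it assumes (as the answer does) that tblItems_Ordered has an orderid column; the item column and the sample rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblItems_Ordered (orderid INTEGER, customerid INTEGER, item TEXT)"
)
conn.executemany(
    "INSERT INTO tblItems_Ordered VALUES (?, ?, ?)",
    [
        (1, 10, "pen"),
        (1, 10, "ink"),    # second item on the same order
        (2, 10, "paper"),  # customer 10 has two distinct orders
        (3, 20, "pen"),
        (3, 20, "ink"),    # customer 20 has one order with two items
    ],
)

# COUNT(DISTINCT orderid) counts orders, not item rows, per customer.
rows = conn.execute(
    """
    SELECT COUNT(DISTINCT orderid) AS NumberOfOrdersMade, customerid AS CustomerID
    FROM tblItems_Ordered
    GROUP BY customerid
    HAVING COUNT(DISTINCT orderid) > 1
    """
).fetchall()
print(rows)
```

Customer 20 has two rows but only one order, so a plain COUNT(customerid) > 1 would keep it while COUNT(DISTINCT orderid) > 1 filters it out.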

Related

Is there a way to tell SQL Server to check the table for a duplicate before inserting each new row?

I tried using the SQL below to insert values from one table, importTable, into another table, POInvoicing. It appears that this query checks the POInvoicing table for any possible duplicates from importTable and inserts the entries that are not duplicates. The end result is that SQL Server still inserts rows that are duplicated within importTable itself. Is there a way to tell SQL Server to check the table for a possible duplicate entry and, if there is none, add the row, then repeat that check for the next row? I know this will be slower, but speed isn't an issue.
INSERT INTO POInvoicing
(VendorID, InvoiceNo)
SELECT dbo.importTable.VendorID,
dbo.importTable.InvoiceNo
FROM dbo.importTable
WHERE NOT EXISTS (SELECT VendorID,
InvoiceNo
FROM POInvoicing
WHERE POInvoicing.VendorID = dbo.importTable.VendorID AND
POInvoicing.InvoiceNo = dbo.importTable.InvoiceNo)
This isn't exactly the functionality I was hoping for. What I want is for the query to insert a row into the table and then check for "duplicates" before inserting the next row. What constitutes a duplicate in the importTable would be the combination of VendorID and InvoiceNo. There are about a dozen different columns in importTable and technically each row is distinct, so DISTINCT won't work here.
I can't simply remove duplicates from the importTable for a couple of reasons not relevant to the question above (though I can provide it if necessary), so that method is out.
If you really don't care (or refuse to tell us) how you want to decide between two rows with the same VendorID and InvoiceNo values, you can pick an arbitrary row like this:
;WITH NewRows AS
(
SELECT VendorID, InvoiceNo, InvoiceDate, /* ... other columns ... */
rn = ROW_NUMBER() OVER (PARTITION BY VendorID, InvoiceNo ORDER BY (SELECT NULL))
FROM dbo.importTable AS i
WHERE NOT EXISTS (SELECT 1 FROM dbo.POInvoicing AS p
WHERE p.VendorID = i.VendorID
AND p.InvoiceNo = i.InvoiceNo)
)
INSERT dbo.POInvoicing(VendorID, InvoiceNo, InvoiceDate /* , ... other columns ... */)
SELECT VendorID, InvoiceNo, InvoiceDate /* , ... other columns */
FROM NewRows
WHERE rn = 1;
If you later decide there is a specific row you want in the case of duplicates, you can swap out (SELECT NULL) for something else. For example, to take the row with the latest invoice date:
OVER (PARTITION BY VendorID, InvoiceNo ORDER BY InvoiceDate DESC)
Again, I wasn't asking questions here to be annoying, it was to help you get the solution you need. If you want SQL Server to pick between two duplicates, you can either tell it how to pick, or you'll have to accept arbitrary / non-deterministic results. You should not jump the fence for looping / cursors just because the first thing you tried didn't work the way you wanted it to.
Also please always specify the schema and use sensible table aliases.
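As a rough check of the ROW_NUMBER approach above, the following sketch runs the same pattern against invented data. It uses SQLite through Python's sqlite3 module as a stand-in for SQL Server, so the dbo schema prefix is dropped, and it orders by InvoiceDate DESC so the outcome is deterministic; the sample rows are assumptions, not data from the question.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE importTable (VendorID INTEGER, InvoiceNo TEXT, InvoiceDate TEXT);
    CREATE TABLE POInvoicing (VendorID INTEGER, InvoiceNo TEXT, InvoiceDate TEXT);
    INSERT INTO POInvoicing VALUES (1, 'A-1', '2020-01-01');
    INSERT INTO importTable VALUES
        (1, 'A-1', '2020-01-05'),  -- already present in POInvoicing
        (2, 'B-1', '2020-02-01'),  -- duplicated inside importTable
        (2, 'B-1', '2020-02-09'),
        (3, 'C-1', '2020-03-01');
    """
)

# Skip keys that already exist in the target, then keep one row per
# (VendorID, InvoiceNo) from the remaining import rows.
conn.execute(
    """
    WITH NewRows AS (
        SELECT VendorID, InvoiceNo, InvoiceDate,
               ROW_NUMBER() OVER (PARTITION BY VendorID, InvoiceNo
                                  ORDER BY InvoiceDate DESC) AS rn
        FROM importTable AS i
        WHERE NOT EXISTS (SELECT 1 FROM POInvoicing AS p
                          WHERE p.VendorID = i.VendorID
                            AND p.InvoiceNo = i.InvoiceNo)
    )
    INSERT INTO POInvoicing (VendorID, InvoiceNo, InvoiceDate)
    SELECT VendorID, InvoiceNo, InvoiceDate
    FROM NewRows
    WHERE rn = 1
    """
)

result = conn.execute(
    "SELECT VendorID, InvoiceNo, InvoiceDate FROM POInvoicing ORDER BY VendorID"
).fetchall()
print(result)
```

The existing row for vendor 1 is left alone, and only the later of the two B-1 rows is inserted, all in one set-based statement with no loop or cursor.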
Add a primary key or unique constraint to your table to prevent duplicate rows from being inserted.
You can also use the DISTINCT keyword in your SELECT query to avoid this.
Duplicate rows can also be eliminated with GROUP BY or the ROW_NUMBER() function.
Using the DISTINCT keyword:
INSERT INTO POInvoicing
(VendorID, InvoiceNo, InvoiceDate)
SELECT DISTINCT dbo.importTable.VendorID,
dbo.importTable.InvoiceNo,
dbo.importTable.InvoiceDate
FROM dbo.importTable
WHERE NOT EXISTS (SELECT VendorID,
InvoiceNo
FROM POInvoicing
WHERE POInvoicing.VendorID = dbo.importTable.VendorID
AND
POInvoicing.InvoiceNo = dbo.importTable.InvoiceNo)
Try this INNER JOIN
INSERT INTO POInvoicing
(VendorID, InvoiceNo, InvoiceDate)
SELECT dbo.importTable.VendorID,
dbo.importTable.InvoiceNo,
dbo.importTable.InvoiceDate
FROM dbo.importTable IM
INNER JOIN POInvoicing S ON S.POInvoicing.VendorID <>
dbo.importTable.VendorID
AND
S.POInvoicing.InvoiceNo <> dbo.importTable.InvoiceN

Get the id of the row with the max value with two grouping

We have a data structure with four columns:
ContractorName, ProjectCode, InvoiceID, OrderID
We want to group the data by both the ContractorName and ProjectCode columns, and then, for each group, get the InvoiceID of the row with MAX(OrderID).
You could use ROW_NUMBER:
SELECT ContractorName, ProjectName, OrderId, InvoiceId
FROM (SELECT *, ROW_NUMBER() OVER(PARTITION BY ContractorName, ProjectName
ORDER BY OrderId DESC) AS rn
FROM tab
) AS sub
WHERE rn = 1;
ROW_NUMBER() is what I would call the canonical solution. In many cases, an old-fashioned solution has better performance:
select t.*
from t
where t.orderid = (select max(t2.orderid)
from t t2
where t2.contractorname = t.contractorname and
t2.projectname = t.projectname
);
This is especially true if there is an index on (contractorname, projectname, orderid).
Why is this faster? Basically, SQL Server can scan the table doing a lookup in an index. The lookup is really fast because the index is designed for it, so the scan is just a little faster than a full table scan.
When using row_number(), SQL Server has to scan the table to calculate the row number (and that can use the index, so it might be fast). But then it has to go back to the table to fetch the columns and apply the where clause. So, even if it uses an index, it is doing more work.
EDIT:
I should also point out that this can be done without a subquery:
select distinct contractorname, projectname,
max(orderid) over (partition by contractorname, projectname) as latest_order,
first_value(invoiceid) over (partition by contractorname, projectname order by orderid desc) as latest_invoice
from t;
Unfortunately, SQL Server doesn't offer first_value() as an aggregation function, but you can use select distinct and get the same effect.
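If you want to convince yourself that the three formulations agree, here is a small sketch on invented rows. It runs on SQLite through Python's sqlite3 module rather than SQL Server (an assumption made so the example is self-contained), and it says nothing about relative performance, which depends on the engine and the indexes.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (contractorname TEXT, projectname TEXT, invoiceid INTEGER, orderid INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("Acme", "P1", 501, 1),
        ("Acme", "P1", 502, 3),
        ("Acme", "P2", 503, 2),
        ("Bolt", "P1", 504, 5),
        ("Bolt", "P1", 505, 4),
    ],
)

# 1) ROW_NUMBER(): number rows per group, newest order first, keep row 1.
with_row_number = conn.execute(
    """
    SELECT contractorname, projectname, orderid, invoiceid
    FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY contractorname, projectname
                                       ORDER BY orderid DESC) AS rn
          FROM t) AS sub
    WHERE rn = 1
    ORDER BY contractorname, projectname
    """
).fetchall()

# 2) Correlated subquery: keep rows whose orderid is the group's maximum.
with_subquery = conn.execute(
    """
    SELECT t.contractorname, t.projectname, t.orderid, t.invoiceid
    FROM t
    WHERE t.orderid = (SELECT MAX(t2.orderid)
                       FROM t AS t2
                       WHERE t2.contractorname = t.contractorname
                         AND t2.projectname = t.projectname)
    ORDER BY t.contractorname, t.projectname
    """
).fetchall()

# 3) SELECT DISTINCT with window functions, no subquery.
with_first_value = conn.execute(
    """
    SELECT DISTINCT contractorname, projectname,
           MAX(orderid) OVER (PARTITION BY contractorname, projectname) AS latest_order,
           FIRST_VALUE(invoiceid) OVER (PARTITION BY contractorname, projectname
                                        ORDER BY orderid DESC) AS latest_invoice
    FROM t
    ORDER BY contractorname, projectname
    """
).fetchall()

print(with_row_number)
```

All three queries return one row per (contractorname, projectname) pair, carrying the invoice that belongs to the highest orderid.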

SSIS - Filter duplicate rows

I have a table (Id, ArticleCode, StoreCode, Adress, Number) that contains duplicate entries based on only these columns [ArticleCode, StoreCode].
Currently I can filter duplicate rows using the Aggregate transformation, but the problem is that the output rows have only two columns, [ArticleCode, StoreCode], and I need the other columns as well.
In the OLE DB Source component, set the data access mode to SQL Command instead of a table name and use the following command as the source:
SELECT [ID]
,[ArticleCode]
,[StoreCode]
,[Address]
,[Number] FROM (
SELECT [ID]
,[ArticleCode]
,[StoreCode]
,[Address]
,[Number]
,ROW_NUMBER() OVER(PARTITION BY [ArticleCode]
,[StoreCode] ORDER BY [ArticleCode]
,[StoreCode]) AS ROWNUM
FROM [dbo].[Table_1]) AS T1
WHERE T1.ROWNUM = 1
To get rid of duplicates and select unique records by [ArticleCode, StoreCode]:
select top 1 with ties
Id ,
ArticleCode ,
StoreCode ,
Adress ,
Number
from
YourTable
order by
row_number() over(partition by ArticleCode, StoreCode order by Id)
But which of the two records should be selected when [ArticleCode, StoreCode] are equal and [Adress, Number] differ?
If Id is auto-incremented, then order by Id gets the first record entered and order by Id desc gets the last.
You have to define somehow which [Adress, Number] pair among the duplicates is the correct one to select.

Select a random database row based on another query

For internal control we would like to select a single random invoice for each of multiple invoice types and regions.
Here's the SQL to get a set of distinct Invoice Types and Regions
select InvoiceType,RegionID
from Invoices
group by InvoiceType, RegionID
For each row this returns I need to fetch a random row with that InvoiceType and RegionID. This is how I'm fetching random rows:
SELECT top 1
CustomerID
,InvoiceNum
,Name
FROM Invoices
JOIN Customers on Customers.CustomerID=Invoices.CustomerID
where InvoiceType=X and RegionID=Y
ORDER BY NEWID()
But I don't know how to run this select statement foreach() row the first statement returns. I could do it programmatically but I would prefer an option using only a stored procedure as this query isn't supposed to need a program.
WITH cteInvoices AS (
SELECT CustomerID, InvoiceNum, Name,
ROW_NUMBER() OVER(PARTITION BY InvoiceType, RegionID ORDER BY NEWID()) AS RowNum
FROM Invoices
)
SELECT c.CustomerID, c.InvoiceNum, c.Name
FROM cteInvoices c
WHERE c.RowNum = 1;
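The query above numbers the rows of each (InvoiceType, RegionID) group in random order and keeps the first one. A minimal sketch of the same idea, using SQLite through Python's sqlite3 module as a stand-in for SQL Server: RANDOM() replaces NEWID(), the Customers join is left out, and the sample invoices are invented.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Invoices (InvoiceNum INTEGER, InvoiceType TEXT, RegionID INTEGER)"
)
invoices = [
    (1, "A", 1),
    (2, "A", 1),
    (3, "A", 2),
    (4, "B", 1),
    (5, "B", 1),
    (6, "B", 1),
]
conn.executemany("INSERT INTO Invoices VALUES (?, ?, ?)", invoices)

# Shuffle rows inside each (InvoiceType, RegionID) group and keep the first.
picked = conn.execute(
    """
    WITH cteInvoices AS (
        SELECT InvoiceNum, InvoiceType, RegionID,
               ROW_NUMBER() OVER (PARTITION BY InvoiceType, RegionID
                                  ORDER BY RANDOM()) AS RowNum
        FROM Invoices
    )
    SELECT InvoiceNum, InvoiceType, RegionID
    FROM cteInvoices
    WHERE RowNum = 1
    ORDER BY InvoiceType, RegionID
    """
).fetchall()
print(picked)
```

Which invoice is picked changes from run to run, but there is always exactly one per invoice type and region.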

Generate Row Serial Numbers in SQL Query

I have a customer transaction table. I need to create a query that includes a serial number pseudo column. The serial number should be automatically reset and start over from 1 upon change in customer ID.
Now, I am familiar with the row_number() function in SQL. This doesn't exactly solve my problem because, to the best of my knowledge, the serial number will not be reset if the order of the rows changes.
I want to do this in a single query (SQL Server) and without having to go through any temporary table usage etc. How can this be done?
Sometimes we don't want to impose any ordering on the result set just to add a serial number. But ROW_NUMBER() requires an ORDER BY clause, so we can use a simple trick to avoid ordering by a real column:
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 1)) AS ItemNo, ItemName FROM ItemMastetr
This way we don't need to order the result set by any column; ItemNo is simply added to the rows as they are returned.
select
ROW_NUMBER() Over (Order by CustomerID) As [S.N.],
CustomerID ,
CustomerName,
Address,
City,
State,
ZipCode
from Customers;
I'm not certain, based on your question if you want numbered rows that will remember their numbers even if the underlying data changes (and gives a different ordering), but if you just want numbered rows - that reset on a change in customer ID, then try using the Partition by clause of row_number()
row_number() over(partition by CustomerID order by CustomerID)
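Here is a minimal sketch of the PARTITION BY idea on made-up data, run on SQLite through Python's sqlite3 module as a stand-in for SQL Server. The Transactions table and its TransactionDate column are assumptions, not taken from the question; ordering by a date inside each partition, rather than by CustomerID itself, makes the numbering within each customer well defined.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Transactions (CustomerID INTEGER, TransactionDate TEXT)")
conn.executemany(
    "INSERT INTO Transactions VALUES (?, ?)",
    [
        (1, "2021-01-03"),
        (2, "2021-01-01"),
        (1, "2021-01-01"),
        (2, "2021-01-02"),
        (1, "2021-01-02"),
    ],
)

# The serial number restarts at 1 for every CustomerID partition.
numbered = conn.execute(
    """
    SELECT ROW_NUMBER() OVER (PARTITION BY CustomerID
                              ORDER BY TransactionDate) AS SerialNo,
           CustomerID,
           TransactionDate
    FROM Transactions
    ORDER BY CustomerID, SerialNo
    """
).fetchall()
for row in numbered:
    print(row)
```

Customer 1 gets serial numbers 1 to 3 and customer 2 starts over at 1, regardless of the order in which the rows were inserted.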
Implementing Serial Numbers Without Ordering Any of the Columns
Demo SQL Script-
IF OBJECT_ID('Tempdb..#TestTable') IS NOT NULL
DROP TABLE #TestTable;
CREATE TABLE #TestTable (Names VARCHAR(75), Random_No INT);
INSERT INTO #TestTable (Names,Random_No) VALUES
('Animal', 363)
,('Bat', 847)
,('Cat', 655)
,('Duet', 356)
,('Eagle', 136)
,('Frog', 784)
,('Ginger', 690);
SELECT * FROM #TestTable;
There are many ways to generate serial numbers in SQL Server. Here we use the simple ROW_NUMBER function.
ROW_NUMBER() is a window function that numbers all rows sequentially (for example 1, 2, 3, …). It is a temporary value that is calculated when the query is run. It must have an OVER clause with ORDER BY, so we cannot simply omit the ORDER BY clause. But we can write it like this:
SQL Script
SELECT Names,Random_No,ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS SERIAL_NO FROM #TestTable;
In the above query, we can also use SELECT 1, SELECT 'ABC', or SELECT '' instead of SELECT NULL. The result would be the same.
SELECT ROW_NUMBER() OVER (ORDER BY ColumnName1) As SrNo, ColumnName1, ColumnName2 FROM TableName
select ROW_NUMBER() over (order by pk_field ) as srno
from TableName
Using Common Table Expression (CTE)
WITH CTE AS(
SELECT ROW_NUMBER() OVER(ORDER BY CustomerId) AS RowNumber,
Customers.*
FROM Customers
)
SELECT * FROM CTE
I found a solution for MySQL: it is easy to add a SrNo column, a kind of temporary auto-increment column, with the following query:
SELECT @ab:=@ab+1 as SrNo, tablename.* FROM tablename, (SELECT @ab:= 0)
AS ab
ALTER function dbo.FN_ReturnNumberRows(@Start int, @End int) returns @Numbers table (Number int) as
begin
insert into @Numbers
select n = ROW_NUMBER() OVER (ORDER BY n)+@Start-1 from (
select top (@End-@Start+1) 1 as n from information_schema.columns as A
cross join information_schema.columns as B
cross join information_schema.columns as C
cross join information_schema.columns as D
cross join information_schema.columns as E) X
return
end
GO
select * from dbo.FN_ReturnNumberRows(10,9999)
