Distinct values while using multiple joins - sql-server

I want to obtain a single row, but the query returns 3 rows. The form_audit table has 3 rows with the same REF_NO.
How can I get one distinct row?

Hope this will give you some idea how it works
CREATE TABLE #tblBackers (
amountBacked MONEY,
backersAccountID INT,
playerBacked INT,
Dates DATETIME)
INSERT INTO #tblBackers VALUES (25,12345,99999,GETDATE())
INSERT INTO #tblBackers VALUES (25,12345,99999,GETDATE())
INSERT INTO #tblBackers VALUES (25,12345,99699,GETDATE())
INSERT INTO #tblBackers VALUES (25,12345,99999,GETDATE())
INSERT INTO #tblBackers VALUES (25,98765,88888,GETDATE())
INSERT INTO #tblBackers VALUES (25,76543,77777,GETDATE())
GO
SELECT DISTINCT * FROM #tblBackers
SELECT DISTINCT TOP 1 * FROM #tblBackers
Add an ORDER BY clause to the TOP 1 query (for example, ORDER BY Dates DESC) to get the latest record.
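As a minimal sketch of that advice, assuming the #tblBackers temp table created above still exists in the session, the latest record could be picked with TOP and an explicit ORDER BY on the Dates column:

```sql
-- Assumes #tblBackers from the script above is still in scope.
-- TOP (1) with ORDER BY ... DESC returns the row with the most recent Dates value.
-- If several rows tie on Dates, which of them comes back is arbitrary.
SELECT TOP (1) amountBacked, backersAccountID, playerBacked, Dates
FROM #tblBackers
ORDER BY Dates DESC;
```

DISTINCT is not needed here, since TOP (1) already limits the result to a single row.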

If you only want one record per ref_no, then consider adding a group by clause on that field.
select
fa.ref_no
/*, other stuff*/
from
FORM_AUDIT fa
/* other joins*/
group by
fa.ref_no;
Keep in mind that this group by clause will aggregate all records that share the same ref_no into a single record in the result set. That means that you can no longer include fields like fh.* and fcd.* in the select list directly, because you have no guarantee that each of those fields has only one value per row in your result set. For every such field that you want to include in your select list, you must either:
Include that field in your group by clause, keeping in mind that doing so will no longer necessarily give you exactly one row per distinct ref_no; now you'll get one row per distinct combination of ref_no and whatever else you add to the group by clause, or
Use one of SQL Server's aggregate functions to transform the set of zero-to-many values in the field you're adding into a single value. Aggregate functions are things like max(), sum(), count(), etc. There's a complete list at the link.
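The two options above can be sketched as follows. This is only an illustration: #form_audit_demo, audit_date and audit_status are hypothetical names standing in for the real FORM_AUDIT table and its columns, which the question does not show.

```sql
-- Hypothetical stand-in for FORM_AUDIT; audit_date and audit_status are assumed columns.
CREATE TABLE #form_audit_demo (
    ref_no       INT,
    audit_date   DATETIME,
    audit_status VARCHAR(20)
);

INSERT INTO #form_audit_demo (ref_no, audit_date, audit_status) VALUES
    (100, '20200101', 'NEW'),
    (100, '20200102', 'REVIEWED'),
    (100, '20200103', 'CLOSED'),
    (200, '20200105', 'NEW');

-- Aggregates collapse the many values per ref_no into a single value each,
-- so the result has exactly one row per ref_no.
SELECT
    fa.ref_no,
    MAX(fa.audit_date) AS latest_audit_date,
    COUNT(*)           AS audit_rows
FROM #form_audit_demo fa
GROUP BY fa.ref_no;

-- Adding a column to GROUP BY instead gives one row per distinct
-- (ref_no, audit_status) combination, which can be more than one row per ref_no.
SELECT fa.ref_no, fa.audit_status
FROM #form_audit_demo fa
GROUP BY fa.ref_no, fa.audit_status;

DROP TABLE #form_audit_demo;
```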
Good luck!

Related

How can I keep the order of column values in a union select?

I am doing a bulk insert into a table using SELECT and UNION. I need the order of the SELECT values to be unchanged when calling the INSERT, but it seems that the values are being inserted in an ascending order, rather than the order I specify.
For example, the below insert statement
create table #QuestionOptionMapping
(
[ID] [int] IDENTITY(1,1)
, [QuestionOptionID] int
, [RateCode] varchar(50)
)
insert into #QuestionOptionMapping (
RateCode
)
select
'PD0116'
union
select
'PL0090'
union
select
'PL0091'
union
select
'DD0026'
union
select
'DD0025'
SELECT * FROM #QuestionOptionMapping
renders the data as
(5 row(s) affected)
ID QuestionOptionID RateCode
----------- ---------------- --------------------------------------------------
1 NULL DD0025
2 NULL DD0026
3 NULL PD0116
4 NULL PL0090
5 NULL PL0091
(5 row(s) affected)
How can the select of the inserted data return the same order as when it was inserted?
SQL Server stores your rows as an unordered set. The data points may or may not be contiguous, and they may or may not be in the "order" the data was specified in your insert statements.
When you query the data, the engine will retrieve the rows in the most efficient order, as determined by the optimizer. There is no guarantee that the order will be the same every time you query the data.
The only way to guarantee the order of your result set is to include an explicit ORDER BY clause with your SELECT statement.
See this answer for a much more in-depth discussion of why this is the case: Default row order in SELECT query - SQL Server 2008 vs SQL 2012
By using the SELECT/UNION option for your INSERT statement, you're creating an unordered set that SQL Server ingests as a set, not as a series of inputs. Separate your inserts into discrete statements if you need them to have the IDENTITY values applied in order. Better yet, if the row numbering matters, don't leave it to chance. Explicitly number the rows on insert.
SQL tables do represent unordered sets. However, the identity column on an insert will follow the ordering of the order by.
Your data is getting out of order because of the duplicate elimination in the union. However, I would suggest writing the query to explicitly sort the data:
insert into #QuestionOptionMapping (RateCode)
select ratecode
from (values (1, 'PD0116'),
(2, 'PL0090'),
(3, 'PL0091'),
(4, 'DD0026'),
(5, 'DD0025')
) v(ord, ratecode)
order by ord;
Then be sure to use order by for the select:
select qom.*
from #QuestionOptionMapping qom
order by id;
Note that this also uses the values() table constructor, which is a very handy syntax.
If you're not selecting from tables, you could insert with VALUES instead of a SELECT with UNIONs:
insert into #QuestionOptionMapping (RateCode) values
('PD0116')
,('PL0090')
,('PL0091')
,('DD0026')
,('DD0025')
Or, in your query, change every UNION to UNION ALL.
The difference between a UNION and a UNION ALL is that a UNION removes duplicate rows,
while UNION ALL just stitches the result sets from the selects together.
To find those duplicates, UNION typically has to sort the rows internally first.
UNION ALL doesn't care about uniqueness, so it doesn't need to sort.
A third option would be to simply change from one insert statement to multiple insert statements:
one insert per value, thus avoiding UNION completely.
But that anti-code-golf method is also the most wordy.
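A minimal sketch of the UNION ALL variant, using a stand-in table variable with the same columns as in the question. Note the assumption: without an ORDER BY on the inserting SELECT, the order in which IDENTITY values are assigned is what the engine usually does here, not something it promises.

```sql
-- Stand-in table variable with the same columns as in the question.
DECLARE @QuestionOptionMapping TABLE (
    ID               INT IDENTITY(1,1),
    QuestionOptionID INT,
    RateCode         VARCHAR(50)
);

-- UNION ALL does no duplicate elimination, so no sort is needed for that purpose.
INSERT INTO @QuestionOptionMapping (RateCode)
SELECT 'PD0116'
UNION ALL SELECT 'PL0090'
UNION ALL SELECT 'PL0091'
UNION ALL SELECT 'DD0026'
UNION ALL SELECT 'DD0025';

-- An explicit ORDER BY is still the only guaranteed way to read rows back in ID order.
SELECT ID, QuestionOptionID, RateCode
FROM @QuestionOptionMapping
ORDER BY ID;
```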
Your problem is you are not putting them in in the order you think. UNION is distinct values only and it will typically sort the values to facilitate the distinct. Run the select statement alone and you will see.
If you insert using values then order is preserved:
insert into #QuestionOptionMapping (RateCode) values
('PD0116'), ('PL0090'), ('PL0091'), ('DD0026'), ('DD0025')
select * from #QuestionOptionMapping order by ID

How does DISTINCT work in SQL Server 2008 R2? Are there other options? [duplicate]

I need to retrieve all rows from a table where 2 columns combined are all different. So I want all the sales that do not have any other sales that happened on the same day for the same price. The sales that are unique based on day and price will get updated to an active status.
So I'm thinking:
UPDATE sales
SET status = 'ACTIVE'
WHERE id IN (SELECT DISTINCT (saleprice, saledate), id, count(id)
FROM sales
HAVING count = 1)
But my brain hurts going any farther than that.
SELECT DISTINCT a,b,c FROM t
is roughly equivalent to:
SELECT a,b,c FROM t GROUP BY a,b,c
It's a good idea to get used to the GROUP BY syntax, as it's more powerful.
For your query, I'd do it like this:
UPDATE sales
SET status='ACTIVE'
WHERE id IN
(
SELECT id
FROM sales S
INNER JOIN
(
SELECT saleprice, saledate
FROM sales
GROUP BY saleprice, saledate
HAVING COUNT(*) = 1
) T
ON S.saleprice=T.saleprice AND s.saledate=T.saledate
)
If you put together the answers so far, clean up and improve, you would arrive at this superior query:
UPDATE sales
SET status = 'ACTIVE'
WHERE (saleprice, saledate) IN (
SELECT saleprice, saledate
FROM sales
GROUP BY saleprice, saledate
HAVING count(*) = 1
);
Which is much faster than either of them. Nukes the performance of the currently accepted answer by factor 10 - 15 (in my tests on PostgreSQL 8.4 and 9.1).
But this is still far from optimal. Use a NOT EXISTS (anti-)semi-join for even better performance. EXISTS is standard SQL, has been around forever (at least since PostgreSQL 7.2, long before this question was asked) and fits the presented requirements perfectly:
UPDATE sales s
SET status = 'ACTIVE'
WHERE NOT EXISTS (
SELECT FROM sales s1 -- SELECT list can be empty for EXISTS
WHERE s.saleprice = s1.saleprice
AND s.saledate = s1.saledate
AND s.id <> s1.id -- except for row itself
)
AND s.status IS DISTINCT FROM 'ACTIVE'; -- avoid empty updates. see below
Unique key to identify row
If you don't have a primary or unique key for the table (id in the example), you can substitute with the system column ctid for the purpose of this query (but not for some other purposes):
AND s1.ctid <> s.ctid
Every table should have a primary key. Add one if you didn't have one, yet. I suggest a serial or an IDENTITY column in Postgres 10+.
Related:
In-order sequence generation
Auto increment table column
How is this faster?
The subquery in the EXISTS anti-semi-join can stop evaluating as soon as the first dupe is found (no point in looking further). For a base table with few duplicates this is only mildly more efficient. With lots of duplicates this becomes way more efficient.
Exclude empty updates
For rows that already have status = 'ACTIVE' this update would not change anything, but it would still insert a new row version at full cost (minor exceptions apply). Normally, you do not want this. Add another WHERE condition as demonstrated above to avoid this and make it even faster.
If status is defined NOT NULL, you can simplify to:
AND status <> 'ACTIVE';
The data type of the column must support the <> operator. Some types like json don't. See:
How to query a json column for empty objects?
Subtle difference in NULL handling
This query (unlike the currently accepted answer by Joel) does not treat NULL values as equal. The following two rows for (saleprice, saledate) would qualify as "distinct" (though looking identical to the human eye):
(123, NULL)
(123, NULL)
Also passes in a unique index and almost anywhere else, since NULL values do not compare equal according to the SQL standard. See:
Create unique constraint with null columns
OTOH, GROUP BY, DISTINCT or DISTINCT ON () treat NULL values as equal. Use an appropriate query style depending on what you want to achieve. You can still use this faster query with IS NOT DISTINCT FROM instead of = for any or all comparisons to make NULL compare equal. More:
How to delete duplicate rows without unique identifier
If all columns being compared are defined NOT NULL, there is no room for disagreement.
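The NOT EXISTS query above is written in PostgreSQL syntax. SQL Server 2008 R2, the version in the question, does not support an empty SELECT list or IS DISTINCT FROM, so a T-SQL version needs small changes. The following is a sketch under assumptions: #sales_demo and its sample rows are hypothetical, and only the column names come from the question.

```sql
-- Hypothetical demo table; column names follow the question.
CREATE TABLE #sales_demo (
    id        INT PRIMARY KEY,
    saleprice MONEY NOT NULL,
    saledate  DATE NOT NULL,
    status    VARCHAR(20) NULL
);

INSERT INTO #sales_demo (id, saleprice, saledate, status) VALUES
    (1, 10.00, '20240101', NULL),
    (2, 10.00, '20240101', NULL),      -- same price and day as id 1
    (3, 12.50, '20240101', NULL),      -- unique price/day pair
    (4, 10.00, '20240102', 'ACTIVE');  -- unique pair, but already ACTIVE

UPDATE s
SET    status = 'ACTIVE'
FROM   #sales_demo AS s
WHERE  NOT EXISTS (
           SELECT 1                       -- T-SQL needs something in the SELECT list
           FROM   #sales_demo AS s1
           WHERE  s1.saleprice = s.saleprice
             AND  s1.saledate  = s.saledate
             AND  s1.id <> s.id           -- except for the row itself
       )
  AND  (s.status <> 'ACTIVE' OR s.status IS NULL);  -- skip rows that are already ACTIVE

SELECT id, saleprice, saledate, status
FROM   #sales_demo
ORDER  BY id;

DROP TABLE #sales_demo;
```

With this sample data only the row with id 3 is touched by the UPDATE: ids 1 and 2 duplicate each other, and id 4 is filtered out by the last condition.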
The problem with your query is that when using a GROUP BY clause (which you essentially do by using distinct) you can only use columns that you group by or aggregate functions. You cannot use the column id because there are potentially different values. In your case there is always only one value because of the HAVING clause, but most RDBMS are not smart enough to recognize that.
This should work however (and doesn't need a join):
UPDATE sales
SET status='ACTIVE'
WHERE id IN (
SELECT MIN(id) FROM sales
GROUP BY saleprice, saledate
HAVING COUNT(id) = 1
)
You could also use MAX or AVG instead of MIN, it is only important to use a function that returns the value of the column if there is only one matching row.
If your DBMS doesn't support DISTINCT over multiple columns written like this:
select distinct(col1, col2) from table
then a multi-column distinct select can generally be written safely as follows:
select distinct * from (select col1, col2 from table ) as x
This works on most DBMSs. It is expected to be at least as fast as the GROUP BY solution because it avoids the grouping step, although many optimizers treat the two forms alike.
I want to select the distinct values from one column 'GrondOfLucht' but they should be sorted in the order as given in the column 'sortering'. I cannot get the distinct values of just one column using
Select distinct GrondOfLucht,sortering
from CorWijzeVanAanleg
order by sortering
It will also give the column 'sortering' and because 'GrondOfLucht' AND 'sortering' is not unique, the result will be ALL rows.
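Use GROUP BY to select the distinct values of 'GrondOfLucht' in the order given by 'sortering'. Group only by GrondOfLucht; grouping by sortering as well would bring back one row per combination again.
SELECT GrondOfLucht
FROM dbo.CorWijzeVanAanleg
GROUP BY GrondOfLucht
ORDER BY MIN(sortering)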

Using distinct clause in SQL Server

In SQL Server 2008, does the DISTINCT clause always do an implicit ORDER BY, or do I need to specify an ORDER BY myself? I want to be sure that using DISTINCT puts the data in order.
Here is an example where DISTINCT appears to order the rows:
create table #MyTable (id int)
insert into #MyTable values (3)
insert into #MyTable values (2)
insert into #MyTable values (8)
select distinct id from #MyTable
Although the typical implementation of distinct is done using some kind of ordered data structure, the order it uses may not be the one you need.
There are:
No guarantees that the data will be ordered any which way
No guarantees that the same query on the same data later/tomorrow will return the data in the same (arbitrary) order
No guarantees that the observed ordering will be consistent
The distinct clause does not imply ordering. As such, if you need the data ordered in a particular manner, you have to add an order by clause to the query.
Also note that one of the data structures that can be used is a hashtable/hashset, and though these may produce data that looks ordered if there are only a few values placed into them, with larger quantities this will break down, and regardless, this is implementation specific and undocumented. Do not rely on any such behavior.
DISTINCT clause has nothing to do with ordering records. You have to explicitly use ORDER BY clause for sorting.
select distinct id
from #MyTable
Order By id

SQL WHERE NOT EXISTS (skip duplicates)

Hello I'm struggling to get the query below right. What I want is to return rows with unique names and surnames. What I get is all rows with duplicates
This is my sql
CREATE TABLE #tmp (Name VARCHAR(100), Surname VARCHAR(100))
INSERT INTO #tmp
SELECT CustomerName,CustomerSurname FROM Customers
WHERE
NOT EXISTS
(SELECT Name,Surname
FROM #tmp
WHERE Name=CustomerName
AND Surname=CustomerSurname
GROUP BY Name,Surname )
Please can someone point me in the right direction here.
//Desperate (I tried without GROUP BY as well but get same result)
DISTINCT would do the trick.
SELECT DISTINCT CustomerName, CustomerSurname
FROM Customers
If you only want the records that really don't have duplicates (as opposed to getting duplicates represented as a single record) you could use GROUP BY and HAVING:
SELECT CustomerName, CustomerSurname
FROM Customers
GROUP BY CustomerName, CustomerSurname
HAVING COUNT(*) = 1
First, I thought that @David's answer was what you wanted. But rereading your comments, perhaps you want all combinations of names and surnames:
SELECT n.CustomerName, s.CustomerSurname
FROM
( SELECT DISTINCT CustomerName
FROM Customers
) AS n
CROSS JOIN
( SELECT DISTINCT CustomerSurname
FROM Customers
) AS s ;
Are you doing that while your #Tmp table is still empty?
If so: your entire "select" is fully evaluated before the "insert" statement, it doesn't do "run the query and add one row, insert the row, run the query and get another row, insert the row, etc."
If you want to insert unique Customers only, use that same "Customer" table in your not exists clause
SELECT c.CustomerName,c.CustomerSurname FROM Customers c
WHERE
NOT EXISTS
(SELECT 1
FROM Customers c1
WHERE c.CustomerName = c1.CustomerName
AND c.CustomerSurname = c1.CustomerSurname
AND c.Id <> c1.Id)
If you want to insert a unique set of customers, use "distinct"
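A minimal sketch of that last suggestion. #customers_demo and its rows are hypothetical stand-ins for the Customers table, and @tmp plays the role of the asker's temporary table:

```sql
-- Hypothetical stand-in for the Customers table.
CREATE TABLE #customers_demo (
    Id              INT PRIMARY KEY,
    CustomerName    VARCHAR(100),
    CustomerSurname VARCHAR(100)
);

INSERT INTO #customers_demo (Id, CustomerName, CustomerSurname) VALUES
    (1, 'Ann', 'Lee'),
    (2, 'Ann', 'Lee'),   -- duplicate name/surname pair
    (3, 'Bob', 'Ray');

DECLARE @tmp TABLE (Name VARCHAR(100), Surname VARCHAR(100));

-- The SELECT is evaluated as a whole before any row is inserted,
-- so the de-duplication has to happen inside the SELECT itself.
INSERT INTO @tmp (Name, Surname)
SELECT DISTINCT CustomerName, CustomerSurname
FROM #customers_demo;

SELECT Name, Surname
FROM @tmp
ORDER BY Name, Surname;

DROP TABLE #customers_demo;
```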
Typically, if you're doing a WHERE NOT EXISTS or WHERE EXISTS, or WHERE NOT IN subquery,
you should use what is called a "correlated subquery", as in ypercube's answer above, where table aliases are used for both inside and outside tables (where inside table is joined to outside table). ypercube gave a good example.
And often, NOT EXISTS is preferred over NOT IN (unless the WHERE NOT IN is selecting from a totally unrelated table that you can't join on.)
Sometimes if you're tempted to do a WHERE EXISTS (SELECT from a small table with no duplicate values in column), you could also do the same thing by joining the main query with that table on the column you want in the EXISTS. Not always the best or safest solution, might make query slower if there are many rows in that table and could cause many duplicate rows if there are dup values for that column in the joined table -- in which case you'd have to add DISTINCT to the main query, which causes it to SORT the data on all columns.
-- Not efficient at all.
And, similarly, the WHERE NOT IN or NOT EXISTS correlated subqueries can be accomplished (and often give the same execution plan) if you LEFT OUTER JOIN the table you were going to subquery and add a WHERE joined_table.column IS NULL.
You have to be careful using that, but you don't need a DISTINCT. Frankly, I prefer to use the WHERE NOT IN subqueries or NOT EXISTS correlated subqueries, because the syntax makes the intention clear and it's hard to go wrong.
And you do not need a DISTINCT in the SELECT inside such subqueries (correlated or not). It would be a waste of processing (and for WHERE EXISTS or WHERE IN subqueries, the SQL optimizer would ignore it anyway and just use the first value that matched for each row in the outer query). (Hope that makes sense.)
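To make the comparison concrete, here is a sketch of the same anti-join written both ways. The two temp tables and their contents are hypothetical and exist only for this illustration:

```sql
-- Hypothetical demo tables: customers, and orders that reference them.
CREATE TABLE #customers_demo (Id INT PRIMARY KEY, CustomerName VARCHAR(100));
CREATE TABLE #orders_demo    (OrderId INT PRIMARY KEY, CustomerId INT NOT NULL);

INSERT INTO #customers_demo (Id, CustomerName) VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
INSERT INTO #orders_demo (OrderId, CustomerId) VALUES (10, 1), (11, 1), (12, 3);

-- Correlated NOT EXISTS: customers that have no order at all.
SELECT c.Id, c.CustomerName
FROM   #customers_demo AS c
WHERE  NOT EXISTS (
           SELECT 1
           FROM   #orders_demo AS o
           WHERE  o.CustomerId = c.Id
       );

-- Same question as LEFT OUTER JOIN ... IS NULL. The IS NULL test must be on a
-- column of the joined table that cannot be NULL in a matching row.
SELECT c.Id, c.CustomerName
FROM   #customers_demo AS c
LEFT OUTER JOIN #orders_demo AS o
       ON o.CustomerId = c.Id
WHERE  o.CustomerId IS NULL;

DROP TABLE #orders_demo;
DROP TABLE #customers_demo;
```

Neither form needs DISTINCT, even though customer 1 has two orders, because only unmatched customers survive the filter.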

Ordering query result by list of values

I'm working on a sql query that is passed a list of values as a parameter, like
select *
from ProductGroups
where GroupID in (24,12,7,14,65)
This list is constructed of relations used throughout the database, and must be kept in this order.
I would like to order the results by this list. I only need the first result, but it could be the one with GroupId 7 in this case.
I can't query like
order by (24,12,7,14,65).indexOf(GroupId)
Does anyone know how to do this?
Additional info:
Building a join works and running it in the mssql query editor, but...
Due to limitations of the software sending the query to mssql, I have to pass it to some internal query builder as 1 parameter, thus "24,12,7,14,65". And I don't know upfront how many numbers there will be in this list; it could be 2, it could be 20.
You can also order by on a CASE:
select *
from ProductGroups
where GroupID in (24,12,7,14,65)
order by case GroupId
when 7 then 1 -- First in ordering
when 14 then 2 -- Second
else 3
end
Use a table variable or temporary table with an identity column, feed in your values and join to that, e.g.
create table #rank (
ordering int identity(1,1)
, number int
)
insert into #rank values (24)
insert into #rank values (12)
insert into #rank values (7)
insert into #rank values (14)
insert into #rank values (65)
select pg.*
from ProductGroups pg
inner join
#rank r
on pg.GroupId = r.number
order by
r.ordering
I think I might have found a possible solution (but it's ugly):
select *
from ProductGroups
where GroupID in (24,12,7,14,65)
order by charindex(
','+cast(GroupID as varchar)+',' ,
','+'24,12,7,14,65'+',')
This will order the rows by the position at which they occur in the list, and I can pass the string the way I need to.
Do a join with a temporary table, in which you have the values that you want to filter by as rows. Add a column to it that has the order that you want as the second column, and sort by it.
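A sketch of that idea, with an explicit ordering column instead of relying on insertion order. #product_groups_demo, #wanted, GroupName and ord are hypothetical names introduced for this example; only GroupID and the list of values come from the question.

```sql
-- Hypothetical stand-in for ProductGroups.
CREATE TABLE #product_groups_demo (GroupID INT PRIMARY KEY, GroupName VARCHAR(50));
INSERT INTO #product_groups_demo (GroupID, GroupName) VALUES
    (7, 'Seven'), (14, 'Fourteen'), (99, 'Other');

-- The wanted values as rows, each with its position in the original list.
CREATE TABLE #wanted (ord INT PRIMARY KEY, GroupID INT NOT NULL);
INSERT INTO #wanted (ord, GroupID) VALUES
    (1, 24), (2, 12), (3, 7), (4, 14), (5, 65);

-- The inner join filters to the listed groups; ORDER BY ord restores the list order,
-- and TOP (1) keeps only the first match, as the question asks.
SELECT TOP (1) pg.GroupID, pg.GroupName
FROM   #product_groups_demo AS pg
INNER JOIN #wanted AS w
       ON w.GroupID = pg.GroupID
ORDER  BY w.ord;

DROP TABLE #wanted;
DROP TABLE #product_groups_demo;
```

With this sample data the query returns the group with GroupID 7, since 24 and 12 have no matching rows.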

Resources