only display one row when key field is the same - sql-server

I have created a key field (C) by concatenating two columns (A and B). I want to run a query that, when several rows share the same value in C, returns only the top row.
Sample data:
A B C D
10022 Blue 10022Blue Buggy
10300 Red 10300Red Noodle
10300 Red 10300Red Sammy
So I only want one row to show for 10300Red.
Cheers

One way to do it is with a CTE and ROW_NUMBER():
;WITH CTE AS
(
    SELECT A,
           B,
           C,
           D,
           ROW_NUMBER() OVER(PARTITION BY C ORDER BY (SELECT NULL)) rn
    FROM YourTable -- placeholder name; TABLE itself is a reserved word
)
SELECT A, B, C, D
FROM CTE
WHERE rn = 1
Note: You said you want the "first" record, but you didn't specify an order for the records. Rows in a relational table have no inherent order, so "first" simply means an arbitrary row, hence ORDER BY (SELECT NULL).
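If one particular row should win, replacing (SELECT NULL) with a real sort key makes the choice repeatable. Here is a minimal sketch using the sample data from the question. It assumes the alphabetically first D is the row to keep, and the temp table name #Sample is made up for illustration:

```sql
-- Hypothetical temp table holding the question's sample rows.
CREATE TABLE #Sample (A int, B varchar(20), C varchar(30), D varchar(20));

INSERT INTO #Sample (A, B, C, D)
VALUES (10022, 'Blue', '10022Blue', 'Buggy'),
       (10300, 'Red',  '10300Red',  'Noodle'),
       (10300, 'Red',  '10300Red',  'Sammy');

WITH CTE AS
(
    SELECT A, B, C, D,
           -- Assumption: the lowest D within each key is the row to keep.
           ROW_NUMBER() OVER (PARTITION BY C ORDER BY D) AS rn
    FROM #Sample
)
SELECT A, B, C, D
FROM CTE
WHERE rn = 1;

DROP TABLE #Sample;
```

With this ordering the 10300Red key always resolves to the Noodle row, because 'Noodle' sorts before 'Sammy'.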

If you don't need column D in the output, DISTINCT is enough:
select distinct A, B, C from tablename

You can get one row per key by grouping on C, then joining the grouped result back to the main table. Here MAX(D) decides which row is kept for each key:
SELECT
    A.*
FROM
    YourTable A INNER JOIN
    (
        SELECT
            G.C,
            MAX(G.D) D
        FROM
            YourTable G
        GROUP BY
            G.C
    ) B ON A.C = B.C AND A.D = B.D

Related

SQL Server: COUNT used with WHERE

I consider myself really new to SQL Server, so I'm not familiar with the less-used keywords like HAVING and COUNT(). So when I got this error:
An aggregate may not appear in the WHERE clause unless it is in a
subquery contained in a HAVING clause or a select list, and the column
being aggregated is an outer reference.
I was really confused by the last bit. "a select list?" "column being aggregated is an outer reference?" Can anyone explain this in layman's terms?
It's basically saying that an aggregate can't be used directly in a WHERE clause; it has to be computed inside a subquery:
SELECT A,
       B,
       C
FROM YourTable T
WHERE A = (SELECT MAX(D) FROM YourTable T2 WHERE T2.A = T.A)
-- Valid: MAX(D) is computed inside a subquery over T2, correlated to the outer row through T.A
SELECT A,
       B,
       C
FROM YourTable T
WHERE A = MAX(D) -- Invalid: aggregate used directly in WHERE
The HAVING version would be something like this:
SELECT A,
       B,
       C
FROM YourTable T
GROUP BY A,
         B,
         C
HAVING COUNT(*) > (SELECT MAX(D) FROM YourTable T2) -- Valid
SELECT A,
       B,
       C
FROM YourTable T
GROUP BY A,
         B,
         C
HAVING COUNT(*) > MAX(D) -- Also valid: HAVING is applied after grouping, so it can compare aggregates directly
The SELECT-list is
SELECT a, b, c ... <=== this list of expressions after SELECT
An outer reference is a column of the surrounding query referenced in a subquery. This is clearly explained here: Aggregates with an Outer Reference
Note that the WHERE-clause is applied before grouping (with GROUP BY) and the HAVING-clause after grouping. Therefore the aggregate functions can appear in the HAVING-clause but not in the WHERE clause.
SELECT customer_id, COUNT(*) as number_of_orders, SUM(amount) AS total_amount
FROM cust_orders
WHERE year(order_date) = 2017 -- filters records before grouping.
GROUP BY customer_id -- groups while counting and summing up.
HAVING COUNT(*) > 2 -- count is available here.
This selects all the customer orders of the year 2017 and calculates the totals per customer. Only customers having more than 2 orders in this year are returned.
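The outer-reference case that the error message mentions is rarer. It can be sketched as below: the aggregate sits in the WHERE clause of a subquery, but that subquery is inside HAVING and the aggregated column belongs to the outer query. The tables #orders and #tiers and their columns are hypothetical, chosen only to illustrate the rule.

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE #orders (customer_id int, amount int);
CREATE TABLE #tiers  (tier_name varchar(20), min_total int);

INSERT INTO #orders (customer_id, amount) VALUES (1, 50), (1, 70), (2, 10);
INSERT INTO #tiers  (tier_name, min_total) VALUES ('gold', 100);

SELECT o.customer_id
FROM #orders AS o
GROUP BY o.customer_id
HAVING EXISTS (SELECT 1
               FROM #tiers AS t
               -- SUM(o.amount) appears in the subquery's WHERE clause, but
               -- o.amount is a column of the outer query (an outer reference)
               -- and the subquery itself is contained in HAVING.
               WHERE t.min_total <= SUM(o.amount));

DROP TABLE #orders;
DROP TABLE #tiers;
```

The sum is evaluated per group of the outer query, then compared against each row of #tiers inside the subquery.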
Basically what it says is that you cannot do this:
WHERE COUNT(ColumnA) = 100
You need a HAVING after the GROUP BY:
SELECT COUNT(ColumnA) AS CountA, ColumnB, ColumnC
FROM YourTable
GROUP BY ColumnB, ColumnC
HAVING COUNT(ColumnA) = 100

DISTINCT Query for One Column but not any other columns

I have a small two-column table. Let's say the columns are A and B. Column A needs to be distinct so that it does not display a repeated value. Column B needs to have everything selected in the query, so if there are multiple B values for a value in A, the multiple values will display. How can I write a query that will do this for me?
While the duplicates are now gone...there is a bunch of blank space in my dropdown.
You could use a CTE to simplify it:
WITH CTE AS
(
    SELECT A, B,
           RN = ROW_NUMBER() OVER (PARTITION BY A ORDER BY A, B)
    FROM dbo.TableName
)
SELECT A = CASE WHEN RN = 1 THEN CAST(A AS varchar(50)) ELSE '' END,
       B
FROM CTE
ORDER BY CTE.A, RN -- keeps each A's rows together, so the blanked values sit under the first one

GROUP BY doesn't contain specific column

I have the following statement in MSSQL:
SELECT a, b, MAX(t)
FROM YourTable
GROUP BY a, b
I also want to show the c and d columns from the row that holds the maximum t in each group. How can I do that?
It sounds like you're looking for ROW_NUMBER() or RANK() (the former will ignore ties, the latter will include them), something like:
;WITH Ranked AS (
    SELECT a, b, c, d, t,
           ROW_NUMBER() OVER (PARTITION BY a, b
                              ORDER BY t DESC) AS rn
    FROM YourTable
)
SELECT * FROM Ranked WHERE rn = 1
Which will return one row for each unique combination of the a,b columns, choosing the other values such that they come from the row with the highest t value (and, as I say, this variant ignores ties).
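To see the difference on ties, here is a small sketch with made-up data in which two rows of the same (a, b) group share the highest t. The table name #Sample and its values are assumptions for illustration only.

```sql
-- Hypothetical sample data: the (1, 1) group has a tie on t.
CREATE TABLE #Sample (a int, b int, c varchar(10), d varchar(10), t int);

INSERT INTO #Sample (a, b, c, d, t)
VALUES (1, 1, 'c1', 'd1', 5),
       (1, 1, 'c2', 'd2', 5),
       (1, 2, 'c3', 'd3', 7);

WITH Ranked AS
(
    SELECT a, b, c, d, t,
           ROW_NUMBER() OVER (PARTITION BY a, b ORDER BY t DESC) AS rn,
           RANK()       OVER (PARTITION BY a, b ORDER BY t DESC) AS rk
    FROM #Sample
)
SELECT a, b, c, d, t, rn, rk
FROM Ranked
WHERE rk = 1;  -- keeps both tied rows; WHERE rn = 1 would keep only one of them

DROP TABLE #Sample;
```

Filtering on rk = 1 returns three rows here (both tied rows of the first group plus the single row of the second), whereas filtering on rn = 1 returns two.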

concatenate columns from 2 tables in resultset

Here is a simplified version of my schema. Using SQL Server 2012 Enterprise Edition.
CREATE table #abc (a INT , b INT);
CREATE TABLE #def ( a INT , c INT ,d INT);
INSERT INTO #abc values(1,23),(1,24);
INSERT INTO #def VALUES(1,53,54),(1,56,57)
Table #abc joins to #def on column a.
Basically, I want to concatenate the rows from both tables side by side based on column a. I tried INNER JOIN and CROSS APPLY, but understandably they both produce a cross-join-like result set. I have a workaround using another temp table (followed by an update), but I feel this can be done easily in a single SELECT. I am missing something simple here.
Need output like this:
a b c d
1 23 53 54
1 24 56 57
Thanks
-N
You need some sort of sequence number to join the tables together. You can generate one using row_number() as follows:
select a.a, a.b, d.c, d.d
from (select a.*, row_number() over (order by (select NULL)) as seqnum
      from #abc a
     ) a join
     (select d.*, row_number() over (order by (select NULL)) as seqnum
      from #def d
     ) d
     on a.seqnum = d.seqnum;
Now the caution, caution, caution. ORDER BY (SELECT NULL) does not really specify an ordering, so the sequence numbers may not be what you expect. You should really have a column that specifies the ordering.
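If the tables do have columns that define the pairing, the numbering can be made deterministic. The sketch below assumes that rows should be paired in order of b on one side and c on the other, within each value of a. That assumption is not stated in the question, but it happens to reproduce the requested output.

```sql
CREATE TABLE #abc (a INT, b INT);
CREATE TABLE #def (a INT, c INT, d INT);
INSERT INTO #abc VALUES (1, 23), (1, 24);
INSERT INTO #def VALUES (1, 53, 54), (1, 56, 57);

SELECT x.a, x.b, y.c, y.d
FROM (SELECT a, b,
             -- Assumption: b defines the order of rows within each a.
             ROW_NUMBER() OVER (PARTITION BY a ORDER BY b) AS seqnum
      FROM #abc) AS x
JOIN (SELECT a, c, d,
             -- Assumption: c defines the order of rows within each a.
             ROW_NUMBER() OVER (PARTITION BY a ORDER BY c) AS seqnum
      FROM #def) AS y
  ON y.a = x.a
 AND y.seqnum = x.seqnum
ORDER BY x.a, x.seqnum;

DROP TABLE #abc;
DROP TABLE #def;
```

Partitioning by a also keeps the pairing correct when the tables hold more than one value of a.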
You need a key value that uniquely pairs each row in one table with a single row in the other to be able to join the tables in the way you would like. With such a key, an inner join will return the result set you require.
If you introduce referential integrity between the tables on that key, the pairing is enforced and the join keeps returning the expected results.

set difference in SQL query

I'm trying to select records with a statement
SELECT *
FROM A
WHERE
LEFT(B, 5) IN
(SELECT * FROM
(SELECT LEFT(A.B,5), COUNT(DISTINCT A.C) c_count
FROM A
GROUP BY LEFT(B,5)
) p1
WHERE p1.c_count = 1
)
AND C IN
(SELECT * FROM
(SELECT A.C , COUNT(DISTINCT LEFT(A.B,5)) b_count
FROM A
GROUP BY C
) p2
WHERE p2.b_count = 1)
which takes a long time to run ~15 sec.
Is there a better way of writing this SQL?
If you would like to represent set difference (A - B) in SQL, here is a solution.
Let's say you have two tables A and B, and you want to retrieve all records that exist only in A but not in B, where A and B have a relationship via an attribute named ID.
An efficient query for this is:
-- (A - B)
SELECT DISTINCT A.* FROM (A LEFT OUTER JOIN B on A.ID=B.ID) WHERE B.ID IS NULL
- from Jayaram Timsina's blog.
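The same anti-join is often written with NOT EXISTS, which does not need DISTINCT. A minimal sketch, using hypothetical temp tables #A and #B in place of A and B:

```sql
-- Hypothetical temp tables standing in for A and B.
CREATE TABLE #A (ID int, Name varchar(20));
CREATE TABLE #B (ID int);

INSERT INTO #A (ID, Name) VALUES (1, 'one'), (2, 'two'), (3, 'three');
INSERT INTO #B (ID) VALUES (2);

-- Rows of #A whose ID has no match in #B.
SELECT a.ID, a.Name
FROM #A AS a
WHERE NOT EXISTS (SELECT 1 FROM #B AS b WHERE b.ID = a.ID);

DROP TABLE #A;
DROP TABLE #B;
```

One difference in behavior: the DISTINCT in the join version collapses duplicate rows of A, while this form returns them as they are.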
You don't need to return data from the nested subqueries. I'm not sure this will make a difference without indexing, but it's easier to read.
Also, EXISTS or JOIN is probably nicer, IMHO, than using IN:
SELECT *
FROM
A
JOIN
(SELECT LEFT(B,5) AS b1
FROM A
GROUP BY LEFT(B,5)
HAVING COUNT(DISTINCT C) = 1
) t1 On LEFT(A.B, 5) = t1.b1
JOIN
(SELECT C AS C1
FROM A
GROUP BY C
HAVING COUNT(DISTINCT LEFT(B,5)) = 1
) t2 ON A.C = t2.c1
But you'll need at least a computed column, as marc_s said, and two indexes: one on (computed, C) and another on (C, computed).
Well, not sure what you're really trying to do here - but obviously, that LEFT(B, 5) expression keeps popping up. Since you're using a function, you're giving up any chance to use an index.
What you could do in your SQL Server table is to create a computed, persisted column for that expression, and then put an index on that:
ALTER TABLE A
ADD LeftB5 AS LEFT(B, 5) PERSISTED
CREATE NONCLUSTERED INDEX IX_LeftB5 ON dbo.A(LeftB5)
Now use the new computed column LeftB5 instead of LEFT(B, 5) anywhere in your query - that should help to speed up certain lookups and GROUP BY operations.
Also - you have a GROUP BY C in there - is that column C indexed?
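Putting the two suggestions together, the query could look roughly like the sketch below. It assumes the LeftB5 computed column above already exists on dbo.A. The index names are made up, and whether the optimizer actually uses them depends on the data, so treat it as a starting point to measure rather than a guaranteed speed-up.

```sql
-- Hypothetical composite indexes covering both GROUP BY directions.
CREATE NONCLUSTERED INDEX IX_A_LeftB5_C ON dbo.A (LeftB5, C);
CREATE NONCLUSTERED INDEX IX_A_C_LeftB5 ON dbo.A (C, LeftB5);

SELECT src.*
FROM dbo.A AS src
JOIN (SELECT LeftB5
      FROM dbo.A
      GROUP BY LeftB5
      HAVING COUNT(DISTINCT C) = 1) AS t1   -- prefixes that map to a single C
  ON src.LeftB5 = t1.LeftB5
JOIN (SELECT C
      FROM dbo.A
      GROUP BY C
      HAVING COUNT(DISTINCT LeftB5) = 1) AS t2   -- C values that map to a single prefix
  ON src.C = t2.C;
```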
If you are looking for just the set difference between table1 and table2, the query below is a simple way to get rows that are in table1 but not in table2, assuming both tables have the same schema with column names columnone, columntwo, ...
Note that it checks each column independently, so it is stricter than a row-by-row set difference.
with
col1 as (
    select columnone from table2
),
col2 as (
    select columntwo from table2
)
...
select * from table1
where (
    columnone not in (select columnone from col1)
    and columntwo not in (select columntwo from col2)
    ...
);
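For a row-by-row set difference, SQL Server also has the EXCEPT operator, which compares whole rows. A minimal sketch with hypothetical temp tables (names and values invented for illustration):

```sql
-- Hypothetical tables with identical column lists.
CREATE TABLE #table1 (columnone int, columntwo varchar(10));
CREATE TABLE #table2 (columnone int, columntwo varchar(10));

INSERT INTO #table1 (columnone, columntwo) VALUES (1, 'x'), (2, 'y'), (3, 'z');
INSERT INTO #table2 (columnone, columntwo) VALUES (2, 'y'), (3, 'q');

-- Distinct rows of #table1 that have no identical row in #table2.
SELECT columnone, columntwo FROM #table1
EXCEPT
SELECT columnone, columntwo FROM #table2;

DROP TABLE #table1;
DROP TABLE #table2;
```

Here EXCEPT returns (1, 'x') and (3, 'z'). A per-column NOT IN check would drop (3, 'z') as well, because the value 3 appears in #table2.columnone even though the full row does not match.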
