DISTINCT Query for One Column but not any other columns - sql-server

I have a small 2 column table. Let's say the columns are A and B. Column A needs to be distinct so that it does not display a repeated value. Column B needs to have everything selected in the query, so if there are multiple B values for a value in A, all of them will display. How can I write a query that will do this for me?
While the duplicates are now gone, there is a lot of blank space in my dropdown.

You could use a CTE to simplify it:
WITH CTE AS
(
SELECT A, B,
RN = ROW_NUMBER() OVER (PARTITION BY A ORDER BY A, B)
FROM dbo.TableName
)
SELECT A = CASE WHEN RN = 1 THEN Cast(A as varchar(50)) ELSE '' END,
B
FROM CTE
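To see what the CTE approach returns, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. The sample rows, the `A_display` alias and the final `ORDER BY` are assumptions added for illustration; SQLite stands in for SQL Server here, so `CAST(... AS TEXT)` replaces `varchar(50)`, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical sample data; TableName, A and B mirror the names in the answer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableName (A INTEGER, B TEXT)")
conn.executemany(
    "INSERT INTO TableName VALUES (?, ?)",
    [(1, "x"), (1, "y"), (2, "z"), (2, "w"), (3, "q")],
)

query = """
WITH CTE AS (
    SELECT A, B,
           ROW_NUMBER() OVER (PARTITION BY A ORDER BY B) AS RN
    FROM TableName
)
SELECT CASE WHEN RN = 1 THEN CAST(A AS TEXT) ELSE '' END AS A_display,
       B
FROM CTE
ORDER BY A, RN   -- explicit order keeps each blanked row under its own group
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Only the first row of each A group shows the value; the remaining rows of the group carry an empty string, which is probably where the blank entries in the dropdown come from.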

Related

Table with record occurrences count for each column in PostgreSQL

I'm looking to generate a table of columns containing the count of occurrences of each unique record using PostgreSQL. My current approach is the following:
with a as (SELECT count(*) as count_each_record_a
FROM (SELECT column1 as word from table_name) t
group by word),
b as (SELECT count(*) as count_each_record_b
FROM (SELECT column2 as word from table_name) t
group by word)
select * from a, b;
When I run part of the query, as in:
SELECT count(*) as count_each_record_b
FROM (SELECT column2 as word from table_name) t
group by word
then I get proper results.
When I run the whole query, I get the proper results for count_each_record_a, but column count_each_record_b is populated only with 1. Column count_each_record_b should be the same as when running the query by itself.
What am I doing wrong here?
I've actually found an answer, even if it's not that straightforward:
with a as (SELECT ARRAY(SELECT count(*)
FROM (SELECT a as word from table) t
group by word) as a),
b as (SELECT ARRAY (SELECT count(*)
FROM (SELECT b as word from table) t
group by word) as b)
select * from a, b
For each column I get one row with an array containing the occurrences. Since I'm using this data further, it does serve my needs, but I wouldn't call this a proper solution for the conceptual problem at hand.
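One likely source of the confusion is that `select * from a, b` is a cross join, so every count from `a` is paired with every count from `b`. The sketch below reproduces that with an in-memory SQLite database (standing in for PostgreSQL) and then shows one possible alternative shape using `UNION ALL`. The sample rows and the `source_column`, `word` and `occurrences` names are assumptions for illustration, not part of the original question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (column1 TEXT, column2 TEXT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    [("a", "x"), ("a", "y"), ("b", "x"), ("a", "x")],
)

# Original shape: two grouped CTEs combined with a comma, i.e. a cross join.
cross = conn.execute("""
WITH a AS (SELECT count(*) AS count_each_record_a FROM table_name GROUP BY column1),
     b AS (SELECT count(*) AS count_each_record_b FROM table_name GROUP BY column2)
SELECT * FROM a, b
""").fetchall()
print(len(cross), "rows from the cross join")  # 2 groups x 2 groups

# Alternative: stack the per-column counts and tag each with its source column.
tidy = conn.execute("""
SELECT 'column1' AS source_column, column1 AS word, count(*) AS occurrences
FROM table_name GROUP BY column1
UNION ALL
SELECT 'column2', column2, count(*)
FROM table_name GROUP BY column2
ORDER BY source_column, word
""").fetchall()
for row in tidy:
    print(row)
```

The stacked form returns one row per distinct value per column, so the counts never get multiplied against each other.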

SQL Server: COUNT used with WHERE

I'd consider myself really new to SQL Server, so I'm not familiar with the less-used keywords like HAVING and COUNT(). So when I got this error:
An aggregate may not appear in the WHERE clause unless it is in a
subquery contained in a HAVING clause or a select list, and the column
being aggregated is an outer reference.
I was really confused by the last bit. "a select list?" "column being aggregated is an outer reference?" Can anyone explain this in layman's terms?
It's basically saying you need to use a subquery that references another table if you want to use aggregates in those places:
SELECT A,
B,
C
FROM Table T
WHERE A = (SELECT MAX(D) FROM Table T2 WHERE T2.A = T.A)
--Valid: MAX(D) sits inside a subquery on T2, and T.A is the outer reference to the enclosing query
SELECT A,
B,
C
FROM Table T
WHERE A = MAX(D) --Invalid
The HAVING version would be something like this:
SELECT A,
B,
C
FROM Table T
GROUP BY A,
B,
C
HAVING COUNT(*) > (SELECT MAX(D) FROM Table T2) --Valid
SELECT A,
B,
C
FROM Table T
GROUP BY A,
B,
C
HAVING COUNT(*) > MAX(D) --Also valid: HAVING is evaluated after grouping, so aggregates can be compared directly
The SELECT-list is
SELECT a, b, c ... <=== this list of expressions after SELECT
An outer reference is a column of the surrounding query referenced in a subquery. This is clearly explained here: Aggregates with an Outer Reference
Note that the WHERE-clause is applied before grouping (with GROUP BY) and the HAVING-clause after grouping. Therefore the aggregate functions can appear in the HAVING-clause but not in the WHERE clause.
SELECT customer_id, COUNT(*) as number_of_orders, SUM(amount) AS total_amount
FROM cust_orders
WHERE year(order_date) = 2017 -- filters records before grouping.
GROUP BY customer_id -- groups while counting and summing up.
HAVING COUNT(*) > 2 -- count is available here.
This selects all the customer orders of the year 2017 and calculates the totals per customer. Only customers having more than 2 orders in this year are returned.
Basically what it says is that you cannot do this:
WHERE COUNT(ColumnA) = 100
You need a HAVING after the GROUP BY:
SELECT COUNT(ColumnA) AS CountA, ColumnB, ColumnC
FROM Table
GROUP BY ColumnB, ColumnC
HAVING COUNT(ColumnA) = 100
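The WHERE-before-grouping, HAVING-after-grouping rule can be tried out with a small sketch. It uses Python's built-in SQLite rather than SQL Server, so the error text differs and `strftime('%Y', ...)` replaces `year(...)`; the sample orders are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cust_orders (customer_id INTEGER, order_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO cust_orders VALUES (?, ?, ?)",
    [
        (1, "2017-01-05", 10.0),
        (1, "2017-03-09", 20.0),
        (1, "2017-07-21", 30.0),
        (1, "2016-12-30", 99.0),  # removed by WHERE before any counting
        (2, "2017-02-14", 5.0),
        (2, "2017-02-15", 5.0),
    ],
)

# An aggregate directly in WHERE is rejected by the engine.
try:
    conn.execute("SELECT customer_id FROM cust_orders WHERE COUNT(*) > 2")
    where_rejected = False
except sqlite3.Error as exc:
    where_rejected = True
    print("rejected:", exc)

# The same condition in HAVING works because it runs after GROUP BY.
rows = conn.execute("""
SELECT customer_id, COUNT(*) AS number_of_orders, SUM(amount) AS total_amount
FROM cust_orders
WHERE strftime('%Y', order_date) = '2017'
GROUP BY customer_id
HAVING COUNT(*) > 2
""").fetchall()
print(rows)
```

Customer 1 keeps three 2017 orders after the WHERE filter and passes the HAVING test; customer 2 has only two and is dropped.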

Only display one row when key field is the same

I have created a key field (C) by joining two columns (A and B). I want to run a SQL query that takes only the top row whenever the same value of C appears more than once.
Sample data:
A      B     C          D
10022  Blue  10022Blue  Buggy
10300  Red   10300Red   Noodle
10300  Red   10300Red   Sammy
So I only want one line to show for 10300Red.
Cheers
One way to do it is with a cte and ROW_NUMBER():
;WITH CTE AS
(
SELECT A,
B,
C,
D,
ROW_NUMBER() OVER(PARTITION BY C ORDER BY (SELECT NULL)) rn
FROM Table
)
SELECT A, B, C, D
FROM CTE
WHERE rn = 1
Note: You did say you want the "first" record, but you didn't specify the order of the records. Since tables in a relational database are unsorted by nature, "first" is simply an arbitrary row, hence "order by (select null)".
Do it this way:
select distinct A, B, C from tablename
You can find the result set by grouping it, then join it with the main table.
SELECT
A.*
FROM
YourTable A INNER JOIN
(
SELECT
G.C,
MAX(G.D) D
FROM
YourTable G
GROUP BY
G.C
) B ON A.C = B.C AND A.D = B.D
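Which row survives depends on the ordering you pick, and the sketch below makes that visible using the sample data from the question. It runs on in-memory SQLite instead of SQL Server; ordering the window by D is an assumption chosen to make the result deterministic, and the `T` and `G` aliases are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (A INTEGER, B TEXT, C TEXT, D TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?)",
    [
        (10022, "Blue", "10022Blue", "Buggy"),
        (10300, "Red", "10300Red", "Noodle"),
        (10300, "Red", "10300Red", "Sammy"),
    ],
)

# ROW_NUMBER approach: keep the row with the smallest D in each C group.
first_by_d = conn.execute("""
WITH CTE AS (
    SELECT A, B, C, D,
           ROW_NUMBER() OVER (PARTITION BY C ORDER BY D) AS rn
    FROM YourTable
)
SELECT A, B, C, D FROM CTE WHERE rn = 1 ORDER BY C
""").fetchall()

# GROUP BY + join approach: keep the row whose D equals MAX(D) for its C.
max_d = conn.execute("""
SELECT T.A, T.B, T.C, T.D
FROM YourTable AS T
INNER JOIN (SELECT C, MAX(D) AS D FROM YourTable GROUP BY C) AS G
        ON T.C = G.C AND T.D = G.D
ORDER BY T.C
""").fetchall()

print(first_by_d)
print(max_d)
```

Both return one row per C, but the first keeps Noodle and the second keeps Sammy for 10300Red. Note that the join variant can still return duplicates if two rows share both C and the maximum D.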

GROUP BY doesn't contain specific column

I have the following statement in MSSQL
SELECT a, b, MAX(t)
FROM table
GROUP BY a, b
What I want is just to show c and d columns for each specific row in the result. How can I do that?
It sounds like you're looking for ROW_NUMBER() or RANK() (the former will ignore ties, the latter will include them), something like:
;With Ranked as (
SELECT a,b,c,d,t,
ROW_NUMBER() OVER (PARTITION BY a,b
ORDER BY t desc) as rn
FROM table
)
SELECT * from Ranked where rn = 1
Which will return one row for each unique combination of the a,b columns, choosing the other values such that they come from the row with the highest t value (and, as I say, this variant ignores ties).
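The difference between the two functions on ties is easy to check with a small sketch. It uses in-memory SQLite (3.25 or newer) in place of SQL Server, a hypothetical table called `demo`, and invented rows where two records in the same (a, b) group share the highest t.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (a INTEGER, b INTEGER, c TEXT, d TEXT, t INTEGER)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "c1", "d1", 5),
        (1, 1, "c2", "d2", 9),  # tied for the highest t in group (1, 1)
        (1, 1, "c3", "d3", 9),  # tied for the highest t in group (1, 1)
        (2, 1, "c4", "d4", 3),
    ],
)

def top_rows(ranking_function):
    """Return the rows ranked first per (a, b) using the given window function."""
    sql = f"""
    WITH Ranked AS (
        SELECT a, b, c, d, t,
               {ranking_function}() OVER (PARTITION BY a, b ORDER BY t DESC) AS rn
        FROM demo
    )
    SELECT a, b, c, d, t FROM Ranked WHERE rn = 1 ORDER BY a, b, c
    """
    return conn.execute(sql).fetchall()

row_number_rows = top_rows("ROW_NUMBER")  # exactly one row per group
rank_rows = top_rows("RANK")              # every row tied for first place

print(row_number_rows)
print(rank_rows)
```

ROW_NUMBER returns two rows (one per group, picking one of the tied rows arbitrarily), while RANK returns three because both tied rows rank first.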

T-SQL get row count before TOP is applied

I have a SELECT that can return hundreds of rows from a table (table can be ~50000 rows). My app is interested in knowing the number of rows returned, it means something important to me, but it actually uses only the top 5 of those hundreds of rows. What I want to do is limit the SELECT query to return only 5 rows, but also tell my app how many it would have returned (the hundreds). This is the original query:
SELECT id, a, b, c FROM table WHERE a < 2
Here is what I came up with - a CTE - but I don't feel comfortable with the total row count appearing in every row. Ideally I would want a result set of the TOP 5 and a returned parameter for the total row count.
WITH Everything AS
(
SELECT id, a, b, c FROM table WHERE a < 2
),
DetermineCount AS
(
SELECT COUNT(*) AS Total FROM Everything
)
SELECT TOP (5) id, a, b, c, Total
FROM Everything
CROSS JOIN DetermineCount;
Can you think of a better way?
Is there a way in T-SQL to return the affected row count of a SELECT TOP query before the TOP was applied? @@ROWCOUNT would return 5, but I wonder if there is a @@rowcountbeforetop sort of thing.
Thanks in advance for your help.
** Update **
This is what I'm doing now and I kind of like it over the CTE, although CTEs are so elegant.
-- @count is passed in as an out param to the stored procedure
CREATE TABLE dbo.#everything (id int, a int, b int, c int);
INSERT INTO #everything
SELECT id, a, b, c FROM table WHERE a < 2;
SET @count = @@ROWCOUNT;
SELECT TOP (5) id FROM #everything;
DROP TABLE #everything;
Here's a relatively efficient way to get 5 random rows and include the total count. The random element will introduce a full sort no matter where you put it.
SELECT TOP (5) id,a,b,c,total = COUNT(*) OVER()
FROM dbo.mytable
ORDER BY NEWID();
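The key point of `COUNT(*) OVER()` is that the window count is computed over all rows that pass the WHERE clause, before the row limit is applied. Here is a sketch of that behaviour on in-memory SQLite; since SQLite has no `TOP` or `NEWID()`, it assumes `LIMIT 5` and a deterministic `ORDER BY id` instead, and the 20 sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c INTEGER)")
# ids 1..20 with a cycling through 0..3, so exactly 10 rows satisfy a < 2
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [(i, i % 4, i * 10, i * 100) for i in range(1, 21)],
)

rows = conn.execute("""
SELECT id, a, b, c,
       COUNT(*) OVER () AS total   -- counts every row that passed WHERE
FROM mytable
WHERE a < 2
ORDER BY id
LIMIT 5                            -- stands in for TOP (5)
""").fetchall()

for row in rows:
    print(row)
```

Only five rows come back, yet each carries the total of ten matching rows, so the application can read the count from any one of them.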
Assuming you want the top 5 ordering by id ascending, this will do it with a single pass through your table.
; WITH Everything AS
(
SELECT id
, a
, b
, c
, ROW_NUMBER() OVER (ORDER BY id ASC) AS rn_asc
, ROW_NUMBER() OVER (ORDER BY id DESC) AS rn_desc
FROM <table>
)
SELECT id
, a
, b
, c
, rn_asc + rn_desc - 1 AS total_rows
FROM Everything
WHERE rn_asc <= 5
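The reason `rn_asc + rn_desc - 1` equals the total is that a row at position k from the front is at position N - k + 1 from the back, so the two positions always add up to N + 1. A quick check on in-memory SQLite follows; the sample rows are invented, `LIMIT` is not needed because the filter on `rn_asc` does the trimming, and the identity assumes the ordering column (here `id`) is unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c INTEGER)")
# ids 1..20 with a cycling through 0..3, so exactly 10 rows satisfy a < 2
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [(i, i % 4, i * 10, i * 100) for i in range(1, 21)],
)

rows = conn.execute("""
WITH Everything AS (
    SELECT id, a, b, c,
           ROW_NUMBER() OVER (ORDER BY id ASC)  AS rn_asc,
           ROW_NUMBER() OVER (ORDER BY id DESC) AS rn_desc
    FROM mytable
    WHERE a < 2
)
SELECT id, rn_asc, rn_desc, rn_asc + rn_desc - 1 AS total_rows
FROM Everything
WHERE rn_asc <= 5
ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

Every returned row reports the same `total_rows`. If the ordering column had duplicates, the two numberings would no longer be guaranteed to mirror each other and the sum could be off.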
