SQL GROUP BY with columns which contain mirrored values - sql-server

Sorry for the bad title. I couldn't think of a better way to describe my issue.
I have the following table:
Category | A | B
A | 1 | 2
A | 2 | 1
B | 3 | 4
B | 4 | 3
I would like to group the data by Category and return only one row per category, but provide both values of columns A and B.
So the result should look like this:
category | resultA | resultB
A | 1 | 2
B | 4 | 3
How can this be achieved?
I tried this statement:
SELECT category, a, b
FROM table
GROUP BY category
but obviously, I get the following errors:
Column 'a' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Column 'b' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
How can I achieve the desired result?

Try this:
SELECT category, MIN(a) AS resultA, MAX(a) AS resultB
FROM table
GROUP BY category
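If the values are mirrored (every (a, b) pair also appears as (b, a)), column a alone already contains both values of each pair, so MIN(a) and MAX(a) return them. Note that the pair always comes back with the smaller value first.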
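A minimal sketch of the MIN/MAX answer, run through Python's built-in sqlite3 module as a stand-in for SQL Server (the GROUP BY query is plain SQL). The table name mytable is an assumption, since TABLE is a reserved word; the rows are the ones from the question:

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (category TEXT, a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [("A", 1, 2), ("A", 2, 1), ("B", 3, 4), ("B", 4, 3)],
)

# Every (a, b) pair also appears as (b, a), so column a alone holds
# both values and MIN(a)/MAX(a) recover the pair.
rows = conn.execute(
    """
    SELECT category, MIN(a) AS resultA, MAX(a) AS resultB
    FROM mytable
    GROUP BY category
    ORDER BY category
    """
).fetchall()
print(rows)  # [('A', 1, 2), ('B', 3, 4)]
```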

It seems you don't really want to aggregate per category, but rather remove duplicate rows from your result (or rather rows that you consider duplicates).
You consider a pair (x,y) equal to the pair (y,x). To find such duplicates, you can normalize each pair by putting the lower value first and the greater one second, and then apply DISTINCT to the rows:
select distinct
category,
case when a < b then a else b end as attr1,
case when a < b then b else a end as attr2
from mytable;
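The normalize-then-DISTINCT idea can be tried the same way, with Python's sqlite3 module standing in for SQL Server. Category C, holding two different pairs, is hypothetical data added to show that, unlike the MIN/MAX approach, DISTINCT keeps every distinct pair:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (category TEXT, a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [("A", 1, 2), ("A", 2, 1), ("B", 3, 4), ("B", 4, 3),
     ("C", 5, 6), ("C", 6, 5), ("C", 7, 8)],  # C rows are hypothetical
)

# Smaller value first, larger second: (x, y) and (y, x) become identical
# rows, which DISTINCT then collapses into one.
rows = conn.execute(
    """
    SELECT DISTINCT
        category,
        CASE WHEN a < b THEN a ELSE b END AS attr1,
        CASE WHEN a < b THEN b ELSE a END AS attr2
    FROM mytable
    ORDER BY category, attr1
    """
).fetchall()
print(rows)  # [('A', 1, 2), ('B', 3, 4), ('C', 5, 6), ('C', 7, 8)]
```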

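Assuming you want an arbitrary record from each set of duplicates in a category, here is one trick using a table value constructor and the ROW_NUMBER window function (ORDER BY (SELECT NULL) picks an arbitrary row, not a random one; use ORDER BY NEWID() for a random pick):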
;WITH cte AS
(
    SELECT *,
           (SELECT MIN(v) FROM (VALUES (a), (b)) AS x(v)) AS min_val,
           (SELECT MAX(v) FROM (VALUES (a), (b)) AS x(v)) AS max_val
    FROM (VALUES ('A', 1, 2),
                 ('A', 2, 1),
                 ('B', 3, 4),
                 ('B', 4, 3)) AS tc(Category, A, B)
)
SELECT Category, A, B
FROM
(
    -- Number the rows within each category and normalized pair (min_val, max_val)
    SELECT ROW_NUMBER() OVER (PARTITION BY Category, min_val, max_val ORDER BY (SELECT NULL)) AS Rn, *
    FROM cte
) AS numbered
WHERE Rn = 1;
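A sketch of the same ROW_NUMBER idea in SQLite (via Python's sqlite3), assuming SQLite 3.25 or later for window functions. SQLite's two-argument scalar MIN(a, b) and MAX(a, b) replace the VALUES-based subqueries, and ORDER BY a is used instead of an arbitrary order so the output is deterministic; ORDER BY random() would pick a random row per group instead:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
rows = conn.execute(
    """
    WITH data(category, a, b) AS (
        VALUES ('A', 1, 2), ('A', 2, 1), ('B', 3, 4), ('B', 4, 3)
    ),
    numbered AS (
        SELECT category, a, b,
               -- one numbering per category and normalized pair
               ROW_NUMBER() OVER (
                   PARTITION BY category, MIN(a, b), MAX(a, b)
                   ORDER BY a
               ) AS rn
        FROM data
    )
    SELECT category, a, b
    FROM numbered
    WHERE rn = 1
    ORDER BY category
    """
).fetchall()
print(rows)  # [('A', 1, 2), ('B', 3, 4)]
```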

Related

Joining 2nd Table with Random Row to each record

I need to join Table B to Table A, where Table B's records are randomly assigned to the rows of Table A. Most of the queries out there are based on a key between the tables plus conditions, whereas I just want to join records randomly, without a key.
I'm not sure where to start, as none of the queries I've found do this. I assume a nested join could help, but how can I randomly assign the records in the join?
**Table A**
| Associate ID| Statement|
|:----: |:------:|
| 33691| John is |
| 82451| Susie is |
| 25485| Sam is|
| 26582| Lonnie is|
| 52548| Carl is|
**Table B**
| RowID | List|
|:----: |:------:|
| 1| admirable|
| 2| astounding|
| 3| excellent|
| 4| awesome|
| 5| first class|
The result would be something like this, where items from the list are not looped through in order, but random:
**Result Table**
| Associate ID| Statement| List|
|:----: |:------:|:------:|
| 33691| John is |astounding|
| 82451| Susie is |first class|
| 25485| Sam is|admirable|
| 26582| Lonnie is|excellent|
| 52548| Carl is|awesome|
These are some of the queries I've tried:
https://social.msdn.microsoft.com/Forums/sqlserver/en-US/aeb83251-e132-435a-8630-e5b842a69368/random-join-between-tables?forum=sqldataaccess
- This seems to loop through the values from Table B in order, not randomly.
https://www.daveperrett.com/articles/2009/08/11/mysql-select-random-row-with-join
- This relies on a common key between the two tables and returns one of the records with that key, which I do not have.
SQL Join help when selecting random row
- I'll be honest, I don't understand this one, but it doesn't seem to assign a random row to each row from Table A; it seems to pick one random row overall, like the link above.
Join One Table To Get Random Rows from 2nd Table
- This seems to be specific to a key, not random overall.
Using two CTEs, we number the rows of each table in a random order and then join the tables on that row number.
If A has more rows than B, B first has to be repeated N times in a CTE, as described in Repeat Rows N Times According to Column Value (not included below). To get N, divide the row count of A by the row count of B and add 1.
Assuming even distribution (A and B have the same number of rows):
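-- The CTE names differ from the table names: in SQL Server a CTE that
-- references its own name is treated as recursive.
WITH RandomA AS (
    SELECT *, ROW_NUMBER() OVER (ORDER BY NEWID()) AS RN
    FROM A),
RandomB AS (
    SELECT *, ROW_NUMBER() OVER (ORDER BY NEWID()) AS RN
    FROM B)
SELECT *
FROM RandomA
INNER JOIN RandomB
    ON RandomA.RN = RandomB.RN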
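A sketch of the random row-number join, using SQLite through Python's sqlite3 as a stand-in: random() plays the role of NEWID(), and the tables a and b (filled with the sample rows from the question) are assumptions. Like the query above, it assumes both tables have the same number of rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (associate_id INTEGER, statement TEXT)")
conn.execute("CREATE TABLE b (row_id INTEGER, list TEXT)")
conn.executemany(
    "INSERT INTO a VALUES (?, ?)",
    [(33691, "John is"), (82451, "Susie is"), (25485, "Sam is"),
     (26582, "Lonnie is"), (52548, "Carl is")],
)
conn.executemany(
    "INSERT INTO b VALUES (?, ?)",
    [(1, "admirable"), (2, "astounding"), (3, "excellent"),
     (4, "awesome"), (5, "first class")],
)

# Number each table in its own random order, then pair rows with equal numbers.
rows = conn.execute(
    """
    WITH ra AS (
        SELECT *, ROW_NUMBER() OVER (ORDER BY random()) AS rn FROM a
    ),
    rb AS (
        SELECT *, ROW_NUMBER() OVER (ORDER BY random()) AS rn FROM b
    )
    SELECT ra.associate_id, ra.statement, rb.list
    FROM ra
    JOIN rb ON ra.rn = rb.rn
    ORDER BY ra.associate_id
    """
).fetchall()
for row in rows:
    print(row)
```

Running it several times gives different pairings, but each list entry is used exactly once.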
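Or, if the tables have different row counts (rows from B can then repeat), use:
SELECT *
FROM A
CROSS APPLY (SELECT TOP 1 * FROM B ORDER BY NEWID()) Z
Depending on the plan, SQL Server may evaluate an uncorrelated subquery like this only once, so every row of A can end up with the same row of B; referencing a column of A inside the subquery is a common way to force per-row evaluation.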
This method assumes you know in advance which is the smaller table.
First it numbers the rows of the smaller table in ascending order, starting at 1. This numbering does not have to be random.
Then, for each row in the larger table, it uses the modulo operator to calculate a random row number in that range to join onto.
WITH Small
AS (SELECT *,
ROW_NUMBER() OVER ( ORDER BY (SELECT 0)) AS RN
FROM SmallTable),
Large
AS (SELECT *,
1 + CRYPT_GEN_RANDOM(3) % (SELECT COUNT(*) FROM SmallTable) AS RND
FROM LargeTable
ORDER BY RND
OFFSET 0 ROWS)
SELECT *
FROM Large
INNER JOIN Small
ON Small.RN = Large.RND
The ORDER BY RND OFFSET 0 ROWS is to get the random numbers materialized in advance.
This will allow a MERGE join on the smaller table. It also avoids an issue that can sometimes happen where the CRYPT_GEN_RANDOM is moved around in the plan and only evaluated once rather than once per row as required.
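The modulo approach can be sketched in SQLite as well (via Python's sqlite3). SQLite has no CRYPT_GEN_RANDOM and the OFFSET-based materialization trick is SQL Server specific, so this sketch assumes random() as the random source and a temporary table to materialize the random row numbers, so that each large-table row gets exactly one fixed number before the join. The tables small_table and large_table are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE small_table (list TEXT)")
conn.execute("CREATE TABLE large_table (id INTEGER)")
conn.executemany(
    "INSERT INTO small_table VALUES (?)",
    [("admirable",), ("astounding",), ("excellent",)],
)
conn.executemany("INSERT INTO large_table VALUES (?)", [(i,) for i in range(1, 11)])

# Number the smaller table 1..n; this order does not need to be random.
conn.execute(
    "CREATE TEMP TABLE small AS "
    "SELECT list, ROW_NUMBER() OVER (ORDER BY rowid) AS rn FROM small_table"
)
n = conn.execute("SELECT COUNT(*) FROM small_table").fetchone()[0]

# Materialize one random number in 1..n per large-table row before joining.
# random() may be negative and % keeps the sign of its left operand,
# hence the "+ n) % n" step that maps the result into 0..n-1.
conn.execute("CREATE TEMP TABLE large (id INTEGER, rnd INTEGER)")
conn.execute(
    "INSERT INTO large SELECT id, 1 + ((random() % ?) + ?) % ? FROM large_table",
    (n, n, n),
)

rows = conn.execute(
    "SELECT large.id, small.list "
    "FROM large JOIN small ON small.rn = large.rnd "
    "ORDER BY large.id"
).fetchall()
for row in rows:
    print(row)
```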

In PostgreSQL, how can I extract matching items from a list?

I have a query in PostgreSQL that returns results like this, records with a string and a json array:
id | property_list
-----+-------------------------------------------------------------------------------
"i1" | [{"a":{"b":"no"}}, {"a":{"b":"yes"}}, {"a":{"b":"true"}}, {"a":{"b":"false"}}]
"i2" | [{"a":{"b":"yes"}}, {"a":{"b":"no"}}, {"a":{"b":"no"}}]
What I need is something like this:
id | yes_or_true
-----+------------
"i1" | 2
"i2" | 1
I need to count the properties in property_list where a.b equals "yes" or "true".
There are more properties, but there is always an a.b property with a string as its value.
I can solve this using a PL/pgSQL function, but for some reason, I'm in a situation where I can't use a PL/pgSQL function. How can I solve this in the query?
You can do this using json_array_elements() (or jsonb_array_elements() for a jsonb column) in a subquery:
SELECT
id,
(SELECT count(*)
FROM json_array_elements(property_list) el
WHERE el->'a'->>'b' IN ('true','yes')
) AS yes_or_true
FROM the_table
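SQLite's json_each() table-valued function plays the same role as json_array_elements() here, so the subquery approach can be tried with Python's sqlite3 module (assuming the SQLite build includes the JSON functions, as current builds do). json_extract(el.value, '$.a.b') is the SQLite counterpart of el->'a'->>'b':

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (id TEXT, property_list TEXT)")
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?)",
    [
        ("i1", json.dumps([{"a": {"b": "no"}}, {"a": {"b": "yes"}},
                           {"a": {"b": "true"}}, {"a": {"b": "false"}}])),
        ("i2", json.dumps([{"a": {"b": "yes"}}, {"a": {"b": "no"}},
                           {"a": {"b": "no"}}])),
    ],
)

# Correlated subquery: expand each row's array and count matching elements.
rows = conn.execute(
    """
    SELECT id,
           (SELECT COUNT(*)
            FROM json_each(the_table.property_list) AS el
            WHERE json_extract(el.value, '$.a.b') IN ('true', 'yes')
           ) AS yes_or_true
    FROM the_table
    ORDER BY id
    """
).fetchall()
print(rows)  # [('i1', 2), ('i2', 1)]
```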
A lateral join to jsonb_array_elements() will solve this:
with indat (id, property_list) as (
values
('i1', '[{"a":{"b":"no"}}, {"a":{"b":"yes"}}, {"a":{"b":"true"}}, {"a":{"b":"false"}}]'::jsonb),
('i2', '[{"a":{"b":"yes"}}, {"a":{"b":"no"}}, {"a":{"b":"no"}}]'::jsonb)
)
select id, count(*) filter (where jdat->'a'->>'b' in ('yes', 'true'))
from indat
cross join lateral jsonb_array_elements(property_list) as j(jdat)
group by id;
id | count
----+-------
i1 | 2
i2 | 1
(2 rows)

Select from union of nested subqueries

I am quite certain this is an issue with applying proper aliases; I'm just not sure where I'm going wrong. I am looking at the following UNION in SQL Server:
Select Z.DesiredResult1, etc...
from (
Select C.columns
from (
Select B.columns
from (
Select A.columns
from (Subquery) as A
) as B
) as C
Where C.condition = 1
UNION
Select F.columns
from (
Select E.columns
from (
Select D.columns
from (Subquery) as D
) as E
) as F
Where F.condition = 2
) as Z
The UNION by itself works perfectly, but when I select from it (as shown above), it throws an error:
No column name was specified for column 1 of 'Z'
Any insights would be appreciated, thanks for helping an SQL newbie.
Edit: Solved--I misunderstood the error. The issue was an aggregate function that needed an alias, not an entire subquery. Leaving the aggregate column unnamed worked fine for the union alone, so I didn't even consider it. Thanks for bothering to read.
This error can be easily reproduced.
If you select from derived tables whose columns are not named, even in a simple UNION:
SELECT *
FROM (SELECT 'A','B') T1
UNION
SELECT *
FROM (SELECT 'C','D') T2
You will get the same error:
No column name was specified for column 1 of 'T1'.
No column name was specified for column 2 of 'T1'.
No column name was specified for column 1 of 'T2'.
No column name was specified for column 2 of 'T2'.
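Give every column in each derived table a name; using the same names in both branches keeps the query readable (the UNION takes its column names from the first branch).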
SELECT T3.Result1, T3.Result2
FROM
(SELECT *
FROM (SELECT 'A' Result1, 'B' Result2) T1
UNION
SELECT *
FROM (SELECT 'C' Result1, 'D' Result2) T2) T3
+----+---------+---------+
| | Result1 | Result2 |
+----+---------+---------+
| 1 | A | B |
+----+---------+---------+
| 2 | C | D |
+----+---------+---------+
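The corrected pattern can be run with Python's sqlite3 module. SQLite does not raise the "No column name was specified" error (it generates names for unnamed columns), so this sketch only shows the fixed query; the second branch deliberately uses different aliases (X, Y) to show that the outer query sees the names from the first branch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
rows = conn.execute(
    """
    SELECT T3.Result1, T3.Result2
    FROM (
        SELECT * FROM (SELECT 'A' AS Result1, 'B' AS Result2) AS T1
        UNION
        SELECT * FROM (SELECT 'C' AS X, 'D' AS Y) AS T2
    ) AS T3
    ORDER BY T3.Result1
    """
).fetchall()
print(rows)  # [('A', 'B'), ('C', 'D')]
```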

select resultset of counts by array param in postgres

I've been searching for this and it seems like it should be something simple, but apparently it isn't. I want to return a result set in PostgreSQL 9.4.x using an array parameter, like this:
| id | count |
--------------
| 1 | 22 |
--------------
| 2 | 14 |
--------------
| 14 | 3 |
where I'm submitting a parameter of {'1','2','14'}.
I tried something like this (which clearly doesn't work):
SELECT id, count(a.*)
FROM tablename a
WHERE a.id::int IN array('{1,2,14}'::int);
I want to test it first of course, and then write it as a storedProc (function) to make this simple.
Forget it, here is the answer:
SELECT a.id,
COUNT(a.id)
FROM tableName a
WHERE a.id IN
(SELECT b.id
FROM tableName b
WHERE b.id = ANY('{1,2,14}'::int[])
)
GROUP BY a.id;
You can simplify to:
SELECT id, count(*) AS ct
FROM tbl
WHERE id = ANY('{1,2,14}'::int[])
GROUP BY 1;
More:
Check if value exists in Postgres array
To include IDs from the input array that are not found, I suggest unnest() followed by a LEFT JOIN:
SELECT id, count(t.id) AS ct
FROM unnest('{1,2,14}'::int[]) id
LEFT JOIN tbl t USING (id)
GROUP BY 1;
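SQLite has no arrays or unnest(), so in this sketch (Python's sqlite3 module) a VALUES list built from a Python list stands in for the array parameter. The contents of tbl are hypothetical; the LEFT JOIN is what keeps id 14 in the result with a count of 0:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER)")
# Hypothetical data: id 1 three times, id 2 once, id 14 not at all.
conn.executemany("INSERT INTO tbl VALUES (?)", [(1,), (1,), (1,), (2,), (5,)])

wanted = [1, 2, 14]
# One "(?)" row per requested id: VALUES (?), (?), (?)
placeholders = ", ".join("(?)" for _ in wanted)
rows = conn.execute(
    f"""
    WITH ids(id) AS (VALUES {placeholders})
    SELECT ids.id, COUNT(t.id) AS ct   -- COUNT(t.id) is 0 when nothing matched
    FROM ids
    LEFT JOIN tbl AS t ON t.id = ids.id
    GROUP BY ids.id
    ORDER BY ids.id
    """,
    wanted,
).fetchall()
print(rows)  # [(1, 3), (2, 1), (14, 0)]
```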
Related:
Preserve all elements of an array while (left) joining to a table
If there can be NULL values in the array parameter as well as in the id column (which would be an odd design), you'd need (slower!) NULL-safe comparison:
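-- id.id must be qualified here: both FROM items have a column named id.
-- count(t.*) also counts matched rows whose id is NULL.
SELECT id.id, count(t.*) AS ct
FROM unnest('{1,2,14}'::int[]) id
LEFT JOIN tbl t ON t.id IS NOT DISTINCT FROM id.id
GROUP BY 1;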

Max Value with unique values in more than one column

I feel like I'm missing something really obvious here.
Using T-SQL/SQL-Server:
I have unique values in more than one column but want to select the max version based on one particular column.
Dataset:
Example
ID | Name| Version | Code
------------------------
1 | Car | 3 | NULL
1 | Car | 2 | 1000
1 | Car | 1 | 2000
Target status: I want my query to only select the row with the highest version value. Running a MAX on the version column pulls all three because of the distinct values in the 'Code' column:
SELECT ID
,Name
,MAX(Version)
,Code
FROM Table
GROUP BY ID, Name, Code
The net result is that I get all three entries as per the data set due to the unique values in the Code column, but I only want the top row (Version 3).
Any help would be appreciated.
You need one query that finds the highest version per ID and Name, and an outer query that joins back to the table to pull out all the fields of that row, like so:
SELECT t.ID, t.Name, GRP.Version, t.Code
FROM (
SELECT ID
,Name
,MAX(Version) as Version
FROM Table
GROUP BY ID, Name
) GRP
INNER JOIN Table t on GRP.ID = t.ID and GRP.Name = t.Name and GRP.Version = t.Version
You can also use row_number() to do this kind of logic, for example like this:
select ID, Name, Version, Code
from (
select *, row_number() over (partition by ID, Name order by Version desc) as RN
from Table1
) X where RN = 1
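A sketch of the ROW_NUMBER approach in SQLite (Python's sqlite3, SQLite 3.25+ for window functions). PARTITION BY ID, Name picks the highest version per item rather than a single row overall; the Bus rows are hypothetical extra data added to show that:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (ID INTEGER, Name TEXT, Version INTEGER, Code INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        (1, "Car", 3, None),
        (1, "Car", 2, 1000),
        (1, "Car", 1, 2000),
        (2, "Bus", 1, 500),   # hypothetical second item
        (2, "Bus", 4, 700),
    ],
)

# RN = 1 marks the highest Version within each (ID, Name) group.
rows = conn.execute(
    """
    SELECT ID, Name, Version, Code
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY ID, Name ORDER BY Version DESC) AS RN
        FROM table1
    ) AS x
    WHERE RN = 1
    ORDER BY ID
    """
).fetchall()
print(rows)  # [(1, 'Car', 3, None), (2, 'Bus', 4, 700)]
```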
Add TOP 1 to force a single row to be returned, plus an ORDER BY clause so that it is the row with the highest version:
SELECT top 1 ID
,Name
,MAX(Version)
,Code
FROM Table
GROUP BY ID, Name, Code
order by max(version) desc
