SQL Server: GROUP BY breaks my program

I have the following query. The idea is to inner join the records and group them so that I get one record (the latest one) from each group.
If I add the GROUP BY (as in the example below), it doesn't work.
If I remove the GROUP BY, the query works but displays duplicated data.
If I group by all the fields I selected before the inner join, it works but not as intended: it displays all records.
Any suggestions?
SELECT
Calibrations.Cert_No,
Calibrations.Cust_Ref,
Calibrations.Rec_Date,
Instruments.Inst_ID,
Instruments.Description,
Instruments.Model_no,
Instruments.Manufacturer,
Instruments.Serial_no,
Instruments.Status,
Instruments.Cust_Acc_No
FROM
Instruments
INNER JOIN
Calibrations ON Instruments.Inst_ID = Calibrations.Inst_ID
WHERE
Instruments.Cust_Name = '" & Session("MM_Username") & "'
AND Instruments.Cust_Acc_No = '" & Session("MM_Password") & "'
AND Instruments.Cust_Acc_No = '" & Replace(rsDue__MMColParam, "'", "''") & "'
AND Instruments.Status IN ('N')
GROUP BY
Instruments.Inst_ID
ORDER BY
Calibrations.Rec_Date DESC

You cannot have columns in the SELECT part of your query that do not appear in the GROUP BY part, unless they are inside an aggregate function such as MIN(), MAX(), SUM(), etc.
Think about it this way: Say you have a table that looks like this:
+----------+------+--------+
| Col1 | Col2 | NumCol |
+----------+------+--------+
| Value 1a | ABC | 123 |
| Value 1a | DEF | 234 |
| Value 1b | GHI | 345 |
| Value 1b | JKL | 456 |
+----------+------+--------+
This query would not work:
SELECT Col1, Col2, NumCol FROM Table
GROUP BY Col1 ORDER BY NumCol
Why? Because you are only grouping by Col1, and since this column only contains two distinct values, the query engine doesn't know which of the values it should display in the Col2 or NumCol columns (since these contain 4 distinct values).
To fix this, you should either remove the columns from your SELECT statement like this:
SELECT Col1 FROM Table
GROUP BY Col1
...or aggregate the columns somehow. For example like this:
SELECT Col1, MAX(Col2) AS Col2, SUM(NumCol) AS NumCol FROM Table
GROUP BY Col1 ORDER BY NumCol
However, this is not the same as getting the "latest record", or for example the record with the largest NumCol for each distinct value of Col1. To do that, you should consider using the ROW_NUMBER() windowed function like this:
SELECT Col1, Col2, NumCol FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Col1 ORDER BY NumCol DESC) AS N
FROM Table
) AS T
WHERE T.N = 1
How this works is a topic of its own, but basically, ROW_NUMBER assigns a running number to each row and restarts the numbering each time it encounters a new value in Col1. The ordering makes sure that the numbering starts at 1 for the record that has the largest NumCol value. In the outer select statement, you then filter on this running number to get only the first record for each distinct Col1 value, that is, the record with the largest NumCol value.
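The ROW_NUMBER() approach can be tried end to end on the four-row sample table above. This is a minimal sketch, not the asker's setup: it uses Python's built-in sqlite3 module with an in-memory database purely so it is self-contained (an assumption; window functions need SQLite 3.25 or newer), and the table name T is a placeholder. The query itself is the same shape as the T-SQL version.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server here (assumption, for a
# self-contained demo). Window functions require SQLite 3.25+.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (Col1 TEXT, Col2 TEXT, NumCol INTEGER)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [
        ("Value 1a", "ABC", 123),
        ("Value 1a", "DEF", 234),
        ("Value 1b", "GHI", 345),
        ("Value 1b", "JKL", 456),
    ],
)

# Number the rows inside each Col1 group, largest NumCol first,
# then keep only the row numbered 1 in every group.
query = """
SELECT Col1, Col2, NumCol
FROM (
    SELECT Col1, Col2, NumCol,
           ROW_NUMBER() OVER (PARTITION BY Col1 ORDER BY NumCol DESC) AS N
    FROM T
) AS Ranked
WHERE N = 1
ORDER BY Col1
"""
top_rows = conn.execute(query).fetchall()
for row in top_rows:
    print(row)
# ('Value 1a', 'DEF', 234)
# ('Value 1b', 'JKL', 456)
```

Unlike MAX(Col2) with SUM(NumCol), every value in a returned row comes from the same source row.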

When you group in a SQL query, every selected column must either be listed in the GROUP BY clause or be wrapped in an aggregate function. A column that is neither grouped nor aggregated is not allowed.
You did not provide any information about your specific goal, but you can either get the values by aggregating (using MIN, MAX, AVG, etc.), or use a subquery to retrieve the distinct list and then another one to retrieve the specific data for each entry, or use analytic functions (FIRST_VALUE, LAST_VALUE, etc.) together with DISTINCT.
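The subquery option could look roughly like the sketch below: an aggregate subquery finds the latest Rec_Date per Inst_ID, and joining it back to Calibrations picks up the remaining columns of that row. The table and column names follow the question, but the sample rows, the reduced column list, and the use of in-memory SQLite through Python's sqlite3 module are assumptions made only to keep the example self-contained.

```python
import sqlite3

# Hypothetical sample data; only a few of the question's columns are modelled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Instruments (Inst_ID INTEGER PRIMARY KEY, Description TEXT);
CREATE TABLE Calibrations (Cert_No TEXT, Inst_ID INTEGER, Rec_Date TEXT);
INSERT INTO Instruments VALUES (1, 'Caliper'), (2, 'Scale');
INSERT INTO Calibrations VALUES
    ('C-100', 1, '2015-01-10'),
    ('C-101', 1, '2016-03-05'),
    ('C-200', 2, '2015-07-20');
""")

# The inner query groups by Inst_ID only; the outer joins fetch the
# non-aggregated columns of the row that carries that latest date.
query = """
SELECT i.Inst_ID, i.Description, c.Cert_No, c.Rec_Date
FROM Instruments AS i
INNER JOIN Calibrations AS c ON c.Inst_ID = i.Inst_ID
INNER JOIN (
    SELECT Inst_ID, MAX(Rec_Date) AS Last_Date
    FROM Calibrations
    GROUP BY Inst_ID
) AS latest
    ON latest.Inst_ID = c.Inst_ID AND latest.Last_Date = c.Rec_Date
ORDER BY c.Rec_Date DESC
"""
latest_rows = conn.execute(query).fetchall()
for row in latest_rows:
    print(row)
# (1, 'Caliper', 'C-101', '2016-03-05')
# (2, 'Scale', 'C-200', '2015-07-20')
```

If two calibrations of the same instrument share the latest date, this form returns both of them; the ROW_NUMBER() form returns exactly one.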

Related

Joining 2nd Table with Random Row to each record

I need to join table B to Table A, where Table B's records are randomly assigned, or joined. Most of the queries out there are based off of having a key between them and conditions, where I just want to randomly join records without a key.
I'm not sure where to start, as none of the queries I've found are doing this. I assume a nested join could be helpful for this, but how can I randomly assort the records on join?
**Table A**
| Associate ID| Statement|
|:----: |:------:|
| 33691| John is |
| 82451| Susie is |
| 25485| Sam is|
| 26582| Lonnie is|
| 52548| Carl is|
**Table B**
| RowID | List|
|:----: |:------:|
| 1| admirable|
| 2| astounding|
| 3| excellent|
| 4| awesome|
| 5| first class|
The result would be something like this, where items from the list are not looped through in order, but random:
**Result Table**
| Associate ID| Statement| List|
|:----: |:------:|:------:|
| 33691| John is |astounding|
| 82451| Susie is |first class|
| 25485| Sam is|admirable|
| 26582| Lonnie is|excellent|
| 52548| Carl is|awesome|
These are some of the queries I've tried:
https://social.msdn.microsoft.com/Forums/sqlserver/en-US/aeb83251-e132-435a-8630-e5b842a69368/random-join-between-tables?forum=sqldataaccess
-This seems to loop through values from 'Table B', not random.
https://www.daveperrett.com/articles/2009/08/11/mysql-select-random-row-with-join
-This is based off of a common key between the two tables and returning one of the records with the key, which I do not have.
SQL Join help when selecting random row
- I'll be honest, I don't understand this one, but it doesn't seem to assign a random value to each row from Table A; it looks more like an overall selection, like the link above this.
Join One Table To Get Random Rows from 2nd Table
- This seems to be specific to a key, and not an overall random.
Using two CTEs, we give each table a row number based on a random order and then join on that row number.
If A has more rows than B, first use a CTE to repeat the records in B N times, as described here:
Repeat Rows N Times According to Column Value (not included below). To get "N", take the row counts of A and B, divide one by the other, and add 1.
Assuming Even Distribution
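-- The CTE names differ from the table names: a CTE that selects from its own name is treated as recursive
WITH A_Numbered AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY NEWID()) AS RN
FROM A),
B_Numbered AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY NEWID()) AS RN
FROM B)
SELECT *
FROM A_Numbered
INNER JOIN B_Numbered
ON A_Numbered.RN = B_Numbered.RN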
Or use (assuming uneven distribution)
SELECT *
FROM A
CROSS APPLY (SELECT TOP 1 * FROM B ORDER BY NewID()) Z
This method assumes you know in advance which table is the smaller one.
First it numbers the rows of the smaller table in ascending order starting from 1. This numbering does not have to be randomized.
Then, for each row in the larger table, it uses the modulus operator to calculate a random row number within that range to join onto.
WITH Small
AS (SELECT *,
ROW_NUMBER() OVER ( ORDER BY (SELECT 0)) AS RN
FROM SmallTable),
Large
AS (SELECT *,
1 + CRYPT_GEN_RANDOM(3) % (SELECT COUNT(*) FROM SmallTable) AS RND
FROM LargeTable
ORDER BY RND
OFFSET 0 ROWS)
SELECT *
FROM Large
INNER JOIN Small
ON Small.RN = Large.RND
The ORDER BY RND OFFSET 0 ROWS is to get the random numbers materialized in advance.
This will allow a MERGE join on the smaller table. It also avoids an issue that can sometimes happen where the CRYPT_GEN_RANDOM is moved around in the plan and only evaluated once rather than once per row as required.
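The modulus idea can be sketched outside SQL Server as well. The example below is an analogue under stated assumptions, not the answer's exact query: it runs on in-memory SQLite through Python's sqlite3 module, uses SQLite's random() in place of CRYPT_GEN_RANDOM, renames the "Associate ID" column to AssociateID, and materializes the random numbers in a temporary table (A_rnd, a hypothetical name) so that each row of A gets exactly one value before the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (AssociateID INTEGER, Statement TEXT);
CREATE TABLE B (RowID INTEGER, List TEXT);
INSERT INTO A VALUES
    (33691, 'John is'), (82451, 'Susie is'), (25485, 'Sam is'),
    (26582, 'Lonnie is'), (52548, 'Carl is');
INSERT INTO B VALUES
    (1, 'admirable'), (2, 'astounding'), (3, 'excellent'),
    (4, 'awesome'), (5, 'first class');

-- Materialize one random number in 1..COUNT(B) per row of A.
-- random() can be negative, so the remainder is shifted into 0..n-1 first.
CREATE TEMP TABLE A_rnd AS
SELECT AssociateID, Statement,
       1 + ((random() % n) + n) % n AS RND
FROM A, (SELECT COUNT(*) AS n FROM B);
""")

# Number the rows of B from 1 and join each row of A onto its random number.
pairs = conn.execute("""
SELECT a.AssociateID, a.Statement, s.List
FROM A_rnd AS a
INNER JOIN (
    SELECT List, ROW_NUMBER() OVER (ORDER BY RowID) AS RN
    FROM B
) AS s ON s.RN = a.RND
""").fetchall()

for pair in pairs:
    print(pair)
```

Each row of A is matched independently, so the same List value can be assigned to more than one associate. If every value must be used at most once, the approach that numbers both tables in random order is the better fit.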

SQL GROUP BY with columns which contain mirrored values

Sorry for the bad title. I couldn't think of a better way to describe my issue.
I have the following table:
Category | A | B
A | 1 | 2
A | 2 | 1
B | 3 | 4
B | 4 | 3
I would like to group the data by Category, return only 1 line per category, but provide both values of columns A and B.
So the result should look like this:
category | resultA | resultB
A | 1 | 2
B | 4 | 3
How can this be achieved?
I tried this statement:
SELECT category, a, b
FROM table
GROUP BY category
but obviously, I get the following errors:
Column 'a' is invalid in the select list because it is not contained
in either an aggregate function or the GROUP BY clause.
Column 'b' is invalid in the select list because it is not contained in either an
aggregate function or the GROUP BY clause.
How can I achieve the desired result?
Try this:
SELECT category, MIN(a) AS resultA, MAX(a) AS resultB
FROM table
GROUP BY category
If the values are mirrored then you can get both values using MIN, MAX applied on a single column like a.
It seems you don't really want to aggregate per category, but rather to remove duplicate rows from your result (or rather, rows that you consider duplicates).
You consider a pair (x,y) equal to the pair (y,x). To find duplicates, you can put the lower value in the first place and the greater in the second and then apply DISTINCT on the rows:
select distinct
category,
case when a < b then a else b end as attr1,
case when a < b then b else a end as attr2
from mytable;
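To see what the normalise-then-DISTINCT query returns, here is a small sketch using Python's sqlite3 module with an in-memory database (an assumption, chosen only so the example runs on its own). The extra row ('B', 5, 6) is hypothetical and is added to show that a pair without a mirrored twin survives, which a GROUP BY on category alone would not guarantee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (category TEXT, a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [("A", 1, 2), ("A", 2, 1), ("B", 3, 4), ("B", 4, 3), ("B", 5, 6)],
)

# Put the smaller value first and the larger second, so (x, y) and (y, x)
# become identical rows that DISTINCT can collapse.
deduped = conn.execute("""
SELECT DISTINCT
    category,
    CASE WHEN a < b THEN a ELSE b END AS attr1,
    CASE WHEN a < b THEN b ELSE a END AS attr2
FROM mytable
ORDER BY category, attr1
""").fetchall()

for row in deduped:
    print(row)
# ('A', 1, 2)
# ('B', 3, 4)
# ('B', 5, 6)
```

Note that the output holds the normalised pair, so category B comes back as (3, 4) rather than the (4, 3) shown in the question's desired result.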
This assumes you want an arbitrary record from the duplicates in each category.
Here is one trick using a table value constructor and the ROW_NUMBER window function:
;with cte as
(
SELECT *,
(SELECT Min(min_val) FROM (VALUES (a),(b))tc(min_val)) min_val,
(SELECT Max(max_val) FROM (VALUES (a),(b))tc(max_val)) max_val
FROM (VALUES ('A',1,2),
('A',2,1),
('B',3,4),
('B',4,3)) tc(Category, A, B)
)
select Category,A,B from
(
Select Row_Number()Over(Partition by category,min_val,max_val order by (select NULL)) as Rn,*
From cte
) A
Where Rn = 1

Returning Field names as part of a SQL Query

I need to write a SQL statement that gets passed any valid SQL subquery and returns the result set WITH HEADERS.
Somehow I need to interrogate the result set, get the field names, and return them as part of a "union" with the original data, then pass the result onwards for exporting.
Below is my attempt: I have a subquery called "A", which returns a dataset, and I need to query it for its field names. Ordinally, maybe?
select A.fields[0].name, A.fields[1].name, A.fields[2].name from
(
Select 'xxx1' as [Complaint Mechanism] , 'xxx2' as [Actual Achievements]
union ALL
Select 'xxx3' as [Complaint Mechanism] , 'xxx4' as [Actual Achievements]
union ALL
Select 'xxx5' as [Complaint Mechanism] , 'xxx6' as [Actual Achievements] ) as A
Any pointers would be appreciated (maybe I am just missing the obvious...)
The Resultset should look like the table below:
F1 F2
--------------------- ---------------------
[Complaint Mechanism] [Actual Achievements]
xxx1 xxx2
xxx3 xxx4
xxx5 xxx6
If you have a static number of columns, you can put your data into a temp table and then query tempdb.sys.columns to get the column names, which you can then union on top of your data. If you will have a dynamic number of columns, you will need to use dynamic SQL to build your pivot statement but I'll leave that up to you to figure out.
The one caveat here is that all data under your column names will need to be converted to strings:
select 1 a, 2 b
into #a;
select [1] as FirstColumn
,[2] as SecondColumn
from (
select column_id
,name
from tempdb.sys.columns
where object_id = object_id('tempdb..#a')
) d
pivot (max(name)
for column_id in([1],[2])
) pvt
union all
select cast(a as nvarchar(100))
,cast(b as nvarchar(100))
from #a;
Query Results:
| FirstColumn | SecondColumn |
|-------------|--------------|
| a | b |
| 1 | 2 |

insert query in sql from another table with running number

I am inserting rows into a table from another table.
The Id column should get a running number, as shown below. How can I do that?
I have set the Id column as a unique key, so the code below raises an error:
insert into Tbl1 (Id, DislayName,IsEnabled)
select 16000,Names,0 from Tbl2
Insertion should be like
16000 | John | false
16001 | Deo | false
16002 | Jake | false
NOTE: auto increment cannot be used, because it is already assigned to another column.
Add the row_number() window function (minus one):
insert into Tbl1 (Id, DislayName,IsEnabled)
select 16000 -1 + row_number () over (order by Names),
Names,0
from Tbl2;
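A quick way to check the numbering is to run the same INSERT ... SELECT against the three sample names. The sketch below does that on in-memory SQLite through Python's sqlite3 module (an assumption made for self-containment; the column types are guesses, and DislayName is spelled as in the question).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tbl2 (Names TEXT);
CREATE TABLE Tbl1 (Id INTEGER UNIQUE, DislayName TEXT, IsEnabled INTEGER);
INSERT INTO Tbl2 VALUES ('John'), ('Deo'), ('Jake');

-- ROW_NUMBER() starts at 1, so subtracting 1 makes the first Id 16000.
INSERT INTO Tbl1 (Id, DislayName, IsEnabled)
SELECT 16000 - 1 + ROW_NUMBER() OVER (ORDER BY Names), Names, 0
FROM Tbl2;
""")

inserted = conn.execute(
    "SELECT Id, DislayName, IsEnabled FROM Tbl1 ORDER BY Id"
).fetchall()
for row in inserted:
    print(row)
# (16000, 'Deo', 0)
# (16001, 'Jake', 0)
# (16002, 'John', 0)
```

The Ids follow the ORDER BY inside OVER(), so here they are assigned alphabetically rather than in the order shown in the question. To number the rows in a different order, order by a column that encodes it.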

Max Value with unique values in more than one column

I feel like I'm missing something really obvious here.
Using T-SQL/SQL-Server:
I have unique values in more than one column but want to select the max version based on one particular column.
Dataset:
Example
ID | Name| Version | Code
------------------------
1 | Car | 3 | NULL
1 | Car | 2 | 1000
1 | Car | 1 | 2000
Target status: I want my query to only select the row with the highest version value. Running a MAX on the version column pulls all three because of the distinct values in the 'Code' column:
SELECT ID
,Name
,MAX(Version)
,Code
FROM Table
GROUP BY ID, Name, Code
The net result is that I get all three entries as per the data set due to the unique values in the Code column, but I only want the top row (Version 3).
Any help would be appreciated.
You need to identify the row with the highest version in one query and use an outer query to pull out all the fields for that row, like so:
SELECT t.ID, t.Name, GRP.Version, t.Code
FROM (
SELECT ID
,Name
,MAX(Version) as Version
FROM Table
GROUP BY ID, Name
) GRP
INNER JOIN Table t on GRP.ID = t.ID and GRP.Name = t.Name and GRP.Version = t.Version
You can also use row_number() to do this kind of logic, for example like this:
select ID, Name, Version, Code
from (
select *, row_number() over (order by Version desc) as RN
from Table1
) X where RN = 1
Add TOP 1 to force the query to return a single row, and add an ORDER BY clause so that the row with the highest version comes first:
SELECT top 1 ID
,Name
,MAX(Version)
,Code
FROM Table
GROUP BY ID, Name, Code
order by max(version) desc
