SQL Server 2008 View Group By - sql-server

I have created the following view in SQL Server 2008 to create mailing lists for land owners:
SELECT
dbo.parcel.featid,
CAST(mms_db.dbo.TR_Roll_Master.FMT_ROLL_NO AS decimal(11, 3)) AS Roll,
dbo.parcel.survey, mms_db.dbo.Central_Name_Database.NAME AS Owner,
mms_db.dbo.Central_Name_Database.NAME_2 AS Owner2,
mms_db.dbo.Central_Name_Database.BOX_NUM,
mms_db.dbo.Central_Name_Database.APT_NUM,
mms_db.dbo.Central_Name_Database.FMT_STREET AS House_num,
mms_db.dbo.Central_Name_Database.CITY AS Town,
mms_db.dbo.Central_Name_Database.PROV_CD AS Prov,
mms_db.dbo.Central_Name_Database.POST_CD AS Post_code,
mms_db.dbo.TR_Roll_Number_Owners.NAME_CODE
FROM
mms_db.dbo.TR_Roll_Master
INNER JOIN
dbo.parcel ON mms_db.dbo.TR_Roll_Master.ROLL_NO = dbo.parcel.roll_no COLLATE SQL_Latin1_General_CP1_CI_AS
INNER JOIN
mms_db.dbo.TR_Roll_Number_Owners ON mms_db.dbo.TR_Roll_Master.ROLL_NO = mms_db.dbo.TR_Roll_Number_Owners.ROLL_NO
INNER JOIN
mms_db.dbo.Central_Name_Database ON mms_db.dbo.TR_Roll_Number_Owners.NAME_CODE = mms_db.dbo.Central_Name_Database.NAME_CODE
WHERE
(mms_db.dbo.TR_Roll_Master.DEL_ROLL NOT LIKE '%Y%') AND
(mms_db.dbo.TR_Roll_Master.ROLL_NO NOT LIKE 'P%') OR
(mms_db.dbo.TR_Roll_Master.DEL_ROLL IS NULL) AND (mms_db.dbo.TR_Roll_Master.ROLL_NO NOT LIKE 'P%') OR
(mms_db.dbo.TR_Roll_Master.DEL_ROLL NOT LIKE '%I%') AND
(mms_db.dbo.TR_Roll_Master.ROLL_NO NOT LIKE 'P%')
The view works fine; however, there are often duplicates because many people own more than one piece of land. I would like to group by Name_Code to eliminate the duplicates.
When I add:
Group by mms_db.dbo.TR_Roll_Number_Owners.NAME_CODE
to the end of the query, I get the following response:
SQL Execution Error.
Executed SQL statement: SELECT dbo.parcel.featid, CAST(mms_db.dbo.TR_Roll_Master.FMT_ROLL_NO AS decimal(11,3)) AS Roll,
dbo.parcel.survey,
mms_db.dbo.Central_NameDatabase.Name AS Owner,
mms_db.dbo.Central_Name_Database.NAME_2 AS Owner2,
mms_db.dbo.Central_Name_Database.B...
Error Source: .Net SQLClient Data Provider
Error Message: Column 'dbo.parcel.featid' is invalid in the select list
because it is not contained in either an aggregate function or the
GROUP BY clause.
I'm not sure what I need to change to make this work.
--Edit--
As sample data, here is a condensed example of what I would like to achieve:
Roll | Owner | Box_Num | Town | Prov | Post_code | Name_Code
100 | John Smith | 50 | Somewhere | MB | R3W 9T7 | 00478
200 | John Smith | 50 | Somewhere | MB | R3W 9T7 | 00478
300 | Peter Smith | 72 | Somewhere | MB | R3W 9T9 | 00592
400 | John Smith | 90 | OtherPlace | MB | R2R 8V7 | 00682
John Smith has the name code 00478. He owns both Roll 100 and Roll 200, Peter Smith owns 300, and another person named John Smith owns 400. Based on the different Name_Code values, I know that the two John Smiths are different people. I would like output that lists John Smith with Name_Code 00478 only once, while also listing Peter Smith and the other John Smith. Name_Code is the only value I can use for grouping, as the other columns could represent different people with the same name.

If you just want to eliminate duplicates, use DISTINCT and leave out of your query the columns that differ between the parcels owned by the same person (such as Roll and featid), viz:
SELECT DISTINCT
NAME_CODE,
{column2},
{column3}
FROM
[MyView]
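A minimal sketch of the DISTINCT approach, using Python's built-in sqlite3 module as a stand-in for SQL Server. The in-memory table named MyView is hypothetical and holds only the sample rows from the question; Roll is left out of the select list because it differs between an owner's parcels.

```python
import sqlite3

# Hypothetical in-memory stand-in for [MyView], filled with the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyView (Roll INTEGER, Owner TEXT, Box_Num TEXT, Town TEXT,"
    " Prov TEXT, Post_code TEXT, Name_Code TEXT)"
)
conn.executemany(
    "INSERT INTO MyView VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (100, "John Smith", "50", "Somewhere", "MB", "R3W 9T7", "00478"),
        (200, "John Smith", "50", "Somewhere", "MB", "R3W 9T7", "00478"),
        (300, "Peter Smith", "72", "Somewhere", "MB", "R3W 9T9", "00592"),
        (400, "John Smith", "90", "OtherPlace", "MB", "R2R 8V7", "00682"),
    ],
)

# Roll varies per parcel, so it is omitted; DISTINCT then collapses the
# identical owner/address rows into one row per Name_Code.
mailing_list = conn.execute(
    "SELECT DISTINCT Name_Code, Owner, Box_Num, Town, Prov, Post_code"
    " FROM MyView ORDER BY Name_Code"
).fetchall()

for row in mailing_list:
    print(row)
```

If any per-parcel column (Roll, featid) is added back to the select list, the two 00478 rows become distinct again and the duplicate returns.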
However, if you wish to perform some sort of aggregation, or to show one arbitrary row for each person who owns more than one piece of land, then you will need GROUP BY. All non-aggregated columns in the SELECT need to appear in the GROUP BY:
SELECT
NAME_CODE,
... other non-aggregated fields here,
COUNT(featid) AS NumFeatIds,
MIN(Owner2) AS FirstOwner,
... etc. (other aggregated columns)
FROM
[MyView]
GROUP BY
NAME_CODE,
... all non-aggregated columns in the SELECT
Edit
To get the table listed in your edit, you would just need to ORDER BY Name_Code.
However, to get just one row for John Smith #00478, you need to compromise on the non-unique columns: either eliminate them entirely, use GROUP BY with aggregates over the rows, use a GROUP_CONCAT-type hack to e.g. comma-separate them, or pivot the duplicate rows' columns into extra columns on the one row.
Since you've mentioned GROUP repeatedly, it seems the aggregation route is necessary. John Smith #00478 has 2 properties, hence 2 distinct Roll values, so Roll can't appear as-is in the aggregated result. Instead, you can return e.g. a count of the Rolls, or the MIN or MAX Roll, but not both rows*. The other (address-related) columns are probably constant across all of his properties (assuming John Smith 00478 has one address), but unfortunately SQL Server will still require you to include them in the GROUP BY.
I would suggest you try:
SELECT
COUNT(Roll) AS NumPropertiesOwned,
Owner,
Box_Num,
Town,
Prov,
Post_code,
Name_Code
FROM [MyNewView]
GROUP BY
Owner, Box_Num, Town, Prov, Post_code, Name_Code
ORDER BY Name_Code;
That is, all the non-aggregated columns must be repeated in the GROUP BY.
* unless you use the GROUP_CONCAT hack or the pivot route
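A sketch of the aggregation route on the same hypothetical sample rows, again with Python's sqlite3 standing in for SQL Server. It shows COUNT, MIN and MAX over Roll, plus SQLite's built-in group_concat as an illustration of the "GROUP_CONCAT hack". group_concat is SQLite/MySQL syntax, not T-SQL: SQL Server 2017 and later offer STRING_AGG, and on SQL Server 2008 the usual workaround is FOR XML PATH.

```python
import sqlite3

# Hypothetical stand-in for [MyNewView], holding the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyNewView (Roll INTEGER, Owner TEXT, Box_Num TEXT, Town TEXT,"
    " Prov TEXT, Post_code TEXT, Name_Code TEXT)"
)
conn.executemany(
    "INSERT INTO MyNewView VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (100, "John Smith", "50", "Somewhere", "MB", "R3W 9T7", "00478"),
        (200, "John Smith", "50", "Somewhere", "MB", "R3W 9T7", "00478"),
        (300, "Peter Smith", "72", "Somewhere", "MB", "R3W 9T9", "00592"),
        (400, "John Smith", "90", "OtherPlace", "MB", "R2R 8V7", "00682"),
    ],
)

# Every non-aggregated column is repeated in GROUP BY, as SQL Server requires.
# group_concat's element order is not guaranteed.
owners = conn.execute("""
    SELECT COUNT(Roll) AS NumPropertiesOwned,
           MIN(Roll) AS FirstRoll,
           MAX(Roll) AS LastRoll,
           group_concat(Roll) AS Rolls,
           Owner, Box_Num, Town, Prov, Post_code, Name_Code
    FROM MyNewView
    GROUP BY Owner, Box_Num, Town, Prov, Post_code, Name_Code
    ORDER BY Name_Code
""").fetchall()

for row in owners:
    print(row)
```

John Smith 00478 comes back once with a count of 2 and both roll numbers folded into one column, while the other two owners each have a count of 1.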

It's telling you what to do:
"Error Message: Column 'dbo.parcel.featid' is invalid in the select list
because it is not contained in either an aggregate function or the
GROUP BY clause."
This means you have to group the other (non-aggregated) fields too.

Related

LAG and/or LEAD alternative without window functions in PostgreSQL

I'm using PostgreSQL (but it doesn't matter if you can suggest a solution in any SQL syntax), and I have a table like:
employee | department | salary
1 | sales | 30,000
2 | sales | 25,000
3 | marketing | 45,000
4 | marketing | 55,000
so on...
What I want to achieve is:
employee | department | salary | difference
1 | sales | 30,000 | 0/null
2 | sales | 25,000 | 5,000
3 | marketing | 45,000 | 0/null
4 | marketing | 55,000 | 10,000
In other words, I want the difference between the values of consecutive rows, but I can't use window functions (I don't know why, but they must be avoided in this challenge).
In a perfect world, we'd use the lag() or lead() functions partitioned by department and store the difference in another column, but I don't know how to do it without them.
I tried subqueries in multiple ways, but every time I ended up with NULL or 0 in the new column.
You can self-join the table and join each employee with the previous row within the same department. Note that this assumes employee IDs are consecutive within each department, because the previous row is found as employee - 1.
SELECT t1.employee, t1.department, t1.salary, ABS(t2.salary - t1.salary) AS difference
FROM tab t1
LEFT JOIN tab t2
ON t1.department = t2.department AND t1.employee = t2.employee +1
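A runnable check of the self-join above, using Python's sqlite3 in place of PostgreSQL (the SQL is plain joins, so it should carry over). The second query is an added variant, not part of the answer: instead of employee + 1 it joins to the largest smaller employee ID in the same department, so gaps in the IDs would not break the pairing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (employee INTEGER, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?)",
    [(1, "sales", 30000), (2, "sales", 25000),
     (3, "marketing", 45000), (4, "marketing", 55000)],
)

# The answer's self-join: pair each row with employee - 1 in the same department.
consecutive = conn.execute("""
    SELECT t1.employee, t1.department, t1.salary,
           ABS(t2.salary - t1.salary) AS difference
    FROM tab t1
    LEFT JOIN tab t2
      ON t1.department = t2.department AND t1.employee = t2.employee + 1
    ORDER BY t1.employee
""").fetchall()

# Gap-tolerant variant (an addition): join to the closest earlier employee
# in the same department via a correlated MAX() subquery.
gap_tolerant = conn.execute("""
    SELECT t1.employee, t1.department, t1.salary,
           ABS(t2.salary - t1.salary) AS difference
    FROM tab t1
    LEFT JOIN tab t2
      ON t2.department = t1.department
     AND t2.employee = (SELECT MAX(t3.employee) FROM tab t3
                        WHERE t3.department = t1.department
                          AND t3.employee < t1.employee)
    ORDER BY t1.employee
""").fetchall()

for row in consecutive:
    print(row)
```

The first row of each department has no earlier partner, so the LEFT JOIN leaves its difference as NULL, which matches the "0/null" in the desired output.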

How to sort the top N records after selecting them?

I'm using AdventureWorks2019 to practise. I want to select the top 10 values of the FirstName column in the Person.Person table and then, after getting the values, sort them in ascending order.
When I select the top 10 values, I get this result:
Syed
Catherine
Kim
Kim
Kim
Hazem
Sam
Humberto
Gustavo
Pilar
I want them sorted like this:
Catherine
Gustavo
Hazem
Humberto
Kim
Kim
Kim
Pilar
Sam
Syed
This is what I've tried, but failed:
select top 10 [FirstName] from Person.Person order by [FirstName];
and
select * from
(select top 10 [FirstName] from Person.Person) as persons
order by [FirstName];
and
with persons as (select top 10 [FirstName] from Person.Person)
select * from persons order by [FirstName];
When you don't specify an ORDER BY, SQL Server will return the values in whatever order is convenient, and there is no guarantee the order will stay the same on subsequent executions. So a TOP(x) without an ORDER BY is usually senseless, because those x can be any x rows.
If you really want to sort those arbitrary 10 rows, you should use a subquery as you did. You can see that SQL Server is free to choose any order (and therefore any 10 rows) it sees fit, because you didn't tell it what you want.
If you really, really want to sort those rows, you can try inserting them into a temporary table or table variable first. But again, I suspect you'll find SQL Server decides on 10 different rows.
A nice way to visualize it (sort of) is to look at the execution plan: no ORDER BY means no sort, so the rows are shown in the order that they come in (things get a tad mythical beyond that).
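A sketch of making the "pick 10, then sort them" request deterministic: give the inner query an explicit ORDER BY so the same 10 rows are chosen every time, and sort only those rows in the outer query. This uses Python's sqlite3 (LIMIT in place of TOP) on a hypothetical miniature Person table; ordering by BusinessEntityID is an assumption about which 10 rows are wanted. Two extra names, Aaron and Zoe, show that this differs from simply taking the alphabetical top 10.

```python
import sqlite3

# Hypothetical miniature of Person.Person; IDs 1..12 follow insertion order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (BusinessEntityID INTEGER PRIMARY KEY, FirstName TEXT)")
names = ["Syed", "Catherine", "Kim", "Kim", "Kim", "Hazem", "Sam",
         "Humberto", "Gustavo", "Pilar", "Aaron", "Zoe"]
conn.executemany("INSERT INTO Person (FirstName) VALUES (?)", [(n,) for n in names])

# Inner query picks 10 rows by an explicit key; outer query sorts just those 10.
# A T-SQL equivalent (sketch):
#   SELECT FirstName FROM (SELECT TOP 10 FirstName FROM Person.Person
#                          ORDER BY BusinessEntityID) AS persons
#   ORDER BY FirstName;
picked_then_sorted = [r[0] for r in conn.execute("""
    SELECT FirstName FROM (
        SELECT FirstName FROM Person ORDER BY BusinessEntityID LIMIT 10
    ) AS persons
    ORDER BY FirstName
""")]

# Contrast: sorting first and then taking 10 answers a different question.
alphabetical_top10 = [r[0] for r in conn.execute(
    "SELECT FirstName FROM Person ORDER BY FirstName LIMIT 10")]

print(picked_then_sorted)
print(alphabetical_top10)
```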

GROUP BY or Aggregation Function error message [duplicate]

This question already has answers here:
GROUP BY / aggregate function confusion in SQL
(5 answers)
Closed 3 years ago.
I got an error -
Column 'Employee.EmpID' is invalid in the select list because it is
not contained in either an aggregate function or the GROUP BY clause.
select loc.LocationID, emp.EmpID
from Employee as emp full join Location as loc
on emp.LocationID = loc.LocationID
group by loc.LocationID
This situation fits the answer given by Bill Karwin.
Correction to the above: it fits the answer by ExactaBox:
select loc.LocationID, count(emp.EmpID) -- not count(*), don't want to count nulls
from Employee as emp full join Location as loc
on emp.LocationID = loc.LocationID
group by loc.LocationID
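A small check of why count(emp.EmpID) rather than count(*) is used, with Python's sqlite3 and hypothetical Employee/Location rows (location 30 has nobody). The sketch uses a LEFT JOIN from Location instead of the FULL JOIN, because SQLite builds older than 3.39.0 lack FULL OUTER JOIN; the per-location counts are the same as long as every employee has a valid LocationID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Location (LocationID INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE Employee (EmpID INTEGER PRIMARY KEY, LocationID INTEGER)")
conn.executemany("INSERT INTO Location VALUES (?)", [(10,), (20,), (30,)])
conn.executemany("INSERT INTO Employee VALUES (?, ?)", [(1, 10), (2, 10), (3, 20)])

rows = conn.execute("""
    SELECT loc.LocationID,
           COUNT(emp.EmpID) AS employees,   -- NULLs from the outer join are skipped
           COUNT(*) AS joined_rows          -- the NULL-padded row is counted too
    FROM Location AS loc
    LEFT JOIN Employee AS emp ON emp.LocationID = loc.LocationID
    GROUP BY loc.LocationID
    ORDER BY loc.LocationID
""").fetchall()

for row in rows:
    print(row)
```

For the empty location, COUNT(emp.EmpID) gives 0 while COUNT(*) gives 1, because the outer join still produces one row for it.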
ORIGINAL QUESTION -
For the SQL query -
select *
from Employee as emp full join Location as loc
on emp.LocationID = loc.LocationID
group by (loc.LocationID)
I don't understand why I get this error. All I want to do is join the tables and then group all the employees in a particular location together.
I think I have a partial explanation for my own question. Tell me if it's OK:
To group all employees that work in the same location we have to first mention the LocationID.
Then, we cannot/do not mention each employee ID next to it. Rather, we mention the total number of employees in that location, i.e. we should COUNT() the employees working in that location. Why we do it the latter way, I am not sure.
So, this explains the "it is not contained in either an aggregate function" part of the error.
What is the explanation for the GROUP BY clause part of the error ?
Suppose I have the following table T:
a b
--------
1 abc
1 def
1 ghi
2 jkl
2 mno
2 pqr
And I do the following query:
SELECT a, b
FROM T
GROUP BY a
The output should have two rows, one row where a=1 and a second row where a=2.
But what should the value of b show on each of these two rows? There are three possibilities in each case, and nothing in the query makes it clear which value to choose for b in each group. It's ambiguous.
This demonstrates the single-value rule, which prohibits the undefined results you get when you run a GROUP BY query, and you include any columns in the select-list that are neither part of the grouping criteria, nor appear in aggregate functions (SUM, MIN, MAX, etc.).
Fixing it might look like this:
SELECT a, MAX(b) AS x
FROM T
GROUP BY a
Now it's clear that you want the following result:
a x
--------
1 ghi
2 pqr
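The example can be reproduced with Python's sqlite3. One caveat: SQLite, unlike SQL Server, does not reject the ambiguous SELECT a, b ... GROUP BY a; it accepts it and returns b from an arbitrary row in each group, which is exactly the undefined result the single-value rule guards against. The MAX(b) form removes the ambiguity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (a INTEGER, b TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?)",
    [(1, "abc"), (1, "def"), (1, "ghi"), (2, "jkl"), (2, "mno"), (2, "pqr")],
)

# Ambiguous: SQLite allows it, but which b each group reports is unspecified.
ambiguous = conn.execute("SELECT a, b FROM T GROUP BY a").fetchall()

# Fixed: MAX(b) states exactly which b to report for each group.
fixed = conn.execute("SELECT a, MAX(b) AS x FROM T GROUP BY a ORDER BY a").fetchall()
print(fixed)  # [(1, 'ghi'), (2, 'pqr')]
```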
Your query will work in MySQL if the ONLY_FULL_GROUP_BY SQL mode is disabled (it is enabled by default since MySQL 5.7.5). But you are using a different RDBMS, so to make your query work, add all non-aggregated columns to your GROUP BY clause, e.g.
SELECT col1, col2, SUM(col3) totalSUM
FROM tableName
GROUP BY col1, col2
Non-aggregated columns are columns that are not passed into aggregate functions such as SUM, MAX, COUNT, etc.
Basically, what this error is saying is that if you use the GROUP BY clause, your result is going to be a relation/table with one row per group, so in your SELECT statement you can only "select" the columns you are grouping by, plus aggregate functions over the other columns, because the other columns will not appear as-is in the resulting table.
"All I want to do is join the tables and then group all the employees
in a particular location together."
It sounds like what you want is for the output of the SQL statement to list every employee in the company, but first all the people in the Anaheim office, then the people in the Buffalo office, then the people in the Cleveland office (A, B, C, get it? Obviously I don't know what locations you have).
In that case, lose the GROUP BY clause. All you need is ORDER BY loc.LocationID.

SQL Statement to total all employee records

I have an SQL statement that does not return all employee names.
Table employee_list contains all employees for the company.
Table apps contains the employee that is assigned to the app.
Table details contains the total dollar amount for the order.
My query will not group and total for employees that did not have any apps. For example employee John had 5 apps for $250, Bill had 2 apps for $75 and Henry had 0 apps for $0 (no rows in apps or details table for Henry).
My query returns:
John 5 250.00
Bill 2 75.00
I need it to return
John 5 250.00
Bill 2 75.00
Henry 0 0.00
Any ideas? Here is my current code
SELECT employee_list.Fullname,
count(apps.acntnum),
sum(details.cost)
FROM employee_list
left join apps on employee_list.Fullname=apps.EmployeeName
LEFT JOIN details ON (apps.ID=details.ObjOwner_ID AND details.Active=1)
Group BY
employee_list.Fullname
The important thing is to be using a LEFT JOIN from your employee_list table and any subsequent tables you're joining to, and to not do anything that will filter out NULLs from the right-hand tables (because the NULLs would be for the 'missing' rows).
Your query is fine, but I suspect you're using it in a wider query, where you may inadvertently have an INNER JOIN or mention one of the columns in a WHERE clause.
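A sketch of both points, using Python's sqlite3 and hypothetical rows that reproduce the question's totals (John 5 apps / 250, Bill 2 apps / 75, Henry none). The query as posted does return Henry, but with a NULL total rather than 0.00, since SUM over only NULLs is NULL; the COALESCE here is an addition, not part of the answer. The second run shows how a WHERE condition on a right-hand column silently drops Henry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee_list (Fullname TEXT);
    CREATE TABLE apps (ID INTEGER PRIMARY KEY, acntnum TEXT, EmployeeName TEXT);
    CREATE TABLE details (ObjOwner_ID INTEGER, cost INTEGER, Active INTEGER);
    INSERT INTO employee_list VALUES ('John'), ('Bill'), ('Henry');
""")
# Hypothetical rows: one active details row per app.
app_rows = [("John", 50)] * 5 + [("Bill", 40), ("Bill", 35)]
for i, (name, cost) in enumerate(app_rows, start=1):
    conn.execute("INSERT INTO apps VALUES (?, ?, ?)", (i, f"A{i}", name))
    conn.execute("INSERT INTO details VALUES (?, ?, 1)", (i, cost))

base = """
    SELECT employee_list.Fullname,
           COUNT(apps.acntnum),
           COALESCE(SUM(details.cost), 0)   -- SUM of only NULLs is NULL
    FROM employee_list
    LEFT JOIN apps ON employee_list.Fullname = apps.EmployeeName
    LEFT JOIN details ON (apps.ID = details.ObjOwner_ID AND details.Active = 1)
    {where}
    GROUP BY employee_list.Fullname
"""
totals = {n: (c, s) for n, c, s in conn.execute(base.format(where=""))}

# Filtering on a right-hand column in WHERE discards Henry's NULL row,
# which in effect turns the LEFT JOIN back into an inner join.
filtered = {n for n, _, _ in conn.execute(base.format(where="WHERE details.Active = 1"))}

print(totals)
print(filtered)
```

Keeping the Active = 1 test in the ON clause, as the original query does, is what preserves the employees with no apps.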
I agree with all the other answers; however, you could also try this:
SELECT employee_list.Fullname,
(SELECT count(apps.acntnum) FROM apps WHERE employee_list.Fullname=apps.EmployeeName) AS Cnt,
(SELECT sum(details.cost) FROM apps LEFT JOIN details ON (apps.ID=details.ObjOwner_ID AND details.Active=1) WHERE employee_list.Fullname=apps.EmployeeName) AS cost
FROM employee_list
This will always return the full list of employees, and separately go and count/sum the other values.
This answer does not take performance into account.

MS Access row number, specify an index

Is there a way in MS access to return a dataset between a specific index?
So let's say my dataset is:
rank | first_name | age
1 | Max | 23
2 | Bob | 40
3 | Sid | 25
4 | Billy | 18
5 | Sally | 19
But I only want to return the records with 'rank' 2 to 4, so my result set is Bob, Sid and Billy. However, rank is not part of the table; it should be generated when the query is run. Why don't I use an autogenerated number? Because if a record is deleted, the numbering will be inconsistent, and what if I wanted the results in reverse?
This is obviously very simple, and the reason I ask is that I am working on a product catalogue and am looking for a more efficient way of paging through the returned dataset: if I only return one page's worth of data from the database, this is obviously going to be quicker than returning the complete set of 3000 records and then having to subselect from that set.
Thanks R.
Original suggestion:
SELECT * from table where rank BETWEEN 2 and 4;
Modified after the comment that rank does not exist in the table structure:
Select top 100 * from table ORDER BY ID;
And if you want to choose subsequent results, you can take the ID of the last record from the first query, say it was ID 100, and use a WHERE clause to get the next 100:
Select top 100 * from table where ID > 100 ORDER BY ID;
But these won't give you what you're looking for either, I bet.
How are you calculating rank? I assume you are basing it on some data in another dataset somewhere. If so, create a function, do a table join, or do something that can calculate rank based on values in other table(s), then you can do queries based on the rank() function.
For example (here rank() stands for whatever ranking function you create; Access SQL has no built-in rank()):
select *
from table
where rank() between 2 and 4
If you are not calculating rank based on some data somewhere, there really isn't a way to write this query, and you might as well be returning three random rows from the table.
I think you need a correlated subquery to calculate the rank on the fly. For example, guessing that the rank is based on name:
SELECT T1.first_name, T1.age,
(
SELECT COUNT(*) + 1
FROM MyTable AS T2
WHERE T1.first_name > T2.first_name
) AS rank
FROM MyTable AS T1;
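A sketch of the correlated-subquery rank, plus the filter for ranks 2 to 4, run with Python's sqlite3 on the sample rows. Because a column alias can't be referenced in the same query's WHERE, the ranked query is wrapped in a derived table. The alias rnk (instead of rank) is a choice made here to avoid confusion with RANK() window functions. Ranked by name, the rows come out Billy, Bob, Max, Sally, Sid, so ranks 2 to 4 are Bob, Max and Sally; tied names would share a rank.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (first_name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [("Max", 23), ("Bob", 40), ("Sid", 25), ("Billy", 18), ("Sally", 19)],
)

# Rank = 1 + number of names that sort before this one; then keep ranks 2..4.
page = conn.execute("""
    SELECT first_name, age, rnk FROM (
        SELECT T1.first_name, T1.age,
               (SELECT COUNT(*) + 1 FROM MyTable AS T2
                WHERE T1.first_name > T2.first_name) AS rnk
        FROM MyTable AS T1
    ) AS ranked
    WHERE rnk BETWEEN 2 AND 4
    ORDER BY rnk
""").fetchall()

print(page)
```

The subquery runs once per row, which is why this approach slows down as the table grows.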
The bad news is that the Access data engine is poorly optimized for this kind of query; in my experience, performance starts to degrade noticeably beyond a few hundred rows.
If it is not possible to maintain the rank on the db side of the house (e.g. high insertion environment) consider doing the paging on the client side. For example, an ADO classic recordset object has properties to support paging (PageCount, PageSize, AbsolutePage, etc), something for which DAO recordsets (being of an older vintage) have no support.
As always, you'll have to perform your own timings but I suspect that when there are, say, 10K rows you will find it faster to take on the overhead of fetching all the rows to an ADO recordset then finding the page (then perhaps fabricate smaller ADO recordset consisting of just that page's worth of rows) than it is to perform a correlated subquery to only fetch the number of rows for the page.
Unfortunately, the LIMIT keyword isn't available in MS Access (that's what MySQL uses for multi-page presentation). If you can write an order key into the results table, then you can use something like this:
SELECT TOP 25 MyOrder, Etc FROM Table1 WHERE MyOrder in
(SELECT TOP 55 MyOrder FROM Table1 ORDER BY MyOrder DESC)
ORDER BY MyOrder ASC
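A sketch that runs the TOP/IN paging pattern with Python's sqlite3, with LIMIT standing in for Access's TOP and a hypothetical Table1 holding keys 1 to 100. As written, the inner query keeps the last 55 keys, so the 25-row page is counted from the end of the table. The second query is an adaptation, not from the answer: flipping both sort directions and re-sorting the result counts the page from the start (rows 31 to 55).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (MyOrder INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO Table1 VALUES (?)", [(i,) for i in range(1, 101)])

# The answer's shape: inner "TOP 55 ... DESC", outer "TOP 25 ... ASC".
as_written = [r[0] for r in conn.execute("""
    SELECT MyOrder FROM Table1
    WHERE MyOrder IN (SELECT MyOrder FROM Table1 ORDER BY MyOrder DESC LIMIT 55)
    ORDER BY MyOrder ASC LIMIT 25
""")]

# Counting from the start: take the first 55, keep the last 25 of those,
# then re-sort ascending for display.
page_from_start = [r[0] for r in conn.execute("""
    SELECT MyOrder FROM (
        SELECT MyOrder FROM Table1
        WHERE MyOrder IN (SELECT MyOrder FROM Table1 ORDER BY MyOrder ASC LIMIT 55)
        ORDER BY MyOrder DESC LIMIT 25
    ) AS page
    ORDER BY MyOrder ASC
""")]

print(as_written[0], as_written[-1])
print(page_from_start[0], page_from_start[-1])
```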
If I understand you correctly, there are only first_name and age columns in your table. If this is the case, then there is no way to return Bob, Sid, and Billy with a single query, unless you do something like:
SELECT * FROM Table
WHERE first_name = 'Bob'
OR first_name = 'Sid'
OR first_name = 'Billy'
But I think that this is not what you are looking for.
This is because SQL databases make no guarantee as to the order that the data will come out of the database unless you specify an ORDER BY clause. It will usually come out in the same order it was added, but there are no guarantees, and once you get a lot of rows in your table, there's a reasonably high probability that they won't come out in the order you put them in.
As a side note, you should probably add a "rank" column (this column is usually called id) to your table and make it an auto-incrementing integer (an AutoNumber field in Access), so that you can run the query mentioned by Sev. It's also important to have a primary key so that you can be certain which rows are being updated when you run an update query, or which rows are being deleted when you run a delete query. For example, if you had 2 people named Max who were both 23, how would you delete one row without deleting the other? If you had an auto-incrementing unique column, you could specify the unique ID in your query to delete only one.
[ADDITION]
Upon reading your comment: if you add an autoincrement field, want to read 3 rows, and know the ID of the first row you want to read, then you can use TOP to read 3 rows.
Assuming your data looks like this:
ID | first_name | age
1 | Max | 23
2 | Bob | 40
6 | Sid | 25
8 | Billy | 18
15 | Sally | 19
You can query Bob, Sid and Billy with the following query:
SELECT TOP 3 first_name, age
FROM [Table]
WHERE ID >= 2
ORDER BY ID
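A sketch of this keyset-style paging with Python's sqlite3, using LIMIT where Access uses TOP. The table name People and the helper page_after are hypothetical. Because the query filters on ID >= start and orders by ID, gaps left by deleted rows don't matter, and the next page starts just after the last ID seen.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (ID INTEGER PRIMARY KEY, first_name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO People VALUES (?, ?, ?)",
    [(1, "Max", 23), (2, "Bob", 40), (6, "Sid", 25), (8, "Billy", 18), (15, "Sally", 19)],
)

def page_after(start_id, size=3):
    """Hypothetical helper: return up to `size` rows with ID >= start_id, in ID order."""
    return conn.execute(
        "SELECT ID, first_name, age FROM People WHERE ID >= ? ORDER BY ID LIMIT ?",
        (start_id, size),
    ).fetchall()

first = page_after(2)                       # Bob, Sid, Billy
next_page = page_after(first[-1][0] + 1)    # continue after the last ID seen

print(first)
print(next_page)
```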
