Sorting a query, how does it work?

Can someone explain to me why this is possible with SQL Server:
select column1 c,column2 d
from table1
order by c,column3
I can sort by column1 using the alias because the ORDER BY clause is applied after the SELECT clause, but how is it possible to sort by a column that I'm not retrieving?
Thanks in advance.

All column names from the objects in the FROM clause are available to ORDER BY, except in the case of GROUP BY or DISTINCT. As you've indicated, the alias is also available, because the SELECT clause is processed before the ORDER BY.
This is one of those cases where you trust the optimizer.
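A minimal sketch of the question's query, assuming Python's built-in sqlite3 module as a stand-in for SQL Server and a hypothetical table1 with made-up rows. It shows ORDER BY resolving both the select-list alias c and column3, which comes from the FROM table but is never returned.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; table1 and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (column1 INTEGER, column2 TEXT, column3 INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, "x", 30), (1, "y", 10), (0, "z", 20)],
)

# ORDER BY sees the alias c (from SELECT) and column3 (from FROM),
# even though column3 is not part of the result.
rows = conn.execute(
    "SELECT column1 c, column2 d FROM table1 ORDER BY c, column3"
).fetchall()

print(rows)  # [(0, 'z'), (1, 'y'), (1, 'x')]
```

Each returned row has only two columns; column3 only decides the order of the two rows where c = 1.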

According to Books Online (http://technet.microsoft.com/en-us/library/ms188385(v=sql.90).aspx):
The ORDER BY clause can include items that do not appear in the select list. However, if SELECT DISTINCT is specified, or if the statement contains a GROUP BY clause, or if the SELECT statement contains a UNION operator, the sort columns must appear in the select list.
Additionally, when the SELECT statement includes a UNION operator, the column names or column aliases must be those specified in the first select list.
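The UNION part of that rule can be illustrated with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server (SQLite applies a similar restriction to compound SELECTs, although its error text differs from SQL Server's) and a hypothetical table1; only the fact that an error is raised is relied on, not its wording.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; table1 and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (column1 INTEGER, column2 TEXT, column3 INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, "x", 30), (1, "y", 10), (0, "z", 20)],
)

# Allowed: ORDER BY uses the alias defined in the FIRST select list.
union_rows = conn.execute(
    "SELECT column1 AS c FROM table1 "
    "UNION SELECT column3 FROM table1 "
    "ORDER BY c"
).fetchall()

# Rejected: column2 is not one of the UNION's result columns.
try:
    conn.execute(
        "SELECT column1 AS c FROM table1 "
        "UNION SELECT column3 FROM table1 "
        "ORDER BY column2"
    )
    union_error = None
except sqlite3.Error as exc:
    union_error = str(exc)

print(union_rows)  # [(0,), (1,), (10,), (20,), (30,)]
print(union_error)
```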

You can sort by aliases that you define in the SELECT (select column1 c), and you can also tell it to sort by a column that you are not including in the SELECT but that still exists in the table. This lets you sort by expressions over the data without having to include them in the SELECT:
SELECT cost, tax FROM table1 ORDER BY (cost * tax)
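A runnable sketch of sorting by an expression, assuming Python's built-in sqlite3 module as a stand-in for SQL Server and a hypothetical prices table with made-up values. The product cost * tax is computed only to order the rows; it never appears in the output.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; the prices table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (item TEXT, cost INTEGER, tax INTEGER)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [("a", 10, 3), ("b", 4, 5), ("c", 2, 2)],
)

# Sort by an expression that is not itself selected (products: a=30, b=20, c=4).
by_product = conn.execute(
    "SELECT cost, tax FROM prices ORDER BY (cost * tax)"
).fetchall()

# The same works when neither input column of the expression is selected.
items_by_product = conn.execute(
    "SELECT item FROM prices ORDER BY cost * tax"
).fetchall()

print(by_product)        # [(2, 2), (4, 5), (10, 3)]
print(items_by_product)  # [('c',), ('b',), ('a',)]
```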

Related

How do I Select an aggregate function from a temp table without getting the invalid column error from not including the column in the GROUP BY clause?

I performed aggregate functions in a temp table but I'm getting an error because the field I performed the aggregate function on is not included in a GROUP BY in the table I am selecting from. To clarify, this is just a snippet so these tables are temp tables in the larger query. They are also named in the actual code.
WITH #t1 AS
(SELECT
Name,
Date,
COUNT(Email),
COUNT(DISTINCT Email)
FROM SentEmails)
SELECT
#t1.*,
#t2.GrossSents
FROM #t1
--***JOINS***
GROUP BY
#t1.Name,
#t1.Date
I expect a table with Name, Date, Count of Emails, Unique Emails, and Gross Sends fields, but I get:
Column '#t1.COUNT(Email)' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

Break your issue into steps.
Start by getting the query inside your CTE to return the data you expect from it. The query as written here won't run because you're doing aggregation without a GROUP BY clause.
Once that query is giving you the results you want, wrap it in the CTE syntax and try a SELECT * FROM cteName to see if that works. You'll get an error here because each column in a CTE has to have a name and your last two columns don't have names. Also, as noted in the comments, it's a poor practice to name your CTE with a #. It makes the subsequent code more confusing, since it appears as though there's a temp table someplace, and there isn't.
After you have the CTE returning what you need, start joining other tables, one at a time. Monitor those results as you add tables so you're sure that your JOINs are working as you expect.
If you're doing further aggregation on the outer query, specifying SELECT * is just asking for trouble because you're going to need to specify every non-aggregated column in your GROUP BY anyway. As a general rule, you should enumerate your columns in your SELECT, and in this case that will allow you to copy & paste them to your eventual GROUP BY.
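The first two steps can be sketched as follows, assuming Python's built-in sqlite3 module as a stand-in for SQL Server and hypothetical SentEmails rows. The CTE aggregates with an explicit GROUP BY, every computed column gets a name, the CTE name has no leading #, and the outer query enumerates its columns. The #t2 join and GrossSents are left out because the question does not show them.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; SentEmails rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SentEmails (Name TEXT, Date TEXT, Email TEXT)")
conn.executemany(
    "INSERT INTO SentEmails VALUES (?, ?, ?)",
    [
        ("promo", "2020-01-01", "a@example.com"),
        ("promo", "2020-01-01", "a@example.com"),
        ("promo", "2020-01-01", "b@example.com"),
        ("news", "2020-01-02", "c@example.com"),
    ],
)

QUERY = """
WITH t1 AS (
    SELECT Name,
           Date,
           COUNT(Email)          AS EmailCount,    -- every column is named
           COUNT(DISTINCT Email) AS UniqueEmails
    FROM SentEmails
    GROUP BY Name, Date                            -- required for the aggregates
)
SELECT Name, Date, EmailCount, UniqueEmails        -- enumerated, not t1.*
FROM t1
ORDER BY Name
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [('news', '2020-01-02', 1, 1), ('promo', '2020-01-01', 3, 2)]
```

Joins to other tables would then be added to the outer query one at a time, checking the row counts after each.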

Avoid duplicate values in comma delimited sql query

Hello, I have this comma-delimited query:
select [Product_Name]
,(select h2.Location_name + ', ' from (select distinct * from [dbo].[Product_list]) h2 where h1.Product_Name = h2.Product_Name
order by h2.Product_Name for xml path ('')) as Location_name
,(select h2.[Store name] + ', ' from [dbo].[Product_list] h2 where h1.Product_Name = h2.Product_Name
order by h2.Product_Name for xml path ('')) as store_name, sum(Quantity) as Total_Quantity from [dbo].[Product_list] h1
group by [Product_Name]
But this query shows duplicated data in the comma-delimited output. How can I show only the distinct values of the column in comma-delimited form? Can anyone please help me?

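Well, if you don't SELECT DISTINCT * FROM dbo.Product_list and instead SELECT DISTINCT Product_Name, Location_name FROM dbo.Product_list, which are the only columns that subquery needs (Location_name for the list, Product_Name for the correlation with h1), it will return only distinct values. The same change applies to the [Store name] subquery.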
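A minimal sketch of the fix, assuming Python's built-in sqlite3 module as a stand-in for SQL Server. SQLite has no FOR XML PATH, so GROUP_CONCAT is swapped in for the concatenation; the point illustrated is only that the derived table h2 is de-duplicated on (Product_Name, Location_name) before concatenating. The Product_list rows are hypothetical, and GROUP_CONCAT does not guarantee the order of the concatenated values.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; GROUP_CONCAT replaces FOR XML PATH('').
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Product_list (Product_Name TEXT, Location_name TEXT, Quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO Product_list VALUES (?, ?, ?)",
    [
        ("apple", "North", 5),
        ("apple", "North", 3),   # duplicate location for apple
        ("apple", "South", 2),
        ("pear", "East", 4),
    ],
)

QUERY = """
SELECT h1.Product_Name,
       (SELECT GROUP_CONCAT(h2.Location_name, ', ')
          FROM (SELECT DISTINCT Product_Name, Location_name   -- de-duplicate first
                  FROM Product_list) AS h2
         WHERE h2.Product_Name = h1.Product_Name) AS Location_name,
       SUM(h1.Quantity) AS Total_Quantity
FROM Product_list AS h1
GROUP BY h1.Product_Name
ORDER BY h1.Product_Name
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

Each location appears once per product in the list, while Total_Quantity still sums over all original rows (10 for apple).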
T-SQL supports the use of the asterisk, or “star” character (*) to substitute for an explicit column list. This will retrieve all columns from the source table. While the asterisk is suitable for a quick test, avoid using it in production work, as changes made to the table will cause the query to retrieve all current columns in the table’s current defined order. This could cause bugs or other failures in reports or applications expecting a known number of columns returned in a defined order. Furthermore, returning data that is not needed can slow down your queries and cause performance issues if the source table contains a large number of rows. By using an explicit column list in your SELECT clause, you will always achieve the desired results, providing the columns exist in the table. If a column is dropped, you will receive an error that will help identify the problem and fix your query.
Using SELECT DISTINCT will filter out duplicates in the result set. SELECT DISTINCT specifies that the result set must contain only unique rows. However, it is important to understand that the DISTINCT option operates only on the set of columns returned by the SELECT clause. It does not take into account any other unique columns in the source table. DISTINCT also operates on all the columns in the SELECT list, not just the first one.
From Querying Microsoft SQL Server 2012 MCT Manual.
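A small sketch of the last point in that excerpt, that DISTINCT applies to the whole select list, assuming Python's built-in sqlite3 module as a stand-in for SQL Server and hypothetical rows.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; Product_list rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product_list (Product_Name TEXT, Location_name TEXT)")
conn.executemany(
    "INSERT INTO Product_list VALUES (?, ?)",
    [("apple", "North"), ("apple", "North"), ("apple", "South"), ("pear", "North")],
)

# One column selected: duplicates are judged on Location_name alone.
one_col = conn.execute(
    "SELECT DISTINCT Location_name FROM Product_list ORDER BY Location_name"
).fetchall()

# Two columns selected: a row is a duplicate only if BOTH values repeat.
two_cols = conn.execute(
    "SELECT DISTINCT Product_Name, Location_name FROM Product_list "
    "ORDER BY Product_Name, Location_name"
).fetchall()

print(one_col)   # [('North',), ('South',)]
print(two_cols)  # [('apple', 'North'), ('apple', 'South'), ('pear', 'North')]
```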

Why do I need to use "as" keyword in this sql query?

I have this SQL query:
select top(1)
salary
from
(select top(2) salary
from employee
order by salary desc) as b
order by
salary asc
If I don't utilize as b it will give me an error:
Incorrect syntax near ...
Why is it mandatory to use as in this query?

You don't need the as keyword. In fact, I advise using as for column aliases but not for table aliases. So, I would write this as:
select top(1) salary
from (select top(2) salary
from employee
order by salary desc
) b
order by salary asc;
You do need the table alias for the subquery, because SQL Server requires that all subqueries in the from clause be named.
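A runnable sketch of the aliased derived-table query, assuming Python's built-in sqlite3 module as a stand-in for SQL Server and hypothetical employee rows. SQL Server's TOP is replaced by SQLite's LIMIT, and SQLite does not insist on the alias, so this shows the shape of the corrected query rather than reproducing the SQL Server error.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server (LIMIT instead of TOP); data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("ann", 5000), ("bob", 7000), ("cy", 6000), ("di", 4000)],
)

QUERY = """
SELECT salary
FROM (SELECT salary          -- derived table: the top two salaries
        FROM employee
       ORDER BY salary DESC
       LIMIT 2) b            -- alias b names the derived table
ORDER BY salary ASC
LIMIT 1
"""

second_highest = conn.execute(QUERY).fetchone()[0]
print(second_highest)  # 6000
```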

This is T-SQL syntax: a subquery in FROM must have an alias even if it is never used. Oracle, for example, considers this alias optional.

This is because you have a sub-query that, according to the Transact-SQL documentation on FROM, makes the use of an alias mandatory:
When a derived table, rowset or table-valued function, or operator clause (such as PIVOT or UNPIVOT) is used, the required table_alias at the end of the clause is the associated table name for all columns, including grouping columns, returned.
Note that "derived table" refers to exactly the kind of subquery you use in your SQL statement:
derived_table
Is a subquery that retrieves rows from the database. derived_table is used as input to the outer query.

Because you are using 'salary' twice. Without an alias the interpreter won't know what 'salary' to order the results by. By using an alias it can discern between employee.salary and b.salary.

A different approach to get the 2nd-highest salary, because if you need the 3rd or 4th, your approach would get much more challenging:
SELECT *
FROM (SELECT salary, row_number() over (order by salary desc) rn
FROM employee) E
WHERE rn = 2
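The same ROW_NUMBER() idea generalizes to any position n. This is a sketch assuming Python's built-in sqlite3 module as a stand-in for SQL Server (window functions need SQLite 3.25 or later), with hypothetical employee rows and a hypothetical helper name nth_highest_salary.

```python
import sqlite3

# Assumption: SQLite (3.25+) stands in for SQL Server; data and helper name are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("ann", 5000), ("bob", 7000), ("cy", 6000), ("di", 4000)],
)

def nth_highest_salary(n):
    """Return the n-th highest salary, or None if there are fewer than n rows."""
    row = conn.execute(
        """
        SELECT salary
        FROM (SELECT salary, ROW_NUMBER() OVER (ORDER BY salary DESC) AS rn
                FROM employee) AS E
        WHERE rn = ?
        """,
        (n,),
    ).fetchone()
    return None if row is None else row[0]

print(nth_highest_salary(2))  # 6000
print(nth_highest_salary(3))  # 5000
```

ROW_NUMBER() gives tied salaries different numbers; if equal salaries should share a position, DENSE_RANK() is the usual substitute.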

You are creating two queries. The first one selects the top 2 salaries from employee. You are calling this list "b". Then you are selecting the top salary from "b".

Building sql query with count() where count() is > 1

If I have a table where there are duplicate IDs, how can I count the number of times the same ID appears in the table and only show records that have a count greater than 1?
I've tried:
SELECT COUNT(ID) AS myCount FROM myTbl
WHERE myCount > 1 GROUP BY ID
But it says myCount is invalid column name. Can someone show me what I'm doing wrong?

You need to use the HAVING keyword:
SELECT COUNT(ID) AS myCount FROM myTbl
GROUP BY ID
HAVING COUNT(ID) > 1
From MSDN:
Specifies a search condition for a group or an aggregate. HAVING can be used only with the SELECT statement. HAVING is typically used in a GROUP BY clause. When GROUP BY is not used, HAVING behaves like a WHERE clause.
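A runnable version of the HAVING query, assuming Python's built-in sqlite3 module as a stand-in for SQL Server and a hypothetical myTbl. ID is added to the select list so you can see which IDs are duplicated.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; myTbl rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTbl (ID INTEGER)")
conn.executemany("INSERT INTO myTbl VALUES (?)", [(1,), (1,), (2,), (3,), (3,), (3,)])

QUERY = """
SELECT ID, COUNT(ID) AS myCount
FROM myTbl
GROUP BY ID
HAVING COUNT(ID) > 1   -- filter groups, not rows
ORDER BY ID
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [(1, 2), (3, 3)]
```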

You need to understand the logical query processing phases.
The following are the main query clauses, in the order that you are supposed to type them (known as “keyed-in order”):
SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY
The logical query processing order, which is the conceptual interpretation order, is different. It starts with the FROM clause. Here is the logical query processing order of the six main query clauses:
FROM
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
A typical mistake made by people who don’t understand logical query processing is attempting to refer in the WHERE clause to a column alias defined in the SELECT clause. You can’t do this because the WHERE clause is evaluated before the SELECT clause.
If you understand that the WHERE clause is evaluated before the SELECT clause, you realize that this attempt is wrong because at this phase, the attribute myCount doesn’t yet exist.
It’s important to understand the difference between WHERE and HAVING. The WHERE clause is evaluated before rows are grouped, and therefore is evaluated per row. The HAVING clause is evaluated after rows are grouped, and therefore is evaluated per group.
The HAVING clause (evaluated per group):
can contain aggregate functions
is evaluated after grouping (filters out groups, not individual rows)
is typically used with GROUP BY (without one, the whole result set is treated as a single group)
On the other hand, the WHERE clause:
cannot contain aggregate functions (as in your case)
is processed after FROM
can be used without GROUP BY
So your query should look like this:
SELECT COUNT(ID) AS myCount FROM myTbl
GROUP BY ID
HAVING COUNT(ID) > 1
Note that the ORDER BY clause is the first and only clause that is allowed to refer to column aliases defined in the SELECT clause, because it is the only one evaluated after the SELECT clause.
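A sketch that puts the phases side by side, assuming Python's built-in sqlite3 module as a stand-in for SQL Server and a hypothetical status column on myTbl: WHERE filters individual rows before grouping, HAVING filters groups after grouping, and only ORDER BY uses the alias myCount. (SQLite is more lenient than SQL Server about aliases in WHERE, so the sketch sticks to the form that is valid in both.)

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; the status column and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTbl (ID INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO myTbl VALUES (?, ?)",
    [
        (1, "ok"), (1, "ok"), (1, "void"),
        (2, "ok"), (2, "void"),
        (3, "ok"), (3, "ok"), (3, "ok"),
    ],
)

QUERY = """
SELECT ID, COUNT(ID) AS myCount
FROM myTbl
WHERE status = 'ok'      -- per row, before grouping
GROUP BY ID
HAVING COUNT(ID) > 1     -- per group, after grouping (alias not used here)
ORDER BY myCount DESC    -- evaluated after SELECT, so the alias is visible
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [(3, 3), (1, 2)]
```

ID 2 drops out because only one of its rows survives the WHERE filter, so its group fails the HAVING test.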

Count of Distinct Rows Without Using Subquery

Say I have Table1 which has duplicate rows (forget the fact that it has no primary key...) Is it possible to rewrite the following without using a JOIN, subquery or CTE and also without having to spell out the columns in something like a GROUP BY?
SELECT COUNT(*)
FROM (
SELECT DISTINCT * FROM Table1
) T1

You can do something like this.
SELECT Count(DISTINCT ProductName) FROM Products
but if you want a count of completely distinct records then you will have to use one of the other options you mentioned.
If you wanted to do something like you suggested in the question, then that would imply you have duplicate records in your table.
If you didn't have duplicate records, SELECT DISTINCT * from table would return the same rows as the query without DISTINCT.
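The difference between counting distinct values of one column and counting distinct rows can be sketched as follows, assuming Python's built-in sqlite3 module as a stand-in for SQL Server; the Table1 columns and rows are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; Table1 columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (ProductName TEXT, Color TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [("a", "red"), ("a", "red"), ("a", "blue"), ("b", "red")],
)

# Distinct values of a single column: no subquery needed.
distinct_names = conn.execute(
    "SELECT COUNT(DISTINCT ProductName) FROM Table1"
).fetchone()[0]

# Distinct whole rows: the question's derived-table form.
distinct_rows = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT * FROM Table1) T1"
).fetchone()[0]

print(distinct_names)  # 2
print(distinct_rows)   # 3
```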

No, it's not possible.
If you are limited by your framework/query tool/whatever, can't use a subquery, and can't spell out each column name in the GROUP BY, you are SOL.
If you are not limited by your framework/query tool/whatever, there's no reason not to use a subquery.

If you really, really want to do that, you can just "SELECT COUNT(*) FROM table1 GROUP BY all,columns,here" and take the size of the result set as your count.
But it would be dailywtf worthy code ;)
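For completeness, a sketch of that GROUP BY trick, assuming Python's built-in sqlite3 module as a stand-in for SQL Server and hypothetical Table1 rows. The query returns one row per distinct combination of the grouped columns, so the client counts the rows it gets back.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; Table1 columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (ProductName TEXT, Color TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [("a", "red"), ("a", "red"), ("a", "blue"), ("b", "red")],
)

# One result row per distinct (ProductName, Color); each row holds that group's size.
group_rows = conn.execute(
    "SELECT COUNT(*) FROM Table1 GROUP BY ProductName, Color"
).fetchall()

# The number of groups is the number of distinct rows.
distinct_row_count = len(group_rows)
print(distinct_row_count)  # 3
```

Every row still travels to the client, which is part of why the subquery form is preferable.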

I just wanted to refine the answer by saying that you need to check that the datatype of the columns is comparable - otherwise you will get an error trying to make them DISTINCT:
e.g.
com.microsoft.sqlserver.jdbc.SQLServerException: The ntext data type cannot be selected as DISTINCT because it is not comparable.
This is true for large binary and XML columns, among others, depending on your RDBMS (RTM). The solution for SQL Server, for example, is to cast the ntext column to nvarchar(MAX), which is available from SQL Server 2005 onwards.
If you stick to the PK columns then you should be OK (I haven't verified this myself but I'd have thought logically that PK columns would have to be comparable)