I asked a question about aliases recently: Discerning between alias, temp table, etc [SQL Server].
I got the impression that tables and the results of queries had to be named using aliases, yet the following query runs without one:
select customers.name as 'Customers'
from customers
where customers.id not in
(
select customerid from orders
)
In fact when you use an alias there is a runtime error. What gives?
When working with "tables" - that is, anything that can use a JOIN - a name of some sort is needed. For example, if your query was written as:
select customers.name as 'Customers'
from customers
LEFT JOIN (
select customerid from orders
) ___ ON ___.customerid = customers.id
WHERE ___.customerid is null
Then you need to name the derived table and fill in the blanks, because SQL syntax requires a derived table in a FROM or JOIN clause to have an alias.
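As a sketch of what the filled-in version could look like, here is the same query with the blanks replaced by an arbitrary alias. The alias names c and o are illustrative only, and the join column is assumed to be customerid, as in the original NOT IN query.

```sql
-- Customers that have no matching row in orders.
-- "o" is the required alias for the derived table; "c" is optional.
SELECT c.name AS Customers
FROM customers AS c
LEFT JOIN (
    SELECT customerid FROM orders
) AS o ON o.customerid = c.id
WHERE o.customerid IS NULL;
```

Removing the alias o from this version produces a syntax error, whereas the subquery in the NOT IN version needs no name at all.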
However, in your sample code:
select customers.name as 'Customers'
from customers
where customers.id not in
(
select customerid from orders
)
The syntax does not require a name, and so the nested query does not require naming.
Aliases are there for convenience most of the time. There are times when you are required to use them, though.
https://www.geeksforgeeks.org/sql-aliases/
Temporary tables, derived look-ups (sub-queries), common table expressions (CTEs), duplicate table names in JOINs, and a couple of other things I'm sure I'm forgetting are the only times you need to use an alias.
Most of the time, it's simply to rename something because it's long, complex, a duplicate column name, or just to make things simpler or more readable.
The query you posted likely won't need an alias, but using one makes things easier when you are using the results in code, as well as when/if you add more columns to the query.
Side note:
You may see a lot of single-letter abbreviations in people's SQL. This is common; however, it's bad form. People will also often abbreviate with the first letter of every word in a table name, such as cal for ClientAddressLookup, and this is also not great form; however, typing ClientAddressLookup for each of the 12 columns you need when JOINing with other tables isn't great either. I'm as guilty of this as everyone else; just know that using good aliases is just as necessary and useful as using good names for your variables in code.
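To illustrate the side note, here is a small sketch of short but descriptive aliases. Only ClientAddressLookup comes from the text above; the Client table and every column name are hypothetical.

```sql
-- "addr" and "cli" say what each table is, unlike "cal" or a single letter.
SELECT cli.ClientName,
       addr.City,
       addr.PostalCode
FROM ClientAddressLookup AS addr
INNER JOIN Client AS cli
    ON cli.ClientId = addr.ClientId;
```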
Related
As you can see in the linked ER diagram, I have two tables, Department and Admission. My goal is to print out only the ReshNum of the Department that has the most admissions.
My first attempt was this:
select top 1 count(*) as Number_of_Adm, Department.ReshNum
from Admission
inner join Department on Admission.DepartmentId = Department.id
group by Department.ReshNum
order by Number_of_Adm;
It's pretty straightforward: it counts the rows, groups them by department, and prints out the top answer after ordering by the count. My problem is that it prints both the count and the ReshNum.
I'm trying to print only the ReshNum (name of the branch/serial number). I've read up on subqueries to try to get the count in a subquery and then pick the branch out from that query, but I can't get it to work.
Any tips?
You just need to select the column you need and move the count to the ORDER BY clause.
Using table aliases also helps make your query easier to follow, especially with more columns and tables in the query.
You also say you want the most, so I assume you'll need to order descending.
select top (1) d.ReshNum
from Admission a
inner join Department d
on a.DepartmentId = d.id
group by d.ReshNum
Order By count(*) desc;
Great question! Stu's answer is probably the most efficient way, depending on your indexes.
Just for posterity, since your inquiry includes how to make a subquery work, here is an alternative using a subquery. As far as I can tell, at least on my database, SQL Query Optimizer plans the two queries out with about the same performance on either version.
Subqueries can be really useful in tons of scenarios, like when you want to add another field to display and group by without having to add every single other field on the table in the group by clause.
SELECT TOP 1 x.ReshNum /*, x.DepartmentName*/
FROM
(
SELECT count(*) AS CountOfAdmissions, d.ReshNum /*, d.DepartmentName*/
FROM Admission a
INNER JOIN Department d ON a.DepartmentId = d.Id
GROUP BY d.ReshNum /*, d.DepartmentName*/
/*HAVING count(*) > 50*/
) x
ORDER BY x.CountOfAdmissions DESC
How it works:
You need to wrap your subquery in parentheses and give that subquery an alias. Above, I have it aliased as x just outside the closing parenthesis as an arbitrary identifier. You could certainly alias it as depts or mySubQuery or whatever reads well to you in the resulting overall query.
Next, it's important to notice that while the GROUP BY clause can be included inside the subquery, the ORDER BY clause cannot (SQL Server only allows ORDER BY in a subquery when TOP, OFFSET or FOR XML is also specified). So you have to keep the ORDER BY clause on the outside of the subquery, which means you are actually ordering the results of the subquery and not the rows of the underlying table. That could be great for performance, because the result of your subquery is likely to be vastly smaller than the whole table. However, the sort will not use your table's indexes that way, so depending on how your indexes are set up, that bonus may wash out, or the query may even be slower than ordering without a subquery.
Last, one of the benefits of this kind of subquery approach is that you could easily throw in another field if you want, like the department name for example, without costing very much in performance. Above I have that hypothetical department name field commented out between the /* and */ flags. Note that it is referenced with the d table alias on the inside of the subquery, but uses the subquery's x alias outside of the subquery.
Just as a bonus, in case it comes up, also commented out above is a having clause that you might be able to use. Just to show what could be done inside the subquery.
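For illustration, this is roughly what the query looks like with both optional pieces switched on. DepartmentName remains a hypothetical column, and the threshold of 50 is just the arbitrary value from the commented-out HAVING clause.

```sql
-- Same derived-table approach, with the extra column and the HAVING filter enabled.
SELECT TOP (1) x.ReshNum, x.DepartmentName
FROM
(
    SELECT COUNT(*) AS CountOfAdmissions, d.ReshNum, d.DepartmentName
    FROM Admission AS a
    INNER JOIN Department AS d ON a.DepartmentId = d.Id
    GROUP BY d.ReshNum, d.DepartmentName
    HAVING COUNT(*) > 50          -- keep only departments with more than 50 admissions
) AS x
ORDER BY x.CountOfAdmissions DESC;
```

Note that with the HAVING clause active, the query returns no row at all if no department has more than 50 admissions.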
Cheers
I have a table with about 50,000 records (a global index of corporate and government bonds).
I would like the user to be able to filter this master index firstly into a smaller subset index (based on permanent logic), and then apply further run time criteria that vary each time.
For example, let's say the user wanted to start from one of many subset indices of bonds, let's say of government bonds only, rather than government and corporate bonds, and also only wanted the US$ government bond index specifically. This would be a permanently defined subset of the master index, with a where clause something like "[Level1]='Government' AND [Currency]='USD' AND [CountryCode]='US'"
At run time, the user would supply additional criteria, for example "AND [IssueSize] > 1,000,000,000 AND [Yield] > 0.0112".
I initially thought of having a separate table that stored the different criteria for these permanent sub-indices as where clauses, for example it might have columns "IndexCode, IndexLogic", and using the example above the values would be "UST", "[Level1]='Government' and [Currency]='USD' AND [CountryCode]='US'", and there would be dozens of rows in this table defining commonly used bond indices.
I had originally thought of creating a dynamic string at run time, where the user supplies their choice of sub-index code ('UST' in the example above), which then adds the relevant where conditions, and any additional criteria passed as separate parameters, and then doing an exec(@tsql) type command. I had also thought of perhaps having a where clause that was a function call, but this seems very inefficient?
Is the dynamic string method the best way of doing this, or is there a better way involving some kind of 'eval' function equivalent which can take a field value and use that as a where clause?
The problem here is you don't know in advance what the filtered index is.
A solution I have used in this situation, where the filtered index can change often, is to pull the filter definition back into the client app and use that to dynamically generate the SQL batch. You can also do this with dynamic SQL in a stored procedure:
SELECT ISNULL(
(SELECT i.filter_definition
FROM sys.indexes i
WHERE i.object_id = OBJECT_ID(@tablename) AND
i.name = @indexname AND i.has_filter = 1),
'(1=1)');
You pass the table name and the index name, and you get back the exact filter definition of the index. A benefit of this is that if the index is dropped, the condition becomes (1=1), i.e. every row. You can change this to (1=0) to return nothing instead.
Then you concat this into your dynamic query like so:
SELECT *
FROM table
WHERE regular_conditions_here
AND concated_filter_here
If you are joining other tables, I would advise you to put the filter in a subquery; otherwise you may get column name clashes, as the filter definition contains no aliases.
SELECT *
FROM (SELECT * FROM table WHERE concated_filter_here) table
JOIN othertables etc
WHERE regular_conditions_here
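Putting the pieces of this answer together, a minimal sketch of the stored-procedure variant might look like the following. The names dbo.BondIndex and IX_BondIndex_UST are hypothetical, [Yield] is borrowed from the question, and the sketch assumes the filter text comes only from sys.indexes, never from user input.

```sql
-- Hypothetical object names; replace with your own table and filtered index.
DECLARE @schemaname sysname = N'dbo';
DECLARE @tablename  sysname = N'BondIndex';
DECLARE @indexname  sysname = N'IX_BondIndex_UST';
DECLARE @fullname   nvarchar(517) = QUOTENAME(@schemaname) + N'.' + QUOTENAME(@tablename);
DECLARE @filter     nvarchar(max);
DECLARE @sql        nvarchar(max);

-- Fetch the filter predicate of the filtered index, or (1=1) if it no longer exists.
SELECT @filter = ISNULL(
    (SELECT i.filter_definition
     FROM sys.indexes AS i
     WHERE i.object_id = OBJECT_ID(@fullname)
       AND i.name = @indexname
       AND i.has_filter = 1),
    N'(1=1)');

-- Only catalog text is concatenated; run-time values stay as real parameters.
SET @sql = N'SELECT * FROM ' + @fullname
         + N' WHERE [Yield] > @minYield AND ' + @filter + N';';

EXEC sys.sp_executesql
     @sql,
     N'@minYield decimal(9, 4)',
     @minYield = 0.0112;
```

Passing the run-time criteria through sp_executesql parameters, rather than concatenating them, keeps user-supplied values out of the SQL text.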
I have two tables: a table that we call "The Vault" (Table B), which has a ton of demographic data, and Table A, which has some of the demographic data. I am new to SQL Server, and I was tasked with finding a list of 21 students in The Vault table. The problem: there is no primary key or anything distinctive besides FirstName, LastName, BirthMonth, Birthday, Birthyear.
Goal: We could not match these people in the conventional way we have, and so we are attempting a Hail Mary to try to see which of these shared combinations will perhaps land us with a match.
What I have tried doing: I have placed both tables in temp tables, table A and table B, and I am trying to do an INNER JOIN, but then I realized that in order to make it work I would have to write a crazy join statement (see the code below).
But the problem, as you can imagine, is that it brings back a lot more than my current 21 records; the result is in the thousands. So I would have to make that join statement longer, but I am not sure this is the right way to do this. I believe that is where the WHERE clause would come in, no?
Question: How do I compare these two tables for possible matches using a WHERE clause, where I can mix and match different columns without having to put the data constraints in the ON clause of the JOIN? I don't want to JOIN on something like 6 different columns. Is there another way to do this, so that I can perhaps learn? I understand this is easier when the tables share a primary key, which would be the JOIN criterion I would use, but I have never compared two tables to find possible matches.
FROM #Table a
INNER JOIN #table b ON (a.LAST_NAME = b.LAST_NAME AND a.FIRST_NAME = b.FIRST_NAME.....)
I'm sure this has to be documented SOMEWHERE but for the life of me I just can't seem to find any actual documentation explaining the behavior.
Taking the 4 ways to reference tables (I don't believe there are more but feel free to correct me):
Current Database
Remote Database
Linked Server
Synonym
Their behavior when using multi-part column identifiers seems to differ and I'm trying to understand the reasoning behind it. I've tested the various types of SELECT statements:
Current Database
Works
SELECT Column FROM Schema.Table;
SELECT Table.Column FROM Schema.Table;
SELECT Schema.Table.Column FROM Schema.Table;
SELECT Alias.Column FROM Schema.Table AS Alias;
Even this works!
(Obviously only when using dbo Schema, but still)
SELECT Schema.Table.Column FROM Table;
Remote Database
Works
SELECT Column FROM RemoteDB.Schema.Table;
SELECT Table.Column FROM RemoteDB.Schema.Table;
SELECT RemoteDB.Schema.Table.Column FROM RemoteDB.Schema.Table;
SELECT Alias.Column FROM RemoteDB.Schema.Table AS Alias;
Fails
SELECT Schema.Table.Column FROM RemoteDB.Schema.Table;
The multi-part identifier "Schema.Table.Column" could not be bound.
Linked Server
Works
SELECT Column FROM LinkedServer.RemoteDB.Schema.Table;
SELECT Table.Column FROM LinkedServer.RemoteDB.Schema.Table;
SELECT Alias.Column FROM LinkedServer.RemoteDB.Schema.Table AS Alias;
Fails
SELECT Schema.Table.Column FROM LinkedServer.RemoteDB.Schema.Table;
The multi-part identifier "Schema.Table.Column" could not be bound.
SELECT RemoteDB.Schema.Table.Column FROM LinkedServer.RemoteDB.Schema.Table;
The multi-part identifier "RemoteDB.Schema.Table.Column" could not be bound.
SELECT LinkedServer.RemoteDB.Schema.Table.Column FROM LinkedServer.RemoteDB.Schema.Table;
The multi-part identifier "LinkedServer.RemoteDB.Schema.Table.Column" could not be bound.
**I believe this fails as you're only allowed a maximum of 4 parts in the identifier?**
**Read this somewhere but nothing authoritative. Would appreciate a reference.**
Synonym
Works
SELECT Column FROM SynonymName;
SELECT Column FROM SynonymSchema.SynonymName;
SELECT SynonymName.ColumnName FROM SynonymSchema.SynonymName;
SELECT SynonymSchema.SynonymName.Column FROM SynonymSchema.SynonymName;
SELECT Alias.Column FROM SynonymSchema.SynonymName AS Alias;
Even this works!
(Obviously only when using dbo Schema, but still)
SELECT SynonymSchema.SynonymName.Column FROM SynonymName;
I'm trying to understand why certain multi-part identifiers work when used one way (like schema against local db) but then fail when used another way (like schema against a remote db/linked server).
Should you rather always use aliases to ensure things will always work?
Any advice would be highly appreciated, especially pointers to official documentation as to the reason behind the design and best practice advice for a "one size fits all" scenario (which I'm currently going to surmise to be the alias route).
Best practice - Alias your tables and use a two-part identifier for column names: the first part is the table alias and the second is the column name.
Why? Because:
Using a single-part identifier will break as soon as the query contains a join (or apply) and that column name happens to belong to more than one table.
Using an identifier with more than two parts for a column will force you to write most of the identifier twice - once for the column and once for the table. If anything changes (like the table being moved to a different schema, the linked server name changing, or a synonym changing) you now have to change your query in (at least) two places.
Using a two-part identifier for a column means you know exactly what table this column belongs to, and even if you add a join / apply clause, or simply add a column with the same name to one of the existing tables in the query, you don't need to change the query at all. Also, you now have only one place that determines where the table comes from.
Using three- or four-part identifiers for columns is deprecated (thanks to @larnu for the link in the comments).
Most importantly - columns belong to tables. They don't belong to servers, databases or schemas.
Please note that the word table in this answer is interchangeable with view, table-valued function, table variable, etc.
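As a small sketch of that practice, using the placeholder names from the question (LinkedServer, RemoteDB, Schema, Table and Column are not real objects), the four-part name appears once in the FROM clause and every column reference uses only the alias:

```sql
-- The location of each table is stated exactly once; columns use alias.column.
SELECT r.[Column],
       l.[Column]
FROM LinkedServer.RemoteDB.[Schema].[Table] AS r   -- linked server table
INNER JOIN [Schema].[Table] AS l                   -- local table
    ON l.[Column] = r.[Column];
```

If the linked server is later renamed or replaced by a synonym, only the FROM clause has to change.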
Here's the setup: I have several tables that hold information for data objects which have the potential to have various and sundry bits of data associated with them. Each of these tables has an associated attributes table, which holds 3 bits of information:
the id (integer) of the row the attribute is associated with
a short attribute name ( < 50 chars )
a value (varchar)
The object table will have any number of columns of varying data types, but will always have an integer primary key. If possible, I would like to set up a view that will allow me to select a row from the object table, and all of its associated attributes at one go.
****EDIT****
Ideally, the form I'd like this to take is having columns in the view with the names of the matched attribute from the attributes table, and the value as the value of the attribute.
So for example, if I have table Foo with columns 'Bar', 'Bat', and 'Baz' the view would have those columns, and additionally, columns for any attributes that a row might have.
****END EDIT****
Now, I know (or think I do) that SQL doesn't allow using variables as an alias for a column name. Is there a clean, practical way of doing what I want, or am I chasing a pipe dream?
The obvious solution is to handle all of this in the application code, but I'm curious if it can be done in SQL.
The answer depends on what you are actually seeking. Will the output of the view have one row per attribute per object or one column per attribute per object? If the former, then I'm not sure why you need a view:
Select ...
From ObjectTable
Join AttributeTable
On AttributeTable.Id = ObjectTable.Id
However, I suspect what you want is the latter, or something like:
Select ...
, ... As Attribute1
, ... As Attribute2
, ... As Attribute3
...
From ObjectTable
In this scenario, the columns that would be generated are not known until execution, because the attribute names are dynamic. This is commonly known as a dynamic crosstab. In general, the SQL language is not designed for dynamic column generation. The only way to do this in T-SQL is to use some fugly dynamic SQL. Thus, it is better done in a reporting tool or in middle-tier code.
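For reference, a rough sketch of what such dynamic SQL could look like is shown below. It assumes SQL Server 2017 or later (for STRING_AGG) and hypothetical column names AttributeName and AttributeValue on AttributeTable; it cannot be wrapped in a view, because a view cannot execute dynamic SQL.

```sql
DECLARE @cols nvarchar(max);
DECLARE @sql  nvarchar(max);

-- Build a bracket-quoted, comma-separated list of the distinct attribute names.
SELECT @cols = STRING_AGG(CAST(QUOTENAME(n.AttributeName) AS nvarchar(max)), N', ')
FROM (SELECT DISTINCT AttributeName FROM AttributeTable) AS n;

-- Only run the pivot when at least one attribute exists.
IF @cols IS NOT NULL
BEGIN
    SET @sql = N'
    SELECT p.*
    FROM (
        SELECT o.Id, a.AttributeName, a.AttributeValue
        FROM ObjectTable AS o
        LEFT JOIN AttributeTable AS a ON a.Id = o.Id
    ) AS src
    PIVOT (MAX(AttributeValue) FOR AttributeName IN (' + @cols + N')) AS p;';

    EXEC sys.sp_executesql @sql;
END;
```

Any further columns of the object table would have to be added to the inner SELECT so that they carry through the pivot as grouping columns.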
It sounds like you want a view for each of your 'object' tables as well as its 'attributes' table. Correct me if I am wrong in my reading. It's not clear what your intentions are with 'using variables as an alias for a column name'. Were you hoping to merge ALL your objects, with their different columns, into one view?
I suggest creating one view per entity table, joined to its relevant 'attributes' table.
Question though - why is there one matching attributes table for each entity table? Why are they split out? Perhaps you've made the question simpler or obfuscated, so perhaps my question is rhetorical.
CREATE VIEW Foo AS
SELECT O.ID
,O.EverythingElse
,A.ShortName
,A.SomeVarcharValue
FROM
ObjectTable AS O --customer, invoice, whathaveyou
INNER JOIN
ObjectAttribute AS A ON A.ObjectID = O.ID
To consume from this, you could:
SELECT * FROM Foo WHERE ID = 4
or
SELECT * FROM Foo WHERE ShortName = 'Ender'