Alternatives to FULL OUTER JOIN - sql-server

We have a query that looks for discrepancies between two tables in two different databases (one SQL Server, one Oracle) that in theory should always be in sync. The query pulls the data from both tables into table variables and then does a FULL OUTER JOIN to find the discrepancies. We suspect that the FULL OUT JOIN is partially to blame for the performance issues.
Would it make sense to rely on two LEFT OUTER JOINs and look for records that don't exist on the right side of the join?
We're also thinking about using temp tables to further help with performance.

One option is to do a inner join and store results in a temp table. Then do a select from TableA where not exists in tempTableWithCommonRecords
and another select from TableB where not exists in tempTableWithCommonRecords
Cant say if this will perform better as don't have enough info for that. Its just another option.

You can try the EXCEPT operator which handles complex joins, and it should work in both PL-SQL and T-SQL. It will return any values in the left table that are not a perfect match on the right table:
SELECT [Field1], [Field2], [Field3]
FROM Table1
EXCEPT
SELECT [Field1], [Field2], [Field3]
FROM Table2
UNION
SELECT [Field1], [Field2], [Field3]
FROM Table2
EXCEPT
SELECT [Field1], [Field2], [Field3]
FROM Table1

Related

Can you set multiple column names as a macro in SQL to query against?

Can you set multiple column names from a SQL table as a macro in SQL to query against?
For example I have multiple columns I am hitting against multiple times, can I use a macro or some type of reference to identify them ONCE to avoid displaying them repetitively and cluttering up the code?
The current code works, I am just looking for a cleaner/streamlined option.
Current Code:
WHERE ('ABC') IN
([CODE1],[CODE2],[CODE3],[CODE4],[CODE5],[CODE6],[CODE7],[CODE8]
,[CODE9],[CODE10],[CODE11],[CODE12],[CODE13],[CODE14],[CODE15]
,[CODE16],[CODE17],[CODE18],[CODE19],[CODE20],[CODE21],[CODE22]
,[CODE23],[CODE24],[CODE25]
AND ('CFS') IN
([CODE1],[CODE2],[CODE3],[CODE4],[CODE5],[CODE6],[CODE7],[CODE8]
,[CODE9],[CODE10],[CODE11],[CODE12],[CODE13],[CODE14],[CODE15]
,[CODE16],[CODE17],[CODE18],[CODE19],[CODE20],[CODE21],[CODE22]
,[CODE23],[CODE24],[CODE25]
ect...(20 more times)
Goal:
WHERE 'ABC' IN (&columnsmentionedabove)
OR 'FGS' in (&columnsmentionedabove)
OR 'g6s' in (&columnsmentionedabove)
etc.....
This is inherited code and just seems very clunky.
Thank you
Numbered columns like this are almost always a sign you should have an additional table. So if your existing table structure is like this:
Table1
Table1ID, OtherFields, Code1, Code2, Code3.... Code25
You really want something more like this:
Table1
Table1ID, OtherFields
Table1Codes
Table1ID, Code
Where each entry in Table1 will have many entries in Table1Codes. Then you write JOIN statements to show the two sets side-by-side when needed.
FROM Table1 t
INNER JOIN Table1Codes tc1 ON tc.Table1ID = t.Table1ID AND tc.Code = 'ABC'
INNER JOIN Table1Codes tc2 ON tc.Table1ID = t.Table1ID AND tc.Code = 'CFS'
Or
FROM Table1 t
INNER JOIN Table1Codes tc1 ON tc.Table1ID = t.Table1ID AND tc.Code IN ('ABC','FGS','g6s')
If you can't change the table's schema, as in often the case, you can UNPIVOT it. For example, assuming CODE1...CODE25 come from MyTable, wrap the UNPIVOT operation inside a CTE:
;WITH
cte AS
(
SELECT upvt.*
FROM MyTable
UNPIVOT (
CodeValue FOR CodeLabel IN ([CODE1], [CODE2], ..., [CODE25])
) upvt
)
SELECT *
FROM cte
WHERE CodeValue IN ('ABC', 'DEF', ...)
The unpivot operation is not free. Make sure you filter as much as possible from MyTable before unpivoting the it.

SQL: Building hierarchies and nesting queries on the same table

I am trying to build hierarchies by nesting queries on the same table in MS SQL Server 2014.
To give an example of what I am trying to achieve:
I have a table 'employees' whith the following columns:
[ID],[First Name],[Last Name],[ReportsTo]
{1},{John},{Doe},{2}
{2},{Mary},{Miller},{NULL}
I am trying to build a statement, where I join the employees table with itself and where I build a hierarchy with the boss on top.
Expected Result:
[Employee],[Boss]
{Miller,Mary},{NULL}
{Doe, John},{Miller,Mary}
I apologize, if this is a stupid question, but I fail to create a working nested query.
Could you please help me with that?
Thank you very much in advance
Based on the intended results, it looks like what you essentially want is a list of employees. So let's start with that:
SELECT LastName, FirstName, ReportsTo FROM Employees
This gives you the list, so you now have the objects you're looking for. But you need to fill out more data. You want to follow ReportsTo and show data from the record to which that points as well. This would be done exactly as it would if the foreign key pointed to a different table. (The only difference from being the same table is that you must use table aliases in the query, since you're including the same table twice.)
So let's start by joining the table:
SELECT e.LastName, e.FirstName, e.ReportsTo
FROM Employees e
LEFT OUTER JOIN Employees b on e.ReportsTo = b.ID
The results should still be the same, but now you have more data to select from. So you can add the new columns to the SELECT clause:
SELECT
e.LastName AS EmployeeLastName,
e.FirstName AS EmployeeFirstName,
b.LastName AS BossLastName,
b.FirstName AS BossFirstName
FROM Employees e
LEFT OUTER JOIN Employees b on e.ReportsTo = b.ID
It's a join like any other, it just happens to be a join to the same table.

Shortcut for adding table to column name SQL-server 2014

Stupidly simple question, but I just don't know what to google!
If I create a query like this:
Select id, data
from table1
Now I want to join with table2. I can immediately see that the id column is no longer unique and I have to change it to
table1.id
Is there any smart way (like a keyboard-shortcut) to do this, instead of manually adding table1 to every column? Either before I add the Join to secure that all columns will be unique, or after with suggestions based on the different possible tables.
No, there is no helper.
But do not you can alias the table name:
select x.Col1, y.Col2
from ALongTableName x
inner join AReallyReallyLongTableName y on x.Id = y.OtherId
which can also make queries clearer, and is very much necessary when doing self joins.
First of all, you should start using aliases:
SQL aliases are used to give a database table, or a column in a table,
a temporary name.
Basically aliases are created to make column names more readable.
This will narrow down your problem and make your code maintenance easier. If that's not enough, I guess you could start using auto-completion tools, such as these:
SQL Complete
SQL Prompt
ApexSQL Complete
These have your desired functionality, however, they do not always work as expected (at least for me).
Oh! You can use alias table name. Like this:
SELECT A.ID, A.data
FROM TableA A
INNER JOIN TableB B
ON A.ID = B.ID
You just only use A. or B. if two table have same this column selected. If they different, you don't need: Like this:
SELECT A.ID, data -- if Table B not have column data
FROM TableA A
INNER JOIN TableB B
ON A.ID = B.ID
Or:
Select A.*, B.ID
FROM TableA A
INNER JOIN TableB B
ON A.ID = B.ID

Is it possible to perform a join in Access on a second column if the first is blank?

I have this ugly source data with two columns, let's call them EmpID and SomeCode. Generally EmpID maps to the EmployeeListing table. But sometimes, people are entering the Employee IDs in the SomeCode field.
The person previously running this report in Excel 'solved' this problem by performing multiple vlookups with if statements, as well as running some manual checks to ensure results were accurate. As I'm moving these files to Access I am not sure how best to handle this scenario.
Ideally, I'm hoping to tell my queries to do a Left Join on SomeCode if EmpID is null, otherwise Left Join on EmpID
Unfortunately, there's no way for me to force validation or anything of the sort in the source data.
Here's the full SQL query I'm working on:
SELECT DDATransMaster.Fulfillment,
DDATransMaster.ConfirmationNumber,
DDATransMaster.PromotionCode,
DDATransMaster.DirectSellerNumber,
NZ([DDATransMaster]![DirectSellerNumber],[DDATransMaster]![PromotionCode]) AS EmpJoin,
EmployeeLookup.ID AS EmpLookup,
FROM FROM DDATransMaster
LEFT JOIN EmployeeLookup ON NZ([DDATransMaster]![DirectSellerNumber],[DDATransMaster]![PromotionCode]) = EmployeeLookup.[Employee #])
You can create a query like this:
SELECT
IIf(EmpID Is Null, SomeCode, EmpID) AS join_field,
field2,
etc
FROM YourTable
Or if the query will always be used within an Access session, Nz is more concise.
SELECT
Nz(EmpID, SomeCode) AS join_field,
field2,
etc
FROM YourTable
When you join that query to your other table, the Access query designer can represent the join between join_field and some matching field in the other table. If you were to attempt the IIf or Nz as part of the join's ON clause, the query designer can't display the join correctly in Design View --- it could still work, but may not be as convenient if you're new to Access SQL.
See whether this SQL gives you what you want.
SELECT
dda.Fulfillment,
dda.ConfirmationNumber,
dda.PromotionCode,
dda.DirectSellerNumber,
NZ(dda.DirectSellerNumber,dda.PromotionCode) AS EmpJoin,
el.ID AS EmpLookup
FROM
DDATransMaster AS dda
LEFT JOIN EmployeeLookup AS el
ON NZ(dda.DirectSellerNumber,dda.PromotionCode) = el.[Employee #])
But I would use the Nz part in a subquery.
SELECT
sub.Fulfillment,
sub.ConfirmationNumber,
sub.PromotionCode,
sub.DirectSellerNumber,
sub.EmpJoin,
el.ID AS EmpLookup
FROM
(
SELECT
Fulfillment,
ConfirmationNumber,
PromotionCode,
DirectSellerNumber,
NZ(DirectSellerNumber,PromotionCode) AS EmpJoin
FROM DDATransMaster
) AS sub
LEFT JOIN EmployeeLookup AS el
ON sub.EmpJoin = el.[Employee #])
What about:
LEFT JOIN EmployeeListing ON NZ(EmpID, SomeCode)
as your join, nz() uses the second parameter if the first is null, I'm not 100% sure this sort of join works in access. Worth 20 seconds to try though.
Hope it works.
You Could use a Union:
SELECT DDATransMaster.Fulfillment,
DDATransMaster.ConfirmationNumber,
DDATransMaster.PromotionCode,
DDATransMaster.DirectSellerNumber,
EmployeeLookup.ID AS EmpLookup
FROM DDATransMaster
LEFT JOIN EmployeeLookup ON
DDATransMaster.DirectSellerNumber = EmployeeLookup.[Employee #]
where DDATransMaster.DirectSellerNumber IS NOT NULL
Union
SELECT DDATransMaster.Fulfillment,
DDATransMaster.ConfirmationNumber,
DDATransMaster.PromotionCode,
DDATransMaster.DirectSellerNumber,
EmployeeLookup.ID AS EmpLookup
FROM DDATransMaster
LEFT JOIN EmployeeLookup ON
DDATransMaster.PromotionCode = EmployeeLookup.[Employee #]
where DDATransMaster.DirectSellerNumber IS NULL;

TSQL query to merge data from multiple tables that may or may not have matching rows?

For example, suppose we're conducting research where students can take up to 10 different tests, and each table in the database stores all the students' responses for one test. The tables are named after each test as: T1, T2, ... , T10. Suppose each table has a primary key column 'Username' that identifies each student. Students may or may not have completed each test, so there may or may not be a record in each table for each student.
What is the correct SQL Query to return all the test data from all tables, with one row per student (one row per username)? I want the simplest query possible that returns the correct results. I would also like to coalesce the Username fields into a single Username field in the final query.
To clarify, I understand that SQL has a major limitation in that it does not support a syntax to select all columns except one or more fields like "select *[^ExcludeColumn1][^ExcludeColumn2]". To avoid specifically naming all columns in the final query, it would be acceptable to leave all the Username columns there, as long as it includes a coalesced Username field at the beginning named something like RowID.
As for the overall query, one option would be to perform a union all on the username column of all ten tables, then select the distinct usernames across all tables, then perform a series of left joins against the list of distinct usernames on all 10 tables. That would result in a very straightforward query where each left join is performed on the same distinct set of usernames, but I want to avoid a separate up-front query for distinct usernames. (Although if that's the best option, let me know). It would look something like this:
select * from
(select distinct coalesce(t1.Username,t2.Username,...,t10.Username) as RowID from t1,t2,t3,t4,t5,t6,t7,t8,t9,t10) distinct_usernames
left join t1 on t1.Username = distinct_usernames.RowID
left join t2 on t2.Username = distinct_usernames.RowID
...
left join t10 on t10.Username = distinct_usernames.RowID
Although that is short and easy to write, it is incredibly inefficient and would take hours to run on test tables with 5000+ rows each, so with an adjustment, an equivalent version that runs in a few seconds is:
select * from (
select distinct Username as RowID from (
select Username from t1
union all
select Username from t2
union all
...
select Username from t10
) all_usernames) distinct_usernames
left join t1 on t1.Username = distinct_usernames.RowID
left join t2 on t2.Username = distinct_usernames.RowID
...
left join t10 on t10.Username = distinct_usernames.RowID
I think that what I have above might be the most efficient and correct query (takes only a couple seconds to run and returns correct result set), but I also thought perhaps it could be simplified with some kind of full join. The problem is that full joins get confusing with more than two tables, because without pre-determining the usernames, each subsequent table would have to match records against any of the preceding tables, resulting in a query where each additional table has "[previous table count] + 1" conditions on matching the username.
Assuming that Username is unique in each table, your second query would be the way I would try first, with the slight modifications of removing distinct and simply using union (which implies distinct) rather than union all:
select *
from (
select Username from t1
union
select Username from t2
union
-- ...
select Username from t10
) distinct_usernames
left join t1 on t1.Username = distinct_usernames.Username
left join t2 on t2.Username = distinct_usernames.Username
-- ...
left join t10 on t10.Username = distinct_usernames.Username
From there I would make sure that Username is indexed, possibly even using it as the clustered index. I've also had optimization luck in the past by implementing your distinct_usernames as a temp table (possibly indexed, or an indexed view) at the beginning of the proc, but only testing would determine if that were worthwhile.
A full outer join would require a bunch of or conditions or coalesce arguments, though it could be worth a try on just a few tables to see if the performance is there. I can't try to out-guess what your query engine will like best.
Also, getting just the column names that you want could be done with a query to sys.columns or information_schema.columns and using dynamic SQL to build your query as a string and then executing that.

Resources