Joining null in SQL Server, Oracle and informatica - sql-server

I have two tables to join with a column (say emp_id).. if emp_id in both the tables have null values, how will SQL Server and Oracle treat???
Coz, I read that informatica will neglect the NULL rows when joining..if I handle the null, by substituting -1, a cross-join will happen which i don't want..
What can I do here?
I cannot completely neglect the rows which has NULL.
Thanks

Perhaps you want a left outer join? See wikipedia
Here's how you do it with Oracle
Here's the SQL Server documentation for left outer join.

You can't join on colA = colB and expect NULLs to compare as equal. Depending on your needs (assuming perhaps some sort of table synchronisation need below) three approaches I can think of are
Use COALESCE to substitute a value such as -1 in place of null if a suitable value exists that can never occur in your actual data. COALESCE(Table1.colA,-1) = COALESCE(Table2.colB,-1)
Use both an IS NULL and equality check on all joining columns.
Use INTERSECT (nulls will be treated as equal). Possibly in a derived table that you can JOIN back onto.

Related

Number of rows updated in a oracle table

I have a table called t1 which is already updated by a file. I have table t2 which is created as backup for table t1 before modifications. Now I want to know how many records got updated in table t1. Is there anyway that I can do join with back up table and know how many records got altered? Or how to use sql%rowcount function on a already updated table? Or how should i proceed with ALL_TAB_MODIFICATIONS?
You can join the tables on their primary key (cos you didn't update that, hopefully!) and then compare every column.. you'll have to check for nulls too, and it'll make quite a lot of typing. You could use all_tab_cols and a bit of sql to create your query though (write an sql that creates sql as its output )
Actually, thinking about it, you might be able to get away with less typing by doing a natural join the tables together to get a set of rows that didn't change and removing that set from the original full set:
select * from original
Minus
select original.* from original natural inner join backup
Ive never done it, but the theory is that natural join joins on all equal column names so every column of each table will feature in the join condition. It's an inner join so only columns that have not changed will be represented. Any columns that have become null or become valued from null will also disappear. This is hence the set of rows that have not changed. If all you're after is a count, do a count of the original table less the count of this join result. If you want to know which rows changed, do the result set minus.
Ideally you shouldn't do this; instead at the point the update is run, capture the number of rows it affected. However, this technique could be used long after the update was performed (but before some other update was run)

Select from one table where not in another in SQL Server

I have a SQL Server database on my computer, and there are two tables in it.
This is the first one:
SELECT
[ParticipantID]
,[ParticipantName]
,[ParticipantNumber]
,[PhoneNumber]
,[Mobile]
,[Email]
,[Address]
,[Notes]
,[IsDeleted]
,[Gender]
,[DOB]
FROM
[Gym].[dbo].[Participant]
and this is the second one
SELECT
[ParticipationID]
,[ParticipationNumber]
,[ParticpationTypeID]
,[AddedByEmployeeID]
,[AddDate]
,[ParticipantID]
,[TrainerID]
,[ParticipationDate]
,[EndDate]
,[Fees]
,[PaidFees]
,[RemainingFees]
,[IsPeriodParticipation]
,[NoOfVisits]
,[Notes]
,[IsDeleted]
FROM
[Gym].[dbo].[Participation]
Now I need to write a T-SQL query that can return
SELECT
Participant.ParticipantNumber,
Participation.ParticipationDate,
Participation.EndDate
FROM
Participation
WHERE
Participant.ParticipantID = Participation.ParticipantID;
and I'm going to be thankful
SQL Server performs sort, intersect, union, and difference operations using in-memory sorting and hash join technology. Using this type of query plan, SQL Server supports vertical table partitioning, sometimes called columnar storage.
SQL Server employs three types of join operations:
Nested Loops joins
Merge joins
Hash joins
Join Fundamentals
By using joins, you can retrieve data from two or more tables based on logical relationships between the tables. Joins indicate how Microsoft SQL Server should use data from one table to select the rows in another table.
A join condition defines the way two tables are related in a query by:
Specifying the column from each table to be used for the join. A typical join condition specifies a foreign key from one table and its associated key in the other table.
Specifying a logical operator (for example, = or <>,) to be used in comparing values from the columns.
Inner joins can be specified in either the FROM or WHERE clauses. Outer joins can be specified in the FROM clause only. The join conditions combine with the WHERE and HAVING search conditions to control the rows that are selected from the base tables referenced in the FROM clause.
Follow this link to help you understand joins better in mssql:
link to joins

TSQL Operators IN vs INNER JOIN

Using SQL Server 2014:
Is there any performance difference between the following statements?
DELETE FROM MyTable where PKID IN (SELECT PKID FROM #TmpTableVar)
AND
DELETE FROM MyTable INNER JOIN #TmpTableVar t ON MyTable.PKID = t.PKID
In your given example the execution plans will be the same (most probably).
But having same execution plans doesn't mean that they are the best execution plans you can possibly have for this statement.
The problem I see in both of your queries is the use of the Table Variable.
SQL Server always assumes that there is only 1 row in the table variable. Only in SQL Server 2014 and later version this assumption has been changed to 100 rows.
So no matter how many rows you have this the table variable SQL Server will always assume you have one row in the #TmpTableVar.
You can change your code slightly to give SQL Server a better idea of how many rows there will be in that table by replacing it with a Temporary table and since it is a PK_ID Column in your table variable you can also create an index on that table, to give best chance to sql server to come up with the best possible execution plan for this query.
SELECT PKID INTO #Temp
FROM #TmpTableVar
-- Create some index on the temp table here .....
DELETE FROM MyTable
WHERE EXISTS (SELECT 1
FROM #Temp t
WHERE MyTable.PKID = t.PKID)
Note
In operator will work fine since it is a primary key column in the table variable. but if you ever use IN operator on a nullable column, the results may surprise you, The IN operator goes all pear shape as soon as it finds a NULL values in the column it is checking on.
I personally prefer Exists operator for such queries but inner joins should also work just fine but avoid IN operators if you can.

How to force SQL Server to process CONTAINS clauses before WHERE clauses?

I have a SQL query that uses both standard WHERE clauses and full text index CONTAINS clauses. The query is built dynamically from code and includes a variable number of WHERE and CONTAINS clauses.
In order for the query to be fast, it is very important that the full text index be searched before the rest of the criteria are applied.
However, SQL Server chooses to process the WHERE clauses before the CONTAINS clauses and that causes tables scans and the query is very slow.
I'm able to rewrite this using two queries and a temporary table. When I do so, the query executes 10 times faster. But I don't want to do that in the code that creates the query because it is too complex.
Is there an a way to force SQL Server to process the CONTAINS before anything else? I can't force a plan (USE PLAN) because the query is built dynamically and varies a lot.
Note: I have the same problem on SQL Server 2005 and SQL Server 2008.
You can signal your intent to the optimiser like this
SELECT
*
FROM
(
SELECT *
FROM
WHERE
CONTAINS
) T1
WHERE
(normal conditions)
However, SQL is declarative: you say what you want, not how to do it. So the optimiser may decide to ignore the nesting above.
You can force the derived table with CONTAINS to be materialised before the classic WHERE clause is applied. I won't guarantee performance.
SELECT
*
FROM
(
SELECT TOP 2000000000
*
FROM
....
WHERE
CONTAINS
ORDER BY
SomeID
) T1
WHERE
(normal conditions)
Try doing it with 2 queries without temp tables:
SELECT *
FROM table
WHERE id IN (
SELECT id
FROM table
WHERE contains_criterias
)
AND further_where_classes
As I noted above, this is NOT as clean a way to "materialize" the derived table as the TOP clause that #gbn proposed, but a loop join hint forces an order of evaluation, and has worked for me in the past (admittedly usually with two different tables involved). There are a couple of problems though:
The query is ugly
you still don't get any guarantees that the other WHERE parameters don't get evaluated until after the join (I'll be interested to see what you get)
Here it is though, given that you asked:
SELECT OriginalTable.XXX
FROM (
SELECT XXX
FROM OriginalTable
WHERE
CONTAINS XXX
) AS ContainsCheck
INNER LOOP JOIN OriginalTable
ON ContainsCheck.PrimaryKeyColumns = OriginalTable.PrimaryKeyColumns
AND OriginalTable.OtherWhereConditions = OtherValues

SQL Server CTE referred in self joins slow

I have written a table-valued UDF that starts by a CTE to return a subset of the rows from a large table.
There are several joins in the CTE. A couple of inner and one left join to other tables, which don't contain a lot of rows.
The CTE has a where clause that returns the rows within a date range, in order to return only the rows needed.
I'm then referencing this CTE in 4 self left joins, in order to build subtotals using different criterias.
The query is quite complex but here is a simplified pseudo-version of it
WITH DataCTE as
(
SELECT [columns] FROM table
INNER JOIN table2
ON [...]
INNER JOIN table3
ON [...]
LEFT JOIN table3
ON [...]
)
SELECT [aggregates_columns of each subset] FROM DataCTE Main
LEFT JOIN DataCTE BananasSubset
ON [...]
AND Product = 'Bananas'
AND Quality = 100
LEFT JOIN DataCTE DamagedBananasSubset
ON [...]
AND Product = 'Bananas'
AND Quality < 20
LEFT JOIN DataCTE MangosSubset
ON [...]
GROUP BY [
I have the feeling that SQL Server gets confused and calls the CTE for each self join, which seems confirmed by looking at the execution plan, although I confess not being an expert at reading those.
I would have assumed SQL Server to be smart enough to only perform the data retrieval from the CTE only once, rather than do it several times.
I have tried the same approach but rather than using a CTE to get the subset of the data, I used the same select query as in the CTE, but made it output to a temp table instead.
The version referring the CTE version takes 40 seconds. The version referring the temp table takes between 1 and 2 seconds.
Why isn't SQL Server smart enough to keep the CTE results in memory?
I like CTEs, especially in this case as my UDF is a table-valued one, so it allowed me to keep everything in a single statement.
To use a temp table, I would need to write a multi-statement table valued UDF, which I find a slightly less elegant solution.
Did some of you had this kind of performance issues with CTE, and if so, how did you get them sorted?
Thanks,
Kharlos
I believe that CTE results are retrieved every time. With a temp table the results are stored until it is dropped. This would seem to explain the performance gains you saw when you switched to a temp table.
Another benefit is that you can create indexes on a temporary table which you can't do to a cte. Not sure if there would be a benefit in your situation but it's good to know.
Related reading:
Which are more performant, CTE or temporary tables?
SQL 2005 CTE vs TEMP table Performance when used in joins of other tables
http://msdn.microsoft.com/en-us/magazine/cc163346.aspx#S3
Quote from the last link:
The CTE's underlying query will be
called each time it is referenced in
the immediately following query.
I'd say go with the temp table. Unfortunately elegant isn't always the best solution.
UPDATE:
Hmmm that makes things more difficult. It's hard for me to say with out looking at your whole environment.
Some thoughts:
can you use a stored procedure instead of a UDF (instead, not from within)?
This may not be possible but if you can remove the left join from you CTE you could move that into an indexed view. If you are able to do this you may see performance gains over even the temp table.

Resources