More efficient way to write query - sql-server

I have two fields with an email address in #data table that I am trying to join on. I want it to join on rep email address and if that doesn't work, I want it to join on email. I tried running the following query:
select a.*
from #data a
join #email b on b.email=coalesce(a.rep_email_address,a.email)
where a.rep_email_address<>a.email
This doesn't work, however, because when a.rep_email_address is not null but doesn't match b.email, the record is dropped instead of falling back to the a.email field.
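A small reproduction of that behaviour, as a sketch with made-up sample rows. The table variables and the id column are assumptions for illustration, not the original #data and #email tables, and the WHERE clause is left out to isolate the join:

```sql
-- Hypothetical sample data; names and values are illustrative only.
DECLARE @data TABLE (id INT, rep_email_address VARCHAR(50), email VARCHAR(50));
DECLARE @email TABLE (email VARCHAR(50));

INSERT INTO @data VALUES
    (1, 'rep@x.com',  'a@x.com'),  -- rep address matches
    (2, 'gone@x.com', 'b@x.com'),  -- rep address is set but unmatched; email matches
    (3, NULL,         'c@x.com');  -- rep address is NULL; email matches
INSERT INTO @email VALUES ('rep@x.com'), ('b@x.com'), ('c@x.com');

-- COALESCE only falls back to a.email when rep_email_address IS NULL,
-- so row 2 is compared on 'gone@x.com' alone and finds no partner.
SELECT a.id
FROM @data AS a
JOIN @email AS b ON b.email = COALESCE(a.rep_email_address, a.email);
-- returns ids 1 and 3; id 2 is dropped
```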
This is the work-around I found:
select a.*
from #data a
join #email b on a.email=b.email
except
select a.*
from #data a
join #email b on a.rep_email_address=b.email
union
select a.*
from #data a
join #email b on a.rep_email_address=b.email
where a.rep_email_address<>a.email
This, however, is far from optimal, so I am wondering: is there a way to write this so that it performs better or looks cleaner or simpler? To clarify, the second query works; I am only asking whether there is a better way to write it.
Thank you!

This should be much simpler. However, I also recommend checking the execution plan of this query to see whether it is actually more efficient (or simply compare the resulting execution times on your tables).
SELECT a.*
FROM #data a
JOIN #email b
ON (a.rep_email_address = b.email
OR a.email = b.email )
WHERE a.rep_email_address<>a.email;
-- Not sure why, or if, you need this WHERE clause specifically.

Try it like this...
SELECT
a.*,
SomeColumn = ISNULL(e1.SomeColumn, e2.SomeColumn)
from
#data a
LEFT JOIN #email e1
ON e1.email = a.rep_email_address
LEFT JOIN #email e2
ON e2.email = a.email
AND e1.email IS NULL
WHERE
a.rep_email_address <> a.email
AND (
e1.email IS NOT NULL
OR
e2.email IS NOT NULL
);
HTH, Jason

SQL Server uses "three-valued logic" in Boolean evaluations:
NULL <> 2 --> Unknown (in your case in the where clause it will essentially become false)
NULL <> NULL --> Unknown (same as above)
a <> b --> true
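The three cases above can be checked directly. This is a minimal sketch that assumes the default SET ANSI_NULLS ON setting; the variable names are illustrative only:

```sql
-- Assumes ANSI_NULLS is ON (the default); @a and @b are illustrative names.
DECLARE @a INT = NULL, @b INT = 2;

SELECT
    CASE WHEN @a <> @b       THEN 'true'
         WHEN NOT (@a <> @b) THEN 'false'
         ELSE 'unknown' END AS null_vs_value,   -- unknown
    CASE WHEN @a <> @a       THEN 'true'
         WHEN NOT (@a <> @a) THEN 'false'
         ELSE 'unknown' END AS null_vs_null,    -- unknown
    CASE WHEN 1 <> @b        THEN 'true'
         WHEN NOT (1 <> @b)  THEN 'false'
         ELSE 'unknown' END AS value_vs_value;  -- true
```

Because NOT applied to unknown is still unknown, neither the first nor the second WHEN branch fires for the NULL comparisons, which is why a WHERE clause silently filters those rows out.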
In your case your original query should be:
select a.*
from #data a
join #email b on b.email=coalesce(a.rep_email_address,a.email)
where ISNULL( a.rep_email_address, '' ) <> ISNULL( a.email, '' )
If you care about performance, then try to avoid the use of functions in join predicates or WHERE conditions as this prevents SQL Server from using indexes on columns that are passed into the function.
SELECT a.*
FROM #data AS a
INNER JOIN #email AS b ON b.email = a.rep_email_address OR b.email = a.email
WHERE a.rep_email_address <> a.email OR ( a.rep_email_address IS NULL OR a.email IS NULL )
Summary:
Do not use NULLs to denote empty strings as this requires a lot of extra code to then check for NULLs

Related

Create a User defined function like SQL server 2017 STRING_AGG on earlier versions

I am trying to create a generic function that can be used like the new built-in string_agg function in SQL Server 2017.
The internal implementation could be something like the following:
with tbl as(
select a.Id, c.[Desc] -- Desc is a reserved word, so it needs brackets
from TableA a
join TableB b on b.aId = a.Id
join TableC c on c.Code = b.bCode
)
select distinct ID
, STUFF(( select ', ' + [Desc] from tbl t where t.ID = tbl.ID
for xml path(''),TYPE).value('.','VARCHAR(MAX)'),1,2,'') [Desc]
from tbl
But how can the function receive the key field, the field to be concatenated, the separator character, and the scoped select context?
Is it related to Inline or Multi-Statement Table-Valued Functions ?
Well, this is an ugly hack, I have to go and wash my hands now, but it works (in a way :-D)
CREATE FUNCTION dbo.MyStringAgg(@SelectForXmlAuto XML,@Delimiter NVARCHAR(10))
RETURNS NVARCHAR(MAX)
AS
BEGIN
-- DATALENGTH/2 is used instead of LEN because LEN ignores trailing spaces:
-- LEN(N', ') is 1, which would leave a leading space in the result.
RETURN STUFF((
SELECT @Delimiter + A.nd.value(N'(@*)[1]',N'nvarchar(max)')
FROM @SelectForXmlAuto.nodes(N'/*') AS A(nd)
FOR XML PATH(''),TYPE
).value(N'.',N'nvarchar(max)'),1,DATALENGTH(@Delimiter)/2,'');
END
GO
DECLARE @tbl TABLE(GroupId INT,SomeValue NVARCHAR(10));
INSERT INTO @tbl VALUES(1,'A1'),(1,'A2'),(2,'B1'),(3,'C1'),(3,'C2'),(3,'C3');
SELECT GroupId
,dbo.MyStringAgg((SELECT SomeValue
FROM @tbl AS t2
WHERE t2.GroupId=t.GroupId
FOR XML AUTO), N', ')
FROM @tbl AS t
GROUP BY GroupId;
GO
DROP FUNCTION dbo.MyStringAgg;
The result
1 A1, A2
2 B1
3 C1, C2, C3
The parameter is a FOR XML sub-select within parentheses. This implicitly passes the sub-select's result into the function as XML.
To be honest: I would not use this myself...
A query like this
SELECT GroupId
,STUFF((SELECT N', ' + SomeValue
FROM @tbl AS t2
WHERE t2.GroupId=t.GroupId
FOR XML PATH(''),TYPE).value(N'.','nvarchar(max)'),1,2,'')
FROM @tbl AS t
GROUP BY GroupId;
produces the same result and is almost the same amount of typing - but should be faster than calling a slow UDF...
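For comparison, this is roughly what the built-in being emulated looks like on SQL Server 2017 or later. It is a sketch that reuses the sample rows from above; the SomeValues alias and the ORDER BY choice are assumptions for illustration:

```sql
-- Requires SQL Server 2017 or later; same sample rows as the example above.
DECLARE @tbl TABLE (GroupId INT, SomeValue NVARCHAR(10));
INSERT INTO @tbl VALUES (1,'A1'),(1,'A2'),(2,'B1'),(3,'C1'),(3,'C2'),(3,'C3');

SELECT GroupId,
       STRING_AGG(SomeValue, N', ') WITHIN GROUP (ORDER BY SomeValue) AS SomeValues
FROM @tbl
GROUP BY GroupId;
-- 1 -> A1, A2
-- 2 -> B1
-- 3 -> C1, C2, C3
```

WITHIN GROUP makes the concatenation order explicit; without it the order of the values inside each group is not guaranteed.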
OK, so thanks to the first comment from @MichałTurczyn I ran into this Microsoft article about CLR User-Defined Aggregate - Invoking Functions.
Once I had compiled the code into SrAggFunc.dll, I tried to register the aggregate in SQL Server as follows:
CREATE ASSEMBLY [STR_AGG] FROM 'C:\tmp\STR_AGG.dll';
GO
But I got the following error.
Msg 6501, Level 16, State 7, Line 1
CREATE ASSEMBLY failed because it could not open the physical file 'C:\tmp\SrAggFunc.dll': 3(The system cannot find the path specified.).
So I used this excellent part of @SanderRijken's code and then changed the command to
CREATE ASSEMBLY [STR_AGG]
FROM 0x4D5A90000300000004000000FF......000; --from GetHexString function
GO
and then,
CREATE AGGREGATE [STR_AGG] (@input nvarchar(200)) RETURNS nvarchar(max)
EXTERNAL NAME [STR_AGG].C_STRING_AGG;
Now it's done.
You can see it under your Database -> Programmability in SSMS, and use it like this:
SELECT a.Id, [dbo].[STR_AGG](c.[Desc]) cDesc
FROM TableA a
JOIN TableB b on b.aId = a.Id
JOIN TableC c on c.Code = b.bCode
GROUP BY a.Id
Thanks all =)

Insert Into T2 all rows from T1 that are currently not in T2

I am trying to insert into T2 all records from T1 that are not currently in T2.
I have tried doing this in a loop, because the identifier for T2 is a code that I generate from a stored procedure:
declare @Part VARCHAR(255),
@GenValue VARCHAR(255),
@x INT
set @x = (select count(*) from T1)
WHILE @x >=0
BEGIN
EXEC [dbo].[usp_GenInd] @GenValue OUT,@GencCode = 'TKM', @GencIncrement = 1
set @Part = @GenValue
INSERT INTO dbo.T2
SELECT @Part AS [part],
[Prod_Code] + Column_Header AS [identifier],
[part_rev] = NULL,
'!' AS [u_version],
a.[Descr] AS [descr],
GETDATE() AS [last_updated],
'ME' AS [last_upd_user],
'EA' AS [basic_unit],
[source] = NULL,
'MAIN' AS [level_1],
'GROUP' AS [level_2],
'ME' AS [user_created],
'20' AS [status],
[Prod_Code] AS [master_part],
[drawing_no] = NULL
FROM [dbo].T1 a
LEFT JOIN dbo.T2 b
ON a.Prod_Code + a.Column_Header = b.part
WHERE b.part is null
END
I keep getting a primary key violation error on T2; the key is the @Part variable I am generating from the stored procedure.
It is really slow as well; I thought an insert with a left join on null was quicker than a cursor.
I only have 67 rows in T1.
Thanks for helping in advance
Nope - go back to the cursor if you must continue to use this stored procedure to generate primary key values. The logic error you added to this script is the insert statement. It does not select a specific row from T1 - it selects all rows in T1 that do not exist in T2 (assuming that logic is correct - I'm not going to evaluate it). Presumably you must call the procedure usp_GenInd to generate a PK value for each row in T1. In addition, you never decrement @x - so you have an endless loop.
And notice the wording - "not exists". Generally I find it easier to understand undocumented logic when the query matches (as close as possible) the intent of the code. Your left join logic is the same as not exists - just more difficult to figure out. And you also have a potential problem with your concatenation logic to check for existence. 'AA' + 'B' = 'A' + 'AB' - but the columns contain different values. Be careful about assumptions.
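Both points can be illustrated with a small sketch. The table variables and their two-column layout are assumptions for illustration and do not reflect the real T1 and T2 definitions:

```sql
-- Hypothetical miniature tables; the column layout is an assumption.
DECLARE @T1 TABLE (Prod_Code VARCHAR(10), Column_Header VARCHAR(10));
DECLARE @T2 TABLE (Prod_Code VARCHAR(10), Column_Header VARCHAR(10));

INSERT INTO @T1 VALUES ('AA', 'B'), ('A', 'AB'), ('X', 'Y');
INSERT INTO @T2 VALUES ('AA', 'B');

-- Concatenated comparison: ('A','AB') is wrongly treated as already present,
-- because 'A' + 'AB' and 'AA' + 'B' are both 'AAB'.
SELECT a.Prod_Code, a.Column_Header
FROM @T1 AS a
WHERE NOT EXISTS (SELECT 1
                  FROM @T2 AS b
                  WHERE b.Prod_Code + b.Column_Header
                      = a.Prod_Code + a.Column_Header);
-- returns only ('X','Y')

-- Column-by-column comparison: states the intent and avoids the collision.
SELECT a.Prod_Code, a.Column_Header
FROM @T1 AS a
WHERE NOT EXISTS (SELECT 1
                  FROM @T2 AS b
                  WHERE b.Prod_Code = a.Prod_Code
                    AND b.Column_Header = a.Column_Header);
-- returns ('A','AB') and ('X','Y')
```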
I would try something like:
;WITH cte AS (
SELECT your needed data
FROM [dbo].T1
EXCEPT
SELECT already existing data
FROM [dbo].T2
)
INSERT INTO dbo.T2
SELECT *
FROM cte
Your JOIN logic is flawed.
In your INSERT you have this:
INSERT INTO dbo.T2
SELECT @Part AS [part],
[Prod_Code] + Column_Header AS [identifier],
Inserting @Part into [part]
But when you do your JOIN to rule out existing rows, you have this:
LEFT JOIN dbo.T2 b
ON a.Prod_Code + a.Column_Header = b.part
To rule out existing rows, you should be joining on @Part = b.part.

SQL Server: in-query error catching

In my query I have to join tables from db that is not under my control. It is driving me mad as sometimes this db is not accessible (please don't ask me why) and this breaks my query. Fields I'm joining are not fundamental for my operations and I want my app to work normally even if these fields are not accessible at a time.
Here's the data structure that I do not own:
[DBOutOfControl].[dbo].[Table1]:
[Field1]
[Field2]
[DBOutOfControl].[dbo].[Table2]:
[Field1]
[Field2]
[Field3]
And here is my table:
[DBInMyControl].[dbo].[Table3]:
[Field1]
My original query looks something like that:
SELECT [Table3].[MyID],
[ForeignDataQry].[A],
[ForeignDataQry].[B]
FROM [DBInMyControl].[dbo].[Table3]
LEFT JOIN
(SELECT [Table1].[Field1] AS [MyID],
[Table1].[Field2] AS [A],
[SubQry].[Field2] AS [B]
FROM [DBOutOfControl].[dbo].[Table1]
LEFT JOIN
(SELECT [Table2].[Field1],
[Table2].[Field2]
FROM [DBOutOfControl].[dbo].[Table2]
WHERE [Table2].[Field3] = 'Where') AS [SubQry] ON [Table1].[Field1] = [SubQry].[Field1]) AS [ForeignDataQry] ON [Table3].[MyID]=[ForeignDataQry].[MyID]
How can I bullet-proof this query so that when [ForeignDataQry] generates an error the result would be:
[MyID] [A] [B]
1 NULL NULL
Otherwise
[MyID] [A] [B]
1 Val1 Val2
Is there something that could be done server side?
Just specify the expected result of COUNT, the three names, and you can check tables beforehand. A minor rewrite can allow this to check for objects other than tables, utilize EXISTS if desired, skip or add more checks, etc.:
IF 0 = ( -- Specify how many records you expect to come.
SELECT COUNT(C.[name]) AS [COUNT]
FROM sys.objects AS O
LEFT JOIN sys.schemas AS S ON S.schema_id = O.schema_id
LEFT JOIN sys.columns AS C ON C.object_id = O.object_id
WHERE O.[name] = 'tablename'
AND S.[name] = 'schemaname'
AND C.[name] = 'columnname'
)
SELECT 1 AS A -- Do some code.
ELSE
SELECT 2 AS B -- Do some other code.
I'd wrap the problematic query in dynamic code in order to be able to catch the compilation error (that we cannot catch in the same scope) like this:
begin try
declare @sql varchar(4000) =
'SELECT [Table3].[MyID],
[ForeignDataQry].[A],
[ForeignDataQry].[B]
FROM [DBInMyControl].[dbo].[Table3]
LEFT JOIN
(SELECT [Table1].[Field1] AS [MyID],
[Table1].[Field2] AS [A],
[SubQry].[Field2] AS [B]
FROM [DBOutOfControl].[dbo].[Table1]
LEFT JOIN
(SELECT [Table2].[Field1],
[Table2].[Field2]
FROM [DBOutOfControl].[dbo].[Table2]
WHERE [Table2].[Field3] = ''Where'') AS [SubQry] ON [Table1].[Field1] = [SubQry].[Field1]) AS [ForeignDataQry] ON [Table3].[MyID]=[ForeignDataQry].[MyID]'
exec(@sql)
end try
begin catch
SELECT [Table3].[MyID],
cast(null as ... )as [A],
cast(null as ...) as [B]
FROM [DBInMyControl].[dbo].[Table3]
end catch
Here I use cast(null as ...) as [A] so that the column has the same type as [ForeignDataQry].[A]. For example, if [ForeignDataQry].[A] is an int, write cast(null as int) as [A].
I ended up dealing with this problem differently from what I originally wanted:
I created a [Table4] that keeps a copy of the records from the foreign tables; its fields match [ForeignDataQry] plus a timestamp. I then created this procedure:
CREATE PROCEDURE [dbo].[CopyForeignData]
AS
DECLARE @Timestamp datetime
SET @Timestamp = getdate()
BEGIN
INSERT INTO [DBInMyControl].[dbo].[Table4] ([MyID], [A], [B], [Timestamp])
SELECT [Table1].[Field1] AS [MyID],
[Table1].[Field2] AS [A],
[SubQry].[Field2] AS [B],
@Timestamp
FROM [DBOutOfControl].[dbo].[Table1]
LEFT JOIN
(SELECT [Table2].[Field1],
[Table2].[Field2]
FROM [DBOutOfControl].[dbo].[Table2]
WHERE [Table2].[Field3] = 'Where') AS [SubQry] ON [Table1].[Field1] = [SubQry].[Field1]
DELETE FROM [DBInMyControl].[dbo].[Table4] WHERE [Timestamp] <> @Timestamp
END
I will call this procedure every time my app starts, handle any error there, and change my main LEFT JOIN to refer to [Table4].

Using JOIN statement with CONTAINS function

In a SQL Server database I have a view with a lot of INNER JOINs. The last join uses the LIKE predicate, which is why it runs so slowly. The query looks like this:
SELECT *
FROM A INNER JOIN
B ON A.ID = B.ID INNER JOIN
C ON C.ID1 = B.ID1 INNER JOIN
...........................
X ON X.Name LIKE '%' + W.Name + '%' AND
X.Name LIKE '%' + W.Name2 + '%' AND
X.Name LIKE '%' + W.Name3 + '%'
I want to use CONTAINS instead of LIKE as :
SELECT *
FROM A INNER JOIN
B ON A.ID = B.ID INNER JOIN
C ON C.ID1 = B.ID1 INNER JOIN
...........................
X ON CONTAINS(X.Name, W.Name) AND
CONTAINS(X.Name, W.Name2) AND
CONTAINS(X.Name, W.Name3)
I know that CONTAINS is faster than LIKE, and also that CONTAINS can't be used in JOIN conditions.
Is there any workaround in this case or suggestion?
Thanks in advance.
It's not that CONTAINS can't be used in joins.
You just can't use columns as a second parameter of CONTAINS - see MSDN - CONTAINS (Transact-SQL)
CONTAINS
( { column_name | ( column_list ) | * }
,'<contains_search_condition>'
[ , LANGUAGE language_term ]
)
However, you can use a variable as the search condition, so you can use a cursor and then get all the data you need.
Here is a very rough example:
declare @Name nvarchar(max)
declare @Temp_A table(Name nvarchar(max))
declare @Temp_B table(Name nvarchar(max))
--=============================================================================================
insert into @Temp_A (Name)
select 'Test'
insert into @Temp_B (Name)
select 'aaaTestaaa'
--=============================================================================================
-- Query 1 - LIKE
--=============================================================================================
select *
from @Temp_A as A
inner join @Temp_B as B on B.Name like '%' + A.Name + '%'
--=============================================================================================
-- Query 2 - CONTAINS
--=============================================================================================
declare table_cursor cursor local fast_forward for
select distinct Name from @Temp_A
open table_cursor
while 1 = 1
begin
fetch table_cursor into @Name
if @@fetch_status <> 0 break
select * from @Temp_B where contains(Name, @Name)
end
close table_cursor
deallocate table_cursor
CONCAT works perfectly; I have tested it with PostgreSQL:
SELECT *
FROM TABLE_ONE AS a INNER JOIN TABLE_TWO AS b
ON b.field LIKE CONCAT('%', CONCAT(a.field, '%'));
Please refer to similar answer here
You can create a join using a LIKE..
something like this:
SELECT * FROM TABLE_ONE
FULL OUTER JOIN TABLE_TWO ON TABLE_ONE.String_Column LIKE '%' + TABLE_TWO.Name + '%'
i.e., match the rows where TABLE_TWO.Name is contained in TABLE_ONE.String_Column (the FULL OUTER JOIN also keeps the unmatched rows from both tables).
In short there isn't a way to do this using CONTAINS, it simply is not allowed in a JOIN like this.
see: TSQL - A join using full-text CONTAINS
So although there is a performance hit, IMO LIKE is the easiest solution here.

How to UPDATE a column from two possible database sources

I have the following scenario:
Database A.table A.name
Database A.table A.Application
Database B.table B.name
Database B.table B.Application
Database C.table C.name
Database C.table C.Application
I'm trying to write an UPDATE query that will set a value to table A.Application. The value I need to update it with could come from tables B or C but not both; A.name only exists in either B or C. The condition for each row I would need to update on would be as so:
If B.name exists for A.name, set A.Application = B.application
If C.Name exists for A.name, set A.application = C.application
I'm trying to do this non-dynamically; any assistance would be appreciated.
You can do it in two statements:
UPDATE A
SET A.Application = B.Application
FROM A
INNER JOIN B ON A.name = B.name;
UPDATE A
SET A.Application = C.Application
FROM A
INNER JOIN C ON A.name = C.name;
Only one of them will actually do anything to the data, assuming the names in B and C are truly orthogonal. Otherwise, C wins.
Or you could get fancy (without having actually tried it):
UPDATE A
SET A.Application = ISNULL(B.Application, C.Application)
FROM A
LEFT JOIN B ON A.name = B.name
LEFT JOIN C ON A.name = C.name
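One thing to watch with the LEFT JOIN form: any row of A whose name appears in neither B nor C is still updated, and its Application is set to NULL. A sketch that leaves such rows untouched, using made-up table variables and values as stand-ins for the real tables:

```sql
-- Hypothetical stand-ins for tables A, B and C; the values are illustrative.
DECLARE @A TABLE ([name] VARCHAR(10), [Application] INT);
DECLARE @B TABLE ([name] VARCHAR(10), [Application] INT);
DECLARE @C TABLE ([name] VARCHAR(10), [Application] INT);

INSERT INTO @A VALUES ('a', 0), ('c', 0), ('z', 0);  -- 'z' is in neither B nor C
INSERT INTO @B VALUES ('a', 5);
INSERT INTO @C VALUES ('c', 8);

UPDATE a
SET a.[Application] = COALESCE(b.[Application], c.[Application])
FROM @A AS a
LEFT JOIN @B AS b ON a.[name] = b.[name]
LEFT JOIN @C AS c ON a.[name] = c.[name]
WHERE b.[name] IS NOT NULL
   OR c.[name] IS NOT NULL;  -- skip rows with no match in either source

SELECT [name], [Application] FROM @A;
-- a -> 5, c -> 8, z stays 0
```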
declare @A table([name] varchar(1),[Application] int)
insert @A
select 'a',0 union all
select 'b',0 union all
select 'c',0
declare @B table([name] varchar(1),[Application] int)
insert @B
select 'a',5 union all
select 'b',6
declare @C table([name] varchar(1),[Application] int)
insert @C
select 'c',8
update @A set [Application]=b.[Application]
from @A a left join
(
select [name],[Application] from @B
union all
select [name],[Application] from @C
) b on a.name=b.name
select * from @A
/*
name Application
---- -----------
a 5
b 6
c 8
*/
