Select records with a substring from another table - sql-server

I have these two tables:

data

id | email
---+--------------
1  | xxx@gmail.com
2  | yyy@gmial.com
3  | zzzgimail.com

errors

error      | correct
-----------+-----------
@gmial.com | @gmail.com
gimail.com | @gmail.com

How can I select from data all the records with an email error? Thanks.

SELECT d.id, d.email
FROM data d
INNER JOIN errors e ON d.email LIKE '%' + e.error
would do it. However, a LIKE with a wildcard at the start of the value being matched prevents an index from being used, so you may see poor performance.
A better approach would be to define a computed column on the data table that is the REVERSE of the email field, and index it. This turns the above query into a LIKE condition with the wildcard at the end, like so:
SELECT d.id, d.email
FROM data d
INNER JOIN errors e ON d.emailreversed LIKE REVERSE(e.error) + '%'
In this case, performance would be better as it would allow an index to be used.
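A minimal sketch of that setup, assuming the data table from the question (the computed column name emailreversed is my own choice):

-- Persisted computed column holding the reversed email,
-- so the prefix search above can seek on an index.
ALTER TABLE data ADD emailreversed AS REVERSE(email) PERSISTED;
CREATE INDEX IX_data_emailreversed ON data (emailreversed);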
I blogged a full write-up of this approach a while ago here.

Assuming the error is always at the end of the string:
declare @data table (
    id int,
    email varchar(100)
)

insert into @data (id, email)
select 1, 'xxx@gmail.com' union all
select 2, 'yyy@gmial.com' union all
select 3, 'zzzgimail.com'

declare @errors table (
    error varchar(100),
    correct varchar(100)
)

insert into @errors (error, correct)
select '@gmial.com', '@gmail.com' union all
select 'gimail.com', '@gmail.com'

select d.id,
       d.email,
       isnull(replace(d.email, e.error, e.correct), d.email) as CorrectedEmail
from @data d
left join @errors e
    on right(d.email, len(e.error)) = e.error
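With the sample data, this should return the following (the corrected values follow from the REPLACE):

id | email         | CorrectedEmail
---+---------------+----------------
1  | xxx@gmail.com | xxx@gmail.com
2  | yyy@gmial.com | yyy@gmail.com
3  | zzzgimail.com | zzz@gmail.com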

Well, in reality you can't with the info you have provided.
In SQL you would need to maintain a table of "correct" domains. With that you could do a simple query to find non-matches.
You could use some "non" SQL functionality in SQL Server to do a regular expression check; however, that kind of logic does not belong in SQL (IMO).
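As a sketch of the correct-domains idea, assuming a hypothetical valid_domains table (one row per known-good domain, e.g. '@gmail.com') that you would maintain yourself:

-- Flag rows whose email does not end in any known-good domain.
select d.id, d.email
from data d
where not exists (
    select 1
    from valid_domains v
    where d.email like '%' + v.domain
)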

select * from
    (select 1 as id, 'xxx@gmail.com' as email union
     select 2 as id, 'yyy@gmial.com' as email union
     select 3 as id, 'zzzgimail.com' as email) data join
    (select '@gmial.com' as error, '@gmail.com' as correct union
     select 'gimail.com' as error, '@gmail.com' as correct) errors
    on data.email like '%' + error + '%'
I think that if you didn't use a wildcard at the beginning, only somewhere after, the query could benefit from an index. If you used a full-text search, it could benefit too.


How to write the COLUMNS to ROWS in SQL Server [duplicate]

Looking for elegant (or any) solution to convert columns to rows.
Here is an example: I have a table with the following schema:
[ID] [EntityID] [Indicator1] [Indicator2] [Indicator3] ... [Indicator150]
Here is what I want to get as the result:
[ID] [EntityId] [IndicatorName] [IndicatorValue]
And the result values will be:
1 1 'Indicator1' 'Value of Indicator 1 for entity 1'
2 1 'Indicator2' 'Value of Indicator 2 for entity 1'
3 1 'Indicator3' 'Value of Indicator 3 for entity 1'
4 2 'Indicator1' 'Value of Indicator 1 for entity 2'
And so on..
Does this make sense? Do you have any suggestions on where to look and how to get it done in T-SQL?
You can use the UNPIVOT function to convert the columns into rows:
select id, entityId,
       indicatorname,
       indicatorvalue
from yourtable
unpivot
(
    indicatorvalue
    for indicatorname in (Indicator1, Indicator2, Indicator3)
) unpiv;
Note, the datatypes of the columns you are unpivoting must be the same so you might have to convert the datatypes prior to applying the unpivot.
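For example, a minimal sketch of normalizing the types in a derived table first (assuming nvarchar is an acceptable common type for your indicators):

select id, entityId,
       indicatorname,
       indicatorvalue
from
(
    select id, entityId,
           cast(Indicator1 as nvarchar(100)) as Indicator1,
           cast(Indicator2 as nvarchar(100)) as Indicator2,
           cast(Indicator3 as nvarchar(100)) as Indicator3
    from yourtable
) src
unpivot
(
    indicatorvalue
    for indicatorname in (Indicator1, Indicator2, Indicator3)
) unpiv;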
You could also use CROSS APPLY with UNION ALL to convert the columns:
select id, entityid,
       indicatorname,
       indicatorvalue
from yourtable
cross apply
(
    select 'Indicator1', Indicator1 union all
    select 'Indicator2', Indicator2 union all
    select 'Indicator3', Indicator3 union all
    select 'Indicator4', Indicator4
) c (indicatorname, indicatorvalue);
Depending on your version of SQL Server you could even use CROSS APPLY with the VALUES clause:
select id, entityid,
       indicatorname,
       indicatorvalue
from yourtable
cross apply
(
    values
        ('Indicator1', Indicator1),
        ('Indicator2', Indicator2),
        ('Indicator3', Indicator3),
        ('Indicator4', Indicator4)
) c (indicatorname, indicatorvalue);
Finally, if you have 150 columns to unpivot and you don't want to hard-code the entire query, then you could generate the sql statement using dynamic SQL:
DECLARE @colsUnpivot AS NVARCHAR(MAX),
        @query AS NVARCHAR(MAX)

select @colsUnpivot
    = stuff((select ','+quotename(C.column_name)
             from information_schema.columns as C
             where C.table_name = 'yourtable' and
                   C.column_name like 'Indicator%'
             for xml path('')), 1, 1, '')

set @query
    = 'select id, entityId,
              indicatorname,
              indicatorvalue
       from yourtable
       unpivot
       (
           indicatorvalue
           for indicatorname in ('+ @colsUnpivot +')
       ) u'

exec sp_executesql @query;
Well, if you have 150 columns, then I think that hard-coding the UNPIVOT is not an option. So you could use the XML trick:
;with CTE1 as (
    select ID, EntityID, (select t.* for xml raw('row'), type) as Data
    from temp1 as t
), CTE2 as (
    select
        C.id, C.EntityID,
        F.C.value('local-name(.)', 'nvarchar(128)') as IndicatorName,
        F.C.value('.', 'nvarchar(max)') as IndicatorValue
    from CTE1 as c
    outer apply c.Data.nodes('row/@*') as F(C)
)
select * from CTE2 where IndicatorName like 'Indicator%'
sql fiddle demo
You could also write dynamic SQL, but I like XML more - for dynamic SQL you have to have permission to select data directly from the table, and that's not always an option.
UPDATE: As there's a big flame in the comments, I think I'll add some pros and cons of XML vs. dynamic SQL. I'll try to be as objective as I can and not mention elegance and ugliness. If you have any other pros and cons, edit the answer or write them in the comments.
cons
it's not as fast as dynamic SQL; rough tests gave me that XML is about 2.5 times slower than dynamic (it was one query on a ~250000 row table, so this estimate is by no means exact). You could compare it yourself if you want; here's a sqlfiddle example, on 100000 rows it was 29s (xml) vs 14s (dynamic);
it may be harder to understand for people not familiar with XPath;
pros
it runs in the same scope as your other queries, and that can be very handy. A few examples come to mind:
you can query the inserted and deleted tables inside your trigger (not possible with dynamic SQL at all);
users don't need permission to select directly from the table. What I mean is, if you have a stored procedure layer and users have permission to run the procedures but not to query the tables directly, you can still use this query inside a stored procedure;
you can query a table variable you have populated in your scope (to pass it into dynamic SQL you have to either make it a temporary table instead, or create a table type and pass it as a parameter);
you can use this query inside a function (scalar or table-valued); it's not possible to use dynamic SQL inside functions (see the sketch below).
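As an illustration of that last point (my own sketch, not from the original answer), the XML approach can be wrapped in an inline table-valued function, which dynamic SQL would not allow; this assumes the temp1 table from the example above:

create function dbo.fn_UnpivotTemp1()
returns table
as
return
(
    with CTE1 as (
        select ID, EntityID, (select t.* for xml raw('row'), type) as Data
        from temp1 as t
    )
    select
        c.ID, c.EntityID,
        F.C.value('local-name(.)', 'nvarchar(128)') as IndicatorName,
        F.C.value('.', 'nvarchar(max)') as IndicatorValue
    from CTE1 as c
    outer apply c.Data.nodes('row/@*') as F(C)
);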
Just to help new readers, I've created an example to better understand @bluefeet's answer about UNPIVOT.
SELECT id
,entityId
,indicatorname
,indicatorvalue
FROM (VALUES
(1, 1, 'Value of Indicator 1 for entity 1', 'Value of Indicator 2 for entity 1', 'Value of Indicator 3 for entity 1'),
(2, 1, 'Value of Indicator 1 for entity 2', 'Value of Indicator 2 for entity 2', 'Value of Indicator 3 for entity 2'),
(3, 1, 'Value of Indicator 1 for entity 3', 'Value of Indicator 2 for entity 3', 'Value of Indicator 3 for entity 3'),
(4, 2, 'Value of Indicator 1 for entity 4', 'Value of Indicator 2 for entity 4', 'Value of Indicator 3 for entity 4')
) AS Category(ID, EntityId, Indicator1, Indicator2, Indicator3)
UNPIVOT
(
indicatorvalue
FOR indicatorname IN (Indicator1, Indicator2, Indicator3)
) UNPIV;
Just because I did not see it mentioned: if you're on SQL Server 2016+, here is yet another option to dynamically unpivot data without actually using dynamic SQL.
Example
Declare @YourTable Table ([ID] varchar(50),[Col1] varchar(50),[Col2] varchar(50))
Insert Into @YourTable Values
 (1,'A','B')
,(2,'R','C')
,(3,'X','D')

Select A.[ID]
      ,Item  = B.[Key]
      ,Value = B.[Value]
From @YourTable A
Cross Apply ( Select *
              From OpenJson((Select A.* For JSON Path,Without_Array_Wrapper))
              Where [Key] not in ('ID','Other','Columns','ToExclude')
            ) B
Returns
ID Item Value
1 Col1 A
1 Col2 B
2 Col1 R
2 Col2 C
3 Col1 X
3 Col2 D
I needed a solution to convert columns to rows in Microsoft SQL Server without knowing the column names (used in a trigger) and without dynamic SQL (dynamic SQL is too slow for use in a trigger).
I finally found this solution, which works fine:
SELECT
    insRowTbl.PK,
    insRowTbl.Username,
    attr.insRow.value('local-name(.)', 'nvarchar(128)') as FieldName,
    attr.insRow.value('.', 'nvarchar(max)') as FieldValue
FROM (
    SELECT
        i.ID as PK,
        i.LastModifiedBy as Username,
        convert(xml, (select i.* for xml raw)) as insRowCol
    FROM inserted as i
) as insRowTbl
CROSS APPLY insRowTbl.insRowCol.nodes('/row/@*') as attr(insRow)
As you can see, I convert the row into XML (the subquery select i.* for xml raw converts all columns into one XML column).
Then I CROSS APPLY a function to each XML attribute of this column, so that I get one row per attribute.
Overall, this converts columns into rows, without knowing the column names and without using dynamic sql. It is fast enough for my purpose.
(Edit: I just saw Roman Pekar's answer above, which does the same.
I used the dynamic SQL trigger with cursors first, which was 10 to 100 times slower than this solution, but maybe that was caused by the cursor, not by the dynamic SQL. Anyway, this solution is very simple and universal, so it's definitely an option.)
I am leaving this comment at this place because I want to reference this explanation in my post about the full audit trigger, which you can find here: https://stackoverflow.com/a/43800286/4160788
DECLARE @TableName varchar(max) = NULL

SELECT @TableName = COALESCE(@TableName+',','') + t.TABLE_CATALOG + '.' + t.TABLE_SCHEMA + '.' + o.Name
FROM sysindexes AS i
INNER JOIN sysobjects AS o ON i.id = o.id
INNER JOIN INFORMATION_SCHEMA.TABLES t ON t.TABLE_NAME = o.name
WHERE i.indid < 2
  AND OBJECTPROPERTY(o.id, 'IsMSShipped') = 0
  AND i.rowcnt > 350
  AND o.xtype != 'TF'
ORDER BY o.name ASC

print @TableName
This gives you the list of tables with row counts greater than 350, flattened into a single comma-separated row.
The opposite of this is to flatten a column into a CSV, e.g.
SELECT STRING_AGG ([value],',') FROM STRING_SPLIT('Akio,Hiraku,Kazuo', ',')

failed combining multiple columns into one row in ms sql

This is how the code works: I got this code somewhere here on the Stack Overflow site. I couldn't add the email address to that row; it only accepts the phone numbers.
Select distinct ST2.AlumniID,
    substring(
        (
            Select ','+ST2.contactinfo AS [text()]
            From dbo.Alumni_contacts ST1
            Where ST1.AlumniID = ST2.AlumniID
            For XML PATH ('')
        ), 2, 10000) [Contact Info]
From dbo.Alumni_contacts ST2
This is the result:
2011-0014|656-88-24,656-88-24
2011-0014|pakchaeyoung@gmail.com,pakchaeyoung@gmail.com
2012-0098|667-63-55,667-63-55
2012-0098|park.ginnie@gmail.com,park.ginnie@gmail.com
2012-0172|gutierrez88@gmail.com
Do you have a better and simpler query for this?
You are referencing the wrong table expression in the subquery SELECT list. It should be the alias of the inner FROM. Try:
... (Select ','+ ST1.contactinfo ...
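A sketch of the full corrected query (my reconstruction, keeping the question's table and column names):

Select distinct ST2.AlumniID,
    substring(
        (
            Select ','+ST1.contactinfo AS [text()]
            From dbo.Alumni_contacts ST1
            Where ST1.AlumniID = ST2.AlumniID
            For XML PATH ('')
        ), 2, 10000) [Contact Info]
From dbo.Alumni_contacts ST2

With ST1.contactinfo inside the subquery, each AlumniID's own rows (phone and email) are concatenated, instead of the outer row's value being repeated.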

What is the purpose of a [field] + 0 in a T-SQL select Statement?

I ran across something in a T-SQL view that I had never seen before, and was curious what the purpose of it would be. The view was performing a left hash join on some tables, and the ID column of the first table was selected using a.ID + 0. I've never seen this in the wild before, and could not find anything similar on StackOverflow.
SELECT a.ID + 0 ID, a.FIELD1, a.FIELD2
FROM myDatabase.dbo.SomeTable as a LEFT HASH JOIN ...
I had to remove the + 0 from the view because the ID field in the view appeared to be nullable, even though the original table data source listed the column as non-nullable.
What would be the purpose of the ID + 0 in the select statement?
It converts a string to an INT when possible, and throws an error otherwise.
Consider:
select sql_variant_property('1', 'BaseType') -- varchar
select sql_variant_property('1'+0, 'BaseType') -- int
select sql_variant_property('ABC'+0, 'BaseType') -- Error
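Note that this may also explain why the view column appeared nullable: SQL Server can report a computed expression such as a.ID + 0 in a view as nullable unless it is wrapped in ISNULL, even when the underlying column is NOT NULL. A quick sketch to check the metadata (the view name dbo.vDemo is hypothetical):

-- Compare how the metadata reports the raw expression vs. an ISNULL-wrapped one.
create view dbo.vDemo as
select a.ID + 0 as ID_plus_zero,
       isnull(a.ID + 0, 0) as ID_wrapped
from myDatabase.dbo.SomeTable as a;
go
select name, is_nullable
from sys.columns
where object_id = object_id('dbo.vDemo');
-- ID_plus_zero is reported as nullable; ID_wrapped is not.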

SQL 2008: getting rows with a join to an XML field?

Not sure if this approach makes for poor performance down the track, but it at least feels like "a better way" right now...
What I am trying to do is this:
I have a table called CONTACTS which amongst other things has a primary key field called memberID
I also have an XML field which contains the IDs of your friends (for example), like:
<root><id>2</id><id>6</id><id>14</id></root>
So what I am trying to do via a stored proc is pass in say your member ID, and return all of your friends info, for example:
select name, address, age, dob from contacts
where id... xml join stuff...
The previous way I had it working (well sort of!) selected all the XML nodes (/root/id) into a temp table, and then did a join from that temp table to the contact table to get the contact fields...
Any help much appreciated... just a bit overloaded by all the .query/.nodes examples, and unsure which might be the better way of doing this...
THANKS IN ADVANCE!
<-- EDIT -->
I did get something working, but it looks like a SQL Frankenstein statement!
Basically I needed to get the friends contact ID's from the XML field, and populate into a temp table like so:
Declare @contactIDtable TABLE (ID int)

INSERT INTO @contactIDtable (ID)
SELECT CONVERT(INT, CAST(T2.memID.query('.') AS varchar(100))) AS friendsID
FROM dbo.members
CROSS APPLY memberContacts.nodes('/root/id/text()') AS T2(memID)
But crikey! The convert/cast thing looks serious, as I need to get an INT for the next bit, which is the actual join to return the contact data, as follows:
SELECT memberID, memberName, memberAddress1
FROM members
INNER JOIN @contactIDtable cid
    ON members.memberID = cid.ID
ORDER BY memberName
RESULT...
Well it works.. in my case, my memberContacts XML field had 3 nodes (id's in this case), and the above query returned 3 rows of data (memberID, memberName, memberAddress1)...
The whole point of this, of course, was to avoid creating a many-to-many join table, i.e. a list of all my friends' IDs... just not sure if the above actually makes this quicker and easier...
Any more ideas / more efficient ways of trying to do this?
SQL Server's syntax for reading XML is one of the least intuitive around. Ideally, you'd want to:
select f.name
from friends f
join #xml x
on x.id = f.id
Instead, SQL Server requires you to spell out everything. To turn an XML variable or column into a "rowset", you have to spell out the exact path and think up two aliases:
@xml.nodes('/root/id') as table_alias(column_alias)
Now you have to explain to SQL Server how to turn <id>1</id> into an int:
table_alias.column_alias.value('.', 'int')
So you can see why most people prefer to decode XML on the client side :)
A full example:
declare @friends table (id int, name varchar(50))

insert @friends (id, name)
select 2, 'Locke Lamorra'
union all select 6, 'Calo Sanzo'
union all select 10, 'Galdo Sanzo'
union all select 14, 'Jean Tannen'

declare @xml xml
set @xml = '<root><id>2</id><id>6</id><id>14</id></root>'

select f.name
from @xml.nodes('/root/id') as table_alias(column_alias)
join @friends f
    on table_alias.column_alias.value('.', 'int') = f.id
In order to get your XML contents as rows from a "pseudo-table", you need to use the .nodes() on the XML column - something like:
DECLARE @xmlfield XML
SET @xmlfield = '<root><id>2</id><id>6</id><id>14</id></root>'

SELECT
    ROOT.ID.value('(.)[1]', 'int')
FROM
    @xmlfield.nodes('/root/id') AS ROOT(ID)

SELECT
    (list of fields)
FROM
    dbo.Contacts c
INNER JOIN
    @xmlfield.nodes('/root/id') AS ROOT(ID) ON c.ID = ROOT.ID.value('(.)[1]', 'INT')
Basically, the .nodes() defines a pseudo-table ROOT with a single column ID, that will contain one row for each node in the XPath expression, and the .value() selects a specific value out of that XML fragment.
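For completeness, a minimal sketch of wiring this up as the stored procedure the question asks for, assuming the members table and memberContacts XML column from the question (the procedure name is my own):

-- Given a member, return contact info for everyone listed
-- in that member's memberContacts XML.
CREATE PROCEDURE dbo.GetFriends
    @memberID int
AS
BEGIN
    SELECT f.memberID, f.memberName, f.memberAddress1
    FROM members m
    CROSS APPLY m.memberContacts.nodes('/root/id') AS x(id)
    INNER JOIN members f
        ON f.memberID = x.id.value('.', 'int')
    WHERE m.memberID = @memberID
    ORDER BY f.memberName;
END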

Is there a way to optimize the query given below

I have the following query, and I need it to fetch data from SomeTable based on the filter criteria present in SomeOtherTable. If there is nothing present in SomeOtherTable, the query should return all the data present in SomeTable.
SQL SERVER 2005
SomeOtherTable does not have any indexes or any constraints; all fields are char(50).
The following query works fine for my requirements, but it causes performance problems when I have lots of parameters.
Due to a client requirement, we have to keep all the WHERE clause data in SomeOtherTable; depending on subid, the data will be joined with one of the columns in SomeTable.
For example, the query can be:
SELECT *
FROM SomeTable
WHERE 1=1
AND
(
    SomeTable.ID in (SELECT DISTINCT ID FROM SomeOtherTable WHERE Name = 'ABC' and subid = 'EF')
    OR
    0 = (SELECT Count(1) FROM SomeOtherTable WHERE spName = 'ABC' and subid = 'EF')
)
AND
(
    SomeTable.date = (SELECT date FROM SomeOtherTable WHERE Name = 'ABC' and subid = 'Date')
    OR
    0 = (SELECT Count(1) FROM SomeOtherTable WHERE spName = 'ABC' and subid = 'Date')
)
EDIT----------------------------------------------
I think I might have to explain my problem in detail:
We have developed an ASP.NET application that is used to invoke parameterized Crystal Reports; parameters are not passed to the reports using the default Crystal Reports method.
In the ASP.NET application we have created wizards which are used to pass the parameters to the reports. These parameters are not directly consumed by the Crystal Report but by the query embedded inside the report or the stored procedure it uses.
This is achieved using a table (SomeOtherTable) which holds the parameter data for as long as a report is running, after which the data is deleted, so we can assume that SomeOtherTable has at most 2 to 3 rows at any given point in time.
So if we look at the above query, the initial part can be taken as the report query, and the WHERE clause is used to get the user input from SomeOtherTable.
So I don't think it will be useful to create indexes etc. (maybe I am wrong).
"SomeOtherTable does not have any indexes or any constraints; all fields are char(50)"
Well, there's your problem. There's nothing you can do to improve the performance of a query against a table built like this.
You need a proper primary or other candidate key designated on all of your tables. That is to say, you need at least ONE unique index on the table. You can do this by designating one or more fields as the PK, or you can add a UNIQUE constraint or index.
You need to define your fields properly. Does the field store integers? Well then, an INT field may just be a better bet than a CHAR(50).
You can't "optimize" a query that is based on an unsound schema.
Try:
SELECT
*
FROM
SomeTable
LEFT JOIN SomeOtherTable ON SomeTable.ID=SomeOtherTable.ID AND Name = 'ABC'
WHERE
1=1
AND
(
SomeOtherTable.ID IS NOT NULL
OR
0=(SELECT Count(1) FROM SomeOtherTable WHERE spName = 'ABC')
)
also put 'with (nolock)' after each table name to improve performance
The following might speed you up
SELECT *
FROM SomeTable
WHERE SomeTable.ID in
    (SELECT DISTINCT ID FROM SomeOtherTable WHERE Name = 'ABC')
UNION
SELECT *
FROM SomeTable
WHERE NOT EXISTS (SELECT spName FROM SomeOtherTable WHERE spName = 'ABC')
The UNION effectively splits this into two simpler queries which can be optimised separately (whether this actually improves performance depends very much on the DBMS, table size, etc., but it's always worth a try).
The "EXISTS" key word is more efficient than the "SELECT COUNT(1)" as it will return true as soon as the first row is encountered.
Or check if the value exists in the db first.
And you can remove the DISTINCT keyword in your query; it is useless here.
if EXISTS (Select spName From SomeOtherTable Where spName = 'ABC')
begin
    SELECT *
    FROM SomeTable
    WHERE SomeTable.ID in
        (SELECT ID FROM SomeOtherTable Where Name = 'ABC')
end
else
begin
    SELECT *
    FROM SomeTable
end
Aloha
Try
select t.*
from SomeTable t
left outer join SomeOtherTable o
    on t.id = o.id
where (not exists (select id from SomeOtherTable where spName = 'ABC')
       or o.spName = 'ABC')
-Edoode
Change all your SELECT statements in the WHERE part to INNER JOINs.
The OR conditions should be UNION ALL-ed.
Also make sure your indexing is OK.
Sometimes it pays to have an intermediate table for temp results which you can join to, as sketched below.
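A rough sketch of that intermediate-table idea, using the names from the question (my own reconstruction):

-- Materialize the matching IDs once, then reuse the result
-- both for the filter and for the "no parameters present" check.
SELECT DISTINCT ID
INTO #FilteredIDs
FROM SomeOtherTable
WHERE Name = 'ABC' AND subid = 'EF';

SELECT t.*
FROM SomeTable t
WHERE NOT EXISTS (SELECT 1 FROM #FilteredIDs)
   OR t.ID IN (SELECT ID FROM #FilteredIDs);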
It seems to me that there is no need for the "1=1 AND" in your query. 1=1 will always evaluate to true, leaving the engine to evaluate the next part... why not just skip the 1=1 and evaluate the juicy part?
I am going to stick to my original Query.
