SQL Server: finding a substring using the PATINDEX function

I'm writing different queries in SQL Server.
I have 2 tables, Employees and Departments.
Table Employees consists of EMPLOYEE_ID, ENAME, ID_DEP - department id. Table Departments consists of ID_DEP, DNAME.
The task is to show Employee.ENAME and his Department.DNAME where Department.DNAME contains the word Sales. I have to use the functions SUBSTRING and PATINDEX.
Here is my code, but I think it looks quite strange and meaningless. Nevertheless, I need to use both functions in this task.
SELECT e.ENAME, d.DNAME
FROM EMPLOYEE e
JOIN DEPARTMENTS d ON d.ID_DEP = e.ID_DEP
WHERE UPPER(SUBSTRING(d.DNAME, (PATINDEX('%SALES%', d.DNAME)), 5)) = 'SALES'
Any ideas what I should change while continuing to use these two functions?

The answer is just below, and BTW, using the row constructor VALUES is an excellent means of getting a simple demo of what you want.
The query below provides several possible answers to your ambiguous question. Why would you need to use these functions? Is this a homework assignment that specifies them? If your SQL Server database was installed with a case-insensitive collation, or the column 'name' was set to one, then no matter how UPPER is used, it will make no difference to the match. The most you can get out of UPPER is to make the data appear uppercase in the result, or to turn the data uppercase if you update the column. PATINDEX/LIKE will perform a case-insensitive match. And you know, this is so useful that most people configure their server with some case-insensitive collation. To circumvent the default comparison behavior, which matches the column/database collation, specify a COLLATE clause, as in the OUTER APPLY of Test2.
Here are the queries. Watch the results; they demonstrate the points above.
select *
From
(Values ('très sales'), ('TRES SALES'), ('PLUTOT PROPRE')) as d(name)
outer apply (Select Test1='match' Where Substring(name, patindex('%SALES%', name), 5) = 'SALES') as test1
outer apply (Select Test2='match' Where name COLLATE Latin1_General_CS_AS like '%SALES%' ) as test2 -- CS_AS means case-sensitive, accent-sensitive
outer apply (Select Test3='match' Where name like '%SALES%') as test3
select * -- really want an upper-case match?
From
(Values ('très sales'), ('TRES SALES'), ('PLUTOT PROPRE')) as d(name)
Where name COLLATE Latin1_General_CS_AS like '%SALES%'
select * -- demo of patindex
From
(Values ('très sales'), ('TRES SALES'), ('PLUTOT PROPRE')) as d(name)
outer apply (Select ReallyUpperMatch=name Where patindex('%SALES%', name COLLATE Latin1_General_CS_AS)>0 ) as ReallyUpperMatch -- CS_AS means case-sensitive
outer apply (Select ciMatch=name Where name like '%SALES%' ) as ciMatch
outer apply (Select MakeItLookUpper=UPPER(ciMatch) ) MakeItLookUpper

Related

Is it possible to avoid subquery in select when concatenating columns?

I have a "main" table containing an id (plus some other columns), and an aka table that joins to it via its [main id] column matching main.id. The following query returns some columns from main along with a column of concatenated, comma-separated "lastName"s from aka:
SELECT m.id, m.name,
(SELECT a.[lastname] + ',' AS [text()]
FROM aka a
WHERE a.[main id] = m.[id]
FOR xml path ('')) [akas]
FROM main m
This works fine, but I'd like to know if there is a way to avoid doing this in a subquery.
Using CROSS APPLY you can move the subquery out of the SELECT list:
SELECT m.id, m.name,
(SELECT a.[lastname] + ',' AS [text()]
FROM aka a
WHERE a.[main id] = m.[id]
FOR xml path ('')) [akas]
FROM main m;
to:
SELECT m.id, m.name, s.akas
FROM main m
CROSS APPLY (SELECT a.[lastname] + ',' AS [text()]
FROM aka a
WHERE a.[main id] = m.[id]
FOR xml path ('')) AS s(akas)
Notes:
You can refer to s.akas multiple times.
You can add WHERE s.akas ...
A long subquery in the SELECT list can be less readable.
If it is possible that the correlated subquery returns no rows, you need to use OUTER APPLY instead.
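A minimal sketch of the OUTER APPLY variant mentioned in the last note, reusing the hypothetical tables from the question:

```sql
SELECT m.id, m.name, s.akas
FROM main m
OUTER APPLY (SELECT a.[lastname] + ',' AS [text()]
             FROM aka a
             WHERE a.[main id] = m.[id]
             FOR XML PATH('')) AS s(akas);
-- rows from main with no matching aka rows are kept, with akas = NULL
```

With CROSS APPLY, those unmatched rows would be filtered out, just as with an INNER JOIN.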
Generally speaking, there's nothing technically wrong with a sub-query...
You might prefer an APPLY for readability, or to reference the result multiple times.
Whenever you put a sub-query directly into the column's list like here:
SELECT Column1
,Column2
,(SELECT x FROM y) AS Column3
,[...]
... this sub-select must deliver
just one column
of just one row
Using FOR XML PATH(''),TYPE lets the result be one single value of type XML. This makes it possible to return many rows/columns "as one". Without the ,TYPE it will be the XML "as text". The concatenation trick with XML is possible due to a peculiarity of XML generation with empty tag names returned "as text". But in any case: the returned value will be just one piece of information, and therefore fits into a column list.
Whenever you expect more than one row, you'd have to force this to be one piece of data (like the often-seen SELECT TOP 1 x FROM y ORDER BY SomeSortKey, which brings back the first or the last or ...).
Everything else that retrieves 1:n data needs a JOIN or an APPLY. With scalar data, as in your case, there will actually be no difference between a sub-select and an APPLY.
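As a sketch of the ,TYPE variant described above (table and column names follow the question): STUFF strips the leading comma, and .value() turns the XML back into a plain string, which also un-escapes characters like & and <.

```sql
SELECT m.id, m.name,
       STUFF((SELECT ',' + a.[lastname] AS [text()]
              FROM aka a
              WHERE a.[main id] = m.[id]
              FOR XML PATH(''), TYPE).value('.', 'varchar(max)'),
             1, 1, '') AS [akas]
FROM main m;
```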
Since you have an arbitrary number of records combining into the final string, what you have is the best option for doing this in SQL. Generally, you are expected to return one row per item; if you want a CSV string, build it in your client code.

unexpected output sql server using count

I am using SQL Server 2012.
The query is:
CREATE TABLE TEST ( NAME VARCHAR(20) );
INSERT INTO TEST (NAME)
SELECT NULL
UNION ALL
SELECT 'James'
UNION ALL
SELECT 'JAMES'
UNION ALL
SELECT 'Eric';
SELECT NAME
, COUNT(NAME) AS T1
, COUNT(COALESCE(NULL, '')) T2
, COUNT(ISNULL(NAME, NULL)) T3
, COUNT(DISTINCT ( Name )) T4
, COUNT(DISTINCT ( COALESCE(NULL, '') )) T5
, @@ROWCOUNT T6
FROM TEST
GROUP BY Name;
DROP TABLE TEST;
In the result set there is no 'JAMES' (caps)?
Please tell me how this was excluded.
Expected was NULL, James, JAMES, Eric.
You need to change your Name column collation to Latin1_General_CS_AS, which is case-sensitive:
SELECT NAME COLLATE Latin1_General_CS_AS,
Count(NAME) AS T1,
Count(COALESCE(NULL, '')) T2,
Count(Isnull(NAME, NULL)) T3,
Count(DISTINCT ( Name )) T4,
Count(DISTINCT ( COALESCE(NULL, '') )) T5,
@@ROWCOUNT T6
FROM TEST
GROUP BY Name COLLATE Latin1_General_CS_AS;
Use a case-sensitive collation like COLLATE Latin1_General_CS_AS.
CREATE TABLE TEST ( NAME VARCHAR(20) COLLATE Latin1_General_CS_AS );
The other people who commented here are correct.
It would be easier for you to understand their meaning if you googled collation and case sensitivity, but in layman's terms it's like this:
Collation is a little like encoding; it determines how the characters in string columns are interpreted, ordered, and compared to one another. Case-insensitive means that UPPERCASE/lowercase are considered exactly the same, so for instance 'JAMES', 'james', 'JaMeS', etc. would be no different to SQL Server. So when your database has a case-insensitive collation and you then create a table with a column without defining the collation, that column inherits the database's default collation, which is how we arrived here.
You can manually alter a column's collation, or specify one in a query, but bear in mind that whenever you compare two columns with different collations, you need to cast both of them to the same collation, or you will get an error. That's why it's good practice to use the same collation throughout the database, barring special query-specific circumstances.
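As a sketch of that cross-collation situation (table names here are hypothetical): forcing both sides to one collation avoids the "cannot resolve the collation conflict" error:

```sql
SELECT *
FROM TableA AS a
JOIN TableB AS b
  ON a.Name COLLATE DATABASE_DEFAULT = b.Name COLLATE DATABASE_DEFAULT;
```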
As to your question about what Latin1_General_CS_AS means: "Latin1_General" is the alphabet, the details of which you can check online. The "CS" part means case-sensitive; if it were case-insensitive you would see "CI" instead. The "AS" means accent-sensitive, and "AI" would mean accent-insensitive. Basically, whether 'Á' is considered equal to 'A' or not.
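A small self-contained demo of the difference (no tables needed; the inline results assume standard SQL Server collation semantics):

```sql
SELECT CASE WHEN 'James' = 'JAMES' COLLATE Latin1_General_CI_AS
            THEN 'equal' ELSE 'different' END AS ci_compare,  -- 'equal'
       CASE WHEN 'James' = 'JAMES' COLLATE Latin1_General_CS_AS
            THEN 'equal' ELSE 'different' END AS cs_compare;  -- 'different'
```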

Use SQL variable for comparison in the same SELECT statement that sets it

How do I make the following T-SQL statement legal? I could copy the subquery that sets the @Type variable into every CASE option, but I'd rather execute the subquery only once. Is that possible?
SELECT
@Type = (SELECT CustomerType FROM dbo.Customers WHERE CustomerId = (SELECT CustomerId FROM dbo.CustomerCategories WHERE CatId = @CatId)),
CASE
WHEN @Type = 'Consumer' THEN dbo.Products.FriendlyName
WHEN @Type = 'Company' THEN dbo.Products.BusinessName
WHEN @Type IS NULL THEN dbo.Products.FriendlyName
WHEN @Type = '' THEN dbo.Products.FriendlyName
END Product,
...
FROM
Products
INNER JOIN
Category
...
Edit: modified my example to be more concrete. Have to run now; will check back tomorrow. THX!!
Clarification: I can't separate the two: in the subquery's WHERE clause, I need to refer to columns from tables that are used in the main query's JOIN statement. If I separate them, then @Type will lose relevance.
Why not just separate it into two operations? What do you think you gain by trying to glom them into a single statement?
SELECT @Type = (subquery);
SELECT CASE WHEN @Type = 'Consumer'...
At the risk of sounding obtuse, do you really need the variable at all? Why not:
SELECT CASE WHEN col_from_subquery = 'Consumer' THEN ...
END Product
FROM (<subquery>) AS x;
With that form you'll need to decide whether you want to assign values to variables or retrieve results.
You can also pull multiple variables, e.g.
SELECT @Col1 = Col1, @Col2 = Col2
FROM (<subquery>) AS x;
-- then refer to those variables in your other query:
SELECT *, @Col1, @Col2 FROM dbo.Products WHERE Col1 = @Col2;
But this is all conjecture, because you haven't shared enough specifics.
EDIT okay, now that we have a real query and can understand a bit better what you're after, let's see if we can write you a new version. I'll assume that you were only trying to store the @Type variable so you could re-use it within the query, and that you weren't trying to store a value there to use later (after this query).
SELECT CASE
WHEN c.CustomerType = 'Company' THEN p.BusinessName
WHEN COALESCE(c.CustomerType, '') IN ('Consumer', '') THEN p.FriendlyName
END
--, other columns
FROM dbo.Products AS p
INNER JOIN dbo.Category AS cat
ON p.CatId = cat.CatId
INNER JOIN dbo.CustomerCategories AS ccat
ON ccat.CatId = cat.CatId
INNER JOIN dbo.Customers AS c
ON c.CustomerId = ccat.CustomerId
WHERE cat.CatId = @CatId;
Some notes:
I'm not sure why you thought subqueries are the right way to approach this. Usually it is much better (and clearer to other developers) to build proper joins and let SQL Server optimize the query overall instead of trying to be smart and optimize individual subqueries largely independent of the main query. A proper join will help to eliminate rows up front that would otherwise, through the subqueries, potentially be materialized - only to be discarded. Trust SQL Server to do its job, and in this case its job is to perform a join across multiple tables.
The join to dbo.Category might not be needed if the SELECT doesn't need to display the category name. If so, change the WHERE clause and remove that join (join to CustomerCategories instead).
The second case can be changed to a simple ELSE if you've covered all the possible scenarios.
I made an assumption about the join between Products and Category (why is Category not plural like the others?). If this isn't it please fill us in.
You cannot do that; you will get the following error:
A SELECT statement that assigns a value to a variable must not be combined with data-retrieval operations.
Separate the two, and then return the variable as part of the select statement.
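A sketch of that separation, reusing the names from the question (@CatId is assumed to be declared earlier, and the ELSE branch collapses the NULL/''/'Consumer' cases in the same way suggested above):

```sql
DECLARE @Type VARCHAR(20);

-- 1) assign the variable on its own
SELECT @Type = c.CustomerType
FROM dbo.Customers AS c
JOIN dbo.CustomerCategories AS cc ON cc.CustomerId = c.CustomerId
WHERE cc.CatId = @CatId;

-- 2) then use it in the data-retrieval statement
SELECT CASE WHEN @Type = 'Company' THEN p.BusinessName
            ELSE p.FriendlyName
       END AS Product
FROM dbo.Products AS p;
```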

Transact SQL parallel query execution

Suppose I have
INSERT INTO #tmp1 (ID) SELECT ID FROM Table1 WHERE Name = 'A'
INSERT INTO #tmp2 (ID) SELECT ID FROM Table2 WHERE Name = 'B'
SELECT ID FROM #tmp1 UNION ALL SELECT ID FROM #tmp3
I would like to run queries 1 & 2 in parallel, and then combine results after they are finished.
Is there a way to do this in pure T-SQL, or a way to check if it will do this automatically?
A background for those who want it: I'm investigating a complex search with multiple conditions that are later combined (term OR (term2 AND term3) OR term4 AND item5=term5), and thus I'm investigating whether it would be useful to execute those - largely unrelated - conditions in parallel, later combining the resulting tables (and calculating ranks, weights, and so on).
E.g., there should be several result sets:
SELECT COUNT(*) #tmp1 union #tmp3
SELECT ID from (#tmp1 union #tmp2) WHERE ...
SELECT * from TABLE3 where ID IN (SELECT ID FROM #tmp1 union #tmp2)
SELECT * from TABLE4 where ID IN (SELECT ID FROM #tmp1 union #tmp2)
You don't. SQL doesn't work like that: it isn't procedural. Such an approach leads to race conditions and data issues because of other connections.
Table variables are also scoped to the batch and connection, so you can't share results over two connections, in case you're wondering.
In any case, all you need is this, unless you gave us a bad example:
SELECT ID FROM Table1 WHERE Name = 'A'
UNION
SELECT ID FROM Table2 WHERE Name = 'B'
I suspect you're thinking of "run in parallel" because of this procedural thinking. What is your actual desired problem and goal?
Note: table variables do not allow parallel operations: Can queries that read table variables generate parallel exection plans in SQL Server 2008?
You don't decide what to parallelise - SQL Server's optimizer does. And the largest unit of work that the optimizer will work with is a single statement - so, you find a way to express your query as a single statement, and then rely on SQL Server to do its job, which it will usually do quite well.
If, having constructed your query, the performance isn't acceptable, then you can look at applying hints or forcing certain plans to be used. A lot of people break their queries into multiple statements, either believing that they can do a better job than SQL Server, or because it's how they "naturally" think of the task at hand. Both are "wrong" (for certain values of wrong), but if there's a natural breakdown, you may be able to replicate it using Common Table Expressions - these would allow you to name each sub-part of the problem, and then combine them together, all as part of a single statement.
E.g.:
;WITH TabA AS (
SELECT ID FROM Table1 WHERE Name = 'A'
), TabB AS (
SELECT ID FROM Table2 WHERE Name = 'B'
)
SELECT ID FROM TabA UNION ALL SELECT ID FROM TabB
And this will allow the server to decide how best to resolve this query (e.g. deciding whether to store intermediate results in "temp" tables)
Seeing in one of your other comments that you discuss having to "work with" the intermediate results - this can still be done with CTEs (if it's not just a case of failing to express the "final" result as a single query), e.g.:
;WITH TabA AS (
SELECT ID FROM Table1 WHERE Name = 'A'
), TabAWithCalcs AS (
SELECT ID,(ID*5+6) as ModID from TabA
)
SELECT * FROM TabAWithCalcs
Why not just:
SELECT ID FROM Table1 WHERE Name = 'A'
UNION ALL
SELECT ID FROM Table2 WHERE Name = 'B'
then if SQL Server wants to run the two selects in parallel, it will do so of its own volition.
Otherwise, we need more context for what you're trying to achieve, if this isn't practical.

Is there a way to optimize the query given below

I have the following query, and I need it to fetch data from SomeTable based on the filter criteria present in SomeOtherTable. If there is nothing present in SomeOtherTable, the query should return all the data present in SomeTable.
SQL Server 2005
SomeOtherTable does not have any indexes or any constraints; all fields are char(50).
The following query works fine for my requirements, but it causes performance problems when I have lots of parameters.
Due to a client requirement, we have to keep all the WHERE-clause data in SomeOtherTable. Depending on subid, the data will be joined with one of the columns in SomeTable.
For example, the query can be:
SELECT
*
FROM
SomeTable
WHERE
1=1
AND
(
SomeTable.ID in (SELECT DISTINCT ID FROM SomeOtherTable WHERE Name = 'ABC' and subid = 'EF')
OR
0=(SELECT Count(1) FROM SomeOtherTable WHERE spName = 'ABC' and subid = 'EF')
)
AND
(
SomeTable.date =(SELECT date FROM SomeOtherTable WHERE Name = 'ABC' and subid = 'Date')
OR
0=(SELECT Count(1) FROM SomeOtherTable WHERE spName = 'ABC' and subid = 'Date')
)
EDIT----------------------------------------------
I think I might have to explain my problem in detail:
We have developed an ASP.NET application that is used to invoke parameterized Crystal Reports; parameters are not passed to the reports using the default Crystal Reports method.
In the ASP.NET application we have created wizards which are used to pass the parameters to the reports. These parameters are not directly consumed by the Crystal Report, but by the query embedded inside the report or by the stored procedure used in it.
This is achieved using a table (SomeOtherTable) which holds parameter data as long as the report is running, after which the data is deleted; as such, we can assume that SomeOtherTable has at most 2 to 3 rows at any given point in time.
So if we look at the above query, the initial part can be taken as the report query, and the WHERE clause is used to get the user input from the SomeOtherTable table.
So I don't think it will be useful to create indexes etc. (maybe I am wrong).
"SomeOtherTable does not have any indexes or any constraint all fields are char(50)"
Well, there's your problem. There's nothing you can do to improve the performance of a query like this if you create the table like that.
You need a proper primary or other candidate key designated on all of your tables. That is to say, you need at least ONE unique index on the table. You can do this by designating one or more fields as the PK, or you can add a UNIQUE constraint or index.
You need to define your fields properly. Does the field store integers? Well then, an INT field may just be a better bet than a CHAR(50).
You can't "optimize" a query that is based on an unsound schema.
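A sketch of the kind of schema fixes this answer has in mind; the chosen key columns and types are assumptions, not taken from the question:

```sql
-- give ID a proper type (assuming it really holds integers, with no NULLs)
ALTER TABLE SomeOtherTable ALTER COLUMN ID INT NOT NULL;

-- designate a candidate key (assuming (Name, subid) identifies a row)
ALTER TABLE SomeOtherTable
  ADD CONSTRAINT UQ_SomeOtherTable UNIQUE (Name, subid);

-- support the lookups in the query's WHERE clauses
CREATE INDEX IX_SomeOtherTable_Name_subid
  ON SomeOtherTable (Name, subid) INCLUDE (ID);
```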
Try:
SELECT
*
FROM
SomeTable
LEFT JOIN SomeOtherTable ON SomeTable.ID=SomeOtherTable.ID AND Name = 'ABC'
WHERE
1=1
AND
(
SomeOtherTable.ID IS NOT NULL
OR
0=(SELECT Count(1) FROM SomeOtherTable WHERE spName = 'ABC')
)
Also, put WITH (NOLOCK) after each table name to improve performance.
The following might speed you up
SELECT *
FROM SomeTable
WHERE
SomeTable.ID in
(SELECT DISTINCT ID FROM SomeOtherTable Where Name = 'ABC')
UNION
SELECT *
FROM SomeTable
Where
NOT EXISTS (Select spName From SomeOtherTable Where spName = 'ABC')
The UNION effectively splits this into two simpler queries which can be optimised separately (it depends very much on the DBMS, table size, etc. whether this will actually improve performance - but it's always worth a try).
The EXISTS keyword is more efficient than the SELECT COUNT(1), as it returns true as soon as the first row is encountered.
Or check first whether the value exists in the db.
You can also remove the DISTINCT keyword in your query; it is useless here.
if EXISTS (Select spName From SomeOtherTable Where spName = 'ABC')
begin
SELECT *
FROM SomeTable
WHERE
SomeTable.ID in
(SELECT ID FROM SomeOtherTable Where Name = 'ABC')
end
else
begin
SELECT *
FROM SomeTable
end
Aloha
Try
select t.* from SomeTable t
left outer join SomeOtherTable o
on t.id = o.id
where (not exists (select id from SomeOtherTable where spName = 'ABC')
OR spName = 'ABC')
-Edoode
Change all your select statements in the WHERE part to inner joins.
The OR conditions should be UNION ALL-ed.
Also make sure your indexing is OK.
Sometimes it pays to have an intermediate table for temp results which you can join to.
It seems to me that there is no need for the "1=1 AND" in your query. 1=1 will always evaluate to true, leaving the engine to evaluate the next part... why not just skip the 1=1 and evaluate the juicy part?
I am going to stick to my original Query.
