SSMS RegEx replace with optional part - sql-server

I'm trying to move the columns onto a separate line from the "select", "group by" and "order by" keywords. How can I make the preceding spaces optional?
Find (requires preceding spaces):
^{[ ]+}{(SELECT|GROUP BY|ORDER BY)} {[#_a-z0-5]+}
Replace with: \1\2\n\1 \3
Original query (just an example with no logic):
SELECT myColumn
FROM (
SELECT myColumn
FROM foo
GROUP BY myColumn
ORDER BY myColumn
) as bar
GROUP BY myColumn
ORDER BY myColumn
Result (Failed for the main query):
SELECT myColumn
FROM (
SELECT
myColumn
FROM foo
GROUP BY
myColumn
ORDER BY
myColumn
) as bar
GROUP BY myColumn
ORDER BY myColumn
Expected result:
SELECT
myColumn
FROM (
SELECT
myColumn
FROM foo
GROUP BY
myColumn
ORDER BY
myColumn
) as bar
GROUP BY
myColumn
ORDER BY
myColumn

A couple of small changes to your regex gave the correct result for your example:
{^:b*}{(SELECT|GROUP BY|ORDER BY)} {.+}
:b matches space or tab. * matches zero or more occurrences (so keywords at the start of lines will be matched).
I didn't understand the purpose of the restriction on the column list names so replaced it with a generic .+, which seems more reliable.
This solution could probably be made more robust by not relying on a single space between the keyword and the column list:
{^:b*}{(SELECT|GROUP BY|ORDER BY)}:b+{.+}
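For readers who want to try the same transformation outside SSMS, here is a minimal sketch in Python. It assumes that Python's `re` syntax is an acceptable stand-in for the SSMS dialect: `{...}` tagged expressions become `(...)` groups and `:b` becomes `[ \t]`. The function name `split_keyword_lines` and the sample query are made up for illustration.

```python
import re

# Optional indent, keyword, one or more spaces/tabs, then the rest of the line.
PATTERN = re.compile(r"^([ \t]*)(SELECT|GROUP BY|ORDER BY)[ \t]+(.+)$", re.MULTILINE)


def split_keyword_lines(sql: str) -> str:
    """Move the column list after SELECT / GROUP BY / ORDER BY onto its own line."""
    # Same replacement as in the question: indent + keyword, newline, indent + space + columns.
    return PATTERN.sub(r"\1\2\n\1 \3", sql)


query = "SELECT myColumn\nFROM (\n    SELECT myColumn\n    FROM foo\n) as bar\nGROUP BY myColumn"
result = split_keyword_lines(query)
print(result)
```

Because the indent group uses `*`, keywords at the start of a line and indented keywords are both rewritten.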

I think you should use the * quantifier after [ ] to make the preceding spaces optional. The expression then becomes
^{[ ]*}{(SELECT|GROUP BY|ORDER BY)} {[#_a-z0-5]+}
and the replacement stays the same as the one you are already using:
\1\2\n\1 \3

Related

Is there anything like preprocessor directives in SQL Server, particularly like #DEFINE?

Whenever I have complicated fields in my SELECT my GROUP BY ends up looking like a trash fire since GROUP BY can't see my aliases defined in SELECT. But if there was something like #DEFINE then I could macro that problem away.
Is there anything like that?
Yes it is called SQLCMD Mode:
:setvar <var> <value>
:SETVAR DATABASENAME "adventureworks2014"
USE $(DATABASENAME);
:setvar col_list "col1,col2,col3"
SELECT $(col_list), COUNT(*) AS cnt
FROM tab
GROUP BY $(col_list);
EDIT:
In my opinion the proper way is to use CROSS APPLY:
SELECT s.col1, s.col2, COUNT(*) AS cnt
FROM t
CROSS APPLY (SELECT col1 = complex_expression, col2 = complex_expression2) AS s
-- the expression is defined once at the same level
GROUP BY s.col1, s.col2;
This isn't akin to a preprocessor directive, but one way to clean up the messy GROUP BY statements is to package the complicated SELECT statements into a subquery, and then perform the GROUP BY from an outer query. This helps to clean things up a bit.
For example:
SELECT
dataset.keyvalue,
SUM(dataset.amt) as total
FROM
(SELECT
field1 + field2 + field3 as keyvalue,
OrderQty as amt
FROM dbo.table1) as dataset
GROUP BY dataset.keyvalue
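The derived-table pattern can be tried end to end with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made only so the example is runnable), and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (field1 INTEGER, field2 INTEGER, field3 INTEGER, OrderQty INTEGER)"
)
# Invented sample data: the first two rows share the key 6, the third has key 3.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [(1, 2, 3, 10), (3, 2, 1, 5), (1, 1, 1, 7)],
)

# The complicated expression is written once, in the inner query;
# the outer query groups by its alias.
rows = conn.execute(
    """
    SELECT dataset.keyvalue, SUM(dataset.amt) AS total
    FROM (SELECT field1 + field2 + field3 AS keyvalue,
                 OrderQty AS amt
          FROM table1) AS dataset
    GROUP BY dataset.keyvalue
    ORDER BY dataset.keyvalue
    """
).fetchall()
print(rows)
```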

T-SQL pattern matching with exceptions

Here's a problem I've repeatedly encountered while playing with the Stack Exchange Data Explorer, which is based on T-SQL:
How to search for a string except when it occurs as a substring of some other string?
For example, how can I select all records in a table MyTable where the column MyCol contains the string foo, but ignoring any foos that are part of the string foobar?
A quick and dirty attempt would be something like:
SELECT *
FROM MyTable
WHERE MyCol LIKE '%foo%'
AND MyCol NOT LIKE '%foobar%'
but obviously this will fail to match e.g. MyCol = 'not all foos are foobars', which I do want to match.
One solution I've come up with is to replace all occurrences of foobar with some dummy marker (that is not a substring of foo) and then checking for any remaining foos, as in:
SELECT *
FROM MyTable
WHERE REPLACE(MyCol, 'foobar', 'X') LIKE '%foo%'
This works, but I suspect it's not very efficient, since it has to run the REPLACE() on every record in the table. (For SEDE, this would typically be the Posts table, which currently has about 30 million rows.) Are there any better ways to do this?
(FWIW, the real use case that prompted this question was searching for SO posts with image URLs that use the http:// scheme prefix but do not point to the host i.stack.imgur.com.)
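To make the intended matching rule concrete, here is a small sketch in Python rather than T-SQL. The first function mirrors the REPLACE trick from the question; the second expresses the same rule as a negative lookahead, which T-SQL's LIKE cannot do. Both function names are hypothetical, and unlike LIKE under a case-insensitive collation, these checks are case-sensitive.

```python
import re


def has_foo_via_replace(text: str) -> bool:
    # Same idea as the REPLACE query: swap out every 'foobar', then look for a leftover 'foo'.
    return "foo" in text.replace("foobar", "X")


# 'foo' that is not immediately followed by 'bar'.
FOO_NOT_FOOBAR = re.compile(r"foo(?!bar)")


def has_foo_via_regex(text: str) -> bool:
    return FOO_NOT_FOOBAR.search(text) is not None


samples = ["not all foos are foobars", "foobar", "plain foo", "nothing here", "fofoobaro"]
for sample in samples:
    print(sample, has_foo_via_replace(sample), has_foo_via_regex(sample))
```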
Neither of the ways given so far are guaranteed to work as advertised and only perform the REPLACE on a subset of rows.
SQL Server does not guarantee short circuiting of predicates and can move compute scalars up into the underlying query for derived tables and CTEs.
The only thing that is (mostly) guaranteed to work is the CASE statement. Below I use the syntactic sugar variety of IIF that expands out to CASE
SELECT *
FROM MyTable
WHERE 1 = IIF(MyCol LIKE '%foo%',
IIF(REPLACE(MyCol, 'foobar', 'X') LIKE '%foo%', 1, 0),
0);
A three-stage filter should work:
collect all rows matching '%foo%';
replace all instances of 'foobar' with a non-occurring string (such as '' perhaps);
Check again for matching '%foo%'
Here you only perform the REPLACE on potentially matching rows, not all rows. If you are expecting only a small percentage of matches, this should be much more efficient.
SQL would look like this:
;with data as (
select *
from MyTable
where MyCol like '%foo%'
)
select *
from data
where replace(MyCol, 'foobar', 'X') like '%foo%'
Note that a sub-query is required, as there are no expression short-cuts in SQL; the engine is free to reorder Boolean terms as desired for efficient processing within a single query level.
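The staged filter can be checked for correctness with a small sketch. It runs the same CTE through Python's sqlite3 module, which is only a stand-in for SQL Server; it says nothing about how SQL Server's optimizer orders the two predicates. The `id` column and the sample rows are assumptions added for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, MyCol TEXT)")
conn.executemany(
    "INSERT INTO MyTable (MyCol) VALUES (?)",
    [
        ("not all foos are foobars",),  # id 1: has a standalone foo
        ("only foobar here",),          # id 2: foo only inside foobar
        ("nothing relevant",),          # id 3: no foo at all
        ("plain foo",),                 # id 4: has a standalone foo
    ],
)

matching_ids = [
    row[0]
    for row in conn.execute(
        """
        WITH data AS (
            SELECT id, MyCol
            FROM MyTable
            WHERE MyCol LIKE '%foo%'          -- stage 1: cheap pre-filter
        )
        SELECT id
        FROM data
        WHERE REPLACE(MyCol, 'foobar', 'X') LIKE '%foo%'  -- stages 2 and 3
        ORDER BY id
        """
    )
]
print(matching_ids)
```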
This will be faster than your current query:
SELECT *
FROM MyTable
WHERE
MyCol like '%foo%' AND
REPLACE(MyCol, 'foobar', 'X') LIKE '%foo%'
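The idea is that the REPLACE only needs to be calculated for rows that have already passed the MyCol LIKE '%foo%' filter (although SQL Server does not guarantee that evaluation order), which should make this faster than just: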
REPLACE(MyCol, 'foobar', 'X') LIKE '%foo%'
Assuming you're only interested in finding instances of foo with spaces surrounding them
SELECT *
FROM MyTable
WHERE MyCol LIKE 'foo %' OR MyCol LIKE '% foo %' OR MyCol LIKE '% foo'

LIKE and Equals not working as expected

I am currently trying to find all users in a database where there is a left bracket [ in the user name however when performing a query against the user table using the LIKE operator no rows are returned. If I use the equals = operator then rows are returned.
The issue doesn't appear when using the right bracket ] or other special characters. I want to use the LIKE keyword because I eventually want to use the wildcard functionality to find all users in the table that have the character in their name.
I have isolated the problem out to an example below. FYI Collation is Latin1_General_BIN
-- [ Not Working
DECLARE @Temp TABLE (UserID VARCHAR(10))
INSERT INTO @Temp SELECT 'TEST[DEPT'
SELECT * FROM @Temp WHERE UserID = 'TEST[DEPT' --Returns 1 row
SELECT * FROM @Temp WHERE UserID LIKE 'TEST[DEPT' -- Returns 0 rows
-- ] Working
DECLARE @Temp2 TABLE (UserID VARCHAR(10))
INSERT INTO @Temp2 SELECT 'TEST]DEPT'
SELECT * FROM @Temp2 WHERE UserID = 'TEST]DEPT' --Returns 1 row
SELECT * FROM @Temp2 WHERE UserID LIKE 'TEST]DEPT' -- Returns 1 row
Citing the documentation for LIKE:
Using Wildcard Characters As Literals
You can use the wildcard pattern matching characters as literal characters. To use a wildcard character as a literal character, enclose the wildcard character in brackets.
So do this: SELECT * FROM @Temp WHERE UserID LIKE '%[[]%' if you want to match all rows with a left bracket somewhere in the UserID.
This is because SQL Server's LIKE doesn't comply with the SQL standard and uses an extended "pattern matching" that is a limited subset of a regular expression matching. And because of that the [ has a special meaning in SQL Server (unlike standard SQL) so the [ needs to be escaped.
So you need to use this:
WHERE UserID LIKE 'TEST\[DEPT' ESCAPE '\'
The escape character can be any character you want and it should be one that doesn't occur in the actual value.
SELECT * FROM @Temp WHERE UserID LIKE 'TEST[[]DEPT'
The query 'delimits' the special character with square brackets, telling the engine to treat it as a normal literal character instead of a character with special meaning.
Try something like this.
SELECT * FROM @Temp WHERE UserID LIKE '%TEST[[]DEPT%'
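When the search text comes from user input, the bracket trick can be applied mechanically before the value is placed in a LIKE pattern. The following is a minimal client-side sketch in Python; the helper name `escape_like_literal` is hypothetical, and it assumes SQL Server's LIKE wildcards `[`, `%` and `_` are the only characters that need wrapping (a closing `]` on its own is not special, as the question observed).

```python
import re


def escape_like_literal(value: str) -> str:
    """Wrap each LIKE wildcard character ([, % or _) in brackets so it matches literally."""
    # A single pass avoids re-escaping the brackets that the replacement itself adds.
    return re.sub(r"([\[%_])", r"[\1]", value)


pattern = "%" + escape_like_literal("TEST[DEPT") + "%"
print(pattern)
```

The resulting pattern would then be passed to the query as a parameter rather than concatenated into the SQL text.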

Sorting a query, how does it work?

Can someone explain to me why this is possible with SQL Server :
select column1 c,column2 d
from table1
order by c,column3
I can sort by column1 using the alias because the order by clause is applied after the select clause, but how is it possible to sort by a column that I'm not retrieving?
Thanks in advance.
All column names from the objects in the FROM clause are available to ORDER BY, except in the case of GROUPing or DISTINCT. As you've indicated the alias is also available, because the SELECT statement is processed before the ORDER BY.
This is one of those cases where you trust the optimizer.
According to Books Online (http://technet.microsoft.com/en-us/library/ms188385(v=sql.90).aspx)
The ORDER BY clause can include items that do not appear in the
select list. However, if SELECT DISTINCT is specified, or if the
statement contains a GROUP BY clause, or if the SELECT statement
contains a UNION operator, the sort columns must appear in the select
list.
Additionally, when the SELECT statement includes a UNION operator, the
column names or column aliases must be those specified in the first
select list.
You can sort by an alias that you define in the select list (select column1 c), and you can also sort by a column that is not included in the select list, as long as it exists in the table. This also allows sorting by an expression without having to include it in the select list:
Select cost, tax From table ORDER BY (cost*tax)
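The behaviour can be observed with a small sketch. It uses Python's sqlite3 module as a stand-in for SQL Server (an assumption for runnability; both accept an alias and a non-selected column in ORDER BY for a plain SELECT), and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (column1 INTEGER, column2 TEXT, column3 INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, "a", 2), (1, "b", 1), (0, "c", 9)],
)

# column3 is used only for sorting; it never appears in the result set.
rows = conn.execute(
    "SELECT column1 c, column2 d FROM table1 ORDER BY c, column3"
).fetchall()
print(rows)
```

The two rows with c = 1 come back in column3 order ("b" before "a") even though column3 itself is not returned.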

Using the SQL LIKE operator with %%

I had a requirement to create a query in SQL Server where the search condition would include/exclude a table based on user input.
Say I have two tables, TABLE_A and TABLE_B with columns KEYCOLUMN_A and COLUMN_A in TABLE_A and columns FKCOLUMN_B and COLUMN_B in TABLE_B.
And a query like:
SELECT TABLE_A.* FROM TABLE_A, TABLE_B WHERE TABLE_A.KEYCOLUMN_A = TABLE_B.FKCOLUMN_B
AND TABLE_A.COLUMN_A LIKE '%SEARCH%' AND TABLE_B.COLUMN_B LIKE '%SEARCH2%'
Now if the user does not input SEARCH2, I don't need to search TABLE_B. But this would mean an IF ELSE clause. And as the number of "optional" tables in the query increases, the permutations and combinations would also increase and there will be many IF and ELSE statements.
Instead I decided to keep the statement as it is. So if SEARCH2 is empty, the query will effectively become:
SELECT * FROM TABLE_A, TABLE_B WHERE TABLE_A.KEYCOLUMN_A = TABLE_B.FKCOLUMN_B
AND TABLE_A.COLUMN_A LIKE '%SEARCH%' AND TABLE_B.COLUMN_B LIKE '% %'
Can the SQL optimizer recognize that LIKE %% is as good as removing the condition itself?
Wrap an OR around your "B" table, such as:
AND (len(searchString)=0 OR table_b.column_b LIKE "%searchString%" )
This way, if no value is supplied for the string, its length is zero, so the first part of the OR is always true and the LIKE condition in the other half no longer affects the result.
You could apply the same for as many linked tables as you need.
First thing, you have a space in your example:
AND TABLE_B.COLUMN_B LIKE '% %'
That will never be optimized as it is indeed a significant condition.
Now, I think that whether it is optimized away depends on the database engine and how smart it is.
For example, SQL Server 2005 does offer the same execution plan for the two types of queries, while MySQL 5.0.38 does not.
LIKE is used with the WHERE clause to search, update, and delete records using wildcards.
Example:
To search for all records whose employee name starts with the character 'a':
select * from Employee where Name like 'a%'
To set the name to 'amit' for all records whose employee name starts with the character 'a':
update Employee set Name='amit' where Name like 'a%'
To delete all records whose employee name starts with the character 'a':
delete from Employee where Name like 'a%'
The LIKE operator is used in a WHERE clause to search for a specified pattern in a column.
LIKE '%[p-s]' -- matches values ending with p, q, r, or s
LIKE '[0-9]' -- matches a value that is a single digit
LIKE '%table%' -- matches values containing the text "table"
LIKE '%[^p-r]' -- matches values that do not end with a character in the range p to r
Example:
SELECT T1.BrandName,T1.BrandID,T2.CategoryName,T2.Color FROM TABLE1 T1
LEFT JOIN TABLE2 T2 on T1.ID = T2.BrandID
WHERE T1.BrandName LIKE '%Samsung%'
Example:
SELECT T1.BrandName,T1.BrandID,T2.CategoryName,T2.Color FROM TABLE1 T1
LEFT JOIN TABLE2 T2 on T1.ID = T2.BrandID
WHERE T1.BrandName LIKE '%[a-j]'
In PostgreSQL you can also use ILIKE, which is case-insensitive.
You can rewrite your query like this:
SELECT TABLE_A.* FROM TABLE_A, TABLE_B WHERE TABLE_A.KEYCOLUMN_A = TABLE_B.FKCOLUMN_B
AND (@paramA='' or TABLE_A.COLUMN_A LIKE '%' + @paramA + '%')
AND (@paramB='' or TABLE_B.COLUMN_B LIKE '%' + @paramB + '%')
This way, if @paramA or @paramB is '', the LIKE condition inside the same parentheses no longer restricts the result.
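Here is a minimal sketch of this optional-filter pattern that can be run as-is. It uses Python's sqlite3 module as a stand-in for SQL Server, so the parameters are written `:param_a` / `:param_b` and string concatenation uses `||` instead of `+`; the sample rows and the `search` helper are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE TABLE_A (KEYCOLUMN_A INTEGER PRIMARY KEY, COLUMN_A TEXT);
    CREATE TABLE TABLE_B (FKCOLUMN_B INTEGER, COLUMN_B TEXT);
    INSERT INTO TABLE_A VALUES (1, 'red apple'), (2, 'green apple'), (3, 'pear');
    INSERT INTO TABLE_B VALUES (1, 'crate'), (2, 'box'), (3, 'crate');
    """
)

QUERY = """
    SELECT a.KEYCOLUMN_A
    FROM TABLE_A AS a
    JOIN TABLE_B AS b ON a.KEYCOLUMN_A = b.FKCOLUMN_B
    WHERE (:param_a = '' OR a.COLUMN_A LIKE '%' || :param_a || '%')
      AND (:param_b = '' OR b.COLUMN_B LIKE '%' || :param_b || '%')
    ORDER BY a.KEYCOLUMN_A
"""


def search(param_a: str, param_b: str) -> list:
    """Return matching TABLE_A keys; an empty string disables that filter."""
    rows = conn.execute(QUERY, {"param_a": param_a, "param_b": param_b})
    return [row[0] for row in rows]


print(search("apple", ""))       # only the TABLE_A filter is active
print(search("apple", "crate"))  # both filters are active
```

Note that the inner join still requires a matching TABLE_B row even when its filter is disabled; the pattern only switches off the LIKE condition, not the join.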
Use UNION and proper JOINs.
The %foo% search term is bad enough (can't use index) without adding OR and LEN to the mix too.
SELECT
TABLE_A.*
FROM
TABLE_A
JOIN
TABLE_B On TABLE_A.KEYCOLUMN_A = TABLE_B.FKCOLUMN_B
WHERE
TABLE_A.COLUMN_A LIKE '%SEARCH%' AND TABLE_B.COLUMN_B LIKE '%SEARCH2%'
UNION
SELECT
TABLE_A.*
FROM
TABLE_A
WHERE
TABLE_A.COLUMN_A LIKE '%SEARCH%'

Resources