SQL Server: Running CONTAINS against a substring of a column

I have a table table1 with an nvarchar column column1 that looks something like this:
phrase 1.1;phrase 1.2;phrase 1.3 ...
phrase 2.1;phrase 2.2;phrase 2.3 ...
...
I would like to run a CONTAINS query on only the first phrase in the column. I've tried several variations of this:
SELECT * FROM table1
WHERE CONTAINS(LEFT(table1.column1, CHARINDEX(';', table1.column1) - 1), <search query>)
Is this possible? Ideally, I'd like to do it without creating a new table or column.
Edit -- Some of the errors I'm getting:
Incorrect syntax near the keyword 'LEFT'.
An expression of non-boolean type specified in a context where a condition is expected.
Incorrect syntax near ";". Expecting '(', or SELECT.

CONTAINS is a full-text predicate, so it only works on a column that is covered by a full-text index. I'm going to guess that this is not the case here. (Apologies if it is; I have no experience with full-text indexing.)
In any case, as you are matching the first characters in the string, the more precise function LEFT should work fine:
SELECT *
FROM table1
WHERE LEFT(table1.column1, CHARINDEX(';', table1.column1) - 1) = @SearchQuery
Note that this fails if a row contains no semicolon: CHARINDEX returns 0, so LEFT is called with a length of -1 and raises an error. One way to avoid that is to append a semicolon to the string being searched, so that one is always found:
SELECT *
FROM table1
WHERE LEFT(table1.column1, CHARINDEX(';', table1.column1 + ';') - 1) = @SearchQuery
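If the goal is a substring match inside the first phrase rather than strict equality, the same guarded LEFT expression can be combined with LIKE. This is only a minimal sketch, assuming no full-text index is available; the table variable @table1, the sample rows, and the @SearchTerm value are made up for illustration:

```sql
-- Hypothetical sample data; the second row has no semicolon at all.
DECLARE @table1 TABLE (column1 NVARCHAR(100));
INSERT INTO @table1 (column1) VALUES
    (N'red apple;green pear;blue plum'),
    (N'green apple'),
    (N'yellow pear;red apple');

DECLARE @SearchTerm NVARCHAR(50) = N'apple';  -- assumed search term

SELECT t.column1, f.FirstPhrase
FROM @table1 AS t
-- Compute the first phrase once; appending ';' guards rows without a semicolon.
CROSS APPLY (SELECT LEFT(t.column1, CHARINDEX(';', t.column1 + ';') - 1) AS FirstPhrase) AS f
WHERE f.FirstPhrase LIKE N'%' + @SearchTerm + N'%';
```

With this sample data, the first two rows match and the third does not, because 'apple' only appears in its second phrase. Unlike CONTAINS, LIKE does plain pattern matching, so any %, _ or [ characters in the search term act as wildcards, and the predicate cannot use an index.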

There are a couple of problems here. You can only use CONTAINS on a full-text indexed column. If your table is configured that way, then great!
Your second issue is that the CONTAINS syntax is a bit clunky, and it doesn't like complexity. You could work around that using a common-table expression, e.g.:
DECLARE @table TABLE (column1 NVARCHAR(100));
INSERT INTO @table SELECT 'phrase 1.1;phrase 1.2;phrase 1.3;';
INSERT INTO @table SELECT 'phrase 2.1;phrase 2.2;phrase 2.3;';
SELECT LEFT(column1, CHARINDEX(';', column1) - 1) FROM @table;
WITH x AS (SELECT LEFT(column1, CHARINDEX(';', column1) - 1) AS search FROM @table)
SELECT * FROM x WHERE CONTAINS(x.search, 'phrase 1.2');
Note that this won't run as written, because @table.column1 isn't full-text indexed. But it gets around the syntax error, and could be adapted for your case. Something like this:
WITH x AS (SELECT LEFT(column1, CHARINDEX(';', column1) - 1) AS search FROM table1)
SELECT * FROM x
WHERE CONTAINS(search, <search query>)

Related

Split string to array using delimiter, getting second to last element in SELECT Statement

Heads!
In my database, I have a column that contains the following data (examples):
H-01-01-02-01
BLE-01-03-01
H-02-05-1.1-03
The task is to get the second to last element of the array if you would split that using the "-" character. The strings are of different length.
So this would be the result using the above mentioned data:
02
03
1.1
Basically I'm searching for an equivalent of the following ruby-statement for use in a Select-Statement in SQL-Server:
"BLE-01-03-01".split("-")[-2]
Is this possible in any way in SQL Server? After spending some time searching for a solution, I only found ones that work for the last or first element.
Thanks very much for any clues or solutions!
PS: Version of SQL Server is Microsoft SQL Server 2012

As an alternative, you can try this:
--A mockup table with some test data to simulate your issue
DECLARE @mockupTable TABLE (ID INT IDENTITY, YourColumn VARCHAR(50));
INSERT INTO @mockupTable VALUES
('H-01-01-02-01')
,('BLE-01-03-01')
,('H-02-05-1.1-03');
--The query
SELECT CastedToXml.value('/x[sql:column("CountOfFragments")-1][1]','nvarchar(10)') AS TheWantedFragment
FROM @mockupTable t
CROSS APPLY(SELECT CAST('<x>' + REPLACE(t.YourColumn,'-','</x><x>') + '</x>' AS XML))A(CastedToXml)
CROSS APPLY(SELECT CastedToXml.value('count(/x)','int')) B(CountOfFragments);
The idea in short:
The first APPLY will transform the string to an XML like this
<x>H</x>
<x>01</x>
<x>01</x>
<x>02</x>
<x>01</x>
The second APPLY will xquery into this XML to get the count of fragments. As APPLY will add this as a column to the result set, we can use the value using sql:column() to get the wanted fragment by its position.
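The two steps can also be traced on a single value, which makes the intermediate results visible. A small sketch using variables instead of the mockup table; the names @s, @x and @n are introduced here only for illustration:

```sql
-- Assumed sample value; it must not contain XML special characters (&, <, >).
DECLARE @s VARCHAR(50) = 'BLE-01-03-01';

-- Step 1: turn the delimited string into a sequence of <x> elements.
DECLARE @x XML = CAST('<x>' + REPLACE(@s, '-', '</x><x>') + '</x>' AS XML);

-- Step 2: count the fragments.
DECLARE @n INT = @x.value('count(/x)', 'int');

-- Step 3: pick the fragment at position (count - 1).
SELECT @x AS Fragments,
       @n AS CountOfFragments,
       @x.value('(/x[sql:variable("@n") - 1])[1]', 'varchar(10)') AS TheWantedFragment;
```

For this input there are four fragments, so the third one, 03, is picked. sql:variable() plays the same role here that sql:column() plays in the table-based query.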

As I wrote in my comment - using charindex with reverse.
First, create and populate sample table (Please save us this step in your future questions):
DECLARE @T AS TABLE
(
Col Varchar(100)
);
INSERT INTO @T (Col) VALUES
('H-01-01-02-01'),
('BLE-01-03-01'),
('H-02-05-1.1-03');
The query:
SELECT Col,
LEFT(RIGHT(Col, AlmostLastDelimiter-1), AlmostLastDelimiter - LastDelimiter - 1) As SecondToLast
FROM @T
CROSS APPLY (SELECT CharIndex('-', Reverse(Col)) As LastDelimiter) As A
CROSS APPLY (SELECT CharIndex('-', Reverse(Col), LastDelimiter+1) As AlmostLastDelimiter) As B
Results:
Col SecondToLast
H-01-01-02-01 02
BLE-01-03-01 03
H-02-05-1.1-03 1.1
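The CHARINDEX/REVERSE query raises an error for values with fewer than two hyphens, because the second CHARINDEX then returns 0 and RIGHT receives a negative length. One possible variation, offered as a sketch rather than as part of the original answer: prepend one extra delimiter so that two-part values still work, and use NULLIF so that values without any delimiter yield NULL. The rows 'A-B' and 'NoDelimiter' and the alias P are made up for illustration:

```sql
DECLARE @T AS TABLE (Col VARCHAR(100));
INSERT INTO @T (Col) VALUES
    ('H-01-01-02-01'),
    ('A-B'),          -- only one delimiter
    ('NoDelimiter');  -- no delimiter at all

SELECT Col,
       -- NULLIF turns a "not found" position (0) into NULL, so RIGHT and LEFT
       -- return NULL instead of failing on a negative length.
       LEFT(RIGHT(P.Padded, NULLIF(B.AlmostLastDelimiter, 0) - 1),
            NULLIF(B.AlmostLastDelimiter, 0) - A.LastDelimiter - 1) AS SecondToLast
FROM @T
CROSS APPLY (SELECT '-' + Col AS Padded) AS P  -- extra leading delimiter
CROSS APPLY (SELECT CHARINDEX('-', REVERSE(P.Padded)) AS LastDelimiter) AS A
CROSS APPLY (SELECT CHARINDEX('-', REVERSE(P.Padded), A.LastDelimiter + 1) AS AlmostLastDelimiter) AS B;
```

With these rows the result is 02 for the first value, A for 'A-B', and NULL for 'NoDelimiter'. Whether NULL is the right answer for a value with no delimiter depends on the data, so treat that choice as an assumption.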

Similar to Zohar's solution, but using CTEs instead of CROSS APPLY to prevent redundancy. I personally find this easier to follow, as you can see what happens in each step. Doesn't make it a better solution though ;)
DECLARE @strings TABLE (data VARCHAR(50));
INSERT INTO @strings VALUES ('H-01-01-02-01') , ('BLE-01-03-01'), ('H-02-05-1.1-03');
WITH rev AS (
SELECT
data,
REVERSE(data) AS reversed
FROM
@strings),
first_hyphen AS (
SELECT
data,
reversed,
CHARINDEX('-', reversed) + 1 AS first_pos
FROM
rev),
second_hyphen AS (
SELECT
data,
reversed,
first_pos,
CHARINDEX('-', reversed, first_pos) AS second_pos
FROM
first_hyphen)
SELECT
data,
REVERSE(SUBSTRING(reversed, first_pos, second_pos - first_pos)) AS result
FROM
second_hyphen;
Results:
data result
H-01-01-02-01 02
BLE-01-03-01 03
H-02-05-1.1-03 1.1

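Try this. The input is reversed before splitting, so the second fragment of the reversed string is the second-to-last fragment of the original; the result is reversed again to restore its character order:
DECLARE @input NVARCHAR(100);
DECLARE @dlmt NVARCHAR(3);
DECLARE @pos INT = 2;
SET @input=REVERSE(N'H-02-05-1.1-03');
SET @dlmt=N'-';
SELECT REVERSE(
CAST(N'<x>'
+ REPLACE(
(SELECT REPLACE(@input,@dlmt,'#DLMT#') AS [*] FOR XML PATH(''))
,N'#DLMT#',N'</x><x>'
) + N'</x>' AS XML).value('/x[sql:variable("@pos")][1]','nvarchar(max)'));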

For xml path returns null instead of nothing

I thought the following query was supposed to return nothing, but instead it returns one record with a column containing NULL:
select *
from ( select 1 as "data"
where 0 = 1
for xml path('row') ) as fxpr(xmlcol)
If you run just the subquery - nothing is returned, but when this subquery has an outer query, performing a select on it, null is returned.
Why is that happening?

SQL Server will try to predict the type. Look at this
SELECT tbl.[IsThereAType?] + '_test'
,tbl.ThisIsINT + 100
FROM
(
SELECT NULL AS [IsThereAType?]
,3 AS ThisIsINT
UNION ALL
SELECT 'abc'
,NULL
--UNION ALL
--SELECT 1
-- ,NULL
) AS tbl;
The first column will be predicted as string type, while the second is taken as INT. That's why the + operator on top works. Try to add a number to the first or a string to the second. This will fail.
Try to uncomment the last block and it will fail too.
The prediction is done at a very early stage. Look at this, where I did include the third UNION ALL (invalid query, breaking the type):
EXEC sp_describe_first_result_set
N'SELECT *
FROM
(
SELECT NULL AS [IsThereAType?]
,3 AS ThisIsINT
UNION ALL
SELECT ''abc''
,NULL
UNION ALL
SELECT 1
,NULL
) AS tbl';
The result returns "IsThereAType?" as INT! (I'm pretty sure this is rather random and might be different on your system.)
Btw: Without this last block the type is VARCHAR(3)...
Now to your question
A naked XML is taken as NTEXT (although this is deprecated!) and needs ,TYPE to be predicted as XML:
EXEC sp_describe_first_result_set N'SELECT ''blah'' FOR XML PATH(''blub'')';
EXEC sp_describe_first_result_set N'SELECT ''blah'' FOR XML PATH(''blub''),TYPE';
The same wrapped within a sub-select is returned as NVARCHAR(MAX) or XML, respectively:
EXEC sp_describe_first_result_set N'SELECT * FROM(SELECT ''blah'' FOR XML PATH(''blub'')) AS x(y)';
EXEC sp_describe_first_result_set N'SELECT * FROM(SELECT ''blah'' FOR XML PATH(''blub''),TYPE) AS x(y)';
Well, this is a bit weird actually...
An XML is a scalar value taken as NTEXT, NVARCHAR(MAX) or XML (depending on the way you are calling it). But it is not allowed to place a naked scalar in a sub-select:
SELECT * FROM('blah') AS x(y) --fails
While this is okay
SELECT * FROM(SELECT 'blah') AS x(y)
Conclusion:
The query parser seems to be slightly inconsistent in your special case:
Although a sub-select cannot consist of one scalar value only, the SELECT ... FOR XML (which actually returns a scalar) is not rejected. The engine seems to interpret this as a SELECT returning a scalar value. And this is perfectly okay.
This is useful with nested sub-selects as a column (correlated sub-queries) to nest XML:
SELECT TOP 5 t.TABLE_NAME
,(
SELECT COLUMN_NAME,DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS AS c
WHERE c.TABLE_SCHEMA=t.TABLE_SCHEMA
AND c.TABLE_NAME=t.TABLE_NAME
FOR XML PATH('Column'),ROOT('Columns'),TYPE
) AS AllTablesColumns
FROM INFORMATION_SCHEMA.TABLES AS t;
Without the FOR XML clause this would fail (...more than one value... / ...Only one column...)
Pass a generic SELECT as a parameter?
Some would say this is not possible, but you can try this:
CREATE FUNCTION dbo.TestType(@x XML)
RETURNS TABLE
AS
RETURN
SELECT @x AS BringMeBack;
GO
--The SELECT must be wrapped in parentheses!
SELECT *
FROM dbo.TestType((SELECT TOP 5 * FROM sys.objects FOR XML PATH('x'),ROOT('y')));
GO
DROP FUNCTION dbo.TestType;

Empty XML Data is treated as NULL in SQL Server.
select *
from ( select 1 as "data"
where 0 = 1
for xml path('row') ) as fxpr(xmlcol)
The subquery is executed first, and its result (an empty rowset) is then converted to XML, which yields a NULL value.
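If the outer query should return no rows in that situation, one straightforward option is to filter out the NULL in the outer query. A minimal sketch based on the query from the question:

```sql
-- Same derived table as in the question, plus a filter on the XML column.
SELECT fxpr.xmlcol
FROM (SELECT 1 AS [data]
      WHERE 0 = 1
      FOR XML PATH('row')) AS fxpr(xmlcol)
WHERE fxpr.xmlcol IS NOT NULL;  -- drops the single NULL row
```

When the inner SELECT does produce rows, the filter has no effect and the single XML value is returned as before.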

MS SQL Server - Use Calculated Field of SELECT statement

I would like to ask whether there is a way to reuse a calculated field within the same SELECT statement:
For example:
Table Test:
Machine Amount Value
500 20 20
SELECT Machine,
Amount*Value AS TestFormula,
TestFormula*12 AS TestFormulaYear
FROM Test
What is the correct statement to reuse this calculated field?
Thanks in advance,
Kevin

In sql server at least, you can do it with a subquery:
SELECT Machine
, TestFormula
, TestFormula*12 AS TestFormulaYear
FROM (
SELECT Machine
, Amount*Value AS TestFormula
FROM Test
) T

For the simple example you showed us, I would just recommend repeating the expression
SELECT
Machine,
Amount*Value AS TestFormula,
Amount*Value*12 AS TestFormulaYear
FROM Test;
Other answers have already shown how you can use a subquery to truly reuse the column, but that is not very performant compared to what I wrote above.

You can use a common-table expression (CTE) to reuse the value:
WITH formula AS (
SELECT Machine,
Amount*Value AS TestFormula
FROM Test
)
SELECT Machine,
TestFormula,
TestFormula*12 AS TestFormulaYear
FROM formula;
If the batch with the CTE contains multiple statements, the preceding statement must be terminated with a semicolon.

Assuming this is T-SQL:
You can't reference the alias of a column in the SELECT statement, no. If you look at SELECT (Transact-SQL) you'll note that the SELECT is the 8th part of the query to be processed. This means only ORDER BY is going to be able to reference a column's alias.
If you need to do further calculations on a calculated value you need to use a CTE, subquery, or redeclare the calculation. For example:
Repeated calculation:
SELECT [Column] * 10 As Expression,
[Column] * 10 * 5 AS Expression2
FROM [Table];
CTE:
WITH Formula AS(
SELECT [Column] * 10 As Expression
FROM [Table])
SELECT Expression,
Expression * 5 AS Expression2
FROM Formula;
Sub Query:
SELECT Expression,
Expression * 5 AS Expression2
FROM (SELECT [Column] * 10 As Expression
FROM [Table]) Formula;
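Another option sometimes used in T-SQL, not covered by the three variants above, is CROSS APPLY with a VALUES row constructor, which names the intermediate result once so the SELECT list can reuse it. A minimal sketch, assuming SQL Server 2008 or later; the table variable @Test stands in for the Test table from the question:

```sql
-- Stand-in for the Test table from the question.
DECLARE @Test TABLE (Machine INT, Amount INT, Value INT);
INSERT INTO @Test (Machine, Amount, Value) VALUES (500, 20, 20);

SELECT t.Machine,
       c.TestFormula,
       c.TestFormula * 12 AS TestFormulaYear
FROM @Test AS t
-- The calculation is written once and exposed as column TestFormula.
CROSS APPLY (VALUES (t.Amount * t.Value)) AS c(TestFormula);
```

For the sample row this returns 400 and 4800. Several APPLY steps can be chained when one calculated value feeds into the next.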

If you are looking to set up a Statement so that when formulas are changed many columns will be updated, I suppose you could declare the formulas and use Dynamic SQL. There can be an advantage to this if you want to be sure that lots of columns are updated correctly:
Declare @TestFormula as nvarchar(100) = '([Amount]*[Value])'
Declare @TestFormulaYear as nvarchar(100) = '(12*' + @TestFormula + ')'
declare @sql as nvarchar(max)
set @sql = 'SELECT [Machine], ' + @TestFormula + ' AS TestFormula, ' + @TestFormulaYear + ' AS TestFormulaYear
FROM (values(500, 20, 20)) a([Machine], [Amount], [Value])'
exec(@sql)

SQL Server Regular expression extract pattern from DB column

I have a question about SQL Server: I have a database column with a pattern which is like this:
up to 10 digits
then a comma
up to 10 digits
then a semicolon
e.g.
100000161, 100000031; 100000243, 100000021;
100000161, 100000031; 100000243, 100000021;
and I want to extract within the pattern the first digits (up to 10) (1.) and then a semicolon (4.)
(or, in other words, remove everything from the semicolon to the next semicolon)
100000161; 100000243; 100000161; 100000243;
Can you please advise me how to do this in SQL Server? I'm not very familiar with regex and therefore have no clue how to approach this.
Thanks,
Alex

Try this
Declare @Sql Table (SqlCol nvarchar(max))
INSERT INTO @Sql
SELECT '100000161,100000031;100000243,100000021;100000161,100000031;100000243,100000021;'
;WITH cte
AS (SELECT Row_number()
OVER(
ORDER BY (SELECT NULL)) AS Rno,
split.a.value('.', 'VARCHAR(1000)') AS Data
FROM (SELECT Cast('<S>'
+ Replace( Replace(sqlcol, ';', ','), ',',
'</S><S>')
+ '</S>'AS XML) AS Data
FROM @Sql) AS A
CROSS apply data.nodes('/S') AS Split(a))
SELECT Stuff((SELECT '; ' + data
FROM cte
WHERE rno%2 <> 0
AND data <> ''
FOR xml path ('')), 1, 2, '') AS ExpectedData
ExpectedData
-------------
100000161; 100000243; 100000161; 100000243

I believe this will get you what you are after as long as that pattern truly holds. If not it's fairly easy to ensure it does conform to that pattern and then apply this
Select Substring(TargetCol, 1, 10) + ';' From TargetTable

You can take advantage of SQL Server's XML support to convert the input string into an XML value and query it with XQuery and XPath expressions.
For example, the following query replaces each ; with </b><a> and each , with </a><b>, which turns each string into <a>100000161</a><b>100000031</b><a>100000243</a><b>100000021</b><a></a>. The .query('a') call then keeps only the <a> elements, giving <a>100000161</a><a>100000243</a><a />. After that, you can select individual <a> nodes with /a[1], /a[2]:
declare @table table (it nvarchar(200))
insert into @table values
('100000161, 100000031; 100000243, 100000021;'),
('100000161, 100000031; 100000243, 100000021;')
select
xCol.value('/a[1]','nvarchar(200)'),
xCol.value('/a[2]','nvarchar(200)')
from (
select convert(xml, '<a>'
+ replace(replace(replace(it,';','</b><a>'),',','</a><b>'),' ','')
+ '</a>')
.query('a') as xCol
from @table) as tmp
-------------------------
A1 A2
100000161 100000243
100000161 100000243
value extracts a single value from an XML field. nodes returns a table of nodes that match the XPath expression. The following query will return all "keys" :
select
a.value('.','nvarchar(200)')
from (
select convert(xml, '<a>'
+ replace(replace(replace(it,';','</b><a>'),',','</a><b>'),' ','')
+ '</a>')
.query('a') as xCol
from @table) as tmp
cross apply xCol.nodes('a') as y(a)
where a.value('.','nvarchar(200)')<>''
------------
100000161
100000243
100000161
100000243
With 200K rows of data though, I'd seriously consider transforming the data when loading it and storing it in individual, indexable columns, or adding a separate, related table. Applying string manipulation functions on a column means that the server can't use any covering indexes to speed up queries.
If that's not possible (why?) I'd consider at least adding a separate XML-typed column that would contain the same data in XML form, to allow the creation of an XML index.

How do I extract part of a string in t-sql

If I have the following nvarchar variable - BTA200, how can I extract just the BTA from it?
Also, if I have varying lengths such as BTA50, BTA030, how can I extract just the numeric part?

I would recommend a combination of PatIndex and Left. Carefully constructed, you can write a query that always works, no matter what your data looks like.
Ex:
Declare @Temp Table(Data VarChar(20))
Insert Into @Temp Values('BTA200')
Insert Into @Temp Values('BTA50')
Insert Into @Temp Values('BTA030')
Insert Into @Temp Values('BTA')
Insert Into @Temp Values('123')
Insert Into @Temp Values('X999')
Select Data, Left(Data, PatIndex('%[0-9]%', Data + '1') - 1)
From @Temp
PatIndex will look for the first character that falls in the range of 0-9, and return its character position, which you can use with the LEFT function to extract the correct data. Note that PatIndex is actually using Data + '1'. This protects us from data where there are no numbers found. If there are no numbers, PatIndex would return 0. In this case, the LEFT function would error because we are using Left(Data, PatIndex - 1). When PatIndex returns 0, we would end up with Left(Data, -1) which returns an error.
There are still ways this can fail. For a full explanation, I encourage you to read:
Extracting numbers with SQL Server
That article shows how to get numbers out of a string. In your case, you want to get alpha characters instead. However, the process is similar enough that you can probably learn something useful out of it.
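The same Data + '1' guard can also be used to pull out the trailing part that starts at the first digit. This is a sketch that assumes the digits form one run at the end of the value, as in the examples from the question; the column aliases are made up:

```sql
DECLARE @Temp TABLE (Data VARCHAR(20));
INSERT INTO @Temp (Data) VALUES ('BTA200'), ('BTA50'), ('BTA030'), ('BTA'), ('123');

SELECT Data,
       -- Everything before the first digit (the whole value if there is none).
       LEFT(Data, PATINDEX('%[0-9]%', Data + '1') - 1) AS AlphaPart,
       -- Everything from the first digit onwards (empty if there is none).
       SUBSTRING(Data, PATINDEX('%[0-9]%', Data + '1'), LEN(Data)) AS NumericPart
FROM @Temp;
```

For 'BTA' the numeric part comes back as an empty string, and for '123' the alphabetic part does. Values with letters after the digits, such as 'BTA50X', would keep those letters in NumericPart, so mixed data needs additional handling.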

substring(field, 1,3) will work on your examples.
select substring(field, 1,3) from table
Also, if the alphabetic part is of variable length, you can do this to extract the alphabetic part:
select substring(field, 1, PATINDEX('%[1234567890]%', field) -1)
from table
where PATINDEX('%[1234567890]%', field) > 0

LEFT ('BTA200', 3) will work for the examples you have given, as in :
SELECT LEFT(MyField, 3)
FROM MyTable
To extract the numeric part, you can use this code
SELECT RIGHT(MyField, LEN(MyField) - 3)
FROM MyTable
WHERE MyField LIKE 'BTA%'
--Only have this test if your data does not always start with BTA.

declare @data as varchar(50)
set @data='ciao335'
--get text
Select Left(@Data, PatIndex('%[0-9]%', @Data + '1') - 1) ---->>ciao
--get numeric
Select right(@Data, len(@data) - (PatIndex('%[0-9]%', @Data )-1) ) ---->>335
