SQL Server: Concatenating WHERE Clauses. Seeking Appropriate Pattern - sql-server

I want to take a poorly designed SQL statement that's embedded in C# code and rewrite it as a stored procedure (presumably), and am looking for an appropriate means to address the following pattern:
sql = "SELECT <whatever> FROM <table> WHERE 1=1";
if ( someCodition.HasValue )
{
sql += " AND <some-field> = " + someCondition.Value;
}
This is a simplification. The actual statement is quite long and contains several such conditions, some of which include INNER JOIN's to other tables if the condition is present. This last part is key, otherwise I'd probably be able to solve all of them with:
WHERE <some-condition-value> IS NULL OR <some-field> = <some-condition-value>
I can think of a few possible approaches. I'm looking for the correct approach.
Edit:
I don't want to perform concatenation in C#. I consider this a serious compromise to security.

If I understand the question properly, the idea is to replace a whole section of code in C# in charge of producing, "long hand", a specific SQL statement corresponding to a list of search criteria, by a single call to a stored-procedure which would, SQL-side, use a generic template of the query aimed at handling all allowed combinations of search criteria in a uniform fashion.
In addition to the difficulty of mapping expressions evaluated on the application-side (eg. someCondition.HasValue) to expressions evaluated on the SQL-side (eg "some-condition-value"), the solution you envision may be logically/functionally equivalent to a "hand-crafted" SQL statement, but slower and more demanding of SQL resources.
Essentially, the C# code encapsulates specific knowledge about the "physical" layout of the database and its schema. It uses this info to figure-out when a particular JOIN may be required or when a particular application-level search criteria value translate to say a SQL "LIKE" rather than an "=" predictate. It may also encaspsulate business rules such as "when the ZIP code is supplied, search by that rather than by State".
You are right to attempt and decouple the data model (the way the application sees the data) from the data schema (the way it is declared and stored in SQL), but the proper mapping needs to be done somehow, somewhere.
Doing this at the level of the application, with all the expressive power of C# as opposed to say T-SQL, is not necessarily a bad thing, provided it is done
- in a module that is independent of other features of the application
and, where practical,
- it is somewhat data/configuration-driven as so to allow small changes in the data model (say the addition of a search criteria) to be implemented by changing a configuration file, rather than plugging this in somewhere in the middle of a long series of C# conditional statements.

start with this WHERE clause:
WHERE 1=1
then append all conditions as:
AND <some-field> = " + someCondition.Value;
the optimizer will toss out the 1=1 condition and you don't have to worry about too many ANDs
EDIT based on OP's comment about not wanting to concatinate strings:
here is a very comprehensive article on how to handle this topic:
Dynamic Search Conditions in T-SQL by Erland Sommarskog
it covers all the issues and methods of trying to write queries with multiple optional search conditions
here is the table of contents:
Introduction
The Case Study: Searching Orders
The Northgale Database
Dynamic SQL
Introduction
Using sp_executesql
Using the CLR
Using EXEC()
When Caching Is Not Really What You Want
Static SQL
Introduction
x = #x OR #x IS NULL
Using IF statements
Umachandar's Bag of Tricks
Using Temp Tables
x = #x AND #x IS NOT NULL
Handling Complex Conditions
Hybrid Solutions – Using both Static and Dynamic SQL
Using Views
Using Inline Table Functions
Conclusion
Feedback and Acknowledgements
Revision History

Well you can start with
StringBuilder sb = new StringBuilder();
sb.Append("SELECT <whatever> FROM <table> WHERE 1 = 1 ");
if ( someCodition.HasValue )
{
sb.Append(" AND <some-field> = " + someCondition.Value);
}
// And so on
Will save you the trouble of putting the first WHERE - AND
[Edit]
You can also try this
Create an SP with all required parameters for the table, and write the query like this.
DECLARE #sqlStatement NVARCHAR(MAX)
#sqlStatement = " SELECT fields1, fields2 FROM TableA WHERE 1 = 1 "
if(#param1 IS NOT NULL) #sqlStatement = #sqlStatement + "AND Column1 = " + #param1
if(#param2 IS NOT NULL) #sqlStatement = #sqlStatement + "AND Column2 = " + #param2
// and so on
sp_executeSql #sqlStatement
Also you can try similar SP but with:
SELECT fields1, fields2 FROM TableA WHERE 1 = 1
AND ( ( #param1 IS NULL ) OR ( Column1 = #param1 ) )
AND ( ( #param2 IS NULL ) OR ( Column2 = #param2 ) )
this is definitely injection proof!

Related

Django model based on an SQL table-valued function using MyModel.objects.raw()

If it's relevant I'm using Django with Django Rest Framework, django-mssql-backend and pyodbc
I am building some read only models of a legacy database using fairly complex queries and Django's MyModel.objects.raw() functionality. Initially I was executing the query as a Select query which was working well, however I received a request to try and do the same thing but with a table-valued function from within the database.
Executing this:
MyModel.objects.raw(select * from dbo.f_mytablefunction)
Gives the error: Invalid object name 'myapp_mymodel'.
Looking deeper into the local variables at time of error it looks like this SQL is generated:
'SELECT [myapp_mymodel].[Field1], '
'[myapp_mymodel].[Field2] FROM '
'[myapp_mymodel] WHERE '
'[myapp_mymodel].[Field1] = %s'
The model itself is mapped properly to the query as executing the equivalent:
MyModel.objects.raw(select * from dbo.mytable)
Returns data as expected, and dbo.f_mytablefunction is defined as:
CREATE FUNCTION dbo.f_mytablefunction
(
#param1 = NULL etc etc
)
RETURNS TABLE
AS
RETURN
(
SELECT
field1, field2 etc etc
FROM
dbo.mytable
)
If anyone has any explanation as to why these two modes of operation are treated substantially differently then I would be very pleased to find out.
Guess you've figured this out by now (see docs):
MyModel.objects.raw('select * from dbo.f_mytablefunction(%s)', [1])
If you'd like to map your table valued function to a model, this gist has a quite thorough approach, though no license is mentioned.
Once you've pointed your model 'objects' to the new TableFunctionManager and added the 'function_args' OrderedDict (see tests in gist), you can query it as follows:
MyModel.objects.all().table_function(param1=1)
For anyone wondering about use cases for table valued functions, try searching for 'your_db_vendor tvf'.

SQL Server execution speed varies wildly depending on how parameters are provided to an inline table function

I am investigating a problem with the execution speed of an inline table function in SQL Server. Or that's where I thought the problem lay. I came across
T-SQL code is extremely slow when saved as an Inline Table-valued Function
which looked promising, since it described what I was seeing, but I seemed to have the opposite problem - when I passed variables to my function, it took 17 seconds, but when I ran the code of my function in a query window, using DECLARE statements for the variables (which I thought effectively made them literals), it ran in milliseconds. Same code, same parameters - just wrapping them up in an inline table function seemed to drag it way down.
I tried to reduce my query to the minimum possible code that still exhibited the behaviour. I am using numerous existing inline table functions (all of which have worked fine for years), and managed to strip my code down to needing just a call of one existing inline table function to be able to highlight the speed difference. But in doing so I noticed something very odd
SELECT strStudentNumber
FROM dbo.udfNominalSnapshot('2019', 'REG')
takes 17 seconds whereas
DECLARE #strAcademicSessionStart varchar(4) = '2019'
DECLARE #strProgressCode varchar(12)= 'REG'
SELECT strStudentNumber
FROM dbo.udfNominalSnapshot(#strAcademicSessionStart, #strProgressCode)
takes milliseconds! So nothing to do with wrapping the code in an inline table function, but everything to do with how the parameters are passed to a nested function within it. Based on the cited article I'm guessing there are two different execution plans in play, but I have no idea why/how, and more importantly, what I can do to persuade SQL Server to use the efficient one?
P.S. here is the code of the inner UDF call in response to a comment request
ALTER FUNCTION [dbo].[udfNominalSnapshot]
(
#strAcademicSessionStart varchar(4)='%',
#strProgressCode varchar(10)='%'
)
RETURNS TABLE
AS
RETURN
(
SELECT TOP 100 PERCENT S.strStudentNumber, S.strSurname, S.strForenames, S.strTitle, S.strPreviousSurname, S.dtmDoB, S.strGender, S.strMaritalStatus,
S.strResidencyCode, S.strNationalityCode, S.strHESAnumber, S.strSLCnumber, S.strPreviousSchoolName, S.strPreviousSchoolCode,
S.strPreviousSchoolType,
COLLEGE_EMAIL.strEmailAddress AS strEmailAlias,
PERSONAL_EMAIL.strEmailAddress AS strPersonalEmail,
P.[str(Sub)Plan], P.intYearOfCourse, P.strProgressCode,
P.strAcademicSessionStart, strC2Knumber AS C2K_ID, AcadPlan, strC2KmailAlias
,ISNULL([strC2KmailAlias], [strC2Knumber]) + '#c2kni.net' AS strC2KmailAddress
FROM dbo.tblStudents AS S
LEFT JOIN
dbo.udfMostRecentEmail('COLLEGE') AS COLLEGE_EMAIL ON S.strStudentNumber = COLLEGE_EMAIL.strStudentNumber
LEFT JOIN
dbo.udfMostRecentEmail('PERSONAL') AS PERSONAL_EMAIL ON S.strStudentNumber = PERSONAL_EMAIL.strStudentNumber
INNER JOIN
dbo.udfProgressHistory(#strAcademicSessionStart) AS P ON S.strStudentNumber = P.strStudentNumber
WHERE (P.strProgressCode LIKE #strProgressCode OR (SUBSTRING(#strProgressCode, 1, 1) = '^' AND P.strProgressCode NOT LIKE SUBSTRING(#strProgressCode, 2, LEN(#strProgressCode)))) AND
(P.strStudentNumber NOT IN
(SELECT strStudentNumber
FROM dbo.tblPilgrims
WHERE (strAcademicSessionStart = #strAcademicSessionStart) AND (strScheme = 'BEI')))
ORDER BY P.[str(Sub)Plan], P.intYearOfCourse, S.strSurname
)
Expanding on #Ross Pressers comment, this might not really be an answer, but demonstrates what is happening (a bit), with my understanding (which could be wrong!) of what is happening...
Run the setup code at the end and then....
Execute the following with query plan on (Ctrl-M)... (note: depending on the random number generator you may or may not get any results, that does not affect the plan)
declare #one varchar(100) = '379', #two varchar(200) = '726'
select * from wibble(#one, #two) -- 1
select * from wibble('379', '726') -- 2
select * from wibble(#one, #two) OPTION (RECOMPILE) -- 3
select * from wibble(#one, #two) -- 4
Caveat. The following is what happens on MY system, your mileage may vary...
-- 1 (and -- 4) are the most expensive.
SQL Server creates a generic plan as it does not know what the parameters are (yes they are defined, but the plan is for wibble(#one, #two) where, at that point, the parameter values are "unknown")
https://www.brentozar.com/pastetheplan/?id=rJtIRwx_r
-- 2 has a different plan
Here, sql server knows what the parameters are, so can create a specific plan, which is quite different to --1
https://www.brentozar.com/pastetheplan/?id=rJa9APldS
-- 3 has the same plan as --2
Testing this further, adding OPTION (RECOMPILE) gets SQL Server to create a specific plan for the specific execution of wibble(#one, #two) so we get the same plan as --2
--4 is there for completeness to show that after all that mucking about the generic plan is still in place
So, in this simple example we have a parameterised TVF being called with identical values, that are passed either as parameters or inline, producing different execution plans and different execution times as per the OP
Set up
use tempdb
GO
drop table if EXISTS Orders
GO
create table Orders (
OrderID int primary key,
UserName varchar(50),
PhoneNumber1 varchar(50),
)
-- generate 300000 with randon "phone" numbers
;WITH TallyTable AS (
SELECT TOP 300000 ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS [N]
FROM dbo.syscolumns tb1,dbo.syscolumns tb2
)
insert into Orders
select n, 'user' + cast(n as varchar(10)), cast(CRYPT_GEN_RANDOM(3) as int)
FROM TallyTable;
GO
drop function if exists wibble
GO
create or alter function wibble (
#one varchar(4) = '%'
, #two varchar(4) = '%'
)
returns table
as
return select * from Orders
where PhoneNumber1 like '%' + #one + '%'
and PhoneNumber1 like '%' + #two + '%'
or (SUBSTRING(#one, 1, 1) = '^' AND PhoneNumber1 NOT LIKE SUBSTRING(#two, 2, LEN(#two)))
and (select 1) = 1
GO
Problem was overcome (I wouldn't say "fixed") by following up on Ross Presser's observation about the complexity of udfProgressHistory. This sucked data from a table tblProgressHistory which was joined to itself. The table is added to annually. I think this year's additional 2K records must have caused the sudden cost-hike when using a particular execution plan. I deleted >2K redundant records and we're back to sub-second execution.

SQL CLR Trigger - get source table

I am creating a DB synchronization engine using SQL CLR Triggers in Microsoft SQL Server 2012. These triggers do not call a stored procedure or function (and thereby have access to the INSERTED and DELETED pseudo-tables but do not have access to the ##procid).
Differences here, for reference.
This "sync engine" uses mapping tables to determine what the table and field maps are for this sync job. In order to determine the target table and fields (from my mapping table) I need to get the source table name from the trigger itself. I have come across many answers on Stack Overflow and other sites that say that this isn't possible. But, I've found one website that provides a clue:
Potential Solution:
using (SqlConnection lConnection = new SqlConnection(#"context connection=true")) {
SqlCommand cmd = new SqlCommand("SELECT object_name(resource_associated_entity_id) FROM sys.dm_tran_locks WHERE request_session_id = ##spid and resource_type = 'OBJECT'", lConnection);
cmd.CommandType = CommandType.Text;
var obj = cmd.ExecuteScalar();
}
This does in fact return the correct table name.
Question:
My question is, how reliable is this potential solution? Is the ##spid actually limited to this single trigger execution? Or is it possible that other simultaneous triggers will overlap within this process id? Will it stand up to multiple executions of the same and/or different triggers within the database?
From these sites, it seems the process Id is in fact limited to the open connection, which doesn't overlap: here, here, and here.
Will this be a safe method to get my source table?
Why?
As I've noticed similar questions, but all without a valid answer for my specific situation (except that one). Most of the comments on those sites ask "Why?", and in order to preempt that, here is why:
This synchronization engine operates on a single DB and can push changes to target tables, transforming the data with user-defined transformations, automatic source-to-target type casting and parsing and can even use the CSharpCodeProvider to execute methods also stored in those mapping tables for transforming data. It is already built, quite robust and has good performance metrics for what we are doing. I'm now trying to build it out to allow for 1:n table changes (including extension tables requiring the same Id as the 'master' table) and am trying to "genericise" the code. Previously each trigger had a "target table" definition hard coded in it and I was using my mapping tables to determine the source. Now I'd like to get the source table and use my mapping tables to determine all the target tables. This is used in a medium-load environment and pushes changes to a "Change Order Book" which a separate server process picks up to finish the CRUD operation.
Edit
As mentioned in the comments, the query listed above is quite "iffy". It will often (after a SQL Server restart, for example) return system objects like syscolpars or sysidxstats. But, it seems that in the dm_tran_locks table there's always an associated resource_type of 'RID' (Row ID) with the same object_name. My current query which works reliably so far is the following (will update if this changes or doesn't work under high load testing):
select t1.ObjectName FROM (
SELECT object_name(resource_associated_entity_id) as ObjectName
FROM sys.dm_tran_locks WHERE resource_type = 'OBJECT' and request_session_id = ##spid
) t1 inner join (
SELECT OBJECT_NAME(partitions.OBJECT_ID) as ObjectName
FROM sys.dm_tran_locks
INNER JOIN sys.partitions ON partitions.hobt_id = dm_tran_locks.resource_associated_entity_id
WHERE resource_type = 'RID'
) t2 on t1.ObjectName = t2.ObjectName
If this is always the case, I'll have to find that out during testing.
How reliable is this potential solution?
While I do not have time to set up a test case to show it not working, I find this approach (even taking into account the query in the Edit section) "iffy" (i.e. not guaranteed to always be reliable).
The main concerns are:
cascading (whether recursive or not) Trigger executions
User (i.e. Explicit / Implicit) transactions
Sub-processes (i.e. EXEC and sp_executesql)
These scenarios allow for multiple objects to be locked, all at the same time.
Is the ##SPID actually limited to this single trigger execution? Or is it possible that other simultaneous triggers will overlap within this process id?
and (from a comment on the question):
I think I can join my query up with the sys.partitions and get a dm_trans_lock that has a type of 'RID' with an object name that will match up to the one in my original query.
And here is why it shouldn't be entirely reliable: the Session ID (i.e. ##SPID) is constant for all of the requests on that Connection). So all sub-processes (i.e. EXEC calls, sp_executesql, Triggers, etc) will all be on the same ##SPID / session_id. So, between sub-processes and User Transactions, you can very easily get locks on multiple resources, all on the same Session ID.
The reason I say "resources" instead of "OBJECT" or even "RID" is that locks can occur on: rows, pages, keys, tables, schemas, stored procedures, the database itself, etc. More than one thing can be considered an "OBJECT", and it is possible that you will have page locks instead of row locks.
Will it stand up to multiple executions of the same and/or different triggers within the database?
As long as these executions occur in different Sessions, then they are a non-issue.
ALL THAT BEING SAID, I can see where simple testing would show that your current method is reliable. However, it should also be easy enough to add more detailed tests that include an explicit transaction that first does some DML on another table, or have a trigger on one table do some DML on one of these tables, etc.
Unfortunately, there is no built-in mechanism that provides the same functionality that ##PROCID does for T-SQL Triggers. I have come up with a scheme that should allow for getting the parent table for a SQLCLR Trigger (that takes into account these various issues), but haven't had a chance to test it out. It requires using a T-SQL trigger, set as the "first" trigger, to set info that can be discovered by the SQLCLR Trigger.
A simpler form can be constructed using CONTEXT_INFO, if you are not already using it for something else (and if you don't already have a "first" Trigger set). In this approach you would still create a T-SQL Trigger, and then set it as the "first" Trigger using sp_settriggerorder. In this Trigger you SET CONTEXT_INFO to the table name that is the parent of ##PROCID. You can then read CONTEXT_INFO() on a Context Connection in a SQLCLR Trigger. If there are multiple levels of Triggers then the value of CONTEXT INFO will get overwritten, so reading that value must be the first thing you do in each SQLCLR Trigger.
This is an old thread, but it is an FAQ and I think I have a better solution. Essentially it uses the schema of the inserted or deleted table to find the base table by doing a hash of the column names and comparing the hash with the hashes of tables with a CLR trigger on them.
Code snippet below - at some point I will probably put the whole solution on Git (it sends a message to Azure Service Bus when the trigger fires).
private const string colqry = "select top 1 * from inserted union all select top 1 * from deleted";
private const string hashqry = "WITH cols as ( "+
"select top 100000 c.object_id, column_id, c.[name] "+
"from sys.columns c "+
"JOIN sys.objects ot on (c.object_id= ot.parent_object_id and ot.type= 'TA') " +
"order by c.object_id, column_id ) "+
"SELECT s.[name] + '.' + o.[name] as 'TableName', CONVERT(NCHAR(32), HASHBYTES('MD5',STRING_AGG(CONVERT(NCHAR(32), HASHBYTES('MD5', cols.[name]), 2), '|')),2) as 'MD5Hash' " +
"FROM cols "+
"JOIN sys.objects o on (cols.object_id= o.object_id) "+
"JOIN sys.schemas s on (o.schema_id= s.schema_id) "+
"WHERE o.is_ms_shipped = 0 "+
"GROUP BY s.[name], o.[name]";
public static void trgSendSBMsg()
{
string table = "";
SqlCommand cmd;
SqlDataReader rdr;
SqlTriggerContext trigContxt = SqlContext.TriggerContext;
SqlPipe p = SqlContext.Pipe;
using (SqlConnection con = new SqlConnection("context connection=true"))
{
try
{
con.Open();
string tblhash = "";
using (cmd = new SqlCommand(colqry, con))
{
using (rdr = cmd.ExecuteReader(CommandBehavior.SingleResult))
{
if (rdr.Read())
{
MD5 hash = MD5.Create();
StringBuilder hashstr = new StringBuilder(250);
for (int i=0; i < rdr.FieldCount; i++)
{
if (i > 0) hashstr.Append("|");
hashstr.Append(GetMD5Hash(hash, rdr.GetName(i)));
}
tblhash = GetMD5Hash(hash, hashstr.ToString().ToUpper()).ToUpper();
}
rdr.Close();
}
}
using (cmd = new SqlCommand(hashqry, con))
{
using (rdr = cmd.ExecuteReader(CommandBehavior.SingleResult))
{
while (rdr.Read())
{
string hash = rdr.GetString(1).ToUpper();
if (hash == tblhash)
{
table = rdr.GetString(0);
break;
}
}
rdr.Close();
}
}
if (table.Length == 0)
{
p.Send("Error: Unable to find table that CLR trigger is on. Message not sent!");
return;
}
….
HTH

SQL Server 2014 - XQuery - get comma-separated List

I have a database table in SQL Server 2014 with only an ID column (int) and a column xmldata of type XML.
This xmldata column contains for example:
<book>
<title>a nice Novel</title>
<author>Maria</author>
<author>Peter</author>
</book>
As expected, I have multiple books, therefore multiple rows with xmldata.
I now want to execute a query for all books, where Peter is an Author. I tried this in some xPath2.0 testers and got to the conclusion that:
/book/author/concat(text(), if(position() != last())then ',' else '')
works.
If you try to port this success into SQL Server 2014 Express it looks like this, which is correctly escaped syntax etc.:
SELECT id
FROM books
WHERE 'Peter' IN (xmldata.query('/book/author/concat(text(), if(position() != last())then '','' else '''')'))
SQL Server however does not seem to support a construction like /concat(...) because of:
The XQuery syntax '/function()' is not supported.
I am at a loss then however, why /text() would work in:
SELECT id, xmldata.query('/book/author/text()')
FROM books
which it does.
My constraints:
I am bound to use SQL Server
I am bound to xpath or something else that can be "injected" as the statement above (if the structure of the xml or the database changes, the xpath above could be changed isolated and the application logic above that constructs the Where clause will not be touched) SEE EDIT
Is there a way to make this work?
regards,
BillDoor
EDIT:
My second constraint boils down to this:
An Application constructs the Where clause by
expression <operator> value(s)
expression is stored in a database and is mapped by the xmlTag eg.:
| tokenname| querystring
| "author" | "xmldata.query(/book/author/text())"
the values are presented by the Requesting user. so if the user asks for the author "Peter" with operator "EQUALS" the application constructs:
xmaldata.query(/book/author/text()) = "Peter"
as where clause.
If the customer now decides that author needs to be nested in an <authors> element, i can simply change the expression in the construction-database and the whole machine keeps running without any changes to code, simply manageable.
So i need a way to achieve that
<xPath> <operator> "Peter"
or any other combination of this three isolated components (see above: "Peter" IN <xPath>...) gets me all of Peters' books, even if there are multiple unsorted authors.
This would not suffice either (its not sqlserver syntax, but you get the idea):
WHERE xmldata.exist('/dossier/client[text() = "$1"]', "Peter") = 1;
because the operator is still nested in the expression, i could not request <> "Peter".
I know this is strange, please don't question the concept as a whole - it has a history :/
EDIT: further clarification:
The filter-rules come into the app in an XML structure basically:
Operator: "EQ"
field: "name"
value "Peter"
evaluates to:
expression = lookupExpressionForField("name") --> "table2.xmldata.value('book/author/name[1]', 'varchar')"
operator = lookUpOperatorMapping("EQ") --> "="
value = FormatValues("Peter") --> "Peter" (if multiple values are passed FormatValues cosntructs a comma seperated list)
the application then builds:
- constructClause(String expression,String operator,String value)
"table2.xmldata.value('book/author/name[1]', 'varchar')" + "=" + "Peter"
then constructs a Select statement with the result as WHERE clause.
it does not build it like this, unescaped, unfiltered for injection etc, but this is the basic idea.
i can influence how the input is Transalted, meaning I can implement the methods:
lookupExpressionForField(String field)
lookUpOperatorMapping(String operator)
Formatvalues(List<String> values) | Formatvalues(String value)
constructClause(String expression,String operator,String value)
however i choose to do, i can change the parameter types, I can freely implement them. The less the better of course. So simply constructing a comma-seperated list with xPath would be optimal (like if i could somewhere just tick "enable /function()-syntax in xPath" in sqlserver and the /concat(if...) would work)
How about something like this:
SET NOCOUNT ON;
DECLARE #Books TABLE (ID INT NOT NULL IDENTITY(1, 1) PRIMARY KEY, BookInfo XML);
INSERT INTO #Books (BookInfo)
VALUES (N'<book>
<title>a nice Novel</title>
<author>Maria</author>
<author>Peter</author>
</book>');
INSERT INTO #Books (BookInfo)
VALUES (N'<book>
<title>another one</title>
<author>Bob</author>
</book>');
SELECT *
FROM #Books bk
WHERE bk.BookInfo.exist('/book/author[text() = "Peter"]') = 1;
This returns only the first "book" entry. From there you can extract any portion of the XML field using the "value" function.
The "exist" function returns a boolean / BIT. This will scan through all "author" nodes within "book", so there is no need to concat into a comma-separated list only for use in an IN list, which wouldn't work anyway ;-).
For more info on the "value" and "exist" functions, as well as the other functions for use with XML data, please see:
xml Data Type Methods

How are parameterized queries done on free text CONTAINS with logical operators?

I am dynamically creating queries in SQL Server that are parameterized. When it comes to the CONTAINS predicate I have run into a question that I couldn’t find an answer to.
If the CONATINS will have logical operators (OR, AND, AND NOT) do I have to enumerate each of the values or do I use one parameter for all?
Here is a static example:
SELECT * FROM SomeTable WHERE CONTAINS(ColumName, ‘Cat and Dog’);
Is this the correct parameterized example?
SELECT * FROM SomeTable WHERE CONTAINS(ColumName, ‘#Q1 and #Q2’);
#Q1 = “Cat”
#Q2 = “Dog”
Or is this the correct approach?
SELECT * FROM SomeTable WHERE CONTAINS(ColumName, ‘#Q1’);
#Q1 = “Cat and Dog”
Or is it something entirely different?
At the time I asked the question I didn't have a database with full text enabled that I could test against. I was just writing the code and unit tests. Since then I managed to find a DB that I could test with.
The second example was the closest. The only problem was the the parameter was quoted so it would have searched on the literal "#Q1".
SELECT * FROM SomeTable WHERE CONTAINS(ColumName, #Q1);
#Q1 = “Cat and Dog”
If the query needs the a generation term, proximity term or wieghted term then those need to be in the parameter too.
#Q1 = 'FORMSOF(INFLECTIONAL, Cat) and FORMSOF(INFLECTIONAL, Dog)'
#Q1 = 'FORMSOF(THESAURUS, Cat) and FORMSOF(THESAURUS, "Dog")'
#Q1 = 'Cat NEAR Dog'

Resources