SQL Server Regular expressions in T-SQL - sql-server

Is there any regular expression library written in T-SQL (no CLR, no extended SP, pure T-SQL) for SQL Server, and that should work with shared hosting?
Edit:
Thanks, I know about PATINDEX, LIKE, xp_ sps and CLR solutions
I also know it is not the best place for regex, the question is theoretical :)
Reduced functionality is also accepted

How about the PATINDEX function?
The pattern matching in TSQL is not a complete regex library, but it gives you the basics.
(From Books Online)
Wildcard  Meaning
%         Any string of zero or more characters.
_         Any single character.
[ ]       Any single character within the specified range (for example, [a-f]) or set (for example, [abcdef]).
[^]       Any single character not within the specified range (for example, [^a-f]) or set (for example, [^abcdef]).
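To make the wildcard semantics concrete, here is a minimal sketch in Python (not T-SQL) that translates a LIKE/PATINDEX-style pattern into a regular expression. The helper names like_to_regex and like_match are hypothetical. The sketch assumes a case-insensitive collation and ignores the ESCAPE clause, so it illustrates the four wildcards rather than reimplementing SQL Server's matcher.

```python
import re


def like_to_regex(pattern: str) -> str:
    """Translate a T-SQL LIKE/PATINDEX pattern into a Python regex string."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        # Look for the bracket that closes a [...] set (at least one character inside).
        end = pattern.find("]", i + 2) if ch == "[" else -1
        if ch == "%":
            parts.append(".*")   # any string of zero or more characters
        elif ch == "_":
            parts.append(".")    # any single character
        elif ch == "[" and end != -1:
            body = pattern[i + 1:end]
            negated = len(body) > 1 and body[0] == "^"
            if negated:
                body = body[1:]
            # Escape characters that are special inside a Python character class.
            body = body.replace("\\", "\\\\").replace("[", "\\[").replace("^", "\\^")
            parts.append("[" + ("^" if negated else "") + body + "]")
            i = end              # skip past the closing bracket
        else:
            parts.append(re.escape(ch))  # everything else is a literal
        i += 1
    return "".join(parts)


def like_match(text: str, pattern: str) -> bool:
    """Return True if the whole text matches the LIKE pattern.

    Assumption: a case-insensitive collation, hence re.IGNORECASE.
    """
    return re.fullmatch(like_to_regex(pattern), text, re.DOTALL | re.IGNORECASE) is not None


if __name__ == "__main__":
    print(like_to_regex("[a-f]_%"))
    print(like_match("cat", "c_t"), like_match("cart", "c_t"))
```

The translation shows why this is only "the basics": there is no alternation, grouping, or quantifier beyond % and _.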

If anybody is interested in using regex with CLR, here is a solution. The function below (C#, .NET 4.5) returns 1 if the pattern is matched and 0 if it is not. I use it to tag lines in subqueries. The SqlFunction attribute tells SQL Server that this method is the actual UDF it will use. Compile the code to a DLL and save it in a place you can access from Management Studio.
// default using statements above
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Text.RegularExpressions;

namespace CLR_Functions
{
    public class myFunctions
    {
        // Returns 1 if the pattern matches anywhere in the text, otherwise 0.
        [SqlFunction]
        public static SqlInt16 RegexContain(SqlString text, SqlString pattern)
        {
            SqlInt16 returnVal = 0;
            // SqlString.ToString() turns SQL NULL into the literal "Null",
            // so treat NULL input as "no match" instead.
            if (text.IsNull || pattern.IsNull)
            {
                return returnVal;
            }
            try
            {
                if (Regex.IsMatch(text.Value, pattern.Value))
                {
                    returnVal = 1;
                }
            }
            catch
            {
                // An invalid pattern is reported as "no match".
                returnVal = 0;
            }
            return returnVal;
        }
    }
}
In Management Studio, import the DLL file via Programmability > Assemblies > New Assembly.
Then run this query:
CREATE FUNCTION RegexContain(@text NVARCHAR(50), @pattern NVARCHAR(50))
RETURNS smallint
AS
EXTERNAL NAME CLR_Functions.[CLR_Functions.myFunctions].RegexContain
Then you should have complete access to the function via the database you stored the assembly in.
Then use in queries like so:
SELECT *
FROM
(
SELECT
DailyLog.Date,
DailyLog.Researcher,
DailyLog.team,
DailyLog.field,
DailyLog.EntityID,
DailyLog.[From],
DailyLog.[To],
dbo.RegexContain(Researcher, '[\p{L}\s]+') as 'is null values'
FROM [DailyOps].[dbo].[DailyLog]
) AS a
WHERE a.[is null values] = 0

There is some basic pattern matching available through using LIKE, where % matches any number and combination of characters, _ matches any one character, and [abc] could match a, b, or c...
There is more info on the MSDN site.

In case anyone else is still looking at this question, http://www.sqlsharp.com/ is a free, easy way to add regular expression CLR functions into your database.

If you are using SQL Server 2016 or above, you can use sp_execute_external_script along with R. It has functions for Regular Expression searches, such as grep and grepl.
Here's an example for email addresses. I'll query some "people" via the SQL Server database engine, pass the data for those people to R, let R decide which people have invalid email addresses, and have R pass back that subset of people to SQL Server. The "people" are from the [Application].[People] table in the [WideWorldImporters] sample database. They get passed to the R engine as a dataframe named InputDataSet. R uses the grepl function with the "not" operator (exclamation point!) to find which people have email addresses that don't match the RegEx string search pattern.
EXEC sp_execute_external_script
@language = N'R',
@script = N' RegexWithR <- InputDataSet;
OutputDataSet <- RegexWithR[!grepl("([_a-z0-9-]+(\\.[_a-z0-9-]+)*@[a-z0-9-]+(\\.[a-z0-9-]+)*(\\.[a-z]{2,4}))", RegexWithR$EmailAddress), ];',
@input_data_1 = N'SELECT PersonID, FullName, EmailAddress FROM Application.People'
WITH RESULT SETS (([PersonID] INT, [FullName] NVARCHAR(50), [EmailAddress] NVARCHAR(256)))
Note that the appropriate features must be installed on the SQL Server host. For SQL Server 2016, it is called "SQL Server R Services". For SQL Server 2017, it was renamed to "SQL Server Machine Learning Services".
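For readers who want to see what the filter does without an R installation, here is a rough Python sketch of the same logic: keep only the rows whose email address contains no match for the pattern. The sample rows and the function name invalid_people are hypothetical. Like R's grepl, re.search is case-sensitive by default and matches anywhere in the string, so an all-uppercase address is reported as invalid.

```python
import re

# Same pattern as in the R script; "@" separates local part and domain.
EMAIL_PATTERN = re.compile(
    r"([_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4}))"
)


def invalid_people(people):
    """Return the rows whose EmailAddress contains no match for the pattern."""
    return [p for p in people if not EMAIL_PATTERN.search(p["EmailAddress"])]


# Hypothetical rows standing in for Application.People.
people = [
    {"PersonID": 1, "FullName": "Hypothetical One", "EmailAddress": "one@example.com"},
    {"PersonID": 2, "FullName": "Hypothetical Two", "EmailAddress": "not-an-email"},
    {"PersonID": 3, "FullName": "Hypothetical Three", "EmailAddress": "THREE@EXAMPLE.COM"},
]

if __name__ == "__main__":
    for person in invalid_people(people):
        print(person["PersonID"], person["EmailAddress"])
```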
Closing Thoughts
Microsoft's implementation of SQL (T-SQL) doesn't have native support for RegEx. This proposed solution may not be any more desirable to the OP than the use of a CLR stored procedure. But it does offer an additional way to approach the problem.

You can use VBScript regular expression features using OLE Automation. This is way better than the overhead of creating and maintaining an assembly. Please make sure you go through the comments section to get a better modified version of the main one.
http://blogs.msdn.com/b/khen1234/archive/2005/05/11/416392.aspx
DECLARE @obj INT, @res INT, @match BIT;
DECLARE @pattern varchar(255) = '<your regex pattern goes here>';
DECLARE @matchstring varchar(8000) = '<string to search goes here>';
SET @match = 0;
-- Create a VB script component object
EXEC @res = sp_OACreate 'VBScript.RegExp', @obj OUT;
-- Apply/set the pattern to the RegEx object
EXEC @res = sp_OASetProperty @obj, 'Pattern', @pattern;
-- Set any other settings/properties here
EXEC @res = sp_OASetProperty @obj, 'IgnoreCase', 1;
-- Call the method 'Test' to find a match
EXEC @res = sp_OAMethod @obj, 'Test', @match OUT, @matchstring;
-- Don't forget to clean-up
EXEC @res = sp_OADestroy @obj;
If you get the "SQL Server blocked access to procedure 'sys.sp_OACreate'..." error, use sp_configure followed by RECONFIGURE to enable the 'Ole Automation Procedures' option. (Yes, unfortunately that is a server-level change!)
More information about the Test method is available here
Happy coding

Related

Sql-Server UniqueIntList type for java

I have a stored procedure (sp) that takes a parameter of the UniqueIntList type in SQL Server. I want to call this sp from Java using Spring's NamedParameterJdbcTemplate.
sql server sp execution details
DECLARE @M dbo.UniqueIntList
INSERT INTO @M VALUES (1),(2),(6)
exec usp_mysp @M
GO
Below is how I am executing it using java
private static final String SQL_SP = "usp_mysp :myVar";
MapSqlParameterSource mapSqlParameterSource = new MapSqlParameterSource();
mapSqlParameterSource.addValue("myVar", myList); // my list is List<Integer>
namedParameterJdbcTemplate.query(SQL_SP, mapSqlParameterSource, (q, i) -> {
// ... //
});
While doing this, I get "Procedure or function has too many arguments specified", and I figured out that this is due to the UniqueIntList type in SQL Server.
So I would like to know how exactly I should pass the values in the map for the NamedParameterJdbcTemplate.
As suggested by @David Browne in the comments, I did some research on table-valued parameters, and this Link here helped me understand how to achieve it using Spring's NamedParameterJdbcTemplate for MS SQL Server.

What is the best way to remove xp_cmdshell calls from t-sql code

I'm maintaining a large T-SQL based application.
It makes heavy use of bcp called through xp_cmdshell.
This is problematic, because xp_cmdshell runs in the security context of the SQL Server service account, which has more permissions than the work requires.
My first idea for getting rid of this disadvantage is to use CLR code. CLR code runs with the permissions of the user who called it.
I created the following procedure and it works fine. I can see that it uses the permissions of the account that runs the code:
public static void RunBCP(SqlString arguments, out SqlString output_msg, out SqlString error_msg, out SqlInt32 return_val) {
    output_msg = string.Empty;
    error_msg = string.Empty;
    try {
        var proc = new Process {
            StartInfo = new ProcessStartInfo {
                FileName = "bcp",
                Arguments = arguments.ToString(),
                UseShellExecute = false,
                RedirectStandardOutput = true,
                CreateNoWindow = true
            }
        };
        proc.Start();
        // Collect everything bcp writes to standard output.
        while (!proc.StandardOutput.EndOfStream) {
            output_msg += proc.StandardOutput.ReadLine();
        }
        // ExitCode is only valid once the process has exited.
        proc.WaitForExit();
        return_val = proc.ExitCode;
    }
    catch (Exception e) {
        error_msg = e.Message;
        return_val = 1;
    }
}
This is a good solution because I'm not changing the BCP calls (the arguments are the same). There are no major changes in logic, so there is no risk of an error.
Previously, the BCP call in T-SQL looked like this:
declare @ReturnCode int;
declare @cmd varchar(1000);
SELECT @CMD = 'bcp "select FirstName, LastName, DateOfBirth" queryout "c:\temp\OutputFile.csv" -c -t -T -S"(local)"'
EXEC @ReturnCode=xp_cmdshell @CMD,no_output
Now I call it this way:
declare @ReturnCode int;
declare @cmd varchar(1000);
SELECT @CMD = '"select FirstName, LastName, DateOfBirth" queryout "c:\temp\OutputFile.csv" -c -t -T -S"(local)"'
exec DataBase.dbo.up_RunBCP @arguments = @cmd;
So, the question is: is there any other way to get rid of the xp_cmdshell bcp code?
I heard that I can use PowerShell (sqlps), but the examples I found suggest creating a PowerShell script.
Can I call such a script from T-SQL code?
How should this code (the PowerShell script) be stored? As a database object?
Or maybe there is some other way? Not necessarily SSIS. What I'd most like to know about is PowerShell.
Thanks for any advice.
Your options for data EXPORT are the following:
- using xp_cmdshell to call bcp.exe - your old way of bulk copying
- using CLR - your new way of bulk copying
- SSIS - my preferred way of doing this; here is the example
- INSERT INTO OPENROWSET - the interesting alternative you can use if you are either working on 32-bit environment with text/Jet/whatever drivers installed, or you can install 64-bit drivers (e.g. 64-bit ODBC text driver, see Microsoft Access Database Engine 2010 Redistributable)
- SQL Server Import/Export wizard - ugly manual way that seldom works in the way you want it to work
- using external CSV table - not supported yet (SQL Server 2016 promises it will be...)
HTH
I would use a simple PowerShell script that does this, something like:
Invoke-Sqlcmd -Query '...' | Export-Csv ...
Generally, for administrative functions you could add this to Task Scheduler and be done with it. If you need to run the task on demand, you can do it via xp_cmdshell using schtasks.exe /run /tn Task_NAME, which might be better for you since it may be easier to express yourself in PowerShell than in T-SQL in this context.
The other options mentioned all require extra tools (SSIS requires VS, for example); this is portable with no dependencies.
To call the script without xp_cmdshell, create a SQL Server Agent job with a PowerShell step and start it from T-SQL.

Code which is free from problems of SQL Injection

I have a comments box in my website
<td id="commentsBox" class="xec" size="200"></td>
I use Javascript to read the comments box and create an XML string.
<ROWS><COMMENTS>My comments</COMMENTS></ROWS>
The XML string is passed to a stored procedure via java. (I have simplified the SQL Code and XML String for the purposes of the question)
CREATE PROCEDURE [DB].[TEST$ExecuteXML] @doc VARCHAR(max)
    ,@P_Result VARCHAR(max) OUTPUT
AS
BEGIN
    DECLARE @idoc INT;
    -- Parse the XML text into an internal document handle
    EXEC sp_xml_preparedocument @idoc OUTPUT
        ,@doc;
    INSERT INTO MyTable (
        COMMENTS
        ,UPDATE_DATE
        )
    SELECT COMMENTS
        ,getDate()
    FROM OPENXML(@idoc, '/ROWS', 1) WITH (
        COMMENTS VARCHAR(200) 'COMMENTS'
        );
    -- Release the memory held by the parsed document
    EXEC sp_xml_removedocument @idoc;
    SET @p_result = 1;
END
I have looked at sites dealing with SQL Injection such as https://technet.microsoft.com/en-us/library/ms161953(v=sql.105).aspx
Is it possible to enter something into the textbox that will be destructive to the database?
UPDATE
In response to @Linky I add here (part of) the Java code - although I am at a loss to understand how this could be problematic, as the premise of my question is that the SQL Server procedure could accept anything in the XML.
XMLObject br = new XMLObject(xmlString);
String result = br.update();
public class XMLObject {
public static final int RESULT_FAILED = 0;
public static final int RESULT_SUCCESS = 1;
protected DBConnection dbConn = null;
protected String theXML=null;
public XMLObject(String theXML) {
this.theXML=theXML;
}
public String update() {
String result;
ArrayList<DBField> fields = new ArrayList<>();
fields.add(new DBField(DBField.STRING, theXML, false));
result = DMLUtils.executeString("ExecuteXML", fields);
return result;
}
}
The best way to avoid SQL injection attacks is to tie the content to a particular column, so that it is only ever handled as data. Entering the user's comments directly into the database opens you up to cross-site scripting attacks if that data is ever displayed to a user. I would suggest that you take this question to the Security Stack Exchange.
Injection comes from allowing SQL data to become or be treated as SQL commands. In your example you are keeping data and commands clearly delineated, and AFAIK, neither OPENXML nor sp_xml_preparedocument can on their own be used to cause injection.
So, in my professional opinion, this appears to be safe from injection.
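To illustrate the "data stays data" point, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made purely so the example runs anywhere). The table name MyTable comes from the question; the hostile string and the helper store_comment are hypothetical. Because the comment is passed as a bound parameter, it is stored verbatim and never parsed as SQL.

```python
import sqlite3


def store_comment(conn, comment):
    # The comment travels as a bound parameter, so it is only ever data.
    conn.execute("INSERT INTO MyTable (COMMENTS) VALUES (?)", (comment,))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (COMMENTS TEXT)")

# A hypothetical comment that would be destructive if it were concatenated into SQL.
hostile = "'); DROP TABLE MyTable; --"
store_comment(conn, hostile)

# The table still exists and holds the text exactly as typed.
rows = conn.execute("SELECT COMMENTS FROM MyTable").fetchall()

if __name__ == "__main__":
    print(rows)
```

The stored text can still be dangerous when it is later rendered in a page, which is the cross-site scripting concern raised above and is separate from SQL injection.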

Scalar function fn_cdc_get_min_lsn() constantly returns '0x00000000000000000000' for valid table names?

I have Change Data Capture (CDC) activated on my MS SQL 2008 database and use the following code to add a new table to the data capture:
EXEC sys.sp_cdc_enable_table
@source_schema ='ordering',
@source_name ='Fields',
@role_name = NULL,
@supports_net_changes = 0;
However, whenever I try to select the changes from the tracking tables using the sys.fn_cdc_get_min_lsn(@TableName) function
SET @Begin_LSN = sys.fn_cdc_get_min_lsn('Fields')
I always get the zero value.
I tried adding the schema name using the following spelling:
SET @Begin_LSN = sys.fn_cdc_get_min_lsn('ordering.Fields')
but this didn't help.
My mistake was to assume that sys.fn_cdc_get_min_lsn() accepts the table name. I was probably misled by the examples in the MSDN documentation and didn't check the exact meaning of the parameters.
It turns out that the sys.fn_cdc_get_min_lsn() accepts the capture instance name, not table name!
A cursory glance at my current capture instances:
SELECT capture_instance FROM cdc.change_tables
returns the correct parameter name:
ordering_Fields
So, one should pass the capture instance name, which by default is the schema and table name joined with an underscore, and not the dot notation that is common in SQL Server.
I know this is mostly already explained in this post, but I thought I would put together my evening's journey through CDC.
This error:
"An insufficient number of arguments were supplied for the procedure or function cdc..."
is probably caused by your low LSN being 0x00.
This in turn might be because you passed the wrong instance name to fn_cdc_get_min_lsn.
Use SELECT * FROM cdc.change_tables to find it.
Lastly, make sure you use binary(10) to store your LSN. If you use just varbinary or binary, the variable defaults to a length of 1 byte and you will again get 0x00. This is clearly payback for me scoffing at all those noobs using varchar and wondering why their strings are truncated to one character.
Sample script:
declare @S binary(10)
declare @E binary(10)
SET @S = sys.fn_cdc_get_min_lsn('dbo_YourTable')
SET @E = sys.fn_cdc_get_max_lsn()
SELECT @S, @E
-- The function name ends with the same capture instance name
SELECT *
FROM [cdc].[fn_cdc_get_net_changes_dbo_YourTable]
(
@S,@E,'all'
)
The above answer is correct. Alternatively, you can pass an additional capture_instance parameter to sys.sp_cdc_enable_table:
EXEC sys.sp_cdc_enable_table
@source_schema ='ordering',
@source_name ='Fields',
@capture_instance = 'dbo_Fields',
@role_name = NULL,
@supports_net_changes = 0;
Then use the capture_instance string in the min_lsn function:
SET @Begin_LSN = sys.fn_cdc_get_min_lsn('dbo_Fields')
This returns the first LSN, and not 0x00000000000000000000.
This is particularly useful when trying to solve the error
"An insufficient number of arguments were supplied for the procedure or function cdc..." from SQL when calling
cdc_get_net_changes_Fields(@Begin_LSN, sys.fn_cdc_get_max_lsn(), 'all')
which simply means "LSN out of expected range".

SQL Server: Concatenating WHERE Clauses. Seeking Appropriate Pattern

I want to take a poorly designed SQL statement that's embedded in C# code and rewrite it as a stored procedure (presumably), and am looking for an appropriate means to address the following pattern:
sql = "SELECT <whatever> FROM <table> WHERE 1=1";
if ( someCondition.HasValue )
{
    sql += " AND <some-field> = " + someCondition.Value;
}
This is a simplification. The actual statement is quite long and contains several such conditions, some of which add INNER JOINs to other tables if the condition is present. This last part is key; otherwise I'd probably be able to solve all of them with:
WHERE <some-condition-value> IS NULL OR <some-field> = <some-condition-value>
I can think of a few possible approaches. I'm looking for the correct approach.
Edit:
I don't want to perform concatenation in C#. I consider this a serious compromise to security.
If I understand the question properly, the idea is to replace a whole section of code in C# in charge of producing, "long hand", a specific SQL statement corresponding to a list of search criteria, by a single call to a stored-procedure which would, SQL-side, use a generic template of the query aimed at handling all allowed combinations of search criteria in a uniform fashion.
In addition to the difficulty of mapping expressions evaluated on the application-side (eg. someCondition.HasValue) to expressions evaluated on the SQL-side (eg "some-condition-value"), the solution you envision may be logically/functionally equivalent to a "hand-crafted" SQL statement, but slower and more demanding of SQL resources.
Essentially, the C# code encapsulates specific knowledge about the "physical" layout of the database and its schema. It uses this info to figure out when a particular JOIN may be required or when a particular application-level search criterion translates to, say, a SQL "LIKE" rather than an "=" predicate. It may also encapsulate business rules such as "when the ZIP code is supplied, search by that rather than by State".
You are right to attempt and decouple the data model (the way the application sees the data) from the data schema (the way it is declared and stored in SQL), but the proper mapping needs to be done somehow, somewhere.
Doing this at the level of the application, with all the expressive power of C# as opposed to say T-SQL, is not necessarily a bad thing, provided it is done
- in a module that is independent of other features of the application
and, where practical,
- it is somewhat data/configuration-driven, so as to allow small changes in the data model (say, the addition of a search criterion) to be implemented by changing a configuration file, rather than by plugging them in somewhere in the middle of a long series of C# conditional statements.
start with this WHERE clause:
WHERE 1=1
then append all conditions as:
AND <some-field> = " + someCondition.Value;
the optimizer will toss out the 1=1 condition and you don't have to worry about too many ANDs
EDIT based on OP's comment about not wanting to concatenate strings:
here is a very comprehensive article on how to handle this topic:
Dynamic Search Conditions in T-SQL by Erland Sommarskog
it covers all the issues and methods of trying to write queries with multiple optional search conditions
here is the table of contents:
Introduction
The Case Study: Searching Orders
The Northgale Database
Dynamic SQL
Introduction
Using sp_executesql
Using the CLR
Using EXEC()
When Caching Is Not Really What You Want
Static SQL
Introduction
x = @x OR @x IS NULL
Using IF statements
Umachandar's Bag of Tricks
Using Temp Tables
x = @x AND @x IS NOT NULL
Handling Complex Conditions
Hybrid Solutions – Using both Static and Dynamic SQL
Using Views
Using Inline Table Functions
Conclusion
Feedback and Acknowledgements
Revision History
Well you can start with
StringBuilder sb = new StringBuilder();
sb.Append("SELECT <whatever> FROM <table> WHERE 1 = 1 ");
if ( someCondition.HasValue )
{
    sb.Append(" AND <some-field> = " + someCondition.Value);
}
// And so on
Will save you the trouble of putting the first WHERE - AND
[Edit]
You can also try this
Create an SP with all required parameters for the table, and write the query like this.
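DECLARE @sqlStatement NVARCHAR(MAX)
SET @sqlStatement = N' SELECT fields1, fields2 FROM TableA WHERE 1 = 1 '
-- Append parameter placeholders, not values, so the input is never parsed as SQL
IF (@param1 IS NOT NULL) SET @sqlStatement = @sqlStatement + N' AND Column1 = @param1'
IF (@param2 IS NOT NULL) SET @sqlStatement = @sqlStatement + N' AND Column2 = @param2'
-- and so on
-- The INT parameter types below are an assumption; use the SP's real types
EXEC sp_executesql @sqlStatement, N'@param1 INT, @param2 INT', @param1 = @param1, @param2 = @param2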
Also you can try similar SP but with:
SELECT fields1, fields2 FROM TableA WHERE 1 = 1
AND ( ( @param1 IS NULL ) OR ( Column1 = @param1 ) )
AND ( ( @param2 IS NULL ) OR ( Column2 = @param2 ) )
this is definitely injection proof!
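The "WHERE 1 = 1 plus optional AND clauses" idea can be combined with bound parameters so that only placeholders, never values, are concatenated. Below is a minimal sketch in Python, with sqlite3 standing in for SQL Server (an assumption made so the example is runnable). TableA, Column1 and Column2 come from the answer above; build_query, the whitelist and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical whitelist: column names are fixed in code, never taken from user input.
ALLOWED_COLUMNS = ("Column1", "Column2")


def build_query(filters):
    """Build the SQL text and parameter list for the filters that are present."""
    sql = "SELECT fields1, fields2 FROM TableA WHERE 1 = 1"
    params = []
    for column in ALLOWED_COLUMNS:
        value = filters.get(column)
        if value is not None:
            sql += f" AND {column} = ?"   # placeholder only; the value is bound later
            params.append(value)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TableA (fields1 TEXT, fields2 TEXT, Column1 INTEGER, Column2 TEXT)"
)
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?, ?, ?)",
    [("a", "b", 1, "x"), ("c", "d", 2, "y")],
)

# Only Column1 is supplied, so only one condition is appended.
sql, params = build_query({"Column1": 2, "Column2": None})
rows = conn.execute(sql, params).fetchall()

if __name__ == "__main__":
    print(sql, params, rows)
```

Compared with the static "IS NULL OR" form, this keeps each generated statement as simple as the supplied filters allow, at the cost of producing several distinct statements.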

Resources