What is the regexp_replace equivalent in SQL Server?

I have this piece of code in Oracle which I need to convert into SQL Server to get the same behavior. I have used the REPLACE function. It seems to be working but I just wanted to make sure.
REGEXP_REPLACE(
phonenumber,
'([[:digit:]]{3})([[:digit:]]{3})([[:digit:]]{4})',
'(\1)\2-\3'
) phonenumber

As Martin said in his answer, SQL Server does not have built-in RegEx functionality (and while it has not been suggested here, just to be clear: no, the [...] wildcard of LIKE and PATINDEX is not RegEx). If your data has little to no variation then yes, you can use some combination of T-SQL functions: REPLACE, SUBSTRING, LEFT, RIGHT, CHARINDEX, PATINDEX, FORMATMESSAGE, CONCAT, and maybe one or two others.
However, if the data / input has even a moderate level of complexity, then the built-in T-SQL functions will at best be cumbersome, and at worst useless. In such cases it's possible to do actual RegEx via SQLCLR (as long as you aren't using Azure SQL Database Single DB or SQL Server 2017+ via AWS RDS), which is (restricted) .NET code running within SQL Server. You can either code your own / find examples here on S.O. or elsewhere, or try a pre-done library such as the one I created, SQL# (SQLsharp), the Free version of which contains several RegEx functions. Please note that SQLCLR, being .NET, does not use POSIX-based RegEx, and hence does not support POSIX character classes (meaning: you will need to use \d for "digits" instead of [:digit:]).
The level of complexity needed in this particular situation is unclear, as the example code in the question implies that the data is simple and uniform (e.g. 1112223333), but the example data shown in a comment on the question appears to indicate that there might be dashes and/or spaces in the data (e.g. xxx- xxx xxxx).
If the data truly is uniform, then stick with the pure T-SQL solution provided by @MartinSmith. But if the data is of sufficient complexity, then please consider the RegEx example below, using a SQLCLR function found in the Free version of my SQL# library (as mentioned earlier), which easily handles the 3 variations of input data and more:
SELECT SQL#.RegEx_Replace4k(tmp.phone,
N'\(?(\d{3})\)?[ .-]*(\d{3})[ .-]*(\d{4})', N'($1)$2-$3',
-1, -- count (-1 == unlimited)
1, -- start at
N'') -- RegEx options
FROM (VALUES (N'8885551212'),
(N'123- 456 7890'),
(N'(777) 555- 4653')
) tmp([phone]);
returns:
(888)555-1212
(123)456-7890
(777)555-4653
The RegEx pattern allows for:
0 or 1 (
3 decimal digits
0 or 1 )
0 or more of: space, period (.), or hyphen (-)
3 decimal digits
0 or more of: space, period (.), or hyphen (-)
4 decimal digits
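To see what the pattern does without installing SQL#, here is a minimal sketch that runs the same pattern through Python's re module (an assumption: Python is only a stand-in for the .NET engine here). Like .NET, Python understands \d but not POSIX classes such as [:digit:]; note that Python writes back-references as \1 in the replacement, whereas .NET uses $1 as in the query above.

```python
import re

# Same pattern as the SQL# example above.
PHONE_PATTERN = re.compile(r"\(?(\d{3})\)?[ .-]*(\d{3})[ .-]*(\d{4})")


def format_phone(value: str) -> str:
    # Python uses \1, \2, \3 for back-references; .NET uses $1, $2, $3.
    return PHONE_PATTERN.sub(r"(\1)\2-\3", value)


for raw in ["8885551212", "123- 456 7890", "(777) 555- 4653"]:
    print(format_phone(raw))
```

Each of the three inputs comes out in the (xxx)xxx-xxxx form listed above.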
NOTE
It was mentioned that the newer Language Extensions might be a better choice than SQLCLR. Language Extensions allow calling R / Python / Java code, hosted outside of SQL Server, via the sp_execute_external_script stored procedure. As the Tutorial: Search for a string using regular expressions (regex) in Java page shows, external scripts are actually not a good choice for many / most uses of RegEx in SQL Server. The main problems are:
Unlike with SQLCLR, the only interface for external scripts is a stored procedure. This means that you can't use any of that functionality inline in a query (SELECT, WHERE, etc).
With external scripts, you pass in the query, work on the results in the external language, and pass back a static result set. This means that compiled code now has to be more specialized (i.e. tightly-coupled) to the particular usage. Changing how the query uses RegEx and/or what columns are returned now requires editing, compiling, testing, and deploying the R / Python / Java code in addition to (and coordinated with!) the T-SQL changes.
I'm sure external scripts are absolutely wonderful, and a better choice than SQLCLR, in certain scenarios. But they certainly do not lend themselves well to the highly varied, and often ad hoc, nature of how RegEx is used (like many / most other functions).

SQL Server does not have native regex support. You would need to use CLR (or, as @Lukasz Szozda points out in the comments, one of the newer Language Extensions).
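If I have understood the regex correctly, though, it matches strings of 10 digits, assigns the first 3 to group 1, the second 3 to group 2 and the last 4 to group 3, and then uses those back references in the replacement expression (\1)\2-\3. (Strictly speaking, the Oracle pattern is not anchored, so it would also rewrite a run of 10 digits inside a longer value; the T-SQL below only reformats values that are exactly 10 digits.)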
You can use built in string functions to do this as below
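SELECT CASE
WHEN phonenumber LIKE REPLICATE('[0-9]', 10)
THEN FORMATMESSAGE('(%s)%s-%s', -- a literal message string needs SQL Server 2016+
LEFT(phonenumber, 3),
SUBSTRING(phonenumber, 4, 3),
RIGHT(phonenumber, 4))
ELSE phonenumber
END AS phonenumber
FROM dbo.YourTable; -- hypothetical table name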
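For readers checking the logic of that CASE expression, here is a rough Python mirror of it (a sketch, not a byte-for-byte model of T-SQL LIKE semantics): a value is reformatted only when it is exactly ten characters, each 0-9, and is otherwise passed through unchanged.

```python
DIGITS = "0123456789"


def format_if_ten_digits(phonenumber: str) -> str:
    # Mirrors: phonenumber LIKE REPLICATE('[0-9]', 10)
    # (exactly ten characters, each one in the range 0-9)
    if len(phonenumber) == 10 and all(ch in DIGITS for ch in phonenumber):
        # LEFT(x, 3), SUBSTRING(x, 4, 3), RIGHT(x, 4); SQL's 1-based
        # SUBSTRING(x, 4, 3) corresponds to the 0-based slice x[3:6]
        return "({}){}-{}".format(phonenumber[:3], phonenumber[3:6], phonenumber[-4:])
    return phonenumber  # ELSE phonenumber


for raw in ["8885551212", "123- 456 7890", "12345"]:
    print(format_if_ten_digits(raw))
```

The explicit "0123456789" check is used instead of str.isdigit(), because isdigit() also accepts other Unicode digit characters that [0-9] would reject.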

You can write a SQL function using CLR that wraps the standard .NET regex. I have written one that you can use; calling it looks like this:
DECLARE @SourceText NVARCHAR(MAX) = N'My first line <br /> My second line';
DECLARE @RegexPattern NVARCHAR(MAX) = N'([<]br\s*/[>])';
DECLARE @Replacement NVARCHAR(MAX) = N'';
DECLARE @IsCaseSensitive BIT = 0;
SELECT regex.Replace(@SourceText, @RegexPattern, @Replacement, @IsCaseSensitive);
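As a hedged illustration of what such a wrapper's four parameters amount to, here is a hypothetical Python stand-in (regex_replace is an invented name, not the CLR function itself). It assumes, based only on the parameter name, that @IsCaseSensitive = 0 means "ignore case".

```python
import re


def regex_replace(source_text: str, pattern: str, replacement: str,
                  is_case_sensitive: bool) -> str:
    # Hypothetical stand-in for regex.Replace: assumes a false
    # is_case_sensitive flag maps to case-insensitive matching.
    flags = 0 if is_case_sensitive else re.IGNORECASE
    return re.sub(pattern, replacement, source_text, flags=flags)


BR_TAG = r"([<]br\s*/[>])"
print(regex_replace("My first line <br /> My second line", BR_TAG, "", False))
print(regex_replace("a<BR/>b", BR_TAG, "", False))  # case ignored: tag removed
print(regex_replace("a<BR/>b", BR_TAG, "", True))   # case-sensitive: unchanged
```

Removing the tag leaves the two spaces that surrounded it, so the first result contains a double space.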

Related

Anyone recognize this Syntax or Programming Language?

I am the local admin for an obscure CRM-ATS, prior to our migration to SFDC in 18 months. It has a (basically BETA) report builder that is not well documented but appears very powerful. I can build custom expressions within the report, but I can't figure out the syntax for all the operators.
Does anyone recognize this code or this list of operators, and can point me to the language, so I can look up the syntax for building these individual expressions?
IF(GREATER_OR_EQUAL(DATE_DIFF(NOW(); JobCurrentStep.StepTime); 14); SUBSTRING("14Days+"; 1); SUBSTRING("<14 Days"; 1))
DATE_DIFF(StepsLinkedPeop.StepStartTime; StepsLinkedPeop.StepEndTime)
COUNT_DISTINCT(People.Person)
IF(LIKE(LinkedJobs.JobClientNameSBD; "MSP"); COUNT_DISTINCT(People.Person); 9)
COUNT_DISTINCT(People.Person)
IF(GREATER_OR_EQUAL(DATE_DIFF(StepChangesJour.StepStartTime; NOW()); 30); COUNT_DISTINCT(People.Person); 0)
COUNT_DISTINCT(LinkedPeople.Applicant)
COUNT(LinkedPeople.Applicant)
DATE_DIFF(StepChangesJour.StepEndTime; StepChangesJour.StepStartTime)
GREATER_OR_EQUAL(DATE_DIFF(StepChangesJour.StepEndTime; StepChangesJour.StepStartTime); 7)
DATE_DIFF(NOW(); JobCurrentStep.StepTime)
IF(GREATER_OR_EQUAL(DATE_DIFF(NOW(); JobCurrentStep.StepTime); 500); SUBSTRING("Greater than 2 Weeks"; 1); SUBSTRING("Recent"; 1))
Here are the available operators:
AVG
CONCAT
COUNT
COUNT_DISTINCT
DATE_ADD_DAYS
DATE_ADD_SECONDS
DATE_DIFF
DATE_DIFF_IN_SECONDS
DATE_DIFF_IN_YEARS
DATE_FORMAT
DIVISION
EQUALS
GREATER
GREATER_OR_EQUAL
GROUP_CONCAT
GROUP_CONCAT_DISTINCT
GROUP_CONCAT_DISTINCT_WITH_HYPHEN
GROUP_CONCAT_DISTINCT_WITH_PIPES
HOUR_DIFF
IF
IF_NULL
IN
INET_NTOA
LIKE
LITERAL_NULL
LOCATE
LOGGED_USER_ID
LOGGED_USER_PERSON_ID
LOGGED_USER_TIMEZONE
MAX
MIN
MINUS
MULTIPLY
NOW
PCT
PLUS
REPLACE
ROUND
SUBSTRING
SUBSTRING_INDEX
SUM
SUM_DISTINCT
TO_DATETIME
TO_INT
TRIM
TRUNCATE
WORKING_DAYS
I can't say for sure, since I have no experience with the language, but this looks like ABAP to me: the high-level language created by the German software company SAP for its business applications.
Many of the operators you've mentioned look like MySQL function names or keywords. Notable examples include:
COUNT_DISTINCT, which looks a lot like COUNT(DISTINCT)
GROUP_CONCAT
INET_NTOA
LIKE
SUBSTRING_INDEX
However, many of the functions you've identified do not appear in MySQL; in particular, basic operators like PLUS, MULTIPLY and EQUALS are not functions in MySQL, and LOGGED_USER_ID and WORKING_DAYS do not appear in MySQL either. Additionally, the function call syntax you've described doesn't match what MySQL uses.
If I had to guess, I'd say you're looking at something custom that "compiles" expressions into MySQL queries.

Removing decimals from a number while keeping thousands delimiters in SQL Server?

I have a number like : 12345.678
I want it to be like : 12,345 ( removing the rest !)
One solution for adding a thousands separator is:
select convert(varchar(100), cast(12345.678 as money), 1)
which yields : 12,345.68
Now I should remove the .68
I stumbled upon a beautiful solution with parseName :
select parsename(convert(varchar(100), cast(12345.678 as money), 1),2)
which yields : 12,345
Question : is there a better solution for this problem ? ( maybe without involving other functions/string manipulations ? )
There's no easy way of doing this in SQL Server 2008. SQL Server 2012 introduced the FORMAT function, which enables you to do the following:
SELECT FORMAT(12345.67, '#,###')
(although this will round the number to 12,346)
Formatting results is something that is not normally in the scope of a database - it's something which should be left to your front-end program/web site/report/spreadsheet etc. However, if you really need to be doing this in SQL Server, I suspect your solution is as close as you're going to get, unless you use CLR Integration to link to the String.Format function from .NET.
If you do go with your solution using parseName, be aware that it may not work internationally (e.g. in parts of Europe where , is used as a decimal separator). This may not be an issue for you, but if it is then you'll need a solution which allows you to explicitly control the formatting.
Parsename is indeed an inventive solution. I have always been using
select replace(convert(varchar,cast(floor(12345.678) as money),1), '.00', '');
-- parsename(convert(varchar,cast(12345.678 as money),1), 2);
Although it's longer. But maybe a direct replacement is faster than a parse routine? In any case, it's not better in the sense that even if it ran 21ns faster, in the greater scheme of things, what are you trying to improve on with such a question?
use floor() to convert it to integer first :
select convert(varchar(100), cast(floor(12345.68) as money), 1)
I support Richard's comment that this is normally done in the 'presentation' layer. Why do you need to format this in SQL Server? Are you viewing reports directly in SQL Server? Usually the first thing someone does is copy and paste into Excel.
Anyway, you can always wrap the expression above in some more string functions:
select replace(
convert(
varchar(100),
cast(floor(12345.68) as money),
1
),'.00','')
There are still possible bugs in this, and again I do not recommend this is done at SQL Server unless you are generating COBOL-like text reports.
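To compare the approaches side by side, here is a small Python sketch (an analogue, not T-SQL: money_style and parsename are hypothetical helpers imitating CONVERT style 1 and PARSENAME). It shows why PARSENAME(..., 2) works (it takes the piece left of the only '.'), how FLOOR-then-format differs from rounding, and one caveat the answers above do not mention: FLOOR rounds negative numbers down, so truncation may be what you actually want for negatives.

```python
import math


def money_style(value: float) -> str:
    # Imitates CONVERT(varchar(100), CAST(x AS money), 1):
    # thousands separators and two decimals
    return f"{value:,.2f}"


def parsename(text: str, part: int):
    # Rough analogue of PARSENAME: split on '.', count pieces from the right (1-based)
    pieces = text.split(".")
    return pieces[-part] if 1 <= part <= len(pieces) else None


value = 12345.678
print(money_style(value))                # 12,345.68
print(parsename(money_style(value), 2))  # 12,345   (the PARSENAME approach)
print(f"{math.floor(value):,}")          # 12,345   (FLOOR first, then format)
print(f"{round(12345.67):,}")            # 12,346   (rounding instead of truncating)
print(f"{math.floor(-value):,}")         # -12,346  (FLOOR rounds negatives down)
print(f"{math.trunc(-value):,}")         # -12,345  (truncation just drops decimals)
```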

How to get data from database according to string length without using any string function

I have to get the records from a table where the length of a field's value is greater than 8 characters. I cannot use any string function, as the query has to run on MySQL, MSSQL and Oracle.
I don't want to do the below EXAMPLE:
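List<String> names = new ArrayList<String>();
List<String> result = new ArrayList<String>();
String st = "SELECT name FROM table";
ResultSet rs = executeSQL(st); // executeSQL is my own helper
if ( rs != null )
{
while (rs.next()) // read every row, not just the first
{
names.add(rs.getString(1));
}
}
for(String name : names)
{
if(name.length() > 8)
result.add(name);
}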
Any idea other than the one coded above? I'd like a query that returns the required result directly, instead of retrieving all the data and filtering it afterwards.
Thank you for any help / clue.
JDBC drivers may implement JDBC escapes for the functions listed in Appendix D (Scalar Functions) of the JDBC specification. A driver should convert the scalar functions it supports to the appropriate function on the database side. A list of the supported functions can be queried using DatabaseMetaData.getStringFunctions().
To use this in a query, you would use either CHAR_LENGTH(string) or LENGTH(string), like:
SELECT * FROM table WHERE {fn CHAR_LENGTH(field)} > 8
You can replace CHAR_LENGTH with LENGTH. The driver (if it supports this function) will then convert it to the appropriate function in the underlying database.
From section 13.4.1 Scalar Functions of the JDBC 4.1 specification:
Appendix D “Scalar Functions" provides a list of the scalar functions
a driver is expected to support. A driver is required to implement
these functions only if the data source supports them, however.
The escape syntax for scalar functions must only be used to invoke the
scalar functions defined in Appendix D “Scalar Functions". The escape
syntax is not intended to be used to invoke user-defined or vendor
specific scalar functions.
I think you may be better off leveraging the power of the database and implementing a factory for your SQL statements (or perhaps for objects encapsulating your SQL functionality).
That way you can configure your factory with the name/type of the database, and it'll give you the appropriate SQL statements for that database. It gives you a clean means of parameterising this info, whilst allowing you to leverage the functionality of your databases and not having to replicate the database functionality in a suboptimal fashion in your code.
e.g.
DatabaseStatementFactory fac = DatabaseStatementFactory.forDatabase(NAME_OF_DATABASE);
String statement = fac.getLongNames();
// then use this statement. It'll be configured for each db type
It's probably wise to encapsulate further and use something like:
DatabaseStatementFactory fac = DatabaseStatementFactory.forDatabase(NAME_OF_DATABASE);
List<String> names = fac.getLongNames();
such that you're not making assumptions re. common schema and means of queries etc.
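A minimal Python sketch of the factory idea, under stated assumptions. The class, the method names and the people table are all hypothetical. Each vendor gets its own spelling of "length": MySQL's LENGTH() counts bytes, so CHAR_LENGTH() is used there, and SQL Server's LEN() ignores trailing spaces. The ? placeholders follow JDBC style, and SQLite is included only so that the sketch runs.

```python
import sqlite3

# Hypothetical per-vendor SQL for one logical query: "names longer than N characters".
LONG_NAMES_SQL = {
    "mysql": "SELECT name FROM people WHERE CHAR_LENGTH(name) > ?",
    "sqlserver": "SELECT name FROM people WHERE LEN(name) > ?",
    "oracle": "SELECT name FROM people WHERE LENGTH(name) > ?",
    "sqlite": "SELECT name FROM people WHERE LENGTH(name) > ?",  # demo only
}


class DatabaseStatementFactory:
    """Hands out the statement text configured for one database type."""

    def __init__(self, database: str) -> None:
        key = database.lower()
        if key not in LONG_NAMES_SQL:
            raise ValueError(f"no statements configured for {database!r}")
        self._key = key

    @classmethod
    def for_database(cls, database: str) -> "DatabaseStatementFactory":
        return cls(database)

    def get_long_names(self) -> str:
        return LONG_NAMES_SQL[self._key]


# Demo against an in-memory SQLite database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany("INSERT INTO people VALUES (?)",
                 [("Ann",), ("Bartholomew",), ("Christina",)])
sql = DatabaseStatementFactory.for_database("sqlite").get_long_names()
print([row[0] for row in conn.execute(sql, (8,))])
```

Calling code only ever asks for "long names". Supporting another database means adding one entry, not touching every caller.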
Another solution that I found is:
Select name from table where name like '________';
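SQL counts the underscore (_) characters. Each one matches exactly one character, so this returns names whose length equals the number of underscores. To get names longer than 8 characters, as the question asks, use nine underscores followed by % (which matches any number of further characters): Select name from table where name like '_________%';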
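Here is a runnable sketch of the LIKE-based length filter. SQLite, via Python's sqlite3, stands in for MySQL, MSSQL and Oracle, which give _ and % the same meaning. The people table and its rows are made up. Fixed-width CHAR columns padded with trailing spaces may behave differently from engine to engine, so it is worth testing on each target database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany("INSERT INTO people (name) VALUES (?)",
                 [("Ann",), ("Benjamin",), ("Christina",), ("Maximilian",)])


def names_like(pattern: str) -> list:
    # '_' matches exactly one character, '%' matches zero or more characters
    rows = conn.execute("SELECT name FROM people WHERE name LIKE ?", (pattern,))
    return sorted(row[0] for row in rows)


print(names_like("_" * 8))        # exactly 8 characters
print(names_like("_" * 9 + "%"))  # 9 or more characters, i.e. length > 8
```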

What exactly does the T-SQL "LineNo" reserved word do?

I was writing a query against a table today on a SQL Server 2000 box, and while writing the query in Query Analyzer, to my surprise I noticed the word LineNo was converted to blue text.
It appears to be a reserved word according to MSDN documentation, but I can find no information on it, just speculation that it might be a legacy reserved word that doesn't do anything.
I have no problem escaping the field name, but I'm curious -- does anyone know what "LineNo" in T-SQL is actually used for?
OK, this is completely undocumented, and I had to figure it out via trial and error, but it sets the line number for error reporting. For example:
LINENO 25
SELECT * FROM NON_EXISTENT_TABLE
The above gives an error message reporting the error at line 27 (if you instead turn the LINENO line into a single-line comment, e.g. by prefixing it with two hyphens, the error is reported at line 3):
Msg 208, Level 16, State 1, Line 27
Invalid object name 'NON_EXISTENT_TABLE'.
This is related to similar mechanisms in programming languages, such as the #line preprocessor directives in Visual C++ and Visual C# (which are documented, by the way).
How is this useful, you may ask? One use is to help SQL code generators, which generate code from some higher-level language (than SQL) and/or perform macro expansion, tie generated code lines back to the user's code lines.
P.S. It is not a good idea to rely on undocumented features, especially when dealing with a database.
Update: This explanation is still correct up to and including the current version of SQL Server, which at the time of this writing is SQL Server 2008 R2 Cumulative Update 5 (10.50.1753.0) .
Depending on where you use it, you can always use [LineNo]. For example:
select LnNo [LineNo] from OrderLines;

Is this sufficient to prevent query injection while using SQL Server?

I have recently taken on a project in which I need to integrate with PHP/SQL Server. I am looking for the quickest and easiest function to prevent SQL injection on SQL Server as I prefer MySQL and do not anticipate many more SQL Server related projects.
Is this function sufficient?
$someVal = mssql_escape($_POST['someVal']);
$query = "INSERT INTO tblName (field) VALUES ($someVal)";
mssql_query($query);
function mssql_escape($str) {
return str_replace("'", "''", $str);
}
If not, what additional steps should I take?
EDIT:
I am running on a Linux server; sqlsrv_query() only works if your hosting environment is Windows.
The best option: do not use SQL statements that get concatenated together - use parametrized queries.
E.g. do not create something like
string stmt = "INSERT INTO dbo.MyTable(field1,field2) VALUES(" + value1 + ", " + value2 + ")";
or something like that and then try to "sanitize" it by replacing single quotes or something - you'll never catch everything, someone will always find a way around your "safeguarding".
Instead, use:
string stmt = "INSERT INTO dbo.MyTable(field1,field2) VALUES(@value1, @value2)";
and then set the parameter values before executing this INSERT statement. This is really the only reliable way to avoid SQL injection - use it!
UPDATE: how to use parametrized queries from PHP - I found something here - does that help at all?
$tsql = "INSERT INTO DateTimeTable (myDate, myTime,
myDateTimeOffset, myDatetime2)
VALUES (?, ?, ?, ?)";
$params = array(
date("Y-m-d"), // Current date in Y-m-d format.
"15:30:41.987", // Time as a string.
date("c"), // Current date in ISO 8601 format.
date("Y-m-d H:i:s.u") // Current date and time.
);
$stmt = sqlsrv_query($conn, $tsql, $params);
So it seems you can't use "named" parameters like @value1, @value2, but instead you just use question marks ? for each parameter, and you basically just create a parameter array which you then pass into the query.
This article Accessing SQL Server Databases with PHP might also help - it has a similar sample of how to insert data using the parametrized queries.
UPDATE: after you've revealed that you're on Linux, this approach doesn't work anymore. Instead, you need to use an alternate library in PHP to call a database - something like PDO.
PDO should work both on any *nix type operating system, and against all sorts of databases, including SQL Server, and it supports parametrized queries, too:
$db = new PDO('your-connection-string-here');
$stmt = $db->prepare("SELECT priv FROM testUsers WHERE username=:username AND password=:password");
$stmt->bindParam(':username', $user);
$stmt->bindParam(':password', $pass);
$stmt->execute();
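The same named-parameter pattern can be tried without a SQL Server instance. In this sketch Python's sqlite3 stands in for PDO, using the :name placeholder syntax. It shows that a classic injection payload is compared as a literal string rather than becoming part of the SQL. The plain-text password column exists only to mirror the PDO example; real systems should store password hashes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Plain-text passwords only to mirror the PDO example; store hashes in real code.
conn.execute("CREATE TABLE testUsers (username TEXT, password TEXT, priv TEXT)")
conn.execute("INSERT INTO testUsers VALUES (?, ?, ?)", ("alice", "s3cret", "admin"))


def lookup_priv(user: str, password: str):
    # Named placeholders are bound by the driver (like bindParam() above);
    # the values never become part of the SQL text.
    row = conn.execute(
        "SELECT priv FROM testUsers WHERE username = :username AND password = :password",
        {"username": user, "password": password},
    ).fetchone()
    return row[0] if row else None


print(lookup_priv("alice", "s3cret"))       # admin
print(lookup_priv("alice", "' OR '1'='1"))  # None: payload treated as plain data
```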
No, it's not sufficient. To my knowledge, string replacement can never really be sufficient in general (on any platform).
To prevent SQL injection, all queries need to be parameterized - either as parameterized queries or as stored procedures with parameters.
In these cases, the database calling library (e.g. ADO.NET and SqlCommand) sends the parameters separately from the query and the server applies them, which eliminates the ability for the actual SQL to be altered in any way. This has numerous benefits besides preventing injection, including avoiding code page issues and date conversion issues; for that matter, any conversion to string can be problematic if the server does not expect it done the way the client does it.
I partially disagree with other posters. If you run all your parameters through a function that doubles the quotes, this should prevent any possible injection attack. Actually, in practice the more frequent problem is not deliberate sabotage but queries that break because a value legitimately includes a single quote, like a customer named "O'Hara" or a comment field of "Don't call Sally before 9:00". Anyway, I do escapes like this all the time and have never had a problem.
One caveat: On some database engines, there could be other dangerous characters besides a single quote. The only example I know is Postgres, where the backslash is magic. In this case your escape function must also double backslashes. Check the documentation.
I have nothing against using prepared statements, and for simple cases, where the only thing that changes is the value of the parameter, they are an excellent solution. But I routinely find that I have to build queries in pieces based on conditions in the program, like if parameter X is not null then not only do I need to add it to the where clause but I also need an additional join to get to the value I really need to test. Prepared statements can't handle this. You could, of course, build the SQL in pieces, turn it into a prepared statement, and then supply the parameters. But this is just a pain for no clear gain.
These days I mostly code in Java, which allows functions to be overloaded, that is, to have multiple implementations depending on the type of the parameter passed in. So I routinely write a set of functions that I normally name simply "q" for "quote", which return the given value, suitably quoted. For strings, it doubles any quote marks, then slaps quote marks around the whole thing. For integers it just returns the string representation of the integer. For dates it converts to the JDBC (Java SQL) standard date format, which the driver is then supposed to convert to whatever is needed for the specific database being used. Etc. (On my current project I even included array as a passed-in type, which I convert to a format suitable for use in an IN clause.) Then every time I want to include a field in a SQL statement, I just write "q(x)". As this slaps quotes on when necessary, I don't need the extra string manipulation to put on quotes, so it's probably just as easy as not doing the escape.
For example, vulnerable way:
String myquery="select name from customer where customercode='"+custcode+"'";
Safe way:
String myquery="select name from customer where customercode="+q(custcode);
The right way is not particularly more to type than the wrong way, so it's easy to get in a good habit.
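Here is a hedged Python translation of the "q" helper idea. Python has no Java-style overloading, so functools.singledispatch plays that role. Only strings, integers, booleans and lists (for IN clauses) are covered, and dates are deliberately left out. The helper is only as safe as the escaping caveats raised elsewhere in this thread, and parameterized queries remain the safer default.

```python
from functools import singledispatch


@singledispatch
def q(value) -> str:
    # Fallback: refuse types without a quoting rule instead of guessing
    raise TypeError(f"q() has no quoting rule for {type(value).__name__}")


@q.register
def _(value: str) -> str:
    # Double every single quote, then wrap the whole value in single quotes
    return "'" + value.replace("'", "''") + "'"


@q.register
def _(value: int) -> str:
    # Integers are emitted as their decimal representation, unquoted
    return str(value)


@q.register
def _(value: bool) -> str:
    # bool is a subclass of int; emit 1/0 rather than "True"/"False"
    return "1" if value else "0"


@q.register
def _(value: list) -> str:
    # Quote each element and build a list usable in an IN (...) clause
    return "(" + ", ".join(q(item) for item in value) + ")"


custcode = "O'Hara"
myquery = "select name from customer where customercode=" + q(custcode)
print(myquery)
```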
String replacement to escape quotes IS sufficient to prevent SQL injection attack vectors.
This only applies to SQL Server when QUOTED_IDENTIFIER is ON, and when you don't do something stoopid to your escaped string, such as truncating it or translating your Unicode string to an 8-bit string after escaping. In particular, you need to make sure QUOTED_IDENTIFIER is set to ON. Usually that's the default, but it may depend on the library you are using in PHP to access MSSQL.
Parameterization is a best practice, but there is nothing inherently insecure about escaping quotes to prevent SQL injection, with due care.
The real issue with escaping strings is not the efficacy of the replacement; it is the potential for forgetting to do the replacement every time.
That said, your code escapes the value, but does not wrap the value in quotes. You need something like this instead:
function mssql_escape($str) {
return "N'" . str_replace("'", "''", $str) . "'"; // PHP concatenates with ".", not "+"
}
The N above allows you to pass higher Unicode characters. If that's not a concern (i.e., your text fields are varchar rather than nvarchar), you can remove the N.
Now, if you do this, there are some caveats:
You need to make DAMNED SURE you call mssql_escape for every string value. And therein lies the rub.
Dates and GUID values also need escaping in the same manner.
You should validate numeric values, or at least escape them as well using the same function (MSSQL will cast the string to the appropriate numeric type).
Again, like others have said, parameterized queries are safer--not because escaping quotes doesn't work (it does except as noted above), but because it's easier to visually make sure you didn't forget to escape something.
