Parsing CREATE TABLE sql with regex - c

I'm building a simple SQL-like language with regex, and I'm trying to parse the CREATE TABLE statement, but it does not work. I use this:
reti = regcomp(&regex, "CREATE TABLE [a-zA-Z]\\((.*)\\)", REG_EXTENDED);
It's something simple; I'm just doing it to learn...
What is wrong with the regex?

Apart from only matching one-letter table names, you are either omitting whitespace before the opening parenthesis or, depending on the SQL and regex engine/string syntax, over-escaping the parentheses in your expression.
Check:
[A-Za-z]+\\s*
if it doesn't work and the plus sign is not recognized,
[A-Za-z][A-Za-z]*\\s*
and check whether it's \\\\( or just \\( (it should be the latter, but better be sure).
This supports names such as Antinoo and cUsToMeRs, but not Invoices_New or Suppliers2014. You might want to add numbers and underscores to your regex. Since table names probably won't start with numbers, you'll want
[A-Z_a-z][A-Z_a-z0-9]*\\s*\\(([^;]*)\\)
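A quick way to sanity-check the final pattern outside of C is sketched below in Python. This is only a test bench under a few assumptions: the helper name parse_create_table is made up for illustration, the name is wrapped in an extra capture group, and Python's re module is not the POSIX engine behind regcomp. In particular, \s is not guaranteed by POSIX extended regexes, so in C the portable spelling would be [[:space:]].

```python
import re

# Name: a letter or underscore, then letters, digits, or underscores.
# \s+ and \s* allow whitespace after TABLE and before the parenthesis.
CREATE_TABLE = re.compile(
    r"CREATE TABLE\s+([A-Z_a-z][A-Z_a-z0-9]*)\s*\(([^;]*)\)"
)

def parse_create_table(statement):
    """Return (table_name, column_text), or None if there is no match."""
    m = CREATE_TABLE.search(statement)
    return (m.group(1), m.group(2)) if m else None

print(parse_create_table("CREATE TABLE Invoices_New (id INT, total INT);"))
# ('Invoices_New', 'id INT, total INT')
print(parse_create_table("CREATE TABLE 2014Suppliers (id INT)"))
# None (names may not start with a digit)
```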

Related

Snowflake and Regular Expressions - issue when implementing known good expression in SF

I'm looking for some assistance in debugging a REGEXP_REPLACE() statement.
I have been using an online regular expressions editor to build expressions, and then the SF regexp_* functions to implement them. I've attempted to remain consistent with the SF regex implementation, but I'm seeing an inconsistency in the returned results that I'm hoping someone can explain :)
My intent is to replace commas within the text (excluding commas inside double-quoted text) with a new delimiter (#^#).
Sample text string:
"Foreign Corporate Name Registration","99999","Valuation Research",,"Active Name",02/09/2020,"02/09/2020","NEVADA","UNITED STATES",,,"123 SOME STREET",,"MILWAUKEE","WI","53202","UNITED STATES","123 SOME STREET",,"MILWAUKEE","WI","53202","UNITED STATES",,,,,,,,,,,,
RegEx command and Substitution (working in regex101.com):
([("].*?["])*?(,)
\1#^#
regex101.com Result:
"Foreign Corporate Name Registration"#^#"99999"#^#"Valuation Research"#^##^#"Active Name"#^#02/09/2020#^#"02/09/2020"#^#"NEVADA"#^#"UNITED STATES"#^##^##^#"123 SOME STREET"#^##^#"MILWAUKEE"#^#"WI"#^#"53202"#^#"UNITED STATES"#^#"123 SOME STREET"#^##^#"MILWAUKEE"#^#"WI"#^#"53202"#^#"UNITED STATES"#^##^##^##^##^##^##^##^##^##^##^##^#
When I try and implement this same logic in SF using REGEXP_REPLACE(), I am using the following statement:
SELECT TOP 500
A.C1
,REGEXP_REPLACE((A."C1"),'([("].*?["])*?(,)','\\1#^#') AS BASE
FROM
"<Warehouse>"."<database>"."<table>" AS A
This statement returns the result for BASE:
"Foreign Corporate Name Registration","99999","Valuation Research",,"Active Name",02/09/2020,"02/09/2020","NEVADA","UNITED STATES",,,"123 SOME STREET",,"MILWAUKEE","WI","53202","UNITED STATES","123 SOME STREET",,"MILWAUKEE","WI","53202","UNITED STATES"#^##^##^##^##^##^##^##^##^##^##^##^#
As you can see when comparing the results, the SF result set is only replacing commas at the tail-end of the text.
Can anyone tell me why regex101.com and SF return different results for the same statement? Is my expression non-compliant with the SF implementation of RegEx - and if yes, can you tell me why?
Many many thanks for your time and effort reading this far!
Happy Wednesday,
Casey.
Using .*? for lazy matching is limited to PCRE, which Snowflake does not support. To see this, in regex101.com, change your "flavor" to anything other than PCRE (PHP); you will see that your ([("].*?["])*?(,) regex no longer achieves what you are expecting.
I believe that this will work for your purposes:
REGEXP_REPLACE(A.C1,'("[^"]*")*,','\\1#^#')
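The logic of that pattern (an optional run of quoted fields, then a comma) can be checked outside Snowflake with a small Python sketch. Python's re is not Snowflake's engine, so this only exercises the pattern itself; the function name redelimit and the shortened sample row are assumptions for illustration. Note that a field containing doubled quotes ("a""b") would not survive this pattern intact, since only the last repetition of the group is kept.

```python
import re

# An optional run of double-quoted fields followed by a comma. No lazy
# quantifier is needed: [^"]* cannot run past the closing quote.
FIELD_THEN_COMMA = re.compile(r'("[^"]*")*,')

def redelimit(line, delimiter="#^#"):
    """Replace field-separating commas, leaving commas inside quotes alone."""
    # group(1) is None when the comma follows an empty or unquoted field.
    return FIELD_THEN_COMMA.sub(
        lambda m: (m.group(1) or "") + delimiter, line
    )

print(redelimit('"A, Inc","99",,02/09/2020,"WI",'))
# "A, Inc"#^#"99"#^##^#02/09/2020#^#"WI"#^#
```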

Use of pipe symbol in Select clause

Is there any way to use the pipe symbol || in a select clause?
I came across the following query in an article (probably to concatenate two values), but when I try to use the same in my query I get a syntax error.
select FirstName ||''|| LastName As CustomerName from Customer
Please correct me if I am using the wrong syntax.
You can use CONCAT() function, which works in SQL Server 2012 and above, or just a plain + sign to do concatenation.
https://msdn.microsoft.com/en-us/library/hh231515(v=sql.110).aspx
Returns a string that is the result of concatenating two or more
string values.
You need to use '+' or CONCAT() instead of the pipe if you are using SQL Server. The pipe operator is not used in SQL Server.
It is used to concatenate your columns and output a single result, i.e. in one column.
For example, if I want to see first name and last name together in one column, then I could use pipes:
SELECT Fname||Lname FROM my_table;
If you are asking whether you can use pipes || for concatenation in Microsoft SQL, then the short answer is no.
If you’re asking about the concatenation operator itself, then read on.
|| is the standard ANSI concatenation operator. This is apparent in PostgreSQL, SQLite and Oracle, among others.
Microsoft, however uses +, because, why not. Except Microsoft Access uses &, because, why not.
MariaDB/MySQL have two modes. In traditional mode, || is interpreted as “or”, and there is no concatenation operator. In ANSI mode, || is interpreted as the concatenation operator.
Most DBMSs (not SQLite) also have a non-standard concat() function. In several of them, such as SQL Server and PostgreSQL, it treats NULLs as empty strings, so it’s a bit more forgiving if you don’t care about NULLs; MySQL’s CONCAT() is an exception and returns NULL if any argument is NULL.
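To see the || operator in action without installing a server, here is a minimal sketch using Python's built-in sqlite3 module (SQLite being one of the engines listed above that accepts ||). The Customer rows are made-up sample data. It also shows the NULL behaviour of the operator: if either side is NULL, the result is NULL unless you COALESCE it yourself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (FirstName TEXT, LastName TEXT)")
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?)",
    [("Ada", "Lovelace"), ("Cher", None)],  # made-up sample rows
)

# || concatenates, but a NULL operand makes the whole result NULL.
plain = conn.execute(
    "SELECT FirstName || ' ' || LastName FROM Customer ORDER BY FirstName"
).fetchall()

# COALESCE turns the NULL into an empty string first.
forgiving = conn.execute(
    "SELECT FirstName || ' ' || COALESCE(LastName, '') "
    "FROM Customer ORDER BY FirstName"
).fetchall()

print(plain)      # [('Ada Lovelace',), (None,)]
print(forgiving)  # [('Ada Lovelace',), ('Cher ',)]
```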

Regular Expression in Visual Studio Find & Replace - multiple spaces between search terms

I require a regular expression for the Visual Studio Search and Replace functionality, as follows:
Search for the following term: sectorkey in (
There could be multiple spaces between each of the above 3 search terms, or even multiple line breaks/carriage returns.
The search term is looking for SQL statements that have hard-coded SectorKey values inside a SQL in statement. These need to be replaced with a SQL join statement - this will be done manually.
The little arrow to the right of the Find What box is your friend and can help you with the vagaries of the MS regex syntax.
Newline is represented by \n, so you can just do sectorkey( |\n)+in( |\n)+\( (You need to escape the open paren in your search expression, since that's used in grouping.)
I believe :Wh+ is what you want. The Visual Studio regex flavor is very strange; you'll tend to get better results if you consult the official reference. Expertise with "mainstream" regexes tends to be more of a handicap than a help when it comes to VS.
You can use \s+ to search for one or more adjacent whitespace characters (including tab, CR, LF etc), so your regex would presumably end up looking something like sectorkey\s+in\s+\(.
Edit...
As Joe points out in his comment, it seems that Visual Studio doesn't support \s in Find/Replace expressions, in which case you'll probably need to use something like [\n:b] instead. The regex would then become sectorkey[\n:b]+in[\n:b]+\(.
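For comparison, this is how the "mainstream" \s+ version behaves when run outside Visual Studio, sketched in Python. The function name and the sample SQL are assumptions for illustration, and the case-insensitive flag is added because the column is written SectorKey in the question; none of this says anything about what the Visual Studio dialog itself accepts.

```python
import re

# One or more whitespace characters (spaces, tabs, CR, LF) between terms.
HARDCODED_IN = re.compile(r"sectorkey\s+in\s+\(", re.IGNORECASE)

def find_hardcoded_in(sql_text):
    """Return the start offset of every 'sectorkey in (' occurrence."""
    return [m.start() for m in HARDCODED_IN.finditer(sql_text)]

sample = (
    "SELECT * FROM Funds WHERE SectorKey   IN\r\n   (1, 2, 3)\n"
    "UNION SELECT * FROM Funds WHERE sectorkey in (4)"
)
print(len(find_hardcoded_in(sample)))  # 2
```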

Is this sufficient to prevent query injection while using SQL Server?

I have recently taken on a project in which I need to integrate with PHP/SQL Server. I am looking for the quickest and easiest function to prevent SQL injection on SQL Server as I prefer MySQL and do not anticipate many more SQL Server related projects.
Is this function sufficient?
$someVal = mssql_escape($_POST['someVal']);
$query = "INSERT INTO tblName SET field = $someVal";
mssql_execute($query);
function mssql_escape($str) {
return str_replace("'", "''", $str);
}
If not, what additional steps should I take?
EDIT:
I am running on a Linux server - sqlsrv_query() only works if your hosting environment is windows
The best option: do not use SQL statements that get concatenated together - use parametrized queries.
E.g. do not create something like
string stmt = "INSERT INTO dbo.MyTable(field1,field2) VALUES(" + value1 + ", " + value2 + ")"
or something like that and then try to "sanitize" it by replacing single quotes or something - you'll never catch everything, someone will always find a way around your "safe guarding".
Instead, use:
string stmt = "INSERT INTO dbo.MyTable(field1,field2) VALUES(@value1, @value2)";
and then set the parameter values before executing this INSERT statement. This is really the only reliable way to avoid SQL injection - use it!
UPDATE: how to use parametrized queries from PHP - I found something here - does that help at all?
$tsql = "INSERT INTO DateTimeTable (myDate, myTime,
myDateTimeOffset, myDatetime2)
VALUES (?, ?, ?, ?)";
$params = array(
date("Y-m-d"), // Current date in Y-m-d format.
"15:30:41.987", // Time as a string.
date("c"), // Current date in ISO 8601 format.
date("Y-m-d H:i:s.u") // Current date and time.
);
$stmt = sqlsrv_query($conn, $tsql, $params);
So it seems you can't use "named" parameters like @value1, @value2, but instead you just use question marks ? for each parameter, and you basically just create a parameter array which you then pass into the query.
This article Accessing SQL Server Databases with PHP might also help - it has a similar sample of how to insert data using the parametrized queries.
UPDATE: after you've revealed that you're on Linux, this approach doesn't work anymore. Instead, you need to use an alternate library in PHP to call a database - something like PDO.
PDO should work both on any *nix type operating system, and against all sorts of databases, including SQL Server, and it supports parametrized queries, too:
$db = new PDO('your-connection-string-here');
$stmt = $db->prepare("SELECT priv FROM testUsers WHERE username=:username AND password=:password");
$stmt->bindParam(':username', $user);
$stmt->bindParam(':password', $pass);
$stmt->execute();
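The same idea can be tried end to end without a SQL Server instance. The sketch below uses Python's built-in sqlite3 module purely as a stand-in database (an assumption for illustration; the table and column names are borrowed from the question). The hostile value is passed separately from the SQL text, so it is stored as plain data and never parsed as SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblName (field TEXT)")

# Input that would break out of a hand-built string literal.
hostile = "x'); DROP TABLE tblName; --"

# The ? placeholder is filled in by the driver, not by string concatenation.
conn.execute("INSERT INTO tblName (field) VALUES (?)", (hostile,))

stored = conn.execute("SELECT field FROM tblName").fetchone()[0]
print(stored == hostile)  # True: stored verbatim, table still exists
```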
No, it's not sufficient. To my knowledge, string replacement can never really be sufficient in general (on any platform).
To prevent SQL injection, all queries need to be parameterized - either as parameterized queries or as stored procedures with parameters.
In these cases, the database calling library (i.e. ADO.NET and SQL Command) sends the parameters separately from the query and the server applies them, which eliminates the ability for the actual SQL to be altered in any way. This has numerous benefits besides injection, which include code page issues and date conversion issues - for that matter any conversions to string can be problematic if the server does not expect them done the way the client does them.
I partially disagree with other posters. If you run all your parameters through a function that doubles the quotes, this should prevent any possible injection attack. Actually, in practice the more frequent problem is not deliberate sabotage but queries that break because a value legitimately includes a single quote, like a customer named "O'Hara" or a comment field of "Don't call Sally before 9:00". Anyway, I do escapes like this all the time and have never had a problem.
One caveat: On some database engines, there could be other dangerous characters besides a single quote. The only example I know is Postgres, where the backslash is magic. In this case your escape function must also double backslashes. Check the documentation.
I have nothing against using prepared statements, and for simple cases, where the only thing that changes is the value of the parameter, they are an excellent solution. But I routinely find that I have to build queries in pieces based on conditions in the program, like if parameter X is not null then not only do I need to add it to the where clause but I also need an additional join to get to the value I really need to test. Prepared statements can't handle this. You could, of course, build the SQL in pieces, turn it into a prepared statement, and then supply the parameters. But this is just a pain for no clear gain.
These days I mostly code in Java, which allows functions to be overloaded, that is, to have multiple implementations depending on the type of the passed-in parameter. So I routinely write a set of functions that I normally name simply "q" for "quote", that return the given type, suitably quoted. For strings, it doubles any quote marks, then slaps quote marks around the whole thing. For integers it just returns the string representation of the integer. For dates it converts to the JDBC (Java SQL) standard date format, which the driver is then supposed to convert to whatever is needed for the specific database being used. Etc. (On my current project I even included array as a passed-in type, which I convert to a format suitable for use in an IN clause.) Then every time I want to include a field in a SQL statement, I just write "q(x)". As this is slapping quotes on when necessary, I don't need the extra string manipulation to put on quotes, so it's probably just as easy as not doing the escape.
For example, vulnerable way:
String myquery="select name from customer where customercode='"+custcode+"'";
Safe way:
String myquery="select name from customer where customercode="+q(custcode);
The right way is not particularly more to type than the wrong way, so it's easy to get in a good habit.
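A rough idea of what such a family of q functions might look like, sketched here in Python with functools.singledispatch standing in for Java overloading. This is a guess at the shape of the helper described above, not the answerer's actual code: it covers only strings, integers and lists, and it assumes an engine where the single quote is the only character that needs doubling.

```python
from functools import singledispatch

@singledispatch
def q(value):
    """Quote a value for inclusion in a SQL string (hypothetical helper)."""
    raise TypeError(f"no quoting rule for {type(value).__name__}")

@q.register(str)
def _(value):
    # Double any single quotes, then wrap the whole thing in quotes.
    return "'" + value.replace("'", "''") + "'"

@q.register(int)
def _(value):
    # Integers need no quoting, just their string representation.
    return str(value)

@q.register(list)
def _(value):
    # Lists become a parenthesised list suitable for an IN clause.
    return "(" + ", ".join(q(v) for v in value) + ")"

custcode = "O'Hara"
print("select name from customer where customercode=" + q(custcode))
# select name from customer where customercode='O''Hara'
print("select name from customer where id in " + q([1, 2, 3]))
# select name from customer where id in (1, 2, 3)
```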
String replacement to escape quotes IS sufficient to prevent SQL injection attack vectors.
This only applies to SQL Server when QUOTED_IDENTIFIER is ON, and when you don't do something stoopid to your escaped string, such as truncating it or translating your Unicode string to an 8-bit string after escaping. In particular, you need to make sure QUOTED_IDENTIFIER is set to ON. Usually that's the default, but it may depend on the library you are using in PHP to access MSSQL.
Parameterization is a best practice, but there is nothing inherently insecure about escaping quotes to prevent SQL injection, with due care.
The real issue with escaping strings is not the efficacy of the replacement; it is the potential for forgetting to do the replacement every time.
That said, your code escapes the value, but does not wrap the value in quotes. You need something like this instead:
function mssql_escape($str) {
    // PHP concatenates strings with ".", not "+"
    return "N'" . str_replace("'", "''", $str) . "'";
}
The N above allows you to pass higher Unicode characters. If that's not a concern (i.e., your text fields are varchar rather than nvarchar), you can remove the N.
Now, if you do this, there are some caveats:
You need to make DAMNED SURE you call mssql_escape for every string value. And therein lies the rub.
Dates and GUID values also need escaping in the same manner.
You should validate numeric values, or at least escape them as well using the same function (MSSQL will cast the string to the appropriate numeric type).
Again, like others have said, parameterized queries are safer--not because escaping quotes doesn't work (it does except as noted above), but because it's easier to visually make sure you didn't forget to escape something.

When naming columns in a SQL Server table, are there any names I should avoid using?

I remember when I was working with PHP several years back I could blow up my application by naming a MySQL column 'desc' or any other term that was used as an operator.
So, in general are there names I should avoid giving my table columns?
As long as you surround every column name with '[' and ']', it really doesn't matter what you use. Even a space works (try it: [ ]).
Edit: If you can't use '[' and ']' in every case, check the documentation for characters that are not allowable as well as keywords that are intrinsic to the system; those would be out of bounds. Off the top of my head, the characters allowed (for SqlServer) for an identifier are: a-z, A-Z, 0-9, _, $, #.
In general, don't start with a number, don't use spaces, don't use reserved words and don't use non-alphanumeric characters.
However, if you really want to, you can still do it, but you need to surround the name with brackets.
this will fail
create table 1abc (id int)
this will not fail
create table [1abc] (id int)
But now you need to use [] all the time, so I would avoid names such as the ones I mentioned above.
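If you want to try this without a SQL Server instance, the sketch below reproduces the same two statements against SQLite via Python's sqlite3 module. SQLite is used here only as an assumption for convenience; it happens to accept the square-bracket quoting for compatibility with SQL Server, but its rules are not guaranteed to match in every case. The helper name try_create is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def try_create(sql):
    """Run a CREATE TABLE statement and report whether it was accepted."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.Error:
        return False

print(try_create("create table 1abc (id int)"))         # False
print(try_create("create table [1abc] (id int)"))       # True
print(try_create("create table t1 (select int)"))       # False (reserved word)
print(try_create("create table t2 ([select] int)"))     # True
```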
Check the list of reserved keywords as indicated in other answers.
Also avoid quoting names with double quotes or square brackets just for the sake of having a space or other special character in the object name. The reason is that a quoted object name becomes case sensitive in some database engines (not sure about MSSQL, though).
Some teams use the prefix for database objects (tables, views, columns) like T_PERSON, V_PERSON, C_NAME etc. I personally do not like this convention, but it does help avoiding keyword issues.
You should avoid any reserved SQL keywords (e.g. SELECT) and, as a best practice, avoid spaces.
Yes, and no.
Yes, because it's annoying and confusing to have names that match keywords, and that you have to escape in funny ways (when you're not consistently escaping)
and No, because it's possible to have any sequence of characters as an identifier, if you escape it properly :)
Use [square brackets] or "double quotes" to escape multi-word identifiers or keywords, or even names that have backslashes or any other slightly odd character, if you must.
Strictly speaking, there's nothing you can't name your columns. However, it will make your life easier if you avoid names with spaces, SQL reserved words, and reserved words in the language you're programming in.
You can use pretty much anything as long as you surround it with square brackets:
SELECT [value], [select], [insert] FROM SomeTable
I, however, like to avoid doing this, partly because typing square brackets everywhere is annoying and partly because I don't generally find column names like 'value' particularly descriptive! :-)
Just stay away from SQL keywords and anything which contains something other than letters and you shouldn't need to use those pesky square brackets.
You can surround a word in square brackets [] and basically use anything you'd like.
I prefer not to use the brackets, and in order to do so you just have to avoid reserved words.
MS SQL Server 2008 has these reserved words
Beware of using square brackets on updates, I had a problem using the following query:
UPDATE logs SET locked=1 WHERE [id] IN (SELECT [id] FROM ids)
This caused all records to be updated, however, this appears to work fine:
UPDATE logs SET locked=1 WHERE id IN (SELECT [id] FROM ids)
Note that this problem appears specific to updates, as the following returns only the rows expected (not all rows):
SELECT * FROM logs WHERE [id] IN (SELECT [id] FROM ids)
This was using MSDE 2000 SP3 and connecting to the database using MS SQL (2000) Query Analyzer V 8.00.194
Very odd, possibly related to this Knowledgebase bug http://support.microsoft.com/kb/140215
In the end I just removed all the unnecessary square brackets.

Resources