Consider this simple program:
from clickhouse_driver import Client
c = Client(host="localhost")
params = {"database": "test", "table": "t"}
query = c.substitute_params(query='SELECT * from %(database)s.%(table)s', params=params, context=c.connection.context)
print(query)
ClickHouse puts single quotes around the parameters, so the resulting query is:
SELECT * from 'test'.'t'
I could use an f-string instead, which would solve the problem, but that is vulnerable to SQL injection (SQLi). If I understand correctly, parameterized queries are how ClickHouse prevents SQLi.
How can we prevent the quotes from being put around the parameters?
As I understand it, substitute_params is not intended for database object identifiers such as database and table names. ClickHouse quotes identifiers differently (with backticks or double quotes) from literal string values (with single quotes). https://clickhouse.com/docs/en/sql-reference/syntax/#identifiers
In general, you can do your own SQL injection defense by validating the database and table inputs. For example, check that they match a simple regex that fits your ClickHouse schema, such as "only lowercase letters or underscores". In that case, using an f-string should be safe.
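Here is a minimal sketch of that validate-then-format approach. It assumes the whitelist really is "lowercase letters and underscores only", so adjust the pattern to your own schema. The helper name quoted_identifier is hypothetical. Wrapping the validated name in backticks follows ClickHouse's identifier quoting and is not part of clickhouse-driver:

```python
import re

# Whitelist from the answer above: only lowercase letters and underscores.
# Adjust this pattern to match the naming rules of your own schema.
IDENTIFIER_RE = re.compile(r"[a-z_]+")


def quoted_identifier(name: str) -> str:
    """Validate a database/table name and wrap it in backticks (hypothetical helper)."""
    if not IDENTIFIER_RE.fullmatch(name):
        raise ValueError(f"invalid identifier: {name!r}")
    # Safe to interpolate: the whitelist rules out backticks, quotes and spaces.
    return f"`{name}`"


params = {"database": "test", "table": "t"}
db = quoted_identifier(params["database"])
table = quoted_identifier(params["table"])
query = f"SELECT * FROM {db}.{table}"
print(query)  # SELECT * FROM `test`.`t`
```

A name such as "t; DROP TABLE x" fails the fullmatch check and raises ValueError before any SQL is built.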
ClickHouse also supports server-side substitution, where you can use the Identifier type for this use case. However, I don't believe that feature is available in clickhouse-driver.
I'm looking for some assistance in debugging a REGEXP_REPLACE() statement.
I have been using an online regular expression editor to build expressions, and then the Snowflake (SF) REGEXP_* functions to implement them. I've tried to stay consistent with the SF regex implementation, but I'm seeing an inconsistency in the returned results that I'm hoping someone can explain :)
My intent is to replace the commas in the text (excluding commas within double-quoted text) with a new delimiter (#^#).
Sample text string:
"Foreign Corporate Name Registration","99999","Valuation Research",,"Active Name",02/09/2020,"02/09/2020","NEVADA","UNITED STATES",,,"123 SOME STREET",,"MILWAUKEE","WI","53202","UNITED STATES","123 SOME STREET",,"MILWAUKEE","WI","53202","UNITED STATES",,,,,,,,,,,,
RegEx command and Substitution (working in regex101.com):
([("].*?["])*?(,)
\1#^#
regex101.com Result:
"Foreign Corporate Name Registration"#^#"99999"#^#"Valuation Research"#^##^#"Active Name"#^#02/09/2020#^#"02/09/2020"#^#"NEVADA"#^#"UNITED STATES"#^##^##^#"123 SOME STREET"#^##^#"MILWAUKEE"#^#"WI"#^#"53202"#^#"UNITED STATES"#^#"123 SOME STREET"#^##^#"MILWAUKEE"#^#"WI"#^#"53202"#^#"UNITED STATES"#^##^##^##^##^##^##^##^##^##^##^##^#
When I try to implement the same logic in SF using REGEXP_REPLACE(), I use the following statement:
SELECT TOP 500
A.C1
,REGEXP_REPLACE((A."C1"),'([("].*?["])*?(,)','\\1#^#') AS BASE
FROM
"<Warehouse>"."<database>"."<table>" AS A
This statement returns the result for BASE:
"Foreign Corporate Name Registration","99999","Valuation Research",,"Active Name",02/09/2020,"02/09/2020","NEVADA","UNITED STATES",,,"123 SOME STREET",,"MILWAUKEE","WI","53202","UNITED STATES","123 SOME STREET",,"MILWAUKEE","WI","53202","UNITED STATES"#^##^##^##^##^##^##^##^##^##^##^##^#
As you can see when comparing the results, SF only replaces the commas at the tail end of the text.
Can anyone tell me why the results between regex101.com and SF are returning different results with the same statement? Is my expression non-compliant with the SF implementation of RegEx - and if yes, can you tell me why?
Many many thanks for your time and effort reading this far!
Happy Wednesday,
Casey.
Using .*? for lazy matching is limited to PCRE, which Snowflake does not support. To see this, change the "flavor" in regex101.com to anything other than PCRE (PHP). Your ([("].*?["])*?(,) regex will no longer do what you expect.
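You can reproduce both outputs with Python's re module, which supports lazy quantifiers like regex101's PCRE flavor. The sketch assumes that Snowflake effectively treats *? as a plain greedy *. That assumption is inferred from the output in the question, not taken from Snowflake documentation. If it holds, .* runs from the first double quote to the last one. Everything before the last quoted field is then captured in \1 and written back unchanged, and only the trailing commas are replaced. The sample is a shortened version of the question's line:

```python
import re

# Shortened version of the sample line from the question.
sample = '"Foreign Corporate Name Registration","99999",,"NEVADA",,,'

# Original pattern with lazy quantifiers, as a backtracking engine like PCRE runs it.
# (Python 3.5+ substitutes an empty string for \1 when group 1 did not match.)
lazy = re.sub(r'([("].*?["])*?(,)', r'\1#^#', sample)

# Assumption: the same pattern with the lazy "?" dropped, i.e. fully greedy.
greedy = re.sub(r'([("].*["])*(,)', r'\1#^#', sample)

print(lazy)    # every comma outside quotes is replaced
print(greedy)  # only the commas after the last quoted field are replaced
```

The greedy result keeps "Foreign Corporate Name Registration","99999",,"NEVADA" intact and ends in #^##^##^#. This is the same shape as the SF output in the question.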
I believe that this will work for your purposes:
REGEXP_REPLACE(A.C1,'("[^"]*")*,','\\1#^#')
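Here is a quick check of that pattern with Python's re.sub, which uses the same \1 backreference in the replacement. In Snowflake the backslash is doubled because it sits inside a SQL string literal. The sample lines and the helper name replace_delimiters are made up for illustration:

```python
import re

# Suggested pattern: optionally consume quoted fields ("..."), then a comma.
# The quoted text is written back via \1, so only the delimiter comma changes.
PATTERN = r'("[^"]*")*,'


def replace_delimiters(line: str) -> str:
    # Python 3.5+ substitutes an empty string for \1 when group 1 did not match.
    return re.sub(PATTERN, r'\1#^#', line)


print(replace_delimiters('"Smith, John",42,,"WI",02/09/2020,'))
# "Smith, John"#^#42#^##^#"WI"#^#02/09/2020#^#
```

The comma inside "Smith, John" survives because the whole quoted field is consumed before the delimiter comma. There is one caveat. If the last field on a line is quoted, contains a comma, and has no trailing comma, its inner comma is still replaced. For example, x,"a,b" becomes x#^#"a#^#b".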
The function module (FM) EPS2_GET_DIRECTORY_LISTING has a parameter FILE_MASK, which I assume should act as a pattern. I need to read the files on the application server (AS) whose names contain a given word, but FILE_MASK is not working correctly. For example, if I pass "*ZIP" it returns a file named '.TXT'. Is there a proper way to use that parameter?
The parameters are described in SAP note 1860206 which I will not quote here because I'm not sure about the copyright status. However, wildcards generally do not work as expected in this case - your best bet is to read without the parameter and filter the table afterwards.
I had a similar problem. The standard FILE_MASK filtering in EPS2_GET_DIRECTORY_LISTING is poorly implemented (for example, the * wildcard can only be used at the end of the file mask string :/ ). So I ended up reading the entire directory content and then processing it with regular expressions to find the matching files and directories.
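The read-everything-then-filter approach does not depend on ABAP. As a language-neutral sketch in Python, assume the listing has already been read into a list of file names without any FILE_MASK. The file names and the helper filter_listing are invented for illustration. A regex search can then express "contains a word" or "ends with" checks that a trailing-only * wildcard cannot:

```python
import re
from typing import List


def filter_listing(names: List[str], pattern: str) -> List[str]:
    """Keep only the file names in which the regular expression matches (case-sensitive)."""
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


# Hypothetical directory listing, as if read without any FILE_MASK.
listing = ["ARCHIVE.ZIP", ".TXT", "REPORT_ZIP_2020.CSV", "notes.txt"]

print(filter_listing(listing, r"ZIP"))     # ['ARCHIVE.ZIP', 'REPORT_ZIP_2020.CSV']
print(filter_listing(listing, r"\.ZIP$"))  # ['ARCHIVE.ZIP']
```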
Entirely by accident, today I ran a SQL statement to filter some items by date. For simplicity's sake, say I used:
SELECT *
FROM [TableName]
WHERE [RecordCreated] >+ '2016-04-10'
Only after the statement ran did I realise I had used >+ instead of >=. I was confused, as I would have expected an error.
I tried a couple of other variations:
>- -- Throws an error
<+ -- Ran successfully
<- -- Throws an error
The number of rows returned was exactly the same whether I used >= or >+.
After searching online, I couldn't find any documentation that covered this syntax directly, only the two operators used separately.
The RecordCreated column is a datetime.
Is this just a nicety in syntax for a possible common mistake or is it potentially trying to cast the date as a numeric value?
This seems to be a bug in the '+' operator.
According to an update from the Microsoft team:
After some investigation, this behavior is by design since + is an unary operator. So the parser accepts "+ <expr>", and the '+' is simply ignored in this case. Changing this behavior has lot of backward compatibility implications so we don't intend to change it & the fix will introduce unnecessary changes for application code.
You can find a really good answer by RGO to his own question here.
The results shouldn't match >= and <=, but > and <. I just checked, and the row count differs by 2: the first and last items are removed.
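Both points can be shown in runnable form with Python's built-in sqlite3 module as a stand-in for SQL Server. This is an assumption: SQLite is not T-SQL, but its documentation likewise describes unary + as a no-op that returns its operand unchanged. The three rows are made up, and one of them is exactly equal to the boundary value. So >= and > differ only by that row. With a datetime column the same holds for rows at exactly '2016-04-10 00:00:00', and a table with no rows on the boundary gives identical counts for >= and >:

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption: like T-SQL,
# SQLite parses ">+" as ">" followed by a unary "+" that leaves the value unchanged).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (RecordCreated TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [("2016-04-09 12:00:00",), ("2016-04-10",), ("2016-04-11 08:30:00",)],
)


def count(where: str) -> int:
    """Count rows matching a WHERE clause (fixed strings in this demo only)."""
    return conn.execute(f"SELECT COUNT(*) FROM t WHERE {where}").fetchone()[0]


ge = count("RecordCreated >= '2016-04-10'")
gt = count("RecordCreated > '2016-04-10'")
gt_plus = count("RecordCreated >+ '2016-04-10'")  # unary + is a no-op
print(ge, gt, gt_plus)  # 2 1 1
```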
I need a regular expression for Visual Studio's Search and Replace feature, as follows:
Search for the following term: sectorkey in (
There could be multiple spaces between each of the above 3 search terms, or even multiple line breaks/carriage returns.
The search is meant to find SQL statements that have hard-coded SectorKey values inside a SQL IN clause. These need to be replaced with a SQL JOIN, which will be done manually.
The little arrow to the right of the Find What box is your friend and can help you with the vagaries of the MS regex syntax.
Newline is represented by \n, so you can use sectorkey( |\n)+in( |\n)+\( (you need to escape the opening parenthesis in your search expression, since parentheses are used for grouping).
I believe :Wh+ is what you want. The Visual Studio regex flavor is very strange; you'll tend to get better results if you consult the official reference. Expertise with "mainstream" regexes tends to be more of a handicap than a help when it comes to VS.
You can use \s+ to match one or more adjacent whitespace characters (including tab, CR, LF, etc.), so your regex would presumably look something like sectorkey\s+in\s+\(.
Edit...
As Joe points out in his comment, it seems that Visual Studio doesn't support \s in Find/Replace expressions, in which case you'll probably need to use something like [\n:b] instead. The regex would then become sectorkey[\n:b]+in[\n:b]+\(.
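Python's re module is not the Visual Studio engine, but it is a convenient way to check the logic of the \s+ version of the pattern against a statement that spans several lines. The sketch uses case-insensitive matching, which is an assumption based on the question writing both sectorkey and SectorKey. The SQL text is invented:

```python
import re

# \s+ matches one or more spaces, tabs, CRs or LFs; \( matches a literal "(".
PATTERN = re.compile(r"sectorkey\s+in\s+\(", re.IGNORECASE)

sql = """SELECT *
FROM Sales
WHERE SectorKey
      IN
  (1, 2, 3)"""

match = PATTERN.search(sql)
print(repr(match.group(0)) if match else "no match")
```

Because \s also matches line breaks, the match spans all three lines from SectorKey to the opening parenthesis. Text with no whitespace between the terms, such as SectorKeyIN (, is not matched.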