I need to write a T-SQL query against a text column where some of the values contain formatting codes (RTF control words, in the example below) mixed with normal human-readable text. For example:
{\colortbl ;\red31\green73\blue125;\red0\green0\blue0;} \viewkind4\uc1\pard\ltrpar\lang1033\f0\fs22 All invoices to be emailed to Jack Jack.Marsman@brampton.ca
I don't need that formatting information, I need the real text; in this case I want to get just: All invoices to be emailed to Jack Jack.Marsman@brampton.ca
Any suggestions on how to go about extracting the text without getting the coding?
The short answer is that there is no easy, standard way to do this. I'd try creating a CLR function, since this kind of parsing is much easier in C# or VB.NET.
You can also try using a regex to strip out everything that isn't human-readable.
Is all of your data in a similar format to what you've shown? If so, then it comes down to calling SUBSTRING a few times…
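For instance, if the readable text always starts right after the last control word, as it does with the '\fs22 ' marker in your sample, a rough sketch of that approach could be (table and column names here are placeholders):

SELECT CASE
           WHEN CHARINDEX('\fs22 ', NotesColumn) > 0
                -- skip past the six-character '\fs22 ' marker and keep the rest
                THEN SUBSTRING(NotesColumn,
                               CHARINDEX('\fs22 ', NotesColumn) + 6,
                               LEN(NotesColumn))
           ELSE NotesColumn
       END AS ReadableText
FROM dbo.MyTable;

You would need a similar CHARINDEX/SUBSTRING step for every marker that can appear in your data, which is why a CLR function tends to be cleaner once the formats multiply.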
I'm looking for some assistance in debugging a REGEXP_REPLACE() statement.
I have been using an online regular expressions editor to build expressions, and then the SF regexp_* functions to implement them. I've attempted to remain consistent with the SF regex implementation, but I'm seeing an inconsistency in the returned results that I'm hoping someone can explain :)
My intent is to replace commas within the text (excluding commas inside double-quoted text) with a new delimiter (#^#).
Sample text string:
"Foreign Corporate Name Registration","99999","Valuation Research",,"Active Name",02/09/2020,"02/09/2020","NEVADA","UNITED STATES",,,"123 SOME STREET",,"MILWAUKEE","WI","53202","UNITED STATES","123 SOME STREET",,"MILWAUKEE","WI","53202","UNITED STATES",,,,,,,,,,,,
RegEx command and Substitution (working in regex101.com):
([("].*?["])*?(,)
\1#^#
regex101.com Result:
"Foreign Corporate Name Registration"#^#"99999"#^#"Valuation Research"#^##^#"Active Name"#^#02/09/2020#^#"02/09/2020"#^#"NEVADA"#^#"UNITED STATES"#^##^##^#"123 SOME STREET"#^##^#"MILWAUKEE"#^#"WI"#^#"53202"#^#"UNITED STATES"#^#"123 SOME STREET"#^##^#"MILWAUKEE"#^#"WI"#^#"53202"#^#"UNITED STATES"#^##^##^##^##^##^##^##^##^##^##^##^#
When I try to implement this same logic in SF using REGEXP_REPLACE(), I use the following statement:
SELECT TOP 500
A.C1
,REGEXP_REPLACE((A."C1"),'([("].*?["])*?(,)','\\1#^#') AS BASE
FROM
"<Warehouse>"."<database>"."<table>" AS A
This statement returns the result for BASE:
"Foreign Corporate Name Registration","99999","Valuation Research",,"Active Name",02/09/2020,"02/09/2020","NEVADA","UNITED STATES",,,"123 SOME STREET",,"MILWAUKEE","WI","53202","UNITED STATES","123 SOME STREET",,"MILWAUKEE","WI","53202","UNITED STATES"#^##^##^##^##^##^##^##^##^##^##^##^#
As you can see when comparing the results, the SF result set is only replacing commas at the tail-end of the text.
Can anyone tell me why regex101.com and SF return different results for the same statement? Is my expression non-compliant with the SF implementation of regex, and if so, can you tell me why?
Many many thanks for your time and effort reading this far!
Happy Wednesday,
Casey.
The use of .*? to achieve lazy matching is a PCRE feature, which Snowflake does not support. To see this, in regex101.com change your "flavor" to anything other than PCRE (PHP); you will see that your ([("].*?["])*?(,) regex no longer achieves what you are expecting.
I believe that this will work for your purposes:
REGEXP_REPLACE(A.C1,'("[^"]*")*,','\\1#^#')
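As a quick sanity check, something like the following should work (a shortened sample, with an embedded comma added to one field to show that commas inside double quotes are left alone):

SELECT REGEXP_REPLACE(
         '"Foreign Corporate Name Registration","Valuation Research, Inc.",,"Active Name",02/09/2020',
         '("[^"]*")*,',
         '\\1#^#') AS BASE;
-- expected: "Foreign Corporate Name Registration"#^#"Valuation Research, Inc."#^##^#"Active Name"#^#02/09/2020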
Recently I got a requirement to read a file and insert its records into a DB. But when I looked at the file, it is not consistent, and the source team is not in a position to alter it in any way. So, is there a way to read it?
Example of a File:
Record1,Record2,Record3,Record4
Record1,Record2,Record3,Record4
Record1,Record2
Record1,Record2,Record3,Record4
Record1,Record2,Record3,Record4
Record1,Record2
Record1,Record2,Record3,Record4
Any inputs will be appreciated.
Regards,
Vishnu.
If I understand correctly, you have comma-separated values and each new line forms a record.
You can use MFL Format Builder with a comma as the delimiter to generate a standardized XML document from your data.
This link has a good tutorial to get you started.
I was not able to solve this by myself, so I hope I didn't miss a similar post here and I'm not wasting your time.
What I want is to identify (get a list of) all strings used in SQL Server code.
Example:
select 'WordToCatch1' as 'Column1'
from Table1
where Column2 = 'WordToCatch2'
If you put the above code into SSMS, all three words in apostrophes will be red, but only 'WordToCatch1' and 'WordToCatch2' are "real" strings used in the code.
My goal is to find all those "real" strings in any code.
For example, if I have a stored procedure 10k rows long, it would be impossible to search it manually, so I want something that will find all those "real" strings for me and return a list of them.
Thanks in advance!
The trouble is, 'Column1' is nothing particularly different from 'WordToCatch1' and 'WordToCatch2' - not unless you parse the SQL yourself. You could modify your query to take the quotes away from Column1 and it would show up coloured black.
I guess a simple regex would show all identifiers after an AS keyword, which would be easier than fully parsing the SQL, if all the unwanted strings are like that and it's not just an example.
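If a regex-capable editor isn't an option, a rough T-SQL sketch that pulls every '...' literal straight out of a module's definition might look like this (dbo.MyProc is a placeholder; it will still pick up quoted aliases such as 'Column1', and it treats an escaped '' as two separate literals, so it is only a starting point):

-- Grab the module text
DECLARE @sql nvarchar(max) =
    (SELECT [definition] FROM sys.sql_modules
     WHERE object_id = OBJECT_ID('dbo.MyProc'));

DECLARE @strings table (literal nvarchar(max));
DECLARE @start int = CHARINDEX('''', @sql), @end int;

-- Walk the text quote by quote, capturing whatever sits between each pair
WHILE @start > 0
BEGIN
    SET @end = CHARINDEX('''', @sql, @start + 1);
    IF @end = 0 BREAK;

    INSERT INTO @strings (literal)
    VALUES (SUBSTRING(@sql, @start + 1, @end - @start - 1));

    SET @start = CHARINDEX('''', @sql, @end + 1);
END;

SELECT literal FROM @strings;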
With the application that I am working with and writing reports for, users enter the Location in all upper case. The people my reports go to have requested that the Location be in proper case. This was fine until I realized that proper case does not recognize abbreviations. Is there a way to write an expression in SSDT that, while converting the street name to proper case, also keeps abbreviations like "SE" or "DR" in upper case?
John Saunders is right, it's not simple, and it'd be better if you can fix the data at the source. But you can wrap your Proper Case function in a series of outer REPLACE Functions. It's not simple because you'll have to analyze your data and figure out all the abbreviations you want to handle, and manually code each one. It will get huge, so you might consider creating this function in SSRS custom code, so it doesn't look so cluttered in the expression builder.
Pseudo code would look something like this (as an actual SSRS expression, you could use the VB StrConv function for the proper-case step):
=Replace(
    Replace(
        StrConv(Fields!MyFieldName.Value, VbStrConv.ProperCase)
    , "Se", "SE")
, "Dr", "DR")
Add a REPLACE(InnerExpression,ProperCaseExpression,UpperCaseExpression) for each individual abbreviation you want to handle. It won't be fun, but it will work.
My co-worker is being unsafe with his code and is allowing a user to upload an SQL file to be run on the server.
He strips out any key words in the file such as "EXEC", "DROP", "UPDATE", "INSERT", "TRUNC"
I want to show him the error of his ways by exploiting his EXEC ( @sql )
My first attempt will be with 'EXEXECEC (N''SELECT ''You DRDROPOPped the ball Bob!'')'
But he might filter that all out in a loop.
Is there a way I can exploit my co-worker's code? Or is filtering out the key words enough?
Edit: I got him to check in his code. If the code contains a keyword he does not execute it. I'm still trying to figure out how to exploit this using the binary conversion.
Tell your co-worker he's a moron.
Do an obfuscated SQL query, something like:
select @sql = 0x44524f5020426f627350616e7473
This will need some tweaking depending on what the rest of the code looks like, but the idea is to encode your code in hex and execute it (or rather, let it be executed). There are other ways to obfuscate code to be injected.
You've got a huge security hole there. And the funny part is, this is not even something that needs to be reinvented. The proper way to stop such things from happening is to create and use an account with the correct permissions (eg: can only perform select queries on tables x, y and z).
Have a look at ASCII Encoded/Binary attacks ... it should convince your friend he is doomed. ;)
And here is some help on how to encode the strings:
Converting a String to HEX in SQL
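As a rough illustration of the round trip (the variable names are only for the example):

-- Encoding: produces the hex literal 0x44524F5020426F627350616E7473
DECLARE @payload varchar(100) = 'DROP BobsPants';
SELECT CONVERT(varbinary(100), @payload) AS hex_form;

-- Decoding, which is all the injected snippet has to do before it reaches the EXEC
DECLARE @sql varchar(100) = CONVERT(varchar(100), 0x44524F5020426F627350616E7473);
SELECT @sql;   -- DROP BobsPants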