Convert \u0000\u0000 to readable in Snowflake

I have a data file that contains the Unicode values \u0000\u0000, \u0000\u0018 and \u0000\u001aq{ in one column. How do I convert these to a human-readable format using Snowflake?

Snowflake will automatically decode Unicode escape sequences. Note, though, that \u0000 is the escape sequence for the NUL control character, so it's not printable, and \u0018 is the escape sequence for the CANCEL (CAN) control character, which is also not printable.
Here's an example with a printable Unicode escape sequence:
create or replace temp table t1 as select
'This \u028D is a Latin small letter turned w.' as THE_STRING;
select THE_STRING from T1;
If the values have somehow landed in the fields still escaped, you can use a JavaScript UDF to convert them:
create or replace function decode_unicode("s" string)
returns string
language javascript
strict immutable
as
$$
// JSON.parse decodes the \uXXXX escapes; wrapping the value in double
// quotes makes it parse as a JSON string literal. (decodeURIComponent
// is not needed here and would throw on a stray % in the data.)
return JSON.parse(`"${s}"`);
$$;
select decode_unicode('This is a Unicode escape code, double escaped to simulate landing in a field that way: \\u028D');
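Outside Snowflake, the same trick can be sketched in Python: wrap the raw field value in double quotes and let a JSON parser decode the \uXXXX escapes. The sample string mirrors the double-escaped one above.

```python
import json

# Wrap the raw field value in double quotes so it parses as a JSON
# string literal; the JSON parser then decodes the \uXXXX escapes.
raw = r"This is a Unicode escape code, double escaped: \u028D"
decoded = json.loads(f'"{raw}"')
print(decoded)  # ends with the character U+028D (turned w)
```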


How do you escape strings for SQLite table names in c?
I found a document, but it does not give the details: https://www.sqlite.org/lang_keywords.html
And this document says that a SQL statement ends with '\x00': https://www.sqlite.org/c3ref/prepare.html
Here is the similar question in python: How do you escape strings for SQLite table/column names in Python?
Identifiers that need escaping should be wrapped in double quotes, with every double quote inside them escaped by doubling it up: bad"name needs to become "bad""name" to be used in a SQL statement.
SQLite comes with custom versions of the *printf() functions that include formats for escaping SQL identifiers and strings (the latter use single quotes in SQL). The format that does the double-quote escaping for identifiers is %w:
char *sanitized_ddl = sqlite3_mprintf("CREATE TABLE \"%w\"(\"%w\", \"%w\");",
"bad\"name", "foo bar", "baz");
Ideally, though, you're not going to use table or column names that need escaping, but it's good practice to escape user-supplied names to help protect against SQL injection attacks and the like.
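The doubling rule is easy to apply from a host language as well. Here is a minimal Python sketch using the sqlite3 module; quote_ident is a hypothetical helper, not part of any API:

```python
import sqlite3

def quote_ident(name: str) -> str:
    # Hypothetical helper applying the rule above: double every embedded
    # double quote, then wrap the whole name in double quotes.
    return '"' + name.replace('"', '""') + '"'

table = 'bad"name'
con = sqlite3.connect(":memory:")
con.execute(f"CREATE TABLE {quote_ident(table)} (foo, bar)")
found = con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'").fetchone()[0]
print(found)  # bad"name
```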
Example:
example becomes 'example'
'a becomes '''a'
Detail:
Do not use byte value 0 in the string. It will produce an error like unrecognized token: "'" even if you pass the correct zSql length to sqlite3_prepare_v2().
Replace every ' with '' in the string, then add a ' to the start and the end of the string (' is a single quote, byte 39).
Using an invalid UTF-8 string is not recommended. According to the parse code in SQLite version 3.28.0 the string does not need to be valid UTF-8, and I have tested that an invalid UTF-8 string can be used as a table name, but the documentation of sqlite3_prepare_v2() says you need a "SQL statement, UTF-8 encoded".
I have written a program to confirm that any byte sequence of length 1 or 2 that does not contain byte value 0 can be used as a table name, and that the program can read values from that table and list it in the SQLITE_MASTER table, in SQLite version 3.28.0.
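The two replacement steps for string literals can be sketched in Python; quote_str is a hypothetical helper name:

```python
def quote_str(s: str) -> str:
    # Hypothetical helper for the literal-quoting rule above: replace
    # every ' with '', then add a ' to the start and end of the string.
    return "'" + s.replace("'", "''") + "'"

print(quote_str("example"))  # 'example'
print(quote_str("'a"))       # '''a'
```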

Escaping square brackets when using LIKE operator in sql [duplicate]

I am trying to filter items with a stored procedure using like. The column is a varchar(15). The items I am trying to filter have square brackets in the name.
For example: WC[R]S123456.
If I do a LIKE 'WC[R]S123456' it will not return anything.
I found some information on using the ESCAPE keyword with LIKE, but how can I use it to treat the square brackets as a regular string?
LIKE 'WC[[]R]S123456'
or
LIKE 'WC\[R]S123456' ESCAPE '\'
Should work.
Let's say you want to match the literal its[brac]et.
You don't need to escape the ] as it has special meaning only when it is paired with [.
Therefore escaping [ suffices to solve the problem. You can escape [ by replacing it with [[].
I needed to exclude names that started with an underscore from a query, so I ended up with this:
WHERE b.[name] not like '\_%' escape '\' -- use \ as the escape character
Here is what I actually used:
like 'WC![R]S123456' ESCAPE '!'
The ESCAPE keyword is used when you need to search for special characters like % and _, which are normally wildcards. With ESCAPE, SQL treats the escaped % or _ as a literal character to match.
Here's a good article with some more examples
SELECT columns FROM table WHERE
column LIKE '%[[]SQL Server Driver]%'
-- or
SELECT columns FROM table WHERE
column LIKE '%\[SQL Server Driver]%' ESCAPE '\'
According to documentation:
You can use the wildcard pattern matching characters as literal
characters. To use a wildcard character as a literal character,
enclose the wildcard character in brackets.
You need to escape these three characters: %, _ and [:
'5%' LIKE '5[%]' -- true
'5$' LIKE '5[%]' -- false
'foo_bar' LIKE 'foo[_]bar' -- true
'foo$bar' LIKE 'foo[_]bar' -- false
'foo[bar' LIKE 'foo[[]bar' -- true
'foo]bar' LIKE 'foo]bar' -- true
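The bracket-wrapping rule above can be sketched in Python; escape_like is a hypothetical helper covering only the three characters %, _ and [:

```python
def escape_like(pattern: str) -> str:
    # Hypothetical helper for SQL Server LIKE: wrap each of the three
    # special characters %, _ and [ in brackets so they match literally.
    return "".join(f"[{ch}]" if ch in "%_[" else ch for ch in pattern)

print(escape_like("WC[R]S123456"))  # WC[[]R]S123456
print(escape_like("50% off_now"))   # 50[%] off[_]now
```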
If you need to escape special characters like '_' (underscore), as in my case, and you are not willing or able to define an ESCAPE clause, you may wish to enclose the special character in square brackets '[' and ']'.
This explains the meaning of the "weird" string '[[]' - it just embraces the '[' character with square brackets, effectively escaping it.
My use case was to specify the name of a stored procedure with underscores in it as a filter criteria for the Profiler. So I've put string '%name[_]of[_]a[_]stored[_]procedure%' in a TextData LIKE field and it gave me trace results I wanted to achieve.
Here is a good example from the documentation:
LIKE (Transact-SQL) - Using Wildcard Characters As Literals
There is a problem in that while
LIKE 'WC[[]R]S123456'
and
LIKE 'WC\[R]S123456' ESCAPE '\'
both work for SQL Server, neither works for Oracle.
It seems that there isn't any ISO/IEC 9075 way to recognize a pattern involving a left bracket.
Instead of '\' or another character on the keyboard, you can also use special characters that aren't on the keyboard. Depending on your use case this may be necessary, if you don't want user input to accidentally be treated as an escape character.
Use the following.
For user input to be searched for literally, use ESCAPE; that requires the following replacements for all the special characters (the list below covers all of SQL Server's).
A single quote "'" is not replaced here, because it does not affect the LIKE clause; handling it is a string-concatenation matter, and parameterised queries avoid it entirely.
Replacing "-", "^" and "]" is not required, because we are already escaping "[".
String FormattedString = "UserString".Replace("ð","ðð").Replace("_", "ð_").Replace("%", "ð%").Replace("[", "ð[");
Then, in SQL Query it should be as following. (In parameterised query, the string can be added with patterns after the above replacement).
To search an exact string.
like 'FormattedString' ESCAPE 'ð'
To search for values starting with a string:
like 'FormattedString%' ESCAPE 'ð'
To search for values ending with a string:
like '%FormattedString' ESCAPE 'ð'
To search containing with a string:
like '%FormattedString%' ESCAPE 'ð'
And so on for other pattern matching. But direct user input needs to be formatted as mentioned above.
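For reference, the same replacement chain as the C# snippet above, sketched in Python (escape_for_like is a hypothetical name):

```python
def escape_for_like(user_input: str) -> str:
    # Port of the C# replacement chain above: escape the escape
    # character "ð" itself first, then the wildcards _, % and [.
    return (user_input.replace("ð", "ðð")
                      .replace("_", "ð_")
                      .replace("%", "ð%")
                      .replace("[", "ð["))

print(escape_for_like("name_of_a_procedure"))  # nameð_ofð_að_procedure
```

The result is then embedded in the pattern, e.g. LIKE '%' + escaped + '%' ESCAPE 'ð'.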

.NET Regex for SQL Server string... but not Unicode string?

I'm trying to build a .NET regex to match SQL Server constant strings... but not Unicode strings.
Here's a bit of SQL:
select * from SomeTable where SomeKey = 'abc''def' and AnotherField = n'another''value'
Note that within a string two single quotes escapes a single quote.
The regex should match 'abc''def' but not n'another''value'.
I have a regex now that manages to locate a string, but it also matches the Unicode string (starting just after the N):
'('{2})*([^']*)('{2})*([^']*)('{2})*'
Thanks!
This pattern will do most of what you are looking to do:
(?<unicode>n)?'(?<value>(?:''|[^'])*)'
The upside is that it should accurately match any number of escaped quotes. (SomeKey = 'abc''''def''' will match abc''''def''.)
The downside is it also matches Unicode strings, although it captures the leading n to identify it as a Unicode string. When you process the regular expression, you can ignore matches where the match group "unicode" was successful.
The pattern creates the following groups for each match:
unicode: Success if the string is a Unicode string, fails to match if ASCII
value: the string value; escaped single quotes remain escaped
If you are using .NET regular expressions, you could add (?(unicode)(?<-value>)) to the end of the pattern to suppress matching the value, although the pattern as a whole would still match.
Edit
Having thought about it some more, the following pattern should do exactly what you wanted; it will not match Unicode strings at all. The above approach might still be more readable, however.
(?:n'(?:''|[^'])*'[^']*)*(?<!n)'(?<value>(?:''|[^'])*)'
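The match-and-filter approach from the first pattern can be sketched in Python, using (?P<name>...) in place of .NET's (?<name>...) group syntax. Because each n'...' literal is consumed as one whole match, skipping matches where the unicode group participated leaves only the plain strings:

```python
import re

# Port of the first pattern above to Python group syntax. A leading
# n/N is captured so Unicode literals can be filtered out afterwards.
pattern = re.compile(r"(?P<unicode>[nN])?'(?P<value>(?:''|[^'])*)'")

sql = ("select * from SomeTable where SomeKey = 'abc''def' "
       "and AnotherField = n'another''value'")

# Keep only matches where the "unicode" group did not participate.
plain = [m.group("value") for m in pattern.finditer(sql)
         if not m.group("unicode")]
print(plain)  # ["abc''def"]
```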

SQL Server BULK INSERT - Escaping reserved characters

There's very little documentation available about escaping characters in SQL Server BULK INSERT files.
The documentation for BULK INSERT says the statement only has two formatting options: FIELDTERMINATOR and ROWTERMINATOR, however it doesn't say how you're meant to escape those characters if they appear in a row's field value.
For example, if I have this table:
CREATE TABLE People ( name varchar(MAX), notes varchar(MAX) )
and this single row of data:
"Foo, \Bar", "he has a\r\nvery strange name\r\nlol"
...what would its corresponding bulk insert file look like? This, for example, wouldn't work, for obvious reasons:
Foo,\Bar,he has a
very strange name
lol
SQL Server says it supports \r and \n but doesn't say if backslashes escape themselves, nor does it mention field value delimiting (e.g. with double-quotes, or escaping double-quotes) so I'm a little perplexed in this area.
I worked-around this issue by using \0 as a row separator and \t as a field separator, as neither character appeared as a field value and are both supported as separators by BULK INSERT.
I am surprised MSSQL doesn't offer more flexibility when it comes to import/export. It wouldn't take too much effort to build a first-class CSV/TSV parser.
For the next person to search:
I used "\0\t" as a field separator, and "\0\n" for the end-of-line separator on the last field. Use of "\0\r\n" would also be acceptable if you wish to pretend that the files have DOS EOL conventions.
For those unfamiliar with the \x notation, \0 is CHAR(0), \t is CHAR(9), \n is CHAR(10) and \r is CHAR(13). Replace the CHAR() function with whatever your language offers to convert a number to a nominated character.
With this combination, all instances of \t and \n (and \r) become acceptable characters in the data file. After all, the weakness of the bulk upload system is that tabs and newlines are often legitimate characters in text strings, whereas other low-ASCII characters like CHAR(0), CHAR(1) and CHAR(2) are not legal text - not even appearing in UTF-8.
The only character you cannot have in your data is \0 - UNLESS you can guarantee it will never be followed by \t or \n (or \r)
If your language suffers problems when you use \0 in strings (but depending on how you code, you may still be able to avoid that problem) - AND if you know that your data won't have CHAR(1) or CHAR(2) in it (ie no binary) then use those characters instead. Those low characters are only going to be found when you are trying to store arbitrary binary data in strings.
Note also that you will find bytes 0, 1, 2 in UTF-16, UCS-2 and UTF-32 (aka UCS-4) - BUT - the 2 or 4 byte wide representation of CHAR(0, 1 or 2) is still acceptable and distinct from any legal unicode text. Just make sure you select the correct codepage setting in the format file to suit your choice of a UTF or UCS variant.
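Producing a data file for this separator scheme is straightforward; here is a Python sketch (the file name and sample row are made up for illustration):

```python
# Write a data file using the separators described above: CHAR(0)+TAB
# between fields and CHAR(0)+LF after the last field of each row.
FIELD_SEP = "\x00\t"
ROW_SEP = "\x00\n"

rows = [
    ("Foo, \\Bar", "he has a\r\nvery strange name\r\nlol"),
]

# newline="" stops Python from translating the \r\n inside the data.
with open("people.dat", "w", encoding="utf-8", newline="") as f:
    for row in rows:
        f.write(FIELD_SEP.join(row) + ROW_SEP)
```

Per the answers above, such a file would then be loaded with FIELDTERMINATOR = '\0\t' and ROWTERMINATOR = '\0\n'.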
A bulk insert needs a matching field count for every row, and your example is a little rough, as it's not structured data. As for the characters, BULK INSERT will interpret them literally, not as escape sequences (your string will appear exactly as it does in the file).
As for the double quotes enclosing each field, you will just have to make them part of the field and row terminators. So now you should have:
Fieldterminator = '","',
Rowterminator = '"\n'
Does that make sense? Then after the bulk insert you'll need to strip the leading double quote from the first column with something like:
Update yourtable
set yourfirstcolumn = right(yourfirstcolumn, len(yourfirstcolumn) - 1) -- drop the first character

SQL command - LIKE with % operator

I have a table with a row whose imie column contains 'name123'.
The first SQL command below returns this row but the second one returns nothing. Why?
select * from Osoby where imie like '%123%'
select * from Osoby where imie like '%123'
In line with what others are suggesting, try this --
select * from Osoby where RTRIM(LTRIM(imie)) like '%123'
and verify that you are getting the row
Perhaps the field has trailing spaces.
If the imie field is a char field, it will pad whatever you put in it with spaces to reach the length of the field. If you change this to a varchar field, you can get rid of the trailing spaces.
If you change your field to varchar, then run, UPDATE Osoby SET imie = RTRIM(imie) to trim off the extra spaces.
In essence, the query you posted should work, it sounds like something's wrong with your data.
Check your datatypes and have a look at:
http://msdn.microsoft.com/en-us/library/ms179859.aspx
Pattern Matching by Using LIKE
LIKE supports ASCII pattern matching and Unicode pattern matching. When all arguments (match_expression, pattern, and escape_character, if present) are ASCII character data types, ASCII pattern matching is performed. If any one of the arguments are of Unicode data type, all arguments are converted to Unicode and Unicode pattern matching is performed. When you use Unicode data (nchar or nvarchar data types) with LIKE, trailing blanks are significant; however, for non-Unicode data, trailing blanks are not significant. Unicode LIKE is compatible with the ISO standard. ASCII LIKE is compatible with earlier versions of SQL Server.
To avoid problems caused by trailing spaces, try this:
select * from Osoby where ltrim(rtrim(imie)) like '%123'
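The trailing-space symptom is easy to reproduce. Here is a Python sketch using sqlite3; since SQLite has no blank-padded char type, the spaces a SQL Server char column would add are inserted by hand (table and column names follow the question):

```python
import sqlite3

# Pad the value manually to mimic what a SQL Server char(n) column
# would store, then compare the three LIKE patterns.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Osoby (imie TEXT)")
con.execute("INSERT INTO Osoby VALUES ('name123   ')")

contains = con.execute(
    "SELECT * FROM Osoby WHERE imie LIKE '%123%'").fetchall()
suffix = con.execute(
    "SELECT * FROM Osoby WHERE imie LIKE '%123'").fetchall()
trimmed = con.execute(
    "SELECT * FROM Osoby WHERE rtrim(imie) LIKE '%123'").fetchall()
print(len(contains), len(suffix), len(trimmed))  # 1 0 1
```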
