Find first appearance of a character in a set of possible characters in a string in SQL Server 2012 - sql-server

I'm aware of the SQL Server CHARINDEX function, which returns the position of a character (or sub-string) within another string. Still, I did not find any evidence that there is support for regular expressions (unless I develop my own UDF).
What I'm looking for is the ability to find the first position of any character in a set within a string.
Example:
DECLARE @_Source_String NVARCHAR(100) = 'This is "MY" string \ and here is more text' ;
SELECT <some function> (@_Source_String,'"\') ;
This should return 9 because " appears before \. On the other hand:
SELECT <some function> (@_Source_String,'x\') ;
should return 21 because \ is before x.
I should add that performance is very important since this function/mechanism will be invoked with very high frequency.

Pattern matching capabilities in TSQL are pretty basic and often you would require CLR and regular expressions.
You can meet this requirement with PATINDEX, though. A list of characters in square brackets denotes a set of characters to match.
DECLARE @_Source_String NVARCHAR(100) = 'This is "MY" string \ and here is more text';
SELECT PATINDEX('%["\]%', @_Source_String),
       PATINDEX('%[x\]%', @_Source_String);
Returns
+------------------+------------------+
| (No column name) | (No column name) |
+------------------+------------------+
| 9 | 21 |
+------------------+------------------+
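For checking expected results outside the database, the "first position of any character in a set" semantics can be sketched in Python. This is only an illustration: the function name first_index_of_any is hypothetical, and unlike PATINDEX under a case-insensitive collation, the comparison here is case-sensitive.

```python
def first_index_of_any(source: str, chars: str) -> int:
    """Return the 1-based position of the first character of `source`
    that belongs to `chars`, or 0 if none is found (PATINDEX-style)."""
    wanted = set(chars)
    for position, ch in enumerate(source, start=1):
        if ch in wanted:
            return position
    return 0


text = 'This is "MY" string \\ and here is more text'
print(first_index_of_any(text, '"\\'))  # double quote comes first
print(first_index_of_any(text, 'x\\'))  # backslash comes before x
```

The two calls mirror the two examples from the question.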

Related

Extract particular text from String in Snowflake

I'm new to Snowflake.
Input String : ["http://info.wealthenhancement.com/ppc-rt-retirement-planning"]
Output String : info.wealthenhancement.com/ppc-rt-retirement-planning
Please help to get output string.
Thanks
Use the substr function to only take characters from the 8th character to the end:
select
'http://info.wealthenhancement.com/ppc-rt-retirement-planning' as orig_value,
substr(orig_value, 8) as new_value
The output is:
+-------------------------------------------------------------+-------------------------------------------------------+
|ORIG_VALUE | NEW_VALUE |
+-------------------------------------------------------------+-------------------------------------------------------+
|http://info.wealthenhancement.com/ppc-rt-retirement-planning | info.wealthenhancement.com/ppc-rt-retirement-planning |
+-------------------------------------------------------------+-------------------------------------------------------+
This will work for both http and https URLs by splitting on // as a delimiter. Only the last statement is required. The other two show how it is built up in steps:
-- Set a session variable to the string
set INPUT_STRING = '["http://info.wealthenhancement.com/ppc-rt-retirement-planning"]';
-- Trim leading and trailing square brackets and double quotes
select (trim($INPUT_STRING, '"[]'));
-- Split using // as a delimiter and keep only the right part and cast as string
select split((trim($INPUT_STRING, '"[]')), '//')[1]::string as URL
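The same trim-then-split idea can be sketched in Python to sanity-check the expected output. The function name strip_scheme is hypothetical, and the sketch returns an empty string when no // is present, whereas the Snowflake expression would yield NULL in that case.

```python
def strip_scheme(raw: str) -> str:
    """Trim surrounding brackets and double quotes, then keep what follows '//'."""
    trimmed = raw.strip('"[]')             # same character set as trim($INPUT_STRING, '"[]')
    _, _, rest = trimmed.partition('//')   # split once on the first '//'
    return rest                            # empty string if '//' is absent


print(strip_scheme('["http://info.wealthenhancement.com/ppc-rt-retirement-planning"]'))
```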

Regex string with 2+ different numbers and some optional characters in Snowflake syntax

I would like to check if a specific column in one of my tables meets the following conditions:
String must contain at least three characters
String must contain at least two different numbers [e.g. 123 would work but 111 would not]
Characters which are allowed in the string:
Numbers (0-9)
Uppercase letters
Lowercase letters
Underscores (_)
Dashes (-)
I have some experience with Regex but am having issues with Snowflake's syntax. Whenever I try using the '?' regex character (to mark something as optional) I receive an error. Can someone help me understand a workaround and provide a solution?
What I have so far:
SELECT string,
LENGTH(string) AS length
FROM tbl
WHERE REGEXP_LIKE(string,'^[0-9]+{3,}[-+]?[A-Z]?[a-z]?$')
ORDER BY length;
Thanks!
Your regex looks a little confusing and invalid, and it doesn't look like it quite meets your needs either. I read this expression as a string that:
Must start with one or more digits, at least 3 or more times
The confusing part to me is that '+' is itself a quantifier, so it should not be quantifiable again with {3,}, yet somehow this doesn't produce an error for me
Optionally followed by either a dash or plus sign
Followed by an uppercase character zero or one times (giving back as needed)
Followed by and ending with a lowercase character zero or one times (giving back as needed)
Questions
You say that your string must contain 3 characters and at least 2 different numbers. Numbers are characters, but I'm not sure whether you mean 3 letters...
Are you considering the numbers to be characters?
Does the order of the characters matter?
Can you provide an example of the error you are receiving?
Notes
Checking for a second digit that is not the same as the first involves the concept of a lookahead with a backreference. Snowflake does not support backreferences.
One thing about pattern matching with regular expressions is that order makes a difference. If order is not of importance to you, then you'll have multiple patterns to match against.
Example
Below is how you can test each part of your requirements individually. I've included a few regexp_substr functions to show how extraction can work to check if something exists again.
Uncomment the WHERE clause to see the dataset filtered. The filters are written as expressions so you can remove any/all of the regexp_* columns.
select randstr(36,random(123)) as r_string
,length(r_string) AS length
,regexp_like(r_string,'^[0-9]+{3,}[-+]?[A-Z]?[a-z]?$') as reg
,regexp_like(r_string,'.*[A-Za-z]{3,}.*') as has_3_consecutive_letters
,regexp_like(r_string,'.*\\d+.*\\d+.*') as has_2_digits
,regexp_substr(r_string,'(\\d)',1,1) as first_digit
,regexp_substr(r_string,'(\\d)',1,2) as second_digit
,first_digit <> second_digit as digits_1st_not_equal_2nd
,not(regexp_instr(r_string,regexp_substr(r_string,'(\\d)',1,1),1,2)) as first_digit_does_not_appear_again
,has_3_consecutive_letters and has_2_digits and first_digit_does_not_appear_again as test
from table(generator(rowcount => 10))
//where regexp_like(r_string,'.*[A-Za-z]{3,}.*') // has_3_consecutive_letters
// and regexp_like(r_string,'.*\\d+.*\\d+.*') // has_2_digits
// and not(regexp_instr(r_string,regexp_substr(r_string,'(\\d)',1,1),1,2)) // first_digit_does_not_appear_again
;
Assuming the digits need to be contiguous, you can use a JavaScript UDF to find the number in a string with the largest number of distinct digits:
create or replace function f(S text)
returns float
language javascript
returns null on null input
as
$$
const m = S.match(/\d+/g)
if (!m) return 0
const lengths = m.map(m=> [...new Set (m.split(''))].length)
const max_length = lengths.reduce((a,b) => Math.max(a,b))
return max_length
$$
;
Combined with a WHERE clause, this does what you want, I believe:
select column1, f(column1) max_length
from t
where max_length>1 and length(column1)>2 and column1 rlike '[\\w\\d-]+';
Yielding:
COLUMN1 | MAX_LENGTH
------------------------+-----------
abc123def567ghi1111_123 | 3
123 | 3
111222 | 2
Assuming this input:
create or replace table t as
select * from values ('abc123def567ghi1111_123'), ('xyz111asdf'), ('123'), ('111222'), ('abc 111111111 abc'), ('12'), ('asdf'), ('123 456'), (null);
The function is even simpler if the digits don't have to be contiguous (i.e. count the distinct digits in a string). Then core logic changes to:
const m = S.match(/\d/g)
if (!m) return 0
const length = [...new Set (m)].length
return length
Hope that's helpful!
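To cross-check which strings should pass, the three requirements from the question (length of at least three, only the allowed characters, at least two different digits anywhere in the string) can be sketched in Python. This assumes the non-contiguous interpretation of "two different numbers"; the name is_valid is hypothetical.

```python
import re

# Only digits, ASCII letters, underscores and dashes, and at least 3 characters.
ALLOWED = re.compile(r'[A-Za-z0-9_-]{3,}')


def is_valid(s):
    """Return True if `s` meets all three requirements from the question."""
    if s is None or not ALLOWED.fullmatch(s):
        return False
    distinct_digits = set(re.findall(r'[0-9]', s))
    return len(distinct_digits) >= 2


for candidate in ['123', '111', 'abc123def567ghi1111_123', '123 456', '12', None]:
    print(candidate, is_valid(candidate))
```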

LTRIM and RTRIM Truncating Floating Point Number

I am experiencing what I would describe as entirely unexpected behaviour when I pass a float value through either LTRIM or RTRIM:
CREATE TABLE MyTable
(MyCol float null)
INSERT MyTable
values (11.7333335876465)
SELECT MyCol,
RTRIM(LTRIM(MyCol)) lr,
LTRIM(MyCol) l,
RTRIM(MyCol) r
FROM MyTable
Which gives the following results:
MyCol | lr | l | r
--------------------------------------------
11.7333335876465 | 11.7333 | 11.7333 | 11.7333
I have observed the same behaviour on SQL Server 2014 and 2016.
Now, my understanding is that LTRIM and RTRIM should just strip off white space from a value - not cast it/truncate it.
Does anyone have an idea what is going on here?
Just to explain the background to this. I am generating SQL queries using the properties of a set of C# POCOs (the results will be used to generate an MD5 hash that will then be compared to an equivalent value from an Oracle table) and for convenience was wrapping every column with LTRIM/RTRIM.
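LTRIM and RTRIM expect a character argument, so the float is implicitly converted to a string first, and that default conversion keeps at most six digits. Perhaps you can use format() instead: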
Declare @F float = 11.7333335876465
Select format(@F,'#.##############')
Returns
11.7333335876465
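The truncated value in the question is what you get when the float is rendered with six significant digits, which, as far as I know, is what SQL Server's default float-to-string conversion does. A small Python sketch reproduces both renderings; it only imitates the formatting and does not call SQL Server.

```python
value = 11.7333335876465

# Six significant digits, imitating the default float-to-string conversion
# (assumption: SQL Server's style 0 behaves like a 6-digit general format).
six_digits = format(value, '.6g')

# Shortest string that round-trips to the same float.
full = repr(value)

print(six_digits)
print(full)
```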

TSQL search exact match into a string

I'm stumbling on an issue with string parsing; what I'm trying to achieve is to substitute a marker string with a value, but the string match needs to be exact.
Keep in mind that before the comparison I split the entire string into a table (rowID int, segment nvarchar(max)) wherever I find a space, so something like 'The deltaT_s is §deltaT_s' will look like:
rowID | segment
1 | the
2 | deltaT_s
3 | is
4 | §deltaT_s
After this I cycle through each row with my table of "replacements" (idString nvarchar(max), val float); example:
Marker string (@segment): '§deltaT_s'
String to replace (@idString): '§deltaT_s'
The instruction I am using (since "like" is a lost cause as far as I can see):
SELECT STUFF(@segment, PATINDEX('%'+@idString+'[^a-z]%', @segment), LEN(@idString), CAST(@val AS NVARCHAR(MAX)))
with @val being the number to substitute, taken from the "replacements" table.
Now, in my table of "replacements" I have 2 delta-like markers:
1) §deltaT_s
2) §deltaT
My issue is that when the cycle starts comparing the segments with the markers and §deltaT comes up, it substitutes the first part of the string in this way:
'§deltaT_s' -> '10_s'
I don't understand what I am doing wrong with the REGEX. Can anyone give me a hand on this matter?
I am available in case more info is required.
Thank you,
F.
If possible, you should change the marking style by putting a paragraph symbol (§) on both sides of the token, making one of the examples in your comment
the deltaT_s is §deltaT_s§, see ya!
Doing that, the sentence will be split as
rowID | segment
--------------------
1 | the
2 | deltaT_s
3 | is
4 | §deltaT_s§,
5 | see
6 | ya!
if the replace values are stored in a fact table you will have something like
token | value
------------------
§deltaT§ | foo
§deltaT_s§ | 10
or you can fake it by putting the symbol at the end of the token in your query.
Then it's possible to search for the substitution with a LIKE and a LEFT JOIN between the two tables
SELECT COALESCE(REPLACE(segment, t.token, t.value), segment) Replaced
FROM Sentence s
LEFT JOIN Token t ON s.segment LIKE '%' + t.token + '%'
SQLFiddle demo
If you cannot change the fact table you can fake the change adding the symbol after the token
SELECT COALESCE(REPLACE(segment, t.token, t.value), segment) Replaced
FROM Sentence s
LEFT JOIN Token t ON s.segment LIKE '%' + t.token + '§%'
Maybe it is not an option, but it helped me once.
If you can use Regex in SQL or create CLR functions, look at the last 2 options at this link: http://www.sqllion.com/2010/12/pattern-matching-regex-in-t-sql/
For you, the best choice will be the last one, using a CLR function.
Then you can do something like this:
Text: the deltaT_s is §delta, see ya!
Regex: (?<=[^a-z])§delta(?![a-z_]) - here (?<=[^a-z]) means the marker must be preceded by a non-letter, which is not included in the match, and (?![a-z_]) means it is not followed by a letter or an underscore.
Replace to : 10
I also have tried regex \b§delta\b (\b :Start or End of word), but it seems it doesn't like §
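The "not followed by a letter or underscore" idea can be sketched in Python, whose re module supports lookaheads. The function name replace_markers and the values in the dictionary are hypothetical; a CLR-based regex in SQL Server would need its own pattern syntax checked.

```python
import re


def replace_markers(text: str, values: dict) -> str:
    """Replace each marker with its value, but only when the marker is not
    immediately followed by a letter, digit or underscore."""
    # Longest markers first, so '§deltaT_s' is tried before '§deltaT'.
    markers = sorted(values, key=len, reverse=True)
    alternation = '|'.join(re.escape(m) for m in markers)
    pattern = re.compile('(' + alternation + r')(?![A-Za-z0-9_])')
    return pattern.sub(lambda m: str(values[m.group(1)]), text)


# Hypothetical replacement values, standing in for the "replacements" table.
replacements = {'§deltaT': 10, '§deltaT_s': 2.5}
print(replace_markers('the deltaT_s is §deltaT_s', replacements))
print(replace_markers('now §deltaT, see ya!', replacements))
```

With the lookahead in place, the shorter marker can no longer consume the first part of the longer one.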

sphinx - Column count doesn't match

I have the following in my sphinx
mysql> desc rec;
+-----------+---------+
| Field | Type |
+-----------+---------+
| id | integer |
| desc | field |
| tid | uint |
| gid | uint |
| no | uint |
+-----------+---------+
And I ran the following successfully in sphinx sql
replace into rec VALUES ('24','test test',1,1, 1 );
But when I run in the C mysql API I get this error
Column count doesn't match value count at row 1
the c code is this
if (mysql_query(con, "replace into rec VALUES ('24','test test',1,1, 1 )") )
{
fprintf(stderr, "%s\n", mysql_error(con));
mysql_close(con);
exit(1);
}
Please note that the C program is connecting to the sphinx sql with no issues
One problem may be that you are quoting the integer for the id column. I would try taking out the single quotes around the 24. The column named desc is also concerning, since that is a reserved word in MySQL.
A good practice is to always specify the column names, even if you are inserting into all columns. The reason is that you may want to alter the table later to add a column, and you don't necessarily want to go back and change all your code to match the new structure. It also makes your code clearer, since you don't have to consult the table structure to know what the values mean, and it helps in case a tool like Sphinx is using a different column order than you expect. Try changing your code to this, which specifies the columns and quotes them (MySQL uses backticks to quote identifiers) and also removes the quotes around the value for the id column:
if (mysql_query(con, "replace into rec (`id`, `desc`, `tid`, `gid`, `no`) VALUES (24, 'test test', 1, 1, 1)") )

Resources