SQL query to find any consecutive integer in my amount field - sql-server

So my data in the table looks like this:
amount | ID
10918.6 | ABC
9999.99 | BCD
9999.89 | DEF
I need to find all amounts made of one repeated digit (9999.99, 1111.11, 2222.22, etc., but not 0000.00). From the example above, the output should be only BCD. I only have to check up to the thousands place.
If I have 9999.99 and 99.99, it should give me only 9999.99.
Also, if I have 989999.99, that should count as an accepted output as well.
I can do this with a WHERE clause -- column like '%9999.99' or '%1111.11' -- but I need a better way, maybe a regular expression.

Using modulo, you can strip away any digits from the 10k position upward, then check that the remainder is in a list of accepted values.
WHERE
(amount % 10000) IN (1111.11, 2222.22, 3333.33, 4444.44, 5555.55, 6666.66, 7777.77, 8888.88, 9999.99)
Or...
WHERE
(amount % 10000) / 1111.11 IN (1,2,3,4,5,6,7,8,9)
These avoid turning numbers into strings, which is generally neither necessary nor prudent.
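To check the modulo idea outside the database, here is a minimal Python sketch of the same test. It assumes the amounts arrive as decimal strings; the names ACCEPTED and has_repeated_digit_tail are made up for this illustration.

```python
from decimal import Decimal

# Accepted remainders: 1111.11, 2222.22, ..., 9999.99 (0000.00 is excluded).
ACCEPTED = {Decimal(f"{d * 1111}.{d * 11}") for d in range(1, 10)}


def has_repeated_digit_tail(amount: str) -> bool:
    """True when the amount modulo 10000 is one of the accepted values."""
    # Like the SQL version, the modulo drops everything from the 10k position up.
    return Decimal(amount) % Decimal(10000) in ACCEPTED


for amount in ["10918.6", "9999.99", "9999.89", "99.99", "989999.99"]:
    print(amount, has_repeated_digit_tail(amount))
```

Only 9999.99 and 989999.99 pass, which matches the rows the question wants back.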

Related

Regex string with 2+ different numbers and some optional characters in Snowflake syntax

I would like to check if a specific column in one of my tables meets the following conditions:
String must contain at least three characters
String must contain at least two different numbers [e.g. 123 would work but 111 would not]
Characters which are allowed in the string:
Numbers (0-9)
Uppercase letters
Lowercase letters
Underscores (_)
Dashes (-)
I have some experience with Regex but am having issues with Snowflake's syntax. Whenever I try using the '?' regex character (to mark something as optional) I receive an error. Can someone help me understand a workaround and provide a solution?
What I have so far:
SELECT string,
LENGTH(string) AS length
FROM tbl
WHERE REGEXP_LIKE(string,'^[0-9]+{3,}[-+]?[A-Z]?[a-z]?$')
ORDER BY length;
Thanks!
Your regex looks a little confusing and invalid, and it doesn't look like it quite meets your needs either. I read this expression as a string that:
Must start with one or more digits, at least 3 or more times
The confusing part to me is that '+' is itself a quantifier, so it should not be quantifiable again with {3,}, yet somehow this doesn't produce an error for me
Optionally followed by either a dash or plus sign
Followed by an uppercase character zero or one times (giving back as needed)
Followed by and ending with a lowercase character zero or one times (giving back as needed)
Questions
You say that your string must contain 3 characters and at least 2 different numbers. Numbers are characters, but I'm not sure whether you mean 3 letters...
Are you considering the numbers to be characters?
Does the order of the characters matter?
Can you provide an example of the error you are receiving?
Notes
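Checking for a second digit that is not the same as the first would normally involve a lookahead with a backreference. Snowflake does not support backreferences inside a regex pattern (they are only available in the replacement string of REGEXP_REPLACE), so this cannot be done in a single pattern.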
One thing about pattern matching with regular expressions is that order makes a difference. If order is not of importance to you, then you'll have multiple patterns to match against.
Example
Below is how you can test each part of your requirements individually. I've included a few regexp_substr functions to show how extraction can work to check if something exists again.
Uncomment the WHERE clause to see the dataset filtered. The filters are written as expressions so you can remove any/all of the regexp_* columns.
select randstr(36,random(123)) as r_string
,length(r_string) AS length
,regexp_like(r_string,'^[0-9]+{3,}[-+]?[A-Z]?[a-z]?$') as reg
,regexp_like(r_string,'.*[A-Za-z]{3,}.*') as has_3_consecutive_letters
,regexp_like(r_string,'.*\\d+.*\\d+.*') as has_2_digits
,regexp_substr(r_string,'(\\d)',1,1) as first_digit
,regexp_substr(r_string,'(\\d)',1,2) as second_digit
,first_digit <> second_digit as digits_1st_not_equal_2nd
,not(regexp_instr(r_string,regexp_substr(r_string,'(\\d)',1,1),1,2)) as first_digit_does_not_appear_again
,has_3_consecutive_letters and has_2_digits and first_digit_does_not_appear_again as test
from table(generator(rowcount => 10))
//where regexp_like(r_string,'.*[A-Za-z]{3,}.*') // has_3_consecutive_letters
// and regexp_like(r_string,'.*\\d+.*\\d+.*') // has_2_digits
// and not(regexp_instr(r_string,regexp_substr(r_string,'(\\d)',1,1),1,2)) // first_digit_does_not_appear_again
;
Assuming the digits need to be contiguous, you can use a JavaScript UDF to find the number in a string with the largest number of distinct digits:
create or replace function f(S text)
returns float
language javascript
returns null on null input
as
$$
const m = S.match(/\d+/g)
if (!m) return 0
const lengths = m.map(m=> [...new Set (m.split(''))].length)
const max_length = lengths.reduce((a,b) => Math.max(a,b))
return max_length
$$
;
Combined with a WHERE clause, this does what you want, I believe:
select column1, f(column1) max_length
from t
where max_length>1 and length(column1)>2 and column1 rlike '[\\w\\d-]+';
Yielding:
COLUMN1 | MAX_LENGTH
------------------------+-----------
abc123def567ghi1111_123 | 3
123 | 3
111222 | 2
Assuming this input:
create or replace table t as
select * from values ('abc123def567ghi1111_123'), ('xyz111asdf'), ('123'), ('111222'), ('abc 111111111 abc'), ('12'), ('asdf'), ('123 456'), (null);
The function is even simpler if the digits don't have to be contiguous (i.e. count the distinct digits in a string). Then core logic changes to:
const m = S.match(/\d/g)
if (!m) return 0
const length = [...new Set (m)].length
return length
Hope that's helpful!
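For comparison, the three requirements from the question (at least three characters, only the allowed characters, at least two different digits anywhere in the string) can be sketched in a few lines of Python. This is only an illustration of the non-contiguous variant, not Snowflake code, and the name is_valid is invented here.

```python
import re

# Allowed characters: digits, upper- and lowercase letters, underscore, dash.
ALLOWED = re.compile(r"[A-Za-z0-9_-]+")


def is_valid(s: str) -> bool:
    """Check length, allowed characters, and at least two distinct digits."""
    if len(s) < 3 or not ALLOWED.fullmatch(s):
        return False
    distinct_digits = set(re.findall(r"[0-9]", s))
    return len(distinct_digits) >= 2


for candidate in ["123", "111", "abc123def567ghi1111_123", "123 456", "12"]:
    print(repr(candidate), is_valid(candidate))
```

Counting distinct digits with a set sidesteps the backreference problem, which is the same trick the JavaScript UDF uses.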

T-SQL: SUM Number between Delimiters from String

I need to get numbers with a variable length out of a string and sum them.
The strings got the following format:
EH:NUMBER=SomeOtherStuff->Code
I'm extracting the code via RIGHT() and join with another table to get the group right, at the moment I'm using sum to get it together via date:
SUM(CASE WHEN (MONTH(data.DATE1) = 5 AND YEAR(data.DATE1) = YEAR(GETDATE())) THEN 1 ELSE 0 END) N'Mai',
I then need to sum the numbers from the string and not the number of rows.
Some Examples:
Month1 EH:1=24->ZTM
Month1 EH:4=13-21->LKm
Month2 EH:3=34,33,43->LKm
Month2 EH:7=12,92-29,29->LKm
Month2 EH:5=24-26,11,21,22->ZOL
What I need:
Material - Month1 - Month2
ZTM - 1 - 0
LKM - 4 - 10
ZOL - 0 - 5
Could you help me please?
Greetings
Short version:
What you are looking for is SUBSTRING.
Longer version:
To get the sum of the numerical value of NUMBER, you need to think about how to break the problem down.
I'd recommend following these steps:
Extract the NUMBER part from the string. This should be done with SUBSTRING (much like you extract Code with RIGHT). To get the start and length of your substring, use CHARINDEX (or PATINDEX if you like).
Convert the NUMBER part to a numerical value with CAST (or CONVERT, or whatever you are familiar with).
Now you can do your aggregation.
So: SUM(CAST(SUBSTRING(<this part you will have to figure out yourself>) AS <the correct numerical data type>)).
I'll let you figure out the values to insert yourself, and would recommend first finding the positions of the delimiting characters, then extracting the NUMBER part, then getting the numerical value... you get it.
This is so that you gain a better understanding of what you are actually doing.
Cheers, and good luck with your assignment
Martin
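To see the three steps (find the delimiters, extract NUMBER, convert and sum) working on the sample rows without giving away the T-SQL, here is a rough Python sketch. It assumes every row looks like "<month> EH:<NUMBER>=<other stuff>-><code>" and treats the codes as case-insensitive, since the question mixes LKm and LKM; the names ROW and sum_by_code are invented for the example.

```python
import re
from collections import defaultdict

# Assumed layout: "<month> EH:<NUMBER>=<other stuff>-><code>"
ROW = re.compile(r"^(?P<month>\S+)\s+EH:(?P<number>\d+)=.*->(?P<code>\w+)$")


def sum_by_code(rows):
    """Sum NUMBER per code and month; rows not matching the layout are skipped."""
    totals = defaultdict(lambda: defaultdict(int))
    for row in rows:
        m = ROW.match(row)
        if m is None:
            continue
        # Step 1: extract NUMBER; step 2: convert to int; step 3: aggregate.
        totals[m["code"].upper()][m["month"]] += int(m["number"])
    return {code: dict(months) for code, months in totals.items()}


rows = [
    "Month1 EH:1=24->ZTM",
    "Month1 EH:4=13-21->LKm",
    "Month2 EH:3=34,33,43->LKm",
    "Month2 EH:7=12,92-29,29->LKm",
    "Month2 EH:5=24-26,11,21,22->ZOL",
]
print(sum_by_code(rows))
```

In T-SQL the same positions would come from CHARINDEX on ':' and '=', as described in the steps above.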

String concatenation based on column length

I have telephone numbers like this in one table:
ID Telephone extention
------------------------------
1 9986323422 4
2 9992108 2222
3 9962718 241
The result I want: take the number of digits in extention and replace that many digits at the end of the "Telephone" column with it.
I want my result to be:
ID Telephone extention result
-----------------------------------------
1 9986323422 4 9986323424
2 9992108 2222 9992222
3 9962718 241 9962241
I have 100k records like this. What is the best and quick way to achieve this? Thanks.
This may be a little too cute [1], but it is an alternative to the STUFF approaches:
SELECT ID,Telephone,Extension,
SUBSTRING(Telephone,1-LEN(Extension),LEN(Telephone)) + Extension as Result
It works because SUBSTRING counts its length from the start position even when that position lies before the first character: with a start of 1-LEN(Extension) and a length of LEN(Telephone), the result ends LEN(Extension) characters short of the end of the string.
[1] It avoids repetitive calls to LEN() (though the optimizer should be able to avoid duplication anyway) and avoids having to reverse the entire string, but this does come at a readability cost.
You can use STUFF() together with some calculations with LEN()
DECLARE @dummyTable TABLE(ID INT,Telephone VARCHAR(100), extention VARCHAR(100));
INSERT INTO @dummyTable VALUES
(1,'9986323422','4')
,(2,'9992108','2222')
,(3,'9962718','241');
SELECT *
,STUFF(t.Telephone,LEN(t.Telephone)-LEN(t.extention)+1,LEN(t.extention),t.extention) AS result
FROM @dummyTable AS t;
You might have to add some validations to avoid errors (e.g. the extension should be shorter than the phone number).
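The replacement rule and the validation mentioned above can be sanity-checked with a small Python sketch. The function name replace_tail is hypothetical, and rejecting only extensions that are longer than the number (equal length is allowed here) is an assumption of this example, not something the question specifies.

```python
def replace_tail(telephone: str, extension: str) -> str:
    """Overwrite the last len(extension) characters of telephone with extension."""
    if len(extension) > len(telephone):
        # Assumed validation: an over-long extension is treated as an error.
        raise ValueError("extension is longer than the telephone number")
    keep = len(telephone) - len(extension)
    return telephone[:keep] + extension


for tel, ext in [("9986323422", "4"), ("9992108", "2222"), ("9962718", "241")]:
    print(tel, ext, replace_tail(tel, ext))
```

The three sample rows come out as 9986323424, 9992222 and 9962241, matching the wanted result column.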
In a similar way, use the REVERSE() function with STUFF() to replace the end digits of the Telephone value with the extention value:
select *, reverse(stuff(reverse(Telephone), 1, len(extention), reverse(extention)))
from table

Check last bit in a hex in SQL

I have a SQL entry that is of type hex (varbinary) and I want to do a SELECT COUNT for all the entries that have this hex value ending in 1.
I was thinking about using CONVERT to make my hex into a char and then use WHERE my_string LIKE "%1". The thing is that varchar is capped at 8000 chars, and my hex is longer than that.
What options do I have?
Varbinary actually works with some string manipulation functions, most notably SUBSTRING. So you can use, e.g.:
select substring(yourBinary, 1, 1);
To get the first byte of your binary column. To get the last bit then, you can use this:
select substring(yourBinary, len(yourBinary), 1) & 1;
This will give you zero if the bit is off, or one if it is on.
However, if you really only have to check at most the last 4-8 bytes, you can easily use the bitwise operators on the column directly:
select yourBinary & 1;
As a final note, this is going to be rather slow. So if you plan on doing this often, on large amounts of data, it might be better to simply create another bit column just for that, which you can index. If you're talking about at most a thousand rows or so, or if you don't care about speed, fire away :)
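The "take the last byte, then mask with 1" idea is independent of how long the value is. A minimal Python sketch of the same check on raw bytes follows; last_bit_is_set is a name made up for this example, and treating an empty value as "bit not set" is an assumption.

```python
def last_bit_is_set(value: bytes) -> bool:
    """True when the lowest bit of the final byte is 1; empty input is False."""
    return bool(value) and (value[-1] & 1) == 1


values = [
    bytes.fromhex("00ff01"),      # ends in ...0001 -> counted
    bytes.fromhex("10"),          # ends in ...0000 -> not counted
    b"\x00" * 9000 + b"\x03",     # longer than 8000 bytes, ends in ...0011 -> counted
]
# Equivalent of the SELECT COUNT the question asks for.
count = sum(last_bit_is_set(v) for v in values)
print(count)
```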
Check last four bits = 0001
SELECT SUM(CASE WHEN MyColumn % 16 IN (-15,1) THEN 1 END) FROM MyTable
Check last bit = 1
SELECT SUM(CASE WHEN MyColumn % 2 IN (-1,1) THEN 1 END) FROM MyTable
If you are wondering why you have to check for negative moduli, try SELECT 0x80000001 % 16
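The negative results appear because the remainder takes the sign of the dividend, and a value whose top bit is set becomes negative once it is read as a signed integer. The sketch below reproduces that in Python, assuming the four bytes are interpreted as a big-endian signed 32-bit integer; to_int32 and truncated_mod are illustrative helpers. Note that Python's own % operator rounds toward negative infinity, so it is not used for the emulation.

```python
import math


def to_int32(raw: bytes) -> int:
    """Interpret 4 big-endian bytes as a signed 32-bit integer."""
    return int.from_bytes(raw, byteorder="big", signed=True)


def truncated_mod(a: int, b: int) -> int:
    """Remainder that keeps the sign of the dividend (unlike Python's %)."""
    return int(math.fmod(a, b))


value = to_int32(bytes.fromhex("80000001"))
print(value)                     # a negative number, because the top bit is set
print(truncated_mod(value, 16))  # negative remainder for the last four bits
print(truncated_mod(value, 2))   # negative remainder for the last bit
```

This is why the checks above list both the negative and the positive remainder in the IN clause.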
Try using this where
WHERE LEFT(my_string,1) = 1
If it's text values ending in 1, then you want RIGHT as opposed to LEFT:
WHERE RIGHT(my_string,1) = 1

What is the best way to change to a currency format?

I have a list of values such as "12000","12345","123456" that need to be converted to currency ("120.00", "123.45", "1234.56"). The only way I know is to convert the value to a string, copy the first strlen()-2 characters to one string (dollars) and the remaining two digits to another string (cents), and then write them as follows:
printf("%s.%s", dollars, cents);
printf("$%.2f", value / 100.0); /* divide by 100.0 so an integer value is not truncated */
Don't use floats for storing or representing monetary amounts. Use longs (if you need more than 4 billion cents, use long longs). It's usually a good idea to represent currency in its minimum usable unit (for example, use 10000 to represent 100 euros). Then the correct way to format these values (assuming 100 cents to the euro or dollar, and value being a long) is:
printf("%ld.%02ld", value / 100, value % 100);
Hope that makes sense...
Calculations with currency values are a complex subject, but you can't go far wrong if you always aim for an answer rounded to the nearest currency unit (the cent, for example) and always make sure that rounding errors are accounted for (for example, to divide 1 dollar three ways you should end up with 33+33+34 or 33+33+33+1).
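Both points (format from integer cents, and never lose a cent when splitting) can be illustrated with a short Python sketch. The helper names format_cents and split_cents are invented here, and the sketch assumes 100 cents per unit, a non-negative total and at least one part when splitting.

```python
from typing import List


def format_cents(cents: int) -> str:
    """Format an integer number of cents as units.cents, e.g. 12345 -> 123.45."""
    sign = "-" if cents < 0 else ""
    whole, frac = divmod(abs(cents), 100)
    return f"{sign}{whole}.{frac:02d}"


def split_cents(total: int, parts: int) -> List[int]:
    """Split total into integer shares that add up to total exactly."""
    base, extra = divmod(total, parts)
    # The last `extra` shares receive one additional cent each.
    return [base] * (parts - extra) + [base + 1] * extra


print(format_cents(12345))
print(split_cents(100, 3))
```

Because everything stays in whole cents, the shares always add back up to the original amount.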
To prefix values less than $1.00 with 0, use:
printf( "$%0.2f", value / 100.0 );
This will result in $0.25 if value = 25
