T-SQL, Find numeric values - sql-server

I did this quiz on http://www.sql-ex.ru/, question 35 to be exact.
The question is as follows:
In Product table, determine the models which consist only of digits or only of latin letters (A-Z, case insensitive).
Result set: model, type of model.
And I gave the correct answer which is:
SELECT model,
type
FROM Product
WHERE Model NOT LIKE '%[^A-Z]%'
OR Model NOT LIKE '%[^0-9]%'
Now my question is why do I need double negations to make it work.
If I rewrite the code to:
SELECT model,
type
FROM Product
WHERE Model LIKE '%[A-Z]%'
OR Model LIKE '%[0-9]%'
I get the wrong answer:
Your query returned the correct dataset on the first (available) database, but it returned incorrect dataset on the second checking database.
* Wrong number of records (more by 37)
How come the first example gives the correct answer while the second example doesn't?
I have tried to find an answer but had no luck. I would be grateful for an explanation.

Where Model LIKE '%[A-Z]%' Or Model LIKE '%[0-9]%'
Matches rows where Model contains at least one alphanumeric character.
This does not exclude in any way those values that contain mixed alphanumeric and non-alphanumeric characters.
e.g. ~A#- would pass because of the presence of the A
Moreover, your correct query matches either of:
NOT LIKE '%[^A-Z]%': those strings which do not contain any non-letters (i.e. consist only of letters or are empty)
NOT LIKE '%[^0-9]%': those strings which do not contain any non-digits (i.e. consist only of digits or are empty).
Your second attempt does not enforce this at all, so it would also accept a mixed string of letters and digits.
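To see the difference concretely, here is a small Python sketch that imitates the two WHERE clauses with regular expressions. It is only an illustration: the function names and sample values are made up, and [A-Za-z] stands in for the case-insensitive [A-Z] that the question assumes (in SQL Server, how a range behaves depends on the collation).

```python
import re

def passes_double_negation(model):
    # Mirrors: Model NOT LIKE '%[^A-Z]%' OR Model NOT LIKE '%[^0-9]%'
    has_non_letter = re.search(r"[^A-Za-z]", model) is not None
    has_non_digit = re.search(r"[^0-9]", model) is not None
    return (not has_non_letter) or (not has_non_digit)

def passes_positive_like(model):
    # Mirrors: Model LIKE '%[A-Z]%' OR Model LIKE '%[0-9]%'
    has_letter = re.search(r"[A-Za-z]", model) is not None
    has_digit = re.search(r"[0-9]", model) is not None
    return has_letter or has_digit

# Hypothetical sample values, not taken from the quiz database
for model in ["1121", "ABC", "A12", "~A#-"]:
    print(repr(model), passes_double_negation(model), passes_positive_like(model))
```

Only the first two sample values pass the double-negation check, while all four pass the positive one, which is why the second query returns extra rows.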
I would use your first attempt, but if you were determined to avoid the double negative you could use:
SELECT model
FROM Product
WHERE Model LIKE REPLICATE('[A-Z]', LEN(Model))
OR Model LIKE REPLICATE('[0-9]', LEN(Model))
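The REPLICATE version works by building a pattern with exactly one character class per character of the value. A rough Python equivalent, assuming regular expressions in place of LIKE and using hypothetical helper names, looks like this:

```python
import re

def like_replicated(model, char_class):
    # Mirrors: Model LIKE REPLICATE(char_class, LEN(Model))
    # One character class per character, so every position must match.
    pattern = char_class * len(model)
    return re.fullmatch(pattern, model) is not None

def only_letters_or_only_digits(model):
    # [A-Za-z] is an assumption standing in for a case-insensitive [A-Z]
    return like_replicated(model, "[A-Za-z]") or like_replicated(model, "[0-9]")

for model in ["1121", "ABC", "A12", "~A#-"]:
    print(repr(model), only_letters_or_only_digits(model))
```

One caveat with this sketch: T-SQL's LEN ignores trailing spaces while Python's len counts them, so values with trailing spaces may behave differently in the two.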

Related

How to split a search string into parts, then check parts against a database

Here's what I'm dealing with:
We have a database of machines and their part lists are specified using strings. For example, one machine might be specified with the string &XXX&YYY-ZZZ, meaning the machine contains parts XXX and YYY and not ZZZ.
We use &XXX to specify that a part exists in a machine, and -XXX to specify that a part does not exist in a machine.
It's also possible that a part is not listed (i.e. not specified whether or not it exists in the machine). For example I might only have &XXX&YYY (ZZZ is not specified).
Additionally, the codes can be in any order, for example I might have &XXX&YYY-ZZZ or &XXX-ZZZ&YYY.
In order to search for machines, I get a string like this: &XXX-YYY/&YYY&ZZZ (/ is an OR operator), meaning "I want to find all machines that either a) contain XXX and do not contain YYY, or b) contain both YYY and ZZZ".
I'm having trouble parsing the string based on the variable ordering, possibility that parts may not be shown, and handling of the / operator. Note, we use Microsoft 365.
Looking for some suggestions!
When I search for &XXX-YYY/&YYY&ZZZ, I should return the following machines:
Machine | Result
&XXX-YYY&ZZZ | TRUE (because XXX exists and YYY does not exist)
&XXX-YYY-ZZZ | TRUE (because XXX exists and YYY does not exist)
&XXX&YYY&ZZZ | TRUE (because YYY exists and ZZZ exists)
&XXX&ZZZ | FALSE (because YYY is specified in the search, but this machine doesn't specify it)
&ZZZ&YYY | TRUE (showing that parts can be in any order)
You can try it in cell C2 with the following formula:
=LET(query, A2, queries, TEXTSPLIT(query,, "/"), input, B2:B7,
qryNum, ROWS(queries),
SPLIT, LAMBDA(txt,LET(str, SUBSTITUTE(SUBSTITUTE(txt, "&",";1_"),
"-",";0_"), TEXTSPLIT(str,,";",TRUE))),
lkUps, DROP(REDUCE("", queries, LAMBDA(acc,qry, HSTACK(acc, SPLIT(qry)))),,1),
MAP(input, LAMBDA(txt, LET(str, SPLIT(txt),
out, REDUCE("", SEQUENCE(qryNum, 1), LAMBDA(acc,idx,
LET(cols, INDEX(lkUps,,idx), qry, FILTER(cols, cols<>""),
matches, SUM(N(ISNUMBER(XMATCH(str, qry)))),
result, IF(ROWS(qry)=matches,1,0),IF(acc="", result, MAX(acc, result))
))), IF(out=1, TRUE, FALSE)
)))
)
and here is the corresponding output:
Assumptions:
String values (operation and part) should be unique, i.e. the case &XXX-YYY&XXX is not considered, because &XXX is duplicated.
Explanation
The main idea is to transform the input information in a way we can do comparisons at array level via XMATCH. The first thing to do is to identify each OR condition in the search string because we need to test each one of them against the Input column. The name queries is an array with all the OR conditions.
We can transform the string inputs in a way that lets us split each string into an array. SPLIT is a user-defined LAMBDA function that does that:
LAMBDA(txt, LET(str, SUBSTITUTE(SUBSTITUTE(txt, "&",";1_"),"-",";0_"), TEXTSPLIT(str,,";",TRUE)))
What it does is convert for example the input: &XXX-YYY&ZZZ into the following array:
1_XXX
0_YYY
1_ZZZ
We change the original operations &,- into 1,0 just for convenience; you can keep the original operation characters, since this is irrelevant for the calculation. It is important to set the fourth TEXTSPLIT input argument to TRUE to ensure no empty rows are generated.
The name lkUps is an array holding all the OR conditions, one column per condition, in the format we want. For example:
1_XXX 1_YYY
0_YYY 1_ZZZ
Note: To create lkUps we use the DROP/REDUCE/HSTACK pattern. For more information about it, see the answer by @DavidLeal to the question: how to transform a table in Excel from vertical to horizontal but with different length.
Now we have all the elements we need to build the recurrence. We use MAP to iterate over all Input column values. For each element (txt) we transform it to the format of our convenience via SPLIT user LAMBDA function and name it str.
We use REDUCE function inside MAP to iterate over all columns of lkUps to check against str. We use SEQUENCE(qryNum, 1) as input of REDUCE to be able to iterate over each lkUps column (qry).
Now we are going to use the above variables in XMATCH and name the variable matches as follows:
SUM(N(ISNUMBER(XMATCH(str, qry))))
If all values from qry are found in str, then we have a match. Each element of str that is found in qry contributes 1 to the SUM, and each element that is not found contributes 0. Therefore, for a match, the SUM must equal the number of rows in qry.
Because we include both parts and operations (1,0) in the XMATCH, we ensure not just that the same parts are found, but also that their corresponding operations are the same. The order of the parts is not relevant, because XMATCH looks each value up regardless of position.
The REDUCE recurrence keeps the maximum value from the previous iteration (previous OR condition). We just need at least one match among all OR conditions. Therefore once we finish all the recurrence, if the result value of REDUCE is 1 at least one match was found. Finally, we transform the result into a TRUE/FALSE.
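The same idea (split each string into signed tokens, then require that all tokens of at least one OR condition appear in the machine) can be sketched outside Excel. The Python below is only an illustration of the logic, not a replacement for the formula; the function names are hypothetical, and it assumes part codes never contain & or - themselves.

```python
import re

def split_parts(spec):
    # "&XXX-YYY" -> {"&XXX", "-YYY"}; the sign stays attached to its part
    return set(re.findall(r"[&-][^&-]+", spec))

def machine_matches(query, machine):
    machine_parts = split_parts(machine)
    # "/" separates OR conditions; one fully contained condition is enough
    return any(split_parts(cond) <= machine_parts for cond in query.split("/"))

query = "&XXX-YYY/&YYY&ZZZ"
machines = ["&XXX-YYY&ZZZ", "&XXX-YYY-ZZZ", "&XXX&YYY&ZZZ", "&XXX&ZZZ", "&ZZZ&YYY"]
for machine in machines:
    print(machine, machine_matches(query, machine))
```

Using sets makes the order of parts irrelevant, which plays the same role as XMATCH in the formula.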
Note: For a large list of operations, instead of chaining two SUBSTITUTE calls as above, the SPLIT function can be defined as follows:
LAMBDA(txt,tks, LET(seq, SEQUENCE(COLUMNS(tks),1),
out, REDUCE("", seq, LAMBDA(acc,idx, LET(str, IF(acc="", txt, acc),
SUBSTITUTE(str, INDEX(tks,1,idx), INDEX(tks,2,idx))))),
TEXTSPLIT(out,,";",TRUE)))
and the input tks (tokens) can be defined as follows: {"&","-";"1_", "0_"}, i.e. the old values in the first row and the new values in the second row.
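As a rough illustration of this token-table approach, the following Python sketch applies a list of (old, new) replacements one after another and then splits on the delimiter. The function name is hypothetical, and the replacement values here carry the ";" delimiter explicitly, as in the two-SUBSTITUTE version.

```python
def split_with_tokens(txt, tokens):
    # Apply each (old, new) replacement in turn, like the REDUCE over tks
    for old, new in tokens:
        txt = txt.replace(old, new)
    # Drop empty pieces, like the ignore_empty=TRUE argument of TEXTSPLIT
    return [piece for piece in txt.split(";") if piece]

tokens = [("&", ";1_"), ("-", ";0_")]
print(split_with_tokens("&XXX-YYY&ZZZ", tokens))
```

The replacements are applied sequentially, so a new value should not contain a character that a later entry replaces.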

How to compare numeric in PostgreSQL JSONB

I ran into a strange situation working with the jsonb type.
Expected behavior
Using a short jsonb structure:
{"price": 99.99}
I wrote query like this:
SELECT * FROM table t WHERE t.data->'price' > 90.90
It fails with the error "operator does not exist: jsonb > numeric". The text operator (->>) fails the same way: "operator does not exist: text > numeric".
Then I wrote the comparison as mentioned in many resources:
SELECT * FROM table t WHERE (t.data->>'price')::NUMERIC > 90.90
And it works as expected.
What's strange:
SELECT * FROM table t WHERE t.data->'price' > '90.90';
It is a little weird, but the query above works correctly.
EXPLAIN: Filter: ((data -> 'price'::text) > '90.90'::jsonb)
But if I change the jsonb value to text, as in {"price": "99.99"},
there is no result any more: the result set is empty.
Question: How does PostgreSQL actually compare numeric data, and what is the preferable way to do this kind of comparison?
But you aren't comparing numeric data, are you.
I can see that you think price contains a number, but it doesn't. It contains a JSON value. That might be a number, or it might be text, or an array, or an object, or an object containing arrays of objects containing...
You might say "but the key is called 'price', of course it is a number", but that's no use to PostgreSQL, particularly if I come along and sneakily insert an object containing arrays of objects containing... [1]
So, if you want a number to compare to, you need to convert it to a number with (t.data->>'price')::NUMERIC, or convert your target value to JSON and let PostgreSQL do a JSON-based comparison (which might do what you want, it might not - I don't know what the exact rules are for JSON).
[1] And that's exactly the sort of thing I would do, even though it is Christmas. I'm a bad person.
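The point that a JSON value is not automatically a number can be shown with a short Python analogy. This is not how PostgreSQL compares jsonb internally; it only sketches the "extract, then convert explicitly" approach of (t.data->>'price')::NUMERIC. The function name is hypothetical, and unlike the SQL cast (which raises an error on non-numeric text) this sketch simply returns False.

```python
import json
from decimal import Decimal, InvalidOperation

def price_above(doc, threshold):
    # Assumes doc is a JSON object; "price" may hold any JSON value
    value = json.loads(doc, parse_float=Decimal).get("price")
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(value, bool) or not isinstance(value, (int, Decimal, str)):
        return False  # objects, arrays, booleans and null are not prices
    try:
        return Decimal(value) > Decimal(threshold)
    except InvalidOperation:
        return False  # text that does not look like a number

print(price_above('{"price": 99.99}', "90.90"))
print(price_above('{"price": "99.99"}', "90.90"))
print(price_above('{"price": {"nested": [1, 2]}}', "90.90"))
```

With an explicit conversion, the number 99.99 and the string "99.99" are treated the same way, while a nested object is rejected instead of being compared by some JSON-specific ordering.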

Full text index doesn't work at single word?

I have a full text index on many columns of the customer table, one of which columns is fname.
The following query:
select * from customer where fname like 'In%' and code='1409584557891'
returns the row I need; this customer has an fname of 'In'. But if I add this to the end:
and contains((customer.fname) , N'"In*"')
an empty result set is returned. Why?
Also: there is another column named lname. If I add the equivalent contains command with the column and its value altered, it works!
There is a good chance "In" is a noise word. I also believe that if you do a full-text search for something too short, like the letter 'a', it is simply considered a noise word. See if 'a' or 'I' gives you anything.
Here is a link that can provide information on changing the noise words around if that is the case.
https://www.mssqltips.com/sqlservertip/1491/sql-server-full-text-search-noise-words-and-thesaurus-configurations/
You may also be able to simply turn off noise or 'stop' words:
https://dba.stackexchange.com/questions/135062/sql-server-no-search-results-caused-by-noise-words

SQL Server validating postcodes

I have a table containing postcodes but there is no validation built in to the entry form so there is no consistency in the way they are stored in the database, sample below:
ID Postcode
001742 B5
001745
001746
001748 DY3
001750
001751
001768 B276LL
001774 B339HY
001776 B339QY
001780 WR51DD
I want to use these postcodes to map the distance from a central point, but before I can do that I need to put them into a valid format and filter out any blanks or incomplete postcodes.
I had considered using
left(postcode,3) + ' ' + right(postcode,3)
To correct the formatting but this wouldn't work for postcodes like 'M6 8HD'
My aim is to get the list of postcodes in a valid format, but I don't know how to account for different lengths of postcode. Is there a way to do this in SQL Server?
As discussed in the comments, sometimes looking at a problem the other way around presents a far simpler solution.
You have a list of arbitrary input provided by users, which frequently doesn't contain the correct spacing. You also have a list of valid postcodes which are correctly spaced.
You're trying to solve the problem of finding the correct place to insert spaces into your arbitrary inputs to make them match the list of valid codes, and this is extremely difficult to do in practice.
However, performing the opposite task - removing the spaces from the valid postcodes - is remarkably easy to do. So that is what I'd suggest doing.
In our most recent round of data modelling, we have modelled addresses with two postcode columns - PostCode containing the postcode as provided from whatever sources, and PostCodeNoSpace, a computed column which strips whitespace characters from PostCode. We use the latter column for e.g. searches based on user input. You may want to do something similar with your list of Valid postcodes, if you're keeping it around permanently - so that you can perform easy matches/lookups and then translate those matches back into a version that has spaces - which is actually a solution to the original question posed!
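The "strip the spaces from the valid list and match on that" idea can be sketched in a few lines of Python. The list of valid postcodes below is hypothetical sample data, the function names are made up, and upper-casing the input is an extra assumption that is not part of the description above.

```python
def strip_spaces(postcode):
    # Remove all whitespace; upper-casing is an added assumption
    return "".join(postcode.split()).upper()

def build_lookup(valid_postcodes):
    # Map the space-free form back to the correctly spaced form
    return {strip_spaces(p): p for p in valid_postcodes}

def normalise(raw, lookup):
    # Returns the correctly spaced postcode, or None if it is not a known one
    return lookup.get(strip_spaces(raw))

# Hypothetical reference list of valid, correctly spaced postcodes
lookup = build_lookup(["B27 6LL", "B33 9HY", "M6 8HD", "WR5 1DD"])
for raw in ["B276LL", "m68hd", "B5", ""]:
    print(repr(raw), normalise(raw, lookup))
```

Blank and incomplete entries simply fail the lookup, so the same step both reformats the valid postcodes and filters out the rest.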

Ordering by the first alphabetical char in a column in MYSQL

A table in a MySQL database has address details, e.g.
add1, add2, add3, district, postalTown, country
Ordering by postal town is usually fine, but some rows have numbers in the postalTown column, for example 1420 Territet or 3100 Overijse. This means they appear at the top, above Aberdeen or Bristol. Is there a way of ordering by postalTown but by the first alphabetical character? That would mean the order of the above would be: Aberdeen, Bristol, Overijse, Territet
Thanks
Write an expression that will return the first alphabetical character, then just Order By [that expression]
Order By substring(LTrim(
Replace(Replace(Replace(Replace(Replace(
Replace(Replace(Replace(Replace(Replace(
colname, '1', ''),'2',''),'3',''),'4',''),'5', ''),
'6',''),'7',''),'8',''),'9',''),'0','')),
1,1)
If you want the rows sorted by the entire city name, and not just by the first character (as question title specifies) then use this:
Order By LTrim(
Replace(Replace(Replace(Replace(Replace(
Replace(Replace(Replace(Replace(Replace(
colname, '1', ''),'2',''),'3',''),'4',''),'5', ''),
'6',''),'7',''),'8',''),'9',''),'0',''))
Above is a guess (I haven't tried it), but the idea is first delete all numeric characters from the column value, then take the first character of whatever is remaining.
Also, if this works, and if you have any development access to the database (thinking of the DRY principle), I would add a computed column to this table (or a separate view against the table) that is defined using the above expression, so that this "extraction" of the town name is available to all other code that might want to access it, without copying the expression everywhere you need it.
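The effect of the nested Replace calls (delete every digit, trim leading spaces, sort on what is left) can be checked quickly with a Python sketch. This is only an illustration of the sort key, using the town names from the question and a hypothetical function name.

```python
import re

def town_sort_key(postal_town):
    # Same idea as the nested Replace(...) calls followed by LTrim:
    # remove every digit, then strip leading whitespace
    return re.sub(r"[0-9]", "", postal_town).lstrip()

towns = ["1420 Territet", "Bristol", "3100 Overijse", "Aberdeen"]
print(sorted(towns, key=town_sort_key))
# ['Aberdeen', 'Bristol', '3100 Overijse', '1420 Territet']
```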
You could write a stored function which returns the remainder of the column starting at the first alphabetic character (perhaps using REGEXP to find that index). Then order by the stored function.
Edit: instead of regexp in your function, depending on the data format you could find the index of the first ' ' (space), e.g. with LOCATE, then call substring to return the remainder of the string after the first space.
Once you've created a stored function to return the string following the numbers, you can utilize it like this:
order by yourfunctionname(postalTown)
Stored Functions
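What such a stored function would compute can be sketched in Python: find the first alphabetic character and return the rest of the string from there. The function name is hypothetical, [A-Za-z] is an assumption that ignores accented letters, and returning the value unchanged when it has no letter is an arbitrary choice for this sketch.

```python
import re

def from_first_letter(value):
    # Locate the first ASCII letter and return the string from that point on
    match = re.search(r"[A-Za-z]", value)
    return value[match.start():] if match else value

towns = ["1420 Territet", "Bristol", "3100 Overijse", "Aberdeen"]
print(sorted(towns, key=from_first_letter))
# ['Aberdeen', 'Bristol', '3100 Overijse', '1420 Territet']
```

Unlike deleting all digits, this keeps any digits that appear after the first letter, so a name with a number in the middle would stay intact.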
The first thing that comes to mind would be to do the following in my ORDER BY, obviously adding the numbers 0 through 9. You'll notice crappy schemas produce crappy solutions. :) As the gentleman said above, you should probably think about a redesign of how you are storing your town data.
ORDER BY REPLACE(REPLACE(REPLACE(FieldName, '1', ''),'2',''),'3','') ETC.
Create a view on the table, making whatever translations you need, and then query against the view?
