Find rows where text array contains value similar to input

I'm trying to get rows where a column of type text[] contains a value similar to some user input.
What I've thought and done so far is to use the 'ANY' and 'LIKE' operator like this:
select * from someTable where '%someInput%' LIKE ANY(someColum);
But it doesn't work. The query returns the same rows as this query:
select * from someTable where 'someInput' = ANY(someColum);
I got a good result using the unnest() function in a subquery, but I need to query this in the WHERE clause if possible.
Why doesn't the LIKE operator work with the ANY operator, and why don't I get any errors? I thought one reason might be that the ANY operator is on the right-hand side of the query, but ...
Is there any solution to this without using unnest(), ideally in the WHERE clause?

It's also important to understand that ANY is not an operator but an SQL construct that can only be used to the right of an operator. More:
How to use ANY instead of IN in a WHERE clause with Rails?
The LIKE operator (or more precisely, the LIKE expression, which Postgres internally rewrites to the ~~ operator) expects the value on the left and the pattern on the right. There is no COMMUTATOR for this operator (as there is for the simple equality operator =), so Postgres cannot flip the operands around.
Your attempt:
select * from someTable where '%someInput%' LIKE ANY(someColum);
has the left and right operands flipped, so '%someInput%' is the value and the elements of the array column someColum are taken to be patterns (which is not what you want).
It would have to be ANY(someColum) LIKE '%someInput%', except that this is not possible with the ANY construct, which is only allowed to the right of an operator. You are hitting a roadblock here.
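To see why no error is raised, here is a rough Python model (not Postgres itself) of LIKE semantics. The value goes on the left and the pattern on the right, with % and _ acting as wildcards. The like() function and the sample array are hypothetical. The point is only that swapping the operands silently turns every array element into a pattern.

```python
import re


def like(value: str, pattern: str) -> bool:
    """Rough model of `value LIKE pattern`: % = any run, _ = one char, backslash escapes."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))  # escaped char is literal
            i += 2
            continue
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
        i += 1
    # LIKE must match the whole value, not just a substring of it.
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None


some_colum = ["foo", "xx someInput yy", "bar"]  # hypothetical array value

# '%someInput%' LIKE ANY(someColum): the literal is the value, elements are patterns.
flipped = any(like("%someInput%", elem) for elem in some_colum)

# What was intended: each element is the value, '%someInput%' is the pattern.
intended = any(like(elem, "%someInput%") for elem in some_colum)

print(flipped, intended)  # False True
```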
Related:
Is there a way to usefully index a text column containing regex patterns?
Can PostgreSQL index array columns?
You can normalize your relational design and save the elements of the array in separate rows of a separate table. Barring that, unnest() is the solution, as you already found yourself. Since you are only interested in the existence of at least one matching element, an EXISTS subquery is the most efficient option (Postgres can stop searching as soon as it finds the first match), and it also avoids duplicates in the result:
SELECT *
FROM tbl
WHERE EXISTS (
SELECT -- can be empty
FROM unnest(someColum) elem
WHERE elem LIKE '%someInput%'
);
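Here is a plain-Python sketch of what the EXISTS form buys you. It uses a hypothetical table of (id, array) rows, with a substring test standing in for elem LIKE '%someInput%'. The any() call stops at the first matching element, so each outer row is returned at most once. Joining against the unnested elements instead yields one row per match.

```python
rows = [  # hypothetical table rows: (id, someColum)
    (1, ["alpha", "someInput-1", "someInput-2"]),
    (2, ["beta"]),
    (3, ["xx someInput yy"]),
]


def matches(elem: str, needle: str) -> bool:
    # Stands in for: elem LIKE '%' || needle || '%' (needle has no wildcards).
    return needle in elem


# WHERE EXISTS (SELECT FROM unnest(someColum) elem WHERE elem LIKE ...):
# any() short-circuits on the first match, so each row is kept at most once.
exists_ids = [rid for rid, arr in rows if any(matches(e, "someInput") for e in arr)]

# Joining to unnest() instead emits one row per matching element.
join_ids = [rid for rid, arr in rows for e in arr if matches(e, "someInput")]

print(exists_ids)  # [1, 3]
print(join_ids)    # [1, 1, 3]
```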
You may want to escape special characters (%, _ and the escape character \) in someInput. See:
Escape function for regular expression or LIKE patterns
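One client-side way to do the escaping is sketched below with a hypothetical helper, escape_like(). It prefixes %, _ and the escape character itself with a backslash, which is the default LIKE escape character in Postgres. The escape character has to be doubled first; otherwise the backslashes added for % and _ would be doubled as well.

```python
def escape_like(text: str, escape: str = "\\") -> str:
    """Hypothetical helper: make user input match literally inside a LIKE pattern."""
    return (
        text.replace(escape, escape + escape)  # must come first
        .replace("%", escape + "%")
        .replace("_", escape + "_")
    )


user_input = "50%_off"
pattern = "%" + escape_like(user_input) + "%"
print(pattern)  # %50\%\_off%
```

Pass the resulting pattern to the query as a bind parameter rather than splicing it into the SQL text.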
Be careful with the negation (NOT LIKE ALL (...)) when NULL values can be involved:
Check if NULL exists in Postgres array

An admittedly imperfect possibility might be to use ARRAY_TO_STRING, then use LIKE against the result. For example:
SELECT *
FROM someTable
WHERE ARRAY_TO_STRING(someColum, '||') LIKE '%someInput%';
This approach is potentially problematic, though, because someone could match across two array elements if they discover the joining character sequence. For example, an array of {'Hi','Mom'} joined with || would return a result if the user had entered i||M in place of someInput. The expectation would probably be that there is no result in that case, since neither Hi nor Mom individually contains the i||M sequence of characters.
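The false positive is easy to reproduce outside the database. In this Python model, a substring test stands in for LIKE '%...%' and "||".join() stands in for ARRAY_TO_STRING(someColum, '||'):

```python
tags = ["Hi", "Mom"]  # the {'Hi','Mom'} array from the example
user_input = "i||M"

# ARRAY_TO_STRING(someColum, '||') LIKE '%' || input || '%'
joined_hit = user_input in "||".join(tags)

# What the user most likely expects: a match inside a single element.
element_hit = any(user_input in elem for elem in tags)

print(joined_hit, element_hit)  # True False
```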

My question was marked as a duplicate and linked to an unrelated question by a careless mod. This question comes closest to what I asked, so I'm leaving my answer here. (I think it may help people for whom unnest() would be a solution.)
In my case a combination of DISTINCT and unnest() was the solution:
SELECT DISTINCT ON (id_) *
FROM (
SELECT unnest(tags) tag, *
FROM someTable
) x
WHERE (tag like '%someInput%');
unnest(tags) expands the text array to a list of rows and DISTINCT ON (id_) removes the duplicates that result from the expansion, based on a unique id_ column.
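The three steps of that query can be modelled in Python as follows. The rows are hypothetical, and a substring test stands in for LIKE. Postgres keeps an unpredictable row from each DISTINCT ON group unless ORDER BY is given; this sketch simply keeps the first one it sees.

```python
rows = [  # hypothetical someTable rows
    {"id_": 1, "tags": ["red", "someInput-a", "someInput-b"]},
    {"id_": 2, "tags": ["blue"]},
]

# SELECT unnest(tags) tag, * : one output row per array element.
expanded = [dict(row, tag=tag) for row in rows for tag in row["tags"]]

# WHERE tag LIKE '%someInput%'
filtered = [r for r in expanded if "someInput" in r["tag"]]

# DISTINCT ON (id_): keep one row per id_ (here simply the first one seen).
seen = set()
result = []
for r in filtered:
    if r["id_"] not in seen:
        seen.add(r["id_"])
        result.append(r)

print(len(filtered), [r["id_"] for r in result])  # 2 [1]
```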
Update
Another way to do this without DISTINCT, entirely within the WHERE clause, would be:
SELECT *
FROM someTable
WHERE (
0 < (
SELECT COUNT(*)
FROM unnest(tags) AS tag
WHERE tag LIKE '%someInput%'
)
);

Please check this out.
This answer was exactly what I was looking for. It also provides some useful tips (and examples) in case you need more flexibility.
It basically explains the ANY() construct and the @> and && operators.
"If you want to search multiple values, you can use @> operator"
"@> means contains all the values in that array. If you want to search if the current array contains any values in another array, you can use &&"
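For arrays without NULLs, @> behaves roughly like a superset test and && like an overlap test. The Python sketch below uses sets and a hypothetical array value; like sets, both operators ignore duplicates. Both compare whole elements for equality, so neither performs the partial, LIKE-style matching asked about above.

```python
some_colum = ["red", "green", "blue"]  # hypothetical array column value


def contains(arr, other):
    # arr @> other: every element of `other` appears in `arr`
    return set(arr) >= set(other)


def overlaps(arr, other):
    # arr && other: at least one element in common
    return not set(arr).isdisjoint(other)


print(contains(some_colum, ["red", "blue"]))  # True
print(contains(some_colum, ["red", "pink"]))  # False
print(overlaps(some_colum, ["red", "pink"]))  # True
print(contains(some_colum, ["re"]))           # False: whole elements only
```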

Related

Regular expression in snowflake

I have a requirement where the string from a column has a value like "/Date(-34905600000)/". The value within the parentheses could be in any one of the following patterns:
"/Date(-34905600000)/"
"/Date(1407283200000)/"
"/Date(1636654411000+0000)/"
I need to extract everything inside the parentheses for examples 1 and 2, including the "-" if any. For the 3rd example, it should be only the digits inside the parentheses before the "+", i.e. 1636654411000.
I tried the following, but the output still includes the parentheses:
select REGEXP_substr("/Date(-34905600000)/", '\\([[:alnum:]\-]+\\)')
from table A;
select REGEXP_substr("/Date(-34905600000)/", '\\((.*?)\\)') from table A;
select REGEXP_substr("/Date(-34905600000)/", '[0-9]+') from table A;
Using regexp_replace() instead you could do:
regexp_replace(colA, '(\\/Date\\()([-0-9]*)(.*)', '\\2')
That splits the string into three substitution groups and then only keeps the second. I often end up doing regexp_replace() with substitution groups like this when regexp_substr() fails me.
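Python's re.sub supports the same back-reference idea, so the substitution can be checked locally. This sketch assumes the column values look like /Date(...)/ without surrounding double quotes. The regex is the same as above, minus Snowflake's doubling of backslashes in string literals.

```python
import re

# Same three groups as the Snowflake pattern: prefix, signed digits, rest.
PATTERN = r"(/Date\()([-0-9]*)(.*)"

samples = [
    "/Date(-34905600000)/",
    "/Date(1407283200000)/",
    "/Date(1636654411000+0000)/",
]

extracted = [re.sub(PATTERN, r"\2", s) for s in samples]  # keep only group 2
print(extracted)  # ['-34905600000', '1407283200000', '1636654411000']
```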
If you want REGEXP_SUBSTR to return sub-matches, you need to use the 'e' (extract) option in <regex_parameters>. The group number then defaults to 1, which matches your first grouping:
SELECT column1,
REGEXP_substr(column1, 'Date\\(([-+]?[0-9]+)',1,1,'e')
FROM VALUES
('"/Date(-34905600000)/"'),
('"/Date(1407283200000)/"'),
('"/Date(1636654411000+0000)/"');
gives:
COLUMN1                          REGEXP_SUBSTR(COLUMN1, 'DATE\(([-+]?[0-9]+)',1,1,'E')
"/Date(-34905600000)/"           -34905600000
"/Date(1407283200000)/"          1407283200000
"/Date(1636654411000+0000)/"     1636654411000
The regexp is greedy by default, so [0-9]+ already stops at the first non-digit. If you want to be explicit, you can force the match to end at the timezone sign or the closing parenthesis with:
'Date\\(([-+]?[0-9]+)[-+\\)]'
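Both patterns can be cross-checked with Python's re module. Group 1 plays the role of the sub-match that the 'e' parameter returns. The inputs are the quoted strings from the VALUES list above, and backslashes are not doubled here because these are Python raw strings.

```python
import re

# Python counterparts of the two Snowflake patterns above.
LOOSE = re.compile(r"Date\(([-+]?[0-9]+)")
STRICT = re.compile(r"Date\(([-+]?[0-9]+)[-+)]")

values = [
    '"/Date(-34905600000)/"',
    '"/Date(1407283200000)/"',
    '"/Date(1636654411000+0000)/"',
]

loose = [LOOSE.search(v).group(1) for v in values]
strict = [STRICT.search(v).group(1) for v in values]
print(loose)            # ['-34905600000', '1407283200000', '1636654411000']
print(strict == loose)  # True: [0-9]+ already stops at the first non-digit
```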

Check for integer in string array

I am trying to check a string array for the existence of a converted integer number. This sits inside a procedure where:
nc_ecosite is an integer variable
current_consite is a string array
ecosite is an integer
current_ecosite_nc is double
IF to_char(nc_ecosite, '999') IN
(select current_consite from current_site_record
where current_ecosite_nc::integer = nc_ecosite) THEN
ecosite := nc_ecosite;
The result always comes from the ELSIF that follows the first IF, even when nc_ecosite is in the array (I checked). Why is ecosite not being populated with nc_ecosite when the values match?
I am working with Postgres 9.3 inside pgAdmin.
I found the following to provide the desired result:
IF nc_ecosite in
(select (unnest(string_to_array(current_consite, ',')))::integer
from current_site_record
where current_ecosite_nc::integer = nc_ecosite) THEN
ecosite := nc_ecosite::integer;
The immediate reason for the problem is that to_char() inserts a leading blank for your given pattern (legacy reasons - to make space for a potential negative sign). Use the FM Template Pattern Modifier to avoid that:
to_char(nc_ecosite, 'FM999')
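The padding is easy to overlook, so here is a hypothetical Python model of to_char(n, '999') for small positive integers. It is based on the documented behaviour that to_char(485, '999') yields ' 485': one slot is reserved for a sign, plus three digit positions, and FM suppresses the padding. The array contents are made up.

```python
def to_char_999(n: int, fm: bool = False) -> str:
    """Hypothetical model of Postgres to_char(n, '999') for 1 <= n <= 999.

    Without FM the result is padded to four characters: one slot reserved for
    a sign plus three digit positions. FM suppresses that padding.
    """
    if not 1 <= n <= 999:
        raise ValueError("this sketch only models 1..999")
    padded = str(n).rjust(4)
    return padded.lstrip() if fm else padded


current_consite = ["12", "42", "7"]  # hypothetical array contents
nc_ecosite = 42

print(repr(to_char_999(nc_ecosite)))                        # '  42'
print(to_char_999(nc_ecosite) in current_consite)           # False
print(to_char_999(nc_ecosite, fm=True) in current_consite)  # True
```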
Of course, it would be best to operate with matching data types to begin with - if at all possible.
Barring that, I suggest this faster and cleaner statement:
SELECT INTO ecosite nc_ecosite -- variable or column??
WHERE EXISTS (
SELECT 1 FROM current_site_record c
WHERE current_ecosite_nc::integer = nc_ecosite
AND to_char(nc_ecosite, 'FM999') = ANY(current_consite)
);
IF NOT FOUND THEN ... -- to replace your ELSIF
Make sure you don't run into naming conflicts between parameters, variables and column names! A widespread convention is to prepend variable names with _ (and never use the same names for columns). But you had better table-qualify column names in all queries anyway. You did not make clear which is a column and which is a variable ...
I might be able to optimize the statement further if I had the complete function and table definition.
Related:
Remove blank-padding from to_char() output
Variables for identifiers inside IF EXISTS in a plpgsql function
Naming conflict between function parameter and result of JOIN with USING clause

Can SQL Server index a text string by delimiter?

I need to store content keyed by strings, so a database table of key/value pairs, essentially. The keys, however, will be of a hierarchical format, like this:
foo.bar.baz
They'll have multiple categories, delimited by dots. The above value is in a category called "baz" which is in a parent category called "bar" which is in a parent category called "foo."
How can I index this in such a way that it's rapidly searchable for different permutations of the key/dot combo? For example, I want to be able to very quickly find everything that starts with
foo
Or
foo.bar
Yes, I could do a LIKE query, but I never need to find anything like:
fo
So that seems like a waste to me.
Is there any way that SQL Server would index all permutations of a string delimited by the dots? So, in the above case we have:
foo
foo.bar
foo.bar.baz
Is there any type of index that would facilitate searching like that?
Edit
I will never need to search backwards or from the middle. My searches will always begin from the front of the string:
foo.bar
Never:
bar.baz
SQL Server can't really index substrings, no. If you only ever search on the leading part of the string, this will work fine and will perform an index seek (depending on other query semantics, of course):
WHERE col LIKE 'foo.%';
-- or
WHERE col LIKE 'foo.bar.%';
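A leading-prefix LIKE can use an index seek because, in a sorted index, all keys sharing a prefix sit next to each other. This Python sketch models that idea with bisect on a sorted list of hypothetical keys. It uses plain code-point order, and real collations may order characters differently.

```python
import bisect

# Hypothetical keys, kept sorted the way a B-tree index keeps them.
keys = sorted(["foo", "foo.bar", "foo.bar.baz", "foo.qux", "food.x", "zap"])


def prefix_seek(sorted_keys: list, prefix: str) -> list:
    """All keys starting with `prefix`: one binary search, then a short scan."""
    start = bisect.bisect_left(sorted_keys, prefix)  # the "seek"
    end = start
    while end < len(sorted_keys) and sorted_keys[end].startswith(prefix):
        end += 1  # matching keys are contiguous, so stop at the first miss
    return sorted_keys[start:end]


print(prefix_seek(keys, "foo."))      # ['foo.bar', 'foo.bar.baz', 'foo.qux']
print(prefix_seek(keys, "foo.bar."))  # ['foo.bar.baz']
```

As with LIKE 'foo.%', the prefix "foo." does not return "foo" itself, and the trailing dot keeps "food.x" out.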
However when you start needing to search for bar or baz following any leading string, you will need to search on the substring:
WHERE col LIKE '%.bar.%';
-- or
WHERE PATINDEX('%.bar.%', col) > 0;
This won't work well with regular B-tree indexes, and I don't think Full-Text Search will be much help either, because of the special characters (periods) - but you should try it out if this is a requirement.
In general, storing data this way smells wrong to me. It seems to me that you should either have separate columns instead of jamming all the data into one column, or use a more relational EAV design.
It appears to be a job for a CTE!
CREATE TABLE TableA (
id int identity,
parentid int null,
name varchar(50)
)
For a fixed two-level hierarchy it's easy:
select t2.name, t1.name
from tableA t1
join tableA t2 on t2.id = t1.parentid
where t2.name = 'father'
To find hierarchical values like these in the most general case, you will need some kind of recursion over the self-joined table, using a recursive CTE:
http://msdn.microsoft.com/pt-br/library/ms175972.aspx
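Here is a runnable sketch of such a recursive CTE against the TableA layout above. It uses SQLite through Python's sqlite3 module and hypothetical rows. SQLite needs the RECURSIVE keyword and concatenates with ||; SQL Server writes plain WITH and uses + instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (id INTEGER PRIMARY KEY, parentid INTEGER NULL, name VARCHAR(50));
-- Hypothetical hierarchy: foo > bar > baz, plus foo > qux.
INSERT INTO TableA (id, parentid, name) VALUES
  (1, NULL, 'foo'), (2, 1, 'bar'), (3, 2, 'baz'), (4, 1, 'qux');
""")

# Start at the roots, then append one level per recursive step.
rows = conn.execute("""
WITH RECURSIVE tree(id, path) AS (
    SELECT id, name FROM TableA WHERE parentid IS NULL
    UNION ALL
    SELECT c.id, t.path || '.' || c.name
    FROM TableA c JOIN tree t ON c.parentid = t.id
)
SELECT path FROM tree ORDER BY path
""").fetchall()

paths = [p for (p,) in rows]
print(paths)  # ['foo', 'foo.bar', 'foo.bar.baz', 'foo.qux']
```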

Evaluate logical expressions in string column SQL

I have a table containing columns id(int), logical expression(varchar) and result(bit). The logical expression is stored in a varchar which I need to evaluate and put the result into the result column. For example, the column could contain:
'1=1'
'2<3 AND 1^1=1'
'3>4 OR 4<2'
The result column should then contain
1
0
0
Currently I am using a cursor to iterate over the rows and dynamic SQL to evaluate each expression:
"IF(" + @expression + ") SET @result = 1"
Is there a better, more efficient way to do this? I would ideally like to get rid of the cursor. Any ideas? Would this be better performed using an assembly?
I'd go with a CLR.
I posted a very similar answer here: Convert string with expression to decimal
In fact, that answer would work fine unmodified for these (and any other simple expressions):
SELECT dbo.eval('1=1' )
SELECT dbo.eval('3>4 OR 4<2' )
However, it would fail for the one using the ^ (caret) operator - you would need to tweak the CLR to handle the bitwise XOR.
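Whichever route you take (CLR or T-SQL UDF), the function has to parse the expression rather than execute it. As a language-neutral illustration, here is a small Python sketch that handles the question's three examples. It rewrites =, AND, OR and NOT into Python syntax and then walks the parsed tree. Only integer literals, comparisons, ^ (XOR), NOT, AND and OR are allowed. The supported subset and the evaluate() name are assumptions; this is not the dbo.eval function from the linked answer. In both T-SQL and Python, ^ binds more tightly than comparisons, so 1^1=1 is read as (1^1)=1.

```python
import ast
import operator
import re

# Only these comparison operators (plus ^, NOT, AND, OR) are accepted.
_COMPARE = {
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Gt: operator.gt, ast.GtE: operator.ge,
}


def _to_python(expr: str) -> str:
    """Rewrite the SQL spelling of the supported subset into Python syntax."""
    expr = expr.replace("<>", "!=")
    expr = re.sub(r"(?<![<>!=])=(?!=)", "==", expr)  # lone '=' is equality
    for word in ("AND", "OR", "NOT"):
        expr = re.sub(rf"\b{word}\b", f" {word.lower()} ", expr, flags=re.IGNORECASE)
    return expr.strip()  # eval-mode parsing rejects leading whitespace


def _eval(node):
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return node.value
    if isinstance(node, ast.BoolOp):
        values = [bool(_eval(v)) for v in node.values]
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        return not _eval(node.operand)
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.BitXor):
        return _eval(node.left) ^ _eval(node.right)
    if isinstance(node, ast.Compare):
        left = _eval(node.left)
        for op, comparator in zip(node.ops, node.comparators):
            if type(op) not in _COMPARE:
                raise ValueError(f"unsupported comparison: {type(op).__name__}")
            right = _eval(comparator)
            if not _COMPARE[type(op)](left, right):
                return False
            left = right
        return True
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


def evaluate(expr: str) -> int:
    """Return 1 or 0, matching the bit column in the question."""
    return int(bool(_eval(ast.parse(_to_python(expr), mode="eval"))))


examples = ["1=1", "2<3 AND 1^1=1", "3>4 OR 4<2"]
print([evaluate(e) for e in examples])  # [1, 0, 0]
```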
Some time ago, I wrote a user-defined function in SQL to give the decimal result of evaluating infix arithmetic expressions like 1+2+3+4/(5-2). The code is here. You could probably adapt it to work for your boolean expressions. It uses a table of integers called Sequence0_8000, which you can populate in any way you want.

Is there any way to put an invisible character at beginning of a string to change its sort order?

Is there any way to put a non-printing or unobtrusive character at the beginning of a string of data in SQL Server, so that when an ORDER BY is performed, the string is sorted after the letter z?
I have used a space at the beginning of the string to get the string at the top of the sorted list, but I am looking to do something similar to put a string at the end of the list.
I would rather not put another field such as "SortOrder" in the table to use to order the sort, and I would rather not have to sort the list in my code.
Added: Yes, I know this is a bad idea (thanks to all for mentioning it), but I am still curious whether what I am asking can be done.
Since no one is venturing to answer your question properly, here's my answer.
Given: You are already adding a <space> to some other data to make it appear at the top.
Solution: Add CHAR(160), the non-breaking space, to make it appear at the bottom. This is in reality also a space, but it is designed so that computer systems do not treat it as a word break (hence the name).
http://en.wikipedia.org/wiki/Non-breaking_space
Your requirements:
Without adding another field such as "SortOrder" to the table
Without sorting the list in your code
I think this fits!
create table my(id int,data varchar(100))
insert my
select 1,'Banana' union all
select 2,Char(160) + 'mustappearlast' union all
select 3,' ' +N'mustappearfirst' union all
select 4,'apple' union all
select 5,'pear'
select *
from my
order by ASCII(lower(data)), data
(OK, I cheated: I had to add ASCII(LOWER(...)), but this is closer to your requirements than all the other answers so far.)
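This Python model shows why the ORDER BY above works. It assumes a single-byte code page in which CHAR(160) has code 160 (as in Latin-1/Windows-1252), so ASCII() of the leading non-breaking space is 160, above every letter. The tie-breaker here uses Python's code-point order, which a SQL Server collation may not follow exactly.

```python
# The five rows from the example; "\xa0" is CHAR(160), the non-breaking space.
data = ["Banana", "\xa0mustappearlast", " mustappearfirst", "apple", "pear"]

# ORDER BY ASCII(LOWER(data)), data: first by the code of the first character
# of the lower-cased value, then by the value itself to break ties.
ordered = sorted(data, key=lambda s: (ord(s.lower()[0]), s))
print(ordered)
# [' mustappearfirst', 'apple', 'Banana', 'pear', '\xa0mustappearlast']
```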
You should use another column in the database to help specify the ordering rather than modifying the string:
SELECT *
FROM yourtable
ORDER BY sortorder, yourstring
where your data might look like this:
yourstring sortorder
foo 0
bar 0
baz 1
qux 1
quux 2
If you can't modify the table you might be able to put the sortorder column into a different table and join to get it:
SELECT *
FROM yourtable AS T1
JOIN yourtablesorting AS T2
ON T1.id = T2.T1_id
ORDER BY T2.sortorder, T1.yourstring
Alternative solution:
If you really can't modify the database at all, not even by adding a new table, then you could add any character you like at the start of the string and remove it during the select:
SELECT RIGHT(yourstring, LEN(yourstring) - 1)
FROM yourtable
ORDER BY yourstring
Could you include something like:
"<SORT1>This is my string"
"<SORT2>I'd like this to go second"
And remove them later? I think using invisible characters is fragile and hacky.
You could put a sort order in the query and use unions (no guarantees on performance).
select 1 as SortOrder, *
from table
where ... --first tier
union
select 2, *
from table
where ... --second tier
order by SortOrder
In my opinion, an invisible character for this purpose is a bad idea because it pollutes the data. I would do exactly what you would rather not do and add a new column.
To modify the idea slightly, you could implement it not as a sort order but as a grouping priority that defaults to 0, where a negative integer puts the group at the top of the list and a positive integer at the bottom, and then use "order by sort_priority, foo".
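Here is a runnable sketch of the grouping-priority idea. It uses SQLite through Python's sqlite3 module, with a hypothetical items table. Because of the DEFAULT 0, ordinary rows need no value, and only the pinned and demoted rows set one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: sort_priority defaults to 0, so most rows need no value.
conn.execute(
    "CREATE TABLE items (foo TEXT NOT NULL, sort_priority INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany("INSERT INTO items (foo) VALUES (?)", [("pear",), ("apple",)])
conn.executemany(
    "INSERT INTO items (foo, sort_priority) VALUES (?, ?)",
    [("pinned", -1), ("archived", 1)],  # negative: top, positive: bottom
)

ordered = [foo for (foo,) in conn.execute(
    "SELECT foo FROM items ORDER BY sort_priority, foo")]
print(ordered)  # ['pinned', 'apple', 'pear', 'archived']
```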
I'm with everyone else that the ideal way to do this is by adding an additional column for sort order.
But if you don't want to add another column, and you already use a space for those items you want to appear at the top of the list, how do you feel about using a pipe (|) for items at the bottom of the list?
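In Unicode (and ASCII) code-point order, the pipe and both curly brackets ({, }) come after z, so any of those three characters should work for you as long as your collation sorts by code point (for example, a binary collation). Dictionary collations may place punctuation differently, so test with the collation you actually use.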
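Python compares strings by code point, so it gives a quick way to see the ordering that a binary collation would produce. The values are made up, and this says nothing about linguistic collations.

```python
values = ["zebra", "apple", "|bottom", " top", "{also bottom"]

# Code points: ' ' = 32, 'a' = 97, 'z' = 122, '{' = 123, '|' = 124, '}' = 125.
ordered = sorted(values)
print(ordered)  # [' top', 'apple', 'zebra', '{also bottom', '|bottom']
```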
