Case insensitive Postgres query with array contains

I have records which contain an array of tags like these:
id | title | tags
--------+-----------------------------------------------+----------------------
124009 | bridge photo | {bridge,photo,Colors}
124018 | Zoom 5 | {Recorder,zoom}
123570 | Sint et | {Reiciendis,praesentium}
119479 | Architecto consectetur | {quia}
I'm using the following SQL query to fetch a specific record by tags ('bridge', 'photo', 'Colors'):
SELECT "listings".* FROM "listings" WHERE (tags @> ARRAY['bridge', 'photo', 'Colors']::varchar[]) ORDER BY "listings"."id" ASC LIMIT $1 [["LIMIT", 1]]
And this returns the first record in the table.
The problem is that my tags are mixed-case, and I would like the same record to be returned if I search for: bridge, photo, colors. Essentially I need to make this search case-insensitive, but I can't find a way to do so with Postgres.
This is the SQL query I've tried which is throwing errors:
SELECT "listings".* FROM "listings" WHERE (LOWER(tags) @> ARRAY['bridge', 'photo', 'colors']::varchar[]) ORDER BY "listings"."id" ASC LIMIT $1
This is the error:
PG::UndefinedFunction: ERROR: function lower(character varying[]) does not exist
HINT: No function matches the given name and argument types. You might need to add explicit type casts.

You can convert the elements of a text array to lower case like this:
select lower(tags::text)::text[]
from listings;
lower
--------------------------
{bridge,photo,colors}
{recorder,zoom}
{reiciendis,praesentium}
{quia}
(4 rows)
Use this in your query:
SELECT *
FROM listings
WHERE lower(tags::text)::text[] @> ARRAY['bridge', 'photo', 'colors']
ORDER BY id ASC;
id | title | tags
--------+--------------+-----------------------
124009 | bridge photo | {bridge,photo,Colors}
(1 row)
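If this lookup is frequent, an expression index can back it. As a sketch (the index name is arbitrary; the planner only uses the index when the query repeats the exact same expression):

```sql
-- GIN supports the array containment operator @> on the lowered expression
CREATE INDEX listings_tags_lower_idx
    ON listings
    USING gin ((lower(tags::text)::text[]));

-- the query must spell out the identical expression to hit the index
SELECT *
FROM listings
WHERE lower(tags::text)::text[] @> ARRAY['bridge', 'photo', 'colors'];
```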

You can't apply LOWER() to an array directly, but you can unpack the array, apply it to each element, and reassemble it when you're done:
... WHERE ARRAY(SELECT LOWER(UNNEST(tags))) @> ARRAY['bridge', 'photo', 'colors']
You could also install the citext (case-insensitive text) module; if you declare listings.tags as type citext[], your query should work as-is.
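For the citext route, a minimal sketch (this assumes the contrib extension is installable in your database and that changing the column type is acceptable):

```sql
-- citext ships as a PostgreSQL contrib module
CREATE EXTENSION IF NOT EXISTS citext;

-- element comparisons on citext[] are case-insensitive
ALTER TABLE listings
    ALTER COLUMN tags TYPE citext[];

-- the containment query now matches regardless of case
SELECT *
FROM listings
WHERE tags @> ARRAY['bridge', 'photo', 'colors']::citext[];
```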

Related

How to inner join two windowed tables in Flux query language?

The goal is to join tables min and max returned by the following query:
data = from(bucket: "my_bucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
min = data
|> aggregateWindow(
every: 1d,
fn: min,
column: "_value")
max = data
|> aggregateWindow(
every: 1d,
fn: max,
column: "_value")
The columns of max look like this:
+---------------------------------+
| Columns |
+---------------------------------+
| table MAX |
| _measurement GROUP STRING |
| _field GROUP STRING |
| _value NO GROUP DOUBLE |
| _start GROUP DATETIME:RFC3339 |
| _stop GROUP DATETIME:RFC3339 |
| _time NO GROUP DATETIME:RFC3339 |
| env GROUP STRING |
| path GROUP STRING |
+---------------------------------+
The min table looks the same except the name of the first column. Both tables return data which can be confirmed by running yield(tables:min) or yield(tables:max). The join should be an inner join on columns _measurement, _field, _time, env and path and it should contain both the minimum and the maximum value _value of every window.
When I try to run the following within the InfluxDB Data Explorer
join(tables: {min: min, max: max}, on: ["_time", "_field", "path", "_measurement", "env"], method: "inner")
I get the following error:
Failed to execute Flux query
When I run the job in Bash via influx query --file ./query.flux -r > ./query.csv, I get the following error:
Error: failed to execute query: 504 Gateway Timeout: unable to decode response content type "text/html; charset=utf-8"
No more logging output is available to investigate the issue further. What's wrong with this join?
According to this doc, join can only take two tables as parameters. You could try union instead, which can take more than two tables as input. See more details here.
You might just need to modify the script as below (note that union takes an array of tables, not a record):
union(tables: [min, max])

ERROR: operator does not exist: jsonb[] -> integer

select id,rules from links where id=2;
id | rules
----+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2 | {"{\"id\": \"61979e81-823b-419b-a577-e2acb34a2f40\", \"url\": \"https://www.wikijob.co.uk/about-us\", \"what\": \"country\", \"matches\": \"GB\", \"percentage\": null}"}
I'm trying to get the elements of the jsonb using the operators here https://www.postgresql.org/docs/9.6/functions-json.html
Whether I use 'url', or an integer as below, I get a similar result.
select id,rules->1 from links where id=2;
ERROR: operator does not exist: jsonb[] -> integer
LINE 1: select id,rules->1 from links where id=2;
^
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
What am I doing wrong?
PS Postgres version 9.6.12.
The column is an array of jsonb (jsonb[]), not a single jsonb value, so the -> operator does not apply to it directly. You can access the first element using an index:
select id, rules[1]
from links
where id = 2
Be sure to check also this answer.
Use jsonb_each() in a lateral join to see all rules in separate rows:
select id, key, value
from links
cross join jsonb_each(rules[1]) as rule(key, value)
where id = 2
You can get a single rule in this way:
select id, value as url
from links
cross join jsonb_each(rules[1]) as rule(key, value)
where id = 2 and key = 'url'
Use unnest() to find an url in all elements of the array, e.g.:
select id, unnest(rules)->'url' as url
from links
where id = 2;
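If you only need one field from the first element, you can also chain the array index with the jsonb arrow operator directly (a sketch against the sample row above; Postgres arrays are 1-based):

```sql
-- rules[1] is the first jsonb element; ->> extracts the field as text
SELECT id, rules[1]->>'url' AS url
FROM links
WHERE id = 2;
```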

Check a value in an array inside a object json in PostgreSQL 9.5

I have a json object containing an array and other properties.
I need to check the first value of the array for each line of my table.
Here is an example of the json
{"objectID2":342,"objectID1":46,"objectType":["Demand","Entity"]}
So I need, for example, to get all lines with objectType[0] = 'Demand' and objectID1 = 46.
These are the table columns:
id | relationName | content
Content column contains the json.
just query them? like:
t=# with table_name(id, rn, content) as (values(1,null,'{"objectID2":342,"objectID1":46,"objectType":["Demand","Entity"]}'::json))
select * From table_name
where content->'objectType'->>0 = 'Demand' and content->>'objectID1' = '46';
id | rn | content
----+----+-------------------------------------------------------------------
1 | | {"objectID2":342,"objectID1":46,"objectType":["Demand","Entity"]}
(1 row)

Hive query, better option to self join

So I am working with a hive table that is set up as so:
id (Int), mapper (String), mapperId (Int)
Basically a single Id can have multiple mapperIds, one per mapper such as an example below:
ID (1) mapper(MAP1) mapperId(123)
ID (1) mapper(MAP2) mapperId(1234)
ID (1) mapper(MAP3) mapperId(12345)
ID (2) mapper(MAP2) mapperId(10)
ID (2) mapper(MAP3) mapperId(12)
I want to return the list of mapperIds associated to each unique ID. So for the above example I would want the below returned as a single row.
1, 123, 1234, 12345
2, null, 10, 12
The mapper Strings are known, so I was thinking of doing a self join for every mapper string I am interested in, but I was wondering if there was a more optimal solution?
If the assumption that the mapper column is distinct with respect to a given ID is correct, you could collect the mapper column and the mapperid column to a Map using brickhouse collect. You can clone the repo from that link and build the jar with Maven.
Query:
add jar /complete/path/to/jar/brickhouse-0.7.0-SNAPSHOT.jar;
create temporary function collect as 'brickhouse.udf.collect.CollectUDAF';
select id
,id_map['MAP1'] as mapper1
,id_map['MAP2'] as mapper2
,id_map['MAP3'] as mapper3
from (
select id
,collect(mapper, mapperid) as id_map
from some_table
group by id
) x
Output:
| id | mapper1 | mapper2 | mapper3 |
------------------------------------
|  1 |     123 |    1234 |   12345 |
|  2 |         |      10 |      12 |
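Since the mapper names are known up front, plain conditional aggregation also gets one row per id without the self join or the external jar (a sketch; table and column names follow the question):

```sql
-- each CASE picks out the mapperId belonging to one specific mapper;
-- MAX collapses the group to the single non-null value (or NULL if absent)
SELECT id,
       MAX(CASE WHEN mapper = 'MAP1' THEN mapperId END) AS mapper1,
       MAX(CASE WHEN mapper = 'MAP2' THEN mapperId END) AS mapper2,
       MAX(CASE WHEN mapper = 'MAP3' THEN mapperId END) AS mapper3
FROM some_table
GROUP BY id;
```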

Apply function to every element of an array in a SELECT statement

I am listing all functions of a PostgreSQL schema and need the human-readable types for every argument of the functions. OIDs of the types are represented as an array in proallargtypes. I can unnest the array and apply format_type() to it, but that splits the query into multiple rows for a single function. To avoid that I have to create an outer SELECT to GROUP the argument types again because, apparently, one cannot group an unnested array. All columns depend on proname, but since proname is not a primary key I still have to list every column in the GROUP BY clause, which is unnecessary.
Is there a better way to achieve my goal of an output like this:
proname | ... | protypes
-------------------------------------
test | ... | {integer,integer}
I am currently using this query:
SELECT
proname,
prosrc,
pronargs,
proargmodes,
array_agg(proargtypes), -- see here
proallargtypes,
proargnames,
prodefaults,
prorettype,
lanname
FROM (
SELECT
p.proname,
p.prosrc,
p.pronargs,
p.proargmodes,
format_type(unnest(p.proallargtypes), NULL) AS proargtypes, -- and here
p.proallargtypes,
p.proargnames,
pg_get_expr(p.proargdefaults, 0) AS prodefaults,
format_type(p.prorettype, NULL) AS prorettype,
l.lanname
FROM pg_catalog.pg_proc p
JOIN pg_catalog.pg_language l
ON l.oid = p.prolang
JOIN pg_catalog.pg_namespace n
ON n.oid = p.pronamespace
WHERE n.nspname = 'public'
) x
GROUP BY proname, prosrc, pronargs, proargmodes, proallargtypes, proargnames, prodefaults, prorettype, lanname
You can use the internal "undocumented" function pg_catalog.pg_get_function_arguments(p.oid).
postgres=# SELECT pg_catalog.pg_get_function_arguments('fufu'::regproc);
pg_get_function_arguments
---------------------------
a integer, b integer
(1 row)
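For the original goal of listing a whole schema, the same function can be applied per row (a sketch; assumes the public schema as in the question):

```sql
-- pg_get_function_arguments renders the full argument list as text
SELECT p.proname,
       pg_catalog.pg_get_function_arguments(p.oid) AS args
FROM pg_catalog.pg_proc p
JOIN pg_catalog.pg_namespace n ON n.oid = p.pronamespace
WHERE n.nspname = 'public';
```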
Now, there is no built-in "map" function, so unnest plus array_agg is the only option. You can simplify life with your own custom function:
CREATE OR REPLACE FUNCTION format_types(oid[])
RETURNS text[] AS $$
SELECT ARRAY(SELECT format_type(unnest($1), null))
$$ LANGUAGE sql IMMUTABLE;
and result
postgres=# SELECT format_types('{21,22,23}');
format_types
-------------------------------
{smallint,int2vector,integer}
(1 row)
Then your query should be:
SELECT proname, format_types(proallargtypes)
FROM pg_proc
WHERE pronamespace = 2200 AND proallargtypes IS NOT NULL;
But the result will probably not be what you expect, because the proallargtypes field is only non-empty when OUT parameters are used; it is usually empty. You should look at the proargtypes field instead, but it is an oidvector type, so you have to transform it to oid[] first.
postgres=# SELECT proname, format_types(string_to_array(proargtypes::text,' ')::oid[])
FROM pg_proc
WHERE pronamespace = 2200
LIMIT 10;
proname | format_types
------------------------------+----------------------------------------------------
quantile_append_double | {internal,"double precision","double precision"}
quantile_append_double_array | {internal,"double precision","double precision[]"}
quantile_double | {internal}
quantile_double_array | {internal}
quantile | {"double precision","double precision"}
quantile | {"double precision","double precision[]"}
quantile_cont_double | {internal}
quantile_cont_double_array | {internal}
quantile_cont | {"double precision","double precision"}
quantile_cont | {"double precision","double precision[]"}
(10 rows)