How to find PostgreSQL JSON array elements of one table that match column values of another table

I have a table X with one column of type jsonb.
The jsonb value contains a JSON array: "cities":["aaaa","bbbb","cccc"].
PostgreSQL 9.4 provides jsonb operators, such as '->', to get JSON array elements.
There is another table Y with columns a, b and cities, where each row's cities value is a single city name (such as aaaa, bbbb or cccc).
I want to display Y.a and Y.b (select Y.a, Y.b from Y, X), but only for rows where an element of the X.jsonb->'cities' array is present in Y.cities.

This is done with a lateral join over the jsonb_array_elements function (or in this case jsonb_array_elements_text, since y.cities is presumably text-typed; the json_array_elements variants take json rather than jsonb). You didn't provide a full sample schema, so I'll hand-wave some untested SQL to give you the idea.
select y.a, y.b
from x
-- jsoncol is a placeholder for the name of your jsonb column
cross join jsonb_array_elements_text(x.jsoncol->'cities') AS x_cities(city)
inner join y on (x_cities.city = y.cities);
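To see the unnest-then-join pattern run end to end, here is a small sketch that uses SQLite (through Python's built-in sqlite3 module) as a stand-in for PostgreSQL, so the function names differ: SQLite's json_each(doc, '$.cities') plays the role of jsonb_array_elements_text(x.jsoncol->'cities'), and its value column holds each array element. It assumes the SQLite build includes the JSON functions (built in by default since SQLite 3.38); the column name doc and all sample rows are made up for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE x (id INTEGER, doc TEXT);          -- doc: hypothetical JSON column
CREATE TABLE y (a TEXT, b TEXT, cities TEXT);
""")
conn.execute("INSERT INTO x VALUES (1, ?)",
             (json.dumps({"cities": ["aaaa", "bbbb", "cccc"]}),))
conn.executemany("INSERT INTO y VALUES (?, ?, ?)", [
    ("a1", "b1", "aaaa"),
    ("a2", "b2", "cccc"),
    ("a3", "b3", "zzzz"),   # no matching array element
])

# json_each expands the array into one row per element (like a lateral join);
# each element is then joined against y.cities.
rows = conn.execute("""
SELECT y.a, y.b
FROM x
CROSS JOIN json_each(x.doc, '$.cities') AS x_cities
JOIN y ON x_cities.value = y.cities
ORDER BY y.a
""").fetchall()
print(rows)
```

Each matching array element produces one output row, so if several rows of x contain the same city, add DISTINCT (or rewrite the join as an EXISTS) to avoid duplicates.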
If you're going to use JSON, you're going to need to get very good with lateral joins.
In general, I see a lot of people using JSON where it's completely unnecessary and simple relational modelling would be more appropriate. Think about whether you really need to do this. In this case, if you must use an array, a native PostgreSQL text[] array seems like it would be better, but you should probably model it with a join table instead.

Related

How to query multiple JSON document schemas in Snowflake?

Could anyone tell me how to change the Stored Procedure in the article below to recursively expand all the attributes of a json file (multiple JSON document schemas)?
https://support.snowflake.net/s/article/Automating-Snowflake-Semi-Structured-JSON-Data-Handling-part-2
Craig Warman's stored procedure posted in that blog is a great idea. I asked him if it was okay to refactor his code, and he agreed. I've used the refactored version in the field, so I know the SP and how it works well.
It may be possible to modify the SP to work on your JSON. It will depend on whether or not Snowflake types the JSON in your variant column. The way you have it structured, it may not type everything. You can check by running this SQL and seeing if the result set includes all the columns you need:
set VARIANT_TABLE = 'WEATHER';
set VARIANT_COLUMN = 'V';
with MAIN_TABLE as
(
select * from identifier($VARIANT_TABLE) sample (1000 rows)
)
select distinct REGEXP_REPLACE(REGEXP_REPLACE(f.path, '\\[(.+)\\]'),'[^a-zA-Z0-9]','_') AS path_name, -- This strips any bracket-enclosed array element references (like "[0]") from the path, then replaces every remaining non-alphanumeric character (such as the dots between levels) with an underscore, so path.to.element becomes path_to_element
typeof(f.value) AS attribute_type, -- This generates column datatypes.
path_name AS alias_name -- This generates column aliases based on the path
from
MAIN_TABLE,
LATERAL FLATTEN(identifier($VARIANT_COLUMN), RECURSIVE=>true) f
where TYPEOF(f.value) != 'OBJECT'
AND NOT contains(f.path, '[');
Be sure to set the variables to your table and column names. If this picks up the type information for the columns in your JSON, then it's possible to modify this SP to do what you need. If it doesn't, but there's a way to modify the query so that it picks up the columns, that would work too.
If it doesn't pick up the columns: building on Craig's idea, I wrote type inference for non-variant data (such as strings from CSV log files without type information). Try the SQL above first and see what results you get.
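To make clear what the query computes, here is a rough Python approximation of it (not Snowflake itself). It walks each sample document through nested objects only, mirroring the NOT contains(f.path, '[') filter that drops array elements. It skips the objects themselves, mirroring TYPEOF(f.value) != 'OBJECT'. It builds the column name the way the REGEXP_REPLACE calls do, and it takes the distinct union over all documents, which is how documents with different schemas end up merged into one column list. The type labels are a hypothetical approximation of TYPEOF, not its exact rules, and the sample documents are invented.

```python
import json
import re

def type_label(value):
    # Hypothetical, rough stand-in for Snowflake's TYPEOF(); not its exact rules.
    if value is None:
        return "NULL_VALUE"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "VARCHAR"
    if isinstance(value, list):
        return "ARRAY"
    return "UNKNOWN"

def leaf_paths(doc, prefix=""):
    """Yield (path_name, type_label) for each non-object value reached through
    object keys only; array elements are not descended into."""
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from leaf_paths(value, path)  # objects themselves are skipped
        else:
            # Like the outer REGEXP_REPLACE: non-alphanumerics become "_".
            yield re.sub(r"[^a-zA-Z0-9]", "_", path), type_label(value)

def infer_columns(raw_documents):
    """Distinct (path_name, type) pairs across all documents, sorted."""
    columns = set()
    for raw in raw_documents:
        columns.update(leaf_paths(json.loads(raw)))
    return sorted(columns)

sample = [
    '{"city": {"name": "Paris", "id": 1}, "temp": 12.5}',
    '{"city": {"name": "Oslo", "id": 2}, "tags": ["cold"]}',
]
columns = infer_columns(sample)
print(columns)
```

If one path shows up with two different types across documents, both pairs are kept, just as SELECT DISTINCT would keep both rows.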

Comparing SQL tables

I am still new to SQL. I currently have two tables in SQL Server, and rather than comparing them exactly, I would like to see whether one specific column in table 1 is equal to a similar column in table 2. I have had some success with this, but I would also like to see the rows from table 1 that don't match table 2 (e.g. it can return NULL values for them). Below is example code that might help explain my point:
select tb1.models, tb1.year, tb1.series, tb2.model, tb2.price
from tb1, tb2
where tb1.year = '2014' and tb1.models = tb2.model
This is where I have tried all kinds of combinations, like <>, but unfortunately haven't found a solution. Table 1 has a certain number of models, and table 2 has quite a huge list which sometimes does not include the same models as table 1. That is why I want to see exactly what is not matching, so I can check and analyse it.
The query above returns only the rows that match. For example, I can see that there are 30 more models in table 1 that are not in table 2, but I have no visibility into which ones they are.
Thank you in advance!
By the way: don't use '2014' if this value (and the column tb1.year) is numeric (probably INT); use tb1.year = 2014 instead. Implicit casts can be expensive and can have various side effects.
This sounds like a plain join:
select tb1.models
, tb1.year
, tb1.series
, tb2.model
, tb2.price
from tb1
INNER JOIN tb2 ON tb1.models = tb2.model
where tb1.year = '2014'
But the plural models vs. singular model in your column names might point to trouble with non-normalized data. If this does not help, please provide sample data and the expected output!
UPDATE
Use LEFT JOIN to get all rows from tb1 (rows without a corresponding row in tb2 get NULLs).
Use RIGHT JOIN for the opposite.
Use FULL OUTER JOIN to get all rows of both tables, with NULLs on whichever side has no corresponding row.
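To list only the models that are missing from tb2, keep the LEFT JOIN and filter on a tb2 column being NULL. Below is a sketch run with SQLite through Python's sqlite3 module; the SELECT statements are standard SQL and should work unchanged on SQL Server. The sample rows and column types are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tb1 (models TEXT, year INTEGER, series TEXT);
CREATE TABLE tb2 (model TEXT, price REAL);
INSERT INTO tb1 VALUES ('A1', 2014, 's1'), ('B2', 2014, 's2'),
                       ('C3', 2014, 's3'), ('D4', 2013, 's4');
INSERT INTO tb2 VALUES ('A1', 100.0), ('C3', 300.0), ('Z9', 900.0);
""")

# LEFT JOIN keeps every 2014 row of tb1; unmatched rows get NULL in tb2's columns.
all_rows = conn.execute("""
SELECT tb1.models, tb2.model, tb2.price
FROM tb1
LEFT JOIN tb2 ON tb1.models = tb2.model
WHERE tb1.year = 2014
ORDER BY tb1.models
""").fetchall()

# Keeping only the rows where tb2.model IS NULL leaves the models missing from tb2.
missing = [row[0] for row in conn.execute("""
SELECT tb1.models
FROM tb1
LEFT JOIN tb2 ON tb1.models = tb2.model
WHERE tb1.year = 2014 AND tb2.model IS NULL
ORDER BY tb1.models
""")]
print(all_rows)
print(missing)
```

A NOT EXISTS subquery against tb2 expresses the same "missing from tb2" question; the LEFT JOIN form has the advantage that you can drop the IS NULL filter to see matched and unmatched rows side by side.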

How to populate a CTE with a list of values in Sqlite

I am working with SQLite and straight C. I have a C array of ids of length N that I am compiling into a string with the following format:
'id1', 'id2', 'id3', ... 'id[N]'
I need to build queries to do several operations that contain comparisons to this list of ids, an example of which might be...
SELECT id FROM tableX WHERE id NOT IN (%s);
... where %s is replaced by the string representation of my array of ids. For complicated queries and high values of N, this obviously produces some very ungainly queries, and I would like to clean them up using common-table-expressions. I have tried the following:
WITH id_list(id) AS
(VALUES(%s))
SELECT * FROM id_list;
This doesn't work because SQLite expects a column for each value in my string. My other failed attempt was
WITH id_list(id) AS
(SELECT (%s))
SELECT * FROM id_list;
but this throws a syntax error at the comma. Does SQLite syntax exist to accomplish what I'm trying to do?
SQLite supports VALUES clauses with multiple rows, so you can write:
WITH id_list(id) AS (VALUES ('id1'), ('id2'), ('id3'), ...) SELECT * FROM id_list;
However, this is not any more efficient than just listing the IDs in IN.
You could write all the IDs into a temporary table, but this would not make sense unless you have measured the performance improvement.
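Both options from this answer can be sketched with Python's sqlite3 module. The first builds the multi-row VALUES CTE with one "(?)" placeholder per id and binds the ids as parameters instead of pasting them into the SQL text; from C, the equivalent would be one sqlite3_bind_text call per placeholder. The second loads the ids into a temporary table. The table tableX comes from the question; the function names, the temp table tmp_ids and the sample ids are hypothetical. Bound parameters are capped by SQLITE_MAX_VARIABLE_NUMBER (999 by default before SQLite 3.32.0, 32766 after), which is one reason to prefer the temporary table for very large N.

```python
import sqlite3

def ids_not_in_cte(conn, ids):
    """Rows of tableX whose id is not in `ids`, via a multi-row VALUES CTE.

    Assumes `ids` is non-empty: VALUES with no rows is a syntax error.
    """
    rows = ", ".join("(?)" for _ in ids)  # e.g. "(?), (?), (?)" for three ids
    sql = (
        f"WITH id_list(id) AS (VALUES {rows}) "
        "SELECT id FROM tableX WHERE id NOT IN (SELECT id FROM id_list) ORDER BY id"
    )
    return [row[0] for row in conn.execute(sql, list(ids))]

def ids_not_in_temp(conn, ids):
    """Same result, loading the ids into a temporary table first."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS tmp_ids (id TEXT PRIMARY KEY)")
    conn.execute("DELETE FROM tmp_ids")  # clear ids from any previous call
    conn.executemany("INSERT OR IGNORE INTO tmp_ids VALUES (?)", [(i,) for i in ids])
    return [row[0] for row in conn.execute(
        "SELECT id FROM tableX WHERE id NOT IN (SELECT id FROM tmp_ids) ORDER BY id")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableX (id TEXT)")
conn.executemany("INSERT INTO tableX VALUES (?)",
                 [("id1",), ("id2",), ("id3",), ("id4",)])
print(ids_not_in_cte(conn, ["id1", "id3"]))
print(ids_not_in_temp(conn, ["id1", "id3"]))
```

As the answer says, neither form is expected to be faster than a plain IN list; what binding buys you is that the ids never have to be quoted or escaped inside the SQL string.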
One solution I have found is to reformat my string as follows:
VALUES('id1') UNION VALUES('id2') ... UNION VALUES('id[N]')
Then, the following query achieves the desired result:
WITH id_list(id) AS
(%s)
SELECT * FROM id_list;
However, I am not totally satisfied with this solution. It seems inefficient.

Can SQL Server index a text string by delimiter?

I need to store content keyed by strings, so a database table of key/value pairs, essentially. The keys, however, will be of a hierarchical format, like this:
foo.bar.baz
They'll have multiple categories, delimited by dots. The above value is in a category called "baz" which is in a parent category called "bar" which is in a parent category called "foo."
How can I index this so that it's rapidly searchable for different permutations of the key/dot combination? For example, I want to be able to very quickly find everything that starts with
foo
Or
foo.bar
Yes, I could do a LIKE query, but I never need to find anything like:
fo
So that seems like a waste to me.
Is there any way that SQL Server could index all the dot-delimited prefixes of a string? So, in the above case we have:
foo
foo.bar
foo.bar.baz
Is there any type of index that would facilitate searching like that?
Edit
I will never need to search backwards or from the middle. My searches will always begin from the front of the string:
foo.bar
Never:
bar.baz
SQL Server can't really index substrings, no. If you only ever search from the start of the string, this will work fine and will perform an index seek (depending on other query semantics, of course):
WHERE col LIKE 'foo.%';
-- or
WHERE col LIKE 'foo.bar.%';
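One caveat on these patterns: LIKE 'foo.%' matches foo.bar and foo.bar.baz but not foo itself, so "everything under foo" is col = 'foo' OR col LIKE 'foo.%'. The sketch below uses SQLite through Python's sqlite3 module as a stand-in for SQL Server and writes the prefix test as an explicit range on the indexed key, which is roughly what a prefix seek on a B-tree amounts to: '/' is the character right after '.' in ASCII, so the half-open range from 'foo.' to 'foo/' holds exactly the keys starting with 'foo.'. The table kv and its rows are hypothetical. The range trick relies on binary (code-point) ordering; under other collations the character order may differ, so in SQL Server LIKE 'foo.%' is the safer way to write it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# TEXT PRIMARY KEY gives the key column a B-tree index (binary collation).
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
conn.executemany("INSERT INTO kv VALUES (?, ?)", [
    ("foo", "1"), ("foo.bar", "2"), ("foo.bar.baz", "3"),
    ("foo.qux", "4"), ("food.x", "5"), ("bar.baz", "6"),
])

def subtree(conn, prefix):
    """The node `prefix` itself plus every key below it in the dotted hierarchy."""
    # k >= 'prefix.' AND k < 'prefix/' selects exactly the keys starting with
    # 'prefix.', because '/' sorts immediately after '.'; 'food.x' is excluded.
    return [row[0] for row in conn.execute(
        "SELECT k FROM kv WHERE k = ? OR (k >= ? AND k < ?) ORDER BY k",
        (prefix, prefix + ".", prefix + "/"),
    )]

print(subtree(conn, "foo"))
print(subtree(conn, "foo.bar"))
```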
However, when you start needing to search for bar or baz following any leading string, you will need to search on the substring:
WHERE col LIKE '%.bar.%';
-- or
WHERE PATINDEX('%.bar.%', col) > 0;
This won't work well with regular B-tree indexes, and I don't think Full-Text Search will be much help either, because of the special characters (periods); but you should test it if this is a requirement.
In general, storing data this way smells wrong to me. It seems to me that you should either have separate columns instead of jamming all the data into one column, or use a more relational EAV design.
This looks like a job for a CTE!
create table TableA(
id int identity,
parentid int null,
name varchar(50)
)
For a fixed two-level hierarchy it's easy:
select t2.name, t1.name
from tableA t1
join tableA t2 on t2.id = t1.parentid
where t2.name = 'father'
To find hierarchical values like this in the general case, you will need recursion over the self-joined table, which you can do with a recursive CTE:
http://msdn.microsoft.com/pt-br/library/ms175972.aspx
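A minimal recursive CTE along those lines, run here with SQLite through Python's sqlite3 module rather than SQL Server. In SQL Server you would write WITH without the RECURSIVE keyword, use + instead of || for concatenation, use an identity column for id, and CAST the path column so the anchor and recursive parts have matching types. Starting from the hypothetical root row named 'father', it follows parentid links downwards and rebuilds the dotted key (such as father.child1.grandchild) and the depth of every descendant; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (id INTEGER PRIMARY KEY, parentid INTEGER, name VARCHAR(50));
INSERT INTO TableA VALUES
    (1, NULL, 'father'), (2, 1, 'child1'), (3, 1, 'child2'),
    (4, 2, 'grandchild'), (5, NULL, 'other');
""")

rows = conn.execute("""
WITH RECURSIVE tree(id, name, path, depth) AS (
    -- anchor: the starting node
    SELECT id, name, name, 0 FROM TableA WHERE name = 'father'
    UNION ALL
    -- recursive step: children of nodes already in the tree
    SELECT c.id, c.name, t.path || '.' || c.name, t.depth + 1
    FROM TableA c
    JOIN tree t ON c.parentid = t.id
)
SELECT path, depth FROM tree ORDER BY path
""").fetchall()
print(rows)
```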

Find columns that match in two tables

I need to query two tables of companies. The first table contains the full company names, and the second table also contains the names, but they are incomplete. The idea is to find the rows that are similar. I have included pictures of the reference data and the SQL code I'm using.
The result I want is like this
The closest I have come to a solution is:
SELECT DISTINCT
RTRIM(a.NombreEmpresaBD_A) as NombreReal,
b.EmpresaDB_B as NombreIncompleto
FROM EmpresaDB_A a, EmpresaDB_B b
WHERE a.NombreEmpresaBD_A LIKE 'VoIP%' AND b.EmpresaDB_B LIKE 'VoIP%'
The problem with the above code is that it only returns the records specified in the WHERE clause, and if I use LIKE '%' instead, it returns the Cartesian product of the two tables. The RDBMS is Microsoft SQL Server. I would greatly appreciate any proposed solution.
Use the short name with '%' appended as the pattern in the LIKE expression.
Edit, now that we know this is SQL Server:
SELECT a.NombreEmpresaBD_A as NombreReal
,b.NombreEmpresaBD_B as NombreIncompleto
FROM EmpresaDB_A a, EmpresaDB_B b
WHERE a.NombreEmpresaBD_A LIKE (b.NombreEmpresaBD_B + '%');
According to your screenshot, you had the column name wrong!
String concatenation in T-SQL uses the + operator.
The query above finds a case where
'Computex S.A' LIKE 'Computex%'
but not:
'Voip Service Mexico' LIKE 'VoipService%'
For that you would have to strip blanks first or use more powerful pattern matching functions.
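Both cases can be reproduced with SQLite through Python's sqlite3 module, where concatenation is || rather than T-SQL's +. The first query is the prefix LIKE join from this answer. The second strips blanks on both sides with REPLACE before comparing, which also catches the 'Voip Service Mexico' case. The rows are invented to mirror the examples above. Any % or _ inside a short name would act as a wildcard, and case sensitivity depends on the database: SQLite's LIKE ignores ASCII case, while in SQL Server it depends on the collation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmpresaDB_A (NombreEmpresaBD_A TEXT);
CREATE TABLE EmpresaDB_B (NombreEmpresaBD_B TEXT);
INSERT INTO EmpresaDB_A VALUES ('Computex S.A'), ('Voip Service Mexico'), ('Acme Corp');
INSERT INTO EmpresaDB_B VALUES ('Computex'), ('VoipService');
""")

# Prefix match: the full name must start with the short name.
plain = conn.execute("""
SELECT a.NombreEmpresaBD_A, b.NombreEmpresaBD_B
FROM EmpresaDB_A a
JOIN EmpresaDB_B b ON a.NombreEmpresaBD_A LIKE b.NombreEmpresaBD_B || '%'
ORDER BY 1
""").fetchall()

# Same match after removing blanks from both names.
no_blanks = conn.execute("""
SELECT a.NombreEmpresaBD_A, b.NombreEmpresaBD_B
FROM EmpresaDB_A a
JOIN EmpresaDB_B b
  ON REPLACE(a.NombreEmpresaBD_A, ' ', '') LIKE REPLACE(b.NombreEmpresaBD_B, ' ', '') || '%'
ORDER BY 1
""").fetchall()
print(plain)
print(no_blanks)
```

In T-SQL the second join condition would read REPLACE(a.NombreEmpresaBD_A, ' ', '') LIKE REPLACE(b.NombreEmpresaBD_B, ' ', '') + '%'.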
I have created a demo for you on data.SE.
Look up pattern matching or the LIKE operator in the manual.
I would suggest adding a foreign key between the tables linking the data. Then you can just search for the one table and join the second to get the other results.
