Check if a field value exists in a JSONB array inside another table's field

For simplicity, let's assume that I have two tables: A and B.
Table A has a JSONB field called data that contains a field columns.
This JSONB array is just a list of ids that are primary keys in Table B.
Table B has the field id (the others don't matter for the question).
The idea is to create a constraint so that it's impossible to delete a row in Table B if this row's id is IN A.data->'columns'.
As far as I know, it's impossible to create such a constraint in a conventional way, so I have decided that this behavior can be implemented as:
SELECT *
FROM B
WHERE id = ANY (
    SELECT UNNEST(ARRAY(
        SELECT JSONB_ARRAY_ELEMENTS_TEXT(A.data -> 'columns') FROM A
    )::int[])
);
This query supposedly does exactly what I want, but it looks clumsy enough that I assume there must be a cleaner way. Constructing an array and then unnesting it doesn't seem optimal either.
Can you think of a better way to achieve the behavior I described above?
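One shorter way to write the same check, offered only as a sketch: assuming the ids are stored as JSON numbers under the columns key, the jsonb containment operator @> avoids building and unnesting an intermediate array:

-- Sketch: find rows of B whose id appears in some A.data -> 'columns' array.
-- Assumes the ids are stored as JSON numbers; if they are strings, compare
-- against to_jsonb(b.id::text) instead.
SELECT b.*
FROM B AS b
WHERE EXISTS (
    SELECT 1
    FROM A
    WHERE A.data -> 'columns' @> to_jsonb(b.id)
);

Note that either query only finds the referenced rows; actually blocking the DELETE would still need a trigger, since a foreign key cannot reference values inside a JSONB column.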

Related

How to implement a hierarchical structure of several nested composite types in PostgreSQL?

I am trying to implement a database in PostgreSQL 11.7, which should represent a hierarchical structure of several nested composite types. Currently I have the following defined (simplified):
CREATE TYPE type_school AS (code integer, descr text);
CREATE TYPE type_district AS (code integer, descr text, schools type_school[]);
CREATE TYPE type_city AS (code integer, descr text, districts type_district[]);
CREATE TYPE type_country AS (code integer, descr text, cities type_city[]);
and a single table:
CREATE TABLE countries (country type_country);
For example these should be valid records (the descr column is optional and not of interest):
country with code 1, cities 3,4,5 with districts {1,2}, {1,3}, {3,6}
country with code 2, cities 3,6 with districts {3,7}, {7,9}
To populate the table I use INSERT for the country and UPDATE for the other elements:
INSERT INTO countries VALUES(ROW(1, 'country descr', ARRAY[]::type_city[]));
UPDATE countries SET (country.cities[1].code, country.cities[1].descr) = (1, 'city descr') WHERE (country).code = 1;
UPDATE countries SET (country.cities[1].districts[2].code, country.cities[1].districts[2].descr) = (2, 'district descr') WHERE (country).code = 1;
This setup works alright, as I can perform most of the necessary queries. However, I do not think this is the correct approach. I am a C programmer with no experience in database programming. I view this arrangement as an array of struct elements, each consisting of more arrays of structs, and I am used to accessing the elements by indexing, which is what you see in this implementation.
I would like to have some of the features of the database, such as constraints. These, however, are not possible on PostgreSQL types, only on tables. And if I define the array as a table, I do not know how to write the INSERT queries to access an inner table. According to some websites, nested tables are not possible in PostgreSQL, and they recommend using arrays instead. Is it possible to enforce constraints on an array of a composite type? Another suggestion I found on the web is using the ltree extension, but it seems to me that its tree elements are all of the same type, while I have a different type on each level. Also, in my current implementation I do not know how to delete a certain element and all of its sub-elements. So my question is:
How should one implement a table to represent a tree-like structure, consisting of 4 levels, each level representing a different type, so that constraints can be specified for the elements of each type? Is it even possible to do it with a relational database? And just to be clear, all I have to differentiate the elements is an index, and each element is identified uniquely only by its path country[i]->city[j]->district[k]->school[l].
Thanks to Laurenz Albe for pointing me in the right direction. I am posting a solution with code, in case anybody else needs a working example.
First create the top level table:
CREATE TABLE countries(code INTEGER, descr TEXT, PRIMARY KEY(code));
The table on the next level uses its own code together with the code from the top-level table as its primary key. This way duplicate values in its code column are possible, because only the combination of code_country and code must be unique:
CREATE TABLE cities(code INTEGER,
    descr TEXT,
    code_country INTEGER,
    FOREIGN KEY(code_country) REFERENCES countries(code),
    PRIMARY KEY(code_country, code));
The table on the third level should use the primary key of its parent as a foreign key. This is accomplished by omitting the column names in the REFERENCES directive:
CREATE TABLE district(code INTEGER,
    descr TEXT,
    code_country INTEGER,
    code_city INTEGER,
    FOREIGN KEY(code_country, code_city) REFERENCES cities,
    PRIMARY KEY(code_country, code_city, code));
Adding constraints is now trivial.
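For completeness, the fourth level follows the same pattern; this is only a sketch, since the schools table was not part of the posted solution:

-- Assumed continuation for the fourth level (schools).
-- REFERENCES district without column names points at district's primary key
-- (code_country, code_city, code), matched positionally.
CREATE TABLE schools(code INTEGER,
    descr TEXT,
    code_country INTEGER,
    code_city INTEGER,
    code_district INTEGER,
    FOREIGN KEY(code_country, code_city, code_district) REFERENCES district,
    PRIMARY KEY(code_country, code_city, code_district, code));

A per-level constraint is then an ordinary table constraint, for example:

ALTER TABLE district ADD CONSTRAINT district_code_positive CHECK (code > 0);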

How to use order by with table to sort the table in hierarchical way

I am given the table shown in Image 1.
How do I use an ORDER BY statement so that I can get the resultant table? I don't know how to solve it. I tried ORDER BY on columns C and D, but all the NULLs come up to the top irrespective of column B.
The result is given in Image 2.
Updated
Sorry, I just forgot to mention that this table also contains an id column and is already sorted by id, so I am not even able to sort it by column A. Because of this, SQL thinks the whole table is already sorted, but I still want to sort on the basis of the column.
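If the underlying problem is only that the NULLs sort to the top, NULLS LAST may be enough. A sketch only: the table and column names are guesses from the description, and NULLS LAST is standard SQL supported by PostgreSQL and Oracle but not by every engine:

-- Hypothetical table t with columns B, C, D as described above;
-- push NULL values of C and D to the bottom instead of the top.
SELECT *
FROM t
ORDER BY B, C NULLS LAST, D NULLS LAST;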

PostgreSQL - How do you create a "dimension" table from a 'select distinct' query and create a primary and foreign key?

I have a fact table with many entries, and they have 'ship to' columns that are very closely related, but none of the columns are always unique. I would like to make a dimension table for this, and reference the new dimension table rows using a key.
I can create the new dimension table with a create table as select distinct, and I can add a row number primary key for it, but I'm not sure how to put the matching foreign key into the fact table, where they match.
I could easily create a new foreign key column and fill it using a WHERE clause to match the old distinct rows in the fact table to the rows in the dimension table, but there is no easy column to match on (since there is no key yet). So do I need to write a WHERE clause that matches all of the columns together and then assigns the primary key from the dimension table?
I may just be lazy and don't want to research how to write ALTER queries and complex WHERE matching, but it seems like a pretty common task in database management, so I feel an answer might help others.
I guess one way to do it is to build a concatenation of all of the values in all of the columns for each row in the new dimension, to make a unique identifier from the data, and then do the same thing for the columns in the fact table. Now there is a unique key between the two, and it can be converted to an integer id: add a new sequence column in the new dimension, then create a new column in the fact table and set it to the integer id in the dimension where the concatenated id is the same.
This is obviously very inefficient, as the whole contents of the new dimension have to be duplicated, and again in the fact table, just to create a link.
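A common alternative to the concatenation trick is to backfill the new key with a single UPDATE that joins the fact table to the dimension on all of the ship-to columns. A minimal sketch, assuming a fact table called fact with ship-to columns ship_name and ship_city (the real column list would be longer):

-- 1. Build the dimension from the distinct ship-to combinations.
CREATE TABLE ship_to_dim AS
SELECT DISTINCT ship_name, ship_city
FROM fact;

-- 2. Give it a surrogate key.
ALTER TABLE ship_to_dim ADD COLUMN id serial PRIMARY KEY;

-- 3. Add the foreign key column to the fact table and backfill it.
ALTER TABLE fact ADD COLUMN ship_to_id integer REFERENCES ship_to_dim(id);

UPDATE fact f
SET ship_to_id = d.id
FROM ship_to_dim d
WHERE f.ship_name IS NOT DISTINCT FROM d.ship_name
  AND f.ship_city IS NOT DISTINCT FROM d.ship_city;

IS NOT DISTINCT FROM treats two NULLs as equal, which plain = would not, so rows whose ship-to columns contain NULLs still find their dimension row.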

Is sorting a table by a time field (where auto_now_add=True), equivalent to sorting it by the said table's primary key ID?

Imagine a database table with a time_of_insert attribute, which is auto-filled by the current time for every INSERT (e.g. in a Django model's example, the attribute has auto_now_add=True).
In that case, is sorting the said table by time_of_insert equivalent to sorting it by each row's ID (primary key)?
Background: I ask because I have a table with an auto-created time_of_insert attribute. I'm currently sorting the table by time_of_insert; this field isn't indexed. I feel I can simply sort by id instead of indexing time_of_insert - that way I get fast results and don't have to incur the overhead of indexing one more table column. My DB is Postgres.
What am I missing?
No, it's not.
id guarantees uniqueness, and your datetime column does not.
So if there are two rows with the same time_of_insert value, the order of the result set is not guaranteed.
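If a deterministic order is needed even when two rows share the same timestamp, one common approach (a sketch; the table name is assumed) is to add the id as a tie-breaker:

-- Sort by insertion time, breaking ties by primary key.
SELECT *
FROM my_table
ORDER BY time_of_insert, id;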

SQL Server 2008 - Database Design Query

I have to load the data shown in the below image into my database.
For a particular row, either PartID or GroupID will be NULL, and the other available columns refer to the non-NULL entity. I have the following three options:
1. Use one table with a single unified column, say ID, which holds both PartID and GroupID values. In this case I won't be able to apply a foreign key constraint, as this column will contain data for both entities.
2. Use one table with columns for both PartID and GroupID, each containing the respective data. For each row, one of them will be NULL, but in this case I will be able to apply foreign key constraints.
3. Use two tables with a similar structure, the only difference being the PartID versus GroupID column. In this case I will also be able to apply foreign key constraints.
One thing to note here is that the table(s) will be used in import processes to load about 30,000 rows in one go and will also be heavily used in data retrieval operations. Also, the other columns will be used as pivot columns.
Can someone please suggest the best approach to achieve this?
I would use option 2 and add a constraint that only one can be non-null and the other must be null (just to be safe). I would not use option 1 because of the lack of a FK and the possibility of linking to the wrong table when not obeying the type identifier in the join.
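As a sketch of that constraint (the table name Items is assumed, not given in the question):

-- Exactly one of PartID / GroupID must be set.
ALTER TABLE Items ADD CONSTRAINT CK_Items_PartOrGroup
CHECK ((PartID IS NOT NULL AND GroupID IS NULL)
    OR (PartID IS NULL AND GroupID IS NOT NULL));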
There is a 4th option, which is to normalize them as "items" with another (surrogate) key and two link tables which link items to either parts or groups. This eliminates NULLs. There are further problems with that approach (items might be in both again or neither without any simple constraint), so unless that is necessary for other reasons, I wouldn't generally go down that path.
Option 3 could be fine - it really depends if these rows are a relation - i.e. data associated with a primary key. That's one huge problem I see with the data presented, the lack of a candidate key - I think you need to address that first.
IMO option 2 is the best - it's not perfectly normalized but will be the easiest to work with. 30K rows is not a lot of rows to import.
I would modify the table so it has one ID column and then add an IDType that is either "G" for Group or "P" for Part.
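A sketch of that variant (all names assumed; note that, as pointed out above, a plain foreign key is not possible with this shape):

CREATE TABLE ItemRows (
    ID int NOT NULL,
    IDType char(1) NOT NULL CHECK (IDType IN ('P', 'G')),  -- 'P' = Part, 'G' = Group
    -- ... the pivot columns go here ...
    CONSTRAINT PK_ItemRows PRIMARY KEY (ID, IDType)
);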
