I have a connection-object browser in which I want to let a user view the various data sources they are connected to. The object viewer looks something like this:
Connection: Remote.1234.MySQL (3 level source)
    Database: Sales
        Table: User
            Field: Name -- CHAR(80)
            Field: Age -- INT32
        Table: Product
            ...
        Table: Purchase
            ...
    Database: Other
        ...
Connection: Remote.abc.ElasticSearch (2 level source)
    Index: Inventory
        Field: ID -- INTEGER
        Field: Product -- STRING
        ...
Connection: Local.xyz.MongoDB (3 level source)
    Database: Mail
        Collection: Users
            Field: MailboxID -- INTEGER
            Field: Name -- STRING
        Collection: Documents
            ...
Connection: Local.xyz.SQLServer (4 level source)
    Database: Main
        Schema: Public
            Table: user
                Field: Name -- STRING
    Database: History
        ...
In other words, a 'Source' is a hierarchy with a known number of levels and a known 'name' for each level. While the hierarchy varies from source to source, the hierarchy of any given source will always have the same number of levels and the same level names. What might be a good way to model this relationally? My thought was to have the following:
Connection:
    id
    host
    (other details)
SourceType:
    id
    Name
SourceTypeLevelMapping:
    SourceTypeID
    level (int)
    name
ThreeLevelSource_Level1: # e.g., Database
    ID
    ParentID (ConnectionID)
    Name
    (other details)
ThreeLevelSource_Level2: # e.g., Table
    ID
    ParentID (Level1ID)
    Name
    (other details)
ThreeLevelSource_Level3: # e.g., Field
    ID
    ParentID (Level2ID)
    FieldName
    FieldType
    (other details)
Then do the same for the other level-ed hierarchies:
TwoLevelSource_Level1, TwoLevelSource_Level2
FourLevelSource_Level1, FourLevelSource_Level2, FourLevelSource_Level3, FourLevelSource_Level4
So basically define the known hierarchies and for each new source that we add we would attach it to one of the known hierarchy levels. The alternative approach I was thinking of doing is to create a new hierarchy for each new source, but then we would be looking at literally hundreds of tables if we were to allow access to 25-50 sources.
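To make the mapping idea concrete, here is a minimal sketch of SourceType and SourceTypeLevelMapping, run through Python's sqlite3 module only for convenience. The column types, the sample rows, and the level_names helper are assumptions for illustration, not part of any existing schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SourceType (
    id   INTEGER PRIMARY KEY,
    Name TEXT NOT NULL UNIQUE
);
CREATE TABLE SourceTypeLevelMapping (
    SourceTypeID INTEGER NOT NULL REFERENCES SourceType(id),
    level        INTEGER NOT NULL,
    name         TEXT NOT NULL,
    PRIMARY KEY (SourceTypeID, level)
);
-- sample rows (assumed), mirroring the hierarchies listed above
INSERT INTO SourceType (id, Name) VALUES
    (1, 'MySQL'), (2, 'ElasticSearch'), (3, 'SQLServer');
INSERT INTO SourceTypeLevelMapping VALUES
    (1, 1, 'Database'), (1, 2, 'Table'), (1, 3, 'Field'),
    (2, 1, 'Index'), (2, 2, 'Field'),
    (3, 1, 'Database'), (3, 2, 'Schema'), (3, 3, 'Table'), (3, 4, 'Field');
""")

def level_names(source_type):
    """Return the ordered level names for one source type (hypothetical helper)."""
    rows = conn.execute(
        """SELECT m.name
           FROM SourceTypeLevelMapping m
           JOIN SourceType s ON s.id = m.SourceTypeID
           WHERE s.Name = ?
           ORDER BY m.level""",
        (source_type,),
    )
    return [row[0] for row in rows]

print(level_names("SQLServer"))
```

The viewer could use such a lookup to label each level of a connection without knowing anything else about the source.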
What might be a good way to model this type of hierarchical data?
(Also, yes I am familiar with the existing general approaches for modeling hierarchical data as delineated here -- What are the options for storing hierarchical data in a relational database?, How can you represent inheritance in a database? -- the below is not a duplicate.)
Relational Solution
Responding to the relational-database and hierarchic-data tags, the latter being pedestrian in the former.
1.1 Preliminary
Due to the requirement for, and the difference between:
the genuine SQL Platforms (conformance to the Standard; server architecture; unified language; etc) and
the pretend "SQL" programs (no architecture; bits of language spread across those programs; no Transactions; no ACID; etc) that provide no compliance to the Standard, and therefore use the term incorrectly, and
the non-SQLs
I use Record and Field to cover all possibilities, instead of the Relational terms, which would convey Relational definitions.
All possibilities are catered for, but a Relational and SQL-compliant approach (eg. MS SQL Server) is taken as the best method, due to its 40-year establishment and maturity, and the absence of alternatives.
The collection of SQL Platforms; pretend "SQL" applications; and non-SQL suites, are labelled DataSource.
1.2 Compliance
This solution is 100% Relational: Codd's Relational Model, not the substandard alternatives marketed by the academics as "relational":
It can be implemented in any SQL compliant Platform
It has Relational Integrity (which is logical, beyond Referential Integrity, which is SQL and physical); Relational Power; and Relational Speed.
All update interaction is simple, via SQL ACID Transactions.
No warranties are given for pretend "SQLs" and non-SQLs.
2 Solution
2.1 Concept
I appreciate that as a developer, your focus is on the data values and how to retrieve them. However, two levels of definition are required first, in order to support the third, data level:
Catalogue Potential
Blue (Reference cluster).
The DataSources and definitions that are available in the market, which the organisation might use. Let's say 42, as per your descriptions.
I would entrust this only to a developer, not a user_admin, because the setup is critical (the lower levels depend on it), and it describes the physical capability and limitations of each DataSource.
Catalogue Actual
Green (Identification cluster).
The DataSources and definitions that are actually contracted and used by the organisation. Let's say 12. At this point we have connection addresses; ports; and users. It is constrained by CataloguePotential, directly, and via CHECKs that call Functions.
This level defines the content (the tables that actually exist); it contains no data values.
Maintaining an SQL mindset is the most prudent course, given that SQL is an established Standard with 40 years of maturity, and it gives us the most flexibility: the CatalogueActual forms the SQL Catalogue.
Likewise, I have used the terms Record and Field for the objects in the collective, rather than Table and Column, which would imply Relational and SQL meanings.
SQL Platform
This level can be populated automatically by the program querying the SQL Catalogue.
"SQL" applications and non-SQL suites
Population is manual due to the absence of a Catalogue. It can be done by a user_admin. The constraint would be your program attempting a trial query to validate the user-supplied table definition.
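As a rough illustration of automatic population, the sketch below reads table and column definitions from a catalogue. It uses SQLite's sqlite_master table and pragma_table_info function purely as a convenient stand-in that runs anywhere; a genuine SQL Platform exposes its own catalogue views, and the read_catalogue helper and the sample tables are assumptions, not part of the data model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two sample tables standing in for a connected DataSource (assumed)
conn.executescript("""
CREATE TABLE user (name CHAR(80), age INT);
CREATE TABLE product (id INTEGER PRIMARY KEY, title TEXT);
""")

def read_catalogue(conn):
    """Return {table_name: [(column_name, declared_type), ...]} from the catalogue."""
    catalogue = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        columns = conn.execute(
            "SELECT name, type FROM pragma_table_info(?)", (table,)
        ).fetchall()
        catalogue[table] = columns
    return catalogue

catalogue = read_catalogue(conn)
for table, columns in catalogue.items():
    print(table, columns)
```

The rows returned here are what would be loaded into the Actual level as Records and Fields; for sources without a catalogue, the same structure would be typed in by hand.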
Current Data
Yellow (Transaction cluster)
The current data that the user has queried from the DataSources, via his Connection, for the webpage. The assumption: I have taken the user::webpage to be central and governing (one user per Connection; one user per webpage), not the OO Object.
If the OO Objects are not reliable (depends on the library you use), or there is one set of Objects across all user-webpages, more Constraints need to be added.
2.2 Method
You need:
Simple Hierarchy
a single-parent hierarchy to replicate the fixed levels of definition in the Catalogue in the SQL servers, as well as the variable levels in the constructed catalogue for the pretend "SQLs" and the non-SQLs.
Relational Hierarchies are fully defined, along with SQL implementation details, in the Hierarchy doc. The simple or single-parent model is given in [§ 2.2].
The Root level (not the Anchor) is the Potential DataSource
The Leaf level is that which contains data, either a Record or a Struct (for those in the collective that allow one).
In the Potential DataSource, it is representative, truly a RecordType and FieldType
In the Actual DataSource, it is an actual Record, which is an instance of RecordType, and actual Field, which is a narrower definition of FieldType.
Method/Struct
In order to handle a Struct, which in definition terms is identical to a Record, and to allow a Struct to contain a Struct, we need a level of abstraction, which is ...
Article
is either
a Field, which is the atomic unit of storage, xor
a Struct, which contains Articles
that requires an Exclusive Subtype cluster, fully defined along with SQL implementation details, in the Subtype doc
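For readers without the Subtype doc to hand, the following is a minimal sketch of one common way to get an exclusive subtype: a discriminator column carried into a composite foreign key. It is not necessarily the method described in the Subtype doc, and every table and column name (Article, ArticleType, ParentStructId, DataType) is an assumption chosen for illustration. SQLite is used only because it runs without a server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Basetype: every Article is exactly one of Field ('F') or Struct ('S')
CREATE TABLE Article (
    ArticleId      INTEGER PRIMARY KEY,
    Name           TEXT NOT NULL,
    ArticleType    TEXT NOT NULL CHECK (ArticleType IN ('F', 'S')),
    ParentStructId INTEGER REFERENCES Struct (ArticleId),  -- NULL at the top
    UNIQUE (ArticleId, ArticleType)
);
-- Subtypes: the composite foreign key forces the discriminator to match
CREATE TABLE Field (
    ArticleId   INTEGER PRIMARY KEY,
    ArticleType TEXT NOT NULL CHECK (ArticleType = 'F'),
    DataType    TEXT NOT NULL,
    FOREIGN KEY (ArticleId, ArticleType) REFERENCES Article (ArticleId, ArticleType)
);
CREATE TABLE Struct (
    ArticleId   INTEGER PRIMARY KEY,
    ArticleType TEXT NOT NULL CHECK (ArticleType = 'S'),
    FOREIGN KEY (ArticleId, ArticleType) REFERENCES Article (ArticleId, ArticleType)
);
""")

# A Struct 'address' that contains a Field 'street'
conn.execute("INSERT INTO Article VALUES (1, 'address', 'S', NULL)")
conn.execute("INSERT INTO Struct VALUES (1, 'S')")
conn.execute("INSERT INTO Article VALUES (2, 'street', 'F', 1)")
conn.execute("INSERT INTO Field VALUES (2, 'F', 'STRING')")

# Article 2 is a Field, so it must not also become a Struct
try:
    conn.execute("INSERT INTO Struct VALUES (2, 'S')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print("second subtype rejected:", rejected)
```

Because ParentStructId points at Struct, a Struct can contain further Structs, which is the nesting the Article abstraction is there to allow.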
Method/Array
To support an Array of Fields:
These are multi-valued dependencies on Field, thus implemented as child tables.
For scalars the NumElement is 1.
That makes the Exclusive Subtype cluster on Field that is otherwise required for scalars redundant.
2.3 Relational Data Model
This is the progress after seven iterations. It shows the Table-Relation level (the Attribute level is too large for an inline graphic).
Assumption
That the JS (or whatever) objects are local to the webpage/user. If your objects are global, the value tables need to be constrained to Connection.
The data model is given in a single PDF:
Table Relation level
Table Relation level + sample data
Table Attribute level + sample data.
2.4 Notation
All my data models are rendered in IDEF1X, available from the early 1980's, the one and only notation for Relational Data Modelling, the Standard since 1993.
The IDEF1X Introduction is essential reading for those who are new to Codd's Relational Model, or its modelling method. Note that IDEF1X models are complete: they are rich in detail and precision, showing all required details, whereas a home-grown model, being unaware of the imperatives of the Standard, has far less definition. Which means the notation needs to be fully understood.
Here are three working sqlite-flavored implementations (since sqlite is being used, it is acceptable that column types are not enforced; only the integer primary keys are typed, so that they act as the rowid):
In all cases, sqlite foreign key PRAGMA is set to true: PRAGMA foreign_keys = 1;
Simple implementation - one fixed table for each source/level (constrained by foreign keys)
The following design utilizes one table for each database type and level. Tables reference each other through foreign keys to ensure correctness; for example, a mongo collection can't be the child of a mysql database. Only at the connection level do all database types share the same table, but that could be split as well if different properties are expected for each kind of connection.
create table databasetype(name primary key) without rowid;
insert into databasetype values ('mysql'),('elasticsearch'),('mongo'),('sqlserver');
create table datatype(name primary key) without rowid;
insert into datatype values ('int'),('str'); -- you can differentiate varchar if you will
create table connection(id integer, hostname, databasetype, primary key(id), foreign key(databasetype) references databasetype(name));
create table mysqldatabase(id integer, connectionid, name, primary key(id), foreign key(connectionid) references connection(id));
create table mysqltable(id integer, databaseid, name, primary key(id), foreign key(databaseid) references mysqldatabase(id));
create table mysqlfield(id integer, tableid, name, datatype, datalength, primary key(id), foreign key(tableid) references mysqltable(id), foreign key(datatype) references datatype(name));
create table elasticsearchindex(id integer, connectionid, name, primary key(id), foreign key(connectionid) references connection(id));
create table elasticsearchfield(id integer, indexid, name, datatype, datalength, primary key(id), foreign key(indexid) references elasticsearchindex(id), foreign key(datatype) references datatype(name));
create table mongodatabase(id integer, connectionid, name, primary key(id), foreign key(connectionid) references connection(id));
create table mongocollection(id integer, databaseid, name, primary key(id), foreign key(databaseid) references mongodatabase(id));
create table mongofield(id integer, collectionid, name, datatype, datalength, primary key(id), foreign key(collectionid) references mongocollection(id), foreign key(datatype) references datatype(name));
create table sqlserverdatabase(id integer, connectionid, name, primary key(id), foreign key(connectionid) references connection(id));
create table sqlserverschema(id integer, databaseid, name, primary key(id), foreign key(databaseid) references sqlserverdatabase(id));
create table sqlservertable(id integer, schemaid, name, primary key(id), foreign key(schemaid) references sqlserverschema(id));
create table sqlserverfield(id integer, tableid, name, datatype, datalength, primary key(id), foreign key(tableid) references sqlservertable(id), foreign key(datatype) references datatype(name));
Loading data representing the first table:
insert into connection(hostname, databasetype) values ('remote:1234', 'mysql');
insert into mysqldatabase(connectionid, name) select id, 'sales' from connection where hostname='remote:1234';
insert into mysqltable(databaseid, name) select id, 'user' from mysqldatabase where name='sales';
insert into mysqlfield(tableid, name, datatype, datalength) select id, 'name', 'str', 80 from mysqltable where name='user';
insert into mysqlfield(tableid, name, datatype) select id, 'age', 'int' from mysqltable where name='user';
Trying invalid manipulations of data:
insert into mysqlfield(tableid, name, datatype) values (2, 'newfield', 'qubit');
-- Error: FOREIGN KEY constraint failed
In order to pretty-print the whole tree it is necessary to do a manual join of all tables involved.
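A minimal sketch of such a manual join for the mysql branch follows, driven from Python's sqlite3 module so that it runs on its own. The table definitions are a condensed copy of the ones above (the datatype lookup table is left out for brevity), and the explicit ids in the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = 1")
conn.executescript("""
create table connection(id integer primary key, hostname, databasetype);
create table mysqldatabase(id integer primary key, connectionid references connection(id), name);
create table mysqltable(id integer primary key, databaseid references mysqldatabase(id), name);
create table mysqlfield(id integer primary key, tableid references mysqltable(id), name, datatype, datalength);
insert into connection values (1, 'remote:1234', 'mysql');
insert into mysqldatabase values (1, 1, 'sales');
insert into mysqltable values (1, 1, 'user');
insert into mysqlfield values (1, 1, 'name', 'str', 80), (2, 1, 'age', 'int', null);
""")

# One join per level: connection -> database -> table -> field
rows = conn.execute("""
select c.hostname, d.name, t.name, f.name, f.datatype
from connection c
join mysqldatabase d on d.connectionid = c.id
join mysqltable t on t.databaseid = d.id
join mysqlfield f on f.tableid = t.id
order by c.id, d.id, t.id, f.id
""").fetchall()

for host, database, table, field, datatype in rows:
    print(f"{host} > {database} > {table} > {field} ({datatype})")
```

Each database type needs its own query of this shape, with as many joins as it has levels, which is the main maintenance cost of this design.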
Graph-like implementation - one table representing the tree, another the hierarchy (constrained by triggers)
Here the element table is used to represent each element/node in the tree. Its level column explicitly classifies each element as a database, table, etc. Here sqlite's rowid is being used as the primary key, but it is easy to change it to a regular id.
In the previous implementation, foreign keys were used to ensure model correctness. Now triggers do this job: they decide which parent level accepts which child level, as allowed for the respective dbtype. Those rules are specified in the element_type table.
Lastly, an extra table, element_property, allows extra properties, such as field type, to be attached to any element.
create table db_type(name primary key) without rowid;
insert into db_type values ('mysql'),('elasticsearch'),('mongo'),('sqlserver');
create table element_type(parentlevel, childlevel, dbtype, primary key(parentlevel, childlevel, dbtype), foreign key(dbtype) references db_type(name)); --not using without rowid to be able to have null parent level
insert into element_type values
(null, 'connection', 'mysql'),
('connection', 'database', 'mysql'),
('database', 'table', 'mysql'),
('table', 'field', 'mysql'),
(null, 'connection', 'elasticsearch'),
('connection', 'index', 'elasticsearch'),
('index','field', 'elasticsearch'),
(null, 'connection', 'mongo'),
('connection', 'database', 'mongo'),
('database', 'collection', 'mongo'),
('collection', 'field', 'mongo'),
(null, 'connection', 'sqlserver'),
('connection', 'database', 'sqlserver'),
('database', 'schema', 'sqlserver'),
('schema', 'table', 'sqlserver'),
('table', 'field', 'sqlserver');
create table element(id integer, parentid, name, level, dbtype, primary key(id), foreign key(parentid) references element(id), foreign key(dbtype) references db_type(name));
create table element_property(parentid, name, value, primary key(parentid, name), foreign key(parentid) references element(id)) without rowid;
-- trigger to guarantee that new elements will conform hierarchy
create trigger element_insert before insert on element
begin
select iif(count(*)>0, 'ok', raise(abort,'invalid parent-child insertion')) from element_type et where et.dbtype=new.dbtype and et.childlevel=new.level and et.parentlevel is (select level from element ei where ei.rowid=new.parentid);
end;
-- trigger to guarantee that updated elements will conform hierarchy
create trigger element_update before update on element
begin
select iif(count(*)>0, 'ok', raise(abort,'invalid parent-child update')) from element_type et where et.dbtype=new.dbtype and et.childlevel=new.level and et.parentlevel is (select level from element ei where ei.rowid=new.parentid);
end;
-- trigger to guarantee that hierarchy removal must respect existing elements (no delete cascade used)
create trigger element_type_delete before delete on element_type
begin
select iif(count(*)>0, raise(abort,'can''t remove, entries found in the element table using this relationship'), 'ok') from element etc join element etp on etp.rowid=etc.parentid and etp.dbtype=etc.dbtype where etc.dbtype=old.dbtype and (etp.level,etc.level)=(old.parentlevel, old.childlevel);
end;
-- trigger to guarantee that hierarchy changes must respect existing elements
create trigger element_type_update before update on element_type
begin
select iif(count(*)>0, raise(abort,'can''t change, entries found in the element table using this relationship'), 'ok') from element etc join element etp on etp.rowid=etc.parentid and etp.dbtype=etc.dbtype where etc.dbtype=old.dbtype and (etp.level,etc.level)=(old.parentlevel, old.childlevel) and (etp.level,etc.level)!=(new.parentlevel, new.childlevel);
end;
Loading data representing the first table:
insert into element(name, level, dbtype) values ('remote:1234', 'connection', 'mysql');
insert into element(name, level, dbtype, parentid) values ('sales', 'database', 'mysql', (select id from element where (level, name, dbtype)=('connection', 'remote:1234', 'mysql')));
insert into element(name, level, dbtype, parentid) values ('user', 'table', 'mysql', (select id from element where (level, name, dbtype)=('database', 'sales', 'mysql')));
insert into element(name, level, dbtype, parentid) values ('name', 'field', 'mysql', (select id from element where (level, name, dbtype)=('table', 'user', 'mysql')));
insert into element(name, level, dbtype, parentid) values ('age', 'field', 'mysql', (select id from element where (level, name, dbtype)=('table', 'user', 'mysql')));
insert into element_property(name, value, parentid) values ('fieldtype', 'varchar', (select id from element where (level, name, dbtype)=('field', 'name', 'mysql')));
insert into element_property(name, value, parentid) values ('fieldlength', 80, (select id from element where (level, name, dbtype)=('field', 'name', 'mysql')));
insert into element_property(name, value, parentid) values ('fieldtype', 'integer', (select id from element where (level, name, dbtype)=('field', 'age', 'mysql')));
Trying invalid manipulations of data:
insert into element(name, level, dbtype, parentid) values ('documents', 'collection', 'mysql', (select id from element where (level, name, dbtype)=('database', 'sales', 'mysql')));
-- Error: invalid parent-child insertion
update element_type set childlevel='specialfield' where dbtype='mysql' and (parentlevel, childlevel)=('table','field');
-- Error: can't change, entries found in the element table using this relationship
Pretty-printing the tree:
create view elementree(path) as
with recursive cte(id, name, depth, dbtype, level) as (
select id, name, 0 as depth, dbtype, level from element where parentid is null
union all
select el.id, el.name, cte.depth+1 as depth, el.dbtype, el.level from element el join cte on el.parentid=cte.id
order by depth desc
)
select substr('                ',1,2*depth)||name||' ('||dbtype||'-'||level||')' from cte;
select * from elementree;
-- remote:1234 (mysql-connection)
--   sales (mysql-database)
--     user (mysql-table)
--       name (mysql-field)
--       age (mysql-field)
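Since field details live in element_property rather than in columns, reading them back needs a small pivot. The sketch below, run through Python's sqlite3 module, shows one way to list every field with its fieldtype and fieldlength. It recreates only the two tables it needs, leaves out the triggers and the db_type lookup, and uses explicit ids in the sample rows; all of that is an assumption made to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table element(id integer primary key, parentid references element(id), name, level, dbtype);
create table element_property(parentid references element(id), name, value,
                              primary key(parentid, name)) without rowid;
insert into element values
    (1, null, 'remote:1234', 'connection', 'mysql'),
    (2, 1, 'sales', 'database', 'mysql'),
    (3, 2, 'user', 'table', 'mysql'),
    (4, 3, 'name', 'field', 'mysql'),
    (5, 3, 'age', 'field', 'mysql');
insert into element_property values
    (4, 'fieldtype', 'varchar'),
    (4, 'fieldlength', 80),
    (5, 'fieldtype', 'integer');
""")

# Pivot the property rows into one row per field; missing properties stay NULL
fields = conn.execute("""
select e.name,
       max(case when p.name = 'fieldtype' then p.value end) as fieldtype,
       max(case when p.name = 'fieldlength' then p.value end) as fieldlength
from element e
left join element_property p on p.parentid = e.id
where e.level = 'field'
group by e.id, e.name
order by e.id
""").fetchall()

for name, fieldtype, fieldlength in fields:
    print(name, fieldtype, fieldlength)
```

The left join keeps fields that have no properties at all, so the viewer can still show them.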
Minimalist DRY graph-like implementation - one table with only names representing the tree and only one auxiliary table
Here again an element table is used to represent each element in the tree. Unlike the previous case, the table holds less information, and the type of each element (whether it is a database or a table) is inferred implicitly instead of being set explicitly by a column. By simply adding user as a child of sales, it is inferred that user is a mysql table, since it is the child of a mysql database, sales, which is a database because it is the child of a mysql connection, which in turn is a child of the mysql root element. Dbtypes are the root elements in this tree, and all their descendants are inferred to be of that dbtype.
Here the hierarchypath table tells which hierarchy has to be followed in the element tree. For the user's comfort, they only have to insert a single '>'-separated string representing the hierarchy path, starting from the dbtype. The hierarchy view deconstructs this string into the hierarchy structure. One example of a hierarchy path would be: mysql>connection>database>table>field.
Note that, again, sqlite's rowid is used as the table id. Remember that rowid is hidden by default, so select * from table; does not show it; it has to be selected explicitly: select rowid,* from table;.
create table element(name, parentrowid, foreign key(parentrowid) references element(rowid));
-- dbtypes are the root elements
insert into element(name) values ('mysql'),('elasticsearch'),('mongo'),('sqlserver');
create table hierarchypath(path);
insert into hierarchypath values
('mysql>connection>database>table>field'),
('elasticsearch>connection>index>field'),
('mongo>connection>database>collection>field'),
('sqlserver>connection>database>schema>table>field');
Loading data:
insert into element select 'remote:1234',rowid from element where (name,coalesce(parentrowid,-1))=('mysql',-1); --returning rowid; -- returning only works for sqlite 3.35+
insert into element select 'sales',rowid from element where rowid=5;
insert into element select 'user',rowid from element where rowid=6;
insert into element select 'name',rowid from element where rowid=7;
insert into element select 'age',rowid from element where rowid=7;
Pretty-printing:
create view hierarchy(root, depth, name) as
with recursive hierarchycte(root, depth, name, remaining) as (
select substr(path, 0, instr(path, '>')) as root, 0 as depth, substr(path, 0, instr(path, '>')) as name, substr(path, instr(path, '>')+1)||'>' as remaining from hierarchypath
union all
select root, depth+1 as depth, substr(remaining, 0, instr(remaining, '>')) as name, substr(remaining, instr(remaining, '>')+1) as remaining from hierarchycte where instr(remaining, '>') > 0
)
select root, depth, name from hierarchycte where depth>=0;
create view elementhierarchy(root, depth, name) as
with recursive elementcte(root, depth, name, rowid, parentrowid) as (
select name as root, 0 as depth, name, rowid, parentrowid from element where parentrowid is null
union all
select elcte.root, elcte.depth+1, el.name, el.rowid, el.parentrowid from elementcte elcte join element el on el.parentrowid=elcte.rowid
order by depth desc
)
select root, depth, name from elementcte;
create view elementree as
with recursive elementcte(root, depth, name, rowid, parentrowid) as (
select name as root, 0 as depth, name, rowid, parentrowid from element where parentrowid is null
union all
select elcte.root, elcte.depth+1, el.name, el.rowid, el.parentrowid from elementcte elcte join element el on el.parentrowid=elcte.rowid
order by depth desc
)
select substr('                ',1,2*h.depth-2)||eh.name||' ('||h.root||'-'||h.name||')' from (select *,row_number() over () as originalorder from elementhierarchy) eh join hierarchy h on (eh.root,eh.depth)=(h.root,h.depth) where h.depth>0 order by originalorder;
select * from elementree;
-- remote:1234 (mysql-connection)
--   sales (mysql-database)
--     user (mysql-table)
--       age (mysql-field)
--       name (mysql-field)
Triggers were not implemented here, but it would be good to do so. One example would be to avoid inserting more levels than allowed.
It would be wiser to store the hierarchy in the deconstructed form seen in the hierarchy view, by doing the deconstruction at insertion time instead of in every select query, to save CPU. It was left this way here to differentiate it more from the other implementations.
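A minimal sketch of that insertion-time deconstruction, done on the application side with Python's sqlite3 module, could look as follows. The hierarchylevel table and the add_hierarchy_path helper are hypothetical names; the stored columns mirror the (root, depth, name) layout of the hierarchy view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Deconstructed storage: one row per level instead of one string per dbtype
conn.execute(
    "create table hierarchylevel(root, depth integer, name, primary key(root, depth))"
)

def add_hierarchy_path(conn, path):
    """Split a '>'-separated path once and store one row per level."""
    names = path.split(">")
    root = names[0]
    conn.executemany(
        "insert into hierarchylevel(root, depth, name) values (?, ?, ?)",
        [(root, depth, name) for depth, name in enumerate(names)],
    )

add_hierarchy_path(conn, "mysql>connection>database>table>field")
add_hierarchy_path(conn, "elasticsearch>connection>index>field")

levels = conn.execute(
    "select depth, name from hierarchylevel where root = 'mysql' order by depth"
).fetchall()
print(levels)
```

With the levels stored this way, elementree can join on hierarchylevel directly and the recursive string-splitting view is no longer needed.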
Here the last-level entity, the field, has no properties, unlike in the previous implementations. In this model it would be necessary to add one or two extra levels to the hierarchy: ...table>field>fieldpropertyandvalue or ...table>field>fieldproperty>fieldpropertyvalue. In the first case, an example of fieldpropertyandvalue would be datatype=integer; in the second, an example of a separate property and value would be datatype and integer respectively. This approach, where every property is a new node in the graph, is closer to the approach used by RDF stores.
To conclude, it would also be possible to use specialized graph databases with their own query languages, such as cypher in neo4j and sparql in others, but since the overall graph design is simple, a relational database suffices for our needs.