Say I have entities that I want to store in Datomic. If the attributes are all known in advance, I just add them to my Datomic schema once and can then make use of them.
What if, in addition to known attributes, entities could have an arbitrary number of arbitrary keys mapping to arbitrary values? Of course I could just store that list in some "blob" attribute that I also add to the schema, but then I couldn't easily query those attributes.
The solution I've come up with is to define a key attribute and a value attribute in Datomic, each of type string, and treat each of those additional key/value entries as an entity in its own right, using the aforementioned attributes. Then I can connect all those key/value entities to the actual entity by means of a 1:n relation using the ref type.
That allows me to query. Is that the way to go or is there a better way?
I would be reluctant to lose the power of attribute definitions. Datomic attributes can be added at any time, and the limit is reasonably high (2^20), so it may be reasonable to model the dynamic keys and values as they come along, creating a new attribute for each.
I'm working on a transformation task in which I need to transform a property graph dataset into an RDF dataset. There are many n-ary relationships that need to be treated as a class, but I do not know how to assign a unique identifier to these relations. I tried to use the row index, but I have more than one file in this project, so that can't work. So I would like to know how you assign a unique identifier to relationships; if a URI is the solution, how do we do this in an OntoRefine mapping? Thank you for your answers.
There are several ways to address this:
Ideally, use some characteristic of the related entities to make a deterministic URL. E.g., if you're making a position (membership) node between a person and an org that involves a mandatory role and start date, you could use a URL like org/<org_id>/person/<person_id>/role/<role_id>/date/<date> (see the sketch after this list)
Use a blank node. In that case you don't need to worry about a URN
Use the row index if you prepend it with the table/file name (as a constant)
Use the GREL function random(). It doesn't produce a globally unique identifier, but if you ask for a large enough range, it'll be unique with a very high probability
Use a Jython function, as shown at How to create UUID in Openrefine based on the MD5 hash of the values
If you do your mapping using SPARQL, then use the builtin uuid() function
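For illustration only, here is a small Python sketch of the first and last ideas: building a deterministic identifier for an n-ary relation from the entities it connects, either as a readable path or as a name-based UUID. This is not OntoRefine or GREL code, and the base namespace and field names are invented.

# Hypothetical sketch: deterministic identifiers for a membership relation.
# The base namespace and the org/person/role/date fields are made up.
import uuid

BASE = "https://example.org/resource/"  # invented namespace

def membership_uri(org_id, person_id, role_id, start_date):
    # Path-style URI built from the related entities (the "deterministic URL" idea).
    return f"{BASE}org/{org_id}/person/{person_id}/role/{role_id}/date/{start_date}"

def membership_uuid_uri(org_id, person_id, role_id, start_date):
    # Alternative: hash the same key into a name-based UUID, stable across runs and files.
    key = f"{org_id}/{person_id}/{role_id}/{start_date}"
    return f"{BASE}membership/{uuid.uuid5(uuid.NAMESPACE_URL, key)}"

print(membership_uri("acme", "p42", "chair", "2020-01-01"))
print(membership_uuid_uri("acme", "p42", "chair", "2020-01-01"))

Either way, the identifier depends only on the data, so re-running the mapping over several files produces the same URI for the same relationship.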
I have a general question about relational database design. I have a large list of objects from a table. These objects are classified with three descriptor classes: descriptor1, descriptor2 and descriptor3. To describe an object, one always builds a triplet of these descriptors.
For example, assume that descriptor1 is colour, descriptor2 is size and descriptor3 is weight. For each object I would build a triplet which describes that object.
In my case I have thousands of entries for each descriptor, so I build a table for each descriptor. How can I now build a triplet and relate it to an object in the object table?
If there were only one such triplet per object, I could just store the three descriptor ids in each object as foreign keys, but let's assume that each object can have 0 or many such triplets.
I am using SQLAlchemy, but I am happy to do the coding myself; I am just looking for keywords to search for in the documentation, since so far I could not find much.
My solution would be to create another table with the three descriptor ids and an object id. Is that the way to go?
I could also store a string with triplets of descriptor ids in each object... but that seems to go very much against the principles of relational databases...
There's rarely a perfect design for all scenarios. What you've described would work well if you know that you'll never need another attribute and you'll always look up a row using all three attributes. It depends on your use case, but those are pretty limiting assumptions.
Adding more attributes, or looking up records by one or two attributes instead of all three, is when Lucas's suggestion of adding additional columns that can be indexed is more flexible. The ability to define an arbitrary set of columns within a non-clustered index is where the relational db tends to get a lot of its search performance/flexibility.
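For concreteness, here is a minimal SQLAlchemy sketch of the association-table approach from the question, with an index on the object foreign key and a composite index over the three descriptor columns. All class, table, and column names are illustrative, not taken from your schema.

# Minimal sketch, assuming one table per descriptor plus an association table.
from sqlalchemy import Column, ForeignKey, Index, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

class Descriptor1(Base):                 # e.g. colour
    __tablename__ = "descriptor1"
    id = Column(Integer, primary_key=True)
    value = Column(String, nullable=False)

class Descriptor2(Base):                 # e.g. size
    __tablename__ = "descriptor2"
    id = Column(Integer, primary_key=True)
    value = Column(String, nullable=False)

class Descriptor3(Base):                 # e.g. weight
    __tablename__ = "descriptor3"
    id = Column(Integer, primary_key=True)
    value = Column(String, nullable=False)

class MyObject(Base):
    __tablename__ = "object"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    # 0..many triplets per object, via the association table below
    triplets = relationship("ObjectTriplet", back_populates="object")

class ObjectTriplet(Base):
    # One row per (object, descriptor1, descriptor2, descriptor3) combination.
    __tablename__ = "object_triplet"
    id = Column(Integer, primary_key=True)
    object_id = Column(Integer, ForeignKey("object.id"), index=True, nullable=False)
    descriptor1_id = Column(Integer, ForeignKey("descriptor1.id"), nullable=False)
    descriptor2_id = Column(Integer, ForeignKey("descriptor2.id"), nullable=False)
    descriptor3_id = Column(Integer, ForeignKey("descriptor3.id"), nullable=False)
    object = relationship("MyObject", back_populates="triplets")
    __table_args__ = (
        Index("ix_triplet_descriptors", "descriptor1_id", "descriptor2_id", "descriptor3_id"),
    )

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

The keywords to look for in the SQLAlchemy documentation are "many-to-many relationship", "association table", and "association object".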
I have an application that requires a database containing a set of products where each product can have a set of tables. The end-user should be able to add new products and define new tables for a product. So each table has a set of columns that are specified by the user. The user can then fill the tables with rows of data. Each table belongs to exactly one product.
The end-user should also be able to view the tables as they were at a specific point in time (at a certain transaction).
How would I go about making a schema for this in Datomic so that querying it would be as efficient as possible?
I would go with 4 entity types: products, tables, columns, and rows.
The relationship between products and tables is best handled by a :table/product to-one ref attribute, but a :product/tables to-many component ref attribute could also work (the latter does not enforce the one-to-many relationship).
Likewise, I would use either a :column/table or :table/columns attribute. I would also have a :column/name string attribute and maybe a :column/type enumerated attribute.
The hardest part is to model rows.
One tempting solution is to just create an attribute per column - I actually think it's a bad idea; Datomic attributes are not intended for such dynamic use. In particular, schema attributes are stored in a cache on the Peer that's not meant to grow big. (I may be wrong about this, so it'd be nice if someone on the Datomic team could confirm.)
Instead, I would have a few dozen reusable :row/cell-0, :row/cell-1, :row/cell-2, etc. 'cell position' attributes that are shared across all tables. Each actual column would be mapped to a cell position at creation time by a to-one :column/position attribute.
If the rows can hold several data types, it's a bit more difficult: you'd basically have to make an attribute for each (type, position) pair.
Then each row basically consists of a :row/table attribute and the above cell position attributes.
Here's a Datalog query that would let you read the whole table:
[:find ?row ?column-name ?val
 :in $ ?table
 :where
 [?column :column/table ?table]        ; the columns of this table
 [?row :row/table ?table]              ; the rows of this table
 [?row ?pos ?val]                      ; read each cell via its position attribute
 [?column :column/position ?pos]       ; ... matching the column's position
 [?column :column/name ?column-name]]
Note that all of the above is only useful if you want to query the table with Datalog directly against your Datomic db. But it can also be completely fine to serialize your tables and store them as blobs - especially if they're small; later, you pull out the blob, deserialize it, and then you can query it with Datalog too. And if tables are too coarse for this use, maybe you can do it with rows.
I'm trying to figure out how to implement this relationship in ColdFusion. Also, if anyone knows the name for this kind of relationship, I'd be curious to know it.
I'm trying to create the brown table.
Recreating the table from the values is not the problem; the problem that I've been stuck on for a couple of days now is how to create an editing environment.
I'm thinking that I should have a table with all the Tenants and TenantValues (the TenantValues that match the TenantID I'm editing) and have the empty values as well (the green table).
Any other suggestions?
The name of this relationship is the Entity Attribute Value model (EAV). In your case Tenant, TenantVariable and TenantValues are the entity, attribute and value tables, respectively. EAV is an attempt to allow for the runtime definition of entities, and in my experience it is most often found backing content management systems. It has been referred to as an anti-pattern database model because you lose certain RDBMS advantages while gaining disadvantages, such as having to lock several tables on delete or save. Often a suitable persistence alternative is a NoSQL solution such as Couch.
As for edits, the paradigm I typically see is deleting all the value records for a given ID and re-inserting them inside a loop, and then updating the entity table record. Do this inside of a transaction to ensure consistency. The upshot of this approach is that it's much easier to figure out than a delta-detection algorithm. Another option is using the MERGE statement if your database supports it.
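The question is about ColdFusion, but the save pattern itself is plain SQL, so here is a rough sketch using Python's sqlite3 just to make the transaction explicit. The Name, VariableID and Value column names are guesses, not taken from your schema.

# Sketch of the "delete all value rows, re-insert, update the entity" edit pattern.
# Assumes Tenant and TenantValues tables exist; column names are illustrative.
import sqlite3

def save_tenant(conn, tenant_id, name, values):
    # values is a dict of {variable_id: value} coming from the edit form.
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute("UPDATE Tenant SET Name = ? WHERE TenantID = ?", (name, tenant_id))
        conn.execute("DELETE FROM TenantValues WHERE TenantID = ?", (tenant_id,))
        conn.executemany(
            "INSERT INTO TenantValues (TenantID, VariableID, Value) VALUES (?, ?, ?)",
            [(tenant_id, var_id, val) for var_id, val in values.items()],
        )

Rows whose value was left empty in the form can simply be omitted from the dict, which keeps the sparse EAV storage clean.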
You may want to consider an RDF Triple Store for this problem. It's an alternative to Relational DBs that's particularly good for sparse categorical data. The data is represented as triples - directed graph edges consisting of a subject, an object, and the predicate that describes the property connecting them:
(subject) (predicate) (object)
Some example triples from your data set would look something like:
<Apple> rdf:type <Red_Fruit>
<Apple> hasWeight "1"^^xsd:integer
RDF triple stores provide the SPARQL query language to retrieve data from your store much like you would use SQL.
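As a concrete illustration, here is roughly what that looks like using the rdflib library in Python; the namespace and property names are invented for the example.

# Hypothetical sketch: load the example triples and run a SPARQL query with rdflib.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

EX = Namespace("http://example.org/")  # invented namespace

g = Graph()
g.add((EX.Apple, RDF.type, EX.Red_Fruit))
g.add((EX.Apple, EX.hasWeight, Literal(1, datatype=XSD.integer)))

# "Find everything that is a Red_Fruit, together with its weight."
query = """
PREFIX ex: <http://example.org/>
SELECT ?fruit ?weight WHERE {
  ?fruit a ex:Red_Fruit ;
         ex:hasWeight ?weight .
}
"""
for fruit, weight in g.query(query):
    print(fruit, weight)

Objects with missing or extra descriptors simply have fewer or more triples; no schema change is needed.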
I'm developing an application with Google App Engine and stumbled across the following scenario, which can perhaps be described as "MVP-lite".
When modeling many-to-many relationships, the standard property to use is the ListProperty. Most likely, your list is comprised of the foreign keys of another model.
However, in most practical applications, you'll usually want at least one more detail when you get a list of keys - the object's name - so you can construct a nice hyperlink to that object. This requires looping through your list of keys and grabbing each object to use its "name" property.
Is this the best approach? Because "reads are cheap", is it okay to get each object even if I'm only using one property for now? Or should I use a special property like tipfy's JsonProperty to save a (key, name) "tuple" to avoid the extra gets?
Though datastore reads are comparatively cheaper than datastore writes, they can still add significant time to a request handler. Including the objects' names as well as their foreign keys sounds like a good use of denormalization (e.g., use two list properties to simulate a tuple - one contains the foreign keys and the other contains the corresponding names).
If you decide against this denormalization, then I suggest you batch fetch the entities which the foreign keys refer to (rather than getting them one by one) so that you can at least minimize the number of round trips you make to the datastore.
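A minimal sketch of both options, using the old google.appengine.ext.db API referenced in the question; the Project model and the member_keys / member_names / name property names are invented.

from google.appengine.ext import db

class Project(db.Model):
    # Denormalized "tuple": member_keys[i] corresponds to member_names[i].
    member_keys = db.ListProperty(db.Key)
    member_names = db.StringListProperty()

def member_links(project):
    # No extra datastore reads: the names were stored alongside the keys.
    return zip(project.member_keys, project.member_names)

def member_links_batch(project):
    # Without the denormalization: fetch all referenced entities in one batch call.
    entities = db.get(project.member_keys)
    # Assumes the referenced model has a 'name' property.
    return [(e.key(), e.name) for e in entities if e is not None]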
"When modeling one-to-many (or in some cases, many-to-many) relationships, the standard property to use is the ListProperty."
No, when modeling one-to-many relationships, the standard property to use is a ReferenceProperty, on the 'many' side. Then, you can use a query to retrieve all matching entities.
Returning to your original question: If you need more data, denormalize. Store a list of titles alongside the list of keys.
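A small sketch of the ReferenceProperty approach, again with the old db API and invented model names:

from google.appengine.ext import db

class Author(db.Model):
    name = db.StringProperty()

class Book(db.Model):
    # The reference lives on the 'many' side...
    author = db.ReferenceProperty(Author, collection_name="books")
    title = db.StringProperty()

def books_for(author):
    # ...and the 'one' side gets an auto-generated query, author.books,
    # equivalent to Book.all().filter("author =", author).
    return list(author.books)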