Is it idiomatic to make a join entity? - datomic

By "join entity" I mean the Datomic equivalent of a SQL join table. Say I have a parent entity with a name attribute and a child entity with a name attribute. parent is in a many-to-many relationship with child, since each parent can have multiple children and each child can have two parents.
If I were using SQL, I'd create a join table family that includes foreign keys to parent and child; with Datomic, however, I have the option of giving either parent or child a reference attribute to the other with cardinality many. Is this the favored approach over creating a new entity? What if family has attributes that are associated more with the family as a whole than with individual parents/children; for example, a family priority number?

Datomic's flexible schema would allow you to model this either way. If you need to model attributes for the family itself, though, it makes more sense for family to be a reified entity.
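As a sketch, a reified family entity carries its own attributes alongside refs to its members. The attribute idents below are made up for illustration, not taken from the question:

```clojure
;; Hypothetical schema for a reified "family" entity.
;; All attribute names here are assumptions for illustration.
[{:db/ident       :family/parents
  :db/valueType   :db.type/ref
  :db/cardinality :db.cardinality/many}
 {:db/ident       :family/children
  :db/valueType   :db.type/ref
  :db/cardinality :db.cardinality/many}
 {:db/ident       :family/priority
  :db/valueType   :db.type/long
  :db/cardinality :db.cardinality/one}]
```

With this shape, a family-level fact like the priority number has a natural home, whereas with a plain cardinality-many ref on parent or child it would have nowhere to live.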

Related

Should database include transitive properties to avoid nullability?

Consider this simple setup.
In this model, the following restrictions apply:
A person is either a parent or a child.
Only one parent per child
A parent has a relationship to a public utility instance
A child has a transitive relationship to the utility because of the parent.
Now, the question is: should every child have the "City utility id" property set in the database?
Advantage:
You avoid nullability. It is said that databases build less effective indexes on nullable fields, because every person who is a child would have the same value (NULL) for this property.
Disadvantage:
Less clean, more bookkeeping on CUD operations. The field does not convey data that isn't already represented in the database.
EF supports both Table Per Hierarchy (TPH) and Table Per Type (TPT), so as far as the schema goes you have options. While both a Parent and a Child "are a" Person, they each have individual characteristics: only Parents have a City Utility assigned, and Children can only be associated with a Parent. (A parent cannot reference another parent as a child, and a child cannot reference another child as a parent.) Handling all of these scenarios within a single table is a TPH structure, which relies on implicit rules enforced by application code and results in a lot of nullable references and fields for data that applies to one subtype or the other.
Wherever possible I recommend using a TPT structure and making the relationships more explicit. This has the benefit of not relying solely on application code to ensure that relationships and "optionality vs. required" are enforced at a DB level.
This would have something like:
[Person]
PersonId [PK]
// other common fields that apply to ALL types of Person.
[Parent]
PersonId [PK, FK]
CityUtilityId [FK]
// other parent-specific fields.
[Child]
PersonId [PK, FK]
ParentPersonId [FK] (To Parent, not Person)
// other child-specific fields.
This way if a parent or child has required or optional fields, they can be put in their respective tables with the respective NULL-ability. The alternative is that the field would always be NULL-able and it's up to the application to ensure the required nature for one or the other is enforced. The DB would be free to get into a completely invalid state by mistake at any point without complaint.
There is still a lot of attraction out in the development community to minimize the number of tables which stems largely from the days when drive space was expensive and schema cost $$ so combining similar data into a single table might have made sense. Relationally though it still had significant drawbacks. With modern databases I'd argue it's always better to only combine what is effectively identical and use TPT for inheritance, or use composition.
An example of Composition would be something like an Order which has a status. That status might be Delivered, and there might be details we want to record against an order when it is delivered (signatures, etc.). These could be NULL-able fields on the Order table, but they only apply to Delivered orders and would be NULL in all other cases. Instead, have a table like OrderDeliveryDetails with a 1-to-1 relationship to Order, created when an order is delivered (and deleted/made inactive if an order changes from Delivered to another status for any reason).
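The TPT sketch above can be exercised directly: with real foreign keys, the database itself rejects a child row that points at a non-parent. The following is a minimal SQLite sketch of that layout (the CityUtility table is a stand-in added here so the FK has a target):

```python
import sqlite3

# Sketch of the TPT layout from the answer, in SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Person      (PersonId INTEGER PRIMARY KEY);
CREATE TABLE CityUtility (CityUtilityId INTEGER PRIMARY KEY);
CREATE TABLE Parent (
    PersonId      INTEGER PRIMARY KEY REFERENCES Person(PersonId),
    CityUtilityId INTEGER NOT NULL REFERENCES CityUtility(CityUtilityId)
);
CREATE TABLE Child (
    PersonId       INTEGER PRIMARY KEY REFERENCES Person(PersonId),
    ParentPersonId INTEGER NOT NULL REFERENCES Parent(PersonId)  -- to Parent, not Person
);
""")
conn.execute("INSERT INTO CityUtility VALUES (1)")
conn.execute("INSERT INTO Person VALUES (10)")
conn.execute("INSERT INTO Parent VALUES (10, 1)")   # parent must have a utility (NOT NULL)
conn.execute("INSERT INTO Person VALUES (20)")
conn.execute("INSERT INTO Child VALUES (20, 10)")   # child of parent 10

# A child pointing at another child is rejected by the DB itself:
err = None
try:
    conn.execute("INSERT INTO Person VALUES (30)")
    conn.execute("INSERT INTO Child VALUES (30, 20)")  # 20 is a Child, not a Parent
except sqlite3.IntegrityError as e:
    err = e
print("rejected:", err)
```

Note that the "required vs. optional" split falls out of the table layout: CityUtilityId is NOT NULL on Parent and simply absent from Child, rather than one always-nullable column on a combined table.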

Why are foreign keys stored in child items rather than parent items in a database

I feel like there is probably a sensible answer to this question. When I create child objects in code, the parent object stores a reference to the child. The children don't know about the parent unless there is a specific reason for them to store a reference.
With databases, the opposite is the norm, i.e. you create something that "has many" something-elses, and a reference to the parent is stored in each of the many child items.
So generally speaking if I am programming, I have a list of child items stored in the parent. If I am databasing, I have many child items each with a parent-reference, but the parents do not have references to the children.
How and why did this come to pass? Is it just a matter of arbitrary design decisions becoming the norm, or is there a performance or logic reason behind data stores doing it one way and code objects doing it another?
Not sure if that's quite the real reason, but here's my view.
The fundamental difference I find is that, in databases, each cell is designed to contain one, and only one, piece of data. The child can easily reference the parent through its PK, effectively forming an FK there. But how would the parent reference the children?
Remember that in a one-to-many relationship, each parent may have an arbitrary number of children, so what kind of column would the parent need to hold those references? Holding the PK of a child would be useless, since a column can hold only one (that would make it one-to-one instead). You can't simply put a list of PKs in a DB cell, except through hacks like a comma-separated string, but that would defeat the purpose of FKs and eliminate most benefits of an RDBMS. An intermediate table is a possible solution, but then you're just moving the problem elsewhere, as that table then becomes the child, and the parent still has no references to it.
In contrast, OOP languages have data structures designed to store multiple items in a single property: collections. With those, a single property holds an object containing an arbitrary number of child object references. This kind of structure is what relational databases lack to make such a reference possible. A child-to-parent reference (the many-to-one side) is also possible with a normal object reference.
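The contrast above can be shown in a few lines: in code the parent owns a collection, while in a table the reference must sit on the "many" side, and the parent's child list is recovered by a query rather than stored:

```python
# In code, the parent naturally owns a collection of children.
class Parent:
    def __init__(self):
        self.children = []  # one property holds arbitrarily many references

# In a relational table, a column holds exactly one value, so the reference
# lives on each child row instead (rows sketched here as plain dicts).
rows = [
    {"child_id": 1, "parent_id": 100},
    {"child_id": 2, "parent_id": 100},
    {"child_id": 3, "parent_id": 200},
]

# "parent.children" is then recovered by a query/filter, not stored anywhere:
children_of_100 = [r["child_id"] for r in rows if r["parent_id"] == 100]
print(children_of_100)  # [1, 2]
```

The point is that neither side "loses" information: the child-side FK plus an index gives the database the same navigation the in-memory collection gives the object model.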
The idea is to be able to add new things to the old without modifying them.
Suppose you have a client with names and addresses; then you add transactions. Transactions shouldn't modify the client. Then you add special orders for the client, and so on. You shouldn't have to modify the client table for any of those.
Code should work the same way. This is a core OOP principle: coupling/cohesion.

In Google App Engine Datastore, to what extent does using parent keys hurt performace?

I have two models which naturally exist in a parent-child relationship. IDs for the child are unique within the context of a single parent, but not necessarily globally, and whenever I want to query a specific child, I'll have the IDs for both parent and child available.
I can implement this two ways.
Make the datastore key name of each child entity be the string "<parent_id>,<child_id>", and do joins and splits to process the IDs.
Use parent keys.
Option 2 sounds like the obvious winner from a code perspective, but will it hurt performance on writes? If I never use transactions, is there still overhead for concurrent writes to different children of the same parent? Is the datastore smart enough to know that if I do two transactions in the same entity group which can't affect each other, they should both still apply? Or should parent keys be avoided if locking isn't necessary?
In terms of the datastore itself, parent/child relationships are conceptual only. That is, the actual entities are not joined in any way.
A key consists of a Parent Key, a Kind and Id. This is the only link between them.
Therefore, there isn't any real impact beyond the ability to do things transactionally. Similarly, siblings have no actual relationship, just a conceptual one.
For example, you can put an entity into the datastore referencing a parent which doesn't actually exist. That is entirely legitimate and oftentimes very useful.
So, the only difference between option 1 and option 2 is that with option 1 you have to do more heavy lifting and cannot take advantage of transactions or strongly consistent queries.
Edit: The points above do not mention the limitation of 1 write per entity group per second. So, to directly answer the original question: using parent keys limits throughput if you want to write to many entities sharing the same parent key within a second, outside of a single transaction.
In general, if you don't need two entities to be updated or read in the same transaction, they should not be in the same entity group, i.e. have similar roots in their key paths, as they would if one were a key-parent of the other. If they're in the same entity group, then concurrent updates to either entity will contend for the entire group, and some updates may need to be retried.
From your question, it sounds like "<parent_id>,<child_id>" is an appropriate key name for the child. If you need to access these IDs separately (such as to get all entities with a particular "<child_id>"), you can store them as indexed properties, and perform queries as needed.
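If you do go with option 1, the "heavy lifting" is just packing and splitting the composite key name. A minimal sketch (the helper names are illustrative, not part of any datastore API, and it assumes "," never appears in either ID):

```python
# Option 1 from the question: pack both IDs into one key name, split on read.
# Helper names are made up for illustration; assumes "," cannot occur in an ID.
def make_key_name(parent_id: str, child_id: str) -> str:
    return f"{parent_id},{child_id}"

def split_key_name(key_name: str) -> tuple:
    parent_id, child_id = key_name.split(",", 1)
    return parent_id, child_id

name = make_key_name("p42", "c7")
print(name)                  # p42,c7
print(split_key_name(name))  # ('p42', 'c7')
```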
As for transactions, you cannot do multiple concurrent writes:
https://developers.google.com/appengine/docs/java/datastore/transactions#Java_What_can_be_done_in_a_transaction

What is the correct definition of a Junction Object in Salesforce?

Recently I was puzzled by the Junction Object. I understand the basic idea, but I'd like a proper definition with an example so I can understand it more clearly.
My actual problem is this:
A junction object provides a many-to-many relationship, so it sits on the detail (child) side of two master-detail relationships at once. As you know, when we delete a master record, the detail records attached to it are also deleted. In our case the junction object is a child of two different masters. So if we delete a record on either side, its junction records are removed too; won't that removal then propagate through the junction records and delete the records attached on the other side as well, so that both sides end up deleted? I'm confused about how the junction object's functionality actually works here.
It's a many-to-many relationship.
For example, you have an object Position and an object Candidate.
Each candidate can be related to many positions, and each position can be related to many candidates.
For this purpose, create a custom object (the "junction object") with two master-detail relationship fields, one linking to Position and one to Candidate; this junction object represents the relationship.
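To address the deletion worry in the question: deleting one master removes only the junction records, and that removal does not cascade onward to the other master. The same behavior can be sketched as a SQL junction table with ON DELETE CASCADE (the JobApplication name and sample data are made up for illustration):

```python
import sqlite3

# A junction object behaves like a junction table whose rows are deleted
# when either master record is deleted.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Position  (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Candidate (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE JobApplication (          -- the "junction object"
    PositionId  INTEGER NOT NULL REFERENCES Position(Id)  ON DELETE CASCADE,
    CandidateId INTEGER NOT NULL REFERENCES Candidate(Id) ON DELETE CASCADE,
    PRIMARY KEY (PositionId, CandidateId)
);
INSERT INTO Position  VALUES (1, 'Engineer'), (2, 'Designer');
INSERT INTO Candidate VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO JobApplication VALUES (1, 1), (1, 2), (2, 1);
""")

# Deleting a master deletes its junction rows only, not the other master:
conn.execute("DELETE FROM Position WHERE Id = 1")
remaining = conn.execute(
    "SELECT PositionId, CandidateId FROM JobApplication").fetchall()
print(remaining)                                                     # [(2, 1)]
print(conn.execute("SELECT COUNT(*) FROM Candidate").fetchone()[0])  # 2
```

The cascade stops at the junction rows: both candidates survive the deletion of the position, which is why a two-sided master-detail setup doesn't wipe out both sides.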

is entry.key.id unique across all entries of a model?

I have a model. When I list the ids of all entries of the model, some of the ids are the same. When I created the entries I didn't define ids for them; they were assigned automatically. Some of them may have different parents.
So is entry.key.id unique across all entries of a model, or does it depend on their parent?
No, it depends on the parent. The path - that is, parent kind, parent ID, child kind, child ID - is unique, but child ID is reused (although not deterministically) across different entity groups.
FYI: Google App Engine Datastore Documentation
http://code.google.com/appengine/docs/python/datastore/entities.html#Kinds_IDs_and_Names
http://code.google.com/appengine/docs/python/datastore/
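The answer above can be sketched by modeling a key as its full path of (kind, id) pairs: the trailing numeric ID can repeat, but the paths stay distinct. Kind names and IDs below are placeholders for illustration:

```python
# A datastore key is effectively its full path of (kind, id) pairs.
# Kind names and numeric IDs here are made up for illustration.
key_a = (("Parent", 1), ("Entry", 5001))
key_b = (("Parent", 2), ("Entry", 5001))  # same child ID, different parent

same_id = key_a[-1][1] == key_b[-1][1]
same_key = key_a == key_b
print(same_id, same_key)  # True False
```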
