Django - check existence of multiple objects individually - django-models

There are objects of type A and B; some have relations to each other, defined in the model ABRelation. We wish to check the existence of many relations individually, create any that are missing, and delete the ones that should no longer exist.
Thus there are two lists of ids, a_ids and b_ids, that need to be matched by position. It could also be a list of tuples (id_a, id_b), whatever is state-of-the-art in Django. Any pair of ids in that set must be created if non-existent. Furthermore, existing relations in the database that are not contained in the given set must be deleted.
How to do this most efficiently by processing bulks and not individual objects?
We tried to check existence using filter and Q objects, but exists() aggregates the results and returns a single boolean rather than a per-pair result.
result = ABRelation.objects.filter(
    Q(a_id__in=a_ids) &
    Q(b_id__in=b_ids)
).exists()
How can this be done? Is there a straight forward way of doing it?
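One possible approach (a sketch, not from the original thread; ABRelation and the a_id/b_id field names are taken from the question): load the existing pairs into a set, diff against the desired pairs, then bulk_create the missing ones and delete the stale ones in a single filtered query. Note that the Q(a_id__in=...) & Q(b_id__in=...) filter above matches any combination of the two id lists, not positional pairs, which is why the pairs have to be compared as tuples.

```python
def diff_pairs(desired, existing):
    """Split (a_id, b_id) pairs into those to create and those to delete."""
    desired, existing = set(desired), set(existing)
    return desired - existing, existing - desired

# Django usage (sketch; assumes the ABRelation model from the question):
#
#     from django.db.models import Q
#
#     existing = set(ABRelation.objects.values_list("a_id", "b_id"))
#     to_create, to_delete = diff_pairs(zip(a_ids, b_ids), existing)
#
#     ABRelation.objects.bulk_create(
#         ABRelation(a_id=a, b_id=b) for a, b in to_create
#     )
#     if to_delete:
#         cond = Q()
#         for a, b in to_delete:
#             cond |= Q(a_id=a, b_id=b)
#         ABRelation.objects.filter(cond).delete()
```

This touches the database three times (one read, one bulk insert, one delete) regardless of how many pairs are involved.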

Related

building complicated relationships in a relational database

I have a general question about relational database design. I have a large list of objects from a table. These objects are classified with three descriptor classes: descriptor1, descriptor2 and descriptor3. To describe one object, one always builds a triplet of these descriptors.
For example assume that descriptor1 is colour, descriptor2 is size and descriptor3 is weight. For each object I would build a triplet which describes that object.
In my case I have thousands of entries for each descriptor, so I build a table for each descriptor. How can I now build a triplet and relate it to an object in the object table?
If there were only one such triplet for each object, I could just store the three descriptor ids in each object as foreign keys, but let's assume that each object can have zero or many such triplets.
I am using sqlalchemy, but I am happy to do the coding myself, I am just looking for keywords to look for in the documentation, since so far I could not find much.
My solution would be to create another table with three descriptor ids and an object id. Is that the way to go?
I could also store a string with triplets of descriptor ids in each object... but that seems to go very much against the principles of relational databases...
There's rarely a perfect design for all scenarios. What you've described would work well, if you know that you'll never need another attribute and you'll always lookup that row by using all three attributes. It depends on your use case, but those are pretty limiting assumptions.
Adding more attributes, or looking up records by one or two attributes instead of all three, is when Lucas's suggestion of adding additional columns that can be indexed is more flexible. The ability to define an arbitrary set of columns within a non-clustered index is where the relational db gets a lot of its search performance/flexibility.
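The association-table design proposed in the question can be sketched in plain SQL (shown here via Python's sqlite3; all table and column names are illustrative). Each triplet becomes one row linking an object id to three descriptor ids, so an object can have zero or many triplets:

```python
import sqlite3

# Association-table sketch: one row per (object, descriptor1,
# descriptor2, descriptor3) triplet. Names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE object (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE descriptor1 (id INTEGER PRIMARY KEY, value TEXT);
CREATE TABLE descriptor2 (id INTEGER PRIMARY KEY, value TEXT);
CREATE TABLE descriptor3 (id INTEGER PRIMARY KEY, value TEXT);
CREATE TABLE object_triplet (
    object_id INTEGER REFERENCES object(id),
    d1_id INTEGER REFERENCES descriptor1(id),
    d2_id INTEGER REFERENCES descriptor2(id),
    d3_id INTEGER REFERENCES descriptor3(id),
    PRIMARY KEY (object_id, d1_id, d2_id, d3_id)
);
""")
conn.execute("INSERT INTO object VALUES (1, 'widget')")
conn.execute("INSERT INTO descriptor1 VALUES (1, 'red')")
conn.execute("INSERT INTO descriptor2 VALUES (1, 'large')")
conn.execute("INSERT INTO descriptor3 VALUES (1, 'heavy')")
conn.execute("INSERT INTO object_triplet VALUES (1, 1, 1, 1)")

# Recover an object's triplets by joining back to the descriptor tables.
rows = conn.execute(
    "SELECT d1.value, d2.value, d3.value "
    "FROM object_triplet t "
    "JOIN descriptor1 d1 ON d1.id = t.d1_id "
    "JOIN descriptor2 d2 ON d2.id = t.d2_id "
    "JOIN descriptor3 d3 ON d3.id = t.d3_id "
    "WHERE t.object_id = 1"
).fetchall()
```

In sqlalchemy terms, the keyword to look for is an "association table" (or "association object" if the triplet carries extra data of its own).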

How to interpret this database assignment?

I am working on an assignment concerning a simple database.
The instructions are given as:
Create a small database for "products",
give each product 3 or so attributes in a related table,
and then provide URL's for updating all aspects of those objects.
Create
Read
Update
Delete
List
Search
(For Products:)
Add Attribute
Remove Attribute
I am confused as to whether the attributes are supposed to be fixed categories that are the same for all products, so that deleting an attribute simply clears the cell, OR whether attributes are intended to be dynamically added, so that each product can have different categories of attributes.
In the second case, deleting an attribute would get rid of the entire category.
Your assignment is unclear and you should ask your instructor.
The straightforward interpretation of
Create a small database for "products", give each product 3 or so
attributes in a related table, and then provide URL's for updating all
aspects of those objects.
is that there are some "products" (i.e. product types rather than individual objects), each of which has its own current set of attributes and its own table recording the current "objects" of that type and the attribute values of each one.
You more or less have the choice of recording a table for each product type P with:
(object, a1, a2, ...) rows like "object object is a P product whose attribute A1 is a1 and attribute A2 is a2 and ..." with Add Attribute implemented by DDL
(object,attribute,value) rows like "object object is a P product whose attribute attribute is value" with Add Attribute implemented by DML
(The latter approach is called "EAV" and leads to obscure queries and foregoing most of the supportive functionality of a DBMS. I only mention it because, given the vagueness of the question, maybe it's wanted as a solution nevertheless.)
Re "attributes are intended to be dynamically added": Either of these allows each database state to have its own set of attributes for a product. Remember that there is both DDL and DML. Also, you seem to allow that "attribute" might mean "attribute value" but the former choice seems much more likely.
Re "deleting an attribute would get rid of the entire category": The assignment asks you to record "all aspects" of objects that are given product types, not of products. So it doesn't matter whether you ever "get rid of the entire category" (of a product).
(Clearly give and justify your interpretation of the assignment. If there is more than one interpretation that you think likely, do that for each.)
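The two choices above can be sketched in SQL (shown via Python's sqlite3; the product type "widget" and all attribute names are made up for illustration). The first option adds attributes with DDL, the second with plain inserts:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Option 1: one table per product type; Add Attribute = DDL.
conn.execute("CREATE TABLE widget (object TEXT PRIMARY KEY, colour TEXT)")
conn.execute("ALTER TABLE widget ADD COLUMN size TEXT")  # add an attribute
conn.execute("INSERT INTO widget VALUES ('w1', 'red', 'large')")

# Option 2: EAV; Add Attribute = DML (just insert a new row).
conn.execute("""CREATE TABLE widget_eav (
    object TEXT, attribute TEXT, value TEXT,
    PRIMARY KEY (object, attribute))""")
conn.execute("INSERT INTO widget_eav VALUES ('w1', 'colour', 'red')")
conn.execute("INSERT INTO widget_eav VALUES ('w1', 'size', 'large')")

rows = conn.execute(
    "SELECT attribute, value FROM widget_eav "
    "WHERE object = 'w1' ORDER BY attribute"
).fetchall()
```

In option 1 the DBMS knows and type-checks each attribute; in option 2 every value is a string in one column and the schema no longer documents what attributes exist.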

Storing arbitrary key/value entries alongside a datomic entity

Say I have entities that I want to store in datomic. If the attributes are all known in advance, I just add them to my datomic schema once and can then make use of them.
What if in addition to known attributes, entities could have an arbitrary number of arbitrary keys, mapping to arbitrary values. Of course I can just store that list in some "blob" attribute that I also add to the schema, but then I couldn't easily query those attributes.
The solution that I've come up with is to define a key and a value attribute in datomic, each of type string, and treat every one of those additional key/value entries as entities in their own right, using aforementioned attributes. Then I can connect all those key/value-entities to the actual entity by means of a 1:n relation using the ref type.
That allows me to query. Is that the way to go or is there a better way?
I would be reluctant to lose the power of attribute definitions. Datomic attributes can be added at any time, and the limit is reasonably high (2^20), so it may be reasonable to model the dynamic keys and values as they come along, creating a new attribute for each.

Appengine's Indexing order, cursors, and aggregation

I need to do some continuous aggregation on a data set. I am using app engines High Replication Datastore.
Let's say we have a simple object with a property that holds a string of the date when it was created. There are other fields associated with the object, but they're not important in this example.
Let's say I create and store some objects. Below is the date associated with each object. Each object is stored in the order below. These objects are created in separate transactions.
Obj1: 2012-11-11
Obj2: 2012-11-11
Obj3: 2012-11-12
Obj4: 2012-11-13
Obj5: 2012-11-14
The idea is to use a cursor to continually check for new indexed objects. Aggregation on the new indexed entities will be performed.
Here are the questions I have:
1) Are objects indexed in order? That is, is it possible for Obj4 to be indexed before Obj1, Obj2, and Obj3? This will be an issue if I use an ORDER BY query and a cursor to continue searching: some entities will not be found if there is a delay in indexing.
2) If no ORDER BY is specified, what order are entities returned in a query?
3) How would I go about checking for new indexed entities? As in, grab all entities, storing the cursor, then later on checking if any new entities were indexed since the last query?
Little less important, but food for thought
4) Are all fields indexed together? As in, if I have a date property and, let's say, a name property, will both properties appear to be indexed at the same time for a given object?
5) If multiple entities are written in the same transaction, are all entities in the transaction indexed at the same time?
6) If all entities belong to the same entity group, are all entities indexed at the same time?
Thanks for the responses.
1. All entities have default indexes for every property. If you use ORDER BY someProperty, you will get entities ordered by the values of that property. You are correct about index building: queries use indexes, and indexes are built asynchronously, meaning it is possible that a query will not find an entity immediately after it was added.
2. ORDER BY defaults to ASC, i.e. ascending order.
3. Add a created timestamp to your entity, then order by it and repeat the cursor. See Cursors and Data Updates.
4. Indexes are built after the put() operation returns. They are also built in parallel, meaning that when you query, some indexes may be built, some not. See Life of a Datastore Write. Note that if you want to force "apply" on an entity, you can issue a get() after put(), which forces the changes to be applied (= indexes written).
5. and 6. All entities touched in the same transaction must be in the same entity group (= have a common parent). The transaction isolation docs state that transactions can be unapplied, meaning that a query after put() will not find the new entities. Again, you can force an entity to be applied via a read or an ancestor query.

Can we have a Model with a lot of properties (say 30) while avoiding the exploding indexes pitfall?

I was thinking that maybe index.yaml could specify only certain indexes (not all the possible ones that GAE builds automatically for you).
If that's not a good idea, what is another way of dealing with storing a large number of properties, other than storing the extra properties as a serialized object in a blob property?
The new improved query planner should generate optimized index definitions.
Note that you can set a property as unindexed by using indexed=False in python or Entity.setUnindexedProperty in Java.
A few notes:
Exploding indexes happen when you have multiple properties that contain "multiple values", i.e. an entity with MULTIPLE list properties AND those properties are listed in a composite index. In this case an index entry is created for each combination of the list properties' values; in other words, the number of index entries created equals the product of the list properties' sizes. So a list property with 20 entries and another list property with 30 entries would, when BOTH are listed in index.yaml under one compound index, create 600 index entries.
Exploding indexes do not happen for simple (non-list) properties, or if there is only one list property in entity.
Exploding indexes also do not happen if you do not create a compound index in your index.yaml file listing at least two list properties in the same index.
If you have a lot of properties and you do not need to query on them, then you can simply put them in a list or two parallel lists (to simulate a map), or serialize them. The simplest would be two parallel lists: this is done automatically for you if you use objectify with embedded classes.
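The multiplication behind exploding indexes can be illustrated with a toy calculation (this is not datastore code; the property names are made up):

```python
from itertools import product

# A compound index over two list properties gets one entry per
# combination of their values.
tags = ["tag%d" % i for i in range(20)]        # list property, 20 values
regions = ["region%d" % i for i in range(30)]  # list property, 30 values

index_entries = list(product(tags, regions))
assert len(index_entries) == 20 * 30  # 600 entries for this one entity
```

A third list property in the same compound index would multiply the count again, which is why the combinations blow up so quickly.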
