I'm trying to fetch a long list of entities, and those entities all refer to one of a few different related entities. It's explained in the comments, but basically many "items" reference to a few "Company"s. I don't want to have to make multiple queries for each key in unique_key (IE key.get()), so I thought the following would work but it's returning an empty list. Pray tell, what am I doing wrong? Or is there a better way to accomplish this relationship of many items referencing a few, while minimizing calls to the db (I'm new to AppEngine Datastore).
Notice, this is in Python, using the ndb library offered by app engine.
# "items" is a list of entities that have a property "parenty_company"
# parent_company is a string of the Company key
# I get a unique list of all Key strings and convert them to Keys
# I then query for where the Company Key is in my unique list
unique_keys = list(set([ndb.Key(Company, prop.parent_company) for prop in items]))
companies = Company.query(Company.key.IN(unique_keys)).fetch()
You definitely should use ndb.get_multi(unique_keys). It will fetch all keys asynchronously in a single batch.
Related
I'm making a simple CRUD application with MongoDB so I can learn more about it.
The application is a simple blog, I have a collection named "articles" which stores various documents, each one representing a post for my blog.
When I display the list of all blog posts, I can do a db.collection.find(), and list all of them.
But the question lies when I need to show a single post individually, when I need to query the collection for a single, specific document.
The logical solution would be to use a RDBMS and an auto increment feature, but MongoDB is NoSQL and does not have auto increment.
I'm using the auto generated _id field of the document which stores an ObjectId by default, which means that my url's look like this:
http://localhost/blog/article.php?_id=5d41f6e5fc1a2f3d80645185
I saw in the documentation that the ObjectId contains a unique identifier for the server, together with a timestamp and a counter, isn't exposing these things a security risk?
As a solution, I stumbled into UUID https://docs.mongodb.com/manual/reference/method/UUID/ which is an auto-generated unique ID, that doesn't expose timestamp and machine info in it. It seems like a logical solution to use this instead of the _id that contains my ObjectId for querying and finding a document.
So I can make my url's look like this:
http://localhost/blog/article.php?_id=23829651-26f7-4092-99d0-5be8658c966e
But still, should I keep the _id property? should I add another one called "id" that stores the UUID? should I even use UUID's at all?
Here's what I would consider before choosing an identifier:
Collision
Risk of collision is very low for both UUIDs and ObjectIDs. This has been discussed in detail in another question.
Nature
UUIDs are random whereas ObjectID values always increase over time. This makes ObjectIDs a bad choice for sharding.
Other uses
ObjectIDs have the creation timestamp as a part and can be used as a substitute of commonly used the createdAt field. A sort by ObjectIDs is a sort by creation time.
Insecure object references (OWASP)
Short def: An attacker cannot deduce the ID of another object if they have the ID of one object. You can read more about this here. Both UUIDs and ObjectIDs are not vulnerable to this.
Link to another question that discusses the security of ObjectIDs (thanks zbee).
Ease of use
Note: This is subjective
Using ObjectIds is a lot easier in the Mongo ecosystem. The existence of speical aggregation operators to deal with ObjectIDs + libraries add to it.
Portability
UUIDs are more portable than ObjectIDs. I do not know of any other system that uses ObjectIDs internally except for Mongo. Whereas there are other DBs such as Postgres that have a special data type for UUIDs + extensions for random generation etc.
I have a system whereby you can create documents. You select the document type to create and a form is displayed. Data is then added to the form, and the document can be generated. In Laravel things are done via Models. I am creating a new Model for each document but I don't think this is the best way. An example of my database :
So at the heart of it are projects. I create a new project; I can now create documents for this project. When I select project brief from a select box, a form is displayed whereby I can input :
Project roles
Project Data
Deliverables
Budget
It's three text fields and a standard input field. If I select reporting doc from the select menu, I have to input the data for this document (which is a couple of normal inputs, a couple of text fields, and a date). Although they are both documents, they expect different data (which is why I have created a Model for each document).
The problems: As seen in the diagram, I want to allow supporting documents to be uploaded alongside a document which is generated. I have a doc_upload table for this. So a document can have one or more doc_uploads.
Going back to the MVC structure, in my DocUpload model I can't say that DocUpload belongs to both ProjectBriefDoc and ProjectReportingDoc because it can only belong to one Model. So not only am I going to create a new model for every single document, I will have to create a new Upload model for each document as well. As more documents are added, I can see this becoming a nightmare to manage.
I am after a more generic Model which can handle different types of documents. My question relates to the different types of data I need to capture for each document, and how I can fit this into my design.
I have a design that can work, but I think it is a bad idea. I am looking for advice to improve this design, taking into account that each document requires different input, and each document will need to allow for file uploads.
You don't need to have a table/Model for each document type you'll create.
A more flexible approach would be to have a project_documents table, where you'll have a project_id and some data related to it, and then a doc_uploads related to the project_documents table.
This way a project can have as many documents your business will ever need and each document can have as many files as it needs.
You could try something like that:
If you still want to keep both tables, your doc_upload table in your example can have two foreign keys and two belongsTo() Laravel Model declarations without conflicts (it's not a marriage, it's an open relationship).
Or you could use Polymorphic Relations to do the same thing, but it's an anti-pattern of Database Design (because it'll not ensure data integrity on the database level).
For a good reference about Database Design, google for "Bill Karwin" and "SQL Antipatterns".
This guy has a very good Slideshare presentation and a book written about this topic - he used to be an active SO user as well.
ok.
I have a suggestion..you don't have to have such a tight coupling on the doc_upload references. You can treat this actually as a stand alone table in your model that is not pegged to a single entity.. You can still use the ORM to CRUD your way through and manage this table..
What I would do is keep the doc_upload table and use it for all up_load references for all documents no matter what table model the document resides in and have the following fields in the doc_upload table
documenttype (which can be the object name the target document object)
documentid_fk (this is now the generic key to a single row in the appropriate document type table(s)
So given a document in a given table.. (you can derive the documenttype based on the model object) and you know the id of the document itself because you just pulled it from the db context.. should be able to pull all related documents in the doc_upload table that match those two values.
You may be able to use reflection in your model to know what Entity (doc type ) you are in.. and the key is just the key.. so you should be able.
You will still have to create a new model Entity for each flavor of project document you wish to have.. but that may not be too difficult if the rate of change is small..
You should be able to write a minimum amount of code to e pull all related uploaded documents into your app..
You may use inheritance by zero-or-one relation in data model design.
IMO having an abstract entity(table) called project-document containing shared properties of all documents, will serve you.
project-brief and project-report and other types of documents will be children of project-document table, having a zero-or-one relation. primary key of project-document will be foreign key and primary key of the children.
Now having one-to-many relation between project-document and doc-upload will solve the problem.
I also suggest adding a unique constraint {project_id, doc_type} inside project-document for cardinal check (if necessary)
As other answers are sort of alluding to, you probably don't want to have a different Model for different documents, but rather a single Model for "document" with different views on it for your different processes. Laravel seems to have a good "templating" system for implementing views:
http://laravel.com/docs/5.1/blade
http://daylerees.com/codebright-blade/
Unowned one-to-many and many-to-many relationships are defined using Sets or Lists of Keys. Let's say I have an object of an article containing set of keys of labels of the article. So how do I fetch the labels themselves? The objects matching the keys? How do I iterate over them? Naturally I could iterate over the keys and fetch the objects separately, but is this the only way? I tried to find the answer everywhere but I wasn't successful. The documentation describes defining unowned relationships in detail, but is silent about making queries on them...
You mean you have what Google was originally calling an "unowned relation" ? (where you have a Set<Key>). Sadly that is not a relation at all IMHO, since there is no related object involved. With v2.x of their plugin you can have a real unowned relation (the example there is for 1-1, but you can equally have Set<B>) where everything looks as it should. I'd strongly advise you to use that (the keys are stored in the parent, so a single call is made to get the Set of related objects).
I want to be as efficient as possible and plan properly. Since read and write costs are important when using Google App Engine, I want to be sure to minimize those. I'm not understanding the "key" concept in the datastore. What I want to know is would it be more efficient to fetch an entity by its key, considering I know what it is, than by fetching by some kind of filter?
Say I have a model called User and a user has an array(list) of commentIds. Now I want to get all this user's comments. I have two options:
The user's array of commentId's is an array of keys, where each key is a key to a Comment entity. Since I have all the keys, I can just fetch all the comments by their keys.
The user's array of commentId's are custom made identifiers by me, in this case let's just say that they're auto-incrementing regular integers, and each comment in the datastore has a unique commentIntegerId. So now if I wanted to get all the comments, I'd do a filtered fetch based on all comments with ID that is in my array of ids.
Which implementation would be more efficient, and why?
Fetching by key is the fastest way to get an entity from the datastore since it the most direct operation and doesn't need to go thru index lookup.
Each time you create an entry (unless you specified key_name) the app engine will generate a unique (per parent entity) numeric id, you should use that as ids for your comments.
You should design a NoSql database (= GAE Datastore) based on usage patterns:
If you need to get all user's comments at once and never need to get one or some of them based on some criteria (e.g. query them), than the most efficient way, in terms of speed and cost would be to serialize all comments as a binary blob inside an entity (or save it to Blobstore).
But I guess this is not the case, as comments are usually tied to both users and to posts, right? In this case above advice would not be viable.
To answer you title question: get by key is always faster then query by a property, because query first goes through index to satisfy the property condition, where it gets the key, then it does the get with this key.
I'm developing an application with Google App Engine and stumbled across the following scenario, which can perhaps be described as "MVP-lite".
When modeling many-to-many relationships, the standard property to use is the ListProperty. Most likely, your list is comprised of the foreign keys of another model.
However, in most practical applications, you'll usually want at least one more detail when you get a list of keys - the object's name - so you can construct a nice hyperlink to that object. This requires looping through your list of keys and grabbing each object to use its "name" property.
Is this the best approach? Because "reads are cheap", is it okay to get each object even if I'm only using one property for now? Or should I use a special property like tipfy's JsonProperty to save a (key, name) "tuple" to avoid the extra gets?
Though datastore reads are comparatively cheaper datastore writes, they can still add significant time to request handler. Including the object's names as well as their foreign keys sounds like a good use of denormalization (e.g., use two list properties to simulate a tuple - one contains the foreign keys and the other contains the corresponding name).
If you decide against this denormalization, then I suggest you batch fetch the entities which the foreign keys refer to (rather than getting them one by one) so that you can at least minimize the number of round trips you make to the datastore.
When modeling one-to-many (or in some
cases, many-to-many) relationships,
the standard property to use is the
ListProperty.
No, when modeling one-to-many relationships, the standard property to use is a ReferenceProperty, on the 'many' side. Then, you can use a query to retrieve all matching entities.
Returning to your original question: If you need more data, denormalize. Store a list of titles alongside the list of keys.