Does extensive use of ndb models affect performance? - google-app-engine

I'm new to GAE and I'm still trying to figure things out. We're developing an Android app which uses Cloud Datastore to store images, videos, text, audios, etc. So we have now over 15 types of content objects.
I've been modelling each type of object as a distinct ndb Model class, but I'm wondering if this kind of design could affect performance.
Specifically, wouldn't it be better to write a simple class (e.g. ContentObject) which simply had a content_type and a few generic fields such as string, number and blob?
I guess I'd go for the latter if I had to worry about creating/maintaining tables (or simply knowing that there are regular db tables behind the scenes).
I really like the first option, but I had to ask, just in case.

There are no performance differences to worry about between the 2 approaches.
With dedicated models you'll have to write a bit more code, since each model needs to be handled separately. But it's simpler code, especially if you eventually have some properties which only exist for some entities or are handled differently, which would require conditional logic with a generic model.
Building queries is also simpler with dedicated models if there are property differences. Using a single model may require filling in unused properties (maybe by using default values) if they are used for sorting/filtering query results: entities with missing properties aren't indexed by the respective properties, so they won't show up in the results.
On the other hand, you'll need separate queries for each model, since you can't obtain results for different kinds in the same query. You'll also need to maintain separate composite indexes for each kind (with a total limit of 200 such indexes per application).
If you're worried about code duplication, which could also be a reason to consider a shared model, it's also possible to combine the common properties in a single ndb model class, with a single common implementation for handling them, and inherit from that class in dedicated subclasses that handle the differences. Something like this:
from google.appengine.ext import ndb


class Content(ndb.Model):
    type = ndb.StringProperty()  # not really needed, cls._get_kind() can be used instead
    blob = ndb.StringProperty()
    # other generic/common content properties and related methods


class Video(Content):
    has_cc = ndb.BooleanProperty()
    # other video-specific content properties and related methods
But this is just an implementation approach. From the datastore perspective you're still using dedicated models: in the above example a video entity will have a Video kind, not a Content kind.
There are no tables in the datastore. The only things shared between entities of the same kind are their ndb model (which is specific to the more performant ndb client library; other client libraries don't have one) and the search index definitions.

Related

How do i model multiple photos (for a Hotel) with schema.org?

I am new to schema.org. Currently I am trying to use it as our internal data model for imports, as it offers a good "common ground" for all source systems.
The Hotel schema (https://schema.org/Hotel) offers a "photo" (singular) property, it inherits from Place. It used to have a "photos" (plural) property in the past.
When using schema.org for markup, this would not matter, as I can just mark up multiple elements as "photo".
However, when using it as a data class, how should I model it?
Should I just make it an array of Photograph?
If yes, does schema.org actually assume that ANY property may have multiple values (amenityFeature, availableLanguage, etc. look suspiciously like that)?
Does that mean I have to model every property as an array?
After some additional research I have to assume schema.org is not meant as a full data model. It is mostly about providing a common vocabulary and a hierarchy of information. Its primary use case seems to be markup, so type definitions are very vague, since they have to work on content that is actually meant to be presented to a user. So I will have to specify my own schema and let my decisions and my naming be guided by schema.org.
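If schema.org is used only as a naming guide for an internal schema, one possible sketch in Python makes the cardinality explicit where schema.org leaves it open. The class and field names below (Photograph, Hotel, photos, content_url, available_languages) are illustrative assumptions loosely borrowed from schema.org terms, not an official binding, and the single-versus-multiple decisions are our own.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Photograph:
    # Hypothetical internal type, named after schema.org/Photograph.
    content_url: str
    caption: Optional[str] = None


@dataclass
class Hotel:
    # Single-valued by our own decision; schema.org does not fix this.
    name: str
    # Multi-valued by our own decision: an import usually carries many photos.
    photos: List[Photograph] = field(default_factory=list)
    available_languages: List[str] = field(default_factory=list)


hotel = Hotel(name="Example Hotel")
hotel.photos.append(Photograph(content_url="https://example.com/lobby.jpg"))
hotel.photos.append(Photograph(content_url="https://example.com/room.jpg", caption="Room"))
```

Only the properties that can really repeat in the source systems need to be lists; the rest can stay scalar.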

Using a GraphQL service within a desktop application that "follows" MVVM and DDD

We have a WPF desktop application that uses the MVVM pattern and DDD (well, let's say that at least my model classes that store data are named after entities taken from the real world). The app uses several microservices through a REST API, and it worked perfectly, until we decided it was time to put a facade in front of the back end to unite all those microservices and fetch only the data we need for a particular screen.
BUT. The question is how to make them live together.
On the one hand, we have dynamically returned data from GraphQL. It means that, for example, if we have a list of people on one screen, we will request the id, name, surname and role of each person. On a different screen, for a dropdown of people, we will request the same data but without the role.
On the other hand, we have a class Person that has a static set of fields Name, Surname, Role and Id, which a person has in "real life".
If we use the same Person class with GraphQL, converting data from JSON to the Person model, both screens will work fine, but behind the scenes the screen that doesn't need Role won't request it from GraphQL. So we will have a situation where the model class Person has a Role field that is just empty, which I believe is a code smell. At least I don't feel such code would be easy to maintain: a developer needs to add some information to the screen, opens the model, sees that Role is there, binds the field to the screen and goes to drink coffee. And then, oops, the field is there but no data was assigned to it.
The two variants I have in mind are:
either not to use models and DDD and map data directly to the ViewModel (which personally feels like ruining everything we had before),
or to map that dynamic data to our existing models, so that different fields will be empty on different screens (for the same class Person, for example) because they were not requested.
Maybe somebody has already used such a combination. How do you use it, and what are the pros and cons?
It's a fairly common situation where you have a data layer that returns many columns but only some are used in a given view.
There is no absolute "best" solution independent of how much impact the full set of columns will have on performance, which might in turn be linked to things like caching.
You could write services that return subsets of data, so that you only use the necessary bandwidth. Sort of a CQRS pattern, but with maybe more models than just read + write.
Often this is unnecessary, and the bandwidth saved does not compensate for the complications introduced and the increased cost of maintenance.
What is often done is just to map from model to viewmodel (and back). The viewmodel that needs just 4 columns just has 4 properties and any more returned by the model are not copied. The viewmodel that needs 5 has 5 properties and they are copied from the model.
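The model-to-viewmodel mapping described above can be sketched as follows. This is a minimal illustration in Python rather than C#, and all names (Person, PersonListItemViewModel, PersonDropdownViewModel, from_model) are assumptions made for the example. Raising an error when a required field was not loaded is also an assumption, added to address the "bind the field and find it empty" worry from the question.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    # Domain model: the full set of fields a person has.
    id: int
    name: str
    surname: str
    role: Optional[str] = None  # None when the query did not request it


@dataclass
class PersonListItemViewModel:
    # View model for the screen that shows the role.
    id: int
    full_name: str
    role: str

    @classmethod
    def from_model(cls, person: Person) -> "PersonListItemViewModel":
        if person.role is None:
            # Fail loudly instead of silently binding an empty field.
            raise ValueError("role was not loaded for this Person")
        return cls(person.id, f"{person.name} {person.surname}", person.role)


@dataclass
class PersonDropdownViewModel:
    # The dropdown needs no role, so it has no such property.
    id: int
    full_name: str

    @classmethod
    def from_model(cls, person: Person) -> "PersonDropdownViewModel":
        return cls(person.id, f"{person.name} {person.surname}")


partial = Person(id=1, name="Ada", surname="Lovelace")  # role not requested
dropdown_item = PersonDropdownViewModel.from_model(partial)
```

A developer adding Role to the dropdown screen then has to add a property to its view model first, which makes the missing data visible at the mapping step instead of at runtime in the UI.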

How to model Data Transfer Objects for different front ends?

I've run into a recurring problem for which I haven't found any good examples or patterns.
I have one core service that performs all the heavy database operations and sends results to different front ends (HTML, Silverlight/Flash, web services, etc.).
One of the service operations is "GetDocuments", which provides a list of documents based on different filter criteria. If I only had one front end, I would package the result in a list of Document DTOs (data transfer objects) that just contain the data. However, different front ends need different amounts of "metadata". The simplest client just needs the document headline and a link reference. Other clients want a short text snippet of the document, another one also wants a thumbnail, and a third wants the name of the author. It's basically all up to the GUI implementation what needs to be displayed.
What's the best way to model this?
1. As a lot of different DTOs (Document, DocumentWithThumbnail, DocumentWithTextSnippet). Tends to become a lot of classes.
2. As one DTO containing all the data, where the client chooses what to display. Lots of unnecessary data sent.
3. As one DTO where certain fields are populated based on what the client requested. Tends to become a very large class that needs to be extended over time.
4. As one DTO with some kind of generic "Metadata" field containing the requested metadata.
Or are there other options?
Since I want a high-performance service, I need to think about both network load and caching strategies.
Does anyone have any good patterns or practices that might help me?
What I would do is give the front end the ability to request the metadata it wants (say getDocument( WITH_THUMBNAILS | WITH_TEXT_SNIPPET )).
The DTO is then built with only the requested information.
Adding all the possible metadata is, as you said, unacceptable.
I would definitely stay with one class defining all the possible methods (getTitle(), getThumbnail()), and if possible have it return a placeholder when the thumbnail was not requested, something like "Image not available".
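A rough sketch of that flag-based request, written in Python with the standard enum.Flag type. The names Include, DocumentDTO, get_documents and the in-memory _DOCUMENTS list are hypothetical stand-ins for the real service and its storage.

```python
from dataclasses import dataclass
from enum import Flag
from typing import List, Optional


class Include(Flag):
    # Bit flags the client combines with | to say what it wants.
    NONE = 0
    THUMBNAIL = 1
    TEXT_SNIPPET = 2
    AUTHOR = 4


@dataclass
class DocumentDTO:
    headline: str
    link: str
    thumbnail: Optional[bytes] = None
    text_snippet: Optional[str] = None
    author: Optional[str] = None

    def get_text_snippet(self) -> str:
        # Placeholder when the snippet was not requested.
        if self.text_snippet is None:
            return "Snippet not available"
        return self.text_snippet


# Stand-in for the real document store (made-up data).
_DOCUMENTS = [
    {"headline": "Q3 report", "link": "/docs/1", "thumbnail": b"\x89PNG",
     "text_snippet": "Revenue grew", "author": "J. Doe"},
]


def get_documents(include: Include = Include.NONE) -> List[DocumentDTO]:
    results = []
    for row in _DOCUMENTS:
        dto = DocumentDTO(headline=row["headline"], link=row["link"])
        # Copy only the metadata the caller asked for.
        if Include.THUMBNAIL in include:
            dto.thumbnail = row["thumbnail"]
        if Include.TEXT_SNIPPET in include:
            dto.text_snippet = row["text_snippet"]
        if Include.AUTHOR in include:
            dto.author = row["author"]
        results.append(dto)
    return results


docs = get_documents(Include.THUMBNAIL | Include.TEXT_SNIPPET)
```

The flag value can also be part of a cache key, so that responses built for the same combination of metadata are reused.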
If you want to model this like a pattern, take a look at the factory patterns.
Hope this helps you.
Is there any noticeable cost to creating a DTO that has all the data any of your views could need and using it everywhere? I would do that, especially since it insulates you from a requirement change down the line where one of the views has to incorporate data that one of the other views uses.
E.g. maybe your Silverlight/Flash view doesn't show the title itself because it's in the thumbnail now, but they decide they want to sort by it later.
To clarify, I do not necessarily think you need to pass down all of the data every time, but I think your DTO class should define all of it. Just don't fall into the pits of premature optimization or analysis paralysis. Do the simplest thing first, then justify added complexity. Throw it all in and profile it. If the performance is unacceptable, optimize and try again.

Django models generic modelling

Say, there is a Page that has many blocks associated with it. And each block needs custom rendering, saving and data.
From the code point of view, the simplest approach is to define a different class (hence, model) for each of these block types. Simplified as follows:
from django.db import models

class Page(models.Model):
    name = models.CharField(max_length=64)

class Block(models.Model):
    # Django 2.0+ also requires an on_delete argument, e.g. on_delete=models.CASCADE
    page = models.ForeignKey(Page)

    class Meta:
        abstract = True

class BlockType1(Block):
    other_data = models.CharField(max_length=32)

    def render(self):
        """Some "stuff" here """
        pass

class BlockType2(Block):
    other_data2 = models.CharField(max_length=32)

    def render(self):
        """Some "other stuff" here """
        pass
But then, even with this code, I can't do a query like page.block_set.all() to obtain all the different blocks, irrespective of the block type.
The reason is that each model defines a different table. Working around this with a linking model and generic foreign keys can solve the problem, but it still results in queries against multiple database tables per page.
What would be the right way to model it? Can generic foreign keys (or something else) be used in some way to store the data, preferably in the same database table, while still achieving inheritance paradigms?
Update:
My point was: how can I still get the OOP paradigms to work? Using a single method with so many ifs is not what I wanted to do.
The best solution, it seems to me, is to create a separate standard Python class (preferably in a separate blocks.py) that defines a save which saves the data and its "type" by instantiating the same model. Then create a template tag and a filter that call the render, save, and other methods based on the model's type.
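One way to picture the idea in the update, as a sketch only: plain Python classes register themselves under a type name, every block is stored as one row shape (a type plus a serialized payload), and the stored type selects the class whose render is called. Nothing here touches Django. Block, TextBlock, ImageBlock, to_row and from_row are hypothetical names, the dict returned by to_row stands in for a row of the single shared table, and HTML escaping is left out.

```python
import json


class Block:
    """Plain Python base class; subclasses register under a type name."""
    registry = {}
    type_name = None

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.type_name is not None:
            Block.registry[cls.type_name] = cls

    def __init__(self, data):
        self.data = data

    def to_row(self):
        # What would be saved into the single shared table.
        return {"block_type": self.type_name, "payload": json.dumps(self.data)}

    @staticmethod
    def from_row(row):
        # The stored type picks the class, so no chain of ifs is needed.
        cls = Block.registry[row["block_type"]]
        return cls(json.loads(row["payload"]))

    def render(self):
        raise NotImplementedError


class TextBlock(Block):
    type_name = "text"

    def render(self):
        return "<p>%s</p>" % self.data["text"]


class ImageBlock(Block):
    type_name = "image"

    def render(self):
        return '<img src="%s">' % self.data["src"]


rows = [TextBlock({"text": "Hello"}).to_row(), ImageBlock({"src": "a.png"}).to_row()]
html = [Block.from_row(row).render() for row in rows]
```

A template tag would then only need to call from_row on each stored row and output the result of render.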
Don't model the page in the database. Pages are a presentation thing.
First -- and foremost -- get the data right.
"And each block needs custom rendering, saving and data." Break this down: you have unique data. Ignore the "block" and "rendering" from a model perspective. Just define the data without regard to presentation.
Seriously. Just define the data in the model without any consideration of presentation or rendering or anything else. Get the data model right.
If you confuse the model and the presentation, you'll never get anything to work well. And if you do get it to work, you'll never be able to extend or reuse it.
Second -- only after the data model is right -- you can turn to presentation.
Your "blocks" may be done simply with HTML <div> tags and a style sheet. Try that first.
After all, the model works and is very simple. This is just HTML and CSS, separate from the model.
Your "blocks" may require custom template tags to create more complex, conditional HTML. Try that second.
Your "blocks" may -- in an extreme case -- be so complex that you have to write a specialized view function to transform several objects into HTML. This is very, very rare. You should not do this until you are sure that you can't do this with template tags.
Edit.
"query different external data sources"
"separate simple classes (not Models) that have a save method, that write to the same database table."
You have three completely different, unrelated, separate things.
Model. The persistent model. With the save() method. These do very, very little.
They have attributes and a few methods. No "query different external data sources". No "rendering in HTML".
External Data Sources. These are ordinary Python classes that acquire data.
These objects (1) get external data and (2) create Model objects. And nothing else. No "persistence". No "rendering in HTML".
Presentation. These are ordinary Django templates that present the Model objects. No external query. No persistence.
I just finished a prototype of a system that has this problem in spades: a base Product class and about 200 detail classes that vary wildly. There are many situations where we are doing general queries against Product, but then want to deal with the subclass-specific details during rendering. E.g. get all Products from Vendor X, but display them with slightly different templates for each group from a specific subclass.
I added hidden fields for a GenericForeignKey to the base class and it auto-fills the content_type & object_id of the child class at save() time. When we have a generic Product object we can say obj = prod.detail and then work directly with the subclass object. Took about 20 lines of code and it works great.
The one gotcha we ran into during testing was that manage.py dumpdata followed by manage.py loaddata kept throwing Integrity Errors. Turns out this is a well-known problem and a fix is expected in the 1.2 release. We work around it by using mysql commands to dump/reload the test dataset.

Model-View-ViewModel pattern violation of DRY?

I read this article today http://dotnetslackers.com/articles/silverlight/Silverlight-3-and-the-Data-Form-Control-part-I.aspx about the use of the MVVM pattern within a Silverlight app, where you have your domain entities and view-specific entities which are basically a subset of the real entity objects. Isn't this a clear violation of the DRY principle? And if so, how can you deal with it in a nice way?
Personally, I don't like what Dino's doing there and I wouldn't approach the problem the same way. I usually think of a VM as a filtered, grouped and sorted collection of Model classes. A VM to me is a direct mapping to the View, so I might create a NewOrderViewModel class that has multiple CollectionViews used by the View (maybe one CV for Customers and another CV for Products, probably both filtered). Creating an entirely new VM class for every class in the Model does violate DRY in my opinion. I would rather use derivation or partial classes to augment the Model where necessary, adding in View-specific (often calculated) properties. IMO .NET RIA Services is an excellent implementation of combining M and VM data, with the added bonus that it's usable on both the client and the server. Dino's a brilliant guy, but way to call him out on this one.
DRY is a principle, not a hard rule. You are a human and can differentiate.
E.g. If DRY really was a hard rule you would never assign the same value to two different variables. I guess in any non trivial program you would have more than one variable containing the value 0.
Generally speaking: DRY does usually not apply to data. Those view specific entities would probably only be data transfer objects without any noteworthy logic. Data may be duplicated for all kinds of reasons.
I think the answer really depends on what you feel should be in the ViewModel. For me the ViewModel represents the model of the screen currently being displayed.
So for something like a ViewCategoryViewModel, I don't have a duplication of the fields in Category. I expose a Category object as a property on the ViewModel (under say "SelectedCategory"), any other data the view needs to display and the Commands that screen can take.
There will always be some similarity between the domain model and the view model, but it all comes down to how you choose to create the ViewModel.
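As a small illustration of exposing the domain object instead of copying its fields, here is a sketch in Python rather than C#. The status_message field and the rename_to_upper command are made-up examples of "other data the view needs" and "commands that screen can take".

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Category:
    # Domain model; its fields are defined once, here.
    id: int
    name: str


@dataclass
class ViewCategoryViewModel:
    # The view model holds the Category itself rather than duplicating its fields.
    selected_category: Optional[Category] = None
    status_message: str = ""  # extra, view-only data
    commands: Dict[str, Callable[[], None]] = field(default_factory=dict)

    def __post_init__(self):
        # Hypothetical command the screen can trigger.
        self.commands["rename_to_upper"] = self._rename_to_upper

    def _rename_to_upper(self) -> None:
        if self.selected_category is not None:
            self.selected_category.name = self.selected_category.name.upper()
            self.status_message = "Renamed"


vm = ViewCategoryViewModel(selected_category=Category(id=1, name="books"))
vm.commands["rename_to_upper"]()
```

The view binds to vm.selected_category.name, so adding a field to Category requires no change to the view model.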
It's the same as with Data Transfer Objects (DTO).
The domain for those two object types is different, so it's not a violation of DRY.
Consider the following example:
class Customer
{
    public int Age;
}
And a corresponding view model:
class CustomerViewModel
{
    public string Age;
    // WPF validation code is going to be a bit more complicated:
    public bool IsValid()
    {
        return string.IsNullOrEmpty(Age) == false;
    }
}
Different domains, different property types, different objects.

Resources