App Engine entity groups: grouping all of a user's data vs avoiding them as long as possible

App Engine entity groups: grouping all of a user's data vs avoiding them as long as possible - google-app-engine

I'm working on a web application that allows users to create simple websites and publish them on a static web host, and I have problems deciding if and how I should use ancestors in the gray area between necessary and avoidable.
The model is rather simple: currently every User has one or more Website entities. The Website entities store all the basic information about a website, plus it's nested navigation menu that refers to Page entities (the navigation tree is stored as a JSON property). The Page entity types are based on a PolyModel, and there are several page types that behave differently (There's a GalleryPage, for example).
There are no entity groups (or rather, no entities with ancestors) as of yet, and I'll only need a couple of transactions. When updating a Page's name, for example, I have to update it in the Page entity itself as well as in the navigation tree on the Website entity.
I think I understand how entity groups work and the basic implications of using them, but I have trouble deciding on the "best" way to structure my data in the absence of strong reasons for either approach. I could:
Go entirely without ancestors on my entities. As far as I understand I can still use cross-group transactions as long as I get the entities by key and don't need more than 5 within the transaction. The downside is that I'd depend on the XG transactions and there might come a point where I can't ninja my way around using ancestor queries anymore (and then it might be too late).
Make the user object the parent of all his Website's, Page's and other data. This would give the user a strongly consistent view of all of his data, allow me to use transactions whenever I add a feature that would need them, but limits the sustained writes to 1-5/sec. But, as a user will only ever be updating his own data, this might actually work and behave just the same for 1000 users as it will for 1.
Try to use even smaller entity groups (like seperating the navigation from the Website and making that the parent of the Website's Pages). But I'm not quite sure if there's much benefit to this, because most of the editing happens on Pages anyway.
So I guess the real question is: how do you decide when to use ancestor relationships on App Engine when there's no obvious reason for or against them? Would you go for the convenience of strongly consistent queries and being able to use transactions freely while adding features later, or would you avoid them at all costs until there's a very obvious reason for them, even if might limit my ability to do transactions later?
I read the related documentation, read the chapter on transactions in "Programming App Engine", looked at quite a few of the Google I/O videos, but I still find it hard to make that decision.

Related

User or User Profile model for app wide relationships

I recently read a tweet that suggested that if one wants to avoid headaches in the future of an app, they should have the user table have only authentication information and a user profile table for everything else. That is if you have bikes and peaches in the system they should be linked to the user that owns them via the user profile id. The tweet was not clear on what the consequences of using the user profile. Are there maintainability/scalability repercussions to not following this especially in a large web app?

Well, don't take it as a dogma, though it isn't completely worthless. Dependency is a problem: if you have to have a lot of different data that represent particular user, you'll change underlying database oftenly. In case everything is stored in a single column, you might find yourself doing repetative monkey job of "making it work" with your types/ORM and whatsnot gonna be involved in DB <-> RUNTIME communication.
It is all about splitting complicated task into smaller less complex subtasks: auth is self-standing - one of the most important - task itself and it definitely deserves some dedicated space. However, your app might be not that big, or not that concerned with users, and thus it won't be very helpful to split data into multiple columns. You must develop a deep sense of purpose and measure when it comes down to a software design.

Considering database options for new web application

I know this is a topic that's been addressed ad nauseam but I also know there are people who enjoy opining about databases so I figured I'd just go ahead and ask the question again.
I'm building out a web application that on a very basic level displays a list of objects that meet a user-defined search criteria. The primary function of the application will be to provide an interface by which a user can perform realtime faceted searches on a large number of object properties, including ranges of data, location data, and probably related data.
Of course there will be ancillary information too: user accounts, lookup tables etc.
My background is entirely in relational database development, primarily SQL Server with a little bit of MySQL. However, I'm intrigued by the possible applicability of an object-relational approach or even a full-on document database. Without the experience of working in those paradigms I'm not sure what I might be getting myself into.
Here are some further considerations that may affect the decision:
The schema will likely evolve considerably over time as more properties and search options are added, creating the typical versioning/deployment challenges. This is the primary reason why I would consider a document database.
The application itself will likely be written in Node/Express with an Angular or React front-end using Typescript and so the code will be interacting with data in json format. In other words, regardless of what comes back from the db server, we want json on the code level. (Another case for a doc database.)
There is the potential for a large amount of search parameters and a large amount of data, so indexing will be key and performance will be a huge potential gotcha. This would seem to me to be a strong case against a document db.
A potential use case would involve a user adjusting a slider control (let's say it controls high and low price parameters or a distance range). The selected parameters would then be packaged as a json object and sent to a search controller, which would then pass these parameters to the db server on change and expect a list of objects in return. In other words, the user would generally not be pushing a button to refine search criteria. The search update would happen each time they change a parameter.
I don't know the extent to which this a thing or not, but it would also be great if there were some way to leverage technology that could cache search results and then search within those results if the search were narrowed, thus performing a second search only on the smaller subset of the first search rather than the entire universe of available objects.
I guess while I'm at it I should ask about ORMs. Also something I'm generally not some experienced with (I've used Entity Framework a bit) but wondering if I should expand my horizons.
Thanks and I look forward to your opinions!

I think that your requirement of "There is the potential for a large amount of search parameters and a large amount of data, so indexing will be key and performance will be a huge potential gotcha" makes a strong case for using a relational database to store the data.
Leveraging an ORM that can support data in JSON format would seem to be ideal in your use case. Schema evolution for a system in production would sure be a challenge (not unsurmountable though) but it would be nice to use an ORM product that can at least easily support schema evolution during the development stage when things are more likely to change and evolve rapidly.
Given the kind of queries you would typically be issuing (e.g., adjusting a slider control), an ORM that supports prepared statements that can have range criteria would be more efficient.
Also, given your need to "perform realtime faceted searches on a large number of object properties, including ranges of data, location data, and probably related data", an ORM product that can easily support one-to-one, one-to-many, and many-to-many relationships and path-expressions in search criteria should simplify your development process.

Microservices unsuitable for business domain?

The business domain has five high-level bounded contexts
Customers
Applications
Documents
Decisions
Preforms
Further, these bounded contexts has sub-contexts like ordering and delivery of the documents. Despite the project of consisting ten of thousands of classes and dozens of EJB's, most of the business logic resides in relational database views and triggers for a reason: A lot of joins, unions and constraints involved in all business transactions. In other words, there is complex web of dependencies and constraints between the bounded contexts, which restricts the state transfers. In layman terms: the business rules are very complicated.
Now, if I were to split this monolith to database per service microservices architecture, bounded contexts being the suggested service boundaries, I will have to implement all the business logic with explicit API calls. I would end up with hundreds of API's implementing all these stupid little business rules. As the performance is main factor (we use a lot of effort to optimize the SQL as it is now), this is out of the question. Secondly, segregated API's would probably be nightmare to maintain in this web of ever evolving business rules, where as database triggers actually support the high cohesion and DRY mentality, enforcing the business rules transparently.
I came up with a conclusion microservice architecture being unsuitable for this type of document management system. Am I correct, or approaching the idea from wrong angle?

First of all, you don't have to have a Microservices architecture. I really mean it! If you were ordered by management/architect to do it, and it doesn't solve any real problems you are having, you are probably right for pushing back.
That being said, and with the disclaimer that I don't know the exact requirements of your application, having "things" as bounded context is a smell. So having "Customers", "Applications", "Documents", etc. as services is very likely the wrong approach.
Bounded contexts should not be CRUD operations on a specific entity. They should be completely independent (or as independent as possible) "vertical" parts of the whole application. Preferably with their own Database and GUI. They should also operate independently of each other, not requiring input from other services for own decisions.
It is the complete opposite of data-centric design, where tables/fields and relations are the core concepts. Here, functionality is the core concept. You would have to split your application along functionality to arrive at a good separation.
I could imagine a document management system having these idependent bounded contexts / services: Search, Workflow, Editing, etc.
Here is how you would think about it: Search does not require any (synchronous) input from any other service. It may receive regular, even near-time updates with new documents, but that does not impact it's main feature: searching already indexed documents. The GUI is also independent, something like one google-like page with a search box maybe. It can deliver results independently, and would link back to the Workflow or Editing apps when you click on a result.
The others would be similarly independent. Again, the point is to split the services in a way that makes them work independently. If you don't have that, you will only make things worse with Microservices.

First of all the above answer is correct in suggesting that you need to breaup your microservice in a better way.
Now If scalability is your concern(lots of api calls between microservice).
I strongly suggest you to validate that how many of the constraints are really required at the first level, and how many of them you could do in async way. With that what i mean is in distributed enviornment we actually do not need to validate all the things at the same time.
Sometimes these things are not directly visible , for eg: lets say there are two services order service and customer service and order service expose a api which say place a order for customer id. and business say you cannot place a order for a unknown customer
one implementation is from the order service you call the customer service in sync ---- in this case customer service down will impact your service, now lets question do we really need this.
Because a scenario could happen where customer just placed an order and somebody deleted that customer from customer service, now we have a order which dosen't belong to customer.Consistency cannot be guaranteed.
In the new sol. we are saying allow the order service to place the order without checking the customer id and do one of the following:
Using ProcessManager check the customer validity and update the status of the order as invalid and when customer get deleted using ProcessManager update the order status as invalid or perform business logic
Do not check at all , because placing a order dosen't count a thing, when this order will be in the process of dispatch that service will anyway check the customer status
In this way your API hits are reduced and better independent services are produced

Modelling a graph like entity relationship in app engine

I have a project tracking application. The app has the following entities:
project
story - belong to a particular project
user - belong to a particular project, assigned to a particular story
Each project can have multiple story and user entities as descendants. Each story can be parent to multiple user entities. Basically, every project has several users that can work on the various stories (tasks) within the project. Each story can have multiple users assigned to it as well. Something like below:
Now my question is, can i model such a relationship in the app engine datastore using ancestor queries without causing an index explosion? For example, i can find out stories within a project with a simple query. But to find out stories assigned to a specific user would require traversing the entire story index (which isn't really an issue due to query performance being independent of index table size), but would it not be better to have the query reflect a graph like relationship here? As if modeled using a graph database like neo4j?

If a user can work on multiple stories, or none at the moment, and/or can ever change the story they're working on (get assigned to a different story later), then modeling the story as "parent" to the user seems deeply wrong on a semantic level -- it may also cause performance issues (depending on kind of queries, frequency of reads and writes, etc, etc), but, that's quite a secondary worry, I'd focus on the semantic correctness first and I'm not entirely sure about the specific semantics of your data model.
A parent "relationship", in GAE's datastore, is intended to model a persistent (actually I'd say "permanent", in terms of the child entity's lifetime:-) 1:many connection -- especially one that may well require transactional behavior (or even just strong consistency) among parent and child, or among siblings (transactional behavior and strong consistency don't come for free, performance-wise -- but, when you need them, you really need them:-). How well does the connection between story and user in your app match this summary?
There are of course other ways you can model persistent 1:many connections; using ndb concepts, a StructuredProperty can in fact let you embed the "child" entity "inside" the "parent" one (and if you don't need queries on the child's attributes you can gain a speed boost by using the local kind of structured property).
And of course, the most general way to model any kind of relationship is with KeyProperty -- that doesn't require the relationship to be persistent/permanent, nor necessarily 1:many (e.g if a user can be assigned to multiple stories). In fact you can view key properties as edges in a directed graph where entities are the nodes, with full generality (indeed it can be a multi directed-graph, with 0+ edges from node A to node B, if you need even more generality than a "mere" graph can provide:-). But of course you can pay some price for such broad generality, if you don't really need it nor use it.
In the end, beyond complete clarity in the entity-relationship modeling (which is a good thing no matter what kind of db is underlying:-), the choice of "schema" (in the broadest sense of the word:-) for a NoSQL database is strongly dependent on what queries, updates, &c, the app will require, with what frequency, tolerance for latency, transactionality requisites, consistency (strong vs eventual), ... to a higher degree than for the relational databases that are what I think most of us "cut our teeth on". Thus I would encourage you to strive to make both aspects very explicit -- the E-R layer of abstraction, of course, but also the mix of queries, updates, &c, and the constraints and desiderata on them.

Why not construct UI based on DB schema?

People seem to avoid building user interfaces that pull their information (names, field types, etc. as well as relationships) from a database; they instead hard-code forms (and tables, etc.) that have pretty much the same data names and types and things.
Am I making sense?
For instance, imagine an emumerated field in MySQL: why not just have the UI construct a drop-down list whenever it encounters an ENUM? Why put the same values in both the database and the code?
Perhaps I'm just missing something; perhaps there are projects out there that do this — sort of super-crud interfaces that can be pointed at any database and from it build a fully-functional relationally-aware user interface. Are there?
I'm possibly not quite conforming to the stackoverflow norms with this question; I shall summarise:
Can you please tell me of a project that constructs its user interface (solely) from analysis of the database schema?
Why is this not a common way to do it — surely it is good to only define data structure in one place (i.e. the database)?
Thank you, and may joyous code-love rain upon your IDE.

I'd like to point out that, last time I checked, .NET and Qt (and probably other environments) make it possible to use "database-aware widgets" (sometimes shortened to just data-aware widgets), which is probably the best pragmatic solution available. What I mean by data-aware widgets is that the widgets themselves know that they're linked directly to database fields, so you would have a combobox that knows that it's backed by an enum and fetches the possible values directly from the database at runtime, just like you suggested.
This is a really neat utility, and used well, it probably won't hurt anything. It still requires that you spend some time laying out widgets manually on a form, but then if you update the database to add a new value to that enum, you don't have to rebuild your app to see it show up in the UI.
But the reason most usability experts will cringe when they hear your question is because programmers tend to think that, well, why not just generate the entire UI, form layout and everything, from the database? And this is where it starts to get really nasty, really fast.
Let's say you have a simple Person table, with first_name, last_name, email_address, street_address, city, state, zip, and phone_number. You want to automatically generate a UI based on these fields. How do you sort the fields? I mean, ideally, first name and last name should be right next to one another. And it would look very silly if you had city and state before street address. So you have to add a new column to the table to specify sort order, if you go with the quickest method, or a new table to specify each field's order index to their field ID.
What if you want to group parts of the information separately? Then you have to add more UI-specific cruft into your database layout (to do this generically, you'll need a new table specifying which UI fields belong to which UI groupboxes). So you've only solved two problems and already your database layout has gotten twice as ugly, plus now instead of a simple O(1) layout operation when you load the UI, you've gotta do several database queries to find out what fields exist and dynamically lay them out while applying the correct widget order... and we haven't even dealt with sizing (should every field be the maximum size to fit its possible contents, or should all text fields be the same width? Wouldn't it be nice if you could say that some text fields should be one width and height, and some should be another combination? etc), or text justification, or formatting, or any other really common elementary usability requirements that will require further sacrifices from the clarity and simplicity of your database schema.

Most visual database editors. phpMyAdmin for instance.
Because the database structure isn't always a very good logical structure for a user to be using, especially in the case of databases that have been denormalised on purpose for efficiency reasons.

Yup, this route has already been traveled.
Simply pointing at a database will create an oversimplified UI, not giving much more than the CRUD of an Access UI. That's why Naked Objects (I'm one of its committers) builds its metamodel from a pojo domain model. This allows the UI to expose any public methods as menus ... we call this "behaviourally complete".
Per the comment about the UI not being suitable for end-users, I have two points:
distinguish between power users vs casual users. Most internal apps are for the former (we use Alan Coopers' term of a "sovereign application" for this), who understand the domain and don't want fancy UI stuff getting in the way. Most external apps, eg public web sites, are for the latter.
for the latter, there's nothing to prevent the autogenerated UI of a tool like Naked Objects being replaced with a custom or semi-customized viewer. One such viewer is Scimpi, I'm also working on an Eclipse RCP viewer that'll expose extension points. But even here, the auto gen UI is still very valuable for the development team and business analysts for exploration and prototyping.
Hope some of the above has piqued your interest. If you want more, google around, or you might want to check out my book on domain-driven design and NO, at pragprag.com.
HTH
Dan

List of projects that implement this idea.
.NET
dotObjects
Naked Objects
TrueView
Java
Domain Object Explorer
JMatter
Naked Objects
Sanssouci
Trails
Lablz
C++
Typical Objects

Specifically to your second question: Alot of it really depends on your data model. Some are very complicated and would lead to un-intuitive user interfaces. Perhaps for simply CRUD based systems, having your UI be a front end to the database would be preferable. In that case, I think that some of these tools would be great. However, for some more complicated systems where some db data needs to be hidden from the users, it would be better if you UI didn't mirror the db schema.

Microsoft Access has used this model for years - the database and UI development are very closely tied. You can auto-generate a form directly from a table definition with smart defaults and search built in. The model works well for developing applications with few concurrent users such as custom applications for small businesses where the amount of data stored is small.
If you are scaling to larger relational DBs with a number of concurrent users, or large databases then reliability and performance become more important, and separately constructed UI and databases make more sense. When more users are involved they often have different requirements so decoupling the UI from the DB schema makes it more efficient to develop.

Just a note on Java "projects that implement this idea" - tynamo is the new version of Trails framework

There are many systems that build an interface for you to edit stuff directly from table information. End-user interfaces, however, must be tweaked a little bit. You may not want to reveal to the user every field in your table.
Frameworks that make good use of the MVC design pattern can let you do all kinds of things with your models, which are the preferred way to build new systems (rather than creating database tables directly).
To answer your questions specifically:
django allows you to construct forms (and a complete admin CMS) out of models.
It is a common thing to do.

Naked Objects is about one step removed from this. They base the UI on an object model, and then persist the object model.

I think you are forgetting to consider the user in your design process if you are thinking like that. Bad mistake. Users don't like it when the interface changes, they would especially not like it if it changed frequently as they then wouldn't know what to do. Further, if you generate your UI on the fly based on the database structure, then what order would the objects be in? UIs need to have objects in an order that makes sense to the users not the database designers.
Further in a well-designed database there are fields that are not meant for the users to see. Things like numeric keys, insert date, last updated etc. You don't want to automatically expose these to the users and you certainly don't want them to have the ability to mess with the data in such fields.
Finally, if you don't think about the functionality of the page, then you aren't doing your job. A UI needs to be more than just a list of fileds that can be edited. You need to have constraints on who can see what, checks of the data before inserting to the database, business rules that need to be applied. You can't just autogenerate a lot of this (and you shouldn't even if you could!). Design needs thought and care.
Now as to drop down lists, of course you can generate them from the database and not the code, in fact it is the better choice. Just make the query the source for your particular object, not a list generated in code.

You can do it with the help of this cool tool from a developer in Philippines, it is called COBALT. You can download it here.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight