What pattern applies to encapsulating "contextual" queries?

At the moment, my project at work has a very inefficient loop that suffers badly from the n + 1 problem (6n + 1, I think). Currently, a number of web services instantiate an object whose constructor builds a canonical representation of one of our ORM objects; call the ORM object Foo and the view object FooView. There are a number of places where a collection of Foo is built; each instance of Foo is passed to FooView and has its (pseudo-)foreign key fields looked up in another database to build a textual representation, so that, for example, we can return <fooColor>Blue</fooColor> rather than <fooColor>5</fooColor>. The sets of these properties (Colors, Shapes, and other similarly general properties) are relatively small and obviously should be pulled into memory.
There is also another, more complex query contributing to the 6n + 1 problem: a set of metadata fields. Each Foo has a Source. Each Source can have zero, one, or many metadata fields defined for its subset of Foos. Empty XML tags are required for the metadata fields that apply to a given Foo's Source. Currently, the four(!) ORM queries(!) used to build this XML are located inside the FooView constructor, meaning they get executed for each and every Foo.
My goal is as follows:
1. Query for general properties, like Colors, Shapes, etc., before anything else.
2. Run the query to generate the collection of Foo. Store the primary keys in a list.
3. Using the list of primary keys, run the heinous multi-join, raw SQL query to generate Foo.Metadata.
4. Call FooView, providing the collection of Foo along with a context object containing the items built in steps 1 and 3. FooView will provide the interleaving logic, using the context object rather than database lookups.
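A minimal Java sketch of steps 1 to 4, assuming Java 16+ (records, Stream.toList). Foo and FooView come from the question; FooContext, its two maps, and the XML shape are hypothetical simplifications (one lookup table for colors, and a list of metadata field names per Source). The point it illustrates is that FooView only reads from the context, so no query runs inside the per-Foo loop:

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the ORM entity: only the fields this sketch needs.
record Foo(long id, int colorId, long sourceId) {}

// Everything FooView needs, loaded once per request instead of once per Foo.
// colorNames corresponds to step 1 (small lookup tables pulled into memory);
// metadataFieldsBySource corresponds to step 3 (the batched metadata query).
record FooContext(Map<Integer, String> colorNames,
                  Map<Long, List<String>> metadataFieldsBySource) {}

final class FooView {
    private final String xml;

    FooView(Foo foo, FooContext ctx) {
        StringBuilder sb = new StringBuilder("<foo>");
        // A map lookup replaces the per-Foo database query for the color name.
        sb.append("<fooColor>")
          .append(ctx.colorNames().getOrDefault(foo.colorId(), ""))
          .append("</fooColor>");
        // Emit an empty tag for every metadata field defined for this Foo's Source.
        for (String field : ctx.metadataFieldsBySource().getOrDefault(foo.sourceId(), List.of())) {
            sb.append('<').append(field).append("/>");
        }
        xml = sb.append("</foo>").toString();
    }

    String xml() {
        return xml;
    }

    // Builds all views from one shared context: no queries inside the loop.
    static List<FooView> forAll(List<Foo> foos, FooContext ctx) {
        return foos.stream().map(f -> new FooView(f, ctx)).toList();
    }
}
```

Keeping the lookups out of the FooView constructor also means FooView can be exercised with a hand-built context and no database at all.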
Is this a sound practice? It will certainly solve some of the performance problems in generating the FooView, but where should this thing live? Should I call it FooHelper? FooContext? FooService? Is this a design pattern, or is there one I should be using to make this more logical?
Thanks!

Related

Referencing previously defined items in JSON-LD

I'm trying to wrap my head around defining JSON-LD correctly for my website. The bit I'm not sure about is how to reference previously defined JSON-LD items without having to copy and paste them.
I know that each item can be given an @id property, but how should I correctly utilize it (if I even can)?
For example, suppose I create an Organization item with an @id of https://example.com/#Organization.
When I need to reference that item again, is it correct to simply specify that @id again, nothing more?
Also, am I correct in assuming that I can do this even if the item isn't defined on the page where I'm referencing it?
In the case of the Organization item type, my understanding is that you should declare it only on the home page rather than on every page. So if the user is on the product page and I want to reference the organization, it isn't defined on that page but has been declared elsewhere.

You're correct that using the same @id in different places allows you to make statements about the same thing. In fact, the JSON-LD Flattening algorithm, which is used as part of Framing, consolidates these statements into a single node object.
JSON-LD is a format for Linked Data, and it is reasonable to say that statements made about the same resource in different locations (pages) can be merged; if you form a Knowledge Graph from information across multiple locations, this is effectively what you're doing. A Knowledge Graph will typically reduce the JSON-LD (or another equivalent syntactic representation) to RDF triples/quads, where each "page" effectively defines a graph, and these graphs can be combined into a larger Dataset. You can then query the dataset in different ways to retrieve that information, which can result in the separate statements being consolidated.
Most applications, however, will likely look for a complete definition of a resource in a single location. But for something like Organization, you could imagine separate Employee resources being published, with a relation such as :Employee :worksFor :Organization, so that the page for an Organization would not be expected to also list every employee of that organization; a more comprehensive Knowledge Graph built by merging all of those separate resources could be used to reconstruct that list.
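To make the consolidation idea concrete, here is a deliberately simplified Java sketch. It is not the JSON-LD Flattening algorithm (which also handles contexts, blank nodes, @graph, and lists); it assumes node objects have already been expanded into plain Map<String, Object> instances and simply merges every node that shares an @id. NodeMerger and mergeById are hypothetical names:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy illustration: merges node objects that share an "@id" into one node.
// Values for the same property are accumulated into a list, loosely mirroring
// how statements about one resource from many places end up on a single node.
final class NodeMerger {
    static Map<String, Map<String, List<Object>>> mergeById(List<Map<String, Object>> nodes) {
        Map<String, Map<String, List<Object>>> merged = new LinkedHashMap<>();
        for (Map<String, Object> node : nodes) {
            String id = (String) node.get("@id");
            Map<String, List<Object>> target = merged.computeIfAbsent(id, k -> new LinkedHashMap<>());
            for (Map.Entry<String, Object> e : node.entrySet()) {
                if (e.getKey().equals("@id")) continue; // the id is the merge key, not a property
                List<Object> values = target.computeIfAbsent(e.getKey(), k -> new ArrayList<>());
                if (!values.contains(e.getValue())) values.add(e.getValue()); // skip duplicate statements
            }
        }
        return merged;
    }
}
```

A reference-only node such as {"@id": "https://example.com/#Organization"} contributes nothing except the identity, which is exactly why repeating just the @id on another page is enough to point at the same resource.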

How to filter a parent entity using properties of child entity in datastore

I am using Google App Engine (Java) for my REST backend, Google Cloud Datastore as the database, and Objectify to access the database.
I want to create a unit entity to store Units, where a unit can be a component or an assembled unit. An assembled unit is made up of multiple components and also has some properties of its own. There can be multiple types of assembled units and multiple types of components.
The entity class would be something like:
public class UnitEntity {
    Long unitId;
    String serialNumber;
    String state;
    String unitType;
    // null for all assembled units; for a component that is part of an assembled unit,
    // the id of that assembled unit (added so all components of an assembled unit can be listed)
    Long parentId;
    UnitParameters unitParameters; // polymorphic: the concrete subclass depends on unitType
}
Here UnitParameters would be a polymorphic class containing properties specific to a unit type; that is, depending on the value of unitType, a different subclass of UnitParameters would be used.
Let's assume one of the components (say component1, that is, unitType=component1) has a property called modelNumber. This property would be stored in the unitParameters of all the entities where unitType=component1.
Now I want to be able to list units where unitType=assembledUnit1 and which have a child component1 whose modelNumber is 2.0.
(I can easily list units of type component1 whose modelNumber is 2.0, but I want to be able to get their parent entities as well.)
So basically I am trying to get parent entities by filtering on the properties of their children.
Is this possible with Datastore and Objectify? Is there any way to achieve this functionality?
Update (follow-up question based on the answer by @stickfigure):
If I go with Google Cloud SQL (which is based on MySQL) for my use case, how should I model my data?
I initially thought of having a table for each unitType. Let's say there are three unitTypes: assembledUnit1, assembledUnit2, and component1. Now if I want an API that lists the details of every unit, how can I achieve this with Cloud SQL?
This is something I could have done with Datastore, since all the entities were of the same "kind".
I can obviously have separate APIs to list all units of type assembledUnit1, assembledUnit2, etc., but how can I have a single API that lists all the units?
Also, with this approach, if someone calls GET /units/{unitId} on the REST API, I suppose I would have to look for the unitId in each of the tables, which doesn't seem right.
I suppose one way by which this could be solved is to just have one table called "Unit" whose columns would be a superset of the columns of all the unitTypes. However I don't think this is a good way of designing since there would be a lot of empty columns for each row and also the schema would have to be changed if a new unitType is added.

The datastore doesn’t do joins. So you’re left with two options, either 1) do the join yourself via fetching or 2) denormalize some of the child data into the parent and index it there. Which strategy works best will vary depending on the shape of your data and performance/cost considerations.
I should add that there is a third option: store an external index of some of your data in another kind of database, such as the Search API or an RDBMS.
This is not always a very satisfying answer - the ability to do joins and aggregations in an RDBMS is incredibly useful. If you have highly relational data with modest size/traffic/reliability requirements, you may want to use something like Cloud SQL instead.
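Option 1 (doing the join yourself) boils down to two round trips: query the children on their own indexed properties, collect the distinct parentId values, then batch-get those parents by id and filter them on their own properties. Below is a plain-Java sketch in which an in-memory list stands in for the datastore (an assumption; no Objectify or Datastore calls are made). The Unit record is a hypothetical simplification of UnitEntity with modelNumber flattened out of UnitParameters:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

// Simplified stand-in for UnitEntity; modelNumber is flattened out of UnitParameters.
record Unit(long unitId, String unitType, Long parentId, String modelNumber) {}

final class ClientSideJoin {
    // Finds parents of parentType that have at least one child of childType with the
    // given modelNumber. The in-memory list stands in for the datastore.
    static List<Unit> parentsWithChild(List<Unit> all, String parentType,
                                       String childType, String modelNumber) {
        // Step 1: "query" the children on their own properties.
        Set<Long> parentIds = all.stream()
                .filter(u -> u.unitType().equals(childType))
                .filter(u -> modelNumber.equals(u.modelNumber()))
                .map(Unit::parentId)
                .filter(Objects::nonNull) // loose components have no parent
                .collect(Collectors.toSet());

        // Step 2: batch-load the parents by id (a get by key, not a query)...
        Map<Long, Unit> byId = all.stream()
                .collect(Collectors.toMap(Unit::unitId, Function.identity()));
        // ...then apply the parent-side filter in application code.
        return parentIds.stream()
                .map(byId::get)
                .filter(Objects::nonNull)
                .filter(p -> p.unitType().equals(parentType))
                .sorted(Comparator.comparingLong(Unit::unitId))
                .toList();
    }
}
```

Denormalizing (option 2) would instead copy something like the children's model numbers into an indexed list property on the parent, so a single query on the parent suffices, at the cost of keeping those copies in sync on every write.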

Datastore efficiency, low level API

Every Cloud Datastore query computes its results using one or more indexes, which contain entity keys in a sequence specified by the index's properties and, optionally, the entity's ancestors. The indexes are updated incrementally to reflect any changes the application makes to its entities, so that the correct results of all queries are available with no further computation needed.
Generally, I would like to know whether
datastore.get(List<Key> listOfKeys);
is faster or slower than a query with the index file prepared (with the same results), such as
Query q = new Query("Kind").setFilter(someFilter); // the filter is optional
My current problem:
My data consists of Layers and Points. Points belong to only one unique layer and have unique ids within a layer. I could load the points in several ways:
1) Have points with a "layer name" property and query with a filter.
- Here I am not sure whether the datastore would have the results prepared, because the layer name changes dynamically.
2) Use only keys. The layer would have to store point ids.
KeyFactory.createKey("Layer", "layer name");
KeyFactory.createKey("Point", "layer name"+"x"+"point id");
3) Use queries without filters: I don't actually need the general kind "Point" and could be more specific: kind would be ("layer name"+"point id")
- What are the costs to creating more kinds? Could this be the fastest way?
Can you actually find out how the datastore works in detail?

"faster or slower than a query with the index file prepared (with the same results)"
Fundamentally a query and a get by key are not guaranteed to have the same results.
Queries are eventually consistent, while getting data by key is strongly consistent.
Your first challenge, before optimizing for speed, is probably ensuring that you're showing the correct data.
The docs do a good job of explaining eventual vs. strong consistency, and it sounds like you have the option of using an ancestor query, which can be strongly consistent. I would also strongly recommend against using the layer 'name', which is dynamic, as the entity name; this will cause you an excessive amount of grief.
Edit:
In the interests of being specifically helpful, one option for a working solution based on your description would be:
Give a unique id (probably a UUID) to each layer, and store the name as a property
Include the layer key as the parent key for each point entity
Use an ancestor query when fetching points for a layer (which is strongly consistent)
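A plain-Java sketch of that layout. It models keys as a hypothetical Key record with an optional parent and uses an in-memory list instead of the datastore, so the "ancestor query" here is simply a filter on the parent key. In the low-level Java API the corresponding calls would be KeyFactory.createKey(parentKey, kind, name) and Query.setAncestor(layerKey); the sketch does not call them:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical key model: an optional parent key (null for root entities), a kind, and a name.
record Key(Key parent, String kind, String name) {}

// The display name is an ordinary property, so renaming a layer never changes its key.
record Layer(Key key, String displayName) {}

record Point(Key key, double x, double y) {}

final class LayerStore {
    private final List<Point> points = new ArrayList<>();

    // Each layer gets a stable, UUID-based key.
    Layer createLayer(String displayName) {
        return new Layer(new Key(null, "Layer", UUID.randomUUID().toString()), displayName);
    }

    // The layer key is the parent of each point key, so point ids only need to be unique within a layer.
    Point addPoint(Layer layer, String pointId, double x, double y) {
        Point p = new Point(new Key(layer.key(), "Point", pointId), x, y);
        points.add(p);
        return p;
    }

    // Stand-in for an ancestor query. It matches direct children only;
    // a real Datastore ancestor query also matches deeper descendants.
    List<Point> pointsOf(Layer layer) {
        return points.stream().filter(p -> layer.key().equals(p.key().parent())).toList();
    }
}
```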
An alternative option is to store points as embedded entities and only have one entity for the whole layer - depends on what you're trying to achieve.

Database storage design of large amounts of heterogeneous data

Here is something I've wondered for quite some time, and have not seen a real (good) solution for yet. It's a problem I imagine many games having, and that I can't easily think of how to solve (well). Ideas are welcome, but since this is not a concrete problem, don't bother asking for more details - just make them up! (and explain what you made up).
Ok, so, many games have the concept of (inventory) items, and often there are hundreds of different kinds of items, often with widely varying data structures: some items are very simple ("a rock"), others can have insane complexity or data behind them ("a book", "a programmed computer chip", "a container with more items"), etc.
Now, programming something like that is easy - just have everything implement an interface, or maybe extend an abstract root item. Since objects in the programming world don't have to look the same on the inside as on the outside, there is really no issue with how much and what kind of private fields any type of item has.
But when it comes to database serialization (binary serialization is of course no problem), you are facing a dilemma: how would you represent that in, say, a typical SQL database?
Some attempts at a solution that I have seen, none of which I find satisfying:
1. Binary serialization of the items; the database just holds an ID and a blob.
   Pros: takes like 10 seconds to implement.
   Cons: sacrifices basically every database feature; hard to maintain; near impossible to refactor.
2. A table per item type.
   Pros: clean, flexible.
   Cons: with a wide variety come hundreds of tables, and every search for an item has to query them all, since SQL doesn't have the concept of a table/type 'reference'.
3. One table with a lot of fields that aren't used by every item.
   Pros: takes like 10 seconds to implement; still searchable.
   Cons: wastes space and performance; from the database alone it is hard to tell which fields are in use.
4. A few tables with a few 'base profiles' for storage, where similar items get thrown together and use the same fields for different data.
   Pros: I've got nothing.
   Cons: wastes space and performance; from the database alone it is hard to tell which fields are in use.
What ideas do you have? Have you seen another design that works better or worse?

It depends on whether you need to sort, filter, count, or analyze those attributes.
If you use EAV (entity-attribute-value), you will screw yourself nicely. Try doing reports on an EAV schema.
The best option is to use Table Inheritance:
PRODUCT
    id    PK
    type
    att1

PRODUCT_X
    id    PK, FK -> PRODUCT
    att2
    att3

PRODUCT_Y
    id    PK, FK -> PRODUCT
    att4
    att5
For attributes that you don't need to search, sort, or analyze, use a blob or XML.

I have two alternatives for you:
One table for the base type and supplemental tables for each “class” of specialized types.
In this schema, properties common to all “objects” are stored in one table, so you have a unique record for every object in the game. For special types like books, containers, usable items, etc, you have another table for each unique set of properties or relationships those items need. Every special type will therefore be represented by two records: the base object record and the supplemental record in a particular special type table.
PROS: You can use column-based features of your database like custom domains, checks, and xml processing; you can have simpler triggers on certain types; your queries differ exactly at the point of diverging concerns.
CONS: You need two inserts for many objects.
Use a “kind” enum field and a JSONB-like field for the special type data.
This is kind of like your #1 or #3, except with some database help. Postgres added JSONB, giving you an improvement over the old EAV pattern. Other databases have a similar complex field type. In this strategy you roll your own mini schema that you stash in the JSONB field. The kind field declares what you expect to find in that JSONB field.
PROS: You can extract special type data in your queries; you can add check constraints and have a simple schema to deal with; you can benefit from indexing even though your data is heterogeneous; your queries and inserts are simple.
CONS: Your data types within JSONB-like fields are pretty limited and you have to roll your own validation.
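A minimal Java sketch of that roll-your-own validation, assuming the JSONB column has already been parsed into a Map<String, Object> (the parsing step and any JSON library are left out). ItemKind and its per-kind field lists are hypothetical; the kind field tells the application which shape to expect, and database-side check constraints could back up some of the same rules:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical kinds: each declares the fields (and Java types) its JSON payload must carry.
enum ItemKind {
    ROCK(Map.of()),
    BOOK(Map.of("title", String.class, "pages", Integer.class)),
    CONTAINER(Map.of("capacity", Integer.class, "contents", List.class));

    private final Map<String, Class<?>> schema;

    ItemKind(Map<String, Class<?>> schema) {
        this.schema = schema;
    }

    // Application-side validation of a payload already parsed from the JSON column.
    // Returns human-readable problems, sorted so the result is stable; empty means valid.
    List<String> validate(Map<String, Object> payload) {
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, Class<?>> field : schema.entrySet()) {
            Object value = payload.get(field.getKey());
            if (value == null) {
                errors.add("missing " + field.getKey());
            } else if (!field.getValue().isInstance(value)) {
                errors.add("wrong type for " + field.getKey());
            }
        }
        Collections.sort(errors);
        return errors;
    }
}
```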

Yes, it is a pain to design database formats like this. I'm designing a notification system and ran into the same problem. My notification system is, however, less complex than yours: the data it holds is at most ids and usernames. My current solution is a mix of 1 and 3: I serialize the data that differs between notifications, and use columns for the two usernames (a notification may have one or two). I shy away from method 2 because I hate that design, but it's probably just me.
However, if you can afford it, I would suggest thinking outside the realm of RDBMSs: non-relational stores (especially key/value stores) may be a better fit for this data, especially if items differ a lot from one another.

I'm sure this has been asked here a million times before, but in addition to the options which you have discussed in your question, you can look at EAV schema which is very flexible, but which has its own sets of cons.
Another alternative is database systems which are not relational. There are object databases as well as various key/value stores and document databases.
Typically all these things break down to some extent when you need to query against the flexible attributes. This is kind of an intrinsic problem, however. Conceptually, what does it really mean to accurately query things that are unstructured?

First of all, do you actually need the concurrency, scalability and ACID transactions of a real database? Unless you are building an MMO, your game structures will likely fit in memory anyway, so you can search and otherwise manipulate them there directly. In a scenario like this, the "database" is just a store for serialized objects, and you can replace it with the file system.
If you conclude that you do (need a database), then the key is in figuring out what "atomicity" means from the perspective of the data management.
For example, if a game item has a bunch of attributes, but none of these attributes are manipulated individually at the database level (even though they could well be at the application level), then it can be considered as "atomic" from the data management perspective. OTOH, if the item needs to be searched on some of these attributes, then you'll need a good way to index them in the database, which typically means they'll have to be separate fields.
Once you have identified attributes that should be "visible" versus the attributes that should be "invisible" from the database perspective, serialize the latter to BLOBs (or whatever), then forget about them and concentrate on structuring the former.
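A minimal Java sketch of that split, assuming a hypothetical ItemRow in which id, kind, and weight are the "visible" attributes that would become real, indexable columns, while all remaining attributes go into one BLOB. Java's built-in object serialization is used only because it needs no dependencies; it shares the refactoring problems the question lists for binary serialization, so a stable format such as JSON may be a better real-world choice:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical row shape: "visible" attributes get their own columns; the rest is one opaque BLOB.
record ItemRow(long id, String kind, int weight, byte[] details) {}

final class ItemCodec {
    // Serializes the "invisible" attributes; the database never looks inside these bytes.
    static byte[] toBlob(Map<String, String> details) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new HashMap<>(details)); // copy into a HashMap, which is Serializable
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Restores the attribute map when the application actually needs it.
    @SuppressWarnings("unchecked")
    static Map<String, String> fromBlob(byte[] blob) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(blob))) {
            return (Map<String, String>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```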
That's where the fun starts and you'll probably need to use "all of the above" strategy for reasonable results.
BTW, some databases support "deep" indexes that can go into heterogeneous data structures. For example, take a look at Oracle's XMLIndex, though I doubt you'll use Oracle for a game.

You seem to be trying to solve this for a gaming context, so maybe you could consider a component-based approach.
I have to say that I personally haven't tried this yet, but I've been looking into it for a while and it seems to me something similar could be applied.
The idea would be that all the entities in your game would basically be a bag of components. These components can be Position, Energy or for your inventory case, Collectable, for example. Then, for this Collectable component you can add custom fields such as category, numItems, etc.
When you're going to render the inventory, you can simply query your entity system for items that have the Collectable component.
How can you save this into a DB? You can define the components independently in their own table and then for the entities (each in their own table as well) you would add a "Components" column which would hold an array of IDs referencing these components. These IDs would effectively be like foreign keys, though I'm aware that this is not exactly how you can model things in relational databases, but you get the idea.
Then, when you load the entities and their components at runtime, based on the component being loaded you can set the corresponding flag in their bag of components so that you know which components this entity has, and they'll then become queryable.
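A minimal Java sketch of the runtime side of that idea, with hypothetical Position and Collectable components. It shows only the "bag of components" and the query by component type; persisting it (component tables plus a list of component IDs per entity, as described above) is not shown:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical component types; an entity holds zero or more of them.
interface Component {}
record Position(int x, int y) implements Component {}
record Collectable(String category, int numItems) implements Component {}

final class GameEntity {
    final long id;
    // The "bag of components", keyed by component class.
    private final Map<Class<? extends Component>, Component> components = new HashMap<>();

    GameEntity(long id) {
        this.id = id;
    }

    GameEntity with(Component c) {
        components.put(c.getClass(), c);
        return this;
    }

    boolean has(Class<? extends Component> type) {
        return components.containsKey(type);
    }

    <T extends Component> T get(Class<T> type) {
        return type.cast(components.get(type));
    }
}

final class EntityQueries {
    // "Query the entity system" for every entity carrying a given component type.
    static List<GameEntity> withComponent(List<GameEntity> entities, Class<? extends Component> type) {
        List<GameEntity> result = new ArrayList<>();
        for (GameEntity e : entities) {
            if (e.has(type)) result.add(e);
        }
        return result;
    }
}
```

Rendering the inventory then amounts to EntityQueries.withComponent(allEntities, Collectable.class).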
Here's an interesting read about component-based entity systems.

How to query a particular object without its embedded objects or collections?

I have a class, let's say Blarkar. Blarkar has an embedded class kar. Sometimes when I query for an instance of Blarkar I want the complete object, but other times I don't need all its embedded objects and their embedded objects. How do I load an object without its embedded objects?

You can't. GAE loads an entity whole or not at all. Generally this is not a problem, and you shouldn't try to optimize unless you know you have a real issue. But if you do, you can split your entity into multiple parts, e.g. User and UserExtraStuff.
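A minimal Java sketch of the User / UserExtraStuff split, with in-memory maps standing in for the datastore (an assumption; this is not Objectify or Datastore code). The field names are hypothetical. Both entities share the same id, so the heavy part is fetched with a separate get only when it is needed:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// The small, frequently loaded part of the entity...
record User(long id, String name) {}

// ...and the rarely needed, heavy part, stored separately under the same id.
record UserExtraStuff(long userId, String biography, byte[] avatar) {}

final class UserRepository {
    private final Map<Long, User> users = new HashMap<>();
    private final Map<Long, UserExtraStuff> extras = new HashMap<>();

    void save(User user, UserExtraStuff extra) {
        users.put(user.id(), user);
        extras.put(extra.userId(), extra);
    }

    // Cheap load: only the small entity is read.
    Optional<User> load(long id) {
        return Optional.ofNullable(users.get(id));
    }

    // Second, explicit lookup, done only when the heavy data is actually needed.
    Optional<UserExtraStuff> loadExtras(long id) {
        return Optional.ofNullable(extras.get(id));
    }
}
```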
There is a special type of query called a projection query, but it is not likely to be useful here: it lets you select some data out of an index without doing a full entity lookup. It's only useful in limited types of inequality queries, and the data has to be in the index.
