I'm using the appengine datastore, and all of my entities have Long ids as their PrimaryKey. I use those ids to communicate with the client, since the full-fledged Keys take much more bandwidth to transmit.
Now, I want to form entity groups so that I can do complex operations within transactions, and it seems from http://code.google.com/appengine/docs/java/datastore/transactions.html#Entity_Groups that I need to use Keys or String encoded keys - the simple Longs are not an option.
I don't mind refactoring a little to use Keys, but I still want to avoid sending the behemoth things over the wire. How can I get a unique (per kind) Long identifier for an entity whose primary key is a Key?
You do not have to use names (strings). The KeyFactory and KeyFactory.Builder methods that take names also have counterparts that take ids (longs).
For transmission, you simply need the name or id part of a Key. Once you know the id or name, you can reconstruct the Key server side. If it is a child entity, you'll need to know both the parent and the child's names or ids.
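As a rough illustration of the idea above, here is a minimal plain-Python sketch. It does not use the App Engine API; `Key` and `rebuild_key` are hypothetical stand-ins that only model the fact that a key is a kind plus an id or name, optionally with a parent. The client sends just the numeric ids, and the server rebuilds the full key from them.

```python
from typing import NamedTuple, Optional, Union


class Key(NamedTuple):
    """Hypothetical model of a datastore key (not the real Key class)."""
    kind: str
    id_or_name: Union[int, str]
    parent: Optional["Key"] = None


def rebuild_key(kind, id_or_name, parent_kind=None, parent_id=None):
    """Rebuild a full key server side from the short values sent by the client."""
    parent = Key(parent_kind, parent_id) if parent_kind is not None else None
    return Key(kind, id_or_name, parent)


# Root entity: the client only needs to send 42.
order_key = rebuild_key("Order", 42)

# Child entity: the client sends the parent's id (42) and the child's id (7).
item_key = rebuild_key("LineItem", 7, parent_kind="Order", parent_id=42)
```

The kinds are known to the server for each request type, so only the longs travel over the wire.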
I have a question regarding specific data vault modelling.
I have a source table that captures call center CALL information like this:
CallId (business key)
Date
Call_alert
Call_acw
etc
The same source table also has a bunch of foreign keys in it, like this:
RouteID (on which line the call eventually ends)
ConnectionType (phone, email etc)
Via each foreign key it is possible to retrieve extra information about the key (which is not linked to the CALL).
My question is how to model these foreign keys in my model. Do I keep them as attributes in my satellite, or do I model them as links? Or is there another option I haven't thought about?
Thanks!!
I'll focus on one example you give (RouteID) but the discussion is probably the same for each.
The first thing to remember is that the aim in Data Vault is to model the business and its processes, not the system the data is stored in. Foreign keys may indicate something meaningful (a link between two hubs) or they may not (a product of normalisation within the database that you do not need to replicate).
The first step in your case is to think about what RouteID, and the data it links through to, means to the business. If route (or the line it represents) is a meaningful concept to the business in its own right, then it probably requires its own hub, satellites for the data relating to it, and a link table to join it to your call data.
On the other hand, the data might only have meaning as a categorisation of another hub (call, in your case), in which case think about de-normalising it into a satellite that connects to the call hub. Remember that you can have multiple satellites connected to one hub; there's nothing preventing you from having a call route satellite, a connection type satellite and so on.
You'll need to make this decision for each foreign key, and you will probably end up with different choices for each. The staff member receiving the call, for example, would almost certainly be a link to another hub, as you almost certainly have other data you want to link staff to. Something like the connection type you mention is unlikely to be meaningful in its own right, so it is more likely to form part of a satellite.
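To make the two choices concrete, here is a minimal Python sketch under stated assumptions: RouteID is treated as a business concept (its own hub plus a link), while ConnectionType is de-normalised into a call satellite. All class and function names are hypothetical, and the usual Data Vault housekeeping columns (hash keys, load dates, record source) are left out for brevity.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class HubCall:
    call_id: str  # business key of the call


@dataclass(frozen=True)
class HubRoute:
    route_id: str  # business key of the route


@dataclass(frozen=True)
class LinkCallRoute:
    call_id: str
    route_id: str


@dataclass(frozen=True)
class SatCallDetails:
    call_id: str
    call_date: date
    call_alert: int
    call_acw: int
    connection_type: str  # kept as a plain attribute, not a hub


def split_source_row(row):
    """Split one source row into hub, link and satellite records."""
    hub_call = HubCall(row["CallId"])
    hub_route = HubRoute(row["RouteID"])
    link = LinkCallRoute(row["CallId"], row["RouteID"])
    sat = SatCallDetails(
        row["CallId"], row["Date"], row["Call_alert"],
        row["Call_acw"], row["ConnectionType"],
    )
    return hub_call, hub_route, link, sat
```

If connection type later turns out to matter to the business in its own right, it would move out of the satellite into a hub and link of its own, following the same pattern as the route.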
I do not want to create an autogenerated key for my entities so I specify my own:
Entity employee = Entity.newBuilder().setKey(makeKey("Employee", "bobby"))
    .addProperty(makeProperty("fname", makeValue("fname").setIndexed(false)))
    .addProperty(makeProperty("lname", makeValue("lname").setIndexed(false)))
    .build();
CommitRequest request = CommitRequest.newBuilder()
    .setMode(CommitRequest.Mode.NON_TRANSACTIONAL)
    .setMutation(Mutation.newBuilder().addInsert(employee))
    .build();
datastore.commit(request);
When I check to see what the entity looks like it looks like this:
Why is this auto-generated key generated if I specified my own key (bobby)? It seems bobby was also created, but now I have bobby and this autogenerated key. What is the difference between the key and id/name?
You can't specify the whole key yourself; a key also contains information the datastore needs for its own operation. This note in the documentation gives you an idea:
Note: The URL-safe string looks cryptic, but it is not encrypted! It
can easily be decoded to recover the original entity's kind and
identifier:
key = Key(urlsafe=url_string)
kind_string = key.kind()
ident = key.id()
If you use such URL-safe keys, don't use sensitive data such as email
addresses as entity identifiers. (A possible solution would be to use
the MD5 hash of the sensitive data as the identifier. This stops third
parties, who can see the encrypted keys, from using them to harvest
email addresses, though it doesn't stop them from independently
generating their own hash of a known email address and using it to
check whether that address is present in the Datastore.)
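The hashing suggestion in the note can be sketched in a few lines of Python. This is only an illustration: `hashed_identifier` is a hypothetical helper, and trimming and lower-casing the address before hashing is an assumption made here, not something the documentation prescribes.

```python
import hashlib


def hashed_identifier(email: str) -> str:
    """Return the MD5 hex digest of an email, for use as an entity identifier."""
    # Assumption: normalise so the same address always maps to the same ID.
    normalized = email.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


ident = hashed_identifier("Bob@Example.com")
```

As the note warns, anyone who already knows an address can compute the same digest and check whether it exists, so this hides the address from casual harvesting only.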
What you can specify is the ID portion of the key, either as a number or as a string:
A key is a series of kind-ID pairs. You want to make sure each entity
has a key that is unique within its application and namespace. An
application can create an entity without specifying an ID; the
Datastore automatically generates a numeric ID. If an application
picks some IDs "by hand" and they're numeric and the application lets
the Datastore generate some IDs automatically, the Datastore might
choose some IDs that the application already used. To avoid this, the
application should "reserve" the range of numbers it will use to
choose IDs (or use string IDs to avoid this issue entirely).
This is the url-safe version of your key, suitable for use in links. Use KeyFactory.stringToKey to convert it to an actual key, and you'll see that it contains your string name.
What you create with makeKey("Employee", "bobby") is a key for an entity of kind (entity type) Employee with the name bobby. What you see as the Key in the datastore viewer is an encoded representation of exactly that.
Generally speaking a key always consists of
optional parent key (with entity type and name/id)
entity type
entity name/id
Maybe someone here can tell you how to decode the key into its components, but rest assured that you're doing everything right and the behavior is as expected.
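To show why an encoded key looks cryptic yet still carries the kind and the name, here is a toy Python sketch. It is not the real datastore encoding (which uses a different serialization); it just packs a list of kind and identifier pairs into JSON and base64. `encode_key` and `decode_key` are hypothetical helpers.

```python
import base64
import json


def encode_key(path):
    """Serialize a list of (kind, id_or_name) pairs to a URL-safe token."""
    raw = json.dumps(path).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_key(token):
    """Recover the (kind, id_or_name) pairs from a token made by encode_key."""
    raw = base64.urlsafe_b64decode(token.encode("ascii"))
    return [tuple(pair) for pair in json.loads(raw)]


path = [("Employee", "bobby")]
token = encode_key(path)      # opaque-looking string
decoded = decode_key(token)   # same pairs as in path
```

The token is an encoding, not an encryption: anyone holding it can recover "Employee" and "bobby".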
I have a question about why Google App Engine's Datastore uses both a key and an ID. Coming from a relational database background, I am comparing entities with rows, so why does storing an entity require a key (a long, automatically generated string) and an ID (which can be entered manually or generated automatically)? This seems like a big waste of space just to identify a record. Again, I am new to this type of database, so I may be missing something.
Key design is a critical part of efficient Datastore operations. The keys are what are stored in the built-in and custom indexes and when you are querying, you can ask to have only keys returned (in Python: keys_only=True). A keys-only query costs a fraction of a regular query, both in $$ and to a lesser extent in time, and has very low deserialization overhead.
So, if you have useful or interesting things stored in your key IDs, you can perform keys-only queries and get back lots of useful data quickly and very cheaply.
Note that this extends into parent keys and namespaces, which are all part of the key and therefore additional places you can "store" useful data and retrieve all of it with keys-only queries.
It's an important optimization to understand and a big part of our overall design.
Basically, the key is built from two pieces of information:
The entity type (in Objectify, it is the class of the object)
The id/name of the entity
So, for a given entity type, the key and the id are effectively equivalent.
If you do not specify the ID yourself, an ID is generated automatically and the key is built from that generated ID.
I want to be as efficient as possible and plan properly. Since read and write costs are important when using Google App Engine, I want to be sure to minimize those. I'm not understanding the "key" concept in the datastore. What I want to know is would it be more efficient to fetch an entity by its key, considering I know what it is, than by fetching by some kind of filter?
Say I have a model called User and a user has an array(list) of commentIds. Now I want to get all this user's comments. I have two options:
The user's array of commentIds is an array of keys, where each key is the key of a Comment entity. Since I have all the keys, I can just fetch all the comments by their keys.
The user's array of commentIds holds identifiers I make up myself, say auto-incrementing regular integers, and each comment in the datastore has a unique commentIntegerId. To get all the comments, I'd do a filtered fetch for all comments whose ID is in my array of ids.
Which implementation would be more efficient, and why?
Fetching by key is the fastest way to get an entity from the datastore, since it is the most direct operation and doesn't need to go through an index lookup.
Each time you create an entity (unless you specify a key_name), App Engine generates a unique (per parent entity) numeric id; you should use that as the id for your comments.
You should design a NoSQL database (= GAE Datastore) based on usage patterns:
If you need to get all of a user's comments at once and never need to get one or some of them based on some criteria (e.g. by querying them), then the most efficient way, in terms of speed and cost, would be to serialize all comments as a binary blob inside an entity (or save it to Blobstore).
But I guess this is not the case, as comments are usually tied to both users and posts, right? In that case the advice above would not be viable.
To answer your title question: a get by key is always faster than a query by property, because the query first goes through the index to satisfy the property condition, where it finds the key, and then does the get with that key.
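The difference between the two access paths can be modelled with plain dictionaries. This is only a conceptual sketch, not datastore code: `entities`, `index_by_comment_id` and both functions are hypothetical, and the point is simply that the query path performs an extra index step before doing the same get.

```python
# Primary storage: entities addressed directly by key.
entities = {
    ("Comment", 101): {"comment_integer_id": 1, "text": "first"},
    ("Comment", 102): {"comment_integer_id": 2, "text": "second"},
    ("Comment", 103): {"comment_integer_id": 3, "text": "third"},
}

# Property index: maps an indexed property value to the entity's key.
index_by_comment_id = {
    entity["comment_integer_id"]: key for key, entity in entities.items()
}


def get_by_keys(keys):
    """Option 1: one direct lookup per key."""
    return [entities[key] for key in keys]


def query_by_comment_ids(comment_ids):
    """Option 2: an index lookup to find each key, then the same get."""
    keys = [index_by_comment_id[cid] for cid in comment_ids]
    return get_by_keys(keys)


by_key = get_by_keys([("Comment", 101), ("Comment", 103)])
by_query = query_by_comment_ids([1, 3])
```

Both calls return the same comments; the second one simply does more work to get there.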
I'm developing an application with Google App Engine and stumbled across the following scenario, which can perhaps be described as "MVP-lite".
When modeling many-to-many relationships, the standard property to use is the ListProperty. Most likely, your list is comprised of the foreign keys of another model.
However, in most practical applications, you'll usually want at least one more detail when you get a list of keys - the object's name - so you can construct a nice hyperlink to that object. This requires looping through your list of keys and grabbing each object to use its "name" property.
Is this the best approach? Because "reads are cheap", is it okay to get each object even if I'm only using one property for now? Or should I use a special property like tipfy's JsonProperty to save a (key, name) "tuple" to avoid the extra gets?
Though datastore reads are cheaper than datastore writes, they can still add significant time to a request handler. Including the objects' names as well as their foreign keys sounds like a good use of denormalization (e.g., use two list properties to simulate a tuple: one contains the foreign keys and the other contains the corresponding names).
If you decide against this denormalization, then I suggest you batch fetch the entities which the foreign keys refer to (rather than getting them one by one) so that you can at least minimize the number of round trips you make to the datastore.
When modeling one-to-many (or in some
cases, many-to-many) relationships,
the standard property to use is the
ListProperty.
No, when modeling one-to-many relationships, the standard property to use is a ReferenceProperty, on the 'many' side. Then, you can use a query to retrieve all matching entities.
Returning to your original question: If you need more data, denormalize. Store a list of titles alongside the list of keys.
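A minimal sketch of that denormalization, in plain Python rather than datastore model classes: two parallel lists are kept in step, so links can be built without fetching the referenced objects. `Author`, `add_book`, `links` and the `/books/` URL pattern are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Author:
    # Two parallel lists simulate a list of (key, title) tuples.
    book_keys: list = field(default_factory=list)
    book_titles: list = field(default_factory=list)

    def add_book(self, key, title):
        """Append to both lists together so their positions stay aligned."""
        self.book_keys.append(key)
        self.book_titles.append(title)

    def links(self):
        """Build (title, url) pairs without any extra gets."""
        return [
            (title, "/books/%s" % key)
            for key, title in zip(self.book_keys, self.book_titles)
        ]


author = Author()
author.add_book(17, "Dune")
author.add_book(42, "Emma")
```

The cost of this design is that a renamed title must be updated in every list that copies it, which is the usual trade-off of denormalization.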