Google Appengine: Odd get_by_key_name behavior - google-app-engine

UPDATE: After further testing, it seems this issue affects all child entities in my entity group. The root parent for all these different instances is a User kind of my own creation, not the built-in User kind. After removing parent=user from the constructor of the child kind, get_by_key_name works as expected. However, I would like to be able to use the entity group functionality along with the defined keys, if that is possible.
--
Hi,
I am attempting to use defined key names for speedier querying in my GAE project.
However, I have run into an odd issue where I cannot fetch the entity by its key name. This code does not seem to work:
for l in Login.all():
    print Login.get_by_key_name(l.key().name())
Some notes:
I have only tested in the SDK
l.key().name() returns the key name string listed with the entity when I look in the datastore. I can copy and paste the string out of the datastore viewer and use that as the argument to get_by_key_name(), and that does not work either.
Key names for the Login kind are all prefixed with an "l" (i.e. lowercase "L") and are otherwise all lowercase; they may contain underscores or dashes but are under 500 bytes.
Key-name lookups like this on other kinds work.
The key is an interpolation of two properties of the Login kind, and I can fetch the objects just fine using regular .filter() methods.
The "parent" for the instances is a User class. (mentioning in case this has some bearing on the way I have to fetch)
So I have to ask, is there any obvious reasons why this would not work? Any known issues with key name searches using the SDK?

Your second comment is correct, AFAIK. The parent/child relationship is similar to a directory or folder structure in your filesystem. Your key is (conceptually) /parents/[parent_keyname]/logins/[login_keyname]. So if you try to fetch /logins/[login_keyname], you will not get your entity. (There is no rule that all Logins must be children of Parents, so get_by_key_name() must be told of the parent relationship every time, e.g. Login.get_by_key_name(key_name, parent=user).)
In my own code, I have ended up building my keys myself with Key.from_path(). I use class methods, e.g. Login.key_for_name(some_parent, some_name) and also Login.get_by_key_name_for_parent(some_parent, some_name) (well, my method name is shorter, but this makes it clear). Then at least it is not possible for me to generate a key with the wrong parent/child relationship.
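To make the folder analogy concrete, here is a toy, in-memory model in plain Python (no App Engine SDK) where every entity is stored under its full ancestor path, the way Key.from_path() builds keys. ToyStore, key_for_name and get_by_key_name_for_parent are hypothetical stand-ins for the class methods described above, not SDK APIs.

```python
# Toy model only: a "datastore" where entities live under their full key path,
# e.g. ("User", "bob", "Login", "lbob_site").
class ToyStore:
    def __init__(self):
        self._entities = {}

    def put(self, path, value):
        self._entities[tuple(path)] = value

    def get(self, path):
        # Lookup uses the complete path; a path without the parent is a different key.
        return self._entities.get(tuple(path))


def key_for_name(parent_path, name, kind="Login"):
    """Hypothetical helper: always build a child key from its parent's path."""
    return tuple(parent_path) + (kind, name)


def get_by_key_name_for_parent(store, parent_path, name, kind="Login"):
    """Hypothetical helper: fetch a child by key name without forgetting the parent."""
    return store.get(key_for_name(parent_path, name, kind))


store = ToyStore()
user_path = ("User", "bob")
store.put(key_for_name(user_path, "lbob_site"), {"site": "example"})

print(store.get(("Login", "lbob_site")))                          # None
print(get_by_key_name_for_parent(store, user_path, "lbob_site"))  # {'site': 'example'}
```

Because every lookup goes through the helper, the parent can never be left out, which is the point of wrapping Key.from_path() in class methods.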

Related

Alternate string ID for Guid ID objects

I currently use Guid as the primary key for my ContentItems in my code-first Entity Framework Context. However, since GUIDs are so unwieldy, I would like to also set an alternate, friendly ID for each ContentItem (or descendant of ContentItem) according to the following logic:
Use the Name property, lower-cased, with whitespace replaced by a -, and end the prefix with a - as well
Look in the database to see which other ContentItem have a FriendlyID with the same prefix, and find the one with the highest numeric suffix
Increment that by 1 and add as a suffix
So the first item with name "Great Story" would have FriendlyID of great-story-1, the next one great-story-2, and so forth.
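The three rules above can be sketched in Python as follows. This is an illustration, not Entity Framework code: the existing_ids list stands in for the database lookup of FriendlyIDs sharing the prefix, and the function names are hypothetical. In a real database you would still want a unique index on the column, since two concurrent requests could compute the same next ID.

```python
import re


def friendly_prefix(name):
    """Lower-case the name, collapse whitespace runs to '-', and end with '-'."""
    return re.sub(r"\s+", "-", name.strip().lower()) + "-"


def next_friendly_id(name, existing_ids):
    """Return the next FriendlyID for `name`, given the IDs already stored.

    `existing_ids` stands in for the database query for FriendlyIDs
    that share the same prefix.
    """
    prefix = friendly_prefix(name)
    # Match "<prefix><digits>" exactly, so "great-story-extra-1"
    # is not counted as a "great-story-" ID.
    pattern = re.compile("^" + re.escape(prefix) + r"(\d+)$")
    highest = 0
    for fid in existing_ids:
        m = pattern.match(fid)
        if m:
            highest = max(highest, int(m.group(1)))
    return prefix + str(highest + 1)


print(next_friendly_id("Great Story", []))                                  # great-story-1
print(next_friendly_id("Great Story", ["great-story-1", "great-story-2"]))  # great-story-3
```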
I realize there are a number of ways to implement this sort of thing, but here are my questions:
Is it advisable to explicitly set a new field with the alternate ID according to this logic, or should I just run a query each time applying the same rules as I would to generate the ID to find the right object?
How should I enforce the setting of the alternate ID? Should I do it in my service methods for each content item at creation time? (This concerns me because if someone forgets to add that logic to the service method, now the object doesn't have a FriendlyID) Or should I do it in the model itself, with a property with manually-defined getters/setters that have to query the DB and find out what the next available FriendlyID is?
Are there alternatives to using this sort of FriendlyID for the purpose of making human-friendly URLs and web service requests? The ultimate purpose of this thing is really so that we can have users go to http://awesomewebsite.com/Content/great-story-1 and get sent to the right content item, rather than http://awesomewebsite.com/Content/f0be271e-ee01-48de-8599-ddd602e777b6, etc.
Pre-generate them. This allows you to index them. I understand your concern but there's no alternative in practice. (I have done this.)
I don't know the architecture of your app. Just note that generating such an ID requires database query access. It probably shouldn't be done as a property or method on the entity itself.
You could use a combination by putting both a "speaking name" and an ID into the URL. I have seen sites do this. For GUID IDs this is not exactly pretty, though.
Write yourself a few helper methods to generate such string IDs in a convenient and robust way. That way it is not that much trouble doing this.
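The combined "speaking name plus ID" option can be sketched with two small helpers, shown here in Python. The /Content/<slug>/<id> URL layout and the helper names are assumptions for illustration; the point is that only the ID is used for the lookup, so the slug needs no uniqueness checks and a renamed item still resolves.

```python
import re
import uuid


def slugify(name):
    """Lower-case, turn whitespace runs into '-', drop other punctuation."""
    slug = re.sub(r"\s+", "-", name.strip().lower())
    return re.sub(r"[^a-z0-9-]", "", slug)


def content_url(name, item_id):
    # The slug is for humans; the ID is what the server actually looks up.
    return "/Content/%s/%s" % (slugify(name), item_id)


def id_from_url(path):
    """Extract the ID (last path segment) and ignore the slug entirely."""
    return uuid.UUID(path.rstrip("/").rsplit("/", 1)[-1])


item_id = uuid.UUID("f0be271e-ee01-48de-8599-ddd602e777b6")
url = content_url("Great Story", item_id)
print(url)  # /Content/great-story/f0be271e-ee01-48de-8599-ddd602e777b6

# An outdated or mistyped slug still resolves, because only the ID counts.
assert id_from_url("/Content/old-title/f0be271e-ee01-48de-8599-ddd602e777b6") == item_id
```

The trade-off is the one mentioned above: the URL is friendlier, but the GUID is still visible.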

Is it SAFELY possible to use a str(BlobKey) as datastore id?

Task: implement global, cross entity group blob sharing.
I need an ancestor group with either BlobInfo or a string-representation of the BlobKey as parent of the BlobReference objects to have strong consistency. So I construct a virtual ancestor group with the blob-key as parent of the referencing DB-object ...
br = BlobReference(id=some_id, parent=ndb.Key("MyBlobKey", str(blob)))
br.put()
This works in the SDK so far, but I am concerned that this is way off the documented paths of App Engine.
My previous attempts to convert a blob key to a db key using ndb.Key.from_old_key(blobinfo.key()) failed. It seems there is no legal way to get a db/ndb reference to the BlobInfo table (because "The BlobInfo class provides a db.Model-like interface"). Am I missing something here?
Seems like your question is asking whether you can create some kind of "virtual ancestor group" by specifying a parent that doesn't exist. This is legitimate, it's mentioned in the docs that the parent doesn't actually need to exist.
https://developers.google.com/appengine/docs/python/datastore/entities#Python_Ancestor_paths
Alternatively, if your list of BlobReferences will be limited, it would probably be easier and less expensive to just store a list of them inside one entity. You can make the key of that container entity the same as the BlobKey. Then fetching that entity by key and modifying it will let you work without eventual consistency problems. It will also be cheaper than querying and modifying indexed entities.
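The container-entity idea boils down to a get-by-key, modify, put cycle on one entity. Below is a plain-Python toy model of that cycle: the dict stands in for the datastore, the lock stands in for the transaction, and ToyStore, add_blob_reference and the "example-blob-key" string are hypothetical placeholders (the real key name would be str() of the BlobKey).

```python
import threading


class ToyStore:
    """In-memory stand-in for the datastore: entities are read and written by key."""

    def __init__(self):
        self._data = {}
        # Stands in for a transaction on the single container entity.
        self.lock = threading.Lock()

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value


def add_blob_reference(store, blob_key_str, ref_id):
    """Add ref_id to the container entity keyed by the blob key string.

    Get by key, modify, put back, all under one lock, so no query
    (and therefore no eventual consistency) is involved.
    """
    with store.lock:
        container = store.get(blob_key_str) or {"refs": []}
        if ref_id not in container["refs"]:
            container["refs"].append(ref_id)
        store.put(blob_key_str, container)
        return list(container["refs"])


store = ToyStore()
blob_key_str = "example-blob-key"  # placeholder for str(blob_key)
add_blob_reference(store, blob_key_str, "ref-1")
add_blob_reference(store, blob_key_str, "ref-2")
print(add_blob_reference(store, blob_key_str, "ref-1"))  # ['ref-1', 'ref-2']
```

This only works while the list stays small, since the whole list lives in one entity, which is subject to the datastore's 1 MB entity size limit.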
You sound confused by the various uses of the word "key" in different parts of the API. A blob key has nothing in common with an entity key. The good news is that str() of a BlobKey instance is a sane base64-encoded string that should be fine to use as the ID portion of a Key object. And you can go from that ID string to a BlobKey instance using the BlobKey constructor.

Data storage: "grouping" entities by property value? (like a dictionary/map?)

Using the App Engine datastore, but this might be storage-agnostic; no idea.
Assume a database entity called Comment. Each Comment belongs to a User. Every Comment has a date property, pretty standard so far.
I want something that will let me: specify a User and get back a dictionary-ish (coming from a Python background, pardon. Hash table, map, however it should be called in this context) data structure where:
keys: every date appearing in the User's comment
values: Comments that were made on date.
I guess I could just iterate over a range of dates and build a map like this myself, but I seriously doubt I need to "invent" my own solution here.
Is there a way/tool/technique to do this?
Datastore supports both references and list properties. This lets you build one-to-many relationships in two ways:
Parent (User) has a list property containing keys of Child entities (Comment).
Child has a key property pointing to Parent.
Since you need to limit Comments by date, you'd best go with option two. Then you could query Comments which have date=somedate (or date range) and where user=someuserkey.
There is no native grouping functionality in Datastore, so to also "group" by date, you can add a sort on date to the query. Then, when you iterate over the result, each time the date changes you can use/store it as a new grouping key.
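The "start a new group when the date changes" loop is exactly what Python's itertools.groupby does over sorted input. In this sketch a hard-coded list of dicts stands in for the result of a date-ordered query (with the db API, something like Comment.all().filter("user =", user_key).order("date")); the sample data is made up. If the property is a DateTimeProperty rather than a date, you would group on the .date() of each value instead.

```python
import datetime
import itertools

# Stand-in for a query result: one user's comments, already sorted by date.
comments = [
    {"date": datetime.date(2013, 5, 1), "text": "first"},
    {"date": datetime.date(2013, 5, 1), "text": "second"},
    {"date": datetime.date(2013, 5, 3), "text": "third"},
]

# groupby starts a new group whenever the key changes, which is why the
# input must already be ordered by date.
by_date = {
    day: [c["text"] for c in group]
    for day, group in itertools.groupby(comments, key=lambda c: c["date"])
}
print(by_date)
```

If the input were not sorted, the same date could appear in two separate runs and the later run would overwrite the earlier one in the dict, so the sort on the query matters.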
Update
Designing no-sql databases should be access-oriented (versus datamodel oriented in sql): for often-used operations you should be getting data out as cheaply (= as few operations) as possible.
So, as a rule of thumb you should, in one operation, only get data that is needed at that moment (= shown on that page to user). I'm not sure about your app's design, but I doubt you need all user's full comments (with text and everything) at one time.
I'd start by saying you shouldn't apologize for having a Python background. App Engine originally supported only Python. Using the db module, you could have a User entity as the parent of several DailyCommentBatch entities, each of which is the parent of a couple of Comment entities. IIRC, this will keep all related entities stored together (or close).
If you are using NDB (I love it), you may employ a StructuredProperty at either the User or the DailyCommentBatch level.

Enforcing Unique Constraint in GAE

I am trying out Google App Engine Java, however the absence of a unique constraint is making things difficult.
I have been through this post, and this blog suggests a method to implement something similar. My background is in MySQL. Moving to the datastore without a unique constraint makes me jittery, because I never had to worry about duplicate values before, and checking each value before inserting a new one still leaves room for error.
"No, you still cannot specify unique
during schema creation."
-- David Underhill talks about GAE and the unique constraint (post link)
What are you guys using to implement something similar to a unique or primary key?
I heard about an abstract datastore layer created using the low-level API which worked like a regular RDB, but it was not free (however, I do not remember the name of the software).
Schematic view of my problem
1. sNo = biggest serial_number in the db
2. sNo++
3. Insert new entry with sNo as serial_number value   // checkpoint
4. User adds data pertaining to current serial_number
5. Update entry with data where serial_number is sNo
However, at step 3 (the checkpoint), I fear two users might insert the same sNo, and that is what is preventing me from working with App Engine.
This and other similar questions come up often when talking about transitioning from a traditional RDB to a BigTable-like datastore like App Engine's.
It's often useful to discuss why the datastore doesn't support unique keys, since it informs the mindset you should be in when thinking about your data storage schemes. The reason unique constraints are not available is because it greatly limits scalability. Like you've said, enforcing the constraint means checking all other entities for that property. Whether you do it manually in your code or the datastore does it automatically behind the scenes, it still needs to happen, and that means lower performance. Some optimizations can be made, but it still needs to happen in one way or another.
The answer to your question is, really think about why you need that unique constraint.
Secondly, remember that keys do exist in the datastore, and are a great way of enforcing a simple unique constraint.
my_user = MyUser(key_name=users.get_current_user().email())
my_user.put()
Because the email is the key name, there can never be two MyUser entities with that email (a second put() with the same key name overwrites the existing entity rather than creating a duplicate), and you can also quickly retrieve the MyUser with that email:
my_user = MyUser.get_by_key_name(users.get_current_user().email())
In the Python runtime you can also do:
my_user = MyUser.get_or_insert(users.get_current_user().email())
which will retrieve the user with that email, or insert it if it does not exist yet.
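The semantics that make get_or_insert act as a uniqueness check can be shown with a plain-Python toy. This is not the SDK implementation: the lock stands in for the transaction that get_or_insert runs its get and put in, and ToyStore is hypothetical. It shows the difference between a blind put() (last writer wins) and get_or_insert (first writer wins).

```python
import threading


class ToyStore:
    """In-memory stand-in for a kind whose key names are email addresses."""

    def __init__(self):
        self._entities = {}
        self._lock = threading.Lock()  # plays the role of the transaction

    def put(self, key_name, **props):
        # Like a datastore put(): same key name means overwrite, never a duplicate.
        with self._lock:
            self._entities[key_name] = dict(props)

    def get_or_insert(self, key_name, **props):
        # Get and conditional insert happen atomically, so the first writer wins.
        with self._lock:
            if key_name not in self._entities:
                self._entities[key_name] = dict(props)
            return self._entities[key_name]

    def count(self):
        return len(self._entities)


my_users = ToyStore()
first = my_users.get_or_insert("bob@example.com", nickname="bob")
second = my_users.get_or_insert("bob@example.com", nickname="robert")
print(second["nickname"], my_users.count())  # bob 1
```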
Anything more complex than that will not be scalable though. So really think about whether you need that property to be globally unique, or if there are ways you can remove the need for that unique constraint. Often times you'll find with some small workarounds you didn't need that property to be unique after all.
You can generate unique serial numbers for your products without needing to enforce unique IDs or querying the entire set of entities to find out what the largest serial number currently is. You can use transactions and a singleton entity to generate the 'next' serial number. Because the operation occurs inside a transaction, you can be sure that no two products will ever get the same serial number.
This approach will, however, be a potential performance chokepoint and limit your application's scalability. If it is the case that the creation of new serial numbers does not happen so often that you get contention, it may work for you.
EDIT:
To clarify, the singleton that holds the current (or next) serial number to be assigned is completely independent of any entities that actually have serial numbers assigned to them. They do not all need to be part of an entity group. You could have entities from multiple models using the same mechanism to get a new, unique serial number.
I don't remember Java well enough to provide sample code, and my Python example might be meaningless to you, but here's pseudo-code to illustrate the idea:
Receive request to create a new inventory item.
Enter transaction.
Retrieve current value of the single entity of the SerialNumber model.
Increment value and write it to the database.
Return value as you exit transaction.
Now, the code that does all the work of actually creating the inventory item and storing it along with its new serial number DOES NOT need to run in a transaction.
Caveat: as I stated above, this could be a major performance bottleneck, as only one serial number can be created at any one time. However, it does provide you with the certainty that the serial number that you just generated is unique and not in-use.
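The pseudo-code steps above can be sketched in Python as a toy, single-process model. Here a threading.Lock plays the role of the datastore transaction around the one SerialNumber entity; on App Engine you would run the read-increment-write inside a transactional function (e.g. via db.run_in_transaction()) instead. SerialNumberCounter is a hypothetical name. Note that a process-local lock like this only works within one process, which is exactly why it cannot replace the datastore transaction on a multi-instance app.

```python
import threading


class SerialNumberCounter:
    """Toy stand-in for the single SerialNumber entity.

    The lock plays the role of the transaction: only one caller at a
    time can read, increment and write the value.
    """

    def __init__(self, start=0):
        self._value = start
        self._lock = threading.Lock()

    def next_serial(self):
        with self._lock:          # enter transaction
            self._value += 1      # read current value, increment, write
            return self._value    # return value as the transaction exits


counter = SerialNumberCounter()
results = []
results_lock = threading.Lock()


def worker():
    # Simulates many requests asking for serial numbers at the same time.
    for _ in range(1000):
        sn = counter.next_serial()
        with results_lock:
            results.append(sn)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every serial number handed out is unique, even under concurrency.
assert len(results) == len(set(results)) == 4000
```

As the caveat says, every caller queues up on the same counter, so throughput is limited to one serial number at a time.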
I encountered this same issue in an application where users needed to reserve a timeslot. I needed to "insert" exactly one unique timeslot entity while expecting users to simultaneously request the same timeslot.
I have isolated an example of how to do this on App Engine, and I blogged about it. The blog post has canonical code examples using the Datastore API and also Objectify. (BTW, I would advise avoiding JDO.)
I have also deployed a live demonstration where you can advance two users toward reserving the same resource. In this demo you can experience the exact behavior of app engine datastore click by click.
If you are looking for the behavior of a unique constraint, these should prove useful.
-broc
I first thought an alternative to the transaction technique in broc's blog could be to make a singleton class which contains a synchronized method (say addUserName(String name)) responsible for adding a new entry only if it is unique, or throwing an exception otherwise. Then make a context listener which instantiates a single instance of this singleton, adding it as an attribute to the servletContext. Servlets can then call the addUserName() method on the singleton instance, which they obtain through getServletContext().
However this is NOT a good idea because GAE is likely to split the app across multiple JVMs so multiple singleton class instances could still occur, one in each JVM. see this thread
A more GAE-like alternative would be to write a GAE module responsible for checking uniqueness and adding new entries, then use manual or basic scaling with...
<max-instances>1</max-instances>
Then you have a single instance running on GAE which acts as a single point of authority, adding users one at a time to the datastore. If you are concerned about this instance being a bottleneck you could improve the module, adding queuing or an internal master/slave architecture.
This module-based solution would allow many unique usernames to be added to the datastore in a short space of time, without risking entity group contention issues.
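The "single point of authority" idea can be modelled in Python with one consumer thread draining a queue. This is a toy stand-in, not App Engine code: the consumer thread represents the one module instance, add_user_name represents a front-end request handler calling it, and all names are hypothetical. Because only the consumer ever touches the set of taken names, its check-then-add cannot race.

```python
import queue
import threading


def unique_username_authority(request_queue):
    """Single consumer: the only code that ever touches the set of taken names."""
    taken = set()
    while True:
        item = request_queue.get()
        if item is None:          # shutdown signal
            break
        name, reply = item
        if name in taken:
            reply.put(False)      # name already claimed
        else:
            taken.add(name)
            reply.put(True)       # name claimed by this request


request_queue = queue.Queue()
authority = threading.Thread(target=unique_username_authority, args=(request_queue,))
authority.start()


def add_user_name(name):
    """What a front-end handler would do: ask the authority and wait for its answer."""
    reply = queue.Queue(maxsize=1)
    request_queue.put((name, reply))
    return reply.get()


results = [add_user_name(n) for n in ("alice", "bob", "alice")]

# Ten concurrent attempts to claim the same name: exactly one succeeds.
claims = []
threads = [
    threading.Thread(target=lambda: claims.append(add_user_name("carol")))
    for _ in range(10)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

request_queue.put(None)  # stop the authority
authority.join()
print(results, claims.count(True))  # [True, True, False] 1
```

The queue is also where the bottleneck concern shows up: every request waits its turn at the single consumer.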

App Engine - Chain of entities generates exceptionally long entity keys

I am writing an application which allows users to send messages to each other. I am using transactions to ensure that there is only a single "top" message between any two users, and this "top" message has a link to the "next" message, and so on, forming a sort of linked list of messages. The messages reference each other through reference properties, and are placed in the same entity group by declaring each new "top" as having the previous "top" as its parent.
However, the problem with this approach is that each new entity has a key that includes the entire key of the previous entity (ie: new_top_key == old_top_key + new_stuff). This results in entity keys growing at a large rate, and probably very bad behavior after a few hundred messages in a single chain (but I haven't actually tested).
So, my question is: 1) Is this an intentional feature of the App Engine. 2) Should I be avoiding this type of a structure -- or is it somehow efficiently dealt with by the App Engine internally? 3) Do you have any suggestions on what is the correct approach for a linked-list-of-entities type of structure?
Thank you and kind regards
Alex
In order:
Yes. Each entity is uniquely identified by its kind and its key name or ID, together with those of all its parents, which means that the entire chain is necessary to identify an entity.
Yes. Instead, have a "conversation" entity (which could be the first message, as well), which is a direct parent of all the posts. If you still need to maintain parent/child relationships within a conversation (instead of just ordering them by timestamp, for example), declare an explicit SelfReferenceProperty.
See #2, above.
Are you using python or java? The detailed answer will depend a bit on which API you are using.
I'm pretty sure that having your keys grow indefinitely is not the best plan. (it might be a good test case for the app engine api though :)
I think the solution will be to separate the entity group information from the message linking information. In order to do transactions on a thread/conversation/chain/whatever, all your messages need to be in the same entity group. However, they do not need to be in a hierarchy that exactly matches the structure of the links between messages. You should explicitly set the parent (entity group) of all your message entities to be the same, in a flat structure. So each entity would be a sibling of the others, in the sense of entity groups. You would also need a field in your entity to link to the next (and/or previous) message. So you would still have a linked list (or tree or whatever) in terms of the "previous message" links.
Both Python and Java have methods for creating an entity with a specific parent/entity group. (In fact, you can even specify a nonexistent entity to be the root of an entity group hierarchy!)
Now the key of each message will be a fixed length, so your "next" and "previous" reference properties will be nice and safe from overflowing some limit on key length.
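The difference in key growth can be illustrated with key paths modelled as plain Python tuples. This is a toy model, not the SDK's Key class, and the "Conversation"/"alice:bob" root is a hypothetical example of the shared entity-group parent. The chained approach adds one (kind, id) pair per message, while the flat approach keeps every key at the same length and moves the ordering into an explicit "prev" link.

```python
def chained_key(n):
    """Key path when each new message is a child of the previous one."""
    path = ("Conversation", "alice:bob")
    for i in range(1, n + 1):
        path += ("Message", i)
    return path


def flat_key(n):
    """Key path when every message is a direct child of one conversation root."""
    return ("Conversation", "alice:bob", "Message", n)


for n in (1, 10, 100):
    print(n, len(chained_key(n)), len(flat_key(n)))
# 1 4 4
# 10 22 4
# 100 202 4

# Flat siblings, linked explicitly through a "prev" field.
messages = {}
prev = None
for i in range(1, 4):
    key = flat_key(i)
    messages[key] = {"text": "msg %d" % i, "prev": prev}
    prev = key

# Walk backwards from the newest message via the "prev" links.
key, texts = prev, []
while key is not None:
    texts.append(messages[key]["text"])
    key = messages[key]["prev"]
print(texts)  # ['msg 3', 'msg 2', 'msg 1']
```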
