I currently use Guid as the primary key for my ContentItems in my code-first Entity Framework Context. However, since Guids are so unwieldy, I would like to also set an alternate, friendly ID for each ContentItem (or descendant of ContentItem) according to the following logic:
Take the Name property, lowercase it, replace whitespace with a -, and end the prefix with a - as well
Look in the database to see which other ContentItem have a FriendlyID with the same prefix, and find the one with the highest numeric suffix
Increment that by 1 and add as a suffix
So the first item with name "Great Story" would have FriendlyID of great-story-1, the next one great-story-2, and so forth.
I realize there are a number of ways to implement this sort of thing, but here are my questions:
Is it advisable to explicitly set a new field with the alternate ID according to this logic, or should I just run a query each time applying the same rules as I would to generate the ID to find the right object?
How should I enforce the setting of the alternate ID? Should I do it in my service methods for each content item at creation time? (This concerns me because if someone forgets to add that logic to the service method, now the object doesn't have a FriendlyID) Or should I do it in the model itself, with a property with manually-defined getters/setters that have to query the DB and find out what the next available FriendlyID is?
Are there alternatives to using this sort of FriendlyID for the purpose of making human-friendly URLs and web service requests? The ultimate purpose of this is so that users can go to http://awesomewebsite.com/Content/great-story-1 and get sent to the right content item, rather than http://awesomewebsite.com/Content/f0be271e-ee01-48de-8599-ddd602e777b6, etc.
Pre-generate them. This allows you to index them. I understand your concern but there's no alternative in practice. (I have done this.)
I don't know the architecture of your app. Just note that generating such an ID requires database query access. It probably shouldn't be done as a property or method on the entity itself.
You could use a combination by putting both a "speaking name" and an ID into the URL. I have seen sites do this. For GUID IDs this is not exactly pretty, though.
Write yourself a few helper methods to generate such string IDs in a convenient and robust way. That way it is not that much trouble doing this.
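As a rough illustration of such helper methods, here is a minimal sketch in Python (not Entity Framework code). The function names and the existing_ids argument are hypothetical; existing_ids stands in for whatever query returns the FriendlyIDs that already share the prefix. Under concurrent inserts you would presumably still want a unique index on the column so that two requests cannot end up with the same suffix.

```python
import re


def slug_prefix(name):
    """Lowercase the name, turn each whitespace run into '-', and end with '-'."""
    return re.sub(r"\s+", "-", name.strip().lower()) + "-"


def next_friendly_id(name, existing_ids):
    """Return the prefix plus (highest existing numeric suffix + 1).

    existing_ids is a hypothetical stand-in for the result of a database
    query for FriendlyIDs starting with the same prefix.
    """
    prefix = slug_prefix(name)
    highest = 0
    for friendly_id in existing_ids:
        if not friendly_id.startswith(prefix):
            continue
        suffix = friendly_id[len(prefix):]
        # Ignore IDs such as "great-story-teller-9" whose remainder is not a number.
        if suffix.isdecimal():
            highest = max(highest, int(suffix))
    return prefix + str(highest + 1)
```

Because the IDs are stored rather than recomputed, looking up a content item by its FriendlyID stays a plain indexed equality query.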
I'm creating a REST API which has a method to generate a price for an order. The parameters for the order are passed via GET request, and the logic for calculating a price based on those parameters is quite large and complex.
I'm wondering how I can move that logic out of the controller to keep the code DRY and tidy.
I feel like the best solution would be to have a Price model of some kind, which is a class not linked to a table but expects to be created with the required parameters, and can then perform various tasks and ultimately give a price based on the variables. I would also like to be able to perform validation upon creation of the model, so that it can check it has all the required parameters and that they are valid.
What is the best way to architect this?
How do I create an "imaginary" model which is not really a table or proper entity?
What is the best way to architect this?
By not using a table object because you don't want to interact with a table.
How do I create an "imaginary" model which is not really a table or proper entity?
Data processing clearly belongs in the model layer. The model is a whole layer, not just about databases; treating it as only the database is a common mistake. So simply create your own class or set of classes in the src/Model/ or src/Model/Calculator/ folder, like src/Model/Calculator.php, and then use namespaces and autoloading to use it wherever you need it.
use App\Model\Calculator;
$calculator = new Calculator();
You can use that class in the beforeSave() of a table object for example if you want to store the price somewhere.
I recommend that you always think in terms of responsibilities and concerns and how to separate them. The calculator should do its calculations, nothing more; it doesn't need to know anything about a database connection or table to do its job. Read about SoC (separation of concerns).
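To make the idea concrete, here is a minimal sketch of a table-less model class that validates its input on construction and then computes a price. It is written in Python rather than PHP purely as an illustration of the structure; the class name, the parameter names (quantity, unit_price, discount) and the pricing formula are all assumptions, not part of the question.

```python
class PriceCalculator:
    """A plain model class with no table or database connection behind it."""

    # Hypothetical required request parameters.
    REQUIRED = ("quantity", "unit_price")

    def __init__(self, params):
        # Validate on creation so an invalid calculator can never exist.
        missing = [key for key in self.REQUIRED if key not in params]
        if missing:
            raise ValueError("missing parameters: " + ", ".join(missing))
        self.quantity = int(params["quantity"])
        self.unit_price = float(params["unit_price"])
        self.discount = float(params.get("discount", 0))
        if self.quantity <= 0:
            raise ValueError("quantity must be positive")
        if self.unit_price < 0:
            raise ValueError("unit_price must not be negative")
        if not 0 <= self.discount <= 1:
            raise ValueError("discount must be between 0 and 1")

    def total(self):
        """Return the price, rounded to two decimal places."""
        return round(self.quantity * self.unit_price * (1 - self.discount), 2)
```

The controller then only builds the object from the request parameters and returns the result of total(), or an error response if construction raises.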
I have a fairly simple application (like CRM) which has a lot of contacts and associated tags.
A user can search giving lot of criteria (search-items) such as
updated_time in last 10 days
tags in xxx
tags not in xxx
first_name starts with xxx
first_name not in 'Smith'
I understand indexing and how filters (not in) cannot work on more than one property.
For me, since reporting is done in a cron job most of the time, I can iterate through all records and process them. However, I would like to know the best-optimized way of doing it.
I am hoping that instead of querying 'ALL', I can get close to a query which can run within the App Engine design limits and then manually match the rest of the items in the query.
One way of doing it is to start with the first search-item and get a count, then add the next search-item and get a count again. At the point where it bails out, I process those records against the rest of the search-items manually.
The question is
Is there a way to know beforehand, programmatically, whether a query is valid without doing a count?
How do you determine the best of search-items in a set which do not collide (like not-in does not work on many filters) etc.
The only way I see it is to get all equal filters as one query, take the first in-equality filter or in, execute it and just iterate over the search entities.
Is there a library which can help me ;)
I understand indexing and how filters (not in) cannot work on more than one property.
This is not strictly true. You may create a "composite index", which allows you to perform filters on multiple fields. These consume additional storage.
You may also generate your own equivalent of composite index by generating your own "composite field" that you can use to query against.
Is there a way to know beforehand, programmatically, whether a query is valid without doing a count?
I'm not sure I understand what kind of validity you're referring to.
How do you determine the best of search-items in a set which do not collide (like not-in does not work on many filters) etc.
A "not in" filter is not trivial. One way is to create two arrays (repeated fields) on each entity: one with all the tags the entity has, and one with all the tags it does not have. This would allow you to easily find all the entities with and without a given tag. The only issue is that once you create a new tag, you have to sweep across the entities, adding a "not in" entry to every entity that lacks it.
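The two-array idea can be sketched in plain Python, with dictionaries standing in for datastore entities. The field names tags and not_tags and all function names are hypothetical; in the datastore, the membership test in without_tag would presumably become an ordinary equality filter on the not_tags repeated field.

```python
def tag_fields(tags, all_tags):
    """Build both repeated fields for one entity from the tags it has."""
    tags = set(tags)
    return {"tags": sorted(tags), "not_tags": sorted(set(all_tags) - tags)}


def register_new_tag(entities, new_tag):
    """The sweep: every entity lacking the new tag gains a not_tags entry."""
    for entity in entities:
        if new_tag not in entity["tags"] and new_tag not in entity["not_tags"]:
            entity["not_tags"].append(new_tag)


def without_tag(entities, tag):
    """'tags not in xxx' expressed as a positive match on not_tags."""
    return [entity for entity in entities if tag in entity["not_tags"]]
```

The cost is visible in register_new_tag: adding one tag touches every entity, which is the sweep described above.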
Using AppEngine datastore, but this might be agnostic, no idea.
Assume a database entity called Comment. Each Comment belongs to a User. Every Comment has a date property, pretty standard so far.
I want something that will let me: specify a User and get back a dictionary-ish (coming from a Python background, pardon. Hash table, map, however it should be called in this context) data structure where:
keys: every date appearing in the User's comment
values: Comments that were made on date.
I guess I could just iterate over a range of dates and build a map like this myself, but I seriously doubt I need to "invent" my own solution here.
Is there a way/tool/technique to do this?
Datastore supports both references and list properties. This lets you build one-to-many relationships in two ways:
Parent (User) has a list property containing keys of Child entities (Comment).
Child has a key property pointing to Parent.
Since you need to limit Comments by date, you'd best go with option two. Then you could query Comments which have date=somedate (or date range) and where user=someuserkey.
There is no native grouping functionality in Datastore, so to also "group" by date, you can add a sort on date to the query. Then, as you iterate over the result, each change in the date marks the start of a new group, and you can use/store that date as the grouping key.
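A minimal sketch of that grouping step in Python, assuming the comments have already been fetched sorted by date. Plain dictionaries with hypothetical date and text fields stand in for Comment entities here; itertools.groupby does the "new key whenever the date changes" bookkeeping.

```python
from datetime import date
from itertools import groupby


def group_by_date(sorted_comments):
    """Map each date to the list of comments made on it.

    The input must already be sorted by date (as the sorted query would
    return it), because groupby only merges adjacent items with equal keys.
    """
    return {
        day: list(items)
        for day, items in groupby(sorted_comments, key=lambda c: c["date"])
    }


# Hypothetical query result for one user, sorted by date.
comments = [
    {"date": date(2012, 5, 1), "text": "first"},
    {"date": date(2012, 5, 1), "text": "second"},
    {"date": date(2012, 5, 3), "text": "third"},
]
by_date = group_by_date(comments)
```

The keys of by_date are exactly the dates on which the user commented, so there is no need to iterate over a range of dates.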
Update
Designing no-sql databases should be access-oriented (versus datamodel oriented in sql): for often-used operations you should be getting data out as cheaply (= as few operations) as possible.
So, as a rule of thumb you should, in one operation, only get data that is needed at that moment (= shown on that page to user). I'm not sure about your app's design, but I doubt you need all user's full comments (with text and everything) at one time.
I'd start by saying you shouldn't apologize for having a Python background. App Engine started out supporting only Python. Using the db module, you could have a User entity as the parent of several DailyCommentBatch entities, each the parent of a few Comment entities. IIRC, this will keep all related entities stored together (or close).
If you are using NDB (I love it) you may have to employ a StructuredProperty at either the User or DailyCommentBatch level.
I am trying out Google App Engine Java, however the absence of a unique constraint is making things difficult.
I have been through this post and this blog suggests a method to implement something similar. My background is in MySQL. Moving to the datastore without a unique constraint makes me jittery because I never had to worry about duplicate values before, and checking each value before inserting a new one still leaves room for error.
"No, you still cannot specify unique during schema creation."
-- David Underhill talks about GAE and the unique constraint (post link)
What are you guys using to implement something similar to a unique or primary key?
I heard about an abstract datastore layer built on the low-level API that worked like a regular RDB, but it was not free (and I do not remember the name of the software).
Schematic view of my problem
sNo = biggest serial_number in the db
sNo++
Insert new entry with sNo as serial_number value //checkpoint
User adds data pertaining to current serial_number
Update entry with data where serial_number is sNo
However, at line number 3 (the checkpoint), I feel two users might add the same sNo. And that is what is preventing me from working with App Engine.
This and other similar questions come up often when talking about transitioning from a traditional RDB to a BigTable-like datastore like App Engine's.
It's often useful to discuss why the datastore doesn't support unique keys, since it informs the mindset you should be in when thinking about your data storage schemes. Unique constraints are not available because they greatly limit scalability. Like you've said, enforcing the constraint means checking all other entities for that property. Whether you do it manually in your code or the datastore does it automatically behind the scenes, it still needs to happen, and that means lower performance. Some optimizations can be made, but it still needs to happen in one way or another.
The answer to your question is, really think about why you need that unique constraint.
Secondly, remember that keys do exist in the datastore, and are a great way of enforcing a simple unique constraint.
my_user = MyUser(key_name=users.get_current_user().email())
my_user.put()
This will guarantee that no MyUser will ever be created with that email ever again, and you can also quickly retrieve the MyUser with that email:
my_user = MyUser.get_by_key_name(users.get_current_user().email())
In the Python runtime you can also do:
my_user = MyUser.get_or_insert(key_name=users.get_current_user().email())
Which will insert or retrieve the user with that email.
Anything more complex than that will not be scalable though. So really think about whether you need that property to be globally unique, or if there are ways you can remove the need for that unique constraint. Often times you'll find with some small workarounds you didn't need that property to be unique after all.
You can generate unique serial numbers for your products without needing to enforce unique IDs or querying the entire set of entities to find out what the largest serial number currently is. You can use transactions and a singleton entity to generate the 'next' serial number. Because the operation occurs inside a transaction, you can be sure that no two products will ever get the same serial number.
This approach will, however, be a potential performance chokepoint and limit your application's scalability. If it is the case that the creation of new serial numbers does not happen so often that you get contention, it may work for you.
EDIT:
To clarify, the singleton that holds the current -- or next -- serial number to be assigned is completely independent of any entities that actually have serial numbers assigned to them. They do not all need to be part of one entity group. You could have entities from multiple models using the same mechanism to get a new, unique serial number.
I don't remember Java well enough to provide sample code, and my Python example might be meaningless to you, but here's pseudo-code to illustrate the idea:
Receive request to create a new inventory item.
Enter transaction.
Retrieve current value of the single entity of the SerialNumber model.
Increment value and write it to the database
Return value as you exit transaction.
Now, the code that does all the work of actually creating the inventory item and storing it along with its new serial number DOES NOT need to run in a transaction.
Caveat: as I stated above, this could be a major performance bottleneck, as only one serial number can be created at any one time. However, it does provide you with the certainty that the serial number that you just generated is unique and not in-use.
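The pseudo-code above can be mimicked in a small, self-contained Python sketch. This is not datastore code: a threading.Lock stands in for the transaction, and the SerialNumberStore class is hypothetical. With the Python db API, the read-increment-write step would presumably be a function passed to db.run_in_transaction instead.

```python
import threading


class SerialNumberStore:
    """In-memory stand-in for the single SerialNumber entity."""

    def __init__(self):
        self._value = 0
        # The lock plays the role of the datastore transaction: only one
        # caller at a time may read, increment and write the counter.
        self._lock = threading.Lock()

    def next_serial(self):
        with self._lock:           # enter "transaction"
            self._value += 1       # retrieve current value and increment it
            return self._value     # return the value as the "transaction" ends


store = SerialNumberStore()


def create_inventory_item(data):
    """Creating the item itself happens outside the critical section."""
    return {"serial_number": store.next_serial(), "data": data}
```

As in the caveat above, every caller queues on the same counter, so the sketch is correct but serializes all serial-number creation.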
I encountered this same issue in an application where users needed to reserve a timeslot. I needed to "insert" exactly one unique timeslot entity while expecting users to simultaneously request the same timeslot.
I have isolated an example of how to do this on App Engine, and I blogged about it. The blog post has canonical code examples using Datastore, and also Objectify. (BTW, I would advise avoiding JDO.)
I have also deployed a live demonstration where you can advance two users toward reserving the same resource. In this demo you can experience the exact behavior of app engine datastore click by click.
If you are looking for the behavior of a unique constraint, these should prove useful.
-broc
I first thought an alternative to the transaction technique in broc's blog could be to make a singleton class which contains a synchronized method (say addUserName(String name)) responsible for adding a new entry only if it is unique, or throwing an exception otherwise. Then make a context listener which instantiates a single instance of this singleton, adding it as an attribute to the servletContext. Servlets can then call the addUserName() method on the singleton instance, which they obtain through getServletContext.
However, this is NOT a good idea, because GAE is likely to split the app across multiple JVMs, so multiple instances of the singleton class could still occur, one in each JVM. See this thread.
A more GAE-like alternative would be to write a GAE module responsible for checking uniqueness and adding new entries; then use manual or basic scaling with...
<max-instances>1</max-instances>
Then you have a single instance running on GAE which acts as a single point of authority, adding users one at a time to the datastore. If you are concerned about this instance being a bottleneck you could improve the module, adding queuing or an internal master/slave architecture.
This module-based solution would allow many unique usernames to be added to the datastore in a short space of time, without risking entity group contention issues.
UPDATE: After further testing, it seems this issue affects all child entities in my entity group. My root parent for all these different instances is User kind, which is my own creation, not the built in User kind. After removing the parent=user from the constructor of the child Kind, the get_by_key_name works as expected. However, I would like to be able to use the Entity Group functionality along with the defined keys, if that is possible.
--
Hi,
I am attempting to use defined key names for speedier querying in my GAE project.
However, I have run into an odd issue where I cannot fetch the key. This code does not seem to work:
for l in Login.all().fetch():
    print Login.get_by_key_name(l.key().name())
Some notes:
I have only tested in the SDK
l.key().name() returns the key name string listed with the entity when I look in the datastore. I can copy and paste the string out of the datastore and use that as the arg to get_by_key_name(), and that does not work either.
Key names for the Login kind are all prefixed with an "l" (i.e. lowercase "L"), are otherwise all lowercase, may contain underscores or dashes, and are under 500 bytes.
Other kind searches like this work.
The key is an interpolation of 2 properties of the Login kind, and I can fetch the objects just fine using regular .filter() methods
The "parent" for the instances is a User class. (mentioning in case this has some bearing on the way I have to fetch)
So I have to ask, is there any obvious reasons why this would not work? Any known issues with key name searches using the SDK?
Your second comment is correct, AFAIK. The parent/child relationship is similar to a directory or folder structure in your filesystem. Your key is (conceptually) /parents/[parent_keyname]/logins/[login_keyname]. So if you try to fetch /logins/[login_keyname] you will not get your entity. (There is no rule that all Logins must be children of Parents; get_by_key_name() must be told of the parent relationship every time.)
In my own code, I have ended up building my keys myself with Key.from_path(). I use class methods, e.g. Login.key_for_name(some_parent, some_name) and also Login.get_by_key_name_for_parent(some_parent, some_name) (well, my method names are shorter, but I'm just making it clear). Then at least it is not possible for me to generate a key with the wrong parent/child relationship.
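The folder analogy can be demonstrated with a small simulation in plain Python. This is not the App Engine API: a dictionary keyed by the full path stands in for the datastore, and make_key and the sample names are hypothetical. In the real db API, get_by_key_name() accepts a parent argument, so the equivalent fix there would presumably be Login.get_by_key_name(name, parent=user).

```python
def make_key(kind, name, parent=None):
    """Build a full key path: the parent's path plus this (kind, name) pair."""
    return (parent or ()) + ((kind, name),)


# A dictionary keyed by full path stands in for the datastore.
datastore = {}

user_key = make_key("User", "alice")
login_key = make_key("Login", "l_alice_web", parent=user_key)
datastore[login_key] = {"service": "web"}


def get_by_key_name(kind, name, parent=None):
    """Look an entity up by key name; the parent is part of the key."""
    return datastore.get(make_key(kind, name, parent))


# Without the parent, the path is only (("Login", "l_alice_web"),): no match.
orphan_lookup = get_by_key_name("Login", "l_alice_web")
# With the parent, the full path matches the stored key.
child_lookup = get_by_key_name("Login", "l_alice_web", parent=user_key)
```

The same key name under a different parent is a different key, which is why a helper that always builds the key from the parent avoids the mismatch.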