I have seen many articles and questions about how to implement a unique constraint in App Engine, but I haven't actually found any explanation of why this feature is missing.
If the App Engine developers decided it was better not to implement such a feature, I believe they had good reasons, but I'd be interested in understanding why they decided so.
Was this decision guided by performance concerns? Why?
Any detailed explanation about this would be greatly appreciated.
As the post linked here http://code.google.com/p/googleappengine/issues/detail?id=178#c14 explains, the distributed nature of the datastore makes it difficult to enforce a unique constraint. If two app instances simultaneously try to create an entity, each with a property that should be unique, the only way to enforce this would require some kind of coordination across all machines in the datastore.
Imagine a room of 26 people, each with a piece of paper holding a table of pets and their owners. Each person handles the pets whose names start with a different letter of the alphabet: person 1 does everything starting with A, person 2 everything starting with B, and so on.
If you wanted to make sure that a pet named mittens was the only mittens in the entire datastore, this is easy: only one person in the room is involved, and they can check their piece of paper to make sure that mittens isn't already there.
If you wanted to require that owners be unique too, you can imagine that every time someone wants to write an entry in their table, they would need to check with /every single other person/ to make sure that nobody else has used that owner name. This is the fundamental reason that App Engine's datastore does not allow uniqueness constraints on anything except entity keys. It would simply not be feasible when the datastore spans thousands of servers.
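A common workaround that follows from this is to fold the would-be-unique value into the entity key itself, since key uniqueness is the one guarantee the datastore does provide. Here is a toy simulation of the room of 26 people in plain Python, with dictionaries standing in for datastore shards; all names are invented for illustration:

```python
# Toy model of a sharded datastore: one dict ("person in the room")
# per first letter of the pet's name.
shards = {letter: {} for letter in "abcdefghijklmnopqrstuvwxyz"}

def put_pet(name, owner):
    """Key uniqueness is cheap: only one shard has to be consulted."""
    shard = shards[name[0].lower()]
    if name in shard:
        raise KeyError(f"a pet named {name!r} already exists")
    shard[name] = {"owner": owner}

def owner_is_unique(owner):
    """Property uniqueness is expensive: every shard must be asked."""
    return all(
        entity["owner"] != owner
        for shard in shards.values()
        for entity in shard.values()
    )

put_pet("mittens", "alice")
print(owner_is_unique("alice"))   # False: alice already owns mittens
print(owner_is_unique("bob"))     # True
```

Checking a pet name touches exactly one shard; checking an owner has to visit all of them, which is exactly the coordination problem described above.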
Hopefully you can see why this limitation exists, and hopefully my late-night typing isn't too difficult to read :D
You can see a response from Google on adding unique constraints on their issue list for GAE.
Our team is developing a new database for a health-care ERP. During a brainstorming meeting I recommended using uniqueidentifier (GUID) keys because of several benefits:
Fewer round trips to the database on insert, if we generate the value in the client application
By generating it in the client application, we can use the master-detail approach more easily
It helps with data replication
Until then I was confident, and I even thought I would hear some compliments, until my boss asked me a couple of questions:
Are you going to use this GUID as the primary key with a clustered index?
Do you know how big your table will get, and the consequences for performance?
Some of the developers proposed int and others bigint.
I would like to know whether my boss's questions have a basis, or whether I am right, because I think a GUID is the best choice for building an ERP with replication support.
NOTE: I already searched for a long time on this site and on other sites.
Which of the above is the best key to use in an ERP-style health-care information system?
Think about what your company is proposing to do and the level of expertise your group currently has. Judging from your questions and your manager's, it does not have significant experience with SQL Server. I cannot reasonably see a way for you to develop an enterprise-scale system without the necessary expertise, especially with the backend systems you plan to use.
And your process (as little as you describe it) sounds concerning. "Brainstorming" is not, IMO, a point where you decide on schemas and choose keys. And one should not just blindly choose a particular datatype for every primary key. But all of this is guessing without knowing more about where you are in this process. If your schema is not yet fixed (regardless of what datatypes are selected for each column), then you are not yet in a position to worry about performance.
Lastly, you and your manager confuse two related but independent attributes. A primary key is not the same as the clustered index, despite the unfortunate implementation choices made by the MS development team. They are independent of each other; make a conscious decision about your clustered indexes and do not allow the db engine to automatically choose the primary key as the clustered index.
So to answer your questions. Yes - those questions are valid. But your project does not yet appear to have reached a point where those concerns can be addressed.
I am an inexperienced computer science student, and while building projects for different courses a few conceptual questions have come up.
Say I am to develop a website similar to IMDb, but for music, from scratch, and I want to list some artists on the front page.
The database schema is already done with all its relationships and attributes, and there is an artists table.
Should my server-side artist class contain all table columns and relationships at creation time, even if they are not necessarily needed at that point?
Or should I construct these objects with minimal parameters (like id, name) and get all the rest when needed (resulting in more individual sql statements) via helper-methods?
I know that there is maybe no definitive answer beyond "it depends", or that it boils down to personal preference, but maybe there is even a consensus.
If someone could name or link to resources to read up on things like this I would be very grateful, I didn't know what to search for exactly. Thanks.
PS: For people wondering why I don't ask these questions in the CS course; they are mostly held by students/assistants who only had to pass the course and don't have that much experience themselves.
I am not sure what this means so I am answering assuming this does not exist in the question. Will edit answer when clarification is given.
Or should I construct these objects with minimal parameters (like id, name) and get all the rest when needed (resulting in more individual sql statements) via helper-methods?
Actual answer starts here
It does not boil down to personal preference but whether you can or cannot find a practical reason to do something. All design patterns follow practicality instead of personal preferences. Even if there is a consensus you can always ask why.
If there are 100 tables in the database already present and in my web application I can get by with just 2 of them I don't see a reason why I should sit down and create all 100 tables in my web application's domain model. It's just not logical.
There may be some cases when a big application is being created and we are like 99% sure that we will need to model all of it and that requires us to model a bit more classes (say 5 instead of 2) for ensuring that our future work is not hindered.
Also there is the concern of data integrity. Do those 2 tables depend on some other table? Do other tables depend on them? If there is a dependency, then you might need to include those tables as well.
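On the second half of the question, one common shape for "construct with minimal parameters and fetch the rest when needed" is a lazily loaded attribute. Here is a minimal sketch using Python's built-in sqlite3 module; the artists table and its columns are made up for illustration:

```python
import sqlite3

# Illustrative schema: only id and name are loaded eagerly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artists (id INTEGER PRIMARY KEY, name TEXT, bio TEXT)")
conn.execute("INSERT INTO artists VALUES (1, 'Miles Davis', 'Trumpeter and bandleader')")

class Artist:
    def __init__(self, artist_id, name):
        self.id = artist_id
        self.name = name
        self._bio = None          # not fetched yet

    @property
    def bio(self):
        """Fetch the heavyweight column only on first access."""
        if self._bio is None:
            row = conn.execute(
                "SELECT bio FROM artists WHERE id = ?", (self.id,)
            ).fetchone()
            self._bio = row[0]
        return self._bio

# Front page: cheap objects built from a single two-column query.
artists = [Artist(i, n) for i, n in conn.execute("SELECT id, name FROM artists")]
print(artists[0].name)   # no extra query needed
print(artists[0].bio)    # triggers one extra SELECT, then caches
```

The trade-off is the one the question names: fewer columns per query, at the cost of extra individual statements when the lazy attributes are actually touched.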
FYI, such questions are better suited to Programmers Stack Exchange.
I'm hoping that this isn't flagged as "not helpful" because I think that many people are attempting to figure out a way to keep strong consistency in the HRD.
Here is the idea I'm using for my app. I'd like to get your opinions.
I have a fitness app. This is of course made up of Workouts and Exercises.
The HRD contains about 400 exercises to pick from, or the User can create their own Exercise (a UExercise).
When the User logs in, I load all of the Workout keys into a "workoutKeys" List on the User. At the same time I load all the User exercise keys (UExercise) into a "exerciseKeys" List also on the User.
If the user wants to add/delete exercises from a specific workout, the Workout is loaded and all its Exercise keys are loaded into a "exerciseKeys" List on the Workout.
See a pattern here?
So whenever I want to view Exercises created by the user (UExercise), or the user's Workouts, or the Exercises in that Workout, I do a get() using those keys.
Since a user would probably not have thousands of Workouts, or create thousands of Exercises, I think this is a safe and quick way to achieve strong consistency.
Not saying that this is the best way for EVERY app. But for mine I believe it will work well.
I would greatly appreciate all of your input as to if there is something I may be missing here, or not properly taking into consideration.
Ok... After some careful consideration of how my app will work, and how users actually use it, I have decided to ditch the idea above and go with Ancestor Queries.
So for the above models, I have come up with the following...
For a Workout, I make the User the parent
For an Exercise created by a user (UExercise), I make the User the parent
This allows me to use Ancestor Queries (which are strongly consistent) to pull the most recently added or modified Entities.
Due to the fact that the user will not be modifying these Entities en mass, I think the limitations on the writes will not be a factor.
This also rids me of properties on Model objects that should not really be there in the first place.
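To make the strong-versus-eventual trade-off concrete, here is a small stand-in in plain Python, with no App Engine APIs; the entity names and the explicit "pending index" are invented purely to model the behaviour of a get-by-key versus a non-ancestor query:

```python
# Authoritative entity store: get-by-key reads from here (strong).
entities = {}
# Secondary index used by queries: updated with a lag (eventual).
index_by_parent = {}

def put(key, parent, value):
    entities[key] = {"parent": parent, "value": value}
    # In a real datastore the index catches up asynchronously;
    # here we simply *don't* update it yet, to model the lag.

def apply_pending_index_updates():
    index_by_parent.clear()
    for key, ent in entities.items():
        index_by_parent.setdefault(ent["parent"], []).append(key)

def get(key):                       # strongly consistent
    return entities[key]["value"]

def query_by_parent(parent):        # eventually consistent
    return [entities[k]["value"] for k in index_by_parent.get(parent, [])]

put("workout1", parent="user1", value="Leg day")
print(get("workout1"))              # visible immediately
print(query_by_parent("user1"))     # [] -- the index hasn't caught up yet
apply_pending_index_updates()
print(query_by_parent("user1"))     # now visible
```

Ancestor queries sidestep the middle case: within an entity group the datastore gives query results the same freshness as a get, at the cost of the write-rate limits mentioned above.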
By the way, I also tried Memcache. I found this to be the ultimate pain. Having to keep the Memcache and the Datastore in sync seemed to inject much more complexity than was really needed.
But your site, and your results, may differ. This idea works well for my app.
Thanks!
I have a database that already has a users table.
COLUMNS:
userID - int
loginName - string
First - string
Last - string
I just installed the ASP.NET membership tables. Right now all of my tables are joined to my users table via a foreign key on the userID field.
How do I integrate the aspnet_Users table into my schema? Here are the ideas I thought of:
Add a membership_id field to my users table and, on new inserts, include that new field. This seems like the cleanest way, as I don't need to break any existing relationships.
Break all existing relationships and move all of the fields from my users table into the aspnet_Users table. This seems like a pain but ultimately leads to the simplest, most normalized solution.
Any thoughts?
I regularly use all manner of provider stacks with great success.
I am going to proceed with the respectful observation that your experience with the SqlProvider stack is limited and that the path of least resistance seems to you to be to splice into aspnet_db.
The abstract provider stack provides cleanly separated feature sets that complement and interact with each other in an intuitive way... if you take the time to understand how it works.
And by extension, while not perfect, the SqlProviders provide a very robust backing store for the extensive personalization and security facilities that underlie the ASP.NET runtime.
The more effort you put into understanding how these facilities work, focusing less on how to modify (read: break) the existing schema and more on envisioning how your existing data could fit into it, the less effort you will ultimately expend to end up with a robust, easily understood security and personalization system that you did not have to design, write, test and maintain.
Don't get me wrong, I am not saying not to customize the providers. That is the whole point of an abstract factory pattern. But before you take it upon yourself to splice into a database/schema/critical infrastructural system it would behoove you to better understand it.
And once you get to that point, you will start to see how simple life can be: the more you concentrate on making systems work for you that have thousands of developer-hours and countless users every minute of every day behind them, the more actual work you will get done on the things that really interest you and your stakeholders.
So - let me suggest that you import your users into the aspnet_db/sqlprovider stack and leverage the facilities provided.
The userId in aspnet_db is a guid and should remain that way for very many reasons. If you need to retain the original integral user identifier - stash it in the mobile pin field for reference.
Membership is where you want to place information that is relevant to security and identification. User name, password, etc.
Profiles is where you want to place volatile metadata like names and site preferences.
Anyway - what I am trying to say is that you need a better understanding of the database and the providers before you hack at them. Start off by understanding how to use them as provided, and your experience will be more fruitful.
Good luck.
In my experience, the "ASP.NET membership provider" introduces more complexity than it solves. So I'd go for option 2: a custom user table.
P.S. If anyone has been using the "ASP.NET membership provider" with success, please comment!
I've been trying to see if I can accomplish some requirements with a document based database, in this case CouchDB. Two generic requirements:
CRUD of entities with some fields which have unique index on it
ecommerce web app like eBay (better description here).
And I'm beginning to think that a document-based database isn't the best choice to address these requirements. Furthermore, I can't imagine a use for a document-based database (maybe my imagination is too limited).
Can you explain to me if I am asking pears from an elm when I try to use a Document oriented database for these requirements?
You need to think about how you approach the application in a document-oriented way. If you simply try to replicate how you would model the problem in an RDBMS then you will fail. There are also different trade-offs you might want to make. (Remember that CouchDB's design assumes you will have an active cluster of many nodes that could fail at any time. How is your app going to handle one of the database nodes disappearing from under it?)
One way to think about it is to imagine you didn't have any computers, just paper documents. How would you create an efficient business process using bits of paper being passed around? How can you avoid bottlenecks? What if something goes wrong?
Another angle you should think about is eventual consistency, where you get into a consistent state eventually but may be inconsistent for some period of time. This is anathema in RDBMS land, but extremely common in the real world. The canonical transaction example is transferring money between bank accounts. How does this actually happen in the real world: through a single atomic transaction, or through different banks issuing credit and debit notices to each other? What happens when you write a cheque?
So let's look at your examples:
CRUD of entities with some fields with unique index on it.
If I understand this correctly in CouchDB terms, you want to have a collection of documents where some named value is guaranteed to be unique across all those documents? That case isn't generally supportable because documents may be created on different replicas.
So we need to look at the real world problem and see if we can model that. Do you really need them to be unique? Can your application handle multiple docs with the same value? Do you need to assign a unique identifier? Can you do that deterministically? A common scenario where this is required is where you need a unique sequential identifier. This is tough to solve in a replicated environment. In fact if the unique id is required to be strictly sequential with respect to time created it's impossible if you need the id straight away. You need to relax at least one of those constraints.
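One concrete way to relax the constraint in CouchDB is to fold the unique value into the document _id, since _id is the one field the database itself keeps unique within a replica. A toy sketch, with a plain dict standing in for the database and all names invented:

```python
# A dict stands in for a CouchDB database: _id -> document.
db = {}

def put_doc(doc_id, doc):
    """Simulate CouchDB's behaviour: writing an existing _id conflicts."""
    if doc_id in db:
        raise ValueError(f"conflict: _id {doc_id!r} already exists")
    db[doc_id] = doc

def register_user(email, name):
    # Deterministic _id derived from the value that must be unique.
    put_doc(f"user:{email}", {"email": email, "name": name})

register_user("alice@example.com", "Alice")
try:
    register_user("alice@example.com", "Another Alice")
except ValueError as e:
    print(e)
```

Note the caveat from the paragraph above: across replicas this still only surfaces a clash as a conflict at replication time, so it relaxes rather than eliminates the coordination problem.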
ecommerce web app like eBay
I'm not sure what to add here as the last comment you made on that post was to say "very useful! thanks". Was there something missing from the approach outlined there that is still causing you a problem? I thought MrKurt's answer was pretty full and I added a little enhancement that would reduce contention.
Is there a need to normalize the data?
Yes: Use relational.
No: Use document.
I am in the same boat. I am loving CouchDB at the moment, and I think the whole functional style is great. But when exactly do we start to use it in earnest for applications? I mean, yes, we can all develop applications extremely quickly, cruft-free, with all those nasty hang-ups about normal form left by the wayside and no schemas to maintain. But, to coin a phrase, "we are standing on the shoulders of giants". There are good reasons to use an RDBMS, to normalise and to use schemas. My old Oracle head is reeling thinking about data without form.
My main wow factor on couchdb is the replication stuff and the versioning system working in tandem.
I have been racking my brain for the last month trying to grok the storage mechanisms of CouchDB. Apparently it uses B-trees but doesn't store data based on normal form. Does this mean that it is really, really smart and realises that bits of data are replicated, so it just makes a pointer to that B-tree entry?
So far I am thinking of xml documents, config files, resource files streamed to base64 strings.
But would I use CouchDB for structured data? I don't know; any help greatly appreciated on this.
Might be useful in storing RDF data or even free form text.
A possibility is to have a main relational database that stores definitions of items that can be retrieved by their IDs, and a document database for the descriptions and/or specifications of those items. For example, you could have a relational database with a Products table with the following fields:
ProductID
Description
UnitPrice
LotSize
Specifications
And that Specifications field would actually contain a reference to a document with the technical specifications of the product. This way, you have the best of both worlds.
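A rough sketch of that hybrid, using Python's built-in sqlite3 for the relational side and a plain dict standing in for the document store; the product data and doc-id scheme are invented for illustration:

```python
import sqlite3

# Relational side: core product fields, queried by ID.
sql = sqlite3.connect(":memory:")
sql.execute("""CREATE TABLE Products (
    ProductID INTEGER PRIMARY KEY,
    Description TEXT, UnitPrice REAL, LotSize INTEGER,
    Specifications TEXT  -- reference (document id) into the doc store
)""")

# Document side: free-form specifications, a dict standing in
# for a document database.
docs = {}

def add_product(pid, desc, price, lot, spec):
    doc_id = f"spec:{pid}"
    docs[doc_id] = spec                       # arbitrary nested structure
    sql.execute("INSERT INTO Products VALUES (?,?,?,?,?)",
                (pid, desc, price, lot, doc_id))

def get_specifications(pid):
    (doc_id,) = sql.execute(
        "SELECT Specifications FROM Products WHERE ProductID=?", (pid,)
    ).fetchone()
    return docs[doc_id]

add_product(1, "MRI scanner", 250000.0, 1,
            {"field_strength": "3T", "bore": {"diameter_cm": 70}})
print(get_specifications(1)["field_strength"])   # 3T
```

The relational table keeps the fields you index and join on, while each Specifications value is just a pointer into the document store, which is free to hold a different shape per product.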
Document-based DBs are best suited for storing, well, documents. Lotus Notes is a common implementation, and Notes email is an example. For what you are describing (eCommerce, CRUD, etc.), relational DBs are better designed for storage and retrieval of indexed data items/elements (as opposed to documents).
Re CRUD: the whole REST paradigm maps directly to CRUD (or vice versa). So if you know that you can model your requirements with resources (identifiable via URIs) and a basic set of operations (namely CRUD), you may be very near to a REST-based system, which quite a few document-oriented systems provide out of the box.
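As a minimal illustration of that mapping, here is a sketch in Python where a dict stands in for the resource store and each function is annotated with the HTTP verb and URI it would correspond to; all names are illustrative:

```python
# Minimal illustration of the CRUD <-> REST mapping for one
# resource type; a dict stands in for the document store.
store = {}
next_id = 0

def create(doc):                # POST /items
    global next_id
    next_id += 1
    store[next_id] = doc
    return next_id

def read(item_id):              # GET /items/{id}
    return store[item_id]

def update(item_id, doc):       # PUT /items/{id}
    store[item_id] = doc

def delete(item_id):            # DELETE /items/{id}
    del store[item_id]

item = create({"name": "widget"})
update(item, {"name": "gadget"})
print(read(item))               # {'name': 'gadget'}
delete(item)
```

Each resource is identified by a URI-like key and manipulated only through these four operations, which is the shape document-oriented systems tend to expose out of the box.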