Is it possible to create nested namespaces in a Google App Engine app?
Let's say I am modeling a multi-tenant application similar to Google Docs, so I obviously need one namespace per organization to avoid data leaking from one org to another. However, I probably also want a namespace per user, so that when I am searching for a document I don't have to search through all the documents of all the users in that organization, and again to avoid data leaks.
What's the best way to model this?
With namespaces just being strings (limited to [0-9A-Za-z._-]{0,100}), you could simply use "_" or similar as the separator for your sub-namespaces, so you'd have namespaces like "<organisation>_<user>".
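If you do go this route, a minimal sketch with the Java NamespaceManager might look like this (the helper class and method names are made up for illustration):

```java
import com.google.appengine.api.NamespaceManager;

// Sketch of composing namespaces with "_" as the separator.
// Assumes orgId and userId contain only valid namespace characters.
public class Namespaces {
    // Sets the org-wide namespace, e.g. "acme".
    public static void enterOrg(String orgId) {
        NamespaceManager.set(orgId);
    }

    // Sets the per-user sub-namespace, e.g. "acme_jdoe". The combined
    // string must still match [0-9A-Za-z._-]{0,100}.
    public static void enterOrgUser(String orgId, String userId) {
        NamespaceManager.set(orgId + "_" + userId);
    }
}
```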
But your argument
So I don't have to search through all the documents
doesn't seem like a strong enough reason to go down this route. Your code will become more complex (and as a result more likely to leak data) if you are constantly having to switch between namespaces to get at data that is organisation-wide vs. user-wide. And performance will be no better than if you simply filtered your list of documents on a user-id field.
We are now developing an application that uses GAE Datastore and trying to implement Multitenancy.
Our customers are companies, so we are going to create namespaces on a per-company basis.
My question is how should we treat company mergers and separations.
For example, when two of our customers merge, the data under their two namespaces should be migrated into a single namespace. When a customer splits into two companies, some of the data should be migrated into another namespace. This takes a lot of effort, and we would like to avoid these operations.
How can we handle these cases smoothly? Or are namespaces even suitable for a per-company basis? If not, how should we implement per-company multitenancy?
The general way this is handled is by creating a job that handles mergers as a batch process, doing a read-write-delete from the old keys to the new keys as part of a transaction. Generally you'll also have a bunch of business rules to apply as part of the processing, on top of the basic rekeying. For example, how will you handle two users having the same username?
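A hedged sketch of one such rekeying step, using the low-level Java Datastore API and assuming string-named keys (a real job would batch this from a query and layer the business rules on top):

```java
import com.google.appengine.api.NamespaceManager;
import com.google.appengine.api.datastore.*;

// One read-write-delete step: copy an entity from its old namespace
// into the target namespace and delete the original.
public class MergeStep {
    private final DatastoreService ds = DatastoreServiceFactory.getDatastoreService();

    void migrate(Key oldKey, String targetNamespace) throws EntityNotFoundException {
        String previousNs = NamespaceManager.get();
        // XG transaction: source and target live in different namespaces,
        // hence different entity groups.
        Transaction txn = ds.beginTransaction(TransactionOptions.Builder.withXG(true));
        try {
            Entity old = ds.get(txn, oldKey);
            NamespaceManager.set(targetNamespace);
            Entity copy = new Entity(old.getKind(), old.getKey().getName());
            copy.setPropertiesFrom(old);
            ds.put(txn, copy);
            ds.delete(txn, oldKey);
            txn.commit();
        } finally {
            NamespaceManager.set(previousNs);
            if (txn.isActive()) txn.rollback();
        }
    }
}
```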
Cloud Dataflow (Java and Python connectors are available) is a good tool for doing this.
Mergers are messy when it comes to data in most cases, so it isn't really namespaces that prevent a simpler solution.
I'm designing yet another "Find Objects near my location" web site and mobile app.
My requirements are:
Store up to 100k objects;
Query for objects that are close to a point (my location, a city, etc.), combined with other search criteria (like object type);
Display results on Google Maps with smooth performance;
Let users filter objects by object type.
I'm thinking about using Google App Engine for this project.
Could you recommend the best data storage option for this?
A couple of words about a dynamic data loading strategy would also help.
I feel a bit overwhelmed by the options at the moment and am looking for hints on where to continue my research.
Thanks a lot!
I'm going to assume that you are using the Datastore. I'm not familiar with Google Cloud SQL (which I believe aims to offer MySQL-like features in the cloud), so I can't say whether it can do geospatial queries.
I've been looking into the whole "get locations in proximity of a location" problem for a while now. I have some good and bad news for you, unfortunately.
The best way to do proximity search in the Google environment is via the Search Service (https://developers.google.com/appengine/docs/python/search/; there is an equivalent Java API). The reason is that it supports a "geopoint" field type and lets you query on it.
OK, cool, so there is support, right? However, "a query is complex if its query string includes the name of a geopoint field or at least one OR or NOT boolean operator". The free quota for complex search queries is 100/day; beyond that, they cost 60 cents per 10,000 queries. Depending on your application, this may be an issue.
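For reference, a proximity query with the Java Search API looks roughly like this (the index name "places" and the geopoint field "location" are assumptions):

```java
import com.google.appengine.api.search.*;

// Hedged sketch of a geopoint proximity query via the Search Service.
public class GeoSearch {
    public static Results<ScoredDocument> nearby(double lat, double lng, int meters) {
        Index index = SearchServiceFactory.getSearchService()
                .getIndex(IndexSpec.newBuilder().setName("places").build());
        // distance(...) makes this a "complex" query, so it counts
        // against the paid quota mentioned above.
        String query = String.format(
                "distance(location, geopoint(%f, %f)) < %d", lat, lng, meters);
        return index.search(query);
    }
}
```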
I'm not too familiar with the Google Maps API, but you might be able to pull off something like this: https://developers.google.com/maps/articles/phpsqlsearch_v3
My current project/problem involves moving locations rather than "static" ones (stores, landmarks, etc.), so I've decided to go with Amazon's DynamoDB, which has a library that supports geospatial indexing: http://aws.amazon.com/about-aws/whats-new/2013/09/05/announcing-amazon-dynamodb-geospatial-indexing/
I'm working on a web application that allows users to create simple websites and publish them on a static web host, and I have problems deciding if and how I should use ancestors in the gray area between necessary and avoidable.
The model is rather simple: currently, every User has one or more Website entities. The Website entities store all the basic information about a website, plus its nested navigation menu, which refers to Page entities (the navigation tree is stored as a JSON property). The Page entity types are based on a PolyModel, and there are several page types that behave differently (there's a GalleryPage, for example).
There are no entity groups (or rather, no entities with ancestors) as of yet, and I'll only need a couple of transactions. When updating a Page's name, for example, I have to update it in the Page entity itself as well as in the navigation tree on the Website entity.
I think I understand how entity groups work and the basic implications of using them, but I have trouble deciding on the "best" way to structure my data in the absence of strong reasons for either approach. I could:
Go entirely without ancestors on my entities. As far as I understand, I can still use cross-group transactions as long as I get the entities by key and don't need more than 5 within the transaction. The downside is that I'd depend on XG transactions, and there might come a point where I can't ninja my way around ancestor queries anymore (and by then it might be too late).
Make the user object the parent of all his Websites, Pages, and other data. This would give the user a strongly consistent view of all of his data and allow me to use transactions whenever I add a feature that needs them, but it limits sustained writes to 1-5/sec per entity group. Since a user will only ever be updating his own data, though, this might actually work and behave just the same for 1000 users as it does for 1 (a sketch follows after these options).
Try to use even smaller entity groups (like separating the navigation from the Website and making that the parent of the Website's Pages). But I'm not quite sure there's much benefit to this, because most of the editing happens on Pages anyway.
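For concreteness, here's roughly what option 2 would look like with the low-level Java datastore API (a sketch; kind names and id types are illustrative):

```java
import com.google.appengine.api.datastore.*;

// Option 2: every Website and Page key gets the User key as its
// ancestor, so all of a user's data shares one entity group.
public class Keys {
    static Key userKey(String userId) {
        return KeyFactory.createKey("User", userId);
    }

    static Key websiteKey(String userId, long websiteId) {
        return KeyFactory.createKey(userKey(userId), "Website", websiteId);
    }

    static Query pagesOfUser(String userId) {
        // Ancestor queries are strongly consistent and can run inside
        // transactions, at the cost of the per-group write limit.
        return new Query("Page").setAncestor(userKey(userId));
    }
}
```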
So I guess the real question is: how do you decide when to use ancestor relationships on App Engine when there's no obvious reason for or against them? Would you go for the convenience of strongly consistent queries and being able to use transactions freely while adding features later, or would you avoid them at all costs until there's a very obvious reason for them, even if that might limit my ability to do transactions later?
I read the related documentation, read the chapter on transactions in "Programming Google App Engine", and watched quite a few of the Google I/O videos, but I still find it hard to make that decision.
I am trying to figure out the best way to deploy a single Google App Engine application across multiple regions.
The same code is to be used, but the stored data is specific to each region. Motivating examples are hyperlocal review sites, like yelp.com or urbanspoon, where restaurants and other businesses to review are specific to a region (e.g. boston.app.com, seattle.app.com).
A couple of options include:
1. Create multiple GAE applications and duplicate the code across them.
2. Create a single GAE application and store all data for all regions in the same Datastore, with a region identifier field on each model delimiting the relevant region.
Some of the trade-offs:
Option 2 seems like it will be increasingly inefficient (space: replicating a region identifier on every record of every model; time: filtering/indexing on that identifier for every query).
Option 1 requires an app ID for every region, while GAE only allows 10 apps per account. Moreover, deploying the code to every region, as well as Datastore migration, seems like it could be a pain to manage.
In the ideal world, I would have a single application instance. From that instance, I could route between subdomains (like here), as well as have a separate Datastore for each subdomain. But I believe GAE only allows a single datastore per application.
Does anyone have ideas on the best way to solve this problem? Or options that I am not considering?
Thanks for your time!
I would recommend your approach #2. Storage space is cheap (and region codes are short), and datastore performance does not degrade with size, unlike most databases. Using a single app also makes for easier management and upgrades, and avoids any issues with the TOS (which prohibit sharding your app to avoid billing charges).
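The per-entity cost really is just one short property and one equality filter. A sketch with the low-level Java Datastore API (the kind and property names are illustrative):

```java
import com.google.appengine.api.datastore.Query;
import com.google.appengine.api.datastore.Query.FilterOperator;
import com.google.appengine.api.datastore.Query.FilterPredicate;

// Approach #2: one app, one Datastore, and a short "region" property
// on every entity that each query filters on.
public class RegionalQueries {
    static Query businessesIn(String region) {
        return new Query("Business")
                .setFilter(new FilterPredicate("region", FilterOperator.EQUAL, region));
    }
}
```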
If you use source code revision control, it is not too bad to push identical code to multiple apps. You could set a policy whereby only full-fledged tags are allowed to be pushed up to GAE. Another option is to make your application version the same as the revision number.
With App Engine, I (and I believe most others) always migrate data from within my model code. You can't easily do bulk migrations in GAE, so the usual solution is to migrate data as you come across it in code. That way, you can keep your models pretty much identical across applications.
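A minimal sketch of that migrate-on-read pattern, assuming a hypothetical schemaVersion property and a made-up property rename:

```java
import com.google.appengine.api.datastore.*;

// Migrate-on-read: when an entity is loaded with an old schema
// version, upgrade it in place and write it back.
public class LazyMigration {
    private static final long CURRENT_VERSION = 2;

    static Entity load(DatastoreService ds, Key key) throws EntityNotFoundException {
        Entity e = ds.get(key);
        Long version = (Long) e.getProperty("schemaVersion");
        if (version == null || version < CURRENT_VERSION) {
            // Example upgrade: a property renamed in version 2.
            if (e.hasProperty("town")) {
                e.setProperty("city", e.getProperty("town"));
                e.removeProperty("town");
            }
            e.setProperty("schemaVersion", CURRENT_VERSION);
            ds.put(e);
        }
        return e;
    }
}
```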
Having said that, I would probably still go with a unified application. It's more future-proof. What if users want to join their L.A. identity and their New York identity? Or what if an advertiser offers you a sweet deal for you to run some marketing reports on your own data?
Finally, a few extra bytes of data per record don't matter much on App Engine. As your site grows, you will very quickly discover that you are always bumping into ceilings. GAE limits are extremely small compared to a traditional web server, so you will have to work within those limits anyway. For example, you can only fetch 1,000 records at a time, so your architecture will already need to support piecemeal paging. Don't worry too much about an extra field or two in your records.
I've thought about this too much now without finding an obviously correct solution. It might be a real wood-for-the-trees situation, so I need Stack Overflow's help.
I'm trying to enforce database filtering on a regional basis. My system has various users and each one is assigned to a regional office. I only want users to be able to see data that is associated with their regional office.
Put simply my application is: Java App -> JPA (hibernate) -> MySQL
The database contains object from all regions, but I only want the users to be able to manipulate objects from their own region. I've thought about the following ways of doing it:
1) Modify all database queries so they read something like select * from tablex where region='myregion'. This is nasty. It doesn't work too well with JPA, e.g. the EntityManager.find() method only accepts a primary key. Of course I can go native, but I only have to miss one select statement and my security is shot.
2) Use a MySQL proxy to filter results. Kind of funky, but the proxy just sees the raw call and doesn't really know how it should filter the results (i.e. which region the user who made the request belongs to). OK, I could start a proxy for each region, but it starts getting a little messy.
3) Use separate schemas for each region. Simple, and since I'm using Spring I could use a RoutingDataSource to route requests via the correct datasource (one datasource per schema). Of course, the problem is that somewhere down the line I'm going to want to filter by region and some other category. Oops.
4) ACLs. I'm not really sure about this one. If I did a select * from tablex, would it quietly filter out objects I don't have access to, or would a load of access exceptions be thrown?
But am I overthinking this? It seems like a really common problem, so there must be some easy solution I'm just too dumb to see. I'm sure it'll be something close to, or in, the database, as you want to filter as near to the source as possible. But what?
I'm not looking to be spoonfed: any links, keywords, ideas, or commercial/open-source product suggestions would be really appreciated! Thanks.
I've just been implementing something similar (REALbasic talking to MySQL) over the last couple of weeks for a hierarchical multi-company extension to an accounting package.
There's a large body of existing code that composes SQL statements, so we had to live with that and just do a lot of auditing to ensure the restrictions were included for each table as appropriate. One gotcha was related lookups: lookup tables were normally only used in combination with a primary table, but some maintenance GUIs would load the lookup table itself, directly.
There's a danger of giving away implied information such as revealing that Acme Pornstars are a client of some division of the company ;-)
The only solution for that part was very careful construction of DB diagrams to show all implied relationships and lots of auditing and grepping source code, with careful commenting to indicate areas which had been OK'd as not needing additional restrictions.
The one pattern I've come up with to make this more general in future is, rather than explicit region = currentRegionVar searches, to use an arbitrary entityID supplied by a global CurrentEntityForRole("blah") function.
This abstraction allows for sharing of some data as well as implementing pseudo-entities which represent other restriction boundaries.
I don't know enough about Java and Spring to be able to tell, but could you use views to provide a single-key lookup, where the views are restricted by the region filter?
The desire to provide aggregations and possible data sharing was why we didn't go down the separate database route.
Good Question.
Seems like #1 is the best, since it's the most flexible.
Region happens to be what you're filtering on today, but it could be region + department + colour of hair tomorrow.
If you start carving up the data too much, it seems like you'll be stuck working harder than necessary to glue it all back together for reporting.
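One way to make #1 less fragile is to centralise the restriction in the ORM rather than in each query. A hedged sketch, assuming Hibernate 5 (entity and column names are made up):

```java
import javax.persistence.Entity;
import javax.persistence.Id;

import org.hibernate.Session;
import org.hibernate.annotations.Filter;
import org.hibernate.annotations.FilterDef;
import org.hibernate.annotations.ParamDef;

// A Hibernate filter appends the region predicate to every query on
// the annotated entity, so individual queries can't forget it.
@Entity
@FilterDef(name = "regionFilter",
           parameters = @ParamDef(name = "region", type = "string"))
@Filter(name = "regionFilter", condition = "region = :region")
class Customer {
    @Id private Long id;
    private String region;
}

class RegionFilterEnabler {
    // Enable once per request, e.g. from a Spring interceptor.
    static void enable(Session session, String userRegion) {
        session.enableFilter("regionFilter").setParameter("region", userRegion);
    }
}
```

One caveat: Hibernate filters apply to queries but not to Session.get()/EntityManager.find() by primary key, so key-based lookups still need their own check.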
I am having the same problem. It is hard to believe that such a common task (filtering a list of model entities based on the user's profile) has no 'standard' way, pattern, or best practice.
I've found pgacl, a PostgreSQL module. Basically, you write your query as you normally would, then tack on an acl_access() predicate to act as a filter.
Maybe there is something similar for MySQL.
I suggest you use ACLs; they are more flexible than the other choices. Use Spring Security: you can use it without the rest of the Spring Framework. Read the tutorial from link text.