The only two answers on here involve essentially restructuring the database to accommodate this limitation, but I am unsure how to do that in my case.
I have a list of thousands of contacts, each with many many properties. I'm making a page that has an ability to filter on multiple properties at once.
For example: Age < 15, Date Added > 15 days ago, Location == Santa Cruz, etc. Potentially a ton of inequality filters required. How does one achieve this in GAE?
According to the docs (for python),
Limitations: The Datastore enforces some restrictions on queries.
Violating these will cause it to raise exceptions. For example,
combining too many filters, using inequalities for multiple
properties, or combining an inequality with a sort order on a
different property are all currently disallowed. Also filters
referencing multiple properties sometimes require secondary indexes to
be configured.
If you check back in a few months, this may change. GAE keeps changing pretty quickly.
For now, though, you'll have to make multiple queries and combine them in your code.
Related
I'm evaluating CouchBase for an application, and trying to figure out something about range queries on views. I know I can do a view get for a single key, multiple keys, or a range. Can I do a get for multiple ranges? i.e. I want to retrieve items with view key 0-10, 50-100, 5238-81902. I might simultaneously need 100 different ranges, so having to make 100 requests to the database seems like a lot of overhead.
As far as I know in couchbase there is no way to implement getting values from multiple ranges with one view. May be there are (or will be implemented in future) some features in Couchbase N1QL, but I didn't work with it.
Answering your question 100 requests will not be a big overhead. Couchbase is quiet fast and it's designed to handle a lot of operations per second. Also, if your view is correctly designed, it will not be "recalculated" on each query.
Also there is another way:
1. Determine minimum and maximum value of your range (it will be 0..81902 according to your example)
2. Query view that will return only document ids and a value that range was based on, without including all docs in result.
3. On client side filter array of results from previous step according to your ranges (0-10, 50-100, 5238-81902)
and then use getMulti with document ids that left in array.
I don't know your data structure, so you can try both ways, test them and choose the best one that will fit your demands.
I have an application that is essentially built out of many smaller applications. Each application has their own individual preferences, but all of them share the same 5 preferences, for example, whether the application is displayed in the nav, whether it is public, whether reports should be generated, etc.
All of these common preferences need to be known by any page in the web app because the navigation is constructed from it. So originally I put all these preferences in a single table. However as the number of applications grow (10 now, eventually around 30), the number of columns will end up being around 150-200 total. Most of these columns are just booleans, but it still worries me having that many columns in one table. On the other hand, if I were to split them apart into separate tables (preferences per app), I'd have to join them all together anyway every time I need to see the preferences, so why not just leave them all together?
In the application I can break the preferences into smaller objects so they are easier to work with, but from a db perspective they are a single entity. Is it better to leave them in one giant table, or break them apart into smaller ones but force many joins every time they are requested?
Which database engine are you using ? normally you will find some recommendations about recommended number of columns per table in your DB engine. Mostly Row size limitations, which should keep you safe.
Other options and suggestions include:
Assign a bit per config key in an integer, and use the logical "AND" operation to show only the key you are interested in at a given point in time. Single value read from DB, one quick Logical operation for each read of a config key.
Caching the preferences in memory, less round trips to DB servers, Based on frequency of changes , you may also having to clear the cache of each preference when it is updated.
Why not turn the columns into rows and use something like this:
This is a typical approach for maintaining lists of settings values.
The APP_SETTING table contains the value of the setting. The SETTING table gives you the context of what the setting is.
There are ways of extending this to add information such as which settings apply to which applications and whether or not the possible values for a particular setting are constrained to a specific list.
Well CommonPreferences and ApplicationPreferences would certainly make sense, and perhaps even segregating them in code (two queries instead of a join).
After that a table per application will make more sense.
Another way is going down the route suggested By Joel Brown.
A third would be instead of having individual colums or row per setting, you stuff all the non-common ones in to an xml snippet or serialise from a preferences class.
Which decision you make revolves around how your application does (or could use the data).
If you go down the settings table approach getting application settings as a row will be 'erm painful. Go down the xml snippet route and querying for a setting across applications will be even more painful than several joins.
No way to say what you should compromise on from here. I think I'd go for CommonPreferences first and see where I was at after that.
From the appengine blog:
Advanced Query Planning - We are removing the need for exploding indexes and reducing the custom index requirements for many queries. The SDK will suggest better indexes in several cases and an upcoming article will describe what further optimizations are possible.
As a test, I have an Entity in appengine that has a listProperty
class Entity(db.Model):
tags = db.StringListProperty()
I have 500,000 entities, half of them have tags = ['1'], and the other half have tags = ['2']
my query is
SELECT FROM Entity WHERE tags='1' and tags='2'
It returns no results really quickly. What plan is it using to achieve this? How is the list indexed to achieve this? In the old days, an exploding index would have been needed.
The algorithm used internally ('merge-join') was described in the Google I/O 2009 tech talk Building Scalable, Complex Apps on App Engine. This functionality has also been available since the launch of GAE; the 'exploding indexes' only happen if you create a compound index of multiple StringListProperties.
It's worth noting that this functionality is actually a bit more general than you may realize - any combination of multiple equality filters on any arbitrary combination of properties can be satisfied without any compound indices, provided they're all equality filters and you don't have a sort order. They don't all have to be from a StringListProperty, and can even be split across multiple StringListProperty.
We have many years of weather data that we need to build a reporting app on. Weather data has many fields of different types e.g. city, state, country, zipcode, latitude, longitude, temperature (hi/lo), temperature (avg), preciptation, wind speed, date etc. etc.
Our reports require that we choose combinations of these fields then sort, search and filter on them e.g.
WeatherData.all().filter('avg_temp =',20).filter('city','palo alto').filter('hi_temp',30).order('date').fetch(100)
or
WeatherData.all().filter('lo_temp =',20).filter('city','palo alto').filter('hi_temp',30).order('date').fetch(100)
May be easy to see that these queries require different indexes. May also be obvious that the 200 index limit can be crossed very very easily with any such data model where a combination of fields will be used to filter, sort and search entities. Finally, the number of entities in such a data model can obviously run into millions considering that there are many cities and we could do hourly data instead of daily.
Can anyone recommend a way to model this data which allows for all the queries to still be run, at the same time staying well under the 200 index limit? The write-cost in this model is not as big a deal but we need super fast reads.
Your best option is to rely on the built-in support for merge join queries, which can satisfy these queries without an index per combination. All you need to do is define one index per field you want to filter on and sort order (if that's always date, then you're down to one index per field). See this part of the docs for details.
I know it seems counter-intuitive but you can use a full-text search system that supports categories (properties/whatever) to do something like this as long as you are primarily using equality filters. There are ways to get inequality filters to work but they are often limited. The faceting features can be useful too.
The upcoming Google Search API
IndexTank is the service I currently use
EDIT:
Yup, this is totally a hackish solution. The documents I am using it for are already in my search index and I am almost always also filtering on search terms.
Hey,
I'm using AppEngine for an application that I'm writing. So I need to assign tags each object. I wanted to know what is the best way of doing this.
Should I create a space seperated string of tags and then query something like %search_tag% (I'm not sure if you can do that in JDOQL)?
What other options do I have ?
Should I create another class which will map every object to a tag?
Which would be the best from the point of view of scalability, performance and ease of use?
Thanks
First, '%search_tag%' type 'LIKE' queries do not work on App Engine's datastore. The best you can do is a prefix search.
It is difficult to answer very general questions like this. The best solution will depend on several factors, how many tags do you expect per entity? Is there a limit to the number of tags? How will you use the tags? For searching? For display only? The answers to all these questions impact how you should design your models.
One general solution for tagging is to use a multi-valued property, such as a list of tags.
http://code.google.com/appengine/docs/java/datastore/dataclasses.html#Collections
Be aware, if you will have many tags on your entities it will add overhead at write time, since the indexes writes need time too. Also, you should try to avoid using multi-valued properties multiple times (or multiple multi-value properties) in queries with inequalities or orders. That can lead to 'exploding indexes,' since one index row gets written for every combination of the indexed fields.