Google App engine - Search API 10GB per-index limit - google-app-engine

In the app engine release notes 1.9.0 is stated:
"The size limit on the Search API is now computed and enforced on a per-index basis, rather than for the app as a whole. The per-index limit is now 10GB. There is no fixed limit on the number of indexes, or on the total amount of Search API storage an application may use."
The Search API is now in experimental state, but I would like to know if the 10GB per-index limit will be removed when the Search API will be out of experimental (or at least, will be replaced with a much larger one).

As indicated in Search API quotas, the current quota is still 10 GB per index. There are no public plans to increase this quota at the time of this writing.
Should this quota increase be desired, feel free to file a new feature request on the App Engine public issue tracker detailing a thorough business case. You may also want to consider supporting this existing and related issue: Issue 10667: Search API multiple index search.

Related

Throttling limits for Azure Search

I'm looking for throttling information and this is the best that I've been able to find so far: https://learn.microsoft.com/en-us/azure/search/search-limits-quotas-capacity#throttling-limits
For doing a search
https://{{search-service}}.search.windows.net/indexes/:index/docs?api-version={{version}}&search=some text
Is this line from the reference page above the limit for searches?
Get Index (GET /indexes/myindex): 10 per second per search unit
I'm trying to see what the limit is for searching only under ideal scenario of nothing else happening such as an indexer running.
Some APIs such as GET /indexes are throttled based on simple rate limits. However, queries and indexing requests do not work this way. In the case of those APIs, throttling happens dynamically based on resource availability. If the system's internal queues start to fill, requests will begin to fail with 503 (Service Unavailable). If enough such failures happen within a discrete period of time (calculated as an average over a rolling window), the service will throttle requests in order to relieve pressure and allow the system to recover.
The reason throttling works this way instead of based on static rate limits is that most Azure Cognitive Search pricing tiers (other than Free) give you dedicated capacity. Static rate limits could artificially limit how you use your own capacity, so instead throttling dynamically applies backpressure as a way to ensure the reliability of the service when its capacity is overloaded.
For more information about testing and performance tuning Azure Cognitive Search, see this article.
For Azure search, there are 2 kinds of APIs: Query APIs (Search/Suggest/Autocomplete) and Index APIs .
The one you mentioned belongs to Index APIs:
Get Index (GET /indexes/myindex): 10 per second per search unit
If you want to know Query APIs(searching) limit (QPS limit), this doc will be helpful:

google appstore, how to split fees per datastore namespace

I'd like to make a GAE app multi-tenant to cater to different clients (companies), database namespaces seems like a GAE endorsed solution. Is there a meaningful way to split GAE fees among client/namespaces? GAE costs for app are mainly depends on user activities - backend instances up time, because new instances are created or (after 15 min delay) terminated proportionally to the server load, not total volume of data user has or created. Ideally the way the fees are split should be meaningful and could be explained to the clients.
I guess the most fair fee splitting solution is just create a new app for a new client, so all costs reported separately, yet total cost will grow up, I expect few apps running on same instances will use server resources more economically.
Every app engine request is logged with a rough estimated cost measurement. It is possible to log the namespace/client associated with every request and query the logs to add up the estimated instance costs for that namespace. Note that the estimated cost field is deprecated and may be inaccurate. It is mostly useful as a rough guide to the proportion of instance cost associated with each client.
As far as datastore pricing goes, the cloud console will tell you how much data has been stored in each namespace, and you can calculate costs from that. For reads/writes, we have set up a logging system to help us track reads and writes per namespace (i.e. every request tracks the number of datastore reads and writes it does in each namespace and logs these numbers at the end of the request).
The bottom line is that with some investments into infrastructure and logging, it is possible to roughly track costs per namespace. But no, App Engine does not make this easy, and it may be impossible to calculate very accurate cost estimates.

Datastore partition using namespace api

I want to use Multitenancy feature of GAE in my app's datastore. Here they have given that
Using the Namespaces API, you can easily partition data across tenants simply by specifying a unique namespace string for each tenant.
so my questions are,
[1.]How many partition(of one datastore) can be created using namespace api?
[2.]Is there any limit on the size of each partition?
[3.]how would I know if size of a partition is increased beyond GAE free quota?
i'll really appreciate any help to clear this doubt.
thanks in advance!!!!
How many partition(of one datastore) can be created using namespace api?
Namespaces help you increase your app scalability, you have no limit number
Is there any limit on the size of each partition?
App engine Free quota is fixed, it's the only limit. If you need to activate billing, you'll have to fix the limit budget. App engine offers you a very high scalability
how would I know if size of a partition is increased beyond GAE free quota?
As for second question, you have a quota for the whole application, not per namespace. If you consume all the free resources your app will throw an error instead of serving the appropriate handler, until the quota is replenished

Is GAE optimized for database-heavy applications?

I'm writing a very limited-purpose web application that stores about 10-20k user-submitted articles (typically 500-700 words). At any time, any user should be able to perform searches on tags and keywords, edit any part of any article (metadata, text, or tags), or download a copy of the entire database that is recent up-to-the-hour. (It can be from a cache as long as it is updated hourly.) Activity tends to happen in a few unpredictable spikes over a day (wherein many users download the entire database simultaneously requiring 100% availability and fast downloads) and itermittent weeks of low activity. This usage pattern is set in stone.
Is GAE a wise choice for this application? It appeals to me for its low cost (hopefully free), elasticity of scale, and professional management of most of the stack. I like the idea of an app engine as an alternative to a host. However, the excessive limitations and quotas on all manner of datastore usage concern me, as does the trade-off between strong and eventual consistency imposed by the datastore's distributed architecture.
Is there a way to fit this application into GAE? Should I use the ndb API instead of the plain datastore API? Or are the requirements so data-intensive that GAE is more expensive than hosts like Webfaction?
As long as you don't require full text search on the articles (which is currently still marked as experimental and limited to ~1000 queries per day), your usage scenario sounds like it would fit just fine in App Engine.
stores about 10-20k user-submitted articles (typically 500-700 words)
Maximum entity size in App Engine is 1 MB, so as long as the total size of the article is lower than that, it should not be a problem. Also, the cost for reading data in is not tied to the size of the entity but to the number of entities being read.
At any time, any user should be able to perform searches on tags and keywords.
Again, as long as the search on the tags and keywords are not full text searches, App Engine's datastore queries could handle these kind of searches efficiently. If you want to search on both tags and keywords at the same time, you would need to build a composite index for both fields. This could increase your write cost.
download a copy of the entire database that is recent up-to-the-hour.
You could use cron/scheduled task to schedule a hourly dump to the blobstore. The cron could be targeted to a backend instance if your dump takes more than 60 seconds to be finished. Do remember that with each dump, you would need to read all entities in the database, and this means 10-20k read ops per hour. You could add a timestamp field to your entity, and have your dump servlet query for anything newer than the last dump instead to save up read ops.
Activity tends to happen in a few unpredictable spikes over a day (wherein many users download the entire database simultaneously requiring 100% availability and fast downloads) and itermittent weeks of low activity.
This is where GAE shines, you could have very efficient instance usages with GAE in this case.
I don't think your application is particularly "database-heavy".
500-700 words is only a few KB of data.
I think GAE is a good fit.
You could store each article as a textproperty on an entity, with tags in a listproperty. For searching text you could use the search service https://developers.google.com/appengine/docs/python/search/ (which currently has quota limits).
Not 100% sure about downloading all the data, but I think you could store all the data in the blobstore (possibly as pdf?) and then allow users to download that blob.
I would choose NDB over regular datastore, mostly for the built-in async functionality and caching.
Regarding staying below quota, it depends on how many people are accessing the site and how much data they download/upload.

Will full-text search index be limited in size?

We are about to migrate our app to FTS to replace datastore indexes.
Will there be a limit in index size once the FTS is our of experimental state or is it unlimited like the datastore?
Once the Search API is out of experimental, there should be no limit on index size.
As of App Engine version 1.9.0: "The size limit on the Search API is now computed and enforced on a per-index basis, rather than for the app as a whole. The per-index limit is now 10GB. There is no fixed limit on the number of indexes, or on the total amount of Search API storage an application may use."

Resources