I'm looking for throttling information and this is the best that I've been able to find so far: https://learn.microsoft.com/en-us/azure/search/search-limits-quotas-capacity#throttling-limits
For a search, the request looks like this:
https://{{search-service}}.search.windows.net/indexes/:index/docs?api-version={{version}}&search=some text
Is this line from the reference page above the limit that applies to searches?
Get Index (GET /indexes/myindex): 10 per second per search unit
I'm trying to find out what the limit is for searching only, under the ideal scenario where nothing else is happening, such as an indexer running.
Some APIs such as GET /indexes are throttled based on simple rate limits. However, queries and indexing requests do not work this way. In the case of those APIs, throttling happens dynamically based on resource availability. If the system's internal queues start to fill, requests will begin to fail with 503 (Service Unavailable). If enough such failures happen within a discrete period of time (calculated as an average over a rolling window), the service will throttle requests in order to relieve pressure and allow the system to recover.
The reason throttling works this way instead of based on static rate limits is that most Azure Cognitive Search pricing tiers (other than Free) give you dedicated capacity. Static rate limits could artificially limit how you use your own capacity, so instead throttling dynamically applies backpressure as a way to ensure the reliability of the service when its capacity is overloaded.
For more information about testing and performance tuning Azure Cognitive Search, see this article.
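To give a sense of what handling this looks like from the client side, here is a minimal sketch, assuming the Python requests library and placeholder values for the service name, index, API version, and query key, that retries a query with exponential backoff when the service sheds load with 503:

    import time
    import requests

    SEARCH_URL = "https://<search-service>.search.windows.net/indexes/<index>/docs"  # placeholders
    API_KEY = "<query-api-key>"  # placeholder

    def search_with_backoff(text, max_retries=5):
        # Retry the query when the service sheds load with 503 (Service Unavailable).
        params = {"api-version": "2023-11-01", "search": text}
        headers = {"api-key": API_KEY}
        delay = 1.0
        for _ in range(max_retries):
            response = requests.get(SEARCH_URL, params=params, headers=headers)
            if response.status_code != 503:
                response.raise_for_status()
                return response.json()
            time.sleep(delay)  # back off so the service has time to recover
            delay *= 2
        raise RuntimeError("search service still throttling after retries")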
For Azure Search, there are two kinds of APIs: Query APIs (Search/Suggest/Autocomplete) and Index APIs.
The one you mentioned belongs to the Index APIs:
Get Index (GET /indexes/myindex): 10 per second per search unit
If you want to know the limit for the Query APIs (searching), i.e. the QPS limit, this doc will be helpful:
I am from the team that runs nuget.org, the package ecosystem for .NET. We use Azure Search to power our search API. Our APIs are public, so third-party customers can use them to analyze our ecosystem or make apps.
We recently had an outage caused by a single customer paging through our search documents using the $skip and $top query parameters in batches of 200 documents at a time. This resulted in Azure Search throttling:
Failed to execute request because the request rate has caused your service to exceed the limits of its provisioned capacity. Reduce the rate of requests, or adjust the number of replicas/partitions. See http://aka.ms/azure-search-throttling for more information.
Azure Search's throttling affected all customers in that region for 10 minutes, not just the single customer that was paging. We read through Azure Search's throttling documentation, but have the following questions:
Is customer paging with high $skip values particularly expensive for Azure Search?
What can we do to reduce the likelihood of Azure Search throttling for paging scenarios?
Should we add our own throttling to ensure a single customer's searches don't affect all other customers' searches? Does Azure Search have guidance on this?
Some more information about our service:
Number of documents in index: ~950K
Request volume: 1.3K paging requests in ~10 minutes. Peak of 125 requests per second, average of 6 requests per second
Scale: standard SKU, 1 partition, 3 replicas (this is our secondary region, hence the smaller scale to save money)
Deep paging is indeed a costly operation. Since Azure Search is designed to be distributed, every index is divided into multiple shards to allow for quick scale operations. This comes with the downside that the ranked results from each shard need to be merged and ranked to produce the final result list. The number of results to merge increases linearly with the skip value, so that step can become expensive when paging very deep into the results.
As a search service, Azure Search is optimized for quick retrieval of top documents based on textual relevance. It's unfortunately not the best tool for scenarios where a client simply wants to return a list of all documents in a data source.
From what I understand in your post, there are two reasons for the throttling:
High skip values
Sharp increase in QPS
We encourage you to control both. It is not uncommon for our customers to implement their own throttling logic to prevent their own customers from emitting an abnormally large number of requests. Even without skip values, having a single customer send enough queries to increase the traffic multiple-fold can lead to throttling (I'm not sure if that was the case here).
There are no official guidelines on how to handle queries coming from your client apps. The best approach, in my opinion, would be for your team to run performance tests using realistic workloads to understand the limits of your search service (which depend on the index schema, number of documents, type of queries being emitted, etc.). Once you have a good idea of how many QPS your service can handle for your scenarios, you can decide how much of that QPS you are willing to allocate to a single customer at a time, and enforce a limit based on that.
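There is no single right way to do this, but as an illustration only (not official guidance, and with placeholder numbers you would calibrate against your own load tests), a per-customer limit can be as simple as a token bucket keyed by customer ID:

    import time
    from collections import defaultdict

    class PerCustomerLimiter:
        # Each customer gets `rate` requests/second with bursts up to `burst`.
        # The defaults are placeholders to tune with your own performance tests.
        def __init__(self, rate=5.0, burst=10):
            self.rate = rate
            self.burst = burst
            self.buckets = defaultdict(lambda: {"tokens": burst, "last": time.monotonic()})

        def allow(self, customer_id):
            bucket = self.buckets[customer_id]
            now = time.monotonic()
            # Refill tokens for the time elapsed since this customer's last request.
            bucket["tokens"] = min(self.burst, bucket["tokens"] + (now - bucket["last"]) * self.rate)
            bucket["last"] = now
            if bucket["tokens"] >= 1:
                bucket["tokens"] -= 1
                return True
            return False  # caller would respond with 429 / Retry-After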
Regarding the deep paging cost: if this is a common scenario for your customers (paging through all documents of a search index), I would recommend you expose a way to page through all documents directly from the data source (assuming Azure Search is not the primary data store of the documents), and mostly use Azure Search for relevance related retrieval scenarios only.
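If some enumeration scenario has to stay on the search service itself, one pattern that avoids large $skip values is keyset paging: order by a filterable, sortable key and resume after the last key seen, so every page is a cheap top-N query. This is only a sketch; the `id` field below is hypothetical and would need to be filterable and sortable in your index:

    import requests

    SEARCH_URL = "https://<search-service>.search.windows.net/indexes/<index>/docs"  # placeholders
    HEADERS = {"api-key": "<query-api-key>"}  # placeholder

    def iterate_all_documents(page_size=200):
        # Walk the index ordered by a filterable, sortable key instead of using $skip.
        last_id = ""
        while True:
            params = {
                "api-version": "2023-11-01",
                "search": "*",
                "$orderby": "id asc",   # hypothetical sortable key
                "$top": page_size,
            }
            if last_id:
                params["$filter"] = f"id gt '{last_id}'"  # resume after the last key seen
            page = requests.get(SEARCH_URL, params=params, headers=HEADERS).json()
            docs = page.get("value", [])
            if not docs:
                break
            yield from docs
            last_id = docs[-1]["id"]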
Is there a way to query and get execution time/metrics for search queries when making a call to Azure Search?
We are looking to log execution times in our app code when each call is made.
Thanks!
You should be able to make use of the documentation on this page: https://learn.microsoft.com/en-us/azure/search/search-monitor-usage to leverage some of the fundamental latency-related metrics that Azure Search tracks.
In addition, to find out how much time Azure Search took to process any particular request, you'll find the elapsed-time header useful. Note that it doesn't include network latency, which you might have to factor in yourself.
If you want to go above and beyond and add analytics for better insights, I'd point you to the Search Traffic Analytics page.
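To illustrate the second point, the service-side processing time can be read from the elapsed-time response header and logged next to a client-side measurement. This sketch assumes the Python requests library and placeholder service, index, and key values:

    import time
    import logging
    import requests

    SEARCH_URL = "https://<search-service>.search.windows.net/indexes/<index>/docs"  # placeholders
    HEADERS = {"api-key": "<query-api-key>"}  # placeholder

    def timed_search(text):
        started = time.monotonic()
        response = requests.get(
            SEARCH_URL,
            params={"api-version": "2023-11-01", "search": text},
            headers=HEADERS,
        )
        client_ms = (time.monotonic() - started) * 1000   # includes network latency
        server_ms = response.headers.get("elapsed-time")   # service processing time only
        logging.info("search took %sms on the service, %.0fms end to end", server_ms, client_ms)
        return response.json()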
I am running a proof of concept using App Engine and the built-in Search API. We are testing the Search API under the assumption that it provides linear scaling, as is the case with other products and services that are bundled with App Engine.
Specs: approx. 8 million documents in a single index
Query type:
Complex queries: we need spatial queries based on square areas, not distance(!). All queries include 2 ranges based on latitude and longitude.
Page sizes: between 16 and 250.
Accuracy (result counting) set to 100 in all test cases.
Our target performance (latency) is in the hundreds of milliseconds.
We are testing the performance of the Search API by running several concurrent requests. Tests are currently run at about 25 concurrent requests, but this number is expected to go up significantly. However, if the Search API is properly scalable, this should not matter.
I am measuring the time it takes the Search API to process a call to Index.search(Query).
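For context, the measurement looks roughly like the following (shown here with the Python flavor of the Search API as a sketch; the index name, field names, and bounds are placeholders, and the square area is expressed as two numeric ranges rather than a distance function):

    import time
    from google.appengine.api import search

    index = search.Index(name="listings")  # placeholder index name

    def timed_tile_query(lat_min, lat_max, lng_min, lng_max, page_size=250):
        # Square area as two numeric ranges, no geopoint/distance() involved.
        query_string = (
            "latitude >= %f AND latitude <= %f AND longitude >= %f AND longitude <= %f"
            % (lat_min, lat_max, lng_min, lng_max)
        )
        options = search.QueryOptions(limit=page_size, number_found_accuracy=100)
        started = time.time()
        results = index.search(search.Query(query_string=query_string, options=options))
        elapsed_ms = (time.time() - started) * 1000
        return results, elapsed_ms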
What I am measuring is the following:
The average time it takes the search method to return is around 8000 ms. There are no cases in which the method returns significantly faster or slower than that. However, using an index with 10 documents results in latency measurements of around 300 ms (!!!). This might be an indication that the Search API is not scalable at all.
The page size does not seem to make any significant difference. Perhaps at page sizes of 10,000 or higher it would, but this is not part of our tests.
Adding one criterion (an equality filter) seems to speed up the search significantly, by up to approximately 40%. This seems like a nice improvement, but 4 seconds is still an eternity.
Questions:
1. What is the expected latency (best possible scenario/configuration) that the Search API can deliver?
2. Which parameters influence latency, including App Engine configuration?
3. Does the number of documents in an index influence latency?
4. Is a search based on 2 range queries slower than a search based on equality filters alone? (We could pre-process the data and add 'index' data to each document.)
5. Is the Search API really scalable?
Our application for this was to plot a number of markers on a map using a tile server. However, the tile server performs many queries (i.e. 'tiles') in parallel, almost 30 per user/view. To make things difficult, we were not able to solve this problem using pre-aggregated maps because we have too many parameters/dimensions to take care of (if this is the case for you then try: Google Maps Engine).
So, we ended up with a CloudSQL instance set to the highest tier for maximum performance. Another reason to use a relational database is that index performance is more precisely tunable than with the Search API or BigQuery.
To answer the questions, this is what we found:
1. The latency depends on the size of the index. At lower volumes per index the latency seems reasonable; at much higher volumes it may become a problem. For text search this is probably OK in most cases.
2. We did not test at lower volumes, but at around 8 million documents the latency sits between 5,000 and 8,000 ms per query. We did not find any parameters that decreased latency; we did find parameters that increased latency.
3. Yes.
4. We did not test this.
5. Yes.
I'm writing a very limited-purpose web application that stores about 10-20k user-submitted articles (typically 500-700 words). At any time, any user should be able to perform searches on tags and keywords, edit any part of any article (metadata, text, or tags), or download a copy of the entire database that is recent up-to-the-hour. (It can be from a cache as long as it is updated hourly.) Activity tends to happen in a few unpredictable spikes over a day (wherein many users download the entire database simultaneously, requiring 100% availability and fast downloads) and intermittent weeks of low activity. This usage pattern is set in stone.
Is GAE a wise choice for this application? It appeals to me for its low cost (hopefully free), elasticity of scale, and professional management of most of the stack. I like the idea of an app engine as an alternative to a host. However, the excessive limitations and quotas on all manner of datastore usage concern me, as does the trade-off between strong and eventual consistency imposed by the datastore's distributed architecture.
Is there a way to fit this application into GAE? Should I use the ndb API instead of the plain datastore API? Or are the requirements so data-intensive that GAE is more expensive than hosts like Webfaction?
As long as you don't require full text search on the articles (which is currently still marked as experimental and limited to ~1000 queries per day), your usage scenario sounds like it would fit just fine in App Engine.
stores about 10-20k user-submitted articles (typically 500-700 words)
Maximum entity size in App Engine is 1 MB, so as long as the total size of the article is lower than that, it should not be a problem. Also, the cost of reading data is not tied to the size of the entity but to the number of entities being read.
At any time, any user should be able to perform searches on tags and keywords.
Again, as long as the searches on the tags and keywords are not full-text searches, App Engine's datastore queries can handle these kinds of searches efficiently. If you want to search on both tags and keywords at the same time, you will need to build a composite index for both fields. This could increase your write cost.
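For illustration, with the Python ndb API such a combined tag + keyword query (property names hypothetical) would look like the sketch below, together with the kind of composite index entry in index.yaml that the paragraph above refers to:

    from google.appengine.ext import ndb

    class Article(ndb.Model):
        tags = ndb.StringProperty(repeated=True)
        keywords = ndb.StringProperty(repeated=True)

    # Composite index entry in index.yaml for filtering on both properties:
    #   - kind: Article
    #     properties:
    #     - name: tags
    #     - name: keywords
    articles = Article.query(Article.tags == "gae", Article.keywords == "search").fetch(20)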
download a copy of the entire database that is recent up-to-the-hour.
You could use a cron/scheduled task to schedule an hourly dump to the blobstore. The cron job could be targeted at a backend instance if your dump takes more than 60 seconds to finish. Do remember that with each dump you would need to read all entities in the database, and this means 10-20k read ops per hour. You could instead add a timestamp field to your entity and have your dump servlet query for anything newer than the last dump, to save on read ops.
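A sketch of that incremental variant with the ndb API (model and property names hypothetical): keep an auto-updated timestamp on each article and have the hourly job read only what changed since the previous dump:

    from google.appengine.ext import ndb

    class Article(ndb.Model):
        # other properties from the earlier sketch omitted
        updated = ndb.DateTimeProperty(auto_now=True)  # refreshed on every write

    def articles_changed_since(last_dump_time):
        # Read only the entities written after the previous hourly dump.
        return Article.query(Article.updated > last_dump_time).fetch()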
Activity tends to happen in a few unpredictable spikes over a day (wherein many users download the entire database simultaneously, requiring 100% availability and fast downloads) and intermittent weeks of low activity.
This is where GAE shines, you could have very efficient instance usages with GAE in this case.
I don't think your application is particularly "database-heavy".
500-700 words is only a few KB of data.
I think GAE is a good fit.
You could store each article as a TextProperty on an entity, with the tags in a ListProperty. For searching text you could use the Search service https://developers.google.com/appengine/docs/python/search/ (which currently has quota limits).
Not 100% sure about downloading all the data, but I think you could store all the data in the blobstore (possibly as pdf?) and then allow users to download that blob.
I would choose NDB over regular datastore, mostly for the built-in async functionality and caching.
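A minimal model along those lines with ndb (property names illustrative), using a text property for the body, a repeated property for the tags, and ndb's built-in caching and async support:

    from google.appengine.ext import ndb

    class Article(ndb.Model):
        body = ndb.TextProperty()                  # not indexed; fine for a 500-700 word article
        tags = ndb.StringProperty(repeated=True)   # ndb's equivalent of a list property

    # ndb caches gets automatically and supports async queries:
    future = Article.query(Article.tags == "gae").fetch_async(20)
    # ... do other work while the query runs ...
    articles = future.get_result()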
Regarding staying below quota, it depends on how many people are accessing the site and how much data they download/upload.
Could somebody give a good explanation, for a newbie, of what the following phrase means:
1) workload throttling within a single cluster and 2) workload balance across multiple clusters.
This is from an overview of the advantages of an ETL tool that helps perform ETL (Extract, Transform, Load) jobs on a Redshift database.
Many web services allocate a maximum amount of "interaction" that you can have with a service. Once you exceed that amount, the service will change how it completes its interactions.
Amazon imposes limitations on how much compute power you can consume within your nodes. The phrase "workload throttling" means that if you exceed the limits detailed in Amazon's documentation (Amazon Redshift Limits), your queries, jobs, tasks, or work items will be given lower priority or fail outright.
The idea is that Amazon doesn't want you to consume so much compute power that it prevents others from using the service and, honestly, they don't want you to consume more power than it costs them to provide.
Workload throttling isn't an idea exclusive to this Amazon service, or to cloud services in general. The concept can be found in any system that needs to account for receiving more tasks than it can handle, and different systems deal with being overburdened in different ways.
For example, some systems will defer you to alternate services, as in the case of a load balancer. Third-party data APIs will allot you a maximum amount of data per hour/minute and then either delay the responses you get back, charge you more money, or stop responding altogether.
Another service that you can look at that deals with throttling is the Google Maps Geocoding service. If you look on their documentation, Google Maps Geocoding API Usage Limits, you will see that:
Users of the standard API:
2,500 free requests per day, calculated as the sum of client-side and server-side queries.
50 requests per second, calculated as the sum of client-side and server-side queries.
If you exceed this and have billing enabled, Google will shift to:
$0.50 USD / 1000 additional requests, up to 100,000 daily.
I can't remember what the response looks like after you hit that daily limit, but once you hit it, you basically don't get responses back until the day resets.
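From what I recall (treat the exact status value as an assumption), the JSON response carries a status field such as OVER_QUERY_LIMIT once the quota is exhausted, so clients typically check it and back off, for example:

    import time
    import requests

    GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"

    def geocode(address, api_key, retries=3):
        for _ in range(retries):
            payload = requests.get(GEOCODE_URL, params={"address": address, "key": api_key}).json()
            if payload.get("status") == "OVER_QUERY_LIMIT":  # assumed status value when quota is hit
                time.sleep(2)  # brief pause helps with the per-second limit
                continue
            return payload
        return None  # daily quota likely exhausted; retry after it resets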