How do I reduce the number of search units in Azure Cognitive Search?

How do I reduce the number of search units in Azure Cognitive Search? I mistakenly overprovisioned the capacity and I no longer need as many search units. Thanks.

You can scale down your search service by reducing the number of replicas or partitions; billed search units are the product of the two (replicas × partitions). See the capacity planning doc below:
https://learn.microsoft.com/en-us/azure/search/search-capacity-planning#add-or-reduce-replicas-and-partitions
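If you want to script the change instead of using the portal, here is a minimal sketch that patches the replica/partition counts through the Azure Resource Manager REST API; the api-version, resource names, and target counts are placeholders, and the azure-identity and requests packages are assumed.
import requests
from azure.identity import DefaultAzureCredential
# Placeholders - substitute your own subscription, resource group, and service.
subscription_id = "<subscription-id>"
resource_group = "<resource-group>"
service_name = "<search-service-name>"
api_version = "2020-08-01"  # assumed GA api-version for Microsoft.Search
token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token
url = (
    f"https://management.azure.com/subscriptions/{subscription_id}"
    f"/resourceGroups/{resource_group}/providers/Microsoft.Search"
    f"/searchServices/{service_name}?api-version={api_version}"
)
# Scale down to 1 partition x 1 replica = 1 search unit; adjust to what you need.
body = {"properties": {"replicaCount": 1, "partitionCount": 1}}
resp = requests.patch(url, json=body, headers={"Authorization": f"Bearer {token}"})
resp.raise_for_status()
print(resp.json().get("properties"))
The scale-down runs asynchronously and can take a while to complete.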

Related

Azure Search paging causes throttling

I am from the team that runs nuget.org, the package ecosystem for .NET. We use Azure Search to power our search API. Our APIs are public, so third-party customers can use them to analyze our ecosystem or make apps.
We recently had an outage caused by a single customer paging through our search documents using the $skip and $top query parameters in batches of 200 documents at a time. This resulted in Azure Search throttling:
Failed to execute request because the request rate has caused your service to exceed the limits of its provisioned capacity. Reduce the rate of requests, or adjust the number of replicas/partitions. See http://aka.ms/azure-search-throttling for more information.
Azure Search's throttling affected all customers in that region for 10 minutes, not just the single customer that was paging. We read through Azure Search's throttling documentation, but have the following questions:
Is customer paging with high $skip values particularly expensive for Azure Search?
What can we do to reduce the likelihood of Azure Search throttling for paging scenarios?
Should we add our own throttling to ensure that a single customer's searches don't affect all other customers' searches? Does Azure Search have guidance on this?
Some more information about our service:
Number of documents in index: ~950K
Request volume: 1.3K paging requests in ~10 minutes. Peak of 125 requests per second, average of 6 requests per second
Scale: standard SKU, 1 partition, 3 replicas (this is our secondary region, hence the smaller scale to save money)
Deep paging is indeed a costly operation. Since Azure Search is designed to be distributed, all indexes are divided into multiple shards, to allow for quick scale operation. This comes with the downside that ranked results from each shard need to be merged and ranked to create a final list of results. The number of results to merge increases linearly with the skip value, so that step can become expensive when paging very deep in the results.
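As a rough back-of-the-envelope illustration of how that merge step grows (the shard count and the simple cost model here are assumptions for illustration, not documented figures):
# Assumed model: each shard hands the coordinator roughly (skip + top)
# candidates, which then have to be merged and re-ranked for one page.
shards = 12   # hypothetical shard count
top = 200     # the page size from the question above
for skip in (0, 1000, 10000, 100000):
    merged = shards * (skip + top)
    print(f"$skip={skip}: roughly {merged} candidates to merge for a single page")
Under that model a page at $skip=100000 involves on the order of a million candidates, versus a couple of thousand for the first page.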
As a search service, Azure Search is optimized for quick retrieval of top documents based on textual relevance. It's unfortunately not the best tool for scenarios where a client simply wants to return a list of all documents in a data source.
From what I understand from your post, there are two reasons for the throttling:
High skip values
Sharp increase in QPS
We encourage you to control both. It is not uncommon for our customers to implement their own throttling logic to prevent their own customers from emitting an abnormally large number of requests. Even without skip values, having a single customer send enough queries to increase the traffic multiple-fold can lead to throttling (I'm not sure if that was the case here). There are no official guidelines on how to handle queries coming from your client apps. The best approach, in my opinion, would be for your team to run performance tests using realistic workloads to understand the limits of your search service (which depend on the index schema, number of documents, type of queries being emitted, etc.). Once you have a good idea of how many QPS your service can handle for your scenarios, you can decide how much of that QPS you are willing to allocate to a single customer at a time, and enforce a limit based on that.
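As an illustration of that kind of per-customer limit (not an official pattern; the numbers are placeholders, and a real deployment would likely keep the counters in a shared store such as Redis rather than in process):
import time
from collections import defaultdict
class TokenBucket:
    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec        # sustained queries per second allowed
        self.capacity = burst           # short bursts allowed above the rate
        self.tokens = burst
        self.updated = time.monotonic()
    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
buckets = defaultdict(lambda: TokenBucket(rate_per_sec=5, burst=20))
def handle_search(customer_id, run_query):
    # Reject (HTTP 429) before the request ever reaches Azure Search.
    if not buckets[customer_id].allow():
        return 429, "Too many requests; please slow down and retry."
    return 200, run_query()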
Regarding the deep paging cost: if this is a common scenario for your customers (paging through all documents of a search index), I would recommend exposing a way to page through all documents directly from the data source (assuming Azure Search is not the primary data store of the documents), and using Azure Search mostly for relevance-related retrieval scenarios.
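If the documents also live in a relational store, one common way to expose "page through everything" without involving the search index is keyset pagination on an indexed key; a minimal sketch with hypothetical table and column names:
# Keyset pagination over the primary data store instead of deep $skip paging.
# The packages table, package_key column, and DB-API connection are hypothetical.
def fetch_all_packages(conn, page_size=200):
    last_key = 0
    while True:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT package_key, package_id, version "
                "FROM packages WHERE package_key > %s "
                "ORDER BY package_key LIMIT %s",
                (last_key, page_size),
            )
            rows = cur.fetchall()
        if not rows:
            break
        yield from rows
        last_key = rows[-1][0]   # resume after the last key we returned
Each page is an indexed range scan, so the cost stays flat no matter how deep the client pages.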

Throttling limits for Azure Search

I'm looking for throttling information and this is the best that I've been able to find so far: https://learn.microsoft.com/en-us/azure/search/search-limits-quotas-capacity#throttling-limits
For running a search:
https://{{search-service}}.search.windows.net/indexes/:index/docs?api-version={{version}}&search=some text
Is this line, from the reference page above, the limit that applies to searches?
Get Index (GET /indexes/myindex): 10 per second per search unit
I'm trying to see what the limit is for searching only, under the ideal scenario where nothing else is happening, such as an indexer running.
Some APIs such as GET /indexes are throttled based on simple rate limits. However, queries and indexing requests do not work this way. In the case of those APIs, throttling happens dynamically based on resource availability. If the system's internal queues start to fill, requests will begin to fail with 503 (Service Unavailable). If enough such failures happen within a discrete period of time (calculated as an average over a rolling window), the service will throttle requests in order to relieve pressure and allow the system to recover.
The reason throttling works this way instead of based on static rate limits is that most Azure Cognitive Search pricing tiers (other than Free) give you dedicated capacity. Static rate limits could artificially limit how you use your own capacity, so instead throttling dynamically applies backpressure as a way to ensure the reliability of the service when its capacity is overloaded.
For more information about testing and performance tuning Azure Cognitive Search, see this article.
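In practical terms, that means query clients should treat a 503 as a signal to back off and retry rather than immediately resend; a minimal sketch against the REST query endpoint (the service name, index name, API key, and api-version are placeholders; the official Azure SDKs ship with comparable retry policies):
import time
import requests
SEARCH_URL = "https://<service>.search.windows.net/indexes/<index>/docs"
def search_with_backoff(query, max_retries=5):
    headers = {"api-key": "<query-key>"}
    params = {"api-version": "2023-11-01", "search": query}  # placeholder version
    delay = 1.0
    for _ in range(max_retries):
        resp = requests.get(SEARCH_URL, headers=headers, params=params)
        if resp.status_code != 503:
            resp.raise_for_status()
            return resp.json()
        # Honor Retry-After when the service sends it, otherwise back off exponentially.
        delay = float(resp.headers.get("Retry-After", delay))
        time.sleep(delay)
        delay *= 2
    raise RuntimeError("Search service is still throttling after retries")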
For Azure Search, there are 2 kinds of APIs: Query APIs (Search/Suggest/Autocomplete) and Index APIs.
The one you mentioned belongs to the Index APIs:
Get Index (GET /indexes/myindex): 10 per second per search unit
If you want to know the QPS limit for the Query APIs (searching), this doc will be helpful:

Search API on Google App Engine

I am running a proof of concept using App Engine and the built in Search API. We are testing the Search API under the assumption that it provides linear scaling as is the case with other products and services that are bundled with App Engine.
Specs: approx. 8 million documents in a single index
Query type:
Complex queries: we need spatial queries based on square areas, not distance(!). All queries include 2 ranges, based on latitude and longitude (see the sketch below).
Page sizes: between 16 and 250.
Accuracy (result counting) set to 100 in all test cases.
Our target performance (latency) is in the 100's of milliseconds range.
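A sketch of the kind of bounding-box query described above, assuming the Python google.appengine.api.search API (the index name and the latitude/longitude field names are illustrative):
from google.appengine.api import search
index = search.Index(name='markers')   # illustrative index name
def query_tile(lat_min, lat_max, lon_min, lon_max, page_size=250):
    # Two numeric range restrictions, one per axis, define the square tile.
    query_string = (
        'latitude >= %f AND latitude <= %f AND '
        'longitude >= %f AND longitude <= %f'
        % (lat_min, lat_max, lon_min, lon_max)
    )
    options = search.QueryOptions(limit=page_size, number_found_accuracy=100)
    return index.search(search.Query(query_string=query_string, options=options))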
We are testing performance of the Search API running several concurrent requests. Test results are now measured at about 25 concurrent requests, but this number is expected to go up significantly. However, if the Search API is properly scalable then this should be meaningless.
I am measuring the time it takes the Search API to process a call to Index.search(Query).
What I am measuring is the following:
The average time it takes the search method to return is around 8000 ms. There are no cases in which the method returns significantly faster or slower than that. However, using an index with 10 documents results in latency measurements of around 300 ms (!!!). This might be an indication that the Search API is not scalable at all.
The page size does not seem to make any significant difference. Perhaps at page sizes of 10,000 or higher it will, but this is not part of our tests.
Adding one criterion (equality) seems to speed up the search significantly, up to approximately 40%. This seems like a nice improvement, but 4 seconds is still an eternity.
Questions:
1. What is the expected latency (best possible scenario/configuration) that the Search API can deliver?
2. Which parameters influence latency, including App Engine configuration?
3. Does the number of documents in an index influence latency?
4. Is a search based on 2 range queries slower than a search based on equality filters alone? (We could pre-process the data and add 'index' data to each document.)
5. Is the Search API really scalable?
Our application for this was to plot a number of markers on a map using a tile server. However, the tile server performs many queries (i.e. 'tiles') in parallel, almost 30 per user/view. To make things difficult, we were not able to solve this problem using pre-aggregated maps because we have too many parameters/dimensions to take care of (if this is the case for you then try: Google Maps Engine).
So, we ended up with a CloudSQL instance set to the highest tier for maximum performance. Another reason to use a relational database is that index performance is more precisely tunable than with the Search API or BigQuery.
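For comparison, a sketch of what the same tile lookup looks like on the relational side (table, columns, and index are illustrative; a composite index over latitude and longitude is the kind of tuning referred to above):
# Bounding-box query against CloudSQL (MySQL); schema names are illustrative.
# CREATE INDEX idx_lat_lon ON markers (latitude, longitude);
BBOX_QUERY = (
    "SELECT id, latitude, longitude, payload "
    "FROM markers "
    "WHERE latitude BETWEEN %s AND %s "
    "AND longitude BETWEEN %s AND %s "
    "LIMIT %s"
)
def query_tile(conn, lat_min, lat_max, lon_min, lon_max, limit=250):
    with conn.cursor() as cur:
        cur.execute(BBOX_QUERY, (lat_min, lat_max, lon_min, lon_max, limit))
        return cur.fetchall()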
To answer the questions (numbered as above), this is what we found:
1. The latency depends on the size of the index. At lower volumes per index the latency seems reasonable; at much higher volumes this may become a problem. For text searches this is probably ok in most cases.
2. We did not test at lower volumes, but at around 8 million documents the latency sits between 5000 and 8000 ms per query. We did not find any parameters that decreased latency; we did find parameters that increased it.
3. Yes.
4. We did not test this.
5. Yes.

Performance expectations for an Amazon RDS instance

I am using Google App Engine for an app, and the app is currently hitting the datastore at a rate of around 2.5 million row writes, and 4.5 million row reads per day.
I am currently porting the app to Amazon Elastic Beanstalk and Amazon RDS due to the very high costs of running an application on GAE.
Based on the values above, how can I find out / estimate what type of RDS instance I will need for my requirements? Is the above a considerable amount of processing for, let's say, a Small or Micro MySQL RDS instance to handle in a day?
Totally depends on a number of factors:
Row size.
Field types and sizes.
Complexity of your queries (joins, etc).
Proper use of indexes.
Row contention and other possible bottlenecks.
Really hard to tell. But from experience, if you don't need fancy replication or sharding, the costs of the GAE datastore are usually higher as it offers total redundancy, distribution, scalability, etc.
My suggestion would be to write a quick program to benchmark a load on RDS that replicates what you are expecting. Should be easy to write if you forgo all the business rules and such and just do fake but randomized reads and writes.
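For scale, 2.5 million writes and 4.5 million reads per day average out to roughly 29 writes and 52 reads per second, with real peaks higher than that. A minimal sketch of such a throwaway benchmark, assuming a MySQL RDS instance and the pymysql package (endpoint, credentials, and schema are placeholders):
import random
import string
import time
import pymysql
conn = pymysql.connect(host="<rds-endpoint>", user="<user>",
                       password="<password>", database="benchdb", autocommit=True)
with conn.cursor() as cur:
    cur.execute("CREATE TABLE IF NOT EXISTS bench (id INT AUTO_INCREMENT PRIMARY KEY, "
                "payload VARCHAR(64), num INT, KEY (num))")
def random_row():
    payload = "".join(random.choices(string.ascii_lowercase, k=32))
    return (payload, random.randint(1, 10**6))
start, writes, reads = time.time(), 0, 0
while time.time() - start < 60:              # run for one minute
    with conn.cursor() as cur:
        if random.random() < 0.35:           # ~35% writes / 65% reads, close to the stated mix
            cur.execute("INSERT INTO bench (payload, num) VALUES (%s, %s)", random_row())
            writes += 1
        else:
            cur.execute("SELECT id, payload FROM bench WHERE num = %s LIMIT 10",
                        (random.randint(1, 10**6),))
            cur.fetchall()
            reads += 1
elapsed = time.time() - start
print("writes/sec: %.1f  reads/sec: %.1f" % (writes / elapsed, reads / elapsed))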

Will full-text search index be limited in size?

We are about to migrate our app to FTS to replace datastore indexes.
Will there be a limit on index size once the FTS is out of its experimental state, or is it unlimited like the datastore?
Once the Search API is out of experimental, there should be no limit on index size.
As of App Engine version 1.9.0: "The size limit on the Search API is now computed and enforced on a per-index basis, rather than for the app as a whole. The per-index limit is now 10GB. There is no fixed limit on the number of indexes, or on the total amount of Search API storage an application may use."
