I am looking for two things from Azure Search.
1. I want to be notified by the Azure Search service when the indexer finishes a run.
2. Is there a way to optimize the indexer's performance so that a run that currently takes an hour completes in 30 minutes?
Any help in this regard is greatly appreciated!
The way to achieve this is to query the indexer status API. Also, please vote for this UserVoice suggestion and leave a comment with your use case.
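A minimal polling sketch against the Get Indexer Status REST endpoint might look like the following. The service name, indexer name, admin key, api-version, and polling interval below are placeholders, not values from this thread:

```python
# Poll the Azure Search "Get Indexer Status" REST endpoint until the last run
# is no longer in progress. All names and keys below are placeholders.
import time
import requests

SERVICE = "my-search-service"      # placeholder
INDEXER = "my-indexer"             # placeholder
API_KEY = "<admin-api-key>"        # placeholder
URL = (f"https://{SERVICE}.search.windows.net/indexers/{INDEXER}/status"
       "?api-version=2020-06-30")  # verify the api-version for your service

def wait_for_indexer(poll_seconds=60):
    while True:
        resp = requests.get(URL, headers={"api-key": API_KEY})
        resp.raise_for_status()
        last_run = resp.json().get("lastResult") or {}
        status = last_run.get("status")
        if status and status != "inProgress":
            # e.g. "success", "transientFailure", "persistentFailure"
            return status
        time.sleep(poll_seconds)

if __name__ == "__main__":
    print("indexer finished with status:", wait_for_indexer())
```

You could then hook the result into whatever notification channel you already use (email, webhook, queue message, etc.).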
The specific suggestions for performance optimization depend on the datasource you use, and whether the performance bottleneck is on the datasource side or on Azure Search side. If you share your service name and indexer name, we can take a look at the telemetry and provide more targeted advice.
We are using Azure Search Service in our application. From the documentation we know that Azure Search Service does not support geo-replication out of the box. We want to avoid a single point of failure around a single instance of the Azure Search service. We have a couple of questions around this; please clarify them below.
1. Is an out-of-the-box geo-replication feature planned for Azure Search in the future? If yes, can you please share the ETA?
2. If we maintain at least three replicas, does that mean each replica will automatically be deployed in a different zone of the same region? And if one zone goes down, will requests be served by the other zones without any manual intervention?
We are trying to find the best way to avoid a single point of failure.
Unfortunately we do not currently support out of the box Geo Replication and do not have an ETA for when this will be available.
[Edited]: Sorry for my earlier response; I see now that your question was about replicas spanning zones, not regions. We do automatically spread replicas across availability zones in regions where Azure supports AZs.
We have more information on this topic here: https://learn.microsoft.com/en-us/azure/search/search-performance-optimization
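For completeness, here is a rough sketch (not an official sample) of bumping the replica count to 3 through the Azure management REST API; spreading across zones then happens automatically in AZ-enabled regions. The subscription, resource group, service name, bearer token, and api-version are all placeholders you would need to replace:

```python
# Hedged sketch: set replicaCount to 3 on a Microsoft.Search service via the
# Azure Resource Manager REST API. All identifiers below are placeholders.
import requests

SUBSCRIPTION = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
SERVICE_NAME = "<search-service-name>"
TOKEN = "<azure-ad-bearer-token>"
API_VERSION = "2020-08-01"  # assumption; check the current api-version in the docs

url = (
    f"https://management.azure.com/subscriptions/{SUBSCRIPTION}"
    f"/resourceGroups/{RESOURCE_GROUP}/providers/Microsoft.Search"
    f"/searchServices/{SERVICE_NAME}?api-version={API_VERSION}"
)

resp = requests.patch(
    url,
    headers={"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"},
    json={"properties": {"replicaCount": 3}},
)
resp.raise_for_status()
print("replicaCount is now", resp.json()["properties"]["replicaCount"])
```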
Liam
I'm currently doing real-time synchronization with Elasticsearch (upon save in the database, I also save to Elasticsearch).
The problem I have is synchronizing all entities through some tool (probably Logstash), though I'm not sure about best practices. I would like to be able to synchronize a specific entity (or all entities), which is not a problem since I have a DB view for each entity, but I'm not sure about the performance of a whole-database synchronization, and whether there are any limitations in Logstash or other tools.
Basically the idea is to run a full synchronization on initial project setup, and then rerun it only if something goes wrong or the model changes and Elasticsearch needs updating. I don't have too many records for now (<1M overall, I'd say).
Any suggestion would be well appreciated!
You can use the Logstash JDBC input plugin. It even has a built-in cron-style scheduler.
The documentation is here: JDBC input plugin.
But you have to understand that Elasticsearch is a search engine, so true real-time is not possible; near real-time is.
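If you prefer to script the one-off full synchronization yourself rather than schedule it through Logstash, a minimal sketch using the Python Elasticsearch client could look like this. The database path, view name, index name, and the assumption that each row has an id column are illustrative, not taken from your setup:

```python
# One-off full sync from a DB view into Elasticsearch using the bulk helper.
# Swap sqlite3 for your real database driver and adjust names as needed.
import sqlite3
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

def fetch_rows(db_path, view_name):
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    for row in conn.execute(f"SELECT * FROM {view_name}"):
        yield dict(row)
    conn.close()

def full_sync(db_path="app.db", view_name="product_view", index="products"):
    es = Elasticsearch("http://localhost:9200")
    actions = (
        {"_index": index, "_id": doc["id"], "_source": doc}  # assumes an "id" column
        for doc in fetch_rows(db_path, view_name)
    )
    ok, errors = bulk(es, actions, raise_on_error=False)
    print(f"indexed {ok} documents, {len(errors)} errors")

if __name__ == "__main__":
    full_sync()
```

The same function can be pointed at a single entity's view when only one model changes.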
Is there a way to look at the currently active requests (YSQL/YCQL) on a YugaByte node? I have noticed that some queries take a long time, and I am looking for ways to debug this.
There are two ways:
You can go to the RPCs in flight page and take a look at the currently running queries. This should give you an idea of which queries are slower. You can access this here:
YCQL: http://IP_ADDRESS:12000/rpcz
YSQL: http://IP_ADDRESS:13000/rpcz
It is also possible to enable slow query logging. I would join our Slack community chat and ask there to find out how to do this. This is something we should have documented; maybe you could contribute!
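To illustrate the first option, here is a small helper of my own (not part of YugabyteDB) that pulls a node's /rpcz endpoint and pretty-prints whatever it returns so you can eyeball long-running queries. The host and ports are the defaults, and the payload layout varies between versions, so inspect the output rather than relying on specific field names:

```python
# Dump the in-flight RPCs from a node's /rpcz endpoint for quick inspection.
import json
import requests

YSQL_ENDPOINT = "http://127.0.0.1:13000/rpcz"   # use port 12000 for YCQL

def dump_inflight(endpoint=YSQL_ENDPOINT):
    resp = requests.get(endpoint, timeout=5)
    try:
        print(json.dumps(resp.json(), indent=2, sort_keys=True))
    except ValueError:
        # Some builds serve the page as plain text/HTML instead of JSON.
        print(resp.text)

if __name__ == "__main__":
    dump_inflight()
```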
I am about to build a service that logs clicks and transactions from an e-commerce website. I expect to log millions of clicks every month.
I will use this to run reports to evaluate marketing efforts and site usage (similar to Google Analytics*). I need to be able to make queries, such as best selling product, most clicked category, average margin, etc.
*As some actions occur at later times and offline, GA doesn't fulfill all our needs.
The reporting system will not have a heavy load and it will only be used internally.
My plan is to place loggable actions in a queue and have a separate system store them in a database.
My question is which database I should use for this. Due to corporate IT policy I only have these options: SimpleDB (AWS), DynamoDB (AWS), or MS SQL/MySQL.
Thanks in advance!
Best regards,
Fredrik
Have you checked this excellent Amazon documentation page? http://aws.amazon.com/running_databases/ It helps you pick the best database among their products.
From my experience, I would advise against using DynamoDB for this purpose. There is no real SELECT equivalent and you will have a hard time modeling your data. It is feasible, but not trivial.
On the other hand, SimpleDB provides a select operation that would considerably simplify the model. Nonetheless, it is advised against for volumes > 10 GB: http://aws.amazon.com/running_databases/
As for the last option, RDS, I think you can do pretty much everything with it.
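To make the RDS route concrete, here is a hedged sketch of what a click log table and a "most clicked category" report could look like on MySQL. The table, columns, credentials, and the mysql-connector-python dependency are illustrative choices, not a prescribed schema:

```python
# Minimal click-log schema and an aggregate report query on MySQL (RDS).
# All names and credentials below are made up for illustration.
import mysql.connector

DDL = """
CREATE TABLE IF NOT EXISTS click_log (
    id         BIGINT AUTO_INCREMENT PRIMARY KEY,
    session_id VARCHAR(64)  NOT NULL,
    category   VARCHAR(128) NOT NULL,
    product_id VARCHAR(64),
    clicked_at DATETIME     NOT NULL,
    INDEX idx_clicked_at (clicked_at),
    INDEX idx_category (category)
)
"""

REPORT = """
SELECT category, COUNT(*) AS clicks
FROM click_log
WHERE clicked_at >= %s
GROUP BY category
ORDER BY clicks DESC
LIMIT 10
"""

def top_categories(conn, since):
    cur = conn.cursor()
    cur.execute(DDL)
    cur.execute(REPORT, (since,))
    rows = cur.fetchall()
    cur.close()
    return rows

if __name__ == "__main__":
    conn = mysql.connector.connect(
        host="localhost", user="report", password="secret", database="analytics"
    )
    for category, clicks in top_categories(conn, "2014-01-01"):
        print(category, clicks)
    conn.close()
```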
First of all, I'm not a DB expert... so I'm not sure if my terminology or anything is correct here, but if you can bear with me I hope you'll get the idea.
I have a SQL Azure database which powers a social network. Millions of transactions occur every day, from very simple ones to complex SELECTs that sort through tens of thousands of users based on distance, etc.
My user numbers grow daily and I know (believe) that at some point I'll need to implement sharding, or make use of SQL Azure Federation, to keep my app up and running, because a single SQL Azure database has limited resources... but my question is, how do I figure out when I'm going to need to do this?
I know that when I start to use too many resources my queries will be throttled... but for all I know this could start happening tomorrow, or be years away.
If I know I'm hitting 80% of what I'm allowed to, then I know I need to prioritise a solution to help me scale things out, but if I'm only using 10% then I know I can put this on the back-burner and deal with it later.
I can't find any way, or even a mention, of how to measure this.
Any suggestions?
Thanks,
Steven
I don't know of any inbuilt way to measure this. If somebody does then I would be very interested to hear about it.
However, there is a great library from the Microsoft AppFabric CAT Best Practices Team which is a transient fault handling framework. See here.
It does a number of things, including handling retry logic for opening connections and running queries. You could use it, but extend it slightly to log when you are being throttled by SQL Azure.
This probably won't give you as much warning as you want, but it will help you know when you are getting closer to the limit. If you combine this approach with some kind of application/database stress testing, you can find your limits now, before your real usage gets there.
Based on the numbers you have given I would definitely start looking at sharding now.
I would recommend you read the article below if you haven't done so; it contains interesting information about the underlying reasons for SQL Azure throttling conditions. Understanding what is being monitored for throttling can help you figure out why your database is being throttled.
Technet Article: SQL Azure Connection Management
Thank you for mentioning the Enzo library by the way (disclaimer: I wrote it)!
However, understanding the reason for the throttling would be my first recommendation, because depending on the reason, sharding may or may not help you. For example, if the cause of the throttling is excessive locking, sharding may indeed reduce locking on any single database, but the problem could come back and bite you at a later time.
Thank you
Herve
Best practices to fight throttling: 1) keep queries as short as possible, 2) run workloads in batches, 3) employ retry mechanisms (a small retry sketch follows the links below).
I would also like to point you to a couple of resources.
1) sql azure throttling and decoding reason codes: http://msdn.microsoft.com/en-us/library/ff394106.aspx#throttling
2) http://geekswithblogs.net/hroggero/archive/2011/05/26/cloud-lesson-learned-exponential-backoff.aspx
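As a concrete illustration of point 3 above, here is a minimal retry-with-exponential-backoff sketch in Python using pyodbc. The connection string is a placeholder, and the throttling error code is an assumption based on the commonly documented SQL Azure code 40501 ("the service is currently busy"); adapt both to your environment:

```python
# Retry a query with exponential backoff (plus jitter) when SQL Azure reports
# throttling. Connection string and error-code list are placeholders/assumptions.
import logging
import random
import time

import pyodbc

THROTTLE_CODES = {"40501"}  # classic "service is busy" code; add others for your tier
CONN_STR = "Driver={ODBC Driver 18 for SQL Server};Server=<server>;Database=<db>;"  # placeholder

def is_throttled(exc):
    # pyodbc surfaces the SQL Server error number inside the message text.
    return any(code in str(exc) for code in THROTTLE_CODES)

def run_with_backoff(sql, params=None, max_attempts=5):
    for attempt in range(max_attempts):
        conn = None
        try:
            conn = pyodbc.connect(CONN_STR, timeout=30)
            cur = conn.cursor()
            if params:
                cur.execute(sql, params)
            else:
                cur.execute(sql)
            return cur.fetchall()
        except pyodbc.Error as exc:
            if not is_throttled(exc) or attempt == max_attempts - 1:
                raise
            delay = (2 ** attempt) + random.random()
            logging.warning("Throttled, retrying in %.1fs: %s", delay, exc)
            time.sleep(delay)
        finally:
            if conn is not None:
                conn.close()
```

Logging each retry, as the earlier answer suggests, also gives you a rough gauge of how often you are bumping into the limits.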