Does a Search Engine send a request to the server every time a new character is inserted for new suggestions in order to employ an autocomplete feature? I'm referring to search engines that look through an index and not those web-surfing crawlers.
No. There are typically two configuration knobs:
min-number-of-characters: the minimum number of characters the query must contain before a request is sent to the index.
The other option is a debounce timeout: after each keystroke the client waits that amount of time before sending the request, possibly with a "backoff" strategy that reduces the timeout to zero after a few keystrokes so the request goes out immediately.
By the way, "instant search" is different from "query suggestion".
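The two knobs above can be sketched as a small gating function. This is a minimal illustration, not any particular search engine's implementation; the default values and the `backoff_after` parameter are illustrative assumptions:

```python
def should_send(query, ms_since_last_keystroke, keystrokes,
                min_chars=3, debounce_ms=200, backoff_after=6):
    """Decide whether the current keystroke should trigger a suggestion request."""
    # Knob 1: min-number-of-characters -- too short, never send.
    if len(query) < min_chars:
        return False
    # Knob 2: debounce timeout, with an optional backoff that drops the
    # timeout to zero once the user has typed enough keystrokes.
    timeout = 0 if keystrokes >= backoff_after else debounce_ms
    return ms_since_last_keystroke >= timeout
```

With the defaults, "ab" never fires, "abcd" fires only after 200 ms of idle time, and a seventh keystroke fires immediately.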
I am building an application with Solr that should give users the first N results of a search, using cursorMark to paginate through R rows at a time.
The problem is that in the client/server exchange, the client knows both the page number and the cursorMark, but the server only receives the cursorMark. It's also not safe to trust a page number supplied by the client.
Is there any way I'd be able to determine the offset from a given cursorMark server-side without also storing a list of page number + cursorMark combinations for every search?
For example, I'd like to be able to reject a request after using a cursorMark that would yield results > 10000 for a given search.
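Solr's cursorMark is opaque and does not encode an offset, so one workaround (a sketch of a proxy-side scheme, not a Solr feature) is to have the server hand back the cursorMark wrapped together with its offset in an HMAC-signed token. The client echoes the token back, and the server can trust the offset without storing per-search state:

```python
import base64
import hashlib
import hmac
import json

SECRET = b"server-side-secret"  # assumption: known only to the server

def make_token(cursor_mark, offset):
    """Wrap a cursorMark and its offset, signed so clients can't tamper."""
    payload = json.dumps({"c": cursor_mark, "o": offset}).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(payload).decode() + "." + sig

def read_token(token, max_offset=10000):
    """Verify the signature and reject pages past max_offset."""
    data, sig = token.rsplit(".", 1)
    payload = base64.urlsafe_b64decode(data.encode())
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, sig):
        raise ValueError("tampered token")
    info = json.loads(payload)
    if info["o"] > max_offset:
        raise ValueError("offset beyond limit")
    return info["c"], info["o"]
```

On each page request the proxy verifies the incoming token, queries Solr with the embedded cursorMark, and issues a new token with `offset + R` for the next page.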
I have an API which is adding multiple rows of data to our Search Index using the following code :
// Wrap the documents in a batch and push it to the index in one call.
var documentList = new List<IndexBase> { document };
var batch = IndexBatch.Upload(documentList);
await searchIndexClient.Documents.IndexAsync(batch);
The API (after insertion of data) checks the count of documents originally passed in the API call and the count of the documents present in Search Index. However, the count is not always the same.
Adding a short delay after the insertion and then querying the Search Index again gives the correct count, so there seems to be a lag between inserting data and it becoming visible in the Search Index.
Is that the expected behavior ?
I am using the Microsoft.Azure.Search.3.0.4 DLL.
Indeed, that is expected behavior: documents indexed to a service become visible for querying after a short delay, as you observed.
While the delay depends on the service topology and indexing load, Azure Search does guarantee that successfully indexed documents will eventually be visible to search requests.
For more details, please read the "Response" section of the Add, Update or Delete Documents (Azure Search Service REST API) document.
I hope this helps.
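If the caller needs the count to converge before proceeding, a simple option is to poll until it matches or a timeout expires. This is a generic sketch, not part of the Azure Search SDK; `get_count` stands in for whatever count query you already run, and is injected so the example stays self-contained:

```python
import time

def wait_for_count(get_count, expected, timeout_s=30.0, poll_s=0.5,
                   sleep=time.sleep):
    """Poll the index's document count until it reaches `expected`.

    Returns True once get_count() >= expected, or False on timeout.
    `sleep` is injectable so the loop can be tested without real delays.
    """
    waited = 0.0
    while waited <= timeout_s:
        if get_count() >= expected:
            return True
        sleep(poll_s)
        waited += poll_s
    return False
```

The same pattern works in C# with a `Func<int>` and `Task.Delay` in a loop.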
I have the following problem to solve.
Client sends the id of the document. This is an HTTP Get to a proxy (not directly to SOLR). Example:
baseURL/movies/{id}
The response of this call will be a list of variants of this movie.
In order to find the variants I want to perform a SOLR search using title and some other fields, e.g.
/movies/select?q=title:spiderman+year:2001
It should return the different variants of Spiderman, e.g. "SpiderMan", "Spiderman HD", etc.
The problem is that the proxy service will not have the title of the original movie; the API gives it only the movie's id.
My approach so far is to get the original movie information using the id,
e.g.
/movies/select?id={id}
After I get the original movie then I perform a second request to SOLR search for the variants.
Any ideas how to avoid the two calls to SOLR search?
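One option worth testing (an assumption about your setup, not verified against your schema) is Solr's join query parser, which can do a self-join within a single collection so that Solr resolves the title itself in one request:

```
/movies/select?q={!join from=title to=title}id:12345
```

This matches documents whose title field shares indexed terms with the document whose id is 12345, so the lookup and the variant search happen in a single round trip. Whether it actually returns variants like "Spiderman HD" depends on how the title field is analyzed, since the join operates on indexed terms, and note that the standard join parser does not produce meaningful relevance scores.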
I'm working with Solr to index a large number of documents, and I would like to monitor the indexing speed.
I generate my document collection as CSV files and then index the documents via curl.
So I can get the QTime in ms, and then estimate the indexing speed per CSV file with this simple calculation:
indexing speed (docs/sec) = number of documents in my CSV file / (QTime / 1000)
From Solr Documentation :
QTime: The elapsed time (in milliseconds) between the arrival of the request (when the SolrQueryRequest object is created) and the completion of the request handler. It does not include time spent in the response writer formatting/streaming the response to the client.
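Given that definition, the unit conversion can be sketched as a small helper (this is not a Solr API, just the arithmetic; the caveat in the quote still applies, since QTime excludes response streaming and says nothing about commit or merge cost):

```python
def docs_per_second(num_docs, qtime_ms):
    """Convert a batch size and its QTime (in milliseconds) into docs/second."""
    if qtime_ms <= 0:
        raise ValueError("QTime must be positive")
    # QTime is reported in milliseconds, so divide by 1000 to get seconds.
    return num_docs / (qtime_ms / 1000.0)
```

For example, a 5000-document CSV indexed with a QTime of 2500 ms works out to 2000 docs/second.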
I'm not sure this is the right way to monitor indexing speed. It's not very precise, and the results don't look right.
Are there some tools that I can use instead of this archaic method ?! :)
Thanks
You can use SolrMeter (https://code.google.com/p/solrmeter/) to monitor indexing speed.
Our web app has recently become the target of some DDoSers. We use Solr, and they managed to generate 100% load by searching for "**" every few seconds. Can someone tell me why that query takes tens of seconds to run, whereas everything else takes just milliseconds? Also, the code appends the user ID to the search, so the query was "userid: 10 AND **", which shouldn't really be slow, because that user only has about 10 documents.
Does anyone know what's going on, and how we can best protect ourselves from it?
Thank you.
** gets interpreted by Solr as a query with a leading and a trailing wildcard, and since no field is specified it lands on your default search field, which (as you said in the comments) is a big text field. So it ends up matching essentially everything, which is probably why it takes so long.
Solution: filter out ** in your application before passing the query to Solr. You could even filter out all * characters if you don't want to allow your users to issue wildcard queries at all.
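A minimal sketch of that filtering step (the function name and operator handling are illustrative, not from any library): drop clauses that consist only of wildcards, then clean up any boolean operators left dangling:

```python
def sanitize_query(user_query):
    """Remove wildcard-only clauses like '**' before the query reaches Solr."""
    # Drop tokens made up entirely of '*' characters.
    tokens = [t for t in user_query.split() if set(t) != {"*"}]
    # Remove boolean operators left dangling at either end.
    while tokens and tokens[0].upper() in ("AND", "OR", "NOT"):
        tokens.pop(0)
    while tokens and tokens[-1].upper() in ("AND", "OR"):
        tokens.pop()
    return " ".join(tokens)
```

Applied to the query from the question, `sanitize_query("userid:10 AND **")` reduces it to just the user filter, and a bare `"**"` becomes an empty query your app can reject outright.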