I have an API which adds multiple rows of data to our Search Index using the following code:
var documentList = new List<IndexBase> { document };
var batch = IndexBatch.Upload(documentList);
await searchIndexClient.Documents.IndexAsync(batch);
After insertion, the API compares the count of documents originally passed in the API call with the count of documents present in the Search Index. However, the counts do not always match.
Adding some delay after the insertion and then querying the Search Index again gives the correct count, so there seems to be a lag between inserting data and it becoming queryable.
Is that the expected behavior?
I am using the Microsoft.Azure.Search.3.0.4 DLL.
Indeed, that is expected behavior: documents indexed to a service become visible for querying after a short delay, as you observed.
While the delay depends on the service topology and indexing load, Azure Search does guarantee that successfully indexed documents will eventually be visible for search requests.
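If you need the count check to pass reliably, one option is to poll the index until the count converges before you compare. Below is a rough sketch using the newer azure-search-documents Python package rather than the Microsoft.Azure.Search .NET SDK from your code; the endpoint, index name, key, and document shape are placeholders:

import time
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

def wait_for_count(client, expected, timeout_s=30, poll_s=1):
    # Poll until the index reports at least `expected` documents or we time out.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if client.get_document_count() >= expected:
            return True
        time.sleep(poll_s)
    return False

client = SearchClient("https://<service>.search.windows.net",
                      "<index-name>", AzureKeyCredential("<api-key>"))
client.upload_documents(documents=[{"id": "1", "title": "example"}])
wait_for_count(client, expected=1)  # only then compare the counts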
For more details, please read the "Response" section of the Add, Update or Delete Documents (Azure Search Service REST API) document.
I hope this helps.
Related
I am building an application with Solr that should give users the first N results of a search, using cursorMark to paginate through R rows at a time.
The problem is that with the client+server relationship, the client knows the page number and the cursorMark, but the server is only told the cursorMark. It's also not safe to trust a page number from the client.
Is there any way I'd be able to determine the offset from a given cursorMark server-side without also storing a list of page number + cursorMark combinations for every search?
For example, I'd like to be able to reject a request whose cursorMark would yield results beyond 10000 for a given search.
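For reference, my paging loop looks roughly like this (a rough Python sketch using the requests library; the collection name, query, and the assumption that id is the uniqueKey are just placeholders):

import requests

SOLR = "http://localhost:8983/solr/mycollection/select"  # placeholder collection
ROWS = 20  # R rows per page

def fetch_page(query, cursor):
    # sort must include the uniqueKey (assumed here to be id) for cursorMark to work
    params = {"q": query, "rows": ROWS, "sort": "score desc, id asc",
              "cursorMark": cursor, "wt": "json"}
    body = requests.get(SOLR, params=params).json()
    return body["response"]["docs"], body["nextCursorMark"]

docs, next_cursor = fetch_page("some query", "*")  # first page uses cursorMark=*
# the server only ever sees next_cursor, not an offset, which is the problem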
I'm working with Solr to store web crawling results to be used in a search engine. The structure of my documents in Solr is the following:
{
  word: the word received after tokenizing the body obtained from the HTML,
  url: the URL where this word was found,
  frequency: the number of times the word was found at that URL
}
When I go to the Solr dashboard on my system, which is http://localhost:8983/solr/#/CrawlerSearchResults/query, I'm able to find a word, say "Amazon", with the query "word: Amazon", but searching for Amazon directly gives no results. Could you please help me out with this issue?
Thanks,
Nilesh.
In your second example, the value is searched against the default search field (since you haven't provided a field name). This is by default a field named _text_.
To support just typing a query into the q parameter without field names, you can either set the default field to search with df=word in your URL, or use the edismax query parser (defType=edismax) together with the qf parameter (query fields). qf lets you list multiple fields and weight them, but in your case it would just be qf=word.
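For example, either of these should return the same documents you see with "word: Amazon" (using the CrawlerSearchResults collection name from your dashboard URL):
http://localhost:8983/solr/CrawlerSearchResults/select?q=Amazon&df=word
http://localhost:8983/solr/CrawlerSearchResults/select?q=Amazon&defType=edismax&qf=word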
Second - what you're doing seems to replicate what Lucene is doing internally, so I'm not sure why you'd do it this way (each word is what's called a "token", and each count is what's called a term frequency). You can write a custom similarity to add custom scoring based on these parameters.
I have the following problem to solve.
The client sends the id of the document as an HTTP GET to a proxy (not directly to Solr). Example:
baseURL/movies/{id}
The response of this call will be a list of variants of this movie.
To find the variants I want to perform a Solr search using the title and some other fields, e.g.
/movies/select?q=title:spiderman+year:2001
which should return the different variants of Spiderman, e.g. SpiderMan, Spiderman HD, etc.
The problem I have now is that the proxy service does not have the title of the original movie; it only receives the movie's id in the API call.
My approach so far is to get the original movie information using the id,
e.g.
/movies/select?id={id}
After I get the original movie, I perform a second request to Solr to search for the variants.
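For clarity, my current two-call approach looks roughly like this (a rough Python sketch using the requests library; error handling and exact field names are simplified):

import requests

SOLR = "http://localhost:8983/solr/movies/select"

def variants(movie_id):
    # call 1: fetch the original movie to learn its title and year
    movie = requests.get(SOLR, params={"q": "id:%s" % movie_id, "wt": "json"}
                         ).json()["response"]["docs"][0]
    # call 2: search for the variants using the fields from call 1
    q = "title:%s AND year:%s" % (movie["title"], movie["year"])
    return requests.get(SOLR, params={"q": q, "wt": "json"}
                        ).json()["response"]["docs"]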
Any ideas how to avoid the two calls to Solr?
I'm trying to implement a keyword/tags search for a certain entity type in GAE's datastore:
from google.appengine.ext import db

class Docs(db.Model):
    title = db.StringProperty()
    user = db.StringProperty()
    tags = db.StringListProperty()
I also wrote a very basic search function (using a fake list of tags, not the datastore values), that takes a query string and matches it to each set of tags. It ranks Docs based on how many query words match the tags. This is pretty much all I need it to do, except using the actual datastore.
I have no idea how to get the actual datastore values though. For my search function to work I need a list of all the entities in the datastore, which is impossible (?).
I also tried looking into GAE's experimental full-text search, and Relation Index Entities as a way to search the datastore without using the function I wrote. Neither was successful.
Any other ideas on how to search for entities based on tags?
It's a very simple query: if you need to find all Docs with the tag "findme", it's simply:
num_results = 10
# equality on a list property matches documents whose tags list contains "findme"
query = Docs.all().filter("tags =", "findme")
results = query.fetch(num_results)  # get list of results
It's well documented:
https://developers.google.com/appengine/docs/python/datastore/queries
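If you also want the ranking you described (ordering Docs by how many query words match their tags), one rough approach is to run that filter once per query word and count the overlaps in memory. A sketch against your Docs model (the 1000-result cap per tag is arbitrary, and it assumes tags are stored lowercase):

def search_docs(query_string, limit=10):
    query_tags = set(query_string.lower().split())
    scores = {}
    for tag in query_tags:
        for doc in Docs.all().filter("tags =", tag).fetch(1000):
            scores[doc.key()] = scores.get(doc.key(), 0) + 1
    # rank by number of matching tags, best first
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [Docs.get(key) for key, _ in ranked[:limit]]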
Is it possible to perform a search with Solr within a subset of data? I am using Solr combined with Tokyo Tyrant in Python.
I read Searching within a subset of data - Solr but I guess it does not really fit my problem because I am not using solr.NET
What I want is to:
find the elements of the data set with code = 'xxxx' and
perform the search within a subset of the data: data whose ids are in a given list, or whose id starts with 'yy'
So 1 is not a problem, but I do not know how to do 2.
Thanks for your help.
Does a query like this work for you: q=id:(id1 OR id2 OR id3) OR id:yy*
You can use id:(id1 OR id2 OR id3) to search for ids in the id field, and id:yy* as a prefix query to match ids starting with yy.
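Putting your two requirements together, you can keep the code condition in the same query, e.g. q=code:xxxx AND (id:(id1 OR id2 OR id3) OR id:yy*). A rough Python sketch with the requests library (the core name is a placeholder):

import requests

SOLR = "http://localhost:8983/solr/mycore/select"  # placeholder core name

ids = ["id1", "id2", "id3"]
q = "code:xxxx AND (id:(%s) OR id:yy*)" % " OR ".join(ids)
docs = requests.get(SOLR, params={"q": q, "wt": "json"}).json()["response"]["docs"]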
If you have a server, there's an administration panel where you can run some requests.
But all you have to do to make a request is to send an HTTP request with the right parameters; the basics are explained here:
http://wiki.apache.org/solr/SolrQuerySyntax
http://wiki.apache.org/solr/CommonQueryParameters