When there is a change in the core and the document gets reindexed, will the nextCursorMark that I already got still be valid? If not, how do I handle such cases?
Yes, the cursorMark will still be valid. A cursorMark is completely stateless, meaning that any changes to the index won't invalidate it.
It will not include documents inserted in front of the mark in the index either (which would otherwise make the same document be displayed twice: in the last position on the previous page and in the first position on the new page).
Think of the cursorMark as an identifier saying "we've moved so far into the result set that any document that sorts in front of this key has already been shown".
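As an illustration (the collection name and cursor values below are made up), deep paging with cursors is just a sequence of requests where each response hands you the key for the next page; note that cursor use requires the sort to include the uniqueKey field:

```
# first page: cursorMark=* means "start of the result set"
/solr/mycollection/select?q=*:*&sort=id asc&rows=10&cursorMark=*
  → returns 10 docs plus "nextCursorMark":"AoEjMDAxMjM0"

# next page: pass the mark back; anything sorting before it is skipped
/solr/mycollection/select?q=*:*&sort=id asc&rows=10&cursorMark=AoEjMDAxMjM0
```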
I have indexed the following record in my collection
{
  "app_name":"atm inspection",
  "appversion":1,
  "id":"app_1427_version_2449",
  "icon":"/images/media/default_icons/app.png",
  "type":"app",
  "app_id":1427,
  "account_id":556,
  "app_description":"inspection",
  "_version_":1599625614495580160
}
It works fine until I search records case-sensitively, i.e. if I write the following Solr query to search for records whose app_name contains atm, then Solr returns the above response, which is the correct behaviour.
http://localhost:8983/solr/NewAxoSolrCollectionLocal/select?fq=app_name:*atm\ *&q=*:*
However, if I execute the following Solr query to search for records whose app_name contains ATM,
http://localhost:8983/solr/NewAxoSolrCollectionLocal/select?fq=app_name:*ATM\ *&q=*:*
Solr does not return the above response because ATM != atm.
Can someone please help me with a Solr query to search records case-insensitively?
Your help is greatly appreciated.
You can't. The field type string requires an exact match (it's a single, unprocessed token being stored as the field value).
The way to do it is to use a TextField with an associated Tokenizer and a LowerCaseFilter. If you use a KeywordTokenizer, the whole token will be kept intact (so it won't get split as you'd usually assume with a tokenizer), and since it's a TextField it can have an analysis chain associated with it, allowing you to add a LowerCaseFilter.
The LowerCaseFilter is multiterm-aware as far as I remember, but remember that wildcard queries will usually not have any filters applied. You should therefore lowercase the value yourself before creating your query (even if it will probably work in this simple case).
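A minimal sketch of such a field type in schema.xml (the name string_ci is just an example):

```xml
<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true">
  <analyzer>
    <!-- keep the whole value as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- lowercase it so "ATM" and "atm" match -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="app_name" type="string_ci" indexed="true" stored="true"/>
```

Remember to reindex after changing the field type, and lowercase wildcard terms on the client side before querying.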
I want to manipulate a doc and change the token value for one or more fields by prepending some value to each token. I am doing bulk updates through DIH and also posting documents through SolrJ. I have a replication factor of 2, so replication should also work. The value that I want to prepend is present in the document as a separate field. I want to know where I can intercept the document before indexing so that I can manipulate it. One option I can think of is overriding DirectUpdateHandler2. Is this the right place?
I could do it by externally processing the document and passing it to Solr, but I want to do it inside Solr.
Document fields are :
city:mumbai
RestaurantName:Talk About
Keywords:Cofee, Chines, South Indian, Bar
I want to index keywords as
mumbai_cofee
mumbai_Chines
mumbai_South Indian
mumbai_Bar
The right place is an Update Request Processor. Make sure you plug it into solrconfig.xml for all update handlers you are using (including DIH); a single URP will then cover all updates.
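As a sketch, the chain is registered in solrconfig.xml roughly like this (the factory class name com.example.PrependCityUpdateProcessorFactory is hypothetical; your own implementation goes there), and is then selected per request with the update.chain parameter:

```xml
<updateRequestProcessorChain name="prepend-city">
  <!-- your custom processor, run before the document is indexed -->
  <processor class="com.example.PrependCityUpdateProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```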
In your Java code in the URP you can easily get the value of a field and then prepend it to all the values of another field, etc. This happens before the doc is indexed.
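The field manipulation itself is trivial; as a plain-Java sketch of the transformation a custom URP would perform inside its processAdd() method (the class and method names here are illustrative, not part of the Solr API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PrependDemo {
    // Prepend the value of one field (e.g. city) to every token of
    // another field (e.g. Keywords), producing the combined values to
    // index, as in the question above.
    public static List<String> prependCity(String city, List<String> keywords) {
        List<String> result = new ArrayList<>();
        for (String keyword : keywords) {
            result.add(city + "_" + keyword);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> keywords = Arrays.asList("Cofee", "Chines", "South Indian", "Bar");
        System.out.println(prependCity("mumbai", keywords));
        // [mumbai_Cofee, mumbai_Chines, mumbai_South Indian, mumbai_Bar]
    }
}
```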
Does someone know how to check whether a document has been indexed correctly after an update with Solr?
I've tried to read the response after calling the add() method of SolrServer as below, but it doesn't seem to work:
SolrInputDocument doc = new SolrInputDocument();
/*
 * Processing on document to add fields ...
 */
UpdateResponse response = server.add(doc);
if (response.getStatus() == 0) {
    System.out.println("File Added");
} else {
    System.out.println("Error when Adding File");
}
The javadoc doesn't say what the add() method returns. Does it always return 0?
In this case, what is the best way to check that a document has been indexed correctly after an update?
Thanks,
Corentin
You need to perform a commit to be able to see the documents you added.
add() will simply add the document to the index.
However, the document is still not returned as a search result unless you commit.
When you are indexing documents to Solr, none of the changes you make (add/delete/update) will appear until you run the commit command.
A commit operation makes index changes visible to new search requests.
Also look into soft commits, which are more performant.
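For reference, commit behaviour can also be automated in solrconfig.xml; the interval values below are purely illustrative and should be tuned to your load:

```xml
<!-- hard commit: flushes changes to disk, here without opening a new searcher -->
<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<!-- soft commit: makes changes visible to searchers cheaply -->
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>
```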
To add to Jayendra's answer, there might be situations where you are trying to index an existing document again, e.g. to test a different index-time chain of analyzers.
In these cases, you might not be able to tell whether the document was indexed again if its content hasn't changed.
In such cases, the _version_ field might come to the rescue: the _version_ field always changes its value when a document is indexed again. Please refer to my answer here to learn more about the _version_ field.
I am trying to query Solr with the following requirements:
- I would like to get all documents which do not have a particular field:
-exclusivity:[* TO *]
- I would like to get all documents which do have this field with the specific value:
exclusivity:(None)
So when I query Solr 4 with:
fq=(-exclusivity:[* TO *]) OR exclusivity:(None)
I only get results where the field exists in the document and its value is None; the results from the first query are missing!
I cannot understand why it is not working.
To explain your results: the query (-exclusivity:[* TO *]) on its own will always get no results, because you haven't specified any documents to retrieve. By default, Lucene doesn't retrieve any results unless you tell it to get them. exclusivity:(None) isn't a limitation placed on the full result set; it is the key used to find the documents to retrieve. This differs from a database, which by default returns all records in a table and allows you to limit the set.
(-exclusivity:[* TO *]) only specifies what NOT to get, but doesn't tell it to GET anything at all.
Solr has logic to handle pure negative queries (I believe in much the same way as below, by implicitly retrieving all documents first), but from what I gather only as the top-level query; it does not handle queries like term1 OR -term2, as documented here.
I believe with Solr you should be able to use the query *:* to get all docs (though that is not available in raw Lucene), so you could use the query:
(*:* -exclusivity:[* TO *]) exclusivity:(None)
which would mean: get (all docs except those with a value in exclusivity) OR docs where exclusivity = "None".
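A worked example with three hypothetical documents may make the difference clearer:

```
doc1: { id:1 }                          (no exclusivity field)
doc2: { id:2, exclusivity:"None" }
doc3: { id:3, exclusivity:"Premium" }

(*:* -exclusivity:[* TO *])   → doc1    (all docs minus those with the field)
exclusivity:(None)            → doc2
(*:* -exclusivity:[* TO *]) exclusivity:(None) → doc1, doc2
```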
I have found the answer to this problem. I had made a bad assumption about how '-' works in Solr. I thought that
-exclusivity:[* TO *]
adds everything without the exclusivity field to the result set, but that is not the case. The '-' can only exclude things from the result set. BTW femtoRgon, you are right, but I am using it as an fq (filter query), not as the main query; I forgot to mention that.
So the solution looks like:
-exclusivity:([* TO *] AND -(None))
and the full query looks like:
/?q=*:*&fq=-exclusivity:([* TO *] AND -(None))
which means I will get everything that either does not have the field exclusivity, or has this field populated with the value None.
I have lots of Solr documents indexed with the field
uri = nntp://msnews.microsoft.com/microsoft.public.windows.server.sbs
but when I search with the query
uri:nntp\://msnews.microsoft.com/microsoft.public.windows.server.sbs
it returns zero results. The search query works with a similar other URI (nntp://msnews.microsoft.com/microsoft.public.windows.windowsxp.general), though.
What am I missing here?
If your search URI is similar to
/select?q=uri%3Anntp*&rows=0
you should still be able to get a good idea of how many items in that field begin with nntp without even returning any rows; the numFound attribute of the result tag will tell you.
If this is blank, I would check your logfile. It is entirely likely you're adding documents with commit turned off. I would use the command-line scripts to force a commit and reopen the readers:
sync
bin/commit
sync
bin/readercycle
Then I would issue that search again and see if you can see your data again.