How to check that a document is really indexed? - solr

Does anyone know how to check whether a document was indexed correctly after an update with Solr?
I tried reading the response returned by the add() method of SolrServer, as below, but it doesn't seem to work:
SolrInputDocument doc = new SolrInputDocument();
/*
 * Processing on document to add fields ...
 */
UpdateResponse response = server.add(doc);
if (response.getStatus() == 0) {
    System.out.println("File Added");
} else {
    System.out.println("Error when Adding File");
}
The Javadoc does not say what the add() method returns. Does it always return 0?
If so, what is the best way to check that a file was indexed correctly after an update?
Thanks,
Corentin

You need to perform a commit before you can see the added documents.
add() only sends the document to the index; the document is not returned as a search result until you commit.
When you index documents into Solr, none of the changes you make (add/delete/update) appear until you run the commit command.
A commit operation makes index changes visible to new search requests.
Also look into soft commits, which make changes visible at a lower cost than a hard commit.
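As a rough way to check this by hand, the sketch below builds two request URLs: one that triggers a commit and one that looks a document up by its unique key with rows=0, so that numFound in the response shows whether the document is searchable. The core URL, the field name "id" and the class name IndexCheckUrls are assumptions for illustration, not taken from the question.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class IndexCheckUrls {
    // URL that asks Solr to commit pending changes on the given core.
    static String commitUrl(String coreUrl) {
        return coreUrl + "/update?commit=true";
    }

    // URL for a query that matches one document by its unique key and returns no rows.
    // The id is sent as a quoted phrase, with backslashes and quotes escaped.
    static String lookupUrl(String coreUrl, String idField, String id) {
        String escaped = id.replace("\\", "\\\\").replace("\"", "\\\"");
        String query = idField + ":\"" + escaped + "\"";
        return coreUrl + "/select?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8) + "&rows=0";
    }

    public static void main(String[] args) {
        String core = "http://localhost:8983/solr/mycore"; // hypothetical core URL
        System.out.println(commitUrl(core));
        System.out.println(lookupUrl(core, "id", "42"));
    }
}
```

Requesting the first URL and then the second should report numFound="1" once the document is visible; before the commit the same lookup would be expected to report 0.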

To add to Jayendra's answer: there are situations where you index an existing document again, e.g. to test a different index-time analyzer chain.
In these cases you cannot tell from the content whether the document was indexed again, because nothing in it changes.
Here the _version_ field can come to the rescue: its value always changes when the document is indexed again. Please refer to my answer here to learn more about the _version_ field.

Related

Will NextCursorMark be valid when documents gets reindexed in Apache Solr?

When a core changes and documents get reindexed, will the nextCursorMark that I already received still be valid? If not, how should such cases be handled?
Yes, the cursorMark will still be valid. A cursorMark is completely stateless, meaning that any changes to the index won't make it invalid.
Documents inserted into the index at a position before the mark are not included either, so the same document cannot be shown twice (as the last item on the previous page and again as the first item on the new page).
Think of the cursorMark as an identifier saying "we've moved so far into the result set that any document that sorts in front of this key has already been shown".
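The idea can be illustrated with a toy model in plain Java. This is only a sketch of the concept, not Solr's implementation: the "index" is a sorted set of keys, the cursor is simply the last key returned, and the class name CursorSketch is made up for the example. A real cursorMark is an opaque string that Solr derives from the sort values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

final class CursorSketch {
    // Returns up to `rows` sort keys that come strictly after `cursor`;
    // a null cursor means "start from the beginning".
    static List<String> page(TreeSet<String> index, String cursor, int rows) {
        NavigableSet<String> rest = (cursor == null) ? index : index.tailSet(cursor, false);
        List<String> out = new ArrayList<>();
        for (String key : rest) {
            if (out.size() == rows) break;
            out.add(key);
        }
        return out;
    }

    public static void main(String[] args) {
        TreeSet<String> index = new TreeSet<>(List.of("a", "b", "c", "d"));
        List<String> first = page(index, null, 2);      // first page: a, b
        String cursor = first.get(first.size() - 1);    // cursor is "b"
        index.add("aa");                                // new key sorts before the cursor
        System.out.println(page(index, cursor, 2));     // prints [c, d]
    }
}
```

Because the next page is computed only from the cursor value and the current state of the index, nothing has to be remembered between requests, and the key added in front of the cursor neither shifts nor duplicates the results.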

Solr indexing fails over media_black_point

Up front, I want to say that I don't have much experience with Solr.
Problem 1: we only want to index the content of files and do not want to add dynamic fields. Is this possible, and if so, how?
Problem 2: if the answer to problem 1 is no, how would we exclude media_black_point and media_white_point from indexing?
The error where Solr trips:
{"responseHeader":{"status":400,"QTime":149},"error":{"metadata":["error-class","org.apache.solr.common.SolrException","root-error-class","org.apache.solr.common.SolrException"],"msg":"incompatible dimension (2) and values (313/1000 329/1000). Only 0 values specified","code":400}}
Dynamic fields and schemaless mode are both there to catch fields you did not declare explicitly. If neither is used, the assumption is that every field you send to Solr (including the output of the extract handler, which generates a Solr document internally) needs to be explicitly mapped. This helps to avoid spelling errors and other unexpected edge cases.
If you want to ignore all the fields you did not define explicitly, you can use dynamic field with stored/indexed/docValues all set to false. Solr ships with one example out of the box, you just need to uncomment it.
The other option is to ignore specific fields. You can do that by defining a custom UpdateRequestProcessor chain (or individual URP in the latest Solr) and using IgnoreFieldUpdateProcessorFactory with your specific field name or a name pattern.

How To intercept Document in Solr

I want to manipulate a document and change the token values of one or more fields by prepending a value to each token. I do bulk updates through DIH and also post documents through SolrJ. My replication factor is 2, so replication should keep working too. The value I want to prepend is present in the document as a separate field. I would like to know where I can intercept the document before indexing so that I can manipulate it. One option I can think of is overriding DirectUpdateHandler2. Is this the right place?
I could do it by processing the document externally and then passing it to Solr, but I want to do it inside Solr.
Document fields are :
city:mumbai
RestaurantName:Talk About
Keywords:Cofee, Chines, South Indian, Bar
I want to index keywords as
mumbai_cofee
mumbai_Chines
mumbai_South Indian
mumbai_Bar
The right place is an Update Request Processor (URP). Make sure you plug it in solrconfig.xml into all the update handlers you are using (including DIH), and that single URP will cover all updates.
In the Java code of the URP you can easily get the value of one field and prepend it to the values of another field. This happens before the document is indexed.
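The transformation itself is small. The sketch below shows only the string handling in plain Java, assuming the keywords arrive as one comma-separated value; the class and method names are hypothetical, and inside a real URP this logic would run against the fields of the incoming document instead of plain strings.

```java
import java.util.ArrayList;
import java.util.List;

final class KeywordPrefixer {
    // Splits a comma-separated keyword string and prefixes each keyword with "<city>_".
    static List<String> prefixKeywords(String city, String keywords) {
        List<String> out = new ArrayList<>();
        for (String keyword : keywords.split(",")) {
            String trimmed = keyword.trim();
            if (!trimmed.isEmpty()) {
                out.add(city + "_" + trimmed);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Values taken from the question; the case of each keyword is left untouched.
        System.out.println(prefixKeywords("mumbai", "Cofee, Chines, South Indian, Bar"));
    }
}
```

Keeping the transformation in a small pure function like this makes it easy to test separately from the Solr plumbing.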

Adding and Updating Solr and lucene field

I am new to Solr. Can someone address the questions below?
1. Currently I have an index with 1.5 million records. I need to update the value of a field to a new value. How do I do it? Will it require re-indexing? Sample code would be helpful.
2. I have another need: I want to add an index field but don't want to reindex the entire content. I have the document ids. For this requirement I can use Lucene if that helps.
Currently I have an index with 1.5 million records. I need to update the value of a field to a new value. How do I do it? Will it require re-indexing? Sample code would be helpful.
Well, the good news is that recent versions of Solr (starting with 4.0) allow you to do what they call Atomic Updates. See here:
http://wiki.apache.org/solr/Atomic_Updates
From the coding point of view, it is as if you were only updating the desired field. Using the Java SolrJ API it's something like this:
Let's say you have a document with a multi value field called "stuffedAnimals". The field already contains "teddy bear" and "stuffed turtle" as values. You want to update it and add a new value like "pink fluffy flamingo". What you can do is:
import java.util.Collections;
import org.apache.solr.common.SolrInputDocument;

SolrInputDocument updateDocument = new SolrInputDocument();
// The id field must be set to the id of the document you want to update:
updateDocument.addField("id", 2312312);
// "add" appends the new value to the existing ones rather than replacing them:
updateDocument.addField("stuffedAnimals", Collections.singletonMap("add", "pink fluffy flamingo"));
// Send the partial document like any other update (server is your SolrServer instance):
server.add(updateDocument);
The problem with this is performance: what actually happens when you do this is that the document is removed and re-added entirely (not just the field). This is something you need to take into consideration if you plan on doing a lot of such operations.
I have another need: I want to add an index field but don't want to reindex the entire content. I have the document ids. For this requirement I can use Lucene if that helps.
Well, as I was saying above: when you update a field, the document is actually re-written entirely, so that means it's re-indexed with the new field as well. If you're using Solr 4.4 or earlier you need to declare the new fields in the schema.xml file. If you're using Solr 4.5 or newer you don't need to worry about the schema.xml any more.
Finally, as a remark for both questions: if you want to update a Solr document, make sure all its fields are marked as "stored" (stored=true in schema.xml). Since a partial update on a field translates into Solr removing and re-adding the document (with the update applied), if certain fields are not stored, Solr won't know what value to put in them after the update.
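The same atomic update can also be sent as a JSON body to the update handler. The sketch below only builds that body in plain Java; the class name AtomicUpdateJson and its minimal escaping are assumptions for illustration, and the operation is passed in so that "set" (replace the value) can be used instead of "add" (append a value).

```java
final class AtomicUpdateJson {
    // Wraps a string in double quotes, escaping backslashes and quotes only.
    // This is deliberately minimal; a JSON library is the safer choice for real input.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds a one-document atomic update such as
    // [{"id":"1","field":{"set":"value"}}]
    static String body(String id, String field, String op, String value) {
        return "[{" + quote("id") + ":" + quote(id) + ","
                + quote(field) + ":{" + quote(op) + ":" + quote(value) + "}}]";
    }

    public static void main(String[] args) {
        // Values taken from the answer above.
        System.out.println(body("2312312", "stuffedAnimals", "add", "pink fluffy flamingo"));
    }
}
```

The resulting string would be posted to the core's /update handler with a JSON content type, followed by a commit, in the same way as any other update.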
Take a look at the atomic update feature added in 4.0.
It allows you to change the value of a particular field without resending the whole document.
Remember that all fields in your schema have to be stored (except copyField destinations). If you need further assistance, please write a more detailed description.

Solr weird search behaviour

I have lots of indexed Solr documents that have the field
uri = nntp://msnews.microsoft.com/microsoft.public.windows.server.sbs
but when I search with the query
uri:nntp\://msnews.microsoft.com/microsoft.public.windows.server.sbs
it returns zero results. The query does work with other, similar URIs (nntp://msnews.microsoft.com/microsoft.public.windows.windowsxp.general) though.
What am I missing here?
If your search URI is similar to
/select?q=uri%3Anntp*&rows=0
you should still get a good idea of how many items in that field begin with nntp without returning any rows; the numFound attribute of the result tag tells you.
If this is zero, I would check your logfile. It is entirely likely you're adding documents with commit turned off. I would use the command-line scripts to force a commit and refresh the readers:
sync
bin/commit
sync
bin/readercycle
Then I would issue that search again and see if you can see your data again.

Resources