solr indexing and reindexing

solr indexing and reindexing - solr

I have the schema with 10 fields. One of the fields is text(content of a file) , rest all the fields are custom metadata. Document doesn't chnages but the metadata changes frequently .
Is there any way to skip the Document(text) while re-indexing. Can I only index only custom metadata? If I skip the Document(text) in re-indexing , does it update the index file by removing the text field from the Index document?

To my knowledge there's no way to selectively update specific fields. An update operation performs a complete replace of all document data. Since Solr is open source, it's possible that you could produce your own component for this if really desired.

Related

Reloading External file field with server up

I am trying to implement an external file field in order to change ranking values in Solr.
I've defined a field and field type in the schema and, in the "solrconfig.xml", bellow the <query> tags, created the external file and added the reload listeners as described in the ref guide:
After server start up, I'm able to sort the documents based on that previous created field, however, when i change the values while the server is up and when I make a new search query, I'm not able to see the updated rank list (neither the updated rank scores).
I also tried adding a reload request handler as suggested in another post and tried a force commit (http://HOST:PORT/solr/update?commit=true), but it says:
DirectUpdateHandler2 No uncommitted changes. Skipping IW.commit.
DirectUpdateHandler2 end_commit_flush
Any suggestions?

Using ExternalFileFields for scoring is really not that useful any more, since Solr and Lucene now supports In-place updates for values that uses docValues.
You can then use those fields directly from your document for scoring, and you can update them without having to update the whole document. That way you don't have to reload anything externally, and your caches can be managed automagically by Solr.
There are three conditions a field has to pass for in-place updates (that being said, atomic updates can also be used, but that requires all your fields to be set as stored):
An atomic update operation is performed using this approach only when
the fields to be updated meet these three conditions:
are non-indexed (indexed="false"), non-stored (stored="false"), single
valued (multiValued="false") numeric docValues (docValues="true")
fields;
the _version_ field is also a non-indexed, non-stored single valued
docValues field; and,
copy targets of updated fields, if any, are also non-indexed,
non-stored single valued numeric docValues fields.

Does copy field requires data re-upload

I have solr instance having lacks of data uploaded. I want to create a new copyfield which is concatenation of existing two fields.
Do I need to repopulate my data?

Yes. From the solr documentation
Fields are copied before analysis is done
By analysis in copyfield context they mean index analizer, which executed when a document is indexed.

conversion of DateField to TrieDateField in Solr

I'm using Apache Solr for powering the search functionality in my Drupal site using a contributed module for drupal named ApacheSolr Search Integration. I'm pretty novice with Solr and have a basic understanding of it, hence wish to convey my apologies in advance if this query sounds outrageous.
I have a date field added through one of drupal's hooks named ds_myDate which I initially used for sorting the search results. I decided to use a date boosting, so that the search results are displayed based on relevancy and boosted by their date rather than merely being displayed by the descending order of date. Once I had updated my hook to implement the same by adding a boost field as recip(ms(NOW/HOUR,ds_myDate),3.16e-11,1,1) I got a HTTP 400 error stating
Can't use ms() function on non-numeric legacy date field ds_myDate
Googling for the same suggested that I use a TrieDateField instead of the Legacy DateField to prevent this error. Adding a TrieDate field named tds_myDate following the suggested naming convention and implementing the boost as recip(ms(NOW/HOUR,tds_myDate),3.16e-11,1,1) did effectively achieve the boosting. However this requires me to reindex all the content (close to 500k records) to populate the new TrieDate field so that I may be able to use it effectively.
I'd request to know if there's an effective workaround than re-indexing all my content such as converting my ds_myDate to a TrieDate field like running an alter query on a mysql table field to change its type. Since I'm unfamiliar with how Solr works would request to know if such an option is feasible and what the right thing to do would be for this case.

You may be able to achieve it by doing a Partial update, but for that you need to be on on Solr 4+ and storing all indexed fields.
Here is how I would go with this:
Make sure version of Solr is 4+
Make sure all indexed fields are stored (requirement for partial updates)
If above two conditions meet, write a script(PHP), which does following:
1) Iterate through full Solr index, and for each doc:
----a) read value stored in ds_myDate field
----b) Convert it to TrieDateField format
----c) Push onto Solr, via partial update to only tds_myDate field (see sample query)
Sample query:
curl 'localhost:8983/solr/update?commit=true' -H 'Content-type:application/json' -d '[{"id":"$id","tds_myDate":{"set":$converted_Val}}]'
For more details on partial updates: http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/

Unfortunately, once a document has been indexed a certain way and you change the schema, you cannot have the new schema changes be applied to existing documents until those documents are re-indexed.
Please see this previous question - Does Schema Change need Reindex for additional details.

Can SOLR perform an UPSERT?

I've been attempting to do the equivalent of an UPSERT (insert or update if already exists) in solr. I only know what does not work and the solr/lucene documentation I have read has not been helpful. Here's what I have tried:
curl 'localhost:8983/solr/update?commit=true' -H 'Content-type:application/json' -d '[{"id":"1","name":{"set":"steve"}}]'
{"responseHeader":{"status":409,"QTime":2},"error":{"msg":"Document not found for update. id=1","code":409}}
I do up to 50 updates in one request and request may contain the same id with exclusive fields (title_en and title_es for example). If there was a way of querying whether or not a list of id's exist, I could split the data and perform separate insert and update commands... This would be an acceptable alternative but is there already a handler that does this? I would like to avoid doing any in house routines at this point.
Thanks.

With Solr 4.0 you can do a Partial update of all those document with just the fields that have changed will keeping the complete document same. The id should match.

Solr does not support UPSERT mechanics out of the box. You can create a record or you can update a record and syntax is different.
And if you update the record you must make sure all your other pre-inserted fields are stored (not just indexed). Under the covers, an update creates a completely new record just pre-populated with previously stored values. But that functionality if very deep in (probably in Lucene itself).
Have you looked at DataImportHandler? You reverse the control flow (start from Solr), but it does have support for checking which records need to be updated and which records need to be created.
Or you can just run a solr query like http://solr.example.com:8983/solr/select?q=id%3A(ID1+ID2+ID3)&fl=id&wt=csv where you ask Solr to look for your ID records and return only ID of records it does find. Then, you could post-process that to segment your Updates and Inserts.

Reindex after change in one field

I have a field with indexed="no" and stored="yes" and now I need to query that field.
How can build this index after setting indexed="yes"? or I need to do a complete reindex (re-import) ?
Thanks

No, you'll need to do a complete reindex. Solr indexes a document at a time and in order to have any change done to any of its fields, the whole document will have to be reindexed.
If all your fields are stored, you might be able to write some code to have the complete reindexing done without having to fetch the data again from the data source -- you can fetch the documents from Solr and then add them back to Solr.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight