I have a field with indexed="no" and stored="yes" and now I need to query that field.
How can build this index after setting indexed="yes"? or I need to do a complete reindex (re-import) ?
Thanks
No, you'll need to do a complete reindex. Solr indexes a document at a time and in order to have any change done to any of its fields, the whole document will have to be reindexed.
If all your fields are stored, you might be able to write some code to have the complete reindexing done without having to fetch the data again from the data source -- you can fetch the documents from Solr and then add them back to Solr.
Related
I'm new to Apache Nifi. My requirement is to retrieve data from a solr index, do some processing and store it in a different solr index.
I'm trying to use Nifi GetSolr processor to retrieve the data. GetSolr processor has a mandatory field Date Field. however My solr index doesn't have a date/timestamp field in the collections.
Please see a sample document in my solr collection below.
Any workaround to this? Can I use GetSolr without the Date field and use someting like the version field instead?
Thanks.
GetSolr is meant to do incremental extraction from an index, meaning each time it runs it finds documents newer than the last time it ran. It can only do that if it can sort the documents by a date/time to compare against it's last execution time.
If you just want a one-time extraction, you may want to use QuerySolr instead.
I have solr instance having lacks of data uploaded. I want to create a new copyfield which is concatenation of existing two fields.
Do I need to repopulate my data?
Yes. From the solr documentation
Fields are copied before analysis is done
By analysis in copyfield context they mean index analizer, which executed when a document is indexed.
I'm using Apache Solr for powering the search functionality in my Drupal site using a contributed module for drupal named ApacheSolr Search Integration. I'm pretty novice with Solr and have a basic understanding of it, hence wish to convey my apologies in advance if this query sounds outrageous.
I have a date field added through one of drupal's hooks named ds_myDate which I initially used for sorting the search results. I decided to use a date boosting, so that the search results are displayed based on relevancy and boosted by their date rather than merely being displayed by the descending order of date. Once I had updated my hook to implement the same by adding a boost field as recip(ms(NOW/HOUR,ds_myDate),3.16e-11,1,1) I got a HTTP 400 error stating
Can't use ms() function on non-numeric legacy date field ds_myDate
Googling for the same suggested that I use a TrieDateField instead of the Legacy DateField to prevent this error. Adding a TrieDate field named tds_myDate following the suggested naming convention and implementing the boost as recip(ms(NOW/HOUR,tds_myDate),3.16e-11,1,1) did effectively achieve the boosting. However this requires me to reindex all the content (close to 500k records) to populate the new TrieDate field so that I may be able to use it effectively.
I'd request to know if there's an effective workaround than re-indexing all my content such as converting my ds_myDate to a TrieDate field like running an alter query on a mysql table field to change its type. Since I'm unfamiliar with how Solr works would request to know if such an option is feasible and what the right thing to do would be for this case.
You may be able to achieve it by doing a Partial update, but for that you need to be on on Solr 4+ and storing all indexed fields.
Here is how I would go with this:
Make sure version of Solr is 4+
Make sure all indexed fields are stored (requirement for partial updates)
If above two conditions meet, write a script(PHP), which does following:
1) Iterate through full Solr index, and for each doc:
----a) read value stored in ds_myDate field
----b) Convert it to TrieDateField format
----c) Push onto Solr, via partial update to only tds_myDate field (see sample query)
Sample query:
curl 'localhost:8983/solr/update?commit=true' -H 'Content-type:application/json' -d '[{"id":"$id","tds_myDate":{"set":$converted_Val}}]'
For more details on partial updates: http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/
Unfortunately, once a document has been indexed a certain way and you change the schema, you cannot have the new schema changes be applied to existing documents until those documents are re-indexed.
Please see this previous question - Does Schema Change need Reindex for additional details.
I added a new field in my schema that is indexed but not stored, so that I can copy another field into it. Do I still have to re-index all the documents because of this schema change? Or can I just restart my solr server? I looks like I have to re-index all documents since sorting on that new non-stored field is giving me unexpected results, but I would like a confirmation on that.
You have to full re-index. As schema change can contain different IndexAnalyzers Solr can't apply schema changes by itself.
Yes, you have to run indexer to actually fill in the data to that filed
I have the schema with 10 fields. One of the fields is text(content of a file) , rest all the fields are custom metadata. Document doesn't chnages but the metadata changes frequently .
Is there any way to skip the Document(text) while re-indexing. Can I only index only custom metadata? If I skip the Document(text) in re-indexing , does it update the index file by removing the text field from the Index document?
To my knowledge there's no way to selectively update specific fields. An update operation performs a complete replace of all document data. Since Solr is open source, it's possible that you could produce your own component for this if really desired.