Conversion of DateField to TrieDateField in Solr

I'm using Apache Solr to power search on my Drupal site through a contributed Drupal module named ApacheSolr Search Integration. I'm fairly new to Solr and have only a basic understanding of it, so I apologize in advance if this question sounds naive.
I have a date field named ds_myDate, added through one of Drupal's hooks, which I initially used for sorting the search results. I then decided to use date boosting, so that results are ranked by relevancy and boosted by their date rather than simply sorted by date in descending order. Once I updated my hook to add the boost function recip(ms(NOW/HOUR,ds_myDate),3.16e-11,1,1), I got an HTTP 400 error stating:
Can't use ms() function on non-numeric legacy date field ds_myDate
Searching for this error suggested that I use a TrieDateField instead of the legacy DateField. Adding a TrieDateField named tds_myDate, following the suggested naming convention, and implementing the boost as recip(ms(NOW/HOUR,tds_myDate),3.16e-11,1,1) did achieve the boosting. However, this requires me to reindex all the content (close to 500k records) to populate the new field before I can use it effectively.
Is there a workaround that avoids reindexing all my content, such as converting ds_myDate to a TrieDateField in place, the way an ALTER query changes a column's type in a MySQL table? Since I'm unfamiliar with how Solr works, I'd like to know whether such an option is feasible and what the right approach would be in this case.
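For reference, Solr's recip(x,m,a,b) function evaluates to a / (m*x + b), so the boost above starts at 1 for a brand-new document and falls to roughly one half after a year. The following is a minimal Python sketch of that arithmetic only; the names date_boost and MS_PER_YEAR are made up for illustration and are not part of Solr.

```python
def recip(x, m, a, b):
    """Mirror of Solr's recip function query: a / (m * x + b)."""
    return a / (m * x + b)

# Milliseconds in an average year; 3.16e-11 is roughly 1 / MS_PER_YEAR.
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000

def date_boost(age_ms):
    """Boost for a document whose date lies age_ms milliseconds in the past."""
    return recip(age_ms, 3.16e-11, 1, 1)

# Print the boost for documents that are 0, 1 and 5 years old.
for years in (0, 1, 5):
    print(years, round(date_boost(years * MS_PER_YEAR), 3))
```

Because ms() has to return a number of milliseconds, the field passed to it must be a numeric (Trie) date, which is what the error message is pointing at.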

You may be able to achieve this with a partial update, but for that you need to be on Solr 4+ and to store all indexed fields.
Here is how I would go about it:
Make sure the Solr version is 4+.
Make sure all indexed fields are stored (a requirement for partial updates).
If both conditions are met, write a script (PHP) that does the following:
1) Iterate through the full Solr index, and for each doc:
   a) Read the value stored in the ds_myDate field.
   b) Convert it to TrieDateField format.
   c) Push it to Solr via a partial update to only the tds_myDate field (see sample query).
Sample query:
curl 'localhost:8983/solr/update?commit=true' -H 'Content-type:application/json' -d '[{"id":"$id","tds_myDate":{"set":"$converted_Val"}}]'
For more details on partial updates: http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/
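The steps above can be sketched roughly as follows, in Python rather than PHP. This only builds the JSON body for the update request and does not talk to Solr. It assumes each document has already been read from the index with its id and stored ds_myDate value, and that the stored value is already in Solr's ISO 8601 UTC form (for example 2012-07-09T10:15:00Z), so it is copied unchanged. The function name build_partial_updates and the sample ids are hypothetical.

```python
import json

def build_partial_updates(docs, source="ds_myDate", target="tds_myDate"):
    """Build one atomic "set" update per document that has a source value.

    Documents without the source field are skipped, since there is
    nothing to copy into the target field.
    """
    updates = []
    for doc in docs:
        value = doc.get(source)
        if value is None:
            continue
        updates.append({"id": doc["id"], target: {"set": value}})
    return updates

# Hypothetical documents as they might come back from a query on the index.
docs = [
    {"id": "node/1", "ds_myDate": "2012-07-09T10:15:00Z"},
    {"id": "node/2"},  # no date stored, so no update is generated
]

payload = json.dumps(build_partial_updates(docs))
print(payload)
```

The resulting string is what would be sent as the -d body of the curl command in the sample query. Batching a few hundred documents per request and committing once at the end is usually kinder to the server than one request per document.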

Unfortunately, once a document has been indexed a certain way and you change the schema, you cannot have the new schema changes be applied to existing documents until those documents are re-indexed.
Please see the previous question "Does Schema Change need Reindex" for additional details.

Related

Apache Nifi GetSolr configuration without Date Field

I'm new to Apache Nifi. My requirement is to retrieve data from a Solr index, do some processing, and store it in a different Solr index.
I'm trying to use the Nifi GetSolr processor to retrieve the data. The GetSolr processor has a mandatory property, Date Field. However, my Solr index doesn't have a date/timestamp field in its collections.
Please see a sample document in my solr collection below.
Is there any workaround for this? Can I use GetSolr without the Date Field and use something like the version field instead?
Thanks.
GetSolr is meant to do incremental extraction from an index, meaning each time it runs it finds documents newer than the last time it ran. It can only do that if it can sort the documents by a date/time to compare against its last execution time.
If you just want a one-time extraction, you may want to use QuerySolr instead.

Reloading External file field with server up

I am trying to implement an external file field in order to change ranking values in Solr.
I've defined a field and field type in the schema and, in solrconfig.xml, below the <query> tags, created the external file and added the reload listeners as described in the ref guide:
After server startup, I'm able to sort the documents based on that previously created field. However, when I change the values while the server is up and then run a new search query, I don't see the updated rank list (or the updated rank scores).
I also tried adding a reload request handler as suggested in another post and tried a force commit (http://HOST:PORT/solr/update?commit=true), but it says:
DirectUpdateHandler2 No uncommitted changes. Skipping IW.commit.
DirectUpdateHandler2 end_commit_flush
Any suggestions?
Using ExternalFileFields for scoring is not that useful any more, since Solr and Lucene now support in-place updates for fields that use docValues.
You can then use those fields directly from your document for scoring, and you can update them without having to update the whole document. That way you don't have to reload anything externally, and your caches can be managed automagically by Solr.
There are three conditions a field has to meet for in-place updates (atomic updates can also be used, but they require all your fields to be stored):
An atomic update operation is performed using this approach only when the fields to be updated meet these three conditions:
1) are non-indexed (indexed="false"), non-stored (stored="false"), single valued (multiValued="false") numeric docValues (docValues="true") fields;
2) the _version_ field is also a non-indexed, non-stored single valued docValues field; and,
3) copy targets of updated fields, if any, are also non-indexed, non-stored single valued numeric docValues fields.
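As a rough illustration, an in-place update uses the same JSON shape as an atomic update, with either a "set" or an "inc" modifier on the docValues field. The Python sketch below only builds the request body. The field name rank and the document ids are hypothetical, and the field is assumed to meet the three conditions quoted above.

```python
import json

def in_place_update(doc_id, field, value, op="set"):
    """Build one update command for a numeric docValues field.

    Only the "set" and "inc" modifiers are accepted here, since those
    are the two operations in-place updates support.
    """
    if op not in ("set", "inc"):
        raise ValueError("in-place updates support only 'set' and 'inc'")
    return {"id": doc_id, field: {op: value}}

# Hypothetical ranking field named "rank" on two hypothetical documents.
body = json.dumps([
    in_place_update("doc1", "rank", 42),           # overwrite the value
    in_place_update("doc2", "rank", 5, op="inc"),  # add 5 to the current value
])
print(body)
```

The body would be posted to the collection's update handler with a JSON content type, followed by a commit, after which the new values are visible to sorting and function queries without reloading any external file.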

How do you update data in Solr 4?

We need to update the index of Solr 4 but are getting some unexpected results. We run a C# program that uses SolrNet to do an AddRange(). In this process, we're adding new documents and also trying to update existing ones.
We're noticing that some records' fields get updated with the latest data, while others still show the old information. Should we be using the information indicated in the documentation?
The documentation indicates we can set an update="set|add|inc" on the field. If we'd like the existing record to be updated, should we use set? Also, when we delete a field, to have it removed, do we need to shut down Solr and restart? Or set null="true"?
Can you point us to some good information on doing updates to Solr data? Thank you.
The documentation reference that you list describes the parameters for Atomic Updates in Solr 4, which are currently not supported in SolrNet; see issue 199 for more details.
Until this support has been added to SolrNet, your only option for updating documents in the index is to resend the entire document (object in C#) with the required updated/deleted fields set appropriately. Internally, Solr will re-add the document to the index with the updated fields.
Also, when you are adding/updating documents in the index, these changes will not be visible to queries against the index until a commit has been issued. I would recommend using the CommitWithin option of AddParameters to let Solr handle this internally; this is described in detail in the SolrWiki - CommitWithin.
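At the HTTP level, resending whole documents with a commit deadline looks roughly like the sketch below. It is written in Python rather than C# and only builds the URL and body, without contacting Solr. The base URL, the 10 second deadline, and the sample document fields are assumptions for illustration; commitWithin is the request parameter Solr's update handler accepts, expressed in milliseconds.

```python
import json
from urllib.parse import urlencode

def build_add_request(base_url, docs, commit_within_ms=10000):
    """Return (url, body) for re-adding complete documents.

    Each document must carry every field it should have afterwards,
    because a document with an existing id replaces the old one entirely.
    """
    query = urlencode({"commitWithin": commit_within_ms})
    url = base_url.rstrip("/") + "/update?" + query
    return url, json.dumps(docs)

# Hypothetical document; omitting a field here removes it from the index.
url, body = build_add_request(
    "http://localhost:8983/solr",
    [{"id": "42", "title": "Updated title", "price": 9.99}],
)
print(url)
print(body)
```

Fields that still show old values after an update are often a sign that no commit has happened yet, or that the resent object did not carry the same unique key as the document it was meant to replace.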

Can SOLR perform an UPSERT?

I've been attempting to do the equivalent of an UPSERT (insert or update if already exists) in solr. I only know what does not work and the solr/lucene documentation I have read has not been helpful. Here's what I have tried:
curl 'localhost:8983/solr/update?commit=true' -H 'Content-type:application/json' -d '[{"id":"1","name":{"set":"steve"}}]'
{"responseHeader":{"status":409,"QTime":2},"error":{"msg":"Document not found for update. id=1","code":409}}
I do up to 50 updates in one request, and a request may contain the same id with exclusive fields (title_en and title_es, for example). If there were a way of querying whether or not a list of ids exists, I could split the data and perform separate insert and update commands. This would be an acceptable alternative, but is there already a handler that does this? I would like to avoid writing any in-house routines at this point.
Thanks.
With Solr 4.0 you can do a partial update of all those documents with just the fields that have changed, while keeping the rest of the document the same. The id should match.
Solr does not support UPSERT mechanics out of the box. You can create a record or you can update a record, and the syntax is different.
And if you update the record, you must make sure all your other pre-inserted fields are stored (not just indexed). Under the covers, an update creates a completely new record, just pre-populated with previously stored values. But that functionality is very deep in (probably in Lucene itself).
Have you looked at DataImportHandler? You reverse the control flow (start from Solr), but it does have support for checking which records need to be updated and which records need to be created.
Or you can just run a solr query like http://solr.example.com:8983/solr/select?q=id%3A(ID1+ID2+ID3)&fl=id&wt=csv where you ask Solr to look for your ID records and return only ID of records it does find. Then, you could post-process that to segment your Updates and Inserts.
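The lookup-then-split idea can be sketched as follows in Python. The sketch builds the query string for the id lookup and partitions a batch once the existing ids are known; it does not call Solr. The function names are hypothetical. The rows parameter is added on the assumption that the default page size would otherwise cut the result off, and ids are quoted so that special characters do not break the query.

```python
from urllib.parse import urlencode

def build_exists_query(ids):
    """Query string asking Solr which of the given ids are already indexed."""
    quoted = []
    for doc_id in ids:
        escaped = doc_id.replace("\\", "\\\\").replace('"', '\\"')
        quoted.append('"%s"' % escaped)
    return urlencode({
        "q": "id:(%s)" % " ".join(quoted),
        "fl": "id",
        "wt": "csv",
        "rows": len(ids),  # ask for as many rows as ids were sent
    })

def split_docs(docs, existing_ids):
    """Partition docs into (inserts, updates) based on the ids Solr returned."""
    existing = set(existing_ids)
    inserts = [d for d in docs if d["id"] not in existing]
    updates = [d for d in docs if d["id"] in existing]
    return inserts, updates

docs = [{"id": "1", "name": "steve"}, {"id": "2", "name": "anna"}]
query = build_exists_query([d["id"] for d in docs])
# Suppose the CSV response listed only id 1 as present.
inserts, updates = split_docs(docs, ["1"])
print(query)
print(inserts, updates)
```

The inserts can then be sent as plain documents and the updates as "set" commands. If one batch contains the same id twice with different fields, merging those entries into a single document before splitting avoids the second one overwriting the first.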

Update document field with solrj

I want to edit a document field in Solr, for example the author name, so I use the following code in SolrJ:
params.set("literal.author","anaconda")
However, author is multiValued="true" in the schema, so "anaconda" does not replace the previous name and is instead appended to the end of the author values. If I omit the multiValued attribute or set it to false, a bad request exception occurs when re-indexing the file with the new author field. How can I solve this problem and delete or modify the previous field value with SolrJ?
Or is there some config I'm missing in the schema?
thanks
The only option I know of would be to query the full document (all fields, using the &fl=* parameter) into a local construct with SolrJ, update the appropriate field(s), and then submit the entire document back to Solr.
No, there is no way to update a specific field of a document in Solr, nor through any of its client APIs.
EDIT: With Solr 4.0 it is possible to partially update documents with only certain fields.
This post should be the correct answer to your question (if you are using SOLR 4.x)
For Solr 4.0 you are able to update a single field on a document, but that version is ALPHA, if you are concerned.
As for the update itself, I think it is only possible with curl; I didn't find any way to update a single field on a doc from the Java side with SolrJ.
You have two options:
As stated in other answers, you can query for the original document, update the field, and then re-save which will overwrite the original document with the new values.
Your other option is to install a nightly build of Solr, where Yonik has added a patch for updateable documents. You should keep an eye on https://issues.apache.org/jira/browse/SOLR-139 as this patch is pretty new and still being worked on.
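The query, modify, and resubmit approach from the answers above can be sketched like this, in Python rather than SolrJ and without contacting Solr. It assumes the document was fetched with all stored fields, that every field is stored, and that the internal _version_ field, if present, should not be sent back. The function name replace_field and the sample document are hypothetical.

```python
import json

def replace_field(doc, field, new_value):
    """Return a copy of a fetched document with one field replaced.

    The original dict is left untouched. The _version_ field is dropped
    so the resubmitted document is treated as a plain overwrite.
    """
    updated = {k: v for k, v in doc.items() if k != "_version_"}
    updated[field] = new_value
    return updated

# Hypothetical document as returned by a query with fl=*.
fetched = {
    "id": "doc1",
    "author": ["old name"],
    "title": "Some title",
    "_version_": 1234567890,
}

# Replace the whole multi-valued author list instead of appending to it.
resubmit = replace_field(fetched, "author", ["anaconda"])
print(json.dumps([resubmit]))
```

Sending the printed body to the update handler overwrites the old document, so the multi-valued author field ends up with only the new value. Any field that was indexed but not stored is lost by this round trip, which is why the stored-fields requirement matters.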
