Update Solr with Nifi - solr

I have a NiFi workflow that retrieves a URL from Solr and calculates its share count on different social media sites. I need to write this share count back to Solr. The two attributes in the flowfile are the URL and its share count. How can I update Solr from NiFi?

You can use the PutSolrContentStream processor to insert a new document into Solr, or overwrite one that is there. There currently isn't support for partial updates, so if you were trying to update a document that was already there, then you need to have the entire original document in NiFi + the updates, so you can send the whole document back in.
PutSolrContentStream can basically do any of the updates described here:
https://lucene.apache.org/solr/guide/6_6/uploading-data-with-index-handlers.html
The default approach is to create a JSON document in NiFi and send that to the JSON update handler.
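
As a rough illustration of the "whole document plus the updates" approach, the sketch below builds the JSON body and the HTTP request that such an update amounts to. The core name `mycore`, the field name `sharecount`, and the helper `build_update_request` are assumptions made for this example, not part of NiFi or Solr; inside NiFi the same JSON would be the flowfile content handed to PutSolrContentStream.

```python
import json
import urllib.request

# Hypothetical endpoint: adjust the host and core name to your installation.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update/json/docs"


def build_update_request(original_doc, share_count):
    """Return a POST request that re-sends the whole document with the new count."""
    doc = dict(original_doc)          # copy every existing field
    doc["sharecount"] = share_count   # assumed field name for the share count
    doc.pop("_version_", None)        # do not echo back Solr's internal version field
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        SOLR_UPDATE_URL + "?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_update_request({"id": "42", "url": "http://example.com/page"}, 17)
print(request.full_url)
print(request.data.decode("utf-8"))
# urllib.request.urlopen(request) would send it to a running Solr instance.
```

Because the document is replaced as a whole, any field left out of `original_doc` would be lost, which is why the original document has to be available in the flow.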

Related

Can I update metadata only with updatedocument API?

I'd like to update only a document's metadata, without re-uploading the document itself.
So I call the updateDocument API without the "File" parameter. The metadata is updated successfully, but unfortunately the enrichment data is gone.
Is this the intended behavior of the updateDocument API?
If I want to update metadata, do I need to upload the document itself again?
https://watson-api-explorer.ng.bluemix.net/apis/discovery-v1#!/Documents/updateDocument
Unfortunately, Discovery does not support updating only the metadata.
As you speculate, you do need to re-upload the document itself, along with the new or updated metadata that you want.
This documentation says:
Update a document
Replace an existing document. Starts ingesting a document with optional metadata.
which I can see may not be completely clear. By saying "Replace an existing document", it is trying to convey that the existing document is always and completely replaced.

Solr JSON Update via Restcall

I am not able to add or update a Solr document via a REST call with this particular format, but the same content works via the Solr admin console.
Content:
{'id':'1', 'Name':'jaga', 'childDocuments':[{'id':'1_1', 'Class':'A'},{'id':'1_2', 'Class':'B'}]}
The documents have a parent-child relationship.
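
No answer is recorded for this question. One thing that may be worth checking, offered as an assumption rather than a diagnosis, is the shape of the body: Solr's JSON update format nests child documents under the key `_childDocuments_`, and a list of documents is sent as a JSON array. The helper name `to_solr_nested` below is made up for this sketch.

```python
import json


def to_solr_nested(parent, children):
    """Build a JSON array holding one parent document with nested children."""
    doc = dict(parent)
    # Solr's JSON update format expects nested documents under this key.
    doc["_childDocuments_"] = [dict(child) for child in children]
    return json.dumps([doc])


payload = to_solr_nested(
    {"id": "1", "Name": "jaga"},
    [{"id": "1_1", "Class": "A"}, {"id": "1_2", "Class": "B"}],
)
print(payload)
```

The resulting string would be POSTed to the core's update handler with a `Content-Type: application/json` header.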

How to set Data Import Handler and Scheduler using solrJ Client

I am new to Solr and have completed a simple search.
Now I want to index documents directly from a database and set up a scheduler or trigger that updates the index whenever the DB changes.
I know that I can do this with DataImportHandler, but I can't understand its flow.
Can you tell me which steps I should start with?
Or can anyone just give me pointers on how to do this?
I want to do all of this using the SolrJ client.
This task requires many parts to work together. Work through https://wiki.apache.org/solr/DataImportHandler
DataImportHandler is a Solr component, which means that it runs inside the Solr instance. All you have to do is configure Solr and then run the DIH through the Dataimport screen.
On the other hand, SolrJ is an API that makes it easy for Java applications to talk to Solr. So you can write your own applications that create, modify, search and delete documents in Solr.
Try implementing simple edit and delete functions on a button click event: send the id with the URL to a servlet and do your JDBC operation there.
After that is successfully committed, call your data import command from SolrJ and redirect to your index page.
That's it.
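
Whatever client triggers it, the "data import command" is an HTTP request to the DataImportHandler with a `command` parameter such as `full-import` or `delta-import`. The sketch below only builds that URL. It assumes the handler is registered under the conventional `/dataimport` path in solrconfig.xml and uses a made-up core name; `dataimport_url` is a helper invented for this example.

```python
from urllib.parse import urlencode


def dataimport_url(base_url, core, command="delta-import", commit=True):
    """Build the URL that asks the DataImportHandler to run an import."""
    params = {
        "command": command,                       # "full-import" or "delta-import"
        "commit": "true" if commit else "false",  # commit once the import finishes
    }
    return f"{base_url}/{core}/dataimport?{urlencode(params)}"


# Hypothetical host and core name.
print(dataimport_url("http://localhost:8983/solr", "mycore"))
```

A scheduler (cron, a servlet timer, or similar) could request this URL after each database change. Note that `delta-import` only works if delta queries are defined in the DIH configuration.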

Search using SOLR is not up to date

I am writing an application in which I present search capabilities based on SOLR 4.
I am facing a strange behaviour: during massive indexing, search requests don't always "see" newly indexed data. It seems like the index reader is not refreshed frequently enough; only after I manually reload the core from the Solr Core Admin window are the expected results returned.
I am indexing my data using JsonUpdateRequestHandler.
Is it a matter of configuration? Do I need to configure Solr to reopen its index reader more frequently somehow?
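Changes to the index are not visible to searches until they are committed.
For SolrJ, do
import org.apache.solr.client.solrj.impl.HttpSolrServer;
// host is the base URL of the Solr core, e.g. http://localhost:8983/solr
HttpSolrServer server = new HttpSolrServer(host);
server.commit();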
For XML either send in <commit/> or add ?commit=true to the URL, e.g. http://localhost:8983/solr/update?commit=true
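
The same idea applies to the JSON update handler used in the question: the commit can be requested through a URL parameter. The sketch below builds such URLs. Besides `commit=true` it shows Solr's `commitWithin` parameter (in milliseconds), which may suit heavy indexing better than committing on every request. The helper name `update_url` is an assumption for this example.

```python
from urllib.parse import urlencode


def update_url(base_url, commit=False, commit_within_ms=None):
    """Build an update URL that asks Solr to make new documents searchable."""
    params = {}
    if commit:
        params["commit"] = "true"                       # commit right after this request
    if commit_within_ms is not None:
        params["commitWithin"] = str(commit_within_ms)  # let Solr commit within N ms
    query = urlencode(params)
    return f"{base_url}/update?{query}" if query else f"{base_url}/update"


print(update_url("http://localhost:8983/solr", commit=True))
print(update_url("http://localhost:8983/solr", commit_within_ms=10000))
```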

How to add data to the solr's schema

I am trying to add new data to Solandra according to the Solr schema, but I can't find any example of this. My ultimate goal is to integrate Solandra with django-solr.
From what I understand of Solr and django-solr, inserting and updating means sending the new data over HTTP to the appropriate path, for example:
http://localhost:8983/solandra/wikipedia/update/json
However, when I access the URL, the browser keeps telling me HTTP ERROR: 404.
Can you help me understand the steps to add and delete data in the Solandra environment?
I also had a look at the reuters-demo, but there the data is inserted by reutersimporter.jar, and I can't see its source. So please help me understand how the system works in terms of inserting and deleting data.
Thank you.
Since you are using the JSON update handler, this UpdateJSON page on the Solr Wiki has some good examples of inserting data using the JSON handler via curl. Also, the Indexing Data section of the Solr Tutorial shows how you can insert data using the post.jar file that is included with the Solr source.
Are you creating the solr schema.xml and solrconfig.xml and posting it to solandra? If you add the JSON handler then this should work. The reutersdemo uses solrj. django-solr should work as well.
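
For orientation, the request bodies that the JSON update handler accepts for adding and deleting look roughly like the output of the sketch below. The field names in the sample document are placeholders and must match your own schema.xml; the helper functions are invented for this example. These bodies are sent with an HTTP POST, for example via curl, rather than by opening the update URL in a browser.

```python
import json


def add_command(doc):
    """JSON body that adds (or overwrites) a single document."""
    return json.dumps({"add": {"doc": doc}})


def delete_by_id_command(doc_id):
    """JSON body that deletes the document with the given unique key."""
    return json.dumps({"delete": {"id": doc_id}})


# Placeholder fields: use the ones defined in your schema.
print(add_command({"id": "doc1", "title": "Example article"}))
print(delete_by_id_command("doc1"))
```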
