How to add data to Solr's schema - solr

I am trying to add new data to Solandra according to the Solr schema, but I can't find any example of this. My ultimate goal is to integrate Solandra with django-solr.
From the original Solr and django-solr, my understanding is that inserts and updates are done by sending the new data over HTTP to the appropriate path, for example:
http://localhost:8983/solandra/wikipedia/update/json
However, when I access the URL, the browser keeps telling me HTTP ERROR: 404.
Can you help me understand the steps to add and delete data in the Solandra environment?
I also had a look at the reuters-demo, but there the data is inserted by reutersimporter.jar, and I can't see its source. So please help me understand how the system works in terms of inserting and deleting data.
Thank you.

Since you are using the JSON update handler, this UpdateJSON page on the Solr Wiki has some good examples of inserting data using the JSON handler via curl. Also, the Indexing Data section of the Solr Tutorial shows how you can insert data using the post.jar file that is included with the Solr source.

Are you creating the Solr schema.xml and solrconfig.xml and posting them to Solandra? If you add the JSON handler, then this should work. The reuters-demo uses SolrJ. django-solr should work as well.
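
As a rough sketch of what the UpdateJSON examples do, the following Java builds (without sending) POST requests that add and delete a document. It assumes Java 11+ for java.net.http, that a JSON update handler is registered at the /update/json path from the question, and that the schema has id and title fields; the class and method names are made up for illustration.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class SolrJsonUpdateSketch {
    // Endpoint taken from the question; it only responds if a JSON update
    // handler is registered at this path in solrconfig.xml (assumption).
    static final String UPDATE_URL =
            "http://localhost:8983/solandra/wikipedia/update/json";

    // Minimal escaping of backslashes and quotes for JSON string values.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Body for the "add" command; "id" and "title" are hypothetical fields.
    static String addBody(String id, String title) {
        return "{\"add\":{\"doc\":{\"id\":\"" + escape(id)
                + "\",\"title\":\"" + escape(title) + "\"}}}";
    }

    // Body for the "delete" command, removing one document by its id.
    static String deleteBody(String id) {
        return "{\"delete\":{\"id\":\"" + escape(id) + "\"}}";
    }

    // Wraps a JSON body in an HTTP POST aimed at the update handler.
    static HttpRequest post(String body) {
        return HttpRequest.newBuilder(URI.create(UPDATE_URL))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

A request built this way could be sent with HttpClient.newHttpClient().send(post(addBody("1", "Solr")), HttpResponse.BodyHandlers.ofString()). Changes normally become searchable only after a commit, for example by appending ?commit=true to the URL.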

Related

Can I update metadata only with the updateDocument API?

I'd like to update only a document's metadata without re-uploading the document itself.
So I call the updateDocument API without the "File" parameter to update only the metadata, but unfortunately the enrichment data is gone (the metadata itself is updated successfully).
Is this how the updateDocument API is specified to behave?
If I want to update the metadata, do I need to upload the document itself again?
https://watson-api-explorer.ng.bluemix.net/apis/discovery-v1#!/Documents/updateDocument
Unfortunately, Discovery does not support updating only the metadata.
As you speculate, you do need to re-upload the document itself, along with the new or updated metadata that you want.
This documentation says:
Update a document
Replace an existing document. Starts ingesting a document with optional metadata.
which I can see may not be completely clear. By saying "Replace an existing document", it is trying to convey that the existing document is always completely replaced.

How to set Data Import Handler and Scheduler using solrJ Client

I am new to Solr search and have completed a simple search.
Now I want to index documents directly from a database and set up a scheduler or trigger that updates the index whenever the DB changes.
I know that I can do this with DataImportHandler, but I can't understand its flow.
Can you tell me which steps I should start with?
Or can anyone just give me pointers on how to do this?
I want to do all of this using the SolrJ client.
This task requires many parts to work together. Work through https://wiki.apache.org/solr/DataImportHandler
DataImportHandler is a Solr component, which means that it runs inside the Solr instance. All you have to do is configure Solr and then run the DIH through the Dataimport screen.
SolrJ, on the other hand, is an API that makes it easy for Java applications to talk to Solr. So you can write your own applications that create, modify, search and delete documents in Solr.
Try implementing simple edit and delete functions on a button click event:
send the id with the URL to a servlet and do your JDBC operation there.
Once that has been committed successfully, call your data import command from SolrJ and redirect to your index page.
That's it.

Solr Cluster + DataImportHandler: can I have autogenerated id?

I'm using Solr 4.3. I've created 4 shards. I configured UniqueKey autogenerated field as described here:
http://wiki.apache.org/solr/UniqueKey
It works fine if I use the actual update handler to insert documents (i.e. if I make an HTTP POST to /update with some JSON data, the unique key is autogenerated for each document).
If however I use the DataImportHandler to pull some documents from database, they are not added to the index, instead I see a warning in the Solr log saying that "mandatory id field is missing".
I know the DataImportHandler doesn't go through the UpdateHandler to add documents, but I was hoping this feature would work for DIH as well...
So my question is: does anybody know how to make id autogeneration work for a Solr 4.3 cluster when using the DataImportHandler to insert documents?
Well, the solution I ended up using was this:
I created a custom transformer in Java (actually I was already using one; I find it faster than writing transformers in JS, the other option Solr offers).
Inside the transformer I pretty much do what the UUIDUpdateProcessorFactory does, adding a generated id to each row:
@Override
public Object transformRow(Map<String, Object> row, Context context) {
    // Give every imported row a random UUID as its id
    row.put("id", UUID.randomUUID());
    return row;
}
I then removed the <updateRequestProcessorChain name="uuid"> tag from my solrconfig.xml and left only the schema.xml configuration as per the link in the question.

How to implement Solr to index a MySQL database in Java?

I want to use Solr to index a MySQL database so that I can search the data on my website faster. Can anyone help me with the code? I don't have any idea how to implement Solr in my code.
Your question is too broad. However, for a head start you could have a look at DataImport in Solr.
You may want to check the Solr Data Import Handler module, which will help you index data from MySQL into Solr without writing any Java code.
If you have downloaded Solr, you can check out the example in solr-4.3.0/example/example-DIH (refer to the readme.txt), which will give you an idea of how the DIH is configured and how the indexing can be done.
// Connect to the Solr instance that hosts the DataImportHandler
CommonsHttpSolrServer commonsHttpSolrServer = new CommonsHttpSolrServer("http://localhost:8983/solr");
// Build the parameters first, since the request is created from them
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("command", "full-import");
QueryRequest request = new QueryRequest(params);
// Route the request to the DIH handler instead of the default /select
request.setPath("/dataimport");
commonsHttpSolrServer.request(request);
NOTE - The request is processed asynchronously, so you receive an immediate response and need to check the status to know whether the import has completed.
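
The status check mentioned in the note could be sketched as below. This uses only the JDK rather than SolrJ; the class and method names are hypothetical, and it assumes that the DIH status command, when asked for wt=json, reports "status":"idle" once no import is running and "status":"busy" while one is.

```java
import java.net.URI;

class DataImportStatusSketch {
    // Builds the URI for a DIH command such as "status" or "full-import".
    // coreUrl is the base URL of the Solr core, e.g. http://localhost:8983/solr
    static URI commandUri(String coreUrl, String command) {
        return URI.create(coreUrl + "/dataimport?command=" + command + "&wt=json");
    }

    // Assumption: the JSON status response contains "status":"idle" when the
    // import has finished. Whitespace is stripped before the text comparison.
    static boolean isIdle(String responseBody) {
        return responseBody.replaceAll("\\s", "").contains("\"status\":\"idle\"");
    }
}
```

A caller would fetch commandUri(base, "status") periodically, for instance with java.net.http.HttpClient, and stop polling once isIdle returns true for the response body.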

Get Solr Schema in json

I was wondering if it's possible to get the full schema, or just the fields the schema defines, in JSON format. Obviously I could scrape the page the schema is shown on,
/solr/#/collection1/schema
do a transformation, and create my own JSON, but it would be nicer if Solr had a built-in method :)
Thanks in advance
You cannot get the schema.xml directly in JSON format, but you can get the raw file from Solr instead of having to scrape the Solr admin page that shows it. You can use this URL, where collection1 is the name of your core:
http://localhost:8080/solr/collection1/admin/file?file=schema.xml&contentType=text/xml;charset=utf-8
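
If JSON is still needed, one option is to convert the fetched XML yourself. The sketch below is only an illustration using the JDK's DOM parser: it takes the schema.xml content as a string and emits the name and type of each <field> element as a JSON array. The class and method names are hypothetical, and it ignores dynamicField and copyField entries.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class SchemaFieldsToJson {
    // Minimal escaping of backslashes and quotes for JSON string values.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Parses schema.xml content and returns its <field> definitions as JSON.
    static String fieldsToJson(String schemaXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(
                        schemaXml.getBytes(StandardCharsets.UTF_8)));
        // Matches only elements named exactly "field"
        NodeList fields = doc.getElementsByTagName("field");
        List<String> entries = new ArrayList<>();
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            entries.add("{\"name\":\"" + escape(field.getAttribute("name"))
                    + "\",\"type\":\"" + escape(field.getAttribute("type")) + "\"}");
        }
        return "[" + String.join(",", entries) + "]";
    }
}
```

The input string would be the body returned by the admin/file URL above.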
