Solr Cluster + DataImportHandler: can I have autogenerated id? - solr

I'm using Solr 4.3. I've created 4 shards. I configured UniqueKey autogenerated field as described here:
http://wiki.apache.org/solr/UniqueKey
It works fine if I use the actual update handler to insert documents (i.e. if I make an HTTP POST to /update with some JSON data, the unique key is autogenerated for each document).
However, if I use the DataImportHandler to pull documents from a database, they are not added to the index. Instead, I see a warning in the Solr log saying that "mandatory id field is missing".
I know the DataImportHandler doesn't go through the UpdateHandler to add documents, but I was hoping this feature would work for DIH as well...
So my question is: does anybody know how to make id autogeneration work for a Solr 4.3 cluster when using the DataImportHandler to insert documents?

Well, the solution I ended up using was this:
I created a custom transformer in Java (I was already using one; I find it faster than writing transformers in JS, the other option Solr offers).
Inside the transformer I pretty much do what the UUIDUpdateProcessorFactory does, adding a generated id to each row:
@Override
public Object transformRow(Map<String, Object> row, Context context) {
    // Give every imported row a random UUID as its unique key
    row.put("id", UUID.randomUUID());
    return row;
}
I then removed the <updateRequestProcessorChain name="uuid"> tag from my solrconfig.xml and left only the schema.xml configuration as per the link in the question.
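For anyone who wants to try the row logic outside Solr, here is a minimal standalone sketch. It does not extend the DIH Transformer class (that needs the Solr jars on the classpath); the class name IdStamper and the method withGeneratedId are made up for illustration, and skipping rows that already carry an id is my own assumption, not something the original transformer did.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Standalone sketch of the row logic only. In a real DIH transformer this body
// would sit inside transformRow(Map<String, Object> row, Context context).
final class IdStamper {

    // Hypothetical helper: adds a random UUID under "id" unless the row
    // already has one (that guard is an assumption, not from the answer).
    static Map<String, Object> withGeneratedId(Map<String, Object> row) {
        if (row.get("id") == null) {
            row.put("id", UUID.randomUUID().toString());
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("title", "example");
        // Prints the row with a freshly generated id next to the title.
        System.out.println(withGeneratedId(row));
    }
}
```

The id is stored as a string here so the sketch does not depend on how the field type handles a java.util.UUID object.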

Related

Tridion and SOLR Configuration

Hi, I am using Tridion CMS configured with SOLR, and I would like to add new fields. I have added the new fields in Tridion, but they are not being picked up properly in SOLR. Is there a step I might be missing when configuring the new fields? If possible, is there a walkthrough of the steps needed to make sure new fields are configured properly between Tridion and SOLR?

Can I query a new handler not in solr config?

I am using solr 5.4.0.
I want to create a new handler in Solr, say "X". This handler is not defined in the Solr config, but can I define it at runtime and include it in a query using the qt field?
In the same way that we can override the bq, qf etc. fields for a handler that already exists in the Solr config, is there support for creating a new handler while issuing the Solr query?
I do not remember being able to create additional request handlers via API in Solr 5.4. You may be able to modify or XInclude a file on a filesystem and reload core. But that's a bit hacky.
In the latest versions of Solr, you do have:
the Config API, to override solrconfig.xml
the Request Parameters API, which lets you define parameter sets that you can apply with the useParams setting in the configuration or as a query parameter in the URL.

how to backup solrconfig file from running solr

I have a single-core Solr server. While Solr was running, the solrconfig.xml and schema.xml files of one collection were replaced by mistake.
The collection still works and responds to requests correctly, but the valid files in the conf folder have been overwritten with the wrong ones. Surely, if I reload the collection, the bad files will be loaded and the collection will stop working correctly.
Is there a way to get solrconfig.xml and schema.xml from the running collection, ignoring the solrconfig.xml and schema.xml files that now exist in the conf folder?
You can read the current running schema and config through the Solr schema API and Solr config API.
Pay attention: the results of these APIs are not the original schema.xml or solrconfig.xml files, but you can rebuild the originals from them.
Note also that the Solr Config API is available only in recent versions of Solr.
In older versions (I have tested version 4.8.1) there is no API for the Solr configuration, so there is no way to fully rebuild the solrconfig.xml file.
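As a rough illustration of reading the live schema and config over HTTP, here is a small Java sketch. It assumes Java 11 or later (for java.net.http), a Solr version recent enough to expose both endpoints, and the usual /solr/<core>/schema and /solr/<core>/config paths; the class name SolrLiveConfig and the base URL and core name you pass in are placeholders.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper for pulling the running schema and config from Solr.
final class SolrLiveConfig {

    // Schema API endpoint for a core, e.g. http://localhost:8983/solr/mycore/schema
    static URI schemaUri(String baseUrl, String core) {
        return URI.create(baseUrl + "/solr/" + core + "/schema");
    }

    // Config API endpoint for a core (only present in recent Solr versions).
    static URI configUri(String baseUrl, String core) {
        return URI.create(baseUrl + "/solr/" + core + "/config");
    }

    // Performs a plain GET and returns the response body as text.
    static String fetch(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

Calling SolrLiveConfig.fetch(SolrLiveConfig.schemaUri("http://localhost:8983", "mycore")) against a running server should return the loaded schema in the API's own format, which you would then translate back into schema.xml by hand.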
You can retrieve the loaded configuration files using the Solr Administration User Interface:
Go to http://<hostname>:<port>/solr.
Select your core in the dropdown menu in the left pane.
A menu appears below the selected core; select Files.
Load the file you want
Or you can go straight to http://<hostname>/solr/#/<corename>/files?file=<filename>
See https://cwiki.apache.org/confluence/display/solr/Files+Screen
Solr versions prior to 4.x show a slightly different interface; if I remember correctly, there is no core dropdown, and solrconfig.xml and schema.xml appear right in the left pane.
On SolrCloud there is an additional dropdown list showing all collections in a given cluster, but you get the idea.
Note: the Solr Admin UI shows you parsed files, so if you ever had to escape special characters, for example in a filter's regex that uses a <, you would have to re-escape it to &lt; once you get the file back in order to prevent a parse error.
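If there are many such values, the re-escaping can be scripted. The sketch below is only an illustration: the class name XmlReEscape is made up, it handles just &, < and >, and it is meant to be applied to individual values (such as a regex), not to the whole file, since that would also escape the markup itself.

```java
// Hypothetical helper that re-escapes a single value before it is pasted
// back into an XML config file.
final class XmlReEscape {

    // The ampersand must be replaced first, otherwise the "&" introduced by
    // "&lt;" and "&gt;" would be escaped a second time.
    static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    public static void main(String[] args) {
        // A lookbehind regex is a typical place where a raw "<" shows up.
        System.out.println(escape("(?<=a)b"));
    }
}
```

Values that sit inside attributes may also need quote characters escaped, which this sketch does not cover.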

Replicating Schemaless SOLR Index

I have an index on a Schemaless solr instance. To allow the application to query some of the fields that are in this index, I have to register these fields using the schema REST API http://localhost:8983/solr/schema/fields.
All works fine in isolation. I can also replicate the index to slaves without problem. However, I am unable to query the replicated index using the fields that were registered via the schema REST API.
That means, if I register the field "button" using the API, I can query using this field on master, but I cannot query on slave. I get error message 400 undefined field button.
Now, I also tried to register this field on the slave in the same way I registered it on the master using the schema REST API. This fails with the message: 400 This IndexSchema is not mutable.
Any idea how this should be addressed?
I presume that when the schema is well defined, the schema.xml can be replicated. But what happens with fields created via the REST API?
I am using SOLR 4.10.3
I have not fully validated that this is the solution to this problem, but my gut feeling tells me that it is. The SOLR master was running SOLR 4.8.0 and the SOLR Slave was running SOLR 4.10.3. It looks like the slave did not completely like the index replicated from 4.8.0. So I downgraded the slave to 4.8.0 and everything works fine.

How to add data to the solr's schema

I am trying to add new data to Solandra according to the Solr schema, but I can't find any example of this. My ultimate goal is to integrate Solandra with django-solr.
As I understand it from the original Solr and django-solr, inserting and updating in Solr means sending the new data over HTTP to the appropriate path, for example:
http://localhost:8983/solandra/wikipedia/update/json
However, when I access the URL, the browser keeps telling me HTTP ERROR: 404.
Can you help me understand the steps to add and delete data in the Solandra environment?
I also had a look at the reuters-demo, but there the insertion is done inside reutersimporter.jar, and I can't see its source. So please help me understand how the system works in terms of inserting and deleting data.
Thank you.
Since you are using the JSON update handler, this UpdateJSON page on the Solr Wiki has some good examples of inserting data using the JSON handler via curl. Also, the Indexing Data section of the Solr Tutorial shows how you can insert data using the post.jar file that is included with the Solr source.
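For reference, the curl examples on that page boil down to an HTTP POST with a JSON body. Here is a minimal Java sketch of building such a request, assuming Java 11 or later; the class name JsonUpdateRequest is made up, the field names in the sample document are placeholders that would have to match your schema, and whether Solandra accepts the request at that path is something I have not verified.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper that prepares a JSON update request for a Solr-style
// update handler. It only builds the request; nothing is sent here.
final class JsonUpdateRequest {

    // updateUrl: e.g. the /update/json path from the question.
    // jsonDocs:  a JSON array of documents, as in the wiki's books.json example.
    static HttpRequest build(String updateUrl, String jsonDocs) {
        return HttpRequest.newBuilder(URI.create(updateUrl))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonDocs))
                .build();
    }

    public static void main(String[] args) {
        // "id" and "title" are placeholder field names, not taken from any real schema.
        HttpRequest request = build(
                "http://localhost:8983/solandra/wikipedia/update/json",
                "[{\"id\":\"doc1\",\"title\":\"Example\"}]");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To actually send it, pass the request to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). Note that opening the URL in a browser issues a GET with no body, which is not how the update handler is meant to be called.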
Are you creating the solr schema.xml and solrconfig.xml and posting it to solandra? If you add the JSON handler then this should work. The reutersdemo uses solrj. django-solr should work as well.
