I am very new to Apache Solr. I understand that Solr accepts CSV, JSON, or XML input, but not plain TXT documents. Is it necessary to convert a text document into JSON or XML before sending it to Solr?
Can you please guide me on how to insert text documents into Apache Solr?
Your help will be appreciated.
You can use Solr's DIH (Data Import Handler) functionality and import the files using the FileDataSource.
You can refer to the page below for further reference:
DIH
We are running Solr 8.4 and SolrJ 8.4. I can successfully retrieve about 18K lines of metrics using curl: curl 'http://localhost:1080/MySolr/admin/metrics'. How can I retrieve the same metrics using SolrJ?
I was unable to find any information in either the Solr or SolrJ documentation about this.
Any help is appreciated.
You can make explicit use of the CommonParams.QT parameter to change the query path to any value:
query.setParam(CommonParams.QT, "/admin/metrics");
This lets you make a custom query to a path under a specific core name.
I am new to Solr. I installed Solr 7.5.0 locally and created a collection from a sample JSON file; I am able to query it and view the data. My requirement is to download the same data as CSV to my local machine. Please help me out.
Use Solr's select endpoint to extract all documents, with q=*:* as the query and wt=csv to select the CSV response writer.
Your query will look like:
http://localhost:8886/solr/product/select?q=*:*&wt=csv&indent=true
Another alternative is to use /export to make requests to export the result set of a query.
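For illustration, the select request above could be scripted roughly as follows. This is a minimal sketch using only the Python standard library; the function names and the `rows` value are assumptions made for the example, while the host, port, and core name are taken from the URL above. Solr returns only 10 rows by default, so the sketch sets `rows` explicitly; it would need to be at least as large as the number of documents to download.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_csv_url(base_url, core, query="*:*", rows=1000):
    """Build a /select URL that asks Solr for CSV output.

    rows=1000 is an assumed value for this sketch, not a Solr default.
    """
    params = urlencode({"q": query, "wt": "csv", "rows": rows})
    return f"{base_url}/{core}/select?{params}"


def download_csv(url, out_path):
    """Fetch the URL and write the raw response body to a local file."""
    with urlopen(url) as response, open(out_path, "wb") as out:
        out.write(response.read())


# Example usage (requires a running Solr instance):
#   url = build_csv_url("http://localhost:8886/solr", "product")
#   download_csv(url, "product.csv")
```

The URL is built separately from the download so it can be inspected or pasted into a browser first.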
I was wondering if it's possible to get the full schema, or just the fields the schema defines, in JSON format. Obviously I could scrape the page the schema is shown on,
/solr/#/collection1/schema
then do a transformation and create my own JSON, but I would prefer a built-in method if Solr has one :)
Thanks in advance
You cannot get the schema.xml directly in JSON format, but you can get the raw file from Solr instead of having to scrape the Solr admin page that shows it. You can use this URL, where collection1 is the name of your core:
http://localhost:8080/solr/collection1/admin/file?file=schema.xml&contentType=text/xml;charset=utf-8
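If the goal is a JSON list of fields, the raw file can be transformed on the client side. The following is a rough sketch, assuming the schema declares its fields as `<field .../>` elements; the function names are made up for this example, and the URL pattern is the one shown above.

```python
import json
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def schema_file_url(base_url, core):
    """Build the admin/file URL for a core's raw schema.xml."""
    return (f"{base_url}/{core}/admin/file"
            "?file=schema.xml&contentType=text/xml;charset=utf-8")


def fields_to_json(schema_xml):
    """Convert every <field> element's attributes into a JSON array.

    Accepts the schema as str or bytes. Other elements such as
    <dynamicField> or <fieldType> are ignored in this sketch.
    """
    root = ET.fromstring(schema_xml)
    fields = [dict(field.attrib) for field in root.iter("field")]
    return json.dumps(fields, indent=2)


# Example usage (requires a running Solr instance):
#   with urlopen(schema_file_url("http://localhost:8080/solr", "collection1")) as r:
#       print(fields_to_json(r.read()))
```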
I am trying to add new data to Solandra according to the Solr schema, but I can't find any example of this. My ultimate goal is to integrate Solandra with django-solr.
My understanding of inserting and updating in Solr, based on the original Solr and django-solr, is that the new data is sent over HTTP to the appropriate path, for example:
http://localhost:8983/solandra/wikipedia/update/json
However, when I access the URL, the browser keeps returning HTTP ERROR: 404.
Can you help me understand the steps to add and delete data in the Solandra environment?
I also had a look at the reuters-demo, but the insert procedure there is handled inside reutersimporter.jar, and I can't see its source. So please help me understand how the system works in terms of inserting and deleting data.
Thank you.
Since you are using the JSON update handler, this UpdateJSON page on the Solr Wiki has some good examples of inserting data using the JSON handler via curl. Also, the Indexing Data section of the Solr Tutorial shows how you can insert data using the post.jar file that is included with the Solr source.
Are you creating the Solr schema.xml and solrconfig.xml and posting them to Solandra? If you add the JSON handler, then this should work. The reuters-demo uses SolrJ. django-solr should work as well.
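As a rough illustration of what the JSON update handler expects, the sketch below builds add and delete commands in Solr's JSON update format and wraps them in POST requests. It assumes the JSON handler is actually configured at the path from the question; the helper names and the document fields (`id`, `title`) are hypothetical.

```python
import json
from urllib.request import Request, urlopen

# Path taken from the question; assumes the JSON update handler is configured there.
UPDATE_URL = "http://localhost:8983/solandra/wikipedia/update/json"


def add_command(doc):
    """JSON update command that adds (or replaces) one document."""
    return {"add": {"doc": doc}}


def delete_by_id_command(doc_id):
    """JSON update command that deletes one document by its unique key."""
    return {"delete": {"id": doc_id}}


def build_update_request(update_url, command):
    """Wrap a command in a POST request with a JSON body (not sent here)."""
    body = json.dumps(command).encode("utf-8")
    return Request(update_url, data=body,
                   headers={"Content-Type": "application/json"},
                   method="POST")


# Example usage (requires a running server with the handler enabled):
#   urlopen(build_update_request(UPDATE_URL, add_command({"id": "1", "title": "Hello"})))
#   urlopen(build_update_request(UPDATE_URL, {"commit": {}}))
```

Update handlers expect a POST with a body, so opening the update URL directly in a browser (a plain GET) does not exercise the handler the way these requests do.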
We have several custom Nutch fields that the crawler picks up and indexes. Transferring them to Solr via solrindex (using the mapping file) appears to work, and the log shows no errors; however, the index in the Solr environment does not contain these fields.
Any help will be much appreciated,
Thanks,
Ashok
What I would do is use a tool like tcpmon to monitor exactly what Nutch is sending to Solr. By examining the XML payload, you can determine whether Nutch is correctly sending those custom fields to Solr. If it is, the problem is on the Solr side. If it is not, re-check your Nutch code.