I am implementing SolrCloud for the first time. I've worked with standalone Solr and have that down pretty well, but I'm not finding much on what you can and can't do with SolrCloud. So my question is about Managed Resources. I know you can CRUD stop words and synonyms using the new RESTful API in Solr. However, with the cloud, do I need to send my CRUD changes to each individual Solr server in the cloud, or do I send them to a different URL that passes them through to each server? I'm new to the cloud setup and ZooKeeper, and I have not found anything in the Solr wiki about working with managed resources in the cloud setup. Any advice would be helpful.
In SolrCloud, configuration and other files like stopwords are stored and maintained by ZooKeeper, which means you do not need to send updates to each server individually.
Once you have SolrCloud running, before putting in any data, you will create a collection. Each collection has its own set of resources (its config folder).
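As a sketch, creating such a collection via the Collections API might look like the following (the shard/replica counts and the config set name are assumptions for illustration):
curl "http://localhost1:8983/solr/admin/collections?action=CREATE&name=techproducts&numShards=1&replicationFactor=2&collection.configName=techproducts_conf"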
So, for example, if you have a collection called techproducts on two servers, localhost1 and localhost2, the commands below will operate on the same resource from either server:
curl "http://localhost1:8983/solr/techproducts/schema/analysis/synonyms/english"
curl "http://localhost2:8983/solr/techproducts/schema/analysis/synonyms/english"
I am using the Nutch REST API to run Nutch crawls on a separate server. I would like to retrieve the crawled data back to my local machine. Is there a way I can use the Nutch dump functionality to dump the data and retrieve it via the API, or am I better off indexing the data into Solr and retrieving it from there?
Thanks for your help.
Currently, the REST API doesn't provide such functionality. The main purpose of the REST API is to configure and launch your crawl jobs. At its core, it allows you to set the configuration of a new crawl job and manage it (to some extent).
The transfer of the crawled data is up to you. That being said, I do have a couple of recommendations:
If you're sending the data into Solr/ES (or any other indexer), I would recommend getting the data directly from there. Both Solr and ES already provide a REST API, with the additional benefit that you can filter which data to "copy over" (see the sketch after this list).
If you're running Nutch in distributed mode (i.e. in a Hadoop cluster), try to use the Hadoop libraries to copy the data to the destination.
If none of this applies, then perhaps relying on something else like rsync or similar might be worth considering.
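As a sketch of the first two options (the host, collection name, uniqueKey field, and HDFS path are all assumptions for illustration): you can page through everything indexed in Solr with cursorMark, repeating the request with the returned nextCursorMark until it stops changing,
curl "http://solr-host:8983/solr/nutch/select?q=*:*&sort=id+asc&rows=500&cursorMark=*&wt=json"
or copy the raw crawl data out of the Hadoop cluster:
hadoop fs -copyToLocal /user/nutch/crawl ./crawl-data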
I have been struggling to find a solution for the scenario below:
I have 2 instances of SolrCloud on Google Compute Engine.
I have 2 instances of an application REST API which call the above Solr cluster for data.
From my application I want to take a backup of Solr, copy the zipped backup file to Google Cloud Storage automatically, and restore it automatically via a URL endpoint.
For that I am trying to make an API endpoint in my application that will call the Solr API below to take a backup:
/admin/collections?action=BACKUP
And making another endpoint that calls the URL below to restore:
/admin/collections?action=RESTORE
However, after taking a backup, my application doesn't have access to the backup files, as they are saved on the Solr instances. So I am not able to copy them to a Google bucket.
Please guide me to a simpler way to achieve this, i.e. automatically backing up and restoring Solr from another GCP instance.
Have you considered something like gcs-fuse? It'll allow you to mount a GCS bucket directly on the file system.
You can then point the BACKUP command directly at the gcsfuse mount point on your Solr Compute Engine VMs, and the whole thing is abstracted away by how the VM is configured (instead of having to upload a local copy afterwards with a separate tool).
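A minimal sketch, assuming a bucket named my-solr-backups, a collection named techproducts, and gcsfuse mounted at the same path on every Solr node (the BACKUP location must be visible to all nodes):
gcsfuse my-solr-backups /mnt/solr-backups
curl "http://localhost:8983/solr/admin/collections?action=BACKUP&name=nightly&collection=techproducts&location=/mnt/solr-backups"
curl "http://localhost:8983/solr/admin/collections?action=RESTORE&name=nightly&collection=techproducts_restored&location=/mnt/solr-backups"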
I found gcsfuse a bit unreliable, and decided to write a wrapper script which first detects the leader for the given collection and then executes the backup directly on that node.
I am relatively new to this, so I am trying to understand the relationships among ZooKeeper, SolrCloud, and HTTP requests.
My understanding is:
ZooKeeper (accessible on port 2181) keeps config files for SolrCloud,
and all HTTP requests go directly to a SolrCloud instance rather than through ZooKeeper.
Therefore, ZooKeeper, in this particular case, is not used for its ability to route (API) requests? I don't really think that should be the case, but based on the tutorials from the official Solr site, it seems all requests need to go through Solr's port 8983.
Solr uses ZooKeeper to keep its clusterstate (which servers have which cores / shards / parts of the complete collection) as well as configuration files and anything else that should be available throughout the cluster.
The request itself is made to Solr, and Solr uses information from ZooKeeper in the background to route the request internally to the correct location. A client can be cloud-aware (such as SolrJ) and query ZooKeeper directly by itself, then contact the correct Solr server straight away instead of having Solr route the request internally. In SolrJ, this is implemented as CloudSolrClient (named CloudSolrServer in older versions of SolrJ), as opposed to the regular SolrServer, which would contact the Solr instance you're referencing and route the request from there.
If you look at the documentation of CloudSolrClient, you can see that it takes the ZooKeeper information as its argument, not the Solr server address. SolrJ makes a request to ZooKeeper, retrieves the clusterstate, and then makes the HTTP request directly to the servers hosting the shard or collection.
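You can also see the clusterstate that Solr keeps in ZooKeeper over plain HTTP, and confirm that any node accepts a query for a collection and routes it internally; a sketch, with the collection name assumed for illustration:
curl "http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json"
curl "http://localhost:8983/solr/techproducts/select?q=*:*"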
Firstly, thanks to Stack Overflow, which gives support to everyone.
I am new to Drupal and the Solr server.
I have successfully installed the Solr server on my system, and I am able to search the data using the "Apache Solr Search" module in Drupal 7.
But actually I don't know what background process is running, and in order to work with it I need some ground knowledge of it. Drupal connects to the Solr server using the URL I provided in the admin UI.
As per my knowledge, the following is the backend flow of the Apache Solr search module:
1) Drupal sends the search string to the Solr server.
2) The Solr server searches for the string and sends the results back to Drupal in JSON format.
3) Drupal displays the results.
But how does the Solr server connect to the Drupal DB in order to search for the string or content?
Please help with this. I really need to know the backend flow of how the request is handled.
Thank you.
I'm not a Drupal specialist, but from the Solr perspective you are searching documents previously indexed in Solr. I.e., all documents must be indexed in Solr prior to the search.
Therefore, you have 2 ways here:
You call the Solr API from your backend and push documents to the Solr index. There are specific Drupal solutions you may research, but here is the wiki article from the Solr perspective describing how to index documents using only the JSON API (see the sketch after this list): http://wiki.apache.org/solr/UpdateJSON
You connect to your database directly from Solr and pull documents into the Solr index. Here is the related wiki page: http://wiki.apache.org/solr/DataImportHandler
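A sketch of the first option using the JSON update API (the core name and document fields are assumptions for illustration):
curl -X POST -H "Content-Type: application/json" "http://localhost:8983/solr/collection1/update?commit=true" --data-binary '[{"id":"node-1","title":"Hello from Drupal"}]'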
I am making a search query against a local Apache Solr server in the browser and seeing the results.
I want to make the same query on the production server.
Since the Tomcat port is blocked on production, I cannot test the query results in the browser.
Is there any method to make the query and see the results?
Solr is a Java web application: if you can't access the port it's listening on, you can't access Solr itself. There's no other way to retrieve data from a remote location. Usually in production Solr is put behind an Apache proxy, so that it protects the whole of Solr and makes accessible only the needed contexts, in your case solr/select, for example, to make queries.
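For example, a minimal httpd fragment along those lines (mod_proxy and mod_proxy_http enabled; host and path are assumptions) might be:
<Location /solr/select>
    ProxyPass http://localhost:8983/solr/select
    ProxyPassReverse http://localhost:8983/solr/select
</Location>
A client would then query http://your-proxy-host/solr/select?q=... without ever reaching Tomcat's port directly.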