We use Solr 6.4.1 and have several cores for searching. One of the cores contains several entities. All steps for refreshing the index are currently started manually from the UI, including entering the database credentials.
My question is: can I reindex a Solr core that has several entities from a remote console? I need to create a CI job for this.
And the second question: where can I specify custom parameters, such as the database credentials, for all cores on the server?
If the application exposes some sort of command for this, you could just trigger that command directly from the CI pipeline. If that's not the case and the indexing/update code is tightly coupled to the UI, you could use the DataImportHandler: you configure in Solr (as described in the documentation) the credentials, the queries that Solr needs to execute, etc., and then just trigger the import handler from the CI pipeline with something like:
http://<host>:<port>/solr/<collection_name>/dataimport?command=delta-import
This will start a delta-import; for more commands, check the Data Import Handler Commands section at the link above.
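For example, a CI job could call the handler with curl. This is only a minimal sketch; the host, core name (mycore) and entity names (books, authors) are placeholders, and it assumes the handler is registered at the usual /dataimport path:

# Trigger a delta-import and commit when it finishes (placeholder host and core name)
curl "http://localhost:8983/solr/mycore/dataimport?command=delta-import&commit=true"

# Reindex only specific entities by repeating the entity parameter (placeholder entity names)
curl "http://localhost:8983/solr/mycore/dataimport?command=full-import&entity=books&entity=authors&clean=false"

# Poll the handler until the status response is no longer "busy"
curl "http://localhost:8983/solr/mycore/dataimport?command=status"

The status call is useful in CI because the import commands return immediately; the job can poll it and only continue once the import has finished.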
Related
I need to create a bunch of solr cores on a Linux box. I can do this relatively easily with a combination of command line interactions to create the necessary directory structure, and the solr admin console to actually create the cores.
I would like to automate this process, but I'm not sure how to proceed. I can create the cores using the REST API, but the directory structure needs to already exist as far as I can tell. Also, I am a Windows user. Is there any way this can be done entirely from a Windows machine?
I'm not looking for code samples; I'm looking for advice on the technology/techniques I would use to accomplish this.
The URL for creating a core is: http://localhost:8983/solr/admin/cores?action=CREATE&name=core-name&instanceDir=path/to/dir&config=solrconfig.xml&dataDir=data
You can write a scheduler (or a script) that creates the core. Before creating the core, check whether the instanceDir exists; if it doesn't, create it and pass its path in the core creation URL.
Next, a Solr core requires a configset. You can create your own configset, add the required files to it, and pass the configset path in the core creation URL as well.
dataDir is the path where the indexes are stored. Create the folder and pass its path in the core creation URL.
You can also store all of these values (configset, instanceDir, etc.) in database tables and read them when creating the core. That way you can change the values in the database as required, and the code keeps working without modification.
If you are running on Unix, you can also run the core creation as a cron job.
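Putting those steps together, a shell script along these lines could be run from a scheduler or a cron job. It is only a sketch; the core name, paths, host, and port are placeholders for illustration:

#!/bin/sh
# Placeholder names and locations; adjust to your installation
CORE=mycore
INSTANCE_DIR=/var/solr/data/$CORE
CONFIGSET=/var/solr/configsets/my_configset

# Create the instance and data directories if they do not exist yet
if [ ! -d "$INSTANCE_DIR" ]; then
  mkdir -p "$INSTANCE_DIR/data"
  cp -r "$CONFIGSET/conf" "$INSTANCE_DIR/conf"
fi

# Ask Solr to create the core pointing at those directories
curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=$CORE&instanceDir=$INSTANCE_DIR&config=solrconfig.xml&dataDir=data"

The same script works as a cron entry once the paths are correct for your environment.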
I am building a Solr web application, and one of its features lets the user create a core and a schema in Solr. My friend implemented it with a child process: it changes into the Solr directory and runs 'bin/solr create -c ...' to create the core. I am considering another approach using the HTTP API, and I found this:
http://localhost:8983/solr/admin/cores?action=CREATE&name=mycore&instanceDir=path/to/instance&configSet=configset2
But apparently it does not work, because you need to create the configuration for the core first. The error looks like this:
Error CREATEing SolrCore 'mycore': Unable to create core [mycore] Caused by: Could not load configuration from directory /opt/solr/server/solr/configsets/configset2
So I am wondering what approach to take, since it seems I can't create a core without setting up a config first. Should I build an input form for creating the core and schema, and only after the user clicks 'submit' process everything: create the config files, create the schema, and finally create the core? I wonder whether that is the best approach.
I am looking forward to any help.
You always need to provide a configuration when creating a core.
When your friend ran the command, it actually used the default configuration data_driven_schema_configs, which you can confirm by reading the help for the create_core command (create is an alias for create_core in a non-Cloud setup):
bin/solr create_core -h
The solr script copied that configuration and then created the core with it.
The example you showed is only valid for SolrCloud. If you are not using SolrCloud, you need to use the Core Admin API directly and manually set up the directory with the configuration.
Note that configsets are a bit tricky: if you create several cores from the same configset, that configset is shared, and changes made through one core affect all of them. So you most likely don't want to use them; instead, copy the configuration as described above.
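As a rough sketch of the "copy the configuration" approach for a standalone (non-Cloud) setup, assuming a default Solr 6.x layout under /opt/solr and a placeholder core name:

# Copy the default configuration into the new core's instance directory instead of sharing a configset
mkdir -p /opt/solr/server/solr/mycore
cp -r /opt/solr/server/solr/configsets/data_driven_schema_configs/conf /opt/solr/server/solr/mycore/conf

# With the conf directory in place, the Core Admin CREATE call succeeds
curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=mycore&instanceDir=mycore"

Your web application could run the equivalent of these steps (create the directory, write the configuration, then call CREATE) after the user clicks 'submit'.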
Currently I have a ZooKeeper instance controlling replication across 3 physical servers. It is the ZooKeeper embedded in Solr. 1 shard, 1 collection.
I have a new requirement for which I will need a new static Solr instance (1 new collection, no replication), with the same schema as the previous collection. A copy of this instance will also be placed on the 3 physical servers mentioned above. A caveat is that I need to perform distributed searches across the 2 collections and have the results blended.
Thanks to javacreed I now know that sharding is not part of my solution. Answers to my previous questions are here and here.
In my current setup I run the following command on the server running ZooKeeper:
java -Dbootstrap_confdir=solr/myApp/conf -Dcollection.configName=myConfig -DzkRun -DnumShards=1 -jar start.jar
Am I correct in saying that this will not change, and that I will now also manually start the non-replicated collection? Do I really only need to change my search queries to include the 'collection' parameter? Something like:
http://localhost:8983/solr/collection1/select?collection=collection1,collection2
This example is from the Solr documentation. I am slightly confused as to whether it should be ...solr/collection1/select?..., ...solr/collection2/select?..., or whether it even matters.
Thanks
Thanks for your kind words, stewart. You can search it directly on Solr as:
http://localhost:8983/solr/select?collection=collection1,collection2
There is no need to mention any collection in the path, since you are specifying them in the collection parameter.
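As a quick illustration (the query itself is just a placeholder), a request like the following should return blended results from both collections; the collection parameter controls what is actually searched, regardless of which core name appears in the path:

curl "http://localhost:8983/solr/collection1/select?collection=collection1,collection2&q=*:*&wt=json"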
I just started using Apache Solr with its data import functionality on my project, following the steps in http://tudip.blogspot.in/2013/02/install-apache-solr-on-ubuntu.html.
Now I have to run two different instances of my project on the same server, with different databases but the same Solr configuration for both projects. How can I do that?
Can anyone help?
Probably the closest you can get is having two different Solr cores. They will run under the same server but will have different configuration (which you can copy paste).
When you say "different databases", do you mean you want to copy from several databases into one join collection/core? If so, you just define multiple entities and possibly multiple datasources in your DataImportHandler config and run either all of them or individually. See the other question for some tips.
If, on the other hand, you mean different Solr cores/collections, then you just want to run Solr with multiple cores. It is very easy: you just need a solr.xml file above your collection level, as described on the Solr wiki. Once you get the basics working, you may want to look at sharing instance directories and having separate data directories to avoid changing the same config twice (the instanceDir vs. dataDir settings on each core).
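As a rough sketch of the two-core approach in a standalone setup (all names and paths below are placeholders), you can keep one copy of the same configuration per core so that each core's data-config.xml points at its own database:

# One instance directory per project, each with its own copy of the shared configuration
mkdir -p /var/solr/project1 /var/solr/project2
cp -r /var/solr/shared_conf /var/solr/project1/conf
cp -r /var/solr/shared_conf /var/solr/project2/conf

# Edit the database URL and credentials in each copy's data-config.xml, then create the cores
curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=project1&instanceDir=/var/solr/project1"
curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=project2&instanceDir=/var/solr/project2"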
I think I'm missing something obvious here. I have to imagine a lot of people open up their Solr servers to other developers and don't want them to be able to modify the index.
Is there something in solrconfig.xml that can be set to effectively make the index read-only?
Update for clarification:
My goal is to use Solr with an existing Lucene index managed by another application. This works just fine, but I want to be sure Solr never tries to write to this index.
Exposing a Solr instance to the public internet is a bad idea. Even though you can strip some components to make it read-only, it just wasn't designed with security in mind; it's meant to be used as an internal service, just like you wouldn't expose an RDBMS.
From the Solr Security wiki page:
First and foremost, Solr does not concern itself with security either at the document level or the communication level. It is strongly recommended that the application server containing Solr be firewalled such that the only clients with access to Solr are your own. A default/example installation of Solr allows any client with access to it to add, update, and delete documents (and of course search/read too), including access to the Solr configuration and schema files and the administrative user interface.
Even ajax-solr, a Solr client for JavaScript meant to run in a browser, recommends talking to Solr through a proxy.
Take for example guardian.co.uk: it's well-known that they use Solr for searching, but they built an API to let others access their content. This way they can define and control exactly what and how they want people to search for things.
Otherwise, any script kiddie can write a trivial loop to DoS your Solr instance and therefore bring down your site.
You can probably just remove the line that defines your solr.XmlUpdateRequestHandler in solrconfig.xml.
Replication is a nice way to set up a read-only instance while still being able to index. Just set up a master with restricted access and a slave that is read-only (by removing the XmlUpdateRequestHandler from its config). The slave will replicate from the master but won't accept any indexing directly.
UPDATE
I just read that in Solr 1.4 you can disable a component. I just tried it on the /update requestHandler, and I was no longer able to index.
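A quick way to check the effect from the command line (host and core name are placeholders): once /update is removed or disabled, update requests should be rejected while searches keep working:

# Should fail (e.g. 404) when the update handler is removed or disabled
curl -s -o /dev/null -w "%{http_code}\n" -H "Content-Type: text/xml" -d "<commit/>" "http://localhost:8983/solr/mycore/update"

# Searches should still return 200
curl -s -o /dev/null -w "%{http_code}\n" "http://localhost:8983/solr/mycore/select?q=*:*"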