We have a standalone Apache Solr setup on a local server for testing purposes, with around 10 cores created in it. Now we are planning to set up a new SolrCloud environment in AWS with numShards=3 and replicationFactor=3.
Is there any way to transfer the existing Apache Solr cores (schema and data) to the new SolrCloud environment in AWS?
We used ZooKeeper to manage configuration files, and uploaded a configuration file to ZooKeeper with: /bin/solr zk upconfig -n -d
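A minimal sketch of the usual migration path, assuming the new nodes run Solr 5+ with the bin/solr script; the ZK1/ZK2/ZK3 hosts, the users core name, and all paths are placeholders:
# upload one core's conf directory as a named config set (placeholder names/paths)
bin/solr zk upconfig -n users_conf -d /path/to/users/conf -z ZK1:2181,ZK2:2181,ZK3:2181
# create a matching collection via the Collections API
curl "http://solr-node1:8983/solr/admin/collections?action=CREATE&name=users&numShards=3&replicationFactor=3&collection.configName=users_conf"
Since the shard count changes (1 standalone core to 3 shards), the index files cannot simply be copied; the usual approach is to reindex the data from the source system into the new collection.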
In Apache Solr 8.4.1, where can I find the Solr port that listens for Solr REST API requests?
Where is the Apache Solr port configured in the Solr filesystem or source files, and in which XML file can I find it?
At the root of your Solr installation, you will find a bin folder that contains the scripts used to interact with Solr instances.
The port Solr binds to, along with other settings, is defined in solr.in.sh (or solr.in.cmd if you are on a Windows machine). As stated in that file:
Settings here will override settings in existing env vars or in bin/solr. The default shipped state of this file is completely commented.
By default, you should have this:
#SOLR_PORT=8983
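To change the port, uncomment that line, set the value, and restart the node. A quick sketch, assuming a plain tarball install where the include file sits in bin/ (service installs keep it at /var/solr/solr.in.sh instead):
# bin/solr.in.sh
SOLR_PORT=8984
# then restart Solr on the new port
bin/solr restart -p 8984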
I am new to Atlas and JanusGraph. I have a local setup of Atlas with HBase and Solr as the backends, loaded with dummy data.
I would like to use the Gremlin CLI + Gremlin Server to connect to the existing data in HBase, i.e. view and traverse the dummy Atlas metadata objects.
This is what I have done so far:
Ran the Atlas server + HBase + Solr, and inserted dummy entities
Ran Gremlin Server with the right configuration
Set graph: { ConfigurationManagementGraph: ..} to janusgraph-hbase-solr.properties
Ran the Gremlin CLI and connected with :remote connect tinkerpop.server conf/remote.yaml session, which connects to the Gremlin Server just fine
Ran graph = JanusGraphFactory.open(..../janusgraph-hbase-solr.properties) and created g = graph.traversal()
I am able to create my own vertices and edges and list them, but I am not able to list anything related to Atlas, i.e. entities etc.
What am I missing?
I want to connect to the existing Atlas setup and traverse the graph with the Gremlin CLI.
Thanks
To be able to access Atlas artifacts from the Gremlin CLI, you will have to add the Atlas dependency JARs to JanusGraph's lib directory.
You can get the JARs from the Atlas Maven repo or from your local build.
$ cp atlas-* janusgraph-0.3.1-hadoop2/lib/
List of JARs:
atlas-common-1.1.0.jar
atlas-graphdb-api-1.1.0.jar
atlas-graphdb-common-1.1.0.jar
atlas-graphdb-janus-1.1.0.jar
atlas-intg-1.1.0.jar
atlas-repository-1.1.0.jar
A sample query could be:
gremlin> :> g.V().has('__typeName','hive_table').count()
==>10
As ThiagoAlvez mentioned, the Atlas Docker image can be used, since TinkerPop Gremlin support is now built into it and it can easily be used to play with JanusGraph and Atlas artifacts using the Gremlin CLI:
Pull the image:
docker pull sburn/apache-atlas
Start Apache Atlas in a container exposing Web-UI port 21000:
docker run -d \
-p 21000:21000 \
--name atlas \
sburn/apache-atlas \
/opt/apache-atlas-2.1.0/bin/atlas_start.py
Install gremlin-server and gremlin-console into the container by running the included automation script:
docker exec -ti atlas /opt/gremlin/install-gremlin.sh
Start gremlin-server in the same container:
docker exec -d atlas /opt/gremlin/start-gremlin-server.sh
Finally, run gremlin-console interactively:
docker exec -ti atlas /opt/gremlin/run-gremlin-console.sh
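From the console you can then point a remote session at the bundled server and try the same kind of query shown in the earlier answer (the __typeName property comes from Atlas; exact results depend on your data):
gremlin> :remote connect tinkerpop.server conf/remote.yaml session
gremlin> :> g.V().has('__typeName','hive_table').count()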
I had this same issue when trying to connect to the Apache Atlas JanusGraph database (org.janusgraph.diskstorage.solr.Solr6Index).
I got it solved by moving the Atlas JARs to the JanusGraph lib folder as anand said, and then configuring janusgraph-hbase-solr.properties.
These are the settings I configured in janusgraph-hbase-solr.properties:
gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=hbase
storage.hostname=localhost
cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.5
index.search.backend=solr
index.search.solr.mode=http
index.search.solr.http-urls=http://localhost:9838/solr
index.search.solr.zookeeper-url=localhost:2181
index.search.solr.configset=_default
atlas.graph.storage.hbase.table=apache_atlas_janus
storage.hbase.table=apache_atlas_janus
I'm running Atlas using this docker image: https://github.com/sburn/docker-apache-atlas
I have a standalone Solr instance with 4 different cores working fine using the embedded Jetty server. I originally configured the cores for v4.10.3, but I have since moved to v5.1 and everything seems to work fine without any changes.
Before going into production, I need to set it up as a SolrCloud installation, initially with 2 nodes (two different machines) and 1 shard per node (to keep it simple). I have been trying to get it to work, but I have not been able to.
I tried to run it like this (I think using start.jar is not the preferred way), having read that Solr will look for multiple configured cores in any nested folders (which works for standalone Solr):
java -DzkRun -DnumShards=2 -Dbootstrap_confdir=solr/ -jar start.jar
but that did not work; it does not find the needed solrconfig.xml file.
My solr.xml file is the standard one:
<solr>
  <solrcloud>
    <str name="host">${host:}</str>
    <int name="hostPort">${jetty.port:8983}</int>
    <str name="hostContext">${hostContext:solr}</str>
    <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
    <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
  </solrcloud>
  <shardHandlerFactory name="shardHandlerFactory"
                       class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:0}</int>
    <int name="connTimeout">${connTimeout:0}</int>
  </shardHandlerFactory>
</solr>
Each core follows the same layout, and its core.properties just has the name of the core:
name=users
My question is: how do I start SolrCloud v5.1 so that the 4 cores are picked up?
In SolrCloud, each of your cores will become a collection.
Each collection will have its own set of config files and data.
You might find this helpful: Moving multi-core SOLR instance to cloud
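As a rough sketch of what that looks like with the Solr 5.1 scripts (node addresses, core names, and paths are placeholders; bin/solr create uploads the config and creates the collection in one step when the node runs in cloud mode):
# start each node in cloud mode; the first runs embedded ZK on port 9983
bin/solr start -c -p 8983
# on the second machine, point at the first node's embedded ZK (placeholder host)
bin/solr start -c -z solr-node1:9983
# create one collection per former core, uploading its existing conf directory
bin/solr create -c users -d /path/to/solr/users/conf -shards 1 -replicationFactor 2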
Solr 5.0 onwards has made some changes to how you create a SolrCloud setup with shards, add collections, etc.
Everything listed below is my understanding of the Solr Reference Guide. I highly recommend going through it thoroughly.
https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide
I set up my servers on a Linux (CentOS) server, but the steps can also be used to set up Solr on a Windows system; for example, there is a solr.cmd file instead of solr.sh.
Here are the steps I followed to create a simple two-shard SolrCloud setup.
Set up the ZooKeeper ensemble. I am assuming you are trying to use the embedded ZK in Solr. For a production system, it is highly recommended to create an external ZK ensemble. You can find the steps to install an external ensemble in this section of the reference guide.
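For orientation, the per-node configuration for a three-node external ensemble is small; a sketch along the lines of the ZooKeeper docs, with placeholder hosts and paths:
# /opt/zookeeper/conf/zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=ZK1:2888:3888
server.2=ZK2:2888:3888
server.3=ZK3:2888:3888
Each node also needs a myid file in dataDir containing just its server number (1, 2 or 3).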
Download Solr to the /opt folder.
Extract the install script ONLY:
tar xzf solr-5.0.0.tgz solr-5.0.0/bin/install_solr_service.sh --strip-components=2
This command will install Solr on your system:
sudo bash ./install_solr_service.sh solr-5.0.0.tgz
The above command will create a new user called "solr" if it does not exist.
These are some of the default options it will assume. You can view them in /var/solr/solr.in.sh, the include file where you can specify other options.
* SOLR_PID_DIR=/var/solr
* SOLR_HOME=/var/solr/data
* LOG4J_PROPS=/var/solr/log4j.properties
* SOLR_LOGS_DIR=/var/solr/logs
* SOLR_PORT=8983
Running install_solr_service in the step above will start a Solr server. Stop the server using service solr stop before making any of the changes below.
Change the Java heap value (optional):
SOLR_HEAP="3g"
This will set both Xmx and Xms to 3 GB.
(This variable is not mentioned in the solr.in.sh file in Solr 5.1; that is a bug which has been fixed and will be released in the next version.)
SOLR_MODE="solrcloud" (required)
This is what you need to start Solr in cloud mode.
ZK_HOST=ZK1:2181,ZK2:2181,ZK3:2181 (required)
(replace ZK1, ZK2, ZK3 with your ZooKeeper host names)
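Taken together, the cloud-related section of /var/solr/solr.in.sh would look roughly like this (host names are placeholders):
# /var/solr/solr.in.sh
SOLR_HEAP="3g"
SOLR_MODE="solrcloud"
ZK_HOST="ZK1:2181,ZK2:2181,ZK3:2181"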
Running the install_solr_service.sh command also creates an init.d script at /etc/init.d/solr.
This init.d script in turn calls the /opt/solr/bin/solr script and includes all the variables from /var/solr/solr.in.sh.
Once you have made the above changes, start Solr again using service solr start.
You can check its status using service solr status.
Creating Collections, Shards and Replicas
- All shard-, collection-, and replica-related commands are now made using the Collections API.
Before creating a collection, a config folder has to be uploaded to ZK.
This can be done using the zkcli.sh script that ships with Solr (not the zkCli.sh on the ZooKeeper servers):
Folder: /opt/solr/server/scripts/cloud-scripts
The command to upload the config folder is:
sh zkcli.sh -cmd upconfig -zkhost zk1:2181,zk2:2181,zk3:2181 -confname yourconfigname -confdir /var/solr/configs/conf
You will run this command 4 times, once for each of your 4 cores, each time changing the path of the conf folder and the config name.
This will upload all the config files in the conf folder under the name 'yourconfigname' in ZooKeeper.
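A small loop saves repeating the command by hand; the core names here are hypothetical, so substitute your own:
cd /opt/solr/server/scripts/cloud-scripts
for core in core1 core2 core3 core4; do
  sh zkcli.sh -cmd upconfig \
     -zkhost zk1:2181,zk2:2181,zk3:2181 \
     -confname "${core}_conf" \
     -confdir "/var/solr/configs/${core}/conf"
done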
Creating a collection
I used the following command to create a new collection:
http://1.1.1.1:8983/solr/admin/collections?action=CREATE&name=yourcollectionname&numShards=2&replicationFactor=1&maxShardsPerNode=1&createNodeSet=1.1.1.1:8983_solr,2.2.2.2:8983_solr&collection.configName=yourconfigname
Happy Searching!
SolrCloud does not use configuration files stored in the core conf directory. To make your cores visible in the SolrCloud structure, you need to upload the configuration files to ZooKeeper and let it manage them for you. Every time a Solr instance comes up, it gets the configuration files stored in ZooKeeper, so your cores don't need a conf directory to work. To upload your core configuration files to ZooKeeper, follow the link below and take a look at "Upload a configuration directory":
https://cwiki.apache.org/confluence/display/solr/Command+Line+Utilities
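From that page, the upload boils down to one zkcli.sh call per configuration directory; a sketch with placeholder host, config name, and path:
./server/scripts/cloud-scripts/zkcli.sh -cmd upconfig \
    -zkhost localhost:2181 -confname myconf -confdir /path/to/core/conf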
I am using DataStax 2.2.2 on my Ubuntu 12.04 system. I just downloaded the tarball and started Solr using:
dse cassandra -s
It starts Solr. I verified it using:
netstat -plten
It shows that port 8983 is being used.
10.XX.XX.XX:8983/solr/ --> the call never ends; it keeps on loading.
I started OpsCenter and checked ip:8888/, which shows the OpsCenter UI.
What is the Solr web interface URL for DataStax 2.2.2, and do I need to change any configuration for Solr?
Any ideas?
The Solr Admin UI default is usually:
<ip>:8983/solr
For example from my test VM on my local machine:
http://192.168.56.20:8983/solr/
I want to be able to add new collections to my SolrCloud dynamically. But to add a collection, I have to upload the config file into ZooKeeper first, right?
But how can I do that without restarting ZooKeeper and my Solr instance?
If you are using Solr 4.0+, then you don't have to restart the Solr instances.
http://technical-fundas.blogspot.in/2014/07/solr-reload-solrconfigxml-and-schemaxml.html
And when you upload the config files to ZooKeeper, you don't have to restart ZooKeeper either.
Steps:
1) Upload the solrconfig.xml / schema.xml files to ZooKeeper
2) Run the RELOAD command (see the sketch below)
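In practice that is a config upload followed by a Collections API RELOAD call; a sketch with placeholder host, collection, and config names:
# 1) re-upload the changed config set
./server/scripts/cloud-scripts/zkcli.sh -cmd upconfig \
    -zkhost localhost:2181 -confname myconf -confdir /path/to/conf
# 2) reload the collection so it picks up the new config
curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection"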
Hope this answers your question.