In SOLR, what is multicore?
Is it a way to create multiple tables (inside a single solr app) with their own set of schema files, or is it about creating different databases (inside a single solr app)?
If we want to create multiple tables (each with its own schema.xml) in a Solr web app, what is the best way to do this, or how can we achieve this in Solr?
Solr multicore is basically a setup that allows a single Solr instance to host multiple cores.
Each core is an independent index, so cores can hold completely different, unrelated sets of entities.
You can have a separate core for each table as well.
For example, if you have collections for Documents, People and Stocks, which are completely unrelated entities, you would want to host them in different cores.
A multicore setup allows you to:
Host unrelated entities separately so that they don't impact each other
Have a different configuration, and hence different behavior, for each core
Perform activities on each core independently (data updates, loads, reloads, replication)
Keep the size of each core in check and configure caching accordingly
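As a sketch of how such a setup might look in Solr 4.x (the host, port and core names here are assumptions for illustration, not from the question), each core can be created through the CoreAdmin API, provided its instanceDir with conf/schema.xml and conf/solrconfig.xml already exists:

```shell
# Create one core per entity type via the CoreAdmin API (Solr 4.x).
# Assumes Solr runs on localhost:8983 and each instanceDir already
# contains a conf/ folder with its own schema.xml and solrconfig.xml.
curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=documents&instanceDir=documents"
curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=people&instanceDir=people"
curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=stocks&instanceDir=stocks"

# List the cores currently hosted by this Solr instance.
curl "http://localhost:8983/solr/admin/cores?action=STATUS"
```

Each core then gets its own URL, e.g. http://localhost:8983/solr/people/select?q=*:*, so queries against one core never touch the others.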
Related
I am using Solr 3.0.1 and I am about to change to Solr 4.6.0.
Usually I just use Solr without defining a core (I think Solr 3.0.1 doesn't emphasize cores yet).
Now that I want to upgrade my Solr to version 4.6.0, there is something new in it.
So I have 3 questions:
What exactly is a Solr core?
When should I use a Solr core?
Is it right that each Solr core is like a table in a (relational) database? That is, can I save different types of data in different cores?
Thanks in advance.
A core is basically an index with a given schema, and it holds a set of documents.
You should use different cores for different collections of documents, but that doesn't mean every different kind of document has to be stored in a different index.
Some examples:
you could have the same documents in different languages stored in different cores and select the core based on the configured language;
you could have different types of documents stored in different cores to keep them physically separated;
but at the same time you could have different types of documents stored in the same index and differentiate them by a field value;
it really depends on your use case.
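The last option, differentiating document types inside a single index by a field value, might look like this (the doctype field and the collection1 core name are assumptions for the example):

```shell
# Single index holding mixed document types, distinguished by a
# "doctype" field. Assumes a core named collection1 on localhost:8983.
# Index two documents of different types:
curl "http://localhost:8983/solr/collection1/update?commit=true" \
     -H "Content-Type: application/json" \
     -d '[{"id":"p1","doctype":"person","name":"Alice"},
          {"id":"b1","doctype":"book","title":"Lucene in Action"}]'

# Query only the "person" documents via a filter query:
curl "http://localhost:8983/solr/collection1/select?q=*:*&fq=doctype:person&wt=json"
```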
You have to think up front about what type of queries you are going to execute against your Solr index. You then lay out the schema of one core, or of several cores, accordingly.
If, for example, you run some JOIN queries on your relational DB, those won't be very efficient (if possible at all) with lots of documents in the Solr index, because this is the NoSQL world (here read as: non-relational). In such a case you might need to denormalize and duplicate your data from several DB tables into one core's schema.
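A denormalized document of this kind, flattening what would be a JOIN between, say, an employees table and a departments table, might be indexed like this (all field names, the core name, and the data are hypothetical):

```shell
# Flatten a would-be SQL JOIN (employees x departments) into a single
# Solr document so it can be retrieved without any join at query time.
# Assumes a core named collection1 on localhost:8983.
curl "http://localhost:8983/solr/collection1/update?commit=true" \
     -H "Content-Type: application/json" \
     -d '[{"id":"emp-42",
           "employee_name":"Alice",
           "dept_id":"7",
           "dept_name":"Engineering",
           "dept_location":"Berlin"}]'
```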
As Francisco has already mentioned, physically a core is represented as an independent entity with its own schema, config and index data.
One caution with a multi-core setup: all the cores configured under the same container instance share the same JVM. This means you should be careful with the amount of data you store in those cores. Lucene, the indexing engine inside Solr, has really neat and fast (de)compression algorithms (in the 4.x versions), so disk space goes further, but JVM heap is something to watch.
The goodies of cores, coupled with the Solr admin UI, include things like:
core reload after schema / solrconfig changes
core hot swap (if you have a live core serving queries, you can hot-swap it with a new core that has the same data plus some modifications)
core index optimization
core renaming
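All of the operations above are also exposed through the CoreAdmin HTTP API, so they can be scripted as well as clicked in the admin UI (the host and core names below are assumptions):

```shell
# CoreAdmin equivalents of the admin-UI operations (Solr 4.x,
# assuming Solr on localhost:8983 and cores named core1/core2).

# Reload a core after editing schema.xml or solrconfig.xml:
curl "http://localhost:8983/solr/admin/cores?action=RELOAD&core=core1"

# Hot-swap a live core with a freshly built one:
curl "http://localhost:8983/solr/admin/cores?action=SWAP&core=core1&other=core2"

# Optimize a core's index (issued against the core's update handler):
curl "http://localhost:8983/solr/core1/update?optimize=true"

# Rename a core:
curl "http://localhost:8983/solr/admin/cores?action=RENAME&core=core2&other=core1_old"
```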
What are the pros and cons of having multiple Solr applications for completely different searches, compared to having a single Solr application with the different searches set up as separate cores?
What is Solr's preferred method? Is a single Solr application with a multicore setup (for the various search indexes) always the right way?
There is no preferred method; it depends on what you are trying to solve. Solr can, by design, handle multiple cores on a single instance, cores spread across several Solr application servers, or collections (in SolrCloud).
Having said that, you would usually go for:
1) A single core on a single Solr instance if your data is fairly small - a few million documents.
2) Multiple Solr instances with a single core each if you want to shard your data (in the case of billions of documents) and get better indexing and query performance.
3) Multiple cores on one or more Solr instances if you need multitenancy separation - for example, a core for each customer, or one core for the catalog and another core for SKUs.
It depends on your use case, the volume of data, the required query response times, etc.
I'd like to set up SolrCloud with one collection consisting of three different shards.
I understand that since a collection represents a single logical index, it must have a single schema. I'm wondering, however, if each shard can have a different solrconfig?
Despite a fair amount of searching, I haven't seen any examples where a collection uses a single schema but multiple solrconfigs. The SolrCloud tutorials I've worked through all initialize the collection with a single bootstrap config:
java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar
However, there are some elements in SolrCloud documentation that leads me to believe a SolrCloud set up with a single schema yet different solrconfig files for each shard might be possible. From "Solr Glossary":
"Collection: In Solr, one or more documents grouped together in a single logical index. A collection must have a single schema, but can be spread across multiple cores."
If a collection must have a single schema, but can consist of multiple cores, is that an indication that these different cores can have different solrconfig's? If so, how can this be set up?
Any help would be much appreciated.
A collection is a logical container for a single configuration; you cannot have cores with different configurations in one collection.
In general, you may query several collections together (see the SolrCloud wiki for that) if those collections have the same schema. This will work only if both collections reside on the same ZooKeeper cluster. Give it a try.
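Querying several same-schema collections together is done with the collection parameter on any SolrCloud node (the collection names and host here are assumptions):

```shell
# Fan a query out over two SolrCloud collections that share a schema.
# Assumes both collections live in the same ZooKeeper ensemble and a
# node is reachable on localhost:8983.
curl "http://localhost:8983/solr/collection1/select?q=*:*&collection=collection1,collection2"
```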
We are building an application. Right now we have one Solr server, but we would like to design the app so that it can support multiple Solr shards in the future if we outgrow our indexing needs.
What are the key things to keep in mind when developing an application that may need to support multiple shards later?
We store the Solr base URL (/solr/) in a DB and use it to execute queries against Solr; the DB holds one URL for updates and one URL for searches.
If we add shards to the Solr environment at a future date, will the process for using the shards be as simple as updating the URLs in the DB, or are there other things that need to be updated? We are using SolrJ.
E.g., would it be enough to change the SolrSearchBaseURL in the DB to
https://solr2/solr/select?shards=solr1/solr,solr2/solr&indent=true&q={search_query}
and update the SolrUpdateBaseURL in the DB to
https://solr2/solr/
?
Basically, what you are describing has already been implemented in SolrCloud. There, ZooKeeper maintains the state of your search cluster (which shards are in which collections, shard replicas, leader and slave nodes, and more), and the load on the indexing and querying sides is handled by hashing.
You could, in principle, get by (at least in the beginning of your cluster's growth) with the system you have developed. But think about replication, adding load balancers, external cache servers (e.g. Varnish): in the long run you would end up implementing something like SolrCloud yourself.
Having said that, there are some caveats to using hash-based indexing and, hence, searching. If you want to implement logical partitioning of your data (say, by date), at this point there is no way to do this except with custom code. There is some work planned around this, though.
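For comparison, spinning up the hash-partitioned setup that SolrCloud gives you out of the box is a single Collections API call (the collection name and shard/replica counts are made up for the example):

```shell
# Create a SolrCloud collection split into 2 hash-ranged shards with
# 2 replicas each; ZooKeeper then routes indexing and queries for you.
# Assumes a SolrCloud node on localhost:8983.
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=2"
```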
Currently I have two different schema sets (setA/ and setB/) sitting under the multicore/ folder in a Jetty Solr path, /opt/solr/example/multicore.
If I want to create shards for each schema, how should I go about it?
Thanks,
Two shards will have the same configuration but different documents. So you make a copy of your configuration on a new server, then put half of the documents on each server.
The Solr wiki page on distributed search gives a bit of information about querying across multiple shards.
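A manual distributed query over two such shards, in the pre-SolrCloud style this answer describes, is just the shards parameter on an ordinary request (the hostnames are assumptions; the setA core name comes from the question):

```shell
# Query one node and have it fan out over both shards. Each shards
# entry is host:port/path-to-core, without the http:// prefix.
# Assumes the two copies of the setA core live on solr1 and solr2.
curl "http://solr1:8983/solr/setA/select?q=*:*&shards=solr1:8983/solr/setA,solr2:8983/solr/setA"
```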