Solr, can you use multiple schema.xml for a document? - solr

Can I use different format (schema.xml) for a document (like car document)?
So that I can use different index to query the same class of documents differently?
(OK.. I can use two instances of Solr, but.. that's the only way? )

Only one schema is possible for a Core.
You can always have different Cores within the same solr with the Multicore configuration.
However, if you have the same entity and want to query it differently, you can have the same schema.xml to hold the values different fields and different field types (Check copyfield) and have different query handler to have weighted queries depending upon the needs.

As far as I know you can only have one schema file per Solr core.
Each core uses its own schema file so if you want to have two different schema files then either set -up a 2nd Solr core or run another instance of Solr

Related

Import data from two tables without join in apache solr

We are using apache solr to implement search in our application.
The search will be such that the user can search for employees, offices or both. We need to have auto suggest feature and search for the same.
My question is how do i import data from two tables without using a join(As offices and tables are not related directly) in db-data-config file. I tried using two entities but it gave me an error saying the unique key needed to be the same.
Also how do i configure the fields of these two entities in the schema.xml file
Please help
You should be perfectly ok with single core and multiple entities.
You just need to have some discriminator that you append to ID column in your database (if it's numeric and you want to use it as identity in Solr).
You would also like to have some column that keeps your data type and declare fields from all tables in Solr document.
Keep in mind that Solr schema is not the same as SQL schema. You can have many fields declared in schema.xml but only use few of them in your documents. It costs nothing. Only fields you actually set are stored.
I've been loading data for many data types with different schema to Solr in my previous project. Let me know if you need some examples, I'll try to find them.
More info about data import in solr:
http://wiki.apache.org/solr/DataImportHandler
It sounds like what you have are two different types of document that you want to index in Solr. To do this I believe you will need to set up a multi-core solr instance with separate schema.xml files for each one.
For more information see this question:
what is SOLR multicore exactly
And here:
https://wiki.apache.org/solr/CoreAdmin

In what condition i can use solr core

I am using solr version 3.0.1, and I am about to change to solr 4.6.0.
Usually I just use solr without defining core (I think solr 3.0.1 doesn't have core yet).
And now I want to upgrade my solr to version 4.6.0, there is something new on it.
So i have 3 questions:
What exactly solr core is?
When i should use solr core?
Is it right that each solr core is like a table in a (relational) database? That is, can I save different type of data in different core?
Thanks in advance.
A core is basically an index with a given schema and will hold a set of documents.
You should use different cores for different collections of documents, it doesn't mean you should store different kind of documents in different indexes.
Some examples:
you could have same documents in different languages stored on different cores and select the core based on configured language;
you could have different type of documents stored in different cores to organize them physically separated;
but at the same time you could have different documents stored on the same index and differentiate them by a field value;
it really depends on your use-case.
You have to think up-front about what type of queries you are going to execute against you Solr index. You then lay down your schema of a core or several cores accordingly.
If you for example execute some JOIN queries on your relational DB, those won't be very efficient (if at all possible) with lots of documents in the SOLR index, because it is NoSQL world (here read as: non-relational). In such a case you might need to duplicate your data from several DB tables into one core's schema.
As Francisco has already mentioned physically core is represented as an independent entity with its own schema, config and index data.
One caution with multi-core setup: all the cores configured under the same container instance will hence share the same JVM. This means you should be careful with the amount of data you store on those cores. Lucene, which is an indexing engine inside Solr, has really neat and fast (de)compression algorithms (in versions 4.x) so disk can leave for longer, but JVM heap is something to care about.
The goodies of cores coupled with the Solr admin UI are things like:
core reload after schema / solrconfig changes
core hot swap (if you have a live core serving queries you can hot swap it with a new core with same data and some modifications)
core index optimization
core renaming

Prefer multiple Solr applications or single application multicore setup?

What are the pros and cons of having multiple Solr applications for completely different searches comparing to having a single Solr application but have different searches setup as separate cores?
What is the Solr's preferred method? Is having a single Solr application with multicore setup (for various search indexes) is always a right way?
There is no preferred method. It depends on what you are trying to solve. So by nature, can handle multiple cores on the single Solr instance or can have cores across Solr application servers , can handle the collection (in solrcloud).
Having said that, usually you go for
1) Single core on a Solr instance if your data is fairly small - few million documents.
2) You go for multiple solr instances with a single core on each if you want to shard your data incase of billions of documents and want to get better indexing and query performance.
3) You go for multiple cores on single or multiple solr instances if you have multitenancy separating, example a core for each customer or a for catalog another core for skus.
It depends on your use case, the volume of data and query response times etc.

Different shards in a SolrCloud set up with different solrconfig's?

I'd like to set up SolrCloud with one collection consisting of three different shards.
I understand that since a collection represents a single logical index, it must have a single schema. I'm wondering, however, if each shard can have a different solrconfig?
Despite a fair amount of searching, I haven't seen any examples where a collection consists of a single schema but multiple solrconfig's. The SolrCloud tutorials I've worked through all init the collection with one bootstrapping config:
java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar
However, there are some elements in SolrCloud documentation that leads me to believe a SolrCloud set up with a single schema yet different solrconfig files for each shard might be possible. From "Solr Glossary":
"Collection: In Solr, one or more documents grouped together in a single logical index. A collection must have a single schema, but can be spread across multiple cores."
If a collection must have a single schema, but can consist of multiple cores, is that an indication that these different cores can have different solrconfig's? If so, how can this be set up?
Any help would be much appreciated.
Collection is a logical container for the same configuration. You cannot have cores with different configuration in single collection.
In general, you may query several collections (see SolrCloud wiki for that), if those collections have same schema. This will work only if both collections reside on the same zookeeper cluster. Give it a try.

Solr schema validation

I don't understand in Solr wiki, whether Solr takes one schema.xml, or can have multiple ones.
I took the schema from Nutch and placed it in Solr, and later tried to run examples from Solr. The message was clear that there was error in schema.
If I have a Solr, am I stuck to a specific schema? If not, where is the information for using multiple ones?
From the Solr Wiki - SchemaXml page:
The schema.xml file contains all of the details about which fields
your documents can contain, and how those fields should be dealt with
when adding documents to the index, or when querying those fields.
Now you can only have one schema.xml file per instance/index within Solr. You can implement multiple instances/indexes within Solr by using the following strategies:
Running Multiple Indexes - please see this Solr Wiki page for more details.
There are various strategies to take when you want to manage multiple "indexes" in a Single Servlet Container
Running Multiple Cores within a Solr instance. - Again, see the Solr Wiki page for more details...
Multiple cores let you have a single Solr instance with separate
configurations and indexes, with their own config and schema for very
different applications, but still have the convenience of unified
administration. Individual indexes are still fairly isolated, but you
can manage them as a single application, create new indexes on the fly
by spinning up new SolrCores, and even make one SolrCore replace
another SolrCore without ever restarting your Servlet Container.

Resources