I can't tell from the Solr wiki whether Solr takes a single schema.xml or can have multiple ones.
I took the schema from Nutch and placed it in Solr, and later tried to run the examples that ship with Solr. The error message made it clear that there was an error in the schema.
If I have a Solr instance, am I stuck with one specific schema? If not, where can I find information on using multiple ones?
From the Solr Wiki - SchemaXml page:
The schema.xml file contains all of the details about which fields
your documents can contain, and how those fields should be dealt with
when adding documents to the index, or when querying those fields.
Now you can only have one schema.xml file per instance/index within Solr. You can implement multiple instances/indexes within Solr by using the following strategies:
Running Multiple Indexes - please see this Solr Wiki page for more details.
There are various strategies to take when you want to manage multiple "indexes" in a Single Servlet Container
Running Multiple Cores within a Solr instance. - Again, see the Solr Wiki page for more details...
Multiple cores let you have a single Solr instance with separate
configurations and indexes, with their own config and schema for very
different applications, but still have the convenience of unified
administration. Individual indexes are still fairly isolated, but you
can manage them as a single application, create new indexes on the fly
by spinning up new SolrCores, and even make one SolrCore replace
another SolrCore without ever restarting your Servlet Container.
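As a rough illustration of the multi-core layout, each core is addressed under its own path on the same Solr instance, and each has its own schema. The sketch below only builds those URLs; the base URL and the core names ("nutch", "examples") are assumptions for illustration, not something taken from the wiki.

```python
from urllib.parse import quote


def core_url(base_url: str, core: str, handler: str = "select") -> str:
    """Build the URL of a request handler on one core of a Solr instance."""
    return f"{base_url.rstrip('/')}/{quote(core, safe='')}/{handler.lstrip('/')}"


BASE = "http://localhost:8983/solr"  # assumed local Solr instance

# Hypothetical core names: one core per schema.
for core in ("nutch", "examples"):
    print(core_url(BASE, core), core_url(BASE, core, "update"))
```

Queries and updates for the Nutch schema would then go to the first core only, while the Solr examples keep their own schema in the second.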
We have thousands of Solr indexes/collections that share pages being crawled by Nutch.
Currently these pages are being crawled multiple times, once for each Solr index that contains them.
Is it possible to crawl these sites once and share the crawl data between indexes?
Maybe by checking the existing crawldbs to see whether a site has already been crawled, and taking the data from there for parsing and indexing.
Or crawl all sites in one go, and then selectively submit crawl data to each index (e.g. one site per segment, though I'm not sure how to identify which segment belongs to which site, because segment names are numeric).
Any ideas or help appreciated :)
You will need to write a new indexer plugin to do that; look at the SolrIndexer of Nutch to understand how to write a new indexer. In that indexer, you should do the following:
Define three or four Solr server instances, one for each core.
Inside the write method of the indexer, examine the type of the document and use the right Solr core to add the document. Ideally, you should have a field in the Nutch document that you can use to determine where to send it.
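The routing step above can be sketched as follows. A real Nutch indexer plugin is written in Java, so this Python snippet only shows the decision logic. The "host" field, the core names and the mapping are all hypothetical, and the mapping allows one page to go to several cores because the question says pages are shared between indexes.

```python
from collections import defaultdict

# Hypothetical mapping from crawled host to the Solr cores that want it.
CORES_BY_HOST = {
    "example.com": ["core_a", "core_b"],
    "example.org": ["core_b"],
}


def cores_for(doc, field="host"):
    """Return the cores a document should be sent to (empty if unknown)."""
    return CORES_BY_HOST.get(doc.get(field), [])


def batch_by_core(docs):
    """Group documents into one batch per target core."""
    batches = defaultdict(list)
    for doc in docs:
        for core in cores_for(doc):
            batches[core].append(doc)
    return dict(batches)


docs = [
    {"id": "1", "host": "example.com"},
    {"id": "2", "host": "example.org"},
    {"id": "3", "host": "unknown.net"},
]
batches = batch_by_core(docs)
```

Each batch would then be sent to its own core, so every page is crawled once and indexed wherever it is needed.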
We are using Apache Solr to implement search in our application.
The user should be able to search for employees, offices, or both. We need an auto-suggest feature as well as the search itself.
My question is: how do I import data from two tables without using a join (the employees and offices tables are not directly related) in the db-data-config file? I tried using two entities, but it gave me an error saying the unique key needed to be the same.
Also, how do I configure the fields of these two entities in the schema.xml file?
Please help.
You should be perfectly ok with single core and multiple entities.
You just need a discriminator that you append to the ID column from your database (if it is numeric and you want to use it as the unique key in Solr).
You will also want a column that holds the data type, and you should declare the fields from all tables in the Solr schema.
Keep in mind that a Solr schema is not the same as an SQL schema. You can have many fields declared in schema.xml but only use a few of them in any given document. It costs nothing. Only the fields you actually set are stored.
I've been loading data for many data types with different schema to Solr in my previous project. Let me know if you need some examples, I'll try to find them.
More info about data import in Solr:
http://wiki.apache.org/solr/DataImportHandler
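The discriminator idea from this answer can be sketched like this. It is a minimal illustration, not DataImportHandler configuration: the column names, the "type" field and the "employee-1" style of ID are assumptions chosen for the example.

```python
def to_solr_doc(row: dict, doc_type: str, id_column: str = "id") -> dict:
    """Turn one database row into a Solr document with a type-prefixed ID."""
    # Copy only the columns that are set; unused schema fields are simply omitted.
    doc = {k: v for k, v in row.items() if k != id_column and v is not None}
    doc["id"] = f"{doc_type}-{row[id_column]}"  # discriminator keeps IDs unique
    doc["type"] = doc_type                      # lets queries filter by data type
    return doc


# Hypothetical rows from the two unrelated tables.
employees = [{"id": 1, "name": "Ada", "title": "Engineer"}]
offices = [{"id": 1, "city": "Oslo", "phone": None}]

docs = [to_solr_doc(r, "employee") for r in employees]
docs += [to_solr_doc(r, "office") for r in offices]
```

Both rows have the numeric ID 1 in the database, but the resulting documents get distinct unique keys, so they can live in the same core.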
It sounds like you have two different types of document that you want to index in Solr. To do this, I believe you will need to set up a multi-core Solr instance with a separate schema.xml file for each core.
For more information see this question:
what is SOLR multicore exactly
And here:
https://wiki.apache.org/solr/CoreAdmin
Currently I have a ZooKeeper-managed setup with multiple Solr servers and a single shard. Unique IDs are generated automatically by Solr.
I now have a requirement for a ZooKeeper-managed setup with multiple Solr servers and multiple shards. I need to be able to route updates to a specific shard.
After reading http://searchhub.org/2013/06/13/solr-cloud-document-routing/ I am concerned that I cannot let Solr generate random unique IDs if I want to route updates to a specific shard.
Can anyone confirm this for me and perhaps explain the best approach?
Thanks
You cannot route your documents to a particular shard yourself, since shard assignment is managed through ZooKeeper.
The solution to your problem is to create two collections instead of two shards. Let the first collection use two servers and the second collection use the third server; you can then send your updates to particular servers. The design should look like this:
collection1---->shard1---->server1,server2
collection2---->shard1----->server3
This way you can separate your indexes as per your requirement.
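A small sketch of that layout, written as a lookup table: it only computes which update URLs belong to which collection. The host names, the port and the "/update" handler path are assumptions for illustration.

```python
# Hypothetical hosts mirroring the layout above.
LAYOUT = {
    "collection1": ["http://server1:8983/solr", "http://server2:8983/solr"],
    "collection2": ["http://server3:8983/solr"],
}


def update_urls(collection: str) -> list:
    """Return the update URLs of every server hosting the given collection."""
    return [f"{base}/{collection}/update" for base in LAYOUT[collection]]


for name in LAYOUT:
    print(name, update_urls(name))
```

Choosing the collection name is then what decides which servers receive a document.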
Can I use a different format (schema.xml) for a document (like a car document)?
That way I could use different indexes to query the same class of documents differently.
(OK.. I can use two instances of Solr, but.. that's the only way? )
Only one schema is possible for a Core.
You can always have different Cores within the same solr with the Multicore configuration.
However, if you have the same entity and want to query it differently, you can keep a single schema.xml that holds the values in different fields with different field types (see copyField), and define different query handlers that run weighted queries depending on your needs.
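As a hedged sketch of the weighted-query part, the snippet below builds request parameters for the eDisMax query parser, whose qf parameter takes field names with boosts. The field names ("model", "description") and the boost values are hypothetical, and the two weightings stand in for what two differently configured handlers might use.

```python
from urllib.parse import parse_qs, urlencode


def weighted_query(text: str, weights: dict) -> str:
    """Build a query string that searches several fields with per-field boosts."""
    qf = " ".join(f"{field}^{boost}" for field, boost in weights.items())
    return urlencode({"q": text, "defType": "edismax", "qf": qf})


# Two ways of querying the same car documents (hypothetical fields and boosts).
by_model = weighted_query("sedan", {"model": 3, "description": 1})
by_text = weighted_query("sedan", {"model": 1, "description": 2})

print(parse_qs(by_model)["qf"][0])
print(parse_qs(by_text)["qf"][0])
```

The same index and schema serve both queries; only the field weighting changes.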
As far as I know you can only have one schema file per Solr core.
Each core uses its own schema file, so if you want two different schema files, either set up a second Solr core or run another instance of Solr.
Currently I have two different schema sets (setA/ and setB/) sitting under the multicore/ folder in a Jetty Solr path, /opt/solr/example/multicore.
If I want to create shards for each schema, how should I go about it?
Thanks,
Two shards will have the same configuration, but different documents. So you make a copy of your configuration on a new server, then put half the documents on each server.
The Solr page on distributed search gives a little bit of information about querying across multiple shards.
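To make the split concrete, here is a minimal sketch of dividing documents between shards by hashing the document ID, plus a query string that uses Solr's shards parameter to search across them. The CRC32 hash is just one possible way to split documents evenly, and the host names and the setA core path are assumptions.

```python
import zlib
from urllib.parse import urlencode


def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard index for a document by hashing its ID."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def distributed_query(q: str, shard_hosts: list) -> str:
    """Build query parameters that ask Solr to search all listed shards."""
    return urlencode({"q": q, "shards": ",".join(shard_hosts)})


# Hypothetical shard addresses for the setA core on two servers.
SHARDS = ["host1:8983/solr/setA", "host2:8983/solr/setA"]

assignments = {doc_id: shard_for(doc_id, len(SHARDS)) for doc_id in ("a1", "a2", "a3")}
params = distributed_query("*:*", SHARDS)
```

The same document ID always maps to the same shard, so an update to an existing document lands on the server that already holds it.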