We are using Apache Solr to implement search in our application.
Users will be able to search for employees, offices, or both, and we need an auto-suggest feature as well as plain search over the same data.
My question is: how do I import data from two tables without using a join (offices and employees are not directly related) in the db-data-config file? I tried using two entities, but I got an error saying the unique key needed to be the same.
Also, how do I configure the fields of these two entities in schema.xml?
Please help
You should be perfectly fine with a single core and multiple entities.
You just need some discriminator appended to the ID column from your database (if it's numeric and you want to use it as the identity in Solr).
You would also want a column that records the data type, and you should declare the fields from all tables in the Solr document.
Keep in mind that a Solr schema is not the same as a SQL schema. You can declare many fields in schema.xml but use only a few of them in your documents; it costs nothing, since only the fields you actually set are stored.
I loaded data for many data types with different schemas into Solr in a previous project. Let me know if you need some examples and I'll try to find them.
More info about data import in Solr:
http://wiki.apache.org/solr/DataImportHandler
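To make the above concrete, here is a sketch of what such a db-data-config.xml could look like. All table, column, and connection names are made up for illustration; the point is that each entity's query prefixes the numeric primary key with a discriminator so IDs stay unique across tables, and adds a literal type column so you can filter by entity later.

```xml
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="user" password="pass"/>
  <document>
    <!-- 'emp-' / 'off-' prefixes keep the Solr uniqueKey distinct across tables -->
    <entity name="employee"
            query="SELECT CONCAT('emp-', id) AS id, name, 'employee' AS type FROM employee">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
      <field column="type" name="type"/>
    </entity>
    <entity name="office"
            query="SELECT CONCAT('off-', id) AS id, name, 'office' AS type FROM office">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
      <field column="type" name="type"/>
    </entity>
  </document>
</dataConfig>
```

In schema.xml you would then declare id, name, and type once, and both entities map into the same set of fields.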
It sounds like what you have are two different types of document that you want to index in Solr. To do this, I believe you will need to set up a multi-core Solr instance with a separate schema.xml file for each.
For more information see this question:
what is SOLR multicore exactly
And here:
https://wiki.apache.org/solr/CoreAdmin
We have a Cloudant database on Bluemix that contains a large number of documents that are answer units built by the Document Conversion service. These answer units are used to populate a Solr Retrieve and Rank collection for our application. The Cloudant database serves as our system of record for the answer units.
For reasons that are unimportant, our Cloudant database is no longer valid. What we need is a way to download everything from the Solr collection and re-create the Cloudant database. Can anyone tell me a way to do that?
I'm not aware of any automated way to do this.
You'll need to fetch all your documents from Solr (and, assuming you have a lot of them, do this in a paginated way; there are examples of how to do this in the Solr docs) and add them into Cloudant.
Note that you'll only be able to do this for the fields that you have set to be stored in your schema. If there are important fields that you need in Cloudant that you haven't got stored in Solr, then you might be stuck. :(
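A minimal sketch of the paginated fetch described above. The URL, core name, and page size are assumptions; the page-walking logic is kept separate from the HTTP call so you can plug in whatever client you use for Solr and Cloudant.

```python
import json
from urllib.request import urlopen

def fetch_page(solr_url, start, rows):
    """Fetch one page of stored documents from Solr's select handler."""
    url = f"{solr_url}/select?q=*:*&wt=json&start={start}&rows={rows}"
    with urlopen(url) as resp:
        return json.load(resp)["response"]["docs"]

def all_documents(fetch, rows=100):
    """Drain every page; `fetch(start, rows)` must return one page of docs.
    Stops when a page comes back shorter than the requested size."""
    start = 0
    while True:
        docs = fetch(start, rows)
        yield from docs
        if len(docs) < rows:
            break
        start += rows

# e.g. for doc in all_documents(lambda s, r: fetch_page("http://localhost:8983/solr/mycore", s, r)):
#          post_to_cloudant(doc)   # hypothetical helper on the Cloudant side
```

Each yielded document can then be re-posted to Cloudant one at a time or in bulk batches.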
You can replicate one Cloudant database to another, which will create an exact replica.
Another technique is to use a tool such as couchbackup which takes a copy of your database's documents (ignoring any deletions) and allows you to save the data in a text file. You can then use the couchrestore tool to upload the data file to a new database.
See this blog for more details.
Can I use a different format (schema.xml) for a document type (like a car document)?
That way I could use a different index to query the same class of documents differently.
(OK, I could use two instances of Solr, but is that the only way?)
Only one schema is possible per core.
You can always have different cores within the same Solr instance using the multicore configuration.
However, if you have the same entity and want to query it differently, you can keep a single schema.xml that holds the values in different fields and different field types (check copyField), and define different query handlers with different field weights depending on your needs.
As far as I know, you can only have one schema file per Solr core.
Each core uses its own schema file, so if you want two different schema files, either set up a second Solr core or run another instance of Solr.
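The copyField approach mentioned above can be sketched in schema.xml like this. Field names and types are made up for illustration: the same source value is indexed once analyzed for full-text search and once verbatim for exact matching, and different query handlers can then weight the two fields differently.

```xml
<!-- schema.xml fragment: one value, two fields indexed differently -->
<fields>
  <field name="title"       type="text_general" indexed="true" stored="true"/>
  <field name="title_exact" type="string"       indexed="true" stored="false"/>
</fields>
<copyField source="title" dest="title_exact"/>
```

A request handler boosting title_exact over title would favor exact matches without needing a second schema.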
I am using Solr 1.3.0 to perform a distributed search over already-existing Lucene indices. The question is: is there any way to find out which shard a result came from after the search?
P.S.: I am using the REST API.
For Solr sharding:
Documents must have a unique key, and the unique key must be stored (stored="true" in schema.xml).
I think the logic should already be on your side, since you are the one feeding data to the shards and the IDs need to be unique.
E.g. the simplest scheme is an odd/even split, but you may have a more complex one for distributing the data across the shards.
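The odd/even routing idea above can be sketched as a tiny deterministic function (the function name is mine, not part of any Solr API). Because the routing is deterministic, you can recompute at query time which shard holds any result id, without Solr telling you.

```python
def shard_for(doc_id: int, num_shards: int = 2) -> int:
    """Deterministically route a numeric document id to a shard.
    With num_shards=2 this is exactly the odd/even split: even ids
    go to shard 0, odd ids to shard 1."""
    return doc_id % num_shards
```

The same function used when feeding the shards can be reused when inspecting results, as long as the shard count never changes.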
You may be able to get some information using debugQuery=on, but if this is something that you'll query often I'd add a specific stored field for the shard name.
PS: Solr doesn't have a REST API.
I need to update a few fields of each document in a Solr index, separately from the main indexing process. According to the documentation, "Create" and "Update" are both mapped onto the "Add()" function. http://code.google.com/p/solrnet/wiki/CRUD
So if I add a document which already exists, will it replace the entire document or just the fields that I have specified?
If it replaces the entire document, then the only way I can think of to update is to fetch the document by its unique id, modify the document object, and then "Add" it again. That doesn't sound feasible given the frequency of updates required. Is there a better way to update?
Thanks!
Unfortunately, Solr does not currently support updating individual fields of a given document in the index. The latter scenario you describe, retrieving the entire document contents (either from Solr or the original source) and then resending it (adding via SolrNet), is the only way to update documents in Solr.
Please see the previous question: Update specific field on Solr index for more details about Solr not supporting individual field updates and an open JIRA issue for adding this support to Solr.
If you need to frequently update a lot of documents in Solr, you might need to rethink your entire solution. In typical solutions that use Solr and require frequent document updates, the documents reside in a SQL or NoSQL database and are modified there. You then use DIH or something similar to bulk-update the Solr index from the database, possibly just dropping the index and re-indexing all content. Solr can index documents very quickly, so that is typically not a problem.
Partial updating of documents is now supported in newer versions of Solr; 4.10, for example, handles it well. Please look at the following page for more information:
https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents
The only caveat is that you need to declare your fields as stored="true" to allow partial updates.
I also show how to do it in this training:
http://www.pluralsight.com/courses/enterprise-search-using-apache-solr
In this specific module: Content: Schemas, Documents and Indexing
I can't tell from the Solr wiki whether Solr takes a single schema.xml or can have multiple ones.
I took the schema from Nutch and placed it in Solr, then tried to run the Solr examples; the message was clear that there was an error in the schema.
If I have a Solr instance, am I stuck with one specific schema? If not, where is the information about using multiple ones?
From the Solr Wiki - SchemaXml page:
The schema.xml file contains all of the details about which fields
your documents can contain, and how those fields should be dealt with
when adding documents to the index, or when querying those fields.
Note that you can only have one schema.xml file per instance/index within Solr. You can have multiple instances/indexes within Solr by using the following strategies:
Running Multiple Indexes - please see this Solr Wiki page for more details.
There are various strategies to take when you want to manage multiple "indexes" in a Single Servlet Container
Running Multiple Cores within a Solr instance. - Again, see the Solr Wiki page for more details...
Multiple cores let you have a single Solr instance with separate
configurations and indexes, with their own config and schema for very
different applications, but still have the convenience of unified
administration. Individual indexes are still fairly isolated, but you
can manage them as a single application, create new indexes on the fly
by spinning up new SolrCores, and even make one SolrCore replace
another SolrCore without ever restarting your Servlet Container.
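For the multicore option, a sketch of the legacy solr.xml from the CoreAdmin-wiki era looks like the following. Core names and directories are made up; each instanceDir holds its own conf/ directory with its own schema.xml, which is what lets each core have a different schema.

```xml
<!-- solr.xml in the Solr home directory (legacy pre-4.4 format) -->
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- each core reads its own conf/schema.xml under instanceDir -->
    <core name="core0" instanceDir="core0"/>
    <core name="core1" instanceDir="core1"/>
  </cores>
</solr>
```

With this in place you query each core at its own path, e.g. /solr/core0/select and /solr/core1/select, while administering both from one servlet container.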