Denormalize datasource input for Solr

I have a MySQL database whose data I need to fetch into Solr. The data is normalized in MySQL across several tables. For example, I have an 'articles' table that has a 'companyId' column. Each 'companyId' is linked to a 'companyName' in a second table, 'company'. So, to be able to find articles by company name using Solr, I need to denormalize when building the Solr index.
What is the easiest way to do this? Can denormalization be done in the data source configuration or do I need to denormalize prior to creating the index?
Feeding data using SolrJ and denormalizing while doing it seems to be the easiest method I can come up with at the moment (although it seems unnecessary if Solr has those features).

Ah, I found what I was looking for in the documentation for the data import handler. Values that the current table references but that are held in other tables can be fetched with nested queries called 'child entities', like below.
The category name of the item is resolved by selecting from the category table using the category_id from the parent entity/query:
<entity name="item_category" query="select category_id from item_category where item_id='${item.id}'">
    <entity name="category" query="select description from category where id = '${item_category.category_id}'">
        <field column="description" name="cat" />
    </entity>
</entity>
XML from here:
http://wiki.apache.org/solr/DataImportHandler#Full_Import_Example
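For the articles/company case from the question, the same denormalization can also be done at feed time with a single JOIN, which is the alternative the question mentions. The following is only a minimal sketch: an in-memory SQLite database stands in for MySQL, and the 'id' and 'title' columns are assumed for illustration. The resulting dictionaries are what a client would then send to Solr.

```python
import sqlite3


def build_docs(conn):
    """Join articles to company so each document carries the company name."""
    rows = conn.execute(
        "SELECT a.id, a.title, c.companyName "
        "FROM articles a JOIN company c ON c.companyId = a.companyId "
        "ORDER BY a.id"
    )
    return [
        {"id": str(article_id), "title": title, "companyName": company}
        for article_id, title, company in rows
    ]


def demo_connection():
    """Create a tiny stand-in database (hypothetical columns and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE company (companyId INTEGER PRIMARY KEY, companyName TEXT);
        CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, companyId INTEGER);
        INSERT INTO company VALUES (1, 'Acme'), (2, 'Globex');
        INSERT INTO articles VALUES (10, 'First', 1), (11, 'Second', 2);
        """
    )
    return conn


docs = build_docs(demo_connection())
```

Each entry in docs is a flat document with the company name copied in, so a search on companyName finds the article directly.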

Related

How to decide the dynamic field type in Solr using the data type, without any suffix or prefix

I am indexing RDBMS data into Solr from my Java application. For each row of a table I create a Java bean and add it to the Solr server. (The bean is simply one Solr document: I use the table's column name as the field name of the Solr doc and the corresponding value as the field's value.) We support indexing data from any number of tables, where each table has different column names and data types. To handle this we are using dynamic fields in schema.xml as below:
<dynamicField name="*" type="string" indexed="true" stored="true" multiValued="true"/>
The problem with this configuration is that every field has type string, but I want to use numeric types for numeric data types in the RDBMS and string for the VARCHAR data type. Please suggest how I can achieve this. I can't add a suffix or prefix to the field name while creating the Solr doc, because I want to index and retrieve the docs using field names that are the same as the table's column names.
Any suggestions are appreciated.

Frequently Index SearchTerm to Solr

I am working on an eCommerce web application developed using .NET MVC. I use Solr to index product details, so my Solr schema file defines product-related fields.
Now I also want to index search terms in Solr. Since my schema file is product specific, how can I manage it so that it can also store and index search terms?
Can anyone please suggest an approach?

You can have a separate core for this and define a new schema.xml for it. Alternatively, if you want to keep using the existing schema.xml, you can make use of dynamic fields, so that you do not have to change the schema whenever you need to add another field.
Dynamic fields allow Solr to index fields that you did not explicitly define in your schema.
This is useful if you discover you have forgotten to define one or more fields. Dynamic fields can make your application less brittle by providing some flexibility in the documents you can add to Solr.
A dynamic field is just like a regular field except it has a name with a wildcard in it. When you are indexing documents, a field that does not match any explicitly defined fields can be matched with a dynamic field.
For example, suppose your schema includes a dynamic field with a name of *_i.
If you attempt to index a document with a cost_i field, but no explicit cost_i field is defined in the schema, then the cost_i field will have the field type and analysis defined for *_i.
Like regular fields, dynamic fields have a name, a field type, and options.
<dynamicField name="*_i" type="int" indexed="true" stored="true"/>
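The lookup described above can be sketched as follows. This is a simplified illustration rather than Solr's actual implementation: the explicit fields, the "text_general" type name, and the "*_s" pattern are assumed for the example, and the sketch simply returns the first matching pattern.

```python
from fnmatch import fnmatchcase

# Hypothetical schema contents, for illustration only.
EXPLICIT_FIELDS = {"id": "string", "name": "text_general"}
DYNAMIC_FIELDS = [("*_i", "int"), ("*_s", "string")]


def resolve_field_type(field_name):
    """Return the field type for a name: explicit fields first, then wildcards."""
    if field_name in EXPLICIT_FIELDS:
        return EXPLICIT_FIELDS[field_name]
    for pattern, field_type in DYNAMIC_FIELDS:
        if fnmatchcase(field_name, pattern):
            return field_type
    # No explicit or dynamic field matches this name.
    return None
```

With these assumed definitions, cost_i resolves to the type declared for *_i, while a name such as cost matches nothing and would be rejected.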

Store category information in Solr

I have product information stored in my solr database. A product can be a part of multiple categories.
Now, I want to store information about those categories inside the product which belongs in those categories. (Is there any other way?)
So, say a product A belongs to categories C1 and C2 with ids I1 and I2. How do I store the mapping of I1 to C1 in my product A? What should the schema be?
If I simply store lists of ids, names and some other data (say, URLs), then the mapping of each id to its name or URL will be lost. Like this:
<field name="category_ids" type="tints" indexed="true" stored="true"/>
<field name="category_names" type="strings" indexed="true" stored="true"/>
So how should I store documents?

The way you've described works - Solr will keep the sequence between fields the same, so you can assume that the first value in the field category_ids corresponds to the first value in the field category_names. We use this to index more complex objects in several multi value fields.
A second solution is to use the category id to look up the actual category information in your middleware, querying the database for the related information. This will avoid having to reindex all documents for a category if the name changes (except if you use the name for querying, which will require you to do a re-index regardless of solution selected).
A third solution would be to have a field containing both the id and the name in a serialized form, such as 3;Laptops or as JSON, and just store the field while not indexing it (and use an indexed, non-stored field for actual searching).
You can also use child documents for something like this, but my personal opinion is that it'll give you quite a bit of unnecessary complexity.
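The first and third solutions can be sketched on the client side as follows. This is a minimal illustration with a hypothetical product document; the helper names are not part of any Solr API.

```python
def pair_categories(doc):
    """Pair up parallel multi-valued fields by position (first solution)."""
    return list(zip(doc["category_ids"], doc["category_names"]))


def encode_category(category_id, name):
    """Serialize id and name into one stored value, e.g. '3;Laptops' (third solution)."""
    return f"{category_id};{name}"


def decode_category(value):
    """Split a serialized value back into (id, name); only the first ';' separates."""
    category_id, name = value.split(";", 1)
    return int(category_id), name


# Hypothetical document as it might come back from a query.
product = {
    "id": "A",
    "category_ids": [3, 7],
    "category_names": ["Laptops", "Tablets"],
}
```

Pairing by position relies on both fields always being written with the same number of values in the same order, so the indexing code should build them from one list of categories.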

Database mapping to Solr

I'm building a Java app using a relational database and I wish to map its primary data to one or more Solr indexes. However, I'm not sure how to map the components of a database. At the moment I've mapped a single row cell to a Solr/Lucene Document.
A doc would be something like this (each line is a field):
schema: "schemaName"
table: "tableName"
column: "columnName"
row: "rowNumber"
data: "data on schemaName.tableName.columnName.row"
This allows me to have a "fixed" Solr schema.xml (as far as I know it has to be defined "before" creating indexes). Also, dynamic fields don't seem to serve my purpose.
What I've found while searching is that a single row is usually mapped to a Solr Document and each column is mapped as a Field. But, how can I add the column names as fields into schema.xml (when I don't know the columns a table has)? Also, I would need the info to be queried as if it was SQL. I.e, search for all rows of a column in a table, etc, etc.
With my current "solution" I can do that kind of queries but I'm worried with performance as I'm new to Solr and I don't know the implications it may have.
So, what do you say about my "solution"? Is there another way to map a database to a Solr index, given that the schema.xml fields have to be set before indexing? I've also "heard" that a table is usually mapped to an index: how could I achieve that?
Maybe I'm just being a noob, but from the research I did I don't see how I can map a database row to a Solr Document without messing with the schema.xml fields.
I would appreciate any thoughts :) Regards.

You can specify your table columns in the schema beforehand or use dynamic fields, and then use the Solr DIH to import the data into Solr from the database. In the DIH queries, select the columns under your dynamic field names.
See the Solr DIH documentation for database integration.
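When feeding rows from application code instead of DIH, the same idea of matching column names to dynamic fields might look like the sketch below. It assumes the schema declares dynamic fields for the suffixes _b, _i, _f and _s; the function name and the sample column names are hypothetical.

```python
# Assumed suffixes; each needs a matching <dynamicField> in schema.xml.
SUFFIX_BY_TYPE = {bool: "_b", int: "_i", float: "_f", str: "_s"}


def row_to_doc(row, key_column="id"):
    """Turn one database row (a dict) into a Solr document using dynamic field names."""
    doc = {"id": str(row[key_column])}
    for column, value in row.items():
        if column == key_column or value is None:
            continue  # the key is already set; NULL columns are left out
        # Exact type lookup keeps bool apart from int; unknown types become strings.
        suffix = SUFFIX_BY_TYPE.get(type(value), "_s")
        doc[column + suffix] = str(value) if suffix == "_s" else value
    return doc
```

One row still becomes one document, and no schema change is needed when a table gains a column whose type already has a suffix.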

Is it possible to update uniqueKey in Solr 4?

My uniqueKey is defined as:
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<uniqueKey>id</uniqueKey>
I loaded several docs into Solr, each with its own "id" field. What I need now is to UPDATE the "id" value. Is it possible?
When I try to do that I get this error:
Document contains multiple values for uniqueKey field
I am using Apache Solr 4.3.0

It's not directly possible. Before I get into how you can do it indirectly, I need to explain a couple of things.
The value in the uniqueKey field is how Solr handles document updating/replacing. When you send a document in for indexing, if a document with the same uniqueKey value already exists, Solr will delete its own copy before indexing the new one.
The atomic update functionality is slightly different. It lets an update add, change, or remove any field in the document except the uniqueKey field - because that's the way that Solr can identify the document.
What you need to do is basically index a new document with all the data from the old document, and delete the old document. If all the fields in the document are available to the indexing process, then you can just index the new document, either before or after deleting the old one. Otherwise, you can query the existing doc out of Solr, make a new one and index it, and then delete the old one.
In order to use the existing Solr document to index a new one, all fields must be stored, unless they are copyField destinations, in which case they must NOT be stored. Atomic updates (discussed above) have the same requirement. If one or more of these fields is not stored, then the search result will not contain that field and the data will be lost.
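The copy-then-delete procedure can be sketched as follows. This is only an illustration under stated assumptions: a plain dictionary keyed by uniqueKey stands in for the Solr index, the function name is hypothetical, and the old document is assumed to have all of its fields stored. With a real client the three steps would be a query by id, an add, and a delete by id, followed by a commit.

```python
def change_unique_key(index, old_id, new_id):
    """Re-index a document under a new id and remove the old one.

    `index` is a dict standing in for the Solr index (uniqueKey -> document).
    """
    if new_id in index:
        raise ValueError(f"a document with id {new_id!r} already exists")
    old_doc = index[old_id]
    # Copy the stored fields but drop Solr's internal _version_ field,
    # which belongs to the old document and must not be sent with the new one.
    new_doc = {name: value for name, value in old_doc.items() if name != "_version_"}
    new_doc["id"] = new_id
    index[new_id] = new_doc  # index the new document
    del index[old_id]        # then delete the old one
    return new_doc
```

Refusing to overwrite an existing new_id is a design choice of this sketch; in Solr itself, adding a document with an existing uniqueKey would silently replace that document.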
