Solr schema strategy - solr

We are building an entertainment site on which we want to be able to search events, restaurants & bars, movies, arts & theater, and TV/radio.
All of these obviously have different fields associated with them.
A restaurant would have the following fields: name, address, category, description
A movie would have the following fields: name, theater_name, theater_address, times, description
Arts & theater would have the following fields: name, address, venue_name
Should I be storing all of these in the same index? How would you recommend sharing common fields and creating unique fields for each content type?
Sometimes each content type would be searched individually, while other times they might be searched together.

Here are some posts about the tradeoffs between a single index and multiple indexes:
Solr Combined Versus Single Index
Multiple Indexes
Multiple Doc Types
Based on what you have shown for fields, I would suggest using common fields for name, address, description and then additional specific fields for each type as necessary. In regard to the additional fields, you could leverage the power of Dynamic Fields if you do not want to define all of the additional fields up front.
<dynamicField name="theater_*" type="string" indexed="true" stored="true" />
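As a rough sketch of the single-index layout suggested above: the common fields are declared once, a discriminator field records the content type, and the type-specific fields sit alongside them. The field names `id` and `content_type` and the field types `string` and `text_general` are assumptions for illustration (they are not in the question), and the types are assumed to be defined elsewhere in the schema.

```xml
<!-- Fields shared by every content type -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<!-- Hypothetical discriminator, e.g. "restaurant", "movie", "arts" -->
<field name="content_type" type="string" indexed="true" stored="true"/>
<field name="name" type="text_general" indexed="true" stored="true"/>
<field name="address" type="text_general" indexed="true" stored="true"/>
<field name="description" type="text_general" indexed="true" stored="true"/>

<!-- Type-specific fields, left empty on documents of other types -->
<field name="category" type="string" indexed="true" stored="true"/>
<field name="venue_name" type="text_general" indexed="true" stored="true"/>
<field name="times" type="string" indexed="true" stored="true" multiValued="true"/>

<!-- Matches theater_name, theater_address, and so on -->
<dynamicField name="theater_*" type="string" indexed="true" stored="true"/>

<uniqueKey>id</uniqueKey>
```

With a layout like this, a search restricted to one content type could add a filter query such as `fq=content_type:movie`, while a combined search would simply omit the filter.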

Related

How to decide the dynamic field in Solr using data type and without any suffix or prefix

I am indexing RDBMS data into Solr from my Java application. For each row of a table I create a Java bean and add it to the Solr server. (Each bean is one Solr document: I use the table's column name as the field name and the corresponding value as the field's value.) We support indexing data from any number of tables, where each table has different column names and data types. To handle this, we use a dynamic field in schema.xml, as below:
<dynamicField name="*" type="string" indexed="true" stored="true" multiValued="true"/>
The problem with this configuration is that every field's type is string, but I want to use numeric types for numeric RDBMS columns and string for VARCHAR columns. Please suggest how I can achieve this. I can't add a suffix or prefix to the field name when creating the Solr document, because I want to index and retrieve documents using field names identical to the table's column names.
Any suggestions are appreciated.

Frequently Index SearchTerm to Solr

I am working on an eCommerce web application developed with .NET MVC. I use Solr to index product details, so I have defined product-related fields in my Solr schema file.
Now I also want to index search terms (SearchTerm) in Solr. How can I arrange my schema file to store and index search terms, given that it is product-specific?
Can anyone please suggest an approach?
You can create a separate core for this and define a new schema.xml for it. Or, if you want to keep using the existing schema.xml, you can use dynamic fields, so that you do not have to change the schema whenever you need to add another field.
You can use Dynamic fields.
Dynamic fields allow Solr to index fields that you did not explicitly define in your schema.
This is useful if you discover you have forgotten to define one or more fields. Dynamic fields can make your application less brittle by providing some flexibility in the documents you can add to Solr.
A dynamic field is just like a regular field except it has a name with a wildcard in it. When you are indexing documents, a field that does not match any explicitly defined fields can be matched with a dynamic field.
For example, suppose your schema includes a dynamic field with a name of *_i.
If you attempt to index a document with a cost_i field, but no explicit cost_i field is defined in the schema, then the cost_i field will have the field type and analysis defined for *_i.
Like regular fields, dynamic fields have a name, a field type, and options.
<dynamicField name="*_i" type="int" indexed="true" stored="true"/>
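Applied to the search-term question, a minimal sketch of the dynamic-field route might look like the following. The `doc_type` field, the `*_s` pattern, and the example field names in the comments are assumptions for illustration, and the `string` and `int` field types are assumed to be defined in the existing schema.

```xml
<!-- Hypothetical discriminator so product and search-term documents
     can be told apart in the same core -->
<field name="doc_type" type="string" indexed="true" stored="true"/>

<!-- A search-term document could then carry fields such as
     searchterm_s and searchcount_i without further schema changes -->
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
<dynamicField name="*_i" type="int" indexed="true" stored="true"/>
```

If any product fields in the existing schema are marked `required="true"`, search-term documents would also have to supply them, which is one reason the separate-core option may turn out to be cleaner.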

Store category information in Solr

I have product information stored in my Solr index. A product can be part of multiple categories.
Now, I want to store information about those categories inside each product that belongs to them. (Is there any other way?)
Say product A belongs to categories C1 and C2, with ids I1 and I2. How do I store the mapping of I1 to C1 in product A? What should the schema be?
If I simply store a list of ids, names and some other data (say URLs), then the mapping of each id to its name or URL will be lost. Like this:
<field name="category_ids" type="tints" indexed="true" stored="true"/>
<field name="category_names" type="strings" indexed="true" stored="true"/>
So how should I store documents?
The way you've described works - Solr will keep the sequence between fields the same, so you can assume that the first value in the field category_ids corresponds to the first value in the field category_names. We use this to index more complex objects in several multi value fields.
A second solution is to use the category id to look up the actual category information in your middleware, querying the database for the related information. This will avoid having to reindex all documents for a category if the name changes (except if you use the name for querying, which will require you to do a re-index regardless of solution selected).
A third solution would be to have a field containing both the id and the name in a serialized form, such as 3;Laptops or JSON, and to store that field without indexing it (using a separate indexed, non-stored field for the actual searching).
You can also use child documents for something like this, but my personal opinion is that it'll give you quite a bit of unnecessary complexity.
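The third solution could be sketched roughly as below. The field names `category_info` and `category_search` are hypothetical, and the `string` and `text_general` field types are assumed to be defined in the schema.

```xml
<!-- Stored only: one serialized value per category, e.g. "3;Laptops".
     Returned with results, never searched. -->
<field name="category_info" type="string" indexed="false" stored="true" multiValued="true"/>

<!-- Indexed only: the category names used for matching queries.
     Not returned with results. -->
<field name="category_search" type="text_general" indexed="true" stored="false" multiValued="true"/>
```

The application would split each `category_info` value on the separator when rendering results, so the id-to-name pairing does not depend on the ordering of two separate multi-valued fields.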

Solr schema design and performance

I have a books database with three entities: books, pages and titles (titles found in a page). I am confused and concerned about performance when choosing between two approaches to the schema design:
1- Treating books as documents, i.e. a book field, a multiValued pages field and a multiValued titles field. In this approach all of a book's data is represented in one Solr document with very large fields.
2- Treating pages as documents, which leads to much smaller fields but a larger number of documents.
I looked at this official resource but was not able to find a clear answer to my question.
Assuming you are going to take Solr results and present them through another application, I would make the smallest item - Titles - the model for documents, which will make it much easier to present where a result appears. Doing it this way minimizes the amount of application code you need to write. If your users are querying Solr directly, I might use Page as my document instead - presumably you would then be using Solr's highlighting feature to help your users see how their search terms matched.
For Title documents I would model the schema as follows:
Book ID + Page Number + Title [string - unique key]
Book ID [integer]
Book Name [tokenized text field]
Page Number [TrieIntField]
Title [tokenized text field]
Content for that book/title/page combination [tokenized text field]
There may be other attributes you want to capture, such as author, publication date, publisher, but you do not explain above what other information you have so I leave that out of this example.
Textual queries then can involve Book Name, Title and Content where you may want to define a single field that's indexed, but not stored, that serves as a target for <copyField/> declarations in your schema.xml to allow for easy searching over all three at the same time.
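The Title-document model and the catch-all search field described above could be sketched as follows. All field names here are illustrative, and the `string`, `int` and `text_general` field types are assumed to be defined in schema.xml (for example as in Solr's example schema).

```xml
<!-- Unique key: Book ID + Page Number + Title, concatenated by the
     indexing application before the document is sent to Solr -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="book_id" type="int" indexed="true" stored="true"/>
<field name="book_name" type="text_general" indexed="true" stored="true"/>
<field name="page_number" type="int" indexed="true" stored="true"/>
<field name="title" type="text_general" indexed="true" stored="true"/>
<field name="content" type="text_general" indexed="true" stored="true"/>

<!-- Catch-all search target: indexed but not stored. It is multiValued
     because several source fields are copied into it. -->
<field name="text_all" type="text_general" indexed="true" stored="false" multiValued="true"/>

<copyField source="book_name" dest="text_all"/>
<copyField source="title" dest="text_all"/>
<copyField source="content" dest="text_all"/>

<uniqueKey>id</uniqueKey>
```

A query against `text_all` then searches book name, title and content at once, while the individual fields remain available for display and for more targeted queries.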
For indexing, without knowing more about the data being indexed, I would use the ICU Tokenizer and the Snowball Porter Stemming Filter with a language specification on the text fields to handle non-English data, assuming all the books are in the same language. If the books are in English, I would use the Standard Tokenizer instead of ICU.
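A possible analyzer definition along these lines is sketched below. The field type names are hypothetical, the lowercase filter is an addition not mentioned in the answer, and `French` is only a placeholder for whatever language the books are actually in. Note that `solr.ICUTokenizerFactory` ships in Solr's analysis-extras contrib, so its jars need to be on the core's classpath.

```xml
<!-- Non-English books: ICU tokenizer plus Snowball stemming -->
<fieldType name="text_books" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Placeholder language; set this to the language of the books -->
    <filter class="solr.SnowballPorterFilterFactory" language="French"/>
  </analyzer>
</fieldType>

<!-- English books: Standard tokenizer instead of ICU -->
<fieldType name="text_books_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>
```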

solr - complex data structure

I have the following data structure for creating index.
user
  userid
  username
  userstatus
  friends
    friendid
    friendstatus
    friendcreateddate
I think dynamic fields won't work for me since I need to query on specific field names.
I need to search on friendstatus and friendcreateddate. Can someone advise me on the best possible document structure?
That is a very simple data structure. You just need to look at an example schema.xml and put your own field definitions in there. A field like "friends" would be declared with multiValued="true", and userid would be named in the <uniqueKey> element.
Follow this guide, http://wiki.apache.org/solr/SchemaXml, and ignore complicated features like dynamic fields, which you probably don't need.
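One way to read this advice is one document per user, with each friend attribute in its own multi-valued field. This is only a sketch: the `string` and `date` field types are assumed to be defined in the schema, and flattening the friends into parallel fields is an interpretation of the answer rather than something it spells out.

```xml
<field name="userid" type="string" indexed="true" stored="true" required="true"/>
<field name="username" type="string" indexed="true" stored="true"/>
<field name="userstatus" type="string" indexed="true" stored="true"/>

<!-- One value per friend; the same position in each field refers to the same friend -->
<field name="friendid" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="friendstatus" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="friendcreateddate" type="date" indexed="true" stored="true" multiValued="true"/>

<uniqueKey>userid</uniqueKey>
```

One caveat worth testing against your own queries: a query that combines friendstatus and friendcreateddate matches a user if any friend has that status and any friend falls in that date range, not necessarily the same friend, because Solr matches each multi-valued field independently.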

Resources