Solr - How to index on multiple entities? - solr

I have two tables contacts and inventory. These two tables are not related. I want to index these two tables and search using Solr.
Is this possible?

If some part of your application needs to search for contacts, and another one needs to search in the inventory, create two separate indices. Storing wildly different data in the same index is almost never a good idea, it complicates things unnecessarily. As the Solr wiki wisely says:
The more heterogeneous (different
kinds of data) you have in one field
or in one index, the less useful it
is.
You don't need to have multiple Solr instances to accomodate multiple indices, you can easily manage this with multi-core.

I found a very helpful answer to this question here, including some guidance on using "multiple indexes" vs. "multiple document types in one index". The post also links to example code on github that I found very useful.

Yes, you can do that. Simply create a Solr schema, that contains all fields necessary for both tables and add another field, that contains the table name. During indexing, add the table name property to the fields you want to index. During searching also always include a query parameter for the table name field.
As an alternative, you can setup multiple instances of Solr. But you should do this only, if we are talking about massive amounts of data here (like millions of table rows).

Related

Apache Solr Querying by search term from multiple tables and in all columns

I am new to Apache Solr and have worked with single table and importing it in Solr to get data using query.
Now I want to do following.
query from multiple tables ..... Like if I find by a word, it should return all occurances in multiple tables.
Search in all fields of table ....like I query by word in all fields in single table too.
Do I need to create single document by importing data from multiple tables using joins in data-config.xml? And then querying over it?
Any leads and guidance is welcome.
TIA.
Do I need to create single document by importing data from multiple tables using joins in data-config.xml? And then querying over it?
Yes. Solr uses a document model (rather than a relational model) and the general approach is to index a single document with the fields that you need for searching.
From the Apache Solr guide:
Solr’s basic unit of information is a document, which is a set of data
that describes something. A recipe document would contain the
ingredients, the instructions, the preparation time, the cooking time,
the tools needed, and so on. A document about a person, for example,
might contain the person’s name, biography, favorite color, and shoe
size. A document about a book could contain the title, author, year of
publication, number of pages, and so on.

Implementing tag system with multiple tables

I am trying to implement tag system similar to one which StackOverflow has. Obviously I've read multiple articles including this answer.
However my scenario is little bit different
there will be limited amount of tags which can be only created by user with higher privilege (anybody can assign a tag there). This excludes option #1 (from SO question I linked above, each tag is inserted directly into the tables tags column and then it's queried with LIKE) I guess
there are also multiple tables in DB which can be tagged (currently five)
Especially second criteria makes it harder so these are my thoughts
I could follow option #3, have table tags and have M:N relationship with each table. However that would make searching harder (imagine that join if the table number grows) and also I need to tell which table (application module) matches the tag in a search result
I could use some kind of polymorphism but I am pretty new to this concept regarding to the databases so is this something which fits to this problem well?
I use newest version of PostgreSQL.
Since you are using PostgreSQL, you have the option of some field types which aren't available for other databases. Particularly, arrays and JSON fields. I did some performance comparisons of the various methods in a blog post. Arrays and JSONB were definitely better options than a tags table for any search which needed to combine multiple tags.
Given that, I would recommend creating a tags column for each table on which you want to have tags, either an array or a JSONB column, depending. If you need to search over multiple tables, I'd suggest a UNION query instead of having a single monolithic tags table which joins to everything.

Querying Solr multiple indexes with different schema in single query

We have a situation where we are keeping two indexes with different schemas.
For example: suppose we have an index for seller where the key value is seller id and other attributes are seller information. Now another index is book where book id is unique key and it keeps book related information.
Is it possible to query both these indexes in a single query and get collective results?
I have checked Solr but as per my findings we can do this through distributed search in Solr but it works on same kind of schema being distributed in at max 3 indexes.
I am a newbie to Solr so please ignore if this is a stupid question.
You need to think about what makes sense for a search query but there are some rules.
The first requirement is that the unique keys need to have the same name and be unique across collections or Solr cannot collate results.
If you are then hoping to get some kind of sensible ranking of your results you need some common fields. For example I have two collections: one of product data and one containing product related documents. I have a unique key: id and I have common title and contents fields for when I want to query across the two collections. I also have an advanced search interface where I can query on specific fields like product id.
A "unification core" is a typical way of handling search across two or more cores, see this Stack Overflow answer on how to set that up
Query multiple collections with different fields in solr
Other techniques are to use federated search with something like Carrot or to issue two queries and show the results in different tabs in the search results.

Is there a pattern to avoid ever-multiplying link tables in database design?

Currently scoping out a new system. Like many systems, it will be required to store documents and link them to other kinds of item. In this instance a Document object can belong to a Job or it can belong to an Item (which in turn belongs to a Job).
We could do this by having a JobId and an ItemId against a Document and leaving one or the other blank if necessary, but that's going to mean annoying conditional logic in the handling code. So, two link tables seems a better idea.
However, it is likely that we will need to link Documents to other items in the system at some point in the future. There are Company and User objects, for example, and we might want to record Documents against those. There may be more.
That would entail a proliferation of link tables which, while effective, is messy and hard to follow.
This solution is in SQL Server and will be handled in code via Entity Framework.
Are there any design principles that can allow us to hook up Document objects with a variety of other system objects as required in a neater and more flexible way?
You could store two values: the id, and the type of object to which the document is attached. It doesn't allow the use of foreign keys, but is compatible with many application development frameworks.
If you have the partitioning option then you could dedicate different partitions to different object types.
You could also have multiple tables, one for job documents, one for item documents, and get an overview of all of them with a view that UNION ALL's them together. If you need uniqueness in that result set then you could use UUIDs for the primary key, or add an extra column to the view to express from which table the row was read.

Full-text Search on Joined, Hierarchical Records in SQL Server 2008

Probably a noob question, but I'll go for it nevertheless.
For sake of example, I have a Person table, a Tag table and a ContactMethod table. A Person will have multiple Tag records and multiple ContactMethod records associated with them.
I'd like to have a forgiving search which will search among several fields from each table. So I can find a person by their email (via ContactMethod), their name (via Person) or a tag assigned to them.
As a complete noob to FTS, two approaches come to mind:
Build some complex query which addresses each field individually
Build some sort of lookup table which concatenates the fields I want to index and just do a full-text query on that derived table.
(Feel free to edit for clarity; I'm not in it for the rep points.)
If your sql server supports it you can create an indexed view and full text search that; you can use containstable(*,'"chris"') to read all the columns.
If it doesn't support it as the fields are all coming from different tables I think for scalability; if you can easily populate the fields into a single row per record in a separate table I would full text search that rather than the individual records. You will end up with a less complex FTS catalog and your queries will not need to do 4 full text searches at a time. Running lots of separate FTS queries over different tables at the same time is a ticket to query performance issues in my experience. The downside with doing this is you lose the ability to search for Surname on its own; if that is something you need you might need to look at an alternative.
In our app we found that the single table was quicker (we can't rely on customers having enterprise sql at hand); so we populate the data with spaces into an FTS table through an update sp then our main contact lookup runs a search over the list. We have two separate searches to handle finding things with precision (i.e. names or phone numbers) or just for free text. The other nice thing about the table is it is relatively easy and low cost to add further columns to the lookup (we have been asked for social security number for example; to do it we just added the column to the update SP and we were away with little or no impact.
One possibility is to make a view which has these columns: PersonID, ContentType, Content. ContentType would be something like "Email", "PhoneNumber", etc... and Content would hold that. You'd be searching on the Content column, and you'd be able to see what the person's ID is. I'm not 100% sure how full text search works though, so I'm not sure if you could use that on a view.
The FTS can search multiple fields out-of-the-box. The CONTAINS predicate accepts a list of columns to search. Also CONTAINSTABLE.

Resources