Imagine this kind of database:
Authors(id, author)
Publication(id, authorID, Title, Year....)
What is the best way to handle string search queries such as "2001 Smith Theory of Evolution"? I don't mean this particular case, but in general: searching records across more than one column.
For a simple/quick solution:
Consider creating a new, full-text indexed "terms" column on your Publication table that holds every text string you want to search on (e.g. author name, publication date, title).
Then add a MATCH ... AGAINST clause to your query in MySQL (or use to_tsquery() in Postgres).
Postgres doc: http://www.postgresql.org/docs/9.4/static/textsearch-tables.html
MySQL doc: https://dev.mysql.com/doc/refman/5.7/en/fulltext-search.html
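A minimal sketch of the terms-column idea, using Python's built-in sqlite3 so it runs anywhere. The sample rows are made up, and the per-word LIKE filter is only a stand-in for MATCH ... AGAINST or to_tsquery(): it does no stemming and no relevance ranking.

```python
import sqlite3

# In-memory stand-in for the Authors/Publication schema from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Authors (id INTEGER PRIMARY KEY, author TEXT);
CREATE TABLE Publication (
    id INTEGER PRIMARY KEY, authorID INTEGER, Title TEXT, Year INTEGER,
    terms TEXT  -- denormalized search column (FULLTEXT-indexed in MySQL)
);
INSERT INTO Authors VALUES (1, 'John Smith'), (2, 'Ann Jones');
INSERT INTO Publication (id, authorID, Title, Year) VALUES
    (10, 1, 'Theory of Evolution', 2001),
    (11, 2, 'Theory of Evolution', 1999),
    (12, 1, 'Cell Biology', 2001);
""")

# Fill the terms column with every string worth searching on.
conn.execute("""
UPDATE Publication
SET terms = lower(
    (SELECT author FROM Authors WHERE Authors.id = Publication.authorID)
    || ' ' || Year || ' ' || Title)
""")

def search(query):
    """Return ids of publications whose terms contain every query word.

    One LIKE condition per word approximates a full-text match."""
    words = query.lower().split()
    if not words:
        return []
    where = " AND ".join("terms LIKE ?" for _ in words)
    params = [f"%{w}%" for w in words]
    sql = f"SELECT id FROM Publication WHERE {where} ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]

print(search("2001 Smith Theory of Evolution"))  # [10]
```

The point of the design is that one query hits one column, however many source columns or tables feed it; the cost is keeping terms up to date when the author or title changes.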
If you find that you need finer control over search relevance, or stock search features like facets and autocomplete, then consider deploying Solr or Elasticsearch as an external index to your database.
Related
I am new to Apache Solr. So far I have worked with a single table, importing it into Solr and retrieving the data with queries.
Now I want to do the following:
Query across multiple tables: if I search for a word, it should return all occurrences in multiple tables.
Search all fields of a table: if I query by a word, it should search all fields of a single table too.
Do I need to create single document by importing data from multiple tables using joins in data-config.xml? And then querying over it?
Any leads and guidance are welcome.
TIA.
Yes. Solr uses a document model (rather than a relational model) and the general approach is to index a single document with the fields that you need for searching.
From the Apache Solr guide:
Solr’s basic unit of information is a document, which is a set of data that describes something. A recipe document would contain the ingredients, the instructions, the preparation time, the cooking time, the tools needed, and so on. A document about a person, for example, might contain the person’s name, biography, favorite color, and shoe size. A document about a book could contain the title, author, year of publication, number of pages, and so on.
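As a rough illustration of that document model, the sketch below flattens hypothetical book, author and chapter rows into one document per book, turning the one-to-many chapters into a multi-valued field. The table and field names are invented for the example. The resulting JSON array is the shape Solr's JSON update handler accepts for adding documents, but the sketch does not talk to Solr itself.

```python
import json

# Hypothetical relational rows (names are illustrative, not from the question).
authors = {1: "John Smith", 2: "Ann Jones"}
books = [
    {"id": 10, "author_id": 1, "title": "Theory of Evolution", "year": 2001},
    {"id": 11, "author_id": 2, "title": "Cell Biology", "year": 1999},
]
chapters = [
    {"book_id": 10, "heading": "Natural Selection"},
    {"book_id": 10, "heading": "Speciation"},
    {"book_id": 11, "heading": "Membranes"},
]

def to_solr_docs(books, authors, chapters):
    """Flatten the join into one self-contained document per book.

    The author is copied in from its own table, and one-to-many data
    (chapters) becomes a list, i.e. a multi-valued field."""
    docs = []
    for b in books:
        docs.append({
            "id": f"book-{b['id']}",
            "title": b["title"],
            "year": b["year"],
            "author": authors[b["author_id"]],
            "chapter": [c["heading"] for c in chapters if c["book_id"] == b["id"]],
        })
    return docs

docs = to_solr_docs(books, authors, chapters)
print(json.dumps(docs, indent=2))
```

A query for a chapter heading or an author name now matches the book document directly, with no join at query time.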
We have a situation where we are keeping two indexes with different schemas.
For example, suppose we have a seller index whose unique key is the seller ID and whose other attributes hold seller information, and a book index whose unique key is the book ID and which holds book-related information.
Is it possible to query both these indexes in a single query and get collective results?
I have looked into Solr, and as far as I can tell this can be done with distributed search, but that only works for indexes sharing the same schema, distributed across at most 3 indexes.
I am new to Solr, so please forgive me if this is a stupid question.
You need to think about what makes sense for a search query, but there are some rules.
The first requirement is that the unique keys need to have the same name and be unique across collections or Solr cannot collate results.
If you are then hoping to get some kind of sensible ranking of your results you need some common fields. For example I have two collections: one of product data and one containing product related documents. I have a unique key: id and I have common title and contents fields for when I want to query across the two collections. I also have an advanced search interface where I can query on specific fields like product id.
A "unification core" is a typical way of handling search across two or more cores, see this Stack Overflow answer on how to set that up
Query multiple collections with different fields in solr
Other techniques are to use federated search with something like Carrot or to issue two queries and show the results in different tabs in the search results.
I have a books database with three entities: books, pages, and titles (titles found on a page). I am confused about the two possible approaches to the schema design and concerned about their performance:
1- Treating books as documents, i.e. a book field, a multiValued pages field, and a multiValued titles field. In this approach all of a book's data is represented in one Solr document with very large fields.
2- Treating pages as documents, which leads to much smaller fields but a larger number of documents.
I looked at this official resource, but I could not find a clear answer to my question.
Assuming you are going to take Solr results and present them through another application, I would make the smallest item, Titles, the model for documents, which makes it much easier to show where a result appears. Doing it this way minimizes the amount of application code you need to write. If your users are querying Solr directly, I might use Page as my document instead; presumably you would then use Solr's highlighting feature to help your users see how their search term(s) matched.
For Title documents I would model the schema as follows:
Book ID + Page Number + Title [string - unique key]
Book ID [integer]
Book Name [tokenized text field]
Page Number [TrieIntField]
Title [tokenized text field]
Content for that book/title/page combination [tokenized text field]
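A sketch of producing those Title documents from hypothetical nested book data. The field names are illustrative, and the "|" separator in the composite key is an assumption; pick one that cannot appear in your titles.

```python
# Hypothetical source data: one book, its pages, and the titles on each page.
book = {
    "id": 10,
    "name": "Theory of Evolution",
    "pages": [
        {"number": 1, "titles": [("Introduction", "Why evolution matters.")]},
        {"number": 2, "titles": [("Natural Selection", "Variation and selection."),
                                 ("Speciation", "How new species arise.")]},
    ],
}

def title_docs(book):
    """Yield one Solr document per book/page/title combination."""
    for page in book["pages"]:
        for title, content in page["titles"]:
            yield {
                # Composite unique key: Book ID + Page Number + Title.
                "id": f"{book['id']}|{page['number']}|{title}",
                "book_id": book["id"],
                "book_name": book["name"],
                "page_number": page["number"],
                "title": title,
                "content": content,
            }

docs = list(title_docs(book))
```

Each hit then already carries its book, page and title, which is what makes presenting results in another application simple.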
You may want to capture other attributes, such as author, publication date, or publisher, but since you don't say what other information you have, I've left them out of this example.
Textual queries can then involve Book Name, Title and Content. For easy searching over all three at once, you may want to define a single field that is indexed but not stored, and make it the target of <copyField/> declarations in your schema.xml.
For indexing, without knowing more about the data, I would use the ICU Tokenizer and the Snowball Porter Stemming Filter with a language specified on the text fields to handle non-English data, assuming all the books are in the same language. If they are in English, use the Standard Tokenizer instead of ICU.
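One way to express that catch-all field, sketched with Python's xml.etree.ElementTree so the schema.xml fragment can be generated and checked. The field name text_all and the text_general type are assumptions (use whatever your schema defines), as are the source field names. Because several sources are copied into one field, it has to be multiValued.

```python
import xml.etree.ElementTree as ET

def catch_all_fragment(sources, dest="text_all", field_type="text_general"):
    """Return schema.xml elements for a catch-all search field.

    The destination is indexed but not stored (it is only searched, never
    returned), and multiValued because several sources feed into it."""
    field = ET.Element("field", {"name": dest, "type": field_type,
                                 "indexed": "true", "stored": "false",
                                 "multiValued": "true"})
    copies = [ET.Element("copyField", {"source": s, "dest": dest})
              for s in sources]
    return [field] + copies

for el in catch_all_fragment(["book_name", "title", "content"]):
    print(ET.tostring(el, encoding="unicode"))
```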
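The same advice as a sketch that builds a Solr <fieldType> for one language. The lowercase filter is an addition not mentioned above (Lucene's Snowball stemmer expects lowercased input); the type names are hypothetical, and the ICU tokenizer needs Solr's analysis-extras module to be available.

```python
import xml.etree.ElementTree as ET

def text_field_type(name, language):
    """Return a <fieldType> element for text in one language.

    StandardTokenizer for English, ICUTokenizer for anything else, then
    lowercasing and Snowball stemming in the given language."""
    tokenizer = ("solr.StandardTokenizerFactory" if language == "English"
                 else "solr.ICUTokenizerFactory")
    # "class" is a Python keyword, so attributes are passed as dicts.
    ft = ET.Element("fieldType", {"name": name, "class": "solr.TextField"})
    analyzer = ET.SubElement(ft, "analyzer")
    ET.SubElement(analyzer, "tokenizer", {"class": tokenizer})
    ET.SubElement(analyzer, "filter", {"class": "solr.LowerCaseFilterFactory"})
    ET.SubElement(analyzer, "filter",
                  {"class": "solr.SnowballPorterFilterFactory",
                   "language": language})
    return ft

print(ET.tostring(text_field_type("text_de", "German"), encoding="unicode"))
```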
I am a bit confused about where SOLR usage begins and ends.
I use PHP with a relational MySQL database for a shopping site, where all tables are related to the product table and are joined at query time. Needless to say, it's too slow!
e.g.
Category table - catid, catname, catdesc
Brand table - brandid, brandname, branddesc
Product table - productid, productname, productdesc, catid, brandid
(I also use ranges for price ranges etc)
I am wondering whether I should use SOLR to index the whole relational schema or whether just to index the product table alone and let my application work as it currently does.
If I just switch the product table to use SOLR are there any caveats to this?
For example, in MySQL I can do a full-text search while joining the brand table, which lets brands be searched too. Can I achieve the same thing just by switching the product table to SOLR? Are there any other caveats I should look out for?
I also would like to create a new table for "searches". This would allow me to use keywords in a mysql table in the following way:
Searches table - searchterm (e.g. lipstick), synonyms (e.g. lipstick, lips etc.)
i.e. this would let me search on multiple terms at the same time. Is this a good time to use SOLR facets instead of storing searches in MySQL, or should I just use MySQL to store the searches and pull the products from SOLR?
Any help is appreciated.
NO NEED TO SWITCH
You don't want to "switch": just as with full-text indexing in MySQL (or a tool like Sphinx), the full-text index is separate from the database tables.
What you want to do is figure out what you're searching for and index that in Solr; it may well be just products. That's certainly an easy first step.
Basically you'll:
index the appropriate column(s) into Solr
use Solr for the searching
use the Solr results to point back to the records from the database
I'm more Ruby and Java than PHP, but you'll basically be talking to Solr for the full-text search and using that to find the records you want to display.
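A minimal sketch of that flow, using sqlite3 for the relational side and a plain Python dictionary as a stand-in for the Solr index (it shows where the index sits, not how Solr works internally). The column names follow the question's schema; the rows are invented. Copying each brand name into its products' index entries is how the MySQL join is replaced on the search side.

```python
import sqlite3
from collections import defaultdict

# Relational store: remains the source of truth for display.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE brand (brandid INTEGER PRIMARY KEY, brandname TEXT);
CREATE TABLE product (productid INTEGER PRIMARY KEY, productname TEXT,
                      productdesc TEXT, brandid INTEGER);
INSERT INTO brand VALUES (1, 'Rouge'), (2, 'Glow');
INSERT INTO product VALUES
  (100, 'Velvet Lipstick', 'Matte finish', 1),
  (101, 'Lip Balm', 'Soft and glossy', 2),
  (102, 'Eye Liner', 'Long lasting', 1);
""")

# Step 1: index the searchable columns (word -> set of product ids).
index = defaultdict(set)
rows = db.execute("""SELECT p.productid, p.productname, p.productdesc, b.brandname
                     FROM product p JOIN brand b ON b.brandid = p.brandid""")
for pid, *texts in rows:
    for word in " ".join(texts).lower().split():
        index[word].add(pid)

def search(query):
    """Step 2: ask the index for ids. Step 3: load those rows from the database."""
    words = query.lower().split()
    if not words:
        return []
    ids = set.intersection(*(index.get(w, set()) for w in words))
    if not ids:
        return []
    marks = ",".join("?" * len(ids))
    sql = ("SELECT productid, productname FROM product "
           f"WHERE productid IN ({marks}) ORDER BY productid")
    return db.execute(sql, sorted(ids)).fetchall()

print(search("rouge"))  # [(100, 'Velvet Lipstick'), (102, 'Eye Liner')]
```

The caveat this exposes: when a brand is renamed, the affected products have to be re-indexed, because the index holds a copy of the brand name.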
Probably a noob question, but I'll go for it nevertheless.
For sake of example, I have a Person table, a Tag table and a ContactMethod table. A Person will have multiple Tag records and multiple ContactMethod records associated with them.
I'd like to have a forgiving search which will search among several fields from each table. So I can find a person by their email (via ContactMethod), their name (via Person) or a tag assigned to them.
As a complete noob to FTS, two approaches come to mind:
Build some complex query which addresses each field individually
Build some sort of lookup table which concatenates the fields I want to index and just do a full-text query on that derived table.
(Feel free to edit for clarity; I'm not in it for the rep points.)
If your SQL Server edition supports it, you can create an indexed view, put a full-text index on it, and query it with CONTAINSTABLE using * as the column list, e.g. CONTAINSTABLE(MyIndexedView, *, '"chris"') where MyIndexedView is your view, to search all of its full-text indexed columns.
If it doesn't, then, since the fields all come from different tables, I think the more scalable option is to populate them into a single row per record in a separate table (if you can do so easily) and full-text search that, rather than the individual tables. You will end up with a less complex FTS catalog, and your queries will not need to run 4 full-text searches at a time. In my experience, running lots of separate FTS queries over different tables at the same time is a ticket to query performance issues. The downside is that you lose the ability to search on, say, Surname alone; if you need that, you may have to look at an alternative.
In our app we found that the single table was quicker (we can't rely on customers having SQL Server Enterprise Edition at hand), so an update stored procedure (SP) concatenates the data, separated by spaces, into an FTS table, and our main contact lookup runs a search over that. We have two separate searches: one for finding things with precision (i.e. names or phone numbers) and one for free text. Another nice thing about the table is that it is relatively easy and cheap to add further columns to the lookup: for example, we were asked to add social security numbers, so we just added the column to the update SP and were done, with little or no impact.
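A sketch of that single lookup table, in sqlite3 with invented sample data. Here refresh_lookup (a hypothetical name) plays the role of the update SP, and LIKE stands in for a full-text CONTAINS query; in SQL Server the content column would carry a full-text index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Tag (person_id INTEGER, tag TEXT);
CREATE TABLE ContactMethod (person_id INTEGER, detail TEXT);
CREATE TABLE PersonSearch (person_id INTEGER PRIMARY KEY, content TEXT);
INSERT INTO Person VALUES (1, 'Chris Taylor'), (2, 'Dana Lee');
INSERT INTO Tag VALUES (1, 'supplier'), (2, 'customer'), (2, 'vip');
INSERT INTO ContactMethod VALUES (1, 'chris@example.com'), (2, '555-0100');
""")

def refresh_lookup(conn):
    """Rebuild the lookup: one row per person, all searchable text joined
    with spaces. Adding a field later means adding one more || term here."""
    conn.execute("DELETE FROM PersonSearch")
    conn.execute("""
        INSERT INTO PersonSearch (person_id, content)
        SELECT p.id,
               p.name
               || ' ' || COALESCE((SELECT group_concat(tag, ' ')
                                   FROM Tag WHERE person_id = p.id), '')
               || ' ' || COALESCE((SELECT group_concat(detail, ' ')
                                   FROM ContactMethod WHERE person_id = p.id), '')
        FROM Person p
    """)

def find_people(term):
    rows = conn.execute(
        "SELECT person_id FROM PersonSearch WHERE content LIKE ? ORDER BY person_id",
        (f"%{term}%",))
    return [r[0] for r in rows]

refresh_lookup(conn)
print(find_people("vip"))  # [2]
```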
One possibility is to make a view with these columns: PersonID, ContentType, Content. ContentType would be something like "Email", "PhoneNumber", etc., and Content would hold the value. You'd search on the Content column and be able to see the person's ID. I'm not 100% sure how full-text search works, though, so I'm not sure whether you could use it on a view.
SQL Server full-text search can search multiple columns out of the box: the CONTAINS predicate accepts a list of columns to search, and so does CONTAINSTABLE.
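A sketch of that view in sqlite3 with invented data. SQLite has no full-text index on views, so LIKE is used instead; the point is that each match comes back with the PersonID and the ContentType it matched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Tag (person_id INTEGER, tag TEXT);
CREATE TABLE ContactMethod (person_id INTEGER, kind TEXT, detail TEXT);
INSERT INTO Person VALUES (1, 'Chris Taylor'), (2, 'Dana Lee');
INSERT INTO Tag VALUES (2, 'chris-fan');
INSERT INTO ContactMethod VALUES
    (1, 'Email', 'ct@example.com'), (2, 'PhoneNumber', '555-0100');

-- One row per searchable value, labelled with where it came from.
CREATE VIEW PersonContent AS
    SELECT id AS PersonID, 'Name' AS ContentType, name AS Content FROM Person
    UNION ALL
    SELECT person_id, 'Tag', tag FROM Tag
    UNION ALL
    SELECT person_id, kind, detail FROM ContactMethod;
""")

def search(term):
    """Return (PersonID, ContentType) for every value containing term."""
    return conn.execute(
        "SELECT PersonID, ContentType FROM PersonContent "
        "WHERE Content LIKE ? ORDER BY PersonID, ContentType",
        (f"%{term}%",)).fetchall()

print(search("chris"))  # [(1, 'Name'), (2, 'Tag')]
```

Unlike the concatenated lookup table, this keeps field boundaries, so a search can still be restricted to one ContentType (e.g. only names).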