I'm building a Java app using a relational database and I wish to map its primary data to one or more Solr indexes. However, I'm not sure how to map the components of a database. At the moment I've mapped a single row cell to a Solr/Lucene Document.
A doc would be something like this (each line is a field):
schema: "schemaName"
table: "tableName"
column: "columnName"
row: "rowNumber"
data: "data on schemaName.tableName.columnName.row"
This allows me to have a "fixed" Solr schema.xml (as far as I know it has to be defined before creating indexes). Also, dynamic fields don't seem to serve my purpose.
What I've found while searching is that a single row is usually mapped to a Solr Document and each column is mapped as a Field. But how can I add the column names as fields in schema.xml when I don't know which columns a table has? Also, I would need the info to be queryable as if it were SQL, i.e. search for all rows of a column in a table, etc.
With my current "solution" I can do those kinds of queries, but I'm worried about performance, as I'm new to Solr and I don't know the implications it may have.
So, what do you say about my "solution"? Is there another way to map a database to a Solr index, given that the schema.xml fields have to be set before indexing? I've also "heard" that a table is usually mapped to an index: how could I achieve that?
Maybe I'm just being a noob, but from the research I did I don't see how I can map a database Row to a Solr Document without messing with schema.xml Fields.
I would appreciate any thoughts :) Regards.
You can specify your table columns in the schema beforehand, or use dynamic fields, and then use the Solr DataImportHandler (DIH) to import the data into Solr from the database. In the DIH queries, select your columns under names that match your dynamic fields.
Please read through the Solr DIH documentation for database integration.
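To illustrate the dynamic-field idea, here is a minimal Java sketch of mapping one row to a document whose field names are derived from the column names. It assumes the schema declares dynamicField patterns `*_s`, `*_l` and `*_d` (adjust to whatever patterns you actually define); the class and method names are hypothetical, and a plain `Map` stands in for a Solr input document.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class DynamicFieldMapper {
    // Assumption: the schema has dynamicField patterns "*_s" (string),
    // "*_l" (long) and "*_d" (double). These suffixes are illustrative.
    static String fieldName(String column, Object value) {
        String base = column.toLowerCase(Locale.ROOT);
        if (value instanceof Integer || value instanceof Long) return base + "_l";
        if (value instanceof Float || value instanceof Double) return base + "_d";
        return base + "_s";
    }

    // One database row becomes one document; each non-null column becomes a field.
    static Map<String, Object> toDocument(String table, String id, Map<String, Object> row) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", table + ":" + id);   // unique across tables
        doc.put("table_s", table);         // lets you filter by table at query time
        for (Map.Entry<String, Object> e : row.entrySet()) {
            if (e.getValue() == null) continue; // absent fields are simply omitted
            doc.put(fieldName(e.getKey(), e.getValue()), e.getValue());
        }
        return doc;
    }
}
```

With this shape the schema stays fixed even when the columns are not known in advance, and "all rows of a column in a table" becomes a filter on `table_s` plus the derived field name.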
I am new to Apache Solr and have worked with a single table, importing it into Solr and getting data back with a query.
Now I want to do the following.
Query across multiple tables: if I search for a word, it should return all occurrences in multiple tables.
Search in all fields of a table: I also want to query by a word across all fields in a single table.
Do I need to create a single document by importing data from multiple tables using joins in data-config.xml? And then querying over it?
Any leads and guidance is welcome.
TIA.
Do I need to create a single document by importing data from multiple tables using joins in data-config.xml? And then querying over it?
Yes. Solr uses a document model (rather than a relational model) and the general approach is to index a single document with the fields that you need for searching.
From the Apache Solr guide:
Solr’s basic unit of information is a document, which is a set of data
that describes something. A recipe document would contain the
ingredients, the instructions, the preparation time, the cooking time,
the tools needed, and so on. A document about a person, for example,
might contain the person’s name, biography, favorite color, and shoe
size. A document about a book could contain the title, author, year of
publication, number of pages, and so on.
The database I have at hand uses the EAV model describing all objects one can find in a house. Good or bad isn't the question, there is no choice but to keep and use this model. 6.000+ items point to 3.000+ attributes and 150.000+ attribute-values.
My task is to get this data into a Solr index for quick searching/sorting/faceting.
In Solr, using DIH, a regular SQL query is used to extract data. Each column name returned from the query is a 'field' (defined or not in a schema), and each row of the query's resultset is a 'document'.
Because the EAV model uses rows for attributes instead of columns, a simple query will not work; I need to flatten the rows of each item into one. What should my SQL query look like in order to extract all items from the DB? Is there a special Solr/DIH configuration which I should consider?
There are some similar questions on SO, but none really helped.
Any pointers are much appreciated!
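One possible way to do the flattening described above, if it is done on the application side rather than in SQL, is to pivot the EAV rows into one field map per item before sending them to Solr. This is only a sketch under assumptions: the `EavRow` shape (item id, attribute name, value) and the class names are hypothetical, and every attribute is treated as multi-valued.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class EavPivot {
    // Hypothetical shape of one row from the EAV join: (item, attribute, value).
    static final class EavRow {
        final long itemId;
        final String attribute;
        final String value;

        EavRow(long itemId, String attribute, String value) {
            this.itemId = itemId;
            this.attribute = attribute;
            this.value = value;
        }
    }

    // Groups rows by item, then by attribute, so each item becomes one document
    // and each attribute becomes a (multi-valued) field of that document.
    static Map<Long, Map<String, List<String>>> pivot(List<EavRow> rows) {
        Map<Long, Map<String, List<String>>> docs = new LinkedHashMap<>();
        for (EavRow r : rows) {
            docs.computeIfAbsent(r.itemId, k -> new LinkedHashMap<>())
                .computeIfAbsent(r.attribute, k -> new ArrayList<>())
                .add(r.value);
        }
        return docs;
    }
}
```

With 3,000+ attributes, the resulting field names would most likely have to be covered by a dynamic field pattern rather than declared one by one.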
I have indexed two JSON documents into Solr, and when I get the response I am receiving both documents. How can I differentiate the two documents and store them separately?
You need to define a unique key when indexing the JSON documents, whether or not that key is mandatory. This can be done in schema.xml or managed-schema, if it is not already done. You would then search for this key in the query to fetch the wanted document.
This can be compared with querying for a unique primary key in SQL and traditional databases. A tuple/record, uniquely identified by its primary key, is in this scenario equivalent to one of the JSON documents.
Assuming two documents with unique ids 1 and 2, you can fetch document 1 by searching for q=id:1 in the Solr Admin UI. I'm afraid I don't know how to do this in SolrJ or via QueryResponse.
Management of where documents are stored in Solr is not supported - it is more or less black-boxed. This should however not be a problem considering your situation as long as you specify the query correctly.
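Outside the Admin UI, the same `q=id:1` lookup can be sent to the core's `/select` handler over HTTP. The sketch below only builds the request URL with the Java standard library; the base URL `http://localhost:8983/solr`, the core name `mycore`, and the class name are assumptions for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class SolrIdQuery {
    // Builds a /select URL that asks for the single document with the given id.
    // The id is quoted as a phrase, with backslashes and quotes escaped.
    static String selectUrl(String baseUrl, String core, String id) {
        String escaped = id.replace("\\", "\\\\").replace("\"", "\\\"");
        String q = "id:\"" + escaped + "\"";
        return baseUrl + "/" + core + "/select?q="
                + URLEncoder.encode(q, StandardCharsets.UTF_8);
    }
}
```

For example, `SolrIdQuery.selectUrl("http://localhost:8983/solr", "mycore", "1")` yields a URL that can be opened in a browser or fetched with any HTTP client.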
Here is a link that explains how to use Solr 6 as a JDBC data source: https://sematext.com/blog/2016/04/26/solr-6-as-jdbc-data-source/. If you want to use Solr more as a data source than as an index, Solr 6 is the better choice, as it has enhanced SQL-level features and so serves that purpose best. Let me know if that helps you :)
I'm working with Solr and indexing data from a DB.
When I import the data using an SQL query, I get some rows with the same key.
I need a way for Solr to generate a new field with a unique key.
How can I do that?
Thanks
I am not sure if this is possible or not, but maybe you need to re-consider your logic here...
Indexing into Solr should be re-runnable. Imagine that one day you decide to change the schema of your core and have to import everything again.
If you generate a new key every time you import a document, you will end up creating duplicate items when you re-run your data import.
Maybe you need to revisit your DB design to have a unique key, or maybe in the select query, you can create a derived or calculated column value that is calculated based on multiple columns. But I am sure that pushing this problem to solr is not the solution.
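As a sketch of the "derived column" idea, a key can be computed deterministically from the columns that together identify a row, so that re-running the import produces the same key and overwrites the document instead of duplicating it. The class name is hypothetical, and which columns to combine is an assumption you would have to make for your own data.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

final class CompositeKey {
    // Derives a stable key from the given column values. Each part is
    // length-prefixed so that ("ab", "c") and ("a", "bc") cannot collide.
    static String of(String... parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            String s = (p == null) ? "" : p;
            sb.append(s.length()).append(':').append(s).append('|');
        }
        // Name-based UUID: the same input bytes always give the same UUID.
        return UUID.nameUUIDFromBytes(sb.toString().getBytes(StandardCharsets.UTF_8)).toString();
    }
}
```

The same effect can often be had directly in the select query by concatenating the columns, which keeps the logic next to the data.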
Ideally the unique key should come from the DB (are you sure you cannot get one, e.g. by composing some columns?).
But if you cannot, Solr supports UUID generation for this; look here to see how it works depending on your Solr version.
I am a bit confused as to where SOLR usage ends and where it begins.
I use PHP with a relational MySQL db for a shopping site where all tables are related to the product table, joining the tables as they're queried. Needless to say, it's too slow!
e.g.
Category table - catid, catname, catdesc
Brand table - brandid, brandname, branddesc
Product table - productid, productname, productdesc, catid, brandid
(I also use ranges for price ranges etc)
I am wondering whether I should use SOLR to index the whole relational schema or whether just to index the product table alone and let my application work as it currently does.
If I just switch the product table to use SOLR are there any caveats to this?
e.g. in MySQL I can do a fulltext search while joining the brand table. This allows brands to be searched as well. Is it possible to achieve the same thing just by switching the product table to SOLR? Are there any other caveats I should be looking out for?
I also would like to create a new table for "searches". This would allow me to use keywords in a mysql table in the following way:
Searches table - searchterm (e.g. lipstick), synonyms (e.g. lipstick, lips etc.)
i.e. this would allow me to search on multiple terms at the same time. Is this a good time to use SOLR facets instead of storing searches in MySQL, or should I just use MySQL to store the searches and pull the products from SOLR?
Any help is gladly appreciated
NO NEED TO SWITCH
You don't want to "switch" -- just like using full-text indexing in MySQL (or using something like Sphinx), the full-text index is separate from the database tables.
What you want to do is figure out what you're searching for and index that in Solr -- it may well be just products. That's certainly an easy first step.
Basically you'll:
index the appropriate column(s) into Solr
use Solr for the searching
use the Solr results to point back to the records from the database
I'm more Ruby and Java than PHP, but you'll basically be talking to Solr for the full-text search and using that to find the records you want to display.
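The last step, pointing back from Solr results to database records, could look roughly like this in Java. It assumes you already have the ranked list of ids from the search and the matching records loaded from the database (for example with an `IN (...)` query); the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ResultJoiner {
    // Returns the database records in the order Solr ranked their ids.
    // A database IN (...) query does not preserve that order by itself.
    static <R> List<R> inRankOrder(List<String> rankedIds, Map<String, R> recordsById) {
        List<R> out = new ArrayList<>();
        for (String id : rankedIds) {
            R rec = recordsById.get(id);
            if (rec != null) out.add(rec); // skip ids no longer present in the database
        }
        return out;
    }
}
```

Keeping only the id (plus the searchable text) in Solr and reading everything else from the database means the index never has to be the source of truth.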