DIH delta import for multiple fields index from mysql - solr

What is pk in a Solr DIH delta import? I am trying to delta-index multiple fields in Solr.

I believe it is whatever field you specify in your schema.xml file as the id field.

It is the name of the field that serves as the unique key for that record. You define the mapping from your source to that Solr field, and then, after mapping, Solr checks its presence and values based on the pk field you specified.
It is different from the schema's uniqueKey, because you may be generating the uniqueKey or it may not be suitable for some reason, but the two can be the same. I think the clearest explanation on the wiki may be in the example for HttpDataSource.
I believe you may also be able to define a compound pk when you are flattening nested source entries into one Solr document.
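A minimal Python simulation of how pk is used during a delta-import may make this concrete. It is not DIH's own code. It assumes the documented behaviour that DIH reads the column named by pk from each row returned by deltaQuery and substitutes that value into deltaImportQuery through ${dih.delta.<pk>} (older configs write ${dataimporter.delta.<pk>}). The helper name resolve_delta_import_query and the item table are hypothetical.

```python
import re

def resolve_delta_import_query(template, pk, changed_row):
    """Hypothetical helper: fill ${dih.delta.<pk>} (or the older
    ${dataimporter.delta.<pk>}) with the pk value of one row from deltaQuery."""
    value = str(changed_row[pk])  # KeyError if deltaQuery did not select the pk column
    pattern = r"\$\{(?:dih|dataimporter)\.delta\." + re.escape(pk) + r"\}"
    # A function replacement stops re from treating backslashes in value as escapes
    return re.sub(pattern, lambda _match: value, template)

# Rows as a deltaQuery like "select id from item where last_modified > ..." might return them
changed_rows = [{"id": 7}, {"id": 42}]
template = "select * from item where id = '${dih.delta.id}'"

for row in changed_rows:
    print(resolve_delta_import_query(template, "id", row))
# select * from item where id = '7'
# select * from item where id = '42'
```

If deltaQuery does not select the column named by pk, the lookup fails (a KeyError in this sketch), which is why pk has to name a column that deltaQuery actually returns.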

I think the problem is in the deltaQuery of your child entity. You have:
deltaQuery="select id from cc_gadget_lang where '${cc_gadget.last_modified_date}' > '${dataimporter.last_index_time}'"
I think the WHERE condition in that query always evaluates to TRUE, so it serves no purpose: it only compares two substituted values and never references a column of cc_gadget_lang, so it cannot pick out the rows that changed.
The solution I would suggest is to add a separate "last_modified_date" column to the "cc_gadget_lang" table in your database and use it in the deltaQuery of your child entity.
I also believe there is no need to have the "pk" of the child entity in your schema file, because it is only used temporarily during delta-imports and does not need to be stored permanently in the index.
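The difference between the two deltaQuery forms can be sketched with Python's built-in sqlite3 module standing in for MySQL. The table layout (an id, a gadget_id link to the parent, and the suggested last_modified_date column) and the timestamps are assumptions for illustration; timestamps are stored as ISO-8601 text so that string comparison orders them correctly. The original condition is simulated by binding the two substituted values directly, since that is all its WHERE clause contains.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cc_gadget_lang (
        id INTEGER PRIMARY KEY,
        gadget_id INTEGER,        -- link to the parent cc_gadget row
        last_modified_date TEXT   -- the separate column suggested above
    );
    INSERT INTO cc_gadget_lang VALUES
        (10, 1, '2024-01-01 00:00:00'),
        (11, 1, '2024-03-05 12:00:00'),  -- edited after the last import
        (20, 2, '2024-01-01 00:00:00');
""")
last_index_time = "2024-03-01 00:00:00"

# Original form: both sides are substituted constants, so the WHERE clause is
# the same for every row. Whatever the placeholder resolves to, you get either
# every row or none, never just the changed ones.
def original_delta(parent_value):
    rows = conn.execute(
        "SELECT id FROM cc_gadget_lang WHERE ? > ? ORDER BY id",
        (parent_value, last_index_time),
    ).fetchall()
    return [r[0] for r in rows]

# Suggested fix: compare the child table's own column with last_index_time.
def fixed_delta():
    return conn.execute(
        "SELECT id, gadget_id FROM cc_gadget_lang "
        "WHERE last_modified_date > ? ORDER BY id",
        (last_index_time,),
    ).fetchall()

print(original_delta("2024-03-02 00:00:00"))  # [10, 11, 20]
print(original_delta(""))                     # []
print(fixed_delta())                          # [(11, 1)]
```

Only the fixed form returns just the changed child row, and its gadget_id is what a parentDeltaQuery would then use to find the cc_gadget document to rebuild.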

Related

Solr: deltaQuery / parentDeltaQuery / deltaImportQuery

Solr's documentation for DataImportHandler gives this table for the entity query attributes.
That's not very descriptive. Can someone explain the difference and interaction between these query attributes? I have seen some code use deltaQuery and parentDeltaQuery to support nested entities, and other code use deltaQuery and deltaImportQuery.
What is the purpose of choosing one of those over the other?
I see it now in the Solr Wiki:
* The query gives the data needed to populate fields of the Solr document in full-import
* The deltaImportQuery gives the data needed to populate fields when running a delta-import
* The deltaQuery gives the primary keys of the current entity which have changes since the last index time
* The parentDeltaQuery uses the changed rows of the current table (fetched with deltaQuery) to give the changed rows in the parent table. This is necessary because whenever a row in the child table changes, we need to re-generate the document which has that field.
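The order in which these attributes cooperate during a delta-import can be sketched in plain Python. This is a simulation under assumptions, not DIH code: the parent item and child feature rows are hypothetical in-memory data, each function is named after the attribute whose role it plays, and the child rows are folded into the parent document directly for brevity.

```python
from datetime import datetime

last_index_time = datetime(2024, 3, 1)  # plays the role of ${dataimporter.last_index_time}

# Hypothetical source data: parent "item" rows and their child "feature" rows
items = [
    {"id": 1, "name": "phone", "last_modified": datetime(2024, 1, 1)},
    {"id": 2, "name": "tablet", "last_modified": datetime(2024, 3, 2)},  # parent row changed
    {"id": 3, "name": "watch", "last_modified": datetime(2024, 1, 1)},
]
features = [
    {"id": 100, "item_id": 1, "text": "5G", "last_modified": datetime(2024, 1, 1)},
    {"id": 101, "item_id": 3, "text": "GPS", "last_modified": datetime(2024, 3, 4)},  # child row changed
]

def delta_query(rows):
    """Role of deltaQuery: primary keys of rows changed since the last index time."""
    return {r["id"] for r in rows if r["last_modified"] > last_index_time}

def parent_delta_query(changed_feature_ids):
    """Role of parentDeltaQuery: parent keys of the changed child rows."""
    return {f["item_id"] for f in features if f["id"] in changed_feature_ids}

def delta_import_query(item_id):
    """Role of deltaImportQuery: all data for one changed parent, i.e. one Solr document."""
    item = next(i for i in items if i["id"] == item_id)
    return {
        "id": item["id"],
        "name": item["name"],
        "features": [f["text"] for f in features if f["item_id"] == item_id],
    }

# Documents to rebuild: parents that changed themselves, plus parents of changed children
to_rebuild = delta_query(items) | parent_delta_query(delta_query(features))
docs = [delta_import_query(pk) for pk in sorted(to_rebuild)]
print(docs)
# [{'id': 2, 'name': 'tablet', 'features': []}, {'id': 3, 'name': 'watch', 'features': ['GPS']}]
```

Item 2 is re-indexed because its own row changed (deltaQuery on the parent), item 3 because one of its child rows changed (deltaQuery on the child, then parentDeltaQuery), and item 1 is left alone.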
I missed this explanation on the first pass, because I expected the information to show up in the table I posted. Strangely enough, Solr in Action spends less than 1 page out of 600 explaining how to use DataImportHandler to read a database.

Database mapping to Solr

I'm building a Java app using a relational database and I wish to map its primary data to one or more Solr indexes. However, I'm not sure how to map the components of a database. At the moment I've mapped each single cell (one column of one row) to a Solr/Lucene Document.
A doc would be something like this (each line is a field):
schema: "schemaName"
table: "tableName"
column: "columnName"
row: "rowNumber"
data: "data on schemaName.tableName.columnName.row"
This allows me to have a "fixed" Solr schema.xml (as far as I know, it has to be defined before creating indexes). Dynamic fields don't seem to serve my purpose either.
What I've found while searching is that a single row is usually mapped to a Solr Document and each column to a Field. But how can I add the column names as fields to schema.xml when I don't know which columns a table has? I would also need to query the data as if it were SQL, e.g. search all rows of a column in a table, and so on.
With my current "solution" I can run that kind of query, but I'm worried about performance, as I'm new to Solr and don't know what the implications may be.
So, what do you think of my "solution"? Is there another way to map a database to a Solr index, given that the schema.xml fields have to be set before indexing? I've also heard that a table is usually mapped to an index: how could I achieve that?
Maybe I'm just being a noob, but from the research I did I don't see how I can map a database row to a Solr Document without messing with the schema.xml fields.
I would appreciate any thoughts :) Regards.
You can declare your table columns in the schema beforehand, or use dynamic fields, and then use the Solr DIH to import the data from the database into Solr. With dynamic fields, select each column under a dynamic field name in the DIH queries.
Please go through the Solr DIH documentation on database integration.
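For the dynamic-field approach, here is a minimal Python sketch of turning one table row into one Solr document, with one field per column. It assumes the dynamicField rules from Solr's example schemas (*_s for strings, *_i for ints, *_f for floats, *_b for booleans), so check that your schema.xml declares them. The function names and the table_s and id conventions are hypothetical. With DIH you would get the same effect by aliasing columns in SQL, e.g. select title as title_s from product.

```python
# Suffixes follow the dynamicField rules in Solr's example schemas
# (assumption: your schema.xml declares *_b, *_i, *_f and *_s).
SUFFIX_BY_TYPE = [
    (bool, "_b"),   # must come before int: bool is a subclass of int
    (int, "_i"),
    (float, "_f"),
    (str, "_s"),
]

def dynamic_field_name(column, value):
    """Pick the dynamic field name for a column based on its Python value type."""
    for py_type, suffix in SUFFIX_BY_TYPE:
        if isinstance(value, py_type):
            return column + suffix
    raise TypeError(f"no dynamic field rule for {column}={value!r}")

def row_to_solr_doc(table, pk_column, row):
    """Hypothetical mapping of one table row to one Solr document."""
    doc = {
        "id": f"{table}-{row[pk_column]}",  # unique across tables
        "table_s": table,                   # lets queries filter by table
    }
    for column, value in row.items():
        if value is None:
            continue  # NULL columns are simply left out of the document
        doc[dynamic_field_name(column, value)] = value
    return doc

doc = row_to_solr_doc(
    "product", "product_code",
    {"product_code": 1, "title": "Desk lamp", "price": 19.5, "in_stock": True, "note": None},
)
print(doc)
# {'id': 'product-1', 'table_s': 'product', 'product_code_i': 1, 'title_s': 'Desk lamp', 'price_f': 19.5, 'in_stock_b': True}
```

Because every document records its table in table_s, a filter such as fq=table_s:product limits a query to one table, and a query like title_s:* then returns every row of that table that has a title.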

changing solr id from string to uuid

I am very new to solr.
Initially the "id" in my solr schema was of type string.
I have 30,000 documents, but now I want to use uuid instead of a string.
I simply changed the id type to uuid and followed the instructions from http://wiki.apache.org/solr/UniqueKey.
It did not work, because Solr tried to read the existing string ids as UUIDs and failed.
My question is: how do I change my id to uuid without deleting any data?
Any info on this will be helpful.
I assume your id field is declared as the uniqueKey in schema.xml. That means every Solr document in your instance must contain the id field. When you change the type of a field in the schema, the previously built index for that field gets messed up: you can no longer query on it, even though the data is still present in your Solr instance.
What good is data you indexed for querying if you cannot query it? So there is no point keeping old documents in Solr that you can't query. And this time you have modified the uniqueKey field, so you must re-index. If you had changed the type of any field other than the uniqueKey, an atomic (partial) update would have been a solution.
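Re-indexing means feeding the documents again into an index built with the new schema, either from the original source or from the old index (read before the schema change) if every field is stored. A minimal Python sketch of preparing those documents, assuming a hypothetical legacy_id_s dynamic field to keep the old id: it uses uuid5, so the same old id always maps to the same UUID and re-running the job does not create duplicates. Alternatively, Solr's UUIDUpdateProcessorFactory can generate random UUIDs on the server side when the id is left out.

```python
import uuid

def reindex_with_uuid(old_docs):
    """Hypothetical re-index step: give each document a UUID key and keep the
    old string id in legacy_id_s (an assumed dynamic field)."""
    new_docs = []
    for doc in old_docs:
        new_doc = dict(doc)            # copy, so the input is not modified
        old_id = new_doc.pop("id")
        # uuid5 is deterministic: the same old id always yields the same UUID,
        # so running the re-index twice does not create duplicate documents.
        # Any fixed namespace works; NAMESPACE_URL is an arbitrary choice.
        new_doc["id"] = str(uuid.uuid5(uuid.NAMESPACE_URL, str(old_id)))
        new_doc["legacy_id_s"] = old_id
        new_docs.append(new_doc)
    return new_docs

docs = reindex_with_uuid([{"id": "book-17", "title": "Solr in Action"}])
print(docs)
```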

indexing different record types in one single solr schema

I am struggling to see how (and whether) one can index multiple different types of records in one single Solr core, where the different record types have different unique keys.
We are inclined to want to use a single core because we want to be able to, at certain levels, search everything all at once and not have to cobble cores together.
So, for example, I have products that have the fields:
product_code <--- unique key
product_title
product_description
etc...
then there are job listings that have the fields:
job_id <---- unique key
job_description
job_title
etc...
There are multiple other entities, including a Nutch search index, whose unique key will be 'id'.
Is it possible to declare more than one unique key in schema.xml, so that I do not have to send each kind of record to a different Solr core?
My main concern is that at least one of the <uniqueKey>s would have to be required, but not all records sent to the Solr index will have that key.
Is there an accepted way to get around this problem in Solr?
See https://wiki.apache.org/solr/MultipleIndexes#Flattening_Data_Into_a_Single_Index and https://wiki.apache.org/solr/UniqueKey
Solr does not need a uniqueKey. If you do not specify one, be aware that when you post a new doc with the same key as an existing doc, the new doc will not replace the old one, so you have to delete the old one manually first and then add the new one (and commit, of course).
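Without a uniqueKey, replacing a document therefore takes three update requests sent in order. A minimal Python sketch that builds them as Solr JSON update commands; the function name and the exact-match phrase query on the old key are assumptions, and actually sending the bodies (POST to the core's /update handler with Content-Type: application/json) is left out.

```python
import json

def replace_without_unique_key(key_field, key_value, new_doc):
    """Build, in sending order, the JSON update bodies that replace a document
    when the schema has no uniqueKey: delete the old copy, add the new one, commit."""
    # Quote the value as a phrase, escaping backslashes and quotes inside it
    escaped = str(key_value).replace("\\", "\\\\").replace('"', '\\"')
    return [
        json.dumps({"delete": {"query": f'{key_field}:"{escaped}"'}}),
        json.dumps({"add": {"doc": new_doc}}),
        json.dumps({"commit": {}}),
    ]

for body in replace_without_unique_key(
    "product_code", "1", {"product_code": "1", "product_title": "Desk lamp"}
):
    print(body)
```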
If you need a unique key, prepend a type-based prefix to the IDs. You can then keep the original ID and the type in two other fields. So, for example:
uniquekey: P1
product_code: 1
type: product

uniquekey: J1
job_id: 1
type: job
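The prefix scheme fits in a few lines of Python. This is a sketch under assumptions: the field names uniquekey and type come from the example above, while the prefix table and the helper name to_solr_doc are hypothetical.

```python
# Assumed one-letter prefix per record type, as in the example above
TYPE_PREFIX = {"product": "P", "job": "J"}

def to_solr_doc(record_type, native_id, fields):
    """Hypothetical helper: build a document for the shared core with a
    type-prefixed uniqueKey and an explicit type field."""
    doc = dict(fields)
    doc["uniquekey"] = TYPE_PREFIX[record_type] + str(native_id)
    doc["type"] = record_type
    return doc

product = to_solr_doc("product", 1, {"product_code": 1, "product_title": "Desk lamp"})
job = to_solr_doc("job", 1, {"job_id": 1, "job_title": "Librarian"})
print(product["uniquekey"], job["uniquekey"])  # P1 J1
```

A query can then be limited to one record type with a filter such as fq=type:job. If you add more types, pick prefixes so that none is the start of another (or use a separator, as in product-1); otherwise keys such as P + R1 and PR + 1 could collide.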

Obtaining record from Solr using single Key

I am using Solr and have looked over the documentation, but couldn't find a way to get a single record from Solr by its key.
If I know the key value of a record, what query do I need to pass to Solr to obtain it?
Thanks.
I'm not sure what you mean by key, but guessing from context you mean a field defined in your schema. If that is the case, you could issue the following:
// Assumes Id is a field in your schema
// Typed into the query box of the Solr admin UI
q=Id:1
// URL-encoded, when sent as a request parameter
q=Id%3A1
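The same lookup in Python, as a sketch: it builds the /select URL with the value escaped for query syntax and the whole q parameter URL-encoded. As an alternative it builds a URL for the real-time get handler (/get?id=...), which fetches a document by its uniqueKey if your solrconfig.xml defines that handler, as the default configs do. The host, core name, and helper names are assumptions; the escaped characters are modeled on SolrJ's ClientUtils.escapeQueryChars.

```python
from urllib.parse import urlencode

# Characters with special meaning in Lucene/Solr query syntax
# (modeled on SolrJ's ClientUtils.escapeQueryChars)
SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&;/')

def escape_query_value(value):
    """Backslash-escape query-syntax characters and whitespace in a literal value."""
    return "".join(
        "\\" + ch if ch in SPECIAL_CHARS or ch.isspace() else ch
        for ch in str(value)
    )

def select_url(base, field, value):
    """URL for the /select handler matching documents where field equals value."""
    q = f"{field}:{escape_query_value(value)}"
    return f"{base}/select?{urlencode({'q': q})}"

def realtime_get_url(base, doc_id):
    """URL for the real-time get handler, which looks a document up by uniqueKey."""
    return f"{base}/get?{urlencode({'id': doc_id})}"

base = "http://localhost:8983/solr/mycore"  # assumed host and core name
print(select_url(base, "Id", 1))        # http://localhost:8983/solr/mycore/select?q=Id%3A1
print(realtime_get_url(base, "1"))      # http://localhost:8983/solr/mycore/get?id=1
```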
