Timestamp compatibility while performing delta import in solr - database

I'm new to Solr. I have successfully indexed an Oracle 10g XE database, and I'm now trying to perform a delta import on it.
The delta query requires comparing a last_modified column of the table with ${dih.last_index_time}.
However, my application has no such column, and I cannot add one. Therefore I used 'scn_to_timestamp(ora_rowscn)' to supply the required timestamp. This expression returns a value of type TIMESTAMP in the format 24-JUL-13 12.42.32.000000000 PM, whereas dih.last_index_time is in the format 2013-07-24 12:18:03. So I converted dih.last_index_time with to_timestamp('${dih.last_index_time}', 'YYYY/MM/DD HH:MI:SS').
My data-config looks like this:
<dataConfig>
<dataSource type="JdbcDataSource" driver="oracle.jdbc.OracleDriver" url="jdbc:oracle:thin:#XXX.XXX.XX.XX:XXXX:xe" user="XXXXXXXX" password="XXXXXXX" />
<document name="product_info">
<entity name="PRODUCT" pk="PID" query="SELECT * FROM PRODUCT" deltaImportQuery="SELECT * FROM PRODUCT WHERE PID=${dih.delta.id}" deltaQuery="SELECT PID FROM PRODUCT WHERE scn_to_timestamp(ora_rowscn) > to_timestamp('${dih.last_index_time}', 'YYYY/MM/DD HH:MI:SS')">
<field column="PID" name="id" />
<field column="PNAME" name="itemName" />
<field column="INITQTY" name="itemQuantity" />
<field column="REMQTY" name="remQuantity" />
<field column="PRICE" name="itemPrice" />
<field column="SPECIFICATION" name="specifications" />
<entity name="SUB_CATEGORY" query="SELECT * FROM SUB_CATEGORY WHERE SCID=${PRODUCT.SCID}">
<field column="SUBCATNAME" name="brand" />
<entity name="CATEGORY" query="SELECT CNAME FROM CATEGORY WHERE CID=${SUB_CATEGORY.CID}">
<field column="CNAME" name="itemCategory" />
</entity>
</entity>
</entity>
</document>
</dataConfig>
However, this is not working, and I'm getting the following error:
Unable to execute query: SELECT * FROM PRODUCT WHERE PID= Processing Document # 1
Caused by: java.sql.SQLException: ORA-00936: missing expression
Please help me out!!!

I had a similar issue and had more success with *to_date*. Looking at this again, though, it seems you may just need to quote the delta id in the deltaImportQuery:
deltaImportQuery="SELECT * FROM PRODUCT WHERE PID='${dih.delta.id}'"
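One more thing worth checking, offered as a sketch and not a confirmed fix: the error text shows the query ending at `PID=`, which suggests the placeholder resolved to an empty string, so quoting alone may not be enough. DIH fills `${dih.delta.<name>}` from the columns returned by the deltaQuery, and this deltaQuery returns a column named PID, not id. Separately, Oracle's HH mask is the 12-hour clock, so a last_index_time after noon would need HH24. Assuming nothing else in the entity changes, it could look like this:

```xml
<!-- Sketch only: the placeholder name and the HH24 mask are the two changes. -->
<entity name="PRODUCT" pk="PID"
        query="SELECT * FROM PRODUCT"
        deltaQuery="SELECT PID FROM PRODUCT WHERE scn_to_timestamp(ora_rowscn) > to_timestamp('${dih.last_index_time}', 'YYYY-MM-DD HH24:MI:SS')"
        deltaImportQuery="SELECT * FROM PRODUCT WHERE PID='${dih.delta.PID}'">
  <!-- field mappings and nested entities unchanged from the question -->
</entity>
```

The upper-case PID matches how Oracle reports unquoted column names; if the placeholder still comes back empty, the debug mode of the dataimport handler should show which column names the deltaQuery actually returned.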

Related

SolR's Tika processor in Data Import Handler does not get filename from DB processor

I have a DIH configuration where I want to combine data from a database and Tika by passing the filename from the DB entity to Tika. The problem is that the filename arrives empty in Tika. The logs say:
ERROR (Thread-16) [ ] o.a.s.h.d.DataImporter Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.RuntimeException: java.io.FileNotFoundException: Could not find file: (resolved to: C:\Users\jimbo\Desktop\solr-8.9.0\server\.
My configuration xml file is this:
<dataConfig>
<dataSource name="ds-db" driver="org.mariadb.jdbc.Driver" url="jdbc:mysql://localhost:3306/eepyakm?user=root" user="root" password="wpadmin"/>
<dataSource name="ds-file" type="BinFileDataSource"/>
<document>
<entity name="supplier" query="select * from suppliers_tmp_view" dataSource="ds-db"
deltaQuery="select id from suppliers_tmp_view where last_modified > '${dataimporter.last_index_time}'"
deltaImportQuery="select * from suppliers_tmp_view where id='${dataimporter.delta.id}'">
<entity name="attachment" dataSource="ds-db"
query="select * from suppliers_tmp_files_view where supplier_tmp_id='${supplier.id}'"
deltaQuery="select id,supplier_tmp_id from suppliers_tmp_files_view where last_modified > '${dataimporter.last_index_time}'"
parentDeltaQuery="select id from suppliers_tmp_view where id='${attachment.supplier_tmp_id}'">
<field name="path" column="path"/>
<entity name="file" processor="TikaEntityProcessor" url="${attachment.path}" format="text" dataSource="ds-file">
<field column="text"/>
</entity>
</entity>
</entity>
</document>
</dataConfig>
I found a similar problem at a very old post: Solr's TikaEntityProcessor not working

SolR's Data Import Handler tracks but ignores nested entity's changes

I have two tables and I'm trying to make the Data Import Handler update the indexed document when the sub-entity changes. When I fire the "delta-import" command, I get the following:
{
"responseHeader":{
"status":0,
"QTime":3},
"initArgs":[
"defaults",[
"config","db-data-config.xml"]],
"command":"delta-import",
"status":"idle",
"importResponse":"",
"statusMessages":{
"Total Requests made to DataSource":"5",
"Total Rows Fetched":"3",
"Total Documents Processed":"0",
"Total Documents Skipped":"0",
"Delta Dump started":"2021-08-16 11:05:47",
"Identifying Delta":"2021-08-16 11:05:47",
"Deltas Obtained":"2021-08-16 11:05:47",
"Building documents":"2021-08-16 11:05:47",
"Total Changed Documents":"0",
"Time taken":"0:0:0.12"}}
My data config is this:
<dataConfig>
<dataSource driver="org.mariadb.jdbc.Driver" url="jdbc:mysql://localhost:3306/eepyakm?user=root" user="root" password="root"/>
<document>
<entity name="supplier" query="select * from suppliers_tmp_view"
deltaQuery="select id from suppliers_tmp_view where last_modified > '${dataimporter.last_index_time}'"
deltaImportQuery="select * from suppliers_tmp_view where id='${dataimporter.delta.id}'">
<entity name="attachment"
query="select * from suppliers_tmp_files_view where supplier_tmp_id='${supplier.id}'"
deltaQuery="select id from suppliers_tmp_files_view where last_modified > '${dataimporter.last_index_time}'"
parentDeltaQuery="select id from suppliers_tmp_view where id='${attachment.supplier_tmp_id}'">
<field name="path" column="path" />
</entity>
</entity>
</document>
</dataConfig>
As I understand it, "Total Rows Fetched" shows that 3 entries in the sub-entity table have changed. So why doesn't it index the changed field?
If I do a "full-import", it picks up the changes fine.

The deltaQuery of your attachment entity does not select supplier_tmp_id, yet you reference it in your parentDeltaQuery.
Select this column as well in that deltaQuery's SELECT statement.
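A minimal sketch of that change, assuming the rest of the config stays as posted: only the attachment entity's deltaQuery is altered so that it also returns supplier_tmp_id, which the parentDeltaQuery placeholder reads.

```xml
<entity name="attachment"
        query="select * from suppliers_tmp_files_view where supplier_tmp_id='${supplier.id}'"
        deltaQuery="select id, supplier_tmp_id from suppliers_tmp_files_view where last_modified > '${dataimporter.last_index_time}'"
        parentDeltaQuery="select id from suppliers_tmp_view where id='${attachment.supplier_tmp_id}'">
  <!-- supplier_tmp_id is now available to the parentDeltaQuery above -->
  <field name="path" column="path" />
</entity>
```

With the column missing, the placeholder would resolve to an empty string, the parentDeltaQuery would match no supplier, and the result would be consistent with the "Total Changed Documents: 0" in the status output.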

Solr templated sql print nothing in dataimport

I have the following dataimport configuration:
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
<dataSource driver="net.ucanaccess.jdbc.UcanaccessDriver" type="JdbcDataSource" url="jdbc:ucanaccess://C:/feqh/main.mdb;memory=false" />
<document>
<entity name="Book"
query="select bkid AS id, bkid AS BookID,bk AS BookTitle, betaka AS BookInfo, cat as cat from 0bok">
<field column="id" name="id"/>
<field column="BookID" name="BookID"/>
<field column="BookTitle" name="BookTitle"/>
<field column="cat" name="cat"/>
<entity name="Category"
query="select name as CatName, catord as CatWeight, Lvl as CatLevel from 0cat where id = ${Book.cat}">
<field column="CatName" name="CatName"/>
<field column="CatWeight" name="CatWeight"/>
<field column="CatLevel" name="CatLevel"/>
</entity>
</entity>
</document>
</dataConfig>
This import fails with the following error from the log:
Unable to execute query: select name as CatName, catord as CatWeight,
Lvl as CatLevel from 0cat where id = Processing Document # 1
When I replace ${Book.cat} with any fixed number such as 128, the import works fine.
So it seems that ${Book.cat} does not resolve to any value. The database I import from is an MS Access .mdb file, accessed through ucanaccess version 2.0.9. I'm using Solr 4.9.0 on Java 8. How can I solve this issue?
For some unknown reason, I found that the column name must be written in upper case in the template, so ${Book.cat} should be ${Book.CAT}. I say unknown because I'm sure that the column name in the database is written in lower case: cat.
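For illustration, the nested entity with the upper-cased placeholder would look like the sketch below. One possible explanation, which is an assumption and not something confirmed in this thread, is that the JDBC driver reports column labels in upper case, and DIH resolves placeholders against the labels the driver returns, not the names stored in the database.

```xml
<entity name="Category"
        query="select name as CatName, catord as CatWeight, Lvl as CatLevel from 0cat where id = ${Book.CAT}">
  <!-- ${Book.CAT}: upper case, matching the label the driver appears to return -->
  <field column="CatName" name="CatName"/>
  <field column="CatWeight" name="CatWeight"/>
  <field column="CatLevel" name="CatLevel"/>
</entity>
```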

Solr DataImportHandler 1 to many relation

I have set up MySQL tables for storing articles.
Basically, they look like this:
article
-------
article_number
title
sets_article
-------
setcode
article_number
I have set up a schema.xml and configured the DataImportHandler. Everything works fine, except that the sets are not stored when I call the DataImportHandler with full-import.
Here is the relevant part of my data-config.xml:
<document name="articles">
<entity name="article"
pk="article_number"
query="select * from article"
deltaImportQuery="select * from article where article_number='${dih.delta.article_number}'"
deltaQuery="select article_number from article where tstamp > UNIX_TIMESTAMP(STR_TO_DATE('${dih.last_index_time}', '%Y-%m-%d %H:%i:%s'))">
<entity name="sets_article" query="select setcode as sets from sets_article where article_number='${article.article_number}'" />
<entity name="sets_articlel2" query="select distinct setcode as sets2 from sets_article" />
<entity name="sets_articlel3" query="select distinct setcode as sets3 from sets_article where article_number='11112222'" />
</entity>
</document>
The entities sets_articlel2 and sets_articlel3 work fine, so I think there is a problem with:
where article_number='${article.article_number}'
Does anybody know what is wrong with this setup?
The problem was a simple misconfiguration.
The tables didn't have the proper primary keys set.

solr: import from different datasources using DIH

I am trying to fill a Solr index from 2 different data-sources (xml and db) using the DataImportHandler.
1st try: I created 2 data-config.xml files, one for the XML import and one for the DB import.
The db-config would read id and, let's say, field A. The xml-config would also read id, plus field B.
That works for both (I could import from both data sources), but the index got overwritten each time (with clean=false, of course), so I ended up with either id and A or id and B.
So on to the 2nd try: I merged the 2 files into one:
<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
<dataSource
name="cr-db"
jndiName="xyz"
type="JdbcDataSource" />
<dataSource
name="cr-xml"
type="FileDataSource"
encoding="utf-8" />
<document name="doc">
<entity
dataSource="cr-xml"
name="f"
processor="FileListEntityProcessor"
baseDir="/path/to/xml"
filename="*.xml"
recursive="true"
rootEntity="false"
onError="skip">
<entity
name="xml-data"
dataSource="cr-xml"
processor="XPathEntityProcessor"
forEach="/root"
url="${f.fileAbsolutePath}"
transformer="DateFormatTransformer"
onError="skip">
<field column="id" xpath="/root/id" />
<field column="A" xpath="/root/a" />
</entity>
<entity
name="db-data"
dataSource="cr-db"
query="
SELECT
id, b
FROM
a_table
WHERE
id = '${f.file}'">
<field column="B" name="b" />
</entity>
</entity>
</document>
</dataConfig>
The id = '${f.file}' part is a bit odd, I guess, but that is the id that is used. The SELECT statement is formed correctly, but I get an exception when I run that file in dataimport.jsp. The first part (XML) works fine, but when it gets to the DB part it raises:
java.lang.RuntimeException: java.io.FileNotFoundException:
Could not find file: SELECT id, b FROM a_table WHERE id = '12345678.xml'
at org.apache.solr.handler.dataimport.FileDataSource.getFile[..]
Any advice? Thanks in advance
EDIT
I found the cause of the FileNotFoundException: within the entity tags, the data source attribute must be camel-cased, i.e. dataSource.
Now it runs through, but with the same outcome as in the first try: only field B gets into the index. If I take the db entity out, the file contents (field A) are indexed.
Try:
<entity name="db-data" dataSource="cr-db"
Attribute names are case-sensitive, so a wrongly cased attribute name is ignored and the entity falls back to the default data source (which somehow is the file one).
