Is it possible to use DataImportHandler with partial updates in Solr 4? Should I be able to use a data-config.xml like the one below, import the two entities at separate times, and end up with full documents containing both sets of data?
<document name="item">
<entity name="pricing" query="select * from prc">
<field column="ID" name="itemId" />
<field column="NM" name="itemName" />
<field column="default" name="defaultPrice" />
<field column="sale" name="salesPrice" />
</entity>
<entity name="tag" query="select * from tag">
<field column="ID" name="itemId" />
<field column="TAG" name="adminTag" />
</entity>
</document>
Partial updates are not supported by DIH, so you would probably need to use SolrJ for this.
Also, when you define multiple entities, you can import each one selectively.
However, these entities would be indexed as separate documents in the Solr index, not as one combined document. If you want a single document, you would need to use a sub-entity.
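As a rough sketch of the sub-entity approach, the tag entity could be nested inside the pricing entity so that each item ends up as one document. This assumes that tag.ID refers to the same item as prc.ID (the question maps both to itemId); the join condition is an assumption and is not confirmed by the question. Note that both entities are then imported in the same run, not at separate times.

```xml
<document name="item">
  <!-- Root entity: one Solr document per row of prc -->
  <entity name="pricing" query="select * from prc">
    <field column="ID" name="itemId" />
    <field column="NM" name="itemName" />
    <field column="default" name="defaultPrice" />
    <field column="sale" name="salesPrice" />
    <!-- Sub-entity: runs once per pricing row; its fields are merged into the parent document.
         Assumption: tag.ID matches prc.ID. -->
    <entity name="tag" query="select TAG from tag where ID = '${pricing.ID}'">
      <field column="TAG" name="adminTag" />
    </entity>
  </entity>
</document>
```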
I am using DataImportHandler for indexing data in Solr.
I am retrieving three columns from the AUTO table in the database. Two of them, TOPIC and PARTS, hold data of type CLOB, and the DATE column holds an Oracle timestamp with the creation date.
The problem is in my data-config file, where I need to transform the CLOB data to strings and the date to the UTC format that Solr uses.
So I need two transformers, i.e., ClobTransformer and DateFormatTransformer.
I am wondering how to use both transformers in a single entity.
Here is my data-config file:
<dataConfig>
<dataSource name="ds1" type="JdbcDataSource"
driver="oracle.jdbc.OracleDriver"
url="....."
user="....."
password="...."/>
<document name="doc">
<entity name="ent"
query="Select
auto.ID,
auto.Topic as Topic,
auto.Parts as Parts,
to_date(to_char(auto.Date, 'yyyy-MM-dd HH:MI:SS'), 'YYYY-MM-DD HH:MI:SS') AS Date
From auto
order by auto.Date DESC"
dataSource="ds1" transformer="DateFormatTransformer">
<field column="ID" name="id"/>
<field column="TOPIC" name="topic" clob="true"/>
<field column="PARTS" name="parts" clob="true"/>
<field column="DATE" name="date" xpath="/RDF/item/date" dateTimeFormat="yyyy-MM-dd HH:mm:ss" locale="en"/>
</entity>
</document>
</dataConfig>
Above I have used only DateFormatTransformer.
Any help would be much appreciated.
OK, I found out how it's done. Just list the transformers, separated by commas, in the 'transformer' attribute of the entity tag, like this:
<dataConfig>
<dataSource name="ds1" type="JdbcDataSource"
driver="oracle.jdbc.OracleDriver"
url="....."
user="....."
password="...."/>
<document name="doc">
<entity name="ent"
query="Select
auto.ID,
auto.Topic as Topic,
auto.Parts as Parts,
to_date(to_char(auto.Date, 'yyyy-MM-dd HH:MI:SS'), 'YYYY-MM-DD HH:MI:SS') AS Date
From auto
order by auto.Date DESC"
dataSource="ds1" transformer="ClobTransformer,DateFormatTransformer">
<field column="ID" name="id"/>
<field column="TOPIC" name="topic" clob="true"/>
<field column="PARTS" name="parts" clob="true"/>
<field column="DATE" name="date" xpath="/RDF/item/date" dateTimeFormat="yyyy-MM-dd HH:mm:ss" locale="en"/>
</entity>
</document>
</dataConfig>
I have used two transformers, transformer="ClobTransformer,DateFormatTransformer"
How do I index only documents that contain a specific string in Solr? This is my current DataImportHandler configuration:
<dataConfig>
<dataSource type="FileDataSource" encoding="UTF-8" />
<document>
<entity name="page"
processor="XPathEntityProcessor"
stream="true"
forEach="/mediawiki/page/"
url="pages.xml"
transformer="RegexTransformer"
>
<field column="id" xpath="/mediawiki/page/id" />
<field column="title" xpath="/mediawiki/page/title" />
<field column="text" regex="\{\{PersonData" xpath="/mediawiki/page/revision/text" />
</entity>
</document>
</dataConfig>
I only want to index a document if its text field contains {{PersonData, but the configuration above imports everything. Should this be specified in the import handler or in the schema?
You need to do something like this:
<field column="$skipDoc" regex="^#REDIRECT .*" replaceWith="true" sourceColName="text"/>
In this case, documents matching the specified regex are skipped, i.e., articles that are redirects to other articles.
Detailed documentation here:
http://wiki.apache.org/solr/DataImportHandler#XPathEntityProcessor
So in your case you need to find a way to skip all documents where "PersonData" is NOT in the "text" column.
Look specifically at the "Example: Indexing wikipedia" part of http://wiki.apache.org/solr/DataImportHandler#XPathEntityProcessor
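One possible way to express "skip when PersonData is absent" is a negative lookahead in the regex. The line below is an untested sketch that would go inside the existing "page" entity (which already declares RegexTransformer). It relies on the same mechanism as the redirect example: when the regex matches, the $skipDoc value is replaced with "true"; when it does not match, the value is left as the original text and the document is kept.

```xml
<!-- (?s) lets . match newlines; the lookahead fails as soon as {{PersonData occurs anywhere in the text.
     So the pattern matches (and the document is skipped) only when {{PersonData is absent. -->
<field column="$skipDoc" regex="(?s)^(?!.*\{\{PersonData).*$" replaceWith="true" sourceColName="text" />
```

With this in place, the regex attribute on the "text" field itself should no longer be needed for filtering.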
I am trying to import multiple tables from a MySQL database using Solr's Data Import Handler (DIH). The DIH does not import data from the second table, 'detail'.
My database configuration file is
<document>
<entity name="item" pk="ListingId" query="SELECT * FROM item as item where listingid=360245270">
<entity name="detail" pk="ListingId" query="SELECT Body FROM detail where listingid='${item.listingid}'">
<field column="Body" name="Body" />
</entity>
</entity>
</document>
I monitored the MySQL query log, and the two important queries that are executed are:
SELECT * FROM item as item where listingid=360245270
SELECT Body FROM listeditemdetail where listeditemdetail.listingid=''
Clearly, the '${item.listingid}' part in the configuration file is not working as required. I have tried different spellings for the table and column names but cannot get it to work.
(Just a guess.) Try removing the primary key and using the upper-case column name, e.g.:
<document name="items">
<entity name="item" query="SELECT * FROM item as item where listingid=360245270">
<field column="LISTINGID" name="listingid" />
<entity name="detail" query="SELECT Body FROM detail where listingid='${item.LISTINGID}'">
<field column="Body" name="Body" />
</entity>
</entity>
</document>
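The idea behind this suggestion is that the name in ${item...} has to match the column name exactly as the JDBC driver reports it, including case. A variation that avoids guessing the case is to give the column an explicit alias in the parent query and reference that alias. This is only a sketch; the alias LID is a made-up name for illustration.

```xml
<document name="items">
  <!-- LID is a hypothetical alias; it fixes the exact key under which the value is exposed -->
  <entity name="item" query="SELECT item.*, item.listingid AS LID FROM item AS item WHERE listingid=360245270">
    <entity name="detail" query="SELECT Body FROM detail WHERE listingid='${item.LID}'">
      <field column="Body" name="Body" />
    </entity>
  </entity>
</document>
```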
When I run the "Full import with cleaning" command, I get the error "Indexing failed. Rolled back all changes".
My dataimport config file:
<dataConfig>
<dataSource type="JdbcDataSource" name="ds-1" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://my.ip/my_db" user="my_db_user" password="my_password" readOnly="True"/>
<document>
<entity name="videos" pk="ID" transformer="TemplateTransformer" dataSource="ds-1"
query="SELECT * FROM videos LIMIT 100">
<field column="id" name="unid" indexed="true" stored="true" />
<field column="title" name="baslik" indexed="true" stored="true" />
<field column="video_img" name="img" indexed="true" stored="true" />
</entity>
</document>
</dataConfig>
I kept receiving the same error message at some point in time. For me, the causes were:
Bad connection string.
Bad driver (com.mysql.jdbc.Driver).
Bad query.
Bad mapping of columns to Solr fields (I think this might be your problem too).
Make sure the column names in the database are the same (case-sensitive) as the field names in Solr. If they are not, rename the columns in the query:
select id as uniqueid, title as Tittle
or using the field element in the entity you defined like this:
<field column="ID" name="id" />
You are using the field element incorrectly. See here for how to use this element: http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml
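As a sketch of what the mapping might look like, the field elements in data-config.xml would carry only column and name, while indexed and stored are attributes of the field definitions in schema.xml. The column names are taken from the question; the schema types shown are assumptions.

```xml
<!-- data-config.xml: map database columns to Solr field names -->
<entity name="videos" pk="ID" dataSource="ds-1" query="SELECT * FROM videos LIMIT 100">
  <field column="id" name="unid" />
  <field column="title" name="baslik" />
  <field column="video_img" name="img" />
</entity>

<!-- schema.xml: indexed/stored belong here (type="string" is an assumption) -->
<field name="unid" type="string" indexed="true" stored="true" />
<field name="baslik" type="string" indexed="true" stored="true" />
<field name="img" type="string" indexed="true" stored="true" />
```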
If you can share other relevant data and logs we can give you more specific information.
Good luck.
I am new to Solr. While creating the indexes, I am prepending a string to the database table id.
My field in schema.xml is as follows:
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>id</uniqueKey>
I am passing 'GROUP1' for id, but it is stored as [B@1e090ee.
How can I store the actual value (GROUP1) instead of [B@1e090ee?
Please help
Is group_id a string or a numeric data type?
If it is not a string, you need to cast it to char, with the appropriate encoding, before concatenation.
Also add an encoding parameter (matching your MySQL database encoding) to the dataSource tag, like this:
<dataSource
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://host/dbname"
batchSize="-1"
user="username"
password="password"
readOnly="true"
autoCommit="true"
encoding="UTF-8" />
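The cast suggested above could look roughly like this, assuming the id is built with CONCAT from a numeric group_id column. The table and column names are the ones from the asker's config posted further down; the query is shortened to the relevant part.

```xml
<!-- CAST(... AS CHAR) makes CONCAT return a character string instead of binary data -->
<entity name="item"
        query="select group_id, CONCAT('GROUP', CAST(group_id AS CHAR)) as id from collaboration_groups where group_status=1">
  <field column="id" name="id" />
  <field column="group_id" name="itemid" />
</entity>
```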
Which DB are you using?
Do you see the correct values when you execute your query directly in your db?
IMHO, the problem has to be either with DataImportHandler, or you actually have values like that ([B@1e090ee) in your group_id field.
Have you checked that the encoding param in your dataConfig's dataSource is the same as your db's encoding?
Can you post your dataConfig file?
@mbonaci
I am using a MySQL database.
When I execute the same query directly, the results come back fine.
The following is my data config file:
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
<dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://host/dbname" batchSize="-1" user="username" password="password" readOnly="true" autoCommit="true" />
<document name="products">
<entity name="item" query="select group_id,group_title,description,DATE_FORMAT(created_date, '%Y-%m-%dT%H:%i:%sZ') as createdDate,group_status,CONCAT('GROUP',group_id) as id,'GROUP' as itemtype from collaboration_groups where group_status=1 ">
<field column="id" name="id" />
<field column="group_id" name="itemid" />
<field column="itemtype" name="itemtype" />
<field column="group_title" name="fullName" />
<field column="description" name="description"/>
<field column="createdDate" name="createdDate"/>
</entity>
</document>
</dataConfig>