When I run the "Full import with cleaning" command, I get the error "Indexing failed. Rolled back all changes".
My dataimport config file:
<dataConfig>
<dataSource type="JdbcDataSource" name="ds-1" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://my.ip/my_db" user="my_db_user" password="my_password" readOnly="True"/>
<document>
<entity name="videos" pk="ID" transformer="TemplateTransformer" dataSource="ds-1"
query="SELECT * FROM videos LIMIT 100">
<field column="id" name="unid" indexed="true" stored="true" />
<field column="title" name="baslik" indexed="true" stored="true" />
<field column="video_img" name="img" indexed="true" stored="true" />
</entity>
</document>
</dataConfig>
I kept getting the same error message for a while. In my case the causes were:
a bad connection string
a bad driver (com.mysql.jdbc.Driver)
a bad query
a bad mapping of columns to Solr fields (I think this might be your problem too)
Make sure the column names in the database are the same (case sensitive) as the field names in Solr. If they are not, rename the columns in the query:
select id as uniqueid, title as Tittle
or using the field element in the entity you defined like this:
<field column="ID" name="id" />
You are using the field element incorrectly. See here how to use this element: http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml
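As a rough sketch, the config from the question might look like this once the field elements are used only to map columns to Solr fields. It assumes the table columns are really named id, title and video_img (lower case), and that the fields unid, baslik and img are declared in the Solr schema; indexed and stored are schema attributes, so they are set there rather than in data-config.xml.

```xml
<dataConfig>
  <dataSource type="JdbcDataSource" name="ds-1" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://my.ip/my_db" user="my_db_user" password="my_password"
              readOnly="true"/>
  <document>
    <!-- Explicit column list so the column names (and their case) are predictable.
         Assumption: the columns are called id, title and video_img. -->
    <entity name="videos" pk="id" transformer="TemplateTransformer" dataSource="ds-1"
            query="SELECT id, title, video_img FROM videos LIMIT 100">
      <!-- column = name in the result set, name = field name in the Solr schema -->
      <field column="id" name="unid"/>
      <field column="title" name="baslik"/>
      <field column="video_img" name="img"/>
    </entity>
  </document>
</dataConfig>
```

If the import still rolls back, the Solr log usually shows the underlying exception (driver not found, SQL error, unknown field), which narrows down which of the causes above applies.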
If you can share other relevant data and logs we can give you more specific information.
Good luck.
Related
I am trying to import data from relational db to Solr for indexing.
Here is my data-config.xml :
<dataConfig>
<dataSource type="JdbcDataSource"
driver="org.postgresql.Driver"
url="jdbc:postgresql://link/db"
user="user"
password="pass"/>
<document>
<entity name="locations_countries" query="select name from locations_countries">
</entity>
</document>
</dataConfig>
Managed-schema.xml
<uniqueKey>name</uniqueKey>
<field name="name" type="string" indexed="true" stored="true"/>
I created the core and then tried to run the import from localhost, but no columns are fetched, and when I checked the log I found the following error:
I tried everything but nothing works. I also deleted and recreated the core, but I get the same error again.
How do I index only documents that contain a specific string in Solr? This is my current DataImportHandler configuration:
<dataConfig>
<dataSource type="FileDataSource" encoding="UTF-8" />
<document>
<entity name="page"
processor="XPathEntityProcessor"
stream="true"
forEach="/mediawiki/page/"
url="pages.xml"
transformer="RegexTransformer"
>
<field column="id" xpath="/mediawiki/page/id" />
<field column="title" xpath="/mediawiki/page/title" />
<field column="text" regex="\{\{PersonData" xpath="/mediawiki/page/revision/text" />
</entity>
</document>
</dataConfig>
I only want to index a document if its text field contains {{PersonData, but the above imports everything. Should this be specified in the import handler or in the schema?
You need to do something like this:
<field column="$skipDoc" regex="^#REDIRECT .*" replaceWith="true" sourceColName="text"/>
In this case documents matching the specified regex are skipped, i.e. articles that are "redirects" to other articles are skipped here.
Detailed documentation here:
http://wiki.apache.org/solr/DataImportHandler#XPathEntityProcessor
So in your case you need to find a way to skip all documents where "PersonData" is NOT in the "text" column.
Look specifically at : "Example: Indexing wikipedia" part of http://wiki.apache.org/solr/DataImportHandler#XPathEntityProcessor
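One way to express "skip when the text does NOT contain {{PersonData" is a negative lookahead in the $skipDoc regex. The following is an untested sketch, not a confirmed recipe: it assumes RegexTransformer applies the pattern with Java regex semantics, so that a match on the whole text is replaced by "true", and it uses the (?s) flag because article text spans several lines. The plain regex attribute is removed from the text field so the full text is kept.

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <entity name="page"
            processor="XPathEntityProcessor"
            stream="true"
            forEach="/mediawiki/page/"
            url="pages.xml"
            transformer="RegexTransformer">
      <field column="id" xpath="/mediawiki/page/id" />
      <field column="title" xpath="/mediawiki/page/title" />
      <field column="text" xpath="/mediawiki/page/revision/text" />
      <!-- Assumption: the pattern only matches when {{PersonData occurs nowhere in
           the text; the match is then replaced by "true" and the document is skipped. -->
      <field column="$skipDoc" regex="(?s)^(?!.*\{\{PersonData).*$"
             replaceWith="true" sourceColName="text" />
    </entity>
  </document>
</dataConfig>
```

It is worth checking the document count after a test import on a small file, since the behaviour for pages without a text element is not covered by this sketch.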
Is it possible to use DataImportHandler with partial updates in Solr 4? Should I be able to use a data-config.xml like the one below, import the two entities at separate times, and end up with full documents containing data from both?
<document name="item">
<entity name="pricing" query="select * from prc">
<field column="ID" name="itemId" />
<field column="NM" name="itemName" />
<field column="default" name="defaultPrice" />
<field column="sale" name="salesPrice" />
</entity>
<entity name="tag" query="select * from tag">
<field column="ID" name="itemId" />
<field column="TAG" name="adminTag" />
</entity>
</document>
Solr partial update is not supported by DIH, so you would probably need to use SolrJ for this.
Also, with multiple entities you can run the import for a specific entity by naming it in the import request.
However, these multiple entities would be indexed as separate documents in the Solr index and not as one combined document. If you want a single document, you would need to use a sub-entity.
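A sub-entity version of the config from the question could look roughly like this. It is a sketch under the assumption that tag.ID refers to the same item as prc.ID; if an item can have several tags, adminTag would also need to be multiValued in the schema.

```xml
<document name="item">
  <entity name="pricing" query="select * from prc">
    <field column="ID" name="itemId" />
    <field column="NM" name="itemName" />
    <field column="default" name="defaultPrice" />
    <field column="sale" name="salesPrice" />
    <!-- Nested entity: runs once per pricing row and adds its fields to the
         same Solr document. Assumption: tag.ID matches prc.ID. -->
    <entity name="tag" query="select TAG from tag where ID = '${pricing.ID}'">
      <field column="TAG" name="adminTag" />
    </entity>
  </entity>
</document>
```

This produces one document per row of prc, at the cost of one extra query per row, and it still rebuilds the whole document on each import rather than updating it partially.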
I am trying to parse the binary content stored in the file_data column of the document_attachment table and index it, so that its content becomes searchable in Solr.
When I run the indexer, it fetches twice as many rows as the query in the entity named "dcs" returns, and it throws no errors or exceptions. However, it does not index the binary content (the field that I associate with Tika), even though it fetches it from the table.
I am using apache-solr-3.6.1 and Tika 1.0
My configuration files look something like this:
data-config.xml
<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
<dataSource
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost/espritkm_1?zeroDateTimeBehavior=convertToNull"
user="root"
password=""
autoCommit="true" batchSize="-1"
convertType="false"
name="test"
/>
<dataSource name="fieldReader" type="FieldStreamDataSource" />
<document name="items">
<entity name="dcs"
query="SELECT 222000000000000000+d.id AS common_id_attr,d.id AS id,UNIX_TIMESTAMP(d.created_at) AS date_added,d.file_name as common1, d.description as common2, d.file_mime_type as common3, 72 as common4,(Select group_concat(trim(tags) ORDER BY trim(tags) SEPARATOR ' | ') from tags t where t.type_id = 72 AND t.feature_id = d.id group by t.feature_id) as common5,d.created_by as common6, df.name as common7,CONCAT(d.file_name,'.',d.file_mime_type) as common8,'' as common9,(Select da.file_data from document_attachment da where da.document_id = d.id) as text FROM document d LEFT JOIN document_folder df ON df.id = d.document_folder_id WHERE d.is_deleted = 0 and d.parent_id = 0 " dataSource="test" transformer="TemplateTransformer">
<field column="common_id_attr" name="common_id_attr" />
<field column="id" name="id" />
<entity dataSource="fieldReader" processor="TikaEntityProcessor" dataField="dcs.text" format="text" pk="dcs.id" >
<field column="text" name="text" />
</entity>
</entity>
</document>
</dataConfig>
schema.xml
<schema>
<fields>
<field name="common_id_attr" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="id" type="string" indexed="true" stored="true"/>
<field name="text" type="text" indexed="true" stored="true" multiValued="true"/>
</fields>
<uniqueKey>common_id_attr</uniqueKey>
<solrQueryParser defaultOperator="OR"/>
<defaultSearchField>text</defaultSearchField>
</schema>
The number of rows it fetches is double the number of documents, apparently because the rows from Tika are counted separately (I don't understand why). It still does not index the binary content.
I have been stuck on this problem for a long time. Can someone please help?
I was able to index the documents using Apache Solr version 3.6.2. I have described the steps here:
http://tuxdna.wordpress.com/2013/02/04/indexing-the-documents-stored-in-a-database-using-apache-solr-and-apache-tika/
I think it should be doable in 3.6.1 as well. I was just too impatient to look for a tarball of version 3.6.1 when only 3.6.2 was available from the official site.
I hope that helps.
I am new to Solr. While creating the index, I am prepending a string to the database table id.
My field in schema.xml is as follows:
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>id</uniqueKey>
and I am passing 'GROUP1' for id, but it is stored as something like [B@1e090ee.
How can I store the actual value (GROUP1) instead of [B@1e090ee?
Please help.
Is group_id a string or some numeric data type?
If it's not a string, you need to cast it to char, with the appropriate encoding, before the concatenation.
Also add an encoding parameter (matching your MySQL db encoding) to the dataSource tag, like this:
<dataSource
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://host/dbname"
batchSize="-1"
user="username"
password="password"
readOnly="true"
autoCommit="true"
encoding="UTF-8" />
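The cast suggested above might look like the following sketch. It assumes group_id is a numeric column. A likely explanation for the stored value is that older MySQL versions return a binary string from CONCAT when one argument is numeric, and the JDBC driver hands that over as a byte array, whose default string form looks like [B@... in the index. The table name is taken from the asker's configuration further down; the entity is trimmed to the relevant fields.

```xml
<document name="products">
  <!-- CAST(... AS CHAR) makes CONCAT return a character string instead of a
       binary one. Assumption: group_id is numeric. -->
  <entity name="item"
          query="select group_id,
                        CONCAT('GROUP', CAST(group_id AS CHAR)) as id
                 from collaboration_groups
                 where group_status=1">
    <field column="id" name="id" />
    <field column="group_id" name="itemid" />
  </entity>
</document>
```

Running the select directly in a MySQL client will not necessarily reveal the problem, because the client renders binary strings as text.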
Which DB are you using?
Do you see the correct values when you execute your query directly in your db?
IMHO, the problem has to be either with DataImportHandler or you actually have values like that ([B@1e090ee) in your group_id field.
Have you checked that the encoding param in your dataConfig's dataSource is the same as your db's encoding?
Can you post your dataConfig file?
@mbonaci
I am using a MySQL database.
When I execute the same query directly, the results come back fine.
The following is my data config file:
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
<dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://host/dbname" batchSize="-1" user="username" password="password" readOnly="true" autoCommit="true" />
<document name="products">
<entity name="item" query="select group_id,group_title,description,DATE_FORMAT(created_date, '%Y-%m-%dT%H:%i:%sZ') as createdDate,group_status,CONCAT('GROUP',group_id) as id,'GROUP' as itemtype from collaboration_groups where group_status=1 ">
<field column="id" name="id" />
<field column="group_id" name="itemid" />
<field column="itemtype" name="itemtype" />
<field column="group_title" name="fullName" />
<field column="description" name="description"/>
<field column="createdDate" name="createdDate"/>
</entity>
</document>
</dataConfig>