I need to query a SQL table and output the data in this XML format. Can someone help or point me in the right direction?
Note: I need to use T-SQL to achieve this result.
<account>
<field name="A" value="aaaaaaaa" type="string"/>
<field name="B" value="bbbbbbbbb" type="string"/>
<field name="I" value="11111111" type="int"/>
</account>
I'm attempting to use DIH to load our SOLR data.
I've done this on other SOLR cores/installations without issue, but for some reason I can't get it working on this installation.
The main data (media - mostly videos) (from the primary DIH query) loads just fine. The secondary query (a nested entity) that should be loading one-to-many records for video tags, does not populate the multivalued fields in SOLR.
I don't see any error messages (at least not that I can find in the logs or anywhere else), so I'm not sure where it's going wrong.
The one thing I wonder about is that the "join" (the where clause in the nested query) does not use the primary key field - but I don't think this should matter. Please correct me if I'm wrong here.
Here's a simplified copy of the DIH config:
<dataConfig>
<dataSource type="JdbcDataSource"
driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
url="jdbc:sqlserver://x.x.x.x;databaseName=media;"
user="xxx"
password="xxx"
batchSize="100" />
<document name="mediaContent">
<entity name="media" query="SELECT * FROM mediaContent" pk="mediaID">
<field column="mediaID" name="mediaID" />
<field column="mediaType" name="mediaType" />
<field column="videoID" name="videoID" />
<field column="videoTitle" name="videoTitle" />
<field column="videoDescription" name="videoDescription" />
<field column="videoStatusID" name="videoStatusID" />
<field column="videoPublished" name="videoPublished" />
<entity name="videoTags"
query="
SELECT tagID, tagTitle
FROM videoTags
WHERE videoID = '${mediaContent.videoID}'">
<field column="tagID" name="videoTagIDs" />
<field column="tagTitle" name="videoTagTitles" />
</entity>
</entity>
</document>
</dataConfig>
The multivalued items in the managed-schema file are configured as per:
<field name="videoTagIDs" type="int" indexed="true" stored="true" required="false" multiValued="true" />
<field name="videoTagTitles" type="text" indexed="true" stored="true" required="false" multiValued="true" />
Normally I would use the primary key to join the data in the second query, but in this case, because not all of the content is video and the tags only relate to the video content, I am not using the PK field. Instead, I'm using the videoID field. Again, I'm not sure whether that matters here. I get the proper data when I run the queries in SQL.
If anyone has any suggestions as to how I can debug what's happening with the secondary query, or better yet, if anyone sees something in my config that I've done wrong, I'd appreciate your feedback.
Thanks!
Bill
So in the end this turned out to be the fault of case sensitivity.
The fields coming from the SQL query did not exactly match the case configured in the DIH field names (videoID vs videoId). While this did not matter for the main query, and the data was imported there, it did matter for the second nested entity.
The debugging never worked well, but debugging the queries actually being run on SQL server was helpful in seeing what was going on.
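A minimal sketch of what the corrected config might look like, assuming the SQL column actually comes back as videoId (the exact casing is an assumption; use whatever the query really returns). The nested query's placeholder is written here with the parent entity's name (media), which is how DIH variables normally refer to a parent row:

```xml
<!-- Sketch only: "videoId" is an assumed casing; match it to the name
     the JDBC driver returns for the column. -->
<entity name="media" query="SELECT * FROM mediaContent" pk="mediaID">
  <field column="mediaID" name="mediaID" />
  <!-- column = name as returned by SQL, name = Solr field name -->
  <field column="videoId" name="videoID" />
  <entity name="videoTags"
          query="SELECT tagID, tagTitle
                 FROM videoTags
                 WHERE videoID = '${media.videoId}'">
    <field column="tagID" name="videoTagIDs" />
    <field column="tagTitle" name="videoTagTitles" />
  </entity>
</entity>
```

If the placeholder does not resolve to a column of the parent row, the nested query can end up filtering on an empty value and return no rows without raising an error, which would be consistent with the silent failure described above.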
I am using DataImportHandler for indexing data in Solr.
I am retrieving three columns from the AUTO table in the database. Two of them, TOPIC and PARTS, hold data of type CLOB, and the DATE column holds an Oracle timestamp with the created date.
The problem is in my data-config file, where I need to convert the CLOB data to strings and the date to the UTC format that Solr uses.
So I need two transformers, i.e. ClobTransformer and DateFormatTransformer.
I am wondering how to use both transformers in a single entity.
Here is my data-config file:
<dataConfig>
<dataSource name="ds1" type="JdbcDataSource"
driver="oracle.jdbc.OracleDriver"
url="....."
user="....."
password="...."/>
<document name="doc">
<entity name="ent"
query="Select
auto.ID,
auto.Topic as Topic,
auto.Parts as Parts,
to_date(to_char(auto.Date, 'yyyy-MM-dd HH:MI:SS'), 'YYYY-MM-DD HH:MI:SS') AS Date
From auto
order by auto.Date DESC"
dataSource="ds1" transformer="DateFormatTransformer">
<field column="ID" name="id"/>
<field column="TOPIC" name="topic" clob="true"/>
<field column="PARTS" name="parts" clob="true"/>
<field column="DATE" name="date" xpath="/RDF/item/date" dateTimeFormat="yyyy-MM-dd HH:mm:ss" locale="en"/>
</entity>
</document>
</dataConfig>
Above I have used only DateFormatTransformer.
Any help would be much appreciated.
OK, I found out how it's done: list the transformers, separated by commas, in the 'transformer' attribute of the entity tag, like this:
<dataConfig>
<dataSource name="ds1" type="JdbcDataSource"
driver="oracle.jdbc.OracleDriver"
url="....."
user="....."
password="...."/>
<document name="doc">
<entity name="ent"
query="Select
auto.ID,
auto.Topic as Topic,
auto.Parts as Parts,
to_date(to_char(auto.Date, 'yyyy-MM-dd HH:MI:SS'), 'YYYY-MM-DD HH:MI:SS') AS Date
From auto
order by auto.Date DESC"
dataSource="ds1" transformer="ClobTransformer,DateFormatTransformer">
<field column="ID" name="id"/>
<field column="TOPIC" name="topic" clob="true"/>
<field column="PARTS" name="parts" clob="true"/>
<field column="DATE" name="date" xpath="/RDF/item/date" dateTimeFormat="yyyy-MM-dd HH:mm:ss" locale="en"/>
</entity>
</document>
</dataConfig>
I have used two transformers, transformer="ClobTransformer,DateFormatTransformer"
Is it possible to use DataImportHandler with partial updates in Solr 4? Should I be able to use a data-config.xml like the one below, import the two entities at separate times, and end up with full documents containing both sets of data?
<document name="item">
<entity name="pricing" query="select * from prc">
<field column="ID" name="itemId" />
<field column="NM" name="itemName" />
<field column="default" name="defaultPrice" />
<field column="sale" name="salesPrice" />
</entity>
<entity name="tag" query="select * from tag">
<field column="ID" name="itemId" />
<field column="TAG" name="adminTag" />
</entity>
</document>
Solr partial updates are not supported by DIH, so you would probably need to use SolrJ for this.
Also, when the config has multiple entities, you can run the import for a specific entity.
However, these multiple entities would be indexed as separate documents in the Solr index, not as one combined document. If you want a single document, you would need a sub-entity.
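A minimal sketch of the sub-entity approach, assuming the ID column of the tag table refers to the same item as the ID column of prc (the question maps both to itemId, but the actual join condition is an assumption):

```xml
<!-- Sketch only: the join on tag.ID = prc.ID is an assumption. -->
<document name="item">
  <entity name="pricing" query="select * from prc">
    <field column="ID" name="itemId" />
    <field column="NM" name="itemName" />
    <field column="default" name="defaultPrice" />
    <field column="sale" name="salesPrice" />
    <!-- Nested entity: runs once per pricing row and adds its
         fields to that same Solr document. -->
    <entity name="tag" query="select TAG from tag where ID = '${pricing.ID}'">
      <field column="TAG" name="adminTag" />
    </entity>
  </entity>
</document>
```

With this layout both sets of fields are written in one import run. If an item can have several tags, adminTag would need to be a multiValued field in the schema.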
I am trying to parse the binary content stored in the database, in the file_data column of the document_attachment table, and index it so that its content becomes searchable in Solr.
When I run the indexer, it fetches twice as many rows as the query in the entity named "dcs" returns, and throws no errors or exceptions. However, it does not index the binary content (the field that I associate with Tika), despite fetching it from the table.
I am using apache-solr-3.6.1 and Tika 1.0.
My configuration files look something like this:
data-config.xml
<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
<dataSource
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost/espritkm_1?zeroDateTimeBehavior=convertToNull"
user="root"
password=""
autoCommit="true" batchSize="-1"
convertType="false"
name="test"
/>
<dataSource name="fieldReader" type="FieldStreamDataSource" />
<document name="items">
<entity name="dcs"
query="SELECT 222000000000000000+d.id AS common_id_attr,d.id AS id,UNIX_TIMESTAMP(d.created_at) AS date_added,d.file_name as common1, d.description as common2, d.file_mime_type as common3, 72 as common4,(Select group_concat(trim(tags) ORDER BY trim(tags) SEPARATOR ' | ') from tags t where t.type_id = 72 AND t.feature_id = d.id group by t.feature_id) as common5,d.created_by as common6, df.name as common7,CONCAT(d.file_name,'.',d.file_mime_type) as common8,'' as common9,(Select da.file_data from document_attachment da where da.document_id = d.id) as text FROM document d LEFT JOIN document_folder df ON df.id = d.document_folder_id WHERE d.is_deleted = 0 and d.parent_id = 0 " dataSource="test" transformer="TemplateTransformer">
<field column="common_id_attr" name="common_id_attr" />
<field column="id" name="id" />
<entity dataSource="fieldReader" processor="TikaEntityProcessor" dataField="dcs.text" format="text" pk="dcs.id" >
<field column="text" name="text" />
</entity>
</entity>
</document>
</dataConfig>
schema.xml
<schema>
<fields>
<field name="common_id_attr" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="id" type="string" indexed="true" stored="true"/>
<field name="text" type="text" indexed="true" stored="true" multiValued="true"/>
</fields>
<uniqueKey>common_id_attr</uniqueKey>
<solrQueryParser defaultOperator="OR"/>
<defaultSearchField>text</defaultSearchField>
</schema>
The number of rows it fetches is double the number of documents, as if the Tika rows were counted separately (I don't understand why), and it does not index the binary content.
I have been stuck on this problem for a long time. Can someone please help?
I was able to index the documents using Apache Solr version 3.6.2. I have described the steps here:
http://tuxdna.wordpress.com/2013/02/04/indexing-the-documents-stored-in-a-database-using-apache-solr-and-apache-tika/
I think it should be doable in 3.6.1 as well. I just did not have the patience to search for a tarball of version 3.6.1 when only 3.6.2 was available from the official site.
I hope that helps.
I am new to Solr. While creating the index, I am prepending a string to the database table id.
My field in schema.xml is as follows:
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>id</uniqueKey>
I am passing 'GROUP1' for id, but it is stored as [B@1e090ee.
How can I store the actual value (GROUP1) instead of [B@1e090ee?
Please help.
Is group_id a string or some numeric data type?
If it's not a string, you need to cast it to char, with the appropriate encoding, before the concatenation.
Also add the encoding parameter (matching your MySQL db encoding) to the dataSource tag, like this:
<dataSource
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://host/dbname"
batchSize="-1"
user="username"
password="password"
readOnly="true"
autoCommit="true"
encoding="UTF-8" />
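A minimal sketch of the cast suggested above, using the table and column names from the asker's query and assuming group_id is a numeric column. The idea is that CONCAT of a string and a number can come back from MySQL as a binary string, which the JDBC driver may hand over as a byte array; casting to CHAR first should make it a character string:

```xml
<!-- Sketch only: assumes group_id is numeric. Only a few of the
     original columns are shown. -->
<entity name="item"
        query="select group_id, group_title,
                      CONCAT('GROUP', CAST(group_id AS CHAR)) as id
               from collaboration_groups
               where group_status=1">
  <field column="id" name="id" />
  <field column="group_id" name="itemid" />
  <field column="group_title" name="fullName" />
</entity>
```

Running the query directly in a MySQL client can look fine either way, because the client renders binary strings as text, so the difference may only show up on the Solr side.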
Which DB are you using?
Do you see the correct values when you execute your query directly in your db?
IMHO, the problem has to be either with DataImportHandler, or you actually have values like that ([B@1e090ee) in your group_id field.
Have you checked that the encoding param in your dataConfig's dataSource is the same as your db's encoding?
Can you post your dataConfig file?
@mbonaci
I am using a MySQL database.
When I execute the same query directly, the results come back fine.
The following is my data config file:
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
<dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://host/dbname" batchSize="-1" user="username" password="password" readOnly="true" autoCommit="true" />
<document name="products">
<entity name="item" query="select group_id,group_title,description,DATE_FORMAT(created_date, '%Y-%m-%dT%H:%i:%sZ') as createdDate,group_status,CONCAT('GROUP',group_id) as id,'GROUP' as itemtype from collaboration_groups where group_status=1 ">
<field column="id" name="id" />
<field column="group_id" name="itemid" />
<field column="itemtype" name="itemtype" />
<field column="group_title" name="fullName" />
<field column="description" name="description"/>
<field column="createdDate" name="createdDate"/>
</entity>
</document>
</dataConfig>