Debugging a Solr RSS DataImportHandler

I have an existing collection to which I want to add an RSS importer. I've copied what I could glean from the example-DIH/solr/rss code.
The details are below, but the bottom line is that everything seems to run, yet it always says "Fetched: 0" (and I get no documents). There are no exceptions in the Tomcat log.
Questions:
Is there a way to turn up debugging on rss importers?
Can I see solr's actual request and response?
What would cause the request to succeed, but no rows to be fetched?
Is there a tutorial for adding an RSS DIH to an existing collection?
Thanks!
My solrconfig.xml file contains the requestHandler:
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">rss-data-config.xml</str>
  </lst>
</requestHandler>
And rss-data-config.xml:
<dataConfig>
  <dataSource type="URLDataSource" />
  <document>
    <entity name="slashdot"
            pk="link"
            url="http://rss.slashdot.org/Slashdot/slashdot"
            processor="XPathEntityProcessor"
            forEach="/rss/channel | /rss/item"
            transformer="DateFormatTransformer">
      <field column="source_name" xpath="/rss/channel/title" commonField="true" />
      <field column="title" xpath="/rss/item/title" />
      <field column="link" xpath="/rss/item/link" />
      <field column="body" xpath="/rss/item/description" />
      <field column="date" xpath="/rss/item/date" dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss" />
    </entity>
  </document>
</dataConfig>
and from schema.xml:
<fields>
  <field name="title" type="text_general" required="true" indexed="true" stored="true"/>
  <field name="link" type="string" required="true" indexed="true" stored="true"/>
  <field name="source_name" type="text_general" required="true" indexed="true" stored="true"/>
  <field name="body" type="text_general" required="false" indexed="false" stored="true"/>
  <field name="date" type="date" required="true" indexed="true" stored="true" />
  <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
  <field name="_version_" type="long" indexed="true" stored="true"/>
</fields>
When I run the dataimport from the admin web page, it all seems to go well. It shows "Requests: 1" and there are no exceptions in the Tomcat log:
Mar 12, 2013 9:02:58 PM org.apache.solr.handler.dataimport.DataImporter maybeReloadConfiguration
INFO: Loading DIH Configuration: rss-data-config.xml
Mar 12, 2013 9:02:58 PM org.apache.solr.handler.dataimport.DataImporter loadDataConfig
INFO: Data Configuration loaded successfully
Mar 12, 2013 9:02:58 PM org.apache.solr.handler.dataimport.DataImporter doFullImport
INFO: Starting Full Import
Mar 12, 2013 9:02:58 PM org.apache.solr.handler.dataimport.SimplePropertiesWriter readIndexerProperties
INFO: Read dataimport.properties
Mar 12, 2013 9:02:59 PM org.apache.solr.handler.dataimport.DocBuilder execute
INFO: Time taken = 0:0:0.693
Mar 12, 2013 9:02:59 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [articles] webapp=/solr path=/dataimport params={optimize=false&clean=false&indent=true&commit=false&verbose=true&entity=slashdot&command=full-import&debug=true&wt=json} {} 0 706

Your problem here is due to your rss-data-config.xml and the defined XPaths.
If you open the URL http://rss.slashdot.org/Slashdot/slashdot in a browser and hit F12 for the developer tools, it will show you the structure of the feed XML.
You can see that the <item> node is a child of <channel>, not of <rss>. So your config should look as follows:
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <dataSource type="URLDataSource" />
  <document>
    <entity name="slashdot"
            pk="link"
            url="http://rss.slashdot.org/Slashdot/slashdot"
            processor="XPathEntityProcessor"
            forEach="/rss/channel | /rss/channel/item"
            transformer="DateFormatTransformer">
      <field column="source_name" xpath="/rss/channel/title" commonField="true" />
      <field column="title" xpath="/rss/channel/item/title" />
      <field column="link" xpath="/rss/channel/item/link" />
      <field column="body" xpath="/rss/channel/item/description" />
      <field column="date" xpath="/rss/channel/item/date" dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss" />
    </entity>
  </document>
</dataConfig>

Which Solr version are you using?
In 3.x, DIH has a debug feature that lets you step through the import.
It is missing in 4.x; see SOLR-4151. You can still pass debug=true&verbose=true on the /dataimport request (as in the log above) to get more detail in the response.

The following data-config.xml does the job for Slashdot (Solr 4.2.0):
<dataConfig>
  <dataSource type="HttpDataSource" />
  <document>
    <entity name="slashdot"
            pk="link"
            url="http://rss.slashdot.org/Slashdot/slashdot"
            processor="XPathEntityProcessor"
            forEach="/rss/channel/item"
            transformer="DateFormatTransformer">
      <field column="title" xpath="/rss/channel/item/title" />
      <field column="link" xpath="/rss/channel/item/link" />
      <field column="description" xpath="/rss/channel/item/description" />
      <field column="creator" xpath="/rss/channel/item/creator" />
      <field column="item-subject" xpath="/rss/channel/item/subject" />
      <field column="slash-department" xpath="/rss/channel/item/department" />
      <field column="slash-section" xpath="/rss/channel/item/section" />
      <field column="slash-comments" xpath="/rss/channel/item/comments" />
      <field column="date" xpath="/rss/channel/item/date" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" />
    </entity>
  </document>
</dataConfig>
Notice the extra 'Z' in the dateTimeFormat, which is necessary according to schema.xml.
Quoting schema.xml:
The format for this date field is of the form 1995-12-31T23:59:59Z, and
is a more restricted form of the canonical representation of dateTime
http://www.w3.org/TR/xmlschema-2/#dateTime
The trailing "Z" designates UTC time and is mandatory.
Optional fractional seconds are allowed: 1995-12-31T23:59:59.999Z
All other components are mandatory.

Update your rss-data-config.xml as below:
<dataConfig>
  <dataSource type="URLDataSource" />
  <document>
    <entity name="slashdot"
            pk="link"
            url="http://rss.slashdot.org/Slashdot/slashdot"
            processor="XPathEntityProcessor"
            forEach="/RDF/channel | /RDF/item"
            transformer="DateFormatTransformer">
      <field column="source" xpath="/RDF/channel/title" commonField="true" />
      <field column="source-link" xpath="/RDF/channel/link" commonField="true" />
      <field column="subject" xpath="/RDF/channel/subject" commonField="true" />
      <field column="title" xpath="/RDF/item/title" />
      <field column="link" xpath="/RDF/item/link" />
      <field column="description" xpath="/RDF/item/description" />
      <field column="creator" xpath="/RDF/item/creator" />
      <field column="item-subject" xpath="/RDF/item/subject" />
      <field column="slash-department" xpath="/RDF/item/department" />
      <field column="slash-section" xpath="/RDF/item/section" />
      <field column="slash-comments" xpath="/RDF/item/comments" />
      <field column="date" xpath="/RDF/item/date" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss" />
    </entity>
  </document>
</dataConfig>
It worked for me.

Related

Apache Solr: index PDF files with XML documents

How do I index PDF files in Apache Solr (version 8) with XML documents?
Example:
<add>
  <doc>
    <field name="id">filePath</field>
    <field name="title">the title</field>
    <field name="description">description of the pdf file</field>
    <field name="Creator">jhone doe</field>
    <field name="Language">English</field>
    <field name="Publisher">Publisher_name</field>
    <field name="tags">some_tag</field>
    <field name="is_published">true</field>
    <field name="year">2002</field>
    <field name="file">path_to_the_file/file_name.pdf</field>
  </doc>
</add>
UPDATE
How do I set literal.id to filePath?
OK, this is what I did.
I am using the Solr DIH (DataImportHandler).
In solrconfig.xml:
<requestHandler name="/dataimport_fromXML" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data.import.xml</str>
    <str name="update.chain">dedupe</str>
  </lst>
</requestHandler>
And the data.import.xml file:
<dataConfig>
  <dataSource type="BinFileDataSource" name="data"/>
  <dataSource type="FileDataSource" name="main"/>
  <document>
    <!-- url: the URL of the XML file that holds the metadata -->
    <entity name="rec" processor="XPathEntityProcessor"
            url="${solr.install.dir:}solr/solr_core_name/filestore/docs_metaData/metaData.xml"
            forEach="/docs/doc" dataSource="main"
            transformer="RegexTransformer,DateFormatTransformer">
      <field column="resourcename" xpath="//resourcename" name="resourceName" />
      <field column="title" xpath="//title" name="title" />
      <field column="subject" xpath="//subject" name="subject"/>
      <field column="description" xpath="//description" name="description"/>
      <field column="comments" xpath="//comments" name="comments"/>
      <field column="author" xpath="//author" name="author"/>
      <field column="keywords" xpath="//keywords" name="keywords"/>
      <!-- baseDir: path to the folder that contains the files (pdf | doc | docx | ...) -->
      <entity name="files" dataSource="null" rootEntity="false"
              processor="FileListEntityProcessor"
              baseDir="${solr.install.dir:}solr/solr_core_name/filestore/docs_folder"
              fileName="${rec.resourcename}" onError="skip" recursive="false">
        <field column="fileAbsolutePath" name="filePath" />
        <field column="resourceName" name="resourceName" />
        <field column="fileSize" name="size" />
        <field column="fileLastModified" name="lastModified" />
        <!-- for each file, extract the metadata if it is not in the XML metadata file -->
        <entity name="file" processor="TikaEntityProcessor" dataSource="data"
                format="text" url="${files.fileAbsolutePath}" onError="skip" recursive="false">
          <field column="title" name="title" meta="true"/>
          <field column="subject" name="subject" meta="true"/>
          <field column="description" name="description" meta="true"/>
          <field column="comments" name="comments" meta="true"/>
          <field column="Author" name="author" meta="true"/>
          <field column="Keywords" name="keywords" meta="true"/>
        </entity>
      </entity>
    </entity>
  </document>
</dataConfig>
After that, all you have to do is create the XML file (metaData.xml):
<docs>
  <doc>
    <resourcename>fileName.pdf</resourcename>
    <title></title>
    <subject></subject>
    <description></description>
    <comments></comments>
    <author></author>
    <keywords></keywords>
  </doc>
</docs>
and put all your files in one folder:
"${solr.install.dir:}solr/solr_core_name/filestore/docs_folder"
${solr.install.dir:} is the Solr home folder.
For the update in the question (how to set literal.id to filePath): in data.import.xml, map fileAbsolutePath to the id:
<field column="fileAbsolutePath" name="id" />
One last thing: in this example the id is auto-generated. I am using
<updateRequestProcessorChain name="dedupe">
which creates a unique id based on a hash of the content, to avoid duplication.
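For completeness, a minimal sketch of such a dedupe chain in solrconfig.xml, using Solr's SignatureUpdateProcessorFactory (the fields list here is an assumption; adjust it to whichever fields identify a document in your schema):

```xml
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- the field that receives the computed hash -->
    <str name="signatureField">id</str>
    <bool name="overwriteDupes">true</bool>
    <!-- fields hashed to build the signature (assumed; adjust to your schema) -->
    <str name="fields">resourceName,title,description</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

With overwriteDupes set to true, re-importing the same content updates the existing document instead of creating a duplicate.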

Solr 6 dataimport fetched 10000 records but processed only 173

This is my first project on Solr indexing. I am importing product data from an OpenCart e-commerce website. It fetches the correct number of records (10910) but processes only 173. I'd appreciate it if someone could help me figure this out.
"Total Requests made to DataSource":"1",
"Total Rows Fetched":"10910",
"Total Documents Processed":"173",
"Total Documents Skipped":"0",
"Full Dump Started":"2016-07-27 11:22:58",
"":"Indexing completed. Added/Updated: 173 documents. Deleted 0
data-config.xml
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/dbname" user="root" password="password" />
  <document name="doc">
    <entity name="dbname" transformer="RegexTransformer"
            query="SELECT Query "
            deltaImportQuery="SELECT "
            deltaQuery=" SELECT p.product_id as id, p.date_modified FROM oc_product AS p WHERE p.date_modified > '${dataimporter.last_index_time}'">
      <field column="id" sourceColName="id" />
      <field column="model" sourceColName="model" />
      <field column="price" sourceColName="price" />
      <field column="selling_price" sourceColName="selling_price" />
      <field column="stock_status" sourceColName="stock_status" />
      <field column="name" sourceColName="name" />
      <field column="set_description" sourceColName="set_description" />
      <field column="description" sourceColName="description" />
      <field column="categories" sourceColName="categories" splitBy="," />
      <field column="category_ids" sourceColName="category_ids" splitBy="," />
      <field column="filter_ids" sourceColName="filter_ids" splitBy="," />
      <field column="filters" sourceColName="filters" splitBy="," />
      <field column="store_ids" sourceColName="store_ids" splitBy="," />
    </entity>
  </document>
</dataConfig>
managed-schema - I used the default provided in the configsets of Solr 6.1.0, with the following changes:
<field name="id" type="int" indexed="true" stored="true" required="true" multiValued="false" />
<field name="model" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="price" type="float" indexed="true" stored="true" multiValued="false" />
<field name="selling_price" type="float" indexed="true" stored="true" />
<field name="stock_status" type="string" indexed="true" stored="true" />
<field name="name" type="string" indexed="true" stored="true" />
<field name="set_description" type="text_general" indexed="true" stored="true" />
<field name="description" type="text_general" indexed="true" stored="true" />
<field name="categories" type="string" indexed="true" stored="true" />
<field name="category_ids" type="int" indexed="true" stored="true" multiValued="true" />
<field name="filter_ids" type="int" indexed="true" stored="true" multiValued="true" />
<field name="filters" type="string" indexed="true" stored="true" />
<field name="store_ids" type="int" indexed="true" stored="true" multiValued="true" />
solrconfig.xml - I used the default with the following changes:
<lib dir="../../../contrib/dataimporthandler/lib/" regex=".*\.jar" />
<lib dir="../../../contrib/dataimporthandler-extras/lib/" regex=".*\.jar" />
<lib dir="../../../dist/" regex="solr-dataimporthandler-\d.*\.jar" />
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>
Error log:
Error creating document : SolrInputDocument(fields: [selling_price=299.0000, filter_ids=12, 13, 19, 24, 43, 58, 62, stock_status=In Stock, store_ids=0, 2, description=Kurti length = 44 inches. No color bleed. Interlock stitching done. Side slit protection stitching done. Double bottom fold stitching done., filters=Long, लॉंग, Straight, स्ट्रेट, Full Sleeve, फुल स्लीव, Solid, सॉलिड, Rayon, रेयॉन, V Neck, वी नेक, Size Set, साइज़ सेट, set_description=1 Set = Total 5 pieces, 1 each of 36, 38, 40, 42, 44, price=290.0000, name=Green Rayon Straight Solid Long V Neck Kurti, model=GNM_JP_GMI026, id=11856, category_ids=0, 61, categories=Green Rayon Straight Solid Long V Neck Kurti, ग्रीन रेयॉन स्ट्रेट सॉलिड लॉंग वी नेक कुर्ती, _version_=1541015877146640386])
org.apache.solr.common.SolrException: ERROR: [doc=11856] Error adding field 'filter_ids'='12, 13, 19, 24, 43, 58, 62' msg=For input string: "12, 13, 19, 24, 43, 58, 62"
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:177)
at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:82)
at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:280)
at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:214)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:169)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:68)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:939)
at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1094)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:720)
at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:74)
at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:260)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:524)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:329)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
Caused by: java.lang.NumberFormatException: For input string: "12, 13, 19, 24, 43, 58, 62"
at java.lang.NumberFormatException.forInputString(Unknown Source)
at java.lang.Integer.parseInt(Unknown Source)
at java.lang.Integer.parseInt(Unknown Source)
at org.apache.solr.schema.TrieField.createField(TrieField.java:702)
at org.apache.solr.schema.TrieField.createFields(TrieField.java:741)
at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:47)
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:122)
Found the issue: I was using the data type integer for the fields category_ids and filter_ids while assigning them comma-separated values. That does not work, even though I used multiValued="true" and splitBy="," on them.
So only the products with single values for these fields were indexed.
I changed the data type to string and all records were indexed:
Indexing completed. Added/Updated: 10910 documents. Deleted 0 documents. (Duration: 28s)
Requests: 1 , Fetched: 10,910 390/s, Skipped: 0 , Processed: 10,910 390/s
Started: 2 minutes ago
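In managed-schema, the fix described above amounts to something like this (a sketch showing only the two affected fields):

```xml
<!-- changed from type="int": the comma-separated source values are kept as
     strings, while splitBy="," in data-config.xml still splits them into
     multiple values per document -->
<field name="category_ids" type="string" indexed="true" stored="true" multiValued="true" />
<field name="filter_ids" type="string" indexed="true" stored="true" multiValued="true" />
```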
Thank you Uwe Allner for helping with this.

Indexing wikipedia with solr

I've installed Solr 4.6.0 and followed the tutorial available on Solr's home page. Everything was fine, until I got to the real job I'm here for: I need fast access to Wikipedia content, and I was advised to use Solr. I tried to follow the example at http://wiki.apache.org/solr/DataImportHandler#Example:_Indexing_wikipedia, but I couldn't get the example to work. I am a newbie, and I don't know what data_config.xml means!
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <entity name="page"
            processor="XPathEntityProcessor"
            stream="true"
            forEach="/mediawiki/page/"
            url="/data/enwiki-20130102-pages-articles.xml"
            transformer="RegexTransformer,DateFormatTransformer">
      <field column="id" xpath="/mediawiki/page/id" />
      <field column="title" xpath="/mediawiki/page/title" />
      <field column="revision" xpath="/mediawiki/page/revision/id" />
      <field column="user" xpath="/mediawiki/page/revision/contributor/username" />
      <field column="userId" xpath="/mediawiki/page/revision/contributor/id" />
      <field column="text" xpath="/mediawiki/page/revision/text" />
      <field column="timestamp" xpath="/mediawiki/page/revision/timestamp" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" />
      <field column="$skipDoc" regex="^#REDIRECT .*" replaceWith="true" sourceColName="text"/>
    </entity>
  </document>
</dataConfig>
I couldn't find it in the Solr home directory. I also tried to find questions related to mine, such as How to index wikipedia files in .xml format into solr and Indexing wikipedia dump with solr, but they didn't resolve my doubt.
I think I need something more basic, guiding me step by step, because the tutorial is confusing when it deals with indexing Wikipedia.
Any advice giving some directions to follow would be nice.
For the data_config.xml:
Each Solr instance is configured using three main files: solr.xml, solrconfig.xml, and schema.xml. The data_config.xml file defines the data source when you use the DIH component; the DIH wiki page would be useful for you.
About the Solr home directory:
You should start from here: https://cwiki.apache.org/confluence/display/solr/Running+Solr
Well, I've read many things on the web and tried to collect as much information as possible. This is how I found the solution.
Here is my solrconfig.xml:
...
<!-- ****** Data import handler -->
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>
...
<lib dir="../../../dist/" regex="solr-dataimporthandler-.*\.jar" />
Here is my data-config.xml (important: it must be in the same folder as solrconfig.xml):
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <entity name="page"
            processor="XPathEntityProcessor"
            stream="true"
            forEach="/mediawiki/page/"
            url="/Applications/solr-4.6.0/example/exampledocs/simplewikiSubSet.xml"
            transformer="RegexTransformer,DateFormatTransformer">
      <field column="id" xpath="/mediawiki/page/id" />
      <field column="title" xpath="/mediawiki/page/title" />
      <field column="revision" xpath="/mediawiki/page/revision/id" />
      <field column="user" xpath="/mediawiki/page/revision/contributor/username" />
      <field column="userId" xpath="/mediawiki/page/revision/contributor/id" />
      <field column="text" xpath="/mediawiki/page/revision/text" />
      <field column="timestamp" xpath="/mediawiki/page/revision/timestamp" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" />
      <field column="$skipDoc" regex="^#REDIRECT .*" replaceWith="true" sourceColName="text"/>
    </entity>
  </document>
</dataConfig>
Attention: The last line is very important!
My schema.xml:
...
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="title" type="string" indexed="true" stored="false"/>
<field name="revision" type="int" indexed="true" stored="true"/>
<field name="user" type="string" indexed="true" stored="true"/>
<field name="userId" type="int" indexed="true" stored="true"/>
<field name="text" type="text_en" indexed="true" stored="false"/>
<field name="timestamp" type="date" indexed="true" stored="true"/>
<field name="titleText" type="text_en" indexed="true" stored="true"/>
...
<uniqueKey>id</uniqueKey>
...
<copyField source="title" dest="titleText"/>
...
And it's done. That's all folks!

solr dataimport not working for URLDataSource

This is my data-config.xml:
<dataConfig>
  <dataSource name="a" type="URLDataSource" encoding="UTF-8" connectionTimeout="5000" readTimeout="10000"/>
  <document name="products">
    <entity name="images" dataSource="a"
            url="file:///abc/1299.xml"
            processor="XPathEntityProcessor"
            forEach="/imagesList/image">
      <field column="id" xpath="/imageList/image/productId" />
      <field column="image_array" xpath="/imageList/image/imageUrlString" />
    </entity>
  </document>
</dataConfig>
This is the schema.xml:
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="image_array" type="text" indexed="true" stored="true" multivalued="true"/>
But when I try a delta-import, none of the documents get added.
Any help will be highly appreciated.
Well, first off: your XPath says imageList and your XML says imagesList ...
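Assuming the root element of 1299.xml is actually <imageList>, a consistent version of the config would be the following sketch (verify the element names against the real file):

```xml
<dataConfig>
  <dataSource name="a" type="URLDataSource" encoding="UTF-8"
              connectionTimeout="5000" readTimeout="10000"/>
  <document name="products">
    <!-- forEach and the field xpaths must agree on the root element name -->
    <entity name="images" dataSource="a"
            url="file:///abc/1299.xml"
            processor="XPathEntityProcessor"
            forEach="/imageList/image">
      <field column="id" xpath="/imageList/image/productId" />
      <field column="image_array" xpath="/imageList/image/imageUrlString" />
    </entity>
  </document>
</dataConfig>
```

Also note that the schema attribute is spelled multiValued (camel case), not multivalued as in the schema.xml above.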

Create index on two unrelated table in Solr

I want to create an index over two tables, stock and auction. Basically I am working on a product site, so I have to index both tables, and they are not related at all.
In the data-config.xml that I created for indexing, I wrote the following code:
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/database" user="root" password=""/>
  <document name="content">
    <entity name="stock" query="select ST_StockID,ST_StockCode,ST_Name,ST_ItemDetail from stock where estatus = 'Active' limit 100">
      <field column="ST_StockID" name="stock_ST_StockID" />
      <field column="ST_StockCode" name="stock_ST_StockCode" />
      <field column="ST_Name" name="stock_ST_Name" />
      <field column="ST_ItemDetail" name="stock_ST_ItemDetail" />
      <entity name="auction" query="select iauctionid,rad_number,vsku,auction_code from auction limit 100">
        <field column="iauctionid" name="auction_iauctionid" />
        <field column="rad_number" name="auction_rad_number" />
        <field column="vsku" name="auction_vsku" />
        <field column="auction_code" name="auction_auction_code" />
      </entity>
    </entity>
  </document>
</dataConfig>
and the schema.xml contains the fields given below:
<field name="stock_ST_StockID" type="string" indexed="true" stored="true" required="true"/>
<field name="stock_ST_StockCode" type="string" indexed="true" stored="true" required="true"/>
<field name="stock_ST_Name" type="string" indexed="true" stored="true" required="true"/>
<field name="stock_ST_ItemDetail" type="text" indexed="true" stored="true" required="true"/>
<field name="auction_iauctionid" type="string" indexed="true" stored="true" required="true"/>
<field name="auction_rad_number" type="string" indexed="true" stored="true" required="true"/>
<field name="auction_vsku" type="string" indexed="true" stored="true" required="true"/>
<field name="auction_auction_code" type="text" indexed="true" stored="true" required="true"/>
But this way the indexes are created in the wrong way, as I put the second table's data inside the first table's entity in data-config.xml. If I create two entity elements as given below, then the indexes are not created at all:
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/lc" user="root" password=""/>
  <document name="content">
    <entity name="stock" query="select ST_StockID,ST_StockCode,ST_Name,ST_ItemDetail from stock where estatus = 'Active' limit 100">
      <field column="ST_StockID" name="stock_ST_StockID" />
      <field column="ST_StockCode" name="stock_ST_StockCode" />
      <field column="ST_Name" name="stock_ST_Name" />
      <field column="ST_ItemDetail" name="stock_ST_ItemDetail" />
    </entity>
    <entity name="auction" query="select iauctionid,rad_number,vsku,auction_code from auction limit 100">
      <field column="iauctionid" name="auction_iauctionid" />
      <field column="rad_number" name="auction_rad_number" />
      <field column="vsku" name="auction_vsku" />
      <field column="auction_code" name="auction_auction_code" />
    </entity>
  </document>
</dataConfig>
I did not get your answer; can you please elaborate a little more? I have the same requirement: two unrelated tables, stock and auction, on a product site, and I have to index both.
Please help.
Do you get any errors when indexing the data?
The following data config is fine, as you have two unrelated entities:
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/lc" user="root" password=""/>
  <document name="content">
    <entity name="stock" query="select ST_StockID,ST_StockCode,ST_Name,ST_ItemDetail from stock where estatus = 'Active' limit 100">
      <field column="ST_StockID" name="stock_ST_StockID" />
      <field column="ST_StockCode" name="stock_ST_StockCode" />
      <field column="ST_Name" name="stock_ST_Name" />
      <field column="ST_ItemDetail" name="stock_ST_ItemDetail" />
    </entity>
    <entity name="auction" query="select iauctionid,rad_number,vsku,auction_code from auction limit 100">
      <field column="iauctionid" name="auction_iauctionid" />
      <field column="rad_number" name="auction_rad_number" />
      <field column="vsku" name="auction_vsku" />
      <field column="auction_code" name="auction_auction_code" />
    </entity>
  </document>
</dataConfig>
However, there are a few things missing:
What is the id field for each entity? Each document should have a unique id, and the configuration above seems to be missing one.
Also, the ids should be unique across the entities; otherwise stock and auction documents will overwrite each other.
So you may want to prefix the ids with stock_ and auction_.
You can also add a static field marking each document as stock or auction to your schema and populate it, which would help you filter the results when searching and hence improve performance.
For assigning the ids:
You can use the following to create the id value; this should prepend Stock_ to the ST_StockID field value:
<field column="id" template="Stock_#${stock.ST_StockID}" />
OR
use an alias in SQL, e.g. SELECT 'Stock_' || ST_StockID AS id ..... (with MySQL you would use CONCAT('Stock_', ST_StockID) instead of ||) and use:
<field column="id" name="id" />
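One caveat worth noting: the template attribute only takes effect if the entity declares the TemplateTransformer. A sketch of the stock entity with a templated id and a static source marker field (the source column is an illustrative addition, not part of the original config):

```xml
<entity name="stock" transformer="TemplateTransformer"
        query="select ST_StockID,ST_StockCode,ST_Name,ST_ItemDetail from stock where estatus = 'Active' limit 100">
  <!-- TemplateTransformer fills these columns from the template strings -->
  <field column="id" template="Stock_${stock.ST_StockID}" />
  <field column="source" template="stock" />
  <field column="ST_StockID" name="stock_ST_StockID" />
  <field column="ST_StockCode" name="stock_ST_StockCode" />
  <field column="ST_Name" name="stock_ST_Name" />
  <field column="ST_ItemDetail" name="stock_ST_ItemDetail" />
</entity>
```

The auction entity would get a matching transformer attribute and an id template such as Auction_${auction.iauctionid}.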