I want to index two tables, stock and auction, in Solr. I am working on a product site, so I have to index both tables, and they are not related to each other at all.
In the data-config.xml that I created for the import, I wrote the following:
<dataConfig>
<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/database" user="root" password=""/>
<document name="content">
<entity name="stock" query="select ST_StockID,ST_StockCode,ST_Name,ST_ItemDetail from stock where estatus = 'Active' limit 100">
<field column="ST_StockID" name="stock_ST_StockID" />
<field column="ST_StockCode" name="stock_ST_StockCode" />
<field column="ST_Name" name="stock_ST_Name" />
<field column="ST_ItemDetail" name="stock_ST_ItemDetail" />
<entity name="auction" query="select iauctionid,rad_number,vsku,auction_code from auction limit 100">
<field column="iauctionid" name="auction_iauctionid" />
<field column="rad_number" name="auction_rad_number" />
<field column="vsku" name="auction_vsku" />
<field column="auction_code" name="auction_auction_code" />
</entity>
</entity>
</document>
</dataConfig>
The schema.xml contains the following fields:
<field name="stock_ST_StockID" type="string" indexed="true" stored="true" required="true"/>
<field name="stock_ST_StockCode" type="string" indexed="true" stored="true" required="true"/>
<field name="stock_ST_Name" type="string" indexed="true" stored="true" required="true"/>
<field name="stock_ST_ItemDetail" type="text" indexed="true" stored="true" required="true"/>
<field name="auction_iauctionid" type="string" indexed="true" stored="true" required="true"/>
<field name="auction_rad_number" type="string" indexed="true" stored="true" required="true"/>
<field name="auction_vsku" type="string" indexed="true" stored="true" required="true"/>
<field name="auction_auction_code" type="text" indexed="true" stored="true" required="true"/>
But this way the index is built incorrectly, because I nested the second table's entity inside the first table's entity in data-config.xml. If I create two separate entity elements, as shown below, then nothing gets indexed.
<dataConfig>
<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/lc" user="root" password=""/>
<document name="content">
<entity name="stock" query="select ST_StockID,ST_StockCode,ST_Name,ST_ItemDetail from stock where estatus = 'Active' limit 100">
<field column="ST_StockID" name="stock_ST_StockID" />
<field column="ST_StockCode" name="stock_ST_StockCode" />
<field column="ST_Name" name="stock_ST_Name" />
<field column="ST_ItemDetail" name="stock_ST_ItemDetail" />
</entity>
<entity name="auction" query="select iauctionid,rad_number,vsku,auction_code from auction limit 100">
<field column="iauctionid" name="auction_iauctionid" />
<field column="rad_number" name="auction_rad_number" />
<field column="vsku" name="auction_vsku" />
<field column="auction_code" name="auction_auction_code" />
</entity>
</document>
</dataConfig>
I did not understand your answer; can you please elaborate a little more? I have the same requirement: two unrelated tables, stock and auction, on a product site, and I have to index both.
Please help.
Do you get any errors when indexing the data?
The following data config is fine, since you have two unrelated entities.
<dataConfig>
<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/lc" user="root" password=""/>
<document name="content">
<entity name="stock" query="select ST_StockID,ST_StockCode,ST_Name,ST_ItemDetail from stock where estatus = 'Active' limit 100">
<field column="ST_StockID" name="stock_ST_StockID" />
<field column="ST_StockCode" name="stock_ST_StockCode" />
<field column="ST_Name" name="stock_ST_Name" />
<field column="ST_ItemDetail" name="stock_ST_ItemDetail" />
</entity>
<entity name="auction" query="select iauctionid,rad_number,vsku,auction_code from auction limit 100">
<field column="iauctionid" name="auction_iauctionid" />
<field column="rad_number" name="auction_rad_number" />
<field column="vsku" name="auction_vsku" />
<field column="auction_code" name="auction_auction_code" />
</entity>
</document>
</dataConfig>
However, a few things are missing.
What is the id field for each entity? Every document needs a unique id, and the configuration above does not define one.
The ids must also be unique across the entities, otherwise stock and auction documents would overwrite each other.
So you may want to prefix the ids with stock_ and auction_.
You can also add a static field to your schema, populated with Stock or Auction, which lets you filter results by type when searching and so improves performance.
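As a rough sketch of the schema side of this advice: the fragment below assumes a shared `id` field as the uniqueKey and a hypothetical `doc_type` field as the static type marker (neither name appears in the original schema). It also drops `required="true"` from the table-specific fields, because a stock document never carries the auction columns and Solr rejects documents that lack a required field. Only two of the table-specific fields are shown.

```xml
<!-- Shared unique key; values are expected to be prefixed per table, e.g. stock_12 or auction_12 -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<!-- Hypothetical static marker field holding "stock" or "auction" -->
<field name="doc_type" type="string" indexed="true" stored="true" required="true"/>
<!-- Table-specific fields are optional: each document only has one table's columns -->
<field name="stock_ST_StockID" type="string" indexed="true" stored="true"/>
<field name="auction_iauctionid" type="string" indexed="true" stored="true"/>
<uniqueKey>id</uniqueKey>
```

With a marker field like this, a search can be restricted to one table with a filter query such as fq=doc_type:stock.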
For assigning the ids:
You can use the following to create the id value. It prefixes the ST_StockID field value with Stock_#, and it requires transformer="TemplateTransformer" on the entity.
<field column="id" template="Stock_#${stock.ST_StockID}" />
OR
Use an alias in the SQL, e.g. SELECT CONCAT('Stock_', ST_StockID) AS id ..... (in MySQL, || is a logical OR by default, not concatenation), and then use:
<field column="id" name="id" />
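Putting the pieces together, a data config along these lines might work. This is only a sketch: it assumes the schema has an `id` uniqueKey field and a `doc_type` field (a hypothetical name), and that the table-specific fields are not marked as required. The template variables use the entity names (`stock`, `auction`).

```xml
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/lc" user="root" password=""/>
  <document name="content">
    <!-- TemplateTransformer is needed for the template attribute to take effect -->
    <entity name="stock" transformer="TemplateTransformer"
            query="select ST_StockID,ST_StockCode,ST_Name,ST_ItemDetail from stock where estatus = 'Active' limit 100">
      <!-- Unique id built from the entity name prefix and the table's primary key -->
      <field column="id" template="stock_${stock.ST_StockID}" />
      <!-- Hypothetical static type marker used for filtering at query time -->
      <field column="doc_type" template="stock" />
      <field column="ST_StockID" name="stock_ST_StockID" />
      <field column="ST_StockCode" name="stock_ST_StockCode" />
      <field column="ST_Name" name="stock_ST_Name" />
      <field column="ST_ItemDetail" name="stock_ST_ItemDetail" />
    </entity>
    <entity name="auction" transformer="TemplateTransformer"
            query="select iauctionid,rad_number,vsku,auction_code from auction limit 100">
      <field column="id" template="auction_${auction.iauctionid}" />
      <field column="doc_type" template="auction" />
      <field column="iauctionid" name="auction_iauctionid" />
      <field column="rad_number" name="auction_rad_number" />
      <field column="vsku" name="auction_vsku" />
      <field column="auction_code" name="auction_auction_code" />
    </entity>
  </document>
</dataConfig>
```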
Related
I am trying to index my database for a question-and-answer website. To start off, I want to index the questions and answers tables, which have a one-to-many relationship. I would expect Solr to return documents like:
{
'question_id': 1,
'question': 'Is this a question?',
'answers' : [
{
'answer_id': 1,
'answer': 'Maybe'
},
{
'answer_id': 2,
'answer': 'yes it is'
}
]
}
What configuration do I need to achieve this?
I've gone through Configuring the DIH Configuration File tutorial.
Below are the configurations I've tried:
CONFIG 1
<dataConfig>
<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/questionsdb" user="root" password=""/>
<document>
<entity name="questions"
pk="id"
query="SELECT id, title FROM questions">
<field column="id" name="question_id"/>
<field column="title" name="title"/>
<entity name="answers"
pk="id"
query="select id, answer from answers where qid='${questions.id}'">
<field name="answer_id" column="id" />
<field name="answer" column="answer" />
</entity>
</entity>
</document>
</dataConfig>
QUERY OUTPUT:
CONFIG 2
<dataConfig>
<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/questionsdb" user="root" password=""/>
<document>
<entity name="questions"
query="SELECT questions.id as question_id, questions.title as question, answers.id as answer_id, answers.answer as answer FROM questions JOIN answers ON questions.id = answers.qid">
<field column="id" name="question_id"/>
<field column="title" name="title"/>
<field name="answer" column="answer" />
<field name="answer_id" column="answer_id" />
</entity>
</document>
</dataConfig>
QUERY OUTPUT:
I'm using Solr 8.6.
EDIT 1:
Updated my managed-schema file to use multiValued="true":
<field name="question" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="question_id" type="pint" indexed="false" stored="true" multiValued="false"/>
<field name="answer" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="answer_id" type="pint" indexed="false" stored="true" multiValued="true"/>
The output now includes the answers, but answer and answer_id come back as two separate lists.
Is it possible to restructure them to be returned as a list of dictionaries as given in the example structure above?
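One direction that may get closer to the desired structure is to index each answer as a child document by marking the inner entity with child="true". The sketch below rests on several assumptions: the schema has an `id` uniqueKey, a `_root_` field, and a hypothetical `doc_type` field, and answer and answer_id are single-valued again because each answer becomes its own document. The SQL aliases keep the raw keys available under their own column names, because the templated `id` column replaces the row's `id` value before the child query is resolved.

```xml
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/questionsdb" user="root" password=""/>
  <document>
    <entity name="questions" transformer="TemplateTransformer"
            query="SELECT id AS question_id, title AS question FROM questions">
      <!-- Prefixed ids (an assumption) keep parent and child keys from colliding -->
      <field column="id" template="Q-${questions.question_id}"/>
      <field column="doc_type" template="question"/>
      <field column="question_id" name="question_id"/>
      <field column="question" name="question"/>
      <!-- child="true" asks DIH to attach these rows as child documents of the question -->
      <entity name="answers" child="true" transformer="TemplateTransformer"
              query="SELECT id AS answer_id, answer FROM answers WHERE qid='${questions.question_id}'">
        <field column="id" template="A-${answers.answer_id}"/>
        <field column="doc_type" template="answer"/>
        <field column="answer_id" name="answer_id"/>
        <field column="answer" name="answer"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

Even with this in place, the children are only returned together with their parent when the query asks for them, for example through the [child] document transformer in the fl parameter.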
I am trying to index nested documents under their parent document, but I do not get the expected structure in the indexed data in Solr. Please tell me what is wrong in the Solr configuration shown below.
Table structure: (image not preserved)
db-data-config.xml
<document>
<entity name="parent" pk="parent_id" query="SELECT parent_id, name, salary, country from parent" deltaQuery="select parent_id, name, salary, country from parent where updated_at > '${dataimporter.last_index_time}'">
<field column="parent_id" name="id" />
<field column="parent_id" name="parent_id" />
<field column="name" name="name" />
<field column="salary" name="salary" />
<field column="country" name="country" />
<entity name="child" child="true" pk="child_id" query="select child.child_id, child.parent_id, child.child_name from child where child.parent_id='${parent.parent_id}' ">
<field column="parent_id" name="id" />
<field column="child_id" name="child_id" />
<field column="child_name" name="child_name" />
</entity>
</entity>
</document>
managed-schema:
<!-- parent table fields -->
<field name="parent_id" type="text_general" indexed="true" stored="true"/>
<field name="name" type="text_general" indexed="true" stored="true"/>
<field name="salary" type="text_general" indexed="true" stored="true"/>
<field name="country" type="text_general" indexed="true" stored="true"/>
<!-- child table fields -->
<field name="child_id" type="text_general" indexed="true" stored="true"/>
<field name="child_name" type="text_general" indexed="true" stored="true"/>
The indexed documents are not nested; the result is a flat representation:
"response":{"numFound":4,"start":0,"docs":[
{
"country":"IND",
"parent_id":"1",
"name":"p1",
"salary":"11",
"_version_":1582614969479856128
},
{
"id":"1",
"child_id":"1",
"child_name":"c1",
"_version_":1582614969479856128
},
{
"country":"USA",
"parent_id":"2",
"name":"p2",
"salary":"222",
"_version_":1582614969546964992
},
{
"id":"2",
"child_id":"2",
"child_name":"c2",
"_version_":1582614969546964992
}
]
}
Expected:
"response":{"numFound":4,"start":0,"docs":[
{
"parent_id":"1",
"country":"IND",
"name":"p1",
"salary":"11",
"child":{
"parent_id":"1",
"child_id":"1",
"child_name":"c1"
},
"_version_":1582614969479856128
},
{
"parent_id":"2",
"country":"USA",
"name":"p2",
"salary":"222",
"child":{
"parent_id":"2",
"child_id":"2",
"child_name":"c2"
},
"_version_":1582614969546964992
}
]
}
Solr stores child documents as independent documents too, so what you see is normal. But there is some plumbing that lets you get them back with the parent (or query one level and return the other, and so on).
Read this post by Yonik carefully to see how you must query to get the children too.
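As an illustration of that plumbing, the [child] document transformer can be requested in the fl parameter. The sketch below wires it into a request handler in solrconfig.xml. The handler name and the `doc_type` field are assumptions (the schema in the question has no field that marks parents), and parentFilter has to match all parent documents and nothing else.

```xml
<!-- Hypothetical handler that returns parents with their children attached -->
<requestHandler name="/parents-with-children" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="q">*:*</str>
    <!-- Only return parent documents at the top level -->
    <str name="fq">doc_type:parent</str>
    <!-- Attach each parent's child documents to it in the response -->
    <str name="fl">*,[child parentFilter=doc_type:parent]</str>
  </lst>
</requestHandler>
```

The same parameters can be passed directly on a /select request instead of being set as handler defaults.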
I am using the Solr DataImportHandler module. Here is my config:
<dataConfig>
<dataSource type="JdbcDataSource"
name="sql"
driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
url="jdbc:sqlserver://localhost;databaseName=AdventureWorks2008;integratedSecurity=true;"/>
<document>
<entity name="Person" dataSource="sql"
pk="BusinessEntityID"
query="select BusinessEntityID,FirstName,LastName FROM [Person].[Person]"
deltaImportQuery="select BusinessEntityID,FirstName,LastName FROM [Person].[Person] WHERE id='${dih.delta.id}'"
deltaQuery="SELECT BusinessEntityID FROM [Person].[Person] WHERE ModifiedDate > '${dih.last_index_time}'">
<field column="BusinessEntityID" name="id"/>
<field column="FirstName" name="firstname"/>
<field column="LastName" name="lastname"/>
</entity>
</document>
</dataConfig>
For some reason, only the id field is imported, but not the rest.
What would be the reason? Am I missing something?
You might be missing the entries below in the schema.xml file:
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="firstname" type="string" indexed="true" stored="true"/>
<field name="lastname" type="string" indexed="true" stored="true"/>
The type for id can also be int, depending on what you want:
<field name="id" type="int" indexed="true" stored="true" required="true"/>
Make sure your pk and uniqueKey field are set up properly.
I was facing the same issue; after I changed the pk and the unique field name, it worked fine.
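For reference, a schema fragment consistent with both answers might look like the sketch below. It assumes the field names used in the data config (`id`, `firstname`, `lastname`). DIH only indexes columns that map to a field (or dynamic field) defined in the schema, which is one way the import can end up with only `id`.

```xml
<!-- Field names must match the name attributes in the DIH field mappings -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="firstname" type="string" indexed="true" stored="true"/>
<field name="lastname" type="string" indexed="true" stored="true"/>
<!-- The uniqueKey should be the field that BusinessEntityID is mapped to -->
<uniqueKey>id</uniqueKey>
```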
I am in the process of redesigning one of our company's sites. My boss wants to play around with the idea of replacing all of our navigation with a search box. The search box should be able to query any of our tables of unrelated data.
So right now I am trying it with 5 tables:
Products
Manufacturers
Category
Ingredients
Uses
So a user should be able to look up a product name, a manufacturer name, a category name, an ingredient name, or a use name.
When the results come back, clicking a manufacturer result takes the user to a manufacturer page that looks up all products for that manufacturer.
Clicking a product result takes them to that product's information.
Clicking an ingredient takes them to a page that shows all products containing that ingredient.
Anyway, here is my data config:
<dataConfig>
<dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/xxx" user="xxx" password="xxx" />
<document>
<entity name="manufacturer" transformer="TemplateTransformer" pk="manNum"
query="SELECT manNum, manName FROM manufacturer
WHERE active = 1">
<field column="id" name="id" template="MAN-${manNum}" />
<field column="type" template="manufacturer" name="type"/>
<field column="manName" name="text"/>
<field column="manNum" name="manNum"/>
</entity>
<entity name="product" transformer="TemplateTransformer"
query="SELECT products.prodNum, products.prodName as text, m.manName FROM products JOIN man m USING (manNum)
WHERE products.active = 1
AND (hideWeb = 0 or hideWeb IS NULL)">
<field column="id" template="PROD-${products.prodNum}" name="id"/>
<field column="type" template="product" name="type"/>
<field column="text" name="text"/>
<field column="manName" name="manName"/>
</entity>
<entity name="ingredients" transformer="TemplateTransformer" pk="id"
query="SELECT id, text FROM inglist WHERE sort != ''">
<field column="id" name="id" template="ING-${inglist.id}"/>
<field column="type" template="ingredient" name="type"/>
<field column="text" name="text" />
</entity>
<entity name="uses" transformer="TemplateTransformer" pk="id"
query="SELECT id, text FROM useslist">
<field column="id" name="id" template="USE-${id}"/>
<field column="type" template="use" name="type"/>
<field column="text" name="text"/>
</entity>
<entity name="categories" transformer="TemplateTransformer" pk="id"
query="SELECT id, textShow as text FROM categorylist">
<field column="id" name="id" template="CATEGORY-${id}"/>
<field column="type" template="category" name="type"/>
<field column="text" name="text"/>
</entity>
</document>
</dataConfig>
And my schema:
<fields>
<field name="id" type="string" indexed="true" stored="true"/>
<field name="text" indexed="true" stored="true" type="text"/>
<field name="type" type="string" indexed="false" stored="true"/>
<field name="manName" type="text" indexed="false" stored="true"/>
<field name="manNum" type="string" indexed="false" stored="false"/>
</fields>
Now perhaps I am not doing this the right way, and there may be a better way to handle this.
Anyway, the problem I am running into right now is that I am getting the error: missing required field "id". The products query and the manufacturer query do not have an id column in the select, but I thought the transformer should take care of that? If I do select prodNum as id, then all the ids overwrite each other.
I could probably concat it in the actual query, and will do so as a last resort, but I would like to know what I am doing wrong with this solution.
EDIT
Never mind, it was just a noob issue: for some reason I was thinking that the template variable referred to the table name in the SQL, not the entity name.
So I replaced the table names in all of the template variables with the entity names, and it worked.
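To make the fix concrete, here is a simplified sketch of the product entity (the join and the hideWeb condition are left out for brevity). The variable in the template is qualified with the entity name `product`, not with the SQL table name `products`.

```xml
<entity name="product" transformer="TemplateTransformer"
        query="SELECT prodNum, prodName AS text FROM products WHERE active = 1">
  <!-- ${product.prodNum}: "product" is the name attribute of this entity -->
  <field column="id" template="PROD-${product.prodNum}" name="id"/>
  <field column="type" template="product" name="type"/>
  <field column="text" name="text"/>
</entity>
```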
Prefixing the table-specific ID with a distinct character or string is a good idea. I do it in the SQL, which allows me to check the behavior outside of Solr.
select
concat('b',cast(b.id as char)) as id,
...
It was a noob issue:
for some reason I was thinking that the template variable referred to the table name in the SQL, not the entity name.
I do it like this:
<entity name="GG-Boryslaw-1939-Phonebook"
transformer="TemplateTransformer,DateFormatTransformer"
pk="id"
query="SELECT * FROM boryslaw_1939_phonebook">
<field column="record_id" template="GG-Boryslaw-1939-Phonebook-${GG-Boryslaw-1939-Phonebook.id}" />
<field column="record_type" template="phonebook" />
<field column="record_source" template="Boryslaw Phonebook (1939)" />
<field column="record_date" template="${GG-Boryslaw-1939-Phonebook.Year}" dateTimeFormat="yyyy" />
...etc...
</entity>
This is my data-config.xml
<dataConfig>
<dataSource name="a" type="URLDataSource" encoding="UTF-8" connectionTimeout="5000" readTimeout="10000"/>
<document name="products">
<entity name="images" dataSource="a"
url="file:///abc/1299.xml"
processor="XPathEntityProcessor"
forEach="/imagesList/image"
>
<field column="id" xpath="/imageList/image/productId" />
<field column="image_array" xpath="/imageList/image/imageUrlString" />
</entity>
</document>
</dataConfig>
This is the schema.xml
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="image_array" type="text" indexed="true" stored="true" multiValued="true"/>
But when I try to deltaimport, none of the documents get added.
Any help will be highly appreciated.
Well, first off, your field XPaths say imageList, while your forEach and your XML say imagesList ...
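A corrected entity might look like the sketch below, assuming the root element of the file really is imagesList, as the forEach expression suggests. The field XPaths now start with the same path as forEach.

```xml
<entity name="images" dataSource="a"
        url="file:///abc/1299.xml"
        processor="XPathEntityProcessor"
        forEach="/imagesList/image">
  <!-- Each field XPath begins with the forEach path so it matches inside every image element -->
  <field column="id" xpath="/imagesList/image/productId" />
  <field column="image_array" xpath="/imagesList/image/imageUrlString" />
</entity>
```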