indexing mutilple tables in solr using DIH - solr

I'm developing search engine using Solr and I've been successful in indexing data from one table using DIH (Dataimport Handler). What I need is to get search result from 5 different tables. I couldn't do this without help.
if we assume x table with x rows, there should be x * x documents from each table, which lead to, 5x documents if I have 5 tables as total. In dataconfig.xml, I created 5 seperate entities in single document as shown below. the result from indexed data when I query *:* is only 6 of the entity users and 3 from entity classes which is the number of users total rows which is 9.
Clearly, this way didn't work for me, so how can I achieve this using only single core?
note: I followed DIHQuickStart and DIH tutorial which didn't help me.
<document>
<!-- Users -->
<entity dataSource="JdbcDataSource" name=" >
<field column="name" name="name" sourceColName="name" />
<field column="username" name="username" sourceColName="username"/>
<field column="email" name="email" sourceColName="email" />
<field column="country" name="country" sourceColName="country" />
</entity>
<!-- Classes -->
<entity dataSource="JdbcDataSource" name="classes" >
<field column="code" name="code" sourceColName="code" />
<field column="title" name="title" sourceColName="title" />
<field column="description" name="description" sourceColName="description" />
</entity>
<!-- Schools -->
<entity dataSource="JdbcDataSource" name="schools" >
<field column="school_name" name="school_name" sourceColName="school_name" />
<field column="country" name="country" sourceColName="country" />
<field column="city" name="city" sourceColName="city" />
</entity>
<!-- Resources -->
<entity dataSource="JdbcDataSource" name="resources" >
<field column="title" name="title" sourceColName="title" />
<field column="description" name="description" sourceColName="description" />
</entity>
<!-- Tasks -->
<entity dataSource="JdbcDataSource" name="tasks" >
<field column="title" name="title" sourceColName="title" />
<field column="description" name="description" sourceColName="description" />
</entity>
</document>

you need to look at the structures of your tables then either create queries with joins or creat nested entities like this for example
<dataConfig>
<dataSource driver="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:/temp/example/ex" user="sa" />
<document name="schools">
<entity name="school" query="select * from schools s ">
<field column="school_name" name="school_Name" />
<entity name="school_class" query="select * from schools_classes sc where sc.school_name = '${school.school_name}'">
<field column="class_code" name="class_code" />
<entity name="class" query="select class_name from classes c where c.class_name= '${school_class.class_code}'">
<field column="class_name" name="class_name" />
</entity>
</entity>
</entity>
</document>
</dataConfig>

Related

Solr DataImport set field to specific value [duplicate]

Im making an index in solr from db in the following way:
<document name="Index">
<entity name="c" query="SELECT * FROM C">
<field column="Name" name="name"/>
</entity>
<entity name="p" query="SELECT * FROM P">
<field column="Name" name="name"/>
</entity>
</document>
Is it possible to have a static field that is set for each row that signify what type is returned to client so that one can make a call to the right database table based on that information from the json result?
That is a field that has no column in the table
<field name="id" value="1"/>
Or is there another way to solve this?
<document name="Index">
<entity name="c" transformer="TemplateTransformer" query="SELECT * FROM C">
<field column="Name" name="name"/>
<field column="id" template="1"/>
</entity>
<entity name="p" transformer="TemplateTransformer" query="SELECT * FROM P">
<field column="Name" name="name"/>
<field column="id" template="1"/>
</entity>
</document>
You can add a column to your SQL query that contains static data like this:
<document name="Index">
<entity name="c" query="SELECT *, 'foo' as NameFromC FROM C">
<field column="NameFromC" name="name"/>
</entity>
<entity name="p" query="SELECT *, 'bar' as NameFromP FROM P">
<field column="NameFromP" name="name"/>
</entity>
</document>
If you try to add a field with only name and template attributes, Solr will throw an error saying Field must have a column attribute.

Static field for document in Data Import Handlerfor Solr

Im making an index in solr from db in the following way:
<document name="Index">
<entity name="c" query="SELECT * FROM C">
<field column="Name" name="name"/>
</entity>
<entity name="p" query="SELECT * FROM P">
<field column="Name" name="name"/>
</entity>
</document>
Is it possible to have a static field that is set for each row that signify what type is returned to client so that one can make a call to the right database table based on that information from the json result?
That is a field that has no column in the table
<field name="id" value="1"/>
Or is there another way to solve this?
<document name="Index">
<entity name="c" transformer="TemplateTransformer" query="SELECT * FROM C">
<field column="Name" name="name"/>
<field column="id" template="1"/>
</entity>
<entity name="p" transformer="TemplateTransformer" query="SELECT * FROM P">
<field column="Name" name="name"/>
<field column="id" template="1"/>
</entity>
</document>
You can add a column to your SQL query that contains static data like this:
<document name="Index">
<entity name="c" query="SELECT *, 'foo' as NameFromC FROM C">
<field column="NameFromC" name="name"/>
</entity>
<entity name="p" query="SELECT *, 'bar' as NameFromP FROM P">
<field column="NameFromP" name="name"/>
</entity>
</document>
If you try to add a field with only name and template attributes, Solr will throw an error saying Field must have a column attribute.

Struggling with learning solr

I am in the process of redesigning one of our companies site. My boss wants to play around with the idea of replacing all of our navigation with a search box.. the search box should be able to query any of our tables of unrelated data.
So right now I am trying it with 5 tables.
Products
Manufacturers
Category
Ingredients
Uses
So should be able to lookup a product name, a manufacturer name, a category name, an ingredient name, or a use name
When I retrieve the results. if the user clicked on a manufacturer search result.. It will take them to a manufacturer page that lookups all products for that manufacturer.
When clicks on a product page.. link will take them to that actual product information.
Ingredient will take them to a page that will show all products containing that ingredient.
Anyways here is my data config
<dataConfig>
<dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/xxx" user="xxx" password="xxx" />
<document>
<entity name="manufacturer" transformer="TemplateTransformer" pk="manNum"
query="SELECT manNum, manName FROM manufacturer
WHERE active = 1">
<field column="id" name="id" template="MAN-${manNum}" />
<field column="type" template="manufacturer" name="type"/>
<field column="manName" name="text"/>
<field column="manNum" name="manNum"/>
</entity>
<entity name="product" transformer="TemplateTransformer"
query="SELECT products.prodNum, products.prodName as text, m.manName FROM products JOIN man m USING (manNum)
WHERE products.active = 1
AND (hideWeb = 0 or hideWeb IS NULL)">
<field column="id" template="PROD-${products.prodNum}" name="id"/>
<field column="type" template="product" name="type"/>
<field column="text" name="text"/>
<field column="manName" name="manName"/>
</entity>
<entity name="ingredients" transformer="TemplateTransformer" pk="id"
query="SELECT id, text FROM inglist WHERE sort != ''">
<field column="id" name="id" template="ING-${inglist.id}"/>
<field column="type" template="ingredient" name="type"/>
<field column="text" name="text" />
</entity>
<entity name="uses" transformer="TemplateTransformer" pk="id"
query="SELECT id, text FROM useslist">
<field column="id" name="id" template="USE-${id}"/>
<field column="type" template="use" name="type"/>
<field column="text" name="text"/>
</entity>
<entity name="categories" transformer="TemplateTransformer" pk="id"
query="SELECT id, textShow as text FROM categorylist">
<field column="id" name="id" template="CATEGORY-${id}"/>
<field column="type" template="category" name="type"/>
<field column="text" name="text"/>
</entity>
</document>
</dataConfig>
And my schema..
<fields>
<field name="id" type="string" indexed="true" stored="true"/>
<field name="text" indexed="true" stored="true" type="text"/>
<field name="type" type="string" indexed="false" stored="true"/>
<field name="manName" type="text" indexed="false" stored="true"/>
<field name="manNum" type="string" indexed="false" stored="false"/>
</fields>
Now perhaps I am not doing this the right way... and there may be a better way to handle this.
Anyways the problem I am running into right now is that I am getting the error missing required field "id". Now products query and manufacturer query does not have an id column in the select.. but I thought the transform query should take care of it? If I do the select prodNum as id .. then all the ids are overwritting each other.
Now I could probably concat it in the actual query.. and will do so as a last resort, but would like to know what I am doing wrong with this solution.
EDIT
Nevermind, it was just a noob issue, for some reason I was thinking that the template variable was refering to the table name in the SQL not the entity name,
So I replaced all of the
With
And it worked.
Prefixing the table-specific ID with a distinct character or string is a good idea. I do it in the SQL, which allows me to check the behavior outside of Solr.
select
concat('b',cast(b.id as char)) as id,
...
It Was a noob issue,
for some reason I was thinking that the template variable was refering to the table name in the SQL not the entity name.
I do it like this:
<entity name="GG-Boryslaw-1939-Phonebook"
transformer="TemplateTransformer,DateFormatTransformer"
pk="id"
query="SELECT * FROM boryslaw_1939_phonebook">
<field column="record_id" template="GG-Boryslaw-1939-Phonebook-${GG-Boryslaw-1939-Phonebook.id}" />
<field column="record_type" template="phonebook" />
<field column="record_source" template="Boryslaw Phonebook (1939)" />
<field column="record_date" template="${GG-Boryslaw-1939-Phonebook.Year}" dateTimeFormat="yyyy" />
...etc...
</entity>

Create index on two unrelated table in Solr

I want to create index between two tables, stock and auction. Basically I am working on a product site. So I have to create index on both tables. and they are not related at all.
In data-config.xml, that I created to create index, I wrote the following code
<dataConfig>
<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/database" user="root" password=""/>
<document name="content">
<entity name="stock" query="select ST_StockID,ST_StockCode,ST_Name,ST_ItemDetail from stock where estatus = 'Active' limit 100">
<field column="ST_StockID" name="stock_ST_StockID" />
<field column="ST_StockCode" name="stock_ST_StockCode" />
<field column="ST_Name" name="stock_ST_Name" />
<field column="ST_ItemDetail" name="stock_ST_ItemDetail" />
<entity name="auction" query="select iauctionid,rad_number,vsku,auction_code from auction limit 100">
<field column="iauctionid" name="auction_iauctionid" />
<field column="rad_number" name="auction_rad_number" />
<field column="vsku" name="auction_vsku" />
<field column="auction_code" name="auction_auction_code" />
</entity>
</entity>
</document>
</dataConfig>
and the schema.xml contains the fields are given below.
<field name="stock_ST_StockID" type="string" indexed="true" stored="true" required="true"/>
<field name="stock_ST_StockCode" type="string" indexed="true" stored="true" required="true"/>
<field name="stock_ST_Name" type="string" indexed="true" stored="true" required="true"/>
<field name="stock_ST_ItemDetail" type="text" indexed="true" stored="true" required="true"/>
<field name="auction_iauctionid" type="string" indexed="true" stored="true" required="true"/>
<field name="auction_rad_number" type="string" indexed="true" stored="true" required="true"/>
<field name="auction_vsku" type="string" indexed="true" stored="true" required="true"/>
<field name="auction_auction_code" type="text" indexed="true" stored="true" required="true"/>
But this way the indexes are being created in wrong way as I put the other table data into the first table in data-config.xml. If I create two entity element like given below then the indexes are not being created.
<dataConfig>
<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/lc" user="root" password=""/>
<document name="content">
<entity name="stock" query="select ST_StockID,ST_StockCode,ST_Name,ST_ItemDetail from stock where estatus = 'Active' limit 100">
<field column="ST_StockID" name="stock_ST_StockID" />
<field column="ST_StockCode" name="stock_ST_StockCode" />
<field column="ST_Name" name="stock_ST_Name" />
<field column="ST_ItemDetail" name="stock_ST_ItemDetail" />
</entity>
<entity name="auction" query="select iauctionid,rad_number,vsku,auction_code from auction limit 100">
<field column="iauctionid" name="auction_iauctionid" />
<field column="rad_number" name="auction_rad_number" />
<field column="vsku" name="auction_vsku" />
<field column="auction_code" name="auction_auction_code" />
</entity>
</document>
</dataConfig>
I did not get your answer, can you pls elaborate a little more. I also have the same requirement. I have two tables stock and auction. Basically I am working on a product site. So I have to create index on both tables. and they are not related at all.
Please help
Do you get any errors when indexing the data ??
The following data config is fine as you have two unrelated items.
<dataConfig>
<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/lc" user="root" password=""/>
<document name="content">
<entity name="stock" query="select ST_StockID,ST_StockCode,ST_Name,ST_ItemDetail from stock where estatus = 'Active' limit 100">
<field column="ST_StockID" name="stock_ST_StockID" />
<field column="ST_StockCode" name="stock_ST_StockCode" />
<field column="ST_Name" name="stock_ST_Name" />
<field column="ST_ItemDetail" name="stock_ST_ItemDetail" />
</entity>
<entity name="auction" query="select iauctionid,rad_number,vsku,auction_code from auction limit 100">
<field column="iauctionid" name="auction_iauctionid" />
<field column="rad_number" name="auction_rad_number" />
<field column="vsku" name="auction_vsku" />
<field column="auction_code" name="auction_auction_code" />
</entity>
</document>
</dataConfig>
However, there are few things missing ?
Whats the id field for the entity ? As each document should have a unique id, the configuration seems missing above.
Also the id should be unqiue for the entites, else the stock and auction should overwrite each other.
So you may want the id append as stock_ & auction_
You can also add a static field as Stock and auction to your schema and populate them, which would help you the filter out the results when searching and hence improve the performance.
For Assigning the Ids -
You can use the following to create the id value - This should append the Stock_ with the ST_StockID field value.
<field column="id" template="Stock_#${stock.ST_StockID}" />
OR
Use alias in sql e.g. SELECT 'Stock_' || ST_StockID AS ID ..... as use -
<field column="id" name="id" />

solr DIH - A problem about solr delta-imports

There is a problem when I use solr1.3 delta-imports to update the index. I have added the "last_modified" column in the table. After I use the "full-import" command to index the database data, the "dataimport.properties" file contains nothing, and when I use the "delta-import" command to update index, the solr list all the data in database not the lasted data. My db-data-config.xml:
deltaQuery="select shop_id from shop where last_modified > '${dataimporter.last_index_time}'">
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
<dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/funguide" user="root" password="root"/>
<document name="shopinfo">
<entity name="shop" pk="shop_id"
query="select shop_id,title,description,tel,address,longitude,latitude from shop"
<field column="shop_id" name="id" />
<field column="title" name="title" />
<field column="description" name="description" />
<field column="tel" name="tel" />
<field column="address" name="address" />
<field column="longitude" name="longitude" />
<field column="latitude" name="latitude" />
</entity>
</document>
</dataConfig>
Anyboby know how to solve the problem? Thanks!
enzhaohoo#gmail.com
I also would recommend upgrading to Solr 1.4 RC as there have been quite a few improvements made to delta-imports with DataImportHandler. Please see DataImportHandler - Using delta-import command - wikipage for specifics.

Resources