Solr: range queries on multiValued fields in sub-entities?

I'm using Solr to index some stuff.
I have a dataconfig.xml with a root entity and a sub-entity, like this one:
<entity name="item" query="select id, qty from item">
<field column="id" name="id" />
<field column="qty" name="qty" />
<entity name="prices" query="select price from prices where item_id='${item.id}'">
<field column="price" name="price" />
</entity>
</entity>
And the corresponding schema.xml:
<fields>
<field name="id" type="integer" indexed="true" stored="true" />
<field name="qty" type="sint" indexed="true" stored="true" />
<field name="price" type="sint" indexed="true" stored="true" multiValued="true" />
</fields>
The fields qty (from the root entity) and price (from the sub-entity) are both of type sint (to allow range queries), and price is multiValued since there can be multiple price values for one item.
When I do a range query on qty, it works as expected. For example, qty:[* TO 10] returns elements with qty up to 10.
But when I do a range query on price, it doesn't work at all: price:[* TO 100] returns elements with prices even over 100!
Hence my question: are range queries supposed to work on multiValued fields from "sub-entities"?
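For context: a range query on a multiValued field matches a document as soon as any one of its values falls inside the range, so an item with prices 50 and 150 still matches price:[* TO 100]. If the intent is "every price is at most 100", one possible workaround is to index the highest price in its own single-valued field and query that instead. The sketch below assumes a hypothetical extra sub-entity price_bounds and a hypothetical field max_price; neither is part of the original configuration.

```xml
<!-- dataconfig.xml: the original entities plus one extra sub-entity -->
<entity name="item" query="select id, qty from item">
  <field column="id" name="id" />
  <field column="qty" name="qty" />
  <entity name="prices" query="select price from prices where item_id='${item.id}'">
    <field column="price" name="price" />
  </entity>
  <!-- hypothetical: returns a single row per item holding its highest price -->
  <entity name="price_bounds" query="select max(price) as max_price from prices where item_id='${item.id}'">
    <field column="max_price" name="max_price" />
  </entity>
</entity>

<!-- schema.xml: single-valued companion field (hypothetical name) -->
<field name="max_price" type="sint" indexed="true" stored="true" />
```

With this in place, max_price:[* TO 100] should return only items whose prices are all 100 or less, while price:[* TO 100] keeps its "at least one price in range" meaning.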

Related

Solr: Indexing child Documents via db-data-config.xml query

I am trying to index nested documents under their parent document, but I do not get the expected structure in the data indexed in Solr. Please tell me what is wrong in the Solr configuration shown below.
Table structure: (image not included)
db-data-config.xml
<document>
<entity name="parent" pk="parent_id" query="SELECT parent_id, name, salary, country from parent" deltaQuery="select parent_id, name, salary, country from parent where updated_at &gt; '${dataimporter.last_index_time}'">
<field column="parent_id" name="id" />
<field column="parent_id" name="parent_id" />
<field column="name" name="name" />
<field column="salary" name="salary" />
<field column="country" name="country" />
<entity name="child" child="true" pk="child_id" query="select child.child_id, child.parent_id, child.child_name from child where child.parent_id='${parent.parent_id}' ">
<field column="parent_id" name="id" />
<field column="child_id" name="child_id" />
<field column="child_name" name="child_name" />
</entity>
</entity>
</document>
managed-schema:
<!-- parent table fields -->
<field name="parent_id" type="text_general" indexed="true" stored="true"/>
<field name="name" type="text_general" indexed="true" stored="true"/>
<field name="salary" type="text_general" indexed="true" stored="true"/>
<field name="country" type="text_general" indexed="true" stored="true"/>
<!-- child table fields -->
<field name="child_id" type="text_general" indexed="true" stored="true"/>
<field name="child_name" type="text_general" indexed="true" stored="true"/>
The indexed documents are not nested; the result looks like a flat representation:
"response":{"numFound":4,"start":0,"docs":[
{
"country":"IND",
"parent_id":"1",
"name":"p1",
"salary":"11",
"_version_":1582614969479856128
},
{
"id":"1",
"child_id":"1",
"child_name":"c1",
"_version_":1582614969479856128
},
{
"country":"USA",
"parent_id":"2",
"name":"p2",
"salary":"222",
"_version_":1582614969546964992
},
{
"id":"2",
"child_id":"2",
"child_name":"c2",
"_version_":1582614969546964992
}
]
}
Expected:
"response":{"numFound":4,"start":0,"docs":[
{
"parent_id":"1",
"country":"IND",
"name":"p1",
"salary":"11",
"child":{
"parent_id":"1",
"child_id":"1",
"child_name":"c1",
},
"_version_":1582614969479856128
},
{
"parent_id":"2",
"country":"USA",
"name":"p2",
"salary":"222",
"child":{
"parent_id":"2",
"child_id":"2",
"child_name":"c2",
},
"_version_":1582614969546964992
}
]
}
Solr stores the child docs as independent docs too, so what you see is normal. But there is some plumbing that lets you get them back with the parent (and query one level to retrieve the other, etc.).
Read this post by Yonik carefully to see how you must query to get the children too.
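As a rough illustration of that plumbing: block join queries need some way to tell parent documents from child documents, and each child needs its own unique id. The sketch below is a minimal variant of the configuration from the question, assuming a hypothetical marker field doc_type (which would also have to be declared in the schema) and a hypothetical child- id prefix. The request parameters in the trailing comment use Solr's block join parent query parser and the [child] document transformer.

```xml
<document>
  <entity name="parent" pk="parent_id" transformer="TemplateTransformer"
          query="SELECT parent_id, name, salary, country from parent">
    <field column="parent_id" name="id" />
    <field column="parent_id" name="parent_id" />
    <field column="name" name="name" />
    <field column="salary" name="salary" />
    <field column="country" name="country" />
    <!-- hypothetical marker field: identifies parent documents -->
    <field column="doc_type" template="parent" />
    <entity name="child" child="true" pk="child_id" transformer="TemplateTransformer"
            query="select child_id, child_name from child where parent_id='${parent.parent_id}'">
      <!-- hypothetical prefix so a child id never equals a parent id -->
      <field column="id" template="child-${child.child_id}" />
      <field column="child_id" name="child_id" />
      <field column="child_name" name="child_name" />
      <field column="doc_type" template="child" />
    </entity>
  </entity>
</document>

<!-- Example request parameters (parents that have a matching child, with children attached):
     q={!parent which="doc_type:parent"}child_name:c1
     fl=*,[child parentFilter=doc_type:parent]
-->
```

A plain q=*:* would still list parents and children side by side, as in the question; the nesting only shows up when the query asks for it.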

Solr Indexing SQL record with duplicate uniqueKey

We need full-text search for a db with millions of records (music metadata). I've only been working with Solr for roughly 2 weeks, and I need some help with indexing. I am using DataImportHandler and have a SQL query that generates a result like this:
As you can see in the attached image above, the id (Integer data type) is repeated in the SQL result that is also used by DIH, and when I set the uniqueKey to <uniqueKey>id</uniqueKey>, Solr overwrites the values, leaving only one record/row. In fact, I think it is the last one processed, which is the one with countryCode 'TL'.
When I first had this issue, I knew why Solr was overwriting the value; it's normal. So I thought of adding a global identifier to each record in the db, a guid. Without thinking things through properly, I ended up with the same duplicates: as you can see, charGuid, which is a uuid() from MySQL, is duplicated.
But when I use charGuid (String data type) as the uniqueKey, <uniqueKey>charGuid</uniqueKey>, I get all records indexed and nothing is overwritten, but of course duplicates are inevitable. The problem I foresee here is that when I have to do an incremental update, Solr will not know exactly which document to update. In fact, a quick test from the admin console revealed that the last or first record it finds with that unique key is updated. This is not acceptable.
I stumbled upon an article referencing multiValued="true" and thought that making the fields that represent a JOIN column in my SQL multiValued would do the trick, but it doesn't. I was hoping a record with id:10 would be returned with a list of countryCode values, but no.
I am just puzzled as to how to circumvent this issue, and why I did not find a similar problem posted by someone else.
If I don't get a meaningful answer, I guess I will have to use charGuid as the <uniqueKey>, which allows duplicates, and then use Solr Document Deduplication Detection to handle updates of my index, but I want to believe there is a better way.
Update
Here are my data-config.xml and schema.xml definitions:
<entity name="albums" query="select * from Album">
<entity name="track" query="select t.id as id, t.title as trackTitle, t.removed as trackRemovedDate, t.productState from Track t where t.albumId='${albums.id}'"/>
<entity name="albumSalesAreaId" query="select asa.salesAreaId as albumSalesAreaId from AlbumSalesArea asa where asa.albumId='${albums.id}'"/>
<entity name="albumSalesArea" query="select sa.name as albumSalesArea from SalesArea sa where sa.id='${albumSalesAreaId.salesAreaId}'"/>
<entity name="salesAreaCountry" query="select sac.countryId as 'salesAreaCountry' from SalesAreaCountry sac where sac.salesAreaId ='${salesArea.id}'"/>
<entity name="countryId" query="select c.id as 'countryId' from Country c where c.id = '${salesAreaCountry.countryId}'"/>
<entity name="countryName" query="select c.name as 'countryName' from Country c where c.id = '${salesAreaCountry.countryId}'"/>
</entity>
schema.xml
<!--new multivalue fields -->
<field name="albumSalesArea" type="int" stored="true" indexed="true" multiValued="true"/>
<field name="albumSalesAreaId" type="int" indexed="true" stored="true" multiValued="true"/>
<field name="salesAreaCountry" type="int" stored="true" indexed="true" multiValued="true"/>
<field name="countryId" type="int" indexed="true" stored="true" multiValued="true"/>
<field name="countryName" type="text_general" indexed="true" stored="true" multiValued="true"/>
When I compare my Solr response with the SQL result, the SQL result has countryCode but Solr has none; it only returned:
"albumSalesAreaId": [
1,
3
],
I'm not sure why country etc. are not showing up.
Update 2
data-config.xml
<document name="content">
<entity name="albums" query="select * from Album">
<entity name="tracks" query="select t.id, t.title, t.removed, t.productState from Track t where t.albumId='${albums.id}'">
<field column="id" name="id" />
<field column="title" name="trackTitle" />
<field column="removed" name="trackRemovedDate" />
<field column="productState" name="trackProductState" />
</entity>
<entity name="albumSalesAreaIds" query="select salesAreaId from AlbumSalesArea where albumId = '${albums.id}'">
<field column="salesAreaId" name="albumSalesAreaId"/>
</entity>
<entity name="albumSalesAreaNames" query="select name from SalesArea where id = '${albumSalesAreaIds.salesAreaId}'">
<field column="name" name="albumSalesArea"/>
</entity>
<entity name="salesAreaCountryIds" query="select countryId from SalesAreaCountry where salesAreaId ='${albumSalesAreaIds.salesAreaId}'">
<field column="countryId" name="countryId" />
</entity>
<entity name="salesAreaCountry" query="select name from Country where id ='${salesAreaCountryIds.countryId}'">
<field column="name" name="countryName" />
</entity>
<field column="title" name="albumTitle"/>
<field column="removed" name="albumRemovedDate"/>
<field column="productState" name="albumProductState" />
</entity>
</document>
schema.xml
<field name="catchall" type="text_general" stored="true" indexed="true" multiValued="true"/>
<field name="publisher" type="text_general" indexed="true" stored="true"/>
<field name="uuid" type="binary" indexed="false" stored="true"/>
<field name="trackRemovedDate" type="tdate" indexed="true" stored="true"/>
<field name="albumRemovedDate" type="tdate" indexed="true" stored="true"/>
<field name="trackProductState" type="int" indexed="true" stored="true"/>
<field name="albumProductState" type="int" indexed="true" stored="true"/>
<field name="countryCode" type="text_general" indexed="true" stored="true" multiValued="true"/>
<field name="albumTitle" type="text_general" indexed="true" stored="true"/>
<field name="trackTitle" type="text_general" indexed="true" stored="true" multiValued="true"/>
<field name="guid" type="text_general" indexed="true" stored="true"/>
<!--new multivalue fields -->
<field name="albumSalesAreaId" type="int" indexed="true" stored="true" multiValued="true"/>
<field name="salesAreaCountry" type="int" stored="true" indexed="true" multiValued="true"/>
<field name="countryId" type="int" indexed="true" stored="true" multiValued="true"/>
<field name="countryName" type="text_general" indexed="true" stored="true" multiValued="true"/>
<field name="albumSalesArea" type="text_general" indexed="true" stored="true" multiValued="true"/>
sample solr response for id:5
{
"responseHeader": {
"status": 0,
"QTime": 1,
"params": {
"indent": "true",
"q": "id:5",
"_": "1383221233535",
"wt": "json"
}
},
"response": {
"numFound": 1,
"start": 0,
"docs": [
{
"id": "5",
"catchall": [
"5",
"Test Album 5",
"2011-10-21 00:00:00.0",
"[B#261ca3cb",
"Test Track 1",
"Ya man 2",
"2011-10-17 16:21:29.0",
"1",
"1450412569164513280"
],
"albumTitle": "Test Album 5",
"albumRemovedDate": "2011-10-21T00:00:00Z",
"uuid": "6oT/MMl+RDaPyKpGK1KN0w==",
"trackTitle": [
"Test Track 1",
"Ya man 2"
],
"trackRemovedDate": "2011-10-17T16:21:29Z",
"albumSalesAreaId": [
1
],
"_version_": 1450412569164513300
}
]
}
}
SQL result for id:5
trackTitle and albumSalesAreaId seem to be correct, but I'm not sure why the others have not been included. However, if I hard-code the albumSalesAreaNames entity with from SalesArea where id = 1, then the albumSalesArea field is added to the result, so it seems that from SalesArea where id = '${albumSalesAreaIds.salesAreaId}' is returning null, which the earlier 'IN' test also confirmed.
This really looks like a problem that is simply solved with a multivalued field.
If you use multivalued fields in this structure, what you will obtain is one document with ID=10; all the duplicated values will be there just once, and all the other fields will be multivalued. For example, the NAME field will contain 4 different countries, and so will country_code.
Have a look at this article on how to structure your DataImportHandler to achieve this:
http://wiki.apache.org/solr/DataImportHandler#Full_Import_Example
Basically, you need one query for each multivalued field:
<dataConfig>
<dataSource driver="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:/temp/example/ex" user="sa" />
<document name="products">
<entity name="item" query="select * from item">
<field column="ID" name="id" />
<field column="code" name="code" />
<entity name="countryName" query="select name from countrytable where item_id='${item.ID}'">
<field column="name" name="countryName" />
</entity>
<entity name="countryCode" query="select countryCode from countrytable where item_id='${item.ID}'">
</entity>
</entity>
</document>
</dataConfig>
(Posted on behalf of the OP).
SOLUTION
<entity name="albumSalesAreaIds" query="select salesAreaId from AlbumSalesArea where albumId = '${albums.id}'">
<entity name="albumSalesAreaNames" query="select name from SalesArea where id = '${albumSalesAreaIds.salesAreaId}'">
<field column="name" name="albumSalesArea"/>
</entity>
<field column="salesAreaId" name="albumSalesAreaId"/>
</entity>
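The same reasoning would seem to apply to the two country entities from Update 2: salesAreaCountryIds reads ${albumSalesAreaIds.salesAreaId} and salesAreaCountry reads ${salesAreaCountryIds.countryId}, and, as the solution suggests, such a variable is only resolved for entities nested inside the entity it names. A sketch of the fully nested layout, reusing the entity and field names from Update 2 (untested against the poster's database):

```xml
<entity name="albumSalesAreaIds" query="select salesAreaId from AlbumSalesArea where albumId = '${albums.id}'">
  <field column="salesAreaId" name="albumSalesAreaId"/>
  <!-- runs once per sales-area row, so ${albumSalesAreaIds.salesAreaId} is in scope -->
  <entity name="albumSalesAreaNames" query="select name from SalesArea where id = '${albumSalesAreaIds.salesAreaId}'">
    <field column="name" name="albumSalesArea"/>
  </entity>
  <entity name="salesAreaCountryIds" query="select countryId from SalesAreaCountry where salesAreaId = '${albumSalesAreaIds.salesAreaId}'">
    <field column="countryId" name="countryId"/>
    <!-- nested one level deeper because it depends on the country-id row -->
    <entity name="salesAreaCountry" query="select name from Country where id = '${salesAreaCountryIds.countryId}'">
      <field column="name" name="countryName"/>
    </entity>
  </entity>
</entity>
```

Every nesting level issues one SQL query per parent row, so deep nesting can become slow on large tables.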

Struggling with learning solr

I am in the process of redesigning one of our company's sites. My boss wants to play around with the idea of replacing all of our navigation with a search box. The search box should be able to query any of our tables of unrelated data.
So right now I am trying it with 5 tables:
Products
Manufacturers
Category
Ingredients
Uses
So the user should be able to look up a product name, a manufacturer name, a category name, an ingredient name, or a use name.
When I retrieve the results, if the user clicks on a manufacturer search result, it will take them to a manufacturer page that looks up all products for that manufacturer.
When the user clicks on a product result, the link will take them to that product's information.
An ingredient result will take them to a page that shows all products containing that ingredient.
Anyway, here is my data config:
<dataConfig>
<dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/xxx" user="xxx" password="xxx" />
<document>
<entity name="manufacturer" transformer="TemplateTransformer" pk="manNum"
query="SELECT manNum, manName FROM manufacturer
WHERE active = 1">
<field column="id" name="id" template="MAN-${manNum}" />
<field column="type" template="manufacturer" name="type"/>
<field column="manName" name="text"/>
<field column="manNum" name="manNum"/>
</entity>
<entity name="product" transformer="TemplateTransformer"
query="SELECT products.prodNum, products.prodName as text, m.manName FROM products JOIN man m USING (manNum)
WHERE products.active = 1
AND (hideWeb = 0 or hideWeb IS NULL)">
<field column="id" template="PROD-${products.prodNum}" name="id"/>
<field column="type" template="product" name="type"/>
<field column="text" name="text"/>
<field column="manName" name="manName"/>
</entity>
<entity name="ingredients" transformer="TemplateTransformer" pk="id"
query="SELECT id, text FROM inglist WHERE sort != ''">
<field column="id" name="id" template="ING-${inglist.id}"/>
<field column="type" template="ingredient" name="type"/>
<field column="text" name="text" />
</entity>
<entity name="uses" transformer="TemplateTransformer" pk="id"
query="SELECT id, text FROM useslist">
<field column="id" name="id" template="USE-${id}"/>
<field column="type" template="use" name="type"/>
<field column="text" name="text"/>
</entity>
<entity name="categories" transformer="TemplateTransformer" pk="id"
query="SELECT id, textShow as text FROM categorylist">
<field column="id" name="id" template="CATEGORY-${id}"/>
<field column="type" template="category" name="type"/>
<field column="text" name="text"/>
</entity>
</document>
</dataConfig>
And my schema:
<fields>
<field name="id" type="string" indexed="true" stored="true"/>
<field name="text" indexed="true" stored="true" type="text"/>
<field name="type" type="string" indexed="false" stored="true"/>
<field name="manName" type="text" indexed="false" stored="true"/>
<field name="manNum" type="string" indexed="false" stored="false"/>
</fields>
Perhaps I am not doing this the right way, and there may be a better way to handle this.
Anyway, the problem I am running into right now is that I am getting the error missing required field "id". The products query and the manufacturer query do not have an id column in the select, but I thought the template transformer would take care of it? If I do select prodNum as id, then all the ids overwrite each other.
I could probably concat it in the actual query, and will do so as a last resort, but I would like to know what I am doing wrong with this solution.
EDIT
Never mind, it was just a noob issue: for some reason I was thinking that the template variable referred to the table name in the SQL, not the entity name.
So I replaced all of the
With
And it worked.
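The before/after snippets did not survive in the post. Going by the explanation, the change was presumably along these lines (a sketch, not the poster's actual edit): each template variable is prefixed with the entity's name attribute rather than with the SQL table name.

```xml
<entity name="manufacturer" transformer="TemplateTransformer" pk="manNum"
        query="SELECT manNum, manName FROM manufacturer
               WHERE active = 1">
  <!-- variable qualified with the entity name "manufacturer" -->
  <field column="id" name="id" template="MAN-${manufacturer.manNum}" />
  <field column="type" template="manufacturer" name="type"/>
  <field column="manName" name="text"/>
  <field column="manNum" name="manNum"/>
</entity>
<entity name="product" transformer="TemplateTransformer"
        query="SELECT products.prodNum, products.prodName as text, m.manName FROM products JOIN man m USING (manNum)
               WHERE products.active = 1
               AND (hideWeb = 0 or hideWeb IS NULL)">
  <!-- was ${products.prodNum}: "products" is the SQL table, "product" is the entity -->
  <field column="id" template="PROD-${product.prodNum}" name="id"/>
  <field column="type" template="product" name="type"/>
  <field column="text" name="text"/>
  <field column="manName" name="manName"/>
</entity>
```

The other entities would follow the same pattern, for example ${ingredients.id} instead of ${inglist.id}.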
Prefixing the table-specific ID with a distinct character or string is a good idea. I do it in the SQL, which allows me to check the behavior outside of Solr.
select
concat('b',cast(b.id as char)) as id,
...
It was a noob issue:
for some reason I was thinking that the template variable referred to the table name in the SQL, not the entity name.
I do it like this:
<entity name="GG-Boryslaw-1939-Phonebook"
transformer="TemplateTransformer,DateFormatTransformer"
pk="id"
query="SELECT * FROM boryslaw_1939_phonebook">
<field column="record_id" template="GG-Boryslaw-1939-Phonebook-${GG-Boryslaw-1939-Phonebook.id}" />
<field column="record_type" template="phonebook" />
<field column="record_source" template="Boryslaw Phonebook (1939)" />
<field column="record_date" template="${GG-Boryslaw-1939-Phonebook.Year}" dateTimeFormat="yyyy" />
...etc...
</entity>

Multiple indexes in the same Solr core?

I am using Apache Solr and have the following scenario:
I have two tables in my PostgreSQL database. One is "Cars"; the other is "Dealers".
I have a data-config file for Cars like the following:
<document name="offerings">
<entity name="jc_offerings" query="select * from jc_offerings" >
<field column="id" name="id" />
<field column="name" name="name" />
<field column="display_name" name="display_name" />
<field column="extra" name="extra" />
</entity>
</document>
I have a similar data-config.xml for "Dealers". It has the same fields as Cars: name, extra, etc.
In my schema.xml, I have defined the following fields:
<fields>
<field name="id" type="string" indexed="true" />
<field name="name" type="name" indexed="true" />
<field name="extra" type="extra" indexed="true" />
<field name="CarsText" type="text_general" indexed="true"
stored="true" multiValued="true"/>
</fields>
<uniqueKey>id</uniqueKey>
<defaultSearchField>CarsText</defaultSearchField>
<copyField source="name" dest="CarsText"/>
<copyField source="extra" dest="CarsText"/>
Now I want to search like "where name is Maruti". How will Solr know whether to search the Cars field "name" or the Dealers field "name"?
I have read the following link: http://wiki.apache.org/solr/MultipleIndexes
But I am not able to understand how it works.
After reading that link, I added another field to my Cars and Dealers data-config.xml files, something like:
<field name="type" value="car" /> : in the Cars data-config.xml
and
<field name="type" value="dealer" /> : in the Dealers data-config.xml
And then in schema.xml I created a new field:
<field name="type" type="string" indexed="true" stored="true" />
And then I queried something like:
localhost:8983/solr/select?q=name:Maruti&fq=type:dealer
But it didn't work.
So what should I do?
If the fields are the same for both cars and dealers, you could use one index with an object defined like so:
<fields>
<field name="id" type="string" indexed="true" stored="true"/>
<field name="name" type="name" indexed="true" stored="true" />
<field name="extra" type="extra" indexed="true" stored="true" />
<field name="description_text" type="text_general" indexed="true" stored="true" multiValued="true"/>
<field name="type" type="string" indexed="true" stored="true" />
</fields>
This will work for both cars and dealers (so you don't need to have 2 indexes), and you'll use the "type" field to pick out whether you want a "dealer" or a "car" (I'm using the same system to filter similar types of objects with only a minor semantic difference).
Also, you'll need to add stored="true" to the fields you want to retrieve, or you'll only be able to use them for searching (hence the indexed="true").
Adding a default value to the type field will ensure that the type value is set to car|dealer.
You will have to index the sources separately. Then use copyField, and you can easily filter on either car|dealer.
This does seem a bit tricky and is not explained well in the multiple-indexes link referred to above.
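As far as I know, DIH's field element has no value attribute, which may be why the type field from the question never got populated. One way to set a constant per source is the TemplateTransformer with a literal template. The sketch below adapts the Cars config from the question; the car- id prefix and the car_id alias are hypothetical and only serve to keep car and dealer ids apart in the shared index.

```xml
<document name="offerings">
  <!-- id is aliased to car_id so the templated id below does not clash with the raw column -->
  <entity name="jc_offerings" transformer="TemplateTransformer"
          query="select id as car_id, name, display_name, extra from jc_offerings">
    <!-- hypothetical prefix: keeps car ids distinct from dealer ids -->
    <field column="id" template="car-${jc_offerings.car_id}" />
    <field column="name" name="name" />
    <field column="display_name" name="display_name" />
    <field column="extra" name="extra" />
    <!-- constant value: every document from this entity gets type=car -->
    <field column="type" template="car" />
  </entity>
</document>
```

The Dealers config would mirror this with template="dealer"; a request such as q=name:Maruti&fq=type:dealer should then return only dealers.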

Create index on two unrelated table in Solr

I want to create an index over two tables, stock and auction. Basically, I am working on a product site, so I have to index both tables, and they are not related at all.
In the data-config.xml that I created to build the index, I wrote the following:
<dataConfig>
<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/database" user="root" password=""/>
<document name="content">
<entity name="stock" query="select ST_StockID,ST_StockCode,ST_Name,ST_ItemDetail from stock where estatus = 'Active' limit 100">
<field column="ST_StockID" name="stock_ST_StockID" />
<field column="ST_StockCode" name="stock_ST_StockCode" />
<field column="ST_Name" name="stock_ST_Name" />
<field column="ST_ItemDetail" name="stock_ST_ItemDetail" />
<entity name="auction" query="select iauctionid,rad_number,vsku,auction_code from auction limit 100">
<field column="iauctionid" name="auction_iauctionid" />
<field column="rad_number" name="auction_rad_number" />
<field column="vsku" name="auction_vsku" />
<field column="auction_code" name="auction_auction_code" />
</entity>
</entity>
</document>
</dataConfig>
And the schema.xml contains the fields given below.
<field name="stock_ST_StockID" type="string" indexed="true" stored="true" required="true"/>
<field name="stock_ST_StockCode" type="string" indexed="true" stored="true" required="true"/>
<field name="stock_ST_Name" type="string" indexed="true" stored="true" required="true"/>
<field name="stock_ST_ItemDetail" type="text" indexed="true" stored="true" required="true"/>
<field name="auction_iauctionid" type="string" indexed="true" stored="true" required="true"/>
<field name="auction_rad_number" type="string" indexed="true" stored="true" required="true"/>
<field name="auction_vsku" type="string" indexed="true" stored="true" required="true"/>
<field name="auction_auction_code" type="text" indexed="true" stored="true" required="true"/>
But this way the index is built the wrong way, because I put the other table's data inside the first table's entity in data-config.xml. If I create two entity elements as shown below, then the index is not created at all.
<dataConfig>
<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/lc" user="root" password=""/>
<document name="content">
<entity name="stock" query="select ST_StockID,ST_StockCode,ST_Name,ST_ItemDetail from stock where estatus = 'Active' limit 100">
<field column="ST_StockID" name="stock_ST_StockID" />
<field column="ST_StockCode" name="stock_ST_StockCode" />
<field column="ST_Name" name="stock_ST_Name" />
<field column="ST_ItemDetail" name="stock_ST_ItemDetail" />
</entity>
<entity name="auction" query="select iauctionid,rad_number,vsku,auction_code from auction limit 100">
<field column="iauctionid" name="auction_iauctionid" />
<field column="rad_number" name="auction_rad_number" />
<field column="vsku" name="auction_vsku" />
<field column="auction_code" name="auction_auction_code" />
</entity>
</document>
</dataConfig>
I did not get your answer; can you please elaborate a little more? I also have the same requirement. I have two tables, stock and auction. Basically, I am working on a product site, so I have to index both tables, and they are not related at all.
Please help.
Do you get any errors when indexing the data?
The following data config is fine, as you have two unrelated items.
<dataConfig>
<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/lc" user="root" password=""/>
<document name="content">
<entity name="stock" query="select ST_StockID,ST_StockCode,ST_Name,ST_ItemDetail from stock where estatus = 'Active' limit 100">
<field column="ST_StockID" name="stock_ST_StockID" />
<field column="ST_StockCode" name="stock_ST_StockCode" />
<field column="ST_Name" name="stock_ST_Name" />
<field column="ST_ItemDetail" name="stock_ST_ItemDetail" />
</entity>
<entity name="auction" query="select iauctionid,rad_number,vsku,auction_code from auction limit 100">
<field column="iauctionid" name="auction_iauctionid" />
<field column="rad_number" name="auction_rad_number" />
<field column="vsku" name="auction_vsku" />
<field column="auction_code" name="auction_auction_code" />
</entity>
</document>
</dataConfig>
However, there are a few things missing.
What is the id field for each entity? Each document should have a unique id, and that configuration seems to be missing above.
Also, the id should be unique across the entities, else the stock and auction documents would overwrite each other.
So you may want to prefix the id with stock_ and auction_.
You can also add a static field with the value Stock or Auction to your schema and populate it, which would help you filter the results when searching and hence improve performance.
For assigning the ids:
You can use the following to create the id value. This should prefix the ST_StockID field value with Stock_.
<field column="id" template="Stock_#${stock.ST_StockID}" />
OR
Use an alias in SQL, e.g. SELECT CONCAT('Stock_', ST_StockID) AS id ....., and use:
<field column="id" name="id" />
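Putting these suggestions together, a sketch might look like the following. It assumes a hypothetical uniqueKey field id and a hypothetical type field, and it drops required="true" from the per-table fields: with two separate root entities, a stock document has no auction_* values (and vice versa), and Solr rejects documents that lack a required field, which may be why nothing was indexed with the two-entity config.

```xml
<!-- data-config.xml: two independent root entities, each producing its own documents -->
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/lc" user="root" password=""/>
  <document name="content">
    <entity name="stock" transformer="TemplateTransformer"
            query="select ST_StockID,ST_StockCode,ST_Name,ST_ItemDetail from stock where estatus = 'Active' limit 100">
      <!-- prefixed id so stock and auction rows cannot overwrite each other -->
      <field column="id" template="stock_${stock.ST_StockID}" />
      <!-- constant marker used for filtering, e.g. fq=type:stock -->
      <field column="type" template="stock" />
      <field column="ST_StockID" name="stock_ST_StockID" />
      <field column="ST_StockCode" name="stock_ST_StockCode" />
      <field column="ST_Name" name="stock_ST_Name" />
      <field column="ST_ItemDetail" name="stock_ST_ItemDetail" />
    </entity>
    <entity name="auction" transformer="TemplateTransformer"
            query="select iauctionid,rad_number,vsku,auction_code from auction limit 100">
      <field column="id" template="auction_${auction.iauctionid}" />
      <field column="type" template="auction" />
      <field column="iauctionid" name="auction_iauctionid" />
      <field column="rad_number" name="auction_rad_number" />
      <field column="vsku" name="auction_vsku" />
      <field column="auction_code" name="auction_auction_code" />
    </entity>
  </document>
</dataConfig>

<!-- schema.xml: only the fields shared by both document kinds are required -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="type" type="string" indexed="true" stored="true" required="true"/>
<field name="stock_ST_StockID" type="string" indexed="true" stored="true"/>
<field name="auction_iauctionid" type="string" indexed="true" stored="true"/>
<!-- remaining stock_* and auction_* fields: same as before, minus the required attribute -->
<uniqueKey>id</uniqueKey>
```

A search restricted to one table would then add fq=type:stock or fq=type:auction to the request.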
