SOLR nested entities query (Oracle SQL DIH) - solr

I want to set up my SOLR (8.5.2) schema in such a way that I can query a parent entity and get the child entities associated with it in the same result. For example:
{
entityId: 1,
entityName: "something",
locations: [ ( <- nested entity)
{
locationId: 1,
locationName: "something"
},
{
locationId: 2,
locationName: "something"
}
]
}
I have managed to import the data from an Oracle Database with nested entities, here's my dataconfig.xml
<document name="entities">
<entity name="entity"
query="select * from LIC_ENTITIES" >
<field column="ENT_ID" name="entityId"/>
<field column="NOMBRE" name="entityName"/>
<entity name="entity_locations"
child="true"
query="select * from LIC_ENTITIES_LOCATIONS where ent_ent_id ='${entity.ENT_ID}'">
<field column="LOC_ID" name="locationId"/>
<field column="NOMBRE" name="locationName"/>
</entity>
</entity>
</document>
Here's the schema.xml fields configuration:
<!-- If you don't use child/nested documents, then you should remove the next two fields: -->
<!-- for nested documents (minimal; points to root document) -->
<field name="_root_" type="string" indexed="true" stored="true" docValues="false" />
<!-- for nested documents (relationship tracking) -->
<field name="_nest_path_" type="_nest_path_" indexed="true" stored="true"/>
<fieldType name="_nest_path_" class="solr.NestPathField" />
<field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true"/>
<field name="entityId" type="pint" docValues="false" indexed="true" stored="true"/>
<field name="entityName" type="string" docValues="false" indexed="true" stored="true"/>
<field name="locationId" type="pint" docValues="false" indexed="true" stored="true"/>
<field name="locationName" type="string" docValues="false" indexed="true" stored="true"/>
All the data is imported and I can query it just fine, but I can't query a parent entity and get the child entities at the same time.
I've tried using the Child Transformer (e.g. [child parentFilter=entityId:274939]) but I get the following error:
Parent filter should not be sent when the schema is nested
I've tried using a Block Join Query (e.g. q={!parent which="entityId:274939"}) but it returns either the parent records or the child records, never both together.
I've tried using a multi-valued field to store the child elements, but that creates a flat array, which makes it harder to select the child elements.
For now I've had to create separate entities and relate them in Node by querying them separately, but I wanted to simplify this by having SOLR deliver the data already formatted the way I want.
Is there any way to achieve this kind of result?

Unfortunately, the currently available DIH does not support this field (_nest_path_), and Solr has dropped support for DIH. DIH now continues as a separate project and has little community support.
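For reference, the error message in the question suggests that with a nested schema the child transformer is used without parentFilter. The following is only a minimal sketch of building such a request. It assumes an index where _nest_path_ is actually populated (which, per the answer above, DIH alone does not do), and the host, port and core name "entities" are hypothetical, not taken from the question.

```python
from urllib.parse import urlencode

# Hypothetical Solr location; adjust host, port and core name to your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/entities/select"


def build_parent_with_children_url(entity_id: int) -> str:
    """Build a select URL that asks Solr to attach child documents to the parent."""
    params = {
        # Match the parent document by its own field.
        "q": f"entityId:{entity_id}",
        # [child] is sent WITHOUT parentFilter, as the error message requires
        # when the schema is nested.
        "fl": "*,[child]",
        "wt": "json",
    }
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


print(build_parent_with_children_url(274939))
```

The printed URL can be opened in a browser or passed to any HTTP client. Whether children come back nested under a named key such as locations still depends on how the documents were indexed.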

Related

Solr Nested Documents: Can't make non-anon nested documents (_nest_path_) work

I am trying to create an index with non-anonymous nested classes. My desired output from solr is:
"responseHeader":{
"status":0,
"QTime":8,
"params":{
"q":"discriminator:project",
"indent":"true",
"fl":"*,[child]",
"q.op":"OR",
"_":"1660714908720"}},
"response":{"numFound":1003,"start":0,"numFoundExact":true,"docs":[
{"name":"Project 1",
"id":"315500",
"discriminator":"project",
"_version_":1741444763087798272,
"publicContacts":[
{
"name":"Gurney Halleck",
"id":"315520",
"discriminator":"publicContact",
"_version_":1741444763087798272},
{
"name":"Thufir Hawat",
"id":"315530",
"discriminator":"publicContact",
"_version_":1741444763087798272}]},
I have read and followed: https://solr.apache.org/guide/8_0/indexing-nested-documents.html
and https://solr.apache.org/guide/8_11/indexing-nested-documents.html#indexing-nested-documents
If I add /just/
<field name="_root_" type="string" indexed="true" stored="false" docValues="false" />
to my schema.xml I can perform a query and get a result with anonymous nested documents returned as childDocuments
"responseHeader":{
"status":0,
"QTime":8,
"params":{
"q":"discriminator:project",
"indent":"true",
"fl":"*,[child]",
"q.op":"OR",
"_":"1660714908720"}},
"response":{"numFound":1003,"start":0,"numFoundExact":true,"docs":[
{"name":"Project 1",
"id":"315500",
"discriminator":"project",
"_version_":1741444763087798272,
"_childDocuments_":[
{
"name":"Gurney Halleck",
"id":"315520",
"discriminator":"publicContact",
"_version_":1741444763087798272},
{
"name":"Thufir Hawat",
"id":"315530",
"discriminator":"publicContact",
"_version_":1741444763087798272}]
},
However, if I add
<fieldType name="_nest_path_" class="solr.NestPathField" />
<field name="_nest_path_" type="_nest_path_" stored="true" />
the nesting relationships are not created at all (not even anonymous childDocuments!) but my nested documents are still put in the index.
I am using DIH to index the documents:
<entity transformer="RegexTransformer" name="project" query="select * from project">
<!-- universal fields -->
<field column="discriminator"/>
<field column="id"/>
<field column="name"/>
<entity child="true" name="publicContacts" query="select * from project_public_contacts where project_id='${project.id}'">
<field column="discriminator"/>
<field column="id"/>
<field column="name"/>
</entity>
</entity>
What am I doing wrong?
After digging into this, I have found this is a defect in Solr's DIH. As of 8/29/20, Apache has determined that this defect will not be fixed due to the deprecation of DIH.
https://issues.apache.org/jira/browse/SOLR-14490?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel
I did find a workaround, which is to populate _nest_path_ yourself in db-data-config.xml. For example:
<entity name="project" query="select * from project">
<!-- universal fields -->
<field column="discriminator"/>
<field column="id"/>
<field column="name"/>
<entity child="true" name="publicContacts" query="select * from project_public_contacts where project_id='${project.id}'">
<field column="discriminator"/>
<field column="id"/>
<field column="name"/>
<field column="nest" name="_nest_path_"/>
</entity>
<entity child="true" name="privateContacts" query="select * from project_private_contacts where project_id='${project.id}'">
<field column="discriminator"/>
<field column="id"/>
<field column="name"/>
<field column="nest" name="_nest_path_"/>
</entity>
</entity>
where the value looks like:
/publicContacts
or whatever you want the property to be named. For more details about how/what the _nest_path_ field should be set to, you can set the field to be stored in schema.xml and then populate the data with the SOLR REST endpoints or other means that are not DIH to see how it's populated. This is how I debugged this issue.
<field name="_nest_path_" type="_nest_path_" stored="true"/>
I also noted that the documentation is incorrect in the SOLR links I provided in my original post. You DO need to have fields defined in schema.xml for the named child documents. I received errors when trying to index through REST endpoints without them. My definitions are:
<field name="publicContacts" type="string" indexed="true" stored="true" required="false" multiValued="true"/>
<field name="privateContacts" type="string" indexed="true" stored="true" required="false" multiValued="true"/>
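The debugging approach described above (indexing through something other than DIH and inspecting the stored _nest_path_) could be sketched as follows. This only builds the JSON payload for a nested document and sends nothing. The field values are taken from the sample output in the question, while the helper name build_project_doc is made up for illustration.

```python
import json


def build_project_doc(project_id, name, public_contacts):
    """Return a parent document with named (non-anonymous) child documents."""
    return {
        "id": project_id,
        "name": name,
        "discriminator": "project",
        # Children are nested under the key they should appear under in results.
        "publicContacts": [
            {"id": cid, "name": cname, "discriminator": "publicContact"}
            for cid, cname in public_contacts
        ],
    }


# Solr's JSON update handler accepts a JSON array of documents.
payload = json.dumps(
    [
        build_project_doc(
            "315500",
            "Project 1",
            [("315520", "Gurney Halleck"), ("315530", "Thufir Hawat")],
        )
    ],
    indent=2,
)
print(payload)
```

The payload would be POSTed to the core's /update handler with Content-Type application/json (the exact URL depends on your installation). Querying the children afterwards with _nest_path_ stored shows the value that the DIH workaround has to reproduce.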

Search in json nested fields (schema + query)

I have the following JSON stored in a Riak bucket that is indexed for Solr search.
{
"date" : 1535673489,
"customer" : {
"name" : "X",
"id" : 1205643
}
}
And my schema.xml fields look like that for the moment
<field name="_version_" type="long" indexed="true" stored="true"/>
<field name="date" type="int" indexed="true" stored="true" multiValued="true"/>
Searching on date works perfectly fine with a query such as:
$RIAK_HOST/search/query/order?wt=json&q=date:[1535553489%20TO%201535599999]
Unfortunately, I haven't found any documentation that explains how to define and query a sub-field such as customer.name or customer.id.
Edit: According to the post "Riak search schema and nested fields", it seems that I need to create the fields as follows:
<field name="customer_name" type="string" indexed="true" stored="true" multiValued="true"/>
But when I query on that field, I get no results.
Edit 2: I ran the following experiment and got no error from Riak.
I uploaded the file
{
"customer_name" : "toto",
"customer" : {
"name" : "tata"
}
}
When searching, Riak matched the value "toto" (the flat customer_name key) and not "tata" (the nested one). Is it possible that nested search is disabled, or that nesting uses a different separator character?
The fields you need to add to your schema.xml are as follows:
<field name="date" type="string" indexed="true" stored="true"/>
<field name="customer" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="customer.name" type="string" indexed="true" stored="true"/>
<field name="customer.id" type="string" indexed="true" stored="true"/>
You then need to query your index as follows:
$RIAK_HOST/search/query/order?wt=json&q=customer.name:t*
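As a small illustration, the query URL above can be assembled in code so that special characters are escaped consistently. This is a sketch only: the host value stands in for $RIAK_HOST and is an assumption, and build_search_url is a hypothetical helper name.

```python
from urllib.parse import urlencode


def build_search_url(riak_host: str, index: str, query: str) -> str:
    """Build a Riak Search query URL of the form used in the answer."""
    params = urlencode({"wt": "json", "q": query})
    return f"{riak_host}/search/query/{index}?{params}"


# Assumed host; replace with the value of $RIAK_HOST.
RIAK_HOST = "http://localhost:8098"

# Nested keys are addressed with a dot, as in the schema fields above.
print(build_search_url(RIAK_HOST, "order", "customer.name:t*"))
# The range query from the question, with spaces escaped automatically.
print(build_search_url(RIAK_HOST, "order", "date:[1535553489 TO 1535599999]"))
```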

parent child indexing in apache solr

I'm new to Apache Solr search. I can't work out how to get Solr search results that include child documents.
My entity in data-config.xml
<entity name="products" query="SELECT DISTINCT IDENTIFIER,PDT_NAME,PDT_DESCRIPTION FROM **PARENT_TABLE**"
deltaQuery="SELECT IDENTIFIER FROM PARENT_TABLE WHERE LAST_MODIFIED_DATE > '${dataimporter.last_index_time}'">
<field column="IDENTIFIER" name="pdtid" />
<field column="PDT_NAME" name="productname" />
<field column="PDT_DESCRIPTION" name="productdescription" />
<entity name="productVersions" child="true" query="SELECT DISTINCT child_id , child_name FROM WHERE IDENTIFIER = '${**products.IDENTIFIER**}'">
<field column="IDENTIFIER" name="productVersions.pdtesat" />
<field column="VERSION_NUMBER" name="productVersions.versionnum" />
<field column="DISPLAY_NAME" name="productVersions.displayname" />
</entity>
</entity>
field details in managed-schema file:
<field name="pdtid" type="text_general" indexed="true" stored="true" multiValued="false" />
<field name="productname" type="text_general" indexed="true" stored="true" multiValued="true" />
<field name="productnamerrr" type="text_general" indexed="true" stored="true" multiValued="false" />
<field name="productdescription" type="text_general" indexed="true" stored="true" multiValued="false" />
<field name="productVersions.childid" type="text_general" indexed="true" stored="true" multiValued="false" />
<field name="productVersions.versionnum" type="text_general" indexed="true" stored="true" multiValued="false" />
<field name="productVersions.displayname" type="text_general" indexed="true" stored="true" multiValued="false" />
I expect my Solr result to be:
"response":{"numFound":26,"start":0,"docs":[
{
"productdescription":" Java",
"productnamerrr":"pdtid",
"pdtid":"6591",
"child_docs" : [
"productVersions":[
"productVersions.childid":"123"
"productVersions.versionnum":"V1"
"productVersions.displayname":"disp"],
"productVersions":[
"productVersions.childid":"456"
"productVersions.versionnum":"V2"
"productVersions.displayname":"disp2"]
],
"id":"92689209-dc5f-4ae6-bd3c-d55dbd0e200c",
"_version_":1599132440456069120},
Please help me in getting the multiple child docs in json format after indexing.
May 2nd edit.
My query result from solr search like below.
"response":{"numFound":38,"start":0,"docs":[
{
"productdescription":" JIRA provides issue (bug) and project tracking
for the software development team.",
"productnamerrr":"Atlassian JIRA",
"productVersions":
["childid:6.x,versionnum:Jira 6.x,displayname :Withdrawn",
"childid:2.0.3,versionnum:Atlassian JIRA,displayname:Planning",
"childid:JIRA Server 5.0.1 - 6.3.15,versionnum:JIRA - JEditor,displayname :Withdrawn",
"childid:1.x,versionnum:Jira 1.x,displayname :Withdrawn"
],
"id":"0b5ba528-ef7a-49ba-a97b-2ea94922cbb5",
"_version_":1599297669816123392},
Edited on May 3-2018
The returned data is correct, but I expect the children as explicit parent/child documents. Currently I get the child docs as below:
"productVersions":["childid:6.x,versionnum:Jira 6.x,displayname :Withdrawn",
"childid:2.0.3,versionnum:Atlassian JIRA,displayname:Planning",
"childid:JIRA Server 5.0.1 - 6.3.15,versionnum:JIRA - JEditor,displayname :Withdrawn",
"childid:1.x,versionnum:Jira 1.x,displayname :Withdrawn"
],
I expect something like this:
"productVersions":[
"productVersions.childid":"123"
"productVersions.versionnum":"V1"
"productVersions.displayname":"disp"],
"productVersions":[
"productVersions.childid":"456"
"productVersions.versionnum":"V2"
"productVersions.displayname":"disp2"]
],
How can I change the query to get each child doc as a separate entity?

Solr Stats on Tuple

I'm trying to use the Stats Component in my Datastax Solr instance.
The part of the schema I'm trying to get stats on looks like this:
<field name="foo" type="tuple" indexed="true" stored="true"/>
<field name="foo.start" type="bigint" indexed="false" stored="true"/>
<field name="foo.end" type="bigint" indexed="false" stored="true"/>
<field name="foo.time" type="int" indexed="true" stored="true"/>
However, when I try and use stats=true&stats.field={!tuple}foo.time with a *:* query I get the following:
"stats": {
"stats_fields": {
"foo.time": null
}
}
Is it not possible to use a {!tuple} for stats?
Unfortunately, this is currently not supported. You may still contact Datastax support for further information.

How to index columns with same name but different data in solr

I have two tables, and both have a delete_status column, but these columns hold different data.
CODE:(data-config.xml)
<entity name="category_masters" query="SELECT
category_updated,delete_status,category_id,category_name FROM category_masters
where category_id='${type_masters.category_id}'">
category_id=${category_masters.category_id}">
<field column="category_id" name="id"/>
<field column="category_name" name="category_name" indexed="true" stored="true" />
**<field column="delete_status" name="delete_status" indexed="true" stored="true" />**
<field column="category_updated" name="category_updated" indexed="true"
stored="true" />
</entity>
<entity name="type_masters" pk="type_id" query="SELECT
type_updated,delete_status as type_masters_delte,type_id,category_id,type_name FROM type_masters
where type_id='${businessmasters.Business_Type}' ">
<field column="type_id" name="id"/>
<field column="category_id" name="category_id" indexed="true" stored="true" />
<field column="type_name" name="type_name" indexed="true" stored="true" />
**<field column="delete_status" name="delete_status" indexed="true" stored="true" />**
<field column="type_updated" name="type_updated" indexed="true" stored="true" />
How do I display data from both columns? I tried aliasing the columns but it does not work.
When I query, I only see one delete_status field, and even if I make it multivalued, how do I tell which delete_status belongs to which table?
I want the data separately and can't make changes in the database.
In your case, I would use the DIH. You can define a join in data-config.xml to merge both tables.
The queries in that file support aliases for column names, like table1.delete_status as type_masters_delete
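To make the join-with-aliases idea concrete, here is a minimal sketch that runs such a query against an in-memory SQLite database. The table and column names follow the question, while the sample rows and the alias category_masters_delete are invented for illustration. In DIH, the same SELECT would go into the entity's query attribute, and each field mapping would presumably need to reference the alias (for example column="type_masters_delete") rather than delete_status.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allow access to columns by name

# Invented sample data, shaped like the tables in the question.
conn.executescript(
    """
    CREATE TABLE category_masters (category_id INTEGER, category_name TEXT, delete_status INTEGER);
    CREATE TABLE type_masters (type_id INTEGER, category_id INTEGER, type_name TEXT, delete_status INTEGER);
    INSERT INTO category_masters VALUES (1, 'Food', 0);
    INSERT INTO type_masters VALUES (10, 1, 'Bakery', 1);
    """
)

# Each delete_status gets its own alias, so both survive in one result row.
JOIN_QUERY = """
SELECT t.type_id,
       t.type_name,
       t.delete_status AS type_masters_delete,
       c.category_name,
       c.delete_status AS category_masters_delete
FROM type_masters t
JOIN category_masters c ON c.category_id = t.category_id
"""

row = conn.execute(JOIN_QUERY).fetchone()
print(dict(row))
```

Because the two aliases are distinct column names, they can be mapped to two distinct Solr fields, which removes the ambiguity of a single multivalued delete_status.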
