How to write nested schema.xml in solr? - solr

The comment in the example schema.xml says:
<!-- points to the root document of a block of nested documents. Required for nested
document support, may be removed otherwise
-->
<field name="_root_" type="string" indexed="true" stored="false"/>
http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/solr/collection1/conf/schema.xml?view=markup
This field can be used with the block join query parsers:
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BlockJoinQueryParsers
What should schema.xml look like to nest the following items?
Person string
Address
    city string
    postcode string

I know this is an old question, but I ran into a similar issue. Modifying my solution for yours, the fields you need to add to your schema.xml are as follows:
<field name="person" type="string" indexed="true" stored="true" />
<field name="address" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="address.city" type="string" indexed="true" stored="true" />
<field name="address.postcode" type="string" indexed="true" stored="true" />
Then you should be able to add the following JSON document to your Solr instance and see the matching output in the query results:
{
  "person": "John Smith",
  "address": {
    "city": "San Diego",
    "postcode": 92093
  }
}
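Whether the nested "address" object ends up in the dotted field names depends on how the document is submitted to Solr. If your update path does not do that mapping for you, one option is to flatten the document on the client first. The sketch below is only an illustration of that idea; `flatten` is a hypothetical helper, not part of Solr or any client library.

```python
from typing import Any, Dict


def flatten(doc: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: Dict[str, Any] = {}
    for key, value in doc.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the dotted prefix along.
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


doc = {"person": "John Smith", "address": {"city": "San Diego", "postcode": 92093}}
flat_doc = flatten(doc)
print(flat_doc)
```

The flattened keys ("person", "address.city", "address.postcode") line up with the field names declared in the schema above.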

Related

Search in json nested fields (schema + query)

I have the following JSON stored in a Riak bucket that is indexed for Solr search.
{
  "date" : 1535673489,
  "customer" : {
    "name" : "X",
    "id" : 1205643
  }
}
My schema.xml fields currently look like this:
<field name="_version_" type="long" indexed="true" stored="true"/>
<field name="date" type="int" indexed="true" stored="true" multiValued="true"/>
Searching on date works fine with a query such as:
$RIAK_HOST/search/query/order?wt=json&q=date:[1535553489%20TO%201535599999]
Unfortunately, I haven't found any documentation that explains how to properly define and query a sub-field such as customer.name or customer.id.
Edit: According to the post "Riak search schema and nested fields", it seems that I need to create the fields as follows:
<field name="customer_name" type="string" indexed="true" stored="true" multiValued="true"/>
But when I query on that field, I get no results.
Edit 2: I ran the following experiment and got no error from Riak.
I uploaded this document:
{
"customer_name" : "toto",
"customer" : {
"name" : "tata"
}
}
When searching, Riak returned the value from the top-level field ("toto") and not the nested one ("tata"). Is it possible that nested search is disabled, or that nesting uses a different separator character?
The fields you need to add to your schema.xml are as follows:
<field name="date" type="string" indexed="true" stored="true"/>
<field name="customer" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="customer.name" type="string" indexed="true" stored="true"/>
<field name="customer.id" type="string" indexed="true" stored="true"/>
Then you need to query your index as follows:
$RIAK_HOST/search/query/order?wt=json&q=customer.name:t*
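If you build the query URL in code rather than by hand, the colon and wildcard need to be URL-encoded. A minimal sketch, assuming the same `/search/query/<index>` path as above; `build_search_url` is a hypothetical helper and the host is a placeholder for `$RIAK_HOST`:

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(host: str, index: str, query: str) -> str:
    """Return a search URL for the given index with the query safely encoded."""
    params = urlencode({"wt": "json", "q": query})
    return f"{host}/search/query/{index}?{params}"


# Placeholder host; substitute your own $RIAK_HOST value.
url = build_search_url("http://localhost:8098", "order", "customer.name:t*")
print(url)

# Decoding the query string gives back the original parameters.
decoded = parse_qs(urlsplit(url).query)
print(decoded)
```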

parent child indexing in apache solr

I'm new to Apache Solr search. I can't figure out how to get Solr search results with child documents.
My entity in data-config.xml:
<entity name="products" query="SELECT DISTINCT IDENTIFIER,PDT_NAME,PDT_DESCRIPTION FROM PARENT_TABLE"
deltaQuery="SELECT IDENTIFIER FROM PARENT_TABLE WHERE LAST_MODIFIED_DATE > '${dataimporter.last_index_time}'">
<field column="IDENTIFIER" name="pdtid" />
<field column="PDT_NAME" name="productname" />
<field column="PDT_DESCRIPTION" name="productdescription" />
<entity name="productVersions" child="true" query="SELECT DISTINCT child_id , child_name FROM WHERE IDENTIFIER = '${products.IDENTIFIER}'">
<field column="IDENTIFIER" name="productVersions.pdtesat" />
<field column="VERSION_NUMBER" name="productVersions.versionnum" />
<field column="DISPLAY_NAME" name="productVersions.displayname" />
</entity>
</entity>
field details in managed-schema file:
<field name="pdtid" type="text_general" indexed="true" stored="true" multiValued="false" />
<field name="productname" type="text_general" indexed="true" stored="true" multiValued="true" />
<field name="productnamerrr" type="text_general" indexed="true" stored="true" multiValued="false" />
<field name="productdescription" type="text_general" indexed="true" stored="true" multiValued="false" />
<field name="productVersions.childid" type="text_general" indexed="true" stored="true" multiValued="false" />
<field name="productVersions.versionnum" type="text_general" indexed="true" stored="true" multiValued="false" />
<field name="productVersions.displayname" type="text_general" indexed="true" stored="true" multiValued="false" />
I expect my Solr result to look like this:
"response":{"numFound":26,"start":0,"docs":[
{
"productdescription":" Java",
"productnamerrr":"pdtid",
"pdtid":"6591",
"child_docs" : [
"productVersions":[
"productVersions.childid":"123"
"productVersions.versionnum":"V1"
"productVersions.displayname":"disp"],
"productVersions":[
"productVersions.childid":"456"
"productVersions.versionnum":"V2"
"productVersions.displayname":"disp2"]
],
"id":"92689209-dc5f-4ae6-bd3c-d55dbd0e200c",
"_version_":1599132440456069120},
Please help me get the multiple child docs in JSON format after indexing.
May 2nd edit.
My query result from Solr search looks like this:
"response":{"numFound":38,"start":0,"docs":[
{
"productdescription":" JIRA provides issue (bug) and project tracking
for the software development team.",
"productnamerrr":"Atlassian JIRA",
"productVersions":
["childid:6.x,versionnum:Jira 6.x,displayname :Withdrawn",
"childid:2.0.3,versionnum:Atlassian JIRA,displayname:Planning",
"childid:JIRA Server 5.0.1 - 6.3.15,versionnum:JIRA - JEditor,displayname :Withdrawn",
"childid:1.x,versionnum:Jira 1.x,displayname :Withdrawn"
],
"id":"0b5ba528-ef7a-49ba-a97b-2ea94922cbb5",
"_version_":1599297669816123392},
Edited on May 3-2018
The returned data is correct, but I expect explicit parent-child documents. I am getting the child docs as below:
"productVersions":["childid:6.x,versionnum:Jira 6.x,displayname :Withdrawn",
"childid:2.0.3,versionnum:Atlassian JIRA,displayname:Planning",
"childid:JIRA Server 5.0.1 - 6.3.15,versionnum:JIRA - JEditor,displayname :Withdrawn",
"childid:1.x,versionnum:Jira 1.x,displayname :Withdrawn"
],
I expect output like the following:
"productVersions":[
"productVersions.childid":"123"
"productVersions.versionnum":"V1"
"productVersions.displayname":"disp"],
"productVersions":[
"productVersions.childid":"456"
"productVersions.versionnum":"V2"
"productVersions.displayname":"disp2"]
],
How can I change the query to get each child doc back as a separate entity?
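As a stopgap that does not change what Solr returns, the flattened strings shown above could be split back into key/value pairs on the client. This is only a workaround sketch, not the nested child-document mechanism the question asks about. `parse_child` is a hypothetical helper, and it assumes that values never contain a comma.

```python
from typing import Dict, List


def parse_child(flat: str) -> Dict[str, str]:
    """Parse 'k1:v1,k2:v2' into a dict; assumes values contain no commas."""
    child: Dict[str, str] = {}
    for part in flat.split(","):
        # Split on the first colon only, so values may contain colons.
        key, _, value = part.partition(":")
        child[key.strip()] = value.strip()
    return child


product_versions: List[str] = [
    "childid:6.x,versionnum:Jira 6.x,displayname :Withdrawn",
    "childid:2.0.3,versionnum:Atlassian JIRA,displayname:Planning",
]
children = [parse_child(item) for item in product_versions]
print(children)
```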

Find duplicates objects with solr4 and Haystack

I use Solr's facet mode to find duplicates. It works pretty well, but I can't figure out how to get the object ids.
>>> from haystack.query import SearchQuerySet
>>> sqs = SearchQuerySet().facet('text_string', limit=-1)
>>> sqs.facet_counts()
{
'dates': {},
'fields': {
'text_string': [
('the red ballon', 4),
('my grand pa is an alien', 2),
('be kind rewind', 12),
],
},
'queries': {}
}
How can I get the ids of my objects 'the red ballon', 'my grand pa is an alien', etc.? Do I have to add an id field to Solr's schema.xml?
I'm expecting something like this:
>>> sqs.facet_counts()
{
'dates': {},
'fields': {
'text_string': [
(object_id, 'the red ballon', 4),
(object_id, 'my grand pa is an alien', 2),
(object_id, 'be kind rewind', 12),
],
},
'queries': {}
}
EDIT: Added schema.xml and search_indexes.py
schema.xml for solr
...
<fields>
<!-- general -->
<field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="django_ct" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="django_id" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="_version_" type="long" indexed="true" stored="true"/>
<dynamicField name="*_i" type="int" indexed="true" stored="true"/>
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
<dynamicField name="*_l" type="long" indexed="true" stored="true"/>
<dynamicField name="*_t" type="text_en" indexed="true" stored="true"/>
<dynamicField name="*_b" type="boolean" indexed="true" stored="true"/>
<dynamicField name="*_f" type="float" indexed="true" stored="true"/>
<dynamicField name="*_d" type="double" indexed="true" stored="true"/>
<dynamicField name="*_dt" type="date" indexed="true" stored="true"/>
<dynamicField name="*_p" type="location" indexed="true" stored="true"/>
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>
<field name="text" type="text_en" indexed="true" stored="true" multiValued="false" termVectors="true" />
<field name="title" type="text_en" indexed="true" stored="true" multiValued="false" />
<!-- Used for duplicate content detection -->
<copyField source="title" dest="text_string" />
<field name="text_string" type="string" indexed="true" stored="true" multiValued="false" />
<field name="pk" type="long" indexed="true" stored="true" multiValued="false" />
</fields>
<!-- field to use to determine and enforce document uniqueness. -->
<uniqueKey>id</uniqueKey>
<!-- field for the QueryParser to use when an explicit fieldname is absent -->
<defaultSearchField>text</defaultSearchField>
<!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
<solrQueryParser defaultOperator="AND"/>
...
search_indexes.py
from haystack import indexes
# Video is the Django model being indexed; import it from your own models module.
class VideoIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    pk = indexes.IntegerField(model_attr='pk')
    title = indexes.CharField(model_attr='title', boost=1.125)
    def index_queryset(self, using=None):
        # Only index videos that belong to the current site.
        return Video.on_site.all()
    def get_model(self):
        return Video
Faceting is the arrangement of search results into categories (which are based on indexed terms). Within each category, Solr reports the number of hits for each relevant term, which is called a facet constraint. Faceting makes it easy for users to explore search results on sites such as movie sites and product review sites, where there are many categories and many items within a category.
Here are good examples of it:
faceting example by Yonik
faceting example on the Solr wiki
Facet counts only give you terms and counts, so in your case you may need to run a second query to get the ids and other details.
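The "second query" step can be sketched as follows. To keep the example runnable without Solr, the lookup is passed in as a function and a small in-memory dict stands in for the index. `find_duplicate_ids`, `fake_index` and the ids in it are all made up for illustration.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def find_duplicate_ids(
    facets: Iterable[Tuple[str, int]],
    lookup: Callable[[str], List[int]],
) -> Dict[str, List[int]]:
    """For every facet value that occurs more than once, fetch the matching ids."""
    return {value: lookup(value) for value, count in facets if count > 1}


# Stand-in for the search index so the sketch runs on its own (made-up ids).
fake_index = {
    "the red ballon": [3, 8, 15, 21],
    "my grand pa is an alien": [4, 9],
    "a unique title": [7],
}

facets = [("the red ballon", 4), ("my grand pa is an alien", 2), ("a unique title", 1)]
duplicates = find_duplicate_ids(facets, lambda value: fake_index[value])
print(duplicates)
```

With Haystack, the `facets` argument would be `sqs.facet_counts()['fields']['text_string']`, and the lookup could be something like `lambda v: [r.pk for r in SearchQuerySet().filter(text_string__exact=v)]`. That part is untested here and assumes `text_string` is exposed as a field on the search index.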

solr accurate match through copyField and defaultOperator

In the Solr schema.xml, I configure the fields and copyField rules:
<field name="title" type="text" indexed="true" stored="true" required="false" />
<field name="content" type="text" indexed="true" stored="true" required="false" />
<field name="all" type="text" indexed="true" stored="false" multiValued="true"/>
<copyField source="title" dest="all"/>
<copyField source="content" dest="all"/>
<solrQueryParser defaultOperator="AND"/>
Then I run a data import that includes a document like this: title="sport", content="I like basket".
Now I set the query string to:
all:sport basket
I intend to get this document by matching "sport" in the title field and "basket" in the content field.
That is, when I use "all:sport basket", Solr should hit the document built from this source:
<title>sport</title>
<content>I like basket</content>
But Solr's copyField doesn't seem to do this. Can anyone help?
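One likely cause: in the standard Lucene/Solr query syntax, a field prefix applies only to the term directly after it. In `all:sport basket`, only "sport" is searched in `all`, while "basket" goes to the default field. Grouping the terms in parentheses applies the field to both. A minimal sketch of building such a query string; `fielded_query` is a hypothetical helper, not a Solr API:

```python
from typing import Iterable


def fielded_query(field: str, terms: Iterable[str]) -> str:
    """Build 'field:(t1 t2 ...)' so that every term is searched in `field`."""
    return f"{field}:({' '.join(terms)})"


query = fielded_query("all", ["sport", "basket"])
print(query)  # all:(sport basket)
```

With defaultOperator="AND", the grouped query requires both terms to be present in `all`. The sample document should satisfy that, since copyField puts both the title and the content into that field.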

Multiple Indexes in same Solr Core..?

I am using Apache Solr and have the following scenario:
I have two tables in my PostgreSQL database: "Cars" and "Dealers".
I have a data-config file for Cars like the following:
<document name="offerings">
<entity name="jc_offerings" query="select * from jc_offerings" >
<field column="id" name="id" />
<field column="name" name="name" />
<field column="display_name" name="display_name" />
<field column="extra" name="extra" />
</entity>
</document>
I have a similar data-config.xml for "Dealers". It has the same fields as Cars: name, extra, etc.
In my schema.xml, I have defined the following fields:
<fields>
<field name="id" type="string" indexed="true" />
<field name="name" type="name" indexed="true" />
<field name="extra" type="extra" indexed="true" />
<field name="CarsText" type="text_general" indexed="true"
stored="true" multiValued="true"/>
</fields>
<uniqueKey>id</uniqueKey>
<defaultSearchField>CarsText</defaultSearchField>
<copyField source="name" dest="CarsText"/>
<copyField source="extra" dest="CarsText"/>
Now I want to search for something like "where name is Maruti". How will Solr know whether to search the Cars field "name" or the Dealers field "name"?
I have read the following link: http://wiki.apache.org/solr/MultipleIndexes
But I am not able to understand how it works.
After reading that link, I added another field to my Cars and Dealers data-config.xml files:
<field name="type" value="car" /> : in the Cars data-config.xml
and
<field name="type" value="dealer" /> : in the Dealers data-config.xml
Then in schema.xml I created a new field:
<field name="type" type="string" indexed="true" stored="true" />
Then I queried something like:
localhost:8983/solr/select?q=name:Maruti&fq=type:dealer
But it didn't work.
So what should I do?
If the fields are the same for both cars and dealers, you could use one index with fields defined like so:
<fields>
<field name="id" type="string" indexed="true" stored="true"/>
<field name="name" type="name" indexed="true" stored="true" />
<field name="extra" type="extra" indexed="true" stored="true" />
<field name="description_text" type="text_general" indexed="true" stored="true" multiValued="true"/>
<field name="type" type="string" indexed="true" stored="true" />
</fields>
This will work for both cars and dealers (so you don't need two indexes), and you'll use the "type" field to select whether you want a "dealer" or a "car". (I'm using the same system to filter similar types of objects with only a minor semantic difference.)
Also, you'll need to add stored="true" to the fields you want to retrieve, or you'll only be able to use them for searching (hence the indexed="true").
Adding a default value to the type field will ensure that the type value is always set to car or dealer.
You will have to index the sources separately. Then use copyField, and you can easily filter on either car or dealer.
This does seem a bit tricky, and it is not explained well in the multi-indexes link referred to above.
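The single-index approach can be sketched in plain Python: tag each row with its type before indexing, then filter on that tag. The id prefix is my own assumption, not something from the answers above. Because `id` is the uniqueKey, a car and a dealer with the same database id would otherwise overwrite each other. `tag_docs` is a hypothetical helper, and the list filter at the end only mimics what `q=name:Maruti&fq=type:dealer` would select.

```python
from typing import Any, Dict, Iterable, List


def tag_docs(rows: Iterable[Dict[str, Any]], doc_type: str) -> List[Dict[str, Any]]:
    """Copy each row, add a `type` field, and prefix the id with the type."""
    return [{**row, "id": f"{doc_type}-{row['id']}", "type": doc_type} for row in rows]


# Both tables contain a row with id 1; the prefix keeps the ids distinct.
cars = tag_docs([{"id": 1, "name": "Maruti"}], "car")
dealers = tag_docs([{"id": 1, "name": "Maruti"}], "dealer")
docs = cars + dealers

# In-memory equivalent of q=name:Maruti with fq=type:dealer.
dealer_hits = [d for d in docs if d["type"] == "dealer" and d["name"] == "Maruti"]
print(dealer_hits)
```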
