Solr Indexing Design Requirement

I have 5 tables in a database, namely State, District, City, Locality and Pincode (in that hierarchical order).
Each table has foreign keys to all of its parents in the hierarchy, but some Pincodes may not have a locality id. I am trying to index this data with Solr.
So far I am indexing it as shown below:
<doc>
<str name="state">Punjab</str>
<arr name="district">
<str>test</str>
<str>test1</str>
</arr>
<arr name="city">
<str>abc</str>
<str>dfsdf</str>
</arr>
<arr name="locality">
<str>fggf</str>
<str>gddd</str>
</arr>
<arr name="pincode">
<str>123</str>
<str>345</str>
</arr>
</doc>
But I suspect this is not the correct way to fetch the data, as there is no relation between district and city, city and locality, etc.
Please help me with this.

You are looking at this problem backwards. You need to work from the results. What do you want to find?
Imagine you already have everything working correctly. Given that, what individual record would be in that search result (pincode-level entries?). Then de-normalize down to that level and include all the information required to find that record.
See the presentation from Gilt regarding how they refactored their initial architecture to better reflect their needs. Ignore all the technical details for now, just follow the logical arguments.
Then, you will probably have a (separate) technical question on how to implement it.
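As a rough illustration of de-normalizing down to the pincode level, the sketch below uses Solr's XML update format with one document per pincode, each carrying its full parent chain. The id field and its values, and the pairing of the sample values from the question into rows, are assumptions made for illustration; the second document omits locality to reflect pincodes that have no locality id.

```xml
<add>
  <!-- One document per pincode; every parent level is repeated on it -->
  <doc>
    <field name="id">pincode-123</field> <!-- hypothetical unique key -->
    <field name="state">Punjab</field>
    <field name="district">test</field>
    <field name="city">abc</field>
    <field name="locality">fggf</field>
    <field name="pincode">123</field>
  </doc>
  <!-- A pincode with no locality id simply leaves the locality field out -->
  <doc>
    <field name="id">pincode-345</field>
    <field name="state">Punjab</field>
    <field name="district">test1</field>
    <field name="city">dfsdf</field>
    <field name="pincode">345</field>
  </doc>
</add>
```

With documents shaped like this, a query such as city:abc returns only pincode rows whose own district and locality belong to that city, so the relation between levels is kept.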

Related

How precise should a query specified in Solr's query-related listeners config be?

If I have lots of requests that search for different addresses, can I use a wildcard in the select query of the query-related listeners to warm all addresses? I would like to cache all addresses to make subsequent queries for individual addresses faster. Or is it not possible to use wildcards for caching?
<listener event="newSearcher" class="solr.QuerySenderListener">
<arr name="queries">
<lst>
<str name="q">address:*</str>
<str name="rows">10000</str>
</lst>
</arr>
</listener>
<listener event="firstSearcher" class="solr.QuerySenderListener">
<arr name="queries">
<lst>
<str name="q">address:*</str>
<str name="rows">10000</str>
</lst>
</arr>
</listener>

The query address:* retrieves all documents that have a non-empty value in the address field, but that is not very useful for Solr's filter cache, since a subsequent request would only hit the cache if it used the same wildcard as its filter.
You need to load documents where the address field actually matches a precise value; the wildcard in this context is treated as a single, unique filter by the filter cache, not as a catch-all.
So it is not that caching a wildcard query doesn't work, but it does not warm the cache the way you might expect or need, that is, for every distinct value in the field. (It could be useful as a "shortcut" to warm all possible results, but imagine the cost of warming a wildcard query if the field is not restricted to a finite set.)
Instead, you may have to use filter queries, each intersecting the whole set of documents (this implies a main match-all query q=*:* to which you apply an fq), with one fq per possible value in the field, or per most frequently submitted value if the field is not restricted. This loads each subset of documents by address (or the most frequently requested ones), which is what actually warms the filter cache for each of them.
https://lucene.apache.org/solr/guide/7_3/query-settings-in-solrconfig.html#filtercache
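A minimal sketch of what such warming entries might look like in solrconfig.xml, assuming two frequently requested address values (both placeholders, not from the original question). Each entry is a match-all query with one fq, so each address gets its own filter cache entry; the firstSearcher listener would need the same list.

```xml
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- One warming query per address value; the values are placeholders -->
    <lst>
      <str name="q">*:*</str>
      <str name="fq">address:"12 Example Street"</str>
      <str name="rows">10</str>
    </lst>
    <lst>
      <str name="q">*:*</str>
      <str name="fq">address:"34 Sample Avenue"</str>
      <str name="rows">10</str>
    </lst>
  </arr>
</listener>
```

This only pays off if real requests send the address restriction as an fq that parses to the same filter, since that is what the filter cache looks entries up by.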

Can SOLR calculate and then query based on array position matches?

In our SOLR documents, we store the contributor first and last names as two separate arrays that correspond to one another, along with their role in the publication.
In this case, I'd like to write a query, run from our middleware platform, that only returns documents in which the authors correspond to certain contributor codes. An example is here:
<arr name="Role">
<str>Author</str>
<str>Cover</str>
<str>Author2</str>
<str>Summary</str>
</arr>
<arr name="Forename">
<str>George</str>
<str>John</str>
<str>George</str>
<str>Sue</str>
</arr>
<arr name="Surname">
<str>Anderson</str>
<str>Smith</str>
<str>Anderson</str>
<str>Maryson</str>
</arr>
but the positions of the Role array items will vary with each document...
Here, I'd like only to query Author and Author2, i.e. Forename and Surname array positions 1 and 3 "George" and "Anderson", and ignore John Smith and Sue Maryson.
Is this possible at the query level?
Thanks

Understanding Solr nested queries

I'm trying to understand Solr nested queries but I'm having a problem understanding the syntax.
I have the following two indexed documents (among others):
<doc>
<str name="city">Guarulhos</str>
<str name="name">Fulano Silva</str>
</doc>
<doc>
<str name="city">Fortaleza</str>
<str name="name">Fulano Cardoso Silva</str>
</doc>
If I query for q="Fulano Silva"~2&defType=edismax&qf=name&fl=score I have:
<doc>
<float name="score">28.038431</float>
<str name="city">Guarulhos</str>
<str name="name">Fulano Silva</str>
</doc>
<doc>
<float name="score">19.826164</float>
<str name="city">Fortaleza</str>
<str name="name">Fulano Cardoso Silva</str>
</doc>
So I thought that if I queried for:
q="Fulano Silva"~2 AND __query__="{!edismax qf=city}fortaleza" &defType=edismax&qf=name&fl=score
I'd give a bit more score for the second document, but actually I get an empty result set with numFound=0.
What am I doing wrong here?

You need to remove the "=" and replace it with ":" to use the nested query syntax:
q="Fulano Silva"~2 AND _query_:"{!edismax qf=city}fortaleza" &defType=edismax&qf=name&fl=score
*Use _query_: instead of _query_=
Hope this works...

EDIT: When you say q=, are you specifying the query in a URL, or is the text after the q= being put in an application or the Solr dashboard? If we're talking about a URL, you may need to use percent-encoding to get it to work. I mentioned that below, but since I haven't heard from you, I thought I'd reiterate.
Why don't you do q=name:"Fulano Silva" AND city:"fortaleza"?
Another possibility: q=_query_:"{!edismax qf='name'}Fulano Silva" AND city:"fortaleza"
If you're set on a nested query, select?defType=edismax&q="Fulano Silva" AND _query_:"{!edismax qf='city' v='fortaleza'}" should work, but the results and the way it matches will depend on what analyzers you are using to query and index name and city. Also, if these queries are in your query string, make sure you are encoding them properly.
In order to help you any more, I need to know what you're trying to accomplish with your query. Then perhaps we can be sure you have the right indexing set up, that edismax is the right query handler, etc.

On top of the previous comments, the asker has misspelled _query_ as __query__ (note the double underscores in the second, misspelled, version); Solr expects _query_ to be spelled with only one underscore (_) before and one after the word query, not two.

Replacing SOLR output field value

I have the SOLR query below, which works fine.
query:"COMPLEX CONDITION 1" OR query:"COMPLEX CONDITION 2"
I get 4 documents in the result: 2 from condition 1 and 2 from condition 2. I need to know which condition each document belongs to.
I cannot figure it out from the result, as the conditions are too complex.
What I want to do is change the value of the "status" field in the output.
Let's say status=Active for condition 1 and status=Expired for condition 2.
The current value of status is not accurate, as the status is decided based on the conditions I use.
Is there a way to overwrite the output value of any field(s) in SOLR?

Have you tried using highlighting to determine which documents matched which condition? If you turn on highlighting (&hl=on&hl.fl=<fields_you're_trying_to_match>), Solr will return a structure at the end of the results (whether you're returning results in JSON or XML) called "highlighting". This structure in turn contains structures named after the unique key of each matching document (if your index has one), with the elements that matched.
<lst name="highlighting">
<lst name="1">
<arr name="title">
<str>Bob <em>Jones</em></str>
</arr>
<arr name="category">
<str><em>Jones</em> Family</str>
</arr>
<arr name="description">
<str>This is a book about Bob <em>Jones</em>, the patriarch of the <em>Jones</em> Family.</str>
</arr>
</lst>
</lst>
More here:
How to return column that matched the query in Solr..?
Now I apologize that this doesn't answer the latter part of your question, but gives you some help for the first part.

Filter doc if a specified multivalued field contains only one value

We have a query case where we need to filter out a doc if a specified multivalued field contains only one value.
For instance:
We have an index of suits, each consisting of clothes, trousers or other things. If only one product is left in a suit because the others are out of stock, we can't show the suit to the user, because it is no longer a 'suit'.
Here is our data:
<doc>
<int name="suitId">001</int>
<arr name="productName">
<str>T-shirt</str>
<str>jeans</str>
</arr>
</doc>
<doc>
<int name="suitId">002</int>
<arr name="productName">
<str>T-shirt</str>
</arr>
</doc>
We want to exclude the suit with suitId=002.

It would be better to have a separate field that maintains the count of products for a suit, and to use it to filter the suits.
I don't think you can use range queries for this on multivalued text fields.
You can probably use productName:[* TO *] to select suits having at least one product, but not to filter by count.
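A sketch of the count-field idea, assuming a hypothetical integer field named productCount that the indexing code fills in with the number of productName values (in this sketch Solr does not populate it by itself). The documents use Solr's XML update format.

```xml
<add>
  <doc>
    <field name="suitId">001</field>
    <field name="productName">T-shirt</field>
    <field name="productName">jeans</field>
    <!-- productCount is a hypothetical field set by the indexing code -->
    <field name="productCount">2</field>
  </doc>
  <doc>
    <field name="suitId">002</field>
    <field name="productName">T-shirt</field>
    <field name="productCount">1</field>
  </doc>
</add>
```

A filter such as fq=productCount:[2 TO *] would then keep suit 001 and leave out suit 002.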
