Apache Solr query API: how to search for a perfect match of field values - solr

In our Solr index a page has tags like 1,2,3 and a user's tags might be 1,2,3,4.
If the user's tags contain all of the page's tags, then we want to give the user access.
Spring Data Solr code below:
Criteria criteria = new Criteria("page tags").in(usertags.getTags());
The code above performs an OR check, i.e. it matches if the user has 1, 2 or 3. However, I am looking for a match on all of the page tags 1, 2, 3. Is there any way to achieve this?
Thanks in advance.

Related

Get documents whose field has more than N elements

In Solr I have documents with fields like below:
"geolocation": [
"40.154400,-75.279900",
"40.117416,-75.119203",
"40.23931,-75.23126",
"40.18417,-75.07946"
]
I would like to get documents whose geolocation field has more than 3 items, such as the one above.
How can I write this filter in Solr?
I am looking for something like:
len(geolocation) >= 3
Upon indexing, just add the length of that field to another, custom field, then query the latter. There are several ways you can do this:
prepare the new field value on the client side (see the sketch after this list), or
use the built-in CountFieldValuesUpdateProcessorFactory; the example in the docs does exactly what you want.
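A minimal sketch of the client-side option using SolrJ, assuming a SolrClient named solrClient, a List<String> named geolocations, and a hypothetical integer field geolocation_count:

// Index time: store the number of geolocation values in a separate
// integer field so it can be filtered on later.
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "some-id");
for (String point : geolocations) {
    doc.addField("geolocation", point);
}
doc.addField("geolocation_count", geolocations.size());
solrClient.add("mycollection", doc);

// Query time: fq=geolocation_count:[3 TO *]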

Solr - Remove collapsed groups from final result

I'm using Apache Solr 7.1 and the field collapsing feature to group documents based on a field.
Sample Document:
{id: "ASDF1234",count: 10, event: "Create"}
Sample request: http://localhost:8983/solr/brandNewComp000/select?fq={!collapse%20field=id%20sort=count%20desc}&q=*:*&rows=30
Grouping is working fine. But in the final response I want to exclude a few documents based on a condition on the event field. That is, I want to exclude a few collapsed documents from the final response.
Is it possible to do that?
Note: If I add another filter query (fq) or query (q) to filter on the 'event' field, then that filtering happens before grouping, which is NOT the behavior I am looking for. I want to exclude documents after collapsing is done. Please guide me.
I don't understand why you don't want to filter out before the grouping. That is a reasonable approach. Otherwise, you may have to filter by yourself in your application.
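If you do filter in the application, a rough SolrJ sketch might look like this (assuming a SolrClient named solrClient; the exclusion rule on event = "Create" is just a placeholder):

SolrQuery query = new SolrQuery("*:*");
query.addFilterQuery("{!collapse field=id sort='count desc'}");
query.setRows(30);
QueryResponse response = solrClient.query("brandNewComp000", query);

// Drop collapsed documents whose event value should be excluded
// (the "Create" condition here is only an example).
List<SolrDocument> kept = new ArrayList<>();
for (SolrDocument d : response.getResults()) {
    if (!"Create".equals(d.getFieldValue("event"))) {
        kept.add(d);
    }
}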

Solr - How do I get the number of documents for each field containing the search term within that field in Solr?

Imagine an index like the following:
id  partno    name          description
1   1000.001  Apple iPod    iPod by Apple
2   1000.123  Apple iPhone  The iPhone
When the user searches for "Apple" both documents would be returned. Now I'd like to give the user the possibility to narrow down the results by limiting the search to one or more fields that have documents containing the term "Apple" within those fields.
So, ideally, the user would see something like this in the filter section of the UI after their first query:
Filter by field
name (2)
description (1)
When the user applies the filter for field "description", only documents which contain the term "Apple" within the field "description" would be returned. So the result set of that second request would be the iPod document only. For that I'd use a query like ?q=Apple&qf=description (I'm using the Extended DisMax Query Parser)
How can I accomplish that with Solr?
I already experimented with faceting, grouping and highlighting components, but did not really come to a decent solution to this.
[Update]
Just to make that clear again: The main problem here is to get the information needed for displaying the "Filter by field" section. This includes the names of the fields and the hits per field. Sending a second request with one of those filters applied already works.
Solr just plain Doesn't Do This. If you absolutely need it, I'd try the multiple-requests solution and benchmark it -- Solr tends to be a lot faster than what people put in front of it, so a couple of extra requests might not be that big of a deal.
You could achieve this with two different search requests/queries:
name:apple -> 2 hits
description:apple -> 1 hit
EDIT:
You could also implement your own SearchComponent that executes multiple queries in the background and put it in the SearchHandler processing chain, so you will only need a single query from the frontend.
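A very rough skeleton of that SearchComponent idea; the class name, response key, and per-field counting are assumptions only hinted at in comments, not working code from the answer:

import java.io.IOException;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class FieldHitCountComponent extends SearchComponent {

    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        // nothing to prepare in this sketch
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        // Here you would run one sub-query per field (e.g. name:<term> and
        // description:<term>) against rb.req.getSearcher(), collect the
        // hit counts, then attach them to the response, e.g.:
        // rb.rsp.add("fieldHits", counts);
    }

    @Override
    public String getDescription() {
        return "Counts hits per field for the user's search term";
    }
}

It would then be registered in solrconfig.xml and added to the SearchHandler's component list, as the answer suggests.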
If you want the term to be searched over the same fields every time, you have two options that don't break the "single query" requirement:
1) copyField: you group at index time all the fields that should match together. With just one copyField your problem doesn't exist; if you need more than one, you're back at the same spot.
2) you could filter the query each time by dynamically adding the "fq" parameter at the end:
http://<your_url_and_stuff>/?q=Apple&fq=name:Apple ...
This works if you'll always be searching on the same two fields (or you can set them up before querying); otherwise you'll always need at least a second query.
Since I said "you have two options" but you actually have three (I rushed my answer), here's the third:
3) the DisMax plugin, described in the docs like this:
The DisMaxQParserPlugin is designed to process simple user entered phrases
(without heavy syntax) and search for the individual words across several fields
using different weighting (boosts) based on the significance of each field.
So, if you can use it, you may want to give it a look and start from the qf parameter (that is what option number 2 was going to be about, but I changed it in favor of fq... don't ask me why...).
Solr faceting should solve your problem.
Have a look at the Examples.
This can be achieved with Solr faceting, but it's not neat. For example, I can issue this query:
/select?q=*:*&rows=0&facet=true&facet.query=title:donkey&facet.query=text:donkey&wt=json
to find the number of documents containing donkey in the title and text fields. I may get this response:
{
  "responseHeader": {
    "status": 0,
    "QTime": 1,
    "params": {
      "facet": "true",
      "facet.query": ["title:donkey", "text:donkey"],
      "q": "*:*",
      "wt": "json",
      "rows": "0"
    }
  },
  "response": {"numFound": 3365840, "start": 0, "docs": []},
  "facet_counts": {
    "facet_queries": {
      "title:donkey": 127,
      "text:donkey": 4108
    },
    "facet_fields": {},
    "facet_dates": {},
    "facet_ranges": {}
  }
}
Since you also want the documents back for the field-disjunctive query, something like the following works:
/select?q=donkey&defType=edismax&qf=text+title&rows=10&facet=true&facet.query=title:donkey&facet.query=text:donkey&wt=json
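For completeness, the same request issued from SolrJ, reading the per-field counts back for the "Filter by field" section (the SolrClient named solrClient and the collection name "products" are assumptions):

SolrQuery query = new SolrQuery("Apple");
query.set("defType", "edismax");
query.set("qf", "name description");
query.addFacetQuery("name:Apple");
query.addFacetQuery("description:Apple");

QueryResponse response = solrClient.query("products", query);
// e.g. {name:Apple=2, description:Apple=1} -> drives the "Filter by field" UI
Map<String, Integer> hitsPerField = response.getFacetQuery();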

Solr End User Query Translation

I am wondering if there is any way to transform an end-user query into a more complicated Solr query based on some rules.
For example, if the user types in 32" television, then I want to use the dismax query parser to let solr take care of this user query string like below:
http://localhost:8983/solr/select/?q=32" television&defType=dismax
However, if the user types in "televisions on sale", then I want to do a regular search for the token televisions with the isOnSale flag set to true, like below:
http://localhost:8983/solr/select/?q=name:televisions AND isOnSale:true
Is this possible? Or does this logic require an advanced search form where the user can clearly state, via a checkbox, that they only want on-sale items?
Thanks.
Transforming the user query is quite possible. You can do it in the following two ways:
implement a Servlet Filter that intercepts the user query and transforms it before dispatching it to the Solr request handler, or
look at the query parser plugin mechanism in Solr and implement your own, based on an existing one such as the standard query parser, modified to apply your transformation rules (a sketch of the rewrite rule follows below).
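A minimal sketch of the rewrite rule itself; whether it lives in a servlet filter or a QParserPlugin, something would call a method like this before the query reaches the dismax handler (the method name and the "on sale" rule are just the example from the question):

static String rewriteUserQuery(String q) {
    if (q != null && q.toLowerCase().contains("on sale")) {
        // "televisions on sale" -> name:televisions AND isOnSale:true
        String term = q.toLowerCase().replace("on sale", "").trim();
        return "name:" + term + " AND isOnSale:true";
    }
    // everything else (e.g. 32" television) goes to dismax unchanged
    return q;
}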
Let the search happen through the whole index and let the user choose. If a review shows up, render it with the appropriate view. If a product shows up, offer to search for more products.
Samsung 32 in reviews --read more
LG 32 in offers --find more like this
Your offers page can offer more options, such as filtering products on sale.
You may use a global boost field on documents. For example, a product on sale has a score of 1.0 while out-of-stock products have 0.33. A review of a new product has 1.0; reviews of old products have less.
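With edismax, such a boost field could be applied as a multiplicative boost; assuming a float field named productBoost, the request might look like:
http://localhost:8983/solr/select/?q=television&defType=edismax&boost=productBoost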
Maybe you can set up the search so that isOnSale is a secondary sort parameter: sort by score first and then by isOnSale, or just sort by isOnSale. That way you will still get all "television" results, just with the ones on sale on top.
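Assuming isOnSale is a sortable boolean field, such a request might look like:
http://localhost:8983/solr/select/?q=television&defType=dismax&sort=score desc,isOnSale desc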

Nutch crawler not indexing HTML content

I am trying to develop a search functionality where I enter a city name and it gives me the weather conditions for that city.
I have set up Nutch 1.3 and Solr 3.4.0 on my system. The website I am crawling is here, and I pass the index to Solr for searching. Now, I want to retrieve the information displayed on this link when querying for delhi.
How can I achieve this? Does it require any plugin to be written?
<doc>
  <float name="score">1.0</float>
  <float name="boost">0.1879294</float>
  <str name="content"/>
  <str name="digest">d41d8cd98f00b204e9800998ecf8427e</str>
  <str name="id">http://www.imd.gov.in/section/nhac/distforecast/delhi.htm</str>
  <str name="segment">20111118153543</str>
  <str name="title"/>
  <date name="tstamp">2011-11-18T10:06:45.604Z</date>
  <str name="url">http://www.imd.gov.in/section/nhac/distforecast/delhi.htm</str>
</doc>
Nutch basically crawls through links on the pages.
However, there are no links on the India page for it to reach the Delhi page mentioned by you.
So it won't be able to navigate it down to that page.
You can create your own dummy HTML page, acting as the start URL for indexing, and put in it all the links you want Nutch to index.
What's the default search field in your schema?
Usually it's the text field, and querying for delhi would look in that field for matches.
Since *:* returns the Delhi result and delhi does not, the query is not matching the indexed tokens in the field it is searching on.
What's the field type defined for url in the schema?
You can copy the field to another field with text analysis that would produce the delhi token; querying for url_copy:delhi should then return the results.
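A hedged sketch of that copyField setup in schema.xml; the url_copy name comes from the answer, while text_general is only an example of a field type whose analysis breaks the URL into word tokens:

<field name="url_copy" type="text_general" indexed="true" stored="false"/>
<copyField source="url" dest="url_copy"/>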
