I have a Solr server with data under this format:
{
id: 1,
text_1: "some_text1",
text_2: "some_text2",
},
{
id: 2,
text_1: "some_text1",
text_2: "some_text2",
}
I need to find documents like the ones I wrote above. Documents that have the same "text_1" and "text_2" values but different ids.
I've tried using facets, but I'm not sure if it helps. Firstly, it only returns a count of the duplicates and I need the id's of these documents. Secondly, I'm not sure that faceting over multiple fields does what I want. I'm not sure that:
facet.field=text_1&facet.field=text_2 shows me a count of documents that have both those fields.
Thank you, I don't know much about Solr. Any help is greatly appreciated!
I think facets are your best bet to get this done, but as you noticed you will need to issue at least two queries: one to get the facets and another to fetch the actual documents that belong to the facet (i.e. the duplicates in your case)
To get the multi facets to work for what you are trying to do you'll need to use PivotFaceting (https://lucene.apache.org/solr/guide/7_0/faceting.html#pivot-decision-tree-faceting). The syntax is facet=on&facet.pivot=field1,field2
Make sure the field that you use for facets is a string field and not a text field.
Related
I am using Solr version 3.5. I want to implement an auto-suggest feature in my application through the Suggester approach. http://wiki.apache.org/solr/Suggester.
Can someone please help me with the following:
How can i return more than one fields in the query response. For example, i am trying to create an index based on the 'name' field, but i also want to return an 'id' field where these two fields are the product attributes i am search for [say movie titles]. Hence, the response should include both the 'id' and 'title' of the product
How can i do a case-insensitive search using Suggester? For example, a search term "abc" should return documents containing the name as "ABC", "Abc" etc.
Please help.
Regards.
If you're looking to get suggestions on a particular field but also return other fields in the document, you can use the 'Payload' tag. Only one payload field is allowed, but you can get around this by using a json format in the field.
https://cwiki.apache.org/confluence/display/solr/Suggester
https://stackoverflow.com/a/32558487/578582
I think you're not quite getting the point of the suggester. It is not designed to return suggestions for exactly one search result per entry (this is the only scenario where returning the ID would make sense).
You could, however, do normal wildcard searches on the title field and use the returned titles as suggestions. This way you could also get the ID (and any other index field) with the results. I imagine this could be implemented fairly easily with jQuery UI. It may be much slower than the suggest API, depending on your index schema design.
if you are not really interested in the order of the suggestions i found that the weight_field can be [ab]used to return the document id for each suggestion
Imagine an index like the following:
id partno name description
1 1000.001 Apple iPod iPod by Apple
2 1000.123 Apple iPhone The iPhone
When the user searches for "Apple" both documents would be returned. Now I'd like to give the user the possibility to narrow down the results by limiting the search to one or more fields that have documents containing the term "Apple" within those fields.
So, ideally, the user would see something like this in the filter section of the ui after his first query:
Filter by field
name (2)
description (1)
When the user applies the filter for field "description", only documents which contain the term "Apple" within the field "description" would be returned. So the result set of that second request would be the iPod document only. For that I'd use a query like ?q=Apple&qf=description (I'm using the Extended DisMax Query Parser)
How can I accomplish that with Solr?
I already experimented with faceting, grouping and highlighting components, but did not really come to a decent solution to this.
[Update]
Just to make that clear again: The main problem here is to get the information needed for displaying the "Filter by field" section. This includes the names of the fields and the hits per field. Sending a second request with one of those filters applied already works.
Solr just plain Doesn't Do This. If you absolutely need it, I'd try it the multiple requests solution and benchmark it -- solr tends to be a lot faster than what people put in front of it, so an couple few requests might not be that big of a deal.
you could achieve this with two different search requests/queries:
name:apple -> 2 hits
description:apple -> 1 hit
EDIT:
You also could implement your own SearchComponent that executes multiple queries in the background and put it in the SearchHandler processing chain so you only will need a single query in the frontend.
if you want the term to be searched over the same fields every time, you have 2 options not breaking the "single query" requirement:
1) copyField: you group at index time all the fields that should match togheter. With just one copyfield your problem doesn't exist, if you need more than one, you're at the same spot.
2) you could filter the query each time dynamically adding the "fq" parameter at the end
http://<your_url_and_stuff>/?q=Apple&fq=name:Apple ...
this works if you'll be searching always on the same two fields (or you can setup them before querying) otherwise you'll always need at least a second query
Since i said "you have 2 options" but you actually have 3 (and i rushed my answer), here's the third:
3) the dismax plugin described by them like this:
The DisMaxQParserPlugin is designed to process simple user entered phrases
(without heavy syntax) and search for the individual words across several fields
using different weighting (boosts) based on the significance of each field.
so, if you can use it, you may want to give it a look and start from the qf parameters (that is what the option number 2 wanted to be about, but i changed it in favor of fq... don't ask me why...)
SolrFaceting should solve your problem.
Have a look at the Examples.
This can be achieved with Solr faceting, but it's not neat. For example, I can issue this query:
/select?q=*:*&rows=0&facet=true&facet.query=title:donkey&facet.query=text:donkey&wt=json
to find the number of documents containing donkey in the title and text fields. I may get this response:
{
"responseHeader":{"status":0,"QTime":1,"params":{"facet":"true","facet.query":["title:donkey","text:donkey"],"q":"*:*","wt":"json","rows":"0"}},
"response":{"numFound":3365840,"start":0,"docs":[]},
"facet_counts":{
"facet_queries":{
"title:donkey":127,
"text:donkey":4108
},
"facet_fields":{},
"facet_dates":{},
"facet_ranges":{}
}
}
Since you also want the documents back for the field-disjunctive query, something like the following works:
/select?q=donkey&defType=edismax&qf=text+titlle&rows=10&facet=true&facet.query=title:donkey&facet.query=text:donkey&wt=json
I am using apache Solr for searching my database..!!
Suppose i have indexed 4 columns from one of my table..!!..I just want that only those columns that contains my query term are returned in response..!!..is that possible..??
For example :
I have a table cars with columns : name, displayName, description, extra ..!!
Now i make a query , something like :
localhost:8983/solr/select?q=maruti&wt=json
Now some in some rows only name may contain the word "maruti"
So, In return, i want only name (along with some other fixed fields like ID) ..
Similarly, If description contains this word, then only description should be returned..and not other columns..!!
How can i acheive this..??
You may be able to do this with Solr 4 and a custom transformer - my reading of the documentation would seem to indicate as much. But it would be quite a bit of work, I think. Ultimately you may have to write a front-end filter, but that would be difficult with complex queries.
Update:
Here's how to do this in Solr without custom transformers, etc. Enable highlighting for all four columns:
hl=on&hl.fl=name,displayName,description,extra
Solr will return a "highlighting" structure containing the key and the field(s) that match the query. You will also get highlighted snippets, whether you use them is up to you. See here for additional params: http://wiki.apache.org/solr/HighlightingParameters
I need a way to get facets on two combined field names. To show you what I mean, take a look at the query as it is now:
{
"responseHeader":{
"status":0,
"QTime":16,
"params":{
"facet":"true",
"indent":"true",
"q":"productId:(1 OR 2 OR 3 OR 4)",
"facet.field":["productMetaType",
"productId"],
"rows":"10"}},
"response":{"numFound":4,"start":0,"docs":[
{
"productId":1,
"productMetaType":"PRIMARY_PHOTO",
"url":"1_PRIM.JPG"},
{
"productId":1,
"productMetaType":"OTHER_PHOTO",
"url":"1_1.JPG"},
{
"productId":1,
"productMetaType":"OTHER_PHOTO",
"url":"1_2.JPG"},
{
"productId":2,
"productMetaType":"OTHER_PHOTO",
"url":"2_1.JPG"}]
},
"facet_counts":{
"facet_queries":{},
"facet_fields":{
"productMetaType":[
"PRIMARY_PHOTO",1,
"OTHER_PHOTO",3],
"productId":[
"1",3,
"2",1]},
"facet_dates":{},
"facet_ranges":{}
}
}
I get two facet fields, productMetaType and productId. What I need to do is somehow combine those fields so I get data back something like this:
1_PRIMARY_PHOTO, 1,
1_OTHER_PHOTO, 2,
2_PRIMARY_PHOTO, 0,
2_OTHER_PHOTO, 1
Does the pivot functionality do this? Unfortunately, we're running Solr 3.1, so pivot isn't available, but if that is the only way to do this, I might have some ammo for upgrading.
The only other thing I could think of was some how concatenating the field names. I am new to Solr and don't know what is possible. Any advice or assistance is appreciated. Thank you for your time.
Yes, Pivot would work do the trick, but as you observed, this feature is only available in Solr trunk.
Your idea to combine both fields would work too. Actually, if your fields have a limited number of values, the easiest and most flexible way to do this would be to use facet queries:
productId:1 AND productMetaType:PRIMARY_PHOTO
productId:2 AND productMetaType:OTHER_PHOTO
productId:1 AND productMetaType:OTHER_PHOTO
productId:2 AND productMetaType:PRIMARY_PHOTO
Otherwise, just create a new field in your Solr schema.xml with string type, recreate your index by adding your documents as previously, but with this new field (that you can generate as you wish, using '_' as a separator between the two field values would work perfectly).
Solr newbie here.
I have created a Solr index and write a whole bunch of docs into it. I can see
from the Solr admin page that the docs exist and the schema is fine as well.
But when I perform a search using a test keyword I do not get any results back.
On entering * : *
into the query (in Solr admin page) I get all the results.
However, when I enter any other query (e.g. a term or phrase) I get no results.
I have verified that the field being queried is Indexed and contains the values I am searching for.
So I am confused what I am doing wrong.
Probably you don't have a <defaultSearchField> correctly set up. See this question.
Another possibility: your field is of type string instead of text. String fields, in contrast to text fields, are not analyzed, but stored and indexed verbatim.
I had the same issue with a new setup of Solr 8. The accepted answer is not valid anymore, because the <defaultSearchField> configuration will be deprecated.
As I found no answer to why Solr does not return results from any fields despite being indexed, I consulted the query documentation. What I found is the DisMax query parser:
The DisMax query parser is designed to process simple phrases (without complex syntax) entered by users and to search for individual terms across several fields using different weighting (boosts) based on the significance of each field. Additional options enable users to influence the score based on rules specific to each use case (independent of user input).
In contrast, the default Lucene parser only speaks about searching one field. So I gave DisMax a try and it worked very well!
Query example:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video
You can also specify which fields to search exactly to prevent unwanted side effects. Multiple fields are separated by spaces which translate to + in URLs:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video&qf=features+text
Last but not least, give the fields a weight:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video&qf=features^20.0+text^0.3
If you are using pysolr like I do, you can add those parameters to your search request like this:
results = solr.search('search term', **{
'defType': 'dismax',
'qf': 'features text'
})
In my case the problem was the format of the query. It seems that my setup, by default, was looking and an exact match to the entire value of the field. So, in order to get results if I was searching for the sit I had to query *sit*, i.e. use wildcards to get the expected result.
With solr 4, I had to solve this as per Mauricio's answer by defining type="text_en" to the field.
With solr 6, use text_general.