How to order groups by count in Solr

I'm wondering how to order groups in a Solr result. I want to order the groups by numFound. I saw how to order the groups by score here, but that didn't seem to actually make a difference in the examples I looked at, and isn't exactly what I wanted.
In the XML you can see the number of documents per group as numFound; that is what I want to sort the groups by, so that, for example, the largest group appears at the top.
<arr name="groups">
<lst>
<str name="groupValue">top secret</str>
<result name="doclist" numFound="12" start="0">
...
Any tips appreciated! Thanks!

This is an old question, but it is possible with two queries.
First query: bring back the field you're grouping by as a set of facets for your navigation state. You can limit the number of records returned to 0 here: you just need the facets. The number of facets you return should be the size of your page.
group_id:
23 (6)
143 (3)
5 (2)
Second query: this one fetches the records, so no facets are required. The query should OR together the facet field values returned by the first query (group_id:23 OR group_id:143 OR group_id:5 and so on) and be grouped by the same field you are using for grouping.
Sorting: reorder the groups from query 2 to match the order from query 1.
That'll do it, with the proviso that I'm not sure how scalable that OR query will be. If you're looking to paginate, remember that you can offset facets: use that as the paging mechanism instead of offsetting the records.
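The client-side part of the two-query approach above can be sketched roughly as follows. This is only an illustration, not tested against a live Solr instance: the class and method names are hypothetical, the sample values come from the facet listing above, and it assumes query 1 was sent with something like rows=0, facet=true, facet.field=group_id plus facet.limit/facet.offset for paging, and query 2 with group=true and group.field=group_id.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper class; nothing here is part of SolrJ.
class GroupOrderSketch {

    // Builds the OR clause for query 2 from the facet values returned by query 1.
    // Values are quoted so that values with spaces (e.g. "top secret") stay one term;
    // escaping of quotes inside values is not handled in this sketch.
    static String buildOrQuery(String field, List<String> facetValues) {
        StringJoiner joiner = new StringJoiner(" OR ", "(", ")");
        for (String value : facetValues) {
            joiner.add(field + ":\"" + value + "\"");
        }
        return joiner.toString();
    }

    // Reorders the groups from query 2 so they follow the facet order from query 1.
    // Groups are keyed by their group value; missing groups are skipped.
    static <T> List<T> reorder(List<String> facetOrder, Map<String, T> groupsByValue) {
        List<T> ordered = new ArrayList<>();
        for (String value : facetOrder) {
            T group = groupsByValue.get(value);
            if (group != null) {
                ordered.add(group);
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Facet values in descending count order, as query 1 might return them.
        List<String> facetOrder = Arrays.asList("23", "143", "5");
        System.out.println(buildOrQuery("group_id", facetOrder));

        // Groups from query 2 in whatever order Solr returned them.
        Map<String, String> groups = new LinkedHashMap<>();
        groups.put("5", "group 5 (2 docs)");
        groups.put("23", "group 23 (6 docs)");
        groups.put("143", "group 143 (3 docs)");
        System.out.println(reorder(facetOrder, groups));
    }
}
```

Keeping the facet order as the single source of truth means paging only has to change the facet offset; the second query and the reordering step stay the same.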

Sorting on numFound is not possible, as numFound is not a field in Solr.
Check the discussion mentioning that it is not supported; I did not find an open JIRA issue for it either.

This was not possible the last time I looked into it.

You can sort by using fields.
Consider an example: if you have 5 facets, each with an associated count, then you can sort by the count of each one.
This also applies to normal (non-facet) fields.
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.FacetField.Count;
import org.apache.solr.client.solrj.response.QueryResponse;

// Category is this answer's own interface; its definition is not shown here.
public class FacetBean implements Category, Serializable {
    private String facetName; // getters, setters
    private long facetCount;  // getters, setters

    public FacetBean(String facetName, long facetCount) {
        this.facetName = facetName;
        this.facetCount = facetCount;
    }
}
Your calling method should look like this:
private List<FacetBean> getFacetFieldsbyCount(QueryResponse queryResponse) {
    List<FacetField> flds = queryResponse.getFacetFields();
    List<FacetBean> facetList = new ArrayList<FacetBean>();
    if (flds != null) {
        for (FacetField fld : flds) {
            List<Count> counts = fld.getValues();
            if (counts != null) {
                for (Count count : counts) {
                    // One bean per facet value, so no count is overwritten by the next one
                    facetList.add(new FacetBean(count.getName(), count.getCount()));
                }
            }
        }
    }
    // Sort by count, largest first
    Collections.sort(facetList, new Comparator<FacetBean>() {
        public int compare(FacetBean obj1, FacetBean obj2) {
            return Long.compare(obj2.getFacetCount(), obj1.getFacetCount());
        }
    });
    return facetList;
}
The same URL also mentions the following parameters:
sort: for example, sort=popularity desc will cause the groups to be sorted according to the highest popularity doc in each group.
group.sort: you can apply your field here.
Hope it helps.

Related

Solr query to find one letter without other letter around

I have documents already indexed in Solr. I want to find the producer and model of a tire.
I have a file with producers and models like this:
Nokian;WR G2 SUV
Nokian;WR SUV
Nokian;V
Query:
((productname:"NOKIAN" OR producer:"NOKIAN") AND (productname:"V" OR description:"V" OR referencenumber:"V"))
But it finds, for example, this:
"2X NOKIAN 215/55 R17 94V LINE (3)"
Here the speed index is V and the model is Line, so my algorithm assigns this product to Nokian;V instead of Nokian;Line.
How can I ask Solr to return only products where this V has no other letters around it?
LETNIE 225/45/17 94V NOKIAN V FINLAND - PŁOTY
This one was matched correctly. It is Nokian;V.
As far as I understand your question, you need to put the MUST operator (+) before each boolean clause. So the query will look like:
(
+(productname:"NOKIAN" OR producer:"NOKIAN") AND
+(productname:"V" OR description:"V" OR referencenumber:"V")
)
If your productname field is of type text it has the WordDelimiterFilter in the analysis chain. One of the default behaviors of this filter is to split terms on letter-number boundaries causing:
2X NOKIAN 215/55 R17 94V LINE (3)
to generate the following tokens:
2 X NOKIAN 215 55 R 17 94 V LINE 3
(which matches the "V" in your query).
You can always run debug=results to get an explanation for why something matches. I think in this particular case, you might construct another field type for your productname field that analyzes your model string less aggressively.
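To make the splitting behaviour described above concrete, here is a rough plain-Java approximation. It is not the real WordDelimiterFilter (which has more rules and options); it only assumes the two behaviours mentioned in this answer, splitting on non-alphanumeric characters and on letter-number boundaries. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; the real filter lives in Lucene's analysis module.
class WordDelimiterSketch {

    // Splits at letter->digit and digit->letter boundaries (zero-width matches)
    // and at any run of characters that are neither letters nor digits.
    private static final String BOUNDARY =
            "(?<=\\p{L})(?=\\p{N})|(?<=\\p{N})(?=\\p{L})|[^\\p{L}\\p{N}]+";

    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split(BOUNDARY)) {
            if (!part.isEmpty()) { // drop the empty token a leading delimiter can produce
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The speed index "94V" is split into "94" and "V",
        // which is why a query for the model "V" matches this product.
        System.out.println(split("2X NOKIAN 215/55 R17 94V LINE (3)"));
    }
}
```

With this kind of splitting, a standalone model "V" and the "V" in "94V" become indistinguishable at query time, which is the reason for suggesting a less aggressive field type.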
I solved the problem by sorting the brand/model dictionary with my own comparer.
public class MyComparer : IComparer<string>
{
int IComparer<string>.Compare(string x, string y)
{
if (x == y)
{
return 0;
}
if (x.Contains(y))
{
return -1;
}
else
{
return 1;
}
}
}
Models such as V or H are now at the end of the dictionary. It works very well, because Solr first searches for Nokian;Line; the products found are added to another list, alreadyFound, and those products are skipped when searching for later models. Thanks, all, for your replies.

SOSL query wildcard not returning correct results

I have a function that uses a SOSL query:
private List<Product2> runSoslToExecute() {
List<List<Product2>> searchResults = [FIND :query IN ALL FIELDS RETURNING Product2 (Id, Name)];
List<Product2> results = new List<Product2>();
for(Product2 p : searchResults[0]) {
results.add(p);
}
return results;
}
If I search for "AB*", I also get results that include "1AB...". I thought the "*" wildcard only matched in the middle and at the end of the search term, not at the beginning. Is there a way to run the SOSL search so that it only matches "AB" at the beginning?
Thanks for any help.
You could use a SOQL query with LIKE instead, for example:
Select Id, Name from Account where Name like 'AB%'

Solr / Lucene: Get all field names sorted by number of occurrences in index

I want to get the list of all fields (i.e. field names) sorted by the number of times they occur in the Solr index, i.e.: most frequently occurring field, second most frequently occurring field and so on.
Alternatively, getting all fields in the index and the number of times they occur would also be sufficient.
How do I accomplish this either with a single solr query or through solr/lucene java API?
The set of fields is not fixed and ranges in the hundreds. Almost all fields are dynamic, except for id and perhaps a couple more.
As stated in "Solr: Retrieve field names from a solr index?", you can do this by using the LukeRequestHandler.
To do so, you need to enable the request handler in your solrconfig.xml
<requestHandler name="/admin/luke" class="org.apache.solr.handler.admin.LukeRequestHandler" />
and call it:
http://solr:8983/solr/admin/luke?numTerms=0
If you want the fields sorted by something, you have to do the sorting on your own. I would suggest using SolrJ if you are in a Java environment.
Fetch fields using SolrJ:
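import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.LukeRequest;
import org.apache.solr.client.solrj.response.LukeResponse;
import org.apache.solr.client.solrj.response.LukeResponse.FieldInfo;
import org.junit.Test;

@Test
public void lukeRequest() throws SolrServerException, IOException {
    SolrServer solrServer = new HttpSolrServer("http://solr:8983/solr");
    LukeRequest lukeRequest = new LukeRequest();
    lukeRequest.setNumTerms(1);
    LukeResponse lukeResponse = lukeRequest.process(solrServer);
    // Copy the field infos into a list so they can be sorted
    List<FieldInfo> sorted = new ArrayList<FieldInfo>(lukeResponse.getFieldInfo().values());
    Collections.sort(sorted, new FieldInfoComparator());
    for (FieldInfo infoEntry : sorted) {
        System.out.println("name: " + infoEntry.getName());
        System.out.println("docs: " + infoEntry.getDocs());
    }
}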
The comparator used in the example:
import java.util.Comparator;
import org.apache.solr.client.solrj.response.LukeResponse.FieldInfo;

// Orders fields by document count (descending), then by name
public class FieldInfoComparator implements Comparator<FieldInfo> {
    @Override
    public int compare(FieldInfo fieldInfo1, FieldInfo fieldInfo2) {
        if (fieldInfo1.getDocs() > fieldInfo2.getDocs()) {
            return -1;
        }
        if (fieldInfo1.getDocs() < fieldInfo2.getDocs()) {
            return 1;
        }
        return fieldInfo1.getName().compareTo(fieldInfo2.getName());
    }
}

Custom Query Component - How to get "score" from document id?

I'm writing several Solr Custom Query Components.
Each component runs a different kind of query:
Component A: does a group by on field A
Component B: does a group by on a different field B
Each component sends the documents from its result to the next component.
In my "process" function, I'm doing the following after the result is set by grouping:
IndexSchema schema = searcher.getSchema();
DocListAndSet s = result.getDocListAndSet();
DocSet s3 = s.docSet;
DocIterator dit = s3.iterator();
while (dit.hasNext())
{
SolrDocument doc = new SolrDocument();
int docid = dit.nextDoc();
//float score = dit.score();<--This does not get the score
Document luceneDoc = searcher.doc(docid);//get the document using the doc id
for( Fieldable field : luceneDoc.getFields())
{
SchemaField sf = schema.getField( field.name() );
doc.addField( field.name(), sf.getType().toObject( field ) );
......
}
I then iterate through the set and create a SolrDocument for each entry.
The SolrDocuments are put into a SolrDocumentList, and at the end I send it off to the next component:
rb.req.getContext().put("TAG", list);
I also want to add a field called "score" to each SolrDocument; this field will contain the actual score. I've tried getting the score using:
float score = dit.score();
But the above does not return the score of the document. How do I get the "score" of a document using the document id?
Is there a particular reason you are getting the docSet instead of the docList?
I would try (condensing a bit) getting s.docList.iterator() instead of s.docSet.iterator(). The documentation for the latter specifically states that you can't get meaningful scores from it, whereas the docList documentation states that it may contain valid scores.
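You have to set the GET_SCORES flag when calling getDocList(query, filters, lsort, offset, maxnoofdocs, 1).
Here:
query is your Query object
filters is a List<Query> of your filters; it can be null
lsort is the sort; it can be null
offset is the offset of the first document to return
maxnoofdocs is an integer, the maximum number of documents to return
1 means get the scores along with the documents (SolrIndexSearcher.GET_SCORES)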

How to query SOLR for empty fields?

I have a large solr index, and I have noticed some fields are not updated correctly (the index is dynamic).
This has resulted in some documents having an empty "id" field.
I have tried these queries, but they didn't work:
id:''
id:NULL
id:null
id:""
id:
id:['' TO *]
Is there a way to query empty fields?
Thanks
Try this:
?q=-id:["" TO *]
One caveat! If you want to compose this via OR or AND you cannot use it in this form:
-myfield:*
but you must use
(*:* NOT myfield:*)
This form is perfectly composable. Apparently SOLR will expand the first form to the second, but only when it is a top node. Hope this saves you some time!
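As a small illustration of the composable form described above, the clause can be built by a helper and then combined with other clauses. This is only a sketch of the string building: the class and method names are hypothetical, and status:draft is an invented example clause, not something from the question.

```java
// Hypothetical helper; it only assembles query strings and does not talk to Solr.
class MissingFieldQuery {

    // Returns the composable "field has no value" clause: (*:* NOT field:*)
    static String missing(String field) {
        return "(*:* NOT " + field + ":*)";
    }

    // Joins already-formed clauses with OR.
    static String anyOf(String... clauses) {
        return String.join(" OR ", clauses);
    }

    public static void main(String[] args) {
        // Documents that either have no id or match the invented status:draft clause.
        System.out.println(anyOf(missing("id"), "status:draft"));
    }
}
```

Wrapping the negation in its own parenthesised clause is what keeps it valid when it is no longer the top-level node of the query.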
According to SolrQuerySyntax, you can use q=-id:[* TO *].
If you have a large index, you should use a default value
<field ... default="EMPTY" />
and then query for this default value.
This is much more efficient than q=-id:["" TO *]
You can also use it like this.
fq=!id:['' TO *]
If you are using SolrSharp, note that it does not support negative queries.
You need to change QueryParameter.cs (create a new parameter):
private bool _negativeQuery = false;
public QueryParameter(string field, string value, ParameterJoin parameterJoin = ParameterJoin.AND, bool negativeQuery = false)
{
this._field = field;
this._value = value.Trim();
this._parameterJoin = parameterJoin;
this._negativeQuery = negativeQuery;
}
public bool NegativeQuery
{
get { return _negativeQuery; }
set { _negativeQuery = value; }
}
And in the QueryParameterCollection.cs class, the ToString() override checks whether the NegativeQuery parameter is true:
arQ[x] = (qp.NegativeQuery ? "-(" : "(") + qp.ToString() + ")" + (qp.Boost != 1 ? "^" + qp.Boost.ToString() : "");
When you create the parameter, if it is a negative query, simply set the property:
List<QueryParameter> QueryParameters = new List<QueryParameter>();
QueryParameters.Add(new QueryParameter("PartnerList", "[* TO *]", ParameterJoin.AND, true));
You can do it with a filter query:
q=*:*&fq=-id:*
A note to add here: to make the field searchable in the first place, its definition in the Solr schema needs indexed="true". Then you can use "field_name:*" for string types and "field_name:[* TO *]" for numeric types.
