Solr - how to get all of a faceted field's values?

I have a huge Solr index with ~1,500,000 items and I want to get all distinct brands.
I tried this Solr query:
select/?q=*&rows=0&facet=on&facet.field=brand
but not all brands are displayed, only some of them.
Solr response:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">39</int>
<lst name="params">
<str name="facet">on</str>
<str name="facet.mincount">0</str>
<str name="q">*</str>
<str name="facet.field">brand</str>
<str name="rows">0</str>
</lst>
</lst>
<result name="response" numFound="1520444" start="0"/>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="brand">
<int name=" ">51099</int>
<int name="Blancheporte">11269</int>
<int name="Ama Fashion">8254</int>
<int name="Heine">7026</int>
<int name="Kolok">6874</int>
<int name="Knecht">6836</int>
<int name="JoyJoy">6139</int>
<int name="MyDressing">5418</int>
<int name="Widmann Italia">5310</int>
<int name="modlet">4476</int>
<int name="Mann-Filter">4243</int>
<int name="Daniel Klein">4033</int>
<int name="LEGO">4002</int>
<int name="Casio">3887</int>
<int name="Canon">3706</int>
<int name="Generic">3641</int>
<int name="HP">3608</int>
<int name="PUMA">3593</int>
<int name="Nespecificat">3552</int>
<int name="YATO">3421</int>
<int name="Philips">3397</int>
<int name="Polirom">3320</int>
<int name="LE COQ SPORTIF">3154</int>
<int name="Bullyland">3056</int>
<int name="PIATRAONLINE.RO">2863</int>
<int name="Ravensburger">2775</int>
<int name="Samsung">2612</int>
<int name="Zambirici">2612</int>
<int name="ASUS">2579</int>
<int name="Humanitas">2536</int>
<int name="MyKids">2485</int>
<int name="&quot;&quot;">2484</int>
<int name="QQ">2467</int>
<int name="Chipolino">2441</int>
<int name="VOREL">2386</int>
<int name="Disney">2367</int>
<int name="Bosch">2287</int>
<int name="Kingston">2259</int>
<int name="Litera">2255</int>
<int name="Dell">2122</int>
<int name="Corsair">2116</int>
<int name="Lenovo">2057</int>
<int name="RAO">2054</int>
<int name="Mango">2049</int>
<int name="&quot;">2043</int>
<int name="Playmobil">2003</int>
<int name="Melissa &amp; Doug">1995</int>
<int name="BOOKCITY">1985</int>
<int name="Epson">1980</int>
<int name="SAMSUNG">1961</int>
<int name="Meli Melo - Paris">1932</int>
<int name="Moje Bambino">1917</int>
<int name="Mattel">1906</int>
<int name="Q-Hausmarke">1875</int>
<int name="Mahle Original">1856</int>
<int name="Purflux">1844</int>
<int name="Orient">1763</int>
<int name="Triumph">1739</int>
<int name="THEICONIC">1731</int>
<int name="Michelin ">1721</int>
<int name="Vero Moda">1694</int>
<int name="Pirelli ">1681</int>
<int name="Marko">1679</int>
<int name="Lorelli">1674</int>
<int name="Peg Perego">1646</int>
<int name="Hengst Filter">1642</int>
<int name="Trendzilla">1612</int>
<int name="Hasbro">1611</int>
<int name="Brother">1552</int>
<int name="Baby Mix">1540</int>
<int name="Adidas">1526</int>
<int name="Brevi">1517</int>
<int name="oteros">1511</int>
<int name="Continental ">1500</int>
<int name="Microsoft">1492</int>
<int name="PEPE JEANS">1480</int>
<int name="Bertoni-Lorelli">1465</int>
<int name="Sony">1464</int>
<int name="R essentiel">1452</int>
<int name="Trespass">1420</int>
<int name="Hauck">1418</int>
<int name="Clementoni">1409</int>
<int name="Revell">1390</int>
<int name="Miniland">1388</int>
<int name="Floria">1366</int>
<int name="Sense">1338</int>
<int name="Lexmark">1332</int>
<int name="Altii">1317</int>
<int name="Salomon ">1296</int>
<int name="Hewlett Packard">1295</int>
<int name="SAMSUNG ">1290</int>
<int name="D-Mail">1283</int>
<int name="Make-up Studio PROFESSIONAL">1253</int>
<int name="Panasonic">1251</int>
<int name="Zara">1243</int>
<int name="Gigabyte">1237</int>
<int name="Trei">1233</int>
<int name="Tommy Hilfiger">1227</int>
<int name="Divisima">1219</int>
<int name="Bright Starts">1214</int>
</lst>
</lst>
<lst name="facet_dates"/>
<lst name="facet_ranges"/>
<lst name="facet_intervals"/>
<lst name="facet_heatmaps"/>
</lst>
</response>
I think there is a limit, or the way I am trying to get them is wrong.
Do you know how I can get all distinct values of one field using Solr?
Thanks!

If you want to get all the brands (assuming there are more than 100) and have them listed in the facet_fields, you'll need to set facet.limit to -1. In SolrJ, that would look something like this: query.setFacetLimit(-1);.
From the docs:
This param indicates the maximum number of constraint counts that
should be returned for the facet fields. A negative value means
unlimited.
The default value is 100.
This parameter can be specified on a per field basis to indicate a
separate limit for certain fields.
If your issue is that no documents are being returned, the rows parameter needs to be set to something other than zero.
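As a minimal sketch of the request described above, the following Python builds the query URL with facet.limit=-1 and reads the facet values out of an XML response shaped like the one in the question. The helper names, the host localhost:8983, and the use of q=*:* (instead of the q=* in the question) are assumptions for illustration; no request is actually sent.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def build_facet_url(base_url, field, limit=-1):
    """Build a facet-only query; a negative facet.limit means unlimited."""
    params = {
        "q": "*:*",            # assumption: match-all query
        "rows": 0,             # return no documents, only facet counts
        "facet": "on",
        "facet.field": field,
        "facet.limit": limit,  # -1 lifts the default limit of 100
    }
    return base_url + "?" + urlencode(params)


def parse_facet_values(xml_text, field):
    """Return (value, count) pairs for one facet field of an XML response."""
    root = ET.fromstring(xml_text)
    path = ".//lst[@name='facet_fields']/lst[@name='%s']/int" % field
    return [(node.get("name"), int(node.text)) for node in root.findall(path)]


# Trimmed-down response in the same shape as the one in the question.
sample = """<response>
<lst name="facet_counts">
<lst name="facet_fields">
<lst name="brand">
<int name="Blancheporte">11269</int>
<int name="Heine">7026</int>
</lst>
</lst>
</lst>
</response>"""

url = build_facet_url("http://localhost:8983/solr/select", "brand")
print(url)
print(parse_facet_values(sample, "brand"))
```

With a very large number of distinct values an unlimited facet can produce a big response, so it may be worth checking the response size before relying on this in production.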

Related

Two-dimensional facet query

I have a simple query which returns counts for two different facets (e.g. authors and categories) with some limit. For instance:
select?q=*&facet.field=authors&facet.field=categories&facet.limit=2
As a result, two lists with the top two values each are returned, e.g.:
<lst name="facet_fields">
<lst name="authors">
<int name="Author1">1200</int>
<int name="Author2">1100</int>
</lst>
<lst name="categories">
<int name="Cat1">500</int>
<int name="Cat2">400</int>
</lst>
</lst>
I would like to present these facets in a two-dimensional table with the count for each pair:
        Author1  Author2
Cat1    x        x
Cat2    x        x
How can I get the count for each pair? I tried to use pivot faceting with the query like this:
select?q=*&facet.pivot=authors,categories&facet.limit=2
but the response is not what I expect. As in the following example, the categories in the pivot for Author2 can differ from those for Author1, because the pivot returns the top categories for each author and not the top categories for the whole query:
<lst name="facet_pivot">
<arr name="authors,categories">
<lst>
<str name="field">authors</str>
<str name="value">Author1</str>
<int name="count">1200</int>
<arr name="pivot">
<lst>
<str name="field">categories</str>
<str name="value">Cat1</str>
<int name="count">450</int>
</lst>
<lst>
<str name="field">categories</str>
<str name="value">Cat2</str>
<int name="count">300</int>
</lst>
</arr>
</lst>
<lst>
<str name="field">authors</str>
<str name="value">Author2</str>
<int name="count">1100</int>
<arr name="pivot">
<lst>
<str name="field">categories</str>
<str name="value">Cat3</str>
<int name="count">300</int>
</lst>
<lst>
<str name="field">categories</str>
<str name="value">Cat4</str>
<int name="count">250</int>
</lst>
</arr>
</lst>
</arr>
</lst>
Can the pivot query be parameterized somehow to achieve the desired result, or is there any other query that I could use for this particular use case?
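One possible approach, not taken from this thread: once the first request has returned the top authors and the top categories, a second request could carry one facet.query per (category, author) pair, so that every cell of the table gets its own count. The sketch below only builds the parameters; the function name is hypothetical, and values containing quotes would need escaping that is not handled here.

```python
from urllib.parse import urlencode


def pair_facet_params(row_field, row_values, col_field, col_values):
    """Build one facet.query per (row, column) pair."""
    params = [("q", "*:*"), ("rows", "0"), ("facet", "on")]
    for row in row_values:
        for col in col_values:
            # Values are quoted as phrases; embedded quotes are not escaped.
            query = '%s:"%s" AND %s:"%s"' % (row_field, row, col_field, col)
            params.append(("facet.query", query))
    return params


params = pair_facet_params("categories", ["Cat1", "Cat2"],
                           "authors", ["Author1", "Author2"])
queries = [value for key, value in params if key == "facet.query"]
for query in queries:
    print(query)
print("select?" + urlencode(params))
```

Each facet.query count should then come back under facet_queries in the response, keyed by the query string, which maps directly onto one table cell.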

Solr Query to display selected fields

I am using faceting to find the counts of the 3 most repeated words in a particular field, say "msgs", which contains more than 10,000 records,
and I get output similar to this:
word1 1600
word2 1536
word3 956
Now, along with the count, I want to display the fields that contain the above words. Any suggestions?
Okay. I hope I understand what you need. You could try a query similar to this one:
http://solrhost:solrport/solr/select?q=your_query&rows=0&facet=true&facet.limit=-1&facet.field=your_facet_field1&facet.field=your_facet_field2
where
solrhost - Solr address
solrport - Solr port (default 8983)
your_facet_field1, etc. - your fields, e.g. msgs
your_query - could be *:* if you want to facet every document
Result will be something like this:
<response>
<responseHeader>
<status>0</status>
<QTime>2</QTime>
</responseHeader>
<result numFound="4" start="0" />
<lst name="facet_counts">
<lst name="facet_queries" />
<lst name="facet_fields">
<lst name="your_facet_field1">
<int name="search">0</int>
<int name="memory">0</int>
<int name="graphics">0</int>
<int name="card">0</int>
<int name="music">1</int>
<int name="software">0</int>
<int name="electronics">3</int>
<int name="copier">0</int>
<int name="multifunction">0</int>
<int name="camera">0</int>
<int name="connector">2</int>
<int name="hard">0</int>
<int name="scanner">0</int>
<int name="monitor">0</int>
<int name="drive">0</int>
<int name="printer">0</int>
</lst>
<lst name="your_facet_field2">
<int name="false">3</int>
<int name="true">1</int>
</lst>
</lst>
</lst>
</response>
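To also show the field values that contain the top words, one option is a follow-up query built from the facet result. The sketch below is an illustration under assumptions, not part of the original answer: it takes facet counts as a plain dict, picks the top 3 terms, and builds parameters for a second request that returns the msgs field of matching documents. The function names and the extra word4 entry are hypothetical.

```python
def top_terms(counts, n=3):
    """Return the n terms with the highest facet counts."""
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [term for term, _ in ranked[:n]]


def follow_up_params(field, terms, rows=10):
    """Build parameters for a query matching documents with any of the terms."""
    clause = " OR ".join('"%s"' % term for term in terms)
    return {"q": "%s:(%s)" % (field, clause), "fl": field, "rows": rows}


# Counts as they might be read from facet_fields; word4 is made up.
counts = {"word1": 1600, "word2": 1536, "word3": 956, "word4": 12}
terms = top_terms(counts)
print(terms)
print(follow_up_params("msgs", terms))
```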

How to use function queries in Solr to compare against a specific value?

I want to write a query that, in SQL pseudocode, looks like this:
select * from temptable where price + 3 = 188;
The Solr query I tried is below:
http://127.0.0.1:8983/solr/select/?fl=score,id&defType=func&q=sum(price,3):188
but I get the error below. How can I write this query in Solr? Please do not advise using the "TO" keyword.
<response>
<lst name="responseHeader">
<int name="status">400</int>
<int name="QTime">1</int>
<lst name="params">
<str name="fl">score,id</str>
<str name="q">sum(price,3):188</str>
<str name="defType">func</str>
</lst>
</lst>
<lst name="error">
<str name="msg">
org.apache.solr.search.SyntaxError: Unexpected text after function: :188
</str>
<int name="code">400</int>
</lst>
</response>
An frange query will do:
{!frange l=188 u=188} sum(price,3)
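As a rough sketch of how that query could be sent over HTTP, the Python below URL-encodes it into the select URL from the question. Setting the lower and upper bounds to the same value turns the range into an equality test, since frange bounds are inclusive by default. The helper name is hypothetical, and dropping defType=func is an assumption that the {!frange} prefix selects the parser on its own.

```python
from urllib.parse import urlencode


def frange_equals(function, value):
    """Build an frange query that matches when function equals value."""
    return "{!frange l=%s u=%s}%s" % (value, value, function)


query = frange_equals("sum(price,3)", 188)
url = "http://127.0.0.1:8983/solr/select/?" + urlencode(
    {"fl": "score,id", "q": query}
)
print(query)
print(url)
```

The same string could also be passed as an fq filter instead of q if the main query should stay free for scoring.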

What does facet mean in Solr?

Can you please explain what a facet is?
My understanding is as follows. Suppose I have the following documents:
State      Country
Karnataka  India
Bangalore  India
Delhi      India
Noida      India
Faceting collapses multiple identical values of a field into a single value and returns the number of times that value occurred.
When I search on the field 'Country' I obviously get India 4 times, so I set facet=on and facet.field=Country, hoping to get India only once. But when I fired the query, I got a weird result instead:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">6</int>
</lst>
<result name="response" numFound="4" start="0">
<doc>
<str name="country">India</str></doc>
<doc>
<str name="country">India</str></doc>
<doc>
<str name="country">India</str></doc>
<doc>
<str name="country">India</str></doc>
</result>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="country">
<int name="a">4</int>
<int name="d">4</int>
<int name="di">4</int>
<int name="dia">4</int>
<int name="i">4</int>
<int name="ia">4</int>
<int name="in">4</int>
<int name="ind">4</int>
<int name="indi">4</int>
<int name="india">4</int>
</lst>
</lst>
<lst name="facet_dates"/>
<lst name="facet_ranges"/>
</lst>
</response>
Can anyone help me understand this?
Thanks
If you had a Washington, USA entry, the facet would report 4 results for India and 1 for USA.
Use a string field type. You seem to have used a (text) field with lowercasing and n-gramming, which may benefit people who spell India as Inde, for example. A string field is not processed like this and is therefore best suited for a field meant to be faceted.
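To illustrate why the facet shows fragments such as "di" and "indi", here is a small sketch that mimics lowercasing followed by n-gramming in plain Python. The n-gram lengths (1 to 5) are an assumption, since the question does not show the actual schema; the point is only that every facet value in the response is a lowercase substring of "India".

```python
def ngrams(text, min_len=1, max_len=5):
    """Return all lowercase substrings of text within the given lengths."""
    token = text.lower()
    grams = set()
    for size in range(min_len, max_len + 1):
        for start in range(len(token) - size + 1):
            grams.add(token[start:start + size])
    return grams


# Facet values copied from the response in the question.
facet_values = ["a", "d", "di", "dia", "i", "ia", "in", "ind", "indi", "india"]

grams = ngrams("India")
print(sorted(grams))
print(set(facet_values) <= grams)
```

Faceting counts indexed terms, so with such an analyzer each fragment becomes its own facet value, while a string field would index the single term "India".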

SOLR date faceting and BC / BCE dates / negative date ranges

Are date ranges that include BC dates possible?
I would like to return facets for all years between 11000 BCE (BC) and 9000 BCE (BC) using SOLR.
A sample query, with the date range converted to ISO 8601, might be:
q=*:*&facet.date=myfield_earliestDate&facet.date.end=-92009-01-01T00:00:00&facet.date.gap=%2B1000YEAR&facet.date.other=all&facet=on&f.myfield_earliestDate.facet.date.start=-112009-01-01T00:00:00
However, the returned results seem to suggest that the dates are in the positive range, i.e. CE, not BCE.
See the sample returned results:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">6</int>
<lst name="params">
<str name="f.vra.work.creation.earliestDate.facet.date.start">-112009-01-01T00:00:00Z</str>
<str name="facet">on</str>
<str name="q">*:*</str>
<str name="facet.date">vra.work.creation.earliestDate</str>
<str name="facet.date.gap">+1000YEAR</str>
<str name="facet.date.other">all</str>
<str name="facet.date.end">-92009-01-01T00:00:00Z</str>
</lst>
</lst>
<result name="response" numFound="9556" start="0">omitted</result>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields"/>
<lst name="facet_dates">
<lst name="vra.work.creation.earliestDate">
<int name="112010-01-01T00:00:00Z">0</int>
<int name="111010-01-01T00:00:00Z">0</int>
<int name="110010-01-01T00:00:00Z">0</int>
<int name="109010-01-01T00:00:00Z">0</int>
<int name="108010-01-01T00:00:00Z">0</int>
<int name="107010-01-01T00:00:00Z">0</int>
<int name="106010-01-01T00:00:00Z">0</int>
<int name="105010-01-01T00:00:00Z">0</int>
<int name="104010-01-01T00:00:00Z">0</int>
<int name="103010-01-01T00:00:00Z">0</int>
<int name="102010-01-01T00:00:00Z">0</int>
<int name="101010-01-01T00:00:00Z">0</int>
<int name="100010-01-01T00:00:00Z">5781</int>
<int name="99010-01-01T00:00:00Z">0</int>
<int name="98010-01-01T00:00:00Z">0</int>
<int name="97010-01-01T00:00:00Z">0</int>
<int name="96010-01-01T00:00:00Z">0</int>
<int name="95010-01-01T00:00:00Z">0</int>
<int name="94010-01-01T00:00:00Z">0</int>
<int name="93010-01-01T00:00:00Z">0</int>
<str name="gap">+1000YEAR</str>
<date name="end">92010-01-01T00:00:00Z</date>
<int name="before">224</int>
<int name="after">0</int>
<int name="between">5690</int>
</lst>
</lst>
</lst>
</response>
Any ideas why this is the case? Can Solr handle negative dates such as -112009-01-01T00:00:00Z?
I don't think this is fully supported. At least I don't see any explicit references to BC dates in the source code or the tests.
I even tried defining BC years using date math, e.g:
facet.date.start: NOW-11000YEARS
facet.date.end: NOW
facet.date.gap: +1000YEAR
and got some weird results:
<int name="8991-06-07T20:30:45-.666Z">0</int>
<int name="7991-06-07T20:30:45-.666Z">0</int>
<int name="6991-06-07T20:30:45-.666Z">0</int>
<int name="5991-06-07T20:30:45-.666Z">0</int>
<int name="4991-06-07T20:30:45-.666Z">0</int>
<int name="3991-06-07T20:30:45-.666Z">0</int>
<int name="2991-06-07T20:30:45-.666Z">0</int>
<int name="1991-06-07T20:30:45-.666Z">0</int>
<int name="0991-06-07T20:30:45-.666Z">0</int>
<int name="0010-06-07T20:30:45-.666Z">0</int>
<int name="1010-06-07T20:30:45-.666Z">1435</int>
Note the - after the seconds. Looks like a bug to me...
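One possible reading of both outputs, offered as an observation rather than something confirmed in this thread: the year labels match what you would get if the year were printed as a year-of-era without the era marker. In ISO 8601 (astronomical) numbering, year 0 is 1 BCE, so year -112009 is 112010 BCE. The sketch below does that arithmetic with plain integers; the function names are hypothetical, and the assumption that NOW fell in 2010 is inferred from the labels above.

```python
def year_of_era(iso_year):
    """Map an ISO 8601 (astronomical) year to (year-of-era, era)."""
    if iso_year <= 0:
        return 1 - iso_year, "BCE"   # year 0 is 1 BCE, -1 is 2 BCE, ...
    return iso_year, "CE"


def bucket_years(start, end, gap):
    """Year-of-era label for each bucket start in [start, end)."""
    return [year_of_era(year)[0] for year in range(start, end, gap)]


# Question: start -112009, end -92009, gap +1000YEAR.
question_labels = bucket_years(-112009, -92009, 1000)
print(question_labels[0], question_labels[-1], len(question_labels))

# Answer: NOW-11000YEARS to NOW, assuming NOW was in 2010.
answer_labels = bucket_years(2010 - 11000, 2010, 1000)
print(answer_labels)
```

If this reading is right, the buckets in the question may actually cover BCE years, and the problem is mainly in how the dates are formatted in the response. That would not explain the stray "-" after the seconds.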

Resources