SolrException: maxNumThreadStates must be >= 1 but was: 0

SolrException: maxNumThreadStates must be >= 1 but was: 0
I get the following response while trying to create a core. Can anyone help with how to deal with maxNumThreadStates?
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">500</int>
<int name="QTime">629</int>
</lst><lst name="error">
<str name="msg">maxNumThreadStates must be >= 1 but was: 0</str>
<str name="trace">org.apache.solr.common.SolrException: maxNumThreadStates must be >= 1 but was: 0
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: maxNumThreadStates must be >= 1 but was: 0
at org.apache.lucene.index.DocumentsWriterPerThreadPool.<init>(DocumentsWriterPerThreadPool.java:142)
at org.apache.lucene.index.DocumentsWriterPerThreadPool.clone(DocumentsWriterPerThreadPool.java:360)
at org.apache.solr.update.SolrIndexConfig.toIndexWriterConfig(SolrIndexConfig.java:261)
at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:80)
at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:66)
at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:550)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:806)
... 38 more
</str><int name="code">500</int></lst><str name="params">name=profile.user&action=CREATE</str>
</response>
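The stack trace shows SolrIndexConfig.toIndexWriterConfig handing the value 0 to Lucene's DocumentsWriterPerThreadPool, so the bad value almost certainly comes from the indexing section of your solrconfig.xml. A sketch of the fix, assuming your Solr version reads the setting as maxIndexingThreads inside <indexConfig> (that value is mapped onto Lucene's maxNumThreadStates):

```xml
<indexConfig>
  <!-- must be >= 1; removing the element entirely falls back to Solr's default -->
  <maxIndexingThreads>8</maxIndexingThreads>
</indexConfig>
```

After correcting solrconfig.xml, retry the core CREATE action.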

Related

Solr/Lucene simple operator "or" misunderstanding / Search for the same word in different fields

Learning Solr/Lucene syntax, using the Solr Admin UI in the browser.
There I try to search for the same word in two different fields with the following syntax:
content:myword -> results found
content:myword OR title:existingTitle -> results found
but
content:myword OR title:myword -> ZERO results found. Why? It is "or".
I also tried without an operator (which should be equivalent to "or"), and with "|" and "||".
This happens whenever I try to find the same word in one of multiple fields.
[edit]
Here are the Solr URL requests:
content:fahrzeug title:fahrzeug
http://xxx/solr/core_de/select?q=content%3Afahrzeug%20title%3Afahrzeug
content:fahrzeug OR title:fahrzeug
http://xxx/solr/core_de/select?q=content%3Afahrzeug%20OR%20title%3Afahrzeug
content:fahrzeug | title:fahrzeug
http://xxx/solr/core_de/select?q=content%3Afahrzeug%20%7C%20title%3Afahrzeug
{
"responseHeader":{
"status":400,
"QTime":5,
"params":{
"q":"content:fahrzeug OR title:fahrzeug",
"debugQuery":"1"}},
"error":{
"metadata":[
"error-class","org.apache.solr.common.SolrException",
"root-error-class","org.apache.solr.common.SolrException"],
"msg":"invalid boolean value: 1",
"code":400}}
Try:
http://www119.pxia.de:8983/solr/core_de/select?fq=content%3Afahrzeug%20title%3Afahrzeug&q=*%3A* - this returns the correct documents, so the documents are there when only filtering is used. Your query path applies more complex conditions; I guess your default configuration looks like this:
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="defType">edismax</str>
<str name="echoParams">explicit</str>
<str name="qf">content^40.0 title^5.0 keywords^2.0 tagsH1^5.0 tagsH2H3^3.0 tagsH4H5H6^2.0 tagsInline^1.0</str>
<str name="pf">content^2.0</str>
<str name="df">content</str>
<int name="ps">15</int>
<str name="mm">2<-35%</str>
<str name="mm.autoRelax">true</str>
...
The parser and boosting may play a key role here.
I am not familiar with the edismax parser; please check the documentation.
I would guess the mm parameter may be causing this.
Anyway, it is strange that OR does not work the way we are used to from boolean algebra.
"debug":{
"queryBoosting":{
"q":"title:Home OR content:Perfekt",
"match":null},
"rawquerystring":"title:Home OR content:Perfekt",
"querystring":"title:Home OR content:Perfekt",
"parsedquery":"+(title:hom content:perfekt)~2 ()",
"parsedquery_toString":"+((title:hom content:perfekt)~2) ()",
"explain":{
"bf72a75534ba703e4b8dc7194f92ce34223fc0d2/pages/1/0/0/0":"\n4.8893824 = sum of:\n 4.8893824 = sum of:\n 1.9924302 = weight(title:hom in 0) [SchemaSimilarity], result of:\n 1.9924302 = score(doc=0,freq=1.0 = termFreq=1.0\n), product of:\n 1.9924302 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:\n 1.0 = docFreq\n 10.0 = docCount\n 1.0 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1) from:\n 1.0 = termFreq=1.0\n 1.2 = parameter k1\n 0.0 = parameter b (norms omitted for field)\n 2.8969522 = weight(content:perfekt in 0) [SchemaSimilarity], result of:\n 2.8969522 = score(doc=0,freq=5.0 = termFreq=5.0\n), product of:\n 1.4816046 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:\n 2.0 = docFreq\n 10.0 = docCount\n 1.9552802 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n 5.0 = termFreq=5.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 508.3 = avgFieldLength\n 184.0 = fieldLength\n"},
"QParser":"ExtendedDismaxQParser",
Check "parsedquery":"+(title:hom content:perfekt)~2 ()" it basically says, that both title and content must be there:
Solr operators
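Two follow-ups on the responses above. The 400 "invalid boolean value: 1" is unrelated to the OR problem: this Solr version only accepts true/false for debugQuery, so pass debugQuery=true rather than debugQuery=1. And if the mm rule 2<-35% is indeed the cause, you can test that without touching solrconfig.xml by overriding it per request (the xxx host placeholder is kept from the question):

```text
http://xxx/solr/core_de/select?q=content%3Afahrzeug%20OR%20title%3Afahrzeug&mm=1&debugQuery=true
```

With mm=1 only one of the two clauses has to match, which should restore the boolean-OR behaviour the question expects.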

Unable to add a row to Solr

I'm trying to add a row to the Solr index, but it does not get added.
I get a response, but I am not sure what to infer from it.
What information does the response provide here? How do I get the exception?
>>> c = SolrConnection('http://localhost:8983/solr')
>>> c
<SolrConnection (url=http://localhost:8983/solr, persistent=True, post_headers={'Content-Type': 'text/xml; charset=utf-8'}, reconnects=0)>
>>> l = [{'document_type': 'demo', 'id': 'demo11234', 'deco_name': 'test'}]
>>> c.add_many(l)
'<?xml version="1.0" encoding="UTF-8"?>\n<response>\n<lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>\n</response>\n'
>>> try:
... c.add_many(l)
... except:
... print "error"
...
'<?xml version="1.0" encoding="UTF-8"?>\n<response>\n<lst name="responseHeader"><int name="status">0</int><int name="QTime">2</int></lst>\n</response>\n'
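A status of 0 in that XML means the add itself succeeded, so nothing is thrown: the client only raises when Solr returns an error status or an HTTP error. The usual reason the row still does not show up in searches is a missing commit (c.commit() in solrpy, if I recall its API correctly). A small sketch of reading the status out of the raw response string shown above, using only the standard library:

```python
import xml.etree.ElementTree as ET

# the exact response string returned by c.add_many(l) above
raw = ('<?xml version="1.0" encoding="UTF-8"?>\n<response>\n'
       '<lst name="responseHeader"><int name="status">0</int>'
       '<int name="QTime">1</int></lst>\n</response>\n')

root = ET.fromstring(raw)
# all header values live under <lst name="responseHeader">
header = {e.get('name'): int(e.text)
          for e in root.find("lst[@name='responseHeader']")}
print(header['status'])  # 0 = success; a failure would carry an <lst name="error"> with msg/trace
```

So the response only tells you the request was accepted; until a commit is issued the documents are not visible to searches.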

How to make word concordance with Solr?

I would like to create a word concordance hit list with Solr, which gives all occurrences of the given word with context.
An English example:
...bla bla1 <b>dog</b> bla bla 1...
...bla bla2 <b>dog</b> bla bla 2...
...bla bla3 <b>dogs</b> bla bla 3
...bla bla4 <b>dogging</b> bla bla 4...
...bla bla5 <b>dog</b> bla bla 5...
It's important to be able to customize the size of the context. (Sometimes more than 1 sentence.)
My question: how can I do this with Solr?
Lucene 4.1 is able to do this, for example with FastVectorHighlighter:
//indexing
FieldType offsetsType = new FieldType(TextField.TYPE_STORED);
offsetsType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
offsetsType.setStored(true);
offsetsType.setIndexed(true);
offsetsType.setStoreTermVectors(true);
offsetsType.setStoreTermVectorOffsets(true);
offsetsType.setStoreTermVectorPositions(true);
offsetsType.setStoreTermVectorPayloads(true);
doc.add(new Field("content", fileContent, offsetsType));
//searching
IndexReader reader = DirectoryReader.open(FSDirectory.open(new File(indexPath)));
IndexSearcher searcher = new IndexSearcher(reader);
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_41);
QueryParser parser = new QueryParser(Version.LUCENE_41, "content", analyzer);
Query query = parser.parse("dog");
TopDocs results = searcher.search(query, 10);
for (int i = 0; i < results.scoreDocs.length; i++) {
int id = results.scoreDocs[i].doc;
Document doc = searcher.doc(id);
FastVectorHighlighter h = new FastVectorHighlighter();
String[] hs = h.getBestFragments(h.getFieldQuery(query), reader, id, "content", contextSize, 10000);
if (hs != null)
for(String f : hs)
System.out.println(" highlight: " + f);
}
But how can I ask Solr to do the same?
My trial was this (solrconfig.xml):
<fragmentsBuilder name="colored" class="org.apache.solr.highlight.ScoreOrderFragmentsBuilder">
<lst name="defaults">
<str name="hl.tag.pre"><![CDATA[
<b style="background:yellow">,<b style="background:lawgreen">,
<b style="background:aquamarine">,<b style="background:magenta">,
<b style="background:palegreen">,<b style="background:coral">,
<b style="background:wheat">,<b style="background:khaki">,
<b style="background:lime">,<b style="background:deepskyblue">]]></str>
<str name="hl.tag.post"><![CDATA[</b>]]></str>
</lst>
</fragmentsBuilder>
<requestHandler name="drupal" class="solr.SearchHandler" default="true">
...
<str name="hl">true</str>
<str name="hl.fl">content</str>
<int name="hl.snippets">5000</int>
<int name="hl.fragsize">300</int>
<str name="hl.simple.pre"><![CDATA[ <b style="background:yellow"><i> ]]></str>
<str name="hl.simple.post"><![CDATA[ </i></b> ]]></str>
<str name="hl.mergeContiguous">true</str>
<str name="hl.fragListBuilder">single</str>
<str name="hl.useFastVectorHighlighter">true</str>
But it always returns one large fragment (per doc), not one fragment per occurrence.
Thanks,
Steve
Can you try with hl.fragsize=100 and hl.mergeContiguous=false and see how many fragments you get?
(Before adding the params directly in your SearchHandler in solrconfig.xml you can try various options by specifying all your params in query. Once you find a set of params you are happy with, use those in solrconfig.)
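One more thing that stands out in the posted config: hl.fragListBuilder=single instructs the FastVectorHighlighter to return the entire field as one fragment, which matches the symptom exactly. A hypothetical test query (host, core name and query value are placeholders) combining that change with the parameters suggested above:

```text
http://localhost:8983/solr/core/select?q=content:dog&hl=true&hl.fl=content&hl.snippets=5000&hl.fragsize=100&hl.mergeContiguous=false&hl.fragListBuilder=simple&hl.useFastVectorHighlighter=true
```

Once this returns one fragment per occurrence, move the winning parameters into the SearchHandler defaults.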
I just contributed a patch, https://issues.apache.org/jira/browse/LUCENE-5317, that might be of interest. A Solr wrapper is on its way.

How to parse the TermVectorComponent response to which Java Object?

In the Apache Solr Java API, what is the correct Java object for reading the TermVectorComponent response? http://lucene.apache.org/solr/api/index-all.html
For example, to parse a document list response from Solr you use a Java object called SolrDocumentList, which consists of objects of type SolrDocument, as specified in the Apache Solr 3.5 API reference:
NamedList<Object> solrResponse = solrServer.request(new QueryRequest(solrQuery));
SolrDocumentList solrDocumentList = (SolrDocumentList) solrResponse.get("response");
What is the equivalent way of getting the term vectors into a list of TermVectorComponent objects using the Solr 3.5 Java API?
Also, from which Java object can a termVector list be read in the response to a query such as:
http://localhost:8983/solr/select/?fl=documentPageId,pageNumber,contents&q=documentId:49667&pageNumber:*&qt=tvrh&tv.tf=true&tv.fl=contents&tv.all=true
For example, the response looks like this:
<lst name="termVectors">
<lst name="doc-2">
<str name="uniqueKey">49667.16</str>
<lst name="contents">
<lst name="15">
<int name="tf">2</int>
<lst name="offsets">
<int name="start">147</int>
<int name="end">149</int>
<int name="start">278</int>
<int name="end">280</int>
</lst>
<lst name="positions">
<int name="position">23</int>
<int name="position">47</int>
</lst>
<int name="df">9</int>
<double name="tf-idf">0.2222222222222222</double>
</lst>
<lst name="15,">
<int name="tf">1</int>
<lst name="offsets">
<int name="start">1193</int>
<int name="end">1196</int>
</lst>
<lst name="positions">
<int name="position">188</int>
</lst>
<int name="df">3</int>
<double name="tf-idf">0.3333333333333333</double>
</lst>
<lst name="15.">
<int name="tf">1</int>
<lst name="offsets">
<int name="start">1019</int>
<int name="end">1022</int>
</lst>
<lst name="positions">
<int name="position">161</int>
</lst>
<int name="df">5</int>
<double name="tf-idf">0.2</double>
</lst>
<lst name="2">
<int name="tf">1</int>
<lst name="offsets">
<int name="start">1751</int>
<int name="end">1752</int>
</lst>
<lst name="positions">
<int name="position">276</int>
</lst>
<int name="df">10</int>
<double name="tf-idf">0.1</double>
</lst>
<lst name="22a">
<int name="tf">2</int>
<lst name="offsets">
<int name="start">1110</int>
<int name="end">1113</int>
<int name="start">1373</int>
<int name="end">1376</int>
</lst>
<lst name="positions">
<int name="position">174</int>
<int name="position">213</int>
</lst>
<int name="df">4</int>
<double name="tf-idf">0.5</double>
</lst>
<lst name="22b">
<int name="tf">1</int>
<lst name="offsets">
<int name="start">1118</int>
<int name="end">1121</int>
</lst>
<lst name="positions">
<int name="position">176</int>
</lst>
<int name="df">4</int>
<double name="tf-idf">0.25</double>
</lst>
<lst name="22b.">
<int name="tf">1</int>
<lst name="offsets">
<int name="start">1381</int>
<int name="end">1385</int>
</lst>
<lst name="positions">
<int name="position">215</int>
</lst>
<int name="df">3</int>
<double name="tf-idf">0.3333333333333333</double>
</lst>
<lst name="acceptable">
<int name="tf">1</int>
<lst name="offsets">
<int name="start">1556</int>
<int name="end">1566</int>
</lst>
<lst name="positions">
<int name="position">246</int>
</lst>
<int name="df">1</int>
<double name="tf-idf">1.0</double>
</lst>
<lst name="achieve">
<int name="tf">1</int>
<lst name="offsets">
<int name="start">883</int>
<int name="end">890</int>
</lst>
<lst name="positions">
<int name="position">138</int>
</lst>
<int name="df">4</int>
<double name="tf-idf">0.25</double>
</lst>
<lst name="allow">
<int name="tf">1</int>
<lst name="offsets">
<int name="start">1550</int>
<int name="end">1555</int>
</lst>
<lst name="positions">
<int name="position">245</int>
</lst>
<int name="df">3</int>
<double name="tf-idf">0.3333333333333333</double>
</lst>
<lst name="also">
<int name="tf">1</int>
<lst name="offsets">
<int name="start">224</int>
<int name="end">228</int>
</lst>
<lst name="positions">
<int name="position">38</int>
</lst>
<int name="df">9</int>
<double name="tf-idf">0.1111111111111111</double>
</lst>
<lst name="also,">
<int name="tf">1</int>
<lst name="offsets">
<int name="start">780</int>
<int name="end">785</int>
</lst>
<lst name="positions">
<int name="position">123</int>
</lst>
<int name="df">1</int>
<double name="tf-idf">1.0</double>
</lst>
<lst name="amplified">
<int name="tf">2</int>
<lst name="offsets">
<int name="start">1583</int>
<int name="end">1592</int>
<int name="start">1656</int>
<int name="end">1665</int>
</lst>
<lst name="positions">
<int name="position">250</int>
<int name="position">262</int>
</lst>
<int name="df">4</int>
<double name="tf-idf">0.5</double>
</lst>
<lst name="amplifier">
<int name="tf">1</int>
<lst name="offsets">
<int name="start">1741</int>
<int name="end">1750</int>
</lst>
<lst name="positions">
<int name="position">275</int>
</lst>
<int name="df">22</int>
<double name="tf-idf">0.045454545454545456</double>
</lst>
<lst name="amplifier.">
<int name="tf">2</int>
<lst name="offsets">
<int name="start">57</int>
<int name="end">67</int>
<int name="start">647</int>
<int name="end">657</int>
</lst>
<lst name="positions">
<int name="position">7</int>
<int name="position">104</int>
</lst>
<int name="df">4</int>
<double name="tf-idf">0.5</double>
</lst>
<lst name="amplitude">
<int name="tf">3</int>
<lst name="offsets">
<int name="start">72</int>
<int name="end">81</int>
<int name="start">759</int>
<int name="end">768</int>
<int name="start">848</int>
<int name="end">857</int>
</lst>
<lst name="positions">
<int name="position">9</int>
<int name="position">121</int>
<int name="position">134</int>
</lst>
<int name="df">1</int>
<double name="tf-idf">3.0</double>
</lst>
<lst name="appear">
<int name="tf">1</int>
<lst name="offsets">
<int name="start">110</int>
<int name="end">117</int>
</lst>
<lst name="positions">
<int name="position">16</int>
</lst>
<int name="df">1</int>
<double name="tf-idf">1.0</double>
</lst>
<lst name="between">
<int name="tf">1</int>
<lst name="offsets">
<int name="start">934</int>
<int name="end">941</int>
</lst>
<lst name="positions">
<int name="position">146</int>
</lst>
<int name="df">7</int>
<double name="tf-idf">0.14285714285714285</double>
</lst>
<lst name="c4">
<int name="tf">2</int>
<lst name="offsets">
<int name="start">1033</int>
<int name="end">1035</int>
<int name="start">1242</int>
<int name="end">1244</int>
</lst>
<lst name="positions">
<int name="position">163</int>
<int name="position">195</int>
</lst>
<int name="df">4</int>
<double name="tf-idf">0.5</double>
</lst>
<lst name="c4,">
<int name="tf">1</int>
<lst name="offsets">
<int name="start">1471</int>
<int name="end">1474</int>
</lst>
<lst name="positions">
<int name="position">229</int>
</lst>
<int name="df">2</int>
<double name="tf-idf">0.5</double>
</lst>
<lst name="c5">
<int name="tf">3</int>
<lst name="offsets">
<int name="start">210</int>
<int name="end">212</int>
<int name="start">715</int>
<int name="end">717</int>
<int name="start">993</int>
<int name="end">995</int>
</lst>
<lst name="positions">
<int name="position">34</int>
<int name="position">113</int>
<int name="position">155</int>
</lst>
<int name="df">5</int>
<double name="tf-idf">0.6</double>
</lst>
<lst name="c5,">
<int name="tf">1</int>
<lst name="offsets">
<int name="start">1475</int>
<int name="end">1478</int>
</lst>
<lst name="positions">
<int name="position">230</int>
</lst>
<int name="df">2</int>
<double name="tf-idf">0.5</double>
</lst>
<lst name="c6">
<int name="tf">4</int>
<lst name="offsets">
<int name="start">217</int>
<int name="end">219</int>
<int name="start">722</int>
<int name="end">724</int>
<int name="start">1000</int>
<int name="end">1002</int>
<int name="start">1483</int>
<int name="end">1485</int>
</lst>
<lst name="positions">
<int name="position">36</int>
<int name="position">115</int>
<int name="position">157</int>
<int name="position">232</int>
</lst>
<int name="df">5</int>
<double name="tf-idf">0.8</double>
</lst>
<lst name="can">
<int name="tf">2</int>
<lst name="offsets">
<int name="start">558</int>
<int name="end">561</int>
<int name="start">1486</int>
<int name="end">1489</int>
</lst>
<lst name="positions">
<int name="position">89</int>
<int name="position">233</int>
</lst>
<int name="df">9</int>
<double name="tf-idf">0.2222222222222222</double>
</lst>
<lst name="capacitance">
<int name="tf">2</int>
<lst name="offsets">
<int name="start">665</int>
<int name="end">677</int>
<int name="start">1216</int>
<int name="end">1228</int>
</lst>
<lst name="positions">
<int name="position">107</int>
<int name="position">192</int>
</lst>
<int name="df">6</int>
<double name="tf-idf">0.3333333333333333</double>
</lst>
<lst name="capacitor">
<int name="tf">8</int>
<lst name="offsets">
<int name="start">199</int>
<int name="end">209</int>
<int name="start">704</int>
<int name="end">714</int>
<int name="start">982</int>
<int name="end">992</int>
<int name="start">1023</int>
<int name="end">1032</int>
<int name="start">1057</int>
<int name="end">1067</int>
<int name="start">1232</int>
<int name="end">1241</int>
<int name="start">1266</int>
<int name="end">1276</int>
<int name="start">1460</int>
<int name="end">1470</int>
</lst>
<lst name="positions">
<int name="position">33</int>
<int name="position">112</int>
<int name="position">154</int>
<int name="position">162</int>
<int name="position">167</int>
<int name="position">194</int>
<int name="position">199</int>
<int name="position">228</int>
</lst>
<int name="df">16</int>
<double name="tf-idf">0.5</double>
</lst>
<lst name="cause">
<int name="tf">3</int>
<lst name="offsets">
<int name="start">506</int>
<int name="end">511</int>
<int name="start">562</int>
<int name="end">567</int>
<int name="start">1122</int>
<int name="end">1127</int>
</lst>
<lst name="positions">
<int name="position">84</int>
<int name="position">90</int>
<int name="position">177</int>
</lst>
<int name="df">5</int>
<double name="tf-idf">0.6</double>
</lst>
<lst name="characteristics,">
<int name="tf">1</int>
<lst name="offsets">
<int name="start">536</int>
<int name="end">552</int>
</lst>
<lst name="positions">
<int name="position">87</int>
</lst>
<int name="df">1</int>
<double name="tf-idf">1.0</double>
</lst>
<lst name="chopper-stabilized">
<int name="tf">1</int>
<lst name="offsets">
<int name="start">38</int>
<int name="end">56</int>
</lst>
<lst name="positions">
<int name="position">6</int>
</lst>
<int name="df">9</int>
<double name="tf-idf">0.1111111111111111</double>
</lst>
<lst name="chopping">
<int name="tf">6</int>
<lst name="offsets">
<int name="start">236</int>
<int name="end">244</int>
<int name="start">793</int>
<int name="end">801</int>
<int name="start">942</int>
<int name="end">950</int>
<int name="start">1390</int>
<int name="end">1398</int>
<int name="start">1507</int>
<int name="end">1515</int>
<int name="start">1608</int>
<int name="end">1616</int>
</lst>
<lst name="positions">
<int name="position">41</int>
<int name="position">126</int>
<int name="position">147</int>
<int name="position">217</int>
<int name="position">238</int>
<int name="position">254</int>
</lst>
<int name="df">19</int>
<double name="tf-idf">0.3157894736842105</double>
</lst>
<lst name="circuitry.">
<int name="tf">1</int>
<lst name="offsets">
<int name="start">446</int>
<int name="end">456</int>
</lst>
<lst name="positions">
<int name="position">74</int>
</lst>
<int name="df">1</int>
<double name="tf-idf">1.0</double>
</lst>
<str name="uniqueKeyFieldName">documentPageId</str>
</lst>
I don't have enough context on your application design, but from your code I am guessing that you are using the SolrJ client to query and process the Solr response. You could try the following:
QueryResponse queryResponse = server.query(solrQuery);
TermsResponse termsResponse = queryResponse.getTermsResponse();
TermsResponse encapsulates the response from the Terms component, so it may not contain the complete term vector info from the response; in that case the following option could be worth exploring:
Iterator<Entry<String, Object>> termVectors = ((NamedList) solrResponse.get("termVectors")).iterator();
while(termVectors.hasNext()){
Entry<String, Object> docTermVector = termVectors.next();
for(Iterator<Entry<String, Object>> fi = ((NamedList)docTermVector.getValue()).iterator(); fi.hasNext(); ){
Entry<String, Object> fieldEntry = fi.next();
if(fieldEntry.getKey().equals("contents")){
for(Iterator<Entry<String, Object>> tvInfoIt = ((NamedList)fieldEntry.getValue()).iterator(); tvInfoIt.hasNext(); ){
Entry<String, Object> tvInfo = tvInfoIt.next();
NamedList tv = (NamedList) tvInfo.getValue();
System.out.println("Vector Info: " + tvInfo.getKey() + " tf: " + tv.get("tf") + " df: " + tv.get("df") + " tf-idf: " + tv.get("tf-idf"));
}
}
}
}
This should yield:
Vector Info: 15 tf: 2 df: 9 tf-idf: 0.2222222222222222
Vector Info: 15, tf: 1 df: 3 tf-idf: 0.3333333333333333
........
You could process this into your own TermVector domain object as necessary. Hope this helps.
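One caveat when mapping these values onto your own domain object: the "tf-idf" the TermVectorComponent reports does not look like a log-scaled tf·idf weight but simply tf/df, at least judging by the numbers in the response above. A quick sanity check:

```python
# (tf, df, reported "tf-idf") triples copied from the termVectors response above
samples = [(2, 9, 0.2222222222222222),   # term "15"
           (1, 3, 0.3333333333333333),   # term "15,"
           (8, 16, 0.5),                 # term "capacitor"
           (3, 1, 3.0)]                  # term "amplitude"

for tf, df, reported in samples:
    # the reported value matches plain tf/df for every term checked
    assert abs(tf / df - reported) < 1e-12
```

So if your domain object needs a real tf·idf weight, compute it yourself from the tf and df fields rather than trusting the reported number.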
Here is my attempt to harvest the term vector data from the QueryResponse object provided by SolrJ. I had to break the object down into its component parts to understand it.
I have not included the POJOs due to IP issues, but they should be fairly easy to infer: a vector has one or more infos.
Hope this helps.
Scott
/**
* accept a list of things and marry the term vectors from the response to the list of things
*
* @param aQueryResponse
* @param list
*/
protected void extractTermVectorData(final QueryResponse aQueryResponse, final List<? extends BaseModel> list)
{
final NamedList<Object> response = aQueryResponse.getResponse();
NamedList<Object> termVectorsObject = null;
for ( int i = 0; i < response.size(); i++ )
{
final String name = response.getName(i);
if ( "termVectors".contentEquals(name) )
{
termVectorsObject = (NamedList<Object>)response.getVal(i);
break;
}
}
if ( null != termVectorsObject )
{
final ArrayList<IMGTermVector> termVectorList = process(termVectorsObject);
int i = 0;
for ( final BaseModel model : list )
{
// advance the index so each model is married to its own vector
final IMGTermVector anIMGTermVector = termVectorList.get(i++);
model.setTermVector(anIMGTermVector);
}
}
}
/**
* @param termVectorNamedList
* @return
*/
private ArrayList<IMGTermVector> process(final NamedList<Object> termVectorNamedList)
{
final Iterator<Entry<String, Object>> termVectorIteratior = consumeTopListHeader(termVectorNamedList);
final ArrayList<IMGTermVector> vectors = consumeVectorsFromList(termVectorIteratior);
return vectors;
}
/**
* @param termVectorNamedList
* @return
*/
private Iterator<Entry<String, Object>> consumeTopListHeader(final NamedList<Object> termVectorNamedList)
{
final Iterator<Entry<String, Object>> termVectorIteratior = termVectorNamedList.iterator();
final Entry<String, Object> termVectorHeaderEntry = termVectorIteratior.next();
final String termVectorHeaderEntryKey = termVectorHeaderEntry.getKey();
final Object termVectorHeaderEntryValue = termVectorHeaderEntry.getValue();
System.out.println(termVectorHeaderEntryKey + "=" + (String)termVectorHeaderEntryValue);
return termVectorIteratior;
}
/**
* @param termVectorIteratior
* @return
*/
@SuppressWarnings("unchecked")
private ArrayList<IMGTermVector> consumeVectorsFromList(final Iterator<Entry<String, Object>> termVectorIteratior)
{
final ArrayList<IMGTermVector> vectors = new ArrayList<IMGTermVector>();
while ( termVectorIteratior.hasNext() )
{
final IMGTermVector vector = new IMGTermVector();
vectors.add(vector);
final Entry<String, Object> termVectorEntry = termVectorIteratior.next();
final String termVectorEntryKey = termVectorEntry.getKey();
vector.setFieldEntry(termVectorEntryKey);
System.out.println("processing vector #" + termVectorEntryKey);
final NamedList<Object> termVectorEntryValue = (NamedList<Object>)termVectorEntry.getValue();
convertIntoVector(termVectorEntryValue, vector);
}
return vectors;
}
/**
* @param termVectorEntryValue
* @param vector
*/
private void convertIntoVector(final NamedList<Object> termVectorEntryValue, final IMGTermVector vector)
{
//
// grab vector header
//
final Iterator<Entry<String, Object>> vectorTermIteratior = consumeVectorHeader(termVectorEntryValue, vector);
//
// now process terms
//
final List<IMGTermVectorInfo> vectorInfoList = convertFilteredTextIntoInfos(vectorTermIteratior);
vector.setInfos(vectorInfoList);
}
/**
* @param vectorTermIteratior
* @return
*/
@SuppressWarnings("unchecked")
private List<IMGTermVectorInfo> convertFilteredTextIntoInfos(
final Iterator<Entry<String, Object>> termVectorEntryValueIteratior)
{
final List<IMGTermVectorInfo> vectorInfoList = new ArrayList<IMGTermVectorInfo>();
final Entry<String, Object> termVectorEntryValueIteratiorEntry = termVectorEntryValueIteratior.next();
final String key = termVectorEntryValueIteratiorEntry.getKey();
final NamedList<Object> value = (NamedList<Object>)termVectorEntryValueIteratiorEntry.getValue();
System.out.println("processing components of key " + key);
for ( final Iterator<Entry<String, Object>> termVectorInfoIteratior = value.iterator(); termVectorInfoIteratior
.hasNext(); )
{
final Entry<String, Object> fieldEntry = termVectorInfoIteratior.next();
final NamedList<Object> tv = (NamedList<Object>)fieldEntry.getValue();
final IMGTermVectorInfo info = parseTermVectorInfo(fieldEntry, tv);
vectorInfoList.add(info);
}
return vectorInfoList;
}
private IMGTermVectorInfo parseTermVectorInfo(final Entry<String, Object> tvInfo, final NamedList<Object> tv)
{
final IMGTermVectorInfo info = new IMGTermVectorInfo();
System.out.println("Vector Info: " + tvInfo.getKey() + " tf: " + tv.get("tf") + " df: " + tv.get("df")
+ " tf-idf: " + tv.get("tf-idf"));
info.setInfo(tvInfo.getKey());
info.setTf((Integer)tv.get("tf"));
info.setDf((Integer)tv.get("df"));
info.setTfidf((Double)tv.get("tf-idf"));
return info;
}
/**
* @param termVectorEntryValue
* @param vector
* @return
*/
private Iterator<Entry<String, Object>> consumeVectorHeader(final NamedList<Object> termVectorEntryValue,
final IMGTermVector vector)
{
final Iterator<Entry<String, Object>> termVectorEntryValueIteratior = termVectorEntryValue.iterator();
final Entry<String, Object> termVectorEntryValueIteratiorEntry = termVectorEntryValueIteratior.next();
final String key = termVectorEntryValueIteratiorEntry.getKey();
final String value = (String)termVectorEntryValueIteratiorEntry.getValue();
System.out.println(" " + key + "=" + value + " <<<--- ignoring this data for now");
return termVectorEntryValueIteratior;
}

Solr boolean queries combined with index-time boosts

I have a site using Solr 1.4.1 for relevancy/recommendations. I am using boolean-style queries in some places, for example +(+type:aoh_company +aoh_dictionary_tids:623). That provides the expected results, but the order of the results appears to be arbitrary.
I am trying to control the ranking of the documents by setting index-time boosts, but they seem to be ignored for these queries.
An example
The query URL is http://localhost:4930/solr/prod/select?rows=5&start=0&q.alt=(type%3Aaoh_company)+(aoh_dictionary_tids%3A623)&q=
The results are returned in this order (with the index-time boost value in parentheses):
17132 (1.22)
17179 (1.02)
17131 (1.10)
17133 (1.10)
17184 (1.10)
Obviously, result #2 should not come before #3-5 based on the boost alone.
Given that this is a boolean query, there should not be much difference in ranking.
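A likely explanation, though the thread does not confirm it: Solr 1.4.1 sits on Lucene 2.9, where index-time boosts are folded into a single-byte field norm via SmallFloat.floatToByte315, so boosts that are close together collapse to the same stored value. A Python re-implementation of that encoding shows that 1.02, 1.10 and 1.22 all quantize to exactly 1.0, which is consistent with the identical scores in the debug output below:

```python
import struct

def float_to_byte315(f):
    """Port of Lucene's SmallFloat.floatToByte315 (3 mantissa bits, zero exponent 15)."""
    bits = struct.unpack('>i', struct.pack('>f', f))[0]
    small = bits >> 21
    if small < ((63 - 15) << 3):
        return 0 if bits <= 0 else 1   # underflow: 0 for non-positive, smallest positive otherwise
    if small >= ((63 - 15) << 3) + 0x100:
        return 255                      # overflow clamps to the largest representable value
    return small - ((63 - 15) << 3)

def byte315_to_float(b):
    """Inverse mapping, applied when norms are read back at query time."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << 21) + ((63 - 15) << 24)
    return struct.unpack('>f', struct.pack('>i', bits))[0]

for boost in (1.02, 1.10, 1.22):
    # norms between 1.0 and 1.25 all truncate to the same byte, so all three print 1.0
    print(boost, '->', byte315_to_float(float_to_byte315(boost)))
```

Note that the stored norm also multiplies in length normalization, which would explain why the aoh_dictionary_tids fieldNorms below (0.15625 vs 0.125) still differ between documents with different numbers of values.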
Debugging output
I tried debugging the query above by appending debugQuery=true to the query, so it becomes http://localhost:4930/solr/prod/select?rows=5&start=0&q.alt=(type%3Aaoh_company)+(aoh_dictionary_tids%3A623)&q=&debugQuery=true
It's very verbose, but here it is:
<lst name="debug">
<null name="rawquerystring"/>
<null name="querystring"/>
<str name="parsedquery">+(+type:aoh_company +aoh_dictionary_tids:623)</str>
<str name="parsedquery_toString">+(+type:aoh_company +aoh_dictionary_tids:623)</str>
<lst name="explain">
<str name="50hves/node/17132">
1.7819747 = (MATCH) sum of:
0.9014403 = (MATCH) weight(type:aoh_company in 1805), product of:
0.37135038 = queryWeight(type:aoh_company), product of:
2.4274657 = idf(docFreq=457, maxDocs=1909)
0.15297863 = queryNorm
2.4274657 = (MATCH) fieldWeight(type:aoh_company in 1805), product of:
1.0 = tf(termFreq(type:aoh_company)=1)
2.4274657 = idf(docFreq=457, maxDocs=1909)
1.0 = fieldNorm(field=type, doc=1805)
0.88053435 = (MATCH) weight(aoh_dictionary_tids:623 in 1805), product of:
0.9284928 = queryWeight(aoh_dictionary_tids:623), product of:
6.069428 = idf(docFreq=11, maxDocs=1909)
0.15297863 = queryNorm
0.9483481 = (MATCH) fieldWeight(aoh_dictionary_tids:623 in 1805), product of:
1.0 = tf(termFreq(aoh_dictionary_tids:623)=1)
6.069428 = idf(docFreq=11, maxDocs=1909)
0.15625 = fieldNorm(field=aoh_dictionary_tids, doc=1805)
</str>
<str name="50hves/node/17179">
1.7819747 = (MATCH) sum of:
0.9014403 = (MATCH) weight(type:aoh_company in 1896), product of:
0.37135038 = queryWeight(type:aoh_company), product of:
2.4274657 = idf(docFreq=457, maxDocs=1909)
0.15297863 = queryNorm
2.4274657 = (MATCH) fieldWeight(type:aoh_company in 1896), product of:
1.0 = tf(termFreq(type:aoh_company)=1)
2.4274657 = idf(docFreq=457, maxDocs=1909)
1.0 = fieldNorm(field=type, doc=1896)
0.88053435 = (MATCH) weight(aoh_dictionary_tids:623 in 1896), product of:
0.9284928 = queryWeight(aoh_dictionary_tids:623), product of:
6.069428 = idf(docFreq=11, maxDocs=1909)
0.15297863 = queryNorm
0.9483481 = (MATCH) fieldWeight(aoh_dictionary_tids:623 in 1896), product of:
1.0 = tf(termFreq(aoh_dictionary_tids:623)=1)
6.069428 = idf(docFreq=11, maxDocs=1909)
0.15625 = fieldNorm(field=aoh_dictionary_tids, doc=1896)
</str>
<str name="50hves/node/17131">
1.7819747 = (MATCH) sum of:
0.9014403 = (MATCH) weight(type:aoh_company in 1905), product of:
0.37135038 = queryWeight(type:aoh_company), product of:
2.4274657 = idf(docFreq=457, maxDocs=1909)
0.15297863 = queryNorm
2.4274657 = (MATCH) fieldWeight(type:aoh_company in 1905), product of:
1.0 = tf(termFreq(type:aoh_company)=1)
2.4274657 = idf(docFreq=457, maxDocs=1909)
1.0 = fieldNorm(field=type, doc=1905)
0.88053435 = (MATCH) weight(aoh_dictionary_tids:623 in 1905), product of:
0.9284928 = queryWeight(aoh_dictionary_tids:623), product of:
6.069428 = idf(docFreq=11, maxDocs=1909)
0.15297863 = queryNorm
0.9483481 = (MATCH) fieldWeight(aoh_dictionary_tids:623 in 1905), product of:
1.0 = tf(termFreq(aoh_dictionary_tids:623)=1)
6.069428 = idf(docFreq=11, maxDocs=1909)
0.15625 = fieldNorm(field=aoh_dictionary_tids, doc=1905)
</str>
<str name="50hves/node/17133">
1.7819747 = (MATCH) sum of:
0.9014403 = (MATCH) weight(type:aoh_company in 1906), product of:
0.37135038 = queryWeight(type:aoh_company), product of:
2.4274657 = idf(docFreq=457, maxDocs=1909)
0.15297863 = queryNorm
2.4274657 = (MATCH) fieldWeight(type:aoh_company in 1906), product of:
1.0 = tf(termFreq(type:aoh_company)=1)
2.4274657 = idf(docFreq=457, maxDocs=1909)
1.0 = fieldNorm(field=type, doc=1906)
0.88053435 = (MATCH) weight(aoh_dictionary_tids:623 in 1906), product of:
0.9284928 = queryWeight(aoh_dictionary_tids:623), product of:
6.069428 = idf(docFreq=11, maxDocs=1909)
0.15297863 = queryNorm
0.9483481 = (MATCH) fieldWeight(aoh_dictionary_tids:623 in 1906), product of:
1.0 = tf(termFreq(aoh_dictionary_tids:623)=1)
6.069428 = idf(docFreq=11, maxDocs=1909)
0.15625 = fieldNorm(field=aoh_dictionary_tids, doc=1906)
</str>
<str name="50hves/node/17184">
1.6058679 = (MATCH) sum of:
0.9014403 = (MATCH) weight(type:aoh_company in 1892), product of:
0.37135038 = queryWeight(type:aoh_company), product of:
2.4274657 = idf(docFreq=457, maxDocs=1909)
0.15297863 = queryNorm
2.4274657 = (MATCH) fieldWeight(type:aoh_company in 1892), product of:
1.0 = tf(termFreq(type:aoh_company)=1)
2.4274657 = idf(docFreq=457, maxDocs=1909)
1.0 = fieldNorm(field=type, doc=1892)
0.7044275 = (MATCH) weight(aoh_dictionary_tids:623 in 1892), product of:
0.9284928 = queryWeight(aoh_dictionary_tids:623), product of:
6.069428 = idf(docFreq=11, maxDocs=1909)
0.15297863 = queryNorm
0.7586785 = (MATCH) fieldWeight(aoh_dictionary_tids:623 in 1892), product of:
1.0 = tf(termFreq(aoh_dictionary_tids:623)=1)
6.069428 = idf(docFreq=11, maxDocs=1909)
0.125 = fieldNorm(field=aoh_dictionary_tids, doc=1892)
</str>
</lst>
<str name="QParser">DisMaxQParser</str>
<str name="altquerystring">org.apache.lucene.search.BooleanQuery:+type:aoh_company +aoh_dictionary_tids:623</str>
<null name="boostfuncs"/>
<lst name="timing">
<double name="time">7.0</double>
<lst name="prepare">
<double name="time">1.0</double>
<lst name="org.apache.solr.handler.component.QueryComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.FacetComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.HighlightComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.StatsComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.SpellCheckComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.DebugComponent">
<double name="time">0.0</double>
</lst>
</lst>
<lst name="process">
<double name="time">6.0</double>
<lst name="org.apache.solr.handler.component.QueryComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.FacetComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.HighlightComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.StatsComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.SpellCheckComponent">
<double name="time">0.0</double>
</lst>
<lst name="org.apache.solr.handler.component.DebugComponent">
<double name="time">6.0</double>
</lst>
</lst>
</lst>
As I read it, the first four results are scored 1.7819747 and the fifth 1.6058679, and I can't see the boost values anywhere in there, so it seems they are not a factor in the ranking equation.
So what am I doing wrong? Is there something I need to do to make Solr take the boosts into consideration?
Is there a way to check the boost value stored in Solr? It looks right in the documents I send to it, but I can't find a way to inspect the stored value.
Additionally, here's the relevant parts from my schema.xml:
<types>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="integer" class="solr.IntField" omitNorms="true"/>
</types>
<fields>
<field name="type" type="string" indexed="true" stored="true"/>
<field name="aoh_dictionary_tids" type="integer" indexed="true" stored="true" multiValued="true" omitNorms="false"/>
</fields>
In his answer below, fyr mentioned that norms need to be enabled on the field for the boost value to apply. So I'd like to amend my question a bit:
Is it enough to have norms enabled on one of the queried fields for the boost to apply?
Does my omitNorms="false" on the field override the omitNorms="true" on the fieldType?
Any help would be greatly appreciated.
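For reference, this is roughly how the index-time boosts are being supplied in the XML update message (the field names match the schema above; the concrete boost values here are illustrative, not taken from the actual indexing code):

```xml
<add>
  <!-- document-level boost: multiplied into the norms of every field in this doc -->
  <doc boost="2.0">
    <field name="type">aoh_company</field>
    <!-- field-level boost: multiplied into this field's norm only -->
    <field name="aoh_dictionary_tids" boost="3.0">623</field>
  </doc>
</add>
```

Both boost attributes only take effect on fields whose norms are enabled; on a field with omitted norms they are silently discarded at index time.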
You will not see the boost as a separate line in the explain output. An index-time boost is folded into the norm of that field in that document, acting as a multiplier on it.
If norms are enabled, your boost value is applied at indexing time. With DefaultSimilarity, norms are always part of the similarity function as long as they are enabled.
Edit for the follow-up questions:
It is enough to have norms enabled on a field for the boost to apply to that field, because norms are the per-field, per-document weight structure in the index: the index-time boost is multiplied into the computed norm value and saved to the norm for that field.
omitNorms on the field declaration overrides the fieldType definition. You can see this in your explain output: aoh_dictionary_tids has fieldNorm values that are not 1.0 (0.15625 and 0.125). If norms were disabled on that field, the default of 1.0 would be applied instead.
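To rule out the fieldType silently dropping the norms, the schema can be written with omitNorms explicitly set to false on both the types and the fields involved (a sketch based on the schema fragment above; any field that ends up with omitted norms loses its index-time boost):

```xml
<types>
  <!-- omitNorms="false" so index-time boosts survive into the stored norms -->
  <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="false"/>
  <fieldType name="integer" class="solr.IntField" omitNorms="false"/>
</types>
<fields>
  <field name="type" type="string" indexed="true" stored="true" omitNorms="false"/>
  <field name="aoh_dictionary_tids" type="integer" indexed="true" stored="true"
         multiValued="true" omitNorms="false"/>
</fields>
```

Note that norms are encoded into a single byte per field per document, so the stored value is heavily quantized; two boosts that are close together can round to the same norm byte and produce identical fieldNorm values in the explain output.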

Resources