Solr configuration for scored search

I am trying to set up a Solr index for searching a database of product information. I have populated a database with product details and am using Solr 6.0.0. Given a product's details (title, brand, other keywords), I would like to know whether there is a product in the database that closely matches them. I have run the data import and created the index. However, when I search, the matching products all get the same score even though the products listed are different. I have tried different combinations of search keywords, but the result is similar in every case. I have also tried different tokenizers and filters.
A sample of the schema.xml I have tried:
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.5">
<field name="id" type="Int" indexed="true" stored="true"/>
<field name="name" type="text_general" indexed="true" stored="true" />
<field name="brand" type="text_general" indexed="true" stored="true"/>
<field name="category" type="text_general" indexed="true" stored="true"/>
<field name="description" type="text_general" indexed="true" stored="true" />
<field name="catchall" type="text_general" indexed="true" stored="true" multiValued="true" />
<copyField source="id" dest="catchall" />
<copyField source="name" dest="catchall" />
<copyField source="brand" dest="catchall" />
<copyField source="category" dest="catchall" />
<copyField source="description" dest="catchall" />
<uniqueKey>id</uniqueKey>
<defaultSearchField>catchall</defaultSearchField>
<types>
<fieldtype name="string" class="solr.StrField" sortMissingLast="true" />
<fieldtype name="Int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldtype name="text_general" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<charFilter class="solr.HTMLStripCharFilterFactory"/>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1"
splitOnNumerics="1"
splitOnCaseChange="1"
generateNumberParts="1"
catenateWords="0"
catenateNumbers="0"
catenateAll="0"
preserveOriginal="1"
/>
<filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
<filter class="solr.ICUFoldingFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<charFilter class="solr.HTMLStripCharFilterFactory"/>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1"
splitOnNumerics="1"
splitOnCaseChange="1"
generateNumberParts="1"
catenateWords="0"
catenateNumbers="0"
catenateAll="0"
preserveOriginal="1"
/>
<filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
<filter class="solr.ICUFoldingFilterFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldtype>
<fieldtype name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField" />
</types>
</schema>
Edit
The entity definition from data-config.xml is as below
<entity name="master_products"
pk="id"
query="select p.* ,b.* from master_products p ,master_brands b where b.id=p.brand_id"
deltaImportQuery="SELECT * FROM master_products WHERE product_name='${dataimporter.delta.product_name}' "
>
<!-- or b.brnad='${dataimporter.delta.brand}' -->
<field column="product_name" name="name"/>
<field column="product_description" name="description"/>
<field column="id" name="id"/>
<field column="mrp" name="mrp"/>
<field column="brand" name="brand"/>
<entity name="master_brands"
query="select * from master_brands"
deltaImportQuery="select * from master_brands where id ={master_products.brand_id}" processor="SqlEntityProcessor" cacheImpl="SortedMapBackedCache" >
</entity>
<entity name="master_product_categories"
query="select * from master_product_categories"
deltaImportQuery="select * from master_product_categories where id ={master_products. product_category_id}" processor="SqlEntityProcessor" cacheImpl="SortedMapBackedCache" >
<field column="category" name="category" />
</entity>
</entity>
Edit
The query is as below.
http://localhost:8983/solr/myproducts/select?fl=* score&fq=brand:Nikon&fq=mrp:28950*&indent=on&q=name:*"Nikon D3200 (Black) DSLR with AF-S 18-55mm VR Kit Lens"*&wt=json
I would like help in achieving my goal. Can you please direct me to creating the proper configuration that would meet my purpose? Thanks in advance.

Wildcard queries are constant scoring, meaning that they won't change the score of the documents that match. You probably want to use regular querying (and not wildcards) to get proper scoring between documents.
Range queries [a TO z], prefix queries a*, and wildcard queries a*b are constant-scoring (all matching documents get an equal score). The scoring factors tf, idf, index boost, and coord are not used. There is no limitation on the number of terms that match (as there was in past versions of Lucene).
fq terms do not affect the score; they only filter the result set.
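As a rough illustration of the point above, here is a minimal Python sketch that builds a non-wildcard query, so the main query contributes to the score while the brand restriction stays in fq. The core URL is the one from the question. The use of the edismax parser, the qf boosts, and the helper names are assumptions for illustration, not tuned or recommended values.

```python
from urllib.parse import urlencode


def build_scored_params(text, brand=None):
    """Return request parameters for a scored (non-wildcard) search.

    The qf field boosts below are illustrative assumptions; the field
    names come from the schema in the question.
    """
    params = [
        ("defType", "edismax"),   # parser that scores across several fields
        ("q", text),              # plain terms, no leading/trailing wildcards
        ("qf", "name^3 brand^2 category description"),
        ("fl", "*,score"),        # return stored fields plus the score
        ("wt", "json"),
    ]
    if brand is not None:
        # fq only filters the result set; it does not change the score.
        # Assumes brand contains no double quotes.
        params.append(("fq", 'brand:"%s"' % brand))
    return params


def build_select_url(base_url, params):
    """Append URL-encoded parameters to a Solr select endpoint."""
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    params = build_scored_params("Nikon D3200 Black DSLR", brand="Nikon")
    print(build_select_url("http://localhost:8983/solr/myproducts/select", params))
```

User input containing query-syntax characters (quotes, parentheses, colons) would need escaping before being passed as q; that is left out of this sketch.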

Related

Solr combining exact match and likely match on single text field not working

I am trying to perform a partial ("likely") match on the full-name field and an exact match on the office-no, mobile-number, house-no, and other-phone-number fields. I have copied all of these into a text field, "full-search-all", so that the website can offer a single text box: searching for a name fragment such as Kat should return Katric, and entering an exact mobile number such as 123456789 in the same box should return the exact match. When I search on "full-search-all" in the Solr admin UI, only one of the two behaviours works (either the exact match on mobile, office, and house numbers, or the partial match on full-name), never both. I am using the Standard Query Parser.
Below is the schema.xml file I created for this search.
Can you point out what is wrong in it? Can both kinds of search not work on a single text field?
Complete schema.xml file below
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="dynamic" version="1.5">
<types>
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="15" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
</fieldType>
<fieldType name="search" class="solr.TextField" positionIncrementGap="100" multiValued="true">
<analyzer type="index">
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
</fieldType>
<fieldType name="exactstring" class="solr.TextField" sortMissingLast="true" omitNorms="true">
<analyzer>
<tokenizer class="solr.KeywordTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
</fieldType>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true" />
<fieldType name="long" class="solr.TrieLongField" />
</types>
<fields>
<!-- The _version_ field is required when using the Solr update log or SolrCloud (cfr. SOLR-3432) -->
<field name="_version_" type="long" indexed="true" stored="true" />
<field name="full-search-all" type="search" indexed="true" stored="false" multiValued="true" />
<field name="phone-number" type="exactstring" indexed="true" stored="false" multiValued="true" />
<!-- Exact Match columns -->
<copyField source="mobile-number" dest="phone-number" />
<copyField source="house-no" dest="phone-number" />
<copyField source="office-no" dest="phone-number" />
<copyField source="other-phone-number" dest="phone-number" />
<copyField source="mobile-number" dest="full-search-all" />
<copyField source="house-no" dest="full-search-all" />
<copyField source="office-no" dest="full-search-all" />
<copyField source="other-phone-number" dest="full-search-all" />
<copyField source="full-name" dest="full-search-all" />
<!-- query fields -->
<field name="application-id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="full-name" type="text_general" indexed="true" stored="true" required="false" multiValued="false" />
<field name="mobile-number" type="exactstring" indexed="true" stored="true" required="false" multiValued="false" />
<field name="house-no" type="exactstring" indexed="true" stored="true" required="false" multiValued="false" />
<field name="office-no" type="exactstring" indexed="true" stored="true" required="false" multiValued="false" />
<field name="other-phone-number" type="exactstring" indexed="true" stored="true" required="false" multiValued="false" />
<field name="campaign-name" type="string" indexed="true" stored="true" required="false" multiValued="false" />
<field name="reason" type="text_general" indexed="true" stored="true" required="false" multiValued="false" />
</fields>
<uniqueKey>application-id</uniqueKey>
</schema>
Field names should consist only of alphanumeric or underscore characters and should not start with a digit:
The name of the field. Field names should consist of alphanumeric or underscore characters only and not start with a digit. This is not currently strictly enforced, but other field names will not have first class support from all components and back compatibility is not guaranteed. Names with both leading and trailing underscores (e.g. _version_) are reserved. Every field must have a name.
Most of your field names contain the - character; remove it.
Source : https://cwiki.apache.org/confluence/display/solr/Defining+Fields
Once you have copied the fields into full_search_all, you can't separate them again within that field. So if you want the name to match by prefix and the phone number to match exactly, you can't do both with a single field.
Instead, write a query analyzer in your application that decides which field to search.
For example, if a user types 123456789 (digits only) in the text box, your query analyzer should choose phone_number as the field to search.
Query will be:
phone_number : 123456789
And if a user types ashraful (non-numeric) in the text box, your query analyzer should choose full_name.
Query will be:
full_name : ashraful
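The routing described above could be sketched in Python roughly as follows. The function name is hypothetical, and the rule "digits only means phone number" is the simple assumption taken from the example; real phone input with spaces, dashes, or a leading + would need its own normalisation.

```python
import re

# Digits only, e.g. "123456789". ASCII digits are assumed on purpose.
DIGITS_ONLY = re.compile(r"[0-9]+")


def route_query(user_input):
    """Pick the field to search based on what the user typed.

    Returns a Solr query string such as "phone_number:123456789"
    or "full_name:ashraful".
    """
    text = user_input.strip()
    if not text:
        raise ValueError("empty search input")
    if DIGITS_ONLY.fullmatch(text):
        # Exact match against the KeywordTokenizer-based phone field.
        return "phone_number:%s" % text
    # Anything else is treated as a (partial) name search.
    return "full_name:%s" % text


if __name__ == "__main__":
    print(route_query("123456789"))
    print(route_query("ashraful"))
```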

Solr spatial query returning location outside the search range

I'm using Solr 5.3.0 and evaluating geospatial search. I followed the instructions in the online reference, but I see results outside the radius. Do you see any issue in the schema or data?
http://localhost:8983/solr/demo2/select?q=*.*&wt=json&indent=true&spatial=true&pt=4.89%2C-4.05&sfield=geoloc&d=1
<schema name="weather lookup index" version="1.3">
<types>
<fieldType name="integer" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
<fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
<fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
</types>
<fields>
<field name="name" type="text" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="geoloc" type="location" indexed="true" multiValued="false" stored="true"/>
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>
</fields>
<defaultSearchField>name</defaultSearchField>
<solrQueryParser defaultOperator="OR"/>
</schema>
Result:
{
"responseHeader": {
"status": 0,
"QTime": 2,
"params": {
"q": "*.*",
"pt": "4.89,-4.05",
"d": "1",
"indent": "true",
"spatial": "true",
"sfield": "geoloc",
"wt": "json",
"": "1443653671468"
}
},
"response": {
"numFound": 1,
"start": 0,
"docs": [
{
"name": [ "test" ],
"geoloc": [ "40.89,-86.05" ],
"id": "9711c69e-9ac3-4302-a41d-719f57fde24c",
"_version_": 1513779446777118700
}
]
}
}
I believe you want to add a filter query to your search. As you have it now, you are defining the spatial parameters but not filtering anything.
&fq={!geofilt pt=4.89,-4.05 sfield=geoloc d=1}
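For illustration, a small Python sketch that assembles the same geofilt filter and URL-encodes it. The point, field, and distance are the ones from the question; the helper names are hypothetical, and using q=*:* (match all documents) as the main query is an assumption about what was intended.

```python
from urllib.parse import urlencode


def geofilt_params(lat, lon, sfield, d_km):
    """Return select parameters that filter by distance with {!geofilt}.

    d_km is the radius in kilometres around the point (lat, lon).
    """
    fq = "{!geofilt pt=%s,%s sfield=%s d=%s}" % (lat, lon, sfield, d_km)
    return [
        ("q", "*:*"),   # assumption: match everything, let fq do the filtering
        ("fq", fq),
        ("wt", "json"),
        ("indent", "true"),
    ]


def geofilt_url(base_url, lat, lon, sfield, d_km):
    """Build the full, URL-encoded select request."""
    return base_url + "?" + urlencode(geofilt_params(lat, lon, sfield, d_km))


if __name__ == "__main__":
    print(geofilt_url("http://localhost:8983/solr/demo2/select",
                      4.89, -4.05, "geoloc", 1))
```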
It turned out that Solr had not been indexing the lat/long at all. Solr creates a file called managed_schema in C:\Program Files\solr\server\solr\\conf that overrides my schema.xml. I got rid of the managed_schema file and it started using schema.xml (I had to add a few missing types to schema.xml to make it work).

SOLR performance

I am using SolrJ + Solr in my project.
The problem is that I have hit an unexplained bottleneck involving Solr/Jetty.
Using jvisualvm, I connected to the JVM instance in which Solr is running and saw that 77% of the time is spent in the method "org.eclipse.jetty.io.ByteArrayBuffer.readFrom()". The stack trace of one of the threads is below:
"qtp64700533-36718" - Thread t#36718
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at org.eclipse.jetty.io.ByteArrayBuffer.readFrom(ByteArrayBuffer.java:391)
at org.eclipse.jetty.io.bio.StreamEndPoint.fill(StreamEndPoint.java:141)
at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.fill(SocketConnector.java:227)
at org.eclipse.jetty.http.HttpParser.fill(HttpParser.java:1040)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:280)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:745)
So it may look normal that the time is spent on I/O, but:
the application that issues the queries runs on the same machine (so I/O time should not be large, and the "RUNNABLE" thread state in the stack trace above seems suspicious)
query response times can reach 5-10 seconds
the load average on the machine (CentOS) is about 10
Any help or advice is appreciated, thanks!
UPD:
Indeed, I forgot to give additional info. Here it is:
hardware: i3770, 32 GB RAM; iotop shows 50-600 KB/sec read and 200-1000 KB/sec write (almost all of it from the Solr process)
OS: CentOS 6.6
java: OpenJDK 64-Bit Server VM (1.7.0_71 24.65-b04)
solr: 4.9.0 (launched with -Xmx=24000, but I think I should split the Solr cores into separate Solr JVM instances to minimize GC time)
solrj: 4.10.3; documents are added/updated/removed with commitWithIn=10000 msec in the Java code.
about schemas: I store data (ads + objects) in Solr for 5 countries: UA, RU, PL, BY, KZ.
So there are 2 cores per country, for example for Ukraine: ua_ads and ua_objects (10 cores in total).
The schemas are almost identical between countries; see below for Ukraine.
"ua_ads" schema (I should rename it from "example", though :) )
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.5">
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />
<fieldType name="text_ru" class="solr.TextField" positionIncrementGap="100"/>
<field name="_version_" type="long" indexed="true" stored="true"/>
<uniqueKey>adId</uniqueKey>
<field name="adId" type="long" indexed="true" stored="true" required="true"/>
<field name="objectId" type="long" indexed="true" stored="true" required="false"/>
<field name="url" type="string" indexed="false" stored="true" required="true"/>
<field name="regionId" type="int" indexed="false" stored="true" required="true"/>
<field name="sourceId" type="int" indexed="false" stored="true" required="true"/>
<field name="type" type="int" indexed="false" stored="true" required="true"/>
<field name="title" type="text_ru" indexed="false" stored="true" required="true"/>
<field name="address" type="text_ru" indexed="false" stored="true" required="true"/>
<field name="text" type="text_ru" indexed="false" stored="true" required="true"/>
<field name="dateFound" type="tdate" indexed="true" stored="true" required="true"/>
<!-- should be a string field (not int) to avoid cutting zero at beginning of phone number -->
<field name="phoneNumbers" type="string" indexed="true" stored="true" required="true" multiValued="true"/>
<field name="priceLocal" type="long" indexed="false" stored="true" required="false"/>
<field name="priceUsd" type="long" indexed="false" stored="true" required="false"/>
<field name="currency" type="int" indexed="false" stored="true" required="false"/>
<field name="roomsCount" type="int" indexed="false" stored="true" required="false"/>
<field name="area" type="int" indexed="false" stored="true" required="false"/>
<field name="imagesCount" type="int" indexed="true" stored="true" required="true"/>
</schema>
"ua_objects" schema
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.5">
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />
<fieldtype name="binary" class="solr.BinaryField"/>
<fieldType name="addr_ru" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<!-- no stemming for address, dots must me followed by space: "г. Киев" -->
<!-- char filters is always firs (preprocessing) -->
<charFilter class="solr.MappingCharFilterFactory" mapping="lang/chars_replacement.txt" />
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<!-- replacing all except letters, removing "-" in home address (9-А) -->
<filter class="solr.PatternReplaceFilterFactory" pattern="[^0-9abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюяіїє\-]" replacement="" replace="all"/>
<!-- replacing all except letters, removing "-" in home address ("9-а" => "9а") -->
<filter class="solr.PatternReplaceFilterFactory" pattern="(\d{1,3})[\- ]([абвгдеёжзийклмнопрстуфхцчшщ])" replacement="$1$2" replace="all"/>
<filter class="solr.SynonymFilterFactory" ignoreCase="true" synonyms="lang/cities_ukr2rus.txt"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="ї" replacement="и" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="і" replacement="и" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="й" replacement="и" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="ё" replacement="е" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="є" replacement="е" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="э" replacement="е" replace="all"/>
<!-- 1-length is for case with home letters: "Хрещатик, 3" -->
<filter class="solr.LengthFilterFactory" min="1" max="64"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ru.txt,lang/stopwords_addr.txt" format="snowball"/>
</analyzer>
</fieldType>
<fieldType name="text_ru" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<!-- dots must me followed by space: "г. Киев" -->
<!-- char filters is always firs (preprocessing) -->
<charFilter class="solr.MappingCharFilterFactory" mapping="lang/chars_replacement.txt" />
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="[^0-9abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюяіїє\-]" replacement="" replace="all"/>
<!-- replacing all except letters, removing "-" in home address ("9-а" => "9а") -->
<filter class="solr.PatternReplaceFilterFactory" pattern="(\d{1,3})[\- ]([абвгдеёжзийклмнопрстуфхцчшщ])" replacement="$1$2" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="ї" replacement="и" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="і" replacement="и" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="й" replacement="и" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="ё" replacement="е" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="є" replacement="е" replace="all"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="э" replacement="е" replace="all"/>
<filter class="solr.LengthFilterFactory" min="1" max="64"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ru.txt" format="snowball"/>
<filter class="solr.SynonymFilterFactory" ignoreCase="true" synonyms="lang/synonyms.txt"/>
<filter class="solr.SnowballPorterFilterFactory" language="Russian"/>
</analyzer>
</fieldType>
<field name="_version_" type="long" indexed="true" stored="true"/>
<uniqueKey>objectId</uniqueKey>
<field name="objectId" type="long" indexed="true" stored="true" required="true"/>
<field name="url" type="string" indexed="false" stored="true" required="true"/>
<field name="regionId" type="int" indexed="true" stored="true" required="true"/>
<field name="sourceId" type="int" indexed="false" stored="true" required="true"/>
<field name="type" type="int" indexed="true" stored="true" required="true"/>
<field name="address" type="addr_ru" indexed="true" stored="true" required="true"/>
<field name="title" type="text_ru" indexed="true" stored="true" required="true"/>
<field name="text" type="text_ru" indexed="true" stored="true" required="true"/>
<field name="dateFound" type="tdate" indexed="true" stored="true" required="true"/>
<!-- should be a string field (not int) to avoid cutting zero at beginning of phone number -->
<field name="phoneNumbers" type="string" indexed="true" stored="true" required="true" multiValued="true"/>
<field name="ownerDetected" type="boolean" indexed="true" stored="true" required="true"/>
<field name="priceUsd" type="long" indexed="true" stored="true" required="false"/>
<field name="priceLocal" type="long" indexed="false" stored="true" required="false"/>
<field name="currency" type="int" indexed="false" stored="true" required="false"/>
<field name="roomsCount" type="int" indexed="true" stored="true" required="false"/>
<field name="area" type="int" indexed="true" stored="true" required="false"/>
<field name="dateUpdated" type="tdate" indexed="true" stored="true" required="true"/>
<field name="dateClosed" type="tdate" indexed="true" stored="true" required="false"/>
<field name="m2priceRel" type="float" indexed="true" stored="true" required="false"/>
<field name="ceddData" type="binary" indexed="false" stored="true" required="false" multiValued="true"/>
<field name="imagesCount" type="int" indexed="true" stored="true" required="true"/>
<field name="uniqAdTexts" type="string" indexed="false" stored="true" required="true" multiValued="true"/>
</schema>
biggest indexes:
ru_ads: 2.99gb
ru_objects: 3.25gb
ua_ads: 5.45gb
ua_objects: 2.36gb
the other cores' indexes are relatively small
queries that take too long ("too long" as seen from the client side) look like these (taken from the Solr log; "????" is just non-English letters)
400723188 [qtp64700533-40547] INFO org.apache.solr.core.SolrCore ? [ua-objects] webapp=/solr path=/select params={mm=2&fl=*&start=0&q=(??????\+????????\+???????\+????????)+AND+type:3+AND+regionId:2+AND+((*:*+AND+-roomsCount:[*+TO+*])+OR+roomsCount:[2+TO+2])+AND+((*:*+AND+-area:[*+TO+*])+OR+area:[40+TO+60])+AND+((*:*+AND+-priceUsd:[*+TO+*])+OR+priceUsd:[23500+TO+70500])+AND+dateUpdated:[2014-12-09T10:23:07Z+TO+2015-01-28T10:23:07Z]+AND+-objectId:(27824841)&qf=address^20+title^2&wt=javabin&version=2&defType=edismax&rows=2147483647} hits=18 status=0 QTime=287
401989528 [qtp64700533-40830] INFO org.apache.solr.core.SolrCore ? [ru-objects] webapp=/solr path=/select params={mm=2&fl=*&start=0&q=(?????????????\+??????)+AND+type:4+AND+regionId:162+AND+((*:*+AND+-roomsCount:[*+TO+*])+OR+roomsCount:[1+TO+1])+AND+((*:*+AND+-area:[*+TO+*])+OR+area:[40+TO+58])+AND+((*:*+AND+-priceUsd:[*+TO+*])+OR+priceUsd:[9+TO+27])+AND+dateUpdated:[2014-12-09T10:44:08Z+TO+2015-01-28T10:44:08Z]+AND+-objectId:(26415616)&qf=address^20+title^2&wt=javabin&version=2&defType=edismax&rows=2147483647} hits=820 status=0 QTime=5755
400832723 [qtp64700533-40322] INFO org.apache.solr.core.SolrCore ? [ru-objects] webapp=/solr path=/select params={mm=2&fl=*&start=0&q=(????????\+???????)+AND+type:4+AND+regionId:102+AND+((*:*+AND+-roomsCount:[*+TO+*])+OR+roomsCount:[1+TO+1])+AND+((*:*+AND+-area:[*+TO+*])+OR+area:[31+TO+45])+AND+((*:*+AND+-priceUsd:[*+TO+*])+OR+priceUsd:[115+TO+343])+AND+dateUpdated:[2014-12-09T10:24:57Z+TO+2015-01-28T10:24:57Z]+AND+-objectId:(26415342)&qf=address^20+title^2&wt=javabin&version=2&defType=edismax&rows=2147483647} hits=9 status=0 QTime=372
402069370 [qtp64700533-40832] INFO org.apache.solr.core.SolrCore ? [ru-objects] webapp=/solr path=/select params={mm=1&fl=*&start=0&q=(????????\+?????????\+??\+????????)+AND+type:3+AND+regionId:135+AND+((*:*+AND+-roomsCount:[*+TO+*])+OR+roomsCount:[1+TO+1])+AND+((*:*+AND+-area:[*+TO+*])+OR+area:[28+TO+40])+AND+((*:*+AND+-priceUsd:[*+TO+*])+OR+priceUsd:[9529+TO+28585])+AND+dateUpdated:[2014-10-30T10:45:33Z+TO+2015-01-28T10:45:33Z]+AND+-objectId:(26415855)&qf=address^20+title^2+text&wt=javabin&version=2&defType=edismax&rows=2147483647} hits=14075 status=0 QTime=544
401805198 [qtp64700533-40233] INFO org.apache.solr.core.SolrCore ? [ua-objects] webapp=/solr path=/select params={mm=2&fl=*&start=0&q=(??????\+??\+??????\+?????\+??????????)+AND+type:3+AND+regionId:16+AND+((*:*+AND+-roomsCount:[*+TO+*])+OR+roomsCount:[3+TO+3])+AND+((*:*+AND+-area:[*+TO+*])+OR+area:[93+TO+95])+AND+((*:*+AND+-priceUsd:[*+TO+*])+OR+priceUsd:[284050+TO+313950])+AND+dateUpdated:[2015-01-08T10:41:09Z+TO+2015-01-28T10:41:09Z]+AND+-objectId:(27826334)&qf=address^20+title^2&wt=javabin&version=2&defType=edismax&rows=2147483647} hits=6 status=0 QTime=462
here is fresh profiling screenshot from jvisualvm
part of "top" command, delay=10sec
You have set the parameter rows=2147483647 in every one of your queries. The meaning of this parameter is (taken from the reference):
You can use the rows parameter to paginate results from a query. The
parameter specifies the maximum number of documents from the complete
result set that Solr should return to the client at one time.
The default value is 10. That is, by default, Solr returns 10
documents at a time in response to a query.
So you are in effect telling Solr to send all hits found for a query in a single response. This is the reason for your bad performance.
Does Google send you all 500,000,000 hits when you search for "java"? No. Why not? Performance. Every IR application I know returns a small page with the first results so that a search performs well.
This is also the reason for your high I/O: Solr fetches the records from disk and writes them to the response. This is I/O, nothing more, nothing less.
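A minimal Python sketch of what paging with start and rows could look like on the client side, instead of rows=2147483647. Here fetch_page is a hypothetical caller-supplied function that runs the query with the given start and rows and returns the "response" object of Solr's reply (a dict with "numFound" and "docs"); the page size of 100 is an arbitrary assumption.

```python
def iter_docs(fetch_page, page_size=100):
    """Yield matching documents one page at a time.

    fetch_page(start, rows) is a hypothetical callable that runs the
    Solr query with those paging parameters and returns a dict with
    "numFound" (total hits) and "docs" (the documents of this page).
    """
    start = 0
    while True:
        response = fetch_page(start, page_size)
        docs = response["docs"]
        for doc in docs:
            yield doc
        start += len(docs)
        # Stop when a page comes back empty or all hits have been read.
        if not docs or start >= response["numFound"]:
            break


if __name__ == "__main__":
    # Stand-in for a real Solr call so the sketch runs without a server.
    data = [{"objectId": i} for i in range(7)]

    def fake_fetch(start, rows):
        return {"numFound": len(data), "docs": data[start:start + rows]}

    for doc in iter_docs(fake_fetch, page_size=3):
        print(doc)
```

If the client only needs the first few results, it can simply stop iterating early and no further pages are requested.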
Since you are using this for analytics and want to extract everything that matches, you should look into the new streaming export feature. Unfortunately, it is only available starting with Solr 4.10.
You can also upgrade to SSDs, which give Solr a very good performance boost.
Finally, review your cache levels. If you don't update frequently and some of the caches are full, you could increase the defaults. If you do update frequently, it's not as beneficial as caches are invalidated on commits.

Errors when attempting to create a CQL3 backed SOLR Core

We are currently attempting to create a CQL3 Backed SOLR core on DSE 3.2.5. The curl command to create the core fails with a schema disagreement error (after waiting 30 seconds). After attempting to reload the core several times the indexes were created. We tried adding in some data and reindexing but kept getting null:org.apache.solr.common.SolrException errors on two of the machines. One machine works.
EDIT: The field name error was due to inserting the maps with the incorrect field name. Now one node works, but the other two keep getting the following error.
We are getting this error on two of the nodes:
INFO [http-8983-7] 2014-03-17 16:02:01,715 SolrDispatchFilter.java (line 618) [admin] webapp=null path=/admin/cores params={deleteAll=true&action=RELOAD&reindex=true&_=1395072121091&core=linkcurrent_search.content_items&wt=json} status=500 QTime=543
ERROR [http-8983-7] 2014-03-17 16:02:01,715 SolrException.java (line 136) null:org.apache.solr.common.SolrException
at com.datastax.bdp.search.solr.core.SolrCoreResourceManager.reloadCore(SolrCoreResourceManager.java:439)
at com.datastax.bdp.search.solr.handler.admin.CassandraCoreAdminHandler.handleReloadAction(CassandraCoreAdminHandler.java:144)
at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:170)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:615)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:206)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
at com.datastax.bdp.search.solr.servlet.CassandraDispatchFilter.doFilter(CassandraDispatchFilter.java:90)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at com.datastax.bdp.cassandra.audit.SolrHttpAuditLogFilter.doFilter(SolrHttpAuditLogFilter.java:194)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at com.datastax.bdp.search.solr.auth.CassandraAuthorizationFilter.doFilter(CassandraAuthorizationFilter.java:92)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at com.datastax.bdp.search.solr.auth.DseAuthenticationFilter.doFilter(DseAuthenticationFilter.java:102)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:724)
ERROR [http-8983-7] 2014-03-17 16:02:01,716 SolrDispatchFilter.java (line 642) Error request exception: null
org.apache.solr.common.SolrException
at com.datastax.bdp.search.solr.core.SolrCoreResourceManager.reloadCore(SolrCoreResourceManager.java:439)
at com.datastax.bdp.search.solr.handler.admin.CassandraCoreAdminHandler.handleReloadAction(CassandraCoreAdminHandler.java:144)
at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:170)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:615)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:206)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
at com.datastax.bdp.search.solr.servlet.CassandraDispatchFilter.doFilter(CassandraDispatchFilter.java:90)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at com.datastax.bdp.cassandra.audit.SolrHttpAuditLogFilter.doFilter(SolrHttpAuditLogFilter.java:194)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at com.datastax.bdp.search.solr.auth.CassandraAuthorizationFilter.doFilter(CassandraAuthorizationFilter.java:92)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at com.datastax.bdp.search.solr.auth.DseAuthenticationFilter.doFilter(DseAuthenticationFilter.java:102)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:724)
Here is our schema.xml:
<schema name="content" version="1.5">
<types>
<fieldType name="tfloat" class="solr.TrieFloatField" omitNorms="true" positionIncrementGap="0"/>
<fieldType name="string" class="solr.StrField"/>
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
<fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.LowerCaseTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
</fieldType>
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
<!-- in this example, we will only use synonyms at query time
<filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
-->
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
</types>
<fields>
<!-- Our standard fields for searching -->
<field name="content_id" type="string" indexed="true" stored="true" required="true" />
<field name="account_id" type="string" indexed="true" stored="true" required="true" />
<field name="url" type="string" indexed="true" stored="true" required="true" />
<field name="published_at" type="tdate" indexed="true" stored="true" />
<field name="title" type="text_general" indexed="true" stored="true" />
<!-- Multi-valued fields -->
<field name="authors" type="lowercase" multiValued="true" stored="true" indexed="true" omitNorms="true" />
<field name="tags" type="lowercase" multiValued="true" stored="true" indexed="true" omitNorms="true" />
<field name="channels" type="lowercase" multiValued="true" stored="true" indexed="true" omitNorms="true" />
<field name="boards" type="lowercase" multiValued="true" stored="true" indexed="true" omitNorms="true" />
<!-- Dynamic fields -->
<dynamicField name="shared_on_*" type="string" indexed="true" stored="true" />
<dynamicField name="score_value_*" type="tfloat" indexed="true" stored="true" />
<dynamicField name="score_calculated_*" type="tdate" indexed="true" stored="true" />
<dynamicField name="score_pspv_*" type="tfloat" indexed="true" stored="true" />
<dynamicField name="score_velocity_*" type="tfloat" indexed="true" stored="true" />
</fields>
<defaultSearchField>title</defaultSearchField>
<solrQueryParser defaultOperator="AND" />
<uniqueKey>content_id</uniqueKey>
</schema>
and our table definition:
CREATE KEYSPACE linkcurrent_search
WITH REPLICATION = { 'class': 'NetworkTopologyStrategy', 'Solr': 3, 'Cassandra': 3 }; -- production
USE linkcurrent_search;
CREATE TABLE content_items (
content_id TEXT, -- The content_id
account_id TEXT, -- The account_id
url TEXT, -- The URL of the article
published_at TIMESTAMP, -- When the article was published
title TEXT, -- The title of the article
-- Our multi-valued fields
authors SET<TEXT>, -- Set of authors
channels SET<TEXT>, -- Set of channels
tags SET<TEXT>, -- Set of tags
boards SET<TEXT>, -- Set of associated boards
-- Our dynamic fields
shared_on_ MAP<TEXT, TEXT>, -- Social networks that item has been shared on
score_value_ MAP<TEXT, FLOAT>, -- The current score value
score_calculated_ MAP<TEXT, TIMESTAMP>, -- The last time the score was calculated
score_pspv_ MAP<TEXT, FLOAT>, -- The predicted social pageviews from the score calculation
score_velocity_ MAP<TEXT, FLOAT>, -- The current velocity of the score
PRIMARY KEY(content_id)
) WITH comment = 'Content items '
AND read_repair_chance=0.001000;
And this is the data we entered:
cqlsh:linkcurrent_search> select * from content_items;
# Row 1
-------------------+-----------------------------------------------------------------------------------------------
content_id | 4893332cd2caa0a1424702f1e1c55cba
account_id | 4893332cd2caa0a1424702f1e1c55cbd
authors | {'Test McTest'}
boards | {'530e5a6cb91c275929002dbf'}
channels | {'us.test'}
published_at | 2014-03-14 11:03:00+0000
score_calculated_ | {'realtime': '2014-03-17 12:34:23+0000'}
score_pspv_ | {'realtime': 124}
score_value_ | {'realtime': 42}
score_velocity_ | {'realtime': 0.42}
shared_on_ | {'facebook': '9138739jdh', 'twitter': '1243243214'}
solr_query | null
tags | null
title | Test title
url | http://www.test.com/2014/03/14/test.html
Bouncing DSE on the problem nodes seems to have resolved the issue.

Cannot make any query to Solr

I am trying to set up Solandra (Solr + Cassandra) and use it through the SolrJ library. I managed to set everything up and add some documents using SolrJ, but when I tried to run a query, it failed. Running the query through a URL failed as well.
I used:
http://localhost:8983/solandra/my_core/select?q=*:*&wt=xml
And in my app I used the following code:
CommonsHttpSolrServer server = new CommonsHttpSolrServer(url);
server.setParser(new XMLResponseParser());
SolrQuery query = new SolrQuery();
query.setQuery("some");
query.setStart(0);
query.setRows(10);
QueryResponse response = server.query(query);
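For comparison with the URL test, here is a minimal sketch of the HTTP request that the SolrJ snippet roughly corresponds to (`q`, `start`, `rows`, plus `wt=xml`). The class name `SelectUrlSketch` and the helper `selectUrl` are hypothetical names introduced for illustration, the core URL is the one from the question, and SolrJ itself may add further parameters that are not shown here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds a Solr "select" URL by hand, using only the JDK.
public class SelectUrlSketch {

    // Returns "<coreUrl>/select?..." with every parameter value percent-encoded.
    public static String selectUrl(String coreUrl, String q, int start, int rows) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", q);
        params.put("start", String.valueOf(start));
        params.put("rows", String.valueOf(rows));
        params.put("wt", "xml");

        StringBuilder url = new StringBuilder(coreUrl).append("/select");
        char separator = '?';
        for (Map.Entry<String, String> p : params.entrySet()) {
            url.append(separator).append(p.getKey()).append('=').append(encode(p.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        String core = "http://localhost:8983/solandra/my_core"; // core URL from the question
        System.out.println(selectUrl(core, "some", 0, 10));
        System.out.println(selectUrl(core, "*:*", 0, 10));
    }
}
```

Requesting the printed URL from a browser or curl exercises the same handler without SolrJ, which can help tell a client-side problem from a server-side one.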
This is the stack trace that I got:
org.apache.solr.common.SolrException: java.lang.StackOverflowError
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:281)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at solandra.SolandraDispatchFilter.execute(SolandraDispatchFilter.java:171)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at solandra.SolandraDispatchFilter.doFilter(SolandraDispatchFilter.java:137)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.lang.StackOverflowError
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:150)
at java.net.SocketInputStream.read(SocketInputStream.java:121)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
at org.apache.commons.httpclient.HttpConnection.isStale(HttpConnection.java:506)
at org.apache.commons.httpclient.HttpConnection.closeIfStale(HttpConnection.java:431)
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.closeIfStale(MultiThreadedHttpConnectionManager.java:1313)
at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:382)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:422)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421)
at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Could it be that I missed something in schema.xml? This is what it looks like:
<schema name="my_core" version="1.1">
<types>
<fieldType name="string" class="solr.StrField"/>
<fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ReversedWildcardFilterFactory" withOriginal="true" maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
</types>
<fields>
<field name="url" type="string" indexed="true" stored="true"/>
<field name="content" type="text_general_rev" indexed="true" stored="true"/>
<field name="media" type="string" indexed="true" stored="true"/>
<field name="country" type="string" indexed="true" stored="true"/>
<field name="date" type="string" indexed="true" stored="true"/>
</fields>
<defaultSearchField>content</defaultSearchField>
<uniqueKey>url</uniqueKey>
</schema>
I solved it by increasing the thread stack size in the solandra.in.env file. I set the -Xss512k parameter, in case someone else runs into the same problem.
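To illustrate why a larger thread stack can make a StackOverflowError go away, here is a minimal standalone sketch. It is not Solandra code: `StackDepthSketch` and `maxDepth` are hypothetical names, and the sketch uses the `Thread` constructor that takes a requested stack size instead of the JVM-wide -Xss flag, on the assumption that the JVM honours the request (the JDK documents it as a hint that some platforms may round or ignore).

```java
// Hypothetical demo: measures how deep a thread can recurse for a given requested stack size.
public class StackDepthSketch {

    // Recurses on a fresh thread until its stack is exhausted; returns the depth reached.
    public static int maxDepth(long stackSizeBytes) {
        final int[] reached = {0};
        Runnable probe = () -> {
            try {
                recurse(reached);
            } catch (StackOverflowError expected) {
                // The stack is exhausted; reached[0] holds the depth we got to.
            }
        };
        // The fourth argument is the requested stack size in bytes.
        Thread worker = new Thread(null, probe, "stack-probe", stackSizeBytes);
        worker.start();
        try {
            worker.join(); // join() makes the worker's writes to reached[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return reached[0];
    }

    private static void recurse(int[] reached) {
        reached[0]++;
        recurse(reached);
    }

    public static void main(String[] args) {
        System.out.println("256 KiB stack: " + maxDepth(256 * 1024) + " frames");
        System.out.println("4 MiB stack:   " + maxDepth(4L * 1024 * 1024) + " frames");
    }
}
```

On JVMs that honour the requested size, the second figure is typically much larger than the first. The same reasoning applies to -Xss, which sets the default stack size for threads in the JVM.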

Resources