Solr MoreLikeThis and using Boost Functions (Boost recent Items)

I have a question similar to "Boost recent item in MoreLikeThis Solr request handler".
I would like to boost recent items returned from the MoreLikeThis handler or component.
I found out that bf isn't supported by MoreLikeThisHandler, as it is a DisMax parameter.
Therefore I tried the following (within my solrconfig.xml):
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="df">id</str>
<str name="mlt">true</str>
<str name="mlt.count">10</str>
<str name="mlt.fl">project,type,summary,description,environment,fixfor,component</str>
<str name="mlt.mintf">1</str>
<str name="mlt.mindf">2</str>
<str name="mlt.boost">true</str>
<str name="rows">20</str>
<str name="fl">id,key,project,summary,reporter,assignee,updated,score</str>
<str name="bf">ms(NOW/HOUR,updated)</str>
</lst>
<!--<arr name="components">
<str>mlt</str>
</arr>-->
</requestHandler>
with these field definitions:
<field name="id" type="long" indexed="true" stored="true" required="true" multiValued="false" termVectors="true"/><!-- is termVector by long needed? -->
...
<field name="key" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
...
<field name="description" type="text_general" indexed="true" stored="false" required="true" multiValued="false" termVectors="true"/>
...
<field name="updated" type="date" indexed="true" stored="true" required="false" multiValued="false"/>

Boost functions do not seem to be supported for MLT results.
It may be worth checking the MLT sort patch, SOLR-1545.

Related

Solr Deduplication (dedupe) is not working, getting error while updating document

I have followed the example in the documentation below:
https://solr.apache.org/guide/8_4/de-duplication.html
My requirement is to ignore duplicate records, but after implementing dedupe I cannot add any document (even a unique one) and always get the same error:
Exception in thread "main" org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/my_core: Document contains multiple values for uniqueKey field: id=[0011, affa84b255f98fd800dd0056b7040855]
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:681)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:266)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:214)
at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:177)
at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:138)
at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:156)
solrconfig.xml :
<updateRequestProcessorChain name="dedupe">
<processor class="solr.processor.SignatureUpdateProcessorFactory">
<bool name="enabled">true</bool>
<str name="signatureField">id</str>
<str name="fields">first_name,last_name,phone_no</str>
<bool name="overwriteDupes">false</bool>
<str name="signatureClass">solr.processor.TextProfileSignature</str>
</processor>
<processor class="solr.LogUpdateProcessorFactory" />
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
<requestHandler name="/update" class="solr.UpdateRequestHandler" >
<lst name="defaults">
<str name="update.chain">dedupe</str>
</lst>
</requestHandler>
schema.xml :
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="dummydata" version="1.5">
<field name="first_name" type="string" indexed="true" stored="true" multiValued="false" />
<field name="last_name" type="string" indexed="true" stored="true" multiValued="false" />
<field name="location" type="string" indexed="true" stored="true" multiValued="false" />
<field name="phone_no" type="string" indexed="true" stored="true" multiValued="false" />
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<uniqueKey>id</uniqueKey>
</schema>
Java code used :
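import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.UpdateResponse;
import org.apache.solr.common.SolrInputDocument;

public class AddDocument {
    public static void main(String[] args) throws Exception {
        String urlString = "http://localhost:8983/solr/my_core";
        // try-with-resources closes the client even if add() throws
        try (SolrClient solr = new HttpSolrClient.Builder(urlString).build()) {
            SolrInputDocument myDocumentInstantlycommited = new SolrInputDocument();
            // The client supplies its own id here, in addition to the dedupe signature
            myDocumentInstantlycommited.addField("id", "0011");
            myDocumentInstantlycommited.addField("first_name", "T11");
            myDocumentInstantlycommited.addField("last_name", "L11");
            myDocumentInstantlycommited.addField("phone_no", "9912121312");
            myDocumentInstantlycommited.addField("location", "TESt211");
            UpdateResponse response = solr.add(myDocumentInstantlycommited);
            solr.commit();
            System.out.println("Documents Updated");
        }
    }
}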
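One reading of the error message is that the signature processor adds its hash to the id field while the client also sends id=0011, so the uniqueKey ends up with two values. The fragment below is only a sketch of one way around that, assuming a dedicated field for the fingerprint; the field name signature is hypothetical and not part of the original configuration.

```xml
<!-- schema.xml: hypothetical dedicated field that holds the fingerprint -->
<field name="signature" type="string" indexed="true" stored="true" multiValued="false" />

<!-- solrconfig.xml: write the fingerprint to "signature" instead of the uniqueKey "id" -->
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <str name="fields">first_name,last_name,phone_no</str>
    <!-- true: a new document replaces earlier ones with the same signature -->
    <bool name="overwriteDupes">true</bool>
    <str name="signatureClass">solr.processor.TextProfileSignature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

With this layout the client keeps supplying its own id. Note that overwriteDupes replaces the older duplicate rather than rejecting the newer one, which may or may not match the "ignore duplicates" requirement.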

Solr: how to index file content into multiple fields?

Solr version:
7.3.0
I want to index files and store the extracted text in multiple fields (a word-split field and a bi-gram field) for search flexibility.
I wrote the configset below, but it does not work: Solr indexes the content into only one of content_text or content_text_bi (whichever fmap.content entry is defined first).
solrconfig.xml
...
<requestHandler name="/update/extract"
startup="lazy"
class="solr.extraction.ExtractingRequestHandler" >
<lst name="defaults">
<str name="lowernames">true</str>
<str name="uprefix">ignored_</str>
<str name="fmap.meta">ignored_</str>
<str name="fmap.content">content_text</str>
<str name="fmap.content">content_text_bi</str>
<str name="captureAttr">true</str>
</lst>
</requestHandler>
...
schema.xml
...
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<!-- docValues are enabled by default for long type so we don't need to index the version field -->
<field name="_version_" type="plong" indexed="false" stored="false"/>
<field name="_root_" type="string" indexed="true" stored="false" docValues="false" />
<field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true"/>
<field name="content_text" type="text_ja" indexed="true" stored="true" storeOffsetsWithPositions="false"/>
<field name="content_text_bi" type="text_ja_bi" indexed="true" stored="true" storeOffsetsWithPositions="false"/>
<field name="filepath" type="string" indexed="true" stored="true" />
<field name="filename" type="string" indexed="true" stored="true" />
<field name="storage_id" type="pint" indexed="true" stored="true" />
...
How can I make it work as I want?
I solved it by using copyField in schema.xml.
1. Add this line to schema.xml:
<copyField source="content_text" dest="content_text_bi" />
2. Remove this line from solrconfig.xml:
<str name="fmap.content">content_text_bi</str>

How to configure Solr (4.10) for multi-word auto-suggestion?

I want to create a collection with auto-suggestion in Solr. It works fine for a single word, but I am looking for phrase suggestions: for example, typing "Barack" should return "Barack", "Barack Obama", and "Barack Obama president of USA".
I have six fields but want suggestions for only one of them (Content). How should I configure schema.xml and solrconfig.xml for this? I've tried a ton of examples, but none of them worked for me.
Is there a simple solution? Any help would be appreciated.
Thanks in advance.
Thanks, Amit, for the response. I tried that as well but didn't get what I was looking for.
My schema.xml looks like this (I want suggestions on my Content field),
and my solrconfig.xml is given below:
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="Content" type="suggest_phrase" indexed="true" stored="true" />
<field name="Lang" type="string" indexed="true" stored="true" />
<field name="PubDate" type="tdate" indexed="true" stored="true" />
<field name="Section" type="string" indexed="true" stored="true" />
<field name="PaperName" type="string" indexed="true" stored="true" />
<field name="Page_No" type="tint" indexed="true" stored="true" />
<fieldType name="suggest_phrase" class="solr.TextField" positionIncrementGap="100" multiValued="true">
<analyzer>
<tokenizer class="solr.KeywordTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
</fieldType>
<!-- search content -->
<searchComponent name="suggest_phrase" class="solr.SpellCheckComponent">
<lst name="spellchecker">
<str name="name">suggest_phrase</str>
<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
<str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
<str name="field">suggest_phrase</str>
<str name="buildOnCommit">true</str>
</lst>
</searchComponent>
<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest_phrase">
<lst name="defaults">
<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">suggest_phrase</str>
<str name="spellcheck.onlyMorePopular">true</str>
<str name="spellcheck.count">10</str>
<str name="spellcheck.collate">false</str>
</lst>
<arr name="components">
<str>suggest_phrase</str>
</arr>
</requestHandler>
You could use a shingle filter on the field you are using for auto-suggestion.
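As a rough sketch of the shingle suggestion, the fragment below defines a field type that emits single words plus word n-grams of up to four words, and copies Content into a field of that type. The names suggest_shingle and Content_suggest are hypothetical, and the shingle sizes are only an assumption about how long the suggested phrases should be.

```xml
<!-- Sketch only: tokenize into words, then emit 1- to 4-word shingles -->
<fieldType name="suggest_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="4" outputUnigrams="true" />
  </analyzer>
</fieldType>

<!-- Hypothetical suggestion field, filled from the existing Content field -->
<field name="Content_suggest" type="suggest_shingle" indexed="true" stored="false" />
<copyField source="Content" dest="Content_suggest" />
```

For this to have any effect, the suggester's field parameter would presumably need to name the field (here Content_suggest) rather than the field type.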

Automatic language detection in Solr 4.5.1 at indexing time

I need your help.
I want to detect Korean and English during indexing in Solr.
My Solr directory structure is:
/opt/tmocat7/webapps/solr (solr webapp)
/usr/share/solr/collection1 (solr core)
/usr/share/solr/lib/langid (lib for langid)
First, I copied some libraries (jsonic-1.2.7.jar, langdetect-1.1-20120112.jar, solr-langid-4.5.1.jar) into the langid directory (/usr/share/solr/lib/langid) where my Solr installation is located.
My solrconfig.xml is:
<lib dir="../lib/langid/" regex=".*\.jar" />
<requestHandler name="/update" class="solr.UpdateRequestHandler">
<lst name="defaults">
<str name="update.chain">dedupe</str>
<str name="update.chain">uuid</str>
<str name="update.chain">langid</str>
</lst>
</requestHandler>
<updateRequestProcessorChain name="langid">
<processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
<bool name="langid">true</bool>
<str name="langid.fl">title,content,comment</str>
<str name="langid.langField">lang</str>
<str name="langid.langsField">langs</str>
<str name="langid.lcmap">ko:ko kor:ko en_GB:en en_US:en</str>
<str name="langid.whitelist">ko,en</str>
<bool name="langid.map">true</bool>
<str name="langid.map.fl">title,content,comment</str>
<bool name="langid.map.keepOrig">true</bool>
<bool name="langid.map.individual">true</bool>
<str name="langid.fallback">ko</str>
<str name="langid.map.lcmap">ko:ko kor:ko en_GB:en en_US:en</str>
</processor>
<processor class="solr.LogUpdateProcessorFactory" />
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
and schema.xml is
<field name="lang" type="string" indexed="true" stored="true" multiValued="false" />
<field name="langs" type="string" indexed="true" stored="true" multiValued="true" />
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="title" type="text_ko" indexed="true" stored="true" multiValued="true"/>
<field name="content" type="text_ko" indexed="true" stored="true" multiValued="true"/>
<field name="comment" type="text_ko" indexed="true" stored="true" multiValued="true" />
<field name="site" type="text_general" indexed="true" stored="true" multiValued="false"/>
<field name="page" type="text_general" indexed="true" stored="true" multiValued="false"/>
<field name="fileloc" type="text_general" indexed="true" stored="true"
multiValued="false"/>
<field name="filename" type="text_general" indexed="true" stored="true"
multiValued="false" />
<field name="storeddate" type="date" indexed="true" stored="true" multiValued="false"/>
<!-- for english web data-->
<field name="title_en" type="text_en" indexed="true" stored="true" multiValued="true" />
<field name="content_en" type="text_en" indexed="true" stored="true" multiValued="true" />
<field name="comment_en" type="text_en" indexed="true" stored="true" multiValued="true" />
<field name="title_ko" type="text_ko" indexed="true" stored="true" multiValued="true"/>
<field name="content_ko" type="text_ko" indexed="true" stored="true" multiValued="true"/>
<field name="comment_ko" type="text_ko" indexed="true" stored="true" multiValued="true" />
<copyField source="title" dest="title_en"/>
<copyField source="content" dest="content_en"/>
<copyField source="comment" dest="comment_en"/>
<copyField source="title" dest="title_ko"/>
<copyField source="content" dest="content_ko"/>
<copyField source="comment" dest="comment_ko"/>
I have read some books and searched the web for information about language detection in Solr, but I still can't get the language detected.
What am I doing wrong?
For more information, here are my post.sh and the log.
This is post.sh:
#!/bin/sh
FILES=$*
URL=http://localhost:port/solr/collection1/update
for f in $FILES; do
echo Posting file $f to $URL
# @ tells curl to read the request body from the file
curl "$URL" --data-binary @"$f" -H 'Content-type:application/xml'
echo
done
#send the commit command to make sure all the changes are flushed and visible
curl $URL --data-binary '<commit/>' -H 'Content-type:application/xml'
echo
Part of the Tomcat logs during indexing:
70634079 [http-bio-7070-exec-38] TRACE org.apache.solr.handler.UpdateRequestHandler – body
70634079 [http-bio-7070-exec-38] DEBUG org.apache.solr.update.processor.LogUpdateProcessor – PRE_UPDATE add{,id=2f2323f4f7966e0d} {{params({params(),defaults(update.chain=dedupe&update.chain=uuid&update.chain=langid)}),defaults(wt=xml)}}
70634125 [http-bio-7070-exec-38] TRACE org.apache.solr.update.UpdateLog – TLOG: added id 2f2323f4f7966e0d to tlog{file=/usr/share/solr/collection1/data/tlog/tlog.0000000000000000129 refcount=1} LogPtr(29407) map=614254179
70634125 [http-bio-7070-exec-38] DEBUG org.apache.solr.update.processor.LogUpdateProcessor – PRE_UPDATE FINISH {{params({params(),defaults(update.chain=dedupe&update.chain=uuid&update.chain=langid)}),defaults(wt=xml)}}
70634126 [http-bio-7070-exec-38] INFO org.apache.solr.update.processor.LogUpdateProcessor – [collection1] webapp=/solr path=/update params={} {add=[2f2323f4f7966e0d (1473490520171872256)]} 0 68
70634146 [http-bio-7070-exec-33] TRACE org.apache.solr.handler.UpdateRequestHandler – body
70634146 [http-bio-7070-exec-33] DEBUG org.apache.solr.update.processor.LogUpdateProcessor – PRE_UPDATE add{,id=329ee20831e1a0c7} {{params({params(),defaults(update.chain=dedupe&update.chain=uuid&update.chain=langid)}),defaults(wt=xml)}}
70634148 [http-bio-7070-exec-33] TRACE org.apache.solr.update.UpdateLog – TLOG: added id 329ee20831e1a0c7 to tlog{file=/usr/share/solr/collection1/data/tlog/tlog.0000000000000000129 refcount=1} LogPtr(46005) map=614254179
70634148 [http-bio-7070-exec-33] DEBUG org.apache.solr.update.processor.LogUpdateProcessor – PRE_UPDATE FINISH {{params({params(),defaults(update.chain=dedupe&update.chain=uuid&update.chain=langid)}),defaults(wt=xml)}}
70634148 [http-bio-7070-exec-33] INFO org.apache.solr.update.processor.LogUpdateProcessor – [collection1] webapp=/solr path=/update params={} {add=[329ee20831e1a0c7 (1473490520241078272)]} 0 2
I can't find any other warnings or errors.
I need your advice.
Thanks, all.
I think you should use /update/extract instead of /update.
In Solr 5.3.1, it works fine when I use it with /update/extract.
Here's the full config:
<requestHandler name="/update/extract"
startup="lazy"
class="solr.extraction.ExtractingRequestHandler" >
<lst name="defaults">
<str name="lowernames">true</str>
<str name="uprefix">ignored_</str>
<!-- capture link hrefs but ignore div attributes -->
<str name="captureAttr">true</str>
<str name="fmap.a">links</str>
<str name="fmap.div">ignored_</str>
<str name="update.chain">langid</str>
</lst>
</requestHandler>
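The log lines in the question show three update.chain values in the /update defaults (dedupe, uuid, langid). As far as I can tell, Solr reads a single update.chain value per request, so only the first one (dedupe) would be applied and the langid chain would never run. A minimal sketch for the plain /update handler, assuming langid is the chain that should be used:

```xml
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <!-- One chain per request; processors from the other chains would have to be
         added to the langid chain definition if they are still needed -->
    <str name="update.chain">langid</str>
  </lst>
</requestHandler>
```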
Thanks for the question and the great answers; they helped me configure my system appropriately. I don't know how I managed to get the JAR file solr-langdetect.*.*.*.jar into my lib directory, but each time I started Solr it showed me the following error:
org.apache.solr.common.SolrException: com.cybozu.labs.langdetect.DetectorFactory.loadProfile(Ljava/util/List;)V
After removing that JAR file everything worked fine. The other three JAR files mentioned in the question (jsonic-*.*.*.jar, langdetect-*.*.jar, solr-langid-*.*.*.jar) are, however, required.

How can I add data to dynamic fields when using solr's extract functionality?

I'm using a PHP library called solr-php-client (http://code.google.com/p/solr-php-client/) to interface with my Solr server. I can extract data from the document, store it, and search on it, but I can't seem to get it to allow me to add my own data to the parameters for indexing:
$aParams = array
(
"literal.ClassName_ms" => "File",
"literal.SS_ID_i" => 73,
"literal.Name_ms" => "OverviewOfBenefits.pdf",
"literal.title" => "Overview Of Benefits",
"literal.Created_dt" => "2011-09-19T13:50:30Z",
"literal.last_modified_dt" => "2011-10-12T19:33:59Z",
"literal.SS_Stage_ms" => "Live",
"literal.ClassNameHierarchy_ms" => array("Object","ViewableData","DataObject","File"),
"literal.id" => "File_73_Live",
"fmap.content" => "text",
);
try {
$oResponse = $oSOLR->extract($sFilePath, $aParams);
$oSOLR->commit();
$oSOLR->optimize();
}
catch(Exception $e) {
var_dump($e);
}
I can query "text" and get results:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
<lst name="params">
<str name="indent">on</str>
<str name="start">0</str>
<str name="q">text:Overview</str>
<str name="rows">10</str>
<str name="version">2.2</str>
</lst>
</lst>
<result name="response" numFound="1" start="0">
<doc>
<arr name="content_type"><str>application/pdf</str></arr>
<str name="id">File_73_Live</str>
<date name="last_modified">2011-02-07T16:21:10Z</date>
</doc>
</result>
</response>
But I can't query any of the dynamic fields, e.g. "SS_Stage_ms":
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
<lst name="params">
<str name="indent">on</str>
<str name="start">0</str>
<str name="q">SS_Stage_ms:Live</str>
<str name="rows">10</str>
<str name="version">2.2</str>
</lst>
</lst>
<result name="response" numFound="0" start="0"/>
</response>
Here are the applicable schema definitions:
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="title" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
<dynamicField name="*_i" type="int" indexed="true" stored="false"/>
<dynamicField name="*_ms" type="string" indexed="true" stored="false" multiValued="true"/>
<dynamicField name="*_dt" type="date" indexed="true" stored="false"/>
I switched the schema definitions to store the data so I could see how the fields were being interpreted by Solr:
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="title" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="text" type="text" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="*_i" type="int" indexed="true" stored="true"/>
<dynamicField name="*_ms" type="string" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="*_dt" type="date" indexed="true" stored="true"/>
After doing this, I found that all of the fields were getting switched to lowercase. I found my answer (http://wiki.apache.org/solr/ExtractingRequestHandler):
lowernames=true|false - Map all field names to lowercase with underscores. For example, Content-Type would be mapped to content_type.
By default "lowernames" is set to true. I added "lowernames" to the parameters, set it to false, and voila, it worked!
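If the setting should apply to every extract request instead of being passed with each call, the same parameter can presumably be placed in the handler defaults. This is only a sketch, assuming an /update/extract handler declared like the one in the stock example configuration; the fmap.content mapping is carried over from the PHP parameters above.

```xml
<requestHandler name="/update/extract" startup="lazy" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- Leave field names such as SS_Stage_ms in their original case -->
    <str name="lowernames">false</str>
    <str name="fmap.content">text</str>
  </lst>
</requestHandler>
```

With lowernames disabled, extracted metadata names such as Content-Type are no longer mapped to content_type, so the schema has to accept them under their original names or they need to be routed with uprefix.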
