Tika Solr integration

I am trying to index a document using a curl request.
The request is:
curl "http://localhost:8080/solr1/update/extract?literal.id=who.pdf&uprefix=attr_&fmap.content=attr_content&commit=true" -F "myfile=@/root/apache-solr-3.1.0/docs/who.pdf"
On submitting the request, I get this error:
HTTP Status 400 - ERROR:unknown field 'ignored_meta'
type: Status report
message: ERROR:unknown field 'ignored_meta'
description: The request sent by the client was syntactically incorrect (ERROR:unknown field 'ignored_meta').
Apache Tomcat/6.0.18

Your problem is that the default ExtractingRequestHandler configuration in solrconfig.xml puts every field Tika extracts that the schema does not define into a field named 'ignored_XXXXX'.
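The renaming can be modeled roughly as below, based on the handler's documented `uprefix` and `lowernames` parameters. This is a simplified sketch, not Solr's code: the real handler also applies `fmap.*` mappings and matches dynamic fields, both left out here, and the schema field set in the example is hypothetical.

```python
def solr_field_name(tika_name, schema_fields, uprefix="ignored_", lowernames=True):
    """Return the field name an extracted Tika metadata key ends up in.

    Simplified model: fmap.* renames and dynamic-field matching are omitted.
    """
    name = tika_name
    if lowernames:
        # lowernames=true: lowercase, non-alphanumeric characters become "_"
        name = "".join(c.lower() if c.isalnum() else "_" for c in name)
    if name in schema_fields:
        return name
    # Names the schema does not know get the uprefix, so the schema needs a
    # matching dynamic field such as "ignored_*" or the document is rejected.
    return uprefix + name


print(solr_field_name("meta", {"id", "text"}))  # ignored_meta
```

With a hypothetical schema containing only `id` and `text`, the extracted `meta` key becomes `ignored_meta`, the name in the error message.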
To solve this, add a dynamic field named 'ignored_*' to your schema.xml, like this:
<dynamicField name="ignored_*" type="ignored"/>
Also make sure the "ignored" field type is defined. It is in the default schema, so add it back if you removed it:
<fieldtype name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField" />
With these in place, Solr stops rejecting the request when Tika extracts fields that the schema does not know about.
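For scripted indexing, a minimal Python sketch that builds the same /update/extract URL with proper encoding and POSTs the PDF as the raw request body with a `Content-Type` header, which the extract handler should accept in place of a multipart upload. The base URL, document id, parameters and path come from the question; the helper names are hypothetical.

```python
import urllib.parse
import urllib.request


def extract_url(base, doc_id, params=None):
    """Build an /update/extract URL; literal.id sets the document's id field."""
    query = {"literal.id": doc_id}
    query.update(params or {})
    return f"{base.rstrip('/')}/update/extract?{urllib.parse.urlencode(query)}"


def post_file(url, path, content_type="application/pdf"):
    """POST the file bytes as the request body and return Solr's reply."""
    with open(path, "rb") as f:
        body = f.read()
    req = urllib.request.Request(url, data=body, headers={"Content-Type": content_type})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

Usage: `post_file(extract_url("http://localhost:8080/solr1", "who.pdf", {"uprefix": "attr_", "fmap.content": "attr_content", "commit": "true"}), "/root/apache-solr-3.1.0/docs/who.pdf")`. Letting `urlencode` build the query string avoids shell quoting problems with `&` in the URL.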

Related

Adding fields to Solr using the API

I am very new to Solr. I am trying to add a large number of fields to the schema. I am using version 8.1, and it is my understanding that this should be done through the API.
I am trying to upload all fields using curl, but keep getting errors. It works fine through the web interface.
1. Where can I find the correct field types? I checked here, but I get error messages like "Field type 'StrField' not found". The values are also different from the ones presented in the web interface.
2. Enum values: I found documentation, but following it also results in an unknown field error. For enums I don't see an option in the web interface.
curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field":{"name":"TEST","type":"StrField","required":"true","stored":true,"indexed":"true"}}' http://localhost:8983/api/cores/tgec/schema
{
  "responseHeader":{
    "status":400,
    "QTime":27},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.api.ApiBag$ExceptionWithErrObject",
      "root-error-class","org.apache.solr.api.ApiBag$ExceptionWithErrObject"],
    "details":[{
        "add-field":{
          "name":"TEST",
          "type":"StrField",
          "required":"true",
          "stored":true,
          "indexed":"true"},
        "errorMessages":["Field 'TEST': Field type 'StrField' not found.\n"]}],
    "msg":"error processing commands",
    "code":400}}
There is a field type named "string" whose class is "solr.StrField".
It is defined in the schema file (schema.xml, or managed-schema when the schema is managed) as follows:
<fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true" />
When you define a field, you refer to that type by its name, string:
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
So in your request, change "type":"StrField" to "type":"string".
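Since the goal is to add many fields, a minimal Python sketch: the Schema API accepts a list under "add-field", so all fields can go in one request, and `GET <core>/schema/fieldtypes` lists the type names (such as "string") that "type" must refer to. The URL `http://localhost:8983/solr/tgec/schema` is assumed to be the v1 form of the endpoint in the question; the helper names are hypothetical.

```python
import json
import urllib.request


def add_fields_payload(fields):
    """Schema API body adding many fields at once ("add-field" takes a list)."""
    return json.dumps({"add-field": list(fields)})


def unknown_types(fields, known_types):
    """Names of fields whose "type" is not a defined field type name."""
    return [f["name"] for f in fields if f["type"] not in known_types]


def field_type_names(schema_url):
    """Fetch the defined field type names from <schema_url>/fieldtypes."""
    with urllib.request.urlopen(schema_url.rstrip("/") + "/fieldtypes") as resp:
        return {ft["name"] for ft in json.load(resp)["fieldTypes"]}


def post_schema(schema_url, payload):
    """POST a Schema API command body and return the decoded JSON reply."""
    req = urllib.request.Request(
        schema_url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Running `unknown_types(fields, field_type_names(url))` before `post_schema` would have flagged a field declared with "StrField", because only type names like "string" appear in that list, not class names.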

400 Bad Request: unknown field 'type'

I've set up Solr 3.6.2 on Tomcat as described here.
Using the sunspot-rails gem and the embedded Solr server I have no problems, but on my staging server I get this response:
message ERROR: [doc=Foo 20] unknown field 'type'
description The request sent by the client was syntactically incorrect.
The request data looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<add>
<doc>
<field name="id">Foo 20</field>
<field name="type">Foo</field>
<field name="type">ActiveRecord::Base</field>
<field name="class_name">Foo</field>
<field name="name">test</field>
</doc>
</add>
What's causing this? Is there some configuration that should be set? (I'm expecting something that allows for the type name to be used regardless of whether or not such a column exists.)
It turns out that the sunspot-solr gem expects a slightly different schema.xml than the default one bundled with Solr.
I replaced the file with the one that the gem uses (from here) and it works fine now. This answer explains what the schema.xml file is.
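A small Python sketch for spotting this kind of mismatch before posting: it lists the field names in an `<add>` document that neither a named field nor a dynamic pattern (one leading or trailing `*`) covers. The field set and patterns passed in are hypothetical; take the real ones from the schema.xml you deploy.

```python
import xml.etree.ElementTree as ET


def matches_dynamic(name, pattern):
    """Dynamic field patterns have a single leading or trailing "*"."""
    if pattern.startswith("*"):
        return name.endswith(pattern[1:])
    if pattern.endswith("*"):
        return name.startswith(pattern[:-1])
    return name == pattern


def unknown_fields(add_xml, fields, dynamic_patterns=()):
    """Field names in an <add> message that the given schema would reject."""
    unknown = []
    for field in ET.fromstring(add_xml).iter("field"):
        name = field.get("name")
        if name in fields or any(matches_dynamic(name, p) for p in dynamic_patterns):
            continue
        if name not in unknown:  # report each name once
            unknown.append(name)
    return unknown
```

Against a schema with only `id` and `name`, the request above would report `type` and `class_name`, which is the kind of gap the sunspot schema fills.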

Is it possible to have a static index field for Liferay using solr-web plugin?

Can anyone tell me if I can associate a static index field for Liferay using the solr-web plugin? Is there a way to define a static index field in Solr?
I need something similar to the following Nutch configuration:
<property>
<name>index.static</name>
<value>source:nutch</value>
</property>
This adds a field "source" with the value "nutch" to every document Nutch indexes. Is there anything similar for Liferay + Solr?
I'm not sure about the Liferay configuration, but you can give the field a default value in schema.xml; Solr applies it to every document that is indexed without that field.
<field name="source" type="string" indexed="true" stored="true" default="nutch" />
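Roughly how a schema default behaves, as a Python model rather than Solr code: the value is filled in at index time, and only for documents that omit the field, so documents indexed before the schema change need re-indexing to get it.

```python
def apply_schema_default(doc, field, default):
    """Model of a schema default: fill field only when the document lacks it."""
    if field in doc:
        return dict(doc)  # an explicit value always wins over the default
    return {**doc, field: default}
```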

Using an external file to boost results. Changes in the external file not reflected

I am using Drupal 7 with the apachesolr module.
I have an external file field to boost the results I want. The name of the file is external_eff_ranking. In the schema, I have:
<fieldType name="pfloat" class="solr.FloatField" omitNorms="true"/>
<fieldType name="file" keyField="id" defVal="1" stored="false" indexed="false" class="solr.ExternalFileField" valType="pfloat"/>
<dynamicField name="eff_*" type="file"/>
The format of the external file is:
id1=3.1
id2=4.2
id3=5
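A Python model of how the file above maps to boost values, assuming one `key=value` pair per line where the key is the `keyField` value (`id`) and documents not listed fall back to `defVal` (1 in this schema). It is a sketch for checking a file's contents, not Solr's parser.

```python
def parse_external_file(text):
    """Parse key=value lines (key = the keyField value, here the doc id)."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, sep, value = line.rpartition("=")  # split on the last "="
        if sep:
            values[key] = float(value)
    return values


def boost_for(values, doc_id, def_val=1.0):
    """Value used for doc_id; ids missing from the file get defVal."""
    return values.get(doc_id, def_val)
```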
This works as expected: the results are boosted according to the values in the file. The problem is that when the values change, the results do not reflect the changes. I understand that I need to commit the changes somehow, but I cannot figure out how.
I tried things like:
curl "http://localhost:8983/solr/update?commit=true" -H "Content-Type: text/xml" --data-binary '<commit />'
but it did not work.
SOLVED
The following line in my solrconfig.xml solved the problem:
<requestHandler name="/reloadCache" class="org.apache.solr.search.function.FileFloatSource$ReloadCacheRequestHandler" />
Then I request this URL (http://localhost:port/reloadCache) after each update to the file.
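A minimal Python sketch of that update cycle: rewrite the file atomically so Solr never reads a half-written file, then request the reload handler. The data directory and the full reload URL (abbreviated above as http://localhost:port/reloadCache) depend on the installation and are assumptions here; the helper names are hypothetical.

```python
import os
import tempfile
import urllib.request


def write_external_file(data_dir, field, values):
    """Rewrite data_dir/external_<field> atomically with key=value lines."""
    path = os.path.join(data_dir, "external_" + field)
    fd, tmp = tempfile.mkstemp(dir=data_dir)  # same directory, so the rename is atomic
    try:
        with os.fdopen(fd, "w") as f:
            for key, value in values.items():
                f.write(f"{key}={value}\n")
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)  # leave no stray temp file behind on failure
        raise
    return path


def reload_cache(url):
    """Request the /reloadCache handler registered in solrconfig.xml."""
    with urllib.request.urlopen(url) as resp:
        return resp.status
```

Usage: `write_external_file(data_dir, "eff_ranking", {"id1": 3.1, "id2": 4.2})` followed by `reload_cache(reload_url)`, where both arguments are whatever your Solr setup uses.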
This looks like it is due to a bug in Solr that affects cached results. Maybe trying the reloadCache handler helps?

Make location fields visible in Solr

I have the following field defined in solr (schema.xml)
<field name="store" type="location" indexed="true" stored="true"/>
If I search with a filter like this:
&fq={!geofilt pt=45.15,-93.85 sfield=store d=5}
Then I can see the location coordinates in the search result.
But the field "store" seems to be a hidden field under normal circumstances. How do I get the coordinates to be a part of the search result for normal searches? (q=*:* for example)
I just verified that this works correctly for both Solr 3.1 and Solr 4.0-dev with the example data.
Example:
http://localhost:8983/solr/select?q=*:*&fl=id,store&wt=json&indent=true
[...]
"response":{"numFound":17,"start":0,"docs":[
{
"id":"SP2514N",
"store":"35.0752,-97.032"},
{
"id":"6H500F0",
"store":"45.17614,-93.87341"},
{
"id":"F8V7067-APL-KIT",
"store":"45.18014,-93.87741"},
[...]
Did you perhaps change this setting and forget to re-index or forget to commit?
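A minimal Python sketch of the same check from a client: request `fl=id,store` explicitly and split each stored `"lat,lon"` value into floats. The base URL is the example server's; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def select_url(base, q="*:*", fl="id,store"):
    """Build a /select URL that asks for the location field in fl."""
    params = {"q": q, "fl": fl, "wt": "json"}
    return f"{base.rstrip('/')}/select?{urllib.parse.urlencode(params)}"


def parse_locations(body, field="store"):
    """Map id -> (lat, lon) for documents whose field holds "lat,lon"."""
    locations = {}
    for doc in json.loads(body)["response"]["docs"]:
        if field in doc:
            lat, lon = (float(part) for part in doc[field].split(","))
            locations[doc["id"]] = (lat, lon)
    return locations


def fetch_locations(base):
    """Run the query against a live server and parse the result."""
    with urllib.request.urlopen(select_url(base)) as resp:
        return parse_locations(resp.read().decode("utf-8"))
```

If `fetch_locations("http://localhost:8983/solr")` returns no coordinates while the field is declared `stored="true"`, the documents were likely indexed before that change and need re-indexing and a commit.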
