I've set up Solr 3.6.2 on Tomcat as described here.
Using the sunspot-rails gem and the embedded solr server I have no problems, but on my staging server I'm getting the response:
message ERROR: [doc=Foo 20] unknown field 'type'
description The request sent by the client was syntactically incorrect.
The request data looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<add>
<doc>
<field name="id">Foo 20</field>
<field name="type">Foo</field>
<field name="type">ActiveRecord::Base</field>
<field name="class_name">Foo</field>
<field name="name">test</field>
</doc>
</add>
What's causing this? Is there some configuration that should be set? (I'm expecting something that allows for the type name to be used regardless of whether or not such a column exists.)
It turns out that the sunspot-solr gem expects a slightly different schema.xml than the default one bundled with Solr.
I replaced the file with the one that the gem uses (from here) and it works fine now. This answer explains what the schema.xml file is.
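For reference, the error goes away once schema.xml declares the fields that appear in the request above. A minimal sketch of the relevant declarations follows. The attribute values are my assumption, derived from the request shown in the question (two values for type, one for class_name), and are not copied from the gem, so check them against the schema.xml that ships with your sunspot version.

```xml
<!-- schema.xml fragment (inside <fields>): assumed definitions, verify against the gem's file -->
<field name="id" type="string" indexed="true" stored="true" multiValued="false"/>
<!-- "type" is sent twice per document (Foo, ActiveRecord::Base), so it has to be multi-valued -->
<field name="type" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="class_name" type="string" indexed="true" stored="false" multiValued="false"/>
```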
I am trying to implement delta-import in Solr indexing. It works fine when I index data from a database, but I want to implement it on a file-based data source.
My data-config.xml file looks like this:
<dataSource type="com.solr.datasource.DataSource" name="SuggestionsFile"/>
<document name="suggester">
<entity name="file" dataSource="SuggestionsFile">
<field column="suggestion" name="suggestion" />
</entity>
I am using the DataImportHandler in solrconfig.xml. I was not able to post my full config file; I tried, but for some reason it does not show up.
My DataSource class reads the text file and returns a list of data that Solr indexes. It works fine for full-import but not for delta-import. Please suggest what else I need to do.
The FileListEntityProcessor supports filtering the file list based on the "newerThan" attribute:
<entity
name="fileimport"
processor="FileListEntityProcessor"
newerThan="${dataimporter.last_index_time}"
.. other options ..
>
...
</entity>
There's a complete example available online.
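To make the idea concrete for a text file of suggestions, here is a rough sketch that uses only stock DIH processors instead of the custom DataSource class from the question. The directory, the file name pattern and the entity names are placeholders I made up; the field name suggestion comes from the question.

```xml
<dataConfig>
  <dataSource type="FileDataSource" name="files" encoding="UTF-8"/>
  <document name="suggester">
    <!-- Outer entity: lists only files modified since the last import (placeholder path and pattern) -->
    <entity name="fileimport" processor="FileListEntityProcessor"
            baseDir="/path/to/suggestions" fileName=".*\.txt"
            newerThan="${dataimporter.last_index_time}"
            rootEntity="false" dataSource="null">
      <!-- Inner entity: one Solr document per line; LineEntityProcessor exposes each line as "rawLine" -->
      <entity name="line" processor="LineEntityProcessor"
              url="${fileimport.fileAbsolutePath}" dataSource="files">
        <field column="rawLine" name="suggestion"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

As far as I know, the filtering happens inside the entity itself, so this would be run as a full-import with clean=false rather than as a delta-import; that way documents from unchanged files stay in the index.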
Hi I am getting this exception and I've exhausted all the possible settings that I could think of.
org.apache.solr.common.SolrException: ERROR: [doc=SOMEURL] unknown field ''
The problem is the field '': the quotation marks are empty, so I don't know what causes the problem.
Has anybody had the same problem? Any hint would help me a lot.
Some information:
Nutch version 2.1
Solr version 1.5
Hbase as a data storage
Tomcat 6 running Solr
In my code I have just this:
nutchDocument.add("my_key",stringValue);
I have checked Solr's schema.xml, Nutch's schema.xml and also Nutch's solrindex-mapping.xml (I am sure they are in the right directories); "my_key" is spelled correctly in each of them.
Thanks for your help.
Well, I must have been blind. I found the problem. For anyone who runs into a similar problem, here is the reason:
In my solrindex-mapping.xml I had this:
<field dest="video_og_title" source="video_og_title" />
<field dest="video_og_type" source="video_og_type"/>
<field dest="video_og_image" source="video_og_image" />
<field name="video_og_url" source="video_og_url"/>
<field name="video_og_description" source="video_og_description" />
<field name="video_og_video" source="video_og_video" />
I didn't notice that the last three fields use the attribute name instead of dest. The dest attribute is what is used for the mapping, so for those entries the missing dest ended up as the empty field name ''.
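The corrected entries, with dest used consistently, would look like this (same field names as above, only the attribute name changed):

```xml
<!-- solrindex-mapping.xml fragment: source is the Nutch document field, dest is the Solr field -->
<field dest="video_og_title" source="video_og_title"/>
<field dest="video_og_type" source="video_og_type"/>
<field dest="video_og_image" source="video_og_image"/>
<field dest="video_og_url" source="video_og_url"/>
<field dest="video_og_description" source="video_og_description"/>
<field dest="video_og_video" source="video_og_video"/>
```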
I don't know Java, I don't know XML, and I don't know Lucene. Now that that's out of the way: I have been working on a little project using Apache Solr/Lucene. My problem is that I am unable to index the XML files. I think I understand how it's supposed to work, but I could be wrong. I am not sure what information you need in order to help me, so I will just post the code.
<dataConfig>
<dataSource type="FileDataSource" encoding="UTF-8" />
<document>
<!-- This first entity block will read all xml files in baseDir and feed it into the second entity block for handling. -->
<entity name="AMMFdir" rootEntity="false" dataSource="null"
processor="FileListEntityProcessor"
fileName="^*\.xml$" recursive="true"
baseDir="C:\Documents and Settings\saperez\Desktop\Tomcat\apache-tomcat-7.0.23\webapps\solr\data\AMMF_New"
>
<entity
processor="XPathEntityProcessor"
name="AMMF"
pk="AcquirerBID"
datasource="AMMFdir"
url="${AMMFdir.fileAbsolutePath}"
forEach="/AMMF/Merchants/Merchant/"
transformer="DateFormatTransformer, RegexTransformer"
>
<field column="AcquirerBID" xpath="/AMMF/Merchants/Merchant/AcquirerBID" />
<field column="AcquirerName" xpath="/AMMF/Merchants/Merchant/AcquirerName" />
<field column="AcquirerMerchantID" xpath="/AMMF/Merchants/Merchant/AcquirerMerchantID" />
</entity>
</entity>
</document>
Example xml file
<?xml version="1.0" encoding="utf-8"?>
<AMMF xmlns="http://tempuri.org/XMLSchema.xsd" Version="11.2" CreateDate="2011-11-07T17:05:14" ProcessorBINCIB="422443" ProcessorName="WorldPay" FileSequence="18">
<Merchants Count="153">
<Merchant ChangeIndicator="A" LocationCountry="840">
<AcquirerBID>10029881</AcquirerBID>
<AcquirerName>WorldPay</AcquirerName>
<AcquirerMerchantID>*</AcquirerMerchantID>
<Merchant ChangeIndicator="A" LocationCountry="840">
<AcquirerBID>10029882</AcquirerBID>
<AcquirerName>WorldPay2</AcquirerName>
<AcquirerMerchantID>Hello World!</AcquirerMerchantID>
</Merchant>
</Merchants>
I have this in schema.
<field name="AcquirerBID" type="string" indexed="true" stored="true" required="true" />
<field name="AcquirerName" type="string" indexed="true" stored="true" />
<field name="AcquirerMerchantID" type="string" indexed="true" stored="true"/>
I have this in config.
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler" default="true" >
<lst name="defaults">
<str name="config">AMMFconfig.xml</str>
</lst>
</requestHandler>
The sample XML is not well formed. This might explain errors indexing the files:
$ xmllint sample.xml
sample.xml:13: parser error : expected '>'
</Merchants>
^
sample.xml:14: parser error : Premature end of data in tag Merchants line 3
sample.xml:14: parser error : Premature end of data in tag AMMF line 2
Corrected XML
Here's what I think your sample data should look like (I didn't check it against the XSD file):
<?xml version="1.0" encoding="utf-8"?>
<AMMF xmlns="http://tempuri.org/XMLSchema.xsd" Version="11.2" CreateDate="2011-11-07T17:05:14" ProcessorBINCIB="422443" ProcessorName="WorldPay" FileSequence="18">
<Merchants Count="153">
<Merchant ChangeIndicator="A" LocationCountry="840">
<AcquirerBID>10029881</AcquirerBID>
<AcquirerName>WorldPay</AcquirerName>
<AcquirerMerchantID>*</AcquirerMerchantID>
</Merchant>
<Merchant ChangeIndicator="A" LocationCountry="840">
<AcquirerBID>10029882</AcquirerBID>
<AcquirerName>WorldPay2</AcquirerName>
<AcquirerMerchantID>Hello World!</AcquirerMerchantID>
</Merchant>
</Merchants>
</AMMF>
Alternative solution
I know you said you're not a programmer, but this task is significantly simpler if you use the SolrJ interface.
The following is a Groovy example which indexes your example XML:
//
// Dependencies
// ============
import org.apache.solr.client.solrj.SolrServer
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer
import org.apache.solr.common.SolrInputDocument
@Grapes([
@Grab(group='org.apache.solr', module='solr-solrj', version='3.5.0'),
])
//
// Main
// =====
SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr/");
def i = 1
new File(".").eachFileMatch(~/.*\.xml/) {
it.withReader { reader ->
def ammf = new XmlSlurper().parse(reader)
ammf.Merchants.Merchant.each { merchant ->
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", i++)
doc.addField("bid_s", merchant.AcquirerBID)
doc.addField("name_s", merchant.AcquirerName)
doc.addField("merchantId_s", merchant.AcquirerMerchantID)
server.add(doc)
}
}
}
server.commit()
Groovy is a Java scripting language that does not require compilation. It would be just as easy to maintain as a DIH config file.
To figure out how DIH XML import works, I suggest you first carefully read this chapter in DIH wiki: http://wiki.apache.org/solr/DataImportHandler#HttpDataSource_Example.
Open the Slashdot link http://rss.slashdot.org/Slashdot/slashdot in your browser, then right click on the page and select View source. There's the XML file used in this example.
Compare it with XPathEntityProcessor configuration in DIH example and you'll see how easy it is to import any XML file in Solr.
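Applied to the merchant files from the question, a configuration written along the lines of the wiki's file-based example might look roughly like the sketch below. The baseDir value is a placeholder, and the fileName pattern and the forEach path without a trailing slash are my assumptions about what the question intended; I have not run this against the sample data.

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- Outer entity: lists the XML files (placeholder directory) and hands each one to the inner entity -->
    <entity name="AMMFdir" processor="FileListEntityProcessor"
            baseDir="/path/to/AMMF_New" fileName=".*\.xml" recursive="true"
            rootEntity="false" dataSource="null">
      <!-- Inner entity: one Solr document per Merchant element -->
      <entity name="AMMF" processor="XPathEntityProcessor"
              url="${AMMFdir.fileAbsolutePath}"
              forEach="/AMMF/Merchants/Merchant">
        <field column="AcquirerBID" xpath="/AMMF/Merchants/Merchant/AcquirerBID"/>
        <field column="AcquirerName" xpath="/AMMF/Merchants/Merchant/AcquirerName"/>
        <field column="AcquirerMerchantID" xpath="/AMMF/Merchants/Merchant/AcquirerMerchantID"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```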
If you need more help just ask...
Often the best thing to do is NOT use the DIH. How hard would it be to just post this data using the API and a custom script in a language you DO know?
The benefit of this approach is two-fold:
You learn more about your system, and know it better.
You don't spend time trying to understand the DIH.
The downside is that you're re-inventing the wheel a bit, but the DIH is quite a thing to understand.
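As a sketch of what such a script would have to produce: Solr's XML update handler accepts an add message in the same format as the request shown at the top of this page. The field names below are taken from the schema in the question and the values from its sample file; how the script builds and posts the message is up to you.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- One <doc> per Merchant element of the source file -->
<add>
  <doc>
    <field name="AcquirerBID">10029881</field>
    <field name="AcquirerName">WorldPay</field>
    <field name="AcquirerMerchantID">*</field>
  </doc>
  <doc>
    <field name="AcquirerBID">10029882</field>
    <field name="AcquirerName">WorldPay2</field>
    <field name="AcquirerMerchantID">Hello World!</field>
  </doc>
</add>
```

The message is posted to the update URL, followed by a commit (for example a separate message containing only the commit element) so that the documents become searchable.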
I am trying to index a PDF using a curl-based request.
The request is:
curl "http://localhost:8080/solr1/update/extract?literal.id=who.pdf&uprefix=attr_&fmap.content=attr_content&commit=true" -F "myfile=@/root/apache-solr-3.1.0/docs/who.pdf"
On submitting the request, I am getting this error:
HTTP Status 400 - ERROR:unknown field 'ignored_meta'
type: Status report
message: ERROR:unknown field 'ignored_meta'
description: The request sent by the client was syntactically incorrect (ERROR:unknown field 'ignored_meta').
Apache Tomcat/6.0.18
Your problem is that the default ExtractingRequestHandler configuration in solrconfig.xml puts all fields extracted by Tika that are not otherwise identified into fields named 'ignored_XXXXX'.
To solve this, you can simply add a dynamic field named 'ignored_*' to your Solr schema, like this:
<dynamicField name="ignored_*" type="ignored"/>
Don't forget to also add the ignored type if you removed it from the default configuration:
<fieldtype name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField" />
This will stop Solr from rejecting the request when Tika extracts fields that Solr doesn't know about.
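For context, the 'ignored_' prefix comes from the handler defaults in solrconfig.xml. The fragment below is written from memory of the example configuration shipped with Solr 3.x, so treat the exact set of entries as an assumption and compare it with your own file.

```xml
<!-- solrconfig.xml fragment: assumed stock defaults for the extract handler -->
<requestHandler name="/update/extract" startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <!-- Unknown fields get this prefix unless the request passes its own uprefix -->
    <str name="uprefix">ignored_</str>
    <!-- Some captured elements are mapped to the ignored_ prefix explicitly -->
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>
```

Parameters given in the request URL override these defaults, which is why a mapping that is only set here can still produce an 'ignored_' field even when the request specifies a different uprefix.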
Please find below the steps that I executed.
I am following the same structure as you described, and I checked the results in the admin page by clicking the search button; the samples work fine.
Example: I added monitor.xml and searched for "video"; the search results are displayed properly.
Let me explain the problem I am facing:
Step 1: I started Apache Tomcat.
Step 2: Indexing data
java -jar post.jar myfile.xml
Here is my XML content:
<add>
<doc>
<field name="id">11111</field>
<field name="name">Youth to Elder</field>
<field name="Author"> Integrated Research Program</field>
<field name="Year">2009</field>
<field name="Publisher"> First Nation</field>
</doc>
<doc>
<field name="id">22222</field>
<field name="name">Strategies </field>
<field name="Author">Implementation Committee </field>
<field name="Year">2001</field>
<field name="Publisher">Policy</field>
</doc>
</add>
Step 4: I ran
java -jar post.jar myfile.xml
The output of the above is:
SimplePostTool: version 1.2
SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, other encodings are not currently supported
SimplePostTool: POSTing files to http://localhost:8983/solr/update..
SimplePostTool: POSTing file curnew.xml
SimplePostTool: FATAL: Solr returned an error: Bad Request
Please help me with this.
You need to configure your schema. The default schema doesn't have any Author, Year or Publisher fields.
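A minimal sketch of the missing declarations, assuming the field type names string and int from the stock example schema (adjust the types if you need tokenized full-text search on Author or Publisher):

```xml
<!-- schema.xml fragment (inside <fields>); type names assume the stock example schema -->
<field name="Author" type="string" indexed="true" stored="true"/>
<field name="Year" type="int" indexed="true" stored="true"/>
<field name="Publisher" type="string" indexed="true" stored="true"/>
```

After editing schema.xml, restart Solr (or reload the core) and post the file again.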