Need help understanding Solr

I'm just getting started with Nutch and Solr. I ran the crawl once with just one seed URL.
I ran this command:
bin/nutch crawl urls -dir crawl -solr http://localhost:8983/solr/ -depth 3 -topN 5
Everything goes fine, so I assume Solr indexed the pages. How do I go about searching now? I went to localhost:8983/solr/admin/, but when I enter a search query and click Search I get this:
HTTP ERROR 400
Problem accessing /solr/select/.
Reason: undefined field text
I also tried an example from the tutorial but when I run this command:
java -jar post.jar solr.xml monitor.xml
I get this:
SimplePostTool: version 1.4
SimplePostTool: POSTing files to http://localhost:8983/solr/update..
SimplePostTool: POSTing file solr.xml
SimplePostTool: FATAL: Solr returned an error #400 ERROR: [doc=SOLR1000] unknown field 'name'
My ultimate goal is to somehow add this data into Accumulo and use it for a search engine.

I'm assuming you are using Nutch 1.4 or later. If so, you need to change the type of the fields you added in the solr/conf/schema.xml file from "text" to "text_general" (without the quotes).
I am working towards a similar goal right now and used that fix to at least get Solr running properly, although I still cannot get Solr to search the indexed sites. Hope this helps; let me know if you get it working.
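To make the fix above concrete, here is a minimal Python sketch of the same edit done programmatically. It is only an illustration: the sample schema and its field names are made up for the example, and the function name retype_fields is my own. Note that ElementTree drops XML comments when it rewrites a file, so for a real schema.xml editing by hand may be the safer route.

```python
import xml.etree.ElementTree as ET


def retype_fields(schema_xml: str, old: str = "text", new: str = "text_general") -> str:
    """Return the schema XML with every <field type=old> switched to type=new."""
    root = ET.fromstring(schema_xml)
    for field in root.iter("field"):
        if field.get("type") == old:
            field.set("type", new)
    return ET.tostring(root, encoding="unicode")


# Illustrative sample only; a real Nutch schema.xml has many more fields.
SAMPLE = """<schema name="nutch" version="1.5">
  <fields>
    <field name="content" type="text" stored="false" indexed="true"/>
    <field name="id" type="string" stored="true" indexed="true"/>
  </fields>
</schema>"""

if __name__ == "__main__":
    print(retype_fields(SAMPLE))
```

Fields of any other type (such as the string field in the sample) are left untouched, and the named type still has to be defined in the schema's types section for Solr to load it.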

Related

Querying Solr Config API returns Internal Server Error

I'm attempting to update my Solr config via the Solr Config API. I am first trying to query the config with the following endpoint:
http://localhost:8983/solr//config
The response I get back is 500 Internal Server Error, and I noticed the following exception in the logs of the deployed Solr:
Internal Server Error (500) - No RestManager found!
at org.apache.solr.rest.RestManager.getRestManager(RestManager.java:245)
at org.apache.solr.rest.SolrConfigRestApi.createInboundRoot(SolrConfigRestApi.java:67)
at org.restlet.Application.getInboundRoot(Application.java:270)
at org.restlet.engine.application.ApplicationHelper.start(ApplicationHelper.java:127)
at org.restlet.Application.start(Application.java:582)
The core was created using the following post:
curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=<keyspace.table>&generateResources=true&reindex=true"
This action was successful, but a GET to the Config API fails.
The URL should be: http://localhost:8983/solr/#/[ks.cf]/config
There are also convenience methods, which are the recommended way and spare you all the curl commands. See:
dsetool create_core
dsetool get_core_config
dsetool get_core_schema
dsetool reload_core
In your case, running dsetool create_core, then dsetool get_core_config, modifying the solrconfig.xml, and finally running dsetool reload_core with the new solrconfig should get you what you need. See the dsetool documentation for usage details.
If you still need to use curl and want some examples, see the shell files under the wikipedia demo, 1-add-schema.sh for instance. The dsetool commands remain the recommended method.
If you intended to use the Solr Config API, please note that it is a Solr 5 feature, and in DSE you will find version 4.10, so that feature is not available yet. If you want to change the Solr config, you will have to reload the core with the new solrconfig using the dsetool commands outlined above.
It would be very useful to know the exact version you are using.
Hope it helps.

uploaded data is not visible in solr

I am using Solr 4.7. I created a new core by copying "collection1" (the default example provided by Solr) under a different name, say "wiki", and updated core.properties with the new name. The new core is now visible in the Solr admin panel.
After starting Solr, I am trying to import data into the new core like this:
$ java -jar post.jar ../../../enwiki-20150602-pages-articles1.xml -Durl='http://localhost:8983/solr/#/wiki/update'
SimplePostTool version 1.5
Posting files to base url http://localhost:8983/solr/update using content-type application/xml..
POSTing file enwiki-20150602-pages-articles1.xml
SimplePostTool: WARNING: No files or directories matching -Durl=http:/localhost:8983/solr/#/wiki/update
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/update..
Time spent: 0:00:03.671
I also tried
$ java -jar post.jar ../../../enwiki-20150602-pages-articles1.xml
But when I query from the Solr admin panel I still do not get any data. So my question is: if the data has been indexed, why can't I see it? What exactly am I doing wrong?
Not sure whether this has been resolved.
I had the exact same problem. You need to specify the path/location of the file to be ingested.
C:\test-solr>java -Durl=http://localhost:8983/solr/testdemo/update -Dtype=text/csv -jar C:/test-solr/exampledocs/post.jar "C:\test-solr\exampledocs\ingestMeFile.csv"
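Two details separate the working command above from the one in the question. In the question, -Durl came after the jar and the file, so SimplePostTool treated it as a file name (hence the "No files or directories matching -Durl=..." warning) and posted to the default http://localhost:8983/solr/update. The question's URL also contained /#/wiki/, which is the admin UI path and not the core's update handler. Below is a small Python sketch that assembles the argument list in the right order. The function name and its defaults are my own assumptions, not part of post.jar.

```python
from typing import List


def post_command(core: str, files: List[str], host: str = "localhost:8983",
                 content_type: str = "application/xml",
                 jar: str = "post.jar") -> List[str]:
    """Build a post.jar command line with the JVM options placed before -jar."""
    # The update handler lives under the core name, with no '#' in the path.
    update_url = f"http://{host}/solr/{core}/update"
    return [
        "java",
        f"-Durl={update_url}",      # system properties must precede -jar
        f"-Dtype={content_type}",
        "-jar", jar,
        *files,                      # everything after the jar is a file to post
    ]


if __name__ == "__main__":
    print(" ".join(post_command("wiki", ["enwiki-20150602-pages-articles1.xml"])))
```

The returned list can be handed to subprocess.run as-is, which avoids shell quoting problems with the URL.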

error when using solr and Integrating nutch and solr(HTTP ERROR 500)

I have Ubuntu Linux 12.04 installed and I'm trying to install Nutch 1.5.1 and Solr 3.6.1 and integrate them to crawl seed URLs.
I'm using this tutorial to get this working.
I followed the steps before 3.2, skipped to step 4, and I can access
localhost:8983/solr/admin/
without error.
But when I get to step 6 and copy schema.xml from the conf folder of Nutch to the example/solr/conf folder of Solr,
the solr/admin page shows a Java error, below:
How can I handle that?
One more thing to ask:
I have another tutorial for this that looks good, but its first step says to add some code to the nutch-site.xml file in the /conf/ and /runtime/local/conf/ folders,
but there is no runtime folder in the Nutch folder. Step 4 mentions this folder too.
Any suggestions?
Thanks in advance.
This is a bit of a red herring. The line that specifies the version number, something like:
<schema name="nutch" version="1.5.1">
is causing it, because the value of version is parsed as a float. Remove the extra dot: change it to 1.5 or 1.51 to make it a valid float, then restart your Solr instance. The exception should disappear.
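As a rough illustration of the fix above, the following Python sketch rewrites the version attribute so that it parses as a float. It picks the first of the two options mentioned (1.5.1 becomes 1.5). The function names are hypothetical, and it is meant to be applied to the <schema ...> line only.

```python
import re


def normalize_schema_version(version: str) -> str:
    """Collapse a dotted version such as '1.5.1' to 'major.minor'."""
    candidate = ".".join(version.strip().split(".")[:2])
    float(candidate)  # raises ValueError if the result is still not numeric
    return candidate


def fix_schema_tag(line: str) -> str:
    """Rewrite the first version="..." attribute on the given line."""
    return re.sub(
        r'version="([^"]+)"',
        lambda m: f'version="{normalize_schema_version(m.group(1))}"',
        line,
        count=1,
    )


if __name__ == "__main__":
    print(fix_schema_tag('<schema name="nutch" version="1.5.1">'))
```

A value that is already a valid float, such as 1.5, passes through unchanged.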
Please check whether Nutch 1.5.1 and Solr 3.6.1 are compatible (that is, whether they ship the same versions of the lucene-core and solr-solrj jars). I have had problems with incompatible versions, though not with 1.5/3.6.

#500 Internal Server Error when trying to add PDF to Solr index with extraction

I am a first-time Solr user, using v3.5 with Tomcat 7 on a Windows 7 system. I went through the XML example in example-docs with no problems. However, I'm going to need to use extraction with HTML and PDF files, and when I try to Post a PDF file for indexing I'm getting the following:
SimplePostTool: version 1.4
SimplePostTool: POSTing files to http://localhost:8080/solr/update/extract?literal.id=doc2..
SimplePostTool: POSTing file test.pdf
SimplePostTool: FATAL: Solr returned an error #500 Internal Server Error
The command I used is:
java -Durl=http://localhost:8080/solr/update/extract?literal.id=doc2 -Dtype=application/pdf -jar post.jar test.pdf
My solr home directory is C:\solr, where I have done the following so far:
Copied the contents of the solr download package's example/solr folder
Copied the solr download package's contrib/extraction/lib folder to C:\solr\lib
Copied the solr download package's dist/apache-solr-cell-3.5.0.jar to C:\solr\dist\apache-solr-cell-3.5.0.jar
Modified the appropriate "lib" tags in C:\solr\conf\solrconfig.xml to <lib dir="lib" /> and <lib dir="dist/" regex="apache-solr-cell-\d.*\.jar" />
What else do I need to do to make this work for PDF and HTML files? I've read multiple tutorials and "Getting Started" guides but can't seem to understand what's wrong. I'm also a Tomcat beginner and as far as I can tell, none of this is showing up in Tomcat's logs ... so I'm pretty much stuck. Again, I'm not having any problem with the XML example, so Tomcat itself is running fine and recognizes solr (I can see the solr admin page). Any help is appreciated.

org.apache.solr.common.SolrException: missing content stream

I have installed Apache Solr with Tomcat and my /solr/admin is working fine. But when I try to issue /solr/update I am getting the following error. What could be the reason?
org.apache.solr.common.SolrException: missing content stream
If you add the commit parameter, i.e. ?commit=true, it will work.
/solr/update looks for input documents to index. Calling plain /solr/update causes this exception because there is no input for it. The easiest way to run it is:
java -Durl=http://localhost:8080/<your apache solr context path, mostly solr>/update -jar post.jar *.xml
This can also happen through SolrJ/spring-data-solr if you try to persist an empty collection of documents.
For example, solrClient.add(new ArrayList<SolrInputDocument>(), 10000);
would also cause the error.
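One way to avoid the empty-collection case is a guard in front of the call that sends documents. The sketch below shows the idea in Python and is not SolrJ: the send callable is a hypothetical stand-in for whatever actually posts to /solr/update, and the function name is my own.

```python
from typing import Callable, Dict, List


def add_if_not_empty(docs: List[Dict], send: Callable[[List[Dict]], None]) -> bool:
    """Call send(docs) only when there is at least one document.

    `send` is a hypothetical stand-in for the real update call.
    Returns True if the batch was sent, False if it was skipped.
    """
    if not docs:
        return False  # nothing to index, so no request is made at all
    send(docs)
    return True


if __name__ == "__main__":
    sent: List[List[Dict]] = []
    print(add_if_not_empty([], sent.append))                # empty batch is skipped
    print(add_if_not_empty([{"id": "doc1"}], sent.append))  # non-empty batch is sent
```

The same check in Java would be an isEmpty() test before calling solrClient.add.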
