I am trying to integrate Apache Nutch and Solr. When Nutch tries to dump its output to Solr, it throws:
HTTP method POST is not supported by this URL
I checked the configuration but couldn't find where to enable POST for the Solr URL, and I would rather not use Tomcat to run Solr. How can I make the Solr URL accept POST?
Related
I have started the ZooKeeper and Solr services in Cloudera Manager and have also created an HDFS service, but I still cannot get Nutch and Solr working together in Cloudera.
I do not know which steps to follow next in order to crawl and index new URLs and get query results from the Solr index.
Does anyone know how to proceed?
I have just installed Nutch integrated with Solr and started crawling, but Nutch is not immediately crawling the URLs I specify in seed.txt. It is injecting old URLs that I may have given earlier but that are now commented out. It looks like Nutch is injecting URLs in some strange order. What is the reason? Also, could anybody point me to a book or detailed tutorial on Nutch? Most of the tutorials available only cover installation.
As mentioned in an answer to a similar question, the old URLs are still in Nutch's crawldb.
You can nuke your previous runs completely like this user did and start fresh, or you can remove the unwanted URLs a few different ways via CrawlDbMerger:
CLI via bin/nutch mergedb
CLI via bin/nutch updatedb
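As a rough sketch of the mergedb route, the snippet below only builds the command line. It assumes the Nutch 1.x usage of mergedb (output crawldb first, then one or more input crawldbs, then the optional -filter flag) and assumes the unwanted URLs are already rejected by your URL filter rules (for example in conf/regex-urlfilter.txt). The paths and the helper name mergedb_command are hypothetical.

```python
import subprocess


def mergedb_command(output_crawldb, crawldbs, nutch_bin="bin/nutch"):
    """Build the argv for `bin/nutch mergedb <output> <crawldb...> -filter`."""
    # -filter asks mergedb to drop URLs rejected by the configured URL filters.
    return [nutch_bin, "mergedb", output_crawldb, *crawldbs, "-filter"]


# Hypothetical paths; adjust to your crawl directory layout.
command = mergedb_command("crawl/crawldb_filtered", ["crawl/crawldb"])
print(" ".join(command))

# To actually run it from the Nutch runtime directory:
# subprocess.run(command, check=True)
```

The filtered crawldb is written to a new directory, so the original stays untouched until you decide to swap it in.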
First of all, thanks to Stack Overflow for supporting everyone.
I am new to Drupal and the Solr server.
I have successfully installed the Solr server on my system, and I am able to search the data using the "Apache Solr search module" in Drupal 7.
However, I don't know what background process is running, and I need a basic understanding of it in order to work with it. Drupal connects to the Solr server using the URL that I provided in the admin UI.
As far as I know, the backend flow of the Apache Solr search module is as follows:
1) It sends the search string from Drupal to the Solr server.
2) The Solr server searches for the string and sends the results back to Drupal as JSON.
3) Drupal displays the results.
But how does the Solr server connect to the Drupal database in order to search for the string or content?
Please help with this. I really need to understand the backend flow and how the request is handled.
Thank you.
I'm not a Drupal specialist, but from the Solr perspective you are searching documents that were previously indexed in Solr. That is, all documents must be indexed in Solr before they can be searched.
Therefore, you have two options:
1) You call the Solr API from your backend and push documents to the Solr index. There are Drupal-specific solutions you may want to research, but here is the wiki article, from the Solr perspective, describing how to index documents using only the JSON API: http://wiki.apache.org/solr/UpdateJSON
2) You connect to your database directly from Solr and pull documents into the Solr index. Here is the related wiki page: http://wiki.apache.org/solr/DataImportHandler
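To illustrate the push approach, here is a minimal sketch that builds a JSON update request for Solr. The host, port, core name (collection1), and document fields are assumptions; the field names must match your own Solr schema, and depending on your Solr version the handler path may be /update/json instead of /update.

```python
import json
import urllib.request

# Assumed Solr location and core name; adjust to your installation.
SOLR_UPDATE_URL = "http://localhost:8983/solr/collection1/update"


def build_update_request(docs, solr_update_url=SOLR_UPDATE_URL, commit=True):
    """Build (but do not send) an HTTP POST that adds `docs` to the Solr index."""
    url = solr_update_url + ("?commit=true" if commit else "")
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical documents; field names must exist in your Solr schema.
docs = [{"id": "node-1", "title": "Hello Solr"}]
request = build_update_request(docs)

# To actually send it (requires a running Solr):
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Once documents like these are indexed, searches from Drupal run against the Solr index and never touch the Drupal database at query time.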
Ok, so I'm trying to set up Nutch to crawl a site and index the pages into Solr. I'm currently using Nutch 1.9 with Solr 4.10.2.
I've followed these instructions: http://wiki.apache.org/nutch/NutchTutorial#A4._Setup_Solr_for_search
The crawling appears to go just fine, but when I check the collection in Solr (using the web UI) there are no documents indexed. Any idea where I could check for problems?
Found my problem; I'll leave it as an answer in case anyone else has the same symptoms:
My problem was the proxy configuration. My Linux box has the proxy configured system-wide, but I also had to configure Nutch to use the same proxy. Once I changed that, it started to work.
The configuration is in conf/nutch-default.xml (site-specific overrides normally go in conf/nutch-site.xml).
Edit with more info
To be more specific, here is the Proxy configuration I had to change:
<property>
  <name>http.proxy.host</name>
  <value>xxx.xxx.xxx</value>
  <description>The proxy hostname. If empty, no proxy is used.</description>
</property>
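If you want to double-check which proxy values a Nutch configuration file actually contains, a small script along these lines may help. It is only a sketch: the sample XML, the proxy.example.com host, and the helper name read_properties are made up for illustration, and it reads a single file rather than reproducing how Nutch layers its configuration files.

```python
import xml.etree.ElementTree as ET


def read_properties(config_xml):
    """Return {name: value} for every <property> in a Nutch/Hadoop-style config."""
    root = ET.fromstring(config_xml)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


# Hypothetical config contents; proxy.example.com and 8080 are placeholders.
sample = """
<configuration>
  <property>
    <name>http.proxy.host</name>
    <value>proxy.example.com</value>
  </property>
  <property>
    <name>http.proxy.port</name>
    <value>8080</value>
  </property>
</configuration>
"""

props = read_properties(sample)
proxy_host = props.get("http.proxy.host", "")
print("proxy host:", proxy_host or "(empty, no proxy is used)")
```

To check a real file, read its text and pass it to read_properties instead of the sample string.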
Context:
I have a web application that serves content via RESTful web services
I need to provide search functionality
This is what I have in mind. Am I on the right track or way off?
Index seed client:
This component will poll the application at regular intervals for data
(I have a WS which returns an XML response)
And then post the XML to an EMS
Queue Listener:
The Queue Listener will convert the domain XML into a Solr document
And then post the document to Solr to be indexed
Search client:
The client will make a search request to my web application with query parameters
The web application will forward the request to Solr
Solr returns search results to my web application
My web application returns the result back to the client
Alternate flow?
The search client talks to Solr directly and does the search.
Suggestions?
Searching will depend on which Solr server implementation you choose. If you use EmbeddedSolrServer, queries will have to go through your web application, which then calls Solr. If you use HttpSolrServer, clients can query Solr directly.
It also depends on how you want to return the results:
As Solr documents?
Or as your own interpretation of a Solr document?
The latter would have to be serviced by your web application.
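As an illustration of the "own interpretation" option, the web application could reshape Solr's JSON response before returning it to the client. The sketch below assumes the standard layout of a wt=json select response (a response object holding numFound and docs); the output shape, the title field, and the function name to_search_results are hypothetical.

```python
def to_search_results(solr_response):
    """Map a Solr JSON select response onto an application-specific result shape."""
    response = solr_response.get("response", {})
    hits = []
    for doc in response.get("docs", []):
        hits.append({
            "id": doc.get("id"),
            # `title` is a hypothetical field; use the fields from your schema.
            "title": doc.get("title", ""),
        })
    return {"total": response.get("numFound", 0), "hits": hits}


# Hand-written sample in the shape of a Solr JSON response.
sample = {
    "responseHeader": {"status": 0, "QTime": 1},
    "response": {
        "numFound": 1,
        "start": 0,
        "docs": [{"id": "42", "title": "Hello", "_version_": 1}],
    },
}

print(to_search_results(sample))
```

Keeping this mapping in the web application hides Solr-internal fields such as _version_ from search clients and lets the Solr schema change without breaking them.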