HTTP ERROR 401 and 404 while running Solr search

I am working on a Hybris upgrade to version 1808. As part of this upgrade, I have to upgrade Solr to version 7.4. The upgraded Solr server starts without any errors, but when I try to search for any product, it throws HTTP ERROR 401:
de.hybris.platform.solrfacetsearch.search.FacetSearchException: Error from server at https://localhost:8983/solr: Error from server at https://localhost:8983/solr/master_XXXXXXX_Index: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 401 require authentication</title>
</head>
<body><h2>HTTP ERROR 401</h2>
<p>Problem accessing /solr/master_XXXXXXXX_Index/select. Reason:
<pre> require authentication</pre></p>
</body>
</html>
at de.hybris.platform.solrfacetsearch.search.impl.LegacyFacetSearchStrategy.search(LegacyFacetSearchStrategy.java:204) ~[solrfacetsearchserver.jar:?]
at de.hybris.platform.solrfacetsearch.search.impl.DefaultFacetSearchService.search(DefaultFacetSearchService.java:89) ~[solrfacetsearchserver.jar:?]
at de.hybris.platform.solrfacetsearch.search.impl.DefaultFacetSearchService.search(DefaultFacetSearchService.java:78) ~[solrfacetsearchserver.jar:?]
at com.hybris.XXXXXX.core.search.content.impl.XXXXXXXXSearchService.quickSearchContent(XXXXXXXXSearchService.java:754) [classes/:?]
As far as I know, this version of Solr uses HTTPS and security.json for authentication and authorization, so I removed all authentication-related settings, i.e. security.json and the properties file entries for authType, users, and passwords (a minimal security.json example is included at the end of this question). After removing them, the search fails with HTTP ERROR 404 instead:
de.hybris.platform.solrfacetsearch.search.FacetSearchException: Error from server at https://localhost:8983/solr: Error from server at https://localhost:8983/solr/master_XXXXXXXContent_Index: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404</h2>
<p>Problem accessing /solr/master_XXXXXXXContent_Index/select. Reason:
<pre> Not Found</pre></p>
</body>
</html>
at de.hybris.platform.solrfacetsearch.search.impl.LegacyFacetSearchStrategy.search(LegacyFacetSearchStrategy.java:204) ~[solrfacetsearchserver.jar:?]
at de.hybris.platform.solrfacetsearch.search.impl.DefaultFacetSearchService.search(DefaultFacetSearchService.java:89) ~[solrfacetsearchserver.jar:?]
at de.hybris.platform.solrfacetsearch.search.impl.DefaultFacetSearchService.search(DefaultFacetSearchService.java:78) ~[solrfacetsearchserver.jar:?]
at com.hybris.doterra.core.search.content.impl.XXXXXXXXContentSearchService.quickSearchContent(DefaultDoterraContentSearchService.java:754) [classes/:?]
I am not sure which configurations or steps are missing that cause this issue.
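For reference, the authentication this Solr version uses is a Basic Auth security.json placed in SOLR_HOME. A minimal sketch follows; the credentials value is a placeholder for the base64-encoded SHA-256 hash and salt of the password, as described in the Solr Reference Guide:
{
  "authentication": {
    "blockUnknown": true,
    "class": "solr.BasicAuthPlugin",
    "credentials": { "solr": "<base64 sha256 hash> <base64 salt>" }
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "permissions": [ { "name": "security-edit", "role": "admin" } ],
    "user-role": { "solr": "admin" }
  }
}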

Related

Solr index custom file types

Basically, I am a Solr newbie and have zero experience with this, as our Solr expert left the company. We are receiving a proprietary file from a client, and I don't have access to the application that generated it.
When uploading it to Solr, we receive the following error.
Solr log:
solr-cloud.log: {"msg":"2022-01-19 08:10:06.915 ERROR (qtp349420578-3516) [c:<collection> s:shard2 r:core_node5 x:<redacted>] o.a.s.s.HttpSolrCall null:java.lang.RuntimeException: java.lang.NoClassDefFoundError: ucar/nc2/NetcdfFile"}
Our app logging:
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/<collection>: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 500 Server Error</title>
</head>
<body><h2>HTTP ERROR 500</h2>
<p>Problem accessing /solr/<collection>/update/extract. Reason:
<pre> Server Error</pre></p><h3>Caused by:</h3><pre>java.lang.NoClassDefFoundError: ucar/nc2/NetcdfFile
at org.apache.tika.parser.hdf.HDFParser.parse(HDFParser.java:88)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
Other normal file types work (e.g. doc, pdf, zip).
I cannot open or edit the file to see what fields are in there to
index, so is there a way to index this?
If not, is there anything else I can do to handle this file type?
TIA
The file is being parsed by Solr/Tika using an HDF parser, which in turn depends on the NetCDF library:
https://www.unidata.ucar.edu/downloads/netcdf-java/
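Assuming these files do need to be indexed, one option is to download the NetCDF-Java jar from the link above and put it on Solr's classpath, e.g. with a <lib> directive in solrconfig.xml. A sketch, where the path and version are placeholders for wherever you drop the jar:
<lib path="/opt/solr/extra-libs/netcdfAll-4.6.11.jar" />
Dropping the jar into any directory already covered by an existing <lib dir="..."/> directive works too; either way, reload the collection afterwards.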

Solarium returns Solr HTTP error: OK (404)

I use Solarium to access Solr from Symfony. It works without problems on my computer and on the dev computer, but not on the prod server.
On the prod server, Solr is running with the same configuration, same port, and same logins.
Do you have any idea what the problem could be?
Here is the error:
Solr HTTP error: OK (404)
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN""http://www.w3.org/TR/html4/strict.dtd">
<HTML><HEAD><TITLE>Not Found</TITLE>
<META HTTP-EQUIV="Content-Type" Content="text/html; charset=us-ascii"></HEAD>
<BODY><h2>Not Found</h2>
<hr><p>HTTP Error 404. The requested resource is not found.</p>
</BODY></HTML>
Problem solved. There was a wrong proxy configured on the Windows server.
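A quick way to confirm a proxy problem in a case like this is to query Solr directly from the prod host while bypassing any proxy; the core name below is a placeholder:
curl --noproxy '*' "http://localhost:8983/solr/<core>/admin/ping"
If that succeeds while the Solarium call still returns 404, the request is being rewritten by the proxy rather than failing in Solr.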

Why does Nutch index to the wrong Solr collection even though the solr.server.url parameter is set?

I am integrating Nutch 1.15 with Solr 8.0, but when I use the following command
nutch/bin/crawl -i -D solr.server.url=http://192.168.199.109:8983/solr/csdn -s ./csdn-seed/ ./data/csdn 1
to index crawled data from Nutch into Solr, it throws this exception in hadoop.log:
2019-03-23 02:03:07,491 WARN mapred.LocalJobRunner - job_local1877827743_0001
java.lang.Exception: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/nutch: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404</h2>
<p>Problem accessing /solr/nutch/update. Reason:
<pre> Not Found</pre></p>
</body>
</html>
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/nutch: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404</h2>
<p>Problem accessing /solr/nutch/update. Reason:
<pre> Not Found</pre></p>
</body>
</html>
But I set solr.server.url to /solr/csdn, didn't I? So why does it tell me that it is indexing to /solr/nutch?
The way indexer plugins are configured has changed with Nutch 1.15: all indexer plugins are now configured in a single XML file (conf/index-writers.xml), and setting or overwriting configuration parameters via Nutch properties is no longer possible. That is why your -D solr.server.url option is ignored and the job posts to the default URL (/solr/nutch) from conf/index-writers.xml.
See https://wiki.apache.org/nutch/IndexWriters for how to configure the Solr server URL. This breaking change was necessary to allow multiple indexers of the same type, e.g. multiple Solr instances.
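For illustration, a minimal conf/index-writers.xml pointing at the collection from the question could look like this (abridged; see the wiki page above for the full parameter list and mapping section):
<writers xmlns="http://lucene.apache.org/nutch">
  <writer id="indexer_solr_1" class="org.apache.nutch.indexwriter.solr.SolrIndexWriter">
    <parameters>
      <param name="type" value="http"/>
      <param name="url" value="http://192.168.199.109:8983/solr/csdn"/>
      <param name="commitSize" value="1000"/>
    </parameters>
    <mapping>
      <copy/>
      <rename/>
      <remove/>
    </mapping>
  </writer>
</writers>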

How to configure the timeout for the Apache Camel Jetty component

I use Talend Open Studio 5.6 ESB and have built an Apache Camel route. The end of my route is:
.removeHeaders("CamelHttpPath")
.removeHeaders("CamelHttpUrl")
.removeHeaders("CamelServletContextPath")
.to("jetty:http://toOverRide?bridgeEndpoint=false&throwExceptionOnFailure=false&useContinuation=false&httpClient.timeout=120000&httpClient.idleTimeout=120000")
Before this, I override the URL in the Jetty component to call a remote service. This service takes 30 seconds to reply; the route closes the connection and sends a 503 error. How can I increase the timeout?
Camel log:
[WARN ]: org.apache.camel.component.jetty.CamelContinuationServlet - Continuation expired of exchangeId: ID-A1995-62398-1480423883621-0-1
[WARN ]: org.apache.camel.component.jetty.CamelContinuationServlet - Cannot resume expired continuation of exchangeId: ID-A1995-62398-1480423883621-0-1
Response:
HTTP/1.1 503 Service Unavailable
Cache-Control: must-revalidate,no-cache,no-store
Content-Type: text/html;charset=ISO-8859-1
Content-Length: 1325
Server: Jetty(8.y.z-SNAPSHOT)
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 503 Service Unavailable</title>
</head>
<body>
<h2>HTTP ERROR: 503</h2>
<p>Problem accessing /sync/mockTmcWithLog/utilisateurs/30000. Reason:
<pre> Service Unavailable</pre></p>
<hr /><i><small>Powered by Jetty://</small></i>
</body>
</html>
The 503 comes from the Jetty consumer side: Camel's continuation expires before the slow reply arrives (continuationTimeout defaults to 30 seconds, matching the 30-second response time you observe), so the continuation timeout has to be raised along with the HTTP client timeout. Here is an example:
.to("jetty:http://toOverRide?continuationTimeout=900000&httpClient.timeout=900000")

What SOLR configuration is required to fetch an html page and parse it?

I've been consulting one tutorial after another and have spent oodles of time searching.
I installed Solr from scratch and started it up.
bin/solr start
I successfully navigate to the Solr admin UI. Then I create a new core.
bin/solr create -c core_wiki -d basic_configs
I look at the help for the bin/post command.
bin/post -h
...
* Web crawl: bin/post -c gettingstarted http://lucene.apache.org/solr -recursive 1 -delay 1
...
So I try to make a similar call... but I keep getting a FileNotFound error.
bin/post -c core_wiki http://localhost:8983/solr/ -recursive 1 -delay 10
/usr/lib/jvm/java-7-openjdk-amd64/jre//bin/java -classpath /home/ubuntu/src/solr-5.4.0/dist/solr-core-5.4.0.jar -Dauto=yes -Drecursive=1 -Ddelay=10 -Dc=core_wiki -Ddata=web org.apache.solr.util.SimplePostTool http://localhost:8983/solr/
SimplePostTool version 5.0.0
Posting web pages to Solr url http://localhost:8983/solr/core_wiki/update/extract
Entering auto mode. Indexing pages with content-types corresponding to file endings xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
Entering recursive mode, depth=1, delay=10s
Entering crawl at level 0 (1 links total, 1 new)
SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for url: http://localhost:8983/solr/core_wiki/update/extract?literal.id=http%3A%2F%2Flocalhost%3A8983%2Fsolr&literal.url=http%3A%2F%2Flocalhost%3A8983%2Fsolr
SimplePostTool: WARNING: Response: <html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404</h2>
<p>Problem accessing /solr/core_wiki/update/extract. Reason:
<pre> Not Found</pre></p><hr><i><small>Powered by Jetty://</small></i><hr/>
</body>
</html>
SimplePostTool: WARNING: IOException while reading response: java.io.FileNotFoundException: http://localhost:8983/solr/core_wiki/update/extract?literal.id=http%3A%2F%2Flocalhost%3A8983%2Fsolr&literal.url=http%3A%2F%2Flocalhost%3A8983%2Fsolr
SimplePostTool: WARNING: An error occurred while posting http://localhost:8983/solr
0 web pages indexed.
COMMITting Solr index changes to http://localhost:8983/solr/core_wiki/update/extract...
SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for url: http://localhost:8983/solr/core_wiki/update/extract?commit=true
SimplePostTool: WARNING: Response: <html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404</h2>
<p>Problem accessing /solr/core_wiki/update/extract. Reason:
<pre> Not Found</pre></p><hr><i><small>Powered by Jetty://</small></i><hr/>
</body>
</html>
Time spent: 0:00:00.041
I'm still fairly new to SOLR indexing. Any hints that could point me in the right direction would be appreciated.
It seems that the request handler named /update/extract is missing from your configuration.
The ExtractingRequestHandler is not incorporated into the Solr war
file; it is provided as a plugin, and you have to load it (and
its dependencies) explicitly. (Apache Solr Wiki)
It should be defined in solrconfig.xml, like:
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler" />
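Because the handler lives outside the core war, the Solr Cell jars have to be loaded as well. The stock Solr 5.x sample solrconfig.xml does this with <lib> directives like the following (the relative paths depend on where your core lives):
<lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-cell-\d.*\.jar" />
After adding the handler and the jars, reload the core; the /update/extract URL from the bin/post output should then stop returning 404.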
