I am using Solr 6.4.1 and crawling a URL with the command below:
java -Ddata=web -Dc=corename -jar post.jar (urlname )
I am getting the below issue:
The URL (urlname) caused a redirect to (urlname)/us/
SimplePostTool: WARNING: The URL (urlname) returned a HTTP result status of 301
0 web pages indexed.
How do I resolve this?
The URL you are providing has been permanently redirected, as the HTTP 301 status code shows. You need to change it to the new URL reported in the output, which in your case is (urlname)/us/
So add the trailing /us/ to the urlname value in the command you are running.
For example, if urlname is http://thecoolsite.com/ you now need to use http://thecoolsite.com/us/
Hope this helps.
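To find the address to pass to post.jar without editing it by hand, the redirect chain can be walked until a URL stops redirecting. The following is only a minimal Python 3 sketch: final_url and get_redirect are hypothetical names, and get_redirect stands in for whatever HTTP lookup you use to read the Location header of a 3xx response (here it is stubbed with a dictionary so the example runs offline).

```python
from urllib.parse import urljoin


def final_url(start_url, get_redirect, max_hops=5):
    """Follow redirects from start_url and return the last URL reached.

    get_redirect(url) is a caller-supplied (hypothetical) lookup that returns
    the Location header value for a redirecting URL, or None otherwise.
    """
    url = start_url
    for _ in range(max_hops):
        location = get_redirect(url)
        if location is None:
            return url
        # Location may be relative ("/us/"), so resolve it against the current URL.
        url = urljoin(url, location)
    raise RuntimeError("more than %d redirects starting from %s" % (max_hops, start_url))


# Stubbed lookup standing in for a real HTTP request.
redirects = {"http://thecoolsite.com/": "/us/"}
print(final_url("http://thecoolsite.com/", redirects.get))
```

The URL it prints is the one to use in place of (urlname) in the post.jar command.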
OPTIONS http://localhost:9000/api/chat/ 404 (Not Found)
XMLHttpRequest cannot load http://localhost:9000/api/chat/. Response for preflight has invalid HTTP status code 404
https://www.playframework.com/documentation/2.5.x/CorsFilter has details on enabling CORS for Play (which Lagom is built on). To handle the OPTIONS request you may need to do something like:
.withAutoAcl(true)
.withServiceAcls(
ServiceAcl.methodAndPath(Method.OPTIONS, "/foo")
)
https://groups.google.com/forum/#!msg/lagom-framework/dtYN_1Ds4SQ/gT-BGPuCAQAJ is a lagom-framework list discussion thread with more details.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Access_control_CORS#Preflighted_requests has an explanation of why your browser is sending an OPTIONS request to begin with.
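As a rough illustration of when a browser sends that OPTIONS request, here is a simplified Python 3 sketch of the "simple request" test. It is an approximation under stated assumptions, not the full rule set: needs_preflight is a hypothetical helper, headers means only the headers set by the calling script, and details such as header value restrictions are ignored.

```python
# Methods and headers that do not trigger a preflight on their own (simplified).
SIMPLE_METHODS = {"GET", "HEAD", "POST"}
SAFELISTED_HEADERS = {"accept", "accept-language", "content-language", "content-type"}
SIMPLE_CONTENT_TYPES = {
    "application/x-www-form-urlencoded",
    "multipart/form-data",
    "text/plain",
}


def needs_preflight(method, headers):
    """Return True if a cross-origin request would be preceded by OPTIONS."""
    if method.upper() not in SIMPLE_METHODS:
        return True
    for name, value in headers.items():
        lname = name.lower()
        if lname not in SAFELISTED_HEADERS:
            return True
        if lname == "content-type":
            # Ignore parameters such as "; charset=utf-8".
            media_type = value.split(";", 1)[0].strip().lower()
            if media_type not in SIMPLE_CONTENT_TYPES:
                return True
    return False


# A JSON POST, as a chat API call might be, is preflighted.
print(needs_preflight("POST", {"Content-Type": "application/json"}))
```

If the service has no route for OPTIONS on that path, the preflight gets the 404 shown above and the real request is never sent.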
The Maven metadata for the artifact that currently provides CORS for Play is:
<metadata>
  <groupId>com.typesafe.play</groupId>
  <artifactId>filters-helpers_2.12</artifactId>
  <versioning>
    <latest>2.6.0-M2</latest>
    <release>2.6.0-M2</release>
    <versions>
      <version>2.6.0-M1</version>
      <version>2.6.0-M2</version>
    </versions>
    <lastUpdated>20170310220437</lastUpdated>
  </versioning>
</metadata>
I am following the Quickstart for Cloud Endpoints Frameworks on App Engine in the standard environment. I have deployed the sample API. When I open https://[my-project].appspot.com/ I get the error message:
Error: Not Found. The Requested URL / was not found on this server
The logs show the message:
No Handlers matched this url
The app.yaml handlers are what came with the endpoints-frameworks-v2/echo sample:
handlers:
# The endpoints handler must be mapped to /_ah/api.
- url: /_ah/api/.*
  script: main.api
I was having great difficulty generating the OpenAPI configuration file in a previous step of the quickstart. I got it to work by updating the system variable path for the SDK but I did get this error:
No handlers could be found for logger "endpoints.apiserving"
WARNING:root:Method echo.echo_path_parameter specifies path parameters buy you are
not using a ResourceContainer. This will fail in future releases; please
switch to using ResourceContainer as soon as possible.
I have no idea if this error is relevant to the current problem.
Any help would be much appreciated.
Regarding the "No handlers could be found for logger..." message, you need to configure a logging handler, as described here:
http://excid3.com/blog/no-handlers-could-be-found-for-logger
The ResourceContainer warning is a known issue, covered in:
What are ResourceContainers and how to use them for Cloud Endpoints?
You also need a URL handler for / if that should be a valid URL:
handlers:
# The endpoints handler must be mapped to /_ah/api.
- url: /_ah/api/.*
  script: main.api
- url: /.*  # catchall for all other urls
  script: main.api  # or wherever you handle the request for `/` and others
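To see why / returned "Not Found" with the sample handlers, here is a small Python 3 model of handler matching. It is a sketch, not App Engine's actual code: it assumes handlers are tried top to bottom, that the first match wins, and that each url pattern must match the whole request path. first_matching_script is a hypothetical helper.

```python
import re


def first_matching_script(handlers, path):
    """Return the script of the first handler whose pattern matches path."""
    for pattern, script in handlers:
        # Assumption: the pattern has to cover the entire path.
        if re.fullmatch(pattern, path):
            return script
    return None  # corresponds to "No Handlers matched this url"


sample = [("/_ah/api/.*", "main.api")]
with_catchall = sample + [("/.*", "main.api")]

print(first_matching_script(sample, "/"))         # no handler for the root path
print(first_matching_script(with_catchall, "/"))  # handled by the catchall
```

Because the first match wins, the catchall has to come after the /_ah/api/.* entry, as in the handlers above.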
I am trying to index XML files in the Solr search engine using the following command:
java -Durl=http://10.1.11.143:8080/solr/#/ -jar post.jar solr.xml
But I am getting the following error:
SimplePostTool version 1.5
Posting files to base url http://10.1.11.143:8080/solr/#/ using content-type application/xml..
POSTing file solr.xml
SimplePostTool: WARNING: Solr returned an error #500 Internal Server Error
SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 500 for URL: http://10.1.11.143:8080/solr/#/
1 files indexed.
COMMITting Solr index changes to http://10.1.11.143:8080/solr/#/..
SimplePostTool: WARNING: Solr returned an error #500 Internal Server Error for url http://10.1.11.143:8080/solr/#/?commit=true
Time spent: 0:00:00.017
Please help me resolve this error.
Content of solr.xml is as shown in the picture:
The issue is the URL. It points at the admin UI address (#/) and does not name a request handler for the update. Use the following command instead:
java -Durl=http://10.1.11.143:8080/solr/update?commit=true -jar post.jar solr.xml
/update is the request handler that indexes data into Solr.
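If you copy the address from the browser, the #/ fragment has to go and /update has to be appended. The following is a minimal Python 3 sketch of that rewrite. solr_update_url is a hypothetical helper, and the optional core argument is an assumption for setups that address a named core (it is not needed for the command above).

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def solr_update_url(admin_url, core=None, commit=True):
    """Turn an admin UI address into the URL of the /update request handler."""
    parts = urlsplit(admin_url)  # the "#/..." fragment is split off here
    path = parts.path.rstrip("/")
    if core:
        path += "/" + core  # assumption: the core name follows /solr
    path += "/update"
    query = urlencode({"commit": "true"}) if commit else ""
    # Rebuild the URL with an empty fragment.
    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))


print(solr_update_url("http://10.1.11.143:8080/solr/#/"))
```

The printed value is what -Durl should be set to when calling post.jar.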
from google.appengine.api import urlfetch
totango_url = "https://sdr.totango.com/pixel.png"
totango_url2 = "https://app.totango.com/images/accounts-users.png"
result = urlfetch.fetch(totango_url, validate_certificate=None )
print result.status_code
In production, the logs for the request to totango_url show (with no error_detail):
DownloadError: Unable to fetch URL: https://sdr.totango.com/pixel.gif
I ran these curl commands, and both HTTPS Totango URLs work fine from my local setup:
curl -v "https://sdr.totango.com/pixel.gif"
curl -v "https://app.totango.com/images/accounts-users.png"
The SSL certificates are valid and the same for both URLs.
Calling urlfetch.fetch on both URLs also returns 200 from my (local) datastore console.
However, the urlfetch.fetch call to https://sdr.totango.com/pixel.png fails with the above error.
I also ran the same code in the Google Cloud Playground, tweaking the sample App Engine application, and I get a 200 response for totango_url2 but a 500 for totango_url. Both have the same SSL certificate, I think.
Is there some IP whitelisting or firewall issue with App Engine in production that I need to take care of?
This sounds more like an issue on the remote side. If you're able to fetch that image from one place but not another, that speaks to the remote site doing some sort of filtering, possibly by IP address.
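One way to narrow this down is to run the same small probe from each environment and compare which URLs fail where. The following Python 3 sketch is only an illustration: probe is a hypothetical helper, and fetch is whatever callable you pass in (on App Engine it could be urlfetch.fetch from the question). A stub is used here so the example runs anywhere.

```python
def probe(fetch, urls):
    """Fetch each URL and record either its status code or the error raised."""
    results = {}
    for url in urls:
        try:
            results[url] = fetch(url).status_code
        except Exception as exc:
            # Keep the exception type and message instead of aborting the run.
            results[url] = "%s: %s" % (type(exc).__name__, exc)
    return results


class FakeResponse(object):
    def __init__(self, status_code):
        self.status_code = status_code


def fake_fetch(url):
    # Stub that imitates a remote site rejecting one host.
    if "sdr." in url:
        raise RuntimeError("Unable to fetch URL: " + url)
    return FakeResponse(200)


urls = [
    "https://sdr.totango.com/pixel.png",
    "https://app.totango.com/images/accounts-users.png",
]
for url, outcome in probe(fake_fetch, urls).items():
    print(url, outcome)
```

If the same URL succeeds locally and fails only from production, that supports the idea that the remote site filters by where the request comes from.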
I am trying to crawl the web using Nutch, and I followed the steps in the documentation on Nutch's official website (ran the crawl successfully, copied schema-solr4.xml into the Solr directory). But when I run
bin/nutch solrindex http://localhost:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/*
I get the following error:
Indexer: starting at 2013-08-25 09:17:35
Indexer: deleting gone documents: false
Indexer: URL filtering: false
Indexer: URL normalizing: false
Active IndexWriters :
SOLRIndexWriter
solr.server.url : URL of the SOLR instance (mandatory)
solr.commit.size : buffer size when sending to SOLR (default 1000)
solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
solr.auth : use authentication (default false)
solr.auth.username : use authentication (default false)
solr.auth : username for authentication
solr.auth.password : password for authentication
Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:123)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:185)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:195)
I should mention that Solr is running, but I cannot browse http://localhost:8983/solr/admin (it redirects me to http://localhost:8983/solr/#).
On the other hand, when I stop Solr, I get the same error! Does anybody have any idea what is wrong with my setup?
P.S. The URL that I crawl is: http://localhost/NORC
Check your configuration against: Solr and Nutch
Nutch's and Solr's schema files should be the same, or you may encounter problems, so make sure they match.
When I hit the same problem in Nutch, Solr's log showed the error message "unknown field host".
After I modified schema.xml in Solr, the Nutch error went away.
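A quick way to spot a mismatch like the missing host field is to compare the field names declared in the two schema files. The following is a minimal Python 3 sketch: field_names and missing_fields are hypothetical helpers, they only look at <field name="..."> elements (dynamicField rules and field types are ignored), and the two inline schemas are made-up samples rather than the real files.

```python
import xml.etree.ElementTree as ET


def field_names(schema_xml):
    """Collect the name attribute of every <field> element in a schema."""
    root = ET.fromstring(schema_xml)
    return {f.get("name") for f in root.iter("field") if f.get("name")}


def missing_fields(nutch_schema_xml, solr_schema_xml):
    """Fields the Nutch schema declares that the Solr schema lacks."""
    return sorted(field_names(nutch_schema_xml) - field_names(solr_schema_xml))


# Made-up sample schemas for illustration only.
nutch_schema = '<schema><fields><field name="id"/><field name="host"/></fields></schema>'
solr_schema = '<schema><fields><field name="id"/></fields></schema>'

print(missing_fields(nutch_schema, solr_schema))
```

In practice you would read the two schema files from disk and pass their contents to missing_fields.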
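You are missing the name of the core in your command. Note that the #/ part belongs to the admin UI address only, so the indexing URL should name the core directly.
e.g.:
./bin/crawl -i -D solr.server.url=http://localhost:8983/solr/your_corename urls/ crawl 1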