404 Pages and 301 redirect

We built pages like this:
Old URL:
http://www.ifsc-code.co.in/all-india-banks-database/bank-of-india/karnataka/
Please notice, bank of india is the bank name and karnataka is the state name.
New URL:
http://bank-of-india.ifsc-code.co.in/karnataka
There are 45,000 old URLs, all of which have been set to 301 redirect to the new URLs. It's been 2 months, but Google still sees them as 404. Why?
This is how Googlebot fetched the page.
URL: http://www.ifsc-code.co.in/all-india-banks-database/bank-of-india/karnataka/
Date: Thursday, March 29, 2012 1:29:54 PM PDT
Googlebot Type: Web
Download Time (in milliseconds): 168
HTTP/1.1 301 Moved Permanently
Date: Thu, 29 Mar 2012 20:29:54 GMT
Server: Apache/2.2.21 (Unix) mod_ssl/2.2.21 OpenSSL/0.9.8e-fips-rhel5 DAV/2 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635
X-Powered-By: PHP/5.3.8
Location: http://bank-of-india.ifsc-code.co.in/karnataka
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html
Then why does this page show up as a 404? The URL that links to it doesn't even exist anymore; it is also a 301 redirect.
Please help.

Because you have to manually mark them as fixed in Webmaster Tools (see "Mark crawl error as fixed").
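Before marking the errors as fixed, it can help to confirm that each old URL really ends on a working page rather than on a 404 further down the chain. The following is only a minimal Python sketch of such a check: the function names `follow_redirects` and `http_fetch` are hypothetical, and the fetch step is passed in as a parameter so the chain logic can be exercised without network access.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def follow_redirects(url, fetch, max_hops=10):
    """Return a list of (url, status) hops, stopping at the first non-redirect.

    `fetch` is any callable that takes a URL and returns
    (status_code, location_header_or_None) without following redirects.
    """
    hops = []
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return hops


def http_fetch(url):
    """Issue one HEAD request and return (status, Location header or None)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

Calling `follow_redirects(old_url, http_fetch)` for a sample of the old URLs lists every hop with its status code, so a chain that ends in 404 is easy to spot.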

Related

Why is my Netlify-hosted Gatsby site 301 redirecting to a path without a domain?

I did a quick check to make sure my site was 301 redirecting from:
https://inlunar.com/news/iceye-shows-off-new-sharp-images-from-satellite
to the www. version:
https://www.inlunar.com/news/iceye-shows-off-new-sharp-images-from-satellite
However, when I checked, I found that there was an extra 301 redirect happening from the www. url to:
/news/iceye-shows-off-new-sharp-images-from-satellite
without the domain name anywhere to be found. Here is the full log of that second redirect:
>>> https://www.inlunar.com/news/iceye-shows-off-new-sharp-images-from-satellite
> --------------------------------------------
> 301 Moved Permanently
> --------------------------------------------
Status: 301 Moved Permanently
Code: 301
Cache-Control: public, max-age=0, must-revalidate
Content-Type: text/html; charset=UTF-8
Date: Wed, 17 Jun 2020 01:09:17 GMT
Etag: "8af6153ff17d129285674adb734ca0e3-ssl"
Strict-Transport-Security: max-age=31536000
Age: 0
Server: Netlify
X-NF-Request-ID: 69351fad-bde6-4674-a9b8-fe017a45ee0c-2118676
Location: /news/iceye-shows-off-new-sharp-images-from-satellite/
Why is this second 301 redirect happening?
Netlify appears to be redirecting to the location of the current request with a trailing slash, consistent with their documentation, in an effort to improve cache hit rates.
As for the omission of the domain, it's simply a relative URL.
Relative URLs are URLs that do not include a scheme or a host. In
order to be understood they must be combined with the URL of the
original request.
Client request for http://www.example.com/blog:
GET /blog HTTP/1.1
Host: www.example.com
Server response:
HTTP/1.1 302 Found
Location: /articles/
The URL of the location is expanded by the client to
http://www.example.com/articles/.
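The expansion a client performs can be reproduced with Python's standard `urllib.parse.urljoin`. This is only an illustrative sketch; the helper name `resolve_location` is hypothetical, and the URLs are the ones quoted above.

```python
from urllib.parse import urljoin


def resolve_location(request_url, location):
    """Resolve a (possibly relative) Location header against the request URL."""
    return urljoin(request_url, location)


# The example from the quoted passage.
example = resolve_location("http://www.example.com/blog", "/articles/")

# The Netlify redirect from the question: only a trailing slash is added.
netlify = resolve_location(
    "https://www.inlunar.com/news/iceye-shows-off-new-sharp-images-from-satellite",
    "/news/iceye-shows-off-new-sharp-images-from-satellite/",
)
```

Here `example` is `http://www.example.com/articles/`, and `netlify` is the original www URL with a trailing slash, so the scheme and host are preserved even though the header omits them.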

Google Webmaster Tools gets 301 Redirect incorrectly

In Google Webmaster Tools, I tell Google to crawl and render my site:
Including:
http://smartnavi-app.com/download
Google shows me its HTTP Response:
HTTP/1.1 301 Moved Permanently
Date: Mon, 11 Aug 2014 07:24:56 GMT
Server: nginx/1.4.2
Connection: Keep-Alive
Content-Type: text/html;charset=UTF-8
Location: http://smartnavi-app.com/index.html
Content-Length: 0
Keep-Alive: timeout=5, max=100
But there should be NO redirect for that URL!
If I open this URL I am correctly not redirected. So why is Google?
I got it. I am using prerender.io to cache my AJAX website for crawler bots.
In the .htaccess, the domain in the prerender.io section was an old one, so there was a kind of redirect loop.
So if you change redirects, never forget your prerender.io configuration!

How to enable browser caching in GAE

Although this question should be trivial, I haven't succeeded in enabling browser caching on a Google App Engine Java server.
I've tried putting this kind of thing in my appengine-web.xml:
<static-files>
<include path="/**.cache.**" expiration="365d" />
...
but when I look at the response headers, I see this locally:
Content-Length: 196084
Cache-Control: public, max-age=31536000
Expires: Fri, 10 Jan 2014 19:40:45 GMT
Content-Type: image/png
Last-Modified: Tue, 18 Dec 2012 21:41:22 GMT
Server: Jetty(6.1.x)
Which is fine... but this in production environment:
HTTP/1.1 304 Not Modified
ETag: "RV4Bpg"
X-AppEngine-Estimated-CPM-US-Dollars: $0.000000
X-AppEngine-Resource-Usage: ms=109 cpu_ms=0
Date: Thu, 10 Jan 2013 19:41:20 GMT
Pragma: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Cache-Control: no-cache, must-revalidate
Server: Google Frontend
Which is definitely not what I want :(
Any idea? Is there something I've missed?
[EDIT]
For content not yet downloaded, my browser receives the following headers:
HTTP/1.1 200 OK
ETag: "RV4Bpg"
Date: Fri, 11 Jan 2013 12:50:50 GMT
Expires: Sat, 11 Jan 2014 12:50:50 GMT
Cache-Control: public, max-age=31536000
X-AppEngine-Estimated-CPM-US-Dollars: $0.000000
X-AppEngine-Resource-Usage: ms=3 cpu_ms=0
Date: Fri, 11 Jan 2013 12:50:50 GMT
Pragma: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Cache-Control: no-cache, must-revalidate
Content-Type: image/png
Server: Google Frontend
Content-Length: 196084
Proxy-Connection: Keep-Alive
Connection: Keep-Alive
X-RBT-Optimized-By: eu-dcc-sh02 (RiOS 6.5.5b) SC
An ETag and several contradictory 'Expires' and 'Cache-Control' headers...
Are there several ways to configure the caching policy? Could it come from my ISP, or from a proxy?
When you are logged in to a Google App Engine application as an administrator:
The X-AppEngine-* headers shown in your question are included.
The Cache-Control: no-cache, must-revalidate header is included, because the X-AppEngine-* headers are private and must not be cached.
This is hidden at the end of the Responses section at https://developers.google.com/appengine/docs/python/runtime#Responses, which says that:
Responses with resource usage statistics will be made uncacheable.
Yes, Cache-Control is off because the reply is HTTP 304.
The problem is that your browser saved the ETag: http://en.wikipedia.org/wiki/HTTP_ETag
Now, for every request for the same URL/content, the browser sends the ETag and GAE replies with HTTP 304 Not Modified.
Try changing the resource (image) at this url, checking another url that you have not yet loaded in this browser or using another browser or computer altogether.
Also, this is relevant: What takes precedence: the ETag or Last-Modified HTTP header?
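The ETag round trip described above can be sketched in a few lines of Python. This is a simplified illustration, not App Engine's actual logic: the `respond` function is hypothetical, and it ignores weak validators and lists of ETags in `If-None-Match`.

```python
def respond(request_headers, body, etag):
    """Return (status, headers, body) for a GET of a resource with a known ETag.

    If the client already holds a copy with the same ETag, answer
    304 Not Modified with an empty body; otherwise send the full resource.
    """
    headers = {"ETag": etag}
    if request_headers.get("If-None-Match") == etag:
        return 304, headers, b""
    return 200, headers, body


image = b"\x89PNG..."  # placeholder bytes standing in for the real image
etag = '"RV4Bpg"'      # the ETag value shown in the question

# First visit: the browser has nothing cached, so it sends no If-None-Match.
first = respond({}, image, etag)

# Later visit: the browser replays the saved ETag and gets a 304 back.
second = respond({"If-None-Match": etag}, image, etag)
```

This is why testing with a fresh URL or a different browser matters: only a request without `If-None-Match` shows the headers of a full 200 response.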

GZIP content in Google App Engine using django-nonrel

I have a django-nonrel app running in Google App Engine and want all the content to be gzipped.
I keep reading that GAE automatically gzips the content but when I check the headers using Firefox's web developer toolbar I get the following result:
Via: 1.1 TL-ISA1
Connection: Keep-Alive
Proxy-Connection: Keep-Alive
Transfer-Encoding: chunked
Expires: Thu, 09 Dec 2010 12:23:46 GMT
Date: Thu, 09 Dec 2010 12:23:46 GMT
Content-Type: text/html; charset=utf-8
Etag: "463ad22512f09050f76a291c11d9746d"
Server: Google Frontend
Last-Modified: Thu, 09 Dec 2010 12:23:46 GMT
Cache-Control: max-age=0
200 OK
I was expecting to see Content-Encoding: gzip, but since it is not there, my assumption is that the content is not being gzipped as it should.
Am I missing something? For example, do I need to do something extra if I am using django-nonrel?
Just to add, I am new to Web development - so don't be afraid to patronise. Thanks
Gzip should work out of the box. You are probably requesting the page through a proxy; the Via and Proxy-Connection headers in your response point to one.

Google App Engine Set-Cookie fails to use my expiration date

I am trying to set a cookie in my Google App Engine page:
self.response.headers.add_header('Set-Cookie','CookieName=1234; expires:Sun, 31-May-2009 23:59:59 GMT; path=/;')
The expiration date is not showing up in the browser. So it deletes itself at the end of the session.
Here is the output from curl -D:
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Cache-Control: no-cache
Set-Cookie: CookieName=1234; expires:Fri, 01 Jan 2010 11:48:41 GMT
Date: Fri, 08 May 2009 11:57:25 GMT
Server: Google Frontend
Expires: Fri, 08 May 2009 11:57:25 GMT
Transfer-Encoding: chunked
What am I missing?
The problem is that you're using "expires:" with a colon. It needs to be "expires=" with an equals sign.
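One way to avoid this kind of typo is to let a library assemble the header value. The sketch below uses Python 3's standard `http.cookies` module, which is an assumption on my part: the question's code targets the App Engine webapp framework, and only the cookie name, value, date and path are taken from it.

```python
from http.cookies import SimpleCookie

# Build the cookie attribute by attribute instead of writing the string by hand.
cookie = SimpleCookie()
cookie["CookieName"] = "1234"
cookie["CookieName"]["expires"] = "Sun, 31-May-2009 23:59:59 GMT"
cookie["CookieName"]["path"] = "/"

# OutputString() returns only the header value, without the "Set-Cookie:" prefix.
header_value = cookie["CookieName"].OutputString()
```

The resulting `header_value` can then be passed to the same call as in the question, for example `self.response.headers.add_header('Set-Cookie', header_value)`.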
With a "curl -D somefile" I can check that your cookie comes to the client exactly as specified. Can you check that, and confirm that the issue is with your browser and its settings rather than with the server side?
