I have uploaded a robots.txt at http://watchmariyaanmovieonline.appspot.com/robots.txt, but when I use Google Webmaster Tools and run "Fetch as Google" on my home page http://watchmariyaanmovieonline.appspot.com/, I get the error "Unreachable robots.txt".
Your robots.txt has an empty Disallow line, which is why you get that error:
User-agent: *
Disallow:
Disallow: /cgi-bin/
Sitemap: http://watchmariyaanmovieonline.appspot.com/sitemap.xml
Update it to:
User-agent: *
Disallow: /cgi-bin/
Sitemap: http://watchmariyaanmovieonline.appspot.com/sitemap.xml
And it should work just fine. Let me know if this helps :)
I have created a site with a responsive layout, and it works well.
Apparently Google thinks the site isn't mobile friendly, and has listed a whole pile of resources that I notice are covered by these entries in the robots.txt file:
Disallow: /administrator/
Disallow: /bin/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /layouts/
Disallow: /libraries/
Disallow: /logs/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/
It looks like I need to allow access to some of these files/folders, including media, templates and plugins.
I am concerned that Google will then put administrator-type pages into its search results.
What should I do?
Is it OK to do this, and which ones should I allow?
Thanks
After some more rooting around, I just made images, media and templates viewable to robots. Now my site is friends with Google. A sketch of the resulting file is below.
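For reference, a minimal sketch of what the updated robots.txt might look like, assuming a stock Joomla layout (the Allow lines open up only the asset folders crawlers need to render pages; everything else stays blocked):

User-agent: *
# let crawlers fetch page-rendering assets
Allow: /images/
Allow: /media/
Allow: /templates/
Disallow: /administrator/
Disallow: /bin/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /layouts/
Disallow: /libraries/
Disallow: /logs/
Disallow: /modules/
Disallow: /plugins/
Disallow: /tmp/

Google honours Allow directives, so the administrator and system folders stay out of the index while the CSS/JS/images become fetchable.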
I have an AngularJs app (with routing by ui-router) being served by nginx.
I'm not sure exactly who/what the culprit is, but when I refresh my browser on a URL with a trailing slash, the last segment of the URL is treated as a subdirectory, and static resources (img/css/js/etc.) are then requested from a location prefixed with this subdirectory, which doesn't exist.
Example:
refreshing www.site.com/login will require logo.png, which is requested from www.site.com/images/logo.png
refreshing www.site.com/login/ (trailing slash) will require logo.png, which is requested from www.site.com/login/images/logo.png
I need to somehow prevent login/ from being treated as a subdirectory.
nginx config:
Since Angular does its own routing, my nginx config has an api location for the REST API and a fallback location for all other URIs, which serves index.html for any unknown resource.
I have also added config to strip trailing slashes from URIs.
# api route
location ~* /api.? {
...
}
# fallback route
location ~* .? {
# strip any trailing slash
rewrite ^/(.*)/$ /$1 break;
root /var/www/site/app;
index index.html;
# serve index.html if uri doesn't exist
try_files $uri $uri/ /index.html =404;
}
Whenever I try to refresh a route with a trailing slash, it looks as if something (Angular/nginx/the browser?) treats the URI as a directory and prefixes that directory onto all the GET requests for static resources, which then fail.
Here are excerpts from my nginx log:
Works: refresh an angular route with no trailing slash
http://localhost/login
[debug] : http request line: "GET /login HTTP/1.1"
[debug] : http header: "Referer: http://localhost/"
[debug] : http request line: "GET /images/logo.png HTTP/1.1"
Doesn't Work: refresh an angular route with a trailing slash
http://localhost/login/
[debug] : http request line: "GET /login/ HTTP/1.1"
[debug] : http header: "Referer: http://localhost/login/"
[debug] : http request line: "GET /login/images/logo.png HTTP/1.1"
*error* /login/images/logo.png doesn't exist
I'm not sure whether the Referer header is a red herring.
Question
How can I configure Angular and/or ui-router and/or nginx (or whoever the culprit is) so that refreshing routes with trailing slashes works?
This was fixed by setting the document base to the site root:
<base href="/">
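For context, a minimal sketch of where that tag goes, assuming a typical single-page index.html (the file names and paths here are illustrative):

<!DOCTYPE html>
<html>
<head>
  <!-- resolve all relative URLs against the site root,
       regardless of the current route's path depth -->
  <base href="/">
  <link rel="stylesheet" href="css/app.css">
</head>
<body ng-app="app">
  <div ui-view></div>
  <script src="js/app.js"></script>
</body>
</html>

With the base set, the browser resolves images/logo.png against / instead of /login/, so the trailing slash no longer leaks into the asset requests.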
In robots.txt I can put:
#Baiduspider
User-agent: Baiduspider
Disallow: /
#Yandex
User-agent: Yandex
Disallow: /
to tell those search engines to stop crawling my app's pages (a PHP app). But how can I block them by IP on GAE?
There are two ways:
1. Do it in your code.
2. Use the DoS protection facilities: https://developers.google.com/appengine/docs/python/config/dos
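A quick sketch of each. For the DoS service, a dos.yaml uploaded alongside your app config blacklists addresses or subnets; the entries below mirror the documented format, with placeholder addresses:

blacklist:
- subnet: 1.2.3.4
  description: a single IP address to block
- subnet: 1.2.3.4/24
  description: an IPv4 subnet to block

For the in-code route, check the client address before serving anything. Here is a sketch for the Python runtime (the docs linked above are for Python; in a PHP app the equivalent check reads $_SERVER['REMOTE_ADDR']), using a hypothetical prefix list:

import webapp2

# hypothetical prefixes; substitute the crawlers' published ranges
BLOCKED_PREFIXES = ('1.2.3.',)

class BlockingHandler(webapp2.RequestHandler):
    def dispatch(self):
        # reject blacklisted clients before normal routing happens
        if self.request.remote_addr.startswith(BLOCKED_PREFIXES):
            self.abort(403)
        return super(BlockingHandler, self).dispatch()

Note that dos.yaml is capped at 100 blacklist entries, so subnets are preferable to long lists of individual IPs.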
A large site I am working with is getting 80K+ 404s a day from Google for garbage URLs, and I can't figure out where they are coming from. Here is a sample of a few. These URIs exist nowhere in the site structure, so I am assuming they are being created by an external agent/site that is driving Googlebot to crawl them. Anyone have any ideas?
7/2/2013 22:05 /Sl/4watQCXBFtF6obwFRA0f35148b 10262 404 - Not Found No Referrer Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
7/2/2013 22:05 /PvDIs6AveH9tju3tETtWg045cb22d 10261 404 - Not Found No Referrer Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
I'm trying to run the GWT 2.4 sample app "MobileWebApp". I get a 500 "No Realm" error when I try to run the app in dev mode through Eclipse.
I understand this is an authentication problem.
I'm not familiar with Google App Engine or Jetty, but looking at the web.xml I can see there is a servlet filter that uses the App Engine UserService, presumably to redirect the user to Google for authentication.
I'm using:
Eclipse 3.7 (Indigo SR1)
Google Plugin for Eclipse 2.4
m2eclipse
I'm including an excerpt from the web.xml below. I'm not sure what other info would be helpful in diagnosing this problem.
<security-constraint>
<display-name>
Redirect to the login page if needed before showing
the host html page.
</display-name>
<web-resource-collection>
<web-resource-name>Login required</web-resource-name>
<url-pattern>/MobileWebApp.html</url-pattern>
</web-resource-collection>
<auth-constraint>
<role-name>*</role-name>
</auth-constraint>
</security-constraint>
<filter>
<filter-name>GaeAuthFilter</filter-name>
<!--
This filter demonstrates making GAE authentication
services visible to a RequestFactory client.
-->
<filter-class>com.google.gwt.sample.gaerequest.server.GaeAuthFilter</filter-class>
</filter>
<filter-mapping>
<filter-name>GaeAuthFilter</filter-name>
<url-pattern>/gwtRequest/*</url-pattern>
</filter-mapping>
Below is the output in the Eclipse console:
[WARN] Request /MobileWebApp.html failed - no realm
[ERROR] 500 - GET /MobileWebApp.html?gwt.codesvr=127.0.0.1:9997 (127.0.0.1) 1401 bytes
Request headers
Host: 127.0.0.1:8888
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:7.0.1) Gecko/20100101 Firefox/7.0.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Connection: keep-alive
Response headers
Content-Type: text/html; charset=iso-8859-1
Content-Length: 1401
Many thanks for any helpful advice!
Edit on 11/11/11: I added the Jetty tag since it seems relevant to this problem.
If your very first request fails (just fetching the /MobileWebApp.html page), then it probably isn't an authentication problem. Do you have GAE enabled for that project (not only GWT)? That might be one issue.
I read somewhere that there are two ways of debugging an app in Eclipse; one is "Run As > Web Application", and I forget what the other one is (I don't use Eclipse). One of them works and the other doesn't.
If that doesn't work, you can try replacing the built-in Jetty:
add a GWT param: -server com.google.appengine.tools.development.gwt.AppEngineLauncher
add a VM param: -javaagent:/path_to/appengine-agent.jar
And the last option is -noserver, but then you won't be able to debug the server-side code, just the client-side GWT stuff: first start Jetty with mvn jetty:run and then debug in Eclipse with the -noserver GWT param. See the sketch below.
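A sketch of that -noserver workflow, assuming Jetty's default port 8080 and the sample's host page name (adjust both to your setup):

# terminal: serve the app, including the server-side code, via Maven
mvn jetty:run

# Eclipse launch configuration, GWT program arguments:
-noserver -startupUrl http://localhost:8080/MobileWebApp.html

The -startupUrl flag just tells dev mode which page to open; with -noserver, GWT only hosts the client-side code, which is why server-side breakpoints won't hit.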
I had the same problem. Finally I noticed that when I switched to a newer version of App Engine, the older App Engine libraries remained in WEB-INF/lib alongside the new ones.
Removing them solved the problem.