server log analysis of 404s - file

A large site I am working with is getting 80K+ 404s a day from Google for garbage URLs, and I can't figure out where they are coming from. Here is a sample of a few. These URIs exist nowhere in the site structure, so I am assuming they are being created by an external agent or site that is driving Googlebot to crawl them. Does anyone have any ideas?
7/2/2013 22:05 /Sl/4watQCXBFtF6obwFRA0f35148b 10262 404 - Not Found No Referrer Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
7/2/2013 22:05 /PvDIs6AveH9tju3tETtWg045cb22d 10261 404 - Not Found No Referrer Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
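One way to start narrowing this down is to tally the 404 records by user agent and referrer. The following is a minimal sketch, not a finished tool: it assumes each record has been joined onto a single line and follows the layout of the two samples above (date, time, path, a numeric field, status, referrer, user agent). The name summarize_404s and the regular expression are illustrative assumptions.

```python
import re
from collections import Counter

# Assumed layout, inferred only from the two sample records:
# date time path number "404 - Not Found" referrer user-agent
LINE_404 = re.compile(
    r"^(?P<date>\S+) (?P<time>\S+) (?P<path>/\S*) (?P<num>\d+) "
    r"404 - Not Found (?P<referrer>No Referrer|\S+) (?P<agent>.+)$"
)


def summarize_404s(records):
    """Tally 404 records by user agent and referrer, and collect the paths."""
    by_agent = Counter()
    by_referrer = Counter()
    paths = []
    for record in records:
        match = LINE_404.match(record.strip())
        if match is None:
            continue  # not a 404 record in the assumed layout
        by_agent[match.group("agent")] += 1
        by_referrer[match.group("referrer")] += 1
        paths.append(match.group("path"))
    return by_agent, by_referrer, paths


sample = [
    "7/2/2013 22:05 /Sl/4watQCXBFtF6obwFRA0f35148b 10262 404 - Not Found No Referrer "
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "7/2/2013 22:05 /PvDIs6AveH9tju3tETtWg045cb22d 10261 404 - Not Found No Referrer "
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
]

agents, referrers, bad_paths = summarize_404s(sample)
for agent, count in agents.most_common():
    print(count, agent)
```

If nearly every record carries the same agent and no referrer, as in the sample, the logs alone probably will not reveal the source, and the paths collected here can be compared against external link reports instead.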

Related

405 Err on OPTIONS preflight for upload_url on Google Appengine SDK on different port

I have a Google AppEngine project that works fine in production but not locally.
There is a React browser application running locally on port 3001 and a python api service running on 9090.
When I attempt to upload files via the React client, I first call a REST endpoint that returns the blobstore get_upload_url() to the client. This URL is something like: http://localhost:9090/_ah/upload/aghkZXZ-... <-- note the port is that of the python service
When I fashion a POST request to that url from the browser client to actually upload the file, I get a 405 on the OPTIONS preflight check. So far as I understand, this is due to the ports being different. This only occurs in the local App Engine SDK since I am using dispatch.yaml settings in production to have everything on the same domain/port.
I had dug into the SDK code a while ago and put a hack in place. (https://gist.github.com/blainegarrett/4d3b3081d09b4ff7be00765eb32b0d94)
However, since upgrading Google Cloud to 218.0.0, the hack was overwritten and I'm back to square one.
Here are the headers to the blobstore upload url:
OPTIONS /_ah/upload/aghkZXZ-Tm9uZXIiCxIVX19CbG9iVXBsb2FkU2Vzc2lvbl9fGICAgICA77ALDA HTTP/1.1
Host: localhost:9090
Connection: keep-alive
Origin: http://localhost:3001
Access-Control-Request-Method: POST
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36
Accept: */*
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
I am currently using vanilla XMLHttpRequest() for the upload call specifically.
Does anyone have any suggestion on how to either get around the preflight check when the ports are different and/or to allow OPTIONS checks on the upload url in a less hacky way?
Update: I'd still like to hear an answer regarding the 405 on the SDK, but I was able to dodge the preflight check by getting rid of the xhr progress listener. My original assertion that the port difference was triggering the preflight check was incorrect. It was the progress callback.
xhr.upload.addEventListener('progress', function(e) { .. });
See research on: CORS request is preflighted, but it seems like it should not be
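The update above matches the browser rules for "simple" requests: registering any listener on xhr.upload is enough to force a preflight, regardless of ports. The sketch below is a simplified model of those rules, not the browser's actual algorithm. It ignores the Range header and the limits on header value length, and the function name needs_preflight is an assumption made for illustration.

```python
# Simplified model of the CORS "simple request" rules.
SAFE_METHODS = {"GET", "HEAD", "POST"}
SAFE_HEADERS = {"accept", "accept-language", "content-language", "content-type"}
SAFE_CONTENT_TYPES = {
    "application/x-www-form-urlencoded",
    "multipart/form-data",
    "text/plain",
}


def needs_preflight(method, author_headers, has_upload_listener=False):
    """Return True if a cross-origin request would be preflighted.

    author_headers holds only headers set by the script, not ones the
    browser adds itself (User-Agent, Origin, and so on).
    """
    if has_upload_listener:
        return True  # any listener on xhr.upload forces a preflight
    if method.upper() not in SAFE_METHODS:
        return True
    for name, value in author_headers.items():
        lowered = name.lower()
        if lowered not in SAFE_HEADERS:
            return True
        if lowered == "content-type":
            # Compare only the media type, ignoring parameters like boundary.
            media_type = value.split(";", 1)[0].strip().lower()
            if media_type not in SAFE_CONTENT_TYPES:
                return True
    return False


print(needs_preflight("POST", {}, has_upload_listener=True))
print(needs_preflight("POST", {}))
```

Under this model, a plain multipart POST with no upload listener and no custom headers goes out without an OPTIONS request, which is consistent with the behaviour described in the update.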

Removing header (User-agent) from make_fetch_call while requesting from GAE

I have an application on Google App Engine (GAE) and I am using Python 2.7. This application receives a GET (AJAX) request from the user portal (say Chrome). Upon receiving the request, I prepare asynchronous connections to request data from multiple websites (say X1, X2, etc.) outside GAE, using GET requests made with urlfetch.make_fetch_call().
This worked fine for the X1 website but not for X2, so I started probing on the local dev server. I suspect that X2 checks the {'User-Agent':'Python-urllib/2.7'} header. This is my best guess, since changing this field to {'User-Agent': 'Mozilla/5.0'} returns the desired results.
So I uploaded the code to GAE and started the process with urlfetch.make_fetch_call(). Upon intercepting this call, I found that no matter what I do, the default header added by GAE is not removed.
Here is the default header added by GAE.
302 218ms 0kb Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.103 Safari/537.36 AppEngine-Google; (+http://code.google.com/appengine; appid: s~xxx-etching-112014) module=default version=1
107.178.194.96 - - [06/Feb/2016:19:57:04 -0800] "GET / HTTP/1.1" 302 383 "http://www.mywebbsite.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.103 Safari/537.36 AppEngine-Google; (+http://code.google.com/appengine; appid: s~xxx-etching-112014)" "1.usedForIntercepting.appspot.com" ms=218 cpu_ms=224 cpm_usd=0.000043 loading_request=1 app_engine_release=1.9.32 trace_id=fd7b7420e7f8c23371a5b0ea7e9651 instance=00c61b117ce5ebac2a2eba44f26a01d4f2
This is what I have tried:
import logging
from google.appengine.api import urlfetch

rpcs = []
for portal in self.searchPortals:
    spoofHeader = {
        'User-agent': 'Mozilla/5.0----------------------',
        'Host': portal.getURL(),
        'Accept-Encoding': 'identity',
        'Connection': 'close',
        'Accept': 'application/json, text/plain, */*',
        'Origin': 'http://www.mywebsite.com'
    }
    logging.info(spoofHeader)
    rpc = urlfetch.create_rpc(deadline=5)
    # Bind rpc and portal as defaults so each callback keeps its own pair
    # instead of the values from the last loop iteration.
    rpc.callback = lambda rpc=rpc, portal=portal: self.handleCallBack(rpc, portal)
    #urlfetch.make_fetch_call(rpc, portal.getSearchURL(searchKeyword), headers={'User-agent':'Mozilla/5.0'})
    urlfetch.make_fetch_call(rpc, url='http://1.usedforintercepting.appspot.com', headers=spoofHeader)
    rpcs.append(rpc)
# Wait for every outstanding fetch to finish.
for rpc in rpcs:
    rpc.wait()
This is what I received.
2016-02-07 13:01:21.306 / 302 59ms 0kb Mozilla/5.0---------------------- AppEngine-Google; (+http://code.google.com/appengine; appid: s~xxx-etching-112014) module=default version=1
107.178.194.20 - - [06/Feb/2016:23:31:21 -0800] "GET / HTTP/1.1" 302 383 - "Mozilla/5.0---------------------- AppEngine-Google; (+http://code.google.com/appengine; appid: s~xxx-etching-112014)" "1.usedForIntercepting.appspot.com" ms=59 cpu_ms=6 cpm_usd=0.000043 app_engine_release=1.9.32 trace_id=a4a1f521c5a6fa65ed0295835dd175 instance=00c61b117ce5ebac2a2eba44f26a01d4f2
What I want is something like this.
GET http://somelink/search/abc HTTP/1.1
Accept-Encoding: identity
Host: somelink.com
Connection: close
User-Agent: Mozilla/5.0
I want to remove everything from the header other than User-Agent: Mozilla/5.0. Is that possible?
Note: to intercept the request made from GAE using urlfetch, I am using another instance of GAE.
In the documentation, URL Fetch Python API Overview: Request Headers, it says
For security reasons, the following headers cannot be modified by the application:
Content-Length
Host
Vary
Via
X-Appengine-Inbound-Appid
X-Forwarded-For
X-ProxyUser-IP
It also says:
The following headers indicate the app ID of the requesting app:
User-Agent. This header can be modified but App Engine will append an identifier string to allow servers to identify App Engine requests. The appended string has the format "AppEngine-Google; (+http://code.google.com/appengine; appid: APPID)", where APPID is your app's identifier.
If you want custom headers, you will have to write your own urlfetch code or use an outside server that makes the call for you with your headers.
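To make the documented behaviour concrete, here is a small model of what the receiving server should expect to see for a given headers dict. It is a sketch built only from the documentation excerpt above and the intercepted logs in the question. It is not App Engine's implementation. The function name expected_outgoing_headers is hypothetical, and both the handling of a missing User-Agent and the omission of protected headers (which App Engine sets itself) are assumptions.

```python
# Headers the documentation lists as not modifiable by the application.
PROTECTED_HEADERS = {
    "content-length", "host", "vary", "via",
    "x-appengine-inbound-appid", "x-forwarded-for", "x-proxyuser-ip",
}


def expected_outgoing_headers(requested, app_id):
    """Model the headers a remote server sees for a urlfetch request."""
    suffix = "AppEngine-Google; (+http://code.google.com/appengine; appid: %s)" % app_id
    outgoing = {}
    user_agent = None
    for name, value in requested.items():
        lowered = name.lower()
        if lowered in PROTECTED_HEADERS:
            continue  # assumption: the app-supplied value is ignored
        if lowered == "user-agent":
            user_agent = value
            continue
        outgoing[name] = value
    # The identifier is appended after a space, as seen in the intercepted logs.
    if user_agent:
        outgoing["User-Agent"] = "%s %s" % (user_agent, suffix)
    else:
        outgoing["User-Agent"] = suffix  # assumption for the no-User-Agent case
    return outgoing


seen = expected_outgoing_headers(
    {"User-agent": "Mozilla/5.0", "Host": "somelink.com", "Accept-Encoding": "identity"},
    "s~xxx-etching-112014",
)
print(seen["User-Agent"])
```

The point of the model is that no value passed in headers can produce a bare "Mozilla/5.0" on the wire, which is why the answer suggests making the call from outside urlfetch.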

Glassware Starter Project(Java) works fine on localhost, but not in app engine

I built it in Eclipse and ran it on the development server, where it works. When I deploy it to App Engine from Eclipse, I get the following error. Any idea why?
The very first time, it got as far as the OAuth 2 dance, but that did not succeed. I fixed the redirect URL in the API console, after which localhost worked. I then deployed to App Engine again, but it still did not work and I got this error:
Error: 500 Server Error
The server encountered an error and could not complete your request.
If the problem persists, please report your problem and mention this error message and the query that caused it.
Error log in App engine console:
2013-06-07 01:59:36.619 /oauth2callback?code={removed now} 500 2416ms 0kb Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.110 Safari/537.36
68.5.238.205 - - [07/Jun/2013:01:59:36 -0700] "GET /oauth2callback?code={removed now} HTTP/1.1" 500 0 - "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.110 Safari/537.36" "{my appid }.appspot.com" ms=2416 cpu_ms=1581 cpm_usd=0.000087 app_engine_release=1.8.0 instance=00c61b117c2c977fac245e8480eff747e75eb6
I 2013-06-07 01:59:34.228
com.google.glassware.AuthFilter doFilter: Skipping auth check during auth flow
I 2013-06-07 01:59:34.230
com.google.glassware.AuthServlet doGet: Got a code. Attempting to exchange for access token.
I 2013-06-07 01:59:35.427
com.google.glassware.AuthServlet doGet: Code exchange worked. User 115370471277937689999 logged in.
W 2013-06-07 01:59:36.614
Error for /oauth2callback
java.lang.NoClassDefFoundError: com/google/common/collect/Lists
at com.google.glassware.NewUserBootstrapper.bootstrapNewUser(NewUserBootstrapper.java:54)
at com.google.glassware.AuthServlet.doGet(AuthServlet.java:67)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
This looks like a bug in App Engine and the Google OAuth endpoints. It's being tracked in the Glass issue tracker. You can star that issue to receive updates as the investigation proceeds.
In the meantime, check out the .NET and PHP quick starts. They're working great.

google app engine converting (redirecting) HTTP POST to GET calls into the Handler

HTTPS POSTs on Google App Engine are getting redirected to the app URL as a GET, losing all arguments.
2011-11-28 22:21:06.026 / 302 218ms 0kb
71.167.39.92 - - [28/Nov/2011:19:21:06 -0800] "POST / HTTP/1.1" 302 0 "http://static.ak.facebook.com/platform/page_proxy.php?v=4" - "9.appname.appspot.com" ms=219 cpu_ms=0 api_cpu_ms=0 cpm_usd=0.000095
My handler is not called at this point, and App Engine redirects the above POST to:
2011-11-28 22:21:06.100 / 200 13ms 1kb Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0.2) Gecko/20100101 Firefox/6.0.2
71.167.39.92 - - [28/Nov/2011:19:21:06 -0800] "GET / HTTP/1.1" 200 1661 "http://static.ak.facebook.com/platform/page_proxy.php?v=4" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0.2) Gecko/20100101 Firefox/6.0.2" "9.appname.appspot.com" ms=14 cpu_ms=23 api_cpu_ms=0 cpm_usd=0.000873 instance=00c61b117c6840ba7ad8c376b950491ada80
This happens for every POST. An HTTPS GET calls my handler directly.
I had secure: always set in my app.yaml, which caused the redirect to trigger on every POST and resulted in GETs with full loss of the POST parameters (such as signed_request for Facebook apps). It is solved now.
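The loss of parameters follows from how browsers treat redirects: after a 301 or 302, a POST is re-issued as a GET without its body. The sketch below models that rule as described in the Fetch standard. The function name method_after_redirect is hypothetical, and real clients may differ in details.

```python
def method_after_redirect(status, method):
    """Return the HTTP method a browser typically uses after a redirect."""
    method = method.upper()
    if status == 303 and method != "HEAD":
        return "GET"  # 303 always switches to GET (HEAD stays HEAD)
    if status in (301, 302) and method == "POST":
        return "GET"  # POST is rewritten to GET and the body is dropped
    return method  # 307 and 308 preserve the method and body


for status in (302, 307):
    print(status, method_after_redirect(status, "POST"))
```

Since the redirect in the logs is a 302, the follow-up request arrives as a GET. Posting straight to the final URL, so that no redirect happens, should avoid the rewrite.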

"HTTP ERROR: 500 No realm" running GWT's "MobileWebApp" sample

I'm trying to run the GWT 2.4 sample app "MobileWebApp". I get a 500 "No Realm" error when I try to run the app in dev mode through Eclipse.
I understand this is an authentication problem.
I'm not familiar with Google App Engine or Jetty but from looking at the web.xml I can see there is a servlet filter where it is using the appengine UserService to presumably redirect the user to Google for authentication.
I'm using:
Eclipse 3.7 (Indigo SR1)
Google Plugin for Eclipse 2.4
m2eclipse
I'm including an excerpt from the web.xml below. I'm not sure what other info would be helpful in diagnosing this problem.
<security-constraint>
  <display-name>
    Redirect to the login page if needed before showing
    the host html page.
  </display-name>
  <web-resource-collection>
    <web-resource-name>Login required</web-resource-name>
    <url-pattern>/MobileWebApp.html</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>*</role-name>
  </auth-constraint>
</security-constraint>
<filter>
  <filter-name>GaeAuthFilter</filter-name>
  <!--
    This filter demonstrates making GAE authentication
    services visible to a RequestFactory client.
  -->
  <filter-class>com.google.gwt.sample.gaerequest.server.GaeAuthFilter</filter-class>
</filter>
<filter-mapping>
  <filter-name>GaeAuthFilter</filter-name>
  <url-pattern>/gwtRequest/*</url-pattern>
</filter-mapping>
Below is the output in the Eclipse console:
[WARN] Request /MobileWebApp.html failed - no realm
[ERROR] 500 - GET /MobileWebApp.html?gwt.codesvr=127.0.0.1:9997 (127.0.0.1) 1401 bytes
Request headers
Host: 127.0.0.1:8888
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:7.0.1) Gecko/20100101 Firefox/7.0.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Connection: keep-alive
Response headers
Content-Type: text/html; charset=iso-8859-1
Content-Length: 1401
Many thanks for any helpful advice!
Edit on 11/11/11: I added Jetty tag since it seems relevant to this problem.
If your very first request fails, just getting the /MobileWebApp.html page, then it probably isn't an authentication problem. Do you have GAE enabled for that project (not only GWT)? That might be one issue.
I read somewhere that there are two ways of debugging an app in Eclipse. One is with run as/webapp, and I forget which the other one is (I don't use Eclipse). One of them works and the other doesn't.
If that doesn't work, you can try replacing the built-in jetty:
add a GWT param: -server com.google.appengine.tools.development.gwt.AppEngineLauncher
VM param: -javaagent:/path_to/appengine-agent.jar
The last option is -noserver, but then you won't be able to debug the server-side code, just the client-side GWT stuff: first start Jetty with mvn jetty:run, and then debug in Eclipse with the -noserver GWT param.
I had the same problem. Finally I noticed that when I switched to a newer version of Appengine, the older Appengine libraries remained in the WEB-INF/lib along with the new ones.
Removing them solved the problem.
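A quick way to spot leftover libraries like this is to group the JAR file names by artifact and flag any artifact present in more than one version. This is a rough sketch under stated assumptions: the name/version split is a heuristic (everything from the first "-" followed by a digit counts as the version), the file names in the example are hypothetical, and find_duplicate_jars is not part of any SDK.

```python
import re
from collections import defaultdict

# Heuristic: artifact name, then "-", then a version starting with a digit.
JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def find_duplicate_jars(filenames):
    """Return {artifact: [versions]} for artifacts present in several versions."""
    versions = defaultdict(set)
    for filename in filenames:
        match = JAR_NAME.match(filename)
        if match:
            versions[match.group("artifact")].add(match.group("version"))
    return {name: sorted(found) for name, found in versions.items() if len(found) > 1}


# Hypothetical directory listing, for illustration only.
listing = [
    "appengine-api-1.0-sdk-1.5.5.jar",
    "appengine-api-1.0-sdk-1.6.0.jar",
    "gwt-servlet-2.4.0.jar",
]
print(find_duplicate_jars(listing))
```

In a real project the listing would come from something like os.listdir() on the WEB-INF/lib directory; the exact path depends on the project layout.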
