I have built two Nagios servers this week. The first was just a proof of concept, and tonight I built the production one. I followed the exact same instructions on both, and migrated my existing configuration over to the new server tonight. Everything works perfectly, except that some check_http checks are returning a 404 error, even though I can curl and wget the address. Example:
./check_http -I 127.0.0.1 -u http://11.210.1.18:8001/alphaweb/login.html
HTTP WARNING: HTTP/1.1 404 Not Found - 528 bytes in 0.000 second response time |time=0.000418s;;;0.000000 size=528B;;;0
I can curl this address with no problem. The following check, however, succeeds:
./check_http -I 127.0.0.1 -u http://11.210.1.16:7001
HTTP OK: HTTP/1.1 200 OK - 288 bytes in 0.001 second response time |time=0.000698s;;;0.000000 size=288B;;;0
Both of these checks work perfectly on an almost identical server. Any ideas?
The good thing is that you receive some HTTP status code at all; even a 404 is a good one, because it means you are actually interacting with a web server.
Check the log files on the target web servers
Assuming you have access to the target web server, I would recommend checking its log files.
Both requests, even the one getting the 404 status code, should show up there. And here a surprise may come: you might find that while your check is getting some response, your log files show no record of it. In such a case I would suspect some proxy or iptables rule in the way.
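For example, something along these lines on the target host while re-running the check (the log path is an assumption; adjust it for your distribution and virtual host):

    # watch requests arrive in real time, filtering for the URL being checked
    tail -f /var/log/httpd/access_log | grep 'alphaweb/login.html'

If the check fires but nothing shows up here, look at iptables -L -n and at any proxy sitting between the two hosts.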
Double-check the spelling of your calls
But the cause could be much simpler: a small mistake in your command can cause a significant difference.
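For instance, in the question above the failing check passes a full URL to -u, which makes check_http request that entire URL as the path from 127.0.0.1. A sketch of what may have been intended (assuming the target really is 11.210.1.18:8001):

    # -H/-I select the host to contact, -p the port; -u takes only the URL path
    ./check_http -H 11.210.1.18 -p 8001 -u /alphaweb/login.html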
Related
I am doing some load testing on a service run with Apache2, and my load testing tool has a default timeout of 30 seconds. When I run the tool for a minute at 1 request per second, it reports that 40 requests succeeded with a 200 OK response and 20 requests were cancelled because the client timeout was exceeded while awaiting headers.
Now, I was trying to spot this on the server side. I can't see the timeouts logged in either the Apache access logs or the gunicorn access logs. Note that I am interested both in connections that weren't accepted and in connections that were accepted and then timed out.
I have some experience working on similar services on Windows. The http.sys error logs would show connection dropped errors and we would know if our server was dropping connections.
When a client times out, all the server knows is that the client has aborted the connection. In mod_log_config, the %X format specifier logs the status of the client connection when the response is completed, which is exactly what you want to know in this case.
Configure your logs to use %X, and look for the X character in the log lines.
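For example, a minimal sketch of such a log configuration (the format nickname conn_status and the log path are made up for illustration):

    # %X logs X if the connection was aborted before the response completed,
    # + if the connection may be kept alive, and - if it will be closed
    LogFormat "%h %l %u %t \"%r\" %>s %b %X" conn_status
    CustomLog /var/log/apache2/access.log conn_status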
Bonus: I even found the discussion about this feature in Apache's dev forum, from 20 years ago.
Update:
Regarding refused connections: these cannot be logged by Apache. Connection refusal is done by the kernel, in the TCP stack, not by Apache. The closest Apache-only solution I can think of is keeping track of the number of open connections (using mod_status). If it reaches the maximum, you know you might be refusing connections. Otherwise, you'd need to set up some monitoring solution to track TCP resets sent by the kernel.
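A sketch of that mod_status approach (this assumes mod_status is enabled with a /server-status handler reachable from localhost):

    # machine-readable scoreboard; compare BusyWorkers against MaxClients/MaxRequestWorkers
    curl -s 'http://localhost/server-status?auto' | grep -E 'BusyWorkers|IdleWorkers'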
I am trying to implement a REST protocol and have realized in trying to debug that my web server is disallowing the PUT request.
I have tested and confirmed this further by running:
curl -X PUT http://www.mywebserver.com/testpage
Which for my web server gives back a 403 - Forbidden error.
The same happens for DELETE, whereas for POST and GET everything is fine.
I am wondering if this is a common issue that those who use REST encounter and what the work-around might be?
Could I make a simple change to an .htaccess file? Or do I need to modify the protocol to set a hidden variable "_method" in the POST query string?
Often web servers will be configured to block anything except GET and POST, since 99% of the time they're all that are needed, and there have been problems in the past with applications assuming the requests were one of those two.
You don't say which server it is but, for example, you can tell Apache which methods to allow with the <Limit> directive:
<Limit POST PUT DELETE>
Require valid-user
</Limit>
It sounds like maybe some helpful sysadmin has used this to block non-GET/POST requests.
You could try an .htaccess with
<Limit GET POST PUT DELETE>
Allow from all
</Limit>
(I'm not an expert at Apache, so this may not be exactly correct.)
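For what it's worth, Allow from all is Apache 2.2 syntax; on Apache 2.4 it only works via mod_access_compat, and the native equivalent of the block above would be:

<Limit GET POST PUT DELETE>
Require all granted
</Limit>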
UPDATE
App Engine SDK 1.9.24 was released on July 20, 2015, so if you're still experiencing this, you should be able to fix this simply by updating. See +jpatokal's answer below for an explanation of the exact problem and solution.
Original Question
I have an application I'm working on, and I'm running into trouble when developing locally.
We have some shared code that checks an auth server for our apps using urllib2.urlopen. When I develop locally, the request made from within App Engine is rejected with a 404, but the same request succeeds just fine from a terminal.
I have the App Engine dev server running on localhost:8000, and the auth server on localhost:8001.
import urllib2

url = "http://localhost:8001/api/CheckAuthentication/?__client_id=dev&token=c7jl2y3smhzzqabhxnzrlyq5r5sdyjr8&username=amadison&__signature=6IXnj08bAnKoIBvJQUuBG8O1kBuBCWS8655s3DpBQIE="
try:
    r = urllib2.urlopen(url)
    print(r.geturl())
    print(r.read())
except urllib2.HTTPError as e:
    print("got error: {} - {}".format(e.code, e.reason))
which results in got error: 404 - Not Found from within App Engine
It appears that App Engine is adding the scheme, host, and port to the PATH portion of the URL I'm trying to hit, as this is what I see on the auth server:
[02/Jul/2015 16:54:16] "GET http://localhost:8001/api/CheckAuthentication/?__client_id=dev&token=c7jl2y3smhzzqabhxnzrlyq5r5sdyjr8&username=amadison&__signature=6IXnj08bAnKoIBvJQUuBG8O1kBuBCWS8655s3DpBQIE= HTTP/1.1" 404 10146
and from the request header we can see the whole scheme and host and port are being passed along as part of the path (header pieces below):
'HTTP_HOST': 'localhost:8001',
'PATH_INFO': u'http://localhost:8001/api/CheckAuthentication/',
'SERVER_PORT': '8001',
'SERVER_PROTOCOL': 'HTTP/1.1',
Is there any way to stop the App Engine dev server from hijacking this request to localhost on a different port? Or am I misunderstanding what is happening? Everything works fine in production, where our domains are different.
Thanks in advance for any assistance helping to point me in the right direction.
This is an annoying problem introduced by the urlfetch_stub implementation. I'm not sure which gcloud SDK version introduced it.
I've fixed this by patching the gcloud SDK myself, until Google does, which means this answer will hopefully be irrelevant shortly.
Open urlfetch_stub.py, which can often be found at ~/google-cloud-sdk/platform/google_appengine/google/appengine/api/urlfetch_stub.py
Around line 380 (depends on version), find:
full_path = urlparse.urlunsplit((protocol, host, path, query, ''))
and replace it with:
full_path = urlparse.urlunsplit(('', '', path, query, ''))
more info
You were correct in assuming the issue was a broken PATH_INFO header. The full_path here is what gets sent as the request path once the connection is made.
disclaimer
I may very easily have broken proxy requests with this patch. Because I expect Google to fix it properly, I'm not going to go too crazy about it.
To be very clear, this bug ONLY affects LOCAL app development; you won't see it in production.
App Engine SDK 1.9.24 was released on July 20, 2015, so if you're still experiencing this, you should be able to fix this simply by updating.
Here's a brief explanation of what happened. Until 1.9.21, the SDK was formatting URL fetch requests with relative paths, like this:
GET /test/ HTTP/1.1
Host: 127.0.0.1:5000
In 1.9.22, to better support proxies, this changed to absolute paths:
GET http://127.0.0.1:5000/test/ HTTP/1.1
Host: 127.0.0.1:5000
Both formats are perfectly legal per the HTTP/1.1 spec (see RFC 2616, section 5.1.2). However, while that spec dates to 1999, there are apparently quite a few HTTP request handlers that do not parse the absolute form correctly, and instead just naively concatenate the path and the host together.
So in the interest of compatibility, the previous behavior has been restored. (Unless you're using a proxy, in which case the RFC requires absolute paths.)
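If you want to check whether a particular local server copes with the absolute form, here is a quick sketch with netcat (assuming something is listening on 127.0.0.1:5000):

    # relative form, then absolute form; a handler with the concatenation bug
    # will answer the first request but 404 the second
    printf 'GET /test/ HTTP/1.1\r\nHost: 127.0.0.1:5000\r\nConnection: close\r\n\r\n' | nc 127.0.0.1 5000
    printf 'GET http://127.0.0.1:5000/test/ HTTP/1.1\r\nHost: 127.0.0.1:5000\r\nConnection: close\r\n\r\n' | nc 127.0.0.1 5000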
I have set up Varnish on my CentOS server, which runs my Drupal site.
Browsing to any page returns a blank page due to a 503: Service Unavailable.
I have read many questions and answers about intermittent 503s, but this is occurring constantly. I can still browse to the site using www.example.com:8080.
I am running CentOS 6, using this VCL:
https://raw.githubusercontent.com/NITEMAN/Varnish_VCL_samps-hacks/master/varnish3/drupal-base.vcl
I have also tried https://fourkitchens.atlassian.net/wiki/display/TECH/Configure+Varnish+3+for+Drupal+7 .
Not sure where to even start in debugging this.
ADDITIONAL INFO:
NITEMAN's answer below provides some really helpful debugging suggestions.
In my case it was something very simple: I had left the default 127.0.0.1 in my default.vcl. Changing this to my real external IP got things working. I hope that is the correct thing to do!
As you're running my sample VCL, it should be easy to debug (try each step separately):
Make sure Apache is listening on 127.0.0.1:8080 (as it could be listening on another IP and not on the local loopback). netstat -lpn | grep 8080 should help.
Raise the backend timeouts (if the server is very slow, since the defined timeouts are already huge). Requires a Varnish reload.
Disable the health probe (as Varnish may be marking the backend as sick). Comment out the probe basic block and the .probe line in backend default. Requires a Varnish reload.
Disable Varnish logic by uncommenting the first return(pipe) in sub vcl_recv. Requires a Varnish reload. (A sketch of the backend block these steps touch follows below.)
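For reference, a minimal sketch of the relevant backend block (Varnish 3 syntax; the host, port, and timeout values are illustrative):

    backend default {
        .host = "127.0.0.1";          # must be an address Apache actually listens on
        .port = "8080";
        .connect_timeout = 300s;      # raised timeouts for a slow backend
        .first_byte_timeout = 300s;
        .between_bytes_timeout = 300s;
        # .probe = basic;             # comment out to disable the health probe
    }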
When debugging, you should also provide (commands sketched after this list):
varnishadm debug.health output
varnishlog output for a sample request
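A sketch of the commands behind those two items (Varnish 3 syntax):

    varnishadm debug.health              # shows probe state per backend
    varnishlog -c -m 'TxStatus:503'      # client-side transactions answered with a 503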
Hope it helps!
I know you can download the raw access logs with appcfg.py, but I'm really interested in all the information around a specific request, like Python logging statements, exceptions, and API statistics (just like the online log viewer). Does anyone know if there is a way to get that information other than building it yourself?
In case anyone is wondering, we want to do some continuous statistical analysis of problems and display them on a large screen on a wall in the office.
Sure - just pass the --severity flag to appcfg.py:
$ appcfg.py help request_logs
Usage: appcfg.py [options] request_logs <directory> <output_file>
Write request logs in Apache common log format.
The 'request_logs' command exports the request logs from your application
to a file. It will write Apache common log format records ordered
chronologically. If output file is '-' stdout will be written.
Options:
-h, --help Show the help message and exit.
-q, --quiet Print errors only.
-v, --verbose Print info level logs.
--noisy Print all logs.
-s SERVER, --server=SERVER
The server to connect to.
--insecure Use HTTP when communicating with the server.
-e EMAIL, --email=EMAIL
The username to use. Will prompt if omitted.
-H HOST, --host=HOST Overrides the Host header sent with all RPCs.
--no_cookies Do not save authentication cookies to local disk.
--passin Read the login password from stdin.
-A APP_ID, --application=APP_ID
Override application from app.yaml file.
-V VERSION, --version=VERSION
Override (major) version from app.yaml file.
-n NUM_DAYS, --num_days=NUM_DAYS
Number of days worth of log data to get. The cut-off
point is midnight UTC. Use 0 to get all available
logs. Default is 1, unless --append is also given;
then the default is 0.
-a, --append Append to existing file.
--severity=SEVERITY Severity of app-level log messages to get. The range
is 0 (DEBUG) through 4 (CRITICAL). If omitted, only
request logs are returned.
--vhost=VHOST The virtual host of log messages to get. If omitted,
all log messages are returned.
--include_vhost Include virtual host in log messages.
--end_date=END_DATE End date (as YYYY-MM-DD) of period for log data.
Defaults to today.
This is what works for us really well:
appcfg.py --append --num_days=0 --include_all request_logs /path/to/your/app/ /var/log/gae/yourapp.log
Anyway, the line above will get all your log records and append them to a log file if you've executed this before; if not, it will create a new log file. It actually looks at your existing log (if it's there) and will not fetch any duplicates. You can run this without --append if you want, but use it if you are automating log downloads.
The key here is the --include_all flag, which seems to be undocumented. This flag will get all the data that you see if you use GAE's web log viewer. So you will get fields such as: ms=71 cpu_ms=32 api_cpu_ms=12 cpm_usd=0.000921... etc.
OK, I hope that helps someone.
BTW, we wrote up a blog post on this, check it out here.
I seem to be running into the 100M limit with appcfg. I ended up using the logservice API to get the logs.
Here's the code - https://github.com/manasg/gae-log-fetcher
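For anyone who wants to roll their own along the same lines, a minimal sketch of the logservice approach (this runs inside an App Engine app on the Python 2 runtime; the parameters are illustrative):

    from google.appengine.api.logservice import logservice

    # iterate over recent requests, including app-level log lines
    for req in logservice.fetch(include_app_logs=True):
        print(req.combined)  # Apache-style combined log entry for the request
        for line in req.app_logs:
            print("{} {} {}".format(line.time, line.level, line.message))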
Here is a way to access the raw logs so you can process them further without custom parsing (also, for me request_logs was not downloading all the data for the specified time frame).
Here is an app which runs on App Engine itself:
https://gaelogapp.appspot.com/
You can easily add this functionality to your app by updating app.yaml and copying logs.py:
https://github.com/okigan/gaelogapp