What could be the reason 400 errors are not being logged in IIS logs?

I tried collecting the IIS logs for the time of the call in GMT and also in the client's time zone, and I collected logs for the day before, the day of, and the day after the call.
I see 200 and 404 status codes, but no 400 status codes in the logs (or the ones I do see don't match the user's timestamp for the call).
Are all 400 errors always logged, and what could be the reason they are not showing up in the IIS logs?
Is there anything specific we should keep in mind when collecting the logs or querying them?
Note: when collecting the IIS logs, I seem to have gotten some errors regarding file creation etc. Could that be the reason?

Check your request logs. Sometimes they don't go in the error log depending on your configuration.
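
To check the request logs for 400 responses, something like the following could be a starting point. It is a minimal sketch, assuming the site log is in the W3C extended format with a "#Fields:" header that includes sc-status. The function name and the sample log lines are made up for illustration and are not taken from the asker's server.

```python
from typing import Iterable, Iterator


def iter_status_rows(lines: Iterable[str], wanted_status: str = "400") -> Iterator[dict]:
    """Yield rows of a W3C extended log whose sc-status equals wanted_status."""
    fields: list[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            # The header can reappear mid-file, so always use the latest one.
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#") or not fields:
            continue  # other directives, or data before any header
        values = line.split()
        if len(values) != len(fields):
            continue  # skip truncated or malformed lines
        row = dict(zip(fields, values))
        if row.get("sc-status") == wanted_status:
            yield row


# Illustrative sample only, not real log data.
SAMPLE_LOG = """\
#Software: Microsoft Internet Information Services 10.0
#Fields: date time cs-method cs-uri-stem sc-status
2021-11-30 10:15:02 GET /api/orders 200
2021-11-30 10:15:07 POST /api/orders 400
2021-11-30 10:16:11 GET /missing 404
"""

if __name__ == "__main__":
    for row in iter_status_rows(SAMPLE_LOG.splitlines()):
        print(row["date"], row["time"], row["cs-method"], row["cs-uri-stem"])
```

Two things are worth keeping in mind when matching timestamps. As far as I know, W3C logs are written in UTC by default, and requests that HTTP.sys rejects before they reach a worker process are recorded in the separate HTTPERR log rather than in the site's W3C log, so a 400 may only show up there.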

Related

How to join Google App Engine's logging table in BigQuery with Error Reporting

Google App Engine (GAE) creates a table for each day of logging, with rows containing several pieces of log information, such as "status" (e.g. 500, 404).
Yet this table does not contain the Resolution Status of the errors in Error Reporting.
At the moment, I would like to find out how many "Acknowledged" errors happen per day. I can get which errors happen per day from the appengine_googleapis_com_request_log_* tables (e.g. appengine_googleapis_com_request_log_20211130). However, I don't know how to tell whether an error is Acknowledged or Open.
Does anyone know how I can combine this information, or at least whether Error Reporting saves its information in any BigQuery table?
After reviewing the Google documentation for your question, it appears that it is not possible to get the resolution status from BigQuery.
You can view your errors on the Error Reporting page of your GCP Console, which displays a list of all errors in the order of frequency. Errors with the same root cause are grouped together. The error reporting list provides the following information for all reported errors:
Resolution status
Occurrences
Users
Error
Seen in
First seen
Last seen
Response code
If you would like more information, you can review the documentation on viewing errors.
As I said, there is no way to get the resolution status; however, you can file a feature request.
Here is some documentation that shows what you can get with the API. It could help you with the feature request.
https://cloud.google.com/support/docs/issue-trackers
https://cloud.google.com/error-reporting/reference/rest/v1beta1/ErrorEvent
Additionally, here is a link I found regarding the Error Processing and Log Monitoring documentation using GCP.
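
For the per-day error counts the question already gets from the request log tables, a query along these lines might help. It is only a sketch: the dataset name is hypothetical, and I am assuming the status lives in a protoPayload.status column of the exported tables, which you should verify against your own schema. It does not return the resolution status, which, as noted above, is not available there.

```python
import re


def build_daily_error_query(dataset: str, start_day: str, end_day: str) -> str:
    """Build a BigQuery Standard SQL string counting 4xx/5xx requests per day.

    dataset is a hypothetical "project.dataset" name; days are YYYYMMDD strings
    matching the suffix of the appengine_googleapis_com_request_log_* tables.
    """
    for day in (start_day, end_day):
        if not re.fullmatch(r"\d{8}", day):
            raise ValueError(f"expected YYYYMMDD, got {day!r}")
    if not re.fullmatch(r"[A-Za-z0-9_.\-]+", dataset):
        raise ValueError(f"unexpected characters in dataset name {dataset!r}")
    return (
        "SELECT _TABLE_SUFFIX AS log_day,\n"
        # Assumption: the exported schema exposes the HTTP status here.
        "       protoPayload.status AS status,\n"
        "       COUNT(*) AS requests\n"
        f"FROM `{dataset}.appengine_googleapis_com_request_log_*`\n"
        f"WHERE _TABLE_SUFFIX BETWEEN '{start_day}' AND '{end_day}'\n"
        "  AND protoPayload.status >= 400\n"
        "GROUP BY log_day, status\n"
        "ORDER BY log_day, status"
    )


if __name__ == "__main__":
    print(build_daily_error_query("my_project.gae_logs", "20211101", "20211130"))
```

The wildcard table with _TABLE_SUFFIX avoids having to union the daily tables by hand.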

Office 365 Management API activity/feed/subscriptions APIs returning InternalServerError

I have been researching activity audits for the last couple of days using an ASP.NET MVC project. I was using contentType=Audit.Exchange and contentType=Azure.ActiveDirectory successfully from the second half of yesterday through this morning, up until about two hours ago.
I made no changes to my authorization/authentication code and the tokens look good. Also no changes to the calls themselves. I added some json handling for the response to list subscriptions and when I ran the app to test that code, suddenly I am getting an InternalServerError response to start subscription, list subscriptions and stop subscription. The error is returned after a long timeout (in fact I had to increase the default timeout value).
So as of about two hours ago all the APIs are returning InternalServerError after a long timeout. This is happening on the following APIs:
/activity/feed/subscriptions/start
/activity/feed/subscriptions/list
/activity/feed/subscriptions/stop
The body of the response message is empty, so it does not include any error info as described in https://msdn.microsoft.com/office-365/office-365-management-activity-api-reference.
It seems crazy that this could be a service outage, so I must be missing something really elemental?
Hmmm. With no further changes to the code, I am now getting HTTP 200 responses. If that was a service outage, it was a heck of a long outage for 99.9% uptime.
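
Since the errors went away on their own, a retry with backoff around the subscription calls may be worth considering. The following is a minimal sketch in Python rather than the asker's ASP.NET code. The function name, its parameters, and the idea of passing the HTTP call in as a callable are all my own assumptions for illustration, not part of the Office 365 Management API.

```python
import time
from typing import Callable, Tuple


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-5xx status or attempts run out.

    send is a hypothetical zero-argument callable that performs one HTTP
    request and returns (status_code, body).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    status, body = 0, ""
    for attempt in range(attempts):
        status, body = send()
        if status < 500:
            return status, body  # success or a client error: do not retry
        if attempt < attempts - 1:
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
    return status, body  # still failing: hand the last response to the caller
```

Passing the sleep function in makes the helper easy to exercise without real waiting, for example by recording the delays in a list. It will not help during a long outage, but it smooths over short ones.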

Random 500 errors on AppEngine

I have a fairly big application which went through a major overhaul.
The newer version uses a lot of JSONP calls, and I notice 500 server errors. Nothing is logged in the logs section to determine the cause of the error. It happens on JS, PNG and even Jersey (servlet) requests too.
Searching SO and groups suggested that these errors are common during deployment. But it happens even hours after deployment.
BTW, the application has become slightly bigger, and in a few rare cases it even causes a deadline exception while starting a few instances. Sometimes it starts and serves within 6-10 secs. Sometimes it takes more than 75 secs, thereby causing a timeout for that request. I see the same behavior for warmup requests too. Nothing custom is loaded during app warmup.
I feel like you should be seeing the errors in your logs. Are you exceeding quotas or having deadline errors? Perhaps you have an error in your error handler like your file cannot be found, or the path to the error handler overlaps with another static file route?
To troubleshoot, I would implement custom error pages so you could determine the actual error code. I'm assuming Python since you never specified what language you are using. Add the following to your app.yaml and create static html pages that will give the recipient some idea of what's going on and then report back with your findings:
error_handlers:
  - file: default_error.html
  - error_code: over_quota
    file: over_quota.html
  - error_code: dos_api_denial
    file: dos_api_denial.html
  - error_code: timeout
    file: timeout.html
If you already have custom error handlers, can you provide some of your app.yaml so we can help you?
Some 500s are not logged in your application logs. They are failures at the front-end of GAE. If, for some reason, you have a spike in requests and new instances of your application cannot be started fast enough to serve those requests, your client may see 500s even though those 500s do not appear in your application's logs. GAE team is working to provide visibility into those front-end logs.
I just saw this myself... I was researching some logs of visitors who only loaded half of the graphics files on a page. I tried clicking on the same link on a blog that they used to get to our site. In my case, I saw a 500 error in the Chrome browser developer console for a js file. Yet when I looked at the GAE logs, they said the file was served correctly with a 200 status. That js file loads other images, which were not loaded. In my case, it was an https request.
It is really important for us to know our customer experience (obviously). I wanted to let you know that this problem is still occurring. Just having it show up in the logs would be great, even attach a warm-up error to it or something so we know it is an unavoidable artefact of a complex server system (totally understandable). I just need to know if I should be adding instances or something else. This error did not wait for 60 seconds, maybe 5 to 10 seconds. It is like the round trip for SSL handshaking failed in the middle but the logs showed it as success.
So can I increase any timeout for the handshake or is that done on the browser side?

Constant disconnects due to channels going stale for no reason

Ever since the latest release a few days ago, our users are constantly being disconnected due to channel tokens going stale within minutes of being created. Our tokens are set to last for 5 hours, but we're lucky if they last for 5-10 minutes, and when the channel closes we cannot even reconnect with a new channel token until the user refreshes.
A JavaScript error triggers the beginning of it. It looks like this:
NetworkError: 400 Unknown SID - http://89.talkgadget.google.com/talkgadget/dch/bind?VER=8&clid=C9C2EFC06C7C5163&gsessionid&prop=data&token=AHRlWrrWl611ZMMDw8Apgi5vdYuS9UslofxEiJI47-2n4rkPgmuu1z0AN-UNQcyNEvhck-AYAMSLPru8Aumooz62hYNNbLTbi1a3lTSAzGEyj6TsXZirJYE&RID=rpc&SID=BEBDEFDA92C6A9F7&CI=0&AID=54&TYPE=xmlhttp&zx=gsjg8mb1i987&t=1
Then, in Firefox Firebug, the console gets spammed infinitely with
channel name mismatch; message ignored
until a refresh occurs.
Our site is a real-time interactive site with chat. Our users are sending us emails upset that they keep getting disconnected. They're leaving the site. This is costing us not only goodwill with our user base, but also money and we are powerless to do anything because the bug is with Google App Engine.
Please fix this or rollback to the previous build immediately until you figure this out. The latest build is broken.
I haven't been able to reproduce this but I'm still looking at it. In the meantime: if you explicitly call socket.close() after receiving the error, can you then create a new Channel object and reconnect? If that doesn't work, you could even try manually removing the element with id "wcs-iframe" itself from the DOM. You should be able to use the original token when doing this instead of fetching a new token.

About the list of URIs showing the most errors

In my application, I have a cron job which runs every 30 minutes.
My question is why GAE always says that errors occur for it.
However, when I look at the latest log, I cannot see any errors.
And when I query the logs with the minimum severity set to Error, the latest error happened several days ago.
How can I fix it?
Thanks in advance!
The HTTP status code being returned is 405. I think any non-2XX code will count as an error on that screen.
A 405 status means "Method Not Allowed".
I'm guessing that your RequestHandler for your cron task is not set up to receive POST requests.
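
To illustrate where a 405 comes from, here is a minimal sketch of a handler that accepts both GET and POST. It is written as a plain WSGI callable, which is an assumption on my part since the asker's framework is not shown, and the name cron_app is hypothetical. As far as I know, App Engine's cron documentation describes cron requests as HTTP GET, so it may be worth checking which method your handler actually rejects.

```python
from typing import Callable, Iterable

# Methods this hypothetical cron endpoint is willing to serve.
ALLOWED_METHODS = ("GET", "POST")


def cron_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Plain WSGI callable that runs the job for GET or POST, else returns 405."""
    method = environ.get("REQUEST_METHOD", "GET")
    if method not in ALLOWED_METHODS:
        start_response(
            "405 Method Not Allowed",
            [("Allow", ", ".join(ALLOWED_METHODS)), ("Content-Type", "text/plain")],
        )
        return [b"method not allowed\n"]
    # The real periodic work would go here.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"cron run finished\n"]
```

If the handler only defines one of the two methods, a request using the other one produces exactly the 405 that shows up in the error list even though nothing is logged at error severity.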
