GAE/J request log format breakdown - google-app-engine

Here is a sample of a GAE Console log record:
http://i.stack.imgur.com/M2iJX.png (see the link for a readable, high-res version).
I would like to provide a breakdown of the fields displayed in both the collapsed (summary) view and the expanded (detail) view. I will fill in the fields whose meaning I know and would appreciate assistance with deciphering the rest. This post will be updated as new information becomes available.
Thank you,
Maxim.
Open issues:
How should the timestamp in [...-prod/0-0-39.346862139187007139] be read?
Why does the summary say the request took 343ms while the details say 344ms?
If the request spent 123ms on CPU and 30ms on API calls, where did the rest of the time go? Why is the total request time 343/344ms?

Summary
12-14 : Date of the request. 12 is the month (December), 14 is the day of the month (Tuesday).
05:21AM : Time of the request, PST offset. 05 is the hour. 21 is the minute.
57.593 : Time of request, PST offset. 57 is the second. 593 is the millisecond.
/match/... : HTTP request path
200 : HTTP return code. (200 = OK)
343ms : The total time (in milliseconds) it took to compute and return the response to the user
123cpu_ms : The time (in milliseconds) the request spent on CPU computation
30api_cpu_ms : The time (in milliseconds) the request spent on API calls (Datastore gets and the like)
1kb : The size (in kilobytes) of the response that was sent to the user
Mozilla/5.0 (X11; U; Linux x86_64; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.44 Safari/534.7,gzip(gfe) : The user agent. Note that gzip(gfe) is appended by the App Engine front end.
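Putting these fields together, a collapsed summary line reads roughly as follows (path and user agent abbreviated; this is a reconstruction from the fields above, not a verbatim log line):
12-14 05:21AM 57.593 /match/... 200 343ms 123cpu_ms 30api_cpu_ms 1kb Mozilla/5.0 (X11; U; Linux x86_64; en-US) ...,gzip(gfe)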
Details
IP (masked out in yellow) : The IP address of the client that initiated the request
HTTP referrer : Note that it is empty on this request because it was a direct hit
[14/Dec/2010:05:21:57 -0800] : Date and time of the request, including the timezone offset.
"GET /match/... HTTP/1.1" : The HTTP request line: method, path, and protocol version.
200 : HTTP return code. (200 = OK)
1036 : The size (in bytes) of the response that was sent to the user
Mozilla/5.0 (X11; U; Linux x86_64; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.44 Safari/534.7,gzip(gfe) : The user agent. Note that gzip(gfe) is appended by the App Engine front end.
ms=344 : The total time (in milliseconds) it took to compute and return the response to the user
cpu_ms=123 : The time (in milliseconds) the request spent on CPU computation
api_cpu_ms=30 : The time (in milliseconds) the request spent on API calls (Datastore gets and the like)
cpm_usd=0.003648 : The amount (in US dollars) that 1,000 requests such as this one would cost.
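To put that figure in perspective: cpm_usd=0.003648 means a single request of this kind costs 0.003648 / 1000 = $0.000003648, so about a million such requests would cost roughly $3.65.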
Log record
12-14 : Date of this application-emitted log entry. 12 is the month (December), 14 is the day of the month (Tuesday).
05:21AM : Time of this application-emitted log entry, PST offset.
57.833 : Time of this log entry, PST offset. 57 is the second. 833 is the millisecond.
[...-prod/0-0-39.346862139187007139] : The identifier of the application version that emitted this log message. Note: ...-prod is the application name, 0-0-39 is the deployed version name (from app.yaml), and .346862139187007139 appears to be a timestamp (in an unknown format) of when this version was deployed to the App Engine cloud.
stdout : The channel to which the application emitted this log message. Can be either stdout or stderr.
INFO ....Matcher - ... Id 208 matched. : Application-level output. It can be produced either via System.out.print or (as in this case) via a logging framework, logback; see the sketch below.
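A minimal sketch of the two options, assuming SLF4J backed by logback on the classpath (the class and message here are hypothetical stand-ins for the Matcher output above):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class Matcher {
    private static final Logger log = LoggerFactory.getLogger(Matcher.class);

    public void onMatch(long id) {
        // Both of these end up in the request's log records on the stdout channel.
        System.out.println("Id " + id + " matched.");  // raw stdout
        log.info("Id {} matched.", id);                // via SLF4J/logback
    }
}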

Isn't 57.593 seconds.milliseconds?
And cpm_usd represents an estimate of what 1,000 requests similar to this request would cost in US dollars.

Related

413 Request is larger than 20 MB or headers are too large. websolr

I am getting an error from Solr on a GET request: "Request is larger than 20 MB or headers are too large. Please reduce the size of your request and try again, or contact support@websolr.com with a sample of your request for further assistance."
Sample request below:
(curl request removed for security reasons)
Response:
HTTP/2 413
date: Thu, 06 May 2021 16:25:31 GMT
content-type: application/json
content-length: 218
{"code":413,"message":"Request is larger than 20 MB or headers are too large. Please reduce the size of your request and try again, or contact support#websolr.com with a sample of your request for further assistance."}
I am not sure how the request become more than 20MB. Also send nothing on header.
Websolr support here. When you pass those query parameters in the curl command, they are part of the Request-Line header. There are over 5900 characters in that line, which is what's triggering the HTTP 413.
As a workaround, just use a POST:
curl -i "removing data due to security reason"
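To illustrate the same workaround outside of curl, here is a minimal sketch in Java (the index URL, core name, and query string are hypothetical): Solr accepts the query parameters form-encoded in a POST body, which keeps them out of the request line.

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SolrPostQuery {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint; substitute your own websolr index URL.
        URL url = new URL("https://index.websolr.com/solr/my-core/select");
        String body = "q=" + URLEncoder.encode("a very long query ...", StandardCharsets.UTF_8.name())
                    + "&wt=json";
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        // Form encoding moves the parameters into the body instead of the request line.
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("HTTP " + conn.getResponseCode());
    }
}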

WebPageTest complaining about not caching static resources even though I have caching enabled

I am testing my website on webpagetest.org. It gives me a "D" grade for caching of static content, and then goes on to give this list:
Leverage browser caching of static assets: 63/100
WARNING - (2.0 hours) - http://stats.g.doubleclick.net/dc.js
WARNING - (5.5 days) - http://www.bookmine.net/css/images/ui-bg_highlight-soft_100_eeeeee_1x100.png
WARNING - (5.5 days) - http://www.bookmine.net/favicon.ico
WARNING - (5.5 days) - http://www.bookmine.net/js/index.min.js
WARNING - (5.5 days) - http://www.bookmine.net/js/jquery-ui-1.8.13.custom.min.js
WARNING - (5.5 days) - http://www.bookmine.net/css/index.css
WARNING - (5.5 days) - http://www.bookmine.net/js/jquery.form.min.js
WARNING - (5.5 days) - http://www.bookmine.net/css/jquery-ui-1.8.13.custom.css
The funny thing is that it does recognize that I have caching enabled (set to 5.5 days, as reported above), so what is it complaining about? I have also verified that I have default_expiration: "5d 12h" set in my app.yaml, and from this link:
default_expiration
Optional. The length of time a static file served by a static file
handler ought to be cached by web proxies and browsers, if the handler
does not specify its own expiration. The value is a string of numbers
and units, separated by spaces, where units can be d for days, h
for hours, m for minutes, and s for seconds. For example, "4d 5h"
sets cache expiration to 4 days and 5 hours after the file is first
requested. If omitted, the production server sets the expiration to 10
minutes.
For example:
application: myapp
version: alpha-001
runtime: python27
api_version: 1
threadsafe: true

default_expiration: "4d 5h"

handlers:
Important: The expiration time will be sent in the Cache-Control and Expires HTTP response headers, and therefore, the files are likely
to be cached by the user's browser, as well as intermediate caching
proxy servers such as Internet Service Providers. Once a file is
transmitted with a given expiration time, there is generally no way to
clear it out of intermediate caches, even if the user clears their own
browser cache. Re-deploying a new version of the app will not reset
any caches. Therefore, if you ever plan to modify a static file, it
should have a short (less than one hour) expiration time. In most
cases, the default 10-minute expiration time is appropriate.
I even verified the response my website returns in Fiddler:
HTTP/200 responses are cacheable by default, unless Expires, Pragma,
or Cache-Control headers are present and forbid caching. HTTP/1.0
Expires Header is present: Sat, 26 Sep 2015 08:14:56 GMT
HTTP/1.1 Cache-Control Header is present: public, max-age=475200
public: This response MAY be cached by any cache. max-age: This
resource will expire in 132 hours. [475200 sec]
HTTP/1.1 ETAG Header is present: "74YGeg"
So why am I getting a D?
Adding some useful links:
- http://www.learningtechnicalstuff.com/2011/01/static-resources-and-cache-busting-on.html
- http://www.codeproject.com/Articles/203288/Automatic-JS-CSS-versioning-to-update-browser-cach
- https://developers.google.com/web/fundamentals/performance/optimizing-content-efficiency/http-caching#invalidating-and-updating-cached-responses
- https://developers.google.com/speed/docs/insights/LeverageBrowserCaching
- https://stackoverflow.com/a/7671705/147530
- http://www.particletree.com/notebook/automatically-version-your-css-and-javascript-files/
WebPagetest gives a warning if the cache expiration is set to less than 30 days. You can view that detail by clicking on the "D" grade in your test results and viewing the glossary entry for "Cache Static".
If you need to modify a cached static JavaScript file, you can add a version number to the file path (e.g. /js/index.v2.min.js) or in a querystring (e.g. /js/index.min.js?v=2).

Google Cloud Storage (gcs) Error 200 on non-final Chunk

I'm running into the following error when running an export-to-CSV job on App Engine using the new Google Cloud Storage client library (appengine-gcs-client). I have about ~30MB of data that I need to export on a nightly basis. Occasionally, I need to rebuild the entire table. Today, I had to rebuild everything (~800MB total) and only ~300MB of it actually made it across. I checked the logs and found this exception:
/task/bigquery/ExportVisitListByDayTask
java.lang.RuntimeException: Unexpected response code 200 on non-final chunk: Request: PUT https://storage.googleapis.com/moose-sku-data/visit_day_1372392000000_1372898225040.csv?upload_id=AEnB2UrQ1cw0-Jbt7Kr-S4FD2fA3LkpYoUWrD3ZBkKdTjMq3ICGP4ajvDlo9V-PaKmdTym-zOKVrtVVTrFWp9np4Z7jrFbM-gQ
x-goog-api-version: 2
Content-Range: bytes 4718592-4980735/*
262144 bytes of content
Response: 200 with 0 bytes of content
ETag: "f87dbbaf3f7ac56c8b96088e4c1747f6"
x-goog-generation: 1372898591905000
x-goog-metageneration: 1
x-goog-hash: crc32c=72jksw==
x-goog-hash: md5=+H27rz96xWyLlgiOTBdH9g==
Vary: Origin
Date: Thu, 04 Jul 2013 00:43:17 GMT
Server: HTTP Upload Server Built on Jun 28 2013 13:27:54 (1372451274)
Content-Length: 0
Content-Type: text/html; charset=UTF-8
X-Google-Cache-Control: remote-fetch
Via: HTTP/1.1 GWA
at com.google.appengine.tools.cloudstorage.oauth.OauthRawGcsService.put(OauthRawGcsService.java:254)
at com.google.appengine.tools.cloudstorage.oauth.OauthRawGcsService.continueObjectCreation(OauthRawGcsService.java:206)
at com.google.appengine.tools.cloudstorage.GcsOutputChannelImpl$2.run(GcsOutputChannelImpl.java:147)
at com.google.appengine.tools.cloudstorage.GcsOutputChannelImpl$2.run(GcsOutputChannelImpl.java:144)
at com.google.appengine.tools.cloudstorage.RetryHelper.doRetry(RetryHelper.java:78)
at com.google.appengine.tools.cloudstorage.RetryHelper.runWithRetries(RetryHelper.java:123)
at com.google.appengine.tools.cloudstorage.GcsOutputChannelImpl.writeOut(GcsOutputChannelImpl.java:144)
at com.google.appengine.tools.cloudstorage.GcsOutputChannelImpl.waitForOutstandingWrites(GcsOutputChannelImpl.java:186)
at com.moose.task.bigquery.ExportVisitListByDayTask.doPost(ExportVisitListByDayTask.java:196)
The task is pretty straightforward, but I'm wondering if there is something wrong with the way I'm using waitForOutstandingWrites() or with the way I'm serializing my outputChannel for the next task run. One thing to note is that each task is broken into daily groups, each outputting its own individual file. The day tasks are scheduled to run 10 minutes apart, concurrently, to push out all 60 days.
In the task, I create a PrintWriter like so:
OutputStream outputStream = Channels.newOutputStream( outputChannel );
PrintWriter printWriter = new PrintWriter( outputStream );
and then write data out to it 50 lines at a time, calling waitForOutstandingWrites() to push everything over to GCS. When I'm coming up on the open-file limit (~22 seconds) I put the outputChannel into Memcache and then reschedule the task with the data iterator's cursor.
printWriter.print( outputString.toString() );  // queue the CSV rows
printWriter.flush();                           // push buffered characters into the channel
outputChannel.waitForOutstandingWrites();      // block until GCS acknowledges the bytes
This seems to work most of the time, but I'm getting these errors, which create corrupted and incomplete files on GCS. Is there anything obvious I'm doing wrong in these calls? Can I have only one channel open to GCS at a time per application? Is there some other issue going on?
Appreciate any tips you could lend!
Thanks!
Evan
A 200 response indicates that the file has been finalized. If this occurs on an API call other than close, the library throws an error, as this is not expected.
This is likely occurring due to the way you are rescheduling the task. When you reschedule the task, the task queue may duplicate its delivery for some reason (this can happen), and if there are no checks to prevent this, two instances could end up writing to the same file at the same time. When one closes the file, the other sees an error. The net result is a corrupt file.
The simple solution is not to reschedule the task. There is no time limit on how long a file can be held open with the GCS client (unlike the deprecated Files API); see the sketch below.
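A rough sketch of that suggestion, assuming the whole daily export fits in one task (the bucket, object name, and row source are hypothetical):

import com.google.appengine.tools.cloudstorage.GcsFileOptions;
import com.google.appengine.tools.cloudstorage.GcsFilename;
import com.google.appengine.tools.cloudstorage.GcsOutputChannel;
import com.google.appengine.tools.cloudstorage.GcsService;
import com.google.appengine.tools.cloudstorage.GcsServiceFactory;
import com.google.appengine.tools.cloudstorage.RetryParams;

import java.io.IOException;
import java.io.PrintWriter;
import java.nio.channels.Channels;
import java.util.List;

public class CsvExport {
    public void export(List<String> rows) throws IOException {
        GcsService gcsService = GcsServiceFactory.createGcsService(RetryParams.getDefaultInstance());
        // Hypothetical bucket and object names.
        GcsFilename file = new GcsFilename("moose-sku-data", "visit_day_example.csv");
        GcsOutputChannel channel = gcsService.createOrReplace(file, GcsFileOptions.getDefaultInstance());
        PrintWriter writer = new PrintWriter(Channels.newOutputStream(channel));
        for (String row : rows) {
            writer.println(row);
        }
        writer.flush();
        // Only one writer ever touches the channel, and close() finalizes the object exactly once.
        channel.close();
    }
}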

How to generate GET request to webServiceURL/version/devices/deviceLibraryIdentifier/registrations/passTypeIdentifier?passesUpdatedSince=tag

After creating a pass, I can add it to a device and register the device by saving its data to the database. Next, how can I generate the GET request to webServiceURL/version/devices/deviceLibraryIdentifier/registrations/passTypeIdentifier?passesUpdatedSince=tag ? I see this in the console:
Apr 4 10:08:26 CamMobs-iPod4 passd[12098] <Warning>: Generating GET request with URL <http://192.168.1.202:8888/passesWebserver/v1/devices/02d6566cc59dc34e3abd116eed498898/registrations/pass.cam-mob.passbookpasstest>
Apr 4 10:08:26 CamMobs-iPod4 passd[12098] <Warning>: Get serial #s task (for device 02d6566cc59dc34e3abd116eed498898, pass type pass.cam-mob.passbookpasstest, last updated (null); with web service url http://192.168.1.202:8888/passesWebserver/) got response with code 200
Apr 4 10:08:26 CamMobs-iPod4 passd[12098] <Warning>: Get serial #s task (for device 02d6566cc59dc34e3abd116eed498898, pass type pass.cam-mob.passbookpasstest, last updated (null); with web service url http://192.168.1.202:8888/passesWebserver/) encountered error: Server response was malformed (Missing response data)

Timeout on bigquery v2 from GAE

I am querying BigQuery from my app on Google App Engine and sometimes receive a weird result from BQ (discovery#restDescription). It took me some time to understand that the problem occurs only when the amount of data I am querying is large, which somehow makes my query time out after 10 seconds.
I found a good description of my problem here:
Bad response to a BigQuery query
After re-reading the GAE docs, I found out that HTTP requests should be handled within a few seconds. So I guess, and this is only a guess, that BigQuery might be limiting itself in the same way, and therefore has to respond to my queries "within seconds".
If this is the case, first of all, I would be a bit surprised, because my BigQuery requests are certainly going to take more than a few seconds... But anyway, I ran a test by forcing a timeout of 1 second on my query and then fetching the query result by polling the getQueryResults API call.
The outcome is very interesting. BigQuery returns something within roughly 3 seconds (not 1 as I asked), and then I get my results later, within 26 seconds, by polling. This looks like a way of circumventing the 10-second timeout issue.
But I can hardly see myself using this trick in production.
Has anyone encountered the same problem with BigQuery? What am I supposed to do when the query lasts more than a "few seconds"?
Here is the code I use to query:
query_config = {
    'timeoutMs': 1000,
    "defaultDataset": {
        "datasetId": self.dataset,
        "projectId": self.project_id
    },
}
query_config.update(params)
result_json = (self.service.jobs()
               .query(projectId=project, body=query_config)
               .execute())
And to retrieve the results, I poll with this:
self.service.jobs().getQueryResults(projectId=project,jobId=jobId).execute()
And these are the logs of what happens, as seen from the App Engine side:
2012-12-03 12:31:19.835 /api/xxxxx/ 200 4278ms 0kb Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1
xx.xx.xx.xx - - [03/Dec/2012:02:31:19 -0800] "GET /api/xxxxx/ HTTP/1.1" 200 243 ....... ms=4278 cpu_ms=346 cpm_usd=0.000426 instance=00c61b117c1169753678c6d5dac736b223809b
I 2012-12-03 12:31:16.060
URL being requested: https://www.googleapis.com/discovery/v1/apis/bigquery/v2/rest?userIp=xx.xx.xx.xx
I 2012-12-03 12:31:16.061
Attempting refresh to obtain initial access_token
I 2012-12-03 12:31:16.252
URL being requested: https://www.googleapis.com/bigquery/v2/projects/xxxxxxxxxxxx/queries?alt=json
I 2012-12-03 12:31:19.426
URL being requested: https://www.googleapis.com/bigquery/v2/projects/xxxxxxxx/jobs/job_a1e74a6769f74cb997d998623b1b6b2e?alt=json
I 2012-12-03 12:31:19.500
This is what my query API call returns. In the job metadata, the status is 'RUNNING':
{u'kind': u'bigquery#queryResponse', u'jobComplete': False, u'jobReference': {u'projectId': u'xxxxxxxxxxx', u'jobId': u'job_a1e74a6769f74cb997d998623b1b6b2e'}}
With the jobId, I am able to retrieve the results 26 seconds later, when they are ready.
There must be another way! What am I doing wrong?