Timeout on bigquery v2 from GAE - google-app-engine

I am doing a query to BigQuery from my app in Google App Engine, and sometimes receive a weird result from BQ (discovery#restDescription). It took me some time to understand that the problem occurs only when the amount of data I am querying is large, and thus my query somehow times out within 10 seconds.
I found a good description of my problem here:
Bad response to a BigQuery query
After reading the GAE docs again, I found out that HTTP requests should be handled within a few seconds. So I guess, and this is only a guess, that BigQuery might also be limiting itself in the same way, and therefore has to respond to my queries "within seconds".
If this is the case, first of all, I will be a bit surprised, because my BigQuery requests are certainly going to take more than a few seconds... But anyway, I did a test by forcing a timeout of 1 second on my query, and then fetching the query result by polling the getQueryResults API call.
The outcome is very interesting. BigQuery returns something within roughly 3 seconds (not 1 as I asked) and then I get my results later on, within 26 seconds, by polling. This looks like a way of circumventing the 10-second timeout issue.
But I can hardly see myself doing this trick in production.
Has someone encountered the same problem with BigQuery? What am I supposed to do when the query lasts more than "a few seconds"?
Here is the code I use to query:
query_config = {
    'timeoutMs': 1000,
    "defaultDataset": {
        "datasetId": self.dataset,
        "projectId": self.project_id
    },
}
query_config.update(params)
result_json = (self.service.jobs()
               .query(projectId=project,
                      body=query_config)
               .execute())
And to retrieve the results, I poll with this:
self.service.jobs().getQueryResults(projectId=project,jobId=jobId).execute()
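For illustration, the timeout-then-poll pattern could be wrapped up like this. A minimal sketch: poll_query_results and its interval/deadline values are made-up names, while jobs().query and jobs().getQueryResults are the same v2 calls used above:

import time

def poll_query_results(service, project, job_id, interval_secs=2, deadline_secs=120):
    # Keep calling getQueryResults until BigQuery reports jobComplete,
    # then hand back the full query response.
    deadline = time.time() + deadline_secs
    while time.time() < deadline:
        result = (service.jobs()
                  .getQueryResults(projectId=project, jobId=job_id)
                  .execute())
        if result.get('jobComplete'):
            return result
        time.sleep(interval_secs)
    raise RuntimeError('query job %s did not finish in time' % job_id)

On App Engine this polling would normally live in a task queue task rather than in the user-facing request, since the sleeps count against the request deadline.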
And these are the logs of the calls to BigQuery:
2012-12-03 12:31:19.835 /api/xxxxx/ 200 4278ms 0kb Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1
xx.xx.xx.xx - - [03/Dec/2012:02:31:19 -0800] "GET /api/xxxxx/ HTTP/1.1" 200 243 ....... ms=4278 cpu_ms=346 cpm_usd=0.000426 instance=00c61b117c1169753678c6d5dac736b223809b
I 2012-12-03 12:31:16.060
URL being requested: https://www.googleapis.com/discovery/v1/apis/bigquery/v2/rest?userIp=xx.xx.xx.xx
I 2012-12-03 12:31:16.061
Attempting refresh to obtain initial access_token
I 2012-12-03 12:31:16.252
URL being requested: https://www.googleapis.com/bigquery/v2/projects/xxxxxxxxxxxx/queries?alt=json
I 2012-12-03 12:31:19.426
URL being requested: https://www.googleapis.com/bigquery/v2/projects/xxxxxxxx/jobs/job_a1e74a6769f74cb997d998623b1b6b2e?alt=json
I 2012-12-03 12:31:19.500
This is what my query API call returns. In the job metadata, the status is 'RUNNING':
{u'kind': u'bigquery#queryResponse', u'jobComplete': False, u'jobReference': {u'projectId': u'xxxxxxxxxxx', u'jobId': u'job_a1e74a6769f74cb997d998623b1b6b2e'}}
With the jobId I am able to retrieve the results 26 seconds later, when they are ready.
There must be another way! What am I doing wrong?

Related

Thunderbird Lightning caldav sync doesn't show any data/events

When I try to synchronize my CalDAV server implementation with Thunderbird 45.4.0 and Lightning 4.7.4 (one particular calendar collection), it doesn't show any data or events in the calendar, even though the last call of the sequence provided the data.
In the Thunderbird error log I can see one error:
Timestamp: 07.11.16, 14:21:12
Error: [calCachedCalendar] replay action failed: null,
uri=http://127.0.0.1:8003/sap/sports/webdav/appsvc/webdav/services/
server.xsjs/cal/_D043133/, result=2147500037, op=[xpconnect wrapped
calIOperation]
Source file:
file:///Users/d043133/Library/Thunderbird/Profiles/hfbvuk9f.default/
extensions/%7Be2fda1a4-762b-4020-b5ad-a41df1933103%7D/calendar-
js/calCachedCalendar.js
Line: 327
The call sequence is as follows (detailed content via gist links):
Propfind Request - Response
Options Request - Response
Propfind Request - Response
Report Request - Response - Response Raw
The synchronization with other clients like the macOS calendar and the iOS calendar works in principle and shows the data. Does anyone have a clue what is going wrong here?
Not sure whether that is the cause, but I can see two incorrect things:
a) Your <href/> property has trailing spaces:
<d:href>/sap/sports/webdav/appsvc/webdav/services/server.xsjs/cal/_D043133/EVENT%3A070768ba5dd78ff15458f1985cdaabb1.ics
</d:href>
b) Your ORGANIZER property is not a valid URI:
ORGANIZER:_D043133
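A syntactically valid value needs a URI scheme; a hedged example (the mailto address is made up):
ORGANIZER:mailto:_D043133@example.com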
I was able to find the cause of the above issue by debugging Thunderbird as proposed by Philipp. The REPORT response has HTTP status code 200, but as it is a multistatus response, Thunderbird/Lightning expects status code 207. ;-)
Thanks for the hints!
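To illustrate the fix on the server side: any CalDAV REPORT reply whose body is a multistatus document must be sent with status 207, not 200. A minimal sketch in Python (a hypothetical WSGI handler standing in for the original server.xsjs service; the response body is a made-up minimal multistatus):

MULTISTATUS_BODY = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<d:multistatus xmlns:d="DAV:">'
    '<d:response><d:href>/cal/event1.ics</d:href>'
    '<d:status>HTTP/1.1 200 OK</d:status></d:response>'
    '</d:multistatus>'
).encode('utf-8')

def caldav_report(environ, start_response):
    # Lightning only treats the reply as multistatus when the status
    # line says 207 Multi-Status; a plain 200 OK breaks the sync.
    start_response('207 Multi-Status',
                   [('Content-Type', 'application/xml; charset=utf-8'),
                    ('Content-Length', str(len(MULTISTATUS_BODY)))])
    return [MULTISTATUS_BODY]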

Solr Query Max Condition

I am using Solr 4.3.0 for my web site search. I want to do something using Solr, but when I query, I get an error. In my situation I have 40,000 products, and I want to exclude 1,500 products with a query. This is my query:
-brand-slug:reebok OR -brand-slug:nike AND
-skuCode:(01-117363 01-117364 01-117552 01-119131 01-119166 01-1J622 01-1J793 01-1M4434 01-1M9691 01-1Q279 01-1T405 01-1T865 01-2109830 01-2111116 01-2111186 01-21J625 01-21J794 01-21V019 01-2M9691 01-2M9696 01-33J793 01-519075 01-M4431 01-M7652 01-M9160 01-M9165 01-M9166 01-M9613 01-M9622 01-M9697 01200CY0001N00 01211SU0141M00 01212KU0009N00 01212KU0010N00 01212KU0025N00 01212KU0027N00 01212KU0038N00 01212KW0019N00 01212KW0020N00
....thousands of skuCodes)
If I put 670 skuCodes in there it works fine, but with 1500 skuCodes I get an error like
Solr HTTP error: OK (400)
How could I solve this problem? Thanks
What a night :) I solved my problem. Actually there were two problems in my system. The first problem was in my Tomcat server: I increased its request size by changing maxHttpHeaderSize="65536". (You could change your web server's buffer size; I changed my nginx conf.) The other problem was in the Solr config: I got an error like 'too many boolean clauses'. If you get this error, you can change maxBooleanClauses in solrconfig.xml. After restarting my Tomcat server everything was OK.
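For reference, the two changes could look roughly like this. The maxHttpHeaderSize value is the one from the answer; the connector attributes shown and the maxBooleanClauses value of 4096 are illustrative assumptions (Solr's shipped default is 1024):

<!-- Tomcat conf/server.xml: allow larger request lines/headers -->
<Connector port="8080" protocol="HTTP/1.1"
           maxHttpHeaderSize="65536"/>

<!-- solrconfig.xml: allow more clauses in one boolean query -->
<maxBooleanClauses>4096</maxBooleanClauses>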

Google Cloud Storage (gcs) Error 200 on non-final Chunk

I'm running into the following error when running an export-to-CSV job on App Engine using the new Google Cloud Storage client library (appengine-gcs-client). I have about ~30 MB of data I need to export on a nightly basis. Occasionally, I need to rebuild the entire table. Today, I had to rebuild everything (~800 MB total) and only actually pushed across ~300 MB of it. I checked the logs and found this exception:
/task/bigquery/ExportVisitListByDayTask
java.lang.RuntimeException: Unexpected response code 200 on non-final chunk: Request: PUT https://storage.googleapis.com/moose-sku-data/visit_day_1372392000000_1372898225040.csv?upload_id=AEnB2UrQ1cw0-Jbt7Kr-S4FD2fA3LkpYoUWrD3ZBkKdTjMq3ICGP4ajvDlo9V-PaKmdTym-zOKVrtVVTrFWp9np4Z7jrFbM-gQ
x-goog-api-version: 2
Content-Range: bytes 4718592-4980735/*
262144 bytes of content
Response: 200 with 0 bytes of content
ETag: "f87dbbaf3f7ac56c8b96088e4c1747f6"
x-goog-generation: 1372898591905000
x-goog-metageneration: 1
x-goog-hash: crc32c=72jksw==
x-goog-hash: md5=+H27rz96xWyLlgiOTBdH9g==
Vary: Origin
Date: Thu, 04 Jul 2013 00:43:17 GMT
Server: HTTP Upload Server Built on Jun 28 2013 13:27:54 (1372451274)
Content-Length: 0
Content-Type: text/html; charset=UTF-8
X-Google-Cache-Control: remote-fetch
Via: HTTP/1.1 GWA
at com.google.appengine.tools.cloudstorage.oauth.OauthRawGcsService.put(OauthRawGcsService.java:254)
at com.google.appengine.tools.cloudstorage.oauth.OauthRawGcsService.continueObjectCreation(OauthRawGcsService.java:206)
at com.google.appengine.tools.cloudstorage.GcsOutputChannelImpl$2.run(GcsOutputChannelImpl.java:147)
at com.google.appengine.tools.cloudstorage.GcsOutputChannelImpl$2.run(GcsOutputChannelImpl.java:144)
at com.google.appengine.tools.cloudstorage.RetryHelper.doRetry(RetryHelper.java:78)
at com.google.appengine.tools.cloudstorage.RetryHelper.runWithRetries(RetryHelper.java:123)
at com.google.appengine.tools.cloudstorage.GcsOutputChannelImpl.writeOut(GcsOutputChannelImpl.java:144)
at com.google.appengine.tools.cloudstorage.GcsOutputChannelImpl.waitForOutstandingWrites(GcsOutputChannelImpl.java:186)
at com.moose.task.bigquery.ExportVisitListByDayTask.doPost(ExportVisitListByDayTask.java:196)
The task is pretty straightforward, but I'm wondering if there is something wrong with the way I'm using waitForOutstandingWrites() or with the way I'm serializing my outputChannel for the next task run. One thing to note is that each task is broken into daily groups, each outputting its own individual file. The day tasks are scheduled to run 10 minutes apart, concurrently, to push out all 60 days.
In the task, I create a PrintWriter like so:
OutputStream outputStream = Channels.newOutputStream( outputChannel );
PrintWriter printWriter = new PrintWriter( outputStream );
and then write data out to it 50 lines at a time and call the waitForOutstandingWrites() function to push everything over to GCS. When I'm coming up to the open-file limit (~22 seconds) I put the outputChannel into Memcache and then reschedule the task with the data iterator's cursor.
printWriter.print( outputString.toString() );
printWriter.flush();
outputChannel.waitForOutstandingWrites();
This seems to work most of the time, but I'm getting these errors, which create corrupted and incomplete files on GCS. Is there anything obvious I'm doing wrong in these calls? Can I only have one channel open to GCS at a time per application? Is there some other issue going on?
Appreciate any tips you could lend!
Thanks!
Evan
A 200 response indicates that the file has been finalized. If this occurs on an API call other than close, the library throws an error, as this is not expected.
This is likely occurring due to the way you are rescheduling the task. When you reschedule it, the task queue may be duplicating the delivery of the task for some reason (this can happen), and if there are no checks to prevent this, two instances could be attempting to write to the same file at the same time. When one closes the file, the other sees an error. The net result is a corrupt file.
The simple solution is not to reschedule the task. There is no time limit on how long a file can be held open with the GCS client (unlike the deprecated Files API).
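If you do need to reschedule, a cheap guard against duplicate task delivery is an atomic memcache add on a per-task marker. A rough Python sketch (the key naming and the work_fn callback are made up; the Java equivalent would be MemcacheService.put with SetPolicy.ADD_ONLY_IF_NOT_PRESENT):

from google.appengine.api import memcache

def run_exactly_once(task_marker, work_fn):
    # memcache.add is atomic and returns False if the key already
    # exists, so only one of two duplicate deliveries runs work_fn.
    if not memcache.add('task-lock:%s' % task_marker, 'running', time=3600):
        return  # duplicate delivery, skip
    work_fn()

Note that memcache entries can be evicted, so this is best-effort deduplication rather than a hard guarantee.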

Google App Engine: Cron handler is called by cron, but no code is run

I have the following problem. I have defined a cron job in Google App Engine, but my get method is not called (or to be precise, it is called every other time: if I run it manually, it doesn't do anything the first time, but the second time it works flawlessly). This is the output from logging for the call made by cron:
2011-07-04 11:39:08.500 /suggestions/ 200 489ms 70cpu_ms 0kb AppEngine-Google; (+http://code.google.com/appengine)
0.1.0.1 - - [04/Jul/2011:11:39:08 -0700] "GET /suggestions/ HTTP/1.1" 200 0 - "AppEngine-Google; (+http://code.google.com/appengine)" "bazinga-match.appspot.com" ms=489 cpu_ms=70 api_cpu_ms=0 cpm_usd=0.001975 queue_name=__cron task_name=a449e27ff383de24ff8fc5d5f05f2aae
As you can see, it makes a GET request to /suggestions/, but nothing happens, including my log messages (they are printed when I run it a second time manually). Do you have any idea why this might be happening?
My handler:
class SuggestionsHandler(RequestHandler):
    def get(self):
        logging.debug('Creating suggestions')
        for key in db.Query(User, keys_only=True).order('last_suggestion'):
            make_suggestion(key)
        logging.debug('Done creating suggestions')
        print
        print('Done creating suggestions')
This is my cron.yaml:
cron:
- description: daily suggestion creation
  url: /suggestions/
  schedule: every 6 hours
and proper section of my app.yaml:
- url: /suggestions/
  script: cron.py
  login: admin
You're missing this declaration from the bottom of your handler file, after main:
if __name__ == '__main__':
main()
The first time a handler script is run, App Engine simply imports it, and this snippet runs your main in that situation. On subsequent requests, App Engine runs the main function, if you defined one.
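For completeness, a minimal cron.py for the CGI-style Python runtime would then look something like this (a sketch; the handler body is trimmed to the first log line):

import logging

from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app

class SuggestionsHandler(webapp.RequestHandler):
    def get(self):
        logging.debug('Creating suggestions')
        # ... suggestion-building loop from the question goes here ...

application = webapp.WSGIApplication([('/suggestions/', SuggestionsHandler)])

def main():
    run_wsgi_app(application)

# Without this guard, the module import succeeds but nothing handles
# the request, matching the "works every other time" symptom above.
if __name__ == '__main__':
    main()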

GAE/J request log format breakdown

Here is a sample of a GAE Console log record:
http://i.stack.imgur.com/M2iJX.png for readable high res version.
I would like to provide a breakdown of the fields displayed in both the collapsed (summary) view and the expanded (detail) view. I will fill in the fields whose meaning I know, and would appreciate assistance with deciphering the rest. This post will be updated once new information is available.
Thank you,
Maxim.
Open issues:
How do I read the timestamp? [...-prod/0-0-39.346862139187007139]
Why does the summary say the request took 343ms while the details say 344ms?
If the request spent 123ms on CPU and 30ms on API calls, where did the rest of the time go? Why is the total request time 343/344ms?
Summary
12-14 : Date of the request. 12 is the month (December), 14 is the day of the month (Tuesday).
05:21AM : Time of the request, PST offset. 05 is the hour. 21 is the minute.
57.593 : Time of request, PST offset. 57 is the second. 593 is the millisecond.
/match/... : HTTP request path
200 : HTTP return code. (200 = OK)
343ms : The total time (in milliseconds) it took to calculate and return the response to the user
123cpu_ms : The time (in milliseconds) the request spent on CPU calculation
30api_cpu_ms : The time (in milliseconds) the request spent on API calls (Datastore get and co...)
1kb : The size (in kilobytes) of the response that was sent to the user
Mozilla/5.0 (X11; U; Linux x86_64; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.44 Safari/534.7,gzip(gfe) : User Agent note that gzip(gfe) is added by AppEngine front end.
Details
IP yellow masked out : The IP address of the client initiating the request
HTTP Referrer : Note that it's empty on this request because it's a direct hit
[14/Dec/2010:05:21:57 -0800] : Date, including timestamp offset specification.
"GET /match/... HTTP/1.1" : The HTTP GET URI.
200 : HTTP return code. (200 = OK)
1036 : The size (in bytes) of the response that was sent to the user
Mozilla/5.0 (X11; U; Linux x86_64; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.44 Safari/534.7,gzip(gfe) : User Agent note that gzip(gfe) is added by AppEngine front end.
ms=344 : The total time (in milliseconds) it took to calculate and return the response to the user
cpu_ms=123 : The time (in milliseconds) the request spent on CPU calculation
api_cpu_ms=30 : The time (in milliseconds) the request spent on API calls (Datastore get and co...)
cpm_usd=0.003648 : The amount (in US $) that 1000 requests such as this one would cost. ref
log record
12-14 : Date of this specific application-emitted log entry. 12 is the month (December), 14 is the day of the month (Tuesday).
05:21AM : Time of this specific application-emitted log entry, PST offset.
57.833 : Time of this log entry, PST offset. 57 is the second. 833 is the millisecond.
[...-prod/0-0-39.346862139187007139] : The identifier of the current version of the application that emitted this log message. Note: ...-prod is the application name. 0-0-39 is the deployed version name (app.yaml). .346862139187007139 is presumably the time (in what format?) when this version was deployed to the App Engine cloud.
stdout : The channel to which the application emitted this log message. Can be either stdout or stderr.
INFO ....Matcher - ... Id 208 matched. : Application-level output. Can be emitted either via System.out.print or (as in this case) via a logging framework; here, logback.
Isn't 57.593 seconds.milliseconds?
And cpm_usd represents an estimate of what 1,000 requests similar to this request would cost in US dollars.
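As a quick cross-check of these fields, the key=value pairs App Engine appends to the detail line can be parsed out with a short script. A sketch (the sample line is abridged from the one above; the regex is my own):

import re

SAMPLE = ('xx.xx.xx.xx - - [14/Dec/2010:05:21:57 -0800] "GET /match/ HTTP/1.1" '
          '200 1036 ... ms=344 cpu_ms=123 api_cpu_ms=30 cpm_usd=0.003648')

FIELDS = re.compile(r'(ms|cpu_ms|api_cpu_ms|cpm_usd)=([0-9.]+)')

stats = dict((key, float(value)) for key, value in FIELDS.findall(SAMPLE))
print(stats)  # e.g. {'ms': 344.0, 'cpu_ms': 123.0, 'api_cpu_ms': 30.0, 'cpm_usd': 0.003648}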
