I wrote an application running on Google App Engine that generates logs at a very high rate. Now I need to download and process them. The Admin Console says the log is around 300 MB. Ideally I would like to process only small parts of the logs. Could anybody give me some pointers on how to:
Download only log entries in a specific time range (i.e. the timestamps of the log records fall within that range).
OR
If I have about 300 MB of logs, what is the quickest way to download them? I've been running the appcfg.sh command for almost an hour now and it's still going (around 250,000 log records so far). Is there a way to break the logs into small chunks and download them in parallel?
Many thanks,
Anh.
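For reference, log entries whose timestamps fall in a given window can be read from inside the app with the Log Service API; below is a minimal sketch assuming a Python 2.7 runtime (the Java SDK exposes a similar LogService/LogQuery API), with illustrative function names. You would expose something like this behind a handler or remote_api and page through the results in chunks.

    # Sketch: fetch request logs in a time window via the App Engine
    # Log Service API (Python 2.7 runtime). Times are Unix timestamps.
    import calendar
    import datetime

    from google.appengine.api.logservice import logservice

    def logs_between(start_dt, end_dt):
        start = calendar.timegm(start_dt.utctimetuple())
        end = calendar.timegm(end_dt.utctimetuple())
        # include_app_logs=True returns the application log lines as well,
        # not just the request records.
        for request_log in logservice.fetch(start_time=start,
                                            end_time=end,
                                            include_app_logs=True):
            yield request_log

    # Example: dump a one-hour window in Apache combined-log format.
    for log in logs_between(datetime.datetime(2011, 2, 1, 10),
                            datetime.datetime(2011, 2, 1, 11)):
        print log.combined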
I was working on the TensorFlow Object Detection API. I managed to train it locally on my computer and get decent results. However, when I tried to replicate the same setup on GCP, I ran into several errors. Basically, I followed the official TensorFlow documentation for running on Google Cloud.
So this is how the bucket is laid out:
Bucket
weeddetectin-data
Train-packages
This is how I ran the training and evaluation job:
(screenshots: running a multi-worker training job and running an evaluation job on cloud)
I then used the following command to monitor the run on TensorBoard:
tensorboard --logdir=gs://weeddetection --port=8080
I opened the dashboard using the preview feature in the console. But it says no dashboards are active for the current data set.
So, I checked my activity page to see whether the training and evaluation jobs were actually submitted:
(screenshots: the training job and the evaluation job shown in the activity page)
It seems that no event files are being written to your bucket.
The root cause could be that the guide you are following refers to an old version of the TensorFlow models.
Please try changing
--train_dir=gs:...
to
--model_dir=gs://${YOUR_BUCKET_NAME}/model
Then resubmit the job. Once it is running, check the model_dir in the bucket to see whether the files are being written there.
Check out the gcloud ml-engine jobs documentation for additional reading.
Hope it helps!
Trying out the new flexible App Engine runtime, in this case a custom Ruby on Rails runtime based on the Google-provided Ruby runtime.
When firing off gcloud preview app deploy, the whole process takes ~8 minutes, most of which is "updating service". Is this normal? And, more importantly, how can I speed it up?
Regards,
Ward
Yes, that is totally normal. Most of the deployment steps happen away from your computer and are independent of your codebase size, so there's not a lot you can do to speed up the process.
Various steps that are involved in deploying an app on App Engine can be categorized as follows:
Gather info from app.yaml to understand overall deployment
Collect code and use the docker image specified in app.yaml to build a docker image with your code
Provision Compute Instances, networking/firewall rules, install docker related tools on instance, push docker image to instance and start it
Make sure all deployments were successful, start health-checks and if required, transfer/balance out the load.
The part that takes most of the time is the last one, where all the necessary checks are run to make sure the deployment was successful and the app starts receiving traffic. Depending on your code size (uploading code to build the container) and resource requirements (provisioning custom resources), steps 2 and 3 might also take a bit more time.
If you analyze a deployment you will find that about 70% of the time is spent in that last step. It is the part we have the least visibility into, yet it is the essential process that lets App Engine do all the heavy lifting for you.
Deploying to the same version got me from 6 minutes to 3 minutes in subsequent deploys.
Example:
$ gcloud app deploy app.yaml --version=test
Make sure you check what is in the zip it uploads (the deploy output tells you where to find it), and make sure skip_files in your app.yaml covers things like your .git directory (if you have one) and node_modules.
Note that the subsequent deploys should be much faster than 8 mins. It's usually 1 minute or less in my tests with Node.js on App Engine Flex.
As suggested above by #ludo, in the meantime you could use Google App Engine Standard instead of Flex, which takes approximately 30-50 seconds to deploy after the first deployment.
You can test GAE Standard by running this tutorial, which doesn't require a billing account:
https://codelabs.developers.google.com/codelabs/cloud-app-engine-springboot/index.html#0
And agreed, this doesn't address GAE Flex, but it does give you some options to speed things up during development.
Just fire this command from the root directory containing app.yaml: from a shell, go to the directory with app.yaml and run gcloud app deploy.
It will be uploaded within a few seconds.
I have encountered a Google App Engine problem for which I cannot find an answer despite a lot of searching.
I can make a minor tweak to my code - say changing a constant or something, nothing that breaks the app code - and do a deployment using appcfg.py update.
After deployment, nothing works, and I find a "This app has no instances deployed" message in the developer console. Not a single instance is running. Eventually, after about ten minutes, I start to see instances appearing and the app working fine.
I have maximum idle instances set to 5, I've enabled warmup, and I am well within both quota and budget. My log files don't tell me anything of use.
So this one's got me stumped. I'm not anxious to see a ten minute outage every time I update my code. What is happening here, and how can I ensure that my instances start up as they should when I deploy new code?
I'm running several Python (2.7) applications and constantly hit one problem: log search (from the dashboard in the Admin Console) is not reliable. It is fine when I'm searching for recent log entries (they are normally found OK), but after some period (one day, for instance) it's not possible to find the same record with the same search query again; it just returns "no results". The Admin Console shows that I have 1 GB of logs spanning 10-12 days, so the old records should still be there to find; retention/log size limits are not the reason for this.
Specifically, I have a cron request that writes stats to the log every day (that's enough for me), and searching for this request always gives me only the last entry, not one entry per day of the retention period as expected.
Is this expected behaviour (I don't see clear statements about log storage behaviour in the docs, for example), or is there something to tune? For example, would it help to log less per request? Or maybe there is some advanced use of the query language that would help.
Please advise.
This is a known issue that has already been reported on the googleappengine issue tracker.
As an alternative, you can consider reading your application logs programmatically using the Log Service API to ingest them into BigQuery, or to build your own search index.
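As a rough illustration of that approach (this is not code from the codelab or from Mache; the ingest() step and names are placeholders), logs can be pulled incrementally and handed off to whatever store you index them in:

    # Sketch: incrementally pull request logs via the Log Service API and
    # hand them to an ingestion step (e.g. a BigQuery streaming insert or
    # a custom search index). ingest() is a placeholder.
    from google.appengine.api.logservice import logservice

    def ingest(rows):
        # Placeholder: stream the rows into BigQuery or index them yourself.
        pass

    def pull_logs(start_time, end_time, resume_offset=None, batch_size=500):
        rows, last_offset = [], resume_offset
        for req in logservice.fetch(start_time=start_time,
                                    end_time=end_time,
                                    offset=resume_offset,
                                    include_app_logs=True):
            rows.append({'timestamp': req.start_time,
                         'method': req.method,
                         'resource': req.resource,
                         'status': req.status})
            last_offset = req.offset  # opaque marker; lets a later run resume
            if len(rows) >= batch_size:
                ingest(rows)
                rows = []
        if rows:
            ingest(rows)
        return last_offset  # persist this (e.g. in the datastore) for the next run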
Google App Engine Developer Relations delivered a codelab at Google I/O 2012 about ingesting App Engine logs into BigQuery.
And Streak released a tool called Mache and a Chrome extension to automate this use case.
I have a 10 MB CSV file of Geolocation data that I tried to upload to my App Engine datastore yesterday. I followed the instructions in this blog post and used the bulkloader/appcfg tool. The datastore indicated that records were uploaded but it took several hours and used up my entire CPU quota for the day. The process broke down in errors towards the end before I actually exceeded my quota. But needless to say, 10 MB of data shouldn't require this much time and power.
So, is there some other way to get this CSV data into my App Engine datastore (for a Java app)?
I saw a post by Ikai Lan about using a mapper tool he created for this purpose but it looks rather complicated.
Instead, what about uploading the CSV to Google Docs - is there a way to transfer it to the App Engine datastore from there?
I do daily uploads of 100000 records (20 megs) through the bulkloader. Settings I played with:
- bulkloader.yaml config: set to auto generate keys.
- include header row in raw csv file.
- speed parameters are set to max (not sure if reducing them would reduce the CPU consumed)
These settings burn through my 6.5 hrs of free quota in about 4 minutes, but they get the data loaded (maybe it's from the indexes being generated).
appcfg.py upload_data --config_file=bulkloader.yaml --url=http://yourapp.appspot.com/remote_api --filename=data.csv --kind=yourtablename --bandwidth_limit=999999 --rps_limit=100 --batch_size=50 --http_limit=15
(I autogenerate this line with a script and use Autohotkey to send my credentials).
I wrote this gdata connector to pull data out of a Google Docs Spreadsheet and insert it into the datastore, but it uses Bulkloader, so it kind of takes you back to square one of your problem.
http://code.google.com/p/bulkloader-gdata-connector/source/browse/gdata_connector.py
What you could do, however, is take a look at the source to see how I pull data out of Google Docs and create a task (or tasks) that does that, instead of going through the bulkloader.
Alternatively, you could upload your document into the blobstore and similarly create a task that reads the CSV data out of the blobstore and creates entities (I think this would be easier and faster than working with gdata feeds), along the lines of the sketch below.
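A rough Python sketch of that blobstore approach (the question mentions a Java app, but the same pattern applies with the Java blobstore and datastore APIs); the kind, field names and URL are made up for illustration:

    # Sketch: a task handler that reads a CSV out of the blobstore and
    # creates datastore entities in batches. Kind/field names are illustrative.
    import csv

    import webapp2
    from google.appengine.ext import blobstore, db

    class GeoPoint(db.Model):
        label = db.StringProperty()
        latitude = db.FloatProperty()
        longitude = db.FloatProperty()

    class ImportCsvHandler(webapp2.RequestHandler):
        def post(self):
            blob_key = self.request.get('blob_key')
            reader = csv.reader(blobstore.BlobReader(blob_key))
            batch = []
            for row in reader:
                label, lat, lng = row[0], float(row[1]), float(row[2])
                batch.append(GeoPoint(label=label, latitude=lat, longitude=lng))
                if len(batch) >= 100:
                    db.put(batch)  # batched puts are far cheaper than one put per row
                    batch = []
            if batch:
                db.put(batch)

    app = webapp2.WSGIApplication([('/tasks/import_csv', ImportCsvHandler)])

Enqueuing a task pointed at that URL after the blobstore upload completes keeps the import off the user-facing request path and lets it retry if it fails part-way.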