I'm trying to use Google BigQuery to download a large dataset for the GitHub Data Challenge. I have designed my query and am able to run it in the console for Google BigQuery, but I am not allowed to export the data as CSV because it is too large. The recommended help tells me to save it to a table. This requires requires me to enable billing on my account and make a payment as far as I can tell.
Is there a way to save datasets as CSV (or JSON) files for export without payment?
For clarification, I do not need this data on Google's cloud and I only need to be able to download it once. No persistent storage required.
If you can enable the BigQuery API without enabling billing on your application, you can try using the getQueryResult API call. You're best bet is probably to enable billing (you probably won't be charged for the limited usage you need as you will probably stay within the free tier but if you do get charged it should only be a few cents) and save your query as a Google Storage object. If its too large I don't think you'll be able to use the Web UI effectively.
See this exact topic documentation:
https://developers.google.com/bigquery/exporting-data-from-bigquery
Summary: Use the extract operation. You can export CSV, JSON, or Avro. Exporting is free, but you need to have Google Cloud Storage activated to put the resulting files there.
use BQ command line tool
$ bq query
use the --format flag to save results as CSV.
Related
I'm trying to use the Google Monitoring API to retrieve metrics about my cloud usage. I'm using the Google Client Library for Python.
The API advertises the ability to access over 900 Stackdriver Monitoring Metrics. I am interested in accessing some Google App Engine metrics, such as Instance count, total memory, etc. The Google API Metrics page has a list of all the metrics I should be able to access.
I've followed the guides on the Google Client Library page , but my script making the API calls is not printing the metrics, it is just printing the metric descriptions.
How do I use the Google Monitoring API to access the metrics, rather than the descriptions?
My Code:
from oauth2client.service_account import ServiceAccountCredentials
from apiclient.discovery import build
...
response = monitor.projects().metricDescriptors().get(name='projects/{my-project-name}/metricDescriptors/appengine.googleapis.com/system/instance_count').execute()
print(json.dumps(response, sort_keys=True, indent=4))
My Output
I expect to see the actual instance count. How can I achieve this?
For anyone reading this, I figured out the problem. I was assuming the values would come from the 'metric descriptors' class in the api, but that was a poor assumption.
For values, you need to use a 'timeSeries' call. For this call, you need to specify the project you want to monitor, start time, end time, and a filter (the metric you want, such as cpu, memory, etc.)
So, to retrieve the app engine project memory, the above code becomes
request = monitor.projects().timeSeries().list(name='projects/my-appengine-project',
interval_startTime='2016-05-02T15:01:23.045123456Z',
interval_endTime='2016-06-02T15:01:23.045123456Z',
filter='metric.type="appengine.googleapis.com/system/memory/usage"')
response = request.execute()
This example has the start time and end time to cover a month of data.
How I can export information from hawt.io's dashboard to database in realtime? I want to save history of cpu load and ect, and read they from database later. May be this all wrong way and better write, using jolokia, something specific for my task?
hawtio is the visualization of the data, you better extract the data using jolokia yourself to store in the database. Jolokia makes extracting the data easier as you can use REST over HTTP as transport instead of native Java JMX.
On my appspot website, I use a third party API to query a large amount of data. The user then downloads the data in CSV. I know how to generate a csv and download it. The problem is that because the file is huge, I get the DeadlineExceededError.
I have tried tried increasing the fetch deadline to 60 (urlfetch.set_default_fetch_deadline(60)). It doesn't seem reasonable to increase it any further.
What is the appropriate way to tackle this problem on Google App Engine? Is this something where I have to use Task Queue?
Thanks.
DeadlineExceededError means that your incoming request took longer than 60 secs, not your UrlFetch call.
Deploy the code to generate the CSV file into a different module that you setup with basic or manual scaling. The URL to download your CSV will become http://module.domain.com
Requests can run indefinitely on modules with basic or manual scaling.
Alternately, consider creating a file dynamically in Google Cloud Storage (GCS) with your CSV content. At that point, the file resides in GCS and you have the ability to generate a URL from which they can download the file directly. There are also other options for different auth methods.
You can see documentation on doing this at
https://cloud.google.com/appengine/docs/python/googlecloudstorageclient/
and
https://cloud.google.com/appengine/docs/python/googlecloudstorageclient/functions
Important note: do not use the Files API (which was a common way of dynamically create files in blobstore/gcs) as it has been depracated. Use the above referenced Google Cloud Storage Client API instead.
Of course, you can delete the generated files after they've been successfully downloaded and/or you could run a cron job to expire links/files after a certain time period.
Depending on your specific use case, this might be a more effective path.
I have a model called "Category" in my app in GAE.
This model simply contains a name and it's parent category, and this won't be changed frequently after the website go online.
I'd like to know what is a better way to put these model instances in the beginning?
I now only know to execute (category.put()) in a webapp.RequestHandler by issuing a http request. But I suspect there is a proper way to do this.
Thanks!
You can use the remote API to connect to your datastore in a shell and add data as required.
Or, if it's a huge amount, you could think about using the bulk loader - but I suspect that the remote API will be more suitable.
I can't find anything recent on this. Is there any documentation on how to track with Google Analytics without using ga.js? I want a JS implementation on mobile devices but I don't want to load up 9KB of local memory or use server-side GA. I'm primarily interested only in tracking page views and uniques. Has anyone rolled their own GA implementation?
You can track using just a gif file.
To use GA without javascript... do it by generating our own gif file and passing some information back to Google through our server. That is, we generate a gif, assign and track our own cookie, and then gather that information as you move through the site, and use a HTTP request with the appropriate query strings and pass it back to Google, which they then compile and treat as regular old analytics.
more here: http://blogs.walkerart.org/newmedia/2009/11/12/building-walkers-mobile-site-google-analytics-without-javascript-pt2/
Here is a detailed explanation for how its done.
http://www.developria.com/2011/01/hacking-air-for-android-22-goo.html