How to import Stackdriver logs into BigQuery - google-app-engine

Is there a way to load logs from App Engine into BigQuery on Google Cloud Platform?
I'm attempting to use federated querying to load Stackdriver log files stored in Cloud Storage. However, BigQuery cannot load some of the field names written by Stackdriver.
The log files are newline-delimited JSON, with records that look like
{
  "insertId":"j594356785jpk",
  "labels":{
    "appengine.googleapis.com/instance_name":"aef-my-instance-20180204t220251-x59f",
    "compute.googleapis.com/resource_id":"99999999999999999",
    "compute.googleapis.com/resource_name":"c3453465db",
    "compute.googleapis.com/zone":"us-central1-f"
  },
  "logName":"projects/my-project/logs/appengine.googleapis.com%2Fstdout",
  "receiveTimestamp":"2018-02-08T02:59:59.972739505Z",
  "resource":{
    "labels":{
      "module_id":"my-instance",
      "project_id":"my-project",
      "version_id":"20180204t220251"
    },
    "type":"gae_app"
  },
  "textPayload":"{\"json\":\"blob\"}\n",
  "timestamp":"2018-02-08T02:59:54Z"
}
But BigQuery returns an error on this input: query: Invalid field name "compute.googleapis.com/zone". Fields must contain only letters, numbers, and underscores, start with a letter or underscore, and be at most 128 characters long.
Is there any way to ingest this kind of log into BigQuery?
I'm specifically interested in extracting just the textPayload fields.

You can export Stackdriver logs into BigQuery
In the Google Cloud Platform Console, go to Stackdriver Logging >> Exports >> Create Export.
Create a filter resource.type="gae_app", fill in the Sink Name on the right, choose BigQuery as the Sink Service, then select the dataset to export to as the Sink Destination. https://console.cloud.google.com/logs/exports
Reference: https://cloud.google.com/logging/docs/export/configure_export_v2#dest-create
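Once the sink is in place, the exported tables expose textPayload as an ordinary column, so pulling just that field is a simple query. Below is a minimal sketch using the google-cloud-bigquery Python client; the dataset and table names are assumptions (sink tables are typically named after the log with a date suffix), so adjust them to whatever your sink actually creates.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

# Assumed dataset/table names -- replace with the dataset you chose for the
# sink and the table the export actually created.
query = """
    SELECT timestamp, textPayload
    FROM `my-project.my_log_dataset.appengine_googleapis_com_stdout_20180208`
    WHERE textPayload IS NOT NULL
    ORDER BY timestamp
"""

for row in client.query(query).result():
    print(row.timestamp, row.textPayload)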

Related

How to connect Metabase with Google Sheet?

I have a username and password for the Metabase instance our company uses heavily. Every day I have to download CSVs and then export them to Google Sheets to build reports or analyses. Is there any way to connect Metabase to Google Sheets so that the sheets pull CSVs automatically from the Metabase URL?
I've implemented a Google Sheets add-on as a solution for this and open-sourced it.
I've used this code as a Google Sheets add-on at a large organization and it has worked well so far, with hundreds of users a week, running tens of thousands of queries.
If you don't want the hassle of setting it up as a Google Sheets add-on, you can take the script and adapt it as a simple Apps Script.
You could try writing a script with Google Apps Script for this. One idea is to use the UrlFetchApp.fetch() method, which lets scripts communicate with other applications or access other resources on the web by fetching a URL, in your case the Metabase URL.
The following code snippet might give you an idea of what to do next:
function myFunction() {
  var sheet = SpreadsheetApp.getActive().getSheets()[0];
  var data = {
    // the data that you will be using
  };
  var options = {
    'method': 'post',
    'payload': data
  };
  var response = UrlFetchApp.fetch('https://metabase.example.com/api/card/1/query/json', options);
  var responseData = JSON.parse(response.getContentText());
  sheet.appendRow(responseData);
}
What it actually does: it uses the Metabase API to fetch the data you want with a POST request (a POST request is used because, according to the API documentation for Metabase v0.32.2, this endpoint runs the query associated with a card and returns its results as a file in the specified format). The response from the request is then appended to your sheet with .appendRow().
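If you would rather pull the CSVs outside of Apps Script, the same API can be reached from plain Python. This is only a sketch under the assumption that your Metabase instance exposes the usual /api/session and /api/card/:id/query/csv endpoints (which can vary between Metabase versions); the URL, card id, and credentials are placeholders.
import requests

METABASE_URL = "https://metabase.example.com"  # placeholder instance URL

# Assumed login endpoint: POST /api/session returns a session token.
session = requests.post(
    METABASE_URL + "/api/session",
    json={"username": "me@example.com", "password": "secret"},
).json()

headers = {"X-Metabase-Session": session["id"]}

# Assumed export endpoint: POST /api/card/<id>/query/csv returns the card's
# results as CSV, mirroring what the Apps Script snippet above fetches as JSON.
response = requests.post(METABASE_URL + "/api/card/1/query/csv", headers=headers)

with open("card_1.csv", "wb") as f:
    f.write(response.content)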
Furthermore, you can read more about UrlFetchApp and Apps Script here:
Class UrlFetchApp;
Apps Script;
Spreadsheet App.
You can add a Google Sheet as an external table in Google BigQuery, then connect Metabase to Google BigQuery.
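For reference, here is a minimal sketch of that first step with the google-cloud-bigquery Python client. The project, dataset, table, and sheet URL below are placeholders, and the client's credentials need the Google Drive scope so BigQuery can read the sheet.
from google.cloud import bigquery

# The client's credentials must include the Google Drive scope so BigQuery
# can read the spreadsheet on demand.
client = bigquery.Client()

# Placeholder identifiers -- replace with your own project/dataset/sheet.
table = bigquery.Table("my-project.my_dataset.sheet_backed_table")

external_config = bigquery.ExternalConfig("GOOGLE_SHEETS")
external_config.source_uris = [
    "https://docs.google.com/spreadsheets/d/SPREADSHEET_ID"
]
external_config.autodetect = True
table.external_data_configuration = external_config

client.create_table(table)  # BigQuery now queries the sheet directly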

Using firestore data with BigQuery

Is there any automatic/manual way to export data from firestore database to a BigQuery table?
I tried to look around, and it looks like there's no way to export data from Firestore without writing code.
Any news about this one?
Thanks.
The simplest way to import Firestore data into BigQuery without writing code is to use the command line. First export the data with the command below. Note that when importing into BigQuery you can only import specific collections, not all of your documents in one batch.
gcloud beta firestore export gs://[BUCKET_NAME] --collection-ids='[COLLECTION_ID]'
Next, in the bucket you specified above, you will find a folder named after the timestamp of your export. Navigate the directories, locate the file with the “export_metadata” extension, and use its path as the import source. You can then import the data into BigQuery using the command below:
bq --location=[LOCATION] load --source_format=DATASTORE_BACKUP [DATASET].[TABLE] [PATH_TO_SOURCE]
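If you prefer the Python client library over the bq CLI, the same load looks roughly like the sketch below; the bucket path, dataset, and table names are placeholders you would swap for your own export.
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig()
# Firestore/Datastore exports are loaded with the DATASTORE_BACKUP source format.
job_config.source_format = bigquery.SourceFormat.DATASTORE_BACKUP

# Placeholder path to the export_metadata file produced by the gcloud export.
uri = "gs://my-bucket/2019-01-01T00:00:00_12345/all_namespaces/kind_my_collection/all_namespaces_kind_my_collection.export_metadata"

load_job = client.load_table_from_uri(uri, "my_dataset.my_collection", job_config=job_config)
load_job.result()  # wait for the load to finish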
The best way to do this now is to use the official Firebase extension for exporting data in real-time from Firestore to BigQuery: https://github.com/firebase/extensions/tree/master/firestore-bigquery-export
Configuration can be done through the console or the CLI without using any code. The configured extension syncs data from Firestore to BigQuery in essentially realtime.
The extension creates an event listener on the Firestore collection you configure and a cloud function to sync data from Firestore to BigQuery.

Google BigQuery Dataset Export

I'm trying to use Google BigQuery to download a large dataset for the GitHub Data Challenge. I have designed my query and am able to run it in the Google BigQuery console, but I am not allowed to export the data as CSV because it is too large. The recommended help tells me to save it to a table. As far as I can tell, this requires me to enable billing on my account and make a payment.
Is there a way to save datasets as CSV (or JSON) files for export without payment?
For clarification, I do not need this data on Google's cloud and I only need to be able to download it once. No persistent storage required.
If you can enable the BigQuery API without enabling billing on your application, you can try using the getQueryResult API call. Your best bet is probably to enable billing (you likely won't be charged for the limited usage you need since you should stay within the free tier, and even if you are charged it should only be a few cents) and save your query results to a Google Cloud Storage object. If they're too large, I don't think you'll be able to use the Web UI effectively.
See the documentation on this exact topic:
https://developers.google.com/bigquery/exporting-data-from-bigquery
Summary: Use the extract operation. You can export CSV, JSON, or Avro. Exporting is free, but you need to have Google Cloud Storage activated to put the resulting files there.
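For reference, a minimal sketch of that extract operation with the google-cloud-bigquery Python client (the table, bucket, and wildcard path are placeholders):
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.ExtractJobConfig()
job_config.destination_format = bigquery.DestinationFormat.CSV

# Placeholder table and bucket; the wildcard lets BigQuery shard large exports
# across multiple files.
extract_job = client.extract_table(
    "my-project.my_dataset.my_results_table",
    "gs://my-bucket/github-export-*.csv",
    job_config=job_config,
)
extract_job.result()  # wait for the export to finish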
Use the bq command-line tool:
$ bq query
and use the --format flag (for example, --format=csv) to save the results as CSV.
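If you prefer to stay in Python rather than the bq tool, a rough equivalent with the google-cloud-bigquery client (the query below is a placeholder) is:
import csv

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder query -- substitute your GitHub Data Challenge query here.
rows = client.query("SELECT repository_name, actor FROM my_dataset.my_table LIMIT 1000").result()

with open("results.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow([field.name for field in rows.schema])  # header row
    for row in rows:
        writer.writerow(row.values())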

BigQuery failed to import Cloud Datastore backup file via Google Cloud Storage

I was trying to load one of my Datastore tables into BigQuery. When I found there is an "AppEngine Datastore Backup" option in the BigQuery web UI, I was very happy, because all my data is located in one Datastore table. It should be the easiest approach (I thought) to just export the data via the "Datastore Admin" page of Google App Engine and then import it into BigQuery.
The export process went quite smoothly and I happily watched all the mapper tasks finish successfully. After this step, what I got was 255 files in one of my Cloud Storage buckets. The problem arose when I tried to import it in the BigQuery web UI. I entered the URL of one of the 255 files as the source of the data load, and all I got was the following error message:
Errors:
Not Found: URI gs://your_backup_hscript/datastore_backup_queue_status_backup_2013_05_23_QueueStats-1581059100105C09ECD88-output-54-retry-0
I'm sure the above URI is the right one because I can download it with gsutil, and I can import a CSV file located in the same bucket. May I know your suggestion for the next step?
Found the reason now: I should use the file with the ".backup_info" suffix instead of an arbitrary data file.
Cheers!

How do you upload data in bulk to Google App Engine Datastore?

I have about 4000 records that I need to upload to Datastore. They are currently in CSV format. I'd appreciate it if someone would point me to or explain how to upload data in bulk to GAE.
You can use the bulkloader.py tool:
The bulkloader.py tool included with the Python SDK can upload data to your application's datastore. With just a little bit of set-up, you can create new datastore entities from CSV files.
I don't have the perfect solution, but I suggest you have a go with the App Engine Console. App Engine Console is a free plugin that lets you run an interactive Python interpreter in your production environment. It's helpful for one-off data manipulation (such as initial data imports) for several reasons:
It's the good old read-eval-print interpreter. You can do things one at a time instead of having to write the perfect import code all at once and running it in batch.
You have interactive access to your own data model, so you can read/update/delete objects from the data store.
You have interactive access to the URL Fetch API, so you can pull data down piece by piece.
I suggest something like the following:
Get your data model working in your development environment
Split your CSV records into chunks of under 1,000. Publish them somewhere like Amazon S3 or any other URL.
Install App Engine Console in your project and push it up to production
Log in to the console. (Only admins can use the console so you should be safe. You can even configure it to return HTTP 404 to "cloak" from unauthorized users.)
For each chunk of your CSV:
Use URLFetch to pull down a chunk of data
Use the built-in csv module to chop up your data until you have a list of useful data structures (most likely a list of lists or something like that)
Write a for loop, iterating through each data structure in the list:
Create a data object with all correct properties
put() it into the data store
You should find that after one iteration through #5 you can either copy and paste, or else write simple functions to speed up your import task. Also, by fetching and processing your data in steps 5.1 and 5.2, you can take your time until you are sure you have it perfect. (A rough sketch of what steps 5.1 through 5.4 can look like follows the note below.)
(Note, App Engine Console currently works best with Firefox.)
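For illustration, here is a minimal sketch of that fetch/parse/put loop as you might run it from the interactive console. The chunk URL and the Record model are hypothetical stand-ins for your own hosting location and data model.
import csv
import StringIO

from google.appengine.api import urlfetch
from google.appengine.ext import ndb


class Record(ndb.Model):
    # Hypothetical model -- replace with your own kind and properties.
    name = ndb.StringProperty()
    value = ndb.IntegerProperty()


# 5.1: pull down one published chunk of the CSV (placeholder URL).
result = urlfetch.fetch('https://example.com/chunks/records-000.csv')

# 5.2: parse the chunk into rows with the built-in csv module.
rows = csv.reader(StringIO.StringIO(result.content))

# 5.3 / 5.4: create an entity per row with the correct properties and put() it.
for row in rows:
    Record(name=row[0], value=int(row[1])).put()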
By using the remote API and operations on multiple entities. I will show an example using NDB in Python, where our Test.csv contains the following values separated by semicolons:
1;2;3;4
5;6;7;8
First we need to import modules:
import csv
from TestData import TestData
from google.appengine.ext import ndb
from google.appengine.ext.remote_api import remote_api_stub
Then we need to create remote api stub:
remote_api_stub.ConfigureRemoteApi(None, '/_ah/remote_api', auth_func, 'your-app-id.appspot.com')
For more information on using remote api have a look at this answer.
Then comes the main code, which basically does the following things:
Opens the Test.csv file.
Sets the delimiter. We are using semicolon.
Then you have two different options to create a list of entities:
Using the map and reduce functions.
Using list comprehension.
In the end you batch put the whole list of entities.
Main code:
# Open csv file for reading.
with open('Test.csv', 'rb') as file:
    # Set delimiter.
    reader = csv.reader(file, delimiter=';')
    # Reduce 2D list into 1D list and then map every element into entity.
    test_data_list = map(lambda number: TestData(number=int(number)),
                         reduce(lambda list, row: list + row, reader))
    # Or you can use list comprehension.
    test_data_list = [TestData(number=int(number)) for row in reader for number in row]
    # Batch put whole list into HRD.
    ndb.put_multi(test_data_list)
The put_multi operation also takes care of batching an appropriate number of entities into each HTTP POST request.
Have a look at this documentation for more information:
CSV File Reading and Writing
Using the Remote API in a Local Client
Operations on Multiple Keys or Entities
NDB functions
In later versions of the App Engine SDK, you can upload data using appcfg.py (the upload_data command).
See appcfg.py
