I want to upload data from Google Cloud Storage to BigQuery, but I can't find any Java sample code describing how to do this. Would someone please give me a hint as to how to do this?
What I actually want to do is to transfer data from Google App Engine tables to BigQuery (and sync on a daily basis) so that I can do some analysis. I use the Google Cloud Storage Service in Google App Engine to write (new) records to files in Google Cloud Storage, and the only missing part is to append the data to tables in BigQuery (or create a new table for the first write). Admittedly, I can manually upload/append the data using the BigQuery browser tool, but I would like it to be automatic; otherwise I need to do it manually every day.
I don't know of any Java samples for loading tables from Google Cloud Storage into BigQuery. That said, if you follow the instructions for running query jobs here, you can run a load job instead with the following:
Job job = new Job();
JobConfiguration config = new JobConfiguration();
JobConfigurationLoad loadConfig = new JobConfigurationLoad();
config.setLoad(loadConfig);
job.setConfiguration(config);
// Set where you are importing from (i.e. the Google Cloud Storage paths).
List<String> sources = new ArrayList<String>();
sources.add("gs://bucket/csv_to_load.csv");
loadConfig.setSourceUris(sources);
// Describe the resulting table you are importing to:
TableReference tableRef = new TableReference();
tableRef.setDatasetId("myDataset");
tableRef.setTableId("myTable");
tableRef.setProjectId(projectId);
loadConfig.setDestinationTable(tableRef);
List<TableFieldSchema> fields = new ArrayList<TableFieldSchema>();
TableFieldSchema fieldFoo = new TableFieldSchema();
fieldFoo.setName("foo");
fieldFoo.setType("string");
TableFieldSchema fieldBar = new TableFieldSchema();
fieldBar.setName("bar");
fieldBar.setType("integer");
fields.add(fieldFoo);
fields.add(fieldBar);
TableSchema schema = new TableSchema();
schema.setFields(fields);
loadConfig.setSchema(schema);
// Also set custom delimiter or header rows to skip here....
// [not shown].
Insert insert = bigquery.jobs().insert(projectId, job);
insert.setProjectId(projectId);
JobReference jobRef = insert.execute().getJobReference();
// ... see rest of codelab for waiting for job to complete.
For more information on the load configuration object, see the javadoc here.
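To wait for the load job to finish, you can poll the job status until it reports DONE. A minimal polling sketch, assuming the same bigquery client and projectId as above plus the jobRef returned by the insert (the one-second interval is arbitrary):
while (true) {
  Job pollJob = bigquery.jobs().get(projectId, jobRef.getJobId()).execute();
  if ("DONE".equals(pollJob.getStatus().getState())) {
    if (pollJob.getStatus().getErrorResult() != null) {
      // The load failed; the error result explains why (e.g. a schema mismatch).
      System.err.println("Load failed: " + pollJob.getStatus().getErrorResult().getMessage());
    } else {
      System.out.println("Load completed.");
    }
    break;
  }
  Thread.sleep(1000); // wait a second before polling again
}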
I have a Chrome browser plugin which acts as a social media post bookmark collector (a click on a post captures its author username, text, date, permalink and the plugin user who owns it). I'm looking for the most efficient, safest and SERVERLESS way to have [potentially] thousands of plugin end-users update a line in a Google spreadsheet for each individual click on a post.
With my limited knowledge I narrowed the options to webhooks:
Create a Google Apps simple webhook app that will listen to plugin packets.
Have the end-user plugins send each social-media post click data in JSON to the webhook.
Have the webhook Google App publish an RSS feed with all the data collected.
Have the Google Spreadsheet regularly check for new RSS entries and update a new line for each.
What I'm not sure of is: 1) can a simple webhook be created using Google Apps? 2) can this method be secure enough to prevent non-plugin entries to the RSS feed? and 3) is there a simpler, more efficient way of achieving this end?
Your help will be much appreciated :-)
Thanks.
You can easily create webhooks in Google Apps Script through the use of web apps. As an example:
function doPost(e) {
  if (e.postData && e.postData.type == 'application/json') {
    var data = JSON.parse(e.postData.contents);
    var author = data['author'];
    var text = data['text'];
    var date = data['date'];
    var permalink = data['permalink'];
    var user = data['user'];
    // (...)
  }
}
This example code will parse the data received through a JSON POST request. Afterwards, you can insert it into your spreadsheet and generate an RSS feed using XmlService (for more information on how to do that, see this blog post). The RSS feed can be served using the doGet() method.
I have a username and password for the Metabase instance our company uses heavily. Every day I have to download CSVs frequently and then export them to Google Sheets to make reports or analyses. Is there any way to connect Metabase to Google Sheets so that the sheets pull CSVs automatically from the Metabase URL?
I've implemented a Google Sheets add-on as a solution for this and open-sourced it.
I've used this code as a Google Sheets add-on at a large organization and it has worked well so far, with hundreds of users a week, running tens of thousands of queries.
If you don't want the hassle of setting it up as a Google Sheets add-on, you can take the script and adapt it as a simple Apps Script.
You could try writing a script using Google Apps Script for your issue. An idea would be to use the UrlFetchApp.fetch() method, which allows scripts to communicate with other applications or access other resources on the web by making a request to fetch a URL, in your case the Metabase URL.
The following code snippet might give you a glimpse of what to do next:
function myFunction() {
  var sheet = SpreadsheetApp.getActive().getSheets()[0];
  var data = {
    // the data that you will be using
  };
  var options = {
    'method': 'post',
    // Metabase's API normally also needs an 'X-Metabase-Session' header obtained via /api/session.
    'payload': data
  };
  var response = UrlFetchApp.fetch('https://metabase.example.com/api/card/1/query/json', options);
  // The JSON endpoint returns an array of row objects; append one sheet row per result row.
  var responseData = JSON.parse(response.getContentText());
  responseData.forEach(function(row) {
    sheet.appendRow(Object.keys(row).map(function(key) { return row[key]; }));
  });
}
What it actually does: it uses the Metabase API to fetch the data you want with a POST request (a POST request is used because, according to the API documentation for Metabase v0.32.2, it runs the query associated with a card and returns its results as a file in the specified format). The rows from the response are then appended to your sheet using .appendRow().
Furthermore, you can read more about UrlFetchApp and Apps Script here:
Class UrlFetchApp;
Apps Script;
Spreadsheet App.
You can add a Google Sheet as an external table in Google BigQuery, then connect Metabase to Google BigQuery.
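For the external-table part, a rough sketch using the same Java BigQuery client as in the load-job answer above (the sheet URL, dataset name and table name are placeholders, and schema autodetection is used instead of an explicit schema); it defines a table backed by the Google Sheet rather than loading data into BigQuery:
Table table = new Table();
TableReference ref = new TableReference();
ref.setProjectId(projectId);
ref.setDatasetId("myDataset");
ref.setTableId("metabase_sheet");
table.setTableReference(ref);
// Point the table at the Google Sheet instead of loading its data.
ExternalDataConfiguration externalConfig = new ExternalDataConfiguration();
externalConfig.setSourceUris(Collections.singletonList(
    "https://docs.google.com/spreadsheets/d/SPREADSHEET_ID"));
externalConfig.setSourceFormat("GOOGLE_SHEETS");
externalConfig.setAutodetect(true); // or supply a TableSchema explicitly
table.setExternalDataConfiguration(externalConfig);
bigquery.tables().insert(projectId, "myDataset", table).execute();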
I want to produce a Google Apps document based on a (Google Doc) template stored on the user's Google Drive and some XML data held by a servlet running on Google App Engine.
Preferably, I want to run as much as possible on GAE. Is it possible to run the Apps Service APIs on GAE, or to download/manipulate a Google Doc on GAE? I have not been able to find anything suitable.
One alternative is obviously to implement the merge functionality in an Apps Script, transferring the XML as parameters and initiating the script over HTTP from GAE, but that just seems somewhat awkward in comparison.
EDIT:
Specifically, I am looking for the replaceText script functionality, as shown in the Apps Script snippet below, to be implemented on GAE. The remaining code is supported through the Drive/Mail APIs, I guess.
// Get document template, copy it as a new temp doc, and save the Doc’s id
var copyId = DocsList.getFileById(providedTemplateId)
.makeCopy('My-title')
.getId();
var copyDoc = DocumentApp.openById(copyId);
var copyBody = copyDoc.getActiveSection();
// Replace place holder keys,
copyBody.replaceText("CustomerAddressee", fullName);
var todaysDate = Utilities.formatDate(new Date(), "GMT+2", "dd/MM-yyyy");
copyBody.replaceText("DateToday", todaysDate);
// Save and close the temporary document
copyDoc.saveAndClose();
// Convert temporary document to PDF by using the getAs blob conversion
var pdf = DocsList.getFileById(copyId).getAs("application/pdf");
// Attach PDF and send the email
MailApp.sendEmail({
to: email_address,
subject: "Proposal",
htmlBody: "Hi,<br><br>Here is my file :)<br>Enjoy!<br><br>Regards Tony",
attachments: pdf});
As you already found out, Apps Script is currently the only way to access an API that can modify Google Docs. All other approaches cannot do it unless you export to another format (like PDF or .doc), use libraries that can modify those, and then re-upload the new file asking for conversion to the native Google Docs format, which in some cases would lose some formatting, comments, named ranges and other Google Docs features. So, like you said, if you must use the Google Docs API you must call Apps Script (as a content service). Also note that the sample Apps Script code you show is old and uses the deprecated DocsList, so you need to port it to the Drive API.
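If you go the GAE route for the non-Docs parts, a hedged sketch of the file-copy step with the Drive v3 Java client (the drive service object, template ID and document title are assumptions; the replaceText editing itself would still need Apps Script, as noted above):
// Uses com.google.api.services.drive.Drive and com.google.api.services.drive.model.File.
File copyMetadata = new File().setName("My-title");
// Copy the Google Doc template and keep the new file's ID.
File copy = drive.files().copy(providedTemplateId, copyMetadata).execute();
String copyId = copy.getId();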
Apps Script pretty much piggybacks on top of the standard published Google APIs. Increasingly, the behaviours are becoming more familiar.
Obviously Apps Script is JS-based and GAE is not. All the APIs, apart from those related to script running, are available in the standard GAE client runtimes.
No code to check here, so I'm afraid a generic answer is all I have.
I see now that it can be solved by using the Google Drive API to export (download) the Google Docs file as a PDF (or other formats) to GAE, and then doing simple replace-text editing using e.g. the iText library.
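A minimal sketch of that export step, assuming an authorized Drive v3 Java client (docFileId is a placeholder; the iText replace-text editing is not shown):
// Ask Drive to convert the native Google Doc to a PDF and return the bytes.
ByteArrayOutputStream out = new ByteArrayOutputStream();
drive.files().export(docFileId, "application/pdf").executeMediaAndDownloadTo(out);
byte[] pdfBytes = out.toByteArray();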
A user of our application has accidentally deleted data. They'd like it to be restored. We have no special logic or datastore entities that can do this.
However, we do daily backups of our entire datastore to blobstore using the datastore admin.
What are our options for selectively restoring part of this backup back into the datastore?
We'd preferably like to avoid a service interruption for other users. One final restriction is that we cannot change our production app ID (i.e. copy the data over to a new app and then restore the backup to our old app), because our clients reference our app ID directly.
Thoughts?
UPDATE
I was thinking of running a mapreduce over all the blobs in our app and finding the ones that are to do with our backup. Parsing these backups and restoring the entities as needed. The only issue is, what format are the blobs stored in? How can I parse them?
Since 1.6.5, the Datastore Admin allows you to restore individual kinds from an existing backup.
About the backup format: according to the Datastore Admin source code, you can use RecordsReader to read backup files stored in the leveldb log format in a MapperPipeline.
The restore functionality in its current form is not very useful for my application. There should be an option to restore only a few entities or namespaces into the current app ID or another app ID.
Please star this issue
http://code.google.com/p/googleappengine/issues/detail?id=7311
Maybe a custom backup reader will help you:
final BlobstoreService blobstoreService = BlobstoreServiceFactory.getBlobstoreService();
final BlobKey blobKey = blobstoreService.createGsBlobKey("/gs/" + bucket + "/" + pathToOutputFile);
final RecordReadChannel rrc = BlobserviceHelper.openRecordReadChannel(blobKey, blobstoreService);
ByteBuffer bf;
while ((bf = rrc.readRecord()) != null) {
final OnestoreEntity.EntityProto proto = new OnestoreEntity.EntityProto();
proto.mergeFrom(bf.array());
final Entity entity = EntityTranslator.createFromPb(proto);
entity.removeProperty(""); // Remove empty property
//Now you can save entity to datastore or read keys and properties
}
In Salesforce you can schedule up to weekly "backups"/dumps of your data here: Setup > Administration Setup > Data Management > Data Export.
If you have a large Salesforce database, there can be a significant number of files to download by hand.
Does anyone have a best practice, tool, batch file, or trick to automate this process or make it a little less manual?
Last time I checked, there was no way to access the backup file status (or actual files) over the API. I suspect they have made this process difficult to automate by design.
I use the Salesforce scheduler to prepare the files on a weekly basis, then I have a scheduled task that runs on a local server which downloads the files. Assuming you have the ability to automate/script some web requests, here are some steps you can use to download the files:
1. Get an active Salesforce session ID/token
   - enterprise API - login() SOAP method
2. Get your organization ID ("org ID")
   - Setup > Company Profile > Company Information, OR
   - use the enterprise API getUserInfo() SOAP call to retrieve your org ID
3. Send an HTTP GET request to https://{your sf.com instance}.salesforce.com/ui/setup/export/DataExportPage/d?setupid=DataManagementExport
   - Set the request cookie as follows: oid={your org ID}; sid={your session ID};
4. Parse the resulting HTML for instances of <a href="/servlet/servlet.OrgExport?fileName=
   (The filename begins after fileName=)
5. Plug the file names into this URL to download (and save): https://{your sf.com instance}.salesforce.com/servlet/servlet.OrgExport?fileName={filename}
   - Use the same cookie as in step 3 when downloading the files
This is by no means a best practice, but it gets the job done. It should go without saying that if they change the layout of the page in question, this probably won't work any more. Hope this helps.
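If it helps, a rough Java sketch of the download in step 5, using plain java.net and the same cookie from step 3 (the instance host, org ID, session ID and file name are placeholders you would obtain as described above):
// Download one export file, authenticating with the oid/sid cookie.
URL url = new URL("https://" + instance + ".salesforce.com/servlet/servlet.OrgExport?fileName=" + fileName);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestProperty("Cookie", "oid=" + orgId + "; sid=" + sessionId);
try (InputStream in = conn.getInputStream();
     FileOutputStream out = new FileOutputStream(fileName)) {
  byte[] buffer = new byte[8192];
  int read;
  while ((read = in.read(buffer)) != -1) {
    out.write(buffer, 0, read);
  }
}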
A script to download the SalesForce backup files is available at https://github.com/carojkov/salesforce-export-downloader/
It's written in Ruby and can be run on any platform. Supplied configuration file provides fields for your username, password and download location.
With little configuration you can get your downloads going. The script sends email notifications on completion or failure.
It's simple enough to figure out the sequence of steps needed to write your own program if the Ruby solution does not work for you.
I'm Naomi, CMO and co-founder of cloudHQ, so I feel like this is a question I should probably answer. :-)
cloudHQ is a SaaS service that syncs your cloud. In your case, you'd never need to upload your reports as a data export from Salesforce; you'd just always have them backed up in a folder labeled "Salesforce Reports" in whichever service you synchronized Salesforce with, like Dropbox, Google Drive, Box, Egnyte, SharePoint, etc.
The service is not free, but there's a free 15-day trial. To date, there's no other service that actually syncs your Salesforce reports with other cloud storage companies in real time.
Here's where you can try it out: https://cloudhq.net/salesforce
I hope this helps you!
Cheers,
Naomi
Be careful that you know what you're getting in the backup file. The backup is a zip of 65 different CSV files. It's raw data that cannot be used very easily outside of the Salesforce UI.
Our company makes the free DataExportConsole command line tool to fully automate the process. You do the following:
Automate the weekly Data Export with the Salesforce scheduler
Use the Windows Task Scheduler to run the FuseIT.SFDC.DataExportConsole.exe file with the right parameters.
I recently wrote a small PHP utility that uses the Bulk API to download a copy of the sObjects you define via a JSON config file.
It's pretty basic but can easily be expanded to suit your needs.
Force.com Replicator on github.
Adding a Python 3.6 solution. It should work (I haven't tested it, though). Make sure the packages (requests, beautifulsoup4 and simple_salesforce) are installed.
import os
import requests
from datetime import datetime
from bs4 import BeautifulSoup as BS
from simple_salesforce import Salesforce
def login_to_salesforce():
    sf = Salesforce(
        username=os.environ.get('SALESFORCE_USERNAME'),
        password=os.environ.get('SALESFORCE_PASSWORD'),
        security_token=os.environ.get('SALESFORCE_SECURITY_TOKEN')
    )
    return sf
org_id = "SALESFORCE_ORG_ID"  # can be found in Salesforce -> Company Profile
export_page_url = "https://XXXX.my.salesforce.com/ui/setup/export/DataExportPage/d?setupid=DataManagementExport"
sf = login_to_salesforce()
cookie = {'oid': org_id, 'sid':sf.session_id}
export_page = requests.get(export_page_url, cookies=cookie)
export_page = export_page.content.decode()
links = []
parsed_page = BS(export_page, "html.parser")
_path_to_exports = "/servlet/servlet.OrgExport?fileName="
for link in parsed_page.findAll('a'):
    href = link.get('href')
    if href is not None:
        if href.startswith(_path_to_exports):
            links.append(href)
print(links)
if len(links) == 0:
    print("No export files found")
    exit(0)
today = datetime.today().strftime("%Y_%m_%d")
download_location = os.path.join(".", "tmp", today)
os.makedirs(download_location, exist_ok=True)
baseurl = "https://XXXX.my.salesforce.com"  # same instance as in export_page_url
for link in links:
    filename = baseurl + link
    downloadfile = requests.get(filename, cookies=cookie, stream=True)  # stream the download to keep RAM usage low
    with open(os.path.join(download_location, downloadfile.headers['Content-Disposition'].split("filename=")[1]), 'wb') as f:
        for chunk in downloadfile.iter_content(chunk_size=100*1024*1024):  # 100 MB chunks
            if chunk:
                f.write(chunk)
I have added a feature in my app to automatically back up the weekly/monthly CSV files to an S3 bucket: https://app.salesforce-compare.com/
Create a connection provider (currently only AWS S3 is supported) and link it to a Salesforce connection (which needs to be created as well).
On the main page you can monitor the progress of the scheduled job and access the files in the bucket.
More info: https://salesforce-compare.com/release-notes/