How to automate download of weekly export service files - salesforce

In Salesforce you can schedule up to weekly "backups"/dumps of your data here: Setup > Administration Setup > Data Management > Data Export
If you have a large Salesforce database, there can be a significant number of files to download by hand.
Does anyone have a best practice, tool, batch file, or trick to automate this process or make it a little less manual?

Last time I checked, there was no way to access the backup file status (or actual files) over the API. I suspect they have made this process difficult to automate by design.
I use the Salesforce scheduler to prepare the files on a weekly basis, then I have a scheduled task that runs on a local server which downloads the files. Assuming you have the ability to automate/script some web requests, here are some steps you can use to download the files:
1. Get an active Salesforce session ID/token
   - enterprise API - login() SOAP method
2. Get your organization ID ("org ID")
   - Setup > Company Profile > Company Information, OR
   - use the enterprise API getUserInfo() SOAP call to retrieve your org ID
3. Send an HTTP GET request to https://{your sf.com instance}.salesforce.com/ui/setup/export/DataExportPage/d?setupid=DataManagementExport
   - Set the request cookie as follows: oid={your org ID}; sid={your session ID};
4. Parse the resulting HTML for instances of <a href="/servlet/servlet.OrgExport?fileName=
   (The file name begins after fileName=)
5. Plug the file names into this URL to download (and save):
   https://{your sf.com instance}.salesforce.com/servlet/servlet.OrgExport?fileName={filename}
   - Use the same cookie as in step 3 when downloading the files
This is by no means a best practice, but it gets the job done. It should go without saying that if they change the layout of the page in question, this probably won't work any more. Hope this helps.
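For example, steps 1 and 2 can be scripted with the simple_salesforce Python package (my assumption; the answer above uses the enterprise SOAP API directly), which gives you the session ID and lets you query the org ID:
from simple_salesforce import Salesforce

# Hypothetical credentials; substitute your own username/password/token.
sf = Salesforce(username='user@example.com',
                password='password',
                security_token='token')

session_id = sf.session_id                                            # step 1
org_id = sf.query("SELECT Id FROM Organization")["records"][0]["Id"]  # step 2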

A script to download the SalesForce backup files is available at https://github.com/carojkov/salesforce-export-downloader/
It's written in Ruby and can be run on any platform. The supplied configuration file provides fields for your username, password, and download location.
With a little configuration you can get your downloads going. The script sends email notifications on completion or failure.
It's simple enough to figure out the sequence of steps needed to write your own program if the Ruby solution does not work for you.

I'm Naomi, CMO and co-founder of cloudHQ, so I feel like this is a question I should probably answer. :-)
cloudHQ is a SaaS service that syncs your cloud. In your case, you'd never need to manually download your reports as a data export from Salesforce; you'd just always have them backed up in a folder labeled "Salesforce Reports" in whichever service you synchronized Salesforce with: Dropbox, Google Drive, Box, Egnyte, SharePoint, etc.
The service is not free, but there's a free 15-day trial. To date, there's no other service that syncs your Salesforce reports with other cloud storage companies in real time.
Here's where you can try it out: https://cloudhq.net/salesforce
I hope this helps you!
Cheers,
Naomi

Be careful that you know what you're getting in the backup file. The backup is a zip of 65 different CSV files. It's raw data that cannot be used very easily outside of the Salesforce UI.
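If you just want to see what an export archive contains before deciding how to process it, a quick peek with Python's zipfile module is enough (the archive name below is a hypothetical example):
import zipfile

with zipfile.ZipFile('WE_00D000000000001_1.ZIP') as archive:
    for name in archive.namelist():  # roughly one CSV per object, e.g. Account.csv
        print(name)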

Our company makes the free DataExportConsole command line tool to fully automate the process. You do the following:
Automate the weekly Data Export with the Salesforce scheduler
Use the Windows Task Scheduler to run the FuseIT.SFDC.DataExportConsole.exe file with the right parameters.

I recently wrote a small PHP utility that uses the Bulk API to download a copy of sObjects you define via a json config file.
It's pretty basic but can easily be expanded to suit your needs.
Force.com Replicator on GitHub.
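For comparison, here is a rough Python sketch of the same idea (the linked tool itself is PHP): read a JSON config mapping sObjects to SOQL queries, pull each one through the Bulk API via simple_salesforce, and dump the rows to CSV. The config file name and credentials are placeholders.
import csv
import json
from simple_salesforce import Salesforce

sf = Salesforce(username='user@example.com', password='password',
                security_token='token')  # hypothetical credentials

# hypothetical config, e.g. {"Account": "SELECT Id, Name FROM Account"}
with open('objects.json') as fh:
    queries = json.load(fh)

for sobject, soql in queries.items():
    rows = getattr(sf.bulk, sobject).query(soql)  # Bulk API query
    if not rows:
        continue
    fieldnames = [k for k in rows[0] if k != 'attributes']
    with open('%s.csv' % sobject, 'w', newline='') as out:
        writer = csv.DictWriter(out, fieldnames=fieldnames, extrasaction='ignore')
        writer.writeheader()
        writer.writerows(rows)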

Adding a Python 3.6 solution. It should work, but I haven't tested it. Make sure the packages (requests, beautifulsoup4 and simple_salesforce) are installed.
import os
import sys
import requests
from datetime import datetime
from bs4 import BeautifulSoup as BS
from simple_salesforce import Salesforce


def login_to_salesforce():
    return Salesforce(
        username=os.environ.get('SALESFORCE_USERNAME'),
        password=os.environ.get('SALESFORCE_PASSWORD'),
        security_token=os.environ.get('SALESFORCE_SECURITY_TOKEN')
    )


org_id = "SALESFORCE_ORG_ID"  # can be found in Salesforce under Setup > Company Profile > Company Information
instance_url = "https://XXXX.my.salesforce.com"  # your Salesforce instance
export_page_url = instance_url + "/ui/setup/export/DataExportPage/d?setupid=DataManagementExport"

sf = login_to_salesforce()
cookie = {'oid': org_id, 'sid': sf.session_id}

export_page = requests.get(export_page_url, cookies=cookie)
parsed_page = BS(export_page.content.decode(), 'html.parser')

# Collect the links to the export files.
_path_to_exports = "/servlet/servlet.OrgExport?fileName="
links = []
for link in parsed_page.find_all('a'):
    href = link.get('href')
    if href is not None and href.startswith(_path_to_exports):
        links.append(href)
print(links)

if not links:
    print("No export files found")
    sys.exit(0)

today = datetime.today().strftime("%Y_%m_%d")
download_location = os.path.join(".", "tmp", today)
os.makedirs(download_location, exist_ok=True)

for link in links:
    # stream=True avoids loading the whole file into memory at once
    downloadfile = requests.get(instance_url + link, cookies=cookie, stream=True)
    # The export file name is in the Content-Disposition header.
    filename = downloadfile.headers['Content-Disposition'].split("filename=")[1].strip('"')
    with open(os.path.join(download_location, filename), 'wb') as f:
        for chunk in downloadfile.iter_content(chunk_size=100 * 1024 * 1024):  # 100 MB chunks
            if chunk:
                f.write(chunk)

I have added a feature in my app to automatically back up the weekly/monthly CSV files to an S3 bucket: https://app.salesforce-compare.com/
Create a connection provider (currently only AWS S3 is supported) and link it to a Salesforce connection (which needs to be created as well).
On the main page you can monitor the progress of the scheduled job and access the files in the bucket.
More info: https://salesforce-compare.com/release-notes/

Related

App Engine: Copy live Datastore to local dev Datastore (that still works)

This used to be possible by downloading with the bulkloader and uploading to the local dev server. However, the bulkloader download has been non-functional for several months now because it does not support OAuth2.
A few places recommend downloading from a cloud storage backup, and uploading to the local datastore through either the bulkloader or by directly parsing the backup. However, neither of these appears to work anymore. The bulkloader method throws:
OperationalError: unable to open database file
And the RecordsReader class, which is used to read the backup files, reaches end of file when trying to read the first record, resulting in no records being read.
Does there exist a current, functional, method for copying the live datastore to the local dev datastore?
RecordsReader is functioning perfectly on Unix. I tried this https://gist.github.com/jehna/3b258f5287fcc181aacf a day ago and it worked great.
You should add your Kinds implementations to the imports and run it in the datastore interactive shell.
for example:
from myproject.kinds_implementations import MyKind
I've removed this block:
for pp in dir(a):
    try:
        ppp = getattr(a, "_" + pp)
        if isinstance(ppp, db.Key):
            ppp._Key__reference.set_app(appname)
            ppp
    except AttributeError:
        """ It's okay """
And it worked well. In my case the backup was downloaded into multiple directories, so I modified the directory traversal to something like this:
from os import listdir
from os.path import isfile, join

# mypath is the root directory the backup was downloaded into
for directory in listdir(mypath):
    full_directory_path = join(mypath, directory)
    for sub_dir in listdir(full_directory_path):
        full_sub_dir_path = join(full_directory_path, sub_dir)
        onlyfiles = [f for f in listdir(full_sub_dir_path) if isfile(join(full_sub_dir_path, f))]
        for file in onlyfiles:
            # process each backup file here
            pass
If you're working on Windows, you're welcome to follow my question about RecordsReader on Windows; hopefully someone will answer there: Google datastore backup to local dev_appserver
Edit: it works great on Windows if you change the file open mode from 'r' to 'rb'.
The bulkloader is still functional on Python with OAuth2, albeit with some caveats. When downloading from the live app, there is an issue with refreshing the OAuth2 token, so the total download time is limited to 3600 seconds, or 3600 + 3600 if you manually use a refresh token with --oauth2_refresh_token.
When uploading to the development server app, OAuth2 will fail with a 401, so it's necessary to edit google.appengine.ext.remote_api.handler and stub out 'CheckIsAdmin' to always return True as a workaround:
def CheckIsAdmin(self):
    return True  # workaround: short-circuit the admin check
    user_is_authorized = False
    ...
I upvoted the answer above, however, as it looks like a more robust solution at this point.

Download large file on Google App Engine Python

On my appspot website, I use a third-party API to query a large amount of data. The user then downloads the data as CSV. I know how to generate a CSV and download it. The problem is that because the file is huge, I get the DeadlineExceededError.
I have tried increasing the fetch deadline to 60 (urlfetch.set_default_fetch_deadline(60)). It doesn't seem reasonable to increase it any further.
What is the appropriate way to tackle this problem on Google App Engine? Is this something where I have to use Task Queue?
Thanks.
DeadlineExceededError means that your incoming request took longer than 60 seconds, not your UrlFetch call.
Deploy the code that generates the CSV file into a different module that you set up with basic or manual scaling. The URL to download your CSV will become http://module.domain.com
Requests can run indefinitely on modules with basic or manual scaling.
Alternately, consider creating a file dynamically in Google Cloud Storage (GCS) with your CSV content. At that point, the file resides in GCS and you have the ability to generate a URL from which they can download the file directly. There are also other options for different auth methods.
You can see documentation on doing this at
https://cloud.google.com/appengine/docs/python/googlecloudstorageclient/
and
https://cloud.google.com/appengine/docs/python/googlecloudstorageclient/functions
Important note: do not use the Files API (which was a common way of dynamically creating files in the Blobstore/GCS) as it has been deprecated. Use the Google Cloud Storage Client API referenced above instead.
Of course, you can delete the generated files after they've been successfully downloaded and/or you could run a cron job to expire links/files after a certain time period.
Depending on your specific use case, this might be a more effective path.
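For instance, a minimal sketch of the GCS route using the client library linked above (untested; the bucket name is a placeholder, and the object is assumed to be publicly readable, otherwise use a signed URL or a download handler instead):
import cloudstorage as gcs

def write_csv_to_gcs(csv_bytes, object_name):
    # Write the generated CSV content into a GCS object.
    filename = '/my-report-bucket/%s' % object_name
    with gcs.open(filename, 'w', content_type='text/csv') as f:
        f.write(csv_bytes)
    # Direct-download URL for a publicly readable object.
    return 'https://storage.googleapis.com/my-report-bucket/%s' % object_name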

Migrating from App Engine Files API

My app stores a bunch of images as blobs. This is roughly how I store images.
from google.appengine.api import files
# ...
fname = files.blobstore.create(mime_type='image/jpeg')
with files.open(fname, 'a') as f:
    f.write(image_byte)
files.finalize(fname)
blob_key = files.blobstore.get_blob_key(fname)
To serve these images, I use images.get_serving_url(blob_key).
Here are my questions:
Will I have to copy over all blobs to Google Cloud Storage? In other words, will I be able to access my existing blobs using GCS client library and existing blob keys? Or, will I have to copy the blobs over to GCS and get new blob keys?
Assuming I do have to copy them over to GCS, what is the easiest way? Is there a migration tool or something? Failing that, is there some sample code I can copy-paste?
Thanks!
The files have all been going into GCS for a while. The blobstore is just an alternate way to access it. The blob keys and access shouldn't be affected.
You will, however, need to stop using the files API itself and start using the GCS API to create the files.
1) No, you can still use the blobstore. You can also upload files to the blobstore when you use the BlobstoreUploadHandler.
2) Migration is easy when you use the blobstore, because you can create a blobkey for GCS objects. And when you use the default GCS bucket you have free quota.
from google.appengine.api import app_identity, images
from google.appengine.ext import blobstore
import cloudstorage as gcs

default_bucket = app_identity.get_default_gcs_bucket_name()
gcs_filename = '/%s/%s' % (default_bucket, image_file_name)

with gcs.open(gcs_filename, 'w', content_type='image/jpeg') as f:
    f.write(image_byte)

blob_key = blobstore.create_gs_key('/gs' + gcs_filename)
# and create a serving url
serving_url = images.get_serving_url(blob_key)
I received an email from Google Cloud Platform on May 19, 2015; an excerpt is shown here:
The removal of the Files API will happen in the following manner.
On May 20th, 2015 no new applications will have access to the Files API. Applications that were created prior to May 20th, 2015 will continue to run without any issues. That said, we strongly encourage developers to start switching over to the Cloud Storage Client Library today.
On July 28th, 2015 starting at 12pm Pacific Time, the Files API will be temporarily shutdown for 24 hrs.
On August 4th, 2015, we will permanently shut down the Files API at 12:00pm Pacific time.
Since I was using the exact same code to write a blobstore file, I spent a day researching the GCS system. After failing to get a "service account" to work (by going through poorly documented OAuth2 confusion), I gave up on using GCS.
Now I am using ndb's BlobProperty. I keep the blobs in a separate model using both a parent key and a key name (as filename) to locate the images. Using a separate model keeps the huge blob out of my regular entities so fetches aren't slowed down by their sheer size. I wrote a separate REST API just for the images.
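A minimal sketch of that layout (the names here are my own placeholders, not the answerer's actual code):
from google.appengine.ext import ndb

class ImageBlob(ndb.Model):
    # Kept in a separate model so fetches of the regular entities
    # aren't slowed down by the blob's size.
    data = ndb.BlobProperty(required=True)
    content_type = ndb.StringProperty(default='image/jpeg')

def save_image(parent_key, filename, image_bytes):
    # Parent key plus filename-as-id locates the image.
    ImageBlob(parent=parent_key, id=filename, data=image_bytes).put()

def load_image(parent_key, filename):
    blob = ndb.Key(ImageBlob, filename, parent=parent_key).get()
    return blob and blob.data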
I too faced the same issue while running the GAE server locally:
com.google.appengine.tools.cloudstorage.NonRetriableException: com.google.apphosting.api.ApiProxy$FeatureNotEnabledException: The Files API is disabled. Further information: https://cloud.google.com/appengine/docs/deprecations/files_api
In my case, this fixed the issue. I simply changed this:
compile 'com.google.appengine.tools:appengine-gcs-client:0.4.1'
to this:
compile 'com.google.appengine.tools:appengine-gcs-client:0.5'
in the build.gradle file, because the Files API (Beta) was deprecated on June 12, 2013 and turned down on September 9, 2015. (Source)
At the time of writing, the latest version in the MVN Repo is 'com.google.appengine.tools:appengine-gcs-client:0.5'.

From Drive to Blobstore using Picker

I have the Google Picker set up, as well as the Blobstore. I'm able to upload files from my local machine to the Blobstore. Now that I have the Picker set up and working, I don't know how to use the info (URL? file ID?) to load the selected file into the Blobstore. Any tips on how to do this? I haven't been able to find much of anything on it in Google's resources.
There isn't a direct link between the Google Picker and the App Engine Blobstore. They are kind of different tools for different jobs. The Google Picker is designed as an end-user tool, to select data from a user's Google account. It just so happens that the Picker also provides an upload interface (to Google Drive) as well. The Blobstore, on the other hand, is designed as a blob storage mechanism for your App Engine application.
In theory, you could write a script to connect the two, but there are a few considerations:
1. Your app would need access to the user's Google Drive account using OAuth2. This is necessary, as the Picker API is a client-side API, whereas the Blobstore API is a server-side API. You would need to send the selected document URL to the server, then download the document and finally save it to the Blobstore.
2. Unless you then deleted the data from Drive (very risky due to point 3), your data would be persisted in two places.
3. You cannot know for sure if the user selected an existing file, or uploaded a new one.
4. Not a great user experience - the user thinks they are uploading to Drive.
In essence, this sounds like a bad idea! What is your use case?
@Gwyn - I don't have enough reputation to add a comment to your solution, but I had an idea about problem #3: you cannot know for sure if the user selected an existing file or uploaded a new one.
Would it be possible to use Response.VIEW to see what view they were using when the file was selected? If you have one view constructor for Drive files and one for Upload files, something like
var driveView = new google.picker.View(google.picker.ViewId.DOCS);
var uploadView = new google.picker.DocsUploadView();
would that allow you to know whether the file was a new upload (safe to delete) or an existing file (leave it alone)?
Assuming that you want to pick a file from your own Google Drive and move it to the Blobstore:
1) First you have to perform OAuth for the Google Drive API.
2) When you select a file from Drive using the Picker, you need to get its ID.
3) Using the ID obtained in step 2, you can programmatically download it using the Drive API.
4) After downloading the file, you can use the FileService (deprecated though) to upload the file to the Blobstore.
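A rough Python sketch of steps 3 and 4 (my own assumption, not the answerer's code): download the picked file with the Drive API, then write it to GCS and expose it through a blob key, since the Files API itself is deprecated. The credentials, bucket and object names are placeholders.
import io

import cloudstorage as gcs
from google.appengine.ext import blobstore
from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload

def drive_file_to_blobstore(credentials, file_id, bucket, object_name,
                            content_type='application/octet-stream'):
    # Step 3: download the file selected in the Picker via the Drive API.
    drive = build('drive', 'v3', credentials=credentials)
    buf = io.BytesIO()
    downloader = MediaIoBaseDownload(buf, drive.files().get_media(fileId=file_id))
    done = False
    while not done:
        _, done = downloader.next_chunk()

    # Step 4: write it to GCS and hand back a Blobstore-compatible key.
    gcs_filename = '/%s/%s' % (bucket, object_name)
    with gcs.open(gcs_filename, 'w', content_type=content_type) as f:
        f.write(buf.getvalue())
    return blobstore.create_gs_key('/gs' + gcs_filename)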

How do you upload data in bulk to Google App Engine Datastore?

I have about 4000 records that I need to upload to Datastore. They are currently in CSV format. I'd appreciate it if someone would point me to or explain how to upload data in bulk to GAE.
You can use the bulkloader.py tool:
The bulkloader.py tool included with the Python SDK can upload data to your application's datastore. With just a little bit of set-up, you can create new datastore entities from CSV files.
I don't have the perfect solution, but I suggest you have a go with the App Engine Console. App Engine Console is a free plugin that lets you run an interactive Python interpreter in your production environment. It's helpful for one-off data manipulation (such as initial data imports) for several reasons:
It's the good old read-eval-print interpreter. You can do things one at a time instead of having to write the perfect import code all at once and running it in batch.
You have interactive access to your own data model, so you can read/update/delete objects from the data store.
You have interactive access to the URL Fetch API, so you can pull data down piece by piece.
I suggest something like the following:
Get your data model working in your development environment
Split your CSV records into chunks of under 1,000. Publish them somewhere like Amazon S3 or any other URL.
Install App Engine Console in your project and push it up to production
Log in to the console. (Only admins can use the console so you should be safe. You can even configure it to return HTTP 404 to "cloak" from unauthorized users.)
For each chunk of your CSV:
Use URLFetch to pull down a chunk of data
Use the built-in csv module to chop up your data until you have a list of useful data structures (most likely a list of lists or something like that)
Write a for loop, iterating through each data structure in the list:
Create a data object with all correct properties
put() it into the data store
You should find that after one iteration through step 5, you can either copy and paste, or else write simple functions to speed up your import task. Also, by fetching and processing your data in steps 5.1 and 5.2, you can take your time until you are sure you have it perfect.
(Note, App Engine Console currently works best with Firefox.)
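A rough sketch of step 5, as it might be typed into the console (the URL, model and field names are hypothetical):
import csv
import StringIO
from google.appengine.api import urlfetch
from google.appengine.ext import db

class Record(db.Model):  # assumed data model
    name = db.StringProperty()
    value = db.IntegerProperty()

result = urlfetch.fetch('https://example.com/chunks/chunk-001.csv')  # 5.1: pull down a chunk
rows = list(csv.reader(StringIO.StringIO(result.content)))           # 5.2: chop it up

entities = []
for name, value in rows:                                             # 5.3: build entities
    entities.append(Record(name=name, value=int(value)))
db.put(entities)                                                      # put() them into the data store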
By using the remote API and operations on multiple entities. I will show an example using NDB and Python, where our Test.csv contains the following values separated by semicolons:
1;2;3;4
5;6;7;8
First we need to import modules:
import csv
from TestData import TestData
from google.appengine.ext import ndb
from google.appengine.ext.remote_api import remote_api_stub
Then we need to create remote api stub:
remote_api_stub.ConfigureRemoteApi(None, '/_ah/remote_api', auth_func, 'your-app-id.appspot.com')
For more information on using remote api have a look at this answer.
Then comes the main code, which basically does the following things:
Opens the Test.csv file.
Sets the delimiter. We are using semicolon.
Then you have two different options to create a list of entities:
Using map reduce functions.
Using list comprehension.
In the end you batch put the whole list of entities.
Main code:
# Open csv file for reading.
with open('Test.csv', 'rb') as file:
    # Set delimiter.
    reader = csv.reader(file, delimiter=';')

    # Reduce 2D list into 1D list and then map every element into entity.
    test_data_list = map(lambda number: TestData(number=int(number)),
                         reduce(lambda list, row: list + row, reader))

    # Or you can use list comprehension.
    test_data_list = [TestData(number=int(number)) for row in reader for number in row]

# Batch put whole list into HRD.
ndb.put_multi(test_data_list)
The put_multi operation also takes care of batching an appropriate number of entities into a single HTTP POST request.
Have a look at this documentation for more information:
CSV File Reading and Writing
Using the Remote API in a Local Client
Operations on Multiple Keys or Entities
NDB functions
In later versions of the App Engine SDK, one can upload data using appcfg.py.
See appcfg.py
