FTP to Google Storage - google-app-engine

Some files get uploaded on a daily basis to an FTP server and I need those files under Google Cloud Storage. I don't want to ask the users who upload the files to install any additional software; I'd rather let them keep using their FTP client.
Is there a way to use GCS as an FTP server? If not, how can I create a job that periodically picks up the files from an FTP location and puts them in GCS?
In other words: what's the best and simplest way to do it?

You could write yourself an FTP server which uploads to GCS, for example based on pyftpdlib.
Define a custom handler which stores to GCS when a file is received:
import os

from pyftpdlib.handlers import FTPHandler
from pyftpdlib.servers import FTPServer
from pyftpdlib.authorizers import DummyAuthorizer

from google.cloud import storage


class MyHandler(FTPHandler):

    def on_file_received(self, file):
        storage_client = storage.Client()
        bucket = storage_client.get_bucket('your_gcs_bucket')
        blob = bucket.blob(file[5:])  # strip leading /tmp/
        blob.upload_from_filename(file)
        os.remove(file)

    # implement the other on_... events (on_file_sent, on_incomplete_file_received, ...) as needed


def main():
    authorizer = DummyAuthorizer()
    authorizer.add_user('user', 'password', homedir='/tmp', perm='elradfmw')

    handler = MyHandler
    handler.authorizer = authorizer
    handler.masquerade_address = 'add.your.public.ip'  # replace with the server's public IP
    handler.passive_ports = range(60000, 60999)

    server = FTPServer(("127.0.0.1", 21), handler)
    server.serve_forever()


if __name__ == "__main__":
    main()
I've successfully run this on Google Container Engine (it requires some effort to get passive FTP working properly), but it should be pretty simple to do on Compute Engine. With the above configuration, open port 21 and ports 60000-60999 on the firewall.
To run it: python my_ftp_server.py - if you want to listen on port 21 you'll need root privileges.

You could set up a cron job and rsync between the FTP server and Google Cloud Storage using gsutil rsync or the open source rclone tool.
If you can't run those commands on the FTP server periodically, you could mount the FTP server as a local filesystem or drive on another machine (Linux, Windows) and run the sync from there, as sketched below.
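A minimal sketch of that idea in Python, in case you end up scripting it yourself: mirror the FTP directory to a local folder with ftplib, then push it to GCS by shelling out to gsutil rsync. The host, credentials, paths and bucket name below are placeholders, and it assumes a flat FTP directory and a machine where the Cloud SDK is installed and authenticated (run it from cron).

import os
import subprocess
from ftplib import FTP

FTP_HOST = "ftp.example.com"         # placeholder: your FTP server
FTP_USER = "user"                    # placeholder credentials
FTP_PASS = "password"
LOCAL_DIR = "/tmp/ftp_mirror"        # local staging directory
GCS_BUCKET = "gs://your_gcs_bucket"  # placeholder: your bucket


def mirror_ftp_to_local():
    # Assumes the FTP directory contains only files (no subdirectories).
    os.makedirs(LOCAL_DIR, exist_ok=True)
    ftp = FTP(FTP_HOST)
    ftp.login(FTP_USER, FTP_PASS)
    for name in ftp.nlst():
        with open(os.path.join(LOCAL_DIR, name), "wb") as f:
            ftp.retrbinary("RETR " + name, f.write)
    ftp.quit()


def push_to_gcs():
    # Requires gsutil to be installed and authenticated on this machine.
    subprocess.check_call(["gsutil", "-m", "rsync", "-r", LOCAL_DIR, GCS_BUCKET])


if __name__ == "__main__":
    mirror_ftp_to_local()
    push_to_gcs()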

I have successfully set up an FTP proxy to GCS using gcsfs on a Google Compute Engine VM (as mentioned by jkff in a comment on my question), with these instructions:
http://ilyapimenov.com/blog/2015/01/19/ftp-proxy-to-gcs.html
Some changes are needed though:
In /etc/vsftpd.conf, change #write_enable=YES to write_enable=YES.
Add firewall rules in your GC project to allow access to port 21 and passive ports 15393 to 15592 (https://console.cloud.google.com/networking/firewalls/list).
Some possible problems:
If you can access the FTP server using the local IP but not the remote IP, it's probably because you haven't set up the firewall rules.
If you can access the FTP server but are unable to write, it's probably because you still need write_enable=YES.
If you are trying to read the folder you created under /mnt but get an I/O error, it's probably because the bucket name in gcsfs_config is not right.
Also, your FTP client needs to have its transfer mode set to "passive".
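If you want to verify the proxy from a script rather than a GUI client, a quick check with Python's ftplib (which already defaults to passive mode) could look like this; the IP and credentials below are placeholders for whatever you configured in vsftpd:

from ftplib import FTP

ftp = FTP('your.vm.external.ip')   # placeholder: the VM's external IP
ftp.login('ftp_user', 'password')  # placeholder: the vsftpd user you created
ftp.set_pasv(True)                 # explicit, although passive is ftplib's default
print(ftp.nlst())                  # should list the contents of the mounted bucket folder
ftp.quit()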

Set up a VM in Google Cloud using some *nix flavor. Set up FTP on it and point it to a folder abc. Use Cloud Storage FUSE (gcsfuse) to mount abc as a GCS bucket. Voila - back and forth between GCS and FTP without writing any software.
(Small print: gcsfuse tends to roll over and die if you push too much data through it, so bounce the mount periodically - once a week or once a day; you might also need to configure the mount to allow access for all users.)
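A minimal sketch of that periodic "bounce", assuming gcsfuse is installed on the VM and the script is run from cron; the bucket name and mount point below are placeholders (and allow_other usually also requires user_allow_other in /etc/fuse.conf):

import subprocess

BUCKET = "your_gcs_bucket"    # placeholder: your GCS bucket
MOUNT_POINT = "/srv/ftp/abc"  # placeholder: the folder your FTP server serves


def remount():
    # Ignore errors if the mount point is not currently mounted.
    subprocess.call(["fusermount", "-u", MOUNT_POINT])
    # allow_other lets the FTP daemon's user see the mounted files.
    subprocess.check_call(["gcsfuse", "-o", "allow_other", BUCKET, MOUNT_POINT])


if __name__ == "__main__":
    remount()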

Related

Best way to define Client to be used with localhost vs domain

So I got my IdentityServer project up and running, and am setting up my project to publish. Now, when I define my client in the config for IS4, I suppose I will have to set my redirect URLs to my published domain, something like this:
new Client {
    ...
    RedirectUris = { "localhost:5002/signin-oidc", "myclient.com/signin-oidc" }
    ...
}
Is including the localhost and domain the right way to do this?
I am thinking it would be ok since an attacker would have to have my client secret in order to login. Or is it better to set up two separate clients (eg. 'client' and 'client_local'), and request the appropriate client at startup?
There are two ways:
1) Use Configuration File: You can store the clients in a JSON file and load them during startup. Use different JSON files for different environments.
For example, clients.Development.json for the Development environment and clients.Production.json for Production. However, the clients will be in-memory clients, and any change to the client configuration will require a restart of your application.
2) Use Persistent Storage: Use a database server to store configuration and operational data - a local database for development and a separate database for production use.
See the docs. The example there uses Entity Framework for persistent storage, but you're not bound to Entity Framework or any particular ORM; you can opt to write your own data access layer for IdentityServer. This allows you to change client configurations without restarting your application, as the data is retrieved from a database.

Best approach to write generic azure logic app/azure functions to do FTP operations

I would like to build an FTP service using Azure Logic Apps / Azure Functions. I would like the logic app to be invoked via an HTTP request (I will expose this app as a REST API later). The FTP server details like directory, username and password will be sent in the request.
Is there a way to have my logic app create the FTP connector dynamically based on the incoming request and then do an FTP upload or download?
You cannot create a connection at Logic Apps run time; it needs to happen at design (author) time. If it's a pre-defined list of connections, you can create them first, then use a switch-case to branch into the connection that should be used at run time.
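If the connection details really must arrive at run time, one workaround outside the Logic Apps connectors is to let an HTTP-triggered Azure Function do the FTP work itself with plain ftplib. A rough sketch in Python (v1 programming model, so it also needs an httpTrigger binding in function.json); the JSON field names here are illustrative, not an existing contract:

import io
from ftplib import FTP

import azure.functions as func


def main(req: func.HttpRequest) -> func.HttpResponse:
    # Expected body (illustrative): {"host": ..., "username": ..., "password": ..., "remote_path": ...}
    try:
        body = req.get_json()
    except ValueError:
        return func.HttpResponse("Invalid JSON body", status_code=400)

    ftp = FTP(body["host"])
    ftp.login(body["username"], body["password"])

    # Download the requested file into memory and return it to the caller.
    buf = io.BytesIO()
    ftp.retrbinary("RETR " + body["remote_path"], buf.write)
    ftp.quit()

    return func.HttpResponse(buf.getvalue(), status_code=200,
                             mimetype="application/octet-stream")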

How to send AWS CLI data sync failure notification via email?

I have a .bat file that calls the AWS sync command:
aws s3 sync D:\Users\backup s3://mybucket
It syncs my local data to an S3 bucket. Then I created a Windows Scheduler Task for that .bat file, and every day at 2300hr the .bat file runs and syncs my local data to the S3 bucket.
If a failure happens during the sync on the local PC (example: network failure/s3 authentication fail) or remote S3 server, I want to get a failure notification via email.
What is the best and most efficient way to get this notification?
The AWS Command-Line Interface (CLI) can return an error code to indicate whether the command succeeded.
See: AWS CLI Return Codes
Therefore, you could do the following:
When running a CLI command, redirect output to a temporary file
Check the return code. If it is non-zero, send an email with the error code and the contents of the temporary file
It would be something like:
aws s3 cp ./foo s3://bar >output 2>&1
code=$?
if [ $code -ne 0 ]
then <email stuff with output file>
fi
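The same pattern works in Python if you prefer it over a batch/shell script; a minimal sketch, assuming Python 3.7+ and an SMTP relay you can reach (the relay host and the addresses are placeholders):

import smtplib
import subprocess
from email.message import EmailMessage

# Run the sync and capture both stdout and stderr.
result = subprocess.run(
    ["aws", "s3", "sync", r"D:\Users\backup", "s3://mybucket"],
    capture_output=True, text=True)

if result.returncode != 0:
    msg = EmailMessage()
    msg["Subject"] = "S3 sync failed (exit code %d)" % result.returncode
    msg["From"] = "backup@example.com"   # placeholder
    msg["To"] = "admin@example.com"      # placeholder
    msg.set_content(result.stdout + "\n" + result.stderr)
    with smtplib.SMTP("smtp.example.com") as smtp:  # placeholder: your SMTP relay
        smtp.send_message(msg)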

How to automate download of weekly export service files

In SalesForce you can schedule up to weekly "backups"/dumps of your data here: Setup > Administration Setup > Data Management > Data Export
If you have a large Salesforce database there can be a significant number of files to download by hand.
Does anyone have a best practice, tool, batch file, or trick to automate this process or make it a little less manual?
Last time I checked, there was no way to access the backup file status (or actual files) over the API. I suspect they have made this process difficult to automate by design.
I use the Salesforce scheduler to prepare the files on a weekly basis, then I have a scheduled task that runs on a local server which downloads the files. Assuming you have the ability to automate/script some web requests, here are some steps you can use to download the files:
Get an active salesforce session ID/token
enterprise API - login() SOAP method
Get your organization ID ("org ID")
Setup > Company Profile > Company Information OR
use the enterprise API getUserInfo() SOAP call to retrieve your org ID
Send an HTTP GET request to https://{your sf.com instance}.salesforce.com/ui/setup/export/DataExportPage/d?setupid=DataManagementExport
Set the request cookie as follows: oid={your org ID}; sid={your session ID};
Parse the resulting HTML for instances of <a href="/servlet/servlet.OrgExport?fileName=
(The filename begins after fileName=)
Plug the file names into this URL to download (and save):
https://{your sf.com instance}.salesforce.com/servlet/servlet.OrgExport?fileName={filename}
Use the same cookie as in step 3 when downloading the files
This is by no means a best practice, but it gets the job done. It should go without saying that if they change the layout of the page in question, this probably won't work any more. Hope this helps.
A script to download the SalesForce backup files is available at https://github.com/carojkov/salesforce-export-downloader/
It's written in Ruby and can be run on any platform. The supplied configuration file provides fields for your username, password and download location.
With a little configuration you can get your downloads going. The script sends email notifications on completion or failure.
It's simple enough to figure out the sequence of steps needed to write your own program if the Ruby solution does not work for you.
I'm Naomi, CMO and co-founder of cloudHQ, so I feel like this is a question I should probably answer. :-)
cloudHQ is a SaaS service that syncs your cloud. In your case, you'd never need to upload your reports as a data export from Salesforce, but you'll just always have them backed up in a folder labeled "Salesforce Reports" in whichever service you synchronized Salesforce with like: Dropbox, Google Drive, Box, Egnyte, Sharepoint, etc.
The service is not free, but there's a free 15 day trial. To date, there's no other service that actually syncs your Salesforce reports with other cloud storage companies in real-time.
Here's where you can try it out: https://cloudhq.net/salesforce
I hope this helps you!
Cheers,
Naomi
Be careful that you know what you're getting in the backup file. The backup is a zip of 65 different CSV files. It's raw data that cannot be used very easily outside of the Salesforce UI.
Our company makes the free DataExportConsole command line tool to fully automate the process. You do the following:
Automate the weekly Data Export with the Salesforce scheduler
Use the Windows Task Scheduler to run the FuseIT.SFDC.DataExportConsole.exe file with the right parameters.
I recently wrote a small PHP utility that uses the Bulk API to download a copy of the sObjects you define via a JSON config file.
It's pretty basic but can easily be expanded to suit your needs.
Force.com Replicator on GitHub.
Adding a Python 3.6 solution. It should work (I haven't tested it, though). Make sure the packages (requests, beautifulsoup4 and simple_salesforce) are installed.
import os
import requests
from datetime import datetime
from bs4 import BeautifulSoup as BS
from simple_salesforce import Salesforce


def login_to_salesforce():
    sf = Salesforce(
        username=os.environ.get('SALESFORCE_USERNAME'),
        password=os.environ.get('SALESFORCE_PASSWORD'),
        security_token=os.environ.get('SALESFORCE_SECURITY_TOKEN')
    )
    return sf


org_id = "SALESFORCE_ORG_ID"  # can be found in Salesforce -> Company Profile
export_page_url = "https://XXXX.my.salesforce.com/ui/setup/export/DataExportPage/d?setupid=DataManagementExport"

sf = login_to_salesforce()
cookie = {'oid': org_id, 'sid': sf.session_id}
export_page = requests.get(export_page_url, cookies=cookie)
export_page = export_page.content.decode()

links = []
parsed_page = BS(export_page, "html.parser")
_path_to_exports = "/servlet/servlet.OrgExport?fileName="
for link in parsed_page.findAll('a'):
    href = link.get('href')
    if href is not None:
        if href.startswith(_path_to_exports):
            links.append(href)

print(links)
if len(links) == 0:
    print("No export files found")
    exit(0)

today = datetime.today().strftime("%Y_%m_%d")
download_location = os.path.join(".", "tmp", today)
os.makedirs(download_location, exist_ok=True)

baseurl = "https://XXXX.my.salesforce.com"  # same instance as export_page_url
for link in links:
    filename = baseurl + link
    downloadfile = requests.get(filename, cookies=cookie, stream=True)  # stream=True avoids loading the whole file into RAM
    with open(os.path.join(download_location, downloadfile.headers['Content-Disposition'].split("filename=")[1]), 'wb') as f:
        for chunk in downloadfile.iter_content(chunk_size=100 * 1024 * 1024):  # 100 MB chunks
            if chunk:
                f.write(chunk)
I have added a feature in my app to automatically back up the weekly/monthly CSV files to an S3 bucket: https://app.salesforce-compare.com/
Create a connection provider (currently only AWS S3 is supported) and link it to an SF connection (which needs to be created as well).
On the main page you can monitor the progress of the scheduled job and access the files in the bucket.
More info: https://salesforce-compare.com/release-notes/

Does Google App Engine support ftp?

Now I use my own Java FTP program to ftp objects from my PC to my ISP's website server.
I want to use a Google App Engine servlet to receive PayPal IPN messages, then store the messages into my own objects and FTP the objects to my ISP's website server. Is this doable? I heard Google App Engine doesn't support FTP.
I don't expect Google to do it for me, but can I use my own Java FTP program in the web app that I upload onto the App Engine to do it ?
Frank
No, you can't open any socket connection except by using the URL Fetch service on HTTP/HTTPS to these port ranges: 80-90, 440-450, 1024-65535.
As of April 9, 2013 (SDK 1.7.7), this isn't a problem any longer. Outbound sockets (e.g. FTP) are generally available to all billing-enabled apps.
Socket API Overview (Java): https://developers.google.com/appengine/docs/java/sockets/
UPDATE: Our code below may no longer work. This FTP code worked for us before, but we now see a comment below saying that FTP is no longer supported on App Engine; see the link below. If you try this code and it works or doesn't work for you for straight FTP (TLS is NOT supported, BTW), please comment.
Yes. FTP now works on Google App Engine. (The accepted answer is outdated and no longer true.)
Here is tested and working code on GAE.
#!/usr/bin/env python
from google.appengine.ext import webapp
from ftplib import FTP


class HwHandler(webapp.RequestHandler):

    def get(self):
        self.response.out.write('FTP Starting...<br>')
        ftp = FTP('ftp_site.com')
        ftp.login('login', 'password')
        ftp.retrlines('LIST')  # list directory contents
        self.response.out.write('FTP opened')
        ftp.quit()


app = webapp.WSGIApplication([
    ('/', HwHandler)
], debug=True)
Of note, FTP TLS does not appear to work currently. (Trying to do "from ftplib import FTP_TLS" fails.)
You can use the Apache Commons FTP client (org.apache.commons.net.ftp.FTPClient) if you put it into passive mode. Just do the following:
FTPClient client = new FTPClient();
client.connect(FTP_HOST);
client.enterLocalPassiveMode();
Then it won't call ServerSocketFactory, and life should be good!
