AWS S3 Upload speed? - reactjs

I'm uploading the files from a project to an S3 bucket in AWS. This is my first time uploading a project to AWS, so I'm not sure if it usually takes this long, but it's saying it will take over 1 day.
I have also turned on Transfer Acceleration and turned off everything running in the background, which helped, but it still seems like a long wait.
Any advice would be really appreciated!

You are uploading a large number of files via a web browser. This would involve overhead for each file and is likely single-threaded.
I would recommend using the AWS Command-Line Interface (CLI) to upload the files. It can upload multiple files simultaneously to take advantage of your bandwidth.
Also, it has the aws s3 sync command, which can recover from failures by only copying files that have not yet been uploaded. (That is, you can run it multiple times.)
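For a one-shot project upload, a minimal sketch (the bucket name and paths here are placeholders): raise the CLI's parallelism, optionally point it at the Transfer Acceleration endpoint you already enabled on the bucket, and then sync the project directory:
aws configure set default.s3.max_concurrent_requests 20
aws configure set default.s3.use_accelerate_endpoint true
aws s3 sync ./my-project s3://my-bucket/my-project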

Related

How to deploy a web app that needs regular access to large data files

I am trying to deploy a web app I have written, but I am stuck on one element. The bulk of it is just an Angular application that interacts with a MongoDB database; that's all fine. Where I am stuck is that I need local read access to around 10 GB of files (GeoTIFF digital elevation models). These don't change and are broken down into 500 or so files. Each time my app needs geographic elevations, it needs to find the right file, read the right part of it, and return the data, the quicker the better. To reiterate, I am not serving these files, just reading data from them.
In development these files are on my machine and I have no problems, but they seem to be too large to bundle into the Angular app (it runs out of memory) and too large to include in any backend assets folder. I've looked at two serverless cloud hosting platforms (GCP and Heroku), both of which limit the size of the deployed files to around 1 GB (if I remember right). I have considered using cloud storage for the files, but I'm worried about the performance hit, since each time I need a file it would have to be downloaded from the cloud to the application. The only solution I can think of is a VM-based service like Google Compute Engine with an API service to receive requests from the app and deliver back the required data, but I had hoped it could be more co-located (not least because that solution costs more $$)...
I'm new to deployment so any advice welcome.
Load your data into a GIS database, like PostGIS, then have your app query that database instead of the local raster files.
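As a minimal sketch, assuming PostGIS with the raster extension (the database name, table name, and coordinates below are made up), you would load the GeoTIFFs once with raster2pgsql and then look up elevations with ST_Value:
raster2pgsql -s 4326 -I -C -t 256x256 dem/*.tif public.elevation | psql -d gisdb
psql -d gisdb -c "SELECT ST_Value(rast, ST_SetSRID(ST_MakePoint(-1.83, 51.18), 4326)) FROM public.elevation WHERE ST_Intersects(rast, ST_SetSRID(ST_MakePoint(-1.83, 51.18), 4326));"
With the spatial index created by -I, point lookups like this stay fast even across 500 tiled rasters.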

What is the fastest way to download large files from S3 to EC2 in same region?

I want to download large files from S3 to EC2 instances for file manipulation. What would be the fastest and most efficient way to do this?
Thanks in advance!
Use the AWS Command-Line Interface (CLI).
It has an aws s3 cp command to download files, and an aws s3 sync command to synchronize the content between a local directory and S3 (or vice versa).
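For example (the bucket and paths here are placeholders):
aws s3 cp s3://my-bucket/bigfile.dat /data/bigfile.dat
aws s3 sync s3://my-bucket/dataset/ /data/dataset/
For large objects, aws s3 cp automatically uses parallel multipart transfers, which already makes good use of in-region bandwidth.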
One technique to achieve speed is to divide the problem into smaller problems, execute the smaller problems in parallel, and then assemble the results. In this case, a utility could be written with a number of workers, each responsible for downloading a portion of the file from S3 to EBS/EFS using ranged GETs. The workers would all run in parallel. After all the pieces have been downloaded, they can be combined into a single file.
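A rough sketch of that idea using the CLI's lower-level s3api commands (the bucket, key, and byte offsets are placeholders; you would compute the ranges from the object's Content-Length):
aws s3api get-object --bucket my-bucket --key big.dat --range bytes=0-1073741823 part-0 &
aws s3api get-object --bucket my-bucket --key big.dat --range bytes=1073741824-2147483647 part-1 &
wait
cat part-0 part-1 > big.dat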
A tool like s5cmd is significantly faster at downloading objects than the aws-cli (Go vs. Python makes a big difference). Its GitHub README has performance results that show a ~10x speed difference.
It can be used like:
s5cmd cp s3://bucketname/object.txt localobject.txt
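It also accepts wildcards, so an entire prefix can be fetched with parallel workers (the prefix here is just an example):
s5cmd cp 's3://bucketname/prefix/*' localdir/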

local GAE datastore does not keep data after computer shuts down

On my local machine (i.e. http://localhost:8080/), I have entered data into my GAE datastore for some entity called Article. After turning off my computer and restarting it the next day, I find the datastore empty: no entities. Is there a way to prevent this in the future?
How do I make a copy of the data in my local datastore? Also, will I be able to upload said data later into both localhost and production?
My model is ndb.
I am using Mac OS X and Python 2.7, if these matter.
I have experienced the same problem. Declaring the datastore path when running dev_appserver.py should fix it. These are the arguments I use when starting the dev_appserver:
python dev_appserver.py --high_replication --use_sqlite --datastore_path=myapp.datastore --blobstore_path=myapp_blobs
This will use SQLite and save the data in the file myapp.datastore. If you want to save it in a different directory, use --datastore_path=/path/to/myapp/myapp.datastore.
I also use --blobstore_path to save my blobs in a specific directory. I have found it more reliable to declare which directory my blobs are saved in. Again, that is --blobstore_path=/path/to/myapp/blobs or whatever you would like.
Since declaring blob and datastore paths, I haven't lost any data locally. More info can be found in the App Engine documentation here:
https://developers.google.com/appengine/docs/python/tools/devserver#Using_the_Datastore
Data in the local datastore is preserved unless you start it with the -c flag to clear it, at least on the PC. You therefore probably have a different issue with temp files or permissions or something.
The local data is stored using a different method to the actual production servers, so I'm not sure if you can make a direct backup as such. If you want to upload data to both the local and deployed servers, you can use the upload tool suite: uploading data
The bulk loader tool can upload and download data to and from your application's datastore. With just a little bit of setup, you can upload new datastore entities from CSV and XML files, and download entity data into CSV, XML, and text files. Most spreadsheet applications can export CSV files, making it easy for non-developers and other applications to produce data that can be imported into your app. You can customize the upload and download logic to use different kinds of files, or do other data processing.
So you can 'back up' by downloading the data to a file.
To load/pull data into the local development server, just give it the local URL.
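For example (the app URL, kind, and filenames are placeholders, and the remote_api builtin has to be enabled in app.yaml for these endpoints to exist), downloading from production and re-uploading to the local server might look like this:
appcfg.py download_data --url=https://myapp.appspot.com/_ah/remote_api --filename=articles.dump --kind=Article
appcfg.py upload_data --url=http://localhost:8080/_ah/remote_api --filename=articles.dump --kind=Article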
The datastore typically saves to disk when you shut down. If you turned off your computer without shutting down the server, I could see this happening.

What is the best way to Optimise my Apache2/PHP5/MySQL Server for HTTP File Sharing?

I was wondering what optimisations I could make to my server to improve its performance at handling file uploads/downloads.
At the moment I am thinking Apache2 may not be the best HTTP server for this?
Any suggestions or optimisations I could make on my server?
My current setup is an Apache2 HTTP server with PHP handling the file uploads, which are stored in a folder outside the web root and randomly assigned a name that is stored in a MySQL database (along with more file/user information).
When a user wants to download a file, I use the header() function to force the download and readfile() to output the file contents.
You are correct that this is inefficient, but it's not Apache's fault. Serving the files with PHP is going to be your bottleneck. You should look into X-Sendfile, which allows you to tell Apache (via a header inserted by PHP) what file to send (even if it's outside the DocRoot).
The increase in speed will be more pronounced with larger files and heavier loads. Of course an even better way to increase speed is by using a CDN, but that's overkill for most of us.
Using X-Sendfile with Apache/PHP
http://www.jasny.net/articles/how-i-php-x-sendfile/
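As a minimal sketch, assuming the mod_xsendfile module is installed (the storage path and variable names are placeholders): enable it and whitelist the storage directory in the Apache config, then have PHP emit the header instead of calling readfile():
XSendFile On
XSendFilePath /var/www/storage

header('X-Sendfile: /var/www/storage/' . $storedName);
header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="' . $originalName . '"');
exit;
Apache then streams the file itself, so the PHP process is freed almost immediately.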
As for increasing performance with uploads, I have no particular knowledge. In general, however, I believe each file upload would "block" one of your Apache workers for a long time, meaning Apache has to spawn more worker processes for other requests. With enough workers spawned, a server can slow noticeably. You may look into Nginx, which is an event-based, rather than process-based, server. This may increase your throughput, but I admit I have never experimented with uploads under Nginx.
Note: Nginx uses the X-Accel-Redirect header instead of X-Sendfile.
http://wiki.nginx.org/XSendfile
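For comparison, a hedged sketch of the Nginx equivalent (the location name and path are made up): mark the storage directory as internal in the server block and have PHP emit X-Accel-Redirect instead:
location /protected/ {
    internal;
    alias /var/www/storage/;
}

header('X-Accel-Redirect: /protected/' . $storedName);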

appengine - ftp alternatives

I have an App Engine app and I need to receive files from third parties.
The best option for me would be to receive the files via FTP, but I have read that this is not possible, at least as of a year ago.
Is it still not possible? How else could I receive the files?
This is very important to my project, in fact it is indispensable.
Thx a lot!!!!
You need to use the Blobstore.
Edit: To post to the blobstore in Java, the code fragment in this SO question should work (that was for Android; elsewhere, use e.g. Apache HttpClient). The URL to post to must have been created with createUploadUrl. The simplest way to communicate it to the source server might be a GET URL, e.g. "/makeupload", which is text/plain and contains only the URL to POST to. To prevent unauthorized uploads, you can require a password either in the POST, or already in the GET (e.g. as a query parameter).
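From the third party's side, the handshake described above could look like this with curl (the host, the /makeupload path, and the password parameter are assumptions from this answer, not a fixed API): fetch the one-time upload URL that createUploadUrl generated, then POST the file to it as multipart/form-data:
UPLOAD_URL=$(curl -s 'https://myapp.appspot.com/makeupload?password=SECRET')
curl -F 'file=@report.csv' "$UPLOAD_URL"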
The answer depends a lot on the size range of your imports. For small files, the URL Fetch API will be sufficient.
I myself tend to import large CSV files ranging from 70–800 MB, in which case the legacy Blobstore and HTTP POST don't cut it. GAE cannot handle HTTP requests larger than 32 MB directly, nor can you upload static files larger than 32 MB for manual import.
Traditionally, I've used a *nix relay for downloading the data files, splitting them into well-formed JSON segments, and then submitting maybe 10-30 K HTTP POST requests back to GAE. This used to be the only viable workaround, and for files over 1 GB it might still be the preferred method due to scaling performance (complex import procedures are easily distributed across hundreds of F1 instances).
Luckily, as of April 9 this year (SDK 1.7.7), importing large files directly into GAE isn't much of a problem any longer. Outbound sockets are generally available to all billing-enabled apps, so you can easily solve the "large files" issue by opening an FTP connection and downloading.
Sockets API Overview (Python): https://developers.google.com/appengine/docs/python/sockets/
