I've recently been using FilePond for an enterprise web application that allows end users to upload up to 1,500 images (medium size, avg 200 KB, max 500 KB).
There is very little backend processing once an image is uploaded, other than its temporary storage in a database; we later pick the files up from that temporary storage for asynchronous processing. The current challenge is that browser request serialization stretches the upload to as long as 2 hours. We've brought that down to roughly 1 hour by increasing FilePond's maximum parallel uploads, but this is still far from acceptable (the target is 20 minutes), and Chrome DevTools still shows requests being serialized with this volume of images.
With this in mind, I'm currently looking for a FilePond plugin that zips the dropped files and uploads a single archive to the backend, without the user having to do that themselves. I couldn't find anything related on FilePond's plugins page, and most plugins listed there seem to be about image transformation. Hopefully the JSZip library could do the trick. Am I on the right track? Any further suggestions?
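As far as I can tell there is no official FilePond zip plugin, so here is a minimal sketch of what I have in mind with JSZip (not a real plugin yet): take the files currently in the FilePond instance, bundle them into one archive, and upload it in a single request. The pond variable, the /upload-archive endpoint, and the archive field name are just placeholders.

// Minimal sketch: bundle the files currently in a FilePond instance ("pond")
// into a single zip with JSZip and upload it as one request.
// The "/upload-archive" endpoint and "archive" field name are placeholders.
import JSZip from 'jszip';

async function uploadAsArchive(pond) {
  const zip = new JSZip();

  // FilePond exposes the original File/Blob on each file item via .file
  pond.getFiles().forEach((item) => {
    zip.file(item.filename, item.file);
  });

  // Generate the archive as a Blob; DEFLATE keeps the payload smaller,
  // though already-compressed JPEGs won't shrink much.
  const blob = await zip.generateAsync({
    type: 'blob',
    compression: 'DEFLATE',
    compressionOptions: { level: 6 },
  });

  const form = new FormData();
  form.append('archive', blob, 'images.zip');

  const res = await fetch('/upload-archive', { method: 'POST', body: form });
  if (!res.ok) throw new Error('Upload failed: ' + res.status);
}

Most of the win here would be fewer HTTP requests rather than smaller payloads, since the JPEGs are already compressed.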
Other things on our team's radar:
creating multiple DNS endpoints to increase the number of parallel requests by the browser;
researching CDN service alternatives.
Thanks a bunch!
Sorry about the "stupid" title, but I don't really know how to explain this.
I want to have a page on my site (built in React) that shows the release notes for each version of my site/product. I could hardcode the release-note content in the page, but I'd like an approach that doesn't require recompiling my site just to change the content.
My site is hosted on AWS, so I was wondering whether there are any patterns for storing the page content in an S3 bucket as a text file, or as an entry in DynamoDB.
Does this make sense?
These are the options that come to mind, but I would like to ask how "you" have done this in the past.
Thank you in advance!
You could really use either S3 or DynamoDB, though S3 ends up being more favorable for a few reasons.
For S3, the general pattern would be to store your formatted release notes as one or more HTML files (S3 objects) and have your site make AJAX requests for those objects, loading the returned HTML into the page.
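For example, a minimal sketch of that pattern in a React component (the bucket URL and object key are placeholders, and the bucket, or a CloudFront distribution in front of it, would need to allow public reads and CORS from your site's origin):

// Sketch: load release-notes HTML from S3 at runtime instead of baking it into the build.
// The bucket URL and object key below are placeholders.
import { useEffect, useState } from 'react';

function ReleaseNotes() {
  const [html, setHtml] = useState('<p>Loading release notes…</p>');

  useEffect(() => {
    fetch('https://my-bucket.s3.amazonaws.com/release-notes/latest.html')
      .then((res) => res.text())
      .then(setHtml)
      .catch(() => setHtml('<p>Release notes are unavailable right now.</p>'));
  }, []);

  // The HTML comes from your own bucket; sanitize it if anyone else can write there.
  return <div dangerouslySetInnerHTML={{ __html: html }} />;
}

Updating the release notes is then just a matter of overwriting the S3 object.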
Benefits:
You can make the request client-side and asynchronous via AJAX, so the rest of the page load time isn't negatively impacted.
If you want to change the formatting of the release notes, you can do so by just changing the S3 object. No site recompilation required.
S3 is cheap.
If you were to use DynamoDB, you would have to request the contents server-side and format them server-side (the format would not be changeable without site recompilation). You get 25 read capacity units for free, but if your site sees a lot of traffic, you could end up paying much more than you would with S3.
I am trying to create a site that e-learning courses (zips of HTML/CSS/JS/media) can be uploaded to.
I am using Go on Google App Engine, with Google Cloud Storage to store the zips and the extracted courses.
I will explain the development dead ends I have encountered.
My first thought was to use the resumable upload functionality of Cloud Storage to send the zip file, then read it with Go on App Engine, unzip the files and write them back to Cloud Storage.
It took a while to read and understand the documentation, but this worked perfectly for my 2 MB test zip. It failed when I tried a modest 67 MB zip: I had run into a hidden limitation on accessing Cloud Storage from App Engine. No matter which client I used, there was a 10 MB/32 MB limit.
I tried both the old and new libraries as well as blobstore.
I also looked into creating a custom OAuth2-capable client library using sockets, but hit too many dead ends.
Giving up on that approach, I thought that even though it would mean more uploading, extracting on the client (browser) side and then uploading each file with its own resumable upload might make the most sense. After exploring a few libraries I had in-browser extraction working and ready to upload.
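Roughly, the extraction step looks like this (sketched here with JSZip, which is only an illustrative choice; uploadFile(name, blob) stands in for the per-file resumable upload call):

// Sketch: extract a dropped course zip in the browser with JSZip (illustrative choice),
// then hand each entry to a per-file upload function.
// uploadFile(name, blob) is a placeholder for the resumable-upload call.
async function extractAndUpload(zipBlob, uploadFile) {
  const zip = await JSZip.loadAsync(zipBlob);
  const entries = Object.values(zip.files).filter((entry) => !entry.dir);

  // Sequential for clarity; the concurrency is where things get interesting (see below).
  for (const entry of entries) {
    const blob = await entry.async('blob');
    await uploadFile(entry.name, blob); // one resumable upload per extracted file
  }
}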
I wrote my handler that created the Datastore entry for the upload, selected a location for the upload and created all the upload URLs.
When testing this I found that generating URLs for long lists of files (anything over 100) took a while. Since I was using Go, I decided it made sense to make those requests concurrently, and I spent a day or two getting that working. After dealing with some CORS issues that weirdly had not shown up earlier, I had everything working.
Then I started getting errors when stress testing my approach with a large (500 MB) zip/course. The uploads would fail, and I discovered that when trying to send 300+ files to generate upload URLs I was getting the following error:
Post http://localhost:62394: dial tcp [::1]:62394: connectex: No connection could be made because the target machine actively refused it.
Now I have no idea how to diagnose this. I don't know if I am hitting a rate limit, and if I am, I don't know how to avoid it.
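One thing I could try, whether it is a dev-server quirk or a real rate limit, is not fanning out all 300+ URL-generation calls at once: the browser could send the file list to the handler in fixed-size chunks, so the Go side only creates a few dozen upload sessions per request. A rough sketch (the /create-upload-urls endpoint and the chunk size of 25 are made up):

// Sketch: request upload URLs in chunks rather than for all 300+ files at once,
// so the backend handler never fans out hundreds of concurrent session creations.
// The "/create-upload-urls" endpoint and chunk size of 25 are placeholders.
async function requestUploadUrls(fileNames) {
  const urls = [];
  const chunkSize = 25;

  for (let i = 0; i < fileNames.length; i += chunkSize) {
    const chunk = fileNames.slice(i, i + chunkSize);
    const res = await fetch('/create-upload-urls', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ files: chunk }),
    });
    if (!res.ok) throw new Error('URL request failed: ' + res.status);
    urls.push(...(await res.json()));
  }
  return urls;
}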
It seems like this should be simple to build, but it is anything but.
I have a few options I can pursue:
Try to create the resumable uploads with a batch operation (https://cloud.google.com/storage/docs/json_api/v1/how-tos/batch), though batch operations to /upload are not supported.
Maybe request each URL with a one-by-one API call.
Make requesting the URLs happen over the Channel API (https://cloud.google.com/appengine/docs/go/channel/reference).
Spend the next week or more adding layers of retries and fallback error handling.
Try another solution.
This should be simple. How should this be done?
We are storing images in Google Cloud Storage. We generated a link using the Images service's getServingUrl(). The link worked for some amount of time (a few hours) and then stopped working. We've had reports that the link is still accessible in the US, but not in the UK.
Here's the link: http://lh3.googleusercontent.com/HkwzeUinidFxi-z6OO4ANasEuVYZYduhMiPG2SraKstC5Val0xGdTqvePNMr_bs7FLvj1oNzZjSWZe4dKcugaZ5hzaqfWlw=s36
Is anybody else experiencing this problem? If so, has anyone filed a ticket with Google to investigate?
This has been a known behavior for years: getServingUrl() generates a temporary link to a CDN that is not guaranteed to last forever.
You have to regenerate the link on every request (or periodically), or use another solution.
We ended up moving our images to Amazon S3 + CloudFront. You could also consider https://cloud.google.com/storage/ and https://cloud.google.com/cdn/.
The serving URLs do not expire. I have a website with roughly 500k images that has been using the same image URLs for about 4 years now. All image links are still functioning and were only generated once (not on every request).
On my appspot website, I use a third-party API to query a large amount of data. The user then downloads the data as a CSV. I know how to generate a CSV and download it. The problem is that because the file is huge, I get a DeadlineExceededError.
I have tried increasing the fetch deadline to 60 seconds (urlfetch.set_default_fetch_deadline(60)). It doesn't seem reasonable to increase it any further.
What is the appropriate way to tackle this problem on Google App Engine? Is this something where I have to use Task Queue?
Thanks.
DeadlineExceededError means that your incoming request took longer than 60 seconds, not that your URLFetch call did.
Deploy the code that generates the CSV file into a different module that you set up with basic or manual scaling. The URL to download your CSV will then become http://module.domain.com
Requests can run indefinitely on modules with basic or manual scaling.
Alternatively, consider creating a file dynamically in Google Cloud Storage (GCS) with your CSV content. The file then resides in GCS, and you can generate a URL from which the user can download it directly. There are also other options for different auth methods.
You can see documentation on doing this at
https://cloud.google.com/appengine/docs/python/googlecloudstorageclient/
and
https://cloud.google.com/appengine/docs/python/googlecloudstorageclient/functions
Important note: do not use the Files API (which was a common way of dynamically creating files in Blobstore/GCS), as it has been deprecated. Use the Google Cloud Storage Client Library referenced above instead.
Of course, you can delete the generated files after they've been successfully downloaded and/or you could run a cron job to expire links/files after a certain time period.
Depending on your specific use case, this might be a more effective path.
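To make the GCS approach concrete: the linked documentation covers the Python App Engine GCS client this answer has in mind; the sketch below shows the same idea (write the CSV to GCS, hand back a time-limited download URL) using the Node.js @google-cloud/storage library instead, with a made-up bucket name.

// Sketch of the same idea with the Node.js @google-cloud/storage client:
// write the CSV to a bucket, then return a time-limited signed URL for direct download.
// "my-export-bucket" is a placeholder; signing requires credentials that can sign blobs.
const { Storage } = require('@google-cloud/storage');
const storage = new Storage();

async function writeCsvAndGetLink(csvString) {
  const file = storage.bucket('my-export-bucket').file('exports/' + Date.now() + '.csv');
  await file.save(csvString, { contentType: 'text/csv' });

  const [url] = await file.getSignedUrl({
    action: 'read',
    expires: Date.now() + 60 * 60 * 1000, // link valid for one hour
  });
  return url;
}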
I have the Google Picker set up, as well as the Blobstore. I'm able to upload files from my local machine to the Blobstore, and now that the Picker is set up and working, I don't know how to use the info it returns (URL? file ID?) to load the selected file into the Blobstore. Any tips on how to do this? I haven't been able to find much of anything about it in Google's resources.
There isn't a direct link between the Google Picker and the App Engine Blobstore. They are kind of different tools for different jobs. The Google Picker is designed as an end-user tool to select data from a user's Google account. It just so happens that the Picker also provides an upload interface (to Google Drive) as well. The Blobstore, on the other hand, is designed as a blob storage mechanism for your App Engine application.
In theory, you could write a script to connect the two, but there are a few considerations:
Your app would need access to the user's Google Drive account using OAuth2. This is necessary, as the Picker API is a client-side API, whereas the Blobstore API is a server-side API. You would need to send the selected document's URL to the server, then download the document and finally save it to the Blobstore.
Unless you then deleted the data from Drive (very risky due to point 3), your data would be persisted in two places.
You cannot know for sure whether the user selected an existing file or uploaded a new one.
Not a great user experience: the user thinks they are uploading to Drive.
In essence, this sounds like a bad idea! What is your use case?
@Gwyn: I don't have enough reputation to add a comment to your solution, but I had an idea about problem #3: "You cannot know for sure if the user selected an existing file, or uploaded a new one."
Would it be possible to use Response.VIEW to see what view they were using when the file was selected? If you have one view constructor for Drive files and one for Upload files, something like
var driveView = new google.picker.View(google.picker.ViewId.DOCS);
var uploadView = new google.picker.DocsUploadView();
would that allow you to know whether the file was a new upload (safe to delete) or an existing file (leave it alone)?
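Something like the following is what I have in mind: an untested sketch where both views are added to the picker and the callback records which view the selection came from before handing the file off to the server. I'm not completely sure Response.VIEW is populated the way I expect, and the /import endpoint plus the oauthToken variable are placeholders for whatever your server-side Blobstore copy and auth setup look like.

// Untested sketch: add both views, then check which one the selection came from
// in the callback. "/import" and oauthToken are placeholders.
const driveView = new google.picker.View(google.picker.ViewId.DOCS);
const uploadView = new google.picker.DocsUploadView();

const picker = new google.picker.PickerBuilder()
  .addView(driveView)
  .addView(uploadView)
  .setOAuthToken(oauthToken) // obtained elsewhere via your OAuth2 flow
  .setCallback((data) => {
    if (data[google.picker.Response.ACTION] !== google.picker.Action.PICKED) return;

    const doc = data[google.picker.Response.DOCUMENTS][0];
    const pickedFrom = data[google.picker.Response.VIEW]; // the part I'm unsure about

    // Hand the selection to the server, which downloads it and writes it to Blobstore.
    fetch('/import', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        fileId: doc[google.picker.Document.ID],
        name: doc[google.picker.Document.NAME],
        pickedFrom: pickedFrom,
      }),
    });
  })
  .build();

picker.setVisible(true);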
Assuming that you want to pick a file from your own Google Drive and move it to the Blobstore:
1) First you have to perform OAuth for the Google Drive API.
2) When you select a file from Drive using the Picker, you need to get its ID.
3) Using the ID obtained in step 2, you can programmatically download the file using the Drive API.
4) After downloading the file, you can use the FileService (deprecated, though) to upload it to the Blobstore.
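For step 3, the download itself can be a Drive files.get request with alt=media, using the token from step 1 and the ID from step 2. A sketch (this assumes a regular binary file; native Google Docs would need files.export instead, and whether you run it in the browser or on the server, the REST call has the same shape):

// Sketch for step 3: fetch the picked file's bytes via the Drive v3 REST API.
// accessToken comes from the OAuth flow in step 1, fileId from the Picker in step 2.
// Native Google Docs/Sheets/Slides need the files.export endpoint instead.
async function downloadDriveFile(accessToken, fileId) {
  const res = await fetch(
    'https://www.googleapis.com/drive/v3/files/' + fileId + '?alt=media',
    { headers: { Authorization: 'Bearer ' + accessToken } }
  );
  if (!res.ok) throw new Error('Drive download failed: ' + res.status);
  return res.blob(); // hand these bytes to the server-side Blobstore write in step 4
}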