Medium to large file uploads with progress updates in AspNet Core - reactjs

By medium to large I mean anything from 10 MB to 200 MB (sound files, if that matters).
Basically, I want to build an API that does some spectral analysis on the file itself, which requires a file upload. For UI/UX reasons it would be nice to have a progress bar for the upload process. What are the common architectures for achieving this interaction?
The client application uploading the file will be a JavaScript client (reactjs/redux) and the API is written in ASP.NET Core. I have seen some examples that use websockets to update the client on progress, and other examples where the client polls a resource URL for status updates. Are there any best practices (or a "modern way of doing this") that I should know of? TIA

In general, you just need to save the progress status to some variable while reading the input stream in your controller (a session-specific variable, because several upload sessions might be running at the same time) and then fetch this status from the client side with AJAX requests (or SignalR).
You could take a look at this example: https://github.com/DmitrySikorsky/AspNetCoreUploadingProgress
I have tried 11 MB files with no problems. The sample contains the line
await Task.Delay(10); // It is only to make the process slower
Don't forget to remove it in a real solution.
In this sample the files are uploaded via AJAX, so I didn't try really large files, but you can use the iframe solution from this sample:
https://github.com/DmitrySikorsky/AspNetCoreFileUploading
The other part will be almost the same.
Hope this helps. Feel free to ask if you have any additional questions.
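The idea described in this answer (record progress per upload while reading the input stream, and let the client ask for it) is independent of ASP.NET Core. Below is a minimal, language-neutral sketch written in Python, not the code from the linked repository. The names `save_upload`, `get_progress`, and the in-memory `_progress` registry are hypothetical, and a real service would also need to expire finished entries.

```python
import io
import threading

# Hypothetical in-memory registry: upload id -> (bytes_read, total_bytes).
# Keyed per upload because several uploads may run at the same time.
_progress = {}
_lock = threading.Lock()


def save_upload(stream, total_bytes, upload_id, sink, chunk_size=64 * 1024):
    """Copy `stream` to `sink` in chunks, recording progress after each chunk."""
    read = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        sink.write(chunk)
        read += len(chunk)
        with _lock:
            _progress[upload_id] = (read, total_bytes)
    return read


def get_progress(upload_id):
    """Return the percentage a polling status endpoint could report."""
    with _lock:
        read, total = _progress.get(upload_id, (0, 0))
    return 100 * read // total if total else 0


if __name__ == "__main__":
    source = io.BytesIO(b"x" * 200_000)
    save_upload(source, 200_000, "demo", io.BytesIO())
    print(get_progress("demo"))
```

The client would generate or receive the upload id before starting the upload and poll the status endpoint (or receive pushes) with that id.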

Related

Client side JS: Persist blob to disk before saving/prompting user for save location

1. The setup
I'm currently initiating a GET request to an S3 bucket (not important) to download a very large file using the browser fetch(). This file is, in its stored form, raw and unusable binary data, not structured.
2. The task and problem
There are a few things I want to do on the client-side with this data:
I need to process this data as it streams into the client to perform transformations on it (decryption, for example).
Once the data is processed and downloaded, it might still not be of any immediate use to the user outside the context of the web UI. Maybe the data should stay stored within the web app's sandbox disk space unless a user explicitly exports it?
3. The question
Where can I store this blob of unstructured data in both or either of the use cases listed above? There appear to be many options but none that fit this use case precisely. Any thoughts?
EDIT:
I feel like an idiot. I totally forgot about the FileSystem API. I'll take a look and answer my own question with a pseudo-implementation if this works.
EDIT 2:
I feel the need to reiterate what I stated in 2.2 above:
within the web app's sandbox disk space
I don't care about accessing the user's whole file system. I just want a space I can work with large files in on disk, similar to the app space directories provided to mobile applications by Android and iOS.
If you want to save and process a file on the client, and Blob is not an option, you may consider the File System Access API (https://developer.mozilla.org/en-US/docs/Web/API/File_System_Access_API#writing_to_files), even though this introduces an interaction with the user.
Another option would be to take advantage of the client-side storage available to PWAs (https://developer.mozilla.org/en-US/docs/Learn/JavaScript/Client-side_web_APIs/Client-side_storage); this also depends on your application architecture.
Before checking whether the file can be processed on the client with existing technologies, check whether you really need to do that because it is the only option, or whether you can instead move that logic to the server, depending on your use cases.

Implementing a upload queue system in ReactJS/NextJS with WebWorkers

I'm working on a platform where users can create rooms, join them, and share content. One of the major features that needs to be implemented is a robust media upload system, and I have a pretty good idea of how I'm going to achieve this on the backend with chunk-based file upload. The average size of the content users will be uploading is around 200 MB.
On the frontend I'm using NextJS. The idea is to have a web worker handle all media upload logic, plus a queue system, so that uploads are not affected by components re-rendering and the user does not have to wait on a dialog box until the process completes; the upload continues in the background. Is this approach going to work, and is it good practice? Is it going to scale without having to be redesigned in the long run? If yes, do you know of any example? If not, why, and what would you suggest?
Link to an Image explaining what I'm trying to achieve

Streaming a Growing Log File To Browser (DRF/Angular)

I am using Django Rest Framework/AngularJs for developing a web application.
I have a use case in which the server needs to stream the contents of a file in real time. The file itself is growing, since another application is logging to it. I know many inefficient ways to do this.
Can you suggest some better ways to achieve this? The Django "view" function should not return while the file is still growing, but it should still be able to send the incremental data to the client.
Any help will be appreciated.
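One common shape for this requirement is a generator that keeps reading from the file and yields whatever has been appended. Below is a minimal sketch of that idea, not a complete answer: `follow`, `poll_interval`, and `idle_timeout` are hypothetical names, and polling is assumed rather than file-change notifications. In Django, such a generator could be handed to `StreamingHttpResponse`, which accepts an iterator of chunks.

```python
import time


def follow(path, poll_interval=0.5, idle_timeout=None):
    """Yield bytes appended to `path` as they appear.

    Stops once nothing new has arrived for `idle_timeout` seconds;
    with idle_timeout=None it keeps waiting indefinitely.
    """
    idle = 0.0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(8192)
            if chunk:
                idle = 0.0
                yield chunk
                continue
            if idle_timeout is not None and idle >= idle_timeout:
                return
            # Nothing new yet: wait a little, then try reading again.
            time.sleep(poll_interval)
            idle += poll_interval
```

Each open stream occupies a server worker for as long as the client stays connected, so the deployment needs to tolerate long-lived requests.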

GAE - download sample of entities

My production app has 23GB of entities in it. I want to download only a handful of these to my dev app to debug.
I have read through https://developers.google.com/appengine/docs/python/tools/uploadingdata. It explains how to download and upload all data, but not just a sample of the data. What I am looking for is a 'number of entity instances to download' config option, but I cannot see it.
I believe the bulk download does many small batches of entities; you could just stop it part-way through the process.
If your requirement is only a handful of entities and you know which ones or have the query for it, you might write a quick web handler to query the data, and stream the bytes back to function as a simple file download.
In the medium to long term, this might end up as a useful utility to invoke whenever you need the data for debugging.
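The "quick web handler" idea could be sketched as follows. This only illustrates the sampling and streaming part: `export_sample` is a hypothetical helper, `entities` stands in for whatever your datastore query returns, and each entity is assumed to have already been converted to a plain dict.

```python
import itertools
import json


def export_sample(entities, limit=10):
    """Yield at most `limit` entities as newline-delimited JSON bytes."""
    for entity in itertools.islice(entities, limit):
        yield (json.dumps(entity, sort_keys=True) + "\n").encode("utf-8")


if __name__ == "__main__":
    # Stand-in for a query result; only the first three items are consumed.
    fake_query = ({"id": i, "title": "entity %d" % i} for i in range(1000))
    for line in export_sample(fake_query, limit=3):
        print(line.decode("utf-8"), end="")
```

Because `islice` stops pulling from the iterator after `limit` items, the handler never walks the full data set.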

Question on serving Images on App Engine ( 2 Alternatives )

I'm planning to launch a comic site that serves comic strips (images).
I have little prior experience with serving/caching images.
These are the two methods I'm considering:
1. Using LinkProperty
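from google.appengine.ext import db

class Comic(db.Model):
    image_link = db.LinkProperty()  # URL of the statically served image
    timestamp = db.DateTimeProperty(auto_now=True)  # refreshed on every put()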
Advantages:
The images are fetched from disk space itself (and disk space is cheap, I take it?)
I can easily set up app.yaml with an expiration date to cache the content in user's browser
I can set up memcache to retrieve the entities faster (for high traffic)
2. Using BlobProperty
I used this tutorial and it worked pretty neatly: http://code.google.com/appengine/articles/images.html
Side question: Can I say that using BlobProperty sort of "protects" my images from outside linkage? That means people can't just link directly to the comic strips
I have a few worries for method 2.
I can obviously memcache these entities for faster reads.
But then:
Is memcaching images a good thing? My images are large (100-200 KB per image). I think memcache allows only up to 4 GB of cached data? Or is it 1 MB per memcached entity, with unlimited entities...
What if appengine's memcache fails? -> Solution: I'd have to go back to the datastore.
How do I cache these images in the user's browser? If I was doing method no. 1, I could just easily add to my app.yaml the expiration date for the content, and pictures get cached user side.
I would like to hear your thoughts.
Should I use method 1 or 2? Method 1 sounds dead simple and straightforward; should I be wary of it?
[EDITED]
How do I solve this dilemma?
Dilemma: The last thing I want is for people to get the direct link to the image and put it up on bit.ly, because the user would then be directed to only the image on my server (and not the advertising/content around it that the user would have seen by accessing it from the main page).
You're going to be using a lot of bandwidth to transfer all these images from the server to the clients (browsers). Remember appengine has a maximum number of files you can upload, I think it is 1000 but it may have increased recently. And if you want to control access to the files I do not think you can use option #1.
Option #2 is good, but your bandwidth and storage costs are going to be high if you have a lot of content. To solve this problem people usually turn to Content Delivery Networks (CDNs). Amazon S3 and edgecast.com are two such services that support token-based access URLs. This means you can generate a token in your App Engine app that is valid for a given IP address, time window, geography, or some other criteria, and then give the CDN URL with this token to the requestor. The CDN serves your images and does the access checks based on the token. This will help you control access, but remember that where there is a will, there is a way, and you can't 100% secure anything, though you can probably get reasonably close.
So instead of storing the content in appengine, you would store it on the cdn, and use appengine to create urls with tokens pointing to the content on the cdn.
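To make the token idea concrete, here is a generic sketch of an expiring, HMAC-signed URL. It is not the signing scheme of S3, EdgeCast, or any other provider; each defines its own format. `SECRET`, `sign_url`, `verify`, and the `expires`/`token` query parameters are all hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

SECRET = b"replace-me"  # hypothetical secret shared by the app and the image server


def _signature(path, expires):
    message = "{}:{}".format(path, expires).encode("utf-8")
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_url(base_url, path, ttl_seconds=300, now=None):
    """Return a URL for `path` that stops validating after `ttl_seconds`."""
    expires = int(now if now is not None else time.time()) + ttl_seconds
    query = urlencode({"expires": expires, "token": _signature(path, expires)})
    return "{}{}?{}".format(base_url, path, query)


def verify(path, expires, token, now=None):
    """Check the expiry time and the signature of an incoming request."""
    current = int(now if now is not None else time.time())
    if current > int(expires):
        return False
    return hmac.compare_digest(_signature(path, expires), token)
```

The app would call `sign_url` while rendering the page, and whatever serves the image would call `verify` before responding. A copied link keeps working only until it expires.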
Here are some links about the signed urls. I've used both of these :
http://jets3t.s3.amazonaws.com/toolkit/code-samples.html#signed-urls
http://www.edgecast.com/edgecast_difference.htm - look at 'Content Security'
In terms of solving your dilemma, I think that there are a couple of alternatives:
You could cause the images to be rendered in a Flash object that would download the images from your server in some kind of encrypted format that it would know how to decode. This would involve quite a bit of up-front work.
You could have a valid-one-time link for the image. Each time that you generated the surrounding web page, the link to the image would be generated randomly, and the image-serving code would invalidate that link after allowing it one time. If you have a high-traffic web-site, this would be a very resource-intensive scheme.
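The valid-one-time link could be sketched like this. `OneTimeLinks` is a hypothetical in-memory store used only for illustration; on App Engine the tokens would have to live in shared storage such as the datastore, which is where the resource cost mentioned above comes from.

```python
import secrets


class OneTimeLinks:
    """Hypothetical store mapping single-use tokens to image ids."""

    def __init__(self):
        self._tokens = {}

    def issue(self, image_id):
        """Create a random token while rendering the surrounding page."""
        token = secrets.token_urlsafe(16)
        self._tokens[token] = image_id
        return token

    def redeem(self, token):
        """Return the image id once; later calls with the same token get None."""
        # pop() removes the token, which is what invalidates the link.
        return self._tokens.pop(token, None)


if __name__ == "__main__":
    links = OneTimeLinks()
    token = links.issue("strip-42")
    print(links.redeem(token))  # first request: the image id
    print(links.redeem(token))  # second request: None
```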
Really, though, you want to consider just how much work it is worth to force people to see ads, especially when a goodly number of them will be coming to your site via Firefox, and there's almost nothing that you can do to circumvent AdBlock.
In terms of choosing between your two methods, there are a couple of things to think about. With option one, where you are storing the images as static files, you will only be able to add new images by doing an appcfg.py update. Since App Engine applications do not allow you to write to the filesystem, you will need to add new images to your development code and do a code deployment. This might be difficult from a site management perspective. Also, serving the images from memcache would likely not offer you a performance improvement over having them served as static files.
Your second option, putting the images in the datastore does protect your images from linking only to the extent that you have some power to control through logic if they are served or not. The problem that you will encounter is that making that decision is difficult. Remember that HTTP is stateless, so finding a way to distinguish a request from a link that is external to your application and one that is internal to your application is going to require trickery.
My personal feeling is that jumping through hoops to make sure that people can't see your comics without seeing ads is solving the problem the wrong way. If the content that you are publishing is worth protecting, people will flock to your website to enjoy it anyway. Through high volumes of traffic, you will more than make up for anyone who directly links to your image, thus circumventing a few ad serves. Don't try to outsmart your consumers. Deliver outstanding content, and you will make plenty of money.
Your method #1 isn't practical: You'd need to upload a new version of your app for each new comic strip.
Your method #2 should work fine. It doesn't automatically "protect" your images from being hotlinked - they're still served up on a URL like any other image - but you can write whatever code you want in the image serving handler to try and prevent abuse.
A third option, and a variant of #2, is to use the new Blob API. Instead of storing the image itself in the datastore, you can store the blob key, and your image handler just instructs the blobstore infrastructure what image to serve.

Resources