Any way to reuse a Source[ByteString, Any] (without keeping it all in memory) - akka-stream

Is there any way to make a Source reusable?
I have an akka-http server that receives a large file upload and then streams the (chunked) data to subscriber websockets and other HTTP servers via HTTP POST. In both cases, there is an API that accepts a Source[ByteString, Any]:
HttpEntity(..., source) in the case of the HTTP POST
BinaryMessage(source) for the websocket
Using these APIs has some advantages over the versions that take a single ByteString (only need to do a single HTTP POST, can recreate the same chunked message, etc.).
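For reference, a minimal sketch of how those two APIs consume a stream (the content type below is an assumption, not taken from the original code):
import akka.http.scaladsl.model.ws.BinaryMessage
import akka.http.scaladsl.model.{ContentTypes, HttpEntity}
import akka.stream.scaladsl.Source
import akka.util.ByteString

val source: Source[ByteString, Any] = ??? // the incoming upload

// Chunked HTTP POST body fed directly from the stream
val postEntity = HttpEntity(ContentTypes.`application/octet-stream`, source)

// Streamed websocket message fed from the same kind of stream
val wsMessage = BinaryMessage(source)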
So is there a way to make something like this work (without buffering everything in memory)?
// DataSource is an alias for Source[ByteString, Any]; `source` is the incoming upload
val allSinks: Seq[Sink[Source[ByteString, Any], Future[Done]]] = ???
val g = RunnableGraph.fromGraph(GraphDSL.create(allSinks) { implicit builder => sinks =>
  import GraphDSL.Implicits._
  // Broadcast with an output for each subscriber
  val broadcast = builder.add(Broadcast[DataSource](sinks.size))
  Source.single(source) ~> broadcast
  sinks.foreach(broadcast ~> _)
  ClosedShape
})

Sources are Not Reusable
Unfortunately a Source backed by a one-shot origin, such as a request entity's dataBytes, cannot be rerun once it has been exhausted. The underlying "source" of the data can be reused to create separate Source values, but each such value can be run on at most one stream.
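To illustrate (a minimal sketch; `entity` is assumed to be an incoming request entity, with an implicit Materializer or ActorSystem in scope):
import akka.stream.scaladsl.{Sink, Source}
import akka.util.ByteString

// The upload's bytes arrive on the wire only once, so the Source obtained
// from the entity can be drained by a single stream.
val bytes: Source[ByteString, Any] = entity.dataBytes

bytes.runWith(Sink.ignore) // consumes the upload
// Running the same value again cannot replay bytes that were already consumed;
// a second materialization of this Source typically fails at runtime.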
Persistence
If replay capabilities are a requirement then the data being streamed will need to be stored in a persistence mechanism to facilitate replay later. This mechanism could be a filesystem, a database, Kafka, etc.
Below is a mockup using the filesystem.
The incoming POST message body can be streamed to a file in write-mode:
// an implicit ActorSystem (providing a Materializer) and ExecutionContext are assumed in scope
import java.nio.file.Paths
import java.nio.file.StandardOpenOption.{CREATE_NEW, WRITE}
import akka.http.scaladsl.model.StatusCodes
import akka.stream.scaladsl.{FileIO, Keep}

post {
  path(Segment) { fileName =>
    extractRequestEntity { entity =>
      complete {
        entity
          .dataBytes
          .toMat(FileIO.toPath(Paths.get(fileName), Set(CREATE_NEW, WRITE)))(Keep.right)
          .run()
          .map(ioResult => StatusCodes.OK -> s"wrote ${ioResult.count} bytes")
          .recover { case ex => StatusCodes.InternalServerError -> ex.toString }
      }
    }
  }
}
There is then no need to create a Broadcast hub; you can simply respond to GET requests with the contents of the file:
get {
  path(Segment) { fileName =>
    getFromFile(fileName)
  }
}
This takes advantage of the fact that most OSes will allow you to write to a file as a stream of bytes while, at the same time, reading from that file as a stream of bytes.
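For the subscribers from the original question that expect a Source[ByteString, Any] rather than an HTTP GET, a fresh Source can be created from the persisted file for each consumer; a minimal sketch (the file name and the websocket usage are assumptions):
import java.nio.file.Paths
import akka.http.scaladsl.model.ws.BinaryMessage
import akka.stream.scaladsl.{FileIO, Source}
import akka.util.ByteString

// Every call returns an independent Source that streams the file from disk,
// so nothing is buffered in memory and each subscriber gets its own run.
def fileSource(fileName: String): Source[ByteString, Any] =
  FileIO.fromPath(Paths.get(fileName))

// e.g. one fresh stream per websocket subscriber
val message = BinaryMessage(fileSource("upload.bin"))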

Related

Corrupt video uploads when chunking MediaRecorder to Google Cloud platform

I am currently using a React-hook-powered component to record my screen and subsequently upload it to Google Cloud Storage. However, when it finishes, the file created inside Google Cloud appears to be corrupt.
This is the gist of the code within my React component, where useMediaRecorder is from here: https://github.com/wmik/use-media-recorder -
let {
  error,
  status,
  mediaBlob,
  stopRecording,
  getMediaStream,
  startRecording,
  liveStream,
} = useMediaRecorder({
  onCancelScreenShare: () => {
    stopRecording();
  },
  onDataAvailable: (chunk) => {
    // do the uploading here:
    onChunk(chunk);
  },
  recordScreen: true,
  blobOptions: { type: "video/webm;codecs=vp8,opus" },
  mediaStreamConstraints: { audio: audioEnabled, video: true },
});
As data becomes available through this hook, it calls onChunk(chunk), passing a binary Blob to that method. To perform the upload, I tie in with this section of code:
const onChunk = (binaryData) => {
  var formData = new FormData();
  formData.append("data", binaryData);
  let customerApi = new CustomerVideoApi();
  customerApi.uploadRecording(
    videoUUID,
    formData,
    (res) => {},
    (err) => {}
  );
};
customerApi.uploadRecording looks like this (using axios):
const uploadRecording = (uuid, data, fn, fnErr) => {
  axios
    .post(endpoint + "/stream/upload", data, {
      headers: {
        "Content-Type": "multipart/form-data",
      },
    })
    .then(function (response) {
      fn(response);
    })
    .catch(function (error) {
      fnErr(error.response);
    });
};
The HTTP request succeeds, and all is well with the world. The server-side upload code is based on Laravel:
// this is inside the controller.
public function index( Request $request )
{
    // Set file attributes.
    $filepath = '/public/chunks/';
    $file = $request->file('data');
    $filename = $uuid . ".webm";

    // streamupload
    File::streamUpload($filepath, $filename, $file, true);

    return response()->json(['uploaded' => true, 'uuid' => $uuid]);
}
// there's a service provider used to create a new macro on the File:: object, providing the facility for appropriately handling the stream:
public function boot()
{
    File::macro('streamUpload', function ($path, $fileName, $file, $overWrite = true) {
        $resource = fopen($file->getRealPath(), 'r+');

        $storageClient = new StorageClient([
            'projectId' => 'myprjectid',
            'keyFilePath' => '/my/path/to/servicejson.json',
        ]);

        $bucket = $storageClient->bucket('mybucket');
        $adapter = new GoogleStorageAdapter($storageClient, $bucket);
        $filesystem = new Filesystem($adapter);

        return $overWrite
            ? $filesystem->putStream($fileName, $resource)
            : $filesystem->writeStream($fileName, $resource);
    });
}
So to reiterate:
1. React app chunks out blobs.
2. Server side determines whether it should create or append in Google Cloud Storage.
3. Server side succeeds.
4. Video inside Google Cloud platform is corrupted.
However, the video file inside the Google Cloud container is corrupted and won't play. I'm unsure exactly why it is corrupted, but my guesses so far:
Some sort of dodgy MIME type problem. Different browsers seem to handle the codec/file type from the MediaRecorder differently: e.g. Chrome seems to be x-matroska (.mkv?), Firefox different again. Ideally I would have a .webm container; notice how I set the file name server side, and it isn't coming from the client. Should it? I'm unsure how to force the MediaRecorder to use a specific mimeType. I thought the blobOptions option should do it, but changing the extension and MIME type seems to have little to no impact on the corruption occurring.
Some sort of problem during upload where the HTTP requests don't execute and finish in order, e.g.
1. onDataAvailable completes second
2. onDataAvailable completes first
3. onDataAvailable completes third
I've sort of ruled this out because I think the chunks should be small enough.
Some sort of problem with Google Cloud Storage APIs that I'm using, perhaps in the wrong way? Does the cloud platform support streaming, and does this library send the correct params to do so?
Some sort of problem with how I'm uploading - should the axios headers be multipart formdata, or something else?
This is the package I'm using for the Server side: https://github.com/Superbalist/flysystem-google-cloud-storage
Can anyone shed any light on how to achieve this goal of streaming up into Google Cloud without the video from the MediaRecorder being corrupted? Hopefully there's enough detail here in the question to help figure it out. The problem as illustrated isn't getting the file as far as Google Cloud, but rather the resulting file being unplayable in any video format.
Update
I've ordered my chunks client side now, and queued them properly before letting them reach the server. No difference to the output. As some have suggested - a single blob upload request works fine.
Tried using the resumable config param (from reading the source code it seems like chunks need to be a certain size before Google recognises them as a resumable upload):
$filesystem = new Filesystem($adapter, [
    'resumable' => true
]);
Not sure how https://cloud.google.com/storage/docs/performing-resumable-uploads is implemented within the libraries I'm using (or within the Google Cloud APIs themselves, if at all). Do I need to implement that myself? Documentation is light on Google's part.
Short version:
The first thing you should do is buffer the whole video locally and send it as a single payload to the server and on to Google Cloud. This will validate that your code is actually correct for a small video. Once you can verify this, you can move on to handling multi-chunk uploads.
Longer version:
For starters, you aren't passing the uuid with the request, even though it's being passed into uploadRecording:
const uploadRecording = (uuid, data, fn, fnErr) => {
  axios
    .post(endpoint + "/stream/upload", data, {
      headers: {
        "Content-Type": "multipart/form-data",
      },
    })
    .then(function (response) {
      fn(response);
    })
    .catch(function (error) {
      fnErr(error.response);
    });
};
Next, you can't trust how chunking will work; I think you verified this behavior with the out-of-order chunk logging. You need to assume that your server will get chunks out of order and handle them correctly.
Each chunk you get on the server needs to be put in the right place; you can't just "writeStream", you need to write to the explicit binary block. Specifically, specify the byte range on every request. From the Google docs:
curl -i -X PUT --data-binary @CHUNK_LOCATION \
  -H "Content-Length: CHUNK_SIZE" \
  -H "Content-Range: bytes CHUNK_FIRST_BYTE-CHUNK_LAST_BYTE/TOTAL_OBJECT_SIZE" \
  "SESSION_URI"
CHUNK_LOCATION is the local path to the chunk that you're currently uploading. CHUNK_SIZE is the number of bytes you're uploading in the current request, for example 524288. CHUNK_FIRST_BYTE is the starting byte in the overall object that the chunk you're uploading contains. CHUNK_LAST_BYTE is the ending byte in the overall object that the chunk you're uploading contains. TOTAL_OBJECT_SIZE is the total size of the object you are uploading. SESSION_URI is the value returned in the Location header when you initiated the resumable upload.
Try to eliminate as many variables as possible and pinpoint where exactly the file is getting corrupted.
Since you are using a React (JS) -> Laravel (PHP) -> Google Cloud path, the first thing I would suggest is to test each step separately:
React -> Laravel: save the file on your server and check if it's corrupted at this point.
Laravel -> Google Cloud: load a file from the server filesystem, upload it to the cloud, and see if it gets corrupted.
I don't have experience with Google cloud, but I did something very similar with AWS and found that their video uploading service was extremely picky about the requests (including order of headers that were sent).
Try to compare the specs on the service you are using with your input, make the smallest possible thing that works and start adding variables until you get to the final state.
Also I don't see any kind of data ordering in your code.
If your chunks are sent close to each other, which is very likely with streaming, then there is a chance that they will arrive in a different order than originally sent. If you just append them to a file without any control over the ordering, the file will indeed get corrupted. I'm not sure whether, for webm, that would cause just parts of the video to be broken or the entire thing to die.

Read request body with Akka-Http and send each line to the message queue on an Actor

I'm googling for an example that could fit my use case, but I haven't found any so far.
I'm writing an Akka WebService that should process a potentially huge plain text request body sending each line to an Actor's incoming message queue.
Could any of you write some code here or just point me to an example page?
I actually have no idea where to start: the big problem for me is dealing with streams in general (in my case I want to use the Akka streaming library).
To get the request body you can use the extractRequestEntity directive to create your route. Once you have the entity stream you can simply dispatch each line of text to the Actor:
import akka.stream.scaladsl.Framing.delimiter
import akka.util.ByteString
import akka.actor.ActorRef
import akka.http.scaladsl.server.Route
import akka.http.scaladsl.server.Directives.{complete, extractRequestEntity, onComplete}

// an implicit ActorSystem (providing a Materializer) is assumed to be in scope

val maxLineLength = 256
val streamSplitter = delimiter(ByteString("\n"), maxLineLength)
val actorRef: ActorRef = ??? // not specified in question

val route: Route =
  extractRequestEntity { entity =>
    onComplete {
      entity
        .dataBytes
        .via(streamSplitter)
        .map(_.utf8String)
        .runForeach(line => actorRef ! line)
    } { _ =>
      complete("all lines sent to actor")
    }
  }
The question doesn't specify whether the response depends on the results of the Actor's processing, so the above example simply sends the lines to the Actor and then completes the request with a response containing a simple message.
The route can now form the basis of a server.
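As a hedged sketch (the host, port, and actor system name are assumptions; older akka-http versions use Http().bindAndHandle instead of the builder API shown here):
import akka.actor.ActorSystem
import akka.http.scaladsl.Http

implicit val system: ActorSystem = ActorSystem("line-dispatcher")

// bind the route defined above (akka-http 10.2+ API)
Http().newServerAt("localhost", 8080).bind(route)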

NSURLCache cachedresponseforrequest no data

Why is the responseCache nil? I ran this POST and want to get the responseObject from the cache. How can I get the responseCache?
manager.requestSerializer = [AFJSONRequestSerializer serializer];
manager.responseSerializer = [AFJSONResponseSerializer serializer];
manager.requestSerializer.cachePolicy = NSURLRequestReturnCacheDataElseLoad;

[manager POST:URL parameters:paramdic progress:^(NSProgress * _Nonnull uploadProgress) {
} success:^(NSURLSessionDataTask * _Nonnull task, id _Nullable responseObject) {
    NSData *data = [NSJSONSerialization dataWithJSONObject:responseObject options:NSJSONWritingPrettyPrinted error:nil];
    NSURLCache *cache = [NSURLCache sharedURLCache];
    NSCachedURLResponse *responseCache = [cache cachedResponseForRequest:task.originalRequest];
    NSCachedURLResponse *response = [[NSCachedURLResponse alloc] initWithResponse:task.response data:data userInfo:nil storagePolicy:NSURLCacheStorageAllowed];
    [cache storeCachedResponse:response forRequest:task.originalRequest];
} failure:^(NSURLSessionDataTask * _Nullable task, NSError * _Nonnull error) {
    NSLog(@"%@", error);
}];
There are three reasons it is nil at that point:
POST requests are not cached by any iOS/OS X networking code because they are not guaranteed to be idempotent (i.e. they can have side effects, such as storing data on the server). The only way a POST request will ever get stored in an NSURLCache is if you explicitly add it.
POST requests are not cached because NSURLCache uses the URL as the lookup key. Because the URL does not (cannot) include the POST body, a cached response for one POST to a URL would be returned for a different POST request to the same URL, which is almost certainly not what you would want. So if you do add it, you'll have to add some custom rewriting of the URL on its way into the cache, plus custom lookup code, to make the URLs unique enough based on specific POST body fields or whatever.
The cache is highly asynchronous, so cached data would not necessarily be available when the request's completion handler runs even if this were a GET request.
This is not necessarily a complete set of reasons. :-)
The cache is intended to reduce network traffic. You shouldn't generally consult it yourself. The normal lookup path used by NSURLSession et al performs checks for certain protocol caching policies (e.g. response expiration) that would not be performed by merely asking the cache if it has a response for a particular key.
If you need a general mechanism for storing a single response for later use by your app (rather than keeping it in memory), you should do so in your own internal dictionary (or, if the response is large, by using a download task and moving the file into a temporary folder in your app's sandbox that you purge on every launch).

Is it possible to save a file directly from a web worker?

I have an entirely browser-based (i.e. no backend) application which analyzes XML data in files which average about 250MB each. The actual parsing and analysis happens in a web worker, which is fed data in 64KB chunks by a FileReader instance, and this is all quite performant.
I have a request from the client to expand this application so that it can generate a .zip file containing the original input file and the results of the analysis, and allow the user to save that file to her local machine. Generating a .zip file in memory with those contents isn't a problem. The problem lies in transferring that much data from the web worker which generates it back to the main browser thread, so that it can be saved; attempting to do this invariably provokes a crash or out-of-memory exception. (I've tried transferring strings all at once and a chunk at a time, and I've tried using an ArrayBuffer as a transferable object to avoid copying. All fail in the same fashion.)
Unfortunately, I don't know any way to invoke a file save operation directly from a worker thread. I know several methods of doing so from the main browser thread, but all of them require either the ability to create DOM nodes (which worker threads of course can't do), or the use of interfaces (i.e. msSaveBlob, saveAs) which no browser seems to expose to a worker thread. I've spent a while looking for possibilities on the web, but found nothing usable; FileWriterSync looked good, but only Chrome supports it, and I need to target IE and Firefox as well.
Is there a method I've overlooked for saving files directly from a web worker? If so, what is it? Or am I just out of luck here?
tl;dr demo
You don't need to copy the entire file to the client side at all. You don't even need to transfer it, in fact. First a recap.
This is how to create a Blob from some typed array:
// Some arbitrary binary data
const mydata = new Uint16Array([1,2,3,4,5]);
// mydata vs. mydata.buffer does not seem to make any difference
const blob = new Blob([mydata], {type: "octet/stream"});
You can create an object URL, which is a reference to the original Blob, managed by the browser and accessible as a URL. I have done this with huge files without seeing a performance impact:
const url = URL.createObjectURL(blob);
This is how I typically download URLs:
const link = document.createElement("a");
link.download = "data.bin";
link.href = url;
link.appendChild(new Text("Download data"));
link.addEventListener("click", function() {
  this.parentNode.removeChild(this);
  // remember to free the object url, but wait until the download is handled
  setTimeout(() => { URL.revokeObjectURL(url); }, 500);
});
document.body.appendChild(link);
You can trigger the download automatically by invoking a click event on that link. I prefer to let the user decide when to download.
So, all together:
worker.js
// Some arbitrary binary data
const mydata = new Uint16Array([1,2,3,4,5]);

self.onmessage = function(e) {
  console.log("Message: ", e.data);
  switch(e.data.name) {
    case "make-download":
      const blob = new Blob([mydata.buffer], {type: "octet/stream"});
      const url = URL.createObjectURL(blob);
      self.postMessage({name: "download-link", link: url});
      break;
    default:
      console.error("Unknown message:", e.data.name);
  }
}
main.js
var worker = new Worker("worker.js");

worker.addEventListener("message", function(e) {
  switch(e.data.name) {
    case "download-link": {
      if(e.data.error) {
        console.error("Download error: ", e.data.error);
      }
      else {
        const link = document.createElement("a");
        link.download = "data.bin";
        link.href = e.data.link;
        link.appendChild(new Text("Download data"));
        link.addEventListener("click", function() {
          this.parentNode.removeChild(this);
          // remember to free the object url, but wait until the download is handled
          setTimeout(() => { URL.revokeObjectURL(e.data.link); }, 500);
        });
        document.body.appendChild(link);
      }
      break;
    }
    default:
      console.error("Unknown message:", e.data.name);
  }
});

function requestDownload() {
  worker.postMessage({name: "make-download"});
}
When I click Download in my demo, I can see the downloaded bytes in my HEX editor.
Looks just fine :)

Returning multiple items with Servlet

Good day, I'm working on a Servlet that must return a PDF file and the message log for the processing done with that file.
So far I'm passing a boolean which I evaluate and return either the log or the file, depending on the user selection, as follows:
//If user Checked the Download PDF
if (isDownload) {
    byte[] oContent = lel;
    response.setContentType("application/pdf");
    response.addHeader("Content-disposition", "attachment;filename=test.pdf");
    out = response.getOutputStream();
    out.write(oContent);
} //If user Unchecked Download PDF and only wants to see logs
else {
    System.out.println("idCompany: " + company);
    System.out.println("code: " + code);
    System.out.println("date: " + dateValid);
    System.out.println("account: " + acct);
    System.out.println("documentType: " + type);
    String result = readFile("/home/gianksp/Desktop/Documentos/Logs/log.txt");
    System.setOut(System.out);
    // Get the printwriter object from response to write the required json object to the output stream
    PrintWriter outl = response.getWriter();
    // Assuming your json object is **jsonObject**, perform the following, it will return your json object
    outl.print(result);
    outl.flush();
}
Is there an efficient way to return both items at the same time?
Thank you very much
The HTTP protocol doesn't allow you to send more than one HTTP response per HTTP request. With this restriction in mind you can think of the following alternatives:
Let the client fire two HTTP requests, for example by specifying an onclick event handler, or, if you returned an HTML page in the first response, you could fire another request on window.load or page.ready;
Provide your user with an opportunity to choose what he'd like to download and act in the servlet accordingly: if he chose PDF, return PDF; if he chose text, return text; and if he chose both, pack them into an archive and return it.
Note that the first variant is both clumsy and not user friendly, and as far as I'm concerned should be avoided at all costs. A page where the user controls what he gets is a much better alternative.
You could wrap them in a DTO object or place them in the session to reference from a JSP.
