Akka-Stream stream within stream

I am trying to figure out how to handle a situation where, in one of your stages, you need to make a call that returns an InputStream, and you then want to treat that stream as the Source feeding the stages further down.
e.g.
Source.map(e => /* call that returns an InputStream */)
  .via(processingFlow).runWith(Sink.ignore)
I would like the elements going into processingFlow to be those coming from the InputStream. This is basically a situation where I am tailing a file and reading each line; each line gives me the information for a call I need to make against a CLI API, and when making that call I get stdout back as an InputStream from which to read the result. Results are going to be huge most of the time, so I can't just collect the whole thing in memory.

You can use the StreamConverters utilities to get Sources and Sinks from java.io streams. More info here.
You can use flatMapConcat or flatMapMerge to flatten a stream of Sources into a single stream. More info here.
A quick example could be:
import java.io.InputStream
import akka.NotUsed
import akka.stream.scaladsl.{Flow, Sink, Source, StreamConverters}
import akka.util.ByteString

val source: Source[String, NotUsed] = ???
def gimmeInputStream(name: String): InputStream = ???
val processingFlow: Flow[ByteString, ByteString, NotUsed] = ???

source
  .map(gimmeInputStream)
  .flatMapConcat(is ⇒ StreamConverters.fromInputStream(() ⇒ is, chunkSize = 8192))
  .via(processingFlow)
  .runWith(Sink.ignore)
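For the CLI use case described in the question, gimmeInputStream could for instance start the external process and hand its stdout back to the stream. A minimal sketch, where the mycli command and the way each line maps to its arguments are assumptions:

// Hypothetical: run an external CLI per line and expose its stdout;
// StreamConverters.fromInputStream above then reads it chunk by chunk.
def gimmeInputStream(line: String): InputStream =
  new java.lang.ProcessBuilder("mycli", line).start().getInputStream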
However, Akka Streams offers a more idiomatic DSL for reading/writing files in the FileIO object. More info here.
The example becomes:
import java.nio.file.Paths
import akka.stream.scaladsl.FileIO

val source: Source[String, NotUsed] = ???
val processingFlow: Flow[ByteString, ByteString, NotUsed] = ???

source
  .flatMapConcat(name ⇒ FileIO.fromPath(Paths.get(name)))
  .via(processingFlow)
  .runWith(Sink.ignore)
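If the relative order of lines coming from different files doesn't matter, flatMapMerge can read several of them concurrently instead. A sketch, with the breadth of 4 chosen arbitrarily:

source
  .flatMapMerge(breadth = 4, name ⇒ FileIO.fromPath(Paths.get(name)))
  .via(processingFlow)
  .runWith(Sink.ignore)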

Related

Akka stream each element to ftp sink

I want to write each element in an Akka stream to a (different) FTP file. Using Alpakka I can write each element to the same file using an FTP sink. However, I cannot seem to figure out how to write each element to a different file.
source.map(el -> /* to byte string */).to(Ftp.toPath("/file.xml", settings));
So every el should end up in a different file.
If you want to use the Alpakka FTP sink, you have to do something along the lines of
def sink(name: String): Sink[ByteString, Future[IOResult]] = Ftp.toPath(s"$name.txt", settings)

source.runForeach(s ⇒ Source.single(ByteString(s)).runWith(sink(s)))
Otherwise, you'll need to create your own sink: a custom graph stage that establishes the FTP connection and writes the data in its input handler. More info about this can be found in the docs.
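If you also want to know when each upload completes, the per-element approach above can be expressed with mapAsync, which keeps the FTP sink's materialized Future[IOResult] in the stream. A sketch under the same assumptions as the snippet above:

source
  .mapAsync(parallelism = 1)(s ⇒ Source.single(ByteString(s)).runWith(sink(s)))
  .runWith(Sink.ignore)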

Apache Flink DataStream API doesn't have a mapPartition transformation

Spark's DStream has a mapPartition API, while Flink's DataStream API doesn't. Could anyone help explain the reason? What I want to do is implement an API similar to Spark's reduceByKey on Flink.
Flink's stream processing model is quite different from Spark Streaming, which is centered around mini batches. In Spark Streaming each mini batch is executed like a regular batch program on a finite set of data, whereas Flink DataStream programs continuously process records.
In Flink's DataSet API, a MapPartitionFunction has two parameters: an iterator for the input and a collector for the result of the function. A MapPartitionFunction in a Flink DataStream program would never return from the first function call, because the iterator would iterate over an endless stream of records. However, Flink's internal stream processing model requires that user functions return in order to checkpoint function state. Therefore, the DataStream API does not offer a mapPartition transformation.
In order to implement functionality similar to Spark Streaming's reduceByKey, you need to define a keyed window over the stream. Windows discretize streams, which is somewhat similar to mini batches, but windows offer much more flexibility. Since a window is of finite size, you can apply a reduce function on the window.
This could look like:
yourStream.keyBy("myKey") // organize stream by key "myKey"
.timeWindow(Time.seconds(5)) // build 5 sec tumbling windows
.reduce(new YourReduceFunction); // apply a reduce function on each window
The DataStream documentation shows how to define various window types and explains all available functions.
Note: The DataStream API has been reworked recently. The example assumes the latest version (0.10-SNAPSHOT), which will be released as 0.10.0 in the next few days.
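For completeness, a roughly equivalent sketch in the Scala DataStream API; the (String, Int) element type, the sample elements, and the 5 second window are assumptions for illustration:

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

val env = StreamExecutionEnvironment.getExecutionEnvironment
// hypothetical input: (key, count) pairs
val yourStream: DataStream[(String, Int)] = env.fromElements(("a", 1), ("b", 2), ("a", 3))

yourStream
  .keyBy(_._1)                            // organize the stream by key
  .timeWindow(Time.seconds(5))            // 5 second tumbling windows
  .reduce((x, y) => (x._1, x._2 + y._2))  // reduceByKey-like aggregation per window
  .print()

env.execute("windowed reduce")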
Assuming your input stream is single-partition data (say, String):
val new_number_of_partitions = 4

// the line below repartitions your data; you can also broadcast data to all partitions
val step1stream = yourStream.rescale.setParallelism(new_number_of_partitions)

// flexibility for mapping
val step2stream = step1stream.map(new RichMapFunction[String, (String, Int)] {
  // var local_val_to_different_part: Type = null
  var myTaskId: Int = _

  // the function below is executed once per mapper instance (one mapper per partition)
  override def open(config: Configuration): Unit = {
    myTaskId = getRuntimeContext.getIndexOfThisSubtask
    // do whatever initialization you want to do, e.g. read from data sources
  }

  def map(value: String): (String, Int) = {
    (value, myTaskId)
  }
})

val step3stream = step2stream.keyBy(0).countWindow(new_number_of_partitions).sum(1).print()
// instead of sum(1), you can use .reduce((x, y) => (x._1, x._2 + y._2))
// .countWindow will first wait for a certain number of records for a particular key
// and then apply the function
Flink streaming is pure streaming (not mini-batched). Take a look at the Iterate API.

Serving a file efficiently using Play 2.3

I need to serve some content from an Action in the form of a file: basically, I am creating CSV content on the fly and sending it to the client.
I cannot do it using sendFile, since the file does not really exist; I tried using chunked transfer, but I get a really slow response (on localhost I got the file at about 100KB/s, which I think is really strange).
Is there a way for me to set the content type and write the response "line by line", without having to specify the content length "a priori"?
Here's one way using a simple predefined Enumerator that will produce the response from bytes written to an OutputStream:
def csv = Action {
  val enumerator = Enumerator.outputStream { out =>
    out.write(...)
    // Keep writing to the Enumerator
    out.close()
  }
  Ok.chunked(enumerator.andThen(Enumerator.eof)).withHeaders(
    "Content-Type" -> "text/csv",
    "Content-Disposition" -> s"attachment; filename=test.csv"
  )
}
This is simple enough for relatively small files (or if the process of generating the file is slow by nature). Note, however, that per the documentation this has no back-pressure: pushing a large amount of data into the OutputStream can quickly fill up memory if the client can't download it quickly enough.
Update:
After testing this some more, it seems like the size of the byte arrays you write to the OutputStream makes a huge difference in throughput.
Using this sample stream:
val s = Stream.continually(0.toByte)
Writing in chunks of 1KB to the OutputStream like this resulted in 6MB/s of throughput:
(0 until 1024 * 1024).foreach { i =>
  out.write(s.take(1024).toArray)
}
However, if I only write 10 bytes at a time, the throughput slows to less than 100KB/s. So my suggestion for using this method to write CSVs in chunked form would be to write multiple rows at a time to the OutputStream rather than one row at a time.
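For example, a variation of the action above that writes rows in batches; the rows iterator and the batch size of 100 are assumptions for illustration:

def csvBatched(rows: Iterator[String]) = Action {
  val enumerator = Enumerator.outputStream { out =>
    // write rows in batches so each call pushes a few KB instead of a few bytes
    rows.grouped(100).foreach { batch =>
      out.write(batch.mkString("", "\n", "\n").getBytes("UTF-8"))
    }
    out.close()
  }
  Ok.chunked(enumerator.andThen(Enumerator.eof)).withHeaders(
    "Content-Type" -> "text/csv",
    "Content-Disposition" -> "attachment; filename=test.csv"
  )
}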

Download File (using Thread class)

OK, I understand that this may be a very stupid question, but I have never done it before, so I'm asking. How can I download a file (let's say, from the internet) using the Thread class?
What do you mean by "using the Thread class"? I guess you want to download a file in a separate thread so it does not block your UI or some other part of your program.
I'll assume that you're using C++ and the WinAPI.
First create a thread. This tutorial provides good information about Win32 threads.
This thread will be responsible for downloading the file. To do this you simply connect to the web server on port 80 and send an HTTP GET request for the file you want. It could look similar to this (note the newline characters):
GET /path/to/your/file.jpg HTTP/1.1\r\n
Host: www.host.com\r\n
Connection: close\r\n
\r\n
\r\n
The server will then answer with an HTTP response containing the file, preceded by a header. Parse this header and read the contents.
More information on HTTP can be found here.
I would suggest that you do not use threads for downloading files. It's better to use asynchronous constructs that are more targeted towards I/O, since they incur a lower overhead than threads. I don't know what version of the .NET Framework you are working with, but in 4.5, something like this should work:
private static Task DownloadFileAsync(string uri, string localPath)
{
    // Get the http request
    HttpWebRequest webRequest = WebRequest.CreateHttp(uri);

    // Get the http response asynchronously
    return webRequest.GetResponseAsync()
        .ContinueWith(task =>
        {
            // When the GetResponseAsync task is finished, we will come
            // into this continuation (which is an anonymous method).

            // Check if the GetResponseAsync task failed.
            if (task.IsFaulted)
            {
                Console.WriteLine(task.Exception);
                return null;
            }

            // Get the web response.
            WebResponse response = task.Result;

            // Open a file stream for the local file.
            FileStream localStream = File.OpenWrite(localPath);

            // Copy the contents from the response stream to the
            // local file stream asynchronously.
            return response.GetResponseStream().CopyToAsync(localStream)
                .ContinueWith(streamTask =>
                {
                    // When the CopyToAsync task is finished, we come
                    // to this continuation (which is also an anonymous
                    // method).

                    // Flush and dispose the local file stream. There
                    // is a FlushAsync method that will flush
                    // asynchronously, returning yet another task, but
                    // for the sake of brevity I use the synchronous
                    // method here.
                    localStream.Flush();
                    localStream.Dispose();

                    // Don't forget to check if the previous task
                    // failed or not.
                    // All Task exceptions must be observed.
                    if (streamTask.IsFaulted)
                    {
                        Console.WriteLine(streamTask.Exception);
                    }
                });

            // since we end up with a task returning a task we should
            // call Unwrap to return a single task representing the
            // entire operation
        }).Unwrap();
}
You would want to elaborate a bit on the error handling. See the code comments for a more detailed explanation of how it works.

Read callbacks needed by AudioFileInitializeWithCallbacks? Apple AudioFile API

I am trying to write a lowish-level audio writer with the AudioFile & ExtAudioFile APIs. I am creating a new audio file with AudioFileInitializeWithCallbacks, but it appears that this needs read & get-size callbacks implemented. Why can't this just accept a single write callback and trust that the data has been written successfully?
What if I am writing to a stream which I cannot seek into, such as a CD or a network socket?
Surely this should just continually push data to the write callback, and it is my responsibility to write this data where needed, returning an error code if the operation didn't succeed.
The docs for AudioFile_SetSizeProc and AudioFile_WriteProc appear to be incorrect, as they both talk about read operations: "inPosition An offset into the data from which to read.", "#result The callback should return the size of the data.".
At the moment I have got past this by only writing to a file, but I get a kExtAudioFileError_InvalidOperationOrder after the first write procedure. What does this mean? There are no comments in the docs about it.
Any pointers or help would be much appreciated.
Apple documentation is wrong here. Check the header file AudioFile.h:
/*!
#typedef AudioFile_SetSizeProc
#abstract A callback for setting the size of the file data. used with AudioFileOpenWithCallbacks or AudioFileInitializeWithCallbacks.
#discussion a function that will be called when AudioFile needs to set the size of the file data. This size is for all of the
data in the file, not just the audio data. This will only be called if the file is written to.
#param inClientData A pointer to the client data as set in the inClientData parameter to AudioFileXXXWithCallbacks.
#result The callback should return the size of the data.
*/
typedef OSStatus (*AudioFile_SetSizeProc)(
void * inClientData,
SInt64 inSize);
