My Akka stream stops after a single element. Here's my stream:
val firehoseSource = Source.actorPublisher[FirehoseActor.RawTweet](
  FirehoseActor.props(
    auth = ...
  )
)

val ref = Flow[FirehoseActor.RawTweet]
  .map(r => ResponseParser.parseTweet(r.payload))
  .map { t => println("Received: " + t); t }
  .to(Sink.onComplete({
    case Success(_) => logger.info("Stream completed")
    case Failure(x) => logger.error(s"Stream failed: ${x.getMessage}")
  }))
  .runWith(firehoseSource)
FirehoseActor connects to the Twitter firehose and buffers messages to a queue. When the actor receives a Request message, it takes the next element and returns it:
def receive = {
  case Request(_) =>
    logger.info("Received request for next firehose element")
    onNext(RawTweet(queue.take()))
}
The problem is that only a single tweet is printed to the console. The program doesn't quit or throw any errors, and I've sprinkled logging statements around, but none of them are printed.
I thought the sink would keep demanding elements to pull them through the flow, but that doesn't seem to be the case, since neither of the messages in Sink.onComplete is printed. I also tried Sink.ignore, but that also printed only a single element. The log message in the actor is likewise printed only once.
What sink do I need to use to make it pull elements through the flow indefinitely?
Ah I should have respected totalDemand in my actor. This fixes the issue:
def receive = {
  case Request(_) =>
    logger.info("Received request for next firehose element")
    while (totalDemand > 0) {
      onNext(RawTweet(queue.take()))
    }
}
I was expecting to receive a Request for each element in the stream, but downstream demand is batched: a single Request can cover several elements, so the publisher has to keep calling onNext while totalDemand is greater than zero.
I'm using OkHttpClient in a Kotlin app to post a file to an API for processing. While the processing is running, the API sends back messages to keep the connection alive until the result is complete, so I'm receiving the following (this is what is printed to the console using println()):
{"status":"IN_PROGRESS","transcript":null,"error":null}
{"status":"IN_PROGRESS","transcript":null,"error":null}
{"status":"IN_PROGRESS","transcript":null,"error":null}
{"status":"DONE","transcript":"Hello, world.","error":null}
I believe these are separated by a newline character, not a comma.
I figured out how to extract the data by doing the following, but is there a more technically correct way to transform this? It works, but it seems error-prone to me.
data class Status(val status: String?, val transcript: String?, val error: String?)

val myClient = OkHttpClient().newBuilder().build()
val myBody = MultipartBody.Builder().build() // plus some stuff
val myRequest = Request.Builder().url("localhost:8090").method("POST", myBody).build()
val myResponse = myClient.newCall(myRequest).execute()
val myString = myResponse.body?.string()
val myJsonString = "[${myString!!.replace("}", "},")}]".replace(",]", "]")
// Forces the response from "{key:value}{key:value}"
// into readable JSON: "[{key:value},{key:value},{key:value}]"
// but hoping there is a more technically sound way of doing this
val myTranscriptions = gson.fromJson(myJsonString, Array<Status>::class.java)
An alternative to your solution would be to use a JsonReader in lenient mode. This allows parsing JSON which does not strictly comply with the specification, such as, in your case, multiple top-level values. It also makes other aspects of parsing lenient, but maybe that is acceptable for your use case.
You could then use a single JsonReader wrapping the response stream, repeatedly call Gson.fromJson and collect the deserialized objects in a list yourself. For example:
val gson = GsonBuilder().setLenient().create()
val myTranscriptions = myResponse.body!!.use {
    val jsonReader = JsonReader(it.charStream())
    jsonReader.isLenient = true
    val transcriptions = mutableListOf<Status>()
    while (jsonReader.peek() != JsonToken.END_DOCUMENT) {
        transcriptions.add(gson.fromJson(jsonReader, Status::class.java))
    }
    transcriptions
}
Though, if the server continuously provides status updates until processing is done, it might make more sense to process each parsed status directly instead of collecting them all in a list first.
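For instance, something along these lines (an untested sketch; handleStatus stands in for whatever you do with each update, e.g. updating a progress indicator):

val gson = GsonBuilder().setLenient().create()
myResponse.body!!.use { body ->
    val jsonReader = JsonReader(body.charStream())
    jsonReader.isLenient = true
    while (jsonReader.peek() != JsonToken.END_DOCUMENT) {
        // Each status becomes available as soon as the server has sent it
        val status: Status = gson.fromJson(jsonReader, Status::class.java)
        handleStatus(status) // hypothetical callback
        if (status.status == "DONE") break // stop once the final result has arrived
    }
}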
I am using the Akka HTTP fileUpload directive, which gives me a Source[akka.util.ByteString, Any].
I would like to handle this source in two different threads, like this:
          ----> Future(check first rows if ok) -> insert object in db -> HTTP response 201 / 400
          |
source ---|
          |
          ----> Future(upload file to S3) -> set object to ready / delete if error...
So far, I managed to do something like this:
val f = for {
  uploadResult               <- Future(sendFileToS3(filePath, source))  // uploads the file
  (extractedLines, fileSize) <- Future(readFileFromS3(filePath))        // reads the uploaded file
} yield (uploadResult, extractedLines, fileSize)
onComplete(f) {
  case Success((uploadResult, extractedLines, fileSize)) if ... => // HTTP KO
  case Success((uploadResult, extractedLines, fileSize)) => // HTTP OK with id of the created object
  case Failure(ex) => // HTTP KO
}
The problem is that for large files the HTTP response is only returned once the upload has finished. What I would like instead is to handle the uploadResult separately from the check of the first lines.
Something like this:
val f = for {
  (extractedLines, fileSize) <- Future(readSource(source))
} yield (extractedLines, fileSize)

onComplete(f) {
  case Success((extractedLines, fileSize)) if ... => // HTTP KO
  case Success((extractedLines, fileSize)) =>
    Future(sendFileToS3AndHandle(filePath, source)) // send in another thread
    // HTTP OK with id of the created object
  case Failure(ex) => // HTTP KO
}
Has anyone had a similar issue and managed to handle it like this?
I have read about consuming the source twice, but it seems overcomplicated for my use case (and I did not manage to make it do what I want). I also tried akka-stream's alsoTo, but that does not solve the issue of returning the response as soon as the first-line check is completed.
Thank you for your help or suggestion.
I'd like to build a load test where the second request is fed from the first response. The data extraction is done in a method because it is more than one line of code. My problem is storing the value (id) and loading it later. How should the value be stored and loaded? I tried several different approaches and came up with this code. The documentation has not helped me.
object First {
  val first = {
    exec(http("first request")
      .post("/graphql")
      .headers(headers_0)
      .body(RawFileBody("computerdatabase/recordedsimulation/first.json"))
      .check(bodyString.saveAs("bodyResponse"))
    )
    .exec { session =>
      val response = session("bodyResponse").as[String]
      session.set("Id", getRandomValueForKey("id", response))
      session
    }
    .pause(1)
  }
}
object Second {
  val second = {
    exec(http("Second ${Id}")
      .post("/graphql")
      .headers(headers_0)
      .body(RawFileBody("computerdatabase/recordedsimulation/second.json"))
    )
    .pause(1)
  }
}
val user = scenario("User")
  .exec(
    First.first,
    Second.second
  )

setUp(user.inject(
  atOnceUsers(1)
)).protocols(httpProtocol)
Your issue is that you're not using the Session properly.
From the documentation:
Warning
Session instances are immutable!
Why is that so? Because Sessions are messages that are dealt with in a multi-threaded concurrent way, so immutability is the best way to deal with state without relying on synchronization and blocking.
A very common pitfall is to forget that set and setAll actually return new instances.
This is exactly what you're doing:
exec { session =>
  val response = session("bodyResponse").as[String]
  session.set("Id", getRandomValueForKey("id", response))
  session
}
It should be:
exec { session =>
  val response = session("bodyResponse").as[String]
  session.set("Id", getRandomValueForKey("id", response))
}
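One more thing worth checking: RawFileBody sends the file bytes as-is, so if second.json itself needs the saved id, Gatling will not substitute ${Id} inside it. ElFileBody (or a StringBody) resolves Gatling Expression Language, roughly like this (sketch based on your snippet):

exec(http("Second ${Id}")
  .post("/graphql")
  .headers(headers_0)
  .body(ElFileBody("computerdatabase/recordedsimulation/second.json")))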
I need to accept two arguments: the first is a time argument, for example "1m", "2h 42m", "1d 23h 3s"; the second is text. I thought I could convert the input string into an array and split it into two arrays, maybe with a regex: one for the parts containing "d", "h", "m" and "s", and one for everything else, then convert that back to a string. But then I realized I'll need a third, optional argument for the target channel (a DM or the channel where the command was executed), and also: what if the user wants to include "1m" in their text? (It's a reminder command.)
The easiest way to do this is to have the user separate each argument with a comma, although that creates the issue that the user can't use a comma in the text part. If that isn't an option, another way is to take the message content and strip parts of it away: first grab the time portion with a regex, then look for channel mentions and strip those away. What you're left with should be just the text.
Below is some (untested) code which could lead you in the right direction. Give it a try and let me know if you have any problems.
let msg = {
    content: "1d 3h 45m 52s I feel like 4h would be to long <#222079895583457280>",
    mentions: {
        channels: ['<#222079895583457280>']
    }
};

// Mocked objects for testing purposes; in Discord.js the pattern lives at MessageMentions.CHANNELS_PATTERN
let messageObject = {
    mentions: {
        CHANNELS_PATTERN: /<#([0-9]+)>/g
    }
};

function handleCommand(message) {
    let content = message.content;
    let timeParts = content.match(/^(([0-9])+[dhms] )+/g);
    let timePart = '';

    if (timeParts && timeParts.length) {
        // Get only the first match. We don't care about the others
        timePart = timeParts[0];
        // Remove the time part from the content
        content = content.replace(timePart, '');
    }

    // Get all the (possible) channel mentions
    let channels = message.mentions.channels;
    let channel = undefined;

    // Check if there have been any channel mentions
    if (channels.length) {
        channel = channels[0];
        // Remove each channel mention from the message content
        let channelMentions = content.match(messageObject.mentions.CHANNELS_PATTERN);
        if (channelMentions) {
            channelMentions.forEach((mention) => {
                content = content.replace(mention, '');
            });
        }
    }

    console.log('Timepart:', timePart);
    console.log('Channel:', channel, '(Using Discord.js this will return a valid channel to do stuff with)');
    console.log('Remaining text:', content);
}

handleCommand(msg);
For the messageObject.mentions.CHANNELS_PATTERN, see the Discord.js documentation for MessageMentions.CHANNELS_PATTERN.
I am using Node to copy 2 million rows from SQL Server to another database, so of course I use the "streaming" option, like this:
const sql = require('mssql')
...
const request = new sql.Request()
request.stream = true
request.query('select * from verylargetable')
request.on('row', row => {
promise = write_to_other_database(row);
})
My problem is that I have to do an asynchronous operation with each row (an insert into another database), which takes time.
The reading is faster than the writing, so the "row" events just keep coming, memory eventually fills up with pending promises, and eventually Node crashes. This is frustrating -- the whole point of "streaming" is to avoid this, isn't it?
How can I solve this problem?
To stream millions of rows without crashing, intermittently pause your request.
sql.connect(config, err => {
    if (err) console.log(err);

    const request = new sql.Request();
    request.stream = true; // You can set streaming differently for each request
    request.query('select * from dbo.YourAmazingTable'); // or request.execute(procedure)

    request.on('recordset', columns => {
        // Emitted once for each recordset in a query
        //console.log(columns);
    });

    let rowsToProcess = [];

    request.on('row', row => {
        // Emitted for each row in a recordset
        rowsToProcess.push(row);
        if (rowsToProcess.length >= 3) {
            request.pause();
            processRows();
        }
        console.log(row);
    });

    request.on('error', err => {
        // May be emitted multiple times
        console.log(err);
    });

    request.on('done', result => {
        // Always emitted as the last event
        processRows();
        //console.log(result);
    });

    const processRows = () => {
        // Process the buffered rows here, then resume the stream
        rowsToProcess = [];
        request.resume();
    };
});
The problem seems to be caused by reading the stream with "row" events, which don't let you control the flow of the stream. Flow control should be possible with the "pipe" method, but then you end up with a raw data stream and have to implement a writable stream yourself, which may be tricky.
A simple solution would be to use Scramjet, so your code would be complete in a couple of lines:
const sql = require('mssql')
const { DataStream } = require("scramjet");
//...

const request = new sql.Request()
request.stream = true
request.query('select * from verylargetable')

request.pipe(new DataStream({ maxParallel: 1 }))
    // pipe to a new DataStream with no parallel processing
    .batch(64)
    // optionally batch the rows
    .consume(async (row) => write_to_other_database(row));
    // flow control will be done automatically
Scramjet uses promises to control the flow. You can also try increasing the maxParallel option, but keep in mind that in that case the last line could start processing rows in parallel.
My own answer: instead of writing to the target database at the same time, I convert each row into an "insert" statement and push the statement to a message queue (RabbitMQ, a separate process). This is fast and can keep up with the rate of reading. Another Node process pulls from the queue (more slowly) and writes to the target database. The big backlog of rows is thus handled by the message queue itself, which is good at that sort of thing.
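Roughly, the two sides look like this (an untested sketch using the amqplib client; config, buildInsertStatement and writeToTargetDb stand in for my actual setup and helpers):

const sql = require('mssql');
const amqp = require('amqplib');

// Producer: read from SQL Server and push insert statements onto the queue
async function produce() {
    await sql.connect(config); // connection config omitted
    const conn = await amqp.connect('amqp://localhost');
    const ch = await conn.createChannel();
    await ch.assertQueue('inserts', { durable: true });

    const request = new sql.Request();
    request.stream = true;
    request.query('select * from verylargetable');
    request.on('row', row => {
        // sendToQueue just buffers locally, so it keeps up with the reading side
        ch.sendToQueue('inserts', Buffer.from(JSON.stringify(buildInsertStatement(row))));
    });
    request.on('error', err => console.log(err));
}

// Consumer (a separate Node process): drain the queue at its own pace
async function consume() {
    const conn = await amqp.connect('amqp://localhost');
    const ch = await conn.createChannel();
    await ch.assertQueue('inserts', { durable: true });
    await ch.prefetch(10); // at most 10 unacknowledged messages in flight
    ch.consume('inserts', async msg => {
        await writeToTargetDb(JSON.parse(msg.content.toString()));
        ch.ack(msg); // acknowledge only after the insert has been written
    });
}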