Backpressure is not properly handled in akka-streams

I wrote a simple stream using the akka-streams API, assuming it would handle backpressure from my source, but unfortunately it doesn't. I am sure I am doing something wrong in my source. I simply created an iterator that generates a very large number of elements, assuming it wouldn't matter because the akka-streams API would take care of backpressure. What am I doing wrong? This is my iterator:
def createData(args: Array[String]): Iterator[TimeSeriesValue] = {
  var data = new ListBuffer[TimeSeriesValue]()
  for (i <- 1 to range) {
    sessionId = UUID.randomUUID()
    for (j <- 1 to countersPerSession) {
      time = DateTime.now()
      keyName = s"Encoder-${sessionId.toString}-Controller.CaptureFrameCount.$j"
      for (k <- 1 to snapShotCount) {
        time = time.plusSeconds(2)
        fValue = new Random().nextLong()
        data += TimeSeriesValue(sessionId, keyName, time, fValue)
        totalRows += 1
      }
    }
  }
  data.iterator
}

The problem is primarily in the line
data += TimeSeriesValue(sessionId, keyName, time, fValue)
You are continuously adding to the ListBuffer with a "very large number of elements". This is chewing up all of your RAM. The data.iterator line simply wraps the massive ListBuffer in an iterator that hands out one element at a time; it is basically just a cast.
Your assumption that "it won't matter because ... of backpressure" is partially true: the akka Stream will process the TimeSeriesValue values reactively. But you are creating all of them before you even get to the Source constructor.
If you want this iterator to be "lazy", i.e. to produce values only when needed and not hold them all in memory, then make the following modifications (note: I broke apart the code to make it more readable):
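// Assumes Joda-Time's DateTime, as the question's DateTime.now() and plusSeconds suggest.
import java.util.UUID
import java.util.concurrent.ThreadLocalRandom
import org.joda.time.DateTime
def createTimeSeries(startTime: DateTime, snapShotCount: Int, sessionId: UUID, keyName: String): Iterator[TimeSeriesValue] =
  Iterator.range(1, snapShotCount + 1) // the end is exclusive, so + 1 matches "1 to snapShotCount"
    .map(_ * 2)
    .map(startTime.plusSeconds(_))
    .map(t => TimeSeriesValue(sessionId, keyName, t, ThreadLocalRandom.current().nextLong()))
// snapShotCount and countersPerSession come from the enclosing scope, as in the question.
def sessionGenerator(countersPerSession: Int, sessionId: UUID): Iterator[TimeSeriesValue] =
  Iterator.range(1, countersPerSession + 1)
    .map(j => s"Encoder-$sessionId-Controller.CaptureFrameCount.$j")
    .flatMap { keyName =>
      createTimeSeries(DateTime.now(), snapShotCount, sessionId, keyName)
    }
// An endless supply of random session IDs; take(n) bounds it.
object UUIDIterator extends Iterator[UUID] {
  def hasNext: Boolean = true
  def next(): UUID = UUID.randomUUID()
}
def iterateOverIDs(range: Int): Iterator[TimeSeriesValue] =
  UUIDIterator.take(range)
    .flatMap(sessionId => sessionGenerator(countersPerSession, sessionId))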
Each of the above functions returns an Iterator. Therefore, calling iterateOverIDs should be instantaneous, because no work is done immediately and only de minimis memory is consumed. This iterator can then be passed into your Stream...
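To make the laziness concrete, here is a minimal sketch in plain Scala with no akka dependency. The Sample case class and the constructed counter are hypothetical stand-ins (not from the question) that only exist to count how many elements have actually been built. The same kind of iterator could then be handed to akka-stream with Source.fromIterator(() => iterateOverIDs(range)), so that elements are only built as downstream demand arrives.

```scala
import java.util.concurrent.atomic.AtomicInteger

object LazyIteratorDemo {
  // Hypothetical stand-in for TimeSeriesValue; only the construction count matters here.
  final case class Sample(session: Int, counter: Int, snapshot: Int)

  // Counts how many Sample instances have been built so far.
  val constructed = new AtomicInteger(0)

  // Nested lazy iterators, mirroring the three nested loops of the question.
  def samples(sessions: Int, counters: Int, snapshots: Int): Iterator[Sample] =
    Iterator.range(1, sessions + 1).flatMap { s =>
      Iterator.range(1, counters + 1).flatMap { c =>
        Iterator.range(1, snapshots + 1).map { k =>
          constructed.incrementAndGet()
          Sample(s, c, k)
        }
      }
    }

  def main(args: Array[String]): Unit = {
    // Describes a billion elements, but builds none of them yet.
    val it = samples(1000, 1000, 1000)
    println(s"built after creating the iterator: ${constructed.get}")
    val firstFive = it.take(5).toList
    println(s"built after taking ${firstFive.size} elements: ${constructed.get}")
  }
}
```

With the ListBuffer version from the question, the equivalent call would have to allocate every element before returning, which is where the memory goes.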

Related

Kotlin Parsing json array with new line separator

I'm using OKHttpClient in a Kotlin app to post a file to an API that gets processed. While the process is running the API is sending back messages to keep the connection alive until the result has been completed. So I'm receiving the following (this is what is printed out to the console using println())
{"status":"IN_PROGRESS","transcript":null,"error":null}
{"status":"IN_PROGRESS","transcript":null,"error":null}
{"status":"IN_PROGRESS","transcript":null,"error":null}
{"status":"DONE","transcript":"Hello, world.","error":null}
Which I believe is being separated by a new line character, not a comma.
I figured out how to extract the data by doing the following but is there a more technically correct way to transform this? I got it working with this but it seems error-prone to me.
data class Status (val status : String?, val transcript : String?, val error : String?)
val myClient = OkHttpClient ().newBuilder ().build ()
val myBody = MultipartBody.Builder ().build () // plus some stuff
val myRequest = Request.Builder ().url ("localhost:8090").method ("POST", myBody).build ()
val myResponse = myClient.newCall (myRequest).execute ()
val myString = myResponse.body?.string ()
val myJsonString = "[${myString!!.replace ("}", "},")}]".replace (",]", "]")
// Forces the response from "{key:value}{key:value}"
// into a readable json format "[{key:value},{key:value},{key:value}]"
// but hoping there is a more technically sound way of doing this
val myTranscriptions = gson.fromJson (myJsonString, Array<Status>::class.java)
An alternative to your solution would be to use a JsonReader in lenient mode. This allows parsing JSON which does not strictly comply with the specification, such as in your case multiple top level values. It also makes other aspects of parsing lenient, but maybe that is acceptable for your use case.
You could then use a single JsonReader wrapping the response stream, repeatedly call Gson.fromJson and collect the deserialized objects in a list yourself. For example:
val gson = GsonBuilder().setLenient().create()
val myTranscriptions = myResponse.body!!.use {
    val jsonReader = JsonReader(it.charStream())
    jsonReader.isLenient = true
    val transcriptions = mutableListOf<Status>()
    while (jsonReader.peek() != JsonToken.END_DOCUMENT) {
        transcriptions.add(gson.fromJson(jsonReader, Status::class.java))
    }
    transcriptions
}
Though, if the server continuously provides status updates until processing is done, then maybe it would make more sense to process each parsed status directly instead of collecting them all in a list first.

Summary of ArrayList ordering in Kotlin (Android)

I am trying to provide a summary of items within an ArrayList (where order matters). Basically, I am setting up an exercise plan with two different types of activities (Training and Assessment). I then will provide a summary of the plan after adding each training/assessment to it.
The structure I have is something along the lines of:
exercisePlan: [
{TRAINING OBJECT},
{TRAINING OBJECT},
{ASSESSMENT OBJECT},
{TRAINING OBJECT}
]
What I want to be able to do is summarise this in a format of:
2 x Training, 1 x Assessment, 1 x Training, which will be displayed in a TextView in a Fragment. So I will have an arbitrarily long string that details the structure and order of the exercise plan.
I have tried to investigate using a HashMap or a plain ArrayList, but it seems pretty messy so I'm looking for a much cleaner way (perhaps a MutableList). Thanks in advance!
ArrayList is just a specific type of MutableList. It's usually preferable to use a plain List, because mutability can make code a little more complex to work with and keep robust.
I'd create a list of some class that wraps an action and the number of consecutive times to do it.
enum class Activity {
Training, Assessment
}
data class SummaryPlanStep(val activity: Activity, val consecutiveTimes: Int) {
override fun toString() = "$consecutiveTimes x $activity"
}
If you want to start with your summary, you can create it and later convert it to a plain list of activities like this:
val summary: List<SummaryPlanStep> = listOf(
SummaryPlanStep(Activity.Training, 2),
SummaryPlanStep(Activity.Assessment, 1),
SummaryPlanStep(Activity.Training, 1),
)
val plan: List<Activity> = summary.flatMap { List(it.consecutiveTimes) { _ -> it.activity } }
If you want to do it the other way around, it's more involved because I don't think there's a built-in way to group consecutive duplicate elements. You could write a function for that.
fun <T> List<T>.groupConsecutiveDuplicates(): List<Pair<T, Int>> {
    if (isEmpty()) return emptyList()
    val outList = mutableListOf<Pair<T, Int>>()
    var current = first() to 1
    for (i in 1 until size) {
        val item = this[i]
        current = if (item == current.first)
            current.first to (current.second + 1)
        else {
            outList.add(current)
            item to 1
        }
    }
    outList.add(current)
    return outList
}
val plan: List<Activity> = listOf(
Activity.Training,
Activity.Training,
Activity.Assessment,
Activity.Training
)
val summary: List<SummaryPlanStep> = plan.groupConsecutiveDuplicates().map { SummaryPlanStep(it.first, it.second) }
This is what I have set up to work for me at the moment:
if (exercisePlanSummary.isNotEmpty() && exercisePlanSummary[exercisePlanSummary.size - 1].containsKey(trainingAssessment)) {
exercisePlanSummary[exercisePlanSummary.size - 1][trainingAssessment] = exercisePlanSummary[exercisePlanSummary.size - 1][trainingAssessment]!! + 1
} else {
exercisePlanSummary.add(hashMapOf(trainingAssessment to 1))
}
var textToDisplay = ""
exercisePlanSummary.forEach {
textToDisplay = if (textToDisplay.isNotEmpty()) {
textToDisplay.plus(", ${it.values.toList()[0]} x ${it.keys.toList()[0].capitalize()}")
} else {
textToDisplay.plus("${it.values.toList()[0]} x ${it.keys.toList()[0].capitalize()}")
}
}
where trainingAssessment is a String of "training" or "assessment", and exercisePlanSummary is an ArrayList<HashMap<String, Int>>.
What @Tenfour04 has written above is perhaps more appropriate, and a cleaner way of implementing this. But my method is quite simple.

Controlling order of processed elements within CoProcessFunction using custom sources

For testing purposes, I am using the following custom source:
class ThrottledSource[T](
  data: Array[T],
  throttling: Int,
  beginWaitingTime: Int = 0,
  endWaitingTime: Int = 0
) extends SourceFunction[T] {
  private var isRunning = true
  private var offset = 0
  override def run(ctx: SourceFunction.SourceContext[T]): Unit = {
    Thread.sleep(beginWaitingTime)
    val lock = ctx.getCheckpointLock
    while (isRunning && offset < data.length) {
      lock.synchronized {
        ctx.collect(data(offset))
        offset += 1
      }
      Thread.sleep(throttling)
    }
    Thread.sleep(endWaitingTime)
  }
  override def cancel(): Unit = isRunning = false
}
and using it like this within my test
val controls = new ThrottledSource[Control](
  data = Array(c1, c2), endWaitingTime = 10000, throttling = 0
)
val events = new ThrottledSource[Event](
  data = Array(e1, e2, e3, e4, e5),
  throttling = 1000,
  beginWaitingTime = 2000,
  endWaitingTime = 2000
)
val dataStream = env.addSource(events)
env.addSource(controls)
  .connect(dataStream)
  .process(MyProcessFunction)
My intent is to get all the control elements first (that is why I don't specify any beginWaitingTime or throttling for the control source). In processElement1 and processElement2 within MyProcessFunction I print the elements as I receive them. Most of the time I get the two control elements first, as expected, but to my surprise I occasionally get data elements first, despite the two-second delay before the data source starts emitting. Can anyone explain this to me?
The control and data stream source operators are running in different threads, and as you've seen, there's no guarantee that the source instance running the control stream will get a chance to run before the instance running the data stream.
You could look at the answer here and its associated code on GitHub for one way to accomplish this reliably.

How to buffer and drop a chunked bytestring with a delimiter?

Let's say you have a publisher using broadcast with some fast and some slow subscribers, and you would like to be able to drop sets of messages for the slow subscriber without having to keep them in memory. The data consists of chunked ByteStrings, so dropping a single ByteString is not an option.
Each set of ByteStrings is followed by a terminator ByteString("\n"), so I would need to drop a set of ByteStrings ending with that.
Is that something you can do with a custom graph stage? Can it be done without aggregating and keeping the whole set in memory?
Avoid Custom Stages
Whenever possible, try to avoid custom stages: they are very tricky to get correct and are pretty verbose. Usually some combination of the standard akka-stream stages and plain old functions will do the trick.
Group Dropping
Presumably you have some criteria that you will use to decide which group of messages will be dropped:
type ShouldDropTester = () => Boolean
For demonstration purposes I will use a simple switch that drops every other group:
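val dropEveryOther : ShouldDropTester = {
  // Created once, so every call advances the same sequence: false, true, false, ...
  val alternating = Iterator.from(1).map(_ % 2 == 0)
  () => alternating.next()
}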
We will also need a function that will take in a ShouldDropTester and use it to determine whether an individual ByteString should be dropped:
val endOfFile = ByteString("\n")
val dropGroupPredicate : ShouldDropTester => ByteString => Boolean =
  (shouldDropTester) => {
    var dropGroup = shouldDropTester()
    (byteString) =>
      if (byteString equals endOfFile) {
        // The terminator belongs to the current group; the next group gets a fresh decision.
        val returnValue = dropGroup
        dropGroup = shouldDropTester()
        returnValue
      }
      else {
        dropGroup
      }
  }
Combining the above two functions will drop every other group of ByteStrings. This functionality can then be converted into a Flow:
val filterPredicateFunction : ByteString => Boolean =
  dropGroupPredicate(dropEveryOther)
// The predicate returns true for ByteStrings that should be dropped, hence filterNot.
val dropGroups : Flow[ByteString, ByteString, _] =
  Flow[ByteString] filterNot filterPredicateFunction
As required, the groups of messages do not need to be buffered: the predicate works on individual ByteStrings and therefore consumes a constant amount of memory regardless of file size.
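The behaviour of the predicate can be checked without akka. The following is a minimal sketch that swaps ByteString for String and Flow for a plain List (both substitutions are assumptions made purely so the example is self-contained); the names everyOther and keepGroups are hypothetical. The predicate returns true for chunks that belong to a group to be dropped, so it is applied with filterNot.

```scala
object DropGroupsDemo {
  type ShouldDropTester = () => Boolean

  // Stand-in for ByteString("\n"): the chunk that terminates each group.
  val terminator = "\n"

  // A fresh tester that answers false, true, false, ... on successive calls.
  def everyOther(): ShouldDropTester = {
    val alternating = Iterator.from(1).map(_ % 2 == 0)
    () => alternating.next()
  }

  // Returns true for every chunk of a group that should be dropped.
  // Only one Boolean of state is kept, so memory use is constant.
  def dropGroupPredicate(tester: ShouldDropTester): String => Boolean = {
    var dropGroup = tester()
    chunk =>
      if (chunk == terminator) {
        // The terminator shares the fate of its group, then the next group is decided.
        val result = dropGroup
        dropGroup = tester()
        result
      } else dropGroup
  }

  def keepGroups(chunks: List[String]): List[String] =
    chunks.filterNot(dropGroupPredicate(everyOther()))

  def main(args: Array[String]): Unit = {
    val chunks = List("a", "b", "\n", "c", "\n", "d", "e", "\n")
    // The second group ("c" and its terminator) is removed; the first and third remain.
    println(keepGroups(chunks).map(_.replace("\n", "\\n")).mkString(" "))
  }
}
```

Because the predicate is stateful, a fresh one should be created per stream materialization rather than shared between subscribers.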

How to increment a variable in Gatling Loop

I am trying to write a Gatling script where I read a starting number from a CSV file and loop through, say 10 times. In each iteration, I want to increment the value of the parameter.
It looks like some Scala or Java math is needed, but I could not find information on how to do it, or on how and where to combine Gatling EL with Scala or Java.
Appreciate any help or direction.
var numloop = new java.util.concurrent.atomic.AtomicInteger(0)
val scn = scenario("Scenario Name")
  .asLongAs(_ => numloop.getAndIncrement() < 3, exitASAP = false) {
    feed(csv("ids.csv")) // read ${ID} from the file
      .exec(http("request")
        .get("""http://finance.yahoo.com/q?s=${ID}""")
        .headers(headers_1))
      .pause(284 milliseconds)
    // How to increment ID for the next iteration and pass in the .get method?
  }
You copy-pasted this code from Gatling's Google Group but this use case was very specific.
Did you first properly read the documentation regarding loops? What's your use case and how doesn't it fit with basic loops?
Edit: So the question is: how do I get a unique id per loop iteration and per virtual user?
You can compute one for the loop index and a virtual user id. Session already has a unique ID but it's a String UUID, so it's not very convenient for what you want to do.
// first, let's build a Feeder that sets a numeric id:
val userIdFeeder = Iterator.from(0).map(i => Map("userId" -> i))
val iterations = 1000
// set this userId on every virtual user
feed(userIdFeeder)
  // loop and define the loop index
  .repeat(iterations, "index") {
    // set a new attribute named "id"
    exec { session =>
      val userId = session("userId").as[Int]
      val index = session("index").as[Int]
      val id = iterations * userId + index
      session.set("id", id)
    }
    // use id attribute, for example with EL ${id}
  }
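The arithmetic behind iterations * userId + index can be checked in isolation. This is a small sketch in plain Scala (no Gatling needed); the object name UniqueIdDemo and the helper allIds are hypothetical. Each virtual user owns a block of `iterations` consecutive ids, so no two (userId, index) pairs collide as long as index stays below iterations.

```scala
object UniqueIdDemo {
  // Same formula as in the answer above.
  def id(iterations: Int, userId: Int, index: Int): Int =
    iterations * userId + index

  // Every id produced by `users` virtual users looping `iterations` times,
  // with userId and index both starting at 0.
  def allIds(users: Int, iterations: Int): List[Int] =
    (for {
      userId <- 0 until users
      index  <- 0 until iterations
    } yield id(iterations, userId, index)).toList

  def main(args: Array[String]): Unit = {
    val ids = allIds(users = 3, iterations = 4)
    println(ids.mkString(", "))
    println(s"all distinct: ${ids.distinct.size == ids.size}")
  }
}
```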
Here is my answer to this:
Problem statement: I had to repeat the Gatling execution a configured number of times, and my step name had to be dynamic.
object UrlVerifier {
  val count = new java.util.concurrent.atomic.AtomicInteger(0)
  val baseUrl = Params.applicationBaseUrl
  val accessUrl = repeat(Params.noOfPagesToBeVisited, "index") {
    exec(session => {
      val randomUrls: List[String] = UrlFeeder.getUrlsToBeTested()
      session.set("index", count.getAndIncrement).set("pageToTest", randomUrls(session("index").as[Int]))
    })
      .exec(http("Accessing Page ${pageToTest}")
        .get(baseUrl + "${pageToTest}")
        .check(status.is(200)))
      .pause(Params.timeToPauseInSeconds)
  }
}
So basically UrlFeeder gives me a list of Strings (the URLs to be tested). In the exec, we use count (an AtomicInteger) to populate a variable named 'index', whose value starts at 0 and is advanced with getAndIncrement in each iteration. This 'index' variable is the one used within the repeat() loop, since we specify 'index' as the name of the counter variable.
Hope it helps others as well.
