Akka HTTP + Akka Stream 100% CPU utilization - akka-stream

I have a web API exposing one GET endpoint using Akka HTTP. The logic takes a parameter from the requester, calls an external web service using Akka Streams, and, based on the response, queries another endpoint, also using Akka Streams.
The first external endpoint call looks like this:
def poolFlow(uri: String): Flow[(HttpRequest, T), (Try[HttpResponse], T), HostConnectionPool] =
  Http().cachedHostConnectionPool[T](host = uri, 80)

def parseResponse(parallelism: Int): Flow[(Try[HttpResponse], T), (ByteString, T), NotUsed] =
  Flow[(Try[HttpResponse], T)].mapAsync(parallelism) {
    case (Success(HttpResponse(_, _, entity, _)), t) =>
      entity.dataBytes.alsoTo(Sink.ignore)
        .runFold(ByteString.empty)(_ ++ _)
        .map(e => e -> t)
    case (Failure(ex), _) => throw ex
  }
def parse(result: String, data: RequestShape): (Coord, Coord, String) =
(data.src, data.dst, result)
val parseEntity: Flow[(ByteString, RequestShape), (Coord, Coord, String), NotUsed] =
  Flow[(ByteString, RequestShape)] map {
    case (entity, request) => parse(entity.utf8String, request)
  }
And the stream consumer:
val routerResponse = httpRequests
  .map(buildHttpRequest)
  .via(RouterRequestProcessor.poolFlow(uri)).async
  .via(RouterRequestProcessor.parseResponse(2))
  .via(RouterRequestProcessor.parseEntity)
  .alsoTo(Sink.ignore)
  .runFold(Vector[(Coord, Coord, String)]()) {
    (acc, res) => acc :+ res
  }
routerResponse
Then I do some calculations on routerResponse and create a POST to the other external web service.
The second external stream consumer:
def poolFlow(uri: String): Flow[(HttpRequest, Unit), (Try[HttpResponse], Unit), Http.HostConnectionPool] =
  Http().cachedHostConnectionPoolHttps[Unit](host = uri)

val parseEntity: Flow[(ByteString, Unit), (Unit.type, String), NotUsed] = Flow[(ByteString, Unit)] map {
  case (entity, _) => parse(entity.utf8String)
}

def parse(result: String): (Unit.type, String) = (Unit, result)
val res = Source.single(httpRequest)
  .via(DataRobotRequestProcessor.poolFlow(uri))
  .via(DataRobotRequestProcessor.parseResponse(1))
  .via(DataRobotRequestProcessor.parseEntity)
  .alsoTo(Sink.ignore)
  .runFold(List[String]()) {
    (acc, res) => acc :+ res._2
  }
The GET endpoint consumes the first stream and then builds the second request based on the first response.
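For context, here is a minimal sketch of how such an endpoint might chain the two calls. The helper names callRouter and callDataRobot (and the "plan" path with "src"/"dst" parameters) are assumptions standing in for the two stream consumers above, not the original code:
import akka.http.scaladsl.server.Directives._
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical wrappers: each runs one of the streams shown above and returns its folded result.
def callRouter(src: String, dst: String): Future[Vector[(Coord, Coord, String)]] =
  Future.successful(Vector.empty) // stub standing in for the first stream
def callDataRobot(routerResult: Vector[(Coord, Coord, String)]): Future[List[String]] =
  Future.successful(Nil) // stub standing in for the second stream

def route(implicit ec: ExecutionContext) =
  (get & path("plan") & parameters("src", "dst")) { (src, dst) =>
    // the second request is built from the first response
    val response = callRouter(src, dst).flatMap(callDataRobot).map(_.mkString("\n"))
    complete(response)
  }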
Notes:
The first external service is fast (1-2 seconds response time) and the second external service is slow (3-4 seconds response time).
The first endpoint is queried with parallelism=2 and the second endpoint with parallelism=1.
The service is running on an AWS ECS cluster, and for test purposes it runs on a single node.
The problem:
The web service works for some time, but CPU utilization climbs as it handles more requests. I would assume something related to backpressure is being triggered, but strangely the CPU stays highly utilized even after no requests are being sent.
Does anybody have a clue what's going on?

Related

Kotlin: Parsing JSON array with newline separator

I'm using OkHttpClient in a Kotlin app to post a file to an API for processing. While the process is running, the API sends back messages to keep the connection alive until the result has been completed. So I'm receiving the following (this is what is printed to the console using println()):
{"status":"IN_PROGRESS","transcript":null,"error":null}
{"status":"IN_PROGRESS","transcript":null,"error":null}
{"status":"IN_PROGRESS","transcript":null,"error":null}
{"status":"DONE","transcript":"Hello, world.","error":null}
I believe these are separated by a newline character, not a comma.
I figured out how to extract the data by doing the following, but is there a more technically correct way to transform this? It works, but it seems error-prone to me.
data class Status(val status: String?, val transcript: String?, val error: String?)

val myClient = OkHttpClient().newBuilder().build()
val myBody = MultipartBody.Builder().build() // plus some stuff
val myRequest = Request.Builder().url("http://localhost:8090").method("POST", myBody).build()
val myResponse = myClient.newCall(myRequest).execute()
val myString = myResponse.body?.string()
val myJsonString = "[${myString!!.replace("}", "},")}]".replace(",]", "]")
// Forces the response from "{key:value}{key:value}"
// into a readable json format "[{key:value},{key:value},{key:value}]"
// but hoping there is a more technically sound way of doing this
val myTranscriptions = gson.fromJson(myJsonString, Array<Status>::class.java)
An alternative to your solution would be to use a JsonReader in lenient mode. This allows parsing JSON which does not strictly comply with the specification, such as in your case multiple top level values. It also makes other aspects of parsing lenient, but maybe that is acceptable for your use case.
You could then use a single JsonReader wrapping the response stream, repeatedly call Gson.fromJson and collect the deserialized objects in a list yourself. For example:
val gson = GsonBuilder().setLenient().create()
val myTranscriptions = myResponse.body!!.use {
    val jsonReader = JsonReader(it.charStream())
    jsonReader.isLenient = true
    val transcriptions = mutableListOf<Status>()
    while (jsonReader.peek() != JsonToken.END_DOCUMENT) {
        transcriptions.add(gson.fromJson(jsonReader, Status::class.java))
    }
    transcriptions
}
Though, if the server continuously provides status updates until processing is done, then maybe it would make more sense to process each parsed status directly instead of collecting them all in a list before processing them.

Flink side output for late data missing

This is my application code:
object StreamingJob {
  def main(args: Array[String]) {
    // set up the streaming execution environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // define EventTime and Watermark
    var sensorData: DataStream[SensorReading] = env.addSource(new SensorSource).assignTimestampsAndWatermarks(
      WatermarkStrategy
        .forBoundedOutOfOrderness[SensorReading](Duration.ofSeconds(0))
        .withTimestampAssigner(new SerializableTimestampAssigner[SensorReading] {
          override def extractTimestamp(t: SensorReading, l: Long): Long = t.timestamp
        })
    )

    val outputTag = OutputTag[SensorReading]("late-event")

    val minTemp: DataStream[String] = sensorData
      .map(r => {
        val celsius = (r.temperature - 32) * (5.0 / 9.0)
        SensorReading(r.id, r.timestamp, celsius)
      })
      .keyBy(_.id)
      .window(TumblingEventTimeWindows.of(Time.seconds(5)))
      .allowedLateness(Time.seconds(10))
      .sideOutputLateData(outputTag)
      // compute min temperature
      .process(new TemperatureMiner)

    val lateStream: DataStream[SensorReading] = minTemp.getSideOutput(outputTag)
    lateStream.map(r => s"late event: ${r.id}, ${r.timestamp}, ${r.temperature}").print()
    minTemp.print()

    // execute program
    env.execute("Flink Streaming Scala API Skeleton")
  }
}
I am pretty sure that late data is captured, because I can see from the logs that TemperatureMiner is invoked multiple times for one window, hence late firing.
But the problem is that nothing is printed by lateStream from the side output for late data. Any idea why?
A side output for late data in a window is only sent data that is so late that it falls outside the allowed lateness. Perhaps none of your late data is late enough.
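To make the timing concrete, here is a small sketch (an illustration, not from the original post) of when an element reaches the side output with the configuration above, plus a variant with the allowed lateness removed so the side output fires sooner:
// With a 5-second tumbling window, allowedLateness of 10s and 0s bounded
// out-of-orderness, an element belonging to window [0s, 5s) is handled as follows:
//   watermark <  5s        -> buffered, normal firing once the watermark passes 5s
//   5s <= watermark < 15s  -> late firing: the window re-fires with the late element
//   watermark >= 15s       -> window state is gone; the element goes to the side output
// So a reading with timestamp 3s only reaches "late-event" after a reading with
// timestamp >= 15s has advanced the watermark. To observe the side output quickly,
// drop the allowed lateness:
val minTempNoLateness: DataStream[String] = sensorData
  .keyBy(_.id)
  .window(TumblingEventTimeWindows.of(Time.seconds(5)))
  .allowedLateness(Time.seconds(0)) // late elements now go straight to the side output
  .sideOutputLateData(outputTag)
  .process(new TemperatureMiner)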

Gatling 2: Failing to use user session properly

I hope someone can point me in the right direction!
I am trying to run one scenario with several steps that have to be executed in order, each with the same user session, to work properly. The code below works fine with one user but fails if I use 2 or more users...
What am I doing wrong?
val headers = Map(
  Constants.TENANT_HEADER -> tenant
)

val httpConf = http
  .baseURL(baseUrl)
  .headers(headers)

val scen = scenario("Default Order Process Perf Test")
  .exec(OAuth.getOAuthToken(clientId))
  .exec(session => OAuth.createAuthHHeader(session, clientId))
  .exec(RegisterCustomer.registerCustomer(customerMail, customerPassword, tenant))
  .exec(SSO.doLogin(clientId, customerMail, customerPassword, tenant))
  .exec(session => OAuth.upDateAuthToken(session, clientId))
  .exec(session => UpdateCustomerBillingAddr.prepareBillingAddrRequestBody(session))
  .exec(UpdateCustomerBillingAddr.updateCustomerBillingAddr(tenant))
  .exec(RegisterSepa.startRegisterProcess(tenant))
  .exec(session => RegisterSepa.prepareRegisterRequestBody(session))
  .exec(RegisterSepa.doRegisterSepa(tenant))

setUp(
  scen
    .inject(atOnceUsers(2))
    .protocols(httpConf))
object OAuth {
  private val OBJECT_MAPPER = new ObjectMapper()

  def getOAuthToken(clientId: String) = {
    val authCode = PropertyUtil.getAuthCode
    val encryptedAuthCode = new Crypto().rsaServerKeyEncrypt(authCode)
    http("oauthTokenRequest")
      .post("/oauth/token")
      .formParam("refresh_token", "")
      .formParam("code", encryptedAuthCode)
      .formParam("grant_type", "authorization_code")
      .formParam("client_id", clientId)
      .check(jsonPath("$").saveAs("oauthToken"))
      .check(status.is(200))
  }

  def createAuthHHeader(session: Session, clientId: String) = {
    val tokenString = session.get("oauthToken").as[String]
    val tokenDto = OBJECT_MAPPER.readValue(tokenString, classOf[TokenDto])
    val session2 = session.set(Constants.TOKEN_DTO_KEY, tokenDto)
    val authHeader = AuthCommons.createAuthHeader(tokenDto, clientId, new util.HashMap[String, String]())
    session2.set(Constants.AUTH_HEADER_KEY, authHeader)
  }

  def upDateAuthToken(session: Session, clientId: String) = {
    val ssoToken = session.get(Constants.SSO_TOKEN_KEY).as[String]
    val oAuthDto = session.get(Constants.TOKEN_DTO_KEY).as[TokenDto]
    val params = new util.HashMap[String, String]
    params.put("sso_token", ssoToken)
    val updatedAuthHeader = AuthCommons.createAuthHeader(oAuthDto, clientId, params)
    session.set(Constants.AUTH_HEADER_KEY, updatedAuthHeader)
  }
}
So I did add the two methods that don't work as expected. In the first part I try to fetch a token and store it in the session via check(jsonPath("$").saveAs("oauthToken")), and in the second call I try to read that token with val tokenString = session.get("oauthToken").as[String], which fails with an exception saying that there is no entry for that key in the session...
I've copied it, removed/mocked any missing code references, switched to one of my apps' auth URLs, and it seems to work - at least the first 2 steps.
One thing that seems weird is jsonPath("$").saveAs("oauthToken"), which saves the whole JSON (not a single field) as an attribute - is that really what you want to do? And are you sure that getOAuthToken is working properly?
You said that it works for 1 user but fails for 2. Aren't there any more errors? For debugging I suggest changing the logging level to TRACE or adding exec(session => {println(session); session}) before the second step to verify whether the token is properly saved to the session. I think that something is wrong with the authorization request (or with building that request) and it somehow fails or throws an exception. I would comment out all steps except the first and focus on checking whether that first request is properly executed and whether it adds the proper attribute to the session.
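For illustration, here is a minimal sketch of both suggestions: saving a single field instead of the whole JSON document, and printing the session between steps. The $.access_token path is an assumption about the token endpoint's response shape, not taken from the original code:
// Hypothetical check: extract one field rather than the entire JSON document.
val getToken = http("oauthTokenRequest")
  .post("/oauth/token")
  .formParam("grant_type", "authorization_code")
  .formParam("client_id", clientId)
  .check(status.is(200))
  .check(jsonPath("$.access_token").saveAs("accessToken")) // assumed field name

val debugScenario = scenario("Debug token extraction")
  .exec(getToken)
  .exec(session => { println(session); session }) // dump all saved session attributes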
I think your brackets are not set correctly. Change them to this:
setUp(
  scen.inject(atOnceUsers(2))
).protocols(httpConf)

Can't use reactivestream Subscriber with akka stream sources

I'm trying to attach a Reactive Streams Subscriber to an Akka Streams Source.
My source seems to work fine with a simple sink (like a foreach) - but if I put in a real sink, made from a subscriber, I don't get anything.
My context is:
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}
import org.reactivestreams.{Subscriber, Subscription}

implicit val system = ActorSystem.create("test")
implicit val materializer = ActorMaterializer.create(system)

class PrintSubscriber extends Subscriber[String] {
  override def onError(t: Throwable): Unit = {}
  override def onSubscribe(s: Subscription): Unit = {}
  override def onComplete(): Unit = {}
  override def onNext(t: String): Unit = {
    println(t)
  }
}
and my test case is:
val subscriber = new PrintSubscriber()
val sink = Sink.fromSubscriber(subscriber)
val source2 = Source.fromIterator(() => Iterator("aaa", "bbb", "ccc"))
val source1 = Source.fromIterator(() => Iterator("xxx", "yyy", "zzz"))
source1.to(sink).run()(materializer)
source2.runForeach(println)
I get output:
aaa
bbb
ccc
Why don't I get xxx, yyy, and zzz?
Citing the Reactive Streams spec for the Subscriber below:
Will receive call to onSubscribe(Subscription) once after passing an instance of Subscriber to Publisher.subscribe(Subscriber). No further notifications will be received until Subscription.request(long) is called.
The smallest change you can make to see some items flowing through to your subscriber is:
override def onSubscribe(s: Subscription): Unit = {
  s.request(3)
}
However, keep in mind this won't make it fully compliant with the Reactive Streams spec. The fact that it is not so easy to implement correctly is the main reason behind higher-level toolkits like Akka Streams itself.
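For comparison, here is a slightly fuller sketch (still simplified, e.g. no null checks or cancellation handling as required by the spec) of a subscriber that keeps demand flowing by requesting one more element for each one it receives:
class RequestingPrintSubscriber extends Subscriber[String] {
  private var subscription: Subscription = _

  override def onSubscribe(s: Subscription): Unit = {
    subscription = s
    s.request(1) // signal initial demand
  }

  override def onNext(t: String): Unit = {
    println(t)
    subscription.request(1) // ask for the next element
  }

  override def onError(t: Throwable): Unit = t.printStackTrace()

  override def onComplete(): Unit = println("completed")
}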

Backpressure is not properly handled in akka-streams

I wrote a simple stream using the akka-streams API, assuming it would handle my source, but unfortunately it doesn't. I am sure I am doing something wrong in my source. I simply created an iterator which generates a very large number of elements, assuming it wouldn't matter because the akka-streams API would take care of backpressure. What am I doing wrong? This is my iterator:
def createData(args: Array[String]): Iterator[TimeSeriesValue] = {
  var data = new ListBuffer[TimeSeriesValue]()
  for (i <- 1 to range) {
    sessionId = UUID.randomUUID()
    for (j <- 1 to countersPerSession) {
      time = DateTime.now()
      keyName = s"Encoder-${sessionId.toString}-Controller.CaptureFrameCount.$j"
      for (k <- 1 to snapShotCount) {
        time = time.plusSeconds(2)
        fValue = new Random().nextLong()
        data += TimeSeriesValue(sessionId, keyName, time, fValue)
        totalRows += 1
      }
    }
  }
  data.iterator
}
The problem is primarily in the line
data += TimeSeriesValue(sessionId, keyName, time, fValue)
You are continuously adding to the ListBuffer with a "very large number of elements". This is chewing up all of your RAM. The data.iterator line is simply wrapping the massive ListBuffer blob inside of an iterator to provide each element one at a time, it's basically just a cast.
Your assumption that "it won't matter because ... of backpressure" is partially true: the akka Stream will process the TimeSeriesValue values reactively, but you are creating a large number of them even before you get to the Source constructor.
If you want this iterator to be "lazy", i.e. only produce values when needed and not consume memory, then make the following modifications (note: I broke apart the code to make it more readable):
def createTimeSeries(startTime: DateTime, snapShotCount: Int, sessionId: UUID, keyName: String) =
  Iterator.range(1, snapShotCount)
    .map(_ * 2)
    .map(startTime plusSeconds _)
    .map(t => TimeSeriesValue(sessionId, keyName, t, ThreadLocalRandom.current().nextLong()))

def sessionGenerator(countersPerSession: Int, sessionID: UUID) =
  Iterator.range(1, countersPerSession)
    .map(j => s"Encoder-${sessionID.toString}-Controller.CaptureFrameCount.$j")
    .flatMap { keyName =>
      createTimeSeries(DateTime.now(), snapShotCount, sessionID, keyName)
    }

object UUIDIterator extends Iterator[UUID] {
  def hasNext: Boolean = true
  def next(): UUID = UUID.randomUUID()
}

def iterateOverIDs(range: Int) =
  UUIDIterator.take(range)
    .flatMap(sessionID => sessionGenerator(countersPerSession, sessionID))
Each one of the above functions returns an Iterator. Therefore, calling iterateOverIDs should be instantaneous, because no work is done immediately and de minimis memory is consumed. This iterator can then be passed into your Stream...
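For example, assuming the definitions above and an implicit materializer in scope, the lazy iterator can be wrapped in a Source like this:
// Elements are generated only as downstream demand arrives, so backpressure
// propagates all the way back to the iterator.
val done = Source.fromIterator(() => iterateOverIDs(range))
  .runWith(Sink.foreach(println))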

Resources