What's the difference between akka.stream.scaladsl.Source.reduce() and runReduce() functions?
I checked here https://doc.akka.io/api/akka/current/akka/stream/scaladsl/Source.html and it's pretty clear that reduce() "folds" all the elements using the first element as a "basis". I don't quite understand what the advantage is of using runReduce() to run this Source with a reduce() function. Why does it return a Future?
You need two steps to execute an Akka stream:
Construct a blueprint
Run it (so-called materialization)
reduce does only step 1, and runReduce does steps 1 and 2.
import akka.actor.ActorSystem
import akka.stream.scaladsl._
import scala.util.{Failure, Success}
import scala.concurrent.ExecutionContext.Implicits.global
implicit val actorSystem = ActorSystem("example")
// reduce
Source(1 to 10).reduce(_ + _).runForeach(println).onComplete {
  case Success(v) => println("done")
  case Failure(e) => println(e.getMessage)
}
// it prints:
// 55
// done
// runReduce
Source(1 to 10).runReduce(_ + _).onComplete {
  case Success(v) => println(v)
  case Failure(e) => println(e.getMessage)
}
// it prints:
// 55
Feel free to try the example in playground https://scastie.scala-lang.org/2Iure8pDSUWcLjFVGflyUQ
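The Scaladoc describes runReduce as a shortcut for running this Source with a reduce function, which is why it hands you back a Future: that Future is the materialized value of the run. Here is a minimal sketch of the difference in return types, assuming the same imports and implicit ActorSystem as above:
val blueprint = Source(1 to 10).reduce(_ + _)    // Source[Int, NotUsed]: step 1 only, nothing runs yet
val result = Source(1 to 10).runReduce(_ + _)    // Future[Int]: steps 1 and 2 in one call
// roughly equivalent to reducing first and then materializing with Sink.head:
val result2 = Source(1 to 10).reduce(_ + _).runWith(Sink.head)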
I have a very simple salary calculator function that receives form input values as parameters and in the end returns the result of its calculations.
Logic Function
export function calcAnnualSalary(
  monthlySalary: string,
  healthPlan?: string,
  transportationTicket?: string,
  mealTicket?: string,
  valueSaturday?: boolean
) {
  // monthlySalary arrives as a pt-BR formatted string such as "1.000,00"
  const annualSalary =
    parseFloat(monthlySalary.replace(/\./g, '').replace(',', '.')) * 12
  const thirteenth = parseFloat(
    monthlySalary.replace(/\./g, '').replace(',', '.')
  )
  const extraHoliday =
    parseFloat(monthlySalary.replace(/\./g, '').replace(',', '.')) / 3
  const totalAnnualCrude = annualSalary + thirteenth + extraHoliday
  return {
    annualSalary,
    thirteenth,
    extraHoliday,
    totalAnnualCrude,
  }
}
Testing
With that, I created a very simple test with hardcoded values. I would like to know whether this is the best practice for testing calculation logic. To avoid hardcoded values, should I, for example, read the value from inside the form instead? What would you suggest?
import {CalcAnnualSalary} from '~src/components/app/Calculators/CalcAnnualSalary'
import * as Calc from '~src/utils/calculators/'
import * as Lib from '~src/utils/testing-library'
describe('CalculatorAnnualSalary', () => {
  it('expect return gross annual salary', () => {
    const {annualSalary} = Calc.calcAnnualSalary('1.000,00')
    expect(annualSalary).toEqual(12000)
  })
})
In a test, you should keep the test doubles and test data as simple as possible. That reduces complexity and makes the test easier to reason about.
Whether you use static data or dynamically generated test data, keep it simple. With simple test data you can also more easily predict the desired results.
A test should be predictable: when writing a test case, you should be able to state the desired result before running it. If the input is complicated, the desired result is difficult to calculate, and you end up executing the code in your head with that data.
Use simple test data to exercise every code branch, i.e. every logical fragment of the function.
I am executing 2 consecutive scenarios. I have a requirement where I need to record the current time before the start of the 1st scenario and then pass that time value to the next scenario. Can someone please suggest how this can be implemented? Please check my code below:
def fileUpload() = foreach("${datasetIdList}", "datasetId") {
  println("File Upload Start Time::::" + Calendar.getInstance().getTime + " for datasetId ::: ${datasetId}")
  exec(http("file upload").post("/datasets/${datasetId}/uploadFile")
    .formUpload("File", "./src/test/resources/data/Scan_good.csv")
    .header("content-type", "multipart/form-data")
    .check(status is 200).check(status.saveAs("uploadStatus")))
    .exec(session => {
      if (session("uploadStatus").as[Int] == 200)
        counter += 1
      session
    })
}

def getDataSetId() = foreach("${datasetIdList}", "datasetId") {
  exec(http("get datasetId")
    .get("/datasets/${datasetId}")
    .header("content-type", "application/json")
    .check(status is 200)
  )
}
I need to record the upload start time for each iteration of datasetIdList, pass that value to the next scenario, and print that value for each datasetId. Can someone please suggest how this can be implemented?
You may try using the before section:
package load
import io.gatling.core.Predef._
import io.gatling.http.Predef._
class TransferTimeSimulation extends Simulation {

  var beforeScn1Start: Long = 0L

  before {
    println("Simulation is about to start!")
    beforeScn1Start = System.currentTimeMillis()
  }

  after {
    println("Simulation is finished!")
  }

  val scn1 = scenario("Scenario 1").exec(
    http("get google")
      .get("http://google.com")
      .check(status.is(200))
  )

  val scn2 = scenario("Scenario 2")
    .exec { session =>
      println("beforeScn1Start = " + beforeScn1Start)
      session
    }

  setUp(
    scn1.inject(atOnceUsers(1))
      .andThen(scn2.inject(atOnceUsers(1)))
  )
    .protocols(http)
    .maxDuration(10)
    .assertions(
      forAll.failedRequests.count.is(0),
    )
}
For more flexibility you may also consider using lazy val initialization:
https://www.baeldung.com/scala/lazy-val
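As a rough sketch of that idea, reusing the toy scenarios from above (the val name is made up), a lazy val is initialized on first access and then cached, so the second scenario reads the same timestamp:
lazy val scn1StartMillis: Long = System.currentTimeMillis() // evaluated on first access, then cached

val scn1 = scenario("Scenario 1")
  .exec { session =>
    println("scn1 starting at " + scn1StartMillis) // first access happens here, as scenario 1 starts
    session
  }
  .exec(http("get google").get("http://google.com").check(status.is(200)))

val scn2 = scenario("Scenario 2")
  .exec { session =>
    println("scn1 started at " + scn1StartMillis) // same cached value, read from scenario 2
    session
  }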
I've distilled this down to as few lines of code as I could to get to the bottom of this issue.
Currently these are the config constants (I'm using an array of length 1 to represent the tokenised words I'm doing semantic analysis on):
export const top_words = 10000;
export const max_review_length = 1
export const embedding_vector_length = 32
Here is the code; I've substituted the tensors with mock tokens of one-word length for now. I'm getting TypeScript linting errors saying that .print() or .dataSync()[0] will fail on the basis that they do not exist: the line of code in question (.predict) is returning a tensor which has no print or dataSync method.
const x_train = tf.tensor([[80], [86], [10], [1], [2]]);
const y_train = tf.tensor([[1],[1],[1],[0],[0]])
const x_val = tf.tensor([[1], [3], [102], [100], [104]]);
const y_val = tf.tensor([[0],[0],[1],[1],[1]])
const model = tf.sequential();
model.add(tf.layers.embedding({ inputDim: dictionary.size, inputLength: max_review_length, outputDim: 1 }))
model.add(tf.layers.lstm({units: 200, dropout: 0.2, recurrentDropout: 0.2}))
model.add(tf.layers.dense({units: 1, activation:'sigmoid'}))
model.compile({ loss:'binaryCrossentropy', optimizer:'rmsprop', metrics:['accuracy'] })
const history = model.fit(x_train, y_train, {epochs: 12, batchSize: 5})
history.then(hist => console.log(hist.history.loss)) // Show error loss vs epoch
const predictOut = model.predict(tf.tensor2d([10]))
Calling predictOut.print() or predictOut.dataSync()[0] is where the error is reported.
If you are using TypeScript, you need to specify what predict() returns, like this:
(model.predict(...) as tf.Tensor).print()
since predict() can return either a Tensor or a Tensor[].
Ok, so one thing that's easy to forget if you're not used to dealing with Python: Python is synchronous!
The model here is async, so to solve this problem in this code, wait for history (the result of fit) to resolve before predicting:
history.then(result => {
  model.predict(tf.tensor2d([10])).print()
  console.log('loss ', result.history.loss)
})
Otherwise the model has not finished training yet when you call predict.
Gotta love async.
I need to process a large file line by line and do some heavy work (on a 4-core CPU) on every item. I think this code is correct:
implicit val system = ActorSystem("TestSystem")
implicit val materializer = ActorMaterializer()
import system.dispatcher
val sink = Sink.foreach[String](elem => println("element proceed"))

FileIO.fromPath(Paths.get("file.txt"))
  .via(Framing.delimiter(ByteString("\n"), 64).map(_.utf8String))
  .mapAsync(4)(v =>
    // long op
    Future {
      Thread.sleep(500)
      "updated_" + v
    })
  .to(sink)
  .run()
But I want to have output like:
100 element proceed
200 element proceed
300 element proceed
357 element proceed. done
How to implement it?
You can use Flow.grouped:
val groupSize = 100
val groupedFlow = Flow[String].grouped(groupSize)
This Flow can now be injected before or after your mapAsync:
FileIO.fromPath(Paths.get("file.txt"))
  .via(Framing.delimiter(ByteString("\n"), 64).map(_.utf8String))
  .via(groupedFlow)
  ...
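To get the running "N element proceed" output, here is a minimal sketch of one way to wire it together (it assumes the same imports and implicits as in the question; the running total lives in a plain var, which is fine here because the foreach sink processes batches one at a time):
var processed = 0

FileIO.fromPath(Paths.get("file.txt"))
  .via(Framing.delimiter(ByteString("\n"), 64).map(_.utf8String))
  .mapAsync(4)(v => Future { Thread.sleep(500); "updated_" + v })
  .via(groupedFlow)                    // emits batches of up to groupSize elements
  .runForeach { batch =>
    processed += batch.size
    println(s"$processed element proceed")
  }
  .onComplete(_ => println(s"$processed element proceed. done"))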
I tried to enhance the Flink example that demonstrates the usage of streams.
My goal is to use the windowing features (see the window function call).
I assume that the code below outputs the sum of the last 3 numbers of the stream.
(The stream is opened thanks to nc -lk 9999 on Ubuntu.)
Actually, the output sums up ALL numbers entered. Switching to a time window produces the same result, i.e. no windowing seems to be applied.
Is that a bug? (Version used: latest master on GitHub.)
object SocketTextStreamWordCount {
  def main(args: Array[String]) {
    val hostName = args(0)
    val port = args(1).toInt
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Create streams for names and ages by mapping the inputs to the corresponding objects
    val text = env.socketTextStream(hostName, port)
    val currentMap = text.flatMap { (x: String) => x.toLowerCase.split("\\W+") }
      .filter { (x: String) => x.nonEmpty }
      .window(Count.of(3)).every(Time.of(1, TimeUnit.SECONDS))
      // .window(Time.of(5, TimeUnit.SECONDS)).every(Time.of(1, TimeUnit.SECONDS))
      .map { (x: String) => ("not used; just to have a tuple for the sum", x.toInt) }

    val numberOfItems = currentMap.count
    numberOfItems.print()

    val counts = currentMap.sum(1)
    counts.print()

    env.execute("Scala SocketTextStreamWordCount Example")
  }
}
The problem seems to be that there is an implicit conversion from WindowedDataStream to DataStream. This implicit conversion calls flatten() on the WindowedDataStream.
What happens in your case is that the code gets expanded to this:
val currentMap = text.flatMap { (x: String) => x.toLowerCase.split("\\W+") }
  .filter { (x: String) => x.nonEmpty }
  .window(Count.of(3)).every(Time.of(1, TimeUnit.SECONDS))
  .flatten()
  .map { (x: String) => ("not used; just to have a tuple for the sum", x.toInt) }
What flatten() does is similar to a flatMap() on a collection. It takes the stream of windows, which can be seen as a collection of collections ([[a,b,c], [d,e,f]]), and turns it into a stream of elements: [a,b,c,d,e,f].
This means that your count really operates only on the original stream that has been windowed and then "de-windowed", which looks as if it had never been windowed at all.
This is a problem and I will work on fixing this right away. (I'm one of the Flink committers.) You can track the progress here: https://issues.apache.org/jira/browse/FLINK-2096
The way to do it with the current API is this:
val currentMap = text.flatMap { (x: String) => x.toLowerCase.split("\\W+") }
  .filter { (x: String) => x.nonEmpty }
  .map { (x: String) => ("not used; just to have a tuple for the sum", x.toInt) }
  .window(Count.of(3)).every(Time.of(1, TimeUnit.SECONDS))
WindowedDataStream has a sum() method, so there will be no implicit insertion of the flatten() call. Unfortunately, count() is not available on WindowedDataStream, so for this you have to manually add a 1 field to the tuple and sum these 1s.
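A rough sketch of that workaround, reusing the setup from the example above (treat it as illustrative; the exact methods available on WindowedDataStream may differ between Flink versions):
val windowed = text.flatMap { (x: String) => x.toLowerCase.split("\\W+") }
  .filter { (x: String) => x.nonEmpty }
  .map { (x: String) => (x.toInt, 1) }       // carry a 1 along with each parsed number
  .window(Count.of(3)).every(Time.of(1, TimeUnit.SECONDS))

val sums = windowed.sum(0)     // per-window sum of the parsed numbers
val counts = windowed.sum(1)   // per-window element count, obtained by summing the 1s

sums.print()
counts.print()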