Apache Flink: Insufficient number of network buffers

I created a project using the flink-quickstart-java Maven archetype (-DarchetypeArtifactId=flink-quickstart-java).
It comes with the example code WordCount.java.
This is a part of the original WordCount.java code:
public static void main(String[] args) throws Exception {
    // set up the execution environment
    final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    // get input data
    DataSet<String> text = env.fromElements(
        "To be, or not to be,--that is the question:--",
        "Whether 'tis nobler in the mind to suffer",
        "The slings and arrows of outrageous fortune",
        "Or to take arms against a sea of troubles,"
    );
    //DataSet<String> text = env.readTextFile("file:///home/jypark2/data3.txt");

    DataSet<Tuple2<String, Integer>> counts =
        // split up the lines in pairs (2-tuples) containing: (word,1)
        text.flatMap(new LineSplitter())
            // group by the tuple field "0" and sum up tuple field "1"
            .groupBy(0)
            .sum(1);

    // execute and print result
    counts.print();
}
I wanted to read from a text file, so I changed the code:
public static void main(String[] args) throws Exception {
    // set up the execution environment
    final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    // get input data
    DataSet<String> text = env.readTextFile("file:///home/jypark2/data3.txt");

    DataSet<Tuple2<String, Integer>> counts =
        // split up the lines in pairs (2-tuples) containing: (word,1)
        text.flatMap(new LineSplitter())
            // group by the tuple field "0" and sum up tuple field "1"
            .groupBy(0)
            .sum(1);

    // execute and print result
    counts.print();
}
But now there is a runtime error that I can't solve:
[screenshot of the stack trace, showing an "Insufficient number of network buffers" error]
I tried changing the configuration in flink-conf.yaml. I changed
taskmanager.network.numberOfBuffers
and
taskmanager.numberOfTaskSlots
and restarted the local TaskManager and JobManager, but the error message stayed the same. The configuration shown on the localhost:8081 page did change, but the numbers in the error message did not.
In addition, I ran the example code SocketTextStreamWordCount.java without any change, but a similar error occurred. The error message said:
Insufficient number of network buffers: required 64, but only 36 available. The total number of network buffers is currently set to 2048.
How can I solve this? Help me...
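One hedged thing to check: if the job is started from inside the IDE, getExecutionEnvironment() creates a local environment that does not read flink-conf.yaml, which would explain why editing the file has no effect. Below is a minimal sketch of raising the buffer count programmatically for such a local run; the key string is the same one as in flink-conf.yaml, and 4096 is an arbitrary example value (the old Flink docs suggested roughly slots-per-TaskManager² × #TaskManagers × 4 buffers).

import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;

public class LocalWordCountWithMoreBuffers {

    public static void main(String[] args) throws Exception {
        // Raise the network buffer pool for a local (in-IDE) execution;
        // flink-conf.yaml is only read by a standalone cluster, not here.
        Configuration conf = new Configuration();
        conf.setInteger("taskmanager.network.numberOfBuffers", 4096);

        final ExecutionEnvironment env =
                ExecutionEnvironment.createLocalEnvironment(conf);

        env.readTextFile("file:///home/jypark2/data3.txt").print();
    }
}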

Related

Making a command that shows all your cooldowns and the time remaining

I need a command that shows the list of commands with cooldowns and the time remaining before each can be used again. Is it possible, especially putting the command in a cog? If yes, can you help me write the command?
It's definitely possible to get all commands on cooldown and their remaining time. However, discord.py does not give direct access to this, so we have to abuse private class variables to get it.
Code:
import datetime

from discord.ext import commands

# assumes an existing commands.Bot instance named `client`
@client.command()
async def cooldowns(ctx: commands.Context):
    string = ""
    for command in client.walk_commands():
        dt = ctx.message.edited_at or ctx.message.created_at
        current = dt.replace(tzinfo=datetime.timezone.utc).timestamp()
        bucket = command._buckets.get_bucket(ctx.message, current)
        if not bucket:
            continue
        retry_after = bucket.update_rate_limit(current)
        if retry_after:
            string += f"{command.name} - {retry_after} Seconds\n"
        else:
            string += f"{command.name} - `READY`\n"
    string = string or "No commands are on cooldown"
    await ctx.send(string)
Here is what the above code's output will look like for commands not on cooldown:
command 1 - READY
command 2 - READY
...
command n - READY (and so on)
and if commands are on cooldown, each line shows the remaining seconds instead of `READY`.
Explaining how the code works:
string = ""
We initialize an empty string variable; this is the string we will append our lines to and then send all at once.
for command in client.walk_commands():
    dt = ctx.message.edited_at or ctx.message.created_at
    current = dt.replace(tzinfo=datetime.timezone.utc).timestamp()
    bucket = command._buckets.get_bucket(ctx.message, current)
client.walk_commands() is a method that gives us a generator of all the command objects the bot has registered (i.e. everything under @client.command(), @commands.command(), and @commands.group()).
dt stores the time the message was created (or last edited); it's a datetime object.
message.created_at is timezone-naive, so we bind a timezone to it with its replace() method and then get a Unix timestamp from it with the timestamp() method of datetime.datetime objects.
Those first two lines are just setup; the third line is the meat and potatoes of what we want. command._buckets.get_bucket() is an internal: we provide the current message object and the timestamp we created earlier.
This gives us the Cooldown object (the one you create with @commands.cooldown(1, 5, commands.BucketType.user); it basically holds what is inside the parentheses). It is None for commands with no cooldown.
if not bucket:
    continue

retry_after = bucket.update_rate_limit(current)

if retry_after:
    string += f"{command.name} - {retry_after} Seconds\n"
else:
    string += f"{command.name} - `READY`\n"

await ctx.send(string)
if not bucket:
    continue

If a bucket is not found, the command does not have a cooldown, so we can skip over it.
retry_after = bucket.update_rate_limit(current)

This gets us the remaining time as a float, and returns None if the command is not on cooldown.
if retry_after:
    string += f"{command.name} - {retry_after} Seconds\n"
else:
    string += f"{command.name} - `READY`\n"
The if statement checks whether a float was returned; if so, the command is on cooldown and we add the command name alongside the remaining time.
The else branch handles commands not on cooldown: it adds the command name with READY alongside it.
At the end we send the entire string.
In my case, only one command had a cooldown, which is spotify.
Creating a command that shows all cooldowns is possible. You can do this with a for loop and the is_on_cooldown() function.
Here is an example of a command to return a list of commands and if they are on cooldown:
@client.command()
async def cooldowns(ctx):
    cooldown_string = ""
    for command in client.commands:
        if command.is_on_cooldown(ctx):
            # get_cooldown_retry_after() returns the remaining time in seconds
            cooldown_string += f"\n{command} - **Time Left:** {command.get_cooldown_retry_after(ctx)}s"
    # Discord rejects empty messages, so fall back to a placeholder
    await ctx.send(cooldown_string or "No commands are on cooldown")
From here you can add condition statements to only show commands on cooldown.

Why does Flink emit duplicate records on a DataStream join + Global window?

I'm learning/experimenting with Flink, and I'm observing some unexpected behavior with the DataStream join, and would like to understand what is happening...
Let's say I have two streams with 10 records each, which I want to join on an id field. Let's assume that each record in one stream has a matching one in the other, and that the IDs are unique within each stream. Let's also say I have to use a global window (a requirement).
Join using DataStream API (my simplified code in Scala):
val stream1 = ... // from a Kafka topic on my local machine (I tried with and without .keyBy)
val stream2 = ...

stream1
  .join(stream2)
  .where(_.id).equalTo(_.id)
  .window(GlobalWindows.create()) // assume this is a requirement
  .trigger(CountTrigger.of(1))
  .apply {
    (row1, row2) => // ...
  }
  .print()
Result:
Everything is printed as expected, each record from the first stream joined with a record from the second one.
However:
If I re-send one of the records (say, with an updated field) to its stream, two duplicate join events get emitted 😞
If I repeat that operation (with or without an updated field), I get 3 emitted events, then 4, 5, and so on 😞
Could someone in the Flink community explain why this is happening? I would have expected only 1 event emitted each time. Is it possible to achieve this with a global window?
In comparison, the Flink Table API behaves as expected in that same scenario, but for my project I'm more interested in the DataStream API.
Example with Table API, which worked as expected:
tableEnv
  .sqlQuery(
    """
      |SELECT *
      | FROM stream1
      | JOIN stream2
      | ON stream1.id = stream2.id
    """.stripMargin)
  .toRetractStream[Row]
  .filter(_._1) // just keep the inserts
  .map(...)
  .print() // works as expected, after re-sending updated records
Thank you,
Nicolas
The issue is that records are never removed from your global window. So the join operation is triggered on the global window whenever a new record arrives, but the old records are still present.
Thus, to get this working in your case, you'd need to implement a custom evictor. I expanded your example into a minimal working example and added the evictor, which I explain after the snippet.
import java.lang

import org.apache.flink.streaming.api.datastream.CoGroupedStreams
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.GlobalWindows
import org.apache.flink.streaming.api.windowing.evictors.Evictor
import org.apache.flink.streaming.api.windowing.triggers.CountTrigger
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow
import org.apache.flink.streaming.runtime.operators.windowing.TimestampedValue

val env = StreamExecutionEnvironment.getExecutionEnvironment

val data1 = List(
  (1L, "myId-1"),
  (2L, "myId-2"),
  (5L, "myId-1"),
  (9L, "myId-1"))
val data2 = List(
  (3L, "myId-1", "myValue-A"))

val stream1 = env.fromCollection(data1)
val stream2 = env.fromCollection(data2)

stream1.join(stream2)
  .where(_._2).equalTo(_._2)
  .window(GlobalWindows.create()) // assume this is a requirement
  .trigger(CountTrigger.of(1))
  .evictor(new Evictor[CoGroupedStreams.TaggedUnion[(Long, String), (Long, String, String)], GlobalWindow]() {
    override def evictBefore(elements: lang.Iterable[TimestampedValue[CoGroupedStreams.TaggedUnion[(Long, String), (Long, String, String)]]], size: Int, window: GlobalWindow, evictorContext: Evictor.EvictorContext): Unit = {}

    override def evictAfter(elements: lang.Iterable[TimestampedValue[CoGroupedStreams.TaggedUnion[(Long, String), (Long, String, String)]]], size: Int, window: GlobalWindow, evictorContext: Evictor.EvictorContext): Unit = {
      import scala.collection.JavaConverters._
      // index of the latest element coming from the second (lookup) input
      val lastInputTwoIndex = elements.asScala.zipWithIndex.filter(e => e._1.getValue.isTwo).lastOption.map(_._2).getOrElse(-1)
      if (lastInputTwoIndex == -1) {
        println("Waiting for the lookup value before evicting")
        return
      }
      // remove everything except that latest element of the second input
      val iterator = elements.iterator()
      for (index <- 0 until size) {
        val cur = iterator.next()
        if (index != lastInputTwoIndex) {
          println(s"evicting ${cur.getValue.getOne}/${cur.getValue.getTwo}")
          iterator.remove()
        }
      }
    }
  })
  .apply((r, l) => (r, l))
  .print()

env.execute()
The evictor is applied after the window function (the join in this case) has been applied. It's not entirely clear how your use case should work when there are multiple entries in the second input, so for now the evictor only handles a single entry.
Whenever a new element comes into the window, the window function is immediately triggered (count = 1). Then the join is evaluated over all elements with the same key. Afterwards, to avoid duplicate outputs, we remove all entries of the first input from the current window. Since the second input may arrive after the first inputs, no eviction is performed while the second input is empty. Note that my Scala is quite rusty; you will be able to write this in a much nicer way. The output of a run is:
Waiting for the lookup value before evicting
Waiting for the lookup value before evicting
Waiting for the lookup value before evicting
Waiting for the lookup value before evicting
4> ((1,myId-1),(3,myId-1,myValue-A))
4> ((5,myId-1),(3,myId-1,myValue-A))
4> ((9,myId-1),(3,myId-1,myValue-A))
evicting (1,myId-1)/null
evicting (5,myId-1)/null
evicting (9,myId-1)/null
A final remark: if the Table API already offers a concise way of doing what you want, I'd stick to it and then convert the result to a DataStream when needed.

Flink - behaviour of timesOrMore

I want to detect a pattern of events as follows.
The inner pattern is:
Have the same value for key "sensorArea".
Have different value for key "customerId".
Are within 5 seconds from each other.
And the pattern needs to emit an "alert" only if the above happens 3 or more times.
I wrote something, but I know for sure it is not complete.
Two questions:
I need to access the previous event's fields when I'm in the "next" pattern. How can I do that without using ctx, since it is heavy?
My code produces a weird result. This is my input (shown as an image in the original post), and my output is:
3> {first=[Customer[timestamp=50,customerId=111,toAdd=2,sensorData=33]], second=[Customer[timestamp=100,customerId=222,toAdd=2,sensorData=33], Customer[timestamp=600,customerId=333,toAdd=2,sensorData=33]]}
even though my desired output should be all of the first six events (customers 111/222, with sensor area 33, then 44, then 55).
Pattern<Customer, ?> sameUserDifferentSensor = Pattern.<Customer>begin("first", skipStrategy)
    .followedBy("second").where(new IterativeCondition<Customer>() {
        @Override
        public boolean filter(Customer currCustomerEvent, Context<Customer> ctx) throws Exception {
            List<Customer> firstPatternEvents = Lists.newArrayList(ctx.getEventsForPattern("first"));
            int i = firstPatternEvents.size();
            int currSensorData = currCustomerEvent.getSensorData();
            int prevSensorData = firstPatternEvents.get(i - 1).getSensorData();
            int currCustomerId = currCustomerEvent.getCustomerId();
            int prevCustomerId = firstPatternEvents.get(i - 1).getCustomerId();
            return currSensorData == prevSensorData && currCustomerId != prevCustomerId;
        }
    })
    .within(Time.seconds(5))
    .timesOrMore(3);

PatternStream<Customer> sameUserDifferentSensorPatternStream = CEP.pattern(customerStream, sameUserDifferentSensor);

DataStream<String> alerts1 = sameUserDifferentSensorPatternStream.select((PatternSelectFunction<Customer, String>) Object::toString);
You will have an easier time if you first key the stream by the sensorArea. Then you will be pattern matching on streams where all of the events are for a single sensorArea, which will make the pattern easier to express and the matching more efficient.
You can't avoid using an iterative condition and the ctx, but it should be less expensive after keying the stream.
Also, your code example doesn't match the text description. The text says "within 5 seconds" and "3 or more times", while the code has within(Time.seconds(2)) and timesOrMore(2).
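A minimal sketch of that keying suggestion, assuming the Customer POJO from the question with a getSensorData() getter (the field the prose calls sensorArea):

import org.apache.flink.cep.CEP;
import org.apache.flink.cep.PatternStream;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.streaming.api.datastream.DataStream;

public class KeyedCepSketch {

    // Key by the sensor field first, so each partition only contains events of a
    // single sensor area; the "same sensor" comparison can then be dropped from
    // the IterativeCondition, leaving only the customer-id check.
    static PatternStream<Customer> buildPatternStream(
            DataStream<Customer> customerStream,
            Pattern<Customer, ?> sameUserDifferentSensor) {
        return CEP.pattern(
                customerStream.keyBy(Customer::getSensorData),
                sameUserDifferentSensor);
    }
}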

Extract value from JMSByte message

Hi, I have a consumer subscribed to a topic that receives byte messages.
The task I want to accomplish is to extract the values from the string below, which I converted from the byte message.
The code I used to convert the byte message is:
if (message instanceof BytesMessage) {
    BytesMessage byteMessage = (BytesMessage) message;
    byte[] byteData = null;
    try {
        byteData = new byte[(int) byteMessage.getBodyLength()];
        byteMessage.readBytes(byteData);
        byteMessage.reset();
    } catch (JMSException e) {
        e.printStackTrace();
    }
    String stringMessage = new String(byteData);
    System.out.println(stringMessage);
}
The stringMessage is shown below:
2179032 TradeId701118403 clearedTradeUsi
SW005285900447503296# clearedTradeUsiIssuer
1010051�zzz�cleared���i
I want to extract each value separately, like below, but right now I do not have any clue how to do that. Could anyone help me with that?
TradeId: 70111840
clearedTradeUsi: SW005285900447503296
clearedTradeUsiIssuer: 1010051
As you have tagged your question with regex, I'll provide some regex solutions to find the fields.
For the trade id
TradeId\s?(\d+)
This allows for an optional space before the id, as there is for some of the other values; \s? matches zero or one whitespace character.
For the clearedTradeUsi
clearedTradeUsi\s?(\w+)
For the clearedTradeUsiIssuer
clearedTradeUsiIssuer\s?(\d+)
Each of these regular expressions will match the respective id which will be put into group 1.
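For illustration, here is a minimal Java sketch applying these three patterns (the input string is abbreviated from the question; java.util.regex flavor assumed):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TradeFieldExtractor {

    public static void main(String[] args) {
        String stringMessage =
                "TradeId701118403 clearedTradeUsi SW005285900447503296# clearedTradeUsiIssuer 1010051";

        // each regex captures its value in group 1, as described above
        printGroup1("TradeId\\s?(\\d+)", stringMessage);               // 701118403
        printGroup1("clearedTradeUsi\\s?(\\w+)", stringMessage);       // SW005285900447503296
        printGroup1("clearedTradeUsiIssuer\\s?(\\d+)", stringMessage); // 1010051
    }

    private static void printGroup1(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.find()) {
            System.out.println(m.group(1));
        }
    }
}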

Apache Camel: CXF - returning Holder values (error: IndexOutOfBoundsException: Index: 1, Size: 1)

I have a problem with setting holders in my output message.
I have the following simple route and processor:
from("cxf:bean:ewidencjaEndpoint")
.process(new ProcessResult())
.end();
public class ProcessResult implements Processor {
    public void process(Exchange exchange) throws Exception {
        Object[] args = exchange.getIn().getBody(Object[].class);
        long id = (long) args[0];
        Holder<A> dataA = (Holder<A>) args[1];
        Holder<B> dataB = (Holder<B>) args[2];
        exchange.getOut().setBody(new Object[]{ dataA, dataB });
    }
}
I get the following error:
java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
    at java.util.ArrayList.RangeCheck(ArrayList.java:547)
    at java.util.ArrayList.get(ArrayList.java:322)
    at org.apache.cxf.jaxws.interceptors.HolderInInterceptor.handleMessage(HolderInInterceptor.java:67)
    at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:255)
    at org.apache.cxf.endpoint.ClientImpl.onMessage(ClientImpl.java:737)
    at org.apache.cxf.transport.http.HTTPConduit$WrappedOutputStream.handleResponseInternal(HTTPConduit.java:2335)
    at org.apache.cxf.transport.http.HTTPConduit$WrappedOutputStream$1.run(HTTPConduit.java:2198)
I've read about many similar problems described on the web (e.g. http://camel.465427.n5.nabble.com/java-lang-IndexOutOfBoundsException-in-cxf-producer-td468541.html) but without any success in resolving the problem.
In debug I get an output message like:
Exchange[Message[null, null, A#xxx, B#yyy]]
I don't understand where the following "null" values come from.
I've got only 2 output values (in Holders) according to the WSDL file (and the generated interface). I also see in the debug console that the 'out' part of the exchange body holds only the 2 values set in ProcessResult() (indexed from 2 to 3), yet its size is set to 4 (not 2)?
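For reference, a hedged sketch of the body shape involved: in camel-cxf POJO mode the message body is an org.apache.cxf.message.MessageContentsList whose indices line up with the SEI parameters, which would explain the 'out' part having size 4 with your two values at indices 2-3. The variant below mutates the holders in place and hands back the same list (A and B are the generated types from the question's WSDL; whether this resolves the interceptor error is untested):

import javax.xml.ws.Holder;

import org.apache.camel.Exchange;
import org.apache.camel.Processor;
import org.apache.cxf.message.MessageContentsList;

public class ProcessResult implements Processor {

    @SuppressWarnings("unchecked")
    public void process(Exchange exchange) throws Exception {
        // camel-cxf (POJO mode) passes the operation arguments as a
        // MessageContentsList: index 0 is 'id', indices 1 and 2 are the holders.
        MessageContentsList args = exchange.getIn().getBody(MessageContentsList.class);
        long id = (long) args.get(0);
        Holder<A> dataA = (Holder<A>) args.get(1);
        Holder<B> dataB = (Holder<B>) args.get(2);

        // dataA.value = ...; // fill in the out values here
        // dataB.value = ...;

        // Hand back the same list so the part indices stay aligned.
        exchange.getOut().setBody(args);
    }
}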
