I have a Flink Row whose TypeInformation is the generic Object class for all columns, and I'm getting the error below for timestamp columns while serializing the Row to Kafka. Flink version: 1.14.0
Caused by: java.lang.IllegalArgumentException: Java 8 date/time type `java.time.LocalDateTime` not supported by default: add Module "org.apache.flink.shaded.jackson2.com.fasterxml.jackson.datatype:jackson-datatype-jsr310" to enable handling
at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper.valueToTree(ObjectMapper.java:3312)
at org.apache.flink.formats.json.JsonRowSerializationSchema.lambda$createFallbackConverter$8c80cf11$1(JsonRowSerializationSchema.java:266)
... 21 more
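The stack trace shows the failure comes from JsonRowSerializationSchema's fallback converter, which hands generic (Object-typed) fields to the shaded Jackson ObjectMapper, and that mapper does not handle java.time.LocalDateTime. A minimal sketch of one possible fix, assuming the column names and types can be declared up front (the field names below are made up, not from the original job), is to give the Row concrete type information so the dedicated timestamp converter is used instead of the Jackson fallback:

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.typeutils.RowTypeInfo;
import org.apache.flink.formats.json.JsonRowSerializationSchema;

// Hypothetical columns: declare each column's concrete type instead of Object,
// so LocalDateTime values are handled by the timestamp converter.
RowTypeInfo rowType = new RowTypeInfo(
        new TypeInformation<?>[] {Types.STRING, Types.LOCAL_DATE_TIME, Types.INT},
        new String[] {"id", "event_time", "value"});

JsonRowSerializationSchema schema =
        JsonRowSerializationSchema.builder()
                .withTypeInfo(rowType)
                .build();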
I have a simple Flink stream processing application (Flink version 1.13). The Flink app reads from Kafka, does stateful processing of the record, then writes the result back to Kafka.
After reading from the Kafka topic, I chose to use reinterpretAsKeyedStream() and not keyBy() to avoid a shuffle, since the records are already partitioned in Kafka. The key used to partition in Kafka is a String field of the record (using the default Kafka partitioner). The Kafka topic has 24 partitions.
The mapping class is defined as follows. It keeps track of the state of the record.
public class EnvelopeMapper extends
        KeyedProcessFunction<String, Envelope, Envelope> {
    ...
}
The processing of the record is as follows:
DataStream<Envelope> messageStream =
        env.addSource(kafkaSource);

DataStreamUtils.reinterpretAsKeyedStream(messageStream, Envelope::getId)
        .process(new EnvelopeMapper(parameters))
        .addSink(kafkaSink);
With parallelism of 1, the code runs fine. With parallelism greater than 1 (e.g. 4), I am running into the following error:
2022-06-12 21:06:30,720 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: Custom Source -> Map -> Flat Map -> KeyedProcess -> Map -> Sink: Unnamed (4/4) (7ca12ec043a45e1436f45d4b20976bd7) switched from RUNNING to FAILED on 100.101.231.222:44685-bd10d5 # 100.101.231.222 (dataPort=37839).
java.lang.IllegalArgumentException: KeyGroupRange{startKeyGroup=96, endKeyGroup=127} does not contain key group 85
Based on the stack trace, it seems the exception happens when the EnvelopeMapper class validates that the record was sent to the right replica of the mapper object.
When reinterpretAsKeyedStream() is used, how are the records distributed among the different replicas of the EnvelopeMapper?
Thank you in advance,
Ahmed.
Update
After feedback from @David Anderson, I replaced reinterpretAsKeyedStream() with keyBy(). The processing of the record is now as follows:
DataStream<Envelope> messageStream =
        env.addSource(kafkaSource)      // Line x
           .map(statelessMapper1)
           .flatMap(statelessMapper2);

messageStream.keyBy(Envelope::getId)
        .process(new EnvelopeMapper(parameters))
        .addSink(kafkaSink);
Is there any difference in performance if keyBy() is done right after reading from Kafka (marked with "Line x") vs. right before the stateful mapper (EnvelopeMapper)?
With
reinterpretAsKeyedStream(
        DataStream<T> stream,
        KeySelector<T, K> keySelector,
        TypeInformation<K> typeInfo)
you are asserting that the records are already distributed exactly as they would be if you had instead used keyBy(keySelector). This will not normally be the case with records coming straight out of Kafka. Even if they are partitioned by key in Kafka, the Kafka partitions won't be correctly associated with Flink's key groups.
reinterpretAsKeyedStream is only straightforwardly useful in cases such as handling the output of a window or process function, where you know that the output records are key-partitioned in a particular way. Using it successfully with Kafka can be very difficult: you must either be very careful in how the data is written to Kafka in the first place, or do something tricky with the keySelector so that the keyGroups it computes line up with how the keys are mapped to Kafka partitions.
One case where this isn't difficult is if the data is written to Kafka by a Flink job running with the same configuration as the downstream job that is reading the data and using reinterpretAsKeyedStream.
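To make the mismatch concrete, here is a small sketch (class name, key value, and settings are made up; this is not from the original job) comparing how Flink assigns a key to a key group with how Kafka's default partitioner assigns the same key to a partition:

import java.nio.charset.StandardCharsets;
import org.apache.flink.runtime.state.KeyGroupRangeAssignment;
import org.apache.kafka.common.utils.Utils;

public class KeyGroupVsKafkaPartition {
    public static void main(String[] args) {
        String key = "some-envelope-id";   // hypothetical key value
        int maxParallelism = 128;          // Flink's default maxParallelism
        int kafkaPartitions = 24;          // partitions of the source topic

        // Flink: key group = murmurHash(key.hashCode()) % maxParallelism
        int keyGroup = KeyGroupRangeAssignment.assignToKeyGroup(key, maxParallelism);

        // Kafka's default partitioner: murmur2(serialized key bytes) % numPartitions
        int kafkaPartition =
                Utils.toPositive(Utils.murmur2(key.getBytes(StandardCharsets.UTF_8)))
                        % kafkaPartitions;

        System.out.println("Flink key group: " + keyGroup);
        System.out.println("Kafka partition: " + kafkaPartition);
    }
}

The two hash functions operate on different representations of the key and use different moduli, so a record read from Kafka partition p can land on an operator instance whose KeyGroupRange does not contain the key group Flink computes for it, which is exactly the IllegalArgumentException shown above.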
My use case uses Beam's GenerateSequence to create an unbounded source and then applies further transforms. It works fine on the direct runner. As per the docs, Flink serializes Java primitive types and their boxed forms, but I am getting the following error.
Caused by: org.apache.flink.api.common.InvalidProgramException: [lastEmitted type:LONG pos:0, startTime type:RECORD pos:1] is not serializable. The object probably contains or references non serializable fields.
at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:140)
at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:115)
at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:115)
at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:115)
at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:115)
at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:115)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.clean(StreamExecutionEnvironment.java:1558)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.addSource(StreamExecutionEnvironment.java:1470)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.addSource(StreamExecutionEnvironment.java:1414)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.addSource(StreamExecutionEnvironment.java:1396)
at org.apache.beam.runners.flink.FlinkStreamingTransformTranslators$UnboundedReadSourceTranslator.translateNode(FlinkStreamingTransformTranslators.java:221)
... 31 more
Caused by: java.io.NotSerializableException: org.apache.avro.Schema$Field
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at java.util.ArrayList.writeObject(ArrayList.java:768)
source
GenerateSequence.from(0).withRate(1, Duration.standardSeconds(10));
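For context, a minimal sketch of how such a source might be wired into a pipeline (the downstream transform is a placeholder added for illustration; the original job's transforms are not shown):

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.GenerateSequence;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptors;
import org.joda.time.Duration;

public class SequencePipeline {
    public static void main(String[] args) {
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
        Pipeline p = Pipeline.create(options);

        // Unbounded source emitting one element every 10 seconds.
        p.apply(GenerateSequence.from(0).withRate(1, Duration.standardSeconds(10)))
         // Placeholder transform standing in for the real processing.
         .apply(MapElements.into(TypeDescriptors.strings()).via((Long n) -> "tick-" + n));

        p.run().waitUntilFinish();
    }
}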
Thanks for your help.
I am using Apache Flink v1.12.3.
Recently I have encountered this error, and I do not know exactly what it means. Is the error related to Kafka or to Flink?
Error log:
2021-07-02 21:32:50,149 WARN org.apache.kafka.clients.consumer.internals.ConsumerCoordinator [] - [Consumer clientId=consumer-myGroup-24, groupId=myGroup] Offset commit failed on partition my_topic-14 at offset 11328862986: The request timed out.
// ...
2021-07-02 21:32:50,150 INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator [] - [Consumer clientId=consumer-myGroup-24, groupId=myGroup] Group coordinator 1.2.3.4:9092 (id: 2147483616 rack: null) is unavailable or invalid, will attempt rediscovery
// ...
2021-07-02 21:33:20,553 INFO org.apache.kafka.clients.FetchSessionHandler [] - [Consumer clientId=consumer-myGroup-21, groupId=myGroup] Error sending fetch request (sessionId=1923351260, epoch=9902) to node 29: {}.
// ...
2021-07-02 21:33:19,457 INFO org.apache.kafka.clients.FetchSessionHandler [] - [Consumer clientId=consumer-myGroup-15, groupId=myGroup] Error sending fetch request (sessionId=1427185435, epoch=10820) to node 29: {}.
org.apache.kafka.common.errors.DisconnectException: null
// ...
2021-07-02 21:34:10,157 WARN org.apache.flink.runtime.taskmanager.Task [] - Source: my_topic_stream (4/15)#0 (2e2051d41edd606a093625783d844ba1) switched from RUNNING to FAILED.
org.apache.flink.streaming.connectors.kafka.internals.Handover$ClosedException: null
at org.apache.flink.streaming.connectors.kafka.internals.Handover.close(Handover.java:177) ~[blob_p-a7919582483974414f9c0d4744bab53199b880d7-d9edc9d0741b403b3931269bf42a4f6b:?]
at org.apache.flink.streaming.connectors.kafka.internals.KafkaFetcher.cancel(KafkaFetcher.java:164) ~[blob_p-a7919582483974414f9c0d4744bab53199b880d7-d9edc9d0741b403b3931269bf42a4f6b:?]
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.cancel(FlinkKafkaConsumerBase.java:945) ~[blob_p-a7919582483974414f9c0d4744bab53199b880d7-d9edc9d0741b403b3931269bf42a4f6b:?]
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.lambda$createAndStartDiscoveryLoop$2(FlinkKafkaConsumerBase.java:913) ~[blob_p-a7919582483974414f9c0d4744bab53199b880d7-d9edc9d0741b403b3931269bf42a4f6b:?]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
This is a Kafka issue. The Kafka consumer client throws a timeout error while committing the offset to the Kafka cluster. One possible reason is that the Kafka cluster is busy and cannot respond in time. This error causes the task manager running the Kafka consumer to fail.
Try adding parameters to the properties used when creating the source stream from Kafka. A possible parameter is request.timeout.ms: set it to a longer time and try again.
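A minimal sketch of where such a property would go (broker address, group id, topic name, and the timeout value are placeholders, not taken from the original job):

import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

Properties props = new Properties();
props.setProperty("bootstrap.servers", "broker:9092");   // placeholder
props.setProperty("group.id", "myGroup");
// Give the brokers more time before an offset-commit/fetch request times out.
props.setProperty("request.timeout.ms", "60000");         // example value

FlinkKafkaConsumer<String> consumer =
        new FlinkKafkaConsumer<>("my_topic", new SimpleStringSchema(), props);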
References:
https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/connectors/datastream/kafka/#kafka-consumer
https://docs.confluent.io/platform/current/installation/configuration/consumer-configs.html#consumerconfigs_request.timeout.ms
There are two clusters, one for Hive and one for Solr.
I want to import data from a Hive table in the Hive cluster into the Solr cluster.
When the number of fields configured in the managed-schema file is about 34, the import succeeds; once the number is larger than 34 (in my case 44), I get a connection reset exception.
Exception info as follows:
{"id":{"name":"id","value":"176036","boost":1.0},
"_id":{"name":"_id","value":"176036","boost":1.0}
... 44 fields in JSON format ...
org.apache.solr.client.solrj.impl.CloudSolrClient$RouteException: IOException occured when talking to server at https://ip:port/solr/gp19edamp_qxb_administrative_punishment_all_cs1_shard2_replica1
Caused by: java.net.SocketException: Connection reset
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:115)
at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431)
at sun.security.ssl.OutputRecord.write(OutputRecord.java:417)
at sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:886)
at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:857)
at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)
at org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:181)
at org.apache.http.impl.io.ContentLengthOutputStream.write(ContentLengthOutputStream.java:115)
at org.apache.http.impl.io.ContentLengthOutputStream.write(ContentLengthOutputStream.java:122)
at org.apache.http.entity.BufferedHttpEntity.writeTo(BufferedHttpEntity.java:115)
at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:96)
at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:112)
at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:117)
at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:265)
at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:216)
at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:237)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:122)
at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:686)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:488)
at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:884)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
at com.huawei.solr.client.solrj.impl.InsecureHttpClient.execute(InsecureHttpClient.java:143)
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:644)
... 10 more
There is no built-in limit on the number of fields. The limit is going to be dictated by your hardware resources.
Hope this answers your question.
CoGroupByKey problem
Data description.
I have two datasets.
Records - the first dataset; it contains around 0.5-1M records per (key, day). For testing I use 2-3 keys and 5-10 days of data. What I'm aiming for is 1000+ keys. Each record contains a key, a timestamp in microseconds, and some other data.
Configs - the second dataset; it is rather small. It describes the key over time, e.g. you can think of it as a list of tuples: (key, start date, end date, description).
For the exploration I've encoded the data as files of length-prefixed, binary-encoded Protocol Buffer messages. Additionally, the files are gzip-compressed. Data is sharded by date. Each file is around 10 MB.
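For reference, a minimal sketch of reading one such file, assuming the standard writeDelimitedTo/parseDelimitedFrom varint framing; the Record class and the file name are stand-ins, not from the original code:

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// "Record" stands in for the generated Protocol Buffer message class.
try (InputStream in = new GZIPInputStream(new BufferedInputStream(
        new FileInputStream("records-2016-05-08.pb.gz")))) {   // hypothetical shard
    Record record;
    // parseDelimitedFrom reads one length-prefixed message and returns null at EOF.
    while ((record = Record.parseDelimitedFrom(in)) != null) {
        // process the record ...
    }
}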
Pipeline
I use Apache Beam to express a pipeline.
First I add keys to both datasets. For the Records dataset the key is (key, day-rounded timestamp). For Configs the key is (key, day), where day is each timestamp value between start date and end date (pointing to midnight).
The datasets are merged using CoGroupByKey.
As a key type I use org.apache.flink.api.java.tuple.Tuple2 with a Tuple2Coder from repo github.com/orian/tuple-coder.
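A rough sketch of that join, with Record, Config, and the keyed inputs as stand-ins for the real classes (not taken from the original code):

import org.apache.beam.sdk.transforms.join.CoGbkResult;
import org.apache.beam.sdk.transforms.join.CoGroupByKey;
import org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.flink.api.java.tuple.Tuple2;

// Both inputs are already keyed by (key, day) as described above.
PCollection<KV<Tuple2<String, Long>, Record>> keyedRecords = ...;
PCollection<KV<Tuple2<String, Long>, Config>> keyedConfigs = ...;

final TupleTag<Record> recordTag = new TupleTag<>();
final TupleTag<Config> configTag = new TupleTag<>();

// CoGroupByKey groups both inputs by the shared (key, day) key.
PCollection<KV<Tuple2<String, Long>, CoGbkResult>> joined =
        KeyedPCollectionTuple.of(recordTag, keyedRecords)
                .and(configTag, keyedConfigs)
                .apply(CoGroupByKey.create());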
The problem
If the Records dataset is tiny, e.g. 5 days, everything seems fine (check normal_run.log).
INFO [main] (FlinkPipelineRunner.java:124) - Final aggregator values:
INFO [main] (FlinkPipelineRunner.java:127) - item count : 4322332
INFO [main] (FlinkPipelineRunner.java:127) - missing val1 : 0
INFO [main] (FlinkPipelineRunner.java:127) - multiple val1 : 0
When I run the pipeline against 10+ days I encounter an error indicating that for some Records there's no Config (wrong_run.log).
INFO [main] (FlinkPipelineRunner.java:124) - Final aggregator values:
INFO [main] (FlinkPipelineRunner.java:127) - item count : 8577197
INFO [main] (FlinkPipelineRunner.java:127) - missing val1 : 6
INFO [main] (FlinkPipelineRunner.java:127) - multiple val1 : 0
Then I've added some extra logging messages:
(a.java:144) - 68643 items for KeyValue3 on: 1462665600000000
(a.java:140) - no items for KeyValue3 on: 1463184000000000
(a.java:123) - missing for KeyValue3 on: 1462924800000000
(a.java:142) - 753707 items for KeyValue3 on: 1462924800000000 marked as no-loc
(a.java:123) - missing for KeyValue3 on: 1462752000000000
(a.java:142) - 749901 items for KeyValue3 on: 1462752000000000 marked as no-loc
(a.java:144) - 754578 items for KeyValue3 on: 1462406400000000
(a.java:144) - 751574 items for KeyValue3 on: 1463011200000000
(a.java:123) - missing for KeyValue3 on: 1462665600000000
(a.java:142) - 754758 items for KeyValue3 on: 1462665600000000 marked as no-loc
(a.java:123) - missing for KeyValue3 on: 1463184000000000
(a.java:142) - 694372 items for KeyValue3 on: 1463184000000000 marked as no-loc
You can spot that in the first line 68643 items were processed for KeyValue3 and time 1462665600000000.
Later on, in line 9, it seems the operation processes the same key again, but it reports that no Config was available for these Records.
Line 10 reports that they've been marked as no-loc.
Line 2 says that there were no items for KeyValue3 and time 1463184000000000, but in line 11 you can see that the items for this (key, day) pair were processed later and that they lacked a Config.
Some clues
During one of the exploration runs I've got an exception (exception_thrown.log).
05/26/2016 03:49:49 GroupReduce (GroupReduce at GroupByKey)(1/5) switched to FAILED
java.lang.Exception: The data preparation for task 'GroupReduce (GroupReduce at GroupByKey)' , caused an error: Error obtaining the sorted input: Thread 'SortMerger spilling thread' terminated due to an exception: Error obtaining the sorted input: Thread 'SortMerger Reading Thread' terminated due to an exception: tried to access field com.esotericsoftware.kryo.io.Input.inputStream from class org.apache.flink.api.java.typeutils.runtime.NoFetchingInput
at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:455)
at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:345)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Error obtaining the sorted input: Thread 'SortMerger spilling thread' terminated due to an exception: Error obtaining the sorted input: Thread 'SortMerger Reading Thread' terminated due to an exception: tried to access field com.esotericsoftware.kryo.io.Input.inputStream from class org.apache.flink.api.java.typeutils.runtime.NoFetchingInput
at org.apache.flink.runtime.operators.sort.UnilateralSortMerger.getIterator(UnilateralSortMerger.java:619)
at org.apache.flink.runtime.operators.BatchTask.getInput(BatchTask.java:1079)
at org.apache.flink.runtime.operators.GroupReduceDriver.prepare(GroupReduceDriver.java:94)
at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:450)
... 3 more
Caused by: java.io.IOException: Thread 'SortMerger spilling thread' terminated due to an exception: Error obtaining the sorted input: Thread 'SortMerger Reading Thread' terminated due to an exception: tried to access field com.esotericsoftware.kryo.io.Input.inputStream from class org.apache.flink.api.java.typeutils.runtime.NoFetchingInput
at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:799)
Caused by: java.lang.RuntimeException: Error obtaining the sorted input: Thread 'SortMerger Reading Thread' terminated due to an exception: tried to access field com.esotericsoftware.kryo.io.Input.inputStream from class org.apache.flink.api.java.typeutils.runtime.NoFetchingInput
at org.apache.flink.runtime.operators.sort.UnilateralSortMerger.getIterator(UnilateralSortMerger.java:619)
at org.apache.flink.runtime.operators.sort.LargeRecordHandler.finishWriteAndSortKeys(LargeRecordHandler.java:263)
at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$SpillingThread.go(UnilateralSortMerger.java:1409)
at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:796)
Caused by: java.io.IOException: Thread 'SortMerger Reading Thread' terminated due to an exception: tried to access field com.esotericsoftware.kryo.io.Input.inputStream from class org.apache.flink.api.java.typeutils.runtime.NoFetchingInput
at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:799)
Caused by: java.lang.IllegalAccessError: tried to access field com.esotericsoftware.kryo.io.Input.inputStream from class org.apache.flink.api.java.typeutils.runtime.NoFetchingInput
at org.apache.flink.api.java.typeutils.runtime.NoFetchingInput.readBytes(NoFetchingInput.java:122)
at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:297)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18)
at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:706)
at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:228)
at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:242)
at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:144)
at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:30)
at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:144)
at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:30)
at org.apache.flink.runtime.io.disk.InputViewIterator.next(InputViewIterator.java:43)
at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ReadingThread.go(UnilateralSortMerger.java:973)
at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:796)
Work-around (after more testing it doesn't work; staying with Tuple2)
I've switched from using Tuple2 to a Protocol Buffer message:
message KeyDay {
  optional bytes key = 1;
  optional int64 timestamp_usec = 2;
}
But using Tuple2.of() was just easier than: KeyDay.newBuilder().setKey(...).setTimestampUsec(...).build().
When I switched the key to a class derived from protobuf Message, the problem disappeared for 10-15 days of data (a data size that was already a problem for Tuple2), but increasing the data size to 20 days revealed it's still there.