NPE when I submit the jar to Flink - apache-flink

I wrote code that consumes a stream from Kafka and then sinks it back to another Kafka topic.
The code runs normally in my IDE, but when I submit the jar through the Flink web UI, it throws a NullPointerException on String[] cells = s.split(",");
Any help is appreciated. The full exception is:
java.lang.NullPointerException
at java.base/java.lang.String.split(String.java:2273)
at java.base/java.lang.String.split(String.java:2364)
at ex_filter_operation$SplitterKafkaString.map(ex_filter_operation.java:336)
at ex_filter_operation$SplitterKafkaString.map(ex_filter_operation.java:330)
at org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:41)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:717)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:692)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:672)
at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:52)
at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:30)
at org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collect(StreamSourceContexts.java:104)
at org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collectWithTimestamp(StreamSourceContexts.java:111)
at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecordsWithTimestamps(AbstractFetcher.java:352)
at org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.partitionConsumerRecordsHandler(KafkaFetcher.java:185)
at org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.runFetchLoop(KafkaFetcher.java:141)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:755)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63)
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:213)
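The trace shows that s is null when map() runs, which suggests some records on the cluster's topic deserialize to null (for example, Kafka tombstone records), while the topic consumed from the IDE perhaps never contained such records. A minimal defensive sketch, assuming SplitterKafkaString is a plain map over comma-separated strings: switch it to a FlatMapFunction so null or empty records can be dropped instead of crashing the job:
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.util.Collector;

public static class SplitterKafkaString implements FlatMapFunction<String, String[]> {
    @Override
    public void flatMap(String s, Collector<String[]> out) {
        // Guard: skip records that deserialized to null or empty strings
        // instead of letting split(",") throw a NullPointerException.
        if (s == null || s.isEmpty()) {
            return;
        }
        out.collect(s.split(","));
    }
}
Wire it in with .flatMap(new SplitterKafkaString()) in place of the original .map(...).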

Related

The main method caused an error: No operators defined in streaming topology. Cannot execute

I have written Flink code to read from Pub/Sub. While executing the code with the command flink run Flink.jar, I get the error below. I am using Flink version 1.9.3.
Starting execution of program
------------------------------------------------------------
The program finished with the following exception:
org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: No operators defined in streaming topology. Cannot execute.
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:621)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:466)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:274)
at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:746)
at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:273)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:205)
at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1008)
at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1081)
at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1081)
Caused by: java.lang.IllegalStateException: No operators defined in streaming topology. Cannot execute.
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.getStreamGraphGenerator(StreamExecutionEnvironment.java:1545)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.getStreamGraph(StreamExecutionEnvironment.java:1540)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1507)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1489)
at org.flink.ReadFromPubsub.main(ReadFromPubsub.java:30)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:604)
... 9 more
Here is the code I am using:
package org.flink;

import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.flink.streaming.connectors.gcp.pubsub.PubSubSource;

public class ReadFromPubsub {
    public static void main(String args[]) throws Exception {
        System.out.println("Flink Pubsub Code Read 1");
        StreamExecutionEnvironment streamExecEnv = StreamExecutionEnvironment.getExecutionEnvironment();
        DeserializationSchema<String> deserializer = new SimpleStringSchema();
        SourceFunction<String> pubsubSource = PubSubSource.newBuilder()
                .withDeserializationSchema(deserializer)
                .withProjectName("vz-it-np-gudv-dev-vzntdo-0")
                .withSubscriptionName("subscription1")
                .build();
        streamExecEnv.addSource(pubsubSource);
        streamExecEnv.execute();
    }
}
I am trying to read data from Pub/Sub with Flink but am not able to do so.
Flink uses lazy evaluation, and since you haven't specified any sinks, there is no reason to execute this job.
From the Flink docs:
All Flink programs are executed lazily: When the program’s main method is executed, the data loading and transformations do not happen directly. Rather, each operation is created and added to a dataflow graph. The operations are actually executed when the execution is explicitly triggered by an execute() call on the execution environment.
However, your dataflow graph has no output in this case, which makes processing unnecessary.
For debugging purposes, you can add a print sink to your source to make your example work:
streamExecEnv.addSource(pubsubSource).print();
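Once the pipeline is verified, you would attach a real sink instead of print(). A hedged sketch using the same connector's Pub/Sub sink (builder method names as documented for flink-connector-gcp-pubsub; the topic name here is a hypothetical placeholder, so verify against your connector version):
import org.apache.flink.streaming.api.functions.sink.SinkFunction;
import org.apache.flink.streaming.connectors.gcp.pubsub.PubSubSink;

SinkFunction<String> pubsubSink = PubSubSink.newBuilder()
        .withSerializationSchema(new SimpleStringSchema())  // SimpleStringSchema also serializes
        .withProjectName("vz-it-np-gudv-dev-vzntdo-0")
        .withTopicName("output-topic")                      // hypothetical topic name
        .build();
streamExecEnv.addSource(pubsubSource).addSink(pubsubSink);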

Apache Flink with Kinesis Analytics : java.lang.IllegalArgumentException: The fraction of memory to allocate should not be 0

Background:
I have been trying to set up BATCH + STREAMING in the same Flink application, deployed on the Kinesis Analytics runtime. The STREAMING part works fine, but I'm having trouble adding support for BATCH.
The logic is something like this:
streamExecutionEnvironment.setRuntimeMode(RuntimeExecutionMode.BATCH);
streamExecutionEnvironment
        .fromSource(FileSource.forRecordStreamFormat(new TextLineFormat(), path).build(),
                WatermarkStrategy.noWatermarks(),
                "Text File")
        .process(/* process function which transforms input */)
        .assignTimestampsAndWatermarks(WatermarkStrategy
                .<Event>forBoundedOutOfOrderness(orderness)
                .withTimestampAssigner(
                        (SerializableTimestampAssigner<Event>) (event, l) -> event.getEventTime()))
        .keyBy(keyFunction)
        .window(TumblingEventTimeWindows.of(Time.days(x)))
        .process(processWindowFunction);
On doing this, I get the exception below:
java.lang.Exception: Exception while creating StreamOperatorStateContext.
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:254)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:272)
at org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:441)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:582)
at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
at org.apache.flink.streaming.runtime.tasks.StreamTask.executeRestore(StreamTask.java:562)
at org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:647)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:537)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:764)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:571)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.flink.util.FlinkException: Could not restore keyed state backend for WindowOperator_90bea66de1c231edf33913ecd54406c1_(1/1) from any of the 1 provided restore options.
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:345)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:163)
... 10 more
Caused by: java.io.IOException: Failed to acquire shared cache resource for RocksDB
at org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.allocateSharedCachesIfConfigured(RocksDBOperationUtils.java:306)
at org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend.createKeyedStateBackend(EmbeddedRocksDBStateBackend.java:426)
at org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend.createKeyedStateBackend(EmbeddedRocksDBStateBackend.java:90)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:328)
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:168)
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
... 12 more
Caused by: java.lang.IllegalArgumentException: The fraction of memory to allocate should not be 0. Please make sure that all types of managed memory consumers contained in the job are configured with a non-negative weight via `taskmanager.memory.managed.consumer-weights`.
at org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:160)
at org.apache.flink.runtime.memory.MemoryManager.validateFraction(MemoryManager.java:672)
at org.apache.flink.runtime.memory.MemoryManager.computeMemorySize(MemoryManager.java:653)
at org.apache.flink.runtime.memory.MemoryManager.getSharedMemoryResourceForManagedMemory(MemoryManager.java:521)
at org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.allocateSharedCachesIfConfigured(RocksDBOperationUtils.java:302)
... 17 more
It seems kinesis-analytics does not allow clients to supply a flink-conf.yaml in which to set taskmanager.memory.managed.consumer-weights. Is there any way around this?
It's not clear to me what the underlying cause of this exception is, nor how to make batch processing work on KDA.
You can try this (but I'm not sure KDA will allow it):
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

Configuration conf = new Configuration();
conf.setString("taskmanager.memory.managed.consumer-weights", "put-the-value-here");
StreamExecutionEnvironment env =
        StreamExecutionEnvironment.getExecutionEnvironment(conf);

Unable to stream with Flink Streaming

I am new to the Flink Streaming framework and am trying to understand its components and flow. I am trying to run the basic wordcount example using the DataStream API in my IDE. The code runs with no issues when I feed data in from a collection:
DataStream<String> text = env.fromElements(
        "To be, or not to be,--that is the question:--",
        "Whether 'tis nobler in the mind to suffer",
        "The slings and arrows of outrageous fortune",
        "Or to take arms against a sea of troubles,"
);
But it fails every time I try to read data either from a socket or from a file, as below:
DataStream<String> text = env.socketTextStream("localhost", 9999);
or
DataStream<String> text = env.readTextFile("sample_file.txt");
For both the socket and the text file, I get the following error:
Exception in thread "main" java.lang.reflect.InaccessibleObjectException: Unable to make field private final byte[] java.lang.String.value accessible: module java.base does not "opens java.lang" to unnamed module
Flink currently (as of Flink 1.14) only supports Java 8 and Java 11. Install a suitable JDK and try again.
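If changing the JDK is not immediately possible, a commonly used stopgap (not part of the original answer, and not guaranteed to cover every incompatibility) is to open the affected module to reflection when launching the JVM:
--add-opens java.base/java.lang=ALL-UNNAMED
The reliable fix remains running Flink on a supported Java version.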

AppEngine RemoteAPI SocketTimeoutException

I'm using the RemoteAPI to fetch entities from the GAE Datastore, 300 at a time.
I'm doing something along the lines of:
while (!(emails = getEmails()).isEmpty()) {
    Filter filter = new FilterPredicate("email", FilterOperator.IN, emails);
    Query query = new Query("MyEntity").setFilter(filter);
    QueryResultIterable<Entity> result = ds.prepare(query).asQueryResultIterable();
    for (Entity entity : result) {
        System.out.println(entity.getProperty("name"));
    }
}
I'm processing something like 50k emails. The first time I ran this code, it got maybe 3/4 of the way through, then threw the following exception. Now it throws after a single loop iteration.
com.google.appengine.tools.remoteapi.RemoteApiException: remote API call: I/O error
at com.google.appengine.tools.remoteapi.RemoteRpc.makeException(RemoteRpc.java:160)
at com.google.appengine.tools.remoteapi.RemoteRpc.callImpl(RemoteRpc.java:104)
at com.google.appengine.tools.remoteapi.RemoteRpc.call(RemoteRpc.java:50)
at com.google.appengine.tools.remoteapi.RemoteDatastore.runQuery(RemoteDatastore.java:156)
at com.google.appengine.tools.remoteapi.RemoteDatastore.handleRunQuery(RemoteDatastore.java:115)
at com.google.appengine.tools.remoteapi.RemoteDatastore.handleDatastoreCall(RemoteDatastore.java:93)
at com.google.appengine.tools.remoteapi.RemoteApiDelegate.makeDefaultSyncCall(RemoteApiDelegate.java:57)
at com.google.appengine.tools.remoteapi.StandaloneRemoteApiDelegate.makeSyncCall(StandaloneRemoteApiDelegate.java:47)
at com.google.appengine.tools.remoteapi.StandaloneRemoteApiDelegate$1.call(StandaloneRemoteApiDelegate.java:58)
at com.google.appengine.tools.remoteapi.StandaloneRemoteApiDelegate$1.call(StandaloneRemoteApiDelegate.java:54)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:442)
at sun.security.ssl.InputRecord.read(InputRecord.java:480)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:934)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:891)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:102)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:690)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1324)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:338)
at com.google.appengine.repackaged.com.google.api.client.http.javanet.NetHttpResponse.<init>(NetHttpResponse.java:37)
at com.google.appengine.repackaged.com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:94)
at com.google.appengine.repackaged.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:972)
at com.google.appengine.tools.remoteapi.OAuthClient.post(OAuthClient.java:54)
at com.google.appengine.tools.remoteapi.RemoteRpc.callImpl(RemoteRpc.java:102)
... 12 more
I can't figure out what the problem is, but the code seems to evaluate the for() condition before throwing the exception.
Could this be a quota problem? The quota details screen doesn't show any problems, and I can't find any relevant information in the documentation.
For future readers of this question, if you see occurrences of RemoteApiException: remote API call: I/O error which are happening consistently and not intermittently, this could be related to a disruption in network connectivity or possibly a remote issue on the App Engine side.
If the first possibility is ruled out, the best course of action is to report the issue on the Google App Engine issue tracker.
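If the failures turn out to be intermittent rather than consistent, a plain retry with backoff around the query is a common mitigation. A hypothetical sketch (not from the original answers; assumes the enclosing method may throw InterruptedException):
for (int attempt = 1; attempt <= 3; attempt++) {
    try {
        QueryResultIterable<Entity> result = ds.prepare(query).asQueryResultIterable();
        for (Entity entity : result) {
            System.out.println(entity.getProperty("name"));
        }
        break; // success, stop retrying
    } catch (com.google.appengine.tools.remoteapi.RemoteApiException e) {
        if (attempt == 3) throw e;     // give up after the last attempt
        Thread.sleep(1000L * attempt); // simple linear backoff before retrying
    }
}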
To fix this, first check your Internet connection. Then clean all artifacts and build them again (in IntelliJ):
Go to Build => Build Artifacts...
Focus on All Artifacts => Clean
Focus on All Artifacts => Build

Google App Engine time out?

In a Google App Engine app, I used the following lines to read a page from a site:
String Url = "http://...", line, Result = "";
URL url = new URL(Url + "?r=" + System.currentTimeMillis());
BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream()));
while ((line = reader.readLine()) != null) {
    Result += line + "\n";
}
reader.close();
But I got the following error :
Uncaught exception from servlet
com.google.apphosting.api.DeadlineExceededException: This request (f5e2889605d27d42) started at 2011/09/07 03:20:41.458 UTC and was still executing at 2011/09/07 03:21:30.888 UTC.
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:276)
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:82)
at com.google.appengine.tools.development.TimedFuture.get(TimedFuture.java:55)
at com.google.common.util.concurrent.ForwardingFuture.get(ForwardingFuture.java:69)
at com.google.apphosting.runtime.ApiProxyImpl.doSyncCall(ApiProxyImpl.java:177)
at com.google.apphosting.runtime.ApiProxyImpl.access$000(ApiProxyImpl.java:56)
at com.google.apphosting.runtime.ApiProxyImpl$1.run(ApiProxyImpl.java:150)
at com.google.apphosting.runtime.ApiProxyImpl$1.run(ApiProxyImpl.java:148)
at java.security.AccessController.doPrivileged(Native Method)
It seems the request took longer than App Engine was willing to wait. What can I do if that site is slow?
A DeadlineExceededException is thrown when the code handling a request to your web application takes over 30 seconds to process. Presumably your code is taking a while because of the length of time it had to wait to receive data from the other site.
You can create a task on a task queue to fetch and process that data, and change your web request/response flow to reply with progress on your task.
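A minimal sketch of the enqueue step, using the App Engine Task Queue API (the /fetchworker URL is a hypothetical worker servlet you would register yourself):
import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;

// In the request handler: enqueue the slow fetch instead of doing it inline.
Queue queue = QueueFactory.getDefaultQueue();
queue.add(TaskOptions.Builder
        .withUrl("/fetchworker")   // hypothetical worker servlet that performs the fetch
        .param("target", Url));    // hand the slow site's URL to the worker
The worker then gets a 10-minute deadline per task, which is usually enough headroom for a slow remote site.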
If your code is running inside a request handler, then by default there is an App-Engine-enforced 60-second deadline, which you can't change. See the "Deadlines" row / "automatic scaling" column of the chart under "Scaling types" on this page:
https://developers.google.com/appengine/docs/java/modules/
However, this code will be able to run for some hours if you change your module to use "manual scaling" and a "B1"-"B4" instance. Example:
default/src/main/webapp/WEB-INF/appengine-web.xml:
<?xml version="1.0" encoding="utf-8"?>
<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
    <application>myapp</application>
    <module>default</module>
    <version>1</version>
    <threadsafe>true</threadsafe>
    <instance-class>B1</instance-class>
    <manual-scaling>
        <instances>1</instances>
    </manual-scaling>
</appengine-web-app>
On this type of instance, your requests will typically not time out for hours. (The documentation claims that the deadline is "indefinite".)
