Collecting Metrics with Graphite Plugin leads to "A metric named [..] already exists" error - apache-flink

When I configure flink-conf.yaml to collect metrics with the Graphite plugin,
most of the time only incomplete metrics are sent. The TaskManager log shows multiple errors like:
2018-08-15 00:58:59,016 WARN org.apache.flink.runtime.metrics.MetricRegistryImpl - Error while registering metric.
java.lang.IllegalArgumentException: A metric named mycomputer.taskmanager.8ceab4c3dfbf9fc5fa2af0447f1373a1.State machine job.Source: Custom Source.0.numRecordsOut already exists
at com.codahale.metrics.MetricRegistry.register(MetricRegistry.java:91)
at org.apache.flink.dropwizard.ScheduledDropwizardReporter.notifyOfAddedMetric(ScheduledDropwizardReporter.java:131)
at org.apache.flink.runtime.metrics.MetricRegistryImpl.register(MetricRegistryImpl.java:329)
at org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.addMetric(AbstractMetricGroup.java:379)
at org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.counter(AbstractMetricGroup.java:312)
at org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.counter(AbstractMetricGroup.java:302)
at org.apache.flink.runtime.metrics.groups.OperatorIOMetricGroup.<init>(OperatorIOMetricGroup.java:41)
at org.apache.flink.runtime.metrics.groups.OperatorMetricGroup.<init>(OperatorMetricGroup.java:48)
at org.apache.flink.runtime.metrics.groups.TaskMetricGroup.addOperator(TaskMetricGroup.java:146)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.setup(AbstractStreamOperator.java:174)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.setup(AbstractUdfStreamOperator.java:82)
at org.apache.flink.streaming.runtime.tasks.OperatorChain.<init>(OperatorChain.java:143)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:267)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:748)
I tried this on a fresh flink-1.6.0 release with the following config and the precompiled "State machine job" from the examples folder:
metrics.reporters: grph
metrics.reporter.grph.class: org.apache.flink.metrics.graphite.GraphiteReporter
metrics.reporter.grph.host: localhost
metrics.reporter.grph.port: 2003
metrics.reporter.grph.interval: 1 SECONDS
metrics.reporter.grph.protocol: TCP
I use the official Graphite Docker image (https://hub.docker.com/r/graphiteapp/docker-graphite-statsd/), running with its default configuration.
Does anybody have an idea how I can fix this issue?
Thanks and best regards
Update
To rule out a specific local setting as the cause of this behaviour, I repeated the process on a clean EC2 instance. Exactly the same error occurs there.
How to reproduce:
Start an EC2 t2.xlarge instance
Install Java
Download Flink from https://www.apache.org/dyn/closer.lua/flink/flink-1.6.0/flink-1.6.0-bin-scala_2.11.tgz
Add flink-metrics-graphite-1.6.0.jar to lib
Configure flink-conf.yaml as shown in the config above
./bin/start-cluster.sh
./bin/flink run examples/streaming/StateMachineExample.jar
I did not set up Graphite in this case, because the error evidently occurs before any connection to it is made.
After the job has been started, you can view the error in the Flink dashboard under Task Manager -> Logs.
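To see how many distinct metric names are affected, the TaskManager log can be scanned for the message shown in the stack trace above. The following is only a minimal sketch: the function name `count_duplicate_metrics` and the inline sample lines are illustrative assumptions, and it reports the collisions without explaining their cause.

```python
import re
from collections import Counter

# Matches the message raised by MetricRegistry.register in the trace above.
DUPLICATE_METRIC = re.compile(r"A metric named (.+) already exists")


def count_duplicate_metrics(log_lines):
    """Return a Counter mapping each rejected metric name to how often it was rejected."""
    counts = Counter()
    for line in log_lines:
        match = DUPLICATE_METRIC.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Two lines copied from the log excerpt in the question (hypothetical stand-in
# for reading the real TaskManager log file line by line).
sample_log = [
    "2018-08-15 00:58:59,016 WARN org.apache.flink.runtime.metrics.MetricRegistryImpl"
    " - Error while registering metric.",
    "java.lang.IllegalArgumentException: A metric named mycomputer.taskmanager."
    "8ceab4c3dfbf9fc5fa2af0447f1373a1.State machine job.Source: Custom Source.0."
    "numRecordsOut already exists",
]

duplicates = count_duplicate_metrics(sample_log)
for name, count in duplicates.items():
    print(f"{count}x {name}")
```

For a real log, pass an open file object instead of `sample_log`; iterating over it yields one line at a time.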

Related

Setting up a Flink cluster with Podman for a beampipeline with flinkrunner

My goal is to create a streaming pipeline to read data from Apache Kafka, process the data, and write back to it.
Because of security reasons, I want to avoid Docker and use Podman.
I have set up a minimal cluster via a docker-compose.yml with a jobmanager, a taskmanager and a Python SDK harness worker. The SDK harness worker seems to get stuck when I try to execute a pipeline.
When I run the pipeline (reading a multi-line .txt file and writing it back to a file), it is transferred to the jobmanager and taskmanager correctly, but then goes idle. When I look in the pythonsdk container, the logs show the following message repeatedly:
2022/12/04 16:13:02 Starting worker pool 1: python -m apache_beam.runners.worker.worker_pool_main --service_port=50000 --container_executable=/opt/apache/beam/boot
Starting worker with command ['/opt/apache/beam/boot', '--id=1-1', '--logging_endpoint=localhost:45087', '--artifact_endpoint=localhost:35323', '--provision_endpoint=localhost:36435', '--control_endpoint=localhost:33237']
2022/12/04 16:16:31 Failed to obtain provisioning information: failed to dial server at localhost:36435
caused by:
context deadline exceeded
Here is a link to a test pipeline that was created:
Example on github
Environment:
Debian 11;
Podman;
Python 3.2.9;
apache-beam==2.38.0; and
podman-compose
The setup of the cluster defined in:
docker-compose.yml
1x flink-jobmanager (flink version 1.14)
1x flink-taskmanager
1x Python Harness SDK
I chose to create an SDK container manually because I don't have Docker installed, and Flink fails when it tries to create a container via Docker.
I suspect that I have made a mistake in the network setup, or that some configuration is missing for the harness worker, but I could not figure out the problem. Any thoughts?
Crossposted in user mailing list of beam.apache.org
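One way to narrow down a "failed to dial server at localhost:..." message is to test, from inside the pythonsdk container, whether the endpoints handed to the worker are reachable at all. The sketch below is a diagnostic aid under assumptions, not a fix: the helper names `parse_endpoints` and `can_connect` are made up for illustration, and the command string is copied from the log above. If the endpoints are not reachable there, one possible explanation is that `localhost` inside the harness container does not refer to the taskmanager, which would point at the network setup.

```python
import re
import socket

# Extracts "--<something>_endpoint=host:port" arguments from a worker command line.
ENDPOINT = re.compile(r"--(\w+_endpoint)=([^:\s']+):(\d+)")


def parse_endpoints(command_text):
    """Return {endpoint_name: (host, port)} for each endpoint argument found."""
    return {name: (host, int(port)) for name, host, port in ENDPOINT.findall(command_text)}


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Worker command as printed in the pythonsdk container log.
worker_command = (
    "['/opt/apache/beam/boot', '--id=1-1', '--logging_endpoint=localhost:45087', "
    "'--artifact_endpoint=localhost:35323', '--provision_endpoint=localhost:36435', "
    "'--control_endpoint=localhost:33237']"
)

endpoints = parse_endpoints(worker_command)
for name, (host, port) in endpoints.items():
    status = "reachable" if can_connect(host, port, timeout=1.0) else "NOT reachable"
    print(f"{name:20s} {host}:{port} {status}")
```

The result is only meaningful when the script runs in the same container (and network namespace) as the SDK harness; run elsewhere, it tests a different `localhost`. The ports change on every job submission, so the command line has to be copied from the current log.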

Upgrading Apache Flink need to update pom.xml?

I've just upgraded Flink from version 1.9.1 to 1.11.2 (using Docker).
I already have many Flink jobs running on version 1.9.1.
When I try to upgrade to 1.11.1 and rerun my job, it shows an error:
2020-11-12 06:49:17,731 WARN org.apache.zookeeper.ClientCnxn [] - SASL configuration failed: javax.security.auth.login.LoginException: No JAAS configuration section named 'Client' was found in specified JAAS configuration file: '/tmp/jaas-1135609831848314731.conf'. Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it.
2020-11-12 06:49:17,739 INFO org.apache.zookeeper.ClientCnxn [] - Opening socket connection to server xxxxxx:2181
2020-11-12 06:49:17,741 ERROR org.apache.curator.ConnectionState [] - Authentication failed
And this is the error after deploying my flink job:
Caused by: java.lang.RuntimeException: API paths not defined
and also:
java.lang.NoSuchMethodError: org.apache.flink.api.common.state.OperatorStateStore.getSerializableListState(Ljava/lang/String;)Lorg/apache/flink/api/common/state/ListState;
Do I need to change the pom.xml of every one of my Flink jobs?
Is there any workaround that does not require changing my source code?
Thanks
Yes, you do have to rebuild your Flink jobs whenever you update the Flink version being used to run them. The libraries you use should be from the same exact version used by the Job Manager and Task Managers.
If you are trying to automate deployments for a CI/CD pipeline, you could inject the version number into the pom.xml using an environment variable -- but doing things like that can make it hard to debug when things go wrong.
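The version rule above can be checked mechanically before a job is rebuilt. The following is a rough sketch under assumptions: it handles only a single pom.xml with literal versions or simple `${property}` placeholders defined in the same file (no parent POMs, profiles or dependencyManagement), and the sample POM, the artifact list and the helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET

POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def flink_dependency_versions(pom_text):
    """Return {artifactId: version} for every org.apache.flink dependency in a POM."""
    root = ET.fromstring(pom_text)
    # Collect <properties> so that ${...} placeholders can be resolved.
    properties = {
        prop.tag.split("}", 1)[-1]: (prop.text or "").strip()
        for prop in root.findall("m:properties/*", POM_NS)
    }
    versions = {}
    for dep in root.findall(".//m:dependency", POM_NS):
        group = dep.findtext("m:groupId", default="", namespaces=POM_NS).strip()
        if group != "org.apache.flink":
            continue
        artifact = dep.findtext("m:artifactId", default="", namespaces=POM_NS).strip()
        version = dep.findtext("m:version", default="", namespaces=POM_NS).strip()
        if version.startswith("${") and version.endswith("}"):
            # Unknown placeholders are kept as-is so they show up as mismatches.
            version = properties.get(version[2:-1], version)
        versions[artifact] = version
    return versions


def mismatches(pom_text, cluster_version):
    """Return the Flink dependencies whose version differs from the cluster's."""
    found = flink_dependency_versions(pom_text)
    return {a: v for a, v in found.items() if v != cluster_version}


# Hypothetical POM: one dependency still pinned to the old version via a property.
SAMPLE_POM = """
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <properties>
    <flink.version>1.9.1</flink.version>
  </properties>
  <dependencies>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-streaming-java_2.11</artifactId>
      <version>${flink.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-connector-kafka_2.11</artifactId>
      <version>1.11.2</version>
    </dependency>
  </dependencies>
</project>
"""

stale = mismatches(SAMPLE_POM, cluster_version="1.11.2")
print(stale)
```

A check like this only flags version drift; API changes between releases, such as the missing `getSerializableListState` method in the error above, may still require source changes.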

[AWS Glue]: org.apache.thrift.TApplicationException: Internal error processing createInterpreter

I'm trying to use zeppelin-0.8.0 to connect to an AWS Glue development endpoint, and when I execute a cell the error below occurs.
There is no helpful message to indicate what the problem could be. Any leads appreciated.
172318_1906434757 is finished, status: ERROR, exception: java.lang.RuntimeException: org.apache.thrift.TApplicationException: Internal error processing createInterpreter, result: %text org.apache.thrift.TApplicationException: Internal error processing createInterpreter
at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_createInterpreter(RemoteInterpreterService.java:209)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.createInterpreter(RemoteInterpreterService.java:192)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter$2.call(RemoteInterpreter.java:169)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter$2.call(RemoteInterpreter.java:165)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.callRemoteFunction(RemoteInterpreterProcess.java:135)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.internal_create(RemoteInterpreter.java:165)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.open(RemoteInterpreter.java:132)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:299)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:407)
at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:307)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
UPDATE: As the answer below suggests, it looks like 0.8.0 doesn't work with Glue yet. I had problems running 0.7.x as well, with the javax.ws.rx package throwing a bunch of MethodNotFoundException errors when running with Java 8 (switching to Java 7 with update-alternative did not help either). But when running inside a JDK 7 Docker container, it worked with no problems and was able to connect to my dev endpoint. I would highly appreciate it if anyone could clarify the root cause.
Could you please provide more information, such as where the Zeppelin instance is located? Is it running on your desktop/laptop, or as an AWS notebook server? Also, did you try connecting with Zeppelin version 0.7.3, as mentioned in this AWS forum link:
https://forums.aws.amazon.com/thread.jspa?threadID=285128
According to the above link, dated July 2018, AWS Glue does not yet support Zeppelin 0.8.
I am assuming all other configuration and environment settings are done as needed. I can help more if you provide additional information.
UPDATE:
Anyway, please refer here and to "setting up zeppelin on windows" for help with setting up a local development environment and a Zeppelin notebook.
Once you have set up the Zeppelin notebook, establish an SSH connection (using the AWS Glue DevEndpoint URL) so that you have access to the data catalog, crawlers, etc., and also to the S3 bucket where your data resides. Then you can create your Python scripts in the Zeppelin notebook and run them from there.
You can use the dev instance provided by Glue, but you may incur additional costs for it (EC2 instance charges).
Environment settings (updated in response to comments):
JAVA_HOME=E:\Java7\jre7
Path=E:\Python27;E:\Python27\Lib;E:\Python27\Scripts;
PYTHONPATH=E:\spark-2.1.0-bin-hadoop2.7\python;E:\spark-2.1.0-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip;E:\spark-2.1.0-bin-hadoop2.7\python\lib\pyspark.zip
SPARK_HOME=E:\spark-2.1.0-bin-hadoop2.7
Change the drive name and folders accordingly. Let me know if any help is needed.

Running pubsub kafka connector standalone mode issues

So, I have been trying to get a PubSub Kafka connector running for about a month now, with various problems. I have reviewed many questions here about Kafka Connect and the Pubsub connector, which have helped me get this far, but I am stuck again. When I run this command:
.\bin\windows\connect-standalone.bat .\etc\kafka\WorkerConfig.properties .\etc\kafka\configSink.properties .\etc\kafka\configSource.properties
I get a long list of errors linked here:
Right after it tries to start the rest server is when the errors "could not scan file [file name]..." start. I am unsure if I need to set the rest.host.name and rest.port because currently, for the standaloneConfig values, it reads
rest.host.name = null
Edit: After reviewing the log file for a while, I found the following messages:
Kafka consumer created
Created connector CPSConnector
Initializing task CPSConnector-0 with config {connector.class=com.google.pubsub.kafka.sink.CloudPubSubSinkConnector, task.class=com.google.pubsub.kafka.sink.CloudPubSubSinkTask, tasks.max=1, topics=, cps.project=kohls-sis-sandbox, name=CPSConnector, cps.topic=test-pubsub}
Task CPSConnector-0 threw an uncaught and unrecoverable exception
org.apache.kafka.connect.errors.ConnectException: Sink tasks require a list of topics.
at org.apache.kafka.connect.runtime.WorkerSinkTask.initializeAndStart(WorkerSinkTask.java:202)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:139)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:140)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:175)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Edit: So, I fixed the above issue by adding topics=test in my configSink. The current error message is below. Does this indicate that you can only run either a sink connector or source connector?
Failed to create job for .\etc\kafka\configSource.properties
Stopping after connector error
java.util.concurrent.ExecutionException: org.apache.kafka.connect.errors.AlreadyExistsException: Connector CPSConnector already exists
at org.apache.kafka.connect.util.ConvertingFutureCallback.result(ConvertingFutureCallback.java:80)
at org.apache.kafka.connect.util.ConvertingFutureCallback.get(ConvertingFutureCallback.java:67)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:97)
Caused by: org.apache.kafka.connect.errors.AlreadyExistsException: Connector CPSConnector already exists
at org.apache.kafka.connect.runtime.standalone.StandaloneHerder.putConnectorConfig(StandaloneHerder.java:145)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:94)
In my WorkerConfig.properties, I have bootstrap.servers=localhost:2181. My property files are here.
I am not sure how to fix this, since I have my properties files set and have made sure the cps-kafka-connector.jar is on the classpath. I also set plugin.path=\share\java\kafka\kafka-connect-pubsub.
If anyone can point me in the right direction to fix this issue, that would be great. I followed the directions here: https://github.com/GoogleCloudPlatform/pubsub/tree/master/kafka-connector
Each Connector instance, whether it's a source or a sink, needs to have a unique name when you submit its configuration properties to a Kafka Connect cluster, or standalone worker.
In the above example, just name your Source differently than your Sink.
For instance:
$ head -n 1 configSource.properties
name=CPSSourceConnector
$ head -n 1 configSink.properties
name=CPSSinkConnector
or, might as well:
$ head -n 1 configSource.properties
name=Tom
$ head -n 1 configSink.properties
name=Jerry
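The uniqueness rule can also be checked before starting the worker. Below is a minimal sketch that reads the `name` key from each connector properties file and reports names used more than once. It assumes simple `key=value` lines only (the full .properties syntax also allows `:` separators and line continuations), and the helper names and inline file contents are illustrative rather than taken from the actual files.

```python
from collections import defaultdict


def connector_name(properties_text):
    """Return the value of the `name` key in .properties-style text, or None."""
    for raw in properties_text.splitlines():
        line = raw.strip()
        # Skip blank lines and the two comment styles used by .properties files.
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "name":
            return value.strip()
    return None


def find_name_clashes(configs):
    """Map each connector name used by more than one file to the files using it."""
    by_name = defaultdict(list)
    for filename, text in configs.items():
        by_name[connector_name(text)].append(filename)
    return {
        name: files
        for name, files in by_name.items()
        if name is not None and len(files) > 1
    }


# Hypothetical file contents reproducing the clash from the question.
configs = {
    "configSink.properties": "name=CPSConnector\ntasks.max=1\ntopics=test\n",
    "configSource.properties": "name=CPSConnector\ntasks.max=1\n",
}

clashes = find_name_clashes(configs)
for name, files in clashes.items():
    print(f"Connector name {name!r} is used by: {', '.join(files)}")
```

To check real files, build `configs` by reading each path passed to connect-standalone after the worker config.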

WLP :: Worklight :: Can't install runtime

I'm using Worklight 6.2 server edition and I can't deploy a working runtime (one that works in other environments) on my server.
I'm using WebSphere Liberty Profile v8.5.5. When I deploy the runtime via the GUI, it reports success, and in server.xml I can see the new configuration for the app.
However, when I go to the worklightconsole, I don't see my runtime, so I cannot upload the app.
In messages.log there is an error regarding the JMX connection.
The quoted error is
Failed to obtain JMX connection to access an MBean. There might be a JMX configuration error: No JMX connector is configured
I'm mentioning this because I've seen some posts on SO saying that these issues might be connected. However, I do have restConnector-1.0 among my WLP features.
Reference: No runtime on my Worklight 6.2 Console after installing analytics
In messages.log there are some other things that I found interesting, such as the correct start of the runtime I deployed:
[11/12/14 5:50:45:177 CST] 00000012 com.worklight.server.bundle.project.JeeProjectActivator I FWLST0002I: ========= Project /HelloWorld started. The project WAR file version is 6.2.0.00.20140922-2259,running on server version 6.2.0.00.20140613-0730. [project HelloWorld]
and two errors while starting my server:
[11/12/14 5:50:49:911 CST] 00000012 SystemErr R 24 WorklightPU WARN [Scheduled Executor-thread-1] openjpa.Runtime - An error occurred while registering a ClassTransformer with PersistenceUnitInfo: name 'WorklightPU', root URL [file:/opt/IBM/WebSphere/Liberty/usr/shared/resources/worklight/lib/worklight-jee-library.jar]. The error has been consumed. To see it, set your openjpa.Runtime log level to TRACE. Load-time class transformation will not be available.
Second error:
java.lang.RuntimeException: Timeout while waiting for the management service to start up
I don't know what these are, but I think they might be related to my problem. These errors eventually appear when I start my server.
Does anyone have any tips for troubleshooting this issue?
Thanks in advance.
This is a known issue in WebSphere.
There is an APAR that fixes it. As a workaround, restart the server with the --clean option to force a refresh of the shared libraries.
http://www-01.ibm.com/support/docview.wss?uid=swg1PI17830

Resources