[flink] Task manager initialization failed - apache-flink

I am new to Flink. I am trying to run the Flink example on my local PC (Windows).
However, after I run start-cluster.bat and log in to the dashboard, it shows 0 task managers.
I checked the log, and it seems the task manager fails to initialize:
2020-02-21 23:03:14,202 ERROR org.apache.flink.runtime.taskexecutor.TaskManagerRunner - TaskManager initialization failed.
org.apache.flink.configuration.IllegalConfigurationException: Failed to create TaskExecutorResourceSpec
at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.resourceSpecFromConfig(TaskExecutorResourceUtils.java:72)
at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.startTaskManager(TaskManagerRunner.java:356)
at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.<init>(TaskManagerRunner.java:152)
at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.runTaskManager(TaskManagerRunner.java:308)
at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.lambda$runTaskManagerSecurely$2(TaskManagerRunner.java:322)
at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.runTaskManagerSecurely(TaskManagerRunner.java:321)
at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.main(TaskManagerRunner.java:287)
Caused by: org.apache.flink.configuration.IllegalConfigurationException: The required configuration option Key: 'taskmanager.cpu.cores' , default: null (fallback keys: []) is not set
at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.checkConfigOptionIsSet(TaskExecutorResourceUtils.java:90)
at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.lambda$checkTaskExecutorResourceConfigSet$0(TaskExecutorResourceUtils.java:84)
at java.util.Arrays$ArrayList.forEach(Arrays.java:3880)
at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.checkTaskExecutorResourceConfigSet(TaskExecutorResourceUtils.java:84)
at org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.resourceSpecFromConfig(TaskExecutorResourceUtils.java:70)
... 7 more
2020-02-21 23:03:14,217 INFO org.apache.flink.runtime.blob.TransientBlobCache - Shutting down BLOB cache
Basically, it looks like the required option 'taskmanager.cpu.cores' is not set. However, I can't find this property in flink-conf.yaml or in the documentation (https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/config.html).
I am using Flink 1.10.0. Any help would be highly appreciated!

That configuration option is intended for internal use only -- it shouldn't be user-configured, which is why it isn't documented.
The Windows start-cluster.bat script is failing because of a bug introduced in Flink 1.10; see https://jira.apache.org/jira/browse/FLINK-15925.
One workaround is to use the bash script, start-cluster.sh, instead.
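For example, from a bash-capable environment on Windows such as WSL, Cygwin, or Git Bash (the directory name below assumes the default Flink 1.10.0 distribution layout; adjust to your install):
cd flink-1.10.0
./bin/start-cluster.sh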
See also this mailing list thread: https://lists.apache.org/thread.html/r7693d0c06ac5ced9a34597c662bcf37b34ef8e799c32cc0edee373b2%40%3Cdev.flink.apache.org%3E

Related

Strange transactional id errors when using the Kafka sink

I had a Flink 1.15.1 job configured with
execution.checkpointing.mode='EXACTLY_ONCE'
that was failing with the following error
Sink: Committer (2/2)#732 (36640a337c6ccdc733d176b18adab979) switched from INITIALIZING to FAILED with failure cause: java.lang.IllegalStateException: Failed to commit KafkaCommittable{producerId=4521984, epoch=0, transactionalId=}
...
Caused by: org.apache.kafka.common.config.ConfigException: Invalid value for configuration transactional.id: String must be non-empty
that happened after the first checkpoint was triggered. The strange thing about it is that the KafkaSinkBuilder was used without calling setDeliverGuarantee, so the default delivery guarantee, which is NONE, was expected to apply.
Is that even possible to start with? Shouldn't Kafka transactions be involved only when one follows the recipe from the JavaDoc quoted below?
* <p>One can also configure different {@link DeliveryGuarantee} by using {@link
* #setDeliverGuarantee(DeliveryGuarantee)} but keep in mind when using {@link
* DeliveryGuarantee#EXACTLY_ONCE} one must set the transactionalIdPrefix {@link
* #setTransactionalIdPrefix(String)}.
So, in my case, without calling setDeliverGuarantee (or setTransactionalIdPrefix), I cannot understand why I was seeing these errors. To avoid the problem, I temporarily relaxed the checkpointing settings to
execution.checkpointing.mode='AT_LEAST_ONCE'
but I'd like to understand what was happening.
Like the JavaDoc mentions, if you enable exactly-once, you must set a transactionalIdPrefix. A complete recipe on how to configure exactly-once with Apache Kafka and Apache Flink can be found here: https://www.docs.immerok.cloud/docs/cookbook/exactly-once-with-apache-kafka-and-apache-flink/
Disclaimer: I work for Immerok
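For reference, a minimal sketch of an exactly-once KafkaSink with the required prefix (Flink 1.15 API; the broker address, topic, and prefix below are placeholders):
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;

KafkaSink<String> sink = KafkaSink.<String>builder()
        .setBootstrapServers("localhost:9092")          // placeholder broker address
        .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                .setTopic("my-topic")                   // placeholder topic
                .setValueSerializationSchema(new SimpleStringSchema())
                .build())
        .setDeliverGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
        .setTransactionalIdPrefix("my-app-txn")         // required for EXACTLY_ONCE
        .build();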

Unable to install devstack with designate

I am new to the OpenStack environment and started to get into it with a small DevStack setup. I worked through the following instructions on an Ubuntu 18.04 machine and everything worked fine. In order to play with some DNS zones, I started to research Designate. After adapting the following instructions to my setup, I got some errors.
Executing stack.sh produces the following error:
++/opt/stack/designate/devstack/plugin.sh:source:5 set +o xtrace
2021-01-12 21:44:39.009 | Initializing Designate
DROP DATABASE
Could not load 'database': type object 'deprecated' has no attribute 'WALLABY'
Could not load 'pool': type object 'deprecated' has no attribute 'WALLABY'
Could not load 'tlds': type object 'deprecated' has no attribute 'WALLABY'
usage: designate [-h] [--config-dir DIR] [--config-file PATH] [--debug]
[--log-config-append PATH] [--log-date-format DATE_FORMAT]
[--log-dir LOG_DIR] [--log-file PATH] [--nodebug]
[--nouse-journal] [--nouse-json] [--nouse-syslog]
[--nowatch-log-file]
[--syslog-log-facility SYSLOG_LOG_FACILITY] [--use-journal]
[--use-json] [--use-syslog] [--watch-log-file]
{} ...
designate: error: argument category: invalid choice: 'database' (choose from )
Error on exit
World dumping... see /opt/stack/logs/worlddump-2021-01-12-214442.txt for details
nova-compute: no process found
neutron-dhcp-agent: no process found
neutron-l3-agent: no process found
neutron-metadata-agent: no process found
neutron-openvswitch-agent: no process found
I was not sure if my setup was valid, so I tried the example config from the Designate tutorial, but the same problem occurred.
My actual local.conf:
[[local|localrc]]
USE_PYTHON3=True
ADMIN_PASSWORD=***
DATABASE_PASSWORD=$ADMIN_PASSWORD
RABBIT_PASSWORD=$ADMIN_PASSWORD
SERVICE_PASSWORD=$ADMIN_PASSWORD
SERVICE_TOKEN=$ADMIN_PASSWORD
DEST=/opt/stack
SERVICE_HOST=192.168.1.***
HOST_IP=$SERVICE_HOST
disable_service mysql
enable_service postgresql
enable_plugin designate https://opendev.org/openstack/designate
enable_service tempest
Checking plugin.sh, it looks like the error comes from this function:
function init_designate {
    # (Re)create designate database
    recreate_database designate utf8

    # Init and migrate designate database
    $DESIGNATE_BIN_DIR/designate-manage database sync

    init_designate_backend
}
I hope somebody can give me a hint on running DevStack with Designate.
Thanks in advance.
The issue you are having is a version mismatch between the cloud install and the Designate plugin: Designate is expecting a newer version of the oslo_log package.
Check that the "devstack" version you have checked out is on the master branch.
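For example (assuming devstack was cloned to ~/devstack; adjust the path to your checkout):
cd ~/devstack
git rev-parse --abbrev-ref HEAD   # prints the current branch; should be "master" here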
The line:
enable_plugin designate https://opendev.org/openstack/designate
is pulling the master branch of Designate for the devstack plugin.
If you are trying to install a stable-branch version of OpenStack, you will need to specify a branch reference for the devstack plugin as well (for example, stable/victoria):
enable_plugin designate https://opendev.org/openstack/designate stable/victoria
You will also need to enable the Designate services:
enable_service designate,designate-central,designate-api,designate-worker,designate-producer,designate-mdns
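After changing local.conf, a clean re-run of the stack is usually needed; a typical sequence from the devstack directory (clean.sh removes the previous installation):
./unstack.sh
./clean.sh
./stack.sh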

Using JanusGraph with Solr

While setting up JanusGraph, I noticed the following in the console:
09:04:12,175 INFO ReflectiveConfigOptionLoader:173 - Loaded and initialized config classes: 10 OK out of 12 attempts in PT0.023S
09:04:12,230 INFO Reflections:224 - Reflections took 28 ms to scan 1 urls, producing 2 keys and 2 values
09:04:12,291 WARN GraphDatabaseConfiguration:1445 - Local setting index.search.index-name=entity (Type: GLOBAL_OFFLINE) is overridden by globally managed value (janusgraph). Use the ManagementSystem interface instead of the local configuration to control this setting.
09:04:12,294 WARN GraphDatabaseConfiguration:1445 - Local setting index.search.backend=solr (Type: GLOBAL_OFFLINE) is overridden by globally managed value (elasticsearch). Use the ManagementSystem interface instead of the local configuration to control this setting.
09:04:12,300 INFO CassandraThriftStoreManager:628 - Closed Thrift connection pooler.
and then I see the following:
Exception in thread "main" java.lang.IllegalArgumentException: Could not instantiate implementation: org.janusgraph.diskstorage.es.ElasticSearchIndex
How do I stop using Elasticsearch and switch to Solr?
My properties file is as follows:
index.search.backend=solr
index.search.directory=/path/to/directory/for/solr/index/something
index.search.index-name=something
index.search.solr.mode=http
index.search.solr.http-urls=http://127.0.0.1:8983/solr
storage.backend=cassandrathrift
storage.hostname=127.0.0.1
cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.25
The answer to this is basically the same as this one for Titan; JanusGraph was forked from Titan.
You are probably trying to connect to an existing graph that was previously configured to use Elasticsearch. By default, the keyspace is named janusgraph.
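If you have cqlsh available, you can check which keyspaces exist (the path assumes a standalone Cassandra installation):
$CASSANDRA_HOME/bin/cqlsh -e 'describe keyspaces'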
1) You could connect to a different keyspace by updating conf/janusgraph-cassandra.properties
gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=cassandrathrift
storage.hostname=127.0.0.1
storage.cassandra.keyspace=mygraph
2) You could drop the existing keyspace. If you used bin/janusgraph.sh start from the quick start directions (which starts a single-node Cassandra and a single-node Elasticsearch), run:
bin/janusgraph.sh clean
Or if you have a standalone Cassandra installation:
$CASSANDRA_HOME/bin/cqlsh -e 'drop keyspace if exists janusgraph'
Then you would be able to connect with the default conf/janusgraph-cassandra.properties.
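For example, from the Gremlin Console shipped with JanusGraph (JanusGraphFactory is pre-imported there):
gremlin> graph = JanusGraphFactory.open('conf/janusgraph-cassandra.properties')
gremlin> g = graph.traversal()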

Quartz clustering in camel spring DSL

I am trying to achieve "requests recovery" in a fail-over scenario on two different machines with their clocks in sync.
My configuration is as below:
Step 1: camel-context.xml
I have defined the below route in the camel-context.xml file.
<route id="quartz" trace="true">
  <from uri="quartz2://cluster/quartz?cron=0+0/2+*+*+*+?&amp;durableJob=true&amp;stateful=true&amp;recoverableJob=true"/>
</route>
Step 2: quartz.properties
I have enabled:
org.quartz.jobStore.isClustered = true
org.quartz.scheduler.instanceId = AUTO
org.quartz.scheduler.instanceName =ClusteredScheduler
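For reference, the rest of a clustered JDBC job store configuration typically looks like this; the data source name (myDS), driver, URL, and credentials below are placeholders:
org.quartz.jobStore.class = org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.StdJDBCDelegate
org.quartz.jobStore.dataSource = myDS
org.quartz.jobStore.clusterCheckinInterval = 20000
org.quartz.dataSource.myDS.driver = com.mysql.jdbc.Driver
org.quartz.dataSource.myDS.URL = jdbc:mysql://localhost:3306/quartz
org.quartz.dataSource.myDS.user = quartz
org.quartz.dataSource.myDS.password = quartz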
Currently I am running the same Camel application in two different instances on my local machine, and clustering is working fine. But when I try to test "requests recovery", I get the exception below.
Exception:
[QuartzScheduler_ClusteredScheduler-camelContext-16308243724_ClusterManager] INFO org.quartz.impl.jdbcjobstore.JobStoreTX - ClusterManager: detected 1 failed or restarted instances.
[QuartzScheduler_ClusteredScheduler-camelContext-16308243724_ClusterManager] INFO org.quartz.impl.jdbcjobstore.JobStoreTX - ClusterManager: Scanning for instance "6308270818"'s failed in-progress jobs.
[QuartzScheduler_ClusteredScheduler-camelContext-16308243724_ClusterManager] INFO org.quartz.impl.jdbcjobstore.JobStoreTX - ClusterManager: ......Scheduled 1 recoverable job(s) for recovery.
[ClusteredScheduler-camelContext_Worker-1] WARN org.apache.camel.component.quartz2.CamelJob - Cannot find existing QuartzEndpoint with uri: quartz2://cluster/quartz?cron=0+0%2F2+*+*+*+%3F&durableJob=true&recoverableJob=true&stateful=true. Creating new endpoint instance.
[ClusteredScheduler-camelContext_Worker-1] ERROR org.apache.camel.component.quartz2.CamelJob - Failed to execute CamelJob.
**org.apache.camel.ResolveEndpointFailedException: Failed to resolve endpoint: quartz2://cluster/quartz?cron=0+0%2F2+*+*+*+%3F&durableJob=true&recoverableJob=true&stateful=true due to: Trigger key cluster.quartz is already in used by Endpoint[quartz2://cluster/quartz?cron=0+0%2F2+*+*+*+%3F&durableJob=true&recoverableJob=true&stateful=true]**
at org.apache.camel.impl.DefaultCamelContext.getEndpoint(DefaultCamelContext.java:545)
at org.apache.camel.impl.DefaultCamelContext.getEndpoint(DefaultCamelContext.java:558)
at org.apache.camel.component.quartz2.CamelJob.lookupQuartzEndpoint(CamelJob.java:123)
at org.apache.camel.component.quartz2.CamelJob.execute(CamelJob.java:49)
at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
Caused by: java.lang.IllegalArgumentException: Trigger key cluster.quartz is already in used by Endpoint[quartz2://cluster/quartz?cron=0+0%2F2+*+*+*+%3F&durableJob=true&recoverableJob=true&stateful=true]
at org.apache.camel.component.quartz2.QuartzEndpoint.ensureNoDupTriggerKey(QuartzEndpoint.java:272)
at org.apache.camel.component.quartz2.QuartzEndpoint.addJobInScheduler(QuartzEndpoint.java:254)
at org.apache.camel.component.quartz2.QuartzEndpoint.doStart(QuartzEndpoint.java:202)
at org.apache.camel.support.ServiceSupport.start(ServiceSupport.java:61)
at org.apache.camel.impl.DefaultCamelContext.startService(DefaultCamelContext.java:2158)
at org.apache.camel.impl.DefaultCamelContext.doAddService(DefaultCamelContext.java:1016)
at org.apache.camel.impl.DefaultCamelContext.addService(DefaultCamelContext.java:977)
at org.apache.camel.impl.DefaultCamelContext.addService(DefaultCamelContext.java:973)
at org.apache.camel.impl.DefaultCamelContext.getEndpoint(DefaultCamelContext.java:541)
... 5 more
After shutting down instance 1, which is currently executing the job, instance 2 tries to recover the job immediately but fails to execute it. It picks up the same job in the next interval (which is fine).
My requirement is that the active node immediately recovers the failed job.
Thanks in advance.
I think we can avoid the ensureNoDupTriggerKey check if recoverableJob is true. I have created JIRA CAMEL-8076 for it.

Disable scalatest logging statements when running tests from maven

How do I disable the ScalaTest-related log4j messages?
The log4j.properties is as follows:
log4j.rootLogger=INFO,CA,FA
#Console Appender
log4j.appender.CA=org.apache.log4j.ConsoleAppender
log4j.appender.CA.layout=org.apache.log4j.PatternLayout
log4j.appender.CA.layout.ConversionPattern=%d{HH:mm:ss.SSS} %p %c: %m%n
log4j.appender.CA.Threshold = INFO
#File Appender
log4j.appender.FA=org.apache.log4j.FileAppender
log4j.appender.FA.append=false
log4j.appender.FA.file=target/unit-tests.log
log4j.appender.FA.layout=org.apache.log4j.PatternLayout
log4j.appender.FA.layout.ConversionPattern=%d{HH:mm:ss.SSS} %p %c{1}: %m%n
log4j.appender.FA.Threshold = INFO
..
log4j.logger.org.scalatest=WARN
However, we are still seeing INFO-level messages tagged with ScalaTest thread names:
2014-11-30 14:25:57,263 INFO [ScalaTest-run-running-DiscoverySuite] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2014-11-30 14:25:57,493 INFO [ScalaTest-run-running-DiscoverySuite] hbase.HBaseCommonTestingUtility (HBaseTestingUtility.java:startMiniCluster(840)) - Starting up minicluster with 1 master(s) and 2 regionserver(s) and 2 datanode(s)
2014-11-30 14:25:57,499 INFO [ScalaTest-run-running-DiscoverySuite] hbase.HBaseCommonTestingUtility (HBaseTestingUtility.java:setupClusterTestDir(390)) - Created new mini-cluster data directory: /shared/hwspark/target/
Alternatively, you can drop this bit of code anywhere in one of your tests (note that this assumes you are using the logback backend behind SLF4J):
org.slf4j.LoggerFactory.getLogger(org.slf4j.Logger.ROOT_LOGGER_NAME)
.asInstanceOf[ch.qos.logback.classic.Logger]
.setLevel(ch.qos.logback.classic.Level.WARN)
which will set all logging to the WARN level.
Those log messages are not actually being printed by ScalaTest, but by something you are using from your ScalaTest tests. The reason "ScalaTest" shows up in them is that ScalaTest does change the name of threads when suites and tests are executed, so that if someone has a suite that hangs forever and does a thread dump to investigate, it is more obvious what test and suite is causing the run to hang. Log4J seems to print out the thread name in square brackets, so that can give you a hint as to where these log messages are coming from.
In my case it was slick.relational. I looked at the class reported in the ScalaTest-run-running-... thread name, found the package it was imported from, and added that specific package to logback.xml as:
<logger name="slick.relational" level="INFO"/>
In your case, search for HBaseTestingUtility or the other classes reported there to find which jar contains them, and work out your logback logger name from the package prefix.
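For example, with the log4j.properties from the question, the equivalent fix is to raise the threshold for the packages that actually emit those messages; the sample output points at Hadoop and HBase classes, so something like this should quiet them:
log4j.logger.org.apache.hadoop=WARN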
