I am using Flink 1.15.3 and its respective Delta Connector. End to end, the application is working as expected. There are no error logs; however, I receive warning logs like the following:
Name collision: Group already contains a Metric with the name 'DeltaSinkBytesWritten'. Metric will not be reported.[xxx, flink, operator]
Name collision: Group already contains a Metric with the name 'DeltaSinkRecordsWritten'. Metric will not be reported.[xxx, flink, operator]
As a result, the above-mentioned metrics do not get reported or incremented. How can I work around this or investigate further? I am assuming my application may not be set up correctly.
This seems to be a bug in the Delta Connector for Flink:
https://github.com/delta-io/connectors/issues/493
I'm building a process with Apache Flink to handle millions of records for logistics data pipelines. I'm moving from Kinesis sources/sinks to Kafka sources/sinks.
However, in the Flink dashboard, the job metrics are not being updated in near-real-time. Do you know what could be wrong with the job or version?
By the way, once the job is closed it shows all the metrics, just not in near-real-time.
[Screenshot: job metrics not updating]
Fixed after cleaning up conflicting dependencies on the kafka-clients library.
In my case, I was also using some Avro and CloudEvents libraries that pulled in a higher kafka-clients version. I just needed to exclude kafka-clients from those libraries and prefer Flink's kafka-clients version, and that solved the issue.
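If you hit the same conflict, the fix is to exclude kafka-clients from the offending dependencies so the version bundled with the Flink Kafka connector wins. A hedged Maven sketch — the CloudEvents artifact and version here are placeholders, not necessarily the poster's actual dependencies:

```xml
<!-- Exclude the transitive kafka-clients so Flink's own version is used -->
<dependency>
  <groupId>io.cloudevents</groupId>
  <artifactId>cloudevents-kafka</artifactId>
  <version>2.4.0</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka-clients</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

Running `mvn dependency:tree -Dincludes=org.apache.kafka:kafka-clients` is a quick way to confirm which version ends up on the classpath.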
I am trying the fine-grained resource management feature in Flink 1.14, hoping it can enable assigning certain operators to certain TaskManagers. Following the sample code in https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/finegrained_resource/, I can now define the task sharing groups I would like (using setExternalResource-method), but I do not see any option to "assign" a TaskManager worker instance with the capabilities of this "external resource".
So, to the question: following the GPU-based example in [1], how can I ensure that Flink "knows" which task manager actually has the required GPU?
With help from the excellent Flink mailing list, I now have the solution. Basically, add lines to flink-conf.yaml for the specific task manager, as per the external resource documentation. For a resource called 'example', these are the two lines that must be added:
external-resources: example
external-resource.example.amount: 1
These lines will match a task sharing group with the added external resource:
.setExternalResource("example", 1.0)
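Putting the pieces together, here is a minimal Java sketch (the group name, resource amounts, and pipeline are illustrative placeholders) of binding an operator to a slot sharing group that requires the 'example' resource; only task managers whose flink-conf.yaml advertises that resource can host it:

```java
import org.apache.flink.api.common.operators.SlotSharingGroup;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ExternalResourceExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Declare a slot sharing group that requires one unit of the
        // external resource named "example" (matching flink-conf.yaml).
        SlotSharingGroup ssg = SlotSharingGroup.newBuilder("example-group")
                .setCpuCores(1.0)
                .setTaskHeapMemoryMB(256)
                .setExternalResource("example", 1.0)
                .build();

        env.fromElements(1, 2, 3)
           .map(i -> i * 2)
           .slotSharingGroup(ssg) // pins this operator to TMs that advertise the resource
           .print();

        env.execute("external-resource-demo");
    }
}
```

Note that with fine-grained resource management, once you specify one resource on the builder (e.g. CPU), you must also specify the task heap memory, as above.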
The Flink version is 1.12. I followed the steps in https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/metric_reporters.html#prometheuspushgateway-orgapacheflinkmetricsprometheusprometheuspushgatewayreporter, filled in my config, and ran my job on a Flink cluster. But after a few hours I could no longer see metric data in Grafana, so I logged into the server, checked the Pushgateway log, and found an "Out of memory" error.
I don't understand: I set deleteOnShutdown=true and some of my jobs are closed. Why does the Pushgateway OOM?
This problem has always existed; however, it was not described in the documentation prior to v1.13. You can see the pull request for more info.
If you want to use a push model in your Flink cluster, I recommend using InfluxDB.
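For reference, an InfluxDB reporter can be configured in flink-conf.yaml roughly as follows (a sketch: the host, port, and database values are placeholders, and the exact key for the reporter class differs between Flink versions, so check the metric reporters page for your release):

```yaml
metrics.reporter.influxdb.factory.class: org.apache.flink.metrics.influxdb.InfluxdbReporterFactory
metrics.reporter.influxdb.host: localhost
metrics.reporter.influxdb.port: 8086
metrics.reporter.influxdb.db: flink
```

Unlike the Pushgateway, InfluxDB handles expiring old series itself, which avoids the unbounded growth that leads to the OOM described above.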
I would like to edit the ID of task metrics in Flink, so that I can work with those metrics in Java Mission Control (JMC) via JMX.
The reason I want to edit it is to make the metrics easier to find in JMC.
Can anyone help me out with this problem?
You cannot change this ID in the web UI, as it comes from the runtime web server.
If you are connecting to the Flink JMXReporter to get the metrics, you could use the task name to filter out the data you want, since the data from JMX contains the task name, task ID, etc.
Another way is to implement your own metric reporter that includes only the task name, which makes it clearer to read the metrics from JMX.
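A minimal sketch of such a reporter, using Flink's MetricReporter SPI (the class name and the keep-only-the-task-name scheme are illustrative, not a production design):

```java
import org.apache.flink.metrics.Metric;
import org.apache.flink.metrics.MetricConfig;
import org.apache.flink.metrics.MetricGroup;
import org.apache.flink.metrics.reporter.MetricReporter;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Keys metrics only by <task_name>.<metric_name>, dropping the task ID,
// so they are easier to locate in a JMX browser such as JMC.
public class TaskNameOnlyReporter implements MetricReporter {
    private final Map<String, Metric> metrics = new ConcurrentHashMap<>();

    @Override
    public void open(MetricConfig config) {}

    @Override
    public void close() {}

    @Override
    public void notifyOfAddedMetric(Metric metric, String metricName, MetricGroup group) {
        // getAllVariables() exposes scope variables such as <task_name>, <task_id>, ...
        String taskName = group.getAllVariables().getOrDefault("<task_name>", "unknown");
        metrics.put(taskName + "." + metricName, metric);
    }

    @Override
    public void notifyOfRemovedMetric(Metric metric, String metricName, MetricGroup group) {
        String taskName = group.getAllVariables().getOrDefault("<task_name>", "unknown");
        metrics.remove(taskName + "." + metricName);
    }
}
```

The reporter is then registered in flink-conf.yaml the same way as the built-in reporters, via its factory/class configuration key.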
I am having a strange issue occur while using the solr_query handler to run queries against Cassandra from my terminal.
When I perform normal queries on my table, I am having no issues, but when I use solr_query I get the following error:
Unable to complete request: one or more nodes were unavailable.
Others who have experienced this problem seem unable to run any queries on their data whatsoever, whether or not they use solr_query. My problem occurs only when using that handler.
Can anyone suggest what the issue may be with my Solr node?
ALSO -- I can run queries from the Solr Admin page, but as I said, I am unable to do so from a terminal on my MacBook.
Here is the query I used, for reference:
cqlsh:demo> select * from device WHERE solr_query='id:[1 to 10000000000}';
More info:
This is how I created my KEYSPACE:
CREATE KEYSPACE demo WITH REPLICATION = {'class':'NetworkTopologyStrategy', 'Solr':1};
This is how I created the Solr core:
bin/dsetool create_core demo.device generateResources=true reindex=true
I ran nodetool ring -h against my localhost and got this back:
Datacenter: Solr
Address Rack Status State Load Owns Token
127.0.0.1 rack1 Up Normal 2.8 MB 100.00% -673443545391973027
So it appears my node is up and normal, which leads me to believe the issue is with the actual solr_query handler.
I also found the requestHandler within my config file.
Your query probably isn't correct: in id:[1 to 10000000000}, the range keyword must be uppercase (TO), and the mismatched brackets mix an inclusive lower bound with an exclusive upper bound, which may not be what you intended.
The "unavailable nodes" error is unfortunately a red herring, as that's the way Thrift (which cqlsh in Cassandra 2.0 is based upon) translates given errors, while you should get a more meaningful error if you repeat the same query with a driver based on the native protocol.
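A corrected form of the range query, assuming an inclusive upper bound was intended (Solr range syntax requires the keyword TO in uppercase), would be:

```sql
select * from device WHERE solr_query='id:[1 TO 10000000000]';
```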