Apache Zeppelin with Spark Interpreter - apache-zeppelin

I am currently exploring Zeppelin with the Spark interpreter and have a question:
Once I query all the required data using the Spark interpreter, where is it stored for further actions such as group by, drill down, etc.? Does it execute a Spark job for every action?

The computation and storage of data are handled by Spark itself; refer to the Spark execution model for more details.
Link: https://spark.apache.org/docs/latest/programming-guide.html#performance-impact
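
For example, one way to keep the queried data around for cheap group-bys and drill-downs is to cache the DataFrame in a %spark paragraph; without caching, each new action will generally run another Spark job against the source. A minimal sketch, assuming Zeppelin's predefined spark session and an illustrative events table:

```scala
// Zeppelin %spark paragraph (spark is the SparkSession Zeppelin provides).
// "events" is an illustrative table name, not part of the original question.
val events = spark.sql("SELECT * FROM events")

events.cache()                                   // Spark keeps the data on the executors (memory, spilling to disk)
events.createOrReplaceTempView("events_cached")  // reusable from later paragraphs, including %sql

// Subsequent actions such as this group-by reuse the cached data instead of re-reading the source.
events.groupBy("ip").count().show()
```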

Related

Guava version conflict happens in spark-connector

I ran into a problem when using NebulaGraph Database, described below:
I want to write offline Spark data into NebulaGraph Database via the spark-nebula-connector.
But I encountered two problems:
First, the NebulaGraph Database version I use only supports Spark v2.4 and Scala v2.11. I solved this one by downgrading the Spark and Scala versions.
Second, the Spark connector writes data via the client, but the client has a strong dependency on guava-14: nebula-java/pom.xml at v3.3.0 · vesoft-inc/nebula-java · GitHub
My Spark also has a strong dependency on Guava, namely guava-27.0-jre.
If I use guava-27.0, it throws java.lang.NoSuchMethodError: com.google.common.net.HostAndPort.getHostText()
If I use guava-14.0, an error is thrown when Spark reads Hive, such as: Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument
How should I solve this?
Maybe you can refer to this solution: Guava 14.0 and Guava 27.0 expose different methods on HostAndPort for acquiring the host. You can change the Guava version in the connector or exchange, adjust the HostAndPort.getHostText calls accordingly, and then package it locally.
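
For reference, this is roughly the API difference being described; a minimal sketch (the address is illustrative):

```scala
import com.google.common.net.HostAndPort

val hp = HostAndPort.fromString("127.0.0.1:9669") // illustrative address

// Guava 14.x (what the Nebula client calls):
//   val host = hp.getHostText()
// Guava 20+ (e.g. the guava-27.0-jre that Spark/Hive pull in) removed getHostText();
// the replacement is getHost():
val host = hp.getHost
```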

Dynamic Job Creation and Submission to Flink

Hi, I am planning to use Flink as the backend for a feature where we will show a UI that lets users graphically create event patterns, e.g. multiple login failures from the same IP address.
We will create the Flink pattern programmatically from the criteria the user provides in the UI.
Is there any documentation on how to dynamically create the jar file and dynamically submit the job with it to a Flink cluster?
Is there any best practice for this kind of use case with Apache Flink?
Another way to achieve this is to have a single jar containing something like an "interpreter": you pass it the definition of your patterns in some format (e.g. JSON), and the interpreter translates that JSON into Flink operators. This is how the Flink-based execution engine in https://github.com/TouK/nussknacker/ works. With this approach you will need to handle redeployment of new definitions in your own application.
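
A rough Scala sketch of that interpreter idea using Flink CEP; LoginEvent, PatternSpec and the inline test data are illustrative stand-ins for whatever the UI's JSON definition would be parsed into:

```scala
import org.apache.flink.cep.scala.CEP
import org.apache.flink.cep.scala.pattern.Pattern
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

case class LoginEvent(ip: String, success: Boolean, timestamp: Long)
case class PatternSpec(failureCount: Int, windowMinutes: Int) // parsed from the UI's JSON

object PatternInterpreterSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // In a real deployment the events would come from Kafka or another source.
    val logins: DataStream[LoginEvent] = env
      .fromElements(
        LoginEvent("10.0.0.1", success = false, timestamp = 1000L),
        LoginEvent("10.0.0.1", success = false, timestamp = 2000L),
        LoginEvent("10.0.0.1", success = false, timestamp = 3000L))
      .assignAscendingTimestamps(_.timestamp)

    // The "interpreter" step: build the CEP pattern from the user's spec
    // instead of hard-coding it into the jar.
    val spec = PatternSpec(failureCount = 3, windowMinutes = 1)
    val pattern = Pattern
      .begin[LoginEvent]("failures")
      .where(!_.success)
      .times(spec.failureCount)
      .within(Time.minutes(spec.windowMinutes))

    CEP.pattern(logins.keyBy(_.ip), pattern)
      .select(matches => s"suspicious ip: ${matches("failures").head.ip}")
      .print()

    env.execute("dynamic-pattern-sketch")
  }
}
```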
One straightforward way to achieve this would be to generate a SQL script for each pattern (using MATCH_RECOGNIZE) and then use Ververica Platform's REST API to deploy and manage those scripts: https://docs.ververica.com/user_guide/application_operations/deployments/artifacts.html?highlight=sql#sql-script-artifacts
Flink doesn't provide tooling for automating the creation of JAR files or submitting them; that's the sort of thing you might use a CI/CD pipeline for (e.g., GitHub Actions).
Disclaimer: I work for Ververica.
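
As a hedged illustration of the SQL-script suggestion above, a backend could template a MATCH_RECOGNIZE statement per user-defined pattern and then deploy the generated script; the table and column names (alerts, logins, ip, success, ts) and the parameters are illustrative:

```scala
object SqlScriptGenerator {
  // Builds a Flink SQL script for the "N login failures within M minutes from one IP" pattern.
  // The generated script is what would then be deployed (e.g. via the platform's REST API).
  def loginFailureScript(failureCount: Int, windowMinutes: Int): String =
    s"""INSERT INTO alerts
       |SELECT *
       |FROM logins
       |MATCH_RECOGNIZE (
       |  PARTITION BY ip
       |  ORDER BY ts
       |  MEASURES FIRST(F.ts) AS first_failure, LAST(F.ts) AS last_failure
       |  ONE ROW PER MATCH
       |  AFTER MATCH SKIP PAST LAST ROW
       |  PATTERN (F{$failureCount}) WITHIN INTERVAL '$windowMinutes' MINUTE
       |  DEFINE F AS F.success = FALSE
       |)""".stripMargin

  def main(args: Array[String]): Unit =
    println(loginFailureScript(failureCount = 3, windowMinutes = 1))
}
```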

Use OpenTelemetry with Apache Flink

I have been trying to use OpenTelemetry (https://opentelemetry.io/) in an Apache Flink job. I am sending the traces to a Kafka topic in order to view them in Jaeger.
Tracing works when I run the job inside my IntelliJ IDE, but once I build the package and try to execute it on the cluster, I cannot get it to work.
Is there any blocker in that sense for Apache Flink that I am not aware of?
I have accomplished this using a variable:
export FLINK_ENV_JAVA_OPTS=-javaagent:./lib/opentelemetry-javaagent-all.jar
But this only works if I am setting up the Flink cluster myself. The problem is that the cluster I am using is on AWS (Kinesis Analytics), and I am not able to set this variable.
Is there a way to use OpenTelemetry with Flink?

Serialize protobuf data to Avro - Apache Flink

Is it possible to serialize protobuf data to Avro and write it to files using an Apache Flink sink?
There is currently no out-of-the-box solution for protobuf; it's high on the priority list.
You can use protobuf-over-Kryo or parse/serialize manually in the meantime.
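
A minimal sketch of the protobuf-over-Kryo route, assuming the com.twitter:chill-protobuf dependency is on the classpath; com.google.protobuf.Timestamp stands in for your own generated message class, and the Avro conversion / file sink is only indicated in comments:

```scala
import com.google.protobuf.Timestamp
import com.twitter.chill.protobuf.ProtobufSerializer
import org.apache.flink.streaming.api.scala._

object ProtobufOverKryoSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Let Kryo handle the generated protobuf type via chill's ProtobufSerializer.
    // In a real job you would register your own generated message class here.
    env.getConfig.registerTypeWithKryoSerializer(classOf[Timestamp], classOf[ProtobufSerializer])

    // Protobuf records can now flow between operators; converting them to Avro
    // (e.g. building GenericRecords in a map function) and writing them with a
    // file sink would still be a manual step.
    env
      .fromElements(Timestamp.newBuilder().setSeconds(1L).build())
      .map(_.getSeconds) // placeholder transformation
      .print()

    env.execute("protobuf-over-kryo-sketch")
  }
}
```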

Apache Flink: custom ID for flink task metrics

I would like to edit the ID of task metrics in Flink so that I can work with the metrics in Java Mission Control via JMX.
The reason I want to edit it is to make the metrics easier to find in JMC.
Can anyone help me out with this problem?
You cannot change this ID in the web UI, as it comes from the runtime web server.
If you are connecting to Flink's JMX reporter to get the metrics, you could use the task name to filter out the data you want, since the data from JMX contains the task name, task ID, etc.
Another way is to implement your own metric reporter that includes only the task name, which makes it clearer to get the metrics from JMX.
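
A rough sketch of that last suggestion, assuming flink-metrics-core on the classpath; registering the reporter with the cluster and re-exposing the collected metrics (e.g. as JMX MBeans) are left out:

```scala
import java.util.concurrent.ConcurrentHashMap

import org.apache.flink.metrics.{Metric, MetricConfig, MetricGroup}
import org.apache.flink.metrics.reporter.MetricReporter

// Keys every metric by task name only, so JMC/JMX queries don't have to deal
// with the generated task IDs.
class TaskNameOnlyReporter extends MetricReporter {
  private val metrics = new ConcurrentHashMap[String, Metric]()

  override def open(config: MetricConfig): Unit = ()
  override def close(): Unit = metrics.clear()

  override def notifyOfAddedMetric(metric: Metric, metricName: String, group: MetricGroup): Unit = {
    // getAllVariables holds scope variables such as "<task_name>" and "<task_id>".
    val taskName = group.getAllVariables.getOrDefault("<task_name>", "unknown")
    metrics.put(s"$taskName.$metricName", metric)
  }

  override def notifyOfRemovedMetric(metric: Metric, metricName: String, group: MetricGroup): Unit = {
    val taskName = group.getAllVariables.getOrDefault("<task_name>", "unknown")
    metrics.remove(s"$taskName.$metricName")
  }
}
```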
