Flink web UI when running from IDE

I am trying to see my job in the web UI.
I use createLocalEnvironmentWithWebUI and the code runs fine in the IDE, but I cannot see my job at http://localhost:8081/#/overview
import org.apache.flink.configuration.ConfigConstants
val conf: Configuration = new Configuration()
conf.setBoolean(ConfigConstants.LOCAL_START_WEBSERVER, true)
val env = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(conf)
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
val rides = env.addSource(
  new TaxiRideSource("nycTaxiRides.gz", 1, 100)) // 60, 600))
val filteredRides = rides
  .filter(r => GeoUtils.isInNYC(r.startLon, r.startLat) && GeoUtils.isInNYC(r.endLon, r.endLat))
  .map(r => (r.passengerCnt, 1))
  .keyBy(_._1)
  .window(TumblingTimeWindows.of(Time.seconds(5)))
  .sum(1)
  .map(r => (r._1.toString + "test", r._2))
filteredRides.print()
env.execute("Taxi Ride Cleansing")
Do I need to set up anything else?

I was able to start the Flink webui from IntelliJ by adding flink-runtime-web to the dependencies for my project. I did this by adding this to my pom.xml file:
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-runtime-web_2.11</artifactId>
    <version>${flink.version}</version>
</dependency>
You also then need to create a local execution environment that includes the WebUI:
Configuration conf = new Configuration();
env = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(conf);

As of Flink version 1.5.0 adding the dependency mentioned before and using the following piece of code to start the StreamEnvironment worked for me:
Configuration config = new Configuration();
config.setBoolean(ConfigConstants.LOCAL_START_WEBSERVER, true);
StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(config);
While the job is running, the web UI is available at http://localhost:8081
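If the page still does not load, it can help to first check whether anything is listening on the port at all while the job runs. Below is a minimal sketch using only the JDK; the class name WebUiProbe and the method isListening are made up for illustration, and the check only proves that some TCP listener exists on the port, not that it is the Flink web UI.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Flink: probes a TCP port.
class WebUiProbe {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused or timed out: nothing reachable on that port.
            return false;
        }
    }

    public static void main(String[] args) {
        // 8081 is the port used in the URLs in this thread.
        System.out.println("listener on 8081: " + isListening("localhost", 8081, 1000));
    }
}
```

Run it from a second process while the job is executing in the IDE. If it reports no listener, the web server was most likely not started (for example because flink-runtime-web is missing from the classpath) or the job has already finished.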

Yes. If you want to use the web UI dashboard, you need to create an executable jar and then submit this jar through the Flink dashboard. Here are the steps.
Step 1: Create the jar from your IDE code
You may need to change your execution environment to
StreamExecutionEnvironment environment =
    StreamExecutionEnvironment.getExecutionEnvironment();
If you have multiple jars, set the main class in the Main-Class: entry of the Manifest.mf file.
Then create a jar using Build Artifacts in your IDE.
Step 2: Start a local Flink cluster, which will show you the dashboard
I will assume that you have not downloaded the Flink binary yet; you can easily download it here. On a Mac, I suggest using brew install apache-flink, which installs the latest stable release (1.3.2 at the time of writing).
Then go to the directory where Flink is installed and start the local cluster.
Step 3: Submit the job
Upload the jar via the Submit new Job option and then run it.

Related

Querying MS Sql Server from a Jenkins Pipeline

I've been using Jenkins (2.289.3) in a docker container (https://hub.docker.com/r/jenkins/jenkins). The next update to Jenkins 2.312 migrates the docker container from Java 8 to Java 11.
I have some pipelines that use the sourceforge jdbc driver to query SQL server (http://jtds.sourceforge.net/)
Example:
import java.sql.DriverManager
import groovy.sql.Sql
con = DriverManager.getConnection('jdbc:jtds:sqlserver://servername', 'user', 'password');
stmt = con.createStatement();
To make this work on Java 8, I ran this in the Docker container:
cp jtds-1.3.1.jar ${JAVA_HOME}/jre/lib/ext
This loads the jar for use inside Jenkins. The extension directory (lib/ext) no longer exists in Java 11.
It seems pipelines have added the @Grab syntax, e.g.
@Grab(group='net.sourceforge.jtds', module='jtds', version='1.3.1')
If I add this to my pipeline, I can see the jars are downloaded to /var/jenkins_home/.groovy/grapes/ but it doesn't seem to actually load the jar:
java.lang.ClassNotFoundException: net.sourceforge.jtds.jdbc.Driver
or
java.sql.SQLException: No suitable driver found for jdbc:jtds:sqlserver://servername
depending on which commands I run. Either way, it appears to be due to the jar not being loaded.
All the groovy examples use
@GrabConfig(systemClassLoader=true)
But this appears to not be supported in pipelines.
I've considered using a command line client, but I need to parse the results of queries and I haven't seen a tool that works well for this (i.e., one that would load results into a JSON file or similar).
I've also tried setting the -classpath argument in the Docker container, e.g.
ENV JAVA_OPTS=-classpath /var/jenkins_home/test/jtds-1.3.1.jar
Running ps in the docker container, I can see that the java process runs with the classpath command line option specified, but it doesn't seem to actually load the jar for use.
I'm a bit lost on how to get this working, can anyone help? Thanks.

Well, I've found a workaround. It doesn't seem ideal, but it does work.
The original code:
import java.sql.DriverManager
import groovy.sql.Sql
con = DriverManager.getConnection('jdbc:jtds:sqlserver://servername', 'user', 'password');
stmt = con.createStatement();
Assuming we have the jar saved in /var/jenkins_home/test/jtds-1.3.1.jar it can be updated to:
import java.sql.DriverManager
import groovy.sql.Sql
def classLoader = this.class.classLoader
while (classLoader.parent) {
    classLoader = classLoader.parent
    if (classLoader.getClass() == java.net.URLClassLoader) {
        // load our jar into the urlclassloader
        classLoader.addURL(new File("/var/jenkins_home/test/jtds-1.3.1.jar").toURI().toURL())
        break
    }
}
// register the class
Class.forName("net.sourceforge.jtds.jdbc.Driver")
con = DriverManager.getConnection('jdbc:jtds:sqlserver://servername', 'user', 'password');
stmt = con.createStatement();
Once this code has been run once, the jar seems to be accessible globally (even in other pipelines that don't load the jar).
Based on this, it seems like a good way to handle this is on the Jenkins initialization, rather than in the script at all. I created /var/jenkins_home/init.groovy with these contents:
def classLoader = this.class.classLoader
while (classLoader.parent) {
    classLoader = classLoader.parent
    if (classLoader.getClass() == java.net.URLClassLoader) {
        classLoader.addURL(new File("/var/jenkins_home/jars/jtds-1.3.1.jar").toURI().toURL())
        break
    }
}
Class.forName("net.sourceforge.jtds.jdbc.Driver")
After that, the scripts seem to behave the way I would expect them to with the jar on the classpath.
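For comparison, plain JDBC code often solves the same "No suitable driver" problem without touching a parent class loader, by registering a small delegating driver (a "driver shim"). This is a different technique from the workaround above and I have not tried it inside a Jenkins pipeline, so treat it as a sketch. It is written in Java; DriverShim and registerFromJar are illustrative names, not part of any library. The idea is that DriverManager only hands out drivers whose class is visible to the caller's class loader, so a wrapper class that is visible delegates to the real driver loaded from the jar.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.Properties;
import java.util.logging.Logger;

// Hypothetical wrapper: forwards every call to a driver loaded by another class loader.
class DriverShim implements Driver {
    private final Driver delegate;

    DriverShim(Driver delegate) {
        this.delegate = delegate;
    }

    @Override public Connection connect(String url, Properties info) throws SQLException {
        return delegate.connect(url, info);
    }
    @Override public boolean acceptsURL(String url) throws SQLException {
        return delegate.acceptsURL(url);
    }
    @Override public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) throws SQLException {
        return delegate.getPropertyInfo(url, info);
    }
    @Override public int getMajorVersion() { return delegate.getMajorVersion(); }
    @Override public int getMinorVersion() { return delegate.getMinorVersion(); }
    @Override public boolean jdbcCompliant() { return delegate.jdbcCompliant(); }
    @Override public Logger getParentLogger() throws SQLFeatureNotSupportedException {
        return delegate.getParentLogger();
    }

    /** Loads driverClass from the given jar with its own URLClassLoader and registers a shim for it. */
    static Driver registerFromJar(Path jar, String driverClass) throws Exception {
        // The loader is intentionally left open: the driver needs it for as long as it is in use.
        URLClassLoader loader = new URLClassLoader(
                new URL[] { jar.toUri().toURL() }, DriverShim.class.getClassLoader());
        Driver real = (Driver) Class.forName(driverClass, true, loader)
                .getDeclaredConstructor().newInstance();
        Driver shim = new DriverShim(real);
        DriverManager.registerDriver(shim);
        return shim;
    }
}
```

With the paths from this post, the call would be DriverShim.registerFromJar(Path.of("/var/jenkins_home/test/jtds-1.3.1.jar"), "net.sourceforge.jtds.jdbc.Driver"), followed by the usual DriverManager.getConnection call. Whether the pipeline sandbox allows this is something I can't confirm.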

Custom configuration file flink-conf.yaml

I need to specify different Flink settings for different applications. In other words, each application should run with its custom file flink-conf.yaml. What is the proper way to do it?
I found some old recommendations to declare FLINK_CONF_DIR pointing to a custom directory with Flink configuration files (for example: How could I override configuration value in Apache Flink?). However, the official Flink documentation does not mention the FLINK_CONF_DIR variable at all (as of Flink 1.13), so I doubt whether this approach is officially recommended and supported by the Flink developers.
UPDATE 1: Details on application running
I am running Flink on YARN in the Application mode. Here is how I launch the application:
"$flink_home/bin/flink" \
run-application \
--target yarn-application \
--class com.example.App1
The out-of-the-box Flink configuration is located in the $flink_home/conf directory. As I have several applications App1, App2, ..., I want them to use their respective Flink configurations instead of the out-of-the-box configuration.

TL;DR: The paragraph about FLINK_CONF_DIR was accidentally removed when the Flink on YARN docs were rewritten for the Flink 1.12 release. It is still the intended and supported way to establish per-application settings in YARN clusters.
Other ways to override the configuration:
You can override the settings specified in the cluster's flink-conf.yaml file with settings you specify on the command line, as described in this answer.
You can also override specific settings from the global configuration in your code, e.g.:
Configuration conf = new Configuration();
conf.setString("state.backend", "filesystem");
env = StreamExecutionEnvironment.getExecutionEnvironment(conf);
You can also load all of the settings in a flink-conf.yaml file from your application code, via
FileSystem.initialize(GlobalConfiguration.loadConfiguration("/path/to/conf/directory"));
And with Kubernetes you can mount different ConfigMaps for different applications.
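To make the FLINK_CONF_DIR approach concrete, here is a minimal sketch of a launcher that starts the flink CLI with a per-application configuration directory. It uses only the JDK's ProcessBuilder; FlinkLauncher and forApplication are illustrative names, and the directories in the usage note are placeholders, not paths from the question. The command line itself mirrors the one in the question.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: prepares a `flink run-application` process per application.
class FlinkLauncher {

    /**
     * Builds (but does not start) the process. confDir is the directory that
     * holds this application's flink-conf.yaml.
     */
    static ProcessBuilder forApplication(Path flinkHome, Path confDir,
                                         String mainClass, String... extraArgs) {
        List<String> command = new ArrayList<>();
        command.add(flinkHome.resolve("bin").resolve("flink").toString());
        command.add("run-application");
        command.add("--target");
        command.add("yarn-application");
        command.add("--class");
        command.add(mainClass);
        command.addAll(List.of(extraArgs)); // anything else the CLI call needs

        ProcessBuilder builder = new ProcessBuilder(command);
        // The flink scripts read their configuration from this directory.
        builder.environment().put("FLINK_CONF_DIR", confDir.toString());
        builder.inheritIO(); // show the CLI output in the current console
        return builder;
    }
}
```

For example, FlinkLauncher.forApplication(Path.of("/opt/flink"), Path.of("/etc/flink/app1"), "com.example.App1").start() would launch App1 with the settings from /etc/flink/app1, and a second call with a different directory would do the same for App2. The shell equivalent is simply prefixing the command with FLINK_CONF_DIR=... for each application.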

How to set Zeppelin interpreter dependencies via configuration

I'm trying to deploy zeppelin 0.7.3 and add ojdbc7.jar and custom jar dependencies automatically.
I'm wondering if there's a configuration item in zeppelin-env.sh or zeppelin-site.xml that could do this.
Example:
Copy zeppelin-0.7.3-bin-all to /data and copy ojdbc7.jar to /data/zeppelin-0.7.3-bin-all/jdbc/ojdbc7.jar
In the interpreter settings, under spark, I can see the dependency/artifact added according to the configuration:
/data/zeppelin-0.7.3-bin-all/interpreter/jdbc/ojdbc7.jar

Gradle with Integrated SQL Security

I'm currently working on converting one of our Maven projects to use Gradle.
Here is the issue I'm currently facing:
This project is using SQL Integrated security. Here is how Maven handles it (this took us a while to figure it out):
<dependency>
    <groupId>com.microsoft.sqlserver</groupId>
    <artifactId>sqljdbc4</artifactId>
    <version>4.0</version>
    <scope>system</scope>
    <systemPath>${project.basedir}/libs/sqljdbc4.jar</systemPath>
</dependency>
After running gradle init --type pom,
this specific dependency was converted to something like this:
system group: 'com.microsoft.jdbcdriver', name: 'sqljdbc', version:'4.0.1'
which is not right, and Gradle can't build. More specifically, the system scope does not even exist in Gradle's API (nor did I find it in any third-party Gradle plugin).
Any help from anyone with experience with Gradle and SQL integrated security would be highly appreciated.

It is very easy to emulate scope system with Gradle by adding a configuration.
Create configuration system and set it as the compile classpath. Any dependencies added to system will now be available during compilation (though I doubt you need a specific JDBC driver for compilation), but the dependencies in system will not be added to the published dependencies of the module:
configurations {
    system.extendsFrom compile
}
sourceSets {
    main {
        compileClasspath = configurations.system
    }
}
Now you can easily add the JAR of the JDBC driver to the system configuration. This assumes you still want to refer to a local file just like with Maven:
dependencies {
    system files('libs/sqljdbc-4.0.1.jar')
}
But if you have the JAR in a (local) repository, it is better to use the repository:
dependencies {
    system 'com.microsoft.jdbcdriver:sqljdbc:4.0.1'
}

Jenkins Selenium Grid v2 configuration

I need help configuring and updating the Jenkins Selenium plugin.
I can configure a Selenium hub and nodes outside of Jenkins and run tests from Maven, so Selenium itself is not the problem.
Problem 1: the Jenkins Selenium plugin already defines a default node with a list of available browsers (5 Firefox, 5 Chrome, 2 IE). I would like all my tests to run on a machine other than the Jenkins one. I was able to point my remote node at the Jenkins machine and it is registered there. My question is: how do I remove the default node's browser configuration?
Problem 2: how can I update to the latest selenium-server-standalone version (currently 2.24.1)? Right now I can see that Jenkins is using 2.15.0. I tried adding the jar to ...jenkins/plugins/selenium/WEB-INF/lib and updating license.xml, but after restarting Jenkins it still uses the older version.
Thanks for your help.

I can answer your second question. To update the selenium-server-standalone version, perform the following steps:
1) Download the latest version of selenium-server-standalone
2) Put it into YourJenkinsHomeDirectory/plugins/selenium/WEB-INF/lib
3) Edit the licenses.xml file in YourJenkinsHomeDirectory/plugins/selenium/WEB-INF
Edit the following entry, replacing the version number (it appears in both the name and the version attribute) with the version of your selenium-server-standalone:
l:dependency name='Unnamed - org.seleniumhq.selenium:selenium-server-standalone:pom:2.39.0' groupId='org.seleniumhq.selenium' artifactId='selenium-server-standalone' version='2.39.0'
Save the file
4) Go to YourJenkinsHomeDirectory/Home/plugins/selenium/META-INF/maven/org.jenkins-ci.plugins/selenium
5) Edit pom.xml
Then find the following:
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-server-standalone</artifactId>
    <version>2.39.0</version>
</dependency>
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>2.39.0</version>
    <scope>test</scope>
</dependency>
</dependencies>
Edit <version>2.39.0</version> in both blocks, replacing it with the version of your selenium-server-standalone.
Save the file
6) Restart Jenkins (simply go to http://your-jenkins-url/restart; by default the URL is http://localhost:8080). Jenkins should now restart
7) Go to your Selenium Grid hub at http://localhost:4444/grid/console
8) You should now see your updated version
Good luck :)
