Uploading jar on multiple flink job managers - apache-flink

We are currently running Flink 1.12 in HA mode in production, with 3 job managers (1 leader and 2 standby). When I upload a jar to one of the job managers, it is not reflected on the other job managers. Is there any way to make a jar uploaded to a single job manager visible on the other job managers in the HA setup as well?
The problem this causes: when the jar is uploaded to job manager 'A', but the job submit request referencing that jar is sent to job manager 'B', I get an error saying the jar was not found.

The HA overview at https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/ha/overview/ mentions: "State persistence: Persisting state which is required for the successor to resume the job execution (JobGraphs, user code jars, completed checkpoints)".
That implies that Flink's HA services take care of the JARs as well. Have you tried what happens if you shut down the active JobManager?
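For reference, a minimal flink-conf.yaml sketch of ZooKeeper-based HA in Flink 1.12 (the hostnames and storage path are placeholders):

high-availability: zookeeper
high-availability.zookeeper.quorum: zk-1:2181,zk-2:2181,zk-3:2181
high-availability.storageDir: hdfs:///flink/ha/
# Optional: isolates this cluster's HA data from others sharing the quorum
high-availability.cluster-id: /production-cluster

The storageDir must point at storage reachable by all three JobManagers (HDFS, S3, NFS, ...), since that is where the JobGraphs and user-code jars from the quote above are persisted. Note that jars uploaded through the web UI before submission land in web.upload.dir (see the last question below), which by default is local to each JobManager; one possible workaround for the behaviour you describe is to point that directory at a volume shared by all JobManagers.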

Related

How to solve liquibase waiting for changelog lock problem in several pods in OpenShift cluster?

We are supporting several microservices written in Java using Spring Boot and deployed in OpenShift. Some microservices communicate with databases. We often run a single microservice in multiple pods in a single deployment. When each microservice starts, it runs Liquibase, which tries to update the database. The problem is that sometimes one pod fails while waiting for the changelog lock.
When this happens in our production OpenShift cluster, we expect the other pods to fail on restart because of the same changelog lock issue. So, in the worst-case scenario, all pods will wait for the lock to be released.
We want Liquibase to automatically prepare our database schemas when each pod starts.
Is it good practice to keep this logic in every microservice? How can we recover automatically when the Liquibase changelog lock problem appears? Do we need to put the database preparation logic in a separate deployment?
So maybe I should rephrase my question: what is the best way to run DB migrations in a microservice architecture? Maybe we should not run migrations in each pod? Is it better to do it with a separate deployment, or with a dedicated Jenkins job outside OpenShift entirely?
We're running Liquibase migrations as an init container in Kubernetes. The problem with running Liquibase inside the microservices themselves is that Kubernetes will terminate the pod if the readiness probe does not succeed within the configured timeout. In our case this sometimes happened during large DB migrations, which could take a few minutes to complete; Kubernetes would terminate the pod, leaving DATABASECHANGELOGLOCK in a locked state. With init containers you will not have this problem. See https://www.liquibase.org/blog/using-liquibase-in-kubernetes for a detailed explanation.
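For illustration, a minimal sketch of such an init container using the official liquibase/liquibase image (the image tag, JDBC URL, credentials, and changelog path are all placeholders; in a real setup the credentials would come from a Secret, and the flag spelling depends on the Liquibase version, e.g. --changelog-file in 4.4+):

initContainers:
  - name: liquibase-migration
    image: liquibase/liquibase:4.9
    # The changelog must be available inside the container,
    # e.g. via a volume mount or an extra image layer.
    args: ['--url=jdbc:postgresql://db:5432/mydb',
           '--username=myuser',
           '--password=changeme',
           '--changeLogFile=changelog/db.changelog-master.xml',
           'update']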
UPDATE
Please take a look at this Liquibase extension, which replaces the StandardLockService by using database-level locks: https://github.com/blagerweij/liquibase-sessionlock
This extension uses MySQL or Postgres user lock statements, which are automatically released when the database connection is closed (e.g. when the container is stopped unexpectedly). The only thing required to use the extension is to add a dependency on the library (see the sketch below); Liquibase will automatically detect the improved LockService.
I'm not the author of the library, but I stumbled upon it while searching for a solution and helped the author release it to Maven Central. It currently supports MySQL and PostgreSQL, but it should be fairly easy to add support for other RDBMSs.
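For completeness, a sketch of the Maven dependency (coordinates as listed on the project's GitHub page; pick the current version from Maven Central):

<dependency>
    <groupId>com.github.blagerweij</groupId>
    <artifactId>liquibase-sessionlock</artifactId>
    <version><!-- latest version from Maven Central --></version>
</dependency>

No further configuration should be needed: Liquibase picks up the LockService implementation with the highest priority via its service discovery mechanism.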
When Liquibase kicks in during the Spring Boot app deployment, it performs (at a very high level) the following steps:
lock the database (create a record in databasechangeloglock);
execute the changelogs;
remove the database lock.
So if you interrupt the application deployment while Liquibase is between steps 1 and 3, the database will remain locked, and when you try to redeploy your app, Liquibase will fail because it treats your database as locked.
So you have to unlock the database before deploying the app again.
There are two options that I'm aware of:
Clear the databasechangeloglock table or set locked to false, i.e. DELETE FROM databasechangeloglock or UPDATE databasechangeloglock SET locked=0 (see the SQL sketch after this list).
Execute the liquibase releaseLocks command; see the Liquibase documentation for details.
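As a sketch, the manual unlock from the first option (table and column names are the Liquibase defaults; use 0/1 instead of FALSE on databases without a boolean type):

-- Inspect who holds the lock first
SELECT ID, LOCKED, LOCKGRANTED, LOCKEDBY FROM DATABASECHANGELOGLOCK;

-- Release the lock without deleting the row
UPDATE DATABASECHANGELOGLOCK
SET LOCKED = FALSE, LOCKGRANTED = NULL, LOCKEDBY = NULL
WHERE ID = 1;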
We managed to solve this at my company by following the same init-container approach that Liquibase suggests, but instead of adding a new container that runs the migration via the Liquibase CLI, we reuse the existing Spring Boot service setup and execute only the Liquibase logic. We created an alternative main class that can be used in an entrypoint to populate the database using Liquibase.
The InitContainerApplication class brings up the minimal configuration required to start the application and run Liquibase.
Typical usage:
entrypoint: "java -cp /app/extras/*:/app/WEB-INF/classes:/app/WEB-INF/lib/* com.backbase.buildingblocks.auxiliaryconfig.InitContainerApplication"
Here is the class:
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.ApplicationArguments;
import org.springframework.boot.ApplicationRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.SpringBootConfiguration;
import org.springframework.boot.autoconfigure.ImportAutoConfiguration;
import org.springframework.context.ApplicationContext;

@SpringBootConfiguration
@ImportAutoConfiguration(InitContainerAutoConfigurationSelector.class)
public class InitContainerApplication implements ApplicationRunner {

    @Autowired
    private ApplicationContext appContext;

    public static void main(String[] args) {
        SpringApplication.run(InitContainerApplication.class, args);
    }

    @Override
    public void run(ApplicationArguments args) throws Exception {
        // Liquibase has already run during context startup at this point,
        // so exit with status 0 to signal success to Kubernetes.
        SpringApplication.exit(appContext, () -> 0);
    }
}
Here is how it is used as an init container:
spec:
  initContainers:
    - name: init-liquibase
      command: ['java']
      args: ['-cp', '/app/extras/*:/app/WEB-INF/classes:/app/WEB-INF/lib/*',
             'com.backbase.buildingblocks.auxiliaryconfig.InitContainerApplication']
Finally, we solved this problem in another project by removing the Liquibase migration from microservice start time. Now a separate Jenkins job applies the migrations, and another Jenkins job deploys and starts the microservice once the migrations have been applied, so the microservice itself no longer applies database updates.
I encountered this issue when one of the Java applications I manage abruptly shut down.
The logs were displaying the error below when the application tried to start:
waiting to acquire changelock
Here's how I fixed it:
Stop the application.
Delete the databasechangelog and databasechangeloglock tables in the database connected to the application. (Note that deleting databasechangelog makes Liquibase treat every changeset as not yet applied, so only do this if re-running them is safe.)
Restart the application.
In my case the application was connected to 2 databases. I had to delete the databasechangelog and databasechangeloglock tables in both databases and then restart the application; the changelog tables of the two databases have to be in sync.
After this the application was able to acquire the changelog lock.

Flink run job with remote jar file

I'm new to Flink and trying to submit my Flink program to my Flink cluster.
I have a Flink cluster running on remote Kubernetes and blob storage on Azure.
I know how to submit a Flink job when the jar file is on my local machine, but I have no idea how to submit a job with a remote jar file (the jar can be accessed via HTTPS).
I checked the documentation and Flink doesn't seem to provide anything like what we do in Spark.
Thanks in advance
I think you can use an init container to download the job jar into a shared volume, then submit the local jar to Flink; a sketch follows.
As an aside (and a plug): Google's Flink Operator supports remote job jars, see this example.
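A minimal sketch of the init-container approach (the image names, jar URL, JobManager address, and paths are all placeholders):

spec:
  volumes:
    - name: job-jar
      emptyDir: {}
  initContainers:
    - name: fetch-jar
      image: curlimages/curl
      # Download the job jar from blob storage into the shared volume
      command: ['curl', '-fL', '-o', '/opt/job/job.jar',
                'https://my-account.blob.core.windows.net/jars/job.jar']
      volumeMounts:
        - name: job-jar
          mountPath: /opt/job
  containers:
    - name: submit-job
      image: flink:1.12
      # Submit the now-local jar to the remote JobManager
      command: ['/opt/flink/bin/flink', 'run',
                '-m', 'flink-jobmanager:8081',
                '/opt/job/job.jar']
      volumeMounts:
        - name: job-jar
          mountPath: /opt/job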

How to execute flink job remotely when the flink job jar is bulky

I have a Flink server running on a Kubernetes cluster, and a job jar which is bulky due to product and third-party dependencies.
I run it via
ExecutionEnvironment env = ExecutionEnvironment.createRemoteEnvironment(host, port, jar);
The jar size is around 130 MB after optimization.
I want to invoke the remote execution without the jar upload, so that the upload does not happen every time the job needs to be executed. Is there a way to upload the jar once and then run it remotely without mentioning the jar (in Java)?
You could deploy a per-job cluster on Kubernetes. This will submit your user code jar along with the Flink binaries to your Kubernetes cluster. The downside is that you cannot change the job afterwards without restarting the Flink cluster.
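If restarting the cluster per job is acceptable, the closest built-in mechanism in recent Flink versions (1.11+) is native Kubernetes Application Mode, where the bulky jar is baked into the container image once and nothing is uploaded at submission time. A sketch (the cluster id and image name are placeholders):

./bin/flink run-application \
    --target kubernetes-application \
    -Dkubernetes.cluster-id=my-job-cluster \
    -Dkubernetes.container.image=my-registry/my-flink-job:latest \
    local:///opt/flink/usrlib/my-job.jar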

Where can I find my jar on Apache Flink server which I submitted using Apache Flink dashboard

I developed a Flink job and submitted it using the Apache Flink dashboard. My understanding is that when I submit the job, my jar should be available on the Flink server. I tried to figure out the path of my jar but wasn't able to. Does Flink keep these jar files on the server? If yes, where can I find them? Any documentation? Please help. Thanks!
JAR files are renamed when they are uploaded and stored in a directory that can be configured with the web.upload.dir configuration key.
If the web.upload.dir parameter is not set, the JAR files are stored in a dynamically generated directory under jobmanager.web.tmpdir (which defaults to System.getProperty("java.io.tmpdir")).
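So if you want the uploaded jars in a predictable (or shared) location, set the key explicitly in flink-conf.yaml (the path is a placeholder):

web.upload.dir: /opt/flink/uploaded-jars

Expect the stored files to carry generated names rather than the original file name, as noted above.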

How to specify log file different from daemon log file while submitting a flink job in a standalone flink cluster

When I start a Flink standalone cluster, it writes the daemon logs to the file configured in conf/log4j.properties, and when I submit a Flink job to that cluster, it uses the same properties file for the application logs and writes them into the same log file on the task managers. I want a separate log file for each application submitted to that standalone cluster. Is there any way to achieve that?
When you submit the job using the ./bin/flink shell script, use the following environment variables to control log file location:
FLINK_LOG_DIR specifies the directory where the log will appear
FLINK_IDENT_STRING allows you to make the filename unique
For example if you start your job with
FLINK_LOG_DIR=/var/log FLINK_IDENT_STRING=my_app_id ./bin/flink run /path/to/the.jar
then the logs will appear in /var/log/flink-my_app_id-client-$HOSTNAME.log
Note that this only applies to the messages that are logged via the logging frameworks and not for the things that are just printed to stdout.
