Using the Flink SQL client to submit a SQL query: how do I restore from a checkpoint or savepoint? - apache-flink

I have started a local flink cluster using
./start-cluster.sh
I have started a local sql-client using
./sql-client.sh
I am able to submit SQL statements in the Flink SQL terminal.
I have run SET 'state.checkpoints.dir' = 'file:///tmp/flink-savepoints-directory-from-set'; and I can see the checkpoint folder being created and updated while the SQL job is running. (The SQL job reads from a Kafka topic, does some joins, and writes to another topic.)
When I cancel the job from the Flink UI and submit the SQL again, the job does not restore from state. (I am basing this on the fact that the final sink emits the same messages on every restart; it is as if the job reads the source topic from the beginning again.)
I have not shutdown the flink cluster or kafka cluster.
I have 2 questions:
How do I get the SQL query to restore from state?
Is there a way to use the flink run -s ... command to submit a SQL query directly, instead of packaging it as a jar?

This is documented at https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/table/sqlclient/#start-a-sql-job-from-a-savepoint
SET 'execution.savepoint.path' = '/tmp/flink-savepoints/savepoint-cca7bc-bb1e257f0dab';
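Note that cancelling a job from the UI normally discards its state; to resume where you left off, stop the job with a savepoint (or configure externalized checkpoint retention) and point the client at the resulting path. A minimal end-to-end sketch, where the job ID, paths, and the INSERT statement are placeholders for your own:

# in a shell: stop the running job and write a savepoint
./bin/flink stop --savepointPath /tmp/flink-savepoints <jobId>

-- in the SQL client: the next job submitted restores from that savepoint
SET 'execution.savepoint.path' = '/tmp/flink-savepoints/savepoint-cca7bc-bb1e257f0dab';
INSERT INTO sink_topic SELECT ... FROM source_topic;

-- clear the option so later submissions do not also restore from it
RESET 'execution.savepoint.path';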

Related

Flink JDBC sink not committing in web UI

I have a problem with one of my newly developed Flink jobs.
When I run it in IntelliJ, the job works fine and commits records to the database.
The next step was to upload it to the Flink web UI and execute it there.
The database connection is established and the inserts seem to be sent to the Oracle database, but the data does not seem to be committed.
I'm using a DataStream with the following setup:
import org.apache.flink.connector.jdbc.JdbcConnectionOptions;
import org.apache.flink.connector.jdbc.JdbcSink;
import org.apache.flink.streaming.api.datastream.DataStreamSink;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(10000); // checkpoint every 10 seconds
...
DataStreamSink<POJO> pojoSink = filteredStream
        .addSink(JdbcSink.sink(
                sqlString,
                statementBuilder, // a JdbcStatementBuilder<POJO> that fills the prepared statement
                new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
                        .withUrl(url)
                        .withDriverName(driver)
                        .withUsername(user)
                        .withPassword(password)
                        .build()));
I have no clue why it works on my laptop in the IDE but not on the server via the web UI.
The server logs don't show any errors either, and the checkpoints are completing.
Maybe someone has a suggestion where I could look for the problem.
Cheers
It seems it was a one-time error. The next time, the job ran perfectly.
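If the problem ever reappears, one thing that might be worth checking (an assumption on my side, not a confirmed cause) is the sink's batching: without explicit JdbcExecutionOptions the JDBC sink buffers rows and flushes them in batches, so inserts can look uncommitted for a while. A sketch using the four-argument JdbcSink.sink overload, with illustrative values:

import org.apache.flink.connector.jdbc.JdbcExecutionOptions;

DataStreamSink<POJO> pojoSink = filteredStream
        .addSink(JdbcSink.sink(
                sqlString,
                statementBuilder,
                JdbcExecutionOptions.builder()
                        .withBatchSize(100)        // flush after 100 buffered rows...
                        .withBatchIntervalMs(1000) // ...or at least once per second
                        .withMaxRetries(3)
                        .build(),
                new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
                        .withUrl(url)
                        .withDriverName(driver)
                        .withUsername(user)
                        .withPassword(password)
                        .build()));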

Couldn't commit processed log positions with the source database due to a concurrent connector shutdown or restart

I am continuously receiving the warning above from the Debezium connector for SQL Server, which I am running via connect-standalone. At one point I tried to start two concurrent connectors that connect to the same database, but that was yesterday; since then I have restarted this connector several times, and the other connector is stopped. So I don't know where that information is persisted, or why, because at the moment only one connector is running, so this shouldn't be logged. Since CDC does not work, this looks like a real problem, regardless of the fact that it is logged only as a warning, because no (other) error is logged, only lines such as:
WARN Couldn't commit processed log positions with the source database due to a concurrent connector shutdown or restart (io.debezium.connector.common.BaseSourceTask:238)
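For what it's worth, when running with connect-standalone the processed log positions (the source offsets) are persisted in the offsets file configured in the worker properties, so they survive connector restarts. A minimal sketch of the relevant worker settings, with placeholder values:

# connect-standalone worker properties
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000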

SQL Server Agent Job stops SSIS step with "unexpected error" and without any error information

I am dealing with my problem on several Windows Server 2019 (Core) machines, each running one SQL Server 2019 CU4 instance.
What we try to do
We are currently building a data warehouse with distributed databases. Each layer of the DWH is located on its own database server. The data exchange between the layers/servers takes place via SSIS ETLs, which use linked servers to reach the other layers and pull data across. Each layer also has its own SSIS service instance and executes the corresponding SSIS packages.
The SSIS packages are called by SQL Server Agent jobs. We have a job that executes the SSIS packages (#1), which in turn calls another job (#2) as its last step, which after a short wait time executes the calling job (#1) again. Thus, controlled by schedules, a loop is created and data is continuously transferred with ETLs.
I hope this was not too much unnecessary background.
The error
Basically the job runs, and there are numerous successful executions. However, we are observing interruptions of job #1 without helpful information regarding the error. The job history log refers to the SSIS log, which again only contains an "unexpected termination". In the SSIS log, we only see behavior indicating that the ETL package active at that time stopped after validation. Depending on the log level, nothing is logged at all, not even the execution of individual packages of the project. The package where this error occurs varies and is not limited to a specific one.
What I have already tried
Re-created the jobs and SSIS environments by hand (scripted before)
Used the 32-bit runtime
Upgraded the SSIS project/package version to 2019
Increased the log level to "verbose"
Patched the SQL Server to CU4
Saved SSIS dump files (couldn't find them or they weren't created)
Searched Windows and SQL Server log files
Does anyone have suggestions or ideas on how to get more specific error information?
Thank you very much and take care :)
UPDATE: We have an error message (OLE DB 0xC0202009 and 0x80004005)!
In order to rule out the use of environments as a cause, I manually set the parameters in the SSIS job step instead of overwriting them by selecting an environment.
Long story short: today it turned out that the parameter for an OLE DB connection string is not passed correctly.
The following is specified as a parameter in the job step:
However, the following connection string is specified in the context of the error message:
Note that some arguments are added to the parameter twice.
What could have caused that?

How to keep Apache Flink tasks and submission records when restarting the JobManager

I am using Apache Flink 1.10 to batch-compute my stream data. Today I moved my Apache Flink Kubernetes (v1.15.2) pod from machine 1 to machine 2 and found that all the submission records and the task list had disappeared. What happened? Are the submission records kept only in memory? What should I do to keep my submission records and task list when the Flink Kubernetes pod restarts? I have only found checkpoint persistence, but nothing about tasks.
If I lose the running task history, I must upload my task jar and recreate all the tasks, and that is a lot of tasks to recreate if the history is lost. Is there any way to resume the tasks automatically?
The configurations that might not be set are:
Job Manager
jobmanager.archive.fs.dir: hdfs:///completed-jobs
History Server
# Monitor the following directories for completed jobs
historyserver.archive.fs.dir: hdfs:///completed-jobs
# Refresh every 10 seconds
historyserver.archive.fs.refresh-interval: 10000
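Note that the archive directory is only written when jobs finish; to browse it you also need to start the separate history server process. A small sketch, assuming a standard Flink distribution layout:

# start the history server; its web UI listens on historyserver.web.port (8082 by default)
./bin/historyserver.sh start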
Please look at https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/historyserver.html#configuration for more details.

Purging Logs in SQL Database - Serilog

I am using Serilog for logging in my Web API and it is working fine. I log to SQL Server, and the following is the Serilog config for that:
__serilogLogger = new LoggerConfiguration()
    .Enrich.WithProperty("ApplicationIPv4", _ipv4)
    .Enrich.WithProperty("ApplicationIPv6", _ipv6)
    .WriteTo.MSSqlServer(connectionString, tableName /*, columnOptions: columnOptions*/)
    .WriteTo.Seq(ConfigurationManager.AppSettings["SerilogServer"])
    .CreateLogger();
I am a beginner with Serilog. My question is how to purge the logs in the database. Does Serilog have an option to keep only, say, the last 3 months of data?
Based on a chat in the Serilog Gitter, there is no built-in option for that. You can do it with a SQL Server Agent job or any other scheduled job.
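For illustration, a minimal sketch of such a scheduled purge; the table name Logs and the TimeStamp column are assumptions based on a typical MSSqlServer sink setup and may differ in yours:

-- delete log rows older than 3 months, in batches to keep transactions small
WHILE 1 = 1
BEGIN
    DELETE TOP (10000) FROM [Logs]
    WHERE [TimeStamp] < DATEADD(MONTH, -3, SYSUTCDATETIME());
    IF @@ROWCOUNT = 0 BREAK;
END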
