Savepoints without ./bin/flink - apache-flink

Is it possible to run a job from a savepoint having a direct
main() + LocalExecutionEnvironment setup?
Is it possible to do that through Remote*Environment?
Is it possible to do that or trigger a savepoint via ClusterClient?
Is the above possible through the rest api? Web ui (doesn't look like that)?
Finally, Is it possible to perform savepoint operations from local ./bin/flink against a remote cluster (same version but maybe different OS)?
Thank you.

To answer partially (3) you can do that using a ClusterClient by doing something similar to:
final Configuration config = GlobalConfiguration.loadConfiguration("...");
final ClusterClient client = new StandaloneClusterClient(config);
final PackagedProgram packagedProgram = new PackagedProgram(new File(FLINK_JOB_JAR));
packagedProgram.setSavepointRestoreSettings(SavepointRestoreSettings.forPath("...", true));
client.run(packagedProgram, 1);

Related

Flink CEP Event Not triggering

I have implement the CEP Pattern in Flink which is working as expected connecting to local Kafka broker. But when i connecting to cluster based cloud kafka setup, the Flink CEP is not triggering.
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
//saves checkpoint
env.getCheckpointConfig().enableExternalizedCheckpoints(
CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
I am using AscendingTimestampExtractor,
consumer.assignTimestampsAndWatermarks(
new AscendingTimestampExtractor<ObjectNode>() {
#Override
public long extractAscendingTimestamp(ObjectNode objectNode) {
long timestamp;
Instant instant = Instant.parse(objectNode.get("value").get("timestamp").asText());
timestamp = instant.toEpochMilli();
return timestamp;
}
});
And also i am getting Warn Message that,
AscendingTimestampExtractor:140 - Timestamp monotony violated: 1594017872227 < 1594017873133
And Also i tried using AssignerWithPeriodicWatermarks and AssignerWithPunctuatedWatermarks none of one is working
I have attached Flink console screenshot where Watermark is not assigning.
Updated flink console screenshot
Could Anyone Help?
CEP must first sort the input stream(s), which it does based on the watermarking. So
the problem could be with watermarking, but you haven't shown us enough to debug the cause. One common issue is having an idle source, which can prevent the watermarks from advancing.
But there are other possible causes. To debug the situation, I suggest you look at some metrics, either in the Flink Web UI or in a metrics system if you have one connected. To begin, check if records are flowing, by looking at numRecordsIn, numRecordsOut, or numRecordsInPerSecond and numRecordsOutPerSecond at different stages of your pipeline.
If there are events, then look at currentOutputWatermark throughout the different tasks of your job to see if event time is advancing.
Update:
It appears you may be calling assignTimestampsAndWatermarks on the Kafka consumer, which will result in per-partition watermarking. In that case, if you have an idle partition, that partition won't produce any watermarks, and that will hold back the overall watermark. Try calling assignTimestampsAndWatermarks on the DataStream produced by the source instead, to see if that fixes things. (Of course, without per-partition watermarking, you won't be able to use an AscendingTimestampExtractor, since the stream won't be in order.)

How to set a max retry value in App Engine Task Queue?

I have the following retry parameters:
<retry-parameters>
<task-retry-limit>7</task-retry-limit>
<task-age-limit>1d</task-age-limit>
<min-backoff-seconds>1</min-backoff-seconds>
<max-backoff-seconds>30</max-backoff-seconds>
</retry-parameters>
But when I check the queue, I see retries like 45. I had set the task-retry-limit to 7. So why is it going beyond that? How to set a max retry value? I am using App Engine standard with push based task queue and Java 8 env. Thanks.
private Queue fsQueue = QueueFactory.getQueue(FS_QUEUE_NAME);
// ...
Product fp = new Product();
fp.setId("someid");
// ...
TaskOptions opts = TaskOptions.Builder.withUrl("/api/task/fs/product").method(TaskOptions.Method.POST)
.payload(utils.toJSONBytes(fp), "application/json");
fsQueue.add(opts);
I think that your issue is related to the fact of using the queue.xml being deprecated. You should be using the queue.yaml instead.
You should also bear in mind that if you are using the Cloud Tasks API to manage your queue as well this might cause some collisions. In this documentation you'll find information on how to handle the most common problems.

How to restart a Flink job from code

This seems to be a fairly simple problem but after several days of research I still couldn't figure a way to gracefully cancel a Flink job and restart it from the code
As a reference, there is a similar post: Canceling Apache Flink job from the code, but it didn't tell how to get the JobManager, which has cancel() method that might help.
Someone can shed some lights on this?
I think ,easiest approach to cancel flink job through code would be to use rest api .
See : https://ci.apache.org/projects/flink/flink-docs-release-1.2/monitoring/rest_api.html#job-cancellation
Then you can define Restart Strategies in main class of your flink-code. like
final int restartAttempts = configuration.getInteger(RESTART_ATTEMPTS, 3);
final int delayBtwAttempts = configuration.getInteger(RESTART_DELAY_IN_MILLIS, 3000);
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setRestartStrategy(fixedDelayRestart(restartAttempts, delayBtwAttempts));
See : https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/restart_strategies.html
We can pass configuration to flink env like:
Configuration configuration = new Configuration();
configuration.setString(SavepointConfigOptions.SAVEPOINT_PATH, "//savepointPath");
configuration.setBoolean(SavepointConfigOptions.SAVEPOINT_IGNORE_UNCLAIMED_STATE, true);
StreamExecutionEnvironment env = StreamExecutionEnvironment.createRemoteEnvironment(
flinkProperties.getIp(),
flinkProperties.getPort(), configuration);

Neo4j store is not cleanly shut down; Recovering from inconsistent db state from interrupted batch insertion

I was importing ttl ontologies to dbpedia following the blog post http://michaelbloggs.blogspot.de/2013/05/importing-ttl-turtle-ontologies-in-neo4j.html. The post uses BatchInserters to speed up the task. It mentions
Batch insertion is not transactional. If something goes wrong and you don't shutDown() your database properly, the database becomes inconsistent.
I had to interrupt one of the batch insertion tasks as it was taking time much longer than expected which left my database in an inconsistence state. I get the following message:
db_name store is not cleanly shut down
How can I recover my database from this state? Also, for future purposes is there a way for committing after importing every file so that reverting back to the last state would be trivial. I thought of git, but I am not sure if it would help for a binary file like index.db.
There are some cases where you cannot recover from unclean shutdowns when using the batch inserter api, please note that its package name org.neo4j.unsafe.batchinsert contains the word unsafe for a reason. The intention for batch inserter is to operate as fast as possible.
If you want to guarantee a clean shutdown you should use a try finally:
BatchInserter batch = BatchInserters.inserter(<dir>);
try {
} finally {
batch.shutdown();
}
Another alternative for special cases is registering a JVM shutdown hook. See the following snippet as an example:
BatchInserter batch = BatchInserters.inserter(<dir>);
// do some operations potentially throwing exceptions
Runtime.getRuntime().addShutdownHook(new Thread() {
public void run() {
batch.shutdown();
}
});

Provide a database packaged with the .APK file or host it separately on a website?

Here is some background about my app:
I am developing an Android app that will display a random quote or verse to the user. For this I am using an SQLite database. The size of the DB would be approximately 5K to 10K records, possibly increasing to upto 1M in later versions as new quotes and verses are added. Thus the user would need to update the DB as and when newer versions are of the app or DB are released.
After reading through some forums online, there seem to be two feasible ways I could provide the DB:
1. Bundle it along with the .APK file of the app, or
2. Upload it to my app's website from where users will have to download it
I want to know which method would be better (if there is yet another approach other than these, please do let me know).
After pondering this problem for some time, I have these thoughts regarding the above approaches:
Approach 1:
Users will obtain the DB along with the app, and won't have to download it separately. Installation would thereby be easier. But, users will have to reinstall the app every time there is a new version of the DB. Also, if the DB is large, it will make the installable too cumbersome.
Approach 2:
Users will have to download the full DB from the website (although I can provide a small, sample version of the DB via Approach 1). But, the installer will be simpler and smaller in size. Also, I would be able to provide future versions of the DB easily for those who might not want newer versions of the app.
Could you please tell me from a technical and an administrative standpoint which approach would be the better one and why?
If there is a third or fourth approach better than either of these, please let me know.
Thank you!
Andruid
I built a similar app for Android which gets periodic updates with data from a government agency. It's fairly easy to build an Android compatible db off the device using perl or similar and download it to the phone from a website; and this works rather well, plus the user gets current data whenever they download the app. It's also supposed to be possible to throw the data onto the sdcard if you want to avoid using primary data storage space, which is a bigger concern for my app which has a ~6Mb database.
In order to make Android happy with the DB, I believe you have to do the following (I build my DB using perl).
$st = $db->prepare( "CREATE TABLE \"android_metadata\" (\"locale\" TEXT DEFAULT 'en_US')");
$st->execute();
$st = $db->prepare( "INSERT INTO \"android_metadata\" VALUES ('en_US')");
$st->execute();
I have an update activity which checks weather updates are available and if so presents an "update now" screen. The download process looks like this and lives in a DatabaseHelperClass.
public void downloadUpdate(final Handler handler, final UpdateActivity updateActivity) {
URL url;
try {
close();
File f = new File(getDatabasePath());
if (f.exists()) {
f.delete();
}
getReadableDatabase();
close();
url = new URL("http://yourserver.com/" + currentDbVersion + ".sqlite");
URLConnection urlconn = url.openConnection();
final int contentLength = urlconn.getContentLength();
Log.i(TAG, String.format("Download size %d", contentLength));
handler.post(new Runnable() {
public void run() {
updateActivity.setProgressMax(contentLength);
}
});
InputStream is = urlconn.getInputStream();
// Open the empty db as the output stream
OutputStream os = new FileOutputStream(f);
// transfer bytes from the inputfile to the outputfile
byte[] buffer = new byte[1024 * 1000];
int written = 0;
int length = 0;
while (written < contentLength) {
length = is.read(buffer);
os.write(buffer, 0, length);
written += length;
final int currentprogress = written;
handler.post(new Runnable() {
public void run() {
Log.i(TAG, String.format("progress %d", currentprogress));
updateActivity.setCurrentProgress(currentprogress);
}
});
}
// Close the streams
os.flush();
os.close();
is.close();
Log.i(TAG, "Download complete");
openDatabase();
} catch (Exception e) {
Log.e(TAG, "bad things", e);
}
handler.post(new Runnable() {
public void run() {
updateActivity.refreshState(true);
}
});
}
Also note that I keep a version number in the filename of the db files, and a pointer to the current one in a text file on the server.
It sounds like your app and your db are tightly bound -- that is, the db is useless without the database and the database is useless without the app, so I'd say go ahead and put them both in the same .apk.
That being said, if you expect the db to change very slowly over time, but the app to change quicker, and you don't want your users to have to download the db with each new app revision, then you might want to unbundle them. To make this work, you can do one of two things:
Install them as separate applications, but make sure they share the same userID using the sharedUserId tag in the AndroidManifest.xml file.
Install them as separate applications, and create a ContentProvider for the database. This way other apps could make use of your database as well (if that is useful).
If you are going to store the db on your website then I would recommend that you just make rpc calls to your webserver and get data that way, so the device will never have to deal with a local database. Using a cache manager to avoid multiple lookups will help as well so pages will not have to lookup data each time a page reloads. Also if you need to update the data you do not have to send out a new app every time. Using HttpClient is pretty straight forward, if you need any examples please let me know

Resources