Watermarks in a RichParallelSourceFunction - apache-flink

I am implementing a SourceFunction which reads data from a database.
The job should be resumable if stopped or crashed (i.e. via savepoints and checkpoints), with the data being processed exactly once.
What I have so far:
@SerialVersionUID(1L)
class JDBCSource(private val clientConfig: Serializable, private val waitTimeMs: Long)
  extends RichParallelSourceFunction[Event] with StoppableFunction with LazyLogging {

  @transient var client: PostGreClient = _
  @volatile var isRunning: Boolean = true

  def this(clientConfig: Serializable) = this(clientConfig, JDBCSource.DEFAULT_WAIT_TIME_MS)

  override def stop(): Unit = {
    this.isRunning = false
  }

  override def open(parameters: Configuration): Unit = {
    super.open(parameters)
    client = new PostGreClient(clientConfig)
  }

  override def run(ctx: SourceFunction.SourceContext[Event]): Unit = {
    while (isRunning) {
      val statement = client.getConnection.createStatement()
      val resultSet = statement.executeQuery("SELECT name, timestamp FROM MYTABLE")
      while (resultSet.next()) {
        val name: String = resultSet.getString("name")
        val timestamp: Long = resultSet.getLong("timestamp")
        ctx.collectWithTimestamp(new Event(name, timestamp), timestamp)
      }
      Thread.sleep(waitTimeMs) // wait before polling the table again
    }
  }

  override def cancel(): Unit = {
    isRunning = false
  }
}

object JDBCSource {
  val DEFAULT_WAIT_TIME_MS = 1000L
}
How can I make sure to only get the rows of the database which aren't processed yet?
I assumed the ctx variable would have some information about the current watermark so that I could change my query to something like:
select name, timestamp from myTable where timestamp > ctx.getCurrentWaterMark
But it doesn't have any relevant methods for that. Any ideas on how to solve this problem would be appreciated.

You have to implement CheckpointedFunction so that you can manage checkpointing yourself. The documentation of the interface is pretty comprehensive, but if you need an example, I advise you to take a look at one.
In essence, your function must implement CheckpointedFunction#snapshotState to store the state you need using Flink's managed state and then, when performing a restore, it will read that same state in CheckpointedFunction#initializeState.
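To make that concrete, here is a minimal sketch of the approach (my assumptions: the timestamp column increases monotonically, so a single last-seen value identifies the not-yet-processed rows, and the source runs with parallelism 1 so operator list state holds exactly one value; PostGreClient and Event are the classes from your question, and the class and state names are illustrative):

import org.apache.flink.api.common.state.{ListState, ListStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.runtime.state.{FunctionInitializationContext, FunctionSnapshotContext}
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction
import org.apache.flink.streaming.api.functions.source.{RichParallelSourceFunction, SourceFunction}

class CheckpointedJDBCSource(clientConfig: Serializable, waitTimeMs: Long)
  extends RichParallelSourceFunction[Event] with CheckpointedFunction {

  @volatile private var isRunning = true
  // largest timestamp emitted so far; restored from state on recovery
  @volatile private var lastTimestamp: Long = Long.MinValue
  @transient private var lastTimestampState: ListState[java.lang.Long] = _
  @transient private var client: PostGreClient = _

  override def open(parameters: Configuration): Unit = {
    super.open(parameters)
    client = new PostGreClient(clientConfig) // as in the question's open()
  }

  override def initializeState(context: FunctionInitializationContext): Unit = {
    lastTimestampState = context.getOperatorStateStore.getListState(
      new ListStateDescriptor[java.lang.Long]("lastTimestamp", classOf[java.lang.Long]))
    // on restore, resume from the checkpointed timestamp
    val it = lastTimestampState.get().iterator()
    while (it.hasNext) lastTimestamp = math.max(lastTimestamp, it.next())
  }

  override def snapshotState(context: FunctionSnapshotContext): Unit = {
    lastTimestampState.clear()
    lastTimestampState.add(lastTimestamp)
  }

  override def run(ctx: SourceFunction.SourceContext[Event]): Unit = {
    while (isRunning) {
      val stmt = client.getConnection.prepareStatement(
        "SELECT name, timestamp FROM MYTABLE WHERE timestamp > ? ORDER BY timestamp")
      stmt.setLong(1, lastTimestamp)
      val rs = stmt.executeQuery()
      while (rs.next()) {
        val name = rs.getString("name")
        val ts = rs.getLong("timestamp")
        // emit and update the tracked timestamp atomically w.r.t. checkpoints
        ctx.getCheckpointLock.synchronized {
          ctx.collectWithTimestamp(new Event(name, ts), ts)
          lastTimestamp = ts
        }
      }
      rs.close()
      stmt.close()
      Thread.sleep(waitTimeMs)
    }
  }

  override def cancel(): Unit = { isRunning = false }
}

On recovery, initializeState is called before run, so the query resumes from the checkpointed timestamp rather than re-reading everything; this gives you the "only rows not processed yet" behaviour without needing the watermark from ctx.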

Related

Need to perform a scheduled action through the Apex Scheduler

I have this code:
public static String currencyConverter() {
    allCurrency test = new allCurrency();
    HttpRequest request = new HttpRequest();
    HttpResponse response = new HttpResponse();
    Http http = new Http();
    request.setEndpoint(endPoint);
    request.setMethod('GET');
    response = http.send(request);
    String JSONresponse = response.getBody();
    currencyJSON currencyJSON = (currencyJSON)JSON.deserialize(JSONresponse, currencyJSON.class);
    test.USD = currencyJSON.rates.USD;
    test.CAD = currencyJSON.rates.CAD;
    test.EUR = currencyJSON.rates.EUR;
    test.GBP = currencyJSON.rates.GBP;
    Log__c logObject = new Log__c();
    logObject.Status_Code__c = String.valueOf(response.getStatusCode());
    logObject.Response_Body__c = response.getBody();
    insert logObject;
    if (response.getStatusCode() == 200) {
        Exchange_Rate__c ExchangeRateObject = new Exchange_Rate__c();
        ExchangeRateObject.CAD__c = currencyJSON.rates.CAD;
        ExchangeRateObject.EUR__c = currencyJSON.rates.EUR;
        ExchangeRateObject.GBP__c = currencyJSON.rates.GBP;
        ExchangeRateObject.USD__c = currencyJSON.rates.USD;
        ExchangeRateObject.Log__c = logObject.Id;
        insert ExchangeRateObject;
    }
    return JSON.serialize(test);
}
Here I am getting different currencies and then using them in an LWC, and I also create two records with values.
I want these records to be created every day at 12 PM.
Tell me how to implement this through the Apex Scheduler.
You need a class that implements Schedulable. It can be this class (just add that to the header and add an execute method), or it can be a separate class; it doesn't matter much.
https://developer.salesforce.com/docs/atlas.en-us.apexcode.meta/apexcode/apex_scheduler.htm
Something like
public with sharing class MyClass implements Schedulable {
    public static String currencyConverter() { ... your method here }

    public void execute(SchedulableContext sc) {
        currencyConverter();
    }
}
Once your class saves OK you should be able to schedule it from the UI (Setup -> Apex Classes -> Schedule Apex button) or with a System.schedule one-liner from the Developer Console, for example.
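Such a one-liner could look like this (the job name is arbitrary; the cron expression format is Seconds Minutes Hours Day_of_month Month Day_of_week, so this fires daily at 12 PM):

// runs currencyConverter() every day at 12:00 PM
System.schedule('Daily currency sync', '0 0 12 * * ?', new MyClass());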
An alternative would be a scheduled Flow - but that's a bit more work, since you'd have to make your Apex callable from Flow.
P.S. Don't name your variables test. Eventually some time later you may need "real" stuff like Test.isRunningTest() and that will be "fun". It's like writing
Integer Account = 5; // happy debugging suckers

Room Database Create Instance

I want to create an instance of a Room database in a Composable, but
val db = Room.databaseBuilder(applicationContext, UserDatabase::class.java, "users.db").build()
is not working here; I am not getting applicationContext.
How do I get hold of a context in a Composable?
Have you tried getting the context with val context = LocalContext.current and then using that to get your applicationContext?
Like this: context.applicationContext, or simply val db = Room.databaseBuilder(context, UserDatabase::class.java, "users.db").build()
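A minimal sketch of how that could look inside a Composable (UserDatabase and users.db are from the question; wrapping the builder in remember is my assumption, to avoid rebuilding the database on every recomposition):

import androidx.compose.runtime.Composable
import androidx.compose.runtime.remember
import androidx.compose.ui.platform.LocalContext
import androidx.room.Room

@Composable
fun MyScreen() {
    val context = LocalContext.current
    // remember ensures the database is built once, not on every recomposition
    val db = remember {
        Room.databaseBuilder(
            context.applicationContext,
            UserDatabase::class.java,
            "users.db"
        ).build()
    }
    // ... use db here
}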
Room (and the underlying SQLiteOpenHelper) only needs the context to open the database (or, more correctly, to instantiate the underlying SQLiteOpenHelper).
Room/Android SQLiteOpenHelper uses the context to ascertain the Application's standard (recommended) location (data/data/<the_package_name>/databases). E.g. in the following demo (via Device Explorer):-
The database, as it is still open, comprises 3 files (the -wal and -shm files are the Write-Ahead Logging files that will at some time be committed/written to the actual database; SQLite handles that).
So, roughly speaking, Room only needs the context so that it can ascertain /data/data/a.a.so75008030kotlinroomgetinstancewithoutcontext/databases/testit.db (in the case of the demo).
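(Incidentally, from any Context, such as an Activity, you can confirm that location via getDatabasePath, e.g. for the demo's testit.db:)

Log.d("DBINFO_PATH", getDatabasePath("testit.db").absolutePath)
// -> /data/data/<the_package_name>/databases/testit.db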
So if you cannot use the applicationContext method, you can circumvent the need to provide the context, if you use a singleton approach AND if the singleton has already been instantiated.
Perhaps consider this demo:-
First some pretty basic DB stuff (a table (@Entity annotated class), DAO functions and the @Database annotated abstract class WITH singleton approach), BUT with some additional functions for accessing the instance without the context.
@Entity
data class TestIt(
    @PrimaryKey
    val testIt_id: Long? = null,
    val testIt_name: String
)

@Dao
interface DAOs {
    @Insert(onConflict = OnConflictStrategy.IGNORE)
    fun insert(testIt: TestIt): Long

    @Query("SELECT * FROM testit")
    fun getAllTestItRows(): List<TestIt>
}

@Database(entities = [TestIt::class], exportSchema = false, version = 1)
abstract class TestItDatabase : RoomDatabase() {
    abstract fun getDAOs(): DAOs

    companion object {
        private var instance: TestItDatabase? = null

        /* Extra/not typical for without a context (if wanted) */
        fun isInstanceWithoutContextAvailable(): Boolean {
            return instance != null
        }

        /******************************************************/
        /* Extra/not typical for without a context            */
        /******************************************************/
        fun getInstanceWithoutContext(): TestItDatabase? {
            if (instance != null) {
                return instance as TestItDatabase
            }
            return null
        }

        /* Typically the only function */
        fun getInstance(context: Context): TestItDatabase {
            if (instance == null) {
                instance = Room.databaseBuilder(context, TestItDatabase::class.java, "testit.db")
                    .allowMainThreadQueries() /* for convenience/brevity of demo */
                    .build()
            }
            return instance as TestItDatabase
        }
    }
}
And to demonstrate (within an activity for brevity) :-
class MainActivity : AppCompatActivity() {
    lateinit var roomInstance: TestItDatabase
    lateinit var dao: DAOs

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)
        roomInstance = TestItDatabase.getInstance(this) /* MUST be used before withoutContext functions but could be elsewhere, shown here for brevity */
        dao = roomInstance.getDAOs()
        //dao.insert(TestIt(testIt_name = "New001")) /* Removed to test actually doing the database open with the without context */
        logDataWithoutContext()
        addRowWithoutContext()
        addRowWithApplicationContext()
        logDataWithoutContext()
    }

    private fun logDataWithoutContext() {
        Log.d("${TAG}_LDWC", "Room DB Instantiated = ${TestItDatabase.isInstanceWithoutContextAvailable()}")
        for (t in TestItDatabase.getInstanceWithoutContext()!!.getDAOs().getAllTestItRows()) {
            Log.d("${TAG}_LDWC_DATA", "TestIt Name is ${t.testIt_name} ID is ${t.testIt_id}")
        }
    }

    private fun addRowWithoutContext() {
        Log.d("${TAG}_LDWC", "Room DB Instantiated = ${TestItDatabase.isInstanceWithoutContextAvailable()}")
        if (TestItDatabase.getInstanceWithoutContext()!!.getDAOs()
                .insert(TestIt(System.currentTimeMillis(), "NEW AS PER ID (the time to millis) WITHOUT CONTEXT")) > 0) {
            Log.d("${TAG}_ARWC_OK", "Row successfully inserted.")
        } else {
            Log.d("${TAG}_ARWC_OUCH", "Row was not successfully inserted (duplicate ID)")
        }
    }

    private fun addRowWithApplicationContext() {
        TestItDatabase.getInstance(applicationContext).getDAOs().insert(TestIt(System.currentTimeMillis() / 1000, "NEW AS PER ID (the time to seconds) WITH CONTEXT"))
    }

    companion object {
        const val TAG = "DBINFO" /* matches the log output below */
    }
}
The result output to the log showing that the database access, either way, worked:-
2023-01-05 12:45:39.020 D/DBINFO_LDWC: Room DB Instantiated = true
2023-01-05 12:45:39.074 D/DBINFO_LDWC: Room DB Instantiated = true
2023-01-05 12:45:39.077 D/DBINFO_ARWC_OK: Row successfully inserted.
2023-01-05 12:45:39.096 D/DBINFO_LDWC: Room DB Instantiated = true
2023-01-05 12:45:39.098 D/DBINFO_LDWC_DATA: TestIt Name is NEW AS PER ID (the time to seconds) WITH CONTEXT ID is 1672883139
2023-01-05 12:45:39.098 D/DBINFO_LDWC_DATA: TestIt Name is NEW AS PER ID (the time to millis) WITHOUT CONTEXT ID is 1672883139075
Note that the row with the shorter ID was added last but appears first, because it comes earlier in the index that the SQLite Query Optimiser would have used (aka the Primary Key).
Both rows represent basically the same date/time to the second, but the first insert included milliseconds whilst the insert via addRowWithApplicationContext drops the milliseconds.

Kotlin room database boolean

I'm trying to create an app where a user has a list of step goals they create, and can then choose one of them to be active and follow. The database worked when I was just using the goal id, name, and steps, but now I realise I need another column defining when a goal is active, so I'm trying to insert that. However, I don't know how I should handle the boolean, especially in the repository and ViewModel. I'd appreciate any help. Thanks in advance.
Here's my code:
@Dao
interface Dao {
    @Insert(onConflict = OnConflictStrategy.IGNORE)
    suspend fun insert(goal: Goal)

    @Update
    suspend fun updateGoal(goal: Goal)

    @Query("SELECT * FROM user_goal_table ORDER BY goalId")
    fun getAll(): LiveData<List<Goal>>

    @Query("SELECT * FROM user_goal_table WHERE goalId = :key")
    suspend fun getGoal(key: Int): Goal

    @Delete
    suspend fun delete(goal: Goal)

    @Query("SELECT * FROM user_goal_table WHERE goal_is_active = 1 ORDER BY goalId")
    suspend fun makeGoalActive(key: Int): Goal
}

class Repository(private val dao: Dao) {
    val allGoals: LiveData<List<Goal>> = dao.getAll()

    suspend fun insert(goal: Goal) {
        dao.insert(goal)
    }

    suspend fun update(goal: Goal) {
        dao.updateGoal(goal)
    }

    suspend fun delete(goal: Goal) {
        dao.delete(goal)
    }

    suspend fun active(goal: Goal, int: Int) {
        dao.makeGoalActive(int)
    }
}

class ViewModel(application: Application) : AndroidViewModel(application) {
    val allGoals: LiveData<List<Goal>>
    private val repository: Repository

    init {
        val dao = GoalDatabase.getInstance(application).getGoalDao()
        repository = Repository(dao)
        allGoals = repository.allGoals
    }

    fun insert(goal: Goal) = viewModelScope.launch(Dispatchers.IO) {
        repository.insert(goal)
    }

    fun update(goal: Goal) = viewModelScope.launch(Dispatchers.IO) {
        repository.update(goal)
    }

    fun delete(goal: Goal) = viewModelScope.launch(Dispatchers.IO) {
        repository.delete(goal)
    }
}
Just add a new property like isActive: Boolean to your Goal class, and then use the @Update annotation in Room (which you've already implemented in the updateGoal(goal: Goal) method of your Dao), or an UPDATE command itself in SQLite, to update the row whose isActive state you want to change. For the SQLite route, do something like below:
#Query("UPDATE user_goal_table SET isActive = 1 WHERE goalId = :goalId")
suspend fun makeGoalActive(goalId: Int)
For boolean properties, use 1 for true and 0 for false in SQLite commands.
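For reference, the Goal entity could then look something like this (a sketch; the property names and defaults here are illustrative and should match your existing schema). Room maps a Kotlin Boolean to an INTEGER column holding 0 or 1:

@Entity(tableName = "user_goal_table")
data class Goal(
    @PrimaryKey(autoGenerate = true)
    val goalId: Int = 0,
    val name: String,
    val steps: Int,
    val isActive: Boolean = false // persisted as INTEGER 0/1 by Room
)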
In the Repository, this method is enough:
suspend fun active(goalId: Int) {
    dao.makeGoalActive(goalId)
}
And in the ViewModel:
fun insert(goal: Goal) = viewModelScope.launch {
    repository.insert(goal)
}
Btw, you don't need to specify the IO dispatcher for Room methods; Room uses its own dispatcher to run queries.

is it possible to make different keys have independent watermark

I am using Flink 1.12 and I have a keyed stream. In my code it looks like both A and B share the same watermark, and therefore B is determined to be late because A's arrival has advanced the watermark to 2020-08-30 10:50:11.
The output is A(2020-08-30 10:50:08, 2020-08-30 10:50:16):2020-08-30 10:50:15; there is no output for B.
I would like to ask whether it is possible to make different keys have independent watermarks, so that A's watermark and B's watermark change independently.
The application code is:
import java.text.SimpleDateFormat
import java.util.Date
import java.util.concurrent.TimeUnit

import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.AssignerWithPunctuatedWatermarks
import org.apache.flink.streaming.api.scala.function.WindowFunction
import org.apache.flink.streaming.api.scala.{StreamExecutionEnvironment, _}
import org.apache.flink.streaming.api.watermark.Watermark
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

object DemoDiscardLateEvent4_KeyStream {

  def to_milli(str: String) =
    new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").parse(str).getTime

  def to_char(milli: Long) = {
    val date = if (milli <= 0) new Date(0) else new Date(milli)
    new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(date)
  }

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

    val data = Seq(
      ("A", "2020-08-30 10:50:15"),
      ("B", "2020-08-30 10:50:07")
    )

    env.fromCollection(data).setParallelism(1)
      .assignTimestampsAndWatermarks(new AssignerWithPunctuatedWatermarks[(String, String)]() {
        var maxSeen = Long.MinValue

        override def checkAndGetNextWatermark(lastElement: (String, String), extractedTimestamp: Long): Watermark = {
          val eventTime = to_milli(lastElement._2)
          if (eventTime > maxSeen) {
            maxSeen = eventTime
          }
          // Allow 4 seconds of lateness
          new Watermark(maxSeen - 4000)
        }

        override def extractTimestamp(element: (String, String), previousElementTimestamp: Long): Long = to_milli(element._2)
      })
      .keyBy(_._1)
      .window(TumblingEventTimeWindows.of(Time.of(8, TimeUnit.SECONDS)))
      .apply(new WindowFunction[(String, String), String, String, TimeWindow] {
        override def apply(key: String, window: TimeWindow, input: Iterable[(String, String)], out: Collector[String]): Unit = {
          val start = to_char(window.getStart)
          val end = to_char(window.getEnd)
          val sb = new StringBuilder
          // the start and end of the window
          sb.append(s"$key($start, $end):")
          // the content of the window
          input.foreach {
            e => sb.append(e._2 + ",")
          }
          out.collect(sb.toString().substring(0, sb.length - 1))
        }
      })
      .print()

    env.execute()
  }
}
While it would sometimes be helpful if Flink offered per-key watermarking, it does not.
Each parallel instance of your WatermarkStrategy (or in this case, of your AssignerWithPunctuatedWatermarks) is generating watermarks independently, based on the timestamps of the events it observes (regardless of their keys).
One way to work around the lack of this feature is to not use watermarks at all. For example, if you would otherwise use per-key watermarks to trigger keyed event-time windows, you can instead implement your own windows with a KeyedProcessFunction: rather than using watermarks to trigger event-time timers, keep track of the largest timestamp seen so far for each key, and whenever that value is updated, determine whether you now want to close one or more windows for that key.
See one of the Flink training lessons for an example of how to implement keyed tumbling windows with a KeyedProcessFunction. This example depends on watermarks but should help you get started.
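To make the idea concrete, here is a condensed, illustrative sketch (not the training example itself): it counts events per tumbling window, assumes timestamps have already been extracted as epoch millis, and treats each key's own maximum timestamp as that key's private "watermark":

import org.apache.flink.api.common.state.{MapState, MapStateDescriptor, ValueState, ValueStateDescriptor}
import org.apache.flink.streaming.api.functions.KeyedProcessFunction
import org.apache.flink.util.Collector

class PerKeyTumblingWindows(windowSizeMs: Long, allowedLatenessMs: Long)
  extends KeyedProcessFunction[String, (String, Long), String] {

  // the largest timestamp seen so far for the current key
  private lazy val maxSeen: ValueState[java.lang.Long] = getRuntimeContext.getState(
    new ValueStateDescriptor[java.lang.Long]("maxSeen", classOf[java.lang.Long]))

  // window start -> event count for the current key
  private lazy val counts: MapState[java.lang.Long, java.lang.Long] = getRuntimeContext.getMapState(
    new MapStateDescriptor[java.lang.Long, java.lang.Long]("counts", classOf[java.lang.Long], classOf[java.lang.Long]))

  override def processElement(
      value: (String, Long),
      ctx: KeyedProcessFunction[String, (String, Long), String]#Context,
      out: Collector[String]): Unit = {
    val ts = value._2
    val windowStart = ts - (ts % windowSizeMs)
    val current = counts.get(windowStart)
    counts.put(windowStart, if (current == null) 1L else current + 1L)

    val newMax = if (maxSeen.value() == null) ts else math.max(maxSeen.value(), ts)
    maxSeen.update(newMax)

    // close every window of this key whose end (+ lateness) this key's own max has passed
    val it = counts.keys().iterator()
    val closed = scala.collection.mutable.ArrayBuffer[Long]()
    while (it.hasNext) {
      val start: Long = it.next()
      if (start + windowSizeMs + allowedLatenessMs <= newMax) closed += start
    }
    for (start <- closed) {
      out.collect(s"${ctx.getCurrentKey}: window [$start, ${start + windowSizeMs}) count=${counts.get(start)}")
      counts.remove(start)
    }
  }
}

Because the decision to close a window only looks at maxSeen for the current key, a quiet key never has its windows forced shut by a busy one, which is exactly the per-key behaviour that the built-in watermarks cannot give you.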

Flink interval join does not output

We have a Flink job that interval-joins two streams; both streams consume events from Kafka. Here is the example code:
val articleEventStream: DataStream[ArticleEvent] = env.addSource(articleEventSource)
  .assignTimestampsAndWatermarks(new ArticleEventAssigner)

val feedbackEventStream: DataStream[FeedbackEvent] = env.addSource(feedbackEventSource)
  .assignTimestampsAndWatermarks(new FeedbackEventAssigner)

articleEventStream
  .keyBy(article => article.id)
  .intervalJoin(feedbackEventStream.keyBy(feedback => feedback.article.id))
  .between(Time.seconds(-5), Time.seconds(10))
  .process(new ProcessJoinFunction[ArticleEvent, FeedbackEvent, String] {
    override def processElement(left: ArticleEvent, right: FeedbackEvent, ctx: ProcessJoinFunction[ArticleEvent, FeedbackEvent, String]#Context, out: Collector[String]): Unit = {
      out.collect(left.name + " got feedback: " + right.feedback)
    }
  })

class ArticleEventAssigner extends AssignerWithPunctuatedWatermarks[ArticleEvent] {
  val bound: Long = 5 * 1000

  override def checkAndGetNextWatermark(lastElement: ArticleEvent, extractedTimestamp: Long): Watermark = {
    new Watermark(extractedTimestamp - bound)
  }

  override def extractTimestamp(element: ArticleEvent, previousElementTimestamp: Long): Long = {
    element.occurredAt
  }
}

class FeedbackEventAssigner extends AssignerWithPunctuatedWatermarks[FeedbackEvent] {
  val bound: Long = 5 * 1000

  override def checkAndGetNextWatermark(lastElement: FeedbackEvent, extractedTimestamp: Long): Watermark = {
    new Watermark(extractedTimestamp - bound)
  }

  override def extractTimestamp(element: FeedbackEvent, previousElementTimestamp: Long): Long = {
    element.occurredAt
  }
}
However, we do not see any joined output. We checked that each stream does continuously emit elements with timestamps and proper watermarks. Does anyone have a hint as to possible reasons?
After checking different parts (timestamps/watermarks, triggers), I noticed that I had made a mistake: the interval I used,
between(Time.seconds(-5), Time.seconds(10))
is just too small, so no elements from both streams fell into it to be joined. This might sound obvious, but since I am new to Flink, I did not know where to check.
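Widening the relative interval fixed it; for example (the bounds below are arbitrary, pick ones that reflect how far apart related events can actually occur in your data):

articleEventStream
  .keyBy(article => article.id)
  .intervalJoin(feedbackEventStream.keyBy(feedback => feedback.article.id))
  .between(Time.minutes(-5), Time.minutes(10)) // wider bounds than a few seconds
  .process(new ProcessJoinFunction[ArticleEvent, FeedbackEvent, String] {
    override def processElement(left: ArticleEvent, right: FeedbackEvent,
        ctx: ProcessJoinFunction[ArticleEvent, FeedbackEvent, String]#Context,
        out: Collector[String]): Unit =
      out.collect(left.name + " got feedback: " + right.feedback)
  })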
So, my lesson is that if the join does not produce output, it could be necessary to check the size of the join interval.
And thanks all for the comments!