End-to-end exactly-once sample code - flink-streaming

I'm new to Apache Flink streaming and want to implement end-to-end exactly-once. I read the following blog post and got the main ideas (I know how it works):
https://flink.apache.org/features/2018/03/01/end-to-end-exactly-once-apache-flink.html
But I couldn't find any example (sample source code). Is there any source code implementing end-to-end exactly-once using TwoPhaseCommitSinkFunction?
Thanks

FlinkKafkaProducer011 implements TwoPhaseCommitSinkFunction; its source can be found in the flink-connector-kafka-0.11 module of the Flink repository.
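The Kafka producer is a fairly involved read, so here is a minimal sketch of the skeleton you subclass. The file-based sink below is not from the Flink codebase; the class name, the temp-file scheme, and the targetDir parameter are all illustrative. It only shows the five methods TwoPhaseCommitSinkFunction asks you to implement:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.UUID;

import org.apache.flink.api.common.typeutils.base.StringSerializer;
import org.apache.flink.api.common.typeutils.base.VoidSerializer;
import org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction;

// Illustrative file sink: the transaction handle (TXN) is the path of a
// per-transaction temp file, and there is no extra context (Void).
public class TwoPhaseFileSink extends TwoPhaseCommitSinkFunction<String, String, Void> {

    private final String targetDir;  // hypothetical output directory

    public TwoPhaseFileSink(String targetDir) {
        // serializers for the transaction handle and the (unused) context
        super(StringSerializer.INSTANCE, VoidSerializer.INSTANCE);
        this.targetDir = targetDir;
    }

    @Override
    protected String beginTransaction() throws Exception {
        // one temp file per transaction; its path is the transaction handle
        return Files.createTempFile("txn-", ".tmp").toString();
    }

    @Override
    protected void invoke(String txn, String value, Context context) throws Exception {
        // buffer the record in the transaction's temp file
        // (re-opening per record is inefficient, but keeps the sketch short)
        Files.write(Paths.get(txn), (value + "\n").getBytes(), StandardOpenOption.APPEND);
    }

    @Override
    protected void preCommit(String txn) throws Exception {
        // everything is already on disk; after preCommit the data must
        // survive a failure, because commit may only happen after recovery
    }

    @Override
    protected void commit(String txn) {
        // publish the temp file atomically; guarded so that a repeated
        // commit after recovery is a no-op (commit must be idempotent)
        try {
            Path src = Paths.get(txn);
            if (Files.exists(src)) {
                Files.move(src, Paths.get(targetDir, UUID.randomUUID() + ".out"),
                        StandardCopyOption.ATOMIC_MOVE);
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    protected void abort(String txn) {
        // discard the uncommitted data
        try {
            Files.deleteIfExists(Paths.get(txn));
        } catch (IOException ignored) {
        }
    }
}

On each checkpoint Flink calls preCommit on the open transaction and begins a new one; commit is called only once the checkpoint completes, which is what gives end-to-end exactly-once together with a replayable source.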

Related

Using SourceFunction and SinkFunction in PyFlink

I am new to PyFlink. I have done the official training exercise in Java: https://github.com/apache/flink-training
However, the project I am working on must use Python as a programming language. I want to know if it is possible to write a data generator using the "SourceFunction". In older PyFlink versions this was possible, using Jython: https://nightlies.apache.org/flink/flink-docs-release-1.7/dev/stream/python.html#streaming-program-example
In newer examples the dataframe contains a finite set of data, which is never extended. I have not found any example of a data generator in PyFlink, e.g. https://github.com/apache/flink-training/blob/master/common/src/main/java/org/apache/flink/training/exercises/common/sources/TaxiRideGenerator.java
I am not sure what functionality the interfaces SourceFunction and SinkFunction provide. Can they be used somehow in Python, or can they only be used in combination with other pipelines or jar files? It looks like the methods "run()" and "cancel()" are not implemented, and thus they cannot be used, like some other classes, by overriding.
If they cannot be used in Python, are there any other ways to use them? It would be nice if someone could provide an easy example.
If it is not possible to use them, are there any other ways to write a data generator in OOP style? Take this example: https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/python/datastream_tutorial/ There the split() method is used to separate the stream. Basically, I want to do this with an extra class that just extends the stream, which is done in the Java TaxiRide example via "ctx.collect()". I am trying to avoid using Java, another framework for the pipeline, and Jython. It would be nice to get a short example code, but I appreciate any tips and advice.
I tried to subclass SourceFunction directly, but as already mentioned, I think this is completely the wrong approach; it results in an error: AttributeError: 'DataGenerator' object has no attribute '_get_object_id'
class DataGenerator(SourceFunction):

    def __init__(self):
        super().__init__(self)
        self._num_iters = 1000
        self._running = True

    def run(self, ctx):
        counter = 0
        while self._running and counter < self._num_iters:
            ctx.collect('Hello World')
            counter += 1

    def cancel(self):
        self._running = False
Solution:
After looking at some older code using the SourceFunction and SinkFunction classes, I came to a solution. There, a Kafka connector written in Java is wrapped; the Python code below can be taken as an example of how to use PyFlink's SourceFunction and SinkFunction.
I have only written an example for the SourceFunction:
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream import SourceFunction
from pyflink.java_gateway import get_gateway


class TaxiRideGenerator(SourceFunction):

    def __init__(self):
        # first get a reference to the Java class ...
        java_src_class = get_gateway().jvm.org.apache.flink.training.exercises.common.sources.TaxiRideGenerator
        # ... then instantiate the Java object and hand it to the wrapper
        java_src_obj = java_src_class()
        super(TaxiRideGenerator, self).__init__(java_src_obj)


def show(ds, env):
    # this is just a little helper to show the output of the pipeline
    ds.print()
    env.execute()


def streaming():
    # set up the flink ExecutionEnvironment
    env = StreamExecutionEnvironment.get_execution_environment()
    env.set_parallelism(1)

    taxi_src = TaxiRideGenerator()
    ds = env.add_source(taxi_src)
    show(ds, env)


if __name__ == "__main__":
    streaming()
The second line in the class __init__ was hard to find: I had expected to get an object already in the first line, but the first line only returns the Java class; the second line actually instantiates the Java object.
You have to create a jar file after building this project.
I changed into the directory until I could see the folder "org":
$ cd flink-training/flink-training/common/build/classes/java/main
flink-training/common/build/classes/java/main$ ls
org
flink-training/common/build/classes/java/main$ jar cvf flink-training.jar org/apache/flink/training/exercises/common/**/*.class
Copy the jar file to the pyflink/lib folder, normally under your python environment, e.g. flinkenv/lib/python3.8/site-packages/pyflink/lib. Then start the script.

Documentation for SupervisingRouteController

I have an MQTT route like the one below:
from("paho:mytopic?brokerUrl=tcp://0.0.0.0:1883&clientId=ipc")
    .routeId("myroute")
    .to("log:my?showAll=true&multiline=true");
It starts only if the broker is available, and after that, if it loses connectivity with the broker, it handles the outage very well and resumes.
But my concern is: how can I start the route the first time if the broker is not available?
I searched on Google and learned that SupervisingRouteController might help in this regard, but no documentation is available on how to use it.
By some trial and error I reached this point, but what more can I do, as no documentation is available?
final Main main = new Main();
main.addRouteBuilder(new MyMqttRoute());
SupervisingRouteController controller = main.getCamelContexts().get(0).getRouteController().unwrap(SupervisingRouteController.class);
main.run();
Here are two unit tests that show the usage of SupervisingRouteController.
SupervisingRouteControllerTest.java
SupervisingRouteControllerRestartTest.java
These may be helpful in understanding its usage.
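Note that those tests target the Camel 2.x API, where SupervisingRouteController was still experimental. As a rough sketch of the idea, in Camel 3.x the default route controller can be switched into supervising mode before the context is started; the exact setter names may vary between versions, and MyMqttRoute is the route builder from the question:

import org.apache.camel.CamelContext;
import org.apache.camel.impl.DefaultCamelContext;
import org.apache.camel.spi.SupervisingRouteController;

public class SupervisedMqttMain {
    public static void main(String[] args) throws Exception {
        CamelContext context = new DefaultCamelContext();
        context.addRoutes(new MyMqttRoute());

        // Switch to a supervising controller *before* starting the context,
        // so a failed first start of the route is retried in the background
        // instead of failing the whole application.
        SupervisingRouteController controller = context.getRouteController().supervising();
        controller.setInitialDelay(1000);      // wait 1s before the first attempt
        controller.setBackOffDelay(5000);      // retry every 5s
        controller.setBackOffMaxAttempts(10);  // give up after 10 failed attempts

        context.start();
        Thread.sleep(60_000);                  // keep the JVM alive for the demo
        context.stop();
    }
}

With this in place the context should start even when the broker at tcp://0.0.0.0:1883 is down, and the route should be restarted in the background until a connection succeeds.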

Spring Batch FlatFileItemWriter does not write data to a file

I am new to Spring Batch. I am trying to use FlatFileItemWriter to write data to a file. The challenge is that the application creates the file at the given path, but does not write the actual content into it.
Following are details related to code:
List<String> dataFileList : This list contains the data that I want to write to a file
FlatFileItemWriter<String> writer = new FlatFileItemWriter<>();
writer.setResource(new FileSystemResource("C:\\Desktop\\test"));
writer.open(new ExecutionContext());
writer.setLineAggregator(new PassThroughLineAggregator<>());
writer.setAppendAllowed(true);
writer.write(dataFileList);
writer.close();
This is just generating the file at proper place but contents are not getting written into the file.
Am I missing something? Help is highly appreciated.
Thanks!
This is not the proper way to use a Spring Batch writer to write data. You need to declare a bean for the writer first:
Define a Job bean
Define a Step bean
Use your writer bean in the Step
Have a look at the following examples, and at the sketch after the links:
https://github.com/pkainulainen/spring-batch-examples/blob/master/spring-boot/src/main/java/net/petrikainulainen/springbatch/csv/in/CsvFileToDatabaseJobConfig.java
https://spring.io/guides/gs/batch-processing/
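A minimal sketch of that wiring, in Spring Batch 4 style; the bean names, the ListItemReader standing in for the question's dataFileList, and the chunk size are all illustrative:

import java.util.Arrays;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.file.builder.FlatFileItemWriterBuilder;
import org.springframework.batch.item.file.transform.PassThroughLineAggregator;
import org.springframework.batch.item.support.ListItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;

@Configuration
@EnableBatchProcessing
public class FileWriteJobConfig {

    @Bean
    public ItemReader<String> reader() {
        // stand-in for the question's dataFileList
        return new ListItemReader<>(Arrays.asList("line 1", "line 2", "line 3"));
    }

    @Bean
    public FlatFileItemWriter<String> writer() {
        // the framework opens and closes the writer; no manual open()/close()
        return new FlatFileItemWriterBuilder<String>()
                .name("stringItemWriter")
                .resource(new FileSystemResource("C:\\Desktop\\test"))
                .lineAggregator(new PassThroughLineAggregator<>())
                .append(true)
                .build();
    }

    @Bean
    public Step fileWriteStep(StepBuilderFactory steps) {
        return steps.get("fileWriteStep")
                .<String, String>chunk(10)
                .reader(reader())
                .writer(writer())
                .build();
    }

    @Bean
    public Job fileWriteJob(JobBuilderFactory jobs, Step fileWriteStep) {
        return jobs.get("fileWriteJob")
                .start(fileWriteStep)
                .build();
    }
}

When the job runs, the framework itself opens the writer, writes each chunk, and closes it, which is why none of the manual open()/write()/close() calls from the question are needed.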
You probably need to force a sync to disk. From the docs at https://docs.spring.io/spring-batch/trunk/apidocs/org/springframework/batch/item/file/FlatFileItemWriter.html,
setForceSync
public void setForceSync(boolean forceSync)
Flag to indicate that changes should be force-synced to disk on flush. Defaults to false, which means that even with a local disk changes could be lost if the OS crashes in between a write and a cache flush. Setting to true may result in slower performance for usage patterns involving many frequent writes.
Parameters:
forceSync - the flag value to set
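For reference, applied to the snippet from the question this is a one-line addition; note that all configuration, including the line aggregator, should be done before open() is called:

FlatFileItemWriter<String> writer = new FlatFileItemWriter<>();
writer.setResource(new FileSystemResource("C:\\Desktop\\test"));
writer.setLineAggregator(new PassThroughLineAggregator<>());
writer.setAppendAllowed(true);
writer.setForceSync(true); // force-sync changes to disk on flush
writer.open(new ExecutionContext());
writer.write(dataFileList);
writer.close();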

SpringXD counter not working with Kafka source

I am using SpringXD 1.3 and Apache Kafka 0.9.0.0.
I have a functioning Kafka producer that I was able to configure as a Kafka source in Spring XD (I use a Groovy script to transform the message before logging it).
stream create --name metrics1 --definition "kafka --topic=metrics | transform --script=MetricsInterpreter.groovy | log" --deploy
I can see my Kafka messages getting printed in Spring XD logs. So this stream is working as intended.
However, the counter I create doesn't show up in the list of counters.
stream create --name metrics1tap1 --definition "tap:stream:metrics1 > counter --name=hitcount" --deploy
Although I get a success message (Created and deployed new stream 'metrics1tap1'), this counter does not show up when I try to list counters using "counter list" command.
I tried the TwitterSearch counter example from documentation and that worked fine.
Question: Is there a configuration/setup step that I am missing? Why would my own counter not work in this case?
(FYI both Kafka and SpringXD are running in dev/single-node mode)
Just to confirm: counter list will only display a specific counter if there is at least one value in that counter.
Are you sure that at least one message has actually been received by the counter?
Also, when you do stream list, do you see the stream metrics1tap1 in there?

EF6 (6.1.3/ net45) Logging feature does not seem to work

When I attempt to run the line:
MyDBContext.Database.Log = Console.Write
The compiler smiles and tells me I don't know what I am doing...
The app won't compile because of this line; the error on that line is:
Overload resolution failed because no accessible Write accepts this number of arguments.
That makes sense: 'Console.Write' returns nothing, and I am setting it equal to a System.Action(Of String).
This just seems kind of half baked.
I tried numerous ways to fix it, including delegates and some of the other 'new possibilities' that moving this off the Context is supposed to offer, but still no dice.
What am I missing? Is it something that was changed at the last minute?
I have two large edmx files (one connects to SQL Server and the other to Oracle) in the solution and all of that is working great.
Here are my version numbers if that can help.
EntityFramework 6.0.0.0 (folder is ...\EntityFramework.6.1.3\lib\net45\EntityFramework.dll)
EntityFramework.SqlServer 6.0.0.0 (folder is ...\EntityFramework.6.1.3\lib\net45\EntityFramework.dll)
Oracle.ManagedDataAccess.EntityFramework 6.121.2.0
I have a tool I created that lets me paste the output of the L2S 'mycontext.log' into it; it then parses the log and creates SSMS-ready SQL with variables... it has been incredibly useful. This has been one of my favorite features of L2S.
Please help me understand why this isn't working.
Thanks in advance.
This technique works for me:
public override int SaveChanges()
{
    SetIStateInfo();
#if DEBUG
    // send the SQL that EF generates to the debug output
    Database.Log = s => Debug.WriteLine(s);
#endif
    return base.SaveChanges();
}
http://blogs.msdn.com/b/mpeder/archive/2014/06/16/how-to-see-the-actual-sql-query-generated-by-entity-framework.aspx
Well, the answer was to research the Action(T) delegate, which showed me how to do it:
#If DEBUG Then
    myctx.Database.Log = AddressOf Console.Write
#End If
Just needed the AddressOf and I was back in business.
