The begin and end time of windows - flink-streaming

What would be the way to show the begin and end time of windows? Something like implementing user-defined windows?
I would like to know the time that windows begin and are evaluated, such that the output is:
quantity(WindowAll Sum), window_start_time, window_end_time
12, 1:13:21, 1:13:41
6, 1:13:41, 1:15:01

Found the answer: TimeWindow.class has getStart() and getEnd().
Example usage:
public static class SumAllWindow implements AllWindowFunction<Tuple2<String,Integer>,
        Tuple3<Integer, String, String>, TimeWindow> {

    private static transient DateTimeFormatter timeFormatter =
        DateTimeFormat.forPattern("yyyy-MM-dd'T'HH:mm:ss.SS").withLocale(Locale.GERMAN)
            .withZone(DateTimeZone.forID("Europe/Berlin"));

    @Override
    public void apply (TimeWindow window, Iterable<Tuple2<String, Integer>> values,
                       Collector<Tuple3<Integer, String, String>> out) throws Exception {
        DateTime startTs = new DateTime(window.getStart(), DateTimeZone.forID("Europe/Berlin"));
        DateTime endTs = new DateTime(window.getEnd(), DateTimeZone.forID("Europe/Berlin"));
        int sum = 0;
        for (Tuple2<String, Integer> value : values) {
            sum += value.f1;
        }
        out.collect(new Tuple3<>(sum, startTs.toString(timeFormatter), endTs.toString(timeFormatter)));
    }
}
In main():
msgStream.timeWindowAll(Time.of(6, TimeUnit.SECONDS)).apply(new SumAllWindow()).print();

Related

Flink Watermarks on Event Time

I'm trying to understand watermarks with event time.
My code is similar to the WordCount example from the Flink documentation.
I made some changes to include a timestamp in the event and added watermarks.
The event format is: word;timestamp
The map function creates a Tuple3 with word;1;timestamp.
Then a watermark strategy is assigned, with a timestamp assigner equal to the event timestamp field.
For the following stream of events:
test;1662128808294
test;1662128818065
test;1662128822434
test;1662128826434
test;1662128831175
test;1662128836581
I got the following result: (test,6) => This is correct, I sent the word test 6 times.
But looking at the context in the ProcessFunction, I see the following:
Processing Time: Fri Sep 02 15:27:20 WEST 2022
Watermark: Fri Sep 02 15:26:56 WEST 2022
Start Window: 2022 09 02 15:26:40 End Window: 2022 09 02 15:27:20
The window is correct, it's a 40 second window as defined, and the watermark is also correct: it's 20 seconds less than the last event timestamp (1662128836581 = Friday, September 2, 2022 3:27:16), as defined in the watermark strategy.
My question is about the window processing time. The window fired exactly at the end-of-window processing time, but shouldn't it wait until the watermark passes the end of the window (something like processing time = end of window + 20 seconds) (Window Default Trigger Docs)?
What am I doing wrong? Or do I have a bad understanding of watermarks?
My Code:
public class DataStreamJob {
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(1);
WatermarkStrategy<Tuple3<String, Integer, Long>> strategy = WatermarkStrategy
.<Tuple3<String, Integer, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(20))
.withTimestampAssigner((event, timestamp) -> event.f2);
DataStream<Tuple2<String, Integer>> dataStream = env
.socketTextStream("localhost", 9999)
.map(new Splitter())
.assignTimestampsAndWatermarks(strategy)
.keyBy(value -> value.f0)
.window(TumblingProcessingTimeWindows.of(Time.seconds(40)))
.process(new MyProcessWindowFunction());
dataStream.print();
env.execute("Window WordCount");
}
public static class Splitter extends RichMapFunction<String, Tuple3<String, Integer, Long>> {
@Override
public Tuple3<String, Integer, Long> map(String value) throws Exception {
String[] word = value.split(";");
return new Tuple3<String, Integer, Long>(word[0], 1, Long.parseLong(word[1]));
}
}
public static class MyProcessWindowFunction extends ProcessWindowFunction<Tuple3<String, Integer, Long>, Tuple2<String, Integer>, String, TimeWindow> {
@Override
public void process(String s, ProcessWindowFunction<Tuple3<String, Integer, Long>, Tuple2<String, Integer>, String, TimeWindow>.Context context, Iterable<Tuple3<String, Integer, Long>> elements, Collector<Tuple2<String, Integer>> out) throws Exception {
Integer sum = 0;
for (Tuple3<String, Integer, Long> in : elements) {
sum++;
}
out.collect(new Tuple2<String, Integer>(s, sum));
Date date = new Date(context.window().getStart());
Date date2 = new Date(context.window().getEnd());
Date watermark = new Date(context.currentWatermark());
Date processingTime = new Date(context.currentProcessingTime());
System.out.println(context.currentWatermark());
System.out.println("Processing Time: " + processingTime);
Format format = new SimpleDateFormat("yyyy MM dd HH:mm:ss");
System.out.println("Watermark: " + watermark);
System.out.println("Start Window: " + format.format(date) + " End Window: " + format.format(date2));
}
}
}
Thanks.
To get event time windows, you need to change
.window(TumblingProcessingTimeWindows.of(Time.seconds(40)))
to
.window(TumblingEventTimeWindows.of(Time.seconds(40)))
A processing-time window fires when the wall clock reaches the end of the window, regardless of watermarks; only event-time windows wait for the watermark to pass the end of the window.
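For reference, a sketch of the relevant part of the pipeline with only that change applied (everything else from the question stays as it is):
DataStream<Tuple2<String, Integer>> dataStream = env
        .socketTextStream("localhost", 9999)
        .map(new Splitter())
        .assignTimestampsAndWatermarks(strategy)
        .keyBy(value -> value.f0)
        // event-time window: only fires once the watermark passes the window end
        .window(TumblingEventTimeWindows.of(Time.seconds(40)))
        .process(new MyProcessWindowFunction());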

How to get comma separated value using JPA in Database

I'm currently having a problem getting my data from the database, because the fileIdx column holds a comma-separated value.
The problem is:
Caused by: java.sql.SQLException: Out of range value for column 'fileidx7_4_' : value 322,323
at org.mariadb.jdbc.internal.com.read.resultset.rowprotocol.TextRowProtocol.getInternalLong(TextRowProtocol.java:349)
at org.mariadb.jdbc.internal.com.read.resultset.SelectResultSet.getLong(SelectResultSet.java:1001)
at org.mariadb.jdbc.internal.com.read.resultset.SelectResultSet.getLong(SelectResultSet.java:995)
at com.zaxxer.hikari.pool.HikariProxyResultSet.getLong(HikariProxyResultSet.java)
at org.hibernate.type.descriptor.sql.BigIntTypeDescriptor$2.doExtract(BigIntTypeDescriptor.java:63)
at org.hibernate.type.descriptor.sql.BasicExtractor.extract(BasicExtractor.java:47)
at org.hibernate.type.AbstractStandardBasicType.nullSafeGet(AbstractStandardBasicType.java:257)
at org.hibernate.type.AbstractStandardBasicType.nullSafeGet(AbstractStandardBasicType.java:253)
at org.hibernate.type.AbstractStandardBasicType.nullSafeGet(AbstractStandardBasicType.java:243)
at org.hibernate.type.AbstractStandardBasicType.hydrate(AbstractStandardBasicType.java:329)
at org.hibernate.type.ManyToOneType.hydrate(ManyToOneType.java:184)
at org.hibernate.persister.entity.AbstractEntityPersister.hydrate(AbstractEntityPersister.java:3088)
at org.hibernate.loader.Loader.loadFromResultSet(Loader.java:1907)
at org.hibernate.loader.Loader.hydrateEntityState(Loader.java:1835)
at org.hibernate.loader.Loader.instanceNotYetLoaded(Loader.java:1808)
at org.hibernate.loader.Loader.getRow(Loader.java:1660)
at org.hibernate.loader.Loader.getRowFromResultSet(Loader.java:745)
at org.hibernate.loader.Loader.getRowsFromResultSet(Loader.java:1044)
at org.hibernate.loader.Loader.processResultSet(Loader.java:995)
at org.hibernate.loader.Loader.doQuery(Loader.java:964)
at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:350)
at org.hibernate.loader.Loader.doList(Loader.java:2887)
... 108 more
This is my entity:
@Entity
public class Board {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long idx;
private Long code;
private Long adminIdx;
private String write;
private String title;
private String content;
private String fileIdx;
private String deleted;
#Column(name = "sDate")
private Long sdate;
#Column(name = "eDate")
private Long edate;
private Long regdate;
This is my JpaRepository method using JPQL:
@Query(value = "select b from Board b")
List<Board> getTest();
This is my test code:
@Test
public void test(){
List<Board> shopBoardEntity = boardRepository.getTest();
shopBoardEntity.forEach(o -> {
System.out.println("dddd : " + o.getTitle());
});
}
No matter how hard I try, I can't figure it out.
Any ideas?

Flink DataSet Tuple Values not coming as expected

I have a DataSet<Tuple3<String,String,Double>> values which has the following data:
<Vijaya,Chocolate,5>
<Vijaya,Chips,10>
<Rahul,Chocolate,2>
<Rahul,Chips,8>
I want the DataSet<Tuple5<String,String,Double,String,Double>> values1 as follows:
<Vijaya,Chocolate,5,Chips,10>
<Rahul,Chocolate,2,Chips,8>
My code looks like the following:
DataSet<Tuple5<String, String, Double, String, Double>> values1 = values.fullOuterJoin(values)
.where(0)
.equalTo(0)
.with(
new JoinFunction<Tuple3<String, String, Double>, Tuple3<String, String, Double>, Tuple5<String, String, Double, String, Double>>() {
private static final long serialVersionUID = 1L;
public Tuple5<String, String, Double, String, Double> join(Tuple3<String, String, Double> first, Tuple3<String, String, Double> second) {
return new Tuple5<String, String, Double, String, Double>(first.f0, first.f1, first.f2, second.f1, second.f2);
}
})
.distinct(1, 3)
.distinct(1);
In the above code I tried doing a self join. I want the output in that particular format, but I am unable to get it.
How can I do this?
Please help.
Since you don't want the output to have the same item repeated, you can use a flat-join, in which you output only those records whose value in the 2nd position is not equal to the value in the 4th position. Also, if you only want "Chocolate" in the 2nd position, that can be checked inside the FlatJoinFunction as well. Please find below the link to Flink's documentation about the flat-join; a small sketch follows the link.
https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/batch/dataset_transformations.html#join-with-flat-join-function
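For illustration, a minimal sketch of such a FlatJoinFunction, reusing the values DataSet and tuple types from the question (the "Chocolate" check is just an assumption about which item should come first):
values.join(values)
    .where(0)
    .equalTo(0)
    .with(new FlatJoinFunction<Tuple3<String, String, Double>, Tuple3<String, String, Double>,
            Tuple5<String, String, Double, String, Double>>() {
        @Override
        public void join(Tuple3<String, String, Double> first, Tuple3<String, String, Double> second,
                Collector<Tuple5<String, String, Double, String, Double>> out) {
            // emit only when the items differ and the first item is "Chocolate",
            // so each pair appears once and in a fixed order
            if (first.f1.equals("Chocolate") && !first.f1.equals(second.f1)) {
                out.collect(new Tuple5<>(first.f0, first.f1, first.f2, second.f1, second.f2));
            }
        }
    });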
Approach using GroupReduceFunction:
values
.groupBy(0)
.reduceGroup(new GroupReduceFunction<Tuple3<String,String,Double>, Tuple2<String, String>>() {
@Override
public void reduce(Iterable<Tuple3<String,String,Double>> in, Collector<Tuple2<String, String>> out) {
StringBuilder output = new StringBuilder();
String name = null;
for (Tuple3<String,String,Double> item : in) {
name = item.f0;
output.append(item.f1+","+item.f2+",");
}
out.collect(new Tuple2<String, String>(name,output.toString()));
}
});

How to avoid repeated tuples in Flink slide window join?

For example, there are two streams. One is advertisements shown to users; a tuple in it could be described as (advertiseId, showed timestamp). The other one is the click stream: (advertiseId, clicked timestamp). We want to get a joined stream which includes all the advertisements that are clicked by a user within 20 minutes after being shown. My solution is to join these two streams on a SlidingTimeWindow, but in the joined stream there are many repeated tuples. How can I get each joined tuple only once in the new stream?
stream1.join(stream2)
.where(0)
.equalTo(0)
.window(SlidingTimeWindows.of(Time.of(30, TimeUnit.MINUTES), Time.of(10, TimeUnit.SECONDS)))
Solution 1:
Let Flink support joining two streams on separate windows, like Spark Streaming. In this case, implement SlidingTimeWindows(21 mins, 1 min) on the advertisement stream and TumblingTimeWindows(1 min) on the click stream, then join these two windowed streams.
TumblingTimeWindows avoids duplicate records in the joined stream.
The 21-minute SlidingTimeWindows avoids missing legitimate clicks.
One issue is that there would be some illegitimate clicks (clicks after 20 minutes) in the joined stream. This can be fixed easily by adding a filter.
MultiWindowsJoinedStreams<Tuple2<String, Long>, Tuple2<String, Long>> joinedStreams =
new MultiWindowsJoinedStreams<>(advertisement, click);
DataStream<Tuple3<String, Long, Long>> joinedStream = joinedStreams.where(keySelector)
.window(SlidingTimeWindows.of(Time.of(21, TimeUnit.SECONDS), Time.of(1, TimeUnit.SECONDS)))
.equalTo(keySelector)
.window(TumblingTimeWindows.of(Time.of(1, TimeUnit.SECONDS)))
.apply(new JoinFunction<Tuple2<String, Long>, Tuple2<String, Long>, Tuple3<String, Long, Long>>() {
private static final long serialVersionUID = -3625150954096822268L;
@Override
public Tuple3<String, Long, Long> join(Tuple2<String, Long> first, Tuple2<String, Long> second) throws Exception {
return new Tuple3<>(first.f0, first.f1, second.f1);
}
});
joinedStream = joinedStream.filter(new FilterFunction<Tuple3<String, Long, Long>>() {
private static final long serialVersionUID = -4325256210808325338L;
@Override
public boolean filter(Tuple3<String, Long, Long> value) throws Exception {
return value.f1<value.f2&&value.f1+20000>=value.f2;
}
});
Solution 2:
Flink supports a join operation without windows. A join operator implementing the TwoInputStreamOperator interface keeps two time-length-based buffers of the two streams and outputs one joined stream.
DataStream<Tuple2<String, Long>> advertisement = env
.addSource(new FlinkKafkaConsumer082<String>("advertisement", new SimpleStringSchema(), properties))
.map(new MapFunction<String, Tuple2<String, Long>>() {
private static final long serialVersionUID = -6564495005753073342L;
@Override
public Tuple2<String, Long> map(String value) throws Exception {
String[] splits = value.split(" ");
return new Tuple2<String, Long>(splits[0], Long.parseLong(splits[1]));
}
}).keyBy(keySelector).assignTimestamps(timestampExtractor1);
DataStream<Tuple2<String, Long>> click = env
.addSource(new FlinkKafkaConsumer082<String>("click", new SimpleStringSchema(), properties))
.map(new MapFunction<String, Tuple2<String, Long>>() {
private static final long serialVersionUID = -6564495005753073342L;
@Override
public Tuple2<String, Long> map(String value) throws Exception {
String[] splits = value.split(" ");
return new Tuple2<String, Long>(splits[0], Long.parseLong(splits[1]));
}
}).keyBy(keySelector).assignTimestamps(timestampExtractor2);
NoWindowJoinedStreams<Tuple2<String, Long>, Tuple2<String, Long>> joinedStreams =
new NoWindowJoinedStreams<>(advertisement, click);
DataStream<Tuple3<String, Long, Long>> joinedStream = joinedStreams
.where(keySelector)
.buffer(Time.of(20, TimeUnit.SECONDS))
.equalTo(keySelector)
.buffer(Time.of(5, TimeUnit.SECONDS))
.apply(new JoinFunction<Tuple2<String, Long>, Tuple2<String, Long>, Tuple3<String, Long, Long>>() {
private static final long serialVersionUID = -5075871109025215769L;
@Override
public Tuple3<String, Long, Long> join(Tuple2<String, Long> first, Tuple2<String, Long> second) throws Exception {
return new Tuple3<>(first.f0, first.f1, second.f1);
}
});
I implemented two new join operators based on Flink's streaming API TwoInputTransformation. Please check Flink-stream-join. I will add more tests to this repository.
In your code, you defined an overlapping sliding window (the slide is smaller than the window size). If you don't want duplicates, you can define a non-overlapping window by specifying only the window size (the default slide is equal to the window size), as sketched below.
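A minimal sketch of what that looks like with the streams from the question (stream1, stream2 and the tuple layout come from the question; the window size and the JoinFunction body are assumptions for illustration):
stream1.join(stream2)
    .where(0)
    .equalTo(0)
    // tumbling window: no overlap, so a matching pair is joined at most once per window
    .window(TumblingTimeWindows.of(Time.of(20, TimeUnit.MINUTES)))
    .apply(new JoinFunction<Tuple2<String, Long>, Tuple2<String, Long>, Tuple3<String, Long, Long>>() {
        @Override
        public Tuple3<String, Long, Long> join(Tuple2<String, Long> first, Tuple2<String, Long> second) {
            return new Tuple3<>(first.f0, first.f1, second.f1);
        }
    });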
While searching for a solution to the same problem, I found the "Interval Join" very useful; it does not repeatedly output the same elements. This is the example from the Flink documentation:
DataStream<Integer> orangeStream = ...
DataStream<Integer> greenStream = ...
orangeStream
.keyBy(<KeySelector>)
.intervalJoin(greenStream.keyBy(<KeySelector>))
.between(Time.milliseconds(-2), Time.milliseconds(1))
.process(new ProcessJoinFunction<Integer, Integer, String>() {
@Override
public void processElement(Integer left, Integer right, Context ctx, Collector<String> out) {
out.collect(left + "," + right);
}
});
With this, no explicit window has to be defined; instead, an interval is used for each single element (see the interval join illustration in the Flink documentation). A sketch of how this could apply to the advertisement/click example above follows.
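A rough sketch applied to the advertisement and click streams from the question (the stream names and keySelector come from the question; the 20-minute bounds and the ProcessJoinFunction body are assumptions based on its description):
advertisement
    .keyBy(keySelector)
    .intervalJoin(click.keyBy(keySelector))
    // a click only counts if it happens within 20 minutes after the ad was shown
    .between(Time.minutes(0), Time.minutes(20))
    .process(new ProcessJoinFunction<Tuple2<String, Long>, Tuple2<String, Long>, Tuple3<String, Long, Long>>() {
        @Override
        public void processElement(Tuple2<String, Long> ad, Tuple2<String, Long> clk, Context ctx,
                Collector<Tuple3<String, Long, Long>> out) {
            out.collect(new Tuple3<>(ad.f0, ad.f1, clk.f1));
        }
    });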

How to update a postgresql array column with spring JdbcTemplate?

I'm using Spring JdbcTemplate, and I'm stuck at the point where I have a query that updates a column that is actually an array of int. The database is postgres 8.3.7.
This is the code I'm using :
public int setUsersArray(int idUser, int idDevice, Collection<Integer> ids) {
int update = -666;
int[] tipi = new int[3];
tipi[0] = java.sql.Types.INTEGER;
tipi[1] = java.sql.Types.INTEGER;
tipi[2] = java.sql.Types.ARRAY;
try {
update = this.jdbcTemplate.update(setUsersArrayQuery, new Object[] {
ids, idUser, idDevice }, tipi);
} catch (Exception e) {
e.printStackTrace();
}
return update;
}
The query is "update table_name set array_column = ? where id_user = ? and id_device = ?".
I get this exception :
org.springframework.dao.DataIntegrityViolationException: PreparedStatementCallback; SQL [update acotel_msp.users_mau set denied_sub_client = ? where id_users = ? and id_mau = ?]; The column index is out of range: 4, number of columns: 3.; nested exception is org.postgresql.util.PSQLException: The column index is out of range: 4, number of columns: 3.
Caused by: org.postgresql.util.PSQLException: The column index is out of range: 4, number of columns: 3.
I've looked into the Spring JdbcTemplate docs but I can't find any help. I'll keep looking; anyway, could someone point me in the right direction? Thanks!
EDIT:
Obviously the order was wrong, my fault...
I tried both your solutions; in the first case I got this:
org.springframework.jdbc.BadSqlGrammarException: PreparedStatementCallback; bad SQL grammar [update users set denied_sub_client = ? where id_users = ? and id_device = ?]; nested exception is org.postgresql.util.PSQLException: Cannot cast an instance of java.util.ArrayList to type Types.ARRAY
Trying the second solution, I got this:
org.springframework.jdbc.BadSqlGrammarException: PreparedStatementCallback; bad SQL grammar [update users set denied_sub_client = ? where id_users = ? and id_device = ?]; nested exception is org.postgresql.util.PSQLException: Cannot cast an instance of [Ljava.lang.Object; to type Types.ARRAY
I suppose I need an instance of java.sql.Array, but how can I create it using JdbcTemplate?
After struggling with many attempts, we settled on a little helper, ArraySqlValue, to create Spring SqlValue objects for Java array types.
Usage is like this:
jdbcTemplate.update(
"UPDATE sometable SET arraycolumn = ?",
ArraySqlValue.create(arrayValue))
The ArraySqlValue can also be used in a MapSqlParameterSource for use with NamedParameterJdbcTemplate, as sketched right below.
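A hedged sketch of that named-parameter usage (the :arr parameter name and the namedParameterJdbcTemplate instance are assumptions; the table and column come from the example above):
// hypothetical parameter name, just to show the wiring
MapSqlParameterSource params = new MapSqlParameterSource()
    .addValue("arr", ArraySqlValue.create(arrayValue));
namedParameterJdbcTemplate.update("UPDATE sometable SET arraycolumn = :arr", params);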
import static com.google.common.base.Preconditions.checkNotNull;
import java.sql.Array;
import java.sql.JDBCType;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Locale;
import org.springframework.jdbc.core.StatementCreatorUtils;
import org.springframework.jdbc.support.SqlValue;
public class ArraySqlValue implements SqlValue {
private final Object[] arr;
private final String dbTypeName;
public static ArraySqlValue create(final Object[] arr) {
return new ArraySqlValue(arr, determineDbTypeName(arr));
}
public static ArraySqlValue create(final Object[] arr, final String dbTypeName) {
return new ArraySqlValue(arr, dbTypeName);
}
private ArraySqlValue(final Object[] arr, final String dbTypeName) {
this.arr = checkNotNull(arr);
this.dbTypeName = checkNotNull(dbTypeName);
}
@Override
public void setValue(final PreparedStatement ps, final int paramIndex) throws SQLException {
final Array arrayValue = ps.getConnection().createArrayOf(dbTypeName, arr);
ps.setArray(paramIndex, arrayValue);
}
@Override
public void cleanup() {}
private static String determineDbTypeName(final Object[] arr) {
// use Spring Utils similar to normal JdbcTemplate inner workings
final int sqlParameterType =
StatementCreatorUtils.javaTypeToSqlParameterType(arr.getClass().getComponentType());
final JDBCType jdbcTypeToUse = JDBCType.valueOf(sqlParameterType);
// lowercasing typename for Postgres
final String typeNameToUse = jdbcTypeToUse.getName().toLowerCase(Locale.US);
return typeNameToUse;
}
}
This code is provided in the public domain.
private static final String ARRAY_DATATYPE = "int4";
private static final String SQL_UPDATE = "UPDATE foo SET arr = ? WHERE d = ?";
final Integer[] existing = ...;
final DateTime dt = ...;
getJdbcTemplate().update(new PreparedStatementCreator() {
@Override
public PreparedStatement createPreparedStatement(final Connection con) throws SQLException {
final PreparedStatement ret = con.prepareStatement(SQL_UPDATE);
ret.setArray(1, con.createArrayOf(ARRAY_DATATYPE, existing));
ret.setDate(2, new java.sql.Date(dt.getMillis()));
return ret;
}
});
This solution is kind of a workaround using a PostgreSQL built-in function, and it definitely worked for me.
reference blog
1) Convert the String array to a comma-separated String
If you are using Java 8, it's pretty easy; other options are here.
String commaSeparatedString = String.join(",",stringArray); // Java8 feature
2) PostgreSQL built-in function string_to_array()
You can find other PostgreSQL array functions here.
// tableName ( name text, string_array_column_name text[] )
String query = "insert into tableName(name,string_array_column_name ) values(?, string_to_array(?,',') )";
int[] types = new int[] { Types.VARCHAR, Types.VARCHAR};
Object[] psParams = new Object[] {"Dhruvil Thaker",commaSeparatedString };
jdbcTemplate.update(query, psParams, types); // assuming you have a JdbcTemplate instance
The cleanest way I found so far is to first convert the Collection into an Integer[] and then use the Connection to convert that into an Array.
Integer[] idArray = ids.toArray(new Integer[0]);
Array idSqlArray = jdbcTemplate.execute(
(Connection c) -> c.createArrayOf(JDBCType.INTEGER.getName(), idArray)
);
update = this.jdbcTemplate.update(setUsersArrayQuery, new Object[] {
idSqlArray, idUser, idDevice });
This is based on information in the documentation: https://jdbc.postgresql.org/documentation/head/arrays.html
The argument types and arguments do not match.
Try changing the argument type order:
int[] tipi = new int[3];
tipi[0] = java.sql.Types.ARRAY;
tipi[1] = java.sql.Types.INTEGER;
tipi[2] = java.sql.Types.INTEGER;
or use
update = this.jdbcTemplate.update(setUsersArrayQuery, new Object[] {
ids.toArray(), idUser, idDevice });
and see if it works
http://valgogtech.blogspot.com/2009/02/passing-arrays-to-postgresql-database.html explains how to create a java.sql.Array for PostgreSQL.
Basically, Array.getBaseTypeName should return int and Array.toString should return the array content in "{1,2,3}" format.
After you create the array you can set it using preparedStatement.setArray(...)
from a PreparedStatementCreator, e.g.
jdbcTemplate.update(
    new PreparedStatementCreator() {
        public PreparedStatement createPreparedStatement(Connection connection) throws SQLException {
            // create the java.sql.Array here, set it on the statement with setArray(...), then return the statement
        }
    });
Good Luck ..

java.sql.Array intArray = connection.createArrayOf("int", existing);
List<Object> values = new ArrayList<Object>();
values.add(intArray);
values.add(dt);
getJdbcTemplate().update(SQL_UPDATE, values);
