apache-flink: sliding window in output - apache-flink

I'm currently coding a small application to understand sliding windows in Flink (with input data from an Apache Kafka topic):
//Split kafka stream by comma and create tuple
DataStream<Tuple3<String, Integer, Date>> parsedStream = stream
.map((line) -> {
String[] cells = line.split(",");
return new Tuple3(cells[1], Integer.parseInt(cells[4]), f.parse(cells[2]));
});
DataStream<Tuple3<String, Integer, Date>> parsedStreamWithTSWM = parsedStream
.assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor<Tuple3<String, Integer, Date>>(Time.minutes(1)) {
@Override
public long extractTimestamp(Tuple3<String, Integer, Date> element) {
return element.f2.getTime();
}
});
//Sum values per windows and per id
DataStream<Tuple3<String, Integer, Date>> AggStream = parsedStreamWithTSWM
.keyBy(0)
.window(SlidingEventTimeWindows.of(Time.minutes(30), Time.minutes(1)))
.sum(1);
AggStream.print();
Is it possible to improve my output (AggStream.print();) by adding the details of the window that produced each aggregate?
$ tail -f flink-chapichapo-jobmanager-0.out
(228035740000002,300,Fri Apr 07 14:42:00 CEST 2017)
(228035740000000,28,Fri Apr 07 14:42:00 CEST 2017)
(228035740000002,300,Fri Apr 07 14:43:00 CEST 2017)
(228035740000000,27,Fri Apr 07 14:43:00 CEST 2017)
(228035740000002,300,Fri Apr 07 14:44:00 CEST 2017)
(228035740000000,26,Fri Apr 07 14:44:00 CEST 2017)
(228035740000001,27,Fri Apr 07 14:44:00 CEST 2017)
(228035740000002,300,Fri Apr 07 14:45:00 CEST 2017)
(228035740000000,25,Fri Apr 07 14:45:00 CEST 2017)
Thank you in advance

You can use the generic apply function with a WindowFunction, which gives you access to the window being evaluated alongside the key and the elements:
public interface WindowFunction<IN, OUT, KEY, W extends Window> extends Function, Serializable {
/**
* Evaluates the window and outputs none or several elements.
*
* @param key The key for which this window is evaluated.
* @param window The window that is being evaluated.
* @param input The elements in the window being evaluated.
* @param out A collector for emitting elements.
*
* @throws Exception The function may throw exceptions to fail the program and trigger recovery.
*/
void apply(KEY key, W window, Iterable<IN> input, Collector<OUT> out) throws Exception;
}
See docs
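For the question's configuration (30-minute windows sliding every minute), each event falls into 30 overlapping windows, which is why a new aggregate is printed every minute per key. The following is a minimal plain-Java sketch of that arithmetic, not Flink API: the class and method names are hypothetical, and it assumes windows aligned to the epoch with no offset and non-negative timestamps. The start and end values it lists correspond to the bounds a TimeWindow would expose (getStart() and getEnd()) inside apply.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flink: lists the sliding windows covering one event.
final class SlidingWindowSketch {

    /** Returns {start, end} pairs (end exclusive) of every sliding window containing the timestamp. */
    static List<long[]> windowsFor(long timestampMs, long sizeMs, long slideMs) {
        List<long[]> windows = new ArrayList<>();
        // Start of the most recent window that begins at or before the event.
        long lastStart = timestampMs - (timestampMs % slideMs);
        // Walk backwards one slide at a time while the window still covers the event.
        for (long start = lastStart; start > timestampMs - sizeMs; start -= slideMs) {
            windows.add(new long[] {start, start + sizeMs});
        }
        return windows;
    }

    public static void main(String[] args) {
        long size = 30 * 60_000L;    // 30 minutes, as in the question
        long slide = 60_000L;        // 1 minute, as in the question
        long eventTime = 3_630_000L; // arbitrary example: 1 h 0 min 30 s after the epoch

        List<long[]> windows = windowsFor(eventTime, size, slide);
        System.out.println(windows.size() + " windows contain the event");
        long[] newest = windows.get(0);
        System.out.println("newest window: [" + newest[0] + ", " + newest[1] + ")");
    }
}
```

Emitting those two bounds together with the sum from apply would make each output line identify the window it belongs to.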

Related

Apache Flink -TumblingProcessingTimeWindows - Incorrect calculation Start-End

It's a very simple example: env.keyBy(value -> (...)) .window(TumblingProcessingTimeWindows.of(Time.hours(24))).addSink();
................
public Collection<TimeWindow> assignWindows(){
final long now = context.getCurrentProcessingTime();
long start = TimeWindow.getWindowStartWithOffset(now, offset, size);
// the value "now" is correct = 1603379120043 (Date in your timezone*: 10/22/2020, 12:05:20 PM GMT-0300 (-03))
// the value "start" is 1603324800000 (Date in your timezone*: 10/21/2020, 9:00:00 PM GMT-0300 (-03))
// Why does the window start yesterday?
// As the result:
public TimeWindow(long start, long end) {
this.start = start; //1603324800000 - Date in your timezone*: 10/21/2020, 9:00:00 PM GMT-0300 (-03)
this.end = end; //1603411200000 - Date in your timezone*: 10/22/2020, 9:00:00 PM GMT-0300 (-03)}
So my job with a 24-hour TumblingProcessingTimeWindows, started at 10/22/2020, 12:05:20 PM, will close its first window today at 9:00:00 PM, that is, after about 9 hours instead of 24 hours.
Is there a solution for this?
By default, Flink's windows are aligned to the epoch, not to the time when they are created. So a 24-hour window always ends at midnight UTC, which is 9:00 PM in your GMT-3 time zone.
You can use the optional offset parameter of the window assigner to shift the window boundaries.
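To make the alignment concrete, here is a small plain-Java sketch of the epoch-aligned start computation, using the numbers from the question. The formula mirrors my understanding of what TimeWindow.getWindowStartWithOffset does; the class name and the 3-hour offset (chosen to line the boundaries up with midnight in GMT-3) are assumptions for illustration.

```java
// Hypothetical stand-alone sketch, not Flink code.
final class WindowStartSketch {

    /** Start of the window containing timestampMs, for windows of sizeMs shifted by offsetMs. */
    static long windowStart(long timestampMs, long offsetMs, long sizeMs) {
        return timestampMs - (timestampMs - offsetMs + sizeMs) % sizeMs;
    }

    public static void main(String[] args) {
        long now = 1603379120043L;              // "now" from the question
        long day = 24 * 60 * 60 * 1000L;        // 24-hour window
        long threeHours = 3 * 60 * 60 * 1000L;  // assumed offset for a GMT-3 zone

        // No offset: the window starts at midnight UTC (9:00 PM the previous day in GMT-3).
        System.out.println(windowStart(now, 0L, day));         // 1603324800000
        // 3-hour offset: the window starts at 03:00 UTC, i.e. local midnight in GMT-3.
        System.out.println(windowStart(now, threeHours, day)); // 1603335600000
    }
}
```

In Flink the offset is passed as the second argument of the assigner, for example TumblingProcessingTimeWindows.of(Time.hours(24), Time.hours(3)). Note that an offset only shifts where the boundaries fall; it does not make the first window begin when the job starts.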

How to pause a Camel Quartz2 timer in a suspended route?

The following unit test tries a quartz2 route that triggers each second:
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.component.mock.MockEndpoint;
import org.apache.camel.test.junit4.CamelTestSupport;
import org.junit.Test;
public class CamelQuartzTest extends CamelTestSupport {
static private String routeId = "test-route";
@Test
public void testSuspendRoute() throws Exception {
// arrange
MockEndpoint mock = getMockEndpoint("mock:result");
// act
System.out.println("context.start()");
context.start();
Thread.sleep(2000);
System.out.println(String.format("receivedCounter = %d", mock.getReceivedCounter()));
System.out.println("context.startRoute()");
context.startRoute(routeId);
Thread.sleep(2000);
System.out.println(String.format("receivedCounter = %d", mock.getReceivedCounter()));
System.out.println("context.suspendRoute()");
context.suspendRoute(routeId);
Thread.sleep(2000);
System.out.println(String.format("receivedCounter = %d", mock.getReceivedCounter()));
System.out.println("context.resumeRoute()");
context.resumeRoute(routeId);
Thread.sleep(2000);
System.out.println(String.format("receivedCounter = %d", mock.getReceivedCounter()));
System.out.println("context.stop()");
context.stop();
System.out.println(String.format("receivedCounter = %d", mock.getReceivedCounter()));
// assert
assertEquals(4, mock.getReceivedCounter());
}
@Override
protected RouteBuilder createRouteBuilder() {
return new RouteBuilder() {
public void configure() {
from("quartz2://testtimer?cron=0/1+*+*+?+*+*")
.autoStartup(false)
.routeId(routeId)
.setBody()
.simple("${header.triggerName}: ${header.fireTime}")
.to("mock:result", "stream:out");
}
};
}
}
Result output:
context.start()
receivedCounter = 0
context.startRoute()
testtimer: Tue Oct 21 10:06:38 CEST 2014
testtimer: Tue Oct 21 10:06:39 CEST 2014
receivedCounter = 2
context.suspendRoute()
receivedCounter = 2
context.resumeRoute()
testtimer: Tue Oct 21 10:06:41 CEST 2014
testtimer: Tue Oct 21 10:06:41 CEST 2014
testtimer: Tue Oct 21 10:06:42 CEST 2014
testtimer: Tue Oct 21 10:06:43 CEST 2014
receivedCounter = 6
context.stop()
receivedCounter = 6
After resuming the route, the result shows 4 incoming triggers, while 2 were expected. Apparently, the quartz2 timer keeps firing while the route is suspended. How can I make quartz2 take a pause while the route is suspended?
Found the root cause: if a Quartz job is suspended for a while and then resumed, the default behavior of Quartz is to catch up on the triggers that were missed during the suspended period, also known as "misfires". I did not find a way to switch off this misfire behavior. However, decreasing the misfire threshold from 60 seconds to 500 ms helped in my case. This can be done by copying the default quartz.properties from quartz-<version>.jar to org/quartz/quartz.properties on the default classpath and overriding the misfire threshold:
# Properties file for use by StdSchedulerFactory
# to create a Quartz Scheduler Instance.
# This file overrules the default quartz.properties file in the
# quartz-<version>.jar
#
org.quartz.scheduler.instanceName: DefaultQuartzScheduler
org.quartz.scheduler.rmi.export: false
org.quartz.scheduler.rmi.proxy: false
org.quartz.scheduler.wrapJobExecutionInUserTransaction: false
org.quartz.threadPool.class: org.quartz.simpl.SimpleThreadPool
org.quartz.threadPool.threadCount: 10
org.quartz.threadPool.threadPriority: 5
org.quartz.threadPool.threadsInheritContextClassLoaderOfInitializingThread: true
# default threshold: 60 seconds
#org.quartz.jobStore.misfireThreshold: 60000
# overruled threshold: 500 ms, to prevent superfluous triggers after resuming
# a quartz job
org.quartz.jobStore.misfireThreshold: 500
org.quartz.jobStore.class: org.quartz.simpl.RAMJobStore

Date format issue in Firefox browser

My code
for(n in data.values){
data.values[n].snapshot = new Date(data.values[n].snapshot);
data.values[n].value = parseInt(data.values[n].value);
console.log(data.values[n].snapshot);
}
Here console.log shows the correct date in Chrome, 'Thu Aug 07 2014 14:29:00 GMT+0530 (India Standard Time)', but in Firefox it shows 'Invalid Date'.
If I call console.log(data.values[n].snapshot) before the new Date line, it prints the date as
2014-08-07 14:29
How can I convert the date into a format that Firefox understands?
The Date object only officially accepts two formats:
Mon, 25 Dec 1995 13:30:00 GMT
2011-10-10T14:48:00
This means that your date 2014-08-07 14:29 is invalid.
Your date can be easily made compatible with the second date format though (assuming that date is yyyy-mm-dd hh:mm):
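for (var n in data.values) {
// replace the space in the date string (not in the key n) with "T": "2014-08-07 14:29" becomes "2014-08-07T14:29"
data.values[n].snapshot = new Date(data.values[n].snapshot.replace(/\s/, "T"));
data.values[n].value = parseInt(data.values[n].value, 10);
console.log(data.values[n].snapshot);
}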

Turn array of strings into array of dates in Google Apps Script

I have a Google Sheets spreadsheet. In Column B, I have a list of strings that are either dates or ranges of dates in the format month/date. For example:
7/26
7/27-7/31
8/1
8/2
8/3-8/5
I want to create an array with the first date on the left and the second date (if any) on the right. If there's no second date, it can be left blank. This is what I want:
[7/26,]
[7/27,7/31]
[8/1,]
[8/2,]
[8/3,8/5]
I've tried:
var r = 'B'
var dateString = sheet.getRange(dateColumns[r] + '1:' + dateColumns[r] + lastRow.toString()).getValues();
var dateArr = Utilities.parseCsv(dateString, '-');
But that just keeps concatenating all the values. Also, if it's possible to put the output in a date format, that would be great too.
This was a fun exercise to play with.
Here is code that does what you want:
function test(){
convertToDateArray('7/26,7/27-7/31,8/1,8/2,8/3-8/5');
}
function convertToDateArray(inputString){
if(typeof(inputString)=='string'){inputString=inputString.split(',')}; // if input is a string then split it into an array using comma as separator
var data = [];
var datesArray = [];
for(var n in inputString){
if(inputString[n].indexOf('-')==-1){inputString[n]+='-'};// if only 1 field add an empty one
data.push(inputString[n].split('-'));// make it an array
}
Logger.log(data);//check
for(var n in data){
var temp = [];
for(var c in data[n]){
Logger.log('data[n][c] = '+ data[n][c]);
var date = data[n][c]!=''? new Date(2014,Number(data[n][c].split('/')[0])-1,Number(data[n][c].split('/')[1]),0,0,0,0) : '';// create date objects with right values
Logger.log('date = '+date);//check
temp.push(date);
}
datesArray.push(temp);//store output data in an array of arrays, ready to setValues in a SS
}
Logger.log(datesArray);
var sh = SpreadsheetApp.getActive().getActiveSheet();
sh.getRange(1,1,datesArray.length,datesArray[0].length).setValues(datesArray);
}
Logger result for datesArray :
[[Sat Jul 26 00:00:00 GMT+02:00 2014, ], [Sun Jul 27 00:00:00 GMT+02:00 2014, Thu Jul 31 00:00:00 GMT+02:00 2014], [Fri Aug 01 00:00:00 GMT+02:00 2014, ], [Sat Aug 02 00:00:00 GMT+02:00 2014, ], [Sun Aug 03 00:00:00 GMT+02:00 2014, Tue Aug 05 00:00:00 GMT+02:00 2014]]

Change Date value from one TimeZone to another TimeZone

My case: I have a Date object whose value is a UTC time, and I want to convert it to Japan time.
Calendar calendar = Calendar.getInstance(TimeZone.getTimeZone("Japan"));
calendar.setTime(someExistingDateObj);
System.out.println(String.valueOf(calendar.get(Calendar.HOUR_OF_DAY)) + ":" + calendar.get(Calendar.MINUTE));
someExistingDateObj is mapped from the database, where the value is 2013-02-14 03:37:00.733. The output is:
04:37
It seems the time zone is not being applied?
Thanks for your time.
Your problem may be that you're looking at things wrong. A Date doesn't have a time zone. It represents a discrete moment in time and is "intended to reflect coordinated universal time". Calendars and date formatters are what get time zone information. Your second example with the Calendar and TimeZone instances appears to work fine. Right now, this code:
public static void main(String[] args) {
Calendar calendar = Calendar.getInstance(TimeZone.getTimeZone("Japan"));
System.out.println(String.valueOf(calendar.get(Calendar.HOUR)) + ":" + calendar.get(Calendar.MINUTE));
}
Reports:
0:32
That appears correct to me. What do you find wrong with it?
Update: Oh, perhaps you're expecting 12:32 from the above code? You'd want to use Calendar.HOUR_OF_DAY instead of Calendar.HOUR for that, or else do some hour math. Calendar.HOUR uses 0 to represent both noon and midnight.
Update 2: Here's my final attempt to try to get this across. Try this code:
public static void main(String[] args) {
Calendar calendar = Calendar.getInstance();
SimpleDateFormat format = new SimpleDateFormat("H:mm a Z");
List<TimeZone> zones = Arrays.asList(
TimeZone.getTimeZone("CST"),
TimeZone.getTimeZone("UTC"),
TimeZone.getTimeZone("Asia/Shanghai"),
TimeZone.getTimeZone("Japan"));
for (TimeZone zone : zones) {
calendar.setTimeZone(zone);
format.setTimeZone(zone);
System.out.println(
calendar.get(Calendar.HOUR_OF_DAY) + ":"
+ calendar.get(Calendar.MINUTE) + " "
+ (calendar.get(Calendar.AM_PM) == 0 ? "AM " : "PM ")
+ (calendar.get(Calendar.ZONE_OFFSET) / 1000 / 60 / 60));
System.out.println(format.format(calendar.getTime()));
}
}
Note that it creates a single Calendar object, representing "right now". Then it prints out the time represented by that calendar in four different time zones, using both the Calendar.get() method and a SimpleDateFormat to show that you get the same result both ways. The output of that right now is:
22:59 PM -6
22:59 PM -0600
4:59 AM 0
4:59 AM +0000
12:59 PM 8
12:59 PM +0800
13:59 PM 9
13:59 PM +0900
If you used Calendar.HOUR instead of Calendar.HOUR_OF_DAY, then you'd see this instead:
10:59 PM -6
22:59 PM -0600
4:59 AM 0
4:59 AM +0000
0:59 PM 8
12:59 PM +0800
1:59 PM 9
13:59 PM +0900
It correctly shows the current times in Central Standard Time (my time zone), UTC, Shanghai time, and Japan time, respectively, along with their time zone offsets. You can see that they all line up and have the correct offsets.
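As a compact illustration of the same point, the sketch below takes the database value from the question, assumes it is meant as UTC, and formats the resulting Date in two zones. The class and helper names are hypothetical. The Date itself never changes; only the formatter's time zone does.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical sketch: one instant, rendered in different time zones.
final class ZoneFormatSketch {

    /** Parses a "yyyy-MM-dd HH:mm:ss.SSS" string, treating it as UTC (an assumption). */
    static Date parseUtc(String text) {
        SimpleDateFormat in = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
        in.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return in.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable timestamp: " + text, e);
        }
    }

    /** Formats the same instant as wall-clock time in the given zone. */
    static String formatIn(Date instant, String zoneId) {
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd HH:mm");
        out.setTimeZone(TimeZone.getTimeZone(zoneId));
        return out.format(instant);
    }

    public static void main(String[] args) {
        Date instant = parseUtc("2013-02-14 03:37:00.733");
        System.out.println(formatIn(instant, "UTC"));        // 2013-02-14 03:37
        System.out.println(formatIn(instant, "Asia/Tokyo")); // 2013-02-14 12:37
    }
}
```

If the value read from the database is interpreted in the JVM's default zone rather than UTC, the Date already represents a different instant before any Calendar is involved, which would explain an unexpected hour in the output.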
sdf2 and sdf3 are initialized identically, so there is no need for two of them.
