Flink json serialization timezone issue - apache-flink

I use JsonRowSerializationSchema to serialize Flink's Row into JSON, and I see that SQL timestamp serialization has timezone issues.
val row = new Row(1)
row.setField(0, new Timestamp(0))
val tableSchema = TableSchema
  .builder
  .field("c", DataTypes.TIMESTAMP(3).bridgedTo(classOf[Timestamp]))
  .build

val serializer = JsonRowSerializationSchema.builder()
  .withTypeInfo(tableSchema.toRowType)
  .build()
println(new String(serializer.serialize(row)))
{"c":"1969-12-31T16:00:00Z"}
I see it uses PST (the local time zone) to interpret the timestamp, but then outputs UTC (note the Z in the output).
If I do TimeZone.setDefault(TimeZone.getTimeZone("UTC")), then it prints {"c":"1970-01-01T00:00:00Z"}. My timestamps are created for UTC time, and I want Flink to interpret them as UTC.
I checked the Flink implementation; the following two methods are involved:
private JsonNode convertLocalDateTime(ObjectMapper mapper, JsonNode reuse, Object object) {
    return mapper.getNodeFactory()
        .textNode(RFC3339_TIMESTAMP_FORMAT.format((LocalDateTime) object));
}

private JsonNode convertTimestamp(ObjectMapper mapper, JsonNode reuse, Object object) {
    Timestamp timestamp = (Timestamp) object;
    return convertLocalDateTime(mapper, reuse, timestamp.toLocalDateTime());
}
It looks like the implementation is hardcoded. Is there any way to tell Flink to use UTC without changing the system time zone?

The java.sql.Timestamp is very problematic because it depends on a time zone. This is why we replaced it with the new java.time.* classes in the new Table/SQL type system.
For the outdated implementation, we recommend configuring all Flink JVMs to the UTC time zone.
For Table/SQL, we use the new org.apache.flink.formats.json.JsonRowDataSerializationSchema, but it works on internal data structures. I would recommend just copying the source code of JsonRowSerializationSchema and adapting the format as you need it, or using the Jackson library directly, which would avoid dealing with TypeInformation at all.
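For illustration, here is a minimal sketch (not Flink's API; the field name "c" and the class name are assumptions) of the Jackson-only approach. Converting the java.sql.Timestamp to an Instant keeps serialization anchored to the epoch, so the output is rendered in UTC regardless of the JVM's default time zone.

import java.sql.Timestamp;
import java.time.format.DateTimeFormatter;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

public class UtcTimestampJsonWriter {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Serializes the timestamp field as RFC 3339 in UTC, independent of the JVM time zone.
    // Timestamp#toInstant() is epoch-based, unlike Timestamp#toLocalDateTime(),
    // which JsonRowSerializationSchema uses and which applies the default time zone.
    public static byte[] serialize(Timestamp ts) throws Exception {
        ObjectNode node = MAPPER.createObjectNode();
        node.put("c", DateTimeFormatter.ISO_INSTANT.format(ts.toInstant()));
        return MAPPER.writeValueAsBytes(node);
    }
}

With new Timestamp(0) this prints {"c":"1970-01-01T00:00:00Z"} whatever the local time zone is.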

Related

How to convert RowData into Row when using DynamicTableSink

I have a question regarding the new sourceSinks interface in Flink. I am currently implementing a new custom DynamicTableSinkFactory, DynamicTableSink, SinkFunction and OutputFormat. I use the JDBC connector as an example, and I use Scala.
All data that is fed into the sink has the type Row, so the OutputFormat serialisation is based on the Row interface:
override def writeRecord(record: Row): Unit = {...}
As stated in the documentation:
records must be accepted as org.apache.flink.table.data.RowData. The
framework provides runtime converters such that a sink can still work
on common data structures and perform a conversion at the beginning.
The goal here is to keep the Row data structure and only convert Row into RowData when it is inserted into the SinkFunction, so that the rest of the code does not need to be changed.
class MySinkFunction(outputFormat: MyOutputFormat) extends RichSinkFunction[RowData] with CheckpointedFunction
So the resulting question is: How to convert RowData into Row when using a DynamicTableSink and OutputFormat? Where should the conversion happen?
links:
https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sourceSinks.html
https://github.com/apache/flink/tree/master/flink-connectors/flink-connector-jdbc/src/test/java/org/apache/flink/connector/jdbc
Thanks.
You can obtain a converter instance in the Context provided in org.apache.flink.table.connector.sink.DynamicTableSink#getSinkRuntimeProvider.
// create type information for the DeserializationSchema
final TypeInformation<RowData> producedTypeInfo =
    context.createTypeInformation(producedDataType);

// most of the code in DeserializationSchema will not work on internal data structures
// create a converter for conversion at the end
final DataStructureConverter converter =
    context.createDataStructureConverter(producedDataType);
The instance is Java serializable and can be passed into the sink function. You should also call the converter.open() method in your sink function.
A more complex example can be found here (for sources but sinks work in a similar way). Have a look at SocketDynamicTableSource and ChangelogCsvFormat in the same package.
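As a rough sketch (the class name here is an assumption, and MyOutputFormat is the question's existing Row-based format), the converter obtained from the context could be used inside the sink function like this to hand Row instances to the unchanged OutputFormat:

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.apache.flink.table.connector.RuntimeConverter;
import org.apache.flink.table.connector.sink.DynamicTableSink.DataStructureConverter;
import org.apache.flink.table.data.RowData;
import org.apache.flink.types.Row;

// Hypothetical sink that keeps the Row-based OutputFormat untouched and only
// converts RowData -> Row at the boundary.
public class RowDataToRowSink extends RichSinkFunction<RowData> {

    private final DataStructureConverter converter; // Java-serializable, per the answer above
    private final MyOutputFormat outputFormat;      // the existing Row-based format from the question

    public RowDataToRowSink(DataStructureConverter converter, MyOutputFormat outputFormat) {
        this.converter = converter;
        this.outputFormat = outputFormat;
    }

    @Override
    public void open(Configuration parameters) throws Exception {
        // the converter must be opened before it is used
        converter.open(RuntimeConverter.Context.create(getClass().getClassLoader()));
        // configure/open the OutputFormat here as in the existing code (omitted)
    }

    @Override
    public void invoke(RowData value, Context context) throws Exception {
        // toExternal turns the internal RowData into an external Row
        Row row = (Row) converter.toExternal(value);
        outputFormat.writeRecord(row);
    }
}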

Flutter wrong UTC time from SQL Server database

I've got a problem getting Flutter to recognize my DateTime as UTC time. Via an HTTP request, I am storing some data inside my SQL Server database, which contains a UTC DateTime. This is done via Entity Framework.
var workingTime = new WorkingTime()
{
    StartDateTime = dto.StartDateTime.ToUniversalTime(),
    EndDateTime = dto.EndDateTime.ToUniversalTime(),
    ...
};

await _repository.AddAsync(workingTime);
This looks quite okay; the request was made at 2020-06-13 01-59-41:690 in Germany, so the stored data is the correct UTC time.
Now I am loading this data in my Flutter app, and when I debug the app, the loaded DateTime says that it is not a UTC time.
I am not sure whether I am storing the data incorrectly or whether the parsing inside Flutter is wrong, but I can't see what I'm doing wrong here.
Please tell me if you need code or more information.
Edit
So after a lot of testing, I found out something:
Debug.WriteLine(model.EndDateTime.Kind);
This prints "Unspecified". It seems like there is something wrong either in storing the DateTime or in reading from it.
Be explicit about UTC every time: indicate explicitly that you want UTC time.
To get millisecondsSinceEpoch in UTC time:
DateTime dateTime=DateTime.now().toUtc();
int epochTime = dateTime.toUtc().millisecondsSinceEpoch;
To get Datetime in UTC:
DateTime dt=DateTime.fromMillisecondsSinceEpoch(millisecondsSinceEpoch).toUtc();
Store the time as an int (milliseconds since epoch) in the database.
After a lot of thinking, I came across a good method for my case. I ended up implementing an extension method that converts the Kind of every DateTime stored in the database to DateTimeKind.Utc when it is loaded.
Inside the DataContext:
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    // Convert all DateTimes to UTC
    modelBuilder.TreatDateTimeAsUtc();
}
The extension method
public static class ModelBuilderExtensions
{
    public static void TreatDateTimeAsUtc(this ModelBuilder modelBuilder)
    {
        var dateTimeConverter = new ValueConverter<DateTime, DateTime>(
            v => v.ToUniversalTime(),
            v => DateTime.SpecifyKind(v, DateTimeKind.Utc));

        var nullableDateTimeConverter = new ValueConverter<DateTime?, DateTime?>(
            v => v.HasValue ? v.Value.ToUniversalTime() : v,
            v => v.HasValue ? DateTime.SpecifyKind(v.Value, DateTimeKind.Utc) : v);

        foreach (var entityType in modelBuilder.Model.GetEntityTypes())
        {
            foreach (var property in entityType.GetProperties())
            {
                if (property.ClrType == typeof(DateTime))
                {
                    property.SetValueConverter(dateTimeConverter);
                }
                else if (property.ClrType == typeof(DateTime?))
                {
                    property.SetValueConverter(nullableDateTimeConverter);
                }
            }
        }
    }
}

Proper way to assign watermark with DataStreamSource<List<T>> using Flink

I have a continuous stream of JSON arrays produced to a Kafka topic, and I want to process the records with the event-time characteristic. To reach this goal, I have to assign a timestamp and watermark to each record contained in the JSON array.
I didn't find a convenient way to achieve this. My solution is to consume the data as DataStreamSource<List<MockData>>, then iterate over the List and collect each object downstream with an anonymous ProcessFunction, and finally assign watermarks to this downstream.
The main code is shown below:
DataStreamSource<List<MockData>> listDataStreamSource = KafkaSource.genStream(env);

SingleOutputStreamOperator<MockData> convertToPojo = listDataStreamSource
    .process(new ProcessFunction<List<MockData>, MockData>() {
        @Override
        public void processElement(List<MockData> value, Context ctx, Collector<MockData> out)
                throws Exception {
            value.forEach(mockData -> out.collect(mockData));
        }
    });

convertToPojo.assignTimestampsAndWatermarks(
    new BoundedOutOfOrdernessTimestampExtractor<MockData>(Time.seconds(5)) {
        @Override
        public long extractTimestamp(MockData element) {
            return element.getTimestamp();
        }
    });

SingleOutputStreamOperator<Tuple2<String, Long>> countStream = convertToPojo
    .keyBy("country")
    .window(SlidingEventTimeWindows.of(Time.seconds(10), Time.seconds(10)))
    .process(new FlinkEventTimeCountFunction()).name("count elements");
The code seems all right and runs without error, but the ProcessWindowFunction is never triggered. I tracked through the Flink source code and found that EventTimeTrigger never returns TriggerResult.FIRE, because TriggerContext.getCurrentWatermark returns Long.MIN_VALUE all the time.
What's the proper way to process such a List in event time? Any suggestion will be appreciated.
The problem is that you are applying the keyBy and window operations to the convertToPojo stream, rather than the stream with timestamps and watermarks (which you didn't assign to a variable).
If you write the code more or less like this, it should work:
listDataStreamSource = KafkaSource ...
convertToPojo = listDataStreamSource.process ...
pojoPlusWatermarks = convertToPojo.assignTimestampsAndWatermarks ...
countStream = pojoPlusWatermarks.keyBy ...
Calling assignTimestampsAndWatermarks on the convertToPojo stream does not modify that stream, but rather creates a new datastream object that includes timestamps and watermarks. You need to apply your windowing to that new datastream.
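Concretely, a sketch of the corrected wiring, reusing the classes from the question (MockData, FlinkEventTimeCountFunction), might look like this:

// Assign timestamps/watermarks and keep a reference to the NEW stream.
SingleOutputStreamOperator<MockData> pojoPlusWatermarks = convertToPojo
    .assignTimestampsAndWatermarks(
        new BoundedOutOfOrdernessTimestampExtractor<MockData>(Time.seconds(5)) {
            @Override
            public long extractTimestamp(MockData element) {
                return element.getTimestamp();
            }
        });

// Key and window the stream that actually carries timestamps and watermarks.
SingleOutputStreamOperator<Tuple2<String, Long>> countStream = pojoPlusWatermarks
    .keyBy("country")
    .window(SlidingEventTimeWindows.of(Time.seconds(10), Time.seconds(10)))
    .process(new FlinkEventTimeCountFunction())
    .name("count elements");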

Flink SQL - How to parse a TIMESTAMP with custom pattern?

From documentation it looks like Flink's SQL can only parse timestamps in a certain format, namely:
TIMESTAMP string: Parses a timestamp string in the form "yy-mm-dd hh:mm:ss.fff" to a SQL timestamp.
Is there any way to pass in a custom DateTimeFormatter to parse a different kind of timestamp format?
You can implement any parsing logic using a user-defined scalar function (UDF).
In Scala, this could look as follows:
class TsParser extends ScalarFunction {
  def eval(s: String): Timestamp = {
    // your logic
  }
}
Once defined, the function has to be registered at the TableEnvironment:
tableEnv.registerFunction("tsParser", new TsParser())
Now you can use the function tsParser just like any built-in function.
See the documentation for details.
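For example (the table name MyTable and column ts_string below are hypothetical), the registered UDF can be called directly in a SQL query:

// hypothetical table and column names, for illustration only
tableEnv.sqlQuery("SELECT tsParser(ts_string) AS event_ts FROM MyTable");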

Is there an equivalent to Kafka's KTable in Apache Flink?

Apache Kafka has the concept of a KTable, where
each data record represents an update
Essentially, I can consume a Kafka topic and only keep the latest message per key.
Is there a similar concept available in Apache Flink? I have read about Flink's Table API, but it does not seem to solve the same problem.
Some help comparing and contrasting the two frameworks would be helpful. I am not looking for which is better or worse, but rather just how they differ; which one is right would then depend on my requirements.
You are right. Flink's Table API and its Table class do not correspond to Kafka's KTable. The Table API is a relational language-embedded API (think of SQL integrated in Java and Scala).
Flink's DataStream API does not have a built-in concept that corresponds to a KTable. Instead, Flink offers sophisticated state management and a KTable would be a regular operator with keyed state.
For example, a stateful operator with two inputs that stores the latest value observed from the first input and joins it with values from the second input can be implemented with a CoFlatMapFunction as follows:
DataStream<Tuple2<Long, String>> first = ...
DataStream<Tuple2<Long, String>> second = ...

DataStream<Tuple2<String, String>> result = first
    // connect first and second stream
    .connect(second)
    // key both streams on the first (Long) attribute
    .keyBy(0, 0)
    // join them
    .flatMap(new TableLookup());

// ------

public static class TableLookup
        extends RichCoFlatMapFunction<Tuple2<Long,String>, Tuple2<Long,String>, Tuple2<String,String>> {

    // keyed state
    private ValueState<String> lastVal;

    @Override
    public void open(Configuration conf) {
        ValueStateDescriptor<String> valueDesc =
            new ValueStateDescriptor<String>("table", Types.STRING);
        lastVal = getRuntimeContext().getState(valueDesc);
    }

    @Override
    public void flatMap1(Tuple2<Long, String> value, Collector<Tuple2<String, String>> out) throws Exception {
        // update the value for the current Long key with the String value.
        lastVal.update(value.f1);
    }

    @Override
    public void flatMap2(Tuple2<Long, String> value, Collector<Tuple2<String, String>> out) throws Exception {
        // look up latest String for current Long key.
        String lookup = lastVal.value();
        // emit current String and looked-up String
        out.collect(Tuple2.of(value.f1, lookup));
    }
}
In general, state can be used very flexibly in Flink and lets you implement a wide range of use cases. There are also more state types, such as ListState and MapState, and with a ProcessFunction you have fine-grained control over time, for example to remove the state of a key if it has not been updated for a certain amount of time (KTables have a configuration for that, as far as I know).
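To illustrate that last point, here is a minimal sketch (not from the answer; the class name and the one-hour idle timeout are assumptions) of a KeyedProcessFunction that keeps the latest String per Long key and clears the state once the key has been idle for the TTL:

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Assumes the stream has been keyed by the Long field, e.g. keyBy(t -> t.f0).
public class LatestValueWithTtl
        extends KeyedProcessFunction<Long, Tuple2<Long, String>, Tuple2<Long, String>> {

    private static final long TTL_MS = 60 * 60 * 1000; // assumed 1-hour idle timeout

    private ValueState<String> lastVal;
    private ValueState<Long> lastUpdate;

    @Override
    public void open(Configuration conf) {
        lastVal = getRuntimeContext().getState(
            new ValueStateDescriptor<>("latest", Types.STRING));
        lastUpdate = getRuntimeContext().getState(
            new ValueStateDescriptor<>("lastUpdate", Types.LONG));
    }

    @Override
    public void processElement(Tuple2<Long, String> value, Context ctx,
                               Collector<Tuple2<Long, String>> out) throws Exception {
        long now = ctx.timerService().currentProcessingTime();
        lastVal.update(value.f1);
        lastUpdate.update(now);
        // register a cleanup timer; stale timers are ignored in onTimer
        ctx.timerService().registerProcessingTimeTimer(now + TTL_MS);
        out.collect(value);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx,
                        Collector<Tuple2<Long, String>> out) throws Exception {
        Long updated = lastUpdate.value();
        // only clear the state if the key has really been idle for the full TTL
        if (updated != null && timestamp >= updated + TTL_MS) {
            lastVal.clear();
            lastUpdate.clear();
        }
    }
}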
