How to achieve KGroupTable use case in flink - apache-flink

I am doing a PoC on Flink, but I am not able to find documentation on how to achieve a use case similar to KGroupTable in Kafka Streams, as shown below:
KTable<byte[], Long> aggregatedStream = groupedTable.aggregate(
    () -> 0L,
    (aggKey, newValue, aggValue) -> aggValue + newValue.length(),
    (aggKey, oldValue, aggValue) -> aggValue - oldValue.length(),
    Serdes.Long(),
    "aggregation-table-store");
Use case: I want to aggregate account balances from the transactions I receive. If I get an update for an existing transaction ID, I want to remove the old value and add the new value. Likewise, if a transaction gets cancelled, I want to remove its old value from the account balance.
For example:
TransactionId AccountId Balance
1 account1 1000 // account1 - 1000
2 account1 2000 // account1 - 3000
3 account2 2000 // account1 - 3000, account2 - 2000
1 account1 500 // account1 - 2500, account2 - 2000
In the example above, the fourth row is an update to existing transaction #1, so it removes the old balance (1000) and adds the new balance (500).
Thanks

Here's a sketch of how you could approach that. I used Tuples because I was lazy; it would be better to use POJOs.
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.state.ReducingState;
import org.apache.flink.api.common.state.ReducingStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
public class TransactionsWithRetractions {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStreamSource<Tuple3<Integer, String, Float>> rawInput = env.fromElements(
                new Tuple3<>(1, "account1", 1000.0F),
                new Tuple3<>(2, "account1", 2000.0F),
                new Tuple3<>(3, "account2", 2000.0F),
                new Tuple3<>(1, "account1", 500.0F)
        );

        rawInput
                .keyBy(t -> t.f1)
                .map(new ManageAccounts())
                .print();

        env.execute();
    }

    public static class ManageAccounts extends RichMapFunction<Tuple3<Integer, String, Float>, Tuple2<String, Float>> {
        MapStateDescriptor<Integer, Float> transactionsDesc;
        ReducingStateDescriptor<Float> balanceDesc;

        @Override
        public void open(Configuration parameters) throws Exception {
            // Per-account map of transactionId -> latest amount, plus a running balance
            transactionsDesc = new MapStateDescriptor<>("transactions", Integer.class, Float.class);
            balanceDesc = new ReducingStateDescriptor<>("balance", (f, g) -> f + g, Float.class);
        }

        @Override
        public Tuple2<String, Float> map(Tuple3<Integer, String, Float> event) throws Exception {
            MapState<Integer, Float> transactions = getRuntimeContext().getMapState(transactionsDesc);
            ReducingState<Float> balance = getRuntimeContext().getReducingState(balanceDesc);

            // Retract the previous amount for this transaction id (if any), then apply the new one
            Float currentValue = transactions.get(event.f0);
            if (currentValue == null) {
                currentValue = 0F;
            }
            transactions.put(event.f0, event.f2);
            balance.add(event.f2 - currentValue);

            return new Tuple2<>(event.f1, balance.get());
        }
    }
}
When run, this produces:
1> (account1,1000.0)
8> (account2,2000.0)
1> (account1,3000.0)
1> (account1,2500.0)
Note that this implementation keeps all transactions in state forever, which might become problematic in a real application, though you can scale Flink state to be very large.
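If the ever-growing transaction state does become a problem, one option is to attach a state TTL to the map of transactions so that entries for long-settled transactions are eventually dropped. A minimal sketch of a revised open() method (the 90-day retention is an arbitrary assumption; once a transaction has expired it can no longer be retracted):
// requires org.apache.flink.api.common.state.StateTtlConfig and org.apache.flink.api.common.time.Time
@Override
public void open(Configuration parameters) throws Exception {
    StateTtlConfig ttlConfig = StateTtlConfig
            .newBuilder(Time.days(90))   // assumed retention horizon; pick one after which updates can no longer arrive
            .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
            .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
            .build();

    transactionsDesc = new MapStateDescriptor<>("transactions", Integer.class, Float.class);
    transactionsDesc.enableTimeToLive(ttlConfig);

    balanceDesc = new ReducingStateDescriptor<>("balance", (f, g) -> f + g, Float.class);
}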

Related

Huge checkpoint size using ValueState leading to event processing lag

I have a Flink application that deduplicates multiple streams. It keys by one string field and dedupes using a ValueState inside a RichFilterFunction:
public class DedupeWithState extends RichFilterFunction<Tuple2<String, Message>> {
    private ValueState<Boolean> seen;
    private final ValueStateDescriptor<Boolean> desc;

    public DedupeWithState(long cacheExpirationTimeMs) {
        StateTtlConfig ttlConfig = StateTtlConfig
                .newBuilder(Time.milliseconds(cacheExpirationTimeMs))
                .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
                .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
                .build();
        desc = new ValueStateDescriptor<>("seen", Types.BOOLEAN);
        desc.enableTimeToLive(ttlConfig);
    }

    @Override
    public void open(Configuration conf) {
        seen = getRuntimeContext().getState(desc);
    }

    @Override
    public boolean filter(Tuple2<String, Message> stringMessageTuple2) throws Exception {
        if (seen.value() == null) {
            seen.update(true);
            return true;
        }
        return false;
    }
}
The application consumes 3 streams from Kafka, and each stream has its own dedupe function with a TTL of 4 hours.
DataStream<Tuple2<String, Message>> event1 = event1Input
.keyBy(x->x.f0)
.filter(new DedupeWithState(14400000));
DataStream<Tuple2<String, Message>> event2 = event2Input
.keyBy(x->x.f0)
.filter(new DedupeWithState(14400000));
DataStream<Tuple2<String, Message>> event3 = event3Input
.keyBy(x->x.f0)
.filter(new DedupeWithState(14400000));
Screenshots attached.
Backend properties are:
state.backend: rocksdb
state.backend.incremental: true
state.checkpoints.dir: <azure blob store>
Checkpoint configuration as on WebUI
We are using Flink 1.13.6.
The QPS of each stream is event1 - 7k, event2 - 6k, event3 - 200
Key size is ~110 bytes
Checkpoint interval is 5 mins and incremental checkpoint is enabled.
As per the above configuration (given that incremental checkpointing is enabled), each stream should have roughly the following checkpoint size:
event1 -> ((7000 * 60 * 5) * 110 bytes) = ~220 MB
The issue is that the checkpoint size is huge. It starts from 400 MB (as expected) but grows to 2-3 GB per checkpoint (see the checkpoint history screenshot). This results in heavy backpressure on the dedupe function and overall lag in the system (see the per-operator checkpoint screenshot).
Maybe the state is not being cleaned up, since cleanup is done lazily (on read). From the initial State TTL release post (a bit old, but it may still apply):
When a state object is accessed in a read operation, Flink will check its timestamp and clear the state if it is expired (depending on the configured state visibility, the expired state is returned or not). Due to this lazy removal, expired state that is never accessed again will forever occupy storage space unless it is garbage collected.
I would try a MapState per stream (without keying by) instead of a ValueState per key as you have now, so that the same state is continuously accessed. Alternatively, you may be able to set up a timer in DedupeWithState that accesses the state and forces the cleanup, or that simply clears it (you may need a ProcessFunction to be able to register timers).
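It can also be worth checking how the TTL cleanup strategies are configured. Recent Flink versions enable background cleanup by default, but the strategies can be tuned explicitly; as a hedged sketch, the DedupeWithState constructor could build its TTL config like this (the 1000-entry threshold is an arbitrary example value):
    public DedupeWithState(long cacheExpirationTimeMs) {
        StateTtlConfig ttlConfig = StateTtlConfig
                .newBuilder(Time.milliseconds(cacheExpirationTimeMs))
                .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
                .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
                // drop expired entries when a full snapshot/savepoint is taken
                // (does not help with incremental RocksDB checkpoints)
                .cleanupFullSnapshot()
                // RocksDB compaction filter: drop expired entries during compaction,
                // re-reading the current timestamp after every 1000 processed entries
                .cleanupInRocksdbCompactFilter(1000)
                .build();
        desc = new ValueStateDescriptor<>("seen", Types.BOOLEAN);
        desc.enableTimeToLive(ttlConfig);
    }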
Try something like this:
/**
 * @author sucheth.shivakumar
 */
public class Check extends KeyedProcessFunction<String, Tuple2<String, Message>, Tuple2<String, Message>> {
    private ValueState<Boolean> seen;

    @Override
    public void open(Configuration parameters) throws Exception {
        ValueStateDescriptor<Boolean> desc = new ValueStateDescriptor<>("seen", Types.BOOLEAN);
        seen = getRuntimeContext().getState(desc);
    }

    @Override
    public void processElement(Tuple2<String, Message> value, Context ctx, Collector<Tuple2<String, Message>> out) throws Exception {
        if (seen.value() == null) {
            seen.update(true);
            // emits the record
            out.collect(value);
            // register a timer 4 hours from now to clear the state for this key
            ctx.timerService().registerProcessingTimeTimer(ctx.timerService().currentProcessingTime() + 14400000);
        }
    }

    @Override
    // this fires after 4 hrs have passed and clears the state
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<Tuple2<String, Message>> out) throws Exception {
        // triggers after the ttl has passed
        if (seen.value() != null) {
            seen.clear();
        }
    }
}

JDBI: Connections being automatically closed after idle

I'm relatively new to connection pooling, but from what I've read it seems ideal to leave some connections idle for faster performance.
I'm currently using JDBI, and after idle periods I'll get a
com.microsoft.sqlserver.jdbc.SQLServerException: The connection is closed.
I would assume this is either because my database configuration settings are lacking or that I must be using the framework incorrectly:
config.yml:
database:
  # whether or not idle connections should be validated
  checkConnectionWhileIdle: false
  # the maximum amount of time to wait on an empty pool before throwing an exception
  maxWaitForConnection: 10s
  # Limits for simultaneous connections to DB
  minSize: 10
  initialSize: 10
  maxSize: 100
DAOs:
public class AccountDAO {
private final Jdbi jdbi;
public AccountDAO(Jdbi jdbi) {
this.jdbi = jdbi;
}
public void addAccount(String id) {
jdbi.useHandle(h ->
h.createUpdate("INSERT INTO Account(id) values (:id)")
.bind("id", id)
.execute());
}
}
public class RestaurantDAO {
private final Jdbi jdbi;
public RestaurantDAO(Jdbi jdbi) {
this.jdbi = jdbi;
}
public Optional<RestaurantDTO> getRestaurantByName(String restName) {
return jdbi.withHandle(h ->
h.createQuery("SELECT * FROM Restaurant WHERE restName =:restName")
.bind("restName", restName)
.mapToBean(RestaurantDTO.class)
.findOne());
}
public void addRestaurant(String restName) {
jdbi.useHandle(h ->
h.createUpdate("INSERT INTO Restaurant(restName) values (:restName)")
.bind("restName", restName)
.execute()
);
}
}
public class ReviewDAO {
private final Jdbi jdbi;
public ReviewDAO(Jdbi jdbi) {
this.jdbi = jdbi;
}
public Optional<ReviewDTO> getReviewByAuthorAndRestaurant(String author, String restName) {
return jdbi.withHandle(h ->
h.createQuery("SELECT * FROM Review WHERE author=:author AND restName =:restName")
.bind("author", author)
.bind("restName", restName)
.mapToBean(ReviewDTO.class)
.findOne());
}
public List<ReviewDTO> getReviewsByAuthor(String author) {
return jdbi.withHandle(h ->
h.createQuery("SELECT * FROM Review WHERE author =:author ORDER BY created DESC")
.bind("author", author)
.mapToBean(ReviewDTO.class)
.list());
}
public List<ReviewDTO> getReviewsByRestaurant(String restName) {
return jdbi.withHandle(h ->
h.createQuery("SELECT * FROM Review WHERE restName =:restName ORDER BY created DESC")
.bind("restName", restName)
.mapToBean(ReviewDTO.class)
.list());
}
public List<ReviewDTO> getRecentReviews() {
return jdbi.withHandle(h ->
h.createQuery("SELECT top 5 * FROM Review ORDER BY created DESC")
.mapToBean(ReviewDTO.class)
.list());
}
public void addReview(String author, String restName, String title, String bodyText, int rating) {
jdbi.useHandle(h ->
h.createUpdate("INSERT INTO Review(bodyText, rating, restName, author, title) values (:bodyText, :rating, :restName, :author, :title)")
.bind("bodyText", bodyText)
.bind("rating", rating)
.bind("restName", restName)
.bind("author", author)
.bind("title", title)
.execute());
}
public void updateReview(String author, String restName, String title, String bodyText, int rating) {
jdbi.useHandle(h ->
h.createUpdate("UPDATE Review SET bodyText=:bodyText, rating=:rating, title=:title where author=:author AND restName=:restName")
.bind("bodyText", bodyText)
.bind("rating", rating)
.bind("title", title)
.bind("author", author)
.bind("restName", restName)
.execute());
}
public void deleteReview(String author, String restName) {
jdbi.useHandle(h ->
h.createUpdate("DELETE FROM Review WHERE author=:author AND restName=:restName")
.bind("author", author)
.bind("restName", restName)
.execute());
}
}
Using the setting
checkConnectionOnBorrow: true
might work, but I would assume that the ideal solution would be to prevent my initial connections from being closed in the first place.
Any assistance is appreciated.
It turns out my DB host, Azure, automatically closes idle connections after 30 minutes. For the time being, I've added aggressive validation settings to my config to renew the pool accordingly: with checkConnectionWhileIdle enabled and minIdleTime kept below Azure's 30-minute cutoff, idle connections are validated and evicted before the server closes them. I'll probably just switch hosts, since it doesn't look like the timeout can be configured on Azure's end.
validationQuery: "/* APIService Health Check */ SELECT 1"
validationQueryTimeout: 3s
checkConnectionWhileIdle: true
minIdleTime: 25m
evictionInterval: 5s
validationInterval: 1m

Change in Behavior for EventTimeSessionWindows from Flink 1.11.1 to 1.14.0

I observed what appears to be a change in behavior for EventTimeSessionWindows when upgrading from 1.11.1 to 1.14.0. This was identified in a unit test.
For a window with a defined time gap of 10 seconds:
Publish KEY_1 with eventtime 1 second
Publish KEY_1 with eventtime 3 seconds
Publish KEY_1 with eventtime 2 seconds
Publish KEY_2 with eventtime 101 seconds
For Flink 1.11.1, the window for KEY_1 closes, reduces, and publishes, presumably because KEY_2 has an event time more than 10 seconds after the last message in KEY_1's window. The KEY_2 window does not close. In the absence of KEY_2, the KEY_1 window would not close either.
For Flink 1.14.0 the main difference is that the window for KEY_2 DOES close even though there are no new messages after 111 seconds.
This appears to be a change in behavior. The nearest I could find was https://issues.apache.org/jira/browse/FLINK-20443 but that’s in 1.14.1. I also noticed https://issues.apache.org/jira/browse/FLINK-19777 which was in 1.11.3 but couldn't ascertain if that would have resulted in this behavior. Is there an explanation for this change in behavior? Is it expected or desirable? Is it because all pending windows are automatically closed based on an updated trigger behavior?
I tested the same behavior for ProcessingTimeSessionWindows and did not observe a similar change in behavior.
Thanks.
Jai
@Test
public void testEventTime() throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// configure your test environment
env.setParallelism(1);
env.getConfig().registerTypeWithKryoSerializer(Document.class, ProtobufSerializer.class);
// values are collected in a static variable
CollectSink.values.clear();
// create a stream of custom elements and apply transformations
SingleOutputStreamOperator<Tuple2<String, Document>> inputStream = buildStream(env, this.generateTestOrders());
SingleOutputStreamOperator<Tuple2<String, Document>> intermediateStream = this.documentDebounceFunction.insertIntoPipeline(inputStream);
intermediateStream.addSink(new CollectSink());
// execute
env.execute();
// verify your results
Assertions.assertEquals(1, CollectSink.values.size());
Map<String, Long> expectedVersions = Maps.newHashMap();
expectedVersions.put(KEY_1, 2L);
for (Tuple2<String, Document> actual : CollectSink.values) {
Assertions.assertEquals(expectedVersions.get(actual.f0), actual.f1.getVersion());
}
}
// create a testing sink
private static class CollectSink implements SinkFunction<Tuple2<String, Document>> {
// must be static
public static final List<Tuple2<String, Document>> values = Collections.synchronizedList(new ArrayList<>());
@Override
public void invoke(Tuple2<String, Document> value, SinkFunction.Context context) {
values.add(value);
}
}
public List<Tuple2<String, Document>> generateTestOrders() {
List<Tuple2<String, Document>> testMessages = Lists.newArrayList();
// KEY_1
testMessages.add(
Tuple2.of(
KEY_1,
Document.newBuilder()
.setVersion(1)
.setUpdatedAt(
Timestamp.newBuilder().setSeconds(1).build())
.build()));
testMessages.add(
Tuple2.of(
KEY_1,
Document.newBuilder()
.setVersion(2)
.setUpdatedAt(
Timestamp.newBuilder().setSeconds(3).build())
.build()));
testMessages.add(
Tuple2.of(
KEY_1,
Document.newBuilder()
.setVersion(3)
.setUpdatedAt(
Timestamp.newBuilder().setSeconds(2).build())
.build()));
// KEY_2 -- WAY IN THE FUTURE
testMessages.add(
Tuple2.of(
KEY_2,
Document.newBuilder()
.setVersion(15)
.setUpdatedAt(
Timestamp.newBuilder().setSeconds(101).build())
.build()));
return ImmutableList.copyOf(testMessages);
}
private SingleOutputStreamOperator<Tuple2<String, Document>> buildStream(
StreamExecutionEnvironment executionEnvironment,
List<Tuple2<String, Document>> inputMessages) {
inputMessages =
inputMessages.stream()
.sorted(
Comparator.comparingInt(
msg -> (int) ProtobufTypeConversion.toMillis(msg.f1.getUpdatedAt())))
.collect(Collectors.toList());
WatermarkStrategy<Tuple2<String, Document>> watermarkStrategy =
WatermarkStrategy.forMonotonousTimestamps();
return executionEnvironment
.fromCollection(
inputMessages, TypeInformation.of(new TypeHint<Tuple2<String, Document>>() {}))
.assignTimestampsAndWatermarks(
watermarkStrategy.withTimestampAssigner(
(event, timestamp) -> Timestamps.toMillis(event.f1.getUpdatedAt())));
}

Flink JDBC Sink part 2

I posted a question a few days back: Flink Jdbc sink
Now I am trying to use the JDBC sink provided by Flink.
I have written the code and it runs, but nothing gets saved to the DB and no exceptions are thrown. With the previous sink my code never finished (which is what should happen, since it is a streaming app), but with the following code I get no errors and nothing is saved to the DB.
public class CompetitorPipeline implements Pipeline {
private final StreamExecutionEnvironment streamEnv;
private final ParameterTool parameter;
private static final Logger LOG = LoggerFactory.getLogger(CompetitorPipeline.class);
public CompetitorPipeline(StreamExecutionEnvironment streamEnv, ParameterTool parameter) {
this.streamEnv = streamEnv;
this.parameter = parameter;
}
@Override
public KeyedStream<CompetitorConfig, String> start(ParameterTool parameter) throws Exception {
CompetitorConfigChanges competitorConfigChanges = new CompetitorConfigChanges();
KeyedStream<CompetitorConfig, String> competitorChangesStream = competitorConfigChanges.run(streamEnv, parameter);
//Add to JBDC Sink
competitorChangesStream.addSink(JdbcSink.sink(
"insert into competitor_config_universe(marketplace_id,merchant_id, competitor_name, comp_gl_product_group_desc," +
"category_code, competitor_type, namespace, qualifier, matching_type," +
"zip_region, zip_code, competitor_state, version_time, compConfigTombstoned, last_updated) values (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)",
(ps, t) -> {
ps.setInt(1, t.getMarketplaceId());
ps.setLong(2, t.getMerchantId());
ps.setString(3, t.getCompetitorName());
ps.setString(4, t.getCompGlProductGroupDesc());
ps.setString(5, t.getCategoryCode());
ps.setString(6, t.getCompetitorType());
ps.setString(7, t.getNamespace());
ps.setString(8, t.getQualifier());
ps.setString(9, t.getMatchingType());
ps.setString(10, t.getZipRegion());
ps.setString(11, t.getZipCode());
ps.setString(12, t.getCompetitorState());
ps.setTimestamp(13, Timestamp.valueOf(t.getVersionTime()));
ps.setBoolean(14, t.isCompConfigTombstoned());
ps.setTimestamp(15, new Timestamp(System.currentTimeMillis()));
System.out.println("sql"+ps);
},
new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
.withUrl("jdbc:mysql://127.0.0.1:3306/database")
.withDriverName("com.mysql.cj.jdbc.Driver")
.withUsername("xyz")
.withPassword("xyz#")
.build()));
return competitorChangesStream;
}
}
You need to enable auto-commit mode for the JDBC sink.
new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
.withUrl("jdbc:mysql://127.0.0.1:3306/database;autocommit=true")
It looks like SimpleBatchStatementExecutor only works in auto-commit mode. If you need to commit and roll back batches, then you have to write your own JdbcBatchStatementExecutor.
Have you tried including the JdbcExecutionOptions?
dataStream.addSink(JdbcSink.sink(
sql_statement,
(statement, value) -> {
/* Prepared Statement */
},
JdbcExecutionOptions.builder()
.withBatchSize(5000)
.withBatchIntervalMs(200)
.withMaxRetries(2)
.build(),
new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
.withUrl("jdbc:mysql://127.0.0.1:3306/database")
.withDriverName("com.mysql.cj.jdbc.Driver")
.withUsername("xyz")
.withPassword("xyz#")
.build()));
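Separately, one thing to check, offered as a guess rather than a confirmed diagnosis: the JDBC sink buffers rows and flushes them when the batch size is reached, when the batch interval fires, or on a checkpoint. With only a trickle of records, the default batch size of 5000, and checkpointing disabled, nothing may ever be flushed to the database. Enabling checkpointing forces periodic flushes, for example:
// Assumes this runs where the StreamExecutionEnvironment is configured;
// the 10-second interval is an arbitrary example.
streamEnv.enableCheckpointing(10_000);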

Flink SQL: How can I use a Long type column as rowtime

Flink 1.9.1
I read a CSV file and I want to use a Long column for a TUMBLE window.
I use a UDF to convert the Long to a Timestamp, but it doesn't work.
Error message: Window can only be defined over a time attribute column.
I tried to debug: the column's type is not a TimeIndicatorRelDataType, and I don't know how to convert it or why that is.
def isTimeIndicatorType(relDataType: RelDataType): Boolean = relDataType match {
case ti: TimeIndicatorRelDataType => true
case _ => false
}
CODE
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
env.setParallelism(1);
// read csv
URL fileUrl = HotItemsSql.class.getClassLoader().getResource("UserBehavior-less.csv");
CsvTableSource csvTableSource = CsvTableSource.builder().path(fileUrl.getPath())
.field("userId", BasicTypeInfo.LONG_TYPE_INFO)
.field("itemId", BasicTypeInfo.LONG_TYPE_INFO)
.field("categoryId", BasicTypeInfo.LONG_TYPE_INFO)
.field("behavior", BasicTypeInfo.LONG_TYPE_INFO)
.field("optime", BasicTypeInfo.LONG_TYPE_INFO)
.build();
// trans to stream
DataStream<Row> csvDataStream=csvTableSource.getDataStream(env).assignTimestampsAndWatermarks(new AscendingTimestampExtractor<Row>() {
@Override
public long extractAscendingTimestamp(Row element) {
return Timestamp.valueOf(element.getField(5).toString()).getTime();
}
}).broadcast();
StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
tableEnv.registerDataStream("T_UserBehavior",csvDataStream,"userId,itemId,categoryId,behavior,optime");
tableEnv.registerFunction("Long2DateTime",new DateTransFunction());
Table result = tableEnv.sqlQuery("select userId," +
"TUMBLE_START(Long2DateTime(optime), INTERVAL '10' SECOND) as window_start," +
"TUMBLE_END(Long2DateTime(optime), INTERVAL '10' SECOND) as window_end " +
"from T_UserBehavior " +
"group by TUMBLE(Long2DateTime(optime),INTERVAL '10' SECOND),userId");
tableEnv.toRetractStream(result, Row.class).print();
UDF
import java.sql.Timestamp;
public class DateTransFunction extends ScalarFunction {
public Timestamp eval(Long longTime) {
try {
Timestamp t = new Timestamp(longTime);
return t;
} catch (Exception e) {
return null;
}
}
}
error stack
Exception in thread "main" org.apache.flink.table.api.ValidationException: Window can only be defined over a time attribute column.
at org.apache.flink.table.plan.rules.datastream.DataStreamLogicalWindowAggregateRule.getOperandAsTimeIndicator$1(DataStreamLogicalWindowAggregateRule.scala:85)
at org.apache.flink.table.plan.rules.datastream.DataStreamLogicalWindowAggregateRule.translateWindowExpression(DataStreamLogicalWindowAggregateRule.scala:90)
at org.apache.flink.table.plan.rules.common.LogicalWindowAggregateRule.onMatch(LogicalWindowAggregateRule.scala:68)
at org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:319)
at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:560)
at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:419)
at org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:256)
at org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127)
at org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:215)
at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:202)
at org.apache.flink.table.plan.Optimizer.runHepPlanner(Optimizer.scala:228)
at org.apache.flink.table.plan.Optimizer.runHepPlannerSequentially(Optimizer.scala:194)
at org.apache.flink.table.plan.Optimizer.optimizeNormalizeLogicalPlan(Optimizer.scala:150)
at org.apache.flink.table.plan.StreamOptimizer.optimize(StreamOptimizer.scala:65)
at org.apache.flink.table.planner.StreamPlanner.translateToType(StreamPlanner.scala:410)
at org.apache.flink.table.planner.StreamPlanner.org$apache$flink$table$planner$StreamPlanner$$translate(StreamPlanner.scala:182)
Since you already managed to assign a timestamp in DataStream API, you should be able to call:
tableEnv.registerDataStream(
"T_UserBehavior",
csvDataStream,
"userId, itemId, categoryId, behavior, rt.rowtime");
The .rowtime suffix instructs the API to create a column from the timestamp stored in every stream record coming from the DataStream API.
The community is currently working on making programs like yours easier to write. In Flink 1.10 you should be able to define your CSV table with a rowtime attribute directly in SQL DDL.
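For illustration, once rt is registered as a rowtime attribute as shown above, the tumbling-window query from the question could be written against rt directly (a sketch reusing the column names from the question):
Table result = tableEnv.sqlQuery(
        "SELECT userId, " +
        "  TUMBLE_START(rt, INTERVAL '10' SECOND) AS window_start, " +
        "  TUMBLE_END(rt, INTERVAL '10' SECOND) AS window_end " +
        "FROM T_UserBehavior " +
        "GROUP BY TUMBLE(rt, INTERVAL '10' SECOND), userId");
tableEnv.toRetractStream(result, Row.class).print();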
