In Flink, how can a java.util.Date property of a POJO be converted to TIMESTAMP(3)? - apache-flink

I would like to register an incoming stream of objects as a table in the Flink Table environment.
These objects have a transactionTime property whose type is java.util.Date.
When I use
Table incomingStream = tableEnv.fromDataStream(
        incomingDataStream,
        Schema.newBuilder()
                .columnByExpression("processTime", "PROCTIME()") // extract processing time into a column
                .build());
tableEnv.createTemporaryView("incomingDataStream", incomingStream);
tableEnv.executeSql("DESCRIBE incomingDataStream").print();
I see that transactionTime is registered as
| transactionTime | RAW('java.util.Date', '...') | TRUE
How can I register this field as TIMESTAMP(3), so that I can use it as event time?
transactionTime | TIMESTAMP(3) | TRUE

Related

SSRS multiple filters but getting only the headers

I have four filters in the report, as below:
Customer: based on the report's customer field
Loan ref: based on the Loan ref column
Type: based on the Type column
Date: the user can select a date, based on the date column
All of the above filters are bound to parameters on the report columns, and null values are accepted in all filters.
Even though I enter the customer number in the filter, the report outputs only the column headers.
The report headers are as follows:
Date | company | Loan ref | Customer | Type | Amount

Flink SQL Watermark Strategy After Join Operation

My problem is that I cannot use the ORDER BY clause after the JOIN operation. To reproduce the problem,
CREATE TABLE stack (
id INT PRIMARY KEY NOT ENFORCED,
ts TIMESTAMP(3),
WATERMARK FOR ts AS ts - INTERVAL '1' SECOND
) WITH (
'connector' = 'datagen',
'rows-per-second' = '5',
'fields.id.kind'='sequence',
'fields.id.start'='1',
'fields.id.end'='100'
);
This table has a watermark strategy and TIMESTAMP(3) *ROWTIME* type on ts.
Flink SQL> DESC stack;
+------+------------------------+-------+---------+--------+----------------------------+
| name |                   type |  null |     key | extras |                  watermark |
+------+------------------------+-------+---------+--------+----------------------------+
|   id |                    INT | FALSE | PRI(id) |        |                            |
|   ts | TIMESTAMP(3) *ROWTIME* |  TRUE |         |        | `ts` - INTERVAL '1' SECOND |
+------+------------------------+-------+---------+--------+----------------------------+
2 rows in set
However, if I define a view as a simple self-join
CREATE VIEW self_join AS (
SELECT l.ts, l.id, r.id
FROM stack as l INNER JOIN stack as r
ON l.id=r.id
);
it loses the watermark strategy but not the type,
Flink SQL> DESC self_join;
+------+------------------------+-------+-----+--------+-----------+
| name |                   type |  null | key | extras | watermark |
+------+------------------------+-------+-----+--------+-----------+
|   ts | TIMESTAMP(3) *ROWTIME* |  TRUE |     |        |           |
|   id |                    INT | FALSE |     |        |           |
|  id0 |                    INT | FALSE |     |        |           |
+------+------------------------+-------+-----+--------+-----------+
3 rows in set
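For example, a sort along these lines is rejected in streaming mode (a representative sketch):
SELECT * FROM self_join ORDER BY ts;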
I assumed we could preserve the watermark strategy and use ORDER BY after a JOIN operation, but this is not the case. How can I add a watermark strategy back to the VIEW?
Thanks in advance.
Whenever Flink SQL performs a regular join in streaming mode (a join without any sort of temporal constraint), it's not possible for the result to have watermarks. Which in turn means that you can't sort or apply windowing to the result.
Why is this, and what can you do about it?
Background
Flink SQL uses time attributes (in this case, stack.ts) to optimize state retention. Because the stack stream/table has a time attribute, we know that this stream will be processed more-or-less in order, by time (the elements are constrained to be at most 1 second out-of-order). This then places a tight constraint on how much state must be retained in order to perform an operation like sorting this table -- a 1-second-long buffer will be enough.
If stack didn't have a time attribute defined on it (i.e., a timestamp field with watermarking defined on it), then Flink SQL would refuse to sort it (in streaming mode) because doing so would require keeping around an unbounded amount of state, and it would be impossible to know how long to wait before emitting the first result.
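For example (an illustrative sketch, not from the original answer), streaming mode will accept a sort that leads with the ascending time attribute, but reject one on an ordinary column:
-- Accepted: ts is a time attribute, so a small ordering buffer suffices
SELECT * FROM stack ORDER BY ts;
-- Rejected in streaming mode: sorting on a non-time attribute
-- would require buffering an unbounded amount of state
SELECT * FROM stack ORDER BY id;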
The result of a regular join cannot have a well-defined watermark strategy
Any type of regular join requires that Flink store in its state backend all rows of the input tables forever (which Flink is willing to try to do). But more to the point, watermarking is not well-defined on the result, because there are no constraints on how out-of-order it might be.
What you can do
If you modify the join to be either an interval join or a temporal join then the result will still have watermarks. E.g., you could do this:
CREATE VIEW self_join AS (
SELECT l.ts, l.id, r.id
FROM stack as l INNER JOIN stack as r
ON l.id=r.id
WHERE l.ts BETWEEN r.ts - INTERVAL '1' MINUTE AND r.ts
);
or you could do this:
CREATE VIEW self_join AS (
SELECT l.ts, l.id, r.id
FROM stack as l INNER JOIN stack as r FOR SYSTEM_TIME AS OF l.ts
ON l.id=r.id
);
In both of these cases, Flink's SQL engine will be able to retain less state than with the regular join, and it will be able to produce watermarks in the output stream/table.
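With either rewrite, the sort from the original question should then be accepted (a sketch, assuming the interval or temporal version of self_join above):
SELECT * FROM self_join ORDER BY ts;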
Another possible solution would be to convert the result table to a DataStream, then use the DataStream API to apply watermarking, and then convert that stream back to a table. But that's only going to make sense if you have some domain knowledge that allows you to know how out-of-order the result stream might be -- and you probably could have expressed that same information as either an interval or temporal join.

add tracking to an existing function in postgres plpgsql

I have a Postgres table that gets its data (inserts/updates/deletes) from a function that performs these operations.
I am looking to add tracking functionality to that function, so that it automatically records the old value, the new value, the type of operation performed on the table (insert/update/delete), and a timestamp in a new table.
I am trying to make the logging table in the following format:
old_val | new_val | Type_of_operation | Timestamp
--------|---------|-------------------|--------------------
a,b,c,d | w,x,y,z | update            | 11:09PM 01/08/2019
1,2,3,4 |         | delete            |  2:05PM 02/12/2018
        | ki,hjko | insert            | 09:00AM 02/10/2018
I was explicitly asked not to use triggers and to use plpgsql.
Is there any way I can modify the function, in plpgsql, so that it writes to the logging table at the same time as the original table, including the type of operation?
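Something along these lines is what I have in mind, with hypothetical table, column, and function names (orders, val, logging_table, op_timestamp) standing in for the real ones:
CREATE OR REPLACE FUNCTION update_order(p_id int, p_new_val text)
RETURNS void AS $$
DECLARE
    v_old_val text;
BEGIN
    -- capture the old value before modifying the row
    SELECT val INTO v_old_val FROM orders WHERE id = p_id;

    UPDATE orders SET val = p_new_val WHERE id = p_id;

    -- write the audit row from the same function call, no trigger involved
    INSERT INTO logging_table (old_val, new_val, type_of_operation, op_timestamp)
    VALUES (v_old_val, p_new_val, 'update', now());
END;
$$ LANGUAGE plpgsql;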

How to model a table in Cassandra for selecting with a WHERE condition

Suppose I have a table like the one below:
create table userevent (
    id uuid,
    eventtype text,
    sourceip text,
    user text,
    sessionid text,
    roleid int,
    menu text,
    action text,
    log text,
    date timestamp,
    PRIMARY KEY (id, eventtype)
);
 id                                   | action | date                     | eventtype | log      | menu      | roleid | sessionid | sourceip     | user
--------------------------------------+--------+--------------------------+-----------+----------+-----------+--------+-----------+--------------+-------
 6ac47b10-d6bb-11e8-bb9a-59dfa00365c6 |  Login | 2018-10-01 04:05:00+0000 |  DemoType |  demolog |  demomenu |      1 |    Demo_1 | 121.11.11.12 |  Aqib
 62119cf0-d6bb-11e8-bb9a-59dfa00365c6 |  Login | 2018-05-31 22:35:00+0000 | DemoType3 | demolog3 | demomenu3 |      3 |    Demo_3 | 121.11.11.12 | Jasim
 5ebb4600-d6bb-11e8-bb9a-59dfa00365c6 |  Login | 2018-05-31 22:35:00+0000 | DemoType3 | demolog3 | demomenu3 |      3 |    Demo_3 | 121.11.11.12 | Jasim
So how could I select all the data that satisfies a condition like user='something' or eventtype='something' from my table?
When I tried a simple SELECT query with the WHERE condition user='Aqib', it gave an error. I know that data modeling in Cassandra is not the same as in SQL.
Any help is much appreciated.
How should the table definition above be changed to satisfy the queries below?
select * from userevent where user='Aqib';
select * from userevent where eventtype='DemoType';
select * from userevent where action='Login';
etc
First things first: there is no OR in Cassandra queries.
If your queries always restrict a field such as user, eventtype, or action, then I suggest creating a separate table for each of these query patterns: one table that supports querying by user, in which the partition key is the user field; another with eventtype as the partition key to support querying by eventtype; one table for the action field; and so on. This follows Cassandra's data-modeling principle of designing tables around your queries.
So the table that supports querying with user should be:
CREATE TABLE userTable (
user text,
id uuid,
eventtype text,
sourceip text,
sessionid text,
roleid int,
menu text,
action text,
log text,
PRIMARY KEY (user, id)  -- id as a clustering column, so one user can have many events
);
The table that supports querying with eventtype should be:
CREATE TABLE eventtypeTable (
eventtype text,
id uuid,
user text,
sourceip text,
sessionid text,
roleid int,
menu text,
action text,
log text,
PRIMARY KEY (eventtype, id)  -- id as a clustering column, so one event type can have many events
);
And you can create as many tables as you want, each supporting a query.
Then, when you execute your queries (for example, in your application code), if you have the value of the user field, query the table in which user is the partition key and restrict on that field, for example:
select * from userTable where user='Aqib';
If instead you have the eventtype field value, query the table in which the partition key is eventtype, for example:
select * from eventtypeTable where eventtype='DemoType';
and similarly for the other fields and their tables.
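One caveat that goes with this approach (a sketch, assuming the two tables above): since each event is denormalized into several tables, every write has to go to all of them, and a logged batch keeps those inserts consistent:
BEGIN BATCH
    INSERT INTO userTable (user, id, eventtype, action)
    VALUES ('Aqib', 6ac47b10-d6bb-11e8-bb9a-59dfa00365c6, 'DemoType', 'Login');
    INSERT INTO eventtypeTable (eventtype, id, user, action)
    VALUES ('DemoType', 6ac47b10-d6bb-11e8-bb9a-59dfa00365c6, 'Aqib', 'Login');
APPLY BATCH;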

Automatic update of values in SQL Server

For example, I have a table tb with the columns:
order_id | date_ordered | due_date | status
Is there any out-of-the-box solution that automatically updates the status column when the current time (from the server) reaches the value of the due_date column? How do I do it?
Thanks in advance.
UPDATE:
Something like this :
test1 | 2016-03-30 09:19:06.610 | 2016-03-30 11:19:06.610 | NEW
test2 | 2016-03-30 09:22:43.513 | 2016-03-30 11:22:43.513 | NEW
test3 | 2016-03-30 09:06:03.627 | 2016-03-30 11:06:03.627 | NEW
When the server time reaches 2016-03-30 11:19:06.610, test1's status should change to, say, overdue.
It depends on what you mean by "out of the box solution". You could create a SQL Server Agent job that checks every minute whether due_date is less than or equal to the current date and time and changes the status column.
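The job step would then run an UPDATE along these lines (a sketch using the question's column names, with an assumed 'NEW'/'OVERDUE' convention):
UPDATE tb
SET [status] = 'OVERDUE'
WHERE due_date <= GETDATE()
  AND [status] = 'NEW';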
A computed column might be another, much simpler solution.
A table like this might suffice:
CREATE TABLE tb_test (
order_id INT PRIMARY KEY,
date_ordered DATETIME,
due_date DATETIME,
[status] as
CASE WHEN due_date <= GETDATE() THEN 'overdue'
ELSE 'new' END
);
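A quick sketch of how the computed column behaves, using values along the lines of the question's sample data:
INSERT INTO tb_test (order_id, date_ordered, due_date)
VALUES (1, '2016-03-30 09:19:06.610', '2016-03-30 11:19:06.610');

-- [status] is evaluated at query time, so once GETDATE() passes
-- due_date the same row reads back as 'overdue'
SELECT order_id, due_date, [status] FROM tb_test;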
