Temporal Table function: Cannot add expression of different type to set - apache-flink

I am using a temporal table function to join two streams as shown below, but I get the following error.
The difference between the set type and the expression type is the type of proctime0: one is declared NOT NULL and the other is not.
How does this difference arise, and is there any way to solve it?
Exception in thread "main" java.lang.AssertionError: Cannot add expression of different type to set:
set type is RecordType(VARCHAR(2147483647) CHARACTER SET "UTF-16LE" order_id, DECIMAL(32, 2) price, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" currency, TIMESTAMP(3) order_time, TIMESTAMP_LTZ(3) *PROCTIME* NOT NULL proctime, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" currency0, BIGINT conversion_rate, TIMESTAMP(3) update_time, TIMESTAMP_LTZ(3) *PROCTIME* proctime0) NOT NULL
expression type is RecordType(VARCHAR(2147483647) CHARACTER SET "UTF-16LE" order_id, DECIMAL(32, 2) price, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" currency, TIMESTAMP(3) order_time, TIMESTAMP_LTZ(3) *PROCTIME* NOT NULL proctime, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" currency0, BIGINT conversion_rate, TIMESTAMP(3) update_time, TIMESTAMP_LTZ(3) *PROCTIME* NOT NULL proctime0) NOT NULL
set is rel#61:LogicalCorrelate.NONE.any.None: 0.[NONE].[NONE](left=HepRelVertex#59,right=HepRelVertex#60,correlation=$cor0,joinType=inner,requiredColumns={4})
expression is LogicalJoin(condition=[__TEMPORAL_JOIN_CONDITION($4, $7, __TEMPORAL_JOIN_CONDITION_PRIMARY_KEY($5))], joinType=[inner])
  LogicalProject(order_id=[$0], price=[$1], currency=[$2], order_time=[$3], proctime=[PROCTIME()])
    LogicalTableScan(table=[[default_catalog, default_database, orders]])
  LogicalProject(currency=[$0], conversion_rate=[$1], update_time=[$2], proctime=[PROCTIME()])
    LogicalTableScan(table=[[default_catalog, default_database, currency_rates]])
Fact Table:
CREATE TABLE `orders` (
order_id STRING,
price DECIMAL(32,2),
currency STRING,
order_time TIMESTAMP(3),
proctime as PROCTIME()
) WITH (
'properties.bootstrap.servers' = '127.0.0.1:9092',
'properties.group.id' = 'test',
'scan.topic-partition-discovery.interval' = '10000',
'connector' = 'kafka',
'format' = 'json',
'scan.startup.mode' = 'latest-offset',
'topic' = 'test1'
)
Build Table:
CREATE TABLE `currency_rates` (
currency STRING,
conversion_rate BIGINT,
update_time TIMESTAMP(3),
proctime as PROCTIME()
) WITH (
'properties.bootstrap.servers' = '127.0.0.1:9092',
'properties.group.id' = 'test',
'scan.topic-partition-discovery.interval' = '10000',
'connector' = 'kafka',
'format' = 'json',
'scan.startup.mode' = 'latest-offset',
'topic' = 'test3'
)
The temporal table function is created and registered like this:
TemporalTableFunction table_rate = tEnv.from("currency_rates")
.createTemporalTableFunction("update_time", "currency");
tEnv.registerFunction("rates", table_rate);
Join logic:
SELECT
order_id,
price,
s.currency,
conversion_rate,
order_time
FROM orders AS o,
LATERAL TABLE (rates(o.proctime)) AS s
WHERE o.currency = s.currency

Try using tEnv.createTemporarySystemFunction("rates", table_rate) instead of the deprecated .registerFunction().
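For reference, a minimal sketch of the registration the answer suggests, reusing the tEnv and currency_rates table from the question (whether this alone removes the plan type mismatch may depend on the Flink version):
// Create the temporal table function exactly as in the question.
TemporalTableFunction table_rate = tEnv.from("currency_rates")
    .createTemporalTableFunction("update_time", "currency");
// Register it with the newer API instead of the deprecated registerFunction().
tEnv.createTemporarySystemFunction("rates", table_rate);
The LATERAL TABLE (rates(o.proctime)) query itself stays unchanged.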

Related

Using flink sql how to convert a string read using json_value to timestamp_ltz

CREATE TABLE roles_created_raw_v1
(
id VARCHAR,
created VARCHAR,
PRIMARY KEY (id) NOT ENFORCED
) WITH (
'connector' = 'upsert-kafka',
'topic' = 'sink_topic',
'properties.bootstrap.servers' = 'localhost:29092,localhost:39092',
'properties.group.id' = 'sink_topic_id',
'value.format' = 'json',
'key.format' = 'json',
'properties.allow.auto.create.topics' = 'true',
'value.json.timestamp-format.standard' = 'ISO-8601',
'sink.parallelism' = '3'
);
I am trying to insert into this table using:
insert into roles_created_raw_v1
select
JSON_VALUE(contentJson, '$.id') as id,
to_timestamp(JSON_VALUE(contentJson, '$.created'), 'yyyy-MM-ddTHH:mm:ss.SSSZ') as created
from some_raw_table;
My contentJson field has
"contentJson": "{\"created\":\"2023-02-04T04:12:07.925Z\"}".
The created field in the sink_topic and in the roles_created_raw_v1 table is null. How do I get this converted to a TIMESTAMP_LTZ field?
If I use JSON_VALUE(contentJson, '$.created' RETURNING STRING) instead of to_timestamp(JSON_VALUE(contentJson, '$.created'), 'yyyy-MM-ddTHH:mm:ss.SSSZ'), I get the string value back.
to_timestamp(replace(replace(JSON_VALUE(contentJson, '$.created'), 'T', ' '), 'Z', ' '), 'yyyy-MM-dd HH:mm:ss.SSS')
This seems to work.
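Applied to the original statement, the workaround would look roughly like this (a sketch, assuming the same some_raw_table and contentJson field as above):
insert into roles_created_raw_v1
select
  JSON_VALUE(contentJson, '$.id') as id,
  -- strip the literal 'T' and 'Z' so the plain 'yyyy-MM-dd HH:mm:ss.SSS' pattern matches
  to_timestamp(
    replace(replace(JSON_VALUE(contentJson, '$.created'), 'T', ' '), 'Z', ' '),
    'yyyy-MM-dd HH:mm:ss.SSS') as created
from some_raw_table;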

Why is Flink 1.15.2 showing "No Watermark (Watermarks are only available if EventTime is used)"?

In my CREATE TABLE DDL I have set a watermark on a column, and I am doing a simple COUNT(DISTINCT userId) over a 1-minute tumble window, but I still get no data. The same simple job works fine in 1.13.
CREATE TABLE test (
eventName String,
ingestion_time BIGINT,
time_ltz AS TO_TIMESTAMP_LTZ(ingestion_time, 3),
props ROW(userId VARCHAR, id VARCHAR, tourName VARCHAR, advertiserId VARCHAR, deviceId VARCHAR, tourId VARCHAR),
WATERMARK FOR time_ltz AS time_ltz - INTERVAL '5' SECOND
) WITH (
'connector' = 'kafka',
'topic' = 'test',
'scan.startup.mode' = 'latest-offset',
'properties.bootstrap.servers' = 'localhost:9092',
'properties.group.id' = 'local_test_flink_115',
'format' = 'json',
'json.ignore-parse-errors' = 'true',
'scan.topic-partition-discovery.interval' = '60000'
);
We have migrated other jobs as well, but their output does not match either. Is there any default watermark setting we need to set?
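One common cause worth ruling out (an assumption, not a confirmed diagnosis for this job) is that with 'scan.startup.mode' = 'latest-offset' some Kafka partitions receive no data, so the combined watermark never advances. Declaring a source idle timeout lets the watermark progress despite idle partitions:
-- Assumption: idle Kafka partitions are holding the watermark back.
-- After 30 seconds without data a split is marked idle and no longer
-- blocks watermark advancement.
SET 'table.exec.source.idle-timeout' = '30 s';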

Flink Dynamic Tables Temporal Join - Calcite error

I think I am doing exactly what was described in this Stack Overflow question (Flink temporal join not showing data) and in the official documentation for joining two data streams with a temporal join, but I keep getting the following error:
[ERROR] Could not execute SQL statement. Reason:
java.lang.ClassCastException: org.apache.flink.table.planner.plan.nodes.calcite.LogicalWatermarkAssigner cannot be cast to org.apache.calcite.rel.core.TableScan
while executing the following SQL:
SELECT v.rid, u.name, v.parsed_timestamp FROM VEHICLES v JOIN USERS FOR SYSTEM_TIME AS OF v.parsed_timestamp AS u ON v.userID = u.userID;
I have two dynamic tables that must be joined: a regular join is not enough since I need to keep the rowtime column from VEHICLES for further processing, and a lookup join is not an option since both sources come from Kafka.
The two tables were created with:
CREATE TABLE USERS (
userID BIGINT,
name STRING,
ts STRING,
parsed_timestamp AS TO_TIMESTAMP(ts),
WATERMARK FOR parsed_timestamp AS parsed_timestamp - INTERVAL '5' SECONDS,
PRIMARY KEY(userID) NOT ENFORCED
) WITH (
'connector' = 'kafka',
'topic' = 'USERS',
'properties.bootstrap.servers' = 'kafka:9092',
'properties.group.id' = 'testGroup4',
'scan.startup.mode' = 'earliest-offset',
'format' = 'json'
);
CREATE TABLE VEHICLES (
userID BIGINT,
rid BIGINT,
type STRING,
manufacturer STRING,
model STRING,
plate STRING,
status STRING,
ts STRING,
parsed_timestamp AS TO_TIMESTAMP(ts),
WATERMARK FOR parsed_timestamp AS parsed_timestamp - INTERVAL '5' SECONDS
) WITH (
'connector' = 'kafka',
'topic' = 'VEHICLES',
'properties.group.id' = 'mytestgroup4',
'scan.startup.mode' = 'earliest-offset',
'properties.bootstrap.servers' = 'kafka:9092',
'format' ='json'
);
Any suggestion on what I have done wrong? I couldn't find much information about this error. If possible I would prefer not to create views and to work directly with the two tables (I have tried with views, but I still get the same message).
Thank you.

Flink Source kafka Join with CDC source to kafka sink

We are trying to join a DB CDC connector table (upsert behaviour) with a 'kafka' source of events, enriching those events by key with the existing CDC data:
kafka source (id, B, C) + cdc (id, D, E, F) = result (id, B, C, D, E, F) into a Kafka sink (append)
INSERT INTO sink (zapatos, naranjas, device_id, account_id, user_id)
SELECT zapatos, naranjas, source.device_id, account_id, user_id FROM source
JOIN mongodb_source ON source.device_id = mongodb_source._id
The problem is that this only works if our Kafka sink is 'upsert-kafka', but that creates tombstones on deletions in the DB. We need the output to behave as plain events, not as a changelog, yet we cannot use a plain 'kafka' sink because the CDC connector is upsert and therefore not compatible with it.
What would be the way to do this? Transform the upsert stream into plain append events?
s_env = StreamExecutionEnvironment.get_execution_environment()
s_env.set_stream_time_characteristic(TimeCharacteristic.EventTime)
s_env.set_parallelism(1)
# use blink table planner
st_env = StreamTableEnvironment \
.create(s_env, environment_settings=EnvironmentSettings
.new_instance()
.in_streaming_mode()
.use_blink_planner().build())
ddl = """CREATE TABLE sink (
`zapatos` INT,
`naranjas` STRING,
`account_id` STRING,
`user_id` STRING,
`device_id` STRING,
`time28` INT,
PRIMARY KEY (device_id) NOT ENFORCED
) WITH (
'connector' = 'upsert-kafka',
'topic' = 'as-test-output-flink-topic',
'properties.bootstrap.servers' = 'kafka:9092',
'properties.group.id' = 'testGroup',
'key.format' = 'raw',
'value.format' = 'json',
'value.fields-include' = 'EXCEPT_KEY'
)
"""
st_env.sql_update(ddl)
ddl = """CREATE TABLE source (
`device_id` STRING,
`timestamp` TIMESTAMP_LTZ(3) METADATA FROM 'timestamp',
`event_type` STRING,
`payload` ROW<`zapatos` INT, `naranjas` STRING, `time28` INT, `device_id` STRING>,
`trace_id` STRING
) WITH (
'connector' = 'kafka',
'topic' = 'as-test-input-flink-topic',
'properties.bootstrap.servers' = 'kafka:9092',
'properties.group.id' = 'testGroup',
'key.format' = 'raw',
'key.fields' = 'device_id',
'value.format' = 'json',
'value.fields-include' = 'EXCEPT_KEY'
)
"""
st_env.sql_update(ddl)
ddl = """
CREATE TABLE mongodb_source (
`_id` STRING PRIMARY KEY,
`account_id` STRING,
`user_id` STRING,
`device_id` STRING
) WITH (
'connector' = 'mongodb-cdc',
'uri' = '******',
'database' = '****',
'collection' = 'testflink'
)
"""
st_env.sql_update(ddl)
st_env.sql_update("""
INSERT INTO sink (zapatos, naranjas, device_id, account_id, user_id)
SELECT zapatos, naranjas, source.device_id, account_id, user_id FROM source
JOIN mongodb_source ON source.device_id = mongodb_source._id
""")
# execute
st_env.execute("kafka_to_kafka")
Don't mind the mongodb-cdc connector; it is new but works like mysql-cdc or postgres-cdc.
Thanks for your help!
Have you tried using a LEFT JOIN instead of JOIN?
It shouldn't create tombstones then, if your purpose is just to enrich the Kafka events with whatever matching data exists in Mongo.
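Applied to the insert from the question, the suggestion would look roughly like this (a sketch using the same source and mongodb_source tables):
INSERT INTO sink (zapatos, naranjas, device_id, account_id, user_id)
SELECT zapatos, naranjas, source.device_id, account_id, user_id
FROM source
-- LEFT JOIN keeps every Kafka event even when no CDC row matches yet
LEFT JOIN mongodb_source ON source.device_id = mongodb_source._id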

Overflow error with firebird stored procedure data type conversion

I have the following stored proc:
SET TERM ^ ;
CREATE PROCEDURE SP_OUTWARD_ACCOUNTS (
STARTDATE Timestamp,
ENDDATE Timestamp )
RETURNS (
DATEX Timestamp,
PERSONNAME Varchar(50),
FILENO Varchar(30),
ACCOUNTTYPE Smallint,
AMOUNT Decimal(9,2),
DUEDATE Timestamp,
BANKNAME Varchar(50),
CHECKNO Varchar(10),
NOTES Varchar(200),
PAIDINFULL Smallint,
PAIDSF Decimal(9,2),
BANKACCNO Varchar(20),
CHECKOWNER Varchar(50),
ENDORSEDTO Integer,
STATE Integer,
ID Integer,
MYTRANSID Integer,
MYTRANSTYP Smallint,
PERSONID Integer,
CHECKGIVER Varchar(50) )
AS
BEGIN
for
select * from (
SELECT ACCOUNTS.DATEX, PEOPLE.NAME as personname, ACCOUNTS.ACCOUNTTYPE, ACCOUNTS.AMOUNT, ACCOUNTS.DUEDATE, BANKS.BANKNAME, ACCOUNTS.CHECKNO,
ACCOUNTS.NOTES, ACCOUNTS.PAIDINFULL, SUM(PAYMENTS.AMOUNTPAID) AS PAIDSF, ACCOUNTS.BANKACCNO, ACCOUNTS.CHECKOWNER,
ACCOUNTS.ENDORSEDTO, ACCOUNTS.STATE,
ACCOUNTS.OUTTRANSID AS MYTRANSID,
ACCOUNTS.OUTTRANSTYP AS MYTRANSTYP, ACCOUNTS.ID,
ACCOUNTS.PERSONID, ACCOUNTS.FILENO,'' as checkgiver
FROM ACCOUNTS LEFT OUTER JOIN
PEOPLE ON ACCOUNTS.PERSONID = PEOPLE.ID LEFT OUTER JOIN
BANKS ON ACCOUNTS.BANKID = BANKS.BANKID LEFT OUTER JOIN
PAYMENTS ON ACCOUNTS.ID = PAYMENTS.ACCOUNTID
WHERE (ACCOUNTS.DATEX BETWEEN :STARTDATE AND :ENDDATE) AND (ACCOUNTS.OUTTRANSTYP<>-1) and accounts.ACCOUNTTYPE<>2
GROUP BY ACCOUNTS.DATEX, personname, ACCOUNTS.FILENO, ACCOUNTS.ACCOUNTTYPE, ACCOUNTS.AMOUNT, ACCOUNTS.DUEDATE, BANKS.BANKNAME, ACCOUNTS.CHECKNO,
ACCOUNTS.NOTES, ACCOUNTS.PAIDINFULL, ACCOUNTS.BANKACCNO, ACCOUNTS.CHECKOWNER, ACCOUNTS.ENDORSEDTO, ACCOUNTS.STATE, ACCOUNTS.ID,
MYTRANSID, MYTRANSTYP, ACCOUNTS.PERSONID,checkgiver
union SELECT ACCOUNTS.DATEX, PEOPLE.NAME as personname, ACCOUNTS.ACCOUNTTYPE, ACCOUNTS.AMOUNT, ACCOUNTS.DUEDATE , BANKS.BANKNAME, ACCOUNTS.CHECKNO,
ACCOUNTS.NOTES, ACCOUNTS.PAIDINFULL, SUM(PAYMENTS.AMOUNTPAID) AS PAIDSF, ACCOUNTS.BANKACCNO, ACCOUNTS.CHECKOWNER,
ACCOUNTS.ENDORSEDTO, ACCOUNTS.STATE,
ACCOUNTS.OUTTRANSID AS MYTRANSID,
ACCOUNTS.OUTTRANSTYP AS MYTRANSTYP, ACCOUNTS.ID,
ACCOUNTS.PERSONID, ACCOUNTS.FILENO, x.name as checkgiver
FROM ACCOUNTS LEFT OUTER JOIN
PEOPLE ON ACCOUNTS.ENDORSEDTO = PEOPLE.ID LEFT OUTER JOIN
BANKS ON ACCOUNTS.BANKID = BANKS.BANKID LEFT OUTER JOIN
PAYMENTS ON ACCOUNTS.ID = PAYMENTS.ACCOUNTID
left outer join (select ACCOUNTS.id, PEOPLE.name from PEOPLE inner join ACCOUNTS on people.id=accounts.PERSONID) x on x.ID=accounts.ID
WHERE (ACCOUNTS.DATEX BETWEEN :STARTDATE AND :ENDDATE) AND (ACCOUNTS.OUTTRANSTYP<>-1) and accounts.ACCOUNTTYPE=2
GROUP BY ACCOUNTS.DATEX, personname, ACCOUNTS.FILENO, ACCOUNTS.ACCOUNTTYPE, ACCOUNTS.AMOUNT,ACCOUNTS.DUEDATE, BANKS.BANKNAME, ACCOUNTS.CHECKNO,
ACCOUNTS.NOTES, ACCOUNTS.PAIDINFULL, ACCOUNTS.BANKACCNO, ACCOUNTS.CHECKOWNER, ACCOUNTS.ENDORSEDTO, ACCOUNTS.STATE, ACCOUNTS.ID,
MYTRANSID, MYTRANSTYP, ACCOUNTS.PERSONID,checkgiver
)
order by DATEX
into
:DATEX,
:personname,
:FILENO,
:ACCOUNTTYPE,
:AMOUNT,
:DUEDATE,
:BANKNAME,
:CHECKNO,
:NOTES,
:PAIDINFULL,
:PAIDSF,
:BANKACCNO,
:CHECKOWNER,
:ENDORSEDTO,
:STATE,
:ID,
:MYTRANSID,
:MYTRANSTYP,
:PERSONID,
:checkgiver
DO
begin
suspend;
end
END^
SET TERM ; ^
GRANT EXECUTE
ON PROCEDURE SP_OUTWARD_ACCOUNTS TO SYSDBA;
This code works when I run it in the query screen, and it compiles as a stored procedure, but when I try to run the stored procedure as:
SELECT p.XX, p.PERSONNAME, p.FILENO, p.ACCOUNTTYPE, p.AMOUNT, p.YY, p.BANKNAME, p.CHECKNO, p.NOTES, p.PAIDINFULL, p.PAIDSF, p.BANKACCNO, p.CHECKOWNER, p.ENDORSEDTO, p.STATE, p.ID, p.MYTRANSID, p.MYTRANSTYP, p.PERSONID, p.CHECKGIVER
FROM SP_OUTWARD_ACCOUNTS('2014-03-01', '2014-03-20') p
An IBPP error occurred.
* IBPP::SQLException *
Context: Statement::Fetch
Message: isc_dsql_fetch failed.
SQL Message : -413
Overflow occurred during data type conversion.
Engine Code : 335544334
Engine Message :
conversion error from string "2014-02-28 00:00:00.0000"
At procedure 'SP_OUTWARD_ACCOUNTS' line: 28, col: 1
OK
What is wrong? Please help.
Table structures:
TABLE ACCOUNTS
(
ACCOUNTTYPE Smallint,
PERSONID Integer,
DUEDATE Timestamp,
NOTES Varchar(200) DEFAULT '',
AMOUNT Decimal(9,2) DEFAULT 0,
BANKID Integer,
DIRECTION Smallint,
TRANSID Integer DEFAULT -1,
DATEX Timestamp,
ID Integer NOT NULL,
PAIDINFULL Smallint DEFAULT 0,
CHECKNO Varchar(10),
TRANSTYPE Smallint DEFAULT -1,
BANKACCNO Varchar(20) CHARACTER SET ASCII,
CHECKOWNER Varchar(50),
ENDORSEDTO Integer DEFAULT 0,
STATE Integer DEFAULT 0,
OUTTRANSID Integer DEFAULT -1,
OUTTRANSTYP Smallint DEFAULT -1,
DEPOSITBANK Integer DEFAULT -1,
PARENT Integer DEFAULT -1,
FILENO Varchar(30),
CONSTRAINT PK_ACCOUNTS PRIMARY KEY (ID)
);
PEOPLE:
TABLE PEOPLE
(
CTYPE Smallint,
NAME Varchar(50),
COMPANY Varchar(50),
PHONE Varchar(30),
MOBILE Varchar(30),
EMAIL Varchar(40),
ADDRESS Varchar(120),
NOTES Varchar(200),
ID Integer NOT NULL,
HIDDEN Smallint DEFAULT 0
);
BANKS:
TABLE BANKS
(
BANKNAME Varchar(50),
BANKBALANCE Decimal(9,2) DEFAULT 0,
BANKID Integer NOT NULL
);
PAYMENTS:
TABLE PAYMENTS
(
ACCOUNTID Integer,
NOTES Varchar(200),
AMOUNTPAID Decimal(9,2) DEFAULT 0,
PAYMENTDATE Timestamp,
ID Integer NOT NULL,
RECORDDATETIME Timestamp DEFAULT CURRENT_TIMESTAMP,
DIRECTION Smallint
);
The problem is that the columns in your select do not match the output columns in the INTO clause. The mapping of INTO is based on position, not on name!
If you look closely they map like:
ACCOUNTS.DATEX -------------> :DATEX,
PEOPLE.NAME as personname --> :personname,
ACCOUNTS.ACCOUNTTYPE -------> :FILENO, <==== Mismatch starts here
ACCOUNTS.AMOUNT ------------> :ACCOUNTTYPE,
ACCOUNTS.DUEDATE -----------> :AMOUNT,
BANKS.BANKNAME -------------> :DUEDATE,
....
The problem occurs when ACCOUNTS.DUEDATE is assigned to AMOUNT (which is a DECIMAL(9,2)). This is a conversion that is not supported, so Firebird attempts TIMESTAMP => string (VARCHAR/CHAR) => DECIMAL(9,2). And this conversion fails.
Note that it looks like this isn't the only mixup in column order you have, so only fixing this one isn't going to solve the problem.
The best solution is probably to replace the SELECT * with an explicit column list in the same order as your INTO clause, as sketched below.
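For illustration, the outer query of the procedure could be rewritten roughly like this (a sketch; only the column list changes, the inner UNION stays exactly as in the question):
for
  select DATEX, personname, FILENO, ACCOUNTTYPE, AMOUNT, DUEDATE, BANKNAME, CHECKNO,
         NOTES, PAIDINFULL, PAIDSF, BANKACCNO, CHECKOWNER, ENDORSEDTO, STATE, ID,
         MYTRANSID, MYTRANSTYP, PERSONID, checkgiver
  from (
    /* the same UNION of the two SELECTs as in the original procedure */
  )
  order by DATEX
  into :DATEX, :personname, :FILENO, :ACCOUNTTYPE, :AMOUNT, :DUEDATE, :BANKNAME, :CHECKNO,
       :NOTES, :PAIDINFULL, :PAIDSF, :BANKACCNO, :CHECKOWNER, :ENDORSEDTO, :STATE, :ID,
       :MYTRANSID, :MYTRANSTYP, :PERSONID, :checkgiver
  do
    suspend;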
