Flink: multi-event dependency SQL query on a DataStream

I am not getting the expected behavior. My Flink application receives live events, and my trigger condition depends on two events, ABC and XYZ: when both events have arrived, the notification should be triggered.
The application is using StreamTableEnvironment.
Here is the SQL query that I am using:
SELECT *
from EventTable
where eventName in ('ABC','XYZ')
and 1 IN (select 1 from EventTable where name='XYZ')
and 1 IN (select 1 from EventTable where name='ABC')
Use case 1:
ABC event arrives --> nothing happens (as expected; waiting for the XYZ event)
XYZ event arrives --> the condition matches, the SQL query returns two event records (ABC & XYZ), and the notification is triggered (as expected)
Now, if I send another ABC event, the SQL query returns the ABC event and the notification is triggered again.
I was expecting the query to return no result, since only one event (ABC) had arrived, and to wait for an XYZ event. Could you please help me with this behaviour? Am I missing something to get the expected result?

When the second ABC is added to the dynamic table, the first XYZ is already there, so the conditions are met. The addition of this third row to the input table causes one new row to be appended to the output table.
See Dynamic Tables in the documentation for more information about the model underlying stream SQL.
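If the goal is to trigger once per ABC/XYZ pair rather than on every new matching row, an interval join is one direction to explore. A rough sketch, assuming EventTable exposes an event-time attribute named rowtime (not shown in the question) and that pairing events at most one hour apart is acceptable:
SELECT a.*, x.*
FROM EventTable a, EventTable x
WHERE a.eventName = 'ABC'
  AND x.eventName = 'XYZ'
  AND x.rowtime BETWEEN a.rowtime - INTERVAL '1' HOUR
                    AND a.rowtime + INTERVAL '1' HOUR
Note that this still pairs a later ABC with any earlier XYZ inside the window; strict "consume both events exactly once" semantics usually calls for MATCH_RECOGNIZE or a stateful DataStream operator.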

Related

System.sobjectexception: sobject row was retrieved via soql without querying the requested field: Asset.Product2

I have a problem with the query in my test class. I have put AssetId__r.Product2Id in the first query, and Product2Id in the Asset query, but the error still persists. As the error says, it needs Asset.Product2, which is an SObject, not a field, and I am lost on how to fix this error. Has anyone encountered this error before? Please, I need your help.
It would be easier if you'd post the actual query.
You probably have something like this in your code:
String name = myObject.AssetId__r.Product2.Name;
but the query SELECT AssetId__r.Product2Id, ... FROM MyTable WHERE ...
So what you can change is to put more Product2 fields in the query. You can go up to 5 "dots" up the relationship chain.
SELECT AssetId__r.Product2.Id,
AssetId__r.Product2.Name,
AssetId__r.Product2.ProductCode
FROM myObject
WHERE ...
It will work, and the Apex that uses it can then treat AssetId__r.Product2 as a normal Product2 object, as if it had been queried separately. It'll have the Id, Name and ProductCode fields set.
The query is also "safe": if AssetId__c is null or Product2Id is null, it'll still execute OK. You'll just have to do null checks in Apex (or in the query's WHERE clause).
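For illustration, the null check can also live in the query itself rather than in Apex; a sketch using the same object and field names as above:
SELECT AssetId__r.Product2.Id,
       AssetId__r.Product2.Name,
       AssetId__r.Product2.ProductCode
FROM myObject
WHERE AssetId__c != null
  AND AssetId__r.Product2Id != null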

Flink CEP cannot get correct results on a unioned table

I use Flink SQL and CEP to recognize some really simple patterns. However, I found a weird thing (likely a bug). I have two example tables password_change and transfer as below.
transfer
transid,accountnumber,sortcode,value,channel,eventtime,eventtype
1,123,1,100,ONL,2020-01-01T01:00:01Z,transfer
3,123,1,100,ONL,2020-01-01T01:00:02Z,transfer
4,123,1,200,ONL,2020-01-01T01:00:03Z,transfer
5,456,1,200,ONL,2020-01-01T01:00:04Z,transfer
password_change
accountnumber,channel,eventtime,eventtype
123,ONL,2020-01-01T01:00:05Z,password_change
456,ONL,2020-01-01T01:00:06Z,password_change
123,ONL,2020-01-01T01:00:08Z,password_change
123,ONL,2020-01-01T01:00:09Z,password_change
Here are my SQL queries.
First create a temporary view event as
(SELECT accountnumber,rowtime,eventtype FROM password_change WHERE channel='ONL')
UNION ALL
(SELECT accountnumber,rowtime, eventtype FROM transfer WHERE channel = 'ONL' )
The rowtime column is the event time, extracted directly from the original eventtime column, with a periodic bounded watermark of 1 second.
Then output the query result of
SELECT * FROM `event`
MATCH_RECOGNIZE (
    PARTITION BY accountnumber
    ORDER BY rowtime
    MEASURES
        transfer.eventtype AS event_type,
        transfer.rowtime AS transfer_time
    ONE ROW PER MATCH
    AFTER MATCH SKIP PAST LAST ROW
    PATTERN (transfer password_change) WITHIN INTERVAL '5' SECOND
    DEFINE
        password_change AS eventtype = 'password_change',
        transfer AS eventtype = 'transfer'
)
It should output
123,transfer,2020-01-01T01:00:03Z
456,transfer,2020-01-01T01:00:04Z
But I got nothing when running Flink 1.11.1 (also no output for 1.10.1).
What's more, if I change the pattern to only password_change, it still outputs nothing, but if I change the pattern to only transfer, it outputs several rows (though not all transfer rows). If I swap the event times of the two tables, i.e. let the password_change events happen first, then the pattern password_change outputs several rows while transfer does not.
On the other hand, if I extract those columns from the two tables, merge them into one table manually and then emit them into Flink, the result is correct.
I have searched and tried a lot to get this right, including changing the SQL statement, the watermarks, the buffer timeout and so on, but nothing helped. I hope someone here can help. Thanks.
10/10/2020 update:
I use Kafka as the table source. tEnv is the StreamTableEnvironment.
Kafka kafka = new Kafka()
        .version("universal")
        .property("bootstrap.servers", "localhost:9092");

tEnv.connect(
        kafka.topic("transfer")
).withFormat(
        new Json()
                .failOnMissingField(true)
).withSchema(
        new Schema()
                .field("rowtime", DataTypes.TIMESTAMP(3))
                .rowtime(new Rowtime()
                        .timestampsFromField("eventtime")
                        .watermarksPeriodicBounded(1000)
                )
                .field("channel", DataTypes.STRING())
                .field("eventtype", DataTypes.STRING())
                .field("transid", DataTypes.STRING())
                .field("accountnumber", DataTypes.STRING())
                .field("value", DataTypes.DECIMAL(38, 18))
).createTemporaryTable("transfer");

tEnv.connect(
        kafka.topic("pchange")
).withFormat(
        new Json()
                .failOnMissingField(true)
).withSchema(
        new Schema()
                .field("rowtime", DataTypes.TIMESTAMP(3))
                .rowtime(new Rowtime()
                        .timestampsFromField("eventtime")
                        .watermarksPeriodicBounded(1000)
                )
                .field("channel", DataTypes.STRING())
                .field("accountnumber", DataTypes.STRING())
                .field("eventtype", DataTypes.STRING())
).createTemporaryTable("password_change");
Thanks to @Dawid Wysakowicz's answer. To confirm it, I added 4,123,1,200,ONL,2020-01-01T01:00:10Z,transfer to the end of the transfer table, and the output then became correct, which means it really is a problem with the watermarks.
So now the question is how to fix it. Since a user will not change his/her password frequently, the time gap between these two tables is unavoidable. I just need the UNION ALL table to behave the same way as the table I merged manually.
Update Nov. 4th 2020:
WatermarkStrategy with idle sources may help.
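For reference, a minimal sketch of such a strategy on the DataStream sources feeding the union (Flink 1.11+); the Event class, timestamp accessor and timeout values here are illustrative, not taken from the original job:
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;

// Hypothetical Event POJO standing in for the transfer/password_change rows.
WatermarkStrategy<Event> strategy = WatermarkStrategy
        .<Event>forBoundedOutOfOrderness(Duration.ofSeconds(1)) // same 1-second bound as the table definition
        .withIdleness(Duration.ofSeconds(30))                   // stop holding back watermarks after 30 s of silence
        .withTimestampAssigner((event, previousTimestamp) -> event.getEventTimeMillis());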
Most likely the problem is somewhere around watermark generation in conjunction with the UNION ALL operator. Could you share how you create the two tables, including how you define the time attributes and what the connectors are? That would let me confirm my suspicions.
I think the problem is that one of the sources stops emitting Watermarks. If the transfer table (or the table with the lower timestamps) does not finish and produces no further records, it emits no further Watermarks. After emitting the fourth row it will emit Watermark = 3 (4 - 1 second). The Watermark of a union of inputs is the smallest of the values of the two. Therefore the first table pauses/holds the Watermark at Watermark = 3, and thus you see no progress for the original query, while you do see some records emitted for the table with the smaller timestamps.
If you manually join the two tables, you have just a single input with a single source of Watermarks and thus it progresses further and you see some results.
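If you stay within the Table API, another option (a sketch, available in newer Flink versions; the timeout value is illustrative) is to declare sources idle after a timeout via the table configuration, so that a quiet topic stops holding back the union's watermark:
// Assumes tEnv is the StreamTableEnvironment from the question.
tEnv.getConfig().getConfiguration()
        .setString("table.exec.source.idle-timeout", "30 s");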

Multiple emails sent by sp_send_dbmail when run as an SSIS package

I have a procedure which generates a tab-delimited text file and also sends an email with a list of students as an attachment using msdb.dbo.sp_send_dbmail.
When I execute the procedure through SQL Server Management Studio, it sends only one email.
But I created an SSIS package and scheduled the job to run nightly. This job sends 4 copies of the email to each recipient.
EXEC msdb.dbo.sp_send_dbmail @profile_name = 'A'
    ,@recipients = @email_address
    ,@subject = 'Error Records'
    ,@query = 'SELECT * FROM ##xxxx'
    ,@attach_query_result_as_file = 1
    ,@query_attachment_filename = 'results.txt'
    ,@query_result_header = 1
    ,@query_result_width = 8000
    ,@body = 'These students were not imported'
I've set the following parameters to 0 (within the Database Mail Configuration Wizard) to see if it makes any difference, but it didn't resolve the problem.
AccountRetryAttempts 0
AccountRetryDelay 0
DatabaseMailExeMinimumLifeTime 0
Any suggestions?
I assume you have this email wired up to an event, like OnError/OnTaskFailed, probably at the root level.
Every item you add to a Control Flow adds another layer of potential events. Imagine a Control Flow with a Sequence Container, which contains a ForEach Enumerator, which contains a Data Flow Task. That's a fairly common design. Each of those objects has the ability to raise/handle events based on the objects it contains. The distance between the Control Flow's OnTaskFailed event handler and the Data Flow's OnTaskFailed event handler is 5 objects deep.
The Data Flow fails and raises the OnTaskFailed message. That message bubbles all the way up to the Control Flow, resulting in email 1 being fired. The Data Flow then terminates. The ForEach loop receives the signal that the Data Flow has completed and that the return status was a failure, so now the OnTaskFailed event fires for the ForEach loop. Repeat this pattern ad nauseam until every task/container has raised its own event.
The resolution depends, but usually folks get around this either by putting the notification only on the innermost objects (the Data Flow in my example) or by disabling the percolation of event handlers.
Check the solution here (it worked for me, as I was getting 2 at a time): Stored procedure using SP_SEND_DBMAIL sending duplicate emails to all recipients.
Change the number of retries from X to 0; now I only get 1 email. Retries are the more obvious culprit if your users are getting the 4 emails exactly 1 minute apart.
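If you prefer to script that change rather than use the wizard, the Database Mail system parameter can also be set with msdb's configuration procedure (a sketch; verify the parameter name against your setup):
-- Set Database Mail to make no retry attempts for failed sends.
EXEC msdb.dbo.sysmail_configure_sp
     @parameter_name = 'AccountRetryAttempts',
     @parameter_value = '0';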

How can I refresh a TClientDataSet without applying pending updates?

Here is what I'm trying to accomplish:
Retrieve 1 record from the database through TSQLDataset's CommandText: SELECT * FROM myTable WHERE ID = 1
Use TClientDataset to modify the record. (1 pending update)
Retrieve next record. SELECT * FROM myTable WHERE ID = 2
Modify the record. (now 2 pending updates)
Finally, send the 2 pending updates back to the database through ApplyUpdates function.
When I do step 3, I get "Must apply updates before refreshing data."
How can I refresh a TClientDataSet without applying pending updates?
You can append data packets manually to your DataSet by calling the AppendData method.
In an application where the provider is in the same application as the ClientDataSet, you can code something like this:
begin
  ConfigureProviderToGetRecordWithID(1);
  // make the ClientDataSet fetch this single record and not hit the EOF
  ClientDataSet1.PacketRecords := 1;
  ClientDataSet1.Open;
  ClientDataSet1.Edit;
  ModifyFirstRecord;
  ClientDataSet1.Post;
  ConfigureProviderToGetRecordWithID(2);
  ClientDataSet1.AppendData(DataSetProvider1.Data, False);
  // now you have two records in your DataSet without losing the delta
end;
This is kind of pseudo-code, but shows the general technique you could use.
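For completeness, ConfigureProviderToGetRecordWithID is not a built-in method; a hypothetical version of it, assuming a TSQLDataSet feeding the DataSetProvider as in the question, might look like this:
procedure TForm1.ConfigureProviderToGetRecordWithID(AID: Integer);
begin
  // Point the provider's source dataset at the single record to fetch next;
  // the provider reads it when its Data property is accessed.
  SQLDataSet1.Close;
  SQLDataSet1.CommandText := 'SELECT * FROM myTable WHERE ID = :ID';
  SQLDataSet1.Params.ParamByName('ID').AsInteger := AID;
end;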

TClientDataset ApplyUpdates error because of database table constraint

I have an old Delphi 7 application that loads data from one database table, performs many operations and calculations, and finally writes records to a destination table.
This old application calls ApplyUpdates every 500 records, for performance reasons.
The problem is that, sometimes, in this bunch of records there is one that violates a database constraint; Delphi fires an exception on ApplyUpdates.
My problem is I don't know which record is responsible for this exception. There are 500 candidates!
Is it possible to ask the TClientDataSet which record is the offending one?
I do not want to call ApplyUpdates for each appended record, for speed reasons.
I think you may try to implement the OnReconcileError event, which is fired once for each record that could not be applied to the dataset. So I would try the following code; raSkip here means to skip the current record:
procedure TForm1.ClientDataSet1ReconcileError(DataSet: TCustomClientDataSet;
  E: EReconcileError; UpdateKind: TUpdateKind; var Action: TReconcileAction);
begin
  Action := raSkip;
  ShowMessage('The record with ID = ' + DataSet.FieldByName('ID').AsString +
    ' couldn''t be updated!' + sLineBreak + E.Context);
end;
But please note, I've never tried this before and I'm not sure whether it's already too late at that point to ignore the errors raised by the ApplyUpdates function. I forgot to mention: try to use the passed DataSet parameter, which should contain the record that couldn't be updated; it might be the way to determine which record caused the problem.
And the workflow for applying updates is described here.
Implementing OnReconcileError will give you access to the record and data that are responsible for the exception. An easy way to accomplish this is to add a "Reconcile Error Dialog". It is located in the "New Items" dialog, which is displayed by File | New | Other. Once you have added it to your project, use it in the form with the ClientDataSet. The following code shows how it is invoked:
procedure TForm1.ClientDataSetReconcileError(DataSet: TCustomClientDataSet;
  E: EReconcileError; UpdateKind: TUpdateKind;
  var Action: TReconcileAction);
begin
  Action := HandleReconcileError(DataSet, UpdateKind, E);
end;
It will be displayed instead of the exception dialog, and it will allow you to view the offending data and select how you want to proceed. It has been over 5 years since I last used it; hopefully I have not forgotten any details.
