Flink Table API: Flink dynamic table produces incorrect intermediate CDC values - apache-flink

I am using the Flink Table API to calculate a few aggregations. I have a stream of data coming from Kafka which is transformed into a stream of rows. From these rows I am creating a dynamic table.
Example: consider the three records below; the primary key is "id".
DataStream<Row> stream =
    Row.of(RowKind.UPSERT_AFTER, {"id": 1, "globalId": 123, "demand": 10}),
    Row.of(RowKind.UPSERT_AFTER, {"id": 2, "globalId": 123, "demand": 20}),
    Row.of(RowKind.UPSERT_AFTER, {"id": 1, "globalId": 123, "demand": 30}), ...
(Not exactly like this, but consider these as Rows with id, globalId, and demand fields.)
When I create a table from the above stream,
Table streamTable = tableEnv.fromChangelogStream(stream, Schema.newBuilder().primaryKey("id").build());
I see the output below:
rowKind  id  globalId  demand
-------  --  --------  ------
+I       1   123       10
+I       2   123       20
-U       1   123       10      (retraction of the old entry for id 1; new entry added below)
+U       1   123       30
I am using this table to calculate the "sum of demand grouped by globalId":
Table demandSum = tableEnv.sqlQuery("select globalId, sum(demand) from " + streamTable + " group by globalId");
DataStream<Row> result = tableEnv.toChangelogStream(demandSum);
result.print();
I get the output below, which contains intermediate values, because the aggregation has to consume the "-U" rows from streamTable. I see a subtracted value in between, but I am only interested in the end value.
+I, 123, 10
-U, 123, 10
+U, 123, 30 -- correct up to this point
-U, 123, 30
+U, 123, 20 -- this dip appears because the sum must first retract the demand of 10 for id 1 and then add 30; I don't want this in my output stream
-U, 123, 20
+U, 123, 50 -- the correct value again at the end
How can I handle this case in the Table API?
I tried using upsert mode, so that streamTable contains only +I/+U rows. But for the final table to calculate the value correctly it needs the -U rows: in upsert mode the aggregation simply added the two values for id 1, so the total was incorrect; the final result was 60 instead of 50.
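For reference, a minimal sketch of how upsert mode can be requested when building the table; this assumes Flink's fromChangelogStream overload that takes a ChangelogMode, and the column types are assumptions based on the example records:
// Sketch only: interpret the stream as an upsert changelog keyed on "id".
// Column types are assumed; adjust to the real schema.
Table streamTable = tableEnv.fromChangelogStream(
        stream,
        Schema.newBuilder()
                .column("id", DataTypes.INT())
                .column("globalId", DataTypes.INT())
                .column("demand", DataTypes.INT())
                .primaryKey("id")
                .build(),
        ChangelogMode.upsert()); // accepts only insert/upsert (and delete) rows, no retractions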
Windowing on the final table: I tried to emit only the last value of each window. This is a simple example, but I have use cases where a single -U in the first table can generate many intermediate values in the final table. The window can close on any of these wrong values, and I have no field with which to identify whether a value is right or wrong.
I have implemented this use case with Flink's DataStream API, but the Table API is much easier to write and more developer friendly, so I would like to solve these use cases with the Table API.

Related

LARAVEL - where a JSON array contains an id and a value greater than 80

Hello all, I have a question. I have a SQL database with a column that holds an array of objects, like this:
id_data  array                                                            created_at
-------  ---------------------------------------------------------------  -------------------
1        {"id":1032,"prc":77},{"id":1033,"prc":97}                        2021-09-28 12:30:04
2        {"id":1032,"prc":85},{"id":1034,"prc":97}                        2021-09-28 12:30:04
3        {"id":1030,"prc":85},{"id":1031,"prc":97}                        2021-09-28 12:30:04
4        {"id":1032,"prc":90},{"id":1033,"prc":97},{"id":1035,"prc":97}   2021-09-28 12:30:04
What I want to do is take every row where id 1032 has a prc greater than 80, so in this table it would return rows 2 and 4. I need to fetch the data using Laravel's Eloquent; can somebody help me? I'm still new to Laravel.
Data::where("something here")->get()
Take a look at the answer to a question like yours (How to query array inside JSON column in Eloquent).
You can use whereJsonContains (Laravel JSON queries). Note that this feature is not supported by the SQLite database.
Data::whereJsonContains('array_field->id', 1032)->get();
Update:
@brian christian asked: how can I get the "prc greater than" part?
...->where('array_field->prc', '>', 80)
And the MySQL query is (MySQL JSON_TABLE):
select * from tbl where
    json_contains(array_field, '{"id":1032}') and
    (select min(all_prc.prc)
     from json_table(array_field, '$[*]."prc"' columns(prc int path '$')) as all_prc) > 80
Alternatively, you can use a regex like this (the "greater than 80" pattern is grouped so the 3-digit alternative cannot match stray numbers such as the id itself):
select * from tbl where
    `array_field` REGEXP '\\"id\\":\\s{0,}1032' and
    `array_field` REGEXP '\\"prc\\":\\s{0,}(8[1-9]|9\\d|[1-9]\\d{2,})'
In Eloquent:
Data::whereRaw('`array_field` REGEXP \'\\"id\\":\\s{0,}1032\'')
    ->whereRaw('`array_field` REGEXP \'\\"prc\\":\\s{0,}(8[1-9]|9\\d|[1-9]\\d{2,})\'')
    ->get();
\\"id\\":\\s{0,}1032: matches all rows whose array contains the id value 1032.
\\"prc\\":\\s{0,}(8[1-9]|9\\d|[1-9]\\d{2,}): matches all rows with a prc greater than 80 (81-99, and 100 upwards).

Split data from strings into columns

I have a column with a long string. The data needs to be split into columns; the strings are of variable length and do not always contain the same number of columns. I am not exactly sure how to do this, so I was looking for some advice here.
Let's say I have this string:
VS5~MedCond1~35.4|VS4~MedCond2~16|VS1~MedCond3~155|VS2~MedCond4~70|SPO2~MedCond5~100|VS3~MedCond6~64|FiO2~MedCond7~21|MAP~MedCond8~98|
And in some cases the string might not have all the medical conditions just some of them.
I need to split this into columns, where the column name is the token between the tildes (i.e. MedCond1) and the value is the token to the right of that tilde but before the pipe, ending up like this:
MedCond1 MedCond2 MedCond3 MedCond4 MedCond5 MedCond6 MedCond7 MedCond8
======== ======== ======== ======== ======== ======== ======== ========
35.1 24 110 64 100 88 21 79
I need to do this for a lot of rows within a large table, and as I said not all the columns are always present, but the names do not change: one row might have MedCond1-8, while another set has only MedCond3, 4, and 7.
Here is a query I created that is close to what I want, but it is not dynamic, so it picks up the values along with some extra bits of the string:
select MainCol, case when charindex('MedCond1', MainCol) > 0 then
substring(MainCol, charindex('MedCond1', MainCol) + 9, 4) end as [MedCond1]
from MedTable
Will return
MedCond1
========
35.3
40.2
33.6
33|V <--- Problem
As you can see, the numeric value is sometimes picked up together with an additional part of the string, because the charindex offset is hard-coded. The value is sometimes 4 characters long with a decimal place, sometimes 2 characters long with no decimal place. I would like to make this dynamic: the pipe defines the end of the data I need, and the start is defined by the tilde at the end of the column name (see the sketch after this question).
Thanks for any thoughts on making this dynamic
Andrew
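As a starting point, here is a minimal sketch of that delimiter logic, locating the value between the tilde and the following pipe with CHARINDEX instead of a hard-coded length (MedTable and MainCol are the names from the query above; this is an illustration, not a full solution):
select MainCol,
       case when charindex('MedCond1~', MainCol) > 0 then
            -- value starts right after 'MedCond1~' and ends before the next pipe
            substring(MainCol,
                      charindex('MedCond1~', MainCol) + len('MedCond1~'),
                      charindex('|', MainCol, charindex('MedCond1~', MainCol))
                          - charindex('MedCond1~', MainCol) - len('MedCond1~'))
       end as [MedCond1]
from MedTable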
This data looks like a table itself. It could have been stored in SQL Server as XML; SQL Server supports XML fields and allows querying them. In fact, one could try to convert this string to XML, then query it:
declare @medTable table (item nvarchar(2000))
insert into @medTable
values ('VS5~MedCond1~35.4|VS4~MedCond2~16|VS1~MedCond3~155|VS2~MedCond4~70|SPO2~MedCond5~100|VS3~MedCond6~64|FiO2~MedCond7~21|MAP~MedCond8~98|');
-- Step 1: Replace `|` with <item> tags and `~` with `tag` tags
-- This will return an xml value for each medTable row
with items as (
    select xmlField = cast('<item><tag>'
        + replace(
              replace(item, '|', '</tag></item><item><tag>'),
              '~', '</tag><tag>')
        + '</tag></item>' as xml)
    from @medTable
)
-- Step 2: Select different tags and display them as fields
select
    y.item.value('(tag/text())[1]', 'nvarchar(20)'),
    y.item.value('(tag/text())[2]', 'nvarchar(20)'),
    y.item.value('(tag/text())[3]', 'nvarchar(20)')
from items outer apply xmlField.nodes('item') as y(item)
The result is :
-------------------- -------------------- -------
VS5 MedCond1 35.4
VS4 MedCond2 16
VS1 MedCond3 155
VS2 MedCond4 70
SPO2 MedCond5 100
VS3 MedCond6 64
FiO2 MedCond7 21
MAP MedCond8 98
NULL NULL NULL
It would be better to perform this conversion when loading the data, though. It's easier, for example, to make the replacements in C# or SSIS and store a complete XML value in the database.
You can modify this query too, to generate the xml value and store it in the database:
declare @medTable2 table (xmlField xml)
with items as (
    select xmlField = cast('<item><tag>' + replace(replace(item, '|', '</tag></item><item><tag>'), '~', '</tag><tag>') + '</tag></item>' as xml)
    from @medTable
)
insert into @medTable2
select items.xmlField
from items
-- Query the new table from now on
select
    y.item.value('(tag/text())[1]', 'nvarchar(20)'),
    y.item.value('(tag/text())[2]', 'nvarchar(20)'),
    y.item.value('(tag/text())[3]', 'nvarchar(20)')
from @medTable2 outer apply xmlField.nodes('item') as y(item)
OK, let me take a stab at this. The solution I'm outlining is not purely SQL Server; it uses a round-trip via a text file.
The approach uses the following steps:
Unpivot the data delimited by the pipe symbols (to create more than one line of output for each line of input)
Round-trip the data from SQL Server to a text file and back
Separate the data into columns on the tilde ~ symbol delimiter
Pivot the data back into columns
The key benefit of this approach is the unpivot operation, which allows you to handle missing columns like MedCond2 naturally by the absence of an equivalent row. It also eliminates nearly all string manipulation, save for the one REPLACE function in step 1 below.
Given a single row's contents like the following:
VS5~MedCond1~35.4|VS4~MedCond2~16|VS1~MedCond3~155|VS2~MedCond4~70|SPO2~MedCond5~100|VS3~MedCond6~64|FiO2~MedCond7~21|MAP~MedCond8~98|
Step 1 (Unpivot): Find and replace all instances of the pipe symbol with a newline character. So, REPLACE(column, '|', CHAR(13)) will give you the following lines of text (i.e. multiple lines of text in a single database row) for a single input row:
VS5~MedCond1~35.4
VS4~MedCond2~16
VS1~MedCond3~155
VS2~MedCond4~70
SPO2~MedCond5~100
VS3~MedCond6~64
FiO2~MedCond7~21
MAP~MedCond8~98
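For instance, step 1 as a single statement against the question's table (MedTable and MainCol are the names from the question's own query):
select REPLACE(MainCol, '|', CHAR(13)) as UnpivotedText  -- one multi-line value per input row
from MedTable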
Step 2 (Round-trip): Write the above output to a text file, using your tool of choice (SSIS, SQLCMD, etc.) and ensure that the newline character defined is the same as that used in the REPLACE command in step 1.
The purpose of this step is to merge the multiple lines produced within each row with the lines from all other rows.
Note that step 1 can be eliminated by defining the row delimiter for steps 2 & 3 as the pipe symbol. I've put in the additional step 1, using newlines, only to make it easier to understand and debug.
Step 3 (Separate columns): Import the text file back into SQL Server using the same tool, and define the column delimiter as the tilde ~ symbol, row delimiter same as in steps 1/2.
ColA MedCondTitle MedCondValue
------ ------------- -------------
VS5 MedCond1 35.4
VS4 MedCond2 16
VS1 MedCond3 155
VS2 MedCond4 70
SPO2 MedCond5 100
VS3 MedCond6 64
FiO2 MedCond7 21
MAP MedCond8 98
Step 4 (Pivot): Now you'd have a trivially simple step of pivoting rows to columns, which can be achieved with a statement of the form:
SUM(CASE WHEN MedCondTitle = 'MedCond1' THEN MedCondValue ELSE 0 END) as MedCond1
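Put together, the pivot might look like the sketch below, assuming the step 3 output was loaded into a hypothetical table MedRows, that a row identifier RowId was carried through the round trip so each original row's conditions can be grouped back together, and that MedCondValue is numeric:
select RowId,
       SUM(CASE WHEN MedCondTitle = 'MedCond1' THEN MedCondValue ELSE 0 END) as MedCond1,
       SUM(CASE WHEN MedCondTitle = 'MedCond2' THEN MedCondValue ELSE 0 END) as MedCond2,
       SUM(CASE WHEN MedCondTitle = 'MedCond3' THEN MedCondValue ELSE 0 END) as MedCond3
from MedRows
group by RowId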

MSSQL Data type conversion

I have a pair of databases (one MSSQL and one Oracle), run by different teams. Some data is synchronized regularly by a stored procedure in the MSSQL database. This stored procedure calls a very large
MERGE [mssqltable].[Mytable] as s
USING THEORACLETABLE.BLA as t
ON t.[R_ID] = s.[R_ID]
WHEN MATCHED THEN UPDATE SET [Field1] = t.[Field1], ..., [BrokenField] = t.[BrokenField]
WHEN NOT MATCHED BY TARGET THEN
... another big statement
The field BrokenField was numeric until today and could take the values NULL, 0, 1, ..., 24. Now the Oracle team has introduced a breaking change for some reason: they changed the type of the column to string, and it now holds the values NULL, "", "ALFA", "BRAVO", ... Of course, the sync broke.
What is the easiest way to fix the sync here? I (MSSQL team lead, a frontend expert but not so much in databases) would usually hand this to one of our database experts, but all of them are ill right now, and the fix must go live today....
I thought of a stored procedure like CONVERT_BROKENFIELD_INT_TO_STRING or so, based on some switch/case logic, which could be called in that MERGE statement, but I am not sure how to do that.
Edit/Clarification:
What I need is a way to make a reusable chunk of SQL code (a stored procedure or function) that takes an input of "ALFA" and returns 1, "BRAVO" -> 2, etc., to avoid writing huge IFs in more than one place.
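For what it's worth, a minimal sketch of such a reusable mapping as a scalar function (the function name is made up, and only the ALFA/BRAVO pairs are known from the question; the rest of the value list would have to be filled in):
CREATE FUNCTION dbo.ConvertBrokenFieldToInt (@value varchar(32))
RETURNS int
AS
BEGIN
    RETURN CASE @value
               WHEN 'ALFA'  THEN 1
               WHEN 'BRAVO' THEN 2
               -- remaining string values would be mapped here
               ELSE NULL
           END;
END;
Note that a crosswalk table, as in the answer below, usually joins more efficiently than a scalar function called once per row.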
If you cannot simplify the logic for correct values the way @RichardHansell described, you can create a crosswalk table mapping BrokenField to the correct values. Then you can use a common table expression or subquery with a left join to that crosswalk in the merge.
create table dbo.BrokenField_Crosswalk (
BrokenField varchar(32) not null primary key
, CorrectedValue int
);
insert into dbo.BrokenField_Crosswalk (BrokenField,CorrectedValue) values
('ALFA', 1)
, ('ALPHA', 1)
, ('BRAVO', 2)
...
go
And your code for the merge would look something like this:
;with cte as (
    select o.R_ID
        , o.Field1
        , BrokenField = cast(isnull(c.CorrectedValue, o.BrokenField) as int)
        ....
    from oracle_table.bla as o
        left join dbo.BrokenField_Crosswalk as c
            on c.BrokenField = o.BrokenField
)
merge into [mssqltable].[Mytable] t
using cte as s
on t.[R_ID] = s.[R_ID]
when matched
    then update set
        [Field1] = s.[Field1]
        , ...
        , [BrokenField] = s.[BrokenField]
when not matched by target
    then
        ...
If they are using names with a letter at the start that goes in a sequence:
A = 1
B = 2
C = 3
etc.
Then you could do something like this:
MERGE [mssqltable].[Mytable] as s
USING THEORACLETABLE.BLA as t
ON ASCII(LEFT(t.[R_ID], 1)) - ASCII('A') + 1 = s.[R_ID]
WHEN MATCHED THEN UPDATE SET [Field1] = t.[Field1], ..., [BrokenField] = t.[BrokenField]
WHEN NOT MATCHED BY TARGET THEN
... another big statement
Edit: but actually I re-read your question and you are talking about [Brokenfield] being the problem column, so my solution wouldn't work.
I don't really understand now, as it seems as though the MERGE statement is updating the oracle table with numbers, so surely you need the mapping to work the other way, i.e. 1 -> ALFA, 2 -> BETA, etc.?

Loading Flat File into SQL Server using SSIS

I am new to SSIS and am trying to import a flat file into my DB. There are 6 different rows in the flat file that I need to combine into one row in the database; each of these rows contains a different price for one symbol. For example:
IGBGK 21 w 47
IGBGK 21 u 2.9150
IGBGK 21 h 2.9300
IGBGK 21 l 2.9050
IGBGK 22 h 2.9300
IGBGK 22 l 2.8800
Each of these is on a different row in the flat file, but they should become one row with different columns for symbol IGBGK. I can transform the data to place each number into its own column but cannot get them to combine into one row.
Any help on the direction I need to go with this is greatly appreciated.
End product should look like:
Symbol | col 1 | col 2 | col 3 | col 4 | col 5 | col 6
-------+-------+-------+-------+-------+-------+-------
IGBGK | 47 | 2.915 | 29.30 | 2.905 | 2.930 | 2.880
1. Create a variable with whatever name you want, of type System.Object.
2. Use an Execute SQL Task. The query for your table (a concrete sketch appears after these steps):
with ABC as
(
    select * from table  -- whatever gives you the original result
)
select * from ABC
pivot (count(<4th column name>) for <1st column name> in ([col 1],[col 2],[col 3],[col 4],[col 5],[col 6])) as p
3. Copy the complete query into that task and set the result set option to Full result set.
4. Switch to the Result Set page, choose the variable you created, and set the result name to 0.
5. Now every time you run the package, the variable is assigned the complete result table in your desired format shown above.
6. Create another 7 variables corresponding to each column (Symbol, [col 1], ...); each variable should be of String data type.
7. Use another Execute SQL Task, specify Variable as the SQL source type, then go to the Parameter Mapping page, choose the System.Object variable, and set its name to 0. After that, go to the Result Set page, choose those seven variables one by one, and set the result names to 0, 1, 2, 3, 4, 5, 6.
From now on, every time you run the package, each variable is assigned its value. If you want to load them into the target table, here comes the last step.
Use another Execute SQL Task with a query like this:
Insert into table
select ?,?,?,?,?,?,?
Go to the Parameter Mapping page, choose all seven variables, and set their names to 0, 1, 2, 3, 4, 5, 6, one by one, to map the ? placeholders.
There could be some small issues you will need to figure out yourself, such as the data types, but the logic is essentially as above. Hope this helps!
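As referenced in step 2, a concrete version of the pivot query for the question's data (a sketch only; the staging table name Quotes and its columns Symbol, PriceType, Price are assumptions about how the flat file rows were imported):
with ABC as
(
    select Symbol,
           PriceType,  -- e.g. '21 w', '21 u', '21 h', '21 l', '22 h', '22 l'
           Price
    from Quotes
)
select *
from ABC
pivot (max(Price) for PriceType in ([21 w],[21 u],[21 h],[21 l],[22 h],[22 l])) as p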

Loop 5 records at a time and assign it to variable

I have a table of 811 records. I want to get five records at a time and assign them to a variable. The next time the loop task in SSIS runs, it should take another five records and overwrite the variable. I have tried doing this with a cursor but couldn't find a solution. Any help will be highly appreciated. I have a table like this, e.g.:
ServerId ServerName
1 Abc11
2 Cde22
3 Fgh33
4 Ijk44
5 Lmn55
6 Opq66
7 Rst77
. .
. .
. .
I want the query to take the first five names, as follows, and assign them to the variable:
ServerId ServerName
1 Abc11
2 Cde22
3 Fgh33
4 Ijk44
5 Lmn55
Then the next loop iteration takes another five names and overwrites the variable value, and so on until the last record is consumed.
Taking ltn's answer into consideration, this is how you can limit the rows in SSIS.
The design is described step by step below.
Step 1 : Create the variables
Name DataType
Count int
Initial int
Final int
Step 2 : In the first Execute SQL Task, write the SQL that stores the count:
Select count(*) from YourTable
In the General tab of this task, set the ResultSet to Single row.
In the ResultSet tab, map the result to the variable:
ResultName VariableName
0 User::Count
Step 3 : In the For Loop container, enter the loop expressions.
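The original screenshot is not reproduced here; a plausible set of loop expressions, assuming the page starts at row 1 and advances five rows per iteration, would be:
InitExpression:   @Initial = 1
EvalExpression:   @Initial <= @Count
AssignExpression: @Initial = @Initial + 5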
Step 4 : Inside the For Loop, drag in an Execute SQL Task and write the paging SQL.
In Parameter Mapping map the initial variable
VariableName Direction DataType ParameterName ParameterSize
User::Initial Input NUMERIC 0 -1
Result Set tab
Result Name Variable Name
0 User::Final
Inside the Data Flow Task (DFT) you can write the SQL to get the particular rows.
Click on Parameters and select the variables INITIAL and FINAL.
If your data will not be updated between paging cycles and the sort order is always the same, then you could try an approach similar to:
CREATE PROCEDURE TEST
(
    @StartNumber INT,
    @TakeNumber INT
)
AS
SELECT TOP(@TakeNumber)
    *
FROM (
    SELECT
        RowNumber = ROW_NUMBER() OVER(ORDER BY IDField DESC),
        NameField
    FROM
        TableName
) AS X
WHERE RowNumber >= @StartNumber
ORDER BY RowNumber  -- makes TOP deterministic, so each page is consecutive
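A usage sketch, paging through the table five rows at a time (procedure name and parameters as defined above):
EXEC TEST @StartNumber = 1, @TakeNumber = 5;  -- rows 1-5
EXEC TEST @StartNumber = 6, @TakeNumber = 5;  -- rows 6-10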
