Nested match_recognize query not supported in flink SQL? - apache-flink

I am using flink 1.11 and trying nested query where match_recognize is inside, as shown below :
select * from events where id = (SELECT * FROM events MATCH_RECOGNIZE (PARTITION BY org_id ORDER BY proctime MEASURES A.id AS startId ONE ROW PER MATCH PATTERN (A C* B) DEFINE A AS A.tag = 'tag1', C AS C.tag <> 'tag2', B AS B.tag = 'tag2'));
And I am getting an error as : org.apache.calcite.sql.validate.SqlValidatorException: Table 'A' not found
Is this not supported ? If not what's the alternative ?

I was able to get something working by doing this:
Table events = tableEnv.fromDataStream(input,
$("sensorId"),
$("ts").rowtime(),
$("kwh"));
tableEnv.createTemporaryView("events", events);
Table matches = tableEnv.sqlQuery(
"SELECT id " +
"FROM events " +
"MATCH_RECOGNIZE ( " +
"PARTITION BY sensorId " +
"ORDER BY ts " +
"MEASURES " +
"this_step.sensorId AS id " +
"AFTER MATCH SKIP TO NEXT ROW " +
"PATTERN (this_step next_step) " +
"DEFINE " +
"this_step AS TRUE, " +
"next_step AS TRUE " +
")"
);
tableEnv.createTemporaryView("mmm", matches);
Table results = tableEnv.sqlQuery(
"SELECT * FROM events WHERE events.sensorId IN (select * from mmm)");
tableEnv
.toAppendStream(results, Row.class)
.print();
For some reason, I couldn't get it to work without defining a view. I kept getting Calcite errors.
I guess you are trying to avoid enumerating all of the columns from A in the MEASURES clause of the MATCH_RECOGNIZE. You may want to compare the resulting execution plans to see if there's any significant difference.

Related

exceeded the 80 characters length limit and was truncated in flink job

I am doing joining in flink and I am getting exceeded the 80 characters length limit and was truncated.
Table tr = tableEnv.sqlQuery("select " +
" coalesce(a.id, b.id) id," +
" coalesce(a.item, b.item) item," +
" a.amount as revenue," +
" b.amount as profit" +
" from " +
" (select * from tableA" +
" where type='revenue') a" +
" full outer join " +
" (select * from tableA" +
" where type='profit') b" +
" on a.id=b.id, a.item=b.item");
I am not sure how to resolve this. Is there any limit of character in joining?
I suspect you are seeing this warning:
The operator name {} exceeded the {} characters length limit and was truncated.
You can safely ignore this. This just means that label you see in the web UI won't show the complete SQL join.

wrong result in Apache flink full outer join

I have 2 data streams which were created from 2 tables like:
Table orderRes1 = ste.sqlQuery(
"SELECT orderId, userId, SUM(bidPrice) as q FROM " + tble +
" Group by orderId, userId");
Table orderRes2 = ste.sqlQuery(
"SELECT orderId, userId, SUM(askPrice) as q FROM " + tble +
" Group by orderId, userId");
DataStream<Tuple2<Boolean, Row>> ds1 = ste.toRetractStream(orderRes1 , Row.class).
filter(order-> order.f0);
DataStream<Tuple2<Boolean, Row>> ds2 = ste.toRetractStream(orderRes2 , Row.class).
filter(order-> order.f0);
I wonder to perform a full outer join on these 2 streams, and I used both orderRes1.fullOuterJoin(orderRes2 ,$(exp))
and a sql query containing a full outer join, as below:
Table bidOrdr = ste.fromDataStream(bidTuple, $("orderId"),
$("userId"), $("price"));
Table askOrdr = ste.fromDataStream(askTuple, $("orderId"),
$("userId"), $("price"));
Table result = ste.sqlQuery(
"SELECT COALESCE(bidTbl.orderId,askTbl.orderId) , " +
" COALESCE(bidTbl.userId,askTbl.orderId)," +
" COALESCE(bidTbl.bidTotalPrice,0) as bidTotalPrice, " +
" COALESCE(askTbl.askTotalPrice,0) as askTotalPrice, " +
" FROM " +
" (SELECT orderId, userId," +
" SUM(price) AS bidTotalPrice " +
" FROM " + bidOrdr +
" Group by orderId, userId) bidTbl full outer JOIN " +
" (SELECT orderId, userId," +
" SUM(price) AS askTotalPrice" +
" FROM " + askOrdr +
" Group by orderId, userId) askTbl " +
" ON (bidTbl.orderId = askTbl.orderId" +
" AND bidTbl.userId= askTbl.userId) ") ;
DataStream<Tuple2<Boolean, Row>> = ste.toRetractStream(result, Row.class).filter(order -> order.f0);
However, the result in some cases in not correct: imagine user A sells with a price to B 3 times, after that user B sells to A 2 times, the second time the result is:
7> (true,123,a,300.0,0.0)
7> (true,123,a,300.0,200.0)
10> (true,123,b,0.0,300.0)
10> (true,123,b,200.0,300.0)
the second and forth lines are the expected result of stream, but it will generate the 1st and 3rd lines too.
worth mentioning that coGroup is the other solution, yet I do not want to use windowing in this scenario, and a non-windowing solution is just accessible in bounded streams (DataSet).
Hint: orderId and userId will repeat in both streams, and I want to produce 2 rows in each action, containing:
orderId, userId1, bidTotalPrice, askTotalPrice AND
orderId, userId2, bidTotalPrice, askTotalPrice
Something like this is to be expected with streaming queries (or in other words, with queries executed on dynamic tables). Unlike a traditional database, where the input relations to a query are kept static during query execution, the inputs to a streaming query are being continuously updated -- and so the result must also be continuously updated.
If I understand the setup here, the "incorrect" results on lines 1 and 3 are correct up until the relevant rows from orderRes2 are processed. If those rows never arrive, then lines 1 and 3 will remain correct.
What you should expect is an eventually correct result, including retractions as necessary. You can reduce the number of intermediate results by turning on mini-batch aggregation.
This mailing list thread gives more insight. If I've misunderstood your situation, please provide a reproducible example that illustrates the problem.

What to substitute for array_agg in a PostgreSQL subquery when switching to Derby?

I have a java application that for some installations acceses a PostgreSQL database, while in others it acceses essentially the same database in Derby.
I have a SQL query that returns an examination record from the examination table. There is an exam_procedure table that relates to the examination table in a one (examination) to many fashion. I need to concatenate the potentially multiple string records in the exam_procedure table so that I can add a single string value to the query return that represents all the related exam_procedure records. For a variety of reasons (eg, joins return too many records, especially when multiple subqueries are needed for other related one to many tables), I need to do this via a subquery in the SELECT section of the main query. The following SQL works just fine for PostgreSQL, but my understanding is that array_agg is not available in Derby. What Derby subquery can I substitute for the PostgreSQL subquery?
Many thanks.
// part of the query
"SELECT "
+ "patient_id, "
+ "examination_date, "
+ "examination_number, "
+ "operating_physician_id, "
+ "referring_physician_id, "
+ "patient.last_name AS pt_last_name, "
+ "patient.first_name AS pt_first_name, "
+ "patient.middle_name AS pt_middle_name, "
+ "("
+ "SELECT "
+ "array_agg(prose) "
+ "FROM "
+ "exam_procedure "
+ "WHERE examination_id = " + examId
+ " GROUP BY examination_id"
+ ") AS agg_procedures, "
+ "FROM "
+ "examination "
+ "JOIN patient ON patient.id = examination.patient_id "
+ "WHERE "
+ "examination.id = ?"
;

org.hibernate.exception.SQLGrammarException: Conversion failed when converting the nvarchar value 'ViewWebApp' to data type int

I am trying to use the NativeQuery feature in Hibernate with JOIN and subqueries in SQL query, when I am passing the list of parameters with IN condition it's not working.
Code:
Query query1 = entityManager.createNativeQuery("select product0_.* from TEST.PRODUCT product0_ \n" +
"LEFT JOIN\n" +
"(select DISTINCT product1_.PRODUCT_ID from TEST.PRODUCT_GROUP_PERMISSION product1_ \n" +
"where product1_.GROUP_ID = 101 and product1_.PERMISSION_TYPE_ID in (:permissionTypes)\n" +
") \n" +
"AS ProductID ON product0_.PRODUCT_ID = ProductID.PRODUCT_ID where product0_.NAME=:productName and product0_.ACTIVE='1' \n" +
"and product0_.APPLICATION_ID = AppID.APPLICATION_ID order by upper(display_version)")
.setParameter("productName",productName)
.setParameter("permissionTypes",permissionTypes);
Error:
javax.persistence.PersistenceException: org.hibernate.exception.SQLGrammarException: could not execute query] with root cause
com.microsoft.sqlserver.jdbc.SQLServerException: Conversion failed when converting the nvarchar value 'ViewWebApp' to data type int.
Hibernate Create Query method:
Query cQuery = entityManager.createQuery("select product0_ from Product product0_ where product0_.name =:productN and product0_.active='1' and product0_.productId in (select product1_.application.applicationId from ApplicationGroupPermission product1_ where product1_.groups in (:myGroupIds) and product1_.permissionType.permissionTypeField in (:pList)) order by product0_.displayVersion asc ")
.setParameter("productN",productName)
.setParameter("pList",permissionTypes)
.setParameter("myGroupIds",myGroups);

HIbernate + MSSQL query compatibility

I need to get the latest "version" of a Task object for a given objectUuid. The Task is identified by its objectUuid, taskName and createdTimestamp attributes.
I had the following HQL query:
select new list(te) from " + TaskEntity.class.getName() + " te
where
te.objectUuid = '" + domainObjectId + "' and
te.createdTimestamp = (
select max(te.createdTimestamp) from " + TaskEntity.class.getName() + " teSub
where teSub.objectUuid = te.objectUuid and teSub.taskName = te.taskName
)
which ran and produced the correct results on H2 (embedded) and MySQL.
However after installing this code in production to MS SQL Server I get the following error:
An aggregate may not appear in the WHERE clause unless it is in a
subquery contained in a HAVING clause or a select list, and the column
being aggregated is an outer reference.
I tried to rewrite the query but HQL doesn't seem to support subqueries properly. My latest attempt is something like:
select new list(te) from " + TaskEntity.class.getName() + " te
inner join (
select te2.objectUuid, te2.taskName, max(te2.createdTimestamp)
from " + TaskEntity.class.getName() + " te2
group by te2.objectUuid, te2.taskName
) teSub on
teSub.objectUuid = te.objectUuid and teSub.taskName = te.taskName
where
te.objectUuid = '" + domainObjectId + "'
but of course it fails at the "(" after the join statement.
Since this is a very frequent type of query I cannot believe there is no solution that works with HQL+MSSQL.
Uh-oh. Can this be a typo?
max(teSub.createdTimestamp)
instead of
max(te.createdTimestamp)
in the subquery.

Resources