exceeded the 80 characters length limit and was truncated in flink job - apache-flink

I am doing a join in Flink and I am getting the warning "exceeded the 80 characters length limit and was truncated".
Table tr = tableEnv.sqlQuery("select " +
" coalesce(a.id, b.id) id," +
" coalesce(a.item, b.item) item," +
" a.amount as revenue," +
" b.amount as profit" +
" from " +
" (select * from tableA" +
" where type='revenue') a" +
" full outer join " +
" (select * from tableA" +
" where type='profit') b" +
// multiple join conditions are combined with AND, not a comma
" on a.id = b.id and a.item = b.item");
I am not sure how to resolve this. Is there a character limit for joins?

I suspect you are seeing this warning:
The operator name {} exceeded the {} characters length limit and was truncated.
You can safely ignore this. It only means that the operator name (used in metrics and shown in the web UI) is cut off, so it won't show the complete SQL join.
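As a rough illustration of what the warning means (this is not Flink's source; the OperatorNames class and truncate method are hypothetical, and the sample name is made up in the style of a Flink plan description), shortening a generated name to the 80-character limit reported in the question looks like this:

```java
// Illustration only: mimics shortening an operator name to a length limit.
// OperatorNames and truncate() are hypothetical, not Flink APIs.
class OperatorNames {
    // The limit reported in the warning from the question.
    static final int MAX_LENGTH = 80;

    // Returns the name unchanged if it fits, otherwise its first maxLength characters.
    static String truncate(String name, int maxLength) {
        return name.length() > maxLength ? name.substring(0, maxLength) : name;
    }

    public static void main(String[] args) {
        // Made-up operator name, longer than 80 characters.
        String generated = "Join(joinType=[FullOuterJoin], where=[AND(=(id, id0), =(item, item0))], "
                + "select=[id, item, amount, id0, item0, amount0])";
        String shown = truncate(generated, MAX_LENGTH);
        System.out.println(shown.length()); // 80
    }
}
```

Only the displayed name is shortened; the job itself and its results are unaffected.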

Related

Nested match_recognize query not supported in flink SQL?

I am using Flink 1.11 and trying a nested query with MATCH_RECOGNIZE inside it, as shown below:
select * from events where id = (SELECT * FROM events MATCH_RECOGNIZE (PARTITION BY org_id ORDER BY proctime MEASURES A.id AS startId ONE ROW PER MATCH PATTERN (A C* B) DEFINE A AS A.tag = 'tag1', C AS C.tag <> 'tag2', B AS B.tag = 'tag2'));
And I am getting an error as : org.apache.calcite.sql.validate.SqlValidatorException: Table 'A' not found
Is this not supported? If not, what's the alternative?
I was able to get something working by doing this:
Table events = tableEnv.fromDataStream(input,
$("sensorId"),
$("ts").rowtime(),
$("kwh"));
tableEnv.createTemporaryView("events", events);
Table matches = tableEnv.sqlQuery(
"SELECT id " +
"FROM events " +
"MATCH_RECOGNIZE ( " +
"PARTITION BY sensorId " +
"ORDER BY ts " +
"MEASURES " +
"this_step.sensorId AS id " +
"AFTER MATCH SKIP TO NEXT ROW " +
"PATTERN (this_step next_step) " +
"DEFINE " +
"this_step AS TRUE, " +
"next_step AS TRUE " +
")"
);
tableEnv.createTemporaryView("mmm", matches);
Table results = tableEnv.sqlQuery(
"SELECT * FROM events WHERE events.sensorId IN (select * from mmm)");
tableEnv
.toAppendStream(results, Row.class)
.print();
For some reason, I couldn't get it to work without defining a view. I kept getting Calcite errors.
I guess you are trying to avoid enumerating all of the columns from A in the MEASURES clause of the MATCH_RECOGNIZE. You may want to compare the resulting execution plans to see if there's any significant difference.
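To double-check what the working version returns, here is a plain-Java model of its two steps. It is not Flink code and it ignores event time and streaming; the Event record and the method names are hypothetical, and it needs Java 16+ for records. With both DEFINE conditions TRUE and AFTER MATCH SKIP TO NEXT ROW, PATTERN (this_step next_step) matches once for every event that has a successor in its sensorId partition, so the view holds one id per such event, and the IN filter then keeps every event of a sensor that had at least two events.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java model (not Flink) of the two-step workaround: a MATCH_RECOGNIZE view, then an IN filter.
class MatchThenFilter {
    // Hypothetical event shape mirroring the sensorId/ts/kwh columns in the answer.
    record Event(String sensorId, long ts, double kwh) {}

    // Step 1, like the "mmm" view: a partition with n events yields n - 1 matches
    // (one per event that has a successor), each emitting this_step.sensorId.
    static List<String> matchedIds(List<Event> events) {
        Map<String, Long> perSensor = events.stream()
            .collect(Collectors.groupingBy(Event::sensorId, TreeMap::new, Collectors.counting()));
        List<String> ids = new ArrayList<>();
        perSensor.forEach((id, n) -> {
            for (long i = 0; i < n - 1; i++) {
                ids.add(id);
            }
        });
        return ids;
    }

    // Step 2: SELECT * FROM events WHERE events.sensorId IN (select * from mmm).
    static List<Event> filterByMatches(List<Event> events) {
        Set<String> ids = new HashSet<>(matchedIds(events));
        return events.stream()
            .filter(e -> ids.contains(e.sensorId()))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
            new Event("s1", 1L, 0.5), new Event("s1", 2L, 0.7), new Event("s2", 3L, 1.1));
        System.out.println(matchedIds(events));             // [s1]
        System.out.println(filterByMatches(events).size()); // 2
    }
}
```

Because IN only tests membership, a sensor that appears many times in the view does not duplicate rows in the final result.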

Wrong result in Apache Flink full outer join

I have two data streams created from two tables, like this:
Table orderRes1 = ste.sqlQuery(
"SELECT orderId, userId, SUM(bidPrice) as q FROM " + tble +
" Group by orderId, userId");
Table orderRes2 = ste.sqlQuery(
"SELECT orderId, userId, SUM(askPrice) as q FROM " + tble +
" Group by orderId, userId");
DataStream<Tuple2<Boolean, Row>> ds1 = ste.toRetractStream(orderRes1 , Row.class).
filter(order-> order.f0);
DataStream<Tuple2<Boolean, Row>> ds2 = ste.toRetractStream(orderRes2 , Row.class).
filter(order-> order.f0);
I want to perform a full outer join on these two streams. I tried both orderRes1.fullOuterJoin(orderRes2, $(exp))
and a SQL query containing a full outer join, as below:
Table bidOrdr = ste.fromDataStream(bidTuple, $("orderId"),
$("userId"), $("price"));
Table askOrdr = ste.fromDataStream(askTuple, $("orderId"),
$("userId"), $("price"));
Table result = ste.sqlQuery(
"SELECT COALESCE(bidTbl.orderId, askTbl.orderId) AS orderId, " +
" COALESCE(bidTbl.userId, askTbl.userId) AS userId," +
" COALESCE(bidTbl.bidTotalPrice, 0) AS bidTotalPrice, " +
" COALESCE(askTbl.askTotalPrice, 0) AS askTotalPrice " +
" FROM " +
" (SELECT orderId, userId," +
" SUM(price) AS bidTotalPrice " +
" FROM " + bidOrdr +
" Group by orderId, userId) bidTbl full outer JOIN " +
" (SELECT orderId, userId," +
" SUM(price) AS askTotalPrice" +
" FROM " + askOrdr +
" Group by orderId, userId) askTbl " +
" ON (bidTbl.orderId = askTbl.orderId" +
" AND bidTbl.userId= askTbl.userId) ") ;
DataStream<Tuple2<Boolean, Row>> resultStream = ste.toRetractStream(result, Row.class).filter(order -> order.f0);
However, in some cases the result is not correct. Suppose user A sells to B three times at some price, and then user B sells to A twice. After the second sale, the result is:
7> (true,123,a,300.0,0.0)
7> (true,123,a,300.0,200.0)
10> (true,123,b,0.0,300.0)
10> (true,123,b,200.0,300.0)
The second and fourth lines are the expected output of the stream, but the first and third lines are generated as well.
It's worth mentioning that coGroup is another solution, but I don't want to use windowing in this scenario, and a non-windowed solution is only available for bounded streams (DataSet).
Note: orderId and userId will repeat in both streams, and I want to produce two rows for each action, containing:
orderId, userId1, bidTotalPrice, askTotalPrice AND
orderId, userId2, bidTotalPrice, askTotalPrice
Something like this is to be expected with streaming queries (or in other words, with queries executed on dynamic tables). Unlike a traditional database, where the input relations to a query are kept static during query execution, the inputs to a streaming query are being continuously updated -- and so the result must also be continuously updated.
If I understand the setup here, the "incorrect" results on lines 1 and 3 are correct up until the relevant rows from orderRes2 are processed. If those rows never arrive, then lines 1 and 3 will remain correct.
What you should expect is an eventually correct result, including retractions as necessary. You can reduce the number of intermediate results by turning on mini-batch aggregation.
This mailing list thread gives more insight. If I've misunderstood your situation, please provide a reproducible example that illustrates the problem.
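To see why lines 1 and 3 show up, here is a simplified plain-Java model (not Flink's join operator; the class and method names are hypothetical, Java 16+ for records) in which each side of the join is a keyed table of current totals. Whenever either side's total for an orderId,userId key changes, the previously emitted joined row is retracted and the new one is added, using the same (Boolean, Row) convention as toRetractStream. Note that filter(order -> order.f0) in the question keeps only the add messages, so the retractions that cancel lines 1 and 3 are dropped before they reach the output.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model (not Flink internals) of a full outer join over two updating aggregates.
// (true, row) = add, (false, row) = retract, like the Boolean flag from toRetractStream.
class FullOuterJoinChangelog {
    record Change(boolean add, String row) {
        @Override
        public String toString() {
            return "(" + add + "," + row + ")";
        }
    }

    private final Map<String, Double> bidTotals = new HashMap<>();
    private final Map<String, Double> askTotals = new HashMap<>();
    private final List<Change> log = new ArrayList<>();

    // Joined row for one "orderId,userId" key; a missing side becomes 0, like COALESCE(..., 0).
    private String joinedRow(String key) {
        return key + "," + bidTotals.getOrDefault(key, 0.0) + "," + askTotals.getOrDefault(key, 0.0);
    }

    // A new total on either side retracts the previous joined row (if any), then adds the new one.
    private void update(Map<String, Double> side, String key, double total) {
        if (bidTotals.containsKey(key) || askTotals.containsKey(key)) {
            log.add(new Change(false, joinedRow(key)));
        }
        side.put(key, total);
        log.add(new Change(true, joinedRow(key)));
    }

    void bid(String key, double total) { update(bidTotals, key, total); }
    void ask(String key, double total) { update(askTotals, key, total); }
    List<Change> log() { return log; }

    public static void main(String[] args) {
        FullOuterJoinChangelog join = new FullOuterJoinChangelog();
        join.bid("123,a", 300.0);
        join.ask("123,b", 300.0);
        join.ask("123,a", 200.0);
        join.bid("123,b", 200.0);
        // Full changelog: every superseded row is followed by its retraction.
        join.log().forEach(System.out::println);
        // Keeping only adds, as filter(order -> order.f0) does, leaves the four rows from the question.
        join.log().stream().filter(Change::add).forEach(System.out::println);
    }
}
```

In the real job the upstream GROUP BY aggregates emit retractions as well, so the actual changelog can contain more messages than this model shows; the point is only that each superseded joined row is retracted once the other side arrives.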

What to substitute for array_agg in a PostgreSQL subquery when switching to Derby?

I have a Java application that, in some installations, accesses a PostgreSQL database, while in others it accesses essentially the same database in Derby.
I have a SQL query that returns an examination record from the examination table. There is an exam_procedure table that relates to the examination table in a one (examination) to many fashion. I need to concatenate the potentially multiple string records in the exam_procedure table so that I can add a single string value to the query result that represents all the related exam_procedure records. For a variety of reasons (e.g., joins return too many records, especially when multiple subqueries are needed for other related one-to-many tables), I need to do this via a subquery in the SELECT section of the main query. The following SQL works just fine for PostgreSQL, but my understanding is that array_agg is not available in Derby. What Derby subquery can I substitute for the PostgreSQL subquery?
Many thanks.
// part of the query
"SELECT "
+ "patient_id, "
+ "examination_date, "
+ "examination_number, "
+ "operating_physician_id, "
+ "referring_physician_id, "
+ "patient.last_name AS pt_last_name, "
+ "patient.first_name AS pt_first_name, "
+ "patient.middle_name AS pt_middle_name, "
+ "("
+ "SELECT "
+ "array_agg(prose) "
+ "FROM "
+ "exam_procedure "
+ "WHERE examination_id = " + examId
+ " GROUP BY examination_id"
+ ") AS agg_procedures "
+ "FROM "
+ "examination "
+ "JOIN patient ON patient.id = examination.patient_id "
+ "WHERE "
+ "examination.id = ?"
;
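As far as I know, Derby has no built-in equivalent of array_agg. Since Derby 10.10 you can register a user-defined aggregate with CREATE DERBY AGGREGATE, backed by a Java class implementing org.apache.derby.agg.Aggregator, which would keep the work inside the subquery. A simpler fallback, which is a different technique from the subquery asked for, is to fetch the exam_procedure rows for the examination separately and concatenate them in Java so the same code runs on both databases. A minimal sketch, assuming examId is a long and using a hypothetical ProcedureConcat class with a "; " separator:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Fallback sketch (not a Derby subquery): load the related rows, then concatenate in Java.
// ProcedureConcat, loadProcedures and the "; " separator are hypothetical choices.
class ProcedureConcat {
    // Pure helper: joins the prose values in the order they were read.
    static String concatenate(List<String> proseValues, String separator) {
        return String.join(separator, proseValues);
    }

    // Table and column names come from the question; examId is assumed to be a long.
    // Add an ORDER BY if the order of the procedures matters.
    static String loadProcedures(Connection conn, long examId) throws SQLException {
        String sql = "SELECT prose FROM exam_procedure WHERE examination_id = ?";
        List<String> values = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, examId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    String prose = rs.getString("prose");
                    if (prose != null) { // skip NULLs instead of emitting "null"
                        values.add(prose);
                    }
                }
            }
        }
        return concatenate(values, "; ");
    }
}
```

Binding examId as a parameter also avoids concatenating it into the SQL string.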

How to return a subquery in LINQ/EF

I have the following method that returns a SQL subquery, and I use its return value to build the main query.
Now I need to do the same thing with a LINQ query.
How can I do it?
public void AvailableStock()
{
string query = "Select prod.ID, prod.Name, ";
query += AvailableStockQuery("prod.ID") + " as AvailableStock ";
query += " From TAB_Products prod ";
}
public string AvailableStockQuery(string ProductAlias = "prod.ID")
{
string query = "((Select Sum(est.Quantity) " +
" From ProductStock est " +
" Where est.ProductID = " + ProductAlias +
" ) " +
" - (Select Sum(it.Quantity) " +
" From OrderItens it " +
" Where it.ProductID = " + ProductAlias +
")" +
") ";
return query;
}
You don't even need a subquery here. You could join the ProductStock and OrderItens tables to the TAB_Products table and group by prod.ID, and then the subqueries are not needed at all. The performance is probably better too, and it is easier to translate to EF because there are no subqueries.
Something like this:
SELECT prod.ID, prod.Name, (SUM(est.Quantity) - SUM(it.Quantity)) AS AvailableStock
From TAB_Products prod
LEFT JOIN ProductStock est ON est.ProductID = prod.ID
LEFT JOIN OrderItens it ON it.ProductID = prod.ID
GROUP BY prod.ID, prod.Name
If you do want to use subqueries, here is an example:
https://learn.microsoft.com/en-us/dotnet/csharp/linq/perform-a-subquery-on-a-grouping-operation
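One caveat before adopting the join version: if a product has more than one row in both ProductStock and OrderItens, the two LEFT JOINs produce every stock-row/order-row pair before GROUP BY runs, so both SUMs come out inflated. The plain-Java model below (not EF or SQL; StockRow, OrderItem and the method names are hypothetical, Java 16+ for records) compares, for a single product, the per-table totals that the original subqueries compute with what the join-then-group query computes.

```java
import java.util.List;

// Plain-Java model of the two SQL shapes for ONE product.
// StockRow and OrderItem are hypothetical stand-ins for ProductStock and OrderItens rows.
class StockFanOut {
    record StockRow(int quantity) {}
    record OrderItem(int quantity) {}

    // What the original correlated subqueries compute: each table summed on its own.
    static int viaSubqueries(List<StockRow> stock, List<OrderItem> items) {
        int stocked = stock.stream().mapToInt(StockRow::quantity).sum();
        int ordered = items.stream().mapToInt(OrderItem::quantity).sum();
        return stocked - ordered;
    }

    // What joining both tables before GROUP BY computes: one joined row per
    // (stock row, order row) pair, so each quantity is counted once per row on the other side.
    // Assumes at least one row in each table (otherwise the LEFT JOIN yields NULLs).
    static int viaJoinThenGroup(List<StockRow> stock, List<OrderItem> items) {
        int stocked = 0;
        int ordered = 0;
        for (StockRow s : stock) {
            for (OrderItem o : items) {
                stocked += s.quantity();
                ordered += o.quantity();
            }
        }
        return stocked - ordered;
    }

    public static void main(String[] args) {
        List<StockRow> stock = List.of(new StockRow(10), new StockRow(5));
        List<OrderItem> items = List.of(new OrderItem(3), new OrderItem(4));
        System.out.println(viaSubqueries(stock, items));    // 8
        System.out.println(viaJoinThenGroup(stock, items)); // 16
    }
}
```

A common way to keep the join form correct is to aggregate each detail table per ProductID first (for example in a derived table, or as separate GroupBy calls in LINQ) and then join the aggregated results to TAB_Products.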

Hibernate + MSSQL query compatibility

I need to get the latest "version" of a Task object for a given objectUuid. The Task is identified by its objectUuid, taskName and createdTimestamp attributes.
I had the following HQL query:
select new list(te) from " + TaskEntity.class.getName() + " te
where
te.objectUuid = '" + domainObjectId + "' and
te.createdTimestamp = (
select max(te.createdTimestamp) from " + TaskEntity.class.getName() + " teSub
where teSub.objectUuid = te.objectUuid and teSub.taskName = te.taskName
)
which ran and produced the correct results on H2 (embedded) and MySQL.
However, after deploying this code to production on MS SQL Server, I get the following error:
An aggregate may not appear in the WHERE clause unless it is in a
subquery contained in a HAVING clause or a select list, and the column
being aggregated is an outer reference.
I tried to rewrite the query but HQL doesn't seem to support subqueries properly. My latest attempt is something like:
select new list(te) from " + TaskEntity.class.getName() + " te
inner join (
select te2.objectUuid, te2.taskName, max(te2.createdTimestamp)
from " + TaskEntity.class.getName() + " te2
group by te2.objectUuid, te2.taskName
) teSub on
teSub.objectUuid = te.objectUuid and teSub.taskName = te.taskName
where
te.objectUuid = '" + domainObjectId + "'
but of course it fails at the "(" after the join statement.
Since this is a very frequent type of query I cannot believe there is no solution that works with HQL+MSSQL.
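Could this be a typo? In the subquery you probably want
max(teSub.createdTimestamp)
instead of
max(te.createdTimestamp)
Because te is the outer query's alias, SQL Server treats max(te.createdTimestamp) as an aggregate of the outer query, which puts it in the outer WHERE clause; that is exactly what the error message complains about. H2 and MySQL happen to accept the original form.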
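For reference, here is a plain-Java model of what the corrected query selects (not Hibernate code; the Task record and latestFor method are hypothetical stand-ins for TaskEntity and the query, Java 16+ for records): among the tasks for one objectUuid, each task whose createdTimestamp equals the maximum for its taskName.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Plain-Java model (not Hibernate) of the corrected "latest version per task" query.
class LatestTasks {
    // Hypothetical stand-in for TaskEntity's identifying attributes.
    record Task(String objectUuid, String taskName, long createdTimestamp) {}

    static List<Task> latestFor(List<Task> all, String objectUuid) {
        // Inner query: max(teSub.createdTimestamp) per taskName for this objectUuid.
        Map<String, Long> latestByName = all.stream()
            .filter(t -> t.objectUuid().equals(objectUuid))
            .collect(Collectors.toMap(Task::taskName, Task::createdTimestamp, Long::max));
        // Outer query: keep tasks whose timestamp equals their group's maximum.
        return all.stream()
            .filter(t -> t.objectUuid().equals(objectUuid))
            .filter(t -> t.createdTimestamp() == latestByName.get(t.taskName()))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Task> all = List.of(
            new Task("u1", "build", 1L), new Task("u1", "build", 5L),
            new Task("u1", "deploy", 3L), new Task("u2", "build", 9L));
        latestFor(all, "u1").forEach(System.out::println);
    }
}
```

Like the HQL, this returns more than one row for a taskName if two of its tasks share the same maximum timestamp.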
