This code converts a pandas DataFrame to a Flink table, performs the transformation, and then converts the result back to pandas. It works fine when I chain filter, filter, and select, but it gives me an error when I add group_by and order_by.
import pandas as pd
import numpy as np
# PyFlink imports assumed; they were not shown in the original snippet
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import EnvironmentSettings, StreamTableEnvironment

f_s_env = StreamExecutionEnvironment.get_execution_environment()
f_s_settings = EnvironmentSettings.new_instance().use_old_planner().in_streaming_mode().build()
table_env = StreamTableEnvironment.create(f_s_env, environment_settings=f_s_settings)
df = pd.read_csv("dataBase/New/candidate.csv")
col = ['candidate_id', 'candidate_source_id', 'candidate_first_name',
'candidate_middle_name', 'candidate_last_name', 'candidate_email',
'created_date', 'last_modified_date', 'last_modified_by']
table = table_env.from_pandas(df,col)
table.filter("candidate_id > 322445")\
.filter("candidate_first_name === 'Libby'")\
.group_by("candidate_id, candidate_source_id")\
.select("candidate_id, candidate_source_id")\
.order_by("candidate_id").to_pandas()
My error is:
Py4JJavaError: An error occurred while calling o3164.orderBy.
: org.apache.flink.table.api.ValidationException: A limit operation on unbounded tables is currently not supported.
at org.apache.flink.table.operations.utils.SortOperationFactory.failIfStreaming(SortOperationFactory.java:131)
at org.apache.flink.table.operations.utils.SortOperationFactory.createSort(SortOperationFactory.java:63)
at org.apache.flink.table.operations.utils.OperationTreeBuilder.sort(OperationTreeBuilder.java:409)
at org.apache.flink.table.api.internal.TableImpl.orderBy(TableImpl.java:401)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.flink.api.python.shaded.py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:282)
at org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79)
at org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:745)
If you look in the documentation, you will see that with the Table API, ORDER BY is only supported for batch queries. If you switch to SQL, then you can have streaming queries that sort on an ascending time attribute.
Sorting by anything else in an unbounded streaming query is simply impossible, since sorting requires full knowledge of the input.
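For the Table API route, here is a minimal sketch (not the original poster's code) of the batch-mode alternative, assuming a PyFlink version (roughly 1.11+) where the blink planner's batch mode, from_pandas and to_pandas are all available; on bounded input, order_by is allowed. Alternatively, drop order_by from the streaming job and sort the resulting DataFrame in pandas:
import pandas as pd
from pyflink.table import EnvironmentSettings, TableEnvironment

df = pd.read_csv("dataBase/New/candidate.csv")  # path taken from the question

# Option 1: a bounded (batch) table environment supports order_by.
batch_settings = EnvironmentSettings.new_instance().use_blink_planner().in_batch_mode().build()
batch_env = TableEnvironment.create(batch_settings)
result = batch_env.from_pandas(df) \
    .filter("candidate_id > 322445") \
    .group_by("candidate_id, candidate_source_id") \
    .select("candidate_id, candidate_source_id") \
    .order_by("candidate_id") \
    .to_pandas()

# Option 2: keep the streaming job, skip order_by, and sort after to_pandas().
# result = unsorted_df.sort_values("candidate_id")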
I have an ML model that takes two numpy.ndarrays - users and items - and returns a numpy.ndarray of predictions. In normal Python code, I would do:
model = load_model()
df = load_data() # the DataFrame includes 4 columns, namely, user_id, movie_id, rating, and timestamp
users = df.user_id.values
items = df.movie_id.values
predictions = model(users, items)
I am looking into porting this code into Flink to leverage its distributed nature. My assumption is: by distributing the prediction workload on multiple Flink nodes, I should be able to run the whole prediction faster.
So I compose a PyFlink job. Note that I implement a UDF called predict to run the prediction.
# batch_prediction.py
# PyFlink imports assumed; they were not shown in the original snippet
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import DataTypes, EnvironmentSettings, StreamTableEnvironment
from pyflink.table.udf import udf

model = load_model()

settings = EnvironmentSettings.new_instance().use_blink_planner().build()
exec_env = StreamExecutionEnvironment.get_execution_environment()
t_env = StreamTableEnvironment.create(exec_env, environment_settings=settings)
SOURCE_DDL = """
CREATE TABLE source (
user_id INT,
movie_id INT,
rating TINYINT,
event_ms BIGINT
) WITH (
'connector' = 'filesystem',
'format' = 'csv',
'csv.field-delimiter' = '\t',
'path' = 'ml-100k/u1.test'
)
"""
SINK_DDL = """
CREATE TABLE sink (
prediction DOUBLE
) WITH (
'connector' = 'print'
)
"""
t_env.execute_sql(SOURCE_DDL)
t_env.execute_sql(SINK_DDL)
t_env.execute_sql(
"INSERT INTO sink SELECT PREDICT(user_id, movie_id) FROM source"
).wait()
Here is the UDF.
# batch_prediction.py (cont)
@udf(result_type=DataTypes.DOUBLE())
def predict(user, item):
    return model([user], [item]).item()

t_env.create_temporary_function("predict", predict)
The job runs fine. However, the prediction actually runs on each and every row of the source table, which is not performant. Instead, I want to split the 80,000 (user_id, movie_id) pairs into, let's say, 100 batches of 800 rows each, so that the job triggers the model(users, items) function 100 times (= the number of batches), with users and items each holding 800 elements.
I couldn't find a way to do this. Looking at the docs, vectorized user-defined functions may work.
# batch_prediction.py (snippet)
# I add func_type="pandas"
@udf(result_type=DataTypes.DOUBLE(), func_type="pandas")
def predict(user, item):
    ...
Unfortunately, it doesn't.
> python batch_prediction.py
...
Traceback (most recent call last):
File "batch_prediction.py", line 55, in <module>
"INSERT INTO sink SELECT PREDICT(user_id, movie_id) FROM source"
File "/usr/local/anaconda3/envs/flink-ml/lib/python3.7/site-packages/pyflink/table/table_result.py", line 76, in wait
get_method(self._j_table_result, "await")()
File "/usr/local/anaconda3/envs/flink-ml/lib/python3.7/site-packages/py4j/java_gateway.py", line 1286, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/usr/local/anaconda3/envs/flink-ml/lib/python3.7/site-packages/pyflink/util/exceptions.py", line 147, in deco
return f(*a, **kw)
File "/usr/local/anaconda3/envs/flink-ml/lib/python3.7/site-packages/py4j/protocol.py", line 328, in get_return_value
format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o51.await.
: java.util.concurrent.ExecutionException: org.apache.flink.table.api.TableException: Failed to wait job finish
at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
at org.apache.flink.table.api.internal.TableResultImpl.awaitInternal(TableResultImpl.java:119)
at org.apache.flink.table.api.internal.TableResultImpl.await(TableResultImpl.java:86)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.api.python.shaded.py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:282)
at org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79)
at org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.table.api.TableException: Failed to wait job finish
at org.apache.flink.table.api.internal.InsertResultIterator.hasNext(InsertResultIterator.java:59)
at org.apache.flink.table.api.internal.TableResultImpl$CloseableRowIteratorWrapper.hasNext(TableResultImpl.java:355)
at org.apache.flink.table.api.internal.TableResultImpl$CloseableRowIteratorWrapper.isFirstRowReady(TableResultImpl.java:368)
at org.apache.flink.table.api.internal.TableResultImpl.lambda$awaitInternal$1(TableResultImpl.java:107)
at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
Caused by: java.util.concurrent.ExecutionException: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
at org.apache.flink.table.api.internal.InsertResultIterator.hasNext(InsertResultIterator.java:57)
... 7 more
Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:147)
at org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$2(MiniClusterJobClient.java:119)
at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
at org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$0(AkkaInvocationHandler.java:229)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:996)
at akka.dispatch.OnComplete.internal(Future.scala:264)
at akka.dispatch.OnComplete.internal(Future.scala:261)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:191)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:188)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:74)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:572)
at akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:22)
at akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:21)
at scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:436)
at scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:435)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:116)
at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:78)
at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:224)
at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:217)
at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:208)
at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:610)
at org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:89)
at org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:419)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:286)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:201)
at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:154)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
at akka.actor.ActorCell.invoke(ActorCell.scala:561)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
at akka.dispatch.Mailbox.run(Mailbox.scala:225)
at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
... 4 more
Caused by: org.apache.flink.streaming.runtime.tasks.AsynchronousException: Caught exception while processing timer.
at org.apache.flink.streaming.runtime.tasks.StreamTask$StreamTaskAsyncExceptionHandler.handleAsyncException(StreamTask.java:1108)
at org.apache.flink.streaming.runtime.tasks.StreamTask.handleAsyncException(StreamTask.java:1082)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invokeProcessingTimeCallback(StreamTask.java:1213)
at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$null$17(StreamTask.java:1202)
at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:92)
at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:78)
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxExecutorImpl.tryYield(MailboxExecutorImpl.java:91)
at org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper.quiesceTimeServiceAndCloseOperator(StreamOperatorWrapper.java:155)
at org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper.close(StreamOperatorWrapper.java:130)
at org.apache.flink.streaming.runtime.tasks.OperatorChain.closeOperators(OperatorChain.java:412)
at org.apache.flink.streaming.runtime.tasks.StreamTask.afterInvoke(StreamTask.java:585)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:547)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:722)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:547)
at java.lang.Thread.run(Thread.java:748)
Caused by: TimerException{java.lang.RuntimeException: Failed to close remote bundle}
... 13 more
Caused by: java.lang.RuntimeException: Failed to close remote bundle
at org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.finishBundle(BeamPythonFunctionRunner.java:371)
at org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.flush(BeamPythonFunctionRunner.java:325)
at org.apache.flink.streaming.api.operators.python.AbstractPythonFunctionOperator.invokeFinishBundle(AbstractPythonFunctionOperator.java:291)
at org.apache.flink.table.runtime.operators.python.scalar.arrow.RowDataArrowPythonScalarFunctionOperator.invokeFinishBundle(RowDataArrowPythonScalarFunctionOperator.java:77)
at org.apache.flink.streaming.api.operators.python.AbstractPythonFunctionOperator.checkInvokeFinishBundleByTime(AbstractPythonFunctionOperator.java:285)
at org.apache.flink.streaming.api.operators.python.AbstractPythonFunctionOperator.lambda$open$0(AbstractPythonFunctionOperator.java:134)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invokeProcessingTimeCallback(StreamTask.java:1211)
... 12 more
Caused by: java.lang.NullPointerException
at org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.finishBundle(BeamPythonFunctionRunner.java:369)
... 18 more
The error messages are not very helpful. Can anyone help? Thanks!
Note: source code can be found here. To run the code, you will need Anaconda locally, then:
conda env create -f environment.yml
conda activate flink-ml
Credits to Dian Fu from Apache Flink community. See thread.
For Pandas UDF, the input type for each input argument is Pandas.Series and the result type should also be a Pandas.Series. Besides, the length of the result should be the same as the inputs. Could you check if this is the case for your Pandas UDF implementation?
Then I decided to add a pytest unit test for my UDF to verify the input and output types. Here is how:
import pandas as pd
from udf_def import predict

def test_predict():
    f = predict._func
    users = pd.Series([1, 2, 3])
    items = pd.Series([1, 4, 9])
    preds = f(users, items)
    assert isinstance(preds, pd.Series)
    assert len(preds) == 3
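To run it (assuming the test lives in a file named test_udf_def.py next to udf_def.py; the actual repository may use a different name):
> pytest test_udf_def.py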
It uncovers the implementation mistake. Then I fix my UDF implementation to pass the unit test.
# torch and pandas imports added here for completeness; not shown in the original snippet
import torch
import pandas as pd
from model_def import MatrixFactorization

@udf(result_type=DataTypes.DOUBLE(), func_type="pandas")
def predict(users, items):
    n_users, n_items = 943, 1682
    model = MatrixFactorization(n_users, n_items)
    model.load_state_dict(torch.load("model.pth"))
    return pd.Series(model(users, items).detach().numpy())
With the new UDF implementation, I am able to run the batch prediction.
> python batch_prediction.py
1> +I(-2.3802385330200195)
1> +I(20.000154495239258)
1> +I(-30.704544067382812)
1> +I(-11.602800369262695)
1> +I(10.998968124389648)
...
The updated source code can be found here.
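One more knob relates to the original goal of roughly 800-row batches: PyFlink feeds a Pandas UDF via Arrow batches, and the batch size can be tuned through configuration. This is a hedged sketch; the option python.fn-execution.arrow.batch.size exists in recent PyFlink versions, but check the docs of your version for the exact name and default:
# Assumption: t_env is the StreamTableEnvironment created in batch_prediction.py.
# Cap the number of rows per Arrow batch handed to the Pandas UDF.
t_env.get_config().get_configuration().set_string(
    "python.fn-execution.arrow.batch.size", "800"
)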
I'm trying to import some data from different SqlServer databases using ExecuteSQL in NiFi, but it's returning me an error. I've already imported a lot of other tables from MySQL databases without any problem and I'm trying to use the same workflow structure for the SqlServer dbs.
The structure is as follows:
There's a file .txt with the list of tables to be imported
This file is fetched, split and updated, so there's a FlowFile for each table of each db that has to be imported,
These FlowFiles are passed into ExecuteSQL which executes their contents
For example:
file.txt
table1
table2
table3
is being updated into 3 different FlowFiles:
FlowFile1
SELECT * FROM table1
FlowFile2
SELECT * FROM table2
FlowFile3
SELECT * FROM table3
which are passed to ExecuteSQL.
Here follows the configuration of ExecuteSQL (identical for the SqlServer tables and the MySQL ones):
(ExecuteSQL processor configuration screenshot)
As the only difference with the import from MySQL db is in the connectors, this is how a generic MySQL connector has been configured:
Database Connection URL: jdbc:mysql://00.00.00.00/DataBase?zeroDateTimeBehavior=convertToNull&autoReconnect=true
Database Driver Class Name: com.mysql.jdbc.Driver
Database Driver Location(s): file:///path/mysql-connector-java-5.1.47-bin.jar
Database User: user
Password: Sensitive value set
Max Wait Time: 500 millis
Max Total Connections: 8
Validation query: No value set
And this is how a SqlServer connector has been configured:
Database Connection URL: jdbc:jtds:sqlserver://00.00.00.00/DataBase;useNTLMv2=true;integratedSecurity=true;
Database Driver Class Name: net.sourceforge.jtds.jdbc.Driver
Database Driver Location(s): /path/connectors/jtds-1.3.1.jar
Database User: user
Password: Sensitive value set
Max Wait Time: -1
Max Total Connections: 8
Validation query: No value set
Note that one (and only one!) SqlServer connector works, and the ExecuteSQL processor imports the data without any problem. Even stranger, the database reached through that connector is located in the same place as two others (the connection URL and user/password are identical), yet only the first one works.
Note that I've also tried appending ?zeroDateTimeBehavior=convertToNull&autoReconnect=true to the SqlServer connections, suspecting a date-type problem, but it didn't make any difference.
Here is the error that is being returned:
12:02:46 CEST ERROR f1553b83-a173-1c0f-93cb-1c32f0f46d1d
00.00.00.00:0000 ExecuteSQL[id=****] ExecuteSQL[id=****] failed to process session due to null; Processor Administratively Yielded for 1 sec: java.lang.AbstractMethodError
Error retrieved from logs:
ERROR [Timer-Driven Process Thread-49] o.a.nifi.processors.standard.ExecuteSQL ExecuteSQL[id=****] ExecuteSQL[id=****] failed to process session due to java.lang.AbstractMethodError; Processor Administratively Yielded for 1 sec: java.lang.AbstractMethodError
java.lang.AbstractMethodError: null
at net.sourceforge.jtds.jdbc.JtdsConnection.isValid(JtdsConnection.java:2833)
at org.apache.commons.dbcp2.DelegatingConnection.isValid(DelegatingConnection.java:874)
at org.apache.commons.dbcp2.PoolableConnection.validate(PoolableConnection.java:270)
at org.apache.commons.dbcp2.PoolableConnectionFactory.validateConnection(PoolableConnectionFactory.java:389)
at org.apache.commons.dbcp2.BasicDataSource.validateConnectionFactory(BasicDataSource.java:2398)
at org.apache.commons.dbcp2.BasicDataSource.createPoolableConnectionFactory(BasicDataSource.java:2381)
at org.apache.commons.dbcp2.BasicDataSource.createDataSource(BasicDataSource.java:2110)
at org.apache.commons.dbcp2.BasicDataSource.getConnection(BasicDataSource.java:1563)
at org.apache.nifi.dbcp.DBCPConnectionPool.getConnection(DBCPConnectionPool.java:305)
at org.apache.nifi.dbcp.DBCPService.getConnection(DBCPService.java:49)
at sun.reflect.GeneratedMethodAccessor1696.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.nifi.controller.service.StandardControllerServiceInvocationHandler.invoke(StandardControllerServiceInvocationHandler.java:84)
at com.sun.proxy.$Proxy449.getConnection(Unknown Source)
at org.apache.nifi.processors.standard.AbstractExecuteSQL.onTrigger(AbstractExecuteSQL.java:195)
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I am using Java Hibernate with two databases (PostgreSQL and MSSQL).
SqlServer2012 with dialect:
hibernate.dialect = org.hibernate.dialect.SQLServer2012Dialect
I have written a Criteria query like :
DetachedCriteria detachedCriteria=DetachedCriteria.forClass(entityClazz);
ProjectionList proj = Projections.projectionList();
proj.add(Projections.max(COMPOSEDID_VERSION_ID));
proj.add(Projections.groupProperty(COMPOSEDID_ID));
detachedCriteria.setProjection(proj);
criteria = session.createCriteria(entityClazz)
.add( Subqueries.propertiesIn(new String[] { COMPOSEDID_VERSION_ID, COMPOSEDID_ID }, detachedCriteria));
This query worked fine with the PostgreSQL DB. But when I switch to MSSQL I get the following error:
Caused by: java.sql.SQLException: An expression of non-boolean type specified in a context where a condition is expected, near ','.
at net.sourceforge.jtds.jdbc.SQLDiagnostic.addDiagnostic(SQLDiagnostic.java:372)
at net.sourceforge.jtds.jdbc.TdsCore.tdsErrorToken(TdsCore.java:2988)
at net.sourceforge.jtds.jdbc.TdsCore.nextToken(TdsCore.java:2421)
at net.sourceforge.jtds.jdbc.TdsCore.getMoreResults(TdsCore.java:671)
at net.sourceforge.jtds.jdbc.JtdsStatement.executeSQLQuery(JtdsStatement.java:505)
at net.sourceforge.jtds.jdbc.JtdsPreparedStatement.executeQuery(JtdsPreparedStatement.java:1029)
at org.hibernate.engine.jdbc.internal.ResultSetReturnImpl.extract(ResultSetReturnImpl.java:70)[201:org.hibernate.core:5.0.0.Final]
Can anyone help me out? What change should I make to the Criteria API query to get the max-version record for each Id?
Instead of adding the detached criteria's projections as a subquery to the criteria, set the projection directly on the criteria, like this:
DetachedCriteria detachedCriteria=DetachedCriteria.forClass(entityClazz);
ProjectionList proj = Projections.projectionList();
proj.add(Projections.max(COMPOSEDID_VERSION_ID));
proj.add(Projections.groupProperty(COMPOSEDID_ID));
criteria = session.createCriteria(entityClazz)
.setProjection( proj );
I wanted to convert my JSON string to a SQL statement using the ConvertJSONtoSQL processor.
Example JSON string:
{"cpuwait":"0.0","servernamee":"mywindows","cpusys":"5.3","cpuidle":"77.6","datee":"29-SEP-2016","timee":"00:01:33","cpucpuno":"CPU01","cpuuser":"17.1"}
Table structure in the Oracle db:
CREATE TABLE cpu (
datee varchar2(15) DEFAULT NULL,
timee varchar2(10) DEFAULT NULL,
servernamee varchar2(20) DEFAULT NULL,
cpucpuno varchar2(4) DEFAULT NULL,
cpuuser varchar2(5) DEFAULT NULL,
cpusys varchar2(5) DEFAULT NULL,
cpuwait varchar2(5) DEFAULT NULL,
cpuidle varchar2(5) DEFAULT NULL
);
Configuration used for the MySQL database:
Database connection URL: jdbc:mysql://localhost:3306/testnifi
Database Driver class name: com.mysql.jdbc.Driver
I successfully connected to MySQL using a DBCP connection pool with that JDBC URL, username and password.
The ConvertJSONtoSQL processor worked there successfully and I get a valid SQL insert statement as output.
But when I try the same with an Oracle database I get:
ERROR [Timer-Driven Process Thread-6] o.a.n.p.standard.ConvertJSONToSQL
java.sql.SQLException: Stream has already been closed
My configuration for Oracle db connection:
I googled the error and found that it occurs when LONG data types are used in database tables, but I'm not using them.
I went through the source code of the ConvertJSONtoSQL processor (following the stack trace) and tried to implement the same logic in Eclipse, where I don't get any error: I can connect to the database and run queries.
So is there any mistake in my configuration?
NiFi version - 0.7.0/1.0 (I'm getting the same error in both)
Java version - Java 8
Oracle DB version - Oracle Database 11g Express Edition
Complete Stack trace:
2016-10-19 07:10:06,557 ERROR [Timer-Driven Process Thread-6] o.a.n.p.standard.ConvertJSONToSQL
java.sql.SQLException: Stream has already been closed
at oracle.jdbc.driver.LongAccessor.getBytesInternal(LongAccessor.java:156) ~[ojdbc6.jar:11.2.0.1.0]
at oracle.jdbc.driver.LongAccessor.getBytes(LongAccessor.java:126) ~[ojdbc6.jar:11.2.0.1.0]
at oracle.jdbc.driver.LongAccessor.getString(LongAccessor.java:201) ~[ojdbc6.jar:11.2.0.1.0]
at oracle.jdbc.driver.T4CLongAccessor.getString(T4CLongAccessor.java:427) ~[ojdbc6.jar:11.2.0.1.0]
at oracle.jdbc.driver.OracleResultSetImpl.getString(OracleResultSetImpl.java:1251) ~[ojdbc6.jar:11.2.0.1.0]
at oracle.jdbc.driver.OracleResultSet.getString(OracleResultSet.java:494) ~[ojdbc6.jar:11.2.0.1.0]
at org.apache.commons.dbcp.DelegatingResultSet.getString(DelegatingResultSet.java:263) ~[na:na]
at org.apache.nifi.processors.standard.ConvertJSONToSQL$ColumnDescription.from(ConvertJSONToSQL.java:677) ~[nifi-standard-processors-0.7.0.jar:0.7.0]
at org.apache.nifi.processors.standard.ConvertJSONToSQL$TableSchema.from(ConvertJSONToSQL.java:621) ~[nifi-standard-processors-0.7.0.jar:0.7.0]
at org.apache.nifi.processors.standard.ConvertJSONToSQL.onTrigger(ConvertJSONToSQL.java:267) ~[nifi-standard-processors-0.7.0.jar:0.7.0]
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27) [nifi-api-0.7.0.jar:0.7.0]
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1054) [nifi-framework-core-0.7.0.jar:0.7.0]
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:136) [nifi-framework-core-0.7.0.jar:0.7.0]
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47) [nifi-framework-core-0.7.0.jar:0.7.0]
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:127) [nifi-framework-core-0.7.0.jar:0.7.0]
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) [na:1.7.0_40]
at java.util.concurrent.FutureTask.runAndReset(Unknown Source) [na:1.7.0_40]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown Source) [na:1.7.0_40]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) [na:1.7.0_40]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [na:1.7.0_40]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [na:1.7.0_40]
at java.lang.Thread.run(Unknown Source) [na:1.7.0_40]
It seems to be a bug in the Oracle driver. See:
https://blog.jooq.org/2015/12/30/oracle-long-and-long-raw-causing-stream-has-already-been-closed-exception/
Hibernate custom type to avoid 'Caused by: java.sql.SQLException: Stream has already been closed'
Item 2 gave me the workaround: basically, add the following argument in bootstrap.conf:
java.arg.xx=-Doracle.jdbc.useFetchSizeWithLongColumn=true
I have 2 buttons on my webpage. When I hit the first button, the system time gets written to StartTime in my SQL table, and when I hit stop, the end time has to be updated and the difference between the two should be written to another table. I'm using the queries below.
To update the end time and the total time:
update breakstable set endtime = ?, TotalBreakTime = (? - StartTime) where userid = ?
and endtime is NULL
Here the first two ?s refer to the time of the button click and the last ? is the logged-in userId captured from the session.
To update another table with the sum of the TotalBreakTime:
MERGE Time_Tracker as target using (SELECT USERID, CONVERT(NUMERIC(10,2),
SUM(DATEDIFF(Second, '19000101', TotalBreakTime))/60.0) as ColumnWithBreaksCount FROM
BreaksTable where CONVERT(Date, StartTime) = CONVERT(Date, GETDATE()) GROUP BY USERID)
as source ON target.USERID = source.USERID WHEN MATCHED THEN UPDATE
SET BREAKS = source.ColumnWithBreaksCount;
Problem:
I start my timer, go on a break, return after an hour and a half, and hit the stop button. Instead of updating the table, it gives me the exception below.
com.microsoft.sqlserver.jdbc.SQLServerException: The datediff function resulted in an overflow. The number of dateparts separating two date/time instances is too large. Try to use datediff with a less precise datepart.
And the JDBC exception (for the Java folks) is below.
com.microsoft.sqlserver.jdbc.SQLServerException: The datediff function resulted in an overflow. The number of dateparts separating two date/time instances is too large. Try to use datediff with a less precise datepart.
    at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:196)
    at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1454)
    at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:388)
    at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.doExecute(SQLServerPreparedStatement.java:338)
    at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:4026)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:1416)
    at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:185)
    at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:160)
    at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeUpdate(SQLServerPreparedStatement.java:306)
    at org.DAO.UpdateEndTimeDAO.UpdateEndTimeDetails(UpdateEndTimeDAO.java:48)
    at org.servlet.UpdateEndTime.doPost(UpdateEndTime.java:38)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:648)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:291)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at Filter.MyFilter.doFilter(MyFilter.java:58)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:217)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:502)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:142)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
    at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:616)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:518)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1091)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:673)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1500)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1456)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Unknown Source)
The exception points to the second query declared in my question.
If the break is short, say 2 or 3 minutes, it updates without any issue.
Please tell me where I am going wrong and how I can fix this.
My Breaks table looks like below. Instead of NULL, there should be the end time captured on the button click.
And my Time Tracker table looks like below. Instead of 4.97 in the screenshot above, it should be the sum of the TotalBreakTime from my first screenshot.
Thanks
This is because DATETIME functions like DATEDIFF, DATEADD etc. accept and return the integer (INT) data type. The limit of that data type is -2^31 (-2,147,483,648) to 2^31-1 (2,147,483,647).
So if you add -2,147,483,648 seconds to '2016-08-26 17:15:54.000', it returns '1948-08-08 14:01:46.000':
SELECT DATEADD(SS,CAST(-2147483648 AS BIGINT),'2016-08-26 17:15:54.000');
Anything beyond the limit of the data type returns an error:
Arithmetic overflow error converting expression to data type int.
Because the date difference in seconds between the dates in your query is too large (outside the limit of the INT data type), you get that error. For reference:
SELECT CONVERT(NUMERIC(10,2),
SUM(CAST(DATEDIFF(Second, '19000101', '2016-08-26 17:15:54.000') AS BIGINT))/60.0)
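As a quick sanity check of the arithmetic (Python used here purely for illustration, not SQL Server), the number of seconds between the '19000101' anchor and the 2016 timestamp from the example above is far beyond what a 32-bit INT, the return type of DATEDIFF, can hold:
from datetime import datetime

# Seconds between the DATEDIFF anchor 1900-01-01 and the sample timestamp above.
seconds = int((datetime(2016, 8, 26, 17, 15, 54) - datetime(1900, 1, 1)).total_seconds())
print(seconds)              # roughly 3.68 billion
print(seconds > 2**31 - 1)  # True, which is why an INT-returning DATEDIFF(SECOND, ...) overflows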
I would suggest discussing the logic with your team and resolving the issue accordingly.
Thanks
Try this:
MERGE Time_Tracker as target
using
(SELECT USERID,
Sum(cast(cast(TotalBreakTime as float)
* 86400 as bigint)) ColumnWithBreaksCount
FROM BreaksTable b
Where datediff(day, StartTime, GETDATE()) = 0
GROUP BY USERID) source
ON target.USERID = source.USERID
WHEN MATCHED THEN UPDATE
SET BREAKS = source.ColumnWithBreaksCount;