I'm trying to connect to prestodb from Zeppelin using the generic JDBC interpreter. Here's the configuration:
presto %jdbc (default)
Option Shared
Properties
name value
default.driver com.facebook.presto.jdbc.PrestoDriver
default.url jdbc:presto://presto:8080
default.user presto
zeppelin.jdbc.concurrent.max_connection 10
zeppelin.jdbc.concurrent.use true
Dependencies
artifact exclude
/zeppelin/interpreter/jdbc/presto-jdbc-0.157.jar
I can successfully connect and query using the CLI:
./presto --server presto:8080
But when I try to use any query inside a notebook paragraph I get:
null
class java.lang.NullPointerException
org.apache.zeppelin.jdbc.JDBCInterpreter.getMaxResult(JDBCInterpreter.java:471)
org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:307)
org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:408)
org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:94)
org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:341)
org.apache.zeppelin.scheduler.Job.run(Job.java:176)
org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
java.util.concurrent.FutureTask.run(FutureTask.java:266)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
Is there something missing in my JDBC interpreter configuration?
It turns out it was a simple configuration issue. Looking at the code for getMaxResult:
propertiesMap.get(COMMON_KEY).getProperty(MAX_LINE_KEY, MAX_LINE_DEFAULT));
So I suspected that JDBC interpreters have some default configuration that I forgot to include in my presto interpreter. Searching for the keywords COMMON and MAX, there it was:
common.max_count 1000
Adding this property to the presto interpreter solved this problem.
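To illustrate why the missing property surfaces as a NullPointerException, here is a minimal sketch. It is not Zeppelin's actual code: the class name, the grouping helper, and the constant values ("common", "max_count", "1000") are assumptions inferred from the quoted line and from the property name above. The idea is that properties are grouped by their prefix, so with no common.* property the lookup of the common group returns null.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Simplified stand-in for the interpreter's property handling (hypothetical, not Zeppelin's code).
class MaxResultSketch {
    // Assumed values, inferred from the "common.max_count" property name.
    static final String COMMON_KEY = "common";
    static final String MAX_LINE_KEY = "max_count";
    static final String MAX_LINE_DEFAULT = "1000";

    // Groups "prefix.key" settings by prefix, e.g. "default.url" goes to group "default" under key "url".
    static Map<String, Properties> groupByPrefix(Map<String, String> settings) {
        Map<String, Properties> grouped = new HashMap<>();
        for (Map.Entry<String, String> entry : settings.entrySet()) {
            int dot = entry.getKey().indexOf('.');
            if (dot < 0) {
                continue; // no prefix, ignored in this sketch
            }
            String prefix = entry.getKey().substring(0, dot);
            String key = entry.getKey().substring(dot + 1);
            grouped.computeIfAbsent(prefix, p -> new Properties()).setProperty(key, entry.getValue());
        }
        return grouped;
    }

    // Same shape as the quoted line: get(COMMON_KEY) is null when no "common.*" property exists,
    // so calling getProperty on it throws NullPointerException.
    static int getMaxResult(Map<String, Properties> propertiesMap) {
        return Integer.parseInt(
            propertiesMap.get(COMMON_KEY).getProperty(MAX_LINE_KEY, MAX_LINE_DEFAULT));
    }
}
```

With only default.* and zeppelin.* settings, getMaxResult throws; once any common.* property such as common.max_count is present, the group exists and the call returns normally.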
I am trying to create a data pipeline from SQL Server (running on a GCP VM) to BigQuery using Cloud Data Fusion. I have completed the following setup steps:
Created a new instance in Cloud Data Fusion.
Added it as a service account in IAM & Admin.
Installed the JDBC driver for the SQL Server plugin.
Created the Wrangler and read the data from SQL Server using the SQL Server plugin (at this step I can successfully authenticate against my SQL Server and see my table data).
Completed the pipeline configuration by adding BigQuery as a sink.
When I run the pipeline, it ends with a few errors. I have tried a few Google searches but didn't find an answer.
I was able to create a Data Fusion pipeline from GCS to BigQuery and it works fine, but this SQL Server to BigQuery pipeline fails with an error.
Could anyone please help me with this?
Here are the error details:
2020-01-10 13:00:47,528 - WARN [Thread-95:o.a.h.m.LocalJobRunner#589] - job_local976595976_0001
java.lang.Exception: java.lang.NullPointerException
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:491) ~[hadoop-mapreduce-client-common-2.9.2.jar:na]
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:551) ~[hadoop-mapreduce-client-common-2.9.2.jar:na]
java.lang.NullPointerException: null
at org.apache.hadoop.mapreduce.lib.db.DataDrivenDBInputFormat.createDBRecordReader(DataDrivenDBInputFormat.java:281) ~[hadoop-mapreduce-client-core-2.9.2.jar:na]
at io.cdap.plugin.db.batch.source.DataDrivenETLDBInputFormat.createDBRecordReader(DataDrivenETLDBInputFormat.java:124) ~[1578661227434-0/:na]
at org.apache.hadoop.mapreduce.lib.db.DBInputFormat.createRecordReader(DBInputFormat.java:245) ~[hadoop-mapreduce-client-core-2.9.2.jar:na]
at io.cdap.cdap.etl.batch.preview.LimitingInputFormat.createRecordReader(LimitingInputFormat.java:51) ~[cdap-etl-core-6.1.0.jar:na]
at io.cdap.cdap.internal.app.runtime.batch.dataset.input.MultiInputFormat.createRecordReader(MultiInputFormat.java:92) ~[na:na]
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:521) ~[hadoop-mapreduce-client-core-2.9.2.jar:na]
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) ~[hadoop-mapreduce-client-core-2.9.2.jar:na]
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) ~[hadoop-mapreduce-client-core-2.9.2.jar:na]
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270) ~[hadoop-mapreduce-client-common-2.9.2.jar:na]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_232]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_232]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_232]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_232]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_232]
2020-01-10 13:00:50,841 - ERROR [MapReduceRunner-phase-1:i.c.c.i.a.r.ProgramControllerServiceAdapter#97] - MapReduce Program 'phase-1' failed.
java.lang.IllegalStateException: MapReduce JobId job_local976595976_0001 failed
at com.google.common.base.Preconditions.checkState(Preconditions.java:176) ~[com.google.guava.guava-13.0.1.jar:na]
at io.cdap.cdap.internal.app.runtime.batch.MapReduceRuntimeService.run(MapReduceRuntimeService.java:416) ~[na:na]
at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:52) ~[com.google.guava.guava-13.0.1.jar:na]
at io.cdap.cdap.internal.app.runtime.batch.MapReduceRuntimeService$2$1.run(MapReduceRuntimeService.java:450) [na:na]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_232]
2020-01-10 13:00:50,842 - ERROR [MapReduceRunner-phase-1:i.c.c.i.a.r.ProgramControllerServiceAdapter#98] - MapReduce program 'phase-1' failed with error: MapReduce JobId job_local976595976_0001 failed. Please check the system logs for more details.
java.lang.IllegalStateException: MapReduce JobId job_local976595976_0001 failed
at com.google.common.base.Preconditions.checkState(Preconditions.java:176) ~[com.google.guava.guava-13.0.1.jar:na]
at io.cdap.cdap.internal.app.runtime.batch.MapReduceRuntimeService.run(MapReduceRuntimeService.java:416) ~[na:na]
at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:52) ~[com.google.guava.guava-13.0.1.jar:na]
at io.cdap.cdap.internal.app.runtime.batch.MapReduceRuntimeService$2$1.run(MapReduceRuntimeService.java:450) [na:na]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_232]
2020-01-10 13:00:50,916 - ERROR [WorkflowDriver:i.c.c.d.SmartWorkflow#552] - Pipeline '0f084034-33a9-11ea-95f6-8e2648ebe039' failed.
2020-01-10 13:00:51,225 - ERROR [WorkflowDriver:i.c.c.i.a.r.w.WorkflowProgramController#89] - Workflow service 'workflow.default.0f084034-33a9-11ea-95f6-8e2648ebe039.DataPipelineWorkflow.20288f05-33a9-11ea-a505-8e2648ebe039' failed.
java.lang.IllegalStateException: MapReduce JobId job_local976595976_0001 failed
at com.google.common.base.Preconditions.checkState(Preconditions.java:176) ~[com.google.guava.guava-13.0.1.jar:na]
at io.cdap.cdap.internal.app.runtime.batch.MapReduceRuntimeService.run(MapReduceRuntimeService.java:416) ~[na:na]
at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:52) ~[com.google.guava.guava-13.0.1.jar:na]
at io.cdap.cdap.internal.app.runtime.batch.MapReduceRuntimeService$2$1.run(MapReduceRuntimeService.java:450) ~[na:na]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_232]
According to the logs you posted, the pipeline fails with a java.lang.NullPointerException, which indicates that a null value was used somewhere along the execution path where an object was required.
Assuming you have configured the JDBC driver correctly, I would recommend checking the Database source properties in your pipeline to find the undefined field. It may be the Import Query property, which imports data from the specified table using a SELECT query. The query must include $CONDITIONS if the number of splits to generate is greater than 1:
SELECT * FROM <table> WHERE $CONDITIONS
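As a rough illustration of what the $CONDITIONS placeholder is for, the sketch below substitutes a range predicate for one split into an import query. This is only a sketch of the idea, not the plugin's or Hadoop's implementation: the class, the method, the table name orders, and the split column id are all hypothetical.

```java
// Hypothetical helper showing how a per-split predicate could replace $CONDITIONS.
class ConditionsSketch {
    static final String TOKEN = "$CONDITIONS";

    // Builds the query for one split covering lower (inclusive) to upper (exclusive).
    static String forSplit(String importQuery, String splitColumn, long lower, long upper) {
        if (importQuery == null || !importQuery.contains(TOKEN)) {
            throw new IllegalArgumentException("import query must contain " + TOKEN);
        }
        String condition = "(" + splitColumn + " >= " + lower
            + " AND " + splitColumn + " < " + upper + ")";
        // String.replace treats the token literally, so the '$' needs no escaping.
        return importQuery.replace(TOKEN, condition);
    }

    public static void main(String[] args) {
        String query = "SELECT * FROM orders WHERE $CONDITIONS";
        System.out.println(forSplit(query, "id", 0, 100));
        System.out.println(forSplit(query, "id", 100, 200));
    }
}
```

Each split gets its own copy of the query with a different range, which is why a query without the placeholder cannot be divided across more than one split.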
UPDATE:
https://issues.cask.co/browse/CDAP-16453
It's a known issue, fixed in 6.1.2
"Same error on MySQL 5.x
Strange enough, if you deploy the pipeline and run it it works...
I'm thinking about decoupling pipelines to have small sql-to-storage and the big pipeline in the outgoing flow"
regards
Virgilio
I'm trying to get my Zeppelin notebook to use Windows Authentication to connect to MS SQL Server. I've gotten local authentication to work using JDBC, and I've gotten Zeppelin authentication working with Active Directory. This is the final step to get the notebook working. This should be possible, right?
In my interpreter I have:
Properties
zeppelin.jdbc.auth.type = Kerberos
zeppelin.jdbc.integratedSecurity = true
Dependencies
/opt/zeppelin/interpreter/mssql/mssql-jdbc-6.4.0.jre8.jar
But when I try out my notebook I get this error:
java.lang.ClassNotFoundException: org.apache.hadoop.security.UserGroupInformation$AuthenticationMethod
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.zeppelin.jdbc.security.JDBCSecurityImpl.getAuthtype(JDBCSecurityImpl.java:65)
at org.apache.zeppelin.jdbc.security.JDBCSecurityImpl.createSecureConfiguration(JDBCSecurityImpl.java:42)
at org.apache.zeppelin.jdbc.JDBCInterpreter.open(JDBCInterpreter.java:190)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:491)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
at org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
zeppelin.jdbc.auth.type = true is not a valid setting
The supported authentication types are SIMPLE and KERBEROS:
https://zeppelin.apache.org/docs/latest/interpreter/jdbc.html#more-properties
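As a small sketch of the point above, the following check accepts only the two documented values for zeppelin.jdbc.auth.type. The class and method are hypothetical and are not part of Zeppelin; trimming the value and ignoring case are assumptions of this sketch, not documented behavior.

```java
import java.util.Locale;

// Hypothetical validator for the zeppelin.jdbc.auth.type value (not Zeppelin code).
class AuthTypeSketch {
    enum AuthType { SIMPLE, KERBEROS }

    static AuthType parse(String value) {
        if (value == null) {
            throw new IllegalArgumentException("auth type is not set");
        }
        try {
            // Case-insensitive matching is an assumption made for convenience here.
            return AuthType.valueOf(value.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(
                "Unsupported auth type: " + value + " (expected SIMPLE or KERBEROS)");
        }
    }
}
```

A value such as true is rejected by this check, while SIMPLE and KERBEROS are accepted.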
I'm trying to set up my interpreter, but I'm lost and need some help. I set up my environment variables (I think), but when I try to check the Spark version using
$spark sc.version
I get this error:
java.lang.NullPointerException
at org.apache.thrift.transport.TSocket.open(TSocket.java:170)
at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:51)
at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:37)
at org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:60)
at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.getClient(RemoteInterpreterProcess.java:62)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.callRemoteFunction(RemoteInterpreterProcess.java:133)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.internal_create(RemoteInterpreter.java:165)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.open(RemoteInterpreter.java:132)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:299)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:407)
at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:315)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Is this saying that my Java is wrong? I have my JAVA_HOME environment variable set like so:
export JAVA_HOME=/usr/java8
So I would expect the interpreter to use java8 and work.
Any help is appreciated!
I'm trying to add Spark dependencies via Maven by specifying groupId:artifactId:version in the Dependencies section of Zeppelin's Interpreter console. After I saved the settings and executed a Spark paragraph, a java.lang.NullPointerException was thrown. The full log is below:
java.lang.NullPointerException
at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:44)
at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:39)
at org.apache.zeppelin.spark.OldSparkInterpreter.createSparkContext_2(OldSparkInterpreter.java:375)
at org.apache.zeppelin.spark.OldSparkInterpreter.createSparkContext(OldSparkInterpreter.java:364)
at org.apache.zeppelin.spark.OldSparkInterpreter.getSparkContext(OldSparkInterpreter.java:172)
at org.apache.zeppelin.spark.OldSparkInterpreter.open(OldSparkInterpreter.java:740)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:61)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
at org.apache.zeppelin.spark.SparkSqlInterpreter.getSparkInterpreter(SparkSqlInterpreter.java:76)
at org.apache.zeppelin.spark.SparkSqlInterpreter.interpret(SparkSqlInterpreter.java:92)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:103)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:633)
at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:140)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I then removed that Maven dependency, but the exception didn't disappear.
According to the log, it seems that Zeppelin is unable to start a Spark interpreter, so I looked at the Spark interpreter log in zeppelin-interpreter-spark-zeppelin-development-cluster-m.log, but nothing was logged when I ran a Spark paragraph. Below is the Zeppelin log found in zeppelin-zeppelin-development-cluster-m.log:
INFO [2018-08-10 13:24:53,036] ({qtp2110245805-15} VFSNotebookRepo.java[save]:196) - Saving note:2DM9MXZGM
INFO [2018-08-10 13:24:53,053] ({pool-2-thread-2} SchedulerFactory.java[jobStarted]:109) - Job 20180810-111509_373986425 started by scheduler org.apache.zeppelin.interpreter.remote.RemoteInterpreter-spark:shared_process-2DM9MXZGM
INFO [2018-08-10 13:24:53,054] ({pool-2-thread-2} Paragraph.java[jobRun]:380) - Run paragraph [paragraph_id: 20180810-111509_373986425, interpreter: sql, note_id: 2DM9MXZGM, user: anonymous]
WARN [2018-08-10 13:24:58,614] ({pool-2-thread-2} NotebookServer.java[afterStatusChange]:2303) - Job 20180810-111509_373986425 is finished, status: ERROR, exception: null, result: %text java.lang.NullPointerException
at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:44)
at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:39)
at org.apache.zeppelin.spark.OldSparkInterpreter.createSparkContext_2(OldSparkInterpreter.java:375)
at org.apache.zeppelin.spark.OldSparkInterpreter.createSparkContext(OldSparkInterpreter.java:364)
at org.apache.zeppelin.spark.OldSparkInterpreter.getSparkContext(OldSparkInterpreter.java:172)
at org.apache.zeppelin.spark.OldSparkInterpreter.open(OldSparkInterpreter.java:740)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:61)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
at org.apache.zeppelin.spark.SparkSqlInterpreter.getSparkInterpreter(SparkSqlInterpreter.java:76)
at org.apache.zeppelin.spark.SparkSqlInterpreter.interpret(SparkSqlInterpreter.java:92)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:103)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:633)
at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:140)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I found similar posts describing the same exception, but they are related to Zeppelin being unable to connect to Hive. I don't think that's my issue. I also ran spark-shell and it worked fine.
I'm on Google Dataproc image version 1.3.1.
thanks
I've managed to fix it. The problem was that I had added the artifact https://mvnrepository.com/artifact/spotify/spark-bigquery/0.2.2-s_2.11 as an external dependency, and it might have conflicted with the Spark or Zeppelin libraries. After removing everything in zeppelin.dep.localrepo (which is /usr/lib/zeppelin/local-repo in my case) and restarting Zeppelin, everything is back to normal.
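For anyone who prefers to script the cleanup, here is a minimal sketch that empties a directory while keeping the directory itself. The class and method names are hypothetical; the path you pass in should be whatever zeppelin.dep.localrepo points to on your installation, and Zeppelin still needs a restart afterwards.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: removes everything under a local-repo directory but keeps the directory.
class LocalRepoCleaner {
    // Returns the number of files and directories that were deleted.
    static int clearContents(Path repoDir) throws IOException {
        if (!Files.isDirectory(repoDir)) {
            throw new IllegalArgumentException("not a directory: " + repoDir);
        }
        List<Path> toDelete;
        try (Stream<Path> walk = Files.walk(repoDir)) {
            toDelete = walk
                .filter(p -> !p.equals(repoDir))      // keep the root directory itself
                .sorted(Comparator.reverseOrder())    // children sort before their parents
                .collect(Collectors.toList());
        }
        for (Path p : toDelete) {
            Files.delete(p);
        }
        return toDelete.size();
    }
}
```

Sorting in reverse order makes sure each directory is already empty by the time it is deleted.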
thanks
I installed the new Zeppelin version, 0.8.0, following this link.
Since I wanted the Phoenix interpreter, I added it with the help of the guide given here.
When I run the paragraph below, I get the following error:
%phoenix
select * from "mytable"
org.apache.thrift.TApplicationException: Internal error processing createInterpreter
at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_createInterpreter(RemoteInterpreterService.java:209)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.createInterpreter(RemoteInterpreterService.java:192)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter$2.call(RemoteInterpreter.java:164)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter$2.call(RemoteInterpreter.java:160)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.callRemoteFunction(RemoteInterpreterProcess.java:141)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.internal_create(RemoteInterpreter.java:160)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.open(RemoteInterpreter.java:129)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:287)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:425)
at org.apache.zeppelin.scheduler.Job.run(Job.java:182)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:307)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)