Spark check fails - apache-zeppelin

I'm trying to set up my interpreter, but I'm lost and need some help. I set up my environment variables (I think), but when I try to check the Spark version using
$spark sc.version
I get this error:
java.lang.NullPointerException
at org.apache.thrift.transport.TSocket.open(TSocket.java:170)
at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:51)
at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:37)
at org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:60)
at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.getClient(RemoteInterpreterProcess.java:62)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.callRemoteFunction(RemoteInterpreterProcess.java:133)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.internal_create(RemoteInterpreter.java:165)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.open(RemoteInterpreter.java:132)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:299)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:407)
at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:315)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Is this saying that my Java is wrong? I have my environment variable for Java home set like so:
export JAVA_HOME=/usr/java8
So I would expect the interpreter to use java8 and work.
Any help is appreciated!

Related

How to set up Zeppelin to use SQL Server Windows Authentication

I'm trying to get my Zeppelin notebook to use Windows Authentication to connect to MS SQL Server. I've gotten local authentication to work using JDBC, and I've gotten Zeppelin authentication working with Active Directory. This is the final step to get the notebook working. This should be possible, right?
In my interpreter I have:
Properties
zeppelin.jdbc.auth.type = Kerberos
zeppelin.jdbc.integratedSecurity = true
Dependencies
/opt/zeppelin/interpreter/mssql/mssql-jdbc-6.4.0.jre8.jar
But when I try out my notebook I get this error:
java.lang.ClassNotFoundException: org.apache.hadoop.security.UserGroupInformation$AuthenticationMethod
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.zeppelin.jdbc.security.JDBCSecurityImpl.getAuthtype(JDBCSecurityImpl.java:65)
at org.apache.zeppelin.jdbc.security.JDBCSecurityImpl.createSecureConfiguration(JDBCSecurityImpl.java:42)
at org.apache.zeppelin.jdbc.JDBCInterpreter.open(JDBCInterpreter.java:190)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:491)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
at org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
zeppelin.jdbc.auth.type = true is not a valid setting.
The supported authentication methods are SIMPLE and KERBEROS.
https://zeppelin.apache.org/docs/latest/interpreter/jdbc.html#more-properties
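As a rough illustration of the comment above, the check below accepts only the two values it names. This is a sketch under assumptions, not Zeppelin's own code: the class name JdbcAuthTypeCheck and the helper isSupported are hypothetical, and the case-insensitive matching is an assumption made for convenience.

```java
import java.util.Locale;

// Hypothetical helper, not part of Zeppelin: checks a candidate value for
// zeppelin.jdbc.auth.type against the two methods named in the comment above.
class JdbcAuthTypeCheck {
    enum AuthType { SIMPLE, KERBEROS }

    // Returns true only when the value names one of the supported methods.
    // Case-insensitive matching is an assumption of this sketch.
    static boolean isSupported(String value) {
        if (value == null) {
            return false;
        }
        try {
            AuthType.valueOf(value.trim().toUpperCase(Locale.ROOT));
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String v : new String[] {"Kerberos", "SIMPLE", "true"}) {
            System.out.println(v + " -> " + isSupported(v));
        }
    }
}
```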

Apache Zeppelin error local jar not exist

java.lang.RuntimeException: Warning: Local jar C:\Zeppelin\zeppelin-0.8.0-bin-all\bin\54480 does not exist, skipping.
Warning: Local jar C:\Zeppelin\zeppelin-0.8.0-bin-all\bin\10.10.10.122 does not exist, skipping.
java.lang.ClassNotFoundException: org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:235)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:836)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2018-09-18 10:33:29 INFO ShutdownHookManager:54 - Shutdown hook called
2018-09-18 10:33:29 INFO ShutdownHookManager:54 - Deleting directory C:\Users\Polichetti\AppData\Local\Temp\spark-e1cca18a-e05a-4539-b1b6-2f56a8ab27aa
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterManagedProcess.start(RemoteInterpreterManagedProcess.java:205)
at org.apache.zeppelin.interpreter.ManagedInterpreterGroup.getOrCreateInterpreterProcess(ManagedInterpreterGroup.java:64)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getOrCreateInterpreterProcess(RemoteInterpreter.java:111)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.internal_create(RemoteInterpreter.java:164)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.open(RemoteInterpreter.java:132)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:299)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:407)
at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:307)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Please, James, this is not how Stack Overflow works. If you have an error log, do some research first instead of dumping it here like trash.

Zeppelin throws java.lang.NullPointerException when starting SparkInterpreter after adding Spark dependencies via Maven

I'm trying to add Spark dependencies via Maven by specifying groupId:artifactId:version in the Dependencies section of Zeppelin's Interpreter console. Once I saved and executed a Spark paragraph, a java.lang.NullPointerException was thrown. Full log below:
java.lang.NullPointerException
at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:44)
at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:39)
at org.apache.zeppelin.spark.OldSparkInterpreter.createSparkContext_2(OldSparkInterpreter.java:375)
at org.apache.zeppelin.spark.OldSparkInterpreter.createSparkContext(OldSparkInterpreter.java:364)
at org.apache.zeppelin.spark.OldSparkInterpreter.getSparkContext(OldSparkInterpreter.java:172)
at org.apache.zeppelin.spark.OldSparkInterpreter.open(OldSparkInterpreter.java:740)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:61)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
at org.apache.zeppelin.spark.SparkSqlInterpreter.getSparkInterpreter(SparkSqlInterpreter.java:76)
at org.apache.zeppelin.spark.SparkSqlInterpreter.interpret(SparkSqlInterpreter.java:92)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:103)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:633)
at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:140)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I then removed those Maven dependencies, but the exception didn't disappear.
According to the log, it seems that Zeppelin is unable to start a Spark interpreter, so I looked at the Spark interpreter log at zeppelin-interpreter-spark-zeppelin-development-cluster-m.log, but nothing was logged when I ran a Spark paragraph. Below is the Zeppelin log found in zeppelin-zeppelin-development-cluster-m.log:
INFO [2018-08-10 13:24:53,036] ({qtp2110245805-15} VFSNotebookRepo.java[save]:196) - Saving note:2DM9MXZGM
INFO [2018-08-10 13:24:53,053] ({pool-2-thread-2} SchedulerFactory.java[jobStarted]:109) - Job 20180810-111509_373986425 started by scheduler org.apache.zeppelin.interpreter.remote.RemoteInterpreter-spark:shared_process-2DM9MXZGM
INFO [2018-08-10 13:24:53,054] ({pool-2-thread-2} Paragraph.java[jobRun]:380) - Run paragraph [paragraph_id: 20180810-111509_373986425, interpreter: sql, note_id: 2DM9MXZGM, user: anonymous]
WARN [2018-08-10 13:24:58,614] ({pool-2-thread-2} NotebookServer.java[afterStatusChange]:2303) - Job 20180810-111509_373986425 is finished, status: ERROR, exception: null, result: %text java.lang.NullPointerException
at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:44)
at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:39)
at org.apache.zeppelin.spark.OldSparkInterpreter.createSparkContext_2(OldSparkInterpreter.java:375)
at org.apache.zeppelin.spark.OldSparkInterpreter.createSparkContext(OldSparkInterpreter.java:364)
at org.apache.zeppelin.spark.OldSparkInterpreter.getSparkContext(OldSparkInterpreter.java:172)
at org.apache.zeppelin.spark.OldSparkInterpreter.open(OldSparkInterpreter.java:740)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:61)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
at org.apache.zeppelin.spark.SparkSqlInterpreter.getSparkInterpreter(SparkSqlInterpreter.java:76)
at org.apache.zeppelin.spark.SparkSqlInterpreter.interpret(SparkSqlInterpreter.java:92)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:103)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:633)
at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:140)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I found similar posts describing the same issue, but they are related to Zeppelin being unable to connect to Hive. I don't think that's my issue. I also ran spark-shell and it worked fine.
I'm on Google Dataproc image version 1.3.1.
thanks
I've managed to fix it. The problem was that I had added the artifact https://mvnrepository.com/artifact/spotify/spark-bigquery/0.2.2-s_2.11 as an external dependency, and it might have conflicted with the Spark or Zeppelin libraries. After removing everything in zeppelin.dep.localrepo (which is /usr/lib/zeppelin/local-repo in my case) and restarting Zeppelin, everything is back to normal.
thanks
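The cleanup step from the answer above, emptying the local dependency repository, can be sketched roughly as follows. This is only an illustration: the class LocalRepoCleaner and its method clearContents are hypothetical names, and the demo in main runs against a throwaway temporary directory rather than the real /usr/lib/zeppelin/local-repo. Zeppelin would still need to be restarted afterwards, as the answer describes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Zeppelin.
class LocalRepoCleaner {
    // Deletes every file and subdirectory below repoDir while keeping repoDir
    // itself, and returns the number of entries removed.
    static int clearContents(Path repoDir) throws IOException {
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(repoDir)) {
            entries = walk
                .filter(p -> !p.equals(repoDir))
                // Reverse order puts children before their parent directories.
                .sorted(Comparator.reverseOrder())
                .collect(Collectors.toList());
        }
        for (Path p : entries) {
            Files.delete(p);
        }
        return entries.size();
    }

    public static void main(String[] args) throws IOException {
        // Demo on a temporary directory; pass the real local-repo path with care.
        Path repo = Files.createTempDirectory("local-repo-demo");
        Files.createDirectories(repo.resolve("spark"));
        Files.createFile(repo.resolve("spark").resolve("example.jar"));
        System.out.println("removed " + clearContents(repo) + " entries");
    }
}
```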

Zeppelin Internal error processing create Interpreter

I tried to install a new Zeppelin version (0.8.0), following this link for the installation.
Since I wanted the Phoenix interpreter, I added it with the help of the guide given here.
When I run the paragraph below, I get the following error:
%phoenix
select * from "mytable"
org.apache.thrift.TApplicationException: Internal error processing createInterpreter
at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_createInterpreter(RemoteInterpreterService.java:209)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.createInterpreter(RemoteInterpreterService.java:192)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter$2.call(RemoteInterpreter.java:164)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter$2.call(RemoteInterpreter.java:160)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.callRemoteFunction(RemoteInterpreterProcess.java:141)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.internal_create(RemoteInterpreter.java:160)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.open(RemoteInterpreter.java:129)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:287)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:425)
at org.apache.zeppelin.scheduler.Job.run(Job.java:182)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:307)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

prestodb JDBC interpreter raises NullPointerException

I'm trying to connect to prestodb from Zeppelin using the generic JDBC interpreter. Here's the configuration:
presto %jdbc (default)
Option Shared
Properties
name value
default.driver com.facebook.presto.jdbc.PrestoDriver
default.url jdbc:presto://presto:8080
default.user presto
zeppelin.jdbc.concurrent.max_connection 10
zeppelin.jdbc.concurrent.use true
Dependencies
artifact exclude
/zeppelin/interpreter/jdbc/presto-jdbc-0.157.jar
I can successfully connect and query using the CLI:
./presto --server presto:8080
But when I try to use any query inside a notebook paragraph I get:
null
class java.lang.NullPointerException
org.apache.zeppelin.jdbc.JDBCInterpreter.getMaxResult(JDBCInterpreter.java:471)
org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:307)
org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:408)
org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:94)
org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:341)
org.apache.zeppelin.scheduler.Job.run(Job.java:176)
org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
java.util.concurrent.FutureTask.run(FutureTask.java:266)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
Is there something missing in my JDBC interpreter configuration?
It turns out it was a simple configuration issue. Looking at the code for getMaxResult:
propertiesMap.get(COMMON_KEY).getProperty(MAX_LINE_KEY, MAX_LINE_DEFAULT));
So I suspected that JDBC interpreters have some configuration that I forgot to include in my presto interpreter. Searching for the keywords COMMON and MAX, there it was:
common.max_count 1000
Adding this property to the presto interpreter solved this problem.
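To see why the missing property surfaces as a NullPointerException, here is a minimal sketch of the lookup quoted above. It is not Zeppelin's actual class: MaxResultSketch and the group helper are hypothetical, and the constant values "common", "max_count" and "1000" are assumptions inferred from the common.max_count property name. When no property with the common prefix exists, propertiesMap.get(COMMON_KEY) returns null, and calling getProperty on it throws before the default value can ever be applied.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical stand-in for the quoted lookup; not Zeppelin's actual code.
class MaxResultSketch {
    // Assumed values, inferred from the "common.max_count" property name.
    static final String COMMON_KEY = "common";
    static final String MAX_LINE_KEY = "max_count";
    static final String MAX_LINE_DEFAULT = "1000";

    // Mirrors the quoted expression: throws NullPointerException when no
    // Properties object is registered under COMMON_KEY.
    static int getMaxResult(Map<String, Properties> propertiesMap) {
        return Integer.parseInt(
            propertiesMap.get(COMMON_KEY).getProperty(MAX_LINE_KEY, MAX_LINE_DEFAULT));
    }

    // Groups flat settings such as "common.max_count" by the text before the
    // first dot (an assumption about how the map gets populated).
    static Map<String, Properties> group(Map<String, String> flat) {
        Map<String, Properties> grouped = new HashMap<>();
        for (Map.Entry<String, String> e : flat.entrySet()) {
            String[] parts = e.getKey().split("\\.", 2);
            if (parts.length != 2) {
                continue; // no prefix, ignored in this sketch
            }
            grouped.computeIfAbsent(parts[0], k -> new Properties())
                   .setProperty(parts[1], e.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        Map<String, String> settings = new HashMap<>();
        settings.put("default.url", "jdbc:presto://presto:8080");
        try {
            getMaxResult(group(settings));
        } catch (NullPointerException e) {
            System.out.println("NullPointerException without a common.* property");
        }
        settings.put("common.max_count", "1000");
        System.out.println("max result: " + getMaxResult(group(settings)));
    }
}
```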
