Zeppelin connection refused - apache-zeppelin

My Zeppelin notebook is reporting a "java.net.ConnectException: Connection refused: connect" error. I tried a few simple Python statements. Is this the notebook losing its connection to the web server? I don't see a lot of chat about Zeppelin on SO. It's hard to troubleshoot.
INFO [2016-12-12 19:23:37,006] ({pool-1-thread-7} Paragraph.java[jobRun]:252) - run paragraph 20161212-191758_314125131 using null org.apache.zeppelin.interpreter.LazyOpenInterpreter@6bf78ad0
ERROR [2016-12-12 19:23:38,003] ({pool-1-thread-7} Job.java[run]:189) - Job failed
org.apache.zeppelin.interpreter.InterpreterException: org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused: connect
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.init(RemoteInterpreter.java:165)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:328)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterprete

First of all, that error isn't a connection problem to the web server; it looks like a Spark connection problem. In case you didn't intend to use Spark there (and want to use Python as before), you should specify the name of the interpreter at the beginning of each paragraph in Zeppelin (python, spark, etc.). If you don't specify one, Zeppelin uses the default bound interpreter from your list of interpreters, and by default that's Spark with Scala. Thus:
1) if you are using Python, add %python at the beginning of that paragraph (see the example below);
2) if you do intend to use the Spark interpreter, your version of Zeppelin and whether you're using SPARK_HOME would be helpful.
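For illustration, a paragraph explicitly bound to the Python interpreter would look something like this (a minimal sketch; the statement itself is just a placeholder):
%python
print("hello from the Python interpreter")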

Related

SSIS to Oracle "Could not create a managed connection manager."

I'm trying to use SSIS to load some data from Oracle database to MSSQL database.
I created the project and used the ADO.Net source and was able to create a connection to Oracle and run queries and view results.
However when I actually run the package I get the following error:
Error: 0xC0208449 at Data Flow Task, ADO NET Source 2: ADO NET Source has failed to acquire the connection {EECB236A-59EA-475E-AE82-52871D15952D} with the following error message: "Could not create a managed connection manager.".
It seems similar to the issue here
And I did find that I have two Oracle client versions installed, "11.1" and "12.2".
One is used by PL/SQL and the other by an Entity Framework project.
If this is the issue, I just wanted a way to tell SSIS to pick up the correct one.
I tried adding an entry in machine.config for the "oracle.manageddataaccess.client" section with the desired version.
I also tried using other types of data sources but couldn't even create a successful connection
I tried changing the Run64bitRuntime property in the project to False
Note: I don't have SSIS installed on my machine.
Eventually, I just had to remove the entries related to 11.1 from the PATH variable and then restart my machine.
Also I switched to "dotConnectForOracle" for connection and now it seems to be working fine.
I'm expecting issues related to other applications that might still be using the 11.1 version, but that will be a problem for another day.
Always make sure to write the user (Oracle schema) in uppercase, and note that some special characters in the password (in my case it was $) need an escape character, even when you're using the wizard rather than the command line.
I still don't understand the whole issue but I hope this helps someone some day.

ct_connect(): network packet layer: internal net library error: Net-Lib protocol driver call to connect two endpoints failed stackoverflow

While connecting to Sybase, I'm trying to start my server using
startserver
but I have encountered the above error.
The usual way to start a Sybase instance is:
startserver -f RUN_SERVER_FILENAME
e.g. RUN_MY_INSTANCE or similar, depending on what your Sybase instance is called. Once it's started correctly, you should be able to connect. The RUN file is usually in the $SYBASE/$SYBASE_ASE/install directory on a default Unix installation.
More info here: http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc30191.1570100/doc/html/san1367605056632.html
Are you getting any other messages? You should look at the instance errorlog to check for problems on startup.

JavaKerberos authentication to SQL Server on Spark framework

I am trying to get a Spark cluster to write to SQL Server using JavaKerberos with Microsoft's JDBC driver (v7.0.0) (i.e., I specify integratedSecurity=true;authenticationScheme=JavaKerberos in the connection string), with credentials specified in a keytab file, and I am not having much success (the problem is the same if I specify credentials in the connection string).
I am submitting the job to the cluster (4-node YARN mode v 2.3.0) with:
spark-submit --driver-class-path mssql-jdbc-7.0.0.jre8.jar \
--jars /path/to/mssql-jdbc-7.0.0.jre8.jar \
--conf spark.executor.extraClassPath=/path/to/mssql-jdbc-7.0.0.jre8.jar \
--conf "spark.driver.extraJavaOptions=-Djava.security.auth.login.config=/path/to/SQLJDBCDriver.conf" \
--conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=/path/to/SQLJDBCDriver.conf" \
application.jar
Things work partially: the Spark driver authenticates correctly and creates the table; however, when any of the executors come to write to the table, they fail with an exception:
java.security.PrivilegedActionException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
Observations:
I can get everything to work if I specify SQL server credentials (however I need to use integrated security in my application)
The keytab and login module file “SQLJDBCDriver.conf” seem to be specified correctly since they work for the driver
I can see in the spark UI the executors pick up the correct command line options :
-Djava.security.auth.login.config=/path/to/SQLJDBCDriver.conf
After a lot of logging/debugging of the difference in Spark driver and executor behaviour, it seems to come down to the executor trying to use the wrong credentials, even though the options specified should make it use those in the keytab file, as it does successfully for the Spark driver. (That is why it generates that particular exception, which is what it does if I deliberately supply incorrect credentials.)
Strangely, I can see in the debug output that the JDBC driver finds and reads the SQLJDBCDriver.conf file, and the keytab has to be present (otherwise I get a file-not-found failure), yet it then promptly ignores them and tries to use the default behaviour/local user credentials.
Can anyone help me understand how I can force the executors to use credentials provided in a keytab or otherwise get JavaKerberos/SQL Server authentication to work with Spark?
Just to give an update on this: I've just closed https://issues.apache.org/jira/browse/SPARK-12312, and now it's possible to do Kerberos authentication with the built-in JDBC connection providers. Many providers have been added, and one of them is MS SQL. Please read the documentation on how to use it: https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html
Please be aware that Spark 3.1 is not yet released, so it will take some time for the two newly added configuration parameters (keytab and principal) to appear on that page. I think the doc update will happen within 1-2 weeks.
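As a rough sketch of what that looks like with the built-in provider (Spark >= 3.1): the server name, database, table, principal, and keytab path below are placeholders, and spark is assumed to be an existing SparkSession.
# Minimal sketch: with a principal and keytab, the built-in JDBC connection
# provider handles Kerberos on both the driver and the executors.
jdbcDF = (spark.read
    .format("jdbc")
    .option("url", "jdbc:sqlserver://<SERVER_NAME>:1433;databaseName=<DATABASE_NAME>;integratedSecurity=true;authenticationScheme=JavaKerberos")
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .option("dbtable", "dbo.some_table")
    .option("principal", "user@EXAMPLE.COM")
    .option("keytab", "/path/to/user.keytab")
    .load())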
Integrated authentication does not work with the MS SQL Server JDBC driver in a secure cluster with AD integration, because the containers do not have the security context: the Kerberos tokens are lost when the mappers spawn (as YARN transitions the job to its internal security subsystem).
Here is my repo that I used as a workaround to get Kerberos/AD authentication: https://github.com/chandanbalu/mssql-jdbc-krb5. The solution implements a driver that overrides the connect method of the latest MS SQL JDBC driver (mssql-jdbc-9.2.1.jre8.jar), gets a ticket for the keytab file/principal, and returns that connection.
You can grab the latest build of this custom driver from the release folder here.
Start spark-shell with JARS
spark-shell --jars /efs/home/c795701/.ivy2/jars/mssql-jdbc-9.2.1.jre8.jar,/efs/home/c795701/mssql-jdbc-krb5/target/scala-2.10/mssql-jdbc-krb5_2.10-1.0.jar
Scala
scala> val jdbcDF = spark.read.format("jdbc").option("url", "jdbc:krb5ss://<SERVER_NAME>:1433;databasename=<DATABASE_NAME>;integratedSecurity=true;authenticationScheme=JavaKerberos;krb5Principal=c795701@NA.DOMAIN.COM;krb5Keytab=/efs/home/c795701/c795701.keytab").option("driver", "hadoop.sqlserver.jdbc.krb5.SQLServerDriver").option("dbtable", "dbo.table_name").load()
scala>jdbcDF.count()
scala>jdbcDF.show(10)
spark-submit command, where com.spark.SparkJDBCIngestion is the class with the Spark JDBC data frame operations and ingestionframework-1.0-SNAPSHOT.jar is your project build JAR:
spark-submit \
--master yarn \
--deploy-mode cluster \
--jars "/efs/home/c795701/mssql-jdbc-krb5/target/scala-2.10/mssql-jdbc-krb5_2.10-1.0.jar,/efs/home/c795701/.ivy2/jars/scala-library-2.11.1.jar" \
--files /efs/home/c795701/c795701.keytab \
--class com.spark.SparkJDBCIngestion \
/efs/home/c795701/ingestionframework/target/ingestionframework-1.0-SNAPSHOT.jar
So apparently JDBC Kerberos authentication is just not possible currently on the executors, according to an old JIRA here: https://issues.apache.org/jira/browse/SPARK-12312. The behaviour is the same as of version 2.3.2, according to the Spark user list and my testing.
Workarounds
Use kinit and then distribute the cached TGT to the executors as detailed here: https://github.com/LucaCanali/Miscellaneous/blob/master/Spark_Notes/Spark_Executors_Kerberos_HowTo.md. I think this technique only works for the user that spark executors run under. At least I couldn't get it to work for my use case.
Wrap the jdbc driver with a custom version that deals with the authentication and then calls and returns a connection from the real MS JDBC driver. Details here: https://datamountaineer.com/2016/01/15/spark-jdbc-sql-server-kerberos/ and the associated repo here: https://github.com/nabacg/krb5sqljdb. I got this technique to work though I had to modify the authentication code for my case.
As Gabor Somogyi said, you need to use Spark > 3.1.0 and the keytab and principal arguments.
I have 3.1.1.
Put the keytab at the same path on ALL hosts and machines where you run your code, and keep the keytab up to date.
Add integratedSecurity=true;authenticationScheme=JavaKerberos; to the connection string.
The reading block will look like:
jdbcDF = (spark.read
    .format("com.microsoft.sqlserver.jdbc.spark")
    .option("url", url)
    .option("dbtable", table_name)
    .option("principal", "username@domain")
    .option("keytab", "sameALLhostKEYTABpath")
    .load()
)

Address already in use when run kafka connect distributed mode

I launched the Confluent suite by issuing the "./bin/confluent start" command.
Then I used Kafka Connect to sink Kafka data into MySQL.
I can run Kafka Connect well in standalone mode by executing the following command:
./bin/connect-standalone
./etc/schema-registry/connect-avro-standalone.properties
./etc/kafka-connect-jdbc/adstats-jdbc-sink.properties
Then I stopped the above command and switched to distributed mode with this command:
./bin/connect-distributed
./etc/schema-registry/connect-avro-distributed.properties
./etc/kafka-connect-jdbc/adstats-jdbc-sink.properties
It reported the following exception:
[2018-08-09 14:51:56,951] ERROR Failed to start Connect (org.apache.kafka.connect.cli.ConnectDistributed:108)
org.apache.kafka.connect.errors.ConnectException: Unable to start REST server
at org.apache.kafka.connect.runtime.rest.RestServer.start(RestServer.java:214)
at org.apache.kafka.connect.runtime.Connect.start(Connect.java:53)
at org.apache.kafka.connect.cli.ConnectDistributed.main(ConnectDistributed.java:106)
Caused by: java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
What's wrong? How can I switch to Kafka Connect distributed mode? Thanks!
When you run confluent start you already started Kafka Connect in distributed mode. So you can either use that instance, or you can define a new REST port in the properties file for the second instance that you want to run.
Either way, you submit your sink configuration to Kafka Connect distributed over REST, rather than passing it as a properties argument at start up (unlike standalone).
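For the first option, here is a rough sketch of submitting the sink configuration over the Connect REST API with Python; the worker listens on port 8083 by default, and the connector name, topic, and connection settings below are placeholders that mirror a JDBC sink:
# Hedged sketch: POST the sink connector config to the distributed worker
# that "confluent start" already launched. All values are placeholders.
import json
import requests

connector = {
    "name": "adstats-jdbc-sink",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "topics": "adstats",
        "connection.url": "jdbc:mysql://localhost:3306/adstats",
        "connection.user": "user",
        "connection.password": "password",
        "auto.create": "true",
    },
}

resp = requests.post(
    "http://localhost:8083/connectors",
    headers={"Content-Type": "application/json"},
    data=json.dumps(connector),
)
print(resp.status_code, resp.json())
For the second option, set rest.port (or listeners in newer versions) in connect-avro-distributed.properties to a free port before starting the second worker.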
After bootstrapping all the confluent services via
./confluent start
make sure to stop the default kafka-connect worker through
./confluent stop connect
before starting your customized kafka-connect.

Python3: Connect to Remote Postgres Database with SSL

I am in the process of setting up a remote PostgreSQL database. The server is running CentOS 7 and PostgreSQL-9.5. Currently, I am testing whether users can query the database. To this end, I have the following:
import psycopg2
host = 'server1'
dbname = 'test_db'
user = 'test-user'
sslcert = 'test-db.crt'
sslmode = 'verify-full'
sslkey = 'test-db.key'
dsn = 'host={0} dbname={1} user={2} sslcert={3} sslmode={4} sslkey={5}'.format(host, dbname, user, sslcert, sslmode, sslkey)
conn = psycopg2.connect(dsn)
The connection times out with the following error:
psycopg2.OperationalError: could not connect to server: Connection timed out (0x0000274C/10060)
Is the server running on host "server1" (xx.xx.xx.xx) and accepting
TCP/IP connections on port 5432?
I have tried several things (given below). I'm trying to pin down on which side the problem exists: the Python end or the database configuration:
Is the Python syntax correct?
Where can I find documentation concerning the DSN arguments, such as sslmode, sslcert, and sslkey?
Is there a different package better suited for this kind of connection?
What other questions should I be asking?
I have checked the following:
'server1' was entered correctly and the IP address returned by Python corresponds
All other arguments are spelled correctly and refer to the correct object
Postgres is currently running (service postgres-9.5 status shows "active")
Postgres is listening on port 5432 (netstat -na | grep tcp shows "LISTEN" on port 5432)
SSL is working for my database (psql -U username -W -d test-db -h host returns SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off))
user=test-user has been added to postgres as a Superuser
My understanding is that psycopg2 is the appropriate package to use nowadays. I have scoured the documentation and don't find much information regarding SSL connections. I found this SO post which talks about SSL connections using psycopg2, but I can't match some of the syntax to the documentation.
In the Python script, I have tried the following in all 4 combinations:
Use sslmode='require'
Use absolute paths to test-db.crt and test-db.key
It appears that you have presented yourself with a False Dilemma. The problem does not lie solely between Python and the database configuration. There exist other entities in between which may cause a disconnect.
Is the Python syntax correct?
Yes. The syntax is described in the psycopg2.connect() documentation. It has the form:
psycopg2.connect(dsn=None, connection_factory=None, cursor_factory=None, async=False, **kwargs)
where the DSN (Data Source Name) can be given as a single string or as separate arguments:
conn = psycopg2.connect(dsn="dbname=test user=postgres password=secret")
conn = psycopg2.connect(dbname="test", user="postgres", password="secret")
Where can I find documentation concerning the DSN arguments, such as sslmode, sslcert, and sslkey?
Note that as DSN arguments, they are not part of the psycopg2 module. They are defined by the database, in this case Postgres. They can be found in the chapter on Database Connection Control Functions, under the Parameter Key Words section.
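As an illustration, the same libpq SSL keywords can be passed straight to psycopg2.connect() as keyword arguments; the host, database, user, and file paths below are placeholders, and note that sslmode=verify-full also needs a CA certificate, either via sslrootcert or the default ~/.postgresql/root.crt:
import psycopg2

# Sketch of an SSL connection using libpq keyword parameters (placeholder values).
conn = psycopg2.connect(
    host="server1",
    dbname="test_db",
    user="test-user",
    sslmode="verify-full",
    sslrootcert="/path/to/root.crt",  # CA certificate used to verify the server
    sslcert="/path/to/test-db.crt",   # client certificate
    sslkey="/path/to/test-db.key",    # client private key
)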
What other questions should I be asking?
Perhaps,
Is there anything between the host (the PostgreSQL server) and the client (the local Python instance) which could prevent communication?
One answer to this would be "the firewall." This turned out to be the problem. Postgres was listening and Python was reaching out. But the door was closed.
