I need your help to create a "permanently" connection from databricks to sql server database in Azure.
I have a code in pyspark to connect to database, using driver "com.microsoft.sqlserver.jdbc.spark" and JAR spark_mssql_connector_2_12_3_0_1_0_0_alpha.jar.
I have created a class to connect to DB is via token
class SQLSpark():
database_name: str = ""
sql_service_name: str = ""
service_principal_id: str = ""
service_principal_secret: str = ""
tenant_id: str = ""
authority: str = ""
state = None
except_error = None
def __init__(self, database_name, service_principal_id, service_principal_secret, tenant_id,
authority, spark, sql_service_name=None):
self.database_name = database_name
self.sql_service_name = sql_service_name
self.service_principal_id = service_principal_id
self.service_principal_secret = service_principal_secret
self.tenant_id = tenant_id
self.authority = authority
self.state = True
self.except_error = ""
self._spark_session = spark
context = adal.AuthenticationContext(self.authority)
token = context.acquire_token_with_client_credentials("https://database.windows.net", self.service_principal_id,
self.service_principal_secret)
self._access_token = token["accessToken"]
server_name = "jdbc:sqlserver://" + self.sql_service_name + ".database.windows.net"
self._url = server_name + ";" + "databaseName=" + self.database_name + ";"
def select_table(self, table, sql_query):
try:
logger.info(f"Reading table {table} in DB {self.database_name} ")
df = self._spark_session.read.format("com.microsoft.sqlserver.jdbc.spark") \
.options(
url=self._url,
databaseName=self.database_name,
accessToken=self._access_token,
hostNameInCertificate="*.database.windows.net",
query=sql_query) \
.load()
self.custom_logger.info(f"Table {table} in database {self.database_name} has been read")
return df
except Exception as ex:
logger.error(f"Failed to read table {table}")
logger.error(ex)
The problem is that I have to process huge data and processes took more that 1h to process and database token expired. Is there a way to refresh the token when I call to select_table method?
Error given is:
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Login failed for user '<token-identified principal>'. Token is expired.
Full error:
Py4JJavaError: An error occurred while calling o9092.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 59.0 failed 4 times, most recent failure: Lost task 0.3 in stage 59.0 (TID 2611, 10.139.64.5, executor 0): com.microsoft.sqlserver.jdbc.SQLServerException: Login failed for user '<token-identified principal>'. Token is expired. ClientConnectionId:009909b8-d779-4df2-b077-59cf4c4b3c73
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:262)
at com.microsoft.sqlserver.jdbc.TDSTokenHandler.onEOF(tdsparser.java:283)
at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:129)
at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:37)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.sendLogon(SQLServerConnection.java:5173)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.logon(SQLServerConnection.java:3810)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.access$000(SQLServerConnection.java:94)
at com.microsoft.sqlserver.jdbc.SQLServerConnection$LogonCommand.doExecute(SQLServerConnection.java:3754)
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7225)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:3053)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:2562)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:2216)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:2067)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:1204)
at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:825)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$createConnectionFactory$1(JdbcUtils.scala:64)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:272)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:356)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:320)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:356)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:320)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:356)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:320)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.doRunTask(Task.scala:144)
at org.apache.spark.scheduler.Task.run(Task.scala:117)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$9(Executor.scala:655)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1581)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:658)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2519)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2466)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2460)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2460)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1152)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1152)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1152)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2721)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2668)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2656)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Login failed for user '<token-identified principal>'. Token is expired. ClientConnectionId:009909b8-d779-4df2-b077-59cf4c4b3c73
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:262)
at com.microsoft.sqlserver.jdbc.TDSTokenHandler.onEOF(tdsparser.java:283)
at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:129)
at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:37)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.sendLogon(SQLServerConnection.java:5173)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.logon(SQLServerConnection.java:3810)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.access$000(SQLServerConnection.java:94)
at com.microsoft.sqlserver.jdbc.SQLServerConnection$LogonCommand.doExecute(SQLServerConnection.java:3754)
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7225)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:3053)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:2562)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:2216)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:2067)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:1204)
at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:825)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$createConnectionFactory$1(JdbcUtils.scala:64)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:272)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:356)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:320)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:356)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:320)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:356)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:320)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.doRunTask(Task.scala:144)
at org.apache.spark.scheduler.Task.run(Task.scala:117)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$9(Executor.scala:655)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1581)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:658)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Couple of things I can think of.
Check if there is an option to provide a refresh URL to Spark. So it can get new token. Similar to this but for your SQL Server instead of ADLS. You'll probably have to use some other API like acquire_token_with_refresh_token() to create the token.
I know some of the token generator implementations allow you to provide a requested expiry period while making the call to create a new token. If your does, then create a token valid for 2-3-6-whatever-you-need hours instead of letting it set expiry to default one hour.
Other option assuming your code is NOT correct. I.e. there is NOT a good reason to create token in __init__(). You should create token near where you use it. I.e.
class SQLSpark():
# ...
def __init__(self, database_name, service_principal_id, service_principal_secret, tenant_id,
authority, spark, sql_service_name=None):
# same as OP, except no token is created and stored in self.token
def select_table(self, table, sql_query):
# ...
# Generate the token closer to it's use.
token = adal.AuthenticationContext(self.authority).acquire_token_with_client_credentials("https://database.windows.net",
self.service_principal_id, self.service_principal_secret)
df = self._spark_session.read.format("com.microsoft.sqlserver.jdbc.spark") \
.options(
# ...
accessToken=token["accessToken"],
query=sql_query) \
.load()
# ...
Related
Pymssql -V:2.2.1
Python -V: 3.8
DB: SQL Server 2008 R2
This is my code:
class read_sql(object):
def __init__(self):
# 服务器名
self.server = HOST_24
# 用户名
self.user = SQL_SERVER_USER
# 密码
self.password = SQL_SERVER_PASSWORD
# 数据库名
self.database = PUBMED_DB
# 连接数据库
try:
self.conn = pymssql.connect(self.server, self.user, self.password, self.database)
# 创建cursor缓冲区,用来存放sql语句
self.cursor = self.conn.cursor()
except Exception as e:
print(e)
raise ValueError('数据库链接实例化出错')
def query(self, query_str):
# 输入query_str查询语句,内容返回到cursor缓冲区内
self.cursor.execute(query_str)
# 接收全部的返回结果行.
row = self.cursor.fetchall()
return row
if __name__ == "__main__":
sql_data = read_sql()
row = sql_data.query("SELECT TOP 1 * FROM JinMo_CheckTable order by id desc")
print(row)
This is my first time connecting to SQL Server!
I get this error:
pymssql._mssql.MSSQLDatabaseException: (18456, b"\xe7\x94\xa8\xe6\x88\xb7 'sa' \xe7\x99\xbb\xe5\xbd\x95\xe5\xa4\xb1\xe8\xb4\xa5\xe3\x80\x82DB-Lib error message 20018, severity 14:
General SQL Server error: Check messages from the SQL Server
DB-Lib error message 20002, severity 9:
Adaptive Server connection failed (192.168.0.24)
DB-Lib error message 20002, severity 9:
Adaptive Server connection failed (192.168.0.24)
Please give me some suggestions - thanks !
My Python code shown below is written to create a SQL Server connection using Windows authentication. I have constraints to use adodbapi library for database connectivity.
Please can anyone tell me what is missing from this code? I referred to the library's documentation, but there is nothing mentioning Windows authentication.
I referred to a lot of articles about that exception. But they seems there's no help to understand the nature of exception and its resolution.
Multiple-step OLE DB operation generated errors. Check each OLE DB status value, if available. No work was done.
Code:
import configparser
import adodbapi
config = configparser.ConfigParser()
config.read("C:/plugin/configsql.ini")
_SERVER_NAME = config['SQL']['SERVER_NAME']
_DATABASE = config['SQL']['DATABASE']
conn = adodbapi.connect("PROVIDER=MSOLEDBSQL;Data Source={0};Database={1};Integrated Security = True;".format(_SERVER_NAME,_DATABASE))
print(conn)
Exception:
Traceback (most recent call last):
File "C:\Arelle-master\venv1\lib\site-packages\adodbapi\adodbapi.py", line 113, in connect
co.connect(kwargs)
File "C:\Arelle-master\venv1\lib\site-packages\adodbapi\adodbapi.py", line 275, in connect
self.connector.Open() # Open the ADO connection
File "", line 3, in Open
File "C:\Arelle-master\venv1\lib\site-packages\win32com\client\dynamic.py", line 287, in ApplyTypes
result = self.oleobj.InvokeTypes(*(dispid, LCID, wFlags, retType, argTypes) + args)
pywintypes.com_error: (-2147352567, 'Exception occurred.', (0, 'Provider', 'Multiple-step OLE DB operation generated errors. Check each OLE DB status value, if available. No work was done.', None, 1240640, -2147217887), None)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "winAuthentication.py", line 8, in
conn = adodbapi.connect("PROVIDER=MSOLEDBSQL;Data Source={0};Database={1};Integrated Security = True;".format(_SERVER_NAME,_DATABASE))
File "C:\Arelle-master\venv1\lib\site-packages\adodbapi\adodbapi.py", line 117, in connect
raise api.OperationalError(e, message)
adodbapi.apibase.OperationalError: (com_error(-2147352567, 'Exception occurred.', (0, 'Provider', 'Multiple-step OLE DB operation generated errors. Check each OLE DB status value, if available. No work was done.', None, 1240640, -2147217887), None), 'Error opening connection to "PROVIDER=MSOLEDBSQL;Data Source=MSSQLSERVER01;Database=TESTDB;Integrated Security = True;"')
Have you tried Trusted_Connection=yes? Here is my connection string that uses windows authentication (using pyodbc) but should be the same connection parameter, not Integrated Security.
conn = pyodbc.connect('Driver={SQL Server};'
'Server=ServerName;'
'Database=DatabaseName;'
'Trusted_Connection=yes;')
Or perhaps Integrated Security = SSPI, found mentioned here http://adodbapi.sourceforge.net/quick_reference.pdf
'Integrated Security=SSPI'
Recently I've discovered that you can make a JDBC request Test Step in SoapUI (doc 1, doc 2). And I have a load test that fails under certain conditions, i.e. I need to manually execute SQL script in order to prepare data each time before I run this load test.
I'm not sure that it's possible, but if it is, how can I automate my initialization step?
ps. If I simply add JDBC Request test step to the load test then this step executes multiple times and this is not what I want. I think I need to query database from setup script:
Possible, Setup Script will be run before test is executed - for example you can set a groovy script like:
import groovy.sql.Sql
// db connection
def DBurl = 'jdbc:oracle:thin:#11.111.1.11:1521:SID'
def DBuser = 'user'
def DBpassword = 'password'
def DBdriver = 'oracle.jdbc.pool.OracleDataSource'
def DBsql = Sql.newInstance(DBurl, DBuser, DBpassword, DBdriver)
// your sql
try{
DBsql.execute('''
[SQL U WANT TO EXECUTE]
''' )
} catch (Exception e) {
log.error e.getMessage()
}
I am trying to connect to a redshift server and run some sql commands. Here is the code that I have written:
Class.forName("org.postgresql.Driver")
val url: String = s"jdbc:postgres://${user}:${password}#${host}:${port}/${database}"
val connection: Connection = DriverManager.getConnection(url, user, password)
val statement = connection.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)
val setSearchPathQuery: String = s"set search_path to '${schema}';"
statement.execute(setSearchPathQuery)
But I am getting the following error:
java.sql.SQLException: No suitable driver found for jdbc:postgres://user:password#host:port/database
But when I am using play framework's default database library with the same configuration, then I am able to connect to database successfully. Below is the configuration for the default database:
db.default.driver=org.postgresql.Driver
db.default.url="postgres://username:password#hostname:port/database"
db.default.host="hostname"
db.default.port="port"
db.default.dbname = "database"
db.default.user = "username"
db.default.password = "password"
The problem was with the url. The correct format for the url is:
jdbc:postgresql://hostname:port/database
I'm trying to use oracle's hsodbc generic database link driver to access a postgresql database from my oracle 10gr2 database server. I think I have everything configured but I'm receiving this error from the sqlplus promt after trying a remote query.
SQL> select * from temp_user#intranet;
select * from temp_user#intranet
*
ERROR at line 1:
ORA-28545: error diagnosed by Net8 when connecting to an agent
Unable to retrieve text of NETWORK/NCR message 65535
ORA-02063: preceding 2 lines from INTRANET
If I use "isql" from the linux command line (in other words test just the odbc connection) the query works.
I enter in "isql intranet" (intranet is the name of the odbc connection)
I get the prompt I type select * from temp_user and I receive back my 157 records on screen.
So I know the odbc configuration is setup correctly. Here is what I do for oracle.
%oracle_home/hs/admin/inithsodbc.ora
HS_FDS_CONNECT_INFO = intranet
HS_FDS_TRACE_LEVEL = OFF
HS_FDS_SHAREABLE_NAME = /usr/bin/ODBCConfig
%oracle_home/network/admin/tnsnames.ora
INTRANET =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.5.1)(PORT = 5432))
)
(CONNECT_DATA =
(SID = INTRANET)
)
(HS = OK)
%oracle_home/network/admin/listener.ora
SID_LIST_LISTENER =
(SID_LIST =
(SID_DESC =
(GLOBAL_DBNAME = INTRANET)
(PROGRAM = hsodbc)
(SID_NAME = INTRANET)
(ORACLE_HOME = /home/oracle/app/OraHomeTEST)
LISTENER =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = oracledb.andersen-const.com)(PORT = 5432))
)
)
I have restarted the listener. It's status is as follows.
Services Summary...
Service "INTRANET" has 1 instance(s).
Instance "INTRANET", status UNKNOWN, has 1 handler(s) for this service...
I then go into sqlplus from the database server command line and do the following.
drop database link intranet;
create database link intranet connect to auser identified by apassword using 'intranet';
This is successful.
However when I run
select * from temp_user#intranet
I receive the error
ERROR at line 1:
ORA-28545: error diagnosed by Net8 when connecting to an agent
Unable to retrieve text of NETWORK/NCR message 65535
ORA-02063: preceding 2 lines from INTRANET
I've spend atleast a good day going back over the configures and trying things and I always get this error.
Anybody have any good ideas,
What does "tnsping intranet" report?
Are you sure your hsodbc prorgram is in the Oracle_home/bin directory of the your gateway installation? Also, is your LD_LIBRARY_PATH set properly?
I believe your LD_LIBRARY_PATH should be $ORACLE_HOME/lib. Sorry, not sure since I don't do much with *Nix these days.