I'm currently trying to write a pandas data frame into a new SQL Server table, and I'm having trouble figuring out how to connect WITHOUT USING USER/PASSWORD.
The pandas documentation states that an engine must be created via SQLAlchemy, and the company only gave me sample code (not using pandas, intended for other tasks) for connecting via pymssql:
server = "server name"
conn = pymssql.connect(server, database='TestDatabase')
cursor = conn.cursor()
cursor.execute(instruction)
conn.close()
Now I must pass a connection to sqlalchemy, which the sqlalchemy documentation states would be something like
engine = create_engine("mssql+pymssql://<username>:<password>#<freetds_name>/?charset=utf8",
encoding='latin1', echo=True)
but our SQL Server instance uses Windows (trusted) authentication with no username/password, and I found no example for this.
How can I create this connection string?
Related
How can I directly connect MS SQL Server to polars?
The documentation does not list any supported connections but recommends the use of pandas.
Update:
SQL Server authentication works per the answer, but Windows domain authentication is not working; see issue.
Ahh, actually MsSQL is supported for loading directly into polars (via the underlying library that does the work, which is connectorx); the documentation is just slightly out of date - I'll take a look and refresh it accordingly.
Here is how you can connect to MS SQL Server with Polars (connectorx under the hood). Just use a connection string:
import polars as pl
# usually don't store sensitive info in plain text
username = 'my_username'
password = '1234'
server = 'SERVER1'
database = 'db1'
trusted_conn = 'no' # or yes
conn = f'mssql://{username}:{password}@{server}/{database}?driver=SQL+Server&trusted_connection={trusted_conn}'
query = "SELECT * FROM table1"
df = pl.read_sql(query, conn)
This question already has answers here: How to run SQL statement from Databricks cluster (2 answers). Closed 2 years ago.
It is very straightforward to send custom SQL queries to a SQL database in Python.
import mysql.connector

connection = mysql.connector.connect(host='localhost',
                                     database='Electronics',
                                     user='pynative',
                                     password='pynative##29')
sql_select_Query = "select * from Laptop"  # any custom SQL statement, not necessarily a SELECT
cursor = connection.cursor()
cursor.execute(sql_select_Query)
records = cursor.fetchall()
However, I have scoured the internet for a way to do a similar task on Databricks and I haven't found any solution. It's worth mentioning that I can read from and write to a SQL Server database using JDBC, but I want to send a custom SQL statement, for example a "bulk insert" statement, that is executed within the SQL Server database.
Here is how I read data from SQL Server using JDBC.
table_name="dbo.myTable"
spark.read.jdbc(url=jdbcUrl, table=table_name, properties=connectionProperties)
Please reference this document: SQL Databases using JDBC:
Databricks Runtime contains JDBC drivers for Microsoft SQL Server and Azure SQL Database. See the Databricks runtime release notes for the complete list of JDBC libraries included in Databricks Runtime.
This article covers how to use the DataFrame API to connect to SQL databases using JDBC and how to control the parallelism of reads through the JDBC interface. This article provides detailed examples using the Scala API, with abbreviated Python and Spark SQL examples at the end. For all of the supported arguments for connecting to SQL databases using JDBC, see JDBC To Other Databases.
Python example:
jdbcHostname = "<hostname>"
jdbcDatabase = "employees"
jdbcPort = 1433
jdbcUrl = "jdbc:sqlserver://{0}:{1};database={2};user={3};password={4}".format(jdbcHostname, jdbcPort, jdbcDatabase, username, password)
pushdown_query = "(select * from employees where emp_no < 10008) emp_alias"
df = spark.read.jdbc(url=jdbcUrl, table=pushdown_query, properties=connectionProperties)
display(df)
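The article quoted above also mentions controlling the parallelism of reads through the JDBC interface; here is a minimal sketch of a partitioned read reusing the same jdbcUrl and connectionProperties (the split column emp_no and the bounds are illustrative assumptions, not values from the article):
# read the employees table in 8 parallel partitions, split on the numeric emp_no column
df_parallel = spark.read.jdbc(
    url=jdbcUrl,
    table="employees",
    column="emp_no",        # must be a numeric, date, or timestamp column
    lowerBound=1,
    upperBound=100000,
    numPartitions=8,
    properties=connectionProperties,
)
display(df_parallel)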
But the traditional jdbc connector writes data into your database using row-by-row insertion. You can use the Spark connector to write data to Azure SQL and SQL Server using bulk insert. It significantly improves the write performance when loading large data sets or loading data into tables where a column store index is used.
import com.microsoft.azure.sqldb.spark.bulkcopy.BulkCopyMetadata
import com.microsoft.azure.sqldb.spark.config.Config
import com.microsoft.azure.sqldb.spark.connect._
/**
  Add column metadata.
  If not specified, metadata is automatically added
  from the destination table, which may hurt performance.
*/
var bulkCopyMetadata = new BulkCopyMetadata
bulkCopyMetadata.addColumnMetadata(1, "Title", java.sql.Types.NVARCHAR, 128, 0)
bulkCopyMetadata.addColumnMetadata(2, "FirstName", java.sql.Types.NVARCHAR, 50, 0)
bulkCopyMetadata.addColumnMetadata(3, "LastName", java.sql.Types.NVARCHAR, 50, 0)
val bulkCopyConfig = Config(Map(
  "url"               -> "mysqlserver.database.windows.net",
  "databaseName"      -> "MyDatabase",
  "user"              -> "username",
  "password"          -> "*********",
  "dbTable"           -> "dbo.Clients",
  "bulkCopyBatchSize" -> "2500",
  "bulkCopyTableLock" -> "true",
  "bulkCopyTimeout"   -> "600"
))
df.bulkCopyToSqlDB(bulkCopyConfig, bulkCopyMetadata)
//df.bulkCopyToSqlDB(bulkCopyConfig) if no metadata is specified.
Ref: Use Spark Connector
HTH.
I've reached the "writing to a SQL Server database" part of my data journey; I hope someone is able to help.
I've been able to successfully connect to a remote Microsoft SQL Server database using pyodbc. This allows me to pass SQL queries into dataframes and to create reports.
I now want to automate the manual "select import" method. I've read many blogs, but I'm none the wiser about how it all works.
import pandas as pd
import pyodbc

server = 'Remote SQL Server'
database = 'mydB'
username = 'datanovice'
password = 'datanovice'

cnxn = pyodbc.connect('DRIVER={SQL Server};SERVER=' + server + ';DATABASE=' + database +
                      ';UID=' + username + ';PWD=' + password)
cursor = cnxn.cursor()
I'm able to read queries easily using this and pass them into dataframes.
What's the best way to write into my MS SQL DB, noting that it's not local? I'm happy to pass this into SQLAlchemy, but I wasn't sure of the correct syntax.
Things to consider:
This is a mission-critical database, and some of the DataFrames must be written as delete queries
If this is an unsafe method and I need to go back and study more to understand proper database methodology, I'm very happy to do so
I'm not looking for someone to write or provide the code for me, but rather point me in the right direction
I envisage this to be something like the following, but I'm not sure how to specify the correct table:
df.to_sql('my_df', con, chunksize=1000)
As you've seen from the pandas documentation, you need to pass a SQLAlchemy engine object as the second argument to the to_sql method. Then you can use something like
df.to_sql("table_name", engine, if_exists="replace")
The SQLAlchemy documentation shows how to create the engine object. If you use an ODBC DSN then the statement will look something like this:
from sqlalchemy import create_engine
# ...
engine = create_engine("mssql+pyodbc://scott:tiger#some_dsn")
I created the method below to connect to SQL Server using SQLAlchemy and pyodbc.
import urllib.parse
import sqlalchemy as sa

def getDBEngine(server, database):
    params = urllib.parse.quote_plus("DRIVER={SQL Server};SERVER=" + server + ";DATABASE=" + database + ";TRUSTED_CONNECTION=Yes")
    engine = sa.create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
    return engine
I'm then able to use that engine to read and write data via methods like to_sql from pandas, as shown below.
def toSQL(data, Server, Database, Tablename):
    writeEngine = getDBEngine(Server, Database)
    data.to_sql(Tablename, writeEngine, if_exists='append')
My question is whether there is a simple way to check the connection/status of the engine before actually using it to read or write data. What's the easiest way?
One pattern I've seen used at multiple engagements is essentially an "is alive" check, which is effectively select 1 as is_alive;. There's no data access, so it just checks whether your connection is receptive to receiving commands from your application.
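A minimal sketch of that check against the engine returned by getDBEngine above (the helper name is_alive is illustrative, not part of pandas or SQLAlchemy):
from sqlalchemy import text
from sqlalchemy.exc import DBAPIError, OperationalError

def is_alive(engine):
    # run the trivial "select 1" probe; no table access is needed
    try:
        with engine.connect() as conn:
            conn.execute(text("SELECT 1 AS is_alive"))
        return True
    except (OperationalError, DBAPIError):
        return False

# usage: check the engine before writing
# engine = getDBEngine(server, database)
# if is_alive(engine):
#     data.to_sql(tablename, engine, if_exists='append')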
How can pypyodbc connect to linked tables in the .accdb database? Is this possible at all, or is this a limitation of pypyodbc?
I need to get data from an MS Access .accdb database into Python. This works perfectly: I can use pypyodbc to access tables and queries defined inside the .accdb database. However, the database also has tables linked to an external SQL Server. When accessing such linked tables, pypyodbc complains that it cannot connect to the SQL Server.
test.accdb contains two tables: Test (local table) and cidb_ain (linked SQL table)
The following Python 3 code is my attempt to access the data:
import pypyodbc as pyodbc
cnxn = pyodbc.connect(driver='Microsoft Access Driver (*.mdb, *.accdb)',
                      dbq='test.accdb',
                      readonly=True)
cursor = cnxn.cursor()
# access to the local table works
for row in cursor.execute("select * from Test"):
print(row)
print('----')
# access to the linked table fails
for row in cursor.execute("select * from cidb_ain"):
print(row)
Output:
(1, 'eins', 1)
(2, 'zwei', 2)
(3, 'drei', 3)
----
Traceback (most recent call last):
File "test_02_accdb.py", line 14, in <module>
for row in cursor.execute("select * from cidb_ain"):
File "C:\software\installed\miniconda3\lib\site-packages\pypyodbc.py", line 1605, in execute
self.execdirect(query_string)
File "C:\software\installed\miniconda3\lib\site-packages\pypyodbc.py", line 1631, in execdirect
check_success(self, ret)
File "C:\software\installed\miniconda3\lib\site-packages\pypyodbc.py", line 986, in check_success
ctrl_err(SQL_HANDLE_STMT, ODBC_obj.stmt_h, ret, ODBC_obj.ansi)
File "C:\software\installed\miniconda3\lib\site-packages\pypyodbc.py", line 964, in ctrl_err
raise Error(state,err_text)
pypyodbc.Error: ('HY000', "[HY000] [Microsoft][ODBC-Treiber für Microsoft Access] ODBC-Verbindung zu 'SQL Server Native Client 11.0SQLHOST' fehlgeschlagen.")
The error message roughly translates to "ODBC connection to 'SQL Server Native Client 11.0SQLHOST' failed".
I cannot access the SQL Server through the .accdb database with pypyodbc, but querying the cidb_ain table from within MS Access is possible. Furthermore, I can connect to the SQL Server directly:
cnxn = pyodbc.connect(driver='SQL Server Native Client 11.0',
                      server='SQLHOST',
                      trusted_connection='yes',
                      database='stuffdb')
Considering that (1) MS Access (and Matlab too) can use the information contained in the .accdb file to query the linked tables, and (2) the SQL Server is accessible, I assume the problem is related to pypyodbc. (The way driver name and host name are mangled into 'SQL Server Native Client 11.0SQLHOST' in the error message seems somewhat suspicious, too.)
I have no previous experience with Access, so please be patient and let me know if I omitted important information that seemed unnecessary to me...
First, MS Access is a unique type of database application that is somewhat different from other RDBMSs (e.g., SQLite, MySQL, PostgreSQL, Oracle, DB2), as it ships with both a default back-end Jet/ACE SQL relational engine (which, by the way, is not an Access-restricted component but a general Microsoft technology) and a front-end GUI interface and report generator. In essence, Access is a collection of objects.
Linked tables are somewhat a feature of the front-end side of MS Access, used to replace the default Jet/ACE database (i.e., local tables) with another back-end database, in your case SQL Server. Moreover, linked tables are ODBC/OLEDB connections themselves! You had to have used a DSN, driver, or provider to even establish and create linked tables in the MS Access file.
Hence, any external client, here your Python script, that connects to the MS Access database [driver='Microsoft Access Driver (*.mdb, *.accdb)'] is actually connecting to the back-end Jet/ACE database. The client/script never interacts with front-end objects. In your error, Python reads the ODBC connection of the linked table, and since the SQL Server driver/provider [SQL Server Native Client 11.0SQLHOST] is never invoked in the script, the script fails.
Altogether, to resolve your situation, you must connect Python directly to the SQL Server database (and not use MS Access as a medium) and query the table there, here cidb_ain. Simply use the connection string of the Access linked table:
import pypyodbc

# (USING DSN)
db = pypyodbc.connect('DSN=dsn name;')
cur = db.cursor()
cur.execute("SELECT * FROM dbo.cidb_ain")
for row in cur.fetchall():
    print(row)
cur.close()
db.close()

# (USING DRIVER)
constr = 'Trusted_Connection=yes;DRIVER={SQL Server};SERVER=servername;' \
         'DATABASE=database name;UID=username;PWD=password'
db = pypyodbc.connect(constr)
cur = db.cursor()
cur.execute("SELECT * FROM dbo.cidb_ain")
for row in cur.fetchall():
    print(row)
cur.close()
db.close()
Update:
It turns out that the solution to this problem is as simple as setting pyodbc.pooling = False before establishing the connection to the Access database:
import pyodbc
# this also works with `import pypyodbc as pyodbc`
pyodbc.pooling = False # this prevents the error
cnxn = pyodbc.connect(r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};DBQ= ... ")
(previous answer)
It appears that neither pypyodbc nor pyodbc can read a SQL Server linked table from an Access database. However, System.Data.Odbc in .NET can do it so IronPython can, too.
To verify, I created a table named [Foods] in SQL Server
id guestId food
-- ------- ----
1 1 pie
2 2 soup
I created an ODBC linked table named [dbo_Foods] in Access which pointed to that table on SQL Server.
I also created a local Access table named [Guests] ...
id firstName
-- ---------
1 Gord
2 Jenn
... and a saved Access query named [qryGuestPreferences] ...
SELECT Guests.firstName, dbo_Foods.food
FROM Guests INNER JOIN dbo_Foods ON Guests.id = dbo_Foods.guestId;
Running the following script in IronPython ...
import clr
import System
clr.AddReference("System.Data")
from System.Data.Odbc import OdbcConnection, OdbcCommand
connectString = (
r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};"
r"DBQ=C:\Users\Public\Database1.accdb;"
)
conn = OdbcConnection(connectString)
conn.Open()
query = """\
SELECT firstName, food
FROM qryGuestPreferences
"""
cmd = OdbcCommand(query, conn)
rdr = cmd.ExecuteReader()
while rdr.Read():
    print("{0} likes {1}.".format(rdr["firstName"], rdr["food"]))
conn.Close()
... returns
Gord likes pie.
Jenn likes soup.