Getting a warning when using a pyodbc Connection object with pandas

I am trying to make sense of the following warning, which started appearing when I set up my Python code to run on a VM server that has Python 3.9.5 installed instead of the 3.8.5 on my desktop. I'm not sure that matters, but it could be part of the reason.
The warning:
C:\ProgramData\Miniconda3\lib\site-packages\pandas\io\sql.py:758: UserWarning: pandas only support SQLAlchemy connectable(engine/connection) or
database string URI or sqlite3 DBAPI2 connection
other DBAPI2 objects are not tested, please consider using SQLAlchemy
warnings.warn(
This is within a fairly simple .py file that imports pyodbc & sqlalchemy fwiw. A fairly generic/simple version of sql calls that yields the warning is:
myserver_string = "xxxxxxxxx,nnnn"
db_string = "xxxxxx"
cnxn = "Driver={ODBC Driver 17 for SQL Server};Server=tcp:"+myserver_string+";Database="+db_string +";TrustServerCertificate=no;Connection Timeout=600;Authentication=ActiveDirectoryIntegrated;"
def readAnyTable(tablename, date):
    conn = pyodbc.connect(cnxn)
    query_result = pd.read_sql_query(
        '''
        SELECT *
        FROM [{0}].[dbo].[{1}]
        where Asof >= '{2}'
        '''.format(db_string,tablename,date,), conn)
    conn.close()
    return query_result
All the examples I have seen using pyodbc in python look fairly similar. Is pyodbc becoming deprecated? Is there a better way to achieve similar results without warning?

Is pyodbc becoming deprecated?
No. For at least the last couple of years pandas' documentation has clearly stated that it wants either a SQLAlchemy Connectable (i.e., an Engine or Connection object) or a SQLite DBAPI connection. (The switch-over to SQLAlchemy was almost universal, but they continued supporting SQLite connections for backwards compatibility.) People have been passing other DBAPI connections (like pyodbc Connection objects) for read operations and pandas hasn't complained … until now.
Is there a better way to achieve similar results without warning?
Yes. You can take your existing ODBC connection string and use it to create a SQLAlchemy Engine object as described in the SQLAlchemy 1.4 documentation:
from sqlalchemy.engine import URL
connection_string = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=dagger;DATABASE=test;UID=user;PWD=password"
connection_url = URL.create("mssql+pyodbc", query={"odbc_connect": connection_string})
from sqlalchemy import create_engine
engine = create_engine(connection_url)
Then pass engine to the pandas methods you need to use.
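If you would rather not import URL, the same ODBC connection string can be passed through as a URL-encoded odbc_connect query parameter. Here is a minimal sketch using only the standard library. The helper name odbc_to_sqlalchemy_url and the server and database values are made up for illustration, and the create_engine call is left as a comment so the snippet runs without a database:

```python
from urllib.parse import quote_plus, unquote_plus


def odbc_to_sqlalchemy_url(odbc_connection_string: str) -> str:
    """Wrap a raw ODBC connection string in a mssql+pyodbc pass-through URL."""
    # quote_plus escapes characters such as ';', '=', '{' and spaces
    return "mssql+pyodbc:///?odbc_connect=" + quote_plus(odbc_connection_string)


# Hypothetical connection details, shaped like the string in the question
odbc_str = (
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:myserver,1433;"
    "Database=mydb;"
    "Authentication=ActiveDirectoryIntegrated;"
)
url = odbc_to_sqlalchemy_url(odbc_str)
print(url)

# With SQLAlchemy installed, the URL would then be used like this:
# from sqlalchemy import create_engine
# engine = create_engine(url)
# df = pd.read_sql_query("SELECT 1 AS x", engine)
```

Decoding the query parameter again gives back the original ODBC string, which is a quick way to check that nothing was mangled.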

It works for me.
import warnings
warnings.filterwarnings('ignore')

import pandas as pd
import pyodbc
import sqlalchemy as sa
import urllib
from sqlalchemy import create_engine, event
from sqlalchemy.engine.url import URL
server = 'IP ADDRESS or Server Name'
database = 'AdventureWorks2014'
username = 'xxx'
password = 'xxx'
params = urllib.parse.quote_plus("DRIVER={SQL Server};"
                                 "SERVER="+server+";"
                                 "DATABASE="+database+";"
                                 "UID="+username+";"
                                 "PWD="+password+";")
engine = sa.create_engine("mssql+pyodbc:///?odbc_connect={}".format(params))
# note the trailing space so the next fragment does not run into the column list
qry = "SELECT t.[group] as [Region],t.name as [Territory],C.[AccountNumber] "
qry = qry + "FROM [Sales].[Customer] C INNER JOIN [Sales].SalesTerritory t on t.TerritoryID = c.TerritoryID "
qry = qry + "where StoreID is not null and PersonID is not null"
with engine.connect() as con:
    # sa.text() is required for plain SQL strings in SQLAlchemy 2.0 and also works in 1.4
    rs = con.execute(sa.text(qry))
    for row in rs:
        print(row)
You can use the SQL Server name or the IP address, but this requires a basic DNS listing. Most corporate servers should already have this listing though. You can check the server name or IP address using the nslookup command in the command prompt followed by the server name or IP address.
I'm using SQL 2017 on Ubuntu server running on VMWare. I'm connecting with IP Address here as part of a wider "running MSSQL on Ubuntu" project.
If you are connecting with your Windows credentials, you can replace the params with the trusted_connection parameter.
params = urllib.parse.quote_plus("DRIVER={SQL Server};"
                                 "SERVER="+server+";"
                                 "DATABASE="+database+";"
                                 "trusted_connection=yes")

Since it's only a warning, I suppressed the message using Python's warnings library. Hope this helps.
import warnings
with warnings.catch_warnings(record=True):
    warnings.simplefilter("always")
    # your code goes here
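Blanket suppression also hides unrelated warnings. A narrower sketch is to filter on the message text and category, assuming the message still begins with "pandas only support" as in the question (the helper name is hypothetical):

```python
import warnings


def ignore_pandas_dbapi_warning():
    """Silence only the pandas 'other DBAPI2 objects are not tested' UserWarning."""
    # message is a regex matched case-insensitively against the START of the warning text
    warnings.filterwarnings(
        "ignore",
        message=r"pandas only support",
        category=UserWarning,
    )


ignore_pandas_dbapi_warning()
# ... calls such as pd.read_sql_query(query, conn) would go here ...
```

Other UserWarnings, and any warning whose text starts differently, are still reported as usual.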

My company doesn't use SQLAlchemy, preferring to use PostgreSQL connections based on psycopg2 that incorporate other features. If you can run your script directly from a command line, turning warnings off will solve the problem: start it with python3 -W ignore

The correct way to import for SQLAlchemy 1.4.36 is using:
import pandas as pd
from sqlalchemy import create_engine, event
from sqlalchemy.engine.url import URL
#...
conn_str = set_db_info() # see above
conn_url = URL.create("mssql+pyodbc", query={"odbc_connect": conn_str})
engine = create_engine(conn_url)
df = pd.read_sql(SQL, engine)
df.head()

Related

Connect python-polars to SQL server (no support currently)

How can I directly connect MS SQL Server to polars?
The documentation does not list any supported connections but recommends the use of pandas.
Update:
SQL Server Authentication works per answer, but Windows domain authentication is not working. see issue

Ahh, actually MsSQL is supported for loading directly into polars (via the underlying library that does the work, which is connectorx); the documentation is just slightly out of date - I'll take a look and refresh it accordingly.

Here is how you can connect to MS SQL Server with polars (connectorx under the hood). Just use a connection string:
import polars as pl
# usually don't store sensitive info in plain text
username = 'my_username'
password = '1234'
server = 'SERVER1'
database = 'db1'
trusted_conn = 'no' # or yes
conn = f'mssql://{username}:{password}@{server}/{database}?driver=SQL+Server&trusted_connection={trusted_conn}'
query = "SELECT * FROM table1"
df = pl.read_sql(query, conn)
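One thing to watch with this kind of connection string is a password containing characters such as @, / or #, which would otherwise be read as URI delimiters. A minimal sketch that percent-encodes the credentials before building the user:password@server URI. The helper name build_mssql_uri and the sample password are made up for illustration, and the assumption that the driver decodes percent-encoded credentials should be checked against the connectorx documentation:

```python
from urllib.parse import quote_plus


def build_mssql_uri(username, password, server, database, trusted_conn="no"):
    """Build an mssql:// URI with percent-encoded credentials."""
    user = quote_plus(username)
    pwd = quote_plus(password)  # '@' -> %40, '/' -> %2F, '#' -> %23
    return (
        f"mssql://{user}:{pwd}@{server}/{database}"
        f"?driver=SQL+Server&trusted_connection={trusted_conn}"
    )


# Hypothetical credentials; do not store real ones in plain text
conn = build_mssql_uri("my_username", "p@ss/word#1", "SERVER1", "db1")
print(conn)
```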

Pandas df to SQL Server, connecting without user/password

I'm currently trying to write a pandas data frame into a new SQL Server table, and I'm having trouble figuring out how to connect WITHOUT USING USER/PASSWORD.
Pandas documentation states that an engine must be created via sqlalchemy, and the company only gave me a sample code (not using pandas, for other tasks) for the connection via pymssql:
    server = "server name"
    conn = pymssql.connect(server, database='TestDatabase')
    cursor = conn.cursor()
    cursor.execute(instruction)
    conn.close()
Now I must pass a connection to sqlalchemy, which the sqlalchemy documentation states would be something like
engine = create_engine("mssql+pymssql://<username>:<password>@<freetds_name>/?charset=utf8",
                       encoding='latin1', echo=True)
but our SQL Server instance uses local authentication, and I found no model for this.
How can I create this connection string?

Writing to a SQL Server database from Pandas using PYODBC

I've reached the writing-to-a-SQL-Server-database part of my data journey, and I hope someone is able to help.
I've been able to successfully connect to a remote Microsoft SQL Server database using pyodbc. This allows me to pass SQL queries into dataframes and to create reports.
I now want to automate the manual "select import" method. I've read many blogs, but I'm none the wiser about how it all works.
import pandas as pd
import pyodbc
server = r'Remote SQL Server'
database = 'mydB'
username = 'datanovice'
password = 'datanovice'
cnxn = pyodbc.connect('Driver={SQL Server};SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+password)
cursor = cnxn.cursor()
I'm able to read queries easily using this and pass them into dataframes.
What's the best way to write into my MS SQL database, noting that it's not local? I'm happy to pass this into SQLAlchemy, but I wasn't sure of the correct syntax.
Things to consider:
This is a mission critical database and some of the DataFrames must be written as delete queries
If this is an unsafe method and if I need to go back and study more to understand proper database methodology I'm very happy to do so
I'm not looking for someone to write or provide the code for me, but rather point me in the right direction
I envisage this to be something like.. but I'm not sure how I specify the correct table:
df.to_sql('my_df', con, chunksize=1000)

As you've seen from the pandas documentation you need to pass a SQLAlchemy engine object as the second argument to the to_sql method. Then you can use something like
df.to_sql("table_name", engine, if_exists="replace")
The SQLAlchemy documentation shows how to create the engine object. If you use an ODBC DSN then the statement will look something like this:
from sqlalchemy import create_engine
# ...
engine = create_engine("mssql+pyodbc://scott:tiger@some_dsn")

Load a dataframe to SQL Server from pandas? [duplicate]

I am trying to understand how python could pull data from an FTP server into pandas then move this into SQL server. My code here is very rudimentary to say the least and I am looking for any advice or help at all. I have tried to load the data from the FTP server first which works fine.... If I then remove this code and change it to a select from ms sql server it is fine so the connection string works, but the insertion into the SQL server seems to be causing problems.
import pyodbc
import pandas
from ftplib import FTP
from StringIO import StringIO
import csv
ftp = FTP ('ftp.xyz.com','user','pass' )
ftp.set_pasv(True)
r = StringIO()
ftp.retrbinary('filname.csv', r.write)
pandas.read_table (r.getvalue(), delimiter=',')
connStr = ('DRIVER={SQL Server Native Client 10.0};SERVER=localhost;DATABASE=TESTFEED;UID=sa;PWD=pass')
conn = pyodbc.connect(connStr)
cursor = conn.cursor()
cursor.execute("INSERT INTO dbo.tblImport(Startdt, Enddt, x,y,z,)" "VALUES (x,x,x,x,x,x,x,x,x,x.x,x)")
cursor.close()
conn.commit()
conn.close()
print"Script has successfully run!"
When I remove the ftp code this runs perfectly, but I do not understand how to make the next jump to get this into Microsoft SQL server, or even if it is possible without saving into a file first.

For the 'write to sql server' part, you can use the convenient to_sql method of pandas (so no need to iterate over the rows and do the insert manually). See the docs on interacting with SQL databases with pandas: http://pandas.pydata.org/pandas-docs/stable/io.html#io-sql
You will need at least pandas 0.14 to have this working, and you also need sqlalchemy installed. An example, assuming df is the DataFrame you got from read_table:
import sqlalchemy
import pyodbc
engine = sqlalchemy.create_engine("mssql+pyodbc://<username>:<password>@<dsnname>")
# write the DataFrame to a table in the sql database
df.to_sql("table_name", engine)
See also the documentation page of to_sql.
More information on how to create the connection engine with SQLAlchemy for SQL Server with pyodbc can be found here: http://docs.sqlalchemy.org/en/rel_1_1/dialects/mssql.html#dialect-mssql-pyodbc-connect
But if your goal is to just get the csv data into the SQL database, you could also consider doing this directly from SQL. See eg Import CSV file into SQL Server

Python3 version using a LocalDB SQL instance:
from sqlalchemy import create_engine
import urllib
import pyodbc
import pandas as pd
df = pd.read_csv("./data.csv")
quoted = urllib.parse.quote_plus(r"DRIVER={SQL Server Native Client 11.0};SERVER=(localDb)\ProjectsV14;DATABASE=database")  # raw string keeps the backslash literal
engine = create_engine('mssql+pyodbc:///?odbc_connect={}'.format(quoted))
df.to_sql('TargetTable', schema='dbo', con = engine)
result = engine.execute('SELECT COUNT(*) FROM [dbo].[TargetTable]')
result.fetchall()

Yes, the bcp utility seems to be the best solution for most cases.
If you want to stay within Python, the following code should work.
from sqlalchemy import create_engine
import urllib
import pyodbc
quoted = urllib.parse.quote_plus(r"DRIVER={SQL Server};SERVER=YOUR\ServerName;DATABASE=YOur_Database")  # raw string keeps the backslash literal
engine = create_engine('mssql+pyodbc:///?odbc_connect={}'.format(quoted))
df.to_sql('Table_Name', schema='dbo', con = engine, chunksize=200, method='multi', index=False, if_exists='replace')
Don't leave out method='multi', because it significantly reduces the task execution time.
Sometimes you may encounter the following error.
ProgrammingError: ('42000', '[42000] [Microsoft][ODBC SQL Server
Driver][SQL Server]The incoming request has too many parameters. The
server supports a maximum of 2100 parameters. Reduce the number of
parameters and resend the request. (8003) (SQLExecDirectW)')
In such a case, determine the number of columns in your dataframe: df.shape[1]. Divide the maximum supported number of parameters by this value and use the result's floor as a chunk size.
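The calculation above can be sketched as a small helper. The name safe_chunksize is hypothetical, and keeping one parameter of headroom below the 2100 limit is a conservative assumption rather than something stated in the error message. If you write the index as well (index=True), count it as an extra column:

```python
def safe_chunksize(n_columns: int, max_params: int = 2100) -> int:
    """Largest number of rows per INSERT that stays under the parameter limit.

    With method='multi', each row contributes one parameter per written column.
    """
    if n_columns < 1:
        raise ValueError("n_columns must be at least 1")
    # Assumption: stay one parameter below the server limit, never below one row
    return max(1, (max_params - 1) // n_columns)


print(safe_chunksize(10))  # chunk size for a 10-column frame
```

It would then be used as df.to_sql('Table_Name', schema='dbo', con=engine, chunksize=safe_chunksize(df.shape[1]), method='multi', index=False).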

I found that the bcp utility (https://learn.microsoft.com/en-us/sql/tools/bcp-utility) works best when you have a large dataset. I have 2.7 million rows that insert at 80K rows/sec. You can store your data frame as a CSV file (use tabs as the separator if your data doesn't contain tabs, and UTF-8 encoding). With bcp, I've used the "-c" format and it has worked without issues so far.

This worked for me on Python 3.5.2:
import sqlalchemy as sa
import urllib
import pyodbc
conn= urllib.parse.quote_plus('DRIVER={ODBC Driver 17 for SQL Server};SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+ password)
engine = sa.create_engine('mssql+pyodbc:///?odbc_connect={}'.format(conn))
frame.to_sql("myTable", engine, schema='dbo', if_exists='append', index=False, index_label='myField')

"As the Connection represents an open resource against the database, we want to always limit the scope of our use of this object to a specific context, and the best way to do that is by using Python context manager form, also known as the with statement."
https://docs.sqlalchemy.org/en/14/tutorial/dbapi_transactions.html
The example would then be
from sqlalchemy import create_engine
import urllib
import pyodbc
connection_string = (
    "Driver={SQL Server Native Client 11.0};"
    "Server=myserver;"
    "UID=myuser;"
    "PWD=mypwd;"
    "Database=mydb;"
)
quoted = urllib.parse.quote_plus(connection_string)
engine = create_engine(f'mssql+pyodbc:///?odbc_connect={quoted}')
with engine.connect() as cnn:
    df.to_sql('mytable',con=cnn, if_exists='replace', index=False)

The following is what worked for me using sqlalchemy. Pay attention to the last part, ?driver=SQL+Server.
import sqlalchemy
import pyodbc
engine = sqlalchemy.create_engine('mssql+pyodbc://MyUser:MyPWD@dataserver.sandbox.myserver/MY_DB?driver=SQL+Server')
dt.to_sql("PatientResultTest", engine,if_exists='append')
The SQL table needs an index column at the beginning to store the index value of dataframe.

# using class function
import pandas as pd
import pyodbc
import sqlalchemy
import urllib
class data_frame_to_sql():
    def __init__(self,dataFrame,sql_table_name):
        self.dataFrame=dataFrame
        self.sql_table_name=sql_table_name
    def conversion(self):
        params = urllib.parse.quote_plus("DRIVER={SQL Server};"
                                         "SERVER=######;"
                                         "DATABASE=####;"
                                         "UID=#####;"
                                         "PWD=###;")
        try:
            engine = sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect={}".format(params))
            return f"Table '{self.sql_table_name}' added successfully in database" ,self.dataFrame.to_sql(self.sql_table_name, engine)
        except Exception as e :
            e=str(e).replace(".","")
            print(f"{e} in Database." )
data={"BusinessEntityID":["1","2","3"],"FirstName":["raj","abhi","amir"],"LastName":["kapoor","bachn","khhan"]}
df = pd.DataFrame(data, columns= ['BusinessEntityID','FirstName','LastName'])
ab=data_frame_to_sql(df,"ab").conversion()
print(ab)

It's not necessary to use SQLAlchemy; one could create a connection with pyodbc directly and use it with pandas, as below:
with pyodbc.connect('DRIVER={ODBC Driver 18 for SQL Server};SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+ password) as newconn:
    df = pd.read_sql(query, newconn)  # query is a placeholder for your SQL string

How to Connect R to Oracle?

I need to connect R to oracle and I have been unsuccessful so far. I downloaded two packages: RODBC & RODM.
This is the statement that I've been using:
DB <- odbcDriverConnect("DBIORES1",uid="mhala",pwd="XXXXXXX")
But I get this error:
Error in odbcDriverConnect("DBIORES1", uid = "mhalagan", pwd = "XXXXXXX") :
unused argument(s) (uid = "mhalagan", pwd = "XXXXXXX")
What information do I need to be able to connect to an oracle database? Am I using the correct package?

See the help page for odbcDriverConnect(). odbcDriverConnect() does not accept uid or pwd arguments. You probably meant to use odbcConnect() instead:
odbcConnect(dsn = "DBIORES1", uid = "mhala", pwd = "XXXXXXX")
In addition to the RODBC package, there is the RODM package, which I believe is specifically designed for Oracle databases and is further described here: http://www.oracle.com/technetwork/articles/datawarehouse/saternos-r-161569.html . I do not use Oracle databases, so cannot comment on advantages of the two packages.

RJDBC worked just fine for me. You just need to have Oracle-thin driver jar file and configure the connection like:
> install.packages("RJDBC")
> library(RJDBC)
> drv <- JDBC("oracle.jdbc.driver.OracleDriver","/path/to/driver/com/oracle/oracle-thin/11.2.0.1.0/oracle-thin-11.2.0.1.0.jar")
> conn <- dbConnect(drv, "jdbc:oracle:thin:@database:port:schema", "user", "passwd")
and then you are ready to run some queries.

I've had success in the past connecting to Oracle databases from R with RJDBC. I found it easier to get going as I just grabbed the connection string that I'd used successfully inside the java based GUI I was using at the time and like magic it "just works"(tm).

Did you install the Oracle ODBC client/driver? You will need that if you are going to use the RODBC package. Go to the Oracle Instant Client download page and get the client for your OS. Install it, then configure ODBC and test the connection outside of R; then install R and RODBC and test inside R.