How can I directly connect MS SQL Server to polars?
The documentation does not list any supported connections but recommends the use of pandas.
Update:
SQL Server Authentication works per answer, but Windows domain authentication is not working. see issue
Ahh, actually MsSQL is supported for loading directly into polars (via the underlying library that does the work, which is connectorx); the documentation is just slightly out of date - I'll take a look and refresh it accordingly.
Here you can connect to MS SQL Server with Polar (connectorx under the hood). Just use a connection string:
import polars as pl
# usually don't store sensitive info in plain text
username = 'my_username'
password = '1234'
server = 'SERVER1'
database = 'db1'
trusted_conn = 'no' # or yes
conn = f'mssql://{username}:{password}#{server}/{database}?driver=SQL+Server&trusted_connection={trusted_conn}'
query = "SELECT * FROM table1"
df = pl.read_sql(query, conn)
Related
I am trying to make sense of the following error that I started getting when I setup my python code to run on a VM server, which has 3.9.5 installed instead of 3.8.5 on my desktop. Not sure that matters, but it could be part of the reason.
The error
C:\ProgramData\Miniconda3\lib\site-packages\pandas\io\sql.py:758: UserWarning: pandas only support SQLAlchemy connectable(engine/connection) or
database string URI or sqlite3 DBAPI2 connection
other DBAPI2 objects are not tested, please consider using SQLAlchemy
warnings.warn(
This is within a fairly simple .py file that imports pyodbc & sqlalchemy fwiw. A fairly generic/simple version of sql calls that yields the warning is:
myserver_string = "xxxxxxxxx,nnnn"
db_string = "xxxxxx"
cnxn = "Driver={ODBC Driver 17 for SQL Server};Server=tcp:"+myserver_string+";Database="+db_string +";TrustServerCertificate=no;Connection Timeout=600;Authentication=ActiveDirectoryIntegrated;"
def readAnyTable(tablename, date):
conn = pyodbc.connect(cnxn)
query_result = pd.read_sql_query(
'''
SELECT *
FROM [{0}].[dbo].[{1}]
where Asof >= '{2}'
'''.format(db_string,tablename,date,), conn)
conn.close()
return query_result
All the examples I have seen using pyodbc in python look fairly similar. Is pyodbc becoming deprecated? Is there a better way to achieve similar results without warning?
Is pyodbc becoming deprecated?
No. For at least the last couple of years pandas' documentation has clearly stated that it wants either a SQLAlchemy Connectable (i.e., an Engine or Connection object) or a SQLite DBAPI connection. (The switch-over to SQLAlchemy was almost universal, but they continued supporting SQLite connections for backwards compatibility.) People have been passing other DBAPI connections (like pyodbc Connection objects) for read operations and pandas hasn't complained … until now.
Is there a better way to achieve similar results without warning?
Yes. You can take your existing ODBC connection string and use it to create a SQLAlchemy Engine object as described in the SQLAlchemy 1.4 documentation:
from sqlalchemy.engine import URL
connection_string = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=dagger;DATABASE=test;UID=user;PWD=password"
connection_url = URL.create("mssql+pyodbc", query={"odbc_connect": connection_string})
from sqlalchemy import create_engine
engine = create_engine(connection_url)
Then pass engine to the pandas methods you need to use.
It works for me.
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
import pyodbc
import sqlalchemy as sa
import urllib
from sqlalchemy import create_engine, event
from sqlalchemy.engine.url import URL
server = 'IP ADDRESS or Server Name'
database = 'AdventureWorks2014'
username = 'xxx'
password = 'xxx'
params = urllib.parse.quote_plus("DRIVER={SQL Server};"
"SERVER="+server+";"
"DATABASE="+database+";"
"UID="+username+";"
"PWD="+password+";")
engine = sa.create_engine("mssql+pyodbc:///?odbc_connect={}".format(params))
qry = "SELECT t.[group] as [Region],t.name as [Territory],C.[AccountNumber]"
qry = qry + "FROM [Sales].[Customer] C INNER JOIN [Sales].SalesTerritory t on t.TerritoryID = c.TerritoryID "
qry = qry + "where StoreID is not null and PersonID is not null"
with engine.connect() as con:
rs = con.execute(qry)
for row in rs:
print (row)
You can use the SQL Server name or the IP address, but this requires a basic DNS listing. Most corporate servers should already have this listing though. You can check the server name or IP address using the nslookup command in the command prompt followed by the server name or IP address.
I'm using SQL 2017 on Ubuntu server running on VMWare. I'm connecting with IP Address here as part of a wider "running MSSQL on Ubuntu" project.
If you are connecting with your Windows credentials, you can replace the params with the trusted_connection parameter.
params = urllib.parse.quote_plus("DRIVER={SQL Server};"
"SERVER="+server+";"
"DATABASE="+database+";"
"trusted_connection=yes")
since its a warning, I suppressed the message using the warnings python library. Hope this helps
import warnings
with warnings.catch_warnings(record=True):
warnings.simplefilter("always")
#your code goes here
My company doesn't use SQLAlchemy, preferring to use postgres connections based on pscycopg2 and incorporating other features. If you can run your script directly from a command line, then turning warnings off will solve the problem: start it with python3 -W ignore
The correct way to import for SQLAlchemy 1.4.36 is using:
import pandas as pd
from sqlalchemy import create_engine, event
from sqlalchemy.engine.url import URL
#...
conn_str = set_db_info() # see above
conn_url = URL.create("mssql+pyodbc", query={"odbc_connect": conn_str})
engine = create_engine(conn_url)
df = pd.read_sql(SQL, engine)
df.head()
I'm currently trying to write a pandas data frame into a new SQL Server table, and I'm having trouble figuring out how to connect WITHOUT USING USER/PASSWORD.
Pandas documentation states that an engine must be created via sqlalchemy, and the company only gave me a sample code (not using pandas, for other tasks) for the connection via pymssql:
server = "server name"
conn = pymssql.connect(server, database='TestDatabase')
cursor = conn.cursor()
cursor.execute(instruction)
conn.close()
Now I must pass a connection to sqlalchemy, which the sqlalchemy documentation states would be something like
engine = create_engine("mssql+pymssql://<username>:<password>#<freetds_name>/?charset=utf8",
encoding='latin1', echo=True)
but our SQL Server instance uses local authentication, and I found no model for this.
How can I create this connection string?
I've reached the writing to a SQL Server database part of my data journey, I hope someone is able to help.
I've been able to successfully connect to a remote Microsoft SQL Server database using PYODBC this allows me to pass in SQL queries into dataframes and to create reports.
I now would want to automate the "select import" manual method I've had a read of many blogs but I'm none the wiser to understanding the how behind it all.
import pandas as pd
import pyodbc
SERVER = r'Remote SQL Server'
database = 'mydB'
username = 'datanovice'
password = 'datanovice'
cnxn = pyodbc.connect('Driver={SQL
Server};SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+
password)
cursor = cnxn.cursor()
I'm able to read queries easily using this and pass them into dataframes.
what's the best way to write into my MS SQL dB? noting that it's not local I'm happy to pass this into SQL Alchemy but I wasn't sure of the correct syntax.
Things to consider:
This is a mission critical database and some of the DataFrames must be written as delete queries
If this is an unsafe method and if I need to go back and study more to understand proper database methodology I'm very happy to do so
I'm not looking for someone to write or provide the code for me, but rather point me in the right direction
I envisage this to be something like.. but I'm not sure how I specify the correct table:
df.to_sql('my_df', con, chunksize=1000)
As you've seen from the pandas documentation you need to pass a SQLAlchemy engine object as the second argument to the to_sql method. Then you can use something like
df.to_sql("table_name", engine, if_exists="replace")
The SQLAlchemy documentation shows how to create the engine object. If you use an ODBC DSN then the statement will look something like this:
from sqlalchemy import create_engine
# ...
engine = create_engine("mssql+pyodbc://scott:tiger#some_dsn")
How can pypyodbc connect to linked tables in the .accdb database? Is this possible at all, or is this a limitation of pyodbc?
I need to get data from an MS Acess .accdb database into Python. This works perfectly and I can use pypyodbc to access tables and queries defined inside the .accdb Database. However, the database also has tables linked to an external SQL Server. When accessing such linked tables pypyodbc complains that it cannot connect to the SQL server.
test.accdb contains two tables: Test (local table) and cidb_ain (linked SQL table)
The following Python 3 code is my attempt to access the data:
import pypyodbc as pyodbc
cnxn = pyodbc.connect(driver='Microsoft Access Driver (*.mdb, *.accdb)',
dbq='test.accdb',
readonly=True)
cursor = cnxn.cursor()
# access to the local table works
for row in cursor.execute("select * from Test"):
print(row)
print('----')
# access to the linked table fails
for row in cursor.execute("select * from cidb_ain"):
print(row)
Output:
(1, 'eins', 1)
(2, 'zwei', 2)
(3, 'drei', 3)
----
Traceback (most recent call last):
File "test_02_accdb.py", line 14, in <module>
for row in cursor.execute("select * from cidb_ain"):
File "C:\software\installed\miniconda3\lib\site-packages\pypyodbc.py", line 1605, in execute
self.execdirect(query_string)
File "C:\software\installed\miniconda3\lib\site-packages\pypyodbc.py", line 1631, in execdirect
check_success(self, ret)
File "C:\software\installed\miniconda3\lib\site-packages\pypyodbc.py", line 986, in check_success
ctrl_err(SQL_HANDLE_STMT, ODBC_obj.stmt_h, ret, ODBC_obj.ansi)
File "C:\software\installed\miniconda3\lib\site-packages\pypyodbc.py", line 964, in ctrl_err
raise Error(state,err_text)
pypyodbc.Error: ('HY000', "[HY000] [Microsoft][ODBC-Treiber für Microsoft Access] ODBC-Verbindung zu 'SQL Server Native Client 11.0SQLHOST' fehlgeschlagen.")
The error message roughly translates to "ODBC connection to 'SQL Server Native Client 11.0SQLHOST' failed".
I cannot access the SQL Server through the .accdb database with pypyodbc, but querying the cidb_ain table from within MS Access is possible. Furthermore, I can connect to the SQL Server directly:
cnxn = pyodbc.connect(driver='SQL Server Native Client 11.0',
server='SQLHOST',
trusted_connection='yes',
database='stuffdb')
Considering that (1) MS Access (and Matlab too) can use the information contained in the .accdb file to query the linked tables, and (2) the SQL Server is accessible, I assume the problem is related to pypyodbc. (The way driver name and host name are mangled into 'SQL Server Native Client 11.0SQLHOST' in the error message seems somewhat suspicious, too.)
I have no previous experience with Access, so please be patient and let me know if I omitted important information that seemed unnecessary to me...
First, MS Access is a unique type of database application that is somewhat different than other RDMS's (e.g., SQLite, MySQL, PostgreSQL, Oracle, DB2) as it ships with both a default back-end Jet/ACE SQL Relational Engine (which by the way is not an Access-restricted component but a general Microsoft technology) and a front-end GUI interface and report generator. In essence, Access is a collection of objects.
Linked tables are somewhat a feature of the front-end side of MS Access used to replace the default Jet/ACE database (i.e., local tables) for another backend database, specifically for you SQL Server. Moreover, linked tables are ODBC/OLEDB connections themselves! You had to have used a DSN, Driver, or Provider to even establish and create linked tables in the MS Access file.
Hence, any external client, here being your Python script, that connects to the MS Access database [driver='Microsoft Access Driver (*.mdb, *.accdb)] is actually connecting to the backend Jet/ACE database. Client/script never interacts with frontend objects. In your error Python reads the ODBC connection of the linked table and since the SQL Server Driver/Provider [SQL Server Native Client 11.0SQLHOST] is never called in script, the script fails.
Altogether, to resolve your situation you must connect Python directly to the SQL Server database (and not use MS Access as a medium) to connect to any local tables there, here being cidb_ain. Simply use the connection string of the Access linked table:
#(USING DSN)
db = pypyodbc.connect('DSN=dsn name;')
cur = db.cursor()
cur.execute("SELECT * FROM dbo.cidb_ain")
for row in cur.fetchall()
print(row)
cur.close()
db.close()
# (USING DRIVER)
constr = 'Trusted_Connection=yes;DRIVER={SQL Server};SERVER=servername;' \
'DATABASE=database name;UID=username;PWD=password'
db = pypyodbc.connect(constr)
cur = db.cursor()
cur.execute("SELECT * FROM dbo.cidb_ain")
for row in cur.fetchall()
print(row)
cur.close()
db.close()
Update:
It turns out that the solution to this problem is as simple as setting pyodbc.pooling = False before establishing the connection to the Access database:
import pyodbc
# ... also works with `import pypyodbc as pyodbc`, too
pyodbc.pooling = False # this prevents the error
cnxn = pyodbc.connect(r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};DBQ= ... ")
(previous answer)
It appears that neither pypyodbc nor pyodbc can read a SQL Server linked table from an Access database. However, System.Data.Odbc in .NET can do it so IronPython can, too.
To verify, I created a table named [Foods] in SQL Server
id guestId food
-- ------- ----
1 1 pie
2 2 soup
I created an ODBC linked table named [dbo_Foods] in Access which pointed to that table on SQL Server.
I also created a local Access table named [Guests] ...
id firstName
-- ---------
1 Gord
2 Jenn
... and a saved Access query named [qryGuestPreferences] ...
SELECT Guests.firstName, dbo_Foods.food
FROM Guests INNER JOIN dbo_Foods ON Guests.id = dbo_Foods.guestId;
Running the following script in IronPython ...
import clr
import System
clr.AddReference("System.Data")
from System.Data.Odbc import OdbcConnection, OdbcCommand
connectString = (
r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};"
r"DBQ=C:\Users\Public\Database1.accdb;"
)
conn = OdbcConnection(connectString)
conn.Open()
query = """\
SELECT firstName, food
FROM qryGuestPreferences
"""
cmd = OdbcCommand(query, conn)
rdr = cmd.ExecuteReader()
while rdr.Read():
print("{0} likes {1}.".format(rdr["firstName"], rdr["food"]))
conn.Close()
... returns
Gord likes pie.
Jenn likes soup.
I don't know much about databases - Sorry if the question seems silly.
I have sql server 2012 on my machine and i create simple database table.
I want to connect to this database table thru C# code.
So, I need to know my ConnectionString.
I don't understand the parameters of the ConnectionString.
I try to google it - but still didn't find any good explanation.
Anyone can please explain the connectionString fields ?
How to define the connectionString that i will be able to connect the local database ?
thanks
Your connection string should be as simple as like below
Data Source=.;Initial Catalog=DB_NAME;Integrated Security=True"
Where
Data Source=. means local database
Initial Catalog=DB_NAME means the database it will connect to
Integrated Security=True means it will use windows authentication (no user name and password needed; it will use logged in credential)
Take a look Here
(OR)
Search in Google with key term sqlconncectionstring which will fetch you many help.
EDIT:
You are getting exception cause Initial Catalog=DB_Name\Table_001. It should be Initial Catalog=DB_Name (only database name). Provide the table name in sql query to execute. Check some online tutorial to get more idea on the same.
You use . in data source only when you are connecting to local machine database and to the default SQL Server instance. Else if you are using different server and named SQL Server instance then your connection string should look like
using(SqlConnection sqlConnection = new SqlConnection())
{
sqlConnection.ConnectionString =
#"Data Source=Actual_server_name\actual_sqlserver_instance_name;
Initial Catalog=actual_database_name_Name;
Integrated Security=True;";
sqlConnection.Open();
}
In case you are using local machine but named SQL Server instance then use
Data Source=.\actual_sqlserver_instance_name;
Initial Catalog=Actual_Database_NAME;Integrated Security=True"
using System.Data.SqlClient;
Then create a SqlConnection and specifying the connection string.
SqlConnection myConnection = new SqlConnection("user id=username;" +
"password=password;server=serverurl;" +
"Trusted_Connection=yes;" +
"database=database; " +
"connection timeout=30");
Note: line break in connection string is for formatting purposes only
SqlConnection.ConnectionString
The connection string is simply a compilation of options and values to specify how and what to connect to. Upon investigating the Visual Studio .NET help files I discovered that several fields had multiple names that worked the same, like Password and Pwd work interchangeably.
User ID
The User ID is used when you are using SQL Authentication. In my experience this is ignored when using a Trusted_Connection, or Windows Authentication. If the username is associated with a password Password or Pwd will be used.
"user id=userid;"
Password or Pwd
The password field is to be used with the User ID, it just wouldn't make sense to log in without a username, just a password. Both Password and Pwd are completely interchangeable.
"Password=validpassword;"-or-
"Pwd=validpassword;"
Data Source or Server or Address or Addr or Network Address
Upon looking in the MSDN documentation I found that there are several ways to specify the network address. The documentation mentions no differences between them and they appear to be interchangeable. The address is an valid network address, for brevity I am only using the localhost address in the examples.
"Data Source=localhost;"
-or-
"Server=localhost;"
-or-
"Address=localhost;"-or-"Addr=localhost;"
-or-"Network Address=localhost;"
Integrated Sercurity or Trusted_Connection
Integrated Security and Trusted_Connection are used to specify wheter the connnection is secure, such as Windows Authentication or SSPI. The recognized values are true, false, and sspi. According to the MSDN documentation sspi is equivalent to true. Note: I do not know how SSPI works, or affects the connection.
Connect Timeout or Connection Timeout
These specify the time, in seconds, to wait for the server to respond before generating an error. The default value is 15 (seconds).
"Connect Timeout=10;"-or-
"Connection Timeout=10;"
Initial Catalog or Database
Initial Catalog and Database are simply two ways of selecting the database associated with the connection.
"Inital Catalog=main;"
-or-
"Database=main;"