How to save a DataFrame as a table in SQL Server

I have a SQL Server instance with databases whose data I want to alter using pandas. I know how to pull the data into a DataFrame with pyodbc, but I have no clue how to get that DataFrame back into my SQL Server.
I have tried creating an engine with SQLAlchemy and using the to_sql command, but I can't get that to work because my engine is never able to connect correctly to my database.
import pyodbc
import pandas

server = "server"
db = "db"
conn = pyodbc.connect('DRIVER={SQL Server};SERVER=' + server + ';DATABASE=' + db + ';Trusted_Connection=yes')
cursor = conn.cursor()
cursor.execute("SELECT * FROM some_table")  # a query must run before fetchall(); table name is a placeholder
rows = cursor.fetchall()
data = pandas.DataFrame.from_records(rows, columns=[col[0] for col in cursor.description])
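For reference, a minimal sketch of building a SQLAlchemy engine from the same trusted-connection ODBC string; URL-encoding the raw ODBC string with quote_plus is the step that most often goes wrong:
from sqlalchemy import create_engine
from urllib.parse import quote_plus

server = "server"
db = "db"
# the raw ODBC string must be URL-encoded before being handed to SQLAlchemy
conn_str = 'DRIVER={SQL Server};SERVER=' + server + ';DATABASE=' + db + ';Trusted_Connection=yes'
engine = create_engine('mssql+pyodbc:///?odbc_connect=' + quote_plus(conn_str))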

You can use pandas.DataFrame.to_sql to insert your DataFrame into SQL Server. Any database supported by SQLAlchemy is supported by this method.
Here is an example of how you can achieve this:
from sqlalchemy import create_engine
from urllib.parse import quote_plus
import logging
import sys
import numpy as np
from datetime import datetime

# set up logging
logging.basicConfig(stream=sys.stdout,
                    format='%(asctime)s.%(msecs)3d %(levelname)s:%(name)s: %(message)s',
                    datefmt='%m-%d-%Y %H:%M:%S',
                    level=logging.DEBUG)
logger = logging.getLogger(__name__)  # get the name of the module

def write_to_db(df, database_name, table_name):
    """
    Creates a SQLAlchemy engine and writes the DataFrame to the database.
    """
    # replace infinity with NaN
    df = df.replace([np.inf, -np.inf], np.nan)
    user_name = 'USERNAME'
    pwd = 'PASSWORD'
    db_addr = '10.00.000.10'
    chunk_size = 40
    conn = "DRIVER={SQL Server};SERVER=" + db_addr + ";DATABASE=" + database_name + ";UID=" + user_name + ";PWD=" + pwd
    quoted = quote_plus(conn)
    new_con = 'mssql+pyodbc:///?odbc_connect={}'.format(quoted)
    # create the SQLAlchemy engine
    engine = create_engine(new_con)
    # write to the database
    logger.info("Writing to database ...")
    st = datetime.now()  # start time
    # WARNING! -- overwrites the table when if_exists='replace'
    df.to_sql(table_name, engine, if_exists='replace', index=False, chunksize=chunk_size)
    logger.info("Database updated ...")
    logger.info("Data written to '{}' database into '{}' table ...".format(database_name, table_name))
    logger.info("Time taken to write to DB: {}".format((datetime.now() - st).total_seconds()))
Calling this method should write your DataFrame to the database; note that it will replace the table if one with the same name already exists in the database.
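A quick usage sketch (the DataFrame contents and the database and table names here are made up):
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'value': ['a', 'b', 'c']})
write_to_db(df, database_name='MyDatabase', table_name='my_table')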

Related

How to read in SQL Server query as Dask dataframe (AttributeError: module 'dask.dataframe' has no attribute 'read_sql_query')

I currently use pyodbc to read the data in as a pandas DataFrame, then convert it to a Dask DataFrame. Is there any way to read it in as a Dask DataFrame directly?
Here's the code I'm currently using:
import pyodbc
import pandas as pd
import numpy as np
from dask.dataframe import from_pandas

def conn_sql_server(file_path):
    # connect to SQL Server
    conn = pyodbc.connect('Driver={SQL Server Native Client 11.0};'
                          'Server=Server1;'
                          'Database=Database1;'
                          'Trusted_Connection=yes;')
    # run the query and output the result to df
    query = open(file_path, 'r')
    df = pd.read_sql_query(query.read(), conn, chunksize=10**4)
    chunks = []
    for chunk in df:
        chunks.append(chunk)
    df_comb = pd.concat(chunks)
    query.close()
    return df_comb

# load in as a pandas DataFrame
data = conn_sql_server(r'.\input\data pull.sql')
# convert to a Dask DataFrame
dd = from_pandas(data, npartitions=3)
I tried to use dd.read_sql_query with both the pyodbc and the sqlalchemy packages. Both returned an AttributeError: module 'dask.dataframe' has no attribute 'read_sql_query'.
(1) pyodbc:
import dask.dataframe as dd
import pyodbc

def conn_sql_server(file_path):
    # connect to SQL Server
    conn = pyodbc.connect('Driver={SQL Server Native Client 11.0};'
                          'Server=Server1;'
                          'Database=Database1;'
                          'Trusted_Connection=yes;')
    # run the query and output the result to df
    query = open(file_path, 'r')
    df = dd.read_sql_query(query.read(), conn)
    query.close()
    return df

data = conn_sql_server(r'.\input\data pull.sql')
AttributeError: module 'dask.dataframe' has no attribute 'read_sql_query'
(2) sqlalchemy:
import dask.dataframe as dd
from sqlalchemy import create_engine

Server = 'Server1'
Database = 'Database1'
Driver = 'SQL Server Native Client 11.0'
uri = f'mssql://{Server}/{Database}?driver={Driver}'

query = open(r'.\input\data pull.sql', 'r')
dd.read_sql_query(query.read(), uri)
As Nick suggested, I've upgraded dask to the latest version using python -m pip install dask distributed --upgrade. I also checked all of the functions listed in the dask.dataframe module using the script below, and found that there is only read_sql_table, no read_sql_query:
from inspect import getmembers, isfunction
import dask.dataframe as dd
getmembers(dd, isfunction)
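Given that, one workaround is dd.read_sql_table, which does exist in that version; it reads a whole table rather than an arbitrary query. A hedged sketch, where the table name and index column are assumptions:
import dask.dataframe as dd

# SQLAlchemy-style URI; spaces in the driver name are URL-encoded as '+'
uri = ('mssql+pyodbc://Server1/Database1'
       '?driver=SQL+Server+Native+Client+11.0&trusted_connection=yes')

# read_sql_table needs an indexed numeric/datetime column to partition on -- 'id' is hypothetical
ddf = dd.read_sql_table('my_table', uri, index_col='id', npartitions=4)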

Connection to different SQL Server databases

I want to connect to a SQL Server database using the Python script below. I managed to connect to a single database, but I want to loop over a list of databases so that each time I point to a database I can connect to it directly.
My question is: how can I make the name of the database a variable? Could you help me please? Thank you in advance.
from os import listdir
from os.path import isfile, join
import pyodbc
import pandas as pd

mypath = 'C:\\Users\\DataBaseList'
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]

for i in range(len(onlyfiles)):
    print(onlyfiles[i])
    connection = pyodbc.connect("Driver={ODBC Driver 11 for SQL Server};"
                                "Server=FR0010APP31;"
                                "Database=Base;"
                                "uid=*****;pwd=*****")
    cursor = connection.cursor()
    sql = 'Select * from Activity'
    df = pd.read_sql(sql, connection)
    df.to_csv('DatabaseFile')
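A minimal sketch of one way to make the database name a variable, assuming each file name (minus its extension) in the folder is a database name:
import pyodbc
import pandas as pd
from os import listdir
from os.path import isfile, join, splitext

mypath = 'C:\\Users\\DataBaseList'
# assumption: every file in the folder is named after a database
databases = [splitext(f)[0] for f in listdir(mypath) if isfile(join(mypath, f))]

for db in databases:
    print(db)
    # the f-string substitutes the current database name into the connection string
    connection = pyodbc.connect("Driver={ODBC Driver 11 for SQL Server};"
                                "Server=FR0010APP31;"
                                f"Database={db};"
                                "uid=*****;pwd=*****")
    df = pd.read_sql('SELECT * FROM Activity', connection)
    df.to_csv(f'{db}_Activity.csv', index=False)
    connection.close()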

to_sql not updating table in MS SQL Server

Hey, so I have the code below, where I pull a table from a MS SQL Server database, run a bit of code, and then try to reimport it into another table in the same database. I'm running this in Spyder.
It runs all the way through, but when I run
select * from pythontest
on the SQL Server, the table comes out blank. Is there anything that stands out as not working?
## From SQL to DataFrame Pandas
import pandas as pd
import pyodbc
from sqlalchemy import create_engine

sql_conn = pyodbc.connect("Driver={SQL Server Native Client 11.0};"
                          "Server=njsrvnav1;"
                          "Database=cornerstone;"
                          "Trusted_Connection=yes;")
query = "SELECT [c1], [c2], [c3] from projectmaster"
df = pd.read_sql(query, sql_conn)
df = df[:100]

con = create_engine('mssql+pyodbc://username:pword@serverName:1433/cornerstone?driver=SQL+Server+Native+Client+11.0')
df.to_sql('dbo.pythontest', con, if_exists='replace')
con.dispose()
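One thing that stands out: passing 'dbo.pythontest' as the table name makes to_sql create a table literally named dbo.pythontest in the default schema, so SELECT * FROM pythontest finds nothing. A hedged sketch of passing the schema separately instead (the df here is a stand-in for the query result):
from sqlalchemy import create_engine
import pandas as pd

df = pd.DataFrame({'c1': [1], 'c2': [2], 'c3': [3]})  # stand-in for the query result

con = create_engine('mssql+pyodbc://username:pword@serverName:1433/cornerstone'
                    '?driver=SQL+Server+Native+Client+11.0')
# schema goes in its own argument so the table is created as [dbo].[pythontest]
df.to_sql('pythontest', con, schema='dbo', if_exists='replace', index=False)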
Try with pymysql:
import pymysql
import pandas as pd

conn = pymysql.connect(
    host='',
    port=3306,  # the original left the port blank; 3306 is the MySQL default
    user='',
    passwd='',
    db='',
    charset='utf8mb4')

df = pd.read_sql_query("SELECT * FROM table", conn)
df.head(2)

Error when copying data from Pandas DataFrame to Amazon Redshift table

I am trying to write a Pandas DataFrame to an Amazon Redshift DB. Given below is the code I am using.
from sqlalchemy import create_engine
import psycopg2
import io

engine = create_engine('postgresql+psycopg2://username:password@host:port/database')
conn = engine.raw_connection()
cur = conn.cursor()

output = io.StringIO()
report.to_csv(output, sep='\t', header=False, index=False)
output.seek(0)
contents = output.getvalue()
cur.copy_from(output, table_name, null="")
conn.commit()
However, on running the above code, I get the error below:
NameError: name 'database' is not defined
Could anyone help? Thanks.
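That NameError usually means the URL was built with an f-string (or concatenation) referencing a variable named database that was never assigned, rather than anything SQLAlchemy-specific. A hedged sketch, with hypothetical values, of defining every name before formatting the URL:
from sqlalchemy import create_engine

# hypothetical values -- every name used in the URL must be assigned first,
# otherwise Python raises NameError for the missing one
username, password = 'user', 'secret'
host, port, database = 'redshift-host.example.com', 5439, 'analytics'

engine = create_engine(
    f'postgresql+psycopg2://{username}:{password}@{host}:{port}/{database}'
)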
I used the code below to copy the DataFrame to the Amazon Redshift table:
from sqlalchemy import create_engine

conn = create_engine('postgresql://user:pass@host:port/database')
df.to_sql('table', conn, index=False, if_exists='replace')

Python - export or write a MS SQL Server query to Excel

I have a database in SQL Server called zd with a table called user_tab_columns. I want to bulk-export, or write to Excel, the result of a query against that table. The code that I tried to piece together from different sources ended up giving me error messages, such as:
ValueError from pandas about the shape of passed values
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
Here is an example of my code:
import pyodbc
import pandas as pd
import os

cnxn = pyodbc.connect("Driver={SQL Server Native Client 11.0};"
                      "Server=DPC;"
                      "Database=zD;"
                      "trusted_connection=yes;")
cursor = cnxn.cursor()

script = """
SELECT *
FROM user_tab_columns
WHERE table_name = "A"
"""

cursor.execute(script)
columns = [desc[0] for desc in cursor.description]
data = cursor.fetchall()
df = pd.DataFrame(list(data), columns=columns)

writer = pd.ExcelWriter('C:\Users\PROGRAMs\TEST\export.xlsx')
df.to_excel(writer, sheet_name='bar')
writer.save()
I would use pandas' built-in .read_sql(). Also, SQL string literals take single quotes, so the comparison should be table_name = 'A'; SQL Server treats a double-quoted "A" as an identifier rather than a string.
import pyodbc
import pandas as pd

cnxn = pyodbc.connect("Driver={SQL Server Native Client 11.0};"
                      "Server=DPC;"
                      "Database=zD;"
                      "trusted_connection=yes;")

script = """
SELECT *
FROM user_tab_columns
WHERE table_name = 'A'
"""

df = pd.read_sql(script, cnxn)

# forward slashes avoid the '\U...' unicode-escape error in the path
writer = pd.ExcelWriter('C:/Users/PROGRAMs/TEST/export.xlsx')
df.to_excel(writer, sheet_name='bar')
writer.save()
Use forward slashes (/) instead of backslashes in the file path.
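For example, any of these spellings of the (hypothetical) path avoid the truncated \UXXXXXXXX escape:
path = 'C:/Users/PROGRAMs/TEST/export.xlsx'      # forward slashes
path = r'C:\Users\PROGRAMs\TEST\export.xlsx'     # raw string
path = 'C:\\Users\\PROGRAMs\\TEST\\export.xlsx'  # escaped backslashes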
