executemany issue: CSV to Microsoft SQL Server in Python - sql-server

I am trying to insert 2,000,000 records from a .CSV file into Microsoft SQL Server using Python. A standard row-by-row insert took 3 hours to upload all the records; I am trying to reduce this to minutes.
I am fairly new to Python.
import pyodbc
import csv
import time
import pandas as pd

new_data = []
t0 = time.time()
conn = pyodbc.connect('Driver={SQL Server};'
                      'Server=xyz;'
                      'Database=Rst;'
                      'Trusted_Connection=yes;')
cursor = conn.cursor()
cursor.fast_executemany = True
parameter = [()]
with open(r"\\abc.csv") as csvDataFile:
    csvReader = csv.reader(csvDataFile)
    next(csvReader)
    for row in csvReader:
        cursor.executemany('INSERT INTO [dbo].[Science_Catalogue_Extraction]'
                           '([Supplier],[Catalogue_Number],[Description],[Long_Description],'
                           '[UNSPSC_Code],[CAS_Number],[Victoria_Hazard_Flag],[Overall_Status],'
                           '[Chemical_Buyers_Override],[Standard_Buyers_Override]) '
                           'VALUES(?,?,?,?,?,?,?,?,?,?)', row)
print(f'{time.time() - t0:.1f} seconds')
conn.commit()
cursor.close()
conn.close()
I am getting an error
TypeError: ('Params must be in a list, tuple, or Row', 'HY000').
Please help :(
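The TypeError comes from passing a single `row` where `executemany` expects a list of parameter tuples (and calling it once per row also defeats `fast_executemany`). A common fix is to collect the CSV rows into batches and call `executemany` once per batch. Below is a minimal sketch of the batching logic; the database calls are shown as comments since they need a live connection, and the batch size of 10,000 is an illustrative assumption:

```python
import itertools

def chunks(rows, size):
    """Yield successive lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        batch = list(itertools.islice(it, size))
        if not batch:
            return
        yield batch

# With a live connection (names as in the question), each batch would go to
# a single executemany call instead of one call per row:
#
# cursor.fast_executemany = True
# with open(r"\\abc.csv") as f:
#     reader = csv.reader(f)
#     next(reader)  # skip header
#     for batch in chunks(reader, 10000):
#         cursor.executemany(insert_sql, batch)  # batch is a list of row tuples
# conn.commit()
```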

Related

Connection to different SQL Server databases

I want to connect to a SQL Server database using the Python script you will see below. I managed to connect to a single database, but I want to loop over a list of databases so that each iteration connects to the next one directly.
My question is: how can I make the database name a variable? Could you help me please? Thank you in advance.
from os import listdir
from os.path import isfile, join
import pyodbc
import pandas as pd

mypath = 'C:\\Users\\DataBaseList'
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
for i in range(len(onlyfiles)):
    print(onlyfiles[i])
    connection = pyodbc.connect("Driver={ODBC Driver 11 for SQL Server};"
                                "Server=FR0010APP31;"
                                "Database= Base;"
                                "uid=*****;pwd=*****")
    cursor = connection.cursor()
    sql = 'Select * from Activity'
    df = pd.read_sql(sql, connection)
    df.to_csv('DatabaseFile')
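One way to make the database name a variable is to build the connection string per iteration with a small helper. A sketch, using the server name and masked credentials from the question (the helper name and the file-name-to-database mapping are assumptions, not part of the original code):

```python
def build_conn_str(database,
                   server='FR0010APP31',
                   driver='ODBC Driver 11 for SQL Server'):
    """Build a pyodbc connection string with the database name as a variable."""
    return ('Driver={%s};Server=%s;Database=%s;uid=*****;pwd=*****'
            % (driver, server, database))

# In the loop, each file name (minus its extension) could select the database:
# for f in onlyfiles:
#     db_name = f.rsplit('.', 1)[0]
#     connection = pyodbc.connect(build_conn_str(db_name))
```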

How to Save a Data Frame as a table in SQL

I have a SQL Server instance with databases whose data I want to alter using pandas. I know how to read the data into a DataFrame with pyodbc, but I have no clue how to get that DataFrame back into my SQL Server.
I have tried to create an engine with sqlalchemy and use the to_sql command, but I cannot get that to work because my engine is never able to connect correctly to my database.
import pyodbc
import pandas
server = "server"
db = "db"
conn = pyodbc.connect('DRIVER={SQL Server};SERVER='+server+';DATABASE='+db+';Trusted_Connection=yes')
cursor = conn.cursor()
df = cursor.fetchall()
data = pandas.DataFrame(df)
conn.commit()
You can use pandas.DataFrame.to_sql to insert your DataFrame into SQL Server; any database supported by SQLAlchemy is supported by this method.
Here is an example of how you can achieve this:
from sqlalchemy import create_engine, event
from urllib.parse import quote_plus
import logging
import sys
import numpy as np
from datetime import datetime, timedelta

# set up logging
logging.basicConfig(stream=sys.stdout,
                    filemode='a',
                    format='%(asctime)s.%(msecs)3d %(levelname)s:%(name)s: %(message)s',
                    datefmt='%m-%d-%Y %H:%M:%S',
                    level=logging.DEBUG)
logger = logging.getLogger(__name__)  # get the name of the module

def write_to_db(df, database_name, table_name):
    """
    Creates a sqlalchemy engine and writes the dataframe to the database.
    """
    # replace infinity with NaN
    df = df.replace([np.inf, -np.inf], np.nan)
    user_name = 'USERNAME'
    pwd = 'PASSWORD'
    db_addr = '10.00.000.10'
    chunk_size = 40
    conn = "DRIVER={SQL Server};SERVER=" + db_addr + ";DATABASE=" + database_name + ";UID=" + user_name + ";PWD=" + pwd
    quoted = quote_plus(conn)
    new_con = 'mssql+pyodbc:///?odbc_connect={}'.format(quoted)
    # create sqlalchemy engine
    engine = create_engine(new_con)
    # write to DB
    logger.info("Writing to database ...")
    st = datetime.now()  # start time
    # WARNING!! -- overwrites the table when if_exists='replace'
    df.to_sql(table_name, engine, if_exists='replace', index=False, chunksize=chunk_size)
    logger.info("Database updated ...")
    logger.info("Data written to '{}' database into '{}' table ...".format(database_name, table_name))
    logger.info("Time taken to write to DB: {}".format((datetime.now() - st).total_seconds()))
Calling this method will write your DataFrame to the database. Note that it will replace the table if one with the same name already exists.

to_sql not updating table in MS SQL Server

Hey, so I have the code below, where I'm pulling a table from an MS SQL Server database, running a bit of code, and then trying to re-import the result into another table within the same database. I'm running this in Spyder.
It runs all the way through, but when I run
select * from pythontest
on the SQL Server, the table comes out blank. Is there anything that is standing out as not working?
## From SQL to DataFrame Pandas
import pandas as pd
import pyodbc
import mysql.connector
from sqlalchemy import create_engine

sql_conn = pyodbc.connect("Driver={SQL Server Native Client 11.0};"
                          "Server=njsrvnav1;"
                          "Database=cornerstone;"
                          "Trusted_Connection=yes;")
query = "SELECT [c1], [c2], [c3] from projectmaster"
df = pd.read_sql(query, sql_conn)
df = df[:100]

con = create_engine('mssql+pyodbc://username:pword@serverName:1433/cornerstone?driver=SQL+Server+Native+Client+11.0')
df.to_sql('dbo.pythontest', con, if_exists='replace')
con.dispose()
Try with pymysql:
import pymysql
import pandas as pd

conn = pymysql.connect(
    host='',
    port=,
    user='',
    passwd='',
    db='',
    charset='utf8mb4')

df = pd.read_sql_query("SELECT * FROM table ", conn)
df.head(2)
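A likely culprit in the original code, separate from the driver choice: pandas' to_sql treats the whole string 'dbo.pythontest' as the table name, so the rows land in a table literally named [dbo].[dbo.pythontest] rather than pythontest in the dbo schema; to_sql takes the schema separately via its schema parameter. A sketch (the splitting helper is hypothetical, and the actual to_sql call is commented out since it needs a live server):

```python
def split_qualified(name, default_schema=None):
    """Split a dotted name like 'dbo.pythontest' into (schema, table)."""
    if '.' in name:
        schema, table = name.split('.', 1)
        return schema, table
    return default_schema, name

# pandas' to_sql takes the schema as its own parameter; with the engine
# from the question this would be:
# schema, table = split_qualified('dbo.pythontest')
# df.to_sql(table, con, schema=schema, if_exists='replace', index=False)
```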

Error when copying data from Pandas Dataframe to Amazon RedShift table

I am trying to write a pandas DataFrame to an Amazon Redshift DB. Given below is the code I am using.
from sqlalchemy import create_engine
import psycopg2
import io

engine = create_engine('postgresql+psycopg2://username:password@host:port/database')
conn = engine.raw_connection()
cur = conn.cursor()
output = io.StringIO()
report.to_csv(output, sep='\t', header=False, index=False)
output.seek(0)
contents = output.getvalue()
cur.copy_from(output, table_name, null="")
conn.commit()
However, on running the above code, I get the error below:
NameError: name 'database' is not defined
Could anyone help? Thanks.
I used the code below to copy the DataFrame to an Amazon Redshift table:
from sqlalchemy import create_engine

conn = create_engine('postgresql://user:pass@host:port/database')
df.to_sql('table', conn, index=False, if_exists='replace')

python - export or write to Excel an MS SQL Server query

I have a database in SQL Server called zd with a table called user_tab_columns. I want to bulk-export, or write to Excel, the result of the query statement. The code that I tried to mimic from different sources ended up giving me error messages.
In the database zd, the fields of the table user_tab_columns are as below:
Here is an example of my code, and the error messages I get:
ValueError with Pandas - shape of passed values
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
import pyodbc
import pandas as pd
import os

cnxn = pyodbc.connect("Driver={SQL Server Native Client 11.0};"
                      "Server=DPC;"
                      "Database=zD;"
                      "trusted_connection=yes;")
cursor = cnxn.cursor()
script = """
SELECT *
FROM user_tab_columns
WHERE table_name = "A"
"""
cursor.execute(script)
columns = [desc[0] for desc in cursor.description]
data = cursor.fetchall()
df = pd.DataFrame(list(data), columns=columns)

writer = pd.ExcelWriter('C:\Users\PROGRAMs\TEST\export.xlsx')
df.to_excel(writer, sheet_name='bar')
writer.save()
I would use pandas' built-in .read_sql(). Also, the string literal in the SQL query should use single quotes ('A') rather than double quotes, which SQL Server treats as an identifier.
import pyodbc
import pandas as pd
import os

cnxn = pyodbc.connect("Driver={SQL Server Native Client 11.0};"
                      "Server=DPC;"
                      "Database=zD;"
                      "trusted_connection=yes;")
script = """
SELECT *
FROM user_tab_columns
WHERE table_name = 'A'
"""
df = pd.read_sql(script, cnxn)

writer = pd.ExcelWriter(r'C:\Users\PROGRAMs\TEST\export.xlsx')
df.to_excel(writer, sheet_name='bar')
writer.save()
Use forward slashes (/) instead of backslashes in the file path.
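The SyntaxError itself comes from the `\U` in 'C:\Users\...' being read as the start of a unicode escape in an ordinary string literal. Raw strings and forward slashes both sidestep it:

```python
# A raw string leaves backslashes alone; forward slashes also work on Windows.
path_raw = r'C:\Users\PROGRAMs\TEST\export.xlsx'
path_fwd = 'C:/Users/PROGRAMs/TEST/export.xlsx'
```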
