I'm inserting data from a MySQL table into a Postgres table, and my code is:
from sqlalchemy import create_engine, MetaData, Table
from sqlalchemy.orm import mapper, sessionmaker
import psycopg2
class TestTable(object):
    pass

class StoreTV(object):
    pass

if __name__ == "__main__":
    engine = create_engine('mysql://root@localhost:3306/irt', echo=False)
    Session = sessionmaker(bind=engine)
    session = Session()
    metadata = MetaData(engine)
    test_table = Table('test_1', metadata, autoload=True)
    store_tv_table = Table('roku_store', metadata, autoload=True)
    mapper(TestTable, test_table)
    mapper(StoreTV, store_tv_table)
    res = session.query(TestTable).all()
    print res[1].test_1col
    tv_list = session.query(StoreTV).all()
    for tv in tv_list:
        tv_data = dict()
        tv_data = {
            'title': tv.name,
            'email': tv.business_email
        }
        print tv_data
        conn = psycopg2.connect(database="db", user="user", password="pass", host="localhost", port="5432")
        print "Opened database successfully"
        cur = conn.cursor()
        values = cur.execute("Select * FROM iris_store")
        print values
        cur.execute("INSERT INTO iris_store(title, business_email) VALUES ('title':tv_data[title], 'business_email':tv_data[business_email])")
        print "Record created successfully"
        conn.commit()
        conn.close()
And I'm not able to get data from the Postgres table or insert into the Postgres table,
while I am successfully getting data from the MySQL table.
ERROR is:
something
{'email': 'name@example.com', 'title': "Some Name"}
Opened database successfully
None
Traceback (most recent call last):
File "/home/Desktop/porting.py", line 49, in
cur.execute("INSERT INTO iris_store(title, business_email) VALUES ('title':tv_data[title], 'business_email':tv_data[business_email])")
psycopg2.ProgrammingError: syntax error at or near ":"
LINE 1: ... iris_store(title, business_email) VALUES ('title':tv_data[t...
^
Usman
You have to check your data type for email before inserting the data,
because to insert data from MySQL into Postgres, both fields have to be of the same type.
Click here; page 28 will describe the data types of MySQL and Postgres.
Your main problem is that you have a SQL syntax error in your INSERT query. It should look something like this:
cur.execute("INSERT INTO iris_store(title, business_email) VALUES (%(title)s, %(email)s)", tv_data)
For reference, see: Passing parameters to SQL queries
Also, you probably don't want to create a new connection to your Postgres DB for each single value in tv_list; you should move the connect and close calls outside of the for loop. Printing the whole table each time also doesn't seem very useful.
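Putting both suggestions together, the Postgres part could look roughly like this (just a sketch reusing the connection details, table, and columns from your snippet; adjust them to your setup):
conn = psycopg2.connect(database="db", user="user", password="pass",
                        host="localhost", port="5432")
cur = conn.cursor()
for tv in tv_list:
    tv_data = {'title': tv.name, 'email': tv.business_email}
    # let psycopg2 bind the values instead of writing them into the SQL string
    cur.execute("INSERT INTO iris_store (title, business_email) VALUES (%(title)s, %(email)s)",
                tv_data)
conn.commit()
conn.close()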
I have a pandas dataframe with 27 columns and ~45k rows that I need to insert into a SQL Server table.
I am currently using the below code and it takes 90 mins to insert:
conn = pyodbc.connect('Driver={ODBC Driver 17 for SQL Server};\
                       Server=#servername;\
                       Database=dbtest;\
                       Trusted_Connection=yes;')
cursor = conn.cursor()  # Create cursor
for index, row in t6.iterrows():
    cursor.execute("insert into dbtest.dbo.test(col1, col2, col3, col4, col5, col6, col7, col8, col9, col10, col11, col12, col13, col14, ..., col27) \
                    values (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)",
                   row['col1'], row['col2'], row['col3'], ..., row['col27'])
I have also tried to load using executemany, and that takes even longer to complete, at nearly 120 mins.
I am really looking for a faster load time since I need to run this daily.
You can set fast_executemany in pyodbc itself for versions>=4.0.19. It is off by default.
import pyodbc
server_name = 'localhost'
database_name = 'AdventureWorks2019'
table_name = 'MyTable'
driver = 'ODBC Driver 17 for SQL Server'
connection = pyodbc.connect(driver='{'+driver+'}', server=server_name, database=database_name, trusted_connection='yes')
cursor = connection.cursor()
cursor.fast_executemany = True # reduce number of calls to server on inserts
# form SQL statement
columns = ", ".join(df.columns)
values = '('+', '.join(['?']*len(df.columns))+')'
statement = "INSERT INTO "+table_name+" ("+columns+") VALUES "+values
# extract values from DataFrame into list of tuples
insert = [tuple(x) for x in df.values]
cursor.executemany(statement, insert)
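One thing the snippet above leaves out: pyodbc connections do not autocommit by default, so depending on your setup you may still need to commit (and close) after the executemany call, for example:
connection.commit()  # persist the inserted rows; pyodbc autocommit is off by default
connection.close()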
Or if you prefer sqlalchemy and dataframes directly.
import sqlalchemy as db
engine = db.create_engine('mssql+pyodbc://@'+server_name+'/'+database_name+'?trusted_connection=yes&driver='+driver, fast_executemany=True)
df.to_sql(table_name, engine, if_exists='append', index=False)
See fast_executemany in this link.
https://github.com/mkleehammer/pyodbc/wiki/Features-beyond-the-DB-API
I have worked through this in the past, and this was the fastest that I could get it to work using sqlalchemy.
import sqlalchemy as sa
engine = sa.create_engine(f'mssql://@{server}/{database}?trusted_connection=yes&driver={driver_name}',
                          fast_executemany=True)  # Windows authentication
df.to_sql('Daily_Report', con=engine, if_exists='append', index=False)
If the engine is not working for you, then you may have a different setup so please see: https://docs.sqlalchemy.org/en/13/core/engines.html
You should be able to create the variables needed above, but here is how I get the driver:
driver_name = ''
driver_names = [x for x in pyodbc.drivers() if x.endswith(' for SQL Server')]
if driver_names:
    driver_name = driver_names[-1]  # You may need to change the [-1] if wrong driver to [-2] or a different option in the driver_names list.
if driver_name:
    conn_str = f'''DRIVER={driver_name};SERVER='''
else:
    print('(No suitable driver found. Cannot connect.)')
You can try the method='multi' option built into pandas to_sql.
df.to_sql('table_name', con=engine, if_exists='replace', index=False, method='multi')
The multi method allows you to 'pass multiple values in a single INSERT clause', per the documentation.
I found it to be pretty efficient.
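One caveat to add (not from the answer above): every row contributes one bind parameter per column, and SQL Server limits a single statement to 2100 parameters, so with wide tables you may need to pair method='multi' with a chunksize, roughly like this:
# with 27 columns, 50 rows per chunk uses 27 * 50 = 1350 parameters,
# which stays under SQL Server's 2100-parameter limit per statement
df.to_sql('table_name', con=engine, if_exists='replace', index=False,
          method='multi', chunksize=50)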
I have an API service, and in this service I'm writing pandas DataFrame results to a SQL Server table.
But when I want to add new values to the table, I cannot add them. I've used the append option because the documentation says it adds new values to the existing table. I didn't use the replace option because I don't want to drop my table every time.
My need is to send new values to the database table while keeping the old ones.
I've researched other methods and ways besides pandas' to_sql, but I could only find pandas everywhere.
Does anybody have an idea about this?
Thanks.
You should make sure that your pandas DataFrame has the right structure, where the keys are your MySQL column names and the data is in lists:
df = pd.DataFrame({"UserId":["rrrrr"],
"UserFavourite":["Greek Salad"],
"MonthlyOrderFrequency":[5],
"HighestOrderAmount":[30],
"LastOrderAmount":[21],
"LastOrderRating":[3],
"AverageOrderRating":[3],
"OrderMode":["Web"],
"InMedicalCare":["No"]})
Establish a proper connection to your db. In my case I am connecting to my local db at 127.0.0.1 and 'use demo;':
sqlEngine = create_engine('mysql+pymysql://root:#127.0.0.1/demo', pool_recycle=3600)
dbConnection = sqlEngine.connect()
Lastly, input your table name, mine is "UserVitals", and try executing in a try-except block to handle errors:
try:
    df.to_sql("UserVitals", con=sqlEngine, if_exists='append');
except ValueError as vx:
    print(vx)
except Exception as ex:
    print(ex)
else:
    print("Table %s created successfully."%tableName);
finally:
    dbConnection.close()
Here's an example of how to do that...with a little extra code included.
# Insert from dataframe to table in SQL Server
import time
import pandas as pd
import pyodbc
# create timer
start_time = time.time()
from sqlalchemy import create_engine
df = pd.read_csv("C:\\your_path\\CSV1.csv")
conn_str = (
r'DRIVER={SQL Server Native Client 11.0};'
r'SERVER=your_server_name;'
r'DATABASE=NORTHWND;'
r'Trusted_Connection=yes;'
)
cnxn = pyodbc.connect(conn_str)
cursor = cnxn.cursor()
for index, row in df.iterrows():
    cursor.execute('INSERT INTO dbo.Table_1([Name],[Address],[Age],[Work]) values (?,?,?,?)',
                   row['Name'],
                   row['Address'],
                   row['Age'],
                   row['Work'])
cnxn.commit()
cursor.close()
cnxn.close()
# see total time to do insert
print("%s seconds ---" % (time.time() - start_time))
I'm working on a project where I want to load selected IMDb files (the file format is *.list) that I downloaded from here into an SQLite database. Unfortunately, I'm not able to solve this issue. I'm able to create a database but can't populate the table with the IMDb data.
The documentation I've been following is here. So far, I've created an SQLite table, but it will not populate.
import sqlite3
from sqlite3 import Error


def create_connection(db_file):
    """ create a database connection to the SQLite database
        specified by db_file
    :param db_file: database file
    :return: Connection object or None
    """
    try:
        conn = sqlite3.connect(db_file)
        return conn
    except Error as e:
        print(e)
    return None


def create_table(conn, create_table_sql):
    """ create a table from the create_table_sql statement
    :param conn: Connection object
    :param create_table_sql: a CREATE TABLE statement
    :return:
    """
    try:
        c = conn.cursor()
        c.execute(create_table_sql)
    except Error as e:
        print(e)


def main():
    database = "/Users/Erudition/Desktop/imdb_database/sqldatabase.db"

    sql_create_tile_akas = """ CREATE TABLE IF NOT EXISTS title (
                                    titleid text PRIMARY KEY,
                                    ordering integer NOT NULL,
                                    title text,
                                    region text,
                                    language text NOT NULL,
                                    types text NOT NULL,
                                    attributes text NOT NULL,
                                    isOriginalTitle integer NOT NULL
                                ); """

    conn = create_connection(database)
    if conn is not None:
        # create projects table
        create_table(conn, sql_create_tile_akas)
    else:
        print("Error! cannot create the database connection.")


if __name__ == '__main__':
    main()
In the terminal, I enter
imdbpy2sql.py -d /Users/Erudition/Desktop/imdb_database/aka-titles.list/ -u sqlite:///sqldatabase.db
The output I expect is an SQLite table with all the rows filled. Instead, I get several SQLite tables with nothing filled in.
The terminal output is:
WARNING The file will be skipped, and the contained
WARNING information will NOT be stored in the database.
WARNING Complete error: [Errno 20] Not a directory:
'/Users/Erudition/Desktop/imdb_database/aka-titles.list/complete-
cast.list.gz'
WARNING WARNING WARNING
WARNING unable to read the "/Users/Erudition/Desktop/imdb_database/aka-
titles.list/complete-crew.list.gz" file.
WARNING The file will be skipped, and the contained
WARNING information will NOT be stored in the database.
WARNING Complete error: [Errno 20] Not a directory:
'/Users/Erudition/Desktop/imdb_database/aka-titles.list/complete-
crew.list.gz'
I found the solution!
pip install imdb-sqlite
Then
imdb-sqlite
Here's the link
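To check that the import actually populated something, you can list the tables that were created; this is just a sketch and assumes the output file is called imdb.db (check imdb-sqlite --help for the actual default name and path):
import sqlite3

conn = sqlite3.connect('imdb.db')  # assumed file name, adjust to your setup
for (name,) in conn.execute("SELECT name FROM sqlite_master WHERE type='table'"):
    print(name)
conn.close()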
I wrote the following code to create a database file, then create a table in that database and insert values using SQL queries in Python.
Let's say this is a file named info.py:
import sqlite3

conn = sqlite3.connect('sqlite_file.db', timeout=20)
c = conn.cursor()

# Creating a new SQLite table with 2 columns
c.execute('CREATE TABLE STUDENTS_ (Name CHAR, RollNo INTEGER)')

a = ['Richa', 'Swapnil', 'Jahanavi', 'Shivam', 'Mehul']
b = [122, 143, 102, 186, 110]
p = 0
for r in b:
    c.execute("INSERT INTO STUDENTS_ VALUES (?,?);", (a[p], b[p]))
    p = p + 1
It runs well and gives the expected result.
Now I want to update the same table, STUDENTS_, from another piece of code in a different Python file. I tried the code below.
This is another file named info_add.py
import sqlite3
sqlite_file = 'my_first_db.sqlite' # name of the sqlite database file
STUD = 'STUDENTS_' # name of the table to be created
conn = sqlite3.connect('sqlite_file.db', timeout=20)
c = conn.cursor()
a=['Riya', 'Vipul']
b=[160, 173]
p=0
for r in b:
    c.execute("INSERT INTO STUDENTS_ VALUES (?,?);", (a[p], b[p]))
    p = p + 1
I get the following error:
OperationalError: database is locked
What is this error? I know I am doing something wrong; please, can anyone help me with the right method? Thank you.
The "database is locked" message indicates that some other connection still has an active transaction.
Python tries to be clever and automatically starts transactions for you, so you have to ensure that you end your transactions (conn.commit()) when needed.
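In your case that means ending the transaction in info.py once the inserts are done, so that info_add.py can acquire the lock, for example:
# at the end of info.py: release the lock held by the open transaction
conn.commit()
conn.close()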
Given:
CREATE PROCEDURE my_procedure
    @Param INT
AS
SELECT Col1, Col2
FROM Table
WHERE Col2 = @Param
I would like to be able to use this as:
import pandas as pd
import pyodbc
query = 'EXEC my_procedure @Param = {0}'.format(my_param)
conn = pyodbc.connect(my_connection_string)
df = pd.read_sql(query, conn)
But this throws an error:
ValueError: Reading a table with read_sql is not supported for a DBAPI2 connection. Use an SQLAlchemy engine or specify an sql query
SQLAlchemy does not work either:
import sqlalchemy
engine = sqlalchemy.create_engine(my_connection_string)
df = pd.read_sql(query, engine)
Throws:
ValueError: Could not init table 'my_procedure'
I can in fact execute the statement using pyodbc directly:
cursor = conn.cursor()
cursor.execute(query)
results = cursor.fetchall()
df = pd.DataFrame.from_records(results)
Is there a way to send these procedure results directly to a DataFrame?
Use read_sql_query() instead.
Looks like @joris (+1) already had this in a comment directly under the question but I didn't see it because it wasn't in the answers section.
Use the SQLAlchemy engine: apart from SQLAlchemy, pandas only supports SQLite for raw DBAPI connections. Then use read_sql_query() instead of read_sql(). The latter tries to auto-detect whether you're passing a table name or a fully fledged query, but it doesn't handle the 'EXEC' keyword well. Using read_sql_query() skips the auto-detection and allows you to explicitly indicate that you're using a query (there's also a read_sql_table()).
import pandas as pd
import sqlalchemy
query = 'EXEC my_procedure @Param = {0}'.format(my_param)
engine = sqlalchemy.create_engine(my_connection_string)
df = pd.read_sql_query(query, engine)
https://code.google.com/p/pyodbc/wiki/StoredProcedures
I am not a Python expert, but SQL Server sometimes returns counts for statement executions. For instance, an update will tell how many rows were updated.
Just use 'SET NOCOUNT ON;' at the front of your batch call. This will remove the counts for inserts, updates, and deletes.
Make sure you are using the correct native client module.
Take a look at this stack overflow example.
It has both a adhoc SQL and call stored procedure example.
Calling a stored procedure python
Good luck
This worked for me after adding SET NOCOUNT ON, thanks @CRAFTY DBA:
sql_query = """SET NOCOUNT ON; EXEC db_name.dbo.StoreProc '{0}';""".format(input)
df = pandas.read_sql_query(sql_query , conn)
Using ODBC syntax for calling stored procedures (with parameters instead of string formatting) works for loading dataframes using pandas 0.14.1 and pyodbc 3.0.7. The following examples use the AdventureWorks2008R2 sample database.
First confirm expected results calling the stored procedure using pyodbc:
import pandas as pd
import pyodbc
connection = pyodbc.connect(driver='{SQL Server Native Client 11.0}', server='ServerInstance', database='AdventureWorks2008R2', trusted_connection='yes')
sql = "{call dbo.uspGetEmployeeManagers(?)}"
params = (3,)
cursor = connection.cursor()
rows = cursor.execute(sql, params).fetchall()
print(rows)
Should return:
[(0, 3, 'Roberto', 'Tamburello', '/1/1/', 'Terri', 'Duffy'), (1, 2, 'Terri', 'Duffy',
'/1/', 'Ken', 'Sánchez')]
Now use pandas to load the results into a dataframe:
df = pd.read_sql(sql=sql, con=connection, params=params)
print(df)
Should return:
RecursionLevel BusinessEntityID FirstName LastName OrganizationNode \
0 0 3 Roberto Tamburello /1/1/
1 1 2 Terri Duffy /1/
ManagerFirstName ManagerLastName
0 Terri Duffy
1 Ken Sánchez
EDIT
Since you can't update to pandas 0.14.1, load the results from pyodbc using pandas.DataFrame.from_records:
# get column names from pyodbc results
columns = [column[0] for column in cursor.description]
df = pd.DataFrame.from_records(rows, columns=columns)