Copy data from PostgreSQL to SQL Server - sql-server

I have source table in PostgreSQL and and target table in SQL Server. I am using Groovy script for copying data.
I'm trying to implement batch loading but it's not working, here is my sample code.
Please let me know if any one has any idea.
import groovy.sql.Sql
def sql_orig = Sql.newInstance( )
def sql_dest = Sql.newInstance( )
batchSize=20
sql_dest.withBatch( batchSize, "insert into TABLE_DESTINO(a,b,c) values(?,?,?)"){ ps->
sql_orig.eachRow "select a,b,c from TABLE_ORIGEN",{ row ->
ps.addBatch(row)
}
}

Related

Pandas dataframe insert into SQL Server taking too long with execute and executemany

I have a pandas dataframe with 27 columns and ~45k rows that I need to insert into a SQL Server table.
I am currently using with the below code and it takes 90 mins to insert:
conn = pyodbc.connect('Driver={ODBC Driver 17 for SQL Server};\
Server=#servername;\
Database=dbtest;\
Trusted_Connection=yes;')
cursor = conn.cursor() #Create cursor
for index, row in t6.iterrows():
cursor.execute("insert into dbtest.dbo.test( col1, col2, col3, col4,col5,col6,col7,col8,col9,col10,col11,col12,col13,col14,,col27)\
values (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)",
row['col1'],row['col2'], row['col3'],,row['col27'])
I have also tried to load using executemany and that takes even longer to complete, at nearly 120mins.
I am really looking for a faster load time since I need to run this daily.
You can set fast_executemany in pyodbc itself for versions>=4.0.19. It is off by default.
import pyodbc
server_name = 'localhost'
database_name = 'AdventureWorks2019'
table_name = 'MyTable'
driver = 'ODBC Driver 17 for SQL Server'
connection = pyodbc.connect(driver='{'+driver+'}', server=server_name, database=database_name, trusted_connection='yes')
cursor = connection.cursor()
cursor.fast_executemany = True # reduce number of calls to server on inserts
# form SQL statement
columns = ", ".join(df.columns)
values = '('+', '.join(['?']*len(df.columns))+')'
statement = "INSERT INTO "+table_name+" ("+columns+") VALUES "+values
# extract values from DataFrame into list of tuples
insert = [tuple(x) for x in df.values]
cursor.executemany(statement, insert)
Or if you prefer sqlalchemy and dataframes directly.
import sqlalchemy as db
engine = db.create_engine('mssql+pyodbc://#'+server_name+'/'+database_name+'?trusted_connection=yes&driver='+driver, fast_executemany=True)
df.to_sql(table_name, engine, if_exists='append', index=False)
See fast_executemany in this link.
https://github.com/mkleehammer/pyodbc/wiki/Features-beyond-the-DB-API
I have worked through this in the past, and this was the fastest that I could get it to work using sqlalchemy.
import sqlalchemy as sa
engine = (sa.create_engine(f'mssql://#{server}/{database}
?trusted_connection=yes&driver={driver_name}', fast_executemany=True)) #windows authentication
df.to_sql('Daily_Report', con=engine, if_exists='append', index=False)
If the engine is not working for you, then you may have a different setup so please see: https://docs.sqlalchemy.org/en/13/core/engines.html
You should be able to create the variables needed above, but here is how I get the driver:
driver_name = ''
driver_names = [x for x in pyodbc.drivers() if x.endswith(' for SQL Server')]
if driver_names:
driver_name = driver_names[-1] #You may need to change the [-1] if wrong driver to [-2] or a different option in the driver_names list.
if driver_name:
conn_str = f'''DRIVER={driver_name};SERVER='''
else:
print('(No suitable driver found. Cannot connect.)')
You can try to use the method 'multi' built in pandas to_sql.
df.to_sql('table_name', con=engine, if_exists='replace', index=False, method='multi')
The multi method allows you to 'Pass multiple values in a single INSERT clause.' per documentation.
I found it to be pretty efficient.

Pandas Dataframe to SQL Server

I have an API service and in this service I'm writing pandas dataframe results to SQL Server.
But when I want to add new values to the table, I cannot add. I've used append option because in the documentation it says that it adds new values to the dataframe. I didn't use replace option because I don't want to drop my table every time.
My need is to send new values to the database table while I'm keeping the old ones.
I've researched any other methods or ways except pandas to_sql method but I could only see the pandas at everywhere.
Does anybody have an idea about this?
Thanks.
You should make sure that your pandas dataframe has the right structure where keys are your mysql column names and data is in lists:
df = pd.DataFrame({"UserId":["rrrrr"],
"UserFavourite":["Greek Salad"],
"MonthlyOrderFrequency":[5],
"HighestOrderAmount":[30],
"LastOrderAmount":[21],
"LastOrderRating":[3],
"AverageOrderRating":[3],
"OrderMode":["Web"],
"InMedicalCare":["No"]})
Establish a proper connection to your db. In my case I am connecting to my local db at 127.0.0.1 and 'use demo;':
sqlEngine = create_engine('mysql+pymysql://root:#127.0.0.1/demo', pool_recycle=3600)
dbConnection = sqlEngine.connect()
Lastly, input your table name, mine is "UserVitals", and try executing in a try-except block to handle errors:
try:
df.to_sql("UserVitals", con=sqlEngine, if_exists='append');
except ValueError as vx:
print(vx)
except Exception as ex:
print(ex)
else:
print("Table %s created successfully."%tableName);
finally:
dbConnection.close()
Here's an example of how to do that...with a little extra code included.
# Insert from dataframe to table in SQL Server
import time
import pandas as pd
import pyodbc
# create timer
start_time = time.time()
from sqlalchemy import create_engine
df = pd.read_csv("C:\\your_path\\CSV1.csv")
conn_str = (
r'DRIVER={SQL Server Native Client 11.0};'
r'SERVER=your_server_name;'
r'DATABASE=NORTHWND;'
r'Trusted_Connection=yes;'
)
cnxn = pyodbc.connect(conn_str)
cursor = cnxn.cursor()
for index,row in df.iterrows():
cursor.execute('INSERT INTO dbo.Table_1([Name],[Address],[Age],[Work]) values (?,?,?,?)',
row['Name'],
row['Address'],
row['Age'],
row['Work'])
cnxn.commit()
cursor.close()
cnxn.close()
# see total time to do insert
print("%s seconds ---" % (time.time() - start_time))

pandas to_sql for MS SQL

I'm trying to save a dataframe to MS SQL that uses Windows authentication. I've tried using engine, engine.connect(), engine.raw_connection() and they all throw up errors:
'Engine' object has no attribute 'cursor', 'Connection' object has no attribute 'cursor', and Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': ... respectively.
params = urllib.parse.quote('DRIVER={ODBC Driver 13 for SQL Server};'
'SERVER=server;'
'DATABASE=db;'
'TRUSTED_CONNECTION=Yes;')
engine = create_engine('mssql+pyodbc:///?odbc_connect=%s' % params)
df.to_sql(table_name,engine, index=False)
This will do exactly what you want.
# Insert from dataframe to table in SQL Server
import time
import pandas as pd
import pyodbc
# create timer
start_time = time.time()
from sqlalchemy import create_engine
df = pd.read_csv("C:\\your_path\\CSV1.csv")
conn_str = (
r'DRIVER={SQL Server Native Client 11.0};'
r'SERVER=name_of_your_server;'
r'DATABASE=name_of_your_database;'
r'Trusted_Connection=yes;'
)
cnxn = pyodbc.connect(conn_str)
cursor = cnxn.cursor()
for index,row in df.iterrows():
cursor.execute('INSERT INTO dbo.Table_1([Name],[Address],[Age],[Work]) values (?,?,?,?)',
row['Name'],
row['Address'],
row['Age'],
row['Work'])
cnxn.commit()
cursor.close()
cnxn.close()
# see total time to do insert
print("%s seconds ---" % (time.time() - start_time))
Here is an update to my original answer. Basically, this is the old-school way of doing things (INSERT INTO). I recently stumbled upon a super-easy, scalable, and controllable, way of pushing data from Python to SQL Server. Try the sample code and post back if you have additional questions.
import pyodbc
import pandas as pd
engine = "mssql+pyodbc://your_server_name/your_database_name?driver=SQL Server Native Client 11.0?trusted_connection=yes"
... dataframe here...
dataframe.to_sql(x, engine, if_exists='append', index=True)
dataframe is pretty self explanatory.
x = the name yo uwant your table to be in SQL Server.

SoapUI NG Pro - Executing an UPDATE script in SoapUI using Groovy

I need to be able to execute an update SQL script, but it isn't working
Here is a link to the site that I used for reference:
https://groovyinsoapui.wordpress.com/tag/sql-eachrow-groovy-soapui/
Here is the format of the code that I ended up writing (due to the nature of the work I am doing, I am unable to provide the exact script that I wrote)
import groovy.sql.Sql
def groovyUtils = new com.eviware.soapui.support.GroovyUtils(context)
groovyUtils.registerJdbcDriver("com.microsoft.sqlserver.jdbc.SQLServerDriver")
def connectString = "jdbc:microsoft:sqlserver://:;databaseName=?user=&password="
sql = Sql.newInstance(connectString) // TEST YOUR CONNECT STRING IN A SQL BROWSER
sql.executeUpdate("UPDATE TABLE SET COLUMN_1 = 'VALUE_1' WHERE COLUMN_2 = 'VALUE_2'")
The response that I am getting is:
Script-result: 0
I also tried to use:
sql.execute("UPDATE TABLE SET COLUMN_1 = 'VALUE_1' WHERE COLUMN_2 = 'VALUE_2'")
Which returns the following response:
Script-result: false
From what you say it seems that no row has COLUMN_2 = 'VALUE_2', so then number of updated rows is 0.
I would first check that statement on Management Studio just to make sure.

C# 20,000 records selected from oracle need to be inserted into SQL2005

i have a console app in c# that extracts 20 fields from an oracle DB witht he code below and i wanted an efficient way to insert them into SQL 2005.
i dotn want to insert each one of the 20,000 within the while loop, obviously. i was thinking to change the code to use a data set to cache all the records and then do a bulk insert...
thoughts?
pseudo code would be nice since i am new to oracle.
this is my code where i was testing getting a connection to oracle and seeing if i can view the data... now i can view it i want to get it out and into sql2005... what do i do from here?
static void getData()
{
string connectionString = GetConnectionString();
using (OracleConnection connection = new OracleConnection())
{
connection.ConnectionString = connectionString;
connection.Open();
OracleCommand command = connection.CreateCommand();
string sql = "SELECT * FROM BUG";
command.CommandText = sql;
OracleDataReader reader = command.ExecuteReader();
while (reader.Read())
{
//string myField = (string)reader["Project"];
string myField = reader[0].ToString();
Console.WriteLine(myField);
}
}
}
You can create a CSV file and then use BULK INSERT to insert the file into SQL Server. Have a look here for an example.
The "bulk" insert with the cached Dataset will work exactly like the while loop you are not wanting to write! The problem is that you'll lose control of the process if you try to use the "bulk" insert of the Dataset class. It is extraneous work in the end.
Maybe the best solution is to use a DataWriter so that you have complete control and no Dataset overhead.
You can actually do 100-1000 inserts per sql batch. Just generate multiple inserts, then submit. Pregenerate the next SELECT batch WHILE THE FIRST EXECUTES.

Resources