Groovy, safe way to create MSSQL database - sql-server

I'm trying to use groovy.sql.Sql to create databases in an MSSQL (Microsoft SQL Server) server. It seems like the prepared statement adds additional quotes around the last parameter, breaking the query.
This test code:
import groovy.sql.Sql
import com.microsoft.sqlserver.jdbc.SQLServerDataSource
def host = 'myhost'
def port = '1433'
def databaseName = 'mydatabasename'
def username = 'myusername'
def password = 'mypassword'
def dataSource = new SQLServerDataSource()
dataSource.setURL("jdbc:sqlserver://$host:$port")
dataSource.setUser(username)
dataSource.setPassword(password)
def connection = new Sql(dataSource)
connection.execute(
    'IF EXISTS (SELECT * FROM master.dbo.sysdatabases WHERE name = ?) DROP DATABASE ?',
    [ databaseName, databaseName ]
)
Gives the error:
Failed to execute: IF EXISTS (SELECT * FROM master.dbo.sysdatabases WHERE name = ?) DROP DATABASE ? because: Incorrect syntax near '@P1'.
How can I use prepared statements without single quotes being added around parameter @P1 (DROP DATABASE ? seems to be rewritten as DROP DATABASE '?'), or can I write the query differently so that the added single quotes do not produce a syntax error?
I would also be fine with other frameworks, if anyone could give me a working example.

Can you try:
connection.execute(
    "IF EXISTS (SELECT * FROM master.dbo.sysdatabases WHERE name = $databaseName) DROP DATABASE ${Sql.expand(databaseName)}"
)
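The underlying reason DROP DATABASE ? fails is that a parameter marker can only stand in for a value, not for an identifier such as a database name, so the name has to end up in the statement text itself. A minimal sketch, in Python for illustration, of quoting the name the way T-SQL's QUOTENAME does (wrap it in square brackets and double any closing bracket). The helper names quote_identifier and drop_database_sql are hypothetical and not part of any driver or of groovy.sql.Sql.

```python
def quote_identifier(name: str) -> str:
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    if not name or len(name) > 128:
        # SQL Server identifiers (sysname) hold at most 128 characters.
        raise ValueError("identifier must be 1 to 128 characters long")
    return "[" + name.replace("]", "]]") + "]"


def drop_database_sql(name: str) -> str:
    """Build the statement: the existence check keeps a bound parameter,
    while the DROP target is embedded as a quoted identifier."""
    return (
        "IF EXISTS (SELECT * FROM master.dbo.sysdatabases WHERE name = ?) "
        "DROP DATABASE " + quote_identifier(name)
    )


print(drop_database_sql("mydatabasename"))
```

The resulting string still contains one ? for the name comparison, so the database name would be passed once as a bound parameter when executing it.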

Related

Pandas dataframe insert into SQL Server taking too long with execute and executemany

I have a pandas dataframe with 27 columns and ~45k rows that I need to insert into a SQL Server table.
I am currently using the code below, and it takes 90 minutes to insert:
conn = pyodbc.connect('Driver={ODBC Driver 17 for SQL Server};\
Server=#servername;\
Database=dbtest;\
Trusted_Connection=yes;')
cursor = conn.cursor() #Create cursor
for index, row in t6.iterrows():
    cursor.execute("insert into dbtest.dbo.test( col1, col2, col3, col4,col5,col6,col7,col8,col9,col10,col11,col12,col13,col14,,col27)\
    values (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)",
    row['col1'],row['col2'], row['col3'],,row['col27'])
I have also tried to load using executemany and that takes even longer to complete, at nearly 120mins.
I am really looking for a faster load time since I need to run this daily.
You can set fast_executemany in pyodbc itself for versions>=4.0.19. It is off by default.
import pyodbc
server_name = 'localhost'
database_name = 'AdventureWorks2019'
table_name = 'MyTable'
driver = 'ODBC Driver 17 for SQL Server'
connection = pyodbc.connect(driver='{'+driver+'}', server=server_name, database=database_name, trusted_connection='yes')
cursor = connection.cursor()
cursor.fast_executemany = True # reduce number of calls to server on inserts
# form SQL statement
columns = ", ".join(df.columns)
values = '('+', '.join(['?']*len(df.columns))+')'
statement = "INSERT INTO "+table_name+" ("+columns+") VALUES "+values
# extract values from DataFrame into list of tuples
insert = [tuple(x) for x in df.values]
cursor.executemany(statement, insert)
Or if you prefer sqlalchemy and dataframes directly.
import sqlalchemy as db
engine = db.create_engine('mssql+pyodbc://@'+server_name+'/'+database_name+'?trusted_connection=yes&driver='+driver, fast_executemany=True)
df.to_sql(table_name, engine, if_exists='append', index=False)
See fast_executemany in this link.
https://github.com/mkleehammer/pyodbc/wiki/Features-beyond-the-DB-API
I have worked through this in the past, and this was the fastest that I could get it to work using sqlalchemy.
import sqlalchemy as sa
engine = sa.create_engine(f'mssql://@{server}/{database}?trusted_connection=yes&driver={driver_name}', fast_executemany=True)  # Windows authentication
df.to_sql('Daily_Report', con=engine, if_exists='append', index=False)
If the engine is not working for you, then you may have a different setup so please see: https://docs.sqlalchemy.org/en/13/core/engines.html
You should be able to create the variables needed above, but here is how I get the driver:
driver_name = ''
driver_names = [x for x in pyodbc.drivers() if x.endswith(' for SQL Server')]
if driver_names:
    driver_name = driver_names[-1]  # You may need to change [-1] to [-2] or another entry in driver_names if this picks the wrong driver.
if driver_name:
    conn_str = f'''DRIVER={driver_name};SERVER='''
else:
    print('(No suitable driver found. Cannot connect.)')
You can try to use the method 'multi' built in pandas to_sql.
df.to_sql('table_name', con=engine, if_exists='replace', index=False, method='multi')
The multi method allows you to 'Pass multiple values in a single INSERT clause.' per documentation.
I found it to be pretty efficient.
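One caveat with method='multi' on SQL Server: a single statement can carry at most 2100 parameters, so rows times columns per INSERT has to stay below that, and a VALUES list is limited to 1000 rows. A minimal sketch of picking a chunksize under those limits; safe_chunksize and its margin argument are hypothetical names introduced here, and the margin of 3 is just a cautious assumption.

```python
SQL_SERVER_MAX_PARAMS = 2100       # per-statement parameter limit
SQL_SERVER_MAX_VALUES_ROWS = 1000  # row limit for one INSERT ... VALUES list


def safe_chunksize(n_columns: int, margin: int = 3) -> int:
    """Largest number of rows per multi-row INSERT that stays under both limits."""
    if n_columns < 1:
        raise ValueError("n_columns must be at least 1")
    by_params = (SQL_SERVER_MAX_PARAMS - margin) // n_columns
    return max(1, min(by_params, SQL_SERVER_MAX_VALUES_ROWS))


# The 27-column frame from the question allows 77 rows per statement.
print(safe_chunksize(27))
```

It could then be passed along as df.to_sql('table_name', con=engine, if_exists='append', index=False, method='multi', chunksize=safe_chunksize(len(df.columns))).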

Execute sp_rename SQL command using pypyodbc

Initially I had used the following command to rename SQL tables:
Q = """sp_rename {}, {}""".format(OLD_TABLE_NAME, NEW_TABLE_NAME)
However, this caused a "Lock request time out period exceeded" error, which I believe was due to the lack of a "commit" at the end of the query (although I am not confident about this).
So instead, I adopted a new query (adapted from this question).
Q2 = """BEGIN TRANSACTION
GO
EXECUTE sp_rename N'{}', N'{}', 'OBJECT'
GO
ALTER TABLE {} SET (LOCK_ESCALATION = TABLE)
GO
COMMIT""".format(OLD_TABLE_NAME, NEW_TABLE_NAME, NEW_TABLE_NAME)
However, I'm now getting a ProgrammingError saying "Incorrect syntax near 'GO'."
Do I need to remove some parts of Q2 for the query to work? Or is some other part wrong?
Below are the two functions I use to connect to my SQL server:
from sqlalchemy import create_engine
import pypyodbc as pp
server1 = {
    'drivername': 'mssql+pyodbc',
    'servername': 'SERVERNAME',
    #'port': '5432',
    'username': 'WebAccess',
    'password': ':|Ax-*6_6!5H',
    'driver': 'SQL Server Native Client 11.0',
    'trusted_connection': 'yes',
    'legacy_schema_aliasing': False
}
def getEngine(servername, database):
    DB = server1
    #Create connection to SQL database
    DB['database'] = database
    servername1 = servername.lower()
    engine = create_engine('mssql+pyodbc://' + DB['username'] + ':' + DB['password'] + '@' + DB['servername'] + '/' + DB['database'] + '?' + 'driver=' + DB['driver'])#, echo=True)
    return engine
def SQLcommand(query,servername,database):
    connection = pp.connect("""Driver={SQL Server};Server=""" + servername + """;Database=""" + database + """;uid=USERNAME;pwd=PASSWORD""")
    cursor = connection.cursor()
    cursor.execute(query)
    connection.commit()
    connection.close()
This works for me based on the note in the SQLAlchemy Docs:
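from sqlalchemy import text
sql_stmt = f"""EXECUTE sp_rename '{table2}', '{table1}';"""
# connection is assumed to be a SQLAlchemy Engine here, so begin() yields a Connection and commits on exit
with connection.begin() as conn:
    conn.execute(text(sql_stmt))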
Alright dude, you've got a few problems here.
pypyodbc is something you should move away from. That library was all the rage a while ago, but to my knowledge it's not getting many commits anymore, Google is moving away from it, and in my build we moved away from it for a number of reasons. It was great while it lasted, but I think it's on the way out.
You cannot use 'GO' in a query. 'GO' is not a T-SQL statement, it's a sqlcmd command, so it doesn't work over ODBC. 'GO' essentially just breaks the code into batches, so to do this without 'GO' you need to run multiple batches, something like:
conn = engine.connect()
tran = conn.begin()  # start an explicit transaction
conn.execute(f"EXECUTE sp_rename N'{OLD_TABLE_NAME}', N'{NEW_TABLE_NAME}', 'OBJECT'")
conn.execute(f"ALTER TABLE {NEW_TABLE_NAME} SET (LOCK_ESCALATION = TABLE)")
tran.commit()
I'm sure there's more going on here as well, but hopefully this'll get you going.
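Since 'GO' only marks batch boundaries for client tools, an existing script can be cut into batches on the client and each batch executed separately. A minimal sketch of such a splitter; split_batches is a hypothetical helper, it ignores the optional repeat count after GO, and it assumes that no line inside a multi-line string or comment consists of just GO.

```python
import re
from typing import List

# A line holding only GO (optionally followed by a repeat count) ends a batch.
_GO_LINE = re.compile(r"^\s*GO(?:\s+\d+)?\s*$", re.IGNORECASE)


def split_batches(script: str) -> List[str]:
    """Split a T-SQL script into batches at GO separator lines."""
    batches, current = [], []
    for line in script.splitlines():
        if _GO_LINE.match(line):
            batch = "\n".join(current).strip()
            if batch:  # skip empty batches, e.g. two GO lines in a row
                batches.append(batch)
            current = []
        else:
            current.append(line)
    tail = "\n".join(current).strip()
    if tail:
        batches.append(tail)
    return batches


script = "BEGIN TRANSACTION\nGO\nEXECUTE sp_rename N'old', N'new', 'OBJECT'\nGO\nCOMMIT"
for batch in split_batches(script):
    print(batch)  # each batch would go to its own cursor.execute() call
```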
For anyone still wondering how to solve the problem, I adapted Jamie Marshall's answer to the new transaction API:
conn = engine.connect()
def rename(cnxn):
    rename_query = f"EXECUTE sp_rename N'{OLD_TABLE_NAME}', N'{NEW_TABLE_NAME}', 'OBJECT'"
    cnxn.execute(rename_query)
conn.transaction(rename)

SQL-Server driver to access stored procedure or view source code

Does anyone know whether there is a SQL Server driver in some language (Python, Java, C#, JavaScript...) that lets you inspect the source code of a stored procedure or a view in a SQL Server database?
For example, let's say that in the database XXX I have the following view:
CREATE VIEW [dbo].[MY_VIEW] AS
SELECT
    FIELD_1 AS X,
    FIELD_2 AS Y,
    FIELD_3 AS Z
FROM
    [XXX].[dbo].[MY_TABLE_SOURCE]
I need to retrieve the code above using some programming language (python preferred). Is it possible?
Based on @sepupic's comment, this solution works (Python):
import pyodbc
DRIVER = "SQL Server"
DB_NAME = "mydb"
USR = "username"
PWD = "password"
conn = pyodbc.connect('DRIVER={0};SERVER=10.12.0.11;DATABASE={1};UID={2};PWD={3}'.format(DRIVER, DB_NAME, USR, PWD))
cursor = conn.cursor()
cursor.execute("EXECUTE SP_HELPTEXT N'MY_VIEW_OR_PROCEDURE'")
result = cursor.fetchall()
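sp_helptext returns the definition as a result set with one row per source line, so the rows still need to be stitched back together. A minimal sketch, assuming each row holds a single text column and keeps its own line ending (which is how the procedure normally behaves); join_helptext is a hypothetical helper and fake_rows is stand-in data, not real output.

```python
def join_helptext(rows) -> str:
    """Concatenate sp_helptext rows (one source line per row) into one string."""
    # Each row is a one-element sequence; lines normally carry their own line ending.
    return "".join(row[0] for row in rows)


# Stand-in for cursor.fetchall(); real rows would come from the query above.
fake_rows = [
    ("CREATE VIEW [dbo].[MY_VIEW] AS\r\n",),
    ("SELECT FIELD_1 AS X\r\n",),
    ("FROM [XXX].[dbo].[MY_TABLE_SOURCE]\r\n",),
]
print(join_helptext(fake_rows))
```

With the code above, source = join_helptext(result) would hold the full definition.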

Execute DB_ID sql function from scala

I would like to call the DB_ID function of SQL Server to retrieve the database ID of a user database from Scala.
Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver")
val connection: Connection = DriverManager.getConnection(jdbcConnectionString)
//SELECT database_id FROM sys.databases WHERE Name
val statement: CallableStatement = connection.prepareCall("{? =call DB_ID(?)}")
statement.registerOutParameter(1,java.sql.Types.INTEGER)
statement.setString(2,s"'ABC_STORE'")
statement.execute()
val a = statement.getInt(1)
I get an error saying it could not find the stored procedure DB_ID.
How do I get this to work?
When you use statement.setString, you don't need to wrap the string in single quotes, so change:
statement.setString(2,s"'ABC_STORE'")
to
statement.setString(2, "ABC_STORE")
The s interpolator prefix in s"'ABC_STORE'" does nothing here, so remove it as well.
You can learn more here :
Using a Stored Procedure with a Return Status and Prepared Statement doc

Read stored procedure select results into pandas dataframe

Given:
CREATE PROCEDURE my_procedure
    @Param INT
AS
    SELECT Col1, Col2
    FROM Table
    WHERE Col2 = @Param
I would like to be able to use this as:
import pandas as pd
import pyodbc
query = 'EXEC my_procedure @Param = {0}'.format(my_param)
conn = pyodbc.connect(my_connection_string)
df = pd.read_sql(query, conn)
But this throws an error:
ValueError: Reading a table with read_sql is not supported for a DBAPI2 connection. Use an SQLAlchemy engine or specify an sql query
SQLAlchemy does not work either:
import sqlalchemy
engine = sqlalchemy.create_engine(my_connection_string)
df = pd.read_sql(query, engine)
Throws:
ValueError: Could not init table 'my_procedure'
I can in fact execute the statement using pyodbc directly:
cursor = conn.cursor()
cursor.execute(query)
results = cursor.fetchall()
df = pd.DataFrame.from_records(results)
Is there a way to send these procedure results directly to a DataFrame?
Use read_sql_query() instead.
Looks like @joris (+1) already had this in a comment directly under the question, but I didn't see it because it wasn't in the answers section.
Use the SQLA engine--apart from SQLAlchemy, Pandas only supports SQLite. Then use read_sql_query() instead of read_sql(). The latter tries to auto-detect whether you're passing a table name or a fully-fledged query but it doesn't appear to do so well with the 'EXEC' keyword. Using read_sql_query() skips the auto-detection and allows you to explicitly indicate that you're using a query (there's also a read_sql_table()).
import pandas as pd
import sqlalchemy
query = 'EXEC my_procedure @Param = {0}'.format(my_param)
engine = sqlalchemy.create_engine(my_connection_string)
df = pd.read_sql_query(query, engine)
https://code.google.com/p/pyodbc/wiki/StoredProcedures
I am not a Python expert, but SQL Server sometimes returns counts for statement executions. For instance, an update will report how many rows were updated.
Just put 'SET NOCOUNT ON;' at the front of your batch call. This suppresses the counts for inserts, updates, and deletes.
Make sure you are using the correct native client module.
Take a look at this Stack Overflow example.
It has both an ad hoc SQL example and a stored procedure call example.
Calling a stored procedure python
Good luck
This worked for me after adding SET NOCOUNT ON, thanks @CRAFTY DBA:
sql_query = """SET NOCOUNT ON; EXEC db_name.dbo.StoreProc '{0}';""".format(input)
df = pandas.read_sql_query(sql_query , conn)
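The same idea can be combined with bound parameters instead of string formatting, so the input value is never pasted into the SQL text. A minimal sketch that only builds the statement; nocount_exec is a hypothetical helper, and the procedure name is still inserted as-is, so it should come from trusted code rather than user input.

```python
def nocount_exec(procedure: str, n_params: int) -> str:
    """Build 'SET NOCOUNT ON; EXEC <procedure> ?, ?, ...;' with n_params markers."""
    if n_params < 0:
        raise ValueError("n_params must not be negative")
    placeholders = ", ".join("?" * n_params)  # "?, ?, ?" for n_params == 3
    sql = "SET NOCOUNT ON; EXEC " + procedure
    if placeholders:
        sql += " " + placeholders
    return sql + ";"


print(nocount_exec("db_name.dbo.StoreProc", 1))
```

The result could then be used as pandas.read_sql_query(nocount_exec('db_name.dbo.StoreProc', 1), conn, params=[value]), where value is whatever was previously formatted into the string.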
Using ODBC syntax for calling stored procedures (with parameters instead of string formatting) works for loading dataframes using pandas 0.14.1 and pyodbc 3.0.7. The following examples use the AdventureWorks2008R2 sample database.
First confirm expected results calling the stored procedure using pyodbc:
import pandas as pd
import pyodbc
connection = pyodbc.connect(driver='{SQL Server Native Client 11.0}', server='ServerInstance', database='AdventureWorks2008R2', trusted_connection='yes')
sql = "{call dbo.uspGetEmployeeManagers(?)}"
params = (3,)
cursor = connection.cursor()
rows = cursor.execute(sql, params).fetchall()
print(rows)
Should return:
[(0, 3, 'Roberto', 'Tamburello', '/1/1/', 'Terri', 'Duffy'), (1, 2, 'Terri', 'Duffy',
'/1/', 'Ken', 'Sánchez')]
Now use pandas to load the results into a dataframe:
df = pd.read_sql(sql=sql, con=connection, params=params)
print(df)
Should return:
   RecursionLevel  BusinessEntityID FirstName    LastName OrganizationNode  \
0               0                 3   Roberto  Tamburello            /1/1/
1               1                 2     Terri       Duffy              /1/
  ManagerFirstName ManagerLastName
0            Terri           Duffy
1              Ken         Sánchez
EDIT
Since you can't update to pandas 0.14.1, load the results from pyodbc using pandas.DataFrame.from_records:
# get column names from pyodbc results
columns = [column[0] for column in cursor.description]
df = pd.DataFrame.from_records(rows, columns=columns)

Resources