How to adjust variables in MS SQL Server through Python - sql-server

I have several SQL queries written for MS SQL Server, and I use the following code to load their results into Python with the pyodbc package.
import pyodbc
import pandas as pd
def conn_sql_server(file_path):
    '''Function to connect to SQL Server and save query result to a dataframe
    input:
        file_path - query file path
    output:
        df - dataframe from the query result
    '''
    # Connect to SQL Server
    conn = pyodbc.connect('Driver= {SQL Server Native Client 11.0};'
                          'Server= servername;'
                          'Database = databasename;'
                          'Trusted_Connection=yes;')
    # run query and output the result to df
    query = open(file_path, 'r')
    df = pd.read_sql_query(query.read(), conn)
    query.close()
    return df
df1 = conn_sql_server('C:/Users/JJ/SQL script1')
df2 = conn_sql_server('C:/Users/JJ/SQL script2')
df3 = conn_sql_server('C:/Users/JJ/SQL script3')
In each SQL query, I use DECLARE and SET to set variables (the variables differ between queries). Here I just copied a random query from online as an example. What I want to do is update the Year variable directly from Python. My actual query is pretty long, so I don't want to copy the SQL scripts into Python; I just want to adjust the variables. Is there any way to do it?
DECLARE @Year INT = 2022;
SELECT YEAR(date) @Year,
SUM(list_price * quantity) gross_sales
FROM sales.orders o
INNER JOIN sales.order_items i ON i.order_id = o.order_id
GROUP BY YEAR(date)
order by @Year
My other question: is there any way to add a WHERE clause, like WHERE itemNumber = 1002345, after importing the above query into Python? I'm asking because df2 is a subset of df1. The restriction column isn't selected in the output, so I cannot filter in Python after reading in df1. I could add that column to the df1 output and do more aggregation in Python, but that would greatly increase the original data size and running time, so I prefer not to.

Here's a sample of what your script could look like. We make two modifications:
conn_sql_server now takes these parameters:
year: the year that should replace the value in declare @year...
where_clause: a where clause of your choice
before_clause_starts_with: the clause before which the where clause should be placed
modify_query is a new function that takes the lines read from the file and rewrites them based on the year you provide. If you provide a where clause, it is placed before the line that starts with before_clause_starts_with.
import pyodbc
import pandas as pd
def modify_query(lines, year, where_clause, before_clause_starts_with):
    new_lines = []
    for line in lines:
        # Replace the DECLARE line with one that uses the requested year
        if year is not None:
            if line.lower().startswith('declare @year int ='):
                new_lines.append(f"DECLARE @Year INT = {year}\n")
                continue
        # Insert the where clause just before the chosen clause
        if where_clause is not None and before_clause_starts_with is not None:
            if line.lower().startswith(before_clause_starts_with.lower()):
                new_lines.append(where_clause + "\n")
                new_lines.append(line)
                continue
        new_lines.append(line)
    new_query = ''.join(new_lines)
    return new_query
def conn_sql_server(file_path, year=None, where_clause=None, before_clause_starts_with=None):
    '''Function to connect to SQL Server and save query result to a dataframe
    input:
        file_path - query file path
    output:
        df - dataframe from the query result
    '''
    # Connect to SQL Server
    conn = pyodbc.connect('Driver= {SQL Server Native Client 11.0};'
                          'Server= servername;'
                          'Database = databasename;'
                          'Trusted_Connection=yes;')
    # run query and output the result to df
    query = open(file_path, 'r')
    lines = query.readlines()
    query.close()
    new_query = modify_query(lines, year, where_clause, before_clause_starts_with)
    df = pd.read_sql_query(new_query, conn)
    return df
df1 = conn_sql_server('C:/Users/JJ/SQL script1',
                      year=1999,
                      where_clause='WHERE itemNumber = 1002345',
                      before_clause_starts_with='group by')
df2 = conn_sql_server('C:/Users/JJ/SQL script2')
df3 = conn_sql_server('C:/Users/JJ/SQL script3',
                      year=1500)
Simulation
Let's run an example.
script1.sql
DECLARE @Year INT = 2022;
SELECT YEAR(date) @Year,
SUM(list_price * quantity) gross_sales
FROM sales.orders o
INNER JOIN sales.order_items i ON i.order_id = o.order_id
GROUP BY YEAR(date)
order by @Year
script2.sql
DECLARE @Year INT = 2022;
SELECT gross_sales
FROM sales.orders
order by @Year
script3.sql
DECLARE @Year INT = 2022;
SELECT GETDATE()
Using a script similar to the one above, we can see what each script looks like after it gets modified.
Simulation script
#import pyodbc
#import pandas as pd
def modify_query(lines, year, where_clause, before_clause_starts_with):
    new_lines = []
    print('-------')
    print('ORIGINAL')
    print('-------')
    print(lines)
    for line in lines:
        if year is not None:
            if line.lower().startswith('declare @year int ='):
                new_lines.append(f"DECLARE @Year INT = {year}\n")
                continue
        if where_clause is not None and before_clause_starts_with is not None:
            if line.lower().startswith(before_clause_starts_with.lower()):
                new_lines.append(where_clause + "\n")
                new_lines.append(line)
                continue
        new_lines.append(line)
    print('-------')
    print('NEW')
    print('-------')
    new_query = ''.join(new_lines)
    print(new_query)
    return new_query
def conn_sql_server(file_path, year=None, where_clause=None, before_clause_starts_with=None):
    '''Function to connect to SQL Server and save query result to a dataframe
    input:
        file_path - query file path
    output:
        df - dataframe from the query result
    '''
    # Connect to SQL Server
    #conn = pyodbc.connect('Driver= {SQL Server Native Client 11.0};'
    #                      'Server= servername;'
    #                      'Database = databasename;'
    #                      'Trusted_Connection=yes;')
    # run query and output the result to df
    query = open(file_path, 'r')
    lines = query.readlines()
    query.close()
    new_query = modify_query(lines, year, where_clause, before_clause_starts_with)
    #df = pd.read_sql_query(new_query, conn)
    #return df
#df1 = conn_sql_server('C:/Users/JJ/SQL script1')
#df2 = conn_sql_server('C:/Users/JJ/SQL script2')
#df3 = conn_sql_server('C:/Users/JJ/SQL script3')
df1 = conn_sql_server('script1.sql', year=1999, where_clause='WHERE itemNumber = 1002345', before_clause_starts_with='group by')
df2 = conn_sql_server('script2.sql')
df3 = conn_sql_server('script3.sql', year=1500)
The original query 1 in script1.sql was read as these lines:
['DECLARE @Year INT = 2022;\n', 'SELECT YEAR(date) @Year, \n', ' SUM(list_price * quantity) gross_sales\n', 'FROM sales.orders o\n', ' INNER JOIN sales.order_items i ON i.order_id = o.order_id\n', 'GROUP BY YEAR(date)\n', 'order by @Year']
After running the script, the query becomes
DECLARE @Year INT = 1999
SELECT YEAR(date) @Year,
SUM(list_price * quantity) gross_sales
FROM sales.orders o
INNER JOIN sales.order_items i ON i.order_id = o.order_id
WHERE itemNumber = 1002345
GROUP BY YEAR(date)
order by @Year
Query 3 used to look like this:
['DECLARE @Year INT = 2022;\n', 'SELECT GETDATE()']
It becomes
DECLARE @Year INT = 1500
SELECT GETDATE()
Give it a shot and adapt the Python script as you see fit.
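The prefix match above only works when the DECLARE line is spaced exactly as expected. As a hedged alternative, here is a minimal sketch that uses a regular expression to swap the initial value of any integer variable declared with `DECLARE @name INT = <number>`. The helper name `set_declared_int` is made up for this illustration and is not part of pyodbc or pandas.

```python
import re

def set_declared_int(query_text, name, value):
    """Replace the number in `DECLARE @<name> INT = <number>` (first match only)."""
    pattern = re.compile(
        r"(declare\s+@" + re.escape(name) + r"\s+int\s*=\s*)-?\d+",
        flags=re.IGNORECASE,
    )
    # int(value) keeps anything that is not a whole number out of the SQL text
    new_text, count = pattern.subn(
        lambda m: m.group(1) + str(int(value)), query_text, count=1
    )
    if count == 0:
        raise ValueError(f"no DECLARE @{name} INT = ... found")
    return new_text

sample = "DECLARE  @Year   INT = 2022;\nSELECT @Year AS y"
print(set_declared_int(sample, "Year", 1999))
```

The returned text can be passed to pd.read_sql_query in place of the file contents.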

Related

How to allow NULL values (Loading pandas dataframe into MS SQL Server)

My current code can only load a complete dataframe into SQL Server tables.
There are some NULL values in the column OpSeq. How can I change my code so that it accepts NULL values?
import pandas as pd
import pyodbc
# Import CSV
data = pd.read_csv(r'C:\Users\moin\Desktop\MM\XXK.csv')
df = pd.DataFrame(data, columns=['Order','Number','Group','ActualStartDate','ActualCompletionDate','OpSeq'])
# Connect to SQL Server
conn = pyodbc.connect('Driver={SQL Server};'
                      r'Server=XXXXXXX\SQLEXPRESS;'
                      'Database=test_XXXXXproduction;'
                      'Trusted_Connection=yes;')
cursor = conn.cursor()
# Create Table (Order and Group are reserved words in T-SQL, so they are bracketed)
cursor.execute('''
    CREATE TABLE people_info ([Order] nvarchar(50), Number nvarchar(50),
    [Group] nvarchar(50), ActualStartDate datetime, ActualCompletionDate datetime, OpSeq int)
''')
# Insert DataFrame to Table
for row in df.itertuples():
    cursor.execute('''
        INSERT INTO test_preproduction.dbo.people_info ([Order], Number, [Group], ActualStartDate, ActualCompletionDate, OpSeq)
        VALUES (?,?,?,?,?,?)
        ''',
        row.Order,
        row.Number,
        row.Group,
        row.ActualStartDate,
        row.ActualCompletionDate,
        row.OpSeq
    )
conn.commit()
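Your code should work, depending on how you pass the missing values. pyodbc sends a Python None as SQL NULL; NaN or the string 'null' may be rejected for an int column, so converting missing values to None before inserting is the safer route.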
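As a hedged illustration of that conversion: pandas reads empty CSV cells as float NaN, so one option is to map NaN to None just before the values are handed to cursor.execute. The helper name `nan_to_none` and the sample row are made up for this sketch.

```python
import math

def nan_to_none(value):
    """Return None for a float NaN, otherwise return the value unchanged."""
    if isinstance(value, float) and math.isnan(value):
        return None
    return value

# Hypothetical row with a missing OpSeq, as pandas would represent it
row = ("A100", "7", "G1", "2021-01-05", "2021-01-09", float("nan"))
params = [nan_to_none(v) for v in row]
print(params)
```

The cleaned list can then be passed as the parameters of the INSERT, for example cursor.execute(sql, params).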

Update row with values from select on condition, else insert new row

I need to run a calculation for the month every day. If the month period already exists, I need to update it; otherwise I need to create a new row for the new month.
Currently, I've written:
declare @period varchar(4) = '0218'
DECLARE @Timestamp date = GetDate()
IF EXISTS(select * from #output where period=@period)
/* UPDATE #output SET --- same calculation as below ---*/
ELSE
SELECT
@period AS period,
SUM(timecard.tworkdol) AS dol_local,
SUM(timecard.tworkdol/currates.cdrate) AS dol_USD,
SUM(timecard.tworkhrs) AS hrs,
@Timestamp AS timestamp
FROM dbo.timecard AS timecard
INNER JOIN dbo.timekeep ON timecard.ttk = timekeep.tkinit
INNER JOIN dbo.matter with (nolock) on timecard.tmatter = matter.mmatter
LEFT JOIN dbo.currates with (nolock) on matter.mcurrency = currates.curcode
AND currates.trtype = 'A'
AND timecard.tworkdt BETWEEN currates.cddate1
AND currates.cddate2
WHERE timekeep.tkloc IN('06','07') AND
timecard.twoper = @period
SELECT * FROM #output;
How can I simply update my row with the new data from my select?
I'm not sure which RDBMS you are using, but in SQL Server something like this would update the #output table with the results of the SELECT that you placed in the ELSE part. The aggregates are computed in a derived table first, because SQL Server does not allow an aggregate directly in the SET list of an UPDATE:
UPDATE o
SET o.dol_local = s.dol_local,
    o.dol_USD = s.dol_USD,
    o.hrs = s.hrs,
    o.timestamp = @Timestamp
FROM #output o
INNER JOIN (
    SELECT
        timecard.twoper AS period,
        SUM(timecard.tworkdol) AS dol_local,
        SUM(timecard.tworkdol/currates.cdrate) AS dol_USD,
        SUM(timecard.tworkhrs) AS hrs
    FROM dbo.timecard AS timecard
    INNER JOIN dbo.timekeep ON timecard.ttk = timekeep.tkinit
    INNER JOIN dbo.matter with (nolock) on timecard.tmatter = matter.mmatter
    LEFT JOIN dbo.currates with (nolock) on matter.mcurrency = currates.curcode
        AND currates.trtype = 'A'
        AND timecard.tworkdt BETWEEN currates.cddate1
        AND currates.cddate2
    WHERE timekeep.tkloc IN('06','07') AND
        timecard.twoper = @period
    GROUP BY timecard.twoper
) s ON o.period = s.period
Also, I think you want to do an INSERT in the ELSE part, but you are doing just a SELECT, so I guess you should fix that too.
The answer to this will vary by SQL dialect, but the two main approaches are:
1. Upsert (if your DBMS supports it), for example using a MERGE statement in SQL Server.
2. Base your SQL on an IF:
IF NOT EXISTS (criteria for dupes)
INSERT INTO (logic to insert)
ELSE
UPDATE (logic to update)
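The IF NOT EXISTS / INSERT / ELSE UPDATE pattern can also be driven from client code. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in so it runs anywhere, and the table name `period_totals`, its columns, and the helper `upsert_period` are all assumptions. Against SQL Server the same flow would go through a driver such as pyodbc.

```python
import sqlite3

def upsert_period(conn, period, hrs):
    """Update the row for `period` if it exists, otherwise insert a new one."""
    exists = conn.execute(
        "SELECT 1 FROM period_totals WHERE period = ?", (period,)
    ).fetchone()
    if exists:
        conn.execute(
            "UPDATE period_totals SET hrs = ? WHERE period = ?", (hrs, period)
        )
    else:
        conn.execute(
            "INSERT INTO period_totals (period, hrs) VALUES (?, ?)", (period, hrs)
        )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE period_totals (period TEXT PRIMARY KEY, hrs REAL)")
upsert_period(conn, "0218", 10.0)   # first run of the month: inserts
upsert_period(conn, "0218", 12.5)   # later run: updates the same row
rows = conn.execute("SELECT period, hrs FROM period_totals").fetchall()
print(rows)
```

The check and the write are separate statements here, so concurrent writers would need a transaction or a MERGE-style statement on the server.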

SQL OpenQuery linked server Oracle DateAdd

I am writing a SQL query to pull some data from one of our linked Oracle servers.
The only problem with this query is the two DATEADD rows (with them removed the query runs), but I need their data.
I receive the below error:
OLE DB provider "MSDAORA" for linked server "MAGINUS" returned message "ORA-00904: "DATEADD": invalid identifier
Could anyone prescribe a syntax for these?
Thanks in advance
Will
DECLARE @TSQL1 VARCHAR(8000)
,@CUSTOMER_ACCOUNT VARCHAR(20)
SELECT @TSQL1 = '
SELECT * FROM OPENQUERY(MAGINUS,''
SELECT
CM.CUSTOMER_ACCOUNT AS "CustomerAccount"
,CM.CONTACT_NAME AS "ContactName"
,CM.MEMBERSHIP_NUMBER AS "MembershipNumber"
,P.LONG_DESCRIPTION_1 AS "ProductDescription"
,DATEADD(SECOND, CM.MEMBERSHIP_START_DATE, "19700101") AS "MembershipStartDate"
,DATEADD(SECOND, CM.MEMBERSHIP_EXPIRY_DATE, "19700101") AS "MemberhrshipEndDate"
,SH.ORDER_VALUE AS "Price Paid"
FROM MAGINUS.CUSTOMER_MEMBERSHIP CM
INNER JOIN MAGINUS.PRODUCT P
ON CM.PRODUCT_CODE = P.PRODUCT_CODE
INNER JOIN MAGINUS.SALES_HEADER SH
ON CM.CUSTOMER_ACCOUNT = SH.CUSTOMER_ACCOUNT
AND CM.SALES_DOCUMENT_NUM = SH.SALES_DOCUMENT_NUM
WHERE CM.CUSTOMER_ACCOUNT = ''''' + @CUSTOMER_ACCOUNT + ''''''')'
EXEC (@TSQL1)
DATEADD is valid in SQL Server, but the query you send to the Oracle database needs to be valid Oracle SQL. So you need to move the DATEADD calls outside of the OPENQUERY and into the SQL Server part:
DECLARE @TSQL1 VARCHAR(8000)
,@CUSTOMER_ACCOUNT VARCHAR(20)
SELECT @TSQL1 = '
SELECT CustomerAccount
,ContactName
,MembershipNumber
,ProductDescription
,DATEADD(SECOND, MembershipStartDate, ''19700101'') AS MembershipStartDate
,DATEADD(SECOND, MemberhrshipEndDate, ''19700101'') AS MemberhrshipEndDate
,[Price Paid]
FROM OPENQUERY(MAGINUS,''
SELECT
CM.CUSTOMER_ACCOUNT AS "CustomerAccount"
,CM.CONTACT_NAME AS "ContactName"
,CM.MEMBERSHIP_NUMBER AS "MembershipNumber"
,P.LONG_DESCRIPTION_1 AS "ProductDescription"
,CM.MEMBERSHIP_START_DATE AS "MembershipStartDate"
,CM.MEMBERSHIP_EXPIRY_DATE AS "MemberhrshipEndDate"
,SH.ORDER_VALUE AS "Price Paid"
FROM MAGINUS.CUSTOMER_MEMBERSHIP CM
INNER JOIN MAGINUS.PRODUCT P
ON CM.PRODUCT_CODE = P.PRODUCT_CODE
INNER JOIN MAGINUS.SALES_HEADER SH
ON CM.CUSTOMER_ACCOUNT = SH.CUSTOMER_ACCOUNT
AND CM.SALES_DOCUMENT_NUM = SH.SALES_DOCUMENT_NUM
WHERE CM.CUSTOMER_ACCOUNT = ''''' + @CUSTOMER_ACCOUNT + ''''''')'
EXEC (@TSQL1)
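The DATEADD(SECOND, value, '19700101') calls imply that the membership columns hold seconds since 1 January 1970. Under that assumption, the same conversion could also be done on the client after the rows are fetched. A minimal Python sketch, with a made-up helper name:

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def epoch_seconds_to_datetime(seconds):
    """Mirror DATEADD(SECOND, seconds, '19700101') for a number of seconds."""
    return EPOCH + timedelta(seconds=seconds)

print(epoch_seconds_to_datetime(86400))  # one day after the epoch
```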

SQL Server Function like behavior in a query without creating a function?

I have a situation where I could save a lot of repeated text in a query by using a function. However, at this site I do not have rights to create a function on the server.
Is there a way to define a function in the body of the query and then call it from the query itself?
Hopefully I have posted the following stripped-down code correctly. The real query calls two functions: count by year and revenue by year.
-- Function
alter function PetRockCountByYear
(@Company varchar(50), @DateFrom varchar(20), @DateTo varchar(20))
RETURNS varchar(50)
AS
begin
return (select isnull( SUM( transact.qty ), 0)
from TRANSACT
inner join CATALOG on transact.ITEM_NO = catalog.ITEM_NO
where transact.TRAN_DATE >= @DateFrom
and transact.TRAN_DATE <= @DateTo
and transact.ITEM_NO = 'PetRock'
and @Company = transact.company
and transact.ITEM_NO = catalog.ITEM_NO)
end
-- Simplified Query Below
select distinct
company.account, company.COMPANY,
company.STATUS, company.code,
-- Report counts from 1970 - 2015
(select dbo.PetRockCountByYear(company.COMPANY, '01/01/1970', '12/31/1970') ) as '#1970',
(select dbo.PetRockCountByYear( company.COMPANY , '01/01/1971', '12/31/1971') ) as '#1971'
from
TRANSACT
Inner join
invoices on transact.inv_no = invoices.inv_no
Inner join
COMPANY on invoices.COMPANY = company.company
where
ITEM_NO = 'PetRock'
order by
company.ACCOUNT
There are no temporary functions in SQL Server T-SQL; you need to have CREATE FUNCTION rights on the database. See https://msdn.microsoft.com/en-us/library/ms178569.aspx#Anchor_2 and https://msdn.microsoft.com/en-us/library/ms191320.aspx#Anchor_0
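If the aim is only to avoid typing the repeated text, one workaround (not part of the answer above, and only a sketch) is to generate the per-year columns on the client and paste or send the resulting query. The Python below inlines the function body as a correlated subquery for each year; the helper name `count_column` is hypothetical, and the generated SQL has not been run against the original schema.

```python
def count_column(year):
    """Build one per-year count column as a correlated subquery."""
    return (
        "(select isnull(sum(t.qty), 0) from TRANSACT t "
        "inner join CATALOG c on t.ITEM_NO = c.ITEM_NO "
        f"where t.TRAN_DATE >= '01/01/{year}' and t.TRAN_DATE <= '12/31/{year}' "
        "and t.ITEM_NO = 'PetRock' and t.company = company.COMPANY) "
        f"as '#{year}'"
    )

# One column per year; widen the range to cover 1970 - 2015
columns = ",\n".join(count_column(year) for year in range(1970, 1973))
print(columns)
```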

Extract Substring from text in SQL Server

I capture MDX statements fired on the SSAS Cube using a SQL profiler into a table. What I want to do is to extract the Cube name from the MDX statement.
The problem I have is the fact that the MDX statements are pretty huge and random (Users connect to the Cube and create Adhoc reports) and have multiple Sub Cubes constructed making it difficult to fetch the Cube Name.
I was able to figure out a pattern for search.
First string: 'FROM ['
Second string: ']'
I now need to pick up a substring from the variable.
Example below:
DECLARE @TEXT varchar(max) = 'SELECT NON EMPTY (((( [[ XXXXX ]] }) ON ROWS FROM (SELECT ({XXXXXXXX }) ON COLUMNS FROM [Sales Reporting]))
WHERE XXXXX ))'
DECLARE @FirstPosition int = (SELECT CHARINDEX('FROM [',@TEXT)+5)
DECLARE @SecondPosition int = (SELECT CHARINDEX(']',@TEXT,@FirstPosition))
SELECT @FirstPosition, @SecondPosition
SELECT SUBSTRING(@TEXT,CHARINDEX('FROM [',@TEXT)+5,(CHARINDEX(']',@TEXT,@FirstPosition)-CHARINDEX('[',@TEXT))-1)
Desired Result = Sales Reporting
I found the solution; it turned out to be simpler than I expected.
DECLARE @TEXT varchar(max) = 'SELECT NON EMPTY (((( [[ XXXXX ]] }) ON ROWS FROM (SELECT ({XXXXXXXX }) ON COLUMNS FROM [Sales Reporting]))
WHERE XXXXX ))'
DECLARE @FirstPosition int = (SELECT CHARINDEX('FROM [',@TEXT)+5)
DECLARE @ExtractString nvarchar(max) = (SELECT SUBSTRING(@TEXT,@FirstPosition, LEN(@Text)))
DECLARE @SecondPosition int = (SELECT CHARINDEX(']',@ExtractString))
SELECT SUBSTRING(@ExtractString,CHARINDEX('[',@ExtractString)+1,(CHARINDEX(']',@ExtractString)-CHARINDEX('[',@ExtractString))-1) AS CubeName
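For comparison, the same two-marker search can be sketched in Python, for example when the captured statements are post-processed outside the database. The function name `extract_cube_name` is made up for this illustration. Note that str.find is case-sensitive, whereas CHARINDEX follows the collation of the input.

```python
def extract_cube_name(mdx):
    """Return the text between the first 'FROM [' and the next ']', or None."""
    marker = "FROM ["
    start = mdx.find(marker)
    if start == -1:
        return None
    start += len(marker)          # first character after the opening bracket
    end = mdx.find("]", start)    # closing bracket that follows the marker
    if end == -1:
        return None
    return mdx[start:end]

mdx = (
    "SELECT NON EMPTY (((( [[ XXXXX ]] }) ON ROWS FROM (SELECT ({XXXXXXXX }) "
    "ON COLUMNS FROM [Sales Reporting]))\nWHERE XXXXX ))"
)
print(extract_cube_name(mdx))
```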

Resources