My current code can only load a full DataFrame into a SQL Server table. If there are NULL values in the OpSeq column, how can I change my code to accept them?
import pandas as pd
import pyodbc

# Import CSV
data = pd.read_csv(r'C:\Users\moin\Desktop\MM\XXK.csv')
df = pd.DataFrame(data, columns=['Order','Number','Group','ActualStartDate','ActualCompletionDate','OpSeq'])

# Connect to SQL Server
conn = pyodbc.connect('Driver={SQL Server};'
                      'Server=XXXXXXX\SQLEXPRESS;'
                      'Database=test_XXXXXproduction;'
                      'Trusted_Connection=yes;')
cursor = conn.cursor()

# Create Table ([Order] and [Group] are reserved words, so they need brackets)
cursor.execute('''CREATE TABLE people_info ([Order] nvarchar(50), Number nvarchar(50),
    [Group] nvarchar(50), ActualStartDate datetime, ActualCompletionDate datetime, OpSeq int)''')

# Insert DataFrame to Table
for row in df.itertuples():
    cursor.execute('''
        INSERT INTO test_XXXXXproduction.dbo.people_info ([Order], Number, [Group], ActualStartDate, ActualCompletionDate, OpSeq)
        VALUES (?,?,?,?,?,?)
        ''',
        row.Order,
        row.Number,
        row.Group,
        row.ActualStartDate,
        row.ActualCompletionDate,
        row.OpSeq
    )
conn.commit()
Your code should work; it depends on how you pass the NULL. pyodbc writes a SQL NULL when the parameter is Python None. pandas represents missing values as NaN, which will fail for an int column like OpSeq (and passing the string 'null' would insert literal text), so convert NaN to None before inserting.
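A minimal sketch of that conversion, using the DataFrame from your question:

# Cast to object first so numeric columns such as OpSeq can hold None,
# then replace every NaN/NaT with None; pyodbc sends None as SQL NULL.
df = df.astype(object).where(pd.notnull(df), None)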
I have several SQL queries written in MS SQL Server, and I used the following code to import them into Python using the pyodbc package.
import pyodbc
import pandas as pd

def conn_sql_server(file_path):
    '''Function to connect to SQL Server and save query result to a dataframe
    input:
       file_path - query file path
    output:
       df - dataframe from the query result
    '''
    # Connect to SQL Server
    conn = pyodbc.connect('Driver={SQL Server Native Client 11.0};'
                          'Server=servername;'
                          'Database=databasename;'
                          'Trusted_Connection=yes;')
    # run query and output the result to df
    query = open(file_path, 'r')
    df = pd.read_sql_query(query.read(), conn)
    query.close()
    return df

df1 = conn_sql_server('C:/Users/JJ/SQL script1')
df2 = conn_sql_server('C:/Users/JJ/SQL script2')
df3 = conn_sql_server('C:/Users/JJ/SQL script3')
In each SQL query, I have used DECLARE and SET to set the variables (the variables are different in each query). Here, I just copied a random query from online as an example. What I want to do is to update the Year variable directly in Python. My actual query is pretty long, so I don't want to copy the SQL scripts over into Python; I just want to adjust the variables. Any way to do it?
DECLARE @Year INT = 2022;
SELECT YEAR(date) @Year,
       SUM(list_price * quantity) gross_sales
FROM sales.orders o
    INNER JOIN sales.order_items i ON i.order_id = o.order_id
GROUP BY YEAR(date)
order by @Year
My other question is: is there any way to add a WHERE statement, like WHERE itemNumber = 1002345, after importing the above query into Python? I'm asking this because df2 is a subset of df1. The restriction column isn't selected to show in the output, so I cannot do the filtering in Python after reading in df1. I could add that column to the df1 output and do more aggregations in Python, but that would largely increase the original data size and running time, so I prefer not to do it.
Here's a sample of what your script will look like. We are making two modifications:
conn_sql_server now takes these parameters:
year: the year you want to substitute into the DECLARE @Year ... line
where_clause: a WHERE clause of your choice
before_clause_starts_with: the clause before which the WHERE clause should be placed
a modify_query method that reads the contents of the file and changes them based on the year you provide. If you provide the WHERE clause, it'll be put before the clause you name in before_clause_starts_with.
import pyodbc
import pandas as pd

def modify_query(lines, year, where_clause, before_clause_starts_with):
    new_lines = []
    for line in lines:
        if year is not None:
            if line.lower().startswith('declare @year int ='):
                new_lines.append(f"DECLARE @Year INT = {year}\n")
                continue
        if where_clause is not None:
            if line.lower().startswith(before_clause_starts_with.lower()):
                new_lines.append(where_clause + "\n")
                new_lines.append(line)
                continue
        new_lines.append(line)
    new_query = ''.join(new_lines)
    return new_query

def conn_sql_server(file_path, year=None, where_clause=None, before_clause_starts_with=None):
    '''Function to connect to SQL Server and save query result to a dataframe
    input:
       file_path - query file path
    output:
       df - dataframe from the query result
    '''
    # Connect to SQL Server
    conn = pyodbc.connect('Driver={SQL Server Native Client 11.0};'
                          'Server=servername;'
                          'Database=databasename;'
                          'Trusted_Connection=yes;')
    # run query and output the result to df
    query = open(file_path, 'r')
    lines = query.readlines()
    query.close()
    new_query = modify_query(lines, year, where_clause, before_clause_starts_with)
    df = pd.read_sql_query(new_query, conn)
    return df

df1 = conn_sql_server('C:/Users/JJ/SQL script1',
                      year=1999,
                      where_clause='WHERE itemNumber = 1002345',
                      before_clause_starts_with='group by')
df2 = conn_sql_server('C:/Users/JJ/SQL script2')
df3 = conn_sql_server('C:/Users/JJ/SQL script3',
                      year=1500)
Simulation
Let's run an example.
script1.sql
DECLARE @Year INT = 2022;
SELECT YEAR(date) @Year,
       SUM(list_price * quantity) gross_sales
FROM sales.orders o
    INNER JOIN sales.order_items i ON i.order_id = o.order_id
GROUP BY YEAR(date)
order by @Year
script2.sql
DECLARE @Year INT = 2022;
SELECT gross_sales
FROM sales.orders
order by @Year
script3.sql
DECLARE @Year INT = 2022;
SELECT GETDATE()
Using a script similar to the above, we'll see how each script looks after it gets modified.
Simulation script
#import pyodbc
#import pandas as pd

def modify_query(lines, year, where_clause, before_clause_starts_with):
    new_lines = []
    print('-------')
    print('ORIGINAL')
    print('-------')
    print(lines)
    for line in lines:
        if year is not None:
            if line.lower().startswith('declare @year int ='):
                new_lines.append(f"DECLARE @Year INT = {year}\n")
                continue
        if where_clause is not None:
            if line.lower().startswith(before_clause_starts_with.lower()):
                new_lines.append(where_clause + "\n")
                new_lines.append(line)
                continue
        new_lines.append(line)
    print('-------')
    print('NEW')
    print('-------')
    new_query = ''.join(new_lines)
    print(new_query)
    return new_query

def conn_sql_server(file_path, year=None, where_clause=None, before_clause_starts_with=None):
    '''Function to connect to SQL Server and save query result to a dataframe
    input:
       file_path - query file path
    output:
       df - dataframe from the query result
    '''
    # Connect to SQL Server
    #conn = pyodbc.connect('Driver={SQL Server Native Client 11.0};'
    #                      'Server=servername;'
    #                      'Database=databasename;'
    #                      'Trusted_Connection=yes;')
    # run query and output the result to df
    query = open(file_path, 'r')
    lines = query.readlines()
    query.close()
    new_query = modify_query(lines, year, where_clause, before_clause_starts_with)
    #df = pd.read_sql_query(new_query, conn)
    #return df

#df1 = conn_sql_server('C:/Users/JJ/SQL script1')
#df2 = conn_sql_server('C:/Users/JJ/SQL script2')
#df3 = conn_sql_server('C:/Users/JJ/SQL script3')
df1 = conn_sql_server('script1.sql', year=1999, where_clause='WHERE itemNumber = 1002345', before_clause_starts_with='group by')
df2 = conn_sql_server('script2.sql')
df3 = conn_sql_server('script3.sql', year=1500)
Original query 1 looked like this in script1.sql:
['DECLARE @Year INT = 2022;\n', 'SELECT YEAR(date) @Year, \n', ' SUM(list_price * quantity) gross_sales\n', 'FROM sales.orders o\n', ' INNER JOIN sales.order_items i ON i.order_id = o.order_id\n', 'GROUP BY YEAR(date)\n', 'order by @Year']
After running the script, the query will become
DECLARE @Year INT = 1999
SELECT YEAR(date) @Year,
       SUM(list_price * quantity) gross_sales
FROM sales.orders o
    INNER JOIN sales.order_items i ON i.order_id = o.order_id
WHERE itemNumber = 1002345
GROUP BY YEAR(date)
order by @Year
Query 3 used to look like this:
['DECLARE @Year INT = 2022;\n', 'SELECT GETDATE()']
It becomes
DECLARE @Year INT = 1500
SELECT GETDATE()
Give it a shot by changing the python script as you deem fit.
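As an aside, if you are able to edit the .sql files themselves, an alternative worth testing (a sketch, not part of the answer above; the connection details are placeholders) is to replace the hard-coded year with a ? parameter marker and let the driver bind the value, avoiding the string surgery:

import pandas as pd
import pyodbc

# Assumes the first line of script1.sql was changed to: DECLARE @Year INT = ?;
conn = pyodbc.connect('Driver={SQL Server Native Client 11.0};'
                      'Server=servername;'
                      'Database=databasename;'
                      'Trusted_Connection=yes;')
with open('script1.sql') as f:
    df1 = pd.read_sql_query(f.read(), conn, params=[1999])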
I have a SQL Server table with an identity column, set to autoincrement.
Coded in Perl, the insert in the code below works fine, but in the while loop the fetchrow_array() call returns no data in the @row array.
How do I best retrieve the identity value for use in subsequent SQL statements?
my $term_sql = "INSERT INTO reminder_term(site, name, description, localization) OUTPUT \@\@IDENTITY VALUES(?,?,?,?)";
my $t_stmt = $dbh->prepare($term_sql);
...
$t_stmt->execute($site, $name, $description, $localizer);
while (@row = $t_stmt->fetchrow_array()) {
    $referential_key = $row[0];
}
Avoid using the @@IDENTITY value since it's unreliable in the presence of triggers.
Given the example table schema...
create table [dbo].[reminder_term] (
    [id] int not null identity(1,1),
    [site] nvarchar(10),
    [name] nvarchar(10),
    [description] nvarchar(10),
    [localization] nvarchar(10)
);
If you rework your OUTPUT clause slightly, you can capture the new id value by way of the special inserted row source...
INSERT INTO reminder_term(site, name, description, localization)
OUTPUT inserted.id
VALUES(?,?,?,?)
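If it helps to see the same pattern outside Perl, here is a sketch in Python with pyodbc (the connection details and inserted values are made up): OUTPUT inserted.id comes back as an ordinary result set, so you fetch it like a SELECT.

import pyodbc

conn = pyodbc.connect('Driver={SQL Server};Server=servername;'
                      'Database=databasename;Trusted_Connection=yes;')
cursor = conn.cursor()
# The OUTPUT clause returns the new identity value as a one-row result set.
cursor.execute('INSERT INTO reminder_term(site, name, description, localization) '
               'OUTPUT inserted.id VALUES (?,?,?,?)',
               'site1', 'name1', 'desc1', 'loc1')
referential_key = cursor.fetchone()[0]
conn.commit()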
I'm trying to load a CSV file into a table in order to sort the data. However, the smallmoney column BillingRate (e.g. $203.75) will not convert, with SQL Server producing the following message:
Msg 4864, Level 16, State 1, Line 10
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 2, column 4 (BillingRate).
Here is the code I am using in order to do this:
--CREATE TABLE SubData2
--(
--RecordID int,
--SubscriberID int,
--BillingMonth int,
--BillingRate smallmoney,
--Region varchar(255)
--);
BULK INSERT Subdata2
FROM 'C:\Folder\1-caseint.csv'
WITH
    (FIRSTROW = 2,
     FIELDTERMINATOR = '|',  --CSV field delimiter
     ROWTERMINATOR = '\n',   --use to shift control to the next row
     ERRORFILE = 'C:\Folder\CaseErrorRows4.csv',
     TABLOCK);
A typical line in the CSV file looks like:
1|0000000001|1|$233.94|"West"
Apologies if there are any glaring errors here - I'm new to SQL Server :)
Many thanks in advance,
Tom.
This is odd. When inserting directly, only the last statement fails.
declare @t table (sm smallmoney);
insert into @t
values ('$256.6')
insert into @t
values (244.8);
insert into @t
values ('12.8');
insert into @t
values ('$256.5'), ('244.7');
insert into @t
values ($256.5);
insert into @t
values ('$256.5'), (244.7), ('12.12');
select * from @t;
The likely reason: in a multi-row VALUES list that mixes strings and numerics, SQL Server first converts the column to one common type, and numeric outranks varchar in data type precedence, so '$256.5' has to convert to numeric, which rejects the currency symbol. Give it a try removing the $ from the data.
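If it's easier to clean the file than the statement, a small pandas preprocessing step could strip the symbol before the BULK INSERT runs (a sketch: the input path and pipe delimiter come from your question, the cleaned filename is made up):

import pandas as pd

# Read everything as text so the other columns pass through untouched,
# drop the $ from BillingRate, and write a cleaned copy for BULK INSERT.
df = pd.read_csv(r'C:\Folder\1-caseint.csv', sep='|', dtype=str)
df['BillingRate'] = df['BillingRate'].str.lstrip('$')
df.to_csv(r'C:\Folder\1-caseint-clean.csv', sep='|', index=False)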
Hi, I have the following table data I need to convert to XML in SQL Server. Any ideas?
Thanks in advance.
From
Party_Id   HomePhoneNumber   WorkPhoneNumber
62356      6314993578
62356                        6314590922
62356                        6313795488
To
<HomePhoneNumber>6314993578</HomePhoneNumber>
<WorkPhoneNumber>6314590922</WorkPhoneNumber>
<WorkPhoneNumber>6313795488</WorkPhoneNumber>
Convert the empty values into NULLs; NULL values are excluded from the XML. (NULLIF(col, '') is an equivalent shorthand for the CASE expressions below.)
Declare @YourTable table (Party_Id int, HomePhoneNumber varchar(25), WorkPhoneNumber varchar(25))

Insert Into @YourTable values
 (62356,'6314993578',''),
 (62356,'','6314590922'),
 (62356,'','6313795488')

Select HomePhoneNumber = case when HomePhoneNumber='' then null else HomePhoneNumber end
      ,WorkPhoneNumber = case when WorkPhoneNumber='' then null else WorkPhoneNumber end
 From @YourTable
 For XML Path('')
Returns
<HomePhoneNumber>6314993578</HomePhoneNumber>
<WorkPhoneNumber>6314590922</WorkPhoneNumber>
<WorkPhoneNumber>6313795488</WorkPhoneNumber>
I have two databases: Oracle (10.2.0.4) and SQL Server (2008 R2).
When I insert data through a linked server:
EXECUTE( 'begin INSERT INTO TEST_TABLE(ID,data1,NIP) VALUES(?,sysdate,?); end;',
'',
''
) AT LS_ORACLE;
the result in TEST_TABLE is:
id | data1    | NIP
---+----------+----
ǧ  | 16/07/21 |
The empty string is converted to "ǧ".
How can I eliminate this strange behavior?
For information:
CREATE TABLE "TEST_TABLE"
( "ID" VARCHAR2(200 BYTE),
"DATA1" DATE,
"NIP" VARCHAR2(200 BYTE)
)
You can use my workaround: pass variables instead of empty string literals and the problem goes away, presumably because the typed parameters are bound as proper strings.
declare @id nvarchar(50) = ''
       ,@nip nvarchar(50) = ''

EXECUTE( 'begin INSERT INTO TEST_TABLE(ID,data1,NIP) VALUES(?,sysdate,?); end;',
         @id,
         @nip
       ) AT LS_ORACLE