SQLAlchemy raw sql with list in where clause mssql pyodbc - sql-server

I want to pass a list into my raw sql where clause but I keep getting this error:
sqlalchemy.exc.DBAPIError: (pyodbc.Error) ('HY004', '[HY004] [Microsoft][ODBC SQL Server Driver]Invalid SQL data type (0) (SQLBindParameter)'
id = [1, 2, 3]
query = text("select * from table where col in :id")
conn.execute(query, {'id': tuple(id)})
This should work (I've seen these given as solutions on Stack Overflow), but maybe not for SQL Server? How do I make it work for MSSQL?

from sqlalchemy import bindparam, text

id = [1, 2, 3]
query = text("select * from table where col in :id")
query = query.bindparams(bindparam('id', expanding=True))
conn.execute(query, {'id': id})
As per OP @jole5646's edit on their own question.
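For reference, here is a self-contained sketch of the expanding=True mechanism. It runs against in-memory SQLite (a stand-in, since no SQL Server instance is assumed here); the same text()/bindparams calls work over an mssql+pyodbc engine:

```python
from sqlalchemy import bindparam, create_engine, text

# In-memory SQLite stands in for the mssql+pyodbc engine.
engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE t (col INTEGER)"))
    conn.execute(text("INSERT INTO t VALUES (1), (2), (5)"))

    # expanding=True makes SQLAlchemy render one placeholder per list
    # element at execution time, e.g. IN (?, ?, ?), which is what the
    # raw tuple-binding attempt above failed to do under pyodbc/mssql.
    query = text("SELECT col FROM t WHERE col IN :id ORDER BY col").bindparams(
        bindparam("id", expanding=True)
    )
    rows = conn.execute(query, {"id": [1, 2, 3]}).fetchall()
    print(rows)  # [(1,), (2,)]
```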

Related

SQLAlchemy find IDs from a list that don't exist in a table

I have a table with some records (every record has an id, the primary key). Now I need to select all ids from a specified list/set that don't exist in the table. I am using a Postgres database and SQLAlchemy as the ORM. Please suggest how to perform such a query.
This may not be efficient for a very large set, but the flow is straightforward for demonstration:
my_list = [1, 2, 3, 4, 5]
missing_from_table = []
for id in my_list:
    result = session.query(Model.id).get(id)  # a one-element tuple, or None if absent
    if not result:
        missing_from_table.append(id)
print(f'The following ids are not in the table: {missing_from_table}')
Another option would be:
my_list = [1, 2, 3, 4, 5]
all_ids = [r.id for r in session.query(Model.id).all()]
missing_from_table = [id for id in my_list if id not in all_ids]
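Both loops above can be collapsed into a single IN query that fetches only the matching ids and diffs in Python. A sketch with a minimal stand-in mapped class (the Model and session names mirror the answer above; SQLite stands in for Postgres):

```python
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Model(Base):  # minimal stand-in for the real mapped class
    __tablename__ = "model"
    id = Column(Integer, primary_key=True)

engine = create_engine("sqlite://")  # in-memory stand-in for Postgres
Base.metadata.create_all(engine)
session = Session(engine)
session.add_all([Model(id=1), Model(id=3)])
session.commit()

my_list = [1, 2, 3, 4, 5]
# One round trip: fetch only the ids that DO exist, then diff in Python.
present = {i for (i,) in session.query(Model.id).filter(Model.id.in_(my_list))}
missing_from_table = [i for i in my_list if i not in present]
print(missing_from_table)  # [2, 4, 5]
```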
Here's an option that operates completely in the database. It bypasses the ORM, but still utilizes SQLAlchemy's conveniences for the session and object mapping.
from sqlalchemy import text
my_list = [1, 2, 3, 4, 5, 6]
query = text("""
SELECT array_agg(id)
FROM unnest(:my_list) id
WHERE id NOT IN (
SELECT
id
FROM
insert-table-name-here
)
""")
# Below assumes session is a defined SQLAlchemy database session
missing_ids = session.execute(query, {'my_list': my_list}).scalar()
print(f'The following ids from my_list are missing in table: {missing_ids}')
This option uses SQLAlchemy and is very efficient.
from sqlalchemy.sql import Values, select, column
new_items_values = Values(column("id"), name="new_items").data(items_ids)
query = (
select(new_items_values.c.id)
.outerjoin(
Model,
Model.id == new_items_values.c.id,
)
.where(Model.id == None)
)
# Assumes session is already defined. The ids are embedded by .data() above,
# so no extra parameters are needed at execution time.
missing_ids = set(session.execute(query))
# missing_ids is a set of one-element row tuples, e.g. {('id1',), ('id4',)}
This generates SQL like this:
select new_items.id
from (
values ('id1'),('id2'),('id3'), ('id4')
) as new_items(id)
left join model on model.id = new_items.id
where model.id is null;
(I compared this to @Matt Graham's solution earlier in this answer set, which uses WHERE id NOT IN; this solution takes 20 ms, while the other took literally minutes on my database.)

Trying to insert pandas dataframe to temporary table

I'm looking to create a temp table and insert some data into it. I have used pyodbc extensively to pull data, but I am not familiar with writing data to SQL from a Python environment. I am doing this at work, so I don't have the ability to create permanent tables, but I can create temp and global temp tables. My intent is to insert a relatively small dataframe (150 rows x 4 cols) into a temp table and reference it throughout my session; my program structure makes it so that a global variable in the session will not suffice. I am getting the following error when trying the piece below. What am I doing wrong?
pyodbc.ProgrammingError: ('42S02', "[42S02] [Microsoft][ODBC SQL Server Driver][SQL Server]Invalid object name 'sqlite_master'. (208) (SQLExecDirectW); [42S02] [Microsoft][ODBC SQL Server Driver][SQL Server]Statement(s) could not be prepared. (8180)")
import numpy as np
import pandas as pd
import pyodbc
conn = pyodbc.connect('Driver={SQL Server};'
                      'Server=SERVER;'
                      'Database=DATABASE;'
                      'Trusted_Connection=yes;')
cursor = conn.cursor()
temp_creator = '''CREATE TABLE #rankings (Col1 int, Col2 int)'''
cursor.execute(temp_creator)
df_insert = pd.DataFrame({'Col1' : [1, 2, 3], 'Col2':[4,5,6]})
df_insert.to_sql(r'#rankings', conn, if_exists='append')
read_query = '''SELECT * FROM #rankings'''
df_back = pd.read_sql(read_query,conn)
pandas.to_sql is failing there: handed a plain pyodbc connection rather than a SQLAlchemy engine, pandas assumes SQLite, which is why the error mentions sqlite_master. But for SQL Server 2016+/Azure SQL Database there's a better way in any case. Instead of having pandas insert each row, send the whole dataframe to the server in JSON format and insert it in a single statement, like this:
import numpy as np
import pandas as pd
import pyodbc
conn = pyodbc.connect('Driver={Sql Server};'
                      'Server=localhost;'
                      'Database=tempdb;'
                      'Trusted_Connection=yes;')
cursor = conn.cursor()
temp_creator = '''CREATE TABLE #rankings (Col1 int, Col2 int);'''
cursor.execute(temp_creator)
df_insert = pd.DataFrame({'Col1' : [1, 2, 3], 'Col2':[4,5,6]})
df_json = df_insert.to_json(orient='records')
print(df_json)
load_df = """\
insert into #rankings(Col1, Col2)
select Col1, Col2
from openjson(?)
with
(
Col1 int '$.Col1',
Col2 int '$.Col2'
);
"""
cursor.execute(load_df,df_json)
#df_insert.to_sql(r'#rankings', conn, if_exists='append')
read_query = '''SELECT * FROM #rankings'''
df_back = pd.read_sql(read_query,conn)
print(df_back)
which outputs
[{"Col1":1,"Col2":4},{"Col1":2,"Col2":5},{"Col1":3,"Col2":6}]
Col1 Col2
0 1 4
1 2 5
2 3 6
Press any key to continue . . .
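If OPENJSON isn't available (servers older than SQL Server 2016), batched parameterized inserts are a middle ground: with pyodbc, setting cursor.fast_executemany = True (available since pyodbc 4.0.19) turns an executemany into a single bulk round trip. The sketch below uses stdlib sqlite3 as a stand-in connection so it can run anywhere; the executemany pattern is the same over pyodbc:

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")  # stand-in for the pyodbc connection
cursor = conn.cursor()
cursor.execute("CREATE TEMP TABLE rankings (Col1 int, Col2 int)")

df_insert = pd.DataFrame({'Col1': [1, 2, 3], 'Col2': [4, 5, 6]})

# Convert numpy scalars to plain ints for the DB-API driver.
records = [tuple(int(v) for v in row)
           for row in df_insert.itertuples(index=False, name=None)]

# With pyodbc you would set cursor.fast_executemany = True here so this
# executemany is sent as one bulk operation instead of row-by-row.
cursor.executemany("INSERT INTO rankings (Col1, Col2) VALUES (?, ?)", records)

rows = cursor.execute("SELECT * FROM rankings ORDER BY Col1").fetchall()
print(rows)  # [(1, 4), (2, 5), (3, 6)]
```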
Inserting into a temp table using SQLAlchemy works great:
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine('mssql://sql-server/MY_DB?trusted_connection=yes&driver=ODBC+Driver+17+for+SQL+Server')
df1 = pd.DataFrame({'a':[1,2,3], 'b':[4,5,6]})
df1.to_sql(name='#my_temp_table', con=engine)
df2 = pd.read_sql_query(sql='select * from #my_temp_table', con=engine)
# Now we can test they are the same:
pd.testing.assert_frame_equal(df1,df2.drop(columns=['index']))
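One caveat with the snippet above: #temp tables in SQL Server are scoped to a single connection, and an Engine may hand to_sql and read_sql_query different pooled connections, in which case the temp table vanishes between the two calls. Pinning both calls to one connection avoids that. A sketch on in-memory SQLite (whose databases are likewise connection-scoped, so the same failure mode applies; the table name is illustrative):

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite://")  # stand-in for the mssql URL above

df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# engine.begin() yields ONE connection for both calls, so anything
# connection-scoped stays visible for the read-back.
with engine.begin() as conn:
    df1.to_sql(name='temp_demo', con=conn, index=False)
    df2 = pd.read_sql_query('select * from temp_demo', con=conn)

pd.testing.assert_frame_equal(df1, df2)
```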

How to parse wrong data in a column then save to other table using SQL Server?

How do I parse the correct date data and then separate or transfer the wrong date data into a separate table?
Example:
Correct Table A
ColumnId, ColumnDate
1, 06/11/2016
2, 簽訂臨時買賣合約後 交易再未有進展 The PASP has not proceeded further
3, 11/11/2016
4, 在 7/12/2016 ,基 於法例35(2)(b)條所 容許的原因,售價 更改為
5, 24-10-2016
Wrong Table B
ColumnId, ColumnDate
2, 簽訂臨時買賣合約後 交易再未有進展 The PASP has not proceeded further
4, 在 7/12/2016 ,基 於法例35(2)(b)條所 容許的原因,售價 更改為
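No answer was recorded for this question, but a common approach is to attempt a conversion on every value and route the failures to the second table; on the SQL Server side, TRY_CONVERT (available since SQL Server 2012) returns NULL for unparseable values and serves that role. A sketch of the same idea in pandas, using pd.to_datetime with errors='coerce' (the column names follow the example above; the date format is assumed to be day-first, and the sample rows are trimmed for brevity):

```python
import pandas as pd

df = pd.DataFrame({
    'ColumnId': [1, 2, 3],
    'ColumnDate': ['06/11/2016',
                   'The PASP has not proceeded further',
                   '11/11/2016'],
})

# errors='coerce' turns unparseable values into NaT instead of raising,
# giving a boolean mask to split the rows with.
parsed = pd.to_datetime(df['ColumnDate'], format='%d/%m/%Y', errors='coerce')
table_a = df[parsed.notna()]  # rows with valid dates
table_b = df[parsed.isna()]   # rows to move to the "wrong data" table

print(list(table_b['ColumnId']))  # [2]
```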

Entity Framework 6 + Oracle "Where In" Clause

Despite our best efforts we have been unable to get Entity Framework (6.1.3) + Oracle Managed Data Access (12.1.2400) to generate an 'IN' clause when using contains in a where statement.
For the following query:
var x = Tests
    .Where(t => new[] { 1, 2, 3 }.Contains(t.ServiceLegId));
var query = x.ToString();
Using MS SQL (SQL Server) we see the following generated:
SELECT
[Extent1].[Id] AS [Id],
[Extent1].[TestRunId] AS [TestRunId],
[Extent1].[DidPass] AS [DidPass],
[Extent1].[StartTime] AS [StartTime],
[Extent1].[EndTime] AS [EndTime],
[Extent1].[ResultData] AS [ResultData],
[Extent1].[ServiceLegId] AS [ServiceLegId]
FROM [dbo].[Test] AS [Extent1]
WHERE [Extent1].[ServiceLegId] IN (1, 2, 3)
Using Oracle we instead see:
SELECT
"Extent1"."Id" AS "Id",
"Extent1"."TestRunId" AS "TestRunId",
"Extent1"."DidPass" AS "DidPass",
"Extent1"."StartTime" AS "StartTime",
"Extent1"."EndTime" AS "EndTime",
"Extent1"."ResultData" AS "ResultData",
"Extent1"."ServiceLegId" AS "ServiceLegId"
FROM "dbo"."Test" AS "Extent1"
WHERE ((1 = "Extent1"."ServiceLegId") OR (2 = "Extent1"."ServiceLegId") OR (3 = "Extent1"."ServiceLegId"))
This is a trivialized example of what we actually have to do. In the actual code base this list can get quite long so a series of 'OR' statements is resulting in very inefficient execution plans.
Has anyone encountered this scenario? I feel like we've tried everything...

Strings and the WHERE clause when using RODBC's sqlQuery

I am querying tables in SQL server using RODBC in R:
Example:
num = 2
temp <- sqlQuery(conn, sprintf('SELECT "Time", "Temp"
FROM "DataTable"
WHERE "Week_Number" = %s
ORDER BY "Time"', num))
This works fine, but if I try to use the WHERE clause on a column containing strings, I can't get it to work.
Example:
place_name <- 'London'
temp <- sqlQuery(conn, sprintf('SELECT "Time", "Place"
FROM "Data_Table"
WHERE "Place" = %s
ORDER BY "Time"', place_name))
I have tried various things e.g:
place_name <- 'London'
place_name <- \'London\'
place_name <- "'London'"
place_name <- gsub("'", "''", London)
None of this have worked. I am getting the following error message:
"42000 102 [Microsoft][ODBC Driver 11 for SQL Server][SQL Server]Incorrect syntax near 'London'."
Any suggestions?
In case anybody else is interested, I found a solution. I installed the RODBCext package, which provides support for parameterized queries. I used the following code:
query <- 'SELECT "Time", "Place" FROM "Data_Table" WHERE "Place" = ?'
temp <- sqlExecute(conn, query, 'London', fetch = TRUE)
Here is some good information on the use of RODBCext: RODBCext.
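The same principle (driver-bound placeholders instead of sprintf-style string pasting) applies in any client library. For comparison, a Python DB-API sketch with stdlib sqlite3, with table and column names mirroring the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Data_Table ("Time" int, "Place" text)')
conn.executemany('INSERT INTO Data_Table VALUES (?, ?)',
                 [(1, 'London'), (2, 'Paris'), (3, 'London')])

# The driver quotes and escapes the bound value itself, so no manual
# quoting of 'London' is needed and no injection is possible.
rows = conn.execute(
    'SELECT "Time", "Place" FROM Data_Table WHERE "Place" = ? ORDER BY "Time"',
    ('London',),
).fetchall()
print(rows)  # [(1, 'London'), (3, 'London')]
```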
