Despite our best efforts, we have been unable to get Entity Framework (6.1.3) + Oracle Managed Data Access (12.1.2400) to generate an 'IN' clause when using Contains in a Where clause.
For the following query:
var x = Tests
.Where(t => new[] { 1, 2, 3}.Contains(t.ServiceLegId));
var query = x.ToString();
Using MS SQL (SQL Server) we see the following generated:
SELECT
[Extent1].[Id] AS [Id],
[Extent1].[TestRunId] AS [TestRunId],
[Extent1].[DidPass] AS [DidPass],
[Extent1].[StartTime] AS [StartTime],
[Extent1].[EndTime] AS [EndTime],
[Extent1].[ResultData] AS [ResultData],
[Extent1].[ServiceLegId] AS [ServiceLegId]
FROM [dbo].[Test] AS [Extent1]
WHERE [Extent1].[ServiceLegId] IN (1, 2, 3)
Using Oracle we instead see:
SELECT
"Extent1"."Id" AS "Id",
"Extent1"."TestRunId" AS "TestRunId",
"Extent1"."DidPass" AS "DidPass",
"Extent1"."StartTime" AS "StartTime",
"Extent1"."EndTime" AS "EndTime",
"Extent1"."ResultData" AS "ResultData",
"Extent1"."ServiceLegId" AS "ServiceLegId"
FROM "dbo"."Test" AS "Extent1"
WHERE ((1 = "Extent1"."ServiceLegId") OR (2 = "Extent1"."ServiceLegId") OR (3 = "Extent1"."ServiceLegId"))
This is a trivialized example of what we actually have to do. In the actual code base this list can get quite long, so the series of 'OR' conditions results in very inefficient execution plans.
Has anyone encountered this scenario? I feel like we've tried everything...
I have an existing SQL Server 2008 database which has a number of views, stored procedures and functions.
I want to be able to SELECT data from one of these SQL functions and limit the number of rows that it returns in a paging scenario.
I have tried using .Select with .Skip and .Take as follows:
public IEnumerable<Product> CallSqlFunction_dbo_Search_Products_View(int clientId,
string environmentCode,
int sessionId)
{
IEnumerable<Product> results;
using (var db = _dbConnectionFactory.Open())
{
results = db.Select<Product>(@"
SELECT
*
FROM
[dbo].[Search_Products_View]
(
@pClientID,
@pEnvironmentCode,
@pSessionId
)", new
{
pClientID = clientId,
pEnvironmentCode = environmentCode,
pSessionId = sessionId
})
.Skip(0)
.Take(1000);
db.Close();
}
return results;
}
This produces the following SQL which is executed on the SQL Server.
exec sp_executesql N'
SELECT
*
FROM
[dbo].[Search_Products_View]
(
@pClientID,
@pEnvironmentCode,
@pSessionId
)',N'@pClientID int,@pEnvironmentCode varchar(8000),@pSessionId int',@pClientID=0,@pEnvironmentCode='LIVE',@pSessionId=12345
It means that this query returns 134,000 products, not the first page of 1000 I was expecting. The paging happens on the API server once the SQL Server has returned 134,000 rows.
Is it possible to use ORMLite so that I can get it to generate the paging in the query similar to this:
exec sp_executesql N'
SELECT
[t1].*
FROM (
SELECT
ROW_NUMBER() OVER (ORDER BY [t0].[ProductId], [t0].[ProductName]) AS [ROW_NUMBER],
[t0].*
FROM
[dbo].[Search_Products_View](@pClientId, @pEnvironmentCode, @pSessionId) AS [t0]
WHERE
(LOWER([t0].[ProductStatus]) = @pProductStatus1) OR (LOWER([t0].[ProductStatus]) = @pProductStatus2) OR (LOWER([t0].[ProductStatus]) = @pProductStatus3)
) AS [t1]
WHERE
[t1].[ROW_NUMBER] BETWEEN @pPageNumber + 1 AND @pPageNumber + @pNumberOfRowsPerPage
ORDER BY [t1].[ROW_NUMBER]',
N'@pClientId decimal(9,0),@pEnvironmentCode char(3),@pSessionId decimal(9,0),@pProductStatus1 varchar(8000),@pProductStatus2 varchar(8000),@pProductStatus3 varchar(8000),@pPageNumber int,@pNumberOfRowsPerPage int',
@pClientId=0,@pEnvironmentCode='LIVE',@pSessionId=12345,@pProductStatus1='1',@pProductStatus2='2',@pProductStatus3='3',@pPageNumber=0,@pNumberOfRowsPerPage=1000
OrmLite will use the windowing function hack in <= SQL Server 2008 for its typed queries but not for Custom SQL. You'll need to include the entire SQL (inc. Windowing function) into your Custom SQL.
If you do this a lot I'd suggest wrapping the Windowing Function SQL Template in an extension method so you can easily make use of it in your custom queries, e.g:
results = db.Select<Product>(@"
SELECT
*
FROM
[dbo].[Search_Products_View]
(
@pClientID,
@pEnvironmentCode,
@pSessionId
)"
.InWindowingPage(0,1000), new
{
pClientID = clientId,
pEnvironmentCode = environmentCode,
pSessionId = sessionId
})
If you want to use DB Params for the offsets you'll need some coupling to use conventional param names:
results = db.Select<Product>(@"
SELECT
*
FROM
[dbo].[Search_Products_View]
(
@pClientID,
@pEnvironmentCode,
@pSessionId
)"
.InWindowingPage(), new
{
pClientID = clientId,
pEnvironmentCode = environmentCode,
pSessionId = sessionId,
pPageNumber = 0,
pNumberOfRowsPerPage = 100
})
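For reference, here is a rough sketch of what the complete custom SQL could look like once the windowing function is written in by hand, which is what a helper like InWindowingPage (something you would write yourself, as suggested above) would have to produce. The [RowNum] alias is made up for illustration, the ORDER BY columns are borrowed from the question, and the DECLARE block only stands in for the DB params so the statement can be run on its own.

```sql
-- Sketch only: the values below are placeholders. In OrmLite they would be
-- supplied through the anonymous object passed to db.Select<Product>().
DECLARE @pClientID int = 0,
        @pEnvironmentCode varchar(8000) = 'LIVE',
        @pSessionId int = 12345,
        @pPageNumber int = 0,              -- number of rows to skip
        @pNumberOfRowsPerPage int = 1000;  -- page size

SELECT [t1].*
FROM (
    SELECT
        [t0].*,
        -- [RowNum] is a hypothetical alias; any name not used by the function works.
        ROW_NUMBER() OVER (ORDER BY [t0].[ProductId], [t0].[ProductName]) AS [RowNum]
    FROM [dbo].[Search_Products_View](@pClientID, @pEnvironmentCode, @pSessionId) AS [t0]
) AS [t1]
WHERE [t1].[RowNum] BETWEEN @pPageNumber + 1 AND @pPageNumber + @pNumberOfRowsPerPage
ORDER BY [t1].[RowNum];
```

With this shape the row filter runs on the SQL Server, so only one page of rows crosses the wire. Note that each returned row also carries the extra [RowNum] column.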
I am writing a C++ application with Visual Studio 2008 and ADO (not ADO.NET). It performs the following tasks one by one:
1. Create a table in a SQL Server database, as follows:
CREATE TABLE MyTable
(
[S] bigint,
[L] bigint,
[T] tinyint,
[I1] int,
[I2] smallint,
[P] bigint,
[PP] bigint,
[NP] bigint,
[D] bit,
[U] bit
);
2. Insert 5,030,242 records via BULK INSERT.
3. Create an index on the table:
CREATE Index [MyIndex] ON MyTable ([P]);
4. Run a function that performs 65,000,000 lookups, each using the following query:
SELECT [S], [L]
FROM MyTable
WHERE [P] = ?
Each query returns either nothing or exactly one row. If a row comes back, I convert [S] to a file pointer and then read data from the offset specified by [L].
Step 4 takes a lot of time. Profiling shows that the lookup query accounts for most of it: each lookup takes about 0.01458 seconds.
I have tried to improve the performance by doing the following:
Use a parameterized ADO query (see step 4).
Select only the required columns. Originally I used "Select *" for step 4; now I use Select [S], [L] instead. This improves performance by about 1.5%.
Tried both a clustered and a non-clustered index on [P]. The non-clustered index seems to be slightly better.
Is there any other way to improve the lookup performance?
Note: [P] is unique in the table.
Thank you very much.
You need to batch the work and perform one query that returns many rows, instead of many queries each returning only one row (and incurring a separate round-trip to the database).
The way to do it in SQL Server is to rewrite the query to use a table-valued parameter (TVP), and pass all the search criteria (denoted as ? in your question) together in one go.
First we need to declare the type that the TVP will use:
CREATE TYPE MyTableSearch AS TABLE (
P bigint NOT NULL
);
And then the new query will be pretty simple:
SELECT
S,
L
FROM
@input I
JOIN MyTable
ON I.P = MyTable.P;
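To try the batched lookup directly in SSMS before wiring up any client code, a table variable of the new type can stand in for the parameter. This is only a sketch: it assumes the MyTableSearch type and MyTable already exist, the key values are made up, and returning P alongside S and L is my addition so that each result row can be matched back to the key that produced it.

```sql
-- Assumes CREATE TYPE MyTableSearch and MyTable from above have been run.
DECLARE @input MyTableSearch;

-- Made-up key values standing in for one batch of lookups.
INSERT INTO @input (P) VALUES (1001), (1002), (1003);

SELECT
    MyTable.P,   -- included so the caller can tell which lookup matched
    MyTable.S,
    MyTable.L
FROM @input I
JOIN MyTable
    ON I.P = MyTable.P;
```

Keys with no match simply produce no row, which mirrors the "returns nothing" case of the single-row query.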
The main complication is on the client side, in how to bind the TVP to the query. Unfortunately, I'm not familiar with ADO. For what it's worth, this is how it would be done under ADO.NET and C#:
static IEnumerable<(long S, long L)> Find(
SqlConnection conn,
SqlTransaction tran,
IEnumerable<long> input
) {
const string sql = @"
SELECT
S,
L
FROM
@input I
JOIN MyTable
ON I.P = MyTable.P
";
using (var cmd = new SqlCommand(sql, conn, tran)) {
var record = new SqlDataRecord(new SqlMetaData("P", SqlDbType.BigInt));
var param = new SqlParameter("input", SqlDbType.Structured) {
Direction = ParameterDirection.Input,
TypeName = "MyTableSearch",
Value = input.Select(
p => {
record.SetValue(0, p);
return record;
}
)
};
cmd.Parameters.Add(param);
using (var reader = cmd.ExecuteReader())
while (reader.Read())
yield return (reader.GetInt64(0), reader.GetInt64(1));
}
}
Note that we reuse the same SqlDataRecord for all input rows, which minimizes allocations. This is documented behavior, and it works because ADO.NET streams TVPs.
"Note: [P] is unique in the table."
Then you should make the index on P unique too, both for correctness and to avoid wasting space on the uniquifier.
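A minimal sketch of that change, assuming the index is still named MyIndex as in the question and is not backing a constraint:

```sql
-- Replace the non-unique index from the question with a unique one.
DROP INDEX [MyIndex] ON MyTable;
CREATE UNIQUE INDEX [MyIndex] ON MyTable ([P]);
```

The CREATE UNIQUE INDEX statement fails if the table already contains duplicate values of [P], so it also doubles as a check of the uniqueness assumption.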
As a forewarning, I'm working with a SQL query generated by Entity Framework, although Entity Framework itself is irrelevant to the question.
Some context:
I am trying to pull specific records for a batch of 4,000 C# objects and perform an update or insert on them. I do not have the primary keys of the records, because the objects come from an API, so I have to use a unique set of columns to pull the correct record.
The (simplified) queries and their execution plans:
Parameterized query (parameters are declared with a value set to 1 for demonstration purposes)
SELECT [x].[Id]
FROM [Gradebook].[AssignmentScore] AS [x]
WHERE
( ( ((([x].[CourseSectionAssignment_Id] = @__CSA_1) AND ([x].[Student_Id] = @__S_ID_1)) AND ([x].[AssessmentGBID] = @__A_GBID_1))
OR ((([x].[CourseSectionAssignment_Id] = @__CSA_2) AND ([x].[Student_Id] = @__S_ID_2)) AND ([x].[AssessmentGBID] = @__A_GBID_2)) )
OR ((([x].[CourseSectionAssignment_Id] = @__CSA_3) AND ([x].[Student_Id] = @__S_ID_3)) AND ([x].[AssessmentGBID] = @__A_GBID_3)) )
And its (unfavorable) execution plan:
Literal values query:
SELECT [x].[Id]
FROM [Gradebook].[AssignmentScore] AS [x]
WHERE
( ( ((([x].[CourseSectionAssignment_Id] = 1) AND ([x].[Student_Id] = 2)) AND ([x].[AssessmentGBID] = 3))
OR ((([x].[CourseSectionAssignment_Id] = 4) AND ([x].[Student_Id] = 5)) AND ([x].[AssessmentGBID] = 6)) )
OR ((([x].[CourseSectionAssignment_Id] = 7) AND ([x].[Student_Id] = 8)) AND ([x].[AssessmentGBID] = 9)) )
And its (favorable) execution plan:
I know why, or at least I believe I know why, the execution plans are different. That is because the optimizer does not know whether the parameters will be NULL and must optimize around that case. Testing the literal query with NULLs creates the unfavorable execution plan. (Why wouldn't parameter sniffing see that the values are not null and create the better execution plan?)
Currently in the C# code, I am manually using expression trees to replace the object's properties with expression constants so that the generated query is the literal query. As far as I'm aware, literals force SQL Server to generate a new execution plan each time, which isn't great.
I would like the parameterized query to generate the favorable execution plan. So far, the only answer I've found is using the OPTION(RECOMPILE) hint, which isn't exactly what I want, since it forces the execution plan to be recreated.
How can I use the same, favorable execution plan each time with the parameterized query?
If you can't get around a bad index, in my experience, this will consistently produce a good execution plan.
SELECT [x].[Id]
FROM [Gradebook].[AssignmentScore] AS [x]
WHERE [x].[CourseSectionAssignment_Id] = 1
AND [x].[Student_Id] = 2
AND [x].[AssessmentGBID] = 3
UNION
SELECT [x].[Id]
FROM [Gradebook].[AssignmentScore] AS [x]
WHERE [x].[CourseSectionAssignment_Id] = 4
AND [x].[Student_Id] = 5
AND [x].[AssessmentGBID] = 6
UNION
SELECT [x].[Id]
FROM [Gradebook].[AssignmentScore] AS [x]
WHERE [x].[CourseSectionAssignment_Id] = 7
AND [x].[Student_Id] = 8
AND [x].[AssessmentGBID] = 9
Alternatively, you could create a temp table with the "valid" combinations and simply join to it on these three fields.
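A rough sketch of the temp table variant follows. The table name #Wanted and the int column types are assumptions for illustration; use whatever types the real AssignmentScore columns have.

```sql
-- Hypothetical temp table holding the key combinations to look up.
-- Column types are assumed to be int; adjust to match the real table.
CREATE TABLE #Wanted (
    CourseSectionAssignment_Id int NOT NULL,
    Student_Id                 int NOT NULL,
    AssessmentGBID             int NOT NULL
);

-- Same three combinations as in the literal query above.
INSERT INTO #Wanted (CourseSectionAssignment_Id, Student_Id, AssessmentGBID)
VALUES (1, 2, 3), (4, 5, 6), (7, 8, 9);

SELECT [x].[Id]
FROM [Gradebook].[AssignmentScore] AS [x]
JOIN #Wanted AS [w]
    ON  [x].[CourseSectionAssignment_Id] = [w].[CourseSectionAssignment_Id]
    AND [x].[Student_Id]                 = [w].[Student_Id]
    AND [x].[AssessmentGBID]             = [w].[AssessmentGBID];

DROP TABLE #Wanted;
```

The query text stays the same no matter how many combinations are loaded, so the plan does not depend on the length of an OR chain.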
I want to put this query into a new view. For your information dbo.TransferAS400Auftrag is also a view.
SELECT dbo.TransferAS400Auftrag.Angebotsnummer AS AngNr1,
dbo.CSDokument.Angebotsnummer AS AngNr2,
dbo.TransferAS400Auftrag.OfferAngebotsnummer AS OAngNr1,
substring(dbo.TransferAS400Auftrag.OfferAngebotsnummer, 1, 10) AS OAngNr1_SUB10,
dbo.CSDokument.OfferAngebotsnummer AS OAngNr2,
substring(dbo.CSDokument.OfferAngebotsnummer, 1, 10) AS OAngNr2_SUB10
FROM dbo.TransferAS400Auftrag INNER JOIN
dbo.CSDokument ON dbo.TransferAS400Auftrag.Angebotsnummer =
dbo.CSDokument.Angebotsnummer
WHERE (LEN(dbo.TransferAS400Auftrag.OfferAngebotsnummer) > 10) AND
substring(dbo.TransferAS400Auftrag.OfferAngebotsnummer, 1, 10)
= substring(dbo.CSDokument.OfferAngebotsnummer, 1, 10)
However, the view builder in Management Studio always moves the substring() = substring() condition from the WHERE clause into the INNER JOIN, and after that change I can't save the view (error: "object reference not set to an instance of an object"). Why is it not possible to use substring() = substring() in the WHERE clause? Or can I reach the goal in another way?
Hm ok with CREATE VIEW it worked. Probably it is really a problem with the builder. Thanks for the hints.
Just create it with Transact-SQL instead of the Management Studio view designer, like this:
CREATE VIEW ViewName
AS
SELECT dbo.TransferAS400Auftrag.Angebotsnummer AS AngNr1,
dbo.CSDokument.Angebotsnummer AS AngNr2,
dbo.TransferAS400Auftrag.OfferAngebotsnummer AS OAngNr1,
substring(dbo.TransferAS400Auftrag.OfferAngebotsnummer, 1, 10) AS OAngNr1_SUB10,
dbo.CSDokument.OfferAngebotsnummer AS OAngNr2,
substring(dbo.CSDokument.OfferAngebotsnummer, 1, 10) AS OAngNr2_SUB10
FROM dbo.TransferAS400Auftrag INNER JOIN
dbo.CSDokument ON dbo.TransferAS400Auftrag.Angebotsnummer =
dbo.CSDokument.Angebotsnummer
WHERE (LEN(dbo.TransferAS400Auftrag.OfferAngebotsnummer) > 10) AND
substring(dbo.TransferAS400Auftrag.OfferAngebotsnummer, 1, 10)
= substring(dbo.CSDokument.OfferAngebotsnummer, 1, 10);
I have a table in my SQL Server 2008 R2 database which includes two nullable decimal(16,6) columns. Let's call them Column1 and Column2.
When I try to run a Linq query against the entity generated from this table:
Table.Select(r => new Foo
{
Bar = (r.Column1 + r.Column2) / 2m
}
);
I get a System.OverflowException if column1 + column2 >= 15846. The message of the exception is only:
Conversion overflows.
With a bit of trial and error I've managed to make the query work with the following:
Table.Select(r => new Foo
{
Bar = (r.Column1 + r.Column2).HasValue ?
(r.Column1 + r.Column2).Value / 2m : 0
}
);
However, I was wondering if anyone could explain what was going wrong with the initial query.
Edit
The first query generates this SQL:
SELECT
1 AS [C1],
([Extent1].[Column1] + [Extent1].[Column2]) / cast(2 as decimal(18)) AS [C2]
FROM [dbo].[Table] AS [Extent1]
With a value of 10000 for both columns, running the query manually in SSMS the result is 10000.0000000000000000000000000 (25 decimal zeros).
The second query has this SQL:
SELECT
1 AS [C1],
CASE WHEN ([Extent1].[Column1] + [Extent1].[Column2] IS NOT NULL)
THEN ([Extent1].[Column1] + [Extent1].[Column2]) / cast(2 as decimal(18))
ELSE cast(0 as decimal(18))
END AS [C2]
FROM [dbo].[Table] AS [Extent1]
Running the query in SSMS returns 10000.00000000000000000000 (20 decimal zeros). Apparently there is a problem when EF tries to convert the first value (with 25 decimal zeros) into a decimal but with the second (with 20 decimal zeros) it works.
In the meantime it turned out that the problem also occurs with non-nullable columns and even a single decimal(16, 6) column. The following ...
Table.Select(r => new Foo
{
Bar = r.Column1 / 2m
}
);
... throws the same conversion exception (with a value of 20000 in the Column1).
Why do those two SQL queries result in two different numbers of digits?
And why can't the first number be converted into a decimal by EF?
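One way to investigate both questions is to ask SQL Server which precision and scale it assigns to each expression. The sketch below uses a table variable as a stand-in for the real table (an assumption) and the built-in SQL_VARIANT_PROPERTY function. Applying the documented precision and scale rules by hand, I would expect decimal(36,25) for the plain division and decimal(38,20) for the CASE expression, which would agree with the 25 and 20 zeros seen in SSMS, but treat that as a prediction to verify rather than a confirmed result.

```sql
-- Table variable standing in for the real table (assumption for illustration).
DECLARE @t TABLE (Column1 decimal(16,6) NULL, Column2 decimal(16,6) NULL);
INSERT INTO @t (Column1, Column2) VALUES (10000, 10000);

SELECT
    SQL_VARIANT_PROPERTY(v.FirstQuery,  'Precision') AS FirstPrecision,
    SQL_VARIANT_PROPERTY(v.FirstQuery,  'Scale')     AS FirstScale,
    SQL_VARIANT_PROPERTY(v.SecondQuery, 'Precision') AS SecondPrecision,
    SQL_VARIANT_PROPERTY(v.SecondQuery, 'Scale')     AS SecondScale
FROM (
    SELECT
        -- Expression from the first generated query.
        (Column1 + Column2) / CAST(2 AS decimal(18)) AS FirstQuery,
        -- Expression from the second generated query.
        CASE WHEN (Column1 + Column2 IS NOT NULL)
             THEN (Column1 + Column2) / CAST(2 AS decimal(18))
             ELSE CAST(0 AS decimal(18))
        END AS SecondQuery
    FROM @t
) AS v;
```

If the scales do come out as 25 and 20, a plausible explanation for the exception is the size of System.Decimal: it stores a 96-bit integer plus a scale, so its largest value is 79,228,162,514,264,337,593,543,950,335. With 25 decimal places kept, the largest representable number is about 7922.8, so any result of 7923 or more (that is, a sum of 15846 or more) cannot be represented without dropping digits. With 20 decimal places there is room for values far above 10000. The ELSE branch, typed decimal(18), appears to be what forces the CASE result to the maximum precision of 38 and thereby trims the scale.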