DataReader and large datasets - sql-server

I have a query that returns between 10k and 20k rows. I'm dumping this data into an IEnumerable<T> in what, as far as I know, is the fastest possible way:
using (var rdr = cmd.ExecuteReader()) {
    var trackIdCol = rdr.GetOrdinal("TrackId");
    var dateTimeCol = rdr.GetOrdinal("ActDateTime");
    var clicksCol = rdr.GetOrdinal("Clicks");
    var ipCol = rdr.GetOrdinal("IPAddress");
    while (rdr.Read()) {
        yield return new SiteClick() {
            TrackId = (int)rdr[trackIdCol],
            DateTime = (DateTime)rdr[dateTimeCol],
            Clicks = (int)rdr[clicksCol],
            IPAddress = rdr[ipCol] as string
        };
    }
}
The query takes about 11s to return all the results in SSMS and SSDT, but the code above takes over 2 minutes. There has to be something I'm doing wrong here. SqlDataAdapter.Fill() also takes about 2 minutes to run, if that helps.
It is worth noting that our database is horribly unoptimized. Just the fact that it takes 11 seconds to get results from that query in SSMS is ridiculous, but I gotta work with what I got. If the query executes quickly in SSMS, but an empty while(rdr.Read()){} still takes 2 minutes, is it possible that the DB is still the issue?

You can't dump data into an IEnumerable directly, because it is an interface that needs an implementation behind the scenes. In your sample, yield provides that implementation. But yield is fairly expensive: on every iteration its state machine is saved and restored, and that overhead adds up when iterating database rows.
The data reader itself is the fastest iterator, so if you can consume it directly, that is the fastest way. The drawback is that such database cursors can hold locks that block other sessions, so use the reader directly only in single-user environments or when you can read uncommitted data.
To reduce the time locks are held, you can copy the data into an in-memory IEnumerable implementation, for example a List. Just set a large enough initial capacity to avoid frequent reallocations, and it will be about as fast as the underlying array. The drawback of this approach is memory usage, but in your case that will be less than 1 MB.
List<SiteClick> list = new List<SiteClick>(20000);
You can also improve performance and memory usage by defining SiteClick as a struct rather than a class; the list will then contain the values themselves rather than references.
You can gain a little more performance by using the typed reader methods:
TrackId = rdr.GetInt32(trackIdCol),
DateTime = rdr.GetDateTime(dateTimeCol),
Clicks = rdr.GetInt32(clicksCol),
IPAddress = rdr.GetString(ipCol)
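Putting those pieces together, a minimal sketch of the reading loop might look like this (same column names as in the question; the IPAddress read is guarded because the original `as string` cast tolerates NULLs while GetString does not):
var list = new List<SiteClick>(20000);   // pre-sized to avoid reallocations

using (var rdr = cmd.ExecuteReader())
{
    var trackIdCol = rdr.GetOrdinal("TrackId");
    var dateTimeCol = rdr.GetOrdinal("ActDateTime");
    var clicksCol = rdr.GetOrdinal("Clicks");
    var ipCol = rdr.GetOrdinal("IPAddress");

    while (rdr.Read())
    {
        list.Add(new SiteClick
        {
            TrackId = rdr.GetInt32(trackIdCol),
            DateTime = rdr.GetDateTime(dateTimeCol),
            Clicks = rdr.GetInt32(clicksCol),
            IPAddress = rdr.IsDBNull(ipCol) ? null : rdr.GetString(ipCol)
        });
    }
}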
UPDATE: With SQL Server it is common to test queries in SSMS, but SSMS uses different session options than the client library defaults (ARITHABORT, for example), so the same query can behave differently. You can set the option manually on the connection to test:
// Use the same connection as for data reading
var cmdOptions = connection.CreateCommand();
cmdOptions.CommandType = CommandType.Text;
cmdOptions.CommandText = "SET ARITHABORT ON";
cmdOptions.ExecuteNonQuery();
According to the MS recommendations (http://msdn.microsoft.com/en-us/library/aa933126%28SQL.80%29.aspx), it's better to ensure that you have these options set:
SET ANSI_NULLS ON
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON
SET CONCAT_NULL_YIELDS_NULL ON
SET QUOTED_IDENTIFIER ON
SET NUMERIC_ROUNDABORT OFF
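For instance, a minimal sketch that applies these options on the same connection before the main query runs (assuming `connection` is already open, and including ARITHABORT as set above):
var cmdOptions = connection.CreateCommand();
cmdOptions.CommandType = CommandType.Text;
cmdOptions.CommandText = @"
    SET ANSI_NULLS ON;
    SET ANSI_PADDING ON;
    SET ANSI_WARNINGS ON;
    SET ARITHABORT ON;
    SET CONCAT_NULL_YIELDS_NULL ON;
    SET QUOTED_IDENTIFIER ON;
    SET NUMERIC_ROUNDABORT OFF;";
cmdOptions.ExecuteNonQuery();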

Related

Suspending Azure function until Entity Framework makes changes

I have an Azure function (Iot hub trigger) that:
selects a top 1 record ordered by time in descending order
compares with a new record that comes
writes the coming record only if it differs from the selected one (some fields are different)
The issue pops up when records come into the Azure function very rapidly: I end up with duplicates in the database. I guess this is because SQL Server doesn't have enough time to commit the change before the next record arrives, so when the Azure function selects the latest record, it actually receives an outdated one.
I use EF Core.
I believe the issue is not with the function itself but with the transactional nature of the operation you described. To solve your issue trivially, you can try using a transaction with the highest isolation level:
using (var transaction = new TransactionScope(
    TransactionScopeOption.Required,
    new TransactionOptions
    {
        // With this isolation level all data modifications are sequential
        IsolationLevel = IsolationLevel.Serializable
    }))
{
    using (var connection = new SqlConnection("YOUR CONNECTION"))
    {
        connection.Open();
        try
        {
            // Run a raw ADO.NET command in the transaction
            var command = connection.CreateCommand();
            // Your reading query (just for example's sake)
            command.CommandText = "SELECT TOP (1) * FROM dbo.Whatever";
            var result = command.ExecuteScalar();

            // Run an EF Core command in the transaction
            var options = new DbContextOptionsBuilder<TestContext>()
                .UseSqlServer(connection)
                .Options;
            using (var context = new TestContext(options))
            {
                context.Items.Add(result);
                context.SaveChanges();
            }

            // Commit the transaction if all commands succeed; it will auto-rollback
            // when disposed if either command fails
            transaction.Complete();
        }
        catch (System.Exception)
        {
            // TODO: Handle failure
        }
    }
}
You should adjust the code to your needs, but you get the idea.
That said, I would rather avoid the problem entirely and not modify any records at all, but instead always insert them and select the latest one afterwards. Transactions are tricky in application code; applied in the wrong place or in the wrong way, they can cause performance degradation and deadlocks.
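For illustration, a minimal sketch of that insert-only approach (the entity, DbSet, and property names here are hypothetical, not from the question):
using (var context = new TestContext(options))
{
    // Always insert the incoming record; no read-compare-write race.
    context.Readings.Add(incomingReading);    // hypothetical DbSet and entity

    context.SaveChanges();

    // When the latest value is actually needed, query for it explicitly.
    var latest = context.Readings
        .OrderByDescending(r => r.Timestamp)  // hypothetical timestamp column
        .FirstOrDefault();
}
Duplicate-looking rows can then be filtered out when reading, or by a periodic cleanup job, instead of being prevented at write time.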

Fetching ElasticSearch Results into SQL Server by calling Web Service using SQL CLR

Code Migration due to Performance Issues :-
SQL Server LIKE Condition ( BEFORE )
SQL Server Full Text Search --> CONTAINS ( BEFORE )
Elastic Search ( CURRENTLY )
Achieved So Far :-
We have a web page created in ASP.Net Core which has an Auto Complete Drop Down of 2.5+ million companies indexed in Elastic Search: https://www.99corporates.com/
Due to performance issues we have successfully shifted our code from SQL Server Full Text Search to Elastic Search and using NEST v7.2.1 and Elasticsearch.Net v7.2.1 in our .Net Code.
Still looking for a solution :-
If the user does not select a company from the Auto Complete list and simply enters a few characters and clicks Go, then a list should be displayed, which we had done earlier using SQL Server Full Text Search --> CONTAINS.
Can we call the ASP.Net web service which we have created from SQL CLR, with code like SELECT * FROM dbo.Table WHERE Name IN( dbo.SQLWebRequest('') )?
[System.Web.Script.Services.ScriptMethod()]
[System.Web.Services.WebMethod]
public static List<string> SearchCompany(string prefixText, int count)
{
}
Any better or alternate option
While that solution (i.e. the SQL-APIConsumer SQLCLR project) "works", it is not scalable. It also requires setting the database to TRUSTWORTHY ON (a security risk), and it loads a few assemblies as UNSAFE, such as Json.NET. That is risky if any of them use static variables for caching and expect each caller to be isolated in its own App Domain, because SQLCLR is a single, shared App Domain: static variables are shared across all callers, and multiple concurrent threads can cause race conditions. This is not to say that this is definitely happening, since I haven't seen the code, but if you haven't reviewed the code or tested with multiple concurrent threads to ensure it doesn't pose a problem, then it's definitely a gamble with regards to stability and predictable, expected behavior.
To a slight degree I am biased given that I do sell a SQLCLR library, SQL#, in which the Full version contains a stored procedure that also does this but a) handles security properly via signatures (it does not enable TRUSTWORTHY), b) allows for handling scalability, c) does not require any UNSAFE assemblies, and d) handles more scenarios (better header handling, etc). It doesn't handle any JSON, it just returns the web service response and you can unpack that using OPENJSON or something else if you prefer. (yes, there is a Free version of SQL#, but it does not contain INET_GetWebPages).
HOWEVER, I don't think SQLCLR is a good fit for this scenario in the first place. In your first two versions of this project (using LIKE and then CONTAINS) it made sense to send the user input directly into the query. But now that you are using a web service to get a list of matching values from that user input, you are no longer confined to that approach. You can, and should, handle the web service / Elastic Search portion of this separately, in the app layer.
Rather than passing the user input into the query, only to have the query pause to get that list of 0 or more matching values, you should do the following:
Before executing any query, get the list of matching values directly in the app layer.
If no matching values are returned, you can skip the database call entirely as you already have your answer, and respond immediately to the user (much faster response time when no matches return)
If there are matches, then execute the search stored procedure, sending that list of matches as-is via Table-Valued Parameter (TVP) which becomes a table variable in the stored procedure. Use that table variable to INNER JOIN against the table rather than doing an IN list since IN lists do not scale well. Also, be sure to send the TVP values to SQL Server using the IEnumerable<SqlDataRecord> method, not the DataTable approach as that merely wastes CPU / time and memory.
For example code on how to accomplish this correctly, please see my answer to Pass Dictionary to Stored Procedure T-SQL
In C#-style pseudo-code, this would be something along the lines of the following:
List<string> companies;
companies = SearchCompany(PrefixText, Count);

if (companies.Count == 0)
{
    Response.Write("Nope");
}
else
{
    using (SqlConnection db = new SqlConnection(connectionString))
    {
        using (SqlCommand batch = db.CreateCommand())
        {
            batch.CommandType = CommandType.StoredProcedure;
            batch.CommandText = "ProcName";
            SqlParameter tvp = new SqlParameter("ParamName", SqlDbType.Structured);
            tvp.Value = MethodThatYieldReturnsList(companies);
            batch.Parameters.Add(tvp);
            db.Open();
            using (SqlDataReader results = batch.ExecuteReader())
            {
                if (results.HasRows)
                {
                    // deal with results
                    Response.Write(results....);
                }
            }
        }
    }
}
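For illustration, here is a minimal sketch of what the MethodThatYieldReturnsList helper above could look like (the table type name dbo.CompanyList and the column size are assumptions, not from the answer):
// Requires Microsoft.SqlServer.Server (SqlDataRecord, SqlMetaData).
// Streams the company names as a TVP, assuming a table type like:
// CREATE TYPE dbo.CompanyList AS TABLE (Name NVARCHAR(200));
private static IEnumerable<SqlDataRecord> MethodThatYieldReturnsList(List<string> companies)
{
    // Single NVARCHAR column; must match the table type definition.
    var metaData = new SqlMetaData("Name", SqlDbType.NVarChar, 200);

    foreach (string company in companies)
    {
        var record = new SqlDataRecord(metaData);
        record.SetString(0, company);
        yield return record;   // streamed row by row, no DataTable materialized
    }
}
Note that the SqlParameter in the pseudo-code would also need its TypeName set to the table type, e.g. tvp.TypeName = "dbo.CompanyList";.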
Done. Got the solution.
Used SQL CLR https://github.com/geral2/SQL-APIConsumer
exec [dbo].[APICaller_POST]
    @URL = 'https://www.-----/SearchCompany'
   ,@JsonBody = '{"searchText":"GOOG","count":10}'
Let me know if there is any other / better options to achieve this.

Same SQL query is fast or slow depending on execution context

I've encountered a strange phenomenon while investigating a slow view of a typical ASP.NET MVC application. One of the queries is running ridiculously slowly for no obvious reason. The LINQ query in question looks like this (Db is the DbContext):
var testResults = Db.CustomTestResults
    .Include(tr => tr.TestMachine.Platform)
    .Include(tr => tr.TestCase)
    .Include(tr => tr.CustomTestResultAnalysis.Select(tra => tra.AnalysisOutcomeData))
    .Where(tr => tr.CustomTestBuildId == testBuild.Id)
    .ToList()
    .AsReadOnly();
Nothing special, actually. Depending on the filter, the result set can vary in size from 10 to 10,000 records at most.
The generated SQL query (captured from the LINQ debug log), executed from SSMS, runs fast: about 2 seconds for the largest sets and less than a second for smaller ones. However, when run by IIS, strange things happen. The queries run roughly 100x slower: the smaller ones take ~10 seconds to execute, and the larger ones fail due to query execution timeout. I'm not sure if any other queries are affected, but this is the only one dealing with large data sets, so the problem is most obvious here.
As if this was not confusing enough, this same code was running perfectly as expected not so long ago, so the bug seems to be caused by some external factor. The database is SQL Server 2014 SP2, EF is at v6.2, IIS 7.5.
Would appreciate any ideas in what areas and how I could investigate this further.
As it turned out, the issue was in SQL Server's optimizations (most likely plan caching and parameter sniffing), which kick in some time after multiple runs of similar queries. The problem can be detected by making any irrelevant change to the original query, which fixes performance for some time.
This behaviour can be properly mitigated by controlling query command options. One of the solutions for EF is demonstrated here.
As a temporary "quick-and-dirty" solution I used this approach to randomize the query each time, thus preventing the SQL Server engine from reusing the cached plan:
private static IQueryable<CustomTestResult> RandomizeQuery(IQueryable<CustomTestResult> query)
{
    const int minConditions = 1;
    const int maxConditions = 5;
    const int minId = -100;
    const int maxId = -1;

    var random = new Random();
    var conditionsCount = random.Next(minConditions, maxConditions);

    for (int i = 0; i < conditionsCount; i++)
    {
        var randomId = random.Next(minId, maxId);
        query = query.Where(test => test.Id != randomId);
    }

    return query;
}
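For completeness, one common way to "control query command options" in EF6, as mentioned above, is a command interceptor that appends OPTION (RECOMPILE) so the plan is compiled for the actual parameter values. This is a hedged sketch, not necessarily the exact solution the linked article demonstrates:
using System.Data.Common;
using System.Data.Entity.Infrastructure.Interception;

public class RecompileHintInterceptor : DbCommandInterceptor
{
    public override void ReaderExecuting(
        DbCommand command, DbCommandInterceptionContext<DbDataReader> interceptionContext)
    {
        // Naive append; a real implementation should only target the affected queries.
        if (!command.CommandText.EndsWith("OPTION (RECOMPILE)"))
        {
            command.CommandText += " OPTION (RECOMPILE)";
        }
    }
}

// Registered once at application startup:
// DbInterception.Add(new RecompileHintInterceptor());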
Since the SQL has not changed but it behaves differently depending on where you run it, I would start with your session settings. A GREAT reference for the whys and hows is this article by Erland Sommarskog: http://www.sommarskog.se/query-plan-mysteries.html
It is long but I imagine you will find your answer in there.

Entity Framework insert (of one object) slow when table has large number of records

I have a large asp.net mvc application that runs on a database that is rapidly growing in size. When the database is empty, everything works quickly, but one of my tables now has 350K records in it and an insert is now taking 15s. Here is a snippet:
foreach (var packageSheet in packageSheets)
{
    // Create OrderSheets
    var orderSheet = new OrderSheet { Sheet = packageSheet.Sheet };

    // Add Default Options
    orderSheet.AddDefaultOptions();
    orderSheet.OrderPrints.Add(
        new OrderPrint
        {
            OrderPose = CurrentOrderSubject.OrderPoses.Where(op => op.Id == orderPoseId).Single(),
            PrintId = packageSheet.Sheet.Prints.First().Id
        });

    // Create OrderPackageSheets and add it to the order package held in the session
    var orderPackageSheet =
        new OrderPackageSheet
        {
            OrderSheet = orderSheet,
            PackageSheet = packageSheet
        };

    _orderPackageRepository.SaveChanges();
    ...
}
When I call SaveChanges at this point it takes 15s on the first loop iteration; each iteration after that is fast. I have indexed the tables in question, so I believe the database is tuned properly. It's the OrderPackageSheets table that contains 350K rows.
Can anyone tell me how I can optimize this to get rid of the delay?
Thank you!
EF can be slow if you are inserting a lot of rows at the same time.
context.Configuration.AutoDetectChangesEnabled = false; won't do much for you if this really is a web app.
You would need to share your table definition; for instance, you could use the Simple recovery model, which will improve insert performance.
Or, as mentioned, if you need to insert a lot of rows, use bulk inserts, as sketched below.
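For example, a minimal SqlBulkCopy sketch (the table name and the source DataTable are assumptions based on the question):
// Assumes a DataTable (or IDataReader) whose columns match dbo.OrderPackageSheets.
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "dbo.OrderPackageSheets";
        bulkCopy.BatchSize = 1000;                           // tune for your workload
        bulkCopy.WriteToServer(orderPackageSheetsTable);     // hypothetical DataTable
    }
}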
If the number of records is too high, you can use a stored procedure instead of EF.
If you need to use EF itself, disable automatic change detection on the context using
context.Configuration.AutoDetectChangesEnabled = false;
and save the context once after the loop, as sketched below.
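A minimal sketch of that pattern, reusing the names from the question (the DbSet name OrderPackageSheets is an assumption):
// Disable change detection, add everything, then save once after the loop.
context.Configuration.AutoDetectChangesEnabled = false;
try
{
    foreach (var packageSheet in packageSheets)
    {
        var orderSheet = new OrderSheet { Sheet = packageSheet.Sheet };
        orderSheet.AddDefaultOptions();
        // ... build OrderPrints / OrderPackageSheet as in the question ...
        context.OrderPackageSheets.Add(new OrderPackageSheet
        {
            OrderSheet = orderSheet,
            PackageSheet = packageSheet
        });
    }

    context.SaveChanges();   // one round of change detection, one batch of inserts
}
finally
{
    context.Configuration.AutoDetectChangesEnabled = true;
}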
Check these links
Efficient way to do bulk insert/update with Entity Framework
http://weblog.west-wind.com/posts/2013/Dec/22/Entity-Framework-and-slow-bulk-INSERTs

SQL Server : Delete statement between 1ms and

I use a SqlTransaction in my C# project, and I run a Delete statement with an ExecuteNonQuery call.
This works very well and I always have the same number of rows to delete, but 95% of the time it takes 1 ms, and approximately 5% of the time it takes between 300 and 500 ms.
My code:
using (SqlTransaction DbTrans = conn.BeginTransaction(IsolationLevel.ReadCommitted))
{
    SqlCommand dbQuery = conn.CreateCommand();
    dbQuery.Transaction = DbTrans;
    dbQuery.CommandType = CommandType.Text;
    dbQuery.CommandText = "delete from xy where id = @ID";
    dbQuery.Parameters.Add("@ID", SqlDbType.Int).Value = x.ID;
    dbQuery.ExecuteNonQuery();
}
Is something wrong with my code?
Read Understanding how SQL Server executes a query and How to analyse SQL Server performance to get started on troubleshooting such issues.
Of course I assume you have an index on xy.id. Your DELETE is most likely being blocked from time to time. This can have many causes:
data locks from other queries
IO stalls on your hardware
log growth events
etc.
The gist of it is that using the techniques in the articles linked above (especially the second one) you can identify the cause and address it appropriately.
Changes to your C# code will have little impact, if any at all. Using a stored procedure is not going to help. You need to root-cause the problem.
