SQL Server CE 4 (SQL Server Compact Edition 4.0) is no longer news (if it is to you, you could read this article).
It would be very interesting, though, to see how SQL Server CE 4 performance compares to other databases.
Especially:
SQLite
SQL Server (1)
SQL Server Express (1)
maybe Firebird
(1) for applications where functionality is comparable.
Unfortunately, Google turns up very few links on the subject right now. In fact, I was unable to find any for the right SQL CE version.
If anyone can find or share such information, let's collect it here for posterity.
In my opinion, it is incorrect to compare an embedded database (like SQL CE) with a server-side relational database (like all the rest, except for SQLite and the Embedded version of Firebird).
The main difference between them is that general-purpose server-side relational databases (MS SQL, MySQL, Firebird Classic and SuperServer, etc.) are installed as an independent service and run outside the scope of your main application. That is why they can perform much better: they have intrinsic support for multi-core and multi-CPU architectures, use OS features like pre-caching, VSS, etc. to increase throughput under intensive database load, and can claim as much memory as your OS can provide for a single service/application. It also means that their performance indicators are more or less independent of your application and depend largely on your hardware. In this respect I would say that the server version of any database will always outperform an embedded one.
SQL CE (along with Firebird Embedded, SQLite, TurboSQL and some others) is an embedded DB engine, meaning that the complete database is packed into a single DLL file (or at most two) distributed together with your application. This imposes evident size limitations (would you like to distribute a 30 MB DLL together with your 2-3 MB application?). These engines also run directly in the context of your application, so memory and performance for data access operations are shared with the other parts of your application: available memory, CPU time, disk throughput, etc. Computation-intensive threads running in parallel with your data access thread might dramatically decrease your database performance.
Because of their different areas of application, these databases offer different palettes of options: server databases provide extensive user and rights management and support for views and stored procedures, whereas embedded databases normally lack any user and rights management and have limited support for views and stored procedures (the latter lose most of the benefits they get from running on the server side). Data throughput is the usual bottleneck of an RDBMS; server versions are usually installed on striped RAID volumes, whereas embedded DBs are often memory-oriented (they try to keep all the active data in memory) and minimize storage access operations.
So what would probably make sense is to compare the performance of different embedded RDBMSs for .NET, like MS SQL CE 4.0, SQLite, Firebird Embedded and TurboSQL. I wouldn't expect drastic differences during usual non-peak operation, although some databases may provide better support for large BLOBs thanks to better integration with the OS.
-- update --
I have to take back my last words, for my quick implementation shows very interesting results.
I wrote a short console application to test both data providers. Here is the source code, in case you want to experiment with them on your own.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data.SQLite;
using System.Data.SqlServerCe;
using System.Data.Common;
namespace TestSQL
{
class Program
{
const int NUMBER_OF_TESTS = 1000;
private static string create_table;
private static string create_table_sqlce = "CREATE TABLE Test ( id integer not null identity primary key, textdata nvarchar(500));";
private static string create_table_sqlite = "CREATE TABLE Test ( id integer not null primary key, textdata nvarchar(500));";
private static string drop_table = "DROP TABLE Test";
private static string insert_data = "INSERT INTO Test (textdata) VALUES ('{0}');";
private static string read_data = "SELECT textdata FROM Test WHERE id = {0}";
private static string update_data = "UPDATE Test SET textdata = '{1}' WHERE id = {0}";
private static string delete_data = "DELETE FROM Test WHERE id = {0}";
static Action<DbConnection> ACreateTable = (a) => CreateTable(a);
static Action<DbConnection> ATestWrite = (a) => TestWrite(a, NUMBER_OF_TESTS);
static Action<DbConnection> ATestRead = (a) => TestRead(a, NUMBER_OF_TESTS);
static Action<DbConnection> ATestUpdate = (a) => TestUpdate(a, NUMBER_OF_TESTS);
static Action<DbConnection> ATestDelete = (a) => TestDelete(a, NUMBER_OF_TESTS);
static Action<DbConnection> ADropTable = (a) => DropTable(a);
static Func<Action<DbConnection>,DbConnection, TimeSpan> MeasureExecTime = (a,b) => { var start = DateTime.Now; a(b); var finish = DateTime.Now; return finish - start; };
static Action<string, TimeSpan> AMeasureAndOutput = (a, b) => Console.WriteLine(a, b.TotalMilliseconds);
static void Main(string[] args)
{
// opening databases
SQLiteConnection.CreateFile("sqlite.db");
SQLiteConnection sqliteconnect = new SQLiteConnection("Data Source=sqlite.db");
SqlCeConnection sqlceconnect = new SqlCeConnection("Data Source=sqlce.sdf");
sqlceconnect.Open();
sqliteconnect.Open();
Console.WriteLine("=Testing CRUD performance of embedded DBs=");
Console.WriteLine(" => Samplesize: {0}", NUMBER_OF_TESTS);
create_table = create_table_sqlite;
Console.WriteLine("==Testing SQLite==");
DoMeasures(sqliteconnect);
create_table = create_table_sqlce;
Console.WriteLine("==Testing SQL CE 4.0==");
DoMeasures(sqlceconnect);
Console.ReadKey();
}
static void DoMeasures(DbConnection con)
{
AMeasureAndOutput("Creating table: {0} ms", MeasureExecTime(ACreateTable, con));
AMeasureAndOutput("Writing data: {0} ms", MeasureExecTime(ATestWrite, con));
AMeasureAndOutput("Updating data: {0} ms", MeasureExecTime(ATestUpdate, con));
AMeasureAndOutput("Reading data: {0} ms", MeasureExecTime(ATestRead, con));
AMeasureAndOutput("Deleting data: {0} ms", MeasureExecTime(ATestDelete, con));
AMeasureAndOutput("Dropping table: {0} ms", MeasureExecTime(ADropTable, con));
}
static void CreateTable(DbConnection con)
{
var sqlcmd = con.CreateCommand();
sqlcmd.CommandText = create_table;
sqlcmd.ExecuteNonQuery();
}
static void TestWrite(DbConnection con, int num)
{
for (; num-- > 0; )
{
var sqlcmd = con.CreateCommand();
sqlcmd.CommandText = string.Format(insert_data,Guid.NewGuid().ToString());
sqlcmd.ExecuteNonQuery();
}
}
static void TestRead(DbConnection con, int num)
{
Random rnd = new Random(DateTime.Now.Millisecond);
for (var max = num; max-- > 0; )
{
var sqlcmd = con.CreateCommand();
sqlcmd.CommandText = string.Format(read_data, rnd.Next(1,num-1));
sqlcmd.ExecuteNonQuery();
}
}
static void TestUpdate(DbConnection con, int num)
{
Random rnd = new Random(DateTime.Now.Millisecond);
for (var max = num; max-- > 0; )
{
var sqlcmd = con.CreateCommand();
sqlcmd.CommandText = string.Format(update_data, rnd.Next(1, num - 1), Guid.NewGuid().ToString());
sqlcmd.ExecuteNonQuery();
}
}
static void TestDelete(DbConnection con, int num)
{
Random rnd = new Random(DateTime.Now.Millisecond);
var order = Enumerable.Range(1, num).ToArray<int>();
Action<int[], int, int> swap = (arr, a, b) => { int c = arr[a]; arr[a] = arr[b]; arr[b] = c; };
// shuffling the array
for (var max=num; max-- > 0; ) swap(order, rnd.Next(0, num - 1), rnd.Next(0, num - 1));
foreach(int index in order)
{
var sqlcmd = con.CreateCommand();
sqlcmd.CommandText = string.Format(delete_data, index);
sqlcmd.ExecuteNonQuery();
}
}
static void DropTable(DbConnection con)
{
var sqlcmd = con.CreateCommand();
sqlcmd.CommandText = drop_table;
sqlcmd.ExecuteNonQuery();
}
}
}
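A side note on the measurement itself: the program above takes its timings from DateTime.Now, whose resolution depends on the system timer and can be as coarse as several milliseconds, so the shortest intervals are only rough. A minimal sketch of the same kind of measuring helper built on System.Diagnostics.Stopwatch follows. The names Timing and Measure are made up for illustration, and Thread.Sleep merely stands in for a database call.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class Timing
{
    // Runs the action once and returns the elapsed wall-clock time.
    public static TimeSpan Measure(Action action)
    {
        Stopwatch sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.Elapsed;
    }

    static void Main()
    {
        // Thread.Sleep is only a placeholder for the operation being timed.
        TimeSpan elapsed = Measure(() => Thread.Sleep(50));
        Console.WriteLine("Elapsed: {0:F3} ms", elapsed.TotalMilliseconds);
        Console.WriteLine("High-resolution timer available: {0}", Stopwatch.IsHighResolution);
    }
}
```

Swapping this in would not change the order of magnitude of the results below; it would only make the small numbers more trustworthy.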
Necessary disclaimer:
I got these results on my machine: Dell Precision WorkStation T7400 equipped with 2 Intel Xeon E5420 CPUs and 8GB of RAM, running 64bit Win7 Enterprise.
I used the default settings for both DBs with connection string "Data Source=database_file_name".
I used the latest versions of both SQL CE 4.0 and SQLite/System.Data.SQLite (from today, June 3rd 2011).
Here are the results for two different samples:
=Testing CRUD performance of embedded DBs=
=> Samplesize: 200
==Testing SQLite==
Creating table: 396.0396 ms
Writing data: 22189.2187 ms
Updating data: 23591.3589 ms
Reading data: 21.0021 ms
Deleting data: 20963.0961 ms
Dropping table: 85.0085 ms
==Testing SQL CE 4.0==
Creating table: 16.0016 ms
Writing data: 25.0025 ms
Updating data: 56.0056 ms
Reading data: 28.0028 ms
Deleting data: 53.0053 ms
Dropping table: 11.0011 ms
... and a bigger sample:
=Testing CRUD performance of embedded DBs=
=> Samplesize: 1000
==Testing SQLite==
Creating table: 93.0093 ms
Writing data: 116632.6621 ms
Updating data: 104967.4957 ms
Reading data: 134.0134 ms
Deleting data: 107666.7656 ms
Dropping table: 83.0083 ms
==Testing SQL CE 4.0==
Creating table: 16.0016 ms
Writing data: 128.0128 ms
Updating data: 307.0307 ms
Reading data: 164.0164 ms
Deleting data: 306.0306 ms
Dropping table: 13.0013 ms
So, as you can see, any write operation (create, update, delete) takes almost 1000x more time in SQLite than in SQL CE. This does not necessarily reflect generally bad performance of this database and might be due to the following:
The data provider I use for SQLite is System.Data.SQLite, a mixed assembly containing both managed and unmanaged code (SQLite itself is written entirely in C, and the DLL only provides bindings). P/Invoke and data marshaling probably eat up a good part of the operation time.
Most likely SQL CE 4.0 caches all the data in memory by default, whereas SQLite flushes most data changes directly to disk every time a change happens. Both databases accept many parameters via the connection string and can be tuned appropriately.
I used a series of single queries to test the DBs. SQL CE, at least, supports bulk operations via special .NET classes that would be better suited here. If SQLite supports them too (sorry, I am not an expert here, and my quick search yielded nothing promising), it would be nice to compare them as well.
I have observed many problems with SQLite on x64 machines (using the same .NET adapter), from the data connection being closed unexpectedly to database file corruption. I presume there are some stability problems either with the data adapter or with the library itself.
Here is my freshly baked article about the benchmarking on the CodeProject website:
Benchmarking the performance of embedded DB for .Net: SQL CE 4.0 vs SQLite
(the article has a pending status now; you need to be logged in on CodeProject to access its content)
P.S.: I mistakenly marked my previous answer as a community wiki entry and won't get any reputation for it. This encouraged me to write the article for CodeProject on this topic, with somewhat optimized code, more information about embedded DBs and a statistical analysis of the results. So please vote this answer up if you like the article and my second answer here.
Because I'm having a real hard time with Alaudo's tests, test results, and ultimately with his conclusion, I went ahead and played around a bit with his program and came up with a modified version.
It tests each of the following 10 times and outputs the average times:
sqlite without transactions, using default journal_mode
sqlite with transactions, using default journal_mode
sqlite without transactions, using WAL journal_mode
sqlite with transactions, using WAL journal_mode
sqlce without transactions
sqlce with transactions
Here is the program (it's a class actually):
using System;
using System.Collections.Generic;
using System.Data.Common;
using System.Data.SqlServerCe;
using System.Data.SQLite;
using System.Diagnostics;
using System.IO;
using System.Linq;
class SqliteAndSqlceSpeedTesting
{
class Results
{
public string test_details;
public long create_table_time, insert_time, update_time, select_time, delete_time, drop_table_time;
}
enum DbType { Sqlite, Sqlce };
const int NUMBER_OF_TESTS = 200;
const string create_table_sqlite = "CREATE TABLE Test (id integer not null primary key, textdata nvarchar(500));";
const string create_table_sqlce = "CREATE TABLE Test (id integer not null identity primary key, textdata nvarchar(500));";
const string drop_table = "DROP TABLE Test";
const string insert_data = "INSERT INTO Test (textdata) VALUES ('{0}');";
const string read_data = "SELECT textdata FROM Test WHERE id = {0}";
const string update_data = "UPDATE Test SET textdata = '{1}' WHERE id = {0}";
const string delete_data = "DELETE FROM Test WHERE id = {0}";
public static void RunTests()
{
List<Results> results_list = new List<Results>();
for (int i = 0; i < 10; i++) {
results_list.Add(RunTest(DbType.Sqlite, false, false));
results_list.Add(RunTest(DbType.Sqlite, false, true));
results_list.Add(RunTest(DbType.Sqlite, true, false));
results_list.Add(RunTest(DbType.Sqlite, true, true));
results_list.Add(RunTest(DbType.Sqlce, false));
results_list.Add(RunTest(DbType.Sqlce, true));
}
foreach (var test_detail in results_list.GroupBy(r => r.test_details)) {
Console.WriteLine(test_detail.Key);
Console.WriteLine("Creating table: {0} ms", test_detail.Average(r => r.create_table_time));
Console.WriteLine("Inserting data: {0} ms", test_detail.Average(r => r.insert_time));
Console.WriteLine("Updating data: {0} ms", test_detail.Average(r => r.update_time));
Console.WriteLine("Selecting data: {0} ms", test_detail.Average(r => r.select_time));
Console.WriteLine("Deleting data: {0} ms", test_detail.Average(r => r.delete_time));
Console.WriteLine("Dropping table: {0} ms", test_detail.Average(r => r.drop_table_time));
Console.WriteLine();
}
}
static Results RunTest(DbType db_type, bool use_trx, bool use_wal = false)
{
DbConnection conn = null;
if (db_type == DbType.Sqlite)
conn = GetConnectionSqlite(use_wal);
else
conn = GetConnectionSqlce();
Results results = new Results();
results.test_details = string.Format("Testing: {0}, transactions: {1}, WAL: {2}", db_type, use_trx, use_wal);
results.create_table_time = CreateTable(conn, db_type);
results.insert_time = InsertTime(conn, use_trx);
results.update_time = UpdateTime(conn, use_trx);
results.select_time = SelectTime(conn, use_trx);
results.delete_time = DeleteTime(conn, use_trx);
results.drop_table_time = DropTableTime(conn);
conn.Close();
return results;
}
static DbConnection GetConnectionSqlite(bool use_wal)
{
SQLiteConnection conn = new SQLiteConnection("Data Source=sqlite.db");
if (!File.Exists(conn.Database))
SQLiteConnection.CreateFile("sqlite.db");
conn.Open();
if (use_wal) {
var command = conn.CreateCommand();
command.CommandText = "PRAGMA journal_mode=WAL";
command.ExecuteNonQuery();
}
return conn;
}
static DbConnection GetConnectionSqlce()
{
SqlCeConnection conn = new SqlCeConnection("Data Source=sqlce.sdf");
if (!File.Exists(conn.Database))
using (var sqlCeEngine = new SqlCeEngine("Data Source=sqlce.sdf"))
sqlCeEngine.CreateDatabase();
conn.Open();
return conn;
}
static long CreateTable(DbConnection con, DbType db_type)
{
Stopwatch sw = Stopwatch.StartNew();
var sqlcmd = con.CreateCommand();
if (db_type == DbType.Sqlite)
sqlcmd.CommandText = create_table_sqlite;
else
sqlcmd.CommandText = create_table_sqlce;
sqlcmd.ExecuteNonQuery();
return sw.ElapsedMilliseconds;
}
static long DropTableTime(DbConnection con)
{
Stopwatch sw = Stopwatch.StartNew();
var sqlcmd = con.CreateCommand();
sqlcmd.CommandText = drop_table;
sqlcmd.ExecuteNonQuery();
return sw.ElapsedMilliseconds;
}
static long InsertTime(DbConnection con, bool use_trx)
{
Stopwatch sw = Stopwatch.StartNew();
var sqlcmd = con.CreateCommand();
DbTransaction trx = null;
if (use_trx) {
trx = con.BeginTransaction();
sqlcmd.Transaction = trx;
}
for (int i = 0; i < NUMBER_OF_TESTS; i++) {
sqlcmd.CommandText = string.Format(insert_data, Guid.NewGuid().ToString());
sqlcmd.ExecuteNonQuery();
}
if (trx != null)
trx.Commit();
return sw.ElapsedMilliseconds;
}
static long SelectTime(DbConnection con, bool use_trx)
{
Stopwatch sw = Stopwatch.StartNew();
var sqlcmd = con.CreateCommand();
DbTransaction trx = null;
if (use_trx) {
trx = con.BeginTransaction();
sqlcmd.Transaction = trx;
}
Random rnd = new Random(DateTime.Now.Millisecond);
for (var max = NUMBER_OF_TESTS; max-- > 0; ) {
sqlcmd.CommandText = string.Format(read_data, rnd.Next(1, NUMBER_OF_TESTS - 1));
sqlcmd.ExecuteNonQuery();
}
if (trx != null)
trx.Commit();
return sw.ElapsedMilliseconds;
}
static long UpdateTime(DbConnection con, bool use_trx)
{
Stopwatch sw = Stopwatch.StartNew();
var sqlcmd = con.CreateCommand();
DbTransaction trx = null;
if (use_trx) {
trx = con.BeginTransaction();
sqlcmd.Transaction = trx;
}
Random rnd = new Random(DateTime.Now.Millisecond);
for (var max = NUMBER_OF_TESTS; max-- > 0; ) {
sqlcmd.CommandText = string.Format(update_data, rnd.Next(1, NUMBER_OF_TESTS - 1), Guid.NewGuid().ToString());
sqlcmd.ExecuteNonQuery();
}
if (trx != null)
trx.Commit();
return sw.ElapsedMilliseconds;
}
static long DeleteTime(DbConnection con, bool use_trx)
{
Stopwatch sw = Stopwatch.StartNew();
Random rnd = new Random(DateTime.Now.Millisecond);
var order = Enumerable.Range(1, NUMBER_OF_TESTS).ToArray<int>();
Action<int[], int, int> swap = (arr, a, b) => { int c = arr[a]; arr[a] = arr[b]; arr[b] = c; };
var sqlcmd = con.CreateCommand();
DbTransaction trx = null;
if (use_trx) {
trx = con.BeginTransaction();
sqlcmd.Transaction = trx;
}
// shuffling the array
for (var max = NUMBER_OF_TESTS; max-- > 0; ) swap(order, rnd.Next(0, NUMBER_OF_TESTS - 1), rnd.Next(0, NUMBER_OF_TESTS - 1));
foreach (int index in order) {
sqlcmd.CommandText = string.Format(delete_data, index);
sqlcmd.ExecuteNonQuery();
}
if (trx != null)
trx.Commit();
return sw.ElapsedMilliseconds;
}
}
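One detail of the delete test is worth noting. The ids are shuffled with NUMBER_OF_TESTS random pair swaps, and because the upper bound of Random.Next(min, max) is exclusive, rnd.Next(0, NUMBER_OF_TESTS - 1) never picks the last index. This is unlikely to affect the timings, but if a uniformly random order matters, a Fisher-Yates shuffle is the usual choice. A minimal sketch (the Shuffler class is a hypothetical helper, not part of the program above):

```csharp
using System;
using System.Linq;

static class Shuffler
{
    // Fisher-Yates: walk from the end and swap each slot with a random slot at or before it.
    public static void Shuffle(int[] items, Random rnd)
    {
        for (int i = items.Length - 1; i > 0; i--)
        {
            int j = rnd.Next(0, i + 1); // upper bound is exclusive, so j is in [0, i]
            int tmp = items[i];
            items[i] = items[j];
            items[j] = tmp;
        }
    }

    static void Main()
    {
        int[] order = Enumerable.Range(1, 10).ToArray();
        Shuffle(order, new Random());
        Console.WriteLine(string.Join(", ", order));
    }
}
```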
Here are the numbers I get:
Testing: Sqlite, transactions: False, WAL: False
Creating table: 24.4 ms
Inserting data: 3084.7 ms
Updating data: 3147.8 ms
Selecting data: 30 ms
Deleting data: 3182.6 ms
Dropping table: 14.5 ms
Testing: Sqlite, transactions: False, WAL: True
Creating table: 2.3 ms
Inserting data: 14 ms
Updating data: 12.2 ms
Selecting data: 6.8 ms
Deleting data: 11.7 ms
Dropping table: 0 ms
Testing: Sqlite, transactions: True, WAL: False
Creating table: 13.5 ms
Inserting data: 20.3 ms
Updating data: 24.5 ms
Selecting data: 7.8 ms
Deleting data: 22.3 ms
Dropping table: 16.7 ms
Testing: Sqlite, transactions: True, WAL: True
Creating table: 3.2 ms
Inserting data: 5.8 ms
Updating data: 4.9 ms
Selecting data: 4.4 ms
Deleting data: 3.8 ms
Dropping table: 0 ms
Testing: Sqlce, transactions: False, WAL: False
Creating table: 2.8 ms
Inserting data: 24.4 ms
Updating data: 42.8 ms
Selecting data: 30.4 ms
Deleting data: 38.3 ms
Dropping table: 3.3 ms
Testing: Sqlce, transactions: True, WAL: False
Creating table: 2.1 ms
Inserting data: 24.6 ms
Updating data: 44.2 ms
Selecting data: 32 ms
Deleting data: 37.8 ms
Dropping table: 3.2 ms
~3 seconds for 200 inserts or updates using SQLite might still seem a little high, but at least it's more reasonable than 23 seconds. Conversely, one might worry that SqlCe takes so little time to complete the same 200 inserts or updates, especially since there seems to be no real speed difference between running each SQL query in its own transaction and running them all together in one transaction. I don't know enough about SqlCe to explain this, but it worries me. Would it mean that when .Commit() returns, you are not assured that the changes have actually been written to disk?
I have recently worked on a project using SQL CE 4 and NHibernate, and I found the performance to be really good. With SQL CE 4 we were able to insert 8000 records in a second. With Oracle over the network we were only able to insert 100 records per second, even when the batch-size and seqhilo approaches were used.
I did not test it, but looking at some of the performance reports for NoSQL products for .NET, SQL CE 4 seems to be one of the best solutions for stand-alone applications.
Just avoid using Identity columns, we noticed the performance was 40 times better if they are not used. The same 8000 records were taking 40 secs to insert when an Identity column was used as PK.
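One way to do without an identity column is to assign the primary key in the application. The sketch below illustrates the hi/lo idea behind the seqhilo approach mentioned above: reserve a block of ids in one step, then hand them out from memory. Everything here is hypothetical and is not NHibernate's implementation. In particular, the nextHi delegate stands in for whatever reads and increments a counter stored in the database, and the class is not thread-safe.

```csharp
using System;

sealed class HiLoGenerator
{
    private readonly Func<long> nextHi; // hypothetical: fetches the next block number
    private readonly int blockSize;
    private long hi;
    private int lo;

    public HiLoGenerator(Func<long> nextHi, int blockSize)
    {
        this.nextHi = nextHi;
        this.blockSize = blockSize;
        this.lo = blockSize; // forces a block fetch on first use
    }

    // Returns the next id, fetching a new block only when the current one is used up.
    public long Next()
    {
        if (lo >= blockSize)
        {
            hi = nextHi();
            lo = 0;
        }
        return hi * blockSize + lo++;
    }
}

static class HiLoDemo
{
    static void Main()
    {
        long counter = 0; // in-memory stand-in for the database counter
        var generator = new HiLoGenerator(() => counter++, 1000);
        for (int i = 0; i < 3; i++)
            Console.WriteLine(generator.Next());
    }
}
```

With a block size of 1000, only one insert in a thousand needs the extra step of fetching a new block.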
I'm trying to do a "database side" bulk copy (i.e. SELECT INTO/INSERT INTO) using linq2db. However, my code is trying to bring the dataset over the wire which is not possible given the size of the DB in question.
My code looks like this:
using (var db = new MyDb()) {
    var list = db.SourceTable.
        Where(s => s.Year > 2012).
        GroupBy(s => new { s.Column1, s.Column2 }).
        Select(g => new DestinationTable {
            Property1 = "Constant Value",
            Property2 = g.First().Column1,
            Property3 = g.First().Column2,
            Property4 = g.Count(s => s.Column3 == 'Y')
        });
    db.Execute("TRUNCATE TABLE DESTINATION_TABLE");
    db.BulkCopy(new BulkCopyOptions {
        BulkCopyType = BulkCopyType.MultipleRows
    }, list);
}
The generated SQL looks like this:
BeforeExecute
-- DBNAME SqlServer.2017
TRUNCATE TABLE DESTINATION_TABLE
DataConnection
Query Execution Time (AfterExecute): 00:00:00.0361209. Records Affected: -1.
DataConnection
BeforeExecute
-- DBNAME SqlServer.2017
DECLARE @take Int -- Int32
SET @take = 1
DECLARE @take_1 Int -- Int32
SET @take_1 = 1
DECLARE @take_2 Int -- Int32
...
SELECT
(
SELECT TOP (@take)
[p].[YEAR]
FROM
[dbo].[SOURCE_TABLE] [p]
WHERE
(([p_16].[YEAR] = [p].[YEAR] OR [p_16].[YEAR] IS NULL AND [p].[YEAR] IS NULL) AND ...
...)
FROM SOURCE_TABLE p_16
WHERE p_16.YEAR > 2012
GROUP BY
...
DataConnection
That is all that is logged as the bulkcopy fails with a timeout, i.e. SqlException "Execution Timeout Expired".
Please note that running this query as an INSERT INTO statement takes less than 1 second directly in the DB.
PS: Does anyone have recommendations for good code-based ETL tools for large databases (1 TB+)? Given the DB size, I need things to run in the database and not bring data over the wire. I've tried pyspark, python bonobo and c# etlbox, and they all move too much data around. I thought linq2db had potential, i.e. that it would basically act like a C#-to-SQL transpiler, but it is also trying to move data around.
I would suggest rewriting your query, because GROUP BY cannot return the first element of a group. Also, Truncate is already part of the library.
var sourceQuery =
    from s in db.SourceTable
    where s.Year > 2012
    select new
    {
        Source = s,
        // Counts the 'Y' rows within each (Column1, Column2) group.
        Count = Sql.Ext.Count(s.Column3 == 'Y' ? 1 : (int?)null).Over()
            .PartitionBy(s.Column1, s.Column2).ToValue(),
        // Numbers the rows within each group so that one row per group can be picked.
        RN = Sql.Ext.RowNumber().Over()
            .PartitionBy(s.Column1, s.Column2).OrderByDesc(s.Year).ToValue()
    };
db.DestinationTable.Truncate();
sourceQuery.Where(s => s.RN == 1)
    .Insert(db.DestinationTable,
        e => new DestinationTable
        {
            Property1 = "Constant Value",
            Property2 = e.Source.Column1,
            Property3 = e.Source.Column2,
            Property4 = e.Count
        });
After some investigation I stumbled onto this issue, which led me to the solution. The code above needs to change to:
db.Execute("TRUNCATE TABLE DESTINATION_TABLE");
db.SourceTable.
    Where(s => s.Year > 2012).
    GroupBy(s => new { s.Column1, s.Column2 }).
    Select(g => new DestinationTable {
        Property1 = "Constant Value",
        Property2 = g.First().Column1,
        Property3 = g.First().Column2,
        Property4 = g.Count(s => s.Column3 == 'Y')
    }).Insert(db.DestinationTable, e => e);
The documentation of the linq2db project leaves a bit to be desired; in terms of functionality, however, it's looking like a great project for ETLs (without horrible copy/pasted SQL/SSIS scripts running to thousands of lines).
I recently migrated an MS Access database to SQL Server. Most of it was imported just fine, but there are some datatype differences that I would like to find with a tool, if one is available.
The tools I have found so far only compare MS Access against MS Access, or SQL Server against SQL Server.
At issue is that Access (or JET Red) does not have a single canonical API for working with its data model; instead you mostly go through the OLE-DB driver or the ODBC driver. I think (but cannot confirm) that the Office Access GUI program probably has its own internal API that bypasses the OLE-DB or ODBC abstractions. Unfortunately, the GUI program does not use specific technical terminology in things like the Table Designer (e.g. Number > Integer doesn't say whether it's a 16-, 32- or 64-bit integer, and Number > Replication ID is not a number at all but a Win32 GUID).
As of 2019, Microsoft has seemingly de-prioritized OLE-DB compared to the lower-level ODBC API for JET Red, but that's okay because ODBC still provides us with the necessary details for determining a database table's design.
Anyway - the good news is that you don't necessarily need a tool to compare an Access (JET Red) database with a SQL Server database because it's easy to get the ODBC table specifications yourself.
Something like this:
// Requires: System.Collections.Generic, System.Data, System.Data.OleDb, System.Data.SqlClient, System.Linq
Dictionary<String,DataTable> jetTables = new Dictionary<String,DataTable>();
using( OleDbConnection jetConnection = new OleDbConnection( "your-access-connection-string" ) )
{
    await jetConnection.OpenAsync().ConfigureAwait(false);
    // List every user table in the Access database.
    DataTable schemaTable = jetConnection.GetOleDbSchemaTable(
        OleDbSchemaGuid.Tables,
        new object[] { null, null, null, "TABLE" }
    );
    foreach( DataRow row in schemaTable.Rows.Cast<DataRow>() )
    {
        String tableName = (String)row.ItemArray[2]; // TABLE_NAME
        // Fetch the column definitions (name, data type, ...) of this table.
        DataTable tableSchema = jetConnection.GetOleDbSchemaTable(
            OleDbSchemaGuid.Columns,
            new object[] { null, null, tableName, null }
        );
        jetTables.Add( tableName, tableSchema );
    }
}
Dictionary<String,DataTable> sqlTables = new Dictionary<String,DataTable>();
using( SqlConnection sqlConnection = new SqlConnection( "your-sql-server-connection-string" ) )
{
    await sqlConnection.OpenAsync().ConfigureAwait(false);
    DataTable allTables = new DataTable();
    using( SqlCommand cmd1 = sqlConnection.CreateCommand() )
    {
        cmd1.CommandText = "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES";
        using( SqlDataReader rdr1 = await cmd1.ExecuteReaderAsync().ConfigureAwait(false) )
        {
            allTables.Load( rdr1 );
        }
    }
    foreach( DataRow row in allTables.Rows.Cast<DataRow>() )
    {
        String tableName = (String)row.ItemArray[0];
        using( SqlCommand cmd2 = sqlConnection.CreateCommand() )
        {
            cmd2.CommandText = "SELECT COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = @tableName";
            cmd2.Parameters.Add( "@tableName", SqlDbType.NVarChar ).Value = tableName;
            using( SqlDataReader rdr2 = await cmd2.ExecuteReaderAsync().ConfigureAwait(false) )
            {
                DataTable dt = new DataTable();
                dt.Load( rdr2 );
                sqlTables.Add( tableName, dt );
            }
        }
    }
}
Then compare jetTables with sqlTables as you so wish.
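As a rough sketch of what that comparison could look like for a single table, assume each side has already been reduced to a map from column name to type name. The two sources generally do not report types in the same vocabulary, so a mapping step to common type names is assumed to have happened beforehand. SchemaDiff, Compare and the sample data are all made up for illustration.

```csharp
using System;
using System.Collections.Generic;

static class SchemaDiff
{
    // Returns one line per difference between two column-name -> type-name maps.
    public static List<string> Compare(
        IDictionary<string, string> accessColumns,
        IDictionary<string, string> sqlColumns)
    {
        var differences = new List<string>();
        foreach (KeyValuePair<string, string> pair in accessColumns)
        {
            string sqlType;
            if (!sqlColumns.TryGetValue(pair.Key, out sqlType))
                differences.Add(pair.Key + ": missing in SQL Server");
            else if (!string.Equals(pair.Value, sqlType, StringComparison.OrdinalIgnoreCase))
                differences.Add(pair.Key + ": " + pair.Value + " vs " + sqlType);
        }
        foreach (string name in sqlColumns.Keys)
        {
            if (!accessColumns.ContainsKey(name))
                differences.Add(name + ": missing in Access");
        }
        return differences;
    }

    static void Main()
    {
        // Made-up sample data; real maps would be built from the schema tables.
        var access = new Dictionary<string, string> { { "Id", "int" }, { "Price", "float" }, { "Notes", "ntext" } };
        var sql = new Dictionary<string, string> { { "Id", "int" }, { "Price", "money" } };
        foreach (string line in Compare(access, sql))
            Console.WriteLine(line);
    }
}
```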
I have the following connection string:
provider=SQLNCLI11;Server=[server];Database=[db];uid=[uid];pwd=[pwd]
and I have the following code:
OleDbCommand oComm = new OleDbCommand();
oComm.Connection = OleConnection;
oComm.Transaction = m_oleTran;
oComm.CommandText = sSQL;
oComm.CommandTimeout = TimeOut;
BuildParams(ref oComm, sCols, (object [])oVals);
if (oComm.Connection.State == ConnectionState.Closed)
oComm.Connection.Open();
m_RowsAffected = oComm.ExecuteNonQuery();
if (m_oleTran == null)
oComm.Connection.Close();
oComm.Dispose();
private void BuildParams(ref OleDbCommand oComm, string [] sCols, object [] oVals)
{
for (int i = 0; i< sCols.Length; i++)
{
if (sCols.Length > 0)
oComm.Parameters.AddWithValue(sCols[i], oVals[i]);
}
}
when I executed a simple update SQL statement, I got the following error
The fractional part of the provided time value overflows the scale of the corresponding SQL Server parameter or column. Increase bScale in DBPARAMBINDINFO or column scale to correct this error. at System.Data.OleDb.OleDbCommand.ExecuteReaderInternal(CommandBehavior behavior, String method) at System.Data.OleDb.OleDbCommand.ExecuteNonQuery()
Any ideas?
Thanks,
Solution used in DTSX Script Task with VB language.
For compliance reasons our customer asked us to change the SQL Server provider in a database version upgrade. And we started getting the same error when trying to save the dates. The date sent to the database was not on the correct scale. We could have changed the data type in the database, but we chose not to do so after we tested this very simple solution that worked. We just use a
.ToString("yyyy-MM-dd HH:mm:ss")
and send a string instead of a date with the correct size and it went through the provider and the database and saved without problems.
Provider=SQLNCLI11;
SOLUTION:
com.Parameters.AddWithValue("@COD_INTERFACE", SqlDbType.Int)
com.Parameters.AddWithValue("@COD_SEQUENCIAL", SqlDbType.Int)
com.Parameters.AddWithValue("@DTA_GERACAO", SqlDbType.DateTime)
com.Parameters.AddWithValue("@DTA_IMPORTACAO", SqlDbType.DateTime)
com.Parameters("@COD_INTERFACE").Value = CInt(Dts.Variables("User::intCodigoInterface").Value)
com.Parameters("@COD_SEQUENCIAL").Value = CInt(Dts.Variables("User::intSequencialArquivo").Value)
com.Parameters("@DTA_GERACAO").Value = CDate(Dts.Variables("User::dtaGeracaoArquivo").Value).ToString("yyyy-MM-dd HH:mm:ss")
com.Parameters("@DTA_IMPORTACAO").Value = Now.ToString("yyyy-MM-dd HH:mm:ss")
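The string formatting works because it drops the fractional seconds. The same effect can probably be achieved while keeping a date value, by truncating it to whole seconds before assigning it to the parameter. A minimal sketch in C# follows. TruncateToSeconds is a hypothetical helper, and whether truncation alone satisfies the provider in a given setup is an assumption to verify.

```csharp
using System;

static class DateTimeTruncation
{
    // Drops everything below whole seconds while keeping the DateTimeKind.
    public static DateTime TruncateToSeconds(DateTime value)
    {
        return new DateTime(value.Ticks - (value.Ticks % TimeSpan.TicksPerSecond), value.Kind);
    }

    static void Main()
    {
        DateTime now = DateTime.Now;
        Console.WriteLine(now.ToString("yyyy-MM-dd HH:mm:ss.fffffff"));
        Console.WriteLine(TruncateToSeconds(now).ToString("yyyy-MM-dd HH:mm:ss.fffffff"));
    }
}
```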
I have a simple DB with 4 tables. The Results table has 18 columns, 3 of which are foreign keys. I am trying to get the number of all results (about 800k) with this code:
#I @"..\packages\SQLProvider.1.1.3\lib"
#r "FSharp.Data.SqlProvider.dll"
open FSharp.Data.Sql
let [<Literal>] ConnectionStringmdf = @"Data Source=(localdb)\MSSQLLocalDB;AttachDbFilename=C:\Users\Me\Desktop\myDb.mdf;Integrated Security=True;Connect Timeout=10"
type Sqlmdf = SqlDataProvider<
                ConnectionString = ConnectionStringmdf,
                DatabaseVendor = Common.DatabaseProviderTypes.MSSQLSERVER,
                IndividualsAmount = 1000,
                UseOptionTypes = true,
                CaseSensitivityChange = Common.CaseSensitivityChange.ORIGINAL
              >
let dbm = Sqlmdf.GetDataContext()
printfn "Results count:\t %i" (dbm.Dbo.Results |> Seq.length )
It takes about 40 seconds to get count of records in one table.
Why is it so slow? What am I doing wrong?
Seq.length enumerates the whole table on the client just to count the rows. The types returned by SqlDataProvider implement IQueryable, which means you can write a query expression or use Queryable.Count instead:
open System.Linq
dbm.Dbo.Results |> Queryable.Count
or
query { for it in dbm.Dbo.Results do
        count
      }
You should just execute the query directly on the table and have the server return the result to you. For example, I get an 8M row count instantaneously:
type dbSchema = SqlDataConnection<connectionString1>
let dbx = dbSchema.GetDataContext()
dbx.DataContext.ObjectTrackingEnabled <- false
dbx.DataContext.CommandTimeout <- 60
let table1 = dbx.MyTable
table1.Count()
//val it : int = 7189765
You could also wrap it into a query.
Here's a query version that (unless sqlprovider doesn't do count) should work on the other TP as well. Again, the speed is almost instantaneous.
query { for row in table1 do
        select row
        count
      }
I tested the same with SqlDataProvider with similar results. Open the System.Linq namespace to access the .Count() extension function if necessary.
In our application, I have a method in the DAL that executes an SQL query which should return 2 result sets. I use Dapper to map the result sets to DTOs. It works fine on Windows.
While running on Mono, the query fails to fetch the results (Dapper throws null ref exception).
Other SQL queries work fine on Mono. The application is able to read/write to SQL server just fine.
Are there any known issues with multiple result sets in Mono SQL server driver?
I wasn't able to find any documented outstanding issues.
Here's the method's code:
using (var connection = ConnectionFactory.OpenConnection())
{
if (connection == null)
{
throw new System.ApplicationException("Could not open DB connection for DAL.");
}
Logger?.Trace("Got connection for DAL.");
var sql = @"SELECT A.x, A.y FROM A;
SELECT A.x, B.z, B.w
FROM C INNER JOIN A ON C.x = A.x
INNER JOIN B ON C.z = B.z";
using (var multiRes = connection.QueryMultiple(sql))
{
if(multiRes == null)
{
throw new System.ApplicationException("No results returned from db.");
}
res.AddRange(multiRes.Read<aDTO>());
var mapping = multiRes.Read<aDTO, bDTO, aDTO>((a, b) =>
{
...
return a;
}, splitOn: "z").ToList();
}
}