What are possible ways of storing a large data file (a CSV file of around 1 GB) in a SQL database and streaming that data from the database to the client over WCF, without loading the complete data set into memory?
I think there are a few issues to take into account here:
The size of the data you actually want to return
The structure of that data (or lack thereof)
The place to store that data behind your NLB
Returning that data to the consumer.
From your question, it sounds like you want to store 1 GB of structured (CSV) data and stream it to the client. If you really are generating and then serving a 1 GB file (and don't have much metadata around it), I'd go for using an FTP/SFTP server (or perhaps a network file share, which can certainly be secured in a variety of ways).
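On the consuming side, pulling such a file down as a stream is straightforward; for example (a rough sketch with a made-up FTP address, credentials, and local path):

var request = (FtpWebRequest)WebRequest.Create("ftp://server/exports/data.csv");
request.Method = WebRequestMethods.Ftp.DownloadFile;
request.Credentials = new NetworkCredential("user", "password");

using (var response = (FtpWebResponse)request.GetResponse())
using (Stream responseStream = response.GetResponseStream())
using (var file = File.Create(@"C:\temp\data.csv"))
{
    // Copies in buffered chunks rather than loading the whole file into memory.
    // Stream.CopyTo needs .NET 4+; on older frameworks, loop over a byte buffer instead.
    responseStream.CopyTo(file);
}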
If you need to store metadata about the file that goes beyond its file name/create time/location, then SQL might be a good option, assuming you could do one of the following:
Store the CSV data in tabular format in the database
Use FILESTREAM and store the file itself
Here is a decent primer on FILESTREAM from SimpleTalk. You could then use the SqlFileStream class to stream the data from the file itself (and SQL Server will maintain transactional consistency for you, which you may or may not want); an example is present in the documentation. The relevant section is here:
private static void ReadFilestream(SqlConnectionStringBuilder connStringBuilder)
{
    using (SqlConnection connection = new SqlConnection(connStringBuilder.ToString()))
    {
        connection.Open();

        SqlCommand command = new SqlCommand("SELECT TOP(1) Photo.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT() FROM employees", connection);

        SqlTransaction tran = connection.BeginTransaction(IsolationLevel.ReadCommitted);
        command.Transaction = tran;

        using (SqlDataReader reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                // Get the pointer for the file
                string path = reader.GetString(0);
                byte[] transactionContext = reader.GetSqlBytes(1).Buffer;

                // Create the SqlFileStream
                using (Stream fileStream = new SqlFileStream(path, transactionContext, FileAccess.Read, FileOptions.SequentialScan, allocationSize: 0))
                {
                    // Read the contents as bytes and write them to the console
                    for (long index = 0; index < fileStream.Length; index++)
                    {
                        Console.WriteLine(fileStream.ReadByte());
                    }
                }
            }
        }
        tran.Commit();
    }
}
Alternatively, if you do choose to store the data in tabular format, you can use the typical SqlDataReader methods, or perhaps some combination of bcp and .NET helpers.
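For the tabular route, the read side usually looks something like this (a minimal sketch, assuming a hypothetical CsvRows table with a Line column; CommandBehavior.SequentialAccess keeps the reader from buffering large values):

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand("SELECT Line FROM CsvRows", connection))
{
    connection.Open();
    using (SqlDataReader reader = command.ExecuteReader(CommandBehavior.SequentialAccess))
    {
        while (reader.Read())
        {
            // Each row is handled as it arrives instead of materialising the whole result set.
            string line = reader.GetString(0);
            // ... write the line to the outgoing stream here ...
        }
    }
}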
You should be able to combine that last link with Microsoft's remarks on streaming large data over WCF to get the desired result.
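In outline, the WCF side is a contract that returns a Stream plus a binding configured for streamed transfer; the names below (IFileTransferService, GetCsvFile) are placeholders rather than anything prescribed:

[ServiceContract]
public interface IFileTransferService
{
    // The response body is just the Stream, so WCF can transfer it without buffering it all.
    [OperationContract]
    Stream GetCsvFile(string fileName);
}

// Binding set up for streaming (shown in code here; the same settings can live in config).
var binding = new BasicHttpBinding
{
    TransferMode = TransferMode.Streamed,
    MaxReceivedMessageSize = 2147483647
};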
Related
PostgreSQL 9.1
Npgsql 2.0.12
I have binary data I want to store in a PostgreSQL database. Most files load fine; however, a large binary file (664 MB) is causing problems. When trying to load the file into PostgreSQL using Large Object support through Npgsql, the PostgreSQL server returns an 'out of memory' error.
I'm running this at present on a workstation with 4 GB of RAM, with 2 GB free while PostgreSQL is idle.
This is the code I am using, adapted from the PG Foundry Npgsql User's Manual.
using (var transaction = connection.BeginTransaction())
{
    try
    {
        var manager = new NpgsqlTypes.LargeObjectManager(connection);
        var noid = manager.Create(NpgsqlTypes.LargeObjectManager.READWRITE);
        var lo = manager.Open(noid, NpgsqlTypes.LargeObjectManager.READWRITE);
        lo.Write(BinaryData);
        lo.Close();
        transaction.Commit();
        return noid;
    }
    catch
    {
        transaction.Rollback();
        throw;
    }
}
I've tried modifying PostgreSQL's memory settings from the defaults to all manner of values, adjusting:
shared_buffers
work_mem
maintenance_work_mem
So far I've found PostgreSQL to be a great database system, but this is a showstopper at present and I can't seem to get a file of this size into the database. I don't really want to have to deal with manually chopping the file into chunks and recreating it client-side if I can help it.
Please help!?
The answer appears to be calling the Write() method of the LargeObject class iteratively with chunks of the byte array. I know I said I didn't want to have to deal with chunking the data, but what I really meant was chunking the data into separate LargeObjects. This solution means I chunk the array, but it is still stored in the database as one object, meaning I don't have to keep track of file parts, just the one oid.
int i = 0;
do
{
    var length = 1000;
    if (i + length > BinaryData.Length) length = BinaryData.Length - i;

    byte[] chunk = new byte[length];
    Array.Copy(BinaryData, i, chunk, 0, length);
    lo.Write(chunk, 0, length);
    i += length;
} while (i < BinaryData.Length);
Try decreasing max_connections to reserve memory for the few connections that need to do one 700 MB operation. Increase work_mem (the memory available per operation) to 1 GB. Trying to cram 700 MB into one field sounds odd, though.
Increase the shared_buffers size to 4096MB.
I am unit/auto-testing a large application which uses Microsoft SQL Server, Oracle, and Sybase as its back ends. Maybe there are better ways to interface with a database, but the ODBC library is what I have to use. Given these constraints, there is something I need to figure out, and I would love your help with it. My tests change the state of the database, and I'm looking for an inexpensive, 99.99% robust way to restore it after I am done (a full database restore after each test feels like too much of a penalty). So, I seek a complement to the function below - I need a way to populate a table from a DataSet.
private DataSet ReadFromTable(OdbcConnection connection, string tableName)
{
    string selectQueryString = String.Format("select * from {0};", tableName);
    DataSet dataSet = new DataSet();
    using (OdbcCommand command = new OdbcCommand(selectQueryString, connection))
    using (OdbcDataAdapter odbcAdapter = new OdbcDataAdapter(command))
    {
        odbcAdapter.Fill(dataSet);
    }
    return dataSet;
}
// The method that I seek.
private void WriteToTable(OdbcConnection connection, string tableName, DataSet data)
{
    ...
}
I realize that things can be more complicated - there are triggers, and some tables depend on others. However, we barely use any constraints, for the sake of efficiency of the application under test. I am giving you this information so that perhaps you have a suggestion on how to do things better/differently. I am open to different approaches, as long as they work well.
The non-negotiables are: the MSTest library, VS2010, C#, the ODBC library, and support for all three vendors.
Is this what you mean? I might be overlooking something
In ReadFromTable:
dataSet.WriteXmlSchema(memorySchemaStream);
dataSet.WriteXml(memoryDataStream);

In WriteToTable:
/* empty the table first */
DataSet template = new DataSet();
template.ReadXmlSchema(memorySchemaStream);
template.ReadXml(memoryDataStream);

DataSet actual = new DataSet();
actual.ReadXmlSchema(memorySchemaStream);
actual.Merge(template, false);
// a DataSet has no Update() of its own - push the merged rows back through a data adapter:
odbcAdapter.Update(actual);
Another variant might be: read the current data, compare it with the template, and based on what's missing add the data to the actual dataset. The only thing to remember is that you cannot copy the actual DataRow objects from one dataset to another; you have to recreate the DataRows.
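If it helps, here is a rough, untested sketch of what WriteToTable could look like along those lines - recreating the rows so they end up in the Added state and pushing them back through an OdbcDataAdapter (it assumes no identity columns or constraint-ordering complications):

private void WriteToTable(OdbcConnection connection, string tableName, DataSet data)
{
    if (connection.State != ConnectionState.Open)
    {
        connection.Open();
    }

    // Empty the table first.
    using (var delete = new OdbcCommand(String.Format("delete from {0};", tableName), connection))
    {
        delete.ExecuteNonQuery();
    }

    using (var select = new OdbcCommand(String.Format("select * from {0};", tableName), connection))
    using (var adapter = new OdbcDataAdapter(select))
    using (var builder = new OdbcCommandBuilder(adapter))   // generates the INSERT command for Update()
    {
        var target = new DataTable();
        adapter.Fill(target);   // picks up the schema; the table is empty at this point

        // Recreate each saved row so it is in the Added state and will be inserted.
        foreach (DataRow row in data.Tables[0].Rows)
        {
            target.Rows.Add(row.ItemArray);
        }

        adapter.Update(target);
    }
}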
In the code below, pathToNonDatabase is the path to a simple text file, not a real SQLite database. I was hoping sqlite3_open would detect that, but it doesn't (db is not NULL, and result is SQLITE_OK). So, how do I detect that a file is not a valid SQLite database?
sqlite3 *db = NULL;
int result = sqlite3_open(pathToNonDatabase, &db);
if ((NULL == db) || (result != SQLITE_OK)) {
    // invalid database
}
SQLite opens databases lazily. Just do something immediately after opening that requires the file to be a database.
The best is probably pragma schema_version;.
This will report 0 if the database hasn't been created yet (for instance, an empty file). In that case it's safe to work with (and to run CREATE TABLE, etc.).
If the database has been created, it will return how many revisions the schema has gone through. The exact value might not be interesting, but the fact that it's not zero is.
If the file exists and is neither a database nor empty, you'll get an error.
If you want a somewhat more thorough check, you can use pragma quick_check;. This is a lighter-weight integrity check which skips verifying that the contents of the tables line up with the indexes. It can still be very slow.
Avoid integrity_check. It not only checks every page, but also verifies the contents of the tables against the indexes. This is positively glacial on a large database.
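To illustrate the schema_version approach, here's a rough sketch (shown with System.Data.SQLite purely as an example - the pragma itself works from any binding, including plain C):

private bool LooksLikeSQLiteDatabase(string filename)
{
    try
    {
        using (var db = new SQLiteConnection("Data Source=" + filename + ";FailIfMissing=True;"))
        {
            db.Open();
            using (var cmd = new SQLiteCommand("PRAGMA schema_version;", db))
            {
                // 0 means an empty (not yet created) database, anything higher an existing one;
                // a non-database file makes ExecuteScalar throw.
                long schemaVersion = Convert.ToInt64(cmd.ExecuteScalar());
                return schemaVersion >= 0;
            }
        }
    }
    catch (SQLiteException)
    {
        return false;
    }
}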
For anyone needing to do this in C# with System.Data.SQLite, you can start a transaction and then immediately roll it back, as follows:

private bool DatabaseIsValid(string filename)
{
    using (SQLiteConnection db = new SQLiteConnection(@"Data Source=" + filename + ";FailIfMissing=True;"))
    {
        try
        {
            db.Open();
            using (var transaction = db.BeginTransaction())
            {
                transaction.Rollback();
            }
        }
        catch (Exception ex)
        {
            log.Debug(ex.Message, ex);
            return false;
        }
    }
    return true;
}
If the file is not a valid database, the following SQLiteException is thrown: 'file is encrypted or is not a database' (System.Data.SQLite.SQLiteErrorCode.NotADb). If you aren't using encrypted databases, this solution should be sufficient.
(Only the db.Open() call was required for version 1.0.81.0 of System.Data.SQLite, but when I upgraded to version 1.0.91.0 I had to add the inner using block to get it to work.)
I think a pragma integrity_check; could do it.
If you only want to check whether the file is a valid SQLite database, you can use this function:
private bool CheckIfValidSQLiteDatabase(string databaseFilePath)
{
    // A valid database file starts with the 16-byte header string "SQLite format 3\0".
    byte[] bytes = new byte[16];
    using (FileStream fileStream = new FileStream(databaseFilePath, FileMode.Open, FileAccess.Read))
    {
        fileStream.Read(bytes, 0, 16);
    }
    string header = System.Text.Encoding.ASCII.GetString(bytes);
    return header.Contains("SQLite format 3");
}
As stated in the documentation on the SQLite database header.
I am using the SQL Server CE 3.5 framework with SQL Server 2005 and attempting to use merge replication between my handheld and my SQL Server. When I run the code to synchronise, it just seems to sit forever, and when I put a breakpoint in my code it never gets past the call to Synchronize().
If I look at the replication monitor in SQL Server, it gets to the point where it says the subscription is no longer synchronising and doesn't show any errors, so I am assuming the synchronisation is complete.
http://server/virtualdirectory/sqlcesa35.dll?diag does not report any issues.
This is my first attempt at any handheld development, so I may have done something daft. However, SQL Server seems to be reporting a successful synchronisation.
Any help would be greatly appreciated as I have spent ages on this !
Here is my code.
const string DatabasePath = @"SD Card\mydb.sdf";

var repl = new SqlCeReplication
{
    ConnectionManager = true,
    InternetUrl = @"http://server/virtualdirectory/sqlcesa35.dll",
    Publisher = @"servername",
    PublisherDatabase = @"databasename",
    PublisherSecurityMode = SecurityType.DBAuthentication,
    PublisherLogin = @"username",
    PublisherPassword = @"password",
    Publication = @"publicationname",
    Subscriber = @"PPC",
    SubscriberConnectionString = "Data Source=" + DatabasePath
};

try
{
    Cursor.Current = Cursors.WaitCursor;
    if (!File.Exists(DatabasePath))
    {
        repl.AddSubscription(AddOption.CreateDatabase);
    }
    repl.Synchronize();
    MessageBox.Show("Successfully synchronised");
}
catch (SqlCeException e)
{
    DisplaySqlCeErrors(e.Errors, e);
}
finally
{
    repl.Dispose();
    Cursor.Current = Cursors.Default;
}
Another thing you can do to speed up the Synchronize operation is to specify a database file path that is in your PDA's main program memory (instead of on the SD card, as in your example). You should see a speed improvement of up to 4x (meaning the sync may take only 25% as long as it does now).
If you're running out of main program memory on your PDA, you can use System.IO.File.Move() to move the file to the SD card after the Synchronize call. This seems a bit strange, I know, but it's much faster to sync to program memory and then copy to the SD card than it is to sync directly to the SD card.
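For example (a rough sketch; the program-memory path here is made up):

const string LocalDatabasePath = @"\Application Data\mydb.sdf";   // somewhere in main program memory
const string SdCardDatabasePath = @"SD Card\mydb.sdf";

// Sync against the copy in program memory...
repl.SubscriberConnectionString = "Data Source=" + LocalDatabasePath;
repl.Synchronize();

// ...then move the database out to the SD card afterwards.
if (File.Exists(SdCardDatabasePath))
{
    File.Delete(SdCardDatabasePath);
}
File.Move(LocalDatabasePath, SdCardDatabasePath);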
I have since discovered that it was just taking a long time to copy the data to the physical disk. Although the SQL Server replication had completed, it was still copying the data to the SD card.
I identified this by reducing the number of tables I am replicating, and I got a more immediate response (well, another error, but unrelated to this issue).
Thanks anyway :)
When writing SQL by hand, it's pretty easy to estimate the size and shape of the data returned by a query. I'm increasingly finding it hard to do this with LINQ to SQL queries. Sometimes I find WAY more data than I was expecting, which can really slow down a remote client that is accessing the database directly.
I'd like to be able to run a query and then tell exactly how much data has been returned across the wire, and use this to help me optimize.
I have already hooked up a log using the DataContext.Log method, but that only gives me an indication of the SQL sent, not the data received.
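(For reference, that hookup is just something like the line below, where dc is whatever your DataContext instance is; it shows the generated SQL but says nothing about the size of the results.)

dc.Log = Console.Out;   // or any TextWriter - logs the SQL sent, not the data coming back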
Any tips?
Looks like you can grab the SqlConnection of your DataContext and turn on statistics.
One of the statistics is "bytes returned".
MSDN Reference Link
Note: you need to cast the connection to a SqlConnection if you have an existing DataContext:
((SqlConnection)dc.Connection).StatisticsEnabled = true;
and then retrieve the statistics with:
((SqlConnection)dc.Connection).RetrieveStatistics()
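Put together with an existing DataContext, that looks roughly like this (MyDataContext and Customer are placeholder names):

var dc = new MyDataContext();
var connection = (SqlConnection)dc.Connection;
connection.StatisticsEnabled = true;

var customers = dc.GetTable<Customer>().ToList();   // run whatever query you want to measure

var stats = connection.RetrieveStatistics();
Console.WriteLine("Bytes received: " + stats["BytesReceived"]);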
I found no way to grab the SqlConnection of the DataContext, so I created the SqlConnection manually:

SqlConnection sqlConnection = new SqlConnection("your_connection_string");

// enable statistics
sqlConnection.StatisticsEnabled = true;

// create your DataContext with the SqlConnection
NorthWindDataContext nwContext = new NorthWindDataContext(sqlConnection);

var products = from product in nwContext.Products
               where product.Category.CategoryName == "Beverages"
               select product;

foreach (var product in products)
{
    // do something with product
}

// retrieve statistics - for keys see http://msdn.microsoft.com/en-us/library/7h2ahss8(VS.80).aspx
string bytesSent = sqlConnection.RetrieveStatistics()["BytesSent"].ToString();
string bytesReceived = sqlConnection.RetrieveStatistics()["BytesReceived"].ToString();