I have Entity Framework (.NET 4.0) going against SQL Server 2008. The database is (theoretically) getting updated during business hours -- delete, then insert, all within a transaction. Practically, it's not going to happen that often. But I need to make sure I can always read data in the database. The application I'm writing will never write to the data -- it is read-only.
If I do a dirty read, I can always access the data; the worst that happens is I get old data (which is acceptable). However, can I tell Entity Framework to always use dirty reads? Are there performance or data integrity issues I need to worry about if I set up EF this way? Or should I take a step back and see about rewriting the process that's doing the delete/insert process?
TransactionScope is your friend:
Entity Framework with NOLOCK
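For reference, a minimal sketch of the approach from that link: run the EF query inside a TransactionScope at READ UNCOMMITTED, which gives the same dirty-read semantics as putting NOLOCK on the generated SQL. The context and entity set names here (MyEntities, Orders) are placeholders for your own model:

```csharp
using System.Linq;
using System.Transactions;

class DirtyReadExample
{
    static void ReadDirty()
    {
        var options = new TransactionOptions
        {
            IsolationLevel = IsolationLevel.ReadUncommitted
        };
        using (var scope = new TransactionScope(
                   TransactionScopeOption.Required, options))
        using (var context = new MyEntities())   // placeholder EF context
        {
            // The query executes inside the scope, so SQL Server reads
            // without taking shared locks (equivalent to NOLOCK).
            var rows = context.Orders.ToList();
            scope.Complete();
        }
    }
}
```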
Don't use dirty reads. "The worst" isn't that you see old data. The worst is that you see uncommitted data. Stack Overflow uses snapshots rather than dirty reads to solve this problem. That's what I'd do, too.
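A sketch of the snapshot route, assuming SQL Server with snapshot isolation enabled on the database (a DBA would first run ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON). Readers are never blocked by writers, but they only ever see committed data:

```csharp
using System.Linq;
using System.Transactions;

class SnapshotReadExample
{
    static void ReadSnapshot()
    {
        var options = new TransactionOptions
        {
            IsolationLevel = IsolationLevel.Snapshot   // not ReadUncommitted
        };
        using (var scope = new TransactionScope(
                   TransactionScopeOption.Required, options))
        using (var context = new MyEntities())   // placeholder EF context
        {
            // Reads the last committed version of each row; writers do not
            // block this query, and it never observes uncommitted data.
            var rows = context.Orders.ToList();
            scope.Complete();
        }
    }
}
```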
From the previous link, I found this, which also answers the question.
http://www.hanselman.com/blog/GettingLINQToSQLAndLINQToEntitiesToUseNOLOCK.aspx
My colleagues and I are having second thoughts about using MongoDB to store all the data for our application. Some think that, because of Mongo's eventual consistency, when a user registers or updates their profile (or something similar but more important) the result won't be visible immediately, which could frustrate users.
I'm pretty sure that unless we have a ton of data and set up replication, we won't see the effects of eventual consistency. But I'm not sure.
Any advice? Use only Mongo, or add a SQL server alongside it for storing sensitive data?
MongoDB is not eventually consistent, but it does have asynchronous replication. You can avoid the risk of reading an old value by not reading from slaves (a simple connection flag), or by writing with a flag that waits for replication to finish before returning. Look at the documentation for the getLastError command for all the details on the latter.
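The getLastError mechanics are wrapped up as "write concerns" in today's drivers. A sketch with the MongoDB .NET driver (2.x API, newer than the getLastError era this answer dates from, but the same knobs); database and collection names are placeholders:

```csharp
using MongoDB.Bson;
using MongoDB.Driver;

class MongoConsistencyExample
{
    static void Demo()
    {
        var client = new MongoClient("mongodb://localhost:27017");
        var db = client.GetDatabase("mydb");

        var users = db.GetCollection<BsonDocument>("users")
            // Reads go to the primary, so they always see the latest writes.
            .WithReadPreference(ReadPreference.Primary)
            // Writes block until a majority of replicas confirm them.
            .WithWriteConcern(WriteConcern.WMajority);

        users.InsertOne(new BsonDocument("name", "alice"));
    }
}
```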
Using MongoDB for what you're describing is fine.
Let's answer a different question first: do you need the special capabilities of MongoDB, or could you use a plain old relational data store?
I can't answer that one (you can).
If the answer is "no", you could switch everything to an ACID SQL database and stop worrying about such problems.
Fully elaborating on the trade-offs between NoSQL and SQL would exceed the limits of this text editor. I recommend you do some research on that on the web.
I mentioned this application in my earlier post about PBNI. The application (Tax Software) was written in PB/Java/EAF running on EAServer. It was developed about 8 years ago with the technologies available at the time. The application works fine, but there is leftover legacy code/design that I am trying to clean up.
Certain code performs database (Oracle) transactions across PB and Java, and since the two happen to be in different database (Oracle) sessions, changes in one aren't visible in the other. So, in these cases, the application uses a switch to run the complete transaction in PB code instead of splitting it across PB and Java. Otherwise, it uses the PB/Java combination.
What this means is that identical sets of program blocks exist in both PB and Java. A maintenance nightmare!! I believe the PB objects were created first and someone ported them to Java for performance reasons (without considering the split-transaction issue above). I am trying to eliminate one of the two (probably the PB code, considering performance). I am exploring PBNI in this context.
Please let me know if any of you have faced a similar situation and how you solved it.
Thanks a lot in advance.
Sam
I don't claim to fully understand the nature of your application, but please consider my comments.
Let PowerBuilder and Java perform necessary updates. It seems to me that you could commit transactions in either system and employ the idea of a logical commit. At the beginning of a transaction, update a column to indicate that the record is logically uncommitted. Java and PowerBuilder take turns updating and committing the record(s). Pass ROWID(s) between the two programs and a SELECT in either program would provide accurate data. When the transaction is logically complete, update the column to logically committed.
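To make the idea concrete, here is a sketch of the logical-commit flag in C#/ADO.NET (the same pattern applies verbatim in Java/JDBC or PB embedded SQL). The table and column names (tax_batch, logical_state, row_id) are hypothetical, and the :name placeholders assume Oracle-style bind parameters:

```csharp
using System.Data;

static class LogicalCommit
{
    // Physically commit the rows but mark them logically open, so the
    // other program's session can see them and continue the unit of work.
    public static void MarkPending(IDbConnection conn, string rowId)
    {
        SetState(conn, rowId, "PENDING");
    }

    // Called by whichever program finishes the unit of work.
    public static void MarkComplete(IDbConnection conn, string rowId)
    {
        SetState(conn, rowId, "COMMITTED");
    }

    static void SetState(IDbConnection conn, string rowId, string state)
    {
        using (var cmd = conn.CreateCommand())
        {
            cmd.CommandText =
                "UPDATE tax_batch SET logical_state = :state WHERE row_id = :id";
            AddParam(cmd, "state", state);
            AddParam(cmd, "id", rowId);
            cmd.ExecuteNonQuery();
        }
    }

    static void AddParam(IDbCommand cmd, string name, object value)
    {
        var p = cmd.CreateParameter();
        p.ParameterName = name;
        p.Value = value;
        cmd.Parameters.Add(p);
    }
}
```

Readers in either program then simply filter on logical_state = 'COMMITTED' to see only logically complete work.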
As for performance, moving business logic into an Oracle package or stored procedure is never a bad idea. It might take a little planning, but the same code can then run from PowerBuilder OR Java. Plus, there are some outstanding tuning tools for Oracle. Keep your transactions short and commit inside the package/procedure.
Don't be afraid to put logically incomplete transactions in a "work" table and copy the logically complete rows to the "complete" table.
I'm considering using SQLite in a desktop application to persist my model.
I plan to load all the data into model classes when the user opens a project and write it back when the user saves. I will write all the data, not just the delta that changed (since it is hard for me to track).
The data may contain thousands of rows which I will need to insert. I am afraid that inserting many rows consecutively will be slow (and a preliminary test proves it).
Are there any optimization best practices / tricks for such a scenario?
EDIT: I use System.Data.SQLite for .Net
Like Nick D said: If you are going to be doing lots of inserts or updates at once, put them in a transaction. You'll find the results to be worlds apart. I would suggest re-running your preliminary test within a transaction and comparing the results.
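For example, with System.Data.SQLite the batch might look like this (table and column names are placeholders). The single transaction means one journal flush for the whole batch instead of one per row, and reusing the parameterized command avoids re-preparing the statement:

```csharp
using System.Collections.Generic;
using System.Data.SQLite;

class BulkInsert
{
    static void InsertAll(string dbPath, IEnumerable<string> names)
    {
        using (var conn = new SQLiteConnection("Data Source=" + dbPath))
        {
            conn.Open();
            using (var tx = conn.BeginTransaction())
            using (var cmd = new SQLiteCommand(
                       "INSERT INTO items (name) VALUES (@name)", conn, tx))
            {
                var p = cmd.Parameters.Add("@name", System.Data.DbType.String);
                foreach (var name in names)
                {
                    p.Value = name;
                    cmd.ExecuteNonQuery();   // reuses the prepared statement
                }
                tx.Commit();   // one disk sync for the entire batch
            }
        }
    }
}
```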
We're evaluating EF4 and my DBA says we must use the NOLOCK hint in all our SELECT statements. So I'm looking into how to make this happen when using EF4.
I've read the different ideas on how to make this happen in EF4, but they all seem like workarounds that aren't sanctioned by Microsoft or EF4. What is the "official Microsoft" response to someone who wants their SELECT statement(s) to include the NOLOCK hint when using LINQ-to-SQL / LINQ-to-Entities and EF4?
By the way, the absolute best information I have found was right here and I encourage everyone interested in this topic to read this thread.
Thanks.
NOLOCK = "READ UNCOMMITTED" = dirty reads
I'd assume MS knows why they chose "READ COMMITTED" as the default isolation level.
NOLOCK, in fact any hint, should be used very judiciously: not by default.
Your DBA is a muppet. See this (SO): What can happen as a result of using (nolock) on every SELECT in SQL Server?. If you happen to work at a bank, or any institution where I may have an account, please let me know so I can close it.
I'm a developer on a tools team in the SQL org at Microsoft. I'm in no way authorized to make any official statement, and I'm sure there are people on SO who know more about these things than I do. Nevertheless, I'll offer a friendly rule of thumb, along the theme of "Premature optimization is the root of all evil":
Don't use NOLOCK (or any other query hint, for that matter) until you have to. If you have a SELECT statement which has a decent query plan, and it runs fine when there is very little other load on the system, but then slows down when other queries are accessing the same table, try adding some NOLOCK hints. But always understand that when you do, you run the risk of getting inconsistent data. If you are writing a mission-critical app that does online banking or controls an aircraft, this may be unacceptable. However, for many applications the perf speedup is worth the risk. Evaluate on a case-by-case basis, though. Don't just use them willy-nilly all over the place.
If you do choose to use NOLOCK, I have blogged a solution in C# using extension methods, so that you can easily change a LINQ query to use NOLOCK hints. If you can adapt this to EF4, please post your adaptation.
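I won't reproduce the blogged code here, but the gist is an extension method along these lines (a sketch of the same idea, not the original): evaluate any LINQ query inside a READ UNCOMMITTED TransactionScope, which is the NOLOCK equivalent:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Transactions;

public static class QueryExtensions
{
    // Evaluate the query under READ UNCOMMITTED (i.e., dirty reads).
    public static List<T> ToListReadUncommitted<T>(this IQueryable<T> query)
    {
        var options = new TransactionOptions
        {
            IsolationLevel = IsolationLevel.ReadUncommitted
        };
        using (var scope = new TransactionScope(
                   TransactionScopeOption.Required, options))
        {
            var result = query.ToList();   // query executes inside the scope
            scope.Complete();
            return result;
        }
    }
}
```

Usage is then just context.Orders.Where(...).ToListReadUncommitted() in place of ToList().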
EF4 does not currently have a built-in way to do it if EF4 is generating all your queries.
There are ways around this, such as using stored procedures or a more extended inline query model; however, this can be time-consuming, to say the least.
I believe (and I don't speak for Microsoft on this) that caching is Microsoft's intended solution for lightening the load on the server in EF4 sites. Having read uncommitted (or nolock) built into a framework would create unpredictable issues for the expected behaviour of EF4 when 2 contexts are run at the same time. That doesn't mean your situation needs that level of concurrency.
It sounds like you were asked for NOLOCK on ALL selects. While I agree with the earlier poster that this can be dangerous if you have ANY transactions that need to be transactions, I don't agree that it automatically makes the DBA a muppet. You might just be running a CMS, which is totally fine with dirty reads. You can change the ISOLATION LEVEL on your whole database, which can have the same effect.
The DBA may have recommended NOLOCK for operations that were ONLY selects (which is fine, especially if there's an ORM being misused and doing some dodgy data dumps). The funniest thing about that muppet comment is that Stack Overflow itself runs SQL Server in READ UNCOMMITTED mode. Guess you need to find somewhere else to get answers for your problems then?
Talk to your DBA about the possibility of setting this at the database level, or consider a caching strategy if you only need it in a few places. The web is stateless after all, so concurrency can often be an illusion anyway unless you address it directly.
Info about isolation levels
Having worked with EF4 for over a year now, I will offer that using stored procedures for specific tasks is not a hack and absolutely necessary for performance under certain situations.
Our platform gets a lot of traffic through our web site, APIs and ETL data feeds. We use EF primarily on our web side, but also for some back-end processes. Sometimes EF does a great job with its query generation, sometimes it is terrible. You need to look at the queries being generated, load them into query analyzer, and decide whether you might be better off writing the operation in another way (stored procedure, etc.).
If you find that you need to make data available via EF and need NOLOCKs, you can always create views with the NOLOCK hints included, and expose the view to EF instead of the underlying table. The same can be done with Stored Procedures. These methods are probably a bit easier when you are using the Code First approach.
But I think that one mistake a lot of people make with EF is believing that the EF object model has to map directly to the physical (table) model in the database. It doesn't, and this is where your DBA comes into play. Let him design your physical model, and work together to abstract your logical data model, which is mapped to your object model in EF.
Although this would be a major PITA to do, you can always drop your SQL in a stored procedure and get the functionality you need (or are forced into). It's definitely a hack though!
I know this isn't an answer to your question, but I just wanted to throw this in.
It seems to me that this is (at least partially) the DBA's job. It's fine to say that an application should behave a certain way, and you can and should certainly attempt to program it the way that he would like.
The only way to be sure though, is for the DBA to work on the application with you and construct the DB surface that he would like to present to the app. If he wants critical tables to be queried as READ UNCOMMITTED, then he should help to provide a set of stored procedures with the correct access and isolation level.
Relying on the application code to construct every ad-hoc query correctly is not a scalable approach.
I have a process that reads raw data and writes this to a database every few seconds.
What is the best way to tell if the database has been written to? I know that Oracle and MS-SQL can use triggers or something similar to communicate with other services, but I was hoping there would be a technique that works with more types of SQL databases (SQLite, MySQL, Postgres).
Your question is lacking the specifics needed for a good answer, but I'll give it a try. Triggers are good for targeting tables, but if you are interested in system-wide writes then you'll need a better method that is easier to maintain. For system-wide writes I'd investigate methods that detect changes in the transaction log. Unfortunately, each vendor implements this part differently, so one method that works for all vendors is not likely. That is, a method that works within the database server is unlikely. But there may be more elegant ways outside of the server at the OS level. For instance, if the transaction log is a file on disk, then a simple script of some sort that detects changes in the file would indicate the DB was written to.
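As a sketch of that OS-level idea on Windows with .NET, a FileSystemWatcher could signal when the log file is touched. The directory and file name here are hypothetical, and whether your vendor's log is a plain watchable file is an assumption you'd have to verify:

```csharp
using System;
using System.IO;

class LogWatcher
{
    static void Main()
    {
        // Hypothetical location of the database's transaction log file.
        var watcher = new FileSystemWatcher(@"C:\data\mydb", "mydb-log.db")
        {
            NotifyFilter = NotifyFilters.LastWrite | NotifyFilters.Size,
            EnableRaisingEvents = true
        };
        watcher.Changed += (sender, e) =>
            Console.WriteLine("Possible DB write detected: " + e.FullPath);
        Console.ReadLine();   // keep the watcher alive
    }
}
```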
Keep in mind you have asked only to detect a db write. If you need to know what type of write it was then you'll need to get into the transaction log to see what is there. And that will definitely be vendor specific.
It depends on what you wish to do. If something external to the database needs to be kicked off, then a simple poll of the database would do the trick; otherwise a DB-specific trigger is probably best.
If you want to be database-independent, polling can work. It's not very efficient or elegant, but it also works if you are cursed with a database that doesn't support triggers. A workaround we've used in the past is a script, timed (say via cron), that does a SELECT MAX(primary_key_id) FROM saidTable. I am assuming that your primary key is a sequential integer and is indexed.
Then compare that value to the one you obtained the last time the script ran. If they match, tell the script to exit or sleep. If not, do your thing.
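A minimal sketch of that polling loop in plain ADO.NET, so it works against any provider (SQLite, MySQL, Postgres, etc.); saidTable and primary_key_id come straight from the description above:

```csharp
using System;
using System.Data;
using System.Threading;

class ChangePoller
{
    static void Poll(IDbConnection conn)
    {
        long lastSeen = -1;
        while (true)
        {
            long current;
            using (var cmd = conn.CreateCommand())
            {
                cmd.CommandText = "SELECT MAX(primary_key_id) FROM saidTable";
                var result = cmd.ExecuteScalar();
                current = (result == null || result == DBNull.Value)
                    ? -1
                    : Convert.ToInt64(result);
            }
            if (current != lastSeen)
            {
                lastSeen = current;
                Console.WriteLine("New rows detected; do your thing.");
            }
            Thread.Sleep(5000);   // poll interval; tune to your workload
        }
    }
}
```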
There are other issues with this approach (e.g., backlogs if the script takes too long, concurrency issues, etc.). And of course performance can become an issue too!