SQL Server Database Mail provides nice built-in features such as logging. I would like to use it to send very large volumes of email (millions and more) very quickly (several million per hour).
Is Database Mail designed for this kind of usage, and will I reach the performance I need in production with this solution?
No, it is not. Invest in some listserv software or go through a reputable service provider like Message Systems.
BTW what legitimate purpose requires millions of e-mails per hour?
I am working on an eCommerce website designed to present a large number of SKUs. The SQL Server schema describing these products is normalized to the extent that, a few years ago, it became unreasonably slow to retrieve the necessary information to present to customers, so we changed our infrastructure such that we would bear the cost of loading the data for each product once and then store that data in an AppFabric cache (previously Velocity).
Over time, the complexity of requirements placed on our AppFabric infrastructure has grown (imagine that), forcing us to spend a considerable amount of time writing code for handling data retrieval from our cache, data updates including incremental updates, etc.
We happen to have much of our product data stored in a denormalized form in a side database, so for experimentation's sake I wrote a console app to randomly select one of our ~150K SKUs at a time, and then retrieve the record for that product from our denormalized table.
I was surprised to find that I was able to select these records in about the same average time that I could select a record from our AppFabric cache, about 2.5 ms average in both cases. I'm sure in both cases the data is coming from an in-memory cache of one sort or another, be it AppFabric or disk cache, and the 2.5 ms is bumping against a bare minimum amount of time for a network round trip.
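For reference, a minimal sketch of that kind of timing loop (the connection string, table, and column names are placeholders, not our real schema):

```csharp
// Minimal sketch of the timing loop; placeholder connection string, table, and ID range.
using System;
using System.Data.SqlClient;
using System.Diagnostics;

class SkuLookupBenchmark
{
    static void Main()
    {
        const string connectionString = "Data Source=DBSERVER;Initial Catalog=ProductCache;Integrated Security=True";
        const int iterations = 1000;
        var random = new Random();
        var stopwatch = new Stopwatch();

        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            stopwatch.Start();
            for (int i = 0; i < iterations; i++)
            {
                int skuId = random.Next(1, 150000); // ~150K SKUs, assumed contiguous IDs for the sketch
                using (var command = new SqlCommand(
                    "SELECT * FROM dbo.DenormalizedProduct WHERE SkuId = @SkuId", connection))
                {
                    command.Parameters.AddWithValue("@SkuId", skuId);
                    using (var reader = command.ExecuteReader())
                    {
                        while (reader.Read()) { /* materialize the row */ }
                    }
                }
            }
            stopwatch.Stop();
        }

        Console.WriteLine("Average ms per lookup: {0:F2}",
            stopwatch.Elapsed.TotalMilliseconds / iterations);
    }
}
```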
This makes me think we might be better off just using denormalized data in SQL Server for our high load/high performance needs. The management tools for SQL Server-based data are so much better. All of the devs on our team are adept at using Management Studio, whereas with AppFabric we have one dev who can use PowerShell to a) Give us a count of records stored in the cache and b) dump the cache. Any other management functionality we have to create ourselves.
This makes me ask why anyone would want to use AppFabric at all. We are not concerned with cost, because the cost of the development effort we have to apply to an AppFabric-related solution vastly outweighs even the cost of SQL Server licensing.
Thank you for whatever feedback you can provide to help our team decide the best direction to move forward.
Deciding to use a caching mechanism should be a carefully thought-out process -- it isn't always the right choice. However, the primary reason for using caching over a durable persistence model is to manage an extremely high transaction load.
In AppFabric Cache I can set up a distributed set of servers to work off one logical repository -- with built-in load balancing. So, unlike Microsoft SQL Server, which has no way of providing clustered instances for the purpose of load balancing, if I'm reading and writing 50 to 100 million times a day the cache is a more viable solution for sharing those resources. Those writes can then be queued to the durable persistence model over time, ensuring that there are no real peaks in usage because the load is spread out both across the caching fabric and the durable store.
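As a rough illustration of that pattern, the read path of a cache-aside repository against the AppFabric client API might look like the sketch below (the cache name, key scheme, and data-access call are assumptions):

```csharp
// Rough cache-aside sketch against the AppFabric client API.
// The cache name ("default"), key scheme, and LoadFromDatabase call are assumptions.
using System;
using Microsoft.ApplicationServer.Caching;

public class ProductRepository
{
    private static readonly DataCacheFactory Factory = new DataCacheFactory();
    private readonly DataCache _cache = Factory.GetCache("default");

    public Product GetProduct(int skuId)
    {
        string key = "product:" + skuId;

        // Try the distributed cache first.
        var cached = _cache.Get(key) as Product;
        if (cached != null)
            return cached;

        // Cache miss: load from the durable store, then populate the cache.
        Product product = LoadFromDatabase(skuId); // hypothetical data-access call
        _cache.Put(key, product);
        return product;
    }

    private Product LoadFromDatabase(int skuId)
    {
        // Placeholder for the real durable-store query.
        return new Product { SkuId = skuId };
    }
}

[Serializable]
public class Product
{
    public int SkuId { get; set; }
}
```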
Using AppFabric rather than a dedicated cache-aside database containing a denormalized schema also provides the benefit of fine-grained control over cache key expiry, eviction, and tuned region policies. You would have to roll this yourself if you used SQL Server. I also agree with #mperrenoud03's comments about load balancing and high transaction rate support. Also, if you use a good ORM tool like NHibernate, it can be configured to use AppFabric (or other distributed cache platforms) as a second-level cache. We are leveraging this in our project and getting good results.
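If it helps, the second-level cache is switched on through NHibernate's configuration properties; here is a rough sketch, though the provider assembly/class name is an assumption you should verify against the NHibernate.Caches package for your NHibernate version:

```csharp
// Sketch of enabling NHibernate's second-level cache.
// The Velocity/AppFabric provider name is an assumption -- check your NHibernate.Caches package.
using NHibernate;
using NHibernate.Cfg;

public static class NhCacheSetup
{
    public static ISessionFactory Build()
    {
        var configuration = new Configuration();
        configuration.Configure(); // reads hibernate.cfg.xml / mappings
        configuration.SetProperty(Environment.UseSecondLevelCache, "true");
        configuration.SetProperty(Environment.UseQueryCache, "true");
        configuration.SetProperty(Environment.CacheProvider,
            "NHibernate.Caches.Velocity.VelocityProvider, NHibernate.Caches.Velocity");
        return configuration.BuildSessionFactory();
    }
}
```

Entities and collections still need to be marked cacheable in their mappings for the second-level cache to take effect.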
We are planning to create a web app to store banking transactions for customers, e.g. purchases, transfers, etc., and allow them to tag/categorize each transaction.
Could someone point us to the best DB for this purpose? It needs to scale horizontally and we also need to perform analysis on all transactions.
Thanks
The best database to store banking transactions is the one the banks use, DB2/z.
But, since I doubt you'd be able to afford a System z mainframe, that's probably not an option. That doesn't make it any less the best database of course.
If, however, you're talking about storing transactions for Joe Bloggs or Dodgy Brothers Rug Emporium (as opposed to the two hundred million or so customers of ICBC), pretty well any database will be up to the task - Oracle (despite its inability to differentiate NULLs from empty strings), SQL Server, MySQL, PostgreSQL, probably even SQLite.
I'm going to start this by saying it's almost impossible to recommend a system based on what you've described. It could serve such a varied range of uses, from mission-critical real-time financial data that needs to be there and needs to be accurate, through to a web app that sucks in financial records from a bank/credit card statement and lets the user annotate them, in which case it isn't as sensitive.
If you're storing mission critical, sensitive data, I'd go with a commercial option that includes significant support. Also a DBA would be a good idea.
Oracle or MS SQL would be my inclination, and probably Oracle over MS SQL because of its multi-platform support. If you're happy to run on Windows, then MS SQL is fine.
If you're storing existing transactions that can be tagged (à la Blippy), then any database would be sufficient. If you're thinking of scaling this out to the nth degree, you might like one of the document database flavours of the month (MongoDB, CouchDB, etc.).
Really, I think the question should be reconsidered in the context of what your application will do, not the fact that it happens to do it with financial data. The fact that financial data may require additional security or additional accuracy checks forms part of what the system will do, as does the way the user interacts with your web app, etc.
This may not answer your question directly, but here is what I have experienced.
I think it's really about how you'd save your banking transactions. Most database vendors provide a sufficient level of performance, so all you have to do is choose one over another.
What you are left with is the actual information to be saved (besides the schema). You might think about using a database encryption option, but that's not really realistic in your case; because you are talking about transactions, I assume there are quite a lot of transactions coming in, and you are doing a large number of reads for your reporting (besides writes), possibly for mining, etc.
Usually (in SQL Server), with encryption enabled, any data written to the database file is encrypted. Snapshots and backups also use encryption, and the transaction log is protected as well, so it could hurt the performance you are after.
So, I see your question really boiling down to: how do you protect sensitive data?
By the way, I have deployed solutions with Oracle, SQL Server, and even Sybase as back ends, with transactions pouring in from ATMs, and what I really look for is performance, besides security. Except for minor limitations of one over another, they are all much the same.
The following articles might help:
Database security: protecting sensitive and critical information
Using One-Way Functions to Protect Sensitive Information in SQL Server Databases
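As a small illustration of the one-way-function idea from the second article, here is a minimal sketch that stores a salted SHA-256 hash instead of the raw value (the salt handling is deliberately simplified):

```csharp
// Minimal sketch of a one-way (salted SHA-256) hash for a sensitive value.
// Salt handling is simplified; in practice store a per-row random salt.
using System;
using System.Security.Cryptography;
using System.Text;

public static class SensitiveData
{
    public static string HashValue(string value, string salt)
    {
        using (var sha256 = SHA256.Create())
        {
            byte[] bytes = Encoding.UTF8.GetBytes(salt + value);
            byte[] hash = sha256.ComputeHash(bytes);
            return Convert.ToBase64String(hash);
        }
    }
}

// Usage: store the hash instead of the raw value, and compare hashes on lookup.
// string stored = SensitiveData.HashValue("4111111111111111", "per-row-salt");
```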
What database should I choose to store information about site visits? Key characteristics: a large amount of data, many page requests per second, and various reports for data presentation. I'm thinking of using MySQL. Any suggestions?
Consider letting the server log the requests and parsing them asynchronously. You don't need ACID for analytics, and you don't need to process them while talking to a client.
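A minimal sketch of that approach: let the web server write its normal access log and aggregate it offline (the log path and column layout below are assumptions about your server's log format):

```csharp
// Minimal sketch: aggregate page hits from an access log offline,
// instead of writing to the database on every request.
// The log path and column layout are assumptions about your server's log format.
using System;
using System.Collections.Generic;
using System.IO;

class LogAggregator
{
    static void Main()
    {
        var hitsPerPath = new Dictionary<string, int>();

        foreach (string line in File.ReadLines(@"C:\logs\access.log"))
        {
            string[] fields = line.Split(' ');
            if (fields.Length < 7) continue;   // skip malformed lines
            string path = fields[6];           // request path column (format-dependent)

            int count;
            hitsPerPath.TryGetValue(path, out count);
            hitsPerPath[path] = count + 1;
        }

        foreach (var entry in hitsPerPath)
            Console.WriteLine("{0}\t{1}", entry.Key, entry.Value);
    }
}
```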
Most mainstream databases are good for that (including MySQL, PostgreSQL, Oracle, etc.). MySQL is fine, though, especially if you've used it before.
Be sure to look at licenses as well: MySQL is GPL (both the database and the connectors), PostgreSQL is BSD-licensed, and Oracle (and a few others) you need to pay for.
Most web analytics companies use some kind of distributed file system to store logs, such as HDFS or QFS. The reason is that the data is too big for a traditional database.
Analytics reports are generated via MapReduce jobs.
If you want to run an ad hoc query, you normally use something like Hive, Pig, or Sawzall.
I run a very high-traffic (10M impressions a day), high-revenue web site built with .NET. The core metadata is stored on a SQL Server. My team and I have a unique caching strategy that involves querying the database for new metadata at regular intervals from a middle-tier server, serializing the data to files, and sending those to the web nodes. The web application uses the data in these files (some are actually serialized objects) to instantiate objects and caches those in memory for real-time requests.
The advantages of this model are that it:
Allows the web nodes to cache all data in memory and not incur any IO overhead querying a database.
Lets the web servers continue to run and generate revenue if the database ever goes down, either unexpectedly or for maintenance windows. You can even fire up a web server without having to retrieve its initial data from the DB, because all the data it needs is in files on its own disks.
Allows us to be completely horizontally scalable. If throughput suffers, we can just add a web server.
The disadvantage is that this caching and persistence layer adds complexity to the code that queries the database, packages the data, and unpackages it on the web server. Any time our domain model requires us to add entities, more of this "plumbing" has to be coded. This architecture has been in place for four years, and there are probably better ways to tackle this.
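For readers unfamiliar with the pattern, the packaging/unpackaging step boils down to something like the sketch below (the type name and file path are illustrative, not our real domain model):

```csharp
// Illustrative sketch of the "serialize to file, load on the web node" step.
// Type names and file paths are placeholders, not the real domain model.
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public class ProductMeta
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class MetaDataFileCache
{
    static readonly XmlSerializer Serializer = new XmlSerializer(typeof(List<ProductMeta>));

    // Middle tier: package the freshly queried metadata into a file for the web nodes.
    public static void Save(List<ProductMeta> items, string path)
    {
        using (var stream = File.Create(path))
            Serializer.Serialize(stream, items);
    }

    // Web node: load the file at startup and keep the objects in memory.
    public static List<ProductMeta> Load(string path)
    {
        using (var stream = File.OpenRead(path))
            return (List<ProductMeta>)Serializer.Deserialize(stream);
    }
}
```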
One strategy I have been considering is using replication to replicate our master SQL Server database to local database instances installed on each web server. The web server application would use normal SQL/ORM techniques to instantiate objects. Here, we could still sustain a master database outage, would not have to write specialized caching code, and could instead use NHibernate to handle the persistence.
This seems like a more elegant solution, and I would like to see what others think, or whether anyone else has alternatives to suggest.
I think you're overthinking this. SQL Server already has mechanisms available to you to handle these kinds of things.
First, implement a SQL Server cluster to protect your main database. You can fail over from node to node in the cluster without losing data, and downtime is a matter of seconds, max.
Second, implement database mirroring to protect against a cluster failure. Depending on whether you use synchronous or asynchronous mirroring, your mirrored server will either be updated in real time or be a few minutes behind. If you do it in real time, you can fail over to the mirror automatically inside your app - SQL Server 2005 and above support embedding the mirror server's name in the connection string, so you don't even have to lift a finger. The app just connects to whichever server is live.
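For example, a mirroring-aware connection string looks roughly like this (server and database names are placeholders):

```csharp
using System.Data.SqlClient;

class MirroredConnectionExample
{
    static void Main()
    {
        // Placeholder server/database names; "Failover Partner" is the key part.
        const string connectionString =
            "Data Source=SQLPRINCIPAL;Failover Partner=SQLMIRROR;" +
            "Initial Catalog=CoreMetaData;Integrated Security=True";

        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open(); // connects to the mirror automatically if the principal is unavailable
        }
    }
}
```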
Between these two things, you're protected from just about any main database failure short of a datacenter-wide power outage or network outage, and there's none of the complexity of the replication stuff. That covers your high availability issue, and lets you answer the scaling question separately.
My favorite starting point for scaling is using three separate connection strings in your application and choosing the right one based on the needs of your query:
Realtime - Points directly at the one master server. All writes go to this connection string, and only the most mission-critical reads go here.
Near-Realtime - Points at a load-balanced pool of read-only SQL Servers that are getting updated by replication or log shipping. In your original design, these lived on the web servers, but that's a dangerous practice and a maintenance nightmare. SQL Server needs a lot of memory (not to mention money for licensing) and you don't want to be tied into adding a database server for every single web server.
Delayed Reporting - In your environment right now, it's going to point to the same load-balanced pool of subscribers, but down the road you can use a technology like log shipping to have a pool of servers 8-24 hours behind. These scale out really well, but the data's far behind. It's great for reporting, search, long-term history, and other non-realtime needs.
If you design your app to use those 3 connection strings from the start, scaling is a lot easier, and doesn't involve any coding complexity - just pick the right connection string.
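A minimal sketch of what "just pick the right connection string" can look like in code (the three connection-string names are illustrative):

```csharp
// Sketch of routing queries to one of three connection strings by intent.
// The connection-string names ("Realtime", "NearRealtime", "Reporting") are illustrative.
using System.Configuration;

public enum QueryIntent { Realtime, NearRealtime, Reporting }

public static class ConnectionStrings
{
    public static string For(QueryIntent intent)
    {
        switch (intent)
        {
            case QueryIntent.Realtime:     return Get("Realtime");      // writes + mission-critical reads
            case QueryIntent.NearRealtime: return Get("NearRealtime");  // load-balanced read-only pool
            default:                       return Get("Reporting");     // hours-behind reporting pool
        }
    }

    private static string Get(string name)
    {
        return ConfigurationManager.ConnectionStrings[name].ConnectionString;
    }
}
```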
Have you considered memcached? Since it is:
in memory
can run locally
fully scalable horizontally
prevents the need to re-cache on each web server
It may fit the bill. Check out Google for lots of details and usage stories.
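A small sketch of what that looks like from .NET, assuming the Enyim.Caching client (one common choice) and an illustrative key scheme:

```csharp
// Sketch using the Enyim.Caching memcached client (one common .NET client).
// Server addresses come from the app/web.config memcached section; keys are illustrative.
using System;
using Enyim.Caching;
using Enyim.Caching.Memcached;

public class CachedProductLookup
{
    private static readonly MemcachedClient Client = new MemcachedClient();

    public Product GetProduct(int skuId)
    {
        string key = "product:" + skuId;

        var cached = Client.Get<Product>(key);
        if (cached != null)
            return cached;

        Product product = LoadFromDatabase(skuId);   // hypothetical data-access call
        Client.Store(StoreMode.Set, key, product);   // cache for subsequent requests
        return product;
    }

    private Product LoadFromDatabase(int skuId)
    {
        return new Product { SkuId = skuId };
    }
}

[Serializable]
public class Product
{
    public int SkuId { get; set; }
}
```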
Just an addition to what RickNZ proposed above:
Since the master data you are currently caching won't change very frequently, and probably only during some maintenance window, here is what you should do first on the database side:
Create snapshot replication for the master tables you want to cache. Adding new entities will be equally easy.
On all the web servers, install SQL Server Express and subscribe to this publication.
Since this is not frequently changing data, you can rest assured there won't be much of a server resource usage issue, apart from the network trips for the master data.
All the caching that was available via the previous mechanism is still available, minus the headaches that come when you add new entities.
Next, you can leverage the .NET mechanisms suggested above. You won't face a memcached-style cluster failure unless your web server itself goes down. There is a lot available in .NET that a .NET pro can point out after this stage.
It seems to me that Windows Server AppFabric (a.k.a. "Velocity") is exactly what you are looking for. From the introductory documentation:
Windows Server AppFabric provides a distributed in-memory application cache platform for developing scalable, available, and high-performance applications. AppFabric fuses memory across multiple computers to give a single unified cache view to applications. Applications can store any serializable CLR object without worrying about where the object gets stored. Scalability can be achieved by simply adding more computers on demand. The cache also allows for copies of data to be stored across the cluster, thus protecting data against failures. It runs as a service accessed over the network. In addition, Windows Server AppFabric provides seamless integration with ASP.NET that enables ASP.NET session objects to be stored in the distributed cache without having to write to databases. This increases both the performance and scalability of ASP.NET applications.
Have you considered using SqlDependency caching?
You could also write the data to the local disk at the web tier, if you're concerned about initial start-up time or DB outages. But at least with a SqlDependency, you shouldn't have to poll the DB to look for changes. It can also be made relatively transparent.
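A minimal SqlDependency sketch (connection string, table, and column names are placeholders; note that the query must follow the query-notification rules - an explicit column list and two-part table names - and Service Broker must be enabled on the database):

```csharp
// Minimal SqlDependency sketch. Table/column/connection-string names are placeholders.
// Requirements: Service Broker enabled, two-part table name, explicit column list.
using System;
using System.Data.SqlClient;

public class MetaDataCache
{
    private const string ConnectionString =
        "Data Source=SQLPRINCIPAL;Initial Catalog=CoreMetaData;Integrated Security=True";

    public void Start()
    {
        SqlDependency.Start(ConnectionString);
        LoadAndSubscribe();
    }

    private void LoadAndSubscribe()
    {
        using (var connection = new SqlConnection(ConnectionString))
        using (var command = new SqlCommand(
            "SELECT ProductId, Name, Price FROM dbo.ProductMeta", connection))
        {
            var dependency = new SqlDependency(command);
            dependency.OnChange += (sender, e) => LoadAndSubscribe(); // re-query and re-subscribe on change

            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Rebuild the in-memory cache entries here.
                }
            }
        }
    }
}
```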
In my experience, adding a DB instance on web servers generally doesn't work out too well from a scalability or performance perspective.
If you're concerned about performance and scalability, you might consider partitioning your data tier. The specifics depend on your app, but as an example, you could move read-only data onto a couple of SQL Express servers that are populated with replication.
In case it helps, I talk about this subject at length in my book (Ultra-Fast ASP.NET).
I'm looking for some help/suggestions for backing up two large databases to one server dedicated to reports. The situation is:
My company has two databases for its internal website. One for the UK and one for Europe. Both are mirrored for DR.
I have a server based in Europe which is dedicated to Microsoft Reporting Services, where we run reports based on the data collected in those two databases.
We do not want to point Reporting Services at the live databases for performance/security reasons, so we currently back up both databases on a daily basis and restore them to our Reporting Services server.
However, this means we are putting a strain on our networks by backing up the entire databases, and the data is only current as of midnight the previous day.
Our aim is to have the data no more than 15 minutes out of date. It has been suggested that we look at log shipping, so I wondered whether anyone has experience setting this up, what the pros and cons are, and whether there is a better alternative.
Any help would be greatly appreciated.
Thanks
We developed a similar environment. We used mirroring to get the data over to our reporting server and created an automated routine to create snapshots of the database every 15 minutes. These snapshots take only 1 to 2 seconds to create in our environment and give us a read-only copy of the database. Let me know if you would like me to go into deeper detail.
Note that we are running Enterprise Edition on both servers.
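For reference, the automated routine essentially issues a CREATE DATABASE ... AS SNAPSHOT statement on a schedule; a rough sketch with placeholder database, logical file, and path names:

```csharp
// Sketch of the snapshot-creation step, issued from a scheduled job.
// Database, logical data-file, and path names are placeholders.
using System;
using System.Data.SqlClient;

class SnapshotJob
{
    static void Main()
    {
        const string connectionString =
            "Data Source=REPORTSERVER;Initial Catalog=master;Integrated Security=True";

        string snapshotName = "ReportingDB_Snapshot_" + DateTime.Now.ToString("yyyyMMdd_HHmm");
        string sql =
            "CREATE DATABASE [" + snapshotName + "] ON " +
            "(NAME = ReportingDB_Data, FILENAME = 'D:\\Snapshots\\" + snapshotName + ".ss') " +
            "AS SNAPSHOT OF ReportingDB;";

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            connection.Open();
            command.ExecuteNonQuery();
            // A companion step would drop the previous snapshot and repoint reports at the new one.
        }
    }
}
```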
Log shipping is a great solution for this. We've got articles about it over at SQLServerPedia's Log Shipping section, and I've got a video tutorial on there talking you through your different options. One thing to keep in mind about log shipping is that when the restores happen, your users will be kicked out of the reporting database.
Replication doesn't have that problem, but replication is nowhere near "set-it-and-forget-it" - it's time-intensive to manage, and isn't quite as reliable as you'd like it to be. In addition, you may have to make schema modifications in order to use replication. Log shipping is more automatic & stable, but at the cost of kicking users out at restore time.
You can minimize that by having two log shipping schedules - one for daytime during business hours, and one for the rest. During business hours, you only restore the data once per hour (or less often), and the rest of the time you do it every 15 minutes.
You should look at replication as an alternative to backups.
I would recommend that you look into using Transactional Replication.
It sounds as though you are looking to implement a scenario that is similar to what we are currently implementing ourselves.
We use transactional replication (albeit in real time; you would most likely wish to synchronize your environment on a less frequent schedule) to offload a copy of our live production database to another server for reporting purposes.
Offloading reporting data is a common replication scenario and is described here in the Microsoft Replication documentation.
http://msdn.microsoft.com/en-us/library/ms151784.aspx
Brent is right in that there is indeed an element of configuration required with replication, along with security considerations that would need to be addressed. However, there are a number of key advantages to using replication, in my opinion, including:
Reduced latency in comparison to log shipping.
The ability to publish only the articles (tables) that are required for reporting.
Reduced storage requirements.
Less data being published means less network traffic.
Access to your reporting data/database at all times.
For example, in our environment, we decided to replicate only the specific tables (articles) from our production database that we actually require for reporting.
I hope what I have described is clear and makes sense but please do feel free to contact me if you have any queries.