What would be the best DB for inserting records at a very high rate?
The DB will have only one table and the application is very simple: insert a row into the DB and commit it, but the insertion rate will be very high.
I'm targeting about 5000 row inserts per second.
Very expensive DBs like Oracle or SQL Server are not an option.
Also, what are the technologies for taking a DB backup, and will it be possible to create one DB from older backed-up DBs?
I can't use the in-memory capabilities of any DB, as I can't afford to lose data if the application crashes. I need to commit each row as soon as I receive it.
If your main goal is to insert a lot of data in a little time, perhaps the filesystem is all you need.
Why not write the data to a file, optionally in a DB-friendly format (CSV, XML, ...)? That way you can probably achieve 10 times your performance goal without much trouble, and most OSs are robust enough nowadays to prevent data loss on application failure.
Edit: As said below, journaling file systems are pretty much designed so that data is not lost in case of software failures (or even hardware failures, in the case of RAID arrays). ZFS has a good reputation.
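Purely as an illustration, here is a minimal sketch of that approach in Java, appending CSV lines and forcing them to disk; the file name and row layout are made up for the example, and syncing on every row is the safe-but-slower choice (batching a few rows per sync is the usual compromise):

    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;

    public class CsvAppender implements AutoCloseable {
        private final FileOutputStream out;

        public CsvAppender(String path) throws IOException {
            // Open in append mode so restarts keep adding to the same file.
            this.out = new FileOutputStream(path, true);
        }

        public void append(long id, String payload) throws IOException {
            String line = id + "," + payload + "\n";
            out.write(line.getBytes(StandardCharsets.UTF_8));
            // Ask the OS to flush to the physical disk; this is what makes
            // the row survive an application crash.
            out.getFD().sync();
        }

        @Override
        public void close() throws IOException {
            out.close();
        }
    }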
Postgres provides a WAL (Write-Ahead Log), which essentially buffers inserts in RAM until the buffer is full or the system has time to breathe. Combine a large WAL cache with a UPS (for safety) and you get very efficient insert performance.
If you can't do SQLite, I'd take a look at Firebird SQL if I were you.
To get high throughput you will need to batch inserts into a big transaction. I really doubt you could find any DB that allows you to round-trip 5000 times a second from your client.
SQLite can handle tons of inserts (25K per second inside a transaction) provided things are not too multithreaded and the inserts are batched.
Also, if structured correctly, I see no reason why MySQL or Postgres would not support 5000 rows per second (provided the rows are not too fat). Both MySQL and Postgres are a lot more forgiving of a larger number of concurrent transactions.
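As a rough sketch of what batching looks like with plain JDBC (the table and column names are invented for the example; the same pattern applies to the SQLite, MySQL, or Postgres drivers):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;

    public class BatchInserter {
        // Insert all rows in one transaction with a single prepared statement.
        public static void insertBatch(Connection conn, List<String> payloads) throws SQLException {
            conn.setAutoCommit(false);               // one transaction for the whole batch
            String sql = "INSERT INTO events (payload) VALUES (?)";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                for (String p : payloads) {
                    ps.setString(1, p);
                    ps.addBatch();                   // queue the row client-side
                }
                ps.executeBatch();                   // send all rows in few round trips
                conn.commit();                       // pay the commit/fsync cost once per batch
            } catch (SQLException e) {
                conn.rollback();
                throw e;
            }
        }
    }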
The performance you want is really not that hard to achieve, even on a "traditional" relational DBMS. If you look at the results for unclustered TPC-C (TPC-C is the de facto standard benchmark for transaction processing), many systems can deliver 10 times your requirement on a single machine. If you are going for cheap and solid, you might want to check out DB2 Express-C. It is limited to two cores and two gigabytes of memory, but that should be more than enough to satisfy your needs.
I'm really interested in non-relational databases, but for many reasons I'm familiar with only a small part of the field. So I want to list the NoSQL technologies you use, with basic use cases, pros, and cons.
If you have run into specific issues while working with some technology, have interesting experience to relate, etc., you are welcome to share it with the community.
Personally, I have worked with:
MongoDB:
Use cases: In my opinion it is one of the best choices if you need good aggregation features and automatic replication. It scales well and has enough features to be used as an everyday database, so if for some reason you don't want an SQL solution, Mongo can be a great choice. Mongo is also great if you need dynamic queries, and it supports indexing, which is an important feature.
Pros: Fast, scales well, easy to use, built-in geospatial indexes.
Cons: Comparatively slow write operations; blocking atomic operations can cause a lot of problems; the memory-hungry process can "eat" all available memory.
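As a small illustration of the indexing mentioned above, this is roughly how an ordinary index and a geospatial index might be created with the MongoDB Java driver (assuming the sync driver 3.7+; the database, collection, and field names are invented):

    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.model.Indexes;
    import org.bson.Document;

    public class MongoIndexExample {
        public static void main(String[] args) {
            try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
                MongoCollection<Document> places =
                        client.getDatabase("demo").getCollection("places");
                // Ordinary ascending index on a field used in queries.
                places.createIndex(Indexes.ascending("name"));
                // Geospatial index for $near / $geoWithin style queries.
                places.createIndex(Indexes.geo2dsphere("location"));
            }
        }
    }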
CouchDB:
Use cases: I use it in a wiki-like project, and I think for such cases it is the perfect database. The fact that each update automatically saves the document as a new revision makes it easy to see all the changes. It is good for accumulating, occasionally changing data on which pre-defined queries are to be run.
Pros: Easy to use, REST-oriented interface, document revisions.
Cons: Performance problems when the number of docs gets quite large (more than half a million); rather poor query features (which can be addressed by adding Lucene).
SimpleDB:
Use cases: This is a data service from Amazon, the cheapest of everything they provide. It is very limited in features, so the main use case is when you want to use an Amazon service while paying as little as possible.
Pros: Cheap; all data is stored as text, which is simple to operate on; easy to use.
Cons: Many limitations (document size, collection size, attribute count, attribute size). Storing everything as text also creates additional problems when sorting by date or by number (it uses lexicographical sorting, which requires a workaround when saving dates or numbers).
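The usual workaround for lexicographical sorting, sketched in Java (the attribute values are only illustrative): zero-pad numbers to a fixed width and store dates in a fixed-width ISO 8601 format, so that string order matches numeric or chronological order.

    import java.time.Instant;
    import java.time.ZoneOffset;
    import java.time.format.DateTimeFormatter;

    public class SimpleDbEncoding {
        public static void main(String[] args) {
            // Zero-pad numbers so lexicographic order matches numeric order:
            // "0000000042" < "0000001000".
            String price = String.format("%010d", 42);

            // Fixed-width UTC timestamps sort lexicographically in
            // chronological order: "2012-03-01T10:15:30Z" < "2012-11-01T09:00:00Z".
            String when = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'")
                    .withZone(ZoneOffset.UTC)
                    .format(Instant.now());

            System.out.println(price + " " + when);
        }
    }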
Cassandra:
Cassandra is a perfect solution if writing is your main goal; it's designed to absorb a lot of writes (in some cases writing can even be faster than reading), so it's perfect for logging. It is also very useful for data analysis. On top of that, Cassandra has built-in geographical distribution features.
Strengths: Backed by Apache (good community and high quality), fast writes, no single point of failure. Easy to manage at scale (easy to deploy and to enlarge a cluster).
Weaknesses: The secondary index implementation has problems, querying by index has some limitations, and if you use indexes, insert performance decreases. There are also problems with streaming data transfer.
I'm gathering information for an upcoming massive online game. I have experience with mega-massive farm-like games (millions of DAU), and SQL databases were a great solution. I also worked on a massive online game where a NoSQL DB was used, and that particular DB (Mongo) was not the best fit: it behaved badly when a lot of connections and a lot of concurrent writes were going on.
I'm looking for facts, benchmarks, and presentations about modern massive online games, and technical details about their backend infrastructure, databases in particular.
For example, I'm interested in:
Can it manage thousands of connections? Maybe some external tool can help (like PgBouncer for Postgres).
Can it manage tens of thousands of concurrent read-writes?
What about disk space fragmentation? Can it be optimized without stopping the database?
What about smart replication? Can it tell that some data is missing from a replica when the master fails? Can I safely promote a slave to master and know exactly what data is missing, and act appropriately?
Can it fail gracefully (like Postgres, for example)?
Good reviews from use in production.
Start with the premise that hard crashes are exceedingly rare, and that when they occur it won't be a tragedy if some information is lost.
Use of the database shouldn't be strongly coupled to the routine management of the game. Routine events ought to be managed through more ephemeral storage. Some secondary process should organize ephemeral events for eventual storage in a database.
At the extreme, you could imagine there being just one database read and one database write per character per session.
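A minimal sketch of that idea in Java, assuming a hypothetical Database interface that does the single load and save per session (in a real game the state would be a proper class rather than a string):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class SessionStore {
        // Hypothetical persistence interface; stands in for whatever DB you pick.
        public interface Database {
            String load(String playerId);           // read persisted state
            void save(String playerId, String s);   // write persisted state
        }

        private final Database db;
        private final Map<String, String> live = new ConcurrentHashMap<>();

        public SessionStore(Database db) { this.db = db; }

        // One database read when the character logs in.
        public void login(String playerId) {
            live.put(playerId, db.load(playerId));
        }

        // Routine events only touch the in-memory (ephemeral) copy.
        public void onEvent(String playerId, String eventDelta) {
            live.merge(playerId, eventDelta, (oldState, delta) -> oldState + delta);
        }

        // One database write when the session ends.
        public void logout(String playerId) {
            String state = live.remove(playerId);
            if (state != null) {
                db.save(playerId, state);
            }
        }
    }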
Have you considered NoSQL?
NoSQL database systems are often highly optimized for retrieval and appending operations and often offer little functionality beyond record storage (e.g. key–value stores). The reduced run-time flexibility compared to full SQL systems is compensated by marked gains in scalability and performance for certain data models.
In short, NoSQL database management systems are useful when working with a huge quantity of data when the data's nature does not require a relational model. The data can be structured, but NoSQL is used when what really matters is the ability to store and retrieve great quantities of data, not the relationships between the elements. Usage examples might be to store millions of key–value pairs in one or a few associative arrays or to store millions of data records. This organization is particularly useful for statistical or real-time analyses of growing lists of elements (such as Twitter posts or the Internet server logs from a large group of users).
There are higher-level NoSQL solutions, for example CouchDB, which has built-in replication support.
When a web app queries some information frequently, how can I improve performance by caching the query result?
(The information is something like the top news on a website; my database is SQL Server 2008 and the application runs on Tomcat.)
I can suggest the following:
In your database you can use indexed views; please check: How to mimick Oracle Materialized Views on MS SQL Server?.
If you use JPA or Hibernate, it can cache entities (objects).
http://docs.jboss.org/hibernate/orm/3.3/reference/en/html/performance.html#performance-cache
http://en.wikibooks.org/wiki/Java_Persistence/Caching
If you're looking for a cache system that is independent of the database and the ORM, you can review Memcached or EHCache.
http://memcached.org/
http://ehcache.org/
Another option, though not generally recommended, is to manage a cache in your application yourself. For example, you can store the list of countries in the ServletContext (also known as the application context), but then you need to implement the cache's business logic (updating, deleting, and inserting objects), and you need to be careful with heap memory; see the sketch after this list.
You can use a combination of the above strategies; it depends on the context of your business.
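A minimal sketch of such a hand-rolled cache (class and method names are invented; the instance would typically be stored as a ServletContext attribute):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Supplier;

    public class SimpleTtlCache<K, V> {
        private static final class Entry<V> {
            final V value;
            final long expiresAtMillis;
            Entry(V value, long expiresAtMillis) {
                this.value = value;
                this.expiresAtMillis = expiresAtMillis;
            }
        }

        private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
        private final long ttlMillis;

        public SimpleTtlCache(long ttlMillis) {
            this.ttlMillis = ttlMillis;
        }

        // Return the cached value, or load and cache it if missing/expired.
        public V get(K key, Supplier<V> loader) {
            long now = System.currentTimeMillis();
            Entry<V> e = entries.get(key);
            if (e == null || e.expiresAtMillis < now) {
                V value = loader.get();                  // e.g. run the SQL query
                entries.put(key, new Entry<>(value, now + ttlMillis));
                return value;
            }
            return e.value;
        }

        // Call this when the underlying data changes (update/delete/insert).
        public void invalidate(K key) {
            entries.remove(key);
        }
    }

For example, a "top news" list could be fetched with cache.get("topNews", () -> newsDao.loadTopNews()) (newsDao being whatever data-access object you already have) and invalidated whenever an editor publishes a story.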
Best regards,
Ernesto.
This is a pretty general question and, as you'd expect, there are many options.
Closest to the UI, your web platform might have 'content caching.' ASP.NET, for example, will cache portions of a page for specified periods of time.
You could use a caching tool like memcached and cache a recordset (or the equivalent stand-alone Java data structure).
Some ORMs provide caching too.
And (probably not finally) you could define structures in your database to 'cache' results, by running the complex queries ahead of time and saving the results into tables that are queried more often but are cheaper to query.
Just some ideas.
The answer for a really big site is all of the above. We do all our queries via stored procs. That helps because the query is compiled and one execution plan is reused.
We have a wickedly complicated table-valued function. It's so expensive that we built a cache table. The table has the same general format as the function's result, but with two extras: one is an expiry time, the other is a search key. The search key is the parameters that go into the function, concatenated together. Whenever we're about to query that table, we run a proc to check whether the data is stale. If it is, we start a transaction, delete the rows, then run the function and insert the fresh rows. This means we run the function maybe 2 or 3% of the times we used to, and the proc call we make to check for staleness is much cheaper. Whenever the app updates the relevant data, it marks the cache rows as stale, but it doesn't delete them; we leave that to the cache-check proc. Why? Maybe nobody will need that data right now, so that's one less DB hit.
Then we hit the second layer: we cache many recordsets in memcached, including the results of all the procs that call that function, and many more. That actually happens in our ASP layer, which we still have. ADO recordsets can be persisted to XML natively, which then goes into memcached as a string.
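A rough sketch of that check-stale-then-refresh pattern in Java/JDBC (the table, column, and view names are invented; the answer above does this inside stored procedures):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Timestamp;
    import java.time.Instant;

    public class CacheTableRefresher {
        // Refresh the cache rows for one search key if they are stale or missing.
        public void refreshIfStale(Connection conn, String searchKey) throws SQLException {
            String check = "SELECT expires_at FROM result_cache WHERE search_key = ?";
            try (PreparedStatement ps = conn.prepareStatement(check)) {
                ps.setString(1, searchKey);
                try (ResultSet rs = ps.executeQuery()) {
                    boolean fresh = rs.next()
                            && rs.getTimestamp("expires_at").toInstant().isAfter(Instant.now());
                    if (fresh) {
                        return;                       // cheap path: cache rows are still valid
                    }
                }
            }

            conn.setAutoCommit(false);
            try {
                try (PreparedStatement del =
                             conn.prepareStatement("DELETE FROM result_cache WHERE search_key = ?")) {
                    del.setString(1, searchKey);
                    del.executeUpdate();
                }
                // Re-run the expensive query and repopulate the cache table.
                String fill = "INSERT INTO result_cache (search_key, payload, expires_at) "
                            + "SELECT ?, payload, ? FROM expensive_view WHERE param = ?";
                try (PreparedStatement ins = conn.prepareStatement(fill)) {
                    ins.setString(1, searchKey);
                    ins.setTimestamp(2, Timestamp.from(Instant.now().plusSeconds(300)));
                    ins.setString(3, searchKey);
                    ins.executeUpdate();
                }
                conn.commit();
            } catch (SQLException e) {
                conn.rollback();
                throw e;
            }
        }
    }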
Is a database a reasonable data structure for memoization? When extremely large amounts of data need to be cached, it may be unreasonable for an ordinary piece of software to actively maintain it in memory. A database makes it easy to store the results of calculations for later use, meaning calculations can be stopped and started at any time without affecting a program's progress. If the database is shared, processing can also be distributed among multiple systems (a computer cluster).
My only reservation is that the delay caused by querying a database may impact algorithm performance, especially if an algorithm processes many permutations very quickly. Of course, database memoization would only be necessary if the space complexity of an algorithm / application is extremely high (gigabytes). Any thoughts?
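To make the premise concrete, a database-backed memoizer might look roughly like this in Java/JDBC (the memo table and the compute function are made up for the sketch):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.function.Function;

    public class DbMemoizer {
        private final Connection conn;
        private final Function<String, String> compute;   // the expensive calculation

        public DbMemoizer(Connection conn, Function<String, String> compute) {
            this.conn = conn;
            this.compute = compute;
        }

        // Assumes a table: CREATE TABLE memo (k VARCHAR PRIMARY KEY, v VARCHAR)
        public String get(String key) throws SQLException {
            try (PreparedStatement ps = conn.prepareStatement("SELECT v FROM memo WHERE k = ?")) {
                ps.setString(1, key);
                try (ResultSet rs = ps.executeQuery()) {
                    if (rs.next()) {
                        return rs.getString(1);            // cache hit: skip the calculation
                    }
                }
            }
            String value = compute.apply(key);             // cache miss: do the real work
            try (PreparedStatement ps =
                         conn.prepareStatement("INSERT INTO memo (k, v) VALUES (?, ?)")) {
                ps.setString(1, key);
                ps.setString(2, value);
                ps.executeUpdate();                        // persist for later runs / other workers
            }
            return value;
        }
    }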
If you're asking about handling extremely large data on a single machine, the answer is almost certainly NO! On modern hardware, if the answer is not no, then either there is a pattern to the calculation or the computation should be ruled infeasible. But there are several variations where it can make sense.
The win with memoization is that the cost of recalculation is more than fetching your previous answer. But if your answer fits in RAM, then there is no win to using a database since it is faster to just keep the store in memory. So the only interesting case for the database is where the answer does not fit in RAM.
Let's suppose, for the sake of argument, that each key/value pair takes a whopping 640 bytes, and that you have 64 GB of RAM available. For the data not to fit in RAM, you therefore need over 100 million facts, created and accessed randomly. Now consider the actual hardware. Facts that don't fit in RAM are stored on a hard drive. The hard drive spins at, let's say, 6k RPM, or 100 times per second. That makes the time to fetch/store a random piece of data an average of 1/200th of a second (on average you have to spin half-way to find your data). So after you fill your data structure, accessing it all again randomly takes 100 million * 0.005 s = 500,000 seconds, which is nearly six days just to read the data back once (let alone create it), and a memoized workload touches its facts over and over, so the total run time quickly climbs toward the mean time between failures for the hardware. (There is some parallelism we can take advantage of here: hard drives can service several outstanding requests at a time, but that is limited and will not save you.)
The moral is that randomly accessing large data sets on disk is not feasible. Even if you put a database in front of it. Hard drives are not RAM, and should not be thought of as such.
But all is not lost.
A scenario where the database makes sense is your suggestion of a distributed computation. If your computational steps are expensive, memoized calls are relatively few, and the data can fit in memory, then a database is very convenient. Calls to the database will be fast (things are in memory), you can't simply keep things on a local hard drive (your data is spread out across multiple machines to use CPUs so there is no shared hard drive), and the database may be convenient simply because it is there. (I've used databases this way before, and been very happy.)
However, in this scenario the database is just a key/value store. While a SQL database works, you may want to consider NoSQL solutions. And once you go to NoSQL solutions, you have options for data stores where the data has been sharded such that it all fits in RAM, no matter how much data you have. (Yes, you can shard relational databases as well; eBay is a good example of a company that I know does this, but once you do, you tend to lose the "relational" part of it. Yes, I know that several companies claim otherwise; their claims come with significant caveats.)
In fact when you do a Google search you are running against just this kind of sharded data store, which contains what is essentially memoized answers to a lot of questions about which pages match which key words, and which of those pages are most relevant. Without memoization they could never do it. But they also could never actually do it if they had to go to a hard drive for the answer. (They're also not using SQL...)
Would anyone please explain which is better to use, SQLite or SQL Server? I was using an XML file as data storage (add, delete, update). Someone suggested using SQLite for faster operation, but I am not familiar with SQLite; I know SQL Server.
SQLite is a great embedded database that you deploy along with your application. If you're writing a distributed application that customers will install, then SQLite has the big advantage of not having any separate installer or maintenance--it's just a single dll that gets deployed along with the rest of your application.
SQLite also runs in process and reduces a lot of the overhead that a database brings--all data is cached and queried in-process.
SQLite integrates with your .NET application better than SQL Server. You can write custom functions in any .NET language that run inside the SQLite engine but are still within your application's calling process and address space, and thus can call out to your application to integrate additional data or perform actions while executing a query. This very unusual ability makes certain actions significantly easier.
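(The point above is about .NET; purely as an illustration of the same idea from Java, this is roughly what an application-defined SQLite function looks like, assuming the xerial sqlite-jdbc driver, which exposes org.sqlite.Function for this purpose. The function name and query are made up.)

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;
    import org.sqlite.Function;

    public class SqliteCustomFunction {
        public static void main(String[] args) throws SQLException {
            try (Connection conn = DriverManager.getConnection("jdbc:sqlite::memory:")) {
                // Register an application-defined SQL function named SHOUT.
                Function.create(conn, "SHOUT", new Function() {
                    @Override
                    protected void xFunc() throws SQLException {
                        // Runs inside the application's process for every row.
                        result(value_text(0).toUpperCase());
                    }
                });
                try (Statement st = conn.createStatement();
                     ResultSet rs = st.executeQuery("SELECT SHOUT('hello')")) {
                    rs.next();
                    System.out.println(rs.getString(1));   // prints HELLO
                }
            }
        }
    }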
SQLite is generally a lot faster than SQL Server.
However, SQLite only supports a single writer at a time (meaning the execution of an individual transaction). SQLite locks the entire database when it needs a lock (either read or write), and only one writer can hold a write lock at a time. Due to its speed this isn't actually a problem for low to moderate size applications, but if you have a higher volume of writes (hundreds per second) it could become a bottleneck. There are a number of possible solutions, such as separating the data into different databases, or queuing the writes and applying them asynchronously. However, if your application is likely to run into these usage requirements and hasn't already been written for SQLite, then it's best to use something else, like SQL Server, that has finer-grained locking.
UPDATE: SQLite 3.7.0 added a new journal mode called Write-Ahead Logging (WAL) that supports concurrent reading while writing. In our internal multi-process contention test, the timing went from 110 seconds to 8 seconds for the exact same sequence of contentious reads/writes.
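A rough sketch of the "queue the writes through a single writer" idea in Java, also enabling the WAL journal mode mentioned above (the table name and JDBC URL are assumptions for the example):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.Statement;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class SingleWriter implements Runnable {
        private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        private final String url;

        public SingleWriter(String url) { this.url = url; }

        // Any thread may enqueue; only the writer thread touches the database.
        public void submit(String payload) { queue.add(payload); }

        @Override
        public void run() {
            try (Connection conn = DriverManager.getConnection(url)) {
                try (Statement st = conn.createStatement()) {
                    st.execute("PRAGMA journal_mode=WAL");   // readers no longer block the writer
                }
                try (PreparedStatement ps =
                             conn.prepareStatement("INSERT INTO events (payload) VALUES (?)")) {
                    while (!Thread.currentThread().isInterrupted()) {
                        ps.setString(1, queue.take());       // blocks until a row is available
                        ps.executeUpdate();
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        public static void main(String[] args) {
            SingleWriter writer = new SingleWriter("jdbc:sqlite:app.db");
            new Thread(writer, "sqlite-writer").start();
            writer.submit("first event");
        }
    }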
The two are in different leagues altogether. One is built for enterprise-level data management and the other is for embedded or serverless environments (including mobile devices). SQLite deployments can hold many hundreds of GBs of data, but that is not what it is built for.
Updated to reflect the updated question:
Please read this blog post on SQLite. I hope it helps you and points you to resources for programmatically accessing SQLite from .NET.