When a web app queries the same information frequently, how can I improve performance by caching the query results?
(The information is something like the top news on a website; my database is SQL Server 2008 and the application runs on Tomcat.)
I can suggest the following:
In your database you can use indexed views; please check: How to mimic Oracle Materialized Views on MS SQL Server?.
If you use JPA or Hibernate, it can cache entities (objects):
http://docs.jboss.org/hibernate/orm/3.3/reference/en/html/performance.html#performance-cache
http://en.wikibooks.org/wiki/Java_Persistence/Caching
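As a rough illustration of the idea (not taken from those docs), here is a minimal sketch of marking one entity as cacheable in the second-level cache. The TopNewsItem entity and its fields are made up, and the exact configuration property names depend on the Hibernate version you use.

import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

// The session factory also needs the second-level cache switched on, e.g.
// hibernate.cache.use_second_level_cache=true plus a cache provider such as
// Ehcache (property names vary between Hibernate versions).
@Entity
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE)   // cache loaded instances between sessions
public class TopNewsItem {

    @Id
    private Long id;

    private String title;

    // getters and setters omitted for brevity
}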
If you're looking for a cache system that is independent of the database and the ORM, you could look at Memcached or Ehcache:
http://memcached.org/
http://ehcache.org/
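For example, a minimal cache-aside sketch with the Ehcache 2.x API might look like the following. The cache name, the TopNewsCache class and loadTopNewsFromDatabase() are hypothetical stand-ins for your own code.

import java.util.Collections;
import java.util.List;
import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class TopNewsCache {

    private final Cache cache;

    public TopNewsCache() {
        CacheManager manager = CacheManager.create();
        // name, max in-memory elements, overflow to disk, eternal, TTL seconds, TTI seconds
        cache = new Cache("topNews", 1000, false, false, 300, 300);
        manager.addCache(cache);
    }

    @SuppressWarnings("unchecked")
    public List<String> getTopNews() {
        Element cached = cache.get("top-news");
        if (cached != null) {
            return (List<String>) cached.getObjectValue();   // cache hit: skip the database
        }
        List<String> news = loadTopNewsFromDatabase();        // cache miss: run the SQL query
        cache.put(new Element("top-news", news));
        return news;
    }

    private List<String> loadTopNewsFromDatabase() {
        // stand-in for the real SQL Server query
        return Collections.singletonList("placeholder headline");
    }
}

The five-minute TTL means the expensive query runs at most once every five minutes per node, no matter how many requests hit the page.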
Another option, though not recommended, is to manage a cache in your application yourself. For example, you can store the list of countries in the ServletContext (also known as the application context), but then you need to implement the cache's business logic yourself (updating, deleting and inserting objects), and you need to be careful with heap memory.
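If you do go that route, a minimal sketch using the standard javax.servlet API could look like this; the CountryCacheListener class and the countries query are made up.

import java.util.Arrays;
import java.util.List;
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;

public class CountryCacheListener implements ServletContextListener {

    public void contextInitialized(ServletContextEvent sce) {
        // Load the reference data once at startup and keep it in application scope.
        List<String> countries = loadCountriesFromDatabase();
        sce.getServletContext().setAttribute("countries", countries);
    }

    public void contextDestroyed(ServletContextEvent sce) {
        sce.getServletContext().removeAttribute("countries");
    }

    private List<String> loadCountriesFromDatabase() {
        // stand-in for the real query
        return Arrays.asList("Argentina", "Brazil", "Chile");
    }
}

The listener still has to be registered in web.xml (or annotated with @WebListener on Servlet 3.0+), any servlet or JSP can then read the list with getServletContext().getAttribute("countries"), and refreshing it when the underlying table changes is entirely your responsibility, which is why this option is not recommended above.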
You can use a combination of the above strategies; it depends on the context of your business.
Best regards,
Ernesto.
This is a pretty general question and, as you'd expect, there are many options.
Closest to the UI, your web platform might have 'content caching.' ASP.NET, for example, will cache portions of a page for specified periods of time.
You could use a caching tool like memcached and cache a recordset (or whatever the equivalent stand-alone Java data structure is).
Some ORMs provide caching too.
And (probably not finally) you could define structure in your database to 'cache' results like this by running complex queries and saving the results into tables that are queried more often but are cheaper to query.
Just some ideas.
The answer for a really big site is all of the above.

We do all our queries via stored procs. That helps because the query is compiled and one execution plan is reused. We have a wickedly complicated table-valued function. It's so expensive that we built a cache table. The table has the same general format as the function but with two extras: one is an expire time, the other is a search key. The search key is the parameters that go into the function, concatenated together. Whenever we're about to query that table, we run a proc to check whether the data is stale. If it is, we start a transaction, delete the rows, then run the function and insert the fresh rows. This means we run the function maybe 2 or 3% of the times we used to, and the proc call we make to check for staleness is much cheaper. Whenever the app updates the relevant data, it marks the cache rows as stale - but it doesn't delete them; we leave that to the cache-check proc. Why? Maybe nobody will need that data right now, so: less database load.

Then we hit the second layer. We cache many recordsets in memcached, including the results of all the procs that call that function, and many more. That actually happens in our ASP layer, which we still have. ADO recordsets can be persisted to XML natively, which then goes into memcached as a string.
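For what it's worth, the read side of that cache-table pattern can be sketched from Java over JDBC roughly like this. Everything here is hypothetical - the procedure names, the cache table, and the assumption that the refresh proc wraps the delete and re-insert in its own transaction - the real work happens in T-SQL.

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Types;

public class CachedFunctionQuery {

    // Reads the cached rows for one search key, refreshing them first if stale.
    public void queryThroughCache(Connection con, String searchKey) throws SQLException {
        // 1. Cheap call: is the cached result for this key missing or expired?
        boolean stale;
        try (CallableStatement check = con.prepareCall("{? = call dbo.usp_IsCacheStale(?)}")) {
            check.registerOutParameter(1, Types.BIT);
            check.setString(2, searchKey);
            check.execute();
            stale = check.getBoolean(1);
        }

        // 2. Only when stale: delete the old rows and re-run the expensive
        //    table-valued function, all inside one transaction in the proc.
        if (stale) {
            try (CallableStatement refresh = con.prepareCall("{call dbo.usp_RefreshCache(?)}")) {
                refresh.setString(1, searchKey);
                refresh.execute();
            }
        }

        // 3. Serve the rows from the cheap cache table.
        try (PreparedStatement select = con.prepareStatement(
                "SELECT * FROM dbo.FunctionCache WHERE SearchKey = ?")) {
            select.setString(1, searchKey);
            try (ResultSet rs = select.executeQuery()) {
                while (rs.next()) {
                    // map each row to your domain objects here
                }
            }
        }
    }
}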
As Wikipedia says
Database triggers are commonly used to:
audit changes (e.g. keep a log of the users and roles involved in changes)
enhance changes (e.g. ensure that every change to a record is time-stamped by the server's clock)
enforce business rules (e.g. require that every invoice have at least one line item), etc.
ref: database triggers - Wikipedia
But we can easily do these things inside the business layer using a common programming language (especially with OOP). So what is the necessity of database triggers in modern software architecture? Why do we really need them?
It might work if all data is changed only by your application. But there are other cases, which I have seen very frequently:
There are other applications (like batch jobs doing imports, etc.) which do not use the business layer.
You cannot easily use plain SQL scripts as a means for hotfixes.
Apart from that, in some cases you can even combine both worlds: define a trigger in the database and implement it in Java. PostgreSQL, for example, supports triggers written in Java. In Oracle you can call a Java method from a PL/SQL trigger, and in MS SQL Server you can define CLR-based triggers.
This way not every programmer needs to learn PL/SQL, and data integrity is enforced by the database.
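As a concrete illustration of "trigger logic written in Java" (using H2 here simply because its trigger API is plain Java; PostgreSQL's PL/Java and SQL Server's CLR triggers follow the same idea), a sketch might look like this. The audit table and column names are made up.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import org.h2.api.Trigger;

public class AuditTrigger implements Trigger {

    public void init(Connection conn, String schemaName, String triggerName,
                     String tableName, boolean before, int type) {
        // nothing to prepare for this example
    }

    public void fire(Connection conn, Object[] oldRow, Object[] newRow) throws SQLException {
        // newRow[0] is assumed to be the primary key of the updated row
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO invoice_audit (invoice_id, changed_at) VALUES (?, ?)")) {
            ps.setObject(1, newRow[0]);
            ps.setTimestamp(2, new Timestamp(System.currentTimeMillis()));
            ps.executeUpdate();
        }
    }

    public void close() { }

    public void remove() { }
}

// Registered once in SQL, after which the database enforces it for every writer:
// CREATE TRIGGER invoice_audit_trg AFTER UPDATE ON invoice FOR EACH ROW
//     CALL "com.example.AuditTrigger";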
Think about the performance. If this is all to be done from the application, there are most likely a lot of extra SQL*Net round trips, slowing down the application. Having those actions defined in the database makes sure that they are always enforced, not only when the application is used to access the data.
When the database is in control, you have your rules defined in one central location, the database, instead of in many locations in the application.
Yes, you can completely omit database triggers.
However, if you can't guarantee that your database will only ever be accessed through the application layer (and in practice you can't), then you need them. Yes, you can perform all your database logic in the application layer, but if a table needs X done to it whenever it is updated, the only place you can guarantee that is a trigger. If you don't use one, people accessing your database directly, outside your application, will break your application.
There is nothing else you can do. If you need a trigger, use one. Do not assume that all connections to your database will be through your application...
Recently I encountered the concept of NoSQL, and as far as I can tell it is good for dealing with huge amounts of data.
My question is: at what point does using NoSQL become worthwhile? Is it only for companies which handle really huge amounts of data, like Google or Facebook, or is it worth the trouble of switching to it from a SQL database even for smaller amounts of data?
I wonder what "concept of NoSQL" you mean, because it is an umbrella term for a wide field of different database technologies. The only thing they have in common is what sets them apart from each other: they are "not (only) SQL". They have widely different philosophies, use-cases and target groups.
Just to give you an overview, here are a few of the large factions of NoSQL databases.
There are document-based databases like MongoDB or CouchDB. Their advantage is that they do not require a consistent data structure. They are useful when your requirements, and thus your database layout, change constantly, or when you are dealing with datasets which belong together but still look very different. When you have a lot of tables with two columns called "key" and "value", then these might be worth looking into.
There are graph databases like Neo4j or GiraffeDB. Their focus is on defining data by its relations to other data. When you have a lot of tables whose primary keys are the primary keys of two other tables (and maybe some data describing the relation between them), then these might be something for you.
Then you have simple key-value stores like MemcacheDB, Cassandra or Google's BigTable. They are very simplistic, but that makes them fast and easy to use. When you have no need for stored procedures, constraints, triggers and all those advanced database features and you just want fast storage and retrieval of your data, then those are for you.
And these are just a few facets of the new database world.
But there is still one sector where relational databases excel, and that's when it comes to following the ACID principle. Most NoSQL databases don't fully guarantee all four of these:
Atomic transactions (chains of commands which are processed together, in order and all-or-none)
Consistent database schema, with constraints and triggers which ensure that garbage data cannot exist in the database
Isolation of transactions - transactions which are guaranteed to be unaffected by others which happen at the same time
Durability - safety from data loss even in case of a sudden system crash*
(* To be fair, most of the databases listed above are indeed pretty durable, especially those which are easy to set up as redundant fail-over clusters.)
I would like to remove the SQL dependency for small chunks of data that I load on (almost) every request in a web application. Most of the data is key-value/document structured, but a relational solution is not excluded. The data is not too big, so I want to keep it in memory for higher availability.
What solution would you recommend?
The simplest and most widely used in-memory key-value store is Memcached. Its introduction page reiterates what you are asking for:
Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering.
The client list is impressive. It has been around for a long time and the documentation is good. It has an API for almost every programming language, and horizontal scaling is pretty simple. In my experience, Memcached is good.
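For example, a minimal cache-aside sketch in Java with the spymemcached client might look like the following; the key name and loadFromDatabase() are placeholders.

import java.io.IOException;
import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

public class MemcachedExample {

    public static void main(String[] args) throws IOException {
        MemcachedClient client = new MemcachedClient(new InetSocketAddress("localhost", 11211));

        Object value = client.get("top-news");          // try the cache first
        if (value == null) {
            value = loadFromDatabase();                  // miss: run the real query
            client.set("top-news", 300, value);          // cache for 5 minutes
        }
        System.out.println(value);
        client.shutdown();
    }

    private static Object loadFromDatabase() {
        return "expensive query result";                 // stand-in for the real call
    }
}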
You may also want to look into MemBase.
Redis is perfect for this kind of data. It also supports some fundamental data structures and provides operations on them.
I recently converted my Django forum app to use it for all real-time/tracking data - it's so good to no longer have the icky feeling you get when you do this kind of stuff (SET views = views + 1 and other writes on every page view) with a relational database.
Here's an example of using Redis to store data required for user activity tracking, including keeping an ordered set of last seen users up to date, in Python:
import datetime
import time

from django.utils.html import escape

# `redis` is an already-configured client instance; ACTIVE_USERS and the USER_*
# key templates are module-level constants defined elsewhere in the app.

def seen_user(user, doing, item=None):
    """
    Stores what a User was doing when they were last seen and updates
    their last seen time in the active users sorted set.
    """
    last_seen = int(time.mktime(datetime.datetime.now().timetuple()))
    # member/score argument order as in older redis-py; newer versions take a mapping
    redis.zadd(ACTIVE_USERS, user.pk, last_seen)
    redis.setnx(USER_USERNAME % user.pk, user.username)
    redis.set(USER_LAST_SEEN % user.pk, last_seen)
    if item:
        doing = '%s <a href="%s">%s</a>' % (
            doing, item.get_absolute_url(), escape(str(item)))
    redis.set(USER_DOING % user.pk, doing)
If you don't mind the SQL but want to keep the database in memory, you might want to check out SQLite (see http://www.sqlite.org/inmemorydb.html).
If you don't want the sql and you really only have key-value pairs, why not just store them in a map / hash / associative array and be done with it?
If you end up needing an in-memory database, H2 is a very good option.
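For example, a minimal sketch of an in-memory H2 database over plain JDBC; the table and URL are arbitrary, and DB_CLOSE_DELAY=-1 keeps the database alive for the lifetime of the JVM instead of dropping it when the last connection closes.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class H2InMemoryExample {

    public static void main(String[] args) throws SQLException {
        try (Connection con = DriverManager.getConnection("jdbc:h2:mem:appdata;DB_CLOSE_DELAY=-1");
             Statement st = con.createStatement()) {
            st.execute("CREATE TABLE settings (k VARCHAR(100) PRIMARY KEY, v VARCHAR(4000))");
            st.execute("INSERT INTO settings VALUES ('theme', 'dark')");
            try (ResultSet rs = st.executeQuery("SELECT v FROM settings WHERE k = 'theme'")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));   // prints: dark
                }
            }
        }
    }
}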
One more database to consider: Berkeley DB. Berkeley DB allows you to configure the database to be in-memory, on-disk or both. It supports both a key-value (NoSQL) and a SQL API. Berkeley DB is often used in combination with web applications because it's embedded, easily deployed (it deploys with your application), highly configurable and very reliable. There are several e-Retail web sites that rely on Berkeley DB for their e-Commerce applications, including Amazon.com.
I'm not sure this is what you are looking for, but you should look into a caching framework (something that may already be included in the tools you are using now). With the repository pattern you ask the repository for the data; it checks whether the data is in the cache by key. If it isn't, it fetches it from the database; if it is, it fetches it from the cache.
How long to keep data in the cache depends on what kind of data you are handling, so that's up to you to decide. A sliding timeout is perhaps best, as you keep the data as long as the key keeps being requested. That means that if the cache holds data for a user, once the user goes away, the data will expire from the cache.
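As a rough sketch of that repository-plus-sliding-timeout idea in Java (the UserRepository class and loadUserFromDatabase() are made up; a real caching framework would add size limits and eviction):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class UserRepository {

    private static final long SLIDING_TIMEOUT_MS = 10 * 60 * 1000;   // 10 minutes

    private static final class Entry {
        final Object value;
        volatile long lastAccess;
        Entry(Object value) { this.value = value; this.lastAccess = System.currentTimeMillis(); }
    }

    private final ConcurrentMap<Long, Entry> cache = new ConcurrentHashMap<>();

    public Object findUser(long id) {
        Entry e = cache.get(id);
        long now = System.currentTimeMillis();
        if (e != null && now - e.lastAccess < SLIDING_TIMEOUT_MS) {
            e.lastAccess = now;                      // sliding: every hit extends the lifetime
            return e.value;
        }
        Object user = loadUserFromDatabase(id);      // miss or expired: go to the database
        cache.put(id, new Entry(user));
        return user;
    }

    private Object loadUserFromDatabase(long id) {
        return "user-" + id;                         // stand-in for the real query
    }
}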
Can you shard this data? Is the data access pattern simple and stable (i.e., it does not change with changing business requirements)? How critical is this data (session context, for example, is not too hard to restore, whereas some preferences a user has entered on a settings page should not be lost)?
Typically, provided you can shard and your data access patterns are simple and do not mutate too much, you choose Redis. If you look for something more reliable and supporting more advanced data access patterns, Tarantool is a good option.
Please do check out MongoDB:
http://www.mongodb.org/
It's a really good NoSQL database with drivers and support for all major languages.
Would anyone please explain which is better to use, SQLite or SQL Server? I was using an XML file as data storage for add, delete and update operations. Someone suggested using SQLite for faster operation, but I am not familiar with SQLite; I know SQL Server.
SQLite is a great embedded database that you deploy along with your application. If you're writing a distributed application that customers will install, then SQLite has the big advantage of not having any separate installer or maintenance--it's just a single dll that gets deployed along with the rest of your application.
SQLite also runs in-process and reduces a lot of the overhead that a database normally brings: all data is cached and queried in-process.
SQLite integrates with your .NET application better than SQL Server. You can write custom functions in any .NET language that run inside the SQLite engine, yet still within your application's calling process and address space, and can therefore call back into your application to integrate additional data or perform actions while executing a query. This very unusual ability makes certain actions significantly easier.
SQLite is generally a lot faster than SQL Server.
However, SQLite only supports a single writer at a time (meaning the execution of an individual transaction). SQLite locks the entire database when it needs a lock (either read or write), and only one writer can hold a write lock at a time. Due to its speed this actually isn't a problem for low- to moderate-size applications, but if you have a higher volume of writes (hundreds per second) then it could become a bottleneck. There are a number of possible solutions, such as separating the database data into different databases or caching the writes in a queue and writing them asynchronously. However, if your application is likely to run into these usage requirements and hasn't already been written for SQLite, then it's best to use something else, like SQL Server, that has finer-grained locking.
UPDATE: SQLite 3.7.0 added a new journal mode called write-ahead logging (WAL) that supports concurrent reading while writing. In our internal multi-process contention test, the timing went from 110 seconds to 8 seconds for the exact same sequence of contentious reads/writes.
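The mode is enabled with a single PRAGMA from whichever client library you use; as a rough sketch, here it is with the Xerial sqlite-jdbc driver (a Java example, to match the rest of this page; the database file name is arbitrary).

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class SqliteWalExample {

    public static void main(String[] args) throws SQLException {
        try (Connection con = DriverManager.getConnection("jdbc:sqlite:app.db");
             Statement st = con.createStatement()) {
            // WAL lets readers keep reading while one writer writes.
            st.execute("PRAGMA journal_mode=WAL");
        }
    }
}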
The two are in different leagues altogether. One is built for enterprise-level data management, the other for mobile devices (embedded or serverless environments). Although SQLite deployments can hold many hundreds of GB of data, that is not what it is built for.
Updated to reflect the updated question:
Please read this blog post on SQLite. I hope it helps you and redirects you to resources for programmatically accessing SQLite from .NET.
What would be the best DB for inserting records at a very high rate?
The DB will have only one table and the application is very simple: insert a row into the DB and commit it, but the insertion rate will be very high.
I am targeting about 5000 row inserts per second.
Any of the very expensive DBs like Oracle or SQL Server are not an option.
Also, what are the technologies for taking a DB backup, and will it be possible to create one DB from the older backed-up DBs?
I can't use the in-memory capabilities of any DB, as I can't afford to lose data if the application crashes. I need to commit each row as soon as I receive it.
If your main goal is to insert a lot of data in a little time, perhaps the filesystem is all you need.
Why not write the data to a file, optionally in a DB-friendly format (CSV, XML, ...)? That way you can probably achieve ten times your performance goal without too much trouble. And most OSes are robust enough nowadays to prevent data loss on application failures.
Edit: As said below, journaling file systems are pretty much designed so that data is not lost in case of software failures (or even hardware failures, in the case of RAID arrays). ZFS has a good reputation.
Postgres provides a WAL (Write-Ahead Log), which essentially does inserts into RAM until the buffer is full or the system has time to breathe. Combine a large WAL cache with a UPS (for safety) and you get very efficient insert performance.
If you can't do SQLite, I'd take a look at Firebird SQL if I were you.
To get high throughput you will need to batch inserts into a big transaction. I really doubt you could find any db that allows you to round trip 5000 times a second from your client.
SQLite can handle tons of inserts (25K per second in a transaction) provided things are not too multithreaded and the inserts are batched.
Also, if structured correctly, I see no reason why MySQL or Postgres would not support 5000 rows per second (provided the rows are not too fat). Both MySQL and Postgres are a lot more forgiving of a larger number of concurrent transactions.
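As an illustration of batching inserts into one transaction, a rough JDBC sketch might look like this; the table, the columns and the connection URL are placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BatchInsertExample {

    public static void main(String[] args) throws SQLException {
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/app", "app", "secret")) {
            con.setAutoCommit(false);                       // one commit per batch, not per row
            try (PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO measurements (sensor_id, value) VALUES (?, ?)")) {
                for (int i = 0; i < 5000; i++) {
                    ps.setInt(1, i % 10);
                    ps.setDouble(2, Math.random());
                    ps.addBatch();
                    if (i % 500 == 0) {
                        ps.executeBatch();                  // send rows to the server in chunks
                    }
                }
                ps.executeBatch();
                con.commit();                               // durable only at this point
            }
        }
    }
}

Note that committing in batches trades a little durability latency for throughput: rows are only guaranteed on disk once the transaction commits, so the batch size has to be weighed against the "commit each row as I receive it" requirement.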
The performance you want is really not that hard to achieve, even on a "traditional" relational DBMS. If you look at the results for unclustered TPC-C (TPC-C is the de-facto standard benchmark for transaction processing) many systems can provide 10 times your requirements in an unclustered system. If you are going for cheap and solid you might want to check out DB2 Express-C. It is limited to two cores and two gigabytes of memory but that should be more than enough to satisfy your needs.