How to decide whether I need to transition away from sqlite [closed] - database

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
I'm creating a website using django. It's just about done but it hasn't gone live yet. I'm trying to decide whether SQLite is good enough for the site or if it would be worthwhile to use PostgreSQL now, at the beginning, rather than risk needing to transition to it later. (In this post, I mention PostgreSQL because that's the other contender for me. I'm sure similar analysis could be made with MySQL or Oracle.)
I could use some input from people about how they decide upon what database to use with their django projects.
Here's what I currently understand about this:
From my experience, SQLite is super easy. I don't need to worry about installing some other dependency for it and it pretty much just works out of the box with django.
From my research online, it seems that SQLite is able to handle quite a bit of load before becoming a performance bottleneck.
Here's what I don't know:
What would be involved with me transitioning to PostgreSQL from SQLite? Again, I'm currently in a development-only phase and therefore would not need to transition any database data from SQLite. Is it pretty much just a matter of installing PostgreSQL on the server and then tweaking the settings.py file to use it? I doubt it, but would any of my django code need to change? (I don't have any raw SQL queries - my database access is limited to django's model API.)
From a performance perspective, is PostgreSQL simply better in every way over SQLite? Or does SQLite have certain advantages over PostgreSQL?
Performance aside, does using PostgreSQL offer other deployment benefits over SQLite?
Essentially I'm thinking that SQLite is good enough for my little site. What's the odds of it becoming really popular? Probably not that great. SQLite is working for me now and would require no changes from my end. However, I'm concerned that maybe using PostgreSQL from the beginning would be easy and that I'll kick myself a year from now for not making the transition. I'm torn though - if I go to PostgreSQL, perhaps it will be unnecessary hassle on my part with no benefit.
Does anyone have general guidelines for deciding between SQLite and something else?
Thanks!

Here are a few things to consider.
SQLite does not allow concurrent writes. If an insert or an update is issued, the entire database is locked, and even readers are not allowed during a short moment of the actual update. If your application is going to have many users that update its state (posting comments, adding likes, etc), this is going to become a bottleneck. Unpleasant slowdowns will time to time occur even with a relatively small number of users.
SQLite does not allow several processes to efficiently access a database. There can only be one writing process, even if you have multiple CPUs, and even then the locking mechanism is very inefficient. To ensure data integrity you'd need to jump through many hoops, and every update will be horribly slow. Postgres can reorder locks optimally, lock tables at row level or even update without locking, so it will run circles around SQLite performance-wise, unless your database is strictly read-only.
SQLite does not allow data partitioning or even putting different tables to different tablespaces; everything lives in one file. If you have a "hot" table that is touched very often (e.g sessions, authorization, stats), you cannot tweak its parameters, put it on an SSD, etc. You can use a separate database for that, though, if relational integrity is not crucial.
SQLite does not have replication or failover features. If your app's downtime is going to cost you money, you'd better have a hot backup database server, ready to take over if the main server goes down. With Postgres, this is relatively painless; with SQLite, hardly.
SQLite does not have an online backup and point-in-time recovery capability. If data you receive from users is going to cost you money (e.g. merchant orders or user data under a SLA), you better back up your data regularly, even continuously. Postgres, of course, can do this; SQLite cannot.
In short: when your site stops being a toy, you should have switched already. You should have switched some time before your first serious load spike, to iron out any obvious problems.
Fortunately, Django ORM makes switching very easy on the Python side: you mostly change the connection string in settings.py. On the actual database side you'll have to do a bit more: profile your most important queries, tweak certain column types and indexes, etc. Unless you know how to cook Postgres yourself, seek help of people who know; databases have many non-obvious subtleties that influence performance significantly. Deploying Postgres is definitely more tricky than SQLite (though not realy hard); the result is way more functional when it comes to operation / maintenance under load.

Related

Strategies to building a database of 30m images [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
Summary
I am facing the task of building a searchable database of about 30 million images (of different sizes) associated with their metadata. I have no real experience with databases so far.
Requirements
There will be only a few users, the database will be almost read-only, (if things get written then by a controlled automatic process), downtime for maintenance should be no big issue. We will probably perform more or less complex queries on the metadata.
My Thoughts
My current idea is to save the images in a folder structure and build a relational database on the side that contains the metadata as well as links to the images themselves. I have read about document based databases. I am sure they are reliable, but probably the images would only be accessible through a database query, is that true? In that case I am worried that future users of the data might be faced with the problem of learning how to query the database before actually getting things done.
Question
What database could/should I use?
Storing big fields that are not used in queries outside the "lookup table" is recommended for certain database systems, so it does not seem unusual to store the 30m images in the file system.
As to "which database", that depends on the frameworks you intend to work with, how complicated your queries usually are, and what resources you have available.
I had some complicated queries run for minutes on MySQL that were done in seconds on PostgreSQL and vice versa. Didn't do the tests with SQL Server, which is the third RDBMS that I have readily available.
One thing I can tell you: Whatever you can do in the DB, do it in the DB. You won't even nearly get the same performance if you pull all the data from the database and then do the matching in the framework code.
A second thing I can tell you: Indexes, indexes, indexes!
It doesn't sound like the data is very relational so a non-relational DBMS like MongoDB might be the way to go. With any DBMS you will have to use queries to get information from it. However, if your worried about future users, you could put a software layer between the user and DB that makes querying easier.
Storing images in the filesystem and metadata in the DB is a much better idea than storing large Blobs in the DB (IMHO). I would also note that the filesystem performance will be better if you have many folders and subfolders rather than 30M images in one big folder (citation needed)

Should we start with multiple small-grained databases for an app that may scale massively

We're developing a new eCommerce website and are using NHibernate for the first time. At present we are splitting our data into multiple SQL Server databases, divided per area of functionality. So we have one for UserInfo, one for Orders, one for ProductCatalogue and so on...
Our justification for this decision is twofold really:
the website has the potential to be HUGE (it is a new website for one of the largest online brands in the UK) and we feel that by partitioning our data along functional lines we will be able to move the databases onto their own servers which would give us an easy scaling route should we need it;
my team has always worked this way - partly as a consequence of following the MS Commerce Server pattern from previous projects.
However, reading up on this decision on the internet, we find that the normal response to this sort of model is extremely scathing. "Creating more work for the devs now in order to create more work for the devs later" is one sample comment from Stack Overflow!
In addition, NHibernate is much easier to use with only one database (just one SessionFactory needed). And knowing that Stack Overflow ran off just one box for a long time makes me think that maybe we should not try to be so clever.
So, my question is, "are we correct in thinking that using fine-grained databases might increase our ability to scale or should we sacrifice this for easier development"?
Why don't you just design your database properly and put the files on appropriate disk? Use a cluster if necessary. Creating multiple databases is not an inherently scaling solution. Also - cross database referential integrity? Good luck.
What's your definition of "HUGE"? SQL Server can handle massive databases, but one thing I've learnt is that people often have no idea what constitutes a lot of data.
I've never worked in a project like this. I'm used to databases with several hundred tables, which had never been a problem.
Therefore I can't say if your idea is a good idea, I never tried it. The "my team has always worked this way"-argument is a major driver for many decisions, and I can't even say that it is always wrong.
With NHibernate you organize your data in classes. They can be in different namespaces and assemblies. You usually don't work much with the database directly, you don't need this kind of structure there.
About the scalability argument: I'm not sure if it is really scaling well when you need to access several databases every time. I mean: you always need users and orders and probably more. Then you need to get all this data from several databases.
Agree fully with starskythehutch - keep your related tables together in the same DB. BUT, you may want to consider having separate databases for things that are not related or non-critical to your main product; but that are a part of the app.
For eg: if you decide to log every visit/hit to the site in a DB, you should probably keep that in a separate DB.
The reason you should consider:
1. huge number of transactions - say hundreds of thousands / sec. Having non-critical un-related stuff in a separate DB will ensure that tlog contentions because of this are avoided.
Restore, DBCC CHECKDB, backup times. If you stuff your non-related non-critical stuff in your main DB, you are essentially increasing the size of your DB and it will affect these operations. Having it in separate DB will help you improve performance of these operations.

How to modernize an enormous legacy database?

I have a question, just looking for suggestions here.
So, my application is 'modernizing' a desktop application by converting it to the web, with an ICEFaces UI and server side written in Java. However, they are keeping around the same Oracle database, which at current count has about 700-900 tables and probably a billion total records in the tables. Some individual tables have 250 million rows, many have over 25 million.
Needless to say, the database is not scaling well. As a result, the performance of the application is looking to be abysmal. The architects / decision makers-that-be have all either refused or are unwilling to restructure the persistence. So, basically we are putting a fresh coat of paint on a functional desktop application that currently serves most user needs and does so with relative ease. The actual database performance is pretty slow in the desktop app now. The quick performance I referred to earlier was non-database related stuff (sorry I misspoke there). I am having trouble sleeping at night thinking of how poorly this application is going to perform and how difficult it is going to be for everyday users to do their job.
So, my question is, what options do I have to mitigate this impending disaster? Is there some type of intermediate layer I can put in between the database and the Java code to speed up performance while at the same time keeping the database structure intact? Caching is obviously an option, but I don't see that as being a cure-all. Is it possible to layer a NoSQL DB in between or something?
I don't understand how to reconcile two things you said.
Needless to say, the database is not scaling well
and
currently serves most user needs and does so with relative ease and quick performance.
You don't say you are adding new users or new function, just making the same function accessible via a web interface.
So why is there a problem. Your Web App will be doing more or less the same database work as before.
In fact introducing a web tier could well give new caching opportunities so reducing the work the DB is doing.
If your early pieces of web app development are showing poor performance then I would start by trying to understand how the queries you are doing in the web app differ from those done by the existing app. Is it possible that you are using some tooling which is taking a somewhat naive approach to generating queries?
If the current app performs well and your new java app doesn't, the problem is not in the database layer, but in your application layer. If performance is as bad as you say, they should notice fairly early and have the option of going back to the Desktop application.
The DBA should be able to readily identify the additional workload on the database from your application. Assuming the logic hasn't changed it is unlikely to be doing more writes. It could be reads or it could be 'chattier' (moving the same amount of information but in smaller parcels). Chatty applications can use a lot of CPU. A lot of architects try to move processing from the database layer into the application layer because "work on the database is expensive" but actually make things worse due to the overhead of the "to-and-fro".
PS.
There's nothing 'bad' about having 250 million rows in a table. Generally you access a table through an index. There are typically 2 or 3 hops from the top of an index to the bottom (and then one more to the table). I've got a 20 million row table with a BLEVEL of 2 and a 120+ million row table with a BLEVEL of 3.
Indexing means that you rarely hit more than a small proportion of your data blocks. The frequently used index blocks (and data blocks) get cached in the database server's memory. The DBA would be able to see if this memory area is too small for the workload (ie a lot of physical disk IO).
If your app is getting a lot of information that it doesn't really need, this can put pressure on the memory space. Don't be greedy. if you only need three columns from a row, don't grab the whole row.
What you describe is something that Oracle should be capable of handling very easily if you have the right equipment and database design. It should scale well if you get someone on your team who is a specialist in performance tuning large applications.
Redoing the database from scratch would cost a fortune and would introduce new bugs and the potential for loss of critical information is huge. It almost never is a better idea to rewrite the database at this point. Usually those kinds of projects fail miserably after costing the company thousands or even millions of dollars. Your architects made the right choice. Learn to accept that what you want isn't always the best way. The data is far more important to the company than the app. There are many reasons why people have learned not to try to redesign the database from scratch.
Now there are ways to improve database performance. First thing I would consider with a database this size is partioning the data. I would also consider archiving old data to a data warehouse and doing most reporting from that. Other things to consider would be improving your servers to higher performing models, profiling to find slowest running queries and individually fixing them, looking at indexing, updating statistics and indexes (not sure if this is what you do on Oracle, I'm a SLQ Server gal but your dbas would know). There are some good books on refactoring old legacy databases. The one below is not datbase specific.
http://www.amazon.com/Refactoring-Databases-Evolutionary-Database-Design/dp/0321293533/ref=sr_1_1?ie=UTF8&s=books&qid=1275577997&sr=8-1
There are also some good books on performance tuning (look for ones specific to Oracle, what works for SQL Server or mySQL is not what is best for Oracle)
Personally I would get those and read them from cover to cover before designing a plan for how you are going to fix the poor performance. I would also include the DBAs in all your planning, they know things that you do not about the database and why some things are designed the way they are.
If you have a lot of lookups that are for items not in the database you can reduce the number by using a bloom filter. Add everything in the database to the bloom filter then before you do a lookup check the bloom first. Only if the bloom reports it present do you need to bother the database. The bloom will result in false positives but you can design it to the 'size vs false positive' trade off that best suits you.
The strategy is used by Google in their big-table database and they have reported that it significantly improves performance.
http://en.wikipedia.org/wiki/Bloom_filter
Good luck, working on tasks you don't believe in is tough.
So you put a fresh coat of paint on a functional and quick desktop application and then the system becomes slow?
And then you say that "it is needless to say that the database isn't scaling well"?
I don't get it. I think that there is something wrong with your fresh coat of paint, not with the database.
Don't be put down by this sort of thing. See it as a challenge, rather than something to be losing sleep over! I know it's tempting as a programmer to want to rip everything out and start over again, but from a business perspective, it's just not always viable. For example, by using the same database, the business can continue to use the old application while the new one is being developed and switch over customers in groups, rather than having to switch everyone over at the same time.
As for what you can do about performance, it depends a lot on the usage pattern. Caching can help greatly with mostly read-only databases. Even with read/write database, it can still be a boon if correctly designed. A NoSQL database might help with write-heavy stuff, but it might also be more trouble than it's worth if the data has to end up in a regular database anyway.
In the end, it all depends greatly on your application's architecture and usage patterns.
Good luck!
Well without knowing too much about what kinds of queries that are mostly done (I would expact lookups to be more common) perhaps you should try caching first. And cache at different layers, at the layer before the app server if possible and of course what you suggested caching at the layer between the app server and the database.
Caching works well for read data and it might not be as bad as you think.
Have you looked at Terracotta ? They do have some caching and scaling stuff that might be relavant to you.
Take it as a challenge!
The way to 'mitigate this impending disaster' is to do what you should be doing anyway. If you follow best practices the pain of switching out your persistence layer at a later stage will be minimal.
Up until the time that you have valid performance benchmarks and identified bottlenecks in the system talk of performance is premature. In any case I would be surprised if many of the 'intermediate layer' strategies aren't already implemented at the database level.
If the database is legacy and enormous, then
1) it cannot be changed in a way that will change the interface, as this will break too many existing applications. Or, if you change the interface, this has to be coordinated with modifying multiple applications with associated testing.
2) If the issue is performance, then there are probably many changes that can be made to optimize the database without changing the interface.
3) Views can be used to maintain the existing interfaces while restructuring tables for more efficiency, or possibly to allow more efficient access in the future.
4) Standard database optimizations, such as performance analysis, indexing, caching can probably greatly increase efficiency and performance without changing the interface.
There's a lot more that can be done, but you get the idea. It can't really be updated in one single big change. Changes have to be incremental, or transparent to the applications that use it.
The database is PART of the application. Don't consider them to be separate, it isn't.
As developer, you need to be free to make schema changes as necessary, and suggest data changes to improve performance / functionality in production (for example archiving old data).
Your development system presumably does not have that much data, but has the exact same schema.
In order to do performance testing, you will need a system with the same hardware and same size data (same data if possible) as production. You should explain to management that performance testing is absolutely necessary as you feel the app isn't going to perform.
Of course making schema changes (adding / removing indexes, splitting tables out etc) may affect other parts of the system - which you should consider as parts of a SYSTEM - and hence do the necessary regression testing and fixing.
If you need to modify the database schema, and make changes to the desktop client accordingly, to make the web app perform, that is what you have to do - justify your design decision to the management.

What are the pros/cons of and best practices for using a single database?

Here at work (a multi-billion dollar manufaturing company with a 12 person Windows development team) we are about to go to a single master database for all new applications and will have it broken up with schemas for what we normally would have had databases for before. There will also be a few common schemas with stuff like employee directory and branch directory and so on...
I'm still not sure how I feel about this move, but we're about to have a meeting on this in a few hours to discuss pros, cons, best practices, pitfalls and so on... so I'm looking for your thoughts on this... Is it good? Is it bad? What problems are we going to run into a year from now?
Any thoughts, tips, or advice is welcome. Thanks
EDIT
In response to a comment on this question, we are using SQL Server 2005 and we are actually talking about moving what would have been seperate databases on the same instance into a single database. The driving issue is the complete lack of referential integrity accross databases as the majority of our applications need access to common data such as an employee record, or branch information.
UPDATE
Several people requested that I update this question with the results from our meeting so here it is. We debated back and forth the pros and cons of doing this (I even showed them this question using the projector) and by the time we were done we had pretty much covered the pros and cons covered here. About half of us thought we could get it done with the right resources and commitment, and about half thought we couldn't do it (or that it wouldn't work out well). We decided to use some time with Microsoft to get their thoughts and platform specific advice. I will be sure to update this question and my blog after we've talked to them. Thanks for all the help and helpful answers.
Larger database are harder to maintain due to sheer size: backups take longer, disaster recovery is slower which in turn requires more often backups. You can address these by creating filegroups and using filegroup level backup in your maintenance plans and on crash recovery you can use the 'piecemeal restore' strategy to speed things up.
Proper use of filegroups will make most of the 'cons' cited by previous replies go away: they can distribute the I/O, they can sanitize your maintenance plans and backup/restore strategy, they offer availability by taking offline only the damaged portion of the the db in case of crash. So I'd say that while those 'cons' are legit concerns, they have can be mitigated by a proper deployment strategy. Its true though that these mitigation actions require a true, experienced, dba at the helm as they will go beyond the comfort zone of a developer turned dba by need.
Some of the pros I can think of quickly:
Consistency. You can have a backup-restore so that all data is consistent. Separate dbs don't allow this because you cannot coordinate a consistent set of backups unless you take them all offline, or make them r/o, during the backup.
Dirt cheap high availability: you can deploy database mirroring for disaster recoverability and high availability. Multiple databases have problems because one cannot coordinate a simultaneous failover and apps are faced with the dilemma of seeking each database current location.
Security. While most other posts see one database harder to secure, I'd say is easier to secure. Multiple databases seem harder to secure properly simply because what everyone does is they make one login and add it to that database db_owner group. Having one database will make things harder (unless you end up making everyone dbo, very bad) but once you start doing the right thing (granular access) then one db is not harder than multiple dbs, is actually easier because you won't have to copy/maintain some common groups/rights across multiple dbs.
Control. Will be easier to impose certain policies and good practices on a single db rather than multiple ones (no data access to developers, app data access only through execute rights on the schema to enforce procedures access etc).
There are also some cons I did not see in other posts:
This will be much harder to pull off that you think right now
Increase coupling between formerly separated applications will impose development restrictions: you can't simply alter your schema, you will have to coordinate it with the rest of the apps (you can argue that this was also the case before, but was brushed under the carpet by having separate dbs, and you're right)
Log writes that are now distributed across multiple db logs will be consolidated into one single log file. If your writes are significant, this may turn out to be a serious bottleneck and force you to buy some expensive fast drives for the new, consolidated, log file. In general this can be addresses by making the log drive a stripped array across as many stripes as needed to make it fast enough (usually raid 10).
GAM/SGAM/PFS allocations will also be consolidated, but again this will be alleviated by proper use of file groups.
Pros:
You only need to remember one connection string
When users report that access is slow, you know which DB is causing the trouble
Cons:
Backups of The One DB will take a long time and will get progressively longer over time.
Restoring data from a backup will get increasingly difficult.
Performance Tuning (SQL Profiler, Execution Plan estimation) for a feature for one app will slow down every app.
Restricting access to a single application's data is cumbersome if at all possible which will likely mean in practice that all devs and DBAs will be given keys to the ENTIRE kingdom.
New developers/DBAs have a much larger learning curve as they need to navigate a large and mostly useless (to them) database structure which means higher costs for training/ramp up.
When The One database goes down, everyone in your organization plays solitaire until it is restored.
Creating test instances for app development means copying your entire db
The only "Pro" I can think of is that all of your systems will be in the one database and therefore a single place to backup, store, etc. However, I would consider this to also be one of the biggest "Cons".
Some other general Cons:
Much harder to move an application to a different location/server in the future.
Possible locking issues if any applications make use of tempdb.
Possible unrelated performance degredation on one application when another application is being used.
Much harder to implement an application level security model if all tables are in the same database.
It sounds to me as though your company is transitioning between two completely distinct motives for using database technology. The first is application support. The second is data integration. If I'm right about this, the process will open up a huge can of worms, and many of the issues won't even be addressed by putting all the data in one big database.
Consider two of the points you made. The first is the complete lack of referential integrity across different databases. The second is the idea that each application will have its own schema. What this permits to happen is complete lack of referential integrity across schemas, putting you back in the quicksand you are in now.
Fixing the data so that referential integrity is present, and fixing the schemas so that referential integrity is enforced, and fixing the applications so that the applications agree with the new schemas will turn out to be a monumental task.
Here's what your company really needs to do: Have one single CONCEPTUAL database that contains all "enterprise data", and defined in such a way that both referential integrity and entity integrity are enforced. Revise existing schemas so that they conform to the CONCEPTUAL database except for data that is both purely local to that schema and undocumented in the unified conceptual database. Use constraints wherever needed to guarantee that the data covered by these schemas doesn't lose integrity.
Make the decision about whether these schemas belong in one database or many databases based on database administration, fail soft, security, and performance requirements and NOT on the need to integrate data. Whether you use one platform or multiple platforms is a separable decision.
Where necessary, maintain synchronized copies of the same data in separate databases. Include the overhead of doing this in your performance considerations above.
Document the conceptual database out the gazoo. Don't just settle for definitions of the FORM of data. Insist on definitions of the semantics of the data as well.
Notice that if you use ID fields instead of natural keys to enforce referential integrity, you will have to generate each ID field in one schema, and let the association between ID and dependent data propagate by means of synonyms, views, and synchronized replication.
This is not going to be easy.
If DB is getting bigger, making back-up is getting more difficult because of it's size.
This could mean a serious scalability problem if you want to add high-traffic applications in the future, since it is much easier to add new database servers which run seperate dbs than it is to parrallelize a single DB. At least in SQL Server.
Pros:
The convenience of having everything in one place
Thinking less about good database design
Cons:
Even unrelated things are in one place
Less thinking about good database design leading to poorly normalized data
To me this just sounds like laziness and a belief that all this "fancy ivory tower database stuff" is worthless.
I can see that being scary, but considering the number of businesses that use Oracle EBS, or SAP, or other systems that are, in essence, this same configuration, I don't see it being a Bad Thing™. It's a big move, and will be tough to get correct, but it can really improve integration across the enterprise in the long run.
I've never heard of this approach and would like to know how the meeting goes. I see no real benefit in combining multiple applications into a single database when the data doesn't relate to each other.
I'm thinking you might have issues if you decide that an application requires it's own database server at one point.
Ah, the old EggsInOneBasket design pattern. It's not a favourite.
You're just compounding any problems caused by damage to that database. Spread the risk!
For the referential integrity issue, you can make copies of those shared tables in the subsidiary databases. You can't use real replication, but what you do is deny everything but select on these to most users.
On the same server, you can either push or pull data from the official repository of the master data and insert any new rows/update any changed rows. You can even do this with a trigger in the master database (I don't recommend it, though).
If it's different instances or servers, you can use linked servers or SSIS.
You can put the common data into a "core" schema in each database. Then you can have tools to check that all your core tables in every subsidiary database are consistent. The worse that can happen is that an application is not seeing a new employee because the core isn't updated. And keeping your database separate gives you an ability to decouple and gives you maintenance windows. (You can even decouple and run "standalone" if your master is down for maintenance).
I expect you'll only be seeing a few dozen of these core entity tables in even a largish enterprise.
There are many other ways to solve the referential integrity (RI) issue. I am not as familiar with SQL Server as other DB's. In Informix you can use synonyms to point to objects in other DB's and use these for your RI. In Oracle you can make a DB links to one or more DB's to accomplish the same thing.
These approaches have the issue that if any of the DB's are down the RI will fail causing issues in the dependent DB's. selects would work, but inserts would fail.
Consolidation can be a good idea, depending upon the size of the schema's, and other issues with scalability. SQL Server has serious scalability issues. Other DB platforms allow horizontal scaling with either a share everything approach (Oracle's RAC, latest Informix release) or a partitioned share nothing approach (DB2's DPF, Informix XPS, Netezza, Teradata)
I am with some of the others here interested to hear the results of your meeting.

Alternatives to MySQL [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
I would like to store data persistently for my application, but I don't really need a full blown relational database. I really could get by with a basic "cache"-like persistent storage where the structure is just a (key, value) pair.
In lieu of a database what are my best, scalable options?
There's always SQLite, a database that's stored in a file. SQLite already has built-in concurrency, so you don't have to worry about things like file locking, and it's really fast for reads.
If, however, you are doing lots of database changes, it's best to do them all at once inside a transaction. This will only write the changes to the file once, as opposed to every time an change query is issued. This dramatically increases the speed of doing multiple changes.
When a change query is issued, whether it's inside a tranasction or not, the whole database is locked until that query finishes. This means that extremely large transactions could adversely affect the performance of other processes because they must wait for the transaction to finish before they can access the database. In practice, I haven't found this to be that noticeable, but it's always good practice to try to minimize the number of database modifying queries you issue.
if you want a 'persistent cache', and already use memcached, check memcachedb. it's a persistent hashtable using the memcached protocol, no need for a new client (but a new daemon)
I ask a similar question recently. Here are some choices:
The Berkley DB is groovy. (also see the Berkley DB Wikipedia entry)
If you are on Windows, then you can use the built in Extensible Storage Engine. This used to be called "Jet Blue".
Microsoft SQL Compact Edition is also free (But not built into the OS.
If you need scalability, then a RDBMS is your best bet. At the most basic level, you can serialize data structures into files - however, then you'd need to account for file locking issues which would limit concurrency.
SQLite is an SQL file-based database engine, which can run without a persistent database daemon (in PHP for example, it runs as an extension), however it too has concurrency issues (read the answers to this question which could help you decide if SQLite is right for you).
Unless you have a really good reason not to use a real DRBMS, I'd suggest you stick with MySQL or other "full-blown" engines.
If you want something really scalable, I wouldn't opt for a flat or XML file. As your data grows, it could kill your performance.
If you will have a lot of data at some stage, I would still opt for a database - I would take a look at something like SQLIte with a very simple schema to suit your needs.
I'm really not sure you should but have you considered just storing info in an XML doc if it really is that light? And if it's not have you considered SQLite?
If your writing a java program that wants an imbedded database look at hsqldb since it is written in java and works much better then sqlite if being called from java programs.
If you are writing Java, then there are Java database implementations (Jared mentions hsqldb, there are others) that you can directly include.
SQLite is fine for static inclusion, however you can also include MySQL within your application if you're using a compatible language such as C.
I think you would appreciate having SQL available as well. XML files just don't cut it any longer, maybe a couple of years ago when writing PDA software, but even the iPhone and Android includes SQLite now.
For Key=Value pairs, you can use INI file format with simple load and save procedures to load it and save it to in-memory hash table.
That can later scale up to anythig, just by changing load and save procedures to work with db.
You could try CounchDB, it is a very flexible document-oriented database that doesn't force you to define a schema up-front. It was written in Erlang and thanks to this it is considered very scalable solution. It can be easily queried through a REST interface.
hsqldb in in-memory mode will give you much better performance than flat file based databases. It's pretty easy to use too. And if the table gets too big, there is an option to cache it on disk too. Check out this performance comparison:
(source: hsqldb.org)

Resources