I have two databases with the same structure, one on a local machine and one on the company's server. At a fixed interval, the data from the local DB should be synchronized to the server DB.
I have a general idea of how to do this: create a script that somehow "merges" the information that is not yet in the server DB, then run this script as a scheduled job on the server. However, my problem is that I am not very experienced with this.
Does SQL Server Management Studio provide an easy way to do this (some kind of wizard) that generates this kind of script? Or is this something I'll have to build from scratch?
I've done some basic Google searches and came across the term 'Replication', but I don't fully understand it. I would rather hear some input from people who have actually done this or who are good at explaining this kind of stuff.
Thanks.
Replication sounds like a good option for this, but there would be some overhead (not technical overhead, but the knowledge needed to support it).
Another SQL Server option is SSIS. SSIS provides graphical tools to design what you're trying to do. The SSIS package can also run SQL statements, if appropriate. An SSIS package can be started, and therefore scheduled, from a SQL Server job.
You should consider the complexity of the synchronization rules when choosing your solution. For example, would it be difficult to resolve conflicts, such as a duplicate key, when merging the data? A SQL script may be easy to create if the rules are simple, but complex conflict rules may be more difficult to implement in a script (or in replication).
SQL Server Management Studio unfortunately doesn't offer much in this way.
You should have a serious look at some of the excellent commercial offerings out there:
Red Gate Software's SQL Compare and SQL Data Compare - excellent tools, highly recommended! You can even compare a live database against a backup from another database and synchronize the data - pretty nifty!
ApexSQL's SQL Diff and SQL Data Diff
They all cost money - but if you're serious about it, and you use them in your daily routine, they're paid for in no time at all - well worth every dime.
The only "free" option you have in SQL Server 2008 would be to create a link between the two servers and then use something like the MERGE statement (new in SQL Server 2008) to transfer the data. That doesn't work for structural changes, and it only works while there is a live connection between the two servers.
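To make the "merge what's missing" idea a bit more concrete, here is a minimal sketch in Python. It is not the linked-server MERGE itself: two in-memory SQLite databases stand in for the local and server databases so that it runs anywhere, and the `customers` table, its columns, and the `make_db`/`sync` helpers are all made up for illustration.

```python
import sqlite3

def make_db(rows):
    """Create an in-memory database with a hypothetical `customers` table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    db.executemany("INSERT INTO customers (id, name) VALUES (?, ?)", rows)
    db.commit()
    return db

def sync(local, server):
    """Copy every local row to the server: insert new keys, overwrite existing ones."""
    rows = local.execute("SELECT id, name FROM customers").fetchall()
    with server:  # one transaction: commit on success, roll back on error
        server.executemany(
            "INSERT OR REPLACE INTO customers (id, name) VALUES (?, ?)", rows
        )
    return len(rows)

local = make_db([(1, "Ann"), (2, "Bob (edited locally)"), (3, "Cy")])
server = make_db([(1, "Ann"), (2, "Bob")])
sync(local, server)
server_rows = server.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
print(server_rows)
```

Note that this sketch settles every conflict by letting the local row win. If your rules are more complicated than that, this is exactly where a hand-written script starts to get difficult.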
You should definitely read up on transactional replication. It sounds like a good fit for the situation you've described. Here are a few links to get you started.
How Transactional Replication Works
How do I... Configure transactional replication between two SQL Server 2005 systems?
Performance Tuning SQL Server Transactional Replication
What you want is Peer-to-Peer Transactional Replication, which allows data to be updated at both databases yet keeps them in sync through a continuous merge of changes. This is the closest match to what you want, but it is a fairly costly option (it requires Enterprise Edition at both sites). Another option is Bidirectional Transactional Replication, but since this also requires two EE licenses, I'd say that peer-to-peer is easier to deploy for the same money.
A more budget-friendly option is Updatable Subscriptions for Transactional Replication, but updatable subscriptions are being deprecated, so you'd be betting your money on a losing horse.
Another option is to use Merge Replication. And finally, for the cases when the 'local' database is quite mobile there is Sync Framework.
Note that all these options require some configuration and cooperation from the Company's server DB.
There are some excellent third party tools out there. For me, xSQL Data Compare has always done the trick. And because the comparisons are highly modifiable it is suitable for almost every data compare or data-synchronization scenario. Hope this helps!
I have to decide which database server to use for my next project, but the simple decision to use MySQL, as in almost all the projects I've done, is harder now because I expect a very large number of records.
The database will store a user list, some other irrelevant tables, and finally some user-collected data. Say I have 6000 users responding to a quiz about each other. Simple math shows that if each one completes the quiz about everyone else (and in my project it is 99% certain that will happen), I'll end up with 35.99 million records (users exclude themselves, so in this particular case the calculation is 6000*5999). Unfortunately, 6000 may be a small number, since the real one is growing day by day.
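The arithmetic can be checked with a couple of lines of Python (just a sketch; the function name is made up for illustration). It also shows how quickly the record count grows with the number of users, since it is roughly quadratic.

```python
def quiz_records(users: int) -> int:
    """Each user answers about every other user, but not about themselves."""
    return users * (users - 1)

for n in (6_000, 10_000, 20_000):
    print(f"{n:>6} users -> {quiz_records(n):>13,} records")
```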
What to choose? MySQL, and maybe expand it into a cluster if things go well and the project grows? PostgreSQL, MSSQL? Oracle?
I've read about all of them; each one has its pros and cons, but I still don't know what to choose. The advantage of MySQL and PostgreSQL is, of course, the starting price of $0, which is pretty nice for a typical self-funded startup.
Any opinions, pieces of advice? If you encountered this situation in your experience as developers, I'd love to hear from you.
These days, free isn't something that differentiates between databases any more. Both Oracle and SQL Server have free versions, but they limit resources: 4 GB database size, RAM, and single-CPU utilization. Millions of records is not a concern in itself; what matters is which datatypes you're using.
I saw the OP's comment about not liking MS software. That's your prerogative, but the free versions of both Oracle and SQL Server do benefit from a seamless transition to the larger editions of the respective database.
Personally, my choice would be either Oracle or SQL Server because of what are, IMHO, real feature considerations: hierarchical query support, subquery factoring/CTEs, packages (long before I get concerned with functions/procedures), full-text searching, XML support, etc.
MySQL will handle 35 million records no problem. Worry about scalability when you get there. You can easily add RAID hard disks backing your database tables, and if you really start getting big you can get a Compellent SAN that will scream... Don't worry about the DB engine as much as the underlying hardware. MySQL rocks for us with millions of records.
I've had no problems handling tables as large as 36,000,000 rows on MySQL and Oracle.
Just be sure that you index the proper columns, run EXPLAINs for your queries, and maintain proper design principles.
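As a small illustration of the "index, then check the plan" habit, here is a sketch using Python's built-in sqlite3 module. SQLite's `EXPLAIN QUERY PLAN` is only a stand-in here; MySQL's and Oracle's EXPLAIN output looks different, and the `answers` table and index name are made up for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE answers (rater_id INTEGER, rated_id INTEGER, score INTEGER)")
db.executemany(
    "INSERT INTO answers VALUES (?, ?, ?)",
    [(a, b, (a + b) % 5) for a in range(100) for b in range(100) if a != b],
)

QUERY = "SELECT score FROM answers WHERE rater_id = ?"

def plan(conn, sql, params):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

before = plan(db, QUERY, (42,))   # no index yet, so expect a full table scan
db.execute("CREATE INDEX idx_answers_rater ON answers (rater_id)")
after = plan(db, QUERY, (42,))    # the plan should now mention the index

print("before:", before)
print("after: ", after)
```

The exact wording of the plan varies between SQLite versions, but the second plan should name the index while the first one cannot.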
Most of the truly large scale web properties use a distributed key-value store. That said, 35 million is large, but not that large. With most modern databases, your main two scaling worries should be throughput and what happens when no single box can contain your entire database anymore. And both of these problems can be solved to some degree for any database you choose to use. (Caching, replication, sharding, etc.)
Use MySQL until you can't anymore. At that point, you ought to be rolling in dough anyways and you now have a very desirable problem.
Use MySQL as it's free and you have experience with it.
Besides, in my opinion how you design the tables matters more than which database you use.
35 million records can be easily handled by MS SQL Server (assuming proper database design, indices, etc.). You can start with the free SQL Server Express edition and later, if you need, you can upgrade to the full version which supports clustering, etc.
SQL Server Express does have some limitations - single CPU, 1 GB memory, max 4 GB database size and a few other things. I'm not sure how quickly these limitations will become a problem but you can always move to the full version when you run into them.
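To get a feel for how soon the size limit might bite, a back-of-the-envelope estimate helps. The sketch below is only that: the 24 bytes per row is a guess (two integer keys, an answer, some overhead), indexes are ignored, and the 4 GB cap is treated as 4 GiB. Plug in your own row width.

```python
GIB = 1024 ** 3
EXPRESS_LIMIT_BYTES = 4 * GIB      # the 4 GB cap mentioned above, taken as 4 GiB
ASSUMED_BYTES_PER_ROW = 24         # hypothetical: two integer keys, an answer, overhead

def estimated_size(rows: int, bytes_per_row: int = ASSUMED_BYTES_PER_ROW) -> int:
    """Very rough table size: row count times average row width (indexes ignored)."""
    return rows * bytes_per_row

def max_rows(limit: int = EXPRESS_LIMIT_BYTES,
             bytes_per_row: int = ASSUMED_BYTES_PER_ROW) -> int:
    """How many rows of that width fit under the size cap."""
    return limit // bytes_per_row

rows = 6000 * 5999
size = estimated_size(rows)
print(f"{rows:,} rows ~ {size / GIB:.2f} GiB of {EXPRESS_LIMIT_BYTES / GIB:.0f} GiB")
print(f"rows that fit at this width: {max_rows():,}")
```

Under these assumptions the quiz table alone stays well under the cap, but because the row count grows with the square of the user count, the headroom disappears quickly as users are added.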
MySQL(i) & PostgreSQL
$0 cost
large community
many tutorials
well documented
MSSQL
You can get "money" from MS if you promote that you are using MSSQL (secret information from some companies I worked for)
MS tools work very well
Complete tool set from C# IDE over .NET lib to Windows Server 2003
Oracle
Professional and commercial provider
Used by many large companies (I also heard about Blizzard (World of Warcraft) using Oracle)
- expensive
The final decision depends on the very special requirements of your project.
Make yourself a quick list of the things that ARE IMPORTANT for your project (e.g. fast queries) and look up which database's strengths best match your requirements.
Everything is about design. SQL databases are a bit like cars: you just have to know which component goes where.
Make a clear design and you won't struggle with any of them.
Maybe you could try Firebird.
Blog post about a big Firebird database here
MySQL licence is here (not always free).
Postgresql and Firebird are free.
First of all, don't think about performance. Premature optimization being the root of all evil and all that. You can always throw more hardware and/or tuning at it later.
All of the databases mentioned should perform nicely if tuned and maintained correctly. I'd focus on manageability and familiarity. IMHO open source databases excel at manageability (perhaps not the best GUIs, but the CLI has been my home for a long, long time).
And if the database becomes the bottleneck, why limit yourself to those choices? How about a key-value distributed database? Or perhaps serialize data directly to disk? Storing data outside of a RDBMS, while often frowned upon, might be the correct path. Or simply use the common route of denormalization.
Always remember not to optimize prematurely.
As far as opinions go (since you specifically asked for it) I favor open source databases, specifically PostgreSQL. It's rock solid, fast and very well-featured. And even with (relatively) large datasets it has performed superbly on mediocre hardware (some tuning involved, of course, but you can't skip that step no matter which db you end up choosing).
Should I use Postgres or Oracle XE for a web application, and why?
They are both available for free and for use commercially.
Why should I use one database over the other?
Well, with Oracle XE, if you hit the limits, you have to either buy Oracle or migrate your entire application to a different database. With PostgreSQL, there are no limits, and the software is completely free and open-source.
A few things to be aware of if you choose Oracle XE:
It will only use one CPU if you have a multiprocessor server
It will only use up to a maximum of 1 gigabyte of memory
It has a database size limit of 4 gigabytes for user data
Only available for 32-bit Windows and 32-bit Linux
If those limitations aren't an issue for you and you like the Oracle approach then give it a shot, otherwise consider an opensource server like Postgres or MySQL, which have none of the aforementioned limitations.
If you do choose XE and then later find your requirements have changed, the next version up of Oracle is 900USD for 5 seats, and an additional 180 per seat. This is in fact a bit cheaper than MS SQL Server afaik.
There are some good reasons to choose Oracle, particularly if you're a Java developer e.g. you can write stored procedures in Java, and I think there's native support for Java web services. Ultimately however you need to weigh up the cost with the requirements of your application. MySQL and Postgres will allow you to scale your application without any cost (other than hardware obviously).
You haven't provided much information, but unless you're already an Oracle expert, I see no reason to choose Oracle XE over PostgreSQL. PostgreSQL will always be free and is far more capable and more scalable.
And you can choose to run PostgreSQL on Windows, Mac OS X or Linux. I think Oracle XE is limited to Windows and Linux.
When choosing a db vendor, be careful to match the demands of your app against the strengths of the vendor's db product. And watch out for its weaknesses.
For example, if you know your writes will be as frequent as your reads and both will occur simultaneously, then you'll want to know how each vendor you consider handles concurrency. Vendors that rely on elaborate lock managers with complex lock escalation schemes are likely to bring you grief if you expect heavy load on your app. You'll spend more time trying to work around the DB's lock manager than actually solving your problems.
That's one example. Every DB has its strengths and weaknesses to consider. Do your research, find a site that compares vendors and make a choice that balances your needs against that. If you can get an eval copy, all the better to run some proof-of-concept tests against. Write scripts that pummel the db in a way similar to what you expect your app to produce and go from there. While you're at it, get the query plans for the SQL in your scripts from each vendor and see what you can learn from that about how each vendor's optimizer works.
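A "pummel" script does not have to be fancy. Below is a rough sketch of the shape such a script might take, written in Python against an in-memory SQLite database purely so that it runs anywhere. The `hits` table, the 50/50 write ratio and the operation count are all assumptions; a real test would point the same loop at each candidate database through its own driver and run several clients at once, since a single connection like this one does not exercise the lock manager at all.

```python
import random
import sqlite3
import time

def run_workload(conn, operations=2_000, write_ratio=0.5, seed=1):
    """Issue a random mix of writes and reads and return (writes, reads, seconds)."""
    rng = random.Random(seed)
    conn.execute("CREATE TABLE IF NOT EXISTS hits (user_id INTEGER, payload TEXT)")
    writes = reads = 0
    start = time.perf_counter()
    for _ in range(operations):
        user_id = rng.randrange(100)
        if rng.random() < write_ratio:
            conn.execute("INSERT INTO hits VALUES (?, ?)", (user_id, "x" * 32))
            writes += 1
        else:
            conn.execute(
                "SELECT COUNT(*) FROM hits WHERE user_id = ?", (user_id,)
            ).fetchone()
            reads += 1
    conn.commit()
    return writes, reads, time.perf_counter() - start

conn = sqlite3.connect(":memory:")
writes, reads, seconds = run_workload(conn)
print(f"{writes} writes, {reads} reads in {seconds:.3f}s")
```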
There's more that can be said, but hopefully you get the gist.
What is your platform of choice? If it's Windows, I'd go with Oracle or MySQL as I have heard only bad things about running PostgreSQL on Windows machines.
Also, Oracle has a more GUI-based approach to configuration, so if you're not a Linux hacker, you may like it better.
Postgres, on the other hand, has a far superior SQL console, and its console tools in general are more developed and easier to use. Oracle has more tools, but the decent ones (like PL/SQL navigator) are not free and the free ones simply suck.
Remember that choosing a database is a long-term decision so consider the possible changes in requirements for your application as well. If you are not prepared to eventually spend money on Oracle, better go with Postgres -- safe option. And good enough for most projects.
Is there a general rule of thumb to follow when storing web application data to know what database backend should be used? Are the number of hits per day, the number of rows of data, or other metrics things I should consider when choosing?
My initial idea is that the order for this would look something like the following (but not necessarily, which is why I'm asking the question).
Flat Files
BDB
SQLite
MySQL
PostgreSQL
SQL Server
Oracle
It's not quite that easy. The only general rule of thumb is that you should look for another solution when the current one can't keep up anymore. That could include using different software (not necessarily in any globally fixed order), hardware or architecture.
You will probably get a lot more benefit out of caching data using something like memcached than switching to another random storage backend.
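The usual pattern here is "cache-aside": look in the cache first, and only go to the database on a miss. The sketch below shows the idea in Python. A plain dict stands in for a memcached client, SQLite stands in for the real database, and the `CachedLookup` class and `users` table are invented for the example.

```python
import sqlite3

class CachedLookup:
    """Cache-aside reads: check the cache first, fall back to the database on a miss."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}        # stand-in for a memcached client
        self.db_reads = 0

    def get_user(self, user_id):
        if user_id in self.cache:
            return self.cache[user_id]
        self.db_reads += 1
        row = self.conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        name = row[0] if row else None
        self.cache[user_id] = name
        return name

    def rename_user(self, user_id, name):
        with self.conn:
            self.conn.execute("UPDATE users SET name = ? WHERE id = ?", (name, user_id))
        self.cache.pop(user_id, None)   # invalidate so the next read is fresh

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.commit()

store = CachedLookup(conn)
names = [store.get_user(1) for _ in range(3)]
print(names, "database reads:", store.db_reads)
```

Three lookups cost one database read here. The hard part in practice is invalidation, which is why `rename_user` drops the cached entry after writing.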
If you think you are going to ever need one of the heavyweights (SqlServer, Oracle), you should start with one of those at the beginning. Data migrations are extremely difficult. In the long run it will cost you less to just start at the top and stay there.
I think you're being overly specific in your rankings. You can pretty much start with flat files and the like for very small data sets, go up to something like DBM for slightly bigger ones that don't require SQL-like syntax, and go to some kind of SQL database after that.
But who wants to do all that rewriting? If the application will benefit from access to joins, stored procedures, triggers, foreign key validation, and the like--just use a SQL database regardless of the dataset size.
Which one should depend more on the client's existing installations and what DBA skills are available than on the amount of data you're holding.
In other words, the size of your database is far from the only consideration, and maybe not the most important one.
There is no blanket answer to this, but ALMOST always, using flat files is not a good idea. You have to parse through them (I suppose) and they do not scale well. Starting with a proper database, like Oracle or SQL Server (or MySQL, Postgres if you are looking for free options) is a good idea. For very little overhead, you will save yourself a lot of effort and headache later on. They also allow you to structure your data in a non-stupid fashion, leaving you free to think of WHAT you will do with the data rather than HOW you will be getting it in/out.
It really depends on your data, and how you intend to use it. At one of my previous positions, we used Postgres due to the native geo-location and timezone extensions which existed because it allowed us to manage our data using polygonal datatypes. For us, we needed to do that, and we also wanted to use stored procedures, views and the like.
Now, another place I worked at used MySQL simply because the data was normalized, standard row by row data.
SQL Server, for a long time, had a 4 GB database limit (see SQL Server 2000), but despite that limitation it remains a very stable platform for small to medium applications in which old data is purged.
Now, from working with Oracle and SQL Server 05/08, all I can tell you is that if you want the cream of the crop for stability, scalability and flexibility, then these two are your best bet. For enterprise applications, I strongly recommend them (merely because that's what we use where I work now).
Other things to consider:
Language integration (ASP.NET session storage, role management, etc.)
Query types (Select, Update, Delete) [although this is more of a schema design issue than a DBMS issue]
Data storage requirements
Your application's utilization of the database is the most critical factor, mainly which queries are used most often (SELECT, INSERT or UPDATE).
For example, SQLite is geared toward smaller applications, but for a "web" application you might need a bigger one like MySQL or SQL Server.
The way you write scripts and your web application platforms also matters. If you're developing on a Microsoft platform, then SQL Server is a better alternative.
Typically, I go with what is commonly accepted by whichever framework I am using. So, if I'm doing .NET => SQL Server, Python (via Django or Pylons) => MySQL or SQLite.
I almost never use flat files though.
There is more to choosing an RDBMS solution than just "back end horsepower". The ability to have commitment control, for example, so you can roll back a failed transaction, is one reason.
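For anyone who has only worked with flat files, here is roughly what commitment control buys you. This is a sketch in Python with the built-in sqlite3 module; the `accounts` table and the `transfer` helper are made up for the example. The second update fails, and the first one is undone along with it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts; both updates succeed or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "a", "b", 150)   # would overdraw "a", so the CHECK fails
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, balances)
```

With a flat file you would have to write that undo logic yourself, and get it right even when the process dies halfway through.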
Unless you are in the megatransaction rate application, most database engines would be adequate - so it becomes a question of how much you want to pay for the software, whether it runs on the hardware and operating system environment you want, and what expertise you have in managing that software.
That progression sounds painful. If you're going to include MS products (especially the for-pay SQL Server) in there anywhere, you may as well use the whole stack, since you only have to pay for the last of these:
SQL Server Compact -> SQL Server Express -> SQL Server Enterprise (clustered).
If you target your app at SQL Server Compact initially, all your SQL code is guaranteed to scale up to the next version without modification. If you get bigger than SQL Server Enterprise, then congratulations. That's what they call a good problem to have.
Also: go back and check the SO podcasts. I believe they talked about this briefly.
This question depends on your situation really.
If you have control over the server you're deploying to and you can install whatever services you need, then the time it takes to install a MySQL or MSSQL Express server and code against an existing database framework, versus coding against a flat file structure, is so small that it is not worth agonizing over.
What about FireBird? Where would that fit into that list?
And let's not forget the requirements that the "customer" of your solution must also have in place. If you're writing a commercial application for small companies, then Oracle might not be a good choice... but if you're writing a customized solution for a large enterprise which must share data among multiple campuses and has a good-sized IT department, then the decision of Oracle vs SQL Server would come down to what the customer most likely already has deployed.
Data migration nowadays isn't that bad since we have those great tools from Embarcadero, so I would instead let the customer's needs drive the decision.
If you have the option, SQL Server is a good choice from the word go, predominantly because you have access to solid procedures and functions and the database backup facilities are totally reliable. Wrapping up as much of your logic as you can inside the database itself (rather than in whatever language you are using) helps security and performance - indeed there's a good argument to be made for always using procedures for insert/update logic as these make you invulnerable to injection attacks.
If I have the choice the only time I'd consider MySQL in preference is with a large, fairly simple, database predominantly used for read access. This isn't to decry MySQL which has improved markedly of late and I happily use if I don't have the choice, but for more complex systems with update/insert activity MSSQL is generally the superior option.
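On the injection point above: the protection comes from the input being passed as a typed parameter instead of being pasted into the SQL text. SQLite has no stored procedures, so the Python sketch below shows the same mechanism with a plain parameterized query, which is a swapped-in technique and not a stored procedure. The `users` table and the hostile input are made up. Keep in mind that a procedure which builds dynamic SQL by concatenating its arguments gives that protection away again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 1), ("bob", 0)])

hostile = "nobody' OR '1'='1"

# Unsafe: the input is pasted into the SQL text and changes the query's meaning.
unsafe_sql = "SELECT name FROM users WHERE name = '" + hostile + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the input is bound as a parameter and is only ever treated as a value.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchall()

print("concatenated:", unsafe_rows)
print("parameterized:", safe_rows)
```

The concatenated query matches every row, while the parameterized one looks for a user literally named `nobody' OR '1'='1` and finds nothing.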
I think your list is subjective but I will play your game.
Flat Files
BDB
SQLite
MySQL
PostgreSQL
SQL Server
Oracle
Teradata
I'm starting a new project here (Windows Forms). What's the best option today for a small (free as in beer) DBMS?
I've used SQL Server Express on past projects, but time and time again I hear people saying that the product from Oracle is faster and more powerful.
It will be used in a small company (around 20 users) and will not reach the 4 GB limit any time soon :)
I don't want to start a flame war on my first post, so please point me to some link showing a good (and current) comparison between the 2 products, if possible.
PS: I've heard about IBM DB2 Express too, but I couldn't find any information about it. (Marketing material from IBM doesn't count :) )
I would go for the SQL Server Express solution, unless you absolutely have to use a feature in Oracle that SQL Server does not have and you have no usable workaround.
Example of Oracle's strengths:
Analytical Functions in Oracle ROCK!
PL/SQL is better than T-SQL.
You're going to scale the system up to thousands of users all updating the same small dataset
You need to scale up to multi-TB databases
You need to scale to a large number of CPUs in your server (over 8)
You need instant failover (RAC)
You really cannot afford to lose a transaction
Maybe you can tell, I'm a big Oracle fan! But I think that Oracle Express is a commercial reaction to SQL Server Express and I don't think Oracle really deep deep down likes it.
You know with SQL Server that there is an upgrade path (SQL Server 2008 is coming soon) plus service packs.
SQL Express is also more "install and forget" than Oracle,
and it will integrate better with your IDE (if you're using .NET).
In terms of speed, both are going to be lightning quick with such a small dataset.
Given the needs you outlined, it would be hard to argue that either would shine over the other.
What I will say is this:
You say you are already familiar with SSExpress, so that is a good reason to stick with it
IMHO the tools with SSExpress are superior and easier to use than the Oracle equivalent
That said, I have much more experience with SS than Oracle so YMMV.
Sorry, no link, but one piece of advice. Because we support Oracle and SQL Server, I know that getting fixes for the 'normal' Oracle database is not what I'd call fun. You have to pay for it, and if you have no tool that updates your Oracle system for you, it's a pain in the a.., if you ask me. Check how Oracle XE is supported with updates/fixes. I don't know myself; I only use the 'normal' Oracle (Developer) database.
I think it's great to rethink things every once in a while and that it's very smart to consider alternative products when you are at a junction to do so.
If you are comfortable optimizing systems and have DBA-level skills, I'd consider PostgreSQL. I do not consider myself a DBA, I have middling database skills, and I find SQL Server Express extremely easy to use. Also, I've had products exceed the limits of SQL Server Express - the transition to SQL Server Standard/Enterprise is seamless.
I realize that this doesn't matter at a technical level, but Larry Ellison buys jets and prostitutes with his profit. Bill Gates is solving problems of immense importance to humanity with his. All things being equal, I always prefer to give my money to Bill Gates.
Is this any use:
https://web.archive.org/web/1/http://downloads.techrepublic%2ecom%2ecom/5138-9592-6028761.html
NB Registration is required
Both of KiwiBastard's points are very good and I completely agree with him.
If you really want a free alternative that is similar to MS SQL and supports growth should you need it, you could have a look at MySQL or PostgreSQL. SQLite also seems a good choice.
Surely you can afford an old Linux server if you work in a company with 20 employees.
100% SQL Express; it's easier to install and maintain than Oracle.
IMHO the major problem with SQL Server has for a long time been the lack of multi-version read consistency. Fortunately this has been addressed since SQL Server 2005 with the snapshot isolation level.
If you're looking for a good RDBMS for a small project requiring minimal knowledge for maintenance, SQL Server Express Edition is a good pick. The SQL Server Express Edition UI is much easier to understand than RMAN or the "easier"-to-use backup scripts included with Oracle Database XE, which require taking your database offline.
Oracle Database XE is on my *** list. They recently released an ODBC driver for Linux that wasn't compiled properly (ld returns missing symbols for required ODBC functions) to be at all usable (10.2.0.4). With this kind of lack of attention to any reasonable amount of QA even for a 'free' product I would think twice about going down that road.
For DB2 Express-C see:
"DB2 Express-C™ is the free version of one of the most advanced
database management systems in the world. Why pay when you can have
all you need for free? DB2 Express-C is free to develop, deploy and
distribute.
It is a fast, secure, reliable, and amazingly scalable dataserver,
ideal for most startups and small/medium sized businesses. DB2
Express-C 9.7 is available on Linux, Unix, Windows, and now Mac OS X
as well! It also enables developers to easily handle XML through the
native storage technology called pureXML™. Whether you develop in
Java, .Net, Ruby, Python, Perl or pretty much any other programming
language out there, DB2 can be your technological advantage."