Our current database system is a Clipper DOS application. The database inside its folder is fragmented into many parts. I want to decrypt the database so that I have only one database in all and avoid reshuffling data. I've attached a screenshot of the file folder. The database is in .DBF format.
(Screenshot of files)
Often you can decompile the Clipper .exe file to source code and work from the .prg files. I've done it many times. The program to use is called Valkyrie.
In Clipper and FoxPro for DOS, a .dbf file is a simple table file.
If you want to use it as a database with many tables in one unit, you can import these tables into an MS SQL Server database and/or an MS Access database.
I see that you got several answers. Most are partially right. Let's address these one at a time:
All those files essentially comprise the "database" for the application you're using. They could be used by other applications as well. Besides having a lot of files, what is the problem you're trying to solve?
People mentioned indexes. You can generally ignore these. They are there primarily to make access to the data files faster. Any properly written Clipper application will recreate these if they're missing or corrupted. You could test this by renaming one, running the app, and seeing what happens. If it doesn't recreate it you can name it back. Not replacing missing index files would be unusual behavior.
The DBF file format is binary, but barely. Most of what's in a DBF is text and is readable with an editor. But there's no reason to do so - I'm sure there are several free DBF utilities out there to read DBF files. Getting the structure of the files could be very helpful.
Getting the data out of the files would also be fairly simple with a utility. If you look up the DBF format you could even write one fairly easily in Clipper, any other language that uses DBF files, or in something like Python. Any language that can open and write files, really. It's not hard - any competent developer could do this in a matter of hours. Much less if you're using Clipper or another language that natively reads DBF files.
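To give an idea of how little is involved, here is a minimal Python sketch of such a reader. It assumes the classic dBase III layout (a 32-byte file header, 32-byte field descriptors ended by a 0x0D byte, fixed-width records) and a cp437 DOS code page. The function name `parse_dbf` is made up for illustration, and memo fields stored in separate .DBT files are not handled.

```python
import struct

def parse_dbf(data: bytes):
    """Return (fields, records) parsed from the raw bytes of a dBase III style .DBF file."""
    # File header: record count at offset 4 (uint32), header size at offset 8
    # (uint16), record size at offset 10 (uint16), all little-endian.
    num_records, header_len, record_len = struct.unpack_from("<IHH", data, 4)

    # Field descriptors are 32 bytes each and start right after the 32-byte header.
    fields = []
    pos = 32
    while data[pos] != 0x0D:  # 0x0D terminates the descriptor list
        name = data[pos:pos + 11].split(b"\x00", 1)[0].decode("ascii")
        ftype = chr(data[pos + 11])   # C = character, N = numeric, D = date, L = logical
        length = data[pos + 16]
        decimals = data[pos + 17]
        fields.append((name, ftype, length, decimals))
        pos += 32

    records = []
    for i in range(num_records):
        start = header_len + i * record_len
        row = data[start:start + record_len]
        if row[:1] == b"*":           # '*' in the first byte marks a deleted record
            continue
        values = {}
        offset = 1                    # skip the deletion flag
        for name, _ftype, length, _decimals in fields:
            # Assumption: text is stored in the DOS cp437 code page.
            values[name] = row[offset:offset + length].decode("cp437").strip()
            offset += length
        records.append(values)
    return fields, records
```

You would call it with the contents of a file opened in binary mode ("rb"). All values come back as strings, so numbers and dates still need converting.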
Most people create dBase/Clipper programs with relational data, like SQL Server. Where SQL Server has tables that relate to each other dBase/Clipper has a file for each "table." This isn't a requirement, but it was almost certainly done this way.
Given that, if you get the table structures through a utility or by reading the headers in an editor (don't save them from an editor!) you could quite likely recreate the database schema (i.e. the map of the data). Once you have that it's fairly trivial to get the data into another type of database (SQL Server, Access, or whatever you like to use). If none of the files are too large it's conceivable to put all the files into Excel sheets. It really depends on what you want to do with it.
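As a rough sketch of that step, the Python below turns a list of DBF field descriptions into a CREATE TABLE statement and loads the rows. SQLite is used only because it ships with Python. The type mapping and the helper names `create_table_sql` and `load_table` are assumptions for illustration, with fields given as (name, type, length, decimals) tuples.

```python
import sqlite3

# Assumed mapping from DBF field types to SQLite column types.
TYPE_MAP = {"C": "TEXT", "D": "TEXT", "L": "INTEGER", "M": "TEXT"}

def create_table_sql(table, fields):
    """Build a CREATE TABLE statement from (name, type, length, decimals) tuples."""
    columns = []
    for name, ftype, _length, decimals in fields:
        if ftype == "N":
            # Numeric fields with decimal places become REAL, the rest INTEGER.
            sql_type = "REAL" if decimals else "INTEGER"
        else:
            sql_type = TYPE_MAP.get(ftype, "TEXT")
        columns.append('"%s" %s' % (name, sql_type))
    return 'CREATE TABLE "%s" (%s)' % (table, ", ".join(columns))

def load_table(conn, table, fields, records):
    """Create the table, then insert records (dicts keyed by field name)."""
    conn.execute(create_table_sql(table, fields))
    names = [field[0] for field in fields]
    placeholders = ", ".join("?" for _ in names)
    conn.executemany(
        'INSERT INTO "%s" VALUES (%s)' % (table, placeholders),
        [tuple(record[name] for name in names) for record in records],
    )
```

Doing this once per .DBF file gives you one database with one table per file. The relationships between tables still have to be worked out by hand from the field names and indexes.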
As others have said, you may be able to get the code with Valkyrie. Some people have used it very successfully. I don't know where you get it and I've never used it. Why do you not have the code? If this is a commercial application you likely should not have it. If it's a custom app, whoever wrote it or paid to have it written should have the code.
Again, it's not clear to me what problem you're trying to solve. But there are many options for doing something with those DBF files. Fortunately they are one of the easier to read data formats you could be working with.
Let me know if you have any questions. Apologies for the typos that are no doubt scattered throughout this reply.
You sort of can get an idea of how they relate to each other by opening the index files they use (.NTX files). If you have the DBU utility (executable) around, you can open the DBF and load the index (NTX). LibreOffice Calc is also able to open DBFs (haven't tested .NTX).
If you open the .NTX in a text editor you will see the index expressions at the beginning.
I open with Access, but I can save the data using a PrintFill Program.
I work on Labware LIMS, which has both configuration, and customization via its own programming language and internal code editor, and stores this customization code in database records. (Note, not the source code of the actual application itself, just the customization code a.k.a. LIMS Basic.) Almost everything in LIMS is stored in the database.
We want to investigate the possibility of using source control to protect this code but we don't know much more than the theory of using something like Git. (I have worked as a junior QA and used git but not as a dev and my knowledge is limited!)
Of particular use would be the merging tools, as currently we have to manually merge code in a text editor, if we even notice there is a conflict. Checking content between dev and live is time consuming and involves using multiple tools, some of which are 3rd party tools we have developed ourselves, which are hit and miss. I personally find it easiest to cut and paste into a text file and then use Beyond Compare.
There is no notification that the code is different when moving it from dev to live (no deployment as such, you just import an xml file) so we often have things going live that someone was working on unbeknownst to each other. I.e. dev 1 is working on the code in object 1, dev 2 gets a ticket to make a change to object 1, does so and puts their change Live, whatever dev 1 was doing is now also Live in whatever state it was in. (Because we don't always have time to thoroughly check what state each object is in between up to 3 different databases.)
Is it possible to use source control just on the code within the database, but not necessarily the database itself? (We have backups and such for that but it's easy for some aspects of the system to get overwritten by multiple devs working on overlapping areas at the same time.)
If anyone reading this has any specific knowledge of LW LIMS, we are referring to the Subroutines mostly, we have versioned Analyses which stands in for source control for the moment and is somewhat effective but no way to control who is doing what on the subroutines other than a comment log at the top. I have tried to find any information on how other teams source control their code in LIMS but to no avail.
The structure of one of these tables can range from as simple as the code just existing in one field as a straight text dump with a few other fields such as changed_on, changed_by and name (Subroutines), or more complex with code relating to one record being sprinkled around in multiple rows on another table entirely (Analyses) but even if it could just deal with the simple scenario to start with that would be great!
TL;DR: Could the contents of the Code field in a database record be treated like a regular code object in other dev environments somehow and source controlled using Git? (And is anyone willing to explain it simply for me to follow?)
You need to version control the table fields that hold the subroutines, but LW LIMS doesn't have IDE integration for version control (such as Git or SVN), so the direct answer is no.
If you really want version control for the code in the database, you can create a Git repository and put only the code in it. When a file is updated, you can commit and push the changes, and it's easy to compare the differences between versions.
For more detail about Git, you can refer to the Git book.
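A minimal Python sketch of that idea follows. It copies the code field of each record into its own text file inside a folder that is a Git working copy. The table name `subroutine` and the columns `name` and `code` are assumptions based on the question, `export_subroutines` is a made-up helper, and SQLite stands in for whatever database your LIMS actually runs on. Record names are assumed to be safe to use as file names.

```python
import sqlite3
from pathlib import Path

def export_subroutines(conn, out_dir):
    """Write each record's code to <out_dir>/<name>.txt and return the names whose file changed."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    changed = []
    # Assumed schema: a 'subroutine' table with 'name' and 'code' columns.
    for name, code in conn.execute("SELECT name, code FROM subroutine ORDER BY name"):
        target = out / (name + ".txt")
        # Only rewrite files whose content differs, so untouched code stays untouched.
        if not target.exists() or target.read_text(encoding="utf-8") != code:
            target.write_text(code, encoding="utf-8")
            changed.append(name)
    return changed
```

After running it against dev or live, "git status" and "git diff" in that folder show what differs from the last commit, and "git add -A" followed by "git commit" records the new state.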
LabWare LIMS has a number of options for version control. You COULD version the Subroutine table by adding a SUBROUTINE.VERSION field to the table. This works the same way as other versioned tables in LabWare, where it asks you if you would like to create a new version of the object before saving. There are a few customers I work with that have done this.
Alternatively, (and possibly our more recommended method prior to LEM) there is the Snapshot capability where the system automatically takes a "snapshot" of objects as they are saved - when viewing these you have the ability to view them side by side in a comparison dialogue - it will show < or > for lines which are different.
Another approach is, if you have auditing turned on you are able to view the audit history for changes to specific objects - this includes subroutines.
One other approach is to use configuration packages - this has the ability to record version AND build numbers. Though individual subroutines are probably a bit too granular for its intended design.
Lastly, since this question was originally posted we have developed a product called LabWare Environment Manager (LEM) which has some good change control functionality built-in.
For more information on the suggestions above, please have a look at the LabWare Technical manual for the version you are on. We also have a mailing list for questions like this to be posted. You might find an answer there. If you have access to our Support webpage you're able to search previous questions that have been asked. I'd also suggest that you get in touch with your Account Manager at LabWare who can help you answer some of your questions.
HTH
I would like to know your opinion about how you would organize the files/directories in a big web application using MVC (Backbone, for example).
I would make the following ( * ). Please tell me your opinion.
( * )
js
js/models/myModel.js
js/collections/myCollection.js
js/views/myView.js
spec/model/myModel.spec.js
spec/collections/myCollection.spec.js
spec/views/myView.spec.js
This is how I've traditionally organized my files. However, I've found that with larger applications it really becomes a pain to keep everything organized, named uniquely, etc. A 'new' way that I've been going about it is organizing my files by feature rather than type. So, for example:
js/feature1/someView.js
js/feature1/someController.js
js/feature1/someTemplate.html
js/feature1/someModel.js
But, oftentimes there are global "things" that you need, like the "user" or a collection of locations that the user has built. So:
js/application/model/user.js
js/application/collection/location.js
This pattern was suggested to me because then you can work on feature sets, package and deploy them using requirejs with relatively little effort. It also reduces the possibility of dependencies occurring between feature sets, so if you want to remove a feature or update it with brand new code, you can just replace a folder of 'stuff' rather than hunting for every file. Also, in IDEs, it just makes the files you're working on easier to find.
My two cents.
Edit: What about the spec files?
A few thoughts - you'll just have to pick the one that seems most natural to you I think.
You could follow the same 'feature folder' pattern with the spec files. The upside is that all of the specs are in one place. The downside is that now, much like what you're currently doing, you have two places for one feature's files.
You could put the specs in a 'spec' folder of the feature folder. The upside is that you now have actual packages that can be wrapped up in a single zip file with no chance of clobbering other work. It's also easier to find directly related files for writing tests - they're all in the same parent folder. The downside is that now your production code and test code are in the same folder, publishing it (possibly) to the world. Granted, you'll probably end up compiling the production javascript down to one file at some point, so I'm not sure that's much of an issue.
My suggestion - if this is a large application and you figure you're going to have a few hands touching the files, leave something like a 'package.json/yml/xml' file in the folder. In there, list out the production, spec, and any data files you need for testing (you can most likely write a quick shell script to do this for you). Then write out a quick script to look through your source folder for 'package.whateverYouChose' files, get the test files and then build your unit testing page with it. So, let's say you add another package: run 'updateSpecRunner' or whatever you name the script, and it'll generate you another SpecRunner.html file (or whatever you named the file you're running the specs on). Then you can manually test it in a browser, or automate it using phantomjs/rhino.
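Here is one way the 'updateSpecRunner' idea could look, sketched in Python. It assumes each feature folder holds a package.json with two made-up keys, "production" and "spec", each listing file paths relative to that folder. The function name `build_spec_runner` is invented too, and the script tags for the test framework itself (Jasmine or whatever you use) are left out.

```python
import json
from pathlib import Path

def build_spec_runner(source_root):
    """Collect the files listed in every package.json under source_root and return SpecRunner HTML."""
    root = Path(source_root)
    production, specs = [], []
    for manifest in sorted(root.rglob("package.json")):
        package = json.loads(manifest.read_text(encoding="utf-8"))
        folder = manifest.parent.relative_to(root)
        # "production" and "spec" are hypothetical keys chosen for this sketch.
        production += [(folder / name).as_posix() for name in package.get("production", [])]
        specs += [(folder / name).as_posix() for name in package.get("spec", [])]
    # Production files are loaded first so the specs can refer to them.
    tags = "\n".join('  <script src="%s"></script>' % src for src in production + specs)
    return "<!DOCTYPE html>\n<html>\n<head>\n" + tags + "\n</head>\n<body></body>\n</html>\n"
```

Writing the returned string to SpecRunner.html after every new package keeps the test page in sync with the feature folders.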
Does that make sense?
You can find a good example of how to organize your application at this link:
Backbone Jasmine examples
It looks more or less like your implementation.
I'm totally new to the world of programming and understand very little in terms of jargon and typical methodology.
A while ago I was writing some code, but accidentally deleted some good code while I was deleting bad code. From then on I started creating versions of my files, I would name each file with the date and a version number.
However, this is a pain in the ass, having to give a unique name to each file and then going to my core file and changing the reference to the name of the new file.
And then, just the other day I accidentally overwrote something important even with this method, probably because of a typo in naming.
Needless to say, this method sucks.
I'm looking for suggestions on better practices and better tools. I've been looking at version control, but a lot of the systems, such as git and svn, look really complicated. The idea is to speed up the whole versioning process, not make it harder by having to use the command line.
Right now I'm hoping that there's a tool that would save a unique version of the file every time I hit ctrl-s, and give me one button to create a finalized version.
Of course if there are suggestions for totally different ways of doing things, that would be more awesome.
Thanks everyone.
There are two approaches to this problem:
Versioning on demand. This is the model used by subversion, CVS, etc., etc. When you have made a 'significant' change, you decide to tell the system "keep this version".
Automatic versioning. This is the model used by some old VAXen, Eclipse, IDEA, every wiki ever, and a few writer's tools. Every time you save, a new version is implicitly created. At some remove, old versions may be culled (e.g., only one version is kept from work performed a week ago, rather than every save).
It sounds like you would prefer #2, because it is "fool-proof" -- you never have to go, "oops, I should have 'checked in' / 'kept' my work before making this change." You can always roll back. One downside is that you have to manually step through the old versions to find something, because unlike with #1 you generally are not giving a description of each change.
Another downside is that for large files, or ones that are not easily diff'd/patched (i.e. binary files), you will start burning through disk space pretty fast.
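Culling is what keeps that disk usage in check. As a rough sketch of the policy mentioned above (keep every save from the last week, but only the last save of each older day), here it is in Python. The function name `versions_to_cull` and the one-week default are illustrative choices, not how any particular tool does it.

```python
from datetime import datetime, timedelta

def versions_to_cull(timestamps, now, keep_all_for=timedelta(days=7)):
    """Return the save times whose versions can be deleted.

    Saves newer than keep_all_for are all kept. For older days, only the
    last save of each calendar day is kept.
    """
    cutoff = now - keep_all_for
    newest_per_day = {}
    for ts in timestamps:
        if ts < cutoff:
            day = ts.date()
            if day not in newest_per_day or ts > newest_per_day[day]:
                newest_per_day[day] = ts
    keep = set(newest_per_day.values())
    return sorted(ts for ts in timestamps if ts < cutoff and ts not in keep)
```

Each timestamp stands for one saved version. Whatever stores the versions would delete the ones returned here.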
As an aside, it sounds like you don't need 90% of the features in a standard SCM system -- branching, labeling, etc. -- but you might find uses for them eventually. So learning one may be a win in the long run. You can do this with svn, etc. but it will take some customizing. If you use a scriptable editor (emacs, vi, TextMate, whatever) you could redefine the "Save" command as "Save and make a new version".
Subversion is more or less the gold standard.
I'd suggest (especially for a newbie) that you check out BeanStalk (www.Beanstalkapp.com) to run your subversion server and TortoiseSVN for your client.
Good luck!
Whatever you do, if someone mentions Visual SourceSafe -- run as fast as you can. VSS was created by Satan himself and handed down to torment developers the world over.
I think you're in a position where you have to get a little bit out of your comfort zone and take some time to learn git. It's pretty easy to learn and use.
Believe me, it's really worth it. Time spent learning git is time well spent.
If you are not working in a team, you could use something like Eclipse's local history feature. It stores versions of your files locally, and you can revert to previous versions whenever you feel like it. More details here: http://help.eclipse.org/ganymede/index.jsp (Search for "local history"). I am pretty sure other IDEs have such a feature too.
If you are collaborating with others on your code, there probably is no way around learning one of the standard tools like SVN, CVS or git. For most of them, there are plugins for many IDEs available, so you don't have to use the command line.
I currently use Subversion, but my source control experience is limited.
I would however suggest reading the tutorial by Eric Sink.
http://www.ericsink.com/scm/source_control.html
It's best to learn how to use an existing 'industry standard' versioning tool like Subversion. Even if you're new to programming and version control, SVN isn't that hard to learn and will serve you well. I personally use and recommend VisualSVN Server and TortoiseSVN for Windows. Both are free and quite simple to use.
For a system that creates a revision on every save, perhaps you should look into a Versioning File System.
I think TortoiseSVN would be a good Subversion client for you to try if you're in Windows. It won't do what you're looking for with every-time-I-save-I-get-a-new-version--you'll have to manually "commit" versions to the repository. When you do a commit, that creates a new version, essentially saving your progress at that point. TortoiseSVN is pretty user-friendly, and it's a GUI, so you won't be working at the command line. You'll be able to do things like right-click a file in Windows Explorer and choose Commit to save your progress. Plus, TortoiseSVN is free and open source.
Subversion is not really complicated. If you are using Windows, TortoiseSVN will help a lot; if you are using Eclipse, the Subclipse plug-in is awesome. (You probably should be using Eclipse regardless :) )
Some of the others are a bit complicated, but you just have to know the pattern with eclipse. Maybe you could "Try it out" with an open source project or some existing subversion server.
The cycle would be:
First you "Check out" a repository. This fills up your specified directory with the contents from the repository.
If you are doing it from the command line--it's "svn co"--there is enough help there to figure out the rest.
Second you edit your files. You don't have to lock them or anything.
If you add a new file, you use "svn add filename" as soon as you add it. This won't actually change the repository until you commit your changes.
When a group of edits is done, you check them in with "svn ci" (svn commit also works).
This one has a SLIGHT twist that you'll always forget--every commit needs a comment. You don't have to specify the files you are committing or anything, but you do need to be in the top level of your project (it will commit everything below your directory).
So the procedure here is, go to the "root" of your project tree and type:
svn ci -m "comment"
Piece of cake.
Finally, IF someone else is checking stuff in, things get SLIGHTLY stranger. Before you commit, you should "update" and get their changes. "svn up" is all it takes, but it may warn you that there were merges. This only happens when both of you edited the same file, and 90% of the time, the merges will go okay. The rest of the time, it will put little markers in your file telling you what you changed and what they changed. The "up" command will tell you which files it did this to. Go look at them and clean the file up before you check the file in.
Always test between "svn up" and "svn ci", you never know if their crappy changes busted your pristine code.
That's really it. It's so easy from the CLI, that the graphics environments are hardly worth it (but subclipse is really nice if you are in eclipse anyway because it will visually show you modified files that need to be checked in).
If you ever forget, svn's command line help is extremely terse and useful, tells you JUST what you need to know, and has help on all the sub-commands and options.
If you're looking for an easy-to-set-up version control system for Windows, I highly recommend TortoiseHg, an easy-to-use Mercurial frontend for Windows. You don't have to worry about setting up and keeping track of a repository separate from your files, but you always can do so if you'd like to. Mercurial is a great tool because it can grow with your needs. It has all the usual features like easy merging, etc. and is quite a bit easier to wrap your head around than Git in my experience.
I think Git is really easy to use especially when you use GitHub. They also provide lots of good guides to get up and running.
http://github.com
http://github.com/guides/home
I've used Git, SVN, CVS, and Perforce, in both Windows and Unix environments.
My vote is definitely for SVN, for its ease of use and flexibility. I prefer to use the command line now, but at one time I was using TortoiseSVN for Windows, which we were able to get non-technical people to use without a hitch.
Use SVN.
You're definitely on the right track with recognizing the need for version control, but sound unsure what that might mean to you and your work. Once you learn the concepts behind version control systems, you will really come to appreciate them.
The concepts are simple: a source code control system is a piece of software designed to help you store and manage your code. How you get code into and out of it differ based on which system you choose: one paradigm is that you deliberately "check out" a file, make your changes to it, test it and make sure it's good, then check it back in. Another is that you simply save every change you make because disk space is dirt cheap, much cheaper than your time and effort spent to create the source in the first place.
Another important concept is the "baseline" or "label". When your product is in a ready-to-ship state, you tell the source code control system to create a "label" and tag every current item in your entire source code base with that label. That way, when someone reports a bug in version 4.1 you can go to your system, request all the files with the "Version 4.1" label and get exactly the source code they're having a problem with.
Having a source control tool integrated with your development environment makes the whole process much easier than having to mess with command lines. (Don't discount command line because of their complexity, they deliver elegant control to an experienced user, and you eventually will become an experienced user.) But for now, I'd recommend a source code tool that can automate the process as much as possible.
Some things to consider: are you now, or are you planning to share the development with another developer? That might make a difference on how you want to set up a server. If you're developing alone on your own box, you can set it all up locally, but that's probably not the best approach for a team. (If you're unsure, git is very flexible in that arena.) Are you going to be storing large multimedia files, or just source code? Some source code systems are designed to efficiently store only text files, and will not handle movies, sounds or image files very well.
Something else to know is that most newer source control systems require some kind of "daemon" program running on the server (Subversion, git, Perforce, Microsoft Team Foundation Server) while the older, simpler systems just use the file system directly (Visual Source Safe, cvs) and don't require a server program.
If you don't want to learn much and your demands are low, the simpler solutions should suffice. Microsoft's Visual Source Safe used to come with their visual studio products, and was a very simple to use tool. It's not very robust, it's Microsoft-only, and it can't handle large files well, but it's very, very easy to set up and use. If you don't want to spend money, Subversion and git are two stellar open source solutions, and there is a lot of documentation for both on the web.
If you like to spend money, Perforce is considered an excellent choice for professional development teams (and I believe they have a free single-developer version.) If you really like to spend lots of money and want to make Bill Gates happy, Microsoft's Team Foundation Server is a complete software development lifecycle manager, is extremely easy to use in the Windows environment, and very powerful; but you'd probably want to devote an entire Windows server (plus SQL Server) instance to host it, and it will cost you several thousand dollars just on licenses. Unfortunately it is not the right tool for a one-man shop, or if you have no Windows admin experience.
If you have the budget or the connections, bringing in an experienced software engineer to help you get things started might be the quickest path to success. Otherwise, you'll have to do some more research to learn which systems best fit your situation.
Whatever VCS you use, if you choose versioning on demand instead of automatic versioning (to borrow terms from Alex's post), you will have to go through some ceremony to:
-create,
-rename,
-move,
-copy, or
-delete
a file that is under source control.
When you create a new file, you have to Add it to source control before you Commit your changes to the repository.
When you rename, move, copy, or delete a file under source control, do so with your VCS client. In TortoiseSVN and TortoiseGit, the move and copy operations are done with a right-click-and-drag, whereas the rename and delete operations are available via a right-click.
As you can imagine, changing things like the name of a project can be quite the hassle, hence the case for automatic versioning.
Ordinary file edits and any changes to files not under source control, do not require you to tell your VCS client about them.
Finally, for one-man projects, I prefer git over SVN because SVN requires at least 2 copies of everything: a repository (the "master" copy of the files and history) and a working copy (the copy you do your work on). With git, the repository and working copy are the same thing, which makes my experience simpler.
We use SourceGear Vault, which has great integration with Visual Studio, and is free for a single user. Depending on what framework and languages you're using, though, Subversion is a great free solution.
First of all, please read these articles by Eric Sink. Eric Sink runs a company that creates a source control system called Vault. He explains in a newbie-friendly manner how to do source control, best practices, etc.:
Introduction to Source Control
I found it invaluable when I first wanted to understand Source Control.
SourceGear Vault is FREE for a single user. Its interface is intuitive and integrates well with Visual Studio.
If it's just you, you might want to try Bazaar. It's distributed like Git (so it'll be nice for a single person--no server to deal with), but one of their main goals was to make it much easier to use than Git.
Also, there is a handy GUI tool called TortoiseBzr that should make it amazingly easy to use. http://bazaar-vcs.org/TortoiseBzr
There is in fact such a tool. It is called emacs.
Just create yourself a "~/.emacs" file and put the following lines in it:
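;; Turn on numbered backups; without this Emacs keeps only a single filename~ backup.
(setq version-control t)
(setq kept-new-versions 5)
(setq kept-old-versions 5)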
And then restart emacs.
This tells Emacs to keep the 5 oldest and 5 newest backup versions of each file. They will be kept in files named filename.~n~ where "filename" is your file's normal name, and "n" is the backup number.
If you develop your project alone (and don't need a server for collaboration), Mercurial might be your system of choice. I personally value one of its features: it only uses one place to save its information, the .hg directory in the root of your project. It doesn't put its data into every directory (like SVN). This way the archive and the project directory are easy to manage.
I've used Visual Source Safe, Perforce, and Subversion. They were all fine, but I would have to say that the support and extensions for Subversion just seemed slightly better. If you're planning on entering/staying in the software industry, you MUST know the fundamentals of source control, and I would highly recommend setting up one of the source control services. Subversion would be my recommendation and is free as well. It will be complicated at first, but you really should use an SVN client to add a GUI to increase utility and cut down on all the complication you're observing.
A quick Google search for "dreamweaver svn" reveals that many people are working with Subversion in Dreamweaver. I'm an advocate of version control, and SVN in particular, so I would recommend you look into that :)
If you don't want to use a full on version control system (as noted above), you may be able to improve your lot by refining and automating the procedure you described originally. Depending on your comfort with the tools you should be able to put together a script in DreamWeaver itself or in Windows Scripting ( Powershell, VBA, Perl, etc ) that will at least make date-named copies of the folder you are working in every so often. This will keep you from having to do it and make sure there aren't any typo-related problems. Further down that path you can have your script put a copy of your work on a backup drive or remote server, and then you'd have a back up, too.
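As a rough idea of what such a script could look like, here is a small Python sketch that copies the working folder to a date-named snapshot. The function name `snapshot_folder` and the naming scheme are just illustrative choices. Running it on a schedule (Windows Task Scheduler, for example) would take care of the "every so often" part.

```python
import shutil
from datetime import datetime
from pathlib import Path

def snapshot_folder(work_dir, backup_root, now=None):
    """Copy work_dir to backup_root/<folder name>_<YYYYMMDD-HHMMSS> and return the new path."""
    now = now or datetime.now()
    source = Path(work_dir)
    stamp = now.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / (source.name + "_" + stamp)
    # copytree refuses to overwrite an existing folder, so a snapshot
    # can never clobber an earlier one by accident.
    shutil.copytree(source, target)
    return target
```

Because the name comes from the clock instead of being typed by hand, the typo problem goes away, and pointing backup_root at a backup drive gives you the off-machine copy as well.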
I'm afraid I don't know much about DreamWeaver, but if it has much scripting support built-in you may even be able to "hook" into the Save/ Auto-Save functions and have them do exactly what you want.
Hope this helps,
adricnet
I'm doing a web app, and I need to make a branch for some major changes, the thing is, these changes require changes to the database schema, so I'd like to put the entire database under git as well.
How do I do that? Is there a specific folder that I can keep under a git repository? How do I know which one? How can I be sure that I'm putting the right folder?
I need to be sure, because these changes are not backward compatible; I can't afford to screw up.
The database in my case is PostgreSQL
Edit:
Someone suggested taking backups and putting the backup file under version control instead of the database. To be honest, I find that really hard to swallow.
There has to be a better way.
Update:
OK, so there's no better way, but I'm still not quite convinced, so I will change the question a bit:
I'd like to put the entire database under version control, what database engine can I use so that I can put the actual database under version control instead of its dump?
Would sqlite be git-friendly?
Since this is only the development environment, I can choose whatever database I want.
Edit2:
What I really want is not to track my development history, but to be able to switch from my "new radical changes" branch to the "current stable branch" and be able for instance to fix some bugs/issues, etc, with the current stable branch. Such that when I switch branches, the database auto-magically becomes compatible with the branch I'm currently on.
I don't really care much about the actual data.
Take a database dump, and version control that instead. This way it is a flat text file.
Personally I suggest that you keep both a data dump, and a schema dump. This way using diff it becomes fairly easy to see what changed in the schema from revision to revision.
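To illustrate the diff part, here is a small Python sketch using the standard difflib module. It assumes you already have two schema dumps as text, for instance from "pg_dump --schema-only" taken at two different revisions. The function name `schema_diff` and the labels in the diff header are made up.

```python
import difflib

def schema_diff(old_dump, new_dump):
    """Return a unified diff between two schema dumps given as strings."""
    return "".join(difflib.unified_diff(
        old_dump.splitlines(keepends=True),
        new_dump.splitlines(keepends=True),
        fromfile="schema.sql (old)",
        tofile="schema.sql (new)",
    ))
```

With the dump committed as a file, "git diff" gives you the same view between any two revisions without extra code.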
If you are making big changes, you should have a secondary database that you make the new schema changes to and not touch the old one since as you said you are making a branch.
I'm starting to think of a really simple solution, don't know why I didn't think of it before!!
Duplicate the database, (both the schema and the data).
In the branch for the new-major-changes, simply change the project configuration to use the new duplicate database.
This way I can switch branches without worrying about database schema changes.
EDIT:
By duplicate, I mean create another database with a different name (like my_db_2); not doing a dump or anything like that.
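A minimal Python sketch of how the configuration could follow the branch automatically: it reads the checked-out branch from .git/HEAD and looks up the database name. The branch name, the database names other than my_db_2, and both helper functions are assumptions for illustration. It also assumes .git is an ordinary directory at the project root, which is not the case for linked worktrees or submodules.

```python
from pathlib import Path

# Hypothetical mapping from branch name to database name.
BRANCH_DATABASES = {"new-major-changes": "my_db_2"}
DEFAULT_DATABASE = "my_db"

def current_branch(repo_root):
    """Read the checked-out branch name from .git/HEAD (None if HEAD is detached)."""
    head = (Path(repo_root) / ".git" / "HEAD").read_text(encoding="utf-8").strip()
    prefix = "ref: refs/heads/"
    return head[len(prefix):] if head.startswith(prefix) else None

def database_for(repo_root):
    """Pick the database for the current branch, falling back to the default one."""
    return BRANCH_DATABASES.get(current_branch(repo_root), DEFAULT_DATABASE)
```

The project configuration would call database_for() at startup, so switching branches changes the database without editing any settings by hand.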
Use something like Liquibase. This lets you keep your Liquibase files under revision control. You can tag changes for production only, and have Liquibase keep your DB up to date for either production or development (or whatever scheme you want).
Irmin (branching + time travel)
Flur.ee (immutable + time travel + graph query)
XTDB (formerly called 'CruxDB') (time travel + query)
TerminusDB (immutable + branching + time travel + Graph Query!)
DoltDB (branching + time-travel + SQL query)
Quadrable (branching + remote state verification)
EdgeDB (no real time travel, but migrations derived by the compiler after schema changes)
Migra (diffing for Postgres schemas/data. Auto-generate migration scripts, auto-sync db state)
ImmuDB (immutable + time-travel)
I've come across this question as I've got a similar problem: something approximating a DB-based directory structure stores 'files', and I need git to manage it. It's distributed across a cloud using replication, hence its access point will be via MySQL.
The gist of the above answers seems to be to suggest an alternative solution to the problem asked, which kind of misses the point of using Git to manage something in a database, so I'll attempt to answer that question.
Git is a system which, in essence, stores a database of deltas (differences) that can be reassembled, in order, to reproduce a context. The normal usage of git assumes that context is a filesystem, and those deltas are diffs in that file system, but really all git is, is a hierarchical database of deltas (hierarchical, because in most cases each delta is a commit with at least one parent, arranged in a tree).
As long as you can generate a delta, in theory, git can store it. The problem is that git normally expects the context on which it's generating deltas to be a file system, and similarly, when you check out a point in the git hierarchy, it expects to generate a filesystem.
If you want to manage change in a database, you have 2 discrete problems, and I would address them separately (if I were you). The first is schema, the second is data (although in your question you state data isn't something you're concerned about). A problem I had in the past was a Dev and Prod database, where Dev could take incremental changes to the schema, and those changes had to be documented in CVS and propagated to live, along with additions to one of several 'static' tables. We did that by having a 3rd database, called Cruise, which contained only the static data. At any point the schema from Dev and Cruise could be compared, and we had a script to take the diff of those 2 files and produce an SQL file containing ALTER statements to apply it. Similarly, any new data could be distilled to an SQL file containing INSERT commands. As long as fields and tables are only added, and never deleted, the process could automate generating the SQL statements to apply the delta.
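To illustrate the "only added, never deleted" case, here is a rough sketch of generating the statements from two schema descriptions. It is not the script we used; the input format (a dict of table name to column name to SQL type) and the function name are assumptions made for the example.

```python
def additive_schema_diff(old, new):
    """Return SQL statements that bring `old` up to `new`.

    Both arguments map table name -> {column name: SQL type}.
    Only additions are handled; removed or changed columns are ignored,
    matching the "fields and tables are only added" restriction.
    """
    statements = []
    for table, columns in new.items():
        if table not in old:
            cols = ", ".join(f"{name} {sqltype}" for name, sqltype in columns.items())
            statements.append(f"CREATE TABLE {table} ({cols});")
            continue
        for name, sqltype in columns.items():
            if name not in old[table]:
                statements.append(f"ALTER TABLE {table} ADD COLUMN {name} {sqltype};")
    return statements
```

Anything beyond additions (renames, type changes, drops) cannot be inferred safely from two snapshots, which is why the restriction matters.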
The mechanism by which git generates deltas is diff, and the mechanism by which it combines 1 or more deltas with a file is called merge. If you can come up with a method for diffing and merging from a different context, git should work, but as has been discussed you may prefer a tool that does that for you. My first thought towards solving that is https://git-scm.com/book/en/v2/Customizing-Git-Git-Configuration#External-Merge-and-Diff-Tools which details how to replace git's internal diff and merge tool. I'll update this answer as I come up with a better solution to the problem, but in my case I expect to only have to manage data changes, insofar as a DB-based filestore may change, so my solution may not be exactly what you need.
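A lighter-weight relative of the external diff tools described at that link is git's textconv setting, which only changes how a binary file is shown in diffs (it does not change what is stored or how merges work). As a sketch for a SQLite file: git runs the configured program with the file path as its single argument and diffs whatever text it prints. The script name dump_sqlite.py and the driver name sqlite are made up for this example.

```python
import sqlite3
import sys


def dump_as_text(path):
    """Return the SQL text dump of the SQLite database stored at `path`."""
    conn = sqlite3.connect(path)
    try:
        return "\n".join(conn.iterdump()) + "\n"
    finally:
        conn.close()


# git calls a textconv program with exactly one argument: the file to convert.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.stdout.write(dump_as_text(sys.argv[1]))
```

To wire it up, a .gitattributes line such as "*.sqlite diff=sqlite" plus "git config diff.sqlite.textconv 'python dump_sqlite.py'" should make git diff show SQL instead of "binary files differ".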
There is a great project called Migrations under Doctrine that is built just for this purpose.
It's still in alpha state and built for PHP.
http://docs.doctrine-project.org/projects/doctrine-migrations/en/latest/index.html
Take a look at RedGate SQL Source Control.
http://www.red-gate.com/products/sql-development/sql-source-control/
This tool is a SQL Server Management Studio snap-in which will allow you to place your database under Source Control with Git.
It's a bit pricey at $495 per user, but there is a 28 day free trial available.
NOTE
I am not affiliated with RedGate in any way whatsoever.
I've released a tool for sqlite that does what you're asking for. It uses a custom diff driver leveraging the sqlite project's tool 'sqldiff', UUIDs as primary keys, and leaves off the sqlite rowid. It is still in alpha, so feedback is appreciated.
Postgres and mysql are trickier, as the binary data is kept in multiple files and may not even be valid if you were able to snapshot it.
https://github.com/cannadayr/git-sqlite
I want to make something similar, add my database changes to my version control system.
I am going to follow the ideas in this post from Vladimir Khorikov, "Database versioning best practices". In summary, I will:
store both the schema and the reference data in a source control system.
create a separate SQL script with the changes for every modification.
In case it helps!
You can't do it without atomicity, and you can't get atomicity without either using pg_dump or a snapshotting filesystem.
My postgres instance is on zfs, which I snapshot occasionally. It's approximately instant and consistent.
I think X-Istence is on the right track, but there are a few more improvements you can make to this strategy. First, use:
$ pg_dump --schema-only ...
to dump the tables, sequences, etc and place this file under version control. You'll use this to separate the compatibility changes between your branches.
Next, perform a data dump for the set of tables that contain configuration required for your application to operate (you should probably skip user data, etc.), like form defaults and other non-user-modifiable data. You can do this selectively by using:
$pg_dump --table=.. <or> --exclude-table=..
This is a good idea because the repo can get really clunky when your database gets to 100 MB+ when doing a full data dump. A better idea is to back up a more minimal set of data that you require to test your app. If your default data is very large, though, this may still cause problems.
If you absolutely need to place full backups in the repo, consider doing it in a branch outside of your source tree. An external backup system with some reference to the matching svn rev is likely best for this though.
Also, I suggest using text format dumps over binary for revision purposes (for the schema at least) since these are easier to diff. You can always compress these to save space prior to checking in.
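If you want to script the two dumps, something like the sketch below would build the command lines. It assumes the first command above is meant to dump only the schema (pg_dump's --schema-only flag) and uses plain text output as suggested; the function names, database name and file names are placeholders, not anything from the original setup.

```python
def pg_dump_schema_args(dbname, outfile):
    """Command line for a plain-text, schema-only dump."""
    return ["pg_dump", "--schema-only", "--format=plain", f"--file={outfile}", dbname]


def pg_dump_data_args(dbname, outfile, tables):
    """Command line for a plain-text data dump restricted to the given tables."""
    args = ["pg_dump", "--data-only", "--format=plain", f"--file={outfile}"]
    args += [f"--table={table}" for table in tables]
    args.append(dbname)
    return args
```

Each list can be passed to subprocess.run(args, check=True) on a machine where pg_dump is installed and can reach the database.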
Finally, have a look at the postgres backup documentation if you haven't already. The way you're commenting on backing up 'the database' rather than a dump makes me wonder if you're thinking of file system based backups (see section 23.2 for caveats).
What you want, in spirit, is perhaps something like Post Facto, which stores versions of a database in a database. Check this presentation.
The project apparently never really went anywhere, so it probably won't help you immediately, but it's an interesting concept. I fear that doing this properly would be very difficult, because even version 1 would have to get all the details right in order to have people trust their work to it.
This question is pretty much answered but I would like to complement X-Istence's and Dana the Sane's answer with a small suggestion.
If you need revision control with some degree of granularity, say daily, you could couple the text dump of both the tables and the schema with a tool like rdiff-backup which does incremental backups. The advantage is that instead of storing snapshots of daily backups, you simply store the differences from the previous day.
With this you have both the advantage of revision control and you don't waste too much space.
In any case, using git directly on big flat files which change very frequently is not a good solution. If your database becomes too big, git will start to have some problems managing the files.
Here is what I am trying to do in my projects:
Separate data, schema and default data.
The database configuration is stored in a configuration file that is not under version control (.gitignore).
The database defaults (for setting up new projects) are a simple SQL file under version control.
For the database schema, create a database schema dump under version control.
The most common way is to have update scripts that contain SQL statements (ALTER TABLE .. or UPDATE). You also need to have a place in your database where you save the current version of your schema.
Take a look at other big open source database projects (piwik, or your favorite CMS); they all use update scripts (1.sql, 2.sql, 3.sh, 4.php, 5.sql).
But this is a very time-intensive job: you have to create and test the update scripts, and you need to run a common update script that compares the version and runs all necessary update scripts.
So theoretically (and that's what I am looking for) you could
dump the database schema after each change (manually, cron job, git hooks (maybe before commit))
(and only in some very special cases create update scripts).
After that, in your common update script, run the normal update scripts (for the special cases), then compare the schemas (the dump and the current database) and automatically generate the necessary ALTER statements. There are some tools that can do this already, but I haven't found a good one yet.
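The version-tracking part of the update-script approach described above can be sketched roughly like this. It uses SQLite so that it runs anywhere; the schema_version table name and the (version, sql) script format are assumptions for the example, not how any particular project does it.

```python
import sqlite3


def apply_pending(conn, scripts):
    """Run every update script newer than the stored schema version.

    `scripts` is a list of (version, sql) tuples with integer versions.
    Returns the list of versions that were applied.
    """
    # The place in the database where the current schema version is saved.
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    current = row[0] or 0  # MAX() is NULL while no script has been applied
    applied = []
    for version, sql in sorted(scripts):
        if version <= current:
            continue
        conn.executescript(sql)
        conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
        conn.commit()
        applied.append(version)
    return applied
```

Running it a second time with the same scripts does nothing, which is what lets every developer and server call it blindly after a pull.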
What I do in my personal projects is store my whole database in Dropbox and then point my MAMP/WAMP workflow to use it right from there. That way the database is always up-to-date wherever I need to do some developing. But that's just for dev! Live sites use their own server for that, of course! :)
Storing each level of database changes under git versioning control is like pushing your entire database with each commit and restoring your entire database with each pull.
If your database is so prone to crucial changes and you cannot afford to lose them, you can just update your pre-commit and post-merge hooks.
I did the same with one of my projects and you can find the directions here.
That's how I do it:
Since you have free choice of DB type, use a file-based DB, e.g. Firebird.
Create a template DB which has the schema that fits your current branch and store it in your repository.
When executing your application, programmatically create a copy of your template DB, store it somewhere else and just work with that copy.
This way you can put your DB schema under version control without the data. And if you change your schema, you just have to change the template DB.
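The copy step might look something like the following sketch. The function name, the refresh flag and the directory layout are my own assumptions; whether you copy on every start or only when no working copy exists depends on whether you want to keep your test data between runs.

```python
import shutil
from pathlib import Path


def working_copy_from_template(template, workdir, refresh=False):
    """Copy the version-controlled template DB into `workdir` and return the copy.

    An existing working copy is kept unless `refresh` is true, so local data
    survives restarts until the schema (and thus the template) changes.
    """
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    target = workdir / Path(template).name
    if refresh or not target.exists():
        shutil.copyfile(template, target)
    return target
```

The working directory should be listed in .gitignore so that only the template is ever committed.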
We used to run a social website, on a standard LAMP configuration. We had a Live server, Test server, and Development server, as well as the local developers machines. All were managed using GIT.
On each machine, we had the PHP files, but also the MySQL service, and a folder with Images that users would upload. The Live server grew to have some 100K (!) recurrent users, the dump was about 2GB (!), the Image folder was some 50GB (!). By the time that I left, our server was reaching the limit of its CPU, Ram, and most of all, the concurrent net connection limits (We even compiled our own version of network card driver to max out the server 'lol'). We could not (nor should you assume with your website) put 2GB of data and 50GB of images in GIT.
To manage all this under GIT easily, we would ignore the binary folders (the folders containing the Images) by inserting these folder paths into .gitignore. We also had a folder called SQL outside the Apache documentroot path. In that SQL folder, we would put our SQL files from the developers in incremental numberings (001.florianm.sql, 001.johns.sql, 002.florianm.sql, etc). These SQL files were managed by GIT as well. The first sql file would indeed contain a large set of DB schema. We don't add user data in GIT (e.g. the records of the users table, or the comments table), but data like configs or topology or other site-specific data was maintained in the sql files (and hence by GIT). Mostly it's the developers (who know the code best) that determine what is and what is not maintained by GIT with regards to SQL schema and data.
When it got to a release, the administrator logged in to the dev server, merged the live branch with all developers' and other needed branches on the dev machine into an update branch, and pushed it to the test server. On the test server, he checked whether the updating process for the Live server was still valid and, in quick succession, pointed all traffic in Apache to a placeholder site, created a DB dump, pointed the working directory from 'live' to 'update', executed all new sql files into mysql, and repointed the traffic back to the correct site. When all stakeholders agreed after reviewing the test server, the Administrator did the same thing from Test server to Live server. Afterwards, he merged the live branch on the production server into the master branch across all servers, and rebased all live branches. The developers were responsible themselves for rebasing their branches, but they generally knew what they were doing.
If there were problems on the test server, e.g. the merges had too many conflicts, then the code was reverted (pointing the working branch back to 'live') and the sql files were never executed. Once the sql files were executed, this was considered a non-reversible action at the time. If the SQL files were not working properly, then the DB was restored using the dump (and the developers told off for providing ill-tested SQL files).
Today, we maintain both a sql-up and a sql-down folder, with equivalent filenames, where the developers have to test that the upgrading sql files can equally be downgraded. This could ultimately be executed with a bash script, but it's a good idea if human eyes keep monitoring the upgrade process.
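One check that is easy to automate is that every file in sql-up has a same-named counterpart in sql-down. The snippet below is just a sketch of that check (the function name is made up, and it says nothing about whether the downgrade actually works, which still needs testing by hand).

```python
from pathlib import Path


def missing_downgrades(up_dir, down_dir):
    """Return the names of .sql files in `up_dir` with no counterpart in `down_dir`."""
    up = {path.name for path in Path(up_dir).glob("*.sql")}
    down = {path.name for path in Path(down_dir).glob("*.sql")}
    return sorted(up - down)
```

Run from a pre-commit hook or before a release, a non-empty result is a reason to stop before any sql file is executed.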
It's not great, but it's manageable. Hope this gives an insight into a real-life, practical, relatively high-availability site. It may be a bit outdated, but it is still followed.
Update Aug 26, 2019:
Netlify CMS is doing it with GitHub. An example implementation, with all the information on how they implemented it, can be found here: netlify-cms-backend-github
I say don't. Data can change at any given time. Instead you should only commit data models in your code, schema and table definitions (create database and create table statements) and sample data for unit tests. This is kinda the way that Laravel does it, committing database migrations and seeds.
I would recommend neXtep (Link removed - Domain was taken over by a NSFW-Website) for version controlling the database. It has a good set of documentation and forums that explain how to install it and the errors you may encounter. I have tested it with PostgreSQL 9.1 and 9.3; I was able to get it working for 9.1, but it doesn't seem to work for 9.3.
Use a tool like iBatis Migrations (manual, short tutorial video) which allows you to version control the changes you make to a database throughout the lifecycle of a project, rather than the database itself.
This allows you to selectively apply individual changes to different environments, keep a changelog of which changes are in which environments, create scripts to apply changes A through N, rollback changes, etc.
I'd like to put the entire database under version control, what
database engine can I use so that I can put the actual database under
version control instead of its dump?
This is not database engine dependent. For Microsoft SQL Server there are lots of version control programs. I don't think that problem can be solved with git; you have to use a pgsql-specific schema version control system. I don't know whether such a thing exists or not...
Use a version-controlled database, of which there are now several.
https://www.dolthub.com/blog/2021-09-17-database-version-control/
These products don't apply version control on top of another type of database -- they are their own database engines that support version control operations. So you need to migrate to them or start building on them in the first place.
I write one of them, DoltDB, which combines the interfaces of MySQL and Git. Check it out here:
https://github.com/dolthub/dolt
I wish it were simpler. Checking in the schema as a text file is a good start to capture the structure of the DB. For the content, however, I have not found a cleaner, better method for git than CSV files. One per table. The DB can then be edited on multiple branches and merges extremely well.
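For what it's worth, the CSV export can be very small. Here is a sketch for SQLite using only the standard library; the function name is made up, and sorting rows by the first column is an assumption I make so that the row order stays stable and diffs stay small.

```python
import csv
import sqlite3
from pathlib import Path


def export_tables_to_csv(conn, outdir):
    """Write one CSV file per user table into `outdir`; return the paths written."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    written = []
    for table in names:
        if table.startswith("sqlite_"):
            continue  # skip SQLite's internal bookkeeping tables
        # Sort by the first column so unchanged rows stay on the same lines.
        cursor = conn.execute(f'SELECT * FROM "{table}" ORDER BY 1')
        path = outdir / f"{table}.csv"
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle, lineterminator="\n")
            writer.writerow([column[0] for column in cursor.description])
            writer.writerows(cursor)
        written.append(path)
    return written
```

Because each row is one line, git can merge edits to different rows of the same table made on different branches.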