Is it possible to use Git as source control for code stored in a database?

I work on Labware LIMS, which has both configuration and customization via its own programming language and internal code editor, and stores this customization code in database records. (Note: this is not the source code of the actual application itself, just the customization code, a.k.a. LIMS Basic.) Almost everything in LIMS is stored in the database.
We want to investigate the possibility of using source control to protect this code but we don't know much more than the theory of using something like Git. (I have worked as a junior QA and used git but not as a dev and my knowledge is limited!)
Of particular use would be the merging tools, as currently we have to manually merge code in a text editor, if we even notice there is a conflict. (Checking content between dev and live is time consuming and involves multiple tools, some of which are 3rd-party tools we have developed ourselves, and which are hit and miss. I personally find it easiest to cut and paste into a text file and then use Beyond Compare.)
There is no notification that the code is different when moving it from dev to live (there is no deployment as such; you just import an XML file), so we often have things going live that someone else was still working on. For example, dev 1 is working on the code in object 1; dev 2 gets a ticket to make a change to object 1, does so, and puts their change live; whatever dev 1 was doing is now also live, in whatever state it was in. (Because we don't always have time to thoroughly check what state each object is in across up to 3 different databases.)
Is it possible to use source control just on the code within the database, but not necessarily the database itself? (We have backups and such for that, but it's easy for some aspects of the system to get overwritten by multiple devs working on overlapping areas at the same time.)
If anyone reading this has any specific knowledge of LW LIMS, we are referring to the Subroutines mostly. We have versioned Analyses, which stands in for source control for the moment and is somewhat effective, but there is no way to control who is doing what on the subroutines other than a comment log at the top. I have tried to find any information on how other teams source control their code in LIMS, but to no avail.
The structure of one of these tables can range from as simple as the code existing in one field as a straight text dump, with a few other fields such as changed_on, changed_by and name (Subroutines), to more complex, with the code relating to one record being sprinkled around multiple rows of another table entirely (Analyses). But even if it could just deal with the simple scenario to start with, that would be great!
TL;DR: Could the contents of the Code field in a database record be treated like a regular code object in other dev environments somehow and source controlled using Git? (And is anyone willing to explain it simply for me to follow?)

You need to version control the subroutine table fields, but LW LIMS doesn't have an IDE with built-in version control integration (such as Git, SVN, etc.), so the direct answer is no.
If you really want to version control the code in the database, you can create a Git repository and put only the code in that repository. When a piece of code is updated, export it to its file and commit & push the change. It is then easy to compare the differences between versions.
For more detail about Git, you can refer to the Git book.
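As a rough sketch of that approach (not anything LabWare-specific: the table and column names, the connection string, and the .lsb file extension below are all assumptions, and pyodbc plus a command-line git are presumed to be available), a small script could dump each subroutine's Code field into a file inside a Git working copy and commit whatever changed:

import pathlib
import subprocess

import pyodbc

REPO = pathlib.Path("lims-code")               # an existing, already `git init`-ed working copy
CONN_STR = "DSN=LIMS;UID=readonly;PWD=secret"  # placeholder connection string

conn = pyodbc.connect(CONN_STR)
cursor = conn.cursor()
cursor.execute("SELECT NAME, CODE FROM SUBROUTINE")  # placeholder table/column names

out_dir = REPO / "subroutines"
out_dir.mkdir(parents=True, exist_ok=True)
for name, code in cursor.fetchall():
    # One file per subroutine so Git can diff and merge each one individually.
    (out_dir / f"{name}.lsb").write_text(code or "", encoding="utf-8")

subprocess.run(["git", "add", "-A"], cwd=REPO, check=True)
# Commit only if the export actually changed something.
if subprocess.run(["git", "diff", "--cached", "--quiet"], cwd=REPO).returncode != 0:
    subprocess.run(["git", "commit", "-m", "Snapshot of subroutine code"], cwd=REPO, check=True)

Run on a schedule or before each dev-to-live transfer, something like this at least gives Git's diff and merge tooling real files to work with, even though the database remains the system of record.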

LabWare LIMS has a number of options for version control. You COULD version the Subroutine table by adding a SUBROUTINE.VERSION field to the table; this works the same way as other versioned tables in LabWare, where it asks you if you would like to create a new version of the object before saving. There are a few customers I work with that have done this.
Alternatively (and possibly our more recommended method prior to LEM), there is the Snapshot capability, where the system automatically takes a "snapshot" of objects as they are saved. When viewing these you have the ability to view them side by side in a comparison dialogue; it will show < or > for lines which are different.
Another approach: if you have auditing turned on, you are able to view the audit history for changes to specific objects, including subroutines.
One other approach is to use configuration packages, which have the ability to record version AND build numbers, though individual subroutines are probably a bit too granular for their intended design.
Lastly, since this question was originally posted we have developed a product called LabWare Environment Manager (LEM) which has some good change control functionality built-in.
For more information on the suggestions above, please have a look at the LabWare Technical manual for the version you are on. We also have a mailing list for questions like this to be posted. You might find an answer there. If you have access to our Support webpage you're able to search previous questions that have been asked. I'd also suggest that you get in touch with your Account Manager at LabWare who can help you answer some of your questions.
HTH

Related

Database - Version Control - Managing dropped/deleted objects

We want to clean up our database schema and drop/delete objects which are no longer being used.
We suspect that sometime in the future we'll want to resurrect the removed functionality.
We've discussed the following options for dealing with dropped objects in version control:
Deleting the .sql files from source control once they are gone from the database and relying on the version history to store the definitions. Our concern with this approach is that sometime over the years source control will be moved and we will lose the history. It also seems difficult to know what to look for to recover if we can't see all the dropped objects.
Leaving the .sql files in source control but updating the definitions to "drop proc {someproc}". With this approach we are concerned about leaving objects in version control which no longer exist, and also about the risk of losing the history if the VCS is moved.
Creating a new repo for dropped objects and migrating .sql files to this repo once they have been dropped from SQL Server.
We're working in a Windows environment and are fairly new to working with a VCS for databases. Currently Git + SSDT.
Currently option 3 is our preferred approach.
I see this a lot with database code: what happens is that, over time, people end up with stuff in the database that is either not used or just does not work (think of a proc that references a table, where the table is modified but not the proc).
The thing to do is to get everything in source control (which it looks like you have) and then create a tag or branch of all the code before and after deleting it so you can get it back.
Two things normally transpire: either the code was genuinely never used, or it was used at year end and by the time you find out, the world is about to fall on your head, so you had better have a quick way to get it back.
Of course if you had a full suite of tests then even the year end process would be safe :)
I personally wouldn't use option 3; I would just keep the files in the main branch so the history stays with them.
ed
There are a lot of good tools for versioning database changes (you have a good chance of getting this question closed as "too broad"), but I'll try to suggest the following:
Read about, understand, and try adding Liquibase to your development toolbox.
Adapt your workflow to use this additional layer: technically it will be one more file (a changelog, in Liquibase terms) containing changesets in which you change DDL and/or data.
These changelogs provide a good, smooth way of moving back and forth through a linear history of changes in a database; they are not so good (or I don't know The Right Way) for jumping directly between nodes of a diverged history, but that doesn't seem to be your case.
From your options list this is closest to option 1, except that it stores changes to the database in version control, rather than states.
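To make that concrete, here is a minimal sketch (my own illustration, not part of the suggestion above) of generating a one-changeset Liquibase XML changelog from Python; the table and column names are invented, and the exact CLI flags vary between Liquibase versions:

from textwrap import dedent

CHANGELOG = dedent("""\
    <?xml version="1.0" encoding="UTF-8"?>
    <databaseChangeLog
        xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
                            http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-latest.xsd">

      <!-- Each changeSet is one reviewable, versioned unit of DDL and/or data change. -->
      <changeSet id="add-notes-to-subroutine" author="dev1">
        <addColumn tableName="SUBROUTINE">
          <column name="NOTES" type="varchar(2000)"/>
        </addColumn>
      </changeSet>

    </databaseChangeLog>
    """)

with open("changelog.xml", "w", encoding="utf-8") as f:
    f.write(CHANGELOG)

# Apply it with the Liquibase CLI (flag spelling differs between versions), e.g.:
#   liquibase --changeLogFile=changelog.xml --url=<jdbc-url> --username=... --password=... update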
Just to note another option: in SSDT you can mark the file property as Build Action = None. The file won't be included in the dacpac when this build option is selected. But I tend to agree with the idea that you should rely on your VCS to handle history.

Manage SSDT project file properly with version control (*.sqlproj)

We have a constant problem with the project XML file (*.sqlproj). If files are added/removed/have their location changed, then it automatically adds/removes records in some unexpected places. After that we have big trouble merging it when somebody else has changed that file as well.
We came to the conclusion that we might sort it before check-in. We would sort it alphabetically, and in that case the merge tool would understand it much better.
So, my questions would be:
Is it possible to re-arrange the sqlproj file somehow before EVERY check-in? Maybe there are some kind of options/tools that do that already?
Are there any other ways to make developers' lives easier?
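To illustrate what we mean by sorting before check-in, something along these lines is what we have in mind (a rough sketch only; it assumes the classic MSBuild XML layout of a .sqlproj and simply orders the items in each ItemGroup by their Include path):

import sys
import xml.etree.ElementTree as ET

MSBUILD_NS = "http://schemas.microsoft.com/developer/msbuild/2003"
ET.register_namespace("", MSBUILD_NS)  # keep the default namespace when saving

def sort_sqlproj(path):
    # Note: ElementTree drops XML comments and reflows whitespace, so a real
    # implementation would need to be more careful than this sketch.
    tree = ET.parse(path)
    for group in tree.getroot().findall(f"{{{MSBUILD_NS}}}ItemGroup"):
        items = sorted(group, key=lambda el: el.get("Include") or "")
        for el in items:
            group.remove(el)
        for el in items:           # re-append the children in sorted order
            group.append(el)
    tree.write(path, encoding="utf-8", xml_declaration=True)

if __name__ == "__main__":
    sort_sqlproj(sys.argv[1])      # e.g. python sort_sqlproj.py MyDatabase.sqlproj

Wired up as a pre-commit hook, or just run manually before each check-in, this would at least make the project file merge predictably.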
UPDATE:
Once again I got the same problem. The sqlproj file was modified 3 times and I want to merge only the last change to production; the other 2 are not tested yet. In the merge tool I have the option to add all these 3 new objects or leave the file without changes. I am not able to select only the last change...
EXAMPLE:
developerA created tableA and checked in;
developerB got the latest version of dev branch, created tableB and checked in;
developerC got the latest version of the dev branch, created tableC and checked in. DeveloperC tested the code and is ready to go to production. He tries to merge his code to QA and gets a conflict where he only has the option to go with ALL changes.
I understand the scenario you are running into very well. This typically happens when you have multiple work streams in the context of a single repository and you don't have a common promotion schedule (as in, all work goes to QA at the same time and to PROD at the same time).
There are a few ways I can think of to get around this problem, and there are pros and cons to each option.
Lock each environment until everything can promote together. Not realistic in most cases.
When you are ready to promote, create a promotion branch from source environment and take things out of the promotion branch that aren't ready to promote to destination environment. This allows devs to keep working and be able to promote without freezing.
Hybrid approach... Don't source control anything in Dev until it's ready to promote to test. Then either do option #1 or 2 from there onward.
Create a more flexible ecosystem that can spin up an environment for each feature branch in order to demo/test with others (or at least allocate/rotate enough environments between the developers to accomplish the same objective). Once it's accepted, promote. This is what we are working towards currently, but building out the infrastructure and process when you have a ton of interconnected databases and apps that share them is a bit challenging, to say the least (especially in the Microsoft world).
Anyway, hope this helps...
1 - What source control are you using? No source control that I am aware of understands the context of sqlproj files, but this isn't normally a problem.
2.a - This shouldn't be a problem you get constantly; are you checking in/out regularly? I would only expect to see issues if different developers are making large-scale changes to the projects and not checking out / checking in before and after.
2.b - It is also possible you are not merging correctly; if you take both sets of changes then it is normally fine.
ed

Web-App : Keeping trace of the version of the application in database?

We are building a webapp which is shipped to several clients as a Debian package. Each client runs his own server, but the updates and support are done by us.
We make regular releases of the product, with a clean version number. Most of the users get an automatic update (by Puppet); some others don't.
We want to keep track of the version of the application (in order to allow the user to check the version in an "about" section, and for our support to help the user more accurately).
We plan to store the version of the code and the version of the database in our database, and to keep the info up to date automatically.
Is that a good idea?
The other alternative we see is a file.
EDIT : The code and database schema are updated together. ( if we update to version x.y.z , both code and database go to x.y.z )
Using a table to track every change to a schema as described in this post is a good practice that I'd definitely suggest to follow.
For the application, if it is shipped independently of the database (which is not clear to me), I'd embed a file in the package (and thus not use the database to store the version of the web application).
If not and thus if both the application and the database versions are maintained in sync, then I'd just use the information stored in the database.
As a general rule, I would have both the DB version and the application version. The problem here is how "private" the database is. If the database is "private" to the application, and the user never modifies the schema, then your initial solution is fine. In my experience, databases which accumulate several years of data stop being private: users add a table or two and access data using some reporting tool, and from that point on the database is no longer used exclusively by the application.
UPDATE
One more thing to consider is users (the application) not being able to connect to the DB and calling for support. For this case it would be better to have the version, etc., stored on the file system.
Assuming there are no compelling reasons to go with one approach or the other, I think I'd go with keeping them in the database.
I'd put them in both places. Then when running your about function you can quickly check that they are both the same, and if they aren't, you can display extra information about the version mismatch. If they're the same then you only need to display one of them.
I've generally found users can do "clever" things like reverting databases back to old versions by manually copying directories around "because they can", so dealing with it defensively is always a good idea.
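As a small sketch of that belt-and-braces check (the file path, table name, and use of sqlite3 are placeholders for whatever your stack actually is):

import sqlite3
from pathlib import Path

def read_file_version(path="/opt/mywebapp/VERSION"):
    # Version written to disk by the Debian package at install time.
    return Path(path).read_text(encoding="utf-8").strip()

def read_db_version(db_path="/var/lib/mywebapp/app.db"):
    # Latest row of a schema_version table maintained by the upgrade scripts.
    with sqlite3.connect(db_path) as conn:
        row = conn.execute(
            "SELECT version FROM schema_version ORDER BY applied_at DESC LIMIT 1"
        ).fetchone()
    return row[0] if row else "unknown"

def about_info():
    code_version = read_file_version()
    db_version = read_db_version()
    info = {"code version": code_version, "database version": db_version}
    if code_version != db_version:
        # Surface the mismatch rather than hiding it; support will want to see this.
        info["warning"] = "code and database versions differ"
    return info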

Creating the Front End MDE

I created a database for tracking metrics, with some automation tricks (email, .doc, .ppt presentations, etc.), a very large main table, and lots of forms/GUI. This is the first time I have ever worried about an MDE/front end for the thing. So if you would be so kind as to answer a few questions, or offer any advice, it would be greatly appreciated (I would hate for all this work to not be utilized).
What is the first thing I need to do? It's the 2000 version, which must be converted to '03 to create the MDE, but does that get done before I use the database splitter?
Will the number of objects in the database affect the ability to do this? I have something like 80 forms, 70 queries, 20+ macros, 12 tables, etc., but does the number of objects prevent some of this from working well once the front end is there?
When I split the database, can I continue to work/make changes and such on the "back end", and have those changes directly affect the front end?
These may be some basic questions, but I don't know the answers, so..... Thanks!
Here is my 2 ¢.
Question 1 - I have never used the database splitter as I feel I have more control doing it manually. If you do it manually you can do it to a version that does not have a database splitter. But if you do use the splitter then--yes--you will have to upgrade to a version that has a splitter before doing it.
To do it manually here are the steps.
Backup everything.
Create a copy of your file in the same directory. So if you have MyApp.mdb, create a copy in the same directory with a new name, such as MyAppDATA.mdb.
Open the new DATA file (MyAppDATA.mdb) and delete all of the objects EXCEPT the TABLES.
Open the App file (MyApp.mdb) and delete all of the tables.
Also in MyApp.mdb...go to the File/Get External Data/Link Tables menu to link the tables in MyAppDATA.mdb to MyApp.mdb. Select All and create the links.
That should do it. And if you screw up you made a backup...right?
A couple of tips and gotchas...be sure that you go to Tools/Options and that you are NOT showing System and Hidden tables. You just don't want to delete system tables from MyApp. Another way to do it is do NOT delete tables that start with MSys or USys.
Question 2 - It does not matter how many objects you have. In fact you don't have that many objects anyway.
Question 3 - Yes...you will make back-end changes in MyAppDATA.mdb, and when you open MyApp.mdb those changes will auto-magically be there to see and query against etc. (In the query designer you may need to save/close/reopen to see new fields if you made the modification while in the query.) The EXCEPTION to that is new tables: you will have to use the File/Get External Data/Link Tables option to create links to new tables.
One thing to remember (and that I hope you already realize) is that the one downside of splitting the database is that when you deploy the front-end file, the relative path to the data will usually vary from machine to machine, and there is no automatic re-linking of tables in Access. If your target clients have full Access you can always use Tools/Database Utilities/Linked Table Manager to refresh the links to the right location. If you can't do that then you will have to do one of the following:
1. Write code that does the automatic re-linking for you. Basically it will check the links...if invalid it will prompt the user for the data location (or look it up in an INI file) and re-link the tables.
2. Always deploy your app to the same location on all machines. If you have commercial visions for your application this won't work...I mention it for academic reasons. It might be doable for a limited deployment where you have a lot of control over file placement on each machine.
3. Put the Data file (MyAppDATA.mdb) onto a network share and link the tables across the network using a drive mapping or a UNC path (\\myserver\mydata\ApplicationData\MyAppDATA.mdb). The latter is preferred, but both of them run the same risks as number two.
Seth
PS This answer assumes Access 2003.
PPS If you have commercial visions for your application then the table linking has got to be REALLY robust.
PPPS I agree with the commenter that you may want to take the plunge and do SQL if it is in your skill set.
One thing that hasn't been discussed is the issue of whether the compile to MDE could fail. Basically, if your code compiles in your front-end MDB, it will convert to an MDE. But I've noticed that lots of people never compile.
Some hints for keeping your VBA code in good shape:
in VBE options, turn off COMPILE ON DEMAND.
add the COMPILE button to your standard VBE toolbar and USE IT OFTEN.
periodically, backup your MDB and decompile/recompile it.
Also, remember that you must keep the MDB source, as the VBA code is not editable in an MDE and not recoverable by any good method.
EDIT:
Steps for a decompile:
backup your MDB.
start an instance of Access with the /decompile command-line argument. For instance, I have a shortcut on my desktop that has this as the target:
"C:\Program Files\Microsoft Office\OFFICE11\MSACCESS.EXE" /decompile
having opened that instance of Access, open the MDB you want to decompile. You will see nothing happen. DO NOTHING FURTHER IN THIS INSTANCE OF ACCESS -- close this instance of Access (the reason for this is that Michael Kaplan, who knows a thing or two about this, recommended that you never do any work in an Access instance opened with the decompile switch because he said there was no guarantee that the Access application code executed under those circumstances in a way that was fully safe for all kinds of Access work).
open the just-decompiled MDB holding down the shift key (you want to be sure that startup routines don't run because that would likely recompile the product before you've finished your cleanup) and compact the MDB (holding down the shift key again).
open the code editor and compile the project (DEBUG -> COMPILE [db name], for those who haven't done step #2 in my original compiling instructions at the top of the post before the edit).
compact the MDB (doesn't matter if you bypass startup, since it's already fully compiled).
Why so many steps?
Because the purpose of the decompile is to get rid of the compiled p-code in order to start afresh from the canonical VBA code. Following the steps above ensures that you have completely cleared the data pages storing the compiled code before you recompile. The reason for this is that without the compact step after the decompile, under some very rare circumstances, the code can behave strangely. I can't imagine that the old discarded p-code is being used again, but there's something about the pointers between the canonical code and the compiled code that apparently doesn't get completely flushed by a decompile without a compact.
This would be a comment to Seth's answer, but my rep isn't high enough to comment yet.
Seth did a great job answering your questions, I just wanted to add a bit more to part #1 about using the Database Splitter. The Database Splitter in the Tools menu works fine. Doing it manually is alright too, but it's a whole lot faster and easier to use the Database Splitter. I've used it a dozen times and never encountered any issues after using it.
http://www.databasedev.co.uk/split_a_database.html has a decent page about some of the pros, cons of splitting your database.
http://www.accessmvp.com/TWickerath/articles/multiuser.htm also has some good info when dealing with a split database in a multi-user environment.
Seth gave you a very good answer. But I'll add a few comments.
The number of objects only becomes relevant when you get close to about 1000 forms, reports and modules which have code; there's a limit about there. If you do get that message when trying to make an MDE then you almost certainly have a code error and need to compile to find the error.
Another resource is "Splitting your app into a front end and back end Tips"
See the Auto FE Updater downloads page to make the process of distributing new FEs relatively painless. The utility also supports Terminal Server/Citrix quite nicely.

What is a good way to implement an agile database process, which is in synch with the code base, especially in regards to continuous integration? [closed]

On the project I am working on, we are trying to come up with a solution for having the database and code be agile and able to be built and deployed together.
Since the application is a combination of code plus the database schema and database code tables, you cannot truly have a full build of the application unless you have a database that is versioned along with the code.
We have not yet been able to come up with a good agile method of doing the database development along with the code in an agile/scrum environment.
Here are some of my requirements:
I want to be able to have an SVN revision # that corresponds to a complete build of the system.
I do not want to check in binary files into source control for the database.
Developers need to be able to commit code to the continuous integration server and build the entire system and database together.
Must be able to automate deployment to different environments without doing a rebuild other than the original build on the build server.
(Update)
I'll add some more info here to explain a bit further.
No OR/M tool, since it's a legacy project with a huge amount of code.
I have read the agile database design information, and that process in isolation seems to work, but I am talking about combining it with active code development.
Here are two scenario's
Developer checks in a code change that requires a database change. The developer should be able to check in a database change at the same time, so that the automated build doesn't fail.
Developer checks in a DB change that should break code. The automated build needs to run and fail.
The biggest problem is how these things sync up. There is no such thing as "checking in a database change". Right now, applying the DB changes is a manual process someone has to do, while code changes are constantly being made. They need to be made together and checked in together, and the build system needs to be able to build the entire system.
(Update 2)
One more add here:
You can't bring down production; you must patch it. It's not acceptable to rebuild the entire production database.
You need a build process that constructs the database schema and adds any necessary bootstrapping data. If you're using an O/R tool that supports schema generation, most of that work is done for you. Whatever is not tool-generated, keep in scripts.
For continuous integration, ideally a "build" should include a complete rebuild of the database and a reload of static testing data.
I just saw that you have no ORM tool... here's what we had at a company I used to work for
db/
db/Makefile (run `make` to rebuild db from scratch, `make clean` to close db)
db/01_type.sql
db/02_table.sql
db/03_function.sql
db/04_view.sql
db/05_index.sql
db/06_data.sql
Arrange however necessary... each of those *.sql scripts would be run in order to generate the structure. Developers each had local copies of the DB, and any DB change was just another code change, nothing special.
If you're working on a project that already has a build process (Java, C, C++), this is second nature. If you're using scripts in such a way that there is no build process at all, this'll be a bit of extra work.
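For illustration, here is a minimal sketch of the kind of rebuild-from-scratch runner such a Makefile could drive (it uses sqlite3 so the sketch is self-contained; a real project would point the same loop at its actual database server):

import pathlib
import sqlite3

DB_FILE = pathlib.Path("build/test.db")
SQL_DIR = pathlib.Path("db")

def rebuild():
    DB_FILE.parent.mkdir(parents=True, exist_ok=True)
    if DB_FILE.exists():
        DB_FILE.unlink()  # always start from an empty database
    conn = sqlite3.connect(DB_FILE)
    try:
        # 01_type.sql, 02_table.sql, ... applied in filename order
        for script in sorted(SQL_DIR.glob("*.sql")):
            conn.executescript(script.read_text(encoding="utf-8"))
            print(f"applied {script.name}")
        conn.commit()
    finally:
        conn.close()

if __name__ == "__main__":
    rebuild()

Because the scripts live next to the application code, the CI server can run exactly this before every test run, which is what makes "a database versioned along with the code" possible.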
"There is no such thing as "checking in a database change"."
Actually, I think you can check in a database change. The trick is to stop using simple -- unversioned -- schema and table names.
If you have a version number attached to a schema as a whole (or a table), then you can easily have a version check-in.
Note that database versions don't have a fancy major-minor-release scheme. The "major" revision in application software usually reflects a basic level of compatibility. That basic level of compatibility should be defined as "uses the same data model".
So app versions 2.23 and 2.24 both use version 2 of the database schema.
The version check-in has two parts.
The new table. For example, MyTable_8 is version 8 of a given table.
The migration script. For example MyTable_8 includes a MyTable_7 to MyTable_8 script which moves the data, providing defaults or whatever is required.
There are several ways this is used.
Compatible upgrades. When merely altering a table to add a column that permits nulls, the version number stays the same.
Incompatible upgrades. When adding non-null columns (that need initial values) or changing the fundamental shape of tables or data types of columns, you're making a big change and you have a migration script.
Note that the old data stays in place until explicitly dropped at the end of the change procedure. You have to run tests to assure that everything worked.
You might have two-part drop -- first rename, then (a week later) finally drop.
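To make the table-versioning idea concrete, here is a small sketch of such an incompatible upgrade (the columns and defaults are invented, and the SQL is generic, so it would need adjusting for your RDBMS):

import sqlite3

MIGRATION_7_TO_8 = """
CREATE TABLE MyTable_8 (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    status  TEXT NOT NULL DEFAULT 'active'  -- new non-null column in version 8
);

INSERT INTO MyTable_8 (id, name, status)
SELECT id, name, 'active' FROM MyTable_7;

-- MyTable_7 is intentionally NOT dropped here; rename it and, later, drop it
-- in a separate step once the new version has been verified.
"""

def migrate(db_path="app.db"):
    with sqlite3.connect(db_path) as conn:
        conn.executescript(MIGRATION_7_TO_8)

if __name__ == "__main__":
    migrate()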
Make sure that your O/R mapping tool is able to build the necessary tables out of its default configuration and also add missing columns. This should cover 90% of your cases.
The other 10% are
coping with missing values for columns that were added after the data was inserted
writing data-migration scripts for the rare cases where you need to make more fundamental changes between versions
See the DBDeploy open source project. http://dbdeploy.com/
It allows you to check in database change scripts. It will then produce a consolidated change script including all changes that have not been applied.
The site describes the process pretty well.
This project is based on the techniques in the Martin Fowler article that was mentioned before. I was on the project that Martin based the article on. DbDeploy is a pretty good implementation of the process we used.
The migrations facility of Ruby on Rails was developed to handle exactly this need. If you're not using Rails for your application, you might see if this same concept has been ported to the framework of your choice, or read up on it and determine whether you could write some quick scripts that implement the same sort of functionality.

Resources