At my company, we save each database object (stored proc, view, etc.) as an individual SQL file, and place them under source control that way.
Up until now, we've had a very flat storage model in our versioned file structure:
DatabaseProject
    Functions
        (all functions here; no further nesting)
    StoredProcedures
        (all stored procs here; no further nesting)
    Views
        (ditto)
For a big new project, another idea has occurred to me: why not store these files by subject instead of in these prefab flat lists?
For example:
DatabaseProject
    Reports
        (individual stored procs, views, etc.)
        SpecificReport
            (more objects here, further nesting as necessary)
    SpecificApplication
        (all types of DB objects, with arbitrarily deep nesting)
    et cetera...
The obvious flaw is that this folder structure doesn't impose any kind of namespace hierarchy on the database objects; it's for organization only. Thus, it would be very easy to introduce objects with duplicate names. You'd need some kind of build tool to survey the database project and die on naming conflicts.
What I'd like to know is: has anyone tried this method of organizing SQL files by application subject in their versioned file structure? Was it worth it? Did you create a build tool that would police the project as I have described?
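To make the idea concrete, here is a minimal sketch of the kind of pre-build check I have in mind. It assumes one object per .sql file and that the file name matches the object name, which are conventions we would have to enforce anyway; the whole thing is illustrative rather than something we run today:

    import os
    import sys
    from collections import defaultdict

    def find_duplicates(project_root):
        """Group .sql files by object name (assumed to equal the file name)."""
        seen = defaultdict(list)
        for dirpath, _dirnames, filenames in os.walk(project_root):
            for filename in filenames:
                if filename.lower().endswith(".sql"):
                    name = os.path.splitext(filename)[0].lower()
                    seen[name].append(os.path.join(dirpath, filename))
        return {name: paths for name, paths in seen.items() if len(paths) > 1}

    if __name__ == "__main__":
        duplicates = find_duplicates(sys.argv[1] if len(sys.argv) > 1 else ".")
        for name, paths in sorted(duplicates.items()):
            print("Duplicate object name '%s':" % name)
            for path in paths:
                print("    " + path)
        sys.exit(1 if duplicates else 0)  # die on naming conflicts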
I like to have my SQL scripts organized by topic rather than by name. As a rule, I even group related items into single files. The main advantages of this are:
You do not clutter your filesystem/IDE with files (many of them only a few lines long).
The overall database structure is more directly visible.
On the other hand, it can be harder to find the source code for a specific object...
As for duplicate names: they can never happen, because you obviously have automated scripts to build your database, and a build that tries to CREATE the same object twice simply fails. Relying on your filesystem for this is asking for trouble...
In conclusion, I would say that your current rules are much better than no rule at all.
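By the way, here is a minimal sketch of what I mean by an automated build script, assuming SQL Server reached via pyodbc and scripts whose batches are separated by GO lines; the connection handling and batch splitting are placeholders, not a finished tool:

    import glob
    import pyodbc  # assumed driver; any DB-API connection would do

    def build_database(project_root, connection_string):
        """Apply every .sql file under the project tree in a deterministic order."""
        conn = pyodbc.connect(connection_string, autocommit=True)
        cursor = conn.cursor()
        for path in sorted(glob.glob(project_root + "/**/*.sql", recursive=True)):
            with open(path, encoding="utf-8") as f:
                script = f.read()
            # Naive batch splitting on GO lines; adjust for your dialect/tooling.
            for batch in script.split("\nGO\n"):
                if batch.strip():
                    # A duplicate CREATE makes the whole build die here,
                    # so the database (not the filesystem) polices names.
                    cursor.execute(batch)
        conn.close()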
You should define a naming scheme for your database objects, so that it is clear where a view or stored procedure is used.
This can be either prefixes that describe the app modules, or separate schema names per module or functional area (for example, an rpt_ prefix or a Reports schema, so a report view shows up as rpt_MonthlySales or Reports.MonthlySales).
No nesting is required, names in the VCS show up exactly as they do in the database, and they sort sensibly if the naming scheme is chosen well.
We save our SQL files in a "SQL" solution folder with each project. That way, each project is "installed" separately.
Our application has an MS Access 2010 database (I know... I would much prefer SQL Server, but that's another topic).
Since MS Access stores its data in a single opaque, monolithic binary file rather than in scripts, my team is thinking of creating several extra tables corresponding to different versions of the software and maintaining these versions inside one master database.
I would instead suggest simply placing the binary file in the same source control tool as the software source code. The vast majority of the database content would then be duplicated across versions, but at least it puts the version control tool in charge of the software source and the database simultaneously, in a synced fashion.
The application uses XML files that are exported from the database (it doesn't tie into the database directly).
What are the pros and cons of these two approaches?
I'm familiar with version control methods for SQL Server, but MS Access seems cumbersome to manage for applications with lots of branches.
In short: you are pushing Access into something it is not intended for.
You do have the commands SaveAsText and LoadFromText, which can export and import most objects as discrete text files. Visual SourceSafe used this to provide a rough kind of source control, but it doesn't work 100% reliably.
You can also just as well import and export objects "as is" to another (archive) database, building some kind of version control that way.
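For illustration, the SaveAsText route can be scripted, for example from Python via COM automation. This is a rough, untested sketch assuming a local Access installation and pywin32; the object-type constants and collection names follow the Access object model, but verify the details before relying on them:

    import os
    import win32com.client  # requires pywin32 and a local Access installation

    # AcObjectType constants: acForm=2, acReport=3, acMacro=4, acModule=5
    OBJECT_TYPES = {
        "forms":   (2, "AllForms"),
        "reports": (3, "AllReports"),
        "macros":  (4, "AllMacros"),
        "modules": (5, "AllModules"),
    }

    def export_as_text(accdb_path, output_dir):
        """Dump forms, reports, macros and modules as text for source control."""
        app = win32com.client.Dispatch("Access.Application")
        app.OpenCurrentDatabase(accdb_path)
        try:
            for folder, (ac_type, collection_name) in OBJECT_TYPES.items():
                target = os.path.join(output_dir, folder)
                os.makedirs(target, exist_ok=True)
                for obj in getattr(app.CurrentProject, collection_name):
                    out_file = os.path.join(target, obj.Name + ".txt")
                    app.SaveAsText(ac_type, obj.Name, out_file)
        finally:
            app.CloseCurrentDatabase()
            app.Quit()

Tables and queries need separate handling, and as noted above the results are not 100% reliable.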
I once worked with a team at a very large corporation that had every imaginable Microsoft resource at hand, and still we ended up with a simple system of zip files whose filenames included the date and time.
We had a master accdb file that we pulled as a copy to a local folder, did what we were assigned, and copied back, leaving a note about which objects were altered. One person had the task of collecting the altered objects and "rebuilding" a new master. There was at least one rebuild per day, but often we also created one at the lunch break.
It worked better than you might imagine, because we typically operated in different corners - one of us on some reports, one on other reports, one on some forms, and one (typically me) on code modules. Of course, mistakes happened, but since we had the zip files it was always fast and safe to pull an old copy of an object when in doubt.
Consider multiple binary files associated with one metadata file each across multiple directories:
    directory1: file1.bin file1.txt
    directory2: file2.bin file2.txt
The metadata files contain structured data in XML or JSON format.
Is there a database which can use these metadata files for operating and running queries on them?
From what I understand, document-oriented databases store their data files in one directory.
My question is related to this stackexchange question. Unfortunately, there is no good description of an XML-based solution there.
To get good query performance on metadata-based queries, virtually any system will have to extract the metadata from the individual metadata files and store it in a more optimized form: one or more indexes of some kind. If there is associated data stored only in files, and not in the index (like your .bin files), then the index entry needs to store a path to the file so the associated data can be retrieved when needed. The path can typically include directory names, machine names, and so on; in modern systems it could be a URL.
A document-oriented database might be a perfectly good place to store the metadata index, but it isn't necessarily the best choice. If the metadata you need to query on is highly regular (always has the same fields), then some other form of index storage could have substantially better performance; but if you don't know the structure of the metadata ahead of time, a document-oriented database may be more flexible. Another approach might be a full-text search engine, if you are trying to match words and phrases in the metadata.
So yes, such databases exist. Unfortunately, there are far too many unspecified factors to make a specific recommendation. The question isn't well suited to a generic answer: the size of the document collection, the expected transaction rate, the required storage and retrieval latency, and the consistency requirements could all factor into a recommendation, as would any platform preferences (Windows vs. *nix, on-premises vs. cloud, etc.).
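For illustration, here is a minimal sketch of such an index, assuming the metadata files are JSON and using SQLite as the index store; the metadata fields (title, created) are made up for the example:

    import glob
    import json
    import os
    import sqlite3

    def build_index(root, db_path="metadata_index.db"):
        """Extract metadata from the .txt files into a queryable SQLite index.
        Each row keeps the path to the companion .bin file for retrieval."""
        conn = sqlite3.connect(db_path)
        conn.execute("""CREATE TABLE IF NOT EXISTS metadata (
                            bin_path TEXT PRIMARY KEY,
                            title    TEXT,   -- illustrative field
                            created  TEXT,   -- illustrative field
                            raw_json TEXT)""")
        conn.execute("CREATE INDEX IF NOT EXISTS idx_created ON metadata(created)")
        for meta_path in glob.glob(os.path.join(root, "*", "*.txt")):
            with open(meta_path, encoding="utf-8") as f:
                meta = json.load(f)  # assumes JSON; XML would need its own parser
            bin_path = os.path.splitext(meta_path)[0] + ".bin"
            conn.execute("INSERT OR REPLACE INTO metadata VALUES (?, ?, ?, ?)",
                         (bin_path, meta.get("title"), meta.get("created"),
                          json.dumps(meta)))
        conn.commit()
        return conn

    # Example: find the .bin files whose metadata matches a condition.
    # conn = build_index("/data")
    # rows = conn.execute(
    #     "SELECT bin_path FROM metadata WHERE created >= '2020-01-01'").fetchall()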
If you want to query structured data directly in XML or JSON files, there are tools for doing so, for example:
xml-grep
jq
If your metadata text files relate to interpreting the binary files, I'm not aware of any generic parser for this. One may exist, but it seems a stretch unless you are using well-defined formats.
Working with these files directly is going to be inefficient if you need to make repeated queries, as any non-database solution has to re-parse the files to resolve each query. A document-oriented database also stores structured content, but its on-disk format is more efficient (and more complex) than text files with XML/JSON metadata that must be parsed every time.
If you actually want to use a database and build appropriate indexes over structured content, you should import your raw data into one.
I have noticed a lot of companies use a prefix for their database tables. E.g. tables would be named MS_Order, MS_User, etc. Is there a good reason for doing this?
The only reason I can think of is to avoid name collision. But does it really happen? Do people run multiple apps in one database? Is there any other reason?
Personally, I don't see any value in it. In fact, it's a bummer for intellisense-like features because everything begins with MS_. :) The Master agrees with me too.
Huge schemas often have many tables with similar, but distinct, purposes. Thus, various "segmented" naming conventions.
Darn, didn't get first post :-)
In SQL Server 2005 and above, the schema feature eliminates the need for any kind of prefix. A good example of their usage can be found by reading about the schemas in AdventureWorks, where tables such as Sales.SalesOrderHeader, Production.Product and HumanResources.Employee are grouped by schema instead of by prefix.
In some older versions of SQL Server, having a prefix to create a pseudo-namespace might have been useful in databases with lots of tables.
Other than that I can't really see the point.
Even when the database only contains one application, prefixes can be useful for grouping like parts of the application together. So tables that contain customer information might be prefixed with cust_, those that contain warehouse information with inv_ (for inventory), those that contain financial information with fin_, and so on.
I've worked on systems where there is an existing database for an application, created and maintained by a different company, and we needed to add another app that uses large amounts of the same data plus just a few extra tables of our own. In that case, having an app-specific prefix helps with the separation.
Slightly tangentially to the original question, I've seen databases use prefixes to indicate the type of data a table holds. There would be one prefix for lookup tables, which are obviously pretty static in both size and content, and a different prefix for tables that contain variable data. The latter may in turn be broken down into one prefix for tables that are appended to but rarely changed (logging, processed orders, customer transactions, etc.) and another for more volatile data, such as customer balances. Link tables could also have their own prefix to separate them out.
I have never seen a naming collision, as it usually doesn't make sense to put tables from different applications into the same database namespace. If you had some sort of reusable library that could be integrated into different applications, perhaps that might be a reason, but I haven't seen anything like that.
Though, now that I think about it, there are some cheap web hosting providers that only allow users to create a very small number of databases, so it would be possible to run a number of different applications using a single database, so long as the names didn't collide (and such a prefixing convention would certainly help).
Multiple applications using a particular table, correct: prefixes prevent name collisions. They also make it rather simple to back up tables and keep the copies in the same database: just change the prefix and your backup will be fully functional. Aside from that, it's just good practice.
Prefixes are a good way to sort out which SQL objects are associated with which app when multiple apps dip into the same database.
I have also prefixed SQL objects differently within the same app to make security easier to manage: for example, all the objects prefixed with admin_ need one set of permissions applied, and the rest need something else.
Prefixes can be handy for humans, search tools and scripts. If the situation is a simple one, however, there is probably no use for them at all.
It's most often used when several applications share one database. For example, if you install WordPress, it prefixes all tables with "wp_". This is useful if you want your applications to share data very easily (sessions across all applications in your company, for example).
There are better ways to accomplish this, however, and I never prefix my table names, as each application has its own self-contained database.
What are the problems associated with storing your data in files rather than in a database? I'm thinking in terms of something like a blog engine. I read that Movable Type used to do this. What are the pros and cons of working this way?
Databases provide means to perform interesting queries more easily.
Examples: you would want to list the 10 most recent posts on the front page, or build an archive page that lists all articles published in a given year (taken from the URL).
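For instance, with all posts in one table, both of those become single queries. A sketch using SQLite; the posts table and its columns are assumptions for the example:

    import sqlite3

    conn = sqlite3.connect("blog.db")  # assumed schema: posts(title, body, published_at)

    # The 10 most recent posts for the front page.
    recent = conn.execute(
        "SELECT title, body FROM posts ORDER BY published_at DESC LIMIT 10"
    ).fetchall()

    # All articles published in a given year, taken from the URL.
    year = "2009"
    archive = conn.execute(
        "SELECT title FROM posts"
        " WHERE strftime('%Y', published_at) = ? ORDER BY published_at",
        (year,)
    ).fetchall()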
I think the main one is data consistency. If you keep everything together in one database table, you don't have to worry (as much) about a file being externally modified or deleted without the metadata being updated in sync. There's also the possibility of an incomplete write if the server fails while you're updating; in that case you have to take your own steps to implement transactions.
I think that with an appropriate level of care and file permissions though, these problems can be overcome.
It is much easier and more comfortable to specify access rights (to data or files) in a database than to use OS-specific access rights.
You can easily share data across machines and/or websites using database-stored files.
Unfortunately, it is (often) much slower to serve files stored in a database.
We have a set of applications that work with multiple database engines, including SQL Server and Access. The schemas for each are maintained separately and are not stored in text form, which makes source control difficult. We are interested in moving to a system where the schema is stored in some text-based format (such as XML or YAML) with descriptions of field data types, foreign key relationships, and so on.
When all is said and done, we want a single text file in source control that can be used to generate a clean database that works with at least SQL Server and Access (and preferably with Oracle, DB2 and other engines as well).
I'm certain there are tools or libraries out there that can get us at least part of the way there. For one, I've found Altova MapForce, which looks like it may do the trick, but I'm interested in hearing about alternative tools or libraries, or even entirely different solutions, from those in the same predicament.
Note: the applications are written in C++; ORM solutions are not readily available in C++ and would take far too long to integrate into our aging products.
If you don't use an object-relational mapper that does this (and many other things) for you, the easiest way might be to whip up a few structures that define your tables and attributes in some form of (static) code, and to write little generators that create the actual databases from that description.
That makes source control easy, and if you're careful when designing those structures, you can easily reuse them for other database engines if the need arises.
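Something along these lines, as a minimal sketch; the logical types, the per-engine mappings and the sample table are all made up for illustration:

    # Table definitions as plain, source-controlled data.
    TABLES = {
        "customer": [
            ("id",    "int",    {"primary_key": True}),
            ("name",  "string", {}),
            ("email", "string", {}),
        ],
    }

    # Per-engine type mappings; extend as needed (Oracle, DB2, ...).
    TYPE_MAP = {
        "sqlserver": {"int": "INT",  "string": "NVARCHAR(255)"},
        "access":    {"int": "LONG", "string": "TEXT(255)"},
    }

    def generate_ddl(engine):
        """Emit CREATE TABLE statements for the chosen engine."""
        types = TYPE_MAP[engine]
        statements = []
        for table, columns in TABLES.items():
            cols = []
            for name, logical_type, options in columns:
                col = "%s %s" % (name, types[logical_type])
                if options.get("primary_key"):
                    col += " PRIMARY KEY"
                cols.append(col)
            statements.append("CREATE TABLE %s (\n    %s\n);"
                              % (table, ",\n    ".join(cols)))
        return "\n\n".join(statements)

    print(generate_ddl("sqlserver"))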
The consensus when I asked a similar (if rather more naive) question seemed to be to use raw SQL and to manage the RDBMS dependencies with an additional layer. Good luck.
The tool you're looking for is Liquibase. No support for Access, though...