Transact SQL, prefix with schema or not? - sql-server

I'm developing a tool where I've prefixed tables etc. with "dbo", and now I'm getting requests for custom schema names. I'm thinking of skipping the prefix and instead letting the user control this via the login associated with the DB. I know there's talk about "performance", since SQL Server has to search the user's schema first and then fall back on dbo etc., but is that really an issue? Opinions?

First, I would look at this question as a feature request from your customers (users?). So the immediate decision to make is, should you even consider looking into this now, or do you have other feature requests that are obviously more important and deliver more benefit to the customer?
For example, for now you could simply tell customers that your application requires its own database that should not be shared with other applications or manipulated in any way by the customer. Then you don't have to worry about schemas or the same object name in two schemas because your application 'owns' the database. Perhaps this is already the case, but if so then I don't understand why your customers care which schema your objects are in.
Second, assuming that you do decide to work on it, you should gather some information about why people are asking for this, to make sure that you clearly understand what they expect you to deliver and what the benefit is for them. If customers are really saying "your application runs slowly" then the choice of schema is highly unlikely to be the reason, it's much more probable that indexing, schema design or your application code are the areas to look at.
Finally, if you still want to go ahead you need to find a technical solution. This is partly a deployment issue and partly a coding issue. It's a deployment issue because you have to deploy your database objects in a specific schema that is specified at installation time, and all your patches and later releases need to be aware of that too. The coding issue is that you need your database code to be "schema-aware", in case you end up in a situation where you have dbo.TableName, MyTool.TableName and OtherSchema.TableName all in the same database. The solution is obviously to reference the schema name in all code, which is considered an important best practice anyway. But exactly how you do this depends on how you have structured your application, if you use an ORM etc.
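A minimal sketch of what this can look like on the deployment side, assuming SQL Server and a target schema supplied at install time through a SQLCMD variable (the variable name TargetSchema and the object names are made up for illustration):

```sql
-- Install-time deployment sketch (run in SQLCMD mode).
:setvar TargetSchema "MyTool"

-- Create the schema if it does not exist; CREATE SCHEMA must run in its own batch.
IF NOT EXISTS (SELECT 1 FROM sys.schemas WHERE name = N'$(TargetSchema)')
    EXEC (N'CREATE SCHEMA [$(TargetSchema)];');
GO

-- All objects are created in, and referenced through, the chosen schema.
CREATE TABLE [$(TargetSchema)].[TableName]
(
    Id   int IDENTITY(1,1) PRIMARY KEY,
    Name nvarchar(100) NOT NULL
);
GO

-- Application code then always uses two-part names:
SELECT Id, Name FROM [$(TargetSchema)].[TableName];
```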

Related

Using LDAP server as a storage base, how practical is it?

I want to learn how practical it is to use an LDAP server (say, AD) as a storage base. To be clearer: how much sense does it make to use an LDAP server instead of an RDBMS to store data?
I can guess that most of you might just say "it doesn't", but there might be some reasons that make it meaningful (especially business-wise).
A few points first:
Each table becomes a container entity and each row becomes a new entity as a child of it. Row entities contain attributes for the columns. So you represent your data that way. (This seems like the most meaningful representation to me; suggestions are welcome.)
So storing data like a DB server does is possible, but the lack of FK and PK support (not sure about PK) is an issue. On the other hand, LDAP supports indexing on attributes (which correspond to columns), though I'm not sure how efficient it is. So data consistency is the responsibility of the application layer.
Why would somebody do this ever?
The data the application uses/stores closely matches the existing data in AD (users, machines, department info, etc.). (Some customization of the existing entity schema is still required, though, and new schema definitions are needed for data that is not closely related.)
(I think the strongest reason is this business-related one.) Most mid-sized companies have very well-configured AD servers (replicated, backed up, etc.) but don't have a comparable DB setup (comment on this as much as you want). When you sell software that requires a DB setup to these companies, they must manage that DB setup; but if you can say "you don't need any DB setup and management; you can just use your existing AD", it sounds appealing.
Obviously there are many disadvantages to giving up a DB; feel free to mention them, but let's assume they are acceptable. (I can elaborate if the question is not clear enough.)
LDAP is a terrible tool for maintaining most business data.
Think about a typical one-to-many relationship - say, customer and orders. One customer has many orders.
There is no good way to represent this data in an LDAP directory.
You could try having a mock "foreign key" by making every entry of that given object class have a "foreign key" attribute, but your referential integrity just went out the window. Cascade deletes are impossible.
You could try having a "customer" object that has "order" children. However, you've just introduced a specific hierachy - you're now tied to it.
And that's the simplest use case. Once you start getting into more complex relationships, you're basically re-inventing an RDBMS in a system explicity designed for a different purpose. The clue's in the name - directory.
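For contrast, here is what that same one-to-many relationship looks like in a relational database. This is only a minimal T-SQL sketch with hypothetical table names, but it shows that a single foreign key buys you the enforced referential integrity and cascading deletes that an LDAP "foreign key attribute" cannot provide:

```sql
-- Hypothetical customer/orders tables: the FOREIGN KEY constraint enforces
-- referential integrity, and ON DELETE CASCADE removes a customer's orders
-- automatically. Both are things you would have to hand-roll on top of LDAP.
CREATE TABLE dbo.Customer
(
    CustomerId int IDENTITY(1,1) PRIMARY KEY,
    Name       nvarchar(100) NOT NULL
);

CREATE TABLE dbo.Orders
(
    OrderId     int IDENTITY(1,1) PRIMARY KEY,
    CustomerId  int NOT NULL
        REFERENCES dbo.Customer (CustomerId) ON DELETE CASCADE,
    OrderDate   datetime2 NOT NULL DEFAULT SYSUTCDATETIME(),
    TotalAmount decimal(18,2) NOT NULL
);
```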
If you're storing a phonebook, then sure, use LDAP. For anything else, use a real database.
For relatively small, flexible data sets I think an LDAP solution is workable. However, an RDBMS provides a number of significant advantages:
Backup and recovery: just about any database will provide ACID properties, and RDBMS backups are generally easy to script and offer several options (e.g. full vs. differential). I just don't know with LDAP, but I imagine these qualities are not as widespread.
Reporting: AFAIK LDAP doesn't offer a way to JOIN values easily, much less do things like calculate summations. So you would have to put a lot of effort into application code to reproduce those behaviors when you do need reporting (see the sketch after this list). And what application doesn't need it eventually?
Indexing: it looks like LDAP solutions have indexing, but again, it seems hit or miss, whereas seemingly every database out there has put some real effort into getting this right.
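As an illustration of the reporting point above, here is a hedged SQL sketch (hypothetical customer/orders tables) of the kind of join-plus-aggregate query an RDBMS gives you for free and that you would have to reimplement in application code against an LDAP directory:

```sql
-- Order count and total spend per customer: one JOIN plus a GROUP BY.
-- Against LDAP you would fetch the entries and aggregate them yourself.
SELECT c.Name,
       COUNT(o.OrderId)   AS OrderCount,
       SUM(o.TotalAmount) AS TotalSpent
FROM dbo.Customer AS c
LEFT JOIN dbo.Orders AS o
       ON o.CustomerId = c.CustomerId
GROUP BY c.Name;
```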
I think any serious business system's storage should be backed up in the same fashion you believe LDAP is in most environments. If what you're really after is its flexibility in terms of representing hierarchy and ability to define dynamic schemas I'd suggest looking into NoSQL solutions or the Java Content Repository.
LDAP is very useful for storing that kind of information, and if you want to, you may use it. An RDBMS is just more comfortable to work with from ORM systems; your persistence logic with LDAP will be much more complex.
It's also worth mentioning that this is not a standard approach, so the people who will support the project will spend more time on analysis.
I've used this approach for fun; I generated a phonebook from Active Directory. But I don't think it's a good idea to use LDAP as a store for business applications.
In short: Use the right tool for the right job.
When people see LDAP, you have already set an expectation for your system. Don't forget what the L stands for: Lightweight. LDAP was designed for accessing directories over a network.
With a "directory database" you can build a certain type of application. If you can map your data to a tree-like data structure, it will work. I surely would not want to stream videos from LDAP! You could probably hack something together, but I would prefer a streaming server.
There might be some hidden gotchas down the line if you use a tool for something it was not designed to do. So the downside is that you'll have to test things that would otherwise have been a given.
It's not just a technical concern, either. Your operational support team might "frown" on your application, as they will have certain expectations/preconceptions based on its architectural nature. Imagine their surprise if you hand them a CRM system (website + files + popped email, etc.) with an LDAP server as the database to maintain.
If I was in your position, I would steer towards one of the NoSQL db solutions rather than trying to use LDAP. LDAP is fine for things like storing user and employee information, but is terrible to interact with when you need to make changes. A NoSQL db will allow you to store your data how you want without the RDBMS overhead you would like to avoid.
The answer is actually easy. Think of CRUD (Create, Read, Update, Delete). If your system will mostly perform reads, you can consider LDAP, because LDAP is quick at read operations and was designed for them. If the other operations dominate, an RDBMS is the better option.

Questions and considerations to ask client for designing a database

So, as the title says, I would like to hear your advice on the most important questions to consider and to ask end users before designing a database for their application. We are making a database-oriented app, with special attention to be paid to DB security (access control, encryption, integrity, backups). The database will also keep some personal information about people, which is considered sensitive by law, so security must be good.
I worked on school projects with databases, but this is my first time working "in the real world", where DB security has real implications.
I found some advice and questions to ask on the internet, but I always get the best ones here. All help appreciated!
Thank you!
Some other specifics besides what has already been said:
Do you have any regulatory requirements for data access and storage (Sarbanes-Oxley and HIPAA come to mind)?
Do you need to be able to audit record changes?
What internal controls do you need reflected in the database?
What business rules must be followed, and under what circumstances?
How large do you expect the data to get? The larger the expected data store, the more critical it is to design with performance in mind from the start.
How flexible do you want the system to be (do you want to be able to add columns on the fly, or add business rules)? Be careful with this one; make sure the client understands that flexibility often comes at the cost of performance.
Do you need a separate data warehouse for reporting?
How do you need the data populated? Will it come from an application, multiple applications, data imports, or a combination?
What databases do you currently have licenses for? Do you want this application to use one of them?
Will different groups of users need different access?
How is the process currently being handled? Can we have access to that database or see the current process in action? Observe, for a minimum of one day, the client using the current system. Take extensive notes; you will learn many things no one will think to tell you.
Do you need to migrate data from the old system?
I would start with:
Please explain your business to me.
Which processes are you looking to automate or improve?
Do you have any reports you need to generate?
Do you need to feed inputs into any other systems?
use cases (google that; it does not need to be drawings, text is fine)
inputs
outputs
static data
historical data
From there you derive the info you need to store, apply 4NF, and go!
Good luck! 8-))

Why do we need Audit Columns in Database Tables?

I have seen many database designs with the following audit columns on all tables:
Created By
Created DateTime
Updated By
Updated DateTime
From one perspective, I see tables in the following way:
Entity tables:
Good candidates for audit columns.
Reference tables:
Audit columns may or may not be required. In some cases last-update information is not needed at all because the record is never going to be modified.
Reference data tables:
Like country names, entity states, etc. Audit columns may not be required because this information is created only at system installation time and is never going to change.
I have seen many designers blindly put all the audit columns on all tables. Is this practice good, and if so, what is the reason?
I just want to know because to me it seems illogical; it is difficult for me to figure out why they design their DBs this way. I am not saying they are wrong or right, I just want to know the WHY.
You can also suggest an alternative auditing pattern or solution, if one is available.
Thanks and Regards
Data auditing is a required internal control for many business systems (see Sarbanes-Oxley for reasons why). It must be done at the database level to ensure that all changes are captured, especially unauthorized ones.
Even with lookup tables, an unauthorized change could wreak havoc in your system, so it is important to know who made the change and when. "When" is especially important because it helps the DBAs know how far back to go for a backup to restore information that was accidentally or maliciously changed.
We like to think all our employees are trustworthy, but many thefts of personal data and malicious changes made to destroy company data come from internal sources (this is why it is dangerous to have many disgruntled employees), as does almost all fraud. Yet most programmers seem to think they only have to protect against outside threats.
Of course, you are still going to have a few people who can make unauthorized changes; you can't prevent system admins from doing so. But with auditing you can at least limit the potential for data damage (and be especially careful when hiring DBAs, and allow no one else admin rights on your database servers).
These columns are for the benefit of the DBA and the database developers. They just provide a quick mechanism to answer questions like "When did this record last change?" "who changed it?" They are not robust enough or fine-grained enough to satisfy compliance with SOX, HIPAA or whatever.
It is simply easier to have these columns on every table. All data can change, so it is useful to know when changes happened, especially if that data isn't supposed to change. It is possible to automate the process of adding them, by using the data dictionary to generate scripts.
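For example, a hedged sketch of that data-dictionary approach on SQL Server (the column names are just the ones from the question): query the catalog views to generate ALTER TABLE statements for every user table that is still missing the audit columns.

```sql
-- Generate (not run) ALTER TABLE scripts for user tables lacking audit columns.
-- Review the output before executing it.
SELECT 'ALTER TABLE ' + QUOTENAME(s.name) + '.' + QUOTENAME(t.name) +
       ' ADD CreatedBy sysname NOT NULL DEFAULT SUSER_SNAME(),' +
       ' CreatedDateTime datetime2 NOT NULL DEFAULT SYSUTCDATETIME(),' +
       ' UpdatedBy sysname NULL,' +
       ' UpdatedDateTime datetime2 NULL;' AS AlterScript
FROM sys.tables AS t
JOIN sys.schemas AS s
  ON s.schema_id = t.schema_id
WHERE NOT EXISTS (SELECT 1
                  FROM sys.columns AS c
                  WHERE c.object_id = t.object_id
                    AND c.name = N'CreatedDateTime');
```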
It is good practice for these columns to be populated independently of the application, by triggers or some similar mechanism. These columns are metadata, the application shouldn't really be aware of them.
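A minimal sketch of the trigger approach, assuming SQL Server and hypothetical table/column names: the UPDATE trigger stamps the audit columns itself, so the values are correct no matter what the application sends.

```sql
-- Hypothetical table with audit columns; defaults cover the INSERT case.
CREATE TABLE dbo.Customer
(
    CustomerId      int IDENTITY(1,1) PRIMARY KEY,
    Name            nvarchar(100) NOT NULL,
    CreatedBy       sysname   NOT NULL DEFAULT SUSER_SNAME(),
    CreatedDateTime datetime2 NOT NULL DEFAULT SYSUTCDATETIME(),
    UpdatedBy       sysname   NULL,
    UpdatedDateTime datetime2 NULL
);
GO

-- The UPDATE case is handled by a trigger, independently of the application.
CREATE TRIGGER dbo.trg_Customer_Audit
ON dbo.Customer
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE c
       SET UpdatedBy       = SUSER_SNAME(),
           UpdatedDateTime = SYSUTCDATETIME()
      FROM dbo.Customer AS c
      JOIN inserted      AS i
        ON i.CustomerId = c.CustomerId;
END;
```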
Relying on a full-blown audit trail to provide this functionality is usually not an option. Audit data which is collected for compliance purposes usually has restricted access, and indeed may be stored in a separate physical location.
Many applications are developed in an OOP language in which there is generally a class like BusinessObject that contains generally useful information, such as these auditing fields. Not every subclassed entity may need them, but they're there if it does. Since the overhead in the database is small, and there is always a chance the client will request another odd statistic based on the audit fields, it's better to have them around than not to have them at all. If something represents a static list of information, such as country names, I generally wouldn't put it in the database at all; enumerated data types were created for exactly such purposes.
I came across this thread by chance, as the same question popped up in my mind this morning. Every answer has a point, and I definitely agree with all of you: it is undeniable that business data and transaction data must be safeguarded. What the author is doubtful about, rather, is audit fields on configuration or static data.
This kind of configuration data is not updatable by users. Usually it could just as well live somewhere else, such as properties, config files, or even hard-coded constants. Putting configuration data in those places might be bad design or style, but from the perspective of auditing, does it matter? In addition, if this data is updatable at all, the only ones who can update it are either DBAs or hackers, and truly malicious DBAs or hackers already know the rules before they break them and will find ways to circumvent them.
To me, the question is more related to the environment in your company. Does your company have a culture of keeping track of every little bit of information? Does your company constantly enforce strict discipline, monitoring, or auditing? Having these auditing fields on non-user data is simply for their satisfaction, more than for any other purpose.

What DBMS is appropriate for keeping a schema private even when installed 'in the wild'

I have an application server which connects to a database server. I would like to be able to supply users with installers and, with a moderate degree of comfort, trust that the database schema is secure.
I understand that there are some risks that I will just have to accept with not controlling the computer on which it installed - a determined person with the right tools and knowledge could look directly at memory and pull out information.
Initially I thought my area of focus would simply be on adding the credentials to the installer without them being trivially viewed in a hex editor.
However, as I began to research, I learned that for PostgreSQL, even if I install the database silently and don't provide the credentials to the user, they can simply change a text-based configuration file (pg_hba.conf) and restart the server, gaining full access to the database without credentials.
Is this scenario handled more securely by other DBMSs? How do most commercial products protect their schemas in this situation? Would most products use embedded databases?
Edit: I assume (perhaps wrongly) that some products rely on databases that the user never actually touches directly, and of course I never see them because they are designed in such a way that the user doesn't need to, probably using an embedded database.
As far as I remember, there are no commercial products that "protect" their schemas. What do you want the schema to be protected against?
Consider the following points:
After all, the only person who can protect anything in a RDBMS is the database server administrator. And you want the schema to be protected against this person?
If I were a customer and my data were inside your schema, I would not only like, but expect, to be able to see and consume it directly.
Do you really need to protect your relational design? Is it really that interesting? Have you invented something worth hiding? I really don't think so. And I apologize in advance if you have.
EDIT: Additional comment:
I don't care about most database internals for the products I use. That's another reason I think most of them don't take any action to protect them. Most of them are not that interesting.
On the one hand, I strongly believe that users should not need to know or care about the internals of the database. But at the same time, as a developer, I don't think it is worth trying to protect them. Hiding them from the user, yes. Protecting them against direct access, in most cases, no. And not because I think it is wrong to protect your schema, but because I think it is a very hard thing to do and not worth your time as a developer.
In the end, as with any security-related topic, the only right answer comes down to the risks involved versus the cost of implementing the security measure.
Current database engines, embedded or server-style, are not designed to easily hide the schema of the database, and therefore, the development cost of doing it is much greater than the risk involved, for most people.
But your case might be different.
Most commercial products do not protect their schemas. They fall into one of two camps:
Either they make use of an enterprise-class database for a key component of the product (such as a payroll system), in which case no attempt is made to hide the schema or data. In most of these cases the customer needs control over the database anyway: to configure how the database is backed up, to be able to set up a clustered environment, etc.
The other case is when your "database" is nothing but a small settings or storage file for a desktop application (e.g. the history and bookmark databases in Firefox). In that case you should just use an embedded database (like SQLite, same as Firefox) and add a streaming encryption layer (there is an official version of this called SEE), or just use the embedded database and forget about the encryption layer, since the user would need to install their own database tools to read the file in the first place.
What problem are you trying to solve? Nothing can stop the DBA* from doing whatever he wants to standard databases, and as others have pointed out it's actively hostile to interfere with site-specific needs like backups and database upgrades. At most you can encrypt the contents of your database, but even then you have to provide a decryption key for your application to actually run and a motivated and hostile DBA can probably subvert it.
The military and intelligence communities undoubtedly have databases where even the schema is highly classified, but I don't know if they're protected by technical means or just by large men with guns.
(*) DBA or system administrator able to modify files like pg_hba.conf.
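As a footnote on "encrypt the contents": here is a hedged SQL Server sketch of Transparent Data Encryption (the database and certificate names are hypothetical). Note that this protects the data files at rest; anyone who can connect with sufficient rights, including a hostile DBA, still sees the schema and data, which is exactly the limitation described above.

```sql
-- Transparent Data Encryption: encrypts data and log files at rest.
USE master;
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<use a strong password>';
CREATE CERTIFICATE TdeCert WITH SUBJECT = 'TDE certificate for MyToolDb';
GO
USE MyToolDb;
CREATE DATABASE ENCRYPTION KEY
    WITH ALGORITHM = AES_256
    ENCRYPTION BY SERVER CERTIFICATE TdeCert;
GO
ALTER DATABASE MyToolDb SET ENCRYPTION ON;
```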
"How do most commercial products protect their schemas in this scenario?"
I don't believe most commercial products do anything to protect their schemas.
How can an embedded DBMS stop someone from tinkering with its storage (files, in this non-embedded-hardware context) when that person has physical access to the machine where the DBMS is running? Security through obscurity is a risky proposition.
This idea will suffer from the same problems as DRM. You can't prevent access by the determined, and you will only cause general pain and suffering for your customers. Just don't do it.
SQLite wraps its entire database format into a single file, and you could conceivably encrypt and decrypt it in-place. The flaw, of course, is that users need the key to use the database now, and the only way that can happen is if you give it to them, perhaps by hard-coding it in at compile-time (security by obscurity) or a phone-home scheme (whole host of reasons why this one's a bad idea). Plus now they'll hate you because you've thwarted any attempt at a useful backup system and they get terrible performance to boot.
Besides, nobody actually cares about schemas. Hate to break it to you, but schema design isn't a hard problem, and certainly never a legitimate competitive advantage (outside of maybe a few specific areas like knowledge representation and data warehousing). Schemas are generally not worth protecting in the first place.
If it's really that important to you, do a hosted application instead.

Defining the database schema in the application or in the database?

I know that the title might sound a little contradictory, but what I'm asking is with regards to ORM frameworks (SQLAlchemy in this case, but I suppose this would apply to any of them) that allow you to define your schema within your application.
Is it better to change the database schema directly and then manually update the column types in your program, or does it make more sense to define the tables in your application and let the ORM framework's table-generation functions create the schema and build the tables on the database side for you?
Bear in mind that applications and databases tend to live in an M:M relationship in all but the most trivial cases. If your application is at all likely to have interfaces to other systems, reports, data extracts or loads, or data migrated onto or off it from another system, then the database has more than one stakeholder.
Be nice to the other stakeholders in your application. Take the time and get the schema right and put some thought into data quality in the design of your application. Keep an eye on anyone else using the application and make sure you don't break bits of the schema that they depend on without telling them. This means that the database has a life of its own to a greater or lesser extent. The more integration, the more independent the database.
Of course, if nobody else uses or cares about the data, feel free to ignore my advice.
My personal belief is that you should design the database on its own merits. The database is the best place to model your domain data. The database is also the biggest source of slowdowns in applications, and letting your ORM design your database seems like a bad idea to me. :)
Of course, I've only got a couple of big projects behind me. I'm still learning daily. :)
The best way to define your database schema is to start with modeling your application domain (domain driven design anyone?) and seeing what tables take shape based on the domain objects you define.
I think this is the best way because the database is really just a place to persist information from the application; it should never lead the design. It's not the only place to persist information, either. We have users who want to work from flat files or the database, for instance. They could also use XML files. So starting with your domain objects and then generating tables (or a flat-file or XML schema, or whatever) from there will lead to a much better design in the end.
While this may depend on your using an object-oriented language, an ORM tool like Hibernate/NHibernate, SubSonic, etc. can really make this transition easy for you, up to and including generating the database creation scripts.
As for performance, it should be one of the last things you look at in an application and should never drive the design. After you get a good schema up and running based on your domain, you can always make tweaks to improve its performance.
A lot depends on your skill level with the specific database product you're going to use. Think of it as the difference between a "manual" and an "automatic" transmission. ORMs provide that "automatic" transmission: just start designing your classes and let the ORM worry about getting them stored in the database somehow.
Sounds good. The problem with most ORMs is that in their quest to be PI ("persistence ignorant"), they often don't take advantage of specific database features that can provide elegant solutions to a given task. Notice I didn't say all ORMs, just most.
My take is to design the conceptual data model first yourself. Then you can go in either direction, up towards the application space, or down towards the physical database. But remember, only YOU know if it's more advantageous to use a view instead of a table, should you normalize or de-normalize a table, what non-clustered index(es) make sense with this table, is a natural or surrogate key more appropriate for this table, etc... Of course, if you feel that these questions are beyond your grasp, then let the ORM help you out.
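As a small, hedged T-SQL illustration of the kind of physical-design decision in that list, which an ORM's generated DDL typically won't make for you (the table, column, and index names are hypothetical): a filtered, covering non-clustered index chosen for one specific hot query.

```sql
-- Covers "open orders per customer" lookups without touching the base table,
-- and only indexes the rows that query actually cares about.
CREATE NONCLUSTERED INDEX IX_Orders_Open_CustomerId
    ON dbo.Orders (CustomerId, OrderDate)
    INCLUDE (TotalAmount)
    WHERE Status = N'Open';
```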
One more thing: you really need to separate the application design from the database design. They are almost never the same. How important is the data? Could another application be designed to use it? It's a lot easier to refactor an application than it is to refactor a database with a billion rows of data spread across thousands of tables.
Well, if you can get away with it, doing it in the application is probably the best way, since it's a perfect example of the DRY principle.
Having said that, however, getting away with it is always going to be hard to pull off, since you're practically choosing to give up most database-specific optimizations (more so with querying, but it still applies to schemas: indexes, etc.).
You'll probably end up changing the schema by hand anyway, and then you'll be stuck with a brittle database schema that's going to be the source of your worst nightmares :)
My 2 Cents
Design each based on its own requirements as much as possible. Trying to keep them too rigidly in sync is a good illustration of increased coupling/decreased cohesion.
Come to think of it, ORMs can easily be used to spread coupling (even though it can be avoided to some degree).
