Why would you want a case sensitive database? - database

What are some reasons for choosing a case sensitive collation over a case insensitive one? I can see perhaps a modest performance gain for the DB engine in doing string comparisons. Is that it? If your data is all lowercase or all uppercase then case sensitive could be reasonable, but it's a disaster if you store mixed case data and then try to query it. You then have to apply, say, a lower() function to the column so that it matches the corresponding lowercase string literal. This prevents index usage in every DBMS that I've used. So I'm wondering why anyone would use such an option.

There are many examples of data with keys that are naturally case sensitive:
Files in a case sensitive filesystem like Unix.
Base-64 encoded names (which I believe is what YouTube is using, as in Artelius's answer).
Symbols in most programming languages.
Storing case sensitive data in a case-insensitive system runs the risk of data inconsistency or even the loss of important information. Storing case insensitive data in a case sensitive system is, at worst, slightly inefficient. As you point out, if you only know the case-insensitive name of an object you're looking for, you need to adjust your query:
SELECT * FROM t WHERE LOWER(name) = 'something';
I note that in PostgreSQL (and presumably in other systems), it is a simple matter to create an index on the expression LOWER(name) which will be used in such queries.
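To make the expression index idea concrete, here is a minimal sketch in Python. It uses the standard library's sqlite3 module as a stand-in for PostgreSQL (SQLite also supports indexes on expressions), and the index name idx_t_lower_name is made up for the example.

```python
import sqlite3

# SQLite stands in for PostgreSQL here; the index name is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("Something",), ("other",)])

# Index the expression itself so LOWER(name) lookups need not scan the table.
conn.execute("CREATE INDEX idx_t_lower_name ON t (LOWER(name))")

query = "SELECT name FROM t WHERE LOWER(name) = 'something'"
rows = conn.execute(query).fetchall()

# EXPLAIN QUERY PLAN reports which access path the engine picked.
plan = " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query))
print(rows)
print(plan)
```

The printed plan should mention the index rather than a full table scan, which is the point being made about LOWER(name) not having to defeat index usage.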

Depends on the data you want to store. Most UNIX filesystems are databases with case-sensitive keys. YouTube videos seem to be organised with case-sensitive keys.
Most of the time you want case-insensitive searches, but clearly there are certain exceptions.

Use a case insensitive index for your field. In most cases you don't want to manipulate the data in order to find it.
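As a rough illustration of that suggestion, the sketch below declares a column with SQLite's built-in NOCASE collation (via Python's sqlite3 module) so the query does not have to touch the data. The table and index names are invented for the example, and other engines spell collations differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Declaring the column NOCASE makes comparisons (and indexes on it) ignore ASCII case.
conn.execute("CREATE TABLE person (name TEXT COLLATE NOCASE)")
conn.execute("CREATE INDEX idx_person_name ON person (name)")
conn.execute("INSERT INTO person VALUES ('Dave Doe')")

# The stored value keeps its case; only the comparison is case-insensitive.
found = conn.execute("SELECT name FROM person WHERE name = 'dave doe'").fetchall()
print(found)
```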

One reason is for Content Management. Typically you will need to identify changes in content so that those changes can be reviewed, recorded and published. Case matters for human readable content. "Dave Doe" is correct. "dave doe" is plain wrong.
Case-sensitivity also matters for software developers. If you don't know the desired case-sensitivity for all your customers' systems then you may want to test case-sensitivity as part of testing anyway.

I have worked on an application that involves a database with purely natural keys (i.e. 'codes') that should be case sensitive but are not necessarily so.
A lot of data would come out of the database in stored procs (with the database doing the joins), where case sensitivity is not an issue. However, some data needed to come from the database in separate queries and then be 'stitched together' in loops, primarily due to a complex data type that SQL couldn't easily work with, and this is where the problem arose. When I'm iterating two result sets and trying to join on the 'code', the values Productcode and ProductCode don't naturally match.
Rather than fixing the data, I had to change my code (C#) to do case insensitive string matching. Not throughout the entire solution, mind, just when looking through these 'codes' for matching.
If I had a case sensitive database I would've had tidier code.
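The kind of workaround described here might look roughly like the following. This is a sketch in Python rather than the original C#, and the row shapes (dicts with code, name and price keys) are assumptions made for illustration only.

```python
def stitch_by_code(products, prices):
    """Join two result sets on 'code', ignoring case differences."""
    # Index one side by a case-folded key so lookups ignore case.
    price_by_code = {row["code"].casefold(): row for row in prices}
    stitched = []
    for product in products:
        match = price_by_code.get(product["code"].casefold())
        if match is not None:
            stitched.append({**product, "price": match["price"]})
    return stitched

# Hypothetical rows whose codes differ only in case.
products = [{"code": "ProductCode", "name": "Widget"}]
prices = [{"code": "Productcode", "price": 9.99}]
result = stitch_by_code(products, prices)
print(result)
```

With consistently cased codes the casefold() calls would be unnecessary, which is the tidier code the answer is wishing for.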
Now, rather than 'why case sensitive', I'd really like to know why you'd want a case insensitive database. Is it due to laziness? I don't see any good reason that databases are case insensitive.

Related

How to ignore case in an NDB/DB query

This seems like a simple question, but I don't see anything in the class definition.
If I have the query
Video.query(Video.tags.IN(topics))
topics are coming in as lowercase unicode strings, but Video.tags are mostly capitalized. I can loop through topics and capitalize them before querying with them, but is there a way to ignore case altogether?
It's not possible to ignore case in a query.
Typically if you know you want to do a case insensitive search, you may store a "denormalized" duplicate of the data in lower case. Whenever you want to query, you would lowercase the text before querying.
To reduce write costs, you probably only want to index the lowercased version, and you probably wouldn't need to index the actual case-sensitive data.
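A minimal sketch of the denormalized lowercase copy follows. It is not NDB code: Python's sqlite3 module stands in for the datastore, and the table, column and function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# tag keeps the display form; tag_lower is the denormalized copy used for lookups.
conn.execute("CREATE TABLE video_tag (video_id INTEGER, tag TEXT, tag_lower TEXT)")
# Only the lowercased copy is indexed, as suggested above.
conn.execute("CREATE INDEX idx_video_tag_lower ON video_tag (tag_lower)")

def add_tag(video_id, tag):
    conn.execute("INSERT INTO video_tag VALUES (?, ?, ?)", (video_id, tag, tag.lower()))

def videos_for_topics(topics):
    lowered = [t.lower() for t in topics]  # lowercase the search terms too
    marks = ", ".join("?" for _ in lowered)
    sql = (
        "SELECT DISTINCT video_id FROM video_tag "
        f"WHERE tag_lower IN ({marks}) ORDER BY video_id"
    )
    return [row[0] for row in conn.execute(sql, lowered)]

add_tag(1, "Python")
add_tag(2, "Databases")
add_tag(3, "Cooking")
print(videos_for_topics(["python", "databases"]))
```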

Databases - Why case insensitive?

I saw one or two threads talking globally about case sensitivity, but my question is more specific.
I understand the interest of case insensitive searches on text values for example.
But why would we use case-insensitive database names, tables and columns?
Isn't it going to lead to mistakes? Script languages that use databases are all case-sensitive, so for example if we didn't use the right case for a field it will not be found...
The SQL:2008 and SQL-99 standards define databases to be case-insensitive for identifiers unless they are quoted. I've found most ORMs will quote identifiers in the SQL they generate.
However, as you probably know not all relational databases strictly adhere to the standards. DB2 and Oracle are 100% compliant. PostgreSQL is mostly compliant, except for the fact that it automatically lowercases anything that isn't quoted (which personally I prefer.)
MySQL gets a bit weird, since it stores each table as a file on the file system. For this reason, it's subject to the case sensitivity of the file system. On Windows:
CREATE TABLE FOO (a INTEGER);
CREATE TABLE `Foo` (a INTEGER); -- Errors out, already exists
Whereas on Linux:
CREATE TABLE FOO (a INTEGER); -- Creates FOO table
CREATE TABLE `Foo` (a INTEGER); -- Creates Foo table
SQL Server is even stranger. It will preserve the case on creation, however let you refer to it in any way after (even if you quote the name!) You can't create two tables whose only difference is their casing. Note: SQL Server does have configuration options that control this stuff though, as case-sensitivity of identifiers will depend on the default collation of the database instance. How confusing!
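The "preserves case but matches any case" behaviour is easy to poke at in a small script. The sketch below uses SQLite through Python's sqlite3 module, purely because it is convenient to run; SQLite happens to treat table names in a similar way, but this says nothing about how any particular SQL Server collation behaves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FOO (a INTEGER)")

# The catalog keeps the name exactly as it was written at creation time.
stored_name = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchone()[0]

# Any casing resolves to the same table, even when the identifier is quoted.
conn.execute('INSERT INTO "foo" (a) VALUES (1)')
count = conn.execute("SELECT COUNT(*) FROM Foo").fetchone()[0]

# A second table differing only in case is rejected.
try:
    conn.execute('CREATE TABLE "Foo" (a INTEGER)')
    duplicate_allowed = True
except sqlite3.OperationalError:
    duplicate_allowed = False

print(stored_name, count, duplicate_allowed)
```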
While for the most part I agree with you that computers (programming languages, databases, file systems, URLs, passwords, etc) should be case-sensitive, all systems are implemented independently and may or may not adhere to standards that may or may not exist. Implementing a case-sensitive database is definitely possible, if you know the ins and outs of your particular database system and how it behaves.
It's really your responsibility to implement your system in a way that works for you, and not the entire technology industry to implement everything in a consistent way to make your life easier.
The main advantage of using case sensitivity is that when we deploy it on the client site, our DB works regardless of whether the client's SQL Server is set up case sensitive or not, so yes it really isn't a good idea and I don't know why anyone would use case-insensitive database tables/columns.
If you were to redo the entire IT industry today, with today's knowledge and technology, you might default to making everything case sensitive, with the only exception being things specifically requested to be case insensitive.
But back in the days before I was born, and even when I started working (ok, playing) with computers, many computers couldn't even differentiate between upper and lower case letters. I built a mighty complicated card to plug into my fake Apple II to make it understand the difference.
So I guess in those days having a difference between upper and lower case was something like having a retina display nowadays. It's cool if you have it. And in 10 years we might ask why anybody ever created an application without such displays in mind, but today it just isn't that relevant.
Same is true for databases (and file systems) since many of them and their respective standards go back to the 70s at least.

Table Naming: Underscore vs Camelcase? namespaces? Singular vs Plural?

I've been reading a couple of questions/answers on StackOverflow trying to find the 'best', or should I say most accepted, way to name tables in a database.
Most of the developers tend to name the tables depending on the language that requires the database (JAVA, .NET, PHP, etc). However I just feel this isn't right.
The way I've been naming tables till now is doing something like:
doctorsMain
doctorsProfiles
doctorsPatients
patientsMain
patientsProfiles
patientsAntecedents
The things I'm concerned about are:
Legibility
Quick identifying of the module the table is from (doctors||patients)
Easy to understand, to prevent confusions.
I would like to read any opinions regarding naming conventions.
Thank you.
Being consistent is far more important than what particular scheme you use.
I typically use PascalCase and the entities are singular:
DoctorMain
DoctorProfile
DoctorPatient
It mimics the naming conventions for classes in my application keeping everything pretty neat, clean, consistent, and easy to understand for everybody.
Since the question is not specific to a particular platform or DB engine, I must say for maximum portability, you should always use lowercase table names.
/[a-z_][a-z0-9_]*/ is really the only pattern of names that seamlessly translates between different platforms. Lowercase alpha-numeric+underscore will always work consistently.
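If you want to enforce that pattern, for example in a schema lint step, a check along these lines would do. This is only a sketch in Python; the function name is made up, and the regex is the one from the answer, anchored to the whole name.

```python
import re

# The pattern from the answer above, applied to the whole name.
PORTABLE_NAME = re.compile(r"[a-z_][a-z0-9_]*")

def is_portable_name(name):
    """True if name is lowercase alphanumeric plus underscore, not starting with a digit."""
    return PORTABLE_NAME.fullmatch(name) is not None

for candidate in ["doctor_profile", "DoctorProfile", "9lives", "patients_main2"]:
    print(candidate, is_portable_name(candidate))
```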
As mentioned elsewhere, relation (table) names should be singular: http://www.teamten.com/lawrence/programming/use-singular-nouns-for-database-table-names.html
The case-insensitive nature of SQL favors an Underscores_Scheme. Modern software supports any kind of naming scheme, but nasty bugs, errors or the human factor can still lead to UPPERCASINGEVERYTHING, and then those who chose a scheme with underscores (Pascal_Case or Underscore_Case) keep their nerves intact.
An aggregation of most of the above:
don't rely on case in the database
don't consider the case or separator part of the name - just the words
do use whatever separator or case is the standard for your language
Then you can easily translate (even automatically) names between environments.
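To show what "just the words" could mean in practice, here is a small sketch in Python that splits a name into words and re-renders it in another convention. The helper names are invented, and the splitting rule is a simple heuristic rather than anything standardized.

```python
import re

# One token per word: an acronym run, or an optionally capitalized lowercase/digit run.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z0-9]+")

def split_words(name):
    """Reduce a name to its lowercase words, ignoring case and separators."""
    return [word.lower() for word in _WORD.findall(name)]

def to_snake(name):
    return "_".join(split_words(name))

def to_pascal(name):
    return "".join(word.capitalize() for word in split_words(name))

def to_camel(name):
    pascal = to_pascal(name)
    return pascal[:1].lower() + pascal[1:]

for name in ["doctorsProfiles", "DoctorPatient", "patient_antecedent"]:
    print(name, to_snake(name), to_pascal(name), to_camel(name))
```

Because every convention is reduced to the same list of words first, a round trip between the application's convention and the database's convention loses nothing.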
But I'd add another consideration: you may find that there are other factors when you move from a class in your app to a table in your database: the database object has views, triggers, stored procs, indexes, constraints, etc - that also need names. So for example, you may find yourself only accessing tables via views that are typically just a simple "select * from foo". These may be identified as the table name with just a suffix of '_v' or you could put them in a different schema. The purpose for such a simple abstraction layer is that it can be expanded when necessary to allow changes in one environment to avoid impacting the other. This wouldn't break the above naming suggestions - just a few more things to account for.
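The pass-through view with a '_v' suffix is about as simple as it sounds. A minimal sketch, using SQLite through Python's sqlite3 module and made-up table contents:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, label TEXT)")
# The application reads through foo_v, so foo can change shape later without
# the application noticing, as long as the view is kept compatible.
conn.execute("CREATE VIEW foo_v AS SELECT * FROM foo")
conn.execute("INSERT INTO foo (label) VALUES ('first')")

rows = conn.execute("SELECT id, label FROM foo_v").fetchall()
print(rows)
```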
I use underscores. I did an Oracle project some years ago, and it seemed that Oracle forced all my object names to upper case, which kind of blows any casing scheme. I am not really an Oracle guy, so maybe there was a way around this that I wasn't aware of, but it made me use underscores and I have never gone back.
I tend to agree with the people who say it depends on the conventions of language you're using (e.g. PascalCase for C# and snake_case for Ruby).
Never camelCase, though.
After reading a lot of other opinions I think it's very important to use the naming conventions of the language; consistency is more important than naming conventions only if you are (and will be) the only developer of the application. If you want readability (which is of huge importance) you had better use the naming conventions for each language. In MySQL, for example, I don't suggest using CamelCase since not all platforms are case sensitive. So here underscores work better.
These are my five cents. I came to conclusion that if DBs from different vendors are used for one project there are two best ways:
Use underscores.
Use camel case with quotes.
The reason is that some databases will convert all characters to uppercase and some to lowercase. So, if you have myTable it will become MYTABLE or mytable when you work with the DB.
Naming conventions exist within the scope of a language, and different languages have different naming conventions.
SQL is case-insensitive by default; so, snake_case is a widely used convention. SQL also supports delimited identifiers; so, mixed case is an option, like camelCase (Java, where fields == columns) or PascalCase (C#, where tables == classes and columns == fields). If your DB engine can't support the SQL standard, that's its problem. You can decide to live with that or choose another engine. (And why C# just had to be different is a point of aggravation for those of us who code in both.)
If you intend to ever only use one language in your services and applications, use the conventions of that language at all layers. Else, use the most widely used conventions of the language in the domain where that language is used.
C# approach
Singular/Plural
singular if your record in row contains just 1 value.
If it is array then go for plural. It would make perfect sense also when you foreach such element. E.g. your array column contains MostVisitedLocations: London, NewYork, Bratislava
then:
foreach(var mostVisitedLocation in MostVisitedLocations){
//go through each array element
}
Casing
PascalCase for table names and camelCase for columns made the most sense to me. But in my case, in .NET 5, when I had JSON objects saved in the database with property names in camelCase, System.Text.Json wasn't able to deserialise them into objects. That is because the model has to be public, and public properties are PascalCase. So mapping table columns (camelCase) and JSON property names (camelCase) to these properties can result in an error (because the mapping is case sensitive).
Btw, with Newtonsoft.Json this problem is not present.
So I ended up with:
Tables: App.Admin, App.Pricing, UserData.Account
Columns: Id, Price, IsOnline.
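For what it's worth, the underlying issue (camelCase keys that need to land on PascalCase properties) can be sidestepped by matching names case-insensitively; System.Text.Json has a PropertyNameCaseInsensitive option on JsonSerializerOptions for that. The sketch below shows the same idea in Python rather than C#, with a hypothetical Account model borrowed from the column names above.

```python
import json
from dataclasses import dataclass, fields

@dataclass
class Account:  # hypothetical model with PascalCase attribute names
    Id: int
    Price: float
    IsOnline: bool

def from_json_ignoring_case(cls, text):
    """Build cls from a JSON object, matching keys to fields case-insensitively."""
    data = {key.lower(): value for key, value in json.loads(text).items()}
    return cls(**{f.name: data[f.name.lower()] for f in fields(cls)})

account = from_json_ignoring_case(Account, '{"id": 7, "price": 9.5, "isOnline": true}')
print(account)
```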
2 suggestions based on use cases:
Singular table names.
Although I used to believe in pluralizing table names once, I found in practise that there is little to no benefit to it other than the human mind to think in terms of tables as collections.
When singularising the table names, you can silently add -table to the singular table name in your head, and then it all makes sense again.
SELECT username FROM UserTable
Sounds more natural than
SELECT username FROM UsersTable
But postfixing every table name with Table is just a waste.
The actual practical argumentation for singularising table names:
What is the plural of person: persons or people?
This is still ok.
But how do you like a table with postfix -status? Statuses?
That sucks, sorry.
It is easy to inadvertently make a human mistake by singularizing the status table, but pluralizing the other tables.
PascalCasing + Underscore convention.
Given table User, Role and a many-to-many table User_Role.
Considering underscore cased user_role is dubious when all table names are using underscore per default.
Is user_role a table that contains user roles? In this case it is not, it is a join table.
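Here is roughly what that convention looks like as a schema, sketched with SQLite through Python's sqlite3 module. The column names (Id, Name, UserId, RoleId) and sample rows are assumptions for the sake of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE User (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Role (Id INTEGER PRIMARY KEY, Name TEXT);
    -- The underscore marks User_Role as the join table between User and Role.
    CREATE TABLE User_Role (
        UserId INTEGER REFERENCES User (Id),
        RoleId INTEGER REFERENCES Role (Id),
        PRIMARY KEY (UserId, RoleId)
    );
    INSERT INTO User VALUES (1, 'dave');
    INSERT INTO Role VALUES (1, 'admin'), (2, 'editor');
    INSERT INTO User_Role VALUES (1, 1), (1, 2);
""")

roles = conn.execute("""
    SELECT Role.Name
    FROM User
    JOIN User_Role ON User_Role.UserId = User.Id
    JOIN Role ON Role.Id = User_Role.RoleId
    WHERE User.Name = 'dave'
    ORDER BY Role.Name
""").fetchall()
print(roles)
```

Since PascalCase names never contain an underscore on their own, any table with one in its name can be read as a join table at a glance.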
When deciding on table name conventions I think it is useful to let go of personal preference and take into account the practical considerations of real life problems in order to minimize dubious situations.
As the many answers and opinions have indicated, whatever your personal opinion is, different people think differently, and you will not be the only person working on the database despite being the one who sets it up (unless you are, in which case you're only helping yourself).
Therefore it is useful to have practical argumentation (practical in the sense of, does it help my future co-workers to avoid dubious situations) when your past decision is being questioned.
Unfortunately there is no "best" answer to this question. As @David stated, consistency is far more important than the naming convention.
There's wide variability on how to separate words, so there you'll have to pick whatever you like better; but at the same time, it seems there's near consensus that the table name should be singular.

Should I write table and column names ALWAYS lower case?

I wonder if it's a problem, if a table or column name contains upper case letters. Something lets me believe databases have less trouble when everything is kept lower case. Is that true? Which databases don't like any upper case symbol in table and column names?
I need to know, because my framework auto-generates the relational model from an ER-model.
(this question is not about whether it's good or bad style, but only about if it's a technical problem for any database)
As far as I know there is no problem using either uppercase or lowercase. One reason for the lowercase convention is that queries are more readable with lowercase table and column names and uppercase SQL keywords:
SELECT column_a, column_b FROM table_name WHERE column_a = 'test'
It is not a technical problem for the database to have uppercase letters in your table or column names, for any DB engine that I'm aware of. Keep in mind many DB implementations use case sensitive names, so always refer to tables and columns using the same case with which they were created (I am speaking very generally since you didn't specify a particular implementation).
For MySQL, here is some interesting information about how it handles identifier case. There are some options you can set to determine how they are stored internally. http://dev.mysql.com/doc/refman/5.0/en/identifier-case-sensitivity.html
The SQL-92 standard specifies that identifiers and keywords are case-insensitive (per A Guide to the SQL Standard 4th edition, Date / Darwen)
That's not to say that a particular DBMS isn't either (1) broken, or (2) configurable (and broken)
From a programming style perspective, I suggest using different cases for keywords and identifiers. Personally, I like uppercase identifiers and lowercase keywords, because it highlights the data that you're manipulating.
SQL standard requires names stored in uppercase
The SQL standard requires identifiers be stored in all-uppercase. See section 5.2.13 of the SQL-92 as quoted from a draft copy in this Answer on another Question. The standard allows you to use undelimited identifiers in lowercase or mixed case, as the SQL processor is required to convert them as needed to the uppercase version.
This requirement presumably dates back to the early days of SQL when mainframe systems were limited to uppercase English characters only.
Non-issue
Many databases ignore this requirement by the standard.
For example, Postgres does just the opposite, converting all unquoted (“undelimited”) identifiers to lowercase — this despite Postgres otherwise hewing closer to the standard than any other system I know of.
Some databases may store the identifier in the case you specified.
Generally this is a non-issue. Virtually all databases do a case-insensitive lookup from the case used by an identifier to the case stored by the database.
There are occasional oddball cases where you may need to specify an identifier in its stored case or you may need to specify all-uppercase. This may happen with certain utilities where you must pass an identifier as a string outside the usual SQL processor context. Rare, but tuck this away in the back of your head in case you encounter some mysterious "cannot find table" kind of error message someday when using some unusual tool/utility. Has happened to me once.
Snake case
Common practice nowadays seems to be to use all lowercase with underscore separating words. This style is known as Snake case.
The use of underscore rather than Camel case helps if your identifiers are ever presented as all uppercase (or all lowercase) and thereby lose readability without the word separation.
Bonus Tip: The SQL standard (SQL-92 section 5.2.11) explicitly promises to never use a trailing underscore in a keyword. So append a trailing underscore to all your identifiers to eliminate all worry of accidentally colliding.
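A quick way to see the trailing-underscore tip in action is sketched below, using SQLite through Python's sqlite3 module. The table and column names are hypothetical, and which words are reserved differs from engine to engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# ORDER is a reserved word, so it cannot be used as a bare column name.
try:
    conn.execute("CREATE TABLE purchase (order INTEGER)")
    keyword_accepted = True
except sqlite3.OperationalError:
    keyword_accepted = False

# With a trailing underscore the same name no longer collides.
conn.execute("CREATE TABLE purchase_ (order_ INTEGER)")
conn.execute("INSERT INTO purchase_ (order_) VALUES (42)")
value = conn.execute("SELECT order_ FROM purchase_").fetchone()[0]
print(keyword_accepted, value)
```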
As far as I know for a common L.A.M.P. setup it won't really matter - but be aware that MySQL hosted on Linux is case sensitive!
To keep my code tidy I usually stick to lower case names for tables and columns, uppercase MySQL code and mixed upper/lower case variables, like this:
SELECT * FROM my_table WHERE id = '$myNewID'
I use Pascal case for field names and lower case for table names (usually), as follows:
students
--------
ID
FirstName
LastName
Email
HomeAddress
courses
-------
ID
Name
Code
[etc]
Why is this cool? because it's readable, and because I can parse it as:
echo preg_replace('/([a-z])([A-Z])/','$1 $2',$field); //insert a space
NOW, here's the fun part for tables:
StudentsCourses
--------------
Students_ID
Courses_ID
AcademicYear
Semester
Notice I capitalized S and C? That way they point back to the primary table(s). You could even write a routine to logically parse the db structure this way and build queries automatically. So I use caps in tables when they are JOIN tables, as in this case.
Similarly, think of the _ as a -> in this table as: Students->ID and Courses->ID
Not student_id - instead Students_ID - the cognate of the field matches the exact name of the table.
Using these simple conventions produces a readable protocol which handles about 70% of your typical relational structure.
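The "routine to build queries automatically" could be sketched like this, in Python rather than PHP. The function name is made up, and it assumes every reference column follows the Table_Column pattern shown above.

```python
def build_join_query(join_table, columns):
    """Derive JOIN clauses from columns named <Table>_<Column>."""
    joins = []
    for column in columns:
        table, sep, target = column.partition("_")
        if not sep:
            continue  # plain column such as AcademicYear, not a reference
        joins.append(f"JOIN {table} ON {join_table}.{column} = {table}.{target}")
    return f"SELECT * FROM {join_table} " + " ".join(joins)

sql = build_join_query(
    "StudentsCourses",
    ["Students_ID", "Courses_ID", "AcademicYear", "Semester"],
)
print(sql)
```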
If you're using postgresql and PHP, for instance, you'd have to write your query like this:
$sql = "SELECT somecolumn FROM \"MyMixedCaseTable\" where somerow= '$somevar'";
"Quoting an identifier also makes it case-sensitive, whereas unquoted names are always folded to lower case. For example, the identifiers FOO, foo, and "foo" are considered the same by PostgreSQL, but "Foo" and "FOO" are different from these three and each other. (The folding of unquoted names to lower case in PostgreSQL is incompatible with the SQL standard, which says that unquoted names should be folded to upper case. Thus, foo should be equivalent to "FOO" not "foo" according to the standard. If you want to write portable applications you are advised to always quote a particular name or never quote it.)"
http://www.postgresql.org/docs/8.4/static/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS
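The folding rules in that quote can be written down as a toy function. The sketch below is in Python, models only the rules stated in the quote (it ignores details such as doubled quotes inside a delimited identifier), and the function name and dialect labels are invented.

```python
def fold_identifier(token, dialect="postgresql"):
    """Return the name an identifier token resolves to under simple folding rules."""
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        return token[1:-1]  # quoted: kept exactly as written
    # Unquoted: PostgreSQL folds to lower case, the SQL standard to upper case.
    return token.lower() if dialect == "postgresql" else token.upper()

for token in ["FOO", "foo", '"foo"', '"Foo"', '"FOO"']:
    print(token, fold_identifier(token), fold_identifier(token, "standard"))
```

Under the PostgreSQL rule FOO, foo and "foo" all resolve to the same name, while under the standard rule unquoted foo lines up with "FOO" instead, which is why mixing quoted and unquoted spellings of one name is not portable.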
So, sometimes, it depends on what you are doing...
Whatever you use, keep in mind that MySQL on Linux is case sensitive, while on Windows it is case insensitive.
The column names which are mixed case or uppercase have to be double quoted in PostgreSQL. If you don't want to worry about it in the future, name it in the lower case.
MySQL - the columns are absolutely case insensitive. And it can lead to problems. Say someone has written "mynAme" instead of "myName". The system would work fine, but once some developer would go searching for it through the source code, they might overlook it, and you all get in trouble.
Every modern database can handle upper or lower case text.
I think this is worth emphasizing: if a binary or case-sensitive collation is in effect, then (at least in SQL Server and other databases with rich collation features) identifiers and variable names WILL be case sensitive. You can even create tables whose names differ only in case. (I am not sure the info above about the SQL-92 standard is correct; if it is, this part of the standard is not widely followed.)

Are there benefits to a case sensitive database?

We have just 'migrated' an SQL Server 2005 database from DEVEL into TEST. Somehow during the migration process the DB was changed from case insensitive to sensitive - so most SQL queries broke spectacularly.
What I would like to know, is - are there any clear benefits to having a case sensitive schema?
NOTE: By this I mean table names, column names, stored proc names etc. I am NOT referring to the actual data being stored in the tables.
At first inspection, I cannot find a valid reason that offers benefits over case insensitivity.
I just found out why WE make it case sensitive. It is to ensure that when we deploy it on the client site, our DB works regardless whether the client's SQL Server is set up case sensitive or not.
That is one answer I wasn't expecting.
I really can't think of any good reason SQL identifiers should be case sensitive. I can think of one bad one: it's the one MySQL gives for why their table names are case sensitive. Each table is a file on disk, your filesystem is case-sensitive and the MySQL devs forgot to table_file = lc(table_name). This is heaps of fun when you move a MySQL schema to a case-insensitive filesystem.
I can think of one big reason why they shouldn't be case sensitive.
Some schema author is going to be clever and decide that this_table obviously means something different from This_Table and make those two tables (or columns). You might as well write "insert bugs here" at that point in the schema.
Also, case-insensitivity lets you be more expressive in your SQL to emphasize tables and columns vs commands without being held to what the schema author decided to do.
SELECT this, that FROM Table;
Not all sections of Unicode have a bijective mapping between upper and lower-case characters — or even two sets of cases.
In those regions, "case-insensitivity" is a little meaningless, and probably misleading.
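A few concrete characters make the point. The sketch below uses Python's built-in string methods, which follow the Unicode case mappings; the sample characters are just examples I picked.

```python
# Upper-casing can change length: German sharp s becomes two letters.
sharp_s = "ß"
round_trip = sharp_s.upper().lower()

# Lower-casing can change length too: dotted capital I gains a combining dot.
dotted_i = "İ"
lowered_length = len(dotted_i.lower())

# Scripts without case are untouched by either conversion.
han = "中"
caseless = han.upper() == han.lower() == han

print(sharp_s.upper(), round_trip, lowered_length, caseless)
```

The sharp s does not survive a round trip through upper case, so "compare after lower()" and "compare after upper()" are not guaranteed to agree outside ASCII.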
That's about all I can think of for now; in the ASCII set, unless you want Foo and foo to be different, I don't see the point.
Most languages out there are case-sensitive, as are most comparison algorithms, most file systems, etc. Case insensitivity is for lazy users. Although it does tend to make things easier to type, and does lead to many variants of the same names differing only by case.
Personally, between (MyTable, mytable, myTable, MYTABLE, MYTable, myTABLE, MyTaBlE), I would please like to see one universal version.
Case insensitivity is a godsend when you have developers that fail to follow any sort of conventions when writing SQL or come from development languages where case insensitivity is the norm such as VB.
Generally speaking I find it easier to deal with databases where there is no possibility that ID, id, and Id are distinct fields.
Other than a personal preference for torture, I would strongly recommend you stay with case insensitivity.
The only database I ever worked on that was set up for case sensitivity was Great Plains. I found having to remember every single casing of their schema namings was painful. I have not had the privilege of working with more recent versions.
Unless it has changed and if my memory serves, the nature of case sensitivity you are speaking of is determined at installation time and is applied to all databases. It was the case with the SQL Server installation that ran the Great Plains database I mentioned that all databases on that installation were case sensitive.
I like case-sensitivity, mostly because that's what I'm used to from programming in Perl (and most any other language too). I like using StudlyCaps for table names and all lower case with underscores for columns.
Of course, many databases allow you to quote names to enforce casing, like Postgres does. That seems like a reasonable approach as well.
I do support for Sybase Advantage Database Server and it uses a flat file format allowing DBF's as well as our own proprietary ADT format. The case where I see case sensitivity being an issue is when using our Linux version of the server. Linux is a case sensitive OS so we have an option in our db to lowercase all calls. This requires that the table files be lower case.
I'm pretty sure the SQL spec requires case folding (which is effectively the same as insensitivity) for identifiers. PostgreSQL folds to lower, Oracle folds to upper.

Resources