I am designing a database and recently named a column in a table DayOfWeek, completely forgetting that DayOfWeek is a built-in function in SQL Server. Now I am deciding if I should just leave it as is and reference the column with square brackets [DayOfWeek] or change the column name to avoid any conflicts in the future. I am not too far into the project so changing it is not too hard. The debate in my head is that the column name of DayOfWeek just makes so much sense for its purpose, so I really want to use it... but it is a reserved word... and could cause pain in the future (especially if I always have to put square brackets around it when referencing the column).
What does everyone think?
I would change it. I've got a legacy table called user, and it is a pain with the square brackets all the time. Perhaps call it DayOfWeekName or DayOfWeekId.
Josh
Jeff,
If you're not too far down the track to rename the column (relatively) painlessly then I'd recommend you change it. You've identified one probable future headache for the maintenance crew, and I guess it would actually be less costly (over time) to clean it up now, especially considering that renaming something isn't the hell-on-wheels it used to be since the advent of truly effective search and replace functionality in text editors and IDEs.
The truly hard part of renaming it is gaining the understanding required to do the job safely. You are unique (being the author) in having that understanding. If you asked me (just for instance) to do the job, then it probably wouldn't be a cost effective business proposition.
So... +1 for fixing the sucker yourself... and +2 for not doing it again ;-)
Cheers. Keith.
I always make sure I never use a keyword as the start of any variable/object/function simply because you're never sure if your target language will like it. It can often generate wacky errors that take a while to track down. Even if syntax checking does pick it up, it means you've wasted more time than if it had been a different name like FurryKitten.
I would avoid DayOfWeek and opt for something completely different, maybe Weekday or DayName. It just saves hassle.
Plus - square brackets just create headaches, and there are a lot of SQL developers out there who don't use the brackets - new developers will end up creating "non-bracketed" code out of habit for some time after they join the team. Uncommon conventions should be avoided if possible.
Change it if it is easy. But as a good practice I'd use square brackets everywhere. Once you get used to it, it is no more of a pain than putting spaces between words.
We have a table here called User. If at all possible, I would change it. Although Josh is correct that you can place square brackets around the name to mark it as a table, that task gets old very quickly. Using reserved words as table names also makes things difficult for other developers. If someone doesn't know that it is a reserved word, it can be difficult to determine why a query doesn't work.
If you are using a DB adapter or abstraction library to get data from the database, then don't worry: the class will escape column names for you. If you are writing SQL queries yourself, it can cause some trouble, because you need to escape the column names in your queries.
P.S. I remember being in the same situation a few times. But I had named my column "order" :-)
I would also avoid using reserved words; you can easily get into trouble if you're not paying attention.
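For hand-written queries, the escaping described above can be kept in one small helper so nobody has to remember the brackets. This is only a minimal Python sketch, assuming T-SQL style bracket quoting where a closing bracket inside a name is doubled; the function names `quote_identifier` and `build_select` are hypothetical and not part of any library.

```python
def quote_identifier(name: str) -> str:
    """Wrap a table or column name in square brackets (T-SQL style).

    A closing bracket inside the name is doubled so it cannot end the
    quoted identifier early.
    """
    return "[" + name.replace("]", "]]") + "]"


def build_select(table: str, columns: list[str]) -> str:
    """Build a simple SELECT statement with every identifier bracket-quoted."""
    column_list = ", ".join(quote_identifier(column) for column in columns)
    return f"SELECT {column_list} FROM {quote_identifier(table)};"


# Hypothetical usage with the names mentioned in this thread.
statement = build_select("user", ["order", "DayOfWeek"])
```

Quoting every identifier, not only the reserved ones, keeps the generated SQL uniform, which is the same argument made above for using brackets everywhere.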
Apologies for the fuzzy title...
My problem is this: I have a SQL Server table persons with about 100,000 records. Every person has an address, something like "Nieuwe Prinsengracht 12 - III". The customer now wants to separate the street from the number and addition (so each address becomes two or three fields). The problem is that we cannot be sure of the format the current address is in; it could also simply be something like "Velperweg 30".
The only thing we do know about it is that it's a piece of text, followed by a number, possibly followed by some more text (which can contain a number).
A possible solution would be to do this with regexes, but I would much (much, much) rather do this using a query. Is there any way to use regexes in a query? Or do you have any other suggestions how to solve such a problem?
Something like this maybe?
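SELECT
-- Street: everything before the first digit. Appending '0' guarantees a match, so rows without any digit do not fail.
rtrim(substring([address_field], 1, patindex('%[0-9]%', [address_field] + '0') - 1)) as [STREET],
-- Number and addition: everything from the first digit onward.
substring([address_field], patindex('%[0-9]%', [address_field] + '0'), len([address_field])) as [NUMBER_ADDITION]
FROM
[table]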
It relies on the assumption that the [street] field will not contain any numbers, and the [number_addition] field will begin with a number.
SQL Server and T-SQL are rather limited in their processing prowess. If you're really serious about heavy lifting, regexes and so on, your best bet is probably creating an assembly in C# or VB.NET that does all that tricky regex business, then deploying that into SQL-CLR and using the functions in T-SQL.
"Pure" T-SQL cannot really handle much string manipulation beyond SUBSTRING and CHARINDEX, and that's about it.
In answer to your "Is there any way to use regexes in a query?": yes, there is, but it needs a little .NET knowledge. Create a CLR assembly with a user-defined function that does your regex work. Visual Studio 2008 has a template project for this. Deploy it to your SQL Server and call it from your query.
Name and Address parsing and standardization is probably one of the most difficult problems we can encounter as programmers for precisely the reasons you've mentioned.
I assume that address parsing is not the main business of whoever you work for. My advice is to buy a solution rather than build one of your own.
I am familiar with this company. Your address examples appear to be non US or Canadian so I don't know if their products would be useful, but they may be able to point you to another vendor.
Other than being a user of their products, I am not affiliated with them in any way.
This sounds like the common "take a piece of complex text that could look like anything and make it look like what we now want it to look like" problem. These tend to be very hard to do using only T-SQL (which does not have native regex functionality). You will probably have to work with complex code outside of the database to solve this problem.
TGnat is correct. Address standardization is complicated.
I've encountered this problem before.
If your customer doesn't want to spring for the custom software, develop a simple GUI that allows a person to take an address and split it manually. You'd delete the address row with the old format and insert the row with the new address format.
It wouldn't take long for typists familiar with your addresses to manually make 100,000 changes. Of course, it's up to the customer if he wants to spend the money on custom software or typists.
But you shouldn't be stuck with the data cleaning bill, either.
I realize that this is an old question, but for future reference, I still decided to add an answer using regex (also so I don't forget it myself). Today, I ran into a similar problem in Excel, in which I had to split the address into street and house number too. In the end, I copied the column to SublimeText (a shareware text editor), and used a regex to do the job (CTRL-H, enable regex):
FIND: ^('?\d?\d?\d?['-\.a-zA-Z ]*)(\d*).*$
REPLACE FOR THE HOUSE NUMBER: $2
REPLACE FOR THE STREET NAME: $1
Some notes:
Some addresses started with a quote, e.g. 't Hofje, so I needed to add '?
Some addresses contained digits at the start, e.g. 17 Septemberplein or 2e Molendwarsstraat, so I added \d?\d?\d?
Some addresses contained a -, e.g. Willem-Alexanderlaan or a '
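The same idea can also be scripted outside the editor. Below is a minimal Python sketch, not the pattern from the answer above but a similar one written for illustration: it assumes the shape described in the question (text, then a house number, then an optional addition) and tolerates the leading quote and leading digits mentioned in the notes. The names `ADDRESS_RE` and `split_address` are hypothetical, and the house numbers in the usage lines are made up for the example.

```python
import re

# Street: optional leading quote, optionally up to three leading digits that
# are followed by a non-digit (e.g. "17 Septemberplein", "2e ..."), then text.
# Number: the first run of digits after the street.
# Addition: whatever follows, minus an optional "-" or "/" separator.
ADDRESS_RE = re.compile(
    r"^(?P<street>'?(?:\d{1,3}(?=\D))?\D*?)"
    r"\s*(?P<number>\d+)"
    r"\s*[-/]?\s*(?P<addition>.*)$"
)


def split_address(address):
    """Return (street, number, addition), or None if no house number is found."""
    match = ADDRESS_RE.match(address.strip())
    if match is None:
        return None
    return (
        match.group("street").strip(),
        match.group("number"),
        match.group("addition").strip(),
    )


# Hypothetical usage; rows that return None need manual review.
parts = split_address("Nieuwe Prinsengracht 12 - III")
```

Addresses that do not fit the assumed shape (for example a street that starts with digits but has no house number) will be split incorrectly, so the output still needs a manual check.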
Simple question: Office debate about whether the keyword AS is necessary in our T-SQL statements. I've always used it in cases such as
SELECT mycol AS Something
FROM MYTABLE;
I know you don't NEED it but is there any benefit of using it (or not using it)? I think it makes statements easier to read but people seem to disagree with me.
Thanks!
Generally yes, as it makes it easier to see what is aliased to what.
I agree with you that including the AS keyword makes queries easier to read. It's optional, but not including it is lazy. It doesn't make a significant difference to the performance of the query - the query plan will be the same. I would always prefer to include it for clarity.
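That the keyword changes nothing in the result can be checked quickly. The following is only a sketch using Python's built-in sqlite3 module as a stand-in, since SQLite also treats AS as optional for column aliases; it says nothing about SQL Server query plans, and the table contents are made up.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the real server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (mycol INTEGER)")
conn.executemany("INSERT INTO mytable (mycol) VALUES (?)", [(1,), (2,)])


def run(sql):
    """Return the result column names and the rows for a query."""
    cursor = conn.execute(sql)
    names = [column[0] for column in cursor.description]
    return names, cursor.fetchall()


# Same query, once with and once without the AS keyword.
with_as = run("SELECT mycol AS Something FROM mytable ORDER BY mycol")
without_as = run("SELECT mycol Something FROM mytable ORDER BY mycol")
```

Both calls return the same column name and the same rows, so the choice comes down to readability and consistency.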
I think it depends upon how readable your schema is to start with. If the field names are cryptic, then yes, using an alias can make it easier to understand the output of the SQL statement. However, there can be a cost associated with this when debugging. In a complex schema it can be difficult to track down the source of a column unless you look at the SQL statement itself to understand what field the alias is referring to.
I have almost always aliased my table names, and sometimes aliased my column names.
For production queries, I suggest that you go with uniformity - if you do it, do it at all times, and use the same convention. If you do not, then just leave things as they are.
I don't think it makes much difference. It certainly makes no difference to the performance. I think the important thing is to be consistent.
Similarly with table aliases:
SELECT mycol AS Something
FROM MYTABLE AS m;
Personally, I prefer to omit the AS, because it is faster to write and fewer characters to read.
I think you should always keep using AS when aliasing columns, as it is obligatory for some other RDBMS engines such as Oracle, for instance. So, if you are used to this syntax, you won't get tripped up by something as simple as that.
We are going to develop a new system over a legacy database (using .NET C#). There are approximately 500 tables. We have chosen to use an ORM tool for our data access layer. The problem is with the naming convention for the entities.
The tables in the database have names like TB_CUST (table with customer data) or TP_COMP_CARS (company cars). The first letter of the prefix defines the module and the second letter defines its relation to other tables.
I would like to give the entities more meaningful names, for example just Customer or CustomerEntity for TB_CUST. Of course there would be a comment pointing to the table name.
But the DBA, who is also the programmer, doesn't want names like this. He wants the entity names to be exactly the same as the table names. He says that he would have to remember two names and that it would be difficult and messy. I have to say he is not really familiar with the principles of OOP.
But with an entity name like TP_COMP_CARS there would be method names like GetTP_COMP_CARS or SaveTP_COMP_CARS. I think this is unreadable and ugly.
So please tell me your opinion. Who is right and why.
Thank you in advance
The whole idea of ORM tools is to avoid the need to remember database objects.
We usually create a database class with all the table and column names, so no one needs to remember anything, and the ORM should map database "names" to normal entities.
Although it is subjective, in my opinion you are right and he is wrong....
Who is going to work mostly with the new code? That person should decide the naming convention IMHO.
Personally of course I would go for your solution because as has already been mentioned, if you use ORM you don't need to hit the DB directly often.
As a compromise you could use names like TB_CUST where you act directly on the database but then use names like Customer for your Data Transfer Objects. Writing good code involves creating an abstraction of any data sources you might be working with. Having GetTB_CUST() throughout your code is a little like having GetTB_CUSTFromThatSQLDatabaseWeHave() dotted around the place.
I personally hate table names like that, but it's a legacy system and I'm sure the DBA doesn't feel like renaming the tables. Renaming the tables could be an option. You would just have to create views representing the old table names so that your legacy system keeps running while you develop your new system. If this is not an option you can use the ORM to map table names to entity names. Or you can abstract your ORM away in a data access layer and define nice entity names in your domain model, having your DAL do the name conversion.
The naming conventions used in two different domains simply don't align. Java, for example, has very well defined rules and conventions for class names and field names, where capitalisation is significant. In general, your application may be ported to a completely different database with a different naming standard, so it's not reasonable to demand that names in the business logic align with names in the database. Also consider a slightly more complex mapping: one entity may not correspond to one table.
And, really, come on ...
Customer == TB_CUST
That is just not so hard! I'm with you: make the names meaningful in the code and map them in the ORM. The learning for the DBA/programmer should not be that painful; my guess is that it's one of those things that feels much worse in the anticipation than in reality.
If there are 500 tables in the database - you've already got a challenge keeping those names straight. Hopefully, you've got metadata and some graphical models that describe them more meaningfully.
When you create the next 500 ORM objects - you'll have another challenge. Even if you give them meaningful names it's still too many to really hope that all will be obvious. So, now you've got 2 problems.
If there's no way to link those two sets of 500 tables together - then you've got 3 problems. Think about debugging performance of queries in the ORM, and going to the DBA with names that he doesn't recognize. Think about your carefully crafted names - that then must be ignored when you create reports that hit the database directly.
So, I'd try very hard to use the database names in the ORM. But I would tweak a few things:
If a name is too cryptic to understand, I'd work with the DBA to improve it. An easy way to transition to better names is through views. Ideally you eventually get rid of the original confusing name, though.
Changing from underscores to camel case, etc. shouldn't be considered a change to the name; it's just a change to the separator. So, use the appropriate form for your language.
The database prefixes can probably be dropped. They're actually just attributes of the table name that have been "denormalized" and grafted onto the name. They may be necessary to avoid duplication if they indicate a subsection of the model, but in general they may not be as important.
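The tweaks above are mechanical enough to script, which also gives you the two-way lookup needed when talking to the DBA. Here is a minimal Python sketch; it assumes every table name starts with a two-letter prefix followed by an underscore, as in the question, and the names `entity_name` and `build_name_map` are hypothetical.

```python
def entity_name(table_name, overrides=None):
    """Derive an entity name from a legacy table name.

    Hand-picked names in `overrides` win; otherwise the two-letter prefix
    is dropped and the underscore-separated parts become CamelCase.
    """
    if overrides and table_name in overrides:
        return overrides[table_name]
    parts = table_name.split("_")
    if len(parts) > 1 and len(parts[0]) == 2:
        parts = parts[1:]  # drop the module/relation prefix, e.g. "TB"
    return "".join(part.capitalize() for part in parts)


def build_name_map(table_names, overrides=None):
    """Return (table -> entity, entity -> table) lookup dictionaries."""
    to_entity = {table: entity_name(table, overrides) for table in table_names}
    to_table = {entity: table for table, entity in to_entity.items()}
    if len(to_table) != len(to_entity):
        raise ValueError("two tables map to the same entity name")
    return to_entity, to_table


# Hypothetical usage with the two tables named in the question.
to_entity, to_table = build_name_map(
    ["TB_CUST", "TP_COMP_CARS"], overrides={"TB_CUST": "Customer"}
)
```

The reverse dictionary is the piece that answers "which table is this entity?" when a query has to be debugged against the database.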
"I have to say he is not really familiar with the principles of OOP.
But with an entity name like TP_COMP_CARS there would be method names like GetTP_COMP_CARS or SaveTP_COMP_CARS. I think this is unreadable and ugly.
So please tell me your opinion. Who is right and why."
Which names are given to the things your IT systems manage has absolutely nothing to do with "the principles of OOP".
The same holds for which names are given to "standard" "getter and setter" methods : those are just agreements and conventions, not "principles of OOP".
The issue is a certain kind of "ergonomics" (and thus also the self-documenting value) of the code.
It is true that getTP_COMP_CARS looks ugly (though not, as you claim, "unreadable"). It is also true that if you start adhering to "more modern" naming conventions, then there will have to be someone somewhere who will have to maintain a mapping between the names that are synonymous. (And it is untrue that names such as TP_COMP_CARS are less self-documenting than full "natural-language-words" names : usually such names are constructed from a VERY SMALL set of mnemonic words that are used over and over again with the same meaning, making it more than easy enough for anyone to remember them.)
There is no right and wrong about this. Names like that were the usual convention in the days before the ones we live in now. At least, those names usually had the benefit of being case-insensitive, as opposed to the braindead (because case-sensitive) naming rules that are imposed upon us by so-called "more modern" systems.
Twenty years from now, people will call the naming conventions we use these days "braindead" too.
You usually normalize a database to avoid data redundancy. It's easy to see in a table full of names that there is plenty of redundancy. If your goal is to create a catalog of the names of every person on the planet (good luck), I can see how normalizing names could be beneficial. But in the context of the average business database is it overkill?
(Of course I know you could take anything to an extreme... say if you normalized down to syllables... or even adjacent character pairs. I can't see a benefit in going that far)
Update:
One possible justification for this is a random name generator. That's all I could come up with off the top of my head.
Yes, it's overkill.
People don't change their names from Bill to Joe all at once.
Database normalization usually refers to normalizing the field, not its content. In other words, you would normalize that there only be one first name field in the database. That is generally worthwhile. However the data content should not be normalized, since it is individual to that person - you are not picking from a list, and you are not changing a list in one place to affect everybody - that would be a bug, not a feature.
How do you normalize a name? Not all names have the same structure. Not all countries or cultures use the same rules for names. A first name is not necessarily just a first name. People have variable numbers of names. Some countries don't have the simple pair of firstname/lastname. What if my first name just so happens to be your last name: should they be considered the same in your database? If not, then you get into the problem that last name might mean different things in different countries. In most countries I know of, it is a family name. Your last name is the same as at least one of your parents' last names. In Iceland, it is your father's first name, followed by "son" or "daughter". So the same last name will mean completely different things depending on whether you encounter it in Iceland or the US.
In some cultures it is common when getting married, for the woman to take her husband's last name. In other cultures, that's completely optional, or might even work the opposite way.
How can you normalize this? What information would it gain you? If you find someone in your database who has "Smith" as the last word making up their name, what does that tell you? It might not be their family name. It might only be part of the family name. It might be an honorific in some language which, according to their culture, should be considered part of the name.
You can only normalize data if it follows a common structure.
If you had a need to perform queries based on diminutive names I could see a need for normalizing the names. e.g. a search for "Betty" may need to return results for "Betty", "Beth", and "Elizabeth"
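A search like that does not need a fully normalized name table; a small lookup from diminutive to canonical form is enough. A minimal Python sketch, where the mapping contains only the example from this answer and the names `CANONICAL_NAME`, `canonical` and `search_by_first_name` are hypothetical:

```python
# Diminutive -> canonical first name (lower case). Only the example from the
# answer above; a real list would have to come from a curated source.
CANONICAL_NAME = {
    "betty": "elizabeth",
    "beth": "elizabeth",
}


def canonical(name):
    """Map a first name to its canonical form; unknown names map to themselves."""
    key = name.strip().lower()
    return CANONICAL_NAME.get(key, key)


def search_by_first_name(first_names, query):
    """Return every stored first name that shares a canonical form with the query."""
    target = canonical(query)
    return [name for name in first_names if canonical(name) == target]


# Hypothetical usage.
hits = search_by_first_name(["Betty", "Beth", "Elizabeth", "Bill"], "Betty")
```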
Yes, definitely overkill. What's a few dozen bytes between friends?
Maybe if you work in the Census office it might make sense. Otherwise, see every other answer :)
I would say yes, it is going too far in 95%+ of the cases.
Yes. I cannot think of an instance where the benefits outweigh the problems and query complications.
No, but you might want to normalise to a canonical record for a customer (so you don't get 5 different entries for 'Bloggs & Co.' in your database). This is a data cleansing issue that often bites on MIS projects.
You often don't go beyond fourth normal form in a database. Therefore normalizing to something like a seventh normal form is quite a bit overboard. The only place this might even be a remotely plausible idea is in some kind of massive data warehouse.
Generally yes. Normalizing to that level would be going too far. Depending on the queries (such as phone books where searches by last name are common) it might be worthwhile. I expect that to be rare.
I generally haven't seen a need to normalize the name, mainly because that adds a performance hit on the join that will always be called, and doesn't give any benefit.
If you have so many similar names, and have a storage problem then it may be worth it, but there will be a performance hit that would need to be considered.
I would say it is absolutely overkill. In most applications, you display folks' names so often, every query involved with that is going to look that much more complex and harder to read.
Yes, it is. It is commonly recognized that just applying all of the Rules of Normalization can cause you to go way too far and end up with an overnormalized database. For example, it would be possible to normalize every instance of every character to a reference to a character enumeration table. It's easy to see that that's ridiculous.
Normalization needs to be performed at a level that is appropriate for your problem domain. Overnormalization is as much a problem as undernormalization (although, of course, for different reasons).
There might be a case where being able to link married/maiden names would be useful.
Recently had a case where I had to rename thousands of emails in Exchange because somebody got divorced and didn't want any emails listing her as married_name#company.com
No need to normalize to that level unless the names make up a composite primary key and you have data that is dependent on one of the names (e.g. anyone with the surname Plummer knows nothing about databases). In which case, by not normalizing, you would violate second normal form.
I agree with the general response, you wouldn't do that.
One thing comes to mind though, compression. If you had a billion people and you found that 60% of first names were pulled from 5 very common names, you could use some tricky bit manipulation to reduce the size very significantly. It would also require very customized database software.
But this isn't for the purpose of normalization, just compression.
You should normalize it out if you need to avoid the delete anomaly that comes with not breaking it out. That is, if you ever need to answer the question "has my database ever had a person named 'Joejimbobjake' in it?", you need to avoid the anomaly. Soft deletes are probably a much better way than having a comprehensive first name table (for example), but you get my point.
In addition to all the points everyone else has made, consider that if you were implementing a data entry operation (for example), and were to insert a new contact, you would have to search your first name and last name tables to locate the correct IDs and then use those values. This is further complicated when the name is not yet in the first name and/or last name tables: then you have to insert the new first/last name and use the new ID(s).
And if you think that you have a comprehensive list of names, think again. I work with a list of over 200k unique first names and I'd guess it represents 99.9% of the US population. But that .1% = a lot of people. And don't forget the foreign names and misspellings...
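To make the extra data-entry work concrete, this is roughly what every insert turns into once first names live in their own table. It is only a sketch using Python's built-in sqlite3 module with a made-up two-table schema (`first_name`, `person`); the function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE first_name (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE person (
        id            INTEGER PRIMARY KEY,
        first_name_id INTEGER NOT NULL REFERENCES first_name(id)
    );
    """
)


def get_or_create_first_name_id(conn, name):
    """Look the name up; insert it first if it is not in the table yet."""
    row = conn.execute(
        "SELECT id FROM first_name WHERE name = ?", (name,)
    ).fetchone()
    if row is not None:
        return row[0]
    cursor = conn.execute("INSERT INTO first_name (name) VALUES (?)", (name,))
    return cursor.lastrowid


def insert_person(conn, first_name):
    """Insert a person, resolving the first name to an id beforehand."""
    name_id = get_or_create_first_name_id(conn, first_name)
    cursor = conn.execute(
        "INSERT INTO person (first_name_id) VALUES (?)", (name_id,)
    )
    return cursor.lastrowid


# Hypothetical usage: three people, two distinct first names.
for name in ["Bill", "Bill", "Joe"]:
    insert_person(conn, name)
```

With a last name table as well, each new contact costs two lookups and possibly two extra inserts before the actual row can be written.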
I've got a question concerning fields in databases which are measures that might be displayed in different units but are stored only in one, such as "height", for example.
Where should the "pattern unit" be stated? Of course, in the documentation, etc. But we all know nobody reads the documentation and that self-documented things are preferable.
From a practical point of view, what do you think of coding it in the database field name (such as height_cm, for example)?
I find this weird at first glance, but I find it practical for avoiding mistakes when different people deal with the database directly, and the "pattern unit" will never change.
What do you think?
What's weird about height_cm? Looks good to me.
Sometimes you see measures and units in two separate fields, which is much more painful.
As long as you know the units aren't going to change, I think height_cm is a good way to deal with it.
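On the application side, the suffix makes the conversion for display hard to get wrong. A minimal Python sketch, assuming the value is stored in centimetres in a column called height_cm as in the question; the function names are hypothetical.

```python
CM_PER_INCH = 2.54  # exact by definition


def cm_to_inches(height_cm):
    """Convert a stored height in centimetres to inches."""
    return height_cm / CM_PER_INCH


def format_height(height_cm, unit="cm"):
    """Format a height stored in centimetres for display in the requested unit."""
    if unit == "cm":
        return f"{height_cm:.1f} cm"
    if unit == "in":
        return f"{cm_to_inches(height_cm):.1f} in"
    raise ValueError(f"unsupported unit: {unit}")


# Hypothetical usage: the stored value never changes, only its presentation.
label = format_height(180.0, unit="in")
```

Carrying the unit in the parameter name (height_cm) mirrors the column name, so the stored unit stays visible at every call site.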
Most databases support comments on columns. For example in Postgres you could set a comment like this:
COMMENT ON COLUMN my_table.my_column IS 'cm';
Storing the unit name this way means your database is self-documenting.
I would also strongly recommend using standard scientific units (i.e. the metric system).
Be wary of measures that may change, like currencies. In many cases it is not practical to rename a database field when its unit changes.
It is rather silly to have a field called amount_mk which used to contain an amount of money in marks but now actually contains an amount in euros.
I agree, nothing wrong with adding the unit to the field name.
The only thing I'd say is to make the naming convention consistent across your database - i.e. avoid situations where you have both height_cm and mm_width present in the same database!