I have, for example, a table named Cars_Import.
I need a view that grabs the data to be imported; that view is run and does the work of importing the data into the Cars_Import table.
My problem is that I can't give the view the same name; I have to differentiate it, because SQL Server treats same-named objects as a conflict no matter what type of object they are.
So, in terms of generally accepted naming conventions: when you have two objects that closely relate to each other, and knowing it's not good practice to append things like tbl or vw (for view) to the name, what would you suggest here as the view name related to Cars_Import?
I wouldn't want the view name to simply switch the words around, such as Import_Cars; that would work, but it just seems messy to me.
So what's the advice here on naming the table and its related view, which will grab all the data from that table that we need? There is no business logic; it's just grabbing the data, and we're going to import it into a data warehouse, all the data as-is initially.
Views are actually the one place where I don't mind a prefix or suffix that describes what the object is. Unlike a table and a stored procedure, which are obviously different because they are used differently, tables and views are largely interchangeable. So I find that this differentiation can be helpful when reverse engineering or troubleshooting code (and I'm talking about when you come across the name in a piece of code, not browsing the objects through Object Explorer, which makes things much more obvious by definition).
Your naming scheme is up to you, and you're largely not going to get a "correct" answer here, other than that you should apply your convention consistently and unilaterally, and do what you can to make sure your entire team buys into it and follows it as well. But I will say that I wouldn't balk at something like this:
Table: dbo.Cars_Import
View: dbo.View_Cars_Import
But to me, this seems to imply that the view may just be something that sits over the table (say, prettifying output, adding or hiding columns, etc.), not something that feeds the table. So I kind of agree with @HABO that maybe there is a better way to name this view that describes what it does.
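For what it's worth, a minimal T-SQL sketch of that convention might look like this (the columns and types are made up purely for illustration):

```sql
-- Table that holds the data to be imported
CREATE TABLE dbo.Cars_Import
(
    CarId INT NOT NULL,
    Make  NVARCHAR(50) NULL,
    Model NVARCHAR(50) NULL
);
GO

-- View that grabs the data from the table for the warehouse load
CREATE VIEW dbo.View_Cars_Import
AS
    SELECT CarId, Make, Model
    FROM dbo.Cars_Import;
GO
```

Whatever the exact names, the point is only that the prefix makes the object type obvious when you meet the name in code.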
I face the following problem.
I'm creating a database for (say) human beings' info. All the human beings may be classified in one of the three categories: adult female, adult male, child. It is clear that the parameters like "height" and "weight" are applicable to all of the categories. The parameter "number of children" is applicable only to adults, while the parameter "number of pregnancies" is applicable to females only. Also, each parameter may be classified as mandatory or optional depending on the category (for example, for adults the parameter "number of ex-partners" is optional).
When I load (say) "height" and "weight", I check whether the info in these two fields is self-consistent. I.e., I mark as a mistake the record which has height=6'4'' and weight=10 lb (obviously, this is physically impossible). I have several similar verification rules.
When I insert a record about a human being, I need to reflect the following characteristics of the info:
the maximum possible info for the category of this particular human being (including all the optional parameters).
the required minimum of information for the category (i.e., mandatory fields only)
what has actually been inserted for this particular human being (i.e., it is possible to insert whatever I have for this person, whether or not it falls short of the required minimum of info). The non-trivial issue here is that a field "XXX" may have a NULL value because I have never inserted anything there OR because I have intentionally inserted exactly a NULL value. The same logic applies to fields that have a default value. So it should be reflected somewhere that I have processed this particular field.
what amount of inserted information has been verified (i.e., even if I load some 5 fields, I can check for self-consistency only 3 fields while ignoring the remaining 2).
So my question is how to technically organize it. Currently, all these required features are either hardcoded with no unified logic or broken into completely independent blocks. I need to create a unified approach.
I have some naive ideas in my head in this regard. For example, for each category of human beings, I can create and store a list of possible fields (I call it a "template"). I can mark those fields that are mandatory.
When I insert a record about a human being, I copy the template and mark which fields from this template have actually been processed. At the next stage, I can mark in this copy of the template those fields that will currently be verified.
The verification module is adjusted in the following way: for each verification procedure, I create a list of the fields that are used in that particular procedure. Then I call only those verification procedures whose fields are actually marked "to be verified" in the copy of the template for the particular human being that is to be verified (see the previous passage).
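If it helps to picture that idea, here is a minimal relational sketch of the "template" bookkeeping (all table and column names are invented for illustration):

```sql
-- One template per category, listing its possible fields
CREATE TABLE TemplateField (
    Category    VARCHAR(20) NOT NULL,   -- e.g. 'ADULT_F', 'ADULT_M', 'CHILD'
    FieldName   VARCHAR(50) NOT NULL,
    IsMandatory BIT         NOT NULL,
    PRIMARY KEY (Category, FieldName)
);

-- Per-person copy of the template: which fields were processed / are to be verified
CREATE TABLE PersonField (
    PersonId    INT         NOT NULL,
    FieldName   VARCHAR(50) NOT NULL,
    IsProcessed BIT         NOT NULL DEFAULT 0,
    ToVerify    BIT         NOT NULL DEFAULT 0,
    PRIMARY KEY (PersonId, FieldName)
);

-- Which fields each verification procedure consumes, so only the relevant ones are run
CREATE TABLE VerificationRuleField (
    RuleName  VARCHAR(50) NOT NULL,
    FieldName VARCHAR(50) NOT NULL,
    PRIMARY KEY (RuleName, FieldName)
);
```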
As you see, this is the most straightforward way to solve this problem. But my guess is that there are a lot of quite standardized approaches that I'm not aware of. I really doubt that I'm the first in the world to solve such a problem. I don't like my solution because it is really painful to write the code that correctly reflects in this copied template all the "updates" happening to a record.
So I'm asking you to share your opinion: how would you solve this problem?
I think there are two questions here:
how do I store polymorphic data in a database?
how do I validate complex business rules?
You should address them separately - trying to solve both at once is probably too hard.
There are a few approaches to polymorphic data in RDBMSes - ORMs use the term inheritance mapping, for instance. The three solutions here - table per class hierarchy, table per subclass and table per concrete class - are "pure" relational solutions. You can also use the "Entity-Attribute-Value" design, or use a document approach (storing data in structured formats such as XML or JSON) - these are not "pure" relational options, but have their place.
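As a rough illustration of the table-per-subclass option for this example (all names and types are invented), the common attributes live in one table and each subtype gets its own table keyed to it:

```sql
-- Base table: attributes common to every category
CREATE TABLE Person (
    PersonId INT PRIMARY KEY,
    Height   DECIMAL(5,2) NOT NULL,
    Weight   DECIMAL(5,2) NOT NULL
);

-- Subtype tables hold the category-specific attributes
CREATE TABLE Adult (
    PersonId           INT PRIMARY KEY REFERENCES Person (PersonId),
    NumberOfChildren   INT NOT NULL,
    NumberOfExPartners INT NULL      -- optional for adults
);

CREATE TABLE AdultFemale (
    PersonId            INT PRIMARY KEY REFERENCES Adult (PersonId),
    NumberOfPregnancies INT NOT NULL
);

CREATE TABLE Child (
    PersonId INT PRIMARY KEY REFERENCES Person (PersonId)
);
```

Table per class hierarchy would instead collapse all of these columns into Person behind a type discriminator, and table per concrete class would repeat the common columns in each category's own table.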
Validating complex business rules is often done using rule engines - these are super cool bits of technology, but you have to be sure that your problem really fits with their solution - deciding to invest in a rules engine means your project changes into a rules engine project, not a "humans" project. Alternatively, most mainstream solutions to this embody the business logic about the entities in the application's business logic layer. It sounds like you're outgrowing this.
This exact problem, both in health terms and in terms of a financial instrument, is used as a primary example in Martin Fowler's book Analysis Patterns. It is an extensive topic. As @NevilleK says, you are trying to deal with two questions, and it is best to deal with them separately. One ultra-simplified way of approaching these problems is:
1. Storage of polymorphic data - only put mandatory data that is common to the category in the category table. For optional data, put it in separate tables in a 1-1 relationship with the category table. Entries are made in these optional tables only if there is a value to be recorded. The record of the verification of the data can also be put in these additional tables (see the sketch after this list).
2. Validate complex business rules - it is useful to consider the types of error that can arise. There are a number of ways of classifying the errors, but the one I have found most useful is (a) type errors, where one can tell that the value is in error just by looking at the data - e.g. 1980-02-30; (b) context errors, where one can detect the error only by reference to previously captured data - e.g. DoB 1995-03-15, date of marriage 1996-08-26; and (c) lies to the system - where the data type is OK and the context is OK, but the information can only be detected as incorrect at a later date when more information comes to light, e.g. if I register my DoB as 1990-12-31 when it is actually something different. This latter class of error typically has to be dealt with by procedures outside the system being developed.
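A sketch of point 1 with invented names - only mandatory, category-wide data in the category table, each optional attribute in its own 1-1 table, and the verification record kept alongside it:

```sql
-- Category table: mandatory attributes only
CREATE TABLE AdultFemale (
    PersonId INT PRIMARY KEY,
    Height   DECIMAL(5,2) NOT NULL,
    Weight   DECIMAL(5,2) NOT NULL
);

-- Optional attribute in a 1-1 table; a row exists only when a value was recorded
CREATE TABLE AdultFemale_ExPartners (
    PersonId           INT PRIMARY KEY REFERENCES AdultFemale (PersonId),
    NumberOfExPartners INT NOT NULL,
    VerifiedOn         DATE NULL   -- when (if ever) this value passed verification
);
```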
I would use the Party Role pattern (Silverston):
Party
  id
  name
Individual : Party
  current_weight
  current_height
PartyRole
  id
  party_id
  from_date
  to_date (nullable)
AdultRole : PartyRole
  number_of_children
FemaleAdultRole : AdultRole
  number_of_pregnancies
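A rough DDL translation of that outline (the types and key constraints are my guesses):

```sql
CREATE TABLE Party (
    id   INT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

-- Individual is a subtype of Party
CREATE TABLE Individual (
    id             INT PRIMARY KEY REFERENCES Party (id),
    current_weight DECIMAL(5,2),
    current_height DECIMAL(5,2)
);

-- A party plays roles over time
CREATE TABLE PartyRole (
    id        INT PRIMARY KEY,
    party_id  INT NOT NULL REFERENCES Party (id),
    from_date DATE NOT NULL,
    to_date   DATE NULL
);

-- Role subtypes carry the role-specific attributes
CREATE TABLE AdultRole (
    id                 INT PRIMARY KEY REFERENCES PartyRole (id),
    number_of_children INT
);

CREATE TABLE FemaleAdultRole (
    id                    INT PRIMARY KEY REFERENCES AdultRole (id),
    number_of_pregnancies INT
);
```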
Postgres has a temporal extension such that you could enforce that a party could only play one role at a time (yet maintain their role histories).
Use table inheritance. For simplicity use Single Table Inheritance (has nulls), for no nulls use Class Table Inheritance.
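For instance, a Single Table Inheritance version of this example might look like the following sketch (names invented); Class Table Inheritance would instead move the nullable columns out into per-category tables:

```sql
-- One table, a type discriminator, and nullable subtype-specific columns
CREATE TABLE Person (
    PersonId            INT PRIMARY KEY,
    PersonType          VARCHAR(10) NOT NULL,  -- 'ADULT_F', 'ADULT_M', 'CHILD'
    Height              DECIMAL(5,2),
    Weight              DECIMAL(5,2),
    NumberOfChildren    INT NULL,              -- adults only
    NumberOfPregnancies INT NULL               -- adult females only
);
```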
If all the data were put into memory, meaning the medium is much faster, what's the fastest way to do a "SELECT .. WHERE .." query (i.e., filter data)? So far, the options in my mind are:
1) B-tree-like algorithms, but they may still need an index and more space
2) a fixed-length array, which is smaller but may be slower.
So are there any other, better ways, if both speed and size are concerns?
It depends on the specific case you have - which operations you need to be fast, what the exact size is, and more. Some examples:
For AND queries, a set of sorted lists is usually maintained (a list for each feature). This data structure is called an inverted index, and is often used by search engines to get the relevant documents for a given query. (Apache Lucene uses this data structure, for example.)
If arrays can be used - and iteration over the data is needed - it is a very efficient approach, since arrays are basically the most cache-efficient data structure there is. Reading sequentially from an array is in most cases much faster than any other data structure, since it gives you the fewest cache misses, which are often the bottleneck when iterating over your data.
If your data is strings, for example, and you are going to filter according to some string attribute (a prefix, say), a data structure designed for strings, such as a trie or a radix tree, might get you the best performance.
Bottom line: if you are going to build something custom in order to improve on the performance of the default libraries, you should consider the specific problem details before deciding on your data structure of choice.
I don't usually design databases, so I'm having some doubts about how to normalize (or not) a table for registering users. The fields I'm having doubts about are:
locationTown: I plan to normalize for countries, and have a separate table for it, but should I do the same for towns? I guess users would type this in when registering, and not choosing from a dropdown. Can one normalize when the input may be coming from users?
maritalStatus: I would have a choice of about 5 or so different statuses.
Also, does anyone know of a good place to find real world database schema/normalizing examples?
Thanks
locationTown - just store it directly in the user table. Otherwise you will have to search for an existing town, taking typos and letter case into account. Also, some people use non-standard characters and languages (Kraków vs. Krakow vs. Cracow; see also: romanization). If you really want a table of towns, at least provide an auto-complete box so users are more likely to choose an existing town. Otherwise, prepare for lots of duplicates or near-duplicates.
maritalStatus - this, on the other hand, should be in a separate table. Or, more accurately: use a single character or a number to represent the marital status. An extra table mapping this to a human-readable form is just for convenience (remember about i18n), and a foreign key constraint makes sure incorrect statuses aren't used.
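A minimal sketch of that (table and column names are just for illustration):

```sql
-- Lookup table: code plus human-readable label (localize the label elsewhere for i18n)
CREATE TABLE MaritalStatus (
    StatusCode  CHAR(1)     PRIMARY KEY,   -- e.g. 'S', 'M', 'D', 'W'
    Description VARCHAR(50) NOT NULL
);

CREATE TABLE AppUser (
    UserId        INT PRIMARY KEY,
    LocationTown  VARCHAR(100) NULL,       -- stored directly on the user, as suggested above
    MaritalStatus CHAR(1) NULL REFERENCES MaritalStatus (StatusCode)
);
```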
I wouldn't worry about it too much - database normalization (3NF, et al) has been over-emphasized in academia and isn't overly practical in industry. In addition, we would need to see your whole schema in order to judge where these implementations are appropriate. Focus on indexing commonly-used columns before you worry about normalization.
You might want to take a look at this SO question before you dive in any further.
I am planning to make a Q&A system (quite specific; it has nothing to do with IT).
I was looking for Stackoverflow database map: https://meta.stackexchange.com/questions/2677/anatomy-of-a-data-dump/2678#2678
And I am wondering whether it wouldn't be better practice to make a separate table for question titles, with a "firstPostId" column.
Instead of
|- PostTypeId
| - 1: Question
| - 2: Answer
So I want to know why Stack Overflow did not use a separate table for question titles. Is it a case of "do not optimize yet", or is there some logic behind it?
Based just on the schema as shown in your link, I surmise that Questions and Answers have so many attributes in common that it was convenient to model it as was done. In short, symmetry and failing to multiply entities unnecessarily seem credible reasons for the approach.
I also suspect they use a key/value (a.k.a. nosql) database for the backing store which allows entries to not possess all possible attributes. For example, a question can have tags but an answer will not. Key/value databases don't fret over differences like that.
Disclaimer: I have no actual knowledge of how SO is implemented.
Being stuck with a legacy database schema that no longer reflects your data model is every developer's nightmare. Yet with all the talk of refactoring code for maintainability I have not heard much of refactoring outdated database schemas.
What are some tips on how to transition to a better schema without breaking all the code that relies on the old one? I will propose a specific problem I am having to illustrate my point but feel free to give advice on other techniques that have proven helpful - those will likely come in handy as well.
My example:
My company receives and ships products. Now a product receipt and a product shipment have some very different data associated with them so the original database designers created a separate table for receipts and for shipments.
In my one year working with this system I have come to the realization that the current schema doesn't make a lick of sense. After all, both a receipt and a shipment are basically a transaction, they each involve changing the amount of a product, at heart only the +/- sign is different. Indeed, we frequently need to find the total amount that the product has changed over a period of time, a problem for which this design is downright intractable.
Obviously the appropriate design would be to have a single Transactions table with the Id being a foreign key of either a ReceiptInfo or a ShipmentInfo table. Unfortunately, the wrong schema has already been in production for some years and has hundreds of stored procedures, and thousands of lines of code written off of it. How then can I transition the schema to work correctly?
Here's a whole catalogue of database refactorings:
http://databaserefactoring.com/
That's a very difficult thing to work around; a couple of quick options after refactoring the database are:
Create views that match the original schema but pull from the new schema; you may need triggers here so any updates through the views can be handled (see the sketch after this list).
Create the new schema and put in triggers on each side to maintain the other side.
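A sketch of the first option, using the receipts/shipments example from the question (the column names and the single signed-quantity design are just illustrative assumptions):

```sql
-- New schema
CREATE TABLE dbo.Transactions
(
    TransactionId INT IDENTITY PRIMARY KEY,
    ProductId     INT NOT NULL,
    Quantity      INT NOT NULL,        -- positive for receipts, negative for shipments
    TransactedOn  DATETIME NOT NULL
);
GO

-- Compatibility view that keeps the old table's name and shape
CREATE VIEW dbo.Receipts
AS
    SELECT TransactionId AS ReceiptId, ProductId, Quantity, TransactedOn
    FROM dbo.Transactions
    WHERE Quantity > 0;
GO

-- Old code that inserts into "Receipts" keeps working via an INSTEAD OF trigger
CREATE TRIGGER dbo.Receipts_Insert ON dbo.Receipts
INSTEAD OF INSERT
AS
BEGIN
    INSERT INTO dbo.Transactions (ProductId, Quantity, TransactedOn)
    SELECT ProductId, Quantity, TransactedOn
    FROM inserted;
END;
GO
```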
This book (Refactoring Databases) has been a God-send to me when dealing with legacy database schemas, including when I had to deal with almost the exact same issue for our inventory database.
Also, having a system in place to track changes to the database schema (like a series of alter scripts stored in the source control repository) helps immensely in figuring out code-to-database dependencies.
Stored procedures and views are your friend here. Even if the system doesn't use them, change it to use them, then refactor the database underneath.
Your receipts and shipments then become views.
Beware, receipts and shipments are actually two very different beasts in most systems I have worked with. Receipts are linked to suppliers, while shipments are linked to customers (or customer/ship-to locations). At the inventory level, they are often represented the same.
Is all data access limited to stored procedures? If not, the task could be nearly impossible. If so, you just have to make sure your data migration scripts work well transitioning from the old to the new schema, and then make sure your stored procedures honor their inputs and outputs.
Hopefully none of them have "select *" queries. If they do, use 'sp_help tablename' to get the complete list of columns, copy that out and replace each * with the complete column list, just to make sure you don't break client code.
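For example (the table name here is just a placeholder):

```sql
-- Get the full column list for the table
EXEC sp_help 'dbo.Receipts';

-- Then replace
SELECT * FROM dbo.Receipts;
-- with an explicit list copied from the sp_help output
SELECT ReceiptId, ProductId, Quantity, ReceivedOn FROM dbo.Receipts;
```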
I would recommend making the changes gradually, and do lots of integration testing. It's hard to do a significant remodel without introducing a few bugs.
The first thing is to create the table schema. I already did that for a legacy database using Enterprise Architect: you can select the DB and it will create every table and field for you. Then you will need to split everything into categories, for example all your received and shipped products together, and client stuff in another category. Once everything is cleared up, you will be able to refactor field by field, creating new tables, new relationships, and new fields. Of course, this will need a lot of changes if everything is accessed without stored procedures.
I don't think it's obvious that the id of the Transactions table should be a foreign key to either ReceiptInfo or ShipmentInfo. Think of it the other way around: in an object-oriented model you would have a Transaction table, and ReceiptInfo and ShipmentInfo should each have a foreign key to the Transaction table. If you are lucky, there will be only 1 or 2 places in the code where new records in ReceiptInfo or ShipmentInfo are created. There you should add code that inserts an entry into the Transaction table and, after that, creates the entry in ReceiptInfo or ShipmentInfo with the foreign key to the Transaction.
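A minimal sketch of that direction (assuming a Transactions table keyed by TransactionId already exists; the detail columns are invented):

```sql
-- Each detail row points at its transaction, not the other way around
CREATE TABLE dbo.ReceiptInfo
(
    TransactionId INT PRIMARY KEY REFERENCES dbo.Transactions (TransactionId),
    SupplierId    INT NOT NULL
    -- ...other receipt-specific columns
);

CREATE TABLE dbo.ShipmentInfo
(
    TransactionId INT PRIMARY KEY REFERENCES dbo.Transactions (TransactionId),
    CustomerId    INT NOT NULL
    -- ...other shipment-specific columns
);
```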
Sometimes you can create new tables that have better structures and then create views with the names of your old tables that are based on the data in the new tables. That way, your code doesn't break while you start to move to a better structure. Be careful with this though, as sometimes you move from a non-relational table to a relational structure with multiple records where the code will be expecting only one. This is particularly true if you have developers who use subqueries.
Then as each thing is changed, it will move away from the views to the real table. Eventually you can drop the views. This at least allows you to work incrementally to keep things working as you move stuff, but start to fix things to use a better design.