I've been struggling with how I should indicate that a certain record in a database is the "fallback" or default entry. I've also been struggling with how to reduce my problem to a simple problem statement. I'm going to have to provide an example.
Suppose that you are building a very simple shipping application. You'll take orders and will need to decide which warehouse to ship them from.
Let's say that you have a few cities that have their own dedicated warehouses*; if an order comes in from one of those cities, you'll ship from that city's warehouse. If an order comes in from any other city, you want to ship from a certain other warehouse. We'll call that certain other warehouse the fallback warehouse.
You might decide on a schema like this:
Warehouses
WarehouseId
Name
WarehouseCities
WarehouseId
CityName
The solution must enforce zero or one fallback warehouses.
You need a way to indicate which warehouse should be used if there aren't any warehouses specified for the city in question. If it really matters, you're doing this on SQL Server 2008.
EDIT: To be clear, all valid cities are NOT present in the WarehouseCities table. It is possible for an order to be received for a City not listed in WarehouseCities. In such a case, we need to be able to select the fallback warehouse.
If any number of default warehouses were allowed, or if I were assigning default warehouses to, say, states, I would use a DefaultWarehouse table. I could use such a table here, but I would need to limit it to exactly one row, which doesn't feel right.
How would you indicate the fallback warehouse?
*Of course, in this example we discount the possibility that there might be multiple cities with the same name. The country you are building this application for rigorously enforces a uniqueness constraint on all city names.
I understand your problem, but have questions about parts of it, so I'll be a bit more general.
If at all possible I would store warehouse/backup warehouse data with your inventory data (either hanging directly off warehouses or, if it's product specific, off the inventory tables).
If the setup has to be calculated through your business logic, then the records should hang off the order/order_item table.
In terms of how to implement the structure in SQL, I'll assume that all orders ship out of a single warehouse and that the shipping must be hung off the orders table (but the ideas should be applicable elsewhere):
The older way to enforce zero/one backup warehouses would be to hang a Warehouse_Source record off the Orders table, include an "IsPrimary" or "ShippingPriority" field, and then add a composite unique index that includes OrderID and IsPrimary/ShippingPriority.
If you will only ever have one backup warehouse you could add ShippingSource_WareHouseID and ShippingSource_Backup_WareHouseID fields to the order, although this isn't the route I would go.
In SQL 2008 and up we have the wonderful addition of Filtered Indexes. These allow you to add a WHERE clause to your index -- resulting in a more compact index. It also has the added benefit of allowing you to accomplish some things that could only be done through triggers in the past.
You could put a Unique filtered index on OrderID & IsPrimary/ShippingPriority (WHERE IsPrimary = 0).
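For example, a rough sketch of that approach (the Warehouse_Source table and its column names are illustrative, assuming the Orders and Warehouses tables from your post already exist):

-- Hypothetical shipping-source table hanging off Orders.
CREATE TABLE Warehouse_Source
(
    OrderID     INT NOT NULL REFERENCES Orders (OrderID),
    WarehouseId INT NOT NULL REFERENCES Warehouses (WarehouseId),
    IsPrimary   BIT NOT NULL
);

-- Filtered unique index (SQL Server 2008+): at most one
-- non-primary (backup) shipping source per order.
CREATE UNIQUE NONCLUSTERED INDEX UX_WarehouseSource_Backup
    ON Warehouse_Source (OrderID)
    WHERE IsPrimary = 0;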
Add a comment or such if you want me to explain further.
re: how I should indicate that a certain record in a database is the "fallback" or default entry
Use another column, isFallback, holding a binary value. I'm assuming your fallback warehouse won't have any cities associated with it.
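If you also need to enforce the "zero or one fallback" rule from the question, one way (an assumption on my part, not something you must do) is a filtered unique index, since you're on SQL Server 2008:

-- Add the flag with a default of 0 for existing rows.
ALTER TABLE Warehouses
    ADD IsFallback BIT NOT NULL
        CONSTRAINT DF_Warehouses_IsFallback DEFAULT (0);

-- At most one warehouse may carry the fallback flag.
CREATE UNIQUE NONCLUSTERED INDEX UX_Warehouses_Fallback
    ON Warehouses (IsFallback)
    WHERE IsFallback = 1;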
As I see it, the fallback warehouse is, after all, just another warehouse, and if I understood correctly, every record in WarehouseCities has a reference to one record in Warehouses:
WarehouseCities(*)...(1)Warehouses
Which means that if there were a hundred cities without a dedicated warehouse, they would all reference the id of a specific fallback warehouse. So I don't see any problem (which makes me think I didn't understand the problem); the model even looks well defined.
Now you could identify whether a warehouse is the fallback warehouse with an attribute like type_warehouse on Warehouses.
EDIT after comment
Assuming there is only one fallback warehouse for the cities not present in WarehouseCities, I suggest keeping the fallback warehouse as just another warehouse and storing its Id (WarehouseId) as an application parameter (in a table for parameters, maybe?). Of course, this solution is programmatic and not tied to your database platform.
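A minimal sketch of such a parameters table (the names, and the decision to store the id as text, are just illustrative):

-- Hypothetical application-parameters table; the application reads
-- the fallback warehouse id from here at runtime.
CREATE TABLE ApplicationParameters
(
    ParameterName  VARCHAR(50)  NOT NULL PRIMARY KEY,
    ParameterValue VARCHAR(100) NOT NULL
);

INSERT INTO ApplicationParameters (ParameterName, ParameterValue)
VALUES ('FallbackWarehouseId', '42');  -- id of an existing Warehouses row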
Related
I am working on a data warehouse solution, and I am trying to build a dimensional model from tables held in a SQL Server database. Some of the tables include but aren't limited to Customer, Customer Payments, Customer Address, etc.
All these tables in the DB have some fields that are repeated across each table, e.g. record update date, record creation date, active flag, closed flag and a few others. These tables all relate to the Customer in some way, but the tables can be updated independently.
I am in the process of building out a dimension(s) on the back of these tables, but I am struggling to see how best to deal with these repeated fields in an elegant way, as they are all used.
I'd appreciate any guidance from people who have experience with scenarios like this, as I am just starting out.
If more details are needed, I am happy to provide them.
Thanks
Before you even consider how to include them, ask whether those metadata fields even need to be in your dimensional model. If no one will use the Customer Payment Update Date (vs the Created Date or Payment Date), don't bring it into your model. If the customer model includes the current address, you won't need the CustomerAddress.Active flag included as well. You don't need every OLTP field in your model.
Make notes about how you talk about the fields in conversation. How do you identify the current customer address? Check the CurrentAddress flag (CustomerAddress.IsActive). When was the Customer's payment? Check the Customer Payment Date (CustomerPayment.PaymentDate or possibly CustomerPayment.CreatedDate). Try to describe them in common language terms. This will provide the best success in making your model discoverable by your users and intuitive to use.
Naming the columns in the model and source as similar as possible will also help with maintenance and troubleshooting.
Also, make sure you delineate the entities properly. A customer payment would likely be in a separate dimension from the customer. The current address may be in customer, but if there is any value to historical address details, it may make sense to put it into its own dimension, with the Active flag as well.
I am having a hard time trying to figure out when to use a 1-to-1 relationship in db design or if it is ever necessary.
If you can select only the columns you need in a query, is there ever a point in breaking up a table into 1-to-1 relationships? I guess updating a large table has more impact on performance than updating a smaller one, and I'm sure it depends on how heavily the table is used for certain operations (reads/writes).
So when designing a database schema how do you factor in 1-to-1 relationships? What criteria do you use to determine if you need one, and what are the benefits over not using one?
From the logical standpoint, a 1:1 relationship should always be merged into a single table.
On the other hand, there may be physical considerations for such "vertical partitioning" or "row splitting", especially if you know you'll access some columns more frequently or in a different pattern than the others; for example (a sketch of the split itself follows this list):
You might want to cluster or partition the two "endpoint" tables of a 1:1 relationship differently.
If your DBMS allows it, you might want to put them on different physical disks (e.g. more performance-critical on an SSD and the other on a cheap HDD).
You have measured the effect on caching and you want to make sure the "hot" columns are kept in cache, without "cold" columns "polluting" it.
You need a concurrency behavior (such as locking) that is "narrower" than the whole row. This is highly DBMS-specific.
You need different security on different columns, but your DBMS does not support column-level permissions.
Triggers are typically table-specific. While you can theoretically have just one table and have the trigger ignore the "wrong half" of the row, some databases may impose additional limits on what a trigger can and cannot do. For example, Oracle doesn't let you modify the so-called "mutating" table from a row-level trigger; by having separate tables, only one of them may be mutating, so you can still modify the other from your trigger (but there are other ways to work around that).
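As a rough illustration of such row splitting (all names here are made up; this is a sketch of the pattern, not a recommendation):

-- A logical 1:1 split into a narrow "hot" table and a wide "cold" one
-- that share the same primary key.
CREATE TABLE CustomerCore
(
    CustomerId INT           NOT NULL PRIMARY KEY,
    Name       NVARCHAR(100) NOT NULL
);

CREATE TABLE CustomerExtended
(
    CustomerId INT            NOT NULL PRIMARY KEY
        REFERENCES CustomerCore (CustomerId),
    Biography  NVARCHAR(MAX)  NULL,
    PhotoBlob  VARBINARY(MAX) NULL
);

-- Queries that only need the hot columns never touch the wide table:
-- SELECT Name FROM CustomerCore WHERE CustomerId = 42;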
Databases are very good at manipulating the data, so I wouldn't split the table just for the update performance, unless you have performed the actual benchmarks on representative amounts of data and concluded the performance difference is actually there and significant enough (e.g. to offset the increased need for JOINing).
On the other hand, if you are talking about "1:0 or 1" (and not a true 1:1), this is a different question entirely, deserving a different answer...
See also: When I should use one to one relationship?
Separation of duties and abstraction of database tables.
If I have a user and I design the system for each user to have an address, but then I change the system so a user can have more than one address, all I have to do is add a new record to the Address table instead of adding a brand new table and migrating the data.
EDIT
Right now, if you wanted to have a person record and each person had exactly one address record, you could either have a 1-to-1 relationship between a Person table and an Address table, or you could just have a Person table that also had the columns for the address.
In the future, you might decide to allow a person to have multiple addresses. In the 1-to-1 relationship scenario you would not have to change your database structure, only how you handle the data coming back to you. However, in the single-table structure you would have to create a new table and migrate the address data to it in order to arrive at a best-practice 1-to-many relationship.
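A minimal sketch of the two-table version (names are illustrative); the unique index is what keeps it 1-to-1 today, and dropping it later is all it takes to allow multiple addresses per person:

CREATE TABLE Person
(
    PersonId INT           NOT NULL PRIMARY KEY,
    FullName NVARCHAR(100) NOT NULL
);

CREATE TABLE Address
(
    AddressId INT           NOT NULL PRIMARY KEY,
    PersonId  INT           NOT NULL REFERENCES Person (PersonId),
    Street    NVARCHAR(200) NOT NULL,
    City      NVARCHAR(100) NOT NULL
);

-- Enforces at most one address per person for now; drop this index
-- later to move to a 1-to-many relationship without migrating data.
CREATE UNIQUE INDEX UX_Address_Person ON Address (PersonId);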
Well, on paper, normalized form looks to be the best. In the real world it is usually a trade-off. Most large systems that I know of make trade-offs rather than trying to be fully normalized.
I'll try to give an example. Say you are building a banking application with 10 million passbook accounts, and the usual transaction is just a query of the latest balance of a certain account. You have table A that stores just that information (account number, account balance, and account holder name).
The account also has another 40 attributes, such as the customer address, tax number, and ids for mapping to other systems, which live in table B.
A and B have a one-to-one mapping.
In order to retrieve the account balance fast, you may want to employ a different index strategy (such as a hash index) for the small table that holds the account balance and account holder name.
The table that contains the other 40 attributes may reside in a different tablespace or storage and employ different kinds of indexing, for example because you want to sort by name, account number, branch id, etc. Your system can tolerate slower retrieval of these 40 attributes, while the account balance query by account number needs to be fast.
Having all 43 attributes in one table seems natural, but it would probably be 'naturally slow' and unacceptable for just retrieving a single account balance.
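For instance, the hot-path query under that split might look like this ("table A" is sketched here as AccountSummary; all names are invented):

-- "Table A": narrow, clustered on the account number used by the lookup.
CREATE TABLE AccountSummary
(
    AccountNumber CHAR(12)      NOT NULL PRIMARY KEY CLUSTERED,
    HolderName    NVARCHAR(100) NOT NULL,
    Balance       DECIMAL(18,2) NOT NULL
);

-- The balance lookup touches only this narrow table; the wide "table B"
-- holding the other 40 attributes is never read on this path.
SELECT Balance, HolderName
FROM   AccountSummary
WHERE  AccountNumber = '000123456789';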
It makes sense to use 1-1 relationships to model an entity in the real world. That way, when more entities are added to your "world", they only also have to relate to the data that they pertain to (and no more).
That's the key really: your data (each table) should contain only enough data to describe the real-world thing it represents and no more. There should be no redundant fields, as all of them should make sense in terms of that "thing". It means that less data is repeated across the system (with the update issues that would bring!) and that you can retrieve individual pieces of data independently (not having to split/parse strings, for example).
To work out how to do this, you should research "Database Normalisation" (or Normalization), "Normal Form" and "first, second and third normal form". This describes how to break down your data. A version with an example is always helpful. Perhaps try this tutorial.
Often people are talking about a 1:0..1 relationship and call it a 1:1. In reality, a typical RDBMS cannot support a literal 1:1 relationship in any case.
As such, I think it's only fair to address sub-classing here, even though it technically necessitates a 1:0..1 relationship, and not the literal concept of a 1:1.
A 1:0..1 is quite useful when you have fields that would be exactly the same among several entities/tables. For example, contact information fields such as address, phone number, email, etc. that might be common for both employees and clients could be broken out into an entity made purely for contact information.
A contact table would hold common information, like address and phone number(s).
So an employee table holds employee specific information such as employee number, hire date and so on. It would also have a foreign key reference to the contact table for the employee's contact info.
A client table would hold client information, such as an email address, their employer name, and perhaps some demographic data such as gender and/or marital status. The client would also have a foreign key reference to the contact table for their contact info.
In doing this, every employee would have a contact, but not every contact would have an employee. The same concept would apply to clients.
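A bare-bones sketch of that layout (all table and column names are illustrative):

CREATE TABLE Contact
(
    ContactId   INT           NOT NULL PRIMARY KEY,
    Address     NVARCHAR(200) NULL,
    PhoneNumber VARCHAR(20)   NULL,
    Email       NVARCHAR(100) NULL
);

CREATE TABLE Employee
(
    EmployeeId     INT  NOT NULL PRIMARY KEY,
    EmployeeNumber INT  NOT NULL,
    HireDate       DATE NOT NULL,
    -- UNIQUE makes this a true 1:0..1: each contact row is used by at most one employee.
    ContactId      INT  NOT NULL UNIQUE REFERENCES Contact (ContactId)
);

CREATE TABLE Client
(
    ClientId     INT           NOT NULL PRIMARY KEY,
    EmployerName NVARCHAR(100) NULL,
    ContactId    INT           NOT NULL UNIQUE REFERENCES Contact (ContactId)
);

-- Every employee and every client points at exactly one contact row,
-- but a contact row need not have an employee (or a client) pointing at it.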
Just a few samples from past projects:
a record in a TestRequests table can have only one matching Report, but depending on the nature of the Request, the fields in the Report may be totally different.
in a banking project, an Entities table holds various kinds of entities: Funds, RealEstateProperties, Companies. Most of those Entities have similar properties, but Funds require about 120 extra fields, while they represent only 5% of the records.
I'm looking at the best-practice approach here. I have a web page that has several drop-down options. The drop-downs are not related; they are for misc. values (location, building codes, etc). The database right now has a table for each set of options (e.g. a table for building codes, a table for locations, etc). I'm wondering if I could just combine them all into one table (called listOptions) and then just query that one table.
Location Table
LocationID (int)
LocatValue (nvarchar(25))
LocatDescription (nvarchar(25))
BuildingCode Table
BCID (int)
BCValue (nvarchar(25))
BCDescription (nvarchar(25))
Instead of the above, is there any reason why I can't do this?
ListOptions Table
ID (int)
listValue (nvarchar(25))
listDescription (nvarchar(25))
groupID (int) //where groupid corresponds to Location, Building Code, etc
Now, when I query the table, I can pass the groupID to the query to pull back the values I need.
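For example, something like this (the groupID values are just placeholders I'd define somewhere):

-- Fetch all options for one drop-down, e.g. Building Codes.
SELECT ID, listValue, listDescription
FROM   ListOptions
WHERE  groupID = 2          -- 1 = Location, 2 = Building Code, ...
ORDER BY listValue;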
Putting them all in one table is an antipattern. These are different lookups, and you cannot enforce referential integrity in the database (which is the correct place to enforce it, as applications are often not the only way data gets changed) unless they are in separate tables. Data integrity is FAR more important than saving a few minutes of development time if you need an additional lookup.
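To illustrate the integrity point with the tables from your post (the Facility table here is hypothetical): a foreign key can only guarantee a valid Location or BuildingCode if each lookup has its own table; a key into a combined ListOptions table cannot tell a Location id from a Building Code id.

CREATE TABLE Facility
(
    FacilityId INT NOT NULL PRIMARY KEY,
    LocationID INT NOT NULL REFERENCES Location (LocationID),
    BCID       INT NOT NULL REFERENCES BuildingCode (BCID)
);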
If you plan to use the values later in some referencing foreign keys, it is better to use separate tables.
But why do you need an "all in one" table? Which problem does it solve?
You could do this.
I believe this is your master data, and it would not have a huge number of rows that might create performance problems.
Secondly, why would you want to do this once your app is up and running? It should have been thought about earlier. The tables might be used in a lot of places, and it could mean a lot of coding and, most importantly, testing.
Can you shed further light on your requirements?
You can keep them in separate tables and have your stored procedure return one set of data with a "datatype" key that signifies which set of values go with what option.
However, I would urge you to consider a much different approach. This suggestion is based on years of building data-driven websites. If these drop-down options don't change very often, then why not build server-side include files instead of querying the database? We did this with most of our websites. Think about it: each time the page is presented you query the database for the same list of values... that data hardly ever changes.
In cases where that data did have a tendency to change, we simply added a routine to the back-end admin that rebuilt the server-side include file whenever an add, change or delete was done to one of the lookup values. This reduced database I/Os and sped up the load time of all our websites.
We had approximately 600 websites on the same server, all using the same instance of SQL Server (separate databases), and our total server database I/Os were drastically reduced.
Edit:
We simply built SSI that looked like this...
<option value="1">Blue</option>
<option value="2">Red</option>
<option value="3">Green</option>
With a single table it would be easy to add new groups instead of creating new tables, but for best-practice concerns you should also have a group table so you can name those groups in the db for future maintenance.
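A sketch of what that group table could look like next to the proposed ListOptions table (constraint and column names are assumptions):

CREATE TABLE ListGroups
(
    groupID   INT          NOT NULL PRIMARY KEY,
    groupName NVARCHAR(50) NOT NULL   -- e.g. 'Location', 'Building Code'
);

-- ListOptions.groupID now has a named, enforceable parent.
ALTER TABLE ListOptions
    ADD CONSTRAINT FK_ListOptions_ListGroups
        FOREIGN KEY (groupID) REFERENCES ListGroups (groupID);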
The best practice depends on your requirements.
Do the values of location and building vary frequently? Where do the values come from? Are they imported from external data? Do other tables refer to the single table (so that you need a two-field key to properly join the tables)?
For example, I use a single table with heterogeneous data for constants or configuration values.
But if the data vary often or are imported from an external source, I prefer to use separate tables.
I am working on a project in which I generate a unique customer id from the first letter of the customer's last name, and I have stored customers in different tables accordingly: if a customer's name starts with A, the whole of that customer's information is stored in the Registration_A table. I have created Registration tables like this all the way up to Z. But retrieving data with such a structure is quite difficult. Can you suggest another way to store the data so that retrieval becomes more flexible?
Put all of your registration data into one table. There's absolutely no need for you to break it into alphabetical pieces like that unless you have some serious performance issues.
When querying for registration data, use SQL's WHERE clause to narrow down your results.
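For example (a sketch against a single, merged Registration table with hypothetical column names):

-- All customers whose last name starts with 'A', without needing Registration_A.
SELECT CustomerId, FirstName, LastName
FROM   Registration
WHERE  LastName LIKE 'A%';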
You have to merge this into one table, Registration, and then let the database take care of unique ids. How to do that depends on your database, but searching for PRIMARY KEY or AUTO INCREMENT should give you lots of results.
If you did the splitting for performance reasons, you can add an index on the user's last name.
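A minimal sketch of the merged table (SQL Server syntax; column names are illustrative):

CREATE TABLE Registration
(
    CustomerId INT IDENTITY(1,1) NOT NULL PRIMARY KEY,  -- database-generated unique id
    FirstName  NVARCHAR(50) NOT NULL,
    LastName   NVARCHAR(50) NOT NULL
    -- ... the rest of the customer information
);

-- Optional: keeps last-name prefix searches fast after the merge.
CREATE INDEX IX_Registration_LastName ON Registration (LastName);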
(Sorry about the vagueness of the title; I can't think how to really say what I'm looking for without writing a book.)
So in our app, we allow users to change key pieces of data. I'm keeping records of who changed what when in a log schema, but now the problem presents itself: how do I best represent that data in a view for reporting?
An example will help: a customer's data (say, billing address) changed on 4/4/09. Let's say that today, 10/19/09, I want to see all of their 2009 orders, before and after the change. I also want each order to display the billing address that was current as of the date of the order.
So I have 4 tables:
Orders (with order data)
Customers (with current customer data)
CustomerOrders (linking the two)
CustomerChange (which holds the date of the change, who made the change (employee id), what the old billing address was, and what they changed it to)
How do I best structure a view to be used by reporting so that the proper address is returned? Or am I better served by creating a reporting database and denormalizing the data there, which is what the reports group is requesting?
There is no need for a separate DB if this is the only thing you are going to do. You could just create a de-normalized table/cube and populate and retrieve from it. If your data is voluminous, apply proper indexes to this table.
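A sketch of how such a de-normalized reporting table might be populated, picking the billing address that was in effect on the order date out of the change log (rpt_OrderBilling and all column names are assumptions about the schema described in the question):

-- One row per order, carrying the billing address current as of the order date.
INSERT INTO rpt_OrderBilling (OrderId, OrderDate, CustomerId, BillingAddress)
SELECT  o.OrderId,
        o.OrderDate,
        co.CustomerId,
        COALESCE(ch.OldBillingAddress, c.BillingAddress) AS BillingAddress
FROM    Orders o
JOIN    CustomerOrders co ON co.OrderId = o.OrderId
JOIN    Customers c       ON c.CustomerId = co.CustomerId
OUTER APPLY (
    -- The earliest change made AFTER the order: its "old" value is what the
    -- address was at order time; if no later change exists, the current value applies.
    SELECT TOP (1) cc.OldBillingAddress
    FROM   CustomerChange cc
    WHERE  cc.CustomerId = co.CustomerId
      AND  cc.ChangeDate > o.OrderDate
    ORDER BY cc.ChangeDate
) ch;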
Personally, I would design this so you don't need the change table for the report. It is a bad practice to store an order without all the data as of the date of the order stored in a table. You look up the address from the address table and store it with the order (the same goes for part numbers, company names and anything else that changes over time). You never get information on an order by joining to customer, address, part number, price tables, etc.
Audit tables are more for fixing bad changes or looking up who made them than for reporting.