I'm kinda new in databases and I need to solve this one:
client(NAME, CITY, STREET, NUMBER, BALANCE, CLIENT_CODE);
order(ORDER_NUMBER, CLIENT_CODE, PROVIDER_NAME, FUEL, QTY, DATE);
provider(PROVIDER_NAME, PROVIDER_ADRESS, FUEL_CODE, PRICE, STOCK);
fuel(FUEL_CODE, FUEL_NAME);
I've already tried (and succeeded, I guess) to create a junction table for provider and fuel, named provider_fuel, to resolve the many-to-many relationship, but I have no idea how to create a connection between order and provider.
I've changed the entities like this:
provider(PROVIDER_ID, PROVIDER_ADRESS);
fuel(FUEL_CODE, FUEL_NAME);
provider_fuel(ID_ENTRY, ID_PROVIDER, ID_FUEL, PRICE, STOCK).
Is that even ok? If so, how can I make connections between all those entities in my database?
I have to mention that my app should allow clients to place orders from providers who have their specific fuels, prices and so on.
I would suggest the following table layouts:
address(ID_ADDRESS, CITY, STREET, NUMBER);
client(ID_CLIENT, NAME, ID_ADDRESS);
client_order(ID_CLIENT_ORDER, ID_CLIENT, RECEIVED_DATE);
client_order_detail(ID_CLIENT_ORDER_DETAIL, ID_CLIENT_ORDER, ID_PROVIDER, ID_PRODUCT,
ORDER_QTY, STATUS, DELIVERED_DATE); -- Status in ('OPEN', 'DELIVERED')
provider(ID_PROVIDER, NAME, ID_ADDRESS);
product(ID_PRODUCT, PRODUCT_NAME);
provider_product(ID_PROVIDER, ID_PRODUCT, PRICE, STOCK_QTY);
You can, if you like, expand this. For example, a provider might have multiple locations from which product can be supplied, in which case you'd need to re-work the single PROVIDER table into something like
PROVIDER(ID_PROVIDER, NAME)
PROVIDER_PRODUCT(ID_PROVIDER, ID_PRODUCT, PRICE)
PROVIDER_ADDRESS(ID_PROVIDER_ADDRESS, ID_PROVIDER, ID_ADDRESS)
and then rework PROVIDER_PRODUCT as
PROVIDER_ADDRESS_PRODUCT(ID_PROVIDER_ADDRESS, ID_PRODUCT, STOCK_QTY)
I suppose it's also possible that the price a provider charges might depend on the location from which it's shipped so you might need to change the model to accommodate that. The point is that there are many different ways to do this, and it depends very much on the requirements you have.
EDIT
To obtain the total value of an order using the tables above you'd use a query such as:
SELECT co.ID_CLIENT_ORDER, SUM(cod.ORDER_QTY * pp.PRICE) AS ORDER_TOTAL_VALUE
FROM CLIENT_ORDER co
INNER JOIN CLIENT_ORDER_DETAIL cod
ON cod.ID_CLIENT_ORDER = co.ID_CLIENT_ORDER
INNER JOIN PROVIDER_PRODUCT pp
ON pp.ID_PROVIDER = cod.ID_PROVIDER AND
pp.ID_PRODUCT = cod.ID_PRODUCT
GROUP BY co.ID_CLIENT_ORDER
ORDER BY co.ID_CLIENT_ORDER
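The query above can be tried end to end with a minimal sketch. It uses Python's built-in sqlite3 module purely for convenience (the thread itself concerns Oracle), builds only the three tables the query touches, and fills them with made-up sample rows. The column types and constraints are assumptions, not part of the suggested layout.

```python
import sqlite3

# In-memory database; types and constraints are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE provider_product (
    id_provider INTEGER NOT NULL,
    id_product  INTEGER NOT NULL,
    price       REAL    NOT NULL,
    stock_qty   INTEGER NOT NULL,
    PRIMARY KEY (id_provider, id_product)
);
CREATE TABLE client_order (
    id_client_order INTEGER PRIMARY KEY,
    id_client       INTEGER NOT NULL,
    received_date   TEXT    NOT NULL
);
CREATE TABLE client_order_detail (
    id_client_order_detail INTEGER PRIMARY KEY,
    id_client_order INTEGER NOT NULL REFERENCES client_order(id_client_order),
    id_provider     INTEGER NOT NULL,
    id_product      INTEGER NOT NULL,
    order_qty       INTEGER NOT NULL,
    status          TEXT    NOT NULL CHECK (status IN ('OPEN', 'DELIVERED')),
    delivered_date  TEXT,
    FOREIGN KEY (id_provider, id_product)
        REFERENCES provider_product(id_provider, id_product)
);
-- Hypothetical sample data
INSERT INTO provider_product VALUES (1, 10, 2.0, 1000), (1, 11, 3.0, 500), (2, 10, 2.5, 800);
INSERT INTO client_order VALUES (100, 7, '2024-01-15'), (101, 8, '2024-01-16');
INSERT INTO client_order_detail VALUES
    (1, 100, 1, 10, 50, 'OPEN', NULL),
    (2, 100, 1, 11, 20, 'OPEN', NULL),
    (3, 101, 2, 10, 10, 'DELIVERED', '2024-01-17');
""")

# Same shape as the query above: one row per order with its total value.
order_totals = conn.execute("""
    SELECT co.id_client_order, SUM(cod.order_qty * pp.price) AS order_total_value
    FROM client_order co
    INNER JOIN client_order_detail cod
        ON cod.id_client_order = co.id_client_order
    INNER JOIN provider_product pp
        ON pp.id_provider = cod.id_provider AND
           pp.id_product = cod.id_product
    GROUP BY co.id_client_order
    ORDER BY co.id_client_order
""").fetchall()

print(order_totals)
```

Because the total is computed from the current provider_product price, it changes whenever a provider changes a price.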
You can put id_provider in the order table and get the provider details through that ID. For example (ORDER is a reserved word in SQL, so the table name has to be quoted or changed):
SELECT o.*, p.price, p.fuel_code FROM "order" o JOIN provider p ON o.id_provider = p.id
Welcome to the Oracle database world!
You can make connections between tables or views using joins.
In your example, you can connect your tables like this:
SELECT od.* FROM order od, provider pd WHERE od.provider_name=pd.provider_name
Or you can write it like this. It's up to your style. But I prefer the first one.
SELECT od.* FROM order od INNER JOIN provider pd ON od.provider_name=pd.provider_name
But it's generally better to join on NUMBER type columns. So I suggest creating a provider_id column (NUMBER type) in both tables and joining on that.
There are five main types of join in SQL (INNER, LEFT OUTER, RIGHT OUTER, FULL OUTER and CROSS). Each of them returns a different combination of rows.
If I understand correctly your database represents a market place where a client can buy different kinds of fuels from providers. So far you have used natural keys:
A client has some login code that identifies them (client_code).
A provider is identified by their name. Do you want this? That would mean a provider's name stays constant and cannot change. That may or may not be desired. If the company JACOB'S FUELSHOP INC changes to THE FUELSHOP INC, do you want to treat it as the same company or a different one?
You also generate order numbers. Do you want them unique in your database, or per client or per provider? The order table's key would accordingly be either just order_number or order_number + client_code or order_number + provider_name.
Now you don't want to have one provider only sell one type of fuel, but various types. You introduce a bridge table. But in the same step you introduce technical IDs. Do you want to use technical IDs instead of natural keys?
You have provider(PROVIDER_ID, PROVIDER_ADRESS), where the provider's name is either missing or included in the address. This allows you to change the name later. But remember that you may then want to store the name with every order, as you may want to reprint the order later or just say whether you placed it with the legal company JACOB'S FUELSHOP INC or THE FUELSHOP INC. Technical IDs are fine, but you must remember to still keep unique constraints on the natural keys and think about consequences such as the one just mentioned.
As to the fuel price: You are storing the current fuel price, but you are not storing the price when ordered. Add the price to the orders table.
As to your question: Order and provider are already linked by provider_name in the original design. In the new design you've introduced the provider_id. If you want to use this instead, put provider_id in the orders table. As mentioned, you may or may not want the provider name in that table, too.
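The advice about storing the price at order time can be sketched as follows. This is a minimal illustration using Python's sqlite3 module. The table name fuel_order (ORDER is a reserved word), the unit_price column, the place_order helper and the sample values are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE provider_fuel (
    id_provider INTEGER NOT NULL,
    id_fuel     INTEGER NOT NULL,
    price       REAL    NOT NULL,   -- current price
    stock       INTEGER NOT NULL,
    PRIMARY KEY (id_provider, id_fuel)
);
CREATE TABLE fuel_order (
    order_number INTEGER PRIMARY KEY,
    client_code  TEXT    NOT NULL,
    id_provider  INTEGER NOT NULL,
    id_fuel      INTEGER NOT NULL,
    qty          INTEGER NOT NULL,
    unit_price   REAL    NOT NULL,  -- price at the time of ordering
    order_date   TEXT    NOT NULL,
    FOREIGN KEY (id_provider, id_fuel)
        REFERENCES provider_fuel(id_provider, id_fuel)
);
INSERT INTO provider_fuel VALUES (1, 10, 1.5, 1000);  -- hypothetical row
""")


def place_order(order_number, client_code, id_provider, id_fuel, qty, order_date):
    """Insert an order, copying the provider's current price into the order row."""
    conn.execute(
        """
        INSERT INTO fuel_order
            (order_number, client_code, id_provider, id_fuel, qty, unit_price, order_date)
        SELECT ?, ?, pf.id_provider, pf.id_fuel, ?, pf.price, ?
        FROM provider_fuel pf
        WHERE pf.id_provider = ? AND pf.id_fuel = ?
        """,
        (order_number, client_code, qty, order_date, id_provider, id_fuel),
    )


place_order(1, "C001", 1, 10, 40, "2024-01-15")

# The provider raises the price afterwards; the order keeps the old one.
conn.execute("UPDATE provider_fuel SET price = 2.0 WHERE id_provider = 1 AND id_fuel = 10")

ordered_price = conn.execute(
    "SELECT unit_price FROM fuel_order WHERE order_number = 1"
).fetchone()[0]
current_price = conn.execute(
    "SELECT price FROM provider_fuel WHERE id_provider = 1 AND id_fuel = 10"
).fetchone()[0]
print(ordered_price, current_price)
```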
I'm designing a database with a connection I haven't encountered before and wondering the best approach.
Let's say I have an Invoice, and that invoice can be assigned to an Organization, or an Individual, and in some cases that individual can be part of an Organization.
The way I have this thought-out so far is as follows:
Organizations    Invoices            Individuals
-pk              -pk                 -pk
-name            -organization_id    -org_member_id
-address_id      -individual_id      -name
-...             -...                -...
So if an invoice is assigned to an individual, the individual_id is used. If that individual is associated with an organization, then a through association would pick that up (but I imagine organization_id would remain nil?). However, if only an organization is assigned to the invoice, then individual_id would of course be nil.
Not sure what the best way to go about this is. Thanks in advance for any advice.
There are multiple ways in which you can approach this.
One approach: as you mentioned, if the invoice is individual-based, fill only individual_id and keep organization_id NULL. If that individual is part of an organization, you can also fill in that organization's ID, so organization_id should be a nullable column in your schema. If the invoice is assigned to an organization only, fill in that ID and keep individual_id NULL.
Another approach: introduce a column named assignee_type [char(1)] holding either O or I to indicate the type of assignment, and fill a single assignee_id column with either the Individual or the Organization ID. When you query the data, you need to refer to the assignee_type column and, based on that, join with either the Organization table or the Individual table, which can add overhead.
Both approaches have their pros and cons. Which one to take depends on how you are going to retrieve data from this Invoice table.
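The second approach can be sketched with Python's sqlite3 module. This is only an illustration: the table layouts, the organization_id column on individuals and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organizations (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE individuals (
    id              INTEGER PRIMARY KEY,
    name            TEXT NOT NULL,
    organization_id INTEGER REFERENCES organizations(id)  -- NULL if not a member
);
CREATE TABLE invoices (
    id            INTEGER PRIMARY KEY,
    assignee_type TEXT    NOT NULL CHECK (assignee_type IN ('O', 'I')),
    assignee_id   INTEGER NOT NULL   -- points at organizations or individuals
);
-- Hypothetical sample data
INSERT INTO organizations VALUES (1, 'Acme Ltd');
INSERT INTO individuals VALUES (1, 'Alice', 1), (2, 'Bob', NULL);
INSERT INTO invoices VALUES (100, 'O', 1), (101, 'I', 1), (102, 'I', 2);
""")

# assignee_type decides which of the two LEFT JOINs can match for a given row.
invoice_assignees = conn.execute("""
    SELECT inv.id, inv.assignee_type, COALESCE(org.name, ind.name) AS assignee_name
    FROM invoices inv
    LEFT JOIN organizations org
        ON inv.assignee_type = 'O' AND org.id = inv.assignee_id
    LEFT JOIN individuals ind
        ON inv.assignee_type = 'I' AND ind.id = inv.assignee_id
    ORDER BY inv.id
""").fetchall()

print(invoice_assignees)
```

One trade-off visible in the sketch: assignee_id cannot carry a foreign key constraint, because it refers to two different tables.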
I have a scenario in which three types of entities share the same set of fields (except for their primary keys).
Below is a sample. I would like to know whether it is a better idea to group the common fields in a single table. If we create a common table, how can we give it an FK reference to the corresponding primary key table? What would be the better approach?
tblCountry         tblState           tblCity
countryid          StateId            CityId
Name               CountryId          StateId
officiallanguage   officiallanguage   officiallanguage
officialFlag       officialFlag       officialFlag
officialFlower     officialFlower     officialFlower
officialAnimal     officialAnimal     officialAnimal
officialBird       officialBird       officialBird
...                ...                ...
etc                etc                etc
Your intended third normal form (3NF) is good as it is.
From a simplicity point of view (affecting joins) it is as good as it can be. And the foreign keys between country, state and city are obvious and trivial.
Now, to save yourself from copying some column names, you could put all three elements (country, state and city) into a single table, effectively reducing it to second normal form. With this, the meaning of your columns starts to depend on the row: officialLanguage could relate to a country, a state or a city. That is no longer obvious from the table design, only from interpreting the multi-column key.
So in short, by saving on some typing and copying, you would complicate all further work: a single table with a convoluted meaning instead of three tables with clear meanings.
Now towards data selection this is an issue only if there are no aliases.
Consider selecting officialLanguage of a city in a country.
SELECT
name,
officialLanguage,
name,
officialLanguage
FROM city
INNER JOIN state
ON state.stateid = city.stateid
INNER JOIN country
ON country.countryid = state.countryid
;
This will fail as the columns chosen are ambiguous.
Now consider this query (where the aliases are shortened just to demonstrate them; personally I try to use aliases of up to 10 letters):
SELECT
cit.name AS city_name,
cit.officialLanguage AS city_language,
cou.name AS country_name,
cou.officialLanguage AS country_language
FROM city AS cit
INNER JOIN state AS sta
ON sta.stateid = cit.stateid
INNER JOIN country AS cou
ON cou.countryid = sta.countryid
;
It is very clear and concise. I can use country table in different queries without having to pre-select those countries from a table with intermingled objects like country, state and city.
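Both queries above can be run against a throwaway database. The following is a minimal sketch with Python's sqlite3 module. The column types and the sample rows are assumptions, and the exact error text will differ between database engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country (countryid INTEGER PRIMARY KEY, name TEXT, officialLanguage TEXT);
CREATE TABLE state (
    stateid INTEGER PRIMARY KEY,
    countryid INTEGER REFERENCES country(countryid),
    name TEXT, officialLanguage TEXT
);
CREATE TABLE city (
    cityid INTEGER PRIMARY KEY,
    stateid INTEGER REFERENCES state(stateid),
    name TEXT, officialLanguage TEXT
);
-- Hypothetical sample data
INSERT INTO country VALUES (1, 'Exampleland', 'Alpha');
INSERT INTO state VALUES (1, 1, 'North State', 'Alpha');
INSERT INTO city VALUES (1, 1, 'Sample City', 'Beta');
""")

# Without aliases the engine cannot tell which table "name" belongs to.
try:
    conn.execute("""
        SELECT name, officialLanguage, name, officialLanguage
        FROM city
        INNER JOIN state ON state.stateid = city.stateid
        INNER JOIN country ON country.countryid = state.countryid
    """)
    unaliased_error = None
except sqlite3.OperationalError as exc:
    unaliased_error = str(exc)

# With table aliases and column aliases every column is unambiguous.
aliased_rows = conn.execute("""
    SELECT
        cit.name AS city_name,
        cit.officialLanguage AS city_language,
        cou.name AS country_name,
        cou.officialLanguage AS country_language
    FROM city AS cit
    INNER JOIN state AS sta ON sta.stateid = cit.stateid
    INNER JOIN country AS cou ON cou.countryid = sta.countryid
""").fetchall()

print(unaliased_error)
print(aliased_rows)
```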
The only downside to this approach is the multi join of properly indexed tables.
Also, as there are quite a few countries, states and cities across the world, the single-table approach can become a performance issue down the line.
4NF (or at least BCNF, also known as 3.5NF) is best for fast performance in joins when the tables are properly indexed, with the trade-off that the joins can become complex to write. For database engines, however, these are the easiest to process.
2NF tables (or Excel tables, as I call them) are the easiest for humans to read, but they require complicated joins and/or conditions (WHERE clauses) to properly identify just a subset.
For the database design, use at least 3NF, then prepare views that turn the data back into 2NF to make it human-readable.
You're asking for advice around when you should consider further normalisation on a schema.
The answer from @KnutBoehnert is quite detailed; my answer is really a supplement to that.
The goal of normalizing the database structure is NOT to reduce duplicated field names to a single reference, but to reduce the duplication of the data that is stored within those fields. Providing a definitive answer for a given schema would normally require you to provide a set of records; however, your dataset is simple enough to correlate to real-world norms that we can talk about some hypotheticals.
This is different to inheritance or composition in OO programming, where I would strongly encourage you to encapsulate these fields either as properties on a common base class between the objects that represent records from these tables, or as an interface that they all compose into their structure.
Only when Country/State/City commonly have the same values for the duplicated field names is there a strong argument for refactoring this structure and introducing a separate table to hold the values of Language, Flag, Flower, Animal and Bird.
If you did create a separate table for this information, how often would the records in it be re-used? For instance, how many different countries are going to reference the same record? Most likely none, as the flag is usually unique for each country, and certainly the combination of all those fields will be unique per country. For states the same is usually true. If there is no re-use of the records in this new table, if the relationship is always 1:1, then the database and the queries are optimised by leaving the fields within the tables the way you have them now.
1:1 relationships do have a place, especially if for a given conceptual record there are distinct use cases where one set of fields is updated or queried in isolation from another set and the two sets have very different query rates. However, without any further supporting reasons, 1:1 relationships in a schema can be simplified by merging the two tables into one record.
If, for instance, all states within a country and all cities within a state are expected to have the same Language, then you would not need the field at the lower levels at all, and you could use joins to access the Language field from the Country record. But in the physical world there are many countries with states whose local languages or dialects differ from that of the country as a whole, so I don't think this applies to this particular schema.
In fact I see no reason for your current schema to change. The structure as it is now even allows you to use null values to indicate that the value should be coalesced from the parent level.
In Australia, for instance, all of the records for State and City are more than likely to have a Language value of English. We can use null coalescing statements to avoid having to enter and maintain this value in each State and City record, so if all Australian states and cities were left with a value of null in the officialLanguage field, we could still coalesce it from the related parent country:
SELECT
country.Name AS country_name,
country.officialLanguage AS country_language,
state.name AS state_name,
IsNull(state.officialLanguage,country.officialLanguage) AS state_language,
city.name AS city_name,
COALESCE(city.officialLanguage, state.officialLanguage, country.officialLanguage) AS city_language
FROM tblCity AS city
INNER JOIN tblState AS state
ON state.Stateid = city.Stateid
INNER JOIN tblCountry AS country
ON country.Countryid = state.Countryid
To query the records for just the city table in this way could look like this:
I'm not sure it makes sense to coalesce the other fields, but it could be done
SELECT
CityId,
city.StateId,
state.CountryId,
...
COALESCE(city.officialLanguage, state.officialLanguage, country.officialLanguage) AS officialLanguage,
COALESCE(city.officialFlag, state.officialFlag, country.officialFlag) AS officialFlag,
COALESCE(city.officialFlower, state.officialFlower, country.officialFlower) AS officialFlower,
COALESCE(city.officialAnimal, state.officialAnimal, country.officialAnimal) AS officialAnimal,
COALESCE(city.officialBird, state.officialBird, country.officialBird) AS officialBird
FROM tblCity AS city
INNER JOIN tblState AS state
ON state.Stateid = city.Stateid
INNER JOIN tblCountry AS country
ON country.Countryid = state.Countryid
In many real world applications of this null coalescing concept the application layer would handle the display of these fields differently, but the storage is what we are most concerned about for this question today.
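The null coalescing idea above can be checked with a small runnable sketch. It uses Python's sqlite3 module and COALESCE only (IsNull is specific to SQL Server). The tables are trimmed to the language column, and all names and values in the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblCountry (CountryId INTEGER PRIMARY KEY, Name TEXT, officialLanguage TEXT);
CREATE TABLE tblState (
    StateId INTEGER PRIMARY KEY,
    CountryId INTEGER REFERENCES tblCountry(CountryId),
    Name TEXT, officialLanguage TEXT
);
CREATE TABLE tblCity (
    CityId INTEGER PRIMARY KEY,
    StateId INTEGER REFERENCES tblState(StateId),
    Name TEXT, officialLanguage TEXT
);
-- NULL means "inherit from the parent level"
INSERT INTO tblCountry VALUES (1, 'Country A', 'English');
INSERT INTO tblState VALUES (1, 1, 'State A', NULL), (2, 1, 'State B', 'Dialect B');
INSERT INTO tblCity VALUES
    (1, 1, 'City A1', NULL),       -- falls back to the country
    (2, 2, 'City B1', NULL),       -- falls back to its state
    (3, 2, 'City B2', 'Local C');  -- has its own value
""")

# The first non-NULL value wins, searching city, then state, then country.
city_languages = conn.execute("""
    SELECT
        city.Name AS city_name,
        COALESCE(city.officialLanguage,
                 state.officialLanguage,
                 country.officialLanguage) AS city_language
    FROM tblCity AS city
    INNER JOIN tblState AS state ON state.StateId = city.StateId
    INNER JOIN tblCountry AS country ON country.CountryId = state.CountryId
    ORDER BY city.CityId
""").fetchall()

print(city_languages)
```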
I've run into a bit of a pickle during my development of a web application. I've boiled down the complexity of the application for sake of simplicity in this question.
The purpose of this web application is to sell insurance. Insurance can be purchased through an agent (Agency) or over the phone directly (Customer). Insurance policies can be paid through the agency or the customer can pay for the policy directly. So money is owed (invoiced) and received (payments) from multiple sources (Agencies/Customers).
Billing Options:
Agency (Agency collects from customer outside of app)
Customer
Here's where it gets complicated. Agencies are stored in a separate database table from customers (for obvious reasons). However, both agencies and customers need to be able to make payments and have invoices assigned to them. I'm having difficulty figuring out how to create the proper database schema to allow both types of database records to be connected to their invoices and payments.
My initial plan was to set up separate relationship (joining) tables that link the agencies and customers to invoices/payments.
However, now that I've been thinking about the problem more, I think it might be beneficial to merge both agencies and customers into a single "Payee" table which would then be associated with payments/invoices. The payee table would only store a primary key. It would not contain actual names or info for the payee - instead I would pull that data via a JOIN with either the agencies or customers tables.
Whichever solution I choose, I am still faced with the problem that, when creating a new payment record, I need to scan both the agencies and the customers table for possible payees. I'm wondering if there's a proper way to approach this from a database schema standpoint (or from an accounting/e-commerce standpoint).
What is the correct way to handle this type of situation? All ideas and possible solutions are most welcome!
Update 01:
After a few helpful suggestions (see below) I've come up with a possible solution that may solve this issue while keeping the data normalized.
The one thing about this method that rubs me the wrong way is that I will have to make multiple table selects to get a list of all the people who can potentially make payments and/or have invoices assigned to them.
Perhaps this is unavoidable though in this situation since indeed there are different "types" of people that can be associated with payments and invoices. I'm stuck with a situation where I have two different types of records that need to be associated to the same thing. In the above approach I'm using the FKs to link each table (Agencies/Customers) to a Payee record (the table that unifies both Agencies/Customers) and then ultimately links them to Payments and Invoices.
Is this the proper solution? Or is there something I've overlooked?
There are several options:
You could model this the way you would with OOP and inheritance:
There is one table Person which holds a unique ID and a type (Agency, Customer, more in future). Additionally you might add columns with metadata (who inserted the row, when and why) and columns for status or soft delete.
There are two tables Agency and Customer, both holding a PersonID as FK.
Your Payee is the Person.
You might use a schema-bound VIEW with a UNION ALL to return both tables of your model in one result. A unique index on this view should ensure that you have a unique key, at least as a combination of the table source and the ID there. (Check that your DBMS allows this; SQL Server, for one, does not permit an index on a view that contains UNION.)
You might use a middle table with the table source and the ID there as a unique key, and use this two-column ID in your payment process.
There are surely several more...
My favourite would be the first option...
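The first option (one Person table with Agency and Customer hanging off it) could look roughly like the sketch below, written with Python's sqlite3 module. The column names, the invoice table and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked to
conn.executescript("""
CREATE TABLE person (
    person_id   INTEGER PRIMARY KEY,
    person_type TEXT NOT NULL CHECK (person_type IN ('AGENCY', 'CUSTOMER'))
);
CREATE TABLE agency (
    agency_id   INTEGER PRIMARY KEY,
    person_id   INTEGER NOT NULL UNIQUE REFERENCES person(person_id),
    agency_name TEXT NOT NULL
);
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    person_id   INTEGER NOT NULL UNIQUE REFERENCES person(person_id),
    full_name   TEXT NOT NULL
);
CREATE TABLE invoice (
    invoice_id INTEGER PRIMARY KEY,
    payee_id   INTEGER NOT NULL REFERENCES person(person_id),
    amount_due REAL NOT NULL
);
-- Hypothetical sample data
INSERT INTO person VALUES (1, 'AGENCY'), (2, 'CUSTOMER');
INSERT INTO agency VALUES (1, 1, 'Agency One');
INSERT INTO customer VALUES (1, 2, 'Customer Two');
INSERT INTO invoice VALUES (1, 1, 250.0);
""")

# One query lists every possible payee, whatever its type.
payees = conn.execute("""
    SELECT p.person_id, p.person_type,
           COALESCE(a.agency_name, c.full_name) AS display_name
    FROM person p
    LEFT JOIN agency a ON a.person_id = p.person_id
    LEFT JOIN customer c ON c.person_id = p.person_id
    ORDER BY p.person_id
""").fetchall()

# An invoice cannot point at a payee that does not exist.
try:
    conn.execute("INSERT INTO invoice VALUES (2, 999, 10.0)")
    unknown_payee_rejected = False
except sqlite3.IntegrityError:
    unknown_payee_rejected = True

print(payees, unknown_payee_rejected)
```

Invoices and payments then need a single, ordinary foreign key, regardless of how many payee types are added later.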
My suggestion would be: instead of Payees table - to have two linking tables:
PayeeInvoices {
Id, --PK
PayeeId,
PayeeType,
InvoiceId --FK to Invoices table
}
and
PayeePayments {
Id, --PK
PayeeId,
PayeeType,
PaymentId --FK to Payments table.
}.
PayeeType is an option of two: Customer or Agency. When creating a new payment record you can query PayeeInvoices by InvoiceId to get PayeeType and corresponding PayeeId, and then lookup the rest of the data in corresponding tables.
EDIT:
Having second thoughts now. Instead of two extra tables, PayeeInvoices and PayeePayments, you can just have PayeeId and PayeeType columns right in the Invoices and Payments tables, assuming that an Invoice or Payment belongs to only one Payee (Customer or Agency). Neither of my solutions is really normalized, though.
While surfing through 9gag.com, an idea (or rather a problem) came to mind. Let's say that I want to create a website where users can add different kinds of entries. Each entry is of a different type and needs different or additional columns.
Let's say that we can add:
a youtube video
a quote, which requires the author's first and last name
a flash game, which requires an additional game category, description, genre etc.
an image, which requires the link
All of the above are entries and have some columns in common (like id, add_date, adding_user_id, etc.) and some different or additional ones (for example, only a flash game needs a description and only an image needs the plus_18 column to be specified). The question is: how should I organize the DB / code to handle all of the above together as entries? I might want to order them, or search entries by add_date etc.
The ideas that came to mind:
Add a "type" column which specifies what kind of entry it is, and add all the possible columns, allowing NULL in the columns that don't apply to a particular type. But this is mega nasty. There is no data integrity.
Add a column with serialized data for the additional fields, but that makes any filtering total hell.
Create a master (parent) table for an entry and separate tables for the concrete entry types (their additional columns / info). But here I don't even know how I'm supposed to select data properly, and it's just nasty as well.
So what's the best way to solve this problem?
The parent table seems like the best option.
// This is the parent table
Entry
    ID PK
    Common fields
Video
    ID PK
    EntryID FK
    Unique fields
Game
    ID PK
    EntryID FK
    Unique fields
...
What the queries will look like will largely depend on the type of query. To, for example, get all games ordered by a certain date, the query will look something like:
SELECT *
FROM Game
JOIN Entry ON Game.EntryID = Entry.ID
ORDER BY Entry.AddDate
Getting all content ordered by date will be somewhat messy. For example:
SELECT *
FROM Entry
LEFT JOIN Game ON Game.EntryID = Entry.ID
LEFT JOIN Video ON Video.EntryID = Entry.ID
...
ORDER BY Entry.AddDate
If you want to run queries like the one above, I suggest you give unique names to your primary key fields (i.e. VideoID and GameID) so you can easily identify which type of entry you're dealing with (by checking GameID IS NOT NULL for example).
Or you could add a Type field in Entry.
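The parent-table layout, together with the suggestion to use distinct key names such as VideoID and GameID, can be sketched as follows using Python's sqlite3 module. The concrete columns (YoutubeUrl, Category, Description) and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Entry (
    ID           INTEGER PRIMARY KEY,
    AddDate      TEXT    NOT NULL,
    AddingUserID INTEGER NOT NULL
);
CREATE TABLE Video (
    VideoID    INTEGER PRIMARY KEY,
    EntryID    INTEGER NOT NULL UNIQUE REFERENCES Entry(ID),
    YoutubeUrl TEXT NOT NULL
);
CREATE TABLE Game (
    GameID      INTEGER PRIMARY KEY,
    EntryID     INTEGER NOT NULL UNIQUE REFERENCES Entry(ID),
    Category    TEXT NOT NULL,
    Description TEXT
);
-- Hypothetical sample data
INSERT INTO Entry VALUES (1, '2024-01-02', 5), (2, '2024-01-01', 6);
INSERT INTO Video VALUES (1, 1, 'https://example.com/watch?v=abc');
INSERT INTO Game VALUES (1, 2, 'puzzle', 'A small puzzle game');
""")

# All entries in date order; exactly one of VideoID / GameID is non-NULL per row.
rows = conn.execute("""
    SELECT Entry.ID, Entry.AddDate, Video.VideoID, Game.GameID
    FROM Entry
    LEFT JOIN Game ON Game.EntryID = Entry.ID
    LEFT JOIN Video ON Video.EntryID = Entry.ID
    ORDER BY Entry.AddDate
""").fetchall()

# Work out the entry type from whichever child key is present.
kinds = ["game" if game_id is not None else "video"
         for (_entry_id, _add_date, _video_id, game_id) in rows]

print(rows)
print(kinds)
```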
I understand how to design a database schema that has simple one-to-many relationships between its tables. I would like to know what the convention or best practice is for designating one particular relationship in that set as the primary one. For instance, one Person has many CreditCards. I know how to model that. How would I designate one of those cards as the primary for that person? Solutions I have come up with seem inelegant at best.
I'll try to clarify my actual situation. (Unfortunately, the actual domain would just confuse things.) I have 2 tables each with a lot of columns, let's say Person and Task. I also have Project which has only a couple of properties. One Person has many Projects, but has a primary Project. One Project has many Tasks, but sometimes has one primary Task with alternates, and other times has no primary task and instead a sequence of Tasks. There are no Tasks that are not part of a Project, but it isn't strictly forbidden.
PERSON (PERSON_ID, NAME, ...)
TASK (TASK_ID, NAME, DESC, EST, ...)
PROJECT (NAME, DESC)
I can't seem to figure a way to model the primary Project, primary Task, and the Task sequence all at the same time without introducing either overcomplexity or pure evil.
This is the best I've come up with so far:
PERSON (PERSON_ID, NAME, ...)
TASK (TASK_ID, NAME, DESC, EST, ...)
PROJECT (PROJECT_ID, PERSON_FK, TASK_FK, INDEX, NAME, DESC)
PERSON_PRIMARY_PROJECT (PERSON_FK, PROJECT_FK)
PROJECT_PRIMARY_TASK (PROJECT_FK, TASK_FK)
It just seems like too many tables for a simple concept.
Here's a question I've found that deals with a very similar situation: Database Design: Circular dependency.
Unfortunately, there didn't seem to be a consensus about how to handle the situation, and the "correct" answer was to disable the database consistency checking mechanism. Not cool.
Well, it seems to me that a Person has two relationships with a CreditCard. One is that the person owns it, and the other is that they consider it their primary CreditCard. That tells me you have a one-to-one and a one-to-many relationship. The return relationship for the one-to-one is already in the CreditCard because of the one-to-many relationship it's in.
This means I'd add primary_cc_id as a field in Person and leave CreditCard alone.
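A minimal sketch of this layout, using Python's sqlite3 module: apart from primary_cc_id, the table names, column names and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked to
conn.executescript("""
CREATE TABLE person (
    person_id     INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    primary_cc_id INTEGER REFERENCES credit_card(cc_id)  -- nullable: no primary yet
);
CREATE TABLE credit_card (
    cc_id          INTEGER PRIMARY KEY,
    person_id      INTEGER NOT NULL REFERENCES person(person_id),
    account_number TEXT NOT NULL
);
-- Insert the person first, then the cards, then pick the primary card.
INSERT INTO person (person_id, name) VALUES (1, 'Pat');
INSERT INTO credit_card VALUES (10, 1, 'ACC-0001'), (11, 1, 'ACC-0002');
UPDATE person SET primary_cc_id = 11 WHERE person_id = 1;
""")

primary_cards = conn.execute("""
    SELECT p.name, cc.account_number
    FROM person p
    JOIN credit_card cc ON cc.cc_id = p.primary_cc_id
""").fetchall()

# Pointing at a card that does not exist is rejected by the foreign key.
try:
    conn.execute("UPDATE person SET primary_cc_id = 999 WHERE person_id = 1")
    missing_card_rejected = False
except sqlite3.IntegrityError:
    missing_card_rejected = True

print(primary_cards, missing_card_rejected)
```

Because primary_cc_id is nullable, the person row can be created before any card exists, which sidesteps the circular dependency. Nothing in this sketch stops primary_cc_id from pointing at another person's card; that would need an additional constraint.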
Two strategies:
Use a bit column to indicate the preferred card.
Use a PreferredCardTable associating each Person with the ID of its preferred card.
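The bit-column strategy can be sketched as below with Python's sqlite3 module. The partial unique index that limits each person to one preferred card is an addition of this sketch, not something the answer prescribes, and the table layout and sample rows are likewise assumptions. Partial indexes are not available in every DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE credit_card (
    cc_id          INTEGER PRIMARY KEY,
    person_id      INTEGER NOT NULL,
    account_number TEXT    NOT NULL,
    is_preferred   INTEGER NOT NULL DEFAULT 0 CHECK (is_preferred IN (0, 1))
);
-- At most one row per person may have is_preferred = 1.
CREATE UNIQUE INDEX one_preferred_card_per_person
    ON credit_card(person_id) WHERE is_preferred = 1;
-- Hypothetical sample data
INSERT INTO credit_card VALUES (10, 1, 'ACC-0001', 1), (11, 1, 'ACC-0002', 0);
""")

# Flagging a second card as preferred violates the partial unique index.
try:
    conn.execute("UPDATE credit_card SET is_preferred = 1 WHERE cc_id = 11")
    second_preferred_rejected = False
except sqlite3.IntegrityError:
    second_preferred_rejected = True

preferred = conn.execute(
    "SELECT cc_id FROM credit_card WHERE person_id = 1 AND is_preferred = 1"
).fetchall()

print(second_preferred_rejected, preferred)
```

To change the preferred card under this constraint, clear the flag on the old card before setting it on the new one.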
One person can have many credit cards, so you need an identifier on each credit card that links that specific card to one individual, which I assume you already have in your model (some kind of ID that links the person to the credit card).
As for the primary credit card (I assume you mean something like a default credit card), that would have to be some sort of manual operation, e.g. a third table that links them together, with a column that specifies which one is the default.
Person (SSN, Name)
CreditCard (CCID, AccountNumber)
P_CC (SSN, CCID, NoID)
So that would mean that if you connect a person to a credit card, you'd have to specify the NoID, as say '1', then design your query to per default find the credit card that belongs to this individual with NoID '1'.
This is of course just one way of doing it, maybe you'd want to limit by 0, 1 - and then sort them by the date the credit card was added to that person.
Maybe if you'd elaborate and give more information about your columns and ideas it'd make it easier.
Here is what I tried with Northwind and a C# Windows app; only one query was executed.
My Code:
DataClasses1DataContext context = new DataClasses1DataContext();
DataLoadOptions dlo = new DataLoadOptions();
dlo.LoadWith<Product>(b => b.Category);
context.LoadOptions = dlo;
context.DeferredLoadingEnabled = false;
context.Log = Console.Out;
List<Product> test = context.Products.ToList();
MessageBox.Show(test[0].Category.CategoryName);
Result:
SELECT [t0].[ProductID], [t0].[ProductName], [t0].[SupplierID], [t0].[CategoryID], [t0].[QuantityPerUnit], [t0].[UnitPrice], [t0].[UnitsInStock], [t0].[UnitsOnOrder], [t0].[ReorderLevel], [t0].[Discontinued], [t2].[test], [t2].[CategoryID] AS [CategoryID2], [t2].[CategoryName], [t2].[Description], [t2].[Picture]
FROM [dbo].[Products] AS [t0]
LEFT OUTER JOIN (
SELECT 1 AS [test], [t1].[CategoryID], [t1].[CategoryName], [t1].[Description], [t1].[Picture]
FROM [dbo].[Categories] AS [t1]
) AS [t2] ON [t2].[CategoryID] = [t0].[CategoryID]