How can I traverse a hierarchy to find circular references? - sql-server

I have a table that maps employees for profit sharing. It allows a given employee to assign a percentage of their take to another employee. This means there isn't necessarily a single entry for a given employee, there could be 0 or more. Alice could give 5% to Bob and 3% to Charlie, so Alice would have two records in the table.
The table has three fields, FromEmployeeId of type int, ToEmployeeId of type int, PctShare of type float.
When a user attempts to add a new mapping record I need to make sure that it won't result in a circular reference.
The problem is I have both multiple hierarchy levels and each employee can have multiple mappings. Initially I thought a recursive CTE may work but I don't think it will work with the multiple mapping issue.
Here's an example:
Bob maps to Charlie and Yusuf
Charlie maps to David and William
Yusuf has no mappings
David maps to Alice and Vince
Vince has no mappings
Alice maps to Tucker
Tucker has no mappings
or in tabular form:
------------------------
| From | To |
------------------------
| Bob | Charlie|
| Bob | Yusuf |
| Charlie | David |
| Charlie | William|
| David | Alice |
| David | Vince |
| Alice | Tucker |
------------------------
If a user attempted to add a mapping record from Alice -> Bob, I want to recognize that that results in a circular reference because Bob -> Charlie -> David -> Alice.
Is my only option something like a cursor and looping through each of the result sets?

Related

How to store country and state in sql server

I'm creating a video website that similar to youtube except its targets the indie gaming community.
I'm working on the table design and have run into a bit of stumbling block with the location column.
How do major sites design tables for storing location?
Profile table:
ID | username | country | state
0 | jack | US | New York
1 | ted | Canada | Alberta
OR
ID | username | countryID
0 | jack | 1
1 | ted | 2
Regions table:
ID | country | state
0 | United States | Texas
1 | United states | New york
2 | Canada | Alberta
Or is there some other design I missed?
And what about :
profile Table
ID | username | stateID
0 | jack | 1
1 | ted | 2
states table
ID | countryID | state
0 | 0 | Texas
1 | 0 | New york
2 | 1 | Alberta
countries table
ID | country
0 | United States
1 | Canada
I have no idea how "big" websites handle their data, but anyway I think this would be a matter of preference and business requirements, in the first case the table isn't properly normalized as the state depends on the country, and in the other case the model is [almost] properly normalized (the country could be moved to another table). The first option can be faster when doing lookups et cetera but as it breaks the normalized relational model it can lead to issues when inserting/updating the data (as well as additional storage). Personally I would chose to use the second option (and maybe de-normalize it for analytics processing if needed - I would think it very much depends on the amount of data you expect to handle)
A normalized model would look something like:
profile (**username**, state)
states (**state**, country)
countries (**country**)
The example above doesn't use surrogate keys and only illustrate the model; a database implementation of the model would often use surrogate keys such as UserID, StateID and CountryID although if properly normalized they shouldn't be needed as the entities should be primary keys (as they are candidate keys).

Friendship Website Database Design

I'm trying to create a database for a frienship website I'm building. I want to store multiple attributes about the user such as gender, education, pets etc.
Solution #1 - User table:
id | age | birth day | City | Gender | Education | fav Pet | fav hobbie. . .
--------------------------------------------------------------------------
0 | 38 | 1985 | New York | Female | University | Dog | Ping Pong
The problem I'm having is the list of attributes goes on and on and right now my user table has 20 something columns.
I feel I could normalize this by creating another table for each attribute see below. However this would create many joins and I'm still left with a lot of columns in the user table.
Solution #2 - User table:
id | age | birth day | City | Gender | Education | fav Pet | fav hobbies
--------------------------------------------------------------------------
0 | 38 | 1985 | New York | 0 | 0 | 0 | 0
Pets table:
id | Pet Type
---------------
0 | Dog
Anyone have any ideas how to approach this problem it feels like both answers are wrong. What is the proper table design for this database?
There is more to this than meets the eye: First of all - if you have tons of attributes, many of which will likely be null for any specific row, and with a very dynamic selection of attributes (i.e. new attributes will appear quite frequently during the code's lifecycle), you might want to ask yourself, whether a RDBMS is the best way to materialize this ... essentially non-schema. Maybe a document store would be a better fit?
If you do want to stay in the RDBMS world, the canonical answer is to have either one or one-per-datatype property table plus a table of properties:
Users.id | .name | .birthdate | .Gender | .someotherfixedattribute
----------------------------------------------------------
1743 | Me. | 01/01/1970 | M | indeed
Propertytpes.id | .name
------------------------
234 | pet
235 | hobby
Poperties.uid | .pid | .content
-----------------------------
1743 | 234 | Husky dog
You have a comment and an answer that recommend (or at least suggest) and Entity-Attribute-Value (EAV) model.
There is nothing wrong with using EAV if your attributes need to be dynamic, and your system needs to allow adding new attributes post-deployment.
That said, if your columns and relationships are all known up front, and they don't need to be dynamic, you are much better off creating an explicit model. It will (generally) perform better and will be much easier to maintain.
Instead of a wide table with a field per attribute, or many attribute tables, you could make a skinny table with many rows, something like:
Attributes (id,user_id,attribute_type,attribute_value)
Ultimately the best solution depends greatly on how the data will be used. People can only have one DOB, but maybe you want to allow for multiple addresses (billing/mailing/etc.), so addresses might deserve a separate table.

Which of these definitions explain 1NF?

I found it vague when i'm trying to look for the definition of 1NF in google.
Some of the sites like this one, says the table is in 1st normal form when it doesn't have any repetitive set of columns.
Some others (most of them) says there shouldn't be multiple values of the same domain exist in the same column.
and some of them says, all tables should have a primary key but some others doesn't talk about primary key at all !
can someone explain this for me ?
A relation is in first normal form if it has the property that none of its domains has elements which are themselves sets.
From E. F. Codd (Oct 1972). "Further normalization of the database relational model"
This really gets down to what it is about, but the guy who invented the relational database model.
When something is in the first normal form, there are no columns which themselves contain sets of data.
The wikipedia article on first normal form demonstrates this with a denormalized table:
Example1:
Customer
Customer ID | First Name | Surname | Telephone Number
123 | Robert | Ingram | 555-861-2025
456 | Jane | Wright | 555-403-1659, 555-776-4100
789 | Maria | Fernandez | 555-808-9633
This table is denormalized because Jane has a telephone number that is a set. Writing the table thus is still in violation of 1NF.
Example2:
Customer
Customer ID | First Name | Surname | Telephone Number
123 | Robert | Ingram | 555-861-2025
456 | Jane | Wright | 555-403-1659
456 | Jane | Wright | 555-776-4100
789 | Maria | Fernandez | 555-808-9633
The proper way to normalize the table is to break it out into two tables.
Example3:
Customer
Customer ID | First Name | Surname
123 | Robert | Ingram
456 | Jane | Wright
789 | Maria | Fernandez
Phone
Customer ID | Telephone Number
123 | 555-861-2025
456 | 555-403-1659
456 | 555-776-4100
789 | 555-808-9633
Another way of looking at 1NF is as defined by Chris Date (from Wikipedia):
There's no top-to-bottom ordering to the rows.
There's no left-to-right ordering to the columns.
There are no duplicate rows.
Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else).
All columns are regular [i.e. rows have no hidden components such as row IDs, object IDs, or hidden timestamps].
Example2 lacks a unique key which is in violation of rule 3. Example1 violates rule 4 in that the telephone number contains multiple values.
Only Example3 fills all those requirements.
Further reading:
Simple Guide to Five Normal Forms in Relational Database Theory
The simplest explanation I have found is this modified definition copied from here:
1st Normal Form Definition
A database is in first normal form if it satisfies the following conditions:
1) Contains only atomic values
2) There are no repeating groups

How to design database, one table for all product type or each table for each product type?

I start new e-commerce web application (pet project) that sale both t-shirt and shoe. My store has only free size T shirt so t-shirt has only color column while shoe has columns for size and color.
Now it's time to create table to store that data, I want to know is it good to create separate table for shoe and t-shirt or it's better to keep all of them in one table?
If it has a better idea to store such data, please let me know.
You definitely don't want to create a Shoe table and a TShirt table. Your shop might grow, and one day you'll have a thousand such product tables. Writing SQL for that would be a nightmare. Plus, you might have different kinds of t-shirts eventually, some with color, some with size and color, and so on. If you create a new table for each, you'll lose track of them quickly, and if you don't, why have separate tables for t-shirts and shoes, but not for one-size t-shirts and multi-size t-shirts?
While designing your database, you should be asking yourself: what are the entities in my realm? what are the things that never change and are uniquely identifiable? In a shop, a particular item that can be sold at a particular price is one such entity. So you might have a products table that has a key for each particular item you sell, and maybe a name, a type, a size and a color column:
item
id | type | name | size | color
------------------------------------
1 | shoe | Marathon | 9 | white
2 | shoe | Marathon | 9 | black
Looking at this table, you notice that we have two entries for the highly successful Marathon running shoe, and that seems to be a normalization violation. Indeed, you probably have two entities a shippable item and a catalog product. The shoe "Marathon" is probably something that has one picture and one description in your store, followed by a "available in the following colors and sizes:" line. So now you have two tables:
product
id | type | name | supplier | picture | description
--------------------------------------------------------------------------------------
1 | shoe | Marathon | TrackNField Co. | marathon.jpg | Run faster than light!
2 | tshirt | FlowerPower | SF Shirts | fpower.jpg | If you're going to San Francisco...
item
id | product_id | size | color | price
--------------------------------------
1 | 1 | 9 | white | 99.99
2 | 1 | 9 | black | 99.99
3 | 2 | | blue | 19.99
The "type" column in the product table can be a tricky one. You'll probably want to display products by category, let the user click on "shoes" and get all products with type "shoe". Easy so far, but eventually someone will mistype an entry "sheo", and then you can't find that product under shoes anymore. So it's better to separate the categorization from the products, for example by having a product_type table:
product_type
id | name
---------
1 | shoe
product
id | type_id | name | supplier | picture | description
--------------------------------------------------------------------------------------
1 | 1 | Marathon | TrackNField Co. | marathon.jpg | Run faster than light!
with a reference to the type in the product table. That's ok as long as your type hierarchy stays shallow, but what if you want to have subcategories, like "sneaker", "basketball shoe", "suede shoe", and so on? One shoe might even belong to several of these subcategories. In that case you can try this
category
id | name | supercategory_id
------------------------------------
1 | shoe |
2 | running shoe | 1
product_category
product_id | category_id
------------------------
1 | 2
product
id | name | supplier | picture | description
--------------------------------------------------------------------------------------
1 | Marathon | TrackNField Co. | marathon.jpg | Run faster than light!
And if you want to display multiple hierarchies of categorizations (as most big ecommerce sites do these days), you'll have to come up with something even more sophisticated.
Keep them all in one table and have a type field. The reason to do it this way is so that your data structure is scalable: i.e. if there is a new type of product then instead of adding a new table and having to drastically change your application code, you just use the same table and simply add a type.
if you don't want make it to complex you can keep all of them in the same table and create another table called "ProductType" that tells you if it is a shoe or a t-shirt.
The relationship will be One-To-Many on the "ProductType" side as you can have the same type of product associated with more then one record on the product table(where you store all your products)

Checking for overlapping car reservations

I'm writing a simple booking program for a car rental (a school assignment). Me and my buddy are trying to make the system a little more advanced than the assignment dictates, but we're having some problems we hoped you could help us with.
The idea is that you can reserve a certain car type, and when you get the car it will be one of that type (you don't reserve a specific car, as our assignment dictates, but only a type). Only one customer can have the car on a specific date. As the reservations tick in we have to make sure, that we don't hire out more cars of each type than we've got. The reservations are basically stored with a start date, an end date, and a car type.
If we ignore the car type for now (lets say we only have one type) then the reservations could graphically look something like this:
1/12 2/12 3/12 4/12 5/12 6/12 7/12
|-------------------|
|-----------------|
|-----|
|-------|
|-----------|
|-------------|
If the rental only has three cars it would be possible to rent a car from 3/12 to 5/12 since all days only have 2 car reservations. But how do we know this? Do we have to check each date and count() the number of reservations that spans over that date?
And what if somebody had reserved a car on 4/12, then 3/12 and 5/12 would still only have 2 reservations, but 4/12 would have 3.
Would it be possible to do with a query some how, or do we have to step through each date in the program to check the number of reservations didn't exceed the number of cars?
(This is easy enough with only full dates, but consider the scenario where you could rent the cars on an hourly basis (not only on a daily as here). Then it could be a though one to step through each our if we have a lot of reservations and cars and the timespan is long...)
Hope you have some nice ideas that will help us along. Thanks for taking the time to read the question :)
Mikkel, Denmark
Assume, You have such reservation situation in real life:
1/12 2/12 3/12 4/12 5/12 6/12 7/12
Car1: |-------------------|
Car2: |-----------------|
Car3: |-------| |-----------| |-----|
Car4: |-------------|
Table car
| id | type | registration |
| 1 | 1 | HH1111 |
| 2 | 1 | HH3333 |
| 3 | 2 | HH77 |
| 4 | 3 | DD999 |
Table reservation
| car_id | date_from | date_to |
| 1 | 2013-12-01 | 2013-12-04 |
| 2 | 2013-12-04 | 2013-12-07 |
| 3 | 2013-12-01 | 2013-12-02 |
| 3 | 2013-12-03 | 2013-12-05 |
| 3 | 2013-12-06 | 2013-12-07 |
| 4 | 2013-12-01 | 2013-12-03 |
Now, You must by really simple logic, select all available cars for period
from 2013-12-05 to 2013-12-06
"Select ALL cars, which does not have any reservation with dates, which blocks it for usage"
with brillian mysql select:
select * from car where not exists ( select * from reservation
where car.id = reservation.car_id AND
date_from < '2013-12-06' AND
date_to > '2013-12-05' )
"Would it be possible to do with a query some how, or do we have to step through each date in the program to check the number of reservations didn't exceed the number of cars? (This is easy enough with only full dates,"
The nature of your problem is that a violation of the constraint could appear on any individual date. So logically speaking, it is indeed necessary to do the check for each individual date comprised in a new reservations. The only optimisation possible would be to do the check at the level of "smallest intervals". To do that, you must first compute all the intervals that already appear in the database, and which overlap with your new reservation.
For example, a new reservation for 4/12-6/12 would have to be split into 4/12-5/12 (second line) and 5/12-6/12 (third line). Those individual intervals might be longer than one single day, and you can do the checks on the level of those individual intervals. (They are the same as individual days in this particular example, but a reservation 7/12-19/12 would not have to be split at all.
However, computing this might prove difficult, and there's another caveat: when you're looking al multi-row inserts, you should also be splitting over the other rows to be inserted (and that requires you to record all the inserted rows in a temporary table, otherwise you won't be able to access them).

Resources