SQL Database How far to go to prevent Duplication of Records? - sql-server

I've been reading up on proper DB creation techniques, and I've got a large project that will have numerous items in it. I have already mapped out the tables and the various lookup tables, but then I realized that in the user table, I have first_name, middle_name, last_name columns that could be changed to first_name_id, middle_name_id and last_name_id that act as lookups to a first name table, middle name table and last name table that hold only unique names to prevent duplication of data.
The question I have is how far do I go with this process? Does every single item that could potentially be duplicate information need to be done like this? At some point this seems like it will get confusing to ME to keep track of all the relationships and cascading updates/deletes, etc on everything...
Just looking for some advice as I want to make sure this is done properly as setting a proper foundation is very important to build and scale on top of in the future, but at the same time don't want to take this to extremes if it is not needed.

Related

Data Modeling: Is it bad practice to store IDs from various sources in the same column?

I am attempting to merge data from various sources into an existing data model. Each source uses different types of IDs (such as GUID, Salesforce IDs, etc.). For example, if I were to merge data from two different sources, the table may look like the following (where the first two SalesPersonIDs are GUID IDs and the second two are Salesforce IDs):
Is this a bad practice? I could also imagine a table where each ID type was its own column and could be left blank if it was not applicable. Something like the following:
I apologize, I am a bit new to this. Thanks in advance for any insight, I greatly appreciate it!
The big roles of an ID column are to act as a key connecting data in different tables, and to help indexing - quickly find rows so your queries run fast.
The second solution wouldn't work well for these purposes, and will lead to big headaches in queries: every time you want to group by the ID, you'll have to combine the info from 2 columns in some way, hopefully getting a correct unique result every time.
On the one hand, all you might ever need from an ID is for it to be unique. The first solution might be fine this respect - but are you sure you'll never, ever get data about one SalesPerson from more than one source?
I'd suggest keeping all the IDs in one column, and adding a column to say what kind of ID this is. At least this way, you won't lose any information and can do other things in the future.
One thing you might consider is making a separate table of SalesPerson with all their possible IDs, and have this keyed to other (Sales?) data by a unique ID used only in your database.

Store multiple values in one database field in Access (hear me out)

So I've done extensive searching on this and I can't seem to find a good solution that actually applies to my situation.
I have a list of projects in a table, then a list of people. I want to assign multiple people to one project. Seems pretty common. Obviously, I can't make multiple columns on my projects table for each person, as the people will change fairly frequently.
I need to display this information very quickly in a continuous list of projects (the ultimate way would be a multiple-select combobox as a listbox is too tall, but they don't exist outside of the dreaded lookup fields)
I can think of two ways:
- Store multiple employee IDs delimited by commas in one field in my projects table (I know this goes against good database design). Would require some code to store and retrieve the data.
- Have a separate table for employees assigned to projects (ID, ProjectID, EmployeeID). One to many relationship between projects table and this new table. One to many relationship between employees table and this new table. If a project has 3 employees assigned, it would store 3 records in this table. It seems a bit odd joining both tables in this way, and would also require code to get it to store and retrieve into a control like the one mentioned above).
Does anyone know if there is a better way (including displaying in an easy control) or how you usually tackle this problem?
The usual way to tackle this problem would be with a Junction Table. This is what you describe where you have a separate table maybe called EmployeeProject which has an EmployeeProjectID(PK), EmployeeID(FK) and ProjectID(FK).
In this way you model a Many-to-Many relationship where each project can have many employees involved and each employee can be involved in many projects. It's not actually all that difficult to do the SQL etc. required to pull the information back together again for display.
I would definitely stay away from storing comma-delimited values as this becomes significantly more complicated when you want to display or manipulate the data.
There's a good guide here: http://en.tekstenuitleg.net/articles/software/create-a-many-to-many-relationship-in-access but if you google "many to many junction table" or similar, there are thousands of pages/articles about implementation.

Creating several database tables for user data?

I need to have a lot of user data in the database. Now, I've been thinking about having two tables, users that would have only the id, username and password and another table userData that would have everything else like name, lastname etc.
Is this a prefered method?
The simplest design would put all the fields in one table. From that point, though, there are a bunch of reasons you might want to consider splitting that information up into multiple tables. From your description, I cant' tell whether there are any valid reasons to do so.
If you start with one table, you might find it advantageous to split the data for reasons such as:
Normalization.
Reducing contention (different parts of the app update different information)
Truly huge column lists (look into the limit for your DB)
Other?? (how you're going to maintain your app, maybe?)
In short, I'd try to start simple and have a reason to pick the more complex design if you go that route.
There is nothing wrong with that design IMHO. You can have a users table and link it to a users_custom table that has additional information. Just be consistant with your design. Just remember that in order to get any additional user information you will always have to JOIN to that data.
To me this is a matter of preference, if you feel that this table will grow over time, consider your design, if not just keep it all in one table and properly index columns that you deem necessary.
You can go further by having a UserLog table to build a historical view of values as they change.
Yes it is :) In theory there are this so called "normal forms" (3NF BCnF, etc...). Using them, means seperating table into smaller ones :)
I think it might be better for you to keep it all in one table. Assuming you will be enforcing unique usernames, all the fields (password, first_name, and last_name) have a functional dependency on username. Therefore, you can put them all in the same table and still have a normalized database.
Although you can certainly separate first_name and last_name into their own table, queries will get a lot easier (fewer JOINs) if you keep all those fields in one table.

How can simplify my database?

I am working on a project in which I have generated a unique id of a customer with the customer's Last name's first letter. And stored it in a database in different tables as if customer's name starting with a then the whole information of the customer will stored in Registration_A table. As such I have created tables of Registration up to Z. But retrieving if data with such structure is quiet difficult. can you suggest me another method to save data so that retrieving become more flexible?
Put all of your registration data into one table. There's absolutely no need for you to break it into alphabetical pieces like that unless you have some serious performance issues.
When querying for registration data, use SQL's WHERE clause to narrow down your results.
You have to merge this to one table ´Registration´, then let the database care about unique ids. This depends on your database, but searching for PRIMARY KEY or AUTO INCREMENT should give you lots of results.
If you have done the the splitting because of performance reasons, you can add a Index on the users last name.

Do 1 to 1 relations on db tables smell?

I have a table that has a bunch of fields. The fields can be broken into logical groups - like a job's project manager info. The groupings themselves aren't really entity candidates as they don't and shouldn't have their own PKs.
For now, to group them, the fields have prefixes (PmFirstName for example) but I'm considering breaking them out into multiple tables with 1:1 relations on the main table.
Is there anything I should watch out for when I do this? Is this just a poor choice?
I can see that maybe my queries will get more complicated with all the extra joins but that can be mitigated with views right? If we're talking about a table with less than 100k records is this going to have a noticeable effect on performance?
Edit: I'll justify the non-entity candidate thoughts a little further. This information is entered by our user base. They don't know/care about each other. So its possible that the same user will submit the same "projectManager name" or whatever which, at this point, wouldn't be violating any constraint. Its for us to determine later on down the pipeline if we wanna correlate entries from separate users. If I were to give these things their own key they would grow at the same rate the main table grows - since they are essentially part of the same entity. At no pt is a user picking from a list of available "project managers".
So, given the above, I don't think they are entities. But maybe not - if you have further thoughts please post.
I don't usually use 1 to 1 relations unless there is a specific performance reason for it. For example storing an infrequently used large text or BLOB type field in a separate table.
I would suspect that there is something else going on here though. In the example you give - PmFirstName - it seems like maybe there should be a single pm_id relating to a "ProjectManagers" or "Employees" table. Are you sure none of those groupings are really entity candidates?
To me, they smell unless for some rows or queries you won't be interested in the extra columns. e.g. if for a large portion of your queries you are not selecting the PmFirstName columns, or if for a large subset of rows those columns are NULL.
I like the smells tag.
I use 1 to 1 relationships for inheritance-like constructs.
For example, all bonds have some basic information like CUSIP, Coupon, DatedDate, and MaturityDate. This all goes in the main table.
Now each type of bond (Treasury, Corporate, Muni, Agency, etc.) also has its own set of columns unique to it.
In the past we would just have one incredibly wide table with all that information. Now we break out the type-specific info into separate tables, which gives us much better performance.
For now, to group them, the fields have prefixes (PmFirstName for example) but I'm considering breaking them out into multiple tables with 1:1 relations on the main table.
Create a person table, every database needs this. Then in your project table have a column called PMKey which points to the person table.
Why do you feel that the group of fields are not an entity candidates? If they are not then why try to identify them with a prefix?
Either drop the prefixes or extract them into their own table.
It is valuable splitting them up into separate tables if they are separate logical entities that could be used elsewhere.
So a "Project Manager" could be 1:1 with all the projects currently, but it makes sense that later you might want to be able to have a Project Manager have more than one project.
So having the extra table is good.
If you have a PrimaryFirstName,PrimaryLastName,PrimaryPhone, SecondaryFirstName,SecondaryLastName,SEcondaryPhone
You could just have a "Person" table with FirstName, LastName, Phone
Then your original Table only needs "PrimaryId" and "SecondaryId" columns to replace the 6 columns you previously had.
Also, using SQL you can split up filegroups and tables across physical locations.
So you could have a POST table, and a COMMENT Table, that have a 1:1 relationship, but the COMMENT table is located on a different filegroup, and on a different physical drive with more memory.
1:1 does not always smell. Unless it has no purpose.

Resources