Why Do I need user id attribute? - database

I am currently trying to design a social network type of website and this is the class diagram
that I have so far
At the moment I have userId and username in separate tables because I wanted to normalize them, but now I am not sure why I need the userId attribute. I have done some research and a lot of similar projects have this attribute, but I don't get why, if the username is already going to uniquely identify a particular user.
By the way I am aware I have a problem with the requests table because at the moment with the attributes given I cannot identify a primary key
Thanks

Two big reasons I can think of:
Optimization. SQL databases typically perform far better when using integer primary keys than varchar ones. Lookup-something-by-user is one of the most common operations in this environment, so this has real performance implications. Many DBAs don't like GUID/UUIDs as PKs for exactly this reason.
Nothing dictates that a username must uniquely identify users. Case in point: Stack Exchange user handles don't have to be unique, and are freely editable.
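To make both points concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (users, posts, user_id) are hypothetical and not taken from the asker's diagram. Other tables reference the integer key, so a username can change without touching any foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE users (
        user_id  INTEGER PRIMARY KEY,   -- surrogate key, never shown or edited
        username TEXT NOT NULL          -- display handle, free to change
    );
    CREATE TABLE posts (
        post_id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(user_id),
        body    TEXT NOT NULL
    );
""")

alice_id = conn.execute(
    "INSERT INTO users (username) VALUES (?)", ("alice",)
).lastrowid
conn.execute(
    "INSERT INTO posts (user_id, body) VALUES (?, ?)", (alice_id, "hello")
)

# Renaming the user updates exactly one row; posts still point at user_id.
conn.execute(
    "UPDATE users SET username = ? WHERE user_id = ?", ("alice_2", alice_id)
)

rows = conn.execute("""
    SELECT u.username, p.body
    FROM posts AS p
    JOIN users AS u ON u.user_id = p.user_id
""").fetchall()
print(rows)
```

Had username been the primary key, the rename would have had to cascade into every table that references a user.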

Related

SQLite: Individual tables per user or one table for them all?

I've already designed a website which uses an SQLite database. Instead of using one large table, I've designed it so that when a user signs up, an individual table is created for them. Each user will possibly use several hundred records. I did this because I thought it would be easier to structure and access.
I found on other questions on this site that one table is better than using many tables for each user.
Would it be worth redesigning my site so that instead of having many tables, there would be one large table? The current method of mine seems to work well though it is still in development so I'm not sure how well it would stack up in a real environment.
The question is: would changing the code so that there is one large table instead of many individual ones be worth it in terms of performance, efficiency, organisation and space?
SQLite: Creating a user's table.
CREATE TABLE " + name + " (id INTEGER PRIMARY KEY, subject TEXT, topic TEXT, questionNumber INTEGER, question TEXT, answer TEXT, color TEXT)
SQLite: Adding an account to the accounts table.
"INSERT INTO accounts (name, email, password, activated) VALUES (?,?,?,?)", (name, email, password, activated,)
Please note that I'm using python with Flask if it makes any difference.
EDIT
I am also aware that there are questions like this already, however none state whether the advantages or disadvantages will be worth it.
In an object oriented language, would you make a class for every user? Or would you have an instance of a class for each user?
Having one table per user is a really bad design.
You can't search messages based on any field that isn't the username. With your current solution, how would you find all messages for a certain questionNumber?
You can't join with the messages tables. You have to make two queries, one to find the table name and one to actually query the table, which requires two round-trips to the database server.
Each user now has their own table schema. On an upgrade, you have to apply your schema migration to every messages table, and God help you if some of the tables are inconsistent with the rest.
It's effectively impossible to have foreign keys pointing to your messages table. You can't specify the table that the foreign key column points to, because it won't be the same.
You can have name conflicts with your current setup. What if someone registers with the username accounts? Admittedly, this is easy to fix by adding a user_ prefix, but still something to keep in mind.
SQL injection vulnerabilities. What if I register a user named lol; DROP TABLE accounts; --? Query parameters, the primary way of preventing such attacks, don't work on table names.
I could go on.
Please merge all of the tables, and read up on database normalization.
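As a rough illustration of the merged design, the sketch below keeps the asker's column names but puts every user's rows in one table keyed by an account id. The table name questions, the column account_id, and the helper functions are assumptions made for the example, and the accounts table is trimmed to two columns for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE accounts (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL UNIQUE,
        email TEXT NOT NULL
    );
    CREATE TABLE questions (
        id             INTEGER PRIMARY KEY,
        account_id     INTEGER NOT NULL REFERENCES accounts(id),
        subject        TEXT,
        topic          TEXT,
        questionNumber INTEGER,
        question       TEXT,
        answer         TEXT,
        color          TEXT
    );
    CREATE INDEX idx_questions_account ON questions(account_id);
""")

def add_account(name, email):
    # The name is bound as a parameter, so it is only ever treated as data.
    cur = conn.execute(
        "INSERT INTO accounts (name, email) VALUES (?, ?)", (name, email)
    )
    return cur.lastrowid

def add_question(account_id, number, question, answer):
    conn.execute(
        "INSERT INTO questions (account_id, questionNumber, question, answer) "
        "VALUES (?, ?, ?, ?)",
        (account_id, number, question, answer),
    )

def questions_for(account_id):
    # One user's rows: a filter on the indexed account_id column.
    return conn.execute(
        "SELECT questionNumber, question FROM questions WHERE account_id = ?",
        (account_id,),
    ).fetchall()

def questions_numbered(number):
    # A search across all users, which per-user tables cannot do in one query.
    return conn.execute(
        "SELECT a.name, q.question FROM questions AS q "
        "JOIN accounts AS a ON a.id = q.account_id "
        "WHERE q.questionNumber = ? ORDER BY a.id",
        (number,),
    ).fetchall()

alice = add_account("alice", "alice@example.com")
hostile = add_account("lol; DROP TABLE accounts; --", "x@example.com")
add_question(alice, 1, "2 + 2?", "4")
add_question(hostile, 1, "Capital of France?", "Paris")

alice_rows = questions_for(alice)
all_number_one = questions_numbered(1)
account_count = conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
```

Because no table name is ever built from user input, the hostile username is stored as an ordinary string and a schema change only has to be applied once.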

Auto-Complete/Primary Key as String - PostgreSQL

I set up a database that is not too complex but nonetheless has multiple many-to-many relationships. Let me first explain the database briefly using three tables (there are many more, but just to keep things simple):
The database stores information about completed projects. One attribute is the software used. So I have three tables (with their respective columns/keys):
tblProjects(ProjectID[PK], ProjectTitle, etc...)
tblProjectsSoftware(SoftwareID[FK], ProjectID[FK], UniqueID[PK])
tblSoftwareUsed(SoftwareID[PK], SoftwareName)
In order to make data entry easier in phppgadmin, I was considering just making 'SoftwareName' the primary key in tblSoftwareUsed. This is because when I go to enter the software associated with certain projects into tblProjectsSoftware, I can only use the auto-complete feature on the SoftwareID column which is just more or less a meaningless number.
As you can see, when entering data into the SoftwareID column of tblSoftwareUsed, I would only be able to 'filter' results by the ID and not the name. When this database gets large, it may not be an issue for software, but there are some other attributes that will have tons of records. To explain that further, I would start my data entry by creating a record for the project in tblProjects. Then I would create new records (if necessary) for software used. Then, when entering data into tblProjectsSoftware, I would either have to know the ID of the software or click through a few pages to find it.
So, my question is: would I have any issues by making the name of the software my primary key, or would it be better to just leave it as is with the ID as the PK? Furthermore, maybe I am missing an option to make 'SoftwareName' searchable in addition to the ID.
There are advantages and disadvantages to using surrogate keys, which are discussed at length in this wikipedia article:
http://en.wikipedia.org/wiki/Surrogate_key
Borrowing their headers...
Advantages:
Immutability
Requirement changes
Performance
Compatibility
Uniformity
Validation
Disadvantages:
Disassociation
Query optimization
Normalization
Business process modeling
Inadvertent disclosure
Inadvertent assumptions
More often than not, you'll want to use a surrogate key for practical reasons -- such as avoiding headaches when you need to update a software name.
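One way to keep the surrogate key and still work by name is to put a UNIQUE constraint on SoftwareName and resolve the ID from the name at entry time. The sketch below uses Python's sqlite3 module as a stand-in for PostgreSQL; the helper link_by_name and the sample data are hypothetical, and it says nothing about what phpPgAdmin's auto-complete can do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE tblProjects (
        ProjectID    INTEGER PRIMARY KEY,
        ProjectTitle TEXT NOT NULL
    );
    CREATE TABLE tblSoftwareUsed (
        SoftwareID   INTEGER PRIMARY KEY,
        SoftwareName TEXT NOT NULL UNIQUE   -- natural key kept as a constraint
    );
    CREATE TABLE tblProjectsSoftware (
        UniqueID   INTEGER PRIMARY KEY,
        ProjectID  INTEGER NOT NULL REFERENCES tblProjects(ProjectID),
        SoftwareID INTEGER NOT NULL REFERENCES tblSoftwareUsed(SoftwareID)
    );
""")

def link_by_name(project_id, software_name):
    """Attach software to a project by name; the ID is looked up, not typed."""
    row = conn.execute(
        "SELECT SoftwareID FROM tblSoftwareUsed WHERE SoftwareName = ?",
        (software_name,),
    ).fetchone()
    if row is None:
        raise LookupError(f"unknown software: {software_name!r}")
    conn.execute(
        "INSERT INTO tblProjectsSoftware (ProjectID, SoftwareID) VALUES (?, ?)",
        (project_id, row[0]),
    )

project_id = conn.execute(
    "INSERT INTO tblProjects (ProjectTitle) VALUES (?)", ("Bridge survey",)
).lastrowid
conn.execute("INSERT INTO tblSoftwareUsed (SoftwareName) VALUES (?)", ("AutoCAD",))
link_by_name(project_id, "AutoCAD")

# Renaming the software touches one row; the link table is unaffected.
conn.execute(
    "UPDATE tblSoftwareUsed SET SoftwareName = ? WHERE SoftwareName = ?",
    ("AutoCAD 2024", "AutoCAD"),
)

linked = conn.execute("""
    SELECT p.ProjectTitle, s.SoftwareName
    FROM tblProjectsSoftware AS ps
    JOIN tblProjects     AS p ON p.ProjectID  = ps.ProjectID
    JOIN tblSoftwareUsed AS s ON s.SoftwareID = ps.SoftwareID
""").fetchall()
```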

Is it insecure to reveal a row's primary key to the user?

Why do many applications replace the primary key of a database with a seemingly random alternative id when revealing the record to the user?
My guess is that it prevents users from guessing other rows in the table. If so, isn't that just a false sense of security?
I guess you are talking about surrogate keys here. One of the desired or supposed advantages of surrogate keys is that they aren't burdened by any external meaning or dependency on anything outside the database. So for example the surrogate key values can safely be reassigned or the key can be refactored or discarded without any consequences for users of the system.
Generally surrogate keys are kept hidden from users so that they don't acquire any such external dependencies. Being hidden from users was in fact part of the original definition of a surrogate key as proposed by E.F.Codd. If key values reside in the user's browser cache or favourites list then they aren't much use as "surrogates" any more. So that's one common reason why you will see one key used only inside the database and a different key for the same table made visible in the application.
I think it may depend on the type of application you are working with. I work with enterprise software that is only used by the company I work for and is not generally available to the outside world. In this case, it is often critical to let the user see the surrogate key for people-related records because the information in the person table has no uniqueness. There can be two John Smiths (we actually have over 1000 of them) who are genuinely different people. They may even have the same business address and be different people (sons are often named for fathers and work in the same medical practice, for instance). So they need to refer to the surrogate key on forms and in reporting to ensure they are using the record they thought they wanted. Otherwise, if they wanted to research further details about the John Smith that they saw in a report, how would they look it up in the application without having to go through all 1000 to find the right one? Creating a fake id as well as the real one would be time consuming (we import millions of records at a time) and for no real gain, since the data would not be visible outside our company application.
For a web app that is open to the general public, I can see where you might not want to show this information.

Should I expose a user ID to public?

I have a form that reveals user IDs to the public. I was wondering whether this is dangerous. Personally I do not see anything bad about it. The ID is just used to reference a single database record.
If it were dangerous, Stack Overflow wouldn't be displaying user IDs in their URLs in order to make user profile lookups work: https://stackoverflow.com/users/104826/rfactor
Edit of seriousness of immense levels: if user IDs are themselves sensitive data; for example your primary keys for some reason happen to be social security numbers, that'll definitely be a security and privacy liability. If your user IDs are just auto-increment numbers though, you're clear.
Generally it's not a problem but it can give away hints on how active your site is, like how many users you have etc. If you consider this sensitive information or maybe even good marketing is completely up to you.
There's a story that this was one of the reasons the Germans lost WW2. They had sequential serial numbers from production written on each tank. By collecting the serial numbers from tanks that had been taken out, the British could estimate how many tanks the whole German army had and plan new strategies from that.
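The arithmetic behind that story is short enough to sketch. Assuming IDs start at 1, increase by 1, and the observed ones are a random sample of distinct values, the usual frequentist estimate of the total is m + m/k - 1, where m is the largest ID seen and k is the number of IDs seen. The function name and the sample IDs below are made up for illustration.

```python
def estimate_total(observed_ids):
    """Estimate how many sequential IDs exist from a sample of distinct IDs.

    Assumes IDs run 1, 2, 3, ... with no gaps and the sample is random.
    """
    k = len(observed_ids)
    if k == 0:
        raise ValueError("need at least one observed ID")
    m = max(observed_ids)
    # Largest ID seen, plus the average gap between observed IDs.
    return m + m / k - 1

# Four user IDs scraped from public profile URLs (made-up numbers).
estimate = estimate_total([19, 40, 42, 60])
print(estimate)  # 60 + 60/4 - 1 = 74.0
```

So a handful of auto-increment IDs is enough for an outsider to get a rough idea of how many accounts exist.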
I have found that exposing primary keys that identify physical entities can create headaches.
Imagine if two blood samples come into a laboratory and test results are generated for each sample. Many different kinds of test might be done and each record representing a test result will have the sample_id as a foreign key.
If you share the database ID with the customer and you discover that two samples were accidentally switched, you will have to update the foreign keys in all the detail records representing the tests. If you instead exposed some other unique name outside your system, you will just need to switch the two unique names on the sample records in the master table.
There are other advantages related to data migration and there are advantages when entities are represented in more than one database in which it is difficult to create records with identical database ID's.
In my experience it is always best to expose a unique identifier other than the primary key outside your system. It gives you more flexibility in resolving data mix-ups, dealing with data migration issues, and in otherwise future-proofing your system.
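The blood sample scenario can be sketched as follows, using Python's sqlite3 module. The schema, the public_code column, and the swap_public_codes helper are all hypothetical. The integer sample_id stays internal and the detail rows reference it, while customers only ever see public_code.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE samples (
        sample_id   INTEGER PRIMARY KEY,     -- internal only
        public_code TEXT NOT NULL UNIQUE     -- what the customer sees
    );
    CREATE TABLE test_results (
        result_id INTEGER PRIMARY KEY,
        sample_id INTEGER NOT NULL REFERENCES samples(sample_id),
        test_name TEXT NOT NULL,
        value     TEXT NOT NULL
    );
""")

def swap_public_codes(id_a, id_b):
    """Fix a mix-up by swapping the exposed codes on two master rows."""
    def code_of(sample_id):
        return conn.execute(
            "SELECT public_code FROM samples WHERE sample_id = ?", (sample_id,)
        ).fetchone()[0]

    code_a, code_b = code_of(id_a), code_of(id_b)
    update = "UPDATE samples SET public_code = ? WHERE sample_id = ?"
    with conn:  # commit all three updates together, or roll back on error
        # A temporary value avoids violating UNIQUE halfway through the swap.
        conn.execute(update, (uuid.uuid4().hex, id_a))
        conn.execute(update, (code_a, id_b))
        conn.execute(update, (code_b, id_a))

def results_for(public_code):
    return conn.execute(
        "SELECT r.test_name, r.value FROM test_results AS r "
        "JOIN samples AS s ON s.sample_id = r.sample_id "
        "WHERE s.public_code = ? ORDER BY r.result_id",
        (public_code,),
    ).fetchall()

a = conn.execute("INSERT INTO samples (public_code) VALUES ('S-001')").lastrowid
b = conn.execute("INSERT INTO samples (public_code) VALUES ('S-002')").lastrowid
conn.execute(
    "INSERT INTO test_results (sample_id, test_name, value) VALUES (?, ?, ?)",
    (a, "glucose", "5.1"),
)
conn.execute(
    "INSERT INTO test_results (sample_id, test_name, value) VALUES (?, ?, ?)",
    (b, "glucose", "9.8"),
)

swap_public_codes(a, b)
after_swap = results_for("S-001")
```

No row in test_results is modified by the swap; only the two rows in the master table change.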
As for me, showing the ID is as dangerous as showing the user name.
Exposing a user ID is not, in and of itself, bad. It depends on the level of privacy and security needed. If the user ID does not expose and cannot be tied to any other personal data that should otherwise be private, it may not be a problem.
But don't think that public user IDs can never be a problem.
Make sure you don't allow anyone to break in to any private data just by knowing user IDs. Facebook has had problems like that. Here's just one example. While revealing user IDs wasn't the whole story, it was part of the equation.
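In code, that advice amounts to never treating knowledge of an ID as proof of identity. The framework-free sketch below uses made-up names throughout (Profile, PROFILES, view_profile); requester_id stands for whatever the server-side session says about the logged-in user, not a value taken from the URL.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Profile:
    user_id: int
    display_name: str
    email: str  # private field

# Stand-in for a database table, keyed by user ID.
PROFILES = {
    1: Profile(1, "alice", "alice@example.com"),
    2: Profile(2, "bob", "bob@example.com"),
}

def view_profile(requester_id: Optional[int], target_id: int) -> dict:
    """Return the fields of target_id's profile that requester_id may see.

    requester_id is None for anonymous visitors.
    """
    profile = PROFILES[target_id]
    visible = {
        "user_id": profile.user_id,
        "display_name": profile.display_name,
    }
    # Private fields are released only when the authenticated user owns the record.
    if requester_id is not None and requester_id == target_id:
        visible["email"] = profile.email
    return visible

own_view = view_profile(1, 1)
other_view = view_profile(2, 1)
anonymous_view = view_profile(None, 1)
```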
Will it hurt anything? Only you can decide that, and you should think that through.
But in general, it is poor form to display the User ID without having a business reason to do so. (Saves you work is probably not a good business reason.)
If it is a generated database id with no other meaning, it's not dangerous. Though I don't think revealing an id is elegant either. It's a technical detail and I can't understand why you would like to show it to users.

Separating user table from people table in a relational database

I've done many web apps where the first thing you do is make a user table with usernames, passwords, names, e-mails and all of the other usual flotsam. My current project presents a situation where non-user records need to function similarly to users, but do not need the ability to be a first-order user.
Is it reasonable to create a second table, people_tb, that is the main relational table and data store, and only use the users_tb for authentication? Does separating user_tb from people_tb present any problems? If this is commonly done, what are some strategies and solutions as well as drawbacks?
This is certainly a good idea, as you are normalizing the database. I have done a similar design in an app that I am writing, where I have an employee table and a user table. Users may be from an external company or be an employee, so I have separate tables because an employee is always a user, but a user may not be an employee.
The issue that you'll run into is that whenever you use the user table, you'll nearly always want the person table too, to get the name or other common attributes you would want to show up.
From a coding standpoint, if you're using straight SQL, it will take a little more effort to mentally parse the select statement. It may be a little more complicated if you're using an ORM library. I don't have enough experience with those.
In my application, I'm writing it in Ruby on Rails, so I'm constantly doing things like employee.user.name, where if I kept them together, it would be just employee.name or user.name.
From a performance standpoint, you are hitting two tables instead of one, but given proper indexes, it should be negligible. If you had an index that contained the primary key and the person name, for instance, the database would hit the user table, then the index for the person table (with a nearly direct hit), so the performance would be nearly the same as having one table.
You could also create a view in the database to keep both tables joined together to give you additional performance enhancements. I know in the later versions of Oracle you can even put an index on a view if needed to increase performance.
I routinely do that because for me the concept of "user" (username, password, create date, last login date) is different from "person" (name, address, phone, email). One of the drawbacks that you may find is that your queries will often require more joins to get the info you're looking for. If all you have is a login name, you'll need to join the "people" table to get the first and last name for example. If you base everything around the user id primary key, this is mitigated a bit, but still pops up.
If user_tb has auth info, I would very much keep it separate from people_tb. I would however keep a relationship between the two, and most of the users' info would be stored in people_tb, except for the info needed for auth (which I guess will not be used for much else). It's a nice tradeoff between design and efficiency, I think.
That is definitely what we do, as we have millions of people records and only thousands of users. We also separate addresses, phones and emails into related tables, as many people have more than one of each of these things. It is critical not to rely on name as the identifier, as name is not unique. Make sure the tables are joined through some type of surrogate key (an integer or a GUID is preferable), not name.
I always try to avoid as much data repetition as possible. If not all people need to login, you can have a generic people table with the information that applies to both people and users (eg. firstname, lastname, etc).
Then for people that login, you can have a users table that has a 1~1 relationship with people. This table can store the username and password.
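A minimal sketch of that layout, using Python's sqlite3 module, is shown below. The column names are assumptions; the table names follow the people/users wording above. Making person_id both the primary key of users and a foreign key to people is one way to enforce that a person has at most one login.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE people (
        person_id  INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL
    );
    CREATE TABLE users (
        -- Primary key and foreign key at once: at most one user row per person.
        person_id     INTEGER PRIMARY KEY REFERENCES people(person_id),
        username      TEXT NOT NULL UNIQUE,
        password_hash TEXT NOT NULL
    );
""")

def add_person(first_name, last_name):
    cur = conn.execute(
        "INSERT INTO people (first_name, last_name) VALUES (?, ?)",
        (first_name, last_name),
    )
    return cur.lastrowid

ann = add_person("Ann", "Lee")      # will be able to log in
bob = add_person("Bob", "Stone")    # a person record only, no login
conn.execute(
    "INSERT INTO users (person_id, username, password_hash) VALUES (?, ?, ?)",
    (ann, "annlee", "<hash goes here>"),
)

# Authentication touches users only; names come from people via a join.
name_for_login = conn.execute(
    "SELECT p.first_name, p.last_name FROM users AS u "
    "JOIN people AS p ON p.person_id = u.person_id "
    "WHERE u.username = ?",
    ("annlee",),
).fetchone()

# Everyone, with the username where one exists (NULL / None otherwise).
everyone = conn.execute(
    "SELECT p.first_name, u.username FROM people AS p "
    "LEFT JOIN users AS u ON u.person_id = p.person_id "
    "ORDER BY p.person_id"
).fetchall()
```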
I'd say go for the normalized design (two tables) and only denormalize (go down to one user/person table) if it will really make your life easier down the line. If, however, practically all people are also users, it may be simpler to denormalize up front. It's up to you; I have used the normalized approach without problems.
Very reasonable.
As an example, take a look at the aspnet_* services tables here.
Their built-in schema has an aspnet_Users table and an aspnet_Membership table, with the latter holding more extended information about a given user (hashed passwords, etc.), but aspnet_Users.UserID is used in the other portions of the schema for referential integrity etc.
Bottom line, it's very common, and good design, to have attributes in a separate table if they are different entities, as in your case.

Resources