I have a database with two tables: Users and Categories.
Users has these fields:
UserId unique identifier
UserName nvarchar
CategoryId int
Categories has these fields:
CategoryId int
CategoryName nvarchar
At the moment, every user is in one category. I want to change this so that each user can be in any number of categories. What is the best way to do this?
My site does a lot of really expensive searches, so the solution needs to be as efficient as possible. I don't really want to put the list of categories each user has in a third table, as this means that when I pull back a search, each user will be represented in several rows at once (at least, this is what would happen if I implemented a search with my current, fairly crude, understanding of SQL).
EDIT:
If I set up a many-to-many relationship, is it possible to return only one row for each user?
For instance:
DECLARE @SearchUserID nvarchar(200) = 1;
SELECT *
FROM Users JOIN Categories JOIN CategoriesPerUser
WHERE UserId = @SearchUserID
This would return one row for each category the user belonged to. Is it possible to have it return only one row?
At the moment you have a one-to-many relationship; that is to say, a category can be associated with many users, but a user can only be associated with one category.
You need to change this to a many-to-many relationship so that each user can be associated with many categories and each category can be associated with many users.
This is achieved by adding a table which links a UserId and a CategoryId (and removing CategoryId from the Users table).
Table: UserToCategory
UserId int
CategoryId int
As for your last paragraph, this is the most efficient way of modelling your requirement. You should make a combination of UserId/CategoryId the PrimaryKey in this table to stop a user being associated with the same category twice. This stops the problem of a user returned twice for a particular category.
The SQL to find, for example, all users associated with a category would be
SELECT u.*
FROM Users u
INNER JOIN UserToCategory uc
ON u.UserId = uc.UserID
WHERE uc.CategoryId = 123
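The schema and query above can be tried out with a minimal sketch like the following. It uses Python's built-in sqlite3 module with an in-memory database purely so that it runs anywhere; the question itself concerns SQL Server, and the sample users and categories are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (UserId INTEGER PRIMARY KEY, UserName TEXT);
    CREATE TABLE Categories (CategoryId INTEGER PRIMARY KEY, CategoryName TEXT);
    -- Junction table: the composite primary key blocks duplicate pairs.
    CREATE TABLE UserToCategory (
        UserId INTEGER NOT NULL REFERENCES Users(UserId),
        CategoryId INTEGER NOT NULL REFERENCES Categories(CategoryId),
        PRIMARY KEY (UserId, CategoryId)
    );
    -- Made-up sample data.
    INSERT INTO Users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO Categories VALUES (123, 'Books'), (456, 'Music');
    INSERT INTO UserToCategory VALUES (1, 123), (1, 456), (2, 123);
""")

# All users associated with category 123: one row per user.
users_in_category = conn.execute(
    """
    SELECT u.UserId, u.UserName
    FROM Users u
    INNER JOIN UserToCategory uc ON u.UserId = uc.UserId
    WHERE uc.CategoryId = ?
    ORDER BY u.UserId
    """,
    (123,),
).fetchall()

# Inserting the same (UserId, CategoryId) pair again violates the primary key.
try:
    conn.execute("INSERT INTO UserToCategory VALUES (1, 123)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Although alice belongs to two categories, she appears only once in the result, because the WHERE clause restricts the junction table to a single category.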
Edit after comments: If you have a query that finds a number of users, and you want a distinct list of categories associated with those users this could be done like
SELECT c.*
FROM Categories c
WHERE CategoryId IN
(
SELECT uc.CategoryID
FROM UserToCategory uc
INNER JOIN Users u ON uc.UserId = u.UserID
WHERE <some criteria here to filter users>
)
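As a rough illustration of the IN subquery above, here is a small sketch using Python's sqlite3 module (an assumption made for portability; the original targets SQL Server). The user filter, names starting with 'al', is a hypothetical stand-in for the elided criteria, and the data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (UserId INTEGER PRIMARY KEY, UserName TEXT);
    CREATE TABLE Categories (CategoryId INTEGER PRIMARY KEY, CategoryName TEXT);
    CREATE TABLE UserToCategory (
        UserId INTEGER NOT NULL,
        CategoryId INTEGER NOT NULL,
        PRIMARY KEY (UserId, CategoryId)
    );
    -- Made-up sample data: alice and alan share the Books category.
    INSERT INTO Users VALUES (1, 'alice'), (2, 'alan'), (3, 'bob');
    INSERT INTO Categories VALUES (10, 'Books'), (20, 'Music'), (30, 'Film');
    INSERT INTO UserToCategory VALUES (1, 10), (1, 20), (2, 10), (3, 30);
""")

# Hypothetical user filter: names starting with 'al'.
categories = conn.execute(
    """
    SELECT c.CategoryId, c.CategoryName
    FROM Categories c
    WHERE c.CategoryId IN (
        SELECT uc.CategoryId
        FROM UserToCategory uc
        INNER JOIN Users u ON uc.UserId = u.UserId
        WHERE u.UserName LIKE ?
    )
    ORDER BY c.CategoryId
    """,
    ("al%",),
).fetchall()
```

Books is returned once even though two of the matching users belong to it, since IN only tests membership and does not multiply rows.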
I would drop CategoryId from Users and go for the third table:
UserCategories
- UserId
- CategoryId
If you want to search all the categories for a user you can use for example:
SELECT uc.CategoryId, c.CategoryName
FROM UserCategories uc
JOIN Categories c ON uc.CategoryId = c.CategoryId
WHERE uc.UserId = @UserId
As this is an n-to-n relationship (one category can have several users and one user can have several categories), the typical way to implement this would be to have a junction table.
But as you said, you don't want to create a third table for reasons of already implemented features, so I guess you could also change the column "CategoryId" in the User table to "CategorieIds", which would then hold text. This text field could contain a list of integers separated by a special character ("," for example). As far as I'm concerned, you should then do the split operation in your application code, since I don't know of any practical way to do this in SQL (maybe someone could correct me here...).
You could then also keep your CategoryId column, if you wanted to implement something like a 'main' category per user.
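A minimal sketch of the split operation done in application code, in Python. The helper names `parse_category_ids` and `format_category_ids` are hypothetical, and the comma separator is the one suggested above.

```python
def parse_category_ids(raw):
    """Turn a stored string such as '3,17,42' into a list of ints."""
    if not raw:
        return []
    # Skip empty pieces so a trailing comma does not raise an error.
    return [int(part) for part in raw.split(",") if part.strip()]


def format_category_ids(ids):
    """Turn a list of ints back into the comma-separated string to store."""
    return ",".join(str(i) for i in ids)


category_ids = parse_category_ids("3,17,42")
round_trip = format_category_ids(category_ids)
```

Note that with this layout the database only sees an opaque string, so it cannot enforce a foreign key on the individual ids.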
Hope this helps and I hope to have suggested a correct way to implement this!
Related
The main idea is to store multiple ids from areas into one column. Example
Area A id=1
Area B id=2
If possible, I want to save in one column which areas my customer can service.
For example, if my customer can service both of them, I want to store both in one column. I imagine something like:
ColumnArea
1,2 //....or whatever area can service
Then I want to use an SQL query to retrieve this customer if the column contains this id.
Select * from customers where ColumnArea=1
Is there any technique or idea for doing that?
You really should not do that.
Storing multiple data points in a single column is bad design.
For a detailed explanation, read Is storing a delimited list in a database column really that bad?, where you will see a lot of reasons why the answer to this question is Absolutely yes!
What you want to do in this situation is create a new table, with a relationship to the existing table. In this case, you will probably need a many-to-many relationship, since clearly one customer can service more than one area, and I'm assuming one area can be serviced by more than one customer.
A many-to-many relationship is created by connecting two tables containing the data through another table containing the connections between the data (a.k.a. a bridge table). All relationships directly between tables are either one-to-one or one-to-many; the bridge table is what allows the relationship between the two data tables to be many-to-many.
So the database structure you want is something like this:
Customers Table
CustomerId (Primary key)
FirstName
LastName
... Other customer related data here
Areas Table
AreaId (Primary key)
AreaName
... Other area related data here
CustomerToArea table
CustomerId
AreaId
(Note: The combination of both columns is the primary key here)
Then you can select customers for area 1 like this:
SELECT C.*
FROM Customers AS C
WHERE EXISTS
(
SELECT 1
FROM CustomerToArea AS CA
WHERE CA.CustomerId = C.CustomerId
AND CA.AreaId = 1
)
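To make the bridge-table structure concrete, here is a small sketch using Python's sqlite3 module with an in-memory database (chosen only so the example is self-contained; the question does not name a database engine). The customer names and the helper `customers_for_area` are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerId INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
    CREATE TABLE Areas (AreaId INTEGER PRIMARY KEY, AreaName TEXT);
    -- Bridge table: the pair of columns is the primary key.
    CREATE TABLE CustomerToArea (
        CustomerId INTEGER NOT NULL REFERENCES Customers(CustomerId),
        AreaId INTEGER NOT NULL REFERENCES Areas(AreaId),
        PRIMARY KEY (CustomerId, AreaId)
    );
    -- Made-up sample data: customer 3 services no area at all.
    INSERT INTO Customers VALUES (1, 'Ann', 'Lee'), (2, 'Raj', 'Patel'), (3, 'Mia', 'Cruz');
    INSERT INTO Areas VALUES (1, 'Area A'), (2, 'Area B');
    INSERT INTO CustomerToArea VALUES (1, 1), (1, 2), (2, 2);
""")


def customers_for_area(area_id):
    """Return the ids of customers who can service the given area."""
    rows = conn.execute(
        """
        SELECT C.CustomerId
        FROM Customers AS C
        WHERE EXISTS (
            SELECT 1
            FROM CustomerToArea AS CA
            WHERE CA.CustomerId = C.CustomerId
              AND CA.AreaId = ?
        )
        ORDER BY C.CustomerId
        """,
        (area_id,),
    ).fetchall()
    return [row[0] for row in rows]


area_1 = customers_for_area(1)
area_2 = customers_for_area(2)
```

Customer 1 services both areas yet shows up exactly once per query, which is the behaviour the single-column design was trying to achieve.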
I have a question. I have a database called scraped. It has two tables, one called profiles and the other called charges. There is an identifier for each person in the profiles table called ID and a corresponding identifier in the charges table called profile_id. I'd like to export the profiles table, but want to join up all the contents of charges to make one big flat file (or table), using the "use query" option as the export method. I'm stumped as to how to do this.
One other issue: say John Smith has an ID of 101. He may have 10 rows in the charges table that correspond to his ID number. Will they all be listed in separate rows in the final output or not? If not, can they be somehow?
It sounds like you just need to write a simple join query.
SELECT [list of columns you need]
FROM Profiles
INNER JOIN Charges ON Profiles.ID = Charges.profile_id
If you need all profiles whether or not they have a charge then change INNER JOIN to LEFT JOIN.
And yes, you'll get a record for every match in the Charges table (so 10 for your john smith).
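The difference between the two joins can be seen in a short sketch using Python's sqlite3 module (an assumption; the question does not say which database is in use). The columns `name`, `charge_id` and `description` and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE profiles (ID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE charges (
        charge_id INTEGER PRIMARY KEY,
        profile_id INTEGER,
        description TEXT
    );
    -- Made-up data: profile 101 has three charges, profile 102 has none.
    INSERT INTO profiles VALUES (101, 'John Smith'), (102, 'Jane Doe');
    INSERT INTO charges (profile_id, description) VALUES
        (101, 'charge a'), (101, 'charge b'), (101, 'charge c');
""")

query = """
    SELECT profiles.ID, profiles.name, charges.description
    FROM profiles
    {join} JOIN charges ON profiles.ID = charges.profile_id
    ORDER BY profiles.ID, charges.charge_id
"""

# INNER JOIN: one row per matching charge; profiles without charges vanish.
inner_rows = conn.execute(query.format(join="INNER")).fetchall()

# LEFT JOIN: the same rows, plus one row with NULLs for each charge-less profile.
left_rows = conn.execute(query.format(join="LEFT")).fetchall()
```

Here the INNER JOIN yields three rows, all for profile 101, while the LEFT JOIN yields four: the extra row is profile 102 with no charge description.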
Are tables with lots of columns indicative of bad design? For example say I have the following table that stores user information and user settings:
[Users table]
userId
name
address
somesetting1
...
somesetting50
As the site requires more settings the table gets larger. In my mind this table is normalized, all the settings are dependent on the userId.
I have a thing against tables with lots of columns; it just seems wrong to me. But then I remembered that you can select which data to return from the table, so if the table is large I could still break it into several different objects in code. For example:
[User object]
[UserSetting object]
and return only the data to fill those objects.
Is the above common practice, or are there other techniques for dealing with tables with lots of columns that are more suitable?
I think you should use multiple tables like this:
[Users table]
userId
name
address
[Settings table]
settingId
userId
settingKey
settingValue
The tables are related by the userId column, which you can use to retrieve the settings for the user you need.
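A minimal sketch of this key/value layout, using Python's sqlite3 module as a stand-in database. The UNIQUE constraint on (userId, settingKey), the helper `load_settings` and the sample settings are assumptions added for illustration; they are not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (userId INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE Settings (
        settingId INTEGER PRIMARY KEY,
        userId INTEGER NOT NULL REFERENCES Users(userId),
        settingKey TEXT NOT NULL,
        settingValue TEXT,
        -- Assumption: at most one value per setting per user.
        UNIQUE (userId, settingKey)
    );
    -- Made-up sample data.
    INSERT INTO Users VALUES (1, 'alice', '1 Main St');
    INSERT INTO Settings (userId, settingKey, settingValue) VALUES
        (1, 'theme', 'dark'), (1, 'language', 'en');
""")


def load_settings(user_id):
    """Return one user's settings as a {key: value} dict."""
    rows = conn.execute(
        "SELECT settingKey, settingValue FROM Settings WHERE userId = ?",
        (user_id,),
    ).fetchall()
    return dict(rows)


settings = load_settings(1)
```

A new setting then becomes a new row rather than a new column, at the cost of every value being stored as text.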
I would say that it is bad table design. If a user doesn't have an entry for 47 of those 50 settings, then you will have a large number of NULLs in the table, which isn't good practice and will also slow down performance (NULLs have to be handled in a special way).
Instead, have the following:
USER TABLE
Id,
FirstName
LastName
etc
SETTINGS
Id,
SettingName
USER SETTINGS
Id,
SettingId,
UserId,
SettingValue
You then have a many-to-many join, and eliminate NULLs.
First, don't put spaces in table names! All the [braces] will be a real pain!
If you have 50 columns, how meaningful will all that data be for each user? Will there be lots of NULLs? Most data may not even apply to any given user. Think 1-to-1 tables, where you break down the "settings" into logical groups:
Users: --main table where most values will be stored
userId
name
address
somesetting1 ---please note that I'm using "somesetting1", don't
... --- name the columns like this, use meaningful names!!
somesetting5
UserWidgets --all widget settings for the user
userId
somesetting6
....
somesetting12
UserAccounting --all accounting settings for the user
userId
somesetting13
....
somesetting23
--etc..
You only need a Users row for each user, and then a row in each other table where that data applies to the given user. If a user doesn't have any widget settings, then there is no row for that user in UserWidgets. You can LEFT JOIN each table as necessary to get the settings you need. Usually you only need to work on a subset of settings based on which part of the application is running, which means you won't need to join in all of the tables, just the one or two that you need at that time.
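The optional 1-to-1 table and the LEFT JOIN can be sketched as follows, using Python's sqlite3 module as a stand-in database. The column `widgetColor` is a hypothetical example of a meaningful setting name, and the data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (userId INTEGER PRIMARY KEY, name TEXT);
    -- At most one row per user: userId is both primary key and foreign key.
    CREATE TABLE UserWidgets (
        userId INTEGER PRIMARY KEY REFERENCES Users(userId),
        widgetColor TEXT
    );
    -- Made-up data: bob has no widget settings, so he has no UserWidgets row.
    INSERT INTO Users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO UserWidgets VALUES (1, 'blue');
""")

# LEFT JOIN keeps every user; missing widget settings come back as NULL.
rows = conn.execute(
    """
    SELECT u.userId, u.name, w.widgetColor
    FROM Users u
    LEFT JOIN UserWidgets w ON w.userId = u.userId
    ORDER BY u.userId
    """
).fetchall()
```

The NULL for bob only exists in the query result; nothing is stored for him in UserWidgets.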
You could consider an attributes table. As long as your indexes are good, then you wouldn't have too much of a performance issue:
[AttributeDef]
AttributeDefId int (primary key)
GroupKey varchar(50)
ItemKey varchar(50)
...
[AttributeVal]
AttributeValId int (primary key)
AttributeDefId int (FK -> AttributeDef.AttributeDefId)
UserId int (probably FK to users table?)
Val varchar(255)
...
Basically you're "pivoting" your table with many columns into two tables with fewer columns. You can write views and table functions around this structure to give you data for a group of related items or just a specific item, etc. You could also add other things to the attribute definition table to indicate required data elements, restrictions on the data elements, etc.
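One way such a view might look is sketched below with Python's sqlite3 module (SQLite is an assumption; table functions are not available there, so only a view is shown). The view name `UserDisplaySettings`, the 'display' group and its item keys are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AttributeDef (
        AttributeDefId INTEGER PRIMARY KEY,
        GroupKey TEXT NOT NULL,
        ItemKey TEXT NOT NULL
    );
    CREATE TABLE AttributeVal (
        AttributeValId INTEGER PRIMARY KEY,
        AttributeDefId INTEGER NOT NULL REFERENCES AttributeDef(AttributeDefId),
        UserId INTEGER NOT NULL,
        Val TEXT
    );
    -- Hypothetical view: turns the 'display' group back into columns.
    CREATE VIEW UserDisplaySettings AS
    SELECT v.UserId,
           MAX(CASE WHEN d.ItemKey = 'theme' THEN v.Val END) AS theme,
           MAX(CASE WHEN d.ItemKey = 'font_size' THEN v.Val END) AS font_size
    FROM AttributeVal v
    JOIN AttributeDef d ON d.AttributeDefId = v.AttributeDefId
    WHERE d.GroupKey = 'display'
    GROUP BY v.UserId;
    -- Made-up data: user 8 has no font_size value.
    INSERT INTO AttributeDef VALUES (1, 'display', 'theme'), (2, 'display', 'font_size');
    INSERT INTO AttributeVal (AttributeDefId, UserId, Val) VALUES
        (1, 7, 'dark'), (2, 7, '14'), (1, 8, 'light');
""")

display_rows = conn.execute(
    "SELECT UserId, theme, font_size FROM UserDisplaySettings ORDER BY UserId"
).fetchall()
```

Callers of the view see ordinary columns, while the underlying storage stays at one row per user per attribute.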
What's your thought on this type of design?
Use several tables with matching indexes to get the best SELECT speed. Use the indexes as a way to relate the information between tables using a JOIN.
I use CakePHP 2.0 and I have two models: user and course.
They are connected with HABTM, so it is a many-to-many relationship.
I can create, read, update and delete courses, that's all okay.
But now, the user can take part in some courses.
In the add view of the user, I can select many courses with these lines of code, because it creates a multiple select field.
$this->set('courses', $this->User->Course->find('list'));
echo $this->Form->input('Course');
So I can select multiple values, but the problem is the order! I need a order in my courses, which the user of the application should be able to manage (the start/end date of the course is not relevant).
Example: I have the user with the id = 10.
It should be possible to select no course or 1 or many with a correct order.
I have approx. 200 courses with the IDs 1-200
So it should be possible:
* UserId 10 -> no course
* UserId 10 -> CourseId 23
* UserId 10 -> CourseId 23, CourseId 11, CourseId 45, CourseId 10, CourseId 199 (the order is important)
How can I do this in my add / edit view and in my controller to handle this?
Best regards.
In your courses table, you'll have to add another column, let's say related courses.
You shall use this column as a constraint. You can store data either in JSON format or serialized, etc. You can use the data in the column to sort your courses in the find query.
Then, in the view, you must use javascript to restrict the selection of the wrong course and notify the user about the order.
It's just an idea, but I hope it helps!
I have a 3-table schema. Two of the tables (Trade/Portfolio) have a 1:1 relationship, so the FK on one of these tables has the unique constraint.
The table, as explained above, with the FK (which is Portfolio) relates to a third table. As this third table (Price) is displaying historical information for a Portfolio (there can be many prices for a portfolio over a time-period), there's a bog-standard 1:m relationship.
However, I need to get the various prices for a portfolio. That's easy with a query which works on the portfolio ID. However, is this a feasible way to get the price of a single trade? Is there any limitation in the design that would prevent this?
Apologies for the long title, but could not find a better way to explain the issue!
Thanks
From your description, I guess this is your data model: the FK TradeID is unique in Portfolio.
And you wonder if it is possible to get the rows from Price related to Trade.
Here is a query that will give you all rows from Price where TradeID is 1.
select Price.*
from Portfolio
inner join Price
on Portfolio.PortfolioID = Price.PortfolioID
where Portfolio.TradeID = 1
I see nothing in this design that will prevent you from fetching the rows from Price given a TradeID.
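The query can be checked against a small sketch of the three-table schema, here using Python's sqlite3 module as a stand-in database. The Price columns `PriceID`, `PriceDate` and `Amount` and all sample rows are assumptions, since the question does not list the columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Trade (TradeID INTEGER PRIMARY KEY);
    -- 1:1 with Trade: the foreign key carries a UNIQUE constraint.
    CREATE TABLE Portfolio (
        PortfolioID INTEGER PRIMARY KEY,
        TradeID INTEGER NOT NULL UNIQUE REFERENCES Trade(TradeID)
    );
    -- 1:m with Portfolio: many historical prices per portfolio.
    CREATE TABLE Price (
        PriceID INTEGER PRIMARY KEY,
        PortfolioID INTEGER NOT NULL REFERENCES Portfolio(PortfolioID),
        PriceDate TEXT,
        Amount REAL
    );
    -- Made-up sample data.
    INSERT INTO Trade VALUES (1), (2);
    INSERT INTO Portfolio VALUES (10, 1), (20, 2);
    INSERT INTO Price (PortfolioID, PriceDate, Amount) VALUES
        (10, '2024-01-01', 100.0), (10, '2024-01-02', 101.5), (20, '2024-01-01', 55.0);
""")

# All historical prices reachable from trade 1 via its single portfolio.
prices_for_trade = conn.execute(
    """
    SELECT Price.PriceDate, Price.Amount
    FROM Portfolio
    INNER JOIN Price ON Portfolio.PortfolioID = Price.PortfolioID
    WHERE Portfolio.TradeID = ?
    ORDER BY Price.PriceDate
    """,
    (1,),
).fetchall()
```

Because TradeID is unique in Portfolio, the WHERE clause selects at most one portfolio, so the result is exactly that portfolio's price history.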