Handling Database Cross Foreign Keys


I have the following tables:
USER table:
id | username | password | join_date | avatar_image_id
IMAGE table:
id | url | user_owner_id
The image table holds all images for posts, articles, and user avatars. Each image belongs to a user who can edit it, so user_owner_id is necessary. However, it is not enough to tell which image is the user's avatar, so I also need avatar_image_id.
Does this cross foreign key cause problems? Is it a bad design? Is there any way to solve it?

Here are a couple of options.
Option 1
You do this in the IMAGE table:-
id
url
user_owner_id
user_avatar_id
user_avatar_id column is a foreign key to the USER table
user_avatar_id allows NULLs, indicating the image is not an avatar for anyone
Since each user has only one Avatar, user_avatar_id should be made unique
This has the advantage that you can continue to treat all images generically in code that only needs the first three columns. Only when you have specific code that is used for Avatar images will you need to consider the last column.
This effectively enforces a rule that "each user has 0 or 1 avatars". Perhaps when a user is first created, they don't have an avatar yet so this is ok? If every user strictly must have an avatar and you want to enforce this in the database you need Option 2.
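Option 1 can be sketched as follows. This is only a minimal illustration using Python's built-in sqlite3 module with an in-memory database; the sample rows and the variable names second_avatar_rejected and avatar_url are assumptions made for the demo, and other database engines may differ in how UNIQUE treats NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE user (
    id INTEGER PRIMARY KEY,
    username TEXT NOT NULL
);
CREATE TABLE image (
    id INTEGER PRIMARY KEY,
    url TEXT NOT NULL,
    user_owner_id INTEGER NOT NULL REFERENCES user(id),
    -- NULL means "not an avatar"; UNIQUE allows at most one avatar per user
    user_avatar_id INTEGER UNIQUE REFERENCES user(id)
);
""")

conn.execute("INSERT INTO user (id, username) VALUES (1, 'alice')")
conn.execute("INSERT INTO image (url, user_owner_id, user_avatar_id) VALUES ('a.png', 1, 1)")
# Ordinary images leave user_avatar_id NULL; many NULLs are allowed here.
conn.execute("INSERT INTO image (url, user_owner_id) VALUES ('post.png', 1)")
conn.execute("INSERT INTO image (url, user_owner_id) VALUES ('post2.png', 1)")

# A second avatar for the same user violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO image (url, user_owner_id, user_avatar_id) VALUES ('b.png', 1, 1)")
    second_avatar_rejected = False
except sqlite3.IntegrityError:
    second_avatar_rejected = True

avatar_url = conn.execute(
    "SELECT url FROM image WHERE user_avatar_id = ?", (1,)
).fetchone()[0]
```

Because the USER table no longer points back at IMAGE, there is no circular reference to work around when deleting rows.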
Option 2
How necessary is it that all images go in the same table? Do you often treat avatar images and other images generically with the same piece of code? If not, you could consider adding the following column to the USER table:-
avatar_url
This would make it quick and easy to find the avatar image for a given user. However, it may complicate image editing code because now you have to consider things stored in the image table as well as these special avatar images.
On balance, I would probably choose Option 1.

In general, yes: cross foreign keys are painful, especially when you want to delete (or archive) rows, because you will not be able to delete either of the two rows while each references the other.
One way to handle this is to define the FOREIGN KEY on avatar_image_id with the NOCHECK constraint option.
Another way is to add a BIT column, IsAvatarImage, to the IMAGE table. With the right indexes, the performance impact of this approach should be minimal.

Related

Best way to interconnect data to lower db queries [generic, no language specific]

I'm making a bot that receives and sends images. I have to keep track of which image was sent to whom so that each image is sent only once. A user can also flag an image as inappropriate.
I made a db with 2 tables:
userTable with userID and userName
imageTable with imgID, fileName, fileCRC
I can think only of:
a) add viewedBy to imageTable "user1,user213,user9"
or
b) add imageToView to userTable "123,545,21321,654565"
But if I do [a], the problem is that the more images a user has viewed, the more time is needed to get one random image.
And if I do [b], I already have a list of unseen images, so I can just pick a random one from it and then delete the id. But if one user flags an image as inappropriate, I have to loop over all the users in the db and remove the id from each of them...
Is there a better way?

You need an intermediate table that keeps track of which user has seen which image. Basically this new table, let’s call it imageByUser, would contain a user id, an image id, an inappropriate boolean flag, a dateseen datetime (optional, but useful in the long run) and a generated primary key (or you can use the combination of the user, the image and dateseen as a composite logical primary key instead).
Having this third table would solve all your problems, as you would just add a new row when someone sees an image. If they mark it as inappropriate, all you have to do is update the inappropriate flag to yes. This way you could even keep track of cases where a user happens to see the same image twice (just add another row to the table).
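A rough sketch of that intermediate table, using Python's built-in sqlite3 module. The helper names record_view and pick_unseen are hypothetical, and the rule that an image flagged by any user is hidden from everyone is an assumption made for the demo, not something the question specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE userTable (userID INTEGER PRIMARY KEY, userName TEXT);
CREATE TABLE imageTable (imgID INTEGER PRIMARY KEY, fileName TEXT, fileCRC TEXT);
CREATE TABLE imageByUser (
    id INTEGER PRIMARY KEY,                       -- generated key
    userID INTEGER NOT NULL REFERENCES userTable(userID),
    imgID INTEGER NOT NULL REFERENCES imageTable(imgID),
    inappropriate INTEGER NOT NULL DEFAULT 0,     -- boolean flag
    dateSeen TEXT DEFAULT CURRENT_TIMESTAMP
);
""")

def record_view(conn, user_id, img_id, inappropriate=False):
    """Add one row each time a user is sent an image."""
    conn.execute(
        "INSERT INTO imageByUser (userID, imgID, inappropriate) VALUES (?, ?, ?)",
        (user_id, img_id, int(inappropriate)),
    )

def pick_unseen(conn, user_id):
    """Return a random image id this user has not seen, or None.

    Assumption: images flagged by anyone are skipped for all users.
    """
    row = conn.execute(
        """
        SELECT i.imgID FROM imageTable AS i
        WHERE NOT EXISTS (SELECT 1 FROM imageByUser AS s
                          WHERE s.imgID = i.imgID AND s.userID = ?)
          AND NOT EXISTS (SELECT 1 FROM imageByUser AS f
                          WHERE f.imgID = i.imgID AND f.inappropriate = 1)
        ORDER BY RANDOM() LIMIT 1
        """,
        (user_id,),
    ).fetchone()
    return row[0] if row else None

conn.executemany("INSERT INTO userTable VALUES (?, ?)", [(1, "user1"), (2, "user2")])
conn.executemany(
    "INSERT INTO imageTable VALUES (?, ?, ?)",
    [(1, "a.jpg", "crc1"), (2, "b.jpg", "crc2"), (3, "c.jpg", "crc3")],
)
record_view(conn, 1, 1)                       # user 1 has seen image 1
record_view(conn, 2, 2, inappropriate=True)   # user 2 flagged image 2
```

Flagging touches a single row, so no per-user list has to be rewritten.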

Group multiple rows from one table?

I have a projects table with a many-to-many relationship with a images table through a junction table. I also want to add multiple cover images for a project.
I have multiple tables where I want to group some rows from the images table. Should I add a column with a boolean value indicating whether it's a cover image, or should I create a one-to-many table with the IDs of the images I want as covers? And if I add a column, wouldn't it be redundant if most of the column values would be null?
There is also a clients and discipline table where I want to select images from the image table but add some extra columns like sortorder.

Both techniques have their merits, and the answer can be opinion based; mine is. To begin with, I'd recommend adding a field (e.g. cover) to your images table indicating whether the image is primary or not. It's okay for most rows to have a NULL value in the cover field. Create an index on imageid and cover.
I'd also recommend creating a view called CoverImages that would be select ... from images where cover=true so that whichever application needing only a cover image can directly use this view. Depending on your database engine, you may have the ability to create sparse columns and/or create filtered indexes.
The other option is to add cover to your junction table that has projectid and imageid. However, that would violate 2nd normal form of database normalization since cover would be an attribute of an image, not a project.
The option you mentioned - that of putting images that are cover images in one-to-many table - could be problematic when more flags are added. If the image is flagged as sensitive, you would then have to create another table and put only sensitive imageids in it.
Based on the last 2 thoughts, I recommend putting cover flag in images table.
EDIT
The OP pointed out that imageid 1 could be a cover for project1 but not for project2. That means cover should be associated with the junction table and not the images table. 2NF will not be violated.
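The conclusion of the edit, with the cover flag on the junction table, might look like this. It is a sketch with Python's sqlite3 module; the table name project_images and the helper cover_images are assumptions, since the question does not name the junction table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE projects (projectid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE images (imageid INTEGER PRIMARY KEY, url TEXT);
CREATE TABLE project_images (
    projectid INTEGER NOT NULL REFERENCES projects(projectid),
    imageid INTEGER NOT NULL REFERENCES images(imageid),
    cover INTEGER NOT NULL DEFAULT 0,   -- 1 if the image is a cover for THIS project
    PRIMARY KEY (projectid, imageid)
);
""")
conn.executemany("INSERT INTO projects VALUES (?, ?)", [(1, "project1"), (2, "project2")])
conn.executemany("INSERT INTO images VALUES (?, ?)", [(1, "a.png"), (2, "b.png")])
# Image 1 is a cover for project 1 only; image 2 is a cover for project 2 only.
conn.executemany(
    "INSERT INTO project_images (projectid, imageid, cover) VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 2, 0), (2, 1, 0), (2, 2, 1)],
)

def cover_images(conn, projectid):
    """Return the ids of all cover images of one project."""
    rows = conn.execute(
        "SELECT imageid FROM project_images "
        "WHERE projectid = ? AND cover = 1 ORDER BY imageid",
        (projectid,),
    )
    return [r[0] for r in rows]
```

Several rows per project may carry cover = 1, which covers the "multiple cover images" requirement.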

What is a recommended schema / database design to store custom report settings in my sql database?

I am building a tool to allow people to create customized reports. My question revolves around getting the right database schema and design to support some custom report settings.
In terms of design, I have various Slides, and each Slide has a bunch of settings (like date range, etc). A Report would basically be an ordered list of slides.
The requirements are:
A user can create a report by putting together a list of "Slides" in any order they wish
A user can include the same slide twice in a report with different settings
So I was thinking of having the following tables:
Report Table: Id, Name, Description
Slide Table: Id, Description
ReportSlide Table: ReportId, SlideId, Order, SlideSettings
my 2 main questions are:
Order: Is this the best way to manage the fact that a user can order their slides on any given report
SlideSettings: since every slide has a different set of settings (inputs), I was thinking of storing this as just a json blob and then parsing it out on the front end. Does anyone think this is the wrong design? Is there a better way to store this information (again, each slide has different inputs, and you could have the same slide listed twice in a report, each with different settings)?

Order: Is this the best way to manage
It is the correct way.
SlideSettings: ... storing this as just a json blob
If you never intend to query these values, then that's fine.
You may want to rename ReportSlide to SlideInReport. A relationship should not just list the referenced tables, but the nature of the relationship.
Some (me) prefer to give PK-columns and FK-columns the same name. Then you cannot get away with just Id, but you need to call them sld_id, rep_id.
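A small sketch of the ReportSlide design from the question, using Python's sqlite3 and json modules. The column is named SlideOrder here because ORDER is a reserved word in SQL; the primary key on (ReportId, SlideOrder), the sample settings, and the helper load_report are assumptions made for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Report (Id INTEGER PRIMARY KEY, Name TEXT, Description TEXT);
CREATE TABLE Slide (Id INTEGER PRIMARY KEY, Description TEXT);
CREATE TABLE ReportSlide (
    ReportId INTEGER NOT NULL REFERENCES Report(Id),
    SlideId INTEGER NOT NULL REFERENCES Slide(Id),
    SlideOrder INTEGER NOT NULL,
    SlideSettings TEXT NOT NULL,            -- JSON blob, opaque to the database
    -- Keyed by position, so the same slide may appear twice in a report
    PRIMARY KEY (ReportId, SlideOrder)
);
""")
conn.execute("INSERT INTO Report VALUES (1, 'Monthly', 'demo report')")
conn.execute("INSERT INTO Slide VALUES (10, 'Sales chart')")

# The same slide is added twice with different settings.
for position, settings in enumerate([{"range": "2023"}, {"range": "2024"}], start=1):
    conn.execute(
        "INSERT INTO ReportSlide VALUES (1, 10, ?, ?)",
        (position, json.dumps(settings)),
    )

def load_report(conn, report_id):
    """Return the report's slides in display order with decoded settings."""
    rows = conn.execute(
        "SELECT SlideId, SlideSettings FROM ReportSlide "
        "WHERE ReportId = ? ORDER BY SlideOrder",
        (report_id,),
    )
    return [(slide_id, json.loads(blob)) for slide_id, blob in rows]
```

As noted above, this works as long as the settings are never filtered or joined on inside SQL.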

Maybe you should have a Settings table. You may also need a ValueTypes table to define which setting can take what kind of values (such as a date range). Then store the list of setting IDs against each slide.
Needless to say, the "best way" will depend on the type and amount of data being stored, etc. I am a novice with JSON, but from what I have read, keeping JSON strings in database fields is generally discouraged, though it is not a hard rule.

I think, from a high level view, your schema will work. However, you might consider revising some of the table structure. For example:
Settings
Rather than a JSON blob, it may be best to add a column for each setting to the ReportSlide table. Depending on what inputs you allow, give each one a column. For example, a date range will need StartDate/EndDate columns, and other inputs will need integer columns, text columns, etc.
What purpose does the Slide Table serve? If your database allows a many-to-many relationship between Slides and Reports, then the ReportSlide table will hold all your settings. Will your Slide Table have attributes? If not, then perhaps Report Slides are all you need. For example:
Report Table: ReportID | DateCreated | UserID | Description
ReportSlides Table: ReportSlideID | ReportID | SlideOrder | StartDate | EndDate | Description...
Unless your Slide table is going to hold specific attributes that will be common across every report, you don't need the extra joins or space.
Depending on the tool, you may also want to have a DateCreated, UserID, FolderID, etc. Attributes that allow people to organize their reports.
If the Slides are dependent on each other, you will want to add constraints so Slide 2 cannot be deleted if Slide 3 depends on it.
Order
Regarding order, having a SlideOrder column will work. Because each ReportSlideID will have a corresponding Report, the SlideOrder can still be changed. That way, if ReportSlideID = 1 belongs to ReportID = 1 and has specific settings, it can be ordered 7th or 3rd and still work.
Be aware of your naming convention. If the Order column is directly referencing Slide Order, then go ahead and name it SlideOrder.
I'm sure there are a million other ways to make it efficient. That's my initial idea based on what you've provided.

Report Table: ID (Primary Key), Name, Description,....
Slide Table: ID (PK), Name, Description,...
Slide_x_report Table: ID(PK), ReportID (FK), SlideID (FK), order
Slide_settings Table: ID(PK), NameSetting, DescriptionSettings, SlideXReportID (FK),...
I think you should have a structure like this. The Slide_settings table holds the settings of the different slides per report.
Imagine that the Slide_settings table may contain dynamic forms that relate to a specific slide of a report. This way everything is stored properly, and the Slide_settings table has only the columns needed to define one element of a slide.

Database Normalization and Nested Lists -- Cannot Think of a Solution

I am trying to implement a system on my website similar to Facebook's "Like" feature, where users can click a button that increments a counter. However, I have run into a problem in terms of efficiently storing data in my DB.
Each story has its own row in the stories table in my DB with the columns like and users_like.
I want each person to only be able to like the story once. Therefore I need to somehow store data that shows that the user has, in fact, liked the post.
All I could think of was to have a column named users_like and then add each user, followed by a comma, to the column using CONCAT, and then use the PHP explode function to split the data.
However, this method, as far as I know, goes in the opposite direction of database normalization.
What is the best way to do this? I understand "best" is subjective.
I cannot add a liked flag to the user table because there will be a vast number of stories the person could 'like.'
Thanks

You need a many to many table in your database that will store a foreign key to the stories table and a foreign key to the user table. You put a constraint on this table saying that the story fk - user fk combo must be unique.
You now don't even have to have a like column, you just count the number of rows in the many to many table corresponding to your story.
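A minimal sketch of that many-to-many table, using Python's built-in sqlite3 module. The table name story_likes and the helpers like and like_count are hypothetical; INSERT OR IGNORE is SQLite syntax, and other engines have their own way of skipping duplicate rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE stories (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE story_likes (
    story_id INTEGER NOT NULL REFERENCES stories(id),
    user_id INTEGER NOT NULL REFERENCES users(id),
    -- The (story, user) pair must be unique: one like per user per story
    PRIMARY KEY (story_id, user_id)
);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob")])
conn.execute("INSERT INTO stories VALUES (1, 'First story')")

def like(conn, story_id, user_id):
    """Record a like; a repeated like by the same user is silently ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO story_likes (story_id, user_id) VALUES (?, ?)",
        (story_id, user_id),
    )

def like_count(conn, story_id):
    """Count likes by counting rows instead of keeping a counter column."""
    return conn.execute(
        "SELECT COUNT(*) FROM story_likes WHERE story_id = ?", (story_id,)
    ).fetchone()[0]

like(conn, 1, 1)
like(conn, 1, 1)  # duplicate, has no effect
like(conn, 1, 2)
```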

What are the best practices in db design when I want to store a value that is either selected from a dropdown list or user-entered?

I am trying to find the best way to design the database in order to allow the following scenario:
The user is presented with a dropdown list of Universities (for example)
The user selects his/her university from the list if it exists
If the university does not exist, he should enter his own university in a text box (sort of like Other: [___________])
How should I design the database to handle this situation, given that I might want to sort using the university ID, for example (probably only for the built-in universities and not the ones entered by users)?
thanks!
I just want to make it similar to how Facebook handles this situation. If the user selects his Education (by actually typing in the combobox which is not my concern) and choosing one of the returned values, what would Facebook do?
My guess is that it would insert the UserID and the EducationID in a many-to-many table. Now what if what the user enters is not in the database at all? It is still stored in his profile, but where?

CREATE TABLE university
(
    id smallint NOT NULL,
    name text,
    public smallint,
    CONSTRAINT university_pk PRIMARY KEY (id)
);
CREATE TABLE person
(
    id smallint NOT NULL,
    university smallint,
    -- more columns here...
    CONSTRAINT person_pk PRIMARY KEY (id),
    CONSTRAINT person_university_fk FOREIGN KEY (university)
        REFERENCES university (id) MATCH SIMPLE
        ON UPDATE NO ACTION ON DELETE NO ACTION
);
public is set to 1 for the Unis in the system, and 0 for user-entered-unis.

You could cheat: if you're not worried about the referential integrity of this field (i.e. it's just there to show up in a user's profile and isn't required for strictly enforced business rules), store it as a simple VARCHAR column.
For your dropdown, use a query like:
SELECT DISTINCT(University) FROM Profiles
If you want to filter out typos or one-offs, try:
SELECT University FROM Profiles
GROUP BY University
HAVING COUNT(University) > 10 -- where 10 is an arbitrary threshold you can tweak
We use this code in one of our databases for storing the trade descriptions of contractor companies; since this is informational only (there's a separate "Category" field for enforcing business rules) it's an acceptable solution.

Keep a flag for the rows entered through user input in the same table as you have your other data points. Then you can sort using the flag.

One way this was solved in a previous company I worked at:
Create two columns in your table:
1) a nullable id of the system-supplied string (stored in a separate table)
2) the user supplied string
Only one of these is populated. A constraint can enforce this (and additionally that at least one of these columns is populated if appropriate).
It should be noted that the problem we were solving with this was a true "Other:" situation. It was a textual description of an item with some preset defaults. Your situation sounds like an actual entity that isn't in the list, such that more than one user might want to input the same university.
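The two-column approach with its constraint could be sketched like this, using Python's sqlite3 module. The column names university_id and university_other and the helper try_insert are hypothetical; the CHECK shown requires exactly one of the two columns to be populated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE university (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE person (
    id INTEGER PRIMARY KEY,
    university_id INTEGER REFERENCES university(id),  -- system-supplied choice
    university_other TEXT,                            -- user-supplied string
    -- Each IS NOT NULL test yields 0 or 1, so the sum counts populated columns
    CHECK ((university_id IS NOT NULL) + (university_other IS NOT NULL) = 1)
);
""")
conn.execute("INSERT INTO university VALUES (1, 'Example University')")

def try_insert(conn, university_id, university_other):
    """Insert a person; return False if the CHECK constraint rejects the row."""
    try:
        conn.execute(
            "INSERT INTO person (university_id, university_other) VALUES (?, ?)",
            (university_id, university_other),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

Relaxing the comparison to `<= 1` would allow rows with neither value, for cases where the field is optional.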

This isn't a database design issue. It's a UI issue.
The Drop down list of universities is based on rows in a table. That table must have a new row inserted when the user types in a new University to the text box.
If you want to separate the list you provided from the ones added by users, you can have a column in the University table with origin (or provenance) of the data.

I'm not sure if the question is very clear here.
I've done this quite a few times at work and just select between either the drop-down list or a text box. If the data is entered in the text box, I first insert it into the database and then use SCOPE_IDENTITY() to get the unique identifier of the inserted row for further queries.
INSERT INTO MyTable (Name) VALUES ('myval'); SELECT SCOPE_IDENTITY();
This is against MS SQL 2008, though. I'm not sure whether SCOPE_IDENTITY() exists in other SQL dialects, but I'm sure there are equivalents.
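As one example of an equivalent, Python's sqlite3 module exposes the id of the last inserted row as cursor.lastrowid. The sketch below assumes a MyTable with a unique Name column, and the helper get_or_create is hypothetical; it reuses an existing row when the typed value is already present.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable ("
    "Id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "Name TEXT NOT NULL UNIQUE)"
)

def get_or_create(conn, name):
    """Return the Id for name, inserting a new row if it does not exist yet."""
    row = conn.execute("SELECT Id FROM MyTable WHERE Name = ?", (name,)).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute("INSERT INTO MyTable (Name) VALUES (?)", (name,))
    return cur.lastrowid  # id generated by the INSERT just executed

first_id = get_or_create(conn, "myval")
same_id = get_or_create(conn, "myval")
other_id = get_or_create(conn, "another value")
```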
