Save image path in database issue

When a user uploads an image, I resize it to 3 different sizes. I can't decide which option is better:
1) Save the image path in one column of the database, e.g. User9876/ImageName, and the extension in another column. When I load user profile data from the database, my business object has three properties, one per image size, and in code I fill these properties by appending the size suffix and extension to the stored path, like:
User9876/ImageName_Original.jpg
User9876/ImageName_Small.jpg
User9876/ImageName_Smallest.jpg
2) Or is it better to have three columns in the database, one for each size?

A couple of recommendations:
Don't store the extension separately unless you plan on querying for particular extensions. It just adds complexity for no real gain.
I would store the image size code as an id in your image table. This way you can update/edit size codes without having to run an update statement on your entire table.
create table image (
id int(11) unsigned not null auto_increment,
path varchar(255) not null,
size_code_id smallint(3) not null,
PRIMARY KEY (id)
)Engine=MyISAM Default Charset=utf8;
Example rows:
1 | user9876/imagename_og.jpg | 1
2 | user9876/imagename_me.jpg | 2
3 | user9876/imagename_sm.jpg | 3
This design will allow you to query for the 'small' image without having to parse the filename, yet you can still make the distinction of a small image from medium and original images on the filesystem.
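A minimal sketch of this layout, using Python's built-in sqlite3 module as a stand-in for MySQL (SQLite types replace the MySQL ones). It assumes a hypothetical image_size lookup table that size_code_id points to; that table and its size names are not part of the original answer. The point it shows is that the 'small' images are found through the size code, not by parsing filenames.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical lookup table (an assumption): renaming a size code
    -- touches one row here instead of every row of the image table.
    CREATE TABLE image_size (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE image (
        id           INTEGER PRIMARY KEY AUTOINCREMENT,
        path         TEXT NOT NULL,
        size_code_id INTEGER NOT NULL REFERENCES image_size(id)
    );
    INSERT INTO image_size (id, name) VALUES (1, 'original'), (2, 'medium'), (3, 'small');
    INSERT INTO image (path, size_code_id) VALUES
        ('user9876/imagename_og.jpg', 1),
        ('user9876/imagename_me.jpg', 2),
        ('user9876/imagename_sm.jpg', 3);
    """
)

def small_image_paths(conn: sqlite3.Connection) -> list[str]:
    """Return the paths of all 'small' images without parsing any filename."""
    rows = conn.execute(
        """
        SELECT i.path
        FROM image i
        JOIN image_size s ON s.id = i.size_code_id
        WHERE s.name = 'small'
        ORDER BY i.id
        """
    ).fetchall()
    return [path for (path,) in rows]

print(small_image_paths(conn))  # ['user9876/imagename_sm.jpg']
```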

I think the answer to your question can be found by asking yourself another question: what is going to happen if you later choose to have not three, but four different image sizes? Clearly, the solution which does not require you to reorganize your entire database is better.
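To make this concrete for the question's option 1 (one base-path column, with the size suffixes added in code), here is a sketch in which the sizes live in a single list, so a fourth size is one more list entry rather than a new column. The suffix names come from the question; the SIZE_SUFFIXES constant and the image_paths helper are hypothetical.

```python
# Hypothetical list of size suffixes, following the question's naming scheme.
SIZE_SUFFIXES = ["Original", "Small", "Smallest"]

def image_paths(base_path: str, extension: str, suffixes=SIZE_SUFFIXES) -> dict[str, str]:
    """Build one file path per size from a stored base path like 'User9876/ImageName'."""
    ext = extension.lstrip(".")  # accept both 'jpg' and '.jpg'
    return {suffix: f"{base_path}_{suffix}.{ext}" for suffix in suffixes}

paths = image_paths("User9876/ImageName", "jpg")
print(paths["Small"])  # User9876/ImageName_Small.jpg

# A fourth size only changes the list; the database schema stays the same.
print(image_paths("User9876/ImageName", "jpg", SIZE_SUFFIXES + ["Medium"])["Medium"])
```

With separate columns per size (option 2), the same change would need an ALTER TABLE plus a data migration.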

Is a join faster than storing simple values directly in the table?

We are building a SQL Server database that contains rows with file types. These types are going to be pdf in 90% of cases and zip in 10% of cases. When designing this database, my first thought was to use foreign keys to another table holding the types, like this:
Files
Id | Name   | FileTypeId
---+--------+-----------
1  | 'f123' | 1
FileTypes
Id | Name
---+------
1  | 'pdf'
2  | 'zip'
An advantage of this design is that it makes the Files table less cluttered and, in theory, it should take up less space. However, this design would also require me to join the FileTypes table every time I want to select a row from the Files table.
The alternative then would be to store the FileTypes directly in the Files table instead:
Id | Name   | FileTypeId
---+--------+-----------
1  | 'f123' | 'pdf'
What design is best? What is faster? Which saves more space?
The design which is "best" totally depends on your particular needs and use case. Certainly, the first version is normalized and is what you would want most of the time. The Files table only maintains a lightweight FileTypeId foreign key value which can be used to look up the type in the FileTypes table. However, a join is required to bring these two tables together.
In rare cases, you might find that the following join query takes too long:
SELECT f.Id, f.Name, ft.Name AS FileTypeId
FROM Files f
INNER JOIN FileTypes ft
ON ft.Id = f.FileTypeId;
Typically, your first line of defense would be to index one or both tables. But if even indexing were still not fast enough, you might go with the denormalized second version in your question. In that case you would be spending more storage space, but in return the query above would run against a single table, with no join, and might be faster.
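A small sketch comparing the two designs with Python's sqlite3 module: the normalized tables need the join, the denormalized one (the table name FilesDenorm and the second sample row 'f124' are hypothetical) answers the same question from one table. It only checks that both return the same rows; it does not measure speed, which depends on data volume, indexes and the database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Normalized design: Files stores only a small integer key.
    CREATE TABLE FileTypes (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Files (
        Id         INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL,
        FileTypeId INTEGER NOT NULL REFERENCES FileTypes(Id)
    );
    -- Index on the foreign key column (helps queries that filter Files by type).
    CREATE INDEX IX_Files_FileTypeId ON Files(FileTypeId);

    -- Denormalized design (hypothetical table name): the type name is copied into each row.
    CREATE TABLE FilesDenorm (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL, FileType TEXT NOT NULL);

    INSERT INTO FileTypes VALUES (1, 'pdf'), (2, 'zip');
    INSERT INTO Files VALUES (1, 'f123', 1), (2, 'f124', 2);
    INSERT INTO FilesDenorm VALUES (1, 'f123', 'pdf'), (2, 'f124', 'zip');
    """
)

joined = conn.execute(
    """
    SELECT f.Id, f.Name, ft.Name AS FileType
    FROM Files f
    INNER JOIN FileTypes ft ON ft.Id = f.FileTypeId
    ORDER BY f.Id
    """
).fetchall()

single_table = conn.execute("SELECT Id, Name, FileType FROM FilesDenorm ORDER BY Id").fetchall()

print(joined)                  # [(1, 'f123', 'pdf'), (2, 'f124', 'zip')]
print(joined == single_table)  # True
```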
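We're talking about bytes of difference here, but a file type stored as char(3) (3 bytes) is smaller than an int (4 bytes), and of course a tinyint (1 byte) is smaller than both. (In SQL Server a varchar(3) costs its data length plus 2 bytes, so 'pdf' in a varchar(3) actually takes 5 bytes.)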
And as there is no inherent ordering of filetypes (e.g., filetype_id 1 isn't really less than filetype_id 2 in any real sense), there is no absolute benefit in using numbers to store the filetypes.
Personally, I find a third option (Option 3) easier: use a meaningful key for the file type.
Files
Id | Name   | FileTypeId
---+--------+-----------
1  | 'f123' | 'pdf'
FileTypes
Id
-----
'pdf'
'zip'
Then the data is readable as-is (you don't need to do a lookup via joins or views), and you can still enforce referential integrity.
Edit: as an example of cleanliness, you may have certain processing you need to do on PDFs (e.g., removing identifying information from them). In your code you could have either of these two:
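A sketch of Option 3 with Python's sqlite3 module: the type name itself is the primary key of FileTypes, so Files is readable without a join, while the foreign key still rejects unknown types. The try_insert helper and the extra sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when this is on
conn.executescript(
    """
    -- The file type name is the primary key (a meaningful, natural key).
    CREATE TABLE FileTypes (Id TEXT PRIMARY KEY);
    CREATE TABLE Files (
        Id         INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL,
        FileTypeId TEXT NOT NULL REFERENCES FileTypes(Id)
    );
    INSERT INTO FileTypes VALUES ('pdf'), ('zip');
    INSERT INTO Files VALUES (1, 'f123', 'pdf');
    """
)

# Readable without a join: the stored value is already 'pdf'.
print(conn.execute("SELECT Name, FileTypeId FROM Files").fetchall())  # [('f123', 'pdf')]

def try_insert(file_id: int, name: str, file_type: str) -> bool:
    """Hypothetical helper: True if the row was accepted, False if the FK rejected it."""
    try:
        conn.execute("INSERT INTO Files VALUES (?, ?, ?)", (file_id, name, file_type))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert(2, 'f124', 'zip'))  # True
print(try_insert(3, 'f125', 'doc'))  # False: 'doc' is not in FileTypes
```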
IF FileType_ID = 14 -- PDFs
BEGIN
...
END
IF FileType_ID = 'pdf'
BEGIN
...
END
CAVEAT (added as an edit; I originally had this as a now-deleted comment that the OP replied to, sorry)
There is one (potentially serious) limitation to this - the meaningful key has to mean what you want it to mean.
In this example, you will run into problems if the extension does not define the filetype but you still want to differentiate them. For example, you may have a 'cs' filetype that could be either a C# code file, or CounterStrike streamed video.
In that case, you may want to separate them into two filetypes - and then using numbers to represent them is probably the easiest/most efficient.

Handling Database Cross Foreign Keys

I have the following tables:
USER table:
id | username | password | join_date | avatar_image_id
IMAGE table:
id | url | user_owner_id
The image table holds all images of posts, articles and user avatars. Each image belongs to a user who can edit it, so user_owner_id is necessary, but it is not enough to know which image is the user's avatar, so I need avatar_image_id.
Does this cross foreign key cause problems? Is it a bad design? Is there any way to solve it?
Here are a couple of options.
Option 1
Add a fourth column to the IMAGE table:
id
url
user_owner_id
user_avatar_id
The user_avatar_id column is a foreign key to the USER table
user_avatar_id allows NULLs, indicating the image is not an avatar for anyone
Since each user has only one avatar, user_avatar_id should be made unique
This has the advantage that you can continue to treat all images generically in code that only needs the first three columns. Only when you have specific code that is used for Avatar images will you need to consider the last column.
This effectively enforces a rule that "each user has 0 or 1 avatars". Perhaps when a user is first created, they don't have an avatar yet so this is ok? If every user strictly must have an avatar and you want to enforce this in the database you need Option 2.
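A minimal sketch of Option 1 using Python's sqlite3 module. USER is renamed to users here because USER is a reserved word in several databases (e.g. PostgreSQL and SQL Server). The helpers avatar_url and try_add_image are hypothetical; the nullable UNIQUE column is what gives "0 or 1 avatars per user".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
conn.executescript(
    """
    CREATE TABLE users (
        id       INTEGER PRIMARY KEY,
        username TEXT NOT NULL
    );
    CREATE TABLE image (
        id             INTEGER PRIMARY KEY,
        url            TEXT NOT NULL,
        user_owner_id  INTEGER NOT NULL REFERENCES users(id),
        -- NULL = not anyone's avatar. UNIQUE permits many NULLs but at most
        -- one image per user id, i.e. "each user has 0 or 1 avatars".
        user_avatar_id INTEGER UNIQUE REFERENCES users(id)
    );
    INSERT INTO users VALUES (1, 'alice');
    INSERT INTO image VALUES (10, 'a.png', 1, 1);       -- alice's avatar
    INSERT INTO image VALUES (11, 'post.png', 1, NULL); -- an ordinary post image
    """
)

def avatar_url(user_id: int):
    """Return the URL of the user's avatar, or None if they have none."""
    row = conn.execute("SELECT url FROM image WHERE user_avatar_id = ?", (user_id,)).fetchone()
    return row[0] if row else None

def try_add_image(image_id: int, url: str, owner_id: int, avatar_for) -> bool:
    """Insert an image; return False if a constraint rejects it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO image VALUES (?, ?, ?, ?)",
                (image_id, url, owner_id, avatar_for),
            )
        return True
    except sqlite3.IntegrityError:
        return False

print(avatar_url(1))                              # a.png
print(try_add_image(12, 'b.png', 1, 1))           # False: alice already has an avatar
print(try_add_image(13, 'article.png', 1, None))  # True: non-avatar images are unrestricted
```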
Option 2
How necessary is it that all images go in the same table? Do you often treat avatar images and other images generically with the same piece of code? If not, you could consider adding the following column to the USER table:
avatar_url
This would make it quick and easy to find the avatar image for a given user. However, it may complicate image editing code because now you have to consider things stored in the image table as well as these special avatar images.
On balance, I would probably choose Option 1.
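In general, yes: cross foreign keys are painful, especially when you want to delete (or archive) rows, because you won't be able to delete either of the two rows while the other still references it.
One way to handle this (in SQL Server) is to disable the FOREIGN KEY on avatar_image_id with NOCHECK CONSTRAINT; the trade-off is that the database then no longer guarantees that avatar_image_id points to an existing image.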
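To see the deletion problem concretely, here is a sketch with Python's sqlite3 module (USER renamed to users, as it is a reserved word in several databases). SQLite has no NOCHECK, so instead of disabling the constraint this sketch uses a different workaround: clear the nullable avatar_image_id first, then delete, all in one transaction. The helpers statement_succeeds and delete_user are hypothetical.

```python
import sqlite3

# isolation_level=None: autocommit mode, so BEGIN/COMMIT below are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE users (
        id              INTEGER PRIMARY KEY,
        username        TEXT NOT NULL,
        avatar_image_id INTEGER REFERENCES image(id)          -- users -> image
    );
    CREATE TABLE image (
        id            INTEGER PRIMARY KEY,
        url           TEXT NOT NULL,
        user_owner_id INTEGER NOT NULL REFERENCES users(id)   -- image -> users
    );
    INSERT INTO users VALUES (1, 'alice', NULL);  -- avatar must start as NULL
    INSERT INTO image VALUES (10, 'a.png', 1);
    UPDATE users SET avatar_image_id = 10 WHERE id = 1;
    """
)

def statement_succeeds(sql: str) -> bool:
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:  # raised when a foreign key blocks the statement
        return False

# Each row is still referenced by the other, so neither delete is allowed.
user_delete_ok = statement_succeeds("DELETE FROM users WHERE id = 1")
image_delete_ok = statement_succeeds("DELETE FROM image WHERE id = 10")
print(user_delete_ok, image_delete_ok)  # False False

def delete_user(user_id: int) -> None:
    """Break the cycle first, then delete, atomically."""
    conn.execute("BEGIN")
    try:
        conn.execute("UPDATE users SET avatar_image_id = NULL WHERE id = ?", (user_id,))
        conn.execute("DELETE FROM image WHERE user_owner_id = ?", (user_id,))
        conn.execute("DELETE FROM users WHERE id = ?", (user_id,))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise

delete_user(1)
print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])  # 0
```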
Another way is to add a BIT column, IsAvatarImage, to the IMAGE table. With the right indexes, the performance impact of this approach should be minimal.
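A sketch of the flag approach with Python's sqlite3 module. Only image references the user, so there is no cycle to fight when deleting. The "right index" assumed here is a partial unique index (SQL Server calls the equivalent a filtered index), which both speeds up avatar lookups and allows at most one avatar per owner. The column name is_avatar and the helpers avatar_url and set_avatar are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE image (
        id            INTEGER PRIMARY KEY,
        url           TEXT NOT NULL,
        user_owner_id INTEGER NOT NULL,
        -- Stand-in for the BIT column (SQLite has no BIT type).
        is_avatar     INTEGER NOT NULL DEFAULT 0 CHECK (is_avatar IN (0, 1))
    );
    -- Partial unique index: at most one avatar per owner; also serves avatar lookups.
    CREATE UNIQUE INDEX ux_image_one_avatar ON image(user_owner_id) WHERE is_avatar = 1;
    INSERT INTO image VALUES
        (10, 'a.png', 1, 1), (11, 'post.png', 1, 0), (12, 'article.png', 1, 0);
    """
)

def avatar_url(user_id: int):
    row = conn.execute(
        "SELECT url FROM image WHERE user_owner_id = ? AND is_avatar = 1", (user_id,)
    ).fetchone()
    return row[0] if row else None

def set_avatar(user_id: int, image_id: int) -> None:
    """Move the avatar flag to another image the user owns, in one transaction."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "UPDATE image SET is_avatar = 0 WHERE user_owner_id = ? AND is_avatar = 1",
            (user_id,),
        )
        cur = conn.execute(
            "UPDATE image SET is_avatar = 1 WHERE id = ? AND user_owner_id = ?",
            (image_id, user_id),
        )
        if cur.rowcount != 1:
            raise ValueError("image not found or not owned by this user")  # triggers rollback

set_avatar(1, 11)
print(avatar_url(1))  # post.png

try:
    with conn:
        conn.execute("UPDATE image SET is_avatar = 1 WHERE id = 12")  # a second avatar
    second_avatar_allowed = True
except sqlite3.IntegrityError:
    second_avatar_allowed = False
print(second_avatar_allowed)  # False
```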

what is the best table structure for keeping several combo box(list box)

I have several list boxes in my web application that the user has to fill in. The administrator can add/remove/edit the values in each combo box from a control panel, so the problem is: what is the best way to keep these combo boxes in the database?
One way is to keep a separate table for each combo box. I think this is very easy to handle, but I would have to create more than 20 tables, one per combo/list box, and I wonder whether that is good practice.
Another way is to keep one table for all combo boxes, but I am worried about deleting data in that case.
If I want to remove India from the country column of the combo box table, then I will have a problem. I may have to update it to null or some other value, and handle this on the programming side.
Am I correct? Can you help me?
I think you should just create a table with 3 fields: the first field is the id, the second is the name, and the last is the foreign key. For example:
combo_box_table
id - name - box
1 - Japan - 1
2 - India - 1
3 - Scotland - 2
4 - England - 3
In your queries you just filter on the last field, which identifies the box: 1 represents combo box 1, 2 represents combo box 2, and so on.
select * from combo_box_table where box = 1
If you want to delete India, the query is just: delete from combo_box_table where id = 2
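The same single-table design as a runnable sketch with Python's sqlite3 module; the index on box and the options helper are assumptions added for illustration. Loading one combo box is a filter on box, and removing India is a single DELETE that touches no other row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE combo_box_table (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        box  INTEGER NOT NULL   -- which combo box this option belongs to
    );
    -- Assumed index so that loading one box does not scan the whole table.
    CREATE INDEX ix_combo_box_table_box ON combo_box_table(box);
    INSERT INTO combo_box_table VALUES
        (1, 'Japan', 1), (2, 'India', 1), (3, 'Scotland', 2), (4, 'England', 3);
    """
)

def options(box: int) -> list[str]:
    """Return the option names of one combo box in insertion (id) order."""
    rows = conn.execute(
        "SELECT name FROM combo_box_table WHERE box = ? ORDER BY id", (box,)
    ).fetchall()
    return [name for (name,) in rows]

print(options(1))  # ['Japan', 'India']
with conn:
    conn.execute("DELETE FROM combo_box_table WHERE id = ?", (2,))
print(options(1))  # ['Japan']
```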
Hope this helps.
Another possibility would be to save the combo box data as an array or a JSON string in a single field in your table, but whether you want to do this depends on how you want your table to function and what your application is. See Save PHP array to MySQL? for further information.
EDIT:
I'm going to assume you have a combo-box with different countries and possibly another with job titles and others.
If you create multiple tables, then yes, you would have to use multiple SQL queries, but the amount of data in each table would be flexible and deleting would be a one-step process:
mysqli_query($link,"DELETE FROM Countries WHERE Name='India'");
With the JSON or array option you could have one table, with one column per combo-box. This would mean you only have to query the table once to populate the combo-boxes, but you would then have to decode the JSON strings and iterate through them, also checking for null values (for instance, if countries had 50 entries but job titles only had 20). There would also be a limit on the amount of data, as the "text" type has a finite maximum length. (Possible, but a nightmare of code to manage.)
You may have to query multiple times to populate the boxes, but I feel that the first method would be the most organized and flexible, unless I have misinterpreted your database structure needs...
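A sketch of the JSON option with Python's sqlite3 and json modules, to show why deleting is no longer one statement: the whole list has to be decoded, edited in application code, re-encoded and written back. The single-row combo_boxes table and the helpers load_options and remove_option are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical single-row table: one TEXT column per combo box, each holding a JSON array.
conn.execute("CREATE TABLE combo_boxes (id INTEGER PRIMARY KEY, countries TEXT, job_titles TEXT)")
with conn:
    conn.execute(
        "INSERT INTO combo_boxes VALUES (1, ?, ?)",
        (json.dumps(["Japan", "India"]), json.dumps(["Shoe Maker"])),
    )

ALLOWED_COLUMNS = {"countries", "job_titles"}  # column names cannot be bound as parameters

def load_options(column: str) -> list[str]:
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unknown combo box column: {column}")
    (raw,) = conn.execute(f"SELECT {column} FROM combo_boxes WHERE id = 1").fetchone()
    return json.loads(raw) if raw is not None else []  # NULL is treated as an empty box

def remove_option(column: str, value: str) -> None:
    # Read-modify-write: decode the whole list, edit it in code, encode it, store it back.
    items = [item for item in load_options(column) if item != value]
    with conn:
        conn.execute(f"UPDATE combo_boxes SET {column} = ? WHERE id = 1", (json.dumps(items),))

remove_option("countries", "India")
print(load_options("countries"))  # ['Japan']
```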
A third possible answer, though very different, could be to use AJAX to populate the combo-boxes from separate .txt files on the server, though editing them and removing or adding options to them through any way other than manually opening the file and typing in it or deleting it would be complex as well.
Unless you have some extra information at the level of the combo-box itself, just a simple table of combo-box items would be enough:
CREATE TABLE COMBO_BOX_ITEM (
COMBO_BOX_ID INT,
VALUE VARCHAR(255),
PRIMARY KEY (COMBO_BOX_ID, VALUE)
)
To get items of a given combo-box:
SELECT VALUE FROM COMBO_BOX_ITEM WHERE COMBO_BOX_ID = <whatever>
The nice thing about this query is that it can be satisfied by a simple range scan on the primary index. In fact, assuming the query optimizer of your DBMS is clever enough, the table heap is not touched at all, and you can eliminate this "unnecessary" heap by clustering the table (if your DBMS supports clustering). Physically, you'd end up with just a single B-Tree representing the whole table.
Use a single table for Countries and another for Job Descriptions, set up like so:
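SQLite offers one concrete form of this clustering: a WITHOUT ROWID table stores its rows directly in the primary-key B-tree. A sketch with Python's sqlite3 module (the items helper and sample values are illustrative); EXPLAIN QUERY PLAN shows how the query is satisfied, though its exact wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- WITHOUT ROWID: the table *is* the primary-key B-tree (no separate heap).
    CREATE TABLE COMBO_BOX_ITEM (
        COMBO_BOX_ID INTEGER NOT NULL,
        VALUE        TEXT    NOT NULL,
        PRIMARY KEY (COMBO_BOX_ID, VALUE)
    ) WITHOUT ROWID;
    INSERT INTO COMBO_BOX_ITEM VALUES (1, 'India'), (1, 'Japan'), (2, 'Shoe Maker');
    """
)

def items(combo_box_id: int) -> list[str]:
    """Range scan on the leading primary-key column."""
    rows = conn.execute(
        "SELECT VALUE FROM COMBO_BOX_ITEM WHERE COMBO_BOX_ID = ? ORDER BY VALUE",
        (combo_box_id,),
    ).fetchall()
    return [value for (value,) in rows]

print(items(1))  # ['India', 'Japan']

# Inspect the plan; the exact text differs between SQLite versions.
for row in conn.execute("EXPLAIN QUERY PLAN SELECT VALUE FROM COMBO_BOX_ITEM WHERE COMBO_BOX_ID = 1"):
    print(row)
```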
Countries
ID | Name | JobsOffered | Jobs Available
_________________________________________
1 | India | 1,2,7,6,5 | 5,6
2 | China | 2,7,5, | 2,7
etc.
Job Descriptions
ID | Name | Description
___________________________________
1 | Shoe Maker | Makes shoes
2 | Computer Analyst | Analyzes computers
3 | Hotdog Cook | Cooks hotdogs well
Then you could query your database for the country, get the jobs that are available (and offered), then query the Job Descriptions table for their names and display to the user which jobs are available. When a job is filled or opened, all you have to do is update the country table with the new job ID.
Does this help? (In this case you will need a separate table for each combo-box, as suggested, and you have referencing IDs for the jobs available)

Suggestions for implementing a document management system in which some documents have runtime replaced fields

One of my apps is a Document Management System in which documents are stored as blob fields in a db. This is not a language-specific question; I put Delphi in the tags anyway since this is the community where I typically ask questions (and many people who use Delphi face these problems).
One feature I need to add is programmatically adding some data to the document. Here is a simple example to give the idea: one field is the date on which the document was created. For this, the user will type "a tag", for example <DOCUMENT_DATE>, and the date will be automatically substituted when the document is extracted from the db.
So I have 2 main concerns. One is what to use as a "tag". The simplest thing is to use a text tag: simply type it into the document and then do a Search & Replace on the text (using, for example, MS Word ActiveX). I already do this for other purposes. An alternative could be using bookmarks or another technique.
The other question is closely related to the previous one.
How do I store it? My first idea is to store the document in the DB with the "tags", so when it is "checked out" the user sees the tags, while when the user opens it (in read-only mode) he sees the substituted text (so in the first case he sees <DOCUMENT_DATE> and in the second "12 October 2011").
This way I store the file once, but every time it is opened there is the overhead of processing it and doing the Search & Replace, which can be relatively slow. This is why I asked about other techniques, like search & replace on bookmarks. The faster the better.
The alternative is to store the document twice: once with the "tags" and once as the "substituted version". This would be good for performance: no Search & Replace; when the document is opened in "checkout" mode I open the one with the tags, and when it is opened in read-only mode I open the substituted one.
This of course takes more storage: for every document version (revision1, revision2, ...) I need to store 2 files.
I feel double storage is the best option because it won't affect performance at all (it will be as fast as now); only the check-in process will be slower, since I need to save 2 files instead of one. Moreover, by not enabling this auto-substitution feature on all documents by default, I won't double the db size.
But anyway, I would like to hear some comments, since it is quite a crucial decision.
It really does not make sense to store identical data twice.
In fact, it is a really bad idea, mainly from a consistency point of view.
The way you do this is to store stuff in different tables and create links between the tables.
This is a process called normalization.
Here's an example loosely inspired by your post using MySQL:
TABLE document
--------------
id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
data BLOB
TABLE tag
------------
id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
tag VARCHAR(20)
TABLE tag_link
-------------------
tag_id INT UNSIGNED,
reference_nr INT UNSIGNED,
PRIMARY KEY (tag_id, reference_nr),
FOREIGN KEY (tag_id) REFERENCES tag(id) ON DELETE CASCADE ON UPDATE CASCADE,
FOREIGN KEY (reference_nr) REFERENCES post(reference_nr) ON DELETE CASCADE ON UPDATE CASCADE
TABLE post
----------------
reference_nr INT UNSIGNED NOT NULL,
revision INT UNSIGNED NOT NULL DEFAULT 1,
document_id INT UNSIGNED,
title VARCHAR(255),
creation_date TIMESTAMP,
other_fields .....
PRIMARY KEY (reference_nr, revision),
FOREIGN KEY (document_id) REFERENCES document(id) ON DELETE SET NULL ON UPDATE CASCADE
Now you can add tags to a post, and all revisions of a post share the same tags.
Revisions of a post can link to the same document or to different documents, so there is no need to duplicate data.
If you want to get the latest revisions of all documents with certain tags, you can use the following query:
SELECT p.title, d.data, tg.tags
FROM post p
LEFT JOIN document d ON (p.document_id = d.id)
INNER JOIN (
  SELECT tl.reference_nr, GROUP_CONCAT(t.tag) AS tags /* GROUP_CONCAT is MySQL-specific */
  FROM tag_link tl
  INNER JOIN tag t ON (tl.tag_id = t.id)
  WHERE t.tag IN ('test','test2')
  GROUP BY tl.reference_nr
) tg ON (tg.reference_nr = p.reference_nr)
WHERE p.revision = (SELECT MAX(p2.revision) FROM post p2
                    WHERE p2.reference_nr = p.reference_nr) /* latest revision only */
ORDER BY p.creation_date DESC
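A runnable check of this kind of query with Python's sqlite3 module (SQLite also supports GROUP_CONCAT). The schema is simplified: foreign keys are omitted, and all rows below are made-up sample data. Post 100 has two revisions, and only revision 2 should come back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified SQLite version of the schema above (foreign keys omitted for brevity).
conn.executescript(
    """
    CREATE TABLE document (id INTEGER PRIMARY KEY, data BLOB);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, tag TEXT);
    CREATE TABLE tag_link (
        tag_id INTEGER, reference_nr INTEGER, PRIMARY KEY (tag_id, reference_nr)
    );
    CREATE TABLE post (
        reference_nr  INTEGER NOT NULL,
        revision      INTEGER NOT NULL DEFAULT 1,
        document_id   INTEGER,
        title         TEXT,
        creation_date TEXT,
        PRIMARY KEY (reference_nr, revision)
    );
    INSERT INTO tag VALUES (1, 'test'), (2, 'test2'), (3, 'misc');
    INSERT INTO tag_link VALUES (1, 100), (2, 100), (3, 200);
    INSERT INTO post VALUES
        (100, 1, 1, 'Spec', '2011-10-01'),
        (100, 2, 2, 'Spec', '2011-10-12'),
        (200, 1, 3, 'Notes', '2011-10-05');
    """
)
with conn:
    conn.executemany("INSERT INTO document VALUES (?, ?)", [(1, b"v1"), (2, b"v2"), (3, b"notes")])

LATEST_TAGGED = """
SELECT p.title, d.data, tg.tags
FROM post p
LEFT JOIN document d ON d.id = p.document_id
INNER JOIN (
    SELECT tl.reference_nr, GROUP_CONCAT(t.tag) AS tags
    FROM tag_link tl
    INNER JOIN tag t ON t.id = tl.tag_id
    WHERE t.tag IN ('test', 'test2')
    GROUP BY tl.reference_nr
) tg ON tg.reference_nr = p.reference_nr
WHERE p.revision = (SELECT MAX(p2.revision) FROM post p2
                    WHERE p2.reference_nr = p.reference_nr)
ORDER BY p.creation_date DESC
"""

rows = conn.execute(LATEST_TAGGED).fetchall()
for title, data, tags in rows:
    # GROUP_CONCAT does not guarantee an order, so sort the tags for display.
    print(title, data, sorted(tags.split(",")))  # Spec b'v2' ['test', 'test2']
```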
I see two other possibilities worth considering.
1. Use RTF
If your document templates are Word documents, I'd rather store them as RTF.
RTF is just plain ASCII, and even if it is a proprietary format, it is well documented, and can be easily parsed. Word is able to save its content and read it as RTF. If you have pictures within, it can grow, but you can zip it before storing as BLOB in your database (and you may embed EMF pictures).
Then you can process that RTF content very fast in your code, replacing every <DOCUMENT_DATE> with the latest value of the date field.
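A minimal Python sketch of the tag replacement on RTF source (not the answerer's Delphi/SynProject code). It assumes each tag is stored contiguously in the RTF text; Word can insert formatting control words inside text you typed, in which case a plain string replace will miss the tag. Replacement values are escaped first: backslash and braces are RTF syntax, and non-ASCII characters become \uN escapes followed by a '?' fallback. The helpers rtf_escape and fill_tags are hypothetical.

```python
def rtf_escape(text: str) -> str:
    """Escape plain text so it can be inserted into RTF source."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)          # backslash and braces are RTF syntax
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # \uN takes a signed 16-bit decimal per UTF-16 code unit; '?' is the fallback char.
            data = ch.encode("utf-16-le")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little", signed=True)
                out.append(f"\\u{unit}?")
    return "".join(out)

def fill_tags(rtf: str, values: dict[str, str]) -> str:
    """Replace each tag (e.g. '<DOCUMENT_DATE>') with its escaped value."""
    for tag, value in values.items():
        rtf = rtf.replace(tag, rtf_escape(value))
    return rtf

template = r"{\rtf1\ansi {\b Created:} <DOCUMENT_DATE>\par}"
print(fill_tags(template, {"<DOCUMENT_DATE>": "12 October 2011"}))
# {\rtf1\ansi {\b Created:} 12 October 2011\par}
```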
I use this technique in several applications, and it gives very good results. See for instance how our SynProject tool generates Word documents from plain text, replacing tags, setting bookmarks or indexes on the fly. With RTF, you can do much more than just replacing a tag, but create a whole document easily.
For end-user input, you can use a basic TRichEdit or a more advanced (but not free) TRichView instead of Word.
You may consider using HTML instead of RTF, but it is much less printing-friendly.
2. Use a report engine
Another possibility could be to use a code-based report engine, then create PDF files.
Our Open Source units can be used from a simple reporting class to easily create the file content, preview it on screen, and/or print/export it as PDF. It is much easier than RTF to work with, but the layout has to be set in your code, or with text-based / wiki-like templates stored in your DB.

Database design: Splitting a blog entry into multiple pages

What is the best database strategy for paginating a blog entry or page content where some entries may be a single page and some may span multiple pages? Note: The content would be article-like rather than a list of items.
The method that I'm currently considering is storing all of the content in a single text field and using a page separator like {pagebreak}. Upon retrieval, the content would be split into an array by the page separator and then the page would display the appropriate index. Is this the best way to go about it, or is there a better approach?
I think your current idea would be the best option. It makes it a lot easier to move the page breaks if you ever want to, or to put them in when you originally compose the article. It also allows you to have a print page option, where the entire article is in 1 field.
The easy way (now, but you'll pay later) is to store the entire article in one text field, but you give up some display control because you might need to put some HTML in that text. If you put HTML in the text, you'll have a lot of data to fix if you ever change your web page's look and feel. This may not be an issue for you.
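The question's approach (one text field, split on {pagebreak} at display time) as a short Python sketch. Clamping out-of-range page numbers to the first/last page is an assumption; an application might prefer to return a 404 instead. The helpers split_pages and get_page are hypothetical.

```python
PAGE_SEPARATOR = "{pagebreak}"

def split_pages(content: str) -> list[str]:
    """Split the stored article text into pages at each separator."""
    return [page.strip() for page in content.split(PAGE_SEPARATOR)]

def get_page(content: str, page_number: int) -> str:
    """Return the 1-based page, clamping out-of-range numbers to the first/last page."""
    pages = split_pages(content)
    index = min(max(page_number, 1), len(pages)) - 1
    return pages[index]

article = "Intro text.{pagebreak}Second page.{pagebreak}Conclusion."
print(len(split_pages(article)))    # 3
print(get_page(article, 2))         # Second page.
print(get_page("Single page.", 5))  # Single page.

# Print-page option: the whole article at once, separators turned into blank lines.
print(article.replace(PAGE_SEPARATOR, "\n\n"))
```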
As a rule I try not to ever put html into the database. You might be better off using XML to define your article, and store that in one text field, so your application can properly render the contents in a dynamic way. You could store page breaks in the XML, or let the app read in the entire article and split it up dynamically based on your current look/feel.
You can use my "poor man's CMS" schema (below) if you don't want to use XML. It will give you more control over the formatting than the "all text in one field" method.
These are just wild guesses based on your question:
tables:
Articles
--------
ArticleID int --primary key
ArticleStatus char(1) --"A"ctive, "P"ending review, "D"eleted, etc..
ArticleAuthor varchar(100) --or int FK to a "people" table
DateWritten datetime
DateToDisplay datetime
etc...
ArticleContent
--------------
ArticleID int --primary key
Location int --primary key, will be the order to display the article content, 1,2,3,4
ContentType char(1) --"T"ext, "I"mage, "L"ink, "P"age break
ArticleContentText
------------------
ArticleID int --primary key
Location int --primary key
FormatStyle char(1) --"X"tra large, "N"ormal, "C"ode fragment
ArticleText text
ArticleContentImage
-------------------
ArticleID int --primary key
Location int --primary key
ArticleImagePath varchar(200)
ArticleImageName varchar(200)
You can still put the entire article in one field, but you can split it up if it contains different types of "things".
If you have an article about PHP code with examples, the "one field method" would force you to put HTML in the text to format the code examples. With this model, you store what is what, and let the application display it properly. You can add and expand different types, and put page breaks in or remove them. You can store your content in multiple "ArticleContentText" rows, each representing a page break, or include "ArticleContent" rows that specify page breaks. You could also let the app read the entire article and then display only what it wants.
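A sketch of how an application might read this model back into pages with Python's sqlite3 module, using ArticleContent rows of type 'P' as page breaks. Only text rows are handled; the sample rows and the article_pages helper are hypothetical, and image/link rows would be read from their own tables in the same loop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ArticleContent (
        ArticleID   INTEGER,
        Location    INTEGER,       -- display order
        ContentType TEXT,          -- 'T'ext, 'I'mage, 'L'ink, 'P'age break
        PRIMARY KEY (ArticleID, Location)
    );
    CREATE TABLE ArticleContentText (
        ArticleID   INTEGER,
        Location    INTEGER,
        FormatStyle TEXT,          -- 'X'tra large, 'N'ormal, 'C'ode fragment
        ArticleText TEXT,
        PRIMARY KEY (ArticleID, Location)
    );
    INSERT INTO ArticleContent VALUES (1, 1, 'T'), (1, 2, 'T'), (1, 3, 'P'), (1, 4, 'T');
    INSERT INTO ArticleContentText VALUES
        (1, 1, 'X', 'Intro'), (1, 2, 'C', 'echo 1;'), (1, 4, 'N', 'Page two');
    """
)

def article_pages(article_id: int) -> list[list[tuple[str, str]]]:
    """Group the article's text pieces, in Location order, into pages split at 'P' rows."""
    rows = conn.execute(
        """
        SELECT c.ContentType, t.FormatStyle, t.ArticleText
        FROM ArticleContent c
        LEFT JOIN ArticleContentText t
               ON t.ArticleID = c.ArticleID AND t.Location = c.Location
        WHERE c.ArticleID = ?
        ORDER BY c.Location
        """,
        (article_id,),
    ).fetchall()
    pages: list[list[tuple[str, str]]] = [[]]
    for content_type, style, text in rows:
        if content_type == "P":
            pages.append([])                 # a page-break row starts a new page
        elif content_type == "T":
            pages[-1].append((style, text))  # the app decides how each style is rendered
    return pages

print(article_pages(1))
# [[('X', 'Intro'), ('C', 'echo 1;')], [('N', 'Page two')]]
```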
I think the correct approach is what you've mentioned: the entry should be stored in the database as a single entry, and then you can use markup / the UI layer to determine where pagebreaks or other formatting should occur.
The database design shouldn't be influenced by the UI concepts - because you might decide to change how they are displayed down the road, and you need your database to be consistent.
You're much better off leaving formatting like this on the client side. Let the database hold your data and your application present it to the user in the correct format.
It seems to me like a good solution. This way you will have your article as one piece and can paginate it when necessary.
