Storing website content: database or file? - database

I'm building a website, and I'm planning to publish various kinds of posts, like tutorials, articles, etc. I'm going to manage it with php, but when it comes to storing the content of each post, the very text that will be displayed, what's a better option: using a separate text file or adding it as an for each entry in the database?
I don't see a reason why not to use the database directly to store the content, but it's the first time I use a DB and that feels kind of wrong.
What's your experience in this matter?

Ok Friends I am visiting this question once again for the benefit of those who will read this answer. After a lot of trial and error I have reached a conclusion that keeping text in database is a lot convenient and easy to manipulate. Thus all my data is now with in database. Previously I had some details in database and the text part in file but now i have moved all to database.
The only problem is that when editing your posts the field like title or tags or subject etc are changed on a simple html form. but for the main content I have created a text area. however i just have to cut and copy it from the text area to my favorite text editor. and after the editing copy and paste it back.
some benefits that forced me to put every thing in database are
EASY SEARCH: you can run quires like mysql LIKE on your text (specially main content).
EASY ESCAPING: you can run commands easily on your data to escape special characters and make it suitable for display etc.
GETTING INPUT FROM USER: if you want the user to give you input it makes sense to save his input in database , escape it and manipulate it as and when required.
Functions like moving tables , back up, merging two records, arranging posts with similar content in sequential order... etc etc all is more easy in database than the file system.
in file system there is always the problem of missing files, different file names, wrong file shown for different title etc etc
I do not escape user input before adding it to database just before display. this way no permanent changes are stored to the text.(i don't know if that's ok or not)

Infact I am also doing something like you. However I have reached the conclusion as explained below (almost the same as mentioned in the answer above me). I hope you must have made the decision by now but still I will explain it so that it is useful for future.
My Solution: I have a table called content_table it contain details about each and every article, post or anything else that I write. The main (text portion) of the articles/post is placed in a directory in a .php or .txt file. When a user clicks on an article to read, a view of the article is created dynamically by using the information in database and then pulling the text part (I call it main content) from the .txt file. The database contain information like _content_id_, creation date, author, catagory (most of this become meta tags).
The two major benefits are:
performance since less load on datbase
editing the text content is easy.

I am giving comments based on my experience ,
Except attachments you can store things in DB, why because managing content,back up, restore ,querying , searching especially full text search will be easy.
Store attached files in some folder and keep path in DB tables.
Even more if you r willing to implement search inside attachments you can go for some search engine like lucene which is efficient to search static contents.
keeping attachment in DB or in file system is upto the level of important to the files.

Related

Is it possible to write/read metadata for a text file using Labview?

The situation
I use Labview 2012 on Windows 7
my test result data is written in text files. First, information about the test is written in the file (product type, test type, test conditions etc) and after that the logged data is written each second.
All data files are stored in folders, sorted to date and the names of the files contain some info about the test
I have years worth of data files and my search function now only works on the file names (opening each file to look for search terms costs too much time)
The goal
To write metadata (additional properties like Word files can have) with the text files so that I can implement a search function to quickly find the file that I need
I found here the way to write/read metadata for images, but I need it for text files or something similar.
You would need to be writing to data files that supports meta data to begin with (such as LabVIEW TDMS or datalog file formats). In a similar situation, I would simply use a separate file with the same name, but a different extension for example. Then you can index those file names, and if you want the data you just swap the meta data filename extension and you are good to go.
I would not bother with files and use database for results logging. It may be not what you wiling to do, but this is the ultimate solution for the search problem and it open a lot of data analytics possibilities.
The metadata in Word files is from a feature called "Alternative Data Streams" which is actually a function of NTFS. You can learn more about it here.
I can't say I've ever used this feature. I don't think there is a nice API for LabVIEW, but one could certainly be made. With some research you should be able to play around with this feature and see if it really makes finding files any easier. My understanding is that the data can be lost if transferred over the network or onto a non-NTFS thumbdrive.

Best "architecture" to store lots of files per user on server

i would like to ask for your opinion and advice.
In my application i need to store files uploaded from user to provide import to database - it could be XML or excel file (.xlsx), i guess max file size about 500kB per file.
There is need to store files because of import to database, which is not done immediately and also because of backup.
I consider scenario about thousands (ten thousands) of users.
Scenario - one user can upload many files to many categories. It means that user can upload file_1 to category 1, file_2 to category_2, but also file_3 to category_2_1(subcategory of category_2).
Generally, there is some kind of category tree and user can upload many files to many nodes.
Because of import application, filename will always contain :
user_code_category_code_timestamp
And my problem is, that i do not know that is the best way to store that files.
should i have one directory per user -> one directory per category -> relevant files
should i have one directory per user -> all user files
should i have one root directory -> all users and all files
?
In the best way i mean - there must be application for import, which will list relevant files in category and for relevant user. As i wrote above, there are many ways, so i am a bit confused.
What else should i consider ? File system limitations ?
Hope you understand problem.
Thank you.
Are you using some kind of a framework? Best case is you use a plugin for it.
The standard basic solution for storing files is to have one directory for all files(images for example). When you save a file, you change the name of the file so they do not duplicate in the directory. You keep all other data in a DB table.
From that base - you can improve and change the solution depending on the business logic.
You might want to restrict access to the files, you might want to put them in a tree directory if you need browsing in them.
And so on...
Thank you for this question! It was difficult to find answers for this online, but in my case I have potentially 10k's of images/pdfs/files/etc. and it seems that using hashes and saving to one location directory is ideal and makes it much less complicated.
Useful things to think about:
1. Add some additional meta data (you can do this in S3 buckets)
2. I would make sure you have the option to resize images if relevant such as ?w=200&h=200.
3. Perhaps save a file name that can be displayed if the user downloads it so it doesn't give them some weird hash.
4. if you save based on a hash that works off of the current time, you can generate non-duplicating hashes.
5. trying to view all the files at once would hurt performance, but when your app is requesting only one file at a time based on endpoint this shouldn't be an issue.

Database or file type for containing modular page contents?

I'm building a personal web and I have a problem planning my DB structure.
I have a portfolio page and each artwork has its own description page.
So I need to save contents in a file or a database.
The Description page has some guideline,
but the length, the number, and the order of elements are free.
for example a page may have just one paragraph or more,
and in each paragraph, many footage codes and text blocks can be mixed.
the Question is:
(I made a simple diagram to describe my needs).
What can I choose for my data type to save that contents structure maintaining the order?
I'm used to XML and I know XML can be one of the choice, but if the contents is big, it will be hard to read and slow.
I've heard that JSON alternates XML these days, but as I searched, JSON cannot maintain the order of elements, can it?
Waiting for clever recommendations:)
Thanks.

Database for record storage with revisioning

I've recently been tasked with improving a records database that consists of the following:
All records are stored in one giant
XML file.
Any changes or updates to
these records are done by hand within
this XML file.
Each record contains
an 'Updated' datetime stamp to keep
some form of revision control.
The entire XML file is also checked into
a subversion repository to keep
revision control for the entire
collection.
This records database is strictly for internal use only and does not face any public interface.
I'm a bit of a newbie to database design, but the above method feels a little cumbersome. I was thinking of moving all of the above to some form of perhaps a SQLite database and building some form of a front end to update/remove/view entries while keeping track of any changes to that DB. Are there better ways to do this or is it pretty standard to have a system like is already in place?
Putting the information into a database is a good solution. Another decent solution is just making each record its own file and using a revision-control system to track the changes to each individual record. This is much more efficient than having one glommed-together file :-).
Doesnt actually sound that bad! Depends how often its updated and how many programs read the XML.
I would try to approaches depending on the above.
First get one of the nifty XML validating editors like XML spy and define an XML Schema if or xsd if you havent already got one. You you now have a clean user interface that can update and validate the file. Continue to use the revision control to system to keep a history.
Secondly -- if the updates are really simple write a quick Java/C#/VB or whatever program to update the XML -- otherwise carry on as before.

Best generic strategy to group items using multiple criteria

I have a simple, real life problem I want to solve using an OO approach. My harddrive is a mess. I have 1.500.000 files, duplicates, complete duplicate folders, and so on...
The first step, of course, is parsing all the files into my database. No problems so far, now I got a lot of nice entries which are kind of "naturaly grouped". Examples for this simple grouping can be obtained using simple queries like:
Give me all files bigger than 100MB
Show all files older than 3 days
Get me all files ending with docx
But now assume I want to find groups with a little more natural meaning. There are different strategies for this, depending on the "use case".
Assume I have a bad habit of putting all my downloaded files first on the desktop. Then I extract them to the appropriate folder, without deleting the ZIP file always. The I move them into a "attic" folder. For the system, to find this group of files a time oriented search approach, perhaps combined with a "check if ZIP is same then folder X" would be suitable.
Assume another bad habit of duplicating files, having some folder where "the clean files" are located in a nice structure, and another messy folders. Now my clean folder has 20 picture galleries, my messy folder has 5 duplicated and 1 new gallery. A human user could easily identify this logic by seeing "Oh, thats all just duplicates, thats a new one, so I put the new one in the clean folder and trash all the duplicates".
So, now to get to the point:
Which combination of strategies or patterns would you use to tackle such a situation. If I chain filters the "hardest" would win, and I have no idea how to let the system "test" for suitable combination. And it seemes to me it is more then just filtering. Its dynamic grouping by combining multiple criteria to find the "best" groups.
One very rough approach would be this:
In the beginning, all files are equal
The first, not so "good" group is the directory
If you are a big, clean directory, you earn points (evenly distributed names)
If all files have the same creation date, you may be "autocreated"
If you are a child of Program-Files, I don't care for you at all
If I move you, group A, into group C, would this improve the "entropy"
What are the best patterns fitting this situation. Strategy, Filters and Pipes, "Grouping".. Any comments welcome!
Edit in reacation to answers:
The tagging approach:
Of course, tagging crossed my mind. But where do I draw the line. I could create different tag types, like InDirTag, CreatedOnDayXTag, TopicZTag, AuthorPTag. These tags could be structured in a hirarchy, but the question how to group would remain. But I will give this some thought and add my insights here..
The procrastination comment:
Yes, it sounds like that. But the files are only the simplest example I could come up with (and the most relevant at the moment). Its actually part of the bigger picture of grouping related data in dynamic ways. Perhaps I should have kept it more abstract, to stress this: I am NOT searching for a file tagging tool or a search engine, but an algorithm or pattern to approach this problem... (or better, ideas, like tagging)
Chris
You're procrastinating. Stop that, and clean up your mess. If it's really big, I recommend the following tactic:
Make a copy of all the stuff on your drive on an external disk (USB or whatever)
Do a clean install of your system
As soon as you find you need something, get it from your copy, and place it in a well defined location
After 6 months, throw away your external drive. Anything that's on there can't be that important.
You can also install Google Desktop, which does not clean your mess, but at least lets you search it efficiently.
If you want to prevent this from happening in the future, you have to change the way you're organizing things on your computer.
Hope this helps.
I don't have a solution (and would love to see one), but I might suggest extracting metadata from your files besides the obvious name, size and timestamps.
in-band metadata such as MP3 ID3 tags, version information for EXEs / DLLs, HTML title and keywords, Summary information for Office documents etc. Even image files can have interesting metadata. A hash of the entire contents helps if looking for duplicates.
out-of-band metadata such as can be stored in NTFS alternate data streams - eg. what you can edit in the Summary tab for non-Office files
your browsers keep information on where you have downloaded files from (though Opera doesn't keep it for long), if you can read it.
You've got a fever, and the only prescription is Tag Cloud! You're still going to have to clean things up, but with tools like TaggCloud or Tag2Find you can organize your files by meta data as opposed to location on the drive. Tag2Find will watch a share, and when anything is saved to the share a popup appears and asks you to tag the file.
You should also get Google Desktop too.

Resources