Configuration vs Database storage

I will keep this short: I am looking to store product plan data, the plans that users would pick for their payment options. This data includes how much the plan costs and the plan's unit details, such as what makes up a unit (day/week/month), plus other fairly simple facts about the plan. These plans may or may not change once a month or once a year; the company is a startup, things are constantly changing at the 11th hour, and there is no real way to predict when they will change. A co-worker and I are discussing whether these values should stay in the web.config (where they currently are) or move to the database.
I have done some googling and have not found any good resource that helps draw a clear line between what belongs in the database and what belongs in the web.config. I wanted to know what your thoughts on this were, and whether someone could clearly define when data should be stored in config versus in the database.
Thanks for the help!

From the brief description you provide, it seems to me that the configuration data may eventually be accessed not just by your web server-based application running on one computer, but also by supporting applications, such as end-of-month batch jobs, that you may want to run on other computers. To support that possibility, it would be a good idea to store the data in some sort of centralized repository that can be accessed remotely from multiple computers.
Storing the configuration data in a database is the obvious way to meet that requirement. But if you don't want to do that, then another approach would be to store the configuration data in a file on a company-internal (rather than public) web/ftp server. Then an application can use a utility such as curl to retrieve the configuration file from the web/ftp server.
Of those two approaches, I think using a database is probably best, because it provides an ergonomic way not just to read the configuration data but also to update it.
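For illustration, here is a minimal sketch of what the database-backed approach might look like; the plans table and its columns are hypothetical, and plain JDBC stands in for whatever data access layer you actually use:

```java
import java.sql.*;

public class PlanLoader {
    public static void main(String[] args) throws SQLException {
        // Hypothetical connection string; any JDBC-accessible database works.
        try (Connection db = DriverManager.getConnection(
                 "jdbc:mysql://dbhost/billing", "app", "secret");
             PreparedStatement ps = db.prepareStatement(
                 "SELECT name, price_cents, unit, units_per_period FROM plans WHERE active = 1");
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                System.out.printf("%s: %d cents per %d %s%n",
                        rs.getString("name"),
                        rs.getInt("price_cents"),
                        rs.getInt("units_per_period"),
                        rs.getString("unit"));   // e.g. "day", "week", "month"
            }
        }
    }
}
```

The practical win over web.config is that an 11th-hour plan change becomes an UPDATE statement (or an admin screen) rather than a config edit and a redeploy.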

Related

Storing images in a database versus a filesystem

Well we all know how many arguments and lives have been lost with the discussion of using databases for file storage (images specifically). I'm in a bit of a pickle when it comes to deciding on how to proceed with my project.
I have a website that allows admins to upload employee pictures. For now, these pictures are stored as BLOBs in my MySQL database. I also have a Windows application that runs alongside the website. This application enables employees to punch in and have their picture appear when they've successfully done so. The picture is retrieved via a MySQL query within the application (from a non-local, remote location), which converts the image content into a readable image that is displayed in a picture box, confirming the identity of the employee.
In my eyes, it is much, much easier to have the images stored in the database and retrieved via a simple query. I found this a lot easier than storing image paths in the database and having the application download the images. I also don't have to deal with collisions, folder organization, security, paths being rewritten for one reason or another, and so on.
The images stored in the DB are a mere 20 KB each after being cropped to a certain size. My question is: is it still worth switching the database over to image paths, or should the images simply be stored as they are right now? If storing images in the database is still ill-advised in this case, is there a formal way to store image paths?
Any help on this would be greatly appreciated. If this question doesn't belong here, I'll be happy to move it.
If the images are user data, rather than part of your application's code or theme, then storing the images in the database is a good idea, because…
Backups are easier to manage if all you have to back up is the database. On the other hand, if you store some application data in the database and some in the filesystem, then you'll have to coordinate the backup schedules of your database and your filesystem to ensure that the two are consistent.
If you have a database administrator at your disposal, then great! Your backups should already be taken care of. If not, then database backups may be slightly tricky to set up, but once you do have a backup system, it can be better than filesystem backups. For example, many database systems have support for streaming replication.
If your application is load-balanced and served by a pool of multiple webservers, then you'll either have to replicate the data to all of the machines or share it among your servers using a network filesystem.
Of course, having the images on a filesystem also has its advantages, namely performance and simplicity, since most webservers are built to serve static files. A hybrid approach could give you the best of both worlds (see the sketch after this list):
The images stored in the database would be the authoritative data.
Your application can have a feature to extract them as files in its local filesystem as a kind of cache. That cache can be rebuilt at any time, since it is not authoritative.
The webserver can then serve the files directly from the filesystem.
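A minimal sketch of that hybrid idea, assuming a hypothetical employees table with a photo BLOB column; the database row stays authoritative and the file is just a disposable cache entry:

```java
import java.io.IOException;
import java.nio.file.*;
import java.sql.*;

public class ImageCache {
    private final Connection db;
    private final Path cacheDir;

    public ImageCache(Connection db, Path cacheDir) {
        this.db = db;
        this.cacheDir = cacheDir;
    }

    /** Returns a local file for the image, extracting it from the DB on a cache miss. */
    public Path imageFile(long employeeId) throws SQLException, IOException {
        Files.createDirectories(cacheDir);
        Path cached = cacheDir.resolve(employeeId + ".jpg");
        if (Files.exists(cached)) {
            return cached;   // cache hit: the webserver can serve this file directly
        }
        try (PreparedStatement ps = db.prepareStatement(
                "SELECT photo FROM employees WHERE id = ?")) {
            ps.setLong(1, employeeId);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) throw new IOException("no such employee: " + employeeId);
                // The DB copy is authoritative; this file can be deleted and rebuilt anytime.
                Files.write(cached, rs.getBytes("photo"));
            }
        }
        return cached;
    }
}
```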
There are several reasons why I think storing images in a database is a bad idea:
1) The server will have timestamp info associated with files that the database won't keep track of. If you ever need this for forensic reasons, the DB solution will likely be limited in this regard. Feel free to also save info about uploaded images (IP, timestamp, etc.) in the DB, though.
2) If you ever want these files used by, say, another system or service, you'll have to constantly reference the database and interact with it, when you could far more easily just target a specific folder.
3) Any time an image needs to be retrieved, you have to open a connection to the database just to produce it. This may add extra code and steps to things that could be easier to implement by pointing to a folder.
To avoid naming collisions, if I were on a Linux box, I'd use something like a Unix timestamp as a prefix to the filename when it's saved, or simply use that (plus maybe a short random number) as the image ID altogether. So instead of 'jane-image.jpg', it'd be '1407369600_img3547.jpg'. Then just put a reference to that in the DB and voilà, that's a random enough ID that there should never be a collision, unless time starts flowing backwards. On Windows, whatever the timestamp equivalent is would be used, obviously.
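A small sketch of that naming scheme; the exact format is arbitrary, and ThreadLocalRandom stands in for the "short random number":

```java
import java.util.concurrent.ThreadLocalRandom;

public class ImageNames {
    /** e.g. "1407369600_img3547.jpg": epoch seconds plus a short random suffix. */
    public static String unique(String extension) {
        long epochSeconds = System.currentTimeMillis() / 1000L;
        int suffix = ThreadLocalRandom.current().nextInt(1000, 10000);
        return epochSeconds + "_img" + suffix + "." + extension;
    }

    public static void main(String[] args) {
        System.out.println(unique("jpg"));   // e.g. 1407369600_img3547.jpg
    }
}
```

In Java, System.currentTimeMillis() is already platform-neutral, so the same scheme works unchanged on Windows.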
NOTE: What you're doing now isn't bad, and from the sound of it, it may work best for you... but generally speaking, I try not to put everything in the hands of a database just because I can. But that's me :)

Database BLOBs vs disk-stored files

So I have this requirement that says the app must let users upload and download about 6000 files per month (mostly pdf, doc, xls).
I was thinking about the optimal solution for this. The question is whether to use BLOBs in my database or a simple file hierarchy for writing/reading this bunch of files.
The app architecture is based on Java 1.6, Spring 3.1 and DOJO, Informix 10.X.
So I'm here just to be advised based on your experience.
When asking what's the "best" solution, it's a good idea to include your evaluation criteria - speed, cost, simplicity, maintenance etc.
The answer Mikko Maunu gave is pretty much on the money. I haven't used Informix in 20 years, but most databases are a little slow when dealing with BLOBs - especially the step of getting the BLOB into and out of the database can be slow.
That problem tends to get worse as more users access the system simultaneously, especially if they use a web application - the application server has to work quite hard to get the files in and out of the database, probably consumes far more memory for those requests than normal, and probably takes longer to complete the file-related requests than for "normal" pages.
This can lead to the webserver slowing down under only moderate load. If you choose to store the documents in your database, I'd strongly recommend running some performance tests to see if you have a problem - this kind of solution tends to expose flaws in your setup that wouldn't otherwise come to light (slow network connection to your database server, insufficient RAM in your web servers, etc.)
To avoid this, I've stored the "master" copies of the documents in the database, so they all get backed up together, and I can ask the database questions like "do I have all the documents for user X?". However, I've used a cache on the webserver to avoid reading documents from the database more often than necessary. This works well if you have a "write once, read many" solution like a content management system, where the cache can earn its keep.
If you have other data in the database related to these files, storing the files in the file system makes things more complex:
Backups have to be done separately.
Transactions have to be implemented separately (as far as that is even possible for file system operations).
Integrity checks between the database and the file system structure do not come out of the box.
There are no cascades: for example, removing a user's pictures as a consequence of removing the user.
You first have to query the database for the file's path, and then fetch the file from the file system.
What is good about a file-system-based solution is that it is sometimes handy to be able to access the files directly, for example to copy some of the images somewhere else. Also, storing binary data can of course dramatically increase the size of the database. In any case, more disk storage is needed somewhere with both solutions.
Of course, all of this can demand more DB resources than are currently available. In general there can be a significant performance hit, especially if the decision is between a local file system and a remote DB. In your case (6000 files monthly), raw performance will not be a problem, but latency can be.
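To make the transaction point above concrete, here is a hedged sketch of writing a document and its metadata atomically; the documents table is hypothetical, and plain JDBC stands in for Spring's data access helpers:

```java
import java.io.InputStream;
import java.nio.file.*;
import java.sql.*;

public class DocumentStore {
    /** Inserts file content and metadata in one transaction: both land, or neither does. */
    public static void save(Connection db, long ownerId, Path file) throws Exception {
        db.setAutoCommit(false);
        try (InputStream in = Files.newInputStream(file);
             PreparedStatement ps = db.prepareStatement(
                 "INSERT INTO documents (owner_id, filename, content) VALUES (?, ?, ?)")) {
            ps.setLong(1, ownerId);
            ps.setString(2, file.getFileName().toString());
            ps.setBinaryStream(3, in, Files.size(file));   // streams the BLOB content
            ps.executeUpdate();
            db.commit();
        } catch (Exception e) {
            db.rollback();   // metadata and content stay consistent on failure
            throw e;
        } finally {
            db.setAutoCommit(true);
        }
    }
}
```

With files on a plain file system, you would have to emulate that rollback yourself, deleting the half-written file by hand.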

Caching moderate amounts of data in a web app - DB or flat files?

A web app I'm working on requires frequent parsing of diverse web resources (HTML, XML, RSS, etc). Once downloaded, I need to cache these resources to minimize network load. The app requires a very straightforward cache policy: only re-download a cached resource when more than X minutes have passed since the access time.
Should I:
Store both the access time (e.g. 6/29/09 at 10:50 am) and the resource itself in the database.
Store the access time and a unique identifier in the database. The unique identifier is the filename of the resource, stored on the local disk.
Use another approach or a third-party software solution.
Essentially, this question can be re-written as, "Which is better for storing moderate amounts of data - a database or flat files?"
Thanks for your help! :)
NB: The app is running on a VPS, so size restrictions on the database/flat files do not apply.
To answer your question: "Which is better for storing moderate amounts of data - a database or flat files?"
The answer is (in my opinion) flat files. Flat files are easier to back up and easier to remove.
However, you have extra information that isn't captured in the question itself: namely, that you will need to access this stored data to determine whether a resource has gone stale.
Given this need, it makes more sense to store it in a database. Flat files do not lend themselves well to random access and search, compared to a relational DB.
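As a minimal sketch of option 1 (resource body stored in the database; the cache table and its columns are hypothetical), the staleness check might look like this:

```java
import java.sql.*;

public class ResourceCache {
    private static final long MAX_AGE_MINUTES = 30;   // the "X minutes" policy

    /** Returns the cached body, or null if the resource is missing or stale. */
    public static String freshCopy(Connection db, String url) throws SQLException {
        try (PreparedStatement ps = db.prepareStatement(
                "SELECT body, fetched_at FROM cache WHERE url = ?")) {
            ps.setString(1, url);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) return null;   // never cached: caller should download
                Timestamp fetched = rs.getTimestamp("fetched_at");
                long ageMinutes = (System.currentTimeMillis() - fetched.getTime()) / 60_000;
                return ageMinutes > MAX_AGE_MINUTES ? null : rs.getString("body");
            }
        }
    }
}
```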
It depends on the platform. If you use .NET,
the answer is 3: use the Cache object, which is ideally suited to this in ASP.NET.
You can set time- and dependency-based expiration.
This doc explains the Cache object:
https://web.archive.org/web/1/http://articles.techrepublic%2ecom%2ecom/5100-10878_11-5034946.html
Neither.
Have a look at memcached to see if it works with your server/client platform. It is easier to set up and performs much better than filesystem/RDBMS-based caching, provided you can spare the RAM needed for the data being cached.
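If you go that route from Java, here is a hedged sketch using the spymemcached client (an assumption on my part; use whichever client fits your platform):

```java
import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

public class MemcachedExample {
    public static void main(String[] args) throws Exception {
        MemcachedClient cache = new MemcachedClient(new InetSocketAddress("localhost", 11211));
        // The cache policy is baked into the write: the entry expires after X minutes.
        cache.set("feed:example.com/rss", 30 * 60, "<rss>...</rss>");
        Object hit = cache.get("feed:example.com/rss");   // null once expired
        System.out.println(hit);
        cache.shutdown();
    }
}
```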
All of the proposed solutions are reasonable. However, for my particular needs, I went with flat files. Oddly enough, though, I did so for reasons not mentioned in some of the other answers. It doesn't really matter to me that flat files are easier to backup and remove, and both DB and flat-file solutions allow for easy checking of whether or not the cached data has gone stale. I went with flat files first and foremost because, on my mid-sized one-box VPS LAMP architecture, I think it will be faster than a third-party cache or DB-based solution.
Thanks to all for your thoughts! :)

Database design for physics hardware

I have to develop a database for a unique environment. I don't have experience with database design and could use everybody's wisdom.
My group is designing a database for a piece of physics hardware and a data acquisition system. We need a system that will store all the hardware configuration parameters and track the changes to these parameters as they are changed by the user.
The setup:
We have nearly 200 detectors and roughly 40 parameters associated with each detector. Of these 40 parameters, we expect only a few to change during the course of the experiment. Most parameters associated with a single detector are static.
We collect data for this experiment in timed runs. During these runs, the parameters loaded into the hardware must not change, although we should be able to edit the database at any time to prepare for the next run. The current plan:
The database will provide the difference between the current parameters and the parameters used during the last run.
At the start of a new run, the most recent database changes must be loaded into the hardware.
The settings used for the upcoming run must be tagged with a run number and the current date and time. This is essential. I need a run-by-run history of the experimental setup.
There will be several different clients that both read and write to the database. Although changes to the database will be infrequent, I cannot guarantee that the changes won't happen concurrently.
Must be robust and non-corruptible. The configuration of the experimental system depends on the hardware. Any breakdown of the database would prevent data acquisition, and our time is expensive. Database backups?
My current plan is to implement the above requirements using a SQLite database, although I am unsure whether it can support all my requirements. Is there any other technology I should look into? Has anybody done something similar? I am willing to learn any technology, as long as it's mature.
Tips and advice are welcome.
Thank you,
Sean
Update 1:
Database access:
There are three lightweight applications that can read and write to the database, and one application that can only read.
The applications with write access are each responsible for setting a non-overlapping subset of the hardware parameters. To be specific, we have one application (of which there may be multiple copies) that sets the high voltage, one application that sets the remainder of the hardware parameters that may change during the experiment, and one GUI that sets the remaining parameters, which are nearly static and essential only for the proper reconstruction of the data.
The read-only program is our data analysis software. It needs access to nearly all of the parameters in the database to properly format the incoming data into something we can analyze. The number of connections to the database should be >10.
Backups:
Another setup at our lab dumps an XML file every run. Even though I don't think XML is appropriate, I was planning to back up the system every run, just in case.
Some basic things about the design: make sure that you never delete data from any table. Keep track of the most recent data (probably best with a most-recently-updated datetime), but when a value changes, don't delete the old row. When a run is initiated, tag every table used with the run ID (in an extra column); this way you maintain a full historical record of every setting and can pin down exactly what state was used for a given run.
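A minimal sketch of that append-only, run-tagged layout (all table and column names are hypothetical), here as SQLite driven through JDBC:

```java
import java.sql.*;

public class RunHistory {
    public static void main(String[] args) throws SQLException {
        // Requires the sqlite-jdbc driver on the classpath.
        try (Connection db = DriverManager.getConnection("jdbc:sqlite:experiment.db");
             Statement st = db.createStatement()) {
            // Append-only: parameter changes are inserted, never updated or deleted.
            st.execute("CREATE TABLE IF NOT EXISTS parameter_history (" +
                       "detector_id INTEGER NOT NULL, " +
                       "parameter   TEXT NOT NULL, " +
                       "value       TEXT NOT NULL, " +
                       "run_id      INTEGER, " +      // stamped when a run starts
                       "changed_at  TEXT DEFAULT CURRENT_TIMESTAMP)");
            st.execute("INSERT INTO parameter_history (detector_id, parameter, value, run_id) " +
                       "VALUES (7, 'high_voltage', '1450', 42)");
            // Latest value per parameter; relies on SQLite's bare-column-with-MAX behaviour.
            try (ResultSet rs = st.executeQuery(
                    "SELECT parameter, value, MAX(changed_at) FROM parameter_history " +
                    "WHERE detector_id = 7 GROUP BY parameter")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " = " + rs.getString(2));
                }
            }
        }
    }
}
```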
Ask around among your colleagues.
You don't say what kind of physics you're doing, or how big the working group is, but in my discipline (particle physics) there is a deep repository of experience in putting up and running just this type of system (we call it "slow controls" and the like). There is a pretty good chance that someone you work with has either done this or knows someone who has. There may be a detailed description of the last time out in someone's thesis.
I don't personally have much to do with this, but I do know this: one common feature is a no-delete, no-overwrite design. You can only add data, never remove it. This preserves your chances of figuring out what really happened in case of trouble.
Perhaps I should explain a little more. While this is an important task and has to be done right, it is not really related to physics, so you can't look it up on SPIRES or arXiv.org. No one writes papers on the design and implementation of medium-sized slow-controls databases. But they do sometimes put it in their dissertations. The easiest way to find a pointer really is to ask a bunch of people around the lab.
This is not a particularly large database by the sounds of things, so you might be able to get away with using Oracle's free database, which will give you all kinds of great flexibility with journaling (not sure if that is an actual word) and administration.
Your mention of "non-corruptible" right after you say "There will be several different clients that both read and write to the database" raises a red flag for me. Are you planning on creating some sort of application that has an interface for this? Or were you planning on direct access to the DB via a tool like TOAD?
In order to preserve your data integrity, you will need to get really strict with your permissions. I would only allow one person (and a backup) to have admin rights with the ability to manipulate data outside the GUI (which will make your life easier).
Backups? Yes, absolutely! Not only should you do daily, weekly and monthly backups, you should do full and incremental ones. Also, test your backup images often to confirm they are in fact working.
As for the data structure, I would need much greater detail about what you are trying to store and how you would access it. But from what you have put here, I would say you need the following tables to begin with (a sketch follows the list):
Detectors
Parameters
Detector_Parameters
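As a rough sketch only (the column names are guesses, since the actual 40-odd parameters aren't spelled out here), those tables might start out like this:

```java
import java.sql.*;

public class Schema {
    /** Creates a hypothetical starting version of the three tables suggested above. */
    public static void create(Connection db) throws SQLException {
        try (Statement st = db.createStatement()) {
            st.execute("CREATE TABLE detectors (id INTEGER PRIMARY KEY, label TEXT NOT NULL)");
            st.execute("CREATE TABLE parameters (id INTEGER PRIMARY KEY, " +
                       "name TEXT NOT NULL, unit TEXT)");
            // Junction table: one row per (detector, parameter) pair with its current value.
            st.execute("CREATE TABLE detector_parameters (" +
                       "detector_id  INTEGER REFERENCES detectors(id), " +
                       "parameter_id INTEGER REFERENCES parameters(id), " +
                       "value        TEXT NOT NULL, " +
                       "PRIMARY KEY (detector_id, parameter_id))");
        }
    }
}
```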
Some additional notes:
Since you will be making so many changes, I recommend using version control such as SVN to keep track of all your DDL scripts, etc. I would also recommend using something like Bugzilla for bug tracking (if needed) and Google Docs for team document management.
Hope that helps.

How important is a database in managing information?

I have been hired to help write an application that manages certain information for the end user. It is intended to manage a few megabytes of information, but also manage scanned images in full resolution. Should this project use a database, and why or why not?
Any question "Should I use a certain tool?" comes down to asking exactly what you want to do. You should ask yourself - "Do I want to write my own storage for this data?"
Most web-based applications are written against a database because most databases give you many features "for free": multiple webservers can share the same data, you can use standard tools to edit, verify and back up your data, and you get a robust storage solution with transactions.
The database won't help you much in dealing with the image data itself, but anything that manages a bunch of images is going to have meta-data about the images that you'll be dealing with. Depending on the meta-data and what you want to do with it, a database can be quite helpful indeed with that.
And just because the database doesn't help you much with the image data, that doesn't mean you can't store the images in the database. You would store them in a BLOB column of a SQL database.
If the amount of data is small, or the app is installed on many client machines, you might not want the overhead of a database.
Is it intended to be installed on many users' machines? Adding the overhead of ensuring you can run whatever database engine you choose in a client-installed app is not optimal. Since the amount of data is small, I think XML would be adequate here. You could Base64-encode the images and store them as CDATA.
Will the application be run on a server? If you have concurrent users, then databases have concepts for handling these scenarios (transactions), and that can be helpful. And the scanned image data would be appropriate for a BLOB.
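A small sketch of that XML idea using java.util.Base64 (Java 8+); the element layout is made up for illustration:

```java
import java.nio.file.*;
import java.util.Base64;

public class XmlImageStore {
    /** Wraps an image file as a Base64 CDATA element, per the XML approach above. */
    public static String toXmlElement(Path image) throws Exception {
        String b64 = Base64.getEncoder().encodeToString(Files.readAllBytes(image));
        return "<image name=\"" + image.getFileName() + "\"><![CDATA[" + b64 + "]]></image>";
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXmlElement(Paths.get("scan-0001.png")));
    }
}
```

Base64 output never contains "]]>", so the CDATA section cannot be broken by the payload.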
You shouldn't store images in the database, as is the general consensus here.
The file system is just much better at storing images than your database is.
You should use a database to store meta information about those images, such as a title, description, etc, and just store a URL or path to the images.
When it comes to storing images in a database, I try to avoid it. In your case, from what I can gather from your question, there is a possibility of a substantial number of fairly large images, so I would probably strongly oppose it.
If this is a web application, I would use a database for quick searching and indexing of images by keywords and other parameters, then have a column pointing to the location of the image in a filesystem, if possible with some kind of folder structure to help further decrease image load time.
If you need greater security because the directory is accessible (a network share, say) and the application is local, then you should probably bite the bullet and store the images in the database.
My gut reaction is "why not?" A database is going to provide a framework for storing information, with all of the input/output/optimization functions provided in a documented format. You can go with a server-side solution, or a local database such as SQLite or the local version of SQL Server. Either way you have a robust, documented data management framework.
This post should give you most of the opinions you need about storing images in the database. Do you also mean 'should I use a database for the other information?' or are you just asking about the images?
A database is meant to manage large volumes of data, and is supposed to give you fast access for reading and writing that data in spite of the size. Put simply, databases manage scale for data: scale that you don't want to deal with yourself. If you have only a few users (hundreds?), you could just as easily manage the data on disk (say, XML?) and keep it in memory. The images clearly should not go into the database, so the question is: how much data, and for how many users, are you maintaining this database instance?
If you want to have a structured way to store and retrieve information, a database is most definitely the way to go. It makes your application flexible and more powerful, and lets you focus on the actual application rather than incidentals like trying to write your own storage system.
For individual applications, SQLite is great. It fits right into an app as a file; no need for a whole RDBMS juggernaut.
There are a lot of factors to this. But, being a database weenie, I would err on the side of having a database. It just makes life easier when things change. And things will change.
Depending on the images, you might store them on the file system, or actually BLOB them and put them in the database (not supported in all DBMSs). If the files are very small, then I would BLOB them. If they are big, then I would keep them on the file system and manage them myself.
There are so many free or cheap DBMSs out there that there really is no excuse not to use one. I'm a SQL Server guy, but if your application is that simple, then the free version of MySQL should do the job. In fact, it has some pretty cool stuff in there.
Our CMS stores all of the check images we process. It uses a database for metadata and lets the file system handle the scanned images.
A simple database like SQLite sounds appropriate - it will let you store file metadata in a consistent, transactional way. Then store the path to each image in the database and let the file system do what it does best - manage files.
SQL Server 2008 has a new FILESTREAM feature built for storing files via the database, but before that a BLOB column was the way to store files inside the database. On a small scale that would work too.
