Beginner's advice on server-side file storage - sql-server

I'm new to Stack Overflow so hi. I'm also pretty new to the programming scene so I'll try to be as technically adept as possible.
Basically I'm designing an application that stores files and schemas on the server side.
The simple architecture is that there will be two databases: one for file storage and the other to hold XML files that point to the files.
I'm very new to this and, after looking around at some different options (Fedora Commons, PHP-Nuke, Remository), I'm feeling a bit lost and am just looking for some advice or good directions on the topic.
For general information: the files being stored in the repository/database will be quite small, but there will be a lot of them. The XML schema will point towards the location of these files.
E.g. a file's location is /images/stackOverflow/mine.jpg
And the XML would look like:
<images>
    <location>/images/stackOverflow/mine.jpg</location>
</images>
The architecture is pretty flexible and there's no content management system in place, so any advice is appreciated.
I did a search on Stack Overflow but most questions were about client-side storage instead of server-side.

For storing documents you could look into a Document-Oriented Database or Document Management System.
Document-Oriented Database
You can also look at other NoSQL databases, such as XML-based ones, that may make more sense than a typical RDBMS.
XML Database
Yet another option is utilizing an RDBMS for document storage, which allows you to organize documents by record and perform industry-standard SQL queries to search for and retrieve documents.
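As a rough sketch of that last option (SQLite here purely for illustration; the table and column names are made up), you would store each file's metadata and path as a record and use plain SQL to find it again:

    import sqlite3

    # Minimal sketch: one record per stored file, with its location on disk.
    conn = sqlite3.connect("repository.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS documents (
            id       INTEGER PRIMARY KEY,
            name     TEXT NOT NULL,
            path     TEXT NOT NULL,   -- e.g. /images/stackOverflow/mine.jpg
            category TEXT
        )
    """)
    conn.execute(
        "INSERT INTO documents (name, path, category) VALUES (?, ?, ?)",
        ("mine.jpg", "/images/stackOverflow/mine.jpg", "images"),
    )
    conn.commit()

    # Industry-standard SQL to locate a document by name.
    row = conn.execute(
        "SELECT path FROM documents WHERE name = ?", ("mine.jpg",)
    ).fetchone()
    print(row[0])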

Related

Rails when to avoid database

How much data is too much to go into a database?
If optimised, database access can be faster than simple file system access.
But assuming the server is running on either:
a conventional budget home server
AWS
Is there any reason to use a database for storing things larger than some articles?
The power of X-sendfile made me decide to move some data to the filesystem, but to what extent should I do this?
The only data I wouldn't store in a database are files - files whose content I do not need to search, like images, videos, etc.
Any other data, regardless of the size will go into a database.
If I have a JSON file, it will go into a NoSQL database that can search and index JSON documents (see the sketch at the end of this answer).
If I have gigabytes of data on anything, anything at all, it will go into a database.
Databases have much better mechanisms than anything anyone can rig for files in a filesystem.
Your question lacks the context for me to give a better answer.
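For the JSON case above, a minimal sketch with MongoDB and pymongo (the connection string, database, and field names are assumptions for illustration):

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # assumed local instance
    articles = client["mydb"]["articles"]

    # The JSON document is stored as-is and every field stays queryable.
    articles.insert_one({"title": "Server-side storage", "tags": ["files", "sql-server"]})
    articles.create_index("tags")  # index the field you search on

    for doc in articles.find({"tags": "files"}):
        print(doc["title"])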

Which database model is best to use, SQL or NoSQL, to store/retrieve high-resolution media files for broadcast playout?

Currently using a SQL Server database to keep records of the storage and retrieval of high-resolution media files located in several SANs. However, due to the several SQL Server databases involved, I'm curious to learn if a NoSQL design approach would be best. Thank you.
NoSQL solutions are great at non-structured or variably-structured information like the layout of a web page, while SQL solutions are better at structured information. The general rule is that if you can fit it into a table structure, use that. Since you're just storing a pointer to a file location I don't think you'll get much by switching data engines, unless you're also storing other information along with the location like a bunch of hashtags that describe the video (for easier searching or "others like this one"). And even then it's probably easier to just have an indexed table of hashtags. If you want to share more information about your architecture the geniuses here can probably give you some pointers.
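The indexed table of hashtags mentioned above could be as simple as this sketch (SQLite just to show the shape; the names are invented):

    import sqlite3

    db = sqlite3.connect("media.db")
    db.executescript("""
        CREATE TABLE IF NOT EXISTS media (
            id   INTEGER PRIMARY KEY,
            path TEXT NOT NULL          -- SAN location of the high-resolution file
        );
        CREATE TABLE IF NOT EXISTS media_tags (
            media_id INTEGER REFERENCES media(id),
            tag      TEXT NOT NULL
        );
        CREATE INDEX IF NOT EXISTS idx_media_tags_tag ON media_tags(tag);
    """)

    # "Others like this one": every file sharing a given tag.
    rows = db.execute("""
        SELECT m.path
        FROM media m JOIN media_tags t ON t.media_id = m.id
        WHERE t.tag = ?
    """, ("news-intro",)).fetchall()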

Storing rich text documents

This is a follow-up to another question I asked earlier today. I am creating a desktop app that stores rich text documents created in WPF (in a RichTextBox control). The app uses SQL Compact, and up until now, I had planned to store each document in a binary column in the database.
I am rethinking that approach. Would it be better practice to store each rich text document in the file system, rather than saving it to the database? I figure I could put the documents in the same folder with the database, then store a relative path to each document in its database record, along with other information about the document (tags and so on).
I'd like to know some pros and cons of that approach, along with ideas of what is generally considered best practice for this sort of thing. Thanks for your help.
Personally, I tend to use the filesystem; there's a rough sketch of that approach after the pro/con lists below.
Pro DB
Can search using SQL search features (will probably be a bit wonky with RTF because of the control codes)
Back up the MDF file and you've backed up all the documents in one place
Can easily implement versioning
Easier to keep file data & stuff that references it in sync
Pro Filesystem
Loadable by external apps (and people)
A corrupt DB kills all your documents
Searchable via filesystem tools/indexers
Less complex IO code needed
Familiar to user
The path can point anywhere (i.e on another machine/another logical drive)
More portable IO code
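As a rough sketch of the path-in-the-database approach from the question (SQLite stands in for SQL Compact here, and the folder layout and names are invented): write the document next to the database and keep only a relative path in the record.

    import os
    import sqlite3

    app_dir = os.path.abspath("app_data")
    docs_dir = os.path.join(app_dir, "documents")
    os.makedirs(docs_dir, exist_ok=True)

    # Save the rich text document to the file system...
    rel_path = os.path.join("documents", "note-42.rtf")
    with open(os.path.join(app_dir, rel_path), "w", encoding="utf-8") as f:
        f.write(r"{\rtf1 Hello, world!}")

    # ...and store only its relative path plus metadata in the database record.
    db = sqlite3.connect(os.path.join(app_dir, "metadata.db"))
    db.execute("CREATE TABLE IF NOT EXISTS notes "
               "(id INTEGER PRIMARY KEY, title TEXT, rel_path TEXT, tags TEXT)")
    db.execute("INSERT INTO notes (title, rel_path, tags) VALUES (?, ?, ?)",
               ("My note", rel_path, "demo"))
    db.commit()

Because the path is relative, the whole folder (database plus documents) can be moved or backed up as one unit.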
Not sure why it has not been mentioned before but have you looked at the FILESTREAM data type that is available in SQL Server 2008 and above?
It combines the benefits of file system storage with the benefits of DB storage. Here is a link to a MS white paper http://download.microsoft.com/download/a/c/d/acd8e043-d69b-4f09-bc9e-4168b65aaa71/SQL2008UnstructuredData.doc
Another very strong point of FILESTREAM, from my point of view, is that it does not eat into the size limit of the Express editions of SQL Server, which can be very handy.
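A rough sketch of what a FILESTREAM-backed table can look like (this assumes FILESTREAM is already enabled on the instance and the database has a FILESTREAM filegroup; the connection string and names are placeholders):

    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
        "DATABASE=DocStore;Trusted_Connection=yes"
    )
    # FILESTREAM columns require a UNIQUE ROWGUIDCOL column on the table.
    conn.execute("""
        CREATE TABLE Documents (
            Id      UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
            Name    NVARCHAR(260)  NOT NULL,
            Content VARBINARY(MAX) FILESTREAM NULL  -- bytes live on the NTFS file system
        )
    """)
    with open("mine.jpg", "rb") as f:
        conn.execute("INSERT INTO Documents (Name, Content) VALUES (?, ?)",
                     "mine.jpg", f.read())
    conn.commit()

The table is queried like any other, but the Content bytes are kept by SQL Server on the file system rather than in the data pages.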

NoSQL for filesystem storage organization and replication?

We've been discussing design of a data warehouse strategy within our group for meeting testing, reproducibility, and data syncing requirements. One of the suggested ideas is to adapt a NoSQL approach using an existing tool rather than try to re-implement a whole lot of the same on a file system. I don't know if a NoSQL approach is even the best approach to what we're trying to accomplish but perhaps if I describe what we need/want you all can help.
Most of our files are large, 50+ GB in size, held in a proprietary, third-party format. We need to be able to access each file by a name/date/source/time/artifact combination. Essentially a key-value pair style look-up.
When we query for a file, we don't want to have to load all of it into memory. They're really too large and would swamp our server. We want to be able to somehow get a reference to the file and then use a proprietary, third-party API to ingest portions of it.
We want to easily add, remove, and export files from storage.
We'd like to set up automatic file replication between two servers (we can write a script for this.) That is, sync the contents of one server with another. We don't need a distributed system where it only appears as if we have one server. We'd like complete replication.
We also have other smaller files that have a tree type relationship with the Big files. One file's content will point to the next and so on, and so on. It's not a "spoked wheel," it's a full blown tree.
We'd prefer a Python, C or C++ API to work with a system like this, but most of us are experienced with a variety of languages. We don't mind as long as it works, gets the job done, and saves us time. What do you think? Is there something out there like this?
Have you had a look at MongoDB's GridFS?
http://www.mongodb.org/display/DOCS/GridFS+Specification
You can query files by the default metadata, plus your own additional metadata. Files are broken out into small chunks and you can specify which portions you want. Also, files are stored in a collection (similar to an RDBMS table) and you get Mongo's replication features to boot.
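A small pymongo/GridFS sketch of that workflow (the database name and metadata fields are invented for illustration):

    import gridfs
    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017")["warehouse"]
    fs = gridfs.GridFS(db)

    # Store a large capture file along with the key fields used for lookup.
    with open("capture.bin", "rb") as f:
        file_id = fs.put(f, filename="capture.bin",
                         metadata={"source": "rig-7", "artifact": "raw", "date": "2012-01-05"})

    # Later: get a reference and read only a slice, never the whole file.
    grid_out = fs.get(file_id)
    grid_out.seek(1024 * 1024)          # skip the first megabyte
    chunk = grid_out.read(64 * 1024)    # read 64 KB from that offset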
What's wrong with a proven cluster file system? Lustre and Ceph are good candidates.
If you're looking for an object store, Hadoop was built with this in mind. In my experience Hadoop is a pain to work with and maintain.
For me, both Lustre and Ceph have some problems that databases like Cassandra don't have. I think the core question here is what disadvantages Cassandra and other databases like it would have as an FS backend.
Performance could obviously be one. What about space usage? Consistency?

How important is a database in managing information?

I have been hired to help write an application that manages certain information for the end user. It is intended to manage a few megabytes of information, but also manage scanned images in full resolution. Should this project use a database, and why or why not?
Any question "Should I use a certain tool?" comes down to asking exactly what you want to do. You should ask yourself - "Do I want to write my own storage for this data?"
Most web based applications are written against a database because most databases support many "free" features - you can have multiple webservers. You can use standard tools to edit, verify and backup your data. You can have a robust storage solution with transactions.
The database won't help you much in dealing with the image data itself, but anything that manages a bunch of images is going to have meta-data about the images that you'll be dealing with. Depending on the meta-data and what you want to do with it, a database can be quite helpful indeed with that.
And just because the database doesn't help you much with the image data, that doesn't mean you can't store the images in the database. You would store them in a BLOB column of a SQL database.
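If you do go the BLOB route, the sketch below shows the idea with SQLite (table and file names are made up); any SQL database with a binary column type works the same way:

    import sqlite3

    db = sqlite3.connect("app.db")
    db.execute("CREATE TABLE IF NOT EXISTS scans "
               "(id INTEGER PRIMARY KEY, name TEXT, image BLOB)")

    # Store the scanned image bytes directly in the row.
    with open("scan-001.png", "rb") as f:
        db.execute("INSERT INTO scans (name, image) VALUES (?, ?)",
                   ("scan-001.png", f.read()))
    db.commit()

    # Read it back as bytes.
    image_bytes = db.execute("SELECT image FROM scans WHERE name = ?",
                             ("scan-001.png",)).fetchone()[0]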
If the amount of data is small, or installed on many client machines, you might not want the overhead of a database.
Is it intended to be installed on many users' machines? Adding the overhead of ensuring you can run whatever database engine you choose on a client-installed app is not optimal. Since the amount of data is small, I think XML would be adequate here. You could Base64-encode the images and store them as CDATA (see the sketch at the end of this answer).
Will the application be run on a server? If you have concurrent users, then databases have concepts for handling these scenarios (transactions), and that can be helpful. And the scanned image data would be appropriate for a BLOB.
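For the client-install scenario above, a small sketch of the Base64-in-XML idea (element names are invented):

    import base64
    from xml.dom import minidom

    doc = minidom.Document()
    root = doc.appendChild(doc.createElement("images"))
    image = root.appendChild(doc.createElement("image"))
    image.setAttribute("name", "mine.jpg")

    # Base64-encode the image bytes and embed them as a CDATA section.
    with open("mine.jpg", "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    image.appendChild(doc.createCDATASection(encoded))

    with open("data.xml", "w", encoding="utf-8") as f:
        f.write(doc.toprettyxml(indent="  "))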
You shouldn't store images in the database, as is the general consensus here.
The file system is just much better at storing images than your database is.
You should use a database to store meta information about those images, such as a title, description, etc, and just store a URL or path to the images.
When it comes to storing images in a database, I try to avoid it. In your case, from what I can gather from your question, there is a possibility of a substantial number of fairly large images, so I would probably strongly oppose it.
If this is a web application, I would use a database for quick searching and indexing of images using keywords and other parameters. Then have a column pointing to the location of the image in a filesystem, if possible with some kind of folder structure to help further decrease image load time.
If you need greater security due to the directory being available (network share) and the application is local then you should probably bite the bullet and store the images in the database.
My gut reaction is "why not?" A database is going to provide a framework for storing information, with all of the input/output/optimization functions provided in a documented format. You can go with a server-side solution, or a local database such as SQLite or the local version of SQL Server. Either way you have a robust, documented data management framework.
This post should give you most of the opinions you need about storing images in the database. Do you also mean 'should I use a database for the other information?' or are you just asking about the images?
Databases are meant to manage large volumes of data, and are supposed to give you fast access to read and write that data in spite of the size. Put simply, they manage scale for data - scale that you don't want to deal with. If you have only a few users (hundreds?), you could just as easily manage the data on disk (say, XML?) and keep the data in memory. The images should clearly not go into the database, so the question is how much data, or for how many users, are you maintaining this database instance?
If you want to have a structured way to store and retrieve information, a database is most definitely the way to go. It makes your application flexible and more powerful, and lets you focus on the actual application rather than incidentals like trying to write your own storage system.
For individual applications, SQLite is great. It fits right in an app as a file; no need for a whole DBMS juggernaut.
There are a lot of factors to this. But, being a database weenie, I would err on the side of having a database. It just makes life easier when things change. And things will change.
Depending on the images, you might store them on the file system or actually blob them and put them in the database (not supported in all DBMSs). If the files are very small, then I would blob them. If they are big, then I would keep them on the file system and manage them yourself.
There are so many free or cheap DBMSs out there that there really is no excuse not to use one. I'm a SQL Server guy, but if your application is that simple, then the free version of MySQL should do the job. In fact, it has some pretty cool stuff in there.
Our CMS stores all of the check images we process. It uses a database for metadata and lets the file system handle the scanned images.
A simple database like SQLite sounds appropriate - it will let you store file metadata in a consistent, transactional way. Then store the path to each image in the database and let the file system do what it does best - manage files.
SQL Server 2008 has a new data type built for in-database files, but before that BLOB was the way to store files inside the database. On a small scale that would work too.
