File Vs Database - database

File Vs Database - database

I'm going to implement an Access Control List for each individual user so they can assign access to their own resources so they can hide stuff, for example, from their mothers, but show their friends.
Now storing ACL in a database seems like it can get pretty insane when each user is also a group, which can have many sub groups. So I'm thinking of storing the ACL stuff in a text file.
Good Idea? Bad Idea?
EDIT: I should note, I'm talking about an individual text file for each user. I'm thinking of creating an ACL class which I could serialize and write to the text file. My fear is that storing the ACL in a database would create insanely huge join tables and put a significant strain on the database server.

Given the option I would store ACL in a database:
The language to access and query the data is easier and standard (SQL)
The DB will give you transactions, indexes for fast query, integrity constraints and security.
Depending on how you store the data in a local file you may need to move your data files along with your application. Ex: Moving application from server1 to server2.
For data that is immutable (or don't change often) some form of cache should be used. So, independently if you're using File or DB, you should be caching some of this data for some period of time.
I'm pretty sure you can find good ACL schemas templates for relational database that you can use as a reference, like this document here: http://edhs1.gsfc.nasa.gov/waisdata/v2r20/ps/cd31110001.ps
I hope it helps.

I'm not sure how it's any easier storing it in a text file. The relationships of the data are the same. Granted, SQL is not always friendly hierarchical data, but neither is a flat text file.

Is your textfile protected against concurrent access? Else a database will be a better idea.

XML is ideal for hierarchical data. It's possible to work with hierarchical data in a database though, this explains how beautifully (the concepts apply not only to MySQL):
http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/
Personally I wouldn't store data like that in a text file. The manipulation becomes a lot more difficult.

Concurrency will bite you. As will complexity and decreased performance as the file gets larger. As will the increased risk of corruption of something happens during writing of the file like power loss of system crashes.
A database will generally shield you from these issues and leave you to concentrate on the logic
You can implement hierarchical storage in a database using self joins e.g
ItemID Data ParentID
--------------------------
Where ParentID is a pointer to ItemID of a different row

My fear is that storing the ACL in a
database would create insanely huge
join tables and put a significant
strain on the database server.
With one text file per user, and your groups/sub-groups presumably in some other files, you'll have to roll your own joins, won't you?
The problem/requirement you have described is exactly what SQL is good for. SQL is the right tool for the job, no question.

Related

Storing Email Body in SQL Server database?

Guys I am building a bulk email sending application for my client and right now I am designing the database architecture. Basically there will be hundreds of thousands emails per day and I need to store them in database.
What will be the best way to store the email's body in database? Do I store them in varchar(max) column, or do I save them in file system and save their path in the database? Or is there any other approach to this? I am only worried about performance of the application. Btw I am using SQL Server 2008 R2.

Generally I don't recommend building bulk email sending activities as there are a lot things to be done to avoid considering your email as spam
However if you decided to do it your self you need to decide the content of the emails, Is it text only, HTML that may contains embedded images,...
You can use varchar(max) for the field type. performance will not be a big issue however consider thinking about the retention policies
if you would like to save the email as file then you can use FILESTREAM which will provide you with better performance given that you use the SqlFileStream APIs

I don't know SQL Server 2008 I use Mysql that it has BLOB or TEXT column.
I think that also SQL Server has this type of fields. Into a LONGBLOB field you can store about L + 4 bytes, where L < 2^32.
Besides, you can store also any attached files.

Just use varchar(max) as it will be the easiest option to keep all data in the database so you can migrate, replicate or whatever the database and everything will be kept intact. Storing part of the data outside the database will only complicate things without any gain.
Performance on the data part will not be a problem, you will have bigger problem getting the spam^H^H^H^H emails delivered out of your box.
Just think about data cleanup from the beginning, as storing each and every email separately will over time use a bit of diskspace. With suitable indexes the amount of data should not be a problem.

The best way to do this is probably not to do it :) You need to find out from the client precisely why they want to do this. More often than not people store bulk data like this because they think they should, then never ever look at it. If you need to store this data, for how long, and what's the archiving process going to be? As mentioned elsewhere storing a pointer to a template then the inserted values would be a much more compact way of doing this, but again, only do it if you really need to. Storing the emails as files in a file system isn't a crazy idea, but avoid directories with thousands of files in.
Something else to consider for large databases is the disaster recovery strategy - how long will it take to backup every night, how long to restore in a DR scenario. How many backups are you going to keep, online & offline, and how much space do you need for that? In this regard, having 'application data' and 'archived data' in seperate databases might be a good starting point.
From a technical SQL Server perspective there are things that could help such as table partitioning and data compression, but understanding the requirement is still the most important place to start.

Storing data in text files instead of SQL Server

I'm intending to use both of SQL Server and simple text files to save my data.
Information like Users data are going to be stored in SQL Server, RSS fedd for each user are going to be stored in folder with the user Id as a title and inside this folder I can put the files that going to store the data in, each file can take only 20 lines, if there is more than 20 then I make a new file.
When I need to reed this data I simply call the last file in the user's folder.
I need to know what is the advantages and disadvantages of using this method?
thanx

I would suggest you to store the text file data into either VARCHAR(8000) or Blob and store inside the table in database.
The advantages of storing in database is:
All your data is stored in a single place. It is very easy for you to backup and restore in other place, if required
Database by default comes with concurrency and if you have say multiple users trying to access the same row, same table, database handles it inherently
When you go for files and database kind of hybrid approach, you are going for distributed storage and you have to always make sure that they are consistent
If you want to just store the latest text file content, go for UPDATE. If you want to keep history of earlier text files content, go for SCD Type 2 kind of storage or go for historical table containing previous text file data
Database is a single contained unit and you can do so many things on it like : Transparent data encryption, masking, access control and all security related stuff in a single contained unit. In hybrid approach, you have to manage security in two places.
When all your data is in a single place, and once you have proper indexes, you can write queries and come up with so many different reporting use cases, using SQL. But, if the data is distributed, you have to manage how will be handling the different reporting use cases.

The question is not quite correct.
You should start with clarification of requirements for the application. Answer to yourself the following questions:
What type of data queries need to be executed (selects, updates, reports).
How many users will be. How often requests from them will be coming. Does data must be synchronized across users (Concurrency).
Need of authentication and authorization, localization.
Need for modification history support.
Etc.
Databases usually have all this mechanisms and you do not have to implement them in your application.
Depending on your application needs you decide what strategy to use for storing the data: by means of database, files, or by both approaches.

Designing a generic unstructured data store

The project I have been given is to store and retrieve unstructured data from a third-party. This could be HR information – User, Pictures, CV, Voice mail etc or factory related stuff – Work items, parts lists, time sheets etc. Basically almost any type of data.
Some of these items may be linked so a User many have a picture for example. I don’t need to examine the content of the data as my storage solution will receive the data as XML and send it out as XML. It’s down to the recipient to convert the XML back into a picture or sound file etc. The recipient may request all Users so I need to be able to find User records and their related “child” items such as pictures etc, or the recipient may just want pictures etc.
My database is MS SQL and I have to stick with that. My question is, are there any patterns or existing solutions for handling unstructured data in this way.
I’ve done a bit of Googling and have found some sites that talk about this kind of problem but they are more interested in drilling into the data to allow searches on their content. I don’t need to know the content just what type it is (picture, User, Job Sheet etc).
To those who have given their comments:
The problem I face is the linking of objects together. A User object may be added to the data store then at a later date the users picture may be added. When the User is requested I will need to return the both the User object and it associated Picture. The user may update their picture so you can see I need to keep relationships between objects. That is what I was trying to get across in the second paragraph. The problem I have is that my solution must be very generic as I should be able to store anything and link these objects by the end users requirements. EG: User, Pictures and emails or Work items, Parts list etc. I see that Microsoft has developed ZEntity which looks like it may be useful but I don’t need to drill into the data contents so it’s probably over kill for what I need.

I have been using Microsoft Zentity since version 1, and whilst it is excellent a storing huge amounts of structured data and allowing (relatively) simple access to the data, if your data structure is likely to change then recreating the 'data model' (and the regression testing) would probably remove the benefits of using such a system.
Another point worth noting is that Zentity requires filestream storage so you would need to have the correct version of SQL Server installed (2008 I think) and filestream storage enabled.

Since you deal with XML, it's not an unstructured data. Microsoft SQL Server 2005 or later has XML column type that you can use.
Now, if you don't need to access XML nodes and you think you will never need to, go with the plain varbinary(max). For your information, storing XML content in an XML-type column let you not only to retrieve XML nodes directly through database queries, but also validate XML data against schemas, which may be useful to ensure that the content you store is valid.
Don't forget to use FILESTREAMs (SQL Server 2008 or later), if your XML data grows in size (2MB+). This is probably your case, since voice-mail or pictures can easily be larger than 2 MB, especially when they are Base64-encoded inside an XML file.

Since your data is quite freeform and changable, your best bet is to put it on a plain old file system not a relational database. By all means store some meta-information in SQL where it makes sense to search through structed data relationships but if your main data content is not structured with data relationships then you're doing yourself a disservice using an SQL database.
The filesystem is blindingly fast to lookup files and stream them, especially if this is an intranet application. All you need to do is share a folder and apply sensible file permissions and a large chunk of unnecessary development disappears. If you need to deliver this over the web, consider using WebDAV with IIS.
A reasonably clever file and directory naming convension with a small piece of software you write to help people get to the right path will hands down, always beat any SQL database for both access speed and sequential data streaming. Filesystem paths and file names will always beat any clever SQL index for data location speed. And plain old files are the ultimate unstructured, flexible data store.
Use SQL for what it's good for. Use files for what they are good for. Best tools for the job and all that...

You don't really need any pattern for this implementation. Store all your data in a BLOB entry. Read from it when required and then send it out again.
Yo would probably need to investigate other infrastructure aspects like periodically cleaning up the db to remove expired entries.
Maybe i'm not understanding the problem clearly.

So am I right if I say that all you need to store is a blob of xml with whatever binary information contained within? Why can't you have a users table and then a linked(foreign key) table with userobjects in, linked by userId?

Can you provide some advice on setting up my database?

I'm working on a MUD (Multi User Dungeon) in Python and am just now getting around to the point where I need to add some rooms, enemies, items, etc. I could hardcode all this in, but it seems like this is more of a job for a database.
However, I've never really done any work with databases before so I was wondering if you have any advice on how to set this up?
What format should I store the data in?
I was thinking of storing a Dictionary object in the database for each entity. In htis way, I could then simply add new attributes to the database on the fly without altering the columns of the database. Does that sound reasonable?
Should I store all the information in the same database but in different tables or different entities (enemies and rooms) in different databases.
I know this will be a can of worms, but what are some suggestions for a good database? Is MySQL a good choice?

1) There's almost never any reason to have data for the same application in different databases. Not unless you're a Fortune500 size company (OK, i'm exaggregating).
2) Store the info in different tables.
As an example:
T1: Rooms
T2: Room common properties (aplicable to every room), with a row per **room*
T3: Room unique properties (applicable to minority of rooms, with a row per property per room - thos makes it easy to add custom properties without adding new columns
T4: Room-Room connections
Having T2 AND T3 is important as it allows you to combine efficiency and speed of row-per-room idea where it's applicable with flexibility/maintanability/space saving of attribute-per-entity-per-row (or Object/attribute/value as IIRC it's called in fancy terms) schema
Good discussion is here
3) Implementation wise, try to write something re-usable, e.g. have generic "Get_room" methods, which underneath access the DB -= ideally via transact SQL or ANSI SQL so you can survive changing of DB back-end fairly painlessly.
For initial work, you can use SQLite. Cheap, easy and SQL compatible (the best property of all). Install is pretty much nothing, DB management can be done by freeware tools or even FireFox plugin IIRC (all of FireFox 3 data stores - history, bookmarks, places, etc... - are all SQLite databases).
For later, either MySQL or Postgres (I don't do either one professionally so can't recommend one). IIRC at some point Sybase had free personal db server as well, but no idea if that's still the case.

This technique is called entity-attribute-value model. It's normally preferred to have DB schema that reflects the structure of the objects, and update the schema when your object structure changes. Such strict schema is easier to query and it's easier to make sure that the data is correct on the database level.
One database with multiple tables is the way to do.
If you want a database server, I've recommend PostgreSQL. MySQL has some advantages, like easy replication, but PostgreSQL is generally nicer to work with. If you want something smaller that works directly with the application, SQLite is a good embedded database.

Storing an entire object (serialized/encoded) as a value in the database is bad for querying - I am sure that some queries in your mud will NOT need to know 100% of attributes, or may retrieve a list of object by a value of attributes.

it seems like this is more of a job
for a database
True, although 'database' doesn't have to mean 'relational database'. Most existing MUDs store all data in memory, and read it in from flat-file saved in a plain-text data format. I'm not necessarily recommending this route, just pointing out that a traditional database is by no means necessary. If you do want to go the relational route, recent versions of Python come with sqlite which is a lightweight embedded relational database with good SQL support.
Using relational databases with your code can be awkward. Any change to a game logic class can require a parallel change to the database, and changes to the code that read and write to the database. For this reason good planning will help you a lot, but it's hard to plan a good database schema without experience. At least get your entity classes planned first, then build a database schema around it. Reading up on normalizing a database and understanding the principles there will help.
You may want to use an 'object-relational mapper' which can simplify a lot of this for you. Examples in Python include SQLObject, SQLAlchemy, and Autumn. These hide a lot of the complexities for you, but as a result can hide some of the important details too. I'd recommend using the database directly until you are more familiar with it, and consider using an ORM in the future.
I was thinking of storing a Dictionary
object in the database for each
entity. In htis way, I could then
simply add new attributes to the
database on the fly without altering
the columns of the database. Does that
sound reasonable?
Unfortunately not - if you do that, you waste 99% of the capabilities of the database and are effectively using it as a glorified data store. However, if you don't need aforementioned database capabilities, this is a valid route if you use the right tool for the job. The standard shelve module is well worth looking at for this purpose.
Should I store all the information in
the same database but in different
tables or different entities (enemies
and rooms) in different databases.
One database. One table in the database per entity type. That's the typical approach when using a relational database (eg. MySQL, SQL Server, SQLite, etc).
I know this will be a can of worms,
but what are some suggestions for a
good database? Is MySQL a good choice?
I would advise sticking with sqlite until you're more familiar with SQL. Otherwise, MySQL is a reasonable choice for a free game database, as is PostGreSQL.

One database. Each database table should refer to an actual data object.
For instance, create a table for all items, all creatures, all character classes, all treasures, etc.
Spend some time now and figure out how objects will relate to each other, as this will affect your database structure. For example, can a character have more than one character class? Can monsters have character classes? Can monsters carry items? Can rooms have more than one monster?
It seems pedantic, but you'll save yourself a whole lot of trouble early by figuring out what database objects "belong" to which other database objects.

How do you handle small sets of data?

With really small sets of data, the policy where I work is generally to stick them into text files, but in my experience this can be a development headache. Data generally comes from the database and when it doesn't, the process involved in setting it/storing it is generally hidden in the code. With the database you can generally see all the data available to you and the ways with which it relates to other data.
Sometimes for really small sets of data I just store them in an internal data structure in the code (like A Perl hash) but then when a change is needed, it's in the hands of a developer.
So how do you handle small sets of infrequently changed data? Do you have set criteria of when to use a database table or a text file or..?
I'm tempted to just use a database table for absolutely everything but I'm not sure if there are any implications to this.
Edit: For context:
I've been asked to put a new contact form on the website for a handful of companies, with more to be added occasionally in the future. Except, companies don't have contact email addresses.. the users inside these companies do (as they post jobs through their own accounts). Now though, we want a "speculative application" type functionality and the form needs an email address to send these applications to. But we also don't want to put an email address as a property in the form or else spammers can just use it as an open email gateway. So clearly, we need an ID -> contact_email type relationship with companies.
SO, I can either add a column to a table with millions of rows which will be used, literally, about 20 times OR create a new table that at most is going to hold about 20 rows. Typically how we handle this in the past is just to create a nasty text file and read it from there. But this creates maintenance nightmares and these text files are frequently looked over when data that they depend on changes. Perhaps this is a fault with the process, but I'm just interested in hearing views on this.

Put it in the database. If it changes infrequently, cache it in your middle tier.

The example that springs to mind immediately is what is appropriate to have stored as an enumeration and what is appropriate to have stored in a "lookup" database table.
I tend to "draw the line" with the rule that if it will result in a column in the database containing a "magic number" that maps to an enumeration value, then the enumeration should really exist as a lookup table. If it's unrelated to the data stored in the database (eg. Application configuration data rather than user generated data), then it's an enumeration all the way.

Surely it depends on the user of the software tool you've developed to consume the set of data, regardless of size?
It might just be that they know Excel, so your tool would have to parse a .csv file that they create.
If it's written for the developers, then who cares what you use. I'm not a fan of cluttering databases with minor or transient data however.

We have a standard config file format (key:value) and a class to handle it. We just use that on all projects. Mostly we're just setting persistent properties for our applications (mobile phone development) so that's an appropriate thing to do. YMMV

In cases where the program accesses a database, I'll store everything in there: easier for backup and moving data around.
For small programs without database access I store my data in the .net settings, which are stored in an xml file - of course this is a feature of c#, so it might not apply to you.
Anyway, I make sure to store all data in one place. Usually a database.

Have you considered sqlite ? It's file-based, which addresses your feeling that "just a file might do" (zero configuration), but it's a perfectly good database and scales remarkably well. It supports a number of APIs and there are numerous front ends for administering it.

If these are small config-like data, i use some simple and common format. ini, json and yaml are usually ok. Java and .NET fans also like XML. in short, use something that you can easily read to an in-memory object and forget about it.

I would add it to the database in the main table:
Backup and recovery (you do want to recover this text file, right?)
Adhoc querying (since you can do it will a SQL tool and join it to the other database data)
If the database column is empty the store requirements for it should be minimal (nothing if it's a NULL column at the end of the table in Oracle)
It will be easier if you want to have multiple application servers as you will not need to keep multiple copies of some extra config file around
Putting it into a little child table only complicates the design without giving any real benefits
You may well already be going to that same row in the database as part of your processing anyway, so performance is not likely to be a problem. If you are not, you could cache it in memory.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight