Storing data in text files instead of SQL Server - sql-server

I'm intending to use both SQL Server and simple text files to save my data.
Information like user data is going to be stored in SQL Server. The RSS feed for each user is going to be stored in a folder named with the user's ID, and inside this folder I put the files that store the data. Each file can hold only 20 lines; if there are more than 20, I create a new file.
When I need to read this data I simply open the last file in the user's folder.
I need to know: what are the advantages and disadvantages of using this method?
Thanks.

I would suggest storing the text file data in either a VARCHAR(8000) or a BLOB column inside a table in the database (a minimal table sketch follows the list below).
The advantages of storing it in the database are:
All your data is stored in a single place. It is very easy for you to back up and restore it elsewhere, if required.
The database comes with concurrency control by default; if multiple users try to access the same row or table, the database handles it inherently.
When you go for a hybrid approach of files plus database, you are going for distributed storage, and you always have to make sure the two stay consistent.
If you want to store just the latest text file content, go for an UPDATE. If you want to keep a history of earlier text file content, go for SCD Type 2 style storage or for a historical table containing the previous text file data.
The database is a single contained unit and you can do many things with it, like transparent data encryption, masking, access control and all the security-related stuff, in one place. In the hybrid approach, you have to manage security in two places.
When all your data is in a single place, and once you have proper indexes, you can write queries and cover many different reporting use cases using SQL. But if the data is distributed, you have to work out how you will handle the different reporting use cases.
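To make that concrete, here is a minimal T-SQL sketch of the kind of table this answer suggests. The names (dbo.Users, dbo.UserFeed, FeedContent) are illustrative assumptions, not something from the original question:

    -- One row per feed snapshot per user, instead of 20-line text files.
    -- dbo.Users(UserId) is assumed to already exist.
    CREATE TABLE dbo.UserFeed
    (
        FeedId      INT IDENTITY(1,1) PRIMARY KEY,
        UserId      INT           NOT NULL REFERENCES dbo.Users (UserId),
        FeedContent VARCHAR(8000) NOT NULL,   -- or VARBINARY(MAX) for a BLOB
        LoadedAt    DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME()
    );

    -- "Read the last file in the user's folder" becomes a simple query:
    SELECT TOP (1) FeedContent
    FROM dbo.UserFeed
    WHERE UserId = @UserId
    ORDER BY LoadedAt DESC;

Keeping every snapshot as its own row also gives you the history behaviour mentioned above without juggling separate files.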

The question, as posed, is not quite the right one.
You should start by clarifying the requirements for the application. Answer the following questions for yourself:
What types of queries need to be executed (selects, updates, reports)?
How many users will there be? How often will requests from them arrive? Must data be synchronized across users (concurrency)?
Is there a need for authentication and authorization, or localization?
Is modification history support needed?
Etc.
Databases usually provide all of these mechanisms, so you do not have to implement them in your application.
Depending on your application's needs, you can decide what strategy to use for storing the data: a database, files, or both approaches combined.

Related

How and where to store the current customer purchasing history data?

I am working on a project that requires showing the transaction history of a customer and whether the product the customer buys is under warranty or not. I need to use the data from the current system, which can provide a Web API that returns a .csv file. So how can I make use of the current system's data?
A solution I can think of is to download all the .csv files and write scripts to insert every record into a database I have built, which contains the necessary tables and relations to hold the data I retrieve. Then I would have the new database I want. Because I have never done this before, I want to know if it is feasible.
And one more question: should I store the data locally or use a cloud database like Firebase?
High-end databases like SQL Server and Oracle come with utilities that allow you to read directly from a .csv file. Check the docs. Having done this many times, I have found the best procedure is to read the file into a single holding table. This gives you the chance to examine the data, find any unexpected quirks or missing fields, and correct the data where possible.
Then write the scripts to move the data from the holding table into the proper tables you have designed. This must be done in a logical manner. For example, move the customer data before the buy transactions. Thus any error messages you get will not be because you tried to store a transaction before you stored the customer. (You will have referential integrity set up, yes?) This gives you more chances to correct or adjust the data or just identify problems more or less at your leisure.
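As a rough illustration of the holding-table approach (the table names, columns, and file path below are hypothetical, and SQL Server's BULK INSERT is just one of the utilities the answer alludes to):

    -- Load the raw .csv into a staging/holding table first.
    CREATE TABLE dbo.PurchaseHolding
    (
        CustomerId   VARCHAR(50),
        CustomerName VARCHAR(200),
        PurchaseDate VARCHAR(50),    -- keep as text until validated
        ProductCode  VARCHAR(50)
    );

    BULK INSERT dbo.PurchaseHolding
    FROM 'C:\imports\purchases.csv'
    WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

    -- Inspect and correct the data here, then load customers before their
    -- transactions so referential integrity is satisfied.
    INSERT INTO dbo.Customer (CustomerId, CustomerName)
    SELECT DISTINCT CustomerId, CustomerName
    FROM dbo.PurchaseHolding
    WHERE CustomerId IS NOT NULL;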
Whether or not to store the data in the cloud is strictly according to the preferences of your employer.

Temporary storage for Excel file processing

I am developing a web application using Java EE technologies (Spring, Spring MVC, Hibernate). In this application I am parsing an Excel file and need to add its data to a SQL Server database.
Before adding the data to the database, I need to ask the user, for each row in the Excel file, whether they really want to add that row to the database.
I can do something like this:
First save the data to a table (table_tmp).
Then display the data to the user and, based on their input, add it to the actual table and remove it from the temporary table (a rough sketch of this approach follows).
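A rough T-SQL sketch of that two-step flow, with hypothetical table and column names:

    -- Stage the parsed Excel rows, then move only the approved ones.
    CREATE TABLE dbo.ImportStaging
    (
        RowId    INT IDENTITY(1,1) PRIMARY KEY,
        RowData  NVARCHAR(MAX) NOT NULL,
        Approved BIT NOT NULL DEFAULT 0
    );

    -- After the user has reviewed each row and approvals are written back:
    INSERT INTO dbo.TargetTable (RowData)
    SELECT RowData FROM dbo.ImportStaging WHERE Approved = 1;

    DELETE FROM dbo.ImportStaging;   -- clear the temporary storage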
But I think there may be a better solution (some kind of temporary storage that I can delete after getting the user's input).
Can I use some NoSQL solution for this?
Why do you need to store it in the first place? Do you want to allow them to upload that Excel file, come back long after their session has expired, and then select the rows they want? Do you really have to make it persistent?
If yes, then do you have any problems with your current setup? If not, then you'll be introducing another external component that needs to be administered and that you have to interface with (what happens if MongoDB is down, there isn't enough disk space, the connection times out, ...) just to keep a temporary file.
But if you still want to do it, then you might first consider something really simple (and fast), like memcached, an in-memory key-value store.

Designing a generic unstructured data store

The project I have been given is to store and retrieve unstructured data from a third party. This could be HR information (users, pictures, CVs, voice mail, etc.) or factory-related stuff (work items, parts lists, time sheets, etc.). Basically almost any type of data.
Some of these items may be linked, so a User may have a Picture, for example. I don't need to examine the content of the data, as my storage solution will receive the data as XML and send it out as XML. It's down to the recipient to convert the XML back into a picture or sound file etc. The recipient may request all Users, so I need to be able to find User records and their related "child" items such as pictures, or the recipient may just want pictures.
My database is MS SQL and I have to stick with that. My question is, are there any patterns or existing solutions for handling unstructured data in this way.
I’ve done a bit of Googling and have found some sites that talk about this kind of problem but they are more interested in drilling into the data to allow searches on their content. I don’t need to know the content just what type it is (picture, User, Job Sheet etc).
To those who have given their comments:
The problem I face is linking objects together. A User object may be added to the data store, and then at a later date the user's picture may be added. When the User is requested, I need to return both the User object and its associated Picture. The user may update their picture, so you can see I need to keep relationships between objects. That is what I was trying to get across in the second paragraph. The problem I have is that my solution must be very generic, as I should be able to store anything and link these objects according to the end users' requirements, e.g. User, Pictures and emails, or Work items, Parts lists etc. I see that Microsoft has developed Zentity, which looks like it may be useful, but I don't need to drill into the data contents so it's probably overkill for what I need.
I have been using Microsoft Zentity since version 1, and whilst it is excellent at storing huge amounts of structured data and allowing (relatively) simple access to it, if your data structure is likely to change then recreating the 'data model' (and the regression testing) would probably remove the benefits of using such a system.
Another point worth noting is that Zentity requires FILESTREAM storage, so you would need to have the correct version of SQL Server installed (2008, I think) and FILESTREAM storage enabled.
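For reference, enabling FILESTREAM on the instance looks roughly like this (it must also be enabled for the service in SQL Server Configuration Manager; this is generic SQL Server configuration, not anything Zentity-specific):

    -- 2 = enable FILESTREAM for Transact-SQL and Win32 streaming access
    EXEC sp_configure 'filestream access level', 2;
    RECONFIGURE;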
Since you deal with XML, it's not really unstructured data. Microsoft SQL Server 2005 or later has an XML column type that you can use.
Now, if you don't need to access the XML nodes and you think you never will, go with plain varbinary(max). For your information, storing XML content in an XML-typed column lets you not only retrieve XML nodes directly through database queries, but also validate the XML data against schemas, which may be useful to ensure that the content you store is valid.
Don't forget to use FILESTREAMs (SQL Server 2008 or later), if your XML data grows in size (2MB+). This is probably your case, since voice-mail or pictures can easily be larger than 2 MB, especially when they are Base64-encoded inside an XML file.
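A minimal sketch of what such a table could look like, with illustrative names (dbo.StoredItem, Payload, Content); the XQuery call at the end shows the kind of node access an XML-typed column enables:

    CREATE TABLE dbo.StoredItem
    (
        ItemId   INT IDENTITY(1,1) PRIMARY KEY,
        ItemType VARCHAR(50)    NOT NULL,   -- 'User', 'Picture', 'CV', ...
        Payload  XML            NULL,       -- structured metadata
        Content  VARBINARY(MAX) NULL        -- raw binary (picture, voice mail)
    );

    -- XML-typed columns can be queried directly:
    SELECT ItemId,
           Payload.value('(/item/name)[1]', 'nvarchar(200)') AS ItemName
    FROM dbo.StoredItem
    WHERE ItemType = 'User';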
Since your data is quite freeform and changeable, your best bet is to put it on a plain old file system, not a relational database. By all means store some meta-information in SQL where it makes sense to search through structured data relationships, but if your main data content is not structured with data relationships then you're doing yourself a disservice by using an SQL database.
The filesystem is blindingly fast at looking up files and streaming them, especially if this is an intranet application. All you need to do is share a folder and apply sensible file permissions, and a large chunk of unnecessary development disappears. If you need to deliver this over the web, consider using WebDAV with IIS.
A reasonably clever file and directory naming convention, with a small piece of software you write to help people get to the right path, will, hands down, always beat any SQL database for both access speed and sequential data streaming. Filesystem paths and file names will always beat any clever SQL index for data-location speed. And plain old files are the ultimate unstructured, flexible data store.
Use SQL for what it's good for. Use files for what they are good for. Best tools for the job and all that...
You don't really need any pattern for this implementation. Store all your data in a BLOB entry. Read from it when required and then send it out again.
You would probably need to investigate other infrastructure aspects, like periodically cleaning up the db to remove expired entries.
Maybe I'm not understanding the problem clearly.
So am I right if I say that all you need to store is a blob of XML with whatever binary information is contained within? Why can't you have a users table and then a linked (foreign key) table with user objects in it, linked by userId?
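Sketched in T-SQL, the foreign-key linking suggested here might look like the following (table and column names are made up for illustration):

    CREATE TABLE dbo.AppUser
    (
        UserId   INT IDENTITY(1,1) PRIMARY KEY,
        UserName NVARCHAR(100) NOT NULL
    );

    CREATE TABLE dbo.UserObject
    (
        ObjectId   INT IDENTITY(1,1) PRIMARY KEY,
        UserId     INT NOT NULL REFERENCES dbo.AppUser (UserId),
        ObjectType VARCHAR(50)    NOT NULL,   -- 'Picture', 'Email', ...
        ObjectXml  XML            NULL,
        ObjectBlob VARBINARY(MAX) NULL
    );

    -- Fetch a user together with all associated objects:
    SELECT u.UserName, o.ObjectType, o.ObjectXml
    FROM dbo.AppUser AS u
    LEFT JOIN dbo.UserObject AS o ON o.UserId = u.UserId
    WHERE u.UserId = @UserId;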

Saving images: files or blobs?

When you save your images (suppose you have lots of them), do you store them as blobs in your database, or as files? Why?
Duplicate of: Storing Images in DB - Yea or Nay?
I usually go with storing them as files, and store the path in the database. To me, it's a much easier and more natural approach than pushing them into the database as blobs.
One argument for storing them in the database: much easier to do full backups, but that depends on your needs. If you need to be able to easily take full snapshots of your database (including the images), then storing them as blobs in the database is probably the way to go. Otherwise you have to pair your database backup with a file backup, and somehow try to associate the two, so that if you have to do a restore, you know which pair to restore.
It depends on the size of the image.
Microsoft Research has an interesting document on the subject
I've tried to use the DB (SQL Server and MySQL) to store medium-sized (< 5 MB) files, and what I got was tons of trouble.
1) Some DBs (SQL Server Express) have size limits;
2) Some DBs (MySQL) become mortally slow;
3) When you have to display a list of objects, if you inadvertently do SELECT * FROM table, tons of data will try to travel up and down from the DB, resulting in a deadly slow response or a memory failure;
4) Some frontends (Ruby ActiveRecord) have big trouble handling blobs.
Just use files. Don't store them all in the same directory; use some technique to spread them across several directories (for instance, you could use the last two characters of a GUID or the last two digits of an int id), and then store the path in the DB.
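For example, a sketch of the path-in-the-database idea, using the last two characters of a GUID to pick the subdirectory (the table name and /images/ root are hypothetical):

    CREATE TABLE dbo.ImageFile
    (
        ImageId  UNIQUEIDENTIFIER NOT NULL PRIMARY KEY,
        FilePath NVARCHAR(400)    NOT NULL,   -- e.g. /images/3f/{guid}.jpg
        Title    NVARCHAR(200)    NULL
    );

    -- Build the sharded path when inserting:
    DECLARE @id UNIQUEIDENTIFIER = NEWID();
    INSERT INTO dbo.ImageFile (ImageId, FilePath, Title)
    VALUES (@id,
            '/images/' + RIGHT(CONVERT(CHAR(36), @id), 2) + '/'
                       + CONVERT(CHAR(36), @id) + '.jpg',
            N'Example image');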
The performance hit of a database server is a moot issue. If you need the performance benefits of a file system, you simply cache the image there on the first request. Subsequent requests can then be served directly from the file system via a direct link (which, in the case of a web app, you could rewrite into the HTML before flushing the output buffer).
This provides the best of both worlds:
The authoritative store is the database, keeping transactional and referential integrity.
You can deploy all user data by simply deploying the database.
Emptying this cache (e.g. by adding a web server) would only cause a temporary performance hit while it is refilled automatically.
There is no need to constantly hammer the database for things that won't change all the time, but the important thing is that the user data is all there and not scattered around different places, making multi-server operation and deployment a total mess.
I'm always advocating the "database as the user data store, unless" approach, because it is better architecturally, and not necessarily slower with effective caching.
Having said that, a good reason to use the file system as the authoritative store would be when you really need to use external independent tools for accessing it, e.g. SFTP and whatnot.
Given that you might want to save an image along with a name, brief description, created date, created by, etc., you might find it better to save in a database. That way, everything is together. If you saved this same info and stored the image as a file, you would have to retrieve the whole "image object" from two places...and down the road, you might find yourself having syncing issues (some images not being found). Hopefully this makes sense.
By saving, do you mean using them to show in a web page or something like that?
If that's the case, the better option will be to use files; if you use a database, it will be constantly hammered with requests for photos, and that's a situation that doesn't scale too well.
The question is: does your application handle BLOBs or other files like other application data? Do your users upload images alongside other data? If so, then you ought to store the BLOBs in the database. It makes it easier to back up the database and, in the event of a problem, to recover to a transactionally consistent state.
But if you mean images which are part of the application infrastructure rather than user data, then the answer is probably no.
If I'm running on one web server and will only ever be running on one web server, I store them as files. If I'm running across multiple webheads, I put the reference instance of the image in a database BLOB and cache it as a file on the webheads.
Blobs can be heavy on the db/scripts, so why not just store paths? The only reason we've ever used blobs is if the data needs to be merge-replicated, or for super-tight security on assets (as in, you can't pull an image unless you're logged in, or something).
I would suggest going with the file system. First, let's discuss why not a blob. To answer that, we need to think about what advantages a DB provides over a file system:
Mutability: we can modify data once it is stored. Not applicable in the case of images. Images are just a series of 1s and 0s; when we change an image, it isn't a matter of a few 1s and 0s being altered, so modifying the same image content in place doesn't make sense. It's better to delete the old one and store the new one.
Indexing: we can create indexes for faster searching. But this doesn't apply to images, as images are just 1s and 0s and we can't index that.
Then why file systems?
Faster access: if we store images as blobs inside the DB, then a query that fetches the complete record (SELECT *) will perform very poorly, as lots and lots of data goes to and from the DB. Instead, if we just store the URL of each image in the DB and keep the images themselves in a distributed file system (DFS), it will be much faster.
Size limit: if a DB stores a lot of images, it might face performance issues and may also reach its size limit (some DBs do have one).
Using the file system is better, because the basic features you would get by storing images as a blob are:
1. Mutability, which is not needed for an image, as we won't be changing the binary data of images; we will only be removing images as a whole.
2. Indexed searching, which is not needed for an image, as the content of images can't be indexed; indexed searching searches the content of the BLOB.
Using the file system is beneficial here because:
1. It's cheaper.
2. You can use a CDN for fast access.
Hence one way forward could be to store the images as files and keep their paths in the database.

How do you handle small sets of data?

With really small sets of data, the policy where I work is generally to stick them into text files, but in my experience this can be a development headache. Data generally comes from the database and when it doesn't, the process involved in setting it/storing it is generally hidden in the code. With the database you can generally see all the data available to you and the ways with which it relates to other data.
Sometimes, for really small sets of data, I just store them in an internal data structure in the code (like a Perl hash), but then when a change is needed, it's in the hands of a developer.
So how do you handle small sets of infrequently changed data? Do you have set criteria for when to use a database table versus a text file, or...?
I'm tempted to just use a database table for absolutely everything but I'm not sure if there are any implications to this.
Edit: For context:
I've been asked to put a new contact form on the website for a handful of companies, with more to be added occasionally in the future. Except, companies don't have contact email addresses; the users inside these companies do (as they post jobs through their own accounts). Now, though, we want a "speculative application" type of functionality, and the form needs an email address to send these applications to. But we also don't want to put an email address as a property in the form, or else spammers can just use it as an open email gateway. So clearly, we need an ID -> contact_email type relationship with companies.
So, I can either add a column to a table with millions of rows that will be used, literally, about 20 times, OR create a new table that will hold at most about 20 rows. Typically how we've handled this in the past is just to create a nasty text file and read it from there. But this creates maintenance nightmares, and these text files are frequently overlooked when the data they depend on changes. Perhaps this is a fault with the process, but I'm just interested in hearing views on this.
Put it in the database. If it changes infrequently, cache it in your middle tier.
The example that springs to mind immediately is what is appropriate to have stored as an enumeration and what is appropriate to have stored in a "lookup" database table.
I tend to "draw the line" with the rule that if it would result in a column in the database containing a "magic number" that maps to an enumeration value, then the enumeration should really exist as a lookup table. If it's unrelated to the data stored in the database (e.g. application configuration data rather than user-generated data), then it's an enumeration all the way.
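A small sketch of that rule, with hypothetical names: the "magic number" column references a lookup table instead of mapping silently to an enumeration in code:

    CREATE TABLE dbo.OrderStatus
    (
        StatusId   TINYINT     NOT NULL PRIMARY KEY,
        StatusName VARCHAR(50) NOT NULL
    );

    INSERT INTO dbo.OrderStatus (StatusId, StatusName)
    VALUES (1, 'Pending'), (2, 'Shipped'), (3, 'Cancelled');

    -- Data tables then reference the lookup table rather than a bare number:
    -- Orders.StatusId TINYINT NOT NULL REFERENCES dbo.OrderStatus (StatusId)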
Surely it depends on the user of the software tool you've developed to consume the set of data, regardless of size?
It might just be that they know Excel, so your tool would have to parse a .csv file that they create.
If it's written for the developers, then who cares what you use. I'm not a fan of cluttering databases with minor or transient data however.
We have a standard config file format (key:value) and a class to handle it. We just use that on all projects. Mostly we're just setting persistent properties for our applications (mobile phone development) so that's an appropriate thing to do. YMMV
In cases where the program accesses a database, I'll store everything in there: easier for backup and moving data around.
For small programs without database access I store my data in the .NET settings, which are stored in an XML file. Of course this is a feature of C#, so it might not apply to you.
Anyway, I make sure to store all data in one place. Usually a database.
Have you considered SQLite? It's file-based, which addresses your feeling that "just a file might do" (zero configuration), but it's a perfectly good database and scales remarkably well. It supports a number of APIs and there are numerous front ends for administering it.
If this is small, config-like data, I use some simple and common format; INI, JSON and YAML are usually fine. Java and .NET fans also like XML. In short, use something that you can easily read into an in-memory object and then forget about.
I would add it to the database in the main table (a minimal sketch follows below), because:
Backup and recovery (you do want to recover this text file, right?)
Ad hoc querying (since you can do it with a SQL tool and join it to the other database data)
If the database column is empty, the storage requirements for it should be minimal (nothing, if it's a NULL column at the end of the table in Oracle)
It will be easier if you want to have multiple application servers, as you will not need to keep multiple copies of some extra config file around
Putting it into a little child table only complicates the design without giving any real benefit
You may well already be going to that same row in the database as part of your processing anyway, so performance is not likely to be a problem. If you are not, you could cache it in memory.
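A minimal sketch of the column-on-the-main-table approach, with hypothetical names (dbo.Company, SpeculativeContactEmail):

    ALTER TABLE dbo.Company
        ADD SpeculativeContactEmail NVARCHAR(320) NULL;

    -- Populated for only the handful of companies that need it:
    UPDATE dbo.Company
    SET SpeculativeContactEmail = N'jobs@example.com'
    WHERE CompanyId = @CompanyId;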
