I have a job that needs to write lots of small amounts of data individually to a file (like logging). If I implement it as ordinary file writes, will that wear out my disk very quickly?
I later realized the best solution also depends on the system. In my case I could use a ramdisk or something similar, but I still wonder what solution industrial systems normally use, in case I want expandability.
Filebeat was used to manage logging in one of my past projects. As far as I remember, you need to set a few options in its config file: you specify the remote server and the file(s) you want to keep uploading, and Filebeat will keep shipping your logs to that remote server.
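For reference, a minimal config sketch might look like the following (the log path and host are placeholders, and exact option names can vary between Filebeat versions):

    filebeat.inputs:
      - type: log
        paths:
          - /var/log/myapp/*.log          # the file(s) you want to keep uploading

    output.elasticsearch:
      hosts: ["logs.example.com:9200"]    # the remote server that receives the logs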
You can read more about it: https://www.elastic.co/beats/filebeat
I am trying to get input from the user that the program will remember every time it runs. I want to do this in C. Based on my limited knowledge of computers, this will be stored in the storage/hard drive/ssd instead of the RAM/memory. I know I could use a database, or write to a text file, but I don't want to use an external file or a database, since I think that a database is a bit overkill for this and an external file can be messed with by the user. How can I do this (get the user to enter the input once and for the program to remember it forever)? Thanks! (If anyone needs me to clarify my question, I'd be happy to do so and when I get an answer, I will clarify this question for future users.)
People have done this all kinds of ways. In order from best to worst (in my opinion):
Use a registry or equivalent. That's what it's there for.
Use an environment variable. IMO this is error prone and may not really be what you want.
Store a file on your user's computer. Easy to do and simple (see the sketch just after this list).
Store a file on a server and read/write to the server via the network. Annoying that you have to use the network, but OK.
Modify your own binary on disk. This is fun as a learning experience, but generally inadvisable in production code. Still, it can be done sometimes, especially via an installer.
Spawn a background process that "never" dies. This is strictly worse than using a file.
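To illustrate the "store a file on the user's computer" option, here is a minimal C sketch; the file name settings.dat and the idea of remembering a user name are made up for the example:

    #include <stdio.h>
    #include <string.h>

    /* Ask once, remember forever: keep the value in a small file next to the program. */
    int main(void)
    {
        char name[128] = "";
        FILE *f = fopen("settings.dat", "r");

        if (f && fgets(name, sizeof name, f)) {
            name[strcspn(name, "\n")] = '\0';        /* strip trailing newline */
            printf("Welcome back, %s\n", name);
            fclose(f);
        } else {
            if (f) fclose(f);
            printf("Enter your name: ");
            if (fgets(name, sizeof name, stdin)) {
                name[strcspn(name, "\n")] = '\0';
                f = fopen("settings.dat", "w");      /* persists across runs */
                if (f) {
                    fputs(name, f);
                    fclose(f);
                }
            }
        }
        return 0;
    }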
You won't be able to prevent the user from modifying a file if they really want to. What you could do is create a file with a name or extension that makes it obvious that it should not be modified, or make it hidden to the user.
There isn't really any common way that you could write to a file and at the same time prevent the user from accessing it. You would need OS/platform level support to have some kind of protected storage.
The only real alternative commonly available is to store the information online on a server that you control and fetch it from there over the network. You could cache a cryptographically signed local copy with an expiration date to avoid having to connect every time the program is run. Of course if you are doing this as some kind of DRM or similar measure (e.g., time-limited demo), you will also need to protect your software from modification.
(Note that modifying the program itself, which you mentioned in a comment, is not really any different from modifying other files. In particular, the user can restore an earlier version of the program from a backup or by re-downloading it, which is something even a casual user might try. Also, any signature on the software would be invalidated by your modifications, and antivirus software may be triggered.)
If you simply wish to hide your file somewhere to protect against casual users (only), give it an obscure name and make it hidden both by filesystem attributes and by naming (on *nix systems, a . as the first character of the file name). (How well you can hide it may be limited by permissions and/or sandboxing, depending on the OS.)
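A rough C sketch of that kind of hiding (the file name is invented for the example; the Windows branch assumes the Win32 API is available):

    #include <stdio.h>
    #ifdef _WIN32
    #include <windows.h>
    #endif

    int main(void)
    {
        /* Obscure, dot-prefixed name: hidden from a plain ls on *nix. */
        const char *path = ".appdata_cache";

        FILE *f = fopen(path, "w");
        if (!f) return 1;
        fputs("remembered value\n", f);
        fclose(f);

    #ifdef _WIN32
        /* On Windows, additionally set the hidden attribute so Explorer
           does not show the file by default. */
        SetFileAttributesA(path, FILE_ATTRIBUTE_HIDDEN);
    #endif
        return 0;
    }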
Also note that if your goal is to hide the information from the user, you should encrypt it somehow. This includes any pieces of the data that are part of the program that writes it.
edit: In case I guessed incorrectly and the reason for wanting to do this is simply to keep things "clean", then just store it in the platform's usual "user settings" area. For example, AppData or registry on Windows, user defaults or ~/Library/Application Support on macOS, a "dotfile" on generic *nix systems, etc. Do not modify the application itself for this reason.
If you want to persist data, the typical way to do that is to store it in a file. Just use FILE* and go about your business.
Using a database for this may be overkill; it depends on how you want to access the data later, once it is stored.
If you just load the data from the file and search through it, there is no need for a database; if you have loads of data and want to run complex searches, then a database is the way to go. If you need redundancy, user handling, or security, then choose a database, since its developers have already spent a lot of time getting those right.
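A rough sketch of the "load it and search through it" approach in C (the file name and the record-per-line format are assumptions made for the example):

    #include <stdio.h>
    #include <string.h>

    /* Scan a plain text file line by line and print the lines containing a keyword.
       Perfectly adequate for modest amounts of data; a database only starts to pay
       off when the data or the queries get more complex. */
    int main(void)
    {
        FILE *f = fopen("data.txt", "r");
        if (!f) return 1;

        char line[256];
        while (fgets(line, sizeof line, f)) {
            if (strstr(line, "needle"))      /* simple linear search */
                fputs(line, stdout);
        }
        fclose(f);
        return 0;
    }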
I was thinking - let's take a look at a computer game of any kind, or any program in general.
(Chrome, Skype, Warcraft,...)
They need to save some things that a user wanted them to save.
How do they do it?
Do they save it in a simple text file, or do they bundle a database system (like MySQL, ...) with the application?
That really depends on your needs. If you only need to store some key-value pairs, an application can use a simple text file (e.g. an *.ini file). That, however, is a plain text file readable by everybody.
An application can of course also use a database like MySQL or MS SQL Server. However, these are not very handy if you want to distribute your application, as they run as a separate service on a server and need to be installed separately. Then there are databases like SQLite, which is also an SQL database but stores everything inside a single file. Your application just needs a way to interact with this file.
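For example, from C the interaction with that single file could look roughly like this (the database file name and table are made up; error handling is trimmed):

    #include <stdio.h>
    #include <sqlite3.h>   /* link with -lsqlite3 */

    int main(void)
    {
        sqlite3 *db;
        char *err = NULL;

        /* Opens (or creates) a single database file next to the application. */
        if (sqlite3_open("appdata.db", &db) != SQLITE_OK) return 1;

        sqlite3_exec(db,
            "CREATE TABLE IF NOT EXISTS settings(key TEXT PRIMARY KEY, value TEXT);"
            "INSERT OR REPLACE INTO settings VALUES('volume', '80');",
            NULL, NULL, &err);
        if (err) { fprintf(stderr, "%s\n", err); sqlite3_free(err); }

        sqlite3_close(db);
        return 0;
    }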
Yet another way would be to serialize/deserialize an object that holds the data you want to store.
There are other ways to store data, like NoSQL databases. I personally haven't used one of those yet, but here is a listing of some of them: http://nosql-database.org/
XML could also be used.
There are endless ways an application can store its data.
There is literally no end to the ways programs will store data. That said, some common approaches:
home-made archive formats: every game company seems to have a few of their own (Blizzard MoPaQ, ...)
XML files: usually used for simple configuration (Apple's plist files, Windows application configurations, Skype's user preferences, ...)
SQLite databases: usually used for larger amounts of personal data (Firefox: bookmarks, history, etc.; iOS personal information databases, etc.)
"In the cloud" in someone else's database (basically all web apps)
Plain text or simple text formats (Windows .ini/.inf, Java MANIFEST.MF, YAML, etc.)
...
A single program might use multiple methods depending on what it's storing. There is no unified solution, and no single solution is right for every task, since every approach has tradeoffs (human readability vs. packing efficiency, random access vs. sequential archive, etc.).
A lot of programs use SQLite to store data (http://www.sqlite.org). SQLite is a very compact, cross-platform SQL database. Many programs also use plain text files.
So I have this requirement that says the app must let users upload and download about 6000 files per month (mostly pdf, doc, xls).
I was thinking about the optimal solution for this. The question is whether to use BLOBs in my database or a simple file hierarchy for writing/reading this bunch of files.
The app architecture is based on Java 1.6, Spring 3.1 and DOJO, Informix 10.X.
So I'm here just to be advised based on your experience.
When asking what's the "best" solution, it's a good idea to include your evaluation criteria - speed, cost, simplicity, maintenance etc.
The answer Mikko Maunu gave is pretty much on the money. I haven't used Informix in 20 years, but most databases are a little slow when dealing with BLOBs - especially the step of getting the BLOB into and out of the database can be slow.
That problem tends to get worse as more users access the system simultaneously, especially if they use a web application - the application server has to work quite hard to get the files in and out of the database, probably consumes far more memory for those requests than normal, and probably takes longer to complete the file-related requests than for "normal" pages.
This can lead to the webserver slowing down under only moderate load. If you choose to store the documents in your database, I'd strongly recommend running some performance tests to see if you have a problem - this kind of solution tends to expose flaws in your setup that wouldn't otherwise come to light (slow network connection to your database server, insufficient RAM in your web servers, etc.)
To avoid this, I've stored the "master" copies of the documents in the database, so they all get backed up together, and I can ask the database questions like "do I have all the documents for user x?". However, I've used a cache on the webserver to avoid reading documents from the database more than I needed to. This works well if you have a "write once, read many times" solution like a content management system, where the cache can earn its keep.
If you have other data in the database related to these files, storing the files in the file system makes things more complex:
Backups have to be done separately.
Transactions have to be implemented separately (as far as that is even possible for file system operations).
Integrity checks between the database and the file system structure do not come out of the box.
No cascades: for example, removing a user's pictures as a consequence of removing the user.
You first have to query the database for the file's path and then fetch the file from the file system.
What is good about a file-system-based solution is that it is sometimes handy to be able to access the files directly, for example to copy some of the images somewhere else. Also, storing binary data can of course dramatically increase the size of the database. In any case, more disk storage is needed somewhere with either solution.
Of course, all of this can require more DB resources than are currently available. In general there can be a significant performance hit, especially if the decision is between a local file system and a remote DB. In your case (6000 files monthly) raw performance will not be a problem, but latency can be.
I know there are several questions similar, but I can't find one that answers my specific problem:
I need to save some data in a server for a game I'm developing.
Each user will save one binary file and some time after ask for it.
The file can be anything from just a bunch of bytes up to around 50 KB.
A lot of questions (mostly about images) say to use the filesystem, because the file can then be served as static content. In this case that's not possible, since I will have to check somehow that I'm sending the file to the right user, and I also need some logic to return the file only if it's not the same one the user already has.
Should I save that file in the database or in the filesystem?
Note: The server will be hosted on Linux, and the DB will probably by MySQL.
Thanks!
I'm afraid you're far from providing enough information to answer your question correctly. If I read your question "naively", all you're trying to do is write a save game system.
In such a case, the file system is really all you need. DBs are good for storing structured data that you're going to search, sum, combine, and index, not for storing arbitrary bunches of small blobs.
Now, if there are other requirements, for instance you're writing a web-based game that stores the data for all players in a central location, the answer MIGHT be different (again, you'd need to provide many more details about what you're doing, though).
I suggest reading this whitepaper (To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem) by Microsoft research (it deals with SQL Server 2005).
The conclusion is: if the files are under roughly 256 KB, use the DB to store them.
"In essence, filesystems seem to have better fragmentation handling than databases and this drives the break-even point down from about 1MB to about 256KB."
Of course, you should test for your own DB and OS.
I think you could use a sort of database. The filesystem is slow, since it has to go to the hard disk and move the head to find the file. With a DB, access is faster. If your server is hosted on Windows, you can use a Microsoft database, Access, which is small and fast.
I need to give data to a data-processing Windows service (one-way, loosely coupled). I want to ensure that the service being down etc. doesn't result in "lost" data, that restarting the Windows service simply causes it to pick up work where it left off, and I need the system to be really easy to troubleshoot, which is why I'm not using MSMQ.
So I came up with one of two solutions - either:
I drop text files with the processing data into a drop directory, and the Windows service waits for file-change notifications, then processes and deletes each file,
or
I insert data in a special table in the local MS SQL database, and the windows service polls the database for changes/new items and then erases them as they are processed
The MSSQL database is local on the system, not over the network, but later on I may want to move it to a different server.
Which, from a performance (or other standpoint) is the better solution here?
From a performance perspective, it's likely the filesystem will be fastest - perhaps by a large margin.
However, there are other factors to consider.
It doesn't matter how fast it is, generally, only whether it's sufficiently fast. Storing and retrieving small blobs is a simple task and quite possibly this will never be your bottleneck.
NTFS is journalled - but only the metadata. If the server should crash mid-write, a file may contain gibberish. If you use a filesystem backend, you'll need to be robust against arbitrary data in the files. Depending on the caching layer and the way the file system reuses old space, that gibberish could contain segments of other messages, so you'd best be robust even against an old message being repeated.
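As an illustration of that robustness point (this is my own sketch, not something from the question; the length-plus-checksum record format is invented for the example), the consumer can refuse to process a drop file unless it is structurally complete:

    #include <stdio.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* Returns the payload (caller frees it) or NULL if the file is truncated or
       corrupt. Assumed format: 4-byte little-endian payload length, the payload
       bytes, then a 4-byte little-endian sum of the payload bytes as a cheap
       integrity check. */
    unsigned char *read_message(const char *path, uint32_t *out_len)
    {
        FILE *f = fopen(path, "rb");
        if (!f) return NULL;

        unsigned char hdr[4], tail[4];
        if (fread(hdr, 1, 4, f) != 4) { fclose(f); return NULL; }
        uint32_t len = hdr[0] | hdr[1] << 8 | (uint32_t)hdr[2] << 16 | (uint32_t)hdr[3] << 24;

        unsigned char *buf = malloc(len ? len : 1);
        if (!buf || fread(buf, 1, len, f) != len || fread(tail, 1, 4, f) != 4) {
            free(buf); fclose(f); return NULL;       /* truncated: skip or quarantine it */
        }
        fclose(f);

        uint32_t want = tail[0] | tail[1] << 8 | (uint32_t)tail[2] << 16 | (uint32_t)tail[3] << 24;
        uint32_t sum = 0;
        for (uint32_t i = 0; i < len; i++) sum += buf[i];
        if (sum != want) { free(buf); return NULL; } /* gibberish: don't process it */

        *out_len = len;
        return buf;
    }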
If you ever want to add new features involving a richer message model, a database is more easily extended (say, some sort of caching layer).
The filesystem is more "open" - meaning it may be easier to debug with really simple tools (notepad), but also that you may encounter more tricky issues with local indexing services, virus scanners, poorly set permissions, or whatever else happens to live on the system.
Most APIs can't deal with file paths of more than 260 characters, and perform poorly when faced with huge numbers of files. If your storage directory ever becomes too large, things like .GetFiles() will become slow - whereas a DB can be indexed on the timestamp, and the newest messages retrieved irrespective of old clutter. You can work around this, but it's an extra hurdle.
MS SQL isn't free and/or isn't installed on every system. There's a bit of extra system administration necessary for each new server and more patches when you use it. Particularly if your software should be trivially installable by third parties, the filesystem has an advantage.
I don't know what you're building, but don't prematurely optimize. Both solutions are quite similar in terms of performance, and it's likely not to matter - so pick whatever is easiest for you. If performance ever really becomes an issue, direct communication (whether via IPC or IP or whatnot) is going to be several orders of magnitude more performant, so don't waste time micro-optimizing.
My experience with SQL Server 2005 and lower is that storing the files in the database is much slower, especially with larger files; that really messes up SQL Server's memory when doing table scans.
However, the newer SQL Server 2008 has better support for files in the engine (e.g., FILESTREAM).