How to permanently store information using a C program? - c

I am trying to get input from the user that the program will remember every time it runs. I want to do this in C. Based on my limited knowledge of computers, this will be stored on the storage drive (hard drive/SSD) instead of in RAM. I know I could use a database or write to a text file, but I don't want to use an external file or a database: a database seems like overkill for this, and an external file can be messed with by the user. How can I get the user to enter the input once and have the program remember it forever? Thanks! (If anyone needs me to clarify my question, I'd be happy to do so, and when I get an answer, I will clarify this question for future users.)

People have done this all kinds of ways. In order from best to worst (in my opinion):
Use the registry or equivalent. That's what it's there for (a sketch follows this list).
Use an environment variable. IMO this is error prone and may not really be what you want.
Store a file on your user's computer. Easy to do and simple.
Store a file on a server and read/write to the server via the network. Annoying that you have to use the network, but OK.
Modify your own binary on disk. This is fun as a learning experience, but generally inadvisable in production code. Still, it can sometimes be done, especially via an installer.
Spawn a background process that "never" dies. This is strictly worse than using a file.
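For the registry option on Windows, a minimal sketch might look like the following. The key path "Software\MyApp" and value name "UserName" are hypothetical, and error handling is abbreviated:

#include <windows.h>
#include <string.h>

/* Hedged sketch: persist one string per user under HKEY_CURRENT_USER.
   "Software\\MyApp" and "UserName" are made-up names for illustration. */
int save_setting(const char *value)
{
    HKEY key;
    if (RegCreateKeyExA(HKEY_CURRENT_USER, "Software\\MyApp", 0, NULL,
                        0, KEY_SET_VALUE, NULL, &key, NULL) != ERROR_SUCCESS)
        return -1;
    LONG rc = RegSetValueExA(key, "UserName", 0, REG_SZ,
                             (const BYTE *)value, (DWORD)(strlen(value) + 1));
    RegCloseKey(key);
    return rc == ERROR_SUCCESS ? 0 : -1;
}

int load_setting(char *buf, DWORD size)
{
    HKEY key;
    if (RegOpenKeyExA(HKEY_CURRENT_USER, "Software\\MyApp", 0,
                      KEY_QUERY_VALUE, &key) != ERROR_SUCCESS)
        return -1;
    LONG rc = RegQueryValueExA(key, "UserName", NULL, NULL,
                               (BYTE *)buf, &size);
    RegCloseKey(key);
    return rc == ERROR_SUCCESS ? 0 : -1;
}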

You won't be able to prevent the user from modifying a file if they really want to. What you could do is create a file with a name or extension that makes it obvious it should not be modified, or hide it from the user.

There isn't really any common way that you could write to a file and at the same time prevent the user from accessing it. You would need OS/platform level support to have some kind of protected storage.
The only real alternative commonly available is to store the information online on a server that you control and fetch it from there over the network. You could cache a cryptographically signed local copy with an expiration date to avoid having to connect every time the program is run. Of course if you are doing this as some kind of DRM or similar measure (e.g., time-limited demo), you will also need to protect your software from modification.
(Note that modifying the program itself, which you mentioned in a comment, is not really any different from modifying other files. In particular, the user can restore an earlier version of the program from backup or by re-downloading it, which is something even a casual user might try. Also, any signature on the software would become invalid by your modifications, and antivirus software may be triggered.)
If you simply wish to hide your file somewhere to protect against casual users (only), give it an obscure name and make it hidden using both filesystem attributes and naming (on *nix systems, with a . as the first character of the file name). How well you can hide it may be limited by permissions and/or sandboxing, depending on the OS.
Also note that if your goal is to hide the information from the user, you should encrypt it somehow. This includes any pieces of the data that are part of the program that writes it.
edit: In case I guessed incorrectly and the reason for wanting to do this is simply to keep things "clean", then just store it in the platform's usual "user settings" area. For example, AppData or registry on Windows, user defaults or ~/Library/Application Support on macOS, a "dotfile" on generic *nix systems, etc. Do not modify the application itself for this reason.
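On generic *nix, composing the dotfile path could look like this minimal sketch; the name ".myapprc" is invented for illustration:

#include <stdio.h>
#include <stdlib.h>

/* Hedged sketch: build a per-user settings path such as ~/.myapprc. */
int settings_path(char *buf, size_t size)
{
    const char *home = getenv("HOME");  /* per-user home directory */
    if (!home)
        return -1;
    int n = snprintf(buf, size, "%s/.myapprc", home);
    return (n > 0 && (size_t)n < size) ? 0 : -1;  /* fail on truncation */
}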

If you want to persist data, the typical way to do that is to store it in a file. Just use FILE* and go about your business.
Using a database for this may be overkill; it depends on how you want to access the data later, once it is stored.
If you just load the data from the file and search through it, then there is no need for a database. If you have loads of data and want to make complex searches, then a database is the way to go. If you need redundancy, user handling, or security, then choose a database, since the developers of each one have already spent a lot of time solving those problems.
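As a concrete illustration of the read-if-present, otherwise ask-once-and-save pattern the original question describes (the file name "settings.dat" is an arbitrary choice):

#include <stdio.h>
#include <string.h>

int main(void)
{
    char name[64] = "";
    FILE *f = fopen("settings.dat", "r");
    if (f) {                                  /* remembered from a previous run */
        if (fgets(name, sizeof name, f))
            name[strcspn(name, "\n")] = '\0'; /* strip trailing newline */
        fclose(f);
        printf("Welcome back, %s\n", name);
    } else {                                  /* first run: ask once and persist */
        printf("Enter your name: ");
        if (fgets(name, sizeof name, stdin))
            name[strcspn(name, "\n")] = '\0';
        f = fopen("settings.dat", "w");
        if (!f) { perror("settings.dat"); return 1; }
        fprintf(f, "%s\n", name);
        fclose(f);
    }
    return 0;
}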

Related

What's the point of storing the user agent?

So far, while logging user logins, I have always stored the complete user agent in addition to the already-parsed information (like browser, version, OS, etc.). The user agent is usually just a TEXT field in the table.
While implementing another similar thing, I asked myself: what's even the point of doing that? Obviously, the user agent can be manipulated easily in any case, and the only relevant information (browser, version, and operating system) is already parsed and stored separately anyway.
Is there some actual benefit to still storing it, other than backtracking data that could be faked anyway? What other relevant information does the user agent contain to justify the (over the years, quite large) amount of storage it uses?
And of course I realize that the user agent contains a lot more than just the browser specifications - but how many times did you really have to go back and analyze the user agent itself?
Just to clarify: I'm asking for reasons to store the raw user-agent string after parsing the "relevant" information out of it (browser, OS, etc.) - what use is the raw user agent after that point?
The user agent string contains information about the environment including operating system and browser. It is something I frequently check. There are two main reasons to store it.
First, if you are following up on a bug report or error, this information is useful or even essential for determining what went wrong - imagine trying to find an error that occurs only on IE8 without the user agent! This information can also help you prioritize a bug fix: you will want to fix an issue that is present in 93% of environments before the one that is present in 7%.
Secondly, it provides very useful stats on the profile of your users. You might only want to support environments used by more than a certain percentage of your user base. For example, if you are designing a new version of your software and, on examining your user-agent logs, you find no one using IE, you might not bother to optimize or design for IE.
You seem to be concerned that the user agent string can be faked. While this is possible, unless there is some specific reason someone might do this in your app, it seems rather paranoid to worry about it. You make a good point, though, to remember what information is possible to fake.
UPDATE: I see your point; in fact, in the logging I recently implemented, I removed the parsed string because of the data overhead. There is little point in storing both the raw string and the parsed string. The only real reason to do that would be to make querying the logs slightly easier, which is not a good enough reason for me. Personally, I store the whole raw user agent, which means no loss of data, future-proofing against new browsers/OSes/formats of the UA string, and no possibility of making mistakes when parsing.
From Wikipedia:
For this reason, most Web browsers use a User-Agent value as follows:
Mozilla/[version] ([system and browser information]) [platform] ([platform details]) [extensions]
If you have stored all the fields you need out of that, then by all means discard the rest. The amount of data to log, how long to keep logs, and in what form to keep them is a fairly personal thing that will differ from company to company and project to project.
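For illustration, the leading pieces of that format can be pulled out of a raw string with something like the following. Real user-agent strings vary wildly, so this is a toy, not a robust parser:

#include <stdio.h>

int main(void)
{
    const char *ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...";
    char product[32], system[128];
    /* grab the product token, then the parenthesized system segment */
    if (sscanf(ua, "%31[^ ] (%127[^)])", product, system) == 2)
        printf("product=%s system=%s\n", product, system);
    return 0;
}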

Why is the save button needed?

Software like OneNote has shown that auto-save can be implemented, and it works just as well as (or better than) the manual save button / Ctrl+S.
Anyway, everything you work on you want saved; it's only when you're trying out something destructive that you would close without saving.
So from a programmer's/usability perspective, why is the manual "save" feature still seen in virtually all software today? Is it because everyone is too lazy to implement "auto-save" whenever data gets modified?
And is it a good idea for us to implement auto-save, at least to start some traction in our specific industry and amongst our competitors?
Autosave normally saves at a defined interval. What happens if you want to save between intervals?
You should implement manual save to stay consistent with other applications in the environment as well.
People expect file -> save, or CTRL + S to exist.
The save button is a well-known, comfortable UI feature that everyone from Jon Skeet to grandma is familiar with. If you got rid of it, it would be like removing the close button on a window for some people. Granted, they would eventually get used to it, but some people would not understand that their data has been saved automatically.
Also, if you're autosaving on the web, not only are you taking up a lot of space on your server with all those instances, you're also using up a lot of bandwidth with periodic saves. With manual save, you only use space and bandwidth when the user intends to, which can be far less frequent. The advantage of autosaving, of course, is the retention of work should something go awry.
Check the definition of "skeuomorph" :)
Additionally, alongside "Save" there is commonly "Save As..." as well. Both give the user a feeling of control and security. Knowing that they clicked Save lets them know what state they can expect their data to be in when reloading it.
It really comes down to this: a Save button is cheaper to implement and maintain than Undo.
It is not hard to implement auto-save: just implement a normal save and call it whenever needed, or from a timer (if you are lazy).
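A lazy timer-driven autosave on *nix could be as simple as re-arming an alarm; here save_document() is a hypothetical stand-in for the application's normal save routine:

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t save_pending = 0;

static void on_alarm(int sig)
{
    (void)sig;
    save_pending = 1;   /* defer the real work out of the signal handler */
    alarm(30);          /* re-arm: fire again in 30 seconds */
}

static void save_document(void)
{
    puts("autosaving...");  /* the normal save path would be called here */
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigaction(SIGALRM, &sa, NULL);
    alarm(30);                  /* first autosave in 30 seconds */
    for (;;) {                  /* stands in for the app's event loop */
        pause();                /* sleep until a signal arrives */
        if (save_pending) {
            save_pending = 0;
            save_document();
        }
    }
}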
Save buttons are common because of a pattern users have learned over decades:
1. Load data or files from persistent storage into main memory.
2. Modify the data in main memory.
3. Save the modified data back to persistent storage.
This pattern comes from the old distinction between hard drive and main memory. If you think about it another way (as some experimental operating systems do), there is no need for loading and saving files: just think of the hard drive as your main memory, and of main memory as another cache level for the hard drive. In consequence, all files (except those on removable media) are always in memory, and you never again need to load or save a file.
But making this change is not easy, because users have been accustomed to the old pattern for years. Furthermore, the old load-and-save pattern is a very easy way to get a kind of primitive undo system.
Auto-saving requires an undo system too, and it is not trivial to build one. Especially if you are doing image, audio, or video editing and producing a lot of data, it is hard to find a good time/memory trade-off. There is also the risk that users will try to undo things by closing the application, and then discover that this did not work. So it might even be a good idea to persist undo information, both to protect users from this error and to avoid saving unwanted changes in the case of a crash.
So, yes, I would really like to see the save (and load) buttons disappear. I would like persisted undo information, or even complete edit histories, too. But I don't think this change can happen within a few years - if ever.
I work in the medical field, and there are situations where you want the user to take responsibility for saving something. If you have an EHR and you are entering a prescription for a patient, then you don't necessarily want it autosaving - you want the user to be aware of and take responsibility for their actions. Also, autosaving a value in a critical system like this could be disastrous for obvious reasons...
Should be tagged subjective maybe?
As a developer, I am always a little uneasy around apps like that. I like having control over when my data is saved, though perhaps this is just years of conditioning at work. I get that little "uh oh" feeling whenever I close a window into which I've entered data without explicitly pressing a save button (or shortcut).
That said, I have been "trained" to accept it in certain situations. OneNote, for example, or Tomboy. A great many OS X apps follow this pattern, especially utility apps like DB server GUI tools.
So, in short, different tools for different situations. IMO, most software these days would not benefit from a move from a manual save to an auto-save.
I think the answer to this is that 'it depends'!
You should consider not only your user's expectations in terms of consistency with other applications, but also the way in which the user is going to use your application.
A very common use-case for OneNote is that someone opens it up to dump in some information almost as an aside to what they're working on. They need to get in and out quickly. Any prompts about saving would be a nuisance.
Applications like Word, on the other hand, expect users to be spending a concerted amount of time working on a document. In this case, the chore of manually saving and responding to confirmation boxes etc will be seen as a relatively small task.
From a programmer's perspective, implementing autosave would not be a huge deal. You just set up a timer, and the callback would do the saving.
However, from a usability point of view, autosave is very problematic. First of all, users are used to having manual saving, and not offering it would confuse a majority of them and take the feeling of control away.
An even bigger issue is that autosave overwrites the contents of the underlying file whether you wanted it to or not. Of course, you could have the autosave feature write to a temporary file, but the decision to overwrite the original document must always come from the user, not from the software. And since you would need the user to initiate at least one manual save anyway, why not make manual saving always available?
An autosave feature is great when you are dealing with a document. What about a business application? If I edit a customer's account, should it update the account as I tab out of the edited fields? If so, what should it do when the account is in an invalid state? When do you enforce business rules and how do you enforce them? How will it perform when you have to take business rules into account on every edit?
You can certainly build an application that will take any of these considerations into account, but will it have been worth the extra effort?
So should we get rid of the Save button? It depends.
Short answer: "auto save" = "auto destroy" / "auto <expletive>".
For one project in university, my group and I built an application without explicit saving as an experiment.
We implemented an infinite undo stack and serialized the undo stack with the actual data so that even if you closed the app and re-opened it, you could always undo your last operation. Every operation wrote a new entry to the action list on disk so that the file was always consistent (well, mostly...), even if the power failed. It was a bit of a cross between a version control system and a journaling file system.
There were two problems: one, we didn't have time to get it completely right (ah, youthful hubris); two, everyone (fellow students and, most importantly, the TAs) hated it, because of all of the reasons mentioned already.
Sometimes, for all your best intentions, you just can't ignore ingrained behaviours.

Database design for physics hardware

I have to develop a database for a unique environment. I don't have experience with database design and could use everybody's wisdom.
My group is designing a database for a piece of physics hardware and a data acquisition system. We need a system that will store all the hardware configuration parameters and track the changes to these parameters as they are changed by the user.
The setup:
We have nearly 200 detectors and roughly 40 parameters associated with each detector. Of these 40 parameters, we expect only a few to change during the course of the experiment. Most parameters associated with a single detector are static.
We collect data for this experiment in timed runs. During these runs, the parameters loaded into the hardware must not change, although we should be able to edit the database at any time to prepare for the next run. The current plan:
The database will provide the difference between the current parameters and the parameters used during last run.
At the start of a new run, the most recent database changes must be loaded into the hardware.
The settings used for the upcoming run must be tagged with a run number and the current date and time. This is essential. I need a run-by-run history of the experimental setup.
There will be several different clients that both read and write to the database. Although changes to the database will be infrequent, I cannot guarantee that the changes won't happen concurrently.
Must be robust and non-corruptible. The configuration of the experimental system depends on the hardware. Any breakdown of the database would prevent data acquisition, and our time is expensive. Database backups?
My current plan is to implement the above requirements using a sqlite database, although I am unsure if it can support all my requirements. Is there any other technology I should look into? Has anybody done something similar? I am willing to learn any technology, as long as it's mature.
Tips and advice are welcome.
Thank you,
Sean
Update 1:
Database access:
There are three lightweight applications that can read and write to the database, and one application that can only read.
The applications with write access are responsible for setting a non-overlapping subset of the hardware parameters. To be specific, we have one application (of which there may be multiple copies) which sets the high voltage, one application which sets the remainder of the hardware parameters which may change during the experiment, and one GUI which sets the remainder of the parameters which are nearly static and are only essential for the proper reconstruction of the data.
The program with read-only access is our data analysis software. It needs access to nearly all of the parameters in the database to properly format the incoming data into something we can analyze. The number of connections to the database should be >10.
Backups:
Another setup at our lab dumps an XML file every run. Even though I don't think XML is appropriate here, I was planning to back up the system every run, just in case.
Some basic things about the design: make sure you never delete data from any table. Keep track of the most recent data (probably best with a last-updated datetime), but when a value changes, don't delete the old data. When a run is initiated, tag every row used with the run ID (in another column); this way you maintain a full historical record of every setting and can pin down exactly what state was used for a given run.
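Since the asker is considering sqlite, here is a minimal sketch of that append-only scheme using the SQLite C API; the table and column names are hypothetical:

#include <stdio.h>
#include <sqlite3.h>

int main(void)
{
    sqlite3 *db;
    char *err = NULL;
    if (sqlite3_open("experiment.db", &db) != SQLITE_OK)
        return 1;
    const char *ddl =
        "CREATE TABLE IF NOT EXISTS parameter_history ("
        "  detector_id INTEGER NOT NULL,"
        "  name        TEXT    NOT NULL,"
        "  value       REAL    NOT NULL,"
        "  run_id      INTEGER,"            /* tagged when a run starts */
        "  changed_at  TEXT DEFAULT CURRENT_TIMESTAMP);";
    if (sqlite3_exec(db, ddl, NULL, NULL, &err) != SQLITE_OK) {
        fprintf(stderr, "sqlite error: %s\n", err);
        sqlite3_free(err);
    }
    /* New values are only ever INSERTed; the latest value per
       (detector_id, name) pair is selected with MAX(changed_at). */
    sqlite3_close(db);
    return 0;
}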
Ask around of your colleagues.
You don't say what kind of physics you're doing, or how big the working group is, but in my discipline (particle physics) there is a deep reservoir of experience putting up and running just this type of system (we call it "slow controls" and the like). There is a pretty good chance that someone you work with has either done this or knows someone who has. There may be a detailed description of the last time out in someone's thesis.
I don't personally have much to do with this, but I do know one common feature: a no-delete, no-overwrite design. You can only add data, never remove it. This preserves your chances of figuring out what really happened in case of trouble.
Perhaps I should explain a little more. While this is an important task and has to be done right, it is not really related to physics, so you can't look it up on SPIRES or arXiv.org. No one writes papers on the design and implementation of medium-sized slow-controls databases. But they do sometimes put it in their dissertations. The easiest way to find a pointer really is to ask a bunch of people around the lab.
This is not a particularly large database, by the sound of things, so you might be able to get away with Oracle's free database, which will give you all kinds of flexibility with journaling (not sure if that is an actual word) and administration.
Your mention of "non-corruptible" right after you say "There will be several different clients that both read and write to the database" raises a red flag for me. Are you planning to create some sort of application that has an interface for this? Or were you planning on direct access to the DB via a tool like TOAD?
In order to preserve your data integrity, you will need to get really strict with your permissions. I would allow only one person (plus a backup) to have admin rights with the ability to manipulate data outside the GUI (which will make your life easier).
Backups? Yes, absolutely! Not only should you do daily, weekly, and monthly backups, you should do both full and incremental ones. Also, test your backup images often to confirm that they actually work.
As for the data structure I would need much greater detail in what you are trying to store and how you would access it. But from what you have put here I would say you need the following tables (to begin with):
Detectors
Parameters
Detector_Parameters
Some additional notes:
Since you will be making so many changes, I recommend using a version control system like SVN to keep track of all your DDL scripts etc. I would also recommend using something like Bugzilla for bug tracking (if needed) and Google Docs for team document management.
Hope that helps.

Non-file FileSystems?

I've been thinking on this for a while now (you know, that dangerous thing programmers tend to do) and I've been wondering, is the method of storing data that we're so accustomed to really all that efficient? The trouble with answering this question is that I really don't have anything to compare it to, since it's the only thing I've ever used.
I don't mean FAT or NTFS or a particular type of file system - I mean the filesystem structure as a whole. We are simply used to thinking of "files" inside "folders", as if our hard drive were one giant filing cabinet. This is a great analogy, and indeed it makes things a lot easier to learn when we think of it this way, but is it really the best way to go about describing programs and their respective parts?
I'd like to know if anyone can think of (or knows about) a data storage technique that might be used to store data for an Operating System to use that would organize the parts of data in a different manner. Does anything... different even exist?
Emails are often stored in folders. But ever since I have migrated to Gmail, I have become accustomed to classifying my emails with tags.
I often wondered if we could manage a whole file-system that way: instead of storing files in folders, you could tag files with the tags you like. A file identifier would not look like this:
/home/john/personal/contacts.txt
but more like this:
contacts[john,personal]
Well... just food for thought (maybe this already exists!)
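As a toy sketch of what tag-based lookup could mean in code, a file is selected by the set of tags attached to it rather than by a path; all names here are invented for illustration:

#include <stdio.h>
#include <string.h>

struct tagged_file {
    const char *name;
    const char *tags[4];   /* NULL-terminated tag list */
};

static int has_tag(const struct tagged_file *f, const char *tag)
{
    for (int i = 0; f->tags[i]; i++)
        if (strcmp(f->tags[i], tag) == 0)
            return 1;
    return 0;
}

int main(void)
{
    struct tagged_file files[] = {
        { "contacts.txt", { "john", "personal", NULL } },
        { "report.txt",   { "john", "work", NULL } },
    };
    /* contacts[john,personal] -> require both tags to match */
    for (size_t i = 0; i < sizeof files / sizeof files[0]; i++)
        if (has_tag(&files[i], "john") && has_tag(&files[i], "personal"))
            printf("match: %s\n", files[i].name);
    return 0;
}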
You can, for example, have dedicated solutions like Oracle raw partitions. Other databases support similar things. In these cases the filesystem adds unnecessary overhead and can be omitted - the DB software takes care of organising the structure itself.
The problem seems very application dependent and files/folders seem to be a reasonable compromise for many applications (and is easy for human beings to comprehend).
Mainframes used to just give programmers a number of 'devices' to use. A device corresponded to a drive or a partition thereof, and the programmer was responsible for the organisation of all data on it. Of course, they quickly built up libraries to help with that.
The only OS I can think of that does not use the common hierarchical arrangement of flat files (like UNIX's) is PICK, which used a sort of relational database as the filesystem.
Microsoft had originally planned to introduce a new file system for Windows Vista (WinFS - Windows Future Storage). The idea was to store everything in a relational database (SQL Server). As far as I know, this project was never (or not yet?) finished.
There's more information about it on wikipedia.
I knew a guy who wrote his doctorate about a hard disk that comes with its own file system. It was based on an extension of SCSI commands that allowed the usual open, read, write and close commands to be sent to the disk directly, bypassing the file system drivers of the OS. I think the conclusion was that it is inflexible, and does not add much efficiency.
Anyway, this disk based file system still had a folder like structure I believe, so I don't think it really counts for you ;-)
Well, there's always Pick, where the OS and file system were an integrated database.
Traditional file systems are optimized for fast file access if you know the name of the file you want (including its path). Directories are a way of grouping files together so that they're easier to find if you know properties of the file but not its actual name.
Traditional file systems are not good at finding files if you know very little about them, however they are robust enough that one can add a layer on top of them to aid in retrieving files based on content or meta-information such as tags. That's what indexers are for.
The bottom line is we need a way to store persistently the bytes that the CPU needs to execute. So we have traditional file systems which are very good at organizing sequential sets of bytes. We also need to store persistently the bytes of files that aren't executed directly, but are used by things that do execute. Why create a new system for the same fundamental thing?
What more should a file system do other than store and retrieve bytes?
I'll echo the other responses. If I could pick a filesystem type, I would personally rather see a hybrid approach: a flat database of subtrees, where each subtree is treated as a cohesive unit internally, but the subtrees themselves have no hierarchy among them; instead they carry metadata and are queryable on that metadata.
The reason for files is that humans like to attach names to "things" they have to use. Otherwise, it becomes hard to talk or think about or even distinguish them.
When we have too many things on a heap, we like to separate the heap. We sort it by some means and we like to build hierarchies where you can navigate arbitrarily sized amounts of things.
Hence directories and files simply map our natural way of working with real objects, and you can put anything in a file. On Unix, even hardware is mapped into the file system as "device nodes": special files which you can read from and write to in order to send commands to the hardware.
I think the metaphor is so powerful, it will stay.
I spent a while trying to come up with an automagically versioning file system that would maintain versions (and version history) of any specific file and/or directory structure.
The idea was that all of the standard access commands (e.g., dir, read, etc.) would take an optional date/time parameter that could be passed in to access the file system as it looked at that point in time.
I got pretty far with it, but had to abandon it when I had to actually go out and earn some money. It's been on the back-burner since then.
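The core idea, reading the data as it looked at time T, can be sketched with an in-memory version history; all names here are hypothetical:

#include <stdio.h>
#include <time.h>

struct version {
    time_t      written_at;
    const char *contents;
};

/* Return the newest version written at or before t, or NULL if none.
   The history is assumed to be kept in ascending time order. */
static const char *read_as_of(const struct version *hist, size_t n, time_t t)
{
    const char *best = NULL;
    for (size_t i = 0; i < n; i++)
        if (hist[i].written_at <= t)
            best = hist[i].contents;
    return best;
}

int main(void)
{
    struct version hist[] = {
        { 1000, "draft one" },
        { 2000, "draft two" },
    };
    printf("%s\n", read_as_of(hist, 2, 1500));  /* prints "draft one" */
    return 0;
}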
If you take a look at the start-up times for operating systems, it should be clear that improvements in accessing disks can be made. I'm not sure if the changes should be in the file system or rather in the OS start-up code.
Personally, I'm really sorry WinFS didn't fly. I loved the concept..
From Wikipedia (http://en.wikipedia.org/wiki/WinFS):
WinFS includes a relational database for storage of information, and allows any type of information to be stored in it, provided there is a well-defined schema for the type. Individual data items could then be related together by relationships, which are either inferred by the system based on certain attributes or explicitly stated by the user. As the data has a well-defined schema, any application can reuse the data; and using the relationships, related data can be effectively organized as well as retrieved. Because the system knows the structure and intent of the information, it can be used to make complex queries that enable advanced searching through the data and aggregating various data items by exploiting the relationships between them.

Where does software protection store its data? [closed]

I am in the process of exploring software protection schemes for my company. Sure enough, there are many alternatives, and almost all of them provide a facility to limit:
Number of usage (executions)
Number of days
Now if I think about it, there must be some place on the computer where "the number of times the application has been used" or "the number of days it has been used for" is stored. Here I assume that an application protected using one of these mechanisms would NOT require Administrative privileges to run. And I understand that an application with normal user rights cannot modify a place that affects other users, which would mean that if an application has expired for user A, it will still run for user B (which looks foolish enough). I wonder what place these schemes can possibly hide their information in to make this work?
They just hide it somewhere hard to find, for example in a data file of the application or somewhere deep in the registry. For timed limits ("runs until April 4th"), you can use the date of a file or write the installation date somewhere in the registry - not in the usual places; they write it below some odd key in the drivers section where you have lots of random 64-character keys. These keys can then be additionally protected (removing write access for anyone).
The "number of times" limit needs to write the key each run, though, so the "limited access" scheme doesn't work there (or works against the protection). Such places have no protection beyond the fact that no one knows where the information is stored. A good spot is somewhere in the middle of a huge data file: that makes the counter hard for a cracker to find, even once they figure out it must be somewhere in that file.
That said, most good software sells because it's good, not because it's protected.
I believe the only way to do this kind of thing reliably is some kind of client-server scheme: e.g., your company has a license server, and the client's software queries the server every time it runs. Of course, this requires a working internet connection, which is not always available...
Sure, you can write something to the registry, but nothing prevents the user from modifying it.
I know some protection mechanisms that need to be run with administrative privileges at least once (e.g., during installation). I assume they set up a place in a non-user-specific location (e.g., under HKEY_LOCAL_MACHINE, Program Files, or even the Windows directory) and also grant write permissions for (authenticated) users to that location.
"And I understand that an application with normal user rights cannot modify a place which affects other users" - this sentence is where you are misunderstanding.
The application can store this sort of information in a file, in the registry (under windows) or possibly even in its own code or data files.
For example, a user can save a text file that another user may (or may not) be able to read. Permissions can keep things private to a single user, but code is usually free to make a file readable by any user on almost any operating system.
I wonder what place these schemes can possibly hide their information in to make it work?
At least under Windows, the registry would be the common data store accessible to all users.
Software protections store time-trial info either in the registry or in a file. You can use programs such as Registry Monitor and File Monitor to get a quick idea of where this data is being read from, whether the registry or a file.
Another way is reverse engineering: with a debugger you can place breakpoints on the well-known Win32 APIs used for this purpose, such as RegOpenKeyEx/RegQueryValueEx for reading data from the registry, and CreateFile/ReadFile/GetFileSize etc. for reading info from a file.
You should consider reading the documentation of those APIs on MSDN.
