Modifying the EXT2 Filesystem - filesystems

For a project I'm working on, I need to be able to modify the EXT2 filesystem. I have done extensive research, but as this is not a commonly required task, there seems to be very little helpful information available online. Unfortunately, due to confidentiality, I cannot go into specifics of what I am working on, but I can break the situation down to several key problems that I would be very grateful for assistance on. I should note that I have very little experience with Linux kernel/OS development.
I realise that this is not an easy problem to tackle, and it is definitely going to be a big challenge seeing as this is only a small part of the project. Any assistance with these problems (or any other warnings/advice) is greatly appreciated.
If this turns out to be impossible, then the entire project will have to be rethought, hence why I am starting at this point.
Problem 1: From source code to implementation
This is not really the first problem, but I need to answer it first in order to make sure I'm on the right track at all! Remember that I am a novice, so this may be a silly question.
From what I can tell, all EXT2 source code is contained within the kernel source code at linux/fs/ext2/. If I were to modify this source code to make the changes I require, then successfully rebuild the kernel, what would it take to make a drive or partition use this newly modified filesystem? Would I just be able to reformat a drive as EXT2, using the new kernel, and the modified filesystem would be applied to it due to the modified source code? Or am I oversimplifying this?
Problem 2: Extending the metadata
A vital part of my project requires me to extend the metadata that each file on the drive contains. In this case, metadata refers to details such as owner, created timestamp, modified timestamp etc. What I would like to be able to do is add and be able to update an additional metadata field. I would think the best approach would be to modify the inode, but after looking through the source code for a long time, I can't see anywhere obvious to start.
Problem 3: Consequences?
Assume I am successful in adding and using this field on the modified filesystem. If a file was then moved to another drive, with an unmodified EXT2 filesystem, would the data contained in this extra metadata field be lost? Obviously files are moved between different filesystems all the time with no problem, but I am unsure as to how they handle this. I should note that it is not required to be able to access this metadata on any other system, I only require that the data not be lost.
Bonus question
So far I've been using CentOS for my prototype. If there is any advice as to whether this is good/bad/doesn't matter, I would appreciate it.
EDIT/UPDATE TO CLARIFY PROBLEM
As I said previously, confidentiality prevents me discussing exactly what the end-game is. Here is a very basic problem, which should hopefully explain more of the kind of things I need to be able to do.
Traditional filesystems keep track of three main timestamps: created, last accessed, last modified. Let's say I wanted to add a new type of metadata, called "modify history", which stores a list of timestamps for each and every modification. The only way I can think to do this, would be to add another attribute which would point to a datablock, to which timestamps are appended every time a write is made to that file/inode.
Again, the actual problem is far more complex, but this hopefully gives a better idea of the kind of thing I need to be able to do.

Related

Is there some technology that can be used to read program-related custom file, and then manage cutom data like Memory Database?

Even I can use Sql statement with CRUD.In this case, I can show some data to UI easily by some specific change condition. Thanks in advance.
After a few days of research, I find some method. In Linux We can use MongoDB + memory-mapped file, such as tmpfs in Linux. Guide is here, English original and Chinese Translation.
But this is a bit troublesome when relating to Windows, which need help of some tool. If so, that's unnecessary. Write your own file helper honestly manage your data (also for me).

Is it possible to use Git as source control for code stored in a database?

I work on Labware LIMS, which has both configuration, and customization via its own programming language and internal code editor, and stores this customization code in database records. (Note, not the source code of the actual application itself, just the customization code a.k.a. LIMS Basic.) Almost everything in LIMS is stored in the database.
We want to investigate the possibility of using source control to protect this code but we don't know much more than the theory of using something like Git. (I have worked as a junior QA and used git but not as a dev and my knowledge is limited!)
Of particular use would be the merging tools, as currently we have to manually merge code in a text editor, if we even notice there is a conflict (checking content between dev and live is time consuming and involves using multiple tools, some of which are 3rd party tools we have developed ourselves, which are hit and miss. I personally find it easiest to cut and paste into a text file and then use Beyond Compare.
There is no notification that the code is different when moving it from dev to live (no deployment as such, you just import an xml file) so we often have things going live that someone was working on unbeknownst to each other. I.e. dev 1 is working on the code in object 1, dev 2 gets a ticket to make a change to object 1, does so and puts their change Live, whatever dev 1 was doing is now also Live in whatever state it was in. (Because we don't always have time to thoroughly check what state each object is in between up to 3 different databases.)
Is it possible to use source control just on the code within the database, but not necessarily the database itself? (We have backups and such for that but its easy for some aspects of the system to get overwritten by multiple devs working on overlapping areas at the same time.)
If anyone reading this has any specific knowledge of LW LIMS, we are referring to the Subroutines mostly, we have versioned Analyses which stands in for source control for the moment and is somewhat effective but no way to control who is doing what on the subroutines other than a comment log at the top. I have tried to find any information on how other teams source control their code in LIMS but to no avail.
The structure of one of these tables can range from as simple as the code just existing in one field as a straight text dump with a few other fields such as changed_on, changed_by and name (Subroutines), or more complex with code relating to one record being sprinkled around in multiple rows on another table entirely (Analyses) but even if it could just deal with the simple scenario to start with that would be great!
TL;DR: Could the contents of the Code field in a database record be treated like a regular code object in other dev environments somehow and source controlled using Git? (And is anyone willing to explain it simply for me to follow?)
As you need to version control table fields of subroutine, but LW LIMS doesn’t have the IDE for version control (such as git, svn etc). So the direct answer is no.
If you really want to do version control for the codes in database, you can create a git repository and only put the codes in git repository. when a file has updated, you can commit & push the changes. And it’s easy to compare the difference between versions.
More detail about git, you can refer git book.
LabWare LIMS has a number of options for version control. You COULD version the Subroutine table by adding a SUBROUTINE.VERSION field to the table, this works the same way as other versioned tables in LabWare where it asks you if you would like to create a new version of the object before saving. There are a few customers I work with that have done this.
Alternatively, (and possibly our more recommended method prior to LEM) there is the Snapshot capability where the system automatically takes a "snapshot" of objects as they are saved - when viewing these you have the ability to view them side by side in a comparison dialogue - it will show < or > for lines which are different.
Another approach is, if you have auditing turned on you are able to view the audit history for changes to specific objects - this includes subroutines.
One other approach is to use configuration packages - this has the ability to record version AND build numbers. Though individual subroutines is probably a bit too granular for it's intended design.
Lastly, since this question was originally posted we have developed a product called LabWare Environment Manager (LEM) which has some good change control functionality built-in.
For more information on the suggestions above, please have a look at the LabWare Technical manual for the version you are on. We also have a mailing list for questions like this to be posted. You might find an answer there. If you have access to our Support webpage you're able to search previous questions that have been asked. I'd also suggest that you get in touch with your Account Manager at LabWare who can help you answer some of your questions.
HTH

Interpreting WinBUGS traps and how to automate the program?

First of all, does anybody know of a developer's guide for WinBUGS? The website is full of detailed examples for Doodles and documentation for the model language, but I have yet to find anything about how to interpret trap windows.
Secondly, has anybody found any ways to streamline the check/load/compile/init/monitor/update cycle? By that I mean, there doesn't seem to be any way to say "don't bother rechecking the model or putting any of the settings back to their defaults (!!!), just keep loading data from these files, inits from those files, and for each generate a new coda". Even the standard Windows shortcuts are neutered here, forcing the user to keep clicking and filling the same fields with the same values over and over. This might seem like a minor issue, but when you are doing many similar analyses one after the other, it gets old fast.
I'm at the point where I'm about to use TRON.EXE to send fake mouseclicks to the program, but before going to that extreme I'm hoping there is some native and more elegant way to automate repetitive WinBUGS tasks.
Well... that's WinBUGS at its normal :-) Unfriendly, showing traps that would scare of an experienced kernel hacker.. :-) I don't think there exist some guide to traps. I mean if WinBUGS creators wanted to put some effort in being more user friendly, they would probably first made the traps more understandable, so that no guide was necessary.
I was trying to do something similar - i.e. to customize WinBUGS behaviour. First, you can call WinBUGS from R using R2WinBUGS. That way you are able to do a lot automatization but not all. For example, I wanted to have something like progress information in WinBUGS. The problem is that WinBUGS UI gets stuck during update cycles. R2WinBUGS creates the script.txt command script and there is command update (<big number of cycles>). What I wanted here was to customize this script.txt to contain a lot of smaller update(..) commands instead of one big one. But, the problem is that R2WinBUGS generates this script itself and you cannot change it.
So the way to customize WinBUGS could be that you create your own wrapper that creates the script.txt and other files. I believe you could do a lot more customization to WinBUGS this way.
However, I'm not sure if WinBUGS is worth it. Its development has stopped and while favorited by many people, it remains rigid. You can try JAGS or CppBugs which seem to have much more promissing future.
For a wrapper around R2WinBUGS that adds lots of functionality to streamline serious WinBUGS use, see my package rube (http://www.stat.cmu.edu/~hseltman/rube/) which is not yet on CRAN.
Among other things, it gives plain English error messages rather than passing your model/data/inits along to WinBUGS when a trap error is certain. It also gives a highly useful summary of your model/data/inits for finding problems that cannot be automatically detected. Of course, it does not catch all trap errors.
Turns out I didn't RTFM enough on the second part of my question. It turns out that the section of the WinBUGS 1.4 manual entitled "Batch-Mode: Scripts" lists all the batch commands. All the important UI functionality has a batch-mode command. There was only a little trial-and-error in getting the arguments right (for example over.relax('true')). What really took me a while to sort out is that WinBUGS seems to have trouble with some Windows paths, but as long as everything is in a subdirectory of the directory where WinBUGS is installed, it runs okay.
It's still kind of messy to have to keep loading all these little files, but I wrote an R-script that uses functions from the BRugs package to create all the files, name them in a consistent pattern, and generate a script that will then initialize the model and load them, over and over again.
I'll leave this question open for a while, though, to see if anybody has any suggestions on where I can learn to make better use of traps.

Version Control for a total newbie

I'm totally new to the world of programming and understand very little in terms of jargon and typical methodology.
A while ago I was writing some code, but accidentally deleted some good code while I was deleting bad code. From then on I started creating versions of my files, I would name each file with the date and a version number.
However, this is a pain in the ass, having to give an unique name to each file and then going to my core file and changing the reference to the name of the new file.
And then, just the other day I accidentally over wrote something important even with this method, probably because of a typo in naming.
Needless to say, this method sucks.
I'm looking for suggestions on better practices, better tools. I've been looking at version control, but a lot of them, git svn look really complicated. The idea is to speed up the whole versioning process, not make it harder by having to do command line.
Right now I'm hoping that there's a tool that would save an unique version of the file every time I hit ctrl-s, and give me one button to create a finalized version.
Of course if there are suggestions for totally different ways of doing things, that would be more awesome.
Thanks everyone.
There are two approaches to this problem:
Versioning on demand. This is the model used by subversion, CVS, etc., etc. When you have made a 'significant' change, you decide to tell the system "keep this version".
Automatic versioning. This is the model used by some old VAXen, Eclipse, IDEA, every wiki ever, and a few writer's tools. Every time you save, a new version is implicitly created. At some remove, old versions may be culled (e.g., only one version is kept from work performed a week ago, rather than every save).
It sounds like you would prefer #2, because it is "fool-proof" -- you never have to go, "oops, I should have 'checked in' / 'kept' my work before making this change." You can always roll back. One downside is that you have to manually step through the old versions to find something, because unlike with #1 you generally are not giving a description of each change.
Another downside is that for large files, or ones that are not easily diff'd/patched (i.e. binary files), you will start burning through disk space pretty fast..
As an aside, it sounds like you don't need 90% of the features in a standard SCM system -- branching, labeling, etc. -- but you might find uses for them eventually. So learning one may be a win in the long run. You can do this with svn, etc. but it will take some customizing. If you use a scriptable editor (emacs, vi, TextMate, whatever) you could redefine the "Save" command as "Save and make a new version".
Subversion is more or less the gold standard.
I'd suggest (especially for a newbie) that you check out BeanStalk (www.Beanstalkapp.com) to run your subversion server and TortoiseSVN for your client.
Good luck!
Whatever you do, if someone mentions Visual SourceSafe -- run as fast as you can. VSS was created by Satan himself and handed down to torment developers the world over.
I think you're in a position where you have to get a little bit out of your comfort zone and take some time to learn git. It's pretty easy to learn and use.
Believe me, it's really worth it. Time spent learning git is time well spent.
If you are not working in a team, you could use something like Eclipse's local history feature. It stores versions of your files locally, and you can revert to previous versions whenever you feel like it. More details here: http://help.eclipse.org/ganymede/index.jsp (Search for "local history"). I am pretty sure other IDEs have such a feature too.
If you are collaborating with others on your code, there probably is no way around learning one of the standard tools like SVN, CVS or git. For most of them, there are plugins for many IDEs available, so you don't have to use the command line.
I currently use Subversion, but my source control experience is limited.
I would however suggest reading the tutorial by Eric Sink.
http://www.ericsink.com/scm/source_control.html
Its best to learn how to use an existing 'industry standard' versioning tool like Subversion. Even if you're new to programming and version control, SVN isn't that hard to learn and will serve you well. I personally use and recommend VisualSVN Server and TortoiseSVN for Windows. Both are free and quite simple to use.
For a system that creates a revision on every save, perhaps you should look into a Versioning File System.
I think TortoiseSVN would be a good Subversion client for you to try if you're in Windows. It won't do what you're looking for with every-time-I-save-I-get-a-new-version--you'll have to manually "commit" versions to the repository. When you do a commit, that creates a new version, essentially saving your progress at that point. TortoiseSVN is pretty user-friendly, and it's a GUI, so you won't be working at the command line. You'll be able to do things like right-click a file in Windows Explorer and choose Commit to save your progress. Plus, TortoiseSVN is free and open source.
Subversion is not really complicated. If you are using Windows, TortiseSVN will help a lot, if you are using Eclipse, subclipse plug-in is awesome. (You probably should be using eclipse regardless :) )
Some of the others are a bit complicated, but you just have to know the pattern with eclipse. Maybe you could "Try it out" with an open source project or some existing subversion server.
The cycle would be:
First you "Check out" a repository. This fills up your specified directory with the contents from the repository.
If you are doing it from the command line--it's "svn co"--there is enough help there to figure out the rest.
Second you edit your files. You don't have to lock them or anything.
if you add a new file, you use "svn add filename" as soon as you add it. This won't actually change the repository until you commit your changes.
When a group of edits are done, you check them in with "svn ci" (also svn commit works).
This one has a SLIGHT twist that you'll always forget--every commit needs a comment. You don't have to specify the files you are committing or anything, but you do need to be in the top level of your project (it will commit everything below your directory.
So the procedure here is, go to the "root" of your project tree and type:
svn ci -m "comment"
piece of cake.
Finally, IF someone else is checking stuff in things get SLIGHTLY stranger. before you commit, you should "update" and get their changes. "svn up" is all it takes, but it may warn you that there were merges. This only happens when both of you edited the same file, and 90% of the time, the merges will go okay. the rest of the time, it will put little markers in your file telling you what you changed and what they changed. The "up" command will tell you which files it did this to. Go look at them and clean the file up before you check the file in.
Always test between "svn up" and "svn ci", you never know if their crappy changes busted your pristine code.
That's really it. It's so easy from the CLI, that the graphics environments are hardly worth it (but subclipse is really nice if you are in eclipse anyway because it will visually show you modified files that need to be checked in).
If you ever forget, svn's command line help is extremely terse and useful, tells you JUST what you need to know, and has help on all the sub-commands and options.
If you're looking for an easy-to-set-up version control system for Windows, I highly recommend TortoiseHg, an easy-to-use Mercurial frontend for Windows. You don't have to worry about setting up and keeping track of a repository separate from your files, but you always can do so if you'd like to. Mercurial is a great tool because it can grow with your needs. It has all the usual features like easy merging, etc. and is quite a bit easier to wrap your head around than Git in my experience.
I think Git is really easy to use especially when you use GitHub. They also provide lots of good guides to get up and running.
http://github.com
http://github.com/guides/home
I've used Git, SVN, CVS, and Perforce. On both Windows and Unix environments.
My vote is definitely for SVN, as it's ease of use, and flexibility. I prever to use command-line now, but at one time I was using TortoiseSVN for Windows, which we were able to get non-technical people to use without a hitch.
Use SVN.
You're definitely on the right track with recognizing the need for version control, but sound unsure what that might mean to you and your work. Once you learn the concepts behind version control systems, you will really come to appreciate them.
The concepts are simple: a source code control system is a piece of software designed to help you store and manage your code. How you get code into and out of it differ based on which system you choose: one paradigm is that you deliberately "check out" a file, make your changes to it, test it and make sure it's good, then check it back in. Another is that you simply save every change you make because disk space is dirt cheap, much cheaper than your time and effort spent to create the source in the first place.
Another important concept is the "baseline" or "label". When your product is in a ready-to-ship state, you tell the source code control system to create a "label" and tag every current item your entire source code base with that label. That way, when someone reports a bug in version 4.1 you can go to your system, request all the files with the "Version 4.1" label and get exactly the source code they're having a problem with.
Having a source control tool integrated with your development environment makes the whole process much easier than having to mess with command lines. (Don't discount command line because of their complexity, they deliver elegant control to an experienced user, and you eventually will become an experienced user.) But for now, I'd recommend a source code tool that can automate the process as much as possible.
Some things to consider: are you now, or are you planning to share the development with another developer? That might make a difference on how you want to set up a server. If you're developing alone on your own box, you can set it all up locally, but that's probably not the best approach for a team. (If you're unsure, git is very flexible in that arena.) Are you going to be storing large multimedia files, or just source code? Some source code systems are designed to efficiently store only text files, and will not handle movies, sounds or image files very well.
Something else to know is that most newer source control systems require some kind of "daemon" program running on the server (Subversion, git, Perforce, Microsoft Team Foundation Server) while the older, simpler systems just use the file system directly (Visual Source Safe, cvs) and don't require a server program.
If you don't want to learn much and your demands are low, the simpler solutions should suffice. Microsoft's Visual Source Safe used to come with their visual studio products, and was a very simple to use tool. It's not very robust, it's Microsoft-only, and it can't handle large files well, but it's very, very easy to set up and use. If you don't want to spend money, Subversion and git are two stellar open source solutions, and there is a lot of documentation for both on the web.
If you like to spend money, Perforce is considered an excellent choice for professional development teams (and I believe they have a free single-developer version.) If you really like to spend lots of money and want to make Bill Gates happy, Microsoft's Team Foundation Server is a complete software development lifecycle manager, is extremely easy to use in the Windows environment, and very powerful; but you'd probably want to devote an entire Windows server (plus SQL Server) instance to host it, and it will cost you several thousand dollars just on licenses. Unfortunately it is not the right tool for a one-man shop, or if you have no Windows admin experience.
If you have the budget or the connections, bringing in an experienced software engineer to help you get things started might be the quickest path to success. Otherwise, you'll have to do some more research to learn which systems best fit your situation.
Whatever VCS you use, if you choose versioning on demand instead of automatic versioning (to borrow terms from Alex's post), you will have to go through some ceremony to:
-create,
-rename,
-move,
-copy, or
-delete
a file that is under source control.
When you create a new file, you have to Add it to source control before you Commit your changes to the repository.
When you rename, move, copy, or delete a file under source control, do so with your VCS client. In TortoiseSVN and TortoiseGit, the move and copy operations are done with a right-click-and-drag, whereas the rename and delete operations are available via a right-click.
As you can imagine, changing things like the name of a project can be quite the hassle, hence the case for automatic versioning.
Ordinary file edits and any changes to files not under source control, do not require you to tell your VCS client about them.
Finally, for one-man projects, I prefer git over SVN because SVN requires at least 2 copies of everything: a repository (the "master" copy of the files and history) and a working copy (the copy you do your work on). With git, the repository and working copy are the same thing, which makes my experience simpler.
We use SourceGear Vault, which has great integration with Visual Studio, and is free for a single user. Depending on what framework and languages you're using, though, Subversion is a great free solution.
First of all, please read these articles by Eric Sink. Eric sink runs a company that creates a Source Control system called Vault. He explains in a newbie friendly manner how to do source control, best practices etc:
Introduction to Source Control
I found it invaluable when I first wanted to understand Source Control.
SourceGear Vault is FREE for a single user. It's interface is intuitive and integrates well with Visual Studio.
If it's just you, you might want to try Bazaar. It's distributed like Git (so it'll be nice for a single person--no server to deal with), but one of their main goals was to make it it much easier to use than Git.
Also, there is a handy gui tool that should make it amazingly easy to use called ToroiseBzr. http://bazaar-vcs.org/TortoiseBzr
There is in fact such a tool. It is called emacs.
Just create yourself a "~/.emacs" file and put the following lines in it:
(setq kept-new-versions 5)
(setq kept-old-versions 5)
And then restart emacs.
This tells emacs to save your 5 oldest and 5 newest versions of that file. They will be kept in files named filename~n~ where "filename" is your file's normal name, and "n" is the backup number.
If you develop your project alone (don't need ane server for collaboration) Mercurial might be you system of choice. I personally value one of its features: it only uses one place to save its information, it is the .hg directory in the root of your project. It doesn't put its data into every directory (like SVN). This way the archive and the project directory is easy to manage.
I've used Visual Source Safe, Perforce, and Subversion. They were all fine, but I would have to say that the support and extensions for Subversion just seemed slightly better. If you're planning on entering/staying in the software industry, you MUST know the fundamentals to source control, and I would highly recommend setting up one of the source control services. Subversion would be my recommendation and is free as well. It will be complicated at first, but you really should use a SVN client to add a GUI to increase utility and cut down on all the complication you're observing.
I quick google of "dreamweaver svn" reveals that many people are working with Subversion in Dreamweaver. I'm a advocate of version control, and SVN in particular, so I would recommend you look into that :)
If you don't want to use a full on version control system (as noted above), you may be able to improve your lot by refining and automating the procedure you described originally. Depending on your comfort with the tools you should be able to put together a script in DreamWeaver itself or in Windows Scripting ( Powershell, VBA, Perl, etc ) that will at least make date-named copies of the folder you are working in every so often. This will keep you from having to do it and make sure there aren't any typo-related problems. Further down that path you can have your script put a copy of your work on a backup drive or remote server, and then you'd have a back up, too.
I'm afraid I don't know much about DreamWeaver, but if it has much scripting support built-in you may even be able to "hook" into the Save/ Auto-Save functions and have them do exactly what you want.
Hope this helps,
adricnet

Optimisation: use local files or databases for HTML

This follows on from this question where I was getting a few answers assuming I was using files for storing my HTML templates.
Before recently I've always saved 'compiled' templates as html files in a directory (above the root). My templates usually have two types of variable - 'static' variables which are not replaced on every usage but are used across the site - basically for ease of maintenanceif I decide to change the site name for example; and dynamic vars that change on every page load.
I always used to save these as files on the server - but my friend pointed out something I'd overlooked: why have 5-10 filesystem calls when you can have one database call?
What I want to know is which is more efficient? Calling several HTML files from the system or calling several rows of templates from the database (in one query/call).
Don't Store Editable HTML In the database
Seriously, because the maintenance overhead for mere changes becomes exhaustive once you realise you can no longer just pop open a text editor.
I worked on many projects which had HTML content in the database, and it was a constant nightmare of "find that row the content is on" and I really would liked to have shot the person whom made it.
Also, DON'T PREMATURELY OPTIMISE . If you find it a problem thats slowing the project down, then change it. Because making the code exhaustively less maintainable to save a millisecond. But design the code well enough that should you need to change where the content comes from later it should be easy to do.
Surely that can be resolved by having
a suitable web interface for editing
the templates?
Erm, really not, unless you're only trying to compete with notepad. Syntax highlighting and all the other full host of features you can get in a standard editor just make your developers suicidal when they find themselves editing web pages by hacking at an undersized text area with awful white on black ( not to mention the extra fun you get with having to worry about entity encoding etc, for instance, try editing html with in a text area where the html content contains a text area element! )
On FileIO While File IO can be a bottleneck, keep in mind that if you have a proper linux install, and plenty of memory, a handy thing known as "disk-cache" takes effect, which in effect, keeps files used in memory, so file IO becomes mere memcpy.
On the contrary, in real stress tests on any of the code I have used, the biggest slowdowns have been in the database!, primarily the nice slow CONNECT string, query parse time, extra php<->mysql interactions. You're not really looking at gaining anything. Filesystem lookup is close to database index lookup, and you don't have any unknowns other than "you need to stream it from a disk", no table locking stuff to worry about!
You should probably try something like a caching library, X-Cache comes highly recommended, thats more likely to give you visible performance gains.
I pretty much agree with the gist of Kent Fredric's answer. But, if you really want to know, which is more efficient/faster, you cannot reasonably expect to get the answer here. If you want that answer, there is only one way to get it: profile the application both ways.

Resources