Update2:
Thanks for the input. I have implemented the algorithm and it is available for download at SourceForge. It is my first open source project so be merciful.
Update:
I am not sure I was clear enough or everyone responding to this understands how shells consume #! type of input. A great book to look at is Advanced Unix Programming. It is sufficient to call popen and feed its standard input as demonstrated here.
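For anyone following along, the popen approach mentioned above might look roughly like the sketch below. It assumes the script has already been decrypted into a memory buffer (the decryption step is left out entirely), and the function name run_script_via_stdin is made up for illustration; it is not taken from the SourceForge project.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: start `interp` (for example "sh" or "perl") as a
 * child process and write the already-decrypted script text to its
 * standard input, so the plaintext never touches the disk.
 * Returns the child's exit code, or -1 on failure.
 */
int run_script_via_stdin(const char *interp, const char *script, size_t len)
{
    FILE *child = popen(interp, "w");
    if (child == NULL)
        return -1;

    if (fwrite(script, 1, len, child) != len) {
        pclose(child);
        return -1;
    }

    /* pclose() flushes, closes the pipe (EOF for the child), then waits. */
    int status = pclose(child);
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

One thing to keep in mind with this design: the interpreter's standard input is used up by the script text, so the script itself can no longer read input from the user's stdin.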
Original Question:
Our scripts run in a highly distributed environment with many users. Using permissions to hide them is problematic for many reasons.
Since the first line can be used to designate the "interpreter" for a script, the initial line can be used to name a decrypter:
#!/bin/decryptandrun
*(&(*S&DF(*SD(F*SDJKFHSKJDFHLKJHASDJHALSKJD
SDASDJKAHSDUAS(DA(S*D&(ASDAKLSDHASD*(&A*SD&AS
ASD(*A&SD(*&AS(D*&AS(*D&A(SD&*(A*S&D(A*&DS
Given that I can write the script to encrypt and place the appropriate header I want to decrypt the script (which itself may have an interpreter line such as #!/bin/perl at the top of it) without doing anything dumb like writing it out to a temporary file. I have found some silly commercial products to do this. I think this could be accomplished in a matter of hours. Is there a well known method to do this with pipes rather than coding the system calls? I was thinking of using execvp but is it better to replace the current process or to create a child process?
If your users can execute the decryptandrun program, then they can read it (and any files it needs to read such as decryption keys). So they can just extract the code to decrypt the scripts themselves.
You could work around this by making decryptandrun suid. But then any bug in it could lead to the user getting root privileges (or at least the privileges of the account that holds the decryption keys). So that's probably not a good idea. And of course, if you've gone to all the trouble of hiding the contents or keys of these decryption scripts by making them not readable to the user... then why can't you do the same with the contents of the scripts you're trying to hide?
Also, you can't have a #! interpreted executable as an interpreter for another #! interpreted executable.
And one of the fundamental rules of cryptography is, don't invent your own encryption algorithm (or tools) unless you're an experienced cryptanalyst.
Which leads me to wonder why you feel the need to encrypt scripts that your users will be running. Is there anything wrong with them seeing the contents of the scripts?
Brian Campbell's answer has the right idea, I'll spell it out:
You need to make your script unreadable but executable by the user (jbloggs), and to make decodeandrun setuid. You could make it setuid root, but it would be much safer to make it setgid for some group decodegroup instead, and then set the script file's group to decodegroup. You need to make sure that decodegroup has both read and execute permissions on the script file and that jbloggs is not a member of this group.
Note that decodegroup needs read permission for decodeandrun to be able to read the text of the script file.
With this setup, it is then possible (on Linux at least) for jbloggs to execute the script but not to look at it. But observe that this makes the decryption process itself unnecessary -- the script file might as well be plaintext, since jbloggs can't read it.
[UPDATE: Just realised that this strategy doesn't handle the case where the encrypted contents is itself a script that starts with #!. Oh well.]
You're solving the wrong problem. The problem is that you have data which you don't want your users to access, and that data's stored in a location to which the users have access. Start by attempting to fix the problem of users with more access than they require...
If you can't protect the whole script, you may want to look into just protecting the data. Move it to a separate location and encrypt it. Encrypt the data with a key only accessible by a specific ID (preferably not root), and write a small suid program to access the data. In your setuid program, do your validation of who should be running the program, and compare the name / checksum of the calling program (you can inspect the command line for the process in combination with the calling process's cwd to find the path, use lsof or the /proc filesystem) with the expected value before decrypting.
If it takes more than that, you really need to reevaluate the state of users on the system - they either have too much access or you have too little trust. :)
All of the exec()-family functions you link to accept a filename, not a memory address. I'm not sure at all how you would go about doing what you want, i.e. "hooking" in a decryption routine and then re-directing to the decrypted script's #! interpreter.
This would require you to decrypt the script into a temporary file, and pass that filename to the exec() call, but you (very reasonably) said you didn't want to expose the script by putting it in a temporary file.
If it were possible to tell the kernel to replace a new process with an existing one in memory, you would have a path to follow, but as far as I know, it isn't. So I don't think it will be very easy to do this "chained" #! following.
Related
I'm working on a project in golang that needs to index recently added file content (using a framework called bleve), and I'm looking for a way to get the content added to a file since its last modification. My current work-around is to record the last indexed position of each file, and during later indexing runs I only retrieve file content starting from the previously recorded position.
So I wonder if there's any library or built-in functionality for this? (doesn't need to be restricted to go, any language could work)
I'll really appreciate it if anyone has a better idea than my work-around as well!
Thanks
It depends on how the files change.
If the files are append-only, then you only need to record the last offset where you stopped indexing, and start from there.
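Since the question says any language is fine, here is a rough C sketch of the append-only case. The function name read_since is hypothetical, and it assumes the caller stores the returned offset somewhere between runs.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: read everything in `path` from byte `offset` to the
 * current end of file. Returns a malloc'd, NUL-terminated buffer and stores
 * the new end offset in *new_offset. Returns NULL on error, including the
 * case where the file is now shorter than `offset` (it was truncated).
 */
char *read_since(const char *path, long offset, long *new_offset)
{
    if (offset < 0)
        return NULL;

    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;

    /* Find the current end of the file. */
    if (fseek(f, 0, SEEK_END) != 0) {
        fclose(f);
        return NULL;
    }
    long end = ftell(f);
    if (end < offset || fseek(f, offset, SEEK_SET) != 0) {
        fclose(f);
        return NULL;
    }

    size_t want = (size_t)(end - offset);
    char *buf = malloc(want + 1);
    if (buf == NULL) {
        fclose(f);
        return NULL;
    }

    size_t got = fread(buf, 1, want, f);
    fclose(f);

    buf[got] = '\0';
    *new_offset = offset + (long)got;
    return buf;
}
```

The caller indexes the returned text, frees it, and saves *new_offset as the starting point for the next run.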
If the changes can happen anywhere, and the changes are mostly replacing old bytes with new bytes (like changing pixels of an image), then perhaps you can consider computing checksums for small chunks, and only index those chunks that have different checksums.
You can check out crypto package in Go standard library for computing hashes.
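To make the chunk idea concrete, here is a small C sketch that works on an in-memory buffer. It swaps in FNV-1a, a simple non-cryptographic hash, purely to keep the example self-contained; it is not what the Go crypto package provides, and the names fnv1a and hash_chunks are made up.

```c
#include <stddef.h>
#include <stdint.h>

/* FNV-1a (64-bit): a stand-in hash for illustration, not a cryptographic one. */
uint64_t fnv1a(const unsigned char *data, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL;   /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 0x100000001b3ULL;            /* FNV prime */
    }
    return h;
}

/*
 * Hash each chunk_size-byte chunk of `data` into out[] (the last chunk may
 * be shorter). Stops after out_cap chunks. Returns the number of hashes
 * written.
 */
size_t hash_chunks(const unsigned char *data, size_t len, size_t chunk_size,
                   uint64_t *out, size_t out_cap)
{
    if (chunk_size == 0)
        return 0;

    size_t n = 0;
    for (size_t off = 0; off < len && n < out_cap; off += chunk_size, n++) {
        size_t remaining = len - off;
        size_t sz = remaining < chunk_size ? remaining : chunk_size;
        out[n] = fnv1a(data + off, sz);
    }
    return n;
}
```

You would keep the hash array from the previous indexing run, recompute it for the current contents, and re-index only the chunk positions where the two arrays differ.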
If the changes are line insertion/deletion to text files (like changes to source code), then maybe a diff algorithm can help you find the differences. Something like https://github.com/octavore/delta.
If you're running in a Unix-like system, you could just use tail. If you specify to follow the file, the process will keep waiting after reaching end of file. You can invoke this in your program with os/exec and pipe the Stdout to your program. Your program can then read from it periodically or with blocking.
The only way I can think of to do this natively in Go is like how you described. There's also a library that tries to emulate tail in Go here: https://github.com/hpcloud/tail
I have this funny idea: write some data (say variable of integer type) to the end of the executable itself and then read it on the next run.
Is this possible? Is it a bad thing to do (I'm pretty sure it is :) )? How would one approach this problem?
Additional:
I would prefer to do this with C under Linux OS, but answers with any combination of programming language/OS would be appreciated.
EDIT:
After some time playing with the idea, it became apparent that Linux won't allow writing to a file while it's being executed. However, it does allow deleting it.
My vision of the writing process at this point:
make a copy of the program from within the program
append data to the end of the copy
make the program delete itself
rename the copy to the original name
Will try to implement that as soon as I have some time.
If anyone is interested in how "delete itself" works under Linux, look for info about inodes. It's not possible to do this under Windows, as far as I know (might be wrong).
EDIT 2:
Have implemented a working example under Linux with C.
It basically uses the strategy described above: it appends the data to the end of a copy of the program, deletes itself, and renames the copy to the original name. It accepts an integer to save as a single CLI argument, and prints the old data as well.
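A rough sketch of that strategy in C is below. This is not the actual implementation, just an illustration under a few assumptions: the function names, the ".new" suffix, and the 4-byte "SAV1" marker are all made up, and the caller passes in the path of the executable. It relies on rename() replacing the original in one step, which covers both the delete and the rename.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <sys/stat.h>

#define TRAILER_MAGIC "SAV1"   /* hypothetical marker placed before the value */

/* Read the integer stored at the very end of `path`; 0 on success, -1 if absent. */
int read_trailer(const char *path, int32_t *value)
{
    char magic[4];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    int ok = fseek(f, -(long)(sizeof magic + sizeof *value), SEEK_END) == 0
          && fread(magic, 1, sizeof magic, f) == sizeof magic
          && memcmp(magic, TRAILER_MAGIC, sizeof magic) == 0
          && fread(value, sizeof *value, 1, f) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

/* Copy `path` to "<path>.new", append marker + value, then rename over `path`. */
int append_and_replace(const char *path, int32_t value)
{
    char tmp[4096];
    char buf[8192];
    struct stat st;
    size_t n;
    int ok = 1;

    if (stat(path, &st) != 0)
        return -1;
    if (snprintf(tmp, sizeof tmp, "%s.new", path) >= (int)sizeof tmp)
        return -1;

    FILE *in = fopen(path, "rb");
    if (in == NULL)
        return -1;
    FILE *out = fopen(tmp, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            ok = 0;
            break;
        }
    }
    ok = ok && fwrite(TRAILER_MAGIC, 1, 4, out) == 4
            && fwrite(&value, sizeof value, 1, out) == 1;
    fclose(in);
    if (fclose(out) != 0)
        ok = 0;

    /* Keep the original permission bits (including execute), then swap. */
    if (ok && chmod(tmp, st.st_mode & 07777) == 0 && rename(tmp, path) == 0)
        return 0;
    remove(tmp);
    return -1;
}
```

As noted in the efficiency thoughts, every call appends another 8 bytes; only the most recent trailer is read back.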
This surely won't work under Windows (although I found some options on a quick search), but I'm curious how it's gonna behave under OS X.
Efficiency thoughts:
Obviously copying whole executable isn't efficient. I guess that something faster is possible with another helper executable that will do the same after program stops executing.
It's not reusing old space but just appending new data to the end on each run. This can be fixed with some footer reservation process (maybe will try to implement this in the future)
EDIT 3:
Surprisingly, it works with OS X! (ver. 10.11.5, default gcc).
I have an application that is currently configurable via command line arguments
myprog -fooFile 'foo.txt' -barFile 'bar.txt'
Command line parameters are a bit cumbersome, so I want to allow other avenues for these configurations, but I am a bit disappointed that it's looking more complicated than I thought it should be.
Solution 1: Use environment variables
I could make my program look for MYPROG_FOO_FILE and MYPROG_BAR_FILE environment variables. The implementation is just a getenv, but environment variables are global, add clutter, and are hard to configure, especially for GUI apps, because many window managers don't source ".profile" during initialization.
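For reference, the getenv route could look something like this. The helper name resolve_setting and the precedence order (command line first, then environment, then a built-in default) are just one possible choice, not something dictated by the question.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: pick a setting from, in order of precedence,
 * the command-line value (may be NULL), the environment variable
 * `env_name`, and finally `fallback`.
 */
const char *resolve_setting(const char *cli_value, const char *env_name,
                            const char *fallback)
{
    if (cli_value != NULL)
        return cli_value;

    const char *env_value = getenv(env_name);
    if (env_value != NULL && env_value[0] != '\0')
        return env_value;

    return fallback;
}
```

A call such as resolve_setting(NULL, "MYPROG_FOO_FILE", "foo.txt") returns the environment value when the variable is set and non-empty, and "foo.txt" otherwise.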
Solution 2: Use a configuration file
Since fooFile and barFile are kind of static configuration values, it seems like it's better to put those in a configuration file somewhere:
fooFile = /home/hugomg/foo.txt
barFile = /home/hugomg/bar.txt
But afaik there is no easy way in C to read an arbitrarily long string from a file. Do I really need to make a loop that keeps reallocing a buffer just for this?
Alternatively, since all my values are file names, is there some #define somewhere specifying the maximum size of a path string?
Your configuration file approach with a simple syntax looks good to me.
Since you are on Linux, you could use getline to read the file. It can handle lines of arbitrary length and manages the memory allocation for you.
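A minimal sketch of reading the "key = value" format from the question with getline is below. The function name config_get is made up, and the parsing is deliberately naive: no comments, no quoting, and trailing spaces in the value are kept.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: look up `key` in a "key = value" file.
 * Returns a malloc'd copy of the value, or NULL if the key is missing
 * or the file cannot be opened. The caller frees the result.
 */
char *config_get(const char *path, const char *key)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return NULL;

    char *line = NULL;      /* getline() allocates and grows this buffer */
    size_t cap = 0;
    char *result = NULL;
    size_t keylen = strlen(key);

    while (result == NULL && getline(&line, &cap, f) != -1) {
        char *p = line + strspn(line, " \t");   /* skip leading blanks */
        if (strncmp(p, key, keylen) != 0)
            continue;
        p += keylen;
        p += strspn(p, " \t");
        if (*p != '=')                          /* e.g. a longer key name */
            continue;
        p++;
        p += strspn(p, " \t");
        p[strcspn(p, "\r\n")] = '\0';           /* strip the line ending */
        result = strdup(p);
    }

    free(line);
    fclose(f);
    return result;
}
```

Because getline resizes the buffer as needed, there is no need for a hand-written realloc loop or a fixed maximum path length.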
If your application has a GUI, you may want your user to configure it from within the application. In this case, you may want to use a database to store your configuration; that way you can validate your data on its way in, so your user can't write a bad config file. I would suggest using SQLite for this: it provides a full database in a file, and is pretty easy to use.
http://www.sqlite.org/
I'm making a program and one of the things it needs to do is transfer files. I would like to be able to check before I start moving files if the File system supports files of size X. What is the best way of going about this?
Use a function like ftruncate to create a file of the desired size in advance, before moving anything, and handle the error appropriately if it fails.
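A sketch of the ftruncate probe, assuming a POSIX system; the function name and the idea of using a scratch file in the destination directory are illustrative only. Note that on many filesystems ftruncate creates a sparse file, so this mostly checks the maximum file size rather than the free space.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical probe: create a scratch file at `probe_path` (which must not
 * exist yet), try to extend it to `size` bytes, then remove it again.
 * Returns 0 if the filesystem accepted the size, -1 otherwise with errno
 * describing the failure.
 */
int can_create_file_of_size(const char *probe_path, off_t size)
{
    int fd = open(probe_path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd == -1)
        return -1;

    int rc = ftruncate(fd, size);
    int saved_errno = errno;     /* keep ftruncate's error across cleanup */

    close(fd);
    unlink(probe_path);

    errno = saved_errno;
    return rc == 0 ? 0 : -1;
}
```

The probe path should be on the same filesystem as the transfer destination, otherwise the result says nothing about the target.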
There's no C standard generic API for this. You could simply try creating a file and writing junk to it until it is the size you want, then deleting it, but even that isn't guaranteed to give you the info you need - for instance another process might have come and written a large file in between your test and your transfer, taking up space you were hoping to use.
I need to interface with some executables that expect to be passed filenames for input or output:
./helper in.txt out.txt
What is the standard (and preferably cross-platform) way of doing this?
I could create lots of temporary files in the /tmp directory, but I am concerned that creating tons of files might cause some issues. Also, I want to be able to install my program and not have to worry about permissions later.
I could also just be Unix specific and try to go for a solution using pipes, etc. But then, I don't think I would be able to find a solution with nice, unnamed pipes.
My alternative to this would be piping input to stdin (all the executables I need also accept it this way) and get the results from stdout. However, the outputs they give to stdout are all different and I would need to write lots of adapters by hand to make this uniform (the outputs through files obey a same standard). I don't like how this would lock in my program to a couple of formats though.
There isn't a right or wrong answer necessarily. Reading/writing to stdin/out is probably cleaner and doesn't use disk space. However, using temporary files is just fine too as long as you do it safely. Specifically, see the mktemp and mkstemp manual page for functions that let you create temporary files for short-term usage. Just clean them up afterward (unlink) and it's just fine to use and manipulate temp files.
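As an illustration of the mkstemp route, a helper along these lines could prepare the input file before running ./helper. The function name and the "/tmp/helper-XXXXXX" template are assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a uniquely named temporary file containing
 * `data` and store its name in `path_out` (capacity `cap` bytes).
 * Returns 0 on success, -1 on error. The caller unlink()s the file when
 * the helper program has finished with it.
 */
int write_temp_input(const char *data, size_t len, char *path_out, size_t cap)
{
    static const char tmpl[] = "/tmp/helper-XXXXXX";
    if (cap < sizeof tmpl)
        return -1;
    memcpy(path_out, tmpl, sizeof tmpl);

    /* mkstemp() replaces XXXXXX and opens the file for this user only. */
    int fd = mkstemp(path_out);
    if (fd == -1)
        return -1;

    ssize_t written = write(fd, data, len);
    if (close(fd) != 0 || written != (ssize_t)len) {
        unlink(path_out);
        return -1;
    }
    return 0;
}
```

The resulting name in path_out can then be passed to the helper as its input argument, and a second file created the same way can serve as the output path.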