I have a c code with a character array initialized to "hello world".
I would like to know if there is a possibility to re-initialize this character array upon each execution of the code, to some other random string. If not C, may I know if such an implementation is possible in any other programming language?
In detail, let's say my code looks like:
char c[] = "hello world";
.
.
After executing this code once, I want the char array c to be initialized automatically to some other random string (and not "hello world") and this should be a permanent change. This need is for security reasons. May I know if such a thing is even possible?
If not, may I know if it is possible to let the code self-destroy after executing it once?
It would be much easier to answer this question if you could describe what you are trying to achieve, rather than a particular mechanism you thought of to achieve it.
Modifying the running executable is typically prevented by modern operating systems, and for good reasons (security, integrity, etc). And modifying the executable on-disk is also inadvisable, for similar reasons. I'm sure there are other ways to achieve what you wish without resorting to self-modifying code.
I have a c code with a character array initialized to "hello world'. I would like to know if there is a possibility to re-initialize a this character array upon each execution of the code, to some other random string.
Yes, it's possible, but you would need to give the user write access to the executable to do so, find the correct offset in the binary, patch the file and save it intact atomically, etc, etc. There are many ways this can go wrong. Don't do it this way.
Instead of a statically allocated string in the executable itself, just have a separate resource file that contains the string(s) and whatever other state is required. You can change this resource file (INI file, data file, whatever) to modify the string as and when required (or even delete it). To provide security, you can digitally sign the file, which allows you to verify the contents are legitimate. If the signature fails, the software can refuse to work. You can also encrypt the contents so that it cannot be read out by an inquisitive user. (Unless they are handy with a debugger!)
If not C, may I know if such an implementation is possible in any other programming language?
It's mainly an OS restriction, including a regular user being able to write to an installed app in a system folder (typically something you do not want to allow). And if you can't do it in C, you probably can't do it in another language!
After executing this code once, I want the char array c to be initialized automatically to some other random string (and not "hello world") and this should be a permanent change. ... May I know if such a thing is even possible?
Yes indeed, but use a separate file as described above. Don't modify the executable itself.
This need is for security reasons.
The reason self-modifying code is disallowed in the first place is for security reasons. If you are attempting to implement some kind of copy protection, you would do well to research existing methods and tools and best practices, and possibly even use an off-the-shelf solution. This stuff is hard, and there are people who crack software for fun who could easily get around all but the most sophisticated protection schemes.
If not, may I know if it is possible to let the code self-destroy after executing it once?
You might be able to get the executable to delete itself, but this would typically require elevated privileges too.
Just get the program to check the signature on the resource file and refuse to run if it isn't valid.
You can self modify a EXE file, but you need to know:
You are basically reinventing the wheel doing it that way
Self modifying code is usually (in fact, most likely) flagged by antivirus software. Destroying the EXE after it is executed is even more shady.
How EXE (and ELF) files are executed
What the assembly language is
Related
I have this funny idea: write some data (say variable of integer type) to the end of the executable itself and then read it on the next run.
Is this possible? Is it a bad thing to do (I'm pretty sure it's :) )? How one would approach this problem?
Additional:
I would prefer to do this with C under Linux OS, but answers with any combination of programming language/OS would be appreciated.
EDIT:
After some time playing with the idea, it became apparent that Linux won't allow to write to a file while it's being executed. However, it allows to delete it.
My vision of the writing process at this point:
make a copy of the program from withing a program
append data to the end of the copy
make a program to delete itself
rename copy to the original name
Will try to implement that as soon as I have some time.
If anyone is interested about how "delete itself" works under Linux - look for info about inode. It's not possible to do this under Windows, as far as I know (might be wrong).
EDIT 2:
Have implemented a working example under Linux with C.
It basically use a strategy described above, i.e. appending bits of data to the end of the copy program, deletes itself and renaming program to the original name. It accepts integers to save as single argument in the CLI, and prints old data as well.
This surely won't work under Windows (although I found some options on a quick search), but I'm curious how it's gonna behave under OS X.
Efficiency thoughts:
Obviously copying whole executable isn't efficient. I guess that something faster is possible with another helper executable that will do the same after program stops executing.
It's not reusing old space but just appending new data to the end on each run. This can be fixed with some footer reservation process (maybe will try to implement this in the future)
EDIT 3:
Surprisingly, it works with OS X! (ver. 10.11.5, default gcc).
There are several questions dealing with some aspects of this problem, but neither seems to answer it wholly. The whole problem can be summarized as follows:
You have an already compiled executable (obviously expecting the use of this technique).
You want to add an arbitrarily sized binary data to it (not necessarily by itself which would be another nasty problem to deal with).
You want the already compiled executable to be able to access this added binary data.
My particular use-case would be an interpreter, where I would like to make the user able to produce a single file executable out of an interpreter binary and the code he supplies (the interpreter binary being the executable which would have to be patched with the user supplied code as binary data).
A similar case are self-extracting archives, where a program (the archiving utility, such as zip) is capable to construct such an executable which contains a pre-built decompressor (the already compiled executable), and user-supplied data (the contents of the archive). Obviously no compiler or linker is involved in this process (Thanks, Mathias for the note and pointing out 7-zip).
Using existing questions a particular path of solution shows along the following examples:
appending data to an exe - This deals with the aspect of adding arbitrary data to arbitrary exes, without covering how to actually access it (basically simple append usually works, also true with Unix's ELF format).
Finding current executable's path without /proc/self/exe - In companion with the above, this would allow getting a file name to use for opening the exe, to access the added data. There are many more of these kind of questions, however neither focuses especially on the problem of getting a path suitable for the purpose of actually getting the binary opened as a file (which goal alone might (?) be easier to accomplish - truly you don't even need the path, just the binary opened for reading).
There also may be other, probably more elegant ways around this problem than padding the binary and opening the file for reading it in. For example could the executable be made so that it becomes rather trivial to patch it later with the arbitrarily sized data so it appears "within" it being in some proper data segment? (I couldn't really find anything on this, for fixed size data it should be trivial though unless the executable has some hash)
Can this be done reasonably well with as little deviation from standard C as possible? Even more or less cross-platform? (At least from maintenance standpoint) Note that it would be preferred if the program performing the adding of the binary data didn't rely on compiler tools to do it (which the user might not have), but solutions necessiting those might also be useful.
Note the already compiled executable criteria (the first point in the above list), which requires a completely different approach than solutions described in questions like C/C++ with GCC: Statically add resource files to executable/library or SDL embed image inside program executable , which ask for embedding data compile-time.
Additional notes:
The problems with the obvious approach outlined above and suggested in some comments, that to just append to the binary and use that, are as follows:
Opening the currently running program's binary doesn't seem something trivial (opening the executable for reading is, but not finding the path to supply to the file open call, at least not in a reasonably cross-platform manner).
The method of acquiring the path may provide an attack surface which probably wouldn't exist otherwise. This means that a potential attacker could trick the program to see different binary data (provided by him) like which the executable actually has, exposing any vulnerability which might reside in the parser of the data.
It depends on how you want other systems to see your binary.
Digital signed in Windows
The exe format allows for verifying the file has not been modified since publishing. This would allow you to :-
Compile your file
Add your data packet
Sign your file and publish it.
The advantage of following this system, is that "everybody" agrees your file has not been modified since signing.
The easiest way to achieve this scheme, is to use a resource. Windows resources can be added post- linking. They are protected by the authenticode digital signature, and your program can extract the resource data from itself.
It used to be possible to increase the signature to include binary data. Unfortunately this has been banned. There were binaries which used data in the signature section. Unfortunately this was used maliciously. Some details here msdn blog
Breaking the signature
If re-signing is not an option, then the result would be treated as insecure. It is worth noting here, that appended data is insecure, and can be modified without people being able to tell, but so is the code in your binary.
Appending data to a binary does break the digital signature, and also means the end-user can't tell if the code has been modified.
This means that any self-protection you add to your code to ensure the data blob is still secure, would not prevent your code from being modified to remove the check.
Running module
Windows GetModuleFileName allows the running path to be found.
Linux offers /proc/self or /proc/pid.
Unix does not seem to have a method which is reliable.
Data reading
The approach of the zip format, is to have a directory written to the end of the file. This means the data can be found at the end of the location, and then looked backwards for the start of the data. The advantage here, is the data blob is signposted from the end of the data, rather than the natural start.
Is it possible for one portable program to keep save data inside the application?
I don't want the program to create folders or files.
In order to do it in a portable way, you should have no assumptions about the architecture or operating system: you may or may not have access to the executable in the first place (it could be argv[0], but maybe it isn't. If you had access to the executable file, you could have the rights to open the file and modify it, but maybe you cannot do it.
If, anyway, you want to try it, you could:
Check if argv[0] is a file, that you have read and write permission, and if it is really your code (looking for a random string you can leave somewhere in your code, for example).
Choose a string to mark your modifications, for example, "Edenia", and check if the last bytes of that file are those. If so, the file has been previously modified, and you can read your data process it.
When you want to store additional data, add it to the end of the file (if it was not modified yet), or substitute the modifications it had. Don't forget to add the mark at the end of the file ("Edenia", or whatever).
Anyway, I still think this is not the proper way to store data: try to use external storage (files, database, etc) if you can.
So I've run into an interesting design pattern and I wanted to know if you guys had an opinion on it.
Basically, the design is passing everything around as a pre-serialized type. There is no "types" for the returns, for example. It is passed as a simple uint8_t*. There is a defined header that "tells" you what is in the buffer, how big it is, what the version of the buffer is, ect. I call it "pre-serialized" because it forces flattening of all structures.
The pros:
You can easily write it (or even a set of it) to what ever you want. Files, IO, whatever.
Can store arbitrary data.
The Cons: IMHO:
No type safety is going to be a nightmare
The programmer has to parse the code. Even if there is an enumerated type, the user would have to know what that type means. Even if there are functions to parse the type, the programmer has to know that is the function to call.
Version hell: changing code will cause a ripple effect of errors. Because everywhere is parsing it differently, you have no idea where the code works or where it is broken.
It is viral: because it is flat, you can't "insert" the header on the end of outside data. You could wrap the call if you copy your "data", but this could cause an unnecessary copy that would be SLOW. So either your code is slower than it needs to be, or you conform to this data structure.
It isn't human readable OR debug-able.
Have you seen this design pattern before? Is there a name for this design pattern? Things I missed?
Is there a name for this design pattern?
Well, Legacy Code? :) I have seen such design in 30 years old Cobol systems...
The pros you have stated are easily reachable also by using XML format (or JSON):
You can easily write it (or even a set of it) to what ever you want. Files, IO, whatever - most of all, web services!
Can store arbitrary data.
Furthermore, all your cons are eliminated.
The only pro I can see in your solution is conciseness - when every byte counts and you need to avoid any overhead as too expensive, then this is nice.
Added: Cobol has a feature to easily define the structure of such serialized data, see PICTURE clause. Reading the data is very easy then, you read them as variables. (Like if you have a binary data and define a struct in the C language and typecast the binary to the struct.)
As Honza said this would be normal in Legacy Cobol/PL1 (was there a Cobol/PL1 conversion or interface to COBOL programs ???).
In COBOL this design pattern would make sense, not sure about C though (one of the binary serialization packages or JSON etc might be more sensible).
In Cobol, you would have a Cobol copybook which all programs would use and could edit the data using the Cobol Copybook (with something like file-aid or Microfocus Data Editor).
Why use this "design pattern" in Cobol:
Regression testing of Modules; you can write a driver module like
Read Test-data-file
while more-data
Call Module
write Result to output-file
Read Test-data-file
end
You can then do a compare between Output from the
re-Change Program to the changed program.
Testing - some times you can use a "production file" in testing
A file provides trace or snapshot of what is going on, this can be very useful.
Easy to reorganize Batch streams:
Split a programs up (and pass the data via file). There variety of reason for doing this including
program has gotten to big and is hard to maintain.
Sorting the data
Performance (use a file rather than hitting the DB multiple times)
new uses for extracted data
While your cons are valid for C, they will be less of an issue in Cobol.
The key to using this "design pattern" is being able to edit/view/compare the format. If you can not edit/view/compare a file, I do not see the point
Update2:
Thanks for the input. I have implemented the algorithm and it is available for download at SourceForge. It is my first open source project so be merciful.
Update:
I am not sure I was clear enough or everyone responding to this understands how shells consume #! type of input. A great book to look at is Advanced Unix Programming. It is sufficient to call popen and feed its standard input as demonstrated here.
Original Question:
Our scripts run in highly distributed environment with many users. Using permissions to hide them is problematic for many reasons.
Since the first line can be used to designate the "interpreter" for a script the initial line can be used to define a a decrypter
#!/bin/decryptandrun
*(&(*S&DF(*SD(F*SDJKFHSKJDFHLKJHASDJHALSKJD
SDASDJKAHSDUAS(DA(S*D&(ASDAKLSDHASD*(&A*SD&AS
ASD(*A&SD(*&AS(D*&AS(*D&A(SD&*(A*S&D(A*&DS
Given that I can write the script to encrypt and place the appropriate header I want to decrypt the script (which itself may have an interpreter line such as #!/bin/perl at the top of it) without doing anything dumb like writing it out to a temporary file. I have found some silly commercial products to do this. I think this could be accomplished in a matter of hours. Is there a well known method to do this with pipes rather than coding the system calls? I was thinking of using execvp but is it better to replace the current process or to create a child process?
If your users can execute the decryptandrun program, then they can read it (and any files it needs to read such as decryption keys). So they can just extract the code to decrypt the scripts themselves.
You could work around this by making the decrtyptandrun suid. But then any bug in it could lead to the user getting root privileges (or at least privileges to the account that holds the decryption keys). So that's probably not a good idea. And of course, if you've gone to all the trouble of hiding the contents or keys of these decryption scripts by making them not readable to the user... then why can't you do the same with the contents of the scripts you're trying to hide?
Also, you can't have a #! interpreted executable as an interpreter for another #! interpreted executable.
And one of the fundamental rules of cryptography is, don't invent your own encryption algorithm (or tools) unless you're an experienced cryptanalyst.
Which leads me to wonder why you feel the need to encrypt scripts that your users will be running. Is there anything wrong with them seeing the contents of the scripts?
Brian Campbell's answer has the right idea, I'll spell it out:
You need to make your script unreadable but executable by the user (jbloggs), and to make decodeandrun setuid. You could make it setuid root, but it would be much safer to make it setgid for some group decodegroup instead, and then set the script file's group to decodegroup. You need to make sure that decodegroup has both read and execute permissions on the script file and that jbloggs is not a member of this group.
Note that decodegroup needs read permission for decodeandrun to be able to read the text of the script file.
With this setup, it is then possible (on Linux at least) for jbloggs to execute the script but not to look at it. But observe that this makes the decryption process itself unnecessary -- the script file might as well be plaintext, since jbloggs can't read it.
[UPDATE: Just realised that this strategy doesn't handle the case where the encrypted contents is itself a script that starts with #!. Oh well.]
You're solving the wrong problem. The problem is that you have data which you don't want your users to access, and that data's stored in a location to which the users have access. Start by attempting to fix the problem of users with more access than they require...
If you can't protect the whole script, you may want to look into just protecting the data. Move it to a separate location and encrypt it. Encrypt the data with a key only accessible by a specific ID (preferably not root), and write a small suid program to access the data. In your setuid program, do your validation of who should be running the program, and compare the name / checksum of the calling program (you can inspect the command line for the process in combination with the calling process's cwd to find the path, use lsof or the /proc filesystem) with the expected value before decrypting.
If it takes more than that, you really need to reevaluate the state of users on the system - they either have too much access or you have too little trust. :)
All of the exec()-family functions you link to accept a filename, not a memory address. I'm not sure at all how you would go about doing what you want, i.e. "hooking" in a decryption routine and then re-directing to the decrypted script's #! interpreter.
This would require you to decrypt the script into a temporary file, and pass that filename to the exec() call, but you (very reasonably) said you didn't want to expose the script by putting it in a temporary file.
If it were possible to tell the kernel to replace a new process with an existing one in memory, you would have a path to follow, but as far as I know, it isn't. So I don't think it will be very easy to do this "chained" #! following.