how to find source file name from executable? - c

IN LINUX:
Not sure if it is possible. I have 100 source file, and 100 respective executable files.
Now, given the executable file, is it possible to determine, respective source file.

I guess you can give this a try.
readelf -s a.out | grep FILE
I think you can add some grep and sed magic to the above command and get the source file name.

No, since your assumption, that a single binary comes from exactly one source file, is very false.
Most real applications consist of hundreds, if not thousands, of individual source files that are all compiled separately, with the results liked together to form the binary.
If you have non-stripped binaries, or (even better) binaries compiled with debugging information present, then there might (or will, for the case of debugging info) be information left in the file to allow you to figure out the names of the source files, but in general you won't have such binaries unless you build them yourself.

If source filenames are present in an executable, you can find them with:
strings executable | grep '\.c'
But filenames may or may not be present in the executable and they may or may not represent the source filenames.
Change .c to whatever extension you assume the program has been written in.

Your question only makes sense if we presume that it is a given fact that every single one of these 100 executables comes from a single source file, and that you have all those source files and are capable of compiling them all.
What you can do is to declare within each source file a string that looks like "HERE!HERE!>>>" + __FILE__ and then write a utility which searches for "HERE!HERE!>>>" inside the executable and parses the string which follows it. __FILE__ is a preprocessor directive which expands to the full pathname of the source file being compiled.

This kind of help falls in the 'close the barn door after the horse has run away' kind of thing, but it might help future posters.
This is an old problem. UNIX and Linux support the what command which was invented by Mark Rochkind (if I remember correctly), for his version of SCCS. Handles exactly this type of problem. It is only 100% reliable for one source file -> one exectuable (or object file ) kind of thing. There are other more important uses.
char unique_id[] = "#(#)identification information";
The #(#) is called a "what string" and does not occur as a by-product of compiling source into an executable image. Use what from the command line. Inside code use maybe something like this (assumes you get only one file name as an answer, therefore choose your what strings carefully):
char *foo(char *whoami, size_t len_whoami)
{
char tmp[80]={0x0};
FILE *cmd;
sprintf(tmp, "/usr/bin/grep -F -l '%s' /path/to/*.c", unique_id);
cmd=popen(tmp, "r");
fgets(whoami, len_whoami, cmd);
pclose(cmd);
return whoami;
}
will return the source code file name with the same what string from which your executable was built. In other words, exactly what you asked, except I'm sure you never heard of what strings, so they do not exist in your current code base.

Related

How do I bundle/include and use files into an executable in C - linux?

I've googled this for a bit but I don't seem to get anything resembling what I’m asking.
I'm creating a very simple C script which creates a bunch of template files (none of them are code or libraries or anything, they're just txt files), depending on an argument passed through the console, the way I was gonna resolve this is to just use fopen and fwrite, basically assembling the files line by line, replacing only the ones I need, but I figure there must be a way to bundle some files into the code so I can open and replace just what needs to change from case to case.
I imagine i could also create a couple of const char *file_text and that would do the trick, but I'd like to know if what I'm asking is possible should the need to use something harder to work with than text arise.
Problem is I can't seem to find how, to be clear, I want to do something like this on the console:
./Gen.sh ProjectName
And have Gen.sh be a self-contained file, with no need to keep the original templates around on the same folder.
You can use the linker to do such embedding
$ ld -r -b binary cat.png -o cat.o
The object file will have three symbols in it,
$ nm cat.o
_cat_start
_cat_end
_cat_size
To use them from C, declare some extern variables
extern const char cat_start;
extern const char cat_end;
extern const int cat_size;
Then you can link the generated object file ,the same can be used for text files

How to check whether two executable binary files are generated from same source code?

For example, I have two C binary executable files. How can I determine whether the two were generated using same source code or not?
In general, this is completely impossible to do.
You can generate different binaries from the same source
Two identical binaries can be generated from different sources
It is possible to add version information in different ways. However, you can fool all of those methods quite easily if you want.
Here is a short script that might help you. Note that it might have flaws. It's just to show the idea. Don't just copy this and use in production code.
#!/bin/bash
STR="asm(\".ascii \\\"$(md5sum $1)\\\"\");"
NEWNAME=$1.aux.c
cp $1 $NEWNAME
echo $STR >> $NEWNAME
gcc $NEWNAME
What it does is basically to make sure that the md5sum of the source gets included as a string in the binary. It's gcc specific, and you can read more about the idea here: embed string via header that cannot be optimized away

How to check the given file is binary file or not in C programming?

I'm trying to check the given file is binary or not.
I refer the link given below to find the solution,
How can I check if file is text (ASCII) or binary in C
But the given solutions is not working properly, If I pass the .c file as argument, Its not working, It gives wrong output.
The possible files I may pass as argument:
a.out
filename.c
filename.txt
filename.pl
filename.php
So I need to know whether there is any function or way to solve the problem?
Thanks...
Note : [ Incase of any query, Please ask me before down vote ]
You need to clearly define what a binary file is for you, and then base your checking on that.
If you just want to filter based on file extensions, then create a list of the ones you consider binary files, and a list of the ones you don't consider binary files and then check based on that.
If you have a list of known formats rather then file extensions, attempt to parse the file in the formats, and if it doesn't parse / the parse result makes no sense, it's a binary file (for your purposes).
Depending on your OS, binary files begin with a header specifying that they are an executable and contain several informations about them (architecture, size, byte order etc.). So by trying to parse this header, you should know if a file is binary or not. If you are on mac, check the Mach-O file format, if you are on Linux, it should be ELF format. But beware, it's a lot of documentation.

How to find execute files in Linux?

I wants to get the names of execute files in some directory in Linux.
How can I do it?
I tried to use opendir like this:
dir = opendir(directoryName);
I need to get only the names of the execute files.
I programming in C.
thanks :)
You should define what you mean by executable files.
That could be any file with its execute bit (it is the owner, group, or other) set. Then test with access(2) & X_OK and/or use stat(2).
That could also be only ELF executables. See elf(5); then the issue might be to check that a file could indeed be executed, which might be difficult (what about missing library dependencies? or ill-formed ELF files?). Maybe use libelf (and/or libmagic to do the equivalent of file(1) command).
To scan recursively a file tree, use nftw(3); to scan just a directory use opendir(3) & readdir(3) (don't forget the closedir!), then you'll probably need to build the complete file path from each directory entry (perhaps using snprintf(3) or asprintf(3))
See also Advanced Linux Programming

How to stop file names/paths from appearing in compiled C binary

This may be compiler specific, in which case I am using the IAR EWARM 5.50 compiler (firmware development for the STM32 chip).
Our project consists of a bunch of C-code libraries that we compile first, and then the main application which compiles its C-code and then links in those libraries (pretty standard stuff).
However, if I use a hex editor and open up any of the library object files produced or the final application binary, I find a whole bunch of plain text references inside the output binary to the file paths of the C files that were compiled. (eg. I see "C:\Development\trunk\Common\Encryption\SHA_1.c")
Two issues with this:
we don't really want the file paths being easily readable as that indicates our design some what
the size of the binary grows if you have your C-files located in a long subdirectory (the binary contains the full path, not just the name)...this is especially important when we're dealing with firmware that has a limited amount of code space (256KB).
Any thoughts on this? I've tried all the switches in the compiler I can think of to "remove debug information", etc., but those paths are still in there.
"The command-line option --no_path_in_file_macros has been added. It removes the path leaving only the filename for the symbols FILE and BASE_FILE."
It is defined in the release notes if IAR.
http://supp.iar.com/FilesPublic/UPDINFO/005832/arm/doc/infocenter/iccarm_history.ENU.html
Or you can look for FILE and BASE_FILE macros and remove it you do not want to use the flag.

Resources