Counting the number of # includes and # define - c

I'd like to use C program to find the total number of directives like #include, #define, #ifdef, #typedef, etc. Could you suggest any logic for that? I'm not interested in using any scripting or tools. I want it to be done purely using C program.

Store all the directives in an array of pointers (or arrays).
Read the C file line by line and check if the first word starts with any of the directives in the list excluding any whitespaces at the beginning.
char *directives[] = {"#assert", "#define", ......};
int count[NUM_DIRS]= { 0 };
Every time you find a match, increment the corresponding index of the count array. You can also maintain a separate counter for the total, to avoid summing the values in the count array afterwards.
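A minimal sketch of the lookup described above, assuming the directive names are stored without the leading # so that spellings such as "# define" also match. The table contents and the function name directive_index are illustrative, not part of the original answer, and the list is not exhaustive:

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Illustrative (not exhaustive) table of directive names, without the '#'. */
static const char *const directive_names[] = {
    "include", "define", "undef", "ifdef", "ifndef",
    "if", "elif", "else", "endif", "pragma"
};
enum { NUM_DIRS = sizeof directive_names / sizeof directive_names[0] };

/* Returns the index into directive_names of the directive on this line,
   or -1 if the line does not start with a known directive. */
int directive_index(const char *line)
{
    const char *p = line;
    while (isspace((unsigned char)*p))   /* skip leading whitespace */
        p++;
    if (*p != '#')
        return -1;
    p++;
    while (*p == ' ' || *p == '\t')      /* "#  include" is also valid */
        p++;
    size_t len = 0;
    while (isalpha((unsigned char)p[len]))
        len++;
    for (int i = 0; i < NUM_DIRS; i++) {
        /* Compare whole words so that "if" does not match "ifdef". */
        if (strlen(directive_names[i]) == len &&
            strncmp(p, directive_names[i], len) == 0)
            return i;
    }
    return -1;
}
```

A caller would read the file line by line, call directive_index on each line, and increment count[i] for every non-negative result.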

Assuming you don't want to parse them, or do any other kind of syntactic/semantic analysis, you can simply count the number of lines which start with zero or more whitespace characters followed by a # character (loosely tested, should work fine):
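#include <stdio.h>
#include <ctype.h>
int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s file.c\n", argv[0]);
        return 1;
    }
    FILE *f = fopen(argv[1], "r");
    if (f == NULL) {
        perror(argv[1]);
        return 1;
    }
    char line[1024];
    unsigned ncppdirs = 0;
    /* Loop on fgets itself: testing feof() first can process the last line twice. */
    while (fgets(line, sizeof(line), f) != NULL) {
        char *p = line;
        /* Skip leading whitespace; the cast keeps isspace() well defined for negative char values. */
        while (isspace((unsigned char)*p))
            p++;
        if (*p == '#') ncppdirs++;
    }
    fclose(f);
    printf("%u preprocessor directives found\n", ncppdirs);
    return 0;
}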

You might take advantage of the fact that gcc -H shows you every included file; you could then popen that command and (simply) parse its output.
You could also parse the preprocessed output given by gcc -C -E; it contains line information, as lines starting with #.
Just lexically counting the occurrences of #include is not enough, because it does happen (quite often, actually; see what <features.h> does) that some included files do tricks like
#if SOME_SYMBOL > 2
#include "some-internal-header.h"
#define SOME_OTHER_SYMBOL (SOME_SYMBOL+1)
#endif
and some later include would have #if SOME_OTHER_SYMBOL > 4
And the compilation command might BTW define SOME_SYMBOL with e.g. gcc -DSOME_SYMBOL=3 (and such tricks happen a lot, often in Makefile-s, and just optimizing with -O2 makes __OPTIMIZE__ a preprocessor defined symbol).
If you want some more deep information about source programs, consider making GCC plugins or extensions, e.g. with MELT (a domain specific language to extend GCC). For instance, counting Gimple instructions in the intermediate representation is more sensible than counting lines of code.
Also, some macros might do some typedef; some programs may have
#define MYSTRUCTYPE(Name) typedef struct Name##_st Name##_t;
and later use e.g. MYSTRUCTYPE(point); what does that mean about counting typedef-s?

Related

Self-replicating code, how to implement different behavior in first iteration vs following ones?

So I'm having a tough time with a school project. The goal is to make a self-replicating program, named Sully.c. That program must output its own source code (it's a quine) into a file named Sully_x.c, where x is an integer in the source code, then compile said file and execute it iff x > 0. x must decrement from one copy to the next, but not from the original Sully.c to Sully_5.c.
Here is my code so far:
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int k = 5;
#define F1 int main(void){int fd = open("Sully_5.c", 0);if(fd != -1){close(fd);k-=1;}char buff[62];(sprintf)(buff, "Sully_%d.c", k);FILE *f = fopen(buff, "w");fprintf(f, "#include <fcntl.h>\n#include <stdio.h>\n#include <stdlib.h>\n#include <unistd.h>\nint k = %d;\n#define F1 %s\n#define F2(x) #x\n#define F3(x) F2(x)\nconst char *s = F3(F1);\nF1\n", k, s);fclose(f);(sprintf)(buff, "gcc -Wall -Wextra -Werror Sully_%d.c -o Sully_%d", k, k);system(buff);if (k != 0){(sprintf)(buff, "./Sully_%d", k);system(buff);}return 0;}
#define F2(x) #x
#define F3(x) F2(x)
const char *s = F3(F1);
F1
That code works, and meets all the requirements for the program. However, I'm using a method that checks something other than the code itself: I'm checking whether Sully_5.c already exists or not. If it doesn't, x doesn't change; if it does, x is decremented.
Another method would have been to use argv[0] or the macro __FILE__, but both these options are explicitly forbidden for the assignment and considered cheating.
But apparently there are other methods that don't require any of the above techniques. I can't think of any, because if Sully.c and Sully_5.c need different behaviors but the same source code, then there must be an external variable that influences the code's behavior, or so is my hypothesis.
Am I right? Wrong? How else could this be done?
... there must be an external variable that needs to influence the code behavior
How else could this be done?
You can define (or not) some preprocessor macros (e.g. -Daze or -Daze=12) to generate different code using conditional compilation, without changing the source.
The program can also use the argument(s) it is given when it is run to change its behavior.
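As a sketch of the first suggestion, the decrement can be selected at compile time. SULLY_COPY is a hypothetical macro name, assumed to be passed only on the command line that builds the generated copies (for example gcc -DSULLY_COPY ...), so that the original and its copies share identical source. Whether the assignment permits this is not something the answer states:

```c
/* Returns the value of x to write into the next generated file.
   SULLY_COPY is a hypothetical macro: it is assumed to be defined only
   when a generated copy is compiled, never for the original Sully.c. */
int next_k(int k)
{
#ifdef SULLY_COPY
    return k - 1;   /* a copy decrements the counter */
#else
    return k;       /* the original leaves it unchanged */
#endif
}
```

The same source therefore behaves differently depending only on how it was compiled, not on files that exist on disk.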

path: valid or not in `#include`?

Today I read this question Any rules about underscores in filenames in C/C++?,
and I found it very interesting that the standard seems to not allow what is usually seen in many libraries (I also do it in my personal library this way):
For example, in opencv we can see this:
// File: opencv/include/opencv2/opencv.hpp
#include "opencv2/opencv_modules.hpp"
But the standard says:
§ 6.10.2 Source file inclusion
Semantics
5 The implementation shall provide unique mappings for sequences
consisting of one or more nondigits or digits (6.4.2.1) followed by a
period (.) and a single nondigit. The first character shall not be
a digit. The implementation may ignore distinctions of alphabetical
case and restrict the mapping to eight significant characters before
the period.
nondigit means letters (A-Z a-z) and underscore _.
It says absolutely nothing about / which would imply that it is forbidden to use a path, not to mention dots or hyphens in file names.
To test this first, I wrote a simple program with a source file test.c and a header file _1.2-3~a.hh in the same directory tst/:
// File: test.c
#include "./..//tst//./_1.2-3~a.hh"
int main(void)
{
char a [10] = "abcdefghi";
char b [5] = "qwert";
strncpy(b, a, 5 - 1);
printf("b: \"%c%c%c%c%c\"\n", b[0], b[1], b[2], b[3], b[4]);
/* printed: b: "abcdt" */
b[5 - 1] = '\0';
printf("b: \"%c%c%c%c%c\"\n", b[0], b[1], b[2], b[3], b[4]);
/* printed: b: "abcd" */
return 0;
}
// File: _1.2-3~a.hh
#include <stdio.h>
#include <string.h>
Which I compiled with these options: $ gcc -std=c11 -pedantic-errors test.c -o tst with no complaint from the compiler (I have gcc (Debian 8.2.0-8) 8.2.0).
Is it really forbidden to use a relative path in an include?
Ah; the standard is really talking about the minimum character set of the filesystem supporting the C compiler.
Anything in the "" (or <> with some preprocessing first) is parsed as a string according to normal C rules and passed from there to the OS to do whatever it wants with it.
This leads to compiler errors on Windows when the programmer forgets to type \\ instead of '\' when writing a path into the header files. On modern Windows we can just use '/' and expect it to work but on older Windows or DOS it didn't.
For extra fun, try
#include "/dev/tty"
Really nice one. It wants you to type C code while compiling.
I would say it's not forbidden but not recommended, since it will fail to compile in some cases.
For example:
if you clone this directory into your root (so you'd have C:\test\).
if you try to run it in a virtual environment online, you may face issues.
Is it really forbidden to use a path in an include?
Not sure what you mean here: relative paths are commonly used, but using absolute path would be foolish.

Check if a system implements a function

I'm creating a cross-system application. It uses, for example, the function itoa, which is implemented on some systems but not all. If I simply provide my own itoa implementation:
header.h:115:13: error: conflicting types for 'itoa'
extern void itoa(int, char[]);
In file included from header.h:2:0,
from file.c:2:0,
c:\path\to\mingw\include\stdlib.h:631:40: note: previous declaration of 'itoa' was here
_CRTIMP __cdecl __MINGW_NOTHROW char* itoa (int, char*, int);
I know I can check if macros are predefined and define them if not:
#ifndef _SOME_MACRO
#define _SOME_MACRO 45
#endif
Is there a way to check if a C function is pre-implemented, and if not, implement it? Or to simply un-implement a function?
Given you have already written your own implementation of itoa(), I would recommend that you rename it and use it everywhere. At least you are sure you will get the same behavior on all platforms, and avoid the linking issue.
Don't forget to explain your choice in the comments of your code...
I assume you are using GCC, as I can see MinGW in your path... there's one way the GNU linker can take care of this for you. So you don't know whether there is an itoa implementation or not. Try this:
Create a new file (without any headers) called my_itoa.c:
char *itoa (int, char *, int);
char *my_itoa (int a, char *b, int c)
{
return itoa(a, b, c);
}
Now create another file, impl_itoa.c. Here, write the implementation of itoa but add a weak alias:
char* __attribute__ ((weak)) itoa(int a, char *b, int c)
{
// implementation here
}
Compile all of the files, with impl_itoa.c at the end.
This way, if itoa is not available in the standard library, this one will be linked. You can be confident about it compiling whether or not it's available.
Ajay Brahmakshatriya's suggestion is a good one, but unfortunately MinGW doesn't support weak definition last I checked (see https://groups.google.com/forum/#!topic/mingwusers/44B4QMPo8lQ, for instance).
However, I believe weak references do work in MinGW. Take this minimal example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
__attribute__ ((weak)) char* itoa (int, char*, int);
char* my_itoa (int a, char* b, int c)
{
if(itoa != NULL) {
return itoa(a, b, c);
} else {
// toy implementation for demo purposes
// replace with your own implementation
strcpy(b, "no itoa");
return b;
}
}
int main()
{
char *str = malloc((sizeof(int)*3+1));
my_itoa(10, str, 10);
printf("str: %s\n", str);
return 0;
}
If the system provides an itoa implementation, that should be used and the output would be
str: 10
Otherwise, you'll get
str: no itoa
There are two really important related points worth making here along the "don't do it like this" lines:
Don't use itoa because it's not safe.
Don't use itoa because it's not a standard function, and there are good standard functions (such as snprintf) which are available to do what you want.
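To illustrate the second point, here is a minimal sketch of an integer-to-string helper built on the standard snprintf. The name int_to_str and its return convention are assumptions made for this example, not something from the original answer:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes the decimal form of n into buf, which has
   room for size bytes. Returns 0 on success, or -1 if the text (including
   the terminating '\0') does not fit or snprintf reports an error. */
int int_to_str(int n, char *buf, size_t size)
{
    int written = snprintf(buf, size, "%d", n);
    return (written >= 0 && (size_t)written < size) ? 0 : -1;
}
```

Unlike a typical itoa, the caller passes the buffer size, so an undersized buffer is reported rather than overflowed.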
But, putting all this aside for one moment, I want to introduce you to autoconf, part of the GNU build system. autoconf is part of a very comprehensive, very portable set of tools which aim to make it easier to write code which can be built successfully on a wide range of target systems. Some would argue that autoconf is too complex a system to solve just the one problem you pose with just one library function, but as any program grows, it's likely to face more hurdles like this, and getting autoconf set up for your program now will put you in a much stronger position for the future.
Start with a file called Makefile.in which contains:
CFLAGS=--ansi --pedantic -Wall -W
program: program.o
program.o: program.c
clean:
rm -f program.o program
and a file called configure.ac which contains:
AC_PREREQ([2.69])
AC_INIT(program, 1.0)
AC_CONFIG_SRCDIR([program.c])
AC_CONFIG_HEADERS([config.h])
# Checks for programs.
AC_PROG_CC
# Checks for library functions.
AH_TEMPLATE([HAVE_ITOA], [Set to 1 if function itoa() is available.])
AC_CHECK_FUNC([itoa],
[AC_DEFINE([HAVE_ITOA], [1])]
)
AC_CONFIG_FILES([Makefile])
AC_OUTPUT
and a file called program.c which contains:
#include <stdio.h>
#include "config.h"
#ifndef HAVE_ITOA
/*
* WARNING: This code is for demonstration purposes only. Your
* implementation must have a way of ensuring that the size of the string
* produced does not overflow the buffer provided.
*/
void itoa(int n, char* p) {
sprintf(p, "%d", n);
}
#endif
int main(void) {
char buffer[100];
itoa(10, buffer);
printf("Result: %s\n", buffer);
return 0;
}
Now run the following commands in turn:
autoheader: This generates a new file called config.h.in which we'll need later.
autoconf: This generates a configuration script called configure
./configure: This runs some tests, including checking that you have a working C compiler and, because we've asked it to, whether an itoa function is available. It writes its results into the file config.h for later.
make: This compiles and links the program.
./program: This finally runs the program.
During the ./configure step, you'll see quite a lot of output, including something like:
checking for itoa... no
In this case, you'll see that the config.h file contains the following lines:
/* Set to 1 if function itoa() is available. */
/* #undef HAVE_ITOA */
Alternatively, if you do have itoa available, you'll see:
checking for itoa... yes
and this in config.h:
/* Set to 1 if function itoa() is available. */
#define HAVE_ITOA 1
You'll see that the program can now read the config.h header and choose to define itoa if it's not present.
Yes, it's a long way round to solve your problem, but you've now started using a very powerful tool which can help you in a great number of ways.
Good luck!

How to print the names of the built-in functions used in our program from a specific header file in C?

I need to find the built-in functions used in our program from a specific header file.
For example, I have the C file below:
#include<stdio.h>
int main()
{
int a;
scanf("%d",&a);
printf("a = %d\n", a);
}
If I give the stdio.h header file to some command, it needs to give the output below:
scanf
printf
Is there any built-in command to get this?
Or any options available in the gcc or cc command to get this?
If you are using GCC as compiler, you can run this command:
echo "#include <stdio.h>" | gcc -E -
This will print many lines from the stdio.h header, and from the files that are included by that header, and so on.
Some lines look like #line …, they tell you where the following lines come from.
You can analyze these lines, but extracting the functions from them (parsing) is quite complicated. But if you just want a quick, unreliable check, you could search whether these lines contain the word scanf or printf.
EDIT
As suggested in a comment, the -aux-info is more useful, but it works only when compiling a file, not when preprocessing. Therefore:
cat <<EOF >so.c
#include <stdio.h>
int main(int argc, char **argv) {
for (int i = 1; i < argc; i++) {
fprintf(stdout, "%s%c", argv[i], i < argc - 1 ? ' ' : '\n');
}
fflush(stdout);
return ferror(stdout) != 0;
}
EOF
gcc -c so.c -aux-info so.aux
Determining the function calls from your program can be done using objdump, as follows:
objdump -t so.o
The above commands give you the raw data. You still need to parse this data and combine it to only give you the data relevant to your question.

Executing machine code in memory

I'm trying to figure out how to execute machine code stored in memory.
I have the following code:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char* argv[])
{
FILE* f = fopen(argv[1], "rb");
fseek(f, 0, SEEK_END);
unsigned int len = ftell(f);
fseek(f, 0, SEEK_SET);
char* bin = (char*)malloc(len);
fread(bin, 1, len, f);
fclose(f);
return ((int (*)(int, char *)) bin)(argc-1, argv[1]);
}
The code above compiles fine in GCC, but when I try and execute the program from the command line like this:
./my_prog /bin/echo hello
The program segfaults. I've figured out the problem is on the last line, as commenting it out stops the segfault.
I don't think I'm doing it quite right, as I'm still getting my head around function pointers.
Is the problem a faulty cast, or something else?
You need a page with write execute permissions. See mmap(2) and mprotect(2) if you are under unix. You shouldn't do it using malloc.
Also, read what the others said, you can only run raw machine code using your loader. If you try to run an ELF header it will probably segfault all the same.
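A minimal sketch of that approach on a POSIX system, assuming the input really is raw, position-independent machine code for the current CPU. The helper name load_executable_copy is hypothetical. It fills the mapping while it is writable and only then switches it to read+execute, rather than keeping a page that is writable and executable at once:

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc in strict modes */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: copies len bytes of raw machine code into a fresh,
   page-aligned anonymous mapping and makes that mapping read+execute.
   Returns the mapping, or NULL on failure. */
void *load_executable_copy(const void *code, size_t len)
{
    if (len == 0)
        return NULL;
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;
    memcpy(mem, code, len);
    /* mmap returns page-aligned memory, so mprotect applies to it cleanly. */
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return NULL;
    }
    return mem;
}
```

The returned pointer can then be cast to a function pointer type and called, but only if the bytes are valid code for the machine; an ELF file loaded this way still starts with a header, not instructions.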
Regarding the content of replies and downmods:
1- OP said he was trying to run machine code, so I replied on that rather than executing an executable file.
2- See why you don't mix malloc and mman functions:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <sys/mman.h>
int main()
{
char *a=malloc(10);
char *b=malloc(10);
char *c=malloc(10);
memset (a,'a',4095);
memset (b,'b',4095);
memset (c,'c',4095);
puts (a);
memset (c,0xc3,10); /* return */
/* c is not aligned to a page boundary so this is a NOOP.
Many implementations include a header to malloc'ed data so it's always a NOOP. */
mprotect(c,10,PROT_READ|PROT_EXEC);
b[0]='H'; /* oops it is still writeable. If you provided an aligned
address it would segfault */
char *d=mmap(0,4096,PROT_READ|PROT_WRITE|PROT_EXEC,MAP_PRIVATE|MAP_ANON,-1,0);
memset (d,0xc3,4096);
((void(*)(void))d)();
((void(*)(void))c)(); /* oops it isn't executable */
return 0;
}
It displays exactly this behavior on Linux x86_64; other ugly behavior is sure to arise on other implementations.
Using malloc works fine.
OK, this is my final answer; please note I used the original poster's code.
I'm loading the compiled version of this code from disk into a heap-allocated area "bin", just as the original code did. The file name is fixed rather than taken from argv, and the value 0x674 comes from:
objdump -F -D foo|grep -i hoho
08048674 <hohoho> (File Offset: 0x674):
This can be looked up at run time with BFD (the Binary File Descriptor library) or something else; you can call other binaries (not just yourself) so long as they are statically linked to the same set of libraries.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
unsigned char *charp;
unsigned char *bin;
void hohoho()
{
printf("merry mas\n");
fflush(stdout);
}
int main(int argc, char **argv)
{
int what;
charp = malloc(10101);
memset(charp, 0xc3, 10101);
mprotect(charp, 10101, PROT_EXEC | PROT_READ | PROT_WRITE);
__asm__("leal charp, %eax");
__asm__("call (%eax)" );
printf("am I alive?\n");
char *more = strdup("more heap operations");
printf("%s\n", more);
FILE* f = fopen("foo", "rb");
fseek(f, 0, SEEK_END);
unsigned int len = ftell(f);
fseek(f, 0, SEEK_SET);
bin = (char*)malloc(len);
printf("read in %d\n", fread(bin, 1, len, f));
printf("%p\n", bin);
fclose(f);
mprotect(&bin, 10101, PROT_EXEC | PROT_READ | PROT_WRITE);
asm volatile ("movl %0, %%eax"::"g"(bin));
__asm__("addl $0x674, %eax");
__asm__("call %eax" );
fflush(stdout);
return 0;
}
running...
co tmp # ./foo
am I alive?
more heap operations
read in 30180
0x804d910
merry mas
You can use UPX to manage the load/modify/exec of a file.
P.S. sorry for the previous broken link :|
It seems to me you're loading an ELF image and then trying to jump straight into the ELF header? http://en.wikipedia.org/wiki/Executable_and_Linkable_Format
If you're trying to execute another binary, why don't you use the process creation functions for whichever platform you're using?
A typical executable file has:
a header
entry code that is called before main(int, char **)
The first means that you can't generally expect byte 0 of the file to be executable; instead, the information in the header describes how to load the rest of the file in memory and where to start executing it.
The second means that when you have found the entry point, you can't expect to treat it like a C function taking arguments (int, char **). It may, perhaps, be usable as a function taking no parameters (and hence requiring nothing to be pushed prior to calling it). But you do need to populate the environment that will in turn be used by the entry code to construct the command line strings passed to main.
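As a small illustration of the first point, a loader can at least detect that a buffer begins with a file header rather than code. The sketch below checks for the four-byte ELF magic number (0x7f followed by "ELF"); the function name looks_like_elf is an assumption made for this example:

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the buffer begins with the 4-byte ELF magic number, else 0.
   Such a buffer starts with a header, so jumping to byte 0 cannot work. */
int looks_like_elf(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };
    return len >= sizeof magic && memcmp(buf, magic, sizeof magic) == 0;
}
```

In the question's program, calling this on bin after the fread would show that /bin/echo is an ELF image and not a bare function body.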
Doing this by hand under a given OS would go into some depth which is beyond me; but I'm sure there is a much nicer way of doing what you're trying to do. Are you trying to execute an external file as a one-off operation, or load an external binary and treat its functions as part of your program? Both are catered for by the C libraries in Unix.
It is more likely that the code jumped to by the call through the function pointer is causing the segfault, rather than the call itself. There is no way to determine from the code you have posted whether the code loaded into bin is valid. Your best bet is to use a debugger, switch to assembler view, break on the return statement and step into the function call to determine that the code you expect to run is indeed running, and that it is valid.
Note also that in order to run at all the code will need to be position independent and fully resolved.
Moreover if your processor/OS enables data execution prevention, then the attempt is probably doomed. It is at best ill-advised in any case, loading code is what the OS is for.
What you are trying to do is something akin to what interpreters do. Except that an interpreter reads a program written in an interpreted language like Python, compiles that code on the fly, puts executable code in memory and then executes it.
You may want to read more about just-in-time compilation too:
Just in time compilation
Java HotSpot JIT runtime
There are libraries available for JIT code generation such as the GNU lightning and libJIT, if you are interested. You'd have to do a lot more than just reading from file and trying to execute code, though. An example usage scenario will be:
Read a program written in a scripting language (maybe your own).
Parse and compile the source into an intermediate language understood by the JIT library.
Use the JIT library to generate code for this intermediate representation, for your target platform's CPU.
Execute the JIT-generated code.
And for executing the code you'd have to use techniques such as using mmap() to map the executable code into the process's address space, marking that page executable and jumping to that piece of memory. It's more complicated than this, but it's a good start in order to understand what's going on beneath all those interpreters of scripting languages such as Python, Ruby etc.
The online version of the book "Linkers and Loaders" will give you more information about object file formats, what goes on behind the scenes when you execute a program, the roles of the linkers and loaders and so on. It's a very good read.
You can dlopen() a file, look up the symbol "main" and call it with 0, 1, 2 or 3 arguments (all of type char*) via a cast to pointer-to-function-returning-int-taking-0,1,2,or3-char*
Use the operating system for loading and executing programs.
On unix, the exec calls can do this.
Your snippet in the question could be rewritten:
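#include <stdio.h>
#include <unistd.h>
int main(int argc, char* argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s program [args...]\n", argv[0]);
        return 1;
    }
    /* Pass argv + 1 so the new program sees its own path as argv[0]. */
    execv(argv[1], argv + 1);
    perror("execv"); /* only reached if execv failed */
    return 1;
}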
Executable files contain much more than just code. Header, code, data, more data: this stuff is separated and loaded into different areas of memory by the OS and its libraries. You can't load a program file into a single chunk of memory and expect to jump to its first byte.
If you are trying to execute your own arbitrary code, you need to look into dynamic libraries because that is exactly what they're for.

Resources