Today I read the question Any rules about underscores in filenames in C/C++?,
and I found it very interesting that the standard seems not to allow what is commonly seen in many libraries (I do the same in my personal library):
For example, in opencv we can see this:
// File: opencv/include/opencv2/opencv.hpp
#include "opencv2/opencv_modules.hpp"
But the standard says:
§ 6.10.2 Source file inclusion
Semantics
5 The implementation shall provide unique mappings for sequences
consisting of one or more nondigits or digits (6.4.2.1) followed by a
period (.) and a single nondigit. The first character shall not be
a digit. The implementation may ignore distinctions of alphabetical
case and restrict the mapping to eight significant characters before
the period.
nondigit means letters (A-Z a-z) and underscore _.
It says absolutely nothing about /, which would imply that using a path is not allowed, let alone dots or hyphens in file names.
To test this, I wrote a simple program with a source file test.c and a header file _1.2-3~a.hh, both in the same directory tst/:
// File: test.c
#include "./..//tst//./_1.2-3~a.hh"

int main(void)
{
    char a[10] = "abcdefghi";
    char b[5] = "qwert";

    strncpy(b, a, 5 - 1);
    printf("b: \"%c%c%c%c%c\"\n", b[0], b[1], b[2], b[3], b[4]);
    /* printed: b: "abcdt" */

    b[5 - 1] = '\0';
    printf("b: \"%c%c%c%c%c\"\n", b[0], b[1], b[2], b[3], b[4]);
    /* printed: b: "abcd" */

    return 0;
}
// File: _1.2-3~a.hh
#include <stdio.h>
#include <string.h>
I compiled it with $ gcc -std=c11 -pedantic-errors test.c -o tst and got no complaint from the compiler (I have gcc (Debian 8.2.0-8) 8.2.0).
Is it really forbidden to use a relative path in an include?
Ah; the standard is really talking about the minimum character set of the filesystem supporting the C compiler.
Anything in the "" (or <> with some preprocessing first) is parsed as a string according to normal C rules and passed from there to the OS to do whatever it wants with it.
This leads to compiler errors on Windows when the programmer forgets to type \\ instead of \ when writing a path in an #include directive. On modern Windows we can just use / and expect it to work, but on older Windows or DOS it didn't.
For extra fun, try
#include "/dev/tty"
Really nice one. It wants you to type C code while compiling.
I'd say it's not forbidden, but it isn't recommended either, since it will fail to compile in some cases.
For example:
if you clone the directory into your root (so you'd have C:\test\);
if you try to run it in an online virtual environment, you may face issues.
Is it really forbidden to use a path in an include?
Not sure what you mean here: relative paths are commonly used, but using an absolute path would be foolish.
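For example (hypothetical paths, purely to illustrate the difference):

#include "../include/mylib/mylib.h"      /* relative: resolves wherever the tree is checked out */
#include "/home/alice/src/mylib/mylib.h" /* absolute: breaks on every other machine */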
By using the objdump command I figured out that the address 0x02a8 in memory contains the start of the path /lib64/ld-linux-x86-64.so.2, and this path ends with a 0x00 byte, as the C standard requires for strings.
So I tried to write a simple C program that will print this line (I used a sample from the book "RE for beginners" by Denis Yurichev - page 24):
#include <stdio.h>

int main() {
    printf(0x02a8);
    return 0;
}
But I was disappointed to get a segmentation fault instead of the expected /lib64/ld-linux-x86-64.so.2 output.
I find it strange to call printf this "fast" way, without format specifiers or at least a pointer cast, so I tried to make the code more natural:
#include <stdio.h>

int main() {
    char *p = (char *)0x02a8;
    printf(p);
    printf("\n");
    return 0;
}
And after running this I still got a segmentation fault.
I don't believe this is happening because of restricted memory areas, because in the book it all works on the first try. I am not sure; maybe there is something more that wasn't mentioned in the book.
So I need a clear explanation of why the segmentation fault keeps happening every time I run the program.
I'm using the latest fully-upgraded Kali Linux release.
Disappointing to see that your "RE for beginners" book does not go into the basics first and spits out this nonsense. Nonetheless, what you are doing is obviously wrong; let me explain why.
Normally on Linux, GCC produces ELF executables that are position-independent. This is done for security purposes: when the program is run, the operating system is able to place it anywhere in memory (at any address), and the program will work just fine. This technique is called Address Space Layout Randomization (ASLR), and it is an operating-system feature that nowadays is enabled by default.
Normally, an ELF program would have a "base address", and would be loaded exactly at that address in order to work. However, in case of a position independent ELF, the "base address" is set to 0x0, and the operating system and the interpreter decide where to put the program at runtime.
When using objdump on a position-independent executable, every address that you see is not a real address but rather an offset from the base of the program (which will only be known at runtime). Therefore it is not possible to know the position of such a string (or of any other variable) ahead of time.
If you want the above to work, you will have to compile an ELF that is not position independent. You can do so like this:
gcc -no-pie -fno-pie prog.c -o prog
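Note that the address objdump reports will change once you rebuild without PIE (on x86-64, non-PIE executables are typically linked at a base around 0x400000), so check it again before hard-coding it. A minimal sketch, where 0x4002a8 is a hypothetical address standing in for whatever objdump reports on your non-PIE binary:

#include <stdio.h>

int main(void)
{
    /* placeholder address: re-run objdump on the non-PIE binary
       and substitute the address it actually reports */
    const char *interp = (const char *)0x4002a8;
    puts(interp);
    return 0;
}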
It no longer works like that. The 64-bit Linux executables that you're likely using are position-independent, and they're loaded into memory at an arbitrary address. In that case the ELF file does not contain any fixed base address.
While you could make a position-dependent executable as instructed by Marco Bonelli, that is not how things work for arbitrary executables on modern 64-bit Linux systems, so it is more worthwhile to learn to do this with position-independent ones, even though it is a bit trickier.
The following worked for me to print ELF (i.e. the ELF header magic) and the interpreter string. It is dirty in that it probably only works for a small executable anyway.
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>

int main() {
    // convert main to uintptr_t
    uintptr_t main_addr = (uintptr_t)main;

    // clear the bottom 12 bits so that it points to the beginning of a page
    main_addr &= ~0xFFFLLU;

    // subtract one page so that we land in the ELF headers...
    main_addr -= 0x1000;

    // ELF magic
    puts((char *)main_addr);

    // interpreter string; offset taken from a hexdump!
    puts((char *)main_addr + 0x318);
}
There is another trick to find the beginning of the ELF executable in memory: the so-called auxiliary vector and getauxval:
The getauxval() function retrieves values from the auxiliary vector,
a mechanism that the kernel's ELF binary loader uses to pass certain
information to user space when a program is executed.
The location of the ELF program headers in memory will be
#include <sys/auxv.h>
char *program_headers = (char*)getauxval(AT_PHDR);
The actual ELF header is 64 bytes long and the program headers start at byte 64, so if you subtract 64 from this you get a pointer to the magic string again. Our code can therefore be simplified to
#include <stdio.h>
#include <inttypes.h>
#include <sys/auxv.h>

int main() {
    char *elf_header = (char *)getauxval(AT_PHDR) - 0x40;
    puts(elf_header + 0x318); // or whatever the offset was in your executable
}
And finally, an executable that figures out the interpreter position from the ELF headers alone, provided that you've got a 64-bit ELF, magic numbers from Wikipedia...
#include <stdio.h>
#include <inttypes.h>
#include <sys/auxv.h>

int main() {
    // get pointer to the first program header
    char *ph = (char *)getauxval(AT_PHDR);

    // the ELF header sits 0x40 bytes before it
    char *elfh = ph - 0x40;

    // segment type 0x3 is the interpreter;
    // each program header entry is 0x38 bytes long in 64-bit executables
    while (*(uint32_t *)ph != 3) ph += 0x38;

    // the segment's file offset is a 64-bit value at 0x8 into the
    // program header entry, counted from the start of the executable
    uint64_t offset = *(uint64_t *)(ph + 0x8);

    // print the interpreter path...
    puts(elfh + offset);
}
I guess it segfaults because of the way you use printf: you don't use the format parameter the way it is designed to be used.
The first argument printf takes is a format string that controls how the output is displayed: int printf(const char *fmt, ...). The ... represents the data you want to display according to that format string.
so if you want to print a string:

// format as text
printf("%s\n", pointer_to_beginning_of_string);

If this still does not work, it is because you are trying to read memory that you are not supposed to access.
Try adding the extra flags -Werror -Wextra -Wall -pedantic to your compiler invocation and show us the errors, please.
I'm running OS X Sierra and trying to compile a C program that uses strcpy_s, but my installed clang compiler defaults to the C99 standard, and from what I've read strcpy_s requires C11.
Here's the code I'm trying to compile
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char source[] = "Test string";
    char destination[50];

    if (strcpy_s(destination, sizeof(destination), source))
        printf("string copied - %s", destination);

    return 0;
}
And here's the command I'm using to compile
$ clang copytest.c -o copytest
copytest.c:11:5: warning: implicit declaration of function 'strcpy_s' is invalid in C99 [-Wimplicit-function-declaration]
if(strcpy_s(copied_string, sizeof(copied_string), source))
^
1 warning generated.
Undefined symbols for architecture x86_64:
"_strcpy_s", referenced from:
_main in copytest-e1e05a.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
I've tried compiling with the standard flag...
clang -std=c11 copytest.c -o copytest
but I get the same exact "invalid in c99" warning. I've also tried compiling with gcc instead, and I still get the same c99 warning.
I tried upgrading via Homebrew, which shows the following:
Warning: gcc 9.2.0 is already installed and up-to-date
I have clang version 9.0.0
$ clang -v
Apple LLVM version 9.0.0 (clang-900.0.39.2)
My Xcode version is Xcode 9.2, which from everything I've read should come with C11 support.
Am I doing something wrong with the compilation, or is my code itself incorrect? This is the only similar question I found on here, but it didn't even have an answer. Thanks
The _s functions are an optional component of the 2011 C standard (Annex K), and, to the best of my knowledge, they have never been implemented as an integrated part of any C library. Portable code cannot rely on their availability. (Microsoft's C compilers for Windows implement an overlapping set of functions with the same names but different semantics (and sometimes even a different argument list), and at least one bolt-on implementation does exist. See this old answer, and the much longer question and answer it links to, for more detail.)
Also, the _s functions do not solve the problem that they were intended to solve (unsafe string handling); it is necessary to put actual thought into a proper fix for each use of strcpy, instead of globally search-and-replacing strcpy with strcpy_s, etc., as was the hope of the authors of Annex K. If you do put appropriate amounts of thought into a proper fix, you won't need any of the _s functions to implement it. For instance, here's a fixed version of your example program:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char source[] = "Test string";
    char destination[50];
    size_t srclen = strlen(source);

    if (srclen + 1 > sizeof destination) {
        fprintf(stderr, "string too long to copy - have %zu bytes, need %zu\n",
                sizeof destination, srclen + 1);
        return 1;
    } else {
        memcpy(destination, source, srclen + 1);
        printf("string copied - %s\n", destination);
        return 0;
    }
}
And here's an even better version:
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
#include <string.h> /* for strdup */

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: ./test 'message of arbitrary length'\n");
        return 1;
    }

    char *destination = strdup(argv[1]);
    if (!destination) {
        perror("strdup");
        return 1;
    }

    printf("string copied - '%s'\n", destination);
    free(destination);
    return 0;
}
Therefore: Never use any of the _s functions. If you need to write a program that compiles on Windows with no warnings, put #define _CRT_SECURE_NO_WARNINGS 1 at the top of each file to make MSVC stop giving you bad advice.
If all, or even most, programmers wrote the suggested solutions above all the time, then these functions wouldn't be needed. We have a lot of evidence that many programmers do not write such careful code, going back to Spaf's notes on the Robert T. Morris finger worm in the late 1980s.
You also would prefer not to have to duplicate 10 lines of code for every call site of strcpy; that leads to unreadable code. What's more, what zwol suggests is really just an implementation of the function he claims we don't need. A good programmer would take that, stick it in a header, and name it something helpful, maybe checked_strcpy? Or even strcpy_s?
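As a minimal sketch of what that wrapper might look like (checked_strcpy is a made-up name, and refusing to copy rather than truncating is just one possible policy):

#include <stdbool.h>
#include <string.h>

/* Copy src into dst (of total size dstsize); refuse, rather than
   overflow, when it does not fit. */
static bool checked_strcpy(char *dst, size_t dstsize, const char *src)
{
    size_t srclen = strlen(src);
    if (srclen + 1 > dstsize)
        return false;             /* report failure to the caller */
    memcpy(dst, src, srclen + 1);
    return true;
}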
The second suggested implementation, which is purportedly better, is not: it would cause an allocation when we might already have a buffer. Allocations are expensive, and using this approach everywhere would be bad for performance. It also introduces new complexity, because now we'd have to free every duplicated string - imagine doing that with repeated calls to strcat.
There is a fairly nicely done cross-platform implementation here:
https://github.com/intel/safestringlib
I'm also not sure whether this one is actually any different, but it's worth taking a look: https://github.com/coruus/safeclib
On Mac OS X 10.6.8 I can't compile code using wchar_t functions from the standard library until I have resolved this:
the wcscoll function, together with a bunch of others, is poisoned:
inttypes.h:#pragma GCC poison wcstoimax wcstoumax
stdlib.h:#pragma GCC poison mbstowcs mbtowc wcstombs wctomb
wchar.h:#pragma GCC poison fgetws fputwc fputws fwprintf fwscanf mbrtowc mbsnrtowcs mbsrtowcs putwc putwchar swprintf swscanf vfwprintf vfwscanf vswprintf vswscanf vwprintf vwscanf wcrtomb wcscat wcschr wcscmp wcscoll wcscpy wcscspn wcsftime wcsftime wcslcat wcslcpy wcslen wcsncat wcsncmp wcsncpy wcsnrtombs wcspbrk wcsrchr wcsrtombs wcsspn wcsstr wcstod wcstof wcstok wcstol wcstold wcstoll wcstoul wcstoull wcswidth wcsxfrm wcwidth wmemchr wmemcmp wmemcpy wmemmove wmemset wprintf wscanf
#include <stdio.h>
#include <wchar.h>
#include <string.h>
#include <locale.h>
#include <stdlib.h>
#include <errno.h> /* errno is a macro; don't declare it yourself */

int main(void)
{
    wchar_t pwcs1[3] = {L"ØL"}, pwcs2[3] = {L"Ål"};
    int n; /* wcscoll returns int; a size_t could never be < 0 */

    (void)setlocale(LC_ALL, "");

    /* set it to zero for checking errors on wcscoll */
    errno = 0;

    /*
    ** Let pwcs1 and pwcs2 be two wide character strings to
    ** compare.
    */
    /* n = wcscmp(pwcs1, pwcs2); */
    n = wcscoll(pwcs1, pwcs2);

    /*
    ** If errno is set then it indicates some
    ** collation error.
    */
    if (n < 0) {
        printf("%s\n", "Øl mindre en Ål");
    } else if (n == 0) {
        printf("%s\n", "Øl lik Ål");
    } else {
        printf("%s\n", "Øl større en Ål");
    }

    if (errno != 0) {
        /* error has occurred... handle error ...*/
    }

    return 0;
}
How do I resolve this?
I am a little bit reluctant to mess with the standard library, but I guess I could compile the GNU C library, if Apple doesn't have a fix for it? Or is there any other suitable alternative among libraries for handling wide characters (UTF-8)?
I am porting something ancient, so I really need to use ncurses, and in order to use ncurses, I need wide characters! :)
Edit: The standard include path should, as I understand it, be /usr/include. I have been through the include directories of the SDKs I have, and a grep through the header files there reveals the same poison pragmas, as does the latest tarball from http://opensource.apple.com/tarballs/Libc/
Edit++
In hindsight, those pragmas are there for a reason, and I was looking for alternatives; so right now I am trying to build glibc, freshly downloaded, and I have inspected the headers, which are without any "GCC poison" pragmas.
Having read up a little bit in the configure file of glibc, I guess that isn't an easy option. I guess I'll have to dissect something that works with UTF-8 and uses ncurses on Mac OS X to figure out how.
It might be that I am just overlooking an easy solution, but ncurses falls back to 7-bit ASCII, and that is my problem. My goal is to render UTF-8 language-specific characters while using ncurses. I need to be able to sort, and since the format is "proprietary" with indexing, forking out a system call to sort records is not an option. I also need to know how many codepoints are in a string of some kind for field editing, and for insertion and removal of characters from the display with ncurses.
Thanks!
So far the ICU library looks promising: I think I will pursue a solution with the ICU library, which as far as I know ships with Mac OS X. http://icu-project.org/apiref/icu4c/
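As a sketch of where that leads, the collation part might look roughly like this with ICU's C API (the locale name "nb_NO" and the buffer sizes are my assumptions; link with -licuuc -licui18n):

#include <stdio.h>
#include <unicode/ustring.h>
#include <unicode/ucol.h>

int main(void)
{
    UErrorCode status = U_ZERO_ERROR;
    UChar s1[16], s2[16];

    /* convert the UTF-8 input to ICU's UTF-16 strings */
    u_strFromUTF8(s1, 16, NULL, "Øl", -1, &status);
    u_strFromUTF8(s2, 16, NULL, "Ål", -1, &status);

    /* locale-aware comparison: the wcscoll replacement */
    UCollator *coll = ucol_open("nb_NO", &status);
    if (U_FAILURE(status))
        return 1;

    UCollationResult r = ucol_strcoll(coll, s1, -1, s2, -1);
    puts(r == UCOL_LESS  ? "Øl mindre enn Ål"
       : r == UCOL_EQUAL ? "Øl lik Ål"
       :                   "Øl større enn Ål");

    ucol_close(coll);
    return 0;
}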
I'd like to use a C program to find the total number of directives like #include, #define, #ifdef, #typedef, etc. Could you suggest any logic for that? I'm not interested in using any scripting or tools; I want it done purely in C.
Store all the directives in an array of pointers (or arrays).
Read the C file line by line and check whether the first word, excluding any whitespace at the beginning, starts with one of the directives in the list:

char *directives[] = {"#assert", "#define", ......};
int count[NUM_DIRS] = { 0 };

Every time you find a match, increment the corresponding index of the count array. You can also maintain another counter for the total, to avoid having to add up the values in the count array. A rough sketch of this approach follows.
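Here the directive list is deliberately short, and the matching is naive: it misses the "# include" form (whitespace after the #) and will also count directives inside comments or string literals.

#include <stdio.h>
#include <string.h>
#include <ctype.h>

static const char *directives[] = { "#include", "#define", "#ifdef", "#endif" };
#define NUM_DIRS (sizeof directives / sizeof directives[0])

int main(int argc, char *argv[])
{
    if (argc != 2)
        return 1;
    FILE *f = fopen(argv[1], "r");
    if (f == NULL)
        return 1;

    int count[NUM_DIRS] = { 0 };
    int total = 0;
    char line[1024];

    while (fgets(line, sizeof line, f) != NULL) {
        char *p = line;
        while (isspace((unsigned char)*p)) /* skip leading whitespace */
            p++;
        for (size_t i = 0; i < NUM_DIRS; i++) {
            if (strncmp(p, directives[i], strlen(directives[i])) == 0) {
                count[i]++;
                total++;
                break;
            }
        }
    }
    fclose(f);

    for (size_t i = 0; i < NUM_DIRS; i++)
        printf("%-10s %d\n", directives[i], count[i]);
    printf("total: %d\n", total);
    return 0;
}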
Assuming you don't want to parse them, or do any other kind of syntactic/semantic analysis, you can simply count the number of lines which start with zero or more whitespace characters followed by a # character (loosely tested, should work fine):
#include <stdio.h>
#include <ctype.h>

int main(int argc, char *argv[])
{
    FILE *f = fopen(argv[1], "r");
    if (f == NULL)
        return 1;

    char line[1024];
    unsigned ncppdirs = 0;

    /* drive the loop with fgets rather than feof, so the last
       line is not processed twice */
    while (fgets(line, sizeof(line), f) != NULL) {
        char *p = line;
        while (isspace((unsigned char)*p))
            p++;
        if (*p == '#')
            ncppdirs++;
    }

    printf("%u preprocessor directives found\n", ncppdirs);
    return 0;
}
You might take advantage of the fact that gcc -H shows you every included file; you could popen that command and (simply) parse its output.
You could also parse the preprocessed output given by gcc -C -E; it contains line information, as lines starting with #.
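For instance, a rough sketch of the popen approach (gcc -H writes its report to stderr, hence the 2>&1; test.c is a placeholder file name, and -fsyntax-only merely avoids producing an object file):

#define _POSIX_C_SOURCE 200809L /* for popen/pclose */
#include <stdio.h>

int main(void)
{
    /* gcc -H prints each included header on its own line,
       prefixed with dots showing the nesting depth */
    FILE *p = popen("gcc -H -fsyntax-only test.c 2>&1", "r");
    if (p == NULL)
        return 1;

    char line[4096];
    unsigned nincludes = 0;
    while (fgets(line, sizeof line, p) != NULL)
        if (line[0] == '.') /* depth-prefixed lines are includes */
            nincludes++;
    pclose(p);

    printf("%u headers included\n", nincludes);
    return 0;
}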
Counting just lexically the occurrences of #include is not enough, because it does happen (quite often, actually - see what <features.h> does) that some included files use tricks like
#if SOME_SYMBOL > 2
#include "some-internal-header.h"
#define SOME_OTHER_SYMBOL (SOME_SYMBOL+1)
#endif
and some later include would contain #if SOME_OTHER_SYMBOL > 4.
And the compilation command might, by the way, define SOME_SYMBOL, e.g. gcc -DSOME_SYMBOL=3 (such tricks happen a lot, often in Makefile-s, and merely optimizing with -O2 makes __OPTIMIZE__ a preprocessor-defined symbol).
If you want deeper information about source programs, consider writing GCC plugins or extensions, e.g. with MELT (a domain-specific language to extend GCC). For instance, counting Gimple instructions in the intermediate representation is more sensible than counting lines of code.
Also, some macros might do some typedef; some programs may have
#define MYSTRUCTYPE(Name) typedef struct Name##_st Name##_t;
and later use e.g. MYSTRUCTYPE(point); what does that mean for counting typedef-s?
I'm attempting to add wchar_t Unicode characters to an ncurses display in C.
I have an array:
wchar_t characters[]={L'\uE030', L'\uE029'}; // containing 2 thai letters, for example
And I later try to add a wchar_t from the array to the ncurses display with:
add_wch(characters[0]);
To provide a bit more info, doing this with ASCII works ok, using:
char characters[]={'A', 'B'};
// and later...
addch(characters[0]);
To setup the locale, I add the include...
#include <locale.h>
// in main()
setlocale(LC_CTYPE,"C-UTF-8");
The ncurses include is:
#include <ncurses.h>
Compiling with (edit: added the C99 standard, for universal character name support):
gcc -o ncursesutf8 ncursesutf8.c -lm -lncurses -Wall -std=c99
I get the following compilation warning (of course the executable will fail):
ncursesutf8.c:48: warning: implicit declaration of function ‘add_wch’
I've tried just using addch, which appears to be macro'ed to work with wchar_t, but when I do that the Unicode chars do not show up; they show as ASCII chars instead.
Any thoughts?
I am using OS X Snow Leopard, 10.6.6
Edit: fixed the error in the wchar_t[] assignment, using L'\u0E30' instead of L"\u0E30" etc.
I've also updated the compiler settings to use C99 (to add universal character name support). Neither change fixes the problem.
Still no answers on this; does anyone know how to do Unicode ncurses addchar (add_wchar?)?! Help!
The wide character support is handled by ncursesw. Depending on your distro, ncurses may or may not point there (seemingly not in yours).
Try using -lncursesw instead of -lncurses.
Also, for the locale, try calling setlocale(LC_ALL, "")
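A minimal sketch putting those pieces together (this assumes a UTF-8 terminal; on some systems the header is <ncursesw/ncurses.h> instead, and you build with something like gcc -std=c99 prog.c -lncursesw):

#define _XOPEN_SOURCE_EXTENDED 1
#include <locale.h>
#include <ncurses.h>

int main(void)
{
    setlocale(LC_ALL, "");
    initscr();

    wchar_t wstr[2] = { L'\u0E30', L'\0' }; /* one Thai character */
    cchar_t c;
    setcchar(&c, wstr, A_NORMAL, 0, NULL);  /* build the cchar_t */
    add_wch(&c);

    refresh();
    getch();
    endwin();
    return 0;
}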
This is not 2 characters:
wchar_t characters[]={L"\uE030", L"\uE029"};
You're trying to initialize wchar_t (integer) values with pointers, which should result in an error from the compiler. Either use:
wchar_t characters[]={L'\uE030', L'\uE029'};
or
wchar_t characters[]=L"\uE030\uE029";
cchar_t is defined as:
typedef struct {
attr_t attr;
wchar_t chars[CCHARW_MAX];
} cchar_t;
so you might try:
int add_wchar(wchar_t c)
{
    cchar_t t = {
        0,      /* .attr */
        {c, 0}  /* .chars: a NUL-terminated wide-character string */
    };
    /* add_wch takes a pointer to the cchar_t */
    return add_wch(&t);
}
not at all tested, but should work.
Did you define _XOPEN_SOURCE_EXTENDED before including the ncurses header?