As far as I understand (and my understanding is limited), unistd_64.h contains the system call numbers. When I search for the file from the terminal, it turns up in several different directories:
/usr/include/x86_64-linux-gnu/asm/unistd_64.h
/usr/src/linux-headers-3.5.0-23/arch/sh/include/asm/unistd_64.h
/usr/src/linux-headers-3.5.0-23-generic/arch/x86/include/generated/asm/.unistd_64.h.cmd
/usr/src/linux-headers-3.5.0-23-generic/arch/x86/include/generated/asm/unistd_64.h
I don't understand the difference between these files or what each one is for. Also, the third entry ends in .cmd; what does that mean?
If you are writing an ordinary C program that needs to know system call numbers, you should not use any of those headers. Instead, you should use <sys/syscall.h>. Your C program does not need to know the full pathname of this header; #include <sys/syscall.h> is all that is necessary. However, if you want to read it, it will be found somewhere in /usr/include, probably either /usr/include/sys/syscall.h or /usr/include/x86_64-linux-gnu/sys/syscall.h.
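As a minimal sketch of the advice above (assuming Linux with glibc, where syscall() is declared in <unistd.h>), the following uses the SYS_ name from <sys/syscall.h> rather than an __NR_ name. The helper raw_getpid is a hypothetical name chosen for illustration.

```c
#define _DEFAULT_SOURCE         /* ask glibc to declare syscall() */
#include <sys/syscall.h>        /* SYS_getpid and the other SYS_ names */
#include <unistd.h>             /* syscall(), getpid() */

/* Hypothetical helper: fetch the process ID through the raw system call
 * interface, naming the call by its SYS_ constant. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}
```

Calling raw_getpid() should give the same value as the ordinary getpid() wrapper; in real code you would normally just call the wrapper.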
Now, I will explain the files you found:
/usr/include/x86_64-linux-gnu/asm/unistd_64.h: This is a header file that may be used internally by sys/syscall.h. You can read it, but do not include it directly in your program. It probably defines a whole bunch of names that begin with __NR_. Those names should never be used in an ordinary, "userspace" program: always use the names beginning with SYS_ instead.
/usr/src/linux-headers-3.5.0-23/arch/sh/include/asm/unistd_64.h and /usr/src/linux-headers-3.5.0-23-generic/arch/x86/include/generated/asm/unistd_64.h: These are private kernel headers. They exist for the sake of people trying to build kernel modules that are developed separately from the kernel proper. It's possible that one of them is textually the same as /usr/include/x86_64-linux-gnu/asm/unistd_64.h, but that is not something you should rely on.
/usr/src/linux-headers-3.5.0-23-generic/arch/x86/include/generated/asm/.unistd_64.h.cmd: This is not a header file at all. It is used by the Linux kernel's build system, which records in it the command that generated unistd_64.h so that it can tell when the file needs to be rebuilt.
The first file, which resides in /usr/include (the system include directory) is the one you would include.
The others reside in /usr/src, which is a source code directory that should not be referenced.
So I'm going through CS50 introduction course.
And I'm confused about the fact that if you write in IDE #include some header file, which tells computer to find some library, and your compiler will find that code and combine it with your code.
But then how does compiler find that code? Like I just type in #include <cs50.h> for example. But how does it find that code when I don't have it on my PC? Why would I have it without finding it online and downloading it beforehand? Does it look online and then download file, which it uses now and in future for my programs when I call #include? And if so does it mean that you can't properly program without connection to internet?
And im confused about the fact that if you write in ide #include some header file which tells computer to find some library,
Be careful with terminology. #include tells the compiler to find some header. The word library in C usually refers to the file with compiled code (.a, .lib, .so, .dll).
and your compiler will find that code and combine it with your code.
This is right. By and large, the effect is the same as copying and pasting the contents of the header in the place of the #include statement.
But then how does compiler find that code? Like i just type in #include Cs50 for example.
The compiler has default search paths built in, where it looks for the header that is being #included. Typically directories like /usr/include and /usr/local/include are built in, and for the quoted form (#include "file.h") the directory of the including file is searched first. You can add more search paths through compiler command-line arguments, for example -I/some/path in gcc and clang.
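To make the search-path idea concrete, here is a rough sketch of a lookup that walks a list of directories in order and stops at the first one containing the file. This is not how any particular compiler is implemented; find_header and its parameters are hypothetical names used only for illustration.

```c
#include <stdio.h>      /* snprintf */
#include <unistd.h>     /* access, R_OK */

/* Hypothetical helper: try each directory in order, the way an include
 * search path is walked. On success, writes the full path into `out` and
 * returns the index of the directory that matched; returns -1 if the
 * header is in none of them (`out` is then not meaningful). */
int find_header(const char *name, const char *const dirs[], size_t ndirs,
                char *out, size_t outlen)
{
    for (size_t i = 0; i < ndirs; i++) {
        int n = snprintf(out, outlen, "%s/%s", dirs[i], name);
        if (n < 0 || (size_t)n >= outlen)
            continue;                   /* path did not fit; skip it */
        if (access(out, R_OK) == 0)
            return (int)i;              /* first match wins */
    }
    return -1;                          /* "No such file or directory" */
}
```

Passing an extra directory first in dirs plays the role of -I/some/path: it is consulted before the built-in locations.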
But how does it find that code when i dont have it on my pc
It doesn't. You will get an error like "Cs50: No such file or directory".
If this works on your university's system, probably the header is installed on that system in some central location.
why would i have it without finding it online and downloading it beforehand? Does it look online and then download file which it uses now and in future for my programs when i call #include?
It doesn't.
And if so does it mean that you cant properly program without connection to internet?
C was developed in the 1970s. The internet barely existed yet. You can program perfectly well in C without an internet connection – if you can do it without StackOverflow, of course ;)
The code of the library must be present on your computer or nothing will work. There's no such thing as magic downloads of libraries, C compilers and linkers have worked the same since long before the Internet was even invented.
Standard library headers are typically downloaded & installed along with the compiler. They are also very likely already pre-linked into some convenient format for the target system. You don't need to worry about manually adding standard libs to your project since the compiler will take care of that for you.
Custom headers require their corresponding .c files or linked libs to be present too, but they must be manually added to the current project, either by adding them to your IDE's project or, as in the old days, by creating a makefile.
How that works in CS50 I don't know, but the lib obviously comes pre-installed somehow and they have hidden the details from the students. We wouldn't want to risk CS50 students actually learning how programming works, now would we...
if you write in ide #include some header file which tells computer to find some library
Slight misunderstanding. When you type #include "somefile.h" into your program, the first stage of C and C++ compilation called the pre-processor will search for that file (called a "header") from the INCLUDE path. That is, the pre-processor will search through the local directory of the .c file, then a specified set of standard directories, and perhaps your own project directories to find a file called "somefile.h". The INCLUDE path is highly configurable with IDEs, command lines, and environment variables. Upon finding that file, the result is that the contents of that file are virtually substituted directly into your source code, exactly where the #include statement originally appeared. It's as if you had typed that exact file contents yourself into your .c file. After the textual substitution of the #include statements with the file contents, the intermediate file contents are handed off to the compiler stage to convert to object (assembly) code.
Like i just type in #include Cs50 for example. But how does it find that code when i dont have it on my pc
If the file can't be found, the pre-processor stage of the compile will fail and the whole thing comes to an end. I'm not sure what's in Cs50, but if you type #include "Cs50" and it works, then my guess is that your university environment has some project or environment configuration that adds a course specific include directory into your compilation path. Difficult to say since you didn't specify how you were building your code. Most IDEs will let you right-click on a #include statement and navigate to the actual source of the header file so you can inspect its contents and see where it originates from on your disk.
Also, your title says "linking", but you really mean "including". Linking is the final stage: after the compiler has compiled all source files to object code, the linker combines them to produce the final executable program.
.h files (should only) contain data type definitions, extern declarations, and function prototypes. .h files are not libraries.
So the compiler will know what parameters the functions take and what their return types are. The compiler will know how to call them. It will also know the types of variables defined somewhere else in the code (including the libraries). When the compiler compiles the code, it generates intermediate files called object files.
Those files are later linked together by a special program called the linker. This program will find the actual code of the functions in other object files or libraries (libraries are basically sets of object files grouped in one larger file). It happens behind the scenes. If you need to tell the linker to use a specific library, you simply use a command-line option. For example, to use the math library you need the -lm option.
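The split between declaration and definition can be sketched in a single file. In a real project the three parts below would live in a header, in your own .c file, and in a library or another .c file respectively; the names add and twice_sum are made up for this illustration.

```c
/* Part 1: what a header would contain. A prototype only, no code. */
int add(int a, int b);

/* Part 2: your code. The prototype alone is enough for the compiler to
 * know how to call add(); it does not need to see the function body. */
int twice_sum(int a, int b)
{
    return 2 * add(a, b);
}

/* Part 3: what the library (or another object file) would contain.
 * The linker is what connects the call in part 2 to this definition. */
int add(int a, int b)
{
    return a + b;
}
```

If part 3 were missing, the file would still compile, and the failure would only show up at link time as an undefined reference.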
But how does it find that code when I don't have it on my PC?
The library file has to be present in your file system. Otherwise the linker will not be able to link functions or variables from that library.
And if so does it mean that you can't properly program without connection to internet?
No. It means that you need a properly configured toolchain (i.e. all the needed libraries present in your file system).
But then how does compiler find that code?
For your level of knowledge: some libraries are linked by default. Others are not, so you need to tell the compiler/linker what you want to use.
--gcc & binutils related--
Linking is generally quite a complicated process and the compiler has its own configuration files called "spec files" and linker has "linker scripts". But explaining what those files do is rather an advanced topic far beyond the scope of this question.
I want to modify the glibc dynamic linker/loader so that before mapping a shared library into a process, the linker/loader checks whether the library has been loaded/in-use by any other process in the system or not. The linker/loader will perform a specific operation on the shared library code only if the library has not been used/loaded by any other process. I understand that currently the linker/loader only linearly maps the shared library and waits for demand paging to physically load the library.
I have tried to use the shell command lsof /path/library.so from within the dynamic linker/loader code to accomplish that. To invoke lsof command from within dynamic linker code, I have tried
system("lsof /path/library.so");
FILE *fp = popen("lsof /path/library.so", "r");
Building dynamic linker, however, gives me "multiple definitions of x symbols" error as I tried to include stdio.h (for popen()) or stdlib.h (for system()) header files. Can you please suggest how to resolve the glibc build error or any other better way to solve my original problem?
Addition 1: Thanks @EmployedRussian. I also explored the option that you mentioned.
One possible answer is: store them in a file or a database. If that is your answer, then the solution becomes obvious: check if the file or a database entry exists. If it does, you don't need to do the computation again.
The main problem for both the lsof and the file/database-based solution is: when I add a new .c file and include <stdio.h> in that file to do file operations (such as FILE* fp = fopen()), the glibc build gives me errors like this for a few functions:
'-Wl,-(' /path/glibc-2.30_build/elf/dl-allobjs.os /path/glibc-2.30_build/libc_pic.a -lgcc '-Wl,-)' -Wl,-Map,/path/glibc-2.30_build/elf/librtld.mapT
/usr/bin/ld: /path/glibc-2.30_build/libc_pic.a(dl-error.os): in function `__GI__dl_signal_exception':
/path/glibc_2.30_shared_library/elf/dl-error-skeleton.c:91: multiple definition of `_dl_signal_exception'; /path/glibc-2.30_build/elf/dl-allobjs.os:/path/glibc_2.30_shared_library/elf/dl-error-skeleton.c:91: first defined here
Building dynamic linker, however, gives me "multiple definitions of x symbols" error
This is because the dynamic linker is very special, and you are very restricted in what you can do in the dynamic linker.
It is special because it must be a stand-alone program -- it can't use any other library (including libc.so.6) -- it is responsible for loading all other libraries, so naturally it can't use anything that it has yet to load.
I just want to compute them once when the library is being physically loaded the first time.
This is still an XY Problem. What are you going to do with the result of this computation?
One possible answer is: store them in a file or a database.
If that is your answer, then the solution becomes obvious: check if the file or a database entry exists. If it does, you don't need to do the computation again.
Update:
The main problem for both lsof or file/databased based solution is: when I add a new .c file and include <stdio.h> in that file to do file operations (such as FILE* fp = fopen()), the glibc build gives me errors
This is the exact same problem: you are trying to use parts of libc.so which can't be used in a dynamic linker.
If you want to store the result of your computation in a file, you need to use low-level parts which are usable. Use open() and write() instead of fopen() and fprintf().
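A minimal sketch of the open()/write() approach, written as ordinary user-space C. Inside the loader itself you may have to go through whatever internal wrappers the glibc tree provides, which is an assumption you would need to check. The function name claim_first_load and the marker-file idea are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>      /* open, O_CREAT, O_EXCL */
#include <unistd.h>     /* write, close */

/* Hypothetical helper: create a marker file holding the computed result.
 * O_CREAT | O_EXCL makes creation fail if the file already exists, so the
 * "does it exist?" check and the creation are a single atomic step.
 * Returns 1 if this call created the file (first time),
 *         0 if it already existed (computation already done),
 *        -1 on any other error. */
int claim_first_load(const char *marker_path, const char *data, size_t len)
{
    int fd = open(marker_path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0)
        return errno == EEXIST ? 0 : -1;

    ssize_t n = write(fd, data, len);   /* no stdio buffering involved */
    close(fd);
    return n == (ssize_t)len ? 1 : -1;
}
```

A caller would run the expensive computation only when the function returns 1, and read the stored result back otherwise.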
Alternatively, do it from within your library or program -- since you will no longer care about how many processes have loaded the library, there is no reason to try to perform this computation in the loader. (There might be a reason, but you are not explaining it; so we are back to XY problem.)
I have a header file, sample_A.h, which has an include statement of the form #include "sample_B.h". I also have another header file sample_C.h. I would like header file sample_A.h to include sample_C.h instead of sample_B.h, but under no circumstances can I edit anything outside of the Makefile used to build the project. What would be the best way to "redirect" sample_A.h to include sample_C.h instead of sample_B.h by solely editing the Makefile? Assume that both sample_C.h and sample_B.h will allow sample_A.h to compile.
EDIT: I am working on a project which has a vast build structure. Some files are outdated, but due to upper management orders, these files should not be messed with until future meetings take place. In the meantime, I am trying to figure out a way to circumvent these outdated files (and their outdated include directives) without touching the files themselves. I am using the gcc compiler.
I would like header file sample_A.h to include sample_C.h instead of sample_B.h, but under no circumstances can I edit anything outside of the Makefile used to build the project.
In principle, the compiler is at leisure to choose how to interpret the file identifier presented in an #include directive (i.e. "sample_B.h"), but in practice, every compiler you're likely ever to meet interprets it as a file name or path. Thus, if you have a file whose (only) simple name is sample_C.h, and you have no allowable way to provide another name for it (by copying it or creating a symlink to it, for example) then it is unlikely that there is any way that file will ever be chosen to satisfy an #include "sample_B.h" directive.
If, however, the two headers have the same simple name but reside in different directories, e.g. src/sample_B.h and custom/sample_B.h, then it is normally possible to influence which is selected via compiler options that affect the search path for include files. The traditional option of that kind for Unix C compilers is -I.
I wonder what is inside stdio.h and conio.h etc.
I want to know how printf and scanf are defined.
Is there a way I can open stdio.h and see what is written inside?
Depending on your implementation, you should be able to open any .h file in your favorite editor and read it directly; they're (usually) just plain text files.
However, stdio.h will only give you the declarations for printf and scanf; it won't contain the source code for them. Most compilers don't ship the source code for standard library functions; instead, they ship precompiled libraries which are linked with your code when you build the executable.
If you're willing to spend some money, P.J Plauger's The Standard C Library is a good resource that shows an implementation of the standard library functions.
When the preprocessor includes a header file into a source file, that inclusion is very much literal. That means that the header files are normal text files with source in them, and must be readable by the compiler (and therefore by you). You just have to find where they are, and you can open them like any other text file.
However, you won't find out how functions are defined, just how they are declared. And some structures are supposed to be "black boxes", whose data members should be considered private. Usually the source for the standard C library is available or downloadable, so try and find that too. It all depends on what compiler you're using.
You might also want to check out a reference site such as this one. There you can find pretty detailed information about e.g. printf.
Those headers generally chain include more machine/OS specific headers.
If you are on Linux/OS X then you can get some more info with
man stdio
Also check out http://www.cplusplus.com/reference/cstdio/ and https://en.wikipedia.org/wiki/Conio.h
Most compilers allow you to read the results after the preprocessor (the compilation step that processes the #include directives) has been run. With gcc for instance, use the -E command-line option.
You can always rely on the Internet's supply of Unix-style manual pages: searching for "man something" will turn up the relevant manual section for something.
For instance, there are pages for both printf() and scanf().
You can easily see there that the declarations aren't very special, and quite obvious from the usage. It's just int printf(const char *format, ...); for instance.
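To show what the ... in that declaration means in practice, here is a small sketch of a printf-style function of your own. It only forwards to the standard vsnprintf, so it says nothing about how printf is implemented internally; format_into is a hypothetical name.

```c
#include <stdarg.h>     /* va_list, va_start, va_end */
#include <stdio.h>      /* vsnprintf */

/* Hypothetical helper with the same shape as printf's declaration:
 * a format string followed by a variable number of arguments.
 * Formats into `buf` (at most `size` bytes including the terminator) and
 * returns what vsnprintf returns: the length the full text would have. */
int format_into(char *buf, size_t size, const char *format, ...)
{
    va_list args;
    va_start(args, format);             /* start reading after `format` */
    int n = vsnprintf(buf, size, format, args);
    va_end(args);
    return n;
}
```

For example, format_into(buf, sizeof buf, "%d-%s", 7, "x") leaves the text 7-x in buf.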
The content of some headers is defined by the C standard.
Other headers are defined by the library that provides them.
Some headers are defined by the system you are writing the code for (this may fall under the second case, since the OS provides the libraries).
Depending on which case applies, you may look in a C language reference, in the library's manual, or in the OS's API reference.
But one thing is for sure: if you can include a header (and the compiler does not complain that it could not find it), then you can also look into it. Just look in the compiler's standard include directories, or in the additional include directories specified in the project file or Makefile, to find the files on your file system.
But usually the better way is to look in the documentation, because the header itself may be difficult to read due to its many #ifdefs and further includes.
The most fundamental way to find out what's inside those headers is to read them. Of course, you must locate them first. To this end you can use this short shell code:
gcc -E -M - << EOF
#include <stdio.h>
EOF
This will provide you with a complete list of all the headers directly or indirectly included by #include <stdio.h>. Of course, if you are only interested in the 'stdio.h' header itself, you can just do
locate stdio.h
but this will usually list quite a few false positives.
I am trying to find the file type of a file (.pdf, .doc, .docx, etc.) programmatically, without using a shell command. I have to write an application that blocks access to files of a particular extension. I have already hooked sys_call_table in an LKM, and now I want the LKM to check the file type whenever an open/read system call is triggered.
I know that the current pointer gives access to the current process structure, and that we can use it to find the file name stored in the dentry structure. I also know that in Linux a file type is identified by a magic number stored in the first bytes of the file. But I don't know how to find the file type, or exactly where it is stored.
Linux doesn't "store" the file type for its files (unlike Mac OS' resource fork, which I think is the most well-known platform to do this). Files are just named streams of bytes, they have no structure implied by the operating system.
Either you just tell programs which file to use (and then it Does What You Say), or programs use higher-level features to figure it out.
There are programs that re-invent this particular wheel (I'm responsible for one of those), but you can also use e.g. file(1). Of course that requires your program to parse and "understand" the textual output you'll get, which in a sense only moves the problem.
However, I don't think calling into file from kernel space is very wise, so it's probably best to re-create the test for whatever set of types you need, to keep it small.
In other words, I mean you should simply re-implement the required tests. This is quite complicated in general, so if you really need to do it for as large a set of types as possible, it might not be a very good idea. :/
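For a small, fixed set of types, re-implementing the test can be as simple as comparing the first bytes of the file against known signatures. The sketch below is plain user-space C working on a buffer you have already read; in a kernel module you would have to obtain those bytes through the kernel's own facilities, which is not shown. The function name sniff_type is hypothetical, and the three signatures are the well-known ones for PDF, ZIP containers (which .docx uses) and the OLE2 container (which legacy .doc uses).

```c
#include <stddef.h>
#include <string.h>     /* memcmp */

/* Hypothetical helper: classify a file by the magic number at its start.
 * `buf` holds the first `len` bytes of the file. */
const char *sniff_type(const unsigned char *buf, size_t len)
{
    static const unsigned char pdf[]  = { '%', 'P', 'D', 'F', '-' };
    static const unsigned char zip[]  = { 'P', 'K', 0x03, 0x04 };
    static const unsigned char ole2[] = { 0xD0, 0xCF, 0x11, 0xE0,
                                          0xA1, 0xB1, 0x1A, 0xE1 };

    if (len >= sizeof pdf && memcmp(buf, pdf, sizeof pdf) == 0)
        return "pdf";
    if (len >= sizeof zip && memcmp(buf, zip, sizeof zip) == 0)
        return "zip";       /* .docx files are ZIP containers */
    if (len >= sizeof ole2 && memcmp(buf, ole2, sizeof ole2) == 0)
        return "ole2";      /* legacy .doc files use this container */
    return "unknown";
}
```

Note that a ZIP or OLE2 signature only identifies the container, so telling a .docx from any other ZIP archive would need a further look inside.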
Actually i have to make an application which blocks access to files of a particular extension.
That's a flawed requirement. If you check by file extension, you'll miss files that don't use the extension, which is quite common in Linux since it does not rely on file extensions.
The officially sanctioned way of detecting a file's type in Linux is by its magic number. The shell command file is basically just a wrapper for libmagic, so you have the option of linking to that library.