LibClang: parse a header file with definitions from another header file?


I am using the latest LibClang to parse some C header files. The code I process comes from CXUnsavedFile structures (it is all generated dynamically and nothing lives on disk). For example:
fileA.h contains:
struct STRUCT_A {
int a;
struct STRUCT_B foo;
};
fileB.h contains:
struct STRUCT_B {
int b;
};
When parsing fileA.h with the following code snippet:
CXUnsavedFile unsaved_files[2];
unsaved_files[0].Filename = "fileA.h";
unsaved_files[0].Contents = fileA_contents;
unsaved_files[0].Length = strlen( fileA_contents );
unsaved_files[1].Filename = "fileB.h";
unsaved_files[1].Contents = fileB_contents;
unsaved_files[1].Length = strlen( fileB_contents );
tu = clang_parseTranslationUnit(
index,
"fileA.h",
argv, // "-x c-header -target i386-pc-win32"
argc,
(CXUnsavedFile *)&unsaved_files,
2,
CXTranslationUnit_None
);
CXCursor cur = clang_getTranslationUnitCursor( tu );
clang_visitChildren( cur, visitor, NULL );
I get the error "field has incomplete type 'struct STRUCT_B'" which makes sense as I have not included fileB.h in order to define struct STRUCT_B.
Adding an "#include <fileB.h>" does not work (fatal error: 'fileB.h' file not found).
How do I get parsing fileA.h to work when one or more needed definitions are present in another CXUnsavedFile fileB.h?

Not sure this will help you, but here are two remarks:
Although it isn't explicitly mentioned in the documentation, I think that the Filename field should contain a full path to the file (which could be important for inclusions, especially when there are "-I" switches on the command line).
From libclang's documentation (emphasis mine):
const char* CXUnsavedFile::Filename
The file whose contents have not yet been saved.
This file must already exist in the file system.
I suspect libclang relies on the filesystem for almost everything (finding the correct file to include, checking that it exists, ...) and only takes the CXUnsavedFile entries into account at the last step, when the actual content must be read.
If you can, I would suggest creating empty files in a memory filesystem. This would not incur much resource usage, and could help libclang find the correct include files.
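The placeholder idea above can be sketched in plain C. This is only an illustration: `create_placeholder` is a hypothetical helper, not part of libclang, and the directory you pass (for example a tmpfs mount) is an assumption about your system. It builds a full path and creates an empty file there, so that the same path can then be used as `CXUnsavedFile.Filename`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds "<dir>/<name>" into out and creates an empty
 * file at that path. Returns 0 on success, -1 on failure. */
int create_placeholder(const char *dir, const char *name,
                       char *out, size_t out_size)
{
    assert(dir != NULL && name != NULL && out != NULL);

    int n = snprintf(out, out_size, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= out_size)
        return -1;              /* path did not fit in the buffer */

    FILE *f = fopen(out, "w");  /* creates (or truncates) an empty file */
    if (f == NULL)
        return -1;
    return fclose(f) == 0 ? 0 : -1;
}
```

One would then pass the returned path as the Filename of the matching CXUnsavedFile, add a "-I" switch for the directory, and use #include "fileB.h" in fileA.h. Whether libclang then resolves the include is exactly what the remarks above suspect, so treat it as something to try rather than a guarantee.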


Can memmove accessing the contents of a FILE* and delete information?

Does memmove work on file pointer data?
I am trying to remove a line from a file in C. I am trying to use memmove to make this more efficient than the internet's recommendation to create a duplicate file and overwrite the original. I have debugged it and I can't figure out why this code isn't working, so I am asking for input. The logic is a for loop. Inside the loop I collect what I need for a memmove, but the memmove doesn't seem to have any effect.
int RemoveRow(int iRowNum)
{
char sReplaceLineStart[m_MaxSizeRow]={0};
char sTemp[m_MaxSizeRow] ={0};
size_t RemovalLength = 0;
GoToBeginningOfFile();
for(int i =0;i<m_iNumberOfRows;i++)
{
if(i == iRowNum)
{
// Line to remove
fgets(m_sRemovalRow,m_MaxSizeRow,pFile);
if(m_sRemovalRow == NULL)
{
// We're removing the last line
// just make it null
memset(m_sRemovalRow,0,sizeof(m_MaxSizeRow));
}
}
else if(i==iRowNum+1)
{
// replace removal line with this.
RemovalLength+=strlen(sTemp);
fgets(sReplaceLineStart, m_MaxSizeRow, pFile);
}
else if(i>iRowNum) {
// start line to replace with
RemovalLength+=strlen(sTemp);
fgets(sTemp, m_MaxSizeRow, pFile);
}
else
{
// we're trying to get to the removal line
fgets(m_sCurrentRow, m_MaxSizeRow, pFile);
printf("(not at del row yet)iRow(%d)<iRowNum(%d) %s\n",
i,
m_iNumberOfRows,
m_sCurrentRow);
}
}
{
memmove(m_sRemovalRow,
sReplaceLineStart,
RemovalLength);
}
return 1;
}

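FILE is a so-called opaque type, meaning that the application programmer is purposely locked out of its internals by design (private encapsulation). memmove only moves bytes between memory buffers, so it cannot reach into the file that a FILE* refers to.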
Generally one would create an opaque type using the concept of forward declaration, like this:
// stdio.h
typedef struct FILE FILE;
And then inside the private library:
// stdio.c - not accessible by the application programmer
struct FILE
{
// internals
};
Since FILE was forward declared and we only have access to the header, FILE is now an incomplete type, meaning we can't declare an instance of that type, access its members nor pass it to sizeof etc. We can only access it through the API which does know the internals. Since C allows us to declare a pointer to an incomplete type, the API will use FILE* like fopen does.
However, the implementation of the standard library isn't required to implement FILE like this; the option is simply there. So depending on the implementation of the standard library, we may or may not be able to create an instance of a FILE object and perhaps even access its internals. But that's all in the realm of non-standard language extensions, and such code would be non-portable.
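Since the internals of FILE are off limits, memmove can only help on a buffer you own. The following is a minimal sketch, not the asker's code: it assumes the file contents have already been read into `buf` (for example with fread), and `remove_line` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Removes line number `row` (0-based) from buf[0..len) and returns the new
 * length. Lines end at '\n'; the last line may lack one. If `row` is past
 * the last line, nothing changes. */
size_t remove_line(char *buf, size_t len, size_t row)
{
    assert(buf != NULL);

    size_t start = 0;
    /* Skip `row` complete lines to find where the target line begins. */
    for (size_t i = 0; i < row; i++) {
        char *nl = memchr(buf + start, '\n', len - start);
        if (nl == NULL)
            return len;          /* fewer than row + 1 lines */
        start = (size_t)(nl - buf) + 1;
    }
    if (start >= len)
        return len;              /* no such line */

    /* Find the end of the target line, including its newline if present. */
    char *nl = memchr(buf + start, '\n', len - start);
    size_t end = (nl != NULL) ? (size_t)(nl - buf) + 1 : len;

    /* Slide everything after the line down over it. */
    memmove(buf + start, buf + end, len - end);
    return len - (end - start);
}
```

The shortened buffer still has to be written back through the stdio API (fwrite, then truncating the file by whatever means the platform offers); memmove alone never changes what is on disk.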

Standard Library vs Windows API Speed

My question is about whether or not I should use the Windows API if I'm trying to get the most speed out of my program, where I could instead use a Standard Library function.
I assume the answer isn't consistent among every call; specifically, I'm curious about stat() vs dwFileAttributes if, for example, I wanted to figure out whether a file is a directory (assuming file_name is a string containing the full path to the file):
WIN32_FIND_DATA fileData;
HANDLE hSearch;
hSearch = FindFirstFile(TEXT(file_name), &fileData);
int isDir = fileData.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY;
vs.
struct stat info;
stat(file_name, &info);
int isDir = S_ISDIR(info.st_mode);
If anyone knows, or can elaborate on what the speed difference between these libraries generally is (if any) I'd appreciate it.

Not an answer regarding speed, but @The Corn Inspector, the MSVC CRT code is open sourced. If you look at the stat() function in an older version (before .NET and the common UCRT), it is indeed a wrapper around the same OS call.
int __cdecl _tstat (
REG1 const _TSCHAR *name,
REG2 struct _stat *buf
)
{ // stuff omitted for clarity
_TSCHAR * path;
_TSCHAR pathbuf[ _MAX_PATH ];
int drive; /* A: = 1, B: = 2, etc. */
HANDLE findhandle;
WIN32_FIND_DATA findbuf;
/* Call Find Match File */
findhandle = FindFirstFile((_TSCHAR *)name, &findbuf);
Of course there is additional code for mapping structures, etc. It looks like it also does some time conversion:
SYSTEMTIME SystemTime;
FILETIME LocalFTime;
if ( !FileTimeToLocalFileTime( &findbuf.ftLastWriteTime,
&LocalFTime ) ||
!FileTimeToSystemTime( &LocalFTime, &SystemTime ) )
{
So theoretically it could be slower, but probably by so little that it makes no practical difference in the context of a complete, complex program. If you are calling stat() a million times and worrying about milliseconds, who knows. Profile it.
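As a starting point for that profiling, here is a rough sketch for the stat() side only. It assumes a POSIX-style environment where S_ISDIR is available (on MSVC you would likely test the mode bits yourself), and `time_is_directory` is a hypothetical helper; a FindFirstFile variant would need to be timed the same way for a fair comparison.

```c
#include <assert.h>
#include <sys/stat.h>
#include <time.h>

/* Returns 1 if path is a directory, 0 if it is something else, -1 on error. */
int is_directory(const char *path)
{
    struct stat info;
    if (stat(path, &info) != 0)
        return -1;
    return S_ISDIR(info.st_mode) ? 1 : 0;
}

/* Hypothetical micro-benchmark: CPU seconds spent on `iterations` calls. */
double time_is_directory(const char *path, long iterations)
{
    assert(iterations > 0);
    volatile int sink = 0;   /* keeps the calls from being optimised away */
    clock_t begin = clock();
    for (long i = 0; i < iterations; i++)
        sink += is_directory(path);
    clock_t end = clock();
    (void)sink;
    return (double)(end - begin) / CLOCKS_PER_SEC;
}
```

Repeated calls on one path mostly measure the cached case, so numbers from a loop like this say little about a cold directory scan.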

How to avoid using a global variable when using nftw

I want to use nftw to traverse a directory structure in C.
However, given what I want to do, I don't see a way around using a global variable.
The textbook examples of using (n)ftw all involve doing something like printing out a filename. I want, instead, to take the pathname and file checksum and place those in a data structure. But I don't see a good way to do that, given the limits on what can be passed to nftw.
The solution I'm using involves a global variable. The function called by nftw can then access that variable and add the required data.
Is there any reasonable way to do this without using a global variable?
Here's the exchange in previous post on stackoverflow in which someone suggested I post this as a follow-up.

Using ftw can be really, really bad. Internally it will save the function pointer that you pass in; if another thread then does something else, it will overwrite the function pointer.
Horror scenario:
thread 1: count billions of files
thread 2: delete some files
thread 1: ---oops, it is now deleting billions of
files instead of counting them.
In short, you are better off using fts_open.
If you still want to use nftw then my suggestion is to put the "global" type in a namespace and mark it as "thread_local". You should be able to adjust this to your needs.
/* in some cpp file */
namespace {
thread_local size_t gTotalBytes{0}; // thread local makes this thread safe
int GetSize(const char* path, const struct stat* statPtr, int currentFlag, struct FTW* internalFtwUsage) {
gTotalBytes+= statPtr->st_size;
return 0; // nftw continues
}
} // namespace
size_t RecursiveFolderDiskUsed(const std::string& startPath) {
const int flags = FTW_DEPTH | FTW_MOUNT | FTW_PHYS;
const int maxFileDescriptorsToUse = 1024; // or whatever
const int result = nftw(startPath.c_str(), GetSize, maxFileDescriptorsToUse , flags);
// log or something if result== -1
return gTotalBytes;
}
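For the fts_open route mentioned above, a minimal C sketch might look like the following. It assumes a system that ships <fts.h> (glibc and the BSDs do; it is not part of POSIX), and `disk_used_fts` is a hypothetical name. Because fts is iterator-style rather than callback-style, the running total is an ordinary local variable.

```c
#include <assert.h>
#include <fts.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Sums the sizes of regular files under root. Returns -1 if the walk
 * could not be started. All state lives in local variables. */
long long disk_used_fts(const char *root)
{
    assert(root != NULL);

    char *const paths[] = { (char *)root, NULL };
    FTS *walk = fts_open(paths, FTS_PHYSICAL | FTS_XDEV, NULL);
    if (walk == NULL) {
        perror("fts_open");
        return -1;
    }

    long long total = 0;
    FTSENT *entry;
    while ((entry = fts_read(walk)) != NULL) {
        if (entry->fts_info == FTS_F)        /* regular file */
            total += (long long)entry->fts_statp->st_size;
    }
    fts_close(walk);
    return total;
}
```

FTS_PHYSICAL and FTS_XDEV roughly correspond to the FTW_PHYS and FTW_MOUNT flags used in the nftw version above.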

No. nftw doesn't offer any user parameter that could be passed to the function, so you have to use global (or static) variables in C.
GCC offers an extension "nested function" which should capture the variables of their enclosing scopes, so they could be used like this:
void f()
{
int i = 0;
int fn(const char *,
const struct stat *, int, struct FTW *) {
i++;
return 0;
};
nftw("path", fn, 10, 0);
}

The data is best given static linkage (i.e. file scope) in a separate module that includes only the functions required to access the data, including the function passed to nftw(). That way the data is not visible globally and all access is controlled. It may be that the function that calls nftw() is also part of this module, enabling the function passed to nftw() to also be static, and thus invisible externally.
In other words, you should do what you are probably doing already, but use separate compilation and static linkage judiciously to make the data only visible via access functions. Data with static linkage is accessible by any function within the same translation unit, and you avoid the problems associated with global variables by only including functions in that translation unit that are creators, maintainers or accessors of that data.
The general pattern is:
datamodule.h
#if !defined DATAMODULE_INCLUDE
#define DATAMODULE_INCLUDE
<type> create_data( <args>) ;
<type> get_data( <args> ) ;
#endif
datamodule.c
#include "datamodule.h"
static <type> my_data ;
static int nftwfunc(const char *filename, const struct stat *statptr, int fileflags, struct FTW *pfwt)
{
// update/add to my_data
...
}
<type> create_data( const char* path, <other args>)
{
...
ret = nftw( path, nftwfunc, fd_limit, flags);
...
}
<type> get_data( <args> )
{
// Get requested data from my_data and return it to caller
}
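A concrete instance of this pattern could look like the sketch below. The names (`walk_totals`, `collect_totals`, `add_entry`) are hypothetical, and it stores file counts and sizes rather than checksums to stay short. The static pointer is the only shared state, and it points at storage owned by the caller.

```c
#define _XOPEN_SOURCE 700   /* must precede every #include; exposes nftw() */
#include <assert.h>
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical result type filled in by the walk. */
struct walk_totals {
    long files;
    long long bytes;
};

/* File scope and static: invisible to other translation units. */
static struct walk_totals *current_totals;

static int add_entry(const char *path, const struct stat *sb,
                     int type, struct FTW *ftwbuf)
{
    (void)path;
    (void)ftwbuf;
    if (type == FTW_F) {                 /* regular files only */
        current_totals->files += 1;
        current_totals->bytes += (long long)sb->st_size;
    }
    return 0;                            /* keep walking */
}

/* Public entry point: the only way callers can reach the hidden state. */
int collect_totals(const char *root, struct walk_totals *out)
{
    assert(root != NULL && out != NULL);
    out->files = 0;
    out->bytes = 0;
    current_totals = out;
    int rc = nftw(root, add_entry, 16, FTW_PHYS);
    if (rc == -1)
        perror("nftw");
    current_totals = NULL;
    return rc;
}
```

If several threads may walk directories at once, declaring the pointer with C11's _Thread_local is one option, in the same spirit as the thread_local suggestion in the other answer.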

How does cryoPID create ELF headers or is there an easy way for ELF generation?

I'm trying to write a checkpoint/restart program in C and I'm studying cryoPID's code to see how a process can be restarted. In its code, cryoPID creates the ELF header of the process to be restarted in a function that uses some global variables, and it's really confusing.
I have been searching for an easy way to create an ELF executable file, even trying out libelf, but I find that most of the times some necessary information is vague in the documentation of these programs and I cannot get to understand how to do it. So any help in that matter would be great.
Seeing cryoPID's code I see that it does the whole creation in an easy way, not having to set all header fields, etc. But I cannot seem to understand the code that it uses.
First of all, in the function that creates the ELF the following code is relevant (it's in arch-x86_64/elfwriter.c):
Elf64_Ehdr *e;
Elf64_Shdr *s;
Elf64_Phdr *p;
char* strtab;
int i, j;
int got_it;
unsigned long cur_brk = 0;
e = (Elf64_Ehdr*)stub_start;
assert(e->e_shoff != 0);
assert(e->e_shentsize == sizeof(Elf64_Shdr));
assert(e->e_shstrndx != SHN_UNDEF);
s = (Elf64_Shdr*)(stub_start+(e->e_shoff+(e->e_shstrndx*e->e_shentsize)));
strtab = stub_start+s->sh_offset;
stub_start is a global variable defined with the macro declare_writer in cryopid.h:
#define declare_writer(s, x, desc) \
extern char *_binary_stub_##s##_start; \
extern int _binary_stub_##s##_size; \
struct stream_ops *stream_ops = &x; \
char *stub_start = (char*)&_binary_stub_##s##_start; \
long stub_size = (long)&_binary_stub_##s##_size
This macro is used in writer_*.c which are the files that implement writers for files. For example in writer_buffered.c, the macro is called with this code:
struct stream_ops buf_ops = {
.init = buf_init,
.read = buf_read,
.write = buf_write,
.finish = buf_finish,
.ftell = buf_ftell,
.dup2 = buf_dup2,
};
declare_writer(buffered, buf_ops, "Writes an output file with buffering");
So stub_start gets declared as a global variable (the code above is not in any function) initialized from the address of an extern symbol. Seeing that none of the variables in declare_writer are set in any other part of the code, I assume that stub_start just points to some part of the .bss section, but it seems like cryoPID uses it as if it were pointing to its own ELF header.
Can anyone help me with this problem or assist me in any way to create ELF headers easily?

As mentioned in the comment, it uses something similar to objcopy to set those variables (it doesn't use the objcopy command, but custom linker scripts that I think could be what is "setting" the variables). I couldn't find exactly what does it, but I could reproduce the behavior by mmap'ing a previously compiled executable file and setting the variables stub_start and stub_size from that mapping.
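The mmap workaround described above could be sketched roughly as follows. This is not cryoPID's code: `map_stub` and `looks_like_elf` are hypothetical helpers, and the sketch assumes Linux with <elf.h> available. The returned pointer and size play the roles of stub_start and stub_size.

```c
#include <assert.h>
#include <elf.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Maps the whole file read-only. Returns NULL on failure; on success stores
 * the file size in *size. */
char *map_stub(const char *path, long *size)
{
    assert(path != NULL && size != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size < (off_t)sizeof(Elf64_Ehdr)) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                       /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;

    *size = (long)st.st_size;
    return p;
}

/* Returns 1 if the mapped bytes start with the ELF magic number. */
int looks_like_elf(const char *stub)
{
    const Elf64_Ehdr *e = (const Elf64_Ehdr *)stub;
    return memcmp(e->e_ident, ELFMAG, SELFMAG) == 0;
}
```

With the stub mapped this way, the section-header arithmetic from elfwriter.c quoted in the question can be applied to the returned pointer unchanged.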

get function address from name [.debug_info ??]

I was trying to write a small debug utility and for this I need to get the function/global variable address given its name. This is built-in debug utility, which means that the debug utility will run from within the code to be debugged or in plain words I cannot parse the executable file.
Now, is there a well-known way to do that? The plan I have is to make the .debug_* sections get loaded into memory [which I plan to do by a cheap trick like this in the ld script]
.data {
*(.data)
__sym_start = .;
*(.debug_*);
__sym_end = .;
}
Now I have to parse the section to get the information I need, but I am not sure whether this is doable or whether there are issues with it; this is all just theory. It also seems like too much work :-) Is there a simpler way? Or if someone can tell me upfront why my scheme will not work, that will also be helpful.
Thanks in Advance,
Alex.

If you are running under a system with dlopen(3) and dlsym(3) (like Linux) you should be able to:
char thing_string[] = "thing_you_want_to_look_up";
void * handle = dlopen(NULL, RTLD_LAZY | RTLD_NOLOAD);
// you could do RTLD_NOW as well. shouldn't matter
if (!handle) {
fprintf(stderr, "Dynamic linking on main module : %s\n", dlerror() );
exit(1);
}
void * addr = dlsym(handle, thing_string);
fprintf(stderr, "%s is at %p\n", thing_string, addr);
I don't know the best way to do this for other systems, and this probably won't work for static variables and functions. C++ symbol names will be mangled, if you are interested in working with them.
To expand this to work for shared libraries you could probably get the names of the currently loaded libraries from /proc/self/maps and then pass the library file names into dlopen, though this could fail if the library has been renamed or deleted.
There are probably several other much better ways to go about this.
Edit: without using dlopen
/* name_addr.h */
struct name_addr {
const char * sym_name;
const void * sym_addr;
};
typedef struct name_addr name_addr_t;
void * sym_lookup(const char * name);
extern const name_addr_t name_addr_table[];
extern const unsigned name_addr_table_size;
/* name_addr_table.c */
#include "name_addr.h"
#define PREMEMBER( X ) extern const void * X
/* X is declared as an object above, so take its address to get the symbol's address */
#define REMEMBER( X ) { .sym_name = #X , .sym_addr = (void *) &X }
PREMEMBER(strcmp);
PREMEMBER(printf);
PREMEMBER(main);
PREMEMBER(memcmp);
PREMEMBER(bsearch);
PREMEMBER(sym_lookup);
/* ... */
const name_addr_t name_addr_table[] =
{
/* You could do a #include here that included the list, which would allow you
* to have an empty list by default without regenerating the entire file, as
* long as your compiler only warns about missing include targets.
*/
REMEMBER(strcmp),
REMEMBER(printf),
REMEMBER(main),
REMEMBER(memcmp),
REMEMBER(bsearch),
REMEMBER(sym_lookup),
/* ... */
};
const unsigned name_addr_table_size = sizeof(name_addr_table)/sizeof(name_addr_t);
/* name_addr_code.c */
#include "name_addr.h"
#include <string.h>
void * sym_lookup(const char * name) {
unsigned to_go = name_addr_table_size;
const name_addr_t *na = name_addr_table;
while(to_go) {
if ( !strcmp(name, na->sym_name) ) {
return (void *) na->sym_addr; /* cast drops const to match the return type */
}
na++;
to_go--;
}
}
/* set errno here if you are using errno */
return NULL; /* Or some other illegal value */
}
If you do it this way the linker will take care of filling in the addresses for you after everything has been laid out. If you include header files for all of the symbols that you are listing in your table then you will not get warnings when you compile the table file, but it will be much easier just to have them all be extern void * and let the compiler warn you about all of them (which it probably will, but not necessarily).
You will also probably want to sort your symbols by name such that you can use a binary search of the list rather than iterate through it.
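The sorted-table idea can be sketched with the standard bsearch(). This is a standalone illustration with hypothetical demo symbols, not a drop-in replacement for the files above; the table must be kept in strcmp order by hand or by whatever script generates it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct name_addr {
    const char *sym_name;
    const void *sym_addr;
};

/* Demo targets; any objects or functions would do. */
static int demo_alpha = 1;
static int demo_beta = 2;
static int demo_gamma = 3;

/* Must stay sorted by sym_name (strcmp order) for bsearch to work. */
static const struct name_addr sorted_table[] = {
    { "demo_alpha", &demo_alpha },
    { "demo_beta",  &demo_beta  },
    { "demo_gamma", &demo_gamma },
};

/* bsearch passes the key first and a table element second. */
static int compare_name(const void *key, const void *elem)
{
    const struct name_addr *entry = elem;
    return strcmp((const char *)key, entry->sym_name);
}

const void *sym_lookup_sorted(const char *name)
{
    assert(name != NULL);
    const struct name_addr *hit =
        bsearch(name, sorted_table,
                sizeof sorted_table / sizeof sorted_table[0],
                sizeof sorted_table[0], compare_name);
    return hit != NULL ? hit->sym_addr : NULL;
}
```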
You should note that if you have members in the table which are not otherwise referenced by the program (like if you had an entry for sqrt in the table, but didn't call it) the linker will then want (need) to link those functions into your image. This can make it blow up.
Also, if you were taking advantage of global optimizations having this table will likely make those less effective since the compiler will think that all of the functions listed could be accessed via pointer from this list and that it cannot see all of the call points.
Putting static functions in this list is not straightforward. You could do this by changing the table to dynamic and doing it at run time from a function in each module, or possibly by generating a new section in your object file that the table lives in. If you are using gcc:
#define SECTION_REMEMBER(X) \
static const name_addr_t _name_addr##X \
__attribute__((section("sym_lookup_table"), used)) = \
{.sym_name= #X , .sym_addr = (void *) X }
And tack a list of these onto the end of each .c file with all of the symbols that you want to remember from that file. This will require linker work so that the linker will know what to do with these members, but then you can iterate over the list by looking at the begin and end of the section that it resides in (I don't know exactly how to do this, but I know it can be done and isn't TOO difficult). This will make having a sorted list more difficult, though. Also, I'm not entirely certain initializing the .sym_name to a string literal's address would not result in cramming the string into this section, but I don't think it would. If it did then this would break things.
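One way to find the beginning and end of such a section, sketched under the assumption that you link with GNU ld or a compatible linker: for a section whose name is a valid C identifier, the linker provides `__start_<name>` and `__stop_<name>` symbols. The struct, macro, and demo variables below are hypothetical stand-ins so that the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct sec_entry {
    const char *sym_name;
    const void *sym_addr;
};

/* Places one entry in the custom section; `used` keeps it from being dropped. */
#define SECTION_REMEMBER(X) \
    static const struct sec_entry sec_entry_##X \
        __attribute__((section("sym_lookup_table"), used)) = \
        { .sym_name = #X, .sym_addr = (const void *)&X }

/* Assumption: GNU ld (and compatible linkers) synthesise these two symbols
 * for any section whose name is a valid C identifier. */
extern const struct sec_entry __start_sym_lookup_table[];
extern const struct sec_entry __stop_sym_lookup_table[];

/* Demo targets, hypothetical. */
int demo_counter = 0;
int demo_limit = 10;

SECTION_REMEMBER(demo_counter);
SECTION_REMEMBER(demo_limit);

/* Linear scan over every entry the linker gathered into the section. */
const void *section_lookup(const char *name)
{
    assert(name != NULL);
    for (const struct sec_entry *e = __start_sym_lookup_table;
         e < __stop_sym_lookup_table; e++) {
        if (strcmp(name, e->sym_name) == 0)
            return e->sym_addr;
    }
    return NULL;
}
```

The entries end up in link order, so a binary search would need a sorting pass at startup; that is the extra difficulty mentioned above.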
You can still use objdump to get a list of the symbols that the object file (probably ELF) contains, filter this for the symbols you are interested in, and then regenerate the table file with the table's members listed.