How can I use VTD-XML inside Perl with Inline::C?

I've recently discovered the power of the VTD-XML approach to XML parsing, mainly its speed.
Just to be specific, I have built the C version 2.10 ( there are Java, C++ and C# implementations too ).
My objective is simple: I want to extract data from XML using VTD-XML for parsing, and using Perl to work with data.
The easy way may be to dump the data with a C program I made and send it via a pipe to the Perl program. Maybe not elegant, but it works.
Another, less easy way, consists of a Perl program that calls the C data collector subroutine using Inline::C.
So I started studying Inline::C and managed to do basic things I need to pass data back to Perl from C subroutines using Perl C API functions.
Problems arise in the compiling phase when I write the C collector subroutine in the C source under Inline::C control.
There are symbol conflicts like this: bind() is defined both in socket.h ( Perl ) and in autoPilot.h ( VTD-XML ). Symbol conflicts can be avoided building VTD-XML as a shared library with an explicit export map ( gcc -Wl,-version-script=foo.map )... Is this the right way to go?
Are there better ways?

I did reach my goal by adding a layer of indirection: awful as it seems to me, it works.
First of all, I made a shared library containing the VTD-XML API. Building this shared object, I had to avoid global scope pollution, exporting only the symbols needed.
Then I built another shared library. This second shared library hides the VTD-XML API and is supposed to be used from Perl via Inline::C. In this shared object I wrote a handful of functions that use the partially exposed API of libvtd.so.
The idea looks like this:
Perl -> Inline::C dynamic loader -> wrapper_API.so -> libvtd.so
Major issues came from runtime loading of shared libraries and from symbol collision/resolution.
Here is how I build libvtd.so so that the so-called wrapper_API.so can use it easily.
Unfortunately, VTD-XML doesn't build a libvtd.so shared object, so I had to build it myself linking together several .o object files with gcc:
gcc -shared -fPIC -Wl,-soname,libvtd.so.2.10 -Wl,--version-script=vtd-xml.map \
-o libvtd.so.2.10 libvtd.o arrayList.o fastIntBuffer.o fastLongBuffer.o \
contextBuffer.o vtdNav.o vtdGen.o autoPilot.o XMLChar.o XMLModifier.o intHash.o \
bookMark.o indexHandler.o transcoder.o elementFragmentNs.o
Symbol visibility was tuned with the linker option -Wl,--version-script=vtd-xml.map, where the map file is:
{
global:
the_exception_context;
toString;
getText;
getCurrentIndex;
toNormalizedString;
toElement;
toElement2;
createVTDGen;
setDoc;
parse;
getNav;
freeVTDGen;
freeVTDNav;
getTokenCount;
local:
*;
};
Global ( "exported" ) symbols are under the global: section, while the catchall * under local says all other symbols are only known locally.
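When the sources can be edited or wrapped, the same split between exported and local names can also be expressed in the code instead of in a map file. The following is a minimal sketch, assuming GCC or Clang on an ELF platform and a build with -fvisibility=hidden; the macro and function names (API_EXPORT, API_LOCAL, helper_double, api_quadruple) are made up for illustration and are not part of VTD-XML.

```c
/* Sketch only: all names here are hypothetical, not part of VTD-XML. */
#if defined(__GNUC__)
#  define API_EXPORT __attribute__((visibility("default")))
#  define API_LOCAL  __attribute__((visibility("hidden")))
#else
#  define API_EXPORT
#  define API_LOCAL
#endif

/* Internal helper: hidden, so when this file is built into a shared
   object the name is not exported from it. */
API_LOCAL int helper_double(int x)
{
    return 2 * x;
}

/* Public entry point: plays the role of a name listed under "global:"
   in the map file. */
API_EXPORT int api_quadruple(int x)
{
    return helper_double(helper_double(x));
}
```

Code inside the same shared object can still call helper_double; only lookups from outside the object are affected.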
All object modules come from the VTD-XML distribution, with the exception of libvtd.o: this custom object was needed to address issues with the exception-handling library cexcept.h.
libvtd.c is only two lines of code.
#include "customTypes.h"
struct exception_context the_exception_context[ 1 ];
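To illustrate why a single, shared definition of such a context object matters, here is a rough sketch of a setjmp/longjmp-based exception mechanism built around one global context. It is not cexcept's actual code: the struct layout and the names demo_exception_context, demo_context, demo_throw and demo_try are assumptions made for this example.

```c
#include <setjmp.h>

/* Hypothetical context type; cexcept's real struct differs. */
struct demo_exception_context {
    jmp_buf env;   /* where to resume when something is thrown */
    int     code;  /* the "exception" value being carried */
};

/* Exactly one definition must exist in the whole program: every thrower
   and every catcher has to agree on the same object. */
struct demo_exception_context demo_context[1];

/* Records the code and jumps back to the most recent setjmp. */
static void demo_throw(int code)
{
    demo_context->code = code;
    longjmp(demo_context->env, 1);
}

/* Returns 0 if nothing was thrown, otherwise the thrown code. */
int demo_try(int should_fail)
{
    if (setjmp(demo_context->env) == 0) {
        if (should_fail)
            demo_throw(42);
        return 0;
    }
    return demo_context->code;
}
```

If the context were defined separately in two shared objects, a throw in one could end up using a different jump buffer from the one the catcher in the other had set up, which is why the definition is kept in one place and exported.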
In the compilation phase I had to adjust CFLAGS to produce position-independent code ( gcc -fPIC option ), which is required to build shared objects.
The readelf tool was useful to check symbol visibility:
readelf --syms libvtd.so.2.10
Symbol table '.dynsym' contains 35 entries:
Num: Value Size Type Bind Vis Ndx Name
...
280: 000000000000d010 117 FUNC LOCAL DEFAULT 12 writeIndex
281: 000000000003c5d0 154 FUNC LOCAL DEFAULT 12 setCursorPosition
282: 000000000003c1f0 56 FUNC LOCAL DEFAULT 12 resetIntHash
...
331: 0000000000004f50 3545 FUNC GLOBAL DEFAULT 12 toElement
332: 00000000000071e0 224 FUNC GLOBAL DEFAULT 12 getText
333: 000000000000d420 114 FUNC GLOBAL DEFAULT 12 freeVTDGen
...
339: 000000000000b600 731 FUNC GLOBAL DEFAULT 12 toElement2
340: 000000000000e650 120 FUNC GLOBAL DEFAULT 12 getNav
341: 0000000000025750 70567 FUNC GLOBAL DEFAULT 12 parse
wrapper_API.so consists of several functions that use the VTD-XML API and its custom types, but accept and return only standard C types and/or structs.
The wrapper came straight from a former standalone C program.
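The shape of such a wrapper can be sketched as follows. This is only an illustration of the "standard types at the boundary" idea: internal_doc stands in for the VTD-XML types, and wrapper_open, wrapper_length and wrapper_close are hypothetical names, not the functions of the actual wrapper.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for a library-specific type (the real wrapper would use
   VTD-XML's own structs here); it never leaves this file. */
typedef struct {
    char  *doc;
    size_t len;
} internal_doc;

/* Copies the document and returns an opaque handle, or NULL on failure. */
void *wrapper_open(const char *xml)
{
    internal_doc *d;

    if (xml == NULL)
        return NULL;
    d = malloc(sizeof *d);
    if (d == NULL)
        return NULL;
    d->len = strlen(xml);
    d->doc = malloc(d->len + 1);
    if (d->doc == NULL) {
        free(d);
        return NULL;
    }
    memcpy(d->doc, xml, d->len + 1);
    return d;
}

/* Returns the document length in bytes, or -1 for a NULL handle. */
long wrapper_length(const void *handle)
{
    const internal_doc *d = handle;
    return d ? (long)d->len : -1L;
}

/* Releases everything owned by the handle; NULL is accepted. */
void wrapper_close(void *handle)
{
    internal_doc *d = handle;

    if (d == NULL)
        return;
    free(d->doc);
    free(d);
}
```

Because only void *, const char * and long cross the boundary, the caller needs none of the wrapped library's headers, so their names cannot collide with Perl's.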

Related

How to link an externally built C library to Ada default runtime math services (sin, cos, etc.)?

I need to use an externally built C library, which does some calculations with trigonometric services, from my Ada program. This works well using an stm32 bb runtime (SFP), but when trying to do the same in a native environment using the default Ada runtime, I end up with linking problems. Hope I can find some help here.
I tried several configurations of project files (gpr) solutions and I always end up with the same kind of linking error:
Memory region Used Size Region Size %age Used
/opt/gnat/gnat_native/bin/../libexec/gcc/x86_64-pc-linux-gnu/7.3.1/ld: /home/pulsar/repos/pulsar-software/something/lib_c/libC.a(something.o): in function `compute':
(.text+0xa5): undefined reference to `sin'
collect2: error: ld returned 1 exit status
Here is what I've got so far.
The C library build sequence is as follows (confirmed by the library provider):
$ gcc -c something.c -o something.o
$ ar -r libsomethingLib.a something.o
The C library gpr file something_lib_c.gpr:
library project Something_Lib_C is
for Languages use ("C");
for Externally_Built use "true";
for Source_Dirs use ("src_c");
for Library_Dir use "lib_c";
for Library_Name use "somethingLib";
for Library_Kind use "static";
end Something_Lib_C;
In the lib_c directory, I have the actual library libsomethingLib.a
In the src_c directory, I have the header API to use the C library (something.h):
#ifndef _GEOCAGING_H
#define _GEOCAGING_H
typedef struct something_s something_t;
extern void compute(something_t* const self);
#endif // _GEOCAGING_H
Then here is the Ada project file that wraps the C library something_lib.gpr:
with "something_lib_c.gpr";
project Something_Lib extends "../settings.gpr" is
for Languages use ("Ada");
for Source_Dirs use ("./src_ada");
for Object_Dir use "obj" & "/" & Target & "/" & Build;
end Something_Lib;
In the directory src_ada, I have the Ada API wrapper (something_api.ads):
with Interfaces; use Interfaces;
with Interfaces.C; use Interfaces.C;
package Something_API is
type T_Something is null record;
procedure Compute (Something : access T_Something)
with Import => True,
Convention => C,
External_Name => "compute";
end Something_API;
And finally, I call the compute service from my Ada program by with-ing the Ada API wrapper.
Once again, when building/linking the whole thing for an arm-eabi target, using an stm32-full or stm32-sfp Ada runtime, everything runs well and the behavior of the library is validated.
The whole point is I'd like to do the thing in a native environment in order to run CI tests on it and I can't find a way to pass the link stage.
One last thing: the generic settings.gpr project file contains some common Ada build/bind/link switches that I can provide if necessary. But I can't see how these could work for arm and not for native with the same options. This HAS to be related to the default Ada runtime...
Any idea?
If you were building with a C main program, what would you have to do to bring in the maths libraries at link time? ... possibly something like
gcc foo.c -l somethingLib -lm
What you need to do is to arrange for the -lm to be included whenever you call in something_lib_c.gpr.
I think that what you need to do is to modify library project Something_Lib_C to include the line
for Library_Options use ("-lm");
OK, my HUGE apologies to all of you who tried to help...
The solution was more obvious than I thought, I was just too obsessed with the thing working in arm and not in native.
BUT, the solution was simply to add the -lm switch to the global linker switches. Hence:
Ada_Switches_Linker_Native := (
"-Wl,--gc-sections"
,"-Wl,--verbose"
,"-Wl,-lm"
);
package Linker is
case Target is
when "native" =>
for Switches ("Ada") use Ada_Switches_Linker_Native;
...
end case;
end Linker;
In case it is of interest to someone else: it works out of the box in the arm environment and not in native because the default runtime does not embed a specific mathematical library, and you are supposed to use the C one provided by gcc, linking it through the -lm switch.
On the contrary, when using a target-specific runtime like arm (for stm32f4, for example), the correct mathematical libraries are provided, selected and automatically linked depending on your compilation options (-mhard-float, -msoft-float, etc.).
Sorry again and thank you very much for your time.

Sharing C functions between two XS Perl modules

I have a Perl module A that is an XS-based module. I have an A.xs file and an aux_A.c file, where I have some standard C functions. I use DynaLoader, and it works fine.
Now I have a new module B that is also an XS module. It likewise has a B.xs file and an aux_B.c file. I want a standard C function defined in aux_B.c to be able to use a function defined in aux_A.c.
One option is to make module A build a standard C library and link module B with it. But I was trying to get away from that option.
Is there any other way to go?
What I am currently getting is DynaLoader complaining about an undefined symbol when trying to load the B.so library.
Thanks
Alberto
To make module A export its C symbols with DynaLoader, you have to add the following to A.pm:
sub dl_load_flags { 1 }
This is badly documented, unfortunately. See this thread on PerlMonks and the DynaLoader source code for more details. The effect of the flag is to set RTLD_GLOBAL when loading A.so with dlopen, which makes its symbols available to other shared objects.
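The effect of the flag can be pictured with a small C sketch. This is not DynaLoader's source: the function name dlopen_mode_from_flags is hypothetical, and the use of RTLD_LAZY as the base mode and of bit 0x01 as the "global" bit are assumptions made for illustration.

```c
#include <dlfcn.h>

/* Sketch of the flag handling, not DynaLoader's actual code.
   Assumption: bit 0x01 of the value returned by dl_load_flags
   asks for RTLD_GLOBAL; everything else keeps the default. */
int dlopen_mode_from_flags(int dl_load_flags)
{
    int mode = RTLD_LAZY;           /* assumed base mode */

    if (dl_load_flags & 0x01)
        mode |= RTLD_GLOBAL;        /* symbols become visible to later loads */
    return mode;
}
```

The resulting mode would be passed as the second argument to dlopen(path, mode). Without RTLD_GLOBAL, the symbols of A.so are not used to resolve references in B.so when it is loaded later, which matches the undefined-symbol error described in the question.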

How to generate a hex without main function using IAR linker - xlink?

The point is to generate a hex without main function using IAR linker - xlink?
This code should be loaded into the RAM of RL78 MCU.
A quick Google search of iar generate hex from library brought me to this document, "Creating an Absolutely Placed Library", as a first result. It has all the information you need, plus some information on using a CRC for consistency checking. The document is for the IAR EWRX variant, but the concepts should all be the same.
The basic process is to compile your library as an executable, but without a main() function in it. You'll need to set your library configuration under General -> Library Options to None. You can also setup your file conversion settings at this point.
Since you don't have a main() function for a program entry point, you will need to create an entry function to call the IAR C runtime initialization function, __iar_data_init2(), and then set the linker to use this function as the entry point (which can be found under Linker Options -> Library Options).
When building a library, all the symbols will be preserved until the final link step for the application using it, but since you are building this as an executable, it is important that the symbols you want to keep have the __root keyword, or under Linker -> Extra Options you can specify --no-remove to keep all symbols.
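A rough sketch of what such an entry function and a kept symbol might look like is given below. The names lib_entry, lib_add_offset, LIB_ROOT and LIB_DATA_INIT are hypothetical, the prototype of __iar_data_init2 is an assumption, and the non-IAR branch exists only so the sketch also compiles with a host compiler.

```c
#ifdef __IAR_SYSTEMS_ICC__
/* Assumed prototype of the IAR runtime data-initialisation routine. */
extern void __iar_data_init2(void);
#  define LIB_ROOT        __root
#  define LIB_DATA_INIT() __iar_data_init2()
#else
/* Host build: there is no IAR runtime, so both macros are no-ops. */
#  define LIB_ROOT
#  define LIB_DATA_INIT() ((void)0)
#endif

/* Initialised data: on the target this only holds 10 after the
   runtime has copied the initialisers into RAM. */
static int lib_offset = 10;

/* Hypothetical entry function, to be named as the program entry in
   the linker options and called by the application before anything else. */
LIB_ROOT void lib_entry(void)
{
    LIB_DATA_INIT();
}

/* Example library function; the lib_ prefix matches an export
   pattern such as "show lib_*". */
LIB_ROOT int lib_add_offset(int x)
{
    return x + lib_offset;
}
```

On the target, calling lib_add_offset before lib_entry would read lib_offset before it has been initialised, which is why the application has to call the entry function first.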
In the next step, you need to use isymexport to export the symbols that you want. You will need a file to direct the tool what to export. In the example, they have a file that just contains the following:
show lib_*
show __checksum*
This will direct the tool to export all symbols beginning with lib_ and all symbols beginning with __checksum. They note that __iar_data_init2() should not be exported, as this would cause conflicts with the application that ultimately will use this code. You invoke the tool like so:
isymexport <path to .out file> <path to output from tool> --edit <path to file created above>
Now you should have the output from isymexport and the library file that you were looking for. For the application using this library, you'll need to add the output from isymexport as a library under Linker -> Library, and in your application, you'll need to call your entry function in the library before you attempt to use any of the library's symbols.
This should be the information you need to generate a library that lives in a hex file and can be loaded separately, as well as how to use that library. The referenced document has a lot more detail, so if it is available at that link (or can be found elsewhere by title) it will be a better reference than my summary here.

Override weak symbols in static library

I want to make a static .a library for my project from multiple sources; some of them define weak functions and others implement them. Let's say, as an example, I have:
lib1.c :
void defaultHandler()
{
for(;;);
}
void myHandler() __attribute__((weak, alias ("defaultHandler")));
lib2.c :
void myHandler()
{
/* do my stuff here */
}
Then I want to put them into one single library, so that it seems transparent for the end application
$ ar -r libhandlers.a lib1.o lib2.o
But there are now 2 myHandler symbols in libhandlers:
$ nm libhandlers.a | grep "myHandler"
00000001 W myHandler
00000581 T myHandler
And then, when using the lib, the weak symbol is linked. The only solution I have for the moment is to leave lib2.c out of the library and add it as a source in the application's Makefile… that's not satisfying, since I would like to provide only a few libraries to use and not a whole bunch of files.
The --whole-archive option is also not satisfying, since I work on an embedded system and I don't want to include all the things I don't need.
Is there a way to compile the library so that the weak symbol disappears if a strong one is provided?
NOTE: I'm using arm-none-eabi-gcc v4.8
This is a byproduct of the way that .a libraries work - they're simply a collection of .o files.
What happens at link time is that the first reference to the name gets resolved to the weak definition, and the strong one never gets a look in.
You can test this yourself by giving both definitions the same name and making both strong; you'll see exactly the same behaviour.
If you want the strong references resolved first, then put them earlier in the archive, or create a separate strong archive and link that first in the link-line.
While not directly applicable to your case, as you're using an embedded environment, weak vs. strong references come into force properly when creating/consuming .so dynamic libraries rather than .a archives. When you create a .so, all the weak references that make up the library will not generate an error, and only one of them will be used for the final product; and if there is a strong definition anywhere then it gets used rather than any of the weak ones (this only works properly if, when you're creating a .so, you link all the .o files that make it up separately, or use the --whole-archive when creating the .so if linking to a .a).
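To check which definition a given link actually picked, it can help to make the default handler observable. The following is a minimal sketch based on the lib1.c from the question, assuming GCC or Clang on an ELF target; the return values and the dispatch function are additions made for this illustration.

```c
/* Same structure as lib1.c, but the default returns a code instead of
   looping forever so that the outcome can be observed. */
int defaultHandler(void)
{
    return -1;
}

/* Weak alias: used only if no strong myHandler takes part in the link. */
int myHandler(void) __attribute__((weak, alias("defaultHandler")));

/* Calls whichever definition of myHandler the linker selected. */
int dispatch(void)
{
    return myHandler();
}
```

Linked on its own, dispatch() returns -1. If an object file containing a strong int myHandler(void) is named directly on the link line, that definition is used instead and dispatch() returns its value.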

Re-export Shared Library Symbols from Other Library (OS X / POSIX)

My question is fairly OS X on x86-64 specific but a universal solution that works on other POSIX OSes is even more appreciated.
I am given a list of symbol names of some shared library (called the original library in the following), and I want my shared library to re-export these symbols. Re-export means that if someone tries to resolve a symbol against my library, I either provide my version of this symbol or (if my library doesn't have this symbol) forward to the original library's symbol.
I don't know the types of the symbols, I only know whether they are functions (type T in nm output) or other symbols (type S in nm output).
For functions, I already have a solution: for every function I want to re-export, I generate an assembly stub that dynamically resolves the symbol (using dlsym()) and then jumps into the resolved function with the very same environment (registers rdi, rsi, rdx, rcx, r8, r9, stack pointer, ...). I'm basically generating universal proxy functions. Using some macro trickery, these can be generated fairly easily without writing code for each and every symbol.
For non-function symbols the problem seems to be harder: I cannot generate this universal proxy function, because the resolving party never calls a function.
Using a constructor function static void init(void) __attribute__((constructor)); I can execute code whenever someone loads my library, that would be a good point to resolve and re-export all non-function symbols if that's possible.
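The timing of such a constructor can be sketched as follows, assuming GCC or Clang. The names original_value and forwarded_value are hypothetical; in the real case the address would be obtained with dlsym() on the original library rather than taken from a local variable. The sketch only shows that the code runs at load time and does not by itself solve the re-export problem.

```c
#include <stddef.h>

/* Stand-in for a data symbol that lives in the original library. */
static int original_value = 7;

/* Pointer this library would hand out for the non-function symbol. */
int *forwarded_value = NULL;

static void init(void) __attribute__((constructor));

/* Runs when the library (or program) is loaded, before main(). */
static void init(void)
{
    /* In the real case this address would come from dlsym(). */
    forwarded_value = &original_value;
}
```

By the time any ordinary code in the process runs, forwarded_value already points at the stand-in object.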
In other words, I'd like to write the symbol table of my library to point to the respective symbols of another shared library. Doing the rewriting at compile or run time is okay (run time preferred). Or put yet another way, the behaviour of DYLD_INSERT_LIBRARIES (LD_PRELOAD) is exactly what I need but I don't want to insert a new library, I want to replace one (in the file system). EDIT: The reason I don't want/can't use DYLD_INSERT_LIBRARIES or any other environment variable of the DYLD_* family is that they are ignored for code signed, restricted, ... binaries.
I'm aware of the -reexport-l, -reexport_library and -reexported_symbols_list linker flags but I could not get them to work, especially when my library is a "replacement" for frameworks that are part of umbrella frameworks (example: /System/Library/Frameworks/CoreServices.framework/Frameworks/SearchKit.framework/SearchKit), because ld forbids linking directly against parts of umbrella frameworks.
EDIT: Because I explained it somewhat ambiguously: I can't change the way the actual program is linked. The goal is to produce a shared library that is a replacement for the original library. (Apparently called filter library.)
Found it out now (OS X specific): clang -o replacement-lib.dylib ... -Xlinker -reexport_library PATH_TO_ORIGINAL_LIB does the trick. PATH_TO_ORIGINAL_LIB could for example be /System/Library/Frameworks/CoreServices.framework/Frameworks/SearchKit.framework/Versions/Current/SearchKit.
If PATH_TO_ORIGINAL_LIB is a library that is part of an umbrella framework (as in the example above), then replace PATH_TO_ORIGINAL_LIB by the path of some other lib (I created a lib empty.dylib for that) and as a second step do
install_name_tool -change /usr/local/lib/empty.dylib PATH_TO_ORIGINAL_LIB replacement-lib.dylib
To see if the actual reexporting worked use:
otool -l replacement-lib.dylib | grep -A2 LC_REEXPORT_DYLIB
The output should look like
cmd LC_REEXPORT_DYLIB
cmdsize XX
name empty.dylib (offset YY)
After launching the install_name_tool it could be
cmd LC_REEXPORT_DYLIB
cmdsize XX
name /System/Library/Frameworks/CoreServices.framework/Frameworks/SearchKit.framework/Versions/Current/SearchKit (offset YY)
You could link against both libraries and use the link order to make sure to link against the right symbols. This works on both OS X and Linux:
cc -o executable -lmylib -loriglib
Where origlib is the original library and mylib contains symbols that are supposed to overwrite symbols in origlib. Then the executable will be linked against your symbols from mylib first and all unresolved symbols will be linked against origlib.
This works in the same way when linking against OS X frameworks. Just link against your library that replaces symbols first and against the framework after.
cc -o executable -lmylib -framework SomeFramework
Edit: If you just want to replace symbols at runtime, you can use LD_PRELOAD in the same way (on OS X the corresponding variable is DYLD_INSERT_LIBRARIES):
cc -o executable -framework SomeFramework
LD_PRELOAD=libmylib.dylib ./executable
