Watch out for C function names with R code

So here is something a bit crazy.
If you have some C code which is called by an R function (as a shared object), try adding this to the code:
void warn() {
    int i; // just so the function has some work, but you could make it empty too, or do other stuff
}
If you then call warn() anywhere in the C code being called by the R function, you get a segfault:
*** caught segfault ***
address 0xa, cause 'memory not mapped'
Traceback:
1: .C("C_function_called_by_R", as.double(L), as.double(G), as.double(T), as.integer(nrow), as.integer(ncolL), as.integer(ncolG), as.integer(ncolT), as.integer(trios), as.integer(seed), as.double(pval), as.double(pval1), as.double(pval2), as.double(pval3), as.double(pval4), as.integer(ntest), as.integer(maxit), as.integer(threads), as.integer(quietly))
2: package_name::R_function(L, G, T, trios)
3: func()
4: system.time(func())
5: doTryCatch(return(expr), name, parentenv, handler)
6: tryCatchOne(expr, names, parentenv, handlers[[1L]])
7: tryCatchList(expr, classes, parentenv, handlers)
8: tryCatch(expr, error = function(e) { call <- conditionCall(e) if (!is.null(call)) { if (identical(call[[1L]], quote(doTryCatch))) call <- sys.call(-4L) dcall <- deparse(call)[1L] prefix <- paste("Error in", dcall, ": ") LONG <- 75L msg <- conditionMessage(e) sm <- strsplit(msg, "\n")[[1L]] w <- 14L + nchar(dcall, type = "w") + nchar(sm[1L], type = "w") if (is.na(w)) w <- 14L + nchar(dcall, type = "b") + nchar(sm[1L], type = "b") if (w > LONG) prefix <- paste(prefix, "\n ", sep = "") } else prefix <- "Error : " msg <- paste(prefix, conditionMessage(e), "\n", sep = "") .Internal(seterrmessage(msg[1L])) if (!silent && identical(getOption("show.error.messages"), TRUE)) { cat(msg, file = stderr()) .Internal(printDeferredWarnings()) } invisible(structure(msg, class = "try-error", condition = e))})
9: try(system.time(func()))
10: .executeTestCase(funcName, envir = sandbox, setUpFunc = .setUp, tearDownFunc = .tearDown)
11: .sourceTestFile(testFile, testSuite$testFuncRegexp)
12: runTestSuite(testSuite)
aborting ...
Segmentation fault (core dumped)
Needless to say the code runs fine if you call the same function from a C or C++ wrapper instead of from an R function. If you rename warn() it also works fine.
Any ideas? Is this a protected name/symbol? Is there a list of such names? I'm using R version 2.14.1 on Ubuntu 12.01 (i686-pc-linux-gnu (32-bit)). C code is compiled with GNU GCC 4.6.3.

This seems like quite an interesting question. Here's my minimal example: in a file test.c I have
void warn() {}
void my_fun() { warn(); }
I compile and run it with
$ R CMD SHLIB test.c
$ R -e "dyn.load('test.so'); .C('my_fun')"
With my Linux gcc version 4.6.3, the R output is
> dyn.load('test.so'); .C('my_fun')
R: Success
list()
with that "R: Success" coming from the warn function defined in libc (see man warn, defined in err.h). What happens is that R loads several dynamic libraries as a matter of course, and then loads test.so as instructed. When my_fun gets called, the dynamic linker resolves warn, but the rules of resolution are to search globally for the warn symbol, and not just in test.so. I really don't know what the global search rules are, perhaps in the order the .so's were opened, but whatever the case the resolution is not where I was expecting.
What is to be done? Specifying
static void warn() {}
forces resolution at compile time, when the .o is created, and hence avoids the problem. This wouldn't work if, for instance, warn were defined in one file (utilities.c) and my_fun in another. On Linux dlopen (the function used to load a shared object) can be provided with a flag RTLD_DEEPBIND that does symbol resolution locally before globally, but (a) R does not use dlopen that way and (b) there are several (see p. 9) reservations with this kind of approach. So as far as I can tell the best practice is to use static where possible, and to name functions carefully to avoid conflicts. The latter is not quite as bad as it seems, since R loads package shared objects such that the package symbols themselves are NOT added to the global name space (see ?dyn.load and the local argument, and also note the OS-specific caveats).
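To make that naming advice concrete, here is a hypothetical sketch (the mypkg_ prefix and file names are made up): give every internal helper a package-specific prefix, which avoids the collision even when the helper and its caller sit in different translation units, where static alone would not help.
/* utilities.c -- the helper carries a package-specific prefix,
   so it can no longer collide with libc's warn() */
void mypkg_warn(void) { /* ... */ }

/* entry.c -- the routine invoked from R via .C() */
void mypkg_warn(void);   /* or put the declaration in a shared mypkg.h */
void my_fun(void) { mypkg_warn(); }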
I'd be interested in hearing of a more robust 'best practice'.

Related

Pass array to Julia binary code in a similar way as in C/C++

(I am editing my initial post to add some more information)
I have recently moved to Julia because of its nice ability to create a binary out of the code.
While following this documentation, I managed to create a binary.
Now, my intention is to pass an array to this binary. As mentioned in the documentation, one can pass arguments by using the global variable ARGS.
I am not sure how this helps in getting/returning an array.
To be more specific, I would like to:
write my algorithm in Julia (which takes an array, does some calculations, and returns a new array)
do the precompile
create the sysimage
create the shared library
and then call it in a similar way as in the documentation
EDIT: Since the above seemed to be quite tricky, I thought I could follow the example here, simply creating my own module. That didn't work.
Here is what I tried:
I created the "my_test.jl"
module my_test

export real_main
export julia_main

function real_main(x::Float64, y::Float64)
    println("from main ", x, " ", y)
end

Base.@ccallable function julia_main(x::Float64, y::Float64)::Cint
    try
        real_main(x, y)
        return 0
    catch
        Base.invokelatest(Base.display_error, Base.catch_stack())
        return 1
    end
    return 0
end

if abspath(PROGRAM_FILE) == @__FILE__
    julia_main(3., 4.)
end

end
then I precompiled it, by using:
julia --startup-file=no --trace-compile=app_precompile.jl my_test.jl
Once the pre-compilation was successful, I created the create_sysimage.jl:
Base.init_depot_path()
Base.init_load_path()

@eval Module() begin
    Base.include(@__MODULE__, "my_test.jl")
    for (pkgid, mod) in Base.loaded_modules
        if !(pkgid.name in ("Main", "Core", "Base"))
            eval(@__MODULE__, :(const $(Symbol(mod)) = $mod))
        end
    end
    for statement in readlines("app_precompile.jl")
        try
            Base.include_string(@__MODULE__, statement)
        catch
            # See julia issue #28808
            Core.println("failed to compile statement: ", statement)
        end
    end
end # module

empty!(LOAD_PATH)
empty!(DEPOT_PATH)
Then, I built the shared library based on that image, in 2 steps:
julia --startup-file=no -J"$JULIA_DIR/lib/julia/sys.so" --output-o sys.o create_sysimage.jl
gcc -g -shared -o libsys.so -Wl,--whole-archive sys.o -Wl,--no-whole-archive -L"$JULIA_DIR/lib" -ljulia
Once this succeeded, I created a cpp file to use the library above, my_test.cpp:
#include <julia.h>
JULIA_DEFINE_FAST_TLS()
int main()
{
libsupport_init();
jl_options.use_compiled_modules = JL_OPTIONS_USE_COMPILED_MODULES_YES;
jl_options.image_file = JULIAC_PROGRAM_LIBNAME;
jl_options.image_file_specified = 1;
jl_init_with_image(NULL,JULIAC_PROGRAM_LIBNAME);
// Enabling the below gives a better explanation of the failure
/*
jl_eval_string("using Main.my_test.jl");
if (jl_exception_occurred()) {
jl_call2(jl_get_function(jl_base_module, "showerror"),
jl_stderr_obj(),
jl_exception_occurred());
jl_printf(jl_stderr_stream(), "\n");
jl_atexit_hook(2);
exit(2);
}
jl_module_t* LA = (jl_module_t *)jl_eval_string("Main.my_test");
if (jl_exception_occurred()) {
jl_call2(jl_get_function(jl_base_module, "showerror"),
jl_stderr_obj(),
jl_exception_occurred());
jl_printf(jl_stderr_stream(), "\n");
jl_atexit_hook(3);
exit(3);
}
*/
jl_function_t *func1 = jl_get_function(jl_main_module, "julia_main");
if (jl_exception_occurred()) {
jl_call2(jl_get_function(jl_base_module, "showerror"),
jl_stderr_obj(),
jl_exception_occurred());
jl_printf(jl_stderr_stream(), "\n");
jl_atexit_hook(4);
exit(4);
}
jl_value_t* in1 = jl_box_float64(12.);
jl_value_t* in2 = jl_box_float64(24.);
jl_value_t* ret = NULL;
JL_GC_PUSH3(&in1,&in2,&ret);
ret = jl_call2(func1, in1, in2);
JL_GC_POP();
jl_atexit_hook(0);
}
And then compile it, as:
g++ -o pass_2_arrays_to_my_test_by_eval -fPIC -I$JULIA_DIR/include/julia -L$JULIA_DIR/lib -ljulia -L$CURRENT_FOLDER -lsys pass_2_arrays_to_my_test_by_eval.cpp $JULIA_DIR/lib/julia/libstdc++.so.6
JULIA_DIR points to Julia's installation directory and CURRENT_FOLDER points to the current working directory.
Calling pass_2_arrays_to_my_test_by_eval fails with a Segmentation Fault.
To my understanding, it fails because it cannot load the module (you can see that if you un-comment some lines in the cpp code).
Could someone give some help on that?
Some people seem to have done this in the past without any issue (as here).
Thanks a lot in advance!

"not resolved from current namespace" error, when calling C routines from R

I have recently been doing some computational testing with mgcv GAM. Some original functions are modified, and some are added. In order not to break compatibility, for every function I want to modify, I create a new version with a .zheyuan suffix in the function name. For example, for the Sl.fit function that does penalized least squares fitting, I will have an Sl.fit.zheyuan. I simply collect all the R functions I write into a standalone R script "zheyuan.R". By adding this file to the R directory of the mgcv_1.8-17 package source, and compiling this modified package into a local path, I can load it for testing purposes.
I have no problem adding R routines, but I do when adding C routines. No error occurs when installing the modified package, but when I call the R wrapper function for my added C routine, I get the error in my question title. If you are interested in my case, you can follow these steps to reproduce the error.
Step 1: download the latest package source
Download the 1.8-17 version from the above link. That link will die when a new mgcv release is published on CRAN, but you can always go to the mgcv CRAN page to download the latest release.
Let's untar the source. First, remove the file MD5, so that we don't get an annoying MD5 warning when compiling the modified version. In the following, we will add new material to the R and src directories.
Step 2: create an R script
Consider the following R wrapper function:
RX <- function (R, X) {
X <- X + 0
.Call("C_mgcv_RX", R, X)
X
}
Create an R script "zheyuan.R" to place this function, and add it into mgcv/R.
Step 3: adding C routines
C routines for matrix computations normally live in src/mat.c, so let's append a new function to the end of that file:
void mgcv_RX (SEXP R, SEXP X) {
int nrowX = nrows(X);
int ncolX = ncols(X);
double one = 1.0;
F77_CALL(dtrmm)("l", "u", "n", "n", &nrowX, &ncolX, &one, REAL(R), &nrowX, REAL(X), &nrowX);
}
This is a simple routine multiplying an upper triangular matrix R with a rectangular matrix X. The output matrix will overwrite X. Level-3 BLAS dtrmm will be called for this purpose. We don't need to worry about header files or runtime linking to the BLAS library: the headers are available in mat.c, and linking to BLAS is managed by R.
Step 4: register C routine
The above is not sufficient on its own. Each C routine in mgcv appears in three places. For example, let's search for an existing native routine:
grep mgcv_RPPt mgcv/src/*
# mgcv/src/init.c: {"mgcv_RPPt",(DL_FUNC)&mgcv_RPPt,3},
# mgcv/src/mat.c:void mgcv_RPPt(SEXP a,SEXP r, SEXP NT) {
# mgcv/src/mgcv.h:void mgcv_RPPt(SEXP a,SEXP r, SEXP NT);
We also need to declare the routine in the header file mgcv.h, and register it in init.c.
Let's append
void mgcv_RX (SEXP R, SEXP X);
to the end of mgcv.h, and inside init.c, do:
R_CallMethodDef CallMethods[] = {
{"mgcv_pmmult2", (DL_FUNC) &mgcv_pmmult2,5},
{"mgcv_Rpiqr", (DL_FUNC) &mgcv_Rpiqr,5},
{"mgcv_tmm",(DL_FUNC)&mgcv_tmm,5},
{"mgcv_Rpbsi",(DL_FUNC)&mgcv_Rpbsi,2},
{"mgcv_RPPt",(DL_FUNC)&mgcv_RPPt,3},
{"mgcv_Rpchol",(DL_FUNC)&mgcv_Rpchol,4},
{"mgcv_Rpforwardsolve",(DL_FUNC)&mgcv_Rpforwardsolve,3},
{"mgcv_Rpcross",(DL_FUNC)&mgcv_Rpcross,3},
{"mgcv_RX",(DL_FUNC)&mgcv_RX,2}, // we add this line
{NULL, NULL, 0}
};
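For context, this table only takes effect because init.c also contains an R_init_mgcv() routine that hands it to R when the package's shared object is loaded. A rough sketch of what that typically looks like, following Writing R Extensions (mgcv's real init.c also registers .C routines and may differ in detail):
#include <R_ext/Rdynload.h>

void R_init_mgcv(DllInfo *dll)
{
    /* CallMethods is the table above; the NULL slots would hold
       .C, .Fortran and .External tables if present */
    R_registerRoutines(dll, NULL, CallMethods, NULL, NULL);
    /* FALSE restricts string-based symbol lookup to registered routines */
    R_useDynamicSymbols(dll, FALSE);
}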
Step 5: compile and load
Tar the modified mgcv folder into mgcv.tar.gz.
Open up a new, clean R session (possibly you need R --vanilla for start-up). Then specify a local library path and run:
path <- getwd() ## let's just use current working directory
## make sure you move "mgcv.tar.gz" into current working path
install.packages("mgcv.tar.gz", repos = NULL, lib = path)
library(mgcv, lib.loc = path)
Step 6: test and get error
R <- matrix(runif(25), 5)
R[lower.tri(R)] <- 0
X <- matrix(runif(25), 5)
mgcv:::RX(R, X) ## function is not exported, so use `mgcv:::` to find it
# Error in .Call("C_mgcv_RX", R, X) :
# "C_mgcv_RX" not resolved from current namespace (mgcv)
Could anyone explain why and how to resolve this?
I have a temporary "fix" now. Instead of
.Call("C_mgcv_RX", R, X)
use either of the following:
.Call(mgcv:::"C_mgcv_RX", R, X)
.Call(getNativeSymbolInfo("mgcv_RX"), R, X)
I came upon this because I suddenly realized that C routines can be extracted by :::, too. Since the package compiles successfully, there is no way mgcv::: cannot find this C routine. And yes, it works.
To check that our C routine is available in the loaded shared library, try
is.loaded("mgcv_RX")
# TRUE
To list all registered C routines in the loaded shared library, use
getDLLRegisteredRoutines("mgcv")

C symbol name not in load table when exporting a function in foreach loop

The issue I posted here was actually due to the configuration of the servers; the package has no issue related to R/C.
I am developing an R package that uses foreach to speed up the computation. To illustrate the structure of the package, I give a simplified R script and a C file below:
f3.R:
library(doParallel)  # attaches foreach and parallel (makeCluster, %dopar%)

f3 = function(.lst){
  cl <- makeCluster(2)
  registerDoParallel(cl)
  f1 <- function(x){
    tmp <- .C("foo")
    x
  }
  f2 <- function(x){
    f1(x)
  }
  foreach(x = .lst, .verbose = TRUE) %dopar% {
    f2(x)
  }
}
foo.c
#include <stdio.h>
#include <R.h>

#ifdef __cplusplus
extern "C" {
#endif

void foo() {
    /* intentionally does nothing */
}

#ifdef __cplusplus
}
#endif
This package works very well on Mac OS, but throws an error message on Linux:
automatically exporting the following variables from the local environment: f1, f2
numValues: 1, numResults: 0, stopped: TRUE
got results for task 1
accumulate got an error result
numValues: 1, numResults: 1, stopped: TRUE
not calling combine function due to errors
returning status TRUE
Error in { : task 1 failed - "C symbol name "foo" not in load table"
This is what I’ve tried:
add .export = c("foo"): doesn't help
move f1() and f2() to a separate R script and add .export = c("f1", "f2"): works on Mac OS, too, but doesn't help on Linux
if I don't build a package, but load the functions above into memory directly, then it works on both Mac OS and Linux
if no C function is called (note that in my example the C code does nothing), then it works on both Mac OS and Linux. However, if I move f1() and f2() to separate R scripts and add .export = c("f1", "f2"), it fails on Linux again.
Most of the posts I found on the internet suggested using .export to export R functions, but I didn't find anything about exporting C symbols.
I solved the problem by creating a package containing all the functions that use .C.
Then you add .packages = "yourpackage" to the foreach() call and export all the functions needed.

Using R random number generators in C [duplicate]

I would like to, within my own compiled C++ code, check to see whether a library package is loaded in R (if not, load it), call a function from that library, and get the results back into my C++ code.
Could someone point me in the right direction? There seems to be a plethora of info on R and different ways of calling R from C++ and vice versa, but I have not come across exactly what I want to do.
Thanks.
Dirk's probably right that RInside makes life easier. But for the die-hards... The essence comes from Writing R Extensions sections 8.1 and 8.2, and from the examples distributed with R. The material below covers constructing and evaluating the call; dealing with the return value is a different (and in some sense easier) topic.
Setup
Let's suppose a Linux / Mac platform. The first thing is that R must have been compiled to allow linking, either to a shared or static R library. I work with an svn copy of R's source, in the directory ~/src/R-devel. I switch to some other directory, call it ~/bin/R-devel, and then
~/src/R-devel/configure --enable-R-shlib
make -j
this generates ~/bin/R-devel/lib/libR.so; perhaps whatever distribution you're using already has this? The -j flag runs make in parallel, which greatly speeds the build.
Examples for embedding are in ~/src/R-devel/tests/Embedding, and they can be made with cd ~/bin/R-devel/tests/Embedding && make. Obviously, the source code for these examples is extremely instructive.
Code
To illustrate, create a file embed.cpp. Start by including the headers that define R data structures and the R embedding interface; these are located in bin/R-devel/include, and serve as the primary documentation. We also have a prototype for the function that will do all the work
#include <Rembedded.h>
#include <Rdefines.h>
static void doSplinesExample();
The work flow is to start R, do the work, and end R:
int
main(int argc, char *argv[])
{
Rf_initEmbeddedR(argc, argv);
doSplinesExample();
Rf_endEmbeddedR(0);
return 0;
}
The examples under Embedding include one that calls library(splines), sets a named option, then runs a function example("ns"). Here's the routine that does this
static void
doSplinesExample()
{
SEXP e, result;
int errorOccurred;
// create and evaluate 'library(splines)'
PROTECT(e = lang2(install("library"), mkString("splines")));
R_tryEval(e, R_GlobalEnv, &errorOccurred);
if (errorOccurred) {
// handle error
}
UNPROTECT(1);
// 'options(FALSE)' ...
PROTECT(e = lang2(install("options"), ScalarLogical(0)));
// ... modified to 'options(example.ask=FALSE)' (this is obscure)
SET_TAG(CDR(e), install("example.ask"));
R_tryEval(e, R_GlobalEnv, NULL);
UNPROTECT(1);
// 'example("ns")'
PROTECT(e = lang2(install("example"), mkString("ns")));
R_tryEval(e, R_GlobalEnv, &errorOccurred);
UNPROTECT(1);
}
Compile and run
We're now ready to put everything together. The compiler needs to know where the headers and libraries are
g++ -I/home/user/bin/R-devel/include -L/home/user/bin/R-devel/lib -lR embed.cpp
The compiled application needs to be run in the correct environment, e.g., with R_HOME set correctly; this can be arranged easily (obviously a deployed app would want to take a more extensive approach) with
R CMD ./a.out
Depending on your ambitions, some parts of section 8 of Writing R Extensions are not relevant, e.g., callbacks are needed to implement a GUI on top of R, but not to evaluate simple code chunks.
Some detail
Running through that in a bit of detail... An SEXP (S-expression) is a data structure fundamental to R's representation of basic types (integer, logical, language calls, etc.). The line
PROTECT(e = lang2(install("library"), mkString("splines")));
makes a symbol library and a string "splines", and places them into a language construct consisting of two elements. This constructs an unevaluated language object, approximately equivalent to quote(library("splines")) in R. lang2 returns an SEXP that has been allocated from R's memory pool, and it needs to be PROTECTed from garbage collection. PROTECT adds the address pointed to by e to a protection stack; when the memory no longer needs to be protected, the address is popped from the stack (with UNPROTECT(1), a few lines down). The line
R_tryEval(e, R_GlobalEnv, &errorOccurred);
tries to evaluate e in R's global environment. errorOccurred is set to non-0 if an error occurs. R_tryEval returns an SEXP representing the result of the function, but we ignore it here. Because we no longer need the memory allocated to store library("splines"), we tell R that it is no longer PROTECT'ed.
The next chunk of code is similar, evaluating options(example.ask=FALSE), but the construction of the call is more complicated. The S-expression created by lang2 is a pair list, conceptually with a node, a left pointer (CAR) and a right pointer (CDR). The left pointer of e points to the symbol options. The right pointer of e points to another node in the pair list, whose left pointer is FALSE (the right pointer is R_NilValue, indicating the end of the language expression). Each node of a pair list can have a TAG, the meaning of which depends on the role played by the node. Here we attach an argument name.
SET_TAG(CDR(e), install("example.ask"));
The next line evaluates the expression that we have constructed (options(example.ask=FALSE)), using NULL to indicate that we'll ignore the success or failure of the function's evaluation. A different way of constructing and evaluating this call is illustrated in R-devel/tests/Embedding/RParseEval.c, adapted here as
PROTECT(tmp = mkString("options(example.ask=FALSE)"));
PROTECT(e = R_ParseVector(tmp, 1, &status, R_NilValue));
R_tryEval(VECTOR_ELT(e, 0), R_GlobalEnv, NULL);
UNPROTECT(2);
but this doesn't seem like a good strategy in general, as it mixes R and C code and does not allow computed arguments to be used in R functions. Instead write and manage R code in R (e.g., creating a package with functions that perform complicated series of R manipulations) that your C code uses.
The final block of code above constructs and evaluates example("ns"). R_tryEval returns the result of the function call, so
SEXP result;
PROTECT(result = R_tryEval(e, R_GlobalEnv, &errorOccurred));
// ...
UNPROTECT(1);
would capture that for subsequent processing.
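As a hedged fragment (not from the Embedding examples): if the evaluated call were known to return a numeric vector, the captured result could be inspected with the usual accessor macros before the matching UNPROTECT, e.g.
/* sketch only: assumes the call yielded a numeric (REALSXP) vector */
if (!errorOccurred && TYPEOF(result) == REALSXP && LENGTH(result) > 0) {
    double first = REAL(result)[0];
    printf("first element of result: %f\n", first); /* needs <stdio.h> */
}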
There is Rcpp which allows you to easily extend R with C++ code, and also have that C++ code call back to R. There are examples included in the package which show that.
But maybe what you really want is to keep your C++ program (i.e. your own main()) and call out to R? That can be done most easily with
RInside, which allows you to very easily embed R inside your C++ application; the test for the library, loading it if needed, and the function call are then extremely easy to do, and the (more than a dozen) included examples show you how. And Rcpp still helps you to get results back and forth.
Edit: As Martin was kind enough to show things the official way, I cannot help but contrast it with one of the examples shipping with RInside. It is something I once wrote quickly to help someone who had asked on r-help about how to load a (portfolio optimisation) library and use it. It meets your requirements: load a library, access some data, pass a weights vector down from C++ to R, deploy R and get the result back.
// -*- mode: C++; c-indent-level: 4; c-basic-offset: 4; tab-width: 8; -*-
//
// Simple example for the repeated r-devel mails by Abhijit Bera
//
// Copyright (C) 2009 Dirk Eddelbuettel
// Copyright (C) 2010 - 2011 Dirk Eddelbuettel and Romain Francois
#include <RInside.h> // for the embedded R via RInside
int main(int argc, char *argv[]) {
try {
RInside R(argc, argv); // create an embedded R instance
std::string txt = "suppressMessages(library(fPortfolio))";
R.parseEvalQ(txt); // load library, no return value
txt = "M <- as.matrix(SWX.RET); print(head(M)); M";
// assign mat. M to NumericMatrix
Rcpp::NumericMatrix M = R.parseEval(txt);
std::cout << "M has "
<< M.nrow() << " rows and "
<< M.ncol() << " cols" << std::endl;
txt = "colnames(M)"; // assign columns names of M to ans and
// into string vector cnames
Rcpp::CharacterVector cnames = R.parseEval(txt);
for (int i=0; i<M.ncol(); i++) {
std::cout << "Column " << cnames[i]
<< " in row 42 has " << M(42,i) << std::endl;
}
} catch(std::exception& ex) {
std::cerr << "Exception caught: " << ex.what() << std::endl;
} catch(...) {
std::cerr << "Unknown exception caught" << std::endl;
}
exit(0);
}
This is rinside_sample2.cpp, and there are lots more examples in the package. To build it, you just say 'make rinside_sample2', as the supplied Makefile is set up to find R, Rcpp and RInside.

How to embed a Lua script within a C binary?

I've been getting spoiled in the shell world where I can do:
./lua <<EOF
> x="hello world"
> print (x)
> EOF
hello world
Now I'm trying to include a Lua script within a C application that I expect will grow with time. I've started with a simple:
const char *lua_script="x=\"hello world\"\n"
"print(x)\n";
luaL_loadstring(L, lua_script);
lua_pcall(L, 0, 0, 0);
But that has several drawbacks. Primarily, I have to escape the line feeds and quotes. But now I'm hitting the "string length ‘1234’ is greater than the length ‘509’ ISO C90 compilers are required to support" warning when compiling with gcc, and I'd like to keep this program not only self-contained but also portable to other compilers.
What is the best way to include a large Lua script inside a C program so that it is not shipped as a separate file to the end user? Ideally, I'd like to move the script into a separate *.lua file to simplify testing and change control, and have that file somehow compiled into the executable.
On systems which support binutils, you can also 'compile' a Lua file into a .o with 'ld -r', link the .o into a shared object, and then link your application to the shared library. At runtime, you look up the Lua text with dlsym(RTLD_DEFAULT, ...) and can then evaluate it as you like.
To create some_stuff.o from some_stuff.lua:
ld -s -r -o some_stuff.o -b binary some_stuff.lua
objcopy --rename-section .data=.rodata,alloc,load,readonly,data,contents some_stuff.o some_stuff.o
This will get you an object file with symbols that delimit the start, end, and size of your lua data. These symbols are, as far as I know, determined by ld from the filename. You don't have control over the names, but they are consistently derived. You will get something like:
$ nm some_stuff.o
000000000000891d R _binary_some_stuff_lua_end
000000000000891d A _binary_some_stuff_lua_size
0000000000000000 R _binary_some_stuff_lua_start
Now link some_stuff.o into a shared object like any other object file. Then, within your app, write a function that will take the name "some_stuff_lua", and do the appropriate dlsym magic. Something like the following C++, which assumes you have a wrapper around lua_State called SomeLuaStateWrapper:
void SomeLuaStateWrapper::loadEmbedded(const std::string& embeddingName)
{
const std::string prefix = "_binary_";
const std::string data_start = prefix + embeddingName + "_start";
const std::string data_end = prefix + embeddingName + "_end";
const char* const data_start_addr = reinterpret_cast<const char*>(
dlsym(RTLD_DEFAULT, data_start.c_str()));
const char* const data_end_addr = reinterpret_cast<const char*>(
dlsym(RTLD_DEFAULT, data_end.c_str()));
THROW_ASSERT(
data_start_addr && data_end_addr,
"Couldn't obtain addresses for start/end symbols " <<
data_start << " and " << data_end << " for embedding " << embeddingName);
const ptrdiff_t delta = data_end_addr - data_start_addr;
THROW_ASSERT(
delta > 0,
"Non-positive offset between lua start/end symbols " <<
data_start << " and " << data_end << " for embedding " << embeddingName);
// NOTE: You should also load the size and verify it matches.
static const ssize_t kMaxLuaEmbeddingSize = 16 * 1024 * 1024;
THROW_ASSERT(
delta <= kMaxLuaEmbeddingSize,
"Embedded lua chunk exceeds upper bound of " << kMaxLuaEmbeddingSize << " bytes");
namespace io = boost::iostreams;
io::stream_buffer<io::array_source> buf(data_start_addr, data_end_addr);
std::istream stream(&buf);
// Call the code that knows how to feed a
// std::istream to lua_load with the current lua_State.
// If you need details on how to do that, leave a comment
// and I'll post additional details.
load(stream, embeddingName.c_str());
}
So, now within your application, assuming you have linked or dlopen'ed the library containing some_stuff.o, you can just say:
SomeLuaStateWrapper wrapper;
wrapper.loadEmbedded("some_stuff_lua");
and the original contents of some_stuff.lua will have been lua_load'ed in the context of 'wrapper'.
If, in addition, you want the shared library containing some_stuff.lua to be able to be loaded from Lua with 'require', simply give the same library that contains some_stuff.o a luaopen entry point in some other C/C++ file:
extern "C" {
int luaopen_some_stuff(lua_State* L)
{
SomeLuaStateWrapper wrapper(L);
wrapper.loadEmbedded("some_stuff_lua");
return 1;
}
} // extern "C"
Your embedded Lua is now available via require as well. This works particularly well with luabind.
With SCons, it is fairly easy to teach the build system that when it sees a .lua file in the sources section of a SharedLibrary, it should 'compile' the file with the ld/objcopy steps above:
# NOTE: The 'cd'ing is annoying, but unavoidable, since
# ld in '-b binary' mode uses the name of the input file to
# set the symbol names, and if there is path info on the
# filename that ends up as part of the symbol name, which is
# no good. So we have to cd into the source directory so we
# can use the unqualified name of the source file. We need to
# abspath $TARGET since it might be a relative path, which
# would be invalid after the cd.
env['SHDATAOBJCOM'] = 'cd $$(dirname $SOURCE) && ld -s -r -o $TARGET.abspath -b binary $$(basename $SOURCE)'
env['SHDATAOBJROCOM'] = 'objcopy --rename-section .data=.rodata,alloc,load,readonly,data,contents $TARGET $TARGET'
env['BUILDERS']['SharedLibrary'].add_src_builder(
SCons.Script.Builder(
action = [
SCons.Action.Action(
"$SHDATAOBJCOM",
"$SHDATAOBJCOMSTR"
),
SCons.Action.Action(
"$SHDATAOBJROCOM",
"$SHDATAOBJROCOMSTR"
),
],
suffix = '$SHOBJSUFFIX',
src_suffix='.lua',
emitter = SCons.Defaults.SharedObjectEmitter))
I'm sure it is possible to do something like this with other modern build systems like CMake as well.
This technique is of course not limited to Lua, but can be used to embed just about any resource in a binary.
A really cheap, but not so easily altered, way is to use something like bin2c to generate a header out of a selected Lua file (or its compiled bytecode, which is faster and smaller); then you can pass that to Lua to execute.
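For instance, a minimal sketch (the array below merely stands in for whatever a tool like bin2c would actually generate from your .lua file):
#include <stdio.h>
#include <lua.h>
#include <lauxlib.h>
#include <lualib.h>

/* pretend this came from bin2c run over some_stuff.lua */
static const char some_stuff_lua[] = "print('hello from embedded lua')";

int main(void)
{
    lua_State *L = luaL_newstate();
    luaL_openlibs(L);
    /* load the embedded chunk and run it */
    if (luaL_loadbuffer(L, some_stuff_lua, sizeof(some_stuff_lua) - 1, "some_stuff.lua") != 0 ||
        lua_pcall(L, 0, 0, 0) != 0) {
        fprintf(stderr, "lua error: %s\n", lua_tostring(L, -1));
    }
    lua_close(L);
    return 0;
}
luaL_loadbuffer accepts precompiled bytecode as well as source text, so switching to the bytecode variant mentioned above is a drop-in change.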
You can also try embedding it as a resource, but I have no clue how that works outside of Visual Studio/Windows.
Depending on what you want to do, you might even find exeLua of use.

Resources