Error in kernel when using big arrays

While using a simple function to memset a CUDA array, I get "invalid argument" for big arrays (around > pow(2,25) elements).
I am running on a Tesla K40. I should have more than enough memory to allocate the array, and also enough capacity for the number of blocks I am launching, yet the following code exits with an error:
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#define MAXTHREADS 1024
//http://stackoverflow.com/a/16283216/1485872
#define cudaCheckErrors(msg) \
do { \
cudaError_t __err = cudaGetLastError(); \
if (__err != cudaSuccess) { \
fprintf(stderr, "Fatal error: %s (%s at %s:%d)\n", \
msg, cudaGetErrorString(__err), \
__FILE__, __LINE__); \
fprintf(stderr, "*** FAILED - ABORTING\n"); \
exit(1);} \
} while (0)
__global__ void mymemset(float* image, const float val, size_t N)
{
//http://stackoverflow.com/a/35133396/1485872
size_t tid = threadIdx.x + blockIdx.x * blockDim.x;
while (tid < N) {
image[tid] = val;
tid += gridDim.x * blockDim.x;
}
}
int main()
{
size_t total_pixels = pow(2, 26) ;
float* d_image;
cudaMalloc(&d_image, total_pixels*sizeof(float));
cudaCheckErrors("Malloc");
dim3 bsz = dim3(MAXTHREADS);
dim3 gsz = dim3(total_pixels / bsz.x + ((total_pixels % bsz.x > 0) ? 1 : 0));
mymemset << <gsz, bsz >> >(d_image, 1.0f, total_pixels);
cudaCheckErrors("mymemset"); //<- error!
cudaDeviceReset();
}
The code works fine up to (and a bit beyond) pow(2,25) in total_pixels but fails for pow(2,26).
Coincidentally, this is the point where the grid size gsz reaches 65536, which seems to be the upper limit on some GPUs, but on the Tesla K40 the limit is supposed to be 2147483647 for the x dimension, and 65536 for y and z (which I am not using). Any insight into the origin of this error?
Compiler flags from VS2013: Properties->CUDA C/C++/command line
# Driver API (NVCC Compilation Type is .cubin, .gpu, or .ptx)
set CUDAFE_FLAGS=--sdk_dir "C:\Program Files (x86)\Windows Kits\8.1\"
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\bin\nvcc.exe" --use-local-env --cl-version 2013 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin" -G --keep-dir Debug -maxrregcount=0 --machine 32 --compile -cudart static -o Debug\%(Filename)%(Extension).obj "%(FullPath)"
# Runtime API (NVCC Compilation Type is hybrid object or .c file)
set CUDAFE_FLAGS=--sdk_dir "C:\Program Files (x86)\Windows Kits\8.1\"
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\bin\nvcc.exe" --use-local-env --cl-version 2013 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin" -G --keep-dir Debug -maxrregcount=0 --machine 32 --compile -cudart static -g -Xcompiler "/EHsc /nologo /Zi " -o Debug\%(Filename)%(Extension).obj "%(FullPath)"

You are compiling for the default architecture (sm_20), which has a limit of 65535 blocks in each dimension of the grid. You must build for sm_35 to be able to launch 2147483647 blocks in a 1D grid.
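For example, adding the flag on the nvcc command line (kernel.cu is a placeholder for your source file; in VS2013 the equivalent setting should be Project Properties → CUDA C/C++ → Device → Code Generation, set to compute_35,sm_35):
nvcc -gencode arch=compute_35,code=sm_35 -o kernel kernel.cu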
You should also note that the kernel you are using (which I wrote) can be run with many fewer blocks than (n/blocksize) and still work correctly, and it would be more efficient to do so.
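A minimal sketch of that launch strategy, reusing the names from the question (the factor of 32 blocks per multiprocessor is an illustrative choice, not a tuned value):
int dev = 0, smCount = 0;
cudaGetDevice(&dev);
cudaDeviceGetAttribute(&smCount, cudaDevAttrMultiProcessorCount, dev);
// Launch just enough blocks to keep the GPU busy; the grid-stride
// loop inside mymemset walks the remaining elements of the array.
dim3 bsz(MAXTHREADS);
dim3 gsz(32 * smCount);
mymemset<<<gsz, bsz>>>(d_image, 1.0f, total_pixels);
cudaCheckErrors("mymemset");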

Related

How do I create a static library (.lib) which depends on the dynamic (.dll) tensorflow library?

I have created a static C library in Visual Studio 2019 on Windows 10 which depends on the tensorflow library, which is dynamic (.dll). My library, let's call it A.lib, contains a function which takes data, passes it to a tensorflow model and returns the model's output. The compilation seems to work well and creates an A.lib file.
Now I want to use my static library in another project to create an .exe; let's call it B. I copied the header A.h and the A.lib into the B project and adapted the project properties so that my library can be found.
The problem is that I get LNK2001 errors, because the linker cannot find the definitions of the tensorflow functions which I call in my A.lib.
I tried to copy the tensorflow lib into my project B as well, but that did not help.
What do I have to do to include the libraries correctly? Or is there a simpler alternative to deploy a convolutional neural network in C?
Here's a [SO]: How to create a Minimal, Reproducible Example (reprex (mcve)).
dll00.h:
#if defined(_WIN32)
#  if defined(DLL00_EXPORTS)
#    define DLL00_EXPORT_API __declspec(dllexport)
#  else
#    define DLL00_EXPORT_API __declspec(dllimport)
#  endif
#else
#  define DLL00_EXPORT_API
#endif

#if defined(__cplusplus)
extern "C" {
#endif

DLL00_EXPORT_API int dll00Func00();

#if defined(__cplusplus)
}
#endif
dll00.c:
#define DLL00_EXPORTS
#include "dll00.h"
#include <stdio.h>

int dll00Func00() {
    printf("%s - %d - %s\n", __FILE__, __LINE__, __FUNCTION__);
    return -3;
}
lib00.h:
#if defined(__cplusplus)
extern "C" {
#endif

int lib00Func00();

#if defined(__cplusplus)
}
#endif
lib00.c:
#include "lib00.h"
#include "dll00.h"
#include <stdio.h>

int lib00Func00() {
    printf("%s - %d - %s\n", __FILE__, __LINE__, __FUNCTION__);
    return dll00Func00() - 3;
}
main00.c:
#include "lib00.h"
#include <stdio.h>

int main() {
    printf("%s - %d - %s\n", __FILE__, __LINE__, __FUNCTION__);
    int res = lib00Func00();
    printf("Lib func returned: %d\n", res);
    printf("\nDone.\n");
    return 0;
}
Output:
[cfati#CFATI-5510-0:e:\Work\Dev\StackOverflow\q069197545]> sopr.bat
### Set shorter prompt to better fit when pasted in StackOverflow (or other) pages ###
[prompt]> "c:\Install\pc032\Microsoft\VisualStudioCommunity\2019\VC\Auxiliary\Build\vcvarsall.bat" x64 >nul
[prompt]> dir /b
dll00.c
dll00.h
lib00.c
lib00.h
main00.c
[prompt]> :: Build .dll (1 step)
[prompt]> cl /nologo /MD /DDLL dll00.c /link /NOLOGO /DLL /OUT:dll00.dll
dll00.c
Creating library dll00.lib and object dll00.exp
[prompt]> :: Build .lib (2 steps)
[prompt]> cl /c /nologo /MD /Folib00.obj lib00.c
lib00.c
[prompt]> lib /NOLOGO /OUT:lib00.lib lib00.obj
[prompt]> :: Build .exe (1 step)
[prompt]> cl /nologo /MD /W0 main00.c /link /NOLOGO /OUT:main00_pc064.exe lib00.lib dll00.lib
main00.c
[prompt]> dir /b
dll00.c
dll00.dll
dll00.exp
dll00.h
dll00.lib
dll00.obj
lib00.c
lib00.h
lib00.lib
lib00.obj
main00.c
main00.obj
main00_pc064.exe
[prompt]> main00_pc064.exe
main00.c - 7 - main
lib00.c - 9 - lib00Func00
dll00.c - 8 - dll00Func00
Lib func returned: -6
Done.
So, it works (at least for this trivial example). As seen, when building the .exe I also passed the .dll's .lib to the linker (meaning that the .dll (together with all its (recurring) dependents) is required at runtime). For info on how to do this from the VStudio project, check [SO]: How to include OpenSSL in Visual Studio (#CristiFati's answer).

How do I compile a cilk program?

I installed Cilk using the instructions from their website.
sudo apt-add-repository ppa:wsmoses/tapir-toolchain
sudo apt-get update
sudo apt-get install tapirclang-5.0 libcilkrts5
I copied the following program from the Cilk documentation.
#include <stdio.h>
#include <stdint.h>

int64_t fib(int64_t n) {
    if (n < 2) return n;
    int x, y;
    x = cilk_spawn fib(n - 1);
    y = fib(n - 2);
    cilk_sync;
    return x + y;
}

int main(){
    printf("%ld\n", fib(20));
}
I then compiled using the compiler flag that they specified.
clang-5.0 -fcilkplus Fib.c
Fib.c:7:9: error: use of undeclared identifier 'cilk_spawn'
    x = cilk_spawn fib(n - 1);
        ^
Fib.c:9:5: error: use of undeclared identifier 'cilk_sync'
    cilk_sync;
    ^
The desired output is a working executable that uses Cilk and prints 6765.
What magic incantations are needed to produce this executable?
I am running Ubuntu 18.04 with kernel 4.4.0-45-generic.
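One likely culprit (an educated guess based on the Cilk Plus documentation, not verified on the Tapir toolchain): cilk_spawn and cilk_sync are macros provided by <cilk/cilk.h>, which the program never includes, so the compiler sees them as plain identifiers. A sketch of the corrected program (the int x, y declaration is also widened to int64_t to match fib's return type):
#include <stdio.h>
#include <stdint.h>
#include <cilk/cilk.h>   /* defines cilk_spawn and cilk_sync */

int64_t fib(int64_t n) {
    if (n < 2) return n;
    int64_t x, y;
    x = cilk_spawn fib(n - 1);
    y = fib(n - 2);
    cilk_sync;
    return x + y;
}

int main(){
    printf("%ld\n", fib(20));
}
Compiled the same way: clang-5.0 -fcilkplus Fib.c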

What is the stack limit when MATLAB calls function in DLL

I am trying to figure out what the stack size limitation is when MATLAB calls a function in a DLL.
Is there a way to configure the limit?
I am using the loadlibrary and calllib functions to call a function implemented in C (in a dynamic-link library).
I created a test to figure out the stack limit.
I am using MATLAB 2016a (64 bits), and Visual Studio 2010 for building the DLL.
Here is my MATLAB source code:
loadlibrary('MyDll','MyDll.h')
size_in_bytes = 1000000;
res = calllib('MyDll', 'Test', size_in_bytes);
if (res == -1)
    disp(['Stack Overflow... (size = ', num2str(size_in_bytes), ')']);
else
    disp(['Successful stack allocation... (size = ', num2str(size_in_bytes), ')']);
end
unloadlibrary MyDll
Here is my C source code:
MyDll.h
// MyDll.h : DLL interface.
#ifndef MY_DLL_H
#define MY_DLL_H

#ifdef MY_DLL_EXPORTS
#define MY_DLL_API __declspec(dllexport)
#else
#define MY_DLL_API __declspec(dllimport)
#endif

extern MY_DLL_API int Test(int size);

#endif
MyDll.c
// MyDll.c
#include "MyDll.h"
#include <windows.h>
#include <stdio.h>
#include <string.h>
#include <malloc.h>

//Allocate <size> bytes on the stack using _alloca(size).
//Return 0 if OK.
//Return (-1) in case of stack overflow.
int Test(int size)
{
    //Not allocated on the stack...
    static wchar_t errorMsg[100];
    static wchar_t okMsg[100];
    int errcode = 0;
    void *pData = NULL;

    //Prepare messages in advance.
    swprintf_s(errorMsg, 100, L"Stack Overflow (size = %d)", size);
    swprintf_s(okMsg, 100, L"Successful stack allocation (size = %d)", size);

    __try
    {
        pData = _alloca(size);
    }
    // If an exception occurred with the _alloca function
    __except (GetExceptionCode() == STATUS_STACK_OVERFLOW)
    {
        MessageBox(NULL, errorMsg, TEXT("Error"), MB_OK | MB_ICONERROR);

        // If the stack overflows, use this function to restore.
        errcode = _resetstkoflw();
        if (errcode)
        {
            MessageBox(NULL, TEXT("Could not reset the stack!"), TEXT("Error"), MB_OK | MB_ICONERROR);
            _exit(1);
        }
        pData = NULL;
    };

    if (pData != NULL)
    {
        //Fill the allocated buffer with zeros.
        memset(pData, 0, size);
        MessageBox(NULL, okMsg, TEXT("OK"), MB_OK);
        return 0;
    }
    return -1;
}
The __try and __except block is taken from Microsoft example:
https://msdn.microsoft.com/en-us/library/wb1s57t5.aspx
DLL Compiler flags:
/Zi /nologo /W4 /WX- /Od /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_USRDLL" /D "MY_DLL_EXPORTS" /D "_WINDLL" /D "_UNICODE" /D "UNICODE" /Gm /EHsc /RTC1 /MTd /GS /fp:precise /Zc:wchar_t /Zc:forScope /Fp"x64\Debug\MyDll.pch" /Fa"x64\Debug\" /Fo"x64\Debug\" /Fd"x64\Debug\vc100.pdb" /Gd /errorReport:queue
DLL Linker flags:
/OUT:"x64\Debug\MyDll.dll" /INCREMENTAL:NO /NOLOGO /DLL "kernel32.lib" "user32.lib" "gdi32.lib" "winspool.lib" "comdlg32.lib" "advapi32.lib" "shell32.lib" "ole32.lib" "oleaut32.lib" "uuid.lib" "odbc32.lib" "odbccp32.lib" /MANIFEST /ManifestFile:"x64\Debug\MyDll.dll.intermediate.manifest" /ALLOWISOLATION /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /DEBUG /PDB:"c:\Tmp\MyDll\x64\Debug\MyDll.pdb" /SUBSYSTEM:CONSOLE /PGD:"c:\Tmp\MyDll\x64\Debug\MyDll.pgd" /TLBID:1 /DYNAMICBASE /NXCOMPAT /MACHINE:X64 /ERRORREPORT:QUEUE
I executed the MATLAB code using different values of size_in_bytes:
size_in_bytes = 1000000: Pass!
size_in_bytes = 10000000: Pass!
size_in_bytes = 50000000: Pass!
size_in_bytes = 60000000: Pass!
size_in_bytes = 70000000: Stack Overflow!
Looks like the limit on my system is about 64 MB, but I don't know whether this number holds for all systems.
I tried to modify the stack size of matlab.exe using the editbin tool, for example:
editbin /STACK:250000000 "c:\Program Files\MATLAB\R2016a\bin\matlab.exe"
This option sets the size of the stack in bytes and takes arguments in decimal or C-language notation. The /STACK option applies only to an executable file.
It seems to have no effect...
It seems that on Windows the size of the stack is set at link time, so you can use the compiler option /F or the EDITBIN tool.
For example, you could edit the following file:
EDITBIN /STACK:134217728 "C:\Program Files\MATLAB\R2016a\bin\win64\MATLAB.exe"
This would set the stack size to 128 MB (128 x 1024 x 1024 bytes = 134217728 bytes).
Note: be aware that editing C:\Program Files\MATLAB\R2016a\bin\matlab.exe will have no effect; the process that actually runs is the one under bin\win64, so that is the binary to patch.
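To verify that the new reserve actually took, you can dump the PE header of the patched executable (a quick sanity check from a VS command prompt; the findstr filter just trims the output):
dumpbin /headers "C:\Program Files\MATLAB\R2016a\bin\win64\MATLAB.exe" | findstr /i "stack"
The "size of stack reserve" line should show the value passed to /STACK (in hexadecimal).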

Modifying Linker Script to make the .text section writable, errors

I am trying to make the .text section writable for a C program. I looked through the options provided in this SO question and zeroed in on modifying the linker script to achieve this.
For this I created a writable memory region using
MEMORY { rwx (wx) : ORIGIN = 0x400000, LENGTH = 256K}
and at the section .text added:
.text :
{
    *(.text.unlikely .text.*_unlikely)
    *(.text.exit .text.exit.*)
    *(.text.startup .text.startup.*)
    *(.text.hot .text.hot.*)
    *(.text .stub .text.* .gnu.linkonce.t.*)
    /* .gnu.warning sections are handled specially by elf32.em. */
    *(.gnu.warning)
} >rwx
On compiling the code with the gcc flag -T and giving my linker script as an argument, I get an error:
error: no memory region specified for loadable section '.interp'
I am only trying to change the memory permissions for the .text region. I am working on Ubuntu, x86_64 architecture.
Is there a better way to do this?
Any help is highly appreciated.
Thanks
The Linker Script
Linker Script on pastie.org
In Linux, you can use mprotect() to enable/disable text section write protection from the runtime code; see the Notes section in man 2 mprotect.
Here is a real-world example. First, however, a caveat:
I consider this just a proof of concept implementation, and not something I'd ever use in a real world application. It may look enticing for use in a high-performance library of some sort, but in my experience, changing the API (or the paradigm/approach) of the library usually yields much better results -- and fewer hard-to-debug bugs.
Consider the following six files:
foo1.c:
int foo1(const int a, const int b) { return a*a - 2*a*b + b*b; }
foo2.c:
int foo2(const int a, const int b) { return a*a + b*b; }
foo.h.header:
#ifndef FOO_H
#define FOO_H
extern int foo1(const int a, const int b);
extern int foo2(const int a, const int b);
foo.h.footer:
#endif /* FOO_H */
main.c:
#include <unistd.h>
#include <sys/mman.h>
#include <errno.h>
#include <string.h>
#include <stdio.h>
#include "foo.h"

int text_copy(const void *const target,
              const void *const source,
              const size_t length)
{
    const long page = sysconf(_SC_PAGESIZE);
    void *start = (char *)target - ((long)target % page);
    size_t bytes = length + (size_t)((long)target % page);

    /* Verify sane page size. */
    if (page < 1L)
        return errno = ENOTSUP;

    /* Although length should not need to be a multiple of page size,
     * adjust it up if need be. */
    if (bytes % (size_t)page)
        bytes = bytes + (size_t)page - (bytes % (size_t)page);

    /* Disable write protect on target pages. */
    if (mprotect(start, bytes, PROT_READ | PROT_WRITE | PROT_EXEC))
        return errno;

    /* Copy code.
     * Note: if the target code is being executed, we're in trouble;
     * this offers no atomicity guarantees, so other threads may
     * end up executing some combination of old/new code.
     */
    memcpy((void *)target, (const void *)source, length);

    /* Re-enable write protect on target pages. */
    if (mprotect(start, bytes, PROT_READ | PROT_EXEC))
        return errno;

    /* Success. */
    return 0;
}

int main(void)
{
    printf("foo1(): %d bytes at %p\n", foo1_SIZE, foo1_ADDR);
    printf("foo2(): %d bytes at %p\n", foo2_SIZE, foo2_ADDR);
    printf("foo1(3, 5): %d\n", foo1(3, 5));
    printf("foo2(3, 5): %d\n", foo2(3, 5));

    if (foo2_SIZE < foo1_SIZE) {
        printf("Replacing foo1() with foo2(): ");
        if (text_copy(foo1_ADDR, foo2_ADDR, foo2_SIZE)) {
            printf("%s.\n", strerror(errno));
            return 1;
        }
        printf("Done.\n");
    } else {
        printf("Replacing foo2() with foo1(): ");
        if (text_copy(foo2_ADDR, foo1_ADDR, foo1_SIZE)) {
            printf("%s.\n", strerror(errno));
            return 1;
        }
        printf("Done.\n");
    }

    printf("foo1(3, 5): %d\n", foo1(3, 5));
    printf("foo2(3, 5): %d\n", foo2(3, 5));
    return 0;
}
function-info.bash:
#!/bin/bash
addr_prefix=""
addr_suffix="_ADDR"
size_prefix=""
size_suffix="_SIZE"

export LANG=C
export LC_ALL=C

nm -S "$@" | while read addr size kind name dummy ; do
    [ -n "$addr" ] || continue
    [ -n "$size" ] || continue
    [ -z "$dummy" ] || continue
    [ "$kind" = "T" ] || continue
    [ "$name" != "${name#[A-Za-z]}" ] || continue
    printf '#define %s ((void *)0x%sL)\n' "$addr_prefix$name$addr_suffix" "$addr"
    printf '#define %s %d\n' "$size_prefix$name$size_suffix" "0x$size"
done || exit $?
Remember to make it executable using chmod u+x ./function-info.bash
First, compile the sources using valid sizes but invalid addresses:
gcc -W -Wall -O3 -c foo1.c
gcc -W -Wall -O3 -c foo2.c
( cat foo.h.header ; ./function-info.bash foo1.o foo2.o ; cat foo.h.footer) > foo.h
gcc -W -Wall -O3 -c main.c
The sizes are correct but the addresses are not, because the code is yet to be linked. Relative to the final binary, the object file contents are usually relocated at link time. So, link the sources to get example executable, example:
gcc -W -Wall -O3 main.o foo1.o foo2.o -o example
Extract the correct (sizes and) addresses:
( cat foo.h.header ; ./function-info.bash example ; cat foo.h.footer) > foo.h
Recompile and link,
gcc -W -Wall -O3 -c main.c
gcc -W -Wall -O3 foo1.o foo2.o main.o -o example
and verify that the constants now do match:
mv -f foo.h foo.h.used
( cat foo.h.header ; ./function-info.bash example ; cat foo.h.footer) > foo.h
cmp -s foo.h foo.h.used && echo "Done." || echo "Recompile and relink."
Due to high optimization (-O3) the code that utilizes the constants may change size, requiring yet another recompile and relink. If the last line outputs "Recompile and relink", just repeat the last two steps, i.e. the five command lines above.
(Note that since foo1.c and foo2.c do not use the constants in foo.h, they obviously do not need to be recompiled.)
On x86_64 (GCC-4.6.3-1ubuntu5), running ./example outputs
foo1(): 21 bytes at 0x400820
foo2(): 10 bytes at 0x400840
foo1(3, 5): 4
foo2(3, 5): 34
Replacing foo1() with foo2(): Done.
foo1(3, 5): 34
foo2(3, 5): 34
which shows that the foo1() function indeed was replaced. Note that the longer function is always replaced with the shorter one, because we must not overwrite any code outside the two functions.
You can modify the two functions to verify this; just remember to repeat the entire procedure (so that you use the correct _SIZE and _ADDR constants in main()).
Just for giggles, here is the generated foo.h for the above:
#ifndef FOO_H
#define FOO_H
extern int foo1(const int a, const int b);
extern int foo2(const int a, const int b);
#define foo1_ADDR ((void *)0x0000000000400820L)
#define foo1_SIZE 21
#define foo2_ADDR ((void *)0x0000000000400840L)
#define foo2_SIZE 10
#define main_ADDR ((void *)0x0000000000400610L)
#define main_SIZE 291
#define text_copy_ADDR ((void *)0x0000000000400850L)
#define text_copy_SIZE 226
#endif /* FOO_H */
You might wish to use a smarter scriptlet, say an awk one that uses nm -S to obtain all function names, addresses, and sizes, and in the header file replaces only the values of existing definitions, to generate your header file. I'd use a Makefile and some helper scripts.
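As a rough illustration of that idea (a sketch only: it assumes GNU awk for strtonum, and it regenerates the definitions rather than patching them in place):
nm -S example | awk '$3 == "T" {
    printf "#define %s_ADDR ((void *)0x%sL)\n", $4, $1;
    printf "#define %s_SIZE %d\n", $4, strtonum("0x" $2);
}'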
Further notes:
The function code is copied as-is, no relocation etc. is done. (This means that if the machine code of the replacement function contains absolute jumps, the execution continues in the original code. These example functions were chosen, because they're unlikely to have absolute jumps in them. Run objdump -d foo1.o foo2.o to verify from the assembly.)
That is irrelevant if you use the example just to investigate how to modify executable code within the running process. However, if you build runtime-function-replacing schemes on top of this example, you may need to use position independent code for the replaced code (see the GCC manual for relevant options for your architecture) or do your own relocation.
If another thread or signal handler executes the code being modified, you're in serious trouble. You get undefined results. Unfortunately, some libraries start extra threads, which may not block all possible signals, so be extra careful when modifying code that might be run by a signal handler.
Do not assume the compiler compiles the code in a specific way or uses a specific organization. My example uses separate compilation units, to avoid the cases where the compiler might share code between similar functions.
Also, it examines the final executable binary directly to obtain the sizes and addresses needed to replace an entire function implementation. All verifications should be done on the object files or the final executable, and their disassembly, instead of just looking at the C code.
Putting any code that relies on the address and size constants into a separate compilation unit makes it easier and faster to recompile and relink the binary. (You only need to recompile the code that uses the constants directly, and you can even use less optimization for that code, to eliminate extra recompile-relink cycles, without impacting the overall code quality.)
In my main.c, both the address and length supplied to mprotect() are page-aligned (based on the user parameters). The documents say only the address has to be. Since protections are page-granular, making sure the length is a multiple of the page size does not hurt.
You can read and parse /proc/self/maps (which is a kernel-generated pseudofile; see man 5 proc, /proc/[pid]/maps section, for further info) to obtain the existing mappings and their protections for the current process.
In any case, if you have any questions, I'd be happy to try and clarify the above.
Addendum:
It turns out that using the GNU extension dl_iterate_phdr() you can enable/disable write protection on all text sections trivially:
#define _GNU_SOURCE
#include <unistd.h>
#include <dlfcn.h>
#include <sys/mman.h>
#include <link.h>
#include <errno.h>

static int do_write_protect_text(struct dl_phdr_info *info, size_t size, void *data)
{
    const int protect = (data) ? PROT_READ | PROT_EXEC
                               : PROT_READ | PROT_WRITE | PROT_EXEC;
    size_t page;
    size_t i;

    page = sysconf(_SC_PAGESIZE);
    if (size < sizeof (struct dl_phdr_info))
        return ENOTSUP;

    /* Ignore libraries. */
    if (info->dlpi_name && info->dlpi_name[0] != '\0')
        return 0;

    /* Loop over each header. */
    for (i = 0; i < (size_t)info->dlpi_phnum; i++)
        if ((info->dlpi_phdr[i].p_flags & PF_X)) {
            size_t ptr = (size_t)info->dlpi_phdr[i].p_vaddr;
            size_t len = (size_t)info->dlpi_phdr[i].p_memsz;

            /* Start at the beginning of the relevant page, */
            if (ptr % page) {
                len += ptr % page;
                ptr -= ptr % page;
            }
            /* and use full pages. */
            if (len % page)
                len += page - (len % page);

            /* Change protections. Ignore unmapped sections. */
            if (mprotect((void *)ptr, len, protect))
                if (errno != ENOMEM)
                    return errno;
        }

    return 0;
}

int write_protect_text(int protect)
{
    int result;

    result = dl_iterate_phdr(do_write_protect_text, (void *)(long)protect);
    if (result)
        errno = result;

    return result;
}
Here is an example program you can use to test the above write_protect_text() function:
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>

/* Defined above. */
extern int write_protect_text(int protect);

int dump_smaps(void)
{
    FILE *in;
    char *line = NULL;
    size_t size = 0;

    in = fopen("/proc/self/smaps", "r");
    if (!in)
        return errno;

    while (getline(&line, &size, in) > (ssize_t)0)
        if ((line[0] >= '0' && line[0] <= '9') ||
            (line[0] >= 'a' && line[0] <= 'f'))
            fputs(line, stdout);

    free(line);

    if (!feof(in) || ferror(in)) {
        fclose(in);
        return errno = EIO;
    }
    if (fclose(in))
        return errno = EIO;

    return 0;
}

int main(void)
{
    printf("Initial mappings:\n");
    dump_smaps();

    if (write_protect_text(0)) {
        fprintf(stderr, "Cannot disable write protection on text sections: %s.\n", strerror(errno));
        return EXIT_FAILURE;
    }

    printf("\nMappings with write protect disabled:\n");
    dump_smaps();

    if (write_protect_text(1)) {
        fprintf(stderr, "Cannot enable write protection on text sections: %s.\n", strerror(errno));
        return EXIT_FAILURE;
    }

    printf("\nMappings with write protect enabled:\n");
    dump_smaps();

    return EXIT_SUCCESS;
}
The example program dumps /proc/self/smaps before and after changing the text section write protection, showing that it indeed does enable/disable write protection on all text sections (program code). It does not try to alter write protection on dynamically loaded libraries. This was tested to work on x86-64 using an Ubuntu 3.8.0-35-generic kernel.
If you just want to have one executable with a writable .text, you can just link with -N.
At least for me, with binutils 2.22,
ld -N objectfile.o
will produce a binary that I can happily write around in.
Reading the gcc pages, you can pass the linker option from gcc with: gcc -XN source
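Equivalently, the flag can be forwarded through the gcc driver with the -Wl, prefix (prog and source.c are placeholder names):
gcc -Wl,-N -o prog source.c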

System string2.h header file generates a compilation error when optimizing

I am running Linaro Ubuntu 12.03 on an embedded platform. After using this system for a few months for building a simple program, I started receiving a compilation error when adding an optimization option. So, I created a test program:
// test.c
#include <string.h>

int main(int argc, char *argv[])
{
    return 0;
}
compiling with:
gcc test.c
works just fine. However, when I add an optimization option:
gcc -O1 test.c
I get an error:
In file included from /usr/include/string.h:637:0,
from test.c:1:
/usr/include/arm-linux-gnueabi/bits/string2.h:1305:3: error: "(" may not appear in macro parameter list
This happens for all levels from -O1 to -Ofast.
Trying the same on another embedded system with Linaro Ubuntu 12.04, it works just fine. So does it on my Ubuntu PC.
The code section in string2.h:
# define __strdup(sp \
  (__extension__ (__builtin_constant_p (s) && __string2_1bptr_p (s) \
                  ? (((__const char *) (s))[0] == '\0' \
                     ? (char *) calloc ((size_t) 1, (size_t) 1) \
                     : ({ size_t __len = strlen (s) + 1; \
                          char *__retval = (char *) malloc (__len); \
                          if (__retval != NULL) \
                            __retval = (char *) memcpy (__retval, s, __len); \
                          __retval; })) \
                  : __strdup (s)))
(the problem is in the 2nd line of the macro)
Why did my build environment stop working for no apparent reason?
UPDATE 1:
I just examined the same file on another board running 12.03, as well as the one on the 12.04 system. It looks like there is indeed a syntax error in the string2.h file on the first board. The two other files show:
# define __strdup(s) \
instead of:
# define __strdup(sp \
so the ) was replaced with a p. The only explanation I can think of now is that the SD card I am using is starting to corrupt files. However, any other explanation is appreciated.
Since it used to work in the past and string2.h changed on the SD card, it's likely that there is at least a bad sector on the SD card.
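One way to confirm on-disk corruption (a suggestion beyond the original answer, assuming the dpkg database itself is intact) is to checksum the installed glibc headers against the package manifest, then reinstall the package if anything mismatches:
sudo apt-get install debsums
debsums libc6-dev | grep -v 'OK$'    # lists files whose checksums fail
sudo apt-get install --reinstall libc6-dev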
