I am trying to figure out what stack size limit applies when MATLAB calls a function in a DLL.
Is there a way to configure the limit?
I am using the loadlibrary and calllib functions to call a function implemented in C (in a dynamic-link library).
I created a test to figure out the stack limit.
I am using MATLAB 2016a (64 bits), and Visual Studio 2010 for building the DLL.
Here is my MATLAB source code:
loadlibrary('MyDll','MyDll.h')
size_in_bytes = 1000000;
res = calllib('MyDll', 'Test', size_in_bytes);
if (res == -1)
disp(['Stack Overflow... (size = ', num2str(size_in_bytes), ')']);
else
disp(['Successful stack allocation... (size = ', num2str(size_in_bytes), ')']);
end
unloadlibrary MyDll
Here is my C source code:
MyDll.h
// MyDll.h : DLL interface.
#ifndef MY_DLL_H
#define MY_DLL_H
#ifdef MY_DLL_EXPORTS
#define MY_DLL_API __declspec(dllexport)
#else
#define MY_DLL_API __declspec(dllimport)
#endif
extern MY_DLL_API int Test(int size);
#endif
MyDll.c
// MyDll.c
#include "MyDll.h"
#include <windows.h>
#include <stdio.h>
#include <string.h>
#include <malloc.h>
//Allocate <size> bytes in stack using _alloca(size).
//Return 0 if OK.
//Return (-1) in case of stack overflow.
int Test(int size)
{
//Not allocated on the stack...
static wchar_t errorMsg[100];
static wchar_t okMsg[100];
int errcode = 0;
void *pData = NULL;
//Prepare messages in advance.
swprintf_s(errorMsg, 100, L"Stack Overflow (size = %d)", size);
swprintf_s(okMsg, 100, L"Successful stack allocation (size = %d)", size);
__try
{
pData = _alloca(size);
}
// If an exception occurred with the _alloca function
__except (GetExceptionCode() == STATUS_STACK_OVERFLOW)
{
MessageBox(NULL, errorMsg, TEXT("Error"), MB_OK | MB_ICONERROR);
// If the stack overflows, use this function to restore.
errcode = _resetstkoflw();
if (errcode)
{
MessageBox(NULL, TEXT("Could not reset the stack!"), TEXT("Error"), MB_OK | MB_ICONERROR);
_exit(1);
}
pData = NULL;
};
if (pData != NULL)
{
//Fill allocated buffer with zeros
memset(pData, 0, size);
MessageBox(NULL, okMsg, TEXT("OK"), MB_OK);
return 0;
}
return -1;
}
The __try and __except block is taken from Microsoft example:
https://msdn.microsoft.com/en-us/library/wb1s57t5.aspx
DLL Compiler flags:
/Zi /nologo /W4 /WX- /Od /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_USRDLL" /D "MY_DLL_EXPORTS" /D "_WINDLL" /D "_UNICODE" /D "UNICODE" /Gm /EHsc /RTC1 /MTd /GS /fp:precise /Zc:wchar_t /Zc:forScope /Fp"x64\Debug\MyDll.pch" /Fa"x64\Debug\" /Fo"x64\Debug\" /Fd"x64\Debug\vc100.pdb" /Gd /errorReport:queue
DLL Linker flags:
/OUT:"x64\Debug\MyDll.dll" /INCREMENTAL:NO /NOLOGO /DLL "kernel32.lib" "user32.lib" "gdi32.lib" "winspool.lib" "comdlg32.lib" "advapi32.lib" "shell32.lib" "ole32.lib" "oleaut32.lib" "uuid.lib" "odbc32.lib" "odbccp32.lib" /MANIFEST /ManifestFile:"x64\Debug\MyDll.dll.intermediate.manifest" /ALLOWISOLATION /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /DEBUG /PDB:"c:\Tmp\MyDll\x64\Debug\MyDll.pdb" /SUBSYSTEM:CONSOLE /PGD:"c:\Tmp\MyDll\x64\Debug\MyDll.pgd" /TLBID:1 /DYNAMICBASE /NXCOMPAT /MACHINE:X64 /ERRORREPORT:QUEUE
I executed the MATLAB code using different values of size_in_bytes:
size_in_bytes = 1000000: Pass!
size_in_bytes = 10000000: Pass!
size_in_bytes = 50000000: Pass!
size_in_bytes = 60000000: Pass!
size_in_bytes = 70000000: Stack Overflow!
Looks like the limit in my system is about 64MByte (but I don't know if this number is true for all systems).
I tried to modify the stack size of matlab.exe using the editbin tool.
I tried the following command (for example):
editbin /STACK:250000000 "c:\Program Files\MATLAB\R2016a\bin\matlab.exe"
This option sets the size of the stack in bytes and takes arguments in decimal or C-language notation. The /STACK option applies only to an executable file.
It seems to have no effect...
On Windows, the default stack size is stored in the header of the executable. It is set at build time (compiler option /F or linker option /STACK), or afterwards with the EDITBIN tool.
For example, you could edit the following file:
EDITBIN /STACK:134217728 "C:\Program Files\MATLAB\R2016a\bin\win64\MATLAB.exe"
This would set the stack size to 128 MB (128 x 1024 x 1024 bytes = 134217728 bytes).
Note: be aware that editing C:\Program Files\MATLAB\R2016a\bin\matlab.exe will have no effect.
I have created a static C library in Visual Studio 2019 on Windows 10 which depends on the tensorflow library, which is dynamic (.dll). My library, let's call it A.lib, contains a function which takes data, passes it to a tensorflow model and returns the model's output. The compilation seems to work well and creates an A.lib file.
Now I want to use my static library in another project to create an .exe. Let's call it B. I copied the header A.h and A.lib into the B project and adapted the project properties so that my library can be found.
The problem is that I get LNK2001 errors, because the linker cannot find the definitions of the tensorflow functions which I call in A.lib.
I tried to copy the tensorflow lib into my project B as well. But that did not help.
What do I have to do to include the libraries correctly? Or is there a simpler alternative to deploy a convolutional neural network in C?
Here's a [SO]: How to create a Minimal, Reproducible Example (reprex (mcve)).
dll00.h:
#if defined(_WIN32)
# if defined(DLL00_EXPORTS)
# define DLL00_EXPORT_API __declspec(dllexport)
# else
# define DLL00_EXPORT_API __declspec(dllimport)
# endif
#else
# define DLL00_EXPORT_API
#endif
#if defined(__cplusplus)
extern "C" {
#endif
DLL00_EXPORT_API int dll00Func00();
#if defined(__cplusplus)
}
#endif
dll00.c:
#define DLL00_EXPORTS
#include "dll00.h"
#include <stdio.h>
int dll00Func00() {
printf("%s - %d - %s\n", __FILE__, __LINE__, __FUNCTION__);
return -3;
}
lib00.h:
#if defined(__cplusplus)
extern "C" {
#endif
int lib00Func00();
#if defined(__cplusplus)
}
#endif
lib00.c:
#include "lib00.h"
#include "dll00.h"
#include <stdio.h>
int lib00Func00() {
printf("%s - %d - %s\n", __FILE__, __LINE__, __FUNCTION__);
return dll00Func00() - 3;
}
main00.c:
#include "lib00.h"
#include <stdio.h>
int main() {
printf("%s - %d - %s\n", __FILE__, __LINE__, __FUNCTION__);
int res = lib00Func00();
printf("Lib func returned: %d\n", res);
printf("\nDone.\n");
return 0;
}
Output:
[cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q069197545]> sopr.bat
### Set shorter prompt to better fit when pasted in StackOverflow (or other) pages ###
[prompt]> "c:\Install\pc032\Microsoft\VisualStudioCommunity\2019\VC\Auxiliary\Build\vcvarsall.bat" x64 >nul
[prompt]> dir /b
dll00.c
dll00.h
lib00.c
lib00.h
main00.c
[prompt]> :: Build .dll (1 step)
[prompt]> cl /nologo /MD /DDLL dll00.c /link /NOLOGO /DLL /OUT:dll00.dll
dll00.c
Creating library dll00.lib and object dll00.exp
[prompt]> :: Build .lib (2 steps)
[prompt]> cl /c /nologo /MD /Folib00.obj lib00.c
lib00.c
[prompt]> lib /NOLOGO /OUT:lib00.lib lib00.obj
[prompt]> :: Build .exe (1 step)
[prompt]> cl /nologo /MD /W0 main00.c /link /NOLOGO /OUT:main00_pc064.exe lib00.lib dll00.lib
main00.c
[prompt]> dir /b
dll00.c
dll00.dll
dll00.exp
dll00.h
dll00.lib
dll00.obj
lib00.c
lib00.h
lib00.lib
lib00.obj
main00.c
main00.obj
main00_pc064.exe
[prompt]> main00_pc064.exe
main00.c - 7 - main
lib00.c - 9 - lib00Func00
dll00.c - 8 - dll00Func00
Lib func returned: -6
Done.
So, it works (at least for this trivial example). As seen, when building the .exe I also passed the .dll's .lib to the linker (meaning that the .dll, together with all of its (recursive) dependencies, is required at runtime). For info on how to do it in a VStudio project, check [SO]: How to include OpenSSL in Visual Studio (@CristiFati's answer).
How can I change my current working directory in C++ in a platform-agnostic way?
I found the direct.h header file, which is Windows compatible, and the unistd.h, which is UNIX/POSIX compatible.
The chdir function works on both POSIX (manpage) and Windows (called _chdir there but an alias chdir exists).
Both implementations return zero on success and -1 on error. As you can see in the manpage, more distinguished errno values are possible in the POSIX variant, but that shouldn't really make a difference for most use cases.
With C++17, it is now possible to use std::filesystem::current_path:
#include <filesystem>
int main() {
auto path = std::filesystem::current_path(); //getting path
std::filesystem::current_path(path); //setting path
}
For C++, use boost::filesystem::current_path (it has setter and getter overloads).
A file system library based on Boost.Filesystem was added to the standard in C++17 as std::filesystem.
Here is cross-platform sample code for changing the working directory using POSIX chdir and MS _chdir, as recommended in this answer. Likewise, for determining the current working directory, the analogous getcwd and _getcwd are used.
These platform differences are hidden behind the macros cd and cwd.
As per the documentation, chdir's signature is int chdir(const char *path) where path is absolute or relative. chdir will return 0 on success. getcwd is slightly more complicated because it needs (in one variant) a buffer to store the fetched path in as seen in char *getcwd(char *buf, size_t size). It returns NULL on failure and a pointer to the same passed buffer on success. The code sample makes use of this returned char pointer directly.
The sample is based on @MarcD's but corrects a memory leak. Additionally, I strove for concision, no dependencies, and only basic failure/error checking as well as ensuring it works on multiple (common) platforms.
I tested it on OSX 10.11.6, Centos7, and Win10. For OSX & Centos, I used g++ changedir.cpp -o changedir to build and ran as ./changedir <path>.
On Win10, I built with cl.exe changedir.cpp /EHsc /nologo.
MVP solution
$ cat changedir.cpp
#ifdef _WIN32
#include <direct.h>
// MSDN recommends against using getcwd & chdir names
#define cwd _getcwd
#define cd _chdir
#else
#include "unistd.h"
#define cwd getcwd
#define cd chdir
#endif
#include <iostream>
char buf[4096]; // never know how much is needed
int main(int argc , char** argv) {
if (argc > 1) {
std::cout << "CWD: " << cwd(buf, sizeof buf) << std::endl;
// Change working directory and test for success
if (0 == cd(argv[1])) {
std::cout << "CWD changed to: " << cwd(buf, sizeof buf) << std::endl;
}
} else {
std::cout << "No directory provided" << std::endl;
}
return 0;
}
OSX Listing:
$ g++ changedir.c -o changedir
$ ./changedir testing
CWD: /Users/Phil
CWD changed to: /Users/Phil/testing
Centos Listing:
$ g++ changedir.c -o changedir
$ ./changedir
No directory provided
$ ./changedir does_not_exist
CWD: /home/phil
$ ./changedir Music
CWD: /home/phil
CWD changed to: /home/phil/Music
$ ./changedir /
CWD: /home/phil
CWD changed to: /
Win10 Listing
cl.exe changedir.cpp /EHsc /nologo
changedir.cpp
c:\Users\Phil> changedir.exe test
CWD: c:\Users\Phil
CWD changed to: c:\Users\Phil\test
Note: OSX uses clang and Centos gnu gcc behind g++.
Does chdir() do what you want? It works under both POSIX and Windows.
You want chdir(2). If you are trying to have your program change the working directory of your shell - you can't. There are plenty of answers on SO already addressing that problem.
Did you mean C or C++? They are completely different languages.
In C, the standard that defines the language doesn't cover directories. Many platforms that support directories have a chdir function that takes a char* or const char* argument, but even where it exists the header where it's declared is not standard. There may also be subtleties as to what the argument means (e.g. Windows has per-drive directories).
In C++, googling leads to chdir and _chdir, and suggests that Boost doesn't have an interface to chdir. But I won't comment any further since I don't know C++.
A nice cross-platform way to change the current directory in C++ was suggested a long time ago by @pepper_chico. This solution uses boost::filesystem::current_path().
To get the current working directory use:
namespace fs = boost::filesystem;
fs::path cur_working_dir(fs::current_path());
To set the current working directory use:
namespace fs = boost::filesystem;
fs::current_path(fs::system_complete( fs::path( "new_working_directory_path" ) ));
Below are self-contained helper functions:
#include "boost/filesystem/operations.hpp"
#include "boost/filesystem/path.hpp"
#include <string>
namespace fs = boost::filesystem;
fs::path get_cwd_pth()
{
return fs::current_path();
}
std::string get_cwd()
{
return get_cwd_pth().string(); // string() also compiles on Windows, where c_str() yields wchar_t*
}
void set_cwd(const fs::path& new_wd)
{
fs::current_path(fs::system_complete( new_wd));
}
void set_cwd(const std::string& new_wd)
{
set_cwd( fs::path( new_wd));
}
Here is my complete code-example on how to set/get current working directory:
#include "boost/filesystem/operations.hpp"
#include "boost/filesystem/path.hpp"
#include <iostream>
namespace fs = boost::filesystem;
int main( int argc, char* argv[] )
{
fs::path full_path;
if ( argc > 1 )
{
full_path = fs::system_complete( fs::path( argv[1] ) );
}
else
{
std::cout << "Usage: tcd [path]" << std::endl;
return 1; // nothing to do without a path argument
}
if ( !fs::exists( full_path ) )
{
std::cout << "Not found: " << full_path.c_str() << std::endl;
return 1;
}
if ( !fs::is_directory( full_path ))
{
std::cout << "Provided path is not a directory: " << full_path.c_str() << std::endl;
return 1;
}
std::cout << "Old current working directory: " << boost::filesystem::current_path().c_str() << std::endl;
fs::current_path(full_path);
std::cout << "New current working directory: " << boost::filesystem::current_path().c_str() << std::endl;
return 0;
}
If Boost is installed on your system, you can use the following command to compile this sample:
g++ -o tcd app.cpp -lboost_filesystem -lboost_system
Can't believe no one has claimed the bounty on this one yet!!!
Here is a cross platform implementation that gets and changes the current working directory using C++. All it takes is a little macro magic, to read the value of argv[0], and to define a few small functions.
Here is the code to change directories to the location of the executable file that is running currently. It can easily be adapted to change the current working directory to any directory you want.
Code :
#ifdef _WIN32
#include "direct.h"
#define PATH_SEP '\\'
#define GETCWD _getcwd
#define CHDIR _chdir
#else
#include "unistd.h"
#define PATH_SEP '/'
#define GETCWD getcwd
#define CHDIR chdir
#endif
#include <cstring>
#include <string>
#include <iostream>
using std::cout;
using std::endl;
using std::string;
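string GetExecutableDirectory(const char* argv0) {
string path = argv0;
// find_last_of returns string::npos when argv0 has no directory component
string::size_type path_directory_index = path.find_last_of(PATH_SEP);
if (path_directory_index == string::npos) {return string();}
return path.substr(0 , path_directory_index + 1);
}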
bool ChangeDirectory(const char* dir) {return CHDIR(dir) == 0;}
string GetCurrentWorkingDirectory() {
const int BUFSIZE = 4096;
char buf[BUFSIZE];
memset(buf , 0 , BUFSIZE);
GETCWD(buf , BUFSIZE - 1);
return buf;
}
int main(int argc , char** argv) {
cout << endl << "Current working directory was : " << GetCurrentWorkingDirectory() << endl;
cout << "Changing directory..." << endl;
string exedir = GetExecutableDirectory(argv[0]);
ChangeDirectory(exedir.c_str());
cout << "Current working directory is now : " << GetCurrentWorkingDirectory() << endl;
return 0;
}
Output :
c:\Windows>c:\ctwoplus\progcode\test\CWD\cwd.exe
Current working directory was : c:\Windows
Changing directory...
Current working directory is now : c:\ctwoplus\progcode\test\CWD
c:\Windows>
I am trying to build the given C code as a DLL file. I tried to compile it, but it doesn't work and gives me an error:
.\sample.cpp:16:66: warning: passing NULL to non-pointer argument 5 of 'void* CreateThread(LPSECURITY_ATTRIBUTES, DWORD, LPTHREAD_START_ROUTINE, PVOID, DWORD, PDWORD)' [-Wconversion-null]
c:/mingw/bin/../lib/gcc/mingw32/8.2.0/../../../../mingw32/bin/ld.exe: c:/mingw/bin/../lib/gcc/mingw32/8.2.0/../../../libmingw32.a(main.o):(.text.startup+0xb0): undefined reference to `WinMain@16'
collect2.exe: error: ld returned 1 exit status
I have already tried looking up the answers available online, along with looking at the syntax which seems to be okay.
My Code
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
extern "C" __declspec(dllexport)
DWORD WINAPI MessageBoxThread(LPVOID lpParam) {
MessageBox(NULL, "Hello world!", "Hello World!", NULL);
return 0;
}
extern "C" __declspec(dllexport)
BOOL APIENTRY DllMain(HMODULE hModule,
DWORD ul_reason_for_call,
LPVOID lpReserved) {
switch (ul_reason_for_call) {
case DLL_PROCESS_ATTACH:
CreateThread(NULL, NULL, MessageBoxThread, NULL, NULL, NULL);
break;
case DLL_THREAD_ATTACH:
case DLL_THREAD_DETACH:
case DLL_PROCESS_DETACH:
break;
}
return TRUE;
}
I'm using Visual Studio 2017 Community Edition, version 15.8.1, for reference. I created a simple DLL project (using File | New | Project) and removed all files except dllmain.cpp. I had to modify the project properties to disable the use of precompiled headers. The code in dllmain.cpp is:
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
extern "C" __declspec(dllexport)
DWORD WINAPI MessageBoxThread(LPVOID lpParam)
{
MessageBoxA(NULL, "Hello World", "Hello World", MB_YESNO);
return 0;
}
BOOL APIENTRY DllMain(HMODULE hModule, DWORD dwReason, LPVOID lpReserved)
{
switch (dwReason)
{
case DLL_PROCESS_ATTACH:
DWORD dwTID;
CreateThread(nullptr, 0, MessageBoxThread, nullptr, 0, &dwTID);
break;
case DLL_THREAD_ATTACH:
case DLL_THREAD_DETACH:
case DLL_PROCESS_DETACH:
break;
}
return TRUE;
}
I've made a few modifications to your code:
I passed MB_YESNO as the final argument to MessageBox instead of NULL. This gives your message box YES and NO buttons, and it matches the type expected (UINT) for the fourth parameter.
I used MessageBoxA to force ANSI (narrow) string arguments instead of MessageBox, which is a macro that resolves to MessageBoxW or MessageBoxA depending on whether UNICODE is defined. This affects the type of string that is expected.
I passed a value of zero for the second argument (dwStackSize) and the fifth argument (dwCreationFlags) of CreateThread instead of NULL. Both of these arguments have type DWORD. Notice that this fixes the first line of your error message ".\sample.cpp:16:66: warning: passing NULL to non-pointer argument 5"
I declared a variable dwTID and passed a pointer to it as the sixth argument of CreateThread.
I added a break statement to the first case. This should not have any consequence in the code. I just think it is a good idea.
The above code compiles without warnings or errors, so I believe that your code should compile as well. I therefore strongly suspect that the errors you are seeing are due to the compiler and linker flags you are using. The command lines that are being used are
for compilation:
/JMC /permissive- /GS /analyze- /W3 /Zc:wchar_t /ZI /Gm- /Od /sdl /Fd"Debug\vc141.pdb" /Zc:inline /fp:precise /D "WIN32" /D "_DEBUG" /D "TESTDLL_EXPORTS" /D "_WINDOWS" /D "_USRDLL" /D "_WINDLL" /D "_UNICODE" /D "UNICODE" /errorReport:prompt /WX- /Zc:forScope /RTC1 /Gd /Oy- /MDd /FC /Fa"Debug\" /EHsc /nologo /Fo"Debug\" /Fp"Debug\testDll.pch" /diagnostics:classic
for linking:
/OUT:"D:\GNUHome\Projects\testDll\Debug\testDll.dll" /MANIFEST /NXCOMPAT /PDB:"D:\GNUHome\Projects\testDll\Debug\testDll.pdb" /DYNAMICBASE "kernel32.lib" "user32.lib" "gdi32.lib" "winspool.lib" "comdlg32.lib" "advapi32.lib" "shell32.lib" "ole32.lib" "oleaut32.lib" "uuid.lib" "odbc32.lib" "odbccp32.lib"/IMPLIB:"D:\GNUHome\Projects\testDll\Debug\testDll.lib" /DEBUG /DLL /MACHINE:X86 /INCREMENTAL /PGD:"D:\GNUHome\Projects\testDll\Debug\testDll.pgd" /SUBSYSTEM:WINDOWS /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /ManifestFile:"Debug\testDll.dll.intermediate.manifest" /ERRORREPORT:PROMPT /NOLOGO /TLBID:1
Because you are not using the native Microsoft tools (CL and LINK), you are going to need to find or prepare a mapping between the tool chain you are using (which you did not mention, but it appears to be mingw from the error messages) and Microsoft's tool chain.
If I had to guess, I would suspect that the issue is due to the /DLL flag in the linking command line. You might have to use something like -shared with mingw. However, this is just a guess.
I've a C program structured in this way:
#include <Windows.h>
#include <stdio.h>
#include <stdint.h>
#pragma section(".code",execute, read, write)
#pragma comment(linker,"/SECTION:.code,ERW")
#pragma code_seg(".code")
//Code to decrypt
#pragma section(".stub", execute, read, write)
#pragma code_seg(".stub")
void decryptor(){
//Retrieve virtual address of the pointer to the .code section
//Retrieve the virtual size of the pointer to the .code section
for(int i = 0; i<size; i++){
//HERE THE PROGRAM STOPS
ptrCode[0] = //Reverse function of the encryptor
}
}
int main(){
decryptor();
mainFunctionDecrypted();
return 0;
}
Basically, I have an encryptor which first encrypts the .code segment in the exe of this program after compilation.
Then, when I execute the modified exe, I want to be able to first decrypt it and then execute the decrypted part. However, it seems like I cannot write to the .code segment loaded in memory (I think because it is a part of memory dedicated to code to be executed).
Is there any way to write to executable memory?
Is there any workaround you would suggest?
Windows and other operating systems go out of their way to prevent you from doing this (modifying the code sections of a running application).
Your immediate options, then, are
1) decrypt the code to some other memory area allocated dynamically for that purpose (code must then either use only position-independent instructions, or contain custom fixups for the instructions that have position-specific data).
2) use a separate program that decrypts the program on disk before it is executed.
Obfuscating a program in this way is inherently futile. Whatever your "decryptor" does, a human who is determined to reverse engineer your program can also do. Spend your effort instead on making your program desirable enough that people want to pay you for it, and benevolent enough that you don't have to hide what it's doing.
I needed to modify the code in the following way. There are also important compiler and linker options to set in Visual Studio, for example to disable Data Execution Prevention.
Compiler options used:
/permissive- /GS /TC /GL /analyze- /W3 /Gy /Zc:wchar_t /Gm- /O2 /sdl /Zc:inline /fp:precise /Zp1 /D "_MBCS" /errorReport:prompt /WX- /Zc:forScope /GR- /Gd /Oy- /Oi /MD /FC /nologo /diagnostics:classic
Linker options used:
/MANIFEST /LTCG:incremental /NXCOMPAT:NO /DYNAMICBASE:NO "kernel32.lib" "user32.lib" "gdi32.lib" "winspool.lib" "comdlg32.lib" "advapi32.lib" "shell32.lib" "ole32.lib" "oleaut32.lib" "uuid.lib" "odbc32.lib" "odbccp32.lib" /FIXED /MACHINE:X86 /OPT:REF /SAFESEH /INCREMENTAL:NO /SUBSYSTEM:CONSOLE /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /MAP /OPT:ICF /ERRORREPORT:PROMPT /NOLOGO /TLBID:1
#pragma section(".code", execute, read)
#pragma section(".codedata", read, write)
#pragma comment(linker,"/SECTION:.code,ERW")
#pragma comment(linker,"/SECTION:.codedata,ERW")
#pragma comment(linker, "/MERGE:.codedata=.code")
//All the following will go in code
#pragma code_seg(".code")
#pragma data_seg(".codedata")
#pragma const_seg(".codedata")
//CODE TO DECRYPT
// .stub SECTION
#pragma section(".stub", execute, read)
#pragma section(".stubdata", read, write)
#pragma comment(linker,"/SECTION:.stub,ERW")
#pragma comment(linker,"/SECTION:.stubdata,ERW")
#pragma comment(linker, "/MERGE:.stubdata=.stub")
//All the following will go in .stub segment
#pragma code_seg(".stub")
#pragma data_seg(".stubdata")
#pragma const_seg(".stubdata")
/*This function must implement the inverse of whatever encryption function the encryptor used*/
void decryptCodeSection(){
//Retrieve virtual address of the pointer to the .code section
//Retrieve the virtual size of the pointer to the .code section
for(int i = 0; i<size; i++){
ptrCode[i] = //Reverse function of the encryptor
}
}
int main(int argc, char* argv[]){
decryptCodeSection();
mainFunctionDecrypted();
return 0;
}
This way I was able to first decrypt the segment and then execute the function.
While using a simple function to memset a CUDA array, I get an invalid argument error for big arrays (roughly larger than pow(2,25) elements).
I am running on a Tesla K40. I should have enough memory (by far) to allocate the array, and also enough capacity to launch the number of blocks I am requesting; however, the following code exits with an error:
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#define MAXTHREADS 1024
//http://stackoverflow.com/a/16283216/1485872
#define cudaCheckErrors(msg) \
do { \
cudaError_t __err = cudaGetLastError(); \
if (__err != cudaSuccess) { \
fprintf(stderr, "Fatal error: %s (%s at %s:%d)\n", \
msg, cudaGetErrorString(__err), \
__FILE__, __LINE__); \
fprintf(stderr, "*** FAILED - ABORTING\n"); \
exit(1);} \
} while (0)
__global__ void mymemset(float* image, const float val, size_t N)
{
//http://stackoverflow.com/a/35133396/1485872
size_t tid = threadIdx.x + blockIdx.x * blockDim.x;
while (tid < N) {
image[tid] = val;
tid += gridDim.x * blockDim.x;
}
}
int main()
{
size_t total_pixels = pow(2, 26) ;
float* d_image;
cudaMalloc(&d_image, total_pixels*sizeof(float));
cudaCheckErrors("Malloc");
dim3 bsz = dim3(MAXTHREADS);
dim3 gsz = dim3(total_pixels / bsz.x + ((total_pixels % bsz.x > 0) ? 1 : 0));
mymemset<<<gsz, bsz>>>(d_image, 1.0f, total_pixels);
cudaCheckErrors("mymemset"); //<- error!
cudaDeviceReset();
}
The code works fine up to (and a bit beyond) pow(2,25) in total_pixels but fails for pow(2,26).
Coincidentally, this is the point where the grid size gsz reaches 65536 (the block size bsz stays at 1024), which is above the upper limit on some GPUs, but on the Tesla K40 the limit is supposed to be 2147483647 for the x dimension and 65535 for y and z (which I am not using). Any insight about the origin of this error?
Compiler flags from VS2013: Properties->CUDA C/C++/command line
# Driver API (NVCC Compilation Type is .cubin, .gpu, or .ptx)
set CUDAFE_FLAGS=--sdk_dir "C:\Program Files (x86)\Windows Kits\8.1\"
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\bin\nvcc.exe" --use-local-env --cl-version 2013 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin" -G --keep-dir Debug -maxrregcount=0 --machine 32 --compile -cudart static -o Debug\%(Filename)%(Extension).obj "%(FullPath)"
# Runtime API (NVCC Compilation Type is hybrid object or .c file)
set CUDAFE_FLAGS=--sdk_dir "C:\Program Files (x86)\Windows Kits\8.1\"
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\bin\nvcc.exe" --use-local-env --cl-version 2013 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin" -G --keep-dir Debug -maxrregcount=0 --machine 32 --compile -cudart static -g -Xcompiler "/EHsc /nologo /Zi " -o Debug\%(Filename)%(Extension).obj "%(FullPath)"
You are compiling for the default architecture (sm_20), which limits each dimension of the grid to 65535 blocks. You must build for sm_35 to be able to launch up to 2147483647 blocks in a 1D grid.
You should also note that the kernel you are using (which I wrote) could be run with many fewer blocks than (n/blocksize) and still work correctly, and it would be more efficient to do so.