LVM_GETITEMTEXT for x32 and x64 in C

I've been trying to get the text of items in a listview owned by another process. I found an excellent tutorial on CodeProject, and thanks to that article I was able to do this on x32. But when I try to run on x64, the application I'm trying to access crashes as soon as SendMessage is called. In the article's comments people reported similar problems caused by the different pointer sizes. Some people suggested using an x64 compiler, which I can't do; I need my program to run on both x32 and x64. One commenter suggested:
I have the answer. The LVITEM
structure is wrong under 64-bit
systems. Pointers are 64-bit now, so
the text pointer has to be followed by
a dummy value, to offset the length
member correctly.
I think this would be the best solution, as I could run it on x32 and x64 with one exe. I just have no idea how to do what he's talking about. I have included my code, which currently works on x32. If anyone can help me out, that would be awesome.
LVITEM lvi, *_lvi;
char item[512];
char *_item;
unsigned long pid;
HANDLE process;
GetWindowThreadProcessId(procList, &pid);
process = OpenProcess(0x001f0fff, FALSE, pid);
_lvi = (LVITEM*)VirtualAllocEx(process, NULL, sizeof(LVITEM), 0x1000, 4);
_item = (char*)VirtualAllocEx(process, NULL, 512, 0x1000, 4);
lvi.cchTextMax = 512;
int r, c;
for (r = 0; r < rowCount; r++)
{
    for (c = 0; c < columnCount; c++)
    {
        lvi.iSubItem = c;
        lvi.pszText = _item;
        // Write lvi into the target program's memory
        WriteProcessMemory(process, _lvi, &lvi, sizeof(LVITEM), NULL);
        // Have the program write the item text into the buffer we gave it
        SendMessage(procList, LVM_GETITEMTEXT, (WPARAM)r, (LPARAM)_lvi);
        // Read the item text back from the target program's memory
        ReadProcessMemory(process, _item, item, 512, NULL);
    }
}
// Clean up the mess we made
VirtualFreeEx(process, _lvi, 0, MEM_RELEASE);
VirtualFreeEx(process, _item, 0, MEM_RELEASE);
CloseHandle(process);

I don't think you'll be able to achieve this. In a 32-bit process your pointers will be too short. I believe VirtualAllocEx will fail when called from a 32-bit process with a 64-bit process handle as its first parameter; I think you would see this if you added error checking to your code.
Your only solution will be to have two versions, x86 and x64. That should be no real trouble - usually it can be done from a single source.
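For reference, here is a minimal sketch of the error checking suggested above, reusing the question's variable names; MEM_COMMIT and PAGE_READWRITE are the named constants behind the question's 0x1000 and 4:
// Sketch only: same calls as in the question, but with return values checked.
process = OpenProcess(0x001f0fff, FALSE, pid);
if (process == NULL) {
    printf("OpenProcess failed: %lu\n", GetLastError());
    return;
}
_lvi = (LVITEM*)VirtualAllocEx(process, NULL, sizeof(LVITEM), MEM_COMMIT, PAGE_READWRITE);
_item = (char*)VirtualAllocEx(process, NULL, 512, MEM_COMMIT, PAGE_READWRITE);
if (_lvi == NULL || _item == NULL) {
    printf("VirtualAllocEx failed: %lu\n", GetLastError());
    CloseHandle(process);
    return;
}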

Sending the LVM_GETITEMTEXT message from a 32-bit application to a 64-bit ListView is actually possible.
I was able to achieve this by using not the original LVITEM structure (60 bytes long) but a modified LVITEM structure (88 bytes long) with seven 4-byte placeholders inserted between members. It works on my Win7 Pro 64-bit, though I have not yet tested this approach on other machines.
Below is the structure. This is C++, but nothing prevents us from doing the same in .NET.
typedef struct {
    UINT mask;
    int iItem;
    int iSubItem;
    UINT state;
    UINT stateMask;
    int placeholder1;
    LPTSTR pszText;
    int placeholder11;
    int cchTextMax;
    int iImage;
    LPARAM lParam;
    int placeholder2;
#if (_WIN32_IE >= 0x0300)
    int iIndent;
#endif
#if (_WIN32_WINNT >= 0x0501)
    int iGroupId;
    UINT cColumns;
    int placeholder3;
    UINT puColumns;
    int placeholder4;
#endif
#if (_WIN32_WINNT >= 0x0600)
    int piColFmt;
    int placeholder5;
    int iGroup;
    int placeholder6;
#endif
} LVITEM64, *LPLVITEM64;
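For completeness, here is a minimal, untested sketch of how this structure would be used from the 32-bit side, assuming the same cross-process setup (process, procList, _item, item, r, c) as in the question's code; note that the remote structure now has to be allocated with sizeof(LVITEM64):
// Sketch only: LVITEM64 takes the place of LVITEM so the remote layout
// matches what the 64-bit ListView expects.
LVITEM64 lvi64 = {0};
LVITEM64 *_lvi64 = (LVITEM64*)VirtualAllocEx(process, NULL, sizeof(LVITEM64), MEM_COMMIT, PAGE_READWRITE);
lvi64.iSubItem = c;                // column index, as in the question
lvi64.cchTextMax = 512;
lvi64.pszText = (LPTSTR)_item;     // remote buffer; placeholder11 pads the pointer to 64 bits
WriteProcessMemory(process, _lvi64, &lvi64, sizeof(LVITEM64), NULL);
SendMessage(procList, LVM_GETITEMTEXT, (WPARAM)r, (LPARAM)_lvi64);
ReadProcessMemory(process, _item, item, 512, NULL);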

Related

AT91 ARM EMAC polling issue

I am using Atmel's lwIP example. Interfacing with the PHY is OK: it can link and even auto-negotiate, and the netif comes up. But when I start polling the netif, nothing happens. I've narrowed the problem down to EMAC_Poll:
unsigned char EMAC_Poll(unsigned char *pFrame, unsigned int frameSize, unsigned int *pRcvSize)
{
    unsigned short bufferLength;
    unsigned int tmpFrameSize = 0;
    unsigned char *pTmpFrame = 0;
    unsigned int tmpIdx = rxTd.idx;
    volatile EmacRxTDescriptor *pRxTd = rxTd.td + rxTd.idx;
    ASSERT(pFrame, "F: EMAC_Poll\n\r");
    char isFrame = 0;
    // Set the default return value
    *pRcvSize = 0;
    // Process received RxTd
    while ((pRxTd->addr & EMAC_RX_OWNERSHIP_BIT) == EMAC_RX_OWNERSHIP_BIT) {
        // Never got there.
        ...
    }
    return EMAC_RX_NO_DATA;
}
typedef struct {
    volatile EmacRxTDescriptor td[RX_BUFFERS];
    EMAC_RxCallback rxCb;   /// Callback function to be invoked once a frame has been received
    unsigned short idx;
} RxTd;
/// Describes the type and attribute of Receive Transfer descriptor.
typedef struct _EmacRxTDescriptor {
    unsigned int addr;
    unsigned int status;
} __attribute__((packed, aligned(8))) EmacRxTDescriptor, *PEmacRxTDescriptor;
There is a while loop, but the condition never becomes true.
I have only a very vague idea of what RxTd is and what exactly this condition means, and I cannot see how RxTd would ever change so that the condition passes. All references to RxTd lead to the same emac.c module; most of them are in that polling function, and the rest are in the EMAC_ResetRx function.
static void EMAC_ResetRx(void)
{
    unsigned int Index;
    unsigned int Address;
    // Disable RX
    AT91C_BASE_EMAC->EMAC_NCR &= ~AT91C_EMAC_RE;
    // Setup the RX descriptors.
    rxTd.idx = 0;
    for (Index = 0; Index < RX_BUFFERS; Index++) {
        Address = (unsigned int)(&(pRxBuffer[Index * EMAC_RX_UNITSIZE]));
        // Remove EMAC_RX_OWNERSHIP_BIT and EMAC_RX_WRAP_BIT
        rxTd.td[Index].addr = Address & EMAC_ADDRESS_MASK;
        rxTd.td[Index].status = 0;
    }
    rxTd.td[RX_BUFFERS - 1].addr |= EMAC_RX_WRAP_BIT;
    // Receive Buffer Queue Pointer Register
    AT91C_BASE_EMAC->EMAC_RBQP = (unsigned int) (rxTd.td);
}
I do not really understand the last line, but it looks like rxTd is filled in by the AT91 hardware itself. If that is so, there may be a packing/alignment problem, but Atmel added __attribute__((packed, aligned(8))) to the descriptor definition. Anyway, can someone describe the mechanism of data input, or tell me where the problem might be?
By the way, I am using gcc, if that matters.
UPD:
I've checked the RSR and noticed that it starts at 0 and then goes to 2 after a second; 2 means new data was captured.
UPD:
So I've read about the EMAC in the datasheet for my chip. I was right: the RBQP register must point to an array of descriptors, each consisting of an address field and a status field. The datasheet states that "bit zero of the address field is written to one to show the buffer has been used"; the MAC then uses the next RX descriptor from the array. I guess that by "has been used" they mean the buffer is filled with frame data and ready to be processed. This must mean the data is simply not reaching that buffer, yet it must be arriving somewhere because REC goes high. Additionally, I've checked that RE in the NCR is set and that MI is enabled. I have no idea what is wrong.
I spent a whole week solving this. The funny thing is that when I dumped memory and looked at all those addresses, the data had been there the whole time! So the key was to disable the I and D caches and the MMU itself. Hope this helps someone.

How to create a process that runs a routine with variable number of parameters?

I know there are lots of questions here about functions that take a variable number of arguments. I also know there's lots of docs about stdarg.h and its macros. And I also know how printf-like functions take a variable number of arguments. I already tried each of those alternatives and they didn't help me. So, please, keep that in mind before marking this question as duplicate.
I'm working on the process-management features of a little embedded operating system, and I'm stuck on the design of a function that can create processes which run a routine with a variable number of parameters. Here's a simplified version of what I want my API to look like:
// create a new process
// * function is a pointer to the routine the process will run
// * nargs is the number of arguments the routine takes
void create(void* function, uint8_t nargs, ...);

void f1();
void f2(int i);
void f3(float f, int i, const char* str);

int main()
{
    create(f1, 0);
    create(f2, 1, 9);
    create(f3, 3, 3.14f, 9, "string");
    return 0;
}
And here is pseudocode for the relevant part of the implementation of the create system call:
void create(void* function, uint8_t nargs, ...)
{
    process_stack = create_stack();
    first_arg = &nargs + 1;
    copy_args_list_to_process_stack(process_stack, first_arg);
}
Of course I'll need to know the calling convention in order to be able to copy from create's activation record to the new process stack, but that's not the problem. The problem is how many bytes I need to copy. Even though I know how many arguments I need to copy, I don't know how much space each of those arguments occupies, so I don't know when to stop copying.
The Xinu operating system does something very similar to what I want to do, but I tried hard to understand the code and didn't succeed. I'll transcribe a very simplified version of Xinu's create function here; maybe someone can understand it and help me.
pid32 create(void* procaddr, uint32 ssize, pri16 priority, char *name, int32 nargs, ...)
{
    int32 i;
    uint32 *a;      /* points to list of args */
    uint32 *saddr;  /* stack address */
    saddr = (uint32 *)getstk(ssize); // return a pointer to the new process's stack
    *saddr = STACKMAGIC;             // STACKMAGIC is just a marker to detect stack overflow
    // this is the cryptic part
    /* push arguments */
    a = (uint32 *)(&nargs + 1);      /* start of args */
    a += nargs - 1;                  /* last argument */
    for ( ; nargs > 4 ; nargs--)     /* machine dependent; copy args */
        *--saddr = *a--;             /* onto created process's stack */
    *--saddr = (long)procaddr;
    for (i = 11; i >= 4; i--)
        *--saddr = 0;
    for (i = 4; i > 0; i--) {
        if (i <= nargs)
            *--saddr = *a--;
        else
            *--saddr = 0;
    }
}
I got stuck on this line: a += nargs - 1;. This should move the pointer a ahead by 4*(nargs - 1) bytes in memory, right? What if an argument's size is not 4 bytes? And that is just the first question; I also didn't understand the subsequent lines of the code.
If you are writing an operating system, you also get to define the calling convention(s), right? Settle for argument sizes of sizeof(void*) and pad as necessary.
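To make that concrete, here is a rough, untested sketch of create() under that convention; create_stack() and push_word() are hypothetical helpers belonging to the OS being written, and arguments wider than a machine word (such as double) would need separate handling:
#include <stdarg.h>
#include <stdint.h>

/* Sketch only: assumes every argument is passed as one machine word
 * (sizeof(void*)), as suggested above. */
void create(void (*function)(void), uint8_t nargs, ...)
{
    va_list ap;
    uintptr_t *process_stack = create_stack();   /* hypothetical helper */

    va_start(ap, nargs);
    for (uint8_t i = 0; i < nargs; i++) {
        /* Read exactly one word per argument; smaller values are promoted. */
        uintptr_t word = va_arg(ap, uintptr_t);
        push_word(process_stack, word);          /* hypothetical helper */
    }
    va_end(ap);

    /* ... arrange for the new process to start executing at `function` ... */
}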

Are the function signatures correct in the RSA Authentication Agent API documentation?

I have some software which uses the documented API for RSA's Authentication Agent. This is a product which runs as a service on the client machines in a domain, and authenticates users locally by communicating with an "RSA Authentication Manager" installed centrally.
The Authentication Agent's API is publicly documented here: Authentication Agent API 8.1.1 for C Developers Guide. However, the docs seem to be incorrect, and I do not have access to the RSA header files - they are not public; only the PDF documentation is available for download without paying $$ to RSA. If anyone here has access to up to date header files, would you be able to confirm for me whether the documentation is out of date?
The function signatures given in the API docs seem incorrect - in fact, I'm absolutely convinced that they are wrong on x64 machines. For example, the latest PDF documentation shows the following:
int WINAPI AceSetUserData(SDI_HANDLE hdl, unsigned int userData)
int WINAPI AceGetUserData(SDI_HANDLE hdl, unsigned int *pUserData)
The documentation states several times that the "userData" value is a 32-bit quantity, for example in the documentation for AceInit, AceSetUserData, and AceGetUserData. A relevant excerpt from the docs for AceGetUserData:
This function is synchronous and the caller must supply, as the second argument, a pointer to a 32-bit storage area (that is, an unsigned int) into which to copy the user data value.
This is clearly false - from some experimentation, if you pass in a pointer to the center of a buffer filled with 0xff, AceGetUserData is definitely writing out a 64-bit value, not a 32-bit quantity.
My version of aceclnt.dll is 8.1.3.563; the corresponding documentation is labelled "Authentication Agent API 8.1 SP1", and this corresponds to version 7.3.1 of the Authentication Agent itself.
Test code
Full test code is given below, even though it's not really relevant to the problem... It's no use to me if someone else runs the test code (I know what it does!); what I need is someone with access to the RSA header files who can confirm the function signatures.
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#ifdef WIN32
#include <Windows.h>
#include <tchar.h>
#define SDAPI WINAPI
#else
#define SDAPI
#endif
typedef int SDI_HANDLE;
typedef uint32_t SD_BOOL;
typedef void (SDAPI* AceCallback)(SDI_HANDLE);
#define ACE_SUCCESS 1
#define ACE_PROCESSING 150
typedef SD_BOOL (SDAPI* AceInitializeEx_proto)(const char*, char*, uint32_t);
typedef int (SDAPI* AceInit_proto)(SDI_HANDLE*, void*, AceCallback);
typedef int (SDAPI* AceClose_proto)(SDI_HANDLE, AceCallback);
typedef int (SDAPI* AceGetUserData_proto)(SDI_HANDLE, void*);
typedef int (SDAPI* AceSetUserData_proto)(SDI_HANDLE, void*);
struct Api {
    AceInitializeEx_proto AceInitializeEx;
    AceInit_proto AceInit;
    AceClose_proto AceClose;
    AceGetUserData_proto AceGetUserData;
    AceSetUserData_proto AceSetUserData;
} api;

static void api_init(struct Api* api) {
    // All error-checking stripped...
    HMODULE dll = LoadLibrary(_T("aceclnt.dll")); // leak this for the demo
    api->AceInitializeEx = (AceInitializeEx_proto)GetProcAddress(dll, "AceInitializeEx");
    api->AceInit = (AceInit_proto)GetProcAddress(dll, "AceInit");
    api->AceClose = (AceClose_proto)GetProcAddress(dll, "AceClose");
    api->AceGetUserData = (AceGetUserData_proto)GetProcAddress(dll, "AceGetUserData");
    api->AceSetUserData = (AceSetUserData_proto)GetProcAddress(dll, "AceSetUserData");
    int success = api->AceInitializeEx("C:\\my\\conf\\directory", 0, 0);
    assert(success);
}

static void demoFunction(SDI_HANDLE handle) {
    union {
        unsigned char testBuffer[sizeof(void *) * 3];
        void *forceAlignment;
    } u;
    memset(u.testBuffer, 0xA5, sizeof u.testBuffer);
    int err = api.AceGetUserData(handle, (void*)(u.testBuffer + sizeof(void*)));
    assert(err == ACE_SUCCESS);
    fputs("DEBUG: testBuffer =", stderr);
    for (size_t i = 0; i < sizeof(u.testBuffer); i++) {
        if (i % 4 == 0)
            putc(' ', stderr);
        fprintf(stderr, "%02x", u.testBuffer[i]);
    }
    fputc('\n', stderr);
    // Prints:
    // DEBUG: testBuffer = a5a5a5a5 a5a5a5a5 00000000 00000000 a5a5a5a5 a5a5a5a5
    // According to the docs, this should only write out a 32-bit value
}

static void SDAPI demoCallback(SDI_HANDLE h) {
    fprintf(stderr, "Callback invoked, handle = %p\n", (void*)h);
}

int main(int argc, const char** argv)
{
    api_init(&api);
    SDI_HANDLE h;
    int err = api.AceInit(&h, /* contentious argument */ 0, &demoCallback);
    assert(err == ACE_PROCESSING);
    demoFunction(h);
    api.AceClose(h, 0);
    return 0;
}
As you've copied the function/type definitions out of the documentation, you basically don't have (and never will have) the correct definitions for the version of the .dll you're using, and you could always end up with crashes or, worse, undefined behavior.
What you could do is debug the corresponding .dll:
Do you run Visual Studio? I remember that VS can step into a function call in debug mode and show the assembly; I'm not sure how it looks today, but any disassembler should do the trick. In the x64 ABI, register rcx receives the first argument and rdx the second. If the function internally works with the 32-bit register names, or clears the upper 32 bits, you can assume a 32-bit integer; if it uses the value to load an address (e.g. with an lea instruction), you can assume a pointer. But as you can see, that's probably not a road you want to go down...
So what else do you have left?
The document you've linked mentions both a 32-bit and a 64-bit library, depending on the platform. I guess you use the 64-bit library and that RSA never updated the documentation for it, even though at some point the developers had to port the library to 64-bit.
So think about it this way: if you were the API developer, what could be migrated to 64-bit and what could not? Everything that has to work across 32/64-bit implementations (data that gets sent over the network, or stored and shared on disk) cannot be touched, but everything that is local to the instance can be migrated. As userData seems to be a runtime-only value, it makes sense to make it whatever width the platform provides: a pointer-sized (64-bit) quantity on 64-bit and a 32-bit one on 32-bit.
You've figured out that userData must be 64-bit - not because the function writes out a 64-bit integer, but because the function sees a 64-bit value to start with. Since integers are passed by value (certainly in WINAPI), there is no way the function could see the full 64 bits if the parameter were a 32-bit type. So most likely the API developers changed the datatype to a pointer-sized 64-bit type.
PS: If you end up putting a pointer into userData, cast the pointer to uintptr_t and store/read that type.
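A minimal sketch of that suggestion, reusing the question's api struct and SDI_HANDLE typedef; my_ctx is a hypothetical per-session structure, and the prototypes are the question's reverse-engineered ones, not RSA's official signatures:
#include <stdint.h>

/* Hypothetical per-session data to associate with a handle. */
struct my_ctx { int attempts; };

static void store_context(SDI_HANDLE handle, struct my_ctx *ctx) {
    /* The pointer travels as one pointer-sized userData value on both
     * 32- and 64-bit builds. */
    api.AceSetUserData(handle, (void *)(uintptr_t)ctx);
}

static struct my_ctx *load_context(SDI_HANDLE handle) {
    void *raw = NULL;
    api.AceGetUserData(handle, &raw);
    return (struct my_ctx *)(uintptr_t)raw;
}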
To avoid questions of undefined behavior, please replace your test function with this one, and report what it prints. Please also show us the complete test program, so that people who have access to this library can compile and run it for themselves and tinker with it. I would especially like to see the declarations of the api global and its type, and the code that initializes api, and to know where the type came from (did you make it up as part of this reverse engineering exercise or did you get it from somewhere?)
static void demoFunction(SDI_HANDLE handle) {
    int err = api.AceSetUserData(handle, 0);
    assert(err == ACE_SUCCESS);
    union {
        unsigned char testBuffer[sizeof(void *) * 3];
        void *forceAlignment;
    } u;
    memset(u.testBuffer, 0xA5, sizeof u.testBuffer);
    err = api.AceGetUserData(handle, (void *)(u.testBuffer + sizeof(void*)));
    assert(err == ACE_SUCCESS);
    fputs("DEBUG: testBuffer =", stderr);
    for (size_t i = 0; i < sizeof(u.testBuffer); i++) {
        if (i % 4 == 0)
            putc(' ', stderr);
        fprintf(stderr, "%02x", u.testBuffer[i]);
    }
    fputc('\n', stderr);
}
(If your hypothesis is correct, the output will be
DEBUG: testBuffer = a5a5a5a5 a5a5a5a5 00000000 00000000 a5a5a5a5 a5a5a5a5
.)

Setting the x86 debug registers during handling of an exception

I am able to set up a HW breakpoint (on writing or reading of an address) without problems, and it works.
If I try to do the same from a vectored exception handler, the registers are set (confirmed with Set/GetThreadContext), but the same software never breaks.
The address and the setup functions are the same; the only difference I can see is that if I set the breakpoint from the main code it works, while if I do the same from a vectored exception handler (while an unrelated exception is being handled) the breakpoint has no effect.
The code that should break on the breakpoint is of course outside of the exception handler.
I can't find any relevant information in the IA32 Software Manual.
Perhaps some of you have had this problem. Any help is appreciated.
Edit:
Here is a very crude example; don't mind the warnings. If compiled with #define DIRECT, the program terminates because of the uncaught SINGLE_STEP exception; if compiled without it, nothing happens.
To be clear, this is just an ugly, messy, quick example.
// The following macros define the minimum required platform. The minimum required platform
// is the earliest version of Windows, Internet Explorer etc. that has the necessary features to run
// your application. The macros work by enabling all features available on platform versions up to and
// including the version specified.
// Modify the following defines if you have to target a platform prior to the ones specified below.
// Refer to MSDN for the latest info on corresponding values for different platforms.
#ifndef WINVER // Specifies that the minimum required platform is Windows Vista.
#define WINVER 0x0600 // Change this to the appropriate value to target other versions of Windows.
#endif
#ifndef _WIN32_WINNT // Specifies that the minimum required platform is Windows Vista.
#define _WIN32_WINNT 0x0600 // Change this to the appropriate value to target other versions of Windows.
#endif
#ifndef _WIN32_WINDOWS // Specifies that the minimum required platform is Windows 98.
#define _WIN32_WINDOWS 0x0410 // Change this to the appropriate value to target Windows Me or later.
#endif
#ifndef _WIN32_IE // Specifies that the minimum required platform is Internet Explorer 7.0.
#define _WIN32_IE 0x0700 // Change this to the appropriate value to target other versions of IE.
#endif
#include "windows.h"
#include "stdio.h"
#define DR7_CONFIG(num, lg, rw, size) ((((rw & 3) << (16 + 4 * num)) | ((size & 3) << (18 + 4 * num))) | (lg << (2 * num)))
HANDLE mainThread;
//#define DIRECT
DWORD WINAPI LibAsilEmu_ArrayBoundsMonitor(LPVOID lpParameter)
{
    MSG msg;
    CONTEXT tcontext;
    HANDLE th;
    DWORD temp = 0;
    IMAGE_SECTION_HEADER *psec;
    th = OpenThread(THREAD_ALL_ACCESS, FALSE, mainThread);
    temp = SuspendThread(th);
    tcontext.ContextFlags = CONTEXT_DEBUG_REGISTERS;
    temp = GetThreadContext(th, &tcontext);
    tcontext.Dr6 = 0;
    tcontext.Dr0 = lpParameter;
    tcontext.Dr2 = lpParameter;
    tcontext.Dr3 = lpParameter;
    tcontext.Dr1 = lpParameter;
    tcontext.Dr7 = 0xffffffff; //DR7_CONFIG(0, 1, 2, 3) | DR7_CONFIG(1, 1, 2, 3) | 256;
    temp = SetThreadContext(th, &tcontext);
    temp = GetThreadContext(th, &tcontext);
    temp = ResumeThread(th);
    return 0;
}

HANDLE child;
unsigned int test_array[10];

LONG WINAPI LibAsilEmu_TopLevelVectoredFilter(struct _EXCEPTION_POINTERS *ExceptionInfo)
{
    child = CreateThread(NULL, 0x4000, LibAsilEmu_ArrayBoundsMonitor, &test_array[2], 0, NULL);
    while (WAIT_OBJECT_0 != WaitForSingleObject(child, 0));
    return EXCEPTION_CONTINUE_EXECUTION;
}

int main(void)
{
    int i;
    printf("%p\n", test_array); fflush(stdout);
    printf("sizeof %d\n", sizeof(test_array));
    mainThread = GetCurrentThreadId();
#ifdef DIRECT
    child = CreateThread(NULL, 0x4000, LibAsilEmu_ArrayBoundsMonitor, &test_array[2], 0, NULL);
    // Edit 2: the wait was missing
    while (WAIT_OBJECT_0 != WaitForSingleObject(child, 0));
#else
    AddVectoredExceptionHandler(1, &LibAsilEmu_TopLevelVectoredFilter);
    RaiseException(0, 0, 0, 0);
#endif
    printf("thread ready\n");
    printf("ta %p\n", &(test_array[10]));
    for (i = 0; i < 15; ++i)
    {
        *((char*)test_array) += 0x25;
        test_array[i] += 0x15151515;
        printf("%2d %d %p\n", i, test_array[i], &(test_array[i])); fflush(stdout);
    }
    printf("ta %d\n", test_array[10]);
    printf("exit\n\nexit\n\n");
    return 0;
}
I do not know the specifics of your problem because there was originally no code in your description - i.e., it could simply be a coding error. With no other information, all I can do is point you to a link with some example code on using vectored exception handling. Hope this helps; otherwise, post some code.
[EDIT 1]
First, I would guess you are not the first to see this problem; vectored exception handling has been around since XP, so others have most certainly seen it.
Probably no need to say it (but I will anyway): debuggers behave differently when running multiple threads than they do with a single thread. You are calling your vectored exception handler from a separate thread, and it seems likely this is somehow related to the fact that you cannot see the breakpoint when running from the thread. Do you have the ability to point your debugger at a specific thread?
I have one other suggestion. Since the real point of this question is to determine why your HW breakpoint is not being set when attempted from a vectored exception handler, consider changing the title of this post to something like:
Cannot set HW breakpoints using a vectored exception handler.
It may get more attention. (I would have done this myself, but did not want to presume you would be okay with that.)
[EDIT 2]
I see in your example that in main(), you are calling AddVectoredExceptionHandler() from within a new thread. But the function
LONG WINAPI LibAsilEmu_TopLevelVectoredFilter(struct _EXCEPTION_POINTERS *ExceptionInfo)
{
    child = CreateThread(NULL, 0x4000, LibAsilEmu_ArrayBoundsMonitor, &test_array[2], 0, NULL);
    while (WAIT_OBJECT_0 != WaitForSingleObject(child, 0));
    return EXCEPTION_CONTINUE_EXECUTION;
}
creates yet another new thread, where LibAsilEmu_ArrayBoundsMonitor in turn opens the main thread again. This seems an odd series of events to me. Is this code something you have seen used somewhere else in conjunction with vectored exception handling?

What's a good way to implement simple clone() based multithread library?

I'm trying to build a simple multithreading library for Linux using clone() and other kernel facilities. I've come to a point where I'm not really sure what the correct way to do things is. I tried going through the original NPTL code, but it's a bit too much.
Here is how, for instance, I imagine the create function:
typedef int sk_thr_id;
typedef void *sk_thr_arg;
typedef int (*sk_thr_func)(sk_thr_arg);

#define FIBER_STACK (1024 * 64)   /* stack size, matching the malloc below */

sk_thr_id sk_thr_create(sk_thr_func f, sk_thr_arg a)
{
    void* stack = malloc(FIBER_STACK);
    if (stack == 0) {
        perror("malloc: could not allocate stack");
        exit(1);
    }
    return clone(f, (char*)stack + FIBER_STACK,
                 SIGCHLD | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | CLONE_VM, a);
}
1: I'm not really sure what the correct clone() flags should be. I just found these being used in a simple example. Any general directions here will be welcome.
Here are parts of the mutex primitives created using futexes (not my own code for now):
#define cmpxchg(P, O, N) __sync_val_compare_and_swap((P), (O), (N))
#define cpu_relax() asm volatile("pause\n": : :"memory")
#define barrier() asm volatile("": : :"memory")
static inline unsigned xchg_32(void *ptr, unsigned x)
{
    __asm__ __volatile__("xchgl %0,%1"
                         :"=r" ((unsigned) x)
                         :"m" (*(volatile unsigned *)ptr), "0" (x)
                         :"memory");
    return x;
}

static inline unsigned short xchg_8(void *ptr, char x)
{
    __asm__ __volatile__("xchgb %0,%1"
                         :"=r" ((char) x)
                         :"m" (*(volatile char *)ptr), "0" (x)
                         :"memory");
    return x;
}

int sys_futex(void *addr1, int op, int val1, struct timespec *timeout, void *addr2, int val3)
{
    return syscall(SYS_futex, addr1, op, val1, timeout, addr2, val3);
}

typedef union mutex mutex;

union mutex
{
    unsigned u;
    struct
    {
        unsigned char locked;
        unsigned char contended;
    } b;
};

int mutex_init(mutex *m, const pthread_mutexattr_t *a)
{
    (void) a;
    m->u = 0;
    return 0;
}

int mutex_lock(mutex *m)
{
    int i;
    /* Try to grab lock */
    for (i = 0; i < 100; i++)
    {
        if (!xchg_8(&m->b.locked, 1)) return 0;
        cpu_relax();
    }
    /* Have to sleep */
    while (xchg_32(&m->u, 257) & 1)
    {
        sys_futex(m, FUTEX_WAIT_PRIVATE, 257, NULL, NULL, 0);
    }
    return 0;
}

int mutex_unlock(mutex *m)
{
    int i;
    /* Locked and not contended */
    if ((m->u == 1) && (cmpxchg(&m->u, 1, 0) == 1)) return 0;
    /* Unlock */
    m->b.locked = 0;
    barrier();
    /* Spin and hope someone takes the lock */
    for (i = 0; i < 200; i++)
    {
        if (m->b.locked) return 0;
        cpu_relax();
    }
    /* We need to wake someone up */
    m->b.contended = 0;
    sys_futex(m, FUTEX_WAKE_PRIVATE, 1, NULL, NULL, 0);
    return 0;
}
2: The main question for me is how to implement the join primitive. I know it's supposed to be based on futexes too, but so far I'm struggling to come up with something.
3: I need some way to clean things up (like the allocated stack) after a thread has finished. I can't really think of a good way to do this either.
Probably for both of these I'll need an additional per-thread structure in user space with some information saved in it. Can someone point me in a good direction for solving these issues?
4: I'd also like a way to tell how much time a thread has been running, how long it has been since it was last scheduled, and other things like that. Are there kernel calls providing such information?
Thanks in advance!
The idea that there can exist a "multithreading library" as a third-party library separate from the rest of the standard library is an outdated and flawed notion. If you want to do this, you'll have to first drop all use of the standard library; particularly, your call to malloc is completely unsafe if you're calling clone yourself, because:
malloc will have no idea that multiple threads exist, and therefore may fail to perform proper synchronization.
Even if it knew they existed, malloc will need to access an unspecified, implementation-specific structure located at the address given by the thread pointer. As this structure is implementation-specific, you have no way of creating such a structure that will be interpreted correctly by both the current and all future versions of your system's libc.
These issues don't apply just to malloc but to most of the standard library; even async-signal-safe functions may be unsafe to use, as they might dereference the thread pointer for cancellation-related purposes, performing optimal syscall mechanisms, etc.
If you really insist on making your own threads implementation, you'll have to abstain from using glibc or any modern libc that's integrated with threads, and instead opt for something much more naive like klibc. This could be an educational experiment, but it would not be appropriate for a deployed application.
1) You are using an example based on LinuxThreads. I will not rewrite good references here, but I recommend "The Linux Programming Interface" by Michael Kerrisk, chapter 28. In about 25 pages it explains what you need.
2) If you set the CLONE_CHILD_CLEARTID flag, the ctid word you pass to clone is cleared (and a futex wake is issued on it) when the child terminates. If you treat that word as a futex, you can implement the join primitive - see the sketch after this list. Good luck :-) If you don't want to use futexes, also have a look at wait3 and wait4.
3) I do not know exactly what you want to clean up, but you can use the clone tls argument, which is a thread-local storage buffer. When the thread has finished, you can clean up that buffer.
4) See getrusage.
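A rough, untested sketch of points 1 and 2 above, assuming the library keeps a small per-thread record; the flag set is only one example of a thread-like configuration, and sk_thread/sk_thr_start/sk_thr_join are hypothetical names (real code would also want the tid field to be volatile or atomic):
#define _GNU_SOURCE
#include <linux/futex.h>
#include <sched.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-thread record kept by the library. */
struct sk_thread {
    pid_t tid;     /* set by the kernel at creation, cleared + futex-woken at exit */
    void *stack;   /* kept so it can be freed after join (point 3) */
};

static int sk_thr_start(struct sk_thread *t, int (*f)(void *), void *arg, size_t stack_size)
{
    t->stack = malloc(stack_size);
    if (t->stack == NULL)
        return -1;
    /* CLONE_PARENT_SETTID fills t->tid before clone returns here;
     * CLONE_CHILD_CLEARTID zeroes it and wakes the futex when the child exits. */
    int flags = CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND |
                CLONE_THREAD | CLONE_SYSVSEM |
                CLONE_PARENT_SETTID | CLONE_CHILD_CLEARTID;
    if (clone(f, (char *)t->stack + stack_size, flags, arg,
              &t->tid, NULL, &t->tid) == -1)
        return -1;
    return 0;
}

static int sk_thr_join(struct sk_thread *t)
{
    pid_t tid;
    /* FUTEX_WAIT returns immediately if the value no longer matches,
     * so just retry until the kernel has cleared the TID to zero. */
    while ((tid = t->tid) != 0)
        syscall(SYS_futex, &t->tid, FUTEX_WAIT, tid, NULL, NULL, 0);
    free(t->stack);   /* reclaim the stack once the thread is definitely gone */
    return 0;
}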
