I want to see how much CPU time is taken by my C program, so I wrote:
#include<stdio.h>
#include<stdlib.h>
#include"memory.h"
#include"memory_debug.h"
#include<sys/times.h>
#include<unistd.h>
int (*deallocate_ptr)(memContainer *,void*);
void (*merge_ptr)(node *);
void* (*allocate_ptr)(memContainer *,unsigned long size);
memContainer* (*init_ptr)(unsigned long );
diagStruct* (*diagnose_ptr)(memContainer *);
void (*finalize_ptr)(memContainer *);
void (*printNode_ptr)(node *n);
void (*printContainer_ptr)(memContainer *c);
void info(memContainer *c)
{
struct tms *t;
t=malloc(sizeof(struct tms));
times(t);
printf("user : %d\nsystem : %d\n %d",t->tms_utime,(int)t->tms_stime);
diagnose_ptr(c);
printf("\n");
return ;
}
but when I invoke this function I get 0 user time and 0 system time, even if I write:
for (i=0;i<100000;++i)
for (j=0;j<10;++j)
{}
info(c);
what am I doing wrong?
The compiler probably optimizes away your for loops since they do nothing. Try incrementing a volatile variable.
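For example, a minimal sketch of that idea, reusing i, j and info(c) from your snippet (the name sink is just for illustration):
volatile unsigned long sink = 0; /* volatile: the compiler must keep every increment */
for (i = 0; i < 100000; ++i)
    for (j = 0; j < 10; ++j)
        ++sink;
info(c);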
If you only want to know the time, try running time ./app; it will print the CPU time, wall-clock time, etc. of the executed app.
The code could simply write a volatile variable at the start, put your 'work' in a function (in a separate file), then read the volatile after the 'work' and print something involving the volatile.
Or do some simple calculation with a part of the calculation buried in a function, or using a function return.
What platform (Operating system & Compiler) are you using?
I don't know what platform you are running on, but there are a few useful questions on stackoverflow about higher precision system clocks. High precision timing in userspace in Linux has several useful links and references.
Timing Methods in C++ Under Linux looked useful.
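As a rough sketch of the higher-precision route on Linux (CLOCK_PROCESS_CPUTIME_ID reports per-process CPU time; older glibc may need linking with -lrt):
#include <stdio.h>
#include <time.h>
int main(void)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &t0); /* CPU time consumed by this process so far */
    /* ... the work you want to measure ... */
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &t1);
    printf("cpu time: %.3f ms\n",
           (t1.tv_sec - t0.tv_sec) * 1e3 + (t1.tv_nsec - t0.tv_nsec) / 1e6);
    return 0;
}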
The below demo program outputs nonzero times:
#include<stdio.h>
#include<stdlib.h>
#include"memory.h"
#include<sys/times.h>
#include<unistd.h>
#include <iostream>
using namespace std;
int main()
{
int x = 0;
for (int i = 0; i < 1 << 30; i++)
x++;
struct tms t;
times(&t);
cout << t.tms_utime << endl;
cout << t.tms_stime << endl;
return x;
}
Output:
275
1
I have a rather large recursive function (I write in C), and while I have no doubt that the scenario where stack overflow happens is extremely unlikely, it is still possible. What I wonder is whether you can detect that the stack is about to overflow within a few more iterations, so you can do an emergency stop without crashing the program.
In the C programming language itself, that is not possible. In general, you can't easily know that you are about to run out of stack before you actually do. I recommend instead placing a configurable hard limit on the recursion depth in your implementation, so you can simply abort when the depth is exceeded. You could also rewrite your algorithm to use an auxiliary data structure instead of the call stack; that gives you greater flexibility to detect an out-of-memory condition, since malloc() tells you when it fails.
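As a sketch of the depth-limit idea (MAX_DEPTH and the function here are illustrative, not taken from your code):
#define MAX_DEPTH 10000 /* tune to your stack size and typical frame size */
int recurse(int depth /*, your real arguments */)
{
    if (depth > MAX_DEPTH)
        return -1; /* emergency stop instead of overflowing the stack */
    /* ... actual work ... */
    return recurse(depth + 1);
}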
However, you can get something similar with a procedure like this on UNIX-like systems:
Use setrlimit to set a soft stack limit lower than the hard stack limit
Establish signal handlers for both SIGSEGV and SIGBUS to get notified of stack overflows. Some operating systems produce SIGSEGV for these, others SIGBUS.
If you get such a signal and determine that it comes from a stack overflow, raise the soft stack limit with setrlimit and set a global variable to record that this occurred. Make the variable volatile so the optimizer doesn't foil your plans.
In your code, at each recursion step, check if this variable is set. If it is, abort.
This may not work everywhere and requires platform-specific code to find out that the signal came from a stack overflow. Not all systems (notably, early 68000 systems) can continue normal processing after getting a SIGSEGV or SIGBUS.
A similar approach was used by the Bourne shell for memory allocation.
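A rough sketch of that procedure on Linux follows (illustrative only: the handler must run on an alternate stack, and strictly speaking setrlimit is not async-signal-safe):
#include <signal.h>
#include <string.h>
#include <sys/resource.h>
static volatile sig_atomic_t stack_overflowed = 0;
static char altstack[64 * 1024]; /* the handler cannot run on the exhausted stack */
static void overflow_handler(int sig)
{
    struct rlimit rl;
    (void)sig;
    /* raise the soft limit back to the hard limit so execution can continue */
    if (getrlimit(RLIMIT_STACK, &rl) == 0) {
        rl.rlim_cur = rl.rlim_max;
        setrlimit(RLIMIT_STACK, &rl);
    }
    stack_overflowed = 1; /* checked at each recursion step */
}
int install_overflow_guard(void)
{
    stack_t ss;
    struct rlimit rl;
    struct sigaction sa;
    ss.ss_sp = altstack;
    ss.ss_size = sizeof altstack;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;
    /* lower the soft stack limit below the hard limit, leaving some headroom */
    if (getrlimit(RLIMIT_STACK, &rl) == 0 && rl.rlim_max != RLIM_INFINITY) {
        rl.rlim_cur = rl.rlim_max - 64 * 1024;
        setrlimit(RLIMIT_STACK, &rl);
    }
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = overflow_handler;
    sa.sa_flags = SA_ONSTACK; /* deliver the signal on the alternate stack */
    sigaction(SIGSEGV, &sa, NULL);
    sigaction(SIGBUS, &sa, NULL);
    return 0;
}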
Here's a simple solution that works for Win32. It actually resembles what Wossname already posted, but is less icky :)
unsigned int get_stack_address( void )
{
unsigned int r = 0;
__asm mov dword ptr [r], esp;
return r;
}
void rec( int x, const unsigned int begin_address )
{
// allow roughly 100 000 bytes of stack growth before bailing out
if ( begin_address - get_stack_address() > 100000 )
{
//std::cout << "Recursion level " << x << " stack too high" << std::endl;
return;
}
rec( x + 1, begin_address );
}
int main( void )
{
int x = 0;
rec(x,get_stack_address());
}
Here's a naive method, but it's a bit icky...
When you enter the function for the first time you could store the address of one of your variables declared in that function. Store that value outside your function (e.g. in a global). In subsequent calls compare the current address of that variable with the cached copy. The deeper you recurse the further apart these two values will be.
This will most likely cause compiler warnings (storing addresses of temporary variables) but it does have the benefit of giving you a fairly accurate way of knowing exactly how much stack you're using.
Can't say I really recommend this but it would work.
#include <stdio.h>
#include <stdlib.h> /* for abs() */
char* start = NULL;
void recurse()
{
char marker = '#';
if(start == NULL)
start = &marker;
printf("depth: %d\n", abs(&marker - start));
if(abs(&marker - start) < 1000)
recurse();
else
start = NULL;
}
int main()
{
recurse();
return 0;
}
An alternative method is to learn the stack limit at the start of the program and then, at each call of your recursive function, check whether this limit has been approached (within some safety margin, say 64 kB). If so, abort; if not, continue.
The stack limit on POSIX systems can be learned by using the getrlimit system call.
Example code that is thread-safe (note: the code assumes that the stack grows downwards, as on x86!):
#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>
void *stack_limit;
#define SAFETY_MARGIN (64 * 1024) // 64 kb
void recurse(int level)
{
void *stack_top = &stack_top;
if (stack_top <= stack_limit) {
printf("stack limit reached at recursion level %d\n", level);
return;
}
recurse(level + 1);
}
int get_max_stack_size(void)
{
struct rlimit rl;
int ret = getrlimit(RLIMIT_STACK, &rl);
if (ret != 0) {
return 1024 * 1024 * 8; // 8 MB is the default on many platforms
}
printf("max stack size: %d\n", (int)rl.rlim_cur);
return rl.rlim_cur;
}
int main (int argc, char *argv[])
{
int x;
stack_limit = (char *)&x - get_max_stack_size() + SAFETY_MARGIN;
recurse(0);
return 0;
}
Output:
max stack size: 8388608
stack limit reached at recursion level 174549
I want to measure the performance of different devices viz CPU and GPUs.
This is my kernel code:
__kernel void dataParallel(__global int* A)
{
sleep(10);
A[0]=2;
A[1]=3;
A[2]=5;
int pnp;//pnp=probable next prime
int pprime;//previous prime
int i,j;
for(i=3;i<10;i++)
{
j=0;
pprime=A[i-1];
pnp=pprime+2;
while((j<i) && A[j]<=sqrt((float)pnp))
{
if(pnp%A[j]==0)
{
pnp+=2;
j=0;
}
j++;
}
A[i]=pnp;
}
}
However, the sleep() function doesn't work. I am getting the following error in the build log:
<kernel>:4:2: warning: implicit declaration of function 'sleep' is invalid in C99
sleep(10);
builtins: link error: Linking globals named '__gpu_suld_1d_i8_trap': symbol multiply defined!
Is there any other way to implement the function? Also, is there a way to record the time taken to execute this code snippet?
P.S. I have included #include <unistd.h> in my host code.
You don't need to use sleep in your kernel to measure the execution time.
There are two ways to measure the time:
1. Use OpenCL's built-in event profiling; see clGetEventProfilingInfo in the CL API.
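As a sketch of option 1 (the queue must be created with CL_QUEUE_PROFILING_ENABLE; context, device, kernel and the work sizes are assumed to come from your existing setup, and error checks are omitted):
cl_command_queue queue = clCreateCommandQueue(context, device, CL_QUEUE_PROFILING_ENABLE, &err);
cl_event ev;
cl_ulong t_start, t_end;
clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_size, &local_size, 0, NULL, &ev);
clWaitForEvents(1, &ev); /* make sure the kernel has finished */
clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_START, sizeof(t_start), &t_start, NULL);
clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_END, sizeof(t_end), &t_end, NULL);
printf("kernel time: %f ms\n", (t_end - t_start) * 1e-6); /* the timestamps are in nanoseconds */
clReleaseEvent(ev);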
2. Get timestamps in your host code and compare them before and after execution.
example:
double start = getTimeInMS();
//The kernel starts here
clEnqueueNDRangeKernel(command_queue, kernel, 1, NULL, &tasksize, &local_size_in, 0, NULL, NULL);
//wait for kernel execution
clFinish(command_queue);
cout << "kernel execution time " << (getTimeInMS() - start) << endl;
Where getTimeInMS() is a function that returns the current time in milliseconds as a double:
(Windows-specific; replace with another implementation if you don't use Windows)
static inline double getTimeInMS(){
SYSTEMTIME st;
GetLocalTime(&st);
// note: wSecond wraps every minute, so this only works for short intervals
return (double)st.wSecond * (double)1000 + (double)st.wMilliseconds;}
Also you need:
#include <windows.h>
For Mac it would be (works on Linux as well, since gettimeofday is POSIX; it needs <sys/time.h>):
static inline double getTime() {
struct timeval starttime;
gettimeofday(&starttime, 0x0);
return (double)starttime.tv_sec * (double)1000 + (double)starttime.tv_usec / (double)1000;}
I'm trying to build a simple multithreading library for Linux using clone() and other kernel facilities. I've come to a point where I'm not really sure what the correct way to do things is. I tried going through the original NPTL code but it's a bit too much.
Here is how, for instance, I imagine the create function:
typedef int sk_thr_id;
typedef void *sk_thr_arg;
typedef int (*sk_thr_func)(sk_thr_arg);
#define FIBER_STACK (1024*64) /* stack size for each thread */
sk_thr_id sk_thr_create(sk_thr_func f, sk_thr_arg a){
void* stack;
stack = malloc( FIBER_STACK );
if ( stack == 0 ){
perror( "malloc: could not allocate stack" );
exit( 1 );
}
return ( clone(f, (char*) stack + FIBER_STACK, SIGCHLD | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | CLONE_VM, a ) );
}
1: I'm not really sure what the correct clone() flags should be. I just found these being used in a simple example. Any general directions here will be welcome.
Here are parts of the mutex primitives created using futexes (not my own code for now):
#include <linux/futex.h>
#include <sys/syscall.h>
#include <pthread.h>
#include <time.h>
#include <unistd.h>
#define cmpxchg(P, O, N) __sync_val_compare_and_swap((P), (O), (N))
#define cpu_relax() asm volatile("pause\n": : :"memory")
#define barrier() asm volatile("": : :"memory")
static inline unsigned xchg_32(void *ptr, unsigned x)
{
__asm__ __volatile__("xchgl %0,%1"
:"=r" ((unsigned) x)
:"m" (*(volatile unsigned *)ptr), "0" (x)
:"memory");
return x;
}
static inline unsigned short xchg_8(void *ptr, char x)
{
__asm__ __volatile__("xchgb %0,%1"
:"=r" ((char) x)
:"m" (*(volatile char *)ptr), "0" (x)
:"memory");
return x;
}
int sys_futex(void *addr1, int op, int val1, struct timespec *timeout, void *addr2, int val3)
{
return syscall(SYS_futex, addr1, op, val1, timeout, addr2, val3);
}
typedef union mutex mutex;
union mutex
{
unsigned u;
struct
{
unsigned char locked;
unsigned char contended;
} b;
};
int mutex_init(mutex *m, const pthread_mutexattr_t *a)
{
(void) a;
m->u = 0;
return 0;
}
int mutex_lock(mutex *m)
{
int i;
/* Try to grab lock */
for (i = 0; i < 100; i++)
{
if (!xchg_8(&m->b.locked, 1)) return 0;
cpu_relax();
}
/* Have to sleep */
while (xchg_32(&m->u, 257) & 1)
{
sys_futex(m, FUTEX_WAIT_PRIVATE, 257, NULL, NULL, 0);
}
return 0;
}
int mutex_unlock(mutex *m)
{
int i;
/* Locked and not contended */
if ((m->u == 1) && (cmpxchg(&m->u, 1, 0) == 1)) return 0;
/* Unlock */
m->b.locked = 0;
barrier();
/* Spin and hope someone takes the lock */
for (i = 0; i < 200; i++)
{
if (m->b.locked) return 0;
cpu_relax();
}
/* We need to wake someone up */
m->b.contended = 0;
sys_futex(m, FUTEX_WAKE_PRIVATE, 1, NULL, NULL, 0);
return 0;
}
2: The main question for me is how to implement the "join" primitive? I know it's supposed to be based on futexes too, but I'm struggling to come up with something.
3: I need some way to clean up resources (like the allocated stack) after a thread has finished. I can't really think of a good way to do this either.
Probably for these I'll need an additional structure in user space for every thread with some information saved in it. Can someone point me in a good direction for solving these issues?
4: I'll want to have a way to tell how much time a thread has been running, how long it's been since it was last scheduled, and other stuff like that. Are there any kernel calls providing such info?
Thanks in advance!
The idea that there can exist a "multithreading library" as a third-party library separate from the rest of the standard library is an outdated and flawed notion. If you want to do this, you'll have to first drop all use of the standard library; particularly, your call to malloc is completely unsafe if you're calling clone yourself, because:
malloc will have no idea that multiple threads exist, and therefore may fail to perform proper synchronization.
Even if it knew they existed, malloc will need to access an unspecified, implementation-specific structure located at the address given by the thread pointer. As this structure is implementation-specific, you have no way of creating such a structure that will be interpreted correctly by both the current and all future versions of your system's libc.
These issues don't apply just to malloc but to most of the standard library; even async-signal-safe functions may be unsafe to use, as they might dereference the thread pointer for cancellation-related purposes, optimized syscall mechanisms, and so on.
If you really insist on making your own threads implementation, you'll have to abstain from using glibc or any modern libc that's integrated with threads, and instead opt for something much more naive like klibc. This could be an educational experiment, but it would not be appropriate for a deployed application.
1) You are using an example of LinuxThreads. I will not rewrite good references here; instead I recommend "The Linux Programming Interface" by Michael Kerrisk, chapter 28. It explains what you need in about 25 pages.
2) If you set the CLONE_CHILD_CLEARTID flag, the ctid argument of clone is cleared when the child terminates, and a futex wakeup is performed on that address. If you treat that pointer as a futex, you can implement the join primitive. Good luck :-) If you don't want to use futexes, also have a look at wait3 and wait4.
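A hedged sketch of what such a join could look like (the per-thread struct and names are made up for illustration; it assumes clone() was called with CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID and the child-tid pointer aimed at the tid field below):
#include <linux/futex.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <stdlib.h>
#include <unistd.h>
typedef struct {
    volatile pid_t tid; /* written by CLONE_CHILD_SETTID, cleared by CLONE_CHILD_CLEARTID */
    void *stack;        /* base of the malloc'd stack, safe to free only after join */
} sk_thread;
int sk_thr_join(sk_thread *t)
{
    pid_t tid;
    /* while the child is alive t->tid holds its id; on exit the kernel sets it
       to 0 and does a FUTEX_WAKE on that address */
    while ((tid = t->tid) != 0)
        syscall(SYS_futex, &t->tid, FUTEX_WAIT, tid, NULL, NULL, 0);
    free(t->stack); /* the child no longer uses its stack, so reclaim it here */
    t->stack = NULL;
    return 0;
}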
3) I do not know exactly what you want to clean up, but you can use the tls argument of clone. This is a thread-local storage buffer; when the thread is finished, you can clean that buffer up.
4) See getrusage.
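For example, a small sketch using the Linux-specific RUSAGE_THREAD (it needs _GNU_SOURCE and reports CPU time for the calling thread only):
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/resource.h>
void print_thread_cpu_time(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_THREAD, &ru) == 0)
        printf("user %ld.%06lds  sys %ld.%06lds\n",
               (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec,
               (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
}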
First time posting so there's probably gonna be more info than necessary but I wanna be thorough:
One of our exercises in C was to create sender and receiver programs that would exchange data via RS232 serial communication with null modem. We used a virtual port program (I used the trial version of Virtual Serial Port by eltima software if you want to test). We were required to do 4 versions:
1) Using a predetermined library created by a previous student that had premade sender and receiver functions, etc.
2) Using the inportb and outportb functions
3) Using OS interrupt int86 and giving register values through the REGS union
4) Using inline assembly
Compiler: DevCPP (Bloodshed).
All worked, but now we are required to compare all the different versions based on the CPU time that is spent to send and receive a character. It specifically says that we have to find the following:
average, standard deviation, min, max and 99.5%
Nothing was explained in class, so I'm a little lost here... I'm guessing those are statistics computed over many trials, assuming a normal distribution? But even then, how do I actually measure CPU cycles for this? I'll keep searching, but I'm posting here in the meantime 'cause the deadline is in 3 days :D.
Code sample of the int86 version:
#include <stdio.h>
#include <stdlib.h>
#include <dos.h>
#define RS232_INIT_FUNCTION 0
#define RS232_SEND_FUNCTION 1
#define RS232_GET_FUNCTION 2
#define RS232_STATUS_FUNCTION 3
#define DATA_READY 0x01
#define PARAM 0xEF
#define COM1 0
#define COM2 1
void rs232init (int port, unsigned init_code)
{
union REGS inregs;
inregs.x.dx=port;
inregs.h.ah=RS232_INIT_FUNCTION;
inregs.h.al=init_code;
int86(0x14,&inregs,&inregs);
}
unsigned char rs232transmit (int port, char ch)
{
union REGS inregs;
inregs.x.dx=port;
inregs.h.ah=RS232_SEND_FUNCTION;
inregs.h.al=ch;
int86(0x14,&inregs,&inregs);
return (inregs.h.ah);
}
unsigned char rs232status(int port){
union REGS inregs;
inregs.x.dx=port;
inregs.h.ah=RS232_STATUS_FUNCTION;
int86(0x14, &inregs, &inregs);
return (inregs.h.ah); //Because we want the second byte of ax
}
unsigned char rs232receive(int port)
{
union REGS inregs;
while(!(rs232status(port) & DATA_READY))
{
if(kbhit()){
getch();
exit(1);
}
};
inregs.x.dx=port;
inregs.h.ah=RS232_GET_FUNCTION;
int86(0x14,&inregs,&inregs);
if(inregs.h.ah & 0x80)
{
printf("ERROR");
return -1;
}
return (inregs.h.al);
}
int main(){
unsigned char ch;
int d,i;
do{
puts("What would you like to do?");
puts("1.Send data");
puts("2.Receive data");
puts("0.Exit");
scanf("%d",&i);
getchar();
if(i==1){
rs232init(COM1, PARAM);
puts("Which char would you like to send?");
scanf("%c",&ch);
getchar();
while(!rs232status(COM1));
d=rs232transmit(COM1,ch);
if(d & 0x80) puts("ERROR"); //Checks the bit 7 of ah for error
}
else if(i==2){
rs232init(COM1,PARAM);
puts("Receiving character...");
ch=rs232receive(COM1);
printf("%c\n",ch);
}
}while(i != 0);
system("pause");
return 0;
}
There is some guesswork required here because the question is a little underspecified.
You've listed four different methods for sending/receiving a character. What I suspect your lecturer is looking for is the time from when you call the method given (or enter your inline assembly code) to the time when you return from the method (leave inline code). You will need to grab a time just before the call and just after the call and find their difference.
Less ambiguous is CPU time. The clock() function is the most straightforward way to measure it, however this may not be what the lecturer is looking for.
Finally there are the statistics, which are straightforward: do a bunch of runs and compute the statistics on the recorded times.
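To make that concrete, here is a hedged sketch built around the int86 version from the question (N, the millisecond conversion and the percentile index are illustrative choices, and clock()'s resolution may be coarse in a DOS-style environment):
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define N 1000 /* number of timed trials */
static int cmp_double(const void *a, const void *b)
{
    double d = *(const double *)a - *(const double *)b;
    return (d > 0) - (d < 0);
}
void benchmark_send(char ch)
{
    static double t[N];
    double sum = 0.0, sumsq = 0.0, mean, stdev;
    clock_t c0, c1;
    int i;
    for (i = 0; i < N; i++) {
        c0 = clock();
        rs232transmit(COM1, ch); /* function and COM1 from the question's code */
        c1 = clock();
        t[i] = (double)(c1 - c0) * 1000.0 / CLOCKS_PER_SEC; /* elapsed CPU time in ms */
        sum += t[i];
        sumsq += t[i] * t[i];
    }
    mean = sum / N;
    stdev = sqrt(sumsq / N - mean * mean);
    qsort(t, N, sizeof t[0], cmp_double);
    printf("mean=%.3f ms stdev=%.3f min=%.3f max=%.3f 99.5%%=%.3f\n",
           mean, stdev, t[0], t[N - 1], t[(int)(0.995 * (N - 1))]);
}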
I intend to write my own JIT-interpreter as part of a course on VMs. I have a lot of knowledge about high-level languages, compilers and interpreters, but little or no knowledge about x86 assembly (or C for that matter).
Actually I don't know how a JIT works, but here is my take on it: Read in the program in some intermediate language. Compile that to x86 instructions. Ensure that the last instruction returns to somewhere sane back in the VM code. Store the instructions somewhere in memory. Do an unconditional jump to the first instruction. Voila!
So, with that in mind, I have the following small C program:
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
int main() {
int *m = malloc(sizeof(int));
*m = 0x90; // NOP instruction code
asm("jmp *%0"
: /* outputs: */ /* none */
: /* inputs: */ "d" (m)
: /* clobbers: */ "eax");
return 42;
}
Okay, so my intention is for this program to store the NOP instruction somewhere in memory, jump to that location and then probably crash (because I haven't set up any way for the program to return back to main).
Question: Am I on the right path?
Question: Could you show me a modified program that manages to find its way back to somewhere inside main?
Question: Other issues I should beware of?
PS: My goal is to gain understanding, not necessarily do everything the right way.
Thanks for all the feedback. The following code seems to be the place to start and works on my Linux box:
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
unsigned char *m;
int main() {
unsigned int pagesize = getpagesize();
printf("pagesize: %u\n", pagesize);
m = malloc(1023+pagesize+1);
if(m==NULL) return(1);
printf("%p\n", m);
m = (unsigned char *)(((long)m + pagesize-1) & ~(pagesize-1));
printf("%p\n", m);
if(mprotect(m, 1024, PROT_READ|PROT_EXEC|PROT_WRITE)) {
printf("mprotect fail...\n");
return 0;
}
m[0] = 0xc9; //leave
m[1] = 0xc3; //ret
m[2] = 0x90; //nop
printf("%p\n", m);
asm("jmp *%0"
: /* outputs: */ /* none */
: /* inputs: */ "d" (m)
: /* clobbers: */ "ebx");
return 21;
}
Question: Am I on the right path?
I would say yes.
Question: Could you show me a modified program that manages to find its way back to somewhere inside main?
I haven't got any code for you, but a better way to get to the generated code and back is to use a pair of call/ret instructions, as they will manage the return address automatically.
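As an illustrative sketch of that idea (assuming m already points at executable memory, as in the updated program above):
m[0] = 0x90; /* nop */
m[1] = 0xc3; /* ret: pops the return address pushed by the call below */
asm("call *%0"
    : /* outputs: */ /* none */
    : /* inputs: */ "r" (m)
    : /* clobbers: */ "memory");
/* execution continues here once the generated ret executes */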
Question: Other issues I should beware of?
Yes - as a security measure, many operating systems would prevent you from executing code on the heap without making special arrangements. Those special arrangements typically amount to you having to mark the relevant memory page(s) as executable.
On Linux this is done using mprotect() with PROT_EXEC.
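A sketch of an alternative to malloc plus mprotect is to ask mmap for writable, executable memory directly (Linux/POSIX; minimal error handling shown):
#include <sys/mman.h>
unsigned char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (buf == MAP_FAILED) {
    /* handle the error: no executable page available */
}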
If your generated code follows the proper calling convention, then you can declare a pointer-to-function type and invoke the function this way:
typedef void (*generated_function)(void);
void *func = malloc(1024); // note: this buffer must also be made executable, e.g. with mprotect() as above
unsigned char *o = (unsigned char *)func;
generated_function func_exec = (generated_function)func;
*o++ = 0x90; // NOP
*o++ = 0xc3; // RET (near return)
func_exec();