Which Linux kernel function creates the 'process 0'?

Which Linux kernel function creates the 'process 0'? - c

I'm trying to understand more about process 0, such as, whether it has a memory descriptor (non-NULL task_struct->mm field) or not, and how is it related to the swap or idle process. It seems to me that a single 'process 0' is created on the boot cpu, and then an idle thread is created for every other cpu by idle_threads_init, but I didn't find where the first one( I assume that is the process 0) was created.
Update
In light of the live book that tychen referenced, here is my most up-to-date understanding regarding process 0 (for x86_64), can someone confirm/refute the items below?
An init_task typed task_struct is statically defined, with the task's kernel stack init_task.stack = init_stack, memory descriptor init_task.mm=NULL and init_task.active_mm=&init_mm, where the stack area init_stack and mm_struct init_mm are both statically defined.
The fact that only active_mm is non-NULL means process 0 is a kernel process. Also, init_task.flags=PF_KTHREAD.
Not long after the uncompressed kernel image begins execution, boot cpu starts to use init_stack as kernel stack. This makes the current macro meaningful (for the first time since machine boots up), which makes fork() possible. After this point, the kernel literally runs in process 0's conext.
start_kernel -> arch_call_rest_init -> rest_init, and inside this function, process 1&2 are forked. Within the kernel_init function which is scheduled for process 1, a new thread (with CLONE_VM) is made and hooked to a CPU's run queue's rq->idle, for every other logical CPU.
Interestingly, all idle threads share the same tid 0 (not only tgid). Usually threads share tgid but have distinct tid, which is really Linux's process id. I guess it doesn't break anything because idle threads are locked to their own CPUs.
kernel_init loads the init executable (typically /sbin/init), and switches both current->mm and active_mm to a non-NULL mm_struct, and clears the PF_KTHREAD flag, which makes process 1 a legitimate user space process. While process 2 does not tweak mm, meaning it remains a kernel process, same as process 0.
At the end of rest_init, do_idle takes over, which means all CPU has an idle process.
Something confused me before, but now becomes clear: the init_* objects/labels such as init_task/init_mm/init_stack are all used by process 0, and not the init process, which is process 1.

We really start Linux kernel from start_kernel, and the process 0/idle starts here too.
In the begin of start_kernel, we call set_task_stack_end_magic(&init_stack). This function will set the stack border of init_task, which is the process 0/idle.
void set_task_stack_end_magic(struct task_struct *tsk)
{
unsigned long *stackend;
stackend = end_of_stack(tsk);
*stackend = STACK_END_MAGIC; /* for overflow detection */
}
It's easy to understand that this function get the limitation address and set the bottom to STACK_END_MAGIC as a stack overflow flag. Here is the structure graph.
The process 0 is statically defined . This is the only process that is not created by kernel_thread nor fork.
/*
* Set up the first task table, touch at your own risk!. Base=0,
* limit=0x1fffff (=2MB)
*/
struct task_struct init_task
#ifdef CONFIG_ARCH_TASK_STRUCT_ON_STACK
__init_task_data
#endif
= {
#ifdef CONFIG_THREAD_INFO_IN_TASK
.thread_info = INIT_THREAD_INFO(init_task),
.stack_refcount = REFCOUNT_INIT(1),
#endif
.state = 0,
.stack = init_stack,
.usage = REFCOUNT_INIT(2),
.flags = PF_KTHREAD,
.prio = MAX_PRIO - 20,
.static_prio = MAX_PRIO - 20,
.normal_prio = MAX_PRIO - 20,
.policy = SCHED_NORMAL,
.cpus_ptr = &init_task.cpus_mask,
.cpus_mask = CPU_MASK_ALL,
.nr_cpus_allowed= NR_CPUS,
.mm = NULL,
.active_mm = &init_mm,
......
.thread_pid = &init_struct_pid,
.thread_group = LIST_HEAD_INIT(init_task.thread_group),
.thread_node = LIST_HEAD_INIT(init_signals.thread_head),
......
};
EXPORT_SYMBOL(init_task);
Here are some important thins we need to make it clearly.
INIT_THREAD_INFO(init_task) sets the thread_info as the graph above.
init_stack is defined as below
extern unsigned long init_stack[THREAD_SIZE / sizeof(unsigned long)];
where THREAD_SIZE equal to
#ifdef CONFIG_KASAN
#define KASAN_STACK_ORDER 1
#else
#define KASAN_STACK_ORDER 0
#endif
#define THREAD_SIZE_ORDER (2 + KASAN_STACK_ORDER)
#define THREAD_SIZE (PAGE_SIZE << THREAD_SIZE_ORDER)
so the default size is defined.
The process 0 will only run in kernel space, but in some circumstances as I mention above it needs a virtual memory space, so we set the following
.mm = NULL,
.active_mm = &init_mm,
Let's look back at start_kernel, the rest_init will initialize kernel_init and kthreadd.
noinline void __ref rest_init(void)
{
......
pid = kernel_thread(kernel_init, NULL, CLONE_FS);
......
pid = kernel_thread(kthreadd, NULL, CLONE_FS | CLONE_FILES);
......
}
kernel_init will run execve and then go to user space, change to init process by running , which is process 1.
if (!try_to_run_init_process("/sbin/init") ||
!try_to_run_init_process("/etc/init") ||
!try_to_run_init_process("/bin/init") ||
!try_to_run_init_process("/bin/sh"))
return 0;
kthread becomes the daemon process to manage and schedule other kernel task_struts, which is process 2.
After all this, the process 0 will become idle process and jump out rq which means it will only run when the rq is empty.
noinline void __ref rest_init(void)
{
......
/*
* The boot idle thread must execute schedule()
* at least once to get things moving:
*/
schedule_preempt_disabled();
/* Call into cpu_idle with preempt disabled */
cpu_startup_entry(CPUHP_ONLINE);
}
void cpu_startup_entry(enum cpuhp_state state)
{
arch_cpu_idle_prepare();
cpuhp_online_idle(state);
while (1)
do_idle();
}
Finally, here is a good gitbook for you if you want to get more understanding of Linux kernel.

Related

How to achieve parallelism in C?

This might be a dumb question, i'm very sorry if that's the case. But i'm struggling to take advantage of the multiple cores in my computer to perform multiple computations at the same time in my Quad-Core MacBook. This is not for any particular project, just a general question, since i want to learn for when i eventually do need to do this kind of things
I am aware of threads, but the seem to run in the same core, so i don't seem to gain any performance using them for compute-bound operations (They are very useful for socket based stuff tho!).
I'm also aware of processed that can be created with fork, but i'm nor sure they are guaranteed to use more CPU, or if they, like threads, just help with IO-bound operations.
Finally i'm aware of CUDA, allowing paralellism in the GPU (And i think OpenCL and Compute Shaders also allows my code to run in the CPU in parallel) but i'm currently looking for something that will allow me to take advantage of the multiple CPU cores that my computer has.
In python, i'm aware of the multiprocessing module, which seems to provide an API very similar to threads, but there i do seem to gain an edge by running multiple functions performing computations in parallel. I'm looking into how could i get this same advantage in C, but i don't seem to be able
Any help pointing me to the right direction would be very much appreciated
Note: I'm trying to achive true parallelism, not concurrency
Note 2: I'm only aware of threads and using multiple processes in C, with threads i don't seem to be able to win the performance boost i want. And i'm not very familiar with processes, but i'm still not sure if running multiple processes is guaranteed to give me the advantage i'm looking for.

A simple program to heat up your CPU (100% utilization of all available cores).
Hint: The thread starting function does not return, program exit via [CTRL + C]
#include <pthread.h>
void* func(void *arg)
{
while (1);
}
int main()
{
#define NUM_THREADS 4 //use the number of cores (if known)
pthread_t threads[NUM_THREADS];
for (int i=0; i < NUM_THREADS; ++i)
pthread_create(&threads[i], NULL, func, NULL);
for (int i=0; i < NUM_THREADS; ++i)
pthread_join(threads[i], NULL);
return 0;
}
Compilation:
gcc -pthread -o thread_test thread_test.c
If i start ./thread_test, all cores are at 100%.
A word to fork and pthread_create:
fork creates a new process (the current process image will be copied and executed in parallel), while pthread_create will create a new thread, sometimes called a lightweight process.
Both, processes and threads will run in 'parallel' to the parent process.
It depends, when to use a child process over a thread, e.g. a child is able to replace its process image (via exec family) and has its own address space, while threads are able to share the address space of the current parent process.
There are of course a lot more differences, for that i recommend to study the following pages:
man fork
man pthreads

I am aware of threads, but the seem to run in the same core, so i don't seem to gain any performance using them for compute-bound operations (They are very useful for socket based stuff tho!).
No, they don't. Except if you block and your threads don't block, you'll see alll of them running. Just try this (beware that this consumes all your cpu time) that starts 16 threads each counting in a busy loop for 60 s. You will see all of them running and makins your cores to fail to their knees (it runs only a minute this way, then everything ends):
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define N 16 /* had 16 cores, so I used this. Put here as many
* threads as cores you have. */
struct thread_data {
pthread_t thread_id; /* the thread id */
struct timespec end_time; /* time to get out of the tunnel */
int id; /* the array position of the thread */
unsigned long result; /* number of times looped */
};
void *thread_body(void *data)
{
struct thread_data *p = data;
p->result = 0UL;
clock_gettime(CLOCK_REALTIME, &p->end_time);
p->end_time.tv_sec += 60; /* 60 s. */
struct timespec now;
do {
/* just get the time */
clock_gettime(CLOCK_REALTIME, &now);
p->result++;
/* if you call printf() you will see them slowing, as there's a
* common buffer that forces all thread to serialize their outputs
*/
/* check if we are over */
} while ( now.tv_sec < p->end_time.tv_sec
|| now.tv_nsec < p->end_time.tv_nsec);
return p;
} /* thread_body */
int main()
{
struct thread_data thrd_info[N];
for (int i = 0; i < N; i++) {
struct thread_data *d = &thrd_info[i];
d->id = i;
d->result = 0;
printf("Starting thread %d\n", d->id);
int res = pthread_create(&d->thread_id,
NULL, thread_body, d);
if (res < 0) {
perror("pthread_create");
exit(EXIT_FAILURE);
}
printf("Thread %d started\n", d->id);
}
printf("All threads created, waiting for all to finish\n");
for (int i = 0; i < N; i++) {
struct thread_data *joined;
int res = pthread_join(thrd_info[i].thread_id,
(void **)&joined);
if (res < 0) {
perror("pthread_join");
exit(EXIT_FAILURE);
}
printf("PTHREAD %d ended, with value %lu\n",
joined->id, joined->result);
}
} /* main */
Linux and all multithread systems work the same, they create a new execution unit (if both don't share the virtual address space, they are both processes --not exactly so, but this explains the main difference between a process and a thread--) and the available processors are given to each thread as necessary. Threads are normally encapsulated inside processes (they share ---not in linux, if that has not changed recently--- the process id, and virtual memory) Processes run each in a separate virtual space, so they can only share things through the system resources (files, shared memory, communication sockets/pipes, etc.)
The problem with your test case (you don't show it so I have go guess) is that probably you will make all threads in a loop in which you try to print something. If you do that, probably the most time each thread is blocked trying to do I/O (to printf() something)
Stdio FILEs have the problem that they share a buffer between all threads that want to print on the same FILE, and the kernel serializes all the write(2) system calls to the same file descriptor, so if the most of the time you pass in the loop is blocked in a write, the kernel (and stdio) will end serializing all the calls to print, making it to appear that only one thread is working at a time (all the threads will become blocked by the one that is doing the I/O) This busy loop will make all the threads to run in parallel and will show you how the cpu is collapsed.

Parallelism in C can be achieved by using the fork() function. This function simulates a thread by allowing two threads to run simultaneously and share data. The first thread forks itself, and the second thread is then executed as if it was launched from main(). Forking allows multiple processes to be Run concurrently without conflicts arising.
To make sure that data is shared appropriately between the two threads, use the wait() function before accessing shared resources. Wait will block execution of the current program until all database connections are closed or all I/O has been completed, whichever comes first.

CHIP8 in C - How to correctly handle the delay timer?

TL;DR I need to emulate a timer in C that allows concurrent writes and reads, whilst preserving constant decrements at 60 Hz (not exactly, but approximately accurate). It will be part of a Linux CHIP8 emulator. Using a thread-based approach with shared memory and semaphores raises some accuracy problems, as well as race conditions depending on how the main thread uses the timer.
Which is the best way to devise and implement such a timer?
I am writing a Linux CHIP8 interpreter in C, module by module, in order to dive into the world of emulation.
I want my implementation to be as accurate as possible with the specifications. In that matter, timers have proven to be the most difficult modules for me.
Take for instance the delay timer. In the specifications, it is a "special" register, initally set at 0. There are specific opcodes that set a value to, and get it from the register.
If a value different from zero is entered into the register, it will automatically start decrementing itself, at a frequency of 60 Hz, stopping once zero is reached.
My idea regarding its implementation consists of the following:
The use of an ancillary thread that does the decrementing automatically, at a frequency of nearly 60 Hz by using nanosleep(). I use fork() to create the thread for the time being.
The use of shared memory via mmap() in order to allocate the timer register and store its value on it. This approach allows both the ancillary and the main thread to read from and write to the register.
The use of a semaphore to synchronise the access for both threads. I use sem_open() to create it, and sem_wait() and sem_post() to lock and unlock the shared resource, respectively.
The following code snippet illustrates the concept:
void *p = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_SHARED, -1, 0);
/* Error checking here */
sem_t *mutex = sem_open("timersem", O_CREAT, O_RDWR, 1);
/* Error checking and unlinking */
int *val = (int *) p;
*val = 120; // 2-second delay
pid_t pid = fork();
if (pid == 0) {
// Child process
while (*val > 0) { // Possible race condition
sem_wait(mutex); // Possible loss of frequency depending on main thread code
--(*val); // Safe access
sem_post(mutex);
/* Here it goes the nanosleep() */
}
} else if (pid > 0) {
// Parent process
if (*val == 10) { // Possible race condition
sem_wait(mutex);
*val = 50; // Safe access
sem_post(mutex);
}
}
A potential problem I see with such implementation relies on the third point. If a program happens to update the timer register once it has reached a value different from zero, then the ancillary thread must not wait for the main thread to unlock the resource, or else the 60 Hz delay will not be fulfilled. This implies both threads may freely update and/or read the register (constant writes in the case of the ancillary thread), which obviously introduces race conditions.
Once I have explained what I am doing and what I try to achieve, my question is this:
Which is the best way to devise and emulate a timer that allows concurrent writes and reads, whilst preserving an acceptable fixed frequency?

Don't use threads and synchronization primitives (semaphores, shared memory, etc) for this. In fact, I'd go as far as to say: don't use threads for anything unless you explicitly need multi-processor concurrency. Synchronization is difficult to get right, and even more difficult to debug when you get it wrong.
Instead, figure out a way to implement this in a single thread. I'd recommend one of two approaches:
Keep track of the time the last value was written to the timer register. When reading from the register, calculate how long ago it was written to, and subtract an appropriate value from the result.
Keep track of how many instructions are being executed overall, and subtract 1 from the timer register every N instructions, where N is a large number such that N instructions is about 1/60 second.

nonexistant memory in gdb

I'm working on an OS class project with a variant of HOCA system.
I'm trying to create the interrupt handler part of the OS where I/O device interrupts are detected and handled.
(If you have no idea about HOCA, that's fine) My question is really about the internal manipulation of C.
The whole system work like this:
Main function of the OS calls an init() where all the parts are initialized.
After initializing the OS, the root process is created and the first application is schedule()'ed to the specific application. Then the application processes are created and schedule()'ed in a tree structure which rooted from the root process.
void schedule(){
proc_t *front;
front = headQueue(RQ); //return the first available process in the Ready Queue
if (checkPointer(front)) {
intschedule(); // load a timeslice to the OS
LDST(&(front->p_s)); // load the state to the OS
// so that OS can process the application specified by p_s
// LDST() is system function to load a state to processor
}
else {
intdeadlock(); // unlock a process from the blocked list and put in RQ
}
}
Using gdb, I see everything is ok, until it processes right before if(checkPointer(front))
int checkPointer(void *p){
return ((p != (void *) ENULL)&&(p != (void *)NULL));
}
gdb respond:
trap: nonexistant memory address: -1 memory size: 131072 ERROR:
address greater than MEMORYSIZE
what's going wrong with this?
checkPointer() is located in another file.
Your help is much appreciated.

4 Process 4 way synchronization using semaphores (In a C Programming, UNIX environment)

I have a question about synchronizing 4 processes in a UNIX environment. It is very important that no process runs their main functionality without first waiting for the others to "be on the same page", so to speak.
Specifically, they should all not go into their loops without first synchronizing with each other. How do I synchronize 4 processes in a 4 way situation, so that none of them get into their first while loop without first waiting for the others? Note that this is mainly a logic problem, not a coding problem.
To keep things consistent between environments let's just say we have a pseudocode semaphore library with the operations semaphore_create(int systemID), semaphore_open(int semaID), semaphore_wait(int semaID), and semaphore_signal(int semaID).
Here is my attempt and subsequent thoughts:
Process1.c:
int main() {
//Synchronization area (relevant stuff):
int sem1 = semaphore_create(123456); //123456 is an arbitrary ID for the semaphore.
int sem2 = semaphore_create(78901); //78901 is an arbitrary ID for the semaphore.
semaphore_signal(sem1);
semaphore_wait(sem2);
while(true) {
//...do main functionality of process, etc (not really relevant)...
}
}
Process2.c:
int main() {
//Synchronization area (relevant stuff):
int sem1 = semaphore_open(123456);
int sem2 = semaphore_open(78901);
semaphore_signal(sem1);
semaphore_wait(sem2);
while(true) {
//...do main functionality of process etc...
}
}
Process3.c:
int main() {
//Synchronization area (relevant stuff):
int sem1 = semaphore_open(123456);
int sem2 = semaphore_open(78901);
semaphore_signal(sem1);
semaphore_wait(sem2);
while(true) {
//...do main functionality of process etc...
}
}
Process4.c:
int main() {
//Synchronization area (relevant stuff):
int sem1 = semaphore_open(123456);
int sem2 = semaphore_open(78901);
semaphore_signal(sem2);
semaphore_signal(sem2);
semaphore_signal(sem2);
semaphore_wait(sem1);
semaphore_wait(sem1);
semaphore_wait(sem1);
while(true) {
//...do main functionality of process etc...
}
}
We run Process1 first, and it creates all of the semaphores into system memory used in the other processes (the other processes simply call semaphore_open to gain access to those semaphores). Then, all 4 processes have a signal operation, and then a wait. The signal operation causes process1, process2, and process3 to increment the value of sem1 by 1, so it's resultant maximum value is 3 (depending on what order the operating system decides to run these processes in). Process1, 2, and 3, are all waiting then on sem2, and process4 is waiting on sem1 as well. Process 4 then signals sem2 3 times to bring its value back up to 0, and waits on sem1 3 times. Since sem1 was a maximum of 3 from the signalling in the other processes (depending on what order they ran in, again), then it will bring its value back up to 0, and continue running. Thus, all processes will be synchronized.
So yea, not super confident on my answer. I feel that it depends heavily on what order the processes ran in, which is the whole point of synchronization -- that it shouldn't matter what order they run in, they all synchronize correctly. Also, I am doing a lot of work in Process4. Maybe it would be better to solve this using more than 2 semaphores? Wouldn't this also allow for more flexibility within the loops in each process, if I want to do further synchronization?
My question: Please explain why the above logic will or will not work, and/or a solution on how to solve this problem of 4 way synchronization. I'd imagine this is a very common thing to have to think about depending on the industry (eg. banking and synching up bank accounts). I know it is not very difficult, but I have never worked with semaphores before, so I'm kind of confused on how they work.

The precise semantics of your model semaphore library are not clear enough to answer your question definitively. However, if the difference between semaphore_create() and semaphore_open() is that the latter requires the specified semaphore to already exist, whereas the former requires it to not exist, then yes, the whole thing will fall down if process1 does not manage to create the needed semaphores before any of the other processes attempt to open them. (Probably it falls down in different ways if other semantics hold.)
That sort of issue can be avoided in a threading scenario because with threads there is necessarily an initial single-threaded segment wherein the synchronization structures can be initialized. There is also shared memory by which the various threads can communicate with one another. The answer #Dark referred to depends on those characteristics.
The essential problem with a barrier for multiple independent processes -- or for threads that cannot communicate via shared memory and that are not initially synchronized -- is that you cannot know which process needs to erect the barrier. It follows that each one needs to be prepared to do so. That can work in your model library if semaphore_create() can indicate to the caller which result was achieved, one of
semaphore successfully created
semaphore already exists
(or error)
In that case, all participating processes (whose number you must know) can execute the same procedure, maybe something like this:
void process_barrier(int process_count) {
sem_t *sem1, *sem2, *sem3;
int result = semaphore_create(123456, &sem1);
int counter;
switch (result) {
case SEM_SUCCESS:
/* I am the controlling process */
/* Finish setting up the barrier */
semaphore_create(78901, &sem2);
semaphore_create(23432, &sem3);
/* let (n - 1) other processes enter the barrier... */
for (counter = 1; counter < process_count; counter += 1) {
semaphore_signal(sem1);
}
/* ... and wait for those (n - 1) processes to do so */
for (counter = 1; counter < process_count; counter += 1) {
semaphore_wait(sem2);
}
/* let all the (n - 1) waiting processes loose */
for (counter = 1; counter < process_count; counter += 1) {
semaphore_signal(sem3);
}
/* and I get to continue, too */
break;
case SEM_EXISTS_ERROR:
/* I am NOT the controlling process */
semaphore_open(123456, &sem1);
/* wait, if necessary, for the barrier to be initialized */
semaphore_wait(sem1);
semaphore_open(78901, &sem2);
semaphore_open(23432, &sem3);
/* signal the controlling process that I have reached the barrier */
semaphore_signal(sem2);
/* wait for the controlling process to allow me to continue */
semaphore_wait(sem3);
break;
}
}
Obviously, I have taken some minor liberties with your library interface, and I have omitted error checks except where they bear directly on the barrier's operation.
The three semaphores involved in that example serve distinct, well-defined purposes. sem1 guards the initialization of the synchronization constructs and allows the processes to choose which among them takes responsibility for controlling the barrier. sem2 serves to count how many processes have reached the barrier. sem3 blocks the non-controlling processes that have reached the barrier until the controlling process releases them all.

Signal handling in kernel-space

I've written a program that uses SIGALRM and a signal handler.
I'm now trying to add this as a test module within the kernel.
I found that I had to replace a lot of the functions that libc provides with their underlying syscalls..examples being timer_create with sys_timer_create timer_settime with sys_timer_settime and so on.
However, I'm having issues with sigaction.
Compiling the kernel throws the following error
arch/arm/mach-vexpress/cpufreq_test.c:157:2: error: implicit declaration of function 'sys_sigaction' [-Werror=implicit-function-declaration]
I've attached the relevant code block below
int estimate_from_cycles() {
timer_t timer;
struct itimerspec old;
struct sigaction sig_action;
struct sigevent sig_event;
sigset_t sig_mask;
memset(&sig_action, 0, sizeof(struct sigaction));
sig_action.sa_handler = alarm_handler;
sigemptyset(&sig_action.sa_mask);
VERBOSE("Blocking signal %d\n", SIGALRM);
sigemptyset(&sig_mask);
sigaddset(&sig_mask, SIGALRM);
if(sys_sigaction(SIGALRM, &sig_action, NULL)) {
ERROR("Could not assign sigaction\n");
return -1;
}
if (sigprocmask(SIG_SETMASK, &sig_mask, NULL) == -1) {
ERROR("sigprocmask failed\n");
return -1;
}
memset (&sig_event, 0, sizeof (struct sigevent));
sig_event.sigev_notify = SIGEV_SIGNAL;
sig_event.sigev_signo = SIGALRM;
sig_event.sigev_value.sival_ptr = &timer;
if (sys_timer_create(CLOCK_PROCESS_CPUTIME_ID, &sig_event, &timer)) {
ERROR("Could not create timer\n");
return -1;
}
if (sigprocmask(SIG_UNBLOCK, &sig_mask, NULL) == -1) {
ERROR("sigprocmask unblock failed\n");
return -1;
}
cycles = 0;
VERBOSE("Entering main loop\n");
if(sys_timer_settime(timer, 0, &time_period, &old)) {
ERROR("Could not set timer\n");
return -1;
}
while(1) {
ADD(CYCLES_REGISTER, 1);
}
return 0;
}
Is such an approach of taking user-space code and changing the calls alone sufficient to run the code in kernel-space?

Is such an approach of taking user-space code and changing the calls
alone sufficient to run the code in kernel-space?
Of course not! What are you doing is to call the implementation of a system call directly from kernel space, but there is not guarantee that they SYS_function has the same function definition as the system call. The correct approach is to search for the correct kernel routine that does what you need. Unless you are writing a driver or a kernel feature you don't nee to write kernel code. System calls must be only invoked from user space. Their main purpose is to offer a safe manner to access low level mechanisms offered by an operating system such as File System, Socket and so on.
Regarding signals. You had a TERRIBLE idea to try to use signal system calls from kernel space in order to receive a signal. A process sends a signal to another process and signal are meant to be used in user space, so between user space processes. Typically, what happens when you send a signal to another process is that, if the signal is not masked, the receiving process is stopped and the signal handler is executed. Note that in order to achieve this result two switches between user space and kernel space are required.
However, the kernel has its internal tasks which have exactly the same structure of a user space with some differences ( e.g. memory mapping, parent process, etc..). Of course you cannot send a signal from a user process to a kernel thread (imagine what happen if you send a SIGKILL to a crucial component). Since kernel threads have the same structure of user space thread, they can receive signal but its default behaviour is to drop them unless differently specified.
I'd recommend to change you code to try to send a signal from kernel space to user space rather than try to receive one. ( How would you send a signal to kernel space? which pid would you specify?). This may be a good starting point : http://people.ee.ethz.ch/~arkeller/linux/kernel_user_space_howto.html#toc6
You are having problem with sys_sigaction because this is the old definition of the system call. The correct definition should be sys_rt_sigaction.
From the kernel source 3.12 :
#ifdef CONFIG_OLD_SIGACTION
asmlinkage long sys_sigaction(int, const struct old_sigaction __user *,
struct old_sigaction __user *);
#endif
#ifndef CONFIG_ODD_RT_SIGACTION
asmlinkage long sys_rt_sigaction(int,
const struct sigaction __user *,
struct sigaction __user *,
size_t);
#endif
BTW, you should not call any of them, they are meant to be called from user space.

You're working in kernel space so you should start thinking like you're working in kernel space instead of trying to port a userspace hack into the kernel. If you need to call the sys_* family of functions in kernel space, 99.95% of the time, you're already doing something very, very wrong.
Instead of while (1), have it break the loop on a volatile variable and start a thread that simply sleeps and change the value of the variable when it finishes.
I.e.
void some_function(volatile int *condition) {
sleep(x);
*condition = 0;
}
volatile int condition = 1;
start_thread(some_function, &condition);
while(condition) {
ADD(CYCLES_REGISTER, 1);
}
However, what you're doing (I'm assuming you're trying to get the number of cycles the CPU is operating at) is inherently impossible on a preemptive kernel like Linux without a lot of hacking. If you keep interrupts on, your cycle count will be inaccurate since your kernel thread may be switched out at any time. If you turn interrupts off, other threads won't run and your code will just infinite loop and hang the kernel.
Are you sure you can't simply use the BogoMIPs value from the kernel? It is essentially what you're trying to measure but the kernel does it very early in the boot process and does it right.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight