multithreading program to perform word count frequency- Segmentation fault - c

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <time.h>
pthread_mutex_t lock;
//typedef struct for a word
typedef struct {
char word[101];
int frequency;
}Word;
//struct for thread
struct ft{
char* fileName;
int start;
int stop;
};
//compare frequency of 2 words
int compareWords(const void *f1, const void *f2){
Word *a = (Word *)f1;
Word *b = (Word *)f2;
return (b->frequency - a->frequency);
}
//count frequency of a word
void countFrequency(void *arg){
pthread_mutex_lock(&lock);
int i, c;
struct ft* fi = (struct ft*)arg;
FILE *file = fopen(fi->fileName,"r");
fseek(file,fi->start,SEEK_SET);
for(i = 0; i < fi->stop - fi->start; i++){
c = getc(file);
//printf("%d\n", c);
//frequency count
}
fclose(file);
pthread_mutex_unlock(&lock);
}
int main (int argc, char **argv){
//variabies for <time.h>
struct timespec startTime;
struct timespec endTime;
clock_gettime(CLOCK_REALTIME, &startTime);
/*------------main------------------*/
//variables
int nthreads; //number of threads
int chunkSize; //each threas processing size
//if user input is not correct, inform
if(argc < 3){
printf("./a.out text_file #ofthreads \n");
exit(-1);
}
nthreads = atoi(argv[2]);
chunkSize = sizeof(argv[1])/nthreads;
//declare threads and default attributes
pthread_t threads[nthreads];
pthread_attr_t attr;
pthread_attr_init(&attr);
//run threads in parallel
int i;
for (i = 0; i < nthreads; i++){
struct ft data[nthreads];
data[i].start = i*chunkSize;
data[i].stop = data[i].start+chunkSize;
data[i].fileName = argv[1];
// Create a new thread for every segment, and count word frequency for each
pthread_create(&threads[i], &attr, (void*) countFrequency, (void*) &data[i]);
}
//wait for results (all threads)
for (i = 0; i < nthreads; i++){
pthread_join(threads[i], NULL);
}
//func of <time.h>
clock_gettime(CLOCK_REALTIME, &endTime);
time_t sec = endTime.tv_sec - startTime.tv_sec;
long n_sec = endTime.tv_nsec - startTime.tv_nsec;
if (endTime.tv_nsec < startTime.tv_nsec)
{
--sec;
n_sec = n_sec + 1000000000L;
}
printf("Total Time was %ld.%09ld seconds\n", sec, n_sec);
}
I'm working on this program to use multiple threads to read and process a large text file and perform a word count frequency of the top 10 most frequent words in the text that are longer than 6 characters long. But I keep getting the segmentation fault error im not sure why, does anybody have any idea.?

This code:
for (i = 0; i < nthreads; i++){
struct ft data[nthreads];
declares data that is live (legal to use) for the duration of this for loop. This code:
pthread_create(&threads[i], &attr, (void*) countFrequency, (void*) &data[i]);
}
passes the address of data into the threads, and then exits the loop. Once the loop is done, data is no longer live, and all access to it leads to undefined behavior.
The compiler is free to write anything else to the memory where data used to be.
The immediate cause of the crash is that if one of the threads doesn't execute fopen before data is overwritten, then fopen may fail, and you don't check for the failure in fopen.
P.S.
As Eraklon noted, this code: chunkSize = sizeof(argv[1])/nthreads; will divide sizeof(char*) (either 4 or 8 depending on whether you build for 32-bit or for 64-bit) by number of threads. That is unlikely to be what you want, and will yield chinkSize==0 for nthreads > 4 on 32-bit and nthreads > 8 on 64-bit machines.
P.P.S.
There is a concurrency bug in your program as well: since each of the countFrequency invocations locks the same lock for the entire duration, they will all run in sequence (one after another), never in parallel. Thus your program will be slower than if you just did all the work in your main thread.

Related

Printf with multiple threads (for real-time logging) in C

I have written a code for real-time logging. Here's the pseudo-code:
initialize Q; //buffer structure stores values to be printed
log(input)
{
push input to Q;
}
printLog() //infinte loop
{
loop(1)
{
if(Q is not empty)
{
values = pop(Q);
msg = string(values); //formating values into a message string
print(msg);
}
}
}
mainFunction()
{
loop(1)
{
/*
insert operations to be performed
*/
log(values); //log function called
}
}
main()
{
Create 4 threads; //1 mainFunction and 3 printLog
Bind them to CPUs;
}
I'm using atomic operations instead of locks.
When I print the output to the console, I see that each thread prints consecutively for a while. This must mean that once a thread enters printLog(), the other threads are inactive for a while.
What I want instead is while one thread is printing, another thread formats the next value popped from Q and prints it right after. How can this be achieved?
EDIT: I've realized the above information isn't sufficient. Here are some other details.
Buffer structure Q is a circular array of fixed size.
Pushing information to Q is faster than popping+printing. So by the time the Buffer structure is full, I want most of the information to be printed.
NOTE: mainFunction thread shouldn't wait to fill Buffer when it is full.
I'm trying to utilize all the threads at a given time. Currently, after one thread prints, the same thread reads and prints the next value (this means the other 2 threads are inactive).
Here's the actual code:
//use gcc main.c -o run -pthread
#define _GNU_SOURCE
#include <unistd.h>
#include <stdint.h>
#include <sys/time.h>
#include <time.h>
#include <string.h>
#include <stdio.h>
#include <pthread.h>
#include <math.h>
#include <signal.h>
#include <stdlib.h>
#define N 3
/* Buffer size */
#define BUFFER_SIZE 1000
struct values
{
uint64_t num;
char msg[20];
};
struct values Q[BUFFER_SIZE];
int readID = -1;
int writeID = -1;
int currCount = 0;
void Log(uint64_t n, char* m)
{
int i;
if (__sync_fetch_and_add(&currCount,1) < BUFFER_SIZE)
{
i = __sync_fetch_and_add(&writeID,1);
i = i%BUFFER_SIZE;
Q[i].num = n;
strcpy(Q[i].msg, m);
}
else __sync_fetch_and_add(&currCount,-1);
}
void *printLog(void *x)
{
int thID = *((int*)(x));
int i;
while(1)
{
if(__sync_fetch_and_add(&currCount,-1)>=0)
{
i = __sync_fetch_and_add(&readID,1);
i = i%BUFFER_SIZE;
printf("ThreadID: %2d, count: %10d, message: %15s\n",thID,Q[i].num,Q[i].msg);
}
else __sync_fetch_and_add(&currCount,1);
}
}
void *mainFunction()
{
uint64_t i = 0;
while(1)
{
Log(i,"Custom Message");
i++;
usleep(50);
}
}
int main()
{
/* Set main() Thread CPU */
cpu_set_t cpusetMain;
CPU_ZERO(&cpusetMain);
CPU_SET(0, &cpusetMain);
if(0 != pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &cpusetMain))
printf("pthread_setaffinity_np failed for CPU: 0\n");
int LogThID[N+1];
pthread_t LogThreads[N+1];
/* Create Threads */
if (pthread_create(&LogThreads[0], NULL, &mainFunction, NULL) != 0){return 0;}
for(int i=1; i<N+1 ; i++)
{
LogThID[i] = i;
if (pthread_create(&LogThreads[i], NULL, &printLog, &LogThID[i]) != 0){return i;}
}
/* Set CPUs */
cpu_set_t cpuset[N+1];
for(int i=0; i<N+1; i++)
{
CPU_ZERO(&cpuset[i]);
CPU_SET(i+1, &cpuset[i]);
if(0 != pthread_setaffinity_np(LogThreads[i], sizeof(cpu_set_t), &cpuset[i]))
printf("pthread_setaffinity_np failed for CPU: %d\n", i+1);
}
struct sched_param param[N+1];
for(int i=0; i<N+1; i++)
{
param[i].sched_priority = 91;
if(0 != pthread_setschedparam(LogThreads[i],SCHED_FIFO,&param[i]))
printf("pthread_setschedparam failed for CPU: %d\n", i);
}
/* Join threads */
for(int i=0; i<N+1; i++)
{
pthread_join(LogThreads[i], NULL);
}
return 0;
}

Why do I keep getting segmentation fault in my C program

I've been trying to implement thread synchronization on C. However, I keep getting the segmentation fault when my invoke the function that I want the thread to execute. So anyone can suggest the solution on for this problem?
Here is my code
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <semaphore.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#define N 5
#define M 3
#define LEFT (robot_id - 1) % N
#define RIGHT (robot_id + 1) % N
pthread_t robots_id[N];
sem_t simulations[M];
pthread_mutex_t sever_mutex;
void Learning(int robot_id)
{
printf("learning robot = %d\n", robot_id);
}
void *robotAct(void *id)
{
int *robot_id = id;
printf("robot id = %d\n", robot_id);
Learning(*robot_id);
}
int main(int argc, char *argv[])
{
int E, T;
E = atoi(argv[1]);
T = atoi(argv[2]);
printf("Initializing Robot!\n");
//Initializes the simulations
for (int i = 0; i < M; i++)
{
sem_init(&simulations[i], 0, 0);
}
//Initializes the robots
for (int i = 0; i < N; i++)
{
printf("Robot %d is created\n", i + 1);
pthread_create(&robots_id[i], NULL, robotAct, (void *)i + 1);
}
sleep(T);
printf("Terminating Robots\n");
for (int i = 0; i < N; i++)
{
pthread_cancel(robots_id[i]);
}
printf("Termination is completed!\n");
printf("-------Report-------------\n");
//getReport();
return 0;
}
Here is my result that I keep getting
Initializing Robot!
Robot 1 is created
Robot 2 is created
Robot 3 is created
robot id = 1
robot id = 2
Robot 4 is created
robot id = 3
[1] 54477 segmentation fault ./project 5 10
The main issue is explained in my comment:
You're not passing a valid pointer to the thread function. You sort of, mostly, almost get away with the misuse of it in the printf() call in robotAct(); you emphatically do not get away with it in the call to Learning() where you dereference the invalid non-pointer.
A solution is to create an array of integers in the main program which holds robot ID numbers (int id[N];). Then, initialize each element and pass &id[i] to pthread_create().
You should not print addresses with the %d format (even though it works on 32-bit systems; it does not work on 64-bit systems). The correct technique is to use %p to format the address. Or, in this case, print the integer and not the address using *robot_id.
The code that follows has minimal adaptations to the original code and has not been compiled or tested (there could be problems outside the lines changed):
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <semaphore.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#define N 5
#define M 3
#define LEFT (robot_id - 1) % N
#define RIGHT (robot_id + 1) % N
pthread_t robots_id[N];
sem_t simulations[M];
pthread_mutex_t sever_mutex;
void Learning(int robot_id)
{
printf("learning robot = %d\n", robot_id);
}
void *robotAct(void *id)
{
int *robot_id = id;
printf("robot id = %d\n", *robot_id); // Changed
Learning(*robot_id);
return 0; // Added
}
int main(int argc, char *argv[])
{
int E, T;
int id[N]; // Added
E = atoi(argv[1]);
T = atoi(argv[2]);
printf("Initializing Robot!\n");
//Initializes the simulations
for (int i = 0; i < M; i++)
{
sem_init(&simulations[i], 0, 0);
}
//Initializes the robots
for (int i = 0; i < N; i++)
{
printf("Robot %d is created\n", i + 1);
id[i] = i + 1; // Added
pthread_create(&robots_id[i], NULL, robotAct, &id[i]); // Changed
}
sleep(T);
printf("Terminating Robots\n");
for (int i = 0; i < N; i++)
{
pthread_cancel(robots_id[i]);
}
printf("Termination is completed!\n");
printf("-------Report-------------\n");
//getReport();
return 0;
}
Avoid using pthread_cancel() for ending the threads; the threads should terminate under control. For example, there might be a flag that you set in the main thread to indicate that the threads should cease, and they'd check that periodically. Normally, pthread_join() is used to clean up the completed threads.
For future posts, please read about how to create an MCVE (Minimal, Complete, Verifiable Example). There are parts of the code shown that are not relevant to the problem — the mutex and the semaphores, for example, are not really used.

How to read the same file byte by byte asynchronously from within a few threads?

I am trying to read a file with aio.h byte by byte using aio_read with a number of threads. But I don't know if I am on the right track since there are not so many stuff to read on the Internet.
I have just created a worker function to pass it to my threads. And also as an argument to pass to the thread, I created a struct called thread_arguments and I pass a few necessary arguments in it, which will be needed for aiocb such as offset, file_path to open, buffer size, and priority.
I can read a file with one thread from start to end successfully. But when it comes to reading a file as chunks from within a few threads, I couldn't make it. And I am not even sure if I can do that with aio->reqprio without using semaphores or mutexes. (Trying to open a file from within a few threads at the same time?)
How can I read a few number of bytes from within a few threads asynchronously?
Let's say the file contains "foobarquax" and we have three threads using aio library.
Then first one should read "foo",
the second should read "bar" and
the last one should read "quax" asynchronously.
You can see screenshots of issues regarding running it with multiple threads on here
#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>
#include <aio.h>
#include <string.h>
#include <fcntl.h> // open -> file descriptor O_RDONLY, O_WRONLY, O_RDWR
#include <errno.h>
#include <unistd.h>
typedef struct thread_args {
char *source_path;
char *destination_path;
long int buffer_size;
long int buffer_size_last; // buffer size for the last thread in case there is a remainder.
long int offset;
int priority;
} t_args;
void *worker(void *thread_args) {
t_args *arguments = (t_args *) thread_args;
struct aiocb *aiocbRead;
aiocbRead = calloc(1, sizeof(struct aiocb));
aiocbRead->aio_fildes = open(arguments->source_path, O_RDONLY);
if (aiocbRead->aio_fildes == -1) {
printf("Error");
}
printf("opened on descriptor %d\n", aiocbRead->aio_fildes);
aiocbRead->aio_buf = malloc(sizeof(arguments->buffer_size));
aiocbRead->aio_offset = arguments->offset;
aiocbRead->aio_nbytes = (size_t) arguments->buffer_size;
aiocbRead->aio_reqprio = arguments->priority;
int s = aio_read(aiocbRead);
if (s == -1) {
printf("There was an error");
}
while (aio_error(aiocbRead) == EINPROGRESS) {}
printf("Bytes read %ld", aio_return(aiocbRead));
close(aiocbRead->aio_fildes);
for (int i = 0; i < arguments->buffer_size ; ++i) {
printf("%c\n", (*((char *) aiocbRead->aio_buf + i)));
}
}
// Returns a random alphabetic character
char getrandomChar() {
int letterTypeFlag = (rand() % 2);
if (letterTypeFlag)
return (char) ('a' + (rand() % 26));
else
return (char) ('A' + (rand() % 26));
}
void createRandomFile(char *source, int numberofBytes) {
FILE *fp = fopen(source, "w");
for (int i = 0; i < numberofBytes; i++) {
fprintf(fp, "%c", getrandomChar());
}
fclose(fp);
}
int main(int argc, char *argv[]) {
char *source_path = argv[1];
char *destination_path = argv[2];
long int nthreads = strtol(argv[3], NULL, 10);
// Set the seed.
srand(time(NULL));
// Get random number of bytes to write to create the random file.
int numberofBytes = 10 + (rand() % 100000001);
// Create a random filled file at the source path.
createRandomFile(source_path, 100);
// Calculate the payload for each thread.
long int payload = 100 / nthreads;
long int payloadLast = payload + 100 % nthreads;
// Create a thread argument to pass to pthread.
t_args *thread_arguments = (t_args *) malloc(nthreads * sizeof(t_args));
for (int l = 0; l < nthreads; ++l) {
// Set arguments in the struct.
(&thread_arguments)[l]->source_path = source_path;
(&thread_arguments)[l]->destination_path = destination_path;
(&thread_arguments)[l]->buffer_size = payload;
(&thread_arguments)[l]->buffer_size_last = payloadLast;
(&thread_arguments)[l]->offset = l * payload;
(&thread_arguments)[l]->priority = l;
}
pthread_t tID[nthreads];
// Create pthreads.
for (int i = 0; i < nthreads; ++i) {
pthread_create(&tID[i], NULL, worker, (void *) &thread_arguments[i]);
}
// Wait for pthreads to be done.
for (int j = 0; j < nthreads; ++j) {
pthread_join(tID[j], NULL);
}
free(thread_arguments);
return 0;
}
This code is reading succesfully if I just call it from one thread but doesn't work if I use it for more than one threads which is what I want.

Squaring numbers with multiple threads

I am trying to square the numbers 1 - 10,000 with 8 threads. To clarify, I want the 1st thread to do 1^2, 2nd thread to do 2^2, ..., 8th thread to do 8^2, first thread to do 9^2... etc. The problem I am having is that instead of the above happening, each thread computes the squares of 1-10,000.
My code is below. I have marked sections that I'd rather those answering do not modify. Thanks in advance for any tips!
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <pthread.h>
#define NUMBER_OF_THREADS 8
#define START_NUMBER 1
#define END_NUMBER 10000
FILE *f;
void *sqrtfunc(void *tid) { //function for computing squares
int i;
for (i = START_NUMBER; i<=END_NUMBER; i++){
fprintf(f, "%lu squared = %lu\n", i, i*i);
}
}
int main(){
//Do not modify starting here
struct timeval start_time, end_time;
gettimeofday(&start_time, 0);
long unsigned i;
f = fopen("./squared_numbers.txt", "w");
//Do not modify ending here
pthread_t mythreads[NUMBER_OF_THREADS]; //thread variable
long mystatus;
for (i = 0; i < NUMBER_OF_THREADS; i++){ //loop to create 8 threads
mystatus = pthread_create(&mythreads[i], NULL, sqrtfunc, (void *)i);
if (mystatus != 0){ //check if pthread_create worked
printf("pthread_create failed\n");
exit(-1);
}
}
for (i = 0; i < NUMBER_OF_THREADS; i++){
if(pthread_join(mythreads[i], NULL)){
printf("Thread failed\n");
}
}
exit(1);
//Do not modify starting here
fclose(f);
gettimeofday(&end_time, 0);
float elapsed = (end_time.tv_sec-start_time.tv_sec) * 1000.0f + \
(end_time.tv_usec-start_time.tv_usec) / 1000.0f;
printf("took %0.2f milliseconds\n", elapsed);
//Do not modify ending here
}

c cygwin- abored(core dumped)

I have tried for a long time and cannot figure out where this 'core dumped' is coming from. I am using c on cygwin. Commenting out the threads gets rid of the problem but commenting out the entire code in the thread does nothing. Could this have something to do with the calling of the thread?? It appeared to be working then this suddenly happened. I have deleted most of the code and this is what is left-
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <stdint.h>
#include <pthread.h>
#include <string.h>
typedef enum {true=1, false=0} bool;
void *piThread(void *arg);
int finished;
int main(int argc, char *argv[])
{
int i;
int threads;
bool display = false;
long double pI = 0.0;
void *status = malloc(sizeof(int));
pthread_t thread_id[threads];
if(argc < 2) {printf("not enough arguments"); exit(1);
}else threads = atoi(argv[1]);
if(argc == 3)
if (strcmp(argv[2], "b") == 0)
display = true;
for(i=0; i<threads; i++)
{
pthread_create(&thread_id[i], NULL, piThread, NULL);
pthread_join(thread_id[i], &status);
printf("pi: %Lf\n", pI);
}
return 0;
}
void *piThread(void *arg)
{
int number = 0;
number = 74;
pthread_exit((void*)number);
}
This is causing an aborted error.
Stack trace:
Frame Function Args
0028A6A4 76821184 (000000D0, 0000EA60, 00000000, 0028A7D8)
0028A6B8 76821138 (000000D0, 0000EA60, 000000A4, 0028A7B4)
0028A7D8 610DBE29 (00000000, FFFFFFFE, 77403B23, 77403B4E)
0028A8C8 610D915E (00000000, 0028A918, 00000001, 00000000)
0028A928 610D962E (76D709CD, 7427AED9, 00000003, 00000006)
0028A9D8 610D9780 (000011E8, 00000006, 002B002B, 800483D8)
0028A9F8 610D97AC (00000006, 0028CE80, FFFDE000, 00000000)
0028AA28 610D9A85 (000000D0, 0028ABF0, 0028AA58, 610FA223)
End of stack trace
I have no idea what is wrong!!
command line is-
gcc pi.exe 100
any combination ABOVE 26 causes this fault.
Thank you for any insight
You are allocating thread_id before 'threads' is defined. This should fix that problem at least.
if(argc < 2) {printf("not enough arguments"); exit(1);
}else threads = atoi(argv[1]);
pthread_t thread_id[threads];

Resources