Multithreaded reading/doing things with chars from character array in C - c

I am trying to read a character array that contains the contents of many large files. The character array is going to be quite large, because the files are large, so I want to do it using multithreading (pthread). I want the user to be able to designate how many threads they want to run. I have something working, but increasing the number of threads does nothing to affect performance (i.e. 1 thread finishes just as fast as 10). In fact, it seems to be just the opposite: telling the program to use 10 threads runs much slower than telling it to use 1.
Here is the function that slices up the character array according to the number of threads the user passes to the program. I know this is wrong, and I could use some advice here.
//Universal variables
int numThreads;
size_t sizeOfAllFiles; // Size, in bytes, of allFiles
char* allFiles; // Where all of the files are stored, together
void *zip(void *nthread);
void *zip(void *nThread) {
    int currentThread = *(int*)nThread;
    int remainder = sizeOfAllFiles % currentThread;
    int slice = (sizeOfAllFiles-remainder) / currentThread;
    // I subtracted the remainder for my testing
    // because I didn't want to worry about whether
    // the char array's size is evenly divisible by numThreads
    int i = (slice * (currentThread-1));
    char currentChar = allFiles[i]; //Used for iterating
    while(i<(slice * currentThread) && i>=(slice * (currentThread-1))) {
        i++;
        // Do things with the respective thread's
        // 'slice' of the array.
        .....
    }
    return 0;
}
And here is how I am spawning the threads, which I am almost positive that I am doing correctly:
for (int j = 1; j <= threadNum; j++) {
    k = malloc(sizeof(int));
    *k = j;
    if (pthread_create (&thread[j], NULL, zip, k) != 0) {
        printf("Error\n");
        free(thread);
        exit(EXIT_FAILURE);
    }
}
for (int i = 1; i <= threadNum; i++)
    pthread_join (thread[i], NULL);
This is all really confusing for me so if I could get some help on this, I'd greatly appreciate it. I specifically am struggling with the slicing part (cutting it up correctly), and with not seeing performance gains by using more than one thread. Thanks in advance.

I'm starting by throwing a test program at you:
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <stdio.h>
#include <stddef.h>
#include <time.h>
bool
EnlargeBuffer(char ** const buffer_pointer,
size_t * const buffer_size)
{
char * larger_buffer = realloc(*buffer_pointer,
2 * *buffer_size);
if (! larger_buffer) {
larger_buffer = realloc(*buffer_pointer,
*buffer_size + 100);
if (! larger_buffer) {
return false;
}
*buffer_size += 100;
} else {
*buffer_size *= 2;
}
*buffer_pointer = larger_buffer;
printf("(Buffer size now at %zu)\n", *buffer_size);
return true;
}
bool
ReadAll(FILE * const source,
char ** pbuffer,
size_t * pbuffer_size,
size_t * pwrite_index)
{
int c;
while ((c = fgetc(source)) != EOF) {
assert(*pwrite_index < *pbuffer_size);
(*pbuffer)[(*pwrite_index)++] = c;
if (*pwrite_index == *pbuffer_size) {
if (! EnlargeBuffer(pbuffer, pbuffer_size)) {
free(*pbuffer);
return false;
}
}
}
if (ferror(source)) {
free(*pbuffer);
return false;
}
return true;
}
unsigned
CountAs(char const * const buffer,
size_t size)
{
unsigned count = 0;
while (size--)
{
if (buffer[size] == 'A') ++count;
}
return count;
}
int
main(int argc, char ** argv)
{
char * buffer = malloc(100);
if (! buffer) return 1;
size_t buffer_size = 100;
size_t write_index = 0;
clock_t begin = clock();
for (int i = 1; i < argc; ++i)
{
printf("Reading %s now ... \n", argv[i]);
FILE * const file = fopen(argv[i], "r");
if (! file) return 1;
if (! ReadAll(file, &buffer, &buffer_size, &write_index))
{
return 1;
}
fclose(file);
}
clock_t end = clock();
printf("Reading done, took %f seconds\n",
(double)(end - begin) / CLOCKS_PER_SEC);
begin = clock();
unsigned const as = CountAs(buffer, write_index);
end = clock();
printf("All files have %u 'A's, counting took %f seconds\n",
as,
(double)(end - begin) / CLOCKS_PER_SEC);
}
This program reads all files (passed as command-line arguments) into one large char * buffer, and then counts all bytes which are == 'A'. It also times both of these steps.
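A caveat if you reuse this timing for a threaded version: clock() reports processor time, and on Linux with glibc that is summed over all threads of the process, so a parallel run can look no faster than a serial one. Below is a minimal sketch of a wall-clock alternative, assuming a C11 library that provides timespec_get; the helper name wall_seconds is made up for illustration.

```c
#include <time.h>

/* Hypothetical helper: current wall-clock time in seconds.
   Returns a negative value if the clock is not available. */
static double wall_seconds(void)
{
    struct timespec ts;
    /* timespec_get is standard C11. TIME_UTC is calendar time, not a
       monotonic clock, so an adjusted system clock can skew results. */
    if (timespec_get(&ts, TIME_UTC) != TIME_UTC) {
        return -1.0;
    }
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}
```

Take one reading before the work and one after, and print the difference, in the same places where the program above calls clock().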
Example run with (shortened) output on my system:
# gcc -Wall -Wextra -std=c11 -pedantic allthefiles.c
# dd if=/dev/zero of=large_file bs=1M count=1000
# ./a.out allthefiles.c large_file
Reading allthefiles.c now ...
(Buffer size now at 200)
...
(Buffer size now at 3200)
Reading large_file now ...
(Buffer size now at 6400)
(Buffer size now at 12800)
...
(Buffer size now at 1677721600)
Reading done, took 4.828559 seconds
All files have 7 'A's, counting took 0.764503 seconds
Reading took almost 5 seconds, but counting (= iterating once, in a single thread, over all bytes) took a bit less than 1 second.
You're optimizing at the wrong place!
Using 1 thread to read all files, and then using N threads to operate on that one buffer isn't going to bring you places. The fastest way to read 1 file is to use 1 thread. For multiple files, use 1 thread per file!
So, in order to achieve the speedup that you need to show for your assignment:
1. Create a pool of threads with variable size.
2. Have a pool of tasks, where each task consists of:
   - read one file
   - compute its run-length encoding
   - store the run-length encoded file
3. Let the threads take tasks from your task pool.
Things to consider: how do you combine the results of each task without requiring (costly) synchronization?
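The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names task, pool, worker and run_pool are made up, each task's input is an in-memory buffer standing in for "read one file", and the "encoding" is reduced to counting runs. Each task writes only into its own slot, so the only shared state that needs a lock is the index of the next unclaimed task.

```c
#include <pthread.h>
#include <stddef.h>

enum { MAX_WORKERS = 64 };  /* arbitrary cap chosen for this sketch */

/* One unit of work; `input` stands in for the contents of one file. */
struct task {
    const char *input;   /* data to encode (assumed to be in memory already) */
    size_t      length;  /* number of bytes in input */
    size_t      runs;    /* result: number of runs an RLE encoder would emit */
};

struct pool {
    struct task    *tasks;
    size_t          count;
    size_t          next;   /* index of the next unclaimed task */
    pthread_mutex_t lock;   /* protects `next` only */
};

/* Count the runs in a buffer ("aaabcc" has 3). */
static size_t count_runs(const char *data, size_t length)
{
    size_t runs = 0;
    for (size_t i = 0; i < length; ++i) {
        if (i == 0 || data[i] != data[i - 1]) ++runs;
    }
    return runs;
}

/* Worker: claim task indices until none are left. */
static void *worker(void *arg)
{
    struct pool *pool = arg;
    for (;;) {
        pthread_mutex_lock(&pool->lock);
        size_t mine = pool->next < pool->count ? pool->next++ : pool->count;
        pthread_mutex_unlock(&pool->lock);
        if (mine == pool->count) break;  /* nothing left to do */
        struct task *t = &pool->tasks[mine];
        t->runs = count_runs(t->input, t->length);  /* private slot, no lock */
    }
    return NULL;
}

/* Run all tasks on up to `nthreads` workers. Returns 0 on success. */
static int run_pool(struct task *tasks, size_t count, unsigned nthreads)
{
    pthread_t threads[MAX_WORKERS];
    struct pool pool;
    unsigned started = 0;

    pool.tasks = tasks;
    pool.count = count;
    pool.next = 0;
    if (pthread_mutex_init(&pool.lock, NULL) != 0) return -1;
    if (nthreads == 0) nthreads = 1;
    if (nthreads > MAX_WORKERS) nthreads = MAX_WORKERS;

    while (started < nthreads &&
           pthread_create(&threads[started], NULL, worker, &pool) == 0) {
        ++started;
    }
    for (unsigned i = 0; i < started; ++i) {
        pthread_join(threads[i], NULL);
    }
    pthread_mutex_destroy(&pool.lock);
    return started > 0 ? 0 : -1;
}
```

In a real version each task would open its own file, and the per-task results would be written out in task order after all workers have been joined.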

Related

How to stop page faults from slowing C program with mmap?

I am developing a program that involves making reads and writes to a file on parallel pthreads in C. The program mmaps a portion of the file (which is always a multiple of the page size) into different pages and then creates assignments for each thread on the system that contain an array of pointers to the start of various pages that have been mapped. Each thread is then given an assignment of pages that must be transformed with a function, and the main thread waits until all the other threads finish their work on the pages. Then, it will mmap the next portion of the file and repeat the thread dispersal and execution process. I have noticed that as pages continue to load, their loading time increases significantly. I believe that page faults are taking longer and longer as more data is edited. My tests indicate that the first access to each page is what slows the transformation down, and that this cost keeps growing. I realize this is likely the operating system slowing things down. Any ideas on what is going on and how I can increase speed?
I am working on a machine with 16 GB of RAM and 4 cores (1 thread per core). Below, you will find a simpler version of my program where a function called simpleMod sets the first byte of each page of the file to 0. disperseFunction splits up the file into the maximum amount that can be mmapped at once, and runFunction does the actual thread dispersal. For a 1 GB file, it takes ~0.3 seconds. For an 8 GB file, it takes ~35 seconds, showing the drastic time increase. I did try madvise for sequential reads. Here is the code:
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <stdio.h>
#include <stdint.h>
#define DEFAULT_THREADS_PER_CORE 2
#define BYTE_COUNT 4096
#define LONG_COUNT 512
#define DEFAULT_MAX_LOAD 1073741824
#define PTR_ADD(ptr, off) ((void *) (((size_t) (ptr)) + ((size_t) (off))))
//the blocks that are read to be changed
union block {
uint_fast64_t longs[LONG_COUNT];
char bytes[BYTE_COUNT];
} block;
//pointers to positions in file
typedef struct job {
void *start;
} job;
//what each thread has to do
typedef struct assignment {
job *jobs;
unsigned long long jobCount;
} assignment;
unsigned int threadsPerCore;
unsigned int threadCount;
unsigned long long maxLoad;
/*
This function takes in a function and distributes it to threads
@param loadSize is how many bytes to distribute
@param offset is how many bytes into file to start at
@param fd is the file descriptor
@param assignments is the assignments array to utilize
@param threads is the threads array to dispatch to
@param function is the function to be dispatched
The load size and offset must be multiples of 4096
*/
int runFunction(unsigned long long loadSize, unsigned long long offset, int fd,
assignment assignments[], pthread_t threads[], void *(*function)(void *)) {
//get pointer to location in file
void *fileBytes = mmap(NULL, loadSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
if(fileBytes == MAP_FAILED) { //check for error
printf("Could not open file\n");
return 1;
}
//calculate distribution data
unsigned long long chunkCount = loadSize / BYTE_COUNT;
unsigned long long chunksPerThread = chunkCount / threadCount;
unsigned long long remainingChunks = chunkCount - chunksPerThread * threadCount;
unsigned long long addPerAssignemnt = (remainingChunks + (threadCount - 1)) / threadCount;
//get job counts and malloc space in assignments
for(int i = 0; i < threadCount; i++) {
assignments[i].jobCount = chunksPerThread;
if(remainingChunks == 0) {
assignments[i].jobCount += remainingChunks;
remainingChunks = 0;
} else if(remainingChunks >= addPerAssignemnt) {
assignments[i].jobCount += addPerAssignemnt;
remainingChunks -= addPerAssignemnt;
}
assignments[i].jobs = malloc(sizeof(job) * assignments[i].jobCount);
}
//then assignments need to be filled with jobs
//sequential memory blocks deployed to threads
unsigned long long done = 0;
unsigned int t;
for(t = 0; t < threadCount; t++) { //cycle through dimensions
for(unsigned long long j = 0; j < assignments[t].jobCount; j++) {
//setup new job
job *newJob = &assignments[t].jobs[j];
newJob->start = PTR_ADD(fileBytes, done + j * BYTE_COUNT);
}
done += assignments[t].jobCount * BYTE_COUNT;
}
//setup threads for function
for(t = 0; t < threadCount; t++) {
pthread_create(&threads[t], NULL, function, &assignments[t]);
}
for(t = 0; t < threadCount; t++) {
pthread_join(threads[t], NULL);
}
//free the jobs
for(t = 0; t < threadCount; t++) {
free(assignments[t].jobs);
}
//unmap mem
munmap(fileBytes, loadSize);
return 0;
}//end runFunction
/*
This function breaks up the file into chunks no greater than the max size and then disperses that chunk to the threads
@param function is the function to disperse
@param threads are the threads to disperse onto
@param assignments is the assignments array to utilize
@param fd is the file descriptor
@param fileSize is the size of the file described by fd
*/
int disperseFunction(void *(*function)(void *), pthread_t threads[],
assignment assignments[], int fd, unsigned long long fileSize) {
//only load up to certain amount of file at time
for(unsigned long long lc = 0; lc < fileSize / maxLoad; lc++) {
runFunction(maxLoad, maxLoad * lc, fd, assignments, threads, function);
}
//run remainder
unsigned long long lastSize = fileSize % maxLoad;
if(lastSize) {
runFunction(lastSize, fileSize - lastSize, fd, assignments, threads, function);
}
return 0;
}
void *simpleMod(void *data) {
//get assignment info
//assignments are a struct that hold an array of jobs (which are pointers) and the job count
assignment *assignmentPtr = data;
assignment currentAssignment = *assignmentPtr;
job *jobList = currentAssignment.jobs;
unsigned long long jobCount = currentAssignment.jobCount;
//setup tracking vars
job currentJob;
union block *chunk;
//go through jobs one by one and set first byte to 0
for(unsigned long long j = 0; j < jobCount; j++) {
currentJob = jobList[j];
chunk = currentJob.start;
chunk->bytes[0] = 0;
}//end for each job
return NULL;
}
int main() {
threadCount = 4; //just an example for my four thread system
maxLoad = DEFAULT_MAX_LOAD;
assignment assignments[threadCount];
pthread_t threads[threadCount];
struct stat statBuf;
int fd = open("8g", O_RDWR);
fstat(fd, &statBuf);
//run simpleMod over file
disperseFunction(simpleMod, threads, assignments, fd, statBuf.st_size);
}
Thank you in advance and please let me know any other information I can provide.

DES CBC mode not outputting correctly

I am working on a project in C to implement CBC mode on top of a skeleton code for DES with OpenSSL. We are not allowed to use a function that does the CBC mode automatically, in the sense that we must implement it ourselves. I am getting output but I have result files and my output is not matching up completely with the intended results. I also am stuck on figuring out how to pad the file to ensure all the blocks are of equal size, which is probably one of the reasons why I'm not receiving the correct output. Any help would be appreciated. Here's my modification of the skeleton code so far:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <openssl/des.h>
#include <sys/time.h>
#include <unistd.h>
#define ENC 1
#define DEC 0
DES_key_schedule key;
int append(char*s, size_t size, char c) {
if(strlen(s) + 1 >= size) {
return 1;
}
int len = strlen(s);
s[len] = c;
s[len+1] = '\0';
return 0;
}
int getSize (char * s) {
char * t;
for (t = s; *t != '\0'; t++)
;
return t - s;
}
void strToHex(const_DES_cblock input, unsigned char *output) {
int arSize = 8;
unsigned int byte;
for(int i=0; i<arSize; i++) {
if(sscanf(input, "%2x", &byte) != 1) {
break;
}
output[i] = byte;
input += 2;
}
}
void doBitwiseXor(DES_LONG *xorValue, DES_LONG* data, const_DES_cblock roundOutput) {
DES_LONG temp[2];
memcpy(temp, roundOutput, 8*sizeof(unsigned char));
for(int i=0; i<2; i++) {
xorValue[i] = temp[i] ^ data[i];
}
}
void doCBCenc(DES_LONG *data, const_DES_cblock roundOutput, FILE *outFile) {
DES_LONG in[2];
doBitwiseXor(in, data, roundOutput);
DES_encrypt1(in,&key,ENC);
printf("ENCRYPTED\n");
printvalueOfDES_LONG(in);
printf("%s","\n");
fwrite(in, 8, 1, outFile);
memcpy(roundOutput, in, 2*sizeof(DES_LONG));
}
int main(int argc, char** argv)
{
const_DES_cblock cbc_key = {0x01,0x23,0x45,0x67,0x89,0xab,0xcd,0xef};
const_DES_cblock IV = {0x01,0x23,0x45,0x67,0x89,0xab,0xcd,0xef};
// Initialize the timing function
struct timeval start, end;
gettimeofday(&start, NULL);
int l;
if ((l = DES_set_key_checked(&cbc_key,&key)) != 0)
printf("\nkey error\n");
FILE *inpFile;
FILE *outFile;
inpFile = fopen("test.txt", "r");
outFile = fopen("test_results.txt", "wb");
if(inpFile && outFile) {
unsigned char ch;
// A char array that will hold all 8 ch values.
// each ch value is appended to this.
unsigned char eight_bits[8];
// counter for the loop that ensures that only 8 chars are done at a time.
int count = 0;
while(!feof(inpFile)) {
// read in a character
ch = fgetc(inpFile);
// print the character
printf("%c",ch);
// append the character to eight_bits
append(eight_bits,1,ch);
// increment the count so that we only go to 8.
count++;
const_DES_cblock roundOutput;
// When count gets to 8
if(count == 8) {
// for formatting
printf("%s","\n");
// Encrypt the eight characters and store them back in the char array.
//DES_encrypt1(eight_bits,&key,ENC);
doCBCenc(eight_bits, roundOutput, outFile);
// prints out the encrypted string
int k;
for(k = 0; k < getSize(eight_bits); k++){
printf("%c", eight_bits[k]);
}
// Sets count back to 0 so that we can do another 8 characters.
count = 0;
// so we just do the first 8. When everything works REMOVE THE BREAK.
//break;
}
}
} else {
printf("Error in opening file\n");
}
fclose(inpFile);
fclose(outFile);
// End the timing
gettimeofday(&end, NULL);
// Initialize seconds and micros to hold values for the time output
long seconds = (end.tv_sec - start.tv_sec);
long micros = ((seconds * 1000000) + end.tv_usec) - (start.tv_usec);
// Output the time
printf("The elapsed time is %d seconds and %d microseconds\n", seconds, micros);
}
Your crypto is at least half correct, but you have a lot of actual or potential other errors.
As you identified, raw CBC mode can only encrypt data which is a multiple of the block size, for DES 64 bits or 8 bytes (on most modern computers and all where you could use OpenSSL). In some applications this is okay; for example if the data is (always) an MD5 or SHA-256 or SHA-512 hash, or a GUID, or an IPv6 (binary) address, then it is a block multiple. But most applications want to handle at least any length in bytes, so they need to use some scheme to pad on encrypt and unpad on decrypt the last block (all blocks before the last already have the correct size). Many different schemes have been developed for this, so you need to know which to use. I assume this is a school assignment (since no real customer would set such a stupid and wasteful combination of requirements) and this should either have been specified or clearly left as a choice. One padding scheme very common today (although not for single-DES, because that is broken, unsafe, obsolete, and not common) is the one defined by PKCS5 and generalized by PKCS7 and variously called PKCS5, PKCS7, or PKCS5/7 padding, so I used that as an example.
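As a concrete illustration of that padding scheme, here is a minimal sketch for an 8-byte block. The function names pkcs7_pad and pkcs7_unpad and the BLOCK_SIZE macro are made up for this example and are not part of OpenSSL.

```c
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 8  /* DES block size in bytes */

/* Pad the final block in place. `used` is the number of data bytes already
   in `block` and must be less than BLOCK_SIZE. Every padding byte holds the
   number of padding bytes added. When the data length is an exact multiple
   of the block size, call this with used == 0 on an extra block. */
static void pkcs7_pad(unsigned char block[BLOCK_SIZE], size_t used)
{
    unsigned char pad = (unsigned char)(BLOCK_SIZE - used);
    memset(block + used, pad, pad);
}

/* Return the number of data bytes in a decrypted final block,
   or -1 if the padding is malformed. */
static int pkcs7_unpad(const unsigned char block[BLOCK_SIZE])
{
    unsigned char pad = block[BLOCK_SIZE - 1];
    if (pad < 1 || pad > BLOCK_SIZE) return -1;
    for (size_t i = BLOCK_SIZE - pad; i < BLOCK_SIZE; ++i) {
        if (block[i] != pad) return -1;
    }
    return BLOCK_SIZE - pad;
}
```

With this scheme the ciphertext is always at least one byte longer than the plaintext, rounded up to a block multiple, which is what lets the decrypting side remove the padding unambiguously.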
Other than that:
- You try to test feof(inpFile) before doing fgetc(inpFile). This doesn't work in C. It results in your code treating the low 8 bits of EOF (255 aka 0xFF on practically all implementations) as a valid data character added to the characters that were actually in the file. The common idiom is to store the return of getchar/getc/fgetc in a signed int and compare to EOF, but that would have required more changes so I used an alternative.
- You don't initialize eight_bits, which is a local-scope automatic duration variable, so its contents are undefined and depending on the implementation are often garbage, which means trying to 'append' to it by using strlen() to look for the end won't work right and might even crash, although on some implementations it might at least sometimes happen to contain zero bytes and 'work'. In addition it is possible in C for a byte read from a file (and stored here) to be \0, which will also make this work wrong, although if this file contains text, as its name suggests, it probably doesn't contain any \0 bytes.
- Once you fill eight_bits you write 'off-the-end' into element [8], which doesn't exist. Technically this is Undefined Behavior and anything at all can happen, traditionally expressed on Usenet as nasal demons. Plus, after main finishes the first block it doesn't change anything in eight_bits, so all further calls to append find it full and discard the new character.
- While you could fix the above points separately, a much simpler solution is available: you are already using count to count the number of bytes in the current block, so just use it as the subscript.
- roundOutput is also an uninitialized local/auto variable within the loop, which is then used as the previous block for the CBC step, possibly with garbage or wrong value(s). And you don't use the IV at all, as is needed. You should allocate this before the loop (so it retains its value through all iterations) and initialize it to the IV, and then for each block in the loop your doCBCenc can properly XOR it to the new block and then leave the encrypted new block to be used next time.
- Your code labelled 'prints out the encrypted string' prints plaintext, not ciphertext -- which is binary and shouldn't be printed directly anyway -- and is not needed because your file-read loop already echoes each character read. But if you do want to print a (validly null-terminated) string it's easier to just use fputs(s,f) or [f]printf([f,]"%s",s) or even fwrite(s,1,strlen(s),f).
- Your doCBCenc has a reference to printvalueOfDES_LONG, which isn't defined anywhere, and which along with the two surrounding printf calls is clearly not needed.
- You should use a cast to convert the first argument to doCBCenc -- this isn't strictly required but is good style, and a good compiler (like mine) complains if you don't.
- Finally, when an error occurs you usually print a message but then continue running, which will never work right and may produce symptoms that disguise the problem and make it hard to fix.
The code below fixes all of the above except the last point (which would have been more work for less benefit). I also removed routines that are now superfluous, as well as the timing code, which is just silly: Unix already has built-in tools to measure and display process time more easily and reliably than writing code. Code I 'removed' is under #if 0 for reference, and code I added is under #else or #if 1, except for the cast. The logic for PKCS5/7 padding is under #if MAYBE so it can be either selected or not. Some consider it better style to use sizeof(DES_cblock) or define a macro instead of the magic 8's, but I didn't bother -- especially since it would have required changes that aren't really necessary.
// SO70209636
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <openssl/des.h>
#include <sys/time.h>
#include <unistd.h>
#define ENC 1
#define DEC 0
DES_key_schedule key;
#if 0
int append(char*s, size_t size, char c) {
if(strlen(s) + 1 >= size) {
return 1;
}
int len = strlen(s);
s[len] = c;
s[len+1] = '\0';
return 0;
}
int getSize (char * s) {
char * t;
for (t = s; *t != '\0'; t++)
;
return t - s;
}
void strToHex(const_DES_cblock input, unsigned char *output) {
int arSize = 8;
unsigned int byte;
for(int i=0; i<arSize; i++) {
if(sscanf(input, "%2x", &byte) != 1) {
break;
}
output[i] = byte;
input += 2;
}
}
#endif
void doBitwiseXor(DES_LONG *xorValue, DES_LONG* data, const_DES_cblock roundOutput) {
DES_LONG temp[2];
memcpy(temp, roundOutput, 8*sizeof(unsigned char));
for(int i=0; i<2; i++) {
xorValue[i] = temp[i] ^ data[i];
}
}
void doCBCenc(DES_LONG *data, const_DES_cblock roundOutput, FILE *outFile) {
DES_LONG in[2];
doBitwiseXor(in, data, roundOutput);
DES_encrypt1(in,&key,ENC);
#if 0
printf("ENCRYPTED\n");
printvalueOfDES_LONG(in);
printf("%s","\n");
#endif
fwrite(in, 8, 1, outFile);
memcpy(roundOutput, in, 2*sizeof(DES_LONG));
}
int main(int argc, char** argv)
{
const_DES_cblock cbc_key = {0x01,0x23,0x45,0x67,0x89,0xab,0xcd,0xef};
const_DES_cblock IV = {0x01,0x23,0x45,0x67,0x89,0xab,0xcd,0xef};
#if 0
// Initialize the timing function
struct timeval start, end;
gettimeofday(&start, NULL);
#endif
int l;
if ((l = DES_set_key_checked(&cbc_key,&key)) != 0)
printf("\nkey error\n");
#if 1
DES_cblock roundOutput; // must be outside the loop
memcpy (roundOutput, IV, 8); // and initialized
#endif
FILE *inpFile;
FILE *outFile;
inpFile = fopen("test.txt", "r");
outFile = fopen("test.encrypt", "wb");
if(inpFile && outFile) {
unsigned char ch;
// A char array that will hold all 8 ch values.
// each ch value is appended to this.
unsigned char eight_bits[8];
// counter for the loop that ensures that only 8 chars are done at a time.
int count = 0;
#if 0
while(!feof(inpFile)) {
// read in a character
ch = fgetc(inpFile);
#else
while( ch = fgetc(inpFile), !feof(inpFile) ){
#endif
// print the character
printf("%c",ch);
#if 0
// append the character to eight_bits
append(eight_bits,1,ch);
// increment the count so that we only go to 8.
count++;
#else
eight_bits[count++] = ch;
#endif
#if 0
const_DES_cblock roundOutput;
#endif
// When count gets to 8
if(count == 8) {
// for formatting
printf("%s","\n");
// Encrypt the eight characters and store them back in the char array.
//DES_encrypt1(eight_bits,&key,ENC);
doCBCenc((DES_LONG*)eight_bits, roundOutput, outFile);
#if 0
// prints out the encrypted string
int k;
for(k = 0; k < getSize(eight_bits); k++){
printf("%c", eight_bits[k]);
}
#endif
// Sets count back to 0 so that we can do another 8 characters.
count = 0;
// so we just do the first 8. When everything works REMOVE THE BREAK.
//break;
}
}
#if MAYBE
memset (eight_bits+count, 8-count, 8-count); // PKCS5/7 padding
doCBCenc((DES_LONG*)eight_bits, roundOutput, outFile);
#endif
} else {
printf("Error in opening file\n");
}
fclose(inpFile);
fclose(outFile);
#if 0
// End the timing
gettimeofday(&end, NULL);
// Initialize seconds and micros to hold values for the time output
long seconds = (end.tv_sec - start.tv_sec);
long micros = ((seconds * 1000000) + end.tv_usec) - (start.tv_usec);
// Output the time
printf("The elapsed time is %d seconds and %d microseconds\n", seconds, micros);
#endif
}
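For comparison, the more common idiom mentioned earlier, storing the result of fgetc in an int and comparing it against EOF, could look like the following. This is a minimal sketch; the function name read_block and its return convention are made up for illustration.

```c
#include <stdio.h>
#include <stddef.h>

/* Read up to `size` bytes from `in` into `block` and return how many were
   stored. A result smaller than `size` means end of file or a read error;
   the caller can use ferror(in) to tell the two apart. */
static size_t read_block(FILE *in, unsigned char *block, size_t size)
{
    size_t count = 0;
    int c;  /* int, not char: EOF must stay distinguishable from byte 0xFF */
    while (count < size && (c = fgetc(in)) != EOF) {
        block[count++] = (unsigned char)c;
    }
    return count;
}
```

A caller would loop while read_block returns 8, encrypt each full block, and then pad and encrypt whatever the final short read left behind.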
PS: personally I wouldn't put the fwrite in doCBCenc; I would only do the encryption and let the caller do whatever I/O is appropriate which might in some cases not be fwrite. But what you have is not wrong for the requirements you apparently have.
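To illustrate that separation, here is a minimal sketch of a CBC routine that only transforms memory and leaves all I/O to the caller. Everything in it is hypothetical: the block_cipher_fn signature, the cbc_encrypt name, and especially toy_encrypt, which is a stand-in so the example runs without OpenSSL and is NOT a real cipher.

```c
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 8

/* Assumed signature for a one-block cipher that works in place. */
typedef void (*block_cipher_fn)(unsigned char block[BLOCK_SIZE],
                                const void *key);

/* CBC-encrypt `nblocks` whole blocks from `in` into `out`.
   `chain` holds the IV on entry and the last ciphertext block on return,
   so the caller can continue with more data later. No I/O happens here. */
static void cbc_encrypt(block_cipher_fn encrypt, const void *key,
                        unsigned char chain[BLOCK_SIZE],
                        const unsigned char *in, unsigned char *out,
                        size_t nblocks)
{
    for (size_t b = 0; b < nblocks; ++b) {
        const unsigned char *src = in + b * BLOCK_SIZE;
        unsigned char *dst = out + b * BLOCK_SIZE;
        for (size_t i = 0; i < BLOCK_SIZE; ++i) {
            dst[i] = src[i] ^ chain[i];  /* XOR with previous ciphertext */
        }
        encrypt(dst, key);               /* encrypt the XORed block */
        memcpy(chain, dst, BLOCK_SIZE);  /* becomes the next chaining value */
    }
}

/* Toy stand-in for DES, NOT a real cipher: adds a one-byte key to each byte. */
static void toy_encrypt(unsigned char block[BLOCK_SIZE], const void *key)
{
    unsigned char k = *(const unsigned char *)key;
    for (size_t i = 0; i < BLOCK_SIZE; ++i) {
        block[i] = (unsigned char)(block[i] + k);
    }
}
```

The caller then decides whether the ciphertext goes to fwrite, a socket, or a buffer. In the real program a small wrapper around the DES call would take the place of toy_encrypt.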

Quick-sorting with multiple threads leaves last 1/6 unsorted

My goal is to create a program that takes a large list of unsorted integers (1-10 million) and divides it into 6 parts, with one thread sorting each part concurrently. After sorting, I merge the parts into one sorted array so I can find the median and mode more quickly.
The input file will be something like this:
# 1000000
314
267
213
934
where the number following the # identifies the number of integers in the list.
Currently I can sort perfectly and quickly without threading; however, when I began threading I ran into an issue. For a data set of 1,000,000 integers it only sorts the first 833,333, leaving the last 166,666 (1/6) unsorted.
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>
#define BUF_SIZE 1024
int sum; /* this data will be shared by the thread(s) */
int * bigArr;
int size;
int findMedian(int array[], int size)
{
if (size % 2 != 0)
return array[size / 2];
return (array[(size - 1) / 2] + array[size / 2]) / 2;
}
/*compare function for quicksort*/
int _comp(const void* a, const void* b) {
return ( *(int*)a - *(int*)b);
}
/*This function is the problem method*/
/*indicate range of array to be processed with the index(params)*/
void *threadFct(int param)
{
int x= size/6;
if(param==0)x= size/6;
if(param>0&&param<5)x= (size/6)*param;
if(param==5)x= (size/6)*param+ (size%size/6);/*pass remainder into last thread*/
qsort((void*)bigArr, x, sizeof(bigArr[param]), _comp);
pthread_exit(0);
}
int main(int argc, char *argv[])
{
FILE *source;
int i =0;
char buffer[BUF_SIZE];
if(argc!=2){
printf("Error. please enter ./a followed by the file name");
return -1;}
source= fopen(argv[1], "r");
if (source == NULL) { /*reading error msg*/
printf("Error. File not found.");
return 1;
}
int count= 0;
while (!feof (source)) {
if (fgets(buffer, sizeof (buffer), source)) {
if(count==0){ /*Convert string to int using atoi*/
char str[1];
sprintf(str, "%c%c%c%c%c%c%c%c%c",buffer[2],buffer[3],buffer[4],buffer[5],buffer[6],buffer[7],buffer[8],buffer[9],buffer[10]);/*get string of first */
size= atoi(str); /* read the size of file--> FIRST LINE of file*/
printf("SIZE: %d\n",size);
bigArr= malloc(size*sizeof(int));
}
else{
//printf("[%d]= %s\n",count-1, buffer); /*reads in the rest of the file*/
bigArr[count-1]= atoi(buffer);
}
count++;
}
}
/*thread the unsorted array*/
pthread_t tid[6]; /* the thread identifier */
pthread_attr_t attr; /* set of thread attributes */
// qsort((void*)bigArr, size, sizeof(bigArr[0]), _comp); <---- sorts array without threading
for(i=0; i<6;i++){
pthread_create(&tid[i], NULL, &threadFct, i);
pthread_join(tid[i], NULL);
}
printf("Sorted array:\n");
for(i=0; i<size;i++){
printf("%i \n",bigArr[i]);
}
fclose(source);
}
So, to clarify, the problem is in my threadFct().
To explain what the function is doing: the param (thread number) identifies which chunk of the array to quicksort. I divide the size into 6 parts, and because the size may not divide evenly, the remainder of the numbers goes into the last chunk. So, for example, with 1,000,000 integers the first five threads would each sort 166,666 and the last would sort the rest (166,670).
I am aware that:
- Multi-threading will not speed things up much at all, even for 10 million integers
- This is not the most efficient way to find the median/mode
Thanks for reading this; any help is received with gratitude.
You're sorting the beginning of the array in every call to qsort. You're only changing the number of elements that each thread sorts, by setting x. You're also setting x to the same value in threads 0 and 1.
You need to calculate an offset into the array for each thread, which is just size/6 * param. The number of elements will be size/6 except for the last chunk, which also picks up the remainder (size % 6).
As mentioned in the comments, the argument to the thread function should be a pointer, not int. You can hide an integer in the pointer, but you need to use explicit casts.
#include <stdint.h> /* for intptr_t */

void *threadFct(void* param_ptr)
{
    /* go through intptr_t so the pointer-to-int conversion is well-defined and warning-free */
    int param = (int)(intptr_t)param_ptr;
    int start = size/6 * param;
    int length;
    if (param < 5) {
        length = size/6;
    } else {
        length = size - 5 * (size/6);
    }
    qsort((void*)(bigArr+start), length, sizeof(*bigArr), _comp);
    pthread_exit(0);
}
and later
pthread_create(&tid[i], NULL, &threadFct, (void*)(intptr_t)i);
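One more thing to watch: the loop in the question calls pthread_join right after each pthread_create, so the six sorts run one after another rather than concurrently. Below is a minimal sketch of starting all threads before joining any of them. It passes a pointer to a small struct instead of hiding an integer in the pointer; the names chunk, sort_chunk, sort_in_parts and compare_ints are made up for this example.

```c
#include <pthread.h>
#include <stdlib.h>

enum { PARTS = 6 };

struct chunk {
    int   *base;    /* first element of this thread's slice */
    size_t length;  /* number of elements in the slice */
};

/* Comparator that avoids the overflow a plain subtraction can cause. */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

static void *sort_chunk(void *arg)
{
    struct chunk *c = arg;
    qsort(c->base, c->length, sizeof *c->base, compare_ints);
    return NULL;
}

/* Sort PARTS contiguous slices of `array` concurrently. Returns 0 on
   success. The sorted slices still have to be merged afterwards. */
static int sort_in_parts(int *array, size_t size)
{
    pthread_t tid[PARTS];
    struct chunk chunks[PARTS];
    size_t per = size / PARTS;
    int started = 0;
    int status = 0;

    /* Start every thread first ... */
    for (int i = 0; i < PARTS; ++i) {
        chunks[i].base = array + (size_t)i * per;
        /* the last slice also takes the remainder */
        chunks[i].length = (i < PARTS - 1) ? per : size - (PARTS - 1) * per;
        if (pthread_create(&tid[i], NULL, sort_chunk, &chunks[i]) != 0) {
            status = -1;
            break;
        }
        ++started;
    }
    /* ... and only then wait for them. */
    for (int i = 0; i < started; ++i) {
        pthread_join(tid[i], NULL);
    }
    return status;
}
```

The chunks array has to stay alive until every thread has been joined, which is why it lives in the same function that does the joining.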

Pthreads and recursion

I'm struggling with one of my training tasks for getting in touch with a new language. Unfortunately, this time the new language is an old one: C. My programming task is to generate Langford strings, which should not be the main problem.
My first attempt in C, with a recursive approach, works like a charm:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int grade = 0;
const char* blank = "_";
const char* alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
void generate(int position, char* string) {
if (!string) {
string = calloc(grade*2+1, sizeof(char));
for (int i = 0; i < grade*2; i++) {
string = strcat(string, blank);
}
}
if (!strstr(string, blank)) {
printf("%s\n", string);
return;
}
if (position < strlen(string)) {
if (string[position] != *blank) {
char* nstring = calloc(grade*2+1, sizeof(char));
strcpy(nstring, string);
generate(position+1, nstring);
free(nstring);
return;
} else {
for (int i = 0; i<strlen(string); i++) {
if (strchr(string, alphabet[i])){
continue;
}
int index = strcspn(alphabet, &alphabet[i])+1;
if (position+index+1<strlen(string)) {
if (string[position]==*blank) {
if (string[position+index+1]==*blank) {
char* nstring = calloc(grade*2+1, sizeof(char));
strncat(nstring, string, position);
strncat(nstring, &alphabet[i], 1);
strncat(nstring, &string[position+1], index);
strncat(nstring, &alphabet[i], 1);
strcat(nstring, &string[position+2+index]);
if (position<strlen(nstring)) {
generate(position+1, nstring);
}
free(nstring);
}
}
}
}
}
}
}
int main(int argc, char* argv[]) {
if (argc < 2) {
printf("Missing parameter of langford strings grade!\n");
return 1;
}
grade = strtol(argv[1], NULL, 10);
if (grade % 4 != 0) {
if ((grade+1) % 4 != 0) {
printf("Grade must be multiple of 4 or one less\n");
return 1;
}
}
generate(0, NULL);
return 0;
}
That works great, giving me exactly the results I expected.
But when I try to do it threaded (old-style threading, spawning a new thread on each level of the recursion), it not only ends with a segfault every time, it also does so at an unpredictable point. That is, it runs for an indeterminate time, printing doubled and tripled results, and always a random number of results, before segfaulting.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>
#include <signal.h>
#include <errno.h>
size_t grade = 0;
const char* blank = "_";
const char* alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
struct thread {
pthread_t* thread;
size_t position;
struct thread* threads[26];
char* string;
};
void* alloc_thread_data() {
struct thread* ret = calloc(1, sizeof(struct thread));
ret->thread = calloc(1, sizeof(pthread_t));
ret->position = 0;
ret->string = calloc((grade*2)+1, sizeof(char));
return (void*)ret;
}
void free_thread_data(struct thread* dst) {
free(dst->string);
free(dst->thread);
free(dst);
}
void assemble_string(char* dst, char* src, size_t pos, size_t index) {
strncat(dst, src, pos);
strncat(dst, &alphabet[index-1], 1);
strncat(dst, &src[pos+1], index);
strncat(dst, &alphabet[index-1], 1);
strncat(dst, &src[pos+2+index], (grade*2)-pos+index+2);
}
void* generate(void* data) {
struct thread* args = (struct thread*)data;
if (args->string && strlen(args->string)==0) {
for (size_t i = 0; i<grade*2; i++) {
strcat(args->string, blank);
}
}
if (args->string && !strstr(args->string, blank)) {
printf("%s\n", args->string);
return NULL;
}
if (args->string && args->position<strlen(args->string)) {
size_t sub = 0;
if (args->string[args->position]!=*blank) {
args->threads[sub] = alloc_thread_data();
strcpy(args->threads[sub]->string, args->string);
args->threads[sub]->position = args->position+1;
pthread_create(args->threads[sub]->thread, NULL, generate, (void*)args->threads[sub]);
sub++;
} else {
for (size_t i = 0; i<grade*2; i++) {
if (strchr(args->string, alphabet[i])){
continue;
}
int index = strcspn(alphabet, &alphabet[i])+1;
if (args->string[args->position] == *blank) {
if (args->string[args->position+index+1] == *blank) {
args->threads[sub] = alloc_thread_data();
assemble_string(args->threads[sub]->string, args->string, args->position, index);
args->threads[sub]->position = args->position+1;
pthread_create(args->threads[sub]->thread, NULL, generate, (void*)args->threads[sub]);
sub++;
}
}
}
}
for (size_t i = 0; i<sub; i++) {
if (args->threads[i]->thread!=NULL) {
if(pthread_kill(*args->threads[i]->thread, 0)==0) {
pthread_join(*args->threads[i]->thread, NULL);
}
free_thread_data(args->threads[i]);
}
}
}
return NULL;
}
int main(int argc, char* argv[]) {
if (argc < 2) {
printf("Missing parameter of langford strings grade!\n");
return 1;
}
grade = strtol(argv[1], NULL, 10);
if (grade % 4 != 0) {
if ((grade+1) % 4 != 0) {
printf("Grade must be multiple of 4 or one less\n");
return 2;
}
}
struct thread* args = alloc_thread_data();
pthread_create(args->thread, NULL, generate, (void*)args);
if(pthread_kill(*args->thread, 0)==0) {
pthread_join(*args->thread, NULL);
}
free_thread_data(args);
}
So, as written before, I have managed to avoid C programming for my whole working life and am doing this just for fun, so I do not expect my code to be anywhere near comprehensive. Please help me find out what is wrong with the threaded approach (and point out any well-known code smells in the first version as well, of course). Any hints are welcome.
In addition to the bad allocation that @EugeneSh. pointed out, this looks like a problem:
pthread_create(args->threads[sub]->thread, NULL, generate,
(void*)&args->threads[sub]);
Note the difference from this other call that also appears:
pthread_create(args->threads[sub]->thread, NULL, generate,
(void*)args->threads[sub]);
[newlines inserted and indentation normalized for clarity and ease of reading].
args->threads[sub] is a struct thread*. You want to pass that pointer itself to pthread_create(), as in the second case, not its address, as in the first case.
Overall, I'm inclined to agree with @MikeRobinson that yours is an inappropriate use of threads. It is never useful performance-wise to have more schedulable threads in your process than you have cores, and you scale up to many thousands of total threads very quickly. I doubt very much that the result will outperform your single-threaded solution -- the costs of the context switching and cache thrashing that surely result will likely swamp whatever speedup you get from parallel execution on the 4 - 12 cores you probably have.
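To illustrate the point about keeping the thread count near the core count, here is a minimal sketch, not the asker's program. It counts Langford arrangements rather than printing them, uses the numbers 1..n instead of letters, and splits only the choices for the first slot across a fixed set of workers; everything below that level is ordinary recursion on a private board. The names count_from, worker_main, count_langford, MAX_ORDER and NUM_WORKERS are made up for this example, and NUM_WORKERS = 4 is just an assumed core count.

```c
#include <pthread.h>

#define MAX_ORDER 12     /* hypothetical upper bound for this sketch */
#define NUM_WORKERS 4    /* assumption: roughly the number of cores */

static int order;        /* written once, before any worker starts */

/* Plain recursion, no threads: fill the first empty slot with every unused
 * number k whose partner slot (k slots in between) is still empty. */
static long count_from(int *slots, unsigned used, int pos) {
    int len = 2 * order;
    while (pos < len && slots[pos] != 0)
        pos++;
    if (pos == len)
        return 1;                           /* every slot filled */
    long total = 0;
    for (int k = 1; k <= order; k++) {
        int partner = pos + k + 1;
        if ((used & (1u << k)) || partner >= len || slots[partner] != 0)
            continue;
        slots[pos] = slots[partner] = k;
        total += count_from(slots, used | (1u << k), pos + 1);
        slots[pos] = slots[partner] = 0;    /* undo before the next try */
    }
    return total;
}

struct job {
    int worker;    /* index 0..NUM_WORKERS-1 */
    long count;    /* result, read by the main thread after the join */
};

/* Each worker owns a private board and handles the choices for slot 0
 * with k = worker+1, worker+1+NUM_WORKERS, ... */
static void *worker_main(void *arg) {
    struct job *job = arg;
    for (int k = job->worker + 1; k <= order; k += NUM_WORKERS) {
        int slots[2 * MAX_ORDER] = {0};
        if (k + 1 >= 2 * order)
            continue;
        slots[0] = slots[k + 1] = k;
        job->count += count_from(slots, 1u << k, 1);
    }
    return NULL;
}

/* Counts all arrangements (mirror images are counted separately).
 * Returns -1 if n is outside the range this sketch supports. */
long count_langford(int n) {
    pthread_t tid[NUM_WORKERS];
    struct job jobs[NUM_WORKERS];
    int created[NUM_WORKERS];
    long total = 0;

    if (n < 1 || n > MAX_ORDER)
        return -1;
    order = n;
    for (int w = 0; w < NUM_WORKERS; w++) {
        jobs[w].worker = w;
        jobs[w].count = 0;
        created[w] = pthread_create(&tid[w], NULL, worker_main, &jobs[w]) == 0;
        if (!created[w])
            worker_main(&jobs[w]);          /* fall back to running it inline */
    }
    for (int w = 0; w < NUM_WORKERS; w++) {
        if (created[w])
            pthread_join(tid[w], NULL);
        total += jobs[w].count;
    }
    return total;
}
```

Calling count_langford(7) from main() would start at most four threads in total, no matter how deep the recursion goes. Whether this actually beats the single-threaded version for small grades is something to measure rather than assume.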
Added:
Additionally, it is very important to check the values returned by your function calls for error codes, unless you don't care and don't need to care whether the calls succeed. In particular, you should check:
the return values of your malloc() / calloc() calls -- these return NULL in the event of unsuccessful allocation, and with as many total allocations as you perform, it is plausible that some of these fail. Using the resulting NULL pointer could easily lead to a segfault.
the return values of your pthread_create() calls -- these return a value different from 0 in the event of failure. It is not safe to afterward rely on pthread_kill() to determine whether the thread was created successfully, for a failed pthread_create() leaves the thread handle's contents undefined. Any subsequent evaluation that depends on the value of the handle therefore exhibits undefined behavior.
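A minimal sketch of what those checks could look like. The struct task and the helpers start_task and finish_task are hypothetical names for this example, not part of the posted code. The idea is to record whether pthread_create() succeeded in a flag of your own, instead of probing the handle afterwards with pthread_kill(), and to pass the struct pointer itself as the thread argument.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct task {
    pthread_t tid;
    int started;    /* 1 only if pthread_create() returned 0 */
    int input;
    int result;
};

static void *run_task(void *data) {
    struct task *t = data;      /* the pointer itself, not its address */
    t->result = t->input * 2;   /* stand-in for the real work */
    return NULL;
}

/* Returns NULL if the allocation fails. If thread creation fails, the
 * task is still returned, with started == 0. */
struct task *start_task(int input) {
    struct task *t = calloc(1, sizeof *t);
    if (t == NULL) {
        perror("calloc");
        return NULL;
    }
    t->input = input;
    int rc = pthread_create(&t->tid, NULL, run_task, t);
    if (rc != 0) {
        /* pthread functions return the error number instead of setting errno */
        fprintf(stderr, "pthread_create: %s\n", strerror(rc));
        t->started = 0;         /* t->tid must not be used from here on */
    } else {
        t->started = 1;
    }
    return t;
}

/* Joins the thread if it was started, frees the task, and returns 1 if
 * *result was filled in, 0 otherwise. */
int finish_task(struct task *t, int *result) {
    int ok = 0;
    if (t->started && pthread_join(t->tid, NULL) == 0) {
        *result = t->result;
        ok = 1;
    }
    free(t);
    return ok;
}
```

With this shape, the joining code never has to guess whether a handle is valid: it only looks at the started flag that was set from the return value of pthread_create().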
I'm also a little suspicious of all your strncat()ing, for this is a notorious source of string overruns. These are ok if the target strings have enough capacity, but it's difficult for me to tell whether they always do in your case.
May I cordially suggest that it makes absolutely no sense (to me, at least ...) to "spawn a new thread" here?
The only reason to "spawn a thread" is when you wish to perform two activities that are independent of each other from then on, which this algorithm quite clearly does not.
The immediate reason for the segfault is that the various threads are all attempting to manipulate the same data without regard to one another, and without waiting for one another. But, IMHO, the root cause of the problem is ... that this entire scenario is nonsense. "Recursion" and "multi-threading" are not at all the same thing. If your objective here was to learn about threading, I'm afraid that you've just learned far more (the very-hard way) than you ever wished to know . . .

Multiple producer single consumer with Circular Buffer

I need help getting the following to work.
I have multiple producer threads, each writing, say, 100 bytes of data to a ring buffer.
A single reader (consumer) thread reads 100 bytes at a time and writes them to stdout. (Eventually I want to write to files based on the data.)
With this implementation, the data read from the ring buffer is sometimes wrong; see below.
Since the ring buffer is small, it fills up and some of the data is lost. This is not my current problem.
Questions:
1. When printing the data that is read from the ring buffer, some of the data gets interchanged! I'm unable to find the bug.
2. Is the logic/approach correct, or is there a better way to do this?
ringbuffer.h
#define RING_BUFFER_SIZE 500
struct ringbuffer
{
char *buffer;
int wr_pointer;
int rd_pointer;
int size;
int fill_count;
};
ringbuffer.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "ringbuffer.h"
int init_ringbuffer(char *rbuffer, struct ringbuffer *rb, size_t size)
{
rb->buffer = rbuffer;
rb->size = size;
rb->rd_pointer = 0;
rb->wr_pointer = 0;
rb->fill_count = 0;
return 0;
}
int rb_get_free_space (struct ringbuffer *rb)
{
return (rb->size - rb->fill_count);
}
int rb_write (struct ringbuffer *rb, unsigned char * buf, int len)
{
int availableSpace;
int i;
availableSpace = rb_get_free_space(rb);
printf("In Write AVAIL SPC=%d\n",availableSpace);
/* Check if Ring Buffer is FULL */
if(len > availableSpace)
{
printf("NO SPACE TO WRITE - RETURN\n");
return -1;
}
i = rb->wr_pointer;
if(i == rb->size) //At the end of Buffer
{
i = 0;
}
else if (i + len > rb->size)
{
memcpy(rb->buffer + i, buf, rb->size - i);
buf += rb->size - i;
len = len - (rb->size - i);
rb->fill_count += len;
i = 0;
}
memcpy(rb->buffer + i, buf, len);
rb->wr_pointer = i + len;
rb->fill_count += len;
printf("w...rb->write=%tx\n", rb->wr_pointer );
printf("w...rb->read=%tx\n", rb->rd_pointer );
printf("w...rb->fill_count=%d\n", rb->fill_count );
return 0;
}
int rb_read (struct ringbuffer *rb, unsigned char * buf, int max)
{
int i;
printf("In Read,Current DATA size in RB=%d\n",rb->fill_count);
/* Check if Ring Buffer is EMPTY */
if(max > rb->fill_count)
{
printf("In Read, RB EMPTY - RETURN\n");
return -1;
}
i = rb->rd_pointer;
if (i == rb->size)
{
i = 0;
}
else if(i + max > rb->size)
{
memcpy(buf, rb->buffer + i, rb->size - i);
buf += rb->size - i;
max = max - (rb->size - i);
rb->fill_count -= max;
i = 0;
}
memcpy(buf, rb->buffer + i, max);
rb->rd_pointer = i + max;
rb->fill_count -= max;
printf("r...rb->write=%tx\n", rb->wr_pointer );
printf("r...rb->read=%tx\n", rb->rd_pointer );
printf("DATA READ ---> %s\n",(char *)buf);
printf("r...rb->fill_count=%d\n", rb->fill_count );
return 0;
}
The producers also need to wait on a condition variable for the "has empty space" condition. Both condition variables should be signaled unconditionally: when a consumer removes an element from the ring buffer it should signal the producers, and when a producer puts something in the buffer it should signal the consumers.
Also, I would move this waiting/signaling logic into the rb_read and rb_write implementations, so that your ring buffer is a complete, ready-to-use solution for the rest of your program.
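A minimal sketch of that idea, assuming a mutex plus two condition variables live inside the buffer itself. The type sync_rb and the functions srb_init, srb_write, srb_read and srb_demo are hypothetical names, not the asker's API. RB_CAPACITY matches RING_BUFFER_SIZE from the question, while the 70-byte record size is an assumption picked so that writes straddle the end of the buffer. The sketch uses pthread_cond_broadcast() rather than pthread_cond_signal(), which is the more cautious choice when several producers may be waiting.

```c
#include <pthread.h>
#include <string.h>

#define RB_CAPACITY 500     /* same as RING_BUFFER_SIZE in the question */
#define RECORD_SIZE 70      /* assumption: does not divide RB_CAPACITY */
#define NUM_PRODUCERS 3
#define RECORDS_EACH 200

struct sync_rb {
    unsigned char data[RB_CAPACITY];
    size_t rd, wr, fill;
    pthread_mutex_t lock;
    pthread_cond_t not_full;    /* producers wait here for free space */
    pthread_cond_t not_empty;   /* the consumer waits here for data */
};

void srb_init(struct sync_rb *rb) {
    rb->rd = rb->wr = rb->fill = 0;
    pthread_mutex_init(&rb->lock, NULL);
    pthread_cond_init(&rb->not_full, NULL);
    pthread_cond_init(&rb->not_empty, NULL);
}

/* Blocks until len bytes fit. len must not exceed RB_CAPACITY. */
void srb_write(struct sync_rb *rb, const unsigned char *buf, size_t len) {
    pthread_mutex_lock(&rb->lock);
    while (RB_CAPACITY - rb->fill < len)
        pthread_cond_wait(&rb->not_full, &rb->lock);
    size_t first = RB_CAPACITY - rb->wr;    /* bytes before the wrap */
    if (first > len)
        first = len;
    memcpy(rb->data + rb->wr, buf, first);
    memcpy(rb->data, buf + first, len - first);
    rb->wr = (rb->wr + len) % RB_CAPACITY;
    rb->fill += len;
    pthread_cond_broadcast(&rb->not_empty);
    pthread_mutex_unlock(&rb->lock);
}

/* Blocks until len bytes are available. len must not exceed RB_CAPACITY. */
void srb_read(struct sync_rb *rb, unsigned char *buf, size_t len) {
    pthread_mutex_lock(&rb->lock);
    while (rb->fill < len)
        pthread_cond_wait(&rb->not_empty, &rb->lock);
    size_t first = RB_CAPACITY - rb->rd;
    if (first > len)
        first = len;
    memcpy(buf, rb->data + rb->rd, first);
    memcpy(buf + first, rb->data, len - first);
    rb->rd = (rb->rd + len) % RB_CAPACITY;
    rb->fill -= len;
    pthread_cond_broadcast(&rb->not_full);
    pthread_mutex_unlock(&rb->lock);
}

struct producer_arg { struct sync_rb *rb; unsigned char tag; };

static void *producer_main(void *p) {
    struct producer_arg *arg = p;
    unsigned char record[RECORD_SIZE];
    memset(record, arg->tag, sizeof record);    /* record = repeated tag */
    for (int i = 0; i < RECORDS_EACH; i++)
        srb_write(arg->rb, record, sizeof record);
    return NULL;
}

/* Runs the producers against one consumer (the calling thread) and returns
 * how many records arrived intact, i.e. with all bytes equal. Meant to be
 * called once; returns -1 if a thread cannot be created. */
int srb_demo(void) {
    static struct sync_rb rb;
    pthread_t tid[NUM_PRODUCERS];
    struct producer_arg args[NUM_PRODUCERS];
    int intact = 0;

    srb_init(&rb);
    for (int i = 0; i < NUM_PRODUCERS; i++) {
        args[i].rb = &rb;
        args[i].tag = (unsigned char)('A' + i);
        if (pthread_create(&tid[i], NULL, producer_main, &args[i]) != 0)
            return -1;
    }
    for (int n = 0; n < NUM_PRODUCERS * RECORDS_EACH; n++) {
        unsigned char record[RECORD_SIZE];
        int same = 1;
        srb_read(&rb, record, sizeof record);
        for (size_t j = 1; j < sizeof record; j++)
            same = same && record[j] == record[0];
        intact += same;
    }
    for (int i = 0; i < NUM_PRODUCERS; i++)
        pthread_join(tid[i], NULL);
    return intact;
}
```

Because each record is copied in and out entirely under the lock, records from different producers cannot interleave, and nothing is dropped when the buffer is full: the producer simply sleeps until the consumer has made room. The demo also terminates on its own, since the consumer knows how many records to expect.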
As to your questions --
1. I can't find that bug either -- in fact, I've tried your code and don't see that behavior.
2. You ask if this is logic/approach correct -- well, as far as it goes, this does implement a kind of ring buffer. Your test case happens to have an integer multiple of the size, and the record size is constant, so that's not the best test.
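A test with a record size that does not divide the buffer size would exercise the wrap-around branch. As far as I can tell from the posted rb_write, that branch reduces len to the remaining byte count and then adds it to fill_count twice, so a record split as 30 + 70 bytes would be counted as 140 (rb_read has the mirror image). With 100-byte records in a 500-byte buffer the branch is never taken, which would explain why it goes unnoticed. Below is a minimal single-threaded sketch of the accounting done once per call; rb_write_wrapped and rb_read_wrapped are hypothetical names, and the struct simply repeats the fields from ringbuffer.h.

```c
#include <string.h>

struct ringbuffer {             /* same fields as in the question */
    char *buffer;
    int wr_pointer;
    int rd_pointer;
    int size;
    int fill_count;
};

/* Returns 0 on success, -1 if there is not enough free space. */
int rb_write_wrapped(struct ringbuffer *rb, const unsigned char *buf, int len) {
    if (len > rb->size - rb->fill_count)
        return -1;
    int first = rb->size - rb->wr_pointer;  /* room before the end */
    if (first > len)
        first = len;
    memcpy(rb->buffer + rb->wr_pointer, buf, first);
    memcpy(rb->buffer, buf + first, len - first);   /* wrapped part, may be 0 */
    rb->wr_pointer = (rb->wr_pointer + len) % rb->size;
    rb->fill_count += len;                  /* counted once, in full */
    return 0;
}

/* Returns 0 on success, -1 if fewer than len bytes are stored. */
int rb_read_wrapped(struct ringbuffer *rb, unsigned char *buf, int len) {
    if (len > rb->fill_count)
        return -1;
    int first = rb->size - rb->rd_pointer;
    if (first > len)
        first = len;
    memcpy(buf, rb->buffer + rb->rd_pointer, first);
    memcpy(buf + first, rb->buffer, len - first);
    rb->rd_pointer = (rb->rd_pointer + len) % rb->size;
    rb->fill_count -= len;
    return 0;
}
```

Writing 7-byte records into a 10-byte buffer, for example, forces the second write to be split 3 + 4 across the end. This sketch does nothing about thread safety; it only addresses the bookkeeping.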
In trying your code, I found that there is a lot of thread starvation -- the 1st producer thread to run (the last created) hits things really hard, trying and failing after the 1st 5 times to stuff things into the buffer, not giving the consumer thread a chance to run (or even start). Then, when the consumer thread starts, it stays cranking for quite some time before it releases the cpu, and the next producer thread finally starts. That's how it works on my machine -- it will be different on different machines, I'm sure.
It's too bad that your current code doesn't have a way to end -- creating files of 10's or 100's of MB ... hard to wade through.
(Probably a bit late for the author, but in case anyone else searches for "multiple producers single consumer".)
I think the fundamental problem in that implementation is that rb_write modifies shared state (rb->fill_count and the other rb-> fields) without any synchronization between the multiple writers.
For alternative ideas, see http://www.linuxjournal.com/content/lock-free-multi-producer-multi-consumer-queue-ring-buffer.

Resources