I am trying to implement a file copy program with the POSIX asynchronous I/O (AIO) APIs on Linux.
I tried this:
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    char data[200];
    int fd = open("data.txt", O_RDONLY); // text file on the disk
    struct aiocb aio;
    memset(&aio, 0, sizeof(struct aiocb)); // clear the control block first, then fill it in
    aio.aio_fildes = fd;
    aio.aio_buf = data;
    aio.aio_nbytes = sizeof(data);
    aio.aio_offset = 0;
    aio_read(&aio);
    int counter = 0;
    while (aio_error(&aio) == EINPROGRESS) {
        printf("counter: %d\n", counter++);
    }
    int ret = aio_return(&aio);
    printf("ret value %d\n", ret);
    return 0;
}
But counter gives a different result every time I run the program.
Is it possible to display progress of aio_read and aio_write functions?
You get different results because each execution has its own context, which may differ from the others. (Do you always follow the exact same path from your house to the bank? Even if you do, is the time it takes always exactly the same? Same task in the end, you reach the bank, but every trip is a different execution.) What your program measures is not I/O completion but the number of times it polled for completion of the asynchronous I/O before it finished.
And no, there is no concept of percentage of completion for a given asynchronous I/O request.
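If the goal is simply to wait for the request to finish without burning CPU in a loop, a minimal sketch (assuming the same aiocb as in the question, already submitted with aio_read()) could block in aio_suspend() instead:

/* Minimal sketch, assuming the aiocb from the question is named `aio` and
 * aio_read(&aio) has already been issued. aio_suspend() puts the thread to
 * sleep until at least one of the listed requests completes, avoiding the
 * busy-wait loop. */
#include <aio.h>
#include <stdio.h>

static void wait_for_request(struct aiocb *req)
{
    const struct aiocb *list[1] = { req };

    while (aio_error(req) == EINPROGRESS) {
        aio_suspend(list, 1, NULL);   /* sleep until the request completes */
    }
    printf("request finished, transferred %zd bytes\n", aio_return(req));
}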
I am really new to pthreads and the time functions, and I am currently doing a homework assignment in which I have to send packets of strings at specific times using pthread_cond_timedwait(). The call is made in a thread running the sendPackets() function, which sends all packets to the target IP. The thread initializes just fine, but after storing the time at which I would like the thread to unblock and passing it as an argument to the timed wait, the function returns the ETIMEDOUT error. I am aware that my condition variable could be (and probably is) the reason why it is timing out. I have tried to research this function, but no matter how much searching I did I haven't found a solution to my problem (probably because of something simple I overlooked).
The mutex object and the pthread_cond_t object are global variables so that all threads can access them. I also defined a struct to hold information about the set of packets that I'm sending:
struct info{
int socket;
int size;
int count;
float interval;
struct sockaddr_in echoServAddr;
};
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t thread = PTHREAD_COND_INITIALIZER;
After the command-line arguments are read in (these determine things such as packet count, interval, size in bytes, and server port), I check whether the program was invoked as a server or a client (the program is supposed to work as either, depending on the presence of the -S flag). If it is running as a client, it enters the following if statement and starts the sendPackets() thread. An info pointer is created, initialized, and cast to a void pointer in order to pass arguments to the sendPackets() function.
if(isClient){
/*Create a datagram UDP socket*/
if((sock = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP)) < 0){
DieWithError("socket() failed\n");
}
/* Construct the server address structure */
memset(&echoServAddr, 0, sizeof(echoServAddr));
echoServAddr.sin_family = AF_INET;
echoServAddr.sin_addr.s_addr = inet_addr(servIP);
echoServAddr.sin_port = htons(echoServPort);
enum threads {sender=0,receiver};
struct info *packets = (struct info*)malloc(sizeof(struct info));
packets->size = size;
packets->count = ping_packet_count;
packets->socket = sock;
packets->echoServAddr = echoServAddr;
packets->interval = ping_interval;
pthread_t tid[2];
int a,b; //Thread creation return variables
a = pthread_create(&(tid[sender]),NULL,&sendPackets,(void*)packets);
pthread_join(tid[sender], NULL);
//pthread_join(tid[receiver], NULL);
pthread_mutex_destroy(&lock);
}
Once the thread begins, it acquires the lock and proceeds to carry out its code. startTime is the time at which the program began processing packets. currentTime represents the time at which the program calculates when to send the next packet, and sendTime is the start time plus the delay for each packet (sendTime = startTime + [id#] * packet_interval). After testing the code a bit, I've noticed the program doesn't time out until the time specified by sendTime, which further suggests to me that I am just doing something wrong with my condition variable, since I'm so unfamiliar with them. Last little note: clk_id is a macro I set to CLOCK_REALTIME.
void* sendPackets(void *p){
printf("Starting sendPackets function...\n");
pthread_mutex_lock(&lock);
printf("Sender has aquired lock\n\n");
struct info *packet = (struct info*)p;
printf("Packet Details: Socket: %d Size %d Count %d Interval:%f\n\n",packet->socket,packet->size,packet->count,packet->interval);
struct timespec startTime = {0};
for(int i = 0; i < packet->count; i++){
struct timespec sendTime = {0};
struct timespec currentTime = {0};
float delay = packet->interval * i;
int delayInt = (int) delay;
unsigned char echoString[packet->size];
char strbffr[200] = "";
inet_ntop(AF_INET,&(packet->echoServAddr.sin_addr.s_addr),strbffr,200*sizeof(char));
sendTime.tv_sec = 0;
sendTime.tv_nsec = 0;
printf("PacketID:%d Delay:%f DelayInt:%d\n",i,delay,delayInt);
if(i == 0){
clock_gettime(clk_id,&startTime);
startTime.tv_sec+=1;
}
clock_gettime(clk_id,&currentTime);
sendTime.tv_sec = startTime.tv_sec + delayInt;
sendTime.tv_nsec = startTime.tv_nsec + (int)((delay - delayInt) * 1000000000);
printf("startTime: tv_sec = %d tv_nsec = %d\n",(int)startTime.tv_sec,(int)startTime.tv_nsec);
printf("sendTime: tv_sec = %d tv_nsec = %d\n",(int)sendTime.tv_sec,(int)sendTime.tv_nsec);
printf("currentTime: tv_sec = %d tv_nsec = %d\n\n",(int)currentTime.tv_sec,(int)currentTime.tv_nsec);
int r_wait;
if((r_wait = pthread_cond_timedwait(&thread,&lock,&sendTime)) != 0){
clock_gettime(clk_id,&currentTime);
printf("currentTime: tv_sec = %d tv_nsec = %d\n\n",(int)currentTime.tv_sec,(int)currentTime.tv_nsec);
printf("Received error for timedwait:%s\n",strerror(r_wait));
exit(1);
}
if (sendto(packet->socket, echoString, packet->size, 0, (struct sockaddr *) &packet->echoServAddr, sizeof(packet->echoServAddr)) != packet->size){
DieWithError("sendto() sent a different number of bytes than expected\n");
}
printf("Sent %d to IP:%s\n",i,strbffr);
}
for(int i = 0; i < packet->count; i++){
unsigned char echoString[packet->size];
char strbffr[200] = "";
inet_ntop(AF_INET,&(packet->echoServAddr.sin_addr.s_addr),strbffr,200*sizeof(char));
if (sendto(packet->socket, echoString, packet->size, 0, (struct sockaddr *) &packet->echoServAddr, sizeof(packet->echoServAddr)) != packet->size){
DieWithError("sendto() sent a different number of bytes than expected\n");
}
printf("Sent %d to IP:%s\n",i,strbffr);
}
pthread_mutex_unlock(&lock);
printf("Sender has released lock\n");
printf("Yielding Sender\n\n");
sched_yield();
return NULL;
}
I am aware that this is a lot of stuff to take in. If there is any other part of my code that you would like to take a look at that I haven't mentioned then please feel free to post a comment stating what you would like to see. I'm pretty confident this is every data structure in my code that is relevant to the issue, however, I could always be wrong.
Here is an image of the output of my program from the print statements I have listed.
You appear to be using pthread_cond_timedwait() as a timer: you don't expect the CV to be signaled (which would terminate the wait early), but rather for the calling thread to be suspended for the full specified timeout.
In that case, ETIMEDOUT is exactly what you should expect when everything works as intended. You should check for that value and accept it, and perform appropriate error handling if you see anything else. In particular, pthreads condition variables can exhibit spurious wakeups, so if your pthread_cond_timedwait() ever returns zero you need to loop back and wait again to ensure that the full timeout elapses before you proceed.
In short, you should not view an ETIMEDOUT return code as indicating that something went wrong, but rather that (for your particular purposes) everything went right.
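As a minimal sketch of that pattern, reusing the question's global lock and thread condition variable and an absolute CLOCK_REALTIME deadline already stored in sendTime (errno.h, pthread.h, stdio.h, and string.h assumed to be included), the wait could look like this:

/* Minimal sketch: treat ETIMEDOUT as success and re-wait on spurious wakeups,
 * reusing the question's `lock`, `thread`, and absolute `sendTime` deadline. */
int rc = 0;
while (rc != ETIMEDOUT) {
    rc = pthread_cond_timedwait(&thread, &lock, &sendTime);
    if (rc != 0 && rc != ETIMEDOUT) {
        /* A real error (e.g. EINVAL): handle it instead of exiting blindly. */
        fprintf(stderr, "pthread_cond_timedwait: %s\n", strerror(rc));
        break;
    }
    /* rc == 0 means a (possibly spurious) wakeup: loop and wait again
     * with the same absolute deadline. */
}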
My whole system (Ubuntu 18.04) always freezes after around one hour when my C program continuously writes log data to files. Each file created is around 100 to 200 MB, and the total size of these files before the system goes down is around 40-60 GB. I still have more than 150 GB of SSD space available at that point.
I checked the system with System Monitor but couldn't find any problem. When my program runs, only one of the eight cores is at 100% usage; the others are pretty low. Before the system goes down, only 2.5 GB of 15.5 GB of memory is used. Every time I reboot the machine, the last 4-6 files created are empty, even though most of them were showing some size at the moment of freezing (it looks like they were never actually written to the SSD).
My C code can be simplified as below:
#define MEM_LEN 50000
#define FILE_LEN 10000*300
struct log_format {
long cnt;
long tv_sec;
long tv_nsec;
unsigned int user;
char rw;
char pathbuffer[256];
size_t count;
long long pos;
};
int main(int argc, const char *argv[])
{
int fd=0;
struct log_format *addr = NULL;
int i=0;
FILE *file;
char filestr[20];
int data_cnt = 0;
int file_cnt =0;
// open shared memory device //
fd = open("/dev/remap_pfn", O_RDWR);
if (fd < 0) {
perror("....open shared memory device1 failed\n");
exit(-1); }
// memory mapping to shared memory device //
addr = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_LOCKED, fd, OFFSET);
if (!addr) {
perror("....mmap1 failed\n");
exit(-1); }
// open a file //
sprintf(filestr, "%d.csv", file_cnt);
file = fopen(filestr, "w");
printf("%s created\n",filestr);
// continuously check the memory replacement of last, and write to file //
while(1){
fprintf(file, "%lu,%lu,%lu,%u,%c,%s,%zu,%lld\n", addr[i].cnt, addr[i].tv_sec,
addr[i].tv_nsec, addr[i].user, addr[i].rw, addr[i].pathbuffer,
addr[i].count, addr[i].pos);
i++;
data_cnt++;
if(i>=MEM_LEN)
i=0;
// when reaching a threshold, create another file to write //
if(data_cnt>=FILE_LEN){
data_cnt = 0;
fclose(file);
file_cnt++;
// open a file //
sprintf(filestr, "%d.csv", file_cnt);
file = fopen(filestr, "w");
printf("%s created\n",filestr);
}
}
fclose(file);
return 0;
}
I didn't find any error messages in syslog or kern.log. It just freezes.
Does anyone have any idea what the problem could be? Thanks.
I tried adding some delay to my while loop to slow down the writes (since even 1 nanosecond per iteration is too much, I only sleep once every 10 iterations):
while (1) {
    struct timespec ts = {0, 1L};
    if (data_cnt % 10 == 0)
        nanosleep(&ts, NULL);
    ......
}
The freeze problem seems to be gone now.
So... what might be the reason for this? For now, I have only seen the writes become slower and the CPU load on that core drop to 50%. Is there a write buffer in between whose limit my program exceeded, crashing the system?
(I will also keep checking whether it is an overheating problem bringing the machine down.)
My goal is to write some code to record the current call stack for all CPUs at some interval. Essentially I would like to do the same as perf record but using perf_event_open myself.
According to the manpage it seems I need to use the PERF_SAMPLE_CALLCHAIN sample type and read the results with mmap. That said, the manpage is incredibly terse, and some sample code would go a long way right now.
Can someone point me in the right direction?
The best way to learn about this would be to read the Linux kernel source code and see how you can emulate perf record -g yourself.
As you correctly identified, recording of perf events would start with the system call perf_event_open. So that is where we can start,
definition of perf_event_open
If you look at the parameters of the system call, you will see that the first one is of type struct perf_event_attr *. This is the parameter that carries the attributes for the system call, and it is what you need to modify to record callchains. Sample code could look like this (remember that you can tweak other parameters and members of struct perf_event_attr the way you want; since glibc provides no wrapper for this system call, the code invokes it via syscall(2)):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

/* perf_event_open has no glibc wrapper, so invoke it through syscall(2). */
static long perf_event_open(struct perf_event_attr *hw_event, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(__NR_perf_event_open, hw_event, pid, cpu, group_fd, flags);
}

int buf_size_shift = 8;

static unsigned perf_mmap_size(int buf_size_shift)
{
    return ((1U << buf_size_shift) + 1) * sysconf(_SC_PAGESIZE);
}
int main(int argc, char **argv)
{
struct perf_event_attr pe;
long long count;
int fd;
memset(&pe, 0, sizeof(struct perf_event_attr));
pe.type = PERF_TYPE_HARDWARE;
pe.sample_type = PERF_SAMPLE_CALLCHAIN; /* this is what allows you to obtain callchains */
pe.size = sizeof(struct perf_event_attr);
pe.config = PERF_COUNT_HW_INSTRUCTIONS;
pe.disabled = 1;
pe.exclude_kernel = 1;
pe.sample_period = 1000;
pe.exclude_hv = 1;
fd = perf_event_open(&pe, 0, -1, -1, 0);
if (fd == -1) {
fprintf(stderr, "Error opening leader %llx\n", pe.config);
exit(EXIT_FAILURE);
}
/* associate a buffer with the file */
struct perf_event_mmap_page *mpage;
mpage = mmap(NULL, perf_mmap_size(buf_size_shift),
PROT_READ|PROT_WRITE, MAP_SHARED,
fd, 0);
if (mpage == (struct perf_event_mmap_page *)-1L) {
close(fd);
return -1;
}
ioctl(fd, PERF_EVENT_IOC_RESET, 0);
ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
printf("Measuring instruction count for this printf\n");
ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
read(fd, &count, sizeof(long long));
printf("Used %lld instructions\n", count);
close(fd);
}
Note: A nice and easy way to understand the handling of all of these perf events can be seen below -
PMU-TOOLS by Andi Kleen
If you start reading the source code of the system call, you will see that it calls a function named perf_event_alloc. Among other things, this function sets up the buffer used to obtain callchains with perf record.
The function get_callchain_buffers is responsible for setting up callchain buffers.
perf_event_open works via a sampling/counting mechanism: when the performance monitoring counter corresponding to the event you are profiling overflows, the kernel collects all the relevant event information and stores it in a ring buffer. This ring buffer can be prepared and accessed via mmap(2).
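As a rough sketch of how such a callchain sample could be pulled out of that ring buffer (assuming the event above was opened with sample_type set to PERF_SAMPLE_CALLCHAIN only, and that mpage is the header page returned by mmap; wrap-around of a record at the end of the buffer is ignored for brevity, so this is not the full perf record logic):

/* Rough sketch: walk PERF_RECORD_SAMPLE records in the mmap'd ring buffer and
 * print each sample's callchain. Assumes sample_type == PERF_SAMPLE_CALLCHAIN
 * only, and `mpage` from the mmap() call above. */
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

static void print_callchain_samples(struct perf_event_mmap_page *mpage, size_t mmap_size)
{
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    uint8_t *data = (uint8_t *)mpage + page_size;   /* sample area starts after the header page */
    uint64_t data_size = mmap_size - page_size;
    uint64_t head = mpage->data_head;
    __sync_synchronize();                           /* pairs with the kernel's write barrier */

    for (uint64_t tail = mpage->data_tail; tail < head; ) {
        struct perf_event_header *eh =
            (struct perf_event_header *)(data + (tail % data_size));

        if (eh->type == PERF_RECORD_SAMPLE) {
            /* With only PERF_SAMPLE_CALLCHAIN set, the sample body is:
             *     u64 nr;  u64 ips[nr];                                */
            uint64_t *body = (uint64_t *)(eh + 1);
            uint64_t nr = body[0];
            for (uint64_t i = 0; i < nr; i++)
                printf("  frame %llu: %#llx\n",
                       (unsigned long long)i, (unsigned long long)body[1 + i]);
        }
        tail += eh->size;
        mpage->data_tail = tail;                    /* mark the record as consumed */
    }
}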
Edit #1:
The flowchart describing the use of mmap when doing perf record is shown via the below image.
The process of mmaping ring buffers would start from the first function when you call perf record - which is __cmd_record, this calls record__open, which then calls record__mmap, followed by a call to record__mmap_evlist, which then calls perf_evlist__mmap_ex, this is followed by perf_evlist__mmap_per_cpu and finally ending up in perf_evlist__mmap_per_evsel which is doing most of the heavy-lifting as far as doing an mmap for each event is concerned.
Edit #2:
Yes, you are correct. When you set the sample period to, say, 1000, the kernel records a sample into the ring buffer on every 1000th occurrence of the event (which by default is cycles). Under the hood, the performance counter is preset so that it overflows after 1000 occurrences; the overflow raises an interrupt, and the sample is then recorded.
Currently, I am having a hard time discovering what the problem is with my multithreaded C program on the RPi. I have written an application relying on two pthreads: one of them reads data from a GPS device and writes it to a text file, and the second one does exactly the same but with a temperature sensor. On my laptop (Intel® Core™ i3-380M, 2.53 GHz) the program works nicely, writing to my files at the frequencies at which both devices send information (10 Hz and 500 Hz respectively).
The real problem emerges when I compile and run my C program on the RPi: the performance decreases considerably, with my GPS log file written at a frequency of 3 Hz and the temperature log file at 17 Hz (17 measurements written per second).
I do not really know why I am getting these performance problems with my code running on the Pi. Is it because the RPi has only a 700 MHz ARM processor and cannot handle such a multithreaded application? Or is it because my two thread routines are disturbing the work normally carried out by the Pi? Thanks a lot in advance, guys!
Here is my code. I am posting just one thread function because I tested the performance with just one thread and it still writes at a very low frequency (~4 Hz). First, the main function:
int main(int argc, char *argv[]) {
int s1_hand = 0;
pthread_t routines[2];
printf("Creating Thread -> Main Thread Busy!\n");
s1_hand = pthread_create(&(routines[1]), NULL, thread_2, (void *)&(routines[1]));
if (s1_hand != 0){
printf("Not possible to create threads:[%s]\n", strerror(s1_hand));
exit(EXIT_FAILURE);
}
pthread_join(routines[1], NULL);
void* result;
if ((pthread_join(routines[1], &result)) == -1) {
perror("Cannot join thread 2");
exit(EXIT_FAILURE);
}
pthread_exit(NULL);
return 0;
}
Now, thread number 2 function:
void *thread_2(void *parameters) {
printf("Thread 2 starting...\n");
int fd, chars, parsing, c_1, parse, p_parse = 1;
double array[3];
fd = open("dev/ttyUSB0", O_RDONLY | O_NOCTTY | O_SYNC);
if (fd < 0){
perror("Unable to open the fd!");
exit (EXIT_FAILURE);
}
FILE *stream_a, *stream_b;
stream_a = fdopen(fd, "r");
stream_b = fopen (FILE_I, "w+");
if (stream_a == NULL || stream_b == NULL){
perror("IMPOSSIBLE TO CREATE STREAMS");
exit(EXIT_FAILURE);
}
c_1 = fgetc(stream_a);
parse = findit(p_parse, c_1, array);
printf("First Parse Done -> (%i)\n", parse);
while ((chars = fgetc(stream_a)) != EOF){
parsing = findit(0, (uint8_t)chars, array);
if (parsing == 1){
printf("MESSAGE FOUND AND SAVED -> (%i)\n", parsing);
fprintf(stream_b,"%.6f %.3f %.3f %.3f\n", time_stamp(), array[0], array[1], array[2]);
}
}
fflush(stream_b);
fclose(stream_b);
fclose(stream_a);
close(fd);
pthread_exit(NULL);
return 0;
}
Note that in my thread 2 function I am using findit(), a function that returns 0 or 1 depending on whether it found and parsed a message from the GPS, writing the parsed info into my array (0 = not found, 1 = found and parsed). The function time_stamp() just calls clock_gettime(CLOCK_MONOTONIC, &time_stamp) in order to have a time reference for each written event. I hope that with this information you guys can help me. Thank you!
Obviously the processor is capable of running 20 things a second. I'd first check your filesystem performance.
Write a small program that simulates the writes just the way you're doing them and see what the performance is like.
Beyond that, I'd suggest it's the task swapping that's causing delays. Try without one of the threads. What type of performance do you get?
I'd guess it's the filesystem, though. Try buffering your writes in memory and doing large (4 KB+) writes every few seconds, and I bet that will make your system a lot happier.
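A sketch of that buffering idea (the function name and buffer size are illustrative choices, not taken from your code): let stdio do the batching with a large, fully buffered stream, so the SD card only sees big writes.

/* Sketch: open the log file with a large, fully buffered stdio buffer so that
 * many small fprintf() calls reach the SD card as a few big write(2) calls.
 * The 64 KiB size is an illustrative choice, not a measured one. */
#include <stdio.h>

FILE *open_buffered_log(const char *path)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return NULL;
    /* _IOFBF = full buffering; passing NULL lets stdio allocate the buffer.
     * Data is only written out when the buffer fills or on fflush()/fclose(). */
    if (setvbuf(f, NULL, _IOFBF, 64 * 1024) != 0) {
        fclose(f);
        return NULL;
    }
    return f;
}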
Also, post your code. Otherwise all we can do is guess.
My scenario: I'm collecting network packets, and if a packet matches a network filter I want to record the time difference between consecutive matching packets; this last part is the part that doesn't work. My problem is that I can't get accurate sub-second measurements no matter which C timer function I use. I've tried gettimeofday(), clock_gettime(), and clock().
I'm looking for assistance to figure out why my timing code isn't working properly.
I'm running on a cygwin environment.
Compile Options: gcc -Wall capture.c -o capture -lwpcap -lrt
Code snippet :
/*globals*/
int first_time = 0;
struct timespec start, end;
double sec_diff = 0;
main() {
pcap_t *adhandle;
const struct pcap_pkthdr header;
const u_char *packet;
int sockfd = socket(PF_INET, SOCK_STREAM, 0);
.... (previous I create socket/connect - works fine)
save_attr = tty_set_raw();
while (1) {
packet = pcap_next(adhandle, &header); // Receive a packet? Process it
if (packet != NULL) {
got_packet(&header, packet, adhandle);
}
if (linux_kbhit()) { // User types message to channel
kb_char = linux_getch(); // Get user-supplied character
if (kb_char == 0x03) // Stop loop (exit channel) if user hits Ctrl+C
break;
}
}
tty_restore(save_attr);
close(sockfd);
pcap_close(adhandle);
printf("\nCapture complete.\n");
}
In got_packet:
got_packet(const struct pcap_pkthdr *header, const u_char *packet, pcap_t * p){ ... {
....do some packet filtering to only handle my packets, set match = 1
if (match == 1) {
if (first_time == 0) {
clock_gettime( CLOCK_MONOTONIC, &start );
first_time++;
}
else {
clock_gettime( CLOCK_MONOTONIC, &end );
sec_diff = (end.tv_sec - start.tv_sec) + ((end.tv_nsec - start.tv_nsec)/1000000000.0); // Packet difference in seconds
printf("sec_diff: %ld,\tstart_nsec: %ld,\tend_nsec: %ld\n", (end.tv_sec - start.tv_sec), start.tv_nsec, end.tv_nsec);
printf("sec_diffcalc: %ld,\tstart_sec: %ld,\tend_sec: %ld\n", sec_diff, start.tv_sec, end.tv_sec);
start = end; // Set the current to the start for next match
}
}
}
I record all packets with Wireshark to compare, so I expect the difference reported by my timer to be the same as Wireshark's, but that is never the case. My output for tv_sec is correct, but tv_nsec is not even close. Say there is a 0.5 second difference in Wireshark; my timer will say there is a 1.999989728 second difference.
Basically, you will want to use a timer with a higher resolution.
Also, I did not check in libpcap, but I am pretty sure that libpcap can give you the time at which each packet was received. In that case, it is the closest you can get to what Wireshark displays.
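For instance, a sketch using the timestamp libpcap already stores in struct pcap_pkthdr (the function name here is illustrative, not the poster's got_packet()):

/* Sketch: derive inter-packet gaps from libpcap's own capture timestamp
 * (struct pcap_pkthdr.ts) rather than calling a clock function in the handler.
 * This is the same per-packet timestamp Wireshark displays. */
#include <pcap.h>
#include <stdio.h>

static struct timeval prev_ts;
static int have_prev = 0;

static void on_matched_packet(const struct pcap_pkthdr *header)
{
    if (have_prev) {
        double diff = (double)(header->ts.tv_sec - prev_ts.tv_sec)
                    + (double)(header->ts.tv_usec - prev_ts.tv_usec) / 1e6;
        printf("gap since previous matched packet: %.6f s\n", diff);
    }
    prev_ts = header->ts;
    have_prev = 1;
}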
I don't think the clocks are your problem, but rather the way you are waiting for new data. You should use a polling function to see when you have new data from either the socket or the keyboard, as in the sketch below. This allows your program to sleep when there is no new data to process, which in turn makes the operating system treat it more kindly when it does have data and schedule it sooner. It also allows you to quit the program without having to wait for the next packet to come in. Alternatively, you could try running your program at a very high or real-time priority.
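A sketch of that idea (cap_fd is a placeholder for a selectable descriptor for the capture source, e.g. what pcap_get_selectable_fd() returns on platforms where it is supported):

/* Sketch: sleep in poll(2) until either the capture descriptor or the keyboard
 * has data, instead of spinning in a busy loop. */
#include <poll.h>
#include <unistd.h>

static int wait_for_input(int cap_fd)
{
    struct pollfd fds[2] = {
        { .fd = cap_fd,       .events = POLLIN },
        { .fd = STDIN_FILENO, .events = POLLIN },
    };

    if (poll(fds, 2, -1) < 0)       /* block with no timeout */
        return -1;
    if (fds[0].revents & POLLIN)
        return 1;                   /* packet data is ready */
    if (fds[1].revents & POLLIN)
        return 2;                   /* keyboard input is ready */
    return 0;
}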
You should consider grabbing the current time first thing after you get a packet, in case the filtering takes a long time. You may also want to consider multiple threads for this program if you are trying to capture data on a fast and busy network, especially if you have more than one processor, and since you are doing some printfs which may block. I noticed you have a function to set a tty to raw mode, which I assume is the standard output tty. If you are actually using a serial terminal, that could slow things down a lot, but standard output to an xterm can also be slow. You may want to consider setting stdout to fully buffered rather than line buffered; this should speed up the output (see man setvbuf).