ARM NEON assembly performance issue on xiaomi5s

Consider the following two code snippets. The first:
void run_new(const float* src, float* dst,
size_t IH, size_t IW, size_t OH, size_t OW,
size_t N) {
rep(n, N) {
const float* src_ptr = src + IW * IH * n;
float* outptr = dst;
const float* r0 = src_ptr;
const float* r1 = src_ptr + IW;
float32x4_t k0123 = vdupq_n_f32(3.f);
rep(h, OH) {
size_t width = OW >> 2;
asm volatile(
"dup v21.4s, %4.s[0] \n"
"dup v22.4s, %4.s[1] \n"
"dup v23.4s, %4.s[2] \n"
"dup v24.4s, %4.s[3] \n"
"mov x3, xzr \n"
"0: \n"
"ldr q0, [%1] \n"
"ld1 {v1.4s, v2.4s}, [%2], #32 \n"
"add x3, x3, #0x1 \n"
"cmp %0, x3 \n"
"ld1 {v3.4s, v4.4s}, [%3], #32 \n"
"fmla v0.4s, v1.4s, v21.4s \n" // src[i] * k[i]
"fmla v0.4s, v2.4s, v22.4s \n"
"fmla v0.4s, v3.4s, v23.4s \n"
"fmla v0.4s, v4.4s, v24.4s \n"
"str q0, [%1], #16 \n"
"bne 0b \n"
: "+r"(width), "+r"(outptr), "+r"(r0), "+r"(r1)
: "w"(k0123)
: "cc", "memory", "x3", "v0", "v1", "v2", "v3", "v4", "v21", "v22", "v23", "v24");
}
}
}
The second code snippet:
void run_origin(const float* src, float* dst,
size_t IH, size_t IW, size_t OH, size_t OW,
size_t N) {
rep(n, N) {
const float* src_ptr = src + IW * IH * n;
float* outptr = dst;
const float* r0 = src_ptr;
const float* r1 = src_ptr + IW;
float32x4_t k0123 = vdupq_n_f32(3.f);
rep(h, OH) {
size_t width = OW >> 2;
asm volatile(
"dup v21.4s, %4.s[0] \n"
"dup v22.4s, %4.s[1] \n"
"dup v23.4s, %4.s[2] \n"
"dup v24.4s, %4.s[3] \n"
"mov x3, xzr \n"
"mov x4, xzr \n"
"0: \n"
"add x19, %2, x4 \n"
"ldr q0, [%1] \n" // load dst 0, 1, 2, 3
"ld1 {v1.4s, v2.4s}, [x19]\n" // 1, 2, 4, 6
"add x3, x3, #0x1 \n"
"cmp %0, x3 \n"
"add x19, %3, x4 \n"
"ld1 {v3.4s, v4.4s}, [x19]\n"
"fmla v0.4s, v1.4s, v21.4s \n" // src[i] * k[i]
"fmla v0.4s, v2.4s, v22.4s \n"
"fmla v0.4s, v3.4s, v23.4s \n"
"fmla v0.4s, v4.4s, v24.4s \n"
"add x4, x4, #0x20 \n"
"str q0, [%1], #16 \n"
"bne 0b \n"
"add %2, %2, x4 \n"
"add %3, %3, x4 \n"
: "+r"(width), "+r"(outptr), "+r"(r0), "+r"(r1)
: "w"(k0123)
: "cc", "memory", "x3", "x4", "x19", "v0", "v1", "v2", "v3", "v4", "v21", "v22", "v23", "v24");
}
}
}
The full code is in "Test performance of arm neon assembly".
I tested the performance of both snippets on xiaomi5s, xiaomi6, and redmi. The results are:
N: 12 IH: 224 IW: 224 OH: 112 OW: 112
perf origin: 325.35058 mflops --- new: 4275.63483 mflops --- speedup: 13.14162 xiaomi5s
perf origin: 3082.00078 mflops --- new: 3063.45047 mflops --- speedup: 0.99398 xiaomi6
perf origin: 1761.05058 mflops --- new: 1814.37185 mflops --- speedup: 1.03028 redmi
The following tests were run on xiaomi5s.
N: 12 IH:48-256 IW: 224
N: 12 IH: 48 IW: 224 OH: 24 OW: 112
perf origin: 3721.16633 mflops --- new: 4935.31729 mflops --- speedup: 1.32628
N: 12 IH: 80 IW: 224 OH: 40 OW: 112
perf origin: 1185.58378 mflops --- new: 3852.38266 mflops --- speedup: 3.24936
N: 12 IH: 112 IW: 224 OH: 56 OW: 112
perf origin: 1021.83468 mflops --- new: 3503.70672 mflops --- speedup: 3.42884
N: 12 IH: 144 IW: 224 OH: 72 OW: 112
perf origin: 797.61461 mflops --- new: 4167.12780 mflops --- speedup: 5.22449
N: 12 IH: 176 IW: 224 OH: 88 OW: 112
perf origin: 465.55073 mflops --- new: 4084.54206 mflops --- speedup: 8.77357
N: 12 IH: 208 IW: 224 OH: 104 OW: 112
perf origin: 373.99237 mflops --- new: 4255.78687 mflops --- speedup: 11.37934
N: 12 IH: 240 IW: 224 OH: 120 OW: 112
perf origin: 341.57406 mflops --- new: 4290.58840 mflops --- speedup: 12.56122
N: 12 IH:224 IW: 48-256
N: 12 IH: 224 IW: 48 OH: 112 OW: 24
perf origin: 3660.35916 mflops --- new: 4729.61877 mflops --- speedup: 1.29212
N: 12 IH: 224 IW: 80 OH: 112 OW: 40
perf origin: 2918.48755 mflops --- new: 4748.17285 mflops --- speedup: 1.62693
N: 12 IH: 224 IW: 112 OH: 112 OW: 56
perf origin: 951.03852 mflops --- new: 4051.84318 mflops --- speedup: 4.26044
N: 12 IH: 224 IW: 144 OH: 112 OW: 72
perf origin: 1186.74405 mflops --- new: 4160.18572 mflops --- speedup: 3.50555
N: 12 IH: 224 IW: 176 OH: 112 OW: 88
perf origin: 533.47286 mflops --- new: 4199.36622 mflops --- speedup: 7.87175
N: 12 IH: 224 IW: 208 OH: 112 OW: 104
perf origin: 447.30682 mflops --- new: 4092.22256 mflops --- speedup: 9.14858
N: 12 IH: 224 IW: 240 OH: 112 OW: 120
perf origin: 442.58206 mflops --- new: 4200.13672 mflops --- speedup: 9.49007
N: 2-12 IH:224 IW: 224
N: 2 IH: 224 IW: 224 OH: 112 OW: 112
perf origin: 3794.45684 mflops --- new: 5236.48508 mflops --- speedup: 1.38004
N: 3 IH: 224 IW: 224 OH: 112 OW: 112
perf origin: 3790.20521 mflops --- new: 5150.30622 mflops --- speedup: 1.35885
N: 4 IH: 224 IW: 224 OH: 112 OW: 112
perf origin: 2117.55521 mflops --- new: 4329.34274 mflops --- speedup: 2.04450
N: 5 IH: 224 IW: 224 OH: 112 OW: 112
perf origin: 1290.43541 mflops --- new: 3915.65607 mflops --- speedup: 3.03437
N: 6 IH: 224 IW: 224 OH: 112 OW: 112
perf origin: 1038.86926 mflops --- new: 3747.69392 mflops --- speedup: 3.60747
N: 7 IH: 224 IW: 224 OH: 112 OW: 112
perf origin: 845.26878 mflops --- new: 4025.81237 mflops --- speedup: 4.76276
N: 8 IH: 224 IW: 224 OH: 112 OW: 112
perf origin: 658.23150 mflops --- new: 3971.62335 mflops --- speedup: 6.03378
N: 9 IH: 224 IW: 224 OH: 112 OW: 112
perf origin: 527.99489 mflops --- new: 4163.94501 mflops --- speedup: 7.88634
N: 10 IH: 224 IW: 224 OH: 112 OW: 112
perf origin: 416.75353 mflops --- new: 4119.03296 mflops --- speedup: 9.88362
N: 11 IH: 224 IW: 224 OH: 112 OW: 112
perf origin: 378.38875 mflops --- new: 4203.33717 mflops --- speedup: 11.10852
N: 12 IH: 224 IW: 224 OH: 112 OW: 112
perf origin: 350.36924 mflops --- new: 4202.19842 mflops --- speedup: 11.99363
I am confused by the results on xiaomi5s. Why is run_origin (the second snippet) so slow on that device?
My guess is that the NEON pipeline stalls when a load has to wait for a general-purpose register, for example ld1 {v3.4s, v4.4s}, [x19] waiting for x19, which is computed by add x19, %3, x4. But I am not sure.
Additional details:
xiaomi5s cpu: Qualcomm Snapdragon 821
xiaomi6 cpu: Qualcomm Snapdragon 835
redmi cpu: MediaTek Helio X20
Compile options(clang version: 5.0.0): clang++ -std=c++11 -Ofast.
I changed ldr q0, [%2] to ld1 v0.4s, [%2], but the result is the same; run_origin may be slightly faster, by about 1%-3%.
N: 12 IH: 224 IW: 224 OH: 112 OW: 112
perf origin: 342.96631 mflops --- asm: 4288.51646 mflops --- speedup: 12.50419
I changed fmla v0.4s, v1.4s, v21.4s to smlsl2 v0.2d, v1.4s, v21.4s, but the result is the same.
N: 12 IH: 224 IW: 224 OH: 112 OW: 112
perf origin: 348.03699 mflops --- asm: 4245.18804 mflops --- speedup: 12.19752
I changed fmla v0.4s, v1.4s, v21.4s to fadd v0.4s, v1.4s, v21.4s, and run_origin gets faster.
N: 12 IH: 224 IW: 224 OH: 112 OW: 112
perf origin: 743.95433 mflops --- asm: 4756.65769 mflops --- speedup: 6.39375

A wild guess is that the bottleneck is just as likely to be in the memory/cache subsystem as the core. Perhaps the first case does something that inhibits automatic pre-loading (or the xiaomi5s lacks this or has it disabled)?
It might be interesting to try adding a pld (or rather prfm) instruction, though I've never found them to help much on Cortex-A9 at least.
An easy way to check if fmla is the bottleneck would be to comment out some or all of the data-processing instructions (of course, the output will be wrong!)
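One way to try the prefetch idea without editing the assembly is the GCC/Clang builtin __builtin_prefetch, which those compilers typically lower to prfm on AArch64. The sketch below is a plain C stand-in for the inner loop, not the asker's kernel. The function name weighted_rows and the prefetch distance of 8 groups are assumptions, and the distance would need tuning by measurement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar stand-in for the asm loop: each group of 4 outputs
 * accumulates 8 floats from r0 and 8 floats from r1, scaled by k[0..3]. */
static void weighted_rows(float *out, const float *r0, const float *r1,
                          const float k[4], size_t groups)
{
    const size_t ahead = 8; /* assumed prefetch distance, in groups */
    assert(out != NULL && r0 != NULL && r1 != NULL && k != NULL);
    for (size_t g = 0; g < groups; ++g) {
        /* Prefetch is only a hint; stay inside the buffers anyway. */
        if (g + ahead < groups) {
            __builtin_prefetch(r0 + 8 * (g + ahead), 0, 0);
            __builtin_prefetch(r1 + 8 * (g + ahead), 0, 0);
        }
        for (size_t i = 0; i < 4; ++i) {
            out[4 * g + i] += r0[8 * g + i]     * k[0]
                            + r0[8 * g + 4 + i] * k[1]
                            + r1[8 * g + i]     * k[2]
                            + r1[8 * g + 4 + i] * k[3];
        }
    }
}
```

If timing this with and without the two prefetch calls shows no difference on the slow device, that would suggest the problem lies elsewhere.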

I'm still not as familiar with NEON64 as with NEON32, but there are several things I wouldn't do in your code:
Why are you using the VFP instruction "ldr"? Switching between VFP and NEON can cost lots of cycles, especially if these instructions access memory. That both share the registers doesn't mean they are the same unit. Change it to LD1 ...... 4s
Do you want it 32bit or 64bit? Choose x3 or w3, and stick to it.
Are you sure you want fused multiply with fmla? Maybe yes or maybe no, but note that fused multiplies cost more...
cheers


Google Data Studio: Compare daily sales to 7-day average

I have a data source with daily sales per product.
I want to create a field that calculates the average daily sales for the 7 last days, for each product and day (e.g. on day 10 for product A, it will give me the average sales for product A on days 3 - 9; on Day 15 for product B, I'll see the average sales of B on days 8 - 14).
Is this possible?
Example data (I have the first three columns and need to generate the fourth):
Date Product Sales 7-Day Average
1/11 A 983 201
2/11 A 650 983
3/11 A 328 817
4/11 A 728 654
5/11 A 246 672
6/11 A 613 587
7/11 A 575 591
8/11 A 601 589
9/11 A 462 534
10/11 A 979 508
11/11 A 148 601
12/11 A 238 518
13/11 A 53 517
14/11 A 500 437
15/11 A 684 426
16/11 A 261 438
17/11 A 69 409
18/11 A 159 279
19/11 A 964 281
20/11 A 429 384
21/11 A 731 438
1/11 B 790 471
2/11 B 265 486
3/11 B 94 487
4/11 B 66 490
5/11 B 124 477
6/11 B 555 357
7/11 B 190 375
8/11 B 232 298
9/11 B 747 218
10/11 B 557 287
11/11 B 432 353
12/11 B 526 405
13/11 B 690 463
14/11 B 350 482
15/11 B 512 505
16/11 B 273 545
17/11 B 679 477
18/11 B 164 495
19/11 B 799 456
20/11 B 749 495
21/11 B 391 504
I haven't really tried anything; I couldn't figure out how to get started with this.
This may not be a perfect solution, but it gives the expected result in a crude way.
1. Cross-join the data source with itself.
2. Use a calculated field to get the average over the previous 7 days:
(CASE WHEN Date (Table 2) BETWEEN DATETIME_SUB(Date (Table 1), INTERVAL 7 DAY) AND DATETIME_SUB(Date (Table 1), INTERVAL 1 DAY) THEN Sales (Table 2) ELSE 0 END)/7
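To make the arithmetic concrete, here is a small C sketch of a trailing average for one product. It is only an illustration of the calculation, not a Data Studio feature, and the function name trailing_mean is made up for this example. Note one difference: the calculated field above always divides by 7, whereas the example column in the question appears to divide by the number of earlier days available (3/11 A shows 817, about (983 + 650) / 2). The sketch follows the question's table.

```c
#include <assert.h>
#include <stddef.h>

/* Mean of the up-to-`window` entries immediately before index `day`
 * (0-based) in one product's chronologically ordered daily sales.
 * Returns 0.0 when there is no earlier day to average. */
static double trailing_mean(const int *sales, size_t day, size_t window)
{
    assert(sales != NULL && window > 0);
    size_t first = day > window ? day - window : 0;
    size_t count = day - first;
    long sum = 0;
    for (size_t i = first; i < day; ++i)
        sum += sales[i];
    return count ? (double)sum / (double)count : 0.0;
}
```

With product A's sales from the question, trailing_mean(sales, 7, 7) averages 1/11 through 7/11 and gives 589, matching the 8/11 row.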

print each range in perl array

I have an array of ranges in Perl and need a way to loop through each range in the array, search a number and print the min..max indexes for each range. I am able to do this in bash shell scripting but having some trouble in Perl.
My code:
#!/usr/bin/perl
use List::Util qw(max min);
$search_num = 95;
@ranges = (73..80, 92..107, 941..1000, 3000..3170);
foreach $num (@ranges) {
$range_min = min(@ranges);
$range_max = max(@ranges);
if ($search_num == $n) {
print "$search was found in range $range_min..$range_max\n";
}
}
Desired output:
95 was found in range 92..107
The following works fine for indicating per hard coded range
but need a way to have a series of ranges in an array to loop, search and display where found. The following works:
@range = (92..107);
foreach $num (@range) {
$range_min = min(@range);
$range_max = max(@range);
if ($search_num == $num){
print "$search_num was found in range $range_min..$range_max\n";
}
}
Output:
95 was found in range 92..107
thanks for any advice.
@ranges=(73..80, 92..107, 941..1000, 3000..3170);
You seem to be under the impression that this will put separate range objects in @ranges. Instead, @ranges contains the following flat list:
$ perl -E '@ranges=(73..80, 92..107, 941..1000, 3000..3170); say "@ranges"'
73 74 75 76 77 78 79 80 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 3000 3001 3002 3003 3004 3005 3006 3007 3008 3009 3010 3011 3012 3013 3014 3015 3016 3017 3018 3019 3020 3021 3022 3023 3024 3025 3026 3027 3028 3029 3030 3031 3032 3033 3034 3035 3036 3037 3038 3039 3040 3041 3042 3043 3044 3045 3046 3047 3048 3049 3050 3051 3052 3053 3054 3055 3056 3057 3058 3059 3060 3061 3062 3063 3064 3065 3066 3067 3068 3069 3070 3071 3072 3073 3074 3075 3076 3077 3078 3079 3080 3081 3082 3083 3084 3085 3086 3087 3088 3089 3090 3091 3092 3093 3094 3095 3096 3097 3098 3099 3100 3101 3102 3103 3104 3105 3106 3107 3108 3109 3110 3111 3112 3113 3114 3115 3116 3117 3118 3119 3120 3121 3122 3123 3124 3125 3126 3127 3128 3129 3130 3131 3132 3133 3134 3135 3136 3137 3138 3139 3140 3141 3142 3143 3144 3145 3146 3147 3148 3149 3150 3151 3152 3153 3154 3155 3156 3157 3158 3159 3160 3161 3162 3163 3164 3165 3166 3167 3168 3169 3170
You can insert references to anonymous arrays in @ranges:
@ranges = ([73..80], [92..107], [941..1000], [3000..3170]);
However, since you already know the upper and lower limits of each range, why are you wasting memory?
@ranges=([73, 80], [92, 107], [941, 1000], [3000, 3170]);
Here is one way to implement that:
#!/usr/bin/env perl
use strict;
use warnings;
my @ranges=([73, 80], [92, 107], [941, 1000], [3000, 3170]);
my $search = 95;
my $found = search_in_ranges($search, \@ranges);
for my $r ( @$found ) {
printf "%d was found in [%d, %d]\n", $search, $r->[0], $r->[1];
}
sub search_in_ranges {
my ($n, $ranges) = @_;
return [ grep $n >= $_->[0] && $n <= $_->[1], @$ranges ];
}
See also perldoc perlreftut which is installed along with your Perl distribution.

Implementation of merge sort using threads and fork

Problem: I'm trying to implement merge sort in the following way. I have a parent and two children. The first child runs merge sort on its own. The second child works as follows: it creates 2 threads, the first of which sorts the first half of the array while the second sorts the rest. Then each half again creates 2 threads for its own halves, and so on, until we reach the base case and finish. In the end, I want to check how much faster the second child's merge sort is than the first child's.
My question: I've created 2 children. The first child implements merge sort and everything is fine. In the second child, I was only able to create 2 threads instead of many more (2 for each half, and so on), and in the end it neither prints the array nor the date when it finishes.
This is the code for the second child:
if((id2 = fork()) == 0 && id1 != 0)
{
printf("Child2: \n");
ans1 = pthread_create ( &thread1 , NULL , mergeSort ,(arr3, (arr_size / 2) - 1 ,arr_size - 1 )) ;
ans2 = pthread_create ( &thread2 , NULL , mergeSort ,(arr3, 0, (arr_size / 2)- 1 )) ;
ans3 = pthread_create ( &thread3 , NULL , printArray ,(arr3, arr_size) ) ;
execl("/bin/date", "date",0);
if ( ans1 != 0 || ans2 != 0 || ans3 != 0) {
printf ( " \n can't create threads " ) ;
exit(0) ;
}
pthread_join ( thread1 , NULL ) ;
pthread_join ( thread2 , NULL ) ;
pthread_join ( thread3 , NULL ) ;
}
I'm using UNIX, and for compiling:
gcc -lpthread prog.c
for executing:
./a.out
This is the whole code:
/* C program for Merge Sort */
#include<stdlib.h>
#include<stdio.h>
#include <pthread.h>
#define N 100
// Merges two subarrays of arr[].
// First subarray is arr[l..m]
// Second subarray is arr[m+1..r]
void merge(int arr[], int l, int m, int r)
{
int i, j, k;
int n1 = m - l + 1;
int n2 = r - m;
/* create temp arrays */
int L[n1], R[n2];
/* Copy data to temp arrays L[] and R[] */
for (i = 0; i < n1; i++)
L[i] = arr[l + i];
for (j = 0; j < n2; j++)
R[j] = arr[m + 1+ j];
/* Merge the temp arrays back into arr[l..r]*/
i = 0; // Initial index of first subarray
j = 0; // Initial index of second subarray
k = l; // Initial index of merged subarray
while (i < n1 && j < n2)
{
if (L[i] <= R[j])
{
arr[k] = L[i];
i++;
}
else
{
arr[k] = R[j];
j++;
}
k++;
}
/* Copy the remaining elements of L[], if there
are any */
while (i < n1)
{
arr[k] = L[i];
i++;
k++;
}
/* Copy the remaining elements of R[], if there
are any */
while (j < n2)
{
arr[k] = R[j];
j++;
k++;
}
}
/* l is for left index and r is right index of the
sub-array of arr to be sorted */
void mergeSort(int arr[], int l, int r)
{
if (l < r)
{
// Same as (l+r)/2, but avoids overflow for
// large l and h
int m = l+(r-l)/2;
// Sort first and second halves
mergeSort(arr, l, m);
mergeSort(arr, m+1, r);
merge(arr, l, m, r);
}
}
/* UTILITY FUNCTIONS */
/* Function to print an array */
void printArray(int A[], int size)
{
int i;
for (i=0; i < size; i++)
printf("%d ", A[i]);
printf("\n");
}
/* Driver program to test above functions */
int main()
{
int min = -1000, max = 1000;
int arr[10], arr2[10], arr3[10];
int i,r;
int arr_size = sizeof(arr)/sizeof(arr[0]);
int id1,id2;
//Threads init
pthread_t thread1 , thread2, thread3;
int ans1, ans2, ans3;
for( i = 0; i < arr_size; i++){
r = rand() % (max - min + 1);
arr[i] = r;
arr2[i] = r;
arr3[i] = r;
}
//printf("Before: \n");
if((id1 = fork()) == 0)
{
printf("Child1: \n");
mergeSort(arr2, 0, arr_size - 1);
printArray(arr2, arr_size);
execl("/bin/date", "date",0);
}
if((id2 = fork()) == 0 && id1 != 0)
{
printf("Child2: \n");
ans1 = pthread_create ( &thread1 , NULL , mergeSort ,(arr3, (arr_size / 2) - 1 ,arr_size - 1 )) ;
ans2 = pthread_create ( &thread2 , NULL , mergeSort ,(arr3, 0, (arr_size / 2)- 1 )) ;
ans3 = pthread_create ( &thread3 , NULL , printArray ,(arr3, arr_size) ) ;
execl("/bin/date", "date",0);
if ( ans1 != 0 || ans2 != 0 || ans3 != 0) {
printf ( " \n can't create threads " ) ;
exit(0) ;
}
pthread_join ( thread1 , NULL ) ;
pthread_join ( thread2 , NULL ) ;
pthread_join ( thread3 , NULL ) ;
}
wait();
if(id1 != 0 && id2 != 0){
printf("Given array is \n");
printArray(arr, arr_size);
printf("Father:\n");
mergeSort(arr, 0, arr_size - 1);
printArray(arr, arr_size);
execl("/bin/date", "date",0);
printf("\nSorted array is \n");
//printf("After: \n");
}
return 0;
}
EDITED CODE:
/* C program for Merge Sort */
#include<stdlib.h>
#include<stdio.h>
#include <pthread.h>
#include <time.h>
#define N 100
// Merges two subarrays of arr[].
// First subarray is arr[l..m]
// Second subarray is arr[m+1..r]
void merge(int arr[], int l, int m, int r)
{
int i, j, k;
int n1 = m - l + 1;
int n2 = r - m;
/* create temp arrays */
int L[n1], R[n2];
/* Copy data to temp arrays L[] and R[] */
for (i = 0; i < n1; i++)
L[i] = arr[l + i];
for (j = 0; j < n2; j++)
R[j] = arr[m + 1+ j];
/* Merge the temp arrays back into arr[l..r]*/
i = 0; // Initial index of first subarray
j = 0; // Initial index of second subarray
k = l; // Initial index of merged subarray
while (i < n1 && j < n2)
{
if (L[i] <= R[j])
{
arr[k] = L[i];
i++;
}
else
{
arr[k] = R[j];
j++;
}
k++;
}
/* Copy the remaining elements of L[], if there
are any */
while (i < n1)
{
arr[k] = L[i];
i++;
k++;
}
/* Copy the remaining elements of R[], if there
are any */
while (j < n2)
{
arr[k] = R[j];
j++;
k++;
}
}
/* l is for left index and r is right index of the
sub-array of arr to be sorted */
void mergeSort(int arr[], int l, int r)
{
if (l < r)
{
// Same as (l+r)/2, but avoids overflow for
// large l and h
int m = l+(r-l)/2;
// Sort first and second halves
mergeSort(arr, l, m);
mergeSort(arr, m+1, r);
merge(arr, l, m, r);
}
}
void* mergeSort2(void* args)
{
int* newArgs = (int*)args;
int l = newArgs[1];
int r = newArgs[2];
pthread_t thread1 , thread2;
int ans1, ans2;
if (l < r)
{
// Same as (l+r)/2, but avoids overflow for
// large l and h
int m = (r+l)/2;
int newArgs1[3] = {newArgs[0], l, m};
int newArgs2[3] = {newArgs[0], m+1, r};
ans1 = pthread_create ( &thread1 , NULL , mergeSort2 ,(void*)newArgs1);
ans1 = pthread_create ( &thread2 , NULL , mergeSort2 ,(void*)newArgs2);
pthread_join(thread1,NULL);
pthread_join(thread2,NULL);
merge(newArgs[0], l, m, r);
}
}
/* UTILITY FUNCTIONS */
/* Function to print an array */
void printArray(int A[], int size)
{
int i;
for (i=0; i < size; i++)
printf("%d ", A[i]);
printf("\n");
}
static void print_timestamp(void)
{
time_t now = time(0);
struct tm *utc = gmtime(&now);
char iso8601[32];
strftime(iso8601, sizeof(iso8601), "%Y-%m-%d %H:%M:%S", utc);
printf("%s\n", iso8601);
}
/* Driver program to test above functions */
int main()
{
int min = -1000, max = 1000;
int arr[10], arr2[10], arr3[10];
int i,r;
int arr_size = sizeof(arr)/sizeof(arr[0]);
int id1,id2;
int args[3] ={arr3, 0, arr_size - 1};
struct timeval tvalBefore, tvalAfter;
struct timeval tvalBefore1, tvalAfter1;
//Threads init
pthread_t thread1;
int ans1;
srand(time(NULL));
for( i = 0; i < arr_size; i++){
r = rand() % (max - min + 1);
arr[i] = r;
arr2[i] = r;
arr3[i] = r;
}
//printf("Before: \n");
if((id1 = fork()) == 0)
{
gettimeofday (&tvalBefore, NULL);
//Operation to do
printf("Child1: \n");
mergeSort(arr2, 0, arr_size - 1);
printArray(arr2, arr_size);
print_timestamp();
gettimeofday (&tvalAfter, NULL);
// Changed format to long int (%ld), changed time calculation
printf("Time in microseconds for sorting CHILD 1: %ld microseconds\n",
((tvalAfter.tv_sec - tvalBefore.tv_sec)*1000000L
+tvalAfter.tv_usec) - tvalBefore.tv_usec
); // Added semicolon
}
else if((id2 = fork()) == 0)
{
printf("Child2: \n");
//Start Timer
gettimeofday (&tvalBefore1, NULL);
//Operation to do
ans1 = pthread_create ( &thread1 , NULL , mergeSort2 ,(void*)args);
pthread_join ( thread1 , NULL ) ;
print_timestamp();
gettimeofday (&tvalAfter1, NULL);
// Changed format to long int (%ld), changed time calculation
printf("Time in microseconds for sorting CHILD 2: %ld microseconds\n",
((tvalAfter1.tv_sec - tvalBefore1.tv_sec)*1000000L
+tvalAfter1.tv_usec) - tvalBefore1.tv_usec
); // Added semicolon
}
else{
wait();
wait();
gettimeofday (&tvalBefore, NULL);
//Operation to do
printf("Given array is \n");
printArray(arr, arr_size);
printf("Father:\n");
mergeSort(arr, 0, arr_size - 1);
printArray(arr, arr_size);
print_timestamp();
gettimeofday (&tvalAfter, NULL);
// Changed format to long int (%ld), changed time calculation
printf("Time in microseconds for sorting Father: %ld microseconds\n",
((tvalAfter.tv_sec - tvalBefore.tv_sec)*1000000L
+tvalAfter.tv_usec) - tvalBefore.tv_usec
); // Added semicolon
}
return 0;
}
You have several problems:
as noted in comments and Jonathan's answer, you call exec and replace your whole process image before your threads complete (and possibly before they actually start, since they may not have been given their first timeslice yet)
if you move that, you still have the problem that your printArray function was run in parallel to your sort threads, instead of afterwards
if you fix that, you still have the problem that your printArray thread was started improperly (with a likely invalid input pointer), for the same reason as for the sorting threads, described in more detail below
if you fix the printing, your sorting thread invocation is completely wrong (much detail follows below)
if you fix the thread invocation, your code still doesn't do what you claim you wanted: to keep starting new child threads for smaller and smaller sub-ranges of your input array
Let's start with the prototype of pthread_create, the declaration of your thread function, and the thread creation call:
int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
void *(*start_routine) (void *), void *arg);
this requires a function of shape void* start_routine(void *) as its third argument. However, you have
void mergeSort(int arr[], int l, int r) { ... }
which will nevertheless be called with only the first argument having a defined value. I'm amazed your compiler didn't warn about this.
Now, consider the fourth argument to pthread_create in the following call:
ans1 = pthread_create(&thread1, NULL,
mergeSort,
(arr3, (arr_size / 2) - 1 ,arr_size - 1 )) ;
it takes the expression (arr3, (arr_size / 2) - 1 ,arr_size - 1 ). However, C doesn't have tuple types, and even if it did they wouldn't be convertible to void*. Instead this uses the comma operator , to discard the results of the first two expressions, and so you're actually using the integer value of arr_size - 1 as a pointer argument.
I'd expect it to crash when it tries to start the child thread - you didn't say how your program failed, but a SEGV would be common. You can catch these in a debugger, but it'll be somewhere inside the pthread library code, so it might not help much.
A sane solution for your problem would look something like this un-tested and never-compiled sample code:
/* use this for the fourth argument to pthread_create */
typedef struct Range {
    int *array;
    int left;
    int right;
    pthread_t thread;
} Range;
/* THRESHOLD, mergeSortedHalved and regularSingleThreadedMergeSort are
   placeholders you supply yourself */
void *mergeSortRangeThreadFunction(void *data);
void mergeSortRange(Range *r) {
    const int width = r->right - r->left;
    const int mid = r->left + (width / 2);
    if (width > THRESHOLD) {
        /* wide enough to be worth a child thread */
        Range left = { r->array, r->left, mid };
        Range right = { r->array, mid + 1, r->right };
        pthread_create(&left.thread, NULL,
                       mergeSortRangeThreadFunction,
                       &left);
        mergeSortRange(&right);
        pthread_join(left.thread, NULL);
        mergeSortedHalved(r->array, r->left, mid, r->right);
    } else {
        regularSingleThreadedMergeSort(r->array, r->left, r->right);
    }
}
/* this is what you pass to pthread_create */
void *mergeSortRangeThreadFunction(void *data) {
    Range *r = (Range *)data;
    mergeSortRange(r);
    return data;
}
although, even with THRESHOLD set to something good, it's better to use a thread pool than to start & stop threads repeatedly.
Finally, of course, you don't need to use recursion to start these threads and populate these Range structures - you could just create an array of size/THRESHOLD + 1 range descriptors, create one thread per core, and then figure out some logic for deciding when you're allowed to merge two consecutive ranges.
Program stops because of calls to execl()
You have:
…
ans3 = pthread_create ( &thread3 , NULL , printArray ,(arr3, arr_size) ) ;
execl("/bin/date", "date",0);
if ( ans1 != 0 || ans2 != 0 || ans3 != 0) {
…
The execl() replaces your process and all its threads with date, which produces its output and exits. You can't time-stamp your work like that!
You probably need to call time() or a higher-resolution timing mechanism, and then localtime() or gmtime() to create a broken-down time, and then strftime() to format it as you want, and finally printf() or similar to print the result. That all belongs in a function, of course, not in your code.
#include <stdio.h>
#include <time.h>
static void print_timestamp(void)
{
time_t now = time(0);
struct tm *utc = gmtime(&now);
char iso8601[32];
strftime(iso8601, sizeof(iso8601), "%Y-%m-%dT%H:%M:%S", utc);
printf("%s\n", iso8601);
}
Where you have execl(), call print_timestamp() instead.
Or, more simply, use system() instead of execl():
system("/bin/date");
This is a grotesquely heavyweight way of reporting the time, but it has the merit of simplicity.
Sub-second resolution times
I need to determine the time in milliseconds.
It depends on your platform, but on POSIX-ish systems you can use clock_gettime() or gettimeofday() to get sub-second timing.
#include <stdio.h>
#include <time.h>
#include <sys/time.h>
static void print_timestamp(void) // UTC to seconds
{
time_t now = time(0);
struct tm *utc = gmtime(&now);
char iso8601[32];
strftime(iso8601, sizeof(iso8601), "%Y-%m-%dT%H:%M:%S", utc);
printf("%s\n", iso8601);
}
static void print_utc_ms(void) // UTC to milliseconds
{
struct timeval tv;
gettimeofday(&tv, 0);
struct tm *utc = gmtime(&tv.tv_sec);
char iso8601[32];
strftime(iso8601, sizeof(iso8601), "%Y-%m-%dT%H:%M:%S", utc);
printf("%s.%.3d\n", iso8601, (int)(tv.tv_usec / 1000)); // cast: the type of tv_usec varies by platform
}
static void print_local_us(void) // Local time to microseconds
{
struct timespec ts;
clock_gettime(CLOCK_REALTIME, &ts); // CLOCK_MONOTONIC has merits too
struct tm *lmt = localtime(&ts.tv_sec);
char iso8601[32];
strftime(iso8601, sizeof(iso8601), "%Y-%m-%dT%H:%M:%S", lmt);
printf("%s.%.6ld\n", iso8601, ts.tv_nsec / 1000L);
}
int main(void)
{
print_timestamp();
print_utc_ms();
print_local_us();
return 0;
}
Example output:
2017-05-05T16:04:14
2017-05-05T16:04:14.268
2017-05-05T09:04:14.268975
NB: Once you've fixed your code so it isn't using execl(), there may still be other problems to resolve — there probably are other problems to fix. But fixing this is a key step to getting your threads to run to completion.
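For measuring how long a sort took, rather than printing the wall-clock time, a small helper over two struct timespec readings is often enough. This is a sketch only: the name elapsed_us is made up here, and it assumes both readings come from the same clock (for example CLOCK_MONOTONIC, mentioned above).

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Microseconds elapsed between two readings of the same clock.
 * Works in nanoseconds first so a borrow across the seconds boundary
 * is handled by ordinary signed arithmetic. */
static long long elapsed_us(const struct timespec *start,
                            const struct timespec *end)
{
    assert(start != NULL && end != NULL);
    long long ns = (long long)(end->tv_sec - start->tv_sec) * 1000000000LL
                 + (long long)(end->tv_nsec - start->tv_nsec);
    return ns / 1000;
}
```

Typical use would be to call clock_gettime(CLOCK_MONOTONIC, &start) just before the sort, call it again into end just after, and print elapsed_us(&start, &end).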
Creating working code
Taking the revised code from the question, applying basic 'cleanliness' to it (making sure it compiles cleanly under stringent warning options), the program seems to work. The 'array of int' approach to passing a pointer and two int values doesn't work on a 64-bit system, so I created a struct Sort to contain the information. I also moved the 'start clock' and 'stop clock' calls to gettimeofday() closer to the code being measured (no printing in the calling code in the way). I added headers needed on macOS Sierra 10.12.4 (GCC 7.1.0). The code also prints the input data before it sorts any of it. The cleanup work was basically 'around' the sort code; the core sorting algorithms were not changed at all.
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>
#include <time.h>
#include <sys/time.h> // gettimeofday()
#include <unistd.h> // fork()
#include <sys/wait.h> // wait()
#define N 100
struct Sort
{
int *data;
int lo;
int hi;
};
// Merges two subarrays of arr[].
// First subarray is arr[l..m]
// Second subarray is arr[m+1..r]
static
void merge(int arr[], int l, int m, int r)
{
int i, j, k;
int n1 = m - l + 1;
int n2 = r - m;
/* create temp arrays */
int L[n1], R[n2];
/* Copy data to temp arrays L[] and R[] */
for (i = 0; i < n1; i++)
L[i] = arr[l + i];
for (j = 0; j < n2; j++)
R[j] = arr[m + 1 + j];
/* Merge the temp arrays back into arr[l..r]*/
i = 0; // Initial index of first subarray
j = 0; // Initial index of second subarray
k = l; // Initial index of merged subarray
while (i < n1 && j < n2)
{
if (L[i] <= R[j])
{
arr[k] = L[i];
i++;
}
else
{
arr[k] = R[j];
j++;
}
k++;
}
/* Copy the remaining elements of L[], if there
are any */
while (i < n1)
{
arr[k] = L[i];
i++;
k++;
}
/* Copy the remaining elements of R[], if there
are any */
while (j < n2)
{
arr[k] = R[j];
j++;
k++;
}
}
/* l is for left index and r is right index of the
sub-array of arr to be sorted */
static
void mergeSort(int arr[], int l, int r)
{
if (l < r)
{
// Same as (l+r)/2, but avoids overflow for
// large l and h
int m = l + (r - l) / 2;
// Sort first and second halves
mergeSort(arr, l, m);
mergeSort(arr, m + 1, r);
merge(arr, l, m, r);
}
}
static
void *mergeSort2(void *args)
{
struct Sort *newargs = args;
int *data = newargs->data;
int l = newargs->lo;
int r = newargs->hi;
pthread_t thread1, thread2;
int ans1, ans2;
if (l < r)
{
int m = (r + l) / 2;
struct Sort newArgs1 = {data, l, m};
struct Sort newArgs2 = {data, m + 1, r};
ans1 = pthread_create(&thread1, NULL, mergeSort2, &newArgs1);
ans2 = pthread_create(&thread2, NULL, mergeSort2, &newArgs2);
if (ans1 != 0 || ans2 != 0)
exit(1);
pthread_join(thread1, NULL);
pthread_join(thread2, NULL);
merge(data, l, m, r);
}
return 0;
}
/* UTILITY FUNCTIONS */
/* Function to print an array */
static
void printArray(int A[], int size)
{
for (int i = 0; i < size; i++)
printf("%d ", A[i]);
printf("\n");
}
static void print_timestamp(void)
{
time_t now = time(0);
struct tm *utc = gmtime(&now);
char iso8601[32];
strftime(iso8601, sizeof(iso8601), "%Y-%m-%d %H:%M:%S", utc);
printf("%s\n", iso8601);
}
/* Driver program to test above functions */
int main(void)
{
int min = -1000, max = 1000;
int arr[10], arr2[10], arr3[10];
int i, r;
int arr_size = sizeof(arr) / sizeof(arr[0]);
int id1, id2;
struct Sort args = { arr3, 0, arr_size - 1};
struct timeval tvalBefore, tvalAfter;
struct timeval tvalBefore1, tvalAfter1;
// Threads init
pthread_t thread1;
int ans1;
srand(time(NULL));
for (i = 0; i < arr_size; i++)
{
r = rand() % (max - min + 1);
arr[i] = r;
arr2[i] = r;
arr3[i] = r;
}
printf("Given array is \n");
printArray(arr, arr_size);
fflush(stdout);
if ((id1 = fork()) == 0)
{
printf("Child1: \n");
gettimeofday(&tvalBefore, NULL);
mergeSort(arr2, 0, arr_size - 1);
gettimeofday(&tvalAfter, NULL);
printArray(arr2, arr_size);
print_timestamp();
printf("Time in microseconds for sorting CHILD 1: %ld microseconds\n",
((tvalAfter.tv_sec - tvalBefore.tv_sec) * 1000000L
+ tvalAfter.tv_usec) - tvalBefore.tv_usec);
}
else if ((id2 = fork()) == 0)
{
printf("Child2: \n");
gettimeofday(&tvalBefore1, NULL);
ans1 = pthread_create(&thread1, NULL, mergeSort2, &args);
if (ans1 == 0)
pthread_join( thread1, NULL );
gettimeofday(&tvalAfter1, NULL);
print_timestamp();
printArray(arr3, arr_size);
printf("Time in microseconds for sorting CHILD 2: %ld microseconds\n",
((tvalAfter1.tv_sec - tvalBefore1.tv_sec) * 1000000L
+ tvalAfter1.tv_usec) - tvalBefore1.tv_usec);
}
else
{
wait(0);
wait(0);
printf("Parent:\n");
gettimeofday(&tvalBefore, NULL);
mergeSort(arr, 0, arr_size - 1);
gettimeofday(&tvalAfter, NULL);
printArray(arr, arr_size);
print_timestamp();
printf("Time in microseconds for sorting Parent: %ld microseconds\n",
((tvalAfter.tv_sec - tvalBefore.tv_sec) * 1000000L
+ tvalAfter.tv_usec) - tvalBefore.tv_usec);
}
return 0;
}
Compilation (source in ms83.c):
$ gcc -O3 -g -std=c11 -Wall -Wextra -Werror -Wmissing-prototypes \
> -Wstrict-prototypes -Wold-style-definition ms83.c -o ms83
$
Example run 1:
Given array is
574 494 441 870 1121 800 1864 1819 889 242
Child1:
242 441 494 574 800 870 889 1121 1819 1864
2017-05-05 21:31:23
Time in microseconds for sorting CHILD 1: 10 microseconds
Child2:
2017-05-05 21:31:23
242 441 494 574 800 870 889 1121 1819 1864
Time in microseconds for sorting CHILD 2: 3260 microseconds
Parent:
242 441 494 574 800 870 889 1121 1819 1864
2017-05-05 21:31:23
Time in microseconds for sorting Parent: 7 microseconds
Example run 2:
Given array is
150 562 748 1685 889 1859 1807 1904 863 1675
Child1:
150 562 748 863 889 1675 1685 1807 1859 1904
2017-05-05 21:31:40
Time in microseconds for sorting CHILD 1: 11 microseconds
Child2:
2017-05-05 21:31:40
150 562 748 863 889 1675 1685 1807 1859 1904
Time in microseconds for sorting CHILD 2: 4745 microseconds
Parent:
150 562 748 863 889 1675 1685 1807 1859 1904
2017-05-05 21:31:40
Time in microseconds for sorting Parent: 7 microseconds
Note that the threading solution is roughly two to three orders of magnitude slower than the non-threading code (about 3,000 to 5,000 microseconds versus about 10).
When I tried increasing the array size from 10 to 10,000, the threaded child did not complete, which suggests that thread creation failed somewhere. The error reporting is defective (I was being lazy). Switching to 500 entries yielded:
Given array is
1984 1436 713 1349 855 1296 559 1647 567 1153 1156 1395 865 1380 840 1253 714 1396 333 404 538 1468 1381 489 1274 34 697 1484 1742 756 1221 1717 331 532 746 842 1235 1179 1185 1547 1372 1305 138 404 76 762 605 61 1242 1075 1896 203 1173 844 1582 1356 1044 1760 1635 1833 1595 1651 1892 1842 1508 727 357 221 878 967 1665 1783 1927 1655 1110 220 711 371 1785 401 188 1132 1947 1214 5 1414 1065 730 826 807 1155 654 1745 1993 1215 741 1721 1509 604 16 139 804 1773 690 1673 861 1657 566 969 1891 1718 1801 200 1817 235 711 372 319 507 483 1332 968 1138 246 1082 1074 1569 1774 488 358 1713 350 583 381 418 300 1011 416 563 748 1858 837 1678 1336 1516 1177 1449 1664 1991 1465 1159 1653 1724 311 1360 902 1182 1768 1471 1606 1813 1925 825 122 1647 1790 1575 323 153 33 1825 1343 1183 1707 1724 1839 1190 1936 442 1370 206 1530 1142 561 952 478 25 1666 382 1092 418 720 1864 652 313 1878 1268 993 1446 1881 893 1416 319 577 1147 688 1155 726 1336 1354 1419 217 1236 213 1715 101 946 1450 135 297 1962 1405 455 924 26 569 755 64 1459 1636 395 1417 138 924 1360 893 1216 1231 1546 1104 252 697 1602 1794 1565 1945 1738 941 1813 1829 714 280 369 1861 1466 1195 1284 1936 78 1988 145 1541 1927 833 135 913 1214 405 23 1107 390 242 309 964 1311 724 284 342 1550 1394 759 1860 28 1369 1417 362 747 1732 26 1791 646 1817 1392 666 762 1297 945 507 58 928 1972 811 170 1660 1811 1969 573 242 1297 74 581 1513 1258 1311 547 627 942 1965 945 343 1633 197 843 249 77 320 611 1674 303 1346 148 533 1800 259 916 1498 1058 365 973 451 1143 1121 1033 126 595 726 1232 894 1584 878 1076 1796 257 531 144 740 1033 630 471 919 773 1276 1523 1195 475 667 40 91 1336 350 1650 970 1712 542 1927 168 1107 917 1271 649 1006 1428 20 1341 1283 774 1781 1427 1342 316 1317 1162 1333 991 1288 1853 1917 210 1589 1744 1942 962 557 1444 396 1330 378 625 1776 179 434 290 870 961 1365 226 605 1842 1629 1421 1883 108 102 1068 671 1086 692 1053 45 660 1746 1351 399 1308 833 42 1219 491 248 503 499 3 1965 1043 1452 604 1736 1974 675 
14 1491 1757 1116 1520 1540 983 108 15 1030 742 1535 423 1802 1622 1401 1801 167 824 230 404 1722 814 1222 1626 1177 1772 1645 27 1061 1848 1031 1659 1725 1862 959 362 728 1644 957 934 1160 1862 915 995 1201 119 1191 259 963 1889
Child1:
3 5 14 15 16 20 23 25 26 26 27 28 33 34 40 42 45 58 61 64 74 76 77 78 91 101 102 108 108 119 122 126 135 135 138 138 139 144 145 148 153 167 168 170 179 188 197 200 203 206 210 213 217 220 221 226 230 235 242 242 246 248 249 252 257 259 259 280 284 290 297 300 303 309 311 313 316 319 319 320 323 331 333 342 343 350 350 357 358 362 362 365 369 371 372 378 381 382 390 395 396 399 401 404 404 404 405 416 418 418 423 434 442 451 455 471 475 478 483 488 489 491 499 503 507 507 531 532 533 538 542 547 557 559 561 563 566 567 569 573 577 581 583 595 604 604 605 605 611 625 627 630 646 649 652 654 660 666 667 671 675 688 690 692 697 697 711 711 713 714 714 720 724 726 726 727 728 730 740 741 742 746 747 748 755 756 759 762 762 773 774 804 807 811 814 824 825 826 833 833 837 840 842 843 844 855 861 865 870 878 878 893 893 894 902 913 915 916 917 919 924 924 928 934 941 942 945 945 946 952 957 959 961 962 963 964 967 968 969 970 973 983 991 993 995 1006 1011 1030 1031 1033 1033 1043 1044 1053 1058 1061 1065 1068 1074 1075 1076 1082 1086 1092 1104 1107 1107 1110 1116 1121 1132 1138 1142 1143 1147 1153 1155 1155 1156 1159 1160 1162 1173 1177 1177 1179 1182 1183 1185 1190 1191 1195 1195 1201 1214 1214 1215 1216 1219 1221 1222 1231 1232 1235 1236 1242 1253 1258 1268 1271 1274 1276 1283 1284 1288 1296 1297 1297 1305 1308 1311 1311 1317 1330 1332 1333 1336 1336 1336 1341 1342 1343 1346 1349 1351 1354 1356 1360 1360 1365 1369 1370 1372 1380 1381 1392 1394 1395 1396 1401 1405 1414 1416 1417 1417 1419 1421 1427 1428 1436 1444 1446 1449 1450 1452 1459 1465 1466 1468 1471 1484 1491 1498 1508 1509 1513 1516 1520 1523 1530 1535 1540 1541 1546 1547 1550 1565 1569 1575 1582 1584 1589 1595 1602 1606 1622 1626 1629 1633 1635 1636 1644 1645 1647 1647 1650 1651 1653 1655 1657 1659 1660 1664 1665 1666 1673 1674 1678 1707 1712 1713 1715 1717 1718 1721 1722 1724 1724 1725 1732 1736 1738 1742 1744 1745 1746 1757 1760 1768 1772 1773 1774 1776 1781 1783 1785 1790 1791 1794 1796 1800 1801 1801 1802 
1811 1813 1813 1817 1817 1825 1829 1833 1839 1842 1842 1848 1853 1858 1860 1861 1862 1862 1864 1878 1881 1883 1889 1891 1892 1896 1917 1925 1927 1927 1927 1936 1936 1942 1945 1947 1962 1965 1965 1969 1972 1974 1984 1988 1991 1993
2017-05-05 21:43:11
Time in microseconds for sorting CHILD 1: 62 microseconds
Child2:
2017-05-05 21:43:11
3 5 14 15 16 20 23 25 26 26 27 28 33 34 40 42 45 58 61 64 74 76 77 78 91 101 102 108 108 119 122 126 135 135 138 138 139 144 145 148 153 167 168 170 179 188 197 200 203 206 210 213 217 220 221 226 230 235 242 242 246 248 249 252 257 259 259 280 284 290 297 300 303 309 311 313 316 319 319 320 323 331 333 342 343 350 350 357 358 362 362 365 369 371 372 378 381 382 390 395 396 399 401 404 404 404 405 416 418 418 423 434 442 451 455 471 475 478 483 488 489 491 499 503 507 507 531 532 533 538 542 547 557 559 561 563 566 567 569 573 577 581 583 595 604 604 605 605 611 625 627 630 646 649 652 654 660 666 667 671 675 688 690 692 697 697 711 711 713 714 714 720 724 726 726 727 728 730 740 741 742 746 747 748 755 756 759 762 762 773 774 804 807 811 814 824 825 826 833 833 837 840 842 843 844 855 861 865 870 878 878 893 893 894 902 913 915 916 917 919 924 924 928 934 941 942 945 945 946 952 957 959 961 962 963 964 967 968 969 970 973 983 991 993 995 1006 1011 1030 1031 1033 1033 1043 1044 1053 1058 1061 1065 1068 1074 1075 1076 1082 1086 1092 1104 1107 1107 1110 1116 1121 1132 1138 1142 1143 1147 1153 1155 1155 1156 1159 1160 1162 1173 1177 1177 1179 1182 1183 1185 1190 1191 1195 1195 1201 1214 1214 1215 1216 1219 1221 1222 1231 1232 1235 1236 1242 1253 1258 1268 1271 1274 1276 1283 1284 1288 1296 1297 1297 1305 1308 1311 1311 1317 1330 1332 1333 1336 1336 1336 1341 1342 1343 1346 1349 1351 1354 1356 1360 1360 1365 1369 1370 1372 1380 1381 1392 1394 1395 1396 1401 1405 1414 1416 1417 1417 1419 1421 1427 1428 1436 1444 1446 1449 1450 1452 1459 1465 1466 1468 1471 1484 1491 1498 1508 1509 1513 1516 1520 1523 1530 1535 1540 1541 1546 1547 1550 1565 1569 1575 1582 1584 1589 1595 1602 1606 1622 1626 1629 1633 1635 1636 1644 1645 1647 1647 1650 1651 1653 1655 1657 1659 1660 1664 1665 1666 1673 1674 1678 1707 1712 1713 1715 1717 1718 1721 1722 1724 1724 1725 1732 1736 1738 1742 1744 1745 1746 1757 1760 1768 1772 1773 1774 1776 1781 1783 1785 1790 1791 1794 1796 1800 1801 1801 1802 
1811 1813 1813 1817 1817 1825 1829 1833 1839 1842 1842 1848 1853 1858 1860 1861 1862 1862 1864 1878 1881 1883 1889 1891 1892 1896 1917 1925 1927 1927 1927 1936 1936 1942 1945 1947 1962 1965 1965 1969 1972 1974 1984 1988 1991 1993
Time in microseconds for sorting CHILD 2: 83377 microseconds
Parent:
3 5 14 15 16 20 23 25 26 26 27 28 33 34 40 42 45 58 61 64 74 76 77 78 91 101 102 108 108 119 122 126 135 135 138 138 139 144 145 148 153 167 168 170 179 188 197 200 203 206 210 213 217 220 221 226 230 235 242 242 246 248 249 252 257 259 259 280 284 290 297 300 303 309 311 313 316 319 319 320 323 331 333 342 343 350 350 357 358 362 362 365 369 371 372 378 381 382 390 395 396 399 401 404 404 404 405 416 418 418 423 434 442 451 455 471 475 478 483 488 489 491 499 503 507 507 531 532 533 538 542 547 557 559 561 563 566 567 569 573 577 581 583 595 604 604 605 605 611 625 627 630 646 649 652 654 660 666 667 671 675 688 690 692 697 697 711 711 713 714 714 720 724 726 726 727 728 730 740 741 742 746 747 748 755 756 759 762 762 773 774 804 807 811 814 824 825 826 833 833 837 840 842 843 844 855 861 865 870 878 878 893 893 894 902 913 915 916 917 919 924 924 928 934 941 942 945 945 946 952 957 959 961 962 963 964 967 968 969 970 973 983 991 993 995 1006 1011 1030 1031 1033 1033 1043 1044 1053 1058 1061 1065 1068 1074 1075 1076 1082 1086 1092 1104 1107 1107 1110 1116 1121 1132 1138 1142 1143 1147 1153 1155 1155 1156 1159 1160 1162 1173 1177 1177 1179 1182 1183 1185 1190 1191 1195 1195 1201 1214 1214 1215 1216 1219 1221 1222 1231 1232 1235 1236 1242 1253 1258 1268 1271 1274 1276 1283 1284 1288 1296 1297 1297 1305 1308 1311 1311 1317 1330 1332 1333 1336 1336 1336 1341 1342 1343 1346 1349 1351 1354 1356 1360 1360 1365 1369 1370 1372 1380 1381 1392 1394 1395 1396 1401 1405 1414 1416 1417 1417 1419 1421 1427 1428 1436 1444 1446 1449 1450 1452 1459 1465 1466 1468 1471 1484 1491 1498 1508 1509 1513 1516 1520 1523 1530 1535 1540 1541 1546 1547 1550 1565 1569 1575 1582 1584 1589 1595 1602 1606 1622 1626 1629 1633 1635 1636 1644 1645 1647 1647 1650 1651 1653 1655 1657 1659 1660 1664 1665 1666 1673 1674 1678 1707 1712 1713 1715 1717 1718 1721 1722 1724 1724 1725 1732 1736 1738 1742 1744 1745 1746 1757 1760 1768 1772 1773 1774 1776 1781 1783 1785 1790 1791 1794 1796 1800 1801 1801 1802 
1811 1813 1813 1817 1817 1825 1829 1833 1839 1842 1842 1848 1853 1858 1860 1861 1862 1862 1864 1878 1881 1883 1889 1891 1892 1896 1917 1925 1927 1927 1927 1936 1936 1942 1945 1947 1962 1965 1965 1969 1972 1974 1984 1988 1991 1993
2017-05-05 21:43:11
Time in microseconds for sorting Parent: 51 microseconds
Different runs showed dramatic variations in the processing time for child 2. I observed the values: 83,377; 73,929; 78,977; 83,977; 94,159; 81,526 microseconds.
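The elapsed-time arithmetic repeated in the three branches of main() can be factored into one small helper. The sketch below is only an illustration; the name elapsed_us is hypothetical and does not appear in the original program.

```c
#include <sys/time.h>

/* Hypothetical helper: microseconds elapsed between two gettimeofday()
   samples. Assumes `after` was taken no earlier than `before`. */
long elapsed_us(const struct timeval *before, const struct timeval *after)
{
    return (long)(after->tv_sec - before->tv_sec) * 1000000L
         + (long)(after->tv_usec - before->tv_usec);
}
```

Each printf could then pass elapsed_us(&tvalBefore, &tvalAfter) instead of spelling out the expression.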
You might get some benefit from threading with large data sets sorted by a small number of threads (say 10,000 rows of data, but only 8 threads, each sorting 1250 rows of data), but probably not even then. As you increase the number of threads beyond the number of cores on the system, you get less and less benefit from the multiple threads.
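One way to keep the thread count small is to let only the top few levels of the recursion create threads. The following is a minimal sketch of that idea, not the code from this answer: struct sort_task, sort_range and bounded_merge_sort are hypothetical names, and the depth parameter is an assumption (depth 3 allows at most 8 threads working at once, counting the caller). It would typically be compiled with -pthread.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical task descriptor; not the struct Sort used above. */
struct sort_task {
    int *data;      /* array being sorted */
    int *tmp;       /* scratch buffer, same length as data */
    size_t lo, hi;  /* half-open range [lo, hi) */
    int depth;      /* remaining levels allowed to spawn a thread */
};

/* Merge the sorted ranges [lo, mid) and [mid, hi) via the scratch buffer. */
static void merge_halves(int *data, int *tmp, size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        tmp[k++] = (data[j] < data[i]) ? data[j++] : data[i++];
    while (i < mid)
        tmp[k++] = data[i++];
    while (j < hi)
        tmp[k++] = data[j++];
    memcpy(data + lo, tmp + lo, (hi - lo) * sizeof *data);
}

static void *sort_range(void *arg)
{
    struct sort_task *t = arg;
    if (t->hi - t->lo < 2)
        return NULL;
    size_t mid = t->lo + (t->hi - t->lo) / 2;
    struct sort_task left  = { t->data, t->tmp, t->lo, mid, t->depth - 1 };
    struct sort_task right = { t->data, t->tmp, mid, t->hi, t->depth - 1 };
    pthread_t tid;
    /* Spawn a thread for the left half only while depth remains;
       fall back to a plain recursive call if creation fails. */
    int spawned = t->depth > 0 &&
                  pthread_create(&tid, NULL, sort_range, &left) == 0;
    if (!spawned)
        sort_range(&left);
    sort_range(&right);          /* the current thread handles the right half */
    if (spawned)
        pthread_join(tid, NULL);
    merge_halves(t->data, t->tmp, t->lo, mid, t->hi);
    return NULL;
}

/* Sorts data[0..n). Returns 0 on success, -1 if the scratch buffer
   cannot be allocated. */
int bounded_merge_sort(int *data, size_t n, int depth)
{
    int *tmp = malloc((n ? n : 1) * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    struct sort_task root = { data, tmp, 0, n, depth };
    sort_range(&root);
    free(tmp);
    return 0;
}
```

Because a failed pthread_create() simply degrades to an ordinary recursive call, the sort still completes when the system refuses to create more threads. Whether it is actually faster than the single-threaded version depends on the data size and the number of cores.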

List windows route table

I want to list the entries of the Windows routing table, producing the same output as route print. I use the GetIpForwardTable2 function from the IP Helper API, but I get some weird results that differ from the output of the route command.
I run it on 64-bit Windows 7 in VirtualBox, where I have 3 network cards (NAT, Bridged, and Internal Network), and compile it under Cygwin with the following command:
gcc -D_WIN32_WINNT=0x0601 -DNTDDI_VERSION=0x06010000 win-iproute.c -liphlpapi
The _WIN32_WINNT and NTDDI_VERSION definitions are only there to make the Windows 7 functionality available.
To keep it simpler, I consider IPv4 only for now.
Here is the code:
#include <windows.h>
#include <winsock2.h>
#include <iphlpapi.h>
#include <Mstcpip.h>
#include <stdio.h>
int main(int argc, char *argv[])
{
DWORD retval;
MIB_IPFORWARD_TABLE2 *routes = NULL;
MIB_IPFORWARD_ROW2 *route;
int idx;
retval = GetIpForwardTable2(AF_INET, &routes);
if (retval != ERROR_SUCCESS)
{
fprintf(stderr, "GetIpForwardTable2 failed (0x%lx)\n", (unsigned long)retval);
return 1;
}
printf("Route entries count: %lu\n", routes->NumEntries);
for (idx = 0; idx < routes->NumEntries; idx++)
{
printf("\n -- Entry #%d -- \n", idx);
route = routes->Table + idx;
printf("luid: \t\t Reserved: %u, NetLuidIndex %u, IfType %u\n",
route->InterfaceLuid.Info.Reserved,
route->InterfaceLuid.Info.NetLuidIndex,
route->InterfaceLuid.Info.IfType);
printf("protocol: \t %lu\n", route->Protocol);
printf("origin: \t %lu\n", route->Origin);
printf("loopback: \t %lu\n", route->Loopback);
printf("next hop: \t %s\n", inet_ntoa(route->NextHop.Ipv4.sin_addr));
printf("site prefix length: \t %u\n", route->SitePrefixLength);
printf("prefix length: \t %u\n", route->DestinationPrefix.PrefixLength);
printf("prefix : \t %s\n", inet_ntoa(route->DestinationPrefix.Prefix.Ipv4.sin_addr));
}
return 0;
}
And the output is:
Route entries count: 22
-- Entry #0 --
luid: Reserved: 0, NetLuidIndex 6, IfType 6
protocol: 0
origin: 0
loopback: 0
next hop: 0.0.0.0
site prefix length: 0
prefix length: 0
prefix : 0.0.0.0
-- Entry #1 --
luid: Reserved: 0, NetLuidIndex 0, IfType 0
protocol: 0
origin: 0
loopback: 0
next hop: 0.0.0.0
site prefix length: 0
prefix length: 3
prefix : 0.0.0.0
-- Entry #2 --
luid: Reserved: 0, NetLuidIndex 0, IfType 0
protocol: 4294967295
origin: 257
loopback: 0
next hop: 0.0.0.0
site prefix length: 0
prefix length: 10
prefix : 0.1.0.0
-- Entry #3 --
luid: Reserved: 17, NetLuidIndex 0, IfType 0
protocol: 11
origin: 0
loopback: 2
next hop: 0.0.0.0
site prefix length: 17
prefix length: 0
prefix : 2.0.0.0
-- Entry #4 --
luid: Reserved: 0, NetLuidIndex 0, IfType 0
protocol: 32
origin: 0
loopback: 2
next hop: 0.1.0.0
site prefix length: 0
prefix length: 255
prefix : 2.0.0.0
-- Entry #5 --
luid: Reserved: 0, NetLuidIndex 0, IfType 0
protocol: 0
origin: 256
loopback: 255
next hop: 0.0.0.0
site prefix length: 0
prefix length: 11
prefix : 255.255.255.255
-- Entry #6 --
luid: Reserved: 3, NetLuidIndex 65792, IfType 0
protocol: 201326592
origin: 2
loopback: 0
next hop: 0.0.0.0
site prefix length: 3
prefix length: 24
prefix : 0.0.6.0
-- Entry #7 --
luid: Reserved: 5855577, NetLuidIndex 89, IfType 0
protocol: 0
origin: 2
loopback: 0
next hop: 0.1.0.0
site prefix length: 89
prefix length: 0
prefix : 0.0.0.0
-- Entry #8 --
luid: Reserved: 0, NetLuidIndex 0, IfType 0
protocol: 0
origin: 4294967295
loopback: 0
next hop: 2.0.0.0
site prefix length: 0
prefix length: 0
prefix : 0.0.0.0
-- Entry #9 --
luid: Reserved: 16777215, NetLuidIndex 65791, IfType 0
protocol: 593
origin: 1572864
loopback: 0
next hop: 2.0.0.0
site prefix length: 255
prefix length: 0
prefix : 0.0.0.0
-- Entry #10 --
luid: Reserved: 1, NetLuidIndex 512, IfType 0
protocol: 0
origin: 0
loopback: 0
next hop: 255.255.255.255
site prefix length: 1
prefix length: 0
prefix : 0.0.0.0
-- Entry #11 --
luid: Reserved: 4, NetLuidIndex 512, IfType 0
protocol: 0
origin: 0
loopback: 0
next hop: 0.0.6.0
site prefix length: 4
prefix length: 81
prefix : 0.0.0.0
-- Entry #12 --
luid: Reserved: 0, NetLuidIndex 16776960, IfType 65535
protocol: 3
origin: 1
loopback: 0
next hop: 0.0.0.0
site prefix length: 0
prefix length: 0
prefix : 0.1.0.0
-- Entry #13 --
luid: Reserved: 0, NetLuidIndex 12, IfType 6
protocol: 4294967295
origin: 0
loopback: 0
next hop: 0.0.0.0
site prefix length: 0
prefix length: 0
prefix : 0.0.0.0
-- Entry #14 --
luid: Reserved: 0, NetLuidIndex 0, IfType 0
protocol: 0
origin: 0
loopback: 0
next hop: 0.0.0.0
site prefix length: 0
prefix length: 3
prefix : 0.0.0.0
-- Entry #15 --
luid: Reserved: 0, NetLuidIndex 0, IfType 0
protocol: 4294967295
origin: 257
loopback: 0
next hop: 0.0.0.0
site prefix length: 0
prefix length: 255
prefix : 0.1.0.0
-- Entry #16 --
luid: Reserved: 585, NetLuidIndex 0, IfType 0
protocol: 3449440
origin: 0
loopback: 0
next hop: 0.0.0.0
site prefix length: 73
prefix length: 0
prefix : 2.0.0.0
-- Entry #17 --
luid: Reserved: 3211321, NetLuidIndex 13056, IfType 65
protocol: 3342403
origin: 4325427
loopback: 49
next hop: 125.0.0.0
site prefix length: 53
prefix length: 68
prefix : 54.0.45.0
-- Entry #18 --
luid: Reserved: 3473453, NetLuidIndex 17408, IfType 54
protocol: 0
origin: 0
loopback: 0
next hop: 0.0.0.0
site prefix length: 0
prefix length: 0
prefix : 70.0.69.0
-- Entry #19 --
luid: Reserved: 0, NetLuidIndex 0, IfType 0
protocol: 7471205
origin: 7274610
loopback: 0
next hop: 115.0.97.0
site prefix length: 111
prefix length: 0
prefix : 0.0.0.0
-- Entry #20 --
luid: Reserved: 7274611, NetLuidIndex 26112, IfType 116
protocol: 3277144
origin: 50725
loopback: 0
next hop: 49.69.55.56
site prefix length: 51
prefix length: 56
prefix : 65.0.100.0
-- Entry #21 --
luid: Reserved: 3277144, NetLuidIndex 0, IfType 0
protocol: 0
origin: 0
loopback: 0
next hop: 0.0.0.0
site prefix length: 192
prefix length: 0
prefix : 16.0.0.0
The output of route print -4, on the other hand, is the following:
===========================================================================
Interface List
16...08 00 27 7e 98 16 ......Intel(R) PRO/1000 MT Desktop Adapter #3
14...08 00 27 86 3d 31 ......Intel(R) PRO/1000 MT Desktop Adapter #2
11...08 00 27 42 d2 16 ......Intel(R) PRO/1000 MT Desktop Adapter
1...........................Software Loopback Interface 1
12...00 00 00 00 00 00 00 e0 Microsoft ISATAP Adapter
13...00 00 00 00 00 00 00 e0 Teredo Tunneling Pseudo-Interface
15...00 00 00 00 00 00 00 e0 Microsoft ISATAP Adapter #2
17...00 00 00 00 00 00 00 e0 Microsoft ISATAP Adapter #3
===========================================================================
IPv4 Route Table
===========================================================================
Active Routes:
Network Destination Netmask Gateway Interface Metric
0.0.0.0 0.0.0.0 10.0.2.2 10.0.2.15 10
0.0.0.0 0.0.0.0 10.0.0.138 10.0.0.36 10
10.0.0.0 255.255.255.0 On-link 10.0.0.36 266
10.0.0.36 255.255.255.255 On-link 10.0.0.36 266
10.0.0.255 255.255.255.255 On-link 10.0.0.36 266
10.0.2.0 255.255.255.0 On-link 10.0.2.15 266
10.0.2.15 255.255.255.255 On-link 10.0.2.15 266
10.0.2.255 255.255.255.255 On-link 10.0.2.15 266
89.89.89.0 255.255.255.0 On-link 89.89.89.89 266
89.89.89.89 255.255.255.255 On-link 89.89.89.89 266
89.89.89.255 255.255.255.255 On-link 89.89.89.89 266
127.0.0.0 255.0.0.0 On-link 127.0.0.1 306
127.0.0.1 255.255.255.255 On-link 127.0.0.1 306
127.255.255.255 255.255.255.255 On-link 127.0.0.1 306
224.0.0.0 240.0.0.0 On-link 127.0.0.1 306
224.0.0.0 240.0.0.0 On-link 10.0.2.15 266
224.0.0.0 240.0.0.0 On-link 10.0.0.36 266
224.0.0.0 240.0.0.0 On-link 89.89.89.89 266
255.255.255.255 255.255.255.255 On-link 127.0.0.1 306
255.255.255.255 255.255.255.255 On-link 10.0.2.15 266
255.255.255.255 255.255.255.255 On-link 10.0.0.36 266
255.255.255.255 255.255.255.255 On-link 89.89.89.89 266
===========================================================================
Persistent Routes:
None
There is a lot of weird stuff in my program's output. Many entries have undocumented values, for example:
Protocol should be within the range 1-14 (almost no entry has such a value)
Luid.IfType shouldn't be 0 (again, almost all are zero)
almost no entry gives a reasonable Prefix
These fields are described in the documentation for MIB_IPFORWARD_ROW2 and NET_LUID.
Should I just ignore the entries with invalid values? And if so, where are the valid ones? Or am I doing something terribly wrong?
I also discovered that when I start Windows with the cables unplugged, it gives fewer entries (which makes sense). When I plug in the cables, entries are added. But when I unplug them again, the entries are still there. The route command works as expected: when a cable is unplugged, the entries are removed.
When I try the older function GetIpForwardTable, it works, but it doesn't support IPv6.
So it seems that the problem was in Cygwin. When I compile the example code with the Microsoft C compiler cl.exe, it works as expected. After updating Cygwin, it also works when compiled with gcc.
Interestingly, it was enough to update the packages using the Cygwin installer; cygwin1.dll can remain at the older version.
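For comparing the two outputs, note that route print shows a netmask column while the program prints DestinationPrefix.PrefixLength. The following is a small portable sketch of the conversion; prefix_to_netmask and format_netmask are hypothetical helper names, not part of the IP Helper API.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: turn an IPv4 prefix length (0..32) into a netmask
   in host byte order. Values above 32 are treated as 32. */
uint32_t prefix_to_netmask(unsigned prefix_len)
{
    if (prefix_len == 0)
        return 0;                    /* avoid shifting a 32-bit value by 32 */
    if (prefix_len >= 32)
        return 0xFFFFFFFFu;
    return 0xFFFFFFFFu << (32 - prefix_len);
}

/* Formats the mask as a dotted quad into buf (at least 16 bytes). */
void format_netmask(unsigned prefix_len, char *buf, size_t size)
{
    uint32_t m = prefix_to_netmask(prefix_len);
    snprintf(buf, size, "%u.%u.%u.%u",
             (unsigned)(m >> 24) & 0xFFu, (unsigned)(m >> 16) & 0xFFu,
             (unsigned)(m >> 8) & 0xFFu, (unsigned)m & 0xFFu);
}
```

With this, a prefix length of 24 corresponds to the 255.255.255.0 rows in the route print output, and a prefix length of 4 to the 240.0.0.0 multicast rows.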

My program returns the wrong values

I am making a program that will calculate the minimum and maximum cost of a flight (supposed to be a simple program to practice for an exam), using a separate function to calculate the cost of the flight.
The code is this:
#include<stdio.h>
#include<limits.h>
float cost(float k, int ck, int n)
{
int x;
x = (k*ck)/n;
return x;
}
main()
{
int cont=1, n, nv, costmax = 0, costmin = INT_MAX, ck;
float k;
printf("Introduce the number of flights: \n");
scanf("%d", &nv);
for(cont=1; cont <= nv; cont++)
{
printf("Introduce the number of passangers on flight %d:\n", cont);
scanf("%d", &n);
printf("Introduce the number of distance on flight %d:\n", cont);
scanf("%d", &k);
if(k < 500)
{
ck=50;
}
if(k > 500)
{
ck=80;
}
cost(k,ck,n);
if(cost(k, ck, n) < costmin)
{
costmin = cost(k, ck, n);
}
if(cost(k, ck, n) > costmax)
{
costmax = cost(k, ck, n);
}
}
printf("\nMinimum cost = %d \n", costmin);
printf("\nMaximum cost = %d \n", costmax);
}
We're supposed to use a text file to input the data:
156 397 798 375 489 901 937 519 797 205 883 247 1186 738 860 967 550 887 743 753 906 582 819 665 1112 231 1009 761 921 634 686 591 1027 646 1161 424 668 413 1190 423 840 381 431 559 455 496 1105 489 848 775 456 637 664 760 412 689 639 752 669 312 940 955 706 726 579 556 655 335 902 755 665 431 1093 627 569 310 647 327 943 354 647 733 979 711 504 443 509 266 833 856 667 603 1101 670 688 898 498 669 1149 601 808 934 718 880 1053 977 556 719 1012 286 665 882 456 623 437 632 475 320 494 672 775 548 678 935 984 464 1188 641 749 816 1191 528 1092 203 770 923 1153 220 929 321 789 350 720 745 694 790 687 669 826 372 1029 392 839 932 462 806 882 539 524 797 1084 516 449 218 1048 638 751 889 448 479 465 633 1123 862 904 383 494 472 1117 365 415 889 765 670 941 341 929 876 575 940 565 967 850 473 1119 632 953 904 815 316 409 364 959 287 848 584 574 998 915 826 558 877 858 376 817 591 1068 443 447 428 1081 823 1122 373 852 598 995 735 1028 313 623 820 981 505 753 529 574 433 699 875 1032 833 1068 765 949 691 1145 358 505 251 617 417 945 694 889 323 1028 986 567 269 605 337 1153 926 590 607 803 202 1101 232 771 855 759 776 1011 878 884 393 636 230 1098 788 1140 447 1076 537 1077 734 724 266 635 232 406 752 628 743 848 537 490 598 913 416 855 640 634 209 1172 329 705 249 881 882 817
The program doesn't present any errors or warnings when compiling, but when I run it, it says that the minimum cost and the maximum cost are 0...
I've been checking everything over and over and can't find what's wrong.
Any ideas?
BTW, I'm using a Linux machine to run the program; I don't know if it makes a difference...
Compile with the -Wall flag; this will help you catch errors by yourself.
Using gcc:
% gcc t.c -Wall
t.c:9:1: warning: return type defaults to ‘int’ [-Wreturn-type]
main()
^
t.c: In function ‘main’:
t.c:20:9: warning: format ‘%d’ expects argument of type ‘int *’, but argument 2 has type ‘float *’ [-Wformat=]
scanf("%d", &k);
^
t.c:41:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
Using clang:
% clang t.c -Wall
t.c:9:1: warning: type specifier missing, defaults to 'int' [-Wimplicit-int]
main()
^~~~
t.c:20:21: warning: format specifies type 'int *' but the argument has type 'float *' [-Wformat]
scanf("%d", &k);
~~ ^~
%f
2 warnings generated.
Clang suggests replacing:
scanf("%d", &k);
with
scanf("%f", &k);
And even if it's not as critical, you forgot to specify the return type of the main function. Both compilers defaulted it to int, but you should also return something at the end of your program.
Finally, as suggested in the comments, you can also use -Wextra. I would also recommend that, while your projects are small and you are still learning, you stick to a "zero warnings" policy. That will help you prevent bugs.
Since k is a float, this is wrong:
scanf("%d", &k);
You need:
if (scanf("%f", &k) != 1)
break;
This uses the correct format and checks for errors. A basic debugging technique is to print out the values you've just read to ensure that the program got what you think it should have gotten.
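As a minimal sketch of both points (checking the return value and printing what was read), the input step could be isolated in a helper like the one below. The name read_flight and the use of a FILE * parameter are assumptions for illustration; the original program reads directly with scanf().

```c
#include <stdio.h>

/* Hypothetical helper: reads one flight record (passenger count, then
   distance) from `in`. Returns 1 on success, 0 on end of input or on
   malformed data. */
int read_flight(FILE *in, int *passengers, float *distance)
{
    if (fscanf(in, "%d", passengers) != 1)
        return 0;
    if (fscanf(in, "%f", distance) != 1)   /* %f matches a float * */
        return 0;
    /* Echo what was read, as a basic debugging aid. */
    fprintf(stderr, "read: passengers=%d distance=%.2f\n",
            *passengers, *distance);
    return 1;
}
```

Passing stdin as the first argument keeps the behaviour of the original program, including when the data file is redirected into it from the shell.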
There are other problems too. This code is redundant:
cost(k,ck,n);
if(cost(k, ck, n) < costmin)
{
costmin = cost(k, ck, n);
}
if(cost(k, ck, n) > costmax)
{
costmax = cost(k, ck, n);
}
You call the function up to 5 times to get the same answer each time. The first call you ignore altogether. You should probably use something like:
float new_cost = cost(k,ck,n);
if (new_cost < costmin)
costmin = new_cost;
if (new_cost > costmax)
costmax = new_cost;
You should also use an explicit return type for main():
int main(void)
Normally, 'passengers' is spelled with one 'a' and two 'e's.
It isn't entirely clear whether the cost() function is written appropriately. It takes one float and two int values and combines them and assigns the result to an int before returning that as a float. As written, it will work. Whether that's what you want is another matter. Since costmin and costmax are of type int, there's another level of uncertainty about what's the best type for these values.
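To make the difference concrete, here is a small sketch that contrasts the arithmetic of the original cost() with a variant that keeps the fractional part. The names cost_truncated and cost_exact are hypothetical, and both assume n is not zero.

```c
/* Same arithmetic as the cost() in the question: the quotient is stored
   in an int, so the fractional part is discarded before it is returned. */
float cost_truncated(float k, int ck, int n)
{
    int x = (k * ck) / n;
    return x;
}

/* Variant that keeps the fraction (an alternative, not the original code). */
float cost_exact(float k, int ck, int n)
{
    return (k * ck) / n;
}
```

For k = 600, ck = 80 and n = 7, the first version yields 6857 while the second yields about 6857.14. Which one is appropriate depends on whether costs are meant to be whole numbers.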
Also, generally avoid trailing blanks in your output. A space before \n is almost always superfluous, if not wrong; I'd go for almost always wrong, though. (It is good that you end messages with a newline, however: omitting it is a worse problem than trailing blanks, and one that is prevalent in the world of C on Windows.)
Firstly, I see no reading from file. All your readings are from console (stdin).
Also, you are calling the cost function too many times, and sometimes you take no benefit from it, like here:
}
cost(k,ck,n); //<--
if(cost(k, ck, n) < costmin)
I suggest you replace the indicated call with:
float c = cost(k, ck, n);
and then use c for the checks and assignments instead of calling cost() all over again.
Also, you are assigning a float value to an int in multiple places:
costmax = cost(k, ck, n);
costmin = cost(k, ck, n);
You also use "%d" in scanf to read k, which is a float; you should use "%f" there. If you change costmin and costmax to float, print them with "%f" as well.

Resources