How to make the program work this way? - c

So i have a program that does these calculations with numbers. The program is threaded, and the number of threads are specified from the user.
I will give a close example
static void *program_thread(void *thread)
{
bool somevar = true;
if(somevar)
{
work = getwork();
}
dowork(work);
if(condition1 blah blah)
somevar = false; /* disable getwork */
if(condition2)
somevar = true; /* condition was either met or not met, so we request
new work either way */
}
Then with pthreads(and i will skip some code) i do
int main(blah)
{
if (pthread_create(&thr->pth, NULL, program_thread, thread_number)) {
printf("%s","program thread create failed");
return 1;
}
}
Now i will start explaining. The number of threads created are specified from the user, so i do a for loop and create as many threads as i need.
Each thread calls
work = getwork();
Thus getting independant work to do, however the CPU is slow for this kind of job. It tries to compute something by trying 2^32 numbers(which is from 1 to 4 294 967 296)
But my CPU can only do around 3 million numbers per second, and by the time it reaches 4 billion numbers, it's restarted(for new work).
So i then thought of a better method. Instead of each thread getting totally different work, all the threads should get the same work and split the numbers they need to try.
The problem is, that i can't controll what work it get's, so i must fetch
work = getwork();
Before initiating the threads. The question is HOW? Using pthread_create obviously...but then what?

You get more than one way to do it:
split your work package into smaller parts (thus, your getWork returns a new, smaller work)
store your work in a common place, that you access from your thread using a reader-writer pattern
from the pthread API, the 4th parameter is given to your thread, you can do something like the following code :
Work = getWork();
if (pthread_create(&thr->pth, NULL, program_thread, (void*) &work))
...
And your program_thread function would be like that
static void *program_thread(void *pxThread)
{
Work* pWork = (Work*) pxThread;
...
Of course, you need to check the validaty of the pointer and common stuff (in my example, I created it on stack which is most probably a bad idea). Note that your code is givig a thread_number as a pointer, which is usually a bad idea. If you want to have more information transfered to your thread, simply hide it into a structure.
I'm not sure I fully understood your issue, but this could give you some hints most probably. Please note also that when doing multithreading, you need to take into account specific issues like race conditions, concurrent access and more complex lifecycle of objects...

Related

Appending values to DataSet in Apache Flink

I am currently writing an (simple) analytisis code to sum time connected powerreadings. With the data being assumingly raw (e.g. disturbances from the measuring device have not been calculated out) I have to account for disturbances by calculation the mean of the first one thousand samples. The calculation of the mean itself is not a problem. I only am unsure of how to generate the appropriate DataSet.
For now it looks about like this:
DataSet<Tupel2<long,double>>Gyrotron_1=ECRH.includeFields('11000000000'); // obviously the line to declare the first gyrotron, continues for the next ten lines, assuming separattion of not occupied space
DataSet<Tupel2<long,double>>Gyrotron_2=ECRH.includeFields('10100000000');
DataSet<Tupel2<long,double>>Gyrotron_3=ECRH.includeFields('10010000000');
DataSet<Tupel2<long,double>>Gyrotron_4=ECRH.includeFields('10001000000');
DataSet<Tupel2<long,double>>Gyrotron_5=ECRH.includeFields('10000100000');
DataSet<Tupel2<long,double>>Gyrotron_6=ECRH.includeFields('10000010000');
DataSet<Tupel2<long,double>>Gyrotron_7=ECRH.includeFields('10000001000');
DataSet<Tupel2<long,double>>Gyrotron_8=ECRH.includeFields('10000000100');
DataSet<Tupel2<long,double>>Gyrotron_9=ECRH.includeFields('10000000010');
DataSet<Tupel2<long,double>>Gyrotron_10=ECRH.includeFields('10000000001');
for (int=1,i<=10;i++) {
DataSet<double> offset=Gyroton_'+i+'.groupBy(1).first(1000).sum()/1000;
}
It's the part in the for-loop I'm unsure of. Does anybody know if it is possible to append values to DataSets and if so how?
In case of doubt, I could always put the values into an array but I do not know if that is the wise thing to do.
This code will not work for many reasons. I'd recommend looking into the fundamentals of Java and the basic data structures and also in Flink.
It's really hard to understand what you actually try to achieve but this is the closest that I came up with
String[] codes = { "11000000000", ..., "10000000001" };
DataSet<Tuple2<Long, Double>> result = env.fromElements();
for (final String code : codes) {
DataSet<Tuple2<Long, Double>> codeResult = ECRH.includeFields(code)
.groupBy(1)
.first(1000)
.sum(0)
.map(sum -> new Tuple2<>(sum.f0, sum.f1 / 1000d));
result = codeResult.union(result);
}
result.print();
But please take the time and understand the basics before delving deeper. I also recommend to use an IDE like IntelliJ that would point to at least 6 issues in your code.

Rewriting cpufreq_frequency_table initialization for legacy cpufreq_driver

long time listener, first time caller.
I've been backporting features from upstream code as recent as 4.12-rc-whatever to a 3.4-base kernel for an older Qualcomm SoC board (apq8064, ridiculous undertaking I know).
Thus far I've been successful in almost every core api, with any compatibility issues solved by creative shims and ducttape, with the exception of cpufreq.
Keep in mind that I'm still using legacy platform drivers and clocking, no dt's or common clock frame work.
My issue begins with the inclusion of stuct cpufreq_frequency_table into struct cpufreq_policy, as part of the move from percpu to per-policy in the api. In 3.13, registering a platform's freq_table becomes more difficult for unique cases, as using cpufreq_frequency_table_get_attr is no longer an option.
In my case, the cpufreq_driver's init is generic, and relies on my platform's scaling driver (acpuclock-krait) to register the freq_table, which is fine for the older api, but becomes incompatible with the per-policy setup. The upstream so I requires the driver to manually initialize policy->freq_table and mine uses both a cpu, and an array of 35 representing the tables in the platform code. As well, it accounts for the 6 different speedbin/pvs values when choosing a table. I'm considering either dropping the "cpu" param from it and using cpumask_copy, and perhaps even combining the two drivers into one and making the clock driver a probe, but yeah, thus far init is a mystery for me. Here is the snippet of my table registration, if anyone can think of something hackable, I'd be eternally grateful...
ifdef CONFIG_CPU_FREQ_MSM
static struct cpufreq_frequency_table.freq_table[NR_CPUS][35];
extern int console_batt_stat;
static void __init cpufreq_table_init(void)
{
int cpu;
int freq_cnt = 0;
for_each_possible_cpu(cpu) {
int i;
/* Construct the freq_table tables from acpu_freq_tbl. */
for (i = 0, freq_cnt = 0; drv.acpu_freq_tbl[i].speed.khz != 0
&& freq_cnt < ARRAY_SIZE(*freq_table)-1; i++) {
if (drv.acpu_freq_tbl[i].use_for_scaling) {
freq_table[cpu][freq_cnt].index = freq_cnt;
freq_table[cpu][freq_cnt].frequency
= drv.acpu_freq_tbl[i].speed.khz;
freq_cnt++;
}
}
/* freq_table not big enough to store all usable freqs. */
BUG_ON(drv.acpu_freq_tbl[i].speed.khz != 0);
freq_table[cpu][freq_cnt].index = freq_cnt;
freq_table[cpu][freq_cnt].frequency = CPUFREQ_TABLE_END;
/* Register table with CPUFreq. */
cpufreq_frequency_table_get_attr(freq_table[cpu], cpu);
}
dev_info(drv.dev, "CPU Frequencies Supported: %d\n", freq_cnt);
}
UPDATE!!! I wanted to update the initial registration BEFORE merging all the core changes back in, and am pretty certain that I've done so. Previously, the array in question referenced a percpu dummy array that looked like this: freq_table[NR_CPUS][35] that required the cpu parameter to be listed as part of the table. I've made some changes here that allows me a percpu setup AND the platform-specific freq management( which cpufreq doesn't need to see), but with a dummy table representing the "index," which cpufreq does need to see. Commit is here, next one fixed obvious mistakes: https://github.com/robcore/machinex/commit/59d7e5307104c2396a2e4c2a5e0b07f950dea10f

How to define thread safe array?

How can I define a thread safe global array with minimal modifications?
I want like every access to it to be accomplished by using mutex and synchronized block.
Something like this as 'T' will be some type (note that 'sync' keyword is not currently defined AFAIK):
sync Array!(T) syncvar;
And every access to it will be simmilar to this:
Mutex __syncvar_mutex;
//some func scope....
synchronized(__syncvar_mutex) { /* edits 'syncvar' safely */ }
My naive attempt was to do something like this:
import std.typecons : Proxy:
synchronized class Array(T)
{
static import std.array;
private std.array.Array!T data;
mixin Proxy!data;
}
Sadly, it doesn't work because of https://issues.dlang.org/show_bug.cgi?id=14509
Can't say I am very surprised though as automagical handling of multi-threading via hidden mutexes is very unidiomatic in modern D and the very concept of synchronized classes is mostly a relict from D1 times.
You can implement same solution manually, of course, by defining own SharedArray class with all necessary methods and adding locks inside the methods before calling internal private plain Array methods. But I presume you want something that work more out of the box.
Can't invent anything better right here and now (will think about it more) but it is worth noting that in general it is encouraged in D to create data structures designed for handling shared access explicitly instead of just protecting normal data structures with mutexes. And, of course, most encouraged approach is to not shared data at all using message passing instead.
I will update the answer if anything better comes to my mind.
It is fairly easy to make a wrapper around array that will make it thread-safe. However, it is extremely difficult to make a thread-safe array that is not a concurrency bottleneck.
The closest thing that comes to mind is Java's CopyOnWriteArrayList class, but even that is not ideal...
You can wrap the array inside a struct that locks the access to the array when a thread acquires a token and until it releases it.
The wrapper/locker:
acquire(): is called in loop by a thread. As it returns a pointer, the thread knows that it has the token when the method returns a non null value.
release(): is called by a thread after processing the data whose access has been acquired previously.
.
shared struct Locker(T)
{
private:
T t;
size_t token;
public:
shared(T) * acquire()
{
if (token) return null;
else
{
import core.atomic;
atomicOp!"+="(token, 1);
return &t;
}
}
void release()
{
import core.atomic;
atomicOp!"-="(token, 1);
}
}
and a quick test:
alias LockedIntArray = Locker!(size_t[]);
shared LockedIntArray intArr;
void arrayTask(size_t cnt)
{
import core.thread, std.random;
// ensure the desynchronization of this job.
Thread.sleep( dur!"msecs"(uniform(4, 20)));
shared(size_t[])* arr = null;
// wait for the token
while(arr == null) {arr = intArr.acquire;}
*arr ~= cnt;
import std.stdio;
writeln(*arr);
// release the token for the waiting threads
intArr.release;
}
void main(string[] args)
{
import std.parallelism;
foreach(immutable i; 0..16)
{
auto job = task(&arrayTask, i);
job.executeInNewThread();
}
}
With the downside that each block of operation over the array must be surrounded with an acquire/release pair.
You have the right idea. As an array, you need to be able to both edit and retrieve information. I suggest you take a look at the read-write mutex and atomic utilities provided by Phobos. A read operation is fairly simple:
synchronize on mutex.readLock
load (with atomicLoad)
copy the item out of the synchronize block
return the copied item
Writing should be almost exactly the same. Just syncronize on mutex.writeLock and do a cas or atomicOp operation.
Note that this will only work if you copy the elements in the array during a read. If you want to get a reference, you need to do additional synchronization on the element every time you access or modify it.

Testing an Algorithms speed. How?

I'm currently testing different algorithms, which determine whether an Integer is a real square or not. During my research I found this question at SOF:
Fastest way to determine if an integer's square root is an integer
I'm compareably new to the Programming scene. When testing the different Algorithms that are presented in the question, I found out that this one
bool istQuadratSimple(int64 x)
{
int32 tst = (int32)sqrt(x);
return tst*tst == x;
}
actually works faster than the one provided by A. Rex in the Question I posted. I've used an NS-Timer object for this testing, printing my results with an NSLog.
My question now is: How is speed-testing done in a professional way? How can I achieve equivalent results to the ones provided in the question I posted above?
The problem with calling just this function in a loop is that everything will be in the cache (both the data and the instructions). You wouldn't measure anything sensible; I wouldn't do that.
Given how small this function is, I would try to look at the generated assembly code of this function and the other one and I would try to reason based on the assembly code (number of instructions and the cost of the individual instructions, for example).
Unfortunately, it only works in trivial / near trivial cases. For example, if the assembly codes are identical then you know there is no difference, you don't need to measure anything. Or if one code is like the other plus additional instructions; in that case you know that the longer one takes longer to execute. And then there are the not so clear cases... :(
(See the update below.)
You can get the assembly with the -S -emit-llvm flags from clang and with the -S flag from gcc.
Hope this help.
UPDATE: Response to Prateek's question in the comment "is there any way to determine the speed of one particular algorithm?"
Yes, it is possible but it gets horribly complicated REALLY quick. Long story short, ignoring the complexity of modern processors and simply accumulating some predefined cost associated with the instructions can lead to very very inaccurate results (the estimate off by a factor of 100, due to the cache and the pipeline, among others). If you try take into consideration the complexity of the modern processors, the hierarchical cache, the pipeline, etc. things get very difficult. See for example Worst Case Execution Time Prediction.
Unless you are in a clear situation (trivial / near trivial case), for example the generated assembly codes are identical or one is like the other plus a few instructions, it is also hard to compare algorithms based on their generated assembly.
However, here a simple function of two lines is shown, and for that, looking at the assembly could help. Hence my answer.
I am not sure if there is any professional way of checking the speed (if there is let me know as well). For the method that you directed to in your question I would probably do something this this in java.
package Programs;
import java.math.BigDecimal;
import java.math.RoundingMode;
public class SquareRootInteger {
public static boolean isPerfectSquare(long n) {
if (n < 0)
return false;
long tst = (long) (Math.sqrt(n) + 0.5);
return tst * tst == n;
}
public static void main(String[] args) {
long iterator = 1;
int precision = 10;
long startTime = System.nanoTime(); //Getting systems time before calling the isPerfectSquare method repeatedly
while (iterator < 1000000000) {
isPerfectSquare(iterator);
iterator++;
}
long endTime = System.nanoTime(); // Getting system time after the 1000000000 executions of isPerfectSquare method
long duration = endTime - startTime;
BigDecimal dur = new BigDecimal(duration);
BigDecimal iter = new BigDecimal(iterator);
System.out.println("Speed "
+ dur.divide(iter, precision, RoundingMode.HALF_UP).toString()
+ " nano secs"); // Getting average time taken for 1 execution of method.
}
}
You can check your method in similar fashion and check which one outperforms other.
Record the time value before your massive calculation and the value after that. The difference is the time executed.
Write a shell script where you will run the program. And run 'time ./xxx.sh' to get it's running time.

multithreads process data from the same file

can anyone in this forum give an example in C how two threads process data from one textfile.
As an example, I have one textfile that contains a paragraph. I have two threads that will process the data in the said file. One thread will count the number of lines in the paragraph. The second thread will count the numeric characters.
thanks
If you asked in C++ I could give you a code example, but I havent done ANSI C in a very long time so I will give you the design and pseudo code.
Please keep in mind this is really bad pseudo code that is meant to give an example. I'm not questioning WHY you would want to do this. For all I know it could be an excercise with threads or because you "feel like it".
Example 1
int integerCount = 0;
int lineCount = 0;
numericThread()
{
// By flagging the file as readonly you should
// be able to open it as many times as you wish
handle h = openfile ("textfile.txt". readonly);
while (!eof(h)) {
String word = readWord (h);
int outInteger
if (stringToInteger(word, outInteger)) {
++integerCount;
}
}
}
lineThread()
{
// By flagging the file as readonly you should
// be able to open it as many times as you wish
handle h = openfile ("textfile.txt". readonly);
while (!eof(h)) {
String word = readWord (h);
if (word.equals("\n") {
++lineCount ;
}
}
}
If for some reason you aren't able to open the file twice in readonly you will need to maintain a queue for each thread, having the main thread put words into each threads queue. The threads will then pull from the queue.
Example 2
int integerCount = 0;
int lineCount = 0;
queue numericQueue;
queue lineQueue;
numericThread()
{
while (!numericQueue.closed()) {
String word = numericQueue.pop();
int outInteger
if (stringToInteger(word, outInteger)) {
++integerCount;
}
}
}
lineThread()
{
while (!lineQueue.closed()) {
String word = lineQueue.pop();
if (word.equals("\n") {
++lineCount ;
}
}
}
mainThread()
{
handle h = openfile ("textfile.txt". readonly);
while (!eof(h)) {
String word = readWord(h);
numericQueue.push(word);
lineQueue.push(word);
}
numericQueue.close();
lineQueue.close();
}
There are lots of ways to do this. You can make different design decisions depending on how fast or simple or elegant or overengineered you want this to be. One way, as posted by Andrew Finnell is to have each thread open the file and read it completely independently. In theory this isn't great because you are doing expensive IO twice but in practice it's probably fine because the OS has likely cached the contents of whichever read executes first. Double IO is still more expensive than average because it involves a lot of needless system calls, but again in practice it will be irrelevant unless you have a very large file.
Another model of how to do this would be for each thread to have an input queue, or a shared global queue. The main thread reads the file and places each line in turn on the queue(s), and perhaps main doubles as one of your worker threads. This is more complicated because access to the queue(s) must be synchronized, or some lockless queue implementation must be used. In the case of a shared global queue, there is less duplication of data but now the lifecycle of that data is more complicated.
Just to point out how many ways such a simple thing can be done, you could go the overengineering route and make each thread generic. Instead of placing data on the queue(s) you place both data (or pointers to data) and function pointers and let each thread execute the callback. This kind of model might might sense if you plan on adding lots more kind of things to compute but want to limit the number of threads you will use.
I don't think you will see much performance difference in using 2 threads over one. Either way, you don't want both threads to read the file. Read the file first, then pass a COPY of the stream to the methods you want and process both. The threads will not have access to the same stream of data at the same time so you'll need to use 2 copies of the textfile.
P.S. It's possible that depending on the size of the file, you will actually loose performance using 2 threads.

Resources