multithreads process data from the same file

Can anyone in this forum give an example in C of how two threads can process data from one text file?
As an example, I have one text file that contains a paragraph, and two threads that will process the data in that file. One thread will count the number of lines in the paragraph. The second thread will count the numeric characters.
Thanks.

If you had asked in C++ I could give you a code example, but I haven't done ANSI C in a very long time, so I will give you the design and pseudocode.
Please keep in mind this is really rough pseudocode that is only meant to give an example. I'm not questioning WHY you would want to do this. For all I know it could be an exercise with threads or because you "feel like it".
Example 1
int integerCount = 0;
int lineCount = 0;

numericThread()
{
    // By flagging the file as read-only you should
    // be able to open it as many times as you wish
    handle h = openfile("textfile.txt", readonly);
    while (!eof(h)) {
        String word = readWord(h);
        int outInteger;
        if (stringToInteger(word, outInteger)) {
            ++integerCount;
        }
    }
}

lineThread()
{
    // By flagging the file as read-only you should
    // be able to open it as many times as you wish
    handle h = openfile("textfile.txt", readonly);
    while (!eof(h)) {
        String word = readWord(h);
        if (word.equals("\n")) {
            ++lineCount;
        }
    }
}
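The pseudocode above could be sketched in C with POSIX threads roughly as follows. This is a minimal sketch, not a definitive implementation: the names `count_job`, `count_lines`, `count_digits` and `count_file` are made up for illustration, and it counts individual digit characters (as the question asks) rather than whole integer words. Each thread opens the file itself, so no file position is shared.

```c
#include <ctype.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical per-thread job: the path to read and the resulting count. */
struct count_job {
    const char *path;
    long count;            /* result, or -1 if the file could not be opened */
};

/* Counts newline characters using this thread's own FILE handle. */
static void *count_lines(void *arg)
{
    struct count_job *job = arg;
    FILE *fp = fopen(job->path, "r");
    int c;

    job->count = -1;
    if (fp == NULL)
        return NULL;
    job->count = 0;
    while ((c = fgetc(fp)) != EOF)
        if (c == '\n')
            job->count++;
    fclose(fp);
    return NULL;
}

/* Counts the characters '0'..'9' using this thread's own FILE handle. */
static void *count_digits(void *arg)
{
    struct count_job *job = arg;
    FILE *fp = fopen(job->path, "r");
    int c;

    job->count = -1;
    if (fp == NULL)
        return NULL;
    job->count = 0;
    while ((c = fgetc(fp)) != EOF)
        if (isdigit(c))
            job->count++;
    fclose(fp);
    return NULL;
}

/* Runs both counters concurrently; returns 0 on success, -1 on failure. */
int count_file(const char *path, long *lines, long *digits)
{
    struct count_job line_job = { path, 0 };
    struct count_job digit_job = { path, 0 };
    pthread_t line_thread, digit_thread;

    if (pthread_create(&line_thread, NULL, count_lines, &line_job) != 0)
        return -1;
    if (pthread_create(&digit_thread, NULL, count_digits, &digit_job) != 0) {
        pthread_join(line_thread, NULL);
        return -1;
    }
    pthread_join(line_thread, NULL);
    pthread_join(digit_thread, NULL);
    if (line_job.count < 0 || digit_job.count < 0)
        return -1;
    *lines = line_job.count;
    *digits = digit_job.count;
    return 0;
}
```

Each thread writes only to its own `count_job`, and the main thread reads the results only after `pthread_join`, so no mutex is needed in this variant.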
If for some reason you aren't able to open the file twice in read-only mode, you will need to maintain a queue for each thread and have the main thread put words into each thread's queue. The threads will then pull from their queues.
Example 2
int integerCount = 0;
int lineCount = 0;
queue numericQueue;
queue lineQueue;

numericThread()
{
    while (!numericQueue.closed()) {
        String word = numericQueue.pop();
        int outInteger;
        if (stringToInteger(word, outInteger)) {
            ++integerCount;
        }
    }
}

lineThread()
{
    while (!lineQueue.closed()) {
        String word = lineQueue.pop();
        if (word.equals("\n")) {
            ++lineCount;
        }
    }
}

mainThread()
{
    handle h = openfile("textfile.txt", readonly);
    while (!eof(h)) {
        String word = readWord(h);
        numericQueue.push(word);
        lineQueue.push(word);
    }
    numericQueue.close();
    lineQueue.close();
}

There are lots of ways to do this. You can make different design decisions depending on how fast, simple, elegant or overengineered you want this to be. One way, as posted by Andrew Finnell, is to have each thread open the file and read it completely independently. In theory this isn't great because you are doing expensive I/O twice, but in practice it's probably fine because the OS has likely cached the contents of whichever read executes first. Double I/O is still more expensive than a single pass because it involves a lot of needless system calls, but again, in practice it will be irrelevant unless you have a very large file.
Another model of how to do this would be for each thread to have an input queue, or a shared global queue. The main thread reads the file and places each line in turn on the queue(s), and perhaps main doubles as one of your worker threads. This is more complicated because access to the queue(s) must be synchronized, or some lockless queue implementation must be used. In the case of a shared global queue, there is less duplication of data but now the lifecycle of that data is more complicated.
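As a rough illustration of the synchronized-queue model, here is a minimal sketch in C using a mutex and two condition variables. Everything named here (`line_queue`, `queue_push`, `queue_pop`, `queue_close`, `worker`, `line_worker`, and the capacity of 16) is an assumption made for the example. It shows one queue with one consumer; a second worker would get its own queue of the same type.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define QUEUE_CAP 16

/* Hypothetical bounded queue of heap-allocated lines. */
struct line_queue {
    char *items[QUEUE_CAP];
    size_t head, count;
    int closed;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
};

void queue_init(struct line_queue *q)
{
    memset(q, 0, sizeof *q);
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

/* Copies line onto the queue, blocking while the queue is full. */
int queue_push(struct line_queue *q, const char *line)
{
    size_t n = strlen(line) + 1;
    char *copy = malloc(n);

    if (copy == NULL)
        return -1;
    memcpy(copy, line, n);
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[(q->head + q->count) % QUEUE_CAP] = copy;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Marks the end of input and wakes any waiting consumer. */
void queue_close(struct line_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->closed = 1;
    pthread_cond_broadcast(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Returns the next line (caller frees it), or NULL once closed and drained. */
char *queue_pop(struct line_queue *q)
{
    char *line = NULL;

    pthread_mutex_lock(&q->lock);
    while (q->count == 0 && !q->closed)
        pthread_cond_wait(&q->not_empty, &q->lock);
    if (q->count > 0) {
        line = q->items[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
        pthread_cond_signal(&q->not_full);
    }
    pthread_mutex_unlock(&q->lock);
    return line;
}

/* Hypothetical worker: counts the lines it receives until the queue closes. */
struct worker {
    struct line_queue queue;
    long lines;
};

void *line_worker(void *arg)
{
    struct worker *w = arg;
    char *line;

    while ((line = queue_pop(&w->queue)) != NULL) {
        w->lines++;
        free(line);
    }
    return NULL;
}
```

The main thread would call `queue_push` once per line read from the file and `queue_close` at end of file. Because `queue_pop` only returns NULL when the queue is both closed and empty, the worker cannot miss lines that were pushed just before the close.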
Just to point out how many ways such a simple thing can be done, you could go the overengineering route and make each thread generic. Instead of placing only data on the queue(s), you place both data (or pointers to data) and function pointers, and let each thread execute the callback. This kind of model might make sense if you plan on adding many more kinds of things to compute but want to limit the number of threads you use.

I don't think you will see much performance difference between using two threads and using one. Either way, you don't want both threads to read the file. Read the file first, then pass a COPY of the data to each of the functions that process it. The threads cannot safely share a single stream position at the same time, so give each one its own copy of the text (or a buffer that neither thread modifies).
P.S. Depending on the size of the file, you may actually lose performance by using two threads.
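A minimal C sketch of the "read the file first" design might look like the following. It assumes the file contents have already been loaded into memory (for example with `fread`); the names `scan_job`, `scan_lines`, `scan_digits` and `scan_buffer` are hypothetical. Because neither thread writes to the buffer, this sketch lets both read the same memory rather than making a second copy; copying would work equally well.

```c
#include <ctype.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical read-only view of the file contents, one per thread. */
struct scan_job {
    const char *data;
    size_t len;
    long result;
};

/* Counts newline characters in the buffer. */
static void *scan_lines(void *arg)
{
    struct scan_job *job = arg;

    job->result = 0;
    for (size_t i = 0; i < job->len; i++)
        if (job->data[i] == '\n')
            job->result++;
    return NULL;
}

/* Counts the characters '0'..'9' in the buffer. */
static void *scan_digits(void *arg)
{
    struct scan_job *job = arg;

    job->result = 0;
    for (size_t i = 0; i < job->len; i++)
        if (isdigit((unsigned char)job->data[i]))
            job->result++;
    return NULL;
}

/* Scans one in-memory buffer with two threads; returns 0 on success. */
int scan_buffer(const char *data, size_t len, long *lines, long *digits)
{
    struct scan_job line_job = { data, len, 0 };
    struct scan_job digit_job = { data, len, 0 };
    pthread_t line_thread, digit_thread;

    if (pthread_create(&line_thread, NULL, scan_lines, &line_job) != 0)
        return -1;
    if (pthread_create(&digit_thread, NULL, scan_digits, &digit_job) != 0) {
        pthread_join(line_thread, NULL);
        return -1;
    }
    pthread_join(line_thread, NULL);
    pthread_join(digit_thread, NULL);
    *lines = line_job.result;
    *digits = digit_job.result;
    return 0;
}
```

For a small paragraph the cost of creating the threads will likely exceed the cost of the scan itself, which is consistent with the performance caveat above.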

Related

Good practice to hold a file or channel in a class

In the following code, I am trying to make a class which can write something to a log file when asked via a method. I am wondering whether this is an idiomatic way to do it, or whether there is a more recommended way, e.g., holding a separate field of file type (for some reason). In other words, is there practically no problem even if I hold only a channel?
class Myclass {
  var logfile: channel;
  proc init() {
    writeln( "creating log.out" );
    logfile = openwriter( "log.out" );
  }
  proc log( x ) {
    logfile.writeln( x );
  }
}
proc main() {
  var a = new borrowed Myclass();
  a.log( 10 );
  a.log( "orange" );
}

I believe what you're doing here is reasonable. The distinction between files and channels in Chapel is primarily made in support of the language's parallel computing theme, in order to support having multiple tasks access a single logical file simultaneously using distinct channels (views into the file, essentially). In a case like yours, there is a file underlying the channel you've created, but there's no need to explicitly store it if you have no need to interact further with it.
So I believe there is no practical problem to simply storing a channel as you have here.

Keep order details in arrays?

Could you please help me?
1) Is it better to keep orders in the EA's arrays rather than querying the system with the Order...() commands in MQL4? Keeping data in arrays means that you have to query the system less and that internet reliability may be less of an issue. However, the coding required to keep an accurate order book is quite cumbersome.
2) How do you keep track of orders that are on the same Symbol but have come from two different EAs?
Thank you very much.

It depends on your needs and ideas; without knowing them it is difficult to say anything definite.
You can keep an array of ticket numbers (or a CArrayObj), but you need to check that a ticket still exists before doing other operations on it (like trailing). If you have problems with your internet connection, change your VPS and do not try to solve it in code.
Each EA keeps a book of its own deals.
I cannot imagine the sense in keeping just the ticket numbers, but maybe there is one. If you need to store some data in addition to what can be obtained from Order...(), then use classes or structures. Some fields (osl, tp, oop, lot, magic, symbol, etc.) can be filled once, so that you do not call Order...() functions for them later. The exceptions are OrderProfit(), OrderClosePrice() and OrderCloseTime(), which have to be called all the time.
An example of how to store the data is below: instances of CTrade are added to a CArrayObj.
#include <Object.mqh>
#include <Arrays\ArrayObj.mqh>

class CTrade : public CObject
{
private:
    int m_ticketId;
    double m_oop,m_osl,m_otp,m_lot; // OrderOpenPrice() and sl, tp, lot - add more
public:
    CTrade(const int ticket){
        m_ticketId=ticket;
    }
    // true while the ticket can be selected and the order is not closed yet
    bool isTicketExist(){
        if(OrderSelect(m_ticketId,SELECT_BY_TICKET))
            return(OrderCloseTime()==0);
        else return(false); // or GetLastError()!=4108
    }
};

CArrayObj* listOfTrades=NULL;

int OnInit(void){
    listOfTrades=new CArrayObj;
    return(INIT_SUCCEEDED);
}

void OnDeinit(const int reason){
    if(CheckPointer(listOfTrades)==POINTER_DYNAMIC)
        delete(listOfTrades);
}

void OnTick(){
    // loop over the array when necessary, but clean it first
    for(int i=listOfTrades.Total()-1;i>=0;i--){
        CTrade *trade=listOfTrades.At(i);
        if(!trade.isTicketExist())
            {listOfTrades.Delete(i);continue;}
        // do trail or what you need
    }
}

// Way to add elements to the list (inside a function where ticket is known):
//     listOfTrades.Add(new CTrade(ticket));

How to define thread safe array?

How can I define a thread-safe global array with minimal modifications?
I want every access to it to go through a mutex and a synchronized block.
Something like this, where 'T' is some type (note that the 'sync' keyword is not currently defined AFAIK):
sync Array!(T) syncvar;
And every access to it would be similar to this:
Mutex __syncvar_mutex;
//some func scope....
synchronized(__syncvar_mutex) { /* edits 'syncvar' safely */ }

My naive attempt was to do something like this:
import std.typecons : Proxy;
synchronized class Array(T)
{
    static import std.array;
    private std.array.Array!T data;
    mixin Proxy!data;
}
Sadly, it doesn't work because of https://issues.dlang.org/show_bug.cgi?id=14509
I can't say I am very surprised, though, as automagical handling of multi-threading via hidden mutexes is very unidiomatic in modern D, and the very concept of synchronized classes is mostly a relic from D1 times.
You can implement the same solution manually, of course, by defining your own SharedArray class with all the necessary methods and adding locks inside the methods before calling the internal private plain Array methods. But I presume you want something that works more out of the box.
I can't invent anything better right here and now (I will think about it more), but it is worth noting that in general D encourages creating data structures designed explicitly for shared access instead of just protecting normal data structures with mutexes. And, of course, the most encouraged approach is to not share data at all and use message passing instead.
I will update the answer if anything better comes to mind.

It is fairly easy to make a wrapper around array that will make it thread-safe. However, it is extremely difficult to make a thread-safe array that is not a concurrency bottleneck.
The closest thing that comes to mind is Java's CopyOnWriteArrayList class, but even that is not ideal...

You can wrap the array inside a struct that locks access to the array from the moment a thread acquires a token until it releases it.
The wrapper/locker:
acquire(): is called in a loop by a thread. As it returns a pointer, the thread knows that it has the token when the method returns a non-null value.
release(): is called by a thread after processing the data whose access was acquired previously.
shared struct Locker(T)
{
private:
    T t;
    size_t token;
public:
    shared(T)* acquire()
    {
        import core.atomic;
        // Test and set the token in one atomic step; a separate check
        // followed by an increment would let two threads both succeed.
        if (cas(&token, size_t(0), size_t(1)))
            return &t;
        return null;
    }
    void release()
    {
        import core.atomic;
        atomicOp!"-="(token, 1);
    }
}
and a quick test:
alias LockedIntArray = Locker!(size_t[]);
shared LockedIntArray intArr;

void arrayTask(size_t cnt)
{
    import core.thread, std.random;
    // ensure the desynchronization of this job.
    Thread.sleep( dur!"msecs"(uniform(4, 20)));
    shared(size_t[])* arr = null;
    // wait for the token
    while(arr == null) {arr = intArr.acquire;}
    *arr ~= cnt;
    import std.stdio;
    writeln(*arr);
    // release the token for the waiting threads
    intArr.release;
}

void main(string[] args)
{
    import std.parallelism;
    foreach(immutable i; 0..16)
    {
        auto job = task(&arrayTask, i);
        job.executeInNewThread();
    }
}
The downside is that each block of operations on the array must be surrounded by an acquire/release pair.

You have the right idea. For an array, you need to be able to both edit and retrieve information. I suggest you take a look at the read-write mutex and atomic utilities provided by Phobos. A read operation is fairly simple:
synchronize on mutex.readLock
load (with atomicLoad)
copy the item out of the synchronize block
return the copied item
Writing should be almost exactly the same. Just synchronize on mutex.writeLock and do a cas or atomicOp operation.
Note that this will only work if you copy the elements of the array during a read. If you want to get a reference, you need to do additional synchronization on the element every time you access or modify it.

How to make the program work this way?

I have a program that does calculations with numbers. The program is threaded, and the number of threads is specified by the user.
I will give a close example:
static void *program_thread(void *thread)
{
    bool somevar = true;
    if(somevar)
    {
        work = getwork();
    }
    dowork(work);
    if(condition1 blah blah)
        somevar = false; /* disable getwork */
    if(condition2)
        somevar = true; /* condition was either met or not met, so we request
                           new work either way */
}
Then with pthreads (I will skip some code) I do:
int main(blah)
{
    if (pthread_create(&thr->pth, NULL, program_thread, thread_number)) {
        printf("%s","program thread create failed");
        return 1;
    }
}
Now I will start explaining. The number of threads to create is specified by the user, so I run a for loop and create as many threads as I need.
Each thread calls
work = getwork();
and thus gets independent work to do. However, the CPU is slow for this kind of job. It tries to compute something by trying 2^32 numbers (that is, from 1 to 4 294 967 296).
But my CPU can only do around 3 million numbers per second, and by the time it would reach 4 billion numbers, it has been restarted (for new work).
So I then thought of a better method. Instead of each thread getting totally different work, all the threads should get the same work and split the numbers they need to try.
The problem is that I can't control what work it gets, so I must fetch
work = getwork();
before starting the threads. The question is HOW? Using pthread_create, obviously... but then what?

There is more than one way to do it:
split your work package into smaller parts (so that your getWork returns a new, smaller piece of work)
store your work in a common place that you access from your threads using a reader-writer pattern
use the pthread API: the 4th parameter of pthread_create is passed to your thread function, so you can do something like the following code:
Work work = getWork();
if (pthread_create(&thr->pth, NULL, program_thread, (void*) &work))
...
And your program_thread function would look like this:
static void *program_thread(void *pxThread)
{
    Work* pWork = (Work*) pxThread;
...
Of course, you need to check the validity of the pointer and the usual things (in my example, I created the work on the stack, which is most probably a bad idea). Note that your code passes thread_number as a pointer, which is usually a bad idea. If you want to transfer more information to your thread, simply put it in a structure.
I'm not sure I fully understood your issue, but this should give you some hints. Please note also that when doing multithreading, you need to take into account specific issues like race conditions, concurrent access and the more complex lifecycle of objects...
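To make the "put it in a structure" advice concrete, here is a minimal C sketch in which the main thread fetches the work once and hands each thread its own slice of the number range. The `slice` struct, `search_range`, and especially `candidate_matches` (a stand-in for whatever test the real program runs on each number) are hypothetical, as is representing the work as a single `uint32_t`.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-thread argument: shared work plus this thread's range. */
struct slice {
    uint32_t work;          /* same work item for every thread */
    uint64_t begin, end;    /* half-open range [begin, end) to try */
    int found;
    uint32_t answer;
};

/* Hypothetical stand-in for the real per-number test. */
static int candidate_matches(uint32_t work, uint32_t n)
{
    return (uint32_t)(n * 2654435761u) == work;
}

static void *search_slice(void *arg)
{
    struct slice *s = arg;

    for (uint64_t n = s->begin; n < s->end; n++) {
        if (candidate_matches(s->work, (uint32_t)n)) {
            s->found = 1;
            s->answer = (uint32_t)n;
            break;
        }
    }
    return NULL;
}

/* Splits [0, total) across nthreads; returns 1 and stores the match if found. */
int search_range(uint32_t work, uint64_t total, int nthreads, uint32_t *answer)
{
    pthread_t *threads;
    struct slice *slices;
    uint64_t chunk;
    int started = 0, found = 0;

    if (nthreads <= 0)
        return 0;
    threads = malloc((size_t)nthreads * sizeof *threads);
    slices = malloc((size_t)nthreads * sizeof *slices);
    if (threads == NULL || slices == NULL) {
        free(threads);
        free(slices);
        return 0;
    }
    chunk = (total + (uint64_t)nthreads - 1) / (uint64_t)nthreads;
    for (int i = 0; i < nthreads; i++) {
        uint64_t begin = (uint64_t)i * chunk;
        uint64_t end = begin + chunk;

        if (begin > total) begin = total;
        if (end > total) end = total;
        slices[i] = (struct slice){ work, begin, end, 0, 0 };
        /* Each thread gets a pointer to its own heap-allocated slice. */
        if (pthread_create(&threads[i], NULL, search_slice, &slices[i]) != 0)
            break;          /* ranges of unstarted threads stay unsearched */
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
        if (slices[i].found && !found) {
            found = 1;
            *answer = slices[i].answer;
        }
    }
    free(threads);
    free(slices);
    return found;
}
```

In the real program, `total` would be 2^32 and the work would come from a single `getwork()` call made before the loop. The slices live on the heap until every thread has been joined, which avoids the stack-lifetime problem mentioned above. In this sketch the other threads keep running after one finds a match; stopping them early would need a shared flag.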

Writing to file with Groovy(Grails) fails for some lines (broken lines)

I am performing some mass writing in a .csv file using Groovy. More specifically, I have a Quartz job that is running and creates some Map messages that get sent to a RabbitMQ queue. The queue is being consumed by 10 consumers and results in producing some lists of Strings. For each element in the List I just write it in a pipe separated .csv file. The actual service that has the method that writes to the .csv file, is a standard (singleton) transactional grails service. When I log the lines to be written, everything's fine, but in the file, some lines are "broken". The way I am writing is:
def writeRowsToFile(List<String> rows, File file) {
    rows.each {row->
        file.append("${row}\n")
    }
}
Initially I was using:
file.withWriterAppend {out->
    out.write(row.toString())
    out.newLine()
}
and got the same result.
If something were wrong with the code itself, I would expect it to fail for all the lines. Could it be some kind of race condition, a concurrency problem, or some other issue I'm not aware of?
Any help will be appreciated.
Thanks

You should be doing it the second way, i.e.:
def writeRowsToFile(List<String> rows, File file) {
    file.withWriterAppend {out->
        rows.eachWithIndex { row, idx ->
            // It's probably \n chars in your strings
            if( row ==~ /.*[\n\r]+.*/ ) {
                println "Detected a CRLF char in rows[$idx]"
            }
            out.writeLine row
        }
    }
}
However, you say it might be "some kind of race condition".
Are multiple threads writing to the same file?
If not, it is more likely that your row data has \n characters in it.
