Can I check an array multiple times from different threads? - arrays

So I want to check whether a file contains some data. My program is multi-threaded, so this won't work as-is: the file can't be accessed from several threads at the same time, and I get errors. Is it possible to load the file into a string array and check whether that array contains the text I want?
If I check it from 5-10 different threads at exactly the same time, will it matter?
And how can I write text to a file from all of these threads at the same time, so that each thread checks whether the file is in use, waits, and then writes, and no error is logged?

... is it possible to load it up on string Array and check if that array contains the text i want ?
Yes. It is straightforward programming to read a file into an array of strings and to check whether one of the strings in the array contains another string.
If i check it from 5-10 different threads at exactly same time will it matter ?
Yes, it matters. You have to implement the code the right way to ensure that it always works.
Your question is very hard to decipher, but I am guessing that you want the array of strings to be shared between the threads, AND you want the threads to update the array. In that case, proper synchronization is essential, or you are liable to run into race conditions and memory anomalies.
How can I write a text to a file from all these threads at the same time but it should look if it being used and wait and then write so no error is logged.
You need to synchronize properly so that only one thread attempts to write to the file at any one time. In addition, you need to make sure that one thread doesn't attempt to open a stream to the file while another stream has the file open. (That is most likely the cause of your current errors. Java on Windows won't let you do that ... though Java on Linux will allow it.)
I suggest you read the Oracle Java Tutorials on how to write multi-threaded programs.
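The pattern looks roughly like this. This is a minimal sketch in C with POSIX threads (the question is about Java, where the same shape is a `synchronized` block or a `ReentrantLock` around the write); `log_file` and `log_line` are illustrative names, not from the question:

```c
#include <pthread.h>
#include <stdio.h>

/* One mutex guards the log file, so only one thread writes at a time. */
static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;
static FILE *log_file;  /* opened once, e.g. in main() */

void log_line(const char *text)
{
    pthread_mutex_lock(&log_lock);   /* wait here if another thread is writing */
    fprintf(log_file, "%s\n", text);
    fflush(log_file);                /* push the line out before releasing */
    pthread_mutex_unlock(&log_lock);
}
```

Because every thread shares the single `log_file` stream and takes the lock before touching it, no thread ever tries to open the file while another has it open, which is the Windows error scenario described above.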

Related

Conflicts in writing/reading a file

I'm developing a small program in C that reads and writes messages on a notice-board. Every message is a .txt file named with a progressive number.
The software is multithreading, with many users that can do concurrent operations.
The operations that a user can do are:
Read the whole notice-board (concatenation of all the .txt file contents)
Add a message (add a file named "id_max++.txt")
Remove a message. When a message is removed there will be a hole in that number (e.g, "1.txt", "2.txt", "4.txt") that will never be filled up.
Now, I'd like to know if there is some I/O problem (*) that I should manage (and how) or the OS (Unix-like) does it all by itself.
(*) such as 2 users that want to read and delete the same file
Since you are on a Unix-like system, the OS will take care of deleting a file while it is still open in another thread: the directory entry is removed immediately, and the file itself (the inode) is deleted on the last close.
The only problem I can see is between the directory scan and the open of a file: a race condition means the file may already have been deleted by then.
IMHO you simply must treat a "file does not exist" error as normal and move on to the next file.
What you describe is not really bad, since it is analogous to MH folders for mail, which can be accessed by many different processes, even if locking is involved. But depending on the load and on the size of the messages, you could consider using a database. Rule of thumb (my opinion):
few concurrent accesses and big files: keep using the file system
many accesses and small files (a few KB max.): use a database
Of course, you must use a mutex-protected routine to find the next number when creating a new message (credit to #merlin2011 for noticing the problem).
You said in a comment that your specs do not allow a database. Following the analogy with mail handling, you could also use a single file (like the traditional mail format):
one single file
each message is preceded with a fixed size header saying whether it is active or deleted
read access need not be synchronized
write accesses must be synchronized
It would be a poor man's database where all synchronization is done by hand, but you have only one file descriptor per thread and save all the open and close operations. It makes sense where there are many reads and few writes or deletes.
A possible improvement would be (still like mail readers do) to build an index with the offset and status of each message. The index could be on disk or in memory depending on your requirements.
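A minimal sketch of that fixed-size-header layout (the record size, field names, and locking convention are illustrative, not from the question):

```c
#include <stdio.h>
#include <string.h>

#define MSG_BODY_SIZE 256  /* illustrative fixed body size */

/* Fixed-size record: a one-byte status flag followed by the body.
 * Readers skip records whose flag marks them as deleted. */
struct record {
    char active;               /* 1 = active, 0 = deleted */
    char body[MSG_BODY_SIZE];  /* message text, NUL-padded */
};

/* Append a message; the caller must hold the board's write lock. */
int append_message(FILE *board, const char *text)
{
    struct record r = { 1, { 0 } };
    strncpy(r.body, text, MSG_BODY_SIZE - 1);
    if (fseek(board, 0, SEEK_END) != 0)
        return -1;
    return fwrite(&r, sizeof r, 1, board) == 1 ? 0 : -1;
}

/* "Delete" message n by rewriting only its status byte in place. */
int delete_message(FILE *board, long n)
{
    if (fseek(board, n * (long)sizeof(struct record), SEEK_SET) != 0)
        return -1;
    return fputc(0, board) == EOF ? -1 : 0;
}
```

Because records never move, a reader only needs the record index to find a message, and a delete touches a single byte rather than removing a file.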
The easier solution is to use a database like SQLite or MySQL, both of which provide transactions that you can use to achieve consistency. If you still want to go down this route, read on.
The issue is not an I/O problem; it's a concurrency problem if you do not implement proper monitors. Consider the following scenario (it is not the only problematic one, but it is one example).
User 1 reads the maximum id and stores it in a local variable.
Meanwhile, User 2 reads the same maximum id and stores it in a local variable also.
User 1 writes first, and then User 2 overwrites what User 1 just wrote, because it had the same idea of what the maximum id was.
This particular scenario can be solved by keeping the current maximum id as a variable that is initialized when the program is initialized, and protecting the get_and_increment operation with a lock. However, this is not the only problematic scenario that you will need to reason through if you go with this approach.
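A minimal sketch of that lock-protected counter (the names are illustrative; `next_id` would be initialized at startup by scanning the existing files for the highest number):

```c
#include <pthread.h>

/* Shared counter for the next message id, initialized once at startup. */
static long next_id;
static pthread_mutex_t id_lock = PTHREAD_MUTEX_INITIALIZER;

/* Atomically fetch-and-increment: no two threads can observe the same id,
 * which closes the race described in the scenario above. */
long get_and_increment(void)
{
    pthread_mutex_lock(&id_lock);
    long id = next_id++;
    pthread_mutex_unlock(&id_lock);
    return id;
}
```

Each user thread calls `get_and_increment()` to name its new message file, so two writers can never pick the same number.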

Removing bytes from File in (C) without creating new File

I have a file, let's say log. I need to remove some bytes, say n bytes, from the start of the file only. The issue is that this file is referenced by file pointers in other programs, and those pointers may write to log at any time. I can't re-create the file, otherwise those file pointers would malfunction (I am not sure about that, either).
I tried to google it, but all the suggestions were only to rewrite to a new file.
Is there any solution for this?
I can suggest two options:
Ring buffer: Use a memory-mapped file as your logging medium, and use it as a ring buffer. You will need to manually manage where the last written byte is, and wrap around your ring appropriately as you step over the end of the ring. This way, your logging file stays a constant size, but you can't tail it like a regular file. Instead, you will need to write a special program that knows how to walk the ring buffer when you want to display the log.
Multiple small log files: Use some number of smaller log files that you log to, and remove the oldest file as the collection of files grows beyond the size of logs you want to maintain. If the most recent log file is always named the same, you can use the standard tail -F utility to follow the log contents perpetually. To avoid issues of multiple programs manipulating the same file, your logging code can send logs as messages to a single logging daemon.
So... you want to change the file, but you cannot. The reason you cannot is that other programs are using the file. In general terms, you appear to need to:
stop all the other programs messing with the file while you change it -- to chop now unwanted stuff off the front;
inform the other programs that you have changed it -- so they can re-establish their file-pointers.
I guess there must be a mechanism to allow the other programs to change the file without tripping over each other... so perhaps you can extend that? [If all the other programs are children of the main program, then if the children all open with O_APPEND, you have a fighting chance of doing this, perhaps with the help of a file-lock or a semaphore (which may already exist?). But if the programs are this intimately related, then #jxh has other, probably better, suggestions.]
But, if you cannot change the other programs in any way, you appear to be stuck, except...
...perhaps you could try 'sparse' files? On (recent-ish) Linux (at least) you can fallocate() with FALLOC_FL_PUNCH_HOLE to remove the stuff you don't want without affecting the other programs' file-pointers. Of course, the file offsets will still keep growing over time, since the file's logical length never shrinks, but that may be a more theoretical than practical issue.
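On Linux the punch-hole call looks roughly like this (a sketch; `punch_front` is an illustrative name). FALLOC_FL_PUNCH_HOLE must be combined with FALLOC_FL_KEEP_SIZE, which is exactly what this use case wants, because the file length and every other program's offset stay untouched:

```c
#define _GNU_SOURCE  /* for fallocate() and FALLOC_FL_* */
#include <fcntl.h>

/* Deallocate the first n bytes of the file behind fd.
 * The punched range reads back as zeros; file size and all
 * other processes' file offsets are unchanged. */
int punch_front(int fd, off_t n)
{
    return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0, n);
}
```

Note that hole punching is Linux-specific and needs filesystem support (ext4, XFS, and tmpfs support it, among others); on an unsupported filesystem the call fails with EOPNOTSUPP.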

Weird output to text file using fprintf()

In a simulation program I am trying to print the measures to a text file. The project is a combination of Java, C and C++ but the file I am working with is in C. The code for printing is as follows:
if(sample)
fprintf(MeasureInfo->measuresFile, "%d: %f\n", count++, sample);
This works for part of the output but there are large bulks (about 100 to 1000 measures) of data that are not printed to the text file. Instead I see just a bulk of NULs in Sublime Text and 0-bytes in bless:
436: 0.851661
437: 0.043466
(Really large block of NUL all in one line).210402
751: 0.357543
752: 0.816120
I only worked with part of the code so far and thought it might be a concurrency problem. So I printed out all pids that access the function with getpid() and it gave me different ones (19036, 19037, 19038 for instance). I then tried to use pthread_mutex_lock and pthread_mutex_unlock but it produced the same output.
Another thing I tried was using sleep after every 400 measures. This actually helped, but it cut the number of measures produced by a fourth.
Do you have any idea what the actual problem might be and how to fix it? I am really sorry if this is an answered or easy question but I tried and searched a while and didn't find a solution to this.
I/O and multitasking are a dangerous combination. At the thread level, fprintf is writing into a buffer, possibly in a thread-unsafe fashion. The individual formatting components might overwrite each other, and when one call flushes the buffer while another is trying to write, the result is almost certain to be lost data. It could even cause a buffer overflow and a crash right inside the stdio library.
At the process level, the buffers are getting flushed to particular locations in the file. If two processes try to write at the same time, they will clobber each other's results. A process typically doesn't poll the filesystem to check whether a file is growing due to an outside influence, then seek to the new end before writing.
If you have a very large project and synchronizing all the I/O to this file isn't an option, consider assigning each thread or process its own file, and merge the files after the fact.
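A minimal sketch of the per-process variant suggested above (the directory argument and naming scheme are illustrative): each process writes to its own pid-named file, so no two writers ever share a stream or a file offset, and the files can be concatenated afterward.

```c
#include <stdio.h>
#include <unistd.h>

/* Open a measures file private to this process, named after its pid. */
FILE *open_private_measures(const char *dir)
{
    char path[256];
    snprintf(path, sizeof path, "%s/measures.%ld.txt", dir, (long)getpid());
    return fopen(path, "w");
}
```

After the run, something like `cat measures.*.txt > measures.txt` (optionally with a sort, if ordering matters) merges the results.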
sample looks to be a float or double. You shouldn't compare a floating-point variable against a fixed value, as it will most likely never be exactly that value; for floating-point variables you should always compare against a delta. Your if clause will probably always be entered.
fprintf shouldn't print NUL though. Can you show the exact output (or part of it)?

Multi-Threading with files

So let's say I have the following code, where I open a file, read the contents line by line, use each line in a function somewhere else, and then rewind the file when I'm done.
FILE *file = Open_File();
char line[max];
while (!EndofFile())
{
int length = GetLength(line);
if (length > 0)
{
DoStuffToLine(line);
}
}
rewind(file);
I'm wondering if there is a way to use threads here to add concurrency. Since I'm just reading the file and not writing to it I feel like I don't have to worry about race conditioning. However I'm not sure how to handle the code that's in the while loop because if one thread is looping over the file and another thread is looping over the file at the same time, would they cause each other to skip over lines, make other errors, etc? What's a good way to approach this?
If you're trying to do this to improve read performance, you're likely going to be disappointed, since this will almost surely be disk I/O bound. Adding more threads won't help the OS and disk controller fetch data any faster.
However, if you're trying to just process the data in parallel, that's another matter. In that case, I would read the entire file into a memory buffer somewhere, then have your threads process it in parallel. That way you don't have to worry about thread safety with rewinding the file pointer or any other annoying issues like it.
You'll likely still need to use other locking mechanisms for the multithreaded parts of course, depending on exactly what you're doing, but you shouldn't have to worry about what the standard library is going to do when you start accessing a file with multiple threads.
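A minimal sketch of the read-once-then-parallelize approach described above (the names are illustrative; counting newlines stands in for whatever `DoStuffToLine` really does):

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct slice {
    const char *start;  /* this thread's portion of the buffer */
    size_t      len;
    long        count;  /* per-thread result: no shared state, no locks */
};

/* Each thread scans only its own slice of the in-memory buffer. */
static void *process_slice(void *arg)
{
    struct slice *s = arg;
    for (size_t i = 0; i < s->len; i++)
        if (s->start[i] == '\n')
            s->count++;
    return NULL;
}

/* Read the whole file into memory, then hand disjoint slices to threads. */
long count_lines_parallel(const char *path, int nthreads)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    rewind(f);
    char *buf = malloc(size);
    if (!buf || fread(buf, 1, size, f) != (size_t)size) {
        fclose(f);
        free(buf);
        return -1;
    }
    fclose(f);  /* the file is no longer needed: no shared FILE* */

    pthread_t tid[nthreads];
    struct slice sl[nthreads];
    size_t chunk = size / nthreads;
    for (int i = 0; i < nthreads; i++) {
        sl[i].start = buf + i * chunk;
        sl[i].len = (i == nthreads - 1) ? size - i * chunk : chunk;
        sl[i].count = 0;
        pthread_create(&tid[i], NULL, process_slice, &sl[i]);
    }
    long total = 0;
    for (int i = 0; i < nthreads; i++) {
        pthread_join(tid[i], NULL);
        total += sl[i].count;
    }
    free(buf);
    return total;
}
```

Note that splitting on raw byte boundaries can cut a line in half; real line-oriented work would nudge each slice boundary to the next newline, which is omitted here for brevity.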
The concurrency adds some race condition problems:
1. The EndofFile() function is evaluated at the start of the loop; it can happen that it returns the same result for two threads, so one thread reaches the end of the file while the other still attempts to read it. You never know when a thread will actually be scheduled.
2. The same holds for the GetLength function: once a thread has the length information, the length may change because another thread has read another line.
3. You are reading the file sequentially; even if you rewind it, the current position of the I/O pointer may be altered by some other thread at any moment.
Furthermore, as Telgin pointed out, reading a file is not CPU bound but I/O bound. You can't improve the performance, because you need some locks, and locking to guarantee thread safety just introduces overhead.
I'm not sure that this is the best approach. However, you could read the file once, store its contents in two separate objects, and have the threads read the objects instead of the file. Just make sure to do cleanup afterward.

Program communicating with itself between executions

I want to write a C program that will sample something every second (an extension to screen). I can't do it in a loop, since screen waits for the program to terminate each time, and I have to access the previous sample on every execution. Is saving the value in a file really my best bet?
You could use a named pipe (if available), which might allow the data to remain "in flight", i.e. not actually hit disk. Still, the code isn't any simpler, and hitting disk twice a second won't break the bank.
You could also use a named shared memory region (again, if available). That might result in simpler code.
You're losing some portability either way.
Is saving the value in a file really my best bet?
Unless you want to write some complicated client/server model communicating with another instance of the program just for the heck of it, reading and writing a file is the preferred method.
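A minimal sketch of that file-based approach (the path and the plain-text format are illustrative): each run loads the previous sample, does its work, and overwrites the state file for the next run.

```c
#include <stdio.h>

/* Read the previous sample from the state file; 0 if it doesn't exist yet. */
long load_previous_sample(const char *path)
{
    long value = 0;
    FILE *f = fopen(path, "r");
    if (f) {
        if (fscanf(f, "%ld", &value) != 1)
            value = 0;
        fclose(f);
    }
    return value;
}

/* Overwrite the state file with the new sample for the next execution. */
int save_sample(const char *path, long value)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    fprintf(f, "%ld\n", value);
    return fclose(f);
}
```

Since screen runs the program once per sample, the state file is the only thing that survives between executions, and at twice-a-second rates the disk traffic is negligible.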