I have a script that runs two commands. The first command writes data to a temp file. The second command pipes its output to awk while the first command is still running in the background. In the second command, awk needs to read data from the temp file, but it parses its own input faster than data is being written to the temp file.
Here's an example:
#!/bin/bash
command1 > /tmp/data.txt &
# command1 takes several minutes to run, so start command 2 while it runs in the background
command2 | awk '
/SEARCH/ {
    # Matched input so pull next line from temp file
    getline temp_line < "/tmp/data.txt"
}
'
This works unless awk parses the data from command2 so fast that command1 can't keep up with it, i.e. awk hits EOF on /tmp/data.txt before command1 has finished writing to it.
I've also tried wrapping some checks around getline, like:
while ((getline temp_line < "/tmp/data.txt") < 0) {
    system("sleep 1")   # let command1 write more to the temp file
}
# Keep processing now that we have read the next line
But I think once it hits an EOF in the temp file, it stops trying to read from it. Or something like that.
The overall script works as long as command1 writes to the temp file faster than awk tries to read from it. If I put a sleep 10 between the two commands, the temp file builds up enough of a buffer and the script produces the output I need. But I may be parsing files much larger than what I've tested on, or the commands might run at different speeds on different systems, so I'd like a safety mechanism that waits until data has actually been written to the file.
Any ideas how I can do this?
I think you'd need to close the file between iterations and read it from the start again, back to where you had read it before, something like this (untested):
sleepTime = 0
while ((getline temp_line < "/tmp/data.txt") <= 0) {
    close("/tmp/data.txt")
    system("sleep " ++sleepTime)   # let command1 write more to the temp file
    numLines = 0
    while (++numLines < prevLines) {
        if ((getline temp_line < "/tmp/data.txt") <= 0) {
            print "Aaargghhh, my file is gone!" | "cat>&2"
            exit
        }
    }
}
++prevLines
Note that I built in a variable "sleepTime" to make your command sleep longer each time through the loop, so if your tmp file is taking a long time to fill up, your 2nd command waits longer for it on each iteration. Use that or not as you like.
Using getline in nested loops with system() commands all seems a tad clumsy and error-prone though - I can't help thinking there's probably a better approach but I don't know what off the top of my head.
I ran 10 copies of the same code (changing only some parameters) simultaneously in 10 different folders (each program running on a single core).
In each program I have a for loop (the number of iterations, NumSamples, can be 50000, 500000 or 5000000; the number of iterations depends on the execution time of a single iteration, and more iterations are performed in the quicker cases). Inside each iteration I compute a certain quantity (a double variable) and then save it to file (inside the for block) with:
fprintf(fp_TreeEv, "%f\n", TreeEv);
where TreeEv is the name of the variable computed at each cycle.
To be sure that the code saves the variable right after the fprintf call, I disabled buffering after opening the file, with:
TreeEv=fopen("data.txt", "a");
setbuf(TreeEv, NULL);
The program ends without error. I also know that all NumSamples iterations were performed during the execution (I print a variable that is initialized to 0 before the for loop and incremented by one at each cycle).
When I open the txt file at the end of execution, I see that it contains fewer rows (data points) than expected (NumSamples), for example 4996187 instead of 5000000.
(I've also checked that the csv file is missing the same amount of data, with tr -d -c , < filename.csv | wc -c, as suggested by Barmar.)
What could be the source of the problem?
I copy below the for loop of my code (inv_trasf is just a function that generates random numbers):
char NomeFiletxt[64];
char NomeFilecsv[64];
sprintf(NomeFilecsv, "TreeEV%d.csv", N);
sprintf(NomeFiletxt, "TreeEVCol%d.txt", N);
FILE* fp_TreeEv;
fp_TreeEv = fopen(NomeFilecsv, "a");
FILE* fp_TreeEvCol;
fp_TreeEvCol = fopen(NomeFiletxt, "a");
setbuf(fp_TreeEv, NULL);
setbuf(fp_TreeEvCol, NULL);
for(ciclo=0; ciclo<NumSamples; ciclo++){
    sum = 0;
    sum_2 = 0;
    k_max_int = 0;
    for(i=0; i<N; i++){
        deg_seq[i] = inv_trasf(k_max_double_round);
        sum += deg_seq[i];
        sum_2 += (deg_seq[i]*deg_seq[i]);
        if(deg_seq[i] > k_max_int){
            k_max_int = deg_seq[i];
        }
    }
    if((sum%2) != 0){
        do{
            v = rand_int(N);
        }while(deg_seq[v] == k_max_int);
        deg_seq[v]++;
        sum++;
        sum_2 += (deg_seq[v]*deg_seq[v]-(deg_seq[v]-1)*(deg_seq[v])-1);
    }
    TreeEV = ((double)sum_2/sum)-1.;
    fprintf(fp_TreeEv, "%f,", TreeEV);
    fprintf(fp_TreeEvCol, "%f\n", TreeEV);
    CycleCount += 1;
}
fclose(fp_TreeEv);
fclose(fp_TreeEvCol);
Could the problem be in the SSD which, at a given moment, failed to keep up with the code and wasn't able to write the data back (because of the NULL buffer)?
Only runs with a longer execution time per single-cycle iteration correctly save all the expected data. From man setbuf: "If the argument buf is NULL, only the mode is affected; a new buffer will be allocated on the next read or write operation."
[EDIT]
I noticed that inside one of the "corrupted" txt files there is one invalid value.
I checked that it is the only value with a different number of digits; to do so I first removed all the "." from the txt file with tr -d \. < TreeEVCol102400.txt > test.txt and then sorted the new file with sort -g test.txt > t.dat. After that it is easy to check the top/end of the t.dat file for values with more/fewer digits.
I've also checked that it is the only value in the file with at least 2 "." with:
grep -ni '\.*[0-9]*\.[0-9]*\.' TreeEVCol102400.txt
I checked that each corrupted file has just one invalid value of this kind.
When you set 'unbuffered' mode on a FILE, each character might[1] be written individually to the underlying device/file. In that case, if multiple processes are appending to the same file and both try to write a number at the same time, they might get their digits interleaved, as you show with your "corrupted" value. So setting _IONBF mode on a file is generally not what you want.
[1] It's actually not possible to completely unbuffer a file -- there will still be disk buffers involved in the OS, and there may still be a (small) stdio buffer to deal with some corner cases. But in general, every character might be individually written to the file.
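If multiple writers really do have to append records to one file, a safer pattern (just a sketch with made-up names, not the poster's code) is to keep normal stdio buffering for everything else and emit each record with a single write() on a descriptor opened with O_APPEND: on local filesystems each such small write normally lands as one contiguous record, so digits from different writers don't get interleaved.
/* Sketch: one write() call per record on an O_APPEND descriptor.
 * "data.txt" and append_record() are made-up names for illustration. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int append_record(int fd, double value)
{
    char line[64];
    int len = snprintf(line, sizeof line, "%f\n", value);
    if (len < 0 || len >= (int)sizeof line)
        return -1;                        /* formatting error or truncation */
    return write(fd, line, (size_t)len) == len ? 0 : -1;
}

int main(void)
{
    int fd = open("data.txt", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) { perror("open"); return 1; }
    append_record(fd, 3.141593);
    close(fd);
    return 0;
}
The same idea applies inside the poster's loop: build the "%f\n" (or "%f,") record with snprintf and hand it to write() in one call, rather than relying on unbuffered stdio to flush character by character.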
I'm using the cat command for a school project.
What I need is to give a txt file as input to my code and then evaluate the output (saved in a txt file).
So far I'm using this on my command line:
cat input_000.txt | ./main > my_output.txt
Where ./main is my C code.
The input_000.txt is structured like this:
0 a a R 3
1 a b L 4
4 c b R 1
etc...
I have a certain number of lines made of 5 characters (with spaces between them).
How do I get the content of each line in my C code? I've been told to use standard input, but I've only ever used scanf with keyboard input.
Does it still work in this case?
And how should I save my output? I usually use fwrite, but in this case is everything managed by the cat command?
That's how pipes work: the shell arranges for the output of the left-hand side of the pipe to be written to the standard input of the right-hand side program.
In short, if you read your input from stdin (as you do with plain scanf) then you won't have to make any changes at all.
Redirection works just about the same. Redirecting to a file (>) will make all writes to stdout go to the file. Redirecting from a file (<) will make all reads from stdin come from the file.
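As a concrete illustration (the variable names below are made up; what the five columns mean is whatever the assignment defines), a program that reads those lines from stdin with fgets/sscanf and prints to stdout works unchanged with both cat input_000.txt | ./main > my_output.txt and ./main < input_000.txt > my_output.txt:
/* Sketch: parse "0 a a R 3"-style lines from stdin and echo them to stdout. */
#include <stdio.h>

int main(void)
{
    char line[128];
    int state, next_state;
    char read_sym, write_sym, move;

    while (fgets(line, sizeof line, stdin) != NULL) {
        if (sscanf(line, "%d %c %c %c %d",
                   &state, &read_sym, &write_sym, &move, &next_state) == 5) {
            /* ...do whatever the project requires with the five fields... */
            printf("%d %c %c %c %d\n",
                   state, read_sym, write_sym, move, next_state);
        }
    }
    return 0;
}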
You can use getline (or indeed scanf) to read stdin (fd 0) and save it in a char * in your C code. Then you only need to write to stdout (fd 1), and your > will take care of writing to your file.
What you need is something like this inside your function...
FILE *input = fopen("input.txt", "r");   // open for reading
FILE *output = fopen("output.txt", "w"); // open for writing
char inputArray[500];
char outputArray[500];
while (fscanf(input, "%s", inputArray) != EOF) {
    // read the next token and save it in 'inputArray'
    // you can also use %c to read each character; in your case I think that's
    // better... you can save each character in an array position, or something like that
}
while (number of lines you need, or the number of lines from your input file) {
    fprintf(output, "%s\n", outputArray); // this will write the string saved in 'outputArray'
}
If you don't want to do that, you can give your main the input using < and save the output with >:
./main.o < input.txt > output.txt
(Something like that; it's not completely safe because the terminal could be configured to use a different charset...)
Edit: half of the problem has been fixed and the question has been edited to reflect that fix. I still only see this perform the first command in a series of pipes.
I am stuck on the very last thing in my shell that I am making from scratch. I want to be able to parse commands like "finger | tail -n +2 | cut -c1-8 | uniq | sort" and I think that I am close. What I have right now, when given that line as an array of strings, only executes the "finger" part.
Here is the part of code that isn't working as it needs to be, I've narrowed it down to just this for loop and the part after:
int mypipe[2];
// start out with stdin
oldinfd = 0;
// run the beginning commands; connect the in/out ends of the pipes to each other
for (int i = 0; i < max; i++) {
    pipe(mypipe);
    processline(commands[i], oldinfd, mypipe[1]); // doesn't wait on children
    close(mypipe[1]);
    oldinfd = mypipe[0];
}
// run the final command, put output into stdout
processline(commands[max], oldinfd, 1); // waits on children
...
This loop runs the program and should end up looking like this:
stdin -> [0]pipe[write] -> [read]pipe[write] -> ... ->[read]pipe[1] -> stdout
My processline function takes in a line ('finger') and execs it. For instance, processline("finger", 0, 1) runs the finger command without flaws. For the purposes of this question assume that processline works perfectly; what I'm having trouble with is how I'm using the pipes and their write and read ends.
Also with print statements, I have confirmed that the correct words are being sent to the correct parts. (the for loop receives "finger", "tail -n +2", "cut -c1-8", and "uniq", while the commands[max] that the last line receives is "sort").
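To make the wiring in that diagram concrete, this is the textbook pattern I'm trying to follow (untested sketch, with a hypothetical run() standing in for my processline and a hard-coded command list); it mainly shows which pipe ends the parent has to close:
/* Sketch: stdin -> stage 0 -> pipe -> stage 1 -> ... -> last stage -> stdout */
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

static void run(char *const argv[], int infd, int outfd)
{
    if (fork() == 0) {                    /* child: rewire stdin/stdout, exec */
        if (infd != 0)  { dup2(infd, 0);  close(infd); }
        if (outfd != 1) { dup2(outfd, 1); close(outfd); }
        execvp(argv[0], argv);
        perror("execvp");
        _exit(127);
    }
}

int main(void)
{
    char *cmds[][4] = {
        { "finger", NULL },
        { "tail", "-n", "+2", NULL },
        { "sort", NULL },
    };
    int ncmds = 3;
    int oldinfd = 0;                      /* first stage reads from stdin */

    for (int i = 0; i < ncmds - 1; i++) {
        int p[2];
        if (pipe(p) < 0) { perror("pipe"); exit(1); }
        run(cmds[i], oldinfd, p[1]);
        close(p[1]);                      /* parent drops the write end... */
        if (oldinfd != 0)
            close(oldinfd);               /* ...and the previous read end */
        oldinfd = p[0];
    }
    run(cmds[ncmds - 1], oldinfd, 1);     /* last stage writes to stdout */
    if (oldinfd != 0)
        close(oldinfd);
    while (wait(NULL) > 0)                /* reap every stage */
        ;
    return 0;
}
Each child also inherits p[0] of its own output pipe; that spare read end is harmless for EOF detection, though a tidier version would close it in the child too.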
I would like to iterate through all possible process IDs, but for this I would need to know the limit of the process table. How can I find this out?
My idea is to do something like
while (counter < table size)
{
do something
}
I think there is no POSIX API to get this info directly in C; you need the popen() function to call the command line for this kind of info. I do not recommend system(), though it is similar to popen(). (Display all processes using a POSIX function.)
The system's maximum number of user processes can be set/checked with the ulimit command:
popen("ulimit -u", "r");   /* -u: max user processes */
Or you can check the current maximum among the existing process IDs (ps aux shows all processes, sed finds the last line, and awk extracts the process ID):
FILE *fp = popen("ps aux | sed -n '$p' | awk '{print $2}'", "r");
if(NULL != fp)
{
    char buff[1024];
    fgets(buff, 1024, fp);
    printf("%s\n", buff);
    pclose(fp);
}
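If a Linux-only answer is acceptable (this is not POSIX), the largest PID the kernel will assign can also be read directly from /proc/sys/kernel/pid_max without spawning a shell; a minimal sketch:
/* Sketch: read the Linux-specific PID limit; returns -1 if unavailable. */
#include <stdio.h>

long read_pid_max(void)
{
    long pid_max = -1;
    FILE *fp = fopen("/proc/sys/kernel/pid_max", "r");
    if (fp != NULL) {
        if (fscanf(fp, "%ld", &pid_max) != 1)
            pid_max = -1;
        fclose(fp);
    }
    return pid_max;
}

int main(void)
{
    printf("pid_max = %ld\n", read_pid_max());
    return 0;
}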
I'm writing a unix minishell in C, and am at the point where I'm adding command expansion. What I mean by this is that I can nest commands in other commands, for example:
$> echo hello $(echo world! ... $(echo and stuff))
hello world! ... and stuff
I think I have it working mostly, however it isn't marking the end of the expanded string correctly, for example if I do:
$> echo a $(echo b $(echo c))
a b c
$> echo d $(echo e)
d e c
See, it prints the c even though I didn't ask it to. Here is my code:
msh.c - http://pastebin.com/sd6DZYwB
expand.c - http://pastebin.com/uLqvFGPw
I have more code, but there's a lot of it, and these are the parts I'm having trouble with at the moment. I'll try to tell you the basic way I'm doing this.
Main is in msh.c, here it gets a line of input from either the commandline or a shellfile, and then calls processline (char *line, int outFD, int waitFlag), where line is the line we just got, outFD is the file descriptor of the output file, and waitFlag tells us whether or not we should wait if we fork. When we call this from main we do it like this:
processline (buffer, 1, 1);
In processline, we allocate a new line:
char expanded_line[EXPANDEDLEN];
We then call expand, in expand.c:
expand(line, expanded_line, EXPANDEDLEN);
In expand, we copy the characters literally from line to expanded_line until we find a $(, which then calls:
static int expCmdOutput(char *orig, char *new, int *oldl_ind, int *newl_ind)
orig is line, and new is expanded line. oldl_ind and newl_ind are the current positions in the line and expanded line, respectively. Then we pipe, and recursively call processline, passing it the nested command(for example, if we had "echo a $(echo b)", we would pass processline "echo b").
This is where I get confused: each time expand is called, is it allocating a new chunk of memory EXPANDEDLEN long? If so, that's bad, because I'll run out of stack room really quickly (in the case of a hugely nested command-line input). In expand I insert a null character at the end of the expanded string, so why is it printing past it?
If you guys need any more code, or explanations, just ask. Secondly, I put the code in pastebin because there's a ton of it, and in my experience people don't like it when I fill up several pages with code.
Your problem lies in expCmdOutput. As you already noticed, you do not get NUL terminated strings when reading the output of your child process using read. What you want to do is terminate the string manually, by adding something like
buf[bytes_read] = '\0';
after your call to read in line 29 (expand.c). Since you need space for the NUL, you can only read up to BUF_SIZE - 1 bytes then, of course.
You should probably rethink the whole loop you do afterwards, though:
/* READ OUTPUT OF COMMAND FROM READ END OF PIPE, THEN CLOSE READ END */
bytes_read = read(fd[0],buf,BUF_SIZE);
while(bytes_read > 0)
{
    bytes_read = read(fd[0], buf, BUF_SIZE);
    if (bytes_read == -1) perror("read");
}
close(fd[0]);
If the output of your command is longer than BUF_SIZE, you simply read into buf again, overwriting the output you just read. What you really want here is to allocate memory and append to the end using strcat (or by keeping a pointer to the end of your string, for efficiency).
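A rough sketch of that idea (made-up names, this is not the code from expand.c): grow a heap buffer with realloc and track how many bytes have been read, so the whole output ends up NUL-terminated in one string no matter how long it is.
/* Sketch: read everything from fd into one growing, NUL-terminated buffer.
 * Returns a malloc'd string the caller frees, or NULL on error. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define CHUNK 512

char *read_all(int fd)
{
    size_t cap = CHUNK, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        if (len + CHUNK + 1 > cap) {            /* keep room for the NUL */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) { free(buf); return NULL; }
            buf = tmp;
            cap *= 2;
        }
        ssize_t n = read(fd, buf + len, CHUNK);
        if (n < 0) { perror("read"); free(buf); return NULL; }
        if (n == 0)
            break;                              /* EOF: writer closed the pipe */
        len += (size_t)n;
    }
    buf[len] = '\0';                            /* terminate once, at the end */
    return buf;
}
The expansion code can then copy the returned string into expanded_line, and leftover characters from an earlier, longer expansion can no longer show through.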