Unix: script as a proxy to a file

Hi: Is there a way to create a file whose contents are generated dynamically when it is read?
I want to create 3 versions of the same file (one with 10 lines, one with 100 lines, one with all of the lines). I don't see any need for these to be static; rather, it would be best if they were proxies for a head/tail/cat command.
The purpose of this is unit testing - I want a unit test to run on a small portion of the full input file used in production. However, since the code only runs on full files (it's actually a Hadoop map/reduce application), I want to provide a truncated version of the whole data set without duplicating information.
UPDATE: An Example
more myActualFile.txt
1
2
3
4
5
more myProxyFile2.txt
1
2
more myProxyFile4.txt
1
2
3
4
etc. So the proxy files are DIFFERENTLY named files whose content is provided dynamically by simply taking the first n lines of the main file.

This is hacky, but... One way is to use named pipes and a looping shell script to generate the content (one script per named pipe). The script would look like this:
while true; do
  (
    # write the desired number of lines each time around the loop
    for i in $(seq "$linenr"); do echo something; done
  ) > thenamedpipe
done
Your script would then read from that named pipe.
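For the example above, a minimal sketch using the file names from the question (the head line count is whatever each proxy should expose):
mkfifo myProxyFile2.txt
while true; do
  # each reader of the pipe gets the first 2 lines of the main file
  head -n 2 myActualFile.txt > myProxyFile2.txt
done
Note that the redirection blocks until some process opens the pipe for reading, so the loop only does work when a reader (e.g. more myProxyFile2.txt) actually shows up.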
Another solution, if you are ready to dig into low-level stuff, is FUSE.

Related

Extracting and comparing only a certain column of a file

I need to write a PowerShell script that lets the user pass, as a parameter, a txt file containing the standard information you'd get from a
Get-Process > proc.txt
statement, and then compare the processes in the file with the currently running ones. I then need to display the Id, Name, start time and running time of every process that isn't in the txt file and is therefore a new process.
To give you a general idea of how I would approach this, I would:
Extract only the names of the processes from the txt file into a variable (v1).
Save only the names of all the currently running processes in a variable (v2).
Compare the two variables (v1, v2) and write the processes that are not in the txt file (the new ones) into yet another variable (v3).
Get the process ID, the start time and the running time for each process name in v3 and output all of that (including the name) to the console and to a new file.
First of all, how can I read only the names of the processes from the txt file? I tried to find this on the internet but had no luck.
Secondly, how can I save only the new processes in a variable, rather than all the differences (e.g. processes that are in the file but currently not running)?
As far as I know,
Compare-Object
returns all the differences.
Thirdly, how can I get the remaining process information I want for all the process names in v3?
And finally, how can I then neatly combine the ID, start time, running time and the names from v3 in one file?
I'm pretty much a beginner at PowerShell programming; I'm pretty sure my 4-step approach posted above is most likely wrong, so I appreciate any help I can get.
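For what it's worth, here is a minimal sketch of the four-step approach. It assumes proc.txt was produced by Get-Process > proc.txt in the default table layout (process name in the last column, three lines of blank/header/separator at the top); the output file name newprocs.txt is made up:
# v1: process names from the txt file (skip the blank line, header and separator)
$v1 = Get-Content proc.txt |
    Select-Object -Skip 3 |
    ForEach-Object { ($_ -split '\s+')[-1] } |
    Where-Object { $_ }

# v2: names of the currently running processes
$v2 = (Get-Process).Name

# v3: names present now but absent from the file (the '=>' side only)
$v3 = Compare-Object $v1 $v2 |
    Where-Object SideIndicator -eq '=>' |
    Select-Object -ExpandProperty InputObject

# Id, name, start time and running time of the new processes, to console and file
Get-Process | Where-Object { $v3 -contains $_.Name } |
    Select-Object Id, Name, StartTime,
        @{ n = 'RunningTime'; e = { (Get-Date) - $_.StartTime } } |
    Tee-Object -FilePath newprocs.txt
The '=>' side indicator is what restricts Compare-Object to entries that exist only in the second (running) set, which answers the second question above.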

TCL Opaque Handle C lib

I am confused by the following code from Tcl wiki page 1089:
#define TEMPBUFSIZE 256 /* usually enough space! */
char buf[[TEMPBUFSIZE]];
I was curious and tried to compile the above with gcc and armcc; both fail. I was trying to understand how Tcl's handle-to-file-pointer mechanism works, in order to sort out the chaos in data logging when multiple jobs run in the same folder (log files are unique to each job).
I have multiple Tcl scripts running in parallel as LSF jobs, each using its own log file.
For example,
Job1 -> log1.txt
Job2 -> log2.txt
(file writes in both cases are "intermittent" over the entire job execution)
Some of the text that I expect to be part of log1.txt is written to log2.txt, and vice versa, at random. I have tried "fconfigure $fp -buffering none", but the behaviour persists. One important note: all the LSF jobs are submitted from the same folder. If I submit the jobs from individual folders, the log files don't contain writes from the other job. I would still like the jobs to be executed from the same folder, to avoid the space consumed by duplicating the resources in different folders.
Question 1:
Can anyone explain how the Tcl "handle" is mapped to a pointer to the memory allocated for the log file? I said "intermittent" because of the following: "Tcl maps this string internally to an open file pointer when it is time for the interpreter to do some file I/O against that particular file" - wiki 1089.
Question 2:
Is there a possibility that two different "open" calls can end up with the same file?
Somewhere along the line, the code has been mangled; it looks like it happened when I converted the syntax from one type of highlighting scheme to another in 2011. Oops! My original content used:
char buf[TEMPBUFSIZE];
and that's what you should use. (I've updated the wiki page to fix this.)
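As a side note on Question 2: each open returns a fresh, distinct channel name, so two open calls never hand back the same handle (though they can of course refer to the same file if given the same path). A quick illustration in an interactive tclsh session (the exact channel names will vary):
% set fp1 [open log1.txt a]
file3
% set fp2 [open log2.txt a]
file4
% close $fp1; close $fp2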

Make a file available on all nodes

I'm writing an MPI application that takes a filename as an argument and tries to read the file using regular C functions. I run this application on several nodes of a cluster using qsub, which in turn uses mpiexec.
The application runs just fine on a local node where the file is. For this I just call mpiexec directly:
mpiexec -n 4 ~/my_app ~/input_file.txt
But when I submit it with qsub to be run on other nodes of the cluster, the file-reading part fails: the application errors out at the fopen call -- it can't open the file (most likely because the file isn't present on those nodes).
The question is: how do I make the file available to all nodes? I have looked over the qsub manpage and couldn't find anything relevant.
I guess Vanilla Gorilla doesn't need an answer any more? However, let's consider the case of a pathological system with no parallel file system and a file system available only at one node. There is a way in ROMIO (a very common MPI-IO implementation) to achieve your goal:
how can i transfer file from one proccess to all other with mpi?
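One portable way to get the effect (not the ROMIO-specific route, but the idea behind the linked question) is to have rank 0 read the file and broadcast the bytes to everyone. A rough sketch in C; error handling is omitted:
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank;
    long size = 0;
    char *buf = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {                       /* only rank 0 can see the file */
        FILE *f = fopen(argv[1], "rb");
        fseek(f, 0, SEEK_END);
        size = ftell(f);
        rewind(f);
        buf = malloc(size);
        fread(buf, 1, size, f);
        fclose(f);
    }

    /* ship the length first, then the contents */
    MPI_Bcast(&size, 1, MPI_LONG, 0, MPI_COMM_WORLD);
    if (rank != 0)
        buf = malloc(size);
    MPI_Bcast(buf, (int)size, MPI_CHAR, 0, MPI_COMM_WORLD);

    /* every rank can now parse buf as if it had read the file itself */

    free(buf);
    MPI_Finalize();
    return 0;
}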

Control output from makefile

I'm trying to write a makefile to replace one of the scripts used in building a fairly large application.
The current script compiles one file at a time, and the primary reason for using make is to parallelise the build process. Using make -j 16 I currently get a factor of 4 speedup on our office server.
But what I've lost is some readability of the output. The compilation program for a file bundles up a few bits and pieces of work, including running custom pre-compilers and running the gcc command. Each of these steps outputs some information, and I would prefer make to buffer the output from each command and then show the whole lot in one go.
Is it possible to make make do this?
If you upgrade to GNU make 4.0, then you can use the built-in output synchronization feature to get what you want.
If you don't want to upgrade, then you'll have to modify each of your recipes to be wrapped with a small program that manages the output. Or you can set the SHELL variable to something that does it for you. Searching the internet should give you some examples.
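For reference, with GNU make 4.0+ the synchronization is switched on from the command line, or permanently from inside the makefile:
make -j 16 --output-sync=target

# or, in the makefile itself:
MAKEFLAGS += --output-sync=target
--output-sync=target buffers everything a recipe prints and emits it in one block when the target finishes, which is exactly the "whole lot in one go" behaviour asked for.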
A simple way to accomplish this is to send all the log output to a log directory, with each file named, say:
log_file_20131104_12013478_b.txt // log_file_<date>_<time>_<sequence letter>.txt
and then simply cat them all together as your last make job in the dependency chain:
cat log_dir/log_file_20131104_12013478_*.txt > log_file_20131104_12013478.txt
With makepp this is the default behaviour as soon as you use -j. All the individual outputs (and entering dir messages) get collected and are output together as soon as the command terminates.

Managing log file size

I have a program which logs its activity.
I want to implement a log file mechanism to keep the log file under a certain size, let's say 10 MB.
The log file itself just holds the commands the program executed; those commands are of variable length.
Right now, the program runs in a Windows environment, but I'm likely to port it to UNIX soon.
I've come up with two methods for managing the log files:
1. Keep multiple files of smaller size, and if the new command would exceed the current file's size limit, truncate the oldest file to zero size and start writing there (see the sketch after this list).
2. Keep a header in the file which holds metadata about the first command in the file and the next place to write to. With this approach, I think each command would also have to carry metadata about its own length.
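For illustration, method 1 could look roughly like this in C (the file names, sizes and the log_cmd API are all made up):
#include <stdio.h>
#include <string.h>

#define NFILES   4                             /* number of rotated files  */
#define MAXBYTES (10L * 1024 * 1024 / NFILES)  /* ~10 MB across all files  */

static int  cur = 0;        /* index of the file currently written */
static long written = 0;    /* bytes already written to that file  */

/* append one executed command to the current log file, rotating if full */
void log_cmd(const char *cmd)
{
    char name[32];
    FILE *f;

    if (written + (long)strlen(cmd) + 1 > MAXBYTES) {
        cur = (cur + 1) % NFILES;   /* move on to the oldest file */
        written = 0;
    }
    snprintf(name, sizeof name, "log.%d", cur);
    f = fopen(name, written ? "a" : "w");   /* "w" truncates the oldest */
    if (f == NULL)
        return;
    written += fprintf(f, "%s\n", cmd);
    fclose(f);
}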
My questions are as follows:
In terms of efficiency, which of these methods would you use, and why?
Is there a UNIX command / function to do this easily?
Thanks a lot for your help,
Nihil.
On UNIX/Linux platforms there's a logrotate program that manages logfiles. Details can be found for example here:
http://linuxcommand.org/man_pages/logrotate8.html
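For the 10 MB requirement, a minimal logrotate configuration might look like this (the path and the numbers are just an example):
# /etc/logrotate.d/myapp
/var/log/myapp.log {
    size 10M        # rotate once the file exceeds 10 MB
    rotate 5        # keep at most 5 old copies
    copytruncate    # truncate in place so the program can keep its handle open
    missingok
    compress
}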
