The reporting tools generate a huge number of reports/files in the file system (a Unix directory). There is a list of destinations (email addresses and shared folders), and a different set of reports/files (possibly overlapping) has to be distributed to each destination.
I would like to know if there is a way to manage this report delivery efficiently using shell scripts, so that maintaining the list of reports and destinations does not become a mess in the future.
It's quite an open-ended question; the constraint, however, is that it should work within the boundaries of managing the reports in a Unix file system.
You could always create a simple text file (report_locations.txt here) with the names and locations where reports go, e.g.:
ReportName1;/home/bob
ReportName2;/home/jim,/home/jill
ReportName3;/home/jill,/home/bob
In this example the report name is always the first field, and the locations where the corresponding report should go follow, delimited by commas (or any other delimiter you like).
Then read that file with a shell script (I like to use for loops for this sort of operation):
#!/usr/bin/ksh93
for REPORT in $(cut -d";" -f1 report_locations.txt)
do
    # Anchor the pattern so ReportName1 does not also match ReportName10
    LISTS=$(grep "^${REPORT};" report_locations.txt | cut -d";" -f2)
    for LIST in ${LISTS}
    do
        DIRS=$(echo "${LIST}" | tr ',' '\n')
        for DIR in ${DIRS}
        do
            echo "Copying ${REPORT} to ${DIR}"
            cp -f "${REPORT}" "${DIR}"
        done
    done
done
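With the example report_locations.txt above, a run should produce output along these lines (assuming the report files exist in the current directory):
Copying ReportName1 to /home/bob
Copying ReportName2 to /home/jim
Copying ReportName2 to /home/jill
Copying ReportName3 to /home/jill
Copying ReportName3 to /home/bob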
The use of for loops may be a bit excessive (I get caught up in them), but it gets the job done.
Not sure this is what you would be looking for, but it is a starting point if anything. Don't hesitate to ask if you need any explanation of the code.
I am confused by the following code from Tcl wiki 1089:
#define TEMPBUFSIZE 256 /* usually enough space! */
char buf[[TEMPBUFSIZE]];
I was curious and tried to compile the above syntax with gcc and armcc; both fail. I was trying to understand how Tcl's handle-to-file-pointer mechanism works, in order to solve the chaos in data logging by multiple jobs running in the same folder [log files unique to jobs].
I have multiple tcl scripts running in parallel as LSF Jobs each using a log file.
For example,
Job1 -> log1.txt
Job2 -> log2.txt
(file writes in both cases are "intermittent" over the entire job execution)
Some of the text which I expect to be part of log1.txt is written to log2.txt, and vice versa, at random. I have tried "fconfigure $fp -buffering none", but the behaviour still persists. One important note: all the LSF jobs are submitted from the same folder, and if I submit the jobs from individual folders, the log files don't contain text written by other jobs. I would like the jobs to be executed from the same folder to reduce the space consumed by repeating the resources in different folders.
Question1:
Can anyone advise me on how the Tcl "handle" is translated into a pointer to the memory allocated for the log file? I mentioned "intermittent" because of the following: "Tcl maps this string internally to an open file pointer when it is time for the interpreter to do some file I/O against that particular file" - wiki 1089
Question2:
Is there a possibility that two different "open" calls can end up with the same "file"?
Somewhere along the line, the code has been mangled; it looks like it happened when I converted the syntax from one type of highlighting scheme to another in 2011. Oops! My original content used:
char buf[TEMPBUFSIZE];
and that's what you should use. (I've updated the wiki page to fix this.)
It's a web server scenario. Linux is the OS. Different IP addresses call the same script.
Does Linux start a new Perl process for every script call, or does Perl run the multiple calls interleaved, or do the scripts run serially (one after another)?
Sorry, I didn't find an answer within the first few results of Google.
I need to know so I can tell how much I have to worry about concurrent database access.
The script itself is NOT using multiple threads, it's a straightforward Perl script.
Update: a more sophisticated version of the scheduler/serialiser from my comment on the answer below, still untested:
#!/usr/bin/perl
use Fcntl qw(:DEFAULT :flock);  # needed for O_RDWR, O_APPEND and LOCK_EX
use Time::HiRes qw(nanosleep);
sysopen(INDICATOR, "indicator", O_RDWR | O_APPEND); # filename hardcoded, must exist
flock(INDICATOR, LOCK_EX);
print INDICATOR $$."\n"; # print a value unique to this script
# in single-process context it's serial anyway
# in multi-process context the pid is unique
seek(INDICATOR, 0, 0);
my $firstline = <INDICATOR>;
close(INDICATOR);
while ("$firstline" ne $$."\n")
{
    nanosleep(1000000); # time your script to find a better value
    open(INDICATOR, "<", "indicator");
    $firstline = <INDICATOR>;
    close(INDICATOR);
}
do "transferHandler.pl"; # name of your script to run
sysopen(INDICATOR, "indicator", O_RDWR);
flock(INDICATOR, LOCK_EX);
my @content = <INDICATOR>;
shift @content; # drop our own entry from the queue
truncate(INDICATOR, 0);
seek(INDICATOR, 0, 0);
foreach my $line (@content)
{
    print INDICATOR $line;
}
close(INDICATOR);
Edit again: the above script would not work if Perl ran in a single process and interleaved (threaded) the scripts itself. Based on the answer here and on feedback I got verbally elsewhere, that scenario is the only one of the three I asked about which appears not to be the case. On quick thought, the script could still be made to work in that scenario by changing the unique value from the pid to a random number.
It depends completely on the setup of your web server. Does it use plain CGI, FastCGI, or mod_perl? You can set up both of the scenarios you've described. In the case of FastCGI you can also arrange for a script to never exit, but to do all its work inside a loop that keeps accepting connections from the frontend web server.
Regarding the update to your question, I suggest you start worrying about concurrent access from the very start. Unless you're writing a strictly personal application and deliberately set up your server to run only one copy of your script, pretty much any other site will at some point grow into something that requires two or more scripts processing in parallel. You will save yourself a lot of headache if you plan for this very common situation ahead of time. Even if you only have one serving script, you will need indexing/clean-up/whatever done by offline tasks, and that means concurrent access once again.
If the Perl scripts are invoked separately, they will result in separate processes. Here is a demo using two scripts:
#master.pl
system('perl ./sleep.pl &');
system('perl ./sleep.pl &');
system('perl ./sleep.pl &');
#sleep.pl
sleep(10);
Then run:
perl master.pl & (sleep 1 && ps -A | grep perl)
Alright, so, I haven't programmed anything useful in ages - the last time I did was a year ago, and as you can imagine my knowledge of programming is seriously rusty. (The last thing I 'programmed' was a Ren'Py game over the weekend; one can imagine the limited uses of this. The most advanced C program I wrote was a tic-tac-toe game a year ago. So yeah.)
Anyway, I've been given a job to write a program that takes two Excel files, both of which have a list of items, each associated with an ID. I need to write a program to search both files for IDs, and if the IDs match, the program will need to create a new file with the matched IDs and items. This is insanely beyond my limited C capabilities.
If anyone could help, I would seriously appreciate it.
(also, if this is not possible with C, I'll do my best to work with any other languages)
Export the two files to .csv format and write a script to process them. For example, PHP has built-in CSV read/write capabilities.
You can do this with VBA: create a macro in one of the files which iterates over the cells in your column in file 1, compares them to the cells in file 2, and writes them to a new .xls file if they match.
Dana points out that the VLOOKUP function will do this quite easily.
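For example, assuming the second list has been copied onto Sheet2 with IDs in column A and items in column B, a helper column next to the first list could use something like =VLOOKUP(A2, Sheet2!A:B, 2, FALSE), which returns the matching item, or #N/A when the ID has no match.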
Install GnuWin32
Output the Excel files as text (CSV, for example)
Sort each file with the -u option to remove duplicates if needed
Mix and sort the two files together
Count the occurrences of each ID with uniq -c
Filter out lines with a count of 1 using grep
Remove the count with cut, leaving the ID and whatever else you need (a sketch of the whole pipeline follows below)
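For instance, assuming the two exports are named file1.csv and file2.csv (hypothetical names) and that matching rows are identical in both files, the chain might look roughly like this (cut -c9- relies on GNU uniq printing the count in a 7-character column followed by a space):
sort -u file1.csv > a.txt
sort -u file2.csv > b.txt
sort a.txt b.txt | uniq -c | grep -v "^ *1 " | cut -c9- > matched.csv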
If you know Java then you can use Apache POI for your project. You can use the examples given on the Apache POI website to accomplish your task.
Apache POI Excel Documentation: http://poi.apache.org/spreadsheet/quick-guide.html
If you absolutely have to do this on xls/xlsx files from a process, you probably need a copy of Excel controlled by COM automation. You can do this from VB/VBA/C#/C++, whatever, some easier than others. Google for 'Excel automation'.
Rgds,
Martin
Not C, but you may be able to cobble something together very quickly using xlsperl.
It has come in handy for me in the past.
I intend to create a programme that can permanently follow a large, dynamic set of log files and copy their entries over to a database for easier near-realtime statistics. The log files are written by diverse daemons and applications, but their format is known, so they can be parsed. Some of the daemons write logs into one file per day, like Apache's cronolog, which creates files like access.20100928. Those files appear with each new day and may disappear when they're gzipped away the next day.
The target platform is an Ubuntu Server, 64 bit.
What would be the best approach to efficiently reading those log files?
I can think of scripting languages like PHP that either open the files themselves and read new data, or use system tools like tail -f to follow the logs, or of other runtimes like Mono. Bash shell scripts probably aren't so well suited for parsing the log lines and inserting them into a database server (MySQL), not to mention easy configuration of my app.
If my programme reads the log files itself, I'd think it should stat() each file once a second or so to get its size and open the file when it has grown. After reading the file (which should hopefully only return complete lines) it could call tell() to get the current position and, next time, seek() directly to the saved position to continue reading. (These are C function names, but I actually wouldn't want to do that in C; Mono/.NET or PHP offer similar functions as well.)
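Roughly, the per-file logic I have in mind would look something like this (a sketch in C only for illustration, untested; poll_logfile would be called about once a second per log file):
/* Rough sketch of the polling idea (untested): remember how far we have
 * read and only pick up complete new lines since the last poll. */
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

static long saved_offset = 0;   /* would be kept per log file in the real thing */

void poll_logfile(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return;                          /* file vanished (gzipped away etc.) */
    if (st.st_size <= saved_offset)
        return;                          /* nothing new since last time */

    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return;
    fseek(fp, saved_offset, SEEK_SET);

    char line[4096];
    while (fgets(line, sizeof line, fp) != NULL) {
        if (strchr(line, '\n') == NULL)
            break;                       /* incomplete last line: re-read it next poll */
        saved_offset = ftell(fp);        /* position after the last complete line */
        /* parse the line and insert it into the database here */
    }
    fclose(fp);
}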
Is that constant stat()ing of the files and subsequent opening and closing a problem? How would tail -f do that? Can I keep the files open and be notified about new data with something like select()? Or does it always return at the end of the file?
In case I'm blocked in some kind of select() or an external tail, I'd need to interrupt that every one or two minutes to scan for new or deleted files that should (no longer) be followed. Resuming with tail -f then is probably not very reliable; that should work better with my own saved file positions.
Could I use some kind of inotify (file system notification) for that?
If you want to know how tail -f works, why not look at the source? In a nutshell, you don't need to periodically interrupt or constantly stat() to scan for changes to files or directories. That's what inotify does.
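For illustration, a minimal inotify sketch (Linux-specific, untested; the watched directory is just an example and error handling is abbreviated):
/* Minimal inotify sketch: watch a log directory and react to new or
 * modified files instead of polling with stat(). Linux only. */
#include <stdio.h>
#include <unistd.h>
#include <sys/inotify.h>

int main(void)
{
    char buf[4096] __attribute__((aligned(8)));
    int fd = inotify_init();
    if (fd < 0)
        return 1;

    /* the path is just an example; watch whatever directory holds the logs */
    inotify_add_watch(fd, "/var/log/myapp", IN_CREATE | IN_MODIFY | IN_DELETE);

    for (;;) {
        ssize_t len = read(fd, buf, sizeof buf);   /* blocks until something happens */
        if (len <= 0)
            break;
        for (char *p = buf; p < buf + len; ) {
            struct inotify_event *ev = (struct inotify_event *)p;
            if (ev->len > 0)
                printf("event 0x%x on %s\n", (unsigned)ev->mask, ev->name);
            p += sizeof(struct inotify_event) + ev->len;
        }
    }
    return 0;
}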
Just had a general question about how to approach a certain problem I'm facing. I'm fairly new to C, so bear with me here. Say I have a folder with 1000+ text files; the files are not named in any kind of numbered order, but they are alphabetical. For my problem I have files of stock data, each file named after the company's respective ticker. I want to write a program that will open each file, read the data, find the historical low, compare it to the current price, calculate the percent change, and then print it.
Searching and calculating are not a problem; the problem is getting the program to go through and open each file. The only way I can see to attack this is to create a text file containing all of the ticker symbols, have the program read that into an array, and then run a loop that first opens the first filename in the array, performs the calculations, prints the output, closes the file, then loops back around, moving to the second element (the next ticker symbol) in the array. This would be fairly simple to set up (I think), but I'd really like to avoid typing out over a thousand file names into a text file. Is there a better way to approach this? Not really asking for code (unless there is some amazing function in C that will do this for me ;) ), just some advice from more experienced C programmers.
Thanks :)
Edit: This is on Linux, sorry I forgot to mention that!
Under Linux/Unix (BSD, OS X, POSIX, etc.) you can use opendir / readdir to go through the directory structure. There is no need to generate static files that need to be updated when the file system already has the information you want. If you only want a subset of stocks at a given time, then using glob would be quicker; there is also scandir.
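If it helps, here is a minimal opendir/readdir sketch (untested; the directory path and the per-file processing are placeholders):
/* Minimal opendir/readdir sketch: iterate over every regular file in a
 * directory of ticker files ("." here; adjust the path as needed). */
#include <stdio.h>
#include <dirent.h>

int main(void)
{
    DIR *dir = opendir(".");
    if (dir == NULL) {
        perror("opendir");
        return 1;
    }

    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] == '.')
            continue;                    /* skip ".", ".." and hidden files */
        printf("processing %s\n", entry->d_name);
        /* fopen(entry->d_name, "r"), scan prices, compute percent change ... */
    }
    closedir(dir);
    return 0;
}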
I don't know what the corresponding Win32 (Windows / Platform SDK) functions are called, if you are developing with Visual C++ as your C compiler. Searching the MSDN Library should help you.
Assuming you're running on linux...
ls /path/to/text/files > names.txt
is exactly what you want.
opendir() on Linux.
http://linux.die.net/man/3/opendir
Example:
http://snippets.dzone.com/posts/show/5734
In pseudocode it would look like this; I cannot give concrete code as I'm not 100% sure this is the correct approach...
for each directory entry
    scan the filename
    extract the ticker name from the filename
    open the file
    read the data
    create a record consisting of the filename, data.....
    close the file
    add the record to a list/array...
sort the list/array into alphabetical order based on
the ticker name in the filename...
You could vary it slightly if you wish: scan the filenames in the directory entries and sort them first by building records with just the filenames, then go back to the start of the list/array and open each one individually, reading the data and putting it into the record then... (see the sketch below).
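The "sort by ticker" variant comes almost for free with scandir() and alphasort() on Linux; a rough, untested sketch (the per-file processing is left as a placeholder):
/* Rough scandir/alphasort sketch: get directory entries already sorted
 * alphabetically (i.e. by ticker symbol, since files are named after it). */
#include <stdio.h>
#include <stdlib.h>
#include <dirent.h>

int main(void)
{
    struct dirent **entries;
    int n = scandir(".", &entries, NULL, alphasort);
    if (n < 0) {
        perror("scandir");
        return 1;
    }

    for (int i = 0; i < n; i++) {
        if (entries[i]->d_name[0] != '.') {      /* skip "." and ".." */
            printf("ticker file: %s\n", entries[i]->d_name);
            /* open the file, read the prices, build the record here */
        }
        free(entries[i]);
    }
    free(entries);
    return 0;
}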
Hope this helps,
best regards,
Tom.
There are no functions in standard C that have any notion of a "directory". You will need to use some kind of platform-specific function to do this. For some examples, take a look at this post from Cprogramming.com.
Personally, I prefer using the opendir()/readdir() approach as shown in the second example. It works natively under Linux and also on Windows if you are using Cygwin.
Approach 1) I would just have a specific directory in which I have ONLY these files containing the ticker data and nothing else. I would then use the C readdir API to list all files in the directory and iterate over each one performing the data processing that you require. Which ticker the file applies to is determined only by the filename.
Pros: Easy to code
Cons: It really depends where the files are stored and where they come from.
Approach 2) Change the file format so the ticker files start with a magic code identifying them as ticker files, followed by a string containing the name. As before, use readdir to iterate through all files in the folder and open each file, check that the magic number is present, read the ticker name from the file, and process the data as before (a rough sketch follows below the pros and cons).
Pros: More flexible than before. Filename needn't reflect name of ticker
Cons: Harder to code, file format may be fixed.
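A rough sketch of the header check in Approach 2 (untested; the 4-byte "TICK" magic and the 16-byte name field are made up purely for illustration):
/* Sketch of approach 2: check a magic header, then read the ticker name.
 * The "TICK" magic and 16-byte name field are purely illustrative. */
#include <stdio.h>
#include <string.h>

int read_ticker_header(const char *path, char name[17])
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    char magic[4];
    if (fread(magic, 1, 4, fp) != 4 || memcmp(magic, "TICK", 4) != 0) {
        fclose(fp);                      /* not a ticker file: skip it */
        return -1;
    }
    if (fread(name, 1, 16, fp) != 16) {
        fclose(fp);
        return -1;
    }
    name[16] = '\0';
    /* the price data would follow here; read it as needed */
    fclose(fp);
    return 0;
}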
but I'd really like to avoid typing out over a thousand file names into a text file. Is there a better way to approach this?
I solved the exact same problem a while back, albeit for personal use :)
What I did was use the OS shell commands to generate a list of those files, redirect the output to a text file, and have my program run through them.
On UNIX, there's the handy glob function:
/* needs <stdio.h>, <string.h> and <glob.h> */
glob_t results;
size_t i;

memset(&results, 0, sizeof(results));
glob("*.txt", 0, NULL, &results);
for (i = 0; i < results.gl_pathc; i++)
    printf("%s\n", results.gl_pathv[i]);
globfree(&results);
On Linux or a related system, you could use the fts library. It's designed for traversing file hierarchies: man fts,
or even something as simple as readdir
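A small, untested fts sketch (walking the current directory; the per-file work is a placeholder):
/* Small fts sketch: walk a directory tree and visit every regular file. */
#include <stdio.h>
#include <fts.h>

int main(void)
{
    char *paths[] = { ".", NULL };
    FTS *fts = fts_open(paths, FTS_NOCHDIR | FTS_PHYSICAL, NULL);
    if (fts == NULL) {
        perror("fts_open");
        return 1;
    }

    FTSENT *ent;
    while ((ent = fts_read(fts)) != NULL) {
        if (ent->fts_info == FTS_F) {            /* regular file */
            printf("found file: %s\n", ent->fts_path);
            /* open it, read the stock data, compute the percent change ... */
        }
    }
    fts_close(fts);
    return 0;
}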
If you're on Windows, you can use the Directory Management APIs; more specifically, the FindFirstFile function, used with wildcards, in conjunction with FindNextFile.
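A minimal, untested sketch of that approach (matching *.txt in the current directory, just as an example):
/* Windows sketch: list every .txt file in the current directory
 * with FindFirstFile/FindNextFile. */
#include <stdio.h>
#include <windows.h>

int main(void)
{
    WIN32_FIND_DATAA ffd;
    HANDLE h = FindFirstFileA("*.txt", &ffd);
    if (h == INVALID_HANDLE_VALUE) {
        printf("no matching files\n");
        return 1;
    }
    do {
        if (!(ffd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)) {
            printf("found file: %s\n", ffd.cFileName);
            /* open the file and process the stock data here */
        }
    } while (FindNextFileA(h, &ffd));
    FindClose(h);
    return 0;
}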