I hope this is the right place for asking questions about the Redis client "hiredis".
I want to achieve with hiredis the same thing I am doing below with the Redis command-line client. As can be seen, one RPUSH call sends 3 different records.
redis 127.0.0.1:6379> rpush test kemal erdem husyin
(integer) 3
redis 127.0.0.1:6379> lrange test 0 -1
1) "kemal"
2) "erdem"
3) "husyin"
In my project I use hiredis. An example:
reply = (redisReply*)(redisCommand(c, "RPUSH %s %s" , channelName, message));
But now I have a big log file where every line is held in a buffer like char[][].
I need to send each line as a separate record, but I also need to call RPUSH only once for performance. Do you have any advice for me?
It would be a bad idea to send a single command pushing more than a few thousand items. It would saturate the communication buffers, and a very large command would block all other concurrent commands due to the single-threaded nature of Redis.
I suggest building your push commands in batches of n items (n between 10 and 100), and grouping your push commands in a pipeline of m commands (m between 10 and 100).
The algorithm would be something like this:
While there are still lines to read:
    New Redis pipeline, i = 0
    While there are still lines to read and i < m:
        Read at most n lines
        Build a push command for the read lines
        Pipeline the push command
        ++i
    Flush the Redis pipeline, check return status if needed
It will only generate N / (n*m) roundtrips (N being the number of lines in the input file).
To build commands with arbitrary numbers of parameters, you can use the redisAppendCommandArgv function.
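For illustration, a sketch of this algorithm with hiredis could look like the code below. The names channelName, lines and lineCount stand in for your own char[][] buffer and are not part of hiredis, and error handling is kept minimal. redisAppendCommandArgv only buffers the commands; redisGetReply flushes the output buffer to the socket and reads the replies back, which is what implements the pipeline.

/* Sketch: push lineCount lines onto the list channelName, BATCH lines per
 * RPUSH command and PIPELINE commands per network roundtrip. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <hiredis/hiredis.h>

#define BATCH    100   /* n: lines per RPUSH command     */
#define PIPELINE 100   /* m: RPUSH commands per pipeline */

static void push_lines(redisContext *c, const char *channelName,
                       char **lines, size_t lineCount)
{
    size_t sent = 0;
    while (sent < lineCount) {
        int appended = 0;

        /* Append up to PIPELINE commands; nothing hits the network yet. */
        while (sent < lineCount && appended < PIPELINE) {
            const char *argv[2 + BATCH];
            size_t argvlen[2 + BATCH];
            int argc = 0;

            argv[argc] = "RPUSH";     argvlen[argc] = strlen("RPUSH");     argc++;
            argv[argc] = channelName; argvlen[argc] = strlen(channelName); argc++;

            /* Take at most BATCH lines for this command. */
            while (sent < lineCount && argc < 2 + BATCH) {
                argv[argc] = lines[sent];
                argvlen[argc] = strlen(lines[sent]);
                argc++;
                sent++;
            }

            redisAppendCommandArgv(c, argc, argv, argvlen);
            appended++;
        }

        /* Flush the pipeline: one reply per appended command. */
        for (int i = 0; i < appended; i++) {
            redisReply *reply = NULL;
            if (redisGetReply(c, (void **)&reply) != REDIS_OK) {
                fprintf(stderr, "pipeline error: %s\n", c->errstr);
                exit(EXIT_FAILURE);
            }
            freeReplyObject(reply);
        }
    }
}

With n = m = 100 this transfers 10,000 lines per roundtrip, matching the N / (n*m) figure above.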
I have a question about the file synchronizer, unison. Is there a way to let unison compare two replicas and output the difference (to the standard output or a file) but not execute any updates?
I could do this by just hitting / (skip) for each update possibility in the interactive session, but I have more than a thousand possibilities and don't have the patience to hold the / key down for that long. Is there a way to skip them all automatically?
I don't know if this is a practical solution for you, but if you run Unison in text mode (unison -ui text), you can get an overview of all content to be transmitted by typing an upper-case L at the prompt (when it's ready to reconcile the changes). Perhaps you could automate this using expect?
I'm running 12 different Python scripts at the same time to screen thousands of data points each hour against a given set of criteria.
I would like to save the output data to a "master csv file" but figured it would be better to put the data into SQLite3 instead as I would be overwhelmed with csv files.
I am trying to transfer the output straight to SQLite3.
This is my script so far:
symbol = []
with open(r'c:\Users\Desktop\Results.csv') as f:
    for line in f:
        symbol.append(line.strip())
# no f.close() needed: the with block closes the file

path_out = r'c:\Users\Desktop\Results.csv'
i = 0
While you can use a sqlite database concurrently, it might slow your processes down considerably.
Whenever a program wants to write to the database file, the whole database has to be locked:
When SQLite tries to access a file that is locked by another process, the default behavior is to return SQLITE_BUSY.
So you could end up with more than one of your twelve processes waiting for the database to become available because one of them is writing.
Basically, concurrent read/write access is what client/server databases like PostgreSQL are made for. It is not a primary use case for sqlite.
So having the twelve programs each write a separate CSV file and merging them later is probably not such a bad choice, in my opinion. It is much easier than setting up a PostgreSQL server, at any rate.
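For what it's worth, the locking behaviour is easy to see from SQLite's C API (shown here only as an illustration; your scripts would go through Python's sqlite3 module, which wraps the same library). By default a writer that finds the database locked gets SQLITE_BUSY back immediately; sqlite3_busy_timeout() makes it retry for a while instead. The file name results.db and the table are made up for the example.

/* Sketch: open a database and wait up to 5 seconds for concurrent writers
 * instead of failing immediately with SQLITE_BUSY. */
#include <stdio.h>
#include <sqlite3.h>

int main(void)
{
    sqlite3 *db;
    char *errmsg = NULL;

    if (sqlite3_open("results.db", &db) != SQLITE_OK) {
        fprintf(stderr, "cannot open database\n");
        return 1;
    }

    /* Retry for up to 5000 ms when another process holds the write lock. */
    sqlite3_busy_timeout(db, 5000);

    int rc = sqlite3_exec(db,
                          "CREATE TABLE IF NOT EXISTS results(symbol TEXT);",
                          NULL, NULL, &errmsg);
    if (rc == SQLITE_BUSY)
        fprintf(stderr, "database is still locked by another process\n");
    else if (rc != SQLITE_OK)
        fprintf(stderr, "exec failed: %s\n", errmsg);

    sqlite3_free(errmsg);
    sqlite3_close(db);
    return 0;
}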
I am writing output from a simulation to a file using the following code
sprintf(filename, "time_data.dat");
FILE *fp = fopen(filename, "w");
for (i = 0; i < ntime; i++) {
    compute_data();
    fprintf(fp, "%d %lf %lf \n", step, time_val, rho_rms);
}
return;
On my desktop, I get to see the file time_data.dat update every few hours (compute_data() takes a few hundred seconds per time step, with OpenMP on an i7 machine). I have now submitted the job to a cluster node (E5 2650 processor running Ubuntu Server). I have been waiting for 5 days now, and not a line has appeared in the file yet. I do
tail -f time_data.dat
to check the output. The simulation will take another couple of weeks to complete. I can't wait for that long to see if my output is good. Is there a way I can probe the OS in the node to flush its buffer without disturbing the computation? If I cancel the job now, I am sure there won't be any output.
Please note that the hard disk to which the output file is being written is one shared using NFS over multiple nodes and the master node. Is this causing any trouble? Is there a temporary file place were the output is actually being written?
PS: I did du -h to find the file showing size 0. I also tried ls -l /proc/$ID/fd to confirm the file did open.
You might use lsof or simply ls -l /proc/$(pidof yoursimulation)/fd to check (on the cluster node) that indeed time_data.dat has been opened.
For such long-running programs, I believe it is worthwhile to consider using:
application checkpointing techniques
persistency of your application data, e.g. in some database
design some way to query your app's state (e.g. use some HTTP server library such as libonion, or at least have some JSON-RPC or other service to query something about the state)
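As a concrete illustration of the simplest form of this (making each record persistent and visible as soon as it is computed), the write loop from the question could flush its own buffers after every line. This is only a sketch under the assumption that the missing output is sitting in the stdio buffer; compute_data(), step, time_val and rho_rms stand in for the question's own variables, and the fsync() call is an optional extra that also pushes the data through the kernel/NFS buffers.

/* Sketch: the question's write loop with an explicit flush after every record. */
#include <stdio.h>
#include <unistd.h>                 /* fileno(), fsync() */

extern void compute_data(void);     /* provided by the simulation */
extern int step;
extern double time_val, rho_rms;

void write_time_series(int ntime)
{
    FILE *fp = fopen("time_data.dat", "w");
    if (fp == NULL)
        return;

    for (int i = 0; i < ntime; i++) {
        compute_data();
        fprintf(fp, "%d %lf %lf\n", step, time_val, rho_rms);
        fflush(fp);                 /* make the line visible to tail -f right away */
        fsync(fileno(fp));          /* optional: commit it to the NFS-mounted disk */
    }
    fclose(fp);
}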
It's a web server scenario. Linux is the OS. Different IP addresses call the same script.
Does Linux start a new Perl process for every script call, does Perl run the multiple calls interleaved, or do the scripts run serially (one after another)?
Sorry, I didn't find an answer within the first few results of Google.
I need to know in order to know how much I have to worry about concurrent database access.
The script itself is NOT using multiple threads; it's a straightforward Perl script.
Update: here is a more sophisticated scheduler/serialiser than the one sketched in my comment on the answer below, still untested:
#!/usr/bin/perl
use Fcntl qw(:DEFAULT :flock);   # O_RDWR, O_APPEND, LOCK_EX
use Time::HiRes qw(nanosleep);

sysopen(INDICATOR, "indicator", O_RDWR | O_APPEND); # filename hardcoded, must exist
flock(INDICATOR, LOCK_EX);
print INDICATOR $$."\n"; # print a value unique to this script
# in single-process context it's serial anyway
# in multi-process context the pid is unique
seek(INDICATOR, 0, 0);
my $firstline = <INDICATOR>;
close(INDICATOR);

while ("$firstline" ne $$."\n")
{
    nanosleep(1000000); # time your script to find a better value
    open(INDICATOR, "<", "indicator");
    $firstline = <INDICATOR>;
    close(INDICATOR);
}

do "transferHandler.pl"; # name of your script to run

sysopen(INDICATOR, "indicator", O_RDWR);
flock(INDICATOR, LOCK_EX);
my @content = <INDICATOR>;
shift @content;
truncate(INDICATOR, 0);
seek(INDICATOR, 0, 0);
foreach my $line (@content)
{
    print INDICATOR $line;
}
close(INDICATOR);
Edit again: the above script would not work if Perl ran in a single process and interleaved (threaded) the scripts itself. Of the three scenarios I asked about, that appears to be the only one that does not happen, based on the answer below and on feedback I received separately. On quick thought, the script could still be made to work in that case by using a random number rather than the pid as the unique value.
It depends completely on the setup of your web server. Does it use plain CGI, FastCGI, or mod_perl? You can set up any of the scenarios you've described. In the case of FastCGI you can also arrange for a script to never exit, but to do all its work inside a loop that keeps accepting connections from the frontend web server.
Regarding the update to your question, I suggest you start worrying about concurrent access from the very start. Unless you're building a strictly personal application and deliberately set up your server to run only one copy of your script, pretty much any site will at some point grow into something that requires 2 or more scripts processing in parallel. You will save yourself a lot of headache if you plan for this very common situation ahead of time. Even if you only have one serving script, you will need indexing/clean-up/whatever done by offline tasks, and that means concurrent access once again.
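In case it helps to picture the FastCGI case, the classic example from the FastCGI development kit is a single process that loops on FCGI_Accept() and handles one request per iteration. The C version below is only an illustration of that accept loop; in Perl the FCGI or CGI::Fast modules give you the same structure.

/* Sketch: a FastCGI process that never exits and serves requests in a loop.
 * Build against the fcgi library (e.g. -lfcgi); fcgi_stdio.h redirects printf
 * to the current request. */
#include "fcgi_stdio.h"

int main(void)
{
    int count = 0;
    while (FCGI_Accept() >= 0) {   /* blocks until the web server sends a request */
        printf("Content-type: text/html\r\n\r\n"
               "<h1>Request number %d</h1>\n", ++count);
    }
    return 0;
}

Because the process stays alive, state such as the count variable (or a database handle) persists across requests, which is exactly why it pays to think about concurrent access early.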
If the Perl scripts are invoked separately, they will run as separate processes. Here is a demo using two scripts:
#master.pl
system('perl ./sleep.pl &');
system('perl ./sleep.pl &');
system('perl ./sleep.pl &');
#sleep.pl
sleep(10);
Then run:
perl master.pl & (sleep 1 && ps -A | grep perl)
Hi: Is there a way to create a file which, when read, is generated dynamically?
I wanted to create 3 versions of the same file (one with 10 lines, one with 100 lines, one with all of the lines). Thus, I don't see any need for these to be static; rather, it would be best if they were proxies for a head/tail/cat command.
The purpose of this is for unit testing - I want a unit test to run on a small portion of the full input file used in production. However, since the code only runs on full files (it's actually a Hadoop map/reduce application), I want to provide a truncated version of the whole data set without duplicating information.
UPDATE: An Example
more myActualFile.txt
1
2
3
4
5
more myProxyFile2.txt
1
2
more myProxyFile4.txt
1
2
3
4
etc. So the proxy files are DIFFERENT named files whose content is provided dynamically by simply taking the first n lines of the main file.
This is hacky, but... One way is to use named pipes, and a looping shell script to generate the content (one per named pipe). This script would look like:
# thenamedpipe must have been created first with mkfifo
while true; do
    (
        for i in $(seq "$linenr"); do echo something; done
    ) > thenamedpipe
done
Your script would then read from that named pipe.
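For completeness, the same idea could be written in C with mkfifo(). This is only a rough sketch: the file names come from the question, n_lines is an assumption, and error handling is minimal. Each time a reader opens the FIFO, the open() call below unblocks, the first n_lines lines of the real file are written, and the pipe is closed so the reader sees end-of-file.

/* Sketch: serve the first n_lines lines of a real file through a named pipe. */
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    const char *real_file = "myActualFile.txt";   /* from the question          */
    const char *pipe_path = "myProxyFile2.txt";   /* a FIFO, not a regular file */
    const int   n_lines   = 2;                    /* how many lines to expose   */

    signal(SIGPIPE, SIG_IGN);          /* a reader such as head may close early */

    if (mkfifo(pipe_path, 0644) == -1 && errno != EEXIST) {
        perror("mkfifo");
        return 1;
    }

    for (;;) {
        /* Blocks here until some process opens the FIFO for reading. */
        int fd = open(pipe_path, O_WRONLY);
        if (fd == -1) {
            perror("open pipe");
            return 1;
        }

        FILE *in = fopen(real_file, "r");
        if (in != NULL) {
            char line[4096];
            int  count = 0;
            while (count < n_lines && fgets(line, sizeof line, in) != NULL) {
                if (write(fd, line, strlen(line)) == -1)
                    break;             /* reader went away */
                count++;
            }
            fclose(in);
        }
        close(fd);                     /* the reader now sees EOF */
    }
}

A reader then simply runs cat myProxyFile2.txt (or more myProxyFile2.txt, as in the question) and gets the first n_lines lines of myActualFile.txt.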
Another solution, if you are ready to dig into low level stuff, is FUSE.