Giraph and Cassandra - giraph

Has anybody tried to use Giraph with DSE Cassandra?
I tried to run a job, but the process hangs:
14/10/21 16:38:24 INFO mapred.JobClient: Running job: job_201410211229_0028
14/10/21 16:38:25 INFO mapred.JobClient: map 80% reduce 0%
Command line is:
dse hadoop jar
/usr/local/giraph/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar
org.apache.giraph.GiraphRunner -D giraph.zkList=SRVITSD03:22181
org.apache.giraph.examples.SimpleShortestPathsComputation -vif
org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
-vip /user/hduser/input/tiny_graph.txt -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
/user/rav/giraph/output/shortestpaths -w 4

OK, solved.
I needed to modify the code and add a timeout.

Related

Rebalancing rate when new node is added

When a new node is added, we see that it is starting to receive new tablets (in the http://:7000/tablet-servers page) and the system is rebalancing. But the default rate seems low. Are there any knobs to determine this rate?
The rebalance in YugaByte DB is rate limited.
One of the parameters that governs this behavior is the yb-tserver gflag remote_bootstrap_rate_limit_bytes_per_sec which defaults to 256MB/sec and is the maximum transmission rate (inbound + outbound) related to rebalance that any one server (yb-tserver) may do.
To inspect the current setting on a yb-tserver you can try this:
$ curl -s 10.150.0.20:9000/varz | grep remote_bootstrap_rate
--remote_bootstrap_rate_limit_bytes_per_sec=268435456
This particular parameter can also be changed on the fly without a yb-tserver restart. For example, to set the rate to 512MB/sec:
bin/yb-ts-cli --server_address=$TSERVER_IP:9100 set_flag --force remote_bootstrap_rate_limit_bytes_per_sec 536870912
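The byte values above follow from 1 MB = 1024 × 1024 bytes. A minimal shell sketch for working out the number to pass to set_flag; the helper name mb_to_bytes is made up for illustration and is not part of YugaByte DB:

```shell
#!/bin/sh
# Hypothetical helper (not a YugaByte DB tool): convert MB/sec to bytes/sec.
mb_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

mb_to_bytes 256   # the default rate limit, in bytes/sec
mb_to_bytes 512   # the doubled rate limit, in bytes/sec
```

The first call reproduces the default reported by /varz, and the second the value used in the set_flag command.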
A second aspect of this is the cluster-wide global settings for how many tablet rebalances can happen simultaneously in the system. These are governed by a few yb-master gflags:
$ bin/yb-ts-cli --server_address=$MASTER_IP:7100 set_flag --force load_balancer_max_concurrent_adds 3
$ bin/yb-ts-cli --server_address=$MASTER_IP:7100 set_flag --force load_balancer_max_over_replicated_tablets 3
$ bin/yb-ts-cli --server_address=$MASTER_IP:7100 set_flag --force load_balancer_max_concurrent_tablet_remote_bootstraps 3

Is there a way to stop recording screen with Byzanz?

I use a tool called byzanz to record my screen and create GIF files.
This is the way I use it:
byzanz-record -d 55 --delay=2 -x 0 -y 0 -w 3940 -h 950 desktop-animation.gif
However, I often can't tell in advance how long the recording will be, so it either ends with awkward moments at the end or stops prematurely. Is there a way to tell byzanz to stop, perhaps by sending it a signal with kill or something similar?
There seems to be an option that could achieve that:
http://manpages.ubuntu.com/manpages/zesty/man1/byzanz-record.1.html
-e, --exec=COMMAND
Instead of specifying the duration of the animation, execute the
given COMMAND and record until the command exits. This is useful
both for benchmarking and to use more complex ways to stop the
recording, like writing scripts that listen on dbus.
However, the latest byzanz available from my package manager (Fedora) does not have --exec.
I think that with that option, you could do:
byzanz-record --exec 'sleep 1000000' --delay=2 -x 0 -y 0 -w 3940 -h 950 desktop-animation.gif
and when you want to stop the recording, run: killall sleep
Sidenote: I have opened an issue on the Red Hat Bugzilla tracker asking them to update their byzanz-record version: https://bugzilla.redhat.com/show_bug.cgi?id=1531055
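One drawback of killall sleep is that it also ends any unrelated sleep processes. A possible alternative, sketched here under the assumption that your byzanz-record build has --exec, is to have the executed command wait for a stop file. The function name wait_for_stop_file, the script name wait-for-stop.sh and the path /tmp/stop-recording are all made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: block until the given stop file appears, then remove it.
wait_for_stop_file() {
    stop_file=$1
    while [ ! -e "$stop_file" ]; do
        sleep 1
    done
    rm -f "$stop_file"
}

# Intended use (assumption: this function is saved in an executable script,
# here called wait-for-stop.sh, whose last line is
#   wait_for_stop_file /tmp/stop-recording
# and byzanz-record supports --exec):
#   byzanz-record --exec ./wait-for-stop.sh --delay=2 desktop-animation.gif
#   touch /tmp/stop-recording    # from another terminal, to end the recording
```

The recording would then end within about a second of the stop file being created, since the loop polls once per second.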

Greenplum: Purging database Logs

Is there a utility for purging older logs from a Greenplum database? Doing it manually takes a lot of time: there are 100+ segments, and I have to go to each server and delete the log files by hand.
Other details: GP version: 4.3.X.X (Software Only Solution)
Cluster config: 2+10
Thanks
I suggest you create a cron job and use gpssh to do this. For example:
gpssh -f ~/host_list -e 'for i in $(find /data/primary/gpseg*/pg_log/ -name "*.csv" -ctime +60); do rm $i; done'
This will remove the .csv files in pg_log on all segments that are more than 60 days (about 2 months) old. Of course, you should test this first and make sure the path to pg_log is correct.
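One way to test before deleting anything is a dry run that only lists the files that would match. A minimal sketch: the function name list_old_logs is made up for illustration, and it uses -mtime (last modification time) instead of the -ctime used above, which is a choice of this sketch rather than part of the original command:

```shell
#!/bin/sh
# Hypothetical helper: list .csv logs under a directory that are older than N days.
# Uses -mtime (modification time); the command above uses -ctime (status change time).
list_old_logs() {
    log_dir=$1
    days=$2
    find "$log_dir" -name "*.csv" -mtime +"$days" -print
}

# Example call on one segment host (the path is an assumption, adjust to your layout):
#   list_old_logs /data/primary/gpseg0/pg_log 60
```

Once the listing looks right, the same find expression can be run through gpssh with the rm loop shown above.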

How to set SplitMasterWorker value as false in giraph

I am trying to run custom Giraph code from the Eclipse IDE, and when I run it I get: Exception in thread “main” java.lang.IllegalArgumentException: checkLocalJobRunnerConfiguration: When using LocalJobRunner, must have only one worker since only 1 task at a time!
So I want to set giraph.SplitMasterWorker=false. How and where do I set it?
Pass -ca giraph.SplitMasterWorker=false to your application as an argument.
If you are running Giraph on a single-node cluster, passing "-ca giraph.SplitMasterWorker=false" should help. However, if you run Giraph on a multi-node cluster, such as AWS EC2 with Hadoop 2.x.x, then I recommend modifying the mapred-site.xml file and adding a parameter such as mapred.job.tracker to it.
giraph.SplitMasterWorker=false is the variable you have to set when calling the Giraph runner. It can be passed in as a custom argument under -ca. Also, I think you are using the -w parameter; if you are running on your local machine, it should not be more than 1, since there are no slave nodes to act as workers.
E.g. hadoop jar /usr/local/giraph1.0/giraph-examples/target/giraph-examples-1.1.0-for-hadoop-2.7.0-jar-with-dependencies.jar org.apache.giraph.GiraphRunner org.apache.giraph.examples.ConnectedComponentsComputation -vif org.apache.giraph.io.formats.IntIntNullTextInputFormat -vip -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op -w 5 -ca giraph.SplitMasterWorker=false

Is there a way to ping faster in Busy Box or Tiny Core Linux?

Solution at end of this post.
By default the interval between pings is one second, and the usual iputils version of ping has the -i switch to reduce it. I need to ping faster, as I have 120 pings in a certain test that needs to be run many times.
I tried modifying ping.c in the BusyBox source, but I don't know much about compiling. I get the error "could not be found libbb.h", and I couldn't find anyone else with a similar error on BusyBox.
Does anyone know of a way to ping faster than once per second? I am hoping to go down to 0.1 or 0.05 seconds, if at all possible.
Thanks in advance
Solution
In case anyone comes looking for an answer, the solution I came up with was much better. If you write a script to ping with the -c 1 flag, and count the failures yourself you can ping much faster.
Example:
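fails=0
for i in `seq 1 20`
do
    # Send a single ping and pull the "received" count out of the summary line
    x=`ping -c 1 192.168.1.1 | grep received | cut -d' ' -f4`
    # Treat a missing summary line as zero packets received
    if [ "${x:-0}" -eq 0 ]
    then
        fails=$(($fails+1))
    fi
done
echo $fails fails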
You are correct in that you have to modify the ping.c file. As you have determined, BusyBox ping does not support the -i switch.
What platform are you building this for? A PC, an embedded system?
Option 1:
Modify ping.c from BusyBox and recompile BusyBox. To do this, you would use 'make' in the root of the BusyBox project.
user@linux:~/busybox-1.19.2$ make
Option 2:
It might be simpler to leave BusyBox alone and get ping.c from another source such as iputils. That version supports the -i switch and goes as low as 0.2 seconds (shorter intervals require root). To compile ping.c:
user@linux:~/iputils-s20101006$ make ping
