Cannot benchmark DynamoDB using YCSB

I need to benchmark DynamoDB with YCSB, and this is my first time using YCSB.
The DynamoDB provisioned throughput is 100 RCUs and 50 WCUs. This is the command I am executing:
./bin/ycsb load dynamodb -P dynamodb-binding/conf/dynamodb.properties -P workloads/workloada -threads 1 -target 40
The properties file has the endpoint (us-east-1), AWS credentials, etc. defined. I can run the YCSB shell and perform inserts:
./bin/ycsb shell dynamo
The table schema has only one field, named partition_key. Since DynamoDB is schemaless, YCSB can add any attribute it likes, so that should not be a problem.
But when I try to perform a load I get the following errors:
./bin/ycsb load dynamodb -P dynamodb-binding/conf/dynamodb.properties -P workloads/workloada -threads 1 -target 40
java -cp /opt/ycsb-0.12.0/dynamodb-binding/conf:/opt/ycsb-0.12.0/conf:/opt/ycsb-0.12.0/lib/core-0.12.0.jar:[... long classpath of AWS SDK 1.10.48 and dependency jars trimmed for readability ...]:/opt/ycsb-0.12.0/dynamodb-binding/lib/dynamodb-binding-0.12.0.jar:/opt/ycsb-0.12.0/dynamodb-binding/lib/log4j-1.2.17.jar com.yahoo.ycsb.Client -db com.yahoo.ycsb.db.DynamoDBClient -P dynamodb-binding/conf/dynamodb.properties -P workloads/workloada -threads 1 -target 40 -load
YCSB Client 0.12.0
Command line: -db com.yahoo.ycsb.db.DynamoDBClient -P dynamodb-binding/conf/dynamodb.properties -P workloads/workloada -threads 1 -target 40 -load
Loading workload...
Starting test.
0 [Thread-1] INFO com.yahoo.ycsb.db.DynamoDBClient -dynamodb connection created with http://dynamodb.us-east-1.amazonaws.com
DBWrapper: report latency for each error is false and specific error codes to track for latency are: []
435 [Thread-1] ERROR com.yahoo.ycsb.db.DynamoDBClient -com.amazonaws.AmazonServiceException: One or more parameter values were invalid: Missing the key partition_key in the item (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ValidationException; Request ID: BOJ5PRDH5N2H40TDH04ERN47BBVV4KQNSO5AEMVJF66Q9ASUAAJG)
Error inserting, not retrying any more. number of attempts: 1Insertion Retry Limit: 0
[OVERALL], RunTime(ms), 934.0
[OVERALL], Throughput(ops/sec), 0.0
[TOTAL_GCS_PS_Scavenge], Count, 1.0
[TOTAL_GC_TIME_PS_Scavenge], Time(ms), 8.0
[TOTAL_GC_TIME_%PS_Scavenge], Time(%), 0.8565310492505354
[TOTAL_GCS_PS_MarkSweep], Count, 0.0
[TOTAL_GC_TIME_PS_MarkSweep], Time(ms), 0.0
[TOTAL_GC_TIME%PS_MarkSweep], Time(%), 0.0
[TOTAL_GCs], Count, 1.0
[TOTAL_GC_TIME], Time(ms), 8.0
[TOTAL_GC_TIME%], Time(%), 0.8565310492505354
[CLEANUP], Operations, 1.0
[CLEANUP], AverageLatency(us), 1.0
[CLEANUP], MinLatency(us), 1.0
[CLEANUP], MaxLatency(us), 1.0
[CLEANUP], 95thPercentileLatency(us), 1.0
[CLEANUP], 99thPercentileLatency(us), 1.0
[INSERT], Operations, 0.0
[INSERT], AverageLatency(us), NaN
[INSERT], MinLatency(us), 9.223372036854776E18
[INSERT], MaxLatency(us), 0.0
[INSERT], 95thPercentileLatency(us), 0.0
[INSERT], 99thPercentileLatency(us), 0.0
[INSERT], Return=ERROR, 1
[INSERT-FAILED], Operations, 1.0
[INSERT-FAILED], AverageLatency(us), 428928.0
[INSERT-FAILED], MinLatency(us), 428800.0
[INSERT-FAILED], MaxLatency(us), 429055.0
[INSERT-FAILED], 95thPercentileLatency(us), 429055.0
[INSERT-FAILED], 99thPercentileLatency(us), 429055.0
When YCSB loads workloada, what kind of data is inserted into the database (basically, what is the source of that data)? Could anyone please guide me as to what I am missing?
Thanks

I solved this. It was due to the fact that I had named the table's partition key "partition_key". I changed it to something else and it worked fine.
Thanks.
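For what it's worth, the YCSB DynamoDB binding reads the key name from the dynamodb.primaryKey property, and this must match the table's hash (partition) key attribute name exactly; a mismatch produces exactly this ValidationException. A minimal sketch of the relevant properties (the endpoint and key name below are illustrative examples, not taken from the question):

```properties
# dynamodb-binding/conf/dynamodb.properties (sketch; values are examples)
dynamodb.awsCredentialsFile = conf/AWSCredentials.properties
dynamodb.endpoint = http://dynamodb.us-east-1.amazonaws.com
# Must match the table's hash key attribute name exactly
dynamodb.primaryKey = pkey
```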

Related

pyflink.fn_execution.beam.beam_boot doesn't exit after its job is cancelled

root 64994 14.0 0.0 8099944 85472 ? Sl 16:30 0:03 /bin/python3 -m pyflink.fn_execution.beam.beam_boot --id=5-1 --provision_endpoint=localhost:10514
root 64998 0.0 0.0 108060 684 ? S 16:30 0:00 tee /tmp/python-dist-6b89369d-ba23-4e5d-83d8-7a54dd9a3497/flink-python-udf-boot.log
This Python process keeps running after its Flink job is cancelled. What can I do other than killing it manually?

I want to get the server name from the process list at OS level, without connecting to the DB

sybase 1215 30224 0 20:44 pts/3 00:00:00 grep dataserver
sybase 6138 6137 0 Feb04 ? 00:28:10 /u01/sybase/ASE15_0/ASE-15_0/bin/dataserver -d/u01/sybase/ASE15_0/data/aashish1_master.dat -e/u01/sybase/ASE15_0/ASE-15_0/install/aashish1.log -c/u01/sybase/ASE15_0/ASE-15_0/aashish1.cfg -M/u01/sybase/ASE15_0/ASE-15_0 -s**aashish1**
sybase 7671 1 0 Jan27 ? 00:55:50 /u01/sybase/ASE15_0/ASE-15_0/bin/dataserver -s**chaitu** -d/u01/sybase/ASE15_0/data/chaitu_master.dat -e/u01/sybase/ASE15_0/ASE-15_0/install/chaitu.log -c/u01/sybase/ASE15_0/ASE-15_0/chaitu.cfg -M/u01/sybase/ASE15_0/ASE-15_0
sybase 29479 29478 0 17:28 ? 00:00:33 /u01/sybase/ASE15_0/ASE-15_0/bin/dataserver -d/u01/sybase/ASE15_0/data/asdfg_master.dat -e/u01/sybase/ASE15_0/ASE-15_0/install/asdfg.log -c/u01/sybase/ASE15_0/ASE-15_0/asdfg.cfg -M/u01/sybase/ASE15_0/ASE-15_0 -s**asdfg** -psa
sybase 29617 29616 0 17:48 ? 00:00:33 /u01/sybase/ASE15_0/ASE-15_0/bin/dataserver -d/u01/sybase/ASE15_0/data/parbat.dat -e/u01/sybase/ASE15_0/ASE-15_0/install/parbat.log -c/u01/sybase/ASE15_0/ASE-15_0/parbat.cfg -M/u01/sybase/ASE15_0/ASE-15_0 -s**parbat**
sybase 29789 29788 0 17:57 ? 00:00:28 /u01/sybase/ASE15_0/ASE-15_0/bin/dataserver -d/u01/sybase/ASE15_0/data/ab123_master.dat -e/u01/sybase/ASE15_0/ASE-15_0/install/ab123.log -c/u01/sybase/ASE15_0/ASE-15_0/ab123.cfg -M/u01/sybase/ASE15_0/ASE-15_0 -s**ab123** -psa
[sybase#linuxerp scripts]$
I want to get the dataserver name at OS level without connecting to the database.
ps -ef | grep dataserver
shows whether the server is running.
I tried writing the output to a file and using grep -v on it, but since the -s option is not always in the same position, it is difficult to extract the server name.
There are a couple of ways you can grab that information. One is to pipe the ps output through grep with a regular expression:
ps -ef | grep dataserver | grep -oh '\-s[[:alnum:]]*'
which should output something like this:
-saashish1
-schaitu
-sasdfg
-sparbat
-sab123
Another would be to use the showserver utility that comes installed with ASE; its output is very similar to ps -ef, but includes CPU and memory information, as well as other servers such as the backup server, XP server, etc.
%> showserver
USER PID %CPU %MEM SZ RSS TT STAT START TIME COMMAND
user1 14276 0.0 1.7 712 1000 ? S Apr 5 514:05 dataserver -d greensrv.dat -sgreensrv -einstall/greensrv_errorlog
sybase 1071 0.0 1.4 408 820 ? S Mar 28 895:38 /usr/local/sybase/bin/dataserver -d/dev/rsd1f -e/install/errorlog
user1 28493 0.0 0.0 3692 0 ? IW Apr 1 0:10 backupserver -SSYB_BACKUP -e/install/backup.log -Iinterfaces -Mbin/sybmultbuf -Lus_english -Jiso_1
And then pipe that into the same grep to get the information you are trying to find.
If you want to strip the -s off the front, to get just the server name itself, you can pipe that into sed or cut.
(Note: tr -d '-s' would not work here. tr deletes individual characters, not strings, so it would remove every - and s anywhere in the names, mangling output like aashish1.)
Using sed you can delete a leading -s from each line:
| sed 's/^-s//'
Using cut you can tell it to print everything from the 3rd character to the end of the word:
| cut -c3-
Both of these will output your server names like this:
aashish1
chaitu
asdfg
parbat
ab123
Check this Question for information on using grep to grab single words.
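Putting the pieces together, here is a sketch of the full pipeline. The ps -ef lines are simulated with printf so the example is self-contained; the [d]ataserver bracket trick keeps grep from matching its own process when you run it for real:

```shell
# Simulated `ps -ef` output; in practice replace the printf with:
#   ps -ef | grep '[d]ataserver'
printf '%s\n' \
  'sybase 6138 dataserver -d/u01/data/a.dat -saashish1' \
  'sybase 7671 dataserver -schaitu -d/u01/data/c.dat' |
grep -oh '\-s[[:alnum:]]*' |
sed 's/^-s//'
# prints:
# aashish1
# chaitu
```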

How to ignore timeouts in ab (apache bench)?

I run benchmarks with Apache Bench (ab) against a web service. I know that one or two requests in the test will time out during measurement (it's a web-framework issue). When a timeout occurs, ab quits with the message apr_pollset_poll: The timeout specified has expired (70007) and does not show results. I want measurement results that ignore these timed-out requests (or count them too, using the timeout value as the response time). Is this possible with ab?
EDIT: The command I use is
ab -n 1000 -c 10 http://localhost:80
I looked into the ab source and, from what I saw, it's impossible to ignore these errors. Maybe there is a fork which implements such a feature?
The default timeout is 30 seconds. You can change this with -s:
ab -s 9999 -n 1000 -c 10 http://localhost:80

WAL Archive hangs in postgres when gzip is used

I have enabled WAL archiving with the following settings:
wal_keep_segments = 32
archive_mode = on
archive_command = 'gzip < %p > /mnt/nfs/archive/%f'
and on the slave I have the restore command:
restore_command = 'gunzip < /mnt/nfs/archive/%f > %p'
archive_cleanup_command = '/opt/PostgreSQL/9.4/bin/pg_archivecleanup -d /mnt/nfs/archive %r'
On the master I can see that many files are stuck: around 327 files are yet to be archived, when ideally it should be only 32.
ps shows:
-bash-4.1$ ps x
PID TTY STAT TIME COMMAND
3302 ? S 0:00 /opt/PostgreSQL/9.4/bin/postgres -D /opt/PostgreSQL/9.4/data
3304 ? Ss 0:00 postgres: logger process
3306 ? Ss 0:09 postgres: checkpointer process
3307 ? Ss 0:00 postgres: writer process
3308 ? Ss 0:06 postgres: wal writer process
3309 ? Ss 0:00 postgres: autovacuum launcher process
3311 ? Ss 0:00 postgres: stats collector process
3582 ? S 0:00 sshd: postgres#pts/1
3583 pts/1 Ss 0:00 -bash
3628 ? Ss 0:00 postgres: archiver process archiving 000000010000002D000000CB
3673 ? S 0:00 sh -c gzip < pg_xlog/000000010000002D000000CB > /mnt/nfs/archive/000000010000002D000000CB
3674 ? D 0:00 gzip
3682 ? S 0:00 sshd: postgres#pts/0
3683 pts/0 Ss 0:00 -bash
4070 ? Ss 0:00 postgres: postgres postgres ::1(34561) idle
4074 ? Ss 0:00 postgres: postgres sorriso ::1(34562) idle
4172 pts/0 S+ 0:00 vi postgresql.conf
4192 pts/1 R+ 0:00 ps x
-bash-4.1$ ls | wc -l
327
-bash-4.1$
Note that the gzip in your ps output is in state D (uninterruptible sleep), i.e. blocked in I/O, most likely on the NFS mount; that is why the archiver is stuck on that one segment and the backlog keeps growing.
(gzip and gunzip do read stdin when given no file arguments, so the streaming itself is fine, but gzip -c and gunzip -c, or zcat, make the stdin/stdout intent explicit.)
Additionally, though, you should probably use a simple script as the archive_command that:
Writes with gzip -c to a temp file
Moves the temp file to the final location with mv
This ensures that the file is not read by the replica until it's fully written by the master.
Also, unless the master and replica are sharing the same network file system (or are both on the same host), you might actually need to use scp or similar to transfer the archive files. The restore_command uses paths on the replica, not on the master, so unless the replica server can access the WAL archive via NFS/CIFS/etc, you're going to need to copy the files.

Issue While Loop Apache Status

Do you know why this loop returns directory-listing results?
#!/bin/bash
/usr/sbin/httpd fullstatus | while read line
do
echo $line
done
71-0 - 0/0/410 . 7.74 47987 0 0.0 0.00 0.76 127.0.0.1
OPTIONS = bin boot dev error_log etc home lib lib64 lost+found media mnt nohup.out opt proc root sbin selinux srv sys test tmp usr var HTTP/1.0
72-0 - 0/0/103 . 0.14 48912 0 0.0 0.00 0.13 127.0.0.1
OPTIONS = bin boot dev error_log etc home lib lib64 lost+found media mnt nohup.out opt proc root sbin selinux srv sys test tmp usr var HTTP/1.0
It should return only the Apache status:
71-0 - 0/0/410 . 7.74 48231 0 0.0 0.00 0.76 127.0.0.1
OPTIONS * HTTP/1.0
72-0 - 0/0/103 . 0.14 49157 0 0.0 0.00 0.13 127.0.0.1
OPTIONS * HTTP/1.0
Thanks
Because of this line in the output:
OPTIONS * HTTP/1.0
the shell expands the "*" in the unquoted "echo" argument into the list of files in the current directory. The script's output will therefore differ depending on the working directory of the calling shell.
To see this, go to any directory and run "echo *".
Because * gets interpreted.
Remember:
ALWAYS QUOTE YOUR VARIABLES
In this case:
echo "$line"
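A quick way to see the difference (what the unquoted form prints depends on whatever files happen to be in the current directory, so no fixed output is shown for it):

```shell
line='OPTIONS * HTTP/1.0'
echo $line     # unquoted: the shell glob-expands * into the files in $PWD
echo "$line"   # quoted: prints the line verbatim, asterisk and all
```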
Piotr already answered the question; just an addition. I would suggest not piping anything into while! The pipe creates another shell process, which wastes resources, and you will run into problems if you set a variable inside the while loop and want to use it outside the loop. Consider this alternative instead:
#!/bin/bash
while read line; do
echo "$line"
done < <(/usr/sbin/httpd fullstatus)
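To see why the pipe version loses variables, compare the two forms (bash-specific, since <(...) is process substitution):

```shell
#!/bin/bash
count=0
printf 'a\nb\n' | while read -r line; do count=$((count+1)); done
echo "$count"    # prints 0: the loop ran in a subshell, its count is lost

count=0
while read -r line; do count=$((count+1)); done < <(printf 'a\nb\n')
echo "$count"    # prints 2: the loop ran in the current shell
```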
