RAM usage when using Memgraph - graph-databases

I’m trying to generate a graph schema in Memgraph Lab. The dataset is big (868k nodes and 1.6M edges), but 16 GB of RAM should be more than enough to handle it. How can I calculate the approximate RAM usage?

Here I found that you can calculate the approximate RAM usage with the following formula:
StorageRAMUsage = NumberOfNodes * 260B + NumberOfEdges * 180B
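As a rough sanity check, plugging my numbers into that formula (a quick sketch in Python; it only covers the storage layer described by the formula, not property values or indices) gives well under 1 GB:

# Per-object costs from the formula above; properties and indices are extra,
# so real usage will be higher than this.
nodes = 868_000
edges = 1_600_000

storage_bytes = nodes * 260 + edges * 180
print(f"~{storage_bytes / 1024**3:.2f} GiB")  # ~0.48 GiB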

Related

Redshift: many small nodes vs. fewer bigger nodes

Recently I have been facing cluster restarts (outside the maintenance window, at arbitrary times) in AWS Redshift, triggered from the AWS end. They have not been able to identify the exact root cause of the reboots. The error the AWS team captured is "out of object memory".
In the meantime, I am trying to scale up the cluster to avoid this out-of-object-memory error (as a blind try). Currently I am using the ds2.xlarge node type, but I am not sure which of the options below I should choose:
Many smaller nodes (increase the number of ds2.xlarge nodes)
Fewer larger nodes (change to ds2.8xlarge and have fewer nodes with increased capacity)
Has anyone faced a similar issue in Redshift? Any advice?
Given the configuration, for better performance in this case you should opt for the ds2.8xlarge node type.
One ds2.xlarge node has 13 GB of RAM and 2 slices to perform your workload, compared with a ds2.8xlarge, which has 244 GB of RAM and 16 slices.
Even if you choose 8 ds2.xlarge nodes, you get at most 104 GB of memory, against 244 GB in a single ds2.8xlarge node.
So you should go with the ds2.8xlarge node type to handle the memory issue, along with a larger amount of storage.
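Putting that comparison in numbers (a small sketch using the RAM figures quoted above):

# RAM figures quoted above: 13 GB per ds2.xlarge, 244 GB per ds2.8xlarge.
ds2_xlarge_gb = 13
ds2_8xlarge_gb = 244

print(8 * ds2_xlarge_gb)  # 104 GB total across eight ds2.xlarge nodes
print(ds2_8xlarge_gb)     # 244 GB in a single ds2.8xlarge node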

Cassandra insert performance on a single node with the C APIs

I am trying to profile Cassandra on a single-node cluster to see how many inserts one node can handle, and then add more nodes based on this result.
I have changed the following parameters in cassandra.yaml:
memtable_offheap_space_in_mb: 4096
memtable_allocation_type: offheap_buffers
concurrent_compactors: 8
compaction_throughput_mb_per_sec: 32768
concurrent_reads: 64
concurrent_writes: 128
concurrent_counter_writes: 128
write_request_timeout_in_ms: 200000
Cassandra node: JVM heap size 12GB
I have set these parameters through the Cassandra C++ driver API:
cass_cluster_set_num_threads_io(cluster, 8);                           /* driver I/O threads */
cass_cluster_set_core_connections_per_host(cluster, 8);                /* connections opened to each host per I/O thread */
cass_cluster_set_write_bytes_high_water_mark(cluster, 32768000);       /* bytes queued on a connection before writes are paused */
cass_cluster_set_pending_requests_high_water_mark(cluster, 16384000);  /* requests queued waiting for a connection before back-pressure */
With these parameters I get a write speed of 13k/sec with a data size of 1250 bytes.
I wanted to know whether I am missing anything in terms of parameter tuning that would achieve better performance.
Cassandra DB node details:
VM
CentOS 6
16 GB RAM
8 cores. It runs on a separate box from the machine I am pumping data from.
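As a quick back-of-the-envelope check, the figures above translate into roughly this much raw row data per second (a small sketch; it ignores protocol and replication overhead):

# Raw row throughput implied by 13k inserts/sec at 1250 bytes each.
writes_per_sec = 13_000
row_bytes = 1_250

print(f"~{writes_per_sec * row_bytes / 1e6:.1f} MB/s")  # ~16.2 MB/s of row data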
Any insight will be highly appreciated.

Data capacity for a DSE (Solr) node

We need to determine the data capacity per node based on the system and cluster information below.
Could you please tell us what the (rough) data capacity per (Solr) node is, without changing the system specs or node count?
System information per node (DSE)
CPU: 2 CPUs / 16 cores
Memory: 32 GB
HDD: 1 TB
(DSE) Solr heap size: 15 GB
DSE cluster information
Total node # (all Solr nodes): 4
Average data size per node: 24 GB
Average (Solr) index size per node: 11 GB
Replication factor: 2
DSE version: 4.8.3
Is the Solr heap size (15 GB) enough to support big data (e.g. 100 GB of Solr data) during general Solr operations (e.g. querying or indexing)?
P.S.: If you have any capacity formula or calculation tool, please let me know.
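For reference, here is what the cluster figures above work out to in raw volume (a rough sketch only; it does not answer the heap question, it just quantifies the current footprint):

# Current footprint from the cluster figures above.
nodes = 4
data_gb_per_node = 24
index_gb_per_node = 11
replication_factor = 2

print(nodes * (data_gb_per_node + index_gb_per_node))  # 140 GB on disk across the cluster
print(nodes * data_gb_per_node / replication_factor)   # 48.0 GB of unique data at RF=2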

How do I figure out provisioned throughput for an AWS DynamoDB table?

My system is supposed to write a large amount of data into a DynamoDB table every day. These writes come in bursts, i.e. at certain times each day several different processes have to dump their output data into the same table. The speed of writing is not critical as long as all the daily data gets written before the next dump occurs. I need to figure out the right way of calculating the provisioned capacity for my table.
So for simplicity, let's assume that I have only one process writing data once a day, and it has to write up to X items into the table (each item < 1 KB). Is the capacity I would have to specify essentially equal to X / 24 / 3600 writes/second?
Thx
The provisioned capacity is in terms of writes/second. You need to make sure that you can handle the PEAK number of writes/second that you expect, not the average over the day. So, if you have a single process that runs once a day and makes X writes, each of size Y (in KB, rounded up), over Z seconds, your formula would be
capacity = (X * Y) / Z
So, say you had 100K writes over 100 seconds and each write < 1 KB, you would need 1000 w/s of capacity.
Note that in order to minimize provisioned write capacity needs, it is best to add data into the system on a more continuous basis, so as to reduce peaks in necessary read/write capacity.
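Written out as a small worked example (a sketch of the formula above; the item size here is just a placeholder for "each write < 1 KB"):

import math

# capacity = (X * Y) / Z, using the 100K-writes-over-100-seconds example above.
writes = 100_000        # X: items written during the burst
item_size_kb = 0.8      # any value < 1 KB; DynamoDB rounds each write up to 1 KB
burst_seconds = 100     # Z: duration of the burst

size_units = max(1, math.ceil(item_size_kb))  # Y: size in KB, rounded up
print(writes * size_units / burst_seconds)    # 1000.0 write units/second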

Tokyo Cabinet - slower inserts after hitting 1 million

I am evaluating the Tokyo Cabinet table engine. The insert rate slows down considerably after hitting 1 million records. The batch size is 100,000 and each batch is done within a transaction. I tried setting xmsiz but it still did not help. Has anyone faced this problem with Tokyo Cabinet?
Details
Tokyo cabinet - 1.4.3
Perl bindings - 1.23
OS : Ubuntu 7.10 (VMWare Player on top of Windows XP)
I hit a brick wall around 1 million records per shard as well (sharding on the client side, nothing fancy). I tried various ttserver options and they seemed to make no difference, so I looked at the kernel side and found that
echo 80 > /proc/sys/vm/dirty_ratio
(previous value was 10) gave a big improvement - the following is the total size of the data (on 8 shards, each on its own node) printed every minute:
total: 14238792 records, 27.5881 GB size
total: 14263546 records, 27.6415 GB size
total: 14288997 records, 27.6824 GB size
total: 14309739 records, 27.7144 GB size
total: 14323563 records, 27.7438 GB size
(here I changed the dirty_ratio setting for all shards)
total: 14394007 records, 27.8996 GB size
total: 14486489 records, 28.0758 GB size
total: 14571409 records, 28.2898 GB size
total: 14663636 records, 28.4929 GB size
total: 14802109 records, 28.7366 GB size
So you can see that the improvement was on the order of 7-8 times. The database size was around 4.5 GB per node at that point (including indexes) and the nodes have 8 GB of RAM (so a dirty_ratio of 10 meant that the kernel tried to keep less than ca. 800 MB dirty).
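The ca. 800 MB figure falls straight out of how dirty_ratio works (a rough sketch; dirty_ratio is a percentage of RAM that may hold dirty pages before writers are forced to flush):

ram_mb = 8 * 1024  # 8 GB nodes, as above

for dirty_ratio in (10, 80):
    print(dirty_ratio, "->", ram_mb * dirty_ratio // 100, "MB of dirty pages allowed")
# 10 -> 819 MB (the ~800 MB ceiling that was throttling writes)
# 80 -> 6553 MB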
Next thing I'll try is ext2 (currently: ext3) and noatime and also keeping everything on a ramdisk (that would probably waste twice the amount of memory, but might be worth it).
I just set the cache option and it is now significantly faster.
I think modifying the bnum parameter in the dbtune function will also give a significant speed improvement.
