How to configure ScyllaDB for millions of rows: a simple COUNT(*) query returns the error: timeout during read query at consistency

I have a simple table:
create table if not exists keyspace_test.table_test
(
id int,
date text,
val float,
primary key (id, date)
)
with caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
and compaction = {'class': 'SizeTieredCompactionStrategy'}
and compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
and dclocal_read_repair_chance = 0
and speculative_retry = '99.0PERCENTILE'
and read_repair_chance = 1;
After that I imported 12 million rows. Then I wanted to run a simple calculation, counting the rows and summing the val column, with this query:
SELECT COUNT(*), SUM(val)
FROM keyspace_test.table_test
but it shows this error:
Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded)
I already added USING TIMEOUT 180s; but it shows the error:
Timed out waiting for server response
The servers I use are spread across 2 datacenters. Each datacenter has 4 servers.
# docker exec -it scylla-120 nodetool status
Datacenter: dc2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 10.3.192.25 79.04 GB 256 ? 5975a143fec6 Rack1
UN 10.3.192.24 74.2 GB 256 ? 61dc1cfd3e92 Rack1
UN 10.3.192.22 88.21 GB 256 ? 0d24d52d6b0a Rack1
UN 10.3.192.23 63.41 GB 256 ? 962s266518ee Rack1
Datacenter: dc3
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 34.77.78.21 83.5 GB 256 ? 5112f248dd38 Rack1
UN 34.77.78.20 59.87 GB 256 ? e8db897ca33b Rack1
UN 34.77.78.48 81.32 GB 256 ? cb88bd9326db Rack1
UN 34.77.78.47 79.8 GB 256 ? 562a721d4b77 Rack1
Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
And I created the keyspace with:
CREATE KEYSPACE keyspace_test WITH replication = { 'class' : 'NetworkTopologyStrategy', 'dc2' : 3, 'dc3' : 3};
How should I really configure Scylla for millions of rows of data?

Not sure about SUM, but you could use DSBulk to count the rows in a table.
dsbulk count \
-k keyspace_test \
-t table_test \
-u username \
-p password \
-h 10.3.192.25
DSBulk takes token range ownership into account, so it's not as stressful on the cluster.

As explained in the Scylla documentation (https://docs.scylladb.com/stable/kb/count-all-rows.html), a COUNT requires scanning the entire table, which can take a long time, so using USING TIMEOUT like you did is indeed the right thing.
I don't know whether 180 seconds is a long enough timeout for scanning 12 million rows in your table. To be sure, you can try increasing it to 3600 seconds and see if the query ever finishes, or run a full-table scan (not just a count) to see how fast it progresses and estimate how long a COUNT might take. A COUNT should take less time than an actual scan returning data, but not much less - it still does all the same I/O.
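For example, just bumping the timeout on the original query (3600s here is only the illustrative value suggested above):
SELECT COUNT(*), SUM(val)
FROM keyspace_test.table_test
USING TIMEOUT 3600s;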
Also, it is important to note that until very recently COUNT was implemented inefficiently: it proceeded sequentially instead of utilizing all the shards in the system. This was fixed in https://github.com/scylladb/scylladb/commit/fe65122ccd40a2a3577121aebdb9a5b50deb4a90, but the fix only reached Scylla 5.1 (or the master branch) - are you using an older version of Scylla? The example in that commit suggests that the new implementation may be as much as 30 times faster than the old one!
So hopefully, on Scylla 5.1 a much lower timeout will be enough for your COUNT operation to finish. On older versions, you can emulate what Scylla 5.1 does manually: divide the token range into parts, invoke a partial COUNT on each of these token ranges in parallel, and then sum up the results from all the different ranges, as sketched below.
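As a rough sketch of that manual approach (the four-way split and the 600s timeout are only illustrative; in practice you would use many more sub-ranges, sized for your cluster, and issue them concurrently from the client):
-- each statement covers one quarter of the full Murmur3 token range
SELECT COUNT(*), SUM(val) FROM keyspace_test.table_test
WHERE token(id) >= -9223372036854775808 AND token(id) < -4611686018427387904
USING TIMEOUT 600s;
SELECT COUNT(*), SUM(val) FROM keyspace_test.table_test
WHERE token(id) >= -4611686018427387904 AND token(id) < 0
USING TIMEOUT 600s;
SELECT COUNT(*), SUM(val) FROM keyspace_test.table_test
WHERE token(id) >= 0 AND token(id) < 4611686018427387904
USING TIMEOUT 600s;
SELECT COUNT(*), SUM(val) FROM keyspace_test.table_test
WHERE token(id) >= 4611686018427387904
USING TIMEOUT 600s;
The client then adds up the per-range counts (and sums) to get the totals.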

Related

Can snowflake work as an operational data store against which I can write rest APIs

I am researching the Snowflake database and have a data aggregation use case where I need to expose the aggregated data via a REST API. While the data ingestion and aggregation seem well defined, is Snowflake a system that can be used as an operational data store for servicing high-throughput APIs?
Or is this an anti-pattern for this system?
Updating based on your recent comment.
Here are some quick test results I ran on large tables we have in production. (Table names changed for display.)
vLookupView records = 175,760,316
vMainView records = 179,035,026
SELECT
LP.REGIONCODE
, SUM(L.VALUE)
FROM DBO.vLookupView AS LP
INNER JOIN DBO.vMainView AS L
ON LP.PK = L.PK
GROUP BY LP.REGIONCODE;
Results:
SQL SERVER
Production box - 2:04 minutes
Snowflake:
By Warehouse (compute) size
XS - 17.1 seconds
Small - 9.9 seconds
Medium - 7.1 seconds
Large - 5.4 seconds
Extra Large - 5.4 seconds
When I added a WHERE condition
WHERE L.ENTEREDDATE BETWEEN '1/1/2018' AND '6/1/2018'
the results were:
SQL SERVER
Production box - 5 seconds
Snowflake:
By Warehouse (compute) size
XS - 12.1 seconds
Small - 3.9 seconds
Medium - 3.1 seconds
Large - 3.1 seconds
Extra Large - 3.1 seconds

How to distribute data across different nodes in cassandra cluster

I have set up a multi-node Cassandra cluster with two different nodes and all the required configuration, i.e. cluster_name, endpoint_snitch, seeds, auto_bootstrap, etc.
I am using dc1 as the datacenter for both nodes. I created the keyspace using:
CREATE KEYSPACE dcTest WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'dc1' : 2 };
Now, when I start both nodes and try entering data into the database, it creates a replica on both nodes, i.e. if I create 4 rows in a table, it copies all 4 rows to the other node as well. I want this data to get distributed across the nodes, i.e. two rows on one node and two on the other.
Is it achieved by configuring keyspace? Am I missing anything?
Nodetool status:
nodetool -p 7199 status cassandrareplication1
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.45.123.123 35.01 MB 256 50.3% 8c529955-c42a-4629-dfgh-0666a444acbb rack1
UN 10.45.123.124 225.4 KB 256 49.7% eddf1039-d803-4d61-dfse-1ce0ec3782a9 rack1
You should use replication factor 1, not 2. This means all your data in this keyspace is stored once within this datacenter. With a replication factor of 2 it is stored twice, with 3 thrice, and so on.
Having a replication factor of 2 means you want 2 copies of your data in the datacenter, so with only 2 nodes Cassandra will put one full copy of the data on each node to satisfy RF 2.
To achieve your goal you want RF 1 with 2 nodes, so Cassandra can distribute the data among the nodes.
You can alter the keyspace using:
ALTER KEYSPACE dcTest WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'dc1' : 1 };
Don't forget to run nodetool cleanup on each node afterwards, so the replicas the nodes no longer own are removed (a full repair is what you need when you increase the replication factor, not when you decrease it).

Cannot repair specific tables on specific nodes in Cassandra

I'm running 5 nodes in one DC of Cassandra 3.10.
To maintain those nodes, I'm running on a daily basis on every node:
nodetool repair -pr
and weekly
nodetool repair -full
This is the only table I'm having difficulties with:
Table: user_tmp
SSTable count: 4
Space used (live): 366.71 MiB
Space used (total): 366.71 MiB
Space used by snapshots (total): 216.87 MiB
Off heap memory used (total): 5.28 MiB
SSTable Compression Ratio: 0.4690289976332873
Number of keys (estimate): 1968368
Memtable cell count: 2353
Memtable data size: 84.98 KiB
Memtable off heap memory used: 0 bytes
Memtable switch count: 1108
Local read count: 62938927
Local read latency: 0.324 ms
Local write count: 62938945
Local write latency: 0.018 ms
Pending flushes: 0
Percent repaired: 76.94
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used: 4.51 MiB
Bloom filter off heap memory used: 4.51 MiB
Index summary off heap memory used: 717.62 KiB
Compression metadata off heap memory used: 76.96 KiB
Compacted partition minimum bytes: 51
Compacted partition maximum bytes: 654949
Compacted partition mean bytes: 194
Average live cells per slice (last five minutes): 2.503074492537404
Maximum live cells per slice (last five minutes): 179
Average tombstones per slice (last five minutes): 1.0
Maximum tombstones per slice (last five minutes): 1
Dropped Mutations: 19 bytes
Percent repaired never goes above 80% for this table on this node and one other, but on the remaining nodes it is above 85%. RF is 3, and the compaction strategy is SizeTieredCompactionStrategy.
gc_grace_seconds is set to 10 days, and somewhere in that period I get a write timeout on exactly this table, but the consumer that got the timeout is immediately replaced with another one and everything keeps going as if nothing happened. It's like a one-time write timeout.
My question is: do you maybe have a suggestion for a better repair strategy? I'm kind of a noob and every suggestion is a big win for me, plus anything else specific to this table.
Maybe repair -inc instead of repair -pr?
The nodetool repair command in Cassandra 3.10 defaults to running incremental repair. There have been some major issues with incremental repair, and the community currently recommends against running it. Please see this article for some great insight into repair and the issues with incremental repair: http://thelastpickle.com/blog/2017/12/14/should-you-use-incremental-repair.html
I would recommend, as do many others, running:
nodetool repair -full -pr
Please be aware that you need to run repair on every node in your cluster. This means that if you run repair on one node per day you can have at most 7 nodes (since with the default gc_grace you should aim to finish repair within 7 days). You also have to rely on nothing going wrong during a repair, since you would have to restart any failing jobs yourself.
This is why tools like Reaper exist. Reaper solves these issues with ease: it automates repair, runs scheduled repairs, and provides a web interface to make administration easier. I would highly recommend using Reaper for routine maintenance and nodetool repair for unplanned activities.
Edit: Link http://cassandra-reaper.io/

Learning big data for a real case

I made a database (150GB) to index the Bitcoin blockchain.
table 1: id, block_height, block_hash - 500,000 rows
table 2: id, block_height, transaction_hash - 780 million rows
table 3: id, transaction_hash, address - 480 million rows
On an i7 @ 3 GHz with 16 GB RAM, Windows 10, and a SATA3 SSD, I tried adding an index on table3.address. RAM went to 100% and after 30 hours there was an I/O error and the index was not created. I tried a SELECT DISTINCT on table3.address; after 86 hours of my SSD and RAM being at 100%, I decided to kill the SQLite process.
What can I do? I'm leaning toward my own custom solution: text files, one text file per address, per transaction, per block. Want to know all the unique addresses? List the files in the address folder. Want to know what happened in a transaction? Open the file named after its hash (hash.txt).

Cassandra row cache not reducing query latency

I have a table with a partition key + clustering column, and I enabled row caching = 2000. I gave the row cache 20 GB by changing the yaml file.
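Roughly, that setup looks like this (the table name below is just a placeholder, and I'm assuming the 2000 refers to the rows_per_partition caching option and the 20 GB was set via row_cache_size_in_mb in cassandra.yaml):
-- per-table setting: keep up to 2000 rows per partition in the row cache
ALTER TABLE my_keyspace.my_table
WITH caching = {'keys': 'ALL', 'rows_per_partition': '2000'};
-- cassandra.yaml: row_cache_size_in_mb: 20480 (20 GB for the row cache)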
I can see that my hit rate is ~90% (which is good enough for me), but monitoring shows no latency reduction at all (it's even a bit higher after enabling caching). Note that the query I'm running fetches all rows for a given partition key; the number of rows varies from 10 to 10000.
Any idea why Cassandra row caching is not effective in my case?
Host: EC2 r4.16x, which has 64 cores and 488GB memory.
C* version: 3.11.0
=========nodetool info=========:
ID : 1f05c846-ddd6-4409-8473-10f3c0490279
Gossip active : true
Thrift active : false
Native Transport active: true
Load : 145.95 GiB
Generation No : 1506984130
Uptime (seconds) : 69561
Heap Memory (MB) : 4707.58 / 7987.25
Off Heap Memory (MB) : 680.54
Data Center : us-east
Rack : 1e
Exceptions : 0
Key Cache : entries 671231, size 100 MiB, capacity 100 MiB,
558160978 hits, 566196579 requests, 0.986 recent hit rate, 14400 save
period in seconds
Row Cache : entries 1225796, size 17.8 GiB, capacity 19.53 GiB, 80015143 hits, 86502918 requests, 0.925 recent hit rate, 0 save period in seconds
Counter Cache : entries 0, size 0 bytes, capacity 50 MiB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
Chunk Cache : entries 122880, size 480 MiB, capacity 480 MiB, 334907479 misses, 6940124103 requests, 0.952 recent hit rate, 13.407 microseconds miss latency
Percent Repaired : 85.28384276963543%
Token : (invoke with -T/--tokens to see all 256 tokens)
