"Vnodes on different data nodes can form a virtual node group to ensure the high availability of the system. The virtual node group is managed using RAFT protocol. "
So a vgroup is a group of vnodes with the same number,
like vgroup2: vnode2, vnode2, vnode2.
Am I right?
Yes, you are correct.
In TDengine, a vgroup is a group of vnodes with the same number.
With one replica: vnode2.
With three replicas: vnode2, vnode2, vnode2.
TDengine 2.0 supported two replicas, but TDengine 3.0 no longer does.
Instead, we use three replicas with Raft to ensure consistency.
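To make the replica counts concrete, here is a small Python sketch of the majority rule that Raft-style protocols rely on. It is general quorum arithmetic, not code from TDengine, and the function names are my own.

```python
def majority(replicas: int) -> int:
    """Smallest number of members that is more than half of the group."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many members can be down while a majority is still reachable."""
    return replicas - majority(replicas)


for n in (1, 2, 3):
    print(f"replicas={n} majority={majority(n)} tolerated_failures={tolerated_failures(n)}")
# replicas=1 majority=1 tolerated_failures=0
# replicas=2 majority=2 tolerated_failures=0
# replicas=3 majority=2 tolerated_failures=1
```

Under a majority rule, two replicas tolerate no more failures than one, while three replicas tolerate a single failure.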
Given that YugaByte offers client drivers for Redis and Postgres, I was wondering about performance differences between the two if used in similar fashion.
For example, assume a Postgres table with 2 columns: 1 VARCHAR as the primary key and 1 TEXT column for the data. The only queries executed against this table are INSERT INTO, UPDATE, SELECT val, SELECT EXISTS(...), and DELETE FROM, all with a WHERE primary_key_constraint = val condition.
Usage is analogous to the Redis operations SET, GET, EXISTS, DEL.
Does the Postgres driver add overhead to those operations compared to the Redis driver?
These should be in a similar ballpark. To be more precise, performance via the YCQL/YEDIS APIs is expected to be better than via the YSQL API, primarily because the client drivers for YCQL/YEDIS are cluster/partitioning aware and can route a query directly to the node in the cluster that owns the key. In contrast, the vanilla Postgres client drivers, which were historically designed to talk to a single-instance database, are not aware of how the tables are sharded across multiple nodes, and so potentially incur an extra node hop to process the request.
Also, for YSQL, YugaByte DB doesn't currently special-case operations that relate to only a single shard or a single row, but that's on the near-term roadmap. So this gap should be bridged pretty soon.
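The extra hop can be pictured with a toy Python model. This is only a sketch: the node names, the CRC32-based partitioning, and the hop counting are assumptions for illustration, not how YugaByte DB or its drivers actually work.

```python
import zlib

# Hypothetical three-node cluster.
NODES = ["node-a", "node-b", "node-c"]


def owner(key: str) -> str:
    """Assumed partitioning scheme: hash the key onto one of the nodes."""
    return NODES[zlib.crc32(key.encode()) % len(NODES)]


def hops_partition_aware(key: str) -> int:
    """The client computes the owner itself and connects to it directly."""
    return 1


def hops_unaware(key: str, contacted: str) -> int:
    """The client contacts an arbitrary node, which forwards if it is not the owner."""
    return 1 if contacted == owner(key) else 2


key = "user:42"
print("aware:", hops_partition_aware(key))
for node in NODES:
    print("unaware via", node, "->", hops_unaware(key, node), "hop(s)")
```

In this model, a client that picks a node without knowing the partitioning pays the second hop for two out of three nodes.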
I'm new to Cassandra and I'm trying to make a basic Cassandra server but I am having difficulties. Through some sheer miracle, I've managed to create a keyspace and some tables. However, whenever I try interacting with the tables, I get the following error:
"Unable to execute CQL script on 'Localhost': not enough replicas available for query at consistency ONE (1 required but only 0 alive)))"
The message led me to believe I have no active nodes, but I have cassandra.bat (I'm on Win10) running in the background, and that has allowed me to connect and create keyspaces and tables.
Moreover, when I try doing anything with nodetool, it processes indefinitely (or takes a very long time; I'm too impatient to find out, but I guessed the former due to my previous assumption).
My keyspace uses NetworkTopologyStrategy with 1 datacenter, a replication factor of 3, and durable writes enabled.
Does anybody have any idea what's wrong?
First, you've specified a replication factor of 3, although you have only one node. Second, you need to check that the datacenter name you specified for NetworkTopologyStrategy matches the actual one; you can find the actual name by running nodetool status. After that, change the existing keyspace with this command:
ALTER KEYSPACE keyspace_name
WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'datacenter_name' : 1};
P.S. I recommend watching the DS201/210/220 courses on DataStax Academy. They will give you a good overview of Cassandra, basic operations, and data modelling.
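Why "only 0 alive" can appear while a node is clearly running is easier to see with a simplified Python model of replica placement. This is a sketch under assumptions, not Cassandra's real placement logic: the function name is hypothetical, and "datacenter1" is only assumed to be the name the node reports.

```python
def alive_replicas(replication: dict, live_nodes_by_dc: dict) -> int:
    """Count replicas that can be placed on live nodes.

    replication:      datacenter name from the keyspace definition -> replication factor
    live_nodes_by_dc: datacenter name reported by the cluster -> number of live nodes
    """
    total = 0
    for dc, rf in replication.items():
        # A datacenter name the cluster does not know contributes no replicas.
        total += min(rf, live_nodes_by_dc.get(dc, 0))
    return total


cluster = {"datacenter1": 1}  # assumed: one live node in a datacenter called "datacenter1"

print(alive_replicas({"dc1": 3}, cluster))          # 0: name mismatch, nothing to read from
print(alive_replicas({"datacenter1": 3}, cluster))  # 1: enough for consistency ONE
print(alive_replicas({"datacenter1": 1}, cluster))  # 1: replication factor matches the node count
```

In this model, a wrong datacenter name is what produces zero replicas; a replication factor that is too high only limits which consistency levels can be satisfied.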
What is database clustering? If you allow the same database to be on 2 different servers, how do they keep the data synchronized between them? And how does this differ from load balancing, from a database server perspective?
Database clustering is a bit of an ambiguous term. Some vendors consider a cluster to be two or more servers sharing the same storage; others call a set of replicated servers a cluster.
Replication is the method by which a set of servers remain synchronized without having to share storage, which allows them to be geographically dispersed. There are two main ways of going about it:
master-master (or multi-master) replication: Any server can update the database. It is usually taken care of by a separate module within the database (or, in some cases, by entirely different software running on top of it).
Downside is that it is very hard to do well, and some systems lose ACID properties in this mode of replication.
Upside is that it is flexible, and you can tolerate the failure of any server while still being able to update the database.
master-slave replication: There is only a single authoritative copy of the data, which is then pushed to the slave servers.
Downside is that it is less fault tolerant: if the master dies, no further changes reach the slaves.
Upside is that it is easier to do than multi-master, and it usually preserves ACID properties.
Load balancing is a different concept. It consists of distributing the queries sent to those servers so that the load is spread as evenly as possible. It is usually done at the application layer (or with a connection pool). The only direct relation between replication and load balancing is that you need some replication to be able to load balance; otherwise you'd have a single server.
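The master-slave case can be sketched in a few lines of Python. This is a minimal illustration, assuming in-memory dictionaries and a synchronous push; the class names are hypothetical and no real database replicates exactly this way.

```python
class Slave:
    """Holds a read-only copy that only the master updates."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)


class Master:
    """Holds the single authoritative copy and pushes every change to the slaves."""

    def __init__(self, slaves):
        self.data = {}
        self.slaves = slaves

    def write(self, key, value):
        self.data[key] = value
        for slave in self.slaves:
            slave.apply(key, value)  # synchronous push, assumed for simplicity


slaves = [Slave(), Slave()]
master = Master(slaves)
master.write("user:1", "alice")
print([s.read("user:1") for s in slaves])  # ['alice', 'alice']
```

If the master object goes away, nothing calls apply() any more, which is the downside described above: the slaves stay readable but stop changing.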
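Application-layer load balancing over replicated servers can be as simple as round-robin. Here is a minimal Python sketch; the class and the server names are made up for illustration.

```python
import itertools


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation so each gets an equal share of queries."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def next_server(self):
        return next(self._cycle)


balancer = RoundRobinBalancer(["db1", "db2", "db3"])  # hypothetical replica names
picked = [balancer.next_server() for _ in range(6)]
print(picked)  # ['db1', 'db2', 'db3', 'db1', 'db2', 'db3']
```

This only spreads the queries; keeping db1, db2 and db3 in sync is still the job of replication.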
From a SQL Server point of view:
Clustering will give you an active-passive configuration. That means that in a 2-node cluster, one node is active (serving) and the other is passive (waiting to take over when the active node fails). It provides high availability from a hardware point of view.
You can have an active-active cluster, but it requires multiple instances of SQL Server running on each node (i.e. Instance 1 on Node A failing over to Instance 2 on Node B, and Instance 1 on Node B failing over to Instance 2 on Node A).
Load balancing, at least in the sense of web server load balancing, does not exist in SQL Server. You can't balance load that way. However, you can split your application so that it uses some databases on server 1 and other databases on server 2, and so on. This is the primary means of "load balancing" in the SQL world.
Clustering uses shared storage of some kind (a drive cage or a SAN, for example), and puts two database front-ends on it. The front end servers share an IP address and cluster network name that clients use to connect, and they decide between themselves who is currently in charge of serving client requests.
If you're asking about a particular database server, add that to your question and we can add details on their implementation, but at its core, that's what clustering is.
Database clustering is actually a mode of synchronous replication between two or more nodes that also adds fault tolerance to your system, and it does so in a shared-nothing architecture. Shared nothing means that the individual nodes don't share any physical resources such as disk or memory.
As far as keeping the data synchronized is concerned, there is a management server to which all the data nodes are connected, along with the SQL node, to achieve this (talking specifically about MySQL).
Now about the differences: load balancing is just one result that could be achieved through clustering, the others include high availability, scalability and fault tolerance.
I have a question for the DBAs out there: if I scale from a single web/DB server setup to a two web/two DB server setup, with a load balancer in front of the web servers to route incoming requests evenly, how do solutions like MySQL Cluster work so that a change made on one DB server is immediately known to the other? Otherwise, users routed to the other DB server won't see the data or will see outdated data. Or, at least, how is the other web server made aware that it's reading "dirty data" and should try again in X seconds to get up-to-date data?
Thank you.
There are two ways of doing this:
Active/Active or Active/Passive.
Active/Passive is the most prevalent.
The data is kept in sync on the passive node.
The cluster is a useful configuration inasmuch as, when the active node goes down, the passive node takes over immediately, so there is no downtime.
The clustering continuously synchronises the 2 nodes in the cluster.
I work with SQL Server, but I think the basic premise of clustering is the same for MySQL: no (or no noticeable) downtime on hardware failure.
EDIT: Additionally, the clustering software handles the synchronisation, so you don't need to worry about it. You view the cluster nodes as a virtual directory, which behaves like one server in Windows.
Here is a document explaining this:
http://www.sql-server-performance.com/articles/clustering/clustering_intro_p1.aspx
In Windows server clustering (to be distinguished from High Performance Clustering), there is a shared external storage array. The active node takes ownership/control of the storage, and when that node fails, the storage 'fails over' to the previously passive node (which is now the active node). There are also different schemes that allow for independent storage at each node, vs. shared storage. However, these require the application to have enough intelligence to know that it is clustered, and keep the two storage sets in sync.
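The shared-storage failover described above can be modelled with a short Python sketch. This is a toy model under assumptions, not how Windows clustering is implemented: the class and node names are hypothetical, and ownership of the storage is reduced to a single attribute.

```python
class FailoverCluster:
    """Active/passive cluster: exactly one node owns the shared storage at a time."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self.active = self.nodes[0]  # the active node owns the shared storage

    def fail(self, node):
        """Mark a node as failed; fail over if it was the active one."""
        self.healthy.discard(node)
        if node == self.active:
            self._fail_over()

    def _fail_over(self):
        # Storage ownership moves to the first healthy node, if there is one.
        self.active = next((n for n in self.nodes if n in self.healthy), None)


cluster = FailoverCluster(["node-a", "node-b"])
print(cluster.active)   # node-a
cluster.fail("node-a")
print(cluster.active)   # node-b
```

Clients keep using the same cluster name throughout; only the node behind it changes.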
Clustering can also mean that a number of nodes handle the workload together. This is sometimes called an active/active cluster, i.e. all the nodes are active and share the workload. It is normally handled by specialist software such as Oracle RAC for the Oracle RDBMS. RAC allows Oracle to scale to very large workloads.