Regarding the burden on Snowflake's database storage layer

Snowflake's architecture consists of the following three layers:
・Database storage
・Query processing
・Cloud services
I understand that in the query processing layer it is possible to create a warehouse for each process and to scale up and scale out on a per-process basis.
However, when the created warehouses (processes) run in parallel, I am worried about the load on the database storage.
Even though the query processing layer can be load-balanced, there is only one database storage layer, so wouldn't a large amount of parallel processing end up running against the database storage and cause errors in the database storage layer?
Apologies if I am misunderstanding the architecture.

The storage is immutable, so the query read load is just I/O against the cloud provider's storage layer, which is for all practical purposes infinitely scalable.
When any node updates a table, the new set of file partitions is known, and any warehouse that does not yet have the new partition parts does remote I/O to read them.
The only downside to this pattern is that it does not scale well for transactional write patterns, which is why Snowflake is not targeted at those markets.
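To make the "scale the query layer, not the storage" point concrete, here is a minimal sketch using the Snowflake Python connector; the account, credentials, and warehouse names are placeholders, not anything from the question.

    # Minimal sketch: one warehouse per workload, each sized independently.
    # Account, credentials, and warehouse names are placeholders.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account", user="my_user", password="my_password"
    )
    cur = conn.cursor()

    # Separate warehouses isolate workloads; each can be scaled up (resized)
    # without touching the shared storage layer.
    cur.execute("CREATE WAREHOUSE IF NOT EXISTS etl_wh WITH WAREHOUSE_SIZE = 'LARGE' AUTO_SUSPEND = 60")
    cur.execute("CREATE WAREHOUSE IF NOT EXISTS bi_wh WITH WAREHOUSE_SIZE = 'MEDIUM' AUTO_SUSPEND = 60")

    # Scaling a warehouse is a metadata operation; the storage layer stays
    # the same immutable files in cloud object storage.
    cur.execute("ALTER WAREHOUSE bi_wh SET WAREHOUSE_SIZE = 'XLARGE'")

    cur.close()
    conn.close()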

Related

Are CoW snapshots the solution to safely pull data from critical OLTP databases for reporting?

Our IT team copies data from mission-critical SQL Server OLTP databases in what seems to be a naive way - basically just INSERT INTO ... SELECT * every night. We use this copied database for reporting. This is unsatisfactory for various reasons, but we're told it is the only way because uncontrolled user query execution could compromise OLTP performance & data integrity. I want an improvement that still addresses their concerns.
Copy-on-write snapshots are the best solution I've read about (we don't need up-to-the-minute data for reporting), but please comment on the following:
The snapshot's sparse files should be placed on a separate physical drive (so that snapshot reads/writes can occur without limiting disk throughput for OLTP tasks).
There should be a single NTFS filesystem spanning all physical disks (on a hunch that this would work better than putting the online database and its snapshots on logically separated volumes).
Create the filesystem with the /L:enable flag (so it works better with large sparse files).
Avoid multiple snapshots (since original data would have to be copied to each one).
We could use a single snapshot MyDB_LatestSnapshot that could be deleted and very quickly re-created every day, or even throughout the day (so long as kicking users running reports off it is acceptable).
Since the database snapshots will always be recent, most data will not have changed, so it will still have to be read from the same drive as the online OLTP database, and increased resource (CPU/RAM) use is inevitable. Won't a long-running reporting query that pulls years of historical data (including data that hasn't changed and therefore doesn't exist in the snapshot) block writes just as if it were running against the online database?
Is there any way to tell SQL Server to prioritize resources for the needs of the OLTP database?
I've found examples of how original rows are copied from the online database when they're updated, but how do snapshots handle structural changes in the online database, like new/altered tables, indexes, etc.?
Can snapshots have different user permissions versus the online database (so that users can read from the snapshot, but not the online database)?
The OLTP system runs core banking applications, so I understand utmost caution is justified, but I can't believe the current approach is best practice in 2022.
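For concreteness, the drop-and-recreate cycle described above might look roughly like the sketch below, driven from Python via pyodbc. The connection string, logical file name, and snapshot file path are illustrative assumptions, not values from the actual environment.

    # Rough sketch of the daily MyDB_LatestSnapshot refresh described above.
    # Connection string, logical file name, and file path are assumptions.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=dbserver;DATABASE=master;Trusted_Connection=yes",
        autocommit=True,  # CREATE/DROP DATABASE cannot run inside a transaction
    )
    cur = conn.cursor()

    # Drop yesterday's snapshot; this disconnects any reporting sessions still using it.
    cur.execute("IF DB_ID('MyDB_LatestSnapshot') IS NOT NULL DROP DATABASE MyDB_LatestSnapshot")

    # Recreate it; a new snapshot starts as an empty sparse file, so this is fast.
    cur.execute(
        "CREATE DATABASE MyDB_LatestSnapshot "
        "ON (NAME = MyDB_Data, FILENAME = 'E:\\Snapshots\\MyDB_LatestSnapshot.ss') "
        "AS SNAPSHOT OF MyDB"
    )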

Containers for database and scalability

Consider TiDB and the TiDB Operator as examples for this question.
TiDB
TiDB ("Ti" stands for Titanium) is an open-source NewSQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads. It is MySQL compatible and features horizontal scalability, strong consistency, and high availability.
TiDB Operator
The TiDB Operator automatically deploys, operates, and manages a TiDB cluster in any Kubernetes-enabled cloud environment.
Once the database is live, there are broadly two scenarios:
Very high rate of read only queries.
Very high rate of write queries.
In either of the scenarios, which component of the containerized database scales? Read replicas? Database 'engine' itself? Persistent volumes? All of the above?
Containerized infrastructure abstracts storage and computing resources
(consider PV and Pod in k8s), and these resources scale as the database scales. So the form of scaling depends on the database itself.
For TiDB, while it offers a MySQL-compatible SQL interface, its architecture is very different from MySQL and other traditional relational databases:
The SQL layer (TiDB) serves SQL queries and interacts with the storage layer based on the calculated query plan. It is stateless and scales on demand for both read and write queries. Typically, you scale out/up the SQL layer to get more compute resources for query plan calculation, joins, aggregation, and serving more connections.
The storage layer (TiKV) is responsible for storing data and serving KV APIs for the SQL layer. The most interesting part of TiKV is the Multi-Raft replication: the storage layer automatically splits data into pieces and distributes them evenly across containers. Each piece is a Raft group whose leader serves read and write queries. Upon scale in/out, the storage layer automatically migrates data pieces to balance the load. So, scaling out the storage layer gives you better read/write throughput and larger data capacity.
Back to the question: all of the components mentioned in the question scale. The read/write replicas serving SQL queries can scale, the database "engine" (storage layer) serving KV queries can scale, and the PVs are scaled out along with the storage layer.
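Because the SQL layer is stateless and speaks the MySQL protocol, scaling either layer is transparent to the application. Here is a small sketch of a client, assuming a TidbCluster named "basic" (so the service is basic-tidb) exposed on TiDB's default port 4000, using PyMySQL:

    # Sketch of an application client: it only knows the TiDB service endpoint.
    # Scaling the tidb (SQL layer) or tikv (storage layer) replicas via the
    # TiDB Operator does not change this code. Host and credentials are placeholders.
    import pymysql

    conn = pymysql.connect(host="basic-tidb", port=4000, user="root", database="test")
    with conn.cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM orders")  # served by any stateless TiDB pod
        print(cur.fetchone())
    conn.close()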

Load balancer and multiple instance of database design

The current single application server can handle about 5,000 concurrent requests. However, the user base will be in the millions, and I may need two application servers to handle the requests.
So the plan is to add a load balancer in the hope that it will handle over 10,000 concurrent requests. However, each user's data is stored in one single database. If the design moves to two or more servers, should I do the following?
Have two database instances
Real-time sync between the two databases
Is this correct?
However, if so, will the sync process lower the performance of the servers, since database replication seems costly?
Thank you.
You probably want to think of your service in "tiers". In this instance, you've got two tiers; the application tier and the database tier.
Typically, your application tier is going to be considerably easier to scale horizontally (i.e. by adding more application servers behind a load balancer) than your database tier.
With that in mind, the best approach is probably to overprovision your database (i.e. put it on its own, meaty server) and have your application servers all connect to that same database. Depending on the database software you're using, you could also look at using read replicas (AWS docs) to reduce the strain on your database.
You can also look at caching via Memcached / Redis to reduce the amount of load you're placing on the database.
So – tl;dr – put your DB on its own big server, and spread your application code across many small servers, all connecting to that same DB server.
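As a rough illustration of the caching point, a cache-aside lookup with Redis might look like the sketch below; load_user_from_db stands in for whatever data-access code you already have, and the TTL is arbitrary.

    # Cache-aside sketch: check Redis first, fall back to the database on a
    # miss, then cache the result with a TTL. load_user_from_db is hypothetical.
    import json
    import redis

    cache = redis.Redis(host="localhost", port=6379)

    def get_user(user_id, load_user_from_db):
        key = f"user:{user_id}"
        cached = cache.get(key)
        if cached is not None:
            return json.loads(cached)            # cache hit: no database work
        user = load_user_from_db(user_id)        # cache miss: one database read
        cache.setex(key, 300, json.dumps(user))  # keep it for 5 minutes
        return user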
The best option could be to synchronize a standby node with data from the active node; this is a cost-effective solution since it is achievable with an open-source relational database (e.g. MariaDB).
Do not store computed results and statistics that can easily be derived at run time; this helps reduce the data size.
If historical data is not needed urgently for queries, it can be written to text files in a format that is easy to import back into the database (e.g. .csv).
Data objects that are updated very often can be kept in an in-memory database as key-value pairs; use a scheduled task to perform batch updates/inserts to the relational database to achieve persistence.
Implement retry logic for the database batch update tasks to handle database downtime or network errors (see the sketch after this list).
Consider writing data to the relational database as serialized objects.
Cache configuration data in memory from the database, refreshing it either periodically or via an API when the data changes.
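A minimal sketch of the batched write-behind and retry ideas above, assuming a PyMySQL/MariaDB-style driver, a hypothetical get_connection helper, and a counters table with a unique name column:

    # Sketch of "buffer in memory, batch-write on a schedule, retry on failure".
    # get_connection and the counters table are assumptions for the example.
    import time

    pending = {}  # in-memory key/value buffer, e.g. {"page:42": 17}

    def flush(get_connection, retries=3):
        rows = list(pending.items())
        if not rows:
            return
        for attempt in range(retries):
            try:
                conn = get_connection()
                cur = conn.cursor()
                cur.executemany(
                    "INSERT INTO counters (name, value) VALUES (%s, %s) "
                    "ON DUPLICATE KEY UPDATE value = VALUES(value)",
                    rows,
                )
                conn.commit()
                pending.clear()
                return
            except Exception:             # database down or network error
                time.sleep(2 ** attempt)  # back off, then retry
        # if every retry fails, keep the buffer for the next scheduled run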

When to prefer master-slave and when to cluster?

I know there have been many articles written about database replication. Trust me, I spent some time reading those articles, including this SO one that explains the pros and cons of replication. This SO article goes in depth about replication and clustering individually, but doesn't answer these simple questions that I have:
When do you replicate your database, and when do you cluster?
Can both be performed at the same time? If yes, what are the inspirations for each?
Thanks in advance.
MySQL currently supports two different solutions for creating a high availability environment and achieving multi-server scalability.
MySQL Replication
The first form is replication, which MySQL has supported since MySQL version 3.23. Replication in MySQL is currently implemented as an asynchronous master-slave setup that uses a logical log-shipping backend.
A master-slave setup means that one server is designated to act as the master. It is required to receive all of the write queries. The master then executes and logs the queries, which are then shipped to the slaves to execute, keeping the same data across all of the replication members.
Replication is asynchronous, which means that the slave server is not guaranteed to have the data when the master performs the change. Normally, replication will be as close to real-time as possible. However, there is no guarantee about the time required for the change to propagate to the slave.
Replication can be used for many reasons. Some of the more common reasons include scalability, server failover, and backups.
Scalability can be achieved because you can now run SELECT queries against any of the slaves. Write statements, however, are generally not improved, because writes have to occur on each of the replication members.
Failover can be implemented fairly easily using an external monitoring utility that uses a heartbeat or similar mechanism to detect the failure of a master server. MySQL does not currently do automatic failover, as the logic is generally very application-dependent. Keep in mind that, because replication is asynchronous, it is possible that not all of the changes made on the master will have propagated to the slave.
MySQL replication works very well even across slower connections, and with connections that aren't continuous. It also is able to be used across different hardware and software platforms. It is possible to use replication with most storage engines including MyISAM and InnoDB.
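As a concrete illustration, pointing a slave at its master comes down to a couple of statements once the master has binary logging enabled and a replication user; the hosts, credentials, and binlog coordinates below are placeholders (newer MySQL versions use CHANGE REPLICATION SOURCE TO instead):

    # Sketch: configuring a MySQL slave to replicate from a master.
    # Hosts, credentials, and binlog coordinates (from SHOW MASTER STATUS
    # on the master) are placeholders.
    import mysql.connector

    slave = mysql.connector.connect(host="slave-host", user="root", password="secret")
    cur = slave.cursor()
    cur.execute(
        "CHANGE MASTER TO "
        "  MASTER_HOST = 'master-host', "
        "  MASTER_USER = 'repl', "
        "  MASTER_PASSWORD = 'repl-password', "
        "  MASTER_LOG_FILE = 'mysql-bin.000001', "
        "  MASTER_LOG_POS = 4"
    )
    cur.execute("START SLAVE")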
MySQL Cluster
MySQL Cluster is a shared nothing, distributed, partitioning system that uses synchronous replication in order to maintain high availability and performance.
MySQL Cluster is implemented through a separate storage engine called NDB Cluster. This storage engine will automatically partition data across a number of data nodes. The automatic partitioning of data allows for parallelization of queries that are executed. Both reads and writes can be scaled in this fashion since the writes can be distributed across many nodes.
Internally, MySQL Cluster also uses synchronous replication in order to remove any single point of failure from the system. Since two or more nodes are always guaranteed to have the data fragment, at least one node can fail without any impact on running transactions. Failure detection is handled automatically, with the dead node being removed transparently to the application. Upon node restart, it will automatically be re-integrated into the cluster and begin handling requests as soon as possible.
There are a number of limitations that currently exist and have to be kept in mind while deciding if MySQL Cluster is the correct solution for your situation.
Currently all of the data and indexes stored in MySQL Cluster are stored in main memory across the cluster. This does restrict the size of the database based on the systems used in the cluster.
MySQL Cluster is designed to be used on an internal network as latency is very important for response time.
As a result, it is not possible to run a single cluster across a wide geographic distance. In addition, while MySQL Cluster will work over commodity network setups, in order to attain the highest performance possible special clustering interconnects can be used.
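For example, once the NDB management and data nodes are running, using the cluster from a SQL node is just a matter of choosing the storage engine; the table definition and connection details below are purely illustrative:

    # Sketch: a table becomes cluster-partitioned simply by using the NDB engine.
    # Host, credentials, and the table itself are made up for the example.
    import mysql.connector

    conn = mysql.connector.connect(host="sql-node", user="root", password="secret", database="test")
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE accounts ("
        "  id INT PRIMARY KEY,"
        "  balance DECIMAL(12, 2)"
        ") ENGINE = NDBCLUSTER"
    )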
In a master-slave configuration, write operations are performed by the master and reads by the slaves. So all SQL requests first reach the master, a queue of requests is maintained, and the read operations are executed only after completion of the writes. A common problem with the master-slave configuration, which I have also witnessed, is that when the queue becomes too large for the master to maintain, the architecture collapses and the slave starts behaving like a master.
For clusters, I have worked on Cassandra, where a request reaches a node (table) and a commit hash is maintained that records the changes made to that node; the other nodes are then updated based on that commit hash. So here, operations do not depend on a single node.
We used master-slave when the write data is not big in size and count; otherwise we use clusters.
Clusters are expensive in space and master-slave in time, so your decision of which to choose depends on what you want to save.
We can also use both at the same time; I have done this at my current company.
We moved the tables with the most write operations to Cassandra, and we have written 4 APIs to perform the CRUD operations on the tables in Cassandra. Whenever an HTTP request comes in, it first hits our web server; from the code running on the web server we decide which operation has to be performed (among CRUD), and then we call that particular API to make changes to the Cassandra database.
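One of those CRUD helpers might look roughly like this with the DataStax Python driver; the contact points, keyspace, table, and column names are made up for the example:

    # Rough sketch of one CRUD helper, using the DataStax Python driver.
    # Contact points, keyspace, table, and columns are made up.
    from cassandra.cluster import Cluster

    cluster = Cluster(["cassandra-node-1", "cassandra-node-2"])
    session = cluster.connect("app_keyspace")

    def create_event(event_id, payload):
        # Any coordinator node can accept the write; there is no single master.
        session.execute(
            "INSERT INTO events (event_id, payload) VALUES (%s, %s)",
            (event_id, payload),
        )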

How does Replication work in a Distributed Database

I would like to know how replication works in a distributed database. It would be nice if this could be explained in a thorough, yet easy to understand way.
It would also be nice if you could make a comparison between distributed transactions and distributed replication.
Single point of failure
The database server is a central part of an enterprise system, and, if it goes down, service availability might get compromised.
If the database server is running on a single server, then we have a single point of failure. Any hardware issue (e.g., disk drive failure) or software malfunction (e.g., driver problems, malfunctioning updates) will render the system unavailable.
Limited resources
If there is a single database server node, then vertical scaling is the only option when it comes to accommodating a higher traffic load. Vertical scaling, or scaling up, means buying more powerful hardware, which provides more resources (e.g., CPU, Memory, I/O) to serve the incoming client transactions.
Up to a certain hardware configuration, vertical scaling can be a viable and simple solution to scale a database system. The problem is that the price-performance ratio is not linear, so after a certain threshold, you get diminishing returns from vertical scaling.
Another problem with vertical scaling is that, in order to upgrade the server, the database service needs to be stopped. So, during the hardware upgrade, the application will not be available, which can impact underlying business operations.
Database Replication
To overcome the aforementioned issues associated with having a single database server node, we can set up multiple database server nodes. The more nodes, the more resources we will have to process incoming traffic.
Also, if a database server node is down, the system can still process requests as long as there are spare database nodes to connect to. For this reason, upgrading the hardware or software of a given database server node can be done without affecting the overall system availability.
The challenge of having multiple nodes is data consistency. If all nodes are in-sync at any given time, the system is Linearizable, which is the strongest guarantee when it comes to data consistency across multiple registers.
The process of synchronizing data across all database nodes is called replication, and there are multiple strategies that we can use.
Single-Primary Database Replication
The Single-Primary Replication scheme works as follows:
The primary node, also known as the Master node, is the one accepting writes while the replica nodes can only process read-only transactions. Having a single source of truth allows us to avoid data conflicts.
To keep the replicas in-sync, the primary node must provide the list of changes that were made by all committed transactions.
Relational database systems have a Redo Log, which contains all data changes that were successfully committed.
PostgreSQL uses the WAL (Write-Ahead Log) records to ensure transaction Durability and for Streaming Replication.
Because the storage engine is separated from the MySQL server, MySQL uses a separate Binary Log for replication. The Redo Log is generated by the InnoDB storage engine, and its goal is to provide transaction Durability, while the Binary Log is created by the MySQL Server and stores the logical logging records, as opposed to the physical logging created by the Redo Log.
By applying the same changes recorded in the WAL or Binary Log entries, the replica node can stay in-sync with the primary node.
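The idea of replaying the primary's log on a replica can be reduced to a toy sketch; a real WAL or Binary Log carries far more detail, but the replay loop has the same shape:

    # Toy sketch of log-based replication: the primary appends committed
    # changes to a log, and a replica replays them in order to stay in-sync.
    primary_log = []   # stands in for the WAL / Binary Log
    primary_data = {}
    replica_data = {}
    replica_pos = 0    # how far this replica has replayed the log

    def commit_on_primary(key, value):
        primary_data[key] = value
        primary_log.append((key, value))  # the change is logged before it ships

    def replay_on_replica():
        global replica_pos
        # Apply, in commit order, every log entry the replica has not seen yet.
        for key, value in primary_log[replica_pos:]:
            replica_data[key] = value
        replica_pos = len(primary_log)

    commit_on_primary("account:1", 100)
    commit_on_primary("account:1", 250)
    replay_on_replica()
    assert replica_data == primary_data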
Horizontal scaling
The Single-Primary Replication provides horizontal scalability for read-only transactions. If the number of read-only transactions increases, we can create more replica nodes to accommodate the incoming traffic.
This is what horizontal scaling, or scaling out, is all about. Unlike vertical scaling, which requires buying more powerful hardware, horizontal scaling can be achieved using commodity hardware.
On the other hand, read-write transactions can only be scaled up (vertical scaling) as there is a single primary node.
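In application terms, scaling out reads while keeping a single primary usually means routing statements, roughly as in the sketch below; the hostnames, credentials, and the naive SELECT-prefix check are illustrative only.

    # Sketch of read/write routing in a single-primary topology: writes always
    # go to the primary, reads rotate across the replicas. Details are placeholders.
    import itertools
    import pymysql

    primary = pymysql.connect(host="db-primary", user="app", password="secret", database="app")
    replicas = [
        pymysql.connect(host=h, user="app", password="secret", database="app")
        for h in ("db-replica-1", "db-replica-2")
    ]
    next_replica = itertools.cycle(replicas)

    def run(sql, params=None):
        is_read = sql.lstrip().upper().startswith("SELECT")
        conn = next(next_replica) if is_read else primary
        with conn.cursor() as cur:
            cur.execute(sql, params)
            if is_read:
                return cur.fetchall()
        conn.commit()

Because replication is asynchronous, a row just written through the primary may not be visible on a replica yet, so reads that must see their own writes should be pinned to the primary.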
I would recommend initially spending time reviewing the MySQL Docs on Replication. It's a good example of database replication. They are here:
http://dev.mysql.com/doc/refman/5.5/en/replication.html
Covering the entire scope of your question seems like too much for one question.
If you have some specific questions, please feel free to post them. Thanks!
Clustrix is a distributed database with a shared nothing architecture that supports both distributed transactions and replication. There is some technical documentation available that describes data distribution, distributed evaluation model, and built in fault tolerance, as well as an overview of the architecture.
As a MySQL replacement, Clustrix implements MySQL's replication policy and produces binlogs in the MySQL format, which are serialized so that Clustrix can act as either a Master or Slave to MySQL.

Resources