I'm currently migrating some PostgreSQL 11 databases to Azure, and while configuring the database storage I ended up in doubt about how PostgreSQL manages disk space.
Taking the storage space into consideration, how much free space should I leave on my databases? I want to reduce the cost without taking a substantial performance hit, by provisioning only what I need. Expanding the storage space when needed is not a problem.
How, and to what extent, can low free space affect my performance?
For context, I have databases with sizes between 70GB and 90GB.
You could refer to the document Pricing tiers in Azure Database for PostgreSQL:
You can add additional storage capacity during and after the creation of the server, and allow the system to grow storage automatically based on the storage consumption of your workload. Storage can only be scaled up, not down.
This document can help you learn more about Azure Database for PostgreSQL storage management.
As you said, your databases are between 70 GB and 90 GB, so the Basic pricing tier is suitable. In my experience with Azure SQL Database, 75% or 80% is the usual alert threshold. You could initially provision storage of no less than 90 GB * 120% = 108 GB.
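Before settling on a number, it can help to check how big each database actually is from inside PostgreSQL; a minimal sketch using only standard catalog functions (keep in mind that on Azure the provisioned storage also holds transaction logs and server logs, so the sum of database sizes understates total consumption):

    -- List databases with their on-disk size, largest first,
    -- to see how close you are to the provisioned storage.
    SELECT datname,
           pg_size_pretty(pg_database_size(datname)) AS size
    FROM pg_database
    ORDER BY pg_database_size(datname) DESC;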
HTH.
If the provisioned storage limit of the database is 10 GB or less, create an alert when 80% of the storage has been consumed.
If the provisioned storage limit of the database is 100 GB or less, create an alert when 90% has been consumed. For everything else, create an alert when 95% of the space has been consumed.
If your server is reaching provisioned storage limits, it will soon be out of space and set to read-only.
Monitor your usage, and provision more storage as needed to continue using the server without deleting any files, logs, and so on.
A common issue when provisioning low storage capacity is IOPS. The lower the storage capacity, the fewer IOPS it can handle. If your application requires high IOPS, or you see queries performing poorly while waiting on IO, it is recommended that you create an Azure Database for PostgreSQL server with a larger storage size to get more IOPS, so that your application performance is not impacted by storage throttling.
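If you suspect storage throttling, a quick way to see whether sessions are actually waiting on IO is pg_stat_activity (available in PostgreSQL 10+, so it applies to PostgreSQL 11); a minimal sketch:

    -- Show sessions currently stuck on an IO-related wait event,
    -- a common symptom of hitting the IOPS limit of small storage.
    SELECT pid, state, wait_event_type, wait_event, left(query, 80) AS query
    FROM pg_stat_activity
    WHERE wait_event_type = 'IO';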
Related
We have a read-only PostgreSQL RDS database which is heavily queried. We don't perform any inserts/updates/deletes during normal traffic, but we can still see Free Storage Space running out and an increase in the Write IOPS metric. During this period, CPU usage is at 100%.
At some point, the storage space seems to be released.
Is this expected?
The issue was in the end related to our logs. log_statement was set to all, so every single query to PG was being logged. In order to troubleshoot long-running queries, we combined log_statement and log_min_duration_statement.
Since this is a read-only database we want to know if there is any insert/update/delete operation, so log_statement: ddl; and we want to know which queries take longer than 1s: log_min_duration_statement: 1000.
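For reference, a minimal sketch of those two settings expressed as SQL. The exact way to apply them differs per platform (RDS parameter group, Azure server parameters, or postgresql.conf on a self-managed server); note that log_statement = 'mod' would additionally log the INSERT/UPDATE/DELETE statements themselves:

    -- On a self-managed server these can be set with ALTER SYSTEM (superuser);
    -- on RDS/Azure change them via the parameter group / server parameters instead.
    ALTER SYSTEM SET log_statement = 'ddl';               -- 'mod' also logs INSERT/UPDATE/DELETE
    ALTER SYSTEM SET log_min_duration_statement = 1000;   -- log statements slower than 1000 ms
    SELECT pg_reload_conf();                              -- pick up the change without a restart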
I am new to Azure and I am currently exploring the possibility of migrating my database from a virtual private server to an Azure-hosted SQL database. I can see the potential advantages of moving to Azure: less maintenance, cloud hosted, cheaper.
I currently pay £36 every month for my VPS, which is a fixed payment. However, on Azure, using the pricing calculator, I can see that the Standard tier would cost about £14, which is a huge saving. My only issue is that I have chosen the DTU model. I am worried that one month may be fine, but in another month the cost may spike. The reason I have been hesitant to migrate is that I am fine with paying £36 a month knowing I can use less or more and still be at a fixed cost.
On the other hand, using Azure with DTUs, there is no guarantee that I will have the same cost each month; it may potentially be higher.
My question is: can someone explain DTUs to me, and is there a way I can make sure my cost stays low without any surprise costs in the future?
In the DTU-based SQL purchase model, a fixed set of resources is assigned to the database or elastic pool via performance tiers: Basic, Standard and Premium. This model is best for customers who prefer the simplicity of fixed monthly payments and pre-configured options.
You first need to measure your resource utilization to check how many DTUs are enough for you.
Measure the following utilization metrics for at least an hour so the calculator can analyze utilization over time to provide you the best recommendation:
Processor - % Processor Time
Logical Disk - Disk Reads/sec
Logical Disk - Disk Writes/sec
Database - Log Bytes Flushed/sec
Refer to this document to calculate the database resource consumption. Once you have the output, click the Calculate button to view your recommended service tier/performance level and DTUs on the same page.
You can also check the pricing based on your calculated DTUs here.
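Once the database is actually running on Azure SQL, you can also sanity-check the chosen DTU level from inside the database itself; a minimal sketch using the built-in sys.dm_db_resource_stats view, which records utilization (as a percentage of the tier's limits) in 15-second intervals for roughly the last hour:

    -- Recent resource usage relative to the current service tier's limits;
    -- consistently low percentages suggest room to scale the DTUs down.
    SELECT end_time,
           avg_cpu_percent,
           avg_data_io_percent,
           avg_log_write_percent
    FROM sys.dm_db_resource_stats
    ORDER BY end_time DESC;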
We have a sort of load balancer implemented for SQL elastic pools. Basically it merges two pools, splits a pool in two, or scales up storage based on some reference settings (compute, storage, etc.) and the stats we get from the SQL server. In some cases, as a special case of splitting a pool into two or more pools, we create a pool with a larger vCore count and then move all the databases from the old pool to the new pool.
The question: is there a performance hit from creating a new pool with the desired vCore count and moving the databases to the new pool, versus scaling the vCores to the desired level in place? Or are they both the same, since scaling in place also internally creates a new tier?
Is there a performance hit by creating a new pool with the desired vCore count and moving the databases to the new pool vs scaling the vCores to the desired level in place, or are they both the same since scaling in place also internally creates a new tier?
There is no such performance hit observed; however, below are a few notable points.
The databases in an elastic pool are on a single server and share a set number of resources at a set price. Elastic pools ensure that databases get the performance resources they need when they need them, and they provide a simple resource allocation mechanism within a predictable budget.
When moving databases into or out of an elastic pool, there is no downtime except for a brief period of time (on the order of seconds) at the end of the operation when database connections are dropped.
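For reference, moving a database into a pool is itself just an online ALTER DATABASE operation; a minimal T-SQL sketch with hypothetical database and pool names:

    -- Move a database into the target elastic pool; connections are only
    -- dropped briefly at the end of the operation. Typically run while
    -- connected to the logical server's master database.
    ALTER DATABASE [SalesDb]
    MODIFY ( SERVICE_OBJECTIVE = ELASTIC_POOL ( name = [NewLargerPool] ) );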
You can refer to this official MS doc to understand the factors affected when changing the service tier or rescaling.
Going through the documentation I found cross-region replication for the storage layer, but nothing about the compute layer of Snowflake. I did not see any mention of availability options for virtual warehouses. If a whole AWS region goes down, the database will still be available for serving queries, but what about the virtual warehouse? Do I need to create a new one while the region is down, or is there a way to have a "backup" virtual warehouse in a different AWS region?
A virtual warehouse is essentially a compute server (for example, AWS EC2 if hosted on AWS). Virtual warehouses are not persistent, i.e. when you suspend a warehouse, it is returned to the AWS/Azure/GCP pool, and when you resume it, it is allocated from the pool.
When a cloud region goes down, virtual warehouses will be allocated and created from the AWS/Azure/GCP pool in the backup region.
The documentation here states clearly what can, and what can't, be replicated:
Snowflake Replication
For example, it states:
Currently, replication is supported for databases only. Other types of objects in an account cannot be replicated. This list includes:
Users
Roles
Warehouses
Resource monitors
Shares
When you set up an environment (into which you are going to replicate a database from another account) you also need to set up the roles, users, warehouses, etc.
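A minimal sketch of what that setup looks like in Snowflake SQL, with hypothetical account and object names (mydb, myorg.primary_acct, myorg.secondary_acct, reporting_wh):

    -- On the primary account: allow the database to replicate to the secondary account.
    ALTER DATABASE mydb ENABLE REPLICATION TO ACCOUNTS myorg.secondary_acct;

    -- On the secondary account: create the replica and refresh it.
    CREATE DATABASE mydb AS REPLICA OF myorg.primary_acct.mydb;
    ALTER DATABASE mydb REFRESH;

    -- Warehouses are not replicated, so the secondary account needs its own,
    -- created (and typically left suspended) ahead of time.
    CREATE WAREHOUSE IF NOT EXISTS reporting_wh
      WAREHOUSE_SIZE = 'MEDIUM'
      AUTO_SUSPEND = 300
      AUTO_RESUME = TRUE
      INITIALLY_SUSPENDED = TRUE;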
The 'Snowflake Elastic Data Warehouse' (2016) paper, in section 4.2.1 'Fault Resilience', reports:
Virtual Warehouses (VWs) are not distributed across AZs. This choice is for performance reasons. High network throughput is critical for distributed query execution, and network throughput is significantly higher within the same AZ. If one of the worker nodes fails during query execution, the query fails but is transparently re-executed, either with the node immediately replaced, or with a temporarily reduced number of nodes. To accelerate node replacement, Snowflake maintains a small pool of standby nodes. (These nodes are also used for fast VW provisioning.)
If an entire AZ becomes unavailable though, all queries running on a given VW of that AZ will fail, and the user needs to actively re-provision the VW in a different AZ.
With full-AZ failures being truly catastrophic and exceedingly rare events, we today accept this one scenario of partial system unavailability, but hope to address it in the future.
We have around 300 million documents on a drive, and we need to delete around 200 million of them. I am going to write the 200 million paths to a storage so I can keep track of deleted documents. My current thought is that an Azure SQL database is probably not very suited for this amount. Cosmos DB is too expensive. Storing CSV files is bad, because I need to do updates every time I delete a file. Table storage seems to be a pretty good match, but it does not offer group-by operations, which could come in handy when doing status reports. I don't know much about Data Lake, whether it supports fast updates or is more like an archive. All input is welcome for choosing the right storage for this kind of reporting.
Thanks in advance.
Based on your needs, you could use Azure Cosmos DB or Azure Table Storage.
Azure Table Storage offers a NoSQL key-value store for semi-structured data. Unlike a traditional relational database, each entity (the equivalent of a row in relational terminology) can have a different structure, allowing your application to evolve without downtime to migrate between schemas.
Azure Cosmos DB is a multi-model database service designed for global use in mission-critical systems. It exposes not only a Table API but also a SQL (Core) API and APIs for Apache Cassandra, MongoDB and Gremlin. These allow you to easily swap out an existing database with a Cosmos DB implementation.
The main differences are as follows:
Performance
Azure Table Storage has no upper bound on latency. Cosmos DB defines latency of single-digit milliseconds for reads and writes, along with operations at sub-15 milliseconds at the 99th percentile worldwide. Throughput is limited on Table Storage to 20,000 operations per second. On Cosmos DB, there is no upper limit on throughput, and more than 10 million operations per second are supported.
Global Distribution
Azure Table Storage supports a single region with an optional read-only secondary region for availability. Cosmos DB supports distribution from 1 to more than 30 regions with automatic failovers worldwide.
Billing
Azure Table Storage uses storage volume to determine billing. Pricing is tiered to get progressively cheaper per GB the more storage you use. Operations incur a charge measured per 10,000 transactions.
Cosmos DB has two billing components: provisioned throughput and consumed storage.
Provisioned throughput: Provisioned throughput (also called reserved throughput) guarantees high performance at any scale. You specify the throughput (RU/s) that you need, and Azure Cosmos DB dedicates the resources required to guarantee the configured throughput. You are billed hourly for the maximum provisioned throughput for a given hour.
Consumed storage: You are billed a flat rate for the total amount of storage (GBs) consumed for data and indexes for a given hour.
For more details, please refer to the document.
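One practical difference for the status reports mentioned in the question: the Cosmos DB SQL (Core) API supports server-side aggregation with GROUP BY, which Table Storage does not offer. A minimal sketch, assuming a container that stores one item per file path with hypothetical status and batchId properties, counting tracked files per deletion status for one batch:

    SELECT c.status, COUNT(1) AS files
    FROM c
    WHERE c.batchId = "delete-run-01"
    GROUP BY c.status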