Snowflake Virtual Warehouse usage consideration

When should you consider disabling auto-suspend for a Virtual Warehouse?
A. When users will be using compute at different times throughout a 24/7 period
B. When managing a steady workload
C. When the compute must be available with no delay or lag time
D. When you don’t want to have to manually turn on the Warehouse each time a user needs it
If we have to choose two options, "B" is clearly one of them, but which of "C" or "D" should be the other?

The most common reason to disable auto-suspend is when you don't want to lose the cache that exists on the warehouse; losing it adds query time to the first few queries executed after the warehouse resumes. In essence, when query time is critical and must be consistent, you'd want to disable auto-suspend and let the warehouse stay up and running, either 24/7 or during a specified time period (using tasks to suspend and resume the warehouse explicitly), as sketched below.
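A minimal SQL sketch of that approach, assuming a warehouse named my_wh; the task names and cron schedules are placeholders, none of these identifiers come from the question:

```sql
-- 0 (or NULL) disables auto-suspend, so the warehouse and its cache stay warm.
ALTER WAREHOUSE my_wh SET AUTO_SUSPEND = 0;

-- Alternative to running 24/7: bracket the busy hours with scheduled tasks.
CREATE TASK resume_my_wh
  WAREHOUSE = my_wh
  SCHEDULE = 'USING CRON 0 8 * * MON-FRI America/New_York'
AS
  ALTER WAREHOUSE my_wh RESUME IF SUSPENDED;

CREATE TASK suspend_my_wh
  WAREHOUSE = my_wh
  SCHEDULE = 'USING CRON 0 18 * * MON-FRI America/New_York'
AS
  ALTER WAREHOUSE my_wh SUSPEND;

-- Tasks are created suspended and must be enabled explicitly.
ALTER TASK resume_my_wh RESUME;
ALTER TASK suspend_my_wh RESUME;
```

Note that running these tasks on my_wh itself means the suspend task briefly resumes the warehouse just to execute its statement; in practice you might host such housekeeping tasks on a separate extra-small warehouse.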

Related

How do I know which requirements my server needs for my database?

I need to reduce the cost of my servers. The current server hosts both the web application and the database. Its general specifications are:
5 TB disk
16 CPUs
30 GB RAM
The application does not exceed 100 users per day, and the database is approximately 30 GB, so the current server expense seems too high to me. Looking at DigitalOcean droplets, I propose hosting the web application on 8 CPUs, 16 GB RAM and 1 TB disk, and the database on 4 CPUs, 16 GB RAM and a 240 GB SSD. But how can I know whether my proposal is correct before making the server change? Thanks for your help.
Nobody can tell you the size of equipment you would use because this is totally dependent upon your application and how it is used.
For example, some applications are CPU-heavy (doing video processing), some require a lot of memory (keeping lots of data accessible), some are disk-dependent (doing database transactions). The "shape" of every application is different. Your application might be able to take advantage of multiple CPUs, or perhaps it is only using some of them.
Also, how users interact with every application is different. People use Facebook differently from a banking app, and differently again from a gaming website. If your 100 users are all in the same timezone, usage will differ from having them spread around the world.
The only way to know what specification is required is either to monitor live traffic on the website and observe how CPU, RAM and disk are used, or to simulate levels of traffic that reproduce what users typically do on the website and then measure the behaviour of the system.
Then there is the question of reliability. Consider whether you are willing to run everything on one server, where the app might be unavailable if something goes wrong, or whether you need high availability to ensure uptime, at a greater cost.
Since you appear to be cost-conscious, I would recommend:
Monitor your current system (CPU, RAM, network, disk) during periods of normal usage.
If some aspects seem over-provisioned, then reduce them (e.g. if CPU never goes above 40%, reduce it). Check whether all CPUs are being used.
Use some form of continual monitoring to notify you when the application is not behaving as desired.
Keep log files to allow you to deep-dive into what might be causing the application problems.

Snowflake query profile - network communication

In the Snowflake query "Profile" (History tab), what is the significance of the values we see for "Network"?
As per the documentation:
"Network — network communication:
Bytes sent over the network — amount of data sent over the network."
Does this cost us anything? If so, is it billed under compute cost (credits) or under storage? Or does it add to compute cost indirectly, because the query runs longer when a high number of bytes is sent over the network?
If a high number is not good to have, what is the recommendation for keeping it low?
I think the two links below will help you understand the billing side of Snowflake. The second one describes when data transfer will cost you, and how. The "Network" value in the query profile provides just the statistics on data movement; the billing components are defined clearly in these links.
https://docs.snowflake.com/en/user-guide/admin-usage-billing.html
https://docs.snowflake.com/en/user-guide/billing-data-transfer.html
A note on data transfer from the Snowflake documentation:
"Note that Snowflake does not charge data ingress fees; however, contact your cloud storage provider (Amazon S3, Google Cloud Storage, or Microsoft Azure) to determine whether they apply data egress charges to transfer data from their network and region of origin to the cloud provider's network and region where your Snowflake account is hosted."

Distributed Cache: Clearing entries on owner loss (Apache Ignite)

I have a number of nodes, each owning a subset of a large pool of physical resources. Each node only handles its own resources, but needs to do so based on the state of all resources in the pool.
A node only updates the state of its own resources but listens to state changes of all the others.
When a node dies (process terminated, crash, power loss, ...), its physical resources die with it and the other nodes must discard them ASAP.
I would do this with a distributed replicated cache (so nodes can read the local replicas for performance), but how do I "clear" the deceased entries? The owning node can't do it.
The standard answer seems to be an expiry policy: a map entry gets X seconds to live without an update, then it is gone. The problem is, with something like 100,000 resources and 100 nodes, that is a LOT of updates to send everywhere just to NOT change state, not to mention the work to update each entry before the timeout and to discard those updates in all the other nodes.
Any better way of doing this?
Subscribe to EVT_NODE_LEFT (and EVT_NODE_FAILED, which covers crashes) via IgniteEvents.
Find the related entries via SqlQuery or ScanQuery and remove them, as in the sketch below.
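A minimal Java sketch of those two steps, assuming a replicated cache named "resources" whose values record the owning node's id; the cache layout and all names here are illustrative. Discovery events are disabled by default, so they must be enabled in the node configuration:

```java
import java.util.UUID;

import javax.cache.Cache;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.QueryCursor;
import org.apache.ignite.cache.query.ScanQuery;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.events.DiscoveryEvent;
import org.apache.ignite.events.Event;
import org.apache.ignite.events.EventType;
import org.apache.ignite.lang.IgnitePredicate;

public class OwnerCleanup {

    /** Illustrative cache value: resource state plus the owning node's id. */
    static class Resource {
        UUID ownerNodeId;
        // ... resource state ...
    }

    public static void main(String[] args) {
        // Discovery events are not recorded by default; enable them explicitly.
        IgniteConfiguration cfg = new IgniteConfiguration()
            .setIncludeEventTypes(EventType.EVT_NODE_LEFT, EventType.EVT_NODE_FAILED);

        Ignite ignite = Ignition.start(cfg);
        IgniteCache<String, Resource> cache = ignite.getOrCreateCache("resources");

        IgnitePredicate<Event> onNodeGone = evt -> {
            UUID deadNode = ((DiscoveryEvent) evt).eventNode().id();
            // Scan for entries owned by the departed node and drop them.
            try (QueryCursor<Cache.Entry<String, Resource>> cur = cache.query(
                    new ScanQuery<String, Resource>(
                        (key, res) -> deadNode.equals(res.ownerNodeId)))) {
                cur.forEach(entry -> cache.remove(entry.getKey()));
            }
            return true; // keep listening for further events
        };

        // EVT_NODE_LEFT covers orderly shutdown, EVT_NODE_FAILED covers crashes.
        ignite.events().localListen(onNodeGone,
            EventType.EVT_NODE_LEFT, EventType.EVT_NODE_FAILED);
    }
}
```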

Build a dynamic client list at build time

I'm struggling to find a solution for the following issue:
Let's say I have a 'server' process in the kernel that centralizes reports from all 'client' processes that signed up with it. This 'server' process will report to the other CPUs in the system when all 'client' processes have finished their boot and are ready.
A 'client' in my example is any process in my system that wishes to register as one the 'server' must wait for until it has finished booting.
My problem is that the entire registration above must be done at build time, because otherwise I am vulnerable to races such as the following:
Let's say my 'server' process finished its initial boot and is ready, and it was the first process to boot in the system. In that case, if another CPU queries it, it will respond that all 'clients' are ready (even though none have registered). By the time the other 'clients' boot and register with it, it will be too late.
I want to build a generic solution, so that once I have finished building my environment, the 'server' process will 'know' how many 'clients' should sign up during system boot.
Any ideas here?
Thank you all
Here is what I have understood:
you want to build a service that will report whether other clients are up or not
you want the list of clients to be dynamic, i.e. a client can register or unregister at will
you want the list of clients to be persistent: the service should know the current list of clients immediately after each boot
A common way for that kind of requirement is to use a persistent database where the client can register (add one line) or unregister (delete their own line). The service has then only to read the database at boot time or on each request.
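For example, a minimal SQLite sketch of such a registry (table and column names are illustrative, not from the question); the last query also avoids the race described above, since it never reports "all ready" while the table is empty:

```sql
-- One row per expected client, written at build/deploy time.
CREATE TABLE IF NOT EXISTS clients (
    client_id TEXT PRIMARY KEY,        -- unique name of the client process
    ready     INTEGER NOT NULL DEFAULT 0
);

-- Register a client (add one line) / unregister it (delete its own line).
INSERT OR IGNORE INTO clients (client_id) VALUES ('client-a');
DELETE FROM clients WHERE client_id = 'client-b';

-- A client marks itself ready once its boot completes.
UPDATE clients SET ready = 1 WHERE client_id = 'client-a';

-- The server reports ready only if at least one client is registered
-- and every registered client has checked in.
SELECT COUNT(*) > 0 AND COUNT(*) = SUM(ready) AS all_ready FROM clients;
```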
You can then decide:
whether you want to use a simple file, a lite database (SQLite) or a full database (PostgreSQL, MariaDB, ...)
whether you want to read the database on each and every query, or have the server cache the current state
in case of caching, whether you can accept non-accurate responses and just refresh the state when it is older than n seconds, or whether you need immediate synchronization (the database is read at boot, but registrations go to the service, which writes the database back to persistent storage); that last way is more accurate, but registration is only possible while the service is up
Depending on your actual requirements you can then imagine more clever solutions, but the above should help you start.

Does boltdb support concurrent queries for reading and updating the db?

I am currently using boltdb to store various entries in a bucket.
How can I use goroutines and channels to read from and update the db?
Generally, yes you can, provided you pay attention to the following points (see the sketch after this list):
all accesses should be done in their own transactions. Transactions should not be shared between goroutines (whether they are read-only or read-write).
boltdb only tolerates one writer at a given point in time. If multiple concurrent transactions try to write at the same time, they will be serialized. The consistency of the database is guaranteed, but it has an impact on the performance, since write operations cannot be parallelized.
read-only transactions are executed concurrently (and potentially parallelized).
open only one transaction in a given goroutine at the same time to avoid deadlock situations.
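A minimal Go sketch of that pattern, using the maintained go.etcd.io/bbolt fork (the original github.com/boltdb/bolt exposes the same API); the file and bucket names are just for illustration:

```go
package main

import (
	"fmt"
	"log"

	bolt "go.etcd.io/bbolt"
)

func main() {
	db, err := bolt.Open("entries.db", 0600, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Create the bucket up front in its own read-write transaction.
	if err := db.Update(func(tx *bolt.Tx) error {
		_, err := tx.CreateBucketIfNotExists([]byte("entries"))
		return err
	}); err != nil {
		log.Fatal(err)
	}

	done := make(chan struct{})

	// Writer goroutine: each Update call is one read-write transaction.
	// Concurrent writers are serialized by boltdb, never interleaved.
	go func() {
		defer close(done)
		for i := 0; i < 10; i++ {
			key := fmt.Sprintf("key-%d", i)
			if err := db.Update(func(tx *bolt.Tx) error {
				return tx.Bucket([]byte("entries")).Put([]byte(key), []byte("value"))
			}); err != nil {
				log.Println("write:", err)
			}
		}
	}()

	// Reader: View opens a read-only transaction, which can run concurrently
	// with other readers and with the single writer. It sees a consistent
	// snapshot, so it may observe any prefix of the writes above.
	if err := db.View(func(tx *bolt.Tx) error {
		return tx.Bucket([]byte("entries")).ForEach(func(k, v []byte) error {
			fmt.Printf("%s = %s\n", k, v)
			return nil
		})
	}); err != nil {
		log.Println("read:", err)
	}

	<-done // wait for the writer goroutine via the channel
}
```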
