Snowflake query profile: network communication [closed]

In the Snowflake query "Profile" in the History tab, what is the significance of the values we see for "Network"?
As per the documentation:
"Network — network communication:
Bytes sent over the network — amount of data sent over the network."
Does this cost us? If it does, is it billed under the compute cost (credits)?
Or is it billed under storage?
Or does it indirectly add to compute cost because the query runs longer due to a high number of bytes sent over the network?
What is the recommendation for keeping it low, if a high number is not good?

I think the two links below will help you understand the billing side of Snowflake. The second link talks about when data transfer will cost you and how. The Network section of the query profile provides just statistics on data movement; the billing components are defined clearly in these links.
https://docs.snowflake.com/en/user-guide/admin-usage-billing.html
https://docs.snowflake.com/en/user-guide/billing-data-transfer.html
Note on data transfer from the Snowflake documentation:
Note that Snowflake does not charge data ingress fees; however, contact your cloud storage provider (Amazon S3, Google Cloud Storage, or Microsoft Azure) to determine whether they apply data egress charges to transfer data from their network and region of origin to the cloud provider's network and region where your Snowflake account is hosted.
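
If you want to check whether your account actually incurs data transfer, a minimal sketch using the Python connector to query the SNOWFLAKE.ACCOUNT_USAGE.DATA_TRANSFER_HISTORY view (the connection parameters are placeholders, and your role needs access to the ACCOUNT_USAGE share):

# Sketch: summarize the last 30 days of data transfer per cloud/region.
# Assumes snowflake-connector-python; credentials are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<your_account>",
    user="<your_user>",
    password="<your_password>",
)
try:
    cur = conn.cursor()
    cur.execute("""
        SELECT source_cloud, source_region,
               target_cloud, target_region, transfer_type,
               SUM(bytes_transferred) AS total_bytes
        FROM snowflake.account_usage.data_transfer_history
        WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
        GROUP BY 1, 2, 3, 4, 5
        ORDER BY total_bytes DESC
    """)
    for row in cur.fetchall():
        print(row)
finally:
    conn.close()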

Related

Snowflake Virtual Warehouse usage consideration [closed]

When should you consider disabling auto-suspend for a Virtual Warehouse?
A. When users will be using compute at different times throughout a 24/7 period
B. When managing a steady workload
C. When the compute must be available with no delay or lag time
D. When you don’t want to have to manually turn on the Warehouse each time a user needs it
If we have to choose two options, "B" is perfect, but how about "C" or "D"?
The most common reason to disable auto-suspend is in situations where you don't want to lose the cache that exists on the warehouse, which would add additional query time to the first few queries that are executed after a warehouse resumes. In essence, when query time is critical and must be consistent, you'd potentially want to disable auto-suspend and allow the warehouse to stay up and running (either 24/7 or during a specified time period - using a task to suspend or resume the warehouse explicitly).
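
As a rough illustration of that pattern, here is a sketch (via the Python connector; the warehouse name, task names, and cron schedules are all made up) that disables auto-suspend and uses serverless tasks to resume and suspend the warehouse on a fixed schedule:

# Sketch: keep a warehouse warm during business hours.
# AUTO_SUSPEND = 0 disables auto-suspend entirely; all names and
# schedules below are examples, not prescriptions.
import snowflake.connector

conn = snowflake.connector.connect(account="<acct>", user="<user>", password="<pwd>")
for stmt in [
    "ALTER WAREHOUSE reporting_wh SET AUTO_SUSPEND = 0",
    # Serverless task: resume weekdays at 07:00...
    """CREATE OR REPLACE TASK resume_reporting_wh
         SCHEDULE = 'USING CRON 0 7 * * MON-FRI America/New_York'
       AS ALTER WAREHOUSE reporting_wh RESUME IF SUSPENDED""",
    # ...and suspend again at 19:00.
    """CREATE OR REPLACE TASK suspend_reporting_wh
         SCHEDULE = 'USING CRON 0 19 * * MON-FRI America/New_York'
       AS ALTER WAREHOUSE reporting_wh SUSPEND""",
    # Tasks are created suspended; enable them.
    "ALTER TASK resume_reporting_wh RESUME",
    "ALTER TASK suspend_reporting_wh RESUME",
]:
    conn.cursor().execute(stmt)
conn.close()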

How do I know which requirements my server needs for my database? [closed]

I need to reduce the cost of my servers.
The current server hosts the web application and the database. Its general technical characteristics are:
5 TB disk
16 CPUs
30 GB RAM
The current application does not exceed 100 users per day, and the database weighs approximately 30 GB. For these reasons the current server expense seems too high to me. I checked DigitalOcean droplets and propose to put the web application on 8 CPUs, 16 GB RAM, and 1 TB disk, and the database on 4 CPUs, 16 GB RAM, and a 240 GB solid-state disk. But how can I know whether what I am proposing is correct before making the server change? Thanks for your help.
Nobody can tell you the size of equipment you would use because this is totally dependent upon your application and how it is used.
For example, some applications are CPU-heavy (doing video processing), some require a lot of memory (keeping lots of data accessible), some are disk-dependent (doing database transactions). The "shape" of every application is different. Your application might be able to take advantage of multiple CPUs, or perhaps it is only using some of them.
Also, how users interact with every application is different. People use Facebook differently from a banking app, and differently again from a gaming website. If your 100 users are all in the same timezone, then usage will be different from having them spread around the world.
The only way to know what specification is required is either to monitor live traffic on the website and observe how CPU, RAM, and disk are used, or to simulate levels of traffic that reproduce what users typically do on the website and then measure the behaviour of the system.
Then there is the question of reliability. Consider whether you are willing to run everything on one server, where the app might be unavailable if something goes wrong. Or perhaps you need high availability to ensure uptime, but at a greater cost.
Since you appear to be cost-conscious, I would recommend:
Monitor your current system (CPU, RAM, network, disk) during periods of normal usage (a minimal sketch follows this list).
If some aspects seem over-provisioned, then reduce them (e.g. if CPU never goes above 40%, reduce it). Check whether all CPUs are being used.
Use some form of continual monitoring to notify you when the application is not behaving as desired.
Keep log files to allow you to deep-dive into what might be causing application problems.
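
For the first point, a minimal sampling sketch (psutil is an assumption on my part; any monitoring agent, or even top/iostat, gives you the same data):

# Sketch: sample CPU, memory, and disk once a minute during normal usage.
# Note: disk_io_counters() values are cumulative since boot.
import time
import psutil

while True:
    per_cpu = psutil.cpu_percent(interval=1, percpu=True)   # % busy per core
    mem = psutil.virtual_memory()
    disk = psutil.disk_usage("/")
    io = psutil.disk_io_counters()
    print(f"cpu={per_cpu} mem={mem.percent}% disk={disk.percent}% "
          f"read={io.read_bytes / 1e6:.0f} MB written={io.write_bytes / 1e6:.0f} MB")
    time.sleep(60)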

Best practices for draining or clearing a Google Cloud pubsub topic [closed]

For Pub/Sub topics with a number of messages in the range of ~100k, what is the best practice for draining/dropping/clearing/deleting all messages using the gcloud-java SDK?
Possible solutions:
Deleting and recreating the subscribers and then the publishers
High concurrency pull+ack (easy to hit the quota this way)
Something else
My hope is that this process can be fast (not more than ~60 seconds, say), robust, and uses supported SDK methods with minimal other code.
Update with description of snapshot and seek feature:
One can use seek on a Pub/Sub subscription to ack older messages by seeking to a timestamp corresponding to now. The best way is via the gcloud command line tool. The command to acknowledge messages published up to a particular timestamp would be:
gcloud pubsub subscriptions seek <subscription path> --time=yyyy-mm-ddThh:mm:ss
To delete all messages up to now:
gcloud pubsub subscriptions seek <subscription path> --time=$(date +%Y-%m-%dT%H:%M:%S)
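Since the question asks about the SDK: the same "acknowledge everything up to now" seek is available in the client libraries as well. A sketch with the Python client (the Java client exposes an analogous seek call; project and subscription IDs are placeholders):

# Sketch: seek the subscription to the current time, which acks all
# previously published messages. Assumes google-cloud-pubsub is installed.
from datetime import datetime, timezone
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "my-subscription")

subscriber.seek(request={"subscription": subscription_path,
                         "time": datetime.now(timezone.utc)})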
Previous answer prior to the addition of snapshot and seek:
Currently, Google Cloud Pub/Sub has no way to clear older messages, though it is something we are looking to add. Deleting and recreating the subscription would be the most efficient way to clear it, both in terms of time and cost. You wouldn't have to do anything with your publishers; any messages published after the recreation will be sent to subscribers on the recreated subscription.

GAE: Messenger monthly pricing [closed]

I'm trying to estimate the pricing for a messenger iOS/Android app running under Google App Engine. However, the pricing info here doesn't really give me any usable information.
I plan on building the messenger via the XMPP API, but I will also share photos, conduct searches, etc. I'd be able to calculate/guess the number of messages, their sizes, and the sizes of the photos; however, I don't really understand the concept of Frontend Instance hours or how to estimate them.
Data to calculate with
100 000 users (profile with images - together approx. 3 MB)
80 messages per user per day
2 photos per user per day (photo size: 200 KB)
Services (APIs)
Datastore API/High replication data storage (for sending pictures, user profiles)
Search API (searching for users)
Blobstore API (to store images in the user's profile)
XMPP API (messenger)
Now here is where it gets tricky for me... Does the XMPP per stanza price also add to the Instance Hours and Out/In network traffic? And what about the other APIs?
If I can approximate the number of requests, can I calculate the number of needed Instance hours?
The most important thing to realise - do I need to calculate the HOSTING (in/out traffic, instance hours, datastore) and API (XMPP, Search, Datastore) prices separately or are they inclusive? Meaning that for example for every message I will be charged twice, the XMPP stanza price and the in/out traffic price, or does the XMPP stanza price already contain all other costs (traffic+instance hours)?
As far as I have researched, this isn't really explained in any of the documents. Most others trying to estimate the price just ended up with the trial & error method - launching their system and observing the approximate monthly price... However, that is not good enough for me :)
Out/In network traffic
XMPP:
https://developers.google.com/appengine/docs/python/xmpp/#Python_Quotas_and_limits
Network quota:
https://developers.google.com/appengine/docs/quotas#Requests
When the application executes your code, it consumes Instance Hours. The number of Instance Hours depends on how complex the code is. The maximum HTTP request duration for a Frontend Instance is 60 seconds.
So, HOSTING and API prices are charged separately.
Tip 1: Start by calculating the cost of a single event (photo, message, etc.), as in the sketch below.
Tip 2: From my experience, the most difficult part is estimating Datastore reads/writes and storage (in our app a huge part of the storage is indexes).
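
To illustrate Tip 1 with the numbers from the question, a back-of-the-envelope sketch (every per-unit price below is a placeholder, not a real rate; substitute the current values from the pricing page, and note this ignores Instance Hours and Datastore operations):

# Sketch: daily volume and a rough traffic/stanza cost estimate.
USERS = 100_000
MESSAGES_PER_USER_PER_DAY = 80
PHOTOS_PER_USER_PER_DAY = 2
PHOTO_SIZE_KB = 200

PRICE_PER_STANZA = 0.00001   # $ per XMPP stanza -- placeholder
PRICE_PER_GB_OUT = 0.12      # $ per GB outgoing traffic -- placeholder

stanzas_per_day = USERS * MESSAGES_PER_USER_PER_DAY                       # 8,000,000
photo_gb_per_day = USERS * PHOTOS_PER_USER_PER_DAY * PHOTO_SIZE_KB / 1e6  # 40 GB

daily_cost = (stanzas_per_day * PRICE_PER_STANZA
              + photo_gb_per_day * PRICE_PER_GB_OUT)
print(f"{stanzas_per_day:,} stanzas/day, {photo_gb_per_day:.0f} GB photos/day "
      f"=> ~${daily_cost:,.2f}/day at the placeholder rates")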

Is it true that a VPS's disk performance doesn't scale well (for Solr)? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 11 years ago.
Improve this question
I plan to run Solr cores for multiple clients. I thought Linode offers solid performance, stability and a scalable upgrade plan.
Solr is a high-performance full-text search engine, written as a REST app in JSP. A source whom I look up to says that disk-intensive apps don't scale as gracefully as CPU-intensive tasks in a VPS environment. So, past a "point", if I am serving twice as many clients, I should order twice as many instances rather than upgrading them. Or I need cloud services with load balancing, such as EC2, where multiple instances scale gracefully.
Is this true, especially for a modern host like Linode?
It is true that Solr is a disk-IO-intensive application. We run Websolr in Amazon EC2 and have spent a lot of time benchmarking and tuning both their instance stores and EBS RAID devices in order to get the best price to performance ratio that we can.
Really, the best answer for you here will be to benchmark your disk performance and compare a few different setups. Measure IO operations per second, read and write bandwidth, and create some timed Solr benchmarks as well (large reindexing tasks, large search volume, etc.). Compare those across some different setups and see which one gives you the most favorable price-to-performance ratio.
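
For the timed search benchmark, a minimal sketch (the Solr URL, core name, and queries are made up, and the requests library is assumed to be installed):

# Sketch: measure median and p95 latency for a fixed query mix.
import time
import requests

SOLR_URL = "http://localhost:8983/solr/my_core/select"  # hypothetical core

def run_benchmark(queries, repeats=100):
    latencies = []
    for _ in range(repeats):
        for q in queries:
            start = time.perf_counter()
            requests.get(SOLR_URL, params={"q": q, "rows": 10}, timeout=30)
            latencies.append(time.perf_counter() - start)
    latencies.sort()
    p50 = latencies[len(latencies) // 2]
    p95 = latencies[int(len(latencies) * 0.95)]
    print(f"median={p50 * 1000:.1f} ms  p95={p95 * 1000:.1f} ms")

run_benchmark(["title:linux", "body:performance", "*:*"])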
