Snowflake Automation Tasks in your current or previous organization [closed] - snowflake-cloud-data-platform

Just curious to know what automations you have built in your Snowflake job (maybe in your current organization or a previous one).
Automation or BI-related tasks.
Regards

You can automate almost anything. I have worked on tasks such as:
orchestration of ETL/ELT processes
copying/moving files on an external stage
loading data directly from e-mail messages
alarms and automatic notifications about Snowflake charges and costs (see the sketch after this list)
automated granting of RBAC permissions
creating new copies of environments
database replication
data backup to AWS S3 (a customer requirement)
CI/CD processes
and much, much more
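As an illustration of the cost-alert item above, here is a minimal sketch using the snowflake-connector-python package. The account placeholders, the warehouse name ETL_WH, the monitor name ETL_CREDIT_MONITOR, and the 100-credit quota are all assumptions made for this example, not details from the original setup.

```python
# Minimal sketch of a credit/cost alert, assuming snowflake-connector-python
# and hypothetical names (ETL_WH, ETL_CREDIT_MONITOR) and quota values.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",      # placeholder
    user="automation_user",    # placeholder
    password="***",            # placeholder; use key-pair auth in real setups
    role="ACCOUNTADMIN",
)
cur = conn.cursor()

# Notify at 80% of a 100-credit quota and suspend the warehouse at 100%.
cur.execute("""
    CREATE OR REPLACE RESOURCE MONITOR ETL_CREDIT_MONITOR
      WITH CREDIT_QUOTA = 100
      TRIGGERS ON 80 PERCENT DO NOTIFY
               ON 100 PERCENT DO SUSPEND
""")
cur.execute("ALTER WAREHOUSE ETL_WH SET RESOURCE_MONITOR = ETL_CREDIT_MONITOR")

cur.close()
conn.close()
```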
If you are looking for a good automation tool, I strongly recommend AutomateNOW!.
AutomateNOW! lets you communicate natively with Snowflake, so you can manage and monitor Snowflake processes such as stored procedures, queries, DML statements, and so on. AutomateNOW! is also useful for reducing the cost of Snowflake usage: it can dynamically decide what size of warehouse should be launched, and once the process is finished it can resize the warehouse or suspend it. In addition, the output of the EXPLAIN plan of Snowflake queries can be parsed on the fly in order to manage the Snowflake workload. I see enormous potential when Snowflake is one component of a multi-technology ecosystem and the whole process, or chain of processes, needs to be managed.
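The paragraph above describes a resize-then-suspend pattern. This is not AutomateNOW!'s own mechanism, but a rough sketch of the pattern with snowflake-connector-python might look like the following, where the warehouse name ETL_WH and the nightly_load() procedure are hypothetical.

```python
# Illustrative sketch of the dynamic resize/suspend pattern described above,
# assuming snowflake-connector-python and a hypothetical warehouse ETL_WH.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="automation_user", password="***",  # placeholders
    warehouse="ETL_WH",
)
cur = conn.cursor()

try:
    # Scale up just for the heavy job...
    cur.execute("ALTER WAREHOUSE ETL_WH SET WAREHOUSE_SIZE = 'XLARGE'")
    cur.execute("CALL nightly_load()")  # hypothetical stored procedure
finally:
    # ...then scale back down and stop paying for idle compute.
    cur.execute("ALTER WAREHOUSE ETL_WH SET WAREHOUSE_SIZE = 'XSMALL'")
    cur.execute("ALTER WAREHOUSE ETL_WH SUSPEND")
    cur.close()
    conn.close()
```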

Related

Firebase database simultaneous connections [closed]

Firebase says it can have only 100k users simultaneously on the Spark plan. It also says "per database". What does that mean? How can I store data in multiple databases and connect them to each other? It also states 1 GB data stored. How much will that be, approximately? Say one user's data has 10 children; how many users' data can be stored in that space? Someone please help me out, as Google isn't very clear about it.
I'm going to assume you're talking about Realtime Databases and not Cloud Firestore.
The Firebase Spark "Free" plan includes 100 simultaneous users, not 100k. (100k+ users are supported on the Flame and Blaze plans.)
You can store 1 GB worth of data in the Realtime Database, and download 100 GB worth per month. This plan supports only one database per project, so connecting multiple databases isn't possible.
It's hard to determine how much "storage" your data would take up, due to varying factors, but a good rule of thumb is that most JSON data doesn't take up a lot of space, so you should be good.
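To make that rule of thumb concrete, here is a small back-of-the-envelope sketch in Python. The sample record with 10 child fields is entirely hypothetical, and the estimate ignores keys, paths, and database overhead, so treat it only as a rough order of magnitude.

```python
import json

# Hypothetical user record with 10 child fields, matching the question's example.
sample_user = {
    "name": "Jane Example",
    "email": "jane@example.com",
    "age": 29,
    "city": "Berlin",
    "phone": "+49-000-0000000",
    "signup_ts": 1700000000,
    "plan": "free",
    "score": 42,
    "verified": True,
    "bio": "Short bio text for sizing purposes.",
}

record_bytes = len(json.dumps(sample_user).encode("utf-8"))
quota_bytes = 1 * 1024**3  # 1 GB storage limit on the Spark plan

print(f"~{record_bytes} bytes per user")
print(f"~{quota_bytes // record_bytes:,} such records fit in 1 GB (ignoring keys and overhead)")
```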
To clarify, "simultaneous users" is simply the number of users that can access a single database (via any interface or platform) at the same time.
There's great documentation on the features and pricing of Firebase here, and I would also recommend reading some of their documentation on the Realtime Database.
I hope this helps; if you need any more help, please let me know.

Framework for partial bidirectional database synchronization [closed]

I'm trying to optimize the backend for an information system for high-availability, which involves splitting off a part needed for time-critical client requests (front office) from the rest (back office).
Front office will have redundant application servers with load balancing for maximum performance and will use a database with pre-computed data. Back office will periodically prepare data for the front office based on client statistics and some external data.
A part of the data schema will be shared between the back and front office, but not the whole databases, only parts of some tables. The data will not need to correspond at all times; it will be synchronized between the two databases periodically. Continuous synchronization is also viable, but there is no real-time consistency requirement, and it seems that batch-style synchronization would be better in terms of control, debugging, and backup possibilities. I expect no need to resolve conflicts, because the data will mostly grow and change only on one side.
The solution should allow defining corresponding tables and columns and then insert/update new/changed rows. It should ideally use a data model defined in Groovy classes (probably through annotations?), as both applications run on Grails. The synchronization may use the existing Grails web applications or run externally, maybe even on the database server alone (PostgreSQL).
There are systems for replicating whole mirrored databases, but I wasn't able to find any solution suiting my needs. Do you know of an existing framework to help with this, or is building my own the only possibility?
I ended up using Londiste from SkyTools. The project page on the pgFoundry site lists quite old binaries (and is currently down), so you had better build it from source.
It's one-direction (master-slave) only, so you have to set up two synchronization instances for bidirectional sync. Note that each instance consists of two Londiste binaries (master and slave worker) and a ticker daemon that pushes the changes.
To reduce synchronization traffic, you can extend the polling period (1 second by default) in the configuration file, or even turn it off completely by stopping the ticker and then triggering the sync manually by running the SQL function pgq.ticker on the master.
I solved the issue of partial column replication by writing a simple custom handler (a londiste.handler.TableHandler subclass) with the column mapping configured in the database. The mapping configuration is not model-driven (yet), as I originally planned, but I only need to replicate common columns, so this solution is sufficient for now.
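For readers who just want the column-mapping idea, here is a rough standalone sketch using psycopg2 rather than the actual Londiste handler; the products table, its columns, and the updated_at change-tracking column are assumptions chosen for illustration.

```python
# Not the actual Londiste handler: a minimal sketch of partial column replication
# between two PostgreSQL databases, assuming psycopg2 and hypothetical table names.
import psycopg2

# Mapping of source columns -> target columns for the shared part of the schema.
COLUMN_MAP = {"id": "id", "title": "title", "price_front": "price"}

def sync_changed_rows(src_dsn, dst_dsn, since):
    src = psycopg2.connect(src_dsn)
    dst = psycopg2.connect(dst_dsn)
    src_cols = ", ".join(COLUMN_MAP.keys())
    dst_cols = ", ".join(COLUMN_MAP.values())
    placeholders = ", ".join(["%s"] * len(COLUMN_MAP))
    updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in COLUMN_MAP.values() if c != "id")

    with src.cursor() as cur:
        # 'updated_at' is an assumed change-tracking column on the source table.
        cur.execute(f"SELECT {src_cols} FROM products WHERE updated_at > %s", (since,))
        rows = cur.fetchall()

    # Upsert only the mapped columns into the target table.
    with dst, dst.cursor() as cur:
        cur.executemany(
            f"INSERT INTO products ({dst_cols}) VALUES ({placeholders}) "
            f"ON CONFLICT (id) DO UPDATE SET {updates}",
            rows,
        )

    src.close()
    dst.close()
```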

Does Microsoft have a similar product like Google BigQuery? [closed]

I want to see whether Microsoft provides a service similar to Google BigQuery.
I want to run some queries on a database of roughly 15 GB, and I want the service to be in the cloud.
P.S.: Yes, I have Googled already but did not find anything similar.
The answer to your question is NO: Microsoft does not (yet) offer a real-time big data query service where you pay as you perform queries. That does not mean you won't find a solution to your problem in Azure.
Depending on your needs, you have two options on Azure:
SQL Data Warehouse: a new Azure-based columnar database service, in preview (http://azure.microsoft.com/fr-fr/documentation/services/sql-data-warehouse/), which according to Microsoft can scale up to petabytes. Assuming that your data is structured (relational) and that you need sub-second response times, it should do the job you expect.
HDInsight is a managed Hadoop service (https://azure.microsoft.com/en-us/documentation/articles/hdinsight-component-versioning/) which can deal better with semi-structured data but is more oriented toward batch processing. It includes Hive, which is also SQL-like, but you won't get instant query response times. You could go for this option if you expect to do calculations in batch mode and store the aggregated result set somewhere else.
The main difference between these products and BigQuery is the pricing model: in BigQuery you pay as you perform queries, whereas with the Microsoft options you pay for the resources you allocate, which can be very expensive if your data is really big.
I think that if the expected usage is occasional, BigQuery will be much cheaper; the Microsoft options will be better for intensive use, but of course you will need to do a detailed price comparison to be sure.
To get an idea of what BigQuery really is, and how it compares to a relational database (or Hadoop for that matter), take a look at this doc:
https://cloud.google.com/files/BigQueryTechnicalWP.pdf
Take a look at this:
http://azure.microsoft.com/en-in/solutions/big-data/.
Reveal new insights and drive better decision making with Azure HDInsight, a Big Data solution powered by Apache Hadoop. Surface those insights from all types of data to business users through Microsoft Excel.

Should I be tracking slowly changing dimensions in a Relational/transactional database? [closed]

Let's use the example of a Human Resources database. The transactional database that the HR personnel use on a day-to-day basis handles all of the hiring and firing that takes place on a daily basis. There is also a Dimensional Data Warehouse that pulls from that transactional database.
Assuming that latency is sufficiently low, which of the following arguments would be considered "best practice"?
1) The transactional database should only keep track of how the data currently is. It shouldn't have to keep track of slowly changing data (for example, the history of which managers a specific employee has had, how their salary has evolved over time, etc.). The ETL process should detect changes in the transactional database and update slowly changing dimensions in the data warehouse.
2) The transactional database is more than capable of tracking its own historical information. If something were to change twice between ETL sessions, you would lose the first change forever. The main purpose of the dimensional database is efficient query performance in reports, so it is still doing its job. This also allows the ETL process to be faster and simpler.
I feel like both arguments have merits, and if they are both valid arguments, I am happy to simply choose between them.
Am I missing something that isn't being taken into consideration?
Is one of the arguments flat-out wrong?
I think what @marek-grzenkowicz said is correct. If the business requirements of the HR transactional/operational system state that a history of changes is required, then it belongs in the transactional/operational system. Likewise, if the business requirements state that this history (or perhaps a history at a different level of granularity) is required in the warehouse, the warehouse would store it as well. It is possible that the histories are stored in both systems.
I too recommend "The Data Warehouse Toolkit". I'm reading it now and it seems to have a lot of time- and field-tested design patterns for modeling your data. The 3rd edition of the book came out just a couple of weeks ago.
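As a concrete picture of what the warehouse-side tracking in argument 1 means, here is a minimal, self-contained sketch of a Type 2 slowly changing dimension in Python; the employee attributes, names, and dates are invented purely for illustration.

```python
from datetime import date

# Toy dimension table kept as a list of dicts; effective_to=None marks the current row.
dim_employee = [
    {"employee_id": 7, "manager": "Alice", "salary": 60000,
     "effective_from": date(2020, 1, 1), "effective_to": None},
]

def apply_scd2_change(dim, employee_id, changes, as_of):
    """Close the current row and append a new version carrying the changed attributes."""
    current = next(r for r in dim
                   if r["employee_id"] == employee_id and r["effective_to"] is None)
    if all(current[k] == v for k, v in changes.items()):
        return  # nothing changed, keep the current version
    current["effective_to"] = as_of
    new_row = {**current, **changes, "effective_from": as_of, "effective_to": None}
    dim.append(new_row)

# The employee gets a new manager and a raise; history of both versions is preserved.
apply_scd2_change(dim_employee, 7, {"manager": "Bob", "salary": 65000}, date(2023, 6, 1))
for row in dim_employee:
    print(row)
```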

Best way to archive sql data and show in web application whenever required [closed]

I have around 10 tables containing millions of rows. Now I want to archive 40% of the data due to size and performance problems.
What would be the best way to archive the old data and keep the web application running? And what if, in the near future, I need to show the old data along with the existing data?
Thanks in advance.
There is no single solution for every case. It depends a lot on your data structure and application requirements. The most common cases seem to be as follows:
If your application can't be redesigned and instant access is required to all your data, you need a more powerful hardware/software solution.
If your application can't be redesigned but some of your data can be considered obsolete because it is requested relatively rarely, you can split the data and configure two applications to access different data sets.
If your application can't be redesigned but some of your data can be considered insensitive and can be minimized (consolidated, packed, etc.), you can perform some data transformation while keeping the full data in another place for special requests.
If it's possible to redesign your application, there are many ways to solve the problem. In general you would implement some kind of archive subsystem, which is a complex problem in itself, especially if not only your data but also your data structure changes over time.
If it's possible to redesign your application, you can optimize your data structure using new supporting tables, indexes, and other database objects and algorithms.
Create an archive database and, if possible, maintain a separate archive server: this data won't be needed much, but it still has to be kept for future purposes, and moving it reduces load and storage pressure on the main server.
Move the old tables' data to that location. Later you can retrieve it back in a number of ways (a rough sketch follows below):
changing the path in the application
or updating the live table from the archive table
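Here is a minimal sketch of the archive-table-plus-view idea using Python's standard-library sqlite3 module; the orders table, the created_at column, and the cutoff date are assumptions chosen for illustration, and the same pattern translates to other SQL databases.

```python
# Minimal sketch: move old rows into an archive table, then expose a combined view
# for the rare cases where the web application must show old and current data together.
import sqlite3

conn = sqlite3.connect("app.db")
cutoff = "2022-01-01"  # assumed archival cutoff

with conn:
    # Demo schema so the sketch runs standalone; a real application already has its tables.
    conn.execute("CREATE TABLE IF NOT EXISTS orders "
                 "(id INTEGER PRIMARY KEY, created_at TEXT, total REAL)")
    conn.execute("CREATE TABLE IF NOT EXISTS orders_archive "
                 "(id INTEGER PRIMARY KEY, created_at TEXT, total REAL)")

    # Move old rows into the archive, then remove them from the live table.
    conn.execute("INSERT OR IGNORE INTO orders_archive "
                 "SELECT * FROM orders WHERE created_at < ?", (cutoff,))
    conn.execute("DELETE FROM orders WHERE created_at < ?", (cutoff,))

    # Combined view for the occasional "show old data along with existing" request.
    conn.execute("""
        CREATE VIEW IF NOT EXISTS orders_all AS
        SELECT * FROM orders
        UNION ALL
        SELECT * FROM orders_archive
    """)
```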
