Can Snowflake be used to mitigate application failure for business continuity? - snowflake-cloud-data-platform

I would like your opinions or experiences around the following possible solution idea. I know Snowflake is primarily a data analytics platform. But why could we not use it for some creative scenarios like business continuity?
Problem
Imagine an application that supports a critical business process. There is a risk that the application could become unavailable for an extended period. The application in this case is a SaaS solution by a reputable vendor, Salesforce. So it does not go down often. And when it does, they normally restore it in less than a day. But the business process is a critical medical logistics process - meaning if a transaction is delayed for a few days, lives may be lost.
Background
Our transaction volumes are moderate. We probably serve 25 new patients per day, with a few hundred interactions each day to support those. In the event of an outage, a subset of those might need immediate manual intervention to keep things moving. Others might be able to wait a couple of days.
We already use Snowflake to store replicas of the application's data. We use Looker to write analytics reports.
Proposed Solution
Write reports that expose critical data that may be needed if the primary application fails. Then, when the primary application fails, users can view reports using the latest replicated data to enable manual activities to keep things going until the primary application is restored to working order.
If data changes are needed, they must be written down somewhere and then applied to the application when its availability is restored.

Your only issue could be latency: as of today, Snowflake is built for OLAP workloads, not OLTP workloads.
If the latency you get when running queries against Snowflake is acceptable, then you have a valid use case.
Snowflake is used as an application backend, particularly when the queries are historical analysis and a latency of a few seconds, rather than an immediate response, is acceptable.
See: https://www.snowflake.com/workloads/data-applications/

Related

Best practice or design to scale out/horizontal scale database for microservices

The main benefit of microservices is that one service "type" can be scaled out by running multiple container instances behind a load balancer to improve throughput.
However, multiple instances (i.e. containers) of a service type share the same database instance, and this could lead to a performance bottleneck when many instances read from and write to that database instance.
Traditionally, we would scale up the processing power of that database instance to meet high demand.
My main question is: what is the current best practice, design, or solution for scaling out (horizontally scaling) so that we can run multiple instances of that database and gain performance?
In particular, what I want to achieve is:
If one instance is down, another instance can handle the load -> high availability
Load-balance reads, or maybe even writes, across multiple database instances
Maintain the persistence and consistency of data in case I want to create more database instances
To my knowledge,
One solution is Microsoft SQL Server's high availability for SQL Server containers, which can meet most of the requirements above (https://learn.microsoft.com/en-us/sql/linux/sql-server-linux-container-ha-overview?view=sql-server-2017). But I wonder whether there is a better solution that avoids technology lock-in?
Another solution I'm thinking of is to replicate to multiple instances by streaming CDC data from a master database instance to multiple replicas. This allows reads to be served from the replicas.
But I'm still not convinced, because to guarantee consistency every service instance would have to write to the master database instance, which could also lead to a bottleneck on the master database instance.
At a broad level, there are 3 possible database architectures:
Single leader (e.g. RDBMS)
Multi leader (e.g. RDBMS in multiple DC)
Leaderless (e.g. Riak, Cassandra)
As you go from top to bottom in the above list, horizontal scalability potential increases, but consistency becomes weaker.
Scalability potential increases because more nodes can accept writes as you go down the list. Consistency becomes weaker because writes take time to propagate or replicate to all nodes responsible for the data. Conflicts arise when the same record is written on two different nodes at almost the same time, so at replication time the system does not know which version is correct.
There are various conflict resolution strategies, and different databases use different ones. You need to study these strategies to understand which one suits your use case, and pick your DB based on that.
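As one illustration of a conflict resolution strategy, here is a minimal sketch of last-write-wins (LWW), the timestamp-based approach that Cassandra, for example, applies. The `Versioned` type and `merge_lww` function are hypothetical names made up for this example, not any database's API, and the node-id tie-breaker is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: int  # write time supplied by the client or coordinator
    node_id: str    # assumed tie-breaker when timestamps are equal


def merge_lww(a: Versioned, b: Versioned) -> Versioned:
    """Return the winning version: higher timestamp, then higher node_id."""
    return max(a, b, key=lambda v: (v.timestamp, v.node_id))


# Two nodes accepted different writes for the same record at almost the same time.
replica_1 = Versioned("address=Oak St", timestamp=1000, node_id="n1")
replica_2 = Versioned("address=Elm St", timestamp=1003, node_id="n2")

winner = merge_lww(replica_1, replica_2)
print(winner.value)
```

The merge gives the same result regardless of argument order, so every node converges on the same value. The trade-off is that the losing write is silently discarded, which is why it matters to check whether this strategy suits your use case.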
There is always a trade-off when making these choices. Every database has its limitations, and apart from scaling the database you can avoid a performance hit by following simple best practices. You can't leave it to the database alone to handle a high request rate. Keep in mind that scaling a database is an expensive option, and you will eventually hit database limits if it is not done right, so plan the whole system rather than just the database.
Coming to your point: having one master for writes and a slave for reads is a very common approach, but you have to rely on eventual consistency. SQL Server Always On is something you can have a look at. You can cache the most frequently accessed data. If you have a very high request rate, you may need to consider queues, where you enqueue each request and dequeue it later, to avoid a database performance hit.

Tech-stack for querying and alerting on GB scale (streaming and at rest) datasets

Trying to scope out a project that involves data ingestion and analytics, and could use some advice on tooling and software.
We have sensors creating records with 2-3 fields, each one producing ~200 records per second (~2 KB/second) and sending them off to a remote server once per minute, resulting in about 18 million records and 200 MB of data per day per sensor. Not sure how many sensors we will need, but it will likely start off in the single digits.
We need to be able to take action (alert) on recent data (not sure of the time period; I'm guessing less than 1 day), as well as run queries on the past data. We'd like something that scales and is relatively stable.
Was thinking about using Elasticsearch (then maybe use X-Pack or Sentinl for alerting). Thought about Postgres as well. Kafka and Hadoop are definitely overkill. We're on AWS so we have access to tools like Kinesis as well.
Question is, what would be an appropriate set of software / architecture for the job?
Have you talked to your AWS Solutions Architect about the use case? They love this kind of thing, they'll be happy to help you figure out the right architecture. It may be a good fit for the AWS IoT services?
If you don't go with the managed IoT services, you'll want to push the messages to a scalable queue like Kafka or Kinesis (IMO, if you are processing 18M * 5 sensors = 90M events per day, that's >1000 events per second. Kafka is not overkill here; a lot of other stacks would be under-kill).
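The throughput figures above can be checked with a few lines of arithmetic. This is only a sketch: the function names are made up for the example, and the 5-sensor count is the assumption used in this answer, not a number from the question.

```python
RECORDS_PER_SECOND_PER_SENSOR = 200  # from the question
SECONDS_PER_DAY = 24 * 60 * 60       # 86,400


def daily_records(sensors: int) -> int:
    """Records produced per day by the given number of sensors."""
    return RECORDS_PER_SECOND_PER_SENSOR * SECONDS_PER_DAY * sensors


def events_per_second(records_per_day: float) -> float:
    """Average sustained rate implied by a daily record count."""
    return records_per_day / SECONDS_PER_DAY


# Exact per-sensor volume (the question rounds this to ~18 million).
print(daily_records(1))

# Rate implied by the rounded 18M * 5 sensors = 90M events per day.
print(events_per_second(90_000_000))

# Rate computed directly from 5 sensors at 200 records per second each.
print(events_per_second(daily_records(5)))
```

Either way the sustained load lands at roughly 1000 events per second, which is the number to size the queue and the database against.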
From Kinesis you then flow the data into a faster stack for analytics / querying, such as HBase, Cassandra, Druid or ElasticSearch, depending on your team's preferences. Some would say that this is time series data so you should use a time series database such as InfluxDB; but again, it's up to you. Just make sure it's a database that performs well (and behaves itself!) when subjected to a steady load of 1000 writes per second. I would not recommend using an RDBMS for that, not even Postgres. The ones mentioned above should all handle it.
Also, don't forget to flow your messages from Kinesis to S3 for safe keeping, even if you don't intend to keep the messages forever (just set a lifecycle rule to delete old data from the bucket if that's the case). After all, this is big data and the rule is "everything breaks, all the time". If your analytical stack crashes you probably don't want to lose the data entirely.
As for alerting, it depends on 1) which stack you choose for the analytical part, and 2) what kinds of triggers you want to use. From your description I'm guessing you'll soon end up wanting to build more advanced triggers, such as machine learning models for anomaly detection, and for that you may want something that doesn't poll the analytical stack but rather consumes events straight out of Kinesis.

Improving availability for Azure SQL?

What's a good strategy to maintain availability with Azure SQL? We've been noticing way too many service interruptions with messages like
which basically kill our application entirely. The SLA is far from 99.9% and honestly we're not interested in getting a refund, just reliable availability so our customers don't experience app outages. We're actually having better uptimes with a single IaaS VM running SQL than relying on Azure SQL (which totally blows our minds)
Anyway, leaving all those observations behind, programmatically, what is the most economical approach to having better than the advertised 99.99% availability (say an order of magnitude better - 99.999% - 5 mins downtime/year) with Azure SQL? Any specific data access programming pattern(s) and operating procedures anyone recommends?
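For reference, the downtime budget behind each availability target can be worked out as below. This is a small sketch with a made-up helper name, and it assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Allowed downtime per year for a given availability percentage."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for target in (99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% -> {minutes:.2f} minutes/year")
```

Each extra nine divides the budget by ten: roughly 526 minutes at 99.9%, 53 minutes at 99.99%, and a little over 5 minutes at 99.999%.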
EDIT: We're already using the Microsoft EntLib 6.0 Transient Fault Handling Application Block library ... 10 retries with 100ms inter-attempt timings. However, it's not 'transient' when these outages are 5+ hours long ...
Have a look at the passive and active geo-replication features of SQL Database -
http://azure.microsoft.com/blog/2014/07/12/spotlight-on-sql-database-active-geo-replication/
Active geo-replication seamlessly creates additional copies of the database in another data centre, which you can fall back to in case of an issue in a particular data centre.
We use stovepiped solutions in each datacenter with an external load-balancer/failover solution. URLs are created to monitor data center faults and fail over to another region.
In other words, we built our own business continuity solution to guarantee 5x9s.
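The failover decision described above might look roughly like the sketch below. It is not the poster's actual implementation: `pick_active_region`, the region names, and the health dictionary are all hypothetical, and the health probe is simulated rather than calling a real monitoring URL.

```python
from typing import Callable, Optional, Sequence


def pick_active_region(
    regions: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first healthy region in priority order, or None."""
    for region in regions:
        if is_healthy(region):
            return region
    return None


# Simulated probe results; a real probe would request each region's
# monitoring URL and treat errors or timeouts as unhealthy.
health = {"primary-region": False, "secondary-region": True}

active = pick_active_region(
    ["primary-region", "secondary-region"],
    lambda region: health[region],
)
print(active)
```

Passing the probe in as a function keeps the selection logic independent of how health is actually measured, so it can be exercised without touching any data center.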

Distributed store with transactions

I am currently developing an application hosted on Google App Engine. However, GAE has many disadvantages: it's expensive and it is very hard to debug, since we can't attach to real instances.
I am considering moving from GAE to an open source alternative. Unfortunately, none of the existing NoSQL solutions that otherwise satisfy me support transactions similar to GAE's (GAE supports transactions inside entity groups).
How would you approach solving this problem? I am currently considering a store like Apache Cassandra plus some locking service (Hazelcast) for transactions. Does anyone have any experience in this area? What can you recommend?
There are plans to support entity groups in Cassandra in the future, see CASSANDRA-1684.
If your data can't be easily modelled without transactions, is it worth using a non-transactional database? Do you need the scalability?
The standard way to do transaction-like things in Cassandra is described in this presentation, starting at slide 24. Basically you write something similar to a WAL log entry to 1 row, then perform the actual writes on multiple rows, then delete the WAL log row. On failure, simply read the WAL log and perform the actions recorded in it. Since all Cassandra writes have a user-supplied timestamp, all writes can be made idempotent; just store the timestamp of your write with the WAL log entry.
This strategy gives you the Atomic and Durable in ACID, but you do not get Consistency and Isolation. If you are working at a scale that requires something like Cassandra, you probably need to give up full ACID transactions anyway.
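The WAL pattern described above can be sketched with an in-memory stand-in for the store. Nothing here uses a real Cassandra driver: `Store`, `run_batch`, `recover`, and the `fail_after` crash switch are hypothetical names for this example, and the timestamp comparison only imitates the way a timestamped write wins or loses.

```python
from typing import Dict, List, Optional, Tuple

Write = Tuple[str, str]  # (row key, value)


class Store:
    """In-memory stand-in for a store whose writes carry a timestamp."""

    def __init__(self) -> None:
        self.rows: Dict[str, Tuple[str, int]] = {}
        self.wal: Dict[str, Tuple[int, List[Write]]] = {}

    def write(self, key: str, value: str, ts: int) -> None:
        # A write only takes effect if it is at least as new as what is stored.
        current = self.rows.get(key)
        if current is None or ts >= current[1]:
            self.rows[key] = (value, ts)


def run_batch(store: Store, batch_id: str, ts: int, writes: List[Write],
              fail_after: Optional[int] = None) -> None:
    store.wal[batch_id] = (ts, list(writes))   # 1. log the intent in one row
    for i, (key, value) in enumerate(writes):  # 2. perform the actual writes
        if fail_after is not None and i == fail_after:
            raise RuntimeError("simulated crash")
        store.write(key, value, ts)
    del store.wal[batch_id]                    # 3. delete the log row


def recover(store: Store) -> None:
    """Replay unfinished batches; reusing the logged timestamp makes this idempotent."""
    for batch_id, (ts, writes) in list(store.wal.items()):
        for key, value in writes:
            store.write(key, value, ts)
        del store.wal[batch_id]


store = Store()
try:
    run_batch(store, "b1", 100, [("acct:a", "90"), ("acct:b", "110")], fail_after=1)
except RuntimeError:
    pass  # the first write landed, the second did not, and the log row remains
recover(store)
print(store.rows, store.wal)
```

Between the crash and the recovery a reader can see the first write without the second, which is the missing Isolation mentioned above.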
You may want to try AppScale or TyphoonAE for hosting applications built for App Engine on your own hardware.
If you are developing under Python, you have very interesting debugging options with the Werkzeug debugger.

Scaling out SQL Server for the web (Single Writer Multiple Readers)

Has anyone had any experience scaling out SQL Server in a multi-reader, single-writer fashion? If not, can anyone suggest a suitable alternative for a read-intensive web application that they have experience with?
It depends on probably 2 things:
How big is each single write?
Do readers need real time data?
A write will block readers when writing, but if each write is small and fast then readers won't notice.
If you offload, say, end-of-day reporting, then you can batch your load onto a separate server because those readers do not require real-time data. This makes sense.
A write on your primary server must be synched to your offload secondary server... which will block there as part of the synch process anyway + you add an overhead load to manage the synch.
Most apps are 95%+ read anyway all the time. For example, an update or delete is a read followed by a write.
My choice would be (probably, based on the low write volume and it's a web app) to scale up and stuff as much RAM as I could in the DB server with separate disk paths for the data and log files of the database.
I don't have any experience with scaling out SQL Server for your scenario.
However, for a read-intensive application, I would look at reducing the load on the database by employing a cache strategy using something like Memcached or MS Velocity.
There are two approaches that I'm aware of:
Have the entire database loaded into the Cache and manage Adding and Updating of items in the cache.
Add items to the cache only when they are requested and remove them when a write operation is performed.
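The second approach can be sketched as follows. Plain dictionaries stand in for both the database and the cache, so this is not Memcached or Velocity client code, and `CachedRepo` and its methods are hypothetical names for the example.

```python
from typing import Dict, Optional


class CachedRepo:
    """Fill the cache on read misses and evict entries on writes."""

    def __init__(self, database: Dict[str, str]) -> None:
        self.database = database          # stand-in for the SQL Server tables
        self.cache: Dict[str, str] = {}   # stand-in for the cache server
        self.db_reads = 0                 # counts how often the database is hit

    def get(self, key: str) -> Optional[str]:
        if key in self.cache:
            return self.cache[key]        # cache hit: no database access
        self.db_reads += 1
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value       # add to the cache only when requested
        return value

    def put(self, key: str, value: str) -> None:
        self.database[key] = value
        self.cache.pop(key, None)         # remove the stale entry on write


repo = CachedRepo({"product:1": "widget"})
repo.get("product:1")            # miss: reads the database and fills the cache
repo.get("product:1")            # hit: served from the cache
repo.put("product:1", "gadget")  # write: evicts the cached entry
print(repo.get("product:1"), repo.db_reads)
```

Compared with loading the entire database into the cache, this keeps memory use proportional to what is actually read, at the cost of one slower read after each write.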
Some kind of replication would do the trick.
http://msdn.microsoft.com/en-us/library/ms151827.aspx
You of course need to change your app code.
Some people use partitioned tables, with different row ranges being stored on different servers and united with views. This would be invisible to the app. I think this practice is called federation.
By designing your database, application and server configuration well (SQL particulars - location of data/log/system/sql binaries/tempdb), you should be able to handle a pretty good load. Try not to complicate things if you don't have to.
