how to manage costs of Snowflake Cloud test environnement - snowflake-cloud-data-platform

we have Poc ed Snowflake Cloud environement and I'm thinking about test costs.
we use ODI to run ELT queries, in dev/test phase, some queries could be wrong because of missing join or some thing else (dev environment exists for such things)
Comparing to traditional way, some times I keep running many packages the night, then, the morning when I go work I check results to see if something was wrong, if row number is what I expect etc, there is no impact, no extra charge of what I did on my company fees.
but if I do so with snowflake, If I come to work the morning and my boss tells me what you did yesterday ? we got 30000 dollars bill for yesterday tests running all the night, I'll be in a very bad situation.
the same question may be asked with novice developers
So how to deal with tests costs in cloud environment?
please consider ELT cases with a database/datawarehouse and a lot of data exchange and computing resources.
I read article below but there is no concrete idea.
https://hystax.com/how-test-environment-management-helps-achieve-cloud-cost-optimization-goals/
thank you.

Related

Customer SQLServer installations - how to prevent performance issues

We have an off the shelf product and customers are buying licenses to use this. They install this on their servers. The database schema looks exactly the same at all customers but the usage differs a lot. One customer might be huge with its users and using one component and one customer might be small using several. It differs how they use it.
The problem is when it comes to performance tuning. I can always start a log/trace/extended events, check system views, activity monitor etc at the customer site when they are facing these issues. But the problem is that I don´t know what is normal - do the waits differ or so? Some technicians at the customer does not have the knowledge to log these.
I´m searching the internet for clues how to prevent performance issues like these at the customer site, but I don´t feel like I´m getting the answers I want, if there are any? :) Our test databases differs so much in how the data is distributed.
Anyone else working like this and have tips on how to prevent performance issues in all the ways possible? Some tips on third party tools or ways to work?
Have you considered an active performance test before launch to appropriately tune the instance for its intended use? IS this worth doing? Ask what is the cost to the organization for the poor performance of the app or downtime? If the risk is less than the cost of the performance testing effort then this would seem an easy decision.

force.com ISV development, deployment, support

We're an ISV that's completed our first app on force.com. It's an xRM-like app with extended workflow to build out complex campaigns (not simple marketing-like campaigns) and integration with on-premise software. The platform brings enormous value, and at the same time some challenges. Interested in other ISV experiences around the following:
Application upgrade process. Customers expect cloud app upgrade to "just happen". Reality is that there's inevitable manual pre- and post-upgrade steps that can fill many pages. We don't want to burden the customer with this, and at the same time while we're happy to do the upgrade work for the customer, we don't want access to customer data and the need for elaborate security assurances that come along with that access. A conundrum.
Development environment. Agile/scrum development relies on achieving full test automation and continuous integration, yet full automation beyond unit test seems difficult or impossible.
Background processing. Constraints on scheduled jobs, callouts, and futures, and issues with transaction management present challenges to traditional software development.
Curious what other ISVs have found.
Thanks!
I am now working at my second Force.com ISV and so have a fair amount of experience in releasing products on the platform (have seen 4 separate products releases, 1 which included 3 version releases and 1 including another version update).
If possible, you should try to remove any pre/post install steps that the user requires to do. It sounds tough, and it is, but its the biggest reason people don't adopt a product. The idea is that it is quick and easy to install, one click, and any extra effort detracts from the user experience. Ensuring your system is data independent is a good way of getting around the data security issues you referred to, and obviously you can offer a consultancy to do the upgrade work. A sensible idea might be to have a list of all the objects and fields that are affected by your products installation and then to do a check of the customer org before installing. I would also say that installing in sandbox and doing a couple of weeks user testing can highlight any problems you may have in future very effectively.
It is not true that full test automation beyond unit tests cannot occur and is actually very simple. The key is having the necessary framework setup. So you would have a central version control system where your code is stored (a key agile part). Then you create a script so that when code is committed, it runs an install on a SFDC org, running all tests and reporting back. You can then get this script to run a set of apex classes or upload a bunch of CSV files to put data in with either further fuller apex tests to run functionality or selenium running to do a set of tests. You can then also use this test data and script for knocking out demo environments for sales guys.
The governor and background processing limits are a bit tight, but they keep on being increased. Maybe you should integrate with Heroku or similar to do some larger external processing? I will say though I think it improves programming abilities in general, making you think about what it is your doing and the best way to do it. This then leads to a more pleasant end user experience. Batch apex jobs area a good way of doing this processing and you can use the asyncapexjob object to report back on the status f a run to users.
Hope that helps and gives you a different perspective!
Paul

The difficulty of choosing right database for analytics

I need some help deciding which database we should choose for our project. We are developing a web application that collects data about user's behavior and analyses that (bad explanation, but I can't provide much more detail; web analytics data is one of our core datasets). We have estimated that we will insert approx 200 million rows per week into database + data calculated from that raw data. The data must be retained for at least six months.
I have spent last week and half gathering information about different solutions, but there seems to be so many that I feel lost. Most promising ones I found are Cassandra, Hbase and Hive. I also looked at MongoDb, Redis and some others, but they looked like they suited different needs or community wasn't that active.
The whole app will be run in Amazon's EC2. As a startup company pay-as-you-go pricing model fits us like a glove. The easier the database is to manage in the cloud, the better.
Scalability is important. The amount of data we will generate varies quite much and will grow over time.
We can't pay huge licensing fees. Otherwise we would probably use something like http://www.vertica.com/.
We need to do all sorts of analysis on data, and the easier they are write the better. I thought about using Map/Reduce for the task; Hbase seems to have better support for this than Cassandra, and Hive has it's own query language. Real-time analysis isn't needed; we can calculate results once a day and shovel those back to database for fast retrieval.
Compression support would be nice, but not necessary (disk space is cheap :).
I also though about using MySql (because we will use that for all the user information etc. anyway), but scaling will be much harder in the future and I think at some point we would have to move to some other db anyway. We are also more than willing to commit some time and effort to push the selected database forward in terms of development.
We have decided to go on with Hadoop(& Hive/Hbase) as our primary data store. Main reasons for this are:
It is proven technology, and many big sites are using it (Facebook...).
Lot's of documentation around and even Hadoop books have been written.
Hive provides nice SQL-like query language and command line, so even guys who don't know Java/Python/etc. can write queries easily.
It's free and community people seem to be helpful :)

Database Refresh

How often do you refresh your development database from production database?Since there are many types of projects (targeting different domains) I would like to know how it is being done and at what intervals(days/months/years) it is being done ?
thanks
While working at Callaway Golf we had an automated build that would completely refresh the database from a baseline. This baseline would be updated (from production) almost daily. We had a set up scripts (DTS) that would do this for us. So if there was some new and interesting information we could easily do it a couple times of day, once a week, etc. The key here is automation to perform the task. If it is easy to do then when it is done is really only dependent on how performing the task impacts the load on the production database, the network, and the amount of time it takes to complete it. This could of course be set up as a schedule task to run at off peak hours and before the dev team gets in in the morning.
The key things in refreshing your development database are:
(1) Automate the refresh through a script.
(2) Obfuscate the data in your development database since you do not want your developers to see the real data or you could do some sampling of your production database.
(3) Decide the frequency of the refresh -- I usually do it once a week.
Depends on what kind of work you're doing. If you're debugging issues that are closely related to the data, then frequent updates are good.
If you're doing data Quality Assurance (which often involves writing code to detect and repair it, that you have to develop and test away from the production server), then you need extremely fresh data. The bad data that is the most valuable to fix is the data that was just inserted or updated today.
If you are writing client code, then infrequent updates are good. Often when I'm writing C# UI code, I could care less what the data is, I just care if it shows up in the right box on the screen.
If you have data with any security issues, you should stop using production data--i.e. never update from production--and get a good sample data generator. Writing a good sample data generator is hard, so 3rd party products are the way to go. MS Data Dude comes to mind, and I recommend Sql RedGate's data generator.
And finally, how hard is it to get a copy of the production data? If it is cheap and automatable, just get a new copy every night. If it is expensive (requires the attention of a very busy DBA), well, resource constraints might answer the question for you regardless to these other concerns.
We tend to refresh every couple of days, or perhaps once a week or so if things are "normal," though if we're investigating something amiss we may do so more much more often.
Our production database is about 1GB, so it's not a trivial thing to copy around. Also, for us there's generally no burning need to get current data from production into the dev systems.
The "how" is simply a MySQL "backup" and "restore"
In a lot of cases, refreshing the dev database really isn't that important. Often production systems have far more data that required for development, and working with such a large dataset can be a hassle for several reasons. Examples include development on the interface, where it's more important to have some data instead of anything specific. In this case, it's more customary to thin out the production database to a smaller subset of real data. After you do this once, it's not really that important to update, as long as all schema changes are pushed through the dev database(s).
On the other hand, performance bugs may often require production-sized databases to be able to reproduce and identify bottlenecks, so in this scenario it is extremely useful to have an almost-realtime database. Many issues may only present themselves with the exact data used in production.
We tend to always go back to an on-demand schedule. We have many different databases that are used in a suite of applications. We stay away from automatic DEV databases b/c many of our code changes involve database changes and I don't want anything overwritten.
Not at all, the dev databases (one per developer) get setup by a script or similar a couple of times a day, possibly a couple hundred times when running db tests locally. This includes a couple of records for playing around
Of course we still need a database with at least parts of production in it, for integration and scalability tests. We are aiming for a daily automated refresh. But we aren't there yet.

How to get a customer to understand the importance of a qualified DBA?

I'm part of a software development company where we do custom developed applications for our clients.
Our software uses MS SQL Server and we have encountered some customers which do not have a DBA on staff to manage the databases or if they do, they lack the necessary knowledge to perform their job adequately.
We are in the process of drafting a contract with one of those customers to provide development services for new functionality on our software during the next year, where they have an amount of hours available for customization of our software.
Now they want us to include also a quote for database administration services and the problem is that they are including a clause that says that those services will be provided only when they request it.
My first reaction is that db administration is an ongoing process and not something that they can call us once a month to come for a day or two. I'm talking about a central 1TB+ MSSql Cluster and 100 branch offices with MSSql Workgroup edition.
My question is for any suggestions on how I could argue that there must be a fixed amount of hours every month for dba work and not only when their management thinks they need it (which I’m guessing would only be when they have a problem).
PS: Maybe this will be closed as not programming related. But I'm a programmer and I have this problem. My work is software development but i don't want to lose this client and the only solution I can think of is to find a way for the client to understand the scope so we can hire a qualified DBA to provide them with the service they require.
Edit: We are in a Latin American country with clients in the Spanish speaking region. My guess is that in more developed countries there is a culture that knows how delicate the situation is.
This is definitely one of those 'you can lead a horse to water, but you can't make them drink' situations.
My recommendation here would be to quote the DBA services as hourly, and make the rate high enough that you can outsource the work if you decide you want to. When (not if) the SQL servers start to have problems, the firm is on the hook.
I would also recommend that you include in your quote a non-optional 2 hour database technology review once per year. This is your opportunity to say 'You spent XXX on database maintenance this year, most of which was spent fighting fires that could have been easily avoided if you had just spent XXXX/4 and hired a DBA. We care about you as a customer, and we want you to save money, so we really recommend that you commit to using a DBA to perform periodic preventative maintenance'.
I would also recommend that you categorize any support requests as having root cause b/c of database maintenance vs other causes. This will let you put a nice pie chart in front of the customer during their annual review (which they are going to pay you to perform). It is critical to manage the perception so they don't think your code is causing the problems. You might even go so far as to share these metrics (db related issue vs non-db related issue) with them on a quarterly basis.
Sometimes people need to experience pain before they change. The key is to not be in between the hammer and their thumb as they learn the lesson, and hourly quoted work is one way of doing this.
As a side note, this sort of question is of great interest to a large number of developers. I'd say that this sort of thing could impact a programmer's quality of life more than any algorithm or library question ever could. Thanks for asking it!
No DBA on a system that size is a disaster waiting to happen. If they don't understand that, they are not qualified to run a database that size. I'd recommend that they talk to other companies with similar sized databases and have them ask them about their DBAs and what they do for them, and if they think they could survive without them.
Perhaps the link below from MS SQL Tips could give you some good talking points. But people who aren't technical wont respond to a technical explanation of the necessity of good DBA you are likley going to have to work toward proving the cost of bad DBA. Work out the worst case scenarios and see how they feel about them. If you can make it seem like a good financial move (and I think we all know it is) it will be an easy sell.
http://www.mssqltips.com/tip.asp?tip=1278

Resources