Deploying to more than one application server - app-config

We have reached that point where one application server is not enough.
Apart from the performance benefits, we want some form of failover, as we cannot afford 30 minutes of downtime just because the server needs a reboot for a new kernel.
The first issue to resolve is where to store the shared files so that both application servers can access them at all times. Possible solutions are:
DRBD with NFS on both servers.
A NAS with NFS
SAN with a clustered filesystem
We want to keep the solution as simple as possible and we don't want to spend a fortune on it. DRBD synchronization is cheap but too fragile. SANs are expensive, and I've heard some horror stories about clustered filesystems. A NAS with NFS seems the best fit.
How do you handle shared storage from multiple servers? Have you encountered any problems with NFS?
Bonus question: Sun has a Startup Essentials program that offers significant discounts, and we can grab their Sun 7110 Unified Storage for about $6k. Does anyone have experience with unified storage? Are there any fairly priced alternatives?

I only have experience with NFS mounts (for ten app servers) using a NetApp[1] storage product and can't say anything negative.
[1] http://www.netapp.com
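If you do go the NAS-with-NFS route, it's worth having each application server verify the shared mount before it starts serving traffic. Below is a minimal sketch of such a check in Python; the /mnt/shared mount point and the per-host probe file are assumptions for illustration, not anything from the question or answer above.

```python
import os
import socket
import sys
import tempfile

# Hypothetical mount point for the shared NFS export (adjust to your setup).
SHARED_MOUNT = "/mnt/shared"

def shared_storage_ok(mount_point: str) -> bool:
    """Check that the NFS share is mounted and writable from this host."""
    # os.path.ismount() is True only if something is actually mounted there,
    # which catches the case where the NAS is down and writes would silently
    # land on the local filesystem instead.
    if not os.path.ismount(mount_point):
        return False
    try:
        # Write a small per-host probe file to prove we have read/write
        # access through NFS right now; it is deleted on close.
        with tempfile.NamedTemporaryFile(
            dir=mount_point, prefix=f".probe-{socket.gethostname()}-"
        ) as probe:
            probe.write(b"ok")
            probe.flush()
        return True
    except OSError:
        return False

if __name__ == "__main__":
    if not shared_storage_ok(SHARED_MOUNT):
        print(f"Shared storage not available at {SHARED_MOUNT}", file=sys.stderr)
        sys.exit(1)
    print("Shared storage OK")
```

Running this from both application servers (at boot or from the deploy script) catches a stale or missing mount before your users do.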

Related

What is the smallest AWS EC2 instance I can run a postgres db on?

There is the free tier on AWS, and I can get a micro EC2 instance essentially for free, or close to it. I'm sure setting up Elastic IPs, load balancers, etc. is extra.
Would it effectively be possible for me to run a Postgres DB for a small API? Roughly 50 inserts + 50 reads per second, say about 6,000 operations per minute at most.
I can't seem to find anything online, which makes me think that this might be a silly idea.
For this not to be an "open question", it's simply: is it possible and realistic to expect usable performance on an EC2 instance running my Postgres DB?
The best way to determine whether the database can handle a particular workload is to test it at that capacity. Launch the database, simulate traffic and monitor its performance. Please note that every application uses a database differently, so nobody can provide "general advice" as to whether a particular-sized database would meet the needs of your particular application.
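As a rough way to "simulate traffic and monitor its performance", here is a minimal load-test sketch using psycopg2. The connection details, events table, and the 100-operations-per-second pacing are assumptions taken from the question (50 inserts + 50 reads per second), not a benchmark to treat as definitive; a tool like pgbench will give you a more rigorous answer.

```python
import random
import time

import psycopg2  # assumes: pip install psycopg2-binary

# Hypothetical connection settings and table; adjust to your instance.
conn = psycopg2.connect(host="your-ec2-host", dbname="test", user="test", password="secret")
conn.autocommit = True
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS events (id serial PRIMARY KEY, payload text)")

latencies = []
start = time.time()
for i in range(6000):  # roughly one minute at 100 ops/sec
    t0 = time.time()
    if i % 2 == 0:
        cur.execute("INSERT INTO events (payload) VALUES (%s)", (f"event-{i}",))
    else:
        cur.execute("SELECT payload FROM events WHERE id = %s", (random.randint(1, i + 1),))
        cur.fetchall()
    latencies.append(time.time() - t0)
    # Pace the loop at roughly 100 operations per second overall.
    time.sleep(max(0.0, (i + 1) / 100 - (time.time() - start)))

latencies.sort()
print(f"p50={latencies[len(latencies)//2]*1000:.1f}ms  "
      f"p99={latencies[int(len(latencies)*0.99)]*1000:.1f}ms")
cur.close()
conn.close()
```

Watch CPU credits, IOPS and the latency percentiles while this runs; that tells you far more about a given instance size than any general advice can.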
If you are going to run 'production' workloads, try to avoid using the Burstable performance instances (T2, T3) since they can hit limits under heavy workloads unless the 'Unlimited' option is selected. T2/T3 is great for bursty workloads, but not for sustained workloads.
Comparing m5.xlarge between EC2 and RDS:
Amazon EC2: 19.2c/hr ($4.61/day)
Amazon RDS: 35.6c/hr ($8.54/day)
For the additional price, Amazon RDS provides a fully-managed database, automated backups, CloudWatch metrics, etc. This is probably worth much more than $4 of your time every day.
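The daily figures above are just the hourly prices multiplied out. A quick check, using the example prices quoted in this answer (they vary by region and over time):

```python
# Worked arithmetic for the m5.xlarge comparison above (example prices, not current ones).
ec2_per_hour = 0.192   # $/hr for EC2, as quoted
rds_per_hour = 0.356   # $/hr for RDS, as quoted

ec2_per_day = ec2_per_hour * 24            # ~ $4.61/day
rds_per_day = rds_per_hour * 24            # ~ $8.54/day
extra_for_rds = rds_per_day - ec2_per_day  # ~ $3.94/day, the "about $4" of managed-service value

print(f"EC2: ${ec2_per_day:.2f}/day, RDS: ${rds_per_day:.2f}/day, "
      f"difference: ${extra_for_rds:.2f}/day")
```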
Alternatively, if you can modify your application to use NoSQL instead of SQL, you could use Amazon DynamoDB where the capacity you mention would cost 4c/hour ($1/day) plus request and storage costs.
Don't underspend on your database — it powers everything you do. Instead, try to save money by turning off non-production systems when they aren't being used (eg weekends and evenings). That will hopefully give you enough savings to afford an appropriately-powered database.

Estimate server cost using AWS

Pardon me if this isn't the place to ask such a question, but I have finished my project and am thinking of deploying it with Amazon Elastic Beanstalk, and I have one big worry: my project's database could become humongous. It's a community website like Reddit, where users can create a page on which other people can post text, links, pictures, and videos (YouTube). Users also get a profile page and are able to comment. This was my first big project, and I don't want to pay more than $200 in server fees every month.
Should I still deploy this, or just be happy that I proved to myself I can build it? How much do you think I'll have to pay, assuming I get about 100 users at most?
For starters, you can look at the costs for any AWS service by going to that service's homepage and clicking "Pricing", usually on the left side. I typically get to the pricing page by Googling "AWS <> Pricing" (e.g. "AWS EC2 Pricing").
Whether you incur any cost, and what that cost is, really depends on how you deploy your website. Questions like: is your database self-managed (i.e. installed on your own EC2 instance) or are you using RDS? Are you using S3 to store static content? Will you be serving your web content via CloudFront (AWS's CDN)?
Many of the basic services (EC2, S3, RDS, etc.) have free-tiers which will allow you to use them for free, provided you stay within certain (and usually very low) levels of usage.
If your database is going to get VERY large and cost is your primary concern, it's usually more cost-effective to manage it on your own EC2 server. However, things like updates, security, scaling, and backups then all become your problem, and they can incur additional cost (e.g. your backups will likely require volume snapshots, which cost you, whereas RDS backups are included).
If you're going to have a significant amount of static content, it will be more cost-effective to host it on your own EC2 server, but again, all maintenance becomes your responsibility, such as backups and scaling to meet demand (which can incur cost), whereas all of that is taken care of by S3, though you pay each time a file is accessed.
If cost is your primary concern, my suggestion is to start your development using the AWS services (RDS, S3, maybe Elastic Beanstalk), even though that can add complexity to your development efforts (dealing with authentication, additional SDKs, etc.). You can typically and pretty easily roll out your own service later (MySQL, an EBS filesystem to replace S3, etc.) as a replacement. Additionally, depending on your roll-out, there can be network traffic costs. Usually this isn't a problem if you're doing things the way Amazon wants you to, but it wouldn't be unheard of.
To get you started:
https://aws.amazon.com/s3/pricing/
https://aws.amazon.com/ec2/pricing/
https://aws.amazon.com/rds/pricing/
Additionally, there is a nifty calculator which can help you estimate your costs. You will need to know what your traffic expectations as well as service requirements are, but you can play around with those numbers.
https://calculator.s3.amazonaws.com/index.html
You don't have to worry about charges, as AWS has a Free Tier which offers most of the services free for 12 months.
https://aws.amazon.com/free/
You can run one t2.micro instance (1 vCPU + 1 GB RAM) for Elastic Beanstalk with auto scaling turned off, and purchase Reserved Instances with a 1- or 3-year all-upfront payment to save more.
https://aws.amazon.com/ec2/purchasing-options/reserved-instances/
Use the Relational Database Service (RDS) for the database instead of installing the DB on your Elastic Beanstalk instance, and store files on the Simple Storage Service (S3).
RDS storage pricing is $0.100 per GB-month after the first 12 months, so you don't need to worry about the database size.
After the first 12 months, your monthly bill should be less than USD 50.
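To put the per-GB-month figure in perspective, here is a quick back-of-the-envelope calculation using the rate quoted in this answer (rates vary by region and over time, and instance hours, I/O and backups are billed separately):

```python
# Rough RDS storage cost at the rate quoted above ($0.100 per GB-month).
storage_rate = 0.100  # $ per GB-month (example figure from this answer)

for gb in (20, 100, 500):
    print(f"{gb:>4} GB -> ${gb * storage_rate:.2f}/month for storage")
# Even 500 GB of storage is ~$50/month at this rate; the instance itself is a separate line item.
```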
In production, we are running two t2.micro instances (Windows) with one MS SQL database on RDS, and we only have to pay for the extra EC2 machine, about USD 14 per month.
I did find some relevant information in the Stack Overflow thread below, which talks about the capabilities of t2.micro, t2.small, and t2.medium EC2 instances. Do have a look at it.
several t2.micro better than a single t2.small or t2.medium

Central data management for custom desktop applications

I have a background in web programming, where both the data and the code live on the server. Web hosts with MySQL or the like are plentiful and cheap, so using the application from multiple PCs was never a problem.
However, I'm considering switching to building desktop applications, and the one factor that annoys me is syncing data across the many PCs I use. I was thinking of setting up a light Amazon EC2 instance with PostgreSQL on it and having my desktop applications use that.
I have a few questions:
I'm curious as to what latency I might expect by running the database on EC2 instead of on the local network; any experience or insight is appreciated.
Are there better/more obvious/cheaper solutions?
I've looked at the pricing and it seems to come down to $24.48 per month for a yearly contract. While not really expensive, it is not exactly cheap either. At what point does it become more interesting to run a local server?
I'm obviously not using my applications for large parts of the day (sleep, work, ...). I was wondering if I can have the Amazon server go into a sort of "sleep" mode and wake up when poked. An initial delay for the first desktop application is acceptable. The reason behind this behavior would be to save money on the instance if it is only actually needed for 10% of the day.
I welcome any feedback at all on how this problem is best tackled.
This could get ugly. Every single query you do will have latency associated with it. If you have a lot of queries, this can add up very fast. So keep your query count low, and try to pre-fetch and cache data when possible.
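For example, if the application needs many small lookups, it is usually better to pull them down in one round trip and serve repeats from a local cache than to issue one query per item. A minimal sketch of that idea follows; the items table and the psycopg2 connection details are assumptions for illustration.

```python
import psycopg2  # assumes: pip install psycopg2-binary

# Hypothetical remote database on EC2; every round trip pays WAN latency.
conn = psycopg2.connect(host="your-ec2-host", dbname="app", user="app", password="secret")

_cache: dict[int, str] = {}

def get_names(ids: list[int]) -> dict[int, str]:
    """Fetch many rows in one query and serve repeats from a local cache."""
    missing = [i for i in ids if i not in _cache]
    if missing:
        with conn.cursor() as cur:
            # One round trip for all missing ids instead of one query per id.
            cur.execute("SELECT id, name FROM items WHERE id = ANY(%s)", (missing,))
            _cache.update(dict(cur.fetchall()))
    return {i: _cache[i] for i in ids if i in _cache}

# Usage: one network round trip instead of five.
print(get_names([1, 2, 3, 4, 5]))
```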
Not enough information to answer that question.
Depends on the cost of your local server. Keep in mind that you will need to pay for electricity to keep it on.
You can stop your instance when you are not using it; with the exception of high-utilization reservations, you won't get billed for instance hours while it's in the stopped state. With high-utilization reservations you will still pay the full cost.
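On the "sleep when not needed" point: with a plain on-demand instance you can script the stop/start yourself. Below is a minimal sketch with boto3; the instance ID and region are placeholders, and note that the attached EBS volume is still billed while the instance is stopped.

```python
import boto3

# Hypothetical instance ID and region; adjust to your setup.
ec2 = boto3.client("ec2", region_name="us-east-1")
INSTANCE_ID = "i-0123456789abcdef0"

def start_db():
    """Start the database instance and wait until it is running."""
    ec2.start_instances(InstanceIds=[INSTANCE_ID])
    ec2.get_waiter("instance_running").wait(InstanceIds=[INSTANCE_ID])

def stop_db():
    """Stop the instance so compute hours stop accruing (EBS storage is still billed)."""
    ec2.stop_instances(InstanceIds=[INSTANCE_ID])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[INSTANCE_ID])

if __name__ == "__main__":
    start_db()  # call from the first desktop app of the day, or a scheduled job
    # ... use the database ...
    stop_db()   # call at the end of the day
```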

Are there any "gotchas" in deploying a Cassandra cluster to a set of Linode VPS instances?

I am learning about the Apache Cassandra database [sic].
Does anyone have any good/bad experiences with deploying Cassandra to less than dedicated hardware like the offerings of Linode or Slicehost?
I think Cassandra would be a great way to scale a web service easily to meet read/write/request load... just add another Linode running a Cassandra node to the existing cluster. Yes, this implies running the public web service and a Cassandra node on the same VPS (which many may take exception to).
Pros of Linode-like deployment for Cassandra:
Private VLAN; the Cassandra nodes could communicate privately
An API to provision a new Linode (and perhaps configure it with a "StackScript" that installs Cassandra and its dependencies, etc.)
The price is right
Cons:
Each host is a VPS and is not dedicated of course
The RAM/cost ratio is not that great once you decide you want 4GB RAM (cf. dedicated at say SoftLayer)
Only 1 disk where one would prefer 2 disks I suppose (1 for the commit log and another disk for the data files themselves). Probably moot since this is shared hardware anyway.
EDIT: found this which helps a bit: http://wiki.apache.org/cassandra/CassandraHardware
I see that 1GB is the minimum but is this a recommendation? Could I deploy with a Linode 720 for instance (say 500 MB usable to Cassandra)? See http://www.linode.com/
How much RAM you need really depends on your workload: if you are write-mostly you can get away with less; otherwise you will want RAM for the read cache.
You do get more RAM for your money at my employer, Rackspace Cloud: http://www.rackspacecloud.com/cloud_hosting_products/servers/pricing. (Our machines also have RAIDed disks, so people typically see better I/O performance vs. EC2. Dunno about Linode.)
Since with most VPSes you pay roughly 2x for the next-size instance, i.e., about the same as adding a second small instance, I would recommend going with fewer, larger instances than more, smaller ones, since in small numbers network overhead is not negligible.
I do know someone using Cassandra on 256MB VMs but you're definitely in the minority if you go that small.

File Replication Solutions

I'm thinking about a Windows-hosted build process that will periodically drop files to disk to be replicated to several other Windows servers in the same datacenter. The other machines would run IIS and serve those files to the masses.
The total corpus would be millions of files and hundreds of GB of data. It would have to deal with possible contention on the target servers, high-latency links (e.g. over a WAN), and cold-starting clean servers.
Solutions I've thought about so far :
A queued system with daemons that either wake periodically and copy, or run as services.
SAN - expensive, complex, more expensive.
ROBOCOPY on a timed job - simple but effective, though with lots of internal/indeterminate state, e.g. where it's at in the copy, errors.
Off-the-shelf replication software - less expensive than a SAN but still expensive.
UNC shared folders and no replication - higher latency, lower cost, but we'd still need a clustering solution too.
DFS Replication.
What else have other folks used?
I've used rsync scripts with good success for this type of work - thousands of machines in our case. I believe there is an rsync server for Windows, but I have not used it on anything other than Linux.
Though we do not have those millions of files and hundreds of gigabytes of data to manage, we do send and collect lots of files overnight between our main company and its agencies abroad. We have been using Allway Sync for a while. It handles folder/FTP synchronization, has a nice interface that allows folder and file analysis and comparison, and it can of course be scheduled.
UNC shared folders and no replication has many downsides, especially if IIS is going to use UNC paths as home directories for sites. Under stress, you will run into http://support.microsoft.com/default.aspx/kb/810886 because of the number of simultaneous sessions against the server sharing the folder. Also, you will experience slow IIS site startups since IIS is going to want to scan/index/cache (depending on IIS version and ASP settings) the UNC folder.
I've seen tests with DFS that are very promising, exhibiting none of the above restrictions.
We use ROBOCOPY in my organization to pass files around. It runs very seamlessly, and I feel it's worth a recommendation.
Additionally, you are not doing anything too crazy. If you are also proficient in Perl, I am sure you could write a quick script that will fulfill your needs.
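If you do end up rolling your own, the "quick script" can be very small. Here is a sketch in Python (rather than Perl) that fans a drop folder out to several IIS servers with ROBOCOPY; the share paths, log location, and retry settings are made-up placeholders, and remember that /MIR also deletes target files that are no longer in the source.

```python
import subprocess
import sys

# Hypothetical paths; adjust to your build drop and web servers.
SOURCE = r"D:\build\drop"
TARGETS = [r"\\web01\wwwroot", r"\\web02\wwwroot", r"\\web03\wwwroot"]

failures = []
for target in TARGETS:
    # /MIR mirrors the tree (including deletions), /Z copies in restartable mode,
    # /R and /W bound retries so a busy target doesn't hang the job forever.
    result = subprocess.run(
        ["robocopy", SOURCE, target, "/MIR", "/Z", "/R:3", "/W:10",
         "/LOG+:C:\\logs\\replication.log"],
        check=False,
    )
    # ROBOCOPY exit codes 0-7 mean success (possibly with files copied or extras removed);
    # 8 and above indicate at least one failure.
    if result.returncode >= 8:
        failures.append(target)

if failures:
    print(f"Replication failed for: {', '.join(failures)}", file=sys.stderr)
    sys.exit(1)
```

Run it from a scheduled task on the build box; the non-zero exit on failure makes it easy to hook into whatever monitoring you already have.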
