Which service helps sites to solve scalability/availability problem? [closed] - database

Closed 8 years ago.
I run a data-intensive site with a CMS built by an outsourcing partner. As the number of users grows, it is getting slower, and it goes down during major product launches. What services can I turn to in order to analyze the site's bottlenecks, SQL queries, etc.? Are there services that can provide solutions such as separating load between redundant servers, configuring master-slave replication for the database, etc.? I am new to this.

If you have a custom-built CMS, you're going to need software engineers to analyze the problems and propose solutions; there's no off-the-shelf solution for that. The engineers you're looking for would need to understand the programming language your CMS is built in, and have experience in web scalability. Obviously, your outsource partner would be a good candidate here...
If the CMS is an off-the-shelf solution, the vendor might be able to recommend specialist service providers, independent of the outsource partner.
In general, the performance / scalability process is:
understand the targets, ideally represented as page generations per second, subdivided into page types if necessary (e.g. "50 CMS product pages / second, 20 logins / second"). Establish the maximum acceptable response time (e.g. 1 second average, 4 seconds max).
create a test environment which you understand completely, and which you can relate to the target production system. The test environment should be easy to work with, and accessible to the team; typically I recommend using a developer work station, or a low-powered VM. The purpose is to bring bottlenecks to light, not to handle huge amounts of traffic.
establish test targets for your test environment - e.g. if "production" needs to handle 100 page generations / second, your test environment might only need to handle 20 page generations / second.
deploy your application to the test environment, and set up a way of collecting performance information, e.g. CPU, memory, disk, network, etc.
run load tests on the test environment; increase load until you exceed the response time targets. Use the monitoring tools to identify the bottleneck.
fix bottleneck
rinse & repeat until you hit your performance targets
deploy application to production-like environment - similar capacity and architecture - if you're lucky enough to have one. Set up monitoring and performance capture tools.
run load test, increasing load until you exceed your performance targets
if you've met your goals, congratulations!
if you haven't met your goals, it suggests your production environment has a different bottleneck than your test environment (often the database). Find out what the bottleneck is; try to replicate on your test environment.
restart testing on test environment.
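The "increase load until you exceed the response time targets" step above can be sketched roughly as follows. This is a minimal illustration, not a replacement for a real load testing tool: `fake_page_request` is a hypothetical stand-in for an HTTP request to the test environment, and the default targets are the example figures from above (1 second average, 4 seconds max).

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def fake_page_request():
    """Hypothetical stand-in for one page request; swap in a real HTTP call."""
    started = time.perf_counter()
    time.sleep(0.001)  # pretend the server took about 1 ms
    return time.perf_counter() - started


def run_step(request_fn, concurrency, requests_per_worker=5):
    """Fire requests from `concurrency` parallel workers, return all timings."""
    def worker(_):
        return [request_fn() for _ in range(requests_per_worker)]

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        batches = list(pool.map(worker, range(concurrency)))
    return [t for batch in batches for t in batch]


def find_breaking_point(request_fn, levels, avg_target=1.0, max_target=4.0):
    """Increase load level by level; return the first level that misses a target."""
    for concurrency in levels:
        timings = run_step(request_fn, concurrency)
        if statistics.mean(timings) > avg_target or max(timings) > max_target:
            return concurrency  # a bottleneck shows up here; check the monitors
    return None  # every level met the response time targets
```

Calling `find_breaking_point(fake_page_request, [1, 2, 4, 8])` walks through the load levels in order; the level it returns is the point at which to look at the CPU, memory, disk and network figures collected by the monitoring tools.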
Quite often, you will find you have to make architectural or infrastructure changes to reach your targets; I've used the following:
- run the solution on bigger hardware.
- introduce a CDN to offload traffic - some CMS-driven sites can be cached almost entirely on a CDN
- introduce caching into the application, ideally at the page generation level, but a typical web app has many places where caching can help
- add more front-end web servers (assuming application was built with load balancing in mind)
- add more database servers (this is nearly always a major intervention unless the app was built with this in mind)
Load testing tools are available as services you can hire (Keynote is one I've used), or tools you can run yourself (JMeter is my favourite).

Best hosting option for migrating off Google App Engine (GAE) in my case? [closed]

Closed 10 years ago.
I have a small Google App Engine site which seems to be outgrowing it, and I want to migrate somewhere else.
It is based on Java / Stripes Framework / Objectify, and only uses URLFetch from Google services. It uses ~60 front-end hours and ~4 GB datastore at the moment, with ~5k visitors doing ~25k page views per day.
Reasons why I believe I should migrate:
I have made some assumptions early on which no longer are valid and am running into 1MB memcache/datastore limits. While I could refactor this it would likely increase the number of datastore operations / overall worsen performance characteristics, and involve a database conversion step (which I may as well use to migrate somewhere else)
I want to add some features which would involve a significant increase in stored data (to ca 100 GB) and front-end hours
As I'm using resources past the free quota, costs seem to rise quite fast. While they are quite manageable now, if the application becomes more popular, I'm afraid it may no longer be affordable.
Some stability issues (getting some OutOfMemoryErrors and other errors that I cannot explain, and cannot replicate very well on my local environment)
I'm evaluating the following options:
Staying on GAE, optimizing the application, living with the growing costs (Cons: still will be having high costs and reliability issues)
Moving to AWS EC2 / EBS with MongoDB as a datastore (Pros: probably the most mature solution, Cons: appears difficult to set up, easy to make architecture/design mistakes).
Using AppScale to hopefully leave my application largely as-is, but host it on AWS EC2 (Pros: seems easy on paper, Cons: seems to presume a largely Unix development environment, no idea if production ready or what is happening behind the scenes)
Use CloudFoundry.com with MongoDB as a datastore (Cons: no idea if production ready, post-beta costs are not known)
Get a VPS or a dedicated server with some hosting provider, deploy using MongoDB as a datastore (Cons: probably teaches me less of the things I want to learn than the other options, plenty of sysadmining to do)
It is a hobby site, so part of the goal is to also learn some new technologies in practice, I'd just want to invest my time in learning the right ones.
Notes - I have some, but quite limited system administration skills, especially on Linux, and I don't enjoy doing it. I have done a small project in MongoDB before (never put in production though). I've never used any of the AWS infrastructure.
My questions:
a. Is AppScale mature enough for the purpose of running a small website without much hassles (bugs, lack of documentation, etc.)? Is the learning curve very steep? How much system administration would using it in scenario #3 require? And most importantly - do I understand correctly that the Google 1MB and so forth limits are all there on AppScale?
b. Is CloudFoundry mature enough for the purpose of running a small website without much hassle (bugs, lack of documentation, etc.)? Is the learning curve very steep? How much system administration would using it in scenario #4 require? I presume moving off CloudFoundry.com to another CloudFoundry host should be fairly easy if needed.
c. How much sysadmining does deploying on AWS EC2 / EBS involve for the described application? Assuming I don't care that much for temporary outages, but care about permanent data loss, do I need to mirror the EBS on my own, or can I just leave AWS to do it?
d. Which of the new options (AppScale, CloudFoundry, EC2/EBS) would work fine with a Windows / Eclipse based development approach? Which has the best Eclipse plug-ins?
I'm asking because upon quick review of AppScale docs, it seems they assume the development VM will be hosted by a Unix host, which is another hassle for me.
e. Which of my options 1.-5. would you recommend in my case?
I'm split between #2 and #4 at the moment.
Just some observation:
a. AppScale is a thin wrapper around other technologies (runtimes, datastores), so in general it's as reliable as those underlying parts are. For a small non-mission-critical website it is IMO reliable enough. Btw, the memcache 1MB limit is per-object, not per-memcache. So I suppose you could break your data up into multiple smaller objects.
b. I don't have experience with CloudFoundry, but they do say they are "beta" and they do not have an SLA: http://support.cloudfoundry.com/entries/20971351-cloud-foundry-sla
c. I'd guess a few hours a week. EBS is a disk-based service, so you should not be losing data with it. But you can still do incremental backups of EBS to S3. There are many solutions that do it automatically, for example: http://www.stardothosting.com/blog/2012/05/automated-amazon-ebs-snapshot-backup-script-with-7-day-retention/
d. IMO EC2 is the most mature with the most tools available. Note that AppScale is just a wrapper - you can deploy it to EC2. Dev environment (Eclipse+Windows) has nothing to do with deployment environment (Usually Linux, can also be Windows on EC2).
e. Personally I'd recommend staying with GAE (= option 1). IMHO anything else would be less reliable and more costly (due to setup/support costs, not even comparing base service costs).
Btw, if you are getting OutOfMemoryErrors you should really review your code. Where do you keep massive amounts of data in memory?
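Point (a) suggests splitting a large value into several smaller cache entries. A minimal sketch of that idea follows. It uses a plain dict as a stand-in for the cache, and the key naming scheme and the `put_chunked` / `get_chunked` helpers are assumptions made for illustration, not part of any App Engine or AppScale API.

```python
CHUNK_SIZE = 1000 * 1000  # stay safely below the 1 MB per-object limit


def put_chunked(cache, key, data, chunk_size=CHUNK_SIZE):
    """Store `data` (bytes) as several entries, each at most `chunk_size` bytes."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    for index, chunk in enumerate(chunks):
        cache[f"{key}:{index}"] = chunk
    cache[f"{key}:count"] = len(chunks)  # written last, marks the value complete


def get_chunked(cache, key):
    """Reassemble the value, or return None if any piece is missing."""
    count = cache.get(f"{key}:count")
    if count is None:
        return None
    parts = [cache.get(f"{key}:{index}") for index in range(count)]
    if any(part is None for part in parts):
        return None  # a chunk is gone; treat the whole value as a cache miss
    return b"".join(parts)
```

A real cache may evict any piece independently, which is why a single missing chunk is treated as a miss for the whole value. Note that this adds one cache operation per chunk, so it trades the size limit for more round trips.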

GAE Vs AWS 2012 [closed]

Closed 9 years ago.
Is GAE a good option for the backend when compared with AWS? The information I have found mainly discusses issues that GAE has since resolved. The mobile application under consideration deals with images: sharing and editing of images simultaneously by multiple users.
I am mainly concerned with scalability and flexibility in the implementation, a robust compatibility layer, storage, and data analysis (identifying patterns in the stored data).
AWS lets you use popular open source technologies and tools, and has granular pricing. GAE is good for getting to market really fast, with no administration pain and a free quota.
Can you please point out some important pros & cons that I must consider before taking the decision.
I think that GAE is good for its fast startup and for proof of concept. It is really simple and cheaper to begin with, but it locks you in to google.
If your idea works well and it becomes popular you can rewrite it using open source technology in future.
I have 25 GB appengine DB. Every 1-10 minutes I add record.
It costs $2.5 per week.
But originally it was more expensive to upload than I ever expected.
My upload script was uploading chunks of 500 records per request.
Requests were ending in 10-15 seconds but logs showed datastore time much more like 5 minutes vs 15 real seconds!
Also upload servlet was waiting 99% of time doing nothing and I had to pay for that as well.
It took days to upload 15 GB of indexed data.
AppEngine has certain pricing risks
GAE is essentially for "throw-away" proofs of concept or "very very small" applications. I say this because I would not invest a large amount of money in a completely vendor-locked system... Other people might, but I wouldn't take that kind of risk, since I'd be at Google's whim on their availability and pricing.
So if you have a large project or product, you're probably better off using EC2 since all it's providing is the infrastructure...there's no code requirements imposed on you.
That being said, if I had a small project I wanted to toss on the web for my friends, I'd definitely take advantage of GAE's free tier.
I think the biggest difference is that, in a general sense, EC2 hosts servers whereas GAE hosts code. If you're building a system where you're going to want to do things like tail logs, have cron jobs managed by a sys admin, use open source tools like rsync and have fine grained control over OS and configuration or co-locate services on one box, then EC2 is very compelling.
GAE is "upload your app and it works". Very cool in its own right but personally I'd rather deal with VMs in EC2 because it's a more natural dynamic for systems development for me at least.

How to do Agile Developing and testing High volume processing systems? Tools? Methodologies? Recommendation? [closed]

Closed 6 years ago.
I am developing high-volume processing systems, such as mathematical models that calculate various parameters based on millions of records, derived fields calculated over millions of records, processing of huge transaction files, etc.
I am well aware of unit testing methodologies, and if my code is in X# I have no problem unit testing it. The problem is that I often have code in T-SQL, C# code that is a SQL stored assembly, an SSIS workflow with a good amount of logic (and outcomes etc.), or some SAS process.
What approach do you use when developing such systems? I usually develop several tests as stored procedures in a designated schema (TEST), then automatically run them overnight and check the results. But this only covers T-SQL. The problem is testing SSIS packages. How do you test them? What is your preferred approach for stubbing data into tables (especially if you need a lot of data initialization)? I have an approach derived over the years, but maybe I am just not reading enough articles.
So, banking, telecom and risk developers out there: how do you test your mission-critical apps that process millions of records at end of day, month end, etc.? What frameworks do you use? How do you validate that your SSIS package is correct (as you develop it)? How do you achieve continuous integration in such an environment (personally, I never got there)? I hope this is not too open-ended a question. How do you test your map-reduce jobs, for example (I do not use Hadoop, but this is quite similar)?
luke
Hope that this is not too open-ended
Firstly build logging, monitoring & double entry systems into what you're building.
Ensure that even with these systems switched on, performance is acceptable, so benchmark, and profile these, and ensure the hardware is appropriate for the entire system.
Split each system into sub-systems which can be tested independently, so try and ensure systems are designed to be quite loosely coupled.
Also ensure each sub-system validates their inputs before processing further, this ensures erroneous data is stopped before it becomes a bigger problem.
By using logging, you can test a variety of systems in a similar way.
For any system which doesn't have unit test frameworks available, use logging, and then test the logs generated.
This should allow you to test SSIS processes, workflows, or assemblies.
Monitoring & double entry systems, will flag up errors & process problems, so you can identify and ideally resolve them in a timely fashion.
Finally, when systems go live, don't switch logging off entirely.
If necessary, reduce its verbosity, but ensure it can be switched back on to debug processes, as problems will still occur in the live environment which you need to resolve.
Ensure you use live data, and edge cases, for automated testing.
Use code reviews or pair programming to ensure the code is optimal.
Ensure you use expert QA staff to think of use cases you won't think of.
Ensure you have an excellent project manager, who can manage you, your team, the related teams, the end users, and your bosses, and ensure everyone is communicating appropriately.
You won't be able to achieve well tested processes without a well run team.
Using some of the above has allowed us to develop well tested processes, which handle billions of pounds worth of transactions annually, so we must be doing something right.
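The advice above about testing the generated logs, for components such as SSIS packages that have no unit test framework, might look something like this in practice. The log format (`LEVEL|step|rows=N`) and the step names are invented for the sketch; a real package would log in whatever format it already uses.

```python
def parse_log(lines):
    """Turn 'LEVEL|step|rows=N' lines into a list of dicts."""
    entries = []
    for line in lines:
        level, step, detail = line.strip().split("|")
        rows = int(detail.split("=", 1)[1])  # detail looks like 'rows=123'
        entries.append({"level": level, "step": step, "rows": rows})
    return entries


def check_run(entries, expected_steps):
    """Return a list of problems found in one run's log; empty means it passed."""
    problems = []
    steps_seen = [e["step"] for e in entries]
    if steps_seen != expected_steps:
        problems.append(f"steps ran as {steps_seen}, expected {expected_steps}")
    for e in entries:
        if e["level"] == "ERROR":
            problems.append(f"error logged in step {e['step']}")
    # Cross-check in the spirit of double entry: rows loaded must equal rows extracted.
    rows = {e["step"]: e["rows"] for e in entries}
    if rows.get("extract") != rows.get("load"):
        problems.append("row count mismatch between extract and load")
    return problems
```

A nightly job could run the package against a known data set, feed its log through `check_run`, and fail the build if the returned list is not empty.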
Automated regression testing, not unit testing. Custom tools to compare input and expected output. Performance over everything. Performance tests. Test using pre-loaded systems. Try on x64, x32 etc. Custom tools to synthesise data based on business cases. Modular dtsx. One dev per dtsx. List goes on.
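A custom tool to compare actual output with expected output, as mentioned above, can start out very small. The sketch below assumes each record is a dict with a unique `id` field; that field name and the record shape are assumptions made for illustration.

```python
def diff_records(expected, actual, key="id"):
    """Compare two lists of dict records, matching them up by `key`."""
    exp = {row[key]: row for row in expected}
    act = {row[key]: row for row in actual}
    return {
        "missing": sorted(exp.keys() - act.keys()),     # expected but not produced
        "unexpected": sorted(act.keys() - exp.keys()),  # produced but not expected
        "changed": sorted(k for k in exp.keys() & act.keys() if exp[k] != act[k]),
    }


def is_regression_free(report):
    """True when the run reproduced the expected output exactly."""
    return not any(report.values())
```

The expected records would typically come from a run that was verified by hand once and then stored alongside the input data.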

Pros & Cons of Google App Engine [closed]

Closed 9 years ago.
[An Updated List 21st Aug 09]
Help me Compile a List of all the Advantages & Disadvantages of Building an Application on the Google App Engine
Pros:
No need to buy servers or server space (no maintenance).
Makes solving the problem of scaling easier.
Free up to a certain level of consumed resources.
Cons:
Locked into Google App Engine ?
Developers have read-only access to the filesystem on App Engine.
App Engine can only execute code called from an HTTP request (except for scheduled background tasks).
Users may upload arbitrary Python modules, but only if they are pure-Python; C and Pyrex modules are not supported.
App Engine limits the maximum rows returned from an entity get to 1000 rows per Datastore call. (Update - App Engine now supports cursors for accessing larger queries)
Java applications may only use a subset (The JRE Class White List) of the classes from the JRE standard edition.
Java applications cannot create new threads.
Known Issues!! : http://code.google.com/p/googleappengine/issues/list
Hard limits
Apps per developer - 10
Time per request - 30 sec
Files per app - 3,000
HTTP response size - 10 MB
Datastore item size - 1 MB
Application code size - 150 MB
Update Blob store now allows storage of files up to 50MB
Pro or Con?
App Engine's infrastructure removes many of the system administration and development challenges of building applications to scale to millions of hits. Google handles deploying code to a cluster, monitoring, failover, and launching application instances as necessary.
While other services let users install and configure nearly any *NIX compatible software, App Engine requires developers to use Python or Java as the programming language and a limited set of APIs. Current APIs allow storing and retrieving data from a BigTable non-relational database; making HTTP requests; sending e-mail; manipulating images; and caching. Most existing Web applications can't run on App Engine without modification, because they require a relational database.
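The Cons list above notes that cursors are now available for reading result sets larger than the 1000-row limit. The loop below shows the general paging pattern only: `fetch_page` is a hypothetical function standing in for a datastore query, and `make_fake_fetch` exists just to make the sketch runnable. It is not the App Engine API.

```python
def make_fake_fetch(rows):
    """Build a hypothetical fetch_page(cursor, limit) over an in-memory list."""
    def fetch_page(cursor, limit):
        start = cursor or 0
        page = rows[start:start + limit]
        next_cursor = start + limit if start + limit < len(rows) else None
        return page, next_cursor
    return fetch_page


def fetch_all(fetch_page, page_size=1000):
    """Yield every row, following cursors until the query is exhausted."""
    cursor = None
    while True:
        page, cursor = fetch_page(cursor, page_size)
        yield from page
        if cursor is None:
            break
```

Because `fetch_all` is a generator, only one page is held in memory at a time, which also matters given the per-request time limit listed above.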
Pros:
Scalable
Easy and cheaper (in short term).
Nice option for start-ups/individuals.
Suitable for apps that just store and retrieve data.
Cons:
Not suitable for CPU intensive calculations. They are slower and expensive.
Scalability doesn't matter much cuz if an app works at Google scale then probably it makes enough money to run on its own servers.
They have lots of limitations thrown here and there, as a result deep data analysis is difficult. Like you cannot produce a social graph using GAE.
I would say it's not meant for serious businesses and is expensive in the long run.
(A huge new) PRO: GAE now supports MySQL :
https://developers.google.com/cloud-sql/
Pros:
built-in ui for unified logs
built-in web interface for task queues
built-in indexes on list of primary objects.
Cons:
lose logs very fast
VERY expensive
VERY expensive
VERY expensive
Un-hackable. Scales because you're obligated to code in a way that scales.
Longer development cycles. Sometimes you just want to hack something together and throw it away after 5 hours. With appengine you have to code it properly and write a lot of stuff to make sure it scales. You can't just do a "find . | grep .avi | xargs ffmpeg -compress ...." :)
You will lose hours trying to do the simplest tasks, like sending push notifications to APNS (iPhone). Although it's fine if you only want to support Android in the future.
Terrible for cleanups on the database. It's a HUGE pain in the ass to fix rows in the database, mainly because it's terribly slow, but it also requires a lot of code to loop properly within its time constraints.
It was a pain to port Lucene to work on its "filesystem".
Slow for what you pay.
Even MORE expensive if your app has spikes of traffic. My app has those spikes if a user that has many followers makes an action and we have to push notifications to his followers. Because of that I have to keep 10 inactive servers always on ($$$$$) to handle spikes.
Appengine isn't too bad, given that I have the option to burn $$$$ instead of worrying about scalability and fixing bottlenecks to reduce server usage. Sometimes it's worth it.
My advice to people starting new products is to go with hetzner.de, which is where I host my other products' servers. It's cheap and extremely hackable. I have one server at hetzner that is handling 3x more traffic than the product that I have on appengine. The difference in price is $100 a month versus $2700 a month!
I have system admin experience, so the bottom line is that I would never choose appengine over having my own ROOT server. Don't be that bored software engineer wanting to experiment with new things instead of building great products!
Pro: Unlimited scalability for your application, and it scales with demand.
Con: Not available in some countries (Argentina).
Edit
Available worldwide, but only through Google Groups for App Engine.
When assessing pros and cons, I think it is important to clarify which market one is speaking for. Developers looking for a cost-effective solution to help them with the steep part of their planned hockey-stick growth curve will weigh the cons already listed heavily. For a small business owner, however, GAE is a godsend. These folks are most often looking to "the cloud" as a means to run their business more effectively (i.e. sell physical products and services). For the SMB, the GAE pros already listed can be much more valuable than for the hockey-stick-seeking dev, whilst the cons weigh in at a fraction of the devs' measure. I don't see the GAE team doing anything related to SMB positioning, so I guess answers like this are just me pulling on Superman's cape and spitting into the wind. Really, GAE should be absolutely ruling the SMB space by now. If not (I have no insights re: user base), then it is a greatly lamentable failure.
I believe GAE is yet to mature in terms of providing the basic features for serious business, such as a Datastore with complex primary keys and java.awt.* support, to name just a few.
Other than the free space and building some "hobby" websites, I strongly feel GAE is NOT the place Java guys should be looking into.
I'm having applications built on the JSP/Servlets and MySQL, thinking about migrating to GAE, but I find I will be spending more "value time" on the migration than just buying a space from some java hosting provider such as EATJ, etc (Sorry not marketing, just an experience).
Another big issue I've got is migration of my existing mySQL data into GAE, bulkupload is really pathetic and has no client support.
No support for Local Db to Server DB upload.
Once GAE has addressed all the cons mentioned above, I think we can look into this migration.
You are forced to own a cell phone line, and your country+carrier must be able to receive international SMSs.
(I hate cell phones, and my mom's or co-workers won't get the SMSs)
Con: Other RDBMS or NoSQL databases are not possible ....
Con: All your base are belong to us
... On a serious note:
Con: You don't control the environment your application runs in. The same cons as with outsourcing any component. Fun for toys, not for business (yet) IMHO.
Various things like API for Google proprietary backends such as their database system and other 'lockdowns' and frameworks that mean your code is tied, in some loose sense to their system can create cost issues later if you want to migrate from GAE. Of course, you could abstract these.
I like GAE, AppJet and others. They are cool. But everything has its place. If you want freedom and the ability to control your language's modules, API, syntax/stdlib versions and whatnot ... don't relinquish control to a service provider.
The lack of standards for environments and specifications for what your app can expect worries me in the cloud arena.
common sense stuff really.
Con: Limited to Java and Python

Google App Engine as production platform [closed]

Closed 9 years ago.
We are about to start working on new commercial web project and considering Google App Engine as a potential platform.
Questions:
1. Is Google App Engine really scalable, and can it be considered a production platform for a commercial project?
2. Is it more expensive (or cheaper) than a good hosting company's service in the long run?
3. Is it possible (and reasonably cheap) to move the app from Google App Engine to an independent server/farm (e.g. to use it as a private system, to exploit our own hardware, etc.)?
4. Is there some mechanism to deal with DDoS attacks?
5. Can I make a full backup of the app data?
Sorry for such silly questions.
I'll answer question 1:
I'm in the pilot phase of a new web application on app engine. We've spent about a month writing code and getting things ready for our first customer. They went live last week. They love the software, but a couple of days ago I started to get random deadline exceeded errors in the application. You look up a record or a list and it comes back in milliseconds. The next time it takes 30 seconds and comes back with a deadline exceeded error.
The stack traces in the dashboard give random results. I've tried everything, even stripping the app down to a hello world. I put a log message into our django process request middleware, the first bit of our code that gets executed. It was showing that on the timeout requests it took 25 seconds from google getting the request to running our process_request code. I posted to the google forum and got nothing. I contacted someone at google and they answered back quickly but only said they would contact the team. Nothing since.
It is possible there's something I'm doing to cause this but I really doubt it. Google doesn't provide support so I'm basically out of luck.
If this was a full blown commercial application I'd be out of business.
tl;dr: google app engine has great promise but needs to mature and is not yet suitable for commercial production
Watch Google I/O (where, among other things, they say: "yep, it's scalable").
That depends ... It can even be free for you (you pay for load that you've got).
You can move to Amazon for example using appdrop. It's also a good idea to use app-engine-patch.
... Good question. I really do not know.
Use GAEBar.
It all depends on your needs.
For a project that has the need to scale from very few users to possible millions of users in short time, google app engine might be exactly what you're looking for.
However, note that you might be surprised by the limitations that GAE comes with. Datastore can amongst others not do full-text search or queries using the IN statement.
So be careful to specify what needs your application will have, and what data you're going to store and search for.
This also means that moving your application from GAE to a separate server might be troublesome, since the database architecture will most likely be different.
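For the missing IN operator mentioned above, one common workaround is to run one equality query per value and merge the results in application code. The sketch below assumes a hypothetical `query_equal(field, value)` function and entities that are dicts with a unique `key`; neither is a real Datastore API, and `make_fake_store` only exists to make the example runnable.

```python
def make_fake_store(entities):
    """Hypothetical stand-in for a datastore that only supports equality filters."""
    def query_equal(field, value):
        return [e for e in entities if e.get(field) == value]
    return query_equal


def query_in(query_equal, field, values):
    """Emulate `field IN values` with one equality query per value."""
    seen = set()
    results = []
    for value in values:
        for entity in query_equal(field, value):
            if entity["key"] not in seen:  # skip entities already returned
                seen.add(entity["key"])
                results.append(entity)
    return results
```

Each value costs a separate query, so this only makes sense for short value lists. It is the kind of requirement worth identifying before committing to the platform.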
My answers:
BuddyPoke runs on gae (probably the biggest app), check their millions numbers.
You don't pay until your app grows quite a bit
If you are familiar with python, web2py offers this feature with some limitations
Dos protection (java, python)
Gaebar, here a great article.
Your question #3 raises a red flag. If this is an important issue, I'd caution against App Engine at this time. I love the platform, and don't doubt that there will be viable migration paths to a self-hosted solution at some point, but not now. Things like appdrop prove it would be possible to do, but would the effort and investment be worth it? That's the question I'd ask. I'd love to know if somebody has successfully ported a real-world production app engine app to another host.
Backups should be easily scripted or there are tools like GAEbar as Bolotov mentioned.
Regarding cost, you can probably get tens (maybe hundreds) of thousands of objects (records) and decent traffic/use for free. Beyond that, I'm not sure about comparative hosting costs, sounds like a good area to do some research in (note to self).
Finally, Silfverstrom is right about limitations, especially around full-text search. There are some projects underway to tackle this, but probably nothing as robust as a mature RDBMS.
To update with some more recent info (2013), GAE now has a text search API. You can't search data in the database directly; you create searchable documents from your data, and add those to a searchable index. It's not terribly hard to do, but it's a hassle. In particular, whenever your data changes, you need to regenerate the changed documents and update them in the index.
It's also fairly easy to export data into Google Big Query, which makes it easy to do reporting.
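The document-and-index workflow described above can be illustrated with a toy in-memory index. This is not the App Engine Search API; `TinyIndex` and its methods are invented here purely to show why a data change has to be followed by re-indexing the affected document.

```python
class TinyIndex:
    """Toy inverted index: word -> set of document ids."""

    def __init__(self):
        self._words_by_doc = {}
        self._docs_by_word = {}

    def put(self, doc_id, text):
        """Add a document, replacing any earlier version with the same id."""
        self.remove(doc_id)
        words = set(text.lower().split())
        self._words_by_doc[doc_id] = words
        for word in words:
            self._docs_by_word.setdefault(word, set()).add(doc_id)

    def remove(self, doc_id):
        """Drop a document and all index entries that point at it."""
        for word in self._words_by_doc.pop(doc_id, set()):
            self._docs_by_word[word].discard(doc_id)

    def search(self, word):
        """Return the sorted ids of documents containing `word`."""
        return sorted(self._docs_by_word.get(word.lower(), set()))
```

If a product description stored in the database changes from "Red bicycle" to "Blue bicycle", a search for "red" keeps returning it until `put` is called again with the new text. That extra step on every write is the hassle described above.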
