Difference between Lighthouse numbers and observed performance - mobile

I have spent literally weeks chasing perfect desktop and mobile scores from Google Search Console. My website is new, so there is not enough real-user data, and I'm forced to rely on Lighthouse.
At 1 AM today my score was 97 desktop, 80 mobile. At 2 PM today, with no changes, it dropped to 80/32.
I am also being informed that there is a FOIT (flash of invisible text) due to improper loading of code.
I have run these tests dozens of times over the past few weeks, making minor changes and then looking at their impact. I am chasing smoke.
Just before I wrote this note, my son and I decided to test real-world timing for the first time. I have a desktop PC running Windows 10, fully updated: an i5 processor about 3 years old, 12 GB of RAM, and a 300 Mbps cable modem connection over wired Ethernet.
Lighthouse reported a Time to Interactive of 1.8 s. In practice the website loaded so fast it was not measurable with a digital stopwatch: well under 1 second. Next, we tried mobile connections over home Wi-Fi (Netgear Nighthawk AC1900), first with a Samsung S8 phone accessing the website via my Google Ads link.
Time from pressing the website button to complete load: 1.8 seconds, +/- 0.2 seconds. Next, my son's iPhone 8 Plus. Time to complete load: 1.6 seconds, +/- 0.2 seconds. Lighthouse's reported Time to Interactive: 8.4 s.
The results were close on the desktop, but off by about 400% on the mobile connections. I am a retired doctor with no IT training, but I know where to find the information I need and how to apply it. I just spent the last two weeks chasing numbers that have little relationship to real-world experience: an enormous waste of time.
I suggest that if there is not enough real-life data to assess Time to Interactive, Google might be better served by not relying on a simulation that correlates this poorly with real-world experience. I just wasted a lot of time chasing numbers that change arbitrarily and bear no relationship to observable, measurable performance, and I am sure I'm not the only one.
In addition, there was zero observable FOIT on any device, so if it does exist, it is too fast for the human eye to observe, and as such is irrelevant. Knowing this, what is the purpose of the Lighthouse test, why do its numerical scores change so dramatically during the course of a day, and how do the numerical results of such a test improve real-world experience?
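For anyone who wants a repeatable number instead of a stopwatch, here is a rough sketch that drives a real browser with Selenium and reads the Navigation Timing numbers. It assumes Chrome and the selenium package are installed; the URL and the sample count are placeholders, not part of the original test.

```python
# A rough sketch, not a substitute for Lighthouse: drive a real browser with
# Selenium and read the Navigation Timing numbers. Assumes Chrome plus the
# selenium package are installed; the URL is a placeholder for the real site.
import statistics
import time

from selenium import webdriver

URL = "https://example.com"  # placeholder

driver = webdriver.Chrome()
samples = []
for _ in range(5):
    driver.get(URL)
    time.sleep(0.5)  # give the load event a moment to finish before reading timings
    load_ms = driver.execute_script(
        "const t = performance.timing;"
        " return t.loadEventEnd - t.navigationStart;"
    )
    samples.append(load_ms)
driver.quit()

print(f"median full-load time over {len(samples)} runs: "
      f"{statistics.median(samples):.0f} ms")
```

Run a handful of times on each device's network to get a median rather than a single reading.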

Related

AWS Lightsail Metric graphs "No data available"

We're using an AWS Lightsail PostgreSQL database. We've been experiencing errors with our C# application timing out on its connections to the database. While trying to debug the issue, I went to look at the metric graphs in AWS and noticed that many of the graphs have frequent gaps in the data, labeled "No data available". See image below.
This graph (and most of the other metrics) shows frequent gaps in the data. I'm trying to understand whether this is normal or could be a symptom of the problem. If I go back to a two-week timescale, there don't appear to be any other strange behaviors in any of the metric data. For example, I do not see a point in time in the past where the CPU or memory usage went crazy. The issue started happening about a week ago, so I was hoping the metrics would have helped explain why the connections to the PostgreSQL database are failing from C#.
So I guess my question is: are those frequent gaps of "No data available" normal for an AWS Lightsail PostgreSQL database?
Other Data about the machine:
1 GB RAM, 1 vCPU, 40 GB SSD
PostgreSQL database (12.11)
In the last two weeks (the average metrics show):
CPU utilization has never gone over 20%
Database connections have never gone over 35 (usually less than 5) (actually, usually 0)
Disk queue depth never goes over 0.2
Free storage space hovers around 36.5 GB
Network receive throughput is mostly less than 1 kB/s (with one spike to 141 kB/s)
Network transmit throughput is mostly less than 11 kB/s, with all spikes less than 11.5 kB/s
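To see whether the gaps are in the underlying datapoints or just in the console graphs, a rough sketch along these lines pulls the raw metric data with boto3 and counts what came back against what the period implies. The database name, region, and time window are placeholders, and this assumes local AWS credentials that can call Lightsail.

```python
# Rough sketch: pull raw CPUUtilization datapoints for the Lightsail database
# and compare how many came back versus how many the period implies.
# The database name and region are placeholders.
from datetime import datetime, timedelta, timezone

import boto3

client = boto3.client("lightsail", region_name="us-east-1")  # placeholder region

period = 300  # 5-minute buckets
end = datetime.now(timezone.utc)
start = end - timedelta(hours=6)

resp = client.get_relational_database_metric_data(
    relationalDatabaseName="my-database",  # placeholder name
    metricName="CPUUtilization",
    period=period,
    startTime=start,
    endTime=end,
    unit="Percent",
    statistics=["Average"],
)

points = sorted(resp["metricData"], key=lambda p: p["timestamp"])
for p in points:
    print(p["timestamp"].isoformat(), f"{p['average']:.1f}%")

expected = int((end - start).total_seconds() // period)
print(f"{len(points)} datapoints returned, ~{expected} expected for a {period}s period")
```

If the raw datapoints are complete, the gaps are a console rendering quirk; if they are missing here too, that points at the instance itself.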
I would love to dig into the database logs, but they go back a month and are filled with checkpoint starting/complete entries. They start one month ago, and each page update only moves me 2 hours forward in time (taking ~6 seconds per fetch). That would require ~360 page updates, and when I tried, my auth timed out. 😢
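Rather than paging through the console two hours at a time, a rough sketch like this walks the database log stream with boto3 and its page tokens. The database name, region, and the assumption that the stream is called "postgresql" are mine, not confirmed from the original post.

```python
# Rough sketch: page through Lightsail database log events without the console.
# "my-database" is a placeholder; the stream name "postgresql" is an assumption
# (get_relational_database_log_streams lists the actual stream names).
from datetime import datetime, timedelta, timezone

import boto3

client = boto3.client("lightsail", region_name="us-east-1")  # placeholder region

end = datetime.now(timezone.utc)
start = end - timedelta(days=30)

page_token = None
while True:
    kwargs = dict(
        relationalDatabaseName="my-database",
        logStreamName="postgresql",
        startTime=start,
        endTime=end,
        startFromHead=True,
    )
    if page_token:
        kwargs["pageToken"] = page_token
    resp = client.get_relational_database_log_events(**kwargs)

    for event in resp.get("resourceLogEvents", []):
        message = event["message"]
        # Skip the checkpoint noise and keep anything that looks interesting.
        if "checkpoint starting" in message or "checkpoint complete" in message:
            continue
        print(event["createdAt"].isoformat(), message.rstrip())

    next_token = resp.get("nextForwardToken")
    if not next_token or next_token == page_token:
        break
    page_token = next_token
```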
So we never figured out the reason, but this seems like it was a problem with the AWS Lightsail DB. We ended up using a snapshot to create a new clone of the DB and pointing the C# servers at the new one. The latency issues we were having disappeared and the metric graphs looked normal (without the strange gaps).
I wish we were able to figure out the root of the problem. ATM, we are just hoping the problem does not return.
When in doubt, clone everything! 🙃

Interpreting cost data in Google Cloud Platform

I host a basic web app on Google Cloud Platform, and I've noticed my costs creeping up over the last couple of months. It's really accelerated over the last 30 days (fortunately, on a tiny base - I'm still ticking along at under $2 a day). I haven't added any new functionality or clients in months so this was a bit surprising.
My first instinct was an increase in traffic. I couldn't see anything like that in the App Engine dashboard, but I put in a heap of optimizations and dramatically decreased QPS just in case. No change.
The number of instances hasn't moved around much either - this looks like the most likely culprit but it's still just flat, not growing.
My next guess was that data was accumulating in Datastore (even though the cost chart is filtered to App Engine only, I figured a fuller datastore -> a slower datastore -> more instance time in GAE). There's no chart for this, annoyingly, but I determined the Datastore size was more or less flat (I have a blunt-instrument TTL job that runs daily) and culled it by dropping my retention threshold by 20% just to be safe.
These optimizations were on the 17th, but my cost hasn't moved at all. I considered forex fluctuations (I'm billed in Aussie dollars, all my charges are for frontend instances in Japan) but they haven't been anywhere near big enough to explain this.
Any ideas what's going on? I've clicked through all the graphs and reports in billing but can't reconcile the ~100% growth in cost with a flat or dropping QPS, instance count and database size.
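One way to pin down where the growth is coming from is the billing export to BigQuery, which breaks cost down by service and SKU per day. Here is a rough sketch, assuming the standard export is already enabled; the project/dataset/table name is a placeholder.

```python
# Rough sketch, assuming the standard billing export to BigQuery is enabled.
# The project/dataset/table name is a placeholder; yours will differ.
from google.cloud import bigquery

client = bigquery.Client()

query = """
SELECT
  DATE(usage_start_time) AS day,
  service.description    AS service,
  sku.description        AS sku,
  ROUND(SUM(cost), 4)    AS total_cost
FROM `my-project.billing_export.gcp_billing_export_v1_XXXXXX`  -- placeholder
WHERE usage_start_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 60 DAY)
GROUP BY day, service, sku
HAVING total_cost > 0
ORDER BY day, total_cost DESC
"""

# A step change in one SKU (e.g. frontend instance hours) on a particular day
# should stand out immediately in this breakdown.
for row in client.query(query):
    print(row.day, row.service, row.sku, row.total_cost)
```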
Yes! I've seen the same thing on a simple App Engine website running Python 3.7! I've had a ticket open since April 29th and they're not helpful. I saw a step change in frontend instance hours on March 24th with no corresponding increase in traffic. I have screenshots that are really telling but I can't upload them since I don't have 10 reputation points.
There's no corresponding increase in traffic, either in the cloud console or in Google analytics.
What's worse, each day the daily estimate shows I'll be under the 28-hour quota. For example, I took a screenshot showing that after 15 hours I was on pace for 24.352 frontend instance hours for the day (I didn't take one at the end of the quota day, since it resets at 3 AM).
When I woke up the next morning, the billing report showed I was charged $0.00 for frontend instance hours for the previous day, but 3 hours later it shot up to $0.48, which means I used 38.6 frontend instance hours' worth.
Somehow, the estimated cost calculation was off by 14 hours. Why have the estimate at all if it has an error that large? When I looked at the minute-by-minute billed instance hours for the period from the screenshot through the end of the quota day, there's nothing that indicates I would have used 23 additional hours between the time I took the screenshot and the quota reset.
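To make the mismatch concrete, the arithmetic the numbers above imply goes roughly like this (my back-of-the-envelope, not anything taken from the billing console):

```python
# Back-of-the-envelope arithmetic for the numbers quoted above; nothing here
# comes from the billing console itself.
QUOTA_DAY_HOURS = 24

elapsed = 15                                  # hours into the quota day at screenshot time
on_pace_estimate = 24.352                     # frontend instance hours, per the console
used_at_screenshot = on_pace_estimate * elapsed / QUOTA_DAY_HOURS   # ~= 15.2 h

billed_total = 38.6                           # implied by the $0.48 charge
remaining = QUOTA_DAY_HOURS - elapsed                               # 9 wall-clock hours left
extra_hours = billed_total - used_at_screenshot                     # ~= 23.4 instance hours
avg_instances = extra_hours / remaining                             # ~= 2.6 instances

print(f"used at screenshot ~= {used_at_screenshot:.1f} instance hours")
print(f"the bill implies ~= {avg_instances:.1f} instances running flat out "
      f"for the remaining {remaining} hours")
```

In other words, the bill only adds up if roughly two to three instances ran continuously for the rest of the quota day, which the minute-by-minute graph doesn't show.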
This behavior has been happening every day since March 24th for me with no explanation from Google besides "it looks like you exceeded your instances..." I wish I could share the screenshots so you can compare what you're seeing.

What Amazon EC2 instance would be suggested for an e-commerce website with Windows and SQL Server?

We are running an e-commerce venture that gets around 2,000 unique visitors a day. The total data is around 6 GB as of now.
We are using SQL Server as our database, and in the coming months the website may scale up to 10,000 users per day.
From this link I deciphered that it would be best to use an M1 instance, but could anyone help? I'm really clueless as to which of these options to purchase.
Note: our budget is around 170 dollars per month.
EDIT: The number of concurrent users we have had is around 150.
I'd try to fit everything in memory. If you can't due to budget, you need to make sure the disk response times are up to par for your expected load. Applications vary widely: one visit to a homepage could generate many queries, or maybe you have application caching set up, so it's hard for anyone to just tell you. You should also get solid numbers on your peak number of concurrent users so you can plan for that. You don't mention your current environment, but you can gather some numbers about CPU, disk reads/writes per second, and memory used to help you pick the right size.
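If the current box is Windows, a rough sketch like this (assuming the psutil package can be installed there) will give baseline CPU, memory, and disk numbers to size against:

```python
# Rough sketch, assuming the psutil package can be installed on the current
# Windows box: sample CPU, memory, and disk counters to get sizing numbers.
import time

import psutil

for _ in range(12):                     # ~1 minute of samples, 5 s apart
    cpu = psutil.cpu_percent(interval=1)          # % over a 1 s window
    mem = psutil.virtual_memory()
    disk = psutil.disk_io_counters()              # cumulative since boot
    print(
        f"cpu={cpu:.0f}%  "
        f"mem_used={mem.used / 2**30:.1f} GiB ({mem.percent:.0f}%)  "
        f"disk_read_total={disk.read_bytes / 2**20:.0f} MiB  "
        f"disk_write_total={disk.write_bytes / 2**20:.0f} MiB"
    )
    time.sleep(4)
```

Sample during peak traffic, not at night, and diff the cumulative disk counters between samples if you want per-second throughput.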
I'd look at the m1.xlarge. That gives you 15 GB of memory to play with: you'll be able to cache all the data you need, have some left over for the OS, and still have room to grow. CPU probably won't be your issue, but be sure to check your current usage.
If you have some time to spend on it, I'd try setting up JMeter to do some load testing and see how many concurrent users you can max out with one of the cheaper options.
This topic may be better suited to ServerFault.
I'd suggest you look at Reserved Instances in the heavy-utilization category, where you can get significant discounts if you plan to run the instance for a year or more.
But with that budget you should be thinking about an m1.medium instance, which might be a little tight for your requirements.

App Engine High CPU Warning - Is it really an issue?

I recently finished my app and have since begun testing. One of my main resources makes 14 RPCs. I have made every effort to decrease CPU time, e.g., marking properties as unindexed; however, I really cannot reduce the number of GETs, PUTs and queries that occur in the request.
The average CPU times associated with the request today are:
ms=381 cpu_ms=1192 api_cpu_ms=1122 cpm_usd=0.033597
This does fluctuate and can be less, but it rarely falls low enough for the high-CPU warnings to go away.
I've noticed my app still scales fine; I have about 15 instances running whilst testing.
So my question is: should I be concerned about the high-CPU warnings, or is this something most people experience?
As long as your wallclock time (the first number in those timings) averages less than 1000 ms, your app will continue to autoscale just fine, regardless of how many CPU ms your app uses. As systempuntoout explains, the high-CPU warnings are a holdover from when we limited high-CPU requests on a per-URL basis, and they no longer apply except as a guide for optimisation.
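To check that against your own logs, a rough sketch along these lines parses the ms= field from request-log lines like the one quoted in the question; the log file name is a placeholder for however you export your logs.

```python
# Rough sketch: parse App Engine request-log lines like
#   ms=381 cpu_ms=1192 api_cpu_ms=1122 cpm_usd=0.033597
# and check whether average wall-clock time stays under ~1000 ms.
import re
import statistics

PATTERN = re.compile(r"\bms=(\d+)\b")   # matches "ms=", not "cpu_ms=" or "api_cpu_ms="

wall_times = []
with open("request_logs.txt") as logs:  # placeholder file name
    for line in logs:
        match = PATTERN.search(line)
        if match:
            wall_times.append(int(match.group(1)))

if wall_times:
    avg = statistics.mean(wall_times)
    verdict = "fine for autoscaling" if avg < 1000 else "over the 1000 ms guideline"
    print(f"average wall-clock: {avg:.0f} ms over {len(wall_times)} requests ({verdict})")
```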
If your application can't be optimized but the CPU usage is fine for your daily budget, I think you could ignore this warning without any concern.
Years ago, the number of high-CPU requests in a given time window was limited, and the warning was useful for monitoring and correcting the application to stay within that quota; that limit was later removed, and the high-CPU warning is now useful just to identify parts of the program that should be checked for optimization (where possible).
In practice, I've never had an issue with not enough instances being spun up to handle user requests even if I am running many longer-running jobs in the background [e.g., an expensive mapreduce job]. Of course, your budget needs to cover your CPU expenses or your app won't be able to operate.
That said, I remember reading in the docs that if the majority of your requests take a long time, App Engine might not spin up additional instances to handle new requests. If the majority of your requests are served quickly, though, a minority of long-running jobs shouldn't cause any problems. (Unfortunately, I can't find the article/docs with the specific guidance on longer-running requests at the moment.)

Is app engine more expensive when it's slower?

There have been quite a few occasions recently when app engine appears to run slower. To some degree that's understandable with the architecture of their cloud platform. I'm not talking about new server instances - just requests to warm servers. I'm also just referring to CPU, not datastore API, but I do wonder about that as well.
It seems that during these slow periods I get a lot more yellow warnings on my requests - saying I am using a lot of CPU. Certainly they take longer to complete during this period. What concerns me is that during these slow periods, my billable CPU seems to go up.
So to be clear - when app engine is fast, a request might complete in 100ms. In a slow period, it might take more than 1s for the same request. Same URI, same caching, same processing path, same datastore, same indexes - much more CPU. The yellow warnings, as I understand it, are referring to billable CPU usage, and there's many more of them when app engine is slower.
This seems to set up a bizarre situation where my app costs more to run when app engine performance is worse. This means google makes more money the more poorly the platform performs (up to the point where it fails or customers leave). Maybe I've got the situation all wrong, and it doesn't work like that - but if it does work like that, then as a customer the pressures and balances there are all wrong. That's not intimating any wrong-doing on google's part - just that the relationships between those two things don't seem right.
It almost seems like Google's algorithm goes something like: "give the processing job to a CPU, start the stopwatch, stop it when the job returns, and that's the billable CPU figure" - i.e., it doesn't measure CPU work at all. Surely that time should be divided by the number of jobs being executed concurrently, plus some extra to cover the additional context switching. I'm sure that stuff is hard to measure - perhaps that's the reason.
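A rough sketch of the distinction being drawn here: wall-clock time (what a stopwatch sees, inflated by contention and waiting) versus the CPU time a process actually consumes. Nothing below reflects how App Engine's metering actually works; it just illustrates why the two numbers diverge under load.

```python
# Rough sketch: wall-clock time grows when the machine is busy or the code is
# waiting, while CPU time consumed by this process stays roughly constant.
import time

def busy_work(n: int = 2_000_000) -> int:
    return sum(i * i for i in range(n))

wall_start = time.perf_counter()
cpu_start = time.process_time()

busy_work()
time.sleep(0.5)   # stands in for waiting on a contended or slow shared resource

wall_ms = (time.perf_counter() - wall_start) * 1000
cpu_ms = (time.process_time() - cpu_start) * 1000

# Billing on wall_ms rather than cpu_ms is exactly the concern raised above.
print(f"wall: {wall_ms:.0f} ms, cpu: {cpu_ms:.0f} ms")
```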
I guess you could argue it is fair that you pay more when app engine is in high demand, but that makes budgeting close to impossible - you can't generate stats like '100 users costs me $1 a day', because that could change for a whole host of reasons - including app engine onboarding more customers than the infrastructure can realistically handle. If google over-subscribes app engine then all customers pay more - it's another relationship that doesn't sound right. Surely google's costs should go down as they onboard more customers, and those customers use more resources - based on economies of scale.
Should I expect two identical requests in my app to cost me roughly the same amount each time they run - regardless of how much wall-time app engine takes to actually complete them? Have I misunderstood how this works? If I haven't, is there a reason why I shouldn't be worried about it in the long term? Is there some documentation which makes this situation clearer? Cheers,
Colin
It would be more complicated, but they could change the billing algorithm to be a function of load. Or perhaps they could normalize the CPU measurements based on the performance of similar calls in the past.
I agree that this presents problems for the developers.
Yes, this is true. It is a bummer. It also took them over a second to start up my Java application (which I was billed for) every time they decided my site was in low demand and didn't need the resources.
I ended up using a cron job to auto-ping my site every minute to keep it warm. Doing all that wasted work actually made my bill cheaper: instead of the startup time, it just had lots of 2 ms pings.
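As a rough sketch of that keep-warm pattern (shown here on the Python runtime rather than the Java app the answer describes), a cron.yaml entry with url: /keep-warm and schedule: every 1 minutes could drive a handler like this; the route name is a placeholder.

```python
# Rough sketch of a keep-warm handler driven by App Engine cron; the route
# name is a placeholder.
from flask import Flask, request

app = Flask(__name__)

@app.route("/keep-warm")
def keep_warm():
    # App Engine cron requests carry this header; reject anything else so
    # random crawlers can't keep the instance alive on your dime.
    if request.headers.get("X-Appengine-Cron") != "true":
        return "forbidden", 403
    return "ok", 200
```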
This question appears old and I think the pricing scheme must have changed...
Google App Engine charges for "instance hours", and the instances currently spawned are viewable in the GAE console. Google also provides adjustments so you can trade off cost versus latency for your app.
https://developers.google.com/appengine/docs/adminconsole/performancesettings
I did notice that if the frontend is bogged down hitting a common backend resource, GAE will spawn a bunch of instances to get latency down, and you will pay for those instance hours even though latency/throughput doesn't improve. The adjustments I mentioned seem to help with that.
