I am deploying my React app to AWS S3 using AWS CodeBuild, with caching through AWS CloudFront, but the bucket has grown to more than 10 GB within a month because of frequent deployments.
I tried deleting old build files during deployment, but that causes problems for users who have the old code cached in their browser: the cached page requests files from the previous build, those files have been deleted, and the request returns a 404.
I tried setting no-cache on the index.html file, but that does not resolve the issue.
Has anyone else faced this issue?
#Nilanth here is what I do in a similar case:
My stack is also a React app (not business critical; it offers article selection for the main content management flow). The app is built from CodeCommit through CodeBuild into an S3 bucket, using CodePipeline and a buildspec.yml file. The build is triggered by a commit to the repository. I faced a similar problem: CloudFront didn't serve the newest JS files referenced by the HTML, so it started to look like a caching issue.
I arrived at a pretty good solution:
Update the CloudFront cache settings (edit the behavior, choose "Use legacy cache settings") and set the min / max TTLs to 0. This addresses the caching, so users should get the newest versions immediately.
For the JS / CSS file issue, I add AWS CLI remove commands to the buildspec.yml file, like:
aws s3 rm s3://<s3_bucket>/static/js/ --recursive
aws s3 rm s3://<s3_bucket>/static/css/ --recursive
Set those as pre_build commands.
Note: once the JS files are removed, your application cannot be used until the new ones are uploaded to the /js and /css folders. If your application is business critical, you should think beyond this, since there will be a 30 - 60 s window in which the app cannot be used. And if the build fails, there are no JS/CSS assets at all; in that case you can re-trigger an old build from CodeBuild. Making this robust for a business-critical app will take some DevOps effort.
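A possible alternative to wiping the folders up front is to upload the new build first and then delete only files that the new build does not contain and that are older than some grace period. The sketch below shows just the selection logic. The function name `stale_keys`, the 7-day grace period, and the file names are illustrative assumptions, not part of the setup described above; the listing would have to come from your own S3 listing step.

```python
from datetime import datetime, timedelta, timezone


def stale_keys(objects, current_keys, now, grace_days=7):
    """Return the keys that are safe to delete under the assumed policy.

    objects: iterable of (key, last_modified) pairs, e.g. from an S3 listing.
    current_keys: set of keys uploaded by the build that was just deployed.
    A key is stale only if the new build does not contain it AND it is older
    than the grace period, so browsers holding an old index.html can still
    fetch the bundles it references for a while.
    """
    cutoff = now - timedelta(days=grace_days)
    return sorted(
        key
        for key, last_modified in objects
        if key not in current_keys and last_modified < cutoff
    )


# Hypothetical listing: three generations of the same bundle.
now = datetime(2024, 1, 31, tzinfo=timezone.utc)
listing = [
    ("static/js/main.aaa111.js", datetime(2024, 1, 1, tzinfo=timezone.utc)),
    ("static/js/main.bbb222.js", datetime(2024, 1, 29, tzinfo=timezone.utc)),
    ("static/js/main.ccc333.js", datetime(2024, 1, 31, tzinfo=timezone.utc)),
]
print(stale_keys(listing, {"static/js/main.ccc333.js"}, now))
# ['static/js/main.aaa111.js']
```

Only the oldest bundle is selected: the previous build is still inside the grace period and the current build is never touched, so the app stays usable during and after the deploy.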
To allow the remove commands to run against the S3 bucket, you need to give CodeBuild additional permissions. Go to the build project and look up the environment's service role. Then go to IAM / Roles, pick that role, and grant it more S3 permissions, e.g. AmazonS3FullAccess, which is more than enough.
I am not sure this is a fully correct solution from the CloudFront side, but it seems to avoid the caching problem, and the bucket size stays small.
-MM
Several components could be returning the 404; you'll need to check them one by one to find the root cause(s).
First, I'd check the bucket itself: request <s3-bucket-url>/index.html and see whether the file (in this case index.html) exists.
Second, CloudFront. I'll assume the distribution is configured correctly (i.e. the / path serves /index.html). Also, every time you change the bucket files, create an invalidation to speed up propagation.
Third, you'll need to tell your users to hard-reload the page or use incognito mode, especially if your site is under constant development.
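For the invalidation step, a minimal sketch of the request payload might look like this. It only builds the `InvalidationBatch` structure; the helper name `build_invalidation_batch` and the reference string "deploy-42" are assumptions for illustration, and actually sending it requires boto3 and AWS credentials, which are not used here.

```python
import time


def build_invalidation_batch(paths, caller_reference=None):
    """Build the InvalidationBatch structure for a CloudFront invalidation.

    paths: list of path patterns such as "/index.html" or "/*".
    caller_reference: any unique string; a timestamp is used if omitted.
    """
    if not paths:
        raise ValueError("at least one path is required")
    for path in paths:
        # The CloudFront API expects invalidation paths to begin with "/".
        if not path.startswith("/"):
            raise ValueError(f"path must start with '/': {path!r}")
    return {
        "Paths": {"Quantity": len(paths), "Items": list(paths)},
        "CallerReference": caller_reference or str(time.time()),
    }


batch = build_invalidation_batch(["/index.html"], caller_reference="deploy-42")
print(batch)

# With boto3 installed and credentials configured, the batch could be sent as:
#   boto3.client("cloudfront").create_invalidation(
#       DistributionId="<distribution-id>", InvalidationBatch=batch)
```

Invalidating only /index.html is usually enough when the JS/CSS bundles carry a hash in their file names, since those names change on every build anyway.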
Related
I have a Google App Engine app, which connects to Google Cloud Storage.
I noticed that the amount of data stored was unreasonably high (4.01 GB, when it should be 100MB or so).
So, I looked at how much each bucket was storing, and I found that an automatically created bucket whose name starts with us.artifacts. was taking up most of the space.
I looked inside, and all it has is one folder: containers/images/.
From what I've Googled, it seems like these images come from Google Cloud Build.
My question is, can I delete them without compromising my entire application?
I have solved this problem by applying a deletion rule. Here's how to do it:
Open the project in Google Cloud console
Open the storage management (search for "Storage" for example).
In the Browser tab, select the bucket us.artifacts....
Now, open the Lifecycle section.
Click on Add a rule and provide the following settings:
In the action, select Delete object
In the conditions, select Age and enter, for example, 3 days
Click on Create to confirm the creation
Now all objects older than 3 days will be automatically deleted. It might take a few minutes for this new rule to be applied by Google Cloud.
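The same rule can also be written as a lifecycle configuration file instead of being clicked together in the console. The sketch below generates that JSON; the helper name `delete_after_days` is an assumption for illustration, and the layout is the one accepted by `gsutil lifecycle set`.

```python
import json


def delete_after_days(age_days):
    """Return a lifecycle configuration that deletes objects older than age_days."""
    if age_days < 0:
        raise ValueError("age_days must not be negative")
    return {
        "rule": [
            {
                "action": {"type": "Delete"},
                "condition": {"age": age_days},
            }
        ]
    }


config = delete_after_days(3)
print(json.dumps(config, indent=2))
```

Saved to a file, the output could then be applied with `gsutil lifecycle set <file> gs://<bucket>`.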
For those of you seeing this later on, I ended up deleting the folder, and everything was fine.
When I ran Google Cloud Build again, it added items back into the bucket, which I had to delete later on.
As #HarshitG mentioned, this can be set up to happen automatically via deletion rules in cloud storage. As for myself, I added a deletion step to my deployment GitHub action.
Here is the relevant passage from the documentation:
Built container images are stored in the app-engine folder in Container Registry. You can download these images to keep or run elsewhere. Once deployment is complete, App Engine no longer needs the container images. Note that they are not automatically deleted, so to avoid reaching your storage quota, you can safely delete any images you don't need. For more information about managing images in Container Registry, see the Container Registry documentation.
This can be automated by adding a lifecycle rule, as #HarshitG mentioned.
You can add the rule in the console at console.cloud.google.com.
Open the bucket with the artifacts (the default is "us.artifacts.yourAppName.appspot.com").
Go to "Life cycle".
Click on "Add a rule".
Select "Delete object" and press "Continue".
Choose the condition for deleting objects. I chose "Age" and entered three days, so each object is deleted automatically once it is 3 days old.
Click on "Create". The rule is now active, so you no longer need to visit every day to clean the bucket.
Same issue. Thanks for the update, Caleb.
I'm having the same issue, but I don't have an App running; I just have:
Firebase Auth
Firestore
Firebase Functions
Cloud Storage
Not sure why I have 4GB stored in those containers, and I'm not sure if I should delete them or if that would break my functions.
UPDATE:
I deleted the container folder and all still works. Not sure if those are backups or whatnot, but I can't find anything online or in the docs. I will post here if something happens. As soon as a cloud function ran, the folder had 33 files again.
I recommend against setting up lifecycle rules on your Storage buckets. It will likely lead to breaking subsequent updates to the function (as described in Cloud Function build error - failed to get OS from config file for image).
If you are interested in cleaning up container images, you should instead delete the container images stored in Google Container Registry (GCR): https://console.cloud.google.com/gcr. Deleting container images in GCR repos will automatically clean up the objects stored in your Cloud Storage.
https://issuetracker.google.com/issues/186832976 has relevant information from a Google Cloud Functions engineer.
I am building a React app, a single-page application hosted on Amazon S3.
Sometimes I deploy a change to the back end and to the front end at the same time, and I need all browser sessions to start running the new version, or at least those sessions that start after the last front-end deploy.
What happens is that many of my users are still running the old front-end version on their phones for weeks, which is no longer compatible with the new version of the back end, while some of them get the update when they start their next session.
As I use Webpack to build the app, it generates bundles with hashes in their names, while the index.html file, which defines the bundles that should be used, is uploaded with the following cache-control property: "no-cache, no-store, must-revalidate". The service worker file has the same cache policy.
The idea is that the user's browser can cache everything except for the first files it needs. The plan was good, but when I replace the index.html file with a newer version, my users do not refetch this file when they restart the app.
Is there a definitive guide or a way to work around that problem?
I also know that a PWA should work offline, so it has to have the ability to cache to reuse, but this idea doesn't help me to perform a massive and instantaneous update as well, right?
What are the best options I have to do it?
You've got the basic idea correct. Why your index.html is not updated is a tough question to answer, since you're not providing any code. Please include your Service Worker code. Keep in mind that, depending on the logic implemented in the Service Worker, it doesn't necessarily honor the HTTP caching headers and may cache everything, including the index.html file, which seems to be what is happening now.
In order to have the app work also in offline mode, you would probably want to use a network-first SW strategy. Using network-first the browser tries to load files from the web but if it doesn't succeed it falls back to the latest cached version of the particular file it tried to get. Another option would be to choose what is called a stale-while-revalidate strategy. That first gives the user the old file (which is super fast) and then updates the file in the background. There are other strategies as well, I suggest you read through the documentation of the most widely used SW library Workbox (https://developers.google.com/web/tools/workbox/modules/workbox-strategies).
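To make the difference between the two strategies concrete, here is a language-neutral model of their control flow, written in Python. It is not service worker code: a real implementation would be JavaScript using the Cache API or Workbox, and the names `network_first`, `stale_while_revalidate`, `fetch`, and the dict-based cache are assumptions made for this sketch.

```python
def network_first(url, fetch, cache):
    """Try the network; on failure fall back to the cached copy, if any."""
    try:
        body = fetch(url)
    except OSError:
        if url in cache:
            return cache[url]
        raise
    cache[url] = body  # remember the fresh copy for offline use
    return body


def stale_while_revalidate(url, fetch, cache):
    """Return the cached copy when present and refresh the cache for next time.

    A real service worker responds first and refreshes in the background;
    here the refresh runs inline to keep the model simple.
    """
    cached = cache.get(url)
    try:
        cache[url] = fetch(url)
    except OSError:
        if cached is None:
            raise
    return cached if cached is not None else cache[url]


# Simulated server and network conditions.
server = {"/index.html": "v1"}


def fetch(url):
    return server[url]


def offline(url):
    raise OSError("network unreachable")


cache = {}
print(network_first("/index.html", fetch, cache))    # v1
server["/index.html"] = "v2"
print(network_first("/index.html", fetch, cache))    # v2
print(network_first("/index.html", offline, cache))  # v2 (from cache)

swr_cache = {"/index.html": "v1"}
print(stale_while_revalidate("/index.html", fetch, swr_cache))  # v1 (stale)
print(stale_while_revalidate("/index.html", fetch, swr_cache))  # v2
```

The last two lines show why stale-while-revalidate always lags one load behind: the first call after a deploy still returns the old index.html, and only the next one returns the new version.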
One thing to keep in mind:
With every strategy except "skip the SW and go to the network", you cannot really ensure that the user gets the latest version of index.html. It is not possible. If the SW returns something from the cache, it could be an old version, and that's that. In these situations, what is usually done is to notify the user that a new version of the app has been downloaded in the background. Basically, the user would load the app, see the version that was available in the cache, and the SW would then check for updates. If an update was found (there was a new index.html and, because of that, a new service-worker.js), the user would see a notification saying that the page should be refreshed. You can also trigger the SW to check the server for an update manually from your own JS code if you want. In that situation, too, you would show a notification to the user.
Does this help you?
I pushed a static HTML file containing an Angular SPA as the catch-all handler for my custom domain with these settings:
- url: /(api|activate|associate|c|close_fb|combine|import|password|sitemap)($|/.*)
  script: gae.php
- url: /.*
  static_files: public/static/app/v248/es/app.html
  upload: public/static/app/v248/es/app.html
  expiration: "1h"
That worked fine, but if I push a new app.html it doesn't update. I've tried changing the local path, deploying a new app version, and even replacing the catch-all handler with a custom PHP endpoint, but nothing works: the response is still the first version of app.html I uploaded.
Other people have had the same problem (CSS File Not Updating on Deploy (Google AppEngine)), and it looks like it is related to the Google CDN cache but, as far as I know, there isn't any way to flush it.
There is a way to flush static files cached by your app on Google Cloud.
Head to your Google Cloud Console and open your project. Under the left hamburger menu, head to Storage -> Cloud Storage -> Browser. There you should find at least one Bucket: your-project-name.appspot.com. Under the Lifecycle column, click on the link with respect to your-project-name.appspot.com. Delete any existing rules, since they may conflict with the one you will create now.
Create a new rule by clicking on the 'Add A Rule' button. For the action, select "Set storage to nearline". For the object conditions, choose only the 'Number of newer versions' option and set it to 1. Click on the 'Continue' button and then click 'Create'.
This new rule will take up to 24 hours to take effect, but at least for my project it took only a few minutes. Once it is up and running, the version of the files being served by your app under your-project-name.appspot.com will always be the latest deployed, solving the problem. Also, if you are routinely editing your static files, you should remove any expiration element from handlers related to those static files and the default_expiration element from the app.yaml file, which will help avoid unintended caching by other servers.
When you change static files in an App Engine application, the changes will not be available immediately because of caching, as you already suspected. The cache in Google Cloud cannot be flushed manually, so instead I recommend changing the expiration time to a shorter period (by default it is 10 minutes) if you want to test how it works, and later setting an appropriate expiration time according to your requirements.
Bear in mind that you can change the static cache expiration time either for all static files or for just the ones you choose, by setting the proper element in the app.yaml file.
2020 Update:
For my application I found that App Engine started failing to detect my latest app deployments once I reached 50 Versions in my Versions list.
See (Burger Menu) -> App Engine -> Versions
After deleting a bunch of old versions on next deploy it picked up my latest changes immediately. Not sure if this is specific to my account or billing settings but that solved it for me.
I had my static files over a service in Google Cloud Platform. My problem was that I didn't execute
gcloud app deploy dispatch.yaml
Once executed, everything was fine. I hope it helps
Another problem that could be causing this is caching in Google's frontend, which depends on the cache header returned by your application. In my case, I opened Firefox's inspector on the Network tab and saw that the stale file had a cache-control max-age of 43200 seconds, i.e. 12 hours.
This was for a file returned from Flask, so I fixed this by explicitly specifying max-age in the Flask response from my GAE code:
return flask.send_from_directory(directory, filename, max_age=600)
This causes intermediate caches such as Google's frontend to only cache the file for a period of 600 seconds (10 minutes).
Unfortunately, once a file has been cached there is no way to flush it, so you will have to wait out the 12 hours. But it will solve the problem for the next time.
I am working on a solution to keep S3 and CloudFront in sync when I upload a new version of an Angular app.
My approach is to upload the new version to a new folder with an increasing version number (http://awsbucket/v1 ... /v2) and then update the Download Distribution Origin Path to that new folder.
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/distribution-web-values-specify.html#DownloadDistValuesOriginPath
I am wondering whether this change of the Origin Path automatically results in a complete cache invalidation, or whether I have to send invalidation requests nevertheless.
If you keep publishing your web resources (images, scripts, or anything else that can be served over HTTP) under new version paths and make the necessary changes in your app, the app will, by design, start using the newer versions of the resources. The older versions' cache entries go colder and colder and are eventually evicted from the cache.
Invalidation requests are costly and time-consuming, while versioning is easy and natural. The best-known use cases are versioned CSS stylesheets and updated JS scripts. The same can be extrapolated to your use case.
Also, you don't need to change the origin; keep adding the new files to S3 and make sure the app references them. That will do.
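One common way to version files without numbered folders is to put a hash of the file content into the file name, so a changed file automatically gets a new URL. The sketch below assumes that scheme; the helper name `versioned_name`, the 8-character digest length, and the example paths are illustrative choices, not something prescribed by CloudFront or S3.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(path, content, digest_len=8):
    """Insert a short content hash before the file extension.

    The same content always maps to the same name, and any change in the
    content produces a different name, so caches never serve a stale copy
    under the new URL.
    """
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


old = versioned_name("js/app.js", b"console.log('v1');")
new = versioned_name("js/app.js", b"console.log('v2');")
print(old)
print(new)
print(old != new)  # True
```

The HTML entry point then has to reference the new names, which is why that one file is usually served with a short TTL or invalidated on deploy.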
To answer your question, NO - changing the Origin, including just the path, does not result in cache invalidation.
Information can be found here
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/distribution-web-values-specify.html#DownloadDistValuesDomainName
Quoting the specific part:
Changing the origin does not require CloudFront to repopulate edge caches with objects from the new origin. As long as the viewer requests in your application have not changed, CloudFront will continue to serve objects that are already in an edge cache until the TTL on each object expires or until seldom-requested objects are evicted.
I have a Java-based web application running on Google App Engine that depends on data in the datastore. When I update this backend data and deploy, I can see the data change immediately if I access the URL 1-dot-myapp.appspot.com. I cannot get the default version of the URL (myapp.appspot.com) to update on another device unless I access the full version-specific URL.
How can I force the default version of the application to update on deployment?
Thank you
I went back and looked at my cookie information. 1-dot-myapp.appspot.com only has a _ga cookie, while the entry for myapp.appspot.com has 3 cookie values: application cache, ACID, and _ga. I was surprised that 1-dot-myapp.appspot.com did not have an application cache value in the cookie. So now I guess my question is: how can I force the application cache to renew as desired?
What I came up with was to either remove the reference to my manifest file from my HTML or to rename my manifest file. So whenever I want to force the client browser to update the cache, I redeploy with a newly named manifest file. The manifest file is renamed with a version number, like manifest2.mf. Then my build changes all references to the manifest to the newly named manifest file, i.e. manifest2.mf. My HTML files and my appengine.xml file then use manifest2.mf. These changes seem to force the client browsers to update their cache.
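The renaming step in the build could look roughly like the sketch below. It assumes the references follow the pattern manifest.mf or manifest<number>.mf, as in the example above; the function name `bump_manifest` and the sample HTML are made up for illustration.

```python
import re


def bump_manifest(text, version):
    """Point every manifest reference in text at manifest<version>.mf."""
    # Matches "manifest.mf" as well as already versioned names like "manifest2.mf".
    return re.sub(r"manifest\d*\.mf", f"manifest{version}.mf", text)


page = '<html manifest="manifest.mf">'
print(bump_manifest(page, 2))
# <html manifest="manifest2.mf">
print(bump_manifest(bump_manifest(page, 2), 3))
# <html manifest="manifest3.mf">
```

The manifest file itself still has to be renamed on disk to match, and the same substitution would need to run over every file that references it.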