I deployed a webapp to WildFly 8.0.0 on OpenShift. The application currently has very few users, but it works fine. I use a free OpenShift account (I don't know if that is relevant) with a single cartridge for WildFly.
Sometimes when I access the application I get a 503 error (occasionally) or a 404 error (most of the time).
It seems that I get these errors when the application has not been used for some time (something like 2 or 3 days). For about a minute, reloading the page gives the same error, but after that the errors stop and the application is available again.
It looks like OpenShift "disables" webapps that have not been used for some time and then "re-enables" them on demand (but returns 503 or 404 while a webapp is being "re-enabled").
=> Is this a bug? Is it a well-known issue with OpenShift?
=> How can I prevent my webapp from being unavailable?
Regards
As diw stated, gear idling is part of the free plan, and with the announcement of the Bronze plan you probably won't have to worry about it anymore.
However, if you want to stay on the free plan and your application needs regular visits to avoid those errors, you could set up a monitoring service (for example http://pingdom.com or http://uptimerobot.com) to check it hourly and thus keep your gear from being idled. I found this out by accident when I moved a small personal site over to OpenShift: it was never idled because the monitoring service kept hitting it.
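If you would rather not depend on an external service, anything that sends your app an external HTTP request more often than every couple of days has the same effect. Below is a minimal sketch of such a pinger in Node/TypeScript (Node 18+ for the built-in fetch); the URL is a placeholder for your own app, not anything OpenShift-specific:

    // keepalive.ts -- sketch: request the app once an hour so the gear never idles.
    // APP_URL is a placeholder; point it at any page of your application.
    const APP_URL = "http://yourapp-yournamespace.rhcloud.com/";

    async function ping(): Promise<void> {
      try {
        const res = await fetch(APP_URL);
        console.log(`${new Date().toISOString()} -> HTTP ${res.status}`);
      } catch (err) {
        console.error(`${new Date().toISOString()} -> ping failed:`, err);
      }
    }

    // One request per hour is far more frequent than the 2-day idle threshold.
    ping();
    setInterval(ping, 60 * 60 * 1000);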
On the free plan, OpenShift will idle any gears that have not received an external HTTP request or git push in 2 days, as per this FAQ.
If you are only using the 3 free gears, you can upgrade to the Bronze plan, which does not have gear idling and will not incur any additional charges.
I'm having issues with a NextJS project hosted on a Vercel pro plan. Routes that are set up with ISR just keep timing out. I have revalidate set to 120s currently but I've tried it with 60s, 30s, and 6000s too just in case I was calling it too often.
I'm using Strapi as the headless CMS that serves the API NextJS builds pages from; it is deployed in Render's German region. The database Strapi uses is a MongoDB database hosted on MongoDB Atlas in MongoDB's Ireland AWS region. (I've also set the Serverless Functions on Vercel to be served from London, UK, but I'm not sure if that affects ISR?)
There are 3 different dynamic routes with about 20 pages each, and at build time they average 6999ms, 6508ms and 6174ms respectively. Yet at run time, if I update some content in the Strapi CMS and wait the 120s that I've set for revalidate, the page hardly ever gets rebuilt. If I look at the Vercel dashboard's "Functions" tab, which shows realtime logs, I see that many of the pages have attempted to rebuild all at the same time and they are all hitting the 60s time-out limit.
I also have the Vercel logs sent to LogTail, and if I filter the logs for the name of the page I've edited, I can see that it returns a 304 status code before 120s has passed, as expected, but after 120s it tries to fetch and build the new page and nearly always returns the time-out error.
So my first question is: why are so many pages trying to rebuild at the same time when nothing has changed in the CMS for any of them except the one I deliberately changed? And secondly, why does a page take only about 6000ms to build at build time, yet hit the 60s time-out limit during ISR?
Could it be that so many rebuilds are being triggered that they all end up causing each other to time out? If so, how do I tackle that first issue?
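For context, the dynamic routes are set up roughly like this (a simplified sketch, not the real project code; the Strapi URL, field names and response shape are placeholders based on Strapi's default REST format):

    // pages/articles/[slug].tsx -- simplified sketch of the ISR setup described above.
    // STRAPI_URL and the response shape are placeholders, not the real project code.
    import type { GetStaticPaths, GetStaticProps } from "next";

    const STRAPI_URL = process.env.STRAPI_URL ?? "https://cms.example.com";

    export const getStaticPaths: GetStaticPaths = async () => {
      const res = await fetch(`${STRAPI_URL}/api/articles`);
      const { data } = await res.json();
      return {
        // Pre-build the ~20 pages per route; missing pages are built on demand.
        paths: data.map((a: any) => ({ params: { slug: a.attributes.slug } })),
        fallback: "blocking",
      };
    };

    export const getStaticProps: GetStaticProps = async ({ params }) => {
      const res = await fetch(
        `${STRAPI_URL}/api/articles?filters[slug][$eq]=${params?.slug}`
      );
      const { data } = await res.json();
      return {
        props: { article: data[0] },
        revalidate: 120, // regenerate at most once every 120 seconds
      };
    };

    export default function Article({ article }: { article: any }) {
      return <h1>{article.attributes.title}</h1>;
    }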
Here is a screenshot of my Vercel realtime logs. As you can see, many of the pages are trying to rebuild at once, but I'm not sure why; I've only changed the data for one page in this instance.
To try and debug the issue, I created a Postman Flow for building one of the dynamic routes and added up the time for each API call needed to build the page; I get 7621ms on average after a few tests. Here is a screenshot of the Postman console:
I'm not that experienced with NextJS ISR, so I'm hoping I'm just doing something wrong or have a setting misconfigured on Vercel, but after looking on Stack Overflow and other websites I believe I'm using ISR as intended. If anybody has any ideas or advice about what might be going on, I'd very much appreciate it.
I have a WordPress site that is suddenly getting extremely slow, especially on mobile devices. I often get this error:
Service Unavailable
The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.
Additionally, a 503 Service Unavailable error was encountered while trying to use an ErrorDocument to handle the request.
I checked my server log and can see a huge number of 503 errors.
As I understand WordPress, every time you upload a picture it generates versions in different sizes. Looking at my FTP server, I can see that each picture is generated in the sizes below:
Kajak-i-Norge.jpg
Kajak-i-Norge-60x40.jpg
Kajak-i-Norge-60x40#2x.jpg
Kajak-i-Norge-100x100.jpg
Kajak-i-Norge-120x80.jpg
Kajak-i-Norge-150x150.jpg
Kajak-i-Norge-300x200.jpg
Kajak-i-Norge-300x300.jpg
Kajak-i-Norge-394x263.jpg
Kajak-i-Norge-394x303.jpg
Kajak-i-Norge-394x394.jpg
Kajak-i-Norge-394x400.jpg
Kajak-i-Norge-600x400.jpg
Do those 503 errors mean the picture is not found on the server?
There are 3 reasons I think could be producing this problem:
1 - Something wrong with the hosting, especially if it is shared
2 - Someone attempting a denial of service
3 - So many visitors on your server that it is gradually getting slower
A 503 error means that the server can't serve up the image, not that it isn't found on the server; 404 is the "not found" error. The fact that you're seeing both of these errors on your images is most likely the result of server resources being overloaded (as other answers have mentioned), either capacity or bandwidth. If you're on shared hosting, this happens often. Your site loaded fine for me just now, but of course that can change the second your server gets bogged down.
A free Cloudflare account, as Dan Web recommended above, is a good idea; it will at least rule out the chance that spam bots are hogging your resources.
Intermittent issues like this are almost always shared-hosting problems. Your host will also rarely accept responsibility for them, and you're left having to decide whether to move your site to a different hosting solution.
Some time ago I had Composite C1 installed on a public URL to test it (for example http://c1.mydomain.com), but I removed it a while ago.
I checked my firewall logs recently and discovered requests for http://c1.mydomain.com/Composite/top.aspx every single night from IP address 109.238.52.32. (Composite.net's IP address is 109.238.52.24, which is almost the same, so I assume the requests are coming from Composite.net.)
So the question is: Why is Composite.net requesting my admin page every single day?
What you are seeing is a process that checks long-term usage of the software in order to generate statistics. The daily URL request looks at the HTTP status code (HTTP 200 vs. anything else) to determine whether your website is still online and using the CMS. Since the URL is unique and unlikely to be used by any other system, it is a good indicator.
That said, it is probably silly to keep requesting the same URL once the check starts reporting 404 or similar.
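Conceptually the probe boils down to something like this (an illustrative sketch, not the actual implementation; how the result is reported back is omitted):

    // Illustrative sketch of the "is this site still running the CMS?" check.
    // The admin URL is the one mentioned above; the reporting side is omitted.
    async function siteStillUsesCms(host: string): Promise<boolean> {
      try {
        const res = await fetch(`http://${host}/Composite/top.aspx`);
        // HTTP 200 on this CMS-specific URL suggests the CMS is still installed;
        // a 404 or any other status suggests it has been removed.
        return res.status === 200;
      } catch {
        return false; // host unreachable
      }
    }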
We have a Java application deployed on GAE with SNI SSL certificates enabled. For the last few days we have observed that any HTTPS request taking more than 2 seconds gets aborted by the server (as reported by the browser). This happens consistently in Firefox, IE and Chrome on Windows XP and Windows 7 64-bit, and in Safari and Chrome on Mac Mountain Lion. The error shown in Chrome is "Error 324 (net::ERR_EMPTY_RESPONSE): The server closed the connection without sending any data", while IE (v9.0) throws error 12152.
This was consistently reproduced by hitting a URL, mapped to a Java servlet, which is made to sleep for >= 2000 ms. The sleep interval was passed as a request parameter and tried with values from 1000 to 5000 ms. The above-mentioned error was thrown for all values >= 1900 ms, while anything less than that caused no issues.
However, there were no issues when the URL scheme was changed to HTTP.
The GAE application logs did not show any errors or any signs of new instances spawning. We are on App Engine version 1.8.1 and Java 6.
Any ideas to solve the issue would be immensely helpful.
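For anyone trying to reproduce this, the timing test boils down to something like the sketch below (the host name and the "sleep" query parameter are placeholders for our actual servlet mapping, not real endpoints):

    // repro.ts -- sketch of the timing test described above.
    // HOST and the "sleep" query parameter are placeholders for the real servlet mapping.
    const HOST = "https://www.our-custom-domain.example";

    async function probe(sleepMs: number): Promise<void> {
      const started = Date.now();
      try {
        const res = await fetch(`${HOST}/sleepServlet?sleep=${sleepMs}`);
        console.log(`sleep=${sleepMs}ms -> HTTP ${res.status} after ${Date.now() - started}ms`);
      } catch (err) {
        // Over HTTPS, every request the servlet sleeps on for >= ~1900 ms fails here.
        console.log(`sleep=${sleepMs}ms -> failed after ${Date.now() - started}ms (${err})`);
      }
    }

    (async () => {
      for (const ms of [1000, 1500, 1900, 2000, 3000, 5000]) {
        await probe(ms);
      }
    })();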
Edit: The issue is only there for custom domains. It works fine for appspot domain. (xxxxx.appspot.com)
Sreejith
That error is widely known and is related to your internet cache (see the bug report for Chrome, for example).
After a quick search, everybody seems to agree that these solutions could fix the problem:
clear your browser's cache (possibly with an external program)
reset your browser's preferences to the defaults
uninstall and reinstall your browser
Some other people suggest to perform a system restore, but that seems too extreme to me.
I have a fairly big application which went through a major overhaul.
The newer version uses a lot of JSONP calls, and I notice 500 server errors. Nothing is logged in the logs section that would determine the cause. It happens on JS, PNG and even Jersey (servlet) requests too.
Searching SO and the groups suggests that these errors are common during deployment, but it happens even hours after deployment.
BTW, the application has become slightly bigger, and in a few rare cases it even causes deadline exceptions while starting new instances. Sometimes it starts and serves within 6-10 seconds; sometimes it takes more than 75 seconds, thereby causing a timeout for a similar request. I see the same behavior for warmup requests too. Nothing custom is loaded during app warmup.
I feel like you should be seeing the errors in your logs. Are you exceeding quotas or getting deadline errors? Perhaps you have an error in your error handler, such as its file not being found, or the path to the error handler overlapping with another static file route?
To troubleshoot, I would implement custom error pages so you can determine the actual error code. I'm assuming Python, since you never specified what language you are using. Add the following to your app.yaml, create static HTML pages that will give the visitor some idea of what's going on, and then report back with your findings:
    error_handlers:
      - file: default_error.html
      - error_code: over_quota
        file: over_quota.html
      - error_code: dos_api_denial
        file: dos_api_denial.html
      - error_code: timeout
        file: timeout.html
If you already have custom error handlers, can you provide some of your app.yaml so we can help you?
Some 500s are not logged in your application logs; they are failures at the front end of GAE. If, for some reason, you have a spike in requests and new instances of your application cannot be started fast enough to serve them, your clients may see 500s even though those 500s do not appear in your application's logs. The GAE team is working to provide visibility into those front-end logs.
I just saw this myself. I was researching logs of visitors who only loaded half of the graphics files on a page, and I tried clicking the same link on a blog that they used to reach our site. In my case, I saw a 500 error in the Chrome developer console for a JS file, yet when I looked at the GAE logs they said the file was served correctly with a 200 status. That JS file loads other images, which were not loaded. In my case it was an HTTPS request.
It is really important for us to know our customers' experience (obviously), and I wanted to let you know that this problem is still occurring. Just having it show up in the logs would be great, even attaching a warm-up error to it or something, so we know it is an unavoidable artefact of a complex server system (totally understandable). I just need to know whether I should be adding instances or doing something else. This error did not wait for 60 seconds; it was maybe 5 to 10 seconds. It is as if the round trip for the SSL handshake failed in the middle, but the logs showed it as a success.
So can I increase any timeout for the handshake, or is that done on the browser side?