How can we monitor the health of our backend?
Sometimes requests take much longer than normal, and we don’t know how to monitor the backend’s health or track these slowdowns.
Our backend has been inconsistent since we launched on the app store: some users are getting timeouts and unusual errors.
We use Cloud 99.
Requests work perfectly sometimes and other times, they take significantly longer to complete.
This varies on different days but is seemingly random.
We also have more users now, and requests sometimes fail with errors.
For example, we might get a 200 response with 10 users and then a 500 with 1 (this is not the actual error rate, just a minor illustration). Essentially, requests aren’t processed consistently, but the main issue seems to be that they sometimes take longer.
Most of the time, the backend returns a response in under 5 seconds.
But there are times in the day (with no traffic) when it takes over 10 seconds to respond and we get a timeout error.
I looked through the implementation of the “castVote” method and saw that it makes a rather high number of API requests and does a lot of logging (which also translates to API calls). It is not a surprise that it may sometimes take longer than 5 seconds; the function does quite a lot of work.
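To see which of those calls dominate the run time, each step can be timed separately. Below is a minimal Python sketch of the idea, not Backendless code: `StepTimer`, the step names, and the injectable `clock` are all hypothetical, and the 5-second budget is simply the figure mentioned in this thread.

```python
import time
from contextlib import contextmanager


class StepTimer:
    """Records how long each named step of a long function takes."""

    def __init__(self, budget_seconds=5.0, clock=time.perf_counter):
        # budget_seconds: assumed execution limit (5 s is the figure from this thread)
        self.budget_seconds = budget_seconds
        self._clock = clock
        self.steps = []  # list of (step name, elapsed seconds)

    @contextmanager
    def step(self, name):
        start = self._clock()
        try:
            yield
        finally:
            # Record the duration even if the step raised an error.
            self.steps.append((name, self._clock() - start))

    def total(self):
        return sum(seconds for _, seconds in self.steps)

    def over_budget(self):
        return self.total() > self.budget_seconds

    def slowest(self):
        # Returns the (name, seconds) pair with the longest duration, or None.
        return max(self.steps, key=lambda s: s[1]) if self.steps else None


if __name__ == "__main__":
    timer = StepTimer()
    with timer.step("load_poll"):      # hypothetical step
        time.sleep(0.05)
    with timer.step("write_log"):      # hypothetical step
        time.sleep(0.02)
    print(timer.steps, timer.over_budget(), timer.slowest())
```

Logging the per-step totals once per invocation (rather than once per step) keeps the measurement itself from adding more API calls.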
When the function doesn’t finish in under 5 seconds, it is not a sign that the backend is unhealthy. It is a consequence of the app running in a shared hosting environment such as Backendless Cloud, where the system’s resources are divided between multiple apps. When a surge occurs, we allocate additional resources, but these may not be instantly available since they need to be procured. As a result, invocations of methods with higher-than-average complexity (in terms of the number of API calls) may take longer to execute. A remedy for this is to purchase a function pack that provides a longer Cloud Code runtime duration for the invocations.
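If you want to track how often this happens, one option is to record response times on the client side and summarize them. The following is a rough Python sketch under stated assumptions: `summarize` and `percentile` are hypothetical helpers, the latencies are in seconds, and the 5-second and 10-second thresholds are the figures mentioned earlier in this thread rather than documented platform limits.

```python
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(latencies, slow_after=5.0, timeout_after=10.0):
    """Summarize response times (seconds) collected from the client."""
    if not latencies:
        raise ValueError("no latencies recorded")
    ordered = sorted(latencies)
    return {
        "count": len(ordered),
        "p50": percentile(ordered, 50),
        "p95": percentile(ordered, 95),
        # Slower than usual, but still answered before the assumed timeout.
        "slow": sum(1 for s in ordered if slow_after <= s < timeout_after),
        # At or past the assumed timeout threshold.
        "timed_out": sum(1 for s in ordered if s >= timeout_after),
    }


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    print(summarize([1.2, 0.9, 4.8, 6.5, 11.0]))
```

Comparing the summary across different times of day would show whether the slow invocations cluster around particular periods.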
There we go!
I figured that resources were being reallocated.
Yeah, the castVote endpoint is due for a refactor, but I just wanted to know why the response times fluctuate.