Timers not executing after business logic error

mitchell · January 30, 2019, 9:22pm

Backendless Version (3.x / 5.x, Online / Managed / Pro )

5.2

Client SDK (REST / Android / Objective-C / Swift / JS )

Swift

Application ID

130CA81A-36A3-9479-FFEE-AB623F4D0E00

Expected Behavior

Please describe the expected behavior of the issue, starting from the first action.

Error is thrown and written to log
Email received to say an interruption occurred
Timers continue/restart automatically

Actual Behavior

Please provide a description of what actually happens, working from the same starting point.

Error is thrown and written to log
Email is received to say an interruption occurred
Timers stop until they are manually restarted by me through console. This defeats the purpose of server code if I am having to manually restart timers.

Is this error referring to an object that is trying to be accessed that doesn’t exist in the data tables?

Reproducible Test Case

mark-piller · January 30, 2019, 9:29pm

Hi Mitchell,

I see there was an email sent out to you two times: one about 9 hours ago and the second one about 40 minutes ago with the following message:

This is a notification to inform you that a business logic code running 
in your application has been interrupted prematurely. The interruption was 
caused by the business logic taking longer than the plan limit.

The information about the limit can be found below.

Application name: BTCMarketsTicker
Application ID: 130CA81A-36A3-9479-FFEE-AB623F4D0E00
Maximum allowed business logic runtime: 20 seconds

This happens when the business logic doesn’t complete in the allocated period of time (20 seconds for your app). Please check the queries the code makes and see whether there are any loops that iterate over a collection and execute additional calls on each pass. In many cases, as application data grows, the business logic starts taking longer to run because it has to deal with larger data sets.
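One way to keep such a loop from running into the limit is to give it a time budget and stop early. The following is a minimal sketch in plain Java, not Backendless API: the class name `BudgetedLoop`, the method `processWithinBudget`, and the 15-second budget are assumptions chosen for illustration.

```java
import java.time.Duration;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of any SDK: runs per-item work until a time budget is spent.
final class BudgetedLoop {

    // Processes items in order and returns how many were handled before the budget ran out.
    static <T> int processWithinBudget(List<T> items, Duration budget, Consumer<T> work) {
        long deadline = System.nanoTime() + budget.toNanos();
        int processed = 0;
        for (T item : items) {
            // The budget is checked before each item; one slow item can still overrun it.
            if (System.nanoTime() - deadline >= 0) {
                break; // leave the remaining items for the next run
            }
            work.accept(item);
            processed++;
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("a", "b", "c");
        // 15 seconds is an assumed budget that leaves headroom under a 20-second limit.
        int done = processWithinBudget(ids, Duration.ofSeconds(15),
                id -> System.out.println("processing " + id));
        System.out.println(done + " of " + ids.size() + " items processed");
    }
}
```

The check happens between items only, so this bounds the number of iterations started, not the duration of any single call inside the loop.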

Regards,
Mark

mitchell · January 30, 2019, 10:10pm

Hi Mark,

My business logic retrieves JSON from a third-party web server, so I assume that any timeouts (>20 seconds) are due to a slow HTTP connection to this web server. However, this is not the focus of my enquiry.
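For reference, a slow third-party server can be made to fail fast by setting explicit timeouts on the connection. This is a minimal sketch using the standard `HttpURLConnection`; the class name `BoundedFetch`, the URL, and the 5-second and 10-second values are assumptions, not what my code or Backendless actually uses.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: fetches a JSON document with explicit connect and read timeouts.
final class BoundedFetch {

    // Creates a connection object with timeouts applied; no network traffic happens here.
    static HttpURLConnection open(String url, int connectTimeoutMs, int readTimeoutMs) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(connectTimeoutMs); // maximum wait for the connection to be established
        conn.setReadTimeout(readTimeoutMs);       // maximum wait for each blocking read, not for the whole response
        conn.setRequestProperty("Accept", "application/json");
        return conn;
    }

    // Reads the whole response body as a UTF-8 string (readAllBytes requires Java 9 or later).
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = open(url, 5_000, 10_000); // assumed values, tune for the real limit
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}
```

Because the read timeout applies per read, a server that trickles data slowly can still exceed the total budget, so this narrows the problem without fully capping the run time.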

I am referring to an error which occurred on 28th January, i.e. a few days ago.

This causes the timers to stop executing and I have to manually reset them. Why aren’t timers continuing after the error? I don’t want to come check every day that the timers are running properly.

Mitch

mark-piller · January 30, 2019, 10:19pm

Hi Mitch,

Got it. Let me correlate the time of the event with other things going on in the system to understand what happened.

Regards,
Mark

mitchell · February 4, 2019, 11:27pm

Hi Mark,

Any follow up on this issue?

Mitch

mark-piller · February 5, 2019, 2:59am

We determined that timers stop running when the scheduling service is restarted. Now that we understand the root cause, a ticket has been entered and assigned to a developer. We will notify you when it is fixed.

Regards,
Mark

vladimir-upirov · February 5, 2019, 2:25pm

The ticket number for reference is BKNDLSS-18097.

sergey-chupov · February 7, 2019, 2:48pm

Hello Mitchell,
I failed to reproduce the behavior you described. In my case the timer continues to execute even though I receive the same error. Here’s what I’ve done to verify your issue:

Create a Java Timer which runs every 60s
Create a table (e.g. Order in my app) with one object (let’s say its ID is CFE4F1F9-2114-A118-FF45-2C00E2E5FE00)
Add this code to the Timer:

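    // Load the existing object and log it, so every timer run leaves a visible trace
    final Map order = Backendless.Data.of( "Order" ).findById( "CFE4F1F9-2114-A118-FF45-2C00E2E5FE00" );
    Backendless.Logging.getLogger( "Sample" ).info( order.toString() );

    // Request an ID that does not exist to trigger an unhandled BackendlessException
    Backendless.Data.of( "Order" ).findById( "XXX" );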

Compile the code and deploy the timer
Watch the timer execution (I used the real-time logging console for that)

And here’s the result I get:

Thu Feb 07 2019 16:30:53 GMT+0200 (Eastern European Standard Time) | Sample | INFO | {created=Thu Feb 07 14:29:41 UTC 2019, ___class=Order, ownerId=null, updated=null, objectId=CFE4F1F9-2114-A118-FF45-2C00E2E5FE00}
Thu Feb 07 2019 16:30:53 GMT+0200 (Eastern European Standard Time) | Coderunner | ERROR | Business logic execution has been stopped, due to error: Code: 1000 Class: com.backendless.exceptions.BackendlessException Message: Entity with ID XXX not found
Thu Feb 07 2019 16:31:53 GMT+0200 (Eastern European Standard Time) | Sample | INFO | {created=Thu Feb 07 14:29:41 UTC 2019, ___class=Order, ownerId=null, updated=null, objectId=CFE4F1F9-2114-A118-FF45-2C00E2E5FE00}
Thu Feb 07 2019 16:31:53 GMT+0200 (Eastern European Standard Time) | Coderunner | ERROR | Business logic execution has been stopped, due to error: Code: 1000 Class: com.backendless.exceptions.BackendlessException Message: Entity with ID XXX not found
Thu Feb 07 2019 16:32:53 GMT+0200 (Eastern European Standard Time) | Sample | INFO | {created=Thu Feb 07 14:29:41 UTC 2019, ___class=Order, ownerId=null, updated=null, objectId=CFE4F1F9-2114-A118-FF45-2C00E2E5FE00}
Thu Feb 07 2019 16:32:53 GMT+0200 (Eastern European Standard Time) | Coderunner | ERROR | Business logic execution has been stopped, due to error: Code: 1000 Class: com.backendless.exceptions.BackendlessException Message: Entity with ID XXX not found
...

As you see, the timer does throw an error, but it still executes every minute. So I have to ask you to provide us with more detailed steps needed to see your issue, in case you are able to reproduce it consistently.

For now, it seems to me that the stopped timer in your case has nothing to do with the errors or timeouts in the code.
First, a timeout cannot happen when an unhandled exception is thrown. In that case the CodeRunner immediately stops executing your code, so no time is spent after the exception. If you see such an error, your timer has already finished its work for that run. Most probably you received the timeout notifications in relation to other runs of the timer.
Second, the code scheduling the timer tasks is a totally different process from the one executing your code (i.e. from the CodeRunner on the Backendless side). So it’s not likely that any behavior in CodeRunner can cause trouble in the timer task scheduler.
Truth be told, we do receive occasional reports from our customers about timers that stopped executing. Unfortunately, there’s no way we can guarantee that the scheduling never stops (just as no other software can strictly guarantee this). Being aware of such a possibility, we do our best to detect and prevent such failures. For example, we have a special app with a timer for monitoring purposes, and when an outer system detects that the timer is not active, we forcefully reschedule all the timers. And still, as you see, we sometimes miss the problems.
Therefore, we’d be really grateful if you could provide us with enough information to investigate at least your specific case.

mitchell · February 8, 2019, 1:30am

Hi @sergey-chupov,

I appreciate the time you have taken to review this case. Unfortunately, I also have no steps to reproduce the issue. It may have been a coincidence this time that the scheduler was interrupted around the same time the business logic error occurred. Hopefully with the updated database cluster this will happen a lot less.

Moving forward, is it possible to monitor timers on your end and send an automated email when one has not executed at the expected time?

For example, a timer is set to fire every 60 seconds; if it has not gone off in, say, the last 5 minutes, then the user gets an email saying the timer may need to be restarted.
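The staleness check I have in mind could look roughly like this. It is a minimal sketch in plain Java and not an existing Backendless feature: the class `TimerWatchdog` and its methods are hypothetical, and how each run gets recorded and how the email gets sent are left out.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical watchdog: tracks the last run of a timer and reports when it has been silent too long.
final class TimerWatchdog {
    private final Duration maxSilence;
    private Instant lastRun;

    TimerWatchdog(Duration maxSilence, Instant startedAt) {
        this.maxSilence = maxSilence;
        this.lastRun = startedAt;
    }

    // Called whenever the monitored timer is known to have executed.
    void recordRun(Instant at) {
        lastRun = at;
    }

    // True when more than maxSilence has elapsed since the last recorded run.
    boolean isStale(Instant now) {
        return Duration.between(lastRun, now).compareTo(maxSilence) > 0;
    }

    public static void main(String[] args) {
        Instant start = Instant.now();
        // A 60-second timer with a 5-minute tolerance, matching the example above.
        TimerWatchdog watchdog = new TimerWatchdog(Duration.ofMinutes(5), start);
        watchdog.recordRun(start.plusSeconds(60));
        if (watchdog.isStale(start.plusSeconds(60 + 301))) {
            System.out.println("Timer may need to be restarted");
        }
    }
}
```

The tolerance is several times the timer interval so that a single delayed run does not trigger a notification.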

sergey-chupov · February 11, 2019, 10:12am

I guess we could set up such monitoring for your app as part of our consulting services. Try reaching sales@backendless.com to discuss the details.