[PRO] Shipa Delivery. Too many HTTP timeouts

Rami_alhendi · August 27, 2020, 2:30pm

Hi guys,

We are facing a critical issue on our newly set up dedicated servers: resources requested over HTTP from the business logic are being closed.

I am attaching the logs and analysis for the issue. We would appreciate your swift help.

oleg-vyalyh · August 27, 2020, 3:38pm

Hi, Rami.

We need additional metrics, including their history, to investigate this problem further.
These are: load average (LA), CPU, disk I/O, and network bandwidth.
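
For anyone collecting this history by hand, here is a minimal sketch of a load-average sampler, assuming a Unix-like host and only the Python standard library. The function and field names are made up for illustration. CPU, disk I/O, and network counters are not covered here; those would normally come from the monitoring stack already in place or from a tool such as `sar`.

```python
import os
import time

# Hypothetical field order for one CSV row of the history.
FIELDS = ("timestamp", "la_1m", "la_5m", "la_15m")


def sample_load_average():
    """Return one load-average sample as a dict (Unix-like systems only)."""
    la_1m, la_5m, la_15m = os.getloadavg()
    return {
        "timestamp": int(time.time()),
        "la_1m": la_1m,
        "la_5m": la_5m,
        "la_15m": la_15m,
    }


def to_csv_line(sample):
    """Format a sample as a comma-separated line in FIELDS order."""
    return ",".join(str(sample[name]) for name in FIELDS)


if __name__ == "__main__":
    # Print one row; run it from cron or a loop to build up history.
    print(to_csv_line(sample_load_average()))
```

Running it once a minute on each node and comparing the rows side by side is one simple way to see where the nodes start to differ.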

P.S. The behavior is quite strange, because the instances and configurations are the same on all nodes, so we need to understand where the difference begins.
We are also working on a feature that compresses data between the client and the server, which should decrease response time and may rule out a set of related problems.

Rami_alhendi · August 27, 2020, 8:38pm

https://rancher.shipadelivery.com/k8s/clusters/c-8jcwg/api/v1/namespaces/cattle-prometheus/services/http:access-grafana:80/proxy/d/icjpCppik/cluster?orgId=1&refresh=1m&from=now-12h&to=now&var-Node=aws-worker-02

Rami_alhendi · September 1, 2020, 12:52pm

We hit the same timeout issue yesterday for a couple of hours. It is causing instability across the whole platform, and we need to find the root cause and resolve it, please.

From the team:

"Found this explanation: In the list of DB instances, a bar in the Current Activity column shows the database load of each instance that has Performance Insights enabled. An empty rectangle with a blue border indicates an idle instance. The vertical red line indicates the capacity of the host. As the database load increases, the bar fills with blue. When the load exceeds host capacity, it changes to red."

Findings on the database:

What's weird is that there are not a lot of insert/update/delete statements, but there are a lot of commits.
Maybe I can increase the IOPS for the RDS instance.

Thanks //Rami

sergey.kuk · September 1, 2020, 3:00pm

Hello @Rami_alhendi,

We are analyzing the information you gave us.

Regarding the commits, it is OK to have a lot of them:
https://dev.mysql.com/doc/refman/8.0/en/innodb-autocommit-commit-rollback.html

In InnoDB, all user activity occurs inside a transaction. 
If autocommit mode is enabled, each SQL statement forms a single transaction on its own. 
By default, MySQL starts the session for each new connection with autocommit enabled, so MySQL does a commit after each SQL statement if that statement did not return an error. 
If a statement returns an error, the commit or rollback behavior depends on the error. See Section 15.21.4, “InnoDB Error Handling”.
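
To make the quoted behavior concrete, here is a minimal sketch of the difference between autocommit and an explicit transaction. It uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in, not MySQL/InnoDB, and the `orders` table is hypothetical. The idea it illustrates is the same one the quote describes: with autocommit on, every statement is its own transaction, so a high commit count on its own is expected.

```python
import sqlite3


def autocommit_demo():
    """Contrast per-statement commits with one explicit transaction."""
    # isolation_level=None puts the sqlite3 module in autocommit mode,
    # so each statement is committed on its own unless we issue BEGIN.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")

    # Autocommit: the insert is already committed when execute() returns.
    conn.execute("INSERT INTO orders (status) VALUES ('NEW')")
    open_after_single_statement = conn.in_transaction

    # Explicit transaction: two statements, one commit at the end.
    conn.execute("BEGIN")
    conn.execute("INSERT INTO orders (status) VALUES ('NEW')")
    conn.execute("INSERT INTO orders (status) VALUES ('DELIVERED')")
    open_inside_transaction = conn.in_transaction
    conn.execute("COMMIT")
    open_after_commit = conn.in_transaction

    row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    conn.close()
    return (
        open_after_single_statement,
        open_inside_transaction,
        open_after_commit,
        row_count,
    )


if __name__ == "__main__":
    print(autocommit_demo())
```

Grouping statements into an explicit transaction is what reduces the number of commits; whether that is worthwhile in this case depends on the workload and is not something the sketch can answer.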

mark-piller · September 1, 2020, 3:17pm

Hi Rami,

While we work on figuring out the root cause of the issue, could you please get the complete query for the request shown here:
https://take.ms/LQ0d4

Regards,
Mark

Rami_alhendi · September 1, 2020, 4:49pm

Hi Mark, sure.

select count(udt.Order.objectId) as count from udt.Order where (udt.Order.country = 'ARE' and udt.Order.status not in ('CANCELLED', 'DELIVERED'))

But it has changed today, as you can see below. We are not sure whether any optimization has happened.

By Aram:
"The queries are less heavy (commit) and the IOPS are lower, which made it more responsive, but the query profile is different; no idea why."

mark-piller · September 1, 2020, 4:52pm

Thanks, Rami, this is helpful. We’re somewhat at the discretion of RDS, but any optimizations we can do on our side (code or configuration) should certainly be done. That’s what we are after in this task now.

Regards,
Mark