InControl issue tracking page

We’re experiencing a service interruption with the IC2 services.

Date: 24/5/2023
Time: From 00:10 UTC

The IC2 Cloud Engineering team is working to resolve the issue now.

Impact:
Device updates, reporting, online status, captive portal and SIM pool will be affected, as devices will be shown as offline in IC2.

Next update: We will update this post with the latest status as soon as the issue is resolved.

Please set this forum post to “Watching” to receive notifications.

We apologize for any inconvenience.

5 Likes

Thank you for the update. Please keep us posted.

1 Like

The issue was resolved and service resumed at around 00:30 UTC on 24/5/2023. We can confirm that the affected devices came back online gradually, and all devices were back online by about 01:45 UTC on 24/5/2023.

Any IC2 users who still find their devices offline in IC2 and suspect the issue is related to the IC2 services should feel free to open a support ticket for the support team to check.

https://ticket.peplink.com/ticket/new/public

1 Like

What was the root cause?

3 Likes

At approximately 00:04 UTC on 24 May 2023, a maintenance fix was deployed to all “planets”. The deployment of the fix was supposed to be transparent to our users and have no negative impact on the system. However, as the deployments were made on all planets at approximately the same time, they caused all devices on all planets to send their report data to the system at the same time. The volume of incoming report data overloaded the system’s device communication cluster. The cluster became unresponsive to devices. Devices began to initiate re-authentication with the system. However, the cluster was too busy to process all the device re-authentication requests in time. As a result, the system began to incorrectly treat devices as offline.

At 00:30, the system stopped requesting devices to send their report data. The cluster load started to decrease. The cluster started to process the re-authentication requests from the devices. Devices gradually and slowly came back online. At about 01:23, more resources were added to the cluster to speed up re-authentication. At around 01:45, all devices were back online.

To prevent the same problem from occurring again:

  • Fixes will not be applied to more than one planet at a time.
  • More spare resources have been added to the communications cluster so that it can cope with an increase in load.
  • If an abnormally high number of authentications is detected, the system will stop identifying devices as offline to avoid potential false alarms.
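
As a rough illustration of the last measure, here is a minimal sketch of a guard that stops marking devices as offline while authentication traffic is abnormally high. It is not the actual InControl implementation: the class name `OfflineGuard`, the thresholds and the five-minute offline timeout are all hypothetical, chosen only to show the idea.

```python
import time
from collections import deque


class OfflineGuard:
    """Hypothetical guard: suppress offline marking during an authentication storm."""

    def __init__(self, max_auths_per_window=1000, window_seconds=60.0,
                 clock=time.monotonic):
        # Both thresholds are assumptions; real values would come from capacity data.
        self.max_auths = max_auths_per_window
        self.window = window_seconds
        self.clock = clock
        self._auth_times = deque()

    def _prune(self, now):
        # Drop authentication timestamps that have fallen out of the window.
        while self._auth_times and now - self._auth_times[0] > self.window:
            self._auth_times.popleft()

    def record_auth(self):
        # Call once for every device authentication request received.
        now = self.clock()
        self._auth_times.append(now)
        self._prune(now)

    def auth_storm(self):
        # True while more authentications than expected arrived within the window.
        self._prune(self.clock())
        return len(self._auth_times) > self.max_auths

    def should_mark_offline(self, seconds_since_last_report, offline_after=300.0):
        # During a storm a silent device is more likely queued for
        # re-authentication than actually down, so keep its last known status.
        if self.auth_storm():
            return False
        return seconds_since_last_report > offline_after
```

In this sketch, the offline check simply returns to normal once the number of authentications inside the window falls back under the threshold.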
8 Likes

Date: 2023-08-22
Time: since 01:50 UTC

Issue:
One of the three Mars sub-systems hit a bug in the Amazon Aurora service and is encountering performance issues.

Progress:
The InControl Engineering team and the Amazon engineering team are working on a solution to resolve the issue now.

Impact:
Device updates, reporting, online status and configuration changes are slow or delayed for some organizations on Mars.

Next update: We will update this post with the latest status as soon as the issue is resolved.

Please set this forum post to “Watching” to receive notifications.

We apologize for any inconvenience.

6 Likes

At 12:00 UTC on 2023-08-22, the performance issue with a Mars sub-system was resolved. The system’s services have been completely restored.

We are sorry for any inconvenience.

4 Likes

Date: 2023-09-25
Time: since 01:30 UTC

Issue:
The IC2 messaging server is unstable at the moment, so users might notice intermittent online and offline alerts.

Progress:
The InControl and Engineering teams are working to resolve the issue as soon as possible.

Impact:
So far, users have reported receiving false device-offline email alerts, and RWA is impacted.

Next update: We will update this post with the latest status as soon as the issue is resolved.

Please set this forum post to “Watching” to receive notifications.

We apologize for any inconvenience caused.

[Update]
Issue resolved at 07:30 UTC

2 Likes

Date: 2023-10-11
Time:
Entire IC2: since 08:23 UTC
Part of the Mars planet: since 07:35 UTC

Issue #1: The IC2 live queries and operations are not working.
Issue #2: Users are reporting that devices are randomly appearing offline and online.

Progress:
The InControl and Engineering teams are working to resolve the issue now. Most of the planets are recovering, while some Mars users are still affected.

Impact on Issue #1: Users might find that the RWA and Captive Portal services are affected.
Impact on Issue #2: Devices are randomly appearing offline and online at the moment.

Next update:
We will update this post with the latest status as soon as the issue is resolved.

Please set this forum post to “Watching” to receive notifications.

We apologize for any inconvenience caused.

[Update #1] Issue #1 was resolved at around 09:15 UTC.

[Update #2] Issue #2 was resolved at 10:50 UTC.

4 Likes

Date: 2023-11-11
Time: 05:36 UTC
System: One of the Mars subsystems, called “mars3”.

Issues: Some device-reported data were not processed. Device and group status might be out of date.

Update: The issue was resolved on 2023-11-12 at 23:40 UTC. It was due to a database connection pool being exhausted. However, the issue was not identified promptly.

Issue avoidance: A monitor on database connection pool errors has been implemented. When the same error occurs, the pool will be reset automatically. Peplink engineers will be notified at the same time.
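
To illustrate the kind of monitor described above, here is a minimal sketch under stated assumptions. It is not Peplink's actual implementation: `PoolExhausted`, `PoolMonitor`, the `reset_pool` and `notify` callbacks and the error threshold are all hypothetical names and values introduced for this example.

```python
class PoolExhausted(Exception):
    """Hypothetical error raised when no database connection is available."""


class PoolMonitor:
    """Hypothetical monitor: reset the pool and alert engineers on pool errors."""

    def __init__(self, reset_pool, notify, threshold=1):
        # reset_pool and notify are caller-supplied callbacks (assumptions).
        # threshold is the number of consecutive errors tolerated before acting.
        self.reset_pool = reset_pool
        self.notify = notify
        self.threshold = threshold
        self.consecutive_errors = 0

    def run(self, operation):
        """Run a database operation while tracking pool-exhaustion errors."""
        try:
            result = operation()
        except PoolExhausted as exc:
            self.consecutive_errors += 1
            if self.consecutive_errors >= self.threshold:
                # Reset the pool and notify engineers at the same time.
                self.reset_pool()
                self.notify(
                    f"connection pool reset after "
                    f"{self.consecutive_errors} error(s): {exc}"
                )
                self.consecutive_errors = 0
            raise
        # A successful operation clears the error streak.
        self.consecutive_errors = 0
        return result
```

The error is re-raised after the reset so that the caller can decide whether to retry the failed operation against the fresh pool.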

We apologize for any inconvenience caused.

Please set this forum post to “Watching” to receive notifications.

2 Likes

Date: 2024-01-02
Time: 06:20 UTC
System: Entire InControl system.

Issue: A lot of devices have been falsely marked as offline.

Update:
A system component was generating a high CPU load on an in-memory database, and the database became overloaded.

The component stopped generating the load at about 07:12 UTC. Devices gradually started to appear online from then on. The system had fully recovered by 07:48 UTC.

Please set this forum post to “Watching” to receive notifications.

We apologize for any inconvenience caused.

3 Likes

Date: 2024-03-09
Time: since 01:00 UTC

Issue:
When users visited their organization, the error message “This organization requires users to enable two factor authentication.” appeared, even though the users had been two-factor authenticated during sign-in.

Impact:
For organizations that require their users to be two-factor authenticated, their users were unable to open the organization. Organizations with the option disabled were not affected.

Update:
The issue was resolved at 02:58 UTC.

We apologize for any inconvenience caused.

4 Likes

There is a major issue with InControl. All public organizations are inaccessible.

1 Like

It is working again now. Ah it was scheduled maintenance…

2 Likes

Hi Martin,

How did you learn it was scheduled? I didn’t see any updates on IC2 or in this thread.

@jakub.nowicki it should be this announcement.

1 Like

Hi Wei Ming,

Thanks! I just learned that I “watched” the wrong thread.
I set an alert for any future posts with the tag you linked.

Please consider adding alerts about scheduled maintenance in IC2, both as app push notifications and as notifications for browser users.

I’m still down, getting a 502 gateway error. The phone app isn’t working either. Time to call for support :pleading_face:

Another extremely useful tool would be a secondary site, such as status.incontrol2.peplink.com, where known issues and planned interruptions could be shared with users. This would let everyone who isn’t on the forums know if and when there are incidents and that Peplink is working on them, keeping network managers from worrying and channel support teams from creating tickets for known issues.

1 Like

Hello, this has been resolved.

2 Likes