Report on the PepVPN disconnection issue on April 10 and 11, 2017


On April 9, the InControl “mars” system was upgraded to version 2.4. Because of two software bugs, PepVPN connections of some organizations hosted on the mars system may have been disconnected. The problems were fixed soon after the development team identified them. We sincerely apologize for any inconvenience caused.
The details of each bug are as follows:

  1. Configuration generation issue for AP Controller enabled devices
    In this release, IC2 was changed to stop generating Wi-Fi configurations for AP Controller-enabled devices, so that no Wi-Fi configuration would be pushed to them. However, the system incorrectly stopped generating all configurations for these devices, including PepVPN configurations. As a result, PepVPN profiles were removed from AP Controller-enabled devices. On April 11 at 02:45, we deployed a fix to IC2. Once an affected device goes offline and comes back online, IC2 pushes the correct profile to it.

  2. PSK generation issue
    In previous InControl releases, whenever a PepVPN topology profile was updated, IC2 also generated a new pre-shared key (PSK) and updated the existing connection settings on all devices in the profile. This single key was shared by all VPN connections. This approach had two limitations. First, generating a new key on every update was unnecessary: it forced IC2 to update all devices and caused reconnections. Second, for better security, each connection should be assigned its own key.
    Release 2.4 removes both limitations, which should improve both stability and security. To do so, we introduced a new database table that stores a separate pre-shared key for each connection. To avoid interrupting existing PepVPNs, IC2 was designed to keep using the keys in the old table when generating configurations for existing connections. Because the generated configurations would then be identical to the original ones, no changes would be made to devices. No key is stored in the new table until the profile has been opened, updated, and saved at least once.
    However, the code incorrectly read keys from the new table, which did not yet contain any keys for existing connections, when generating PepVPN profiles, and then sent these configurations to devices. As a result, devices received profiles without a key. Once both ends of a connection had received the incorrect profile, the PepVPN connection could still be established. While only one end had received it, however, the two ends used different keys and the PepVPN was down.
    After the problem was identified and fixed, we deployed an update to IC2 on April 11 at 03:15 GMT+0. When the affected devices went offline and back online, IC2 pushed the correct profile to them, which should be the same as the profile they had before the upgrade. As before, each PepVPN came back up once both ends had received the corrected configuration.
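
The difference between the intended change in the first issue and the faulty one can be illustrated with a minimal Python sketch. IC2's actual code is not shown in this report; `Device`, `SECTIONS`, `sections_to_generate`, `sections_to_generate_buggy` and the section names are hypothetical, chosen only to contrast skipping the Wi-Fi section with skipping every section.

```python
from dataclasses import dataclass

# Hypothetical names throughout: this is not IC2's real code or schema.
SECTIONS = ("wifi", "pepvpn")  # assumed config sections for a device


@dataclass
class Device:
    name: str
    ap_controller_enabled: bool


def sections_to_generate(device: Device) -> list[str]:
    """Intended 2.4 behavior: omit only the Wi-Fi section for
    AP Controller-enabled devices and keep everything else."""
    if device.ap_controller_enabled:
        return [s for s in SECTIONS if s != "wifi"]
    return list(SECTIONS)


def sections_to_generate_buggy(device: Device) -> list[str]:
    """Faulty behavior described above: the check skipped every
    section, so the PepVPN profile was dropped as well."""
    if device.ap_controller_enabled:
        return []
    return list(SECTIONS)


ap = Device("branch-1", ap_controller_enabled=True)
print(sections_to_generate(ap))        # ['pepvpn']
print(sections_to_generate_buggy(ap))  # []
```

Devices without AP Controller enabled get the same sections from both functions, which is why only AP Controller-enabled devices were affected.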
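
For the second issue, the key lookup that IC2 was meant to perform for existing connections can be sketched as follows. This assumes, hypothetically, that the old table maps a profile to its shared key and the new table maps a (profile, connection) pair to a per-connection key. The names `resolve_psk`, `resolve_psk_buggy`, `legacy` and `per_connection`, the order of the fallback, and the key values are illustrative assumptions, not IC2's implementation.

```python
from typing import Optional

# Hypothetical stand-ins for the two tables; not IC2's real schema.
# Old table (assumed): profile ID -> the single PSK shared by its connections.
# New table (assumed): (profile ID, connection ID) -> per-connection PSK,
# filled only after the profile has been opened, updated and saved.
LegacyTable = dict[str, str]
ConnectionTable = dict[tuple[str, str], str]


def resolve_psk(profile_id: str, connection_id: str,
                legacy: LegacyTable, per_connection: ConnectionTable) -> Optional[str]:
    """Intended lookup: use a per-connection key if one exists, otherwise
    keep the old shared key so the generated config is unchanged."""
    key = per_connection.get((profile_id, connection_id))
    if key is not None:
        return key
    return legacy.get(profile_id)


def resolve_psk_buggy(profile_id: str, connection_id: str,
                      legacy: LegacyTable, per_connection: ConnectionTable) -> Optional[str]:
    """Faulty lookup: reads only the new table, which is still empty for
    existing connections, so the generated profile carries no key."""
    return per_connection.get((profile_id, connection_id))


legacy = {"mesh-1": "old-shared-key"}  # made-up example values
per_connection: ConnectionTable = {}

print(resolve_psk("mesh-1", "hq-branch", legacy, per_connection))        # old-shared-key
print(resolve_psk_buggy("mesh-1", "hq-branch", legacy, per_connection))  # None

# After the profile is opened, updated and saved, a per-connection key exists.
per_connection[("mesh-1", "hq-branch")] = "new-per-connection-key"
print(resolve_psk("mesh-1", "hq-branch", legacy, per_connection))        # new-per-connection-key
```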

To prevent these problems from recurring, we will improve our QA process, including adding more test cases that cover configurations created before an upgrade.