Appendix E - Troubleshooting
============================

Fixing Deployment Problems
--------------------------

Sometimes a node fails to deploy. When this happens, check the installation
output on the node's MAAS page. (Click the Logs tab and then click
Installation Output.) Often, a clue to the nature of the problem appears near
the end of that output. If you don't spot anything obvious, copy that output
into a file and send it to the Server Certification Team.

One common cause of deployment problems is incorrect IP address assignment.
Depending on your MAAS configuration and local network needs, your network
might work better with DHCP, Auto Assign, or Static Assign as the method of
IP address assignment. To change this setting, you must first release the
node. You can then click the Network tab on the node's summary page in MAAS
and reconfigure the network options by using the Actions field, as described
earlier, in :ref:`install-ubuntu`.

If, when you try to deploy a GA kernel, MAAS complains that the kernel is too
old, try this:

#. Click the node's *Configuration* tab in MAAS.

#. Click *Edit* under *Machine Configuration.*

#. Under *Minimum Kernel*, select *No Minimum Kernel.*

#. Click *Save Changes.*

#. Try deploying the node again.

Adding PPAs Manually
--------------------

Sometimes you may need to add a PPA manually. For this to work, your SUT must
be able to reach the Internet and, more specifically, ``launchpad.net``. If
either of those requirements is not met, you will receive a somewhat
confusing message like this::

    ubuntu@ubuntu:~$ sudo add-apt-repository ppa:checkbox-dev/stable
    Cannot add PPA: 'ppa:checkbox-dev/stable'.
    Please check that the PPA name or format is correct.

To resolve this, ensure that your SUT can reach the Internet and can reach
``launchpad.net`` directly.
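
Because the error message blames the PPA name, it can help to confirm
connectivity separately. The following is a minimal sketch, not an official
tool: it uses bash's built-in ``/dev/tcp`` redirection and the ``timeout``
command to try a TCP connection to each host. The host and port list
(``launchpad.net`` and ``ppa.launchpad.net`` on ports 80 and 443) is an
assumption based on the sites named in this appendix, and ``can_reach`` is a
hypothetical helper name; adjust both for your environment, for example if
you use a local PPA mirror.

.. code-block:: bash

    #!/usr/bin/env bash
    # Sketch: check whether the SUT can open TCP connections to Launchpad.
    # The host:port list below is an assumption; edit it for your site.

    can_reach() {
        local host=$1 port=$2
        # Open a TCP connection in a child shell; give up after 5 seconds.
        # Host and port are passed as arguments rather than spliced into code.
        timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
    }

    for target in launchpad.net:443 ppa.launchpad.net:80 ppa.launchpad.net:443; do
        host=${target%:*}
        port=${target#*:}
        if can_reach "$host" "$port"; then
            echo "OK:   ${host} port ${port}"
        else
            echo "FAIL: ${host} port ${port}"
        fi
    done

A ``FAIL`` line for every target usually points to a general connectivity,
DNS, or proxy problem rather than to the PPA itself. Note that a successful
TCP connection does not prove that an intercepting proxy will pass the
traffic through unchanged.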

Submitting Results
------------------

If submitting results from the Server Test Suite itself fails, you can use
the ``checkbox-cli`` program, as described earlier, in
:doc:`manually-upload-test-results`. You can try this on the SUT, but if
network problems prevented a successful submission, you may need to copy the
files to a USB flash drive or other removable medium and submit them from a
computer with better Internet connectivity.

Resolving Network Problems
--------------------------

Network problems are common in testing. They can manifest as complete
failures of all network tests or as failures of just some tests. Specific
suggestions for fixing these problems include:

- **Check cables and other hardware** -- Yes, this is very basic, but bad
  cables do cause problems. For instance, one bad cable at Canonical resulted
  in connections at 100 Mbps rather than 1 Gbps, and therefore in test
  failures. Some of these failures appeared in the output as the lack of a
  route to the host. Similarly, if a switch connecting the SUT to the
  ``iperf3`` server is faulty or cannot handle the speed being tested, the
  network test results will suffer.

- **Use the simplest possible network** -- Complex network setups, such as
  those with multiple switches or bridges, or those with heavy traffic from
  computers not involved in the testing, can create problems for network
  testing. Simplify the network in whatever way is practical.

- **Check firewall settings** -- Successful deployments may require access to
  several network sites. These include the repositories at
  ``archive.ubuntu.com`` (or a regional mirror), Ubuntu's PPA site at
  ``ppa.launchpad.net``, and Ubuntu's key server at ``keyserver.ubuntu.com``.
  (You may instead use local mirrors of the archive and PPA sites.) If your
  site implements strict outgoing firewall rules, you may need to open access
  to these sites on port 80, port 443, or both.

- **Check the iperf3 server** -- Ensure that the server computer is up and
  that the ``iperf3`` server program is running on it. Also ensure that the
  computer is otherwise healthy; for example, a runaway process consuming too
  much CPU time can cause failures.

- **Verify the iperf3 server is not overworked** -- The ``iperf3`` server
  program refuses connections while it is already talking to another client.
  Thus, a SUT may fail its network test if the ``iperf3`` server is already
  in use; if this happens, you may need to re-run the network tests on one or
  more SUTs. Note that a faster ``iperf3`` server (say, one with a 10 Gbps NIC
  used to test 1 Gbps SUTs) requires special configuration before it can
  handle multiple simultaneous connections, as described in the
  :doc:`../Environment_Setup_Guide/Environment_Setup_Guide`.

- **Ensure the iperf3 server is on the SUT's local network** -- The network
  tests temporarily remove the default route from the routing table, so the
  ``iperf3`` server must be on the same network segment as the SUT.

- **Check the SUT's network configuration** -- If the network ports are not
  configured, the network tests will fail. Likewise, if a network interface
  is not brought up before testing, its test will fail, even if the Server
  Test Suite detects the interface.

- **Check your DHCP server** -- A sluggish or otherwise malfunctioning DHCP
  server can delay bringing up the SUT's network interfaces (which repeatedly
  go down and come up during testing). This in turn can cause network testing
  failures.

If you need to re-run the network tests, you can do so as described earlier,
in :doc:`appendix-b-updated-test`.

Fixing Virtualization Test Problems
-----------------------------------

Virtualization tests can fail for a number of reasons. If these tests fail,
you should first try these diagnostic or corrective actions:

- Type ``sudo apt install -f`` on the SUT. This command attempts to repair
  broken package dependencies, which can sometimes cause the KVM test to
  fail.

- Check your virtualization image sources, as described in
  :doc:`run-certification-tests`. Note that you may need to check the
  configuration both on the SUT (in
  ``/etc/xdg/canonical-certification.conf``) and on whatever server you use
  to host your virtualization images.

- If you're *not* hosting virtualization images locally, be aware that the
  virtualization tests will try to download images from the Internet. In
  this case, you must ensure that the SUT has Internet access.

You can run the virtualization tests alone by typing ``test-virtualization``
on the SUT.

Handling Secure Boot MOKs
-------------------------

Most Ubuntu components, such as GRUB, the Linux kernel, and standard Linux
kernel modules, are cryptographically signed with Canonical's key. Some
third-party and specialized modules, however, are not (notably including some
used by the Firmware Test Suite, or ``fwts``). To use such modules, they must
be signed with a machine owner key (MOK), which is stored in the computer's
NVRAM; and before a MOK can be stored, UEFI Secure Boot policy requires manual
approval at boot time.

Thus, if the computer is deployed with Secure Boot active and certain
packages are updated via ``apt``, the ``apt`` program will prompt for a
password. Upon reboot, the computer's console will display a prompt to enter
that password, and the MOK will be added only if the password you type
matches the one you entered during the ``apt`` package update. The prompt at
reboot has no timeout, so if you can't see the console, the reboot will stall
at that prompt.

As a general rule, we encourage testing with Secure Boot enabled so as to
ensure that this feature works; however, if console access is not available,
it's best to configure computers with Secure Boot disabled. "Console access"
can be via a remote KVM or even IPMI Serial-over-LAN (SoL). Enabling and
disabling Secure Boot generally requires this access, too.
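
Before updating packages on a node that you can reach only over the network,
it can help to check whether Secure Boot is active and how many MOKs are
already enrolled. The following is a minimal sketch that wraps ``mokutil``.
It assumes the ``mokutil`` package is installed and that
``mokutil --list-enrolled`` prints one ``[key N]`` header per enrolled key;
that output format may vary between ``mokutil`` versions, and some queries
may need root privileges. The function names ``count_moks`` and
``report_mok_status`` are hypothetical.

.. code-block:: bash

    #!/usr/bin/env bash
    # Sketch: report Secure Boot state and the number of enrolled MOKs.
    # Assumes `mokutil --list-enrolled` prints one "[key N]" header per key.

    count_moks() {
        # Read mokutil's key listing on stdin and print how many key headers
        # it contains. grep -c prints 0 on no match; `|| true` hides its
        # non-zero exit status in that case.
        grep -c '^\[key [0-9][0-9]*]' || true
    }

    report_mok_status() {
        if ! command -v mokutil >/dev/null 2>&1; then
            echo "mokutil not found; install the mokutil package first" >&2
            return 1
        fi
        # Prints a line such as "SecureBoot enabled" or "SecureBoot disabled".
        mokutil --sb-state
        # Count the keys currently enrolled in the MOK list.
        echo "Enrolled MOKs: $(mokutil --list-enrolled 2>/dev/null | count_moks)"
    }

After defining these functions, type ``report_mok_status`` (or run the
functions in a root shell if ``mokutil`` reports a permission error). A
steadily growing MOK count across deployments is a sign that keys are
accumulating in NVRAM.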

Repeatedly deploying a server with Secure Boot active may result in the
accumulation of multiple MOKs in the computer's NVRAM. In theory, these could
grow to consume enough NVRAM space to cause problems. Typing
``sudo mokutil --reset`` at an Ubuntu console will cause all the MOKs to be
deleted; however, any kernel modules signed with a MOK will then fail to
load. It's best to use this command just prior to releasing a node.

Handling Miscellaneous Issues During Testing
--------------------------------------------

The testing process should be straightforward and complete without issue.
Should you encounter problems during testing, please contact your account
manager. Be sure to save the ``~/.local/share/checkbox-ng`` and
``~/.cache/plainbox`` directory trees, as they contain logs and other data
that will help the Server Certification Team determine whether the issue is a
testing issue or a hardware issue that will affect the certification outcome.

If possible, please also save any terminal output or tracebacks you notice to
a text file, and keep that file along with the previously noted directories.
(Feel free to send us a photo of the screen taken with a digital camera.)
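
To make these files easier to send, you can bundle both directory trees into
a single compressed archive. The following is a minimal sketch, not part of
the Server Test Suite: the function name ``archive_checkbox_logs`` and the
default archive name are hypothetical, and the function skips either
directory if it does not exist on the SUT.

.. code-block:: bash

    #!/usr/bin/env bash
    # Sketch: bundle the Checkbox data directories into one archive.
    # archive_checkbox_logs and the default file name are hypothetical.

    archive_checkbox_logs() {
        # Optional first argument: where to write the archive.
        local out=${1:-"$HOME/checkbox-logs-$(date +%Y%m%d-%H%M%S).tar.gz"}
        local d
        local -a dirs=()
        # Include only the directories that exist on this SUT.
        for d in .local/share/checkbox-ng .cache/plainbox; do
            if [ -d "$HOME/$d" ]; then
                dirs+=("$d")
            fi
        done
        if [ "${#dirs[@]}" -eq 0 ]; then
            echo "No Checkbox data directories found under $HOME" >&2
            return 1
        fi
        # Store paths relative to $HOME so the archive unpacks cleanly elsewhere.
        tar -czf "$out" -C "$HOME" "${dirs[@]}" && echo "$out"
    }

Run ``archive_checkbox_logs`` as the user who ran the tests; it prints the
path of the archive it created, which you can then copy off the SUT along
with any saved terminal output.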