Some applications have specific requirements for a system’s boot time. Often the system does not need to be immediately ready for all its tasks, but it should be ready for certain mission-critical tasks (e.g. accepting commands over Ethernet or displaying a user interface). This article provides a few methodologies and low-hanging fruit for improving boot time on Toradex System on Modules.
Note: A few tips mentioned in this article require recompiling U-Boot or the Linux kernel, or rebuilding a root file system from scratch. Please refer to the respective articles on our developer website.
Before starting the optimization, we need an appropriate method to measure the boot time. If an exact end-to-end boot time is required, it might even be necessary to involve the hardware (e.g. GPIOs and an oscilloscope). In most cases, simply monitoring the serial port from a host system is accurate enough. A popular tool for timing serial output is Tim Bird's grabserial. It adds a time stamp to each line captured from the serial port, as shown below:
$ ./grabserial -d /dev/ttyUSB1 -t
[0.000002 0.000002]
[0.000171 0.000169]
[0.000216 0.000045] U-Boot 2015.04-00006-g6762920 (Oct 12 2015 - 15:35:50)
[0.005177 0.004961]
[0.005227 0.000050] CPU: Freescale Vybrid VF610 at 500 MHz
[0.008938 0.003711] Reset cause: POWER ON RESET
[0.011153 0.002215] DRAM: 256 MiB
[0.063692 0.052539] NAND: 512 MiB
[0.065568 0.001876] MMC: FSL_SDHC: 0
The first number represents the time stamp (since the first character was received) while the second number shows the delta between the time stamps of the current and the last line.
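To turn such a log into phase durations, you can subtract the absolute time stamps of two marker lines. The following minimal Python sketch assumes the “[absolute delta] text” line format shown above. parse_line and elapsed are hypothetical helpers and are not part of grabserial:

```python
import re
from typing import Iterable, Optional

# One line of grabserial "-t" output: "[<absolute> <delta>] <text>"
LINE_RE = re.compile(r"^\[(\d+\.\d+) (\d+\.\d+)\] ?(.*)$")


def parse_line(line: str) -> Optional[tuple[float, float, str]]:
    """Return (absolute seconds, delta seconds, text), or None for other lines."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2)), match.group(3)


def elapsed(lines: Iterable[str], start: str, end: str) -> float:
    """Seconds from the first line containing `start` to the next line containing `end`."""
    t_start = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        timestamp, _delta, text = parsed
        if t_start is None:
            if start in text:
                t_start = timestamp
        elif end in text:
            return timestamp - t_start
    raise ValueError(f"markers {start!r} / {end!r} not found in order")


log = """\
[0.000216 0.000045] U-Boot 2015.04-00006-g6762920 (Oct 12 2015 - 15:35:50)
[0.011153 0.002215] DRAM:  256 MiB
[0.063692 0.052539] NAND:  512 MiB""".splitlines()

# Time between the DRAM and the NAND message (about 52.5 ms in this excerpt)
print(f"{elapsed(log, 'DRAM', 'NAND') * 1000:.1f} ms")
```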
This article is generally applicable to all of our modules. However, I do present some measurements and improvements specifically using our NXP®/Freescale Vybrid-based module - Colibri VF61.
A Linux system boot consists of roughly three phases, listed below and examined in the course of this article:
- Boot loader
- Linux kernel
- User space (init system)
There are actually two more phases before the boot loader runs: hardware initialization and the boot ROM. The hardware initialization phase is needed to fulfill power sequencing requirements and bus or SoC reset timing requirements. This phase is usually fixed and in the range of 10-200 ms. ARM SoCs then execute firmware stored in an internal ROM, which loads the boot loader from the boot media. This step is usually rather short, and its duration depends on the boot loader’s size. Apart from minimizing the boot loader’s size, there is little room for optimization here. The real optimization potential and flexibility lie within the boot loader (U-Boot).
Boot Loader
With the current release V2.5 Beta 1, the time from the first character to the kernel start is ~1.85 seconds. This involves the following steps:
- U-Boot initialization (~110 ms, measured from the first character received)
- Autoboot delay (1 s)
- Attaching the UBI partition, which is sped up by a feature called Fastmap (without Fastmap, this step would take around 1.6 s)
- Loading the kernel (375 ms)
- Loading and patching the device tree (~35 ms)
- And finally, jumping to the kernel’s start address
Boot time to Kernel start: ~1850 ms
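As a plausibility check, you can add up the individual figures. Treat U-Boot initialization, the autoboot delay, kernel loading and device tree handling as given. The remainder of the ~1850 ms total is then the time left for the Fastmap-accelerated UBI attach step. All inputs are rounded, so the result is only an estimate:

```python
# Approximate phase durations in ms, as listed above (UBI attach excluded)
known_phases_ms = {
    "U-Boot initialization": 110,
    "Autoboot delay": 1000,
    "Loading the kernel": 375,
    "Loading and patching the device tree": 35,
}
measured_total_ms = 1850  # first character to kernel start

# Whatever remains is attributed to attaching UBI (with Fastmap)
ubi_attach_ms = measured_total_ms - sum(known_phases_ms.values())
print(f"UBI attach (derived): ~{ubi_attach_ms} ms")  # ~330 ms
```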
The obvious optimization is reducing the auto-boot delay. This can be set to zero using:
setenv bootdelay 0
saveenv
This can also be configured as the default using the CONFIG_BOOTDELAY config symbol. However, in the current release, a boot delay of 0 leaves no way to enter the boot loader’s console directly. U-Boot provides an option called CONFIG_ZERO_BOOTDELAY_CHECK, which checks once for a character on the console even if the boot delay is 0. We have added this option to our default configuration for the next release.
Boot time to Kernel start with this improvement: ~860 ms
Serial output is sent synchronously: the CPU waits until each character has been sent over the serial line. Every printed character therefore slows down U-Boot. UBI in particular prints a lot of informational messages, so there is potential for optimization here. U-Boot provides the config symbol CONFIG_UBI_SILENCE_MSG to silence these messages.
Boot time to Kernel start with this improvement: ~800 ms
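A back-of-the-envelope calculation shows why silencing messages pays off. With synchronous output, every character stalls the CPU for its full transmission time. The sketch below assumes a console running at 115200 baud with 8N1 framing, so check your own configuration. The 1000-character figure is purely illustrative and not a measurement:

```python
def serial_time_s(num_chars: int, baud: int = 115200, bits_per_char: int = 10) -> float:
    """Seconds needed to push num_chars through a UART.

    bits_per_char=10 assumes 8N1 framing: 1 start bit, 8 data bits, 1 stop bit.
    """
    return num_chars * bits_per_char / baud


# Roughly 87 microseconds per character at 115200 baud (8N1)
per_char_us = serial_time_s(1) * 1e6
# A hypothetical 1000 characters of log output cost roughly 87 ms of boot time
per_kchar_ms = serial_time_s(1000) * 1e3
print(f"{per_char_us:.1f} us per char, {per_kchar_ms:.1f} ms per 1000 chars")
```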
Using the hardware as efficiently as possible requires insight into what the hardware is capable of and what the software currently makes use of. Until now, U-Boot lacked support for the Level 2 cache (present only on the Colibri VF61). After implementing Level 2 cache support, the boot time improved by more than 40 ms.
Boot time to Kernel start with this improvement: ~760 ms
Removing features reduces both the relocation time and the time spent initializing those features. We removed display support (DCU), EXT3 and EXT4 support, and USB peripheral drivers such as DFU and mass storage. This decreased the size of U-Boot to 366 kB and shaved off another 10 ms.
Boot time to Kernel start with this improvement: ~750 ms
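You can summarize the U-Boot measurements from this section by computing the saving of each step relative to the previous one. Here is a short Python sketch that uses only the numbers reported above (labels abbreviated):

```python
# Boot time to kernel start (ms) after each U-Boot change described above
steps = [
    ("Baseline (V2.5 Beta 1)", 1850),
    ("bootdelay = 0", 860),
    ("CONFIG_UBI_SILENCE_MSG", 800),
    ("L2 cache enabled", 760),
    ("Features removed", 750),
]

# Saving of each step relative to the previous measurement
savings = {name: before - after for (_, before), (name, after) in zip(steps, steps[1:])}
for name, saved in savings.items():
    print(f"{name:<24} saves ~{saved} ms")

total_saved_ms = steps[0][1] - steps[-1][1]
print(f"Total: ~{total_saved_ms} ms ({steps[0][1]} ms -> {steps[-1][1]} ms)")
```

Removing the autoboot delay clearly dominates. The remaining changes add up to about 110 ms.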
According to the timestamps, most of the remaining time is spent attaching UBI and mounting the UBIFS, as well as loading the kernel (~380 ms). Since the kernel load time grows linearly with the kernel size, reducing the kernel size will improve the boot time further.
Kernel
To measure the kernel boot time only, the “match” feature of grabserial can be used to reset the time stamp in the last message printed by the boot loader:
./grabserial -d /dev/ttyUSB1 -t -m "^Starting kernel.*"
The end of the kernel boot is somewhat hard to determine. The kernel continues to initialize hardware even after the root file system has been mounted and the first user space process (init) has started running (delayed initialization). The string “Freeing unused kernel memory” is the last message printed before the init process is started. It therefore marks the end of the kernel’s “linear” init procedure (see kernel_init in init/main.c). We will use the timestamp of that message to compare boot times. The shipped kernel has a compressed size of 4316 kB and a boot time of 2.56 seconds.
Kernel boot time to Init start: 2.56 s
Similar to U-Boot, the Linux kernel prints all messages synchronously to the serial console. The exact behavior depends on the serial console driver. The LPUART driver used for Vybrid’s console waits until each character has been sent over the serial port. The advantage is that when the kernel crashes, all messages up to that point are visible. If the messages were sent asynchronously, the last visible message would not necessarily indicate where the crash happened.
The kernel has an argument to reduce the number of messages printed to the console: “quiet”. However, this also silences our anchor for the boot time measurement (“Freeing unused kernel memory”). The easiest way to get the message back on the console is to elevate the log level of that particular print statement. It is located in ‘mm/page_alloc.c’; search for “Freeing %s memory”. I changed the message to use ‘pr_alert’. The measurement showed an improvement of 1.55 seconds, which is more than a factor of 2!
Kernel boot time to Init start with this improvement: ~1.01 s
The easiest way to achieve further improvements is to remove features. The Yocto Project has a handy tool called ksize.py, which needs to be run from within a kernel build directory. The tool prints tables showing the size of individual kernel parts. The first table gives a high-level overview (run make clean before building to get an accurate overview):
Linux Kernel            total |      text     data      bss
-----------------------------------------------------------
vmlinux               8305381 |   7882273   247732   175376
drivers/built-in.o    2010229 |   1881545   109796    18888
fs/built-in.o         1944926 |   1911100    19422    14404
net/built-in.o        1477404 |   1398316    44832    34256
kernel/built-in.o      628094 |    514935    17099    96060
sound/built-in.o       326322 |    316298     8248     1776
mm/built-in.o          288456 |    276492     8000     3964
lib/built-in.o         160209 |    157659      217     2333
block/built-in.o       137262 |    133614     2420     1228
crypto/built-in.o      104157 |    100063     4082       12
security/built-in.o     37391 |     36303      788      300
init/built-in.o         31064 |     16208    14772       84
ipc/built-in.o          29366 |     28640      722        4
usr/built-in.o            138 |       138        0        0
-----------------------------------------------------------
sum                   7175018 |   6771311   230398   173309
delta                 1130363 |   1110962    17334     2067
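You can also process the table programmatically, for instance to rank directories by their share of vmlinux. The sketch below embeds the ksize.py output shown above as a string and does not run ksize.py itself. It assumes exactly this “name total | text data bss” layout:

```python
# ksize.py output from above (header and separator lines omitted)
KSIZE_TABLE = """\
vmlinux 8305381 | 7882273 247732 175376
drivers/built-in.o 2010229 | 1881545 109796 18888
fs/built-in.o 1944926 | 1911100 19422 14404
net/built-in.o 1477404 | 1398316 44832 34256
kernel/built-in.o 628094 | 514935 17099 96060
sound/built-in.o 326322 | 316298 8248 1776
mm/built-in.o 288456 | 276492 8000 3964
lib/built-in.o 160209 | 157659 217 2333
block/built-in.o 137262 | 133614 2420 1228
crypto/built-in.o 104157 | 100063 4082 12
security/built-in.o 37391 | 36303 788 300
init/built-in.o 31064 | 16208 14772 84
ipc/built-in.o 29366 | 28640 722 4
usr/built-in.o 138 | 138 0 0
sum 7175018 | 6771311 230398 173309
delta 1130363 | 1110962 17334 2067"""


def parse_ksize(table: str) -> dict[str, tuple[int, int, int, int]]:
    """Map each row name to (total, text, data, bss) in bytes."""
    rows = {}
    for line in table.splitlines():
        left, right = line.split("|")
        name, total = left.split()
        text, data, bss = (int(value) for value in right.split())
        rows[name] = (int(total), text, data, bss)
    return rows


rows = parse_ksize(KSIZE_TABLE)
vmlinux_total = rows["vmlinux"][0]
parts = {name: row for name, row in rows.items() if name.endswith("/built-in.o")}

# "delta" is the part of vmlinux not attributed to any listed directory
assert rows["delta"][0] == vmlinux_total - rows["sum"][0]

ranked = sorted(parts.items(), key=lambda item: item[1][0], reverse=True)
for name, (total, _text, _data, _bss) in ranked[:5]:
    print(f"{name:<20} {total:>9} bytes  {total / vmlinux_total:6.1%} of vmlinux")
```

Here drivers, file systems and networking together account for roughly two thirds of the image. That makes these directories the first place to look.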
Which features can be removed safely is obviously application-specific. Going through the individual high-level directories helps to quickly identify the most promising candidates. For this article, I removed several file systems (cifs, nfs, ext4, ntfs), the audio subsystem, multimedia support, and USB and wireless network adapter support. The compressed kernel ended up at 3356 kB, roughly 1 MB smaller than before. This also decreased the kernel loading time in the boot loader by ~85 ms.
Kernel boot time to Init start with this improvement: ~0.90 s
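The measured ~85 ms saving is consistent with the linear relationship between kernel size and load time noted in the boot loader section. Assume that the 375 ms load time measured in U-Boot refers to the shipped 4316 kB kernel. A simple proportional model then gives:

```python
def estimated_load_ms(size_kb: float, ref_size_kb: float = 4316, ref_load_ms: float = 375) -> float:
    """Linear model: load time proportional to the compressed kernel size."""
    return ref_load_ms * size_kb / ref_size_kb


saved_ms = estimated_load_ms(4316) - estimated_load_ms(3356)
print(f"Predicted saving: {saved_ms:.0f} ms")  # about 83 ms
```

The predicted ~83 ms is close to the measured ~85 ms.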
Another idea is to evaluate different compression algorithms. However, the default in our kernel configuration, LZO, is already a good fit for fast booting, since it favors decompression speed over compression ratio.
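The Python sketch below gives a rough way to explore this trade-off on a host machine. It only covers the compressors in the Python standard library (deflate as used by gzip, bzip2, and xz/LZMA), not LZO or LZ4. Keep these limitations in mind:

- Host decompression times say nothing absolute about the target CPU.
- The flash read rate of roughly 11.5 MB/s is derived from the 4316 kB / 375 ms figures above.
- The file path in the usage comment is only an example.

Treat the output as a relative comparison of compressed size versus decompression cost, not as a boot time prediction:

```python
import bz2
import lzma
import sys
import time
import zlib

# Compressors from the Python standard library only; LZO and LZ4, which the
# kernel also supports, would need third-party modules and are not covered.
CODECS = {
    "gzip (deflate)": (lambda d: zlib.compress(d, 9), zlib.decompress),
    "bzip2": (lambda d: bz2.compress(d, 9), bz2.decompress),
    "xz (lzma)": (lambda d: lzma.compress(d), lzma.decompress),
}


def evaluate(data: bytes, read_bytes_per_s: float) -> dict:
    """Per codec: (compressed size, estimated read time s, host decompression time s)."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        packed = compress(data)
        start = time.perf_counter()
        unpacked = decompress(packed)
        decompress_s = time.perf_counter() - start
        assert unpacked == data, f"{name} round trip failed"
        results[name] = (len(packed), len(packed) / read_bytes_per_s, decompress_s)
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python3 compare.py arch/arm/boot/Image (an uncompressed kernel image)
    with open(sys.argv[1], "rb") as f:
        image = f.read()
    read_rate = 4316e3 / 0.375  # ~4316 kB read in ~375 ms in U-Boot, see above
    for name, (size, read_s, dec_s) in evaluate(image, read_rate).items():
        print(f"{name:<15} {size / 1e3:8.0f} kB  read {read_s * 1e3:6.1f} ms  "
              f"decompress (host) {dec_s * 1e3:6.1f} ms")
```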
User Space
In Linux user space, initialization is done by the init system. The Toradex BSP images are based on Ångström, whose standard init system is systemd. systemd, nowadays the de facto standard init system on the Linux desktop, is very feature-rich and especially designed with dynamic systems in mind. It also addresses boot time:

- Multiple daemons are started simultaneously, leveraging today's multi-core systems.
- Socket activation allows services to be started later, when they are first needed.
- Device activation allows starting services on demand when hardware appears.

Furthermore, the integrated logging daemon journald saves space thanks to binary-packed log files and sophisticated log file management.
Depending on the application, an embedded system might be rather static, in which case the dynamic features of systemd are not really needed. Unfortunately, systemd is not very modular; rather, its individual components have interlocking dependencies. This makes it hard to strip systemd down to a bare minimum. This section is split into two parts. The first part shows systemd boot optimization techniques, while the second part looks at System V init and other alternatives.
In both parts we use the “Freeing unused kernel memory” message as the base time for time measurement:
./grabserial -d /dev/ttyUSB1 -t -m "^\[ *[]0-9.]* Freeing unused kernel memory.*"
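The match pattern above is a regular expression. “^\[ *” matches the opening bracket of the kernel’s printk timestamp plus its padding. The character class “[]0-9.]”, which includes the closing bracket, consumes the timestamp itself. Here is a quick check with Python’s re module, using a made-up console line and assuming printk timestamps are enabled:

```python
import re

# The pattern passed to grabserial's -m option above
PATTERN = r"^\[ *[]0-9.]* Freeing unused kernel memory.*"

# Illustrative console line (timestamp and size are made up)
line = "[    1.012345] Freeing unused kernel memory: 200K"
print(bool(re.match(PATTERN, line)))  # True
```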
systemd
For this blog post, we define the login shell on the serial console as the critical task. The serial console login service is defined with “Type=idle”, which means that, by definition, it is started only after all other services have been started.
To start a headless or framebuffer-based application, one would typically create a new service. systemd allows a service to declare requirements that must be met before it starts. For example, a service that needs the network declares “Wants=network-online.target”, combined with “After=network-online.target” for ordering. systemd then starts the service as soon as those requirements are met. Since services are started in parallel, they share the CPU resources. Still, such an application is likely up and running before the serial console login becomes available. The following numbers are therefore probably on the high side for the application itself.
User space boot time to Login without improvements: ~8.6 s
The “quiet” argument on the kernel command line is also honored by systemd, which then prints fewer status messages. This change alone already has a positive effect on the systemd boot time, reducing it from ~8.6 s to ~6.5 s.
User space boot time to Login with this improvement: ~6.5 s
systemd provides a utility called systemd-analyze. When invoked with the “blame” argument, it prints a list of services and the time each took to start. This makes boot time offenders quite easy to find. However, the values can be misleading, since they are measured as wall clock time. A listed service might simply have been sleeping while the CPU was processing other work. The service at the top of the list is therefore not necessarily the biggest boot time offender, especially on a single-core system.
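Since the wall clock values are only indicative, comparing blame output before and after a change is often more useful than reading a single list. The following sketch parses the blame text into seconds and computes per-unit differences. The duration format it handles (min, s, ms and us suffixes) is an assumption based on typical systemd-analyze output. The sample output is made up for illustration and contains no measured values:

```python
import re

# Duration tokens such as "1min", "2.104s", "731ms" (suffix set is an assumption)
TOKEN_RE = re.compile(r"(\d+(?:\.\d+)?)(min|ms|us|s)")
UNIT_SECONDS = {"min": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6}


def parse_blame(output: str) -> dict[str, float]:
    """Map unit name -> start-up time in seconds (wall clock)."""
    result = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        duration = " ".join(fields[:-1])
        result[fields[-1]] = sum(float(value) * UNIT_SECONDS[suffix]
                                 for value, suffix in TOKEN_RE.findall(duration))
    return result


def compare(before: dict[str, float], after: dict[str, float]) -> list[tuple[str, float]]:
    """Per-unit change in seconds, most improved first; missing units count as 0."""
    units = set(before) | set(after)
    changes = [(unit, after.get(unit, 0.0) - before.get(unit, 0.0)) for unit in units]
    return sorted(changes, key=lambda item: item[1])


# Made-up example output, for illustration only
run1 = """\
          2.104s connman.service
           731ms systemd-udev-trigger.service
            95ms alsa-restore.service
"""
run2 = """\
           702ms systemd-udev-trigger.service
           410ms systemd-networkd.service
"""
for unit, delta in compare(parse_blame(run1), parse_blame(run2)):
    print(f"{delta:+8.3f} s  {unit}")
```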
Services can be disabled using the systemctl disable command. Some services, especially those provided by systemd itself, might need the mask command instead. Some services might still be required for the system to operate, so disable services carefully and only one at a time. For this article, the following services have been disabled:
systemctl disable usbg
systemctl disable connman.service # replaced with networkd
systemctl mask alsa-restore.service
User space boot time to Login with this improvement: ~6.1 s
systemd comes with its own logging daemon called journald. It is one of those components that should not be disabled entirely. During boot, the logging daemon needs to manage and delete old log files on disk as well as write new log entries to disk. Disabling logging to disk therefore improves boot time, at the cost of having no stored log files, of course. Set Storage=none in /etc/systemd/journald.conf to disable log storage.
User space boot time to Login with this improvement: ~5.6 s
System V init and other alternatives
For many years, SysV init was the standard init system on Linux as well. Because it is script-based, it is very modular and relatively easy to strip down to a bare minimum. SysV init is a good alternative especially for relatively static systems, where systemd's device activation or socket activation is not needed.
The Yocto Project’s reference distribution “poky” uses SysV init by default. I blogged about it in my previous article The Yocto Project's Reference Distribution “Poky” on Toradex Hardware. Using the ‘minimal-console-image’ and a static IP address configuration, the measured user space boot time on the Colibri VF61 is ~2.3 s.
User space boot time to Shell with System V: ~2.3 s
The meta-yocto layer also provides ‘poky-tiny’, which uses just a shell script as the init system. Simply set the distribution to “poky-tiny” and build the usual Yocto image, such as ‘console-image-minimal’. The distribution is meant to be used as an initramfs. However, by removing MACHINE_ESSENTIAL_EXTRA_RDEPENDS, IMAGE_FSTYPES and PREFERRED_PROVIDER_virtual/kernel from the conf/distro/poky-tiny.conf file, I was able to build a working UBIFS image. To properly “reconfigure” the distribution for a flashable root file system, one should create a new distribution layer and copy the distribution configuration file into it. The “boot time” to the shell is obviously very short (220 ms), which allows executing a simple command with an overall boot time of just below 2 seconds. However, the system provides almost no features other than mounting the root file system, some basic virtual file system support and a shell. Still, depending on the features needed in a project, this could be a good starting point.
User space boot time to Shell with a Shell script only: ~0.2 s
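Adding up the best results for each phase reproduces the overall figure quoted above. This only covers what grabserial can see, from the first character on the serial port onwards. The hardware initialization and boot ROM phases come on top:

```python
# Best result per phase reported in this article (approximate, in ms)
phases_ms = {
    "U-Boot: first character to kernel start": 750,
    "Kernel: start to init (Freeing unused kernel memory)": 900,
    "User space: poky-tiny init script to shell": 220,
}
total_ms = sum(phases_ms.values())
print(f"Total from first serial character to shell: ~{total_ms} ms")  # ~1870 ms
```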
Further resources: http://free-electrons.com/doc/training/boot-time/boot-time-slides.pdf
Stefan Agner