In October, we shared a blog post detailing significant snap startup time improvements due to the use of a new compression algorithm. In that article, we focused on the cold and hot application startups, but we did not delve much into the first-run setup scenario.
While running our tests, we observed a rather interesting phenomenon, primarily on the Fedora 32 Workstation system. On a particular laptop, we noticed that the initial snap launch took about 60 seconds, whereas cold launches would take about 10 seconds. We decided to analyze this problem, and once we did, we realized there’s an amazing investigative story to be shared, including some really cool findings and general advice for developers on how to optimize their snaps.
First-run startup time
It is important to note that the difference (6x) was what drew our eye – rather the actual numbers, as these are highly dependent on the platform capabilities and software in question. For that matter, 6-second and 1-second launch times would have been equally representative.
Using the script we developed for the test, we timed the Chromium browser launches. To simulate the first-time startup, one can simply move (or delete) the ~/snap/chromium folder, and then purge the system caches. We soon found out that aforementioned long startup times, in the range of ~60 seconds, compared to ~10 seconds for the cold startups.
Snapd, the snap service, contains a number of tools that can help developers and users troubleshoot the startup and execution of their snaps. One of these features is the –trace-exec flag with which snaps can be run, allowing you to see a list of tasks and their execution times in the startup chain of the snap. It can be a good indicator as to where you might need to focus in your investigation.
Unfortunately, the –trace-exec option didn’t tell us much:
Slowest 10 exec calls during snap run:
Total time: 60.647s
Similarly, using the –strace option, which lets you trace snap applications the way you normally would with strace, didn’t help either:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- -------------------
42.10 3.315633 3000 1105 86 futex
19.62 1.544763 3814 405 epoll_wait
15.25 1.201121 1814 662 poll
9.94 0.783167 6751 116 51 wait4
2.68 0.210840 149 1410 2 read
2.18 0.171819 14 12079 10122 openat
2.00 0.157887 22 6934 6600 access
1.39 0.109535 2960 37 execve
0.96 0.075822 15 4749 3388 stat
The summary of a slow run showed some interesting information, but it didn’t indicatively point to anything specific. The telling part was that the recorded time in the strace log didn’t match the startup time. Typically, if there’s a mismatch between total runtime of an application and the sum of its different sub-tasks and system calls, that indicates computation – time spent in the CPU crunching numbers. Before we looked at CPU counters, to try to understand a little better what was happening during the initial snap startup, we decided to check the contents of the ~/snap/chromium directory in more detail.
Content of a snap
A snap has two parts – the read-only filesystem with the application code, mounted under /snap/snapname, and the user-writable component, under ~/snap/snapname. In the latter path, there will often be a sub-directory named common, which stores application-specific configurations that are retained over updates (refreshes). For example, in the case of Chromium, this would be the user profile, as well as any caches, like font and shader caches, and other information the browser needs to maintain consistency and user data over multiple launches.
We looked at the timestamps for every object inside the common directory, and we found another clue. There was a ~50 second gap between the creation of the BrowserMetrics directory and the SingletonSocket symlink.
drwx------. 2 igor igor 4096 2020-10-26 12:15:56.952907984 +0000 BrowserMetrics
lrwxrwxrwx. 1 igor igor 50 2020-10-26 12:16:47.236109348 +0000 SingletonSocket -> /tmp/.org.chromium.Chromium.zrTDlA/SingletonSocket
At this stage, we really needed to understand what Chromium was doing here.
Perf is a powerful system profiling tool, using the perf_events interface in the Linux kernel. The tool has various uses, but if you want to get a very simple and quick overview of the CPU activity on your host, you can use the record command, and then report command to get an understanding of the system behavior. It does not trace specific applications, but it can still be very indicative and quite accurate, especially if a specific application is doing a lot of computation on a largely idle host.
We ran the tool in the background during the Chromium browser initial startup launch, and we discovered some quite useful tidbits. Most notably, out of roughly 33,000 events, some 6,800 were done by Chromium, accounting for 80.3% of the CPU consumed during the sample period. Out of this portion, 91.8% of the time was consumed by just two shared objects – libfreetype.so.6.15.0 and libfontconfig.so.1.10.1. A small selection of top hitters from the perf report is shown below:
7.42% swapper [kernel.kallsyms] [k] mwait_idle_with_hints.constprop.0
4.40% chrome libfreetype.so.6.15.0 [.] 0x00000000000378fa
2.39% chrome libc-2.27.so [.] 0x000000000018f19d
1.95% chrome libfreetype.so.6.15.0 [.] FT_DivFix
1.05% chrome libfreetype.so.6.15.0 [.] 0x0000000000039645
0.86% chrome libfreetype.so.6.15.0 [.] FT_Outline_Get_CBox
0.85% chrome libfreetype.so.6.15.0 [.] 0x0000000000039233
0.69% chrome [kernel.kallsyms] [k] lzo1x_decompress_safe
0.67% chrome libfreetype.so.6.15.0 [.] 0x00000000000349b8
0.62% chrome libfreetype.so.6.15.0 [.] 0x000000000003a150
This indicates a possible problem with the font cache creation, something that used to affect snaps in the past, but which has largely been resolved. However, it would seem a similar or perhaps a new problem of this nature may be affecting Chromium. We had to look at the fontconfig caches in more detail.
Private snap caches are stored in the hidden .cache directory inside the common subdirectory for each snap. There can be multiple types of caches, like fontconfig, GDK-PixBuf Loaders cache, GIO modules cache, immodules, MESA shader caches, and others.
With perf pointing at an issue with font caches, it was relatively easy to determine whether this is the main culprit. Indeed, it was – deleting all the other caches, and keeping the fontconfig directory in place resulted in first-time launch being near-identical to the cold launch times.
But we still needed to determine why the font cache generation was so slow.
-rw-rw-r--. 1 igor igor 104 2020-10-27 16:41:14.683992076 +0000 4c599c202bc5c08e2d34565a40eac3b2-le64.cache-7
-rw-rw-r--. 1 igor igor 176 2020-10-27 16:41:14.691992107 +0000 9dc4195ac9420d9178822dc735425b39-le64.cache-7
-rw-rw-r--. 1 igor igor 4928 2020-10-27 16:41:14.745992316 +0000 b3043b29744cf0a429678d296f5e4026-le64.cache-7
We looked at the exact timestamps for each cache and discovered that it was just one font taking the bulk of the time. Out of the 50 fonts on the particular system, the caches for 49 of them were generated within ~500 ms all together, whereas just this one cache took ~45 seconds to complete.
Caches are binary files, but you can inspect them with strings (or open them with a text editor), and you will be able to see the name of the font that was used to generate the cache. We quickly determined that it was the google-noto font (family) that was behind this.
Temporarily moving the font out of the /usr/share/fonts directory resulted in first-launch times becoming almost identical to the cold launches.
What about formal testing?
Of course, the snap development and release cycle also includes a range of automated tests, designed to detect possible errors and regressions in both the build process and application runtime. This is a topic we will discuss at quite some length in a separate article. Here, it is worth mentioning that the problem we encountered was new as well as specific to Chromium, and it did not manifest under all conditions.
Furthermore, formally, there was nothing explicitly wrong with the font cache generation, as a system may have a large number of fonts, and it may take a long time to generate them. This is why this particular use case was such a fascinating story.
The fix – snapcraft extension & font hook
Of course, removing fonts is not the solution here.
What we determined from this exercise were several things. One, the Chromium browser snap was not using any snapcraft desktop extensions, which come with multiple useful features, including various common assets used by desktop applications. Snapcraft extensions also help reduce the application size, which can also contribute to faster startup time, both as common libraries will most likely already be loaded into memory, and due to smaller size of the application data. Furthermore, in the case of desktop-related extensions, they also encode the best practices for desktop application snaps, all of which can help developers achieve better consistency and performance with their snaps.
The most trivial example would be to look at something like gnome-calculator:
If we expand the extension, declared in the gnome-calculator snapcraft.yaml file, then we learn that the extension abstracts away:
- Several additional plugs, including the GNOME 3.34 content snap, which contains the majority of common assets used by GNOME/GTK3 applications, significantly reducing the application size.
- A long list of build environment variables, including: XDG_DATA_DIRS, GETTEXTDATADIRS, GDK_PIXBUF_MODULE_FILE, and others.
- GTK3 themes, icon themes and sound themes, and others.
This approach can help developers make snaps in a more consistent fashion, more likely to integrate correctly into the desktop, and will also have a smaller footprint. As a separate example, the KDE Kcalc snap only weighs about 1 MB through the use of the KDE content snap. The typical application size would be about 100 MB otherwise.
Also on the development side, Snapcraft 4.4 comes with another neat optimization – a new mechanism to pre-generate font caches has been added to the desktop-related extensions. This allows for system font caches to be generated at install time and isolated from the potentially ABI-incompatible host generated cache.
The practical advantage is that snap startup times should be faster. To benefit from this feature, any extension using the Snapcraft project just needs to rebuild their snap. More details about this feature can be found in the desktop extensions font hook specification.
Try it for yourself
Good news, we have already implemented these changes in the Chromium snap!
You can install and try this build by running the following command in a terminal:
snap refresh chromium --channel=candidate/gnome-3-28-extension
Indeed, this revision was built against Snapcraft 4.4, taking advantage of the new mechanism to pre-generate font caches, speeding up the cold boot startup time. Equally importantly, the use of the extension results in the de-duplication of common GNOME libraries and their dependencies, resulting in a much smaller snap. The new build weighs only 122 MB – compared to about 250 MB when not using the gnome-3-28-1804 platform snap. This is a 50% reduction in size, and it is even 20% smaller than the Chromium snap using XZ compression.
Software problems are never fun – but they can also be a great turning point, and an opportunity to resolve the issue in an elegant way. User experience is a fundamental part of the snap ecosystem, and we’re investing a lot of effort in making sure people can consume their applications in a fun, seamless – and snappy – manner.
With Chromium, users now have improved startup times due to the use of LZO compression, improved first-run startup times through the use of the font hooks, and a better integrated snap with a much smaller disk footprint due to the use of the GNOME extension. We hope you found this journey interesting, and if you have any comments or ideas, please join our forum and let us know what you think.