Different malloc implementations for Qt apps
TL;DR: In a test with a sample Qt app on an embedded device, tcmalloc gave a faster startup than the standard glibc malloc.
The malloc implementations tested were:
standard malloc implementation from glibc
tcmalloc (part of Google perftools)
jemalloc (used in FreeBSD, among others)
Unfortunately, using jemalloc resulted in a bus error, so there are no results for that implementation. The backtrace pointed to C++11 atomics, and the error was not investigated further.
Plugging a different malloc implementation into a Qt application is easy with LD_PRELOAD. To simulate a cold boot, the Linux caches were also cleared before each run with the following command:
echo 3 > /proc/sys/vm/drop_caches
This makes Linux drop its file system caches, which slows down startup and thereby makes the relative speedup from a different malloc smaller. However, it is closer to the real-world scenario of powering on a device.
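A minimal sketch of how one such run could be scripted is shown here. The function names and the library path in the usage comment are assumptions for illustration; the actual location of the tcmalloc shared object depends on the target system.

```shell
#!/bin/sh
# Sketch: start a program with an alternative malloc preloaded.

# Flush dirty pages, then drop page cache, dentries and inodes (needs root).
drop_caches() {
    sync
    echo 3 > /proc/sys/vm/drop_caches
}

# run_with_malloc LIB CMD [ARGS...]
# An empty LIB runs CMD with the default malloc.
run_with_malloc() {
    lib=$1
    shift
    if [ -n "$lib" ]; then
        LD_PRELOAD="$lib" "$@"
    else
        "$@"
    fi
}

# Example (as root; the library path is hypothetical):
#   drop_caches
#   run_with_malloc /usr/lib/libtcmalloc.so appman
```

Calling sync first is optional, but dropping caches only frees clean pages, so flushing dirty data beforehand makes the runs more comparable.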
The measured value was startup time, i.e. the time from the beginning of the main() function until something is drawn onto the screen for the first time (i.e. the first time the frameSwapped() signal of QQuickWindow is emitted). By this measure, tcmalloc starts up roughly 300 ms faster than the standard malloc:
This seems like a good speedup, given that starting a QML app involves parsing many QML files, so a considerable share of the startup time might be spent waiting for file I/O, which a faster allocator cannot reduce. It would be interesting to check whether the relative speedup is higher when using the QML compiler.
Memory usage is roughly the same, with a slight advantage for standard malloc:
The measured value was the "Rss" field from the proc file system ("grep Rss: /proc/`pidof appman`/smaps"), so that only memory actually present in RAM is counted, as opposed to the reserved size.
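The grep call above prints one Rss line per mapping, so the values still have to be added up. One way to do that, as a sketch with a hypothetical helper name, is a short awk call:

```shell
# sum_rss_kb FILE: print the total of all "Rss:" lines (in kB) of an smaps file.
sum_rss_kb() {
    awk '$1 == "Rss:" { total += $2 } END { print total + 0 }' "$1"
}

# Example:
#   sum_rss_kb "/proc/$(pidof appman)/smaps"
```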
Another pitfall when measuring memory usage is to look only at the "heap" sections of smaps. These apparently only track memory allocated via the (s)brk system calls, while anonymously mmap'ed pages are not marked with "heap".
The "other" value in the diagram above includes sections of shared libraries and mmap'ed files.
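A per-category split of resident memory could be computed along the following lines. The three categories used here (heap, anonymous mappings, everything else) and the helper name are assumptions of this sketch, not necessarily the grouping behind the original diagram.

```shell
# rss_breakdown FILE: sum Rss (in kB) per mapping type of an smaps file.
rss_breakdown() {
    awk '
        # Mapping header: "start-end perms offset dev inode [path]"
        $1 ~ /^[0-9a-f]+-[0-9a-f]+$/ {
            if ($6 == "[heap]") kind = "heap"    # brk-managed heap
            else if (NF < 6)    kind = "anon"    # no path: anonymous mmap
            else                kind = "other"   # libraries, files, stack, ...
            next
        }
        # Rss lines belong to the most recent mapping header.
        $1 == "Rss:" { rss[kind] += $2 }
        END {
            printf "heap %d\nanon %d\nother %d\n", rss["heap"], rss["anon"], rss["other"]
        }
    ' "$1"
}

# Example:
#   rss_breakdown "/proc/$(pidof appman)/smaps"
```

Counting the "anon" row together with "heap" gives a fairer picture of allocator-managed memory, since large allocations are typically served via mmap rather than brk.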
What was not measured is memory usage and fragmentation when the program runs for a longer time, which seems to be the focus of jemalloc.
A big thanks goes to Pelagicore for supplying the hardware and helping with appman installation.