Log from the debugging of a segfault
The following is a cleaned up log I wrote for myself while debugging a bug. Writing a log while working helps me keep track of the debugging effort in case I’m interrupted (life, sleep, work, etc.). It also requires me to explain all findings to myself in fully spelled out sentences, making my thoughts considerably easier to follow.
In addition to serving as an example of a personal debug log, I hope it can be useful as an introduction to debugging segfaults or other low-level bugs.
I have three computers running Ubuntu 11.04. Mopidy revision 9c23949 consistently crashes with a segfault on one of them when I do the following:
ncmpcpp
.foo
.Updating metadata
.I check that I’m actually using the latest versions of the important pieces of software, not just some old version I installed by hand and forgot about.
I try uninstalling pyspotify v1.1+mopidy20110405 installed from my own Debian package, and install the latest revision (c6e2a02) from Git instead. Still the same error.
Nobody has reported the same problem, even though both other developers and users should have been on the same mix of versions for the last couple of months.
Luckily I can consistently reproduce the segfault in less than 20 seconds of work. I’m not too familiar with debugging C programs and especially the infamous segfaults, but I’m taking on this fight.
I expect the problem to be in pyspotify, as the rest of the code is either pure Python or more well-tested and broadly used libraries. Also the segfault consistently happens directly after a log message from Mopidy’s Spotify backend.
Antoine Pierlot-Garcin, the new main contributor to pyspotify, says to rebuild pyspotify with CFLAGS="-g -O0"
. -g
will include debug information that gdb
will understand in the resulting binary. -O0
will override the -O2
default of the pyspotify build system, and turn off any optimizations to ease the debugging.
Googling “debugging segfaults” yields a nice howto on debugging segfaults.
In summary: Get a core dump, inspect it with gdb
.
Possible causes: “There are four common mistakes that lead to segmentation faults: dereferencing NULL, dereferencing an uninitialized pointer, dereferencing a pointer that has been freed (or deleted, in C++) or that has gone out of scope (in the case of arrays declared in functions), and writing off the end of an array.”
Rerunning Mopidy nothing more interesting happens, except the usual segfault. No core dump is produced.
Hum, how to get a core dump? I’ve done it before, but I can’t remember. Google helps.
I try ulimit -c 50000
to get a core dump of maximum 50MB, and then reproduce the segfault.
Run python mopidy -v
, connect with ncmpcpp
, search for foo
, wait less than 10 seconds. Kaboom:
DEBUG 2011-05-23 20:07:02,030 [7659:SpotifySMThread] mopidy.backends.spotify.session_manager
Metadata updated
Segmentation fault (core dumped)
Loads the core dump up in gdb
:
gdb /usr/bin/python core
gdb
complains that the core dump is truncated, and that is should be approximately 180MB.
man bash
and search for ulimit
. Aha. I can specify unlimited
. But I’m not allowed to? Restart shell, run ulimit -c unlimited
again. Works.
Rerun Mopidy to get an untruncated core dump.
Yay! gdb
loads debug symbols from all the linked libraries that got them available. Notably, the proprietary libspotify
library does not include debug symbols. So what does gdb
say?
Core was generated by `python mopidy -v'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007fef50dae5f0 in sp_album_add_ref () from /usr/lib/libspotify.so.7
(gdb)
According to the libspotify docs for the album subsystem sp_album_add_ref
takes an sp_album
struct and increases the reference count of the album. Looking back at the list of reasons for segfaults from the segfault debugging howto, it may sound as we increase the reference count of NULL
, which obviously isn’t good. Let’s see if that hypothesis is correct…
Looking in the howto for ways to proceed, backtrace
reveals the events happening just before the segfault:
(gdb) backtrace
#0 0x00007fef50dae5f0 in sp_album_add_ref () from /usr/lib/libspotify.so.7
#1 0x00007fef51048801 in Track_album (self=0x7fef4d155a50) at src/track.c:72
#2 0x00000000004970ef in call_function (f=<value optimized out>, throwflag=<value optimized out>) at ../Python/ceval.c:3997
#3 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at ../Python/ceval.c:2666
Moving one step up the stack, we get to the pyspotify code:
(gdb) up
#1 0x00007fef51048801 in Track_album (self=0x7fef4d155a50) at src/track.c:72
72 sp_album_add_ref(album);
Our guess was that we’re increasing the reference count on an album that is null. Let’s see what album
actually was…
(gdb) print album
$1 = (sp_album *) 0x0
A pointer to an sp_album
struct at the address 0x0
, also known as NULL
. Hypothesis confirmed.
Let’s take a look at the pyspotify code in question:
static PyObject *Track_album(Track *self) {
sp_album *album;
album = sp_track_album(self->_track);
Album *a = (Album *)PyObject_CallObject((PyObject *)&AlbumType, NULL);
sp_album_add_ref(album);
a->_album = album;
return (PyObject *)a;
}
We create a pointer to a sp_album
struct. Given a Spotify track, we request a reference to the related album, and assign the result to our pointer. Then we create an Album
Python object, before we increase the reference count on the sp_album
and give the reference to the Python object. Finally, the Python object is returned.
Lets try just returning NULL
if we get NULL
from sp_track_album
:
diff --git a/src/track.c b/src/track.c
index f0d5956..3284029 100644
--- a/src/track.c
+++ b/src/track.c
@@ -68,6 +68,9 @@ static PyObject *Track_album(Track *self) {
sp_album *album;
album = sp_track_album(self->_track);
+ if (album == NULL) {
+ return NULL;
+ }
Album *a = (Album *)PyObject_CallObject((PyObject *)&AlbumType, NULL);
sp_album_add_ref(album);
a->_album = album;
If we rebuild pyspotify and try to reproducing the segfault, we now get a familiar Python traceback instead:
/usr/lib/python2.7/threading.py:828: RuntimeWarning: tp_compare didn't return -1 or -2 for exception
return _active[_get_ident()]
ERROR 2011-05-23 22:33:05,214 [10571:SpotifySMThread] mopidy.utils.process
error return without exception set
Traceback (most recent call last):
File "/home/jodal/dev/mopidy/mopidy/utils/process.py", line 21, in run
self.run_inside_try()
File "/home/jodal/dev/mopidy/mopidy/backends/spotify/session_manager.py", line 40, in run_inside_try
self.connect()
File "/usr/local/lib/python2.7/dist-packages/pyspotify-1.1-py2.7-linux-x86_64.egg/spotify/manager.py", line 48, in connect
self.loop(sess) # returns on disconnect
File "/usr/local/lib/python2.7/dist-packages/pyspotify-1.1-py2.7-linux-x86_64.egg/spotify/manager.py", line 58, in loop
self.timer = threading.Timer(timeout/1000.0, self.awoken.set)
File "/usr/lib/python2.7/threading.py", line 731, in Timer
return _Timer(*args, **kwargs)
File "/usr/lib/python2.7/threading.py", line 742, in __init__
Thread.__init__(self)
File "/usr/lib/python2.7/threading.py", line 446, in __init__
self.__daemonic = self._set_daemon()
File "/usr/lib/python2.7/threading.py", line 470, in _set_daemon
return current_thread().daemon
File "/usr/lib/python2.7/threading.py", line 828, in currentThread
return _active[_get_ident()]
File "/home/jodal/dev/mopidy/mopidy/backends/spotify/session_manager.py", line 73, in metadata_updated
self.refresh_stored_playlists()
File "/home/jodal/dev/mopidy/mopidy/backends/spotify/session_manager.py", line 131, in refresh_stored_playlists
SpotifyTranslator.to_mopidy_playlist(spotify_playlist))
File "/home/jodal/dev/mopidy/mopidy/backends/spotify/translator.py", line 62, in to_mopidy_playlist
if str(Link.from_track(t, 0))],
File "/home/jodal/dev/mopidy/mopidy/backends/spotify/translator.py", line 34, in to_mopidy_track
if dt.MINYEAR <= int(spotify_track.album().year()) <= dt.MAXYEAR:
SystemError: error return without exception set
Note that we get a SystemError
and not e.g. AttributeError: 'NoneType' object has no attribute 'year'
which we would expect if album()
returned None
. This is because we return NULL
from Track_album
, which Python considers an “error return”, without setting an exception code first. If we want to return None
on failure instead of throwing an exception, we can change track.c
as follows:
diff --git a/src/track.c b/src/track.c
index f0d5956..82897de 100644
--- a/src/track.c
+++ b/src/track.c
@@ -68,6 +68,8 @@ static PyObject *Track_album(Track *self) {
sp_album *album;
album = sp_track_album(self->_track);
+ if (!album)
+ Py_RETURN_NONE;
Album *a = (Album *)PyObject_CallObject((PyObject *)&AlbumType, NULL);
sp_album_add_ref(album);
a->_album = album;
Rebuilding pyspotify again, and reproducing the error, we now get the expected Python error which we can handle nicely:
File "/home/jodal/dev/mopidy/mopidy/backends/spotify/session_manager.py", line 73, in metadata_updated
self.refresh_stored_playlists()
File "/home/jodal/dev/mopidy/mopidy/backends/spotify/session_manager.py", line 131, in refresh_stored_playlists
SpotifyTranslator.to_mopidy_playlist(spotify_playlist))
File "/home/jodal/dev/mopidy/mopidy/backends/spotify/translator.py", line 62, in to_mopidy_playlist
if str(Link.from_track(t, 0))],
File "/home/jodal/dev/mopidy/mopidy/backends/spotify/translator.py", line 34, in to_mopidy_track
if dt.MINYEAR <= int(spotify_track.album().year()) <= dt.MAXYEAR:
AttributeError: 'NoneType' object has no attribute 'year'
By this point, I consider the bug squashed and ready to be reported.
The only part left is to decide how to best fix it. E.g. if album()
should return None
, throw an exception, or maybe return an empty album object. That’s another story.