[arch-general] Troubleshooting random crash

Curtis Shimamoto sugar.and.scruffy at gmail.com
Wed Feb 6 22:45:41 EST 2013


Andre Goree <andre at drenet.info> wrote:

>On 02/06/13 20:14, Gaetan Bisson wrote:
>> [2013-02-06 19:06:45 -0500] Andre Goree:
>>> Not really too keen on downgrading a bunch of packages that might break
>>> dependencies and provide a REAL mess.  If I have to go through that long
>>> process, I'd rather just reinstall -- which at this point I'm planning
>>> to do anyways.
>> 
>> Well, there is little point in posting to this list if you have no
>> motivation to actually investigate the problem.
>> 
>> For starters, you've upgraded Linux from 3.6.11 to 3.7.4 in the window
>> when you report the issue appeared; from the symptoms you described,
>> it's a likely suspect. Downgrading it is far from being a "REAL mess":
>> you only need to downgrade/rebuild the external modules you really need
>> (probably none).
>
>Indeed there isn't, and surely even less point in replying to said post
>if in fact I had no motivation.  Given that I'm replying, I'd probably
>like to avoid reinstalling if at all possible.  I like the idea of
>downgrading just the kernel -- obviously I mean downgrading every
>package I've upgraded since 1/21 was not something I wanted to
>undertake.  I'll try this tomorrow.
>
>>> In fact it seems
>>> all system processes hang because no logs are produced after the issue
>>> rears its ugly head.
>> 
>> Ah. So that would mean your issue is I/O related, then?
>
>It would seem so, yes.  I hinted to this at the end of my last reply as
>well.
>
>> 
>>>> So you produce nothing at work?
>>>
>>> Not sure if you're just being an ass or not, however if you aren't:
>>> that has nothing at all to do with the issue and I merely wanted to
>>> establish _why_ I was using btrfs on a machine that I have running at my
>>> job -- which is _also_ inconsequential in the context of my email.  If
>>> you indeed were being an ass, congrats, you succeeded.
>> 
>> Once you were done being offended, you could have looked for the meaning
>> behind the words I used: that your "main work desktop" really qualifies
>> as "a production server".
>> 
>> But, of course, as you have so unequivocally declared, btrfs has
>> absolutely "nothing at all to do with the issue". And your statement
>> above implying that the problem is I/O related is just a coincidence.
>> 
>
>I think you mis-comprehended my reply.  Following the context, I merely
>meant that distinguishing my system from a production server and
>explaining why I was running btrfs on this system was inconsequential to
>the issue at hand.  Which is still true.  I never said nor meant it to
>be understood that I believed btrfs not to be the problem.  In fact, the
>opposite is true.
>
>So, for the sake of clarity, I never declared (and certainly not
>unequivocally) "btrfs has absolutely nothing at all to do with the
>issue", but rather, my distinctions and reasons for running btrfs have
>nothing to do with the issue.  Not sure how you got that mixed up,
>especially given the later part of my reply.
>
>
>> Reporting issues is worthless when speculation is substituted for hard
>> data. For example, a good report would have gone: "I believe this issue
>> is unrelated to btrfs being my root filesystem since, on another Arch
>> machine running ext3, I observe the following identical symptoms: first,
>> `ssh -vvv` hangs at exactly the same point; second..."
>
>
>I'll be sure to raise my reporting standards the next time I'd like help
>from an Arch list, my apologies.
>
>>>> How about looking at the system logs to see what your system was up to
>>>> just before a crash?
>>>
>>> I've done that, with no real hints.  That's the first thing any linux
>>> admin does when confronted with an issue such as this, no?
>> 
>> Sure. But your first post gave no indication that you did that.
>
>Indeed, I need to raise my reporting standards, I figured a lot of stuff
>was implied but I now know I must be much clearer.  Again, my apologies.
>
>
>>> Is there
>>> perhaps a way to build Thunderbird with debug symbols or some kind of
>>> logging?  I seem to recall opening Thunderbird each time this issue has
>>> shown up.
>> 
>> Well it would be nice to confirm that it is indeed at fault; downgrading
>> it is certainly not a "REAL mess" either. You can certainly also build
>> it with debug symbols: in the PKGBUILD (or makepkg.conf), set
>> CXXFLAGS='' LDFLAGS='' CFLAGS='-g' and remove the strip option.
>
>
>Given that thunderbird wasn't upgraded in the time that this issue
>began, not sure a downgrade would help but it may be worth a shot.
>Thanks for the pointers on building with debug symbols.
>
>
>>> I'm ready
>>> to blame btrfs b/c that's the only issue I see with my setup -- I also
>>> have a tough time running a virtual machine on this box which I believe
>>> is also due to btrfs.
>> 
>> Didn't you write just a few lines ago that btrfs "has nothing at all to
>> do with the issue"?
>
>I most certainly did not, there's an obvious misunderstanding here.
>
>> 
>> Wild guess: your thunderbird mail database is huge (just like the disk
>> image of your virtual machine - although I cannot really know what you
>> mean by "tough time") and your btrfs has problems dealing with such big
>> files (for instance, because your filesystem is nearly full). To confirm,
>> start thunderbird with an empty profile (such as by renaming ~/.mozilla
>> into ~/.mozilla.old) and see what happens.
>> 
>
>The thing is, this doesn't happen every time I start thunderbird --
>rather, seemingly, after the system has been up for a long period (>20
>hrs or so).  The filesystem is not nearly full either, though it does
>contain a lot of data.  I'm thinking downgrading to 3.6.x will help a
>bit.  I'm going to look for btrfs bugs in 3.7.x and see if anyone else
>has been having a similar issue as well.  Thanks for the assistance.

For downgrading, I have found the Arch Rollback Machine quite handy. You can choose a date to roll back to, sync, and then have pacman reinstall every package that is newer than the versions in the synced database. This of course assumes there has not been a significant change in between, such as, potentially, the recent filesystem update.
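A minimal sketch of that rollback. It assumes a mirrorlist `Server` line in the URL layout of the Arch Linux Archive (`archive.archlinux.org/repos/YYYY/MM/DD/...`); the Rollback Machine's own hostname and path layout may differ, so check them before use. The helper name `arm_server_line` and the example date are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a pacman mirrorlist line pointing at a
# dated snapshot. The archive.archlinux.org URL layout is an assumption;
# substitute the host and path of the rollback service you actually use.
arm_server_line() {
    # $repo and $arch are deliberately left literal: pacman expands them.
    printf 'Server = https://archive.archlinux.org/repos/%s/$repo/os/$arch\n' "$1"
}

# Usage (as root); the date is only an example, pick one before the
# suspect upgrade:
#   cp /etc/pacman.d/mirrorlist /etc/pacman.d/mirrorlist.orig
#   arm_server_line 2013/01/20 > /etc/pacman.d/mirrorlist
#   pacman -Syyuu   # -yy forces a database refresh, -uu allows downgrades
#   mv /etc/pacman.d/mirrorlist.orig /etc/pacman.d/mirrorlist   # when done
```

Keeping a copy of the original mirrorlist makes it easy to return to the live repositories once the culprit has been found.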

I would certainly try doing just the kernel first though, as that is even easier. Just thought I would mention that amazing tool we have in our debugging arsenal.
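Downgrading only the kernel can usually be done straight from pacman's package cache, provided the old package is still there. A sketch, assuming the default cache directory `/var/cache/pacman/pkg` and the usual `linux-<version>-<pkgrel>-<arch>.pkg.tar.*` file naming; `cached_kernel` is a hypothetical helper, not a pacman command.

```shell
#!/bin/sh
# Hypothetical helper: print cached "linux" packages for a given kernel
# version, skipping detached .sig files. Uses pacman's default cache
# directory unless a second argument overrides it.
cached_kernel() {
    cache=${2:-/var/cache/pacman/pkg}
    for f in "$cache"/linux-"$1"-*.pkg.tar.*; do
        case $f in
            *.sig) ;;                                 # signature, not a package
            *) [ -e "$f" ] && printf '%s\n' "$f" ;;   # skip an unmatched glob
        esac
    done
}

# Usage (as root), assuming exactly one matching package in the cache:
#   pacman -U "$(cached_kernel 3.6.11)"
```

Afterwards, `IgnorePkg = linux` in `/etc/pacman.conf` keeps the next `pacman -Syu` from pulling 3.7.x back in. As Gaetan pointed out, any external kernel modules may also need to be downgraded or rebuilt to match.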

Regards,
-- 
Curtis Shimamoto
sugar.and.scruffy at gmail.com

