[arch-dev-public] doc size

Xavier shiningxc at gmail.com
Wed Dec 16 17:07:53 EST 2009


On Wed, Dec 16, 2009 at 10:46 PM, Dan McGee <dpmcgee at gmail.com> wrote:
> On Wed, Dec 16, 2009 at 1:18 PM, Xavier <shiningxc at gmail.com> wrote:
>> On Wed, Dec 16, 2009 at 8:05 PM, Allan McRae <allan at archlinux.org> wrote:
>>> Xavier wrote:
>>>>
>>>> Hello,
>>>>
>>>> I like this gnome feature that warns me about running out of space and
>>>> offers to run a disk usage analyzer (baobab).
>>>> Looking at the results, I found out that /usr/share/doc was taking up a
>>>> non-negligible amount of space: 140M
>>>> It's actually the 3rd biggest directory in my filesystem after two
>>>> games (openarena and flightgear). Well maybe 4th if I also count
>>>> warsow in /opt :)
>>>>
>>>> It's especially gtkmm that made me go wtf. It takes 48MB. I checked
>>>> the package size, 60MB.
>>>> The next entries are far away, at 10MB, but still. Compared to the
>>>> package size, they can still be pretty big (more than half).
>>>> That made me want to check the proportion of docs in each package, and
>>>> I quickly hacked a script together.
>>>> I am pretty sure this subject has come here before, but I don't
>>>> remember seeing any results, so I will post them here.
>>>> That might help to establish some reasonable limits for when a package
>>>> should be split. And with makepkg supporting that now, it's much
>>>> better than before.
>>>
>>>
>>> Didn't Dan post a patch for namcap to check the relative proportion of docs
>>> at some stage?
>>>
>>
>> Indeed, that's awesome. Seems I missed or forgot it.
>> It's actually the only patch that came up after namcap 2.4 so there
>> hasn't been a new release yet.
>> http://projects.archlinux.org/namcap.git/
>>
>> I will try it to compare with my results.
>> I don't think that makes my results worthless though; I see both tools
>> as complementary.
>> Mine makes it quick to see which packages in your cache are the worst
>> (either by ratio or by doc size), so that they can be dealt with
>> first.
>
> Yeah, I can't remember which package it was that I noticed this on,
> but it might have been gtkmm and it made me go "WTF" as well, thus the
> reason for the patch.
>

It was indeed gtkmm; you posted it on the list too :)

So I played a bit with namcap and tweaked it to include more
information, so that I could reconstruct a list similar to my first
one. See the attached patch.

The results are slightly different, mostly because makepkg
overestimates the uncompressed size: it runs du without
--apparent-size, so it counts allocated disk blocks, while namcap's
lotsofdocs rule computes the real size.
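
To illustrate why disk usage runs higher than the sum of the file
sizes, here is a minimal sketch. It assumes a filesystem that allocates
whole 4096-byte blocks per file; the block size and the function names
are only for illustration, and real filesystems may behave differently
(tail packing, sparse files, and so on).

```python
def apparent_size(sizes):
    """Sum of the file sizes in bytes (the 'real' size)."""
    return sum(sizes)


def disk_usage(sizes, block_size=4096):
    """Estimate allocated space, assuming each file occupies whole blocks.

    block_size=4096 is an assumption for this sketch.
    """
    # Ceiling division: round every file up to a multiple of block_size.
    return sum(-(-size // block_size) * block_size for size in sizes)


if __name__ == "__main__":
    # Made-up file sizes, typical of a doc directory with many small files.
    sizes = [100, 5000, 4096]
    print(apparent_size(sizes), disk_usage(sizes))
```

The gap grows with the number of small files, which is why doc-heavy
packages are where the two numbers would be expected to differ most.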

Now that I think about it, maybe the trick/hack I used in my first
script would actually be a portable way to get the real uncompressed
size:
bsdtar tvf foo.pkg.tar.gz 2>/dev/null | awk '{ SUM += $5 } END { print SUM }'
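
The same idea can be sketched with Python's standard tarfile module,
reading the sizes from the tar headers without extracting anything.
This is only an illustration, not what makepkg or namcap does: the
function names are made up, and unlike the awk one-liner it counts
regular files only.

```python
import io
import tarfile


def make_demo_package(files):
    """Build a small gzip-compressed tar archive in memory (demo helper only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


def uncompressed_size(fileobj):
    """Sum the sizes of regular files recorded in the tar headers, in bytes."""
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        return sum(member.size for member in tar if member.isfile())


if __name__ == "__main__":
    # Made-up contents, not a real package.
    pkg = make_demo_package({
        "usr/bin/foo": b"x" * 1000,
        "usr/share/doc/foo/README": b"y" * 3000,
    })
    print(uncompressed_size(pkg))
```

For a package on disk, something like
uncompressed_size(open("foo.pkg.tar.gz", "rb")) should work the same way.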

But this is off-topic. I should dig up the makepkg bug report we had
about this, which we probably rejected :P

$ ./namcap.py -r lotsofdocs /home/pkg/* > result
$ sort -k 3 -r result | cut -d';' -f1 | column  -t -s'W'
libsigc++2.0     : Package was 86% docs by size (9146/10574K)
mcpp             : Package was 83% docs by size (761/915K)
pangomm          : Package was 81% docs by size (2689/3295K)
gtkmm            : Package was 80% docs by size (45/56M)
eggdbus          : Package was 78% docs by size (2634/3366K)
glibmm           : Package was 78% docs by size (10/13M)
libogg           : Package was 77% docs by size (232/299K)
libsoup          : Package was 75% docs by size (1467/1948K)
randrproto       : Package was 74% docs by size (77/103K)
libsoup          : Package was 74% docs by size (1448/1939K)
libgdata         : Package was 73% docs by size (2126/2898K)
flac             : Package was 69% docs by size (4534/6485K)
fontconfig       : Package was 69% docs by size (1917/2754K)
clutter-gtk      : Package was 69% docs by size (164/237K)
policykit        : Package was 66% docs by size (1048/1583K)
telepathy-glib   : Package was 64% docs by size (7/11M)
raptor           : Package was 64% docs by size (1144/1771K)
pygtk            : Package was 64% docs by size (10/15M)
libdatrie        : Package was 63% docs by size (93/147K)
redland          : Package was 63% docs by size (1040/1639K)
libepc           : Package was 62% docs by size (494/786K)
renderproto      : Package was 62% docs by size (36/58K)
libunique        : Package was 62% docs by size (139/221K)
cairo            : Package was 62% docs by size (1177/1889K)
groff            : Package was 60% docs by size (10/17M)
rasqal           : Package was 59% docs by size (567/948K)
libnotify        : Package was 57% docs by size (87/153K)
libgtop          : Package was 57% docs by size (607/1051K)
clutter          : Package was 57% docs by size (3/6M)
libbeagle        : Package was 56% docs by size (430/760K)
poppler-glib     : Package was 56% docs by size (394/702K)
pango            : Package was 56% docs by size (2092/3679K)
libxslt          : Package was 56% docs by size (1660/2955K)
libtheora        : Package was 56% docs by size (1020/1819K)
compositeproto   : Package was 55% docs by size (13/23K)
pcre             : Package was 55% docs by size (1060/1903K)
polkit           : Package was 54% docs by size (1093/2010K)
libxml2          : Package was 51% docs by size (5/10M)
libxklavier      : Package was 51% docs by size (185/359K)
damageproto      : Package was 50% docs by size (7/14K)
libgsf           : Package was 50% docs by size (658/1291K)
at-spi           : Package was 50% docs by size (2/4M)
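
For reference, the percentages above can be reproduced with a
calculation along these lines. This is a sketch under assumptions, not
the code from namcap or the attached patch: the usr/share/doc/ prefix
and the function name are mine, and the percentage is truncated rather
than rounded because that appears to match the listing (2689/3295 shows
as 81%).

```python
def doc_share(entries, doc_prefix="usr/share/doc/"):
    """Return (doc_size, total_size, percent) for (path, size) pairs.

    doc_prefix is an assumption; paths are relative, as stored in a package.
    """
    total = 0
    docs = 0
    for path, size in entries:
        total += size
        if path.startswith(doc_prefix):
            docs += size
    # Integer division truncates the percentage; guard against empty input.
    percent = docs * 100 // total if total else 0
    return docs, total, percent


if __name__ == "__main__":
    # Made-up entries, not taken from a real package.
    demo = [
        ("usr/lib/libfoo.so", 2000),
        ("usr/share/doc/foo/index.html", 7000),
        ("usr/share/doc/foo/api.html", 1000),
    ]
    docs, total, percent = doc_share(demo)
    print(f"{percent}% docs by size ({docs}/{total} bytes)")
```

The (path, size) pairs could come from the tar headers of a package, so
nothing needs to be extracted.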
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-lotsofdocs-minor-improvements.patch
Type: text/x-diff
Size: 2356 bytes
Desc: not available
URL: <http://mailman.archlinux.org/pipermail/arch-dev-public/attachments/20091216/7a261c5b/attachment-0001.bin>

