Why is GNU/Linux so Bloated?
Oleg Goldshmidt
pub at goldshmidt.org
Thu Jun 11 22:22:13 IDT 2009
Shlomi Fish <shlomif at iglu.org.il> writes:
> Hi all!
>
> Based on the gcc-4.4.0 (with -Os) / x86-Linux shared library sizes
> here:
>
> http://tech.groups.yahoo.com/group/fc-solve-discuss/message/998
>
> And the Visual C++/Win32 (also x86) .dll sizes here:
>
> http://tech.groups.yahoo.com/group/fc-solve-discuss/message/999
>
> My question is: why are the Visual C++ generated binaries so much
> smaller than the equivalent Linux ones? Any insights would be
> appreciated.
Shlomi,
The short answer is, I don't know. I didn't even try to figure out
where the apples and the oranges were in your fc-solve-discuss
postings. Since you don't list files and sizes (at least not in any
way I can decipher, being unfamiliar with the project) or specify how
you compile and link (apart from -Os), I don't know if you compare
apples to apples.
I'll assume you are comparing dynamically linked executables on
Linux/gcc and on Windows/cl, and the corresponding .so and .dll
libraries.
I'll wave my hands wildly and offer a couple of guesses that you can
try to investigate. They may be completely off the mark.
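Before that, one mundane check, on the assumption that you compared
raw file sizes: gcc does not strip its output, so an ELF shared
library carries symbol tables (and possibly debug info) that a
release-mode MSVC DLL does not. Comparing stripped sizes, or just the
loadable sections, is fairer. Untested, with libfoo.so standing in
for whatever you actually built:

    $ size libfoo.so                    # text/data/bss only, no symbol tables
    $ strip --strip-unneeded libfoo.so  # drop symbols the dynamic linker ignores
    $ ls -l libfoo.so                   # file size after stripping

If the gap shrinks dramatically, mystery solved.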
1) You probably know that DLLs work differently from Linux shared
   libraries. A DLL contains relocatable code built for a preferred
   base address, to which the loader will try to map the file. If a
   process links against several libraries whose preferred ranges
   collide, all but one must be relocated to other free addresses,
   COW-ed while the addresses are fixed up, independently paged,
   etc. This also means that DLLs are dynamically loaded but not
   necessarily shared (a relocated copy can only be shared between
   processes with the same memory layout). Linux shared libraries
   contain position-independent code (PIC) that uses only addresses
   relative to the program counter, so the same pages really are
   shared by every process that maps them.
   PIC implies address translation tables (the global offset table,
   or GOT, and the procedure linkage table, or PLT) that are filled
   in at load time but allocated at link time, so they take up space
   in the file itself. This may be one source of size overhead. I
   have no idea how significant it is; you need to consult the
   experts.
   There are (or at least used to be) -fPIC and -fpic options to
   GCC. IIRC, -fpic imposes a machine-specific limit on the size of
   the GOT, and the link fails if the table grows too large; -fPIC
   imposes no such limit. The limits were quite small on the
   architectures where they exist at all; on x86, I believe the two
   options are equivalent.
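   If you want to see how much of the file is PIC bookkeeping, you
   can measure it instead of trusting my hand-waving. Untested, with
   libfoo standing in for your actual library:

       $ gcc -Os -fPIC -shared -o libfoo.so foo.c
       $ readelf -S libfoo.so | grep -E '\.got|\.plt|\.rel'  # GOT/PLT/reloc sizes (hex)
       $ gcc -Os -c foo.c -o foo-nopic.o                     # same code, non-PIC
       $ gcc -Os -fPIC -c foo.c -o foo-pic.o
       $ size foo-nopic.o foo-pic.o                          # code-size cost of PIC

   Summing the .got, .plt, and .rel*/.rela* sections gives a rough
   bound on the table overhead, and the size(1) comparison shows what
   -fPIC costs in code size alone.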
2) I suppose that the structure of the code is important. E.g., does
   your optimization include inlining? Inlining replicates code, so
   it can inflate the binary. I am not sure whether -Os suppresses
   inlining entirely.
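   A cheap experiment for this one (assuming your build lets you add
   a flag; again untested): rebuild with inlining suppressed and see
   if anything moves.

       $ gcc -Os -c foo.c -o foo-os.o                    # your current flags
       $ gcc -Os -fno-inline -c foo.c -o foo-noinline.o  # inlining suppressed
       $ size foo-os.o foo-noinline.o

   If the two are close, inlining is not where the bytes go.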
3) Do you use exceptions a lot? IIRC, GCC generates stack unwinding
   information for each function that may throw an exception (unless
   something has changed; you are using a recent version), and this
   information is stored in the executable. I don't know if the MS
   compiler does the same thing.
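   Either way, the GCC-side unwinding data is easy to measure: it
   lives in the .eh_frame section (plus .gcc_except_table for C++).
   If your code is plain C and nothing ever unwinds through it, you
   can also try dropping the tables; libfoo.so is a placeholder
   again:

       $ readelf -S libfoo.so | grep -E 'eh_frame|gcc_except'  # unwind table sizes
       $ gcc -Os -fPIC -shared -fno-asynchronous-unwind-tables \
             -o libfoo.so foo.c                                # omit the tables

   Don't do the latter for C++: exceptions need the tables to work.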
--
Oleg Goldshmidt | pub at goldshmidt.org