disabling loop unrolling in GCC

Oleg Goldshmidt pub at goldshmidt.org
Mon Dec 21 18:06:16 IST 2009


2009/12/21 Shachar Shemesh <shachar at shemesh.biz>:
> Hi all,
>
> I'm trying, without success, to disable loop unrolling when compiling a
> program with -O3 with gcc (4.4, but I see the same problem with 4.3).

I am actually very surprised that -O3 unrolls loops. It is not
supposed to. The idea of including -funroll-loops in -O3 was raised
quite a few times and was always rejected. Maybe something has changed
in recent years. The documentation certainly does not say that loop
unrolling is enabled by either -O2 or -O3.

I suspect something is the matter with -ftree-loop-optimize. The gcc
documentation says,

`-ftree-loop-optimize'
     Perform loop optimizations on trees.  This flag is enabled by
     default at `-O' and higher.

However, the behaviour depends on which optimization level you use.
E.g., -O2 won't unroll even with -ftree-loop-optimize given explicitly:

$ gcc -c -O2 -ftree-loop-optimize loop.c
$ objdump -S loop.o

loop.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <func>:
   0:	31 c0                	xor    %eax,%eax
   2:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
   8:	83 c0 01             	add    $0x1,%eax
   b:	c7 05 00 00 00 00 00 	movl   $0x0,0x0(%rip)        # 15 <func+0x15>
  12:	00 00 00
  15:	83 f8 08             	cmp    $0x8,%eax
  18:	75 ee                	jne    8 <func+0x8>
  1a:	f3 c3                	repz retq
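
The loop.c source is not included in this message. As a rough guide to what is being compiled, here is a minimal sketch that is consistent with the shape of the listing above (a counter running to 8 and a store of 0 to a global on each pass). The variable name `sink` and the use of `volatile` are assumptions; only the symbol name `func` comes from the disassembly.

```c
/* Hypothetical reconstruction: the real loop.c is not shown in this thread.
   `sink` is an assumed name. It is volatile so that the compiler must keep
   all eight stores instead of collapsing them into one. */
volatile int sink;

void func(void)
{
    int i;

    /* Eight iterations, matching the "cmp $0x8,%eax" in the listing. */
    for (i = 0; i < 8; i++)
        sink = 0;
}
```

Whether this exact file reproduces the listings byte for byte depends on the compiler version and target, so treat it as an illustration only.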


With -O3, on the other hand, adding -fno-tree-loop-optimize keeps the loop rolled:

$ gcc -c -O3 -fno-tree-loop-optimize loop.c
$ objdump -S loop.o

loop.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <func>:
   0:	31 c0                	xor    %eax,%eax
   2:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
   8:	83 c0 01             	add    $0x1,%eax
   b:	c7 05 00 00 00 00 00 	movl   $0x0,0x0(%rip)        # 15 <func+0x15>
  12:	00 00 00
  15:	83 f8 07             	cmp    $0x7,%eax
  18:	7e ee                	jle    8 <func+0x8>
  1a:	f3 c3                	repz retq
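
If turning the option off for the whole file is too coarse, GCC 4.4 and later also accept an `optimize` attribute on individual functions, which should apply the same -f option to one function only. The sketch below assumes that route. The names `sink`, `func_no_loop_opt` and `NO_TREE_LOOP_OPT` are hypothetical, and I have not checked that the attribute gives the same code as the command-line flag above. The GCC manual also describes this attribute as a debugging aid rather than something for production builds.

```c
/* Sketch only: `sink`, `func_no_loop_opt` and NO_TREE_LOOP_OPT are
   hypothetical names. The optimize attribute is GCC-specific, so it is
   applied only when compiling with GCC proper. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_TREE_LOOP_OPT __attribute__((optimize("no-tree-loop-optimize")))
#else
#define NO_TREE_LOOP_OPT /* attribute not available; expands to nothing */
#endif

volatile int sink;

/* Requests -fno-tree-loop-optimize for this function alone. */
NO_TREE_LOOP_OPT
void func_no_loop_opt(void)
{
    int i;

    for (i = 0; i < 8; i++)
        sink = 0;
}
```

Compile it with plain -O3 and compare the objdump output against the listing above to see whether the loop stays rolled on your compiler.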

Or, if you are primarily interested in code size as you indicate, why not -Os?

$ gcc -c -Os loop.c
$ objdump -S loop.o

loop.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <func>:
   0:	31 c0                	xor    %eax,%eax
   2:	ff c0                	inc    %eax
   4:	c7 05 00 00 00 00 00 	movl   $0x0,0x0(%rip)        # e <func+0xe>
   b:	00 00 00
   e:	83 f8 08             	cmp    $0x8,%eax
  11:	75 ef                	jne    2 <func+0x2>
  13:	c3                   	retq

Hope it helps,

-- 
Oleg Goldshmidt | pub at goldshmidt.org


