Wednesday, November 25, 2009

Stack Canaries - Part 2 (64-bit, bypass techniques, etc.)

So, back for more pain, I decided to try and model the canary generation (without looking at either the kernel code which assigns the initial value to the registers, nor the libc code which might do some population). I want to treat this feature as a blackbox, at least initially.

For a little more in-depth view at what happens when we turn on stack protection (-fstack-protector-all), let's look at the values of the gs:0x14 memory location. We'll see if the assumption that it changes with each function call is correct (and if so, then we have a real conundrum on our hands involving memory in general). In order to measure this, the following C code can be slipped into a recursive function which just prints the value of gs:0x14.

For 64-bit code:
unsigned long int a = 0;
if(sizeof(unsigned int) != sizeof(unsigned long))
asm ("movq %%fs:0x28, %0\n":"=r"(a));

And for 32-bit code:
unsigned int a = 0;
if(sizeof(unsigned int) == sizeof(unsigned long))
asm ("movl %%gs:0x14, %0\n":"=r"(a));

These little snippets populate 'a' with the stack canary value. Note: 64-bit uses fs:0x28 instead of gs:0x14.

Running this reveals the following neat information:
[02:20:12][aconole@ssh:~]
$ ./sp32 4
gs:0x14[3829412976 / E4403470]
gs:0x14[3829412976 / E4403470]
gs:0x14[3829412976 / E4403470]
gs:0x14[3829412976 / E4403470]
gs:0x14[3829412976 / E4403470]

This tells us that the canary is global to the context in which the functions are running. Is this the same across threads? A simple change to the rig I used to print the values for 3 different threads reveals that yes - this canary is global to the process:
[02:25:47][aconole@ssh:~]
$ ./sp32_t 1
gs:0x14[1475995573 / 57F9E7B5]
gs:0x14[1475995573 / 57F9E7B5]
gs:0x14[1475995573 / 57F9E7B5]
gs:0x14[1475995573 / 57F9E7B5]
gs:0x14[1475995573 / 57F9E7B5]
gs:0x14[1475995573 / 57F9E7B5]

The same can be seen for 64-bit code:
[02:27:03][aconole@ssh:~]
$ ./sp64 4
fs:0x28[15354006399877715714 / d5146378bfc91702]
fs:0x28[15354006399877715714 / d5146378bfc91702]
fs:0x28[15354006399877715714 / d5146378bfc91702]
fs:0x28[15354006399877715714 / d5146378bfc91702]
fs:0x28[15354006399877715714 / d5146378bfc91702]

And threaded:
[02:27:31][aconole@ssh:~]
$ ./sp64_t 1
fs:0x28[15000647472177854910 / d02d01662c05b9be]
fs:0x28[15000647472177854910 / d02d01662c05b9be]
fs:0x28[15000647472177854910 / d02d01662c05b9be]
fs:0x28[15000647472177854910 / d02d01662c05b9be]
fs:0x28[15000647472177854910 / d02d01662c05b9be]
fs:0x28[15000647472177854910 / d02d01662c05b9be]

So, now we know some important things about the canary for 64-bit and 32-bit.
#1 - if we can obtain the value while the system is running, then there's no worries on modifying the stack.
#2 - We know from where this value is obtained.
#3 - We know how to retrieve this value in a fancy rig.

The next question, before we jump headfirst into probabilities and statistics for the segment register offset value is: Can we generically modify this global canary?

Here are two generic routines to do so:
void reassign_sc_32()
{
unsigned int a = 0;
if(sizeof(unsigned int) == sizeof(unsigned long))
asm ("movl %0, %%gs:0x14\n"::"r"(a));
}

void reassign_sc_64()
{
unsigned long int a = 0;
if(sizeof(unsigned int) != sizeof(unsigned long))
asm ("movq %0, %%fs:0x28\n"::"r"(a));
}

We can simply call: reassign_sc_64(); reassign_sc_32(); and the code should modify the canary value to 0 on the correct platform. A test reveals:

[02:32:30][aconole@ssh:~]
$ ./sp32_c_t 1
gs:0x14[0 / 0]
gs:0x14[0 / 0]
gs:0x14[0 / 0]
gs:0x14[0 / 0]
gs:0x14[0 / 0]
gs:0x14[0 / 0]

64-bit also yields the same results:

[02:34:05][aconole@ssh:~]
$ ./sp64_c_t 1
fs:0x28[0 / 0]
fs:0x28[0 / 0]
fs:0x28[0 / 0]
fs:0x28[0 / 0]
fs:0x28[0 / 0]
fs:0x28[0 / 0]

So, we can modify the canary value - OUTSTANDING! Lets turn on the stack protector and see it in action:

[02:35:21][aconole@ssh:~]
$ ./sp32_c_t 1
*** stack smashing detected ***: ./sp32_c_t terminated
Aborted

Hrrm. Not quite what I had expected. Looks like we're going to have to delve into the internals of the stack protector after all, which is something I was hoping to avoid.

The code for stack-protector.c below:
#include
#include

int stack_prot_64(int num_left)
{
unsigned long int a = 0;
if(sizeof(unsigned int) != sizeof(unsigned long))
asm ("movq %%fs:0x28, %0\n":"=r"(a));

if(!num_left)
{
return printf("fs:0x28[%lu / %lx]\n", a, a);
}
stack_prot_64(num_left-1);
return printf("fs:0x28[%lu / %lx]\n", a, a);
}

int stack_prot_32(int num_left)
{
unsigned int a = 0;
asm ("movl %%gs:0x14, %0\n":"=r"(a));

if(!num_left)
{
return printf("gs:0x14[%lu / %X]\n", a, a);
}
stack_prot_32(num_left-1);
return printf("gs:0x14[%lu / %X]\n", a, a);
}

void reassign_sc_32()
{
unsigned int a = 0;
asm ("movl %0, %%gs:0x14\n"::"r"(a));
}

void reassign_sc_64()
{
unsigned long int a = 0;
if(sizeof(unsigned int) != sizeof(unsigned long))
asm ("movq %0, %%fs:0x28\n"::"r"(a));
}

void *run32(void *a)
{
#ifdef GAPING_SECURITY_HOLE
reassign_sc_32();
#endif

stack_prot_32(atoi(a));
return NULL;
}

void * run64(void *a)
{
#ifdef GAPING_SECURITY_HOLE
reassign_sc_64();
#endif

stack_prot_64(atoi(a));
return NULL;
}

int main(int argc, char *argv[])
{
pthread_t t1,t2,t3;
if(argc <= 1)
return printf("error: give a number of functions.\n");

if(sizeof(unsigned int) == sizeof(unsigned long))
{
pthread_create(&t1, NULL, run32, argv[1]);
pthread_create(&t2, NULL, run32, argv[1]);
pthread_create(&t3, NULL, run32, argv[1]);
} else
{
pthread_create(&t1, NULL, run64, argv[1]);
pthread_create(&t2, NULL, run64, argv[1]);
pthread_create(&t3, NULL, run64, argv[1]);
}

pthread_join(t1, NULL);
pthread_join(t2, NULL);
pthread_join(t3, NULL);

return 0;
}


-Aaron

Tuesday, November 17, 2009

Stack Canaries - Part 1

So, for those of you who haven't paid much attention, starting...oh...3-4 years ago the GNU compiler folded in a stack guarding implementation to protect against the classic buffer overflow attack. What that means is that when you make a function call in C, the system actually tries to protect your stack by pushing a canary value (determined, I think, by the gs:14 value) during the preamble, and then comparing this just before returning. An example is below:

0x080484e4 : push %ebp
0x080484e5 : mov %esp,%ebp
0x080484e7 : sub $0x38,%esp
0x080484ea : mov 0x8(%ebp),%eax
0x080484ed : mov %eax,-0x24(%ebp)
0x080484f0 : mov %gs:0x14,%eax
0x080484f6 : mov %eax,-0x4(%ebp)
0x080484f9 : xor %eax,%eax
0x080484fb : movl $0x0,-0x14(%ebp)
0x08048502 : movl $0x0,-0x10(%ebp)
0x08048509 : movl $0x0,-0xc(%ebp)
0x08048510 : movl $0x0,-0x8(%ebp)

The above does the standard save the return address, and make space on the stack. Then it zeros out the local stack argument (vbuf[16] = {0}. See addrs 0x080484fb -> 0x08048510). So, what we see is the "bottom" of the stack contains the canary, followed by the argument(s). If, say, we were to try and copy more than 16 bytes into vbuf, it would overwrite the canary value starting at ebp-4. The check comes at the very end of the function:

0x08048543 : mov -0x4(%ebp),%eax
0x08048546 : xor %gs:0x14,%eax
0x0804854d : je 0x8048554
0x0804854f : call 0x8048418 <__stack_chk_fail@plt>
0x08048554 : leave
0x08048555 : ret

So, the function cleanup routine first puts the canary value into register eax, then does exclusive or of gs:14 with eax. If the values match, the exclusive or would give a 0 value, and the je branch would be taken allowing execution to resume. However, if we fail, execution would be transferred to __stack_chk_fail@plt function. This function looks as follows:

0x08048418 <__stack_chk_fail@plt+0>: jmp *0x804a014
0x0804841e <__stack_chk_fail@plt+6>: push $0x28
0x08048423 <__stack_chk_fail@plt+11>: jmp 0x80483b8 <_init+48>

The very first thing that happens is we jmp into the global offest table, and start our execution, which should print out some memory map information, as well as a debug message that stack corruption was detected, etc. The C code for this is as follows:

#include "string.h"

void foo(char *buf)
{
char vbuf[16] = {0};

memcpy(vbuf, buf, strlen(buf));
printf(vbuf); /* might as well be really vulnerable :) */
}

int main(int argc, char *argv[])
{
printf("%s: %s\n", argv[0], argv[1]);

foo(argv[1]);

printf("\n%s: %s\n", argv[0], argv[1]);

return 0;
}

And execution with some different length arguments yields the following results:

[12:07:11][aconole@ssh:~]
$ gcc -fstack-protector -m32 -o bodud bodud.c
bodud.c: In function âfooâ:
bodud.c:7: warning: incompatible implicit declaration of built-in function âmemcpyâ
bodud.c:7: warning: incompatible implicit declaration of built-in function âstrlenâ
[12:07:23][aconole@ssh:~]
$ ./bodud asdf
./bodud: asdf
asdf
./bodud: asdf
[12:07:25][aconole@ssh:~]
$ ./bodud asdfasdf
./bodud: asdfasdf
asdfasdf
./bodud: asdfasdf
[12:07:28][aconole@ssh:~]
$ ./bodud asdfasdfasdf
./bodud: asdfasdfasdf
asdfasdfasdf
./bodud: asdfasdfasdf
[12:07:29][aconole@ssh:~]
$ ./bodud asdfasdfasdfasdf
./bodud: asdfasdfasdfasdf
asdfasdfasdfasdföuóc
ÿ ¡e
ÿe
ÿ¡e
ÿù 0c
ÿôÿv÷c
؍bֈ 0
c
ÿçb÷
./bodud: asdfasdfasdfasdf
[12:07:32][aconole@ssh:~]
$ ./bodud asdfasdfasdfasdfasdf
./bodud: asdfasdfasdfasdfasdf
*** stack smashing detected ***: ./bodud terminated
======= Backtrace: =========
/lib/libc.so.6(__fortify_fail+0x48)[0xf7729da8]
/lib/libc.so.6(__fortify_fail+0x0)[0xf7729d60]
./bodud[0x8048554]
./bodud[0x804859b]
/lib/libc.so.6(__libc_start_main+0xe5)[0xf7659705]
./bodud[0x8048451]
======= Memory map: ========
08048000-08049000 r-xp 00000000 08:11 7602791 /home/aconole/bodud
08049000-0804a000 r--p 00000000 08:11 7602791 /home/aconole/bodud
0804a000-0804b000 rw-p 00001000 08:11 7602791 /home/aconole/bodud
0804b000-0806c000 rw-p 0804b000 00:00 0 [heap]
f7642000-f7643000 rw-p f7642000 00:00 0
f7643000-f7798000 r-xp 00000000 08:11 6381831 /lib/libc-2.9.so
f7798000-f7799000 ---p 00155000 08:11 6381831 /lib/libc-2.9.so
f7799000-f779b000 r--p 00155000 08:11 6381831 /lib/libc-2.9.so
f779b000-f779c000 rw-p 00157000 08:11 6381831 /lib/libc-2.9.so
f779c000-f779f000 rw-p f779c000 00:00 0
f77bf000-f77cc000 r-xp 00000000 08:11 6381883 /lib/libgcc_s.so.1
f77cc000-f77cd000 r--p 0000c000 08:11 6381883 /lib/libgcc_s.so.1
f77cd000-f77ce000 rw-p 0000d000 08:11 6381883 /lib/libgcc_s.so.1
f77ce000-f77d0000 rw-p f77ce000 00:00 0
f77d0000-f77ee000 r-xp 00000000 08:11 6381973 /lib/ld-2.9.so
f77ee000-f77ef000 r--p 0001d000 08:11 6381973 /lib/ld-2.9.so
f77ef000-f77f0000 rw-p 0001e000 08:11 6381973 /lib/ld-2.9.so
ffb6f000-ffb84000 rw-p 7ffffffea000 00:00 0 [stack]
ffffe000-fffff000 r-xp ffffe000 00:00 0 [vdso]
asdfasdfasdfasdfasdfÈ'¸ÿ
[12:07:36][aconole@ssh:~]
$

So, we see that we have triggered the buffer overflow condition and can correctly cause an exception in the stack guard protector. The big question is, can we reassign the stack protector entrypoint to yield a different result. Just from a theoretical standpoint.

Lets try the good 'ol bruteforce method. We'll set the entrypoint of __stack_chk_fail@plt to 195 (the decimal value of ret).

[01:27:32][aconole@ssh:~]
$ gdb ./bodud
GNU gdb (GDB; openSUSE 11.1) 6.8.50.20081120-cvs
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-suse-linux".
For bug reporting instructions, please see:
...
(gdb) r asdf
Starting program: /home/aconole/bodud asdf
/home/aconole/bodud: asdf

Program received signal SIGSEGV, Segmentation fault.
0x08048595 in main (argc=2, argv=0xffffd484) at bodud.c:16
16 *f = 195;
(gdb)

As we can see, that region of memory seems to be protected from alteration which confirms the output of the following line (from our original dump):
08048000-08049000 r-xp 00000000 08:11 7602791 /home/aconole/bodud

So, it seems as though we'll need to find another way of either bypassing the call to __stack_chk_fail or putting gs:14 into our stack.