
To learn more about reverse engineering and how systems work, I started doing the pwnable.kr challenges. I’ve decided to write everything down as a sort of journal and set of notes for myself, as this is the learning technique that suits me best.

The first challenge is the fd challenge in the category “toddler’s bottle”. When you follow the instructions (connect to a host through ssh), you’ll be dropped into a home folder containing three files.

$ ls -ls
total 16
8 -r-sr-x--- 1 fd_pwn fd   7322 Jun 11  2014 fd
4 -rw-r--r-- 1 root   root  418 Jun 11  2014 fd.c
4 -r--r----- 1 fd_pwn root   50 Jun 11  2014 flag

We’ve got fd, an executable we can run thanks to the group permissions (r-x for the group fd). There is also fd.c, which is readable by everyone, and flag, which we can’t read. The interesting permission here is the s in the owner bits of fd. This is the setuid bit: whoever runs the file, it runs with the privileges of the file’s owner. The owner of fd is fd_pwn, and flag is also owned by fd_pwn. So we can use fd to get read access to the flag file.

Now, they have included the source code of fd, which makes this quite simple to solve. My mistake was looking at it, which kind of defeats the “reverse engineering” goal I set for myself. In this post I won’t go over the source code. Instead, I will go over the disassembled code and try to understand what fd does with Ghidra.

Let’s start with the obvious stuff: it is a 32-bit x86 ELF file compiled with gcc. The exported main function doesn’t call any other functions except some library functions, so all the code we need seems to be in main.
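How do we know it is a 32-bit ELF? Besides the 32-bit registers in the listing, the first 16 bytes of the file (e_ident) say so. Below is a minimal sketch, assuming glibc’s <elf.h>. The helper elf_class_bits is hypothetical.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: inspect the start of a file (its e_ident bytes)
 * and return 32 or 64 for an ELF file, or 0 if it is not ELF. */
static int elf_class_bits(const unsigned char *hdr, size_t len)
{
    if (len < EI_NIDENT || memcmp(hdr, ELFMAG, SELFMAG) != 0)
        return 0;                     /* too short or no "\x7f" "ELF" magic */
    switch (hdr[EI_CLASS]) {
    case ELFCLASS32: return 32;       /* 32-bit object, like the fd binary */
    case ELFCLASS64: return 64;
    default:         return 0;
    }
}
```

To try it on a real file, read its first EI_NIDENT bytes with fread and pass them in. For a 32-bit binary like fd, it returns 32.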

Let us first edit the function signature to “comply” with a typical C program. Ghidra analyses our main function as

undefined main(int param_1, int param_2)

In C, the main function is typically declared as follows:

int main(int argc, const char* argv[])

Now, const is not something assembly really understands. The compiler can place a const object in a read-only data section (.rodata, for example), but that is not always possible. For a parameter, the value usually isn’t known until run time. On a parameter, const just means that the function promises not to modify the value. Your compiler will complain if you modify a const variable, but in assembly it is the wild wild west!
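Here is a small sketch of that point. The function name sneaky_increment is made up for illustration. Once const is cast away, the generated machine code is just an ordinary store. This is only well-defined because the caller’s variable itself was not declared const. Modifying an object that was defined const is undefined behavior.

```c
/* const on the parameter is a compile-time promise only. After casting it
 * away, the compiler emits a plain store like for any other int pointer. */
static void sneaky_increment(const int *p)
{
    int *writable = (int *)p;   /* cast away const; the compiler stops objecting */
    *writable += 1;             /* ordinary load/add/store in the assembly */
}
```

Called as sneaky_increment(&x) with int x = 41, x becomes 42.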

When we edit our function (Right-Click -> Edit Function or Shortcut F) we make the line look like

int main(int argc, char** argv)

This is almost identical to the C signature, but in a form Ghidra understands.

Now, we could use the decompiler tab and easily see what is happening. To make it a bit harder, though, we are only going to look at the disassembled code for now. Remember to comment everything you see, so it becomes like “reading a book”. For now, I even comment the very easy stuff just to get the hang of it.

First we have our typical function prologue

08048494 55              PUSH       EBP  
08048495 89 e5           MOV        EBP,ESP
08048497 57              PUSH       EDI 
08048498 56              PUSH       ESI
08048499 83 e4 f0        AND        ESP,0xfffffff0
0804849c 83 ec 20        SUB        ESP,0x20
  1. We save the previous stack frame’s base pointer (EBP) onto the stack
  2. We copy the current stack pointer (ESP) into EBP, creating a new stack frame where the old one left off
  3. We push EDI and ESI onto the stack, as these are callee-saved registers (main uses them later on)
  4. GCC rounds ESP down to a multiple of 16 (0x10) by clearing its lowest four bits, so that main’s stack is 16-byte aligned
  5. We subtract 0x20 (32 bytes) from the stack pointer. This reserves 32 bytes of stack space for local variables and for the arguments of the functions main calls.
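The AND in step 4 is plain bit arithmetic. Here is a sketch with made-up example addresses. The name align_down_16 is hypothetical.

```c
#include <stdint.h>

/* Mirrors AND ESP,0xfffffff0: clearing the low four bits rounds a 32-bit
 * address down to the nearest multiple of 16. Rounding down is the safe
 * direction for a stack, because the stack grows toward lower addresses. */
static uint32_t align_down_16(uint32_t esp)
{
    return esp & 0xfffffff0u;
}
```

For example, 0xffffd5cc becomes 0xffffd5c0, and an address that is already aligned stays the same.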

The prologue is almost always the same idea: follow the calling convention and set up the stack. It’s something you can skim over.

Then we see the first lines of code.

0804849f                    CMP        dword ptr [EBP + argc],0x1
080484a3                    JG         LAB_080484bb
080484a5                    MOV        dword ptr [ESP]=>local_30,s_pass_argv[1]_a_num   = "pass argv[1] a number"
080484ac                    CALL       puts                                             
080484b1                    MOV        EAX,0x0
080484b6                    JMP        LAB_08048559
  1. Compare argc to 0x1 (decimal 1)
  2. If argc is greater than one (JG -> Jump if Greater, a signed comparison), we jump to 0x080484bb, which skips everything in the code block above
  3. But if argc is less than or equal to one (argc < 2), then we execute the following instructions
  4. Store the address of the string s_pass_argv[1]_a_num at [ESP], the top of the stack. In the 32-bit cdecl calling convention, arguments are passed on the stack, so this is the first (and only) argument of puts. (Remember, CALL pushes the return address, so when puts starts, the string’s address sits at ESP+0x4, or at EBP+0x8 once a standard prologue has run.)
  5. Call the puts function (remember, this file was compiled for Linux)
  6. Put zero in EAX (you’ll see many compilers use XOR EAX, EAX instead)
  7. Finally, jump to 0x08048559, which is the function epilogue. EAX holds main’s return value and therefore the exit code (zero here)

So this part of the code just checks whether you have passed enough arguments to the program. If not, it prints an “error message” and returns from the function (when main returns, the program exits). Let us go to the next part.
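Translated back into C, this block reads roughly as follows. This is my own reconstruction from the disassembly, not the original source. The function name check_argc and its “-1 means keep going” convention are made up for the sketch.

```c
#include <stdio.h>

/* Rough C equivalent of the CMP/JG block. Returns 0 (main's exit code)
 * after printing the message when no argument was given, or -1 to mean
 * "continue with the rest of main". */
static int check_argc(int argc)
{
    if (argc > 1)                     /* CMP [EBP+argc],0x1 ; JG LAB_080484bb */
        return -1;
    puts("pass argv[1] a number");    /* MOV [ESP], string ; CALL puts */
    return 0;                         /* MOV EAX,0x0 ; JMP to the epilogue */
}
```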

080484bb                 MOV        EAX,dword ptr [EBP + argv]
080484be                 ADD        EAX,0x4
080484c1                 MOV        EAX,dword ptr [EAX]
080484c3                 MOV        dword ptr [ESP]=>local_30,EAX
080484c6                 CALL       atoi
080484cb                 SUB        EAX,0x1234
080484d0                 MOV        dword ptr [ESP + local_18],EAX
080484d4                 MOV        dword ptr [ESP + local_14],0x0
080484dc                 MOV        dword ptr [ESP + local_28],0x20
080484e4                 MOV        dword ptr [ESP + local_2c],buf
080484ec                 MOV        EAX,dword ptr [ESP + local_18]
080484f0                 MOV        dword ptr [ESP]=>local_30,EAX
080484f3                 CALL       read
  1. First we move argv into EAX. Remember, argv in C is an array of char pointers (char**), so EAX now holds the address of the first item of that array. By convention, the first item of argv is the program name. To access the first parameter we need the second item of the array, which is why…
  2. We add 0x4 to the argv address (a pointer is 4 bytes on 32-bit x86)
  3. We now read what EAX points to and put it in EAX. EAX now holds argv[1], a pointer to our argument string
  4. We move that value to [ESP] (the same method as before to pass an argument to a function)
  5. We call atoi, which converts a string to an integer. Remember, everything you type is actually a string: when you type 1234, the processor sees the bytes 0x31 0x32 0x33 0x34 (ASCII), followed by a terminating zero byte
  6. On return, atoi has put its result in EAX (calling convention), so if we passed 1234, EAX is now 0x4D2 (1234 decimal)
  7. Now, someone deliberately put this into the function, probably to confuse people. We subtract 0x1234 from EAX, so 0x4D2 becomes 0xFFFFF29E (-3426 decimal in 32-bit two’s complement)
  8. Now we see a bunch of MOV instructions and then a CALL. As nothing else happens in between, these are probably the arguments for read (the result of the subtraction is saved in local_18 and then reloaded into EAX)
  9. Looking at the read function, we notice it takes three arguments. As with the previous calls, the arguments are placed starting at ESP: the first at [ESP] (local_30), the second at [ESP+0x4] (local_2c) and the third at [ESP+0x8] (local_28). This means the function is called as read(atoi(argv[1]) - 0x1234, buf, 0x20)
Memory map in Ghidra
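The arithmetic from steps 5 to 7 can be checked in C. This sketch assumes a 32-bit int, as on the i386 target (and on common 64-bit Linux systems). fd_from_arg is a hypothetical name for the value main computes before calling read.

```c
#include <stdlib.h>

/* Hypothetical helper: the first argument main passes to read().
 * atoi() parses decimal text ("1234" is the bytes 0x31 0x32 0x33 0x34),
 * then 0x1234 (4660) is subtracted, as in SUB EAX,0x1234. */
static int fd_from_arg(const char *arg)
{
    return atoi(arg) - 0x1234;
}
```

With "1234", the result is -3426 (0xFFFFF29E as an unsigned 32-bit value). With "4660", it is 0. With "0x1234", atoi stops at the x and returns 0, so the result is -4660.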

So here is the first hurdle we need to clear to continue in our program. read uses the number derived from our first argument as the file descriptor to read from. A file descriptor is a handle used to access a file or another I/O resource (pipes are a common example in Linux). You probably already know a few of them: 0 is standard input, 1 is standard output and 2 is standard error. Since 0x1234 is subtracted from our argument, we must pass 0x1234 if we want read to use standard input. But remember, some lines above the read call, the argument went through atoi. atoi expects a decimal integer: given “0x1234”, it stops at the x and returns 0. So we convert the hex 0x1234 to decimal 4660 and pass that as the argument to clear the first hurdle.
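To make “file descriptor” a bit more concrete, here is a POSIX sketch. pipe() gives us two new descriptors. We write the password into one end and read() it back from the other, using the same call shape as the program: read(fd, buf, 0x20). The helper pipe_roundtrip is hypothetical. In the challenge, the descriptor is simply 0, standard input.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write "LETMEWIN\n" into a pipe and read it back
 * through a file descriptor. Returns the number of bytes read, or -1. */
static ssize_t pipe_roundtrip(char *buf, size_t cap)
{
    int fds[2];                       /* fds[0]: read end, fds[1]: write end */
    if (pipe(fds) == -1)
        return -1;

    const char *msg = "LETMEWIN\n";
    size_t len = strlen(msg);         /* 9 bytes, no terminating zero sent */
    ssize_t n = -1;
    if (write(fds[1], msg, len) == (ssize_t)len)
        n = read(fds[0], buf, cap);   /* same shape as read(fd, buf, 0x20) */

    close(fds[0]);
    close(fds[1]);
    return n;                         /* read() does not add a zero byte */
}
```

In the challenge itself, that means running ./fd 4660 and then typing LETMEWIN followed by Enter. The terminal then hands read exactly the 9 bytes “LETMEWIN\n”.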

080484f8                 MOV        dword ptr [ESP + local_14],EAX
080484fc                 MOV        EDX,s_LETMEWIN_08048646 = "LETMEWIN\n"
08048501                 MOV        EAX,buf                 = ??
08048506                 MOV        ECX,0xa
0804850b                 MOV        ESI,EDX
0804850d                 MOV        EDI,EAX
0804850f                 CMPSB.REPE ES:EDI=>buf,ESI=>s_LETMEWIN_08048646  

The following lines actually reveal the secret password. It doesn’t take any sophisticated reasoning to see that the string LETMEWIN from the program’s data section is used. But for learning’s sake, let’s keep looking at what the program does.

  1. Stores read’s return value from EAX on the stack (local_14). This is the number of bytes read. Our input itself is in buf
  2. Moves the address of a predefined string (what would that be ;-O) into EDX
  3. Moves the address of buf into EAX
  4. Moves 0xA (10 decimal) into ECX
  5. Moves EDX (the address of our predefined string) into ESI
  6. Moves EAX (the address of buf) into EDI
  7. CMPSB.REPE (Ghidra’s rendering of REPE CMPSB) uses the addresses in ESI and EDI. According to the documentation, it is an instruction for comparing strings. It compares the byte at [ESI] with the byte at [EDI] and sets the flags as if it computed [ESI] - [EDI]. It then moves both pointers to the next byte (forward, since the direction flag is clear). It works on memory directly, because strings are usually too large to fit in a register. REPE means “repeat while equal” and uses ECX (where we put 10) as a counter that is decremented on every step. So CMPSB runs until a mismatch is found OR ECX reaches zero. Note that 10 is exactly the length of “LETMEWIN\n” including its terminating zero byte.
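The repetition is easier to follow written out as a loop. This is my own emulation of REPE CMPSB for illustration, not the CPU’s exact behavior. The struct and function names are hypothetical, and the sketch assumes the direction flag is clear, so both pointers move forward.

```c
#include <stddef.h>

/* Flags left behind by the last byte comparison. */
struct cmp_flags {
    int zf;   /* zero flag: last compared bytes were equal */
    int cf;   /* carry flag: [ESI] byte was below the [EDI] byte (unsigned) */
};

/* Emulates REPE CMPSB: compare *esi with *edi, advance both pointers,
 * decrement ecx, and repeat while the bytes are equal and ecx != 0.
 * The flags describe [ESI] - [EDI] for the last comparison made. */
static struct cmp_flags repe_cmpsb(const unsigned char *esi,
                                   const unsigned char *edi, size_t ecx)
{
    /* If ecx starts at 0 the real CPU leaves the flags untouched;
     * this sketch simply reports "equal" in that case. */
    struct cmp_flags f = { .zf = 1, .cf = 0 };
    while (ecx != 0) {
        unsigned char a = *esi++, b = *edi++;
        ecx--;
        f.zf = (a == b);
        f.cf = (a < b);
        if (!f.zf)
            break;                    /* REPE stops at the first mismatch */
    }
    return f;
}
```

With ESI pointing at “LETMEWIN\n”, EDI at our buf and ECX = 10, ZF is only still set at the end if all 10 bytes matched, including the zero byte.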
08048511                 SETA       DL
08048514                 SETC       AL
08048517                 MOV        ECX,EDX
08048519                 SUB        CL,AL
0804851b                 MOV        EAX,ECX
0804851d                 MOVSX      EAX,AL
08048520                 TEST       EAX,EAX
08048522                 JNZ        LAB_08048548
  1. OK, so the comparison has happened. CMPSB.REPE only sets flags (this is a low-level language, after all), so the following code turns those flags into a result
  2. SETA sets DL (the lowest byte of EDX) to 1 if the last byte at [ESI] was above (unsigned) the byte at [EDI], which means the strings differ. Otherwise DL becomes 0
  3. SETC sets AL (the lowest byte of EAX) to 1 if the carry flag is set, which CMPSB does when the byte at [ESI] was below the byte at [EDI]. Otherwise AL becomes 0
  4. Now we move EDX (which has DL in it) to ECX
  5. Here we subtract AL from CL (the lowest byte of ECX, which thanks to the previous instruction equals DL), so CL becomes 1, 0 or -1
  6. Now we move ECX (the result of the subtraction) into EAX
  7. MOVSX sign-extends AL into EAX, so a -1 in AL stays -1 as a 32-bit value
  8. And here we TEST EAX against itself. TEST performs a bitwise AND that only sets the flags, so TEST EAX, EAX sets the zero flag exactly when EAX is zero. It does the same job as CMP EAX, 0 but has a shorter encoding, which is why compilers prefer it.
  9. JNZ (Jump if Not Zero) jumps if EAX is not zero, i.e. if the strings differ
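Steps 2 to 7 boil down to the classic branch-free three-way comparison (a > b) - (a < b). This follows the same negative/zero/positive sign convention strcmp uses. My guess is that GCC inlined a strcmp call here, though the binary itself doesn’t say so. The sketch below mirrors each instruction. flags_to_result is a hypothetical name.

```c
/* Turns the flags left by CMPSB into -1, 0 or 1, instruction by instruction. */
static int flags_to_result(int zf, int cf)
{
    signed char dl = (!cf && !zf);   /* SETA DL: "above" means CF == 0 and ZF == 0 */
    signed char al = (signed char)cf; /* SETC AL: carry set means "below" */
    signed char cl = dl - al;        /* MOV ECX,EDX ; SUB CL,AL */
    return cl;                       /* MOV EAX,ECX ; MOVSX EAX,AL (sign-extend) */
}
```

JNZ then jumps away from the “good job :)” branch whenever this result is non-zero, that is, whenever buf differs from “LETMEWIN\n”.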
08048524  MOV   dword ptr [ESP]=>local_30,s_good_job_:)_08048650 = "good job :)"
0804852b  CALL  puts
08048530  MOV   dword ptr [ESP]=>local_30,s_/bin/cat_flag_0804   = "/bin/cat flag"
08048537  CALL  system
0804853   MOV   dword ptr [ESP]=>local_30,0x0
08048543  CALL  exit 

Alright, the last part, which shows the actually interesting stuff. If our strings were equal (see the notes above), then

  1. Place the address of the string “good job :)” at [ESP]
  2. CALL puts (prints it on screen)
  3. Place the address of the string “/bin/cat flag” at [ESP]
  4. CALL system, which hands its string to the shell (/bin/sh -c) to execute. This would be a dangerous call if the string included any user input. Here, system executes /bin/cat flag and shows us the contents of flag. This works because fd has the setuid bit, which, as described above, makes it run with the privileges of its owner, fd_pwn.
  5. Finally, put zero on the stack as the argument and call exit

I’m going to skip the rest of the program, as it is nothing more than the function epilogue with some extra text. But this is how you can analyze a program and find out how it works without any source code available. Next time I won’t touch the source code like I foolishly did this time.

A fun comparison between the Ghidra-generated C code and the real C code (panels: Ghidra, Source)