初学angr与去混淆

发表于 2025-05-29 本文字数： 32k 阅读时长 ≈ 58 分钟

学习angr

我打算根据https://github.com/jakespringer/angr_ctf里的解题代码进行学习。

00_angr_find

import angr
import sys

def main(argv):
  # Create an Angr project.
  # If you want to be able to point to the binary from the command line, you can
  # use argv[1] as the parameter. Then, you can run the script from the command
  # line as follows:
  # python ./scaffold00.py [binary]
  # (!)
  path_to_binary = ???  # :string
  project = angr.Project(path_to_binary)

  # Tell Angr where to start executing (should it start from the main()
  # function or somewhere else?) For now, use the entry_state function
  # to instruct Angr to start from the main() function.
  initial_state = project.factory.entry_state()

  # Create a simulation manager initialized with the starting state. It provides
  # a number of useful tools to search and execute the binary.
  simulation = project.factory.simgr(initial_state)

  # Explore the binary to attempt to find the address that prints "Good Job."
  # You will have to find the address you want to find and insert it here. 
  # This function will keep executing until it either finds a solution or it 
  # has explored every possible path through the executable.
  # (!)
  print_good_address = ???  # :integer (probably in hexadecimal)
  simulation.explore(find=print_good_address)

  # Check that we have found a solution. The simulation.explore() method will
  # set simulation.found to a list of the states that it could find that reach
  # the instruction we asked it to search for. Remember, in Python, if a list
  # is empty, it will be evaluated as false, otherwise true.
  if simulation.found:
    # The explore method stops after it finds a single state that arrives at the
    # target address.
    solution_state = simulation.found[0]

    # Print the string that Angr wrote to stdin to follow solution_state. This 
    # is our solution.
    print(solution_state.posix.dumps(sys.stdin.fileno()))
  else:
    # If Angr could not find a path that reaches print_good_address, throw an
    # error. Perhaps you mistyped the print_good_address?
    raise Exception('Could not find the solution')

if __name__ == '__main__':
  main(sys.argv)

解析

创建Angr项目，angr会对二进制文件解析。

1	project = angr.Project(path_to_binary)

创建一个初始状态，通常从程序的入口点开始，即文件的入口函数main。

1	initial_state = project.factory.entry_state()

创建一个模拟管理器，通过这个模拟器可以使用很多工具，对二进制文件进行搜索和执行。

1	simulation = project.factory.simgr(initial_state)

自动探索路径，直到找到一个到达find参数指定的地址的状态。

1	simulation.explore(find=print_good_address)

这是一个列表，如果搜索成功，这个列表会包含所有可到达目标地址的状态。

1	simulation.found

从成功状态中，提取标准输入的内容。

1 2	solution_state = simulation.found[0] solution_state.posix.dumps(sys.stdin.fileno()).decode()

01_angr_avoid

import angr
import sys

def main(argv):
  path_to_binary = argv[1]
  project = angr.Project(path_to_binary)
  initial_state = project.factory.entry_state(
    add_options = { angr.options.SYMBOL_FILL_UNCONSTRAINED_MEMORY,
                    angr.options.SYMBOL_FILL_UNCONSTRAINED_REGISTERS}
  )
  simulation = project.factory.simgr(initial_state)

  print_good_address = 0x80485f7 # 目标成功地址（示例）
  will_not_succeed_address = 0x80485bf # 需要避开的地址（示例）
  # 探索时，寻找print_good_address，并避开will_not_succeed_address
  simulation.explore(find=print_good_address, avoid=will_not_succeed_address)

  if simulation.found:
    solution_state = simulation.found[0]
    print(solution_state.posix.dumps(sys.stdin.fileno()).decode())
  else:
    raise Exception('Could not find the solution')

if __name__ == '__main__':
  main(sys.argv)

解析

这次程序中不仅有成功路径，还有一些我们不希望进入的“陷阱”路径（例如，一个会打印失败信息然后退出的路径）。我们需要找到成功路径，同时避开这些陷阱。

在探索路径的时候，avoid参数可以避免某些状态的产生。比方说，不进入基本块A，则不会在基本块A处产生新状态，基本块A之后的基本块也不会继续通过基本块A产生新状态。

1	simulation.explore(find=find_address, avoid=avoid_address)

02_angr_find_condition

import angr
import sys

def main(argv):
  path_to_binary = argv[1]
  project = angr.Project(path_to_binary)
  initial_state = project.factory.entry_state(
    add_options = { angr.options.SYMBOL_FILL_UNCONSTRAINED_MEMORY,
                    angr.options.SYMBOL_FILL_UNCONSTRAINED_REGISTERS}
  )
  simulation = project.factory.simgr(initial_state)

  # 定义成功条件函数
  def is_successful(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno()) # 获取标准输出
    return b'Good Job.' in stdout_output # 检查是否包含 "Good Job." (注意是字节串)

  # 定义应中止路径的条件函数
  def should_abort(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return b'Try again.' in stdout_output # 检查是否包含 "Try again."

  simulation.explore(find=is_successful, avoid=should_abort)

  if simulation.found:
    solution_state = simulation.found[0]
    print(solution_state.posix.dumps(sys.stdin.fileno()).decode())
  else:
    raise Exception('Could not find the solution')

if __name__ == '__main__':
  main(sys.argv)

解析

有时候，可能不知道确切“成功”的指令地址，而是某种行为。比如说，某个基本块会标准输出“Good Job.”。

explore方法的find和avoid参数不仅可以接受地址，还可以接受一个函数。这个函数会以一个state对象为参数，返回True代表找到了或者应该避开，而返回False代表没找到成功状态或者不应该避开。

Angr会在每一步执行后，用这些函数评判当前状态，如果状态是True，就代表成功了。

# 定义成功条件函数
  def is_successful(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno()) # 获取标准输出
    return b'Good Job.' in stdout_output # 检查是否包含 "Good Job." (注意是字节串)

  # 定义应中止路径的条件函数
  def should_abort(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return b'Try again.' in stdout_output # 检查是否包含 "Try again."

03_angr_symbolic_registers

import claripy # 导入claripy用于创建符号变量
import sys

def main(argv):
  path_to_binary = argv[1]
  project = angr.Project(path_to_binary)

  # 设定一个开始地址，通常是scanf调用之后，这样我们可以手动控制输入
  start_address = 0x80488c7 # 示例地址，你需要反汇编找到它
  initial_state = project.factory.blank_state(
    addr=start_address, # 从指定地址开始
    add_options = { angr.options.SYMBOL_FILL_UNCONSTRAINED_MEMORY,
                    angr.options.SYMBOL_FILL_UNCONSTRAINED_REGISTERS}
  )

  # 创建符号变量来代表密码的各个部分
  # 假设程序从scanf读取3个32位整数，然后这些值被放到了eax, ebx, edx
  password0_size_in_bits = 32
  password0 = claripy.BVS('password0', password0_size_in_bits) # 'password0'是名字，32是位数

  password1_size_in_bits = 32
  password1 = claripy.BVS('password1', password1_size_in_bits)

  password2_size_in_bits = 32
  password2 = claripy.BVS('password2', password2_size_in_bits)

  # 将符号变量注入到相应的寄存器
  initial_state.regs.eax = password0
  initial_state.regs.ebx = password1
  initial_state.regs.edx = password2 # 实际使用哪个寄存器取决于反汇编结果

  simulation = project.factory.simgr(initial_state)

  def is_successful(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return b'Good Job.' in stdout_output

  def should_abort(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return b'Try again.' in stdout_output

  simulation.explore(find=is_successful, avoid=should_abort)

  if simulation.found:
    solution_state = simulation.found[0]

    # 从成功状态中求解出各个符号变量的具体值
    solution0 = solution_state.solver.eval(password0)
    solution1 = solution_state.solver.eval(password1)
    solution2 = solution_state.solver.eval(password2)

    # 将解格式化为程序期望的输入格式（这里假设是十六进制，空格分隔）
    solution_string = ' '.join(map('{:x}'.format, [ solution0, solution1, solution2 ]))
    print(solution_string)
  else:
    raise Exception('Could not find the solution')

if __name__ == '__main__':
  main(sys.argv)

解析

虽然现在的Angr支持函数scanf了，但之前是不支持的，这道题目是想教会读者，可以在任意地址设置状态，并符号执行。

在任意地址创建状态。

1	project.factory.blank_state(addr=start_address)

创建符号向量，例如，一个代表32位数值的符号。

1	claripy.BVS('name', bit_length)

将符号值赋给寄存器。

1 2	state.regs.REG_NAME = symbolic_value # 例如: initial_state.regs.eax = password0

获取符号变量的一个具体满足约束的值。

1	state.solver.eval(symbolic_value)

04_angr_symbolic_stack

import angr
import claripy
import sys

def main(argv):
  path_to_binary = argv[1]
  project = angr.Project(path_to_binary)

  # 挑战描述 scanf("%u %u")
  # 通常 scanf 的参数是：格式字符串地址、变量1地址、变量2地址...
  # 这些地址本身可能在栈上（如果是局部变量的地址）
  start_address = 0x80486ae # 跳过scanf的地址 (示例)
  initial_state = project.factory.blank_state(
    addr=start_address,
    add_options = { angr.options.SYMBOL_FILL_UNCONSTRAINED_MEMORY,
                    angr.options.SYMBOL_FILL_UNCONSTRAINED_REGISTERS}
  )

  # 模拟函数栈帧的建立
  # mov %esp, %ebp (通常在函数开始)
  # sub $0x18, %esp (为局部变量分配空间)
  # 我们跳过了这些，所以需要手动设置
  initial_state.regs.ebp = initial_state.regs.esp # 假设 ebp 指向当前 esp

  password0 = claripy.BVS('password0', 32) # 第一个32位符号整数
  password1 = claripy.BVS('password1', 32) # 第二个32位符号整数

  # 根据反汇编，scanf的参数是 ebp-0xc 和 ebp-0x10
  # scanf(format_string, ebp - 0xc, ebp - 0x10)
  # 假设 ebp 和 esp 初始化后相等，我们需要在栈上为这两个变量腾出空间。
  # 并且，由于我们跳过了scanf之前的参数压栈，栈的布局需要我们模拟。
  # solve04.py 中:
  # ebp ->     |          padding          | (ebp 指向这里，也是初始 esp 指向这里)
  # ...
  # ebp - 0x0c |   password0 (符号值)    | <- scanf 会写入这里 (即 initial_state.regs.ebp - 0xc)
  # ...
  # ebp - 0x10 |   password1 (符号值)    | <- scanf 会写入这里 (即 initial_state.regs.ebp - 0x10)
  # ...
  # esp ->     |                           | (esp 在分配完所有局部变量后指向这里)

  # 为了让 scanf 操作的地址 (ebp-0xc, ebp-0x10) 包含我们的符号值，
  # 我们需要在这些“模拟的”内存位置存储符号值。
  # solve04.py 通过 stack_push 来实现，这暗示它认为这些 scanf 的目标缓冲区
  # 就是在栈上连续分配的。

  # 首先，为栈上的局部变量和 scanf 的参数指针（如果它们是通过压栈传递的话）分配空间。
  # `solve04.py` 中 padding_length_in_bytes = 8，然后 esp -= padding_length_in_bytes
  # 之后 stack_push(password0) 和 stack_push(password1)
  # 这意味着它期望 password0 和 password1 直接作为值被程序使用，
  # 并且它们位于 esp 指向的栈顶附近。
  # 这需要非常小心地根据实际反汇编代码来确定栈布局。
  # scanf("%u %u", &var1, &var2) -> var1 和 var2 是栈上的局部变量
  # 假设 var1 在 ebp-0xc, var2 在 ebp-0x10
  # 我们需要确保在 initial_state 中，内存地址 (ebp-0xc) 处的值是 password0，(ebp-0x10) 处是 password1。
  # 更直接的方法可能是:
  # initial_state.memory.store(initial_state.regs.ebp - 0xc, password0, endness=project.arch.memory_endness)
  # initial_state.memory.store(initial_state.regs.ebp - 0x10, password1, endness=project.arch.memory_endness)
  #
  # 但 solve04.py 的做法是：
  padding_length_in_bytes = 8 # 这个值需要根据具体情况调整
  initial_state.regs.esp -= padding_length_in_bytes

  # 将符号变量压入栈。注意顺序！x86参数通常自右向左压栈。
  # 但这里我们是模拟 scanf 已经执行完后，数据已在栈上变量中的情况。
  # 所以这里的顺序是 password0 在较高的地址 (ebp-0xc)，password1 在较低地址 (ebp-0x10)。
  # stack_push 会将值放到当前esp，然后esp相应调整。
  # 为了匹配 ebp-0xc 和 ebp-0x10，可能需要先push password1，再push password0，
  # 或者调整 padding 和 esp 的初始值。
  # solve04.py 的顺序是：
  initial_state.stack_push(password0)
  initial_state.stack_push(password1)
  # 这意味着 password1 在栈顶 (低地址)，password0 在其后 (高地址)。
  # 这需要与程序如何从栈上读取这些值相对应。

  simulation = project.factory.simgr(initial_state)

  def is_successful(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return b'Good Job.' in stdout_output

  def should_abort(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return b'Try again.' in stdout_output

  simulation.explore(find=is_successful, avoid=should_abort)

  if simulation.found:
    solution_state = simulation.found[0]
    solution0 = solution_state.solver.eval(password0)
    solution1 = solution_state.solver.eval(password1)
    # 格式化输出
    solution_string = ' '.join(map(str, [solution0, solution1])) # 假设是十进制整数，空格分隔
    print(solution_string)
  else:
    raise Exception('Could not find the solution')

if __name__ == '__main__':
  main(sys.argv)

解析

类似于Level3，但这次scanf读取的值可能直接存储于栈上的局部变量，而非像Level3一样，将值直接传递给3个寄存器。

获得初始状态，angr.options.SYMBOL_FILL_UNCONSTRAINED_MEMORY和SYMBOL_FILL_UNCONSTRAINED_REGISTERS用于设置未初始化的内存、寄存器为符号。

initial_state = project.factory.blank_state(
  addr=start_address,
  add_options = { angr.options.SYMBOL_FILL_UNCONSTRAINED_MEMORY,
                  angr.options.SYMBOL_FILL_UNCONSTRAINED_REGISTERS}
)

模拟栈帧的建立，令ebp指向当前esp。

1	initial_state.regs.ebp = initial_state.regs.esp

创建2个32位的符号整数。

1 2	password0 = claripy.BVS('password0', 32) # 第一个32位符号整数 password1 = claripy.BVS('password1', 32) # 第二个32位符号整数

根据反汇编，scanf的参数是 ebp-0xc 和 ebp-0x10，相当于：

1	scanf(format_string, ebp - 0xc, ebp - 0x10)

esp开辟8个字节的栈帧。

1 2	padding_length_in_bytes = 8 # 这个值需要根据具体情况调整 initial_state.regs.esp -= padding_length_in_bytes

压栈。

1 2	initial_state.stack_push(password0) initial_state.stack_push(password1)

创建模拟器。

1	simulation = project.factory.simgr(initial_state)

解析成功状态时，符号的值。

1 2	solution0 = solution_state.solver.eval(password0) solution1 = solution_state.solver.eval(password1)

为什么这里需要设置ebp = esp，并且esp -= 8？

观察反汇编代码，发现之后调用参数的时候，都是基于ebp来调用的，因此这里ebp先设置为esp，然后让esp -= 8，之后压栈操作又会令esp - 4，这个时候，ebp + var_10和ebp + var_C可以正确访问到栈上的符号变量了。

05_angr_symbolic_memory

import angr
import claripy
import sys

def main(argv):
  path_to_binary = argv[1]
  project = angr.Project(path_to_binary)

  # 挑战描述 scanf("%8s %8s %8s %8s")
  # 假设这些字符串被存储在已知的全局地址
  start_address = 0x8048618 # scanf之后开始 (示例)
  initial_state = project.factory.blank_state(
    addr=start_address,
    add_options = { angr.options.SYMBOL_FILL_UNCONSTRAINED_MEMORY,
                    angr.options.SYMBOL_FILL_UNCONSTRAINED_REGISTERS}
  )

  # 创建符号字符串，每个8字节 (8*8=64位)
  password0 = claripy.BVS('password0', 8*8)
  password1 = claripy.BVS('password1', 8*8)
  password2 = claripy.BVS('password2', 8*8)
  password3 = claripy.BVS('password3', 8*8)

  # 将符号字符串存储到它们在内存中的地址
  # 这些地址需要通过反汇编确定
  password0_address = 0xab232c0 # 示例地址
  initial_state.memory.store(password0_address, password0) # 默认字节序，对于字符串通常OK
  password1_address = 0xab232c8 # 示例地址
  initial_state.memory.store(password1_address, password1)
  password2_address = 0xab232d0 # 示例地址
  initial_state.memory.store(password2_address, password2)
  password3_address = 0xab232d8 # 示例地址
  initial_state.memory.store(password3_address, password3)

  simulation = project.factory.simgr(initial_state)

  def is_successful(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return b'Good Job.' in stdout_output

  def should_abort(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return b'Try again.' in stdout_output

  simulation.explore(find=is_successful, avoid=should_abort)

  if simulation.found:
    solution_state = simulation.found[0]

    # 求解符号字符串
    solution0 = solution_state.solver.eval(password0, cast_to=bytes).decode()
    solution1 = solution_state.solver.eval(password1, cast_to=bytes).decode()
    solution2 = solution_state.solver.eval(password2, cast_to=bytes).decode()
    solution3 = solution_state.solver.eval(password3, cast_to=bytes).decode()

    # 组合成最终输入字符串，假设空格分隔
    solution_string = ' '.join([solution0, solution1, solution2, solution3])
    print(solution_string)
  else:
    raise Exception('Could not find the solution')

if __name__ == '__main__':
  main(sys.argv)

解析

scanf可以获得输入，这个输入在Level3中写入了寄存器，在Level4中写入了栈，在本次案例中，写入了全局变量。

创建起始状态。

initial_state = project.factory.blank_state(
  addr=start_address,
  add_options = { angr.options.SYMBOL_FILL_UNCONSTRAINED_MEMORY,
                  angr.options.SYMBOL_FILL_UNCONSTRAINED_REGISTERS}
)

创建符号变量。

# 创建符号字符串，每个8字节 (8*8=64位)
password0 = claripy.BVS('password0', 8*8)
password1 = claripy.BVS('password1', 8*8)
password2 = claripy.BVS('password2', 8*8)
password3 = claripy.BVS('password3', 8*8)

使用initial_state.memory.store，将符号值存储到已知的全局变量地址。（偏移地址）

# 将符号字符串存储到它们在内存中的地址
  # 这些地址需要通过反汇编确定
  password0_address = 0xab232c0 # 示例地址
  initial_state.memory.store(password0_address, password0) # 默认字节序，对于字符串通常OK
  password1_address = 0xab232c8 # 示例地址
  initial_state.memory.store(password1_address, password1)
  password2_address = 0xab232d0 # 示例地址
  initial_state.memory.store(password2_address, password2)
  password3_address = 0xab232d8 # 示例地址
  initial_state.memory.store(password3_address, password3)

解析符号的值。

1	solution_state.solver.eval(symbolic_value, cast_to=bytes).decode()

06_angr_symbolic_dynamic_memory

import angr
import claripy
import sys

def main(argv):
  path_to_binary = argv[1]
  project = angr.Project(path_to_binary)

  # 挑战描述 scanf("%8s %8s")，输入存储在动态分配的内存中
  start_address = 0x80486af # scanf 之后 (示例)
  initial_state = project.factory.blank_state(
    addr=start_address,
    add_options = { angr.options.SYMBOL_FILL_UNCONSTRAINED_MEMORY,
                    angr.options.SYMBOL_FILL_UNCONSTRAINED_REGISTERS}
  )

  password0 = claripy.BVS('password0', 8*8) # 8字节字符串
  password1 = claripy.BVS('password1', 8*8) # 注意solve06.py这里写的是password0，应为password1

  # 伪造的堆地址，我们可以控制这块内存
  fake_heap_address0 = 0x4444444 
  # 存储了 malloc返回地址 的那个指针变量的地址 (需反汇编确定)
  pointer_to_malloc_memory_address0 = 0xa2def74 # 示例

  # 修改程序中的指针，使其指向我们的伪造堆地址
  # 对于32位程序，地址是4字节
  initial_state.memory.store(pointer_to_malloc_memory_address0, fake_heap_address0, endness=project.arch.memory_endness, size=4)

  fake_heap_address1 = 0x4444454 # 为第二个密码准备的伪造地址
  pointer_to_malloc_memory_address1 = 0xa2def7c # 示例
  initial_state.memory.store(pointer_to_malloc_memory_address1, fake_heap_address1, endness=project.arch.memory_endness, size=4)


  # 在伪造的堆地址上存储我们的符号密码
  initial_state.memory.store(fake_heap_address0, password0)
  initial_state.memory.store(fake_heap_address1, password1)


  simulation = project.factory.simgr(initial_state)

  def is_successful(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return b'Good Job.' in stdout_output

  def should_abort(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return b'Try again.' in stdout_output

  simulation.explore(find=is_successful, avoid=should_abort)

  if simulation.found:
    solution_state = simulation.found[0]
    solution0 = solution_state.solver.eval(password0, cast_to=bytes).decode()
    solution1 = solution_state.solver.eval(password1, cast_to=bytes).decode() # 原为password0
    
    solution_string = ' '.join([solution0, solution1])
    print(solution_string)
  else:
    raise Exception('Could not find the solution')

if __name__ == '__main__':
  main(sys.argv)

解析

本次案例中，输入的值保存到了堆上。

在scanf之后的地址创建初始状态。

start_address = 0x80486af # scanf 之后 (示例)
  initial_state = project.factory.blank_state(
    addr=start_address,
    add_options = { angr.options.SYMBOL_FILL_UNCONSTRAINED_MEMORY,
                    angr.options.SYMBOL_FILL_UNCONSTRAINED_REGISTERS}
  )

程序中会有一个指针变量（全局变量或是局部变量）用于存储malloc返回的地址，如上图的buffer0、buffer1。

选择一个angr知道的、未被使用的“伪造堆地址”，使用下面这个语句修改存储malloc结果的指针，让它指向伪造的堆地址，这里的endness指字节序。

 # 伪造的堆地址，我们可以控制这块内存
  fake_heap_address0 = 0x4444444 
  # 存储了 malloc返回地址 的那个指针变量的地址 (需反汇编确定)
  pointer_to_malloc_memory_address0 = 0xa2def74 # 示例

initial_state.memory.store(pointer_to_malloc_memory_address1, fake_heap_address1, endness=project.arch.memory_endness, size=4)

在对地址上存储符号向量。

password0 = claripy.BVS('password0', 8*8)
password1 = claripy.BVS('password1', 8*8)
# 在伪造的堆地址上存储我们的符号密码
initial_state.memory.store(fake_heap_address0, password0)
initial_state.memory.store(fake_heap_address1, password1)

探索路径。

def is_successful(state):
  stdout_output = state.posix.dumps(sys.stdout.fileno())
  return b'Good Job.' in stdout_output

def should_abort(state):
  stdout_output = state.posix.dumps(sys.stdout.fileno())
  return b'Try again.' in stdout_output

simulation.explore(find=is_successful, avoid=should_abort)

解析符号。

1 2	solution0 = solution_state.solver.eval(password0, cast_to=bytes).decode() solution1 = solution_state.solver.eval(password1, cast_to=bytes).decode()

07_angr_syombolic_file

import angr
import claripy
import sys

def main(argv):
  path_to_binary = argv[1]
  project = angr.Project(path_to_binary)

  # fread(buffer, sizeof(char), 64, file) - 题目提示，但实际解法用了8字节
  # 可能程序逻辑只需要前8字节，或者solve07.py中的大小是正确的。
  start_address = 0x80488bc # 通常是fread调用之前或包含fread的函数入口 (示例)
  initial_state = project.factory.blank_state( # 或 entry_state，取决于是否需要模拟fopen
    addr=start_address,
    add_options = { angr.options.SYMBOL_FILL_UNCONSTRAINED_MEMORY,
                    angr.options.SYMBOL_FILL_UNCONSTRAINED_REGISTERS}
  )

  filename = 'FOQVSBZB.txt' # 程序将要读取的文件名 (示例)
  symbolic_file_size_bytes = 8 # 假设密码是8字节

  # 创建代表文件内容的符号位向量
  password_data = claripy.BVS('password_file_content', symbolic_file_size_bytes * 8)

  # 创建一个模拟文件，其内容是我们的符号数据
  # 'r'表示只读模式，Angr的SimFile也接受模式参数，但content更直接
  password_file = angr.storage.SimFile(filename, content=password_data)

  # 将这个模拟文件插入到初始状态的文件系统中
  # 当程序尝试打开并读取 'FOQVSBZB.txt' 时，它会读到 password_data
  initial_state.fs.insert(filename, password_file)

  simulation = project.factory.simgr(initial_state)

  def is_successful(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return b'Good Job.' in stdout_output

  def should_abort(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return b'Try again.' in stdout_output

  simulation.explore(find=is_successful, avoid=should_abort)

  if simulation.found:
    solution_state = simulation.found[0]
    # 求解文件内容
    solution_bytes = solution_state.solver.eval(password_data, cast_to=bytes)
    print(solution_bytes.decode()) # 假设是可打印字符
  else:
    raise Exception('Could not find the solution')

if __name__ == '__main__':
  main(sys.argv)

解析

程序会从文件中读取密码，而不是标准的输入或命令行参数。

解题思路：

确定程序读取的文件名和预期的大小。

1 2	filename = 'FOQVSBZB.txt' # 程序将要读取的文件名 (示例) symbolic_file_size_bytes = 8 # 假设密码是8字节

创建符号位向量代表文件内容。

1 2	# 创建代表文件内容的符号位向量 password_data = claripy.BVS('password_file_content', symbolic_file_size_bytes * 8)

使用angr.storage.SimFile(filename, content=symbolic_data)创建一个模拟文件，内容是我们的符号数据。将这个模拟文件插入到初始状态的文件系统中。

# 创建一个模拟文件，其内容是我们的符号数据
# 'r'表示只读模式，Angr的SimFile也接受模式参数，但content更直接
password_file = angr.storage.SimFile(filename, content=password_data)

# 将这个模拟文件插入到初始状态的文件系统中
# 当程序尝试打开并读取 'FOQVSBZB.txt' 时，它会读到 password_data
initial_state.fs.insert(filename, password_file)

当程序执行 fopen, fread 等文件操作时，它会从我们提供的这个符号化文件中读取。

08_angr_constrains

import angr
import claripy
import sys

def main(argv):
  path_to_binary = argv[1]
  project = angr.Project(path_to_binary)

  # 目标是找到complex_function的输入，使其输出等于硬编码的字符串
  # 我们从main函数中scanf之后，complex_function执行之后，但在check_equals_调用之前开始
  start_address = 0x804863c # scanf之后，complex_function之前的某个点 (示例)
                           # 或者，让Angr执行scanf和complex_function
  initial_state = project.factory.blank_state(
    addr=start_address, # 实际上，应该让Angr执行scanf和complex_function
                        # 所以可能还是entry_state()或者scanf后的地址，
                        # 并将原始密码符号化。
                        # solve08.py中直接从0x804863c开始，并符号化了password_address处的内存
    add_options = { angr.options.SYMBOL_FILL_UNCONSTRAINED_MEMORY,
                    angr.options.SYMBOL_FILL_UNCONSTRAINED_REGISTERS}
  )
  
  # 原始密码（complex_function的输入）是符号化的
  # 假设原始密码存储在 password_address，长度16字节
  password_size_bytes = 16
  password = claripy.BVS('password', password_size_bytes * 8)
  password_address = 0x804a040 # 假设这是存储原始密码的全局变量地址 (示例)
  initial_state.memory.store(password_address, password) # 注入符号化的原始密码

  simulation = project.factory.simgr(initial_state)

  # 探索到调用 check_equals_ 函数之前的地址
  address_to_check_constraint = 0x8048683 # call check_equals_... 指令的地址 (示例)
  simulation.explore(find=address_to_check_constraint)

  if simulation.found:
    solution_state = simulation.found[0] # 到达了 call check_equals_ 的状态

    # check_equals_ 的第一个参数 (char* to_check) 是 complex_function 的输出
    # 这个参数的地址通常是已知的，或者可以通过分析调用前的汇编指令得到
    # 在这个题目中，complex_function 的输出结果（即 to_check）覆盖了原始密码的内存区域
    # 所以 constrained_parameter_address 就是 password_address
    constrained_parameter_address = 0x804a040 # (示例) complex_function的输出存放地址
    constrained_parameter_size_bytes = 16

    # 从当前状态的内存中加载这个参数，它应该是符号化的
    constrained_parameter_bitvector = solution_state.memory.load(
      constrained_parameter_address,
      constrained_parameter_size_bytes
    )
    
    # 我们知道 check_equals_ 期望的字符串 (通过反汇编或题目描述)
    constrained_parameter_desired_value = 'OSIWHBXIFOQVSBZB'.encode() # 目标字符串 (示例)

    # 添加约束：要求 complex_function 的输出等于期望的字符串
    solution_state.add_constraints(constrained_parameter_bitvector == constrained_parameter_desired_value)

    # 现在，求解原始的符号密码 password
    # Angr会找到一个 password，使得经过 complex_function 变换后，
    # 其结果 (constrained_parameter_bitvector) 等于 constrained_parameter_desired_value
    solution_bytes = solution_state.solver.eval(password, cast_to=bytes)
    print(solution_bytes.decode())
  else:
    raise Exception('Could not find the solution')

if __name__ == '__main__':
  main(sys.argv)

解析

程序中有一个非常复杂的检查函数（check_equals_...），它逐字节比较输入经过某个 complex_function 转换后的结果与一个硬编码的字符串。如果让Angr完整符号执行这个检查函数，由于大量的分支 (每个字节比较都是一个分支)，会导致状态爆炸，非常耗时。

创建初始状态，假设这里的0x804863C是符号执行的起始地址。

start_address = 0x804863c
initial_state = project.factory.blank_state(
    addr=start_address,
    add_options={
        angr.options.SYMBOL_FILL_UNCONSTRAINED_MEMORY,
        angr.options.SYMBOL_FILL_UNCONSTRAINED_REGISTERS
    }
)

符号化密码，假设密码是16字节长，创建一个16字节的符号变量，password_address是原始密码的内存地址，

password_size_bytes = 16
password = claripy.BVS('password', password_size_bytes * 8)
password_address = 0x804a040
initial_state.memory.store(password_address, password) # 将符号变量password写入内存地址0x804a040

创建模拟管理器，探索目标地址。

simulation = project.factory.simgr(initial_state)

address_to_check_constraint = 0x8048683
simulation.explore(find=address_to_check_constraint)

处理成功状态。

1 2	if simulation.found: solution_state = simulation.found[0]

加载和约束输出。solution_state.add_constraints(…)添加约束。

constrained_parameter_address = 0x804a040
constrained_parameter_size_bytes = 16
constrained_parameter_bitvector = solution_state.memory.load(
    constrained_parameter_address,
    constrained_parameter_size_bytes
)
constrained_parameter_desired_value = 'OSIWHBXIFOQVSBZB'.encode()
solution_state.add_constraints(constrained_parameter_bitvector == constrained_parameter_desired_value)

求解密码。

1 2	solution_bytes = solution_state.solver.eval(password, cast_to=bytes) print(solution_bytes.decode())

09_angr_hooks

import angr
import claripy
import sys

def main(argv):
  path_to_binary = argv[1]
  project = angr.Project(path_to_binary)

  initial_state = project.factory.entry_state( # 从程序入口开始，让Angr处理scanf
    add_options = { angr.options.SYMBOL_FILL_UNCONSTRAINED_MEMORY,
                    angr.options.SYMBOL_FILL_UNCONSTRAINED_REGISTERS}
  )

  check_equals_called_address = 0x80486ca # call check_equals_... 指令的地址 (示例)
  instruction_to_skip_length = 5 # call指令的长度 (示例，通常是5字节: E8 XX XX XX XX)

  # 定义钩子函数
  @project.hook(check_equals_called_address, length=instruction_to_skip_length)
  def skip_check_equals_(state):
    # 假设 check_equals_ 的第一个参数 (char* to_check) 指向的地址是已知的
    # 在此题中，它指向 complex_function 的输出，该输出存储在全局变量 0x804a044
    user_input_buffer_address = 0x804a044 # (示例)
    user_input_buffer_length = 16 # 字符串长度

    # 加载这个参数（它应该是符号化的，因为它来自用户输入经过complex_function)
    user_input_string_bv = state.memory.load(
      user_input_buffer_address,
      user_input_buffer_length # load的长度单位是字节
    )
    
    check_against_string_concrete = 'OSIWHBXIFOQVSBZB'.encode() # 硬编码的目标字符串

    # 构造符号化的返回值：如果匹配则为1，否则为0 (32位整数)
    return_value = claripy.If(
      user_input_string_bv == check_against_string_concrete, # 条件
      claripy.BVV(1, 32), # 如果为真，eax = 1 (32位)
      claripy.BVV(0, 32)  # 如果为假，eax = 0 (32位)
    )
    state.regs.eax = return_value # 将返回值放入eax

  simulation = project.factory.simgr(initial_state)

  def is_successful(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return b'Good Job.' in stdout_output

  def should_abort(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return b'Try again.' in stdout_output

  simulation.explore(find=is_successful, avoid=should_abort)

  if simulation.found:
    solution_state = simulation.found[0]
    # 输入是由Angr的scanf SimProcedure处理的，所以直接从stdin dump
    print(solution_state.posix.dumps(sys.stdin.fileno()).decode())
  else:
    raise Exception('Could not find the solution')

if __name__ == '__main__':
  main(sys.argv)

解析

设置钩子。

# 在地址0x80486ca设置钩子，替换check_equals_的调用
# 当程序执行到此地址时，angr执行自定义函数skip_check_equals_
# 而不会调用实际的check_equals_
check_equals_called_address = 0x80486ca
instruction_to_skip_length = 5
@project.hook(check_equals_called_address, length=instruction_to_skip_length)
def skip_check_equals_(state):
# 用户输入的内容存在buffer中，buffer地址是0x804a044，长度是16
    user_input_buffer_address = 0x804a044
    user_input_buffer_length = 16
# 加载16字节的符号变量
    user_input_string_bv = state.memory.load(
        user_input_buffer_address,
        user_input_buffer_length
    )
    check_against_string_concrete = 'OSIWHBXIFOQVSBZB'.encode()
# 构造符号化的返回值
# 成功则return_value = claripy.BVV(1,32)
# 失败则return_value = claripy.BVV(0,32)
    return_value = claripy.If(
        user_input_string_bv == check_against_string_concrete,
        claripy.BVV(1, 32),
        claripy.BVV(0, 32)
    )
    state.regs.eax = return_value

加载二进制文件：创建 angr 项目，解析程序。

设置初始状态：从程序入口（main）开始，允许 angr 模拟 scanf 和 complex_function。

钩子处理：

在 check_equals_ 调用处（0x80486ca）设置钩子，跳过其执行。
模拟 check_equals_ 的行为，比较 0x804a044 处的符号输出与目标字符串，设置 eax 为 1（匹配）或 0（不匹配）。

符号执行：

探索所有路径，寻找输出 “Good Job.” 的状态，避免输出 “Try again.” 的状态。

提取输入：从成功状态的 stdin 中提取输入密码，打印结果。

利用angr解混淆

已经有大佬写过了，利用angr符号执行去除虚假控制流，思路很简单，不是活跃状态的基本块全部NOP掉。

思路：https://bbs.kanxue.com/thread-266005.htm

项目代码：https://github.com/bluesadi/debogus

局限也是有的：

依赖于z3约束求解，如果解不出来，则判定为非活跃状态的基本块；
未约束导致路径爆炸。

我的理解

什么是状态、成功状态？

在一开始，符号执行还没开始的时候，需要一个初始状态（寄存器、堆栈、内存等，它们可以被定义成符号），初始状态没有对符号的约束；

而成功状态就是走到了目标地址或者目标行为的状态，此时的寄存器、堆栈、内存等共同构成了成功状态；

在初始状态到成功状态的过程中，angr会为被定义为符号的变量收集约束，正是有了这些约束，程序才可以从初始状态走到成功状态；

在成功状态时，可以调用angr相关的函数，根据约束计算出符号变量的值。

状态的更新与分裂？

每个基本块都有一个状态，每执行一条指令，状态都会对自身进行更新；而当来到条件分支的时候，会基于当前的状态，创建分裂出独立的、并行的多个状态，此时称作产生了新的状态。

angr符号执行与基本块的关系？

基本块的特点是：

只有一个入口点（第一条指令）；
只有一个出口点（最后一条指令，通常是跳转、返回、条件分支）。

angr是一种符号执行框架，它通过将程序的输入（例如标准输入或参数）表示为符号变量（claripy.BVS），模拟程序的所有可能执行路径。angr 跟踪符号变量如何影响程序的状态（寄存器、内存等），并使用约束求解器（通常是 Z3）来确定哪些输入能够引导程序到达特定目标（如某个地址）。

在 angr 的符号执行过程中，基本块是程序执行的“原子单位”，而符号执行是基于这些基本块进行路径探索的。

基本块是执行的单位
- angr 在符号执行时，会将程序分解为基本块（通过静态分析生成 CFG 或动态执行时解析）。每个基本块代表一段连续的指令，angr 会逐块模拟执行。
- 在执行一个基本块时，angr 更新当前状态的寄存器、内存等内容，并根据基本块的出口指令（例如跳转、条件分支、返回）决定下一步的执行路径。
状态分叉与基本块的出口
- 当基本块的最后一条指令是条件分支（例如 if 语句对应的 je、jne 等跳转指令），angr 会根据条件生成多个新的状态（state），分别对应分支的真假路径。
- 例如，假设一个基本块以
  
  if (input[0] == ‘A’)
  
  结束，angr 会：
  - 创建一个状态，约束 input[0] == ‘A’，继续执行真分支。
  - 创建另一个状态，约束 input[0] != ‘A’，继续执行假分支。
- 这些分叉点通常发生在基本块的出口，因此基本块的边界决定了状态分叉的时机。
符号执行跟踪基本块的路径
- angr 的符号执行本质上是沿着 CFG 的路径探索，每个路径由一系列基本块组成。angr 维护一个状态池（通过 SimulationManager，即 simgr），跟踪每个状态对应的执行路径。
- 每个状态在执行一个基本块后，会更新其上下文（例如寄存器、内存），并根据基本块的出口指令跳转到下一个基本块。
约束累积与基本块
- 在执行基本块的过程中，angr 会根据指令的逻辑为符号变量添加约束。例如，如果一个基本块包含比较指令 cmp input[0], ‘A’，angr 会在分支时为符号变量 input[0] 添加约束（如 input[0] == ‘A’ 或 input[0] != ‘A’）。
- 这些约束是符号执行的核心，用于在成功状态中求解具体输入。
成功状态与基本块
- 当 angr 的符号执行到达目标地址（例如 find=print_good_address），对应的状态通常位于某个基本块的入口或内部。成功状态包含了从初始状态到目标地址的完整执行路径（即一系列基本块），以及符号输入的约束集合。
- 通过 solution_state.posix.dumps(sys.stdin.fileno())，angr 求解这些约束，得到引导程序到达目标基本块的输入值。

同时，angr 不会为每条指令生成新状态（而是基于原状态进行更新），而在基本块的边界（特别是条件分支）生成新状态。状态分叉发生在基本块的出口，基于分支条件。