Handle function calls during static analysis in angr

25 February 2021

angr, development, binary analysis

On the research project I work on at SEFCOM, I use angr to statically analyse binary programs.

Incidentally, I was invited to give a presentation as part of CSE545 during the Fall semester of 2020 at ASU. This talk was meant to be a hands-on introduction to data-flow analysis, using angr, to find “taint-style” vulnerabilities 1 in binaries. Thanks to one of the class’s TAs, the video recording is available. Furthermore, I published the slides of the presentation, as well as the illustrating code examples.

Some of the examples do not work “as is”. This means that people trying to reproduce them need some extra elbow grease. Sadly, it can be a somewhat tedious (and painful) process: angr is not really stable (its API evolves wildly depending on the needs of the people working on it), and the documentation is helpful, but not sufficient on its own.

This post aims to bridge the gap for those who would like to get similar examples working. In particular, by answering the question:

How to write a function handler to simulate the effect of a function on the state of the analysis?

This post is divided into four sections:

If you already know what “function handler” means, you can skip the Context section, and start reading from the Usage and description section.

Context

At a high level, we can use a static analysis to gather data-flow facts about the variables of programs without executing them. To do so, such an analysis interprets, more or less sequentially, the effects of the program’s statements on the state it keeps track of 2.

But what if such a statement is a function call?

Well, the analysis could continue on the statements of the targeted function, and then jump back to where it was once the function returns.

… And what if this function is an external function? For example provided by a dynamically linked library?

Ah! In such a case, the statements that make up the content of the targeted function (its implementation in the binary) are not directly available for analysis.

One thing to note is that we don’t really want to analyse an external library as part of the process: we want to focus on the binary at hand, and prefer to avoid spending resources (computing time and memory) tracking what happens “outside” of it…

Most of the time though, we “know” what a library function does. Here are examples of what we “know” about a couple of libc functions:

From the program perspective, and thus the analysis perspective, these functions are black boxes: their implementation details remain hidden. However, we are only interested in the effect such functions have on the state of the system when the program is running; or, from the analysis perspective, in the effect they have on the representation of this state.

So?

What we need in both cases is a mechanism to produce the effect of a function on the state representation managed by the analysis. This is achieved using function handlers.

In the first case (local function), a function handler should drive the analysis into the called function, and return adequately; in the second case (external function), a function handler should update the analysis state according to the “known” function behavior.
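To make the second case more tangible, here is a toy sketch in plain Python. It is not angr code: the "state" is just a dictionary telling whether a variable may be influenced by user input, and `model_malloc` and `model_strcpy` are hypothetical helpers written for this illustration only. The point is that each one updates the state from what we "know" about the function, without ever looking at its implementation.

```python
# Toy model, NOT angr's API: the abstract state maps a variable name to
# a boolean meaning "may be influenced by user input".

def model_malloc(state, result):
    """malloc(...): the returned buffer is fresh, hence not tainted."""
    new_state = dict(state)
    new_state[result] = False
    return new_state


def model_strcpy(state, dst, src):
    """strcpy(dst, src): dst receives whatever src held, taint included."""
    new_state = dict(state)
    new_state[dst] = state.get(src, False)
    return new_state


# The program's first argument comes from the user.
state = {"argv1": True}
state = model_malloc(state, "buf")
state = model_strcpy(state, "buf", "argv1")

print(state)  # {'argv1': True, 'buf': True}
```

After the modelled `strcpy`, `buf` is marked as user-influenced, which is the kind of fact a "taint-style" analysis is after.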

Usage and description

We are implementing our analysis using angr’s ReachingDefinitionsAnalysis. As described in the documentation, it takes an optional function_handler parameter.

To work, what is passed via function_handler needs to inherit from the FunctionHandler abstract base class. As you can see in the documentation of FunctionHandler, it means that the given function_handler must have the following methods:

Those are the minimal requirements for a function_handler to have.

Then, for ReachingDefinitionsAnalysis to be able to deal with say printf, malloc, or strcpy, we would add the corresponding methods: handle_printf, handle_malloc, and handle_strcpy to the concrete class inheriting from FunctionHandler. For example, such a concrete class MyHandlers, would produce instances exposing handle_printf, that will be called during the analysis when a call to printf is encountered in the binary (and respectively handle_malloc, handle_strcpy for calls to malloc, strcpy) 3.
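The naming convention can be pictured with the following toy dispatcher. This is only a sketch of the idea in plain Python, under the assumption that the lookup is done by method name; it is not the code angr runs, and `ToyHandler`, `dispatch`, and `fallback` are names made up for the illustration.

```python
class ToyHandler:
    """Hypothetical handler: one `handle_<name>` method per modelled function."""

    def handle_printf(self, trace):
        return trace + ["printf modelled"]

    def handle_malloc(self, trace):
        return trace + ["malloc modelled"]

    def fallback(self, trace, name):
        # Made-up catch-all for functions without a dedicated method.
        return trace + [f"{name} skipped"]


def dispatch(handler, function_name, trace):
    """Look up `handle_<function_name>` on the handler, and call it if present."""
    method = getattr(handler, f"handle_{function_name}", None)
    if method is None:
        return handler.fallback(trace, function_name)
    return method(trace)


handler = ToyHandler()
trace = []
for called in ["malloc", "strlen", "printf"]:
    trace = dispatch(handler, called, trace)

print(trace)  # ['malloc modelled', 'strlen skipped', 'printf modelled']
```

Adding support for a new library function then boils down to adding one more `handle_<name>` method to the class.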

To recap, and because the terminology is somewhat confusing:

Examples

Let’s see what it looks like in practice.

Binary to analyse

We will analyse the binary produced by command_line_injection.c . Here is how to download and compile the code:

git clone git@github.com:Pamplemousse/bits_of_static_binary_analysis.git
cd bits_of_static_binary_analysis
make

If everything went fine, running ./build/command_line_injection ~/ should list your home directory.

The simplest analysis

The most straightforward analysis starting from the function main looks like the following analysis.py:

from angr import Project

project = Project('./build/command_line_injection', auto_load_libs=False)
cfg = project.analyses.CFGFast(normalize=True, data_references=True)

main_function = project.kb.functions.function(name='main')
program_rda = project.analyses.ReachingDefinitions(
    subject=main_function,
)

# Do something with `program_rda`
...

However, as is, the analysis is intra-procedural: it only runs on the function main. Pleasantly, when executing python analysis.py, angr warns us with the following message: "Please implement the local function handler with your own logic." So we know it encountered a call to a local function, and it felt helpless. Poor angr.

Handle local functions

We can improve analysis.py to give the ReachingDefinitionsAnalysis the necessary handle_local_function that will get triggered when analysing main, precisely on the instruction calling check.

from angr import Project
from angr.analyses.reaching_definitions.function_handler import FunctionHandler


class MyHandler(FunctionHandler):
    def __init__(self):
        self._analysis = None

    def hook(self, rda):
        self._analysis = rda
        return self

    def handle_local_function(self, state, function_address, call_stack, maximum_local_call_depth, visited_blocks,
                              dependency_graph, src_ins_addr=None, codeloc=None):
        function = self._analysis.project.kb.functions.function(function_address)

        # Break point so you can play around with what you have access to here.
        import ipdb; ipdb.set_trace()
        pass

        return True, state, visited_blocks, dependency_graph

project = Project('./build/command_line_injection', auto_load_libs=False)
cfg = project.analyses.CFGFast(normalize=True, data_references=True)

handler = MyHandler()

main_function = project.kb.functions.function(name='main')
program_rda = project.analyses.ReachingDefinitions(
    function_handler=handler,
    observe_all=True,
    subject=main_function
)

# Do something with `program_rda`
...

Running python analysis.py, we now get a shell thanks to the breakpoint placed in the handle_local_function. From there, I invite you to investigate and play around with what you can do; And remember: you have access to a lot of facts gathered by angr through self._analysis.project whether it be .arch, .kb, etc.

Handling external functions

As presented earlier, handlers can also be triggered on calls to library functions, and used to model the effects of code that cannot be directly analysed. In our example, we can see that the function check calls the libc function sprintf.

Here is a new analysis.py that showcases how to make the analysis consider this call, with a richer MyHandler containing a handle_sprintf method.

from angr import Project
from angr.analyses.reaching_definitions.function_handler import FunctionHandler


class MyHandler(FunctionHandler):
    def __init__(self):
        self._analysis = None

    def hook(self, rda):
        self._analysis = rda
        return self

    def handle_local_function(self, state, function_address, call_stack, maximum_local_call_depth, visited_blocks,
                              dependency_graph, src_ins_addr=None, codeloc=None):
        function = self._analysis.project.kb.functions.function(function_address)
        return True, state, visited_blocks, dependency_graph

    def handle_sprintf(self, state, codeloc):
        # Break point so you can play around with what you have access to here.
        import ipdb; ipdb.set_trace()
        pass

        return True, state

project = Project('./build/command_line_injection', auto_load_libs=False)
cfg = project.analyses.CFGFast(normalize=True, data_references=True)

handler = MyHandler()

sprintf_plt_stub = project.kb.functions.function(name='sprintf', plt=True)
program_rda = project.analyses.ReachingDefinitions(
    function_handler=handler,
    observe_all=True,
    subject=sprintf_plt_stub
)

# Do something with `program_rda`
...

Notice that for the sake of simplicity, the analysis is started on the sprintf PLT stub reconstituted by angr. If it were not, this example would hit handle_local_function first, because check has a call instruction pointing to a PLT location, which is not at an external address! In other words, handling external functions that are called through the PLT mechanics requires starting a ReachingDefinitionsAnalysis on the targeted PLT stub, with the proper handler.

Ideally, we would like to start the analysis on the function check, and expect the handle_sprintf to be called sometime: In particular, the analysis should use handle_local_function to point the analysis at the PLT stub, which in turn should end up triggering the handle_sprintf.

Coincidentally, this is a special case of a more generic problem: How to perform inter-procedural analysis?

One step beyond: Inter-procedural analysis

With real world programs, it is very unlikely that all the responses to analysts’ questions are waiting at a shallow level. Most of the time, we want to start the analysis from the entrypoint of the binary, and expect it to carry on across function calls until we get the information we were looking for. In our example, this means starting the ReachingDefinitionsAnalysis on the main function, and expecting it to analyse check, as well as calling handle_sprintf.

Because we want an analysis to run over multiple functions, we need an inter-procedural analysis. Sadly, this is currently not implemented in angr’s main repository!

In the presentation and the corresponding video segment, however, I presented at a “high level” how we can turn angr’s ReachingDefinitionsAnalysis into an inter-procedural analysis.

The idea is to run it recursively: every time a call to a local function is encountered, a “child” ReachingDefinitionsAnalysis is started on the targeted function, and, once finished, the analysis state at its end is “copied” back to the parent, for it to continue from (after the call instruction).
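The recursion described above can be sketched on a toy program, in plain Python and without angr. Everything here is an assumption made for the illustration: the program is a dictionary of made-up statements, `analyse` is a hypothetical function, and the "state" is simply the set of definitions seen so far (definitions are never killed, unlike in a real reaching definitions analysis).

```python
# Toy program, NOT a real binary: each function is a list of statements.
# ("def", var) defines a variable; ("call", name) calls a local function.
PROGRAM = {
    "main": [("def", "a"), ("call", "check"), ("def", "b")],
    "check": [("def", "cmd")],
}


def analyse(function_name, state, depth=0, max_depth=5):
    """Return the set of definitions known at the end of `function_name`."""
    state = set(state)  # work on a copy, the caller keeps its own state
    for kind, operand in PROGRAM[function_name]:
        if kind == "def":
            state.add((function_name, operand))
        elif kind == "call" and depth < max_depth:
            # "Child" analysis: starts from the parent's current state, and
            # its final state becomes the state the parent resumes from.
            state = analyse(operand, state, depth + 1, max_depth)
    return state


result = analyse("main", set())
print(sorted(result))  # [('check', 'cmd'), ('main', 'a'), ('main', 'b')]
```

The `max_depth` parameter plays the role of a safety net against unbounded recursion, for example when functions call each other.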

Its implementation relies on function handlers. In particular, handle_local_function is where the “recursiveness” happens:

Some functions can have several exits (in the case of multiple return statements in the source for example), and thus several output states from the analysis perspective! In such a case, handle_local_function must merge those states together to create a unique one for the parent analysis to resume from.
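As a rough illustration of such a merge, the sketch below (plain Python, not angr's state objects) represents a state as a dictionary from a location to the set of definitions that may reach it, and merges states with a union. The `merge_states` helper, the location names, and the addresses are all made up for the example.

```python
def merge_states(states):
    """Union, location by location, of the definitions found in each state."""
    merged = {}
    for state in states:
        for location, definitions in state.items():
            merged.setdefault(location, set()).update(definitions)
    return merged


# Two hypothetical exits of the same function, each with its own final state.
exit_a = {"rax": {"0x401000"}, "buf": {"0x401010"}}
exit_b = {"rax": {"0x401020"}}

merged = merge_states([exit_a, exit_b])
print(merged["rax"] == {"0x401000", "0x401020"})  # True
```

Using a union keeps the result conservative: after the call, `rax` may hold the value defined at either exit, so the parent analysis has to consider both.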

Conclusion

Function handlers are a handy tool for angr’s static analysis using ReachingDefinitionsAnalysis: they can be leveraged to apply the effect of external functions to the state without having access to their implementation.

By applying the same principle on local functions, they even bring us one step beyond: inter-procedural analysis is nothing more than customization of the analysis behavior (recursiveness, state management, and internal bookkeeping) on call instructions.

Hoping you found those examples enlightening, happy hacking!


  1. By “tainting” a variable taking a value from a user input, and propagating this taint on use, one can find other variables that can be influenced by a user input. Tainted variables being used for sensitive operations (arguments to execve or system, assignment to a buffer of fixed size, etc.) point to potential security vulnerabilities. 

  2. If you want to learn more about how the analysis works, and see a more concrete example of such an analysis, I strongly encourage you to go look at the presentation mentioned above, available on YouTube.

  3. For those interested in the underlying mechanics on the angr side of things, the handler’s instance method is called in angr/analyses/reaching_definitions/engine_vex.py