Handle function calls during static analysis in angr
25 February 2021
angr, development, binary analysis
On the research project I work on at SEFCOM, I use
angr to statically analyse binary programs.
Incidentally, I was invited to give a presentation as part of CSE545 during the Fall semester of 2020 at ASU.
This talk was meant to be a hands-on introduction on data-flow analysis, using
angr, to find “taint-style” vulnerabilities 1 in binaries.
Thanks to one of the class’s TAs, the video recording is available.
Furthermore, I published the slides of the presentation, as well as the illustrating code examples.
Some of the examples do not work “as is”.
This means that people trying to reproduce them need some extra elbow grease.
Sadly, it can be a somewhat tedious (and painful) process:
angr is not really stable (its API evolves wildly depending on the needs of the people working on it), and its documentation is helpful, but not sufficient on its own.
This post aims to bridge the gap for those who would like to get similar examples working. In particular, by answering the question:
How to write a function handler to simulate the effect of a function on the state of the analysis?
This post is divided into five sections:
- Context: Presentation of the analysis, and problems encountered;
- Usage and description: Runthrough of the documentation, implementation requirements and first thoughts;
- Examples: Examples of handlers for a local and an external function;
- One step beyond: Inter-procedural analysis: Discussion and high-level overview of turning
ReachingDefinitionsAnalysis inter-procedural using function handlers;
- Conclusion: A closing proclamation.
If you already know what “function handler” means, you can skip the Context section, and start reading from the Usage and description section.
At a high level, we can use a static analysis to gather data-flow facts about the variables of programs without executing them. To do so, such an analysis interprets, somewhat sequentially, the effects of the program’s statements on the state it keeps track of 2.
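To make "interpreting the effects of statements on a state" more concrete, here is a minimal sketch of a reaching-definitions pass over straight-line code. It is a toy model: the `Statement` and `State` shapes and the `interpret` function are made up for illustration, and are not angr's data structures.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical shapes, for illustration only.
# A statement is (line_number, assigned_variable): all that a
# reaching-definitions pass needs to know about an assignment.
Statement = Tuple[int, str]
# The state maps each variable to the lines whose definition may reach this point.
State = Dict[str, Set[int]]


def interpret(statements: List[Statement]) -> State:
    """Sequentially apply the effect of each statement on the state."""
    state: State = {}
    for line, variable in statements:
        # A new assignment "kills" the previous definitions of the same variable.
        state[variable] = {line}
    return state


program = [(1, "x"), (2, "y"), (3, "x")]
final_state = interpret(program)
```

At the end of `program`, the definition of `x` at line 1 no longer reaches: it has been overwritten by the one at line 3.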
But what if such a statement is a function call?
Well, the analysis could continue on the statements of the targeted function, and then jump back to where it was once the function returns.
… And what if this function is an external function? For example provided by a dynamically linked library?
Ah! In such case, the statements that make up the content of the targeted function (its implementation in the binary) are not directly available for analysis.
One thing to note is that we don’t really want to analyse an external library as part of the process: we want to focus on the binary at hand, and prefer to avoid spending resources (computing time and memory) tracking what happens “outside” of it…
Most of the time though, we “know” what a library function does. Here are examples of what we “know” about a couple of libc functions:
- printf: Uses several parameters to deterministically compose a string, and writes it to stdout;
- malloc: Allocates a chunk of memory of a size determined by its first parameter, and returns a pointer to it;
- strcpy: Copies the content of its second parameter into the memory area pointed to by its first parameter.
From the program perspective, and thus the analysis perspective, these functions are black boxes: their implementation details remain hidden. However, we are only interested in the effect such functions have on the state of the system when the program is running or, from the analysis perspective, in the effect they have on the representation of this state.
What we need in both cases is a mechanism to produce the effect of a function on the state representation managed by the analysis. This is achieved using function handlers.
In the first case (local function), a function handler should drive the analysis to the function called, and return adequately; In the second case (external function), a function handler should update the analysis state respecting the “known” function behavior.
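As an illustration of the second case, here is a small sketch of what "updating the analysis state respecting the known function behavior" could look like for `strcpy` and `malloc`. The state here is a toy (a dictionary from location names to taint labels), and `model_strcpy` and `model_malloc` are hypothetical helpers, not angr functions.

```python
from typing import Dict, Set

# Toy analysis state: maps a location name to the set of taint labels it carries.
State = Dict[str, Set[str]]


def copy_state(state: State) -> State:
    """Copy the state so that a model never mutates its input."""
    return {location: set(labels) for location, labels in state.items()}


def model_strcpy(state: State, dst: str, src: str) -> State:
    """Effect of strcpy(dst, src): dst now holds whatever src held."""
    new_state = copy_state(state)
    new_state[dst] = set(state.get(src, set()))
    return new_state


def model_malloc(state: State, result: str) -> State:
    """Effect of malloc(size): the returned chunk is fresh, so it carries no taint."""
    new_state = copy_state(state)
    new_state[result] = set()
    return new_state


state = {"argv[1]": {"user_input"}}
state = model_malloc(state, "buffer")
state = model_strcpy(state, "buffer", "argv[1]")
```

Neither model looks at the implementation of the function: each one only encodes what we "know" about its effect, which is the whole point of handling external functions this way.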
Usage and description
We are implementing our analysis using ReachingDefinitionsAnalysis.
As described in the documentation, it takes an optional function_handler parameter.
To work, what is passed via
function_handler needs to inherit from the
FunctionHandler abstract base class.
As you can see in the documentation of
FunctionHandler, it means that the given
function_handler must have the following methods:
- hook: A means for the handler to get a reference to an analysis, so that it can access information about its context (architecture, facts gathered in the knowledge base, etc.). In particular,
ReachingDefinitionsAnalysis calls it at initialisation;
- handle_local_function: The method the analysis runs when it encounters a call to a local function.
Those are the minimal requirements for a
function_handler to have.
For ReachingDefinitionsAnalysis to be able to deal with, say, printf and
strcpy, we would add the corresponding methods, handle_printf and
handle_strcpy, to the concrete class inheriting from FunctionHandler.
For example, such a concrete class,
MyHandlers, would produce instances exposing
handle_printf, which will be called during the analysis when a call to
printf is encountered in the binary (and respectively
handle_strcpy for calls to strcpy).
To recap, and because the terminology is somewhat confusing:
- A “function handler” is a (Python) method that will be called by the analysis when encountering a call;
- FunctionHandler is an abstract base class (ABC) that describes what a concrete class (say MyHandlers) needs to have to work with ReachingDefinitionsAnalysis;
- function_handler is the name of the parameter to pass to ReachingDefinitionsAnalysis; it is an instance of MyHandlers, and thus of FunctionHandler, exposing “function handlers”.
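The three notions from the recap can be sketched in plain Python. This is only a model of the naming convention described above: `ToyFunctionHandler` and `dispatch` are hypothetical stand-ins written for this illustration, and say nothing about how angr itself looks handlers up.

```python
from abc import ABC, abstractmethod


class ToyFunctionHandler(ABC):
    """Hypothetical stand-in for the role played by angr's FunctionHandler."""

    @abstractmethod
    def hook(self, analysis):
        """Give the handler a reference to the analysis using it."""

    @abstractmethod
    def handle_local_function(self, state, function_name):
        """Called for calls to functions living in the binary."""


class MyHandlers(ToyFunctionHandler):
    def hook(self, analysis):
        self._analysis = analysis
        return self

    def handle_local_function(self, state, function_name):
        return state + [f"local:{function_name}"]

    def handle_printf(self, state):
        # A "function handler": one method per modelled external function.
        return state + ["external:printf"]


def dispatch(handlers, state, function_name, is_external):
    """Toy dispatcher: external calls go to `handle_<name>` when it exists."""
    if is_external:
        handler = getattr(handlers, f"handle_{function_name}", None)
        if handler is None:
            return state  # No model available: leave the state untouched.
        return handler(state)
    return handlers.handle_local_function(state, function_name)


# `function_handler` is the instance that would be given to the analysis.
function_handler = MyHandlers().hook(analysis=None)
state = dispatch(function_handler, [], "printf", is_external=True)
state = dispatch(function_handler, state, "check", is_external=False)
state = dispatch(function_handler, state, "strcpy", is_external=True)
```

In this sketch the call to `strcpy` leaves the state unchanged, because `MyHandlers` does not define a `handle_strcpy`.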
Let’s see what it looks like in practice.
Binary to analyse
We will analyse the binary produced by command_line_injection.c. Here is how to download and compile the code:
    git clone firstname.lastname@example.org:Pamplemousse/bits_of_static_binary_analysis.git
    cd bits_of_static_binary_analysis
    make
If everything went fine, running
./build/command_line_injection ~/ should list your home directory.
The simplest analysis
The most straightforward analysis starting from the function
main looks like the following:

    from angr import Project

    project = Project('./build/command_line_injection', auto_load_libs=False)
    cfg = project.analyses.CFGFast(normalize=True, data_references=True)

    main_function = project.kb.functions.function(name='main')
    program_rda = project.analyses.ReachingDefinitions(
        subject=main_function,
    )

    # Do something with `program_rda` ...
However, as is, the analysis is intra-procedural: it only runs on the function main.
Pleasantly, when executing the script,
angr warns us with the following message:
"Please implement the local function handler with your own logic." So we know it encountered a
call to a local function, and felt helpless.
Handle local functions
We can improve
analysis.py to give the
ReachingDefinitionsAnalysis the necessary
handle_local_function that will get triggered when analysing
main, precisely on the instruction calling check:
    from angr import Project
    from angr.analyses.reaching_definitions.function_handler import FunctionHandler


    class MyHandler(FunctionHandler):
        def __init__(self):
            self._analysis = None

        def hook(self, rda):
            # Keep a reference to the analysis, to access its context later on.
            self._analysis = rda
            return self

        def handle_local_function(self, state, function_address, call_stack,
                                  maximum_local_call_depth, visited_blocks,
                                  dependency_graph, src_ins_addr=None, codeloc=None):
            function = self._analysis.project.kb.functions.function(function_address)

            # Break point so you can play around with what you have access to here.
            import ipdb; ipdb.set_trace()
            pass

            return True, state, visited_blocks, dependency_graph


    project = Project('./build/command_line_injection', auto_load_libs=False)
    cfg = project.analyses.CFGFast(normalize=True, data_references=True)

    handler = MyHandler()

    main_function = project.kb.functions.function(name='main')
    program_rda = project.analyses.ReachingDefinitions(
        function_handler=handler,
        observe_all=True,
        subject=main_function
    )

    # Do something with `program_rda` ...
Running
python analysis.py, we now get a shell thanks to the breakpoint placed in handle_local_function.
From there, I invite you to investigate and play around with what you can do.
And remember: you have access to a lot of gathered facts through
self._analysis.project.
Handling external functions
As presented earlier, handlers can also be triggered on calls to library functions, and used to model the effects of code that cannot be directly analysed.
In our example, we can see that the function
check calls the libc function sprintf.
Here is a new
analysis.py that showcases how to have the analysis consider this call,
with a richer
MyHandler containing a handle_sprintf:
    from angr import Project
    from angr.analyses.reaching_definitions.function_handler import FunctionHandler


    class MyHandler(FunctionHandler):
        def __init__(self):
            self._analysis = None

        def hook(self, rda):
            # Keep a reference to the analysis, to access its context later on.
            self._analysis = rda
            return self

        def handle_local_function(self, state, function_address, call_stack,
                                  maximum_local_call_depth, visited_blocks,
                                  dependency_graph, src_ins_addr=None, codeloc=None):
            function = self._analysis.project.kb.functions.function(function_address)
            return True, state, visited_blocks, dependency_graph

        def handle_sprintf(self, state, codeloc):
            # Break point so you can play around with what you have access to here.
            import ipdb; ipdb.set_trace()
            pass

            return True, state


    project = Project('./build/command_line_injection', auto_load_libs=False)
    cfg = project.analyses.CFGFast(normalize=True, data_references=True)

    handler = MyHandler()

    sprintf_plt_stub = project.kb.functions.function(name='sprintf', plt=True)
    program_rda = project.analyses.ReachingDefinitions(
        function_handler=handler,
        observe_all=True,
        subject=sprintf_plt_stub
    )

    # Do something with `program_rda` ...
Notice that, to keep the example simple, the analysis is started on the
sprintf PLT stub reconstituted by angr.
If it were not, this example would hit
handle_local_function first, because
check has a
call instruction pointing to a PLT location, which is not at an external address!
In other words, handling external functions that are called through the PLT requires starting a
ReachingDefinitionsAnalysis on the targeted PLT stub, with the proper handler.
Ideally, we would like to start the analysis on the function
check, and expect
handle_sprintf to be called at some point.
In particular, the analysis should use
handle_local_function to point the analysis at the PLT stub, which in turn should end up triggering handle_sprintf.
Coincidentally, this is a special case of a more generic problem: How to perform inter-procedural analysis?
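The routing described above (a local call landing on a PLT stub, which then triggers the model of the external function) can be sketched without angr. Everything below is hypothetical: `ToyFunction`, the addresses, and the list-of-strings state only exist to show the control flow between the two handlers.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToyFunction:
    """Hypothetical, simplified view of a function known to the analysis."""
    name: str
    is_plt: bool


class MyHandler:
    def __init__(self, functions: Dict[int, ToyFunction]):
        self._functions = functions

    def handle_local_function(self, state: List[str], function_address: int) -> List[str]:
        function = self._functions[function_address]
        if function.is_plt:
            # The call targets a PLT stub: hand over to the model of the
            # external function, when one exists.
            external_handler = getattr(self, f"handle_{function.name}", None)
            if external_handler is not None:
                return external_handler(state)
            return state
        # Here, a real handler would analyse the body of the local function.
        return state + [f"analysed {function.name}"]

    def handle_sprintf(self, state: List[str]) -> List[str]:
        return state + ["applied sprintf model"]


# Made-up addresses for the two functions of the example.
functions = {
    0x1000: ToyFunction("check", is_plt=False),
    0x2000: ToyFunction("sprintf", is_plt=True),
}
handler = MyHandler(functions)
state = handler.handle_local_function([], 0x1000)
state = handler.handle_local_function(state, 0x2000)
```

The design choice here is to keep `handle_local_function` as the single entry point for every `call`, and let it decide whether the target deserves a proper analysis or a model.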
One step beyond: Inter-procedural analysis
With real-world programs, it is very unlikely that all the answers to an analyst’s questions are waiting at a shallow level.
Most of the time, we want to start the analysis from the entrypoint of the binary, and expect it to carry on across function calls until we get the information we were looking for.
In our example, this means starting the
ReachingDefinitionsAnalysis on the
main function, and expecting it to analyse
check, as well as calling handle_sprintf.
Because we want an analysis to run over multiple functions, we need an inter-procedural analysis.
Sadly, this is currently not implemented in
angr’s main repository!
However, in the presentation and the corresponding video segment, I explained at a “high level” how we can turn
ReachingDefinitionsAnalysis into an inter-procedural analysis.
The idea is to run it recursively: every time a
call to a local function is encountered, a “child”
ReachingDefinitionsAnalysis is started on the targeted function, and, once finished, the analysis state at its end is “copied” back to the parent, for it to continue from (after the call).
Its implementation relies on function handlers.
handle_local_function is where the “recursiveness” happens:
- It starts the child ReachingDefinitionsAnalysis on the targeted function, with proper parameters (among others: passing the current kb, and initialising the child with the parent state using the init_state parameter);
- It updates the parent’s .observed_results when the child returns, for the parent to be aware of what was captured during the child’s run;
- It returns the state (which contains the current live_definitions) for the parent to continue from, as well as the other structures the analysis records (visited_blocks and dependency_graph in the handler signature shown earlier).
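Here is a miniature of that recursion, written against a toy program rather than angr. `PROGRAM`, `ToyAnalysis` and the statement tuples are all hypothetical. The names `init_state` and `observed_results` are borrowed from the text to make the parallel easier to follow, but this is not the real implementation, and it has no guard against recursive calls.

```python
from typing import Dict, List, Set, Tuple

Statement = Tuple[str, ...]   # ("def", variable, label) or ("call", function_name)
State = Dict[str, Set[str]]   # variable -> labels of the definitions reaching here

# A made-up program: `main` defines x, calls `check`, then defines y.
PROGRAM: Dict[str, List[Statement]] = {
    "main": [("def", "x", "main:1"), ("call", "check"), ("def", "y", "main:3")],
    "check": [("def", "x", "check:1"), ("def", "z", "check:2")],
}


def copy_state(state: State) -> State:
    return {variable: set(labels) for variable, labels in state.items()}


class ToyAnalysis:
    """Hypothetical miniature of a reaching-definitions run on one function."""

    def __init__(self, function_name: str, init_state: State):
        self.observed_results: Dict[str, State] = {}
        self.state = copy_state(init_state)
        for statement in PROGRAM[function_name]:
            if statement[0] == "def":
                _, variable, label = statement
                self.state[variable] = {label}
            else:
                self.state = self.handle_local_function(self.state, statement[1])
        # Record the state at the end of the function.
        self.observed_results[function_name] = copy_state(self.state)

    def handle_local_function(self, state: State, function_name: str) -> State:
        # Start a "child" analysis from the parent's current state...
        child = ToyAnalysis(function_name, init_state=state)
        # ...make what the child observed visible to the parent...
        self.observed_results.update(child.observed_results)
        # ...and give back the child's final state for the parent to continue from.
        return child.state


analysis = ToyAnalysis("main", init_state={})
```

After the run, the definition of `x` made in `main` has been replaced by the one made in `check`, which an intra-procedural run on `main` alone would have missed.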
Some functions can have several exits (in the case of multiple
return statements in the source for example), and thus several output states from the analysis perspective!
In such a case,
handle_local_function must merge those states together to create a unique one for the parent analysis to resume from.
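A minimal sketch of such a merge, assuming a toy state that maps each variable to the set of definitions that may reach it: for a "may" analysis like reaching definitions, the usual way to join states is to take the union per variable. `merge_states` and the exit states below are hypothetical. How angr merges its own state objects is not covered here.

```python
from typing import Dict, Iterable, Set

State = Dict[str, Set[str]]


def merge_states(states: Iterable[State]) -> State:
    """Over-approximate several exit states: per variable, keep the union of definitions."""
    merged: State = {}
    for state in states:
        for variable, definitions in state.items():
            # A fresh set is created per variable, so the inputs are never mutated.
            merged.setdefault(variable, set()).update(definitions)
    return merged


# Two made-up exit states of `check`, one per `return` statement.
exit_on_error = {"result": {"check:4"}}
exit_on_success = {"result": {"check:9"}, "buffer": {"check:7"}}
merged = merge_states([exit_on_error, exit_on_success])
```

The merged state is less precise than either exit state taken alone, but it is safe for the parent to resume from: whichever `return` was taken at runtime, the definitions that actually reach are in it.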
Function handlers are a handy tool for
angr’s static analysis using
ReachingDefinitionsAnalysis: they can be leveraged to apply the effects of external functions to the state without having access to their implementation.
By applying the same principle to local functions, they even bring us one step beyond: inter-procedural analysis is nothing more than a customization of the analysis behavior (recursiveness, state management, and internal bookkeeping) on function calls.
Hoping you found those examples enlightening, happy hacking!
1. By “tainting” a variable taking a value from a user input, and propagating this taint on use, one can find other variables that can be influenced by a user input. Tainted variables being used for sensitive operations (arguments to
system, assignment to a buffer of fixed size, etc.) point to potential security vulnerabilities.
2. If you want to learn more about how the analysis works, and see a more concrete example of such an analysis, I strongly encourage you to go look at the presentation mentioned above, available on YouTube.
3. For those interested in the underlying mechanics on the
angr side of things, the handler’s instance method is called in angr/analyses/reaching_definitions/engine_vex.py.