<p><strong>Handle function calls during static analysis in angr</strong> (Pamplemousse, 2021-02-25)</p>
<p>On the research project I work on at <a href="https://sefcom.asu.edu/">SEFCOM</a>, I use <a href="https://angr.io/"><code class="language-plaintext highlighter-rouge">angr</code></a> to statically analyse binary programs.</p>
<p>Incidentally, I was invited to give a presentation as part of <a href="https://cse545.tiffanybao.com/">CSE545</a> during the Fall semester of 2020 at <a href="https://www.asu.edu/">ASU</a>.
This talk was meant to be a hands-on introduction to data-flow analysis, using <a href="https://angr.io/"><code class="language-plaintext highlighter-rouge">angr</code></a>, to find “taint-style” vulnerabilities <sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> in binaries.
Thanks to one of the class’s TAs, the <a href="https://www.youtube.com/watch?v=4SMRnpuqN6E&start=490">video recording</a> is available.
Furthermore, I published <a href="https://docs.google.com/presentation/d/13SDNRKHblo2xenczp9m6rQahigtwygmUcrBhZ-G3gvo">the slides of the presentation</a>, as well as <a href="https://github.com/Pamplemousse/bits_of_static_binary_analysis/">the illustrating code examples</a>.</p>
<p>Some of the examples do not work “as is”.
This means that anyone trying to reproduce them needs to apply some extra elbow grease.
Sadly, this can be a tedious (and painful) process: <a href="https://angr.io/"><code class="language-plaintext highlighter-rouge">angr</code></a> is not really stable (its API changes frequently, driven by the needs of the people working on it), and the <a href="https://angr.io/api-doc/angr.html">documentation</a> is helpful, but not self-sufficient.</p>
<p>This post aims to bridge the gap for those who would like to get similar examples working.
In particular, by answering the question:</p>
<p><strong>How to write a function handler to simulate the effect of a function on the state of the analysis?</strong></p>
<p>This post is divided into five sections:</p>
<ul>
<li><a href="#context">Context</a>: Presentation of the analysis, and problems encountered;</li>
<li><a href="#usage-and-description">Usage and description</a>: Runthrough of the documentation, implementation requirements and first thoughts;</li>
<li><a href="#examples">Examples</a>: Examples of handlers for a local and an external function;</li>
<li><a href="#one-step-beyond-inter-procedural-analysis">One step beyond: Inter-procedural analysis</a>: Discussion and high-level overview of turning <code class="language-plaintext highlighter-rouge">ReachingDefinitionsAnalysis</code> inter-procedural using function handlers;</li>
<li><a href="#conclusion">Conclusion</a>: A closing proclamation.</li>
</ul>
<p>If you already know what “function handler” means, you can skip the <a href="#context">Context</a> section, and start reading from the <a href="#usage-and-description">Usage and description</a> section.</p>
<h1 id="context">Context</h1>
<p>At a high level, we can use a static analysis to gather data-flow facts about the variables of programs without executing them.
To do so, such an analysis interprets, more or less sequentially, the effects of the program’s statements on the state it keeps track of <sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>.</p>
<h4 id="but-what-if-such-a-statement-is-a-function-call">But what if such a statement is a function call?</h4>
<p>Well, the analysis could continue on the statements of the targeted function, and then jump back to where it was once the function returns.</p>
<h4 id="-and-what-if-this-function-is-an-external-function-for-example-provided-by-a-dynamically-linked-library">… And what if this function is an external function? For example provided by a dynamically linked library?</h4>
<p>Ah! In such a case, the statements that make up the content of the targeted function (its implementation in the binary) are not directly available for analysis.</p>
<p>One thing to note is that we don’t really want to analyse an external library as part of the process:
We want to focus on the binary at hand, and prefer to avoid spending resources (computing time and memory) tracking what happens “outside” of it…</p>
<p>Most of the time though, we “know” what a library function does.
Here are examples of what we “know” about a couple of libc functions:</p>
<ul>
<li><a href="https://linux.die.net/man/3/printf"><code class="language-plaintext highlighter-rouge">printf</code></a>: Uses several parameters to deterministically compose a string, and write it to <code class="language-plaintext highlighter-rouge">stdout</code>;</li>
<li><a href="https://linux.die.net/man/3/malloc"><code class="language-plaintext highlighter-rouge">malloc</code></a>: Allocates a chunk of memory whose size is determined by its first parameter, and returns a pointer to it;</li>
<li><a href="https://linux.die.net/man/3/strcpy"><code class="language-plaintext highlighter-rouge">strcpy</code></a>: Copies the content of its second parameter into the memory area pointed to by its first parameter.</li>
</ul>
<p>From the program perspective, and thus the analysis perspective, these functions are black boxes: their implementation details remain hidden.
However, we are only interested in the effect such functions have on the state of the system when the program is running;
or rather, from the analysis perspective, in the effect they have on the representation of this state.</p>
<h4 id="so">So?</h4>
<p>What we need in both cases is <strong>a mechanism to produce the effect of a function on the state representation managed by the analysis</strong>.
This is achieved using <strong>function handlers</strong>.</p>
<p>In the first case (local function), a function handler should drive the analysis to the function called, and return adequately;
In the second case (external function), a function handler should update the analysis state respecting the “known” function behavior.</p>
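<p>To make the second case more concrete, here is a minimal sketch that does not use <code class="language-plaintext highlighter-rouge">angr</code> at all. It assumes a toy state: a dictionary mapping a location name to the set of origins its content may come from. The function names, location names and addresses are hypothetical, chosen only for illustration.</p>

```python
# Toy model of "applying the known effect of a library function to the state".
# The state maps a location name to the set of origins its content may come from.
# Everything here is hypothetical and unrelated to angr's real data structures.


def summarize_malloc(state, result, call_site):
    """malloc(n): the returned pointer is a fresh definition created at `call_site`."""
    new_state = dict(state)
    new_state[result] = {call_site}
    return new_state


def summarize_strcpy(state, dst, src):
    """strcpy(dst, src): whatever `src` holds now also reaches `dst`."""
    new_state = dict(state)
    new_state[dst] = set(state.get(src, set()))
    return new_state


# The analysis never looks inside libc: it only applies the summaries.
state = {"argv[1]": {"user_input"}}
state = summarize_malloc(state, "buf", "malloc@0x401000")
state = summarize_strcpy(state, "*buf", "argv[1]")
print(state["*buf"])
```

<p>The point of the sketch is that each summary only needs the state before the call and the call’s arguments; the implementation of the library function is never consulted.</p>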
<h1 id="usage-and-description">Usage and description</h1>
<p>We are implementing our analysis using <code class="language-plaintext highlighter-rouge">angr</code>’s <code class="language-plaintext highlighter-rouge">ReachingDefinitionsAnalysis</code>.
As described in <a href="https://angr.io/api-doc/angr.html?highlight=cfg#angr.analyses.reaching_definitions.reaching_definitions.ReachingDefinitionsAnalysis">the documentation</a>, it takes an optional <code class="language-plaintext highlighter-rouge">function_handler</code> parameter.</p>
<p>To work, what is passed via <code class="language-plaintext highlighter-rouge">function_handler</code> needs to inherit from the <code class="language-plaintext highlighter-rouge">FunctionHandler</code> <a href="https://docs.python.org/3/glossary.html#term-abstract-base-class">abstract base class</a>.
As you can see in <a href="https://angr.io/api-doc/angr.html?highlight=cfg#angr.analyses.reaching_definitions.function_handler.FunctionHandler">the documentation of <code class="language-plaintext highlighter-rouge">FunctionHandler</code></a>, this means that the given <code class="language-plaintext highlighter-rouge">function_handler</code> must have the following methods:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">hook</code>: A means for the handler to hold a reference to the analysis, so that it can access information about its context (architecture, facts gathered in the knowledge base, etc.).
In particular, <a href="https://github.com/angr/angr/blob/0558f5758814cf3f17912b0621f4adb8d0f92240/angr/analyses/reaching_definitions/reaching_definitions.py#L101"><code class="language-plaintext highlighter-rouge">ReachingDefinitionsAnalysis</code> calls it at initialisation</a>;</li>
<li><code class="language-plaintext highlighter-rouge">handle_local_function</code>: The method that the analysis runs when it encounters a call to a local function.</li>
</ul>
<p>These are the minimal requirements a <code class="language-plaintext highlighter-rouge">function_handler</code> must meet.</p>
<p>Then, for <code class="language-plaintext highlighter-rouge">ReachingDefinitionsAnalysis</code> to be able to deal with say <code class="language-plaintext highlighter-rouge">printf</code>, <code class="language-plaintext highlighter-rouge">malloc</code>, or <code class="language-plaintext highlighter-rouge">strcpy</code>, we would add the corresponding methods: <code class="language-plaintext highlighter-rouge">handle_printf</code>, <code class="language-plaintext highlighter-rouge">handle_malloc</code>, and <code class="language-plaintext highlighter-rouge">handle_strcpy</code> to the concrete class inheriting from <code class="language-plaintext highlighter-rouge">FunctionHandler</code>.
For example, instances of such a concrete class <code class="language-plaintext highlighter-rouge">MyHandlers</code> would expose <code class="language-plaintext highlighter-rouge">handle_printf</code>, which will be called during the analysis when a call to <code class="language-plaintext highlighter-rouge">printf</code> is encountered in the binary (and respectively <code class="language-plaintext highlighter-rouge">handle_malloc</code>, <code class="language-plaintext highlighter-rouge">handle_strcpy</code> for calls to <code class="language-plaintext highlighter-rouge">malloc</code>, <code class="language-plaintext highlighter-rouge">strcpy</code>) <sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>.</p>
<p><strong>To recap</strong>, and because the terminology is somewhat confusing:</p>
<ul>
<li>A “function handler” is a (Python) method that will be called by the analysis when encountering a <code class="language-plaintext highlighter-rouge">call</code> instruction;</li>
<li><code class="language-plaintext highlighter-rouge">FunctionHandler</code> is an abstract base class that describes what a concrete class (say <code class="language-plaintext highlighter-rouge">MyHandlers</code>) must implement to work with <code class="language-plaintext highlighter-rouge">angr</code>;</li>
<li><code class="language-plaintext highlighter-rouge">function_handler</code> is the name of the parameter passed to <code class="language-plaintext highlighter-rouge">ReachingDefinitionsAnalysis</code>; its value is an instance of <code class="language-plaintext highlighter-rouge">MyHandlers</code>, and thus of <code class="language-plaintext highlighter-rouge">FunctionHandler</code>, exposing “function handler<strong>S</strong>”.</li>
</ul>
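<p>The naming convention described above (a method called <code class="language-plaintext highlighter-rouge">handle_&lt;name&gt;</code> for each modelled function) can be illustrated with a small stand-alone sketch. This is not <code class="language-plaintext highlighter-rouge">angr</code>’s actual dispatch code: <code class="language-plaintext highlighter-rouge">ToyFunctionHandler</code> and <code class="language-plaintext highlighter-rouge">dispatch_call</code> are hypothetical names, and the state is just a list recording which handler ran.</p>

```python
class ToyFunctionHandler:
    """Stand-in for a concrete handler class (hypothetical, not angr's FunctionHandler)."""

    def handle_local_function(self, state, name):
        # Called for any function whose code lives in the binary itself.
        state.append(f"local:{name}")
        return state

    def handle_sprintf(self, state):
        # Called only for calls to the external function `sprintf`.
        state.append("external:sprintf")
        return state


def dispatch_call(handler, state, callee_name, is_external):
    """Pick the handler method for one call, following the naming convention."""
    if not is_external:
        return handler.handle_local_function(state, callee_name)
    method = getattr(handler, f"handle_{callee_name}", None)
    if method is None:
        # No model was written for this function: leave the state untouched.
        return state
    return method(state)


handler = ToyFunctionHandler()
trace = dispatch_call(handler, [], "check", is_external=False)
trace = dispatch_call(handler, trace, "sprintf", is_external=True)
trace = dispatch_call(handler, trace, "puts", is_external=True)
print(trace)
```

<p>In this sketch, a call to a function without a matching method is silently ignored; a real analysis may prefer to log a warning or apply a conservative default instead.</p>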
<h1 id="examples">Examples</h1>
<p>Let’s see what it looks like in practice.</p>
<h2 id="binary-to-analyse">Binary to analyse</h2>
<p>We will analyse the binary produced by <a href="https://github.com/Pamplemousse/bits_of_static_binary_analysis/blob/main/source/command_line_injection.c">command_line_injection.c</a> .
Here is how to download and compile the code:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone git@github.com:Pamplemousse/bits_of_static_binary_analysis.git
<span class="nb">cd </span>bits_of_static_binary_analysis
make
</code></pre></div></div>
<p>If everything went fine, running <code class="language-plaintext highlighter-rouge">./build/command_line_injection ~/</code> should list your home directory.</p>
<h2 id="the-simplest-analysis">The simplest analysis</h2>
<p>The most straightforward analysis starting from the <a href="https://github.com/Pamplemousse/bits_of_static_binary_analysis/blob/313f93f1fc287a60dbb3e3217c2c4b3a2cbabf9f/source/command_line_injection.c#L11-L14">function <code class="language-plaintext highlighter-rouge">main</code></a> looks like the following <code class="language-plaintext highlighter-rouge">analysis.py</code>:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">angr</span> <span class="kn">import</span> <span class="n">Project</span>
<span class="n">project</span> <span class="o">=</span> <span class="n">Project</span><span class="p">(</span><span class="s">'./build/command_line_injection'</span><span class="p">,</span> <span class="n">auto_load_libs</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="n">cfg</span> <span class="o">=</span> <span class="n">project</span><span class="p">.</span><span class="n">analyses</span><span class="p">.</span><span class="n">CFGFast</span><span class="p">(</span><span class="n">normalize</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">data_references</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">main_function</span> <span class="o">=</span> <span class="n">project</span><span class="p">.</span><span class="n">kb</span><span class="p">.</span><span class="n">functions</span><span class="p">.</span><span class="n">function</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s">'main'</span><span class="p">)</span>
<span class="n">program_rda</span> <span class="o">=</span> <span class="n">project</span><span class="p">.</span><span class="n">analyses</span><span class="p">.</span><span class="n">ReachingDefinitions</span><span class="p">(</span>
    <span class="n">subject</span><span class="o">=</span><span class="n">main_function</span><span class="p">,</span>
<span class="p">)</span>
<span class="c1"># Do something with `program_rda`
</span><span class="p">...</span>
</code></pre></div></div>
<p>However, as is, the analysis is <strong>intra-procedural</strong>: it only runs on the function <code class="language-plaintext highlighter-rouge">main</code>.
Pleasantly, when executing <code class="language-plaintext highlighter-rouge">python analysis.py</code>, <code class="language-plaintext highlighter-rouge">angr</code> warns us with the following message: <code class="language-plaintext highlighter-rouge">"Please implement the local function handler with your own logic."</code>. So we know it encountered a <code class="language-plaintext highlighter-rouge">call</code> to a local function, and felt helpless.
Poor <code class="language-plaintext highlighter-rouge">angr</code>.</p>
<h2 id="handle-local-functions">Handle local functions</h2>
<p>We can improve <code class="language-plaintext highlighter-rouge">analysis.py</code> to give the <code class="language-plaintext highlighter-rouge">ReachingDefinitionsAnalysis</code> the necessary <code class="language-plaintext highlighter-rouge">handle_local_function</code> that will get triggered when analysing <code class="language-plaintext highlighter-rouge">main</code>, precisely on the instruction calling <code class="language-plaintext highlighter-rouge">check</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">angr</span> <span class="kn">import</span> <span class="n">Project</span>
<span class="kn">from</span> <span class="nn">angr.analyses.reaching_definitions.function_handler</span> <span class="kn">import</span> <span class="n">FunctionHandler</span>
<span class="k">class</span> <span class="nc">MyHandler</span><span class="p">(</span><span class="n">FunctionHandler</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">_analysis</span> <span class="o">=</span> <span class="bp">None</span>
    <span class="k">def</span> <span class="nf">hook</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">rda</span><span class="p">):</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">_analysis</span> <span class="o">=</span> <span class="n">rda</span>
        <span class="k">return</span> <span class="bp">self</span>
    <span class="k">def</span> <span class="nf">handle_local_function</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">state</span><span class="p">,</span> <span class="n">function_address</span><span class="p">,</span> <span class="n">call_stack</span><span class="p">,</span> <span class="n">maximum_local_call_depth</span><span class="p">,</span> <span class="n">visited_blocks</span><span class="p">,</span>
                              <span class="n">dependency_graph</span><span class="p">,</span> <span class="n">src_ins_addr</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">codeloc</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
        <span class="n">function</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">_analysis</span><span class="p">.</span><span class="n">project</span><span class="p">.</span><span class="n">kb</span><span class="p">.</span><span class="n">functions</span><span class="p">.</span><span class="n">function</span><span class="p">(</span><span class="n">function_address</span><span class="p">)</span>
        <span class="c1"># Break point so you can play around with what you have access to here.
</span>        <span class="kn">import</span> <span class="nn">ipdb</span><span class="p">;</span> <span class="n">ipdb</span><span class="p">.</span><span class="n">set_trace</span><span class="p">()</span>
        <span class="k">pass</span>
        <span class="k">return</span> <span class="bp">True</span><span class="p">,</span> <span class="n">state</span><span class="p">,</span> <span class="n">visited_blocks</span><span class="p">,</span> <span class="n">dependency_graph</span>
<span class="n">project</span> <span class="o">=</span> <span class="n">Project</span><span class="p">(</span><span class="s">'./build/command_line_injection'</span><span class="p">,</span> <span class="n">auto_load_libs</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="n">cfg</span> <span class="o">=</span> <span class="n">project</span><span class="p">.</span><span class="n">analyses</span><span class="p">.</span><span class="n">CFGFast</span><span class="p">(</span><span class="n">normalize</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">data_references</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">handler</span> <span class="o">=</span> <span class="n">MyHandler</span><span class="p">()</span>
<span class="n">main_function</span> <span class="o">=</span> <span class="n">project</span><span class="p">.</span><span class="n">kb</span><span class="p">.</span><span class="n">functions</span><span class="p">.</span><span class="n">function</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s">'main'</span><span class="p">)</span>
<span class="n">program_rda</span> <span class="o">=</span> <span class="n">project</span><span class="p">.</span><span class="n">analyses</span><span class="p">.</span><span class="n">ReachingDefinitions</span><span class="p">(</span>
    <span class="n">function_handler</span><span class="o">=</span><span class="n">handler</span><span class="p">,</span>
    <span class="n">observe_all</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
    <span class="n">subject</span><span class="o">=</span><span class="n">main_function</span>
<span class="p">)</span>
<span class="c1"># Do something with `program_rda`
</span><span class="p">...</span>
</code></pre></div></div>
<p>Running <code class="language-plaintext highlighter-rouge">python analysis.py</code>, we now get a shell thanks to the breakpoint placed in the <code class="language-plaintext highlighter-rouge">handle_local_function</code>.
From there, I invite you to investigate and play around with what you can do;
And remember: you have access to a lot of facts gathered by <code class="language-plaintext highlighter-rouge">angr</code> through <code class="language-plaintext highlighter-rouge">self._analysis.project</code> whether it be <code class="language-plaintext highlighter-rouge">.arch</code>, <code class="language-plaintext highlighter-rouge">.kb</code>, etc.</p>
<h2 id="handling-external-functions">Handling external functions</h2>
<p>As presented earlier, handlers can also be triggered on calls to library functions, and used to model the effects of code that cannot be directly analysed.
In <a href="https://github.com/Pamplemousse/bits_of_static_binary_analysis/blob/313f93f1fc287a60dbb3e3217c2c4b3a2cbabf9f/source/command_line_injection.c#L7">our example</a>, we can see that the function <code class="language-plaintext highlighter-rouge">check</code> calls the libc function <code class="language-plaintext highlighter-rouge">sprintf</code>.</p>
<p>Here is a new <code class="language-plaintext highlighter-rouge">analysis.py</code> that showcases how to make the analysis consider this <code class="language-plaintext highlighter-rouge">call</code>,
with a richer <code class="language-plaintext highlighter-rouge">MyHandler</code> containing a <code class="language-plaintext highlighter-rouge">handle_sprintf</code> method.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">angr</span> <span class="kn">import</span> <span class="n">Project</span>
<span class="kn">from</span> <span class="nn">angr.analyses.reaching_definitions.function_handler</span> <span class="kn">import</span> <span class="n">FunctionHandler</span>
<span class="k">class</span> <span class="nc">MyHandler</span><span class="p">(</span><span class="n">FunctionHandler</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">_analysis</span> <span class="o">=</span> <span class="bp">None</span>
    <span class="k">def</span> <span class="nf">hook</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">rda</span><span class="p">):</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">_analysis</span> <span class="o">=</span> <span class="n">rda</span>
        <span class="k">return</span> <span class="bp">self</span>
    <span class="k">def</span> <span class="nf">handle_local_function</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">state</span><span class="p">,</span> <span class="n">function_address</span><span class="p">,</span> <span class="n">call_stack</span><span class="p">,</span> <span class="n">maximum_local_call_depth</span><span class="p">,</span> <span class="n">visited_blocks</span><span class="p">,</span>
                              <span class="n">dependency_graph</span><span class="p">,</span> <span class="n">src_ins_addr</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">codeloc</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
        <span class="n">function</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">_analysis</span><span class="p">.</span><span class="n">project</span><span class="p">.</span><span class="n">kb</span><span class="p">.</span><span class="n">functions</span><span class="p">.</span><span class="n">function</span><span class="p">(</span><span class="n">function_address</span><span class="p">)</span>
        <span class="k">return</span> <span class="bp">True</span><span class="p">,</span> <span class="n">state</span><span class="p">,</span> <span class="n">visited_blocks</span><span class="p">,</span> <span class="n">dependency_graph</span>
    <span class="k">def</span> <span class="nf">handle_sprintf</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">state</span><span class="p">,</span> <span class="n">codeloc</span><span class="p">):</span>
        <span class="c1"># Break point so you can play around with what you have access to here.
</span>        <span class="kn">import</span> <span class="nn">ipdb</span><span class="p">;</span> <span class="n">ipdb</span><span class="p">.</span><span class="n">set_trace</span><span class="p">()</span>
        <span class="k">pass</span>
        <span class="k">return</span> <span class="bp">True</span><span class="p">,</span> <span class="n">state</span>
<span class="n">project</span> <span class="o">=</span> <span class="n">Project</span><span class="p">(</span><span class="s">'./build/command_line_injection'</span><span class="p">,</span> <span class="n">auto_load_libs</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="n">cfg</span> <span class="o">=</span> <span class="n">project</span><span class="p">.</span><span class="n">analyses</span><span class="p">.</span><span class="n">CFGFast</span><span class="p">(</span><span class="n">normalize</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">data_references</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">handler</span> <span class="o">=</span> <span class="n">MyHandler</span><span class="p">()</span>
<span class="n">sprintf_plt_stub</span> <span class="o">=</span> <span class="n">project</span><span class="p">.</span><span class="n">kb</span><span class="p">.</span><span class="n">functions</span><span class="p">.</span><span class="n">function</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s">'sprintf'</span><span class="p">,</span> <span class="n">plt</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">program_rda</span> <span class="o">=</span> <span class="n">project</span><span class="p">.</span><span class="n">analyses</span><span class="p">.</span><span class="n">ReachingDefinitions</span><span class="p">(</span>
    <span class="n">function_handler</span><span class="o">=</span><span class="n">handler</span><span class="p">,</span>
    <span class="n">observe_all</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
    <span class="n">subject</span><span class="o">=</span><span class="n">sprintf_plt_stub</span>
<span class="p">)</span>
<span class="c1"># Do something with `program_rda`
</span><span class="p">...</span>
</code></pre></div></div>
<p>Notice that for the sake of example simplicity, the analysis gets started on the <code class="language-plaintext highlighter-rouge">sprintf</code> PLT stub reconstituted by <code class="language-plaintext highlighter-rouge">angr</code>.
If it were not, this example would hit <code class="language-plaintext highlighter-rouge">handle_local_function</code> first, because <code class="language-plaintext highlighter-rouge">check</code> has a <code class="language-plaintext highlighter-rouge">call</code> instruction pointing to a PLT location, which is not an external address!
In other words, handling external functions that are called through the PLT mechanism requires starting a <code class="language-plaintext highlighter-rouge">ReachingDefinitionsAnalysis</code> on the targeted PLT stub, with the proper handler.</p>
<p>Ideally, we would like to start the analysis on the function <code class="language-plaintext highlighter-rouge">check</code>, and expect <code class="language-plaintext highlighter-rouge">handle_sprintf</code> to be called at some point:
In particular, the analysis should use <code class="language-plaintext highlighter-rouge">handle_local_function</code> to point itself at the PLT stub, which in turn should end up triggering <code class="language-plaintext highlighter-rouge">handle_sprintf</code>.</p>
<p>Coincidentally, this is a special case of a more generic problem: How to perform inter-procedural analysis?</p>
<h1 id="one-step-beyond-inter-procedural-analysis"><a href="https://www.youtube.com/watch?v=C9N8piRFVcU">One step beyond</a>: Inter-procedural analysis</h1>
<p>With real-world programs, it is very unlikely that all the answers to analysts’ questions are waiting at a shallow level.
Most of the time, we want to start the analysis from the entrypoint of the binary, and expect it to carry on across function calls until we get the information we were looking for.
In our example, this means starting the <code class="language-plaintext highlighter-rouge">ReachingDefinitionsAnalysis</code> on the <code class="language-plaintext highlighter-rouge">main</code> function, and expecting it to analyse <code class="language-plaintext highlighter-rouge">check</code>, as well as calling <code class="language-plaintext highlighter-rouge">handle_sprintf</code>.</p>
<p>Because we want an analysis to run over multiple functions, we need an <strong>inter-procedural</strong> analysis.
Sadly, this is currently not implemented in <code class="language-plaintext highlighter-rouge">angr</code>’s main repository!</p>
<p><a href="https://docs.google.com/presentation/d/13SDNRKHblo2xenczp9m6rQahigtwygmUcrBhZ-G3gvo/edit#slide=id.g9a7d25ed88_10_3">In the presentation</a>, and <a href="https://www.youtube.com/watch?v=4SMRnpuqN6E&start=2825">the corresponding video segment</a>, I did however present, at a “high level”, how we can turn <code class="language-plaintext highlighter-rouge">angr</code>’s <code class="language-plaintext highlighter-rouge">ReachingDefinitionsAnalysis</code> into an inter-procedural analysis.</p>
<p>The idea is to run it recursively: every time a <code class="language-plaintext highlighter-rouge">call</code> to a local function is encountered, a “child” <code class="language-plaintext highlighter-rouge">ReachingDefinitionsAnalysis</code> is started on the targeted function, and, once finished, the analysis state at its end is “copied” back to the parent, for it to continue from (after the <code class="language-plaintext highlighter-rouge">call</code> instruction).</p>
<p>Its implementation relies on function handlers.
In particular, <code class="language-plaintext highlighter-rouge">handle_local_function</code> is where the “recursiveness” happens:</p>
<ul>
<li>It starts the child <code class="language-plaintext highlighter-rouge">ReachingDefinitionsAnalysis</code> on the targeted function, with proper parameters (passing the current <code class="language-plaintext highlighter-rouge">kb</code>, initialising the child with the parent state using the <code class="language-plaintext highlighter-rouge">init_state</code> parameter, forwarding the <code class="language-plaintext highlighter-rouge">function_handler</code>);</li>
<li>It updates the parent’s <code class="language-plaintext highlighter-rouge">.observed_results</code> when the child returns, for the parent to be aware of what was captured during the child’s run;</li>
<li>It returns the <code class="language-plaintext highlighter-rouge">state</code> (which contains the current <code class="language-plaintext highlighter-rouge">live_definitions</code>) for the parent to continue from, as well as other structures the analysis records (<code class="language-plaintext highlighter-rouge">visited_blocks</code>, <code class="language-plaintext highlighter-rouge">dep_graph</code>).</li>
</ul>
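<p>The three steps above can be sketched on a toy model. The following is not the implementation from the presentation and does not call <code class="language-plaintext highlighter-rouge">angr</code>: <code class="language-plaintext highlighter-rouge">PROGRAM</code> and <code class="language-plaintext highlighter-rouge">ToyAnalysis</code> are hypothetical, a “function” is a list of statements, and the state maps each variable to the site of its latest definition.</p>

```python
# Hypothetical program: each function is a list of (kind, operand) statements.
PROGRAM = {
    "main": [("def", "x"), ("call", "check"), ("def", "y")],
    "check": [("def", "x"), ("def", "z")],
}


class ToyAnalysis:
    """Toy stand-in for a per-function analysis, made inter-procedural by recursion."""

    def __init__(self, function, init_state=None):
        self.function = function
        # The child starts from a copy of the parent state (cf. `init_state`).
        self.state = dict(init_state or {})
        self.observed_results = {}

    def run(self):
        for index, (kind, operand) in enumerate(PROGRAM[self.function]):
            if kind == "def":
                self.state[operand] = (self.function, index)
            elif kind == "call":
                self.state = self.handle_local_function(operand)
            # Record a snapshot of the state after every statement.
            self.observed_results[(self.function, index)] = dict(self.state)
        return self

    def handle_local_function(self, callee):
        # 1. Start a child analysis on the callee, seeded with the parent state.
        child = ToyAnalysis(callee, init_state=self.state).run()
        # 2. Make the parent aware of what the child observed.
        self.observed_results.update(child.observed_results)
        # 3. Hand the child's final state back for the parent to continue from.
        return child.state


rda = ToyAnalysis("main").run()
print(rda.state)
```

<p>This sketch has no guard against recursive functions; a real implementation needs a bound on the call depth, which is presumably what the <code class="language-plaintext highlighter-rouge">maximum_local_call_depth</code> parameter of <code class="language-plaintext highlighter-rouge">handle_local_function</code> is for.</p>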
<p>Some functions can have several exits (in the case of multiple <code class="language-plaintext highlighter-rouge">return</code> statements in the source for example), and thus several output states from the analysis perspective!
In such a case, <code class="language-plaintext highlighter-rouge">handle_local_function</code> must merge those states together to create a single one for the parent analysis to resume from.</p>
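<p>To make the recursion and the merging of exit states more concrete, here is a toy sketch in plain Python. It deliberately does not use <code class="language-plaintext highlighter-rouge">angr</code>’s API: the <code class="language-plaintext highlighter-rouge">Program</code> representation, <code class="language-plaintext highlighter-rouge">analyse</code>, and <code class="language-plaintext highlighter-rouge">merge</code> are hypothetical names made up for the illustration, and a function is simply modelled as one list of statements per exit path.</p>

```python
from typing import Dict, List, Set, Tuple

# Toy model, NOT angr's data structures (all names here are hypothetical).
State = Dict[str, Set[str]]                  # variable -> definitions that may reach this point
Statement = Tuple[str, ...]                  # ("def", variable, label) or ("call", function_name)
Program = Dict[str, List[List[Statement]]]   # function -> one statement list per exit path


def merge(states: List[State]) -> State:
    """Union the definitions of every variable across several exit states."""
    merged: State = {}
    for state in states:
        for variable, labels in state.items():
            merged.setdefault(variable, set()).update(labels)
    return merged


def analyse(program: Program, function: str, init_state: State) -> State:
    """Analyse `function`, starting a child analysis on every call it makes.

    There is no guard against recursive programs: this is only a sketch.
    """
    exit_states = []
    for path in program[function]:
        # Each path starts from its own copy of the initial state.
        state = {variable: set(labels) for variable, labels in init_state.items()}
        for statement in path:
            if statement[0] == "def":
                _, variable, label = statement
                state[variable] = {label}    # a new definition kills the previous ones
            elif statement[0] == "call":
                # The child starts from the parent's state,
                # and the parent resumes from the child's (merged) result.
                state = analyse(program, statement[1], state)
        exit_states.append(state)
    return merge(exit_states)


program: Program = {
    "main": [[("def", "x", "main:1"), ("call", "helper"), ("def", "y", "main:3")]],
    # `helper` has two exits: one redefines x, the other defines z and leaves x alone.
    "helper": [[("def", "x", "helper:1")], [("def", "z", "helper:2")]],
}
result = analyse(program, "main", {})
print(result)
```

<p>After the call, <code class="language-plaintext highlighter-rouge">x</code> may come from either <code class="language-plaintext highlighter-rouge">main:1</code> or <code class="language-plaintext highlighter-rouge">helper:1</code>, because the two exit states of <code class="language-plaintext highlighter-rouge">helper</code> were merged before the parent resumed.</p>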
<h1 id="conclusion">Conclusion</h1>
<p>Function handlers are a handy tool for <code class="language-plaintext highlighter-rouge">angr</code>’s static analysis using <code class="language-plaintext highlighter-rouge">ReachingDefinitionsAnalysis</code>: they can be leveraged to apply the effects of external functions to the state without having access to their implementation.</p>
<p>By applying the same principle on local functions, they even bring us one step beyond: inter-procedural analysis is nothing more than customization of the analysis behavior (recursiveness, state management, and internal bookkeeping) on <code class="language-plaintext highlighter-rouge">call</code> instructions.</p>
<p><strong>Hoping you found those examples enlightening, happy hacking!</strong></p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>By “tainting” a variable taking a value from a user input, and propagating this taint on use, one can find other variables that can be influenced by a user input. Tainted variables being used for sensitive operations (arguments to <code class="language-plaintext highlighter-rouge">execve</code>, or <code class="language-plaintext highlighter-rouge">system</code>, assignment to a buffer of fixed size, etc.) point to potential security vulnerabilities. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>If you want to learn more details about how the analysis works, and a more concrete example of such analysis, I strongly encourage you to go look at the presentation mentioned above, <a href="https://www.youtube.com/watch?v=4SMRnpuqN6E">available on YouTube</a>. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>For those interested in the underlying mechanics on the <code class="language-plaintext highlighter-rouge">angr</code> side of things, the handler’s instance method is called in <a href="https://github.com/angr/angr/blob/0558f5758814cf3f17912b0621f4adb8d0f92240/angr/analyses/reaching_definitions/engine_vex.py#L588-L591">angr/analyses/reaching_definitions/engine_vex.py</a>. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>PamplemousseOn the research project I work on at SEFCOM, I use angr to statically analyse binary programs.Use SMT Solvers to generate crossword grids (3)2019-11-13T00:00:00+00:002019-11-13T00:00:00+00:00https://blog.xaviermaso.com/2019/11/13/Use%20SMT%20Solvers%20to%20generate%20crossword%20grids%20(3)<p>This post is part of a series on using SMT Solvers to generate crossword grids.</p>
<ul>
<li><a href="/2019/11/11/Use-SMT-Solvers-to-generate-crossword-grids-(1).html" target="_blank">Introduction to SMT, and programming with SMT Solvers</a>;</li>
<li><a href="/2019/11/12/Use-SMT-Solvers-to-generate-crossword-grids-(2).html" target="_blank">Definitions and first formulas</a>;</li>
<li>Plumbing everything together, complete formula, and results (<strong>currently reading</strong>).</li>
</ul>
<p>Thanks <a href="https://twitter.com/geistindersh">@geistindersh</a> for his feedback, and corrections!</p>
<hr />
<p>In the two previous posts, we covered how to represent:</p>
<ul>
<li>Valid words;</li>
<li>The potential values they can have, taking their length into consideration;</li>
<li>And their “crossing points”, <em>i.e.</em>, the characters some of them must have in common.</li>
</ul>
<p>We presented how the formulas derived from these representations are fed into a Solver, which gives us a set of values that we then “mapped back” into the grid to complete it.</p>
<p>Although the savant part of the job is done, a couple of points are left to discuss to end the series:</p>
<ul>
<li>Automate the formula generation, from potentially different grid frames;</li>
<li>Present measurements we made and results we had;</li>
<li><del>Cry over the lack of efficiency</del> Discuss some potential improvements.</li>
</ul>
<h1 id="formula-generation">Formula generation</h1>
<p>In <a href="/2019/11/12/Use-SMT-Solvers-to-generate-crossword-grids-(2).html">the previous post</a>, we presented how to write formulas to encode crossword grids constraints.</p>
<p>So far, the process has been very manual: declaring a variable for each word, and explicitly adding the “intersection” constraint.
One can easily see how this can become arduous as we will want to generate bigger grids.</p>
<p>Ambitiously, our ultimate goal is to be able to generate a grid such as <a href="http://frv100.com/fleches/mf001.htm">this real world one</a>, whose dimensions are 17x12, and which counts 64 words and 162 intersections in total.</p>
<p>Because we interact with Z3 using its Python API, it is easy for us to write a program in this language to do all the fancy plumbing we need:</p>
<ul>
<li>Create the variables from a grid frame;</li>
<li>Formulate the constraints using these variables: values words can take, and the common letters they must respect on intersections;</li>
<li>Deal with a large wordlist: 200 000+ words.</li>
</ul>
<h2 id="variables">Variables</h2>
<p>Unlike in the previous post, where we only had a small number of words to represent, we expect here to deal with a considerable number of variables.</p>
<p>Naming them \(\text{horizontal}\), or \(\text{vertical}\) would be very limiting. Still, a naming scheme that helps us locate words in the grid from the name of the variable used to represent them is a helpful idea.</p>
<p>Hence, let’s arbitrarily decide that our variable names will follow the pattern: \(direction\_x\_y\), where \(x\) and \(y\) are respectively the line and column components of the coordinate of the first letter of the word, and \(direction\) takes the value <code class="language-plaintext highlighter-rouge">h</code> or <code class="language-plaintext highlighter-rouge">v</code> depending on whether the word is horizontal or vertical.
We count coordinates respecting the French reading direction.
So, the top left corner has coordinates \((0, 0)\).</p>
<p>Here is an example of a portion of a grid:</p>
<figure style="text-align: center;">
<img alt="" src="/assets/images/crosswords/two_words_for_coordinates.png" style="width: 30%;" />
</figure>
<ul>
<li>\(h\_1\_0\) represents the horizontal word whose first letter is on the cell \((1, 0)\), in light blue;</li>
<li>\(v\_0\_1\) represents the vertical word whose first letter is on the cell \((0, 1)\), in light orange.</li>
</ul>
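<p>As a small illustration of this naming convention, here is a minimal sketch of two helpers to build and to parse such names. The function names <code class="language-plaintext highlighter-rouge">variable_name</code> and <code class="language-plaintext highlighter-rouge">parse_variable_name</code> are made up for the example, and are not necessarily the ones used in the actual code.</p>

```python
from typing import Tuple


def variable_name(direction: str, line: int, column: int) -> str:
    """Build the name of the word whose first letter sits on cell (line, column)."""
    if direction not in ("h", "v"):
        raise ValueError("direction must be 'h' (horizontal) or 'v' (vertical)")
    return f"{direction}_{line}_{column}"


def parse_variable_name(name: str) -> Tuple[str, int, int]:
    """Recover the direction and the coordinates of the first letter from a name."""
    direction, line, column = name.split("_")
    return direction, int(line), int(column)


print(variable_name("h", 1, 0))
print(parse_variable_name("v_0_1"))
```

<p>Being able to go from a name back to coordinates is what later lets us place the values found by the Solver into the grid.</p>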
<h2 id="grid">Grid</h2>
<p>The biggest grid we are aiming to represent counts 64 words to determine.
Hence, we need to create the same number of variables, respecting the naming convention we just presented.</p>
<p>Doing so manually would be time consuming, especially if we expect to represent different grid frames (I did).
So, I wrote a Python program to generate the variables out of a grid represented as an array of <code class="language-plaintext highlighter-rouge">0</code>s and <code class="language-plaintext highlighter-rouge">1</code>s, as shown in the following picture:</p>
<figure style="text-align: center;">
<img alt="" src="/assets/images/crosswords/what_you_see_what_I_see.png" style="width: 75%;" />
<figcaption>What you see, What I see.</figcaption>
</figure>
<p>From this representation, a complete formula can be generated (the different variables, with the values they can take, and the intersection constraints).</p>
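<p>To give an idea of what this scanning can look like, here is a simplified sketch; the real implementation lives in <code class="language-plaintext highlighter-rouge">grid.py</code> and may differ. It assumes that <code class="language-plaintext highlighter-rouge">1</code> marks a cell that will hold a letter and <code class="language-plaintext highlighter-rouge">0</code> anything else (the opposite convention would work just as well), and that a word is at least two letters long. The names <code class="language-plaintext highlighter-rouge">find_words</code> and <code class="language-plaintext highlighter-rouge">find_intersections</code> are hypothetical.</p>

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # (line, column), with (0, 0) in the top left corner


def find_words(grid: List[List[int]]) -> Dict[str, List[Cell]]:
    """Map each variable name (direction_x_y) to the cells its word covers.

    Assumption: 1 marks a cell that holds a letter, 0 marks anything else.
    """
    words: Dict[str, List[Cell]] = {}
    lines, columns = len(grid), len(grid[0])
    for direction, (d_line, d_column) in (("h", (0, 1)), ("v", (1, 0))):
        for line in range(lines):
            for column in range(columns):
                if grid[line][column] != 1:
                    continue
                before_line, before_column = line - d_line, column - d_column
                if before_line >= 0 and before_column >= 0 and grid[before_line][before_column] == 1:
                    continue  # not the first letter of a word in this direction
                # Walk in the given direction until the run of letter cells stops.
                cells = []
                l, c = line, column
                while l < lines and c < columns and grid[l][c] == 1:
                    cells.append((l, c))
                    l, c = l + d_line, c + d_column
                if len(cells) >= 2:  # assumption: a single cell is not a word
                    words[f"{direction}_{line}_{column}"] = cells
    return words


def find_intersections(words: Dict[str, List[Cell]]) -> List[Tuple[str, int, str, int]]:
    """List (horizontal name, index, vertical name, index) for every shared cell."""
    intersections = []
    for h_name, h_cells in words.items():
        if not h_name.startswith("h"):
            continue
        for v_name, v_cells in words.items():
            if not v_name.startswith("v"):
                continue
            for cell in set(h_cells) & set(v_cells):
                intersections.append((h_name, h_cells.index(cell), v_name, v_cells.index(cell)))
    return intersections


grid = [
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]
words = find_words(grid)
intersections = find_intersections(words)
print(words)
print(intersections)
```

<p>Each entry of <code class="language-plaintext highlighter-rouge">words</code> gives a variable and its length (hence the wordlist to pick from), and each entry of <code class="language-plaintext highlighter-rouge">intersections</code> gives the two indices that must hold the same character.</p>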
<h2 id="stop-waving-your-hands-where-is-the-code">Stop waving your hands. Where is the code?</h2>
<p>The complete code is available in <a href="https://github.com/Pamplemousse/SMT-solver-playground/crosswords">a GitHub repository</a>, and is too long to show in full here.
Don’t let the number of files intimidate you: the principles presented in this series are the ones implemented.
Let’s briefly present its content:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">francais.txt</code>: The wordlist where the words are chosen from;</li>
<li><code class="language-plaintext highlighter-rouge">generate_dictionary.py</code>: A script to generate this “normalised” wordlist (remove diacritics, deduplicate) out of a French wordlist found online;</li>
<li><code class="language-plaintext highlighter-rouge">dictionary.py</code>: The representation of the wordlist as a Python structure, with the logic of “splitting” it into several pieces per the word size (as detailed <a href="/2019/10/30/Use-SMT-Solvers-to-generate-crossword-grids-(2).html#a-single-valid-word">in the previous post</a>).</li>
<li><code class="language-plaintext highlighter-rouge">grid.py</code>: The scanning of a grid from <code class="language-plaintext highlighter-rouge">0</code>s and <code class="language-plaintext highlighter-rouge">1</code>s, and interface to query grid related content, such as: list of words and intersections, with their coordinates, following the convention <a href="#variables">exposed above</a>.</li>
<li><code class="language-plaintext highlighter-rouge">test_grid.py</code>: Some unit tests for the above logic;</li>
<li><code class="language-plaintext highlighter-rouge">solve.py</code>: <strong>The central piece</strong>, making use of the above components to generate the formula as exposed in this series, call the solver, and print a solution (<del> when </del> if found).</li>
</ul>
<h1 id="results">Results</h1>
<p>I ran the <code class="language-plaintext highlighter-rouge">solve.py</code> program on a Lenovo x220, with an Intel Core i5-2520M CPU (dual core, 2.50GHz base frequency), and 8GB of RAM, running on NixOS 20.03 <sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.
I used different parameters, varying the size of the grid and wordlists, at first to ensure that it worked as expected and produced valid solutions, then to measure its efficiency.</p>
<p>The wordlists’ size has been reduced by shuffling the original (to keep a certain diversity among the options), then selecting only the first elements from it.
I took three different grid sizes: small, medium, and large (respectively 6x3, 12x6, and 17x12), with the two smallest truncated from <a href="http://frv100.com/fleches/mf001.htm">the original 17x12 frame</a>.</p>
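<p>The reduction of a wordlist described above boils down to a shuffle followed by a slice. Here is a minimal sketch; the function name <code class="language-plaintext highlighter-rouge">reduce_wordlist</code> and the optional <code class="language-plaintext highlighter-rouge">seed</code> parameter (handy to reproduce a run) are additions for the example, not something taken from the original code.</p>

```python
import random
from typing import List, Optional


def reduce_wordlist(words: List[str], size: int, seed: Optional[int] = None) -> List[str]:
    """Return `size` words picked at random from `words`, leaving `words` untouched."""
    shuffled = list(words)                 # work on a copy
    random.Random(seed).shuffle(shuffled)  # shuffle to keep some diversity among the options
    return shuffled[:size]                 # then keep only the first elements


wordlist = ["abat", "abbe", "abri", "abus", "acte", "aide"]
reduced = reduce_wordlist(wordlist, 3, seed=0)
print(reduced)
```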
<p>Here are the results obtained with our solution, implemented with the code presented earlier:</p>
<ul>
<li>On a 6x3 grid, with wordlists each reduced to 200 words, <code class="language-plaintext highlighter-rouge">SAT</code> with a solution, in ~6 seconds;</li>
<li>On a 6x3 grid, with complete wordlists, <code class="language-plaintext highlighter-rouge">SAT</code> with a solution, in ~1.5 hours;</li>
<li>On a 12x6 grid, with wordlists each reduced to 200 words, <code class="language-plaintext highlighter-rouge">UNSAT</code>, in ~5 minutes;</li>
<li>On a 12x6 grid, with wordlists each reduced to 500 words, <code class="language-plaintext highlighter-rouge">UNSAT</code>, in ~6 hours;</li>
<li>On a 12x6 grid, with complete wordlists, <code class="language-plaintext highlighter-rouge">Unknown</code>, timed out after ~100 hours;</li>
<li><del> On a 17x12 grid, with complete wordlists …</del></li>
</ul>
<p>Generation is working fairly quickly on small grids, using a reduced number of words to pick from.
However, although the size of the generated formula grows linearly (in the size of the grid, and in the number of words per wordlist), the time it takes for the Solver to solve a given query grows at least quadratically (if not exponentially).</p>
<p>In the end, I did not have the patience to run the experiment for the targeted 17x12 grid.</p>
<h1 id="improvements">Improvements</h1>
<p>During the development of this idea, the evolution of the code to support it, and the writing of this series, some points of interest regarding future improvements arose.</p>
<p>First, we point out that the support for String theory is very recent in Z3.
Maybe our results could be improved by using a Solver with more efficient support for it.
With the same objective, it could be interesting to have a look at Z3’s internals, and get a better understanding of the practical limitations of using it in our context.</p>
<p>Second, I started wondering if one could come up with efficient strategies to reduce the size of the queries without losing too much accuracy, for example:</p>
<ul>
<li>Take guesses for words coming from lists containing a lot of words;</li>
<li>Prune the wordlists of words containing letters that occur rarely in the dictionary. For example, isolated words (farther from the others), defined using the <a href="https://en.wikipedia.org/wiki/Levenshtein_distance" target="_blank">Levenshtein distance</a>;</li>
<li>“Divide and conquer”: isolate portions of the grid that could be solved separately, adapting the wordlists accordingly. For example solving the bottom right corner using wordlists of truncated words, where only their “end” portion is left. This approach would reduce the size of the wordlist too, as verbs with the same “ending” in their conjugation would be mapped to a single entry in the truncated wordlist.</li>
</ul>
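<p>The last idea, truncating words to their “end” portion, can be sketched in a few lines. This only illustrates how the wordlist shrinks; the function name <code class="language-plaintext highlighter-rouge">truncate_to_endings</code> and the sample words are made up for the example, and nothing here has been tried against the Solver.</p>

```python
from typing import Iterable, List


def truncate_to_endings(words: Iterable[str], length: int) -> List[str]:
    """Keep only the last `length` letters of each word, without duplicates.

    Words shorter than `length` are dropped: they cannot fill the portion of the grid.
    """
    return sorted({word[-length:] for word in words if len(word) >= length})


verbs = ["chantons", "mangeons", "parlons", "abri"]
endings = truncate_to_endings(verbs, 3)
print(endings)
```

<p>Here the three conjugated verbs collapse into the single entry <code class="language-plaintext highlighter-rouge">ons</code>, so the Solver would have two candidates to consider instead of four.</p>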
<h1 id="last-words">Last words</h1>
<p>All in all, using a Solver to generate crossword grids is not the most efficient way to do it, making its use impractical: I will never be able to start my crossword editor startup…</p>
<p>However, this idea allowed us to explore several concepts around the use of SMT Solvers, to help us find solutions to algorithmic problems.</p>
<p>In particular, we discussed in detail how we can model (encode) crossword grids, to get a program to give us, although very slowly, valid solutions!</p>
<p><strong>Hope you found this journey very cool; At least it was from my side.</strong></p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Clearly, this setup is rubbish from the computing power perspective, considering the task at hand. But that’s my laptop, and I love it. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>PamplemousseThis post is part of a series on using SMT Solvers to generate crossword grids.Use SMT Solvers to generate crossword grids (2)2019-11-12T00:00:00+00:002019-11-12T00:00:00+00:00https://blog.xaviermaso.com/2019/11/12/Use%20SMT%20Solvers%20to%20generate%20crossword%20grids%20(2)<p>This post is part of a series on using SMT Solvers to generate crossword grids.</p>
<ul>
<li><a href="/2019/11/11/Use-SMT-Solvers-to-generate-crossword-grids-(1).html" target="_blank">Introduction to SMT, and programming with SMT Solvers</a>;</li>
<li>Definitions and first formulas (<strong>currently reading</strong>);</li>
<li><a href="/2019/11/13/Use-SMT-Solvers-to-generate-crossword-grids-(3).html" target="_blank">Plumbing everything together, complete formula, and results</a>.</li>
</ul>
<p>Thanks <a href="https://twitter.com/geistindersh">@geistindersh</a> for his feedback, and corrections!</p>
<hr />
<p>In a previous blog post we presented SMT Solvers, and mentioned that we can use them to solve problems;
More explicitly, for our problem at hand, we plan to:</p>
<ul>
<li>Construct a formula to represent (or encode) “generic” crossword grids;</li>
<li>Ask the Solver to give us a solution (a set of values);</li>
<li>Interpret this solution (or decode) to obtain a valid, completed, crossword grid.</li>
</ul>
<p><br /></p>
\[\text{grid}
\xrightarrow[]{\text{encode}}
\text{formula}
\xrightarrow[]{\text{solve}}
\text{values}
\xrightarrow[]{\text{decode}}
\text{completed grid}
\\\]
<p>In this post, we will first define some vocabulary and definitions we use while constructing our solution.
Then we are going to state more clearly what we are aiming to achieve.
At the end, we will present some formulas, the building blocks of how we can represent any crossword grid.</p>
<h1 id="crosswords">Crosswords</h1>
<h2 id="definitions">Definitions</h2>
<p>Let’s clarify some concepts that we are going to use to describe crossword grids:</p>
<ul>
<li>A <strong>grid</strong> is composed of: definition slots, and (empty) <strong>cells</strong>;</li>
<li>Definitions do not take more room than a cell, and there can be up to two definitions per slot;</li>
<li>Following the definitions, words can be written vertically (from top to bottom), and horizontally (from left to right).</li>
</ul>
<p>Here is a basic example of what an empty crossword grid could look like:</p>
<div style="text-align: center">
<img alt="" src="/assets/images/crosswords/complete_grid_frame.png" style="width: 30%;" />
</div>
<p>The aim of a game of crosswords is to fill a grid with words taken from the dictionary, (preferably) respecting the definitions, with each cell containing one (and only one) letter.</p>
<p>Here is a valid solution of the previous example, using French words (by chauvinism):</p>
<div style="text-align: center">
<img alt="" src="/assets/images/crosswords/complete_grid_frame_completed.png" style="width: 30%;" />
</div>
<h2 id="so-back-to-the-problem-what-are-we-trying-to-do">So, back to the problem: What are we trying to do?</h2>
<p>Let’s not care about definitions.
If we can generate a grid of intersecting words respecting the rules:</p>
<ul>
<li>One and only one letter per cell;</li>
<li>Words are coming from a list of valid words;</li>
<li>Words are written left to right or top to bottom;</li>
</ul>
<p>Then adding definitions is relatively straightforward, using a mapping of words to definitions (<em>a.k.a.</em>, a dictionary).</p>
<h1 id="from-grid-to-formula-back-to-grid">From grid to formula back to grid</h1>
<p>Now that we specified what we talk about, let’s write some formulas;
Gradually, from simple to complex.</p>
<p>As we mentioned earlier, we will use <a href="https://github.com/Z3Prover/z3">Z3</a>, because it supports a String theory <sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.
Furthermore, we will make use of the Python bindings for Z3, to make the code friendly to beginners’ eyes.</p>
<h2 id="a-single-valid-word">A single valid word</h2>
<p>Let’s start with the simplest case we can think of: a grid composed of a single word.
It would look like this:</p>
<figure style="text-align: center;">
<img alt="" src="/assets/images/crosswords/single_word.png" style="width: 30%;" />
</figure>
<p>Essentially, we declare a variable named \(horizontal\) and constrain it to take any value from a finite wordlist (our dictionary).</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Using a Python REPL
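</span><span class="n">python</span><span class="o">></span> <span class="kn">from</span> <span class="nn">z3</span> <span class="kn">import</span> <span class="o">*</span>
<span class="n">python</span><span class="o">></span> <span class="n">horizontal</span> <span class="o">=</span> <span class="n">String</span><span class="p">(</span><span class="s">'horizontal'</span><span class="p">)</span>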
<span class="n">python</span><span class="o">></span> <span class="n">formula</span> <span class="o">=</span> <span class="n">Or</span><span class="p">(</span>
<span class="n">horizontal</span> <span class="o">==</span> <span class="n">StringVal</span><span class="p">(</span><span class="s">"abat"</span><span class="p">),</span>
<span class="n">horizontal</span> <span class="o">==</span> <span class="n">StringVal</span><span class="p">(</span><span class="s">"abbe"</span><span class="p">),</span>
<span class="n">horizontal</span> <span class="o">==</span> <span class="n">StringVal</span><span class="p">(</span><span class="s">"abri"</span><span class="p">),</span>
<span class="c1"># [...]
</span><span class="p">)</span>
</code></pre></div></div>
<p>In the above example, we see that \(horizontal\) can take the value <code class="language-plaintext highlighter-rouge">abat</code>, or the value <code class="language-plaintext highlighter-rouge">abbe</code>, or the value <code class="language-plaintext highlighter-rouge">abri</code>, etc.
As you might guess, we truncated the display: dictionaries get pretty big, so the formula is effectively thousands of lines long.</p>
<p>Note that if we consider \(horizontal\) to be potentially any word from the dictionary, we would end with a massive formula, involving hundreds of thousands of disjunctions.
The French wordlist we will use counts 200 224 entries!
And that would only get worse for a complete grid, counting between 50 and 100 words (ballpark estimate)…
Intuitively, we want to keep the “size of the queries” we send to the Solver relatively small for them to be able to handle the task.</p>
<p>There is something that we already know about \(horizontal\) that we do not need the help of a Solver for: its <strong>size</strong>!
Indeed, we know from the grid that \(horizontal\) should be of length 4, hence, we don’t need to pass the Solver anything that involves words of size different than that.</p>
<p>In practice, instead of having a single wordlist, we use many: each one of them referencing only words of the same size.
At most, that means we have 36 wordlists for the French language <sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>.</p>
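<p>Splitting a wordlist by word size takes only a few lines of Python. The following is a minimal sketch of the idea (the actual logic lives in <code class="language-plaintext highlighter-rouge">dictionary.py</code> and may be organised differently); the function name <code class="language-plaintext highlighter-rouge">split_by_length</code> is an assumption.</p>

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def split_by_length(words: Iterable[str]) -> Dict[int, List[str]]:
    """Group words by their number of letters, keeping their original order."""
    by_length: Dict[int, List[str]] = defaultdict(list)
    for word in words:
        by_length[len(word)].append(word)
    return dict(by_length)


wordlists = split_by_length(["abat", "ah", "abbe", "an", "abri", "ru"])
print(wordlists[4])
print(wordlists[2])
```

<p>A variable for a word of length 4 then only needs the disjunction built from <code class="language-plaintext highlighter-rouge">wordlists[4]</code>, instead of the whole dictionary.</p>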
<p>And so, asking a Solver:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">python</span><span class="o">></span> <span class="n">solver</span> <span class="o">=</span> <span class="n">Solver</span><span class="p">()</span>
<span class="n">python</span><span class="o">></span> <span class="n">solver</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">formula</span><span class="p">)</span>
<span class="n">python</span><span class="o">></span> <span class="n">solver</span><span class="p">.</span><span class="n">check</span><span class="p">()</span>
<span class="n">sat</span>
<span class="n">python</span><span class="o">></span> <span class="n">solver</span><span class="p">.</span><span class="n">model</span><span class="p">()</span>
<span class="p">[</span><span class="n">horizontal</span> <span class="o">=</span> <span class="s">"abbe"</span><span class="p">]</span>
</code></pre></div></div>
<p>Which we can map back into the grid frame we had:</p>
<figure style="text-align: center;">
<img alt="" src="/assets/images/crosswords/single_word_solved.png" style="width: 30%;" />
</figure>
<p>Yay! We generated a first single word grid!</p>
<h2 id="two-valid-words">Two valid words</h2>
<p>Let’s up the game, because crossword grids would either get boring or very frustrating with a single word to guess.
Consider the following:</p>
<figure style="text-align: center;">
<img alt="" src="/assets/images/crosswords/two_words.png" style="width: 40%;" />
</figure>
<p>In the same spirit as earlier: we now use \(horizontal\) and \(vertical\), two variables that can take any values from respectively two different wordlists, the first being the list of French words of length 4, the latter the list of French words of length 2 (again, we omit possible values for brevity):</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">python</span><span class="o">></span> <span class="n">horizontal</span> <span class="o">=</span> <span class="n">String</span><span class="p">(</span><span class="s">'horizontal'</span><span class="p">)</span>
<span class="n">python</span><span class="o">></span> <span class="n">vertical</span> <span class="o">=</span> <span class="n">String</span><span class="p">(</span><span class="s">'vertical'</span><span class="p">)</span>
<span class="n">python</span><span class="o">></span> <span class="n">formula</span> <span class="o">=</span> <span class="n">And</span><span class="p">(</span>
<span class="n">Or</span><span class="p">(</span>
<span class="n">horizontal</span> <span class="o">==</span> <span class="n">StringVal</span><span class="p">(</span><span class="s">"abat"</span><span class="p">),</span>
<span class="n">horizontal</span> <span class="o">==</span> <span class="n">StringVal</span><span class="p">(</span><span class="s">"abbe"</span><span class="p">),</span>
<span class="n">horizontal</span> <span class="o">==</span> <span class="n">StringVal</span><span class="p">(</span><span class="s">"abri"</span><span class="p">),</span>
<span class="c1"># [...]
</span> <span class="p">),</span>
<span class="n">Or</span><span class="p">(</span>
<span class="n">vertical</span> <span class="o">==</span> <span class="n">StringVal</span><span class="p">(</span><span class="s">"ah"</span><span class="p">),</span>
<span class="n">vertical</span> <span class="o">==</span> <span class="n">StringVal</span><span class="p">(</span><span class="s">"an"</span><span class="p">),</span>
<span class="n">vertical</span> <span class="o">==</span> <span class="n">StringVal</span><span class="p">(</span><span class="s">"ru"</span><span class="p">),</span>
<span class="c1"># [...]
</span> <span class="p">)</span>
<span class="p">)</span>
<span class="n">python</span><span class="o">></span> <span class="n">solver</span> <span class="o">=</span> <span class="n">Solver</span><span class="p">()</span>
<span class="n">python</span><span class="o">></span> <span class="n">solver</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">formula</span><span class="p">)</span>
<span class="n">python</span><span class="o">></span> <span class="n">solver</span><span class="p">.</span><span class="n">check</span><span class="p">()</span>
<span class="n">sat</span>
<span class="n">python</span><span class="o">></span> <span class="n">solver</span><span class="p">.</span><span class="n">model</span><span class="p">()</span>
<span class="p">[</span><span class="n">vertical</span> <span class="o">=</span> <span class="s">"ru"</span><span class="p">,</span> <span class="n">horizontal</span> <span class="o">=</span> <span class="s">"abbe"</span><span class="p">]</span>
</code></pre></div></div>
<p>Again, the Solver was able to find a solution, that we interpret as the following filled grid:</p>
<figure style="text-align: center;">
<img alt="" src="/assets/images/crosswords/two_words_solved.png" style="width: 40%;" />
</figure>
<h2 id="two-valid-intersecting-words">Two valid intersecting words</h2>
<p>Disconnected words do not represent very well the content of real world crossword grids.
Next step is to start assembling:</p>
<figure style="text-align: center;">
<img alt="" src="/assets/images/crosswords/two_crossing_words.png" style="width: 30%;" />
</figure>
<p>From the formula perspective, we start exactly as earlier, and then we add a constraint saying that the character at index \(2\) of \(horizontal\) must equal the character at index \(0\) of \(vertical\).</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">python</span><span class="o">></span> <span class="n">horizontal</span> <span class="o">=</span> <span class="n">String</span><span class="p">(</span><span class="s">'horizontal'</span><span class="p">)</span>
<span class="n">python</span><span class="o">></span> <span class="n">vertical</span> <span class="o">=</span> <span class="n">String</span><span class="p">(</span><span class="s">'vertical'</span><span class="p">)</span>
<span class="n">python</span><span class="o">></span> <span class="n">formula</span> <span class="o">=</span> <span class="n">And</span><span class="p">(</span>
<span class="n">Or</span><span class="p">(</span>
<span class="n">horizontal</span> <span class="o">==</span> <span class="n">StringVal</span><span class="p">(</span><span class="s">"abat"</span><span class="p">),</span>
<span class="n">horizontal</span> <span class="o">==</span> <span class="n">StringVal</span><span class="p">(</span><span class="s">"abbe"</span><span class="p">),</span>
<span class="n">horizontal</span> <span class="o">==</span> <span class="n">StringVal</span><span class="p">(</span><span class="s">"abri"</span><span class="p">),</span>
<span class="c1"># [...]
</span> <span class="p">),</span>
<span class="n">Or</span><span class="p">(</span>
<span class="n">vertical</span> <span class="o">==</span> <span class="n">StringVal</span><span class="p">(</span><span class="s">"ah"</span><span class="p">),</span>
<span class="n">vertical</span> <span class="o">==</span> <span class="n">StringVal</span><span class="p">(</span><span class="s">"an"</span><span class="p">),</span>
<span class="n">vertical</span> <span class="o">==</span> <span class="n">StringVal</span><span class="p">(</span><span class="s">"ru"</span><span class="p">),</span>
<span class="c1"># [...]
</span> <span class="p">),</span>
<span class="n">horizontal</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">==</span> <span class="n">vertical</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="p">)</span>
<span class="n">python</span><span class="o">></span> <span class="n">solver</span> <span class="o">=</span> <span class="n">Solver</span><span class="p">()</span>
<span class="n">python</span><span class="o">></span> <span class="n">solver</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">formula</span><span class="p">)</span>
<span class="n">python</span><span class="o">></span> <span class="n">solver</span><span class="p">.</span><span class="n">check</span><span class="p">()</span>
<span class="n">sat</span>
<span class="n">python</span><span class="o">></span> <span class="n">solver</span><span class="p">.</span><span class="n">model</span><span class="p">()</span>
<span class="p">[</span><span class="n">vertical</span> <span class="o">=</span> <span class="s">"ru"</span><span class="p">,</span> <span class="n">horizontal</span> <span class="o">=</span> <span class="s">"abri"</span><span class="p">]</span>
</code></pre></div></div>
<p>And the given solution is translated to the following grid:</p>
<figure style="text-align: center;">
<img alt="" src="/assets/images/crosswords/two_crossing_words_solved.png" style="width: 30%;" />
</figure>
<p><em>Note that the result is (as in the previous examples) one of many valid combinations: there are other four letter words where the character at index \(2\) is the same as the character at index \(0\) of a well chosen two letter word.</em></p>
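<p>To see what “one of many valid combinations” means on the truncated wordlists shown above, we can enumerate every pair satisfying the intersection constraint by brute force. This is plain Python and has nothing to do with how Z3 searches for a solution: it only spells out what the formula accepts. The function name <code class="language-plaintext highlighter-rouge">crossing_pairs</code> is made up for the example.</p>

```python
from itertools import product
from typing import List, Tuple


def crossing_pairs(
    horizontals: List[str], verticals: List[str], h_index: int, v_index: int
) -> List[Tuple[str, str]]:
    """Return every (horizontal, vertical) pair sharing a letter at the given indices."""
    return [
        (h, v)
        for h, v in product(horizontals, verticals)
        if h[h_index] == v[v_index]  # same constraint as horizontal[2] == vertical[0]
    ]


four_letter_words = ["abat", "abbe", "abri"]
two_letter_words = ["ah", "an", "ru"]
pairs = crossing_pairs(four_letter_words, two_letter_words, 2, 0)
print(pairs)
```

<p>With these six words, three pairs satisfy the constraint, and the pair returned by the Solver earlier (<code class="language-plaintext highlighter-rouge">abri</code> and <code class="language-plaintext highlighter-rouge">ru</code>) is one of them. Enumerating pairs like this obviously stops being practical as soon as the number of words and intersections grows, which is where the Solver comes in.</p>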
<p>Boom, we now have everything we need to represent complete crossword grids!</p>
<p><strong>In the next (and last) blog post, we will present the complete formula we build, and the results we obtain trying to generate grids this way.</strong></p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Under this theory, variables take their values among all the possible sequences of characters, and operations can be substring comparison, concatenation, etc. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>According to <a href="https://en.wikipedia.org/wiki/Longest_word_in_French" target="blank">the relevant Wikipedia page</a>, the longest French word is “hippopotomonstrosesquippedaliophobie”, counting 36 letters! <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>PamplemousseThis post is part of a series on using SMT Solvers to generate crossword grids.Use SMT Solvers to generate crossword grids (1)2019-11-11T00:00:00+00:002019-11-11T00:00:00+00:00https://blog.xaviermaso.com/2019/11/11/Use%20SMT%20Solvers%20to%20generate%20crossword%20grids%20(1)<p>This post is part of a series on using SMT Solvers to generate crossword grids.</p>
<ul>
<li>Introduction to SMT, and programming with SMT Solvers (<strong>currently reading</strong>);</li>
<li><a href="/2019/11/12/Use-SMT-Solvers-to-generate-crossword-grids-(2).html" target="_blank">Definitions and first formulas</a>;</li>
<li><a href="/2019/11/13/Use-SMT-Solvers-to-generate-crossword-grids-(3).html" target="_blank">Plumbing everything together, complete formula, and results</a>.</li>
</ul>
<p>Thanks to <a href="https://twitter.com/geistindersh">@geistindersh</a> for his feedback and corrections!</p>
<hr />
<p>SMT Solvers are tools used in several fields. By modeling a complex problem as logical formulas, and then asking a Solver to find values satisfying these formulas, it is possible to obtain solutions to the targeted problem.</p>
<p>When I first encountered this approach in a class on program analysis <sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>, the whole concept of encoding problems with mathematics was not very straightforward, and required a bit of mental gymnastics.</p>
<p>However, after some practice, I became more accustomed to this idea, and recently had the opportunity to exercise and study these approaches more in depth.
Now that it feels familiar, I believe it’s the perfect time to write what I wish I could have read earlier.</p>
<p>Hence, in the following blog posts, we will explore the use of SMT Solvers in a recreational way, making use of them to solve an absolutely unimportant problem: <strong>generation of crossword grids</strong>!</p>
<h1 id="introduction">Introduction</h1>
<h2 id="smt-satisfiability-modulo-theory">SMT: Satisfiability Modulo Theory</h2>
<p>What is SMT by the way?</p>
<blockquote>
<p>SMT problem is a decision problem for logical formulas with respect to combinations of background theories
<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup></p>
</blockquote>
<p>Huh? Let’s break that down:</p>
<ul>
<li>A <strong>decision problem</strong> is a question whose answer is <code class="language-plaintext highlighter-rouge">Yes</code> or <code class="language-plaintext highlighter-rouge">No</code>; in practice, a Solver may also have to answer <code class="language-plaintext highlighter-rouge">Don't Know</code> <sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>;</li>
<li><strong>Logical formulas</strong> are “mathematical” formulas using variables and operations; such formulas evaluate to <code class="language-plaintext highlighter-rouge">True</code> or <code class="language-plaintext highlighter-rouge">False</code> (depending on the values that the variables take, and the considered operations);</li>
<li><strong>Background theories</strong> can be thought of as the “universe in which our formula lives”. Some examples:
<ul>
<li>Booleans: variables take the values <code class="language-plaintext highlighter-rouge">True</code> or <code class="language-plaintext highlighter-rouge">False</code>, and the operations are \(\land, \lor, \lnot\) (respectively logical and, or, not), …</li>
<li>Integers: variables take integer values \(\{ \ldots, -2, -1, 0, 1, 2, 3, \ldots \}\), and the operations are arithmetic operations: \(+, -, *, /, \%, \ldots\)</li>
<li>Strings: variables are sequences of characters, and operations can be substring comparison, concatenation, …</li>
</ul>
</li>
</ul>
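<p>To make these theories more concrete, here is one small formula per theory (Booleans, Integers, then Strings), with values for which it evaluates to <code class="language-plaintext highlighter-rouge">True</code> and values for which it evaluates to <code class="language-plaintext highlighter-rouge">False</code>. These formulas are made up for illustration, and the notations \(\text{length}\) and “contains” are informal assumptions rather than the syntax of a particular Solver:</p>
\[a \land \lnot b \qquad \text{True for } a = \text{True},\ b = \text{False}; \quad \text{False for } a = \text{True},\ b = \text{True}\]
\[x + 2 = 5 \land x \% 2 = 1 \qquad \text{True for } x = 3; \quad \text{False for } x = 4\]
\[\text{length}(x) = 3 \land x \text{ contains 'bc'} \qquad \text{True for } x = \text{'abc'}; \quad \text{False for } x = \text{'abd'}\]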
<p>In our context, we will consider the last one (Strings), because crosswords involve finding words respecting certain constraints.</p>
<p>But before explaining how we will use this particular theory, we need to explain how one can generally use constraint Solvers to help solve problems.</p>
<h2 id="solvers">Solvers</h2>
<p>SMT Solvers (also known as constraint Solvers, or theorem Provers) are computer programs that take formulas expressed under a specific theory as input, and answer one of the following:</p>
<ul>
<li><strong><code class="language-plaintext highlighter-rouge">SAT</code></strong>isfiable, if there exists a set of values for (a valuation of) the variables that make the given formula <code class="language-plaintext highlighter-rouge">True</code>;</li>
<li><strong><code class="language-plaintext highlighter-rouge">UNSAT</code></strong>isfiable, if there are no values for which the formula is <code class="language-plaintext highlighter-rouge">True</code>; In other terms, the formula will always evaluate to <code class="language-plaintext highlighter-rouge">False</code> no matter how hard we try;</li>
<li><strong><code class="language-plaintext highlighter-rouge">Don't Know</code></strong>, if the Solver did not manage to give one of the previous results within a given time bound.</li>
</ul>
<p>We will treat SMT Solvers as magical <sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup> black boxes, their inner workings remaining mysterious.
We can feed formulas to them, and expect one of the above responses.</p>
\[formula
\xrightarrow[]{\text{SMT Solver}}
\begin{cases}
\text{SAT} \\
\text{UNSAT} \\
\text{Don't Know}
\end{cases}\]
<p>And if the formula is <code class="language-plaintext highlighter-rouge">SAT</code>, the Solver will return a proof alongside its answer: a set of values for the variables appearing in the formula (often called a model).
To verify the satisfiability using the proof, we evaluate the formula with the given values, and ensure that the computed result is <code class="language-plaintext highlighter-rouge">True</code>.</p>
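<p>As a small worked example (the formula is made up for illustration), consider the Boolean formula \(F = (a \lor b) \land \lnot a\). A Solver answering <code class="language-plaintext highlighter-rouge">SAT</code> could return the values \(a = \text{False}\) and \(b = \text{True}\), which we can check by hand:</p>
\[F = (\text{False} \lor \text{True}) \land \lnot \text{False} = \text{True} \land \text{True} = \text{True}\]
<p>Conversely, the formula \(a \land \lnot a\) is <code class="language-plaintext highlighter-rouge">UNSAT</code>, because both possible values of \(a\) make it <code class="language-plaintext highlighter-rouge">False</code>:</p>
\[\text{True} \land \lnot \text{True} = \text{False} \qquad \text{False} \land \lnot \text{False} = \text{False}\]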
<h4 id="z3-the-theorem-prover">Z3, the theorem Prover</h4>
<p>Z3 is an open source SMT Solver developed by Microsoft Research, and <a href="https://github.com/Z3Prover/z3">available on GitHub</a>.</p>
<p>Well-known and well-documented, it even comes with bindings for many popular programming languages, notably Python (in other words, Python code can interact with Z3).</p>
<p>It is for these practical reasons that we will generate our crossword grids using Z3, but note that it could be done with any other Solver supporting a theory for Strings.</p>
<h2 id="constraint-programming">Constraint programming</h2>
<p>Now that we know what SMT Solvers are, we can make use of them to create programs to help us find solutions to our problems.</p>
<p>However, that is where the conceptual difficulty arises:
as educated humans, we have been taught and trained to solve the problems that we are given.</p>
<p>Yet, here, we want to avoid doing so: instead, we want to <strong>let a Solver do the hard work</strong> of “understanding” the problem and finding a solution to it.
In other words, we <strong>don’t try to construct an algorithm to solve the problem</strong>; rather, we express the constraints of the problem in mathematical terms, with logical expressions using variables taking values in an adequate theory.</p>
<h3 id="example">Example</h3>
<p>Let’s consider the following puzzle:</p>
<blockquote>
<p>Find \(x\) a String such that: \(x\)’s first letter is an ‘H’ and its last letter is an ‘S’.</p>
</blockquote>
<p>Natural “bad” approach, thinking out loud:</p>
<blockquote>
<p>Educated you: I need to construct \(x\) a string. <br />
Educated you: First letter is an ‘H’, so \(x\) should look like ‘H…’ . <br />
Educated you: Then, last letter is an ‘S’. <br />
Educated you: So \(x\) is anything of the form ‘H…S’ . <br />
Educated you: … *thinking really hard* … <br />
Educated you: Hey! ‘HS’ works!</p>
</blockquote>
<p>“Good” approach, from the constraint programming point of view, thinking out loud again:</p>
<blockquote>
<p>You: I have the following constraints: \(x\) is a String, \(x\) starts with an ‘H’, and \(x\) ends with an ‘S’. <br />
You: Hey Solver! Can you give me a value for the following formula: \(x \text{ is a String } \land x \text{ starts with an 'H' } \land x \text{ ends with an 'S'}\)? <br />
Solver: Let me see… <br />
Solver: … *thinking* … <br />
Solver: It’s <code class="language-plaintext highlighter-rouge">SAT</code>, and \(x = \text{'HYPOTHALAMUS'}\) works! <br />
You: Thank you Solver. <br />
Solver: No worries.</p>
</blockquote>
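<p>One possible way to write these constraints more formally is sketched below. The notation is an assumption made for illustration: \(|x|\) stands for the length of \(x\), and \(x[i]\) for its character at index \(i\), counting from \(0\):</p>
\[|x| \geq 1 \land x[0] = \text{'H'} \land x[|x| - 1] = \text{'S'}\]
<p>Both answers found above satisfy it: for \(x = \text{'HS'}\), \(x[0] = \text{'H'}\) and \(x[1] = \text{'S'}\); for \(x = \text{'HYPOTHALAMUS'}\), \(|x| = 12\), \(x[0] = \text{'H'}\) and \(x[11] = \text{'S'}\). Adding the extra constraint \(|x| = 1\) would make the formula <code class="language-plaintext highlighter-rouge">UNSAT</code>, since \(x[0]\) would have to be both ‘H’ and ‘S’.</p>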
<p>Solvers are very powerful:
We can use them to avoid dealing with tedious logic and building complicated algorithms.</p>
<p><strong>In the next blog post, we are going to explain how we model the crossword grids, to get a Solver to help generate them.</strong></p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p><a href="https://www.u-bordeaux.fr/formation/2018/PRMA_68/informatique/enseignement/FRUAI0333298FCOEN_7296/verification-de-logiciels" target="blank">Software Verification</a>, of the <a href="https://mastercsi.labri.fr/" target="blank">CSI Masters</a> . <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>Source: <a href="https://en.wikipedia.org/wiki/Satisfiability_Modulo_Theories" target="blank">https://en.wikipedia.org/wiki/Satisfiability_Modulo_Theories</a> . <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>One of the biggest upsets in Computer Science is that not all problems can be “solved” by algorithms: <a href="https://en.wikipedia.org/wiki/Undecidable_problem" target="blank">https://en.wikipedia.org/wiki/Undecidable_problem</a> , but we don’t know which ones for sure. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>Because their implementation is way out of the scope of this post, it can be easier to imagine them as transcendental entities, or at least being beyond our comprehension. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
<hr />
<h1>NixOS on a Dell XPS15 9560</h1>
<p><em>Pamplemousse, 2019-02-28, <a href="https://blog.xaviermaso.com/2019/02/28/NixOS%20on%20a%20Dell%20XPS15%209560">https://blog.xaviermaso.com/2019/02/28/NixOS%20on%20a%20Dell%20XPS15%209560</a></em></p>
<h2 id="foreword">Foreword</h2>
<p>For several months now, I have been using NixOS, as much for personal stuff as for work.</p>
<p>In particular, its declarative and reproducible system configuration allows me to have a GitHub repository, <a href="https://github.com/pamplemousse/laptop">Pamplemousse/laptop</a>, “representing” what the software setup of my machine is.</p>
<p>The idea is to automate the installation of my computer, easing the pain of setting up a new machine.</p>
<p>I am not at all an expert in system administration, and this “project” is far from being mature, but it has done the job so far.</p>
<h2 id="until">Until…</h2>
<p>One day, I received a Dell XPS15 9560 which I needed to set up, and I wanted to install NixOS on it.</p>
<p>It was painful, but I realized I was not the only one running into trouble installing Linux on this machine: <a href="https://medium.com/@kemra102/linux-on-the-dell-xps-15-919e6d472aa3">see this comparison of the installation of several distributions</a>.</p>
<p>Among the bits that caused me the most trouble:</p>
<ul>
<li>Move from <code class="language-plaintext highlighter-rouge">MBR/BIOS</code> to <code class="language-plaintext highlighter-rouge">GPT/UEFI</code> <sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>;</li>
<li>LUKS encryption;</li>
<li>Machine freezing, very likely due to the <a href="https://nouveau.freedesktop.org/wiki/">nouveau</a> and <a href="https://github.com/Bumblebee-Project/bbswitch">bbswitch</a> modules for the Nvidia graphics card.</li>
</ul>
<p>Miraculously, I discovered two blog posts that literally saved my day (or shall I say my week):</p>
<ul>
<li><a href="http://grahamc.com/blog/nixos-on-dell-9560">NixOS on a Dell 9560</a> (by Graham Christensen)</li>
<li><a href="https://jtanguy.me/blog/installing-nixos-on-a-xps-9560/">Installing NixOS on a XPS 9560</a> (by Julien Tanguy)</li>
</ul>
<p>Thus, this post stands on their shoulders and comes essentially as a wrap up of the work they have provided.</p>
<h2 id="full-disk-encryption">Full disk encryption</h2>
<blockquote>
<p>I don’t think we can make these devices harder to lose; that’s a human problem and not a technological one. But we can make the loss just cost money, not privacy.
<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup></p>
</blockquote>
<p>Solution: use <a href="https://guardianproject.info/code/luks/">LUKS</a> to encrypt partitions. Here is what the partitioning looks like:</p>
<p><img alt="partitioning of the disk" src="/assets/images/2019-02-28-NixOS%20on%20a%20Dell%20XPS15%209560/partitioning.png" /></p>
<ol>
<li><code class="language-plaintext highlighter-rouge">/dev/sda1</code>: BIOS</li>
<li><code class="language-plaintext highlighter-rouge">/dev/sda2</code>: EFI</li>
<li><code class="language-plaintext highlighter-rouge">/dev/sda3</code>, <code class="language-plaintext highlighter-rouge">/dev/mapper/cryptkey</code>: LUKS key</li>
<li><code class="language-plaintext highlighter-rouge">/dev/sda4</code>, <code class="language-plaintext highlighter-rouge">/dev/mapper/cryptswap</code>: swap partition</li>
<li><code class="language-plaintext highlighter-rouge">/dev/sda5</code>, <code class="language-plaintext highlighter-rouge">/dev/mapper/cryptroot</code>: root filesystem</li>
</ol>
<h4 id="why-this-partitioning">Why this partitioning?</h4>
<p>We want the swap partition to be encrypted so as not to leak the RAM content on hibernation.</p>
<p>For user-friendliness, we create a partition <code class="language-plaintext highlighter-rouge">/dev/mapper/cryptkey</code> that will be used as a keyfile to unlock both the swap (<code class="language-plaintext highlighter-rouge">/dev/mapper/cryptswap</code>) and the root (<code class="language-plaintext highlighter-rouge">/dev/mapper/cryptroot</code>) partitions.
This keyfile will then be encrypted using a user passphrase.</p>
<p>Hence, a passphrase will be asked only once, to decrypt the keyfile, which will then be used to decrypt the swap and root partitions.</p>
<h4 id="and-if-devmappercryptkey-gets-corrupted">And if <code class="language-plaintext highlighter-rouge">/dev/mapper/cryptkey</code> gets corrupted?</h4>
<p>As is, that would mean that <strong>the swap and root partitions would be lost</strong>.
For the latter one, that is very bad: all data on the root partition (system and user data) would then be inaccessible.</p>
<p>One solution is to create a random passphrase (not meant to be remembered, and stored securely elsewhere), and then allow it to decrypt the root partition.</p>
<h4 id="here-is-how-we-proceeded">Here is how we proceeded:</h4>
<p><em>Note that some space is left at the beginning of the disk for the GPT to take place.</em> <sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup></p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># partitioning</span>
<span class="nv">DISK</span><span class="o">=</span>/dev/sda
<span class="nv">RAM</span><span class="o">=</span>16 <span class="c"># swap size in GiB (example value, an assumption): set it to the amount of RAM of your machine</span>
sgdisk <span class="nt">-og</span> <span class="s2">"</span><span class="nv">$DISK</span><span class="s2">"</span>
sgdisk <span class="nt">-n</span> 1:2048:4095 <span class="nt">-c</span> 1:<span class="s2">"BIOS boot partition"</span> <span class="nt">-t</span> 1:ef02 <span class="s2">"</span><span class="nv">$DISK</span><span class="s2">"</span>
sgdisk <span class="nt">-n</span> 2:0:+550MiB <span class="nt">-c</span> 2:<span class="s2">"EFI system partition"</span> <span class="nt">-t</span> 2:ef00 <span class="s2">"</span><span class="nv">$DISK</span><span class="s2">"</span>
sgdisk <span class="nt">-n</span> 3:0:+3MiB <span class="nt">-c</span> 3:<span class="s2">"cryptsetup luks key"</span> <span class="nt">-t</span> 3:8300 <span class="s2">"</span><span class="nv">$DISK</span><span class="s2">"</span>
sgdisk <span class="nt">-n</span> 4:0:+<span class="s2">"</span><span class="k">${</span><span class="nv">RAM</span><span class="k">}</span><span class="s2">"</span>GiB <span class="nt">-c</span> 4:<span class="s2">"swap space (hibernation)"</span> <span class="nt">-t</span> 4:8300 <span class="s2">"</span><span class="nv">$DISK</span><span class="s2">"</span>
sgdisk <span class="nt">-n</span> 5:0:<span class="s2">"</span><span class="si">$(</span>sgdisk <span class="nt">-E</span> <span class="s2">"</span><span class="nv">$DISK</span><span class="s2">"</span><span class="si">)</span><span class="s2">"</span> <span class="nt">-c</span> 5:<span class="s2">"root filesystem"</span> <span class="nt">-t</span> 5:8300 <span class="s2">"</span><span class="nv">$DISK</span><span class="s2">"</span>
<span class="c"># encrypting</span>
cryptsetup luksFormat <span class="s2">"</span><span class="k">${</span><span class="nv">DISK</span><span class="k">}</span><span class="s2">3"</span>
cryptsetup luksOpen <span class="s2">"</span><span class="k">${</span><span class="nv">DISK</span><span class="k">}</span><span class="s2">3"</span> cryptkey
<span class="nb">dd </span><span class="k">if</span><span class="o">=</span>/dev/random <span class="nv">of</span><span class="o">=</span>/dev/mapper/cryptkey <span class="nv">bs</span><span class="o">=</span>1024 <span class="nv">count</span><span class="o">=</span>14000
cryptsetup luksFormat <span class="nt">--key-file</span><span class="o">=</span>/dev/mapper/cryptkey <span class="s2">"</span><span class="k">${</span><span class="nv">DISK</span><span class="k">}</span><span class="s2">4"</span>
cryptsetup luksFormat <span class="s2">"</span><span class="k">${</span><span class="nv">DISK</span><span class="k">}</span><span class="s2">5"</span>
cryptsetup luksAddKey <span class="s2">"</span><span class="k">${</span><span class="nv">DISK</span><span class="k">}</span><span class="s2">5"</span> /dev/mapper/cryptkey
<span class="c"># labeling, mounting and generating the base config</span>
cryptsetup luksOpen <span class="nt">--key-file</span><span class="o">=</span>/dev/mapper/cryptkey <span class="s2">"</span><span class="k">${</span><span class="nv">DISK</span><span class="k">}</span><span class="s2">4"</span> cryptswap
mkswap <span class="nt">-L</span> swap /dev/mapper/cryptswap
swapon /dev/disk/by-label/swap
cryptsetup luksOpen <span class="nt">--key-file</span><span class="o">=</span>/dev/mapper/cryptkey <span class="s2">"</span><span class="k">${</span><span class="nv">DISK</span><span class="k">}</span><span class="s2">5"</span> cryptroot
mkfs.ext4 <span class="nt">-L</span> nixos /dev/mapper/cryptroot
mount /dev/disk/by-label/nixos /mnt
mkfs.vfat <span class="nt">-n</span> efi <span class="s2">"</span><span class="k">${</span><span class="nv">DISK</span><span class="k">}</span><span class="s2">2"</span>
<span class="nb">mkdir</span> /mnt/boot
mount /dev/disk/by-label/efi /mnt/boot
nixos-generate-config <span class="nt">--root</span> /mnt
</code></pre></div></div>
<h4 id="and-what-the-nixos-configuration-looks-like">And what the NixOS configuration looks like:</h4>
<p>We created an extra file called <code class="language-plaintext highlighter-rouge">/etc/nixos/luks-devices-configuration.nix</code>, containing the following:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
  boot.initrd.luks.devices = {
    cryptkey = {
      device = "/dev/sda3";
    };
    cryptroot = {
      device = "/dev/sda5";
      keyFile = "/dev/mapper/cryptkey";
    };
    cryptswap = {
      device = "/dev/sda4";
      keyFile = "/dev/mapper/cryptkey";
    };
  };
}
</code></pre></div></div>
<p>And then, included it in the general <code class="language-plaintext highlighter-rouge">/etc/nixos/configuration.nix</code> file:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{ config, pkgs, ... }:
{
  imports =
    [ # Include the results of the hardware scan.
      ./hardware-configuration.nix
      ./luks-devices-configuration.nix
    ];
  # ...
}
</code></pre></div></div>
<h2 id="machine-freezing">Machine freezing</h2>
<p>As mentioned earlier in the post, I experienced many freezes of the laptop, and adopted the solution proposed in <a href="https://jtanguy.cleverapps.io/installing-nixos-on-a-xps-9560/#graphics">jtanguy.cleverapps.io/installing-nixos-on-a-xps-9560</a> by adding the following to my <code class="language-plaintext highlighter-rouge">/etc/nixos/configuration.nix</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>boot.blacklistedKernelModules = [ "nouveau" "bbswitch" ];
boot.extraModulePackages = [ pkgs.linuxPackages.nvidia_x11 ];
hardware.bumblebee.enable = true;
hardware.bumblebee.pmMethod = "none";
</code></pre></div></div>
<h2 id="conclusion">Conclusion</h2>
<p>My GitHub repository <a href="https://github.com/Pamplemousse/laptop">Pamplemousse/laptop</a> should contain the most up-to-date state of my configuration.</p>
<p>However, it is not specific to the Dell XPS15 9560, and not all that I have presented here is merged into the master branch (in particular the Kernel modules blacklisting or the <code class="language-plaintext highlighter-rouge">bumblebee</code> configuration).
Nevertheless, the missing pieces can be found in the <a href="https://github.com/Pamplemousse/laptop/tree/test">test branch</a> of the same repository.</p>
<h4 id="warning--need-to-be-improved">Warning / Need to be improved</h4>
<p><strong>It is worth noting that I do not run these scripts as is.</strong></p>
<p>So far, my work on this repository is actually more about having handy “templates” and/or bits of configuration to speed up my laptop’s installation rather than having “production ready” autonomous scripts that have been thoroughly tested.</p>
<p>Some areas of improvements that are worth mentioning:</p>
<ul>
<li>In the install script, pay attention to what is generated in <code class="language-plaintext highlighter-rouge">/etc/nixos/hardware-configuration.nix</code> and what might overwrite settings from <code class="language-plaintext highlighter-rouge">/etc/nixos/luks-devices-configuration.nix</code> during the configuration generation;</li>
<li><code class="language-plaintext highlighter-rouge">/boot</code> being located on <code class="language-plaintext highlighter-rouge">/dev/sda2</code> is not encrypted.</li>
</ul>
<p><strong>Aside from that, I am happy now that my laptop is functional! (Pun intended.)</strong></p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>MBR, BIOS, GPT, UEFI definitions: <a href="https://wiki.manjaro.org/index.php?title=Some_basics_of_MBR_v/s_GPT_and_BIOS_v/s_UEFI" target="blank">https://wiki.manjaro.org/index.php?title=Some_basics_of_MBR_v/s_GPT_and_BIOS_v/s_UEFI</a> . <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>Source: <a href="https://www.wired.com/2006/01/big-risks-come-in-small-packages/" target="blank">https://www.wired.com/2006/01/big-risks-come-in-small-packages/</a> . <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>GUID Partition Table: <a href="https://en.wikipedia.org/wiki/GUID_Partition_Table" target="blank">https://en.wikipedia.org/wiki/GUID_Partition_Table</a> . <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
<hr />
<h1>Scanning “modern” web applications with OWASP ZAP</h1>
<p><em>Pamplemousse, 2018-10-01, <a href="https://blog.xaviermaso.com/2018/10/01/Scanning%20modern%20web%20applications%20with%20OWASP%20ZAP">https://blog.xaviermaso.com/2018/10/01/Scanning%20modern%20web%20applications%20with%20OWASP%20ZAP</a></em></p>
<p>During the summer of 2018, I was an intern in the <a href="https://wiki.mozilla.org/Security/FirefoxOperations">FoxSec</a> team at Mozilla, where I contributed to <a href="https://www.zaproxy.org/">ZAP (for Zed Attack proxy)</a>, an open-source web application security scanner.</p>
<p>The subject of my internship was <strong>Scanning modern web applications with OWASP ZAP</strong>, and the report I wrote about it is available online at <a href="https://www.xaviermaso.com/internship_report_2018.pdf" target="_blank">xaviermaso.com/internship_report_2018.pdf</a> .</p>
<p>I do not intend to delve too much into the details of what has been implemented in this post, especially because the report should contain all the information needed by anyone interested in the subject.</p>
<p>However, this is a good opportunity for me to talk a bit more loosely about what is inside.
In a sense, this post is more of an “Abstract” (if you are from academia) or a “TL;DR” (if you are from Reddit) to present the key ideas and motivations for those who still hesitate to read twenty pages of my lame prose.</p>
<p>Thus, let’s follow the plan of the report:</p>
<ol>
<li><a href="#zap-and-modern-web-applications">ZAP and “modern” web applications</a></li>
<li><a href="#the-frontendscanner">The FrontEndScanner</a></li>
<li><a href="#the-front-end-tracker">The front-end-tracker</a></li>
</ol>
<h2 id="zap-and-modern-web-applications">ZAP and “modern” web applications</h2>
<h3 id="some-zap-concepts">Some ZAP concepts</h3>
<p>ZAP is a web proxy: sitting between a web browser and the server serving the application that one wishes to test, it can monitor web traffic between the two entities, interrupt it, modify it and even record and replay it.</p>
<p>Because of that, it is a great tool to perform security testing of an application: either by passively looking for vulnerabilities in the requests and responses (such as missing headers, plaintext secrets, etc.), or by actively crafting requests or tampering with the content, aiming to trigger interesting behavior on the server or in the browser (in the case of XSS vulnerabilities for example).</p>
<p>One of the most interesting features of ZAP is that users can write their own scripts, which run under specific circumstances to perform custom tasks.
For example, a number of scripts have been written by the community and published under <a href="https://github.com/zaproxy/community-scripts">github.com/zaproxy/community-scripts</a> .</p>
<h3 id="modern-web-applications">“Modern” web applications</h3>
<p>Arguably, we call “modern” web applications the ones relying heavily on JavaScript.</p>
<p>On today’s web, almost every page contains JavaScript to be executed by the client (aka the web browser).
This is even truer with the rise of JavaScript frameworks such as <a href="https://reactjs.org/">React</a>, <a href="https://angularjs.org/">AngularJS</a>, <a href="https://vuejs.org/">Vue.js</a>, <a href="https://emberjs.com/">Ember.js</a>, and so on, encouraging developers to embed whole applications into single pages, the so-called <a href="https://en.wikipedia.org/wiki/Single-page_application" target="_blank">“SPA”</a>.</p>
<p>The problem is that it somewhat breaks the approach taken by ZAP: by only scanning HTTP responses, our favorite proxy statically analyzes the transferred content, without taking into consideration the transformations that might happen when the browser interprets the embedded JavaScript.</p>
<p>For example, let’s say that to detect XSS or SQLi, you have a script to look for <code class="language-plaintext highlighter-rouge"><input></code> fields in a webpage.
If the <code class="language-plaintext highlighter-rouge"><input></code> is present in the HTML of the page, ZAP will be able to find it.
However, if there is a piece of JavaScript that modifies the DOM to add such an element, such as:</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">window</span><span class="p">.</span><span class="nx">onload</span> <span class="o">=</span> <span class="kd">function</span> <span class="p">()</span> <span class="p">{</span>
  <span class="kd">var</span> <span class="nx">inputElement</span> <span class="o">=</span> <span class="nb">document</span><span class="p">.</span><span class="nx">createElement</span><span class="p">(</span><span class="dl">'</span><span class="s1">input</span><span class="dl">'</span><span class="p">);</span>
  <span class="nx">inputElement</span><span class="p">.</span><span class="nx">type</span> <span class="o">=</span> <span class="dl">'</span><span class="s1">text</span><span class="dl">'</span><span class="p">;</span>
  <span class="nb">document</span><span class="p">.</span><span class="nx">body</span><span class="p">.</span><span class="nx">appendChild</span><span class="p">(</span><span class="nx">inputElement</span><span class="p">);</span>
<span class="p">};</span>
</code></pre></div></div>
<p>ZAP would be unable to understand the implications and detect that a potential source of vulnerability will be added to the page “at run time”, <em>i.e.</em> when the browser interprets the JavaScript.</p>
<p>Through this basic example, one can see the limits of the static analysis of an HTTP response: it is difficult to be confident that an application is vulnerability-free just by looking at its source code, as complex chains of events in the browser (user or network interactions for example) can lead to the modification of the scanned content, making the process irrelevant.</p>
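<p>For comparison, here is a minimal sketch of how such run-time additions could be observed from inside the page, using the standard <code class="language-plaintext highlighter-rouge">MutationObserver</code> API. It is not code from ZAP or from the add-on presented below: <code class="language-plaintext highlighter-rouge">findAddedInputs</code> and <code class="language-plaintext highlighter-rouge">watchForInputs</code> are hypothetical names, and the browser wiring only does something where <code class="language-plaintext highlighter-rouge">MutationObserver</code> exists.</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical helper: given a list of mutation records, collect the
// input elements that were added to the page. Only directly added nodes
// are inspected; inputs nested inside an added subtree are ignored here.
function findAddedInputs(records) {
  var found = [];
  records.forEach(function (record) {
    // addedNodes is a NodeList in the browser; Array.from also accepts arrays.
    Array.from(record.addedNodes || []).forEach(function (node) {
      if (String(node.nodeName).toUpperCase() === 'INPUT') {
        found.push(node);
      }
    });
  });
  return found;
}

// Hypothetical wiring: calls report(node) for every input added at run time.
function watchForInputs(report) {
  if (typeof MutationObserver === 'undefined') {
    return null; // not running in a browser
  }
  var observer = new MutationObserver(function (records) {
    findAddedInputs(records).forEach(report);
  });
  observer.observe(document.documentElement, { childList: true, subtree: true });
  return observer;
}

// Records shaped like the ones a browser would deliver.
var fakeRecords = [
  { addedNodes: [{ nodeName: 'DIV' }, { nodeName: 'INPUT' }] },
  { addedNodes: [] }
];
console.log(findAddedInputs(fakeRecords).length); // prints 1
</code></pre></div></div>
<p>The detection logic is kept in a pure function so that it can be exercised without a browser; in a real page, only <code class="language-plaintext highlighter-rouge">watchForInputs</code> would need to be called.</p>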
<p>To address this problem, we broke our solution down into two “components”:</p>
<ul>
<li>the <a href="#the-frontendscanner">FrontEndScanner</a> add-on,</li>
<li>the <a href="#the-front-end-tracker">front-end-tracker</a> JavaScript library</li>
</ul>
<h2 id="the-frontendscanner">The FrontEndScanner</h2>
<blockquote>
<p>Add-ons add additional functionality to ZAP. They have full access to all of the ZAP internals, and so can provide very powerful new features.
<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup></p>
</blockquote>
<p>We wrote the FrontEndScanner add-on to provide a way for ZAP users to look for front-end vulnerabilities by executing scripts where they can make sense out of the dynamic nature of JavaScript: in the web browser, alongside the application that is being tested.</p>
<p>When turned on, our add-on will tamper with all HTTP responses coming back from the server to inject a piece of JavaScript code into the tested application, directly into the <code class="language-plaintext highlighter-rouge"><head></code> of the HTML document.
By doing so, we ensure that our code will be run before anything else (especially before front-end frameworks and libraries) when loaded by the web browser.
This is really important as we want to keep track of modifications to the DOM and to the <a href="https://developer.mozilla.org/en-US/docs/Web/API" target="_blank">WebAPI</a> that those external scripts might be doing.</p>
<p>The piece of JavaScript code is made of the following:</p>
<ul>
<li>the <code class="language-plaintext highlighter-rouge">FrontEndScanner</code> object, itself containing:
<ul>
<li>ZAP constants to help scripts create alerts,</li>
<li>the “mailbox”: a “publish-subscribe” mechanism to help ZAP users’ scripts react to events happening in the browser (such as user interactions, <a href="https://developer.mozilla.org/en-US/docs/Web/API/Storage">Storage</a> accesses, etc.),</li>
<li>a helper function to report findings back to ZAP</li>
</ul>
</li>
<li>a list of user-defined scripts, each of which is encapsulated in a function</li>
</ul>
<p>When in the browser, these functions are executed, taking the <code class="language-plaintext highlighter-rouge">FrontEndScanner</code> object as parameter.
Thus, user scripts can make use of the content defined above to perform meaningful security checks and raise alerts in ZAP when finding vulnerabilities.</p>
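<p>To illustrate how these pieces could fit together, here is a minimal sketch. It is not the add-on’s actual code: <code class="language-plaintext highlighter-rouge">createMailbox</code>, the <code class="language-plaintext highlighter-rouge">subscribe</code> method, the <code class="language-plaintext highlighter-rouge">reportToZap</code> helper and the shape of the alert are assumptions made for the example, and alerts are simply collected in an array instead of being sent to ZAP.</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical publish-subscribe "mailbox": maps a topic to its handlers.
function createMailbox() {
  var handlers = Object.create(null);
  return {
    subscribe: function (topic, handler) {
      handlers[topic] = handlers[topic] || [];
      handlers[topic].push(handler);
    },
    publish: function (topic, message) {
      (handlers[topic] || []).forEach(function (handler) {
        handler(message);
      });
    }
  };
}

// Stand-in for the injected object; here alerts are only stored locally.
var alerts = [];
var frontEndScanner = {
  mailbox: createMailbox(),
  reportToZap: function (alert) { alerts.push(alert); } // assumed helper name
};

// A user script, encapsulated in a function taking the scanner object.
function userScript(scanner) {
  scanner.mailbox.subscribe('storage', function (message) {
    if (message.action === 'get') {
      scanner.reportToZap({ name: 'Storage read', args: message.args });
    }
  });
}

userScript(frontEndScanner);
frontEndScanner.mailbox.publish('storage', { action: 'get', args: ['token'] });
console.log(alerts.length); // prints 1
</code></pre></div></div>
<p>With this shape, a user script never needs to know who publishes the events: it only subscribes to the topics it cares about.</p>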
<h2 id="the-front-end-tracker">The front-end-tracker</h2>
<p>As the <a href="https://developer.mozilla.org/en-US/docs/Web/API" target="_blank">WebAPI</a> is mostly intended for application developers rather than for security testers, it does not expose all the features that we would hope to have for debugging and testing a web page.</p>
<p>To make up for these missing features, we wrote the front-end-tracker, a JavaScript library that extends the API available in the browser with features better suited to writing security checks.</p>
<h4 id="how-does-it-work">How does it work?</h4>
<p>When loaded into a web page, the front-end-tracker wraps behaviors that we are interested in tracking into our own functions that will perform some kind of reporting before running the expected code.</p>
<p>Here is a simplified example of what it could look like:</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="nx">oldGetItem</span> <span class="o">=</span> <span class="nx">Storage</span><span class="p">.</span><span class="nx">prototype</span><span class="p">.</span><span class="nx">getItem</span><span class="p">;</span>
<span class="nx">Storage</span><span class="p">.</span><span class="nx">prototype</span><span class="p">.</span><span class="nx">getItem</span> <span class="o">=</span> <span class="kd">function</span> <span class="p">(...</span><span class="nx">args</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">mailbox</span><span class="p">.</span><span class="nx">publish</span><span class="p">(</span>
<span class="dl">'</span><span class="s1">storage</span><span class="dl">'</span><span class="p">,</span>
<span class="p">{</span><span class="na">action</span><span class="p">:</span> <span class="dl">'</span><span class="s1">get</span><span class="dl">'</span><span class="p">,</span> <span class="na">args</span><span class="p">:</span> <span class="nx">args</span><span class="p">}</span>
<span class="p">);</span>
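<span class="k">return</span> <span class="nx">oldGetItem</span><span class="p">.</span><span class="nx">apply</span><span class="p">(</span><span class="k">this</span><span class="p">,</span> <span class="nx">args</span><span class="p">);</span>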
<span class="p">}</span>
</code></pre></div></div>
<p>We call such a mechanism a “hook”, as it hooks a custom function to a standard behavior.
So far, the following hooks have been implemented:</p>
<ul>
<li><a href="https://developer.mozilla.org/en-US/docs/Web/Events">DOM events</a>: catch when a user interacts with a webpage (by clicking, scrolling, hovering, or otherwise), when resources are loaded, etc. <sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>,</li>
<li><a href="https://developer.mozilla.org/en-US/docs/Web/API/Storage">Storage</a>: catch when values in the storages in the browser are read, written or removed</li>
</ul>
<p>If the front-end-tracker runs after one of these behaviors has already been triggered in the page, that occurrence is not reported.
Hence, to monitor everything we are interested in, the front-end-tracker needs to be the very first thing interpreted in the web page.</p>
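<p>Both the wrapping and this ordering constraint can be sketched outside a browser. In the sketch below, <code class="language-plaintext highlighter-rouge">FakeStorage</code> stands in for the browser’s <code class="language-plaintext highlighter-rouge">Storage</code>, and <code class="language-plaintext highlighter-rouge">installHook</code> and <code class="language-plaintext highlighter-rouge">seen</code> are hypothetical names, not part of the front-end-tracker’s API.</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Stand-in for a browser API such as Storage (hypothetical class).
class FakeStorage {
  constructor() {
    this.items = {};
  }

  getItem(key) {
    return key in this.items ? this.items[key] : null;
  }
}

// Events recorded by the hook.
const seen = [];

// Wrap a prototype method so that `report` runs before the original one.
function installHook(proto, methodName, report) {
  const original = proto[methodName];
  proto[methodName] = function (...args) {
    report({ method: methodName, args: args });
    // Preserve `this` and the argument list when delegating.
    return original.apply(this, args);
  };
}

const store = new FakeStorage();
store.items.token = 'abc';

// This call happens before the hook is installed: it is not reported.
store.getItem('token');

installHook(FakeStorage.prototype, 'getItem', function (event) {
  seen.push(event);
});

// This call is reported, and still returns the stored value.
const value = store.getItem('token');
</code></pre></div></div>
<p>Only the second call ends up in <code class="language-plaintext highlighter-rouge">seen</code>, which is why the hooks have to be installed before any other script runs.</p>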
<p>Another key concept of the front-end-tracker is the <strong>mailbox</strong>: a topic-based publish-subscribe object to which the functions from our hooks publish, and to which scripts in the page can subscribe.</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// example of subscription to log messages related to 'dom-events'</span>
<span class="kd">const</span> <span class="nx">topic</span> <span class="o">=</span> <span class="dl">'</span><span class="s1">dom-events</span><span class="dl">'</span><span class="p">;</span>
<span class="nx">mailbox</span><span class="p">.</span><span class="nx">subscribe</span><span class="p">(</span><span class="nx">topic</span><span class="p">,</span> <span class="p">(</span><span class="nx">_</span><span class="p">,</span> <span class="nx">data</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">data</span><span class="p">);</span>
<span class="p">});</span>
</code></pre></div></div>
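<p>For illustration only, a topic-based publish-subscribe object of this kind can be sketched in a few lines. <code class="language-plaintext highlighter-rouge">createMailbox</code> is a hypothetical helper, not the library’s actual implementation; callbacks receive the topic first and the data second, matching the subscription example above.</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Build a mailbox that keeps one list of callbacks per topic.
function createMailbox() {
  const subscribers = new Map();

  return {
    subscribe: function (topic, callback) {
      if (!subscribers.has(topic)) {
        subscribers.set(topic, []);
      }
      subscribers.get(topic).push(callback);
    },

    publish: function (topic, data) {
      // Topics without subscribers are silently ignored.
      const callbacks = subscribers.get(topic) || [];
      callbacks.forEach(function (callback) {
        callback(topic, data);
      });
    }
  };
}

const demoMailbox = createMailbox();
const received = [];

demoMailbox.subscribe('storage', function (_, data) {
  received.push(data);
});

// Delivered to the subscriber registered above.
demoMailbox.publish('storage', { action: 'get', args: ['token'] });
// Nobody subscribed to this topic: nothing happens.
demoMailbox.publish('dom-events', { type: 'click' });
</code></pre></div></div>
<p>Decoupling the hooks (publishers) from the security checks (subscribers) this way lets new checks be added without touching the hooks themselves.</p>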
<p>Written to be a standalone component, the front-end-tracker can be used to help debug any application.
That is why it has been released on <a href="https://npmjs.com">npm</a> under <a href="https://www.npmjs.com/package/@zaproxy/front-end-tracker" target="_blank">@zaproxy/front-end-tracker</a>.</p>
<h2 id="conclusion">Conclusion</h2>
<p>After these twelve weeks of internship, we ended up having an interesting proof-of-concept of our approach and tools for scanning modern web applications.</p>
<p>Not only did we implement the basic FrontEndScanner add-on and the front-end-tracker it relies on, but we also wrote the very first client-side passive script, which detects when JWT tokens are written in an application <sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>.</p>
<p>Unfortunately, not all the work presented here has been released yet: the FrontEndScanner still lacks the features and documentation needed to be made available on ZAP’s marketplace; see <a href="https://github.com/zaproxy/zaproxy/issues/4939" target="_blank">issue #4939</a> for more details.</p>
<p>On the other hand, the front-end-tracker is already published on npm, but could become even more useful with a couple more hooks added to it, such as ones for <a href="https://developer.mozilla.org/en-US/docs/Web/API/MutationObserver">DOM mutations</a>, <a href="https://developer.mozilla.org/en-US/docs/Glossary/XHR_(XMLHttpRequest)">XMLHttpRequest</a>, or <a href="https://developer.mozilla.org/en-US/docs/Web/API/Client/postMessage">postMessage</a>.</p>
<p>Unfortunately, as I am back at university, I do not have much time to invest in this, and as the ZAP core team members already have an awful lot of things to deal with, it does not seem that these features will be brought to ZAP users in the near future.</p>
<p>If you are interested in helping and contributing, you can take a look at the <a href="https://github.com/zaproxy/zaproxy/issues?q=is%3Aissue+is%3Aopen+FrontEndScanner" target="_blank">related issues opened on GitHub</a>, or come talk to the (very welcoming) team members on <code class="language-plaintext highlighter-rouge">irc.freenode.net</code>, in channel <code class="language-plaintext highlighter-rouge">#zaproxy</code>.</p>
<p><strong>I am always happy to receive constructive feedback, so do not hesitate to ping me, on twitter <a href="https://twitter.com/Pamplemouss_" target="_blank">@pamplemouss_</a> or elsewhere.</strong></p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Source: <a href="https://github.com/zaproxy/zap-core-help/wiki/HelpStartConceptsAddons" target="blank">https://github.com/zaproxy/zap-core-help/wiki/HelpStartConceptsAddons</a> . <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>The complete list of events to track: <a href="https://github.com/zaproxy/front-end-tracker/blob/master/src/events.js" target="blank">https://github.com/zaproxy/front-end-tracker/blob/master/src/events.js</a> . <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>This “scan-jwt-tokens” script is installed with the FrontEndScanner add-on, and thus available as an example for ZAP users. Here is what it looks like: <a href="https://github.com/zaproxy/zap-extensions/blob/master/addOns/frontendscanner/src/main/zapHomeFiles/scripts/scripts/client-side-passive/scan-jwt-tokens.js" target="blank">https://github.com/zaproxy/zap-extensions/blob/master/addOns/frontendscanner/src/main/zapHomeFiles/scripts/scripts/client-side-passive/scan-jwt-tokens.js</a> <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
<p>Pamplemousse</p>
<p>During the summer of 2018, I was an intern in the FoxSec team at Mozilla, where I contributed to ZAP (for Zed Attack Proxy), an open-source web application security scanner.</p>
<hr />
<p><strong>Patch option in Git</strong> (2018-06-29, <a href="https://blog.xaviermaso.com/2018/06/29/Patch%20option%20in%20Git">https://blog.xaviermaso.com/2018/06/29/Patch%20option%20in%20Git</a>)</p>
<h2 id="patch">Patch</h2>
<p>I have (almost) always used <code class="language-plaintext highlighter-rouge">git add -p</code> or <code class="language-plaintext highlighter-rouge">git add --patch</code>.</p>
<p>This option allows you to interactively select which pieces of your changes are added to the index.
(Before writing this article, I was even convinced that <code class="language-plaintext highlighter-rouge">-p</code> stood for “partial”…)</p>
<p>This is very convenient to 1) make sure that you will not commit unwanted code, and 2) partially save your changes without losing your work in progress.</p>
<p>I recently learned that this <code class="language-plaintext highlighter-rouge">-p</code>/<code class="language-plaintext highlighter-rouge">--patch</code> option is available for the <a href="https://git-scm.com/docs/git-checkout"><code class="language-plaintext highlighter-rouge">checkout</code></a> and the <a href="https://git-scm.com/docs/git-reset"><code class="language-plaintext highlighter-rouge">reset</code></a> commands as well!</p>
<p>With these, we can respectively get rid of only part of our changes and remove pieces of code from the index.</p>
<h2 id="examples">Examples</h2>
<p>Here is the example setup: I created a git repository in which I have committed a single file.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">ls
</span>example.txt
<span class="nv">$ </span>git status
On branch master
nothing to commit, working tree clean
<span class="nv">$ </span><span class="nb">cat </span>example.txt
This is an exmple fil.
Containing multiple lines.
Very interest.
</code></pre></div></div>
<h3 id="git-add--p">git add -p</h3>
<p>Let’s say we edited our <code class="language-plaintext highlighter-rouge">example.txt</code> to add some content that we would like to commit.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">cat </span>example.txt
File
This is an example file.
Containing multiple lines.
Very interesting.
</code></pre></div></div>
<p>We did multiple things here: added a “title” and corrected several words.
To keep things clean, we would like to make one commit for each one of these changes.</p>
<p>Here is how I would use <code class="language-plaintext highlighter-rouge">-p</code>:</p>
<ul>
<li>use “split” and “edit” to keep only the changes that correct the words</li>
<li>commit those changes</li>
<li>verify that only the title is added</li>
<li>commit this change</li>
</ul>
<p><img alt="showing the use of `git add -p`" src="/assets/images/2018-06-29-Patch%20option%20in%20Git/add_patch.gif" /></p>
<h3 id="git-checkout--p">git checkout -p</h3>
<p>Similarly, we can use <code class="language-plaintext highlighter-rouge">git checkout -p</code> to discard part of the changes that we have performed on a file.
Let’s say we have edited our <code class="language-plaintext highlighter-rouge">example.txt</code> to add a line in the middle and modify the last one.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">cat </span>example.txt
File
This is an example file.
Bwaaaaaaaaaah!
Containing multiple lines.
Some very interesting changes.
</code></pre></div></div>
<p>Then, we can use <code class="language-plaintext highlighter-rouge">git checkout -p</code> to get rid of the rubbish line that has been introduced:</p>
<ul>
<li>use split</li>
<li>get rid of the first part</li>
<li>but not of the second</li>
</ul>
<p><img alt="showing the use of `git checkout -p`" src="/assets/images/2018-06-29-Patch%20option%20in%20Git/checkout_patch.gif" /></p>
<h3 id="git-reset--p">git reset -p</h3>
<p>Finally, suppose we added our previous change to the index, along with an edit to the second line that contains a grammar error…</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">cat </span>example.txt
File
This is a example marvelous file.
Containing multiple lines.
Some very interesting changes.
<span class="nv">$ </span>git add example.txt
</code></pre></div></div>
<p>We changed our mind: it is not OK to commit broken English.</p>
<p>Let’s use <code class="language-plaintext highlighter-rouge">git reset -p</code> to remove the unwanted content from the index so we can commit peacefully:</p>
<ul>
<li>split the content</li>
<li>reset the first part</li>
<li>keep the second one</li>
<li>commit</li>
</ul>
<p><img alt="showing the use of `git reset -p`" src="/assets/images/2018-06-29-Patch%20option%20in%20Git/reset_patch.gif" /></p>
<p><strong>Et voilà!</strong></p>
<p>Pamplemousse</p>
<hr />
<p><strong>Google Hangouts with Irssi on NixOS</strong> (2018-05-31, <a href="https://blog.xaviermaso.com/2018/05/31/Google%20Hangouts%20with%20Irssi%20on%20NixOS">https://blog.xaviermaso.com/2018/05/31/Google%20Hangouts%20with%20Irssi%20on%20NixOS</a>)</p>
<p>As I wanted to have access to Google Hangouts chats with <a href="https://irssi.org/">Irssi</a> on <a href="https://nixos.org/">NixOS</a>, here is a write-up of how I got it working.</p>
<h2 id="the-protagonists">The protagonists</h2>
<p>After a quick search with my favorite search engine, <a href="https://lmddgtfy.net/?q=irssi%20google%20hangouts">DuckDuckGo</a>, it turns out that we will need to add two pieces of software alongside Irssi.</p>
<ul>
<li><a href="https://www.bitlbee.org/main.php/news.r.html">BitlBee</a>: an IRC gateway that acts as a server your client connects to using the IRC protocol, and “translates” what you send and receive to another protocol (depending on which service your gateway connects to)</li>
<li><a href="https://bitbucket.org/EionRobb/purple-hangouts">purple-hangouts</a>: a library “to support the proprietary protocol that Google uses for its Hangouts service”</li>
</ul>
<h2 id="add-them-to-the-system">Add them to the system</h2>
<p>We are pretty lucky, as packages for <a href="https://nixos.org/nixos/packages.html#bitlbee">BitlBee</a> and <a href="https://nixos.org/nixos/packages.html#purple-hangout">purple-hangouts</a> are available on NixOS.</p>
<p>However, <code class="language-plaintext highlighter-rouge">purple</code> is not a plugin installed by default in the <code class="language-plaintext highlighter-rouge">bitlbee</code> package: we need to declare that we want it enabled.</p>
<p>Having a look at the <a href="https://github.com/NixOS/nixpkgs/blob/8aa385069f830fc801c8a04d2bd8a70a02be3de4/pkgs/applications/networking/instant-messengers/bitlbee/default.nix#L27">declaration of <code class="language-plaintext highlighter-rouge">bitlbee</code></a>, we can find out the name of the relevant build option.</p>
<p>Let’s edit <code class="language-plaintext highlighter-rouge">/etc/nixos/configuration.nix</code>:</p>
<div class="language-nix highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">environment</span><span class="o">.</span><span class="nv">systemPackages</span> <span class="o">=</span> <span class="kn">with</span> <span class="nv">pkgs</span><span class="p">;</span> <span class="p">[</span>
<span class="p">[</span><span class="o">...</span><span class="p">]</span>
<span class="nv">bitlbee</span>
<span class="nv">purple-hangouts</span>
<span class="p">[</span><span class="o">...</span><span class="p">]</span>
<span class="p">];</span>
<span class="nv">nixpkgs</span><span class="o">.</span><span class="nv">config</span><span class="o">.</span><span class="nv">bitlbee</span><span class="o">.</span><span class="nv">enableLibPurple</span> <span class="o">=</span> <span class="kc">true</span><span class="p">;</span>
<span class="nv">services</span><span class="o">.</span><span class="nv">bitlbee</span> <span class="o">=</span> <span class="p">{</span>
<span class="nv">enable</span> <span class="o">=</span> <span class="kc">true</span><span class="p">;</span>
<span class="nv">libpurple_plugins</span> <span class="o">=</span> <span class="p">[</span> <span class="nv">pkgs</span><span class="o">.</span><span class="nv">purple-hangouts</span> <span class="p">];</span>
<span class="p">};</span>
</code></pre></div></div>
<p><strong>You can see how this fits into my whole configuration <a href="https://github.com/Pamplemousse/laptop/blob/master/etc/nixos/configuration.nix">on my GitHub repo</a></strong>.</p>
<p>Then rebuild the system:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>nixos-rebuild switch
</code></pre></div></div>
<h2 id="try-it-out">Try it out</h2>
<p>See how it goes: start <code class="language-plaintext highlighter-rouge">irssi</code>, then type the following commands:</p>
<pre><code class="language-irc"><@pamplemousse> /connect localhost
<@pamplemousse> /join &bitlbee
</code></pre>
<p>At this point, I need to create an account to identify myself to the BitlBee server.</p>
<pre><code class="language-irc"><@pamplemousse> register StrongPasswordGeneratedWithKeepassXC
</code></pre>
<p>And verify that the plugin used to communicate with Google Hangouts is present:</p>
<pre><code class="language-irc"><@pamplemousse> plugins
[...]
<@root> Enabled Protocols: aim, bonjour, gg, hangouts, icq, identica, irc, jabber, novell, oscar, simple, twitter, zephysr
</code></pre>
<p>All good!</p>
<h2 id="setting-up-bitlbee-to-access-your-google-hangouts-account">Setting up BitlBee to access your Google Hangouts account</h2>
<pre><code class="language-irc"><@pamplemousse> account add hangouts MyAddress@Email.Com
<@pamplemousse> acc hangouts on
</code></pre>
<p>The next step is one of the most unreliable things I have ever done to configure an account.</p>
<p>In fact, the previous command created another Irssi window to interact with the lib (that is, a private conversation with <code class="language-plaintext highlighter-rouge">purple_request_0</code>).</p>
<p>Follow the instructions that appear there, and <strong>reply in that conversation with the OAuth code that you obtain</strong> (it took me 30 minutes to figure this out).</p>
<p>Once that’s done, you should see all your contacts appearing in the <code class="language-plaintext highlighter-rouge">&bitlbee</code> window.</p>
<h2 id="try-it-out-1">Try it out</h2>
<p>We can now start 1-on-1 conversations, for example with JohnDoe:</p>
<pre><code class="language-irc"><@pamplemousse> /msg JohnDoe hello
</code></pre>
<p>And even join group chats (that already exist):</p>
<pre><code class="language-irc"><@pamplemousse> help chat list
<@pamplemousse> chat list hangouts
<@pamplemousse> chat add hangouts !1 #chatname
</code></pre>
<p>So later on, we can use the shortcut <code class="language-plaintext highlighter-rouge">#chatname</code>:</p>
<pre><code class="language-irc"><@pamplemousse> /j #chatname
</code></pre>
<p><strong>That’s it! We now can chat on Google Hangouts from Irssi!</strong></p>
<p>Pamplemousse</p>
<hr />
<p><strong>Setup a dev environment to contribute to ZAP</strong> (2018-04-15, <a href="https://blog.xaviermaso.com/2018/04/15/Setup%20a%20dev%20environment%20to%20contribute%20to%20ZAP">https://blog.xaviermaso.com/2018/04/15/Setup%20a%20dev%20environment%20to%20contribute%20to%20ZAP</a>)</p>
<h2 id="foreword">Foreword</h2>
<p><a href="https://www.zaproxy.org/">ZAP, or Zed Attack Proxy</a>, is an <a href="https://www.owasp.org">OWASP</a> project to make a free security tool to help developers and security experts test and find vulnerabilities in web applications.</p>
<p>I have been given the opportunity to contribute to it, and, since it is an open-source project, I feel like it would be a good idea to share my tribulations.
I ambitiously hope it will reduce the time and effort potential future contributors would have to invest diving into it.</p>
<p>The subject is so wide I will not be able to cover it entirely in this single blog post.
There is enough content to start a series, and I should stay motivated to write about it: stay tuned.</p>
<p>For now, let’s start with the basics.</p>
<p><strong>How to get the code running on my machine?</strong></p>
<h2 id="first-steps">First steps</h2>
<p>Documentation about ZAP development can be found on the <a href="https://github.com/zaproxy/zaproxy/wiki/Development">ZAP’s repository wiki</a>. Everything I present here was found by roaming around the docs.</p>
<p>Let’s start by cloning the main repository, building ZAP, and starting it from the command line.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">mkdir </span>zap <span class="o">&&</span> <span class="nb">cd </span>zap
git clone git@github.com:zaproxy/zaproxy.git
ant <span class="nt">-f</span> zaproxy/build/build.xml dist
./zaproxy/build/zap/zap.sh
</code></pre></div></div>
<p>And then we have ZAP running. Smooth.</p>
<h2 id="extensions-add-ons">Extensions, Add-ons</h2>
<p>Lots of ZAP’s “logic” has been extracted from the core repo into so-called add-ons, which are located in the <a href="https://github.com/zaproxy/zap-extensions"><code class="language-plaintext highlighter-rouge">zap-extensions</code> repo</a>.</p>
<p>Here you can find a general overview about Add-ons: <a href="https://github.com/zaproxy/zap-core-help/wiki/HelpStartConceptsAddons">github.com/zaproxy/zap-core-help/wiki/HelpStartConceptsAddons</a>.</p>
<p>As we are going to work on these as well, let’s clone the repository alongside the <code class="language-plaintext highlighter-rouge">zaproxy</code> one.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone git@github.com:zaproxy/zap-extensions.git
</code></pre></div></div>
<p>Again, <a href="https://github.com/zaproxy/zap-extensions/wiki">its related wiki</a> might be of great help.</p>
<p>There are (as far as I know) two ways to get add-ons in ZAP: via the “marketplace” (located in “Manage Add-ons” in the “Top Level Toolbar”) or by loading them from a file.</p>
<p>In our case, as we want to edit add-ons and watch their brand new behaviour, we will use the latter.</p>
<p>In an upcoming post, we will talk about how to bring changes to an Add-on, but before that, let’s ensure we can build them normally.</p>
<h3 id="build-and-use-the-add-on-as-is">Build and use the Add-on “as is”</h3>
<p>As we are going to have a look at a specific issue related to <a href="https://github.com/mozilla/zest">Zest</a>, this is the add-on we are going to look at.</p>
<p>Depending on the extension we want to work on, we will have to check out the related branch. In our case, we want to work on Zest, so let’s check out the <code class="language-plaintext highlighter-rouge">beta</code> branch, and build this Add-on specifically.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd zap-extensions
git checkout beta
ant -f build/build.xml deploy-zest
</code></pre></div></div>
<p>Then, in ZAP, press <code class="language-plaintext highlighter-rouge">Ctrl+l</code> (or go to “File > Load Add-on File”), then select the brand new plugin file (usually, it has been deployed to the “zap/zaproxy/src/plugin/” folder).</p>
<p><strong>At this point, you should have the Zest extension working fine in ZAP, and that’s enough for today.</strong></p>
<p>Pamplemousse</p>