Witch of Light
A collection of Cassie's harebrained schemes.
2020-02-18T19:00:00-05:00
https://blog.witchoflight.com
Cassie Jones (cassie+blog@witchoflight.com)

I'm a Rust Contributor
2016-09-19T20:00:00-04:00
https://blog.witchoflight.com/2016/i-am-a-rust-contributor/
<p>A month or two ago I was on the #rust IRC when someone discovered that <code>pow()</code> didn’t act quite right for unsigned numbers.
This was a bug that was isolated to a single function, so it seemed like something that I could handle.
The issue got posted, I claimed it and debugged it, and actually managed to fix it!
It took a little while, but very early this morning PR #34942 <a href="https://github.com/rust-lang/rust/pull/34942#event-795131414"><em>Fix overflow checking in unsigned pow()</em></a> was merged.
Now I’m a contributor to Rust!</p>
<p>Try to find a small thing that you can fix in something that you use.
Somewhere in there is the right issue that you can fix.
It’s a great experience.
(I really look forward to the release notes for 1.12…)</p>
<p><strong>Update:</strong>
It turns out my fix made it into the 1.13 release, and my name is in the <a href="https://blog.rust-lang.org/2016/11/10/Rust-1.13.html#contributors-to-1130">contributors section in the release notes</a>.</p>
You Got Your Race Condition Inside My Package Manager!
2017-08-13T20:00:00-04:00
https://blog.witchoflight.com/2017/package-manager-race-condition/
<h1 id="a-case-of-broken-builds"><a class="header-anchor c-sun dec-none" href="#a-case-of-broken-builds">¶</a> A Case of Broken Builds</h1>
<p>The continuous integration servers at my current job are unfortunately stateful.
Every week or so, we run a bunch of configuration processes to reinstall packages to keep the environment clean.
One of these reinstalls <code>pip</code> and the Python libraries used by build tools.
This morning, I got a message from one of the build engineers telling me that the Python libraries weren’t installing correctly anymore.
(Even though I’m an intern, I’m apparently one of the office Python experts now.)
So, I opened up the build log, and began looking around.</p>
<p>What was failing was pretty clear:</p>
<pre><code>Collecting ruamel.yaml
Using cached ruamel.yaml-0.15.28.tar.gz
Installing collected packages: ruamel.yaml
...
error: Microsoft Visual C++ 9.0 is required
</code></pre>
<p>but why was this suddenly happening now, without us making any changes to our configuration?
Also, why are we installing ruamel.yaml?
We’re not using that!</p>
<p>Long story short, ruamel.yaml was a transitive dependency of dateparser, an excellent library for parsing natural language dates.
It wasn’t clear to me why it would be suddenly failing though, so I decided to go investigating further.
Looking at the release notes of dateparser, I saw that they had recently pinned ruamel.yaml to <code><0.14</code>, which we clearly weren’t getting.
Previously, the version was un-pinned, so I decided to go look at the release notes for ruamel.yaml, and sure enough, there were releases over the weekend—those must’ve been what broke it.</p>
<p>We upgraded our dependency on dateparser to 0.6, and tried again… and it still failed while trying to build the newest version of ruamel.yaml.
After a stretch of reading GitHub blame views and commit histories and unpacking PyPI tarballs, I determined that the dateparser 0.6 release on PyPI doesn’t actually pin the version of ruamel.yaml, despite what the changelog claims.
(I opened <a href="https://github.com/scrapinghub/dateparser/issues/342">dateparser issue #342</a> for this.)</p>
<p>Since the version wasn’t pinned, we just asked pip to install an older version of ruamel.yaml first, hoping it would take priority when dateparser tried to install it.
So, we put <code>ruamel.yaml==0.13.14</code> in our package list, and then tried again.
Finally, everything worked perfectly.</p>
<p>Case closed.</p>
<hr>
<h1 id="this-fix-is-a-mystery"><a class="header-anchor c-sun dec-none" href="#this-fix-is-a-mystery">¶</a> This Fix is a Mystery</h1>
<p>But wait, what’s this?
Looking closer at the successful build logs, we can see that both <code>ruamel.yaml-0.13.14</code> and <code>ruamel.yaml-0.15.29</code> are installing without complaint.
What’s stopped the error?
Well, if you look at the version number at the top, we were installing <code>ruamel.yaml-0.15.28</code> before: just one hour earlier, while I was on my lunch break, a new version of ruamel.yaml had been released.
Looking back at previous versions on PyPI, I finally figured out what had gone wrong.
If you look at the downloads on the PyPI page for <a href="https://pypi.python.org/pypi/ruamel.yaml/0.15.28">ruamel.yaml version 0.15.28</a>, you’ll see that there are no Windows wheels.
(Wheels are the format that Python uses to distribute compiled C extensions and pre-packed libraries.)
However, if you go to the page for <a href="https://pypi.python.org/pypi/ruamel.yaml/0.15.29">version 0.15.29</a>, then you’ll see that Windows wheels are finally present.
So, I guess until dateparser fixes their version pinning, we’ll just have to hope that ruamel.yaml stays packaged correctly.</p>
<p>Case closed.</p>
<hr>
<h1 id="we-get-very-unlucky"><a class="header-anchor c-sun dec-none" href="#we-get-very-unlucky">¶</a> We Get Very Unlucky</h1>
<p>Oops, nope it’s not.
Later in the afternoon, I got another message that some of the builds had failed.
Looking at the first build that started failing, again we see that…</p>
<pre><code>Collecting ruamel.yaml
Using cached ruamel.yaml-0.15.30.tar.gz
Installing collected packages: ruamel.yaml
...
error: Microsoft Visual C++ 9.0 is required
</code></pre>
<p>okay, this project releases <em>fast</em>: this is the fourth release in 2 days.
In any case, the last few builds succeeded with <code>0.15.30</code>, so what happened?
Well, I don’t know for sure, but I have a pretty good guess.
I suspect that the release process for ruamel.yaml isn’t atomic, and that they upload their source releases first, and the wheels come a bit later.
We were unlucky enough to start a build during that first upload, where only the source package was available, and no Windows wheels.
But the few builds that got held up and started 4 minutes after the others began late enough that the wheels were already available, so they installed without any fuss.</p>
<p>This was an exceptionally unlucky situation.
But, I’ve got a very good story now—and also a much greater appreciation for various package manager <code>.lock</code> files.</p>
Debugging in the Deep End
2018-04-16T20:00:00-04:00
https://blog.witchoflight.com/2018/debugging-the-deep-end/
<h1 id="the-problem"><a class="header-anchor c-sun dec-none" href="#the-problem">¶</a> The Problem</h1>
<p>Last week I was working with <a href="http://twitter.com/a2aarontothe2">Aaron</a> on a series of <a href="https://github.com/a2aaron/VCVMicroTools">VCV Rack plugin modules</a>, and we were trying to add our own custom graphics for them.
VCV Rack uses SVG for its plugins, so Aaron had built a front face for one of our modules, but it wasn’t properly aligned.
I imported it into Affinity Designer and tried to fix it up, but when I exported my new version and loaded it, suddenly all of our modules had vanished.
Since our module wasn’t <em>supposed</em> to vanish, and I hadn’t done anything <em>obviously</em> wrong, I decided that this must be a bug in VCV Rack.
Over the next few hours, I diagnosed and managed to fix this bug, and by the magic of open source and some luck, the PRs got merged the next day.
In particular, I made this fix without ever having looked at any of this code before, and I’d like to share the process I followed.</p>
<h1 id="debugging-the-svg"><a class="header-anchor c-sun dec-none" href="#debugging-the-svg">¶</a> Debugging the SVG</h1>
<p>The first phase when fixing a bug is to reproduce the bug.
Here, because the rendering worked fine with Aaron’s SVG until I re-exported it, I suspected that some feature being used in Affinity’s SVG export wasn’t supported by the VCV Rack SVG renderer.
To figure out which, I used the first technique: minimize your failing case.</p>
<p>First, I tried changing export settings, removing groups to flatten the SVG, doing everything I could to remove different features.
As I went, I inspected the working and not working SVG side-by-side to see what the differences were.
I didn’t make much progress this way, so I started from the other direction, building up instead of tearing down.
I saved a simple blank grey square, just a single element.
When that didn’t work, I figured it must have something to do with one of the attributes on the <code>&lt;svg&gt;</code> container element.
For reference, a minimal SVG exported from Affinity might look something like:</p>
<pre class="language-svg"><code class="language-svg"><span class="token prolog"><?xml version="1.0" encoding="UTF-8" standalone="no"?></span><br><span class="token doctype"><span class="token punctuation"><!</span><span class="token doctype-tag">DOCTYPE</span> <span class="token name">svg</span> <span class="token name">PUBLIC</span> <span class="token string">"-//W3C//DTD SVG 1.1//EN"</span><br> <span class="token string">"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd"</span><span class="token punctuation">></span></span><br><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>svg</span> <span class="token attr-name">width</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>100%<span class="token punctuation">"</span></span> <span class="token attr-name">height</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>100%<span class="token punctuation">"</span></span> <span class="token attr-name">viewBox</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>0 0 240 380<span class="token punctuation">"</span></span> <span class="token attr-name">version</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>1.1<span class="token punctuation">"</span></span><br> <span class="token attr-name">xmlns</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>http://www.w3.org/2000/svg<span class="token punctuation">"</span></span> <span class="token attr-name"><span class="token namespace">xmlns:</span>xlink</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>http://www.w3.org/1999/xlink<span 
class="token punctuation">"</span></span><br> <span class="token attr-name"><span class="token namespace">xml:</span>space</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>preserve<span class="token punctuation">"</span></span> <span class="token attr-name"><span class="token namespace">xmlns:</span>serif</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>http://www.serif.com/<span class="token punctuation">"</span></span><br> <span class="token special-attr"><span class="token attr-name">style</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span><span class="token value css language-css"><span class="token property">fill-rule</span><span class="token punctuation">:</span>evenodd<span class="token punctuation">;</span><span class="token property">clip-rule</span><span class="token punctuation">:</span>evenodd<span class="token punctuation">;</span><span class="token property">stroke-linejoin</span><span class="token punctuation">:</span>round<span class="token punctuation">;</span><br> <span class="token property">stroke-miterlimit</span><span class="token punctuation">:</span>1.41421<span class="token punctuation">;</span></span><span class="token punctuation">"</span></span></span><span class="token punctuation">></span></span><br> <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>rect</span> <span class="token attr-name">x</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>0<span class="token punctuation">"</span></span> <span class="token attr-name">y</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>0<span class="token 
punctuation">"</span></span> <span class="token attr-name">width</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>240<span class="token punctuation">"</span></span> <span class="token attr-name">height</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>380<span class="token punctuation">"</span></span> <span class="token special-attr"><span class="token attr-name">style</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span><span class="token value css language-css"><span class="token property">fill</span><span class="token punctuation">:</span><span class="token function">rgb</span><span class="token punctuation">(</span>235<span class="token punctuation">,</span>235<span class="token punctuation">,</span>235<span class="token punctuation">)</span><span class="token punctuation">;</span></span><span class="token punctuation">"</span></span></span><span class="token punctuation">/></span></span><br><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>svg</span><span class="token punctuation">></span></span></code></pre>
<p>Looking through the attributes on that element, I noticed that it set the width and height to <code>100%</code>, whereas the working one set them to explicit pixel values.
I copied the width and height from the working file into the broken one… and that fixed it.
That suggested a possible problem: if you used percentage dimensions on the <code>&lt;svg&gt;</code> element, the renderer wouldn’t calculate the size of the object correctly and would simply make it 0 by 0.
This was a good enough guess for me, so I set about trying to figure out how to fix that.</p>
<h1 id="spelunking-the-code"><a class="header-anchor c-sun dec-none" href="#spelunking-the-code">¶</a> Spelunking the Code</h1>
<p>This brings us to the second phase of solving the bug: find a piece of code that’s related to the bug, so you have a place to start.
I suspected that I could find where the SVGs were loaded in VCV Rack and fix it to handle those percentages correctly.
I didn’t know exactly how I would handle them yet; I had to see what it was doing first.
To find this, I took a simple approach: search the source code for the word “SVG” and see what I could find!
I used <a href="https://github.com/BurntSushi/ripgrep">ripgrep</a>, a very good search tool, but you can use whatever tool you have available as long as it can search all the code at once.
If your editor can jump to definitions in a project, searching for related words and then jumping from definition to definition can help you find the part of the code you’re interested in very quickly; having good code navigation tools helps <em>a lot</em>.</p>
<p>Using this, I found SVG widgets, followed their class hierarchy up to rendering components, and then eventually I found my way to a class calling functions from “nanosvg.”
Curious, I looked it up, and saw that it was a small SVG parser library, and that it produces a bunch of shape paths.
Assuming that fixing it inside VCV Rack would mean resizing all of those paths, I decided to try fixing the bug inside nanosvg instead.
Knowing that it was a problem with dimensions, I searched the nanosvg code for the string <code>"width"</code>.
The second result was a very promising looking function:</p>
<pre class="language-c"><code class="language-c"><span class="token keyword">static</span> <span class="token keyword">void</span> <span class="token function">nsvg__parseSVG</span><span class="token punctuation">(</span>NSVGparser<span class="token operator">*</span> p<span class="token punctuation">,</span> <span class="token keyword">const</span> <span class="token keyword">char</span><span class="token operator">*</span><span class="token operator">*</span> attr<span class="token punctuation">)</span><br><span class="token punctuation">{</span><br> <span class="token keyword">int</span> i<span class="token punctuation">;</span><br> <span class="token keyword">for</span> <span class="token punctuation">(</span>i <span class="token operator">=</span> <span class="token number">0</span><span class="token punctuation">;</span> attr<span class="token punctuation">[</span>i<span class="token punctuation">]</span><span class="token punctuation">;</span> i <span class="token operator">+=</span> <span class="token number">2</span><span class="token punctuation">)</span> <span class="token punctuation">{</span><br> <span class="token keyword">if</span> <span class="token punctuation">(</span><span class="token operator">!</span><span class="token function">nsvg__parseAttr</span><span class="token punctuation">(</span>p<span class="token punctuation">,</span> attr<span class="token punctuation">[</span>i<span class="token punctuation">]</span><span class="token punctuation">,</span> attr<span class="token punctuation">[</span>i <span class="token operator">+</span> <span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token punctuation">{</span><br> <span class="token keyword">if</span> <span class="token punctuation">(</span><span class="token function">strcmp</span><span class="token punctuation">(</span>attr<span class="token punctuation">[</span>i<span 
class="token punctuation">]</span><span class="token punctuation">,</span> <span class="token string">"width"</span><span class="token punctuation">)</span> <span class="token operator">==</span> <span class="token number">0</span><span class="token punctuation">)</span> <span class="token punctuation">{</span><br> p<span class="token operator">-></span>image<span class="token operator">-></span>width <span class="token operator">=</span> <span class="token function">nsvg__parseCoordinate</span><span class="token punctuation">(</span>p<span class="token punctuation">,</span> attr<span class="token punctuation">[</span>i <span class="token operator">+</span> <span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">,</span> <span class="token number">0.0f</span><span class="token punctuation">,</span> <span class="token number">0.0f</span><span class="token punctuation">)</span><span class="token punctuation">;</span><br> <span class="token punctuation">}</span> <span class="token keyword">else</span> <span class="token keyword">if</span> <span class="token punctuation">(</span><span class="token function">strcmp</span><span class="token punctuation">(</span>attr<span class="token punctuation">[</span>i<span class="token punctuation">]</span><span class="token punctuation">,</span> <span class="token string">"height"</span><span class="token punctuation">)</span> <span class="token operator">==</span> <span class="token number">0</span><span class="token punctuation">)</span> <span class="token punctuation">{</span><br> p<span class="token operator">-></span>image<span class="token operator">-></span>height <span class="token operator">=</span> <span class="token function">nsvg__parseCoordinate</span><span class="token punctuation">(</span>p<span class="token punctuation">,</span> attr<span class="token punctuation">[</span>i <span class="token operator">+</span> <span class="token number">1</span><span class="token 
punctuation">]</span><span class="token punctuation">,</span> <span class="token number">0.0f</span><span class="token punctuation">,</span> <span class="token number">0.0f</span><span class="token punctuation">)</span><span class="token punctuation">;</span><br><span class="token comment">// …</span></code></pre>
<h1 id="writing-a-fix"><a class="header-anchor c-sun dec-none" href="#writing-a-fix">¶</a> Writing a Fix</h1>
<p>I’d located a likely location for the bug, so now I changed mode from code spelunking to trying to understand what the code did.
Since this function looked so relevant, I first tried to figure out what <code>nsvg__parseSVG</code> was doing.
A good tool for this was finding where it was used: it was called in exactly one place, from <code>nsvg__startElement</code>, apparently whenever an <code>&lt;svg&gt;</code> tag was found, to compute the context from its attributes… perfect.
The parameter <code>const char** attr</code> suggested a list of attribute strings, and the usage <code>attr[i]</code> and <code>attr[i + 1]</code> suggested the SVG key/value pairs.
Therefore, it seemed like</p>
<pre class="language-c"><code class="language-c"><span class="token keyword">if</span> <span class="token punctuation">(</span><span class="token function">strcmp</span><span class="token punctuation">(</span>attr<span class="token punctuation">[</span>i<span class="token punctuation">]</span><span class="token punctuation">,</span> <span class="token string">"width"</span><span class="token punctuation">)</span> <span class="token operator">==</span> <span class="token number">0</span><span class="token punctuation">)</span><br> p<span class="token operator">-></span>image<span class="token operator">-></span>width <span class="token operator">=</span> <span class="token function">nsvg__parseCoordinate</span><span class="token punctuation">(</span>p<span class="token punctuation">,</span> attr<span class="token punctuation">[</span>i <span class="token operator">+</span> <span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">,</span> <span class="token number">0.0f</span><span class="token punctuation">,</span> <span class="token number">1.0f</span><span class="token punctuation">)</span><span class="token punctuation">;</span></code></pre>
<p>would parse the width coordinate value.
To see how, we need to look at <code>nsvg__parseCoordinate</code>.</p>
<pre class="language-c"><code class="language-c"><span class="token keyword">static</span> <span class="token keyword">float</span> <span class="token function">nsvg__parseCoordinate</span><span class="token punctuation">(</span>NSVGparser<span class="token operator">*</span> p<span class="token punctuation">,</span> <span class="token keyword">const</span> <span class="token keyword">char</span><span class="token operator">*</span> str<span class="token punctuation">,</span><br> <span class="token keyword">float</span> orig<span class="token punctuation">,</span> <span class="token keyword">float</span> length<span class="token punctuation">)</span><br><span class="token punctuation">{</span><br> NSVGcoordinate coord <span class="token operator">=</span> <span class="token function">nsvg__parseCoordinateRaw</span><span class="token punctuation">(</span>str<span class="token punctuation">)</span><span class="token punctuation">;</span><br> <span class="token keyword">return</span> <span class="token function">nsvg__convertToPixels</span><span class="token punctuation">(</span>p<span class="token punctuation">,</span> coord<span class="token punctuation">,</span> orig<span class="token punctuation">,</span> length<span class="token punctuation">)</span><span class="token punctuation">;</span><br><span class="token punctuation">}</span></code></pre>
<p>Following those definitions, <code>nsvg__parseCoordinateRaw</code> takes a few steps to get to unit parsing, but it is largely straightforward parsing of the data, with no fancy processing.
Since the problem only shows up with percentages, <code>nsvg__convertToPixels</code> was the likelier place for something interesting to be happening.
And indeed, looking at the code for that function, it made clear what the <code>length</code> argument did:</p>
<pre class="language-c"><code class="language-c"><span class="token keyword">static</span> <span class="token keyword">float</span> <span class="token function">nsvg__convertToPixels</span><span class="token punctuation">(</span>NSVGparser<span class="token operator">*</span> p<span class="token punctuation">,</span> NSVGcoordinate c<span class="token punctuation">,</span><br> <span class="token keyword">float</span> orig<span class="token punctuation">,</span> <span class="token keyword">float</span> length<span class="token punctuation">)</span><br><span class="token punctuation">{</span><br> NSVGattrib<span class="token operator">*</span> attr <span class="token operator">=</span> <span class="token function">nsvg__getAttr</span><span class="token punctuation">(</span>p<span class="token punctuation">)</span><span class="token punctuation">;</span><br> <span class="token keyword">switch</span> <span class="token punctuation">(</span>c<span class="token punctuation">.</span>units<span class="token punctuation">)</span> <span class="token punctuation">{</span><br> <span class="token comment">// …</span><br> <span class="token keyword">case</span> NSVG_UNITS_PERCENT<span class="token operator">:</span> <span class="token keyword">return</span> orig <span class="token operator">+</span> c<span class="token punctuation">.</span>value <span class="token operator">/</span> <span class="token number">100.0f</span> <span class="token operator">*</span> length<span class="token punctuation">;</span><br> <span class="token keyword">default</span><span class="token operator">:</span> <span class="token keyword">return</span> c<span class="token punctuation">.</span>value<span class="token punctuation">;</span><br> <span class="token punctuation">}</span><br> <span class="token keyword">return</span> c<span class="token punctuation">.</span>value<span class="token punctuation">;</span><br><span class="token punctuation">}</span></code></pre>
<p>It is the base value that the percentage is taken relative to.
Then it becomes clear: <code>nsvg__parseCoordinate(p, attr[i + 1], 0.0f, 1.0f);</code> turns <code>100%</code> into <code>1px</code>.
Now that we know exactly what has gone wrong, how do we solve it?
Since I didn’t know what the percentages should be relative to, I started researching, reading Mozilla’s references on how percentage dimensions should behave.</p>
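<p>To make the arithmetic concrete, here is a hypothetical Rust transliteration of just the percentage branch shown above. The name <code>percent_to_pixels</code> is made up for illustration and is not part of nanosvg. It evaluates <code>orig + value / 100 * length</code> with <code>orig = 0</code> and <code>length = 1</code>, as in the <code>&lt;svg&gt;</code> attribute parsing, so every percentage collapses to a fraction of a single pixel.</p>

```rust
// Hypothetical Rust transliteration of the NSVG_UNITS_PERCENT branch of
// nanosvg's nsvg__convertToPixels (illustration only, not nanosvg's API).
fn percent_to_pixels(orig: f32, value: f32, length: f32) -> f32 {
    // `length` is the reference size that the percentage is taken of.
    orig + value / 100.0 * length
}

fn main() {
    // The svg element's width/height were parsed with orig = 0.0 and
    // length = 1.0, so a percentage becomes a fraction of one pixel.
    for value in [100.0_f32, 50.0] {
        let px = percent_to_pixels(0.0, value, 1.0);
        println!("{}% -> {}px", value, px);
    }
}
```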
<p>I didn’t find an answer, but while I was researching, I ran into lots of examples that didn’t specify dimensions at all.
This made me suspicious: nanosvg handles most SVGs correctly, so it must have some code to handle this case.
When you’re fixing a bug, often the edge case that you’re running into is similar to another edge case that’s already handled, and you just need to make it cover your case as well.
Since this must be related to the dimensions, and the dimension handling sets the <code>width</code> field while parsing the <code>&lt;svg&gt;</code> element, I searched the code for <code>->width</code> and <code>.width</code>.
I immediately found <code>nsvg__scaleToViewbox</code> which contains a promising looking block of code:</p>
<pre class="language-c"><code class="language-c"><span class="token keyword">if</span> <span class="token punctuation">(</span>p<span class="token operator">-></span>viewWidth <span class="token operator">==</span> <span class="token number">0</span><span class="token punctuation">)</span> <span class="token punctuation">{</span><br> <span class="token keyword">if</span> <span class="token punctuation">(</span>p<span class="token operator">-></span>image<span class="token operator">-></span>width <span class="token operator">></span> <span class="token number">0</span><span class="token punctuation">)</span> <span class="token punctuation">{</span><br> p<span class="token operator">-></span>viewWidth <span class="token operator">=</span> p<span class="token operator">-></span>image<span class="token operator">-></span>width<span class="token punctuation">;</span><br> <span class="token punctuation">}</span> <span class="token keyword">else</span> <span class="token punctuation">{</span><br> p<span class="token operator">-></span>viewMinx <span class="token operator">=</span> bounds<span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">;</span><br> p<span class="token operator">-></span>viewWidth <span class="token operator">=</span> bounds<span class="token punctuation">[</span><span class="token number">2</span><span class="token punctuation">]</span> <span class="token operator">-</span> bounds<span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">;</span><br> <span class="token punctuation">}</span><br><span class="token punctuation">}</span></code></pre>
<p>This looks like what we want!
It will recalculate the width and height if they’re set to 0, so we just need to make sure that our 100% sets it to 0 instead of 1.
And to fix that, we can simply change:</p>
<pre class="language-diff"><code class="language-diff"><span class="token unchanged"><span class="token prefix unchanged"> </span><span class="token line">if (strcmp(attr[i], "width") == 0) {<br></span></span><span class="token deleted-sign deleted"><span class="token prefix deleted">-</span><span class="token line"> p->image->width = nsvg__parseCoordinate(p, attr[i + 1], 0.0f, 1.0f);<br></span></span><span class="token inserted-sign inserted"><span class="token prefix inserted">+</span><span class="token line"> p->image->width = nsvg__parseCoordinate(p, attr[i + 1], 0.0f, 0.0f);<br></span></span><span class="token unchanged"><span class="token prefix unchanged"> </span><span class="token line">} else if (strcmp(attr[i], "height") == 0) {<br></span></span><span class="token deleted-sign deleted"><span class="token prefix deleted">-</span><span class="token line"> p->image->height = nsvg__parseCoordinate(p, attr[i + 1], 0.0f, 1.0f);<br></span></span><span class="token inserted-sign inserted"><span class="token prefix inserted">+</span><span class="token line"> p->image->height = nsvg__parseCoordinate(p, attr[i + 1], 0.0f, 0.0f);<br></span></span><span class="token unchanged"><span class="token prefix unchanged"> </span><span class="token line">} else if (strcmp(attr[i], "viewBox") == 0) {</span></span></code></pre>
<p>And that’s the whole fix!</p>
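<p>The effect of the fix can be modelled with a small, hypothetical Rust sketch. It assumes, as described above, that a width of 0 is later recalculated, here from the view width, and it uses the <code>viewBox</code> width of 240 from the example SVG. <code>resolve_width</code> is an illustrative name, not a nanosvg function.</p>

```rust
// Simplified, hypothetical model of the fallback discussed above: a zero
// image width is replaced by the view width; any other value is kept as-is.
fn resolve_width(image_width: f32, view_width: f32) -> f32 {
    if image_width == 0.0 {
        view_width
    } else {
        image_width
    }
}

fn main() {
    let view_width = 240.0; // from viewBox="0 0 240 380"
    // Before the fix: 100% parsed with length 1.0 gives 1.0, which is kept.
    let before = resolve_width(1.0, view_width);
    // After the fix: 100% parsed with length 0.0 gives 0.0, so the fallback applies.
    let after = resolve_width(0.0, view_width);
    println!("before the fix: {}px wide, after the fix: {}px wide", before, after);
}
```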
<h1 id="conclusions"><a class="header-anchor c-sun dec-none" href="#conclusions">¶</a> Conclusions</h1>
<p>You can use these techniques the next time you have to jump into a large codebase that’s unfamiliar.
Finding a simple case that fails, making a hypothesis about why it fails, and then searching for terms related to that gives you a big head-start navigating the code.
Being able to jump to definitions helps you build a mental map of a thin slice of the code.
Even though Rack is about 11K lines of code, and nanosvg is almost 3K, in the process of fixing this bug I only <em>glanced at</em> a few hundred lines of code, and only tried to understand a few dozen of them.
The next time you want to try to examine a new codebase, keep these tricks in mind.</p>
Computing Min2018-08-26T20:00:00-04:00https://blog.witchoflight.com/2018/min-benchmark/<p>Inspired by the <a href="https://users.rust-lang.org/t/my-gamedever-wishlist-for-rust/2859">“gamedev wishlist for rust”</a>, I got curious whether computing the minimum of a bunch of numbers with <code>min(min(min(a, b), c), d)</code> was efficient.
My thinking was that this would create unnecessary dependency chains in the processor, preventing out-of-order execution of independent <code>min</code>s.
Also, this was a good excuse to try out <a href="https://github.com/japaric/criterion.rs">Criterion</a>, so I set out to measure the impact.</p>
<h2 id="implementation"><a class="header-anchor c-sun dec-none" href="#implementation">¶</a> Implementation</h2>
<p>In my actual benchmark I wrote two copies of each of the methods described here: one using <code>std::cmp::min</code>, and one for <code>f32</code> (since <code>f32</code> isn’t <code>Ord</code>).
For simplicity, I’ll just show the generic one here; they both look pretty much the same.</p>
<h3 id="loopy"><a class="header-anchor c-sun dec-none" href="#loopy">¶</a> Loopy</h3>
<p>First, I was curious whether an ordinary <code>.iter().min()</code> would perform well.
The theory here is that <em>ideally</em>, for a list of known length, the compiler would compile this to the same code as a straight line of <code>min</code>s, if it judged that worthwhile.
So, our first case is this:</p>
<pre class="language-rust"><code class="language-rust"><span class="token punctuation">[</span>a<span class="token punctuation">,</span> b<span class="token punctuation">,</span> c<span class="token punctuation">,</span> d<span class="token punctuation">,</span> e<span class="token punctuation">]</span><span class="token punctuation">.</span><span class="token function">iter</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">min</span><span class="token punctuation">(</span><span class="token punctuation">)</span></code></pre>
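<p>Since <code>f32</code> isn’t <code>Ord</code>, the float copy can’t call <code>Iterator::min</code> at all. Here’s a minimal sketch of both copies of the loopy case, assuming the float one folds with <code>f32::min</code> starting from infinity; the function names are hypothetical, and this isn’t necessarily what the benchmark itself used:</p>
<pre class="language-rust"><code class="language-rust">// Hypothetical wrapper for the generic (`Ord`) loopy case.
// `.iter().min()` yields `Option&lt;&amp;T&gt;`; the array is never empty, so `unwrap` can't fail.
fn min5_loop&lt;T: Ord + Copy&gt;(a: T, b: T, c: T, d: T, e: T) -&gt; T {
    *[a, b, c, d, e].iter().min().unwrap()
}

// Hypothetical float version: fold with `f32::min`, starting from infinity,
// which plays the role of the top element that `Ord` doesn't give us.
// Note that `f32::min` returns the other operand when one of them is NaN.
fn min5_loop_f32(a: f32, b: f32, c: f32, d: f32, e: f32) -&gt; f32 {
    [a, b, c, d, e].iter().copied().fold(f32::INFINITY, f32::min)
}

fn main() {
    assert_eq!(min5_loop(4, 9, -2, 7, 0), -2);
    assert_eq!(min5_loop_f32(4.0, 9.5, -2.5, 7.0, 0.0), -2.5);
}</code></pre>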
<h3 id="linear-reduction-macro"><a class="header-anchor c-sun dec-none" href="#linear-reduction-macro">¶</a> Linear Reduction Macro</h3>
<p>The second method is a macro that will turn <code>min!(a, b, c, d, e)</code> into <code>min(a, min(b, min(c, min(d, e))))</code>.
This is a direct recursive macro that just accumulates the <code>min</code> calls.
If you’re familiar with Rust macros, nothing <em>too</em> scary is going on here.</p>
<pre class="language-rust"><code class="language-rust"><span class="token attribute attr-name">#[macro_export]</span><br><span class="token macro property">macro_rules!</span> min <span class="token punctuation">{</span><br> <span class="token punctuation">(</span><span class="token variable">$x</span><span class="token punctuation">:</span><span class="token fragment-specifier punctuation">expr</span> $<span class="token punctuation">(</span><span class="token punctuation">,</span><span class="token punctuation">)</span><span class="token operator">*</span><span class="token punctuation">)</span> <span class="token operator">=></span> <span class="token punctuation">{</span> <span class="token variable">$x</span> <span class="token punctuation">}</span><span class="token punctuation">;</span><br> <span class="token punctuation">(</span><span class="token variable">$x</span><span class="token punctuation">:</span><span class="token fragment-specifier punctuation">expr</span> $<span class="token punctuation">(</span><span class="token punctuation">,</span> <span class="token variable">$y</span><span class="token punctuation">:</span><span class="token fragment-specifier punctuation">expr</span><span class="token punctuation">)</span><span class="token operator">*</span> $<span class="token punctuation">(</span><span class="token punctuation">,</span><span class="token punctuation">)</span><span class="token operator">*</span><span class="token punctuation">)</span> <span class="token operator">=></span> <span class="token punctuation">{</span> <span class="token punctuation">::</span><span class="token namespace">std<span class="token punctuation">::</span>cmp<span class="token punctuation">::</span></span><span class="token function">min</span><span class="token punctuation">(</span><span class="token variable">$x</span><span class="token punctuation">,</span> <span class="token macro property">min!</span><span class="token punctuation">(</span>$<span class="token 
punctuation">(</span><span class="token variable">$y</span><span class="token punctuation">)</span><span class="token punctuation">,</span><span class="token operator">*</span><span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token punctuation">}</span><span class="token punctuation">;</span><br><span class="token punctuation">}</span></code></pre>
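<p>The float copy of this macro presumably has the same shape with a different function at each step. A sketch, assuming it simply swaps <code>::std::cmp::min</code> for <code>f32::min</code> (the <code>fmin!</code> name is hypothetical):</p>
<pre class="language-rust"><code class="language-rust">// Hypothetical float copy of `min!`: identical recursion, but each step
// calls `f32::min`, since `f32` doesn't implement `Ord`.
macro_rules! fmin {
    ($x:expr $(,)*) =&gt; { $x };
    ($x:expr $(, $y:expr)* $(,)*) =&gt; { f32::min($x, fmin!($($y),*)) };
}

fn main() {
    // Expands to f32::min(3.5, f32::min(-1.0, f32::min(2.25, f32::min(8.0, 0.5)))).
    let m = fmin!(3.5f32, -1.0, 2.25, 8.0, 0.5);
    assert_eq!(m, -1.0);
}</code></pre>
<p>Each recursive step peels off the first argument, so the expansion is one long chain of dependent calls, exactly like the generic version.</p>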
<h3 id="tree-reduction-macro"><a class="header-anchor c-sun dec-none" href="#tree-reduction-macro">¶</a> Tree Reduction Macro</h3>
<p>This macro is quite hairy.
The goal is to turn something like <code>min_tree!(a, b, c, d, e)</code> into <code>min(min(a, min(c, e)), min(b, d))</code> in order to allow the processor to simultaneously execute the leaf <code>min</code> calls.
Let’s walk through the parts:</p>
<p>First, we have the <code>()</code> case.
The <code>Ord</code> trait doesn’t offer us a top element, so we just give an error if there are no arguments.
(The float version returns <code>f32::INFINITY</code> in this case.)</p>
<p>Next, we have the base cases.
These look very similar to the cases from the <code>min!</code> macro, except that the n-element case calls the <code>@split</code> case.
The <code>@split</code> cases are dedicated to taking a list of expressions, and partitioning it into two different lists of expressions.
The idea is that once the list is split into two lists, you can apply <code>min_tree!</code> to each of them.
The first <code>@split</code> case pulls two items off the arguments if they’re available, and puts one in each accumulator list.
The second case handles a single remaining argument by adding it to the first list, and the final case is for when there are no arguments left.
Once the argument list has been split into two parts, we do <code>min(min_tree!(a...), min_tree!(b...))</code>, recursively constructing the tree.</p>
<pre class="language-rust"><code class="language-rust"><span class="token attribute attr-name">#[macro_export]</span><br><span class="token macro property">macro_rules!</span> min_tree <span class="token punctuation">{</span><br> <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">=></span> <span class="token punctuation">{</span> <span class="token macro property">compile_error!</span><span class="token punctuation">(</span><span class="token string">"Cannot compute the minimum of 0 elements"</span><span class="token punctuation">)</span> <span class="token punctuation">}</span><span class="token punctuation">;</span><br> <span class="token punctuation">(</span><span class="token variable">$x</span><span class="token punctuation">:</span><span class="token fragment-specifier punctuation">expr</span><span class="token punctuation">)</span> <span class="token operator">=></span> <span class="token punctuation">{</span> <span class="token variable">$x</span> <span class="token punctuation">}</span><span class="token punctuation">;</span><br> <span class="token punctuation">(</span><span class="token variable">$x</span><span class="token punctuation">:</span><span class="token fragment-specifier punctuation">expr</span><span class="token punctuation">,</span> <span class="token variable">$y</span><span class="token punctuation">:</span><span class="token fragment-specifier punctuation">expr</span><span class="token punctuation">)</span> <span class="token operator">=></span> <span class="token punctuation">{</span> <span class="token punctuation">::</span><span class="token namespace">std<span class="token punctuation">::</span>cmp<span class="token punctuation">::</span></span><span class="token function">min</span><span class="token punctuation">(</span><span class="token variable">$x</span><span class="token punctuation">,</span> <span class="token variable">$y</span><span class="token punctuation">)</span> 
<span class="token punctuation">}</span><span class="token punctuation">;</span><br> <span class="token punctuation">(</span>$<span class="token punctuation">(</span><span class="token variable">$x</span><span class="token punctuation">:</span><span class="token fragment-specifier punctuation">expr</span><span class="token punctuation">)</span><span class="token punctuation">,</span><span class="token operator">*</span> $<span class="token punctuation">(</span><span class="token punctuation">,</span><span class="token punctuation">)</span><span class="token operator">*</span><span class="token punctuation">)</span> <span class="token operator">=></span> <span class="token punctuation">{</span> <span class="token macro property">min_tree!</span><span class="token punctuation">(</span><span class="token operator">@</span>split <span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token punctuation">;</span> <span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token punctuation">;</span> $<span class="token punctuation">(</span><span class="token variable">$x</span><span class="token punctuation">)</span><span class="token punctuation">,</span><span class="token operator">*</span><span class="token punctuation">)</span> <span class="token punctuation">}</span><span class="token punctuation">;</span><br> <span class="token punctuation">(</span><span class="token operator">@</span>split <span class="token punctuation">[</span>$<span class="token punctuation">(</span><span class="token variable">$a</span><span class="token punctuation">:</span><span class="token fragment-specifier punctuation">expr</span><span class="token punctuation">)</span><span class="token punctuation">,</span><span class="token operator">*</span><span class="token punctuation">]</span><span class="token punctuation">;</span> <span class="token punctuation">[</span>$<span class="token punctuation">(</span><span 
class="token variable">$b</span><span class="token punctuation">:</span><span class="token fragment-specifier punctuation">expr</span><span class="token punctuation">)</span><span class="token punctuation">,</span><span class="token operator">*</span><span class="token punctuation">]</span><span class="token punctuation">;</span> <span class="token variable">$x</span><span class="token punctuation">:</span><span class="token fragment-specifier punctuation">expr</span><span class="token punctuation">,</span> <span class="token variable">$y</span><span class="token punctuation">:</span><span class="token fragment-specifier punctuation">expr</span> $<span class="token punctuation">(</span><span class="token punctuation">,</span> <span class="token variable">$z</span><span class="token punctuation">:</span><span class="token fragment-specifier punctuation">expr</span><span class="token punctuation">)</span><span class="token operator">*</span><span class="token punctuation">)</span> <span class="token operator">=></span> <span class="token punctuation">{</span><br> <span class="token macro property">min_tree!</span><span class="token punctuation">(</span><span class="token operator">@</span>split <span class="token punctuation">[</span><span class="token variable">$x</span> $<span class="token punctuation">(</span><span class="token punctuation">,</span> <span class="token variable">$a</span><span class="token punctuation">)</span><span class="token operator">*</span><span class="token punctuation">]</span><span class="token punctuation">;</span> <span class="token punctuation">[</span><span class="token variable">$y</span> $<span class="token punctuation">(</span><span class="token punctuation">,</span> <span class="token variable">$b</span><span class="token punctuation">)</span><span class="token operator">*</span><span class="token punctuation">]</span><span class="token punctuation">;</span> $<span class="token punctuation">(</span><span class="token 
variable">$z</span><span class="token punctuation">)</span><span class="token punctuation">,</span><span class="token operator">*</span><span class="token punctuation">)</span><br> <span class="token punctuation">}</span><span class="token punctuation">;</span><br> <span class="token punctuation">(</span><span class="token operator">@</span>split <span class="token punctuation">[</span>$<span class="token punctuation">(</span><span class="token variable">$a</span><span class="token punctuation">:</span><span class="token fragment-specifier punctuation">expr</span><span class="token punctuation">)</span><span class="token punctuation">,</span><span class="token operator">*</span><span class="token punctuation">]</span><span class="token punctuation">;</span> <span class="token punctuation">[</span>$<span class="token punctuation">(</span><span class="token variable">$b</span><span class="token punctuation">:</span><span class="token fragment-specifier punctuation">expr</span><span class="token punctuation">)</span><span class="token punctuation">,</span><span class="token operator">*</span><span class="token punctuation">]</span><span class="token punctuation">;</span> <span class="token variable">$x</span><span class="token punctuation">:</span><span class="token fragment-specifier punctuation">expr</span><span class="token punctuation">)</span> <span class="token operator">=></span> <span class="token punctuation">{</span><br> <span class="token macro property">min_tree!</span><span class="token punctuation">(</span><span class="token operator">@</span>split <span class="token punctuation">[</span><span class="token variable">$x</span> $<span class="token punctuation">(</span><span class="token punctuation">,</span> <span class="token variable">$a</span><span class="token punctuation">)</span><span class="token operator">*</span><span class="token punctuation">]</span><span class="token punctuation">;</span> <span class="token punctuation">[</span>$<span 
class="token punctuation">(</span><span class="token variable">$b</span><span class="token punctuation">)</span><span class="token punctuation">,</span><span class="token operator">*</span><span class="token punctuation">]</span><span class="token punctuation">;</span><span class="token punctuation">)</span><br> <span class="token punctuation">}</span><span class="token punctuation">;</span><br> <span class="token punctuation">(</span><span class="token operator">@</span>split <span class="token punctuation">[</span>$<span class="token punctuation">(</span><span class="token variable">$a</span><span class="token punctuation">:</span><span class="token fragment-specifier punctuation">expr</span><span class="token punctuation">)</span><span class="token punctuation">,</span><span class="token operator">*</span><span class="token punctuation">]</span><span class="token punctuation">;</span> <span class="token punctuation">[</span>$<span class="token punctuation">(</span><span class="token variable">$b</span><span class="token punctuation">:</span><span class="token fragment-specifier punctuation">expr</span><span class="token punctuation">)</span><span class="token punctuation">,</span><span class="token operator">*</span><span class="token punctuation">]</span><span class="token punctuation">;</span><span class="token punctuation">)</span> <span class="token operator">=></span> <span class="token punctuation">{</span><br> <span class="token punctuation">::</span><span class="token namespace">std<span class="token punctuation">::</span>cmp<span class="token punctuation">::</span></span><span class="token function">min</span><span class="token punctuation">(</span><span class="token macro property">min_tree!</span><span class="token punctuation">(</span>$<span class="token punctuation">(</span><span class="token variable">$a</span><span class="token punctuation">)</span><span class="token punctuation">,</span><span class="token operator">*</span><span class="token 
punctuation">)</span><span class="token punctuation">,</span> <span class="token macro property">min_tree!</span><span class="token punctuation">(</span>$<span class="token punctuation">(</span><span class="token variable">$b</span><span class="token punctuation">)</span><span class="token punctuation">,</span><span class="token operator">*</span><span class="token punctuation">)</span><span class="token punctuation">)</span><br> <span class="token punctuation">}</span><span class="token punctuation">;</span><br><span class="token punctuation">}</span></code></pre>
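<p>To see what the tree macro actually builds, here’s the expansion of <code>min_tree!(a, b, c, d, e)</code> traced by hand through the rules above, written out as ordinary functions next to the linear macro’s expansion (the function names are just for illustration). It has the same shape as the example at the top of this section, with the leaves in a different order, and its longest chain of dependent <code>min</code>s is 3 deep instead of 4:</p>
<pre class="language-rust"><code class="language-rust">use std::cmp::min;

// Hand-expanded result of `min_tree!(a, b, c, d, e)`, traced through the rules:
//   @split [];        [];     a, b, c, d, e
//   @split [a];       [b];    c, d, e
//   @split [c, a];    [d, b]; e
//   @split [e, c, a]; [d, b];
//   min(min_tree!(e, c, a), min_tree!(d, b))
//   min(min(min(a, e), c), min(d, b))
fn min5_tree(a: i32, b: i32, c: i32, d: i32, e: i32) -&gt; i32 {
    // `min(a, e)` and `min(d, b)` don't depend on each other,
    // so the longest dependency chain is 3 `min`s.
    min(min(min(a, e), c), min(d, b))
}

// Expansion of the linear `min!(a, b, c, d, e)`: a chain of 4 dependent `min`s.
fn min5_linear(a: i32, b: i32, c: i32, d: i32, e: i32) -&gt; i32 {
    min(a, min(b, min(c, min(d, e))))
}

fn main() {
    let cases = [[3, 1, 4, 1, 5], [9, 2, 6, 5, 3], [-7, 0, 7, -8, 8]];
    for &amp;[a, b, c, d, e] in &amp;cases {
        assert_eq!(min5_tree(a, b, c, d, e), min5_linear(a, b, c, d, e));
    }
}</code></pre>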
<h2 id="results"><a class="header-anchor c-sun dec-none" href="#results">¶</a> Results</h2>
<p>First of all, I was right: tree reduction is faster, at least for the 10-element <code>min</code> I was benchmarking.
I’m imagining this in the context of graphics applications, where we expect relatively small cases, often of a known size (like finding the minimum among 8 neighbors).
What was slightly more surprising to me was that for floats, the loop was faster than the linear reduction.
Looking at <a href="https://godbolt.org/z/e32CnV">godbolt output</a> for a hardcoded case shows that they all get vectorized (and the loop gets unrolled), just with slightly different load scheduling.</p>
<p>Criterion produces really cool graphs.
Here are the results from the two cases:</p>
<p><img src="https://blog.witchoflight.com/img/2018-08-27-violin-i32.svg" alt="violin-i32">
<img src="https://blog.witchoflight.com/img/2018-08-27-violin-f32.svg" alt="violin-f32"></p>
<p>I suspect if you want to compute the minimum of a <em>very</em> large list, you’ll benefit from doing tree reductions on independent chunks in a loop.</p>
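<p>A rough sketch of that idea, not benchmarked here: each 8-element chunk is reduced as a balanced tree, and only the last step of each chunk depends on the running minimum. The chunk size of 8 and the <code>chunked_tree_min</code> name are arbitrary choices for illustration.</p>
<pre class="language-rust"><code class="language-rust">use std::cmp::min;

// Hypothetical sketch: minimum of an arbitrarily long slice. Each 8-element
// chunk is reduced as a balanced tree, then folded into a running minimum.
// Returns `None` for an empty slice, like `Iterator::min`.
fn chunked_tree_min(xs: &amp;[i32]) -&gt; Option&lt;i32&gt; {
    let mut chunks = xs.chunks_exact(8);
    let mut acc: Option&lt;i32&gt; = None;
    for c in &amp;mut chunks {
        // The four first-level `min`s depend neither on each other nor on `acc`,
        // so out-of-order execution can overlap them with the previous chunk's work.
        let m = min(
            min(min(c[0], c[1]), min(c[2], c[3])),
            min(min(c[4], c[5]), min(c[6], c[7])),
        );
        acc = Some(match acc {
            Some(a) =&gt; min(a, m),
            None =&gt; m,
        });
    }
    // Fold in the leftover tail (fewer than 8 elements) one at a time.
    chunks.remainder().iter().fold(acc, |acc, &amp;x| {
        Some(match acc {
            Some(a) =&gt; min(a, x),
            None =&gt; x,
        })
    })
}

fn main() {
    let xs: Vec&lt;i32&gt; = (0..100).map(|i| (i * 37 + 11) % 101 - 50).collect();
    for &amp;n in &amp;[0usize, 1, 7, 8, 9, 100] {
        assert_eq!(chunked_tree_min(&amp;xs[..n]), xs[..n].iter().copied().min());
    }
}</code></pre>
<p>Whether this actually beats a plain loop (which the compiler may already vectorize) would need its own measurement.</p>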
LLVM 💖s Peano Addition2018-09-06T20:00:00-04:00https://blog.witchoflight.com/2018/llvm-hearts-peano-addition/<p>This semester I’m taking an advanced compilers class.
We’re going to be learning by making changes to LLVM, so for the first assignment I was reading a recommended <a href="http://www.aosabook.org/en/llvm.html">introduction to LLVM</a>.
To give an example of LLVM IR, it provides two small C functions that implement addition in different ways, along with the equivalent IR.</p>
<pre class="language-c"><code class="language-c"><span class="token keyword">unsigned</span> <span class="token function">add1</span><span class="token punctuation">(</span><span class="token keyword">unsigned</span> a<span class="token punctuation">,</span> <span class="token keyword">unsigned</span> b<span class="token punctuation">)</span> <span class="token punctuation">{</span><br> <span class="token keyword">return</span> a<span class="token operator">+</span>b<span class="token punctuation">;</span><br><span class="token punctuation">}</span><br><br><span class="token comment">// Perhaps not the most efficient way to add two numbers.</span><br><span class="token keyword">unsigned</span> <span class="token function">add2</span><span class="token punctuation">(</span><span class="token keyword">unsigned</span> a<span class="token punctuation">,</span> <span class="token keyword">unsigned</span> b<span class="token punctuation">)</span> <span class="token punctuation">{</span><br> <span class="token keyword">if</span> <span class="token punctuation">(</span>a <span class="token operator">==</span> <span class="token number">0</span><span class="token punctuation">)</span> <span class="token keyword">return</span> b<span class="token punctuation">;</span><br> <span class="token keyword">return</span> <span class="token function">add2</span><span class="token punctuation">(</span>a<span class="token operator">-</span><span class="token number">1</span><span class="token punctuation">,</span> b<span class="token operator">+</span><span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">;</span><br><span class="token punctuation">}</span></code></pre>
<p>Being something of a mathematician myself, I felt I had to <a href="https://twitter.com/porglezomp/status/1037408309140221952">defend the honor</a> of “Peano-likers” from this defamation.
I made that joke tweet and moved on, but after <a href="https://twitter.com/hyperfekt/status/1037654687024119808">someone suggested LLVM optimize it</a>, I started to think about writing some of those optimization passes as hopefully easy pattern-matching definitions.</p>
<p>The next day, after compiling LLVM and getting a custom Hello World optimizer pass running, I decided to create some tests, and discovered (much to my surprise) that LLVM already handled Peano-style addition and multiplication perfectly competently!</p>
<p>I had just read John Regehr’s <a href="https://blog.regehr.org/archives/1603">blog post on how LLVM optimizes a function</a>, so I had an idea for how to investigate this.
If you haven’t read that yet, you should read it first; it goes through LLVM optimization passes like the ones I describe below in more detail.</p>
<h2 id="how-to-view-the-optimizations"><a class="header-anchor c-sun dec-none" href="#how-to-view-the-optimizations">¶</a> How to View the Optimizations</h2>
<p>That blog post proceeds by running the LLVM <code>opt</code> tool and examining the changes between passes.
You can easily get the LLVM IR corresponding to some C code using <code>clang</code>; just run:</p>
<pre class="language-shell-session"><code class="language-shell-session"><span class="token command"><span class="token shell-symbol important">$</span> <span class="token bash language-bash">clang peano.c -emit-llvm -S -o peano.ll</span></span></code></pre>
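<p>and you’ll have a beautiful LLVM IR dump in the textual format.
(Recent versions of clang mark every function with <code>optnone</code> when compiling at <code>-O0</code>, which makes <code>opt</code> skip them; adding <code>-Xclang -disable-O0-optnone</code> to the <code>clang</code> command avoids that.)
To view the optimizations applied to that code, you can run:</p>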
<pre class="language-shell-session"><code class="language-shell-session"><span class="token command"><span class="token shell-symbol important">$</span> <span class="token bash language-bash">opt -O3 -print-before-all -print-after-all peano.ll</span></span></code></pre>
<p>This gives you a huge wall of IR dumps from before and after each optimization pass.
If you want to do a similar investigation yourself, I wrote <a href="https://gist.github.com/porglezomp/f2dc233f971cf3f30d45e0b501ae5ead">a Python script that shows each pass’s diff</a> and waits for you before moving on to the next one.
Make sure you have <a href="https://github.com/jeffkaufman/icdiff">icdiff</a> (a very nice color diff tool) installed in order to use it, or else modify the diff invocation in the script.</p>
<h2 id="the-optimizations"><a class="header-anchor c-sun dec-none" href="#the-optimizations">¶</a> The Optimizations</h2>
<p>As you can see from <a href="https://blog.regehr.org/archives/1603">John Regehr’s blog post</a>, LLVM’s passes sometimes undo and redo lots of work without changing very much when working on a function this simple.
Furthermore, the code emitted by the Clang frontend is a bit of a mess and needs quite a bit of cleanup before it’s decent code; Clang leaves that cleanup to LLVM rather than reimplementing analyses that LLVM can do perfectly well itself.</p>
<p>In order to make this discussion clearer, I’ll use the hand-written IR from the introductory article rather than the IR emitted by clang, and only run through the necessary passes to get the job done, not the whole <code>-O3</code> pipeline.
At each step of the optimization, I’ll provide the IR, and some roughly corresponding C code.</p>
<h3 id="the-program"><a class="header-anchor c-sun dec-none" href="#the-program">¶</a> The Program</h3>
<p>We’ll be investigating this recursive definition of addition:</p>
<pre class="language-llvm"><code class="language-llvm"><span class="token keyword">define</span> <span class="token type class-name">i32</span> <span class="token variable">@add</span><span class="token punctuation">(</span><span class="token type class-name">i32</span> <span class="token variable">%a</span><span class="token punctuation">,</span> <span class="token type class-name">i32</span> <span class="token variable">%b</span><span class="token punctuation">)</span> <span class="token punctuation">{</span><br><span class="token label">entry:</span><br> <span class="token variable">%tmp1</span> <span class="token punctuation">=</span> <span class="token keyword">icmp</span> <span class="token keyword">eq</span> <span class="token type class-name">i32</span> <span class="token variable">%a</span><span class="token punctuation">,</span> <span class="token number">0</span><br> <span class="token keyword">br</span> <span class="token type class-name">i1</span> <span class="token variable">%tmp1</span><span class="token punctuation">,</span> <span class="token type class-name">label</span> <span class="token variable">%done</span><span class="token punctuation">,</span> <span class="token type class-name">label</span> <span class="token variable">%recurse</span><br><br><span class="token label">recurse:</span><br> <span class="token variable">%tmp2</span> <span class="token punctuation">=</span> <span class="token keyword">sub</span> <span class="token type class-name">i32</span> <span class="token variable">%a</span><span class="token punctuation">,</span> <span class="token number">1</span><br> <span class="token variable">%tmp3</span> <span class="token punctuation">=</span> <span class="token keyword">add</span> <span class="token type class-name">i32</span> <span class="token variable">%b</span><span class="token punctuation">,</span> <span class="token number">1</span><br> <span class="token variable">%tmp4</span> <span class="token punctuation">=</span> <span 
class="token keyword">call</span> <span class="token type class-name">i32</span> <span class="token variable">@add</span><span class="token punctuation">(</span><span class="token type class-name">i32</span> <span class="token variable">%tmp2</span><span class="token punctuation">,</span> <span class="token type class-name">i32</span> <span class="token variable">%tmp3</span><span class="token punctuation">)</span><br> <span class="token keyword">ret</span> <span class="token type class-name">i32</span> <span class="token variable">%tmp4</span><br><br><span class="token label">done:</span><br> <span class="token keyword">ret</span> <span class="token type class-name">i32</span> <span class="token variable">%b</span><br><span class="token punctuation">}</span></code></pre>
<p>Which corresponds to this C program:</p>
<pre class="language-c"><code class="language-c"><span class="token keyword">typedef</span> <span class="token keyword">unsigned</span> nat<span class="token punctuation">;</span><br><br>nat <span class="token function">add</span><span class="token punctuation">(</span>nat a<span class="token punctuation">,</span> nat b<span class="token punctuation">)</span> <span class="token punctuation">{</span><br> <span class="token keyword">if</span> <span class="token punctuation">(</span>a <span class="token operator">==</span> <span class="token number">0</span><span class="token punctuation">)</span> <span class="token keyword">return</span> b<span class="token punctuation">;</span><br> <span class="token keyword">return</span> <span class="token function">add</span><span class="token punctuation">(</span>a<span class="token operator">-</span><span class="token number">1</span><span class="token punctuation">,</span> b<span class="token operator">+</span><span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">;</span><br><span class="token punctuation">}</span></code></pre>
<h3 id="tail-call-optimization"><a class="header-anchor c-sun dec-none" href="#tail-call-optimization">¶</a> Tail Call Optimization</h3>
<p>The first important optimization here is tail call optimization.
Above, the result of calling <code>@add</code> is stored in <code>%tmp4</code> and immediately returned, with nothing else in between, which makes this a tail call.
To avoid the cost of function calls and the extra stack frames they need, and to expose more opportunities for optimization, tail call optimization turns our tail recursion into a loop.</p>
<pre class="language-llvm"><code class="language-llvm"><span class="token keyword">define</span> <span class="token type class-name">i32</span> <span class="token variable">@add</span><span class="token punctuation">(</span><span class="token type class-name">i32</span> <span class="token variable">%a</span><span class="token punctuation">,</span> <span class="token type class-name">i32</span> <span class="token variable">%b</span><span class="token punctuation">)</span> <span class="token punctuation">{</span><br><span class="token label">entry:</span><br> <span class="token keyword">br</span> <span class="token type class-name">label</span> <span class="token variable">%tailrecurse</span><br><br><span class="token label">tailrecurse:</span><br> <span class="token variable">%a.tr</span> <span class="token punctuation">=</span> <span class="token keyword">phi</span> <span class="token type class-name">i32</span> <span class="token punctuation">[</span> <span class="token variable">%a</span><span class="token punctuation">,</span> <span class="token variable">%entry</span> <span class="token punctuation">]</span><span class="token punctuation">,</span> <span class="token punctuation">[</span> <span class="token variable">%tmp2</span><span class="token punctuation">,</span> <span class="token variable">%recurse</span> <span class="token punctuation">]</span><br> <span class="token variable">%b.tr</span> <span class="token punctuation">=</span> <span class="token keyword">phi</span> <span class="token type class-name">i32</span> <span class="token punctuation">[</span> <span class="token variable">%b</span><span class="token punctuation">,</span> <span class="token variable">%entry</span> <span class="token punctuation">]</span><span class="token punctuation">,</span> <span class="token punctuation">[</span> <span class="token variable">%tmp3</span><span class="token punctuation">,</span> <span class="token variable">%recurse</span> <span class="token 
punctuation">]</span><br> <span class="token variable">%tmp1</span> <span class="token punctuation">=</span> <span class="token keyword">icmp</span> <span class="token keyword">eq</span> <span class="token type class-name">i32</span> <span class="token variable">%a.tr</span><span class="token punctuation">,</span> <span class="token number">0</span><br> <span class="token keyword">br</span> <span class="token type class-name">i1</span> <span class="token variable">%tmp1</span><span class="token punctuation">,</span> <span class="token type class-name">label</span> <span class="token variable">%done</span><span class="token punctuation">,</span> <span class="token type class-name">label</span> <span class="token variable">%recurse</span><br><br><span class="token label">recurse:</span><br> <span class="token variable">%tmp2</span> <span class="token punctuation">=</span> <span class="token keyword">sub</span> <span class="token type class-name">i32</span> <span class="token variable">%a.tr</span><span class="token punctuation">,</span> <span class="token number">1</span><br> <span class="token variable">%tmp3</span> <span class="token punctuation">=</span> <span class="token keyword">add</span> <span class="token type class-name">i32</span> <span class="token variable">%b.tr</span><span class="token punctuation">,</span> <span class="token number">1</span><br> <span class="token keyword">br</span> <span class="token type class-name">label</span> <span class="token variable">%tailrecurse</span><br><br><span class="token label">done:</span><br> <span class="token keyword">ret</span> <span class="token type class-name">i32</span> <span class="token variable">%b.tr</span><br><span class="token punctuation">}</span></code></pre>
<p>This code approximately corresponds to:</p>
<pre class="language-c"><code class="language-c">nat <span class="token function">add</span><span class="token punctuation">(</span>nat a<span class="token punctuation">,</span> nat b<span class="token punctuation">)</span> <span class="token punctuation">{</span><br> <span class="token keyword">while</span> <span class="token punctuation">(</span>a <span class="token operator">!=</span> <span class="token number">0</span><span class="token punctuation">)</span> <span class="token punctuation">{</span><br> a <span class="token operator">-=</span> <span class="token number">1</span><span class="token punctuation">;</span><br> b <span class="token operator">+=</span> <span class="token number">1</span><span class="token punctuation">;</span><br> <span class="token punctuation">}</span><br> <span class="token keyword">return</span> b<span class="token punctuation">;</span><br><span class="token punctuation">}</span></code></pre>
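<p>As a quick sanity check that the loop computes the same thing as the recursion, here’s a small Rust sketch, with <code>u32</code> standing in for <code>unsigned</code> and <code>wrapping_add</code> matching C’s modular unsigned arithmetic (the function names are hypothetical). Rust doesn’t guarantee tail call elimination, so the recursive version is only exercised with small values of <code>a</code>:</p>
<pre class="language-rust"><code class="language-rust">// Mirrors the recursive C `add`; the call is in tail position.
fn add_rec(a: u32, b: u32) -&gt; u32 {
    if a == 0 {
        return b;
    }
    add_rec(a - 1, b.wrapping_add(1))
}

// Mirrors the loop produced by tail call optimization: the parameters become
// loop-carried variables (the `%a.tr` and `%b.tr` phis in the IR above).
fn add_loop(mut a: u32, mut b: u32) -&gt; u32 {
    while a != 0 {
        a -= 1;
        b = b.wrapping_add(1);
    }
    b
}

fn main() {
    for a in 0..50 {
        for &amp;b in &amp;[0, 1, 1000, u32::MAX - 10] {
            assert_eq!(add_rec(a, b), add_loop(a, b));
        }
    }
}</code></pre>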
<p>Removing the recursive call exposes further optimization opportunities.
In particular…</p>
<h3 id="induction-variable-simplification"><a class="header-anchor c-sun dec-none" href="#induction-variable-simplification">¶</a> Induction Variable Simplification</h3>
<p>Loop optimizations are a primary focus of optimizing compilers: many programs spend most of their time in a few loops, so making those loops faster is often the most fruitful optimization.
“Induction Variable Simplification” is a specific optimization that works on identified “loop induction variables”: variables that change by a constant amount each loop iteration, or that are derived from other induction variables.</p>
<p>Here, <code>a</code> and <code>b</code> are identified as loop induction variables.
Even more critically, <code>a</code> is the induction variable that controls the loop condition, counting down towards <code>0</code> by one each iteration.
Therefore, LLVM can determine that the loop will run exactly <code>a</code> times, called the “trip count.”</p>
<p>When an induction variable is used after the loop and the trip count is known, LLVM computes that variable’s final value outside the loop. This splits the induction variable’s live range, and can make its in-loop updates eligible for dead code elimination (which happens in this case).</p>
<pre class="language-llvm"><code class="language-llvm"><span class="token keyword">define</span> <span class="token type class-name">i32</span> <span class="token variable">@add</span><span class="token punctuation">(</span><span class="token type class-name">i32</span> <span class="token variable">%a</span><span class="token punctuation">,</span> <span class="token type class-name">i32</span> <span class="token variable">%b</span><span class="token punctuation">)</span> <span class="token punctuation">{</span><br><span class="token label">entry:</span><br> <span class="token keyword">br</span> <span class="token type class-name">label</span> <span class="token variable">%tailrecurse</span><br><br><span class="token comment">; Loop:</span><br><span class="token label">tailrecurse:</span><br> <span class="token variable">%a.tr</span> <span class="token punctuation">=</span> <span class="token keyword">phi</span> <span class="token type class-name">i32</span> <span class="token punctuation">[</span> <span class="token variable">%a</span><span class="token punctuation">,</span> <span class="token variable">%entry</span> <span class="token punctuation">]</span><span class="token punctuation">,</span> <span class="token punctuation">[</span> <span class="token variable">%tmp2</span><span class="token punctuation">,</span> <span class="token variable">%recurse</span> <span class="token punctuation">]</span><br> <span class="token variable">%tmp1</span> <span class="token punctuation">=</span> <span class="token keyword">icmp</span> <span class="token keyword">eq</span> <span class="token type class-name">i32</span> <span class="token variable">%a.tr</span><span class="token punctuation">,</span> <span class="token number">0</span><br> <span class="token keyword">br</span> <span class="token type class-name">i1</span> <span class="token variable">%tmp1</span><span class="token punctuation">,</span> <span class="token type class-name">label</span> <span class="token 
variable">%done</span><span class="token punctuation">,</span> <span class="token type class-name">label</span> <span class="token variable">%recurse</span><br><br><span class="token label">recurse:</span><br> <span class="token variable">%tmp2</span> <span class="token punctuation">=</span> <span class="token keyword">sub</span> <span class="token type class-name">i32</span> <span class="token variable">%a.tr</span><span class="token punctuation">,</span> <span class="token number">1</span><br> <span class="token keyword">br</span> <span class="token type class-name">label</span> <span class="token variable">%tailrecurse</span><br><br><span class="token comment">; Exit blocks</span><br><span class="token label">done:</span><br> <span class="token variable">%0</span> <span class="token punctuation">=</span> <span class="token keyword">add</span> <span class="token type class-name">i32</span> <span class="token variable">%b</span><span class="token punctuation">,</span> <span class="token variable">%a</span><br> <span class="token keyword">ret</span> <span class="token type class-name">i32</span> <span class="token variable">%0</span><br><span class="token punctuation">}</span></code></pre>
<p>This IR looks basically like this C:</p>
<pre class="language-c"><code class="language-c">nat <span class="token function">add</span><span class="token punctuation">(</span>nat a<span class="token punctuation">,</span> nat b<span class="token punctuation">)</span> <span class="token punctuation">{</span><br> nat a0 <span class="token operator">=</span> a<span class="token punctuation">;</span><br> <span class="token keyword">while</span> <span class="token punctuation">(</span>a0 <span class="token operator">!=</span> <span class="token number">0</span><span class="token punctuation">)</span> <span class="token punctuation">{</span><br> a0 <span class="token operator">-=</span> <span class="token number">1</span><span class="token punctuation">;</span><br> <span class="token punctuation">}</span><br> <span class="token keyword">return</span> b <span class="token operator">+</span> a<span class="token punctuation">;</span><br><span class="token punctuation">}</span></code></pre>
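<p>The exit-value computation at the heart of this pass can be checked outside the compiler. Below is a minimal Rust sketch, not LLVM’s implementation: <code>add_loop</code> mirrors the tail-call-eliminated loop, and the hypothetical <code>exit_value</code> helper applies the closed form <code>start + step * trip_count</code> for an affine induction variable. Using wrapping <code>u32</code> arithmetic is an assumption, chosen to mirror the wrapping <code>add</code>/<code>sub</code> on <code>i32</code> in the IR.</p>

```rust
/// Loop form of Peano addition, matching the tail-call-eliminated C above.
fn add_loop(mut a: u32, mut b: u32) -> u32 {
    while a != 0 {
        a -= 1; // `a` controls the loop, so the trip count is its initial value
        b = b.wrapping_add(1); // `b` is an induction variable with step 1
    }
    b
}

/// Closed-form exit value of an affine induction variable:
/// start + step * trip_count, with wrapping arithmetic like LLVM's `add`.
fn exit_value(start: u32, step: u32, trip_count: u32) -> u32 {
    start.wrapping_add(step.wrapping_mul(trip_count))
}

fn main() {
    for a in 0..50u32 {
        for b in [0u32, 1, 7, u32::MAX - 3] {
            // Trip count of `while a != 0 { a -= 1 }` is exactly `a`.
            assert_eq!(add_loop(a, b), exit_value(b, 1, a));
        }
    }
    println!("closed form matches the loop");
}
```

<p>Once <code>b</code>’s exit value is expressed this way, nothing inside the loop uses <code>b</code> anymore, which is what lets its in-loop updates be deleted.</p>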
<p>If you’re interested in more details of these loop optimizations, my knowledge here comes from <a href="https://www.cs.cmu.edu/~fp/courses/15411-f13/lectures/17-loopopt.pdf">some very nice lecture notes</a> linked from Regehr’s blog post; read those if you want to know how these cases are actually detected.</p>
<h3 id="delete-dead-loops"><a class="header-anchor c-sun dec-none" href="#delete-dead-loops">¶</a> Delete Dead Loops</h3>
<p>This pass is very straightforward.
The loop doesn’t do anything anymore, and we know it will terminate, so we can just get rid of it.</p>
<pre class="language-llvm"><code class="language-llvm"><span class="token keyword">define</span> <span class="token type class-name">i32</span> <span class="token variable">@add</span><span class="token punctuation">(</span><span class="token type class-name">i32</span> <span class="token variable">%a</span><span class="token punctuation">,</span> <span class="token type class-name">i32</span> <span class="token variable">%b</span><span class="token punctuation">)</span> <span class="token punctuation">{</span><br><span class="token label">entry:</span><br> <span class="token variable">%0</span> <span class="token punctuation">=</span> <span class="token keyword">add</span> <span class="token type class-name">i32</span> <span class="token variable">%b</span><span class="token punctuation">,</span> <span class="token variable">%a</span><br> <span class="token keyword">ret</span> <span class="token type class-name">i32</span> <span class="token variable">%0</span><br><span class="token punctuation">}</span></code></pre>
<p>And therefore, our code has been optimized down to:</p>
<pre class="language-c"><code class="language-c">nat <span class="token function">add</span><span class="token punctuation">(</span>nat a<span class="token punctuation">,</span> nat b<span class="token punctuation">)</span> <span class="token punctuation">{</span><br> <span class="token keyword">return</span> b <span class="token operator">+</span> a<span class="token punctuation">;</span><br><span class="token punctuation">}</span></code></pre>
<p>Our recursive definition of addition turns out to actually be addition, and LLVM has proved it for us!</p>
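<p>LLVM’s result holds for every input; a runtime spot check is much weaker, but it’s an easy way to convince yourself. This Rust sketch reconstructs the recursive definition from the IR above (return <code>b</code> when <code>a</code> is zero, otherwise recurse on <code>a - 1</code> and <code>b + 1</code>) and compares it with built-in addition. The name <code>peano_add</code> is hypothetical, and because Rust doesn’t guarantee tail-call elimination, the inputs are kept small to avoid deep recursion.</p>

```rust
/// Peano addition, reconstructed from the IR above: move one unit from `a`
/// to `b` until `a` reaches zero. Only safe for small `a`, since each step
/// is a real recursive call.
fn peano_add(a: u32, b: u32) -> u32 {
    if a == 0 {
        b
    } else {
        peano_add(a - 1, b + 1)
    }
}

fn main() {
    // Exhaustively compare against built-in addition on a small range.
    for a in 0..200u32 {
        for b in 0..200u32 {
            assert_eq!(peano_add(a, b), a + b);
        }
    }
    println!("peano_add agrees with + on 0..200");
}
```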
<h2 id="takeaways"><a class="header-anchor c-sun dec-none" href="#takeaways">¶</a> Takeaways</h2>
<p>Very general optimizations can combine together to have some very surprising specific results, and optimizing compilers are very clever.</p>
<p>These same optimizations work to optimize Peano multiplication, since the loop induction variables like to work with linear functions, but they don’t succeed with saturating subtraction, recursive comparisons, or min/max.
It’ll be interesting to see whether I can come up with a loop optimization pass that handles those more complicated trip counts and induction variables in general, or whether I’ll only succeed at pattern matching these very specific functions.</p>
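<p>To get a feel for why multiplication fits the same mold, compare hand-written loop forms. In this Rust sketch (my own loop versions, not the post’s definitions), the accumulator in <code>mul_loop</code> grows by the loop-invariant <code>b</code> each iteration, so it is still an affine induction variable whose exit value is <code>b * a</code>. In <code>sat_sub_loop</code>, the loop stops when <em>either</em> counter reaches zero, so its trip count is <code>min(a, b)</code> rather than one variable’s starting value. This only illustrates the difference in shape; it doesn’t claim to explain exactly where LLVM gives up.</p>

```rust
/// Peano multiplication in loop form: add `b` to an accumulator `a` times.
fn mul_loop(mut a: u32, b: u32) -> u32 {
    let mut acc = 0u32;
    while a != 0 {
        a -= 1;
        acc = acc.wrapping_add(b); // affine: the step `b` is loop-invariant
    }
    acc // exit value: 0 + b * (initial a), wrapping
}

/// Saturating subtraction in loop form: decrement both until either is zero.
fn sat_sub_loop(mut a: u32, mut b: u32) -> u32 {
    while a != 0 && b != 0 {
        a -= 1;
        b -= 1;
    }
    a // trip count is min(a, b), so the result is a - min(a, b)
}

fn main() {
    for a in 0..100u32 {
        for b in 0..100u32 {
            assert_eq!(mul_loop(a, b), a.wrapping_mul(b));
            assert_eq!(sat_sub_loop(a, b), a.saturating_sub(b));
            assert_eq!(sat_sub_loop(a, b), a - a.min(b));
        }
    }
    println!("loop forms match their closed forms");
}
```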
Getting Spaced in Heat Signature2018-10-04T20:00:00-04:00https://blog.witchoflight.com/2018/heat-signature/<p><a href="https://spaceships.cool/">Heat Signature</a> is a game where you can go inside the spaceships.
More specifically, it’s a game designed by <a href="https://twitter.com/Pentadact/">Tom Francis</a> about being a space freedom fighter and mercenary, hijacking spaceships, doing cool missions, and trying to complete your own personal goal before you retire or die.
If you want to get a good feel for how the game works, he has an excellent series where he’s playing the daily challenge missions every day for a week, <a href="https://www.youtube.com/playlist?list=PLUtKzyIe0aB1YePV-rl-4pGjYTL2WRCRo">which you can watch here</a>.</p>
<p>Heat Signature does something I really like out of video games, where it gives you lots of tools to get yourself into trouble, and then just as many ways to try to get yourself back out.
You can find this structure in a variety of games; one of the first times I heard it clearly articulated was in describing <a href="https://subsetgames.com/ftl.html">FTL: Faster Than Light</a>, which is a game where you both figuratively and literally have to put out a lot of fires.
This mode of play lends itself to very memorable stories.
Whether you succeed or fail, you’ll probably have something to tell a friend about.
The mechanics hand you a satisfying narrative arc, and the built-in twists make the story rich.
In my first hour and a half playing it, I got one of those stories that I felt was worth writing up.</p>
<hr>
<p>I had done very well on the normal missions for my first character.
I had pulled off two almost flawless hard missions in a row, so I decided I should try the daily challenge.
My mission is to assassinate the 3 guards that had killed my boyfriend, without extra casualties.
(A daily challenge always has an extra condition that determines your score.)
To help me on this mission, I have an armor-piercing short-blade, a concussive gun, and a swapper teleporter.</p>
<p>I fly my pod to the first mission ship and begin scoping it out.
Unfortunately, on this mission there’s a jammer making the rounds—they wander around the ship laying down huge jammer fields that prevent gadgets from working.
During the first few seconds that I watch the ship, the jammer lays down their first field.</p>
<p>Now I’m in a hurry.
I don’t want them to get a chance to cover too much more of the ship, since that makes me much less effective.
In my haste, I act a bit rashly.
I figure I can rush in and knock the first guard out when they’re not looking, and then knock out the jammer quickly after.
As I step through my pod’s airlock into the ship, I realize I don’t have a non-lethal melee weapon, and my gun’s not silenced.
If I shoot, it’s gonna be <em>really</em> loud.
So, as the guard begins to take notice of me, I do what any sensible person would do: I throw my gun.</p>
<p>It sails out of my hand toward the guard.
But this guard is wearing armor.
The gun clangs harmlessly off it, and they continue to stare me down unfazed, their alert meter slowly creeping higher towards the point where they’ll shoot me.
You know what’s worse though?
As the gun ricochets off their armored chest, it goes off.
Every guard in the sector has heard me.
They know there’s a fight going on and they all start rushing towards the noise, sounding the alarm on the way.
I’m slightly panicked, but I still feel like I can get out of this.
Now that I’ve thrown my gun, I don’t have any weapon to take down the guard in front of me.
Even if I was willing to kill them, I’m too far away to reach with my short-blade—but I can probably teleport away!</p>
<p>I pull out my visitor (a teleporter that temporarily brings you to a spot, then a few seconds later pulls you back) and try to pick the best destination.
I want to stay out of sight, and hopefully get pulled back behind the guard after they go looking for me.
I carefully make my choice.
And click.
And the teleporter fizzles, because I’m still standing in the field that the jammer laid down at the beginning of the mission.
A second later, the guard shoots me, and throws me unceremoniously out the airlock.
As I try to grab my unconscious body with my pod, the alarm timer countdown runs out, and my targets escape.</p>
<hr>
<p>And that’s the story of how I failed my first Heat Signature daily mission in only 33 seconds.
This took several minutes of real time, because the game encourages you to pause frequently for planning and reflexes, but the entire mission from start to finish only took 33 seconds of game time.
If this sounds like your sort of game, consider <a href="https://spaceships.cool/">buying it for yourself</a>, and tell me what story you get yourself into.</p>
Hello World, in 0x A Presses2019-11-01T20:00:00-04:00https://blog.witchoflight.com/2019/hello-world-in-0x-a-presses/<h3 id="frantic-video"><a class="header-anchor c-sun dec-none" href="#frantic-video">¶</a> Frantic Video</h3>
<div class="youtube row hcenter">
<iframe width="560" height="315" src="https://www.youtube.com/embed/wulRHxO9w-U" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe>
</div>
<p>If you want this content compressed into a 4-minute video of the lightning talk, it’s available now!
It’s considerably more frantic than the blog post.
Also note that it has some audible breathing in the first half, if that’s the sort of thing that bothers you.</p>
<h3 id="the-mario-a-button-challenge"><a class="header-anchor c-sun dec-none" href="#the-mario-a-button-challenge">¶</a> The (Mario) A Button Challenge</h3>
<p>In the Super Mario 64 speedrunning scene, many people have gotten good enough at the game that doing normal speedrunning isn’t challenging enough anymore.
In order to add the difficulty back in, there are many “Category Extensions” that add some extra challenge on top.
One of the most interesting of these is the A Button Challenge, where runners attempt to play the game pressing the A button as few times as possible.
Since the A button is the jump button, and this is a game about jumping, this leads to quite a few creative solutions and alternate routes.
“Watch For Rolling Rocks - 0.5x A Presses” is an excellent video showing one of those routes.
If you haven’t watched it before, it’s much more interesting than anything I can write, so watch it first :).</p>
<div class="youtube row hcenter">
<iframe width="560" height="315" src="https://www.youtube.com/embed/kpk2tdsPh0A" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe>
</div>
<h3 id="the-programming-a-button-challenge"><a class="header-anchor c-sun dec-none" href="#the-programming-a-button-challenge">¶</a> The (Programming) A Button Challenge</h3>
<p>In mid-October I tweeted a joke about doing the “Advent of Code” A Button Challenge.</p>
<div class="row hcenter">
<blockquote class="twitter-tweet">
<p lang="en" dir="ltr">
Advent of Code, A Button Challenge: solve all programs without ever writing the letter A in any of them.
</p> — genderAlgebraist (@porglezomp)
<a href="https://twitter.com/porglezomp/status/1184561096859869184">2019-10-16</a>
</blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8">
</script>
</div>
<p>This inspired some discussion in the replies, with Swift compiler folks working out ways to handle programs without needing <code>var</code>.
It was a lot of fun!</p>
<p>Just a few days later I was at <a href="https://rust-belt-rust.com/">Rust Belt Rust</a>.
After lunch, I was hanging out in the hall trading jokes with some people, and I made the Rust A Button Challenge joke.
At first it doesn’t sound too bad, until you realize that <code>main</code> has an <code>a</code> in it, and so you’re immediately in trouble.
We started thinking through different approaches, and got a working solution, then eventually a polished one.
Working through it touches on a wide variety of interesting topics in Rust, so it’s worth sharing.</p>
<h3 id="hello-rust-without-the-prohibited-letter"><a class="header-anchor c-sun dec-none" href="#hello-rust-without-the-prohibited-letter">¶</a> Hello, Rust! (Without the Prohibited Letter!)</h3>
<p>So, we want to write Hello World.
Let’s make a project.</p>
<pre class="language-shell-session"><code class="language-shell-session"><span class="token command"><span class="token shell-symbol important">$</span> <span class="token bash language-bash">cargo init button</span></span></code></pre>
<p>And we’ve already hit trouble.
We can’t even write <code>cargo</code>.
Luckily, bash has our backs here, and we can use command substitution with <code>printf</code> to get it.</p>
<pre class="language-shell-session"><code class="language-shell-session"><span class="token command"><span class="token shell-symbol important">$</span> <span class="token bash language-bash"><span class="token variable"><span class="token variable">$(</span><span class="token builtin class-name">printf</span> <span class="token string">'c\141rgo init button'</span><span class="token variable">)</span></span></span></span></code></pre>
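<p>Because the rewrite is purely mechanical, it can be automated. The following is a rough Rust sketch, not something from the original session: a hypothetical <code>printf_escape</code> helper that wraps a command in the same <code>$(printf '…')</code> substitution, swapping each <code>a</code> or <code>A</code> for the octal escape <code>\141</code> or <code>\101</code>. It assumes the command contains no single quotes, and, like the command above, it relies on bash word-splitting the unquoted substitution into a command and its arguments.</p>

```rust
/// Build a bash `$(printf '...')` command substitution for `cmd`, replacing
/// every `a`/`A` with an octal escape so the letter never has to be typed.
/// Assumes `cmd` contains no single quotes.
fn printf_escape(cmd: &str) -> String {
    let mut body = String::new();
    for c in cmd.chars() {
        match c {
            'a' => body.push_str("\\141"), // octal 141 = 'a'
            'A' => body.push_str("\\101"), // octal 101 = 'A'
            '\\' => body.push_str("\\\\"), // printf treats a lone backslash as an escape
            '%' => body.push_str("%%"),   // printf treats % as a format directive
            other => body.push(other),
        }
    }
    format!("$(printf '{}')", body)
}

fn main() {
    // Prints: $(printf 'c\141rgo init button')
    println!("{}", printf_escape("cargo init button"));
}
```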
<p>Because this is such a mechanical replacement, for legibility I’m gonna just write the normal commands even when they would include an <code>a</code>, and you can assume that we’re using the <code>printf</code> method.
Technically, this already just wrote a Hello World for us, in <code>main.rs</code>, due to the default project.
Let’s say, though, that this doesn’t meet the spirit of the challenge, and we have to rewrite it.
We can’t write it the usual way, with:</p>
<pre class="language-rust"><code class="language-rust"><span class="token keyword">fn</span> <span class="token function-definition function">main</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{</span><br> <span class="token macro property">println!</span><span class="token punctuation">(</span><span class="token string">"Hello, World!"</span><span class="token punctuation">)</span><span class="token punctuation">;</span><br><span class="token punctuation">}</span></code></pre>
<p>Because <code>main</code> has an <code>a</code> right there!
So, we’ll have to get more creative.
There are a few ways to override the entry point if you don’t want to use Rust’s default <code>main</code>.
We could try…</p>
<pre><code>#[start]
</code></pre>
<p>Well that won’t work.
And even if it did, it would need a <code>#![feature]</code> to enable it, which would cause trouble on its own.</p>
<p>Libraries have ways to register global symbols that are called at startup.
There are various mechanisms on different platforms, but they’re all troublesome.
If we wanted to name a function <code>__init</code> (which is one of the Linux symbols for the purpose) and have it be recognized, then we’d need either <code>#[link_name]</code> or <code>#[no_mangle]</code>, both of which need an <code>a</code>.
There are ways to get the correct setup using a <code>pub static</code>… but that also uses an <code>a</code>.
So, in the end we’ve exhausted all the options here.</p>
<p>What we really want is some attribute that adds a new, custom entry point.
We want it to be in the standard library, since otherwise we could cheat by using some external macro, written with the letter <code>a</code>, that does all the work for us.
We can’t write one ourselves <em>without</em> using the letter <code>a</code>, because we’d have to write <code>proc_macro</code>.
But, it turns out, we don’t need to, since one of these has been sitting in front of us all along.</p>
<pre class="language-rust"><code class="language-rust"><span class="token attribute attr-name">#[test]</span></code></pre>
<p>Let’s write our first working candidate for Hello World!</p>
<pre class="language-rust"><code class="language-rust"><span class="token attribute attr-name">#[test]</span><br><span class="token keyword">fn</span> <span class="token function-definition function">test</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{</span><br> <span class="token macro property">println!</span><span class="token punctuation">(</span><span class="token string">"Hello, World!"</span><span class="token punctuation">)</span><span class="token punctuation">;</span><br><span class="token punctuation">}</span></code></pre>
<p>And now we can run it:</p>
<pre class="language-shell-session"><code class="language-shell-session"><span class="token command"><span class="token shell-symbol important">$</span> <span class="token bash language-bash">cargo <span class="token builtin class-name">test</span></span></span><br><span class="token output">running 1 test<br>test test ... ok<br><br>test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out<br><br> Doc-tests abc<br><br>running 0 tests<br><br>test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out</span></code></pre>
<p>Ah right, we need <code>--nocapture</code> if we want to see our output.</p>
<pre class="language-shell-session"><code class="language-shell-session"><span class="token command"><span class="token shell-symbol important">$</span> <span class="token bash language-bash">cargo <span class="token builtin class-name">test</span> -- --nocapture</span></span><br><span class="token output">Hello, World!<br>test test ... ok<br><br>test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out<br><br> Doc-tests abc<br><br>running 0 tests<br><br>test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out</span></code></pre>
<p>So we’ve got Hello World for sure, but we also have a bunch of other junk too.
We can clean that up slightly with a few more flags: <code>--lib</code> will get rid of the doctest, and <code>--quiet</code> will remove slightly more.</p>
<pre class="language-shell-session"><code class="language-shell-session"><span class="token command"><span class="token shell-symbol important">$</span> <span class="token bash language-bash">cargo <span class="token builtin class-name">test</span> --lib --quiet -- --nocapture</span></span><br><br><span class="token output">running 1 test<br>Hello, World!<br>.<br>test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out</span></code></pre>
<p>And that’s about as clean as we can get with command line arguments.
This still isn’t great.
But we can do better.
First, let’s avoid that extra text afterward.
Rust’s tests run in-process, so if the test itself exits the whole process immediately, the harness never gets to print its summary.
Now, with this program:</p>
<pre class="language-rust"><code class="language-rust"><span class="token attribute attr-name">#[test]</span><br><span class="token keyword">fn</span> <span class="token function-definition function">test</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{</span><br> <span class="token macro property">println!</span><span class="token punctuation">(</span><span class="token string">"Hello, World!"</span><span class="token punctuation">)</span><span class="token punctuation">;</span><br> <span class="token namespace">std<span class="token punctuation">::</span>process<span class="token punctuation">::</span></span><span class="token function">exit</span><span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">)</span><span class="token punctuation">;</span><br><span class="token punctuation">}</span></code></pre>
<p>We get a much cleaner output:</p>
<pre><code>
running 1 test
Hello, World!
</code></pre>
<p>We can deal with that initial text too.
You can expect that essentially all terminals implement <a href="http://ascii-table.com/ansi-escape-sequences.php">ANSI escape sequences</a>, which allow you to control the cursor, mess with text, and more.
In this case, we want to use the sequence <code>&lt;Esc&gt;[&lt;Value&gt;A</code>, which is “Cursor Up.”</p>
<blockquote>
<p>Moves the cursor up by the specified number of lines without changing columns. If the cursor is already on the top line, ANSI.SYS ignores this sequence.</p>
</blockquote>
<p>Using this, we can go back to the previous lines and overwrite them with spaces, and then go back yet again to write on the first line.</p>
<pre class="language-rust"><code class="language-rust"><span class="token attribute attr-name">#[test]</span><br><span class="token keyword">fn</span> <span class="token function-definition function">test</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{</span><br> <span class="token macro property">print!</span><span class="token punctuation">(</span><span class="token string">"\x1b[1\x41 \x1b[1\x41\r"</span><span class="token punctuation">)</span><span class="token punctuation">;</span><br> <span class="token macro property">println!</span><span class="token punctuation">(</span><span class="token string">"Hello, World!"</span><span class="token punctuation">)</span><span class="token punctuation">;</span><br> <span class="token namespace">std<span class="token punctuation">::</span>process<span class="token punctuation">::</span></span><span class="token function">exit</span><span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">)</span><span class="token punctuation">;</span><br><span class="token punctuation">}</span></code></pre>
<p>This works in 4 steps.
First, <code>\x1b[1\x41</code> moves up by one line.
Then, the sequence of spaces overwrites the existing text on the line.
We repeat the <code>\x1b[1\x41</code> sequence to move up to the first line, and then finally use <code>\r</code> to go back to the beginning of the line.
Anything we print from there lands on that first line, and the harness text stays hidden.
Finally, this prints only:</p>
<pre class="language-shell-session"><code class="language-shell-session"><span class="token command"><span class="token shell-symbol important">$</span> <span class="token bash language-bash">cargo <span class="token builtin class-name">test</span> --lib --quiet -- --nocapture</span></span><br><span class="token output">Hello, World!</span></code></pre>
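<p>To make the escape string less magic, here is a hedged Rust sketch that builds the same kind of sequence from parts: one Cursor Up, a run of spaces to overwrite the harness’s <code>running 1 test</code> line, a second Cursor Up, and a carriage return. The helper names are hypothetical, and the sketch ignores the challenge (it writes a literal <code>A</code>) in favor of readability. Sizing the run of spaces to the length of <code>running 1 test</code> (14 columns) is an assumption about what needs covering.</p>

```rust
/// ANSI "Cursor Up" (ESC [ n A): move the cursor up `n` lines, same column.
fn cursor_up(n: u32) -> String {
    format!("\x1b[{}A", n)
}

/// Overwrite the previous line with `width` spaces, then move to the start
/// of the line above it, so the next output lands there.
fn hide_harness_lines(width: usize) -> String {
    let mut s = cursor_up(1);
    s.push_str(&" ".repeat(width));
    s.push_str(&cursor_up(1));
    s.push('\r');
    s
}

fn main() {
    // Simulate the harness header: a blank line, then "running 1 test".
    print!("\nrunning 1 test\n");
    print!("{}", hide_harness_lines("running 1 test".len()));
    println!("Hello, World!");
}
```

<p>Run in a terminal, this should leave only <code>Hello, World!</code> visible where the simulated header was; if the output is redirected to a file instead, the escape codes are simply stored as bytes.</p>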
<h3 id="not-done-yet"><a class="header-anchor c-sun dec-none" href="#not-done-yet">¶</a> Not Done Yet?</h3>
<p>Ok.
We’ve written our program without using the A key, but maybe you’re not satisfied.
We’ve got a big ol’ <code>Cargo.toml</code> with an <code>a</code>, and plenty of <code>cargo</code> on the command line.
Sure we never have to type any of that, but maybe you think that’s against the spirit of the thing?
After all, I’ve been using bash here, but using a substitution as your command flat-out doesn’t work in Fish, my shell of choice.
If you want to do that, you need <code>eval</code>… and you can see where this breaks down.</p>
<p>Luckily, we don’t <em>have</em> to use Cargo.
We can use <code>rustc</code>, a compiler which graciously has no <code>a</code> in its name.
Let’s start with the program we finished with in the last section.</p>
<pre class="language-shell-session"><code class="language-shell-session"><span class="token command"><span class="token shell-symbol important">$</span> <span class="token bash language-bash"><span class="token function">vim</span> hello.rs <span class="token comment"># Type it in without using the forbidden letter!</span></span></span><br><span class="token command"><span class="token shell-symbol important">$</span> <span class="token bash language-bash">rustc --test hello.rs</span></span><br><span class="token command"><span class="token shell-symbol important">$</span> <span class="token bash language-bash">./hello</span></span><br><br><span class="token output">running 1 test</span></code></pre>
<p>Hm.
This time we can’t write <code>--nocapture</code>, so we don’t get any output, and we don’t overwrite the harness text.
The test harness is capturing our stdout, so we need to get it back.
This calls for one final trick: let’s reopen <code>stdout</code> ourselves.</p>
<pre class="language-rust"><code class="language-rust"><span class="token keyword">use</span> <span class="token namespace">std<span class="token punctuation">::</span>io<span class="token punctuation">::</span></span><span class="token class-name">Write</span><span class="token punctuation">;</span><br><br><span class="token attribute attr-name">#[test]</span><br><span class="token keyword">fn</span> <span class="token function-definition function">test</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{</span><br> <span class="token comment">// File::open() opens for input only; write(true) opens it for output</span><br> <span class="token keyword">let</span> <span class="token keyword">mut</span> f <span class="token operator">=</span> <span class="token namespace">std<span class="token punctuation">::</span>fs<span class="token punctuation">::</span></span><span class="token class-name">OpenOptions</span><span class="token punctuation">::</span><span class="token function">new</span><span class="token punctuation">(</span><span class="token punctuation">)</span><br> <span class="token punctuation">.</span><span class="token function">write</span><span class="token punctuation">(</span><span class="token boolean">true</span><span class="token punctuation">)</span><br> <span class="token punctuation">.</span><span class="token function">open</span><span class="token punctuation">(</span><span class="token string">"/dev/stdout"</span><span class="token punctuation">)</span><br> <span class="token punctuation">.</span><span class="token function">expect</span><span class="token punctuation">(</span><span class="token string">"stdout"</span><span class="token punctuation">)</span><span class="token punctuation">;</span><br> <span class="token macro property">writeln!</span><span class="token punctuation">(</span>f<span class="token punctuation">,</span> <span class="token string">"\x1b[1\x41 \x1b[2\x41\r"</span><span class="token punctuation">)</span><br> <span class="token punctuation">.</span><span class="token function">expect</span><span class="token punctuation">(</span><span class="token string">"write"</span><span class="token punctuation">)</span><span class="token punctuation">;</span><br> <span class="token macro property">writeln!</span><span class="token punctuation">(</span>f<span class="token punctuation">,</span> <span class="token string">"Hello, World!"</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">expect</span><span class="token punctuation">(</span><span class="token string">"write"</span><span class="token punctuation">)</span><span class="token punctuation">;</span><br> <span class="token namespace">std<span class="token punctuation">::</span>process<span class="token punctuation">::</span></span><span class="token function">exit</span><span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">)</span><span class="token punctuation">;</span><br><span class="token punctuation">}</span></code></pre>
<p>Running this, we get:</p>
<pre class="language-shell-session"><code class="language-shell-session"><span class="token command"><span class="token shell-symbol important">$</span> <span class="token bash language-bash">rustc --test hello.rs</span></span><br><span class="token command"><span class="token shell-symbol important">$</span> <span class="token bash language-bash">./hello --quiet</span></span><br><span class="token output">Hello, World!</span></code></pre>
<p>Truly in 0x A presses.</p>
<p>Thanks to <a href="https://www.youtube.com/user/pannenkoek2012">Pannenkoek2012</a> for popularizing the Super Mario 64 A Button Challenge, and making the amazing video.
Thanks to <a href="https://twitter.com/quietmisdreavus">@quietmisdreavus</a> for giving suggestions for flags to use when testing with Rust, and <a href="https://twitter.com/myrrlyn">@myrrlyn</a> for suggesting ANSI escapes.
You can take a look at <a href="https://gist.github.com/porglezomp/68f4d9e7be29b758284a7b897269c718/revisions">many revisions of a gist containing the program</a>.</p>
<p>If you try an A Button Challenge in your programming language of choice, <a href="https://twitter.com/porglezomp">send me a tweet</a> telling me about it and what was difficult!</p>
Modding Games and Freezing Fish2019-11-05T19:00:00-05:00https://blog.witchoflight.com/2019/modding-games-and-freezing-fish/<aside class="warning"><p>This article will contain minor spoilers for the Dark Bramble in <em>Outer Wilds</em>.</p>
</aside>
<p>In October I played <a href="http://outerwilds.com/"><em>Outer Wilds</em></a>, a game about being an astronaut and a space archaeologist in a weird solar system.
As its website puts it:</p>
<blockquote>
<p>Outer Wilds is an exploration game about curiosity, roasting marshmallows, and unraveling the mysteries of the cosmos.</p>
</blockquote>
<p>It very quickly became one of my favorite games that I’ve ever played.
But unfortunately, even though it’s almost entirely a pure joy to play, there’s one <em>single</em> part that I hate.</p>
<p>It’s the anglerfish.
These guys.</p>
<figure class="col hcenter">
<img src="https://blog.witchoflight.com/img/2019-anglerfish.png" alt="An anglerfish floating in fog">
<figcaption><p>Image from <a href="https://outerwilds.gamepedia.com/File:Angler_crop.png">the Outer Wilds Wiki</a>, licensed as <a href="https://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA 3.0</a>.</p>
</figcaption>
</figure>
<p>In the Dark Bramble, these anglerfish are constantly looming out in the fog, making everything terrifying.
They’re <em>great</em> for the spooky atmosphere, but I found them absolutely miserable mechanically.
At one point you need to sneak past them in an enclosed space, not making any noise with your engines.
I was never able to manage it.
This was almost the last thing I needed to do in the entire game, and I was so stuck I just put the game down for about two weeks in frustration.</p>
<p>But finally, I came back with a solution.</p>
<h3 id="hacking-the-game"><a class="header-anchor c-sun dec-none" href="#hacking-the-game">¶</a> Hacking the Game</h3>
<p>I decided that I could probably mod the game.
I knew it was made in Unity, which usually means its game logic lives in .NET assemblies, so I set out to figure out how you mod a Unity game or a .NET executable.
My initial searching found <a href="https://github.com/icsharpcode/ILSpy">ILSpy</a>, which is an absolutely excellent tool for disassembling and viewing .NET code, but it doesn’t let you modify the code.
Nonetheless, I used it to look around in some Unity games to make sure I could understand them.
What I ended up actually using was <a href="https://github.com/0xd4d/dnSpy">dnSpy</a>, which in addition to letting you disassemble and view code, also acts as a debugger and an editor.</p>
<p>Armed with this tool, I set out to hack apart the game.</p>
<figure class="col hcenter">
<img src="https://blog.witchoflight.com/img/2019-outer-wilds-mod-01.png" alt="A Windows file explorer dialogue.">
<figcaption><p>To disassemble a game, you have to find where the code is.
For Unity games, it’s the <code>Assembly-CSharp.dll</code>, generally stored in a folder next to the game executable.</p>
</figcaption>
</figure>
<p>Before you start, make a copy of the original <code>Assembly-CSharp.dll</code> so you can revert to it if you need to.
You’ll also want to make a backup of your save files, just in case something goes save-corruptingly wrong.
Once you’re all set up with backup files, you can open the assembly in dnSpy.</p>
<figure class="col hcenter">
<img src="https://blog.witchoflight.com/img/2019-outer-wilds-mod-02.png" alt="Using the file > open dialogue in dnSpy.">
<figcaption></figcaption>
</figure>
<p>When we open up the assembly, it gets added to the assembly listing.
Inside it is a single DLL, and expanding that shows a list of namespaces.
There are a few editor-related namespaces with some classes in them, but the vast majority of classes are in the top-level namespace, labeled with empty braces and no name.
If we look inside this namespace, near the top is the <code>AnglerfishController</code> class, which seems like it’s probably exactly what we want.</p>
<figure class="col hcenter">
<img src="https://blog.witchoflight.com/img/2019-outer-wilds-mod-03.png" alt="dnSpy showing the assembly listing, with the AnglerfishController selected.">
<figcaption></figcaption>
</figure>
<p>Once we’ve located it, we should look around inside the class.
Because I know some Unity, I expect that the <code>Awake</code> and <code>Start</code> methods are likely to have interesting initialization in them.
And indeed, if we look at <code>Awake()</code>, we can see a <code>_noiseSensor</code> field being initialized.
Since the anglerfish chase you when you make noise, this seems like a useful thing to disable.</p>
<pre class="language-csharp"><code class="language-csharp"><span class="token keyword">protected</span> <span class="token keyword">override</span> <span class="token return-type class-name"><span class="token keyword">void</span></span> <span class="token function">Awake</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{</span><br> <span class="token keyword">base</span><span class="token punctuation">.</span><span class="token function">Awake</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span><br> <span class="token keyword">this</span><span class="token punctuation">.</span>_anglerBody <span class="token operator">=</span> <span class="token keyword">this</span><span class="token punctuation">.</span><span class="token generic-method"><span class="token function">GetRequiredComponent</span><span class="token generic class-name"><span class="token punctuation"><</span>OWRigidbody<span class="token punctuation">></span></span></span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span><br> <span class="token keyword">this</span><span class="token punctuation">.</span>_impactSensor <span class="token operator">=</span> <span class="token keyword">this</span><span class="token punctuation">.</span><span class="token generic-method"><span class="token function">GetRequiredComponent</span><span class="token generic class-name"><span class="token punctuation"><</span>ImpactSensor<span class="token punctuation">></span></span></span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span><br> <span class="token keyword">this</span><span class="token punctuation">.</span>_noiseSensor <span class="token operator">=</span> <span class="token keyword">this</span><span class="token 
punctuation">.</span><span class="token generic-method"><span class="token function">GetRequiredComponentInChildren</span><span class="token generic class-name"><span class="token punctuation"><</span>NoiseSensor<span class="token punctuation">></span></span></span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span><br> <span class="token keyword">this</span><span class="token punctuation">.</span>_anglerfishFluidVolume <span class="token operator">=</span> <span class="token keyword">this</span><span class="token punctuation">.</span><span class="token generic-method"><span class="token function">GetRequiredComponentInChildren</span><span class="token generic class-name"><span class="token punctuation"><</span>AnglerfishFluidVolume<span class="token punctuation">></span></span></span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span><br> <span class="token keyword">this</span><span class="token punctuation">.</span>_currentState <span class="token operator">=</span> AnglerfishController<span class="token punctuation">.</span>AnglerState<span class="token punctuation">.</span>Lurking<span class="token punctuation">;</span><br> <span class="token keyword">this</span><span class="token punctuation">.</span>_turningInPlace <span class="token operator">=</span> <span class="token boolean">false</span><span class="token punctuation">;</span><br> <span class="token keyword">this</span><span class="token punctuation">.</span>_stunTimer <span class="token operator">=</span> <span class="token number">0f</span><span class="token punctuation">;</span><br> <span class="token keyword">this</span><span class="token punctuation">.</span>_consumeStartTime <span class="token operator">=</span> <span class="token operator">-</span><span class="token number">1f</span><span class="token punctuation">;</span><br> <span class="token 
keyword">this</span><span class="token punctuation">.</span>_consumeComplete <span class="token operator">=</span> <span class="token boolean">false</span><span class="token punctuation">;</span><br><span class="token punctuation">}</span></code></pre>
<p>If we look for the other places this field is used, we find that <code>OnEnable</code> and <code>OnDisable</code> register and unregister <code>OnClosestAudibleNoise</code> as a listener on the noise sensor.
If we look at <em>that</em> method, it looks like it starts chasing the source of the noise, which is exactly what we don’t want.</p>
<pre class="language-csharp"><code class="language-csharp"><span class="token keyword">private</span> <span class="token return-type class-name"><span class="token keyword">void</span></span> <span class="token function">OnEnable</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{</span><br> <span class="token keyword">this</span><span class="token punctuation">.</span>_impactSensor<span class="token punctuation">.</span>OnImpact <span class="token operator">+=</span> <span class="token keyword">this</span><span class="token punctuation">.</span>OnImpact<span class="token punctuation">;</span><br> <span class="token keyword">this</span><span class="token punctuation">.</span>_noiseSensor<span class="token punctuation">.</span>OnClosestAudibleNoise <span class="token operator">+=</span> <span class="token keyword">this</span><span class="token punctuation">.</span>OnClosestAudibleNoise<span class="token punctuation">;</span><br> <span class="token keyword">this</span><span class="token punctuation">.</span>_anglerfishFluidVolume<span class="token punctuation">.</span>OnCaughtObject <span class="token operator">+=</span> <span class="token keyword">this</span><span class="token punctuation">.</span>OnCaughtObject<span class="token punctuation">;</span><br><span class="token punctuation">}</span><br><br><span class="token keyword">private</span> <span class="token return-type class-name"><span class="token keyword">void</span></span> <span class="token function">OnDisable</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{</span><br> <span class="token keyword">this</span><span class="token punctuation">.</span>_impactSensor<span class="token punctuation">.</span>OnImpact <span class="token operator">-=</span> <span class="token keyword">this</span><span class="token punctuation">.</span>OnImpact<span class="token 
punctuation">;</span><br> <span class="token keyword">this</span><span class="token punctuation">.</span>_noiseSensor<span class="token punctuation">.</span>OnClosestAudibleNoise <span class="token operator">-=</span> <span class="token keyword">this</span><span class="token punctuation">.</span>OnClosestAudibleNoise<span class="token punctuation">;</span><br> <span class="token keyword">this</span><span class="token punctuation">.</span>_anglerfishFluidVolume<span class="token punctuation">.</span>OnCaughtObject <span class="token operator">-=</span> <span class="token keyword">this</span><span class="token punctuation">.</span>OnCaughtObject<span class="token punctuation">;</span><br><span class="token punctuation">}</span></code></pre>
<p>So let’s remove that registration entirely.
We can right click on a method and select “Edit Method” to modify the code.</p>
<figure class="col hcenter">
<img src="https://blog.witchoflight.com/img/2019-outer-wilds-mod-04.png" alt="Right clicking on a method to demonstrate the 'Edit Method' option in the context menu.">
<figcaption></figcaption>
</figure>
<p>In here, let’s remove the registration and deregistration statements.
I didn’t touch the <code>_impactSensor</code> registration, so if you run into an anglerfish it will probably still kill you.
That alone was plenty of tension to keep them terrifying monsters lurking in the fog.
If you want to get rid of that difficulty as well, you could remove that registration too, or even modify one of the update methods to delete the anglerfish itself.</p>
<pre class="language-diff"><code class="language-diff"><span class="token unchanged"><span class="token prefix unchanged"> </span><span class="token line">private void OnEnable() {<br></span><span class="token prefix unchanged"> </span><span class="token line"> this._impactSensor.OnImpact += this.OnImpact;<br></span></span><span class="token deleted-sign deleted"><span class="token prefix deleted">-</span><span class="token line"> this._noiseSensor.OnClosestAudibleNoise += this.OnClosestAudibleNoise;<br></span></span><span class="token unchanged"><span class="token prefix unchanged"> </span><span class="token line"> this._anglerfishFluidVolume.OnCaughtObject += this.OnCaughtObject;<br></span><span class="token prefix unchanged"> </span><span class="token line">}<br></span></span><br><span class="token unchanged"><span class="token prefix unchanged"> </span><span class="token line">private void OnDisable() {<br></span><span class="token prefix unchanged"> </span><span class="token line"> this._impactSensor.OnImpact -= this.OnImpact;<br></span></span><span class="token deleted-sign deleted"><span class="token prefix deleted">-</span><span class="token line"> this._noiseSensor.OnClosestAudibleNoise -= this.OnClosestAudibleNoise;<br></span></span><span class="token unchanged"><span class="token prefix unchanged"> </span><span class="token line"> this._anglerfishFluidVolume.OnCaughtObject -= this.OnCaughtObject;<br></span><span class="token prefix unchanged"> </span><span class="token line">}</span></span></code></pre>
<p>When I tried to click the compile button after making my modifications, it gave me some confusing error messages.
Apparently the decompiled code puts <code>DebuggerBrowsable</code> attributes on some <code>event</code> declarations, and that attribute is only valid on fields and properties, so the compiler rejects it there.</p>
<figure class="col hcenter">
<img src="https://blog.witchoflight.com/img/2019-outer-wilds-mod-05.png" alt="A list of errors and warnings from the compile dialogue, saying that events can't be DebuggerBrowsable.">
<figcaption></figcaption>
</figure>
<p>Double-clicking on one of the errors leads to this attribute declaration:</p>
<figure class="col hcenter">
<img src="https://blog.witchoflight.com/img/2019-outer-wilds-mod-06.png" alt="A System.Diagnostics.DebuggerBrowsable attribute on an event declaration.">
<figcaption></figcaption>
</figure>
<p>I just deleted every attribute it was complaining about, and then it compiled cleanly.
After you compile, you can save the assembly… and then try out the game!</p>
<h3 id="the-test"><a class="header-anchor c-sun dec-none" href="#the-test">¶</a> The Test</h3>
<p>It worked perfectly for me.
I could sneak right past anglerfish at full engine blast if I wanted to and they would never hear… and I was still terrified of accidentally bumping into them, so the atmosphere of the scene was preserved pretty well.
With this little mod, I was able to beat the game (the end of <em>Outer Wilds</em> is absolutely fantastic).</p>
<p>I found it super empowering to be able to change a game so that it accommodated the way I wanted to play it.
I hope it’s useful to some of you too.</p>
Hosting My Own Git2020-02-17T19:00:00-05:00https://blog.witchoflight.com/2020/hosting-my-own-git/<p>I’ve wanted to get off GitHub, at least for new projects, because of their <a href="https://github.com/drop-ice/dear-github-2.0">support for ICE</a>, but I hadn’t figured out what I was going to do until recently.
Then Jordan Rose wrote <a href="https://belkadan.com/blog/2020/01/Gitweb-on-Shared-Hosting/">a post about setting up personal Git hosting</a> that I liked, which showed me how easy it is to set up and introduced me to <a href="https://git-scm.com/docs/gitweb">GitWeb</a>.
So I set up my own shared Git hosting that suits my needs, which you can see at <a href="https://git.witchoflight.com/">git.witchoflight.com</a>.</p>
<h3 id="hosting-your-own-git"><a class="header-anchor c-sun dec-none" href="#hosting-your-own-git">¶</a> Hosting <em>Your</em> Own Git</h3>
<p>It turns out if you only care about SSH access, hosting Git is way easier than I believed it would be.
If you have a server with SSH access, log in as the user you want Git to run as, and run <code>git init --bare</code> to create a repository.
On my server, I created a <code>git</code> user, and a <code>/repos/</code> folder owned by the <code>git</code> user where I could create all of my repos.
Then, it’s just a matter of using <code><user>@<server>:<path></code> as your remote.
With that, Git and SSH handle all of the connections and transfers; you don’t need a special Git server running or anything.</p>
<p>As a specific example of using this:
When I want a new repository on my server “Sunstone,” I ssh there and create a new bare repository:</p>
<pre class="language-shell-session"><code class="language-shell-session"><span class="token command"><span class="token shell-symbol important">$</span> <span class="token bash language-bash"><span class="token function">ssh</span> sunstone</span></span><br><span class="token command"><span class="token shell-symbol important">$</span> <span class="token bash language-bash"><span class="token function">su</span> <span class="token function">git</span></span></span><br><span class="token command"><span class="token shell-symbol important">$</span> <span class="token bash language-bash"><span class="token builtin class-name">cd</span> /repos/</span></span><br><span class="token command"><span class="token shell-symbol important">$</span> <span class="token bash language-bash"><span class="token function">git</span> init --bare fancy-new-repo.git</span></span></code></pre>
<p>I include the <code>.git</code> suffix as a personal preference; it’s just part of the path and doesn’t do anything special, so you can leave it off if you prefer.
Now I can clone it as <code>git clone git@sunstone:/repos/fancy-new-repo.git</code>.
You can do all sorts of setup to let multiple users push, including users that don’t have access to most of the computer, but I’ll leave those kinds of things to <a href="https://git-scm.com/book/en/v2/Git-on-the-Server-Setting-Up-the-Server">the Git docs</a>.
There are other fancy things you can do to make it nicer, and I’ll get to some of those later.</p>
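<p>If you want to try this workflow before touching a server, here is a minimal sketch that runs entirely on one machine. Git accepts a plain filesystem path as a remote, so a temporary directory stands in for <code>/repos/</code> on Sunstone. The directory, README contents, and commit identity are all placeholders.</p>
<pre class="language-bash"><code class="language-bash">#!/bin/sh
set -e
# A temporary directory stands in for the server's /repos/ folder (placeholder)
work=$(mktemp -d)
mkdir -p "$work/repos"
git init --quiet --bare "$work/repos/fancy-new-repo.git"

# Over SSH this would be: git clone git@sunstone:/repos/fancy-new-repo.git
# Git may warn that the repository is empty, which is expected here
git clone --quiet "$work/repos/fancy-new-repo.git" "$work/checkout"
cd "$work/checkout"
echo "# Fancy New Repo" > README.md
git add README.md
# Placeholder identity so the commit works even without a global Git config
git -c user.name="Example" -c user.email="example@example.com" \
    commit --quiet -m "Initial commit"

# Push the current branch to a branch of the same name in the bare repository
git push --quiet origin HEAD
git --git-dir="$work/repos/fancy-new-repo.git" log --oneline
</code></pre>
<p>To do the same thing over SSH, swap the local path for <code>git@sunstone:/repos/fancy-new-repo.git</code>.</p>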
<h3 id="sharing-with-the-world"><a class="header-anchor c-sun dec-none" href="#sharing-with-the-world">¶</a> Sharing With The World</h3>
<p>One of the big things that GitHub offers is the ability for anybody to easily read and clone the code without any particular authorization.
It turns out that Git itself offers a solution for this, called <a href="https://git-scm.com/docs/gitweb">GitWeb</a>.
It’s a CGI script that renders readable web pages for Git repositories, and since the repositories end up behind a web server anyway, you can also do plain HTTP(S) clones as a side effect.</p>
<p>For this, I followed the setup directions <a href="https://git-scm.com/docs/gitweb">in the docs</a> and in <a href="https://belkadan.com/blog/2020/01/Gitweb-on-Shared-Hosting/">Jordan’s post</a>, as well as searching around online for various setups.
In the end, I pieced together a setup that’s a bit different from all of them, mostly because I’m on nginx.
Like Jordan, I want people to be able to use the nice URLs, and also to clone over HTTPS.</p>
<h4 id="configuring-gitweb"><a class="header-anchor c-sun dec-none" href="#configuring-gitweb">¶</a> Configuring GitWeb</h4>
<p>First, <code>/etc/gitweb.conf</code>:</p>
<pre class="language-perl"><code class="language-perl"><span class="token keyword">our</span> <span class="token variable">$projectroot</span> <span class="token operator">=</span> <span class="token string">"/repos"</span><span class="token punctuation">;</span><br><span class="token keyword">our</span> <span class="token variable">$site_name</span> <span class="token operator">=</span> <span class="token string">'Witch of Git'</span><span class="token punctuation">;</span><br><span class="token keyword">our</span> <span class="token variable">@git_base_url_list</span> <span class="token operator">=</span> <span class="token string">qw(https://git.witchoflight.com)</span><span class="token punctuation">;</span><br><span class="token keyword">our</span> <span class="token variable">$omit_owner</span> <span class="token operator">=</span> <span class="token number">1</span><span class="token punctuation">;</span><br><span class="token variable">$feature</span><span class="token punctuation">{</span><span class="token string">'highlight'</span><span class="token punctuation">}</span><span class="token punctuation">{</span><span class="token string">'default'</span><span class="token punctuation">}</span> <span class="token operator">=</span> <span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">;</span><br><span class="token variable">$feature</span><span class="token punctuation">{</span><span class="token string">'pathinfo'</span><span class="token punctuation">}</span><span class="token punctuation">{</span><span class="token string">'default'</span><span class="token punctuation">}</span> <span class="token operator">=</span> <span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">;</span><br><span class="token variable">$feature</span><span class="token punctuation">{</span><span class="token 
string">'search'</span><span class="token punctuation">}</span><span class="token punctuation">{</span><span class="token string">'default'</span><span class="token punctuation">}</span> <span class="token operator">=</span> <span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">;</span><br><br><span class="token keyword">our</span> <span class="token variable">%highlight_ext</span> <span class="token operator">=</span> <span class="token punctuation">(</span><br> <span class="token variable">%highlight_ext</span><span class="token punctuation">,</span><br> <span class="token punctuation">(</span>map <span class="token punctuation">{</span> <span class="token variable">$_</span> <span class="token operator">=></span> <span class="token variable">$_</span> <span class="token punctuation">}</span> <span class="token string">qw(rs lua)</span><span class="token punctuation">)</span><span class="token punctuation">,</span><br> toml <span class="token operator">=></span> <span class="token string">'ini'</span><span class="token punctuation">,</span> <span class="token comment"># good enough for lazy :?</span><br><span class="token punctuation">)</span><span class="token punctuation">;</span><br><br><span class="token variable">@stylesheets</span> <span class="token operator">=</span> <span class="token punctuation">(</span><span class="token string">"/static/gitweb.css"</span><span class="token punctuation">)</span><span class="token punctuation">;</span><br><span class="token variable">$logo</span> <span class="token operator">=</span> <span class="token string">"/static/git-logo.png"</span><span class="token punctuation">;</span><br><span class="token variable">$favicon</span> <span class="token operator">=</span> <span class="token string">"/static/git-favicon.png"</span><span class="token punctuation">;</span><br><span class="token variable">$javascript</span> <span 
class="token operator">=</span> <span class="token string">"/static/gitweb.js"</span><span class="token punctuation">;</span><br><br><span class="token variable">$export_auth_hook</span> <span class="token operator">=</span> <span class="token keyword">sub</span> <span class="token punctuation">{</span> <span class="token operator">not</span> <span class="token punctuation">(</span><span class="token variable">$_</span><span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span> <span class="token operator">=~</span> <span class="token regex">/\.git\z/</span><span class="token punctuation">)</span> <span class="token punctuation">}</span><span class="token punctuation">;</span></code></pre>
<p>This does a few things.
First, it sets some basic variables: where to look for repositories, what the site title should be, and so on.
I enable syntax highlighting, and, importantly, enable <code>pathinfo</code>, which changes URLs from messy query strings to readable paths.
If you’re not careful, this can cause trouble in the server config later, but I’ll go over that.</p>
<p>I need to modify <code>%highlight_ext</code>, which maps file extensions to the language used for syntax highlighting.
I figured this out by reading <code>/usr/share/gitweb/gitweb.cgi</code>, which sets up all the features that you can modify, and it has a comment:</p>
<pre class="language-perl"><code class="language-perl"><span class="token comment"># see files in /usr/share/highlight/langDefs/ directory</span></code></pre>
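<p>Setting <code>pathinfo</code> requires that we specify the stylesheets, logo, favicon, and JS explicitly, most likely because GitWeb’s defaults are relative paths like <code>static/gitweb.css</code>, which stop resolving correctly once pathinfo URLs add extra path segments.</p>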
<p>The very last line is my own idea here.
The <code>$export_auth_hook</code> is a callback that checks if a given repo should be exported.
I give every repository two names using a symlink: a <code>repo.git</code>, and a plain <code>repo</code> that links to the first.
I create the former because I like having my remotes end in <code>.git</code>, and I create the latter because it makes for nicer URLs in GitWeb.
I use the hook here to keep GitWeb from listing each repository twice.</p>
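<p>For a single repository, you can make the pair with a relative symlink inside <code>/repos/</code>. You can then check which of the two names the hook lets through by evaluating the same Perl expression outside GitWeb. This is a sketch that assumes <code>perl</code> is installed (GitWeb is itself a Perl script), and it uses a temporary directory in place of <code>/repos/</code>. <code>is_exported</code> is a hypothetical helper name, not part of GitWeb.</p>
<pre class="language-bash"><code class="language-bash">#!/bin/sh
set -e
# A temporary directory stands in for /repos/ (placeholder)
root=$(mktemp -d)
git init --quiet --bare "$root/fancy-new-repo.git"
# Relative symlink, so both names lead to the same repository
ln -s fancy-new-repo.git "$root/fancy-new-repo"

# Hypothetical helper: runs the same check as $export_auth_hook in
# /etc/gitweb.conf and exits 0 if GitWeb would list the path, 1 if not
is_exported() {
    perl -e 'my $hook = sub { not ($_[0] =~ /\.git\z/) };
             exit($hook->($ARGV[0]) ? 0 : 1);' "$1"
}

for path in "$root/fancy-new-repo.git" "$root/fancy-new-repo"; do
    if is_exported "$path"; then
        echo "listed: $path"
    else
        echo "hidden: $path"
    fi
done
</code></pre>
<p>The <code>.git</code> path comes out hidden and the plain one listed, so each repository shows up once in GitWeb under its nicer name.</p>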
<h4 id="nginx"><a class="header-anchor c-sun dec-none" href="#nginx">¶</a> Nginx</h4>
<p>I happen to like Nginx as a static file server.
I was already planning to have it running on my server, so it seemed like the best idea to get GitWeb running through there.
Unfortunately, GitWeb is a CGI script, and Nginx can’t run CGI scripts directly, though it can talk FastCGI to a wrapper that runs them.
I searched around on three different fronts: tutorials for Nginx CGI scripts, tutorials for setting up GitWeb with other servers, and, in the middle of those, tutorials actually on GitWeb with Nginx.
None of them completely solved my issues, but I used <a href="https://wiki.archlinux.org/index.php/Gitweb#Nginx">a tutorial from the Arch wiki</a> and <a href="https://gist.github.com/mcxiaoke/055af99e86f8e8d3176e">a tutorial in a gist</a>.</p>
<p>Synthesizing all this together, I made my <code>/etc/nginx/sites-available/gitweb</code>:</p>
<pre class="language-nginx"><code class="language-nginx"><span class="token directive"><span class="token keyword">server</span></span> <span class="token punctuation">{</span><br> <span class="token directive"><span class="token keyword">listen</span> <span class="token number">80</span> default_server</span><span class="token punctuation">;</span><br> <span class="token directive"><span class="token keyword">listen</span> [::]:80 default_server</span><span class="token punctuation">;</span><br> <span class="token directive"><span class="token keyword">server_name</span> git.witchoflight.com</span><span class="token punctuation">;</span><br> <span class="token directive"><span class="token keyword">root</span> /repos/</span><span class="token punctuation">;</span><br><br> <span class="token directive"><span class="token keyword">location</span> /</span> <span class="token punctuation">{</span><br> <span class="token directive"><span class="token keyword">try_files</span> <span class="token variable">$uri</span> @gitweb</span><span class="token punctuation">;</span><br> <span class="token punctuation">}</span><br><br> <span class="token directive"><span class="token keyword">location</span> /static/</span> <span class="token punctuation">{</span><br> <span class="token directive"><span class="token keyword">root</span> /usr/share/gitweb/</span><span class="token punctuation">;</span><br> <span class="token punctuation">}</span><br><br> <span class="token directive"><span class="token keyword">location</span> @gitweb</span> <span class="token punctuation">{</span><br> <span class="token directive"><span class="token keyword">include</span> fastcgi_params</span><span class="token punctuation">;</span><br> <span class="token directive"><span class="token keyword">gzip</span> <span class="token boolean">off</span></span><span class="token punctuation">;</span><br> <span class="token directive"><span class="token keyword">fastcgi_param</span> SCRIPT_FILENAME /usr/share/gitweb/gitweb.cgi</span><span class="token punctuation">;</span><br> <span class="token directive"><span class="token keyword">fastcgi_param</span> PATH_INFO <span class="token variable">$uri</span></span><span class="token punctuation">;</span><br> <span class="token directive"><span class="token keyword">fastcgi_param</span> GITWEB_CONFIG /etc/gitweb.conf</span><span class="token punctuation">;</span><br> <span class="token directive"><span class="token keyword">fastcgi_pass</span> unix:/var/run/fcgiwrap.socket</span><span class="token punctuation">;</span><br> <span class="token punctuation">}</span><br><span class="token punctuation">}</span></code></pre>
<p>Requests under <code>/static/</code> are served from GitWeb’s own static files. Everything else is first looked up as a file under <code>/repos/</code>, falling back to GitWeb if no such file exists.
Nginx passes GitWeb requests over FastCGI to <code>fcgiwrap</code>, listening on <code>fcgiwrap.socket</code>, which runs the CGI script.
I pass the request URI as <code>PATH_INFO</code>, which seems to be necessary for GitWeb to route the pretty paths.
At a minimum, you always have to pass the script filename and the config.
I actually got stuck for about an hour here with a mismatch between <code>fcgiwrap.socket</code> and <code>fcgiwrap.sock</code>, so check your spellings carefully while you configure things!
With this setup, GitWeb will correctly format HTML pages when it should, and also correctly pass through files inside the repositories so that HTTP(S) cloning works fine.
Speaking of HTTPS…</p>
<h4 id="using-https"><a class="header-anchor c-sun dec-none" href="#using-https">¶</a> Using HTTPS</h4>
<p>As my final step, I got a cert set up for HTTPS.
I’d never set up HTTPS myself before, only used it when my hosting provided it, but setting it up with <a href="https://letsencrypt.org/">Let’s Encrypt</a> ended up being quite easy.
I just directly followed the steps on their “Get Started” page and things worked first try.
I pointed their tool at my GitWeb Nginx config, and it rewrote the config slightly to add the certificate settings, which I’ve left out above.</p>
<h3 id="making-it-nice"><a class="header-anchor c-sun dec-none" href="#making-it-nice">¶</a> Making It Nice</h3>
<p>I do <em>lots</em> of hobby projects, so I want it to be nice to set up repos.
In the <code>git</code> user’s home directory, I have a <code>git-hooks/</code> folder that holds all of the “universal” base hooks I come up with, a <code>git-template/</code> folder that stores the template for new git repos, and a <code>make-repo</code> script which does the setup for new repositories.</p>
<p>First, the <code>git-hooks/</code>.
I keep them in their own folder so the template can symlink to them, which means I never have to edit the hooks in each repository separately.
Currently I just have one, the <code>post-update</code> hook.
It contains:</p>
<pre class="language-bash"><code class="language-bash"><span class="token shebang important">#!/bin/sh</span><br><span class="token function">git</span> update-server-info<br><span class="token function">git</span> cat-file blob HEAD:README.md <span class="token operator">|</span> pulldown-cmark <span class="token operator">></span> README.html</code></pre>
<p>We need to run <code>git update-server-info</code> to regenerate the auxiliary info files that “dumb” HTTP clones rely on.
Then we generate the <code>README.html</code> file that GitWeb displays on a project’s main page.
For this, <code>git cat-file blob HEAD:README.md</code> grabs the latest <code>README.md</code> from the branch <code>HEAD</code> points to (the default branch), and <code>pulldown-cmark</code> renders it to HTML.</p>
<p>In the <code>git-template/</code> folder, I make symlinks from the <code>~/git-hooks/</code> folder into the <code>git-template/hooks/</code> folder so they just need to be updated in one place.
I also change the default branch to <code>develop</code>, since I prefer it to <code>master</code>.
You can set the template dir with <code>git config --global init.templatedir <path></code>.</p>
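<p>If you’d rather script the symlinking step, here is a minimal sketch in Python. It’s an illustration, not part of the setup described above: <code>link_hooks</code> is a hypothetical helper, and it assumes the <code>~/git-hooks/</code> and <code>~/git-template/</code> layout from this post. It links to absolute paths because <code>git init</code> copies the template’s contents into each new repository, so a relative link would most likely end up pointing somewhere else once copied.</p>

```python
from pathlib import Path


def link_hooks(hooks_dir, template_dir):
    """Symlink every hook in hooks_dir into template_dir/hooks.

    Hypothetical helper: existing entries are left alone, so running it
    again after adding a new hook only creates the missing links.
    """
    hooks_dir = Path(hooks_dir).expanduser().resolve()
    target_dir = Path(template_dir).expanduser() / "hooks"
    target_dir.mkdir(parents=True, exist_ok=True)
    for hook in sorted(hooks_dir.iterdir()):
        link = target_dir / hook.name
        # is_symlink() also catches dangling links, which exists() misses.
        if link.exists() or link.is_symlink():
            continue
        # `hook` is absolute (hooks_dir was resolved), so the link still
        # points at the shared hook after git init copies it into a repo.
        link.symlink_to(hook)


# Example, assuming the layout from this post:
# link_hooks("~/git-hooks", "~/git-template")
```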
<p>Finally, the <code>make-repo</code> script creates both <code>repo</code> and <code>repo.git</code> in the right place when I ask for a new repository, and opens the description file for editing immediately.</p>
<pre class="language-bash"><code class="language-bash"><span class="token shebang important">#!/bin/bash</span><br><span class="token builtin class-name">set</span> -e<br><br><span class="token keyword">if</span> <span class="token punctuation">[</span> -z <span class="token variable">$1</span> <span class="token punctuation">]</span><span class="token punctuation">;</span> <span class="token keyword">then</span><br> <span class="token builtin class-name">echo</span> <span class="token string">"Usage: make-repo <name>"</span><br> <span class="token builtin class-name">exit</span> <span class="token number">0</span><br><span class="token keyword">fi</span><br><span class="token comment"># Strip .git from the name if you specify it, for uniformity</span><br><span class="token assign-left variable">repo</span><span class="token operator">=</span><span class="token variable"><span class="token variable">$(</span><span class="token function">basename</span> -s.git $1<span class="token variable">)</span></span><br><span class="token comment"># We don't want to work as root, but we login as root (oops?)</span><br><span class="token comment"># so it's convenient to automatically switch to the git user</span><br><span class="token keyword">if</span> <span class="token punctuation">[</span> <span class="token string">"<span class="token environment constant">$EUID</span>"</span> -eq <span class="token number">0</span> <span class="token punctuation">]</span><span class="token punctuation">;</span> <span class="token keyword">then</span> <span class="token builtin class-name">exec</span> <span class="token function">su</span> <span class="token function">git</span> -c <span class="token string">"<span class="token variable">$0</span> <span class="token variable">$1</span>"</span><span class="token punctuation">;</span> <span class="token keyword">fi</span><br><span class="token comment"># Only allow running as the git user, for file permissions purposes</span><br><span 
class="token keyword">if</span> <span class="token punctuation">[</span> <span class="token variable"><span class="token variable">$(</span><span class="token function">whoami</span><span class="token variable">)</span></span> <span class="token operator">!=</span> <span class="token string">"git"</span> <span class="token punctuation">]</span><span class="token punctuation">;</span> <span class="token keyword">then</span><br> <span class="token builtin class-name">echo</span> <span class="token string">"Please run this tool as the 'git' user"</span><br> <span class="token builtin class-name">exit</span> <span class="token number">1</span><br><span class="token keyword">fi</span><br><span class="token comment"># Don't overwrite existing repositories</span><br><span class="token keyword">if</span> <span class="token punctuation">[</span> -d <span class="token string">"/repos/<span class="token variable">$repo</span>.git"</span> <span class="token punctuation">]</span><span class="token punctuation">;</span> <span class="token keyword">then</span><br> <span class="token builtin class-name">echo</span> <span class="token string">"Repository '<span class="token variable">$1</span>' already exists"</span><br> <span class="token builtin class-name">exit</span> <span class="token number">2</span><br><span class="token keyword">fi</span><br><br><span class="token builtin class-name">cd</span> /repos/<br><span class="token comment"># Make a bare repo for SSH cloning</span><br><span class="token function">git</span> init --bare <span class="token string">"<span class="token variable">$repo</span>.git"</span><br><span class="token comment"># We immediately edit the description file that will show up in GitWeb</span><br>sensible-editor <span class="token string">"<span class="token variable">$repo</span>.git/description"</span><br><span class="token comment"># And have both repo and repo.git point to the same place for cloning</span><br><span class="token function">ln</span> 
-s <span class="token string">"<span class="token variable">$repo</span>.git"</span> <span class="token string">"<span class="token variable">$repo</span>"</span></code></pre>
<h3 id="what-now"><a class="header-anchor c-sun dec-none" href="#what-now">¶</a> What Now?</h3>
<p>Well, now I have Git hosting!
And if you want some, you can probably figure it out too: it’s less work than I thought it would be!
I’ve done some minimal styling, making it a narrower column and fixing some things, but I want to do more.
That’s a project for another day, though.</p>
<p>Now, go look at my projects.
Some of them are cool!
I’ll try to write about them soon. :)
<a href="https://git.witchoflight.com/">git.witchoflight.com</a>.</p>
Syntactic Unification Is Easy!2020-02-18T19:00:00-05:00https://blog.witchoflight.com/2020/syntactic-unification/<p>Syntactic unification is the process of trying to find an assignment of variables that makes two terms the same.
For example, attempting to unify the terms in the equation <code>(X, Y) = (1, 2)</code> would give the substitution <code>{X: 1, Y: 2}</code>, but if we tried to unify <code>(X, X) = (1, 2)</code> it would fail since one variable can’t be equal to two different values.
Unification is used to implement two things I’m very interested in: type systems, and logic programming languages.</p>
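<p>To make “an assignment of variables that makes two terms the same” concrete, here is a small sketch of applying a substitution to a term. It isn’t part of the unifier built below; <code>substitute</code> is a hypothetical helper, and it assumes the representation described in the next section: <code>Var</code> for variables, tuples for compound terms, and anything else as an atom.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


def substitute(term, subst):
    """Replace each variable in `term` that has a binding in `subst`."""
    if isinstance(term, Var):
        return subst.get(term, term)
    if isinstance(term, tuple):
        return tuple(substitute(t, subst) for t in term)
    return term  # atoms are left unchanged


X, Y = Var("X"), Var("Y")

# {X: 1, Y: 2} turns (X, Y) into exactly (1, 2), so it unifies them.
print(substitute((X, Y), {X: 1, Y: 2}))  # (1, 2)

# Any binding for X puts the same value in both slots of (X, X),
# so no substitution can turn it into (1, 2).
print(substitute((X, X), {X: 1}))  # (1, 1)
print(substitute((X, X), {X: 2}))  # (2, 2)
```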
<p>Last year, I tried implementing unification in order to make a toy Prolog, and I got completely stuck on the algorithms.
I eventually puzzled through it by reading lots of papers and articles, but I don’t recall ever understanding it fully.
You can take a look at <a href="https://en.wikipedia.org/wiki/Unification_(computer_science)#A_unification_algorithm">the equations that the Wikipedia article uses</a> to see what I was dealing with—it’s certainly intimidating!</p>
<p>Fast-forward to last week: I happened to read <a href="http://webyrd.net/scheme-2013/papers/HemannMuKanren2013.pdf">the MicroKanren paper</a> (which is very readable) and discovered that unification could be very simple!
Let me show you how!</p>
<h3 id="the-terms-to-unify"><a class="header-anchor c-sun dec-none" href="#the-terms-to-unify">¶</a> The Terms To Unify</h3>
<p>We have to define what kinds of things we’re going to unify.
There are a few basic things we <em>need</em> to have.</p>
<ul>
<li>Variables, because otherwise unification is just equality comparison.</li>
<li>Terms with multiple subterms, because otherwise unification is just binding a single variable.</li>
<li>Other atomic “base” terms are good to have, because they let you write more interesting terms that are easily understandable.</li>
</ul>
<p>I’m going to write my examples in Python, so the types I’m going to work with are:</p>
<ul>
<li>Variables: A <code>Var</code> class defined for this purpose.</li>
<li>Composite terms: Tuples of terms.</li>
<li>Atomic terms: Any hashable Python value that can be compared for equality (hashable because terms get looked up in a dictionary).</li>
</ul>
<p>The Scheme implementation in the MicroKanren paper uses variables, pairs, and arbitrary objects.
The Rust version I wrote just has variables, pairs, and integers.
Exactly what you want to work with is completely up to you: hopefully at the end of this, you’ll be able to figure out how to do it for some custom type.</p>
<p>My <code>Var</code> type is very simple, with only one custom method to make the printing more compact:</p>
<pre class="language-python"><code class="language-python"><span class="token keyword">from</span> dataclasses <span class="token keyword">import</span> dataclass<br><br><span class="token decorator annotation punctuation">@dataclass</span><span class="token punctuation">(</span>frozen<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">)</span><br><span class="token keyword">class</span> <span class="token class-name">Var</span><span class="token punctuation">:</span><br> name<span class="token punctuation">:</span> <span class="token builtin">str</span><br><br> <span class="token keyword">def</span> <span class="token function">__repr__</span><span class="token punctuation">(</span>self<span class="token punctuation">)</span><span class="token punctuation">:</span><br> <span class="token keyword">return</span> self<span class="token punctuation">.</span>name</code></pre>
<p>And everything else is base Python types, so we’re done with this part.</p>
<h3 id="unifying-terms"><a class="header-anchor c-sun dec-none" href="#unifying-terms">¶</a> Unifying Terms</h3>
<p>Let’s work through unifying things.
We want to unify two terms, and return a substitution that makes them equal, which I’ll call the “environment.”
If they can’t possibly be equal, we’ll return <code>None</code>.
Let’s start out simple: two things can be unified if they’re already equal.</p>
<pre class="language-python"><code class="language-python"><span class="token keyword">def</span> <span class="token function">unify</span><span class="token punctuation">(</span>a<span class="token punctuation">,</span> b<span class="token punctuation">)</span><span class="token punctuation">:</span><br> <span class="token keyword">if</span> a <span class="token operator">==</span> b<span class="token punctuation">:</span><br> <span class="token keyword">return</span> <span class="token punctuation">{</span><span class="token punctuation">}</span><br> <span class="token comment"># ... other cases ...</span><br> <span class="token keyword">return</span> <span class="token boolean">None</span></code></pre>
<p>Here we return an empty environment because they’re already equal, and nothing needs to be substituted.</p>
<p>The variable cases can also be solved very easily.
When unifying a variable with any other term, we can just bind the variable to that term.
When unifying <code>x = 1</code>, we get back <code>{ x: 1 }</code>.</p>
<pre class="language-python"><code class="language-python"> <span class="token keyword">if</span> <span class="token builtin">isinstance</span><span class="token punctuation">(</span>a<span class="token punctuation">,</span> Var<span class="token punctuation">)</span><span class="token punctuation">:</span><br> <span class="token keyword">return</span> <span class="token punctuation">{</span>a<span class="token punctuation">:</span> b<span class="token punctuation">}</span><br> <span class="token keyword">if</span> <span class="token builtin">isinstance</span><span class="token punctuation">(</span>b<span class="token punctuation">,</span> Var<span class="token punctuation">)</span><span class="token punctuation">:</span><br> <span class="token keyword">return</span> <span class="token punctuation">{</span>b<span class="token punctuation">:</span> a<span class="token punctuation">}</span></code></pre>
<p>Finally, let’s handle tuples.
For this, we want to unify each individual component of the tuple.
We want the union of each of these unifications, so we’ll update the dictionary each time.
We have to check whether each sub-unification returned <code>None</code> and pass that along.</p>
<pre class="language-python"><code class="language-python"> <span class="token keyword">if</span> <span class="token builtin">isinstance</span><span class="token punctuation">(</span>a<span class="token punctuation">,</span> <span class="token builtin">tuple</span><span class="token punctuation">)</span> <span class="token keyword">and</span> <span class="token builtin">isinstance</span><span class="token punctuation">(</span>b<span class="token punctuation">,</span> <span class="token builtin">tuple</span><span class="token punctuation">)</span><span class="token punctuation">:</span><br> <span class="token keyword">if</span> <span class="token builtin">len</span><span class="token punctuation">(</span>a<span class="token punctuation">)</span> <span class="token operator">!=</span> <span class="token builtin">len</span><span class="token punctuation">(</span>b<span class="token punctuation">)</span><span class="token punctuation">:</span><br> <span class="token keyword">return</span> <span class="token boolean">None</span><br> env <span class="token operator">=</span> <span class="token punctuation">{</span><span class="token punctuation">}</span><br> <span class="token keyword">for</span> <span class="token punctuation">(</span>ax<span class="token punctuation">,</span> bx<span class="token punctuation">)</span> <span class="token keyword">in</span> <span class="token builtin">zip</span><span class="token punctuation">(</span>a<span class="token punctuation">,</span> b<span class="token punctuation">)</span><span class="token punctuation">:</span><br> res <span class="token operator">=</span> unify<span class="token punctuation">(</span>ax<span class="token punctuation">,</span> bx<span class="token punctuation">)</span><br> <span class="token keyword">if</span> res <span class="token keyword">is</span> <span class="token boolean">None</span><span class="token punctuation">:</span><br> <span class="token keyword">return</span> <span class="token 
boolean">None</span><br> env<span class="token punctuation">.</span>update<span class="token punctuation">(</span>res<span class="token punctuation">)</span><br> <span class="token keyword">return</span> env</code></pre>
<p>Now you might see where this could go wrong, and we’ll get to that in a moment with an example.
Repeating that function all in one piece:</p>
<pre class="language-python"><code class="language-python"><span class="token keyword">def</span> <span class="token function">unify</span><span class="token punctuation">(</span>a<span class="token punctuation">,</span> b<span class="token punctuation">)</span><span class="token punctuation">:</span><br> <span class="token keyword">if</span> a <span class="token operator">==</span> b<span class="token punctuation">:</span><br> <span class="token keyword">return</span> <span class="token punctuation">{</span><span class="token punctuation">}</span><br> <span class="token keyword">if</span> <span class="token builtin">isinstance</span><span class="token punctuation">(</span>a<span class="token punctuation">,</span> Var<span class="token punctuation">)</span><span class="token punctuation">:</span><br> <span class="token keyword">return</span> <span class="token punctuation">{</span>a<span class="token punctuation">:</span> b<span class="token punctuation">}</span><br> <span class="token keyword">if</span> <span class="token builtin">isinstance</span><span class="token punctuation">(</span>b<span class="token punctuation">,</span> Var<span class="token punctuation">)</span><span class="token punctuation">:</span><br> <span class="token keyword">return</span> <span class="token punctuation">{</span>b<span class="token punctuation">:</span> a<span class="token punctuation">}</span><br> <span class="token keyword">if</span> <span class="token builtin">isinstance</span><span class="token punctuation">(</span>a<span class="token punctuation">,</span> <span class="token builtin">tuple</span><span class="token punctuation">)</span> <span class="token keyword">and</span> <span class="token builtin">isinstance</span><span class="token punctuation">(</span>b<span class="token punctuation">,</span> <span class="token builtin">tuple</span><span class="token punctuation">)</span><span class="token punctuation">:</span><br> <span class="token keyword">if</span> <span class="token 
builtin">len</span><span class="token punctuation">(</span>a<span class="token punctuation">)</span> <span class="token operator">!=</span> <span class="token builtin">len</span><span class="token punctuation">(</span>b<span class="token punctuation">)</span><span class="token punctuation">:</span><br> <span class="token keyword">return</span> <span class="token boolean">None</span><br> env <span class="token operator">=</span> <span class="token punctuation">{</span><span class="token punctuation">}</span><br> <span class="token keyword">for</span> <span class="token punctuation">(</span>ax<span class="token punctuation">,</span> bx<span class="token punctuation">)</span> <span class="token keyword">in</span> <span class="token builtin">zip</span><span class="token punctuation">(</span>a<span class="token punctuation">,</span> b<span class="token punctuation">)</span><span class="token punctuation">:</span><br> res <span class="token operator">=</span> unify<span class="token punctuation">(</span>ax<span class="token punctuation">,</span> bx<span class="token punctuation">)</span><br> <span class="token keyword">if</span> res <span class="token keyword">is</span> <span class="token boolean">None</span><span class="token punctuation">:</span><br> <span class="token keyword">return</span> <span class="token boolean">None</span><br> env<span class="token punctuation">.</span>update<span class="token punctuation">(</span>res<span class="token punctuation">)</span><br> <span class="token keyword">return</span> env<br> <span class="token keyword">return</span> <span class="token boolean">None</span></code></pre>
<p>Now we can unify stuff:</p>
<pre class="language-python"><code class="language-python"><span class="token operator">>></span><span class="token operator">></span> x<span class="token punctuation">,</span> y <span class="token operator">=</span> Var<span class="token punctuation">(</span><span class="token string">"x"</span><span class="token punctuation">)</span><span class="token punctuation">,</span> Var<span class="token punctuation">(</span><span class="token string">"y"</span><span class="token punctuation">)</span><br><span class="token operator">>></span><span class="token operator">></span> unify<span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">)</span><br><span class="token punctuation">{</span><span class="token punctuation">}</span><br><span class="token operator">>></span><span class="token operator">></span> unify<span class="token punctuation">(</span>x<span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">)</span><br><span class="token punctuation">{</span> x<span class="token punctuation">:</span> <span class="token number">1</span> <span class="token punctuation">}</span><br><span class="token operator">>></span><span class="token operator">></span> unify<span class="token punctuation">(</span><span class="token punctuation">(</span>x<span class="token punctuation">,</span> <span class="token number">2</span><span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">2</span><span class="token punctuation">)</span><span class="token punctuation">)</span><br><span class="token punctuation">{</span> x<span class="token punctuation">:</span> <span class="token number">1</span> <span class="token punctuation">}</span><br><span 
class="token operator">>></span><span class="token operator">></span> unify<span class="token punctuation">(</span><span class="token punctuation">(</span>x<span class="token punctuation">,</span> y<span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">2</span><span class="token punctuation">)</span><span class="token punctuation">)</span><br><span class="token punctuation">{</span> x<span class="token punctuation">:</span> <span class="token number">1</span><span class="token punctuation">,</span> y<span class="token punctuation">:</span> <span class="token number">2</span> <span class="token punctuation">}</span><br><span class="token operator">>></span><span class="token operator">></span> unify<span class="token punctuation">(</span><span class="token punctuation">(</span>x<span class="token punctuation">,</span> x<span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">2</span><span class="token punctuation">)</span><span class="token punctuation">)</span><br><span class="token punctuation">{</span> x<span class="token punctuation">:</span> <span class="token number">2</span> <span class="token punctuation">}</span></code></pre>
<p>Oops!
That last one is wrong!
So what can we do to fix that?</p>
<p>Well, we need to take context into account.
Let’s change our <code>unify</code> function to accept an environment to work inside.
I’m going to make this a pure function, so instead of updating the environment in place, we just make a new version of it.
This changes a few things.
We include <code>env</code> in all of our outputs, instead of starting from nothing.
We’ll also pass the environment along to each recursive call, so every component of a tuple is unified in the context of the bindings found so far.</p>
<pre class="language-python"><code class="language-python"><span class="token keyword">def</span> <span class="token function">unify</span><span class="token punctuation">(</span>a<span class="token punctuation">,</span> b<span class="token punctuation">,</span> env<span class="token operator">=</span><span class="token punctuation">{</span><span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">:</span><br> <span class="token keyword">if</span> a <span class="token operator">==</span> b<span class="token punctuation">:</span><br> <span class="token keyword">return</span> env<br> <span class="token keyword">if</span> <span class="token builtin">isinstance</span><span class="token punctuation">(</span>a<span class="token punctuation">,</span> Var<span class="token punctuation">)</span><span class="token punctuation">:</span><br> <span class="token keyword">return</span> <span class="token punctuation">{</span><span class="token operator">**</span>env<span class="token punctuation">,</span> a<span class="token punctuation">:</span> b<span class="token punctuation">}</span><br> <span class="token keyword">if</span> <span class="token builtin">isinstance</span><span class="token punctuation">(</span>b<span class="token punctuation">,</span> Var<span class="token punctuation">)</span><span class="token punctuation">:</span><br> <span class="token keyword">return</span> <span class="token punctuation">{</span><span class="token operator">**</span>env<span class="token punctuation">,</span> b<span class="token punctuation">:</span> a<span class="token punctuation">}</span><br> <span class="token keyword">if</span> <span class="token builtin">isinstance</span><span class="token punctuation">(</span>a<span class="token punctuation">,</span> <span class="token builtin">tuple</span><span class="token punctuation">)</span> <span class="token keyword">and</span> <span class="token builtin">isinstance</span><span 
class="token punctuation">(</span>b<span class="token punctuation">,</span> <span class="token builtin">tuple</span><span class="token punctuation">)</span><span class="token punctuation">:</span><br> <span class="token keyword">if</span> <span class="token builtin">len</span><span class="token punctuation">(</span>a<span class="token punctuation">)</span> <span class="token operator">!=</span> <span class="token builtin">len</span><span class="token punctuation">(</span>b<span class="token punctuation">)</span><span class="token punctuation">:</span><br> <span class="token keyword">return</span> <span class="token boolean">None</span><br> <span class="token keyword">for</span> <span class="token punctuation">(</span>ax<span class="token punctuation">,</span> bx<span class="token punctuation">)</span> <span class="token keyword">in</span> <span class="token builtin">zip</span><span class="token punctuation">(</span>a<span class="token punctuation">,</span> b<span class="token punctuation">)</span><span class="token punctuation">:</span><br> env <span class="token operator">=</span> unify<span class="token punctuation">(</span>ax<span class="token punctuation">,</span> bx<span class="token punctuation">,</span> env<span class="token punctuation">)</span><br> <span class="token keyword">if</span> env <span class="token keyword">is</span> <span class="token boolean">None</span><span class="token punctuation">:</span><br> <span class="token keyword">return</span> <span class="token boolean">None</span><br> <span class="token keyword">return</span> env<br> <span class="token keyword">return</span> <span class="token boolean">None</span></code></pre>
<p>This still doesn’t do the right thing, though, because we never actually look anything up in the environment.
So let’s write a function to do the lookup.
We’ll call it on every term, and if the term isn’t a key in the environment, it comes back unchanged.</p>
<pre class="language-python"><code class="language-python"><span class="token keyword">def</span> <span class="token function">walk</span><span class="token punctuation">(</span>env<span class="token punctuation">,</span> term<span class="token punctuation">)</span><span class="token punctuation">:</span><br> <span class="token keyword">if</span> term <span class="token keyword">in</span> env<span class="token punctuation">:</span><br> term <span class="token operator">=</span> env<span class="token punctuation">[</span>term<span class="token punctuation">]</span><br> <span class="token keyword">return</span> term</code></pre>
<p>Now we can add these two lines at the beginning of <code>unify</code>.</p>
<pre class="language-python"><code class="language-python"> a <span class="token operator">=</span> walk<span class="token punctuation">(</span>env<span class="token punctuation">,</span> a<span class="token punctuation">)</span><br> b <span class="token operator">=</span> walk<span class="token punctuation">(</span>env<span class="token punctuation">,</span> b<span class="token punctuation">)</span></code></pre>
<p>With these modifications, we correctly handle the case we were stuck on earlier.</p>
<pre class="language-python"><code class="language-python"><span class="token operator">>></span><span class="token operator">></span> unify<span class="token punctuation">(</span><span class="token punctuation">(</span>x<span class="token punctuation">,</span> x<span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">2</span><span class="token punctuation">)</span><span class="token punctuation">)</span><br><span class="token operator">>></span><span class="token operator">></span> <span class="token comment"># look ma no solution!</span></code></pre>
<p>It’s still not perfect though.
It doesn’t properly reject this case where the conflict is two steps away:</p>
<pre class="language-python"><code class="language-python"><span class="token operator">>></span><span class="token operator">></span> unify<span class="token punctuation">(</span><span class="token punctuation">(</span>x<span class="token punctuation">,</span> y<span class="token punctuation">,</span> x<span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token punctuation">(</span>y<span class="token punctuation">,</span> <span class="token number">8</span><span class="token punctuation">,</span> <span class="token number">9</span><span class="token punctuation">)</span><span class="token punctuation">)</span><br><span class="token punctuation">{</span> x<span class="token punctuation">:</span> y<span class="token punctuation">,</span> y<span class="token punctuation">:</span> <span class="token number">9</span> <span class="token punctuation">}</span></code></pre>
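<p>Here the problem is that we have two facts around: that <code>x = y</code>, and that <code>y = 8</code>.
Even so, when it gets to the assertion <code>x = 9</code>, it only follows <code>x</code> one step to <code>y</code>, so it doesn’t see that <code>x = 8</code> and just overwrites the binding for <code>y</code> with <code>9</code>.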
Luckily this is another simple fix.
We just keep walking through the environment as long as we have a binding.
Just change the <code>if</code> to a <code>while</code>!</p>
<pre class="language-python"><code class="language-python"><span class="token keyword">def</span> <span class="token function">walk</span><span class="token punctuation">(</span>env<span class="token punctuation">,</span> term<span class="token punctuation">)</span><span class="token punctuation">:</span><br> <span class="token keyword">while</span> term <span class="token keyword">in</span> env<span class="token punctuation">:</span><br> term <span class="token operator">=</span> env<span class="token punctuation">[</span>term<span class="token punctuation">]</span><br> <span class="token keyword">return</span> term</code></pre>
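<p>To see the difference concretely, here is a small comparison of the one-step lookup and the looping one, using the bindings from the failing example above. It’s an illustration rather than part of the unifier; <code>walk_once</code> is just a name made up here for the earlier <code>if</code> version.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str

    def __repr__(self):
        return self.name


def walk_once(env, term):
    # The earlier version: follows at most one binding.
    if term in env:
        term = env[term]
    return term


def walk(env, term):
    # The fixed version: keeps following bindings until it reaches
    # an unbound variable or a non-variable term.
    while term in env:
        term = env[term]
    return term


x, y = Var("x"), Var("y")
# The bindings collected from the first two pairs of
# unify((x, y, x), (y, 8, 9)).
env = {x: y, y: 8}

print(walk_once(env, x))  # y  (stops one step short)
print(walk(env, x))       # 8
```

<p>With the looping version, the last pair becomes a comparison of <code>8</code> against <code>9</code>, which matches none of the cases, so the whole unification returns <code>None</code>.</p>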
<p>And now this unifier should be able to handle any <a href="#postscript-unifying-infinity">(finite)</a> hard cases you throw at it!</p>
<h3 id="the-whole-unifier"><a class="header-anchor c-sun dec-none" href="#the-whole-unifier">¶</a> The Whole Unifier</h3>
<p>Here’s the whole program we wrote, in one piece and in just over 30 lines.</p>
<pre class="language-python"><code class="language-python"><span class="token keyword">from</span> dataclasses <span class="token keyword">import</span> dataclass<br><br><span class="token decorator annotation punctuation">@dataclass</span><span class="token punctuation">(</span>frozen<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">)</span><br><span class="token keyword">class</span> <span class="token class-name">Var</span><span class="token punctuation">:</span><br>    name<span class="token punctuation">:</span> <span class="token builtin">str</span><br>    <span class="token keyword">def</span> <span class="token function">__repr__</span><span class="token punctuation">(</span>self<span class="token punctuation">)</span><span class="token punctuation">:</span><br>        <span class="token keyword">return</span> self<span class="token punctuation">.</span>name<br><br><span class="token keyword">def</span> <span class="token function">walk</span><span class="token punctuation">(</span>env<span class="token punctuation">,</span> term<span class="token punctuation">)</span><span class="token punctuation">:</span><br>    <span class="token keyword">while</span> term <span class="token keyword">in</span> env<span class="token punctuation">:</span><br>        term <span class="token operator">=</span> env<span class="token punctuation">[</span>term<span class="token punctuation">]</span><br>    <span class="token keyword">return</span> term<br><br><span class="token keyword">def</span> <span class="token function">unify</span><span class="token punctuation">(</span>a<span class="token punctuation">,</span> b<span class="token punctuation">,</span> env<span class="token operator">=</span><span class="token punctuation">{</span><span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">:</span><br>    a <span class="token operator">=</span> walk<span class="token punctuation">(</span>env<span class="token punctuation">,</span> a<span class="token punctuation">)</span><br>    b <span class="token operator">=</span> walk<span class="token punctuation">(</span>env<span class="token punctuation">,</span> b<span class="token punctuation">)</span><br>    <span class="token keyword">if</span> a <span class="token operator">==</span> b<span class="token punctuation">:</span><br>        <span class="token keyword">return</span> env<br>    <span class="token keyword">if</span> <span class="token builtin">isinstance</span><span class="token punctuation">(</span>a<span class="token punctuation">,</span> Var<span class="token punctuation">)</span><span class="token punctuation">:</span><br>        <span class="token keyword">return</span> <span class="token punctuation">{</span> <span class="token operator">**</span>env<span class="token punctuation">,</span> a<span class="token punctuation">:</span> b <span class="token punctuation">}</span><br>    <span class="token keyword">if</span> <span class="token builtin">isinstance</span><span class="token punctuation">(</span>b<span class="token punctuation">,</span> Var<span class="token punctuation">)</span><span class="token punctuation">:</span><br>        <span class="token keyword">return</span> <span class="token punctuation">{</span> <span class="token operator">**</span>env<span class="token punctuation">,</span> b<span class="token punctuation">:</span> a <span class="token punctuation">}</span><br>    <span class="token keyword">if</span> <span class="token builtin">isinstance</span><span class="token punctuation">(</span>a<span class="token punctuation">,</span> <span class="token builtin">tuple</span><span class="token punctuation">)</span> <span class="token keyword">and</span> <span class="token builtin">isinstance</span><span class="token punctuation">(</span>b<span class="token punctuation">,</span> <span class="token builtin">tuple</span><span class="token punctuation">)</span><span class="token punctuation">:</span><br>        <span class="token keyword">if</span> <span class="token builtin">len</span><span class="token punctuation">(</span>a<span class="token punctuation">)</span> <span class="token operator">!=</span> <span class="token builtin">len</span><span class="token punctuation">(</span>b<span class="token punctuation">)</span><span class="token punctuation">:</span><br>            <span class="token keyword">return</span> <span class="token boolean">None</span><br>        <span class="token keyword">for</span> <span class="token punctuation">(</span>a<span class="token punctuation">,</span> b<span class="token punctuation">)</span> <span class="token keyword">in</span> <span class="token builtin">zip</span><span class="token punctuation">(</span>a<span class="token punctuation">,</span> b<span class="token punctuation">)</span><span class="token punctuation">:</span><br>            env <span class="token operator">=</span> unify<span class="token punctuation">(</span>a<span class="token punctuation">,</span> b<span class="token punctuation">,</span> env<span class="token punctuation">)</span><br>            <span class="token keyword">if</span> env <span class="token keyword">is</span> <span class="token boolean">None</span><span class="token punctuation">:</span><br>                <span class="token keyword">return</span> <span class="token boolean">None</span><br>        <span class="token keyword">return</span> env<br>    <span class="token keyword">return</span> <span class="token boolean">None</span></code></pre>
<p>Consider writing your own in your language of choice, or adding support for new datatypes.
It would be nice to be able to unify namedtuples and dictionaries and things, or do unification in the web browser.
Try it out for yourself, because next time I’ll be talking about how you can go from here and use unification to build logic programs.</p>
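<p>As one example of adding a datatype, here’s a minimal sketch that extends <code>unify()</code> to dictionaries. The matching rule is an assumption of this sketch, not the only reasonable choice: two dicts unify only when they have exactly the same keys (compared with plain <code>==</code>, not unified), and then their values are unified pairwise. <code>walk()</code> changes too: dicts aren’t hashable, so the original <code>term in env</code> lookup would raise a <code>TypeError</code> on any term containing one, and this version only looks up <code>Var</code>s.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str

    def __repr__(self):
        return self.name


def walk(env, term):
    # Only variables can be keys of env; checking the type first also avoids
    # hashing unhashable terms such as dicts (or tuples that contain dicts).
    while isinstance(term, Var) and term in env:
        term = env[term]
    return term


def unify(a, b, env={}):
    # env is never mutated (we always build new dicts), so the shared default is safe.
    a = walk(env, a)
    b = walk(env, b)
    if a == b:
        return env
    if isinstance(a, Var):
        return {**env, a: b}
    if isinstance(b, Var):
        return {**env, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple):
        if len(a) != len(b):
            return None
        for x, y in zip(a, b):
            env = unify(x, y, env)
            if env is None:
                return None
        return env
    if isinstance(a, dict) and isinstance(b, dict):
        # Assumed rule for this sketch: key sets must match exactly,
        # then the values under each key are unified pairwise.
        if a.keys() != b.keys():
            return None
        for key in a:
            env = unify(a[key], b[key], env)
            if env is None:
                return None
        return env
    return None


x, y = Var("x"), Var("y")
print(unify({"name": x, "age": 30}, {"name": "Ada", "age": y}))  # {x: 'Ada', y: 30}
print(unify({"a": 1}, {"b": 1}))  # None: the key sets differ
```

<p>A looser rule, such as letting one dict have extra keys, would be closer to record subtyping; which one you want depends on the logic programs you plan to write.</p>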
<h3 id="postscript-unifying-infinity"><a class="header-anchor c-sun dec-none" href="#postscript-unifying-infinity">¶</a> Postscript: Unifying Infinity</h3>
<p>What was that earlier?
About only the finite cases?
Well, it turns out this unifier is good enough for implementing well-behaved logic programs, but it leaves out one complexity: the “Occurs Check.”
This checks that a variable does not “occur” inside a term you unify it with, to avoid infinite solutions and non-termination.
Many Prolog implementations leave out the occurs check by default for efficiency, and we can leave it out for simplicity if we trust our programmers to write only finite terms.</p>
<p>But let’s look at the problem concretely.
We can write a unification like:</p>
<pre class="language-python"><code class="language-python"><span class="token operator">>></span><span class="token operator">></span> unify<span class="token punctuation">(</span>a<span class="token punctuation">,</span> <span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span> a<span class="token punctuation">)</span><span class="token punctuation">)</span><br><span class="token punctuation">{</span> a<span class="token punctuation">:</span> <span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span> a<span class="token punctuation">)</span> <span class="token punctuation">}</span></code></pre>
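<p>To make that substitution more tangible, here’s a small sketch that plugs <code>a</code>’s binding back into itself a fixed number of times. The <code>expand()</code> helper and its <code>depth</code> cutoff are hypothetical, not part of the unifier; the cutoff is only there because, without it, expanding <code>a</code> would never finish. Each extra level of <code>depth</code> adds one more <code>(1, …)</code> layer.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str

    def __repr__(self):
        return self.name


def walk(env, term):
    while term in env:
        term = env[term]
    return term


def expand(env, term, depth):
    """Hypothetical helper: substitute bindings into `term`, descending into
    tuples, but stop after `depth` levels so cyclic bindings still terminate."""
    if depth == 0:
        return term
    term = walk(env, term)
    if isinstance(term, tuple):
        return tuple(expand(env, t, depth - 1) for t in term)
    return term


a = Var("a")
env = {a: (1, a)}  # the substitution returned by unify(a, (1, a)) above
print(expand(env, a, 3))  # (1, (1, (1, a)))
```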
<p>If you think through this, you can see that <code>a = (1, (1, (1, (1, ...))))</code>, going on infinitely deep.
There are languages that work with logic on infinite terms like this, but if our unifier accepts them naively, we can end up with incorrect results or non-termination.
For instance, with proper support for infinite terms, the following should unify:</p>
<pre class="language-python"><code class="language-python"><span class="token operator">>></span><span class="token operator">></span> env <span class="token operator">=</span> unify<span class="token punctuation">(</span>a<span class="token punctuation">,</span> <span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span> a<span class="token punctuation">)</span><span class="token punctuation">)</span><br><span class="token operator">>></span><span class="token operator">></span> env <span class="token operator">=</span> unify<span class="token punctuation">(</span>b<span class="token punctuation">,</span> <span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span> b<span class="token punctuation">)</span><span class="token punctuation">,</span> env<span class="token punctuation">)</span><br><span class="token operator">>></span><span class="token operator">></span> env <span class="token operator">=</span> unify<span class="token punctuation">(</span>a<span class="token punctuation">,</span> b<span class="token punctuation">,</span> env<span class="token punctuation">)</span></code></pre>
<p>But instead, it gets stuck on the last step, descending forever (in Python, until it exceeds the recursion limit and raises a <code>RecursionError</code>).
There are two ways to solve this.
One is to embrace infinite terms and include fixpoints in your logic, giving you terms like <code>a = fix x. (1, x)</code> and <code>b = fix x. (1, x)</code>.
I don’t know how to do this, though.</p>
<p>The other way is to detect unifications that would lead to infinite terms, and reject them.
We take the code in <code>unify()</code> that adds variables to the substitution and add the occurs check there:</p>
<pre class="language-python"><code class="language-python">    <span class="token keyword">if</span> <span class="token builtin">isinstance</span><span class="token punctuation">(</span>a<span class="token punctuation">,</span> Var<span class="token punctuation">)</span><span class="token punctuation">:</span><br>        <span class="token keyword">if</span> occurs<span class="token punctuation">(</span>a<span class="token punctuation">,</span> b<span class="token punctuation">,</span> env<span class="token punctuation">)</span><span class="token punctuation">:</span> <span class="token keyword">return</span> <span class="token boolean">None</span><br>        <span class="token keyword">return</span> <span class="token punctuation">{</span> <span class="token operator">**</span>env<span class="token punctuation">,</span> a<span class="token punctuation">:</span> b <span class="token punctuation">}</span><br>    <span class="token keyword">if</span> <span class="token builtin">isinstance</span><span class="token punctuation">(</span>b<span class="token punctuation">,</span> Var<span class="token punctuation">)</span><span class="token punctuation">:</span><br>        <span class="token keyword">if</span> occurs<span class="token punctuation">(</span>b<span class="token punctuation">,</span> a<span class="token punctuation">,</span> env<span class="token punctuation">)</span><span class="token punctuation">:</span> <span class="token keyword">return</span> <span class="token boolean">None</span><br>        <span class="token keyword">return</span> <span class="token punctuation">{</span> <span class="token operator">**</span>env<span class="token punctuation">,</span> b<span class="token punctuation">:</span> a <span class="token punctuation">}</span></code></pre>
<p>If we’re doing the occurs check here, then we know two things:</p>
<ul>
<li><code>a</code> is a variable that’s already been <code>walk()</code>-ed.</li>
<li><code>a</code> and <code>b</code> aren’t equal to each other, because the <code>a == b</code> check earlier in <code>unify()</code> already ruled that out.</li>
</ul>
<p>This means we only need to check whether the variable <code>a</code> itself appears in <code>b</code>, walking each subterm of <code>b</code> as we go.
If a variable in <code>b</code> is a synonym of <code>a</code>, walking it will end at <code>a</code>, because <code>a</code> is itself the end of a walk.
While it’s safe to assume that <code>a</code> has already been walked, we can’t assume the same about the subterms of <code>b</code> that we recurse into, so <code>occurs()</code> has to walk each one.
The implementation looks like this:</p>
<pre class="language-python"><code class="language-python"><span class="token keyword">def</span> <span class="token function">occurs</span><span class="token punctuation">(</span>a<span class="token punctuation">,</span> b<span class="token punctuation">,</span> env<span class="token operator">=</span><span class="token punctuation">{</span><span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">:</span><br>    b <span class="token operator">=</span> walk<span class="token punctuation">(</span>env<span class="token punctuation">,</span> b<span class="token punctuation">)</span><br>    <span class="token keyword">if</span> a <span class="token operator">==</span> b<span class="token punctuation">:</span><br>        <span class="token keyword">return</span> <span class="token boolean">True</span><br>    <span class="token keyword">if</span> <span class="token builtin">isinstance</span><span class="token punctuation">(</span>b<span class="token punctuation">,</span> <span class="token builtin">tuple</span><span class="token punctuation">)</span><span class="token punctuation">:</span><br>        <span class="token comment"># pass env along so each subterm gets walked too</span><br>        <span class="token keyword">return</span> <span class="token builtin">any</span><span class="token punctuation">(</span>occurs<span class="token punctuation">(</span>a<span class="token punctuation">,</span> x<span class="token punctuation">,</span> env<span class="token punctuation">)</span> <span class="token keyword">for</span> x <span class="token keyword">in</span> b<span class="token punctuation">)</span><br>    <span class="token keyword">return</span> <span class="token boolean">False</span></code></pre>
<p>Now if we try…</p>
<pre class="language-python"><code class="language-python"><span class="token operator">>></span><span class="token operator">></span> unify<span class="token punctuation">(</span>a<span class="token punctuation">,</span> <span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span> a<span class="token punctuation">)</span><span class="token punctuation">)</span><br><span class="token operator">>></span><span class="token operator">></span></code></pre>
<p>This correctly fails to unify: <code>unify()</code> returns <code>None</code>, so the REPL prints nothing.
Now we’ve got sound unification over finite terms.
This cleans the situation up, but I’m still a bit sad to see infinite terms go…</p>
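<p>Putting the pieces together, here’s a sketch of the full unifier with the occurs check folded in, repeated in full so it runs on its own (the loop variables are renamed to <code>x, y</code> so they don’t shadow <code>a</code> and <code>b</code>). The second example is the case that depends on <code>occurs()</code> walking subterms with <code>env</code>: <code>a</code> never appears directly in <code>(1, b)</code>, only through <code>b</code>’s binding to <code>(2, a)</code>.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str

    def __repr__(self):
        return self.name


def walk(env, term):
    # Follow bindings until we reach an unbound variable or a non-variable term.
    while term in env:
        term = env[term]
    return term


def occurs(a, b, env={}):
    # Does variable `a` appear anywhere inside `b`, following bindings in env?
    b = walk(env, b)
    if a == b:
        return True
    if isinstance(b, tuple):
        # Pass env along so each subterm is walked as well.
        return any(occurs(a, x, env) for x in b)
    return False


def unify(a, b, env={}):
    # env is never mutated (we always build new dicts), so the shared default is safe.
    a = walk(env, a)
    b = walk(env, b)
    if a == b:
        return env
    if isinstance(a, Var):
        if occurs(a, b, env):
            return None  # binding would create an infinite term
        return {**env, a: b}
    if isinstance(b, Var):
        if occurs(b, a, env):
            return None
        return {**env, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple):
        if len(a) != len(b):
            return None
        for x, y in zip(a, b):
            env = unify(x, y, env)
            if env is None:
                return None
        return env
    return None


a, b = Var("a"), Var("b")
print(unify(a, (1, a)))          # None: a occurs directly in (1, a)
env = unify(b, (2, a))
print(unify(a, (1, b), env))     # None: a occurs in (1, b) through b's binding
print(unify((a, 2), (1, b)))     # {a: 1, b: 2}
```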
<hr>
<p><em>Edit November 26, 2020.</em>
Thanks to Onel Harrison for giving some clarity feedback and pointing out some typos.</p>