<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>beta BLOG dot NET - recently in algorithms category</title>
  <link rel="alternate" type="text/html" href="http://beta-blog.net/algorithms/" />
  <link rel="self" type="application/atom+xml" href="" />
  <id>tag:beta-blog.net,2009-08-27://1</id>
  <updated>2009-11-29T16:22:17Z</updated>
  
  <generator uri="http://www.sixapart.com/movabletype/">Movable Type 4.25</generator>

<entry>
  <title>a wordlist folding algorithm</title>
  <link rel="alternate" type="text/html" href="http://beta-blog.net/2009/11/a-wordlist-folding-algorithm" />
  <id>tag:beta-blog.net,2009://1.52384</id>

  <published>2009-11-28T23:33:32Z</published>
  <updated>2009-11-29T16:22:17Z</updated>

  <summary>Suppose you want to match a large wordlist against a huge chunk of text. As a small test case, let
for, far, bar, foo, boofaz, boofar, boof, faz, foobaz, foobars, boofar
be your wordlist. The obvious approach is a single regular expression with one alternative per word.
But how would a regex engine actually carry out this matching?
</summary>
  <author>
    <name>Sebastian</name>
    <uri>http://beta-blog.net</uri>
  </author>
  
  <category term="Perl" scheme="http://www.sixapart.com/ns/types#category" />
  
  <category term="algorithms" scheme="http://www.sixapart.com/ns/types#category" />
  
  <category term="codes" label="codes" scheme="http://www.sixapart.com/ns/types#tag" />
  <category term="perl" label="Perl" scheme="http://www.sixapart.com/ns/types#tag" />
  <category term="regex" label="regex" scheme="http://www.sixapart.com/ns/types#tag" />
  
  <content type="html" xml:lang="en" xml:base="http://beta-blog.net/">
  <![CDATA[<p>
Suppose you want to match a large wordlist against a huge chunk of text.
As a small test case, let
</p>
<pre class="code">
for, far, bar, foo, boofaz, boofar, boof, faz, foobaz, foobars, boofar
</pre>
<p>
be your wordlist. The obvious approach is a single regular expression with one alternative per word:
</p>
<fieldset class="collapsible"><legend><a href="javascript:void(0)" id="collapsible_qgobsnhs_1">[-] hide code</a></legend><div class="collapsible-container"><pre class="code">
(1) /\b(for|far|bar|foo|boofaz|boofar|boof|faz|foobaz|foobars|boofar)\b/
</pre></div></fieldset><script type="text/javascript">/*<![CDATA[*/xLib.onLoad(function(){Blog.Collapsible.create('collapsible_qgobsnhs_1')})/*]]&gt;*/</script>
<p>
But how would a regex engine actually carry out this matching?
There are different options. Surely the very worst algorithm would be to
search the whole text for every word separately. That would be the same as
doing
</p>
<fieldset class="collapsible"><legend><a href="javascript:void(0)" id="collapsible_qgobsnhs_2">[-] hide code</a></legend><div class="collapsible-container"><pre class="code"><code class="perl"><a class="kwd" href="http://perldoc.perl.org/functions/foreach.html" target="_blank" rel="nofollow">foreach</a> <span class="op ld">(</span><span class="qlo qw"><span class="kwd">qw</span><span class="op">(</span><span class="istr"> for far bar foo boofaz boofar boof faz foobaz foobars boofar </span><span class="op">)</span></span><span class="op rd">)</span>
<span class="op ld">{</span>
  <a class="kwd" href="http://perldoc.perl.org/functions/return.html" target="_blank" rel="nofollow">return</a> <span class="istr">&quot;matching!&quot;</span> <span class="kwd">if</span> <span class="var">$<span class="symb">text</span></span> =~ <span class="symb">m</span>/\<span class="symb">b</span><span class="var">$_</span>\<span class="symb">b</span>/<span class="op stmt">;</span>
<span class="op rd">}</span>
<a class="kwd" href="http://perldoc.perl.org/functions/return.html" target="_blank" rel="nofollow">return</a> <span class="istr">&quot;not matching.&quot;</span><span class="op stmt">;</span></code></pre></div></fieldset><script type="text/javascript">/*<![CDATA[*/xLib.onLoad(function(){Blog.Collapsible.create('collapsible_qgobsnhs_2')})/*]]&gt;*/</script>
<p>
If you match <span class="math">m</span> words against a text consisting of <span class="math">n</span> letters,
this piece of coding horror has an estimated runtime of <span class="math">O(m*n)</span>.
</p>
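<p>
To experiment with it, the loop above can be wrapped into a subroutine. This is only a minimal sketch:
the name <span class="code">matchesNaively</span> and the sample texts are made up for illustration.
</p>
<pre class="code"><code class="perl">use strict;
use warnings;

## hypothetical helper: scans the whole text once per word, hence O(m*n)
sub matchesNaively {
  my ($text, @words) = @_;
  foreach my $word (@words) {
    ## \Q...\E quotes any regex metacharacters inside the word
    return 1 if $text =~ m/\b\Q$word\E\b/;
  }
  return 0;
}

my @words = qw( for far bar foo boofaz boofar boof faz foobaz foobars boofar );
foreach my $text ('a boof in a bar', ' foobar ') {
  print matchesNaively($text, @words) ? "matching!\n" : "not matching.\n";
}</code></pre>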

<p>
A better approach is to run through the text only once,
using a matching stack. Assuming <span class="code">&quot; foobar &quot;</span> appears somewhere in
the text, the stack trace might look as follows (read from bottom to top):
</p>
<fieldset class="collapsible"><legend><a href="javascript:void(0)" id="collapsible_qgobsnhs_3">[-] hide code</a></legend><div class="collapsible-container"><pre class="code">
[7] ' ' =&gt; nothing matches.
[6] 'r' =&gt; &quot;foobars&quot; might match.
[5] 'a' =&gt; &quot;foobaz&quot; or &quot;foobars&quot; might match.
[4] 'b' =&gt; &quot;foobaz&quot; or &quot;foobars&quot; might match.
[3] 'o' =&gt; &quot;foo&quot;, &quot;foobaz&quot;, or &quot;foobars&quot; might match.
[2] 'o' =&gt; &quot;for&quot;, &quot;foo&quot;, &quot;foobaz&quot;, or &quot;foobars&quot; might match.
[1] 'f' =&gt; &quot;for&quot;, &quot;far&quot;, &quot;foo&quot;, &quot;faz&quot;, &quot;foobaz&quot;, or &quot;foobars&quot; might match.
[0] ' ' =&gt; &quot;\b&quot; matches.
</pre></div></fieldset><script type="text/javascript">/*<![CDATA[*/xLib.onLoad(function(){Blog.Collapsible.create('collapsible_qgobsnhs_3')})/*]]&gt;*/</script>
<p>
But what if the wordlist gets large? It seems that we have to run through nearly
the whole list each time a character is pushed onto the stack, just to
find out whether the current stack contents can still match.
</p>
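<p>
The narrowing shown in the stack trace can be reproduced with a few lines of Perl. The following is a rough sketch,
not what a regex engine really does: the helper name <span class="code">traceCandidates</span> is an assumption,
and word boundaries are left out, so only the token between the blanks is passed in.
</p>
<pre class="code"><code class="perl">use strict;
use warnings;

## hypothetical helper: for each character of $token, collect the words
## that still agree with everything read so far
sub traceCandidates {
  my ($token, @words) = @_;
  my @alive = @words;
  my @trace;
  for my $i (0 .. length($token) - 1) {
    my $read = substr $token, 0, $i + 1;
    ## this grep is the costly part: one pass over the surviving words
    ## for every single character
    @alive = grep { substr($_, 0, $i + 1) eq $read } @alive;
    push @trace, [ substr($token, $i, 1), [ @alive ] ];
  }
  return @trace;
}

my @words = qw( for far bar foo boofaz boofar boof faz foobaz foobars );
for my $step (traceCandidates('foobar', @words)) {
  my ($char, $alive) = @$step;
  print "after '", $char, "': ", join(', ', @$alive), "\n";
}</code></pre>
<p>
A word is only a real hit if it is still in the list when its last letter has been read and the token ends there.
For <span class="code">foobar</span> the last survivor is <span class="code">foobars</span>, which is one letter too long, so nothing matches.
</p>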

<p>
Clearly, sorting the word list in advance would be a considerable optimization.
Moreover, instead of checking one item after another,
a really smart approach would be to walk down a search tree.
As a tree, the wordlist above looks like this:
</p>
<fieldset class="collapsible"><legend><a href="javascript:void(0)" id="collapsible_qgobsnhs_4">[-] hide code</a></legend><div class="collapsible-container"><pre class="code">
          _____________|_____________
          |                         |
          b                         f
    ______|______        ___________|___________
    |           |        |                     |
   oof          ar       a                     o
    |                 ___|___            ______|______
    a ?               |     |            |           |
 ___|___              r     z            o           r
 |     |                                 |
 r     z                                 ba ?
                                     ____|____
                                     |       |
                                     rs      z
</pre></div></fieldset><script type="text/javascript">/*<![CDATA[*/xLib.onLoad(function(){Blog.Collapsible.create('collapsible_qgobsnhs_4')})/*]]&gt;*/</script>
<p>
Here, the &quot;?&quot; denotes an optional node. Remember that in a reasonably balanced tree
the length of the path from the root down to a leaf is logarithmic in the number of nodes. Thus, loosely speaking,
we have improved the worst algorithm above to <span class="math">O(n*log(m))</span> at least.
</p>
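<p>
Walking down such a tree might be sketched as follows. This is an illustration under some assumptions, not the
data structure of any real regex engine: the tree is stored as nested hashes with one character per level,
the empty key marks the end of a word, and the names <span class="code">buildTrie</span> and
<span class="code">trieContains</span> are invented here.
</p>
<pre class="code"><code class="perl">use strict;
use warnings;

## build a prefix tree as nested hashes; the key '' marks the end of a word
sub buildTrie {
  my %root;
  for my $word (@_) {
    my $node = \%root;
    for my $char (split //, $word) {
      $node-&gt;{$char} ||= {};
      $node = $node-&gt;{$char};
    }
    $node-&gt;{''} = 1;
  }
  return \%root;
}

## walk down the tree, one step per character of $word
sub trieContains {
  my ($node, $word) = @_;
  for my $char (split //, $word) {
    $node = $node-&gt;{$char} or return 0;
  }
  return exists $node-&gt;{''} ? 1 : 0;
}

my $trie = buildTrie(qw( for far bar foo boofaz boofar boof faz foobaz foobars ));
for my $word (qw( boof boofa foobars foobar )) {
  print $word, ': ', (trieContains($trie, $word) ? 'yes' : 'no'), "\n";
}</code></pre>
<p>
With hashes, picking the next child is a single lookup, so the cost of a check depends on the length of the word
rather than on the size of the wordlist.
</p>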
<p>
Actually, I'm not sure whether regex engines apply optimizations like that
when compiling. I guess they do, so it might be needless to replace the regex <span class="code">(1)</span> above
by the optimized version, which implements the sorted tree of alternative and optional nodes:
</p>
<fieldset class="collapsible"><legend><a href="javascript:void(0)" id="collapsible_qgobsnhs_5">[-] hide code</a></legend><div class="collapsible-container"><pre class="code">
(2) /\b(b(?:ar|oof(?:a(?:r|z))?)|f(?:a(?:r|z)|o(?:o(?:ba(?:rs|z))?|r)))\b/
</pre></div></fieldset><script type="text/javascript">/*<![CDATA[*/xLib.onLoad(function(){Blog.Collapsible.create('collapsible_qgobsnhs_5')})/*]]&gt;*/</script>
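<p>
Whether <span class="code">(1)</span> and <span class="code">(2)</span> really find the same words can at least be
spot-checked. The sketch below compares the first hit of both expressions; the helper
<span class="code">firstHit</span> and the sample texts are made up for this check, and a handful of samples is of course no proof.
</p>
<pre class="code"><code class="perl">use strict;
use warnings;

my $flat   = qr/\b(for|far|bar|foo|boofaz|boofar|boof|faz|foobaz|foobars|boofar)\b/;
my $folded = qr/\b(b(?:ar|oof(?:a(?:r|z))?)|f(?:a(?:r|z)|o(?:o(?:ba(?:rs|z))?|r)))\b/;

## returns the first word found in $text, or '' if nothing matches
sub firstHit {
  my ($regex, $text) = @_;
  return $text =~ $regex ? $1 : '';
}

## sample texts: the list words, some near misses, and a sentence
my @samples = (
  qw( for far bar foo boofaz boofar boof faz foobaz foobars ),
  qw( fo boofa fooba foobar foobarz bars xfoo ),
  'a boof and a foobaz walk into a bar',
);

## grep in scalar context counts the samples where the results differ
my $differences = grep { firstHit($flat, $_) ne firstHit($folded, $_) } @samples;
print $differences ? "regexes differ\n" : "same result for all samples\n";</code></pre>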
<p>
Nevertheless, I couldn't resist writing a little Perl routine that folds a wordlist into an
optimized regex. Here it is:
</p>
<fieldset class="collapsible"><legend><a href="javascript:void(0)" id="collapsible_qgobsnhs_6">[-] hide code</a></legend><div class="collapsible-container"><pre class="code"><code class="perl"><a class="kwd" href="http://perldoc.perl.org/functions/sub.html" target="_blank" rel="nofollow">sub</a> <span class="symb">foldWordsToRegex</span> <span class="op ld">{</span>

  <a class="kwd" href="http://perldoc.perl.org/functions/local.html" target="_blank" rel="nofollow">local</a> *<span class="symb">toString</span> = <a class="kwd" href="http://perldoc.perl.org/functions/sub.html" target="_blank" rel="nofollow">sub</a> <span class="op ld">{</span>
    <span class="cmnt">## node: [ prefix, [ nodes ], opt ]</span>

    <a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="op ld">(</span><span class="var">$<span class="symb">prefix</span></span>, <span class="var">$<span class="symb">nodes</span></span>, <span class="var">$<span class="symb">opt</span></span><span class="op rd">)</span> = <span class="var">$<span class="op ld">{</span><span class="var">$_<span class="op ld">[</span>0<span class="op rd">]</span></span><span class="op rd">}</span></span><span class="op stmt">;</span>
    <a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="var">$<span class="symb">rv</span></span> = <a class="kwd" href="http://perldoc.perl.org/functions/quotemeta.html" target="_blank" rel="nofollow">quotemeta</a> <span class="var">$<span class="symb">prefix</span></span><span class="op stmt">;</span>
    <span class="kwd">if</span> <span class="op ld">(</span> <a class="kwd" href="http://perldoc.perl.org/functions/ref.html" target="_blank" rel="nofollow">ref</a> <span class="var">$<span class="symb">nodes</span></span> <span class="symb">eq</span> <span class="qlo q"><span class="kwd">q</span><span class="op">|</span><span class="str">ARRAY</span><span class="op">|</span></span> &amp;&amp; <span class="var">@$<span class="symb">nodes</span></span> <span class="op rd">)</span>
    <span class="op ld">{</span>
      <span class="var">$<span class="symb">rv</span></span> .= <span class="str">&#039;(?:&#039;</span>.<span class="op ld">(</span><a class="kwd" href="http://perldoc.perl.org/functions/join.html" target="_blank" rel="nofollow">join</a> <span class="str">&#039;|&#039;</span>, <a class="kwd" href="http://perldoc.perl.org/functions/map.html" target="_blank" rel="nofollow">map</a> <span class="op ld">{</span> <span class="symb">toString</span><span class="op ld">(</span><span class="var">$_</span><span class="op rd">)</span> <span class="op rd">}</span> <span class="var">@$<span class="symb">nodes</span></span><span class="op rd">)</span>.<span class="str">&#039;)&#039;</span><span class="op stmt">;</span>
      <span class="var">$<span class="symb">rv</span></span> .= <span class="str">&#039;?&#039;</span> <span class="kwd">if</span> <span class="var">$<span class="symb">opt</span></span><span class="op stmt">;</span>
    <span class="op rd">}</span>
    <span class="var">$<span class="symb">rv</span></span><span class="op stmt">;</span>
  <span class="op rd">}</span><span class="op stmt">;</span>

  <a class="kwd" href="http://perldoc.perl.org/functions/local.html" target="_blank" rel="nofollow">local</a> *<span class="symb">fold</span> = <a class="kwd" href="http://perldoc.perl.org/functions/sub.html" target="_blank" rel="nofollow">sub</a> <span class="op ld">{</span>

    <span class="cmnt">## forward declaration, so that &quot;reduce [...]&quot; parses without parentheses</span>
    <a class="kwd" href="http://perldoc.perl.org/functions/sub.html" target="_blank" rel="nofollow">sub</a> <span class="symb">reduce</span><span class="op ld">(</span>$<span class="op rd">)</span><span class="op stmt">;</span>
    <a class="kwd" href="http://perldoc.perl.org/functions/local.html" target="_blank" rel="nofollow">local</a> *<span class="symb">reduce</span> = <a class="kwd" href="http://perldoc.perl.org/functions/sub.html" target="_blank" rel="nofollow">sub</a> <span class="op ld">{</span>
      <a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="op ld">(</span><span class="var">$<span class="symb">prefix</span></span>, <span class="var">$<span class="symb">nodes</span></span>, <span class="var">$<span class="symb">opt</span></span><span class="op rd">)</span> = <span class="var">$<span class="op ld">{</span><span class="var">$_<span class="op ld">[</span>0<span class="op rd">]</span></span><span class="op rd">}</span></span><span class="op stmt">;</span>

      <a class="kwd" href="http://perldoc.perl.org/functions/return.html" target="_blank" rel="nofollow">return</a> <span class="var">$_<span class="op ld">[</span>0<span class="op rd">]</span></span> <span class="kwd">unless</span> <a class="kwd" href="http://perldoc.perl.org/functions/ref.html" target="_blank" rel="nofollow">ref</a> <span class="var">$<span class="symb">nodes</span></span> <span class="symb">eq</span> <span class="qlo q"><span class="kwd">q</span><span class="op">|</span><span class="str">ARRAY</span><span class="op">|</span></span> &amp;&amp; <span class="var">@$<span class="symb">nodes</span></span> &gt; <span class="num">1</span><span class="op stmt">;</span>

      <span class="cmnt">## 1st char of the prefix of 1st node in list</span>
      <a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="op ld">(</span><span class="var">$<span class="symb">c</span></span>, <span class="var">$<span class="symb">qc</span></span><span class="op rd">)</span><span class="op stmt">;</span>

      <span class="cmnt">## check whether 2nd prefix starts with same letter as the 1st</span>
      <span class="kwd">if</span> <span class="op ld">(</span> <a class="kwd" href="http://perldoc.perl.org/functions/length.html" target="_blank" rel="nofollow">length</a> <span class="var">$<span class="symb">nodes</span><span class="op ptr">-&gt;</span><span class="op ld">[</span>0<span class="op rd">]</span><span class="op ld">[</span>0<span class="op rd">]</span></span> <span class="op rd">)</span>
      <span class="op ld">{</span>
        <span class="var">$<span class="symb">c</span></span> = <a class="kwd" href="http://perldoc.perl.org/functions/substr.html" target="_blank" rel="nofollow">substr</a> <span class="var">$<span class="symb">nodes</span><span class="op ptr">-&gt;</span><span class="op ld">[</span>0<span class="op rd">]</span><span class="op ld">[</span>0<span class="op rd">]</span></span>, <span class="num">0</span>, <span class="num">1</span><span class="op stmt">;</span>
        <span class="var">$<span class="symb">qc</span></span> = <a class="kwd" href="http://perldoc.perl.org/functions/quotemeta.html" target="_blank" rel="nofollow">quotemeta</a> <span class="var">$<span class="symb">c</span></span><span class="op stmt">;</span>
        <span class="var">$<span class="symb">nodes</span><span class="op ptr">-&gt;</span><span class="op ld">[</span>1<span class="op rd">]</span><span class="op ld">[</span>0<span class="op rd">]</span></span> =~ <span class="symb">m</span>/^<span class="var">$<span class="symb">qc</span></span>/ <span class="kwd">or</span> <a class="kwd" href="http://perldoc.perl.org/functions/undef.html" target="_blank" rel="nofollow">undef</a> <span class="var">$<span class="symb">c</span></span><span class="op stmt">;</span>
      <span class="op rd">}</span>

      <span class="kwd">unless</span> <span class="op ld">(</span> <a class="kwd" href="http://perldoc.perl.org/functions/defined.html" target="_blank" rel="nofollow">defined</a> <span class="var">$<span class="symb">c</span></span> <span class="op rd">)</span>
      <span class="op ld">{</span>
        <a class="kwd" href="http://perldoc.perl.org/functions/return.html" target="_blank" rel="nofollow">return</a> <span class="var">$_<span class="op ld">[</span>0<span class="op rd">]</span></span> <span class="kwd">unless</span> <span class="var">@$<span class="symb">nodes</span></span> &gt; <span class="num">2</span><span class="op stmt">;</span>

        <span class="cmnt">## try to reduce next list part</span>
        <a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="var">$<span class="symb">first</span></span> = <a class="kwd" href="http://perldoc.perl.org/functions/shift.html" target="_blank" rel="nofollow">shift</a> <span class="var">@$<span class="symb">nodes</span></span><span class="op stmt">;</span>
        <a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="var">$<span class="symb">next</span></span> = <span class="symb">reduce</span> <span class="op ld">[</span><span class="str">&#039;&#039;</span>, <span class="var">$<span class="symb">nodes</span></span>, <span class="num">0</span><span class="op rd">]</span><span class="op stmt">;</span>
        <a class="kwd" href="http://perldoc.perl.org/functions/return.html" target="_blank" rel="nofollow">return</a> <span class="op ld">[</span> <span class="var">$<span class="symb">prefix</span></span>, <span class="op ld">[</span> <span class="var">$<span class="symb">first</span></span>, <span class="var">$<span class="symb">next</span></span> <span class="op rd">]</span>, <span class="var">$<span class="symb">opt</span></span><span class="op rd">]</span> <span class="kwd">if</span> <a class="kwd" href="http://perldoc.perl.org/functions/length.html" target="_blank" rel="nofollow">length</a> <span class="var">$<span class="symb">next</span><span class="op ptr">-&gt;</span><span class="op ld">[</span>0<span class="op rd">]</span></span><span class="op stmt">;</span>

        <span class="cmnt">## couldn&#039;t be reduced</span>
        <a class="kwd" href="http://perldoc.perl.org/functions/return.html" target="_blank" rel="nofollow">return</a> <span class="op ld">[</span> <span class="var">$<span class="symb">prefix</span></span>, <span class="op ld">[</span> <span class="var">$<span class="symb">first</span></span>, <span class="var">$<span class="op ld">{</span><span class="var">$<span class="symb">next</span><span class="op ptr">-&gt;</span><span class="op ld">[</span>1<span class="op rd">]</span></span><span class="op rd">}</span></span> <span class="op rd">]</span>, <span class="var">$<span class="symb">opt</span></span> <span class="op rd">]</span><span class="op stmt">;</span>
      <span class="op rd">}</span>

      <span class="cmnt">## reduce any ensuing node whose prefix starts with $c</span>
      <a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="var">@<span class="symb">new</span></span><span class="op stmt">;</span>
      <a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="var">$<span class="symb">newopt</span></span> = <span class="num">0</span><span class="op stmt">;</span>
      <a class="kwd" href="http://perldoc.perl.org/functions/while.html" target="_blank" rel="nofollow">while</a> <span class="op ld">(</span> <span class="var">@$<span class="symb">nodes</span></span> <span class="op rd">)</span>
      <span class="op ld">{</span>
        <span class="var">$<span class="symb">nodes</span><span class="op ptr">-&gt;</span><span class="op ld">[</span>0<span class="op rd">]</span><span class="op ld">[</span>0<span class="op rd">]</span></span> =~ <span class="symb">s</span>/^<span class="var">$<span class="symb">qc</span></span>// <span class="kwd">or</span> <a class="kwd" href="http://perldoc.perl.org/functions/last.html" target="_blank" rel="nofollow">last</a><span class="op stmt">;</span>

        <span class="cmnt">## reduce node or detect new optional node</span>
        <a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="var">$<span class="symb">n</span></span> = <a class="kwd" href="http://perldoc.perl.org/functions/shift.html" target="_blank" rel="nofollow">shift</a> <span class="var">@$<span class="symb">nodes</span></span><span class="op stmt">;</span>
        <span class="kwd">if</span> <span class="op ld">(</span> <a class="kwd" href="http://perldoc.perl.org/functions/length.html" target="_blank" rel="nofollow">length</a> <span class="var">$<span class="symb">n</span><span class="op ptr">-&gt;</span><span class="op ld">[</span>0<span class="op rd">]</span></span> <span class="op rd">)</span>
        <span class="op ld">{</span>
          <a class="kwd" href="http://perldoc.perl.org/functions/push.html" target="_blank" rel="nofollow">push</a> <span class="var">@<span class="symb">new</span></span>, <span class="var">$<span class="symb">n</span></span><span class="op stmt">;</span>
          <a class="kwd" href="http://perldoc.perl.org/functions/next.html" target="_blank" rel="nofollow">next</a><span class="op stmt">;</span>
        <span class="op rd">}</span>
        <span class="var">$<span class="symb">newopt</span></span> = <span class="num">1</span><span class="op stmt">;</span>
      <span class="op rd">}</span>

      <span class="kwd">if</span> <span class="op ld">(</span> <span class="var">@$<span class="symb">nodes</span></span> || <span class="var">$<span class="symb">opt</span></span> <span class="op rd">)</span>
      <span class="op ld">{</span>
        <a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="var">$<span class="symb">new</span></span> = <span class="symb">reduce</span> <span class="op ld">[</span> <span class="var">$<span class="symb">c</span></span>, <span class="op ld">[</span> <span class="var">@<span class="symb">new</span></span> <span class="op rd">]</span>, <span class="var">$<span class="symb">newopt</span></span> <span class="op rd">]</span><span class="op stmt">;</span>
        <span class="kwd">if</span> <span class="op ld">(</span> <span class="var">@$<span class="symb">nodes</span></span> <span class="op rd">)</span>
        <span class="op ld">{</span>
          <span class="cmnt">## reduce remaining nodes</span>
          <a class="kwd" href="http://perldoc.perl.org/functions/my.html" target="_blank" rel="nofollow">my</a> <span class="var">$<span class="symb">next</span></span> = <span class="symb">reduce</span> <span class="op ld">[</span><span class="str">&#039;&#039;</span>, <span class="var">$<span class="symb">nodes</span></span>, <span class="num">0</span><span class="op rd">]</span><span class="op stmt">;</span>
          <a class="kwd" href="http://perldoc.perl.org/functions/return.html" target="_blank" rel="nofollow">return</a> <span class="op ld">[</span> <span class="var">$<span class="symb">prefix</span></span>, <span class="op ld">[</span> <span class="var">$<span class="symb">new</span></span>, <span class="var">$<span class="symb">next</span></span> <span class="op rd">]</span>, <span class="var">$<span class="symb">opt</span></span><span class="op rd">]</span> <span class="kwd">if</span> <a class="kwd" href="http://perldoc.perl.org/functions/length.html" target="_blank" rel="nofollow">length</a> <span class="var">$<span class="symb">next</span><span class="op ptr">-&gt;</span><span class="op ld">[</span>0<span class="op rd">]</span></span><span class="op stmt">;</span>

          <span class="cmnt">## couldn&#039;t be reduced</span>
          <a class="kwd" href="http://perldoc.perl.org/functions/return.html" target="_blank" rel="nofollow">return</a> <span class="op ld">[</span> <span class="var">$<span class="symb">prefix</span></span>, <span class="op ld">[</span> <span class="var">$<span class="symb">new</span></span>, <span class="var">$<span class="op ld">{</span><span class="var">$<span class="symb">next</span><span class="op ptr">-&gt;</span><span class="op ld">[</span>1<span class="op rd">]</span></span><span class="op rd">}</span></span> <span class="op rd">]</span>, <span class="var">$<span class="symb">opt</span></span> <span class="op rd">]</span><span class="op stmt">;</span>
        <span class="op rd">}</span>

        <span class="cmnt">## current node is optional</span>
        <a class="kwd" href="http://perldoc.perl.org/functions/return.html" target="_blank" rel="nofollow">return</a> <span class="op ld">[</span> <span class="var">$<span class="symb">prefix</span></span>, <span class="op ld">[</span> <span class="var">$<span class="symb">new</span></span>, <span class="var">@$<span class="symb">nodes</span></span> <span class="op rd">]</span>, <span class="var">$<span class="symb">opt</span></span> <span class="op rd">]</span><span class="op stmt">;</span>
      <span class="op rd">}</span>

      <span class="cmnt">## nothing left to reduce</span>
      <span class="symb">reduce</span> <span class="op ld">[</span> <span class="var">$<span class="symb">prefix</span></span>.<span class="var">$<span class="symb">c</span></span>, <span class="op ld">[</span> <span class="var">@<span class="symb">new</span></span> <span class="op rd">]</span>, <span class="var">$<span class="symb">newopt</span></span> <span class="op rd">]</span><span class="op stmt">;</span>
    <span class="op rd">}</span><span class="op stmt">;</span>

    <span class="symb">reduce</span> <span class="op ld">[</span> <span class="str">&#039;&#039;</span>, <span class="op ld">[</span><span class="op ld">(</span> <a class="kwd" href="http://perldoc.perl.org/functions/map.html" target="_blank" rel="nofollow">map</a> <span class="op ld">{</span> <span class="op ld">[</span><span class="var">$_</span><span class="op rd">]</span> <span class="op rd">}</span> <a class="kwd" href="http://perldoc.perl.org/functions/sort.html" target="_blank" rel="nofollow">sort</a> <span class="var">@_</span> <span class="op rd">)</span><span class="op rd">]</span>, <span class="num">0</span><span class="op rd">]</span><span class="op stmt">;</span>
  <span class="op rd">}</span><span class="op stmt">;</span>

  <span class="symb">toString</span><span class="op ld">(</span><span class="symb">fold</span><span class="op ld">(</span><span class="var">@_</span><span class="op rd">)</span><span class="op rd">)</span><span class="op stmt">;</span>
<span class="op rd">}</span><span class="op stmt">;</span></code></pre></div></fieldset><script type="text/javascript">/*<![CDATA[*/xLib.onLoad(function(){Blog.Collapsible.create('collapsible_qgobsnhs_6')})/*]]&gt;*/</script>
<p>
Well, not so easy, but it works :)
</p>
<p>
Here, the inner recursion <span class="code">fold</span> creates the actual tree, whose nodes
are arrays consisting of a prefix, a list of subnodes and a flag telling whether the subnodes are optional.
The second inner function <span class="code">toString</span> then creates the actual regular
expression string from that tree.
So, for instance, calling
</p>
<fieldset class="collapsible"><legend><a href="javascript:void(0)" id="collapsible_qgobsnhs_7">[-] hide code</a></legend><div class="collapsible-container"><pre class="code"><code class="perl">&amp;<span class="symb">foldWordsToRegex</span><span class="op ld">(</span><span class="qlo qw"><span class="kwd">qw</span><span class="op">(</span><span class="istr"> for far bar foo boofaz boofar boof faz foobaz foobars boofar </span><span class="op">)</span></span><span class="op rd">)</span></code></pre></div></fieldset><script type="text/javascript">/*<![CDATA[*/xLib.onLoad(function(){Blog.Collapsible.create('collapsible_qgobsnhs_7')})/*]]&gt;*/</script>
<p>
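would return the body of the regex <span class="code">(2)</span>, enclosed in a non-capturing group <span class="code">(?:...)</span>; the surrounding <span class="code">\b</span> anchors still have to be added by the caller.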
</p>
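<p>
For comparison, here is a sketch of a different technique than the routine above: instead of reducing a sorted list,
it first builds a hash-based prefix tree and then prints that tree as a pattern. The names
<span class="code">buildTrie</span> and <span class="code">trieToRegex</span> are invented for this example, and the
empty hash key is assumed to mark the end of a word.
</p>
<pre class="code"><code class="perl">use strict;
use warnings;

## nested hashes as prefix tree; the key '' marks the end of a word
sub buildTrie {
  my %root;
  for my $word (@_) {
    my $node = \%root;
    for my $char (split //, $word) {
      $node-&gt;{$char} ||= {};
      $node = $node-&gt;{$char};
    }
    $node-&gt;{''} = 1;
  }
  return \%root;
}

## turn a (sub)tree into a pattern: one alternative per child in sorted
## order, plus a trailing "?" if a word may also end at this node
sub trieToRegex {
  my ($node) = @_;
  my @chars = grep { $_ ne '' } sort keys %$node;
  my @alternatives = map { quotemeta($_) . trieToRegex($node-&gt;{$_}) } @chars;
  return '' unless @alternatives;

  my $isOptional = exists $node-&gt;{''};
  ## a single mandatory child needs no group, so chains like "oof" stay flat
  return $alternatives[0] if @alternatives == 1 and not $isOptional;
  return '(?:' . join('|', @alternatives) . ')' . ($isOptional ? '?' : '');
}

my @words = qw( for far bar foo boofaz boofar boof faz foobaz foobars boofar );
my $body  = trieToRegex(buildTrie(@words));
my $regex = qr/\b($body)\b/;

print $body, "\n";
print "found: $1\n" if 'a foobaz walks into a bar' =~ $regex;</code></pre>
<p>
Duplicate words such as the second <span class="code">boofar</span> vanish automatically, since they end up in the same tree node.
</p>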
]]>
  
  </content>
</entry>

</feed>
