beta BLOG dot NET

/* bugs, features, drafts, and solutions. */

// november 2009 archive

Sebastian blogged on 2009-11-28T23:33:32+00:00

a wordlist folding algorithm


Assume you wish to match a large wordlist against a huge chunk of text. As a small test case, let for, far, bar, foo, boofaz, boofar, boof, faz, foobaz, foobars, boofar be your wordlist. Now, you may apply the corresponding regular expression, a plain alternation of all these words. But how would a regex engine actually carry out that task?
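One way to fold such a wordlist is sketched below in Python. It assumes the goal is a regex in which common prefixes are shared instead of being retried for every single alternative: all words go into a trie, and the trie is then emitted as nested non-capturing groups. This is a generic prefix-trie technique, not necessarily the algorithm the full post describes, and the helper names `build_trie` and `trie_to_regex` are hypothetical.

```python
import re

def build_trie(words):
    """Fold the words into a nested-dict trie; the key '' marks the end of a word."""
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[''] = {}  # terminal marker (duplicates simply land on the same path)
    return trie

def trie_to_regex(node):
    """Emit a regex fragment matching exactly the words stored below this trie node."""
    terminal = '' in node
    branches = [re.escape(ch) + trie_to_regex(child)
                for ch, child in sorted(node.items()) if ch != '']
    if not branches:
        return ''                       # a leaf: nothing left to match
    if len(branches) == 1 and not terminal:
        return branches[0]              # a single path needs no group
    group = '(?:' + '|'.join(branches) + ')'
    return group + '?' if terminal else group  # a word may also end here

WORDS = ["for", "far", "bar", "foo", "boofaz", "boofar", "boof",
         "faz", "foobaz", "foobars", "boofar"]
folded = trie_to_regex(build_trie(WORDS))
pattern = re.compile(r'\b' + folded + r'\b')  # whole words only

print(folded)
# (?:b(?:ar|oof(?:a(?:r|z))?)|f(?:a(?:r|z)|o(?:o(?:ba(?:rs|z))?|r)))
print(pattern.findall("a foobars and a boof walk into a bar"))
# ['foobars', 'boof', 'bar']
```

With the sample list, the folded pattern starts with just two branches, `b` and `f`, so a position whose first letter is anything else is rejected once instead of once per word. The duplicate `boofar` collapses into the same trie path.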


Sebastian blogged on 2009-11-17T20:23:58+00:00

understanding unicode surrogates / or: how to deal with Linear B strings in .NET


Remember that a String object in .NET is a collection of Char objects, where a Char object in turn is documented as a Unicode character, encoded as a 16-bit unsigned integer. Thus, more precisely speaking, a single Char object can encode any code point within the Basic Multilingual Plane (BMP), i.e. between U+0000 and U+FFFF. So, what about the rest of the story? Unicode, as a universal character set, is of course designed to support far more than 65536 characters: its code space extends up to U+10FFFF.
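To make the surrogate mechanism concrete, here is a small sketch of the standard UTF-16 arithmetic (as specified in the Unicode Standard and RFC 2781): subtracting 0x10000 from a supplementary code point leaves a 20-bit value, whose upper 10 bits go into a high surrogate (U+D800..U+DBFF) and whose lower 10 bits go into a low surrogate (U+DC00..U+DFFF). Python is used only because its strings are sequences of code points, so the code units have to be computed explicitly; the function names are hypothetical. In .NET, `Char.ConvertFromUtf32` and `Char.ConvertToUtf32` perform the same conversion.

```python
def to_surrogates(code_point):
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is in the BMP or out of range")
    offset = code_point - 0x10000        # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)       # upper 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # lower 10 bits -> low surrogate
    return high, low

def from_surrogates(high, low):
    """Recombine a high/low surrogate pair into the original code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

a = chr(0x10000)                               # LINEAR B SYLLABLE B008 A
print(len(a.encode("utf-16-le")) // 2)         # 2 code units, i.e. two .NET Chars
print([hex(u) for u in to_surrogates(ord(a))]) # ['0xd800', '0xdc00']
print(hex(from_surrogates(0xD800, 0xDC00)))    # 0x10000
```

So the very first Linear B character, U+10000, is stored as the pair U+D800 U+DC00, which is why `String.Length` reports 2 for it in .NET.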


Sebastian blogged on 2009-11-12T23:44:36+00:00

Fun with European domain names


Starting 10 December 2009, companies and private persons based in the European Union will be able to register .eu Internationalised Domain Names (IDNs).
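Under the hood, an IDN such as the hypothetical `müller.eu` is resolved through an ASCII-compatible form: each non-ASCII label is Punycode-encoded and prefixed with `xn--`. A minimal Python sketch using the standard library's `idna` codec, which implements the IDNA 2003 rules (RFC 3490); the wrapper names are hypothetical, and a registry may apply additional rules of its own.

```python
def to_ascii(domain: str) -> str:
    """Convert a Unicode domain name into its ASCII-compatible (xn--) form."""
    return domain.encode("idna").decode("ascii")

def to_unicode(domain: str) -> str:
    """Convert an ASCII-compatible domain name back into Unicode."""
    return domain.encode("ascii").decode("idna")

print(to_ascii("müller.eu"))           # xn--mller-kva.eu
print(to_unicode("xn--mller-kva.eu"))  # müller.eu
print(to_ascii("example.eu"))          # plain ASCII labels pass through unchanged
```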

