
Somewhat surprised that the author doesn't go on to show how to reduce the truth table...

    let decide requiresApproval canUserApprove itemStatus =
        match requiresApproval, canUserApprove, itemStatus with
        |  true,  true, NotAvailable -> Hidden
        | false,  true, NotAvailable -> Hidden
        |  true, false, NotAvailable -> Hidden
        | false, false, NotAvailable -> Hidden
        |  true,  true,    Available -> ReadWrite
        | false,  true,    Available -> Hidden
        |  true, false,    Available -> ReadOnly
        | false, false,    Available -> Hidden
        |  true,  true,        InUse -> ReadOnly
        | false,  true,        InUse -> Hidden
        |  true, false,        InUse -> ReadOnly
        | false, false,        InUse -> Hidden
...to something like this:

    let decide requiresApproval canUserApprove itemStatus =
        match requiresApproval, canUserApprove, itemStatus with
        |     _,     _, NotAvailable -> Hidden
        | false,     _,            _ -> Hidden
        |  true,  true,    Available -> ReadWrite
        |  true,     _,            _ -> ReadOnly
I would think that being able to see and effect this reduction is part of the benefit of writing out the whole table. It might not always be the right call to actually reduce it, but in this case (based on the parameter names) each of the branches seems to make sense from a domain perspective, so why not do it?


> so why not do it?

Because wildcard pattern matches are error prone.

When you add a new constructor, the compiler will remain satisfied that the pattern match is exhaustive and won't prompt you to verify the behaviour.
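
For example, taking the reduced match from upthread and adding a hypothetical Archived case to the status type (a sketch, not from the article):

    type ItemStatus =
        | NotAvailable
        | Available
        | InUse
        | Archived  // hypothetical new constructor

    // The reduced match still compiles with no exhaustiveness warning;
    // the wildcard rows silently route Archived to Hidden or ReadOnly.
    let decide requiresApproval canUserApprove itemStatus =
        match requiresApproval, canUserApprove, itemStatus with
        |     _,     _, NotAvailable -> Hidden
        | false,     _,            _ -> Hidden    // swallows Archived
        |  true,  true,    Available -> ReadWrite
        |  true,     _,            _ -> ReadOnly  // swallows Archived

The fully written-out table, by contrast, stops compiling cleanly and forces you to decide what Archived should mean.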


You can write it pretty compactly in Rust without introducing wildcards over non-boolean arguments. If you're extra paranoid, you can write true | false instead of _ for the boolean patterns in the example below.

    match (requires_approval, can_user_approve, item_status) {
        (true, true, Available) => ReadWrite,
        (true, false, Available) => ReadOnly,
        (true, _, InUse) => ReadOnly,
        (true, _, NotAvailable) => Hidden,
        (false, _, NotAvailable | Available | InUse) => Hidden,
    }


> wildcard pattern matches are error prone.

True.

Unfortunately, they can also be asymptotically shorter than writing the whole thing out: for example, try writing a lexicographic comparison function for an AST type of your choosing and making it linear in the number of constructors.
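
A sketch of what that looks like with a made-up three-constructor AST (the effect gets dramatic as the constructor count grows): the single wildcard row keeps the match at n + 1 cases, where spelling out every mixed pair of constructors would take on the order of n^2 rows.

    type Expr =
        | Lit of int
        | Var of string
        | Add of Expr * Expr

    // Hypothetical helper: a total order on the constructors themselves.
    let rank expr =
        match expr with
        | Lit _ -> 0
        | Var _ -> 1
        | Add _ -> 2

    let rec compareExpr a b =
        match a, b with
        | Lit x, Lit y -> compare x y
        | Var x, Var y -> compare x y
        | Add (l1, r1), Add (l2, r2) ->
            let c = compareExpr l1 l2
            if c <> 0 then c else compareExpr r1 r2
        | _ -> compare (rank a) (rank b)  // one row covers all mixed pairs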

If the language is lazy, it can be more severe than that: in Haskell,

  andL False _   = False
  andL _     rhs = rhs
is the short-circuiting AND, while

  andS False False = False
  andS False True  = False
  andS True  False = False
  andS True  True  = True
is the naïve one, strict in both arguments: andL False undefined is False, but andS False undefined diverges, because matching on the second argument forces it.


I have a related anecdote. I once generated a pattern-matching system for Unicode grapheme cluster breaks (using OCaml, which is similar to F#), and it turned out to be very slow due to the branch mispredictions introduced by the pattern matching.

Later, I borrowed a trick from someone else and used a hashtable (written to only once, when the program starts) to store the relevant data instead. At that point, the performance cost was so small that I couldn't tell any difference compared to before I had introduced the pattern matching.
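
In F# terms the trick looks roughly like this (a sketch; the names and the two sample entries are invented, and the real data would come from the Unicode files):

    open System.Collections.Generic

    // Filled exactly once at startup, read-only afterwards.
    let breakProps = Dictionary<int, string>()
    breakProps.[0x200D]  <- "ZWJ"                 // made-up sample entry
    breakProps.[0x1F1E6] <- "Regional_Indicator"  // made-up sample entry

    // One hash probe instead of a compiled decision tree.
    let breakProp codepoint =
        match breakProps.TryGetValue codepoint with
        | true, prop -> prop
        | _          -> "Other"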

For what it's worth, there are about 18,000 characters that Unicode singles out for special handling in grapheme breaking[0], so I don't know how noticeable the difference would be with the pattern-matching tables you're likely to encounter in real life.

[0] https://www.unicode.org/Public/UCD/latest/ucd/auxiliary/Grap...


Yeah, the usual way to compile pattern matching is a decision tree, because once you decide to check one of the arguments, it can completely change which of the remaining arguments is a better candidate to look at next. (Of course, in a lazy language it's even more subtle, but ML is strict.) It's a bit like compiling switch statements, only more so, and I expect compilers for functional languages have less tuning for crazy corner cases like an 18k-entry table than, say, Clang, simply because they have less tuning for crazy corner cases overall. (I once had Clang compile a switch into a lookup table and stuff the table into a 64-bit constant, for crying out loud.)
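
To make that concrete with the decide function from the top of the thread (just an illustration of the idea, not the output of any particular compiler): checking itemStatus first settles all four NotAvailable rows with one test, and only then are the boolean columns worth looking at.

    let decideTree requiresApproval canUserApprove itemStatus =
        match itemStatus with
        | NotAvailable -> Hidden              // one test, four rows done
        | Available | InUse ->
            match requiresApproval with
            | false -> Hidden                 // four more rows
            | true ->
                match canUserApprove, itemStatus with
                | true, Available -> ReadWrite
                | _               -> ReadOnly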

For Unicode tables, the usual approach is of course a radix trie, even if the traditional two-level trie from the Unicode 2.0 days is starting to look a bit bloated now and a three-level trie is quite a bit slower. A hash might be a good approach for some things. One potential downside is that it completely destroys memory locality, while normal Unicode data will usually use a fairly small subset of the characters; I'm not sure how important this would turn out to be. People (e.g. libutf and IIRC duktape) have also used a sorted table of alternating start- and endpoints of ranges.
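
For reference, the two-level lookup itself is tiny (an F# sketch; stage1 and stage2 are hypothetical precomputed arrays, and the compression comes from sharing identical 256-entry blocks in stage2):

    // High bits pick a block, low bits index into it.
    let lookup (stage1: byte[]) (stage2: byte[][]) (codepoint: int) =
        stage2.[int stage1.[codepoint >>> 8]].[codepoint &&& 0xFF]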

[The 18k figure sounds scarier than it is: 11k of those are precomposed Korean syllables encoded in a well-known systematic way, so for 0xAC00 <= codepoint < 0xAC00 + 19 * 21 * 28 the table just says (if (codepoint - 0xAC00) mod 28 = 0 then LV else LVT).]
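
In code, that aside is just (a hypothetical helper wrapping the same arithmetic):

    // 19 * 21 * 28 = 11,172 precomposed Hangul syllables from U+AC00.
    let hangulBreakClass codepoint =
        if codepoint >= 0xAC00 && codepoint < 0xAC00 + 19 * 21 * 28 then
            Some (if (codepoint - 0xAC00) % 28 = 0 then "LV" else "LVT")
        else
            None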


I understand your point, but it's unlikely that new constructors are going to be added to Boolean :)

EDIT: Oh, actually, nevermind. After reading more closely, I do agree that it might be reduced a bit too much!


> wildcard pattern matches are error prone

That's my problem with var in Java.



