[flagged] C Is Not Reasonable (osr.com)
28 points by ingve on Oct 3, 2016 | 69 comments


First line:

> ULONGLONG tableOffset;

It's starting well, for a C example...

Oh, and it ends even better:

> I write drivers for a living, not scientific or statistical analysis software. During this project, I quickly learn that casting everything to (double) was my friend. When in doubt, stick a (double) in front of it and test it again. In the end, the code worked pretty well. The customer was happy. We got paid.

o_O

OKOK, so you write drivers but don't know about C types, standard types, fixed-length types, you randomly fiddle with types until it passes a test but you don't know why it passes or doesn't, and you're proud of selling that.

Wonderful.

EDIT: added the "even better" part.


>OKOK, so you write drivers but don't know about C types, standard types, fixed-length types

That was my thought as well. I've occasionally been hired to write windows kernel drivers and I'd say that if you don't have a good grasp of stuff like C's type promotion, you're gonna have a bad time. Surprised to see this from OSR, which was a great source of info for driver development arcana when I was doing it.

Edit: remove repeated phrase


You're both missing the point.

Knowing the rules is not the same thing as consistently applying the rules across code you wrote or didn't write (or wrote late at night). The first is easy for a human, the second is hard for a human. That's why language researchers post-1972 have had jobs.


As someone who only has a passing knowledge of C, why is that wrong? Is it because it should be "uint_64" instead?


Because ULONGLONG is reminiscent of Windows, Visual Studio, and its flaky and outdated support for C.

It doesn't remove the argument that C uses the types of the operands to determine the width of the calculation, but it shows that this is not portable, standard C.

Also, if the guy writes drivers, you'd expect him to be aware of these problems, possibly having built an extensive set of preprocessor macros to handle them, like most other driver code does. But he seems to just loop through "modify/compile" until it compiles and behaves, without trying to understand why.


To be fair, though, this entire comment thread echoes the main point of the article: in what world is it reasonable to have to keep track of such things?


Drivers, for one. Any world where performance is so critical that you want to ensure you're fitting as much information as you reasonably can into the L1 and L2 caches for the computations you must do.

There's even still a role in this world for people to fine tune the assembly code for a specific execution environment to make their HPC models execute faster.


Respectfully, this misses my point (which, granted, was poorly explained).

The necessary evil here is using low-level languages. The unnecessary evil is using a language like C with arbitrary and highly variant conventions.

Better alternatives exist for writing drivers (e.g. Rust). To be clear, historical baggage sometimes dictates that we must use C, but that doesn't change the fact that much of the C world is a footgun, leading to eye-gouging frustration and threads such as this one about the myriad pitfalls associated with the language.

The author might not be a good C programmer, but this thread nevertheless supports his point. The necessity of writing C in practice is neither here nor there.


Rust and its ilk may eventually eat this space, but it doesn't appear ready. For one, its list of supported architectures is too small. For two, it's poorly optimized compared to C. Three, it's still too new - it has only been "stable" for a bit over a year. Four, the static verification tools are not yet there (you still need a way to verify the unsafe portions of code, which will not be trivial in a driver).

Do you know of any existing non-toy (i.e. distributed in support of a device) drivers which have been written in Rust? I'd love to see some in action, and will be happy to revise my opinion of Rust's capabilities when those drivers start rolling out.

Let's be frank for a moment. Rust's biggest advantage in this space is its memory safety. But we've had memory safe languages for as long as we've had C, and they still have not eaten C's lunch. If Rust wants to pave the way into C's territory (and not just C++'s territory), it will need to identify why and address that.


Okay, point taken, but I'd like to back-pedal a bit if you don't mind. I agree that Rust is very immature, but again I think I misspoke.

The basic point I'm trying to make is that C is (a) fraught with historical baggage that makes it difficult to use safely, in practice and that (b) this comment thread proves this point.

All points made beyond this are secondary and tangential. Again, I have to agree with you about the maturity of Rust, but I must insist that using C is fraught with absolute insanity. Sometimes it's the only thing available, but it's still insane.


  >  For two, it's poorly optimized, when compared to C
I'd be interested in hearing more about this.


The ULONGLONG typedef is a 64 bit integer. They should use that.

By the way, MS's compiler has had stdint.h since 2010. So they are making improvements, slowly.


It's wrong because you don't understand the problem you're trying to solve nor why the test passes after randomly trying a double. This is cargo-culting.


unsigned long long if you want at least a 64-bit unsigned.

uint64_t if you want a fixed-length 64-bit unsigned (and your compiler is at least C99, but I think long long came with C99 too).

Note that unsigned long is at least 32-bit, so it could be 64-bit too, there is no guarantee it is exactly 32-bit.
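
A quick way to see this on your own machine (the exact numbers printed depend on the platform's data model, e.g. LP64 Linux vs LLP64 Windows):

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        /* The standard only promises minimum widths for the named types;
           uint64_t is the only one below with an exact, guaranteed size. */
        printf("unsigned long:      %zu bytes\n", sizeof(unsigned long));      /* 8 on LP64, 4 on LLP64 */
        printf("unsigned long long: %zu bytes\n", sizeof(unsigned long long)); /* at least 8 */
        printf("uint64_t:           %zu bytes\n", sizeof(uint64_t));           /* exactly 8 */
        return 0;
    }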


It's silly because with no casts you get an integer type that is too small, while casting to double converts to floating point, performs floating-point math, and then assigns the result to the correct type they should have cast to in the beginning.

Floating point math in low level C code is very often a WTF. You most often want integers.
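
To make that concrete, here's a rough repro with made-up values (assumes the usual 32-bit int/unsigned, as in the article's snippet):

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint32_t l1Index = 3, granularity = 2000000000u;  /* product needs more than 32 bits */
        uint64_t tableOffset;

        tableOffset = l1Index * granularity;              /* multiply happens in 32 bits and wraps */
        printf("no cast: %llu\n", (unsigned long long)tableOffset);  /* 1705032704 */

        tableOffset = (double)l1Index * granularity;      /* the article's "fix": works here, but
                                                             silently rounds once results pass 2^53 */
        printf("double:  %llu\n", (unsigned long long)tableOffset);  /* 6000000000 */

        tableOffset = (uint64_t)l1Index * granularity;    /* widen one operand: the integer fix */
        printf("uint64:  %llu\n", (unsigned long long)tableOffset);  /* 6000000000 */
        return 0;
    }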


Sadly, C does not support fixed point arithmetic. You can hack it on top, but the result is super ugly.


> When in doubt, stick a (double) in front of it

If casting from an integer type, this loses precision. Why would this make tests pass? Are there any secondary effects? The author apparently made no effort to find out.


This may lose precision. Depending on architecture & compiler & flags, your mantissa may be big enough to hold your whole int. An 80 bit extended double has 64 bits of significand.

https://en.wikipedia.org/wiki/Extended_precision#x86_extende...


This will lose precision, as any floating point operation is only accurate to 1 unit in the last place (ULP). Even simple addition.


This conversion will not lose precision!

With a 64-bit mantissa, ULP(x) is ≤ 1 for x in [-2^64, 2^64]. If you have the time you can count up to 2^64 by repeatedly incrementing a float80(0.0) by 1.

Addition, subtraction and multiplication on float80s are perfectly stable provided the result stays within [-2^64, 2^64].
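
A quick check, assuming an x86 target where long double is the 80-bit x87 format (MSVC notably maps long double to plain 64-bit double, so this doesn't hold there):

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint64_t big = (1ULL << 53) + 1;     /* needs 54 significant bits */

        double d = (double)big;              /* 53-bit significand: rounds to an even neighbour */
        long double ld = (long double)big;   /* 64-bit significand: exact for this value */

        printf("original:        %llu\n", (unsigned long long)big);  /* 9007199254740993 */
        printf("via double:      %.0f\n", d);                        /* 9007199254740992 */
        printf("via long double: %.0Lf\n", ld);                      /* 9007199254740993 */
        return 0;
    }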


There is no guarantee of stability for those ops. Intel and IEEE guarantee 1 ULP, even if you use the 80-bit type. The error is not guaranteed not to be additive over multiple operations either. And for multiplication and division? Let's not even go there.

This will mess up all kinds of sharp comparisons on the results of cast operations.


Well, uint64_t ;) - but ULONGLONG is uint64_t, so it's the same thing.

The real "problem" here is presumably that ULONGLONG is a Windows-specific type, probably. (I say probably, because you can always make your own type called ULONGLONG - there's no rule to stop you.) People do like to mock people who program for Windows.


ULONGLONG is in fact a Windows Drivers Kit specific type. WDK has its own specialised set of weird and uncomfortable type conventions in the MS style [0] and although the author's point about C arithmetic type promotion is generic, his context is WDK device drivers, so ULONGLONG is absolutely correct, if possibly somewhat archaic (there's a ULONG64 lately).

[0] https://www.osr.com/blog/2015/05/27/newbie-corner-theres-typ...


It's in user mode too - as the return type of GetTickCount64, for example: https://msdn.microsoft.com/en-gb/library/windows/desktop/aa3...


  ULONGLONG tableOffset;

  tableOffset = (l1Index * L1_TABLE_GRANULARITY) +
                (l2Index * L2_TABLE_GRANULARITY) +
                startingL3->StartingOffset;
I'm tempted to point out that there's some HTML mishap, but actually the statement will be compiled if there is a variable called gt and one called StartingOffset.


Arithmetic is surprisingly hard to get "right". You might try to generalise from this example that "multiplying two N-bit numbers together should give a result 2N bits wide", then discover that for simple examples you run out of machine bits. Then there's overflow/saturation handling, which is a mess everywhere: lots of systems have hardware support for saturating arithmetic, but you can't conveniently specify it in C.

If anyone could think of a good, concise way of expressing all these bells and whistles of arithmetic, it could be implemented as a language or language frontend. For now, most languages choose to either ignore it entirely or push the user to floating or arbitrary-precision arithmetic as an 'improvement'.
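
For what it's worth, GCC and Clang do expose checked-arithmetic builtins as an extension (not standard C), so saturation can at least be hand-rolled; a sketch:

    #include <stdint.h>
    #include <stdio.h>

    /* Saturating 32-bit multiply using the GCC/Clang __builtin_mul_overflow extension. */
    static uint32_t mul_sat_u32(uint32_t a, uint32_t b) {
        uint32_t r;
        if (__builtin_mul_overflow(a, b, &r))
            return UINT32_MAX;    /* clamp instead of silently wrapping */
        return r;
    }

    int main(void) {
        printf("%u\n", mul_sat_u32(100000u, 100000u));  /* 4294967295, not a wrapped value */
        return 0;
    }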


For modern applications programming, arbitrary-precision is probably the right way to do integer arithmetic. Python does that, and it doesn't seem to cause any trouble. The people who need to do massive amounts of numeric stuff know who they are and can take the time to learn the relevant arcana, but your typical cat pictures app never has to worry about how big any of its integers are.


> Python does that

So do Ruby and Erlang. The problem, of course, is that it has a cost: trivial arithmetic operations have to be checked and may need to allocate.

And operational coverage can be spotty outside of the trivial range, e.g. when you hand a bignum to an "integer" operation that goes through floating point, bad things can happen, as those floats are generally machine doubles (fp64).


If Python handled overflow with an exception instead of a bignum, I'd bet it still wouldn't cause trouble either. The actual values generally aren't reaching bignums.


Yeah, I recently read on some forum someone who was looking for a quick C++ training for a friend and said "he doesn't need a deep training, just some basic syntax, he's not a programmer, that doesn't interest him, he's just doing scientific calculations".

I was like: oh boy... getting arithmetic calculations right is really tricky and you have to understand many non-obvious details, especially concerning the behaviour of floating-point numbers, and then you have to make choices or compromises depending on your requirements. But if you have no idea of the limitations of CPUs and languages, the naïve requirement is just "exact, full precision, as long as needed" and you'll hit bugs sooner or later.


The Cambridge (UK) maths undergraduate course used to do exactly this; include a numerical programming section without any training. They effectively just gave people a small volume of API calls and expected them to get on with it. The only reason it worked at all was a small number of undergraduates who could already program running guerilla assistance courses in C for bewildered mathematicians.

Programming is becoming increasingly important in science, but still treated as a skill that people should just casually pick up.

Oh, and then there's a whole bunch of stupid reproducibility issues caused by Intel using 80-bit FP internally but then truncating on load/store. Very important if your algorithm isn't rigorously convergent.


Intel's 80-bit FP issues are only an issue on 32-bit platforms, amd64 stuff uses SSE. (Right?)


> then discover that for simple examples you run out of machine bits

Why? I don't think it would be common to run out of machine bits for correct programs (and not really run out, just go into bigints in some cases). But the compiler cannot calculate those bits based on machine types alone; instead it should track all possible values/ranges and multiply those too, to decide whether the result fits into the type or needs a bigger one. And since values don't appear out of thin air, but come from literals or some input and are checked for specific ranges all the time, it should be possible to automatically choose appropriate types and only warn about possible overflows or performance penalties due to bigints.

Naive approach of multiplying machine types is pointless, of course, on that I agree.


Generally, for fixed point operation you need to explicitly handle two things: overflow and rounding. The former may not be an issue in some applications, but the latter will. It is easy to introduce a cumulative error by incorrect rounding for instance.
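
A minimal Q16.16 sketch (names and layout purely illustrative; negative-value handling glossed over) of where both decisions show up:

    #include <stdint.h>

    typedef int32_t q16_16;   /* 16 integer bits, 16 fractional bits */

    static q16_16 q_mul(q16_16 a, q16_16 b) {
        int64_t wide = (int64_t)a * (int64_t)b;  /* 64-bit intermediate, 32 fractional bits */
        wide += 1 << 15;                         /* round to nearest instead of truncating */
        return (q16_16)(wide >> 16);             /* overflow of the 16 integer bits still unchecked */
    }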


I suppose rounding is solvable too. Depending on how the value is used at the end, it would be possible to deduce how much precision is needed to guarantee no cumulative error.


It is strange to pick on C for this, as I cannot think of a single language that works the way the author seems to want. The only exception I can think of is Perl, which has the "wantarray" function that lets a function vary its behavior based on what its return is being assigned to: http://perldoc.perl.org/functions/wantarray.html

With that exception, it's pretty much always assumed that an expression is evaluated independently of what it is being assigned to. I have a feeling that the author's preferred behavior would lead to some surprising results, but I can't think of any good examples off the top of my head.


Looks like a use-case for typeclasses, e.g. those in Haskell. A typeclass is like a Java interface, except the method lookup can use the return type as well.

For example, we could make "plus" depend on the return type:

    -- "plus" takes two instances of "Number" and returns an "a"
    class Additive a where
      plus :: (Number b) => b -> b -> a

    -- Take two Numbers and return an Int
    instance Additive Int where
      plus x y = intPlus (toInt x) (toInt y)

    -- Take two Numbers and return a Float
    instance Additive Float where
      plus x y = floatPlus (toFloat x) (toFloat y)
We can even abuse this to do ugly hacks, like making "plus x y" return a list of "[x + y]" when the return type requires a list:

    -- Take two Numbers and return a list containing their sum
    instance (Additive a) => Additive (List a) where
      plus x y = [plus x y]
Alternatively, we could have returned "[plus x 0, plus y 0]" to make things even uglier ;)


The surprise would be if you wanted to, say, add 100 to a uint8 and have it wrap, but were assigning the result to a uint64. Then your arithmetic wouldn't wrap, unless you explicitly told it to, i.e. "x = (y + 100) & 0xff;"

Either way leads to surprises if your intuition tells you the other thing should happen.
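
Concretely (with uint8_t standing in for the hypothetical uint8):

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint8_t  y = 200;
        uint64_t x;

        x = y + 100;            /* y is promoted to int, so nothing wraps: x == 300 */
        printf("%llu\n", (unsigned long long)x);

        x = (y + 100) & 0xff;   /* wrapping requested explicitly: x == 44 */
        printf("%llu\n", (unsigned long long)x);
        return 0;
    }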

Strictly, in C the behaviour is undefined when a signed integer overflows. Although it wouldn't help in the author's case, since he's using unsigned integers, it is valid for a compiler to internally promote the operands of signed arithmetic.


> It is strange to pick on C for this, as I cannot think of a single language that works the way the author seems to want.

Agreed, though recent crop of languages with less implicit conversion will reject the entire thing rather than silently expand the value at the end, e.g. in Rust

    error[E0308]: mismatched types
     --> <anon>:7:19
      |
    7 |     tableOffset = (l1Index * L1_TABLE_GRANULARITY) +
      |                   ^ expected u64, found u32
    
    error: aborting due to previous error


So far I've found this mandatory explicit conversion to only get in the way. It even caused me a bug (in Go), because it made things so noisy it was very unclear where the value was unnecessarily truncated.


Yeah it's a mixed blessing, implicit conversions are sources of bugs but the lack of them can be a real pain in the ass when mixing numerics of different bit width. Maybe it's possible to write a rustc plugin to opt-in auto-expansion of numeric types in the style of unsafe blocks? E.g. `extend!(numexpr)` which would go through the numeric expression and automatically (sign-|zero-)extend values to make it typecheck?


It would be nice if languages allowed us to overload function names with the exact same parameters but different return types. The compiler or runtime environment should be able to select the correct implementation automatically based on type inference, or allow the programmer to specify which one should be used with an annotation. This would make some code a little cleaner instead of having to give all of those functions different names.


> It would be nice if languages allowed us to overload function names with the exact same parameters but different return types.

Haskell's typeclasses kind-of do that. `read` has type `Read a => String -> a`, so the result's type `a` (which can be inferred) drives which "Read" instance will be used:

    Prelude> read "1" :: Int
    1
    Prelude> read "1" :: Float
    1.0


> Haskell's typeclasses kind-of do that.

You're over-hedging. As your example shows, Haskell's typeclasses exactly do that.


I'm mostly hedging on the "overloading" part as Haskell doesn't support arbitrary ad-hoc polymorphism in the sense of C++/Java/C# (IIRC the foundational paper on typeclasses is about making ad-hoc polymorphism less ad-hoc). Depending how far into the overloading field nradov is, typeclasses may not match their definition of or requirements for overloading.

It's probably more noticeable in Rust than in Haskell.


Ah, that makes sense. I don't think there's a reasonable way to define "overloading" that excludes typeclasses, but it's true that there are things that can properly be termed "overloading" which typeclasses can't (or shouldn't) do.


> or allow the programmer to specify which one should be used with an annotation.

That's exactly:

> give all of those functions different names.


Effectively; although "name" usually implies an opaque symbol, e.g. the language would see no difference between, say, "addAsInts" vs. "addAsFloats", compared to "addAsInts" vs. "divide".

Classes, namespaces, modules, etc. allow names to have a more fine-grained structure, e.g. "int.add" and "int.divide" come from the same module, whilst "int.add" and "float.add" are alternative implementations of the same signature.


> whilst "int.add" and "float.add" are alternative implementations of the same signature.

Maybe in a functional programming language like Haskell. But not in C++.


The syntax was just an example ;)

What I described is actually closer to ML.


Has nothing to do with syntax.


Similar, but not exactly the same. It would make the syntax and documentation a little cleaner.


"This is wrong, and there's a way to fix it, but shut up I don't want to learn."

ok...


Indeed. The post begins with a question and asserts that the reader doesn't know the answer. I knew the correct answer instinctively and I haven't written any C in anger in years. Expecting a C compiler to promote expression operands prior to assignment because the output location is a larger size... that's just not the mentality C programmers develop; what if the output is smaller? Should all the expression operands be demoted automatically, producing radically different results as large numbers get truncated prior to evaluation? No. Obviously not. That sort of thing is self-evident to a competent C programmer.
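
A toy case spelling out why demotion would be worse:

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint32_t a = 70000;
        uint8_t  r = a / 256;   /* evaluated as uint32_t (273), truncated on assignment to 17 */

        /* If the operands were demoted to the target type first, 'a' would already be
           mangled to 70000 % 256 == 112, and 112 / 256 == 0 -- a very different answer.
           That's why the destination of an assignment never drives the evaluation. */
        printf("%u\n", (unsigned)r);
        return 0;
    }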


I really couldn't organize my thoughts from the sheer amount of criticism I have of this blog post. So instead, I'll kindly ask anyone to give this person a "do what I mean, and not what I say" programming language.

And they will realize that even that has quirks.


C is a minimal abstraction over asm and I wouldn't be surprised if this behavior can be configured in some C compiler, even though it would be odd to do so.

What I find surprising is when people say C is a simple and small language that's nice to build portable programs in. So much of C is undefined behavior or implementation-specific that many professional users pin a specific compiler toolchain version for a project. Part of that is due to changing and hard-to-predict optimizer passes, but most of it is due to the purported portability feature of C. If a lot of the semantics is left undefined or defined per compiler+target, then it's not really a portability abstraction. Not to mention the C stdlib, which is either missing or, if you count POSIX, inconsistent across platforms.

It's educational to look into Modula-3 and SPIN. SPIN is like MirageOS but with dynamic loading of components whereas in Mirage everything is compiled ahead of time into the final image. I mention Modula-3 because it's of similar age as ANSI-C and serves as an example of an OS written in a comparatively safe language around the same time Unix won and gave us the hegemony of C (with all the avoidable security fallout ever since).

Even though software, unlike world politics, isn't bound to the real world and could be improved substantially, we still keep dealing with bugs due to C although we have better options. I think as long as new operating systems, libraries and applications are written in C, this won't change. Just the idea of writing kernels for IoT systems in C in 2016 makes no sense from a reliability and high-assurance perspective. Getting halfway there would be exposing hard-to-misuse C APIs where the implementation is in a safe language (Rust, ATS2, etc). For the moment, consuming C APIs is the cross-platform library interface we have to deal with, but having at least the implementation be safe would go a long way.


C portability is not portability in the Java or Python sense. It's portability across architectures, and undefined (and implementation-defined) behavior is a powerful tool for that.


Of course, but shifting the burden onto developers, where most of them aren't professional libc or kernel engineers, leads to avoidable issues.

Rust seems familiar enough to many developers, and forces them to think about resource management before running the code by not allowing incorrect code, so I hope it will lead to more libraries and applications that would have otherwise been easily plagued with issues due to choosing C. Granted, I'm not a fan of how Rust's surface has turned out, but it's a sensible compromise to attract masses of developers into writing less buggy code that operates on the same level as that written in C. So I use it as a C replacement, and for that use case I like it because there's momentum behind it.



The way to understand it is that "+" when applied to ULONGs returns a ULONG. What the heck else would it return, in a language that doesn't have arbitrary-precision arithmetic?


If it were consistent with FLT_EVAL_METHOD==2, then it would evaluate in the highest precision available.

There's a lot of smug criticism of this article, but there actually is a reasonable point that FLT_EVAL_METHOD is inconsistent with the way integers are handled in C.
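
For anyone unfamiliar with it, FLT_EVAL_METHOD is a C99 macro in <float.h> describing how much the implementation widens floating-point intermediates; something like this will show what your toolchain does:

    #include <float.h>
    #include <stdio.h>

    int main(void) {
        /* 0 = evaluate in the operand's own type
           1 = promote float operations to double
           2 = promote everything to long double (classic x87 behaviour) */
        printf("FLT_EVAL_METHOD = %d\n", FLT_EVAL_METHOD);
        return 0;
    }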


That's a good point.


What harsh comments about a person that is just asking for a warning. Tell me, do you really think such a warning wouldn't be reasonable?

I do agree with the author that it would be better if C coerced the operands to the result type before the calculation instead of after. I would really like a warning when there's an implicit coercion at the end of a calculation.
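
For what it's worth, the closest existing thing I know of is GCC/Clang's -Wconversion, and as far as I can tell it flags the narrowing direction but stays silent on exactly the case you'd want warned about here:

    #include <stdint.h>

    uint64_t widen(uint32_t a, uint32_t b) {
        return a * b;   /* multiply happens in 32 bits, result merely widened: no warning */
    }

    uint32_t narrow(uint64_t x) {
        return x;       /* -Wconversion warns: the conversion may change the value */
    }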


Programming practically requires a willingness to continue learning. But his statements are deeply troubling:

> I’m not annoyed by the way statements are formed, or even by the precedence order (which I readily admit to not knowing or understanding or even caring much about)

> And don’t complain about how I parenthesize my arithmetic statements. I already mentioned precedence order. All those parens are the result of yet another lesson I learned to avoid working weekends.

... his example is saying (a×b)+(c×d)+e->f. I understand there being a learning curve to memorize that bit-shifts have higher precedence than bitwise operators, but ... you learn that multiplication happens before addition in elementary school! It's a prerequisite to learning pre-algebra in middle school. This isn't even some arcane programming thing you have to learn.

Sure, you can pepper all your code in parentheses, but sooner or later you're going to come across code that doesn't. And a person like that working on such a codebase is a huge danger.

I don't say this to be mean, but I really don't think programming is the right profession for this guy.


Whilst the delivery could be better, I agree with the sentiment that operator precedence is a waste of time. It's exactly the kind of mundane, error-prone work that machines should be doing for us, whether it's via a sophisticated structured editor, or a simple hack like Emacs's various paredit-like modes.

> Sure, you can pepper all your code in parentheses, but sooner or later you're going to come across code that doesn't. And a person like that working on such a codebase is a huge danger.

I can certainly imagine someone who doesn't care about precedence using an editor which disambiguates such things automatically; either by adding parentheses, colouring the background, etc. I can also imagine such a person spotting precedence bugs introduced by a colleague who considered themselves to be above such tooling.

Lisp, Forth and friends do perfectly well without having to consult precedence tables, and whilst I appreciate that some would prefer more syntax than those provide, I think precedence rules should still stick to the meta-level, parser-directing stuff like block delimiters, statement separators ("x ; y"), and maybe syntax sugar like "," ".", "=", etc. If it can be written as a library (which certainly includes things like numeric procedures) then it shouldn't have any precedence.

Even this minimal amount of precedence should be avoidable if desired, e.g. to avoid ambiguities like "a = b xor c". In PHP this parses to "(a = b) xor c", which is a perfectly valid PHP expression and caused the most egregious waste of my time to date.
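
C has its own version of that trap; the classic one is & vs ==:

    #include <stdio.h>

    int main(void) {
        int x = 2;
        /* == binds tighter than &, so this is x & (1 == 0), i.e. x & 0 (always false),
           not the intended evenness test (x & 1) == 0. */
        if (x & 1 == 0)
            printf("even\n");
        else
            printf("parsed as x & (1 == 0)\n");
        return 0;
    }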

Whilst it's true that learning and applying "BODMAS" is simple enough to do in high school arithmetic, the context of high school arithmetic is vastly impoverished compared to computer programming.

For example, which has the higher precedence: integer multiplication or list append? What about floating-point division compared to image convolution? Tree construction or matrix subtraction?

I think such questions are silly, yet they're a legitimate concern in languages with heavy use of infix notation (e.g. Haskell). Languages which define a fixed set of infix operators to avoid such problems (e.g. C) cause an unnecessary asymmetry between operators and every other procedure (e.g. think about the number of times "function(x, y) { return x + y; }" has been written by Javascript programmers!), and are impoverishing their domain of discourse much like high school arithmetic (e.g. compare introductory C material to something like https://code.world/doc.html?help/codeworld.md or http://www.bootstrapworld.org/materials/spring2016/tutorial )


> Lisp, Forth and friends do perfectly well without having to consult precedence tables

It helps that they have "alien" syntaxes (prefix or postfix but not infix). Smalltalk is also a language which did away with precedence almost entirely: it has a strict left-to-right evaluation model, and only three precedence levels[0]: unary messages (aSubject aMessage), binary messages (aValue <operator> anOtherValue) and keyword messages (aSubject aMessage: aValue) but because it has "infix operators" the effect is much weirder.

[0] or possibly a fourth, the cascading `;` operator is not a message, and can become really, really weird when used with different message types e.g.

    7 + 4 squared; factorial; yourself.
binds like this:

    7 (+ (4 squared)); factorial; yourself.
so it computes 4 squared, adds it to 7, then computes the factorial of 7 and finally returns 7.


It's really simple, I have a table hanging right by my screen that has type of Operand1 and type of Operand2 and then the type of the result. It has never been an issue. In general, the result is always cast towards the larger and unsigned type.


implicit conversions bad stop always use casts for conversion stop do not write drivers for my hardware with this approach stop


So true. A page-long rant is much more reasonable than the ten-or-so keystrokes that would paste the ULONGLONGs in there.



