FYI, at least in C/C++, the compiler is free to throw away assignments to any memory pointed to by a pointer if said pointer is about to be passed to free(), so depending on how you did this, the lack of a perf impact could simply be because your compiler removed the assignment. This will even affect a call to memset().
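Roughly, the pattern in question looks like this (a sketch with made-up names; at -O2 a compiler may drop the memset as a dead store, since nothing can legitimately read those bytes after free):

    #include <stdlib.h>
    #include <string.h>

    void destroy(char *secret, size_t n) {
        memset(secret, 0, n);  /* may be elided: the object is about to die */
        free(secret);
    }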
I patched the free() implementation itself, not the code that calls free().
I did, of course, test it, and anyway we now run into the "freed memory" pattern regularly when debugging (yes including optimized builds), so it's definitely working.
That code is not guaranteed to work. Declaring memset_v as volatile means that the variable has to be read, but does not imply that the function must be called; the compiler is free to compile the function call as "tmp = memset_v; if (tmp != memset) tmp(...)" relying on its knowledge that in the likely case of equality the call can be optimized away.
Whilst the C standard doesn't guarantee it, both LLVM and GCC _do_. They have made it implementation-defined behaviour that it will work, so they are not free to optimise it away.
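For reference, the volatile-function-pointer idiom under discussion looks roughly like this (a sketch; memset_v is the name from the comment above, the original code may differ):

    #include <string.h>

    /* Read memset through a const volatile pointer so the compiler cannot
       simply assume it knows which function is being called. */
    static void *(*const volatile memset_v)(void *, int, size_t) = memset;

    void scrub(void *p, size_t n) {
        memset_v(p, 0, n);  /* the load of memset_v must happen; see the
                               caveat above about the call itself */
    }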
Most C++ programs written before P0593R6 depended on implementation behaviour, and were graciously allowed to not be undefined behaviour just 5 years ago. C++ as a language standard is mostly irrelevant; what one should care about is what the compiler authors consider valid code.
You have to rely on implementation for anything to do with what happens to memory after it is freed, or really almost anything to do with actual bytes in RAM.
The C committee gave you memset_explicit. But note that there is still no guarantee that information can not leak. This is generally a very hard problem as information can leak in many different ways as it may have been copied by the compiler. Fully memory safe languages (so "Safe Rust" but not necessarily real-word Rust) would offer a bit more protection by default, but then there are still side-channel issues.
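A sketch of what that looks like in C23 (memset_explicit is declared in <string.h>; the wrapper name here is made up):

    #include <string.h>   /* C23: memset_explicit */
    #include <stdlib.h>

    void secure_free(void *p, size_t n) {
        memset_explicit(p, 0, n);  /* unlike plain memset, this store may not
                                      be optimized away as a dead store */
        free(p);
    }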
Because, for the 1384th time, they're pretending they can ignore what the programmer explicitly told them to do
Creating memset_explicit won't fix existing code. "Oh but what if maybe" is just cope.
If I do memset then free then that's what I want to do
And the way things go I won't be surprised if they break memset_explicit for some other BS reason and then make you use memset_explicit_you_really_mean_it_this_time
Your problem is not the C committee but your lack of understanding of how optimizing compilers work. WG14 could, of course, specify that a compiler has to do exactly what you tell it to do. And in fact, every compiler supports this already: in most cases even by default! Just do not turn on optimization. But this is not what most people want.
Once you accept that optimizing compilers do, well, optimizations, the question is what should be allowed and what not. Inlining "memset" and eliminating dead stores are both simply optimizations which people generally want.
If you want a store not to be eliminated by a compiler, you can make it volatile. The C standard says this can not be deleted by optimizations. The criticism with this was that later undefined behavior could "undo" this by "travelling in time". We made it clear in ISO C23 that this is not allowed (and I believe it never was) - against protests from some compiler folks. Compilers still do not fully conform to this, which shows the limited power WG14 has to change reality.
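For example, something like this relies only on the volatile rule (a sketch; the byte value is arbitrary):

    #include <stdlib.h>

    void wipe_and_free(void *p, size_t n) {
        volatile unsigned char *q = p;
        for (size_t i = 0; i < n; i++)
            q[i] = 0xDD;   /* each write is a volatile access and may not be
                              deleted as a dead store */
        free(p);
    }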
> Once you accept that optimizing compilers do, well, optimizations
Why in tarnation is it optimizing out a write to a pointer before a function that takes said pointer? Imagine it were any other function besides free; see how ridiculous that sounds?
It's been many years since C compilers started making pathological-but-technically-justifiable optimizations that work against the programmer. The problem is the vast sea of "undefined behavior" — if you are not a fully qualified language lawyer versed in every nook and cranny of the C standard, prepare to be surprised.
Many of us who don't like working under such conditions have just moved on to other languages.
I agree that compilers were too aggressive in exploiting UB, but this is not the topic of this thread which has nothing to do with UB. But also the situation with UB is in practice not too bad. While compilers broke some old code which caused frustration, when writing new code most UB can easily be dealt with in practice by following some basic ground rules (e.g. no unsafe casts, being careful with pointer arithmetic) and by activating some compiler flags. It is not anything that should cause much trouble when programming in C.
Because it is a dead store. Removing dead stores does not sound ridiculous to me, nor to anybody who has used an optimizing compiler in the last few decades.
The whole point of the optimizer is that it can detect inefficiencies by treating every statement as some combination of simple, fundamental operations. The compiler is not seeing "call memset() on pointer to heap", it's seeing "write of variable size" just before "deallocation". For some, optimizing that will be a problem, for others, not optimizing it will leave performance on the table.
There are still ways to obtain the desired behavior. Just put the call in a DLL or SO that implements what you need. The compiler cannot inspect the behavior of functions across module boundaries, so it cannot tell whether removing the call preserves semantics or not (for example, it could be that the external function sends the contents of the buffer to a file), so it will not remove it.
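A sketch of that approach (names are made up; the point is that scrub_buffer lives in a separately built shared library whose body the caller's compiler never sees, though LTO or whole-program optimization could change the picture):

    /* scrub.c -- built into its own .so / DLL */
    #include <string.h>

    void scrub_buffer(void *p, size_t n) {
        memset(p, 0xDD, n);
    }

    /* caller.c -- the compiler cannot prove the call is side-effect free,
       so it has to keep it */
    #include <stdlib.h>

    extern void scrub_buffer(void *p, size_t n);

    void release(void *p, size_t n) {
        scrub_buffer(p, n);
        free(p);
    }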
A modern compiler may also completely remove malloc / free pairs and move the computation to the stack. And I do not see what this has to do with C, it should be the same for most languages. C gives you tools to express low-level intent such as "volatile", but one has to use them.
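For instance, GCC and Clang can often turn something like this into code that never calls malloc or free at all (a sketch; whether the transformation fires depends on compiler, version, and flags):

    #include <stdlib.h>

    int sum_pair(int a, int b) {
        int *tmp = malloc(2 * sizeof *tmp);
        if (!tmp) return 0;
        tmp[0] = a;
        tmp[1] = b;
        int s = tmp[0] + tmp[1];
        free(tmp);
        return s;   /* the allocation is unobservable, so it may be promoted
                       to the stack or removed entirely */
    }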
Strong disagree. In C, malloc and free are functions, and I expect no magic to happen when calling a function. If malloc and free were keywords like sizeof, it would have been different.
Your problem is that you're treating words such as "function" and "call" as if they had meaning outside of the language itself (or, more specifically, outside of the C abstract machine), when the point of the compiler is precisely to melt away the language parts of the specified program and be left with a concrete program that matches its behavior. If you view a binary in a disassembler, you will not find any "functions" or "calls". Maybe that particular architecture happens to have a "call" instruction to jump to "functions", but these words are merely homophones with what C refers to as "functions" and "calls".
When you "call" a "function" in the source you're not specifying to the compiler that you want a specific opcode in the generated executable, you're merely specifying a particular observable behavior. This is why optimizations such as inlining and TCO are valid. If the compiler can prove that a heap allocation can be turned into a stack allocation, or even removed altogether (e.g. free(malloc(1ULL << 50))), the fact that these are exposed to the programmer as "functions" he can "call" poses no obstacle.
The closest thing to what you say that I can find is 5.1.2.3 §4 of N3096:
> In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or through volatile access to an object)
Problem is, calling an external library function has the needed side effect of calling that library function. I do not see language that allows simply not doing that, based on assumed but unknown function behaviour.
The behavior of the standard functions is not unknown, it is at least partially specified. If a user overrides them under the mistaken assumption that a call in source translates in a 1-to-1 correspondence to a call in binary, that's their problem.
Thanks, I did read it! Things like footnote 236: "This means that an implementation is required to provide an actual function for each library function, even if it also provides a macro for that function", where the macro is shown using a compiler builtin as an example.
Again, could you please explain how a compiler can decide to remove a call to a function in an external dynamically loaded library, one that is not known at compile time, simply based on the name of the function (i.e. not because the call is unreachable)? I do not see any such language in the standard.
And yes, calling unknown function from a dynamically loaded library totally is a side effect.
> Again, could you please explain how a compiler can decide to remove a call to a function in an external dynamically loaded library, one that is not known at compile time, simply based on the name of the function (i.e. not because the call is unreachable)? I do not see any such language in the standard.
> And yes, calling unknown function from a dynamically loaded library totally is a side effect.
The thing is that malloc/free aren't "unknown function[s]". From the C89 standard:
> All external identifiers declared in any of the headers are reserved, whether or not the associated header is included.
And from the C23 standard:
> All identifiers with external linkage in any of the following subclauses (including the future library directions) and errno are always reserved for use as identifiers with external linkage
malloc/free are defined in <stdlib.h> and so are reserved names, so compilers are able to optimize under the assumption that malloc/free will have the semantics dictated by the standard.
In fact, the C23 standard explicitly provides an example of this kind of thing:
> Because external identifiers and some macro names beginning with an underscore are reserved, implementations can provide special semantics for such names. For example, the identifier _BUILTIN_abs could be used to indicate generation of in-line code for the abs function. Thus, the appropriate header could specify
#define abs(x) _BUILTIN_abs(x)
> for a compiler whose code generator will accept it.
Only answering the "side effect" part as the rest was answered already.
What a side effect is, is explained in "5.1.2.3". Calling a function is only a side effect when the function contains a side effect, such as modifying an object, or a volatile access, or I/O.
As they're freely replaceable through loading, and designed for that, I would strongly suggest they are among the most magical areas of the C standard.
We get a whole section for those in the standard: 7.24.3 Memory management functions
Hell, malloc is allowed to return you _less than you asked for_:
> The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and size less than or equal to the size requested
I read the text as saying the object size can be less than or equal to the returned memory size.
Anyway, section 7 is library. As you say, replacing through loading is a common thing to do, so surely the compiler is not free to simply elide an external library function call at will? This is not C++ after all; it must be sensible.
If the function is equivalent to a no-op, and its effects are not marked volatile, the compiler absolutely can elide the call. If there is a side-effect in hardware or in wider systems like the OS, then it must be marked as volatile. If the code is just code, then a function call that does effectively nothing will probably become nothing.
That was one of the first optimisations we had, back with Fortran and COBOL, before C existed - and as B started life as a stripped-down Fortran compiler, the history carried through.
The K&R book describes the buddy system for malloc, and how its design makes it suitable for compiler optimisations - including ignoring a write to a pointer that does nothing, because the pointer will no longer be valid.
You are literally scaring me now. I'd understand such things being done when statically linking or running a JIT, but for a "normal" program, which implementation malloc() will link against is not known during compilation. How can the compiler go, like, "eh, I'll assume free(malloc(x)) is a NOP and drop it" and not break most existing code?
> but for "normal" program which function implementation malloc() will link against is not known during compilation. How can compiler go, like, "eh, I'll assume free(malloc(x)) is NOP and drop it" and not break most existing code?
I'd suspect that eliding suitable malloc/free pairs would not break most existing code, because most existing code simply does not depend on malloc/free doing anything other than, or beyond, what the C standard requires.
How would you propose that eliding free(malloc(x)) would break "most" existing code, anyways?
As an example, user kentonv wrote: "I patched the memory allocator used by the Cloudflare Workers runtime to overwrite all memory with a static byte pattern on free". And compiler would, like, "nah, let's leave all that data on stack".
Or somebody would try to plug in mimalloc/jemalloc or a debug allocator and wonder what's going on.
> As an example, user kentonv wrote: "I patched the memory allocator used by the Cloudflare Workers runtime to overwrite all memory with a static byte pattern on free". And compiler would, like, "nah, let's leave all that data on stack".
Such a program would continue to function as normal; the dirty data would just be left on the stack. If the developer wants to clear that data too, they'd just have to modify the compiler to overwrite the stack just before (or just after) moving the stack pointer.
> Or somebody would try to plug in mimalloc/jemalloc or a debug allocator and wonder what's going on.
Again, that wouldn't be broken. They would see that no dynamic allocations were performed during that particular section. Which would be correct.
I'm a bit skeptical either example is representative of "most" existing software. If anything, the mere existence of __builtin_malloc and its default use should hint that most existing software doesn't care about malloc/free actually being called. That being said...
> As an example, user kentonv wrote: "I patched the memory allocator used by the Cloudflare Workers runtime to overwrite all memory with a static byte pattern on free". And compiler would, like, "nah, let's leave all that data on stack".
Strictly speaking, I don't think eliding malloc/free would "break" those programs because that behavior is there for security if/when something else goes wrong, not as part of the software's regular intended functionality (or at least I sure hope nothing relies on that behavior for proper functioning!).
> Or somebody would try to plug in mimalloc/jemalloc [] and wonder what's going on.
Why would mimalloc/jemalloc/some other general-purpose allocator care that it doesn't have to execute a matching malloc/free pair any more than the default allocator?
I'm not sure debug allocators would care either? If you're trying to debug mismatched malloc/free pairs then the ones the compiler elides are the ones you don't care about anyways since those are the ones that can be statically proven to be "self-contained" and/or correct. If you're gathering statistics then you probably care more about the malloc/free calls that do occur (i.e., the ones that can't be elided), not those that don't.
In any case, if you want to use a malloc/free implementation that promises more than the C standard does (e.g., special byte pattern on free, statistics/debug info tracking, etc.) there's always -fno-builtin-malloc (or memset_explicit if you're lucky enough to be using C23). Of course, the tradeoff is that you give up some potential performance.
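For example (these are existing GCC/Clang options; with them the compiler stops treating malloc/free as recognized builtins and leaves the calls alone):

    /* build with: cc -O2 -fno-builtin-malloc -fno-builtin-free caller.c */
    #include <stdlib.h>

    void roundtrip(size_t n) {
        void *p = malloc(n);   /* with the flags above, these remain real
                                  library calls to whatever allocator you
                                  link in */
        free(p);
    }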
Thank you for putting it in a much more correct and understandable language than I could. That is exactly what I am talking about: if you call __builtin_malloc (e.g. via a macro definition in the libc header), the compiler is free to do whatever it wants. However, calling the "malloc" library function should call the "malloc" library function, and anything else is unacceptable and a bug. There should be no case where the compiler could assume anything about a function it does not see based simply on its name. Neither malloc nor strlen.
see here: https://godbolt.org/z/rMa8MbYox