this post was submitted on 24 Dec 2023
Rust
I don't understand this fixation with undefined behavior. Its origins are in the design decision of leaving the door open for implementations to employ whatever optimization techniques they see fit without the specification getting in the way. This is hardly a problem.
In practical terms, developers are mindful to not rely on those traits because as far as specifications go they have unpredictable implications, but even so they are never a problem. I mean, even in C and C++ it's trivial to tweak the compiler to flag undefined behavior as warnings/errors.
Sometimes it sounds like detractors just parrot undefined behavior as some kind of gotcha in ways I'm not even sure they fully understand.
What problem do you think that undefined behavior poses?
It sounds like you've never had to do real work in a language like C++, where the compiler is always trying to play gotcha with undefined behavior. You can kind of use tools like AddressSanitizer to catch undefined behavior in testing, but you certainly cannot just have a compiler catch it for you like you claim.
I use C++ all the time; undefined behavior is not something I ever encounter. I run the undefined behavior sanitizer often.
From the looks of some of the posts showing up in this thread, I doubt the bulk of the commenters portraying UB as the root cause of any problem have any experience at all with C or C++. They are clearly resorting to unrealistic strawmen to pretend UB is something that it clearly is not. That just goes to show their technical background and the substance behind their claims. I really don't know how this helps advocate for Rust.
I have over a decade of professional experience working with C++, and it's likely you have already used software I worked on.
Throughout those years, the total number of times where undefined behavior posed a problem in any of the projects I worked on was zero.
Please enlighten me about the insurmountable challenges posed by undefined behavior.
Dangling pointers, double frees and the like, mostly. Tell me you haven't run into those and I'll laugh in your face and call you a liar.
Those are bugs you wrote in. UB is not the problem. Your code is the problem.
I ran into bugs. Do you understand that UB is not the problem if you're pushing broken code? It's not the C++ standard that's messing up if you're writing in use-after-free bugs.
The irony of your comment is that some implementations take advantage of UB to prevent programs from crashing and actually continue to work in some scenarios such as use-after-free and even dereferencing null pointers. But that's not caused by UB, is it? Those problems are caused by developers like you and me who didn't know what they were doing and even failed to either pay attention to the errors flagged by the compiler and static code analysis tools, or even failed to onboard one.
I mean, think about it for a second. Let's say we have a magic wand that can update any C and C++ standard version of your choosing, and we specify that each and every single instance where behavior is left undefined is updated to specify that the program should automatically crash. Awesome, no more UB. What does this mean for your code? Is it now bug-free? Is it now working well after crashing all the time due to the code you added? What role did UB play in this mess?
Do you understand this?
I repeat: detractors just parrot undefined behavior as some kind of gotcha in ways I’m not even sure they fully understand.
Could you provide examples of this?
I'm just a noob when it comes to low level languages, having only been in C# and Python. But I took a course on C++ and encountered something that didn't seem right. And I asked and got the "that's undefined behavior". And that didn't quite sit right with me. We don't know what will happen? It'll probably crash? Or worse? How can one not know how a programming language will perform? I felt it was wrong.
Now, it's been quite some time since that happened, and I understand why it's undefined. But I still do not think it should be allowed by default. C and C++ both are "free to do as you want" languages, but I don't think a language should let you do something that's undefined, especially if you aren't aware you're doing it. Everyone makes mistakes, even stupid ones. If we can make a place where undefined behavior simply won't happen, why not go there? If you need some special tricks, you can always drop the guard where you need it. I guess I'm just reiterating the article here though. But that's the point for me: if something can enforce "defined behavior" by default then I'd want that.
If you really want to know, you can. Basically, in most cases it depends on the compiler. Sometimes the hardware. The point is that you should not expect any specific behavior because the standard doesn't specify one.
The standard differentiates between "unspecified" behavior, which is as you describe, and "undefined" behavior, which may be completely nondeterministic at runtime.
And that's the part that irks me a little. How should I expect anything when I don't expect the undefined behavior to begin with?
Say I manage to accidentally do something undefined, I do some math incorrectly on some index, and try to read out of bounds on an array that didn't implement bound guards. Now already it's my fault for several reasons, but in a complex project what the array is, and the details of the array may be "vague", especially if it's not something you did yourself in the project. So as a scenario, it's not completely out there. (some other dev knew the index was "always" right, and did premature optimization and used unguarded arrays instead) Still completely avoidable, but it can happen.
But if only an edge case actually leads to an out-of-bounds read, the problem will probably never happen where the issue is. An experienced dev might never step into this mess, but sometimes this happens when other people change up what others did. I've had similar problems at my workplace, just not with undefined behavior as a result. At the end of the day, you're sitting there with hard-to-know issues that have hard-to-know consequences.
This doesn't require a special programming language to solve; it just requires a guarded array first and foremost, and tests and good reviews might catch the bug as well before production. Which in C++ we were taught about from the beginning. If performance actually was a problem, then I guess we'd maybe still end up with this bug in the example. But my point is something along the lines of, all the good practice comes down to the choices of the individual developer. And the choices of one also affect the next one.
If we could instead place those choices a step up in the chain, however. Have the language enforce safety, unless you specifically say you need to be unsafe. So in my example, either the code would be safe already or someone threw in an unsafe block to do their "fast" read of the array. In one case, I'll get a crash due to a bad read; in the other I'll have to really evaluate why this is unsafe to begin with and apply extra caution. That extra caution goes away when everything is unsafe. Kinda like a small PR will have 15 comments, a huge PR will have fewer. You won't dedicate your all to every line of code, but if a line is tagged "unsafe" you sure as hell will.
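A minimal sketch of that split in Rust, assuming a plain slice of integers. The helper names `read_checked` and `sum_first_n` are made up for illustration. The default access is bounds-checked, and skipping the check has to be spelled out in an `unsafe` block that a reviewer can zero in on.

```rust
/// Safe default: an out-of-range index yields `None` instead of reading
/// past the end of the slice.
fn read_checked(data: &[i32], index: usize) -> Option<i32> {
    data.get(index).copied()
}

/// Hypothetical "fast" path: sums the first `n` elements without a bounds
/// check per element. The check is done once up front, and the unchecked
/// read is confined to a clearly marked `unsafe` block.
fn sum_first_n(data: &[i32], n: usize) -> i32 {
    assert!(n <= data.len(), "n must not exceed the slice length");
    let mut total = 0;
    for i in 0..n {
        // SAFETY: i < n <= data.len(), guaranteed by the assert above.
        total += unsafe { *data.get_unchecked(i) };
    }
    total
}
```

With a three-element slice, `read_checked(&data, 3)` gives `None` rather than garbage, and the only line that needs the extra scrutiny is the one inside `unsafe`.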
Nothing is a magic solution to everything, and unsafe array reads are just a simple example of fucky behavior. I'm not sure if it still stands, but even something like this was/is undefined in C++:
a[i] = i++;
That is to me something you can quickly forget about. And it happens because of compiler optimization that happens even if it breaks the code, because normally the index in the array should be a different variable. Again, here is something that should have obviously stopped me. If the compiler still follows through (I'm sure there are warnings for it) then it's just letting me make an error no one should be allowed to. There is no reason this should ever compile.
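As far as I know, C++17 changed this particular case: the right-hand side of an assignment is now sequenced before the left, so the line is well-defined there, while it stays undefined in C and in older C++ standards. For comparison, Rust has no `++` operator at all, so the read and the increment end up as separate statements. A minimal sketch, with `store_then_advance` as a made-up helper name:

```rust
/// Writes the current index into its own slot, then advances the index.
/// The order is explicit because each step is its own statement.
fn store_then_advance(a: &mut [usize], i: &mut usize) {
    a[*i] = *i; // the old value of `i` picks the slot and is the value stored
    *i += 1; // only then is the index incremented
}
```

Starting from `i == 0`, one call stores `0` in `a[0]` and leaves `i == 1`, with no room for the compiler to pick a different order.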
If we can do that for everything at the language level, it's a win in my book.
Easy to say, but in the real world those are things you don't do so who cares? You mostly hear about it from people who use other languages and want to pick on C
I think there's a much higher chance of running into UB in C++ than in C due to the complexity of the language. That's why I don't touch C++ and only use C if a higher level language isn't available. I can understand the C language in its entirety, whereas that's really not feasible with C++, especially at the edges.
So I think it's absolutely fair criticism of C++, but a little tired for C.
Not sure exactly what you mean, could you elaborate or rephrase? What problem does it not pose? I mean any program with undefined behaviour is basically by definition wrong. Avoiding undefined behaviour is definitely a good thing.
Those programs are wrong already. You can insert lots of checks and slow down correct code, but the results are still wrong as the code has a bug.
What do you mean wrong "already"?
This is one of the problems in these discussions about undefined behavior: some people feel very strongly about topics they are entirely unfamiliar with.
According to the C++ standard, "undefined behavior may be expected when this document omits any explicit definition of behavior or when a program uses an erroneous construct or erroneous data." Some examples of undefined behavior still lead to the correct execution of a program, but even so the rule of thumb is to interpret all instances as wrong already.
Some people also feel strongly about topics they are very familiar with 🙂. I have experienced my fair share of undefined behaviour in C++ and it has never been a pleasant experience.
Sure, sometimes use of undefined behaviour works, but this is more dumb luck than a rule of thumb that you should use. The idea of "it's undefined behaviour but it works when I test it so it's fine" is downright dangerous. That kind of careless attitude might be okay for your hobby projects with low stakes, but for any serious professional software development, you're setting a landmine for yourself and/or your coworkers, employer and users.
Undefined behaviour might work when you test it and it might work when you run it the 1000th time. At the same time it may fail on the 1001st run and it can also easily completely fail when compiling again or simply compiling with a different compiler version. Undefined behaviour is not predictable and is always wrong.
See also https://predr.ag/blog/falsehoods-programmers-believe-about-undefined-behavior/
If you had half the experience you claim to have, you'd know that code that triggers UB is broken code by definition, and represents a bug that you introduced.
It's not the language's fault that you added bugs to the code. UB is a red herring.
You missed the whole point of what I said.
By definition, UB does not work. It does not work because by design there is no behavior that should be expected. By design it's up to the implementation to fill in the blanks, but as far as the language spec goes there is no behavior that should be expected.
Thus, code with UB is broken code, and if your PR relies on UB then you messed up.
Nevertheless, some implementations do use UB to add guardrails to typical problems. However, if you crash onto a guardrail, that does not mean you know how to drive. Do you get the point?
Nobody's perfect, and time has shown multiple times that you can't trust human beings with memory safety. I.e. the whole 70% of bugs being memory safety bugs thing. Adding bugs to the code isn't the language's fault, but you can't blame a human being (even an expert) for doing it.
It is however the language's fault to allow UB in the first place. And it's possible to entirely avoid UB in (safe) Rust. So we've seen that the possibility of undefined behavior is not necessary for the vast majority of programming. So I would definitely say it's C and C++'s "fault" for allowing UB in the unrestricted way that it does.
Am I blaming those languages? Nah, it was a different time. We didn't have the technology we have now. But going forward there's no reason to use unsafe languages in greenfield projects. We should move forward with safe languages.
That's perfectly fine. That's not a problem caused by UB, or involving UB.
Again, UB is a red herring.
It really isn't. Again, mindlessly parroting this doesn't give any substance to this claim. Please try to think about it for a second. For starters, do you believe it would make any difference if the C or C++ standard defined how the language should handle dereferencing a null pointer? I mean, on some platforms NULL is a tombstone, but on specific platforms NULL actually points to a valid memory address. The standards purposely leave this as undefined. Why is that? Seriously, think about it for a second.
It really isn't. It's a design choice that reflects the need to work with the widest possible range of platforms. The standards have already been updated with backwards-incompatible changes, but even the latest revisions purposely include UB.
I repeat: I see people mindlessly parroting nonsense about UB when they clearly have no idea what they're talking about.
What is the use case for going beyond maxint? Sure, I can make it defined, but your program will have a bug if you do it. Defined just means platforms with a different behavior have to insert checks all over for something that rarely happens.
Okay, but I would much rather deal with a defined bug than an undefined one. A defined bug is still a bug, but at least it is predictable. A bug from undefined behaviour is chaos and could conceivably do anything.
A defined bug is at least somewhat limited in what it can do. In Rust, many of those cases would panic for instance. That's much better than, say, continuing execution with garbage data.
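A rough sketch of what that looks like, assuming the default unwinding panic strategy (with `panic = "abort"` the process would simply stop instead). `bad_read_panics` is a hypothetical helper that only exists to make the panic observable: indexing a slice out of range panics at the faulty access instead of handing back whatever sits in memory.

```rust
use std::panic;

/// Returns `true` if reading `data[index]` panics.
/// `catch_unwind` turns the panic into an `Err`; the default panic hook
/// still prints its message to stderr.
fn bad_read_panics(data: &[i32], index: usize) -> bool {
    panic::catch_unwind(move || data[index]).is_err()
}
```

For a three-element slice, index 2 reads normally and index 3 panics, every time and on every platform.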
There are many examples of applications that leverage integer overflow, either wrapping around or saturating values.
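In Rust, for instance, that choice is spelled out per operation. A small sketch using the standard integer methods `wrapping_add`, `saturating_add` and `checked_add`; the `add_modes` wrapper is just for illustration:

```rust
/// Adds two `u8` values under three explicit overflow policies:
/// wrap around, clamp at the maximum, or report the overflow.
fn add_modes(a: u8, b: u8) -> (u8, u8, Option<u8>) {
    (
        a.wrapping_add(b),   // modulo 256
        a.saturating_add(b), // clamps at u8::MAX (255)
        a.checked_add(b),    // `None` when the sum does not fit
    )
}
```

`add_modes(255, 1)` gives `(0, 255, None)`, so the code that wants wraparound (hashes, counters) and the code that wants saturation (audio samples, pixel values) each say so explicitly.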
There is nothing to rephrase. I asked what problem you think undefined behavior poses. That's pretty cut-and-dried. Either you think undefined behavior poses a problem, and you can substantiate your concerns, or you don't and talking about undefined behavior being a concern is a moot point.