What's New in Rust 1.44 and 1.45
Jon Gjengset: Ben, we are back.
Ben Striegel: It’s been a while.
Jon: It has been a while, but we’ve been doing it intentionally, right?
Ben: Yes.
Jon: So that we could get two versions at once, yes?
Ben: Yeah, it’s called being efficient. And Rust is all about efficiency.
Jon: Exactly. Really, we’ve just embraced the idioms of Rust. We’ve got some quick housekeeping notes before we start. As you may have noticed, if you follow this podcast, there are now a bunch of This Week in Rust episodes also appearing alongside our voices, Ben. And this is really exciting. This is Nell Shamrell-Harrington, who is the editor of This Week in Rust, who basically reached out and was like, hey, I have an idea for a podcast. Can you help me? And then we went, yeah, that’s a great idea. You’re going to be seeing a lot of those episodes going forward as well. We also have expanded the Discord for Rustacean Station. There’s now also a category there for people who do other kinds of media in the Rust world. In particular, people who do Rust live streams. Whether they’re sort of coding or educational stuff, you can jump in there and find other people that produce other interesting Rust stuff, if you’re tired of listening to just Ben and I chat on like we sometimes do.
But now I think we’re ready to talk about Rust 1.44, Ben. What do you think?
Ben: That sounds great. And so I think it’s actually perfect, because 1.44 and 1.45 are both on the smaller side, so it makes sense to cover them both in one episode.
Jon: Yeah, I agree. I think it worked out exactly like we planned it.
Ben: But there are still some important things to go over in terms of, like, interesting digressions, as I think we are accustomed to around here.
Jon: Particularly prone to, it’s true.
Ben: Yes.
Jon: The first one is that cargo tree got integrated into Cargo. Do you want to explain a little bit what cargo tree is?
Ben: I actually don’t know what it is. I didn’t hear about this.
Jon: Oh, fantastic. Then I can explain what it is. So cargo tree is a command that I’ve been using a lot, actually. cargo tree used to be this separate plugin you could install for Cargo, where if you ran it, it would print a little tree of all your transitive dependencies and what version they were at. And it would also let you do inverse trees, so you could say, show me all of the things in my dependencies that depend on, say, tokio, and it would show you all the things in your dependency tree that depend on tokio. It’s just a really nice way to, like, explore your dependencies. Try to figure out how many you have, whether some of them might be, like, outdated or old, whether you’re accidentally pulling in lots of crates you weren’t expecting. And now this whole tool has been integrated into Cargo. So starting with 1.44, you can now just run cargo tree and you’ll see this, like, pretty output of your dependency tree that you can then do various interesting sort of introspections on, to try to figure out what’s going on.
Ben: Now to clarify, over the years, I have used a few different sort of tools for this purpose. Is it the one that prints out a Graphviz dot file?
Jon: No, it prints it in the terminal. I did not know there was one that printed a Graphviz file.
Ben: Oh yeah, this one gives you a nice, like, Graphviz file, which, for a very large dependency graph, is sometimes overwhelming, but also useful.
Jon: Oh, that’s really cool. The one I really hope gets integrated is cargo outdated, which I use just constantly. You can use cargo tree a little bit like this. In fact, one of the really helpful ways I’ve been using cargo tree is sometimes I find that my crate ends up pulling in two versions of some downstream crate. Like 0.2 and 0.3 or whatever, like two major versions. And cargo tree is a really useful way to try to visualize, why is it pulling in both? Because I can tell it, like, list all the things in my dependencies that pull in, I don’t know, like rand 0.6, and then it will just show you, here are the ones that depend on rand 0.6 below you. It’s a really handy tool for just understanding your dependencies better.
Ben: And then if you also go on to act on that, and try and say, like hey, to your upstream dependencies. Okay, you have this out of date thing. Let me, like, update this for you, that helps the entire ecosystem to avoid the same problem.
Jon: Yeah, exactly.
Ben: So it’s a nice little public service, if you choose to go that route.
Jon: Yeah, I’ve had some pretty fascinating, like deep explorations there, where I go, okay, I need to update crate A. But crate A depends on crate B. So I need to update crate B. But crate B depends on crate C. And so in the end, I have, like, submitted five pull requests that are all just like, bringing things up to date one by one. This is really handy, if you’re part of like, the Hacktoberfest, for example. It works out very nicely, if you just want to do lots of small contributions that actually end up helping the ecosystem.
The other thing that landed in 1.44 was, you can use async/await now in no_std contexts as well. This is really cool, but there’s also some interesting stuff going on under the hood for why this is now possible. And what mechanism is there that makes this possible? You want to speak a little bit to why this is interesting in the first place?
Ben: Sure. I think first, let’s just quickly go over no_std. For those who don’t know, it is a kind of profile that you can use with Rust, that is intended for embedded devices. Or I’m not sure what the correct term of art for it is, like low-memory devices, like low-power devices, like low-capability devices. Imagine kind of like, an incredibly simple microcontroller that you could like, stick on the wall of a factory, say, and have it run on a very small battery for months or years at a time. It’s a very different sort of programming than you would use for, say, making a desktop app, or any kind of server application. And C is often used for this purpose. And Rust is able to also do this sort of thing, because Rust wants to be able to be used as an alternative to C wherever possible. But it does impose some restrictions, and in particular you lose a large portion of the standard library.
But one of the interesting things about Rust is that Rust doesn’t want you to ever lose any language features as a result of using no_std mode. There are some languages out there that try to support this use case in terms of wanting to have both the high-level niceties, like a nice vector abstraction that can be allocated, and also the low-level stuff. And some of those languages impose extra restrictions: if you go this route with the more limited profile, you also can’t use, say, exceptions or whatever, or maybe closures. And Rust is very carefully designed to not have that happen.
Jon: Yeah, it’s interesting because one of the places where no_std comes up a lot is when you don’t have an operating system. Like, you don’t have things like threads. You don’t have a file system. You don’t have a networking stack. And so all of those things, you’re going to have to sort of do yourself.
Ben: And in Rust, too, if you use no_std, you wouldn’t have any library support from the standard library for these things. But you would still have closures, and the Result type, and everything else would just still work. Even panics work in no_std. They’re a bit different semantically. And the only exception to this, for the past few months that it’s been stable anyway, was async/await. And the reason that it didn’t quite work just yet was because of some implementation details about async/await and how it works under the hood, and we may have spoken about this back when async/await was first stabilized. But it’s worth reiterating.
Async/await is implemented in the Rust compiler by an abstraction called generators, and you may have heard that term from other languages like Python. The idea is that a generator is like a function. If you know a function, you’ve already got a handle on what a generator is. Normally, a function would run, you start it and it runs to completion, and you get a thing back. A generator is a function that you can kind of pause in the middle, get a thing back, and then resume again. And it’s very useful to, say, model an async/await sort of style as a function that has a loop in the middle, and after each iteration you pause and then resume, beginning the loop over and over and over again.
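As a rough illustration of the pause-and-resume idea, here is a minimal sketch in stable Rust. It is a hand-written state machine, not the compiler’s actual generator machinery, and the names Countdown, Step, and resume are hypothetical ones chosen for this example.

```rust
/// What a single `resume` call hands back to the caller.
#[derive(Debug, PartialEq)]
enum Step {
    /// The computation paused and produced an intermediate value.
    Yielded(u32),
    /// The computation ran to completion.
    Complete(&'static str),
}

/// The state that has to survive between two resumes.
struct Countdown {
    remaining: u32,
}

impl Countdown {
    fn new(start: u32) -> Self {
        Countdown { remaining: start }
    }

    /// Run until the next pause point, then hand control back to the caller.
    fn resume(&mut self) -> Step {
        if self.remaining == 0 {
            Step::Complete("liftoff")
        } else {
            let current = self.remaining;
            self.remaining -= 1;
            Step::Yielded(current)
        }
    }
}

fn main() {
    let mut countdown = Countdown::new(3);
    // The caller decides when to resume, much like an executor polling a future.
    loop {
        match countdown.resume() {
            Step::Yielded(n) => println!("paused with {}", n),
            Step::Complete(msg) => {
                println!("finished: {}", msg);
                break;
            }
        }
    }
}
```

The compiler builds a state machine along these lines for every async fn, which is why no operating system support is needed for the pausing itself.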
And it’s useful to be able to do this. There are various enhancements that would make it even more useful. For the moment, the only way to use generator in Rust is through async/await. Generators are not exposed stably to any kind of like, you know, to any stable code. And one of the future enhancements is the idea of passing in arguments to the generator, on each resume step. But there are some, you know, there’s some syntactic questions. There are some semantic questions, implementation questions. And in this case, the idea was internally, the Rust compiler was passing in arguments to each new iteration of the generator as it resumed. But there wasn’t really any way to do it on the MVP that shipped last year, without using task-local storage and task-local storage is something that is generally provided by the operating system. I mean, I guess they— let’s say thread-local storage, really.
Jon: Yeah, it’s thread-local storage.
Ben: Thread-local storage. So if you are no_std, just no luck. But thanks to some heroic efforts to basically add the support for actually passing in proper arguments to generators, now you can use these in no_std contexts and, like, I’m not sure... so do you have any examples of, like, what sort of thing a no_std application might want to use async/await for, Jon?
Jon: Well, so async/await is an interesting model for embedded devices, because you often don’t have threading, right? You might just have one core and you’re not implementing multi-threading because it’s not— like, that’s fairly complicated to implement and might even require special hardware that you don’t have. And so the sort of, like, you have one event loop that manages everything in the system, is actually a fairly attractive model on these platforms. And that’s basically what async/await gives you, right? You have the one executor that runs on the core, and it just keeps track of all of the asynchronous tasks that are either runnable or pending. And so it is really nice to be able to now use async/await in that context, because it means that you can now write that sort of relatively straightforward asynchronous code, and then you could just run it with the executor that you implement for your embedded platform, rather than either having to like, manually write all your futures, or implement full multi-threading, which is its own huge task.
Ben: Yeah, I think it’s fun to see the kind of things that we take for granted. As, you know, people who program generally for these more powerful machines, more capable devices with, like, OSes, that give us all these nice things. And so generally, if you’re writing a normal app, you might even say, hey, I don’t actually even care about using async/await. I’ll just use threads, which I’m used to. It is like, you know, maybe conceptually, a bit simpler, if a bit kind of more freeform, and perhaps error prone. But if you’re writing for an embedded device, you often don’t even have that choice. And so, this is kind of like, you can now do thread-like things, asynchronous things, but in a single threaded context.
Jon: Yeah. You can think of this basically as making it easier to write nice code on embedded platforms. That’s basically what this enables, and it’s interesting, because the only thing that was really missing here was, as you say, the thread-local storage. And that was only needed because futures in general in Rust, when you call poll on them, require this Context argument, and the Context primarily includes the Waker for that future. So the Waker is the thing where, when the future eventually goes to sleep, it yields. It says, I can’t do any more work right now. Maybe there aren’t any more bytes to read, or the buffer is full, or whatever. The Waker is the thing that’s going to mark that future as runnable again. Like, if a packet comes in on the network or something, it’s going to call wake on that future’s Waker. And now that future is going to be polled yet again by the main executor. And that Context, that Waker, was the thing that previously had to be stored in thread-local storage, but that can now be passed through normal arguments, thanks to the work that went into this PR.
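To make the poll, Context, and Waker relationship concrete, here is a minimal sketch of a single-threaded executor that uses only items from core. It is an illustration under simplifying assumptions, not a real embedded executor: block_on, YieldOnce, and add_after_yield are hypothetical names, and the waker deliberately does nothing because the loop just polls again, where a real executor would sleep until wake is called.

```rust
use core::future::Future;
use core::pin::Pin;
use core::ptr;
use core::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Every vtable entry is a no-op: this toy executor busy-polls, so it never
// needs to be told that a future became runnable again.
unsafe fn noop_clone(_: *const ()) -> RawWaker {
    RawWaker::new(ptr::null(), &NOOP_VTABLE)
}
unsafe fn noop(_: *const ()) {}

static NOOP_VTABLE: RawWakerVTable = RawWakerVTable::new(noop_clone, noop, noop, noop);

/// Drive one future to completion on the current (only) thread of control.
fn block_on<F: Future>(mut future: F) -> F::Output {
    // SAFETY: `future` is shadowed here and is never moved again.
    let mut future = unsafe { Pin::new_unchecked(&mut future) };
    // SAFETY: the vtable functions never dereference the data pointer.
    let waker = unsafe { Waker::from_raw(RawWaker::new(ptr::null(), &NOOP_VTABLE)) };
    // The Context, carrying the Waker, is handed to `poll` as a plain argument.
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = future.as_mut().poll(&mut cx) {
            return value;
        }
        // A real executor would wait here (for example for an interrupt)
        // until the waker fires, instead of polling again immediately.
    }
}

/// A future that reports Pending on its first poll and Ready on the second.
struct YieldOnce {
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            // Ask to be polled again; with the no-op waker this does nothing.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

async fn add_after_yield(a: u32, b: u32) -> u32 {
    YieldOnce { yielded: false }.await;
    a + b
}

fn main() {
    println!("{}", block_on(add_after_yield(2, 3)));
}
```

Nothing in block_on touches threads or thread-local storage, which is the property that matters on a platform without an operating system.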
There was another change that sort of, was hidden a little bit, deep down in the changelogs, which is where we like to delve. And that is support for Unicode 13. And this got you a little excited, Ben, can you talk about what this is and why it matters?
Ben: I want to clarify, I don’t think that Unicode 13 specifically adds anything radically new, but I do think it’s an important kind of thing to consider. I know that, like, when you come to Rust, say from, you know, a language that doesn’t impose Unicode, like Python 2 did not, C does not. Some languages do, but in a different way: Java and JavaScript impose a different encoding, but they do use Unicode. But people usually just say, hey, our language uses Unicode, and that’s not quite as precise as you could be. Unicode is a standard, and it has revisions all the time.
So, to talk about what Unicode 13 is: what does a Unicode version add? People kind of think of Unicode as being, like, just a mapping, like, oh, you know, this number maps conceptually to this glyph, and then Unicode doesn’t take care of the font. Unicode doesn’t take care of, like, any sort of thing about actually displaying the glyph. It’s just a mapping. But that’s not quite the case. Unicode has a lot of mappings, but also a lot of different rules about how languages should be treated and how different glyphs should interact. So, to support things like vertical text or right-to-left text, or how different composing characters should interact.
If Unicode were just a mapping of numbers, code points, to glyphs in human language, that would actually be pretty easy. Because you can imagine that adding new characters would never, like, break old code: as long as you weren’t using this previously unregistered code point, it wouldn’t be a problem. And that is the bulk of changes to Unicode; every new revision is mostly just adding things. I believe Unicode 13 added about 5,900 new characters, including four brand new scripts, and around 55 new emoji, and these are all things that are just strict additions, and they shouldn’t ever have an impact on your code.
But they do, I think— there are other things in there, like I mentioned, you know, like the different supports for, like, scripts that go right to left, or top to bottom, that sort of thing. Certain things can sometimes change, either because there was an error in the previous specifications. Or just, you know, things have changed because human language is sort of a global mutable value, and you can’t ever just sit on your heels and, you know, think about, hey, like, this is— we’re done. Things will change and will be added.
Jon: It’s funny, because in some sense, like, language is like an additional version dependency that you have, that’s sort of unstated.
Ben: Yeah, and I do want to— I think we’re going to— let’s break the timeline a bit. I want to talk about it quickly because it makes sense here, a feature from the next Rust release, 1.45, which is—
Jon: How convenient, that we’re also talking about that version.
Ben: How convenient, that you and I had the foresight to discuss these two versions in this one episode. Yes, there is a new feature in the next Rust version, 1.45 (I say next, but it’s been released for weeks now), which allows you to query the standard library and ask, hey, what version of Unicode do you support? Because if you wanted to be, like, really robust about your Unicode handling, you would keep in mind the things that change from version to version of Unicode, and then add in support in your Rust code that might do things differently, based on what version of Unicode the standard library currently supports. And the standard library will only ever support a single version of Unicode at a time, and behavior changes that come from a new Unicode version are exempt from the normal Rust stability guarantees, because Rust has no control over whatever Unicode does. And being on an old version of Unicode is liable to cause even more problems in the long run. So Rust just kind of has to, like, suck it up. Unicode is generally pretty responsible about this, as any kind of large project is; they try not to break things without reason, but still, things might change. Even just bug fixes might change behavior. The std::char::UNICODE_VERSION constant now tells you the version of Unicode that you can expect the standard library to support.
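A small sketch of querying that constant, for illustration. std::char::UNICODE_VERSION is a tuple of three u8 values (major, minor, patch); the helper supports_unicode_major is a hypothetical name for this example.

```rust
use std::char::UNICODE_VERSION;

/// Returns true when the standard library's Unicode tables are at least
/// the given major version.
fn supports_unicode_major(required: u8) -> bool {
    let (major, _minor, _patch) = UNICODE_VERSION;
    major >= required
}

fn main() {
    let (major, minor, patch) = UNICODE_VERSION;
    println!("std implements Unicode {}.{}.{}", major, minor, patch);

    if supports_unicode_major(13) {
        println!("Unicode 13 characters are known to char methods");
    }
}
```

The printed version depends on the toolchain used to build the program, since each Rust release ships exactly one set of Unicode tables.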
Jon: Yeah, I feel like Unicode is in a position where it’s so widely used, that they’re probably very, very wary of making, like, backwards-incompatible changes. It’s probably mostly sort of bug fixes and additions would be my guess.
Ben: But it’s also such a large and subtle specification, that changing anything could cause anything to change. So really, the choice is either don’t ever change anything, or try and carefully change things, which hopefully won’t break things.
Jon: That’s true.
Ben: But you can never be sure.
Jon: Another cool change that sort of seems minor, but might have impact for some of you, is that rustc now actually respects the codegen-units flag in incremental mode, which might not mean very much to you. But the basic idea here is that when Rust compiles your code, it generally compiles each crate separately. And each crate is compiled sort of like a unit, like it’s compiled as a single thing, and it used to be, back in the day, that only one sort of, almost, one thread could compile any given crate at a time. Now that’s changed a little bit since, where it was able to split the compilation of any given crate, using what are called codegen units. But this used to be disabled in incremental mode, which is what the compiler usually compiles your code with. This is the mode that makes it so that if you compile your crate, and then you make a small change, and then you compile it again, that compile is much faster than it was if you just compiled your crate from scratch. And one thing that landed in 1.44 was the ability for the compiler to, or the willingness for the compiler to, use multiple threads to compile a single crate, even in that incremental mode. And what this means is that at least in theory, if you were building relatively few crates, like let’s say you don’t have many dependencies, you’re really just building your crate, you might see that build be significantly faster after 1.44, just because it uses more threads to compile that code now. So that’s kind of neat.
Ben: I wanted to talk about some of the const work that’s been going on in the standard library.
Jon: Oh yeah, always.
Ben: In this release there’s a few new const APIs, things about, I think, little-endian and big-endian conversions are now const capable for integers. So things that should be const are becoming const, as the const machinery gets more mature. One thing in particular I wanted to highlight was Vecs, and Vec constructors and how those work. And so, for example, you may be familiar with the vec![] macro for making a Vec. And this is probably ubiquitous, I think, for people who construct Vecs these days. One problem with that macro constructor, though, was always that, under the hood, it was doing things like, you know, pushing to a Vec or hitting append on a Vec. And the problem is that the normal Vec::new constructor actually doesn’t allocate, which is a fun thing about Rust, that often things which are empty won’t allocate. Or things which can allocate will not allocate until you actually need them to allocate. Vec can be a zero-sized type, I believe, right? Or am I wrong in thinking that?
Jon: It’s not a zero-sized type, it still takes up a space on the stack. It’s just that it doesn’t allocate anything on the heap until you actually put anything into it.
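A minimal sketch of that distinction: the Vec handle always has a nonzero size on the stack, but an empty Vec reports a capacity of zero until something is pushed. The helper capacity_before_and_after_push is a hypothetical name for this example, and the exact size of the handle is an implementation detail.

```rust
use std::mem::size_of;

/// Returns the capacity of a fresh Vec before and after its first push.
fn capacity_before_and_after_push() -> (usize, usize) {
    let mut values: Vec<u64> = Vec::new();
    let before = values.capacity();
    values.push(1); // the first push is what triggers a heap allocation
    (before, values.capacity())
}

fn main() {
    // The handle itself (pointer, capacity, length) occupies stack space.
    println!("size of the Vec handle: {} bytes", size_of::<Vec<u64>>());

    let (before, after) = capacity_before_and_after_push();
    println!("capacity before push: {}, after push: {}", before, after);
}
```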
Ben: That sounds right. But so, actually, there’s zero heap allocation for just making an empty Vec. And previously, this constructor for Vec was made const. And so now you could actually have, like, say, a const value, or a static of Vec, and you initialize it with just Vec::new, and it would just work because there wouldn’t need to be any actual allocation at compile time, which is kind of a weird concept. And so this would totally work. And now the macro constructor for a Vec, if you have an empty vec![], that now maps directly to Vec::new(), so that now works in const context.
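A short sketch of what this allows, assuming a toolchain at 1.44 or later. The constant and static names are made up for the example; the byte-order line illustrates the const integer conversions mentioned earlier.

```rust
// An empty Vec needs no heap allocation, so it can be built at compile time.
static EMPTY_FROM_NEW: Vec<u32> = Vec::new();

// Since 1.44 the empty `vec![]` expands to `Vec::new()`, so it is allowed
// in a const initializer as well.
const EMPTY_FROM_MACRO: Vec<u32> = vec![];

// Integer byte-order conversions can also be evaluated at compile time.
const BIG_ENDIAN_BYTES: [u8; 4] = 0x1234_5678_u32.to_be_bytes();

fn main() {
    println!("static is empty: {}", EMPTY_FROM_NEW.is_empty());
    println!("const capacity: {}", EMPTY_FROM_MACRO.capacity());
    println!("bytes: {:02x?}", BIG_ENDIAN_BYTES);
}
```

A non-empty vec![1, 2, 3] still cannot appear in a const, because that would require allocating at compile time.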
Now let me change gears and propose a radical alternative. So I think people are used to, say, in Rust, if you want to make a heap-allocated string, there’s a bunch of different ways you could do it. We had a long war about this, back before 1.0: what should be the idiomatic way of making a heap-allocated string? Should we, like, have the string literal and then call .to_string()? Should we call .into() and just have it turn into a string based on, like, you know, type inference? And the one that won out was eventually String::from with the string literal. The reason for that was kind of, it best exemplifies the idea that a String and a str are different things in Rust. As we mentioned before with the no_std use case, no feature of Rust should be unavailable to no_std users. And one of those features that needs to work is string literals. And so you can’t just have a string literal make a String implicitly, unless you want your no_std users to get grumpy at you.
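For reference, a small sketch of the three spellings discussed. They all produce the same heap-allocated String from a borrowed string literal; owned_three_ways is a hypothetical helper name.

```rust
/// Builds an owned String from a borrowed str in three equivalent ways.
fn owned_three_ways(literal: &str) -> (String, String, String) {
    let from = String::from(literal); // explicit constructor
    let to_string = literal.to_string(); // goes through the ToString trait
    let into: String = literal.into(); // target type chosen by inference
    (from, to_string, into)
}

fn main() {
    // The literal itself is a `&'static str`: no heap allocation involved.
    let literal: &'static str = "hello";
    let (a, b, c) = owned_three_ways(literal);
    println!("{} {} {}", a, b, c);
}
```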
In Rust, I’m sure many beginners hit this, where it’s like, you have this string literal, but it’s not a string in terms of, like, a String that lives on the heap. You actually need to do a thing to make it a String. And the idiomatic way of doing that is String::from, which I’m sure some people will still dispute. It’s beside the point, the idea being that Strings and Vecs in Rust are kind of related, they feel alike, right? Where a String feels like just a Vec containing a certain kind of type, right? Containing Unicode. But it was a bit different. There was no, like, string macro for making a string. It’s this constructor, String::from. But the Vec macro is different, and the reason for that was a bit complicated. There always did exist a Vec::from constructor, and you could give it a slice of an array, and it would make the Vec for you. But there were problems with this, one being that the Vec had to be created by cloning elements out of the slice, because a slice is borrowed. And there wasn’t really a way of just passing in an array literal, like an actual stack array, to the Vec. And the reason for that is something I think we’ve possibly discussed before, which is the absence of the ability to have type-level numbers, also called “const generics” sometimes in a broader context...
Jon: Yeah, I think we talked about this a little bit last time.
Ben: Right. Or even before. There’s some cool things happening there. But the idea being that there are certain things in the Rust standard library that are only implemented for arrays up to size 32. And that’s always been kind of regarded as a hack. Where it’s like, well, people like to use arrays of any arbitrary size. Why don’t we just implement it for all of them? The problem is that there are quite a few numbers in math—
Jon: Yeah, it’s weird.
Ben: Even in computer math, which has, you know, fewer than infinite numbers. And it’s just not feasible to try to implement the same function. Even with macros, even with macros writing macros, to have the same function defined, say, four billion times. So this is what you need const generics for. You need the ability to write a single function that then on demand, says, oh, you’re trying to call this function with an array of size 1612. Okay, here’s that implementation for you, without needing to actually generate four billion other implementations of that one function, for each and every function you might want to write. Recently with the work on const generics, we have been able to kind of start thinking about lifting this restriction. And so I think actually, the support nowadays is in fact mature enough to support this. It’s just, the Rust library team doesn’t want to actually expose this yet, until they’re absolutely sure that const is ready for prime time.
Jon: Yeah, or at least some subset of const that were—
Ben: Yeah, you mentioned before— you want to go over, like, the arbitrary restrictions that we have right now, to keep const from being too exposed?
Jon: Yeah, it’s kind of silly, but also kind of cool. So the idea is that internally, the compiler is allowed to use unstable features because it doesn’t actually leak out into the application code. This is what allows the compiler to, like, implement new features, and then start testing them out internally, even on stable, because the compiler is compiled with nightly. And this means that internally in the compiler, you have access to const generics, so you can easily write, like, an implementation of From<[T; N]>, like an array of Ts of any length, for Vec<T>. You can write that implementation for any N, using const generics in the compiler. But if you did, you’re sort of leaking the fact to the world that you’re using this feature, and that this feature exists. And there are a couple of reasons why you don’t want to do this. But the way they’ve come up with to avoid leaking this is that there’s a trait in the standard library called LengthAtMost32, which is only implemented for arrays of length less than or equal to 32. And so what they have is this impl of From<[T; N]> for Vec<T>, but only where the array of that length implements that trait. And the only thing this trait does is limit what implementations that applies to. And so the moment that const generics are actually made stable, they could just remove that trait from the impl, and suddenly the implementation is now visible to the whole world for any N rather than just Ns up to 32.
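The gating trick can be sketched with a toolchain where const generics are stable (1.51 or later). This is an illustration of the pattern, not the standard library’s source: LengthAtMost3 is a hypothetical stand-in for LengthAtMost32, capped at 3 to keep the example short, and array_to_vec is a made-up helper.

```rust
/// Hypothetical marker trait, implemented only for short arrays.
trait LengthAtMost3 {}

impl<T> LengthAtMost3 for [T; 0] {}
impl<T> LengthAtMost3 for [T; 1] {}
impl<T> LengthAtMost3 for [T; 2] {}
impl<T> LengthAtMost3 for [T; 3] {}

/// Written once for every length `N`, but only callable for the lengths
/// that implement the marker trait.
fn array_to_vec<T, const N: usize>(array: [T; N]) -> Vec<T>
where
    [T; N]: LengthAtMost3,
{
    let mut out = Vec::with_capacity(N);
    for item in array {
        out.push(item);
    }
    out
}

fn main() {
    let values = array_to_vec([10, 20, 30]);
    println!("{:?}", values);

    // This would be rejected at compile time, because `[u8; 4]` does not
    // implement `LengthAtMost3`:
    // let too_long = array_to_vec([0u8; 4]);
}
```

Deleting the where clause is all it takes to open the function up to every N, which mirrors how the bound could later be dropped from the real impl.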
Ben: Yeah, and so, as kind of a precursor to lifting that restriction, getting back to my point, and the radical proposal I have, you can now call Vec::from with just an array, using an array literal, a stack-allocated array. And I think this is a great sort of mirror to String::from. And in fact, if it would finally become stable for, you know, arrays larger than 32, I suggest that maybe it should replace the vec! macro, which again, highly radical and contentious...
Jon: Very radical of you.
Ben: Only because I think macros are a little bit, kind of, opaque to editors, and they kind of don’t play super nicely with incremental compilation. And to be clear, none of these are really deal breakers for using the vec! macro specifically, which is a pretty simple macro. But when has something not being a big deal ever stopped programmers from arguing about it?
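For comparison, a tiny sketch of the two spellings side by side, with String::from shown as the mirror Ben mentions. build_both is a hypothetical helper name.

```rust
/// Builds the same Vec with the macro and with the From impl for arrays.
fn build_both() -> (Vec<i32>, Vec<i32>) {
    let from_macro = vec![1, 2, 3];
    let from_array = Vec::from([1, 2, 3]); // array literal moved into the Vec
    (from_macro, from_array)
}

fn main() {
    let (from_macro, from_array) = build_both();
    let text = String::from("abc"); // the String counterpart
    println!("{:?} {:?} {}", from_macro, from_array, text);
}
```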
Jon: That’s right. Wait. So isn’t then the galaxy brain solution here, to not even use String::from, but use Vec::from and list the characters one by one?
Ben: Oh, we could do that. We could add an implementation of Vec::from for bytes, specifically using specialization, which is still unstable, and then you can type Vec::from and then have a b byte literal prefix before every single character.
Jon: Nice. It is interesting that—
Ben: And then we could have... and then to make it nice, we have a macro called string!
Jon: stringify!
Ben: To do that for you.
Jon: It’s stringify_vec, is what it should be called. It’s great.
It is funny, actually, because I looked into this a while back, and it turns out that a String is not a Vec of characters, and this turns out to be important in a bunch of ways. The char type in Rust is actually specifically a Unicode scalar value. It is always four bytes long. It’s like a 32-bit integer that represents a Unicode scalar value, so it doesn’t, like, it doesn’t use UTF-8, for example, whereas a String is a UTF-8 string. So it’s a UTF-8 encoding of that particular string in Unicode. And so if you have a Vec of characters and you want to turn it into a String, you actually need to encode it. It’s not just a straight transformation, which was interesting to me.
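A small sketch of that difference. Collecting chars into a String performs the UTF-8 encoding; chars_to_string is a hypothetical helper name, and the escapes stand for e with an acute accent (U+00E9) and the crab emoji (U+1F980).

```rust
use std::mem::size_of;

/// Encodes a sequence of Unicode scalar values as a UTF-8 String.
fn chars_to_string(chars: &[char]) -> String {
    chars.iter().collect()
}

fn main() {
    let chars = ['h', '\u{e9}', '\u{1F980}'];

    // Every char takes four bytes, regardless of which character it is.
    println!("bytes as chars: {}", chars.len() * size_of::<char>());

    // In UTF-8 the same characters take 1 + 2 + 4 bytes.
    let text = chars_to_string(&chars);
    println!("bytes as UTF-8: {}", text.len());
}
```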
Ben: Getting back to, you know, Unicode being a bit more than just a mapping. It’s a lot of things, and it also specifies behavior, which sometimes changes. Hopefully not too much, though.
Jon: Speaking of things that change, are you ready for this? We also got a change to Rust 1.45. How do you like that segue?
Ben: I’ll give you partial credit for that segue. That’s fine. That’s fine. But I think we’re done with 1.44. Let’s move on to 1.45, which, I don’t have very much to talk about in this. Do you want to start?
Jon: Sure, I can kick us off with what’s been like, basically, a very long journey that speaks a little bit to, like, the interesting low-level interplay of Rust and LLVM, which is the compiler back end used by rustc. So this is an interesting weird quirk in Rust that’s been standing for a long time. I think this issue first came up in 2013.
Ben: Let me even give you some more context. This is the oldest unsoundness issue in the Rust codebase. Also, until recently, it was the oldest open one, by dint of also being the oldest one. So the fact that this dates to well before 1.0 should speak to how thorny this was to actually resolve in practice. Hundreds of comments, dozens and dozens of commenters, over the span of seven or more years, trying to fix this bug.
Jon: And the funny thing is, the bug itself is fairly straightforward. It was, what is the right fix? That was the big debate. So the bug here is basically, if you have a floating point number and you cast it using the as keyword to an integer type that can’t hold the value that’s in the floating point number. So imagine you have the floating point number like 1024.2, and you try to cast that to a u8. Then there is no representation of that number in a u8, because the u8 can’t hold values larger than 256, 255 I guess. And so what do you do? How do you convert this float into that number? Because that’s what the user asked for. And here begins the discussion of what is the correct value to end up in that integer at the end. Do you want to talk a little bit about the debate?
Ben: Sure, I mean, so, generally in Rust, whenever someone is faced with, like what should we do? The first answer to my mind is always, we probably aren’t the first people to have this problem. Let’s just see what other people have done. And so there are various schemes from other languages. I believe the scheme that we eventually settled on is just taken from— maybe Java does this, in terms of, do you, like, if the number’s too big, does it become infinity? Or does it become the maximum number? Or if you have, like a NaN, where does it go, that sort of thing. I believe we just took Java’s. I’m not sure about the full spectrum of, like, different languages doing things. Maybe Java also took it from somewhere else. Or maybe other languages have Java’s semantics as well.
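For reference, a minimal sketch of the behavior that shipped in 1.45: an as cast from float to integer saturates at the integer type’s bounds, and NaN becomes zero. float_to_u8 is a hypothetical helper name.

```rust
/// Casts with `as`, which saturates since Rust 1.45.
fn float_to_u8(value: f32) -> u8 {
    value as u8
}

fn main() {
    println!("{}", float_to_u8(1024.2)); // too large: clamps to u8::MAX
    println!("{}", float_to_u8(-3.5)); // too small: clamps to 0
    println!("{}", float_to_u8(f32::NAN)); // NaN: becomes 0
    println!("{}", float_to_u8(3.9)); // in range: truncates toward zero
    println!("{}", f32::INFINITY as i32 == i32::MAX);
}
```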
Jon: And what happened before you fixed it? Like, before—
Ben: Well, so yeah, that’s a good point. And so you mentioned, like, so it is, right. It is a nice intersection of so, kind of like, Rust tries to not tie itself to any particular codegen back end. And so like, for example, recently, the past year or so or more, people talked about using a compiler called Cranelift as the back end for Rust, the idea being that you would compile Rust code, and LLVM wouldn’t even be involved in the slightest. And this is possible, if you are very careful when you design the language, to not have your behavior depend on any implementation details of the underlying back end, right? And so this takes a lot of effort and a lot of foresight, a lot of caution, but it can be done. With the idea being that like, hey, we don’t want Rust to be implementation-defined for all eternity, and LLVM doesn’t have, like, a specification. You can’t just say hey, look at that. The LLVM spec for what we do is kind of just, you know, they do what they do, is like, we currently do what we do. And so if you ever wanted to write, say, like a GCC back end for Rust or like, you know, there are other, less complete back ends like the— mrustc is a different kind of like, you know, example back end, example re-implementation of the compiler, doesn’t use LLVM. It’s all its own thing.
Jon: Yeah, I think that’s right.
Ben: And then you know, Cranelift too, like there are good reasons to want to write a different back end. And so you want to be careful to not couple any user visible behavior to implementation details of your back end. This is one example where things did leak through, kind of out of necessity. Because obviously, if one of the big things, one of the important features of a language at Rust’s level, is that you want to have support for doing math really, really fast. In some sense, computers are all about doing math really, really fast. And so the most basic things, like just doing operations on floating point numbers, need to be really, really fast. And so LLVM provides various, like intrinsics, I believe is the terminology that LLVM uses— Rust uses that, but I believe they use it as well. And the idea being that whenever Rust says, hey, like, you know, you want to add, like, this integer to this integer. Well we don’t have a routine for that ourselves. Why would we? LLVM has that. We’ll just use that. And because LLVM can, like, based on the platform and various optimization settings, it can do the absolute fastest thing, doing what LLVM does best, which is to make fast code.
Jon: Yeah, so it’s attractive to just do what LLVM does.
Ben: Yeah. You want to just do what LLVM does, as long as you can still guarantee that, you know, the behavior is sane, and does not couple you to really weird semantics. And in this case, they were using, to convert from an integer to a floating point, or is it the other way? I always get confused.
Jon: Floating point to integer.
Ben: Floating point to integer, other way around. They were just using an intrinsic provided by LLVM. And LLVM’s behavior here is kind of based on what C says you can and can’t do with numbers. And I’m going to, I don’t want to talk too much about this because I haven’t quite read up on what C does. I don’t want to, like, betray my extreme ignorance here. But C is not as strict. I mean, like, it makes sense. C was written in the seventies. C is kind of like, you know, advanced assembler. It is designed to be extremely flexible, and not over-constrain what an implementation might want—
Jon: Yeah, and in the seventies, everything was allowed.
Ben: Yeah, and so it was like, it’s totally reasonable. And so nowadays, the things that you think are— C predates floating point numbers as a spec, like IEEE 754 is way after C. It’s like, you think about, you know, all these, like, Unicode, UTF-8 comes after Java, after JavaScript. All these things, these languages have been around for a while. They are venerable. And so things you take for granted just weren’t around back then. C doesn’t really prescribe what you must do in these cases, and so LLVM is like, we do it, we’re mostly, you know, we were originally conceived as a C compiler. We’ll, by default, do what C does. And it turns out undefined behavior is often what is allowed in these cases. And so the debate for many, many years is like, well, do we add checks ourselves for these cases? Will that slow things down? And they tried, they benchmarked, and the answer is yes. And so I believe it just took a long time of figuring out what the— like, how to add in checks and, like— what do you want to do first of all, and then how to add in the checks in a way that minimizes the performance cost.
Jon: And it was important that it wasn’t undefined behavior too, right, because casting a floating point number to an integer type isn’t unsafe. But if it— but the idea is that safe code should never trigger undefined behavior, and this was an exception to that. So there had to be, like, a well defined answer to this that was safe, that did not introduce undefined behavior.
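As a small illustration of the behavior that shipped in 1.45: float-to-integer `as` casts now saturate, with values that are too large or too small clamping to the integer type's bounds and NaN becoming zero. The helper name `cast_to_u8` below is made up for this sketch and is not something mentioned in the episode.

```rust
/// Hypothetical helper wrapping a plain `as` cast so the result is easy to inspect.
fn cast_to_u8(x: f32) -> u8 {
    // Since Rust 1.45 this is a saturating cast rather than undefined behavior.
    x as u8
}

fn main() {
    // In range: the fractional part is truncated toward zero.
    println!("42.9      -> {}", cast_to_u8(42.9));
    // Too large for a u8: clamps to u8::MAX.
    println!("1024.2    -> {}", cast_to_u8(1024.2));
    // Negative: clamps to u8::MIN.
    println!("-5.0      -> {}", cast_to_u8(-5.0));
    // Infinity clamps like any other too-large value.
    println!("infinity  -> {}", cast_to_u8(f32::INFINITY));
    // NaN has no sensible integer value, so it becomes 0.
    println!("NaN       -> {}", cast_to_u8(f32::NAN));
}
```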
Ben: Right. And so, in this case too, good thing you mentioned that, because now let’s time travel back to 1.44. Two of the otherwise overlooked APIs are to_int_unchecked for f32 and f64, and these essentially just give you the old behavior, right? Where it’s like, you’re using these intrinsics, maybe because, like maybe you’ve benchmarked with this new change, which came about automatically in 1.45 and you said, hey, like, actually, I need to be fast here and I’m being careful. My data doesn’t contain any NaNs, or numbers that are too big. I just want the old fast behavior, this straight-through intrinsic. Give me this unsafe method. And that’s what that’s for, essentially. And so the idea being that there shouldn’t be any major performance cost. I think like it was only a few percent in the benchmarks that they looked at—
Jon: A few percent. That’s outrageous.
Ben: But still for some people, that could be too much.
Jon: How can you steal a few percent?
Ben: I don’t know. I mean, like, for different people, different benchmarks, you never know, for someone it might really matter. I mean, I don’t do any like graphics programming, which I imagine— I’m trying to think, even, like, rack my brain. Like, in what context would you even want to, like, convert floating point to integer? It’s kind of weird to even think about. But, like for me. I’m sure someone out there is yelling at me, being like, no, I do that every day, and I’m sure you’re right. I just don’t know it. I assume it’s going to do with graphics programming, where you’re like, you have like this, like, strange fluid, like three dimensional space, and you want to convert it to actual coordinates in a reasonable way. That’s probably what you’re doing. Or like simulations, where you’d be doing the same. So I can imagine that for those people, they want to use these methods, if they have benchmarked—
Jon: But now those methods are rightfully unsafe.
Ben: And also, the problem is fixed. And so it is a victory for Rust, closing extremely old and thorny unsoundness issues, and not throwing up its hands. I mean, I remember back in the day, like, back, literally before Rust 1.0. I remember, like, some people just saying, this is never going to be fixed. This is just, like, people won’t accept any sort of slow down here. I don’t see what we could even do here. And I’m like, no, we must do it eventually. It’s important for— you can’t say, Rust’s idea is, if it compiles without unsafe, it’s totally safe. Modulo bugs. This is a bug. This was a bug, and now it’s fixed. Victory for safety.
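A minimal sketch of opting back into the unchecked conversion with `to_int_unchecked`, stabilized in 1.44. The wrapper `truncate_to_u8` and its range check are assumptions made for this example; the point is only that the caller, not the compiler, has to guarantee the value fits.

```rust
/// Hypothetical wrapper: the caller promises `x` is finite and fits in a u8
/// after truncation.
fn truncate_to_u8(x: f32) -> u8 {
    // Debug-only guard; NaN fails both comparisons, so it is rejected too.
    debug_assert!(x > -1.0 && x < 256.0, "value out of range for u8");
    // SAFETY: the value is not NaN, not infinite, and its truncated form is
    // representable as a u8, which is what `to_int_unchecked` requires.
    unsafe { x.to_int_unchecked::<u8>() }
}

fn main() {
    let samples = [0.0_f32, 12.75, 200.7, 255.0];
    for &s in samples.iter() {
        println!("{} -> {}", s, truncate_to_u8(s));
    }
}
```

Passing NaN or an out-of-range value to `to_int_unchecked` is undefined behavior, so this only makes sense when the input has already been validated and a benchmark shows the saturating cast is actually too slow.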
Jon: I wonder what the now oldest bug is. Like, soundness bug.
Ben: I mean, I can look it up. I mean, GitHub has a nice interface, if you want to go over the next thing while I—
Jon: Yeah, sure.
Ben: Look this up.
Jon: The other cool thing that landed in 1.45 is something I’ve wanted for a really long time, which is function-like procedural macros in expressions, patterns and statements. And this— that’s a little bit of a mouthful. But the basic idea is that if you have a declarative macro, like you write macro_rules! and then the name of the macro and you defined it directly in your code, then you can call that macro in, like, basically anywhere. You can put it in expression context, for example. Like I can say, let x = and then my macro. And this is how you use the vec! macro, for example. It’s usually used in expression context. But it used to be that you couldn’t do this for procedural macros. What this meant was, you basically couldn’t write the vec! macro, for example, as a procedural macro. If you did, you wouldn’t be able to call it in, like, let x = my_vec![]. It just would not compile. Now there was a pretty well-established workaround for this. There was a crate called proc-macro-hack by David Tolnay, that works really well. It basically lets you do this, but it was definitely, like, a hack. It was a workaround. And now, finally, in 1.45, this just works the way that you would expect it to.
If you now write your own procedural macro, that macro can be invoked in expression context, but can also be invoked in a number of other contexts that were previously unavailable. So, for example, you can now use your procedural macro in pattern context, so you can use it with, like an if let, or a match. You can use it in statement context, so you can use it as like, this thing and then semicolon, in the list of Rust statements, which was also previously not possible. And one reason why this is really cool, beyond just making things that should work, work, is that this was the last thing that was holding back the Rocket web framework from working on stable. And so now, finally, with this final feature implemented, you can now use Rocket on stable, without having to enable any features, and so this is definitely, like, a huge win in terms of both the ecosystem, but also just in terms of not being surprising to macro authors. Like I know, I certainly ran into this the first couple of times I tried to write procedural macros myself.
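To make the three positions concrete, here is a sketch using declarative macros, since a procedural macro needs its own crate and would not fit in one file. The macro names `answer!` and `log_it!` are invented for this example. Since 1.45, a function-like procedural macro can be invoked in these same positions.

```rust
// A declarative macro that expands to a literal, usable as an expression
// or as a pattern.
macro_rules! answer {
    () => {
        42
    };
}

// A declarative macro meant to be used as a standalone statement.
macro_rules! log_it {
    ($msg:expr) => {
        println!("log: {}", $msg);
    };
}

fn classify(n: i32) -> &'static str {
    match n {
        // Pattern position: the macro expands to the pattern `42`.
        answer!() => "the answer",
        _ => "something else",
    }
}

fn main() {
    // Expression position, the way `vec!` is normally used.
    let x = answer!();
    // Statement position: the invocation followed by a semicolon.
    log_it!("starting");
    println!("{} is {}", x, classify(x));
}
```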
Ben: I was actually looking at the unsoundness stuff.
Jon: Did you find the issue?
Ben: Oh, I did, yes, actually. So I mean, currently, it’s very silly. It’s, if you cast, or sorry, yeah, if you cast to a trait object of Any, you can then get a type_id, which, according to the docs, is a globally unique identifier. But because of some weaknesses in the hashing algorithm, it’s possible to find birthday attacks, potentially.
Jon: Oh, that’s so stupid.
Ben: To find collisions among these global unique identifiers. Yeah, that’s not— It’s listed as priority low. So it’s kind of extremely silly, but it is. We got to fix it, and there are ways to fix it, it’s kind of no one’s worried right now. So it will get fixed some day. But yeah, that’s the current oldest thing.
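For context, this is roughly what getting a type identifier through `Any` looks like. It is only a sketch of the API being discussed, with a made-up helper name `is_i32`, and it does not demonstrate the collision problem itself.

```rust
use std::any::{Any, TypeId};

/// Hypothetical helper: checks whether a type-erased value is an i32 by
/// comparing its TypeId with the one for i32.
fn is_i32(value: &dyn Any) -> bool {
    value.type_id() == TypeId::of::<i32>()
}

fn main() {
    let number = 5_i32;
    let text = "five";
    println!("number is i32: {}", is_i32(&number));
    println!("text is i32:   {}", is_i32(&text));
}
```

The soundness concern is that code such as downcasting relies on two different types never sharing a `TypeId`.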
Jon: There are a couple of smaller changes in 1.45 that I think are also worth talking about, and one that I’ve actually— that I ran into recently, and only discovered landed in 1.45 because I tried to compile on 1.44 and it failed, was the ability to use the char type, the character type, with ranges. So for the longest time, you’ve been able to say, like, for i in 0..100 using the .. or ..= syntax. And now you can do that with character ranges as well, so you can do things like for c in 'A'..='Z', and it will iterate over all the characters between A and Z. And this actually works for basically arbitrary Unicode code points, which is pretty cool. So you can iterate over things like Emojis if you wanted to. Although you’d want to get the order right, so it’s not like a weird kind of loop, or it just panics. But it is kind of cool that it’s now possible to iterate over ranges of Unicode code points. It’s just like a neat addition to me.
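A short sketch of character ranges as they work from 1.45 onward. The function name `uppercase_letters` is invented for the example. Note that a range whose start is after its end is simply empty, so getting the order wrong produces no iterations.

```rust
/// Hypothetical helper: collects the inclusive range 'A' to 'Z' into a String.
fn uppercase_letters() -> String {
    ('A'..='Z').collect()
}

fn main() {
    // Iterating a char range directly in a for loop.
    for c in 'a'..='e' {
        print!("{} ", c);
    }
    println!();

    println!("{}", uppercase_letters());

    // Ranges work over arbitrary Unicode scalar values, including emoji.
    let faces: String = ('\u{1F600}'..='\u{1F603}').collect();
    println!("{}", faces);

    // A reversed range yields nothing.
    println!("reversed count: {}", ('Z'..='A').count());
}
```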
Ben: I think that’s all I have for 1.45, and so is that all you have as well?
Jon: I have one more, actually, which is— and this is something that, like, has annoyed me so much in the past, which is, imagine that you have a string and it has some prefix or some suffix you want to get rid of. Well, it used to be that there was trim_start_matches and trim_end_matches, which would remove a given sort of pattern from the start or the end of your string, but it would do so repeatedly. So the idea is that you use trim for things like white space, where you want to remove any occurrence of white space, no matter how many there are. But if you want to remove like a string prefix, then you kind of only want to remove it once because it could be that, like there’s like a— imagine there’s a separator like a comma or something, or a double quote. If the thing inside contains one of those, you don’t want to strip that as well, even if it comes at the beginning. And now, finally, the str type has strip_prefix and strip_suffix methods, that basically remove a single occurrence of the input from the start or the end. This is something that, it just saves you writing that function yourself, and it’s really nice.
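The difference between the two families of methods can be sketched as follows, using the double-quote case from the discussion. The helper `unquote_once` is a made-up name for this example. `strip_prefix` and `strip_suffix` return an `Option`, which is `None` when the string does not start or end with the given pattern.

```rust
/// Hypothetical helper: removes exactly one leading and one trailing double
/// quote, or returns the input unchanged if it is not quoted on both sides.
fn unquote_once(s: &str) -> &str {
    s.strip_prefix('"')
        .and_then(|rest| rest.strip_suffix('"'))
        .unwrap_or(s)
}

fn main() {
    // The inner value itself starts and ends with a quote character.
    let raw = "\"\"hi\"\"";

    // trim_matches removes every leading and trailing quote.
    println!("trim_matches: {}", raw.trim_matches('"'));

    // strip_prefix / strip_suffix remove a single occurrence each.
    println!("unquote_once: {}", unquote_once(raw));

    // No prefix present: strip_prefix reports that with None.
    println!("{:?}", "plain".strip_prefix('"'));
}
```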
And then I think there was one small Cargo change that’s worth mentioning. This is also a minor one, like there’s a lot— there are lots of fixes in Cargo in 1.45, but none that are particularly large. But one thing that you might notice is, in the past, Cargo was not great at telling you which files there were errors with. Like, for example, if you tried to compile a file that you didn’t have read permissions to, Cargo would just sort of fail with “access denied”. It wouldn’t tell you which file it failed on. And that has finally now been fixed. Like now, you will actually get the file system context for the various file system errors you get, which is a really nice addition.
That, I think, is everything I had for 1.45.
Ben: Okay. There we go. And I believe you also had some follow-up remarks for other things—
Jon: Yes, so this is a small point. But one thing that might be interesting to you, if you find these podcasts interesting, if you find the This Week in Rust podcast interesting, is that there’s a decent amount of content out there for Rust now, and especially from things like the core team. The core team of Rust recently announced that all of their agendas are now available, completely in public. That includes things like the sort of agendas, and I think even the transcripts from their meetings. And this is a great way just to keep up with what things are important to the Rust core team at the moment. So I recommend you give that a look if you’re interested. And I think that’s all I have.
Ben: And we’ll link to that in the show notes. And, yeah, that’s all that I have, all that you have.
Jon: Great. I guess I will see you again in one or two releases.
Ben: I kind of like this cadence, but I think it will depend on how big each release is. And so I don’t think we’ll— I think now that we have the This Week in Rust, to kind of fill out the roster, I’d say, let’s preserve this for every other by default. Unless there was a really big, like async/await would have deserved its own release.
Jon: Yeah, I think it works pretty well.
Well, so long, farewell, auf Wiedersehen and goodbye, as they say.
Ben: Yeah, as they say in Spain.
Jon: Sure. All right, see you, Ben.
Ben: Farewell.