This document defines clear, one-to-one mappings between primitive types in C and Rust (and possibly other languages in the future). Its purpose is to eliminate ambiguity in type widths, signedness, and binary representation across platforms and languages.
Most of these mappings are obvious, but Rust FFI (Foreign Function Interface) has some nuances and gotchas.
For Git, the only header required to use these unambiguous types in C is git-compat-util.h.
Boolean types
| C Type | Rust Type |
|---|---|
| bool (see note 1) | bool |
Integer types
In C, <stdint.h> (or an equivalent) must be included.
| C Type | Rust Type |
|---|---|
| uint8_t | u8 |
| uint16_t | u16 |
| uint32_t | u32 |
| uint64_t | u64 |
| int8_t | i8 |
| int16_t | i16 |
| int32_t | i32 |
| int64_t | i64 |
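As a minimal sketch of why the fixed-width mappings matter, the struct below mirrors a C struct field for field. The name `ObjectHeader` and its fields are hypothetical and are not taken from Git; the C declaration in the comment is likewise an assumption made for illustration.

```rust
use std::mem::size_of;

/// Hypothetical header shared with C, where the matching declaration would be:
///   struct object_header {
///       uint8_t kind; uint16_t flags; uint32_t size; uint64_t offset;
///   };
#[repr(C)]
pub struct ObjectHeader {
    pub kind: u8,
    pub flags: u16,
    pub size: u32,
    pub offset: u64,
}

/// Total size in bytes, including the padding byte that follows `kind`.
pub const OBJECT_HEADER_SIZE: usize = size_of::<ObjectHeader>();

// Fail the build if a fixed-width type ever has an unexpected width.
const _: () = assert!(size_of::<u8>() == 1 && size_of::<u16>() == 2);
const _: () = assert!(size_of::<u32>() == 4 && size_of::<u64>() == 8);
```

Because every field has a fixed width on both sides, the only layout question left is padding, which `#[repr(C)]` resolves using the C rules for the target.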
Floating-point types
Rust requires IEEE-754 semantics. In C, that is typically true, but not guaranteed by the standard.
| C Type | Rust Type |
|---|---|
| float (see note 2) | f32 |
| double (see note 2) | f64 |
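One way to sanity-check the floating-point mapping is to compare raw bit patterns, since Rust's `f32` and `f64` follow the IEEE-754 binary32 and binary64 formats. The helper names below are hypothetical; a C test could copy a `float` into a `uint32_t` with `memcpy` and compare against the same constants.

```rust
/// Returns the raw IEEE-754 binary32 bit pattern of `x`.
pub fn f32_bits(x: f32) -> u32 {
    x.to_bits()
}

/// Returns the raw IEEE-754 binary64 bit pattern of `x`.
pub fn f64_bits(x: f64) -> u64 {
    x.to_bits()
}
```

For example, `f32_bits(1.0)` is `0x3F80_0000` and `f64_bits(1.0)` is `0x3FF0_0000_0000_0000`.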
Size types
These types represent pointer-sized integers and are typically defined in
<stddef.h> or an equivalent header.
Size types should be used whenever pointer arithmetic is performed, e.g. when indexing an array or describing the number of elements in memory.
| C Type | Rust Type |
|---|---|
| size_t (see note 3) | usize |
| ptrdiff_t (see note 3) | isize |
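The following sketch shows a pointer-plus-length pair crossing the boundary with `usize` standing in for `size_t`. The function `sum_bytes` and its C prototype are hypothetical and exist only for illustration.

```rust
/// Sums `len` bytes starting at `ptr`. Assumed C prototype:
///   uint64_t sum_bytes(const uint8_t *ptr, size_t len);
///
/// # Safety
/// `ptr` must point to at least `len` readable bytes, unless `len` is 0.
pub unsafe extern "C" fn sum_bytes(ptr: *const u8, len: usize) -> u64 {
    if ptr.is_null() || len == 0 {
        return 0;
    }
    // SAFETY: the caller guarantees `ptr` is valid for `len` bytes.
    let bytes = unsafe { std::slice::from_raw_parts(ptr, len) };
    // Widen each byte before adding so the sum cannot overflow a u8.
    bytes.iter().map(|&b| u64::from(b)).sum()
}
```

A Rust caller would pass `slice.as_ptr()` and `slice.len()`; the length already has type `usize`, so no conversion is needed.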
Character types
This is where C and Rust don’t have a clean one-to-one mapping.
A C char and a Rust u8 share the same bit width, so any C struct containing
a char will have the same size as the corresponding Rust struct using u8.
In that sense, such structs are safe to pass over the FFI boundary, because
their fields will be laid out identically. However, beyond bit width, C char
has additional semantics and platform-dependent behavior that can cause
problems, as discussed below.
The C language leaves the signedness of char implementation defined. Because
our developer build enables -Wsign-compare, comparison of a value of char
type with either signed or unsigned integers may trigger warnings from the
compiler.
Note: Rust’s char type is a 32-bit value that holds a Unicode scalar value,
so it does not correspond to C’s char.
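A small sketch of the distinction, assuming the Rust side treats data that C declares as `char *` as plain `u8` bytes. The helper `count_byte` and the constant `RUST_CHAR_SIZE` are hypothetical names introduced for this example.

```rust
use std::mem::size_of;

/// Counts occurrences of `needle` in a buffer that C would declare as `char *`.
/// Working in `u8` gives the same result whether the C `char` is signed or not.
pub fn count_byte(haystack: &[u8], needle: u8) -> usize {
    haystack.iter().filter(|&&b| b == needle).count()
}

/// Width in bytes of Rust's `char`, which is not the same thing as C's `char`.
pub const RUST_CHAR_SIZE: usize = size_of::<char>();
```

Bytes at or above 0x80 compare as expected here, which is exactly the range where a signed C `char` tends to cause surprises.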
Notes
1 This is only true if stdbool.h (or equivalent) is used.
2 C does not enforce IEEE-754 compatibility, but Rust expects it. If the
platform/arch for C does not follow IEEE-754 then this equivalence does not
hold. Also, it’s assumed that float is 32 bits and double is 64, but
there may be a strange platform/arch where even this isn’t true.
3 C also defines uintptr_t, ssize_t and intptr_t, but these types are
discouraged for FFI purposes. For functions like read() and write() ssize_t
should be cast to a different, and unambiguous, type before being passed over
the FFI boundary.
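As a sketch of the advice in note 3, suppose the C side widens the `ssize_t` result of `read()` to `int64_t` before returning it. The Rust side can then turn it into a `usize` byte count explicitly. The function name and the error convention (any negative value is an error) are assumptions made for this example.

```rust
use std::convert::TryFrom;

/// Interprets a `read()`-style result that C widened from `ssize_t` to `int64_t`.
/// Negative values are reported as errors; everything else becomes a byte count.
pub fn byte_count_from_ffi(ret: i64) -> Result<usize, i64> {
    if ret < 0 {
        return Err(ret);
    }
    // On 32-bit targets a large i64 may not fit in usize, so convert checked.
    usize::try_from(ret).map_err(|_| ret)
}
```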
Problems with std::ffi::c_* types in Rust
TL;DR: In practice, Rust’s c_* types aren’t guaranteed to match C types for
all possible C compilers, platforms, or architectures, because Rust only
ensures correctness of C types on officially supported targets. These
definitions have changed over time to match more targets, which means that the
c_* definitions will differ depending on which Rust version Git chooses to use.
Current list of safe, Rust side, FFI types in Git:
- c_void
- CStr
- CString
Even then, they should be used sparingly, and only where the semantics match exactly.
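A minimal sketch of using `CString` and `CStr` while keeping the byte-level view in `u8`. The helper names are hypothetical; only the standard-library calls (`CString::new`, `CStr::from_bytes_with_nul`, `CStr::to_bytes`) are real APIs.

```rust
use std::ffi::{CStr, CString};

/// Builds an owned, NUL-terminated string that can be handed to C via `as_ptr()`.
/// Returns `None` if `s` contains an interior NUL byte.
pub fn to_c_string(s: &str) -> Option<CString> {
    CString::new(s).ok()
}

/// Views a NUL-terminated buffer as its bytes without the terminator.
/// Returns `None` unless the only NUL byte is the final one.
pub fn c_str_bytes(buf: &[u8]) -> Option<&[u8]> {
    CStr::from_bytes_with_nul(buf).ok().map(|c| c.to_bytes())
}
```

Both helpers validate the NUL terminator up front, so malformed input is rejected before a pointer ever reaches C.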
The std::os::raw::c_* types directly inherit the problems of core::ffi, whose definitions change over time and appear to be a best guess at the correct definition for a given platform/target. This probably isn’t a problem for the platforms that Rust currently supports, but can anyone say that Rust got it right for all C compilers on all platforms/targets?
Rust version 1.63.0
    mod c_long_definition {
        cfg_if! {
            if #[cfg(all(target_pointer_width = "64", not(windows)))] {
                pub type c_long = i64;
                pub type NonZero_c_long = crate::num::NonZeroI64;
                pub type c_ulong = u64;
                pub type NonZero_c_ulong = crate::num::NonZeroU64;
            } else {
                // The minimal size of `long` in the C standard is 32 bits
                pub type c_long = i32;
                pub type NonZero_c_long = crate::num::NonZeroI32;
                pub type c_ulong = u32;
                pub type NonZero_c_ulong = crate::num::NonZeroU32;
            }
        }
    }
Rust version 1.89.0
    mod c_long_definition {
        crate::cfg_select! {
            any(
                all(target_pointer_width = "64", not(windows)),
                // wasm32 Linux ABI uses 64-bit long
                all(target_arch = "wasm32", target_os = "linux")
            ) => {
                pub(super) type c_long = i64;
                pub(super) type c_ulong = u64;
            }
            _ => {
                // The minimal size of `long` in the C standard is 32 bits
                pub(super) type c_long = i32;
                pub(super) type c_ulong = u32;
            }
        }
    }
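The two definitions above make `c_long` either `i32` or `i64` depending on the target and the Rust version. As a rough sketch of how code can avoid depending on that choice, a value can be widened to a fixed-width type as soon as it is received. The names `widen_long` and `C_LONG_SIZE` are hypothetical.

```rust
use std::mem::size_of;
use std::os::raw::c_long;

/// Widens a platform-dependent `c_long` to a fixed-width `i64`.
/// This is lossless whether `c_long` is `i32` or `i64` on the target.
pub fn widen_long(value: c_long) -> i64 {
    i64::from(value)
}

/// Width in bytes of `c_long` on the compilation target
/// (4 or 8 with the definitions quoted above).
pub const C_LONG_SIZE: usize = size_of::<c_long>();
```

This only hides the width difference on the Rust side; it does not settle whether `c_long` matches the C compiler's `long`, which is the concern raised in this section.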
Even where std::ffi::c_* maps C types to Rust types correctly, there are still problems. Take c_char, for example: on some platforms it’s u8, on others it’s i8.
Subtraction underflow in debug mode
The following code will panic in debug builds on platforms that define c_char as u8, but won’t where it’s an i8.
    let mut x: std::ffi::c_char = 0;
    x -= 1;
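For comparison, here is a sketch of the same decrement written against an explicit `u8`, where the overflow behavior is spelled out instead of depending on the platform's `c_char` or the build profile. The helper names are hypothetical.

```rust
/// Subtracts 1 with explicit wrap-around: 0 becomes 255 in every build profile.
pub fn dec_wrapping(x: u8) -> u8 {
    x.wrapping_sub(1)
}

/// Subtracts 1 and reports underflow as `None` instead of panicking.
pub fn dec_checked(x: u8) -> Option<u8> {
    x.checked_sub(1)
}
```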
Inconsistent shift behavior
x will be 0xC0 for platforms that use i8, but will be 0x40 where it’s u8.
    // 0x80 is out of range for an i8 literal, so start from a u8 and cast.
    let mut x = 0x80u8 as std::ffi::c_char;
    x >>= 1;
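The difference comes from the kind of shift: Rust shifts unsigned integers logically (zero fill) and signed integers arithmetically (sign fill). The sketch below makes each choice explicit on a `u8` bit pattern; the function names are hypothetical.

```rust
/// Logical shift right by one: the top bit is filled with 0.
pub fn shr_logical(x: u8) -> u8 {
    x >> 1
}

/// Arithmetic shift right by one: the top bit is copied from the sign bit.
/// The value is reinterpreted as i8 for the shift, then back to u8.
pub fn shr_arithmetic(x: u8) -> u8 {
    ((x as i8) >> 1) as u8
}
```

With the input 0x80, the first function yields 0x40 and the second yields 0xC0, matching the two platform behaviors described above.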
Equality fails to compile on some platforms
The following will not compile on platforms that define c_char as i8, but will
where it’s u8. You can cast x, e.g. assert_eq!(x as u8, b'a'), but then you get
a warning on platforms that use u8 and a clean compilation where i8 is used.
    let mut x: std::ffi::c_char = 0x61;
    assert_eq!(x, b'a');
Enum types
Rust enum types should not be used as FFI types. Rust enum types are more like C union types than C enums. For something like:
    #[repr(C, u8)]
    enum Fruit {
        Apple,
        Banana,
        Cherry,
    }
It’s easy enough to make sure the Rust enum matches what C would expect, but that is not the case for a more complex type such as:
    enum HashResult {
        SHA1([u8; 20]),
        SHA256([u8; 32]),
    }
Here the Rust compiler has to add a discriminant to the enum to distinguish between the variants. The width, location, and values of that discriminant are up to the Rust compiler and are not ABI stable.
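One possible way around this, sketched here under assumptions rather than taken from Git, is to keep the Rust enum internal and pass a `#[repr(C)]` struct with an explicit tag across the boundary. The names `HashResultFfi`, `HASH_ALGO_SHA1`, `HASH_ALGO_SHA256`, and `to_ffi`, as well as the tag values, are hypothetical.

```rust
/// Internal Rust type; its layout is left to the compiler.
pub enum HashResult {
    SHA1([u8; 20]),
    SHA256([u8; 32]),
}

/// Hypothetical tag values that the C side would share.
pub const HASH_ALGO_SHA1: u8 = 1;
pub const HASH_ALGO_SHA256: u8 = 2;

/// FFI-facing form: an explicit tag plus a buffer sized for the largest hash.
/// Assumed C declaration: struct hash_result { uint8_t algo; uint8_t bytes[32]; };
#[repr(C)]
pub struct HashResultFfi {
    pub algo: u8,
    pub bytes: [u8; 32],
}

/// Converts the internal enum into the FFI-facing struct.
/// Unused trailing bytes of a SHA-1 result are left as zero.
pub fn to_ffi(h: &HashResult) -> HashResultFfi {
    match h {
        HashResult::SHA1(digest) => {
            let mut bytes = [0u8; 32];
            bytes[..20].copy_from_slice(digest);
            HashResultFfi { algo: HASH_ALGO_SHA1, bytes }
        }
        HashResult::SHA256(digest) => HashResultFfi {
            algo: HASH_ALGO_SHA256,
            bytes: *digest,
        },
    }
}
```

The tag's width, position, and values are now chosen by the code instead of the compiler, so the C side can rely on them.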