Read raw C string to Rust... what's the right way to convert signed to unsigned in this context?

I'm binding some C functions to rust. I'm facing a little problem and I'd like to know the right way to solve it in rust.
Here's the function that I'd like to call from the C API:
extern "C" {
pub fn H5Aread(attr_id: hid_t, type_id: hid_t, buf: *mut c_char) -> herr_t;
}
The function reads something from a file, and stores it in buf.
So, I created this buffer in a vector:
let len: u64 = get_the_length();
let attr_raw_string: Vec<c_char> = Vec::new(); // c_char is equivalent to i8
attr_raw_string.resize(len as usize, 0);
let attr_raw_string_ptr = attr_raw_string.as_mut_ptr();
let read_error = H5Aread(attr_obj, attr_type, attr_raw_string_ptr);
if read_error < 0 {
panic!("...");
}
let result_str: String = String::from_utf8(attr_raw_string);
Now this doesn't compile because from_utf8 expects a Vec<u8>, but Vec<c_char> is a Vec<i8>.
Is there a way to fix this without having to copy and cast the string every time as a new type u8?

You were almost there.
For now, we're going to assume that the C side of your FFI boundary is correct - i.e. it properly generates a null-terminated string.
To read the result back efficiently in Rust, we're going to use CStr. This is a borrowed type referencing a C string in memory (i.e. a *const c_char). It does not allocate, since it is not an owned type.
We then convert this to a &str for the final comparison with what we expected. This is still not an owned type, so all we have created is our Vec<> that we effectively used as a buffer.
The full code is available below and on the playground:
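use std::ffi::CStr;
use std::os::raw::c_char;

#[test]
fn test() {
    let len: u64 = 64;
    // Allocate a zero-initialized buffer. Vec::with_capacity alone would
    // leave the vector empty and its memory uninitialized.
    let mut buffer: Vec<c_char> = vec![0; len as usize];
    let attr_raw_string_ptr = buffer.as_mut_ptr();
    // H5Aread here is the single-argument stand-in used for this test
    let read_error = unsafe { H5Aread(attr_raw_string_ptr) };
    if read_error < 0 {
        panic!("...");
    }
    // Borrow the bytes up to the first nul; `buffer` must outlive `result`
    let result = unsafe { CStr::from_ptr(attr_raw_string_ptr) };
    let result_str = result.to_str().unwrap();
    assert_eq!(result_str, "test");
}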
Three important gotchas:
CStr::to_str() can fail (hence why it returns a Result<&str, _>) when the content of the CStr is not valid UTF-8. This is because both the Rust String and &str types must hold valid UTF-8.
Obviously, your input buffer needs to be at least the size of what your C function will throw into it. Refer to the C side to be able to make this guarantee.
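CStr::from_ptr has several safety requirements that you should keep in mind when using it: the pointer must be non-null and point to a nul-terminated string, the memory must stay valid and unmodified for as long as the returned &CStr is in use, and the lifetime of that &CStr is not tied to anything the compiler can check.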
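To sidestep the i8/u8 mismatch from the question altogether, one option is to allocate the buffer as Vec<u8> and cast only the pointer when calling into C. The following is a minimal sketch, not the HDF5 API: fake_read is a hypothetical stand-in for the real C call and is assumed to write a nul-terminated string into the buffer.

```rust
use std::os::raw::c_char;
use std::string::FromUtf8Error;

/// Hypothetical stand-in for the C reader: writes "test\0" into `buf`.
/// The caller must pass a buffer of at least 5 bytes.
unsafe fn fake_read(buf: *mut c_char) -> i32 {
    let src = b"test\0";
    unsafe {
        std::ptr::copy_nonoverlapping(src.as_ptr() as *const c_char, buf, src.len());
    }
    0
}

/// Reads into a zeroed u8 buffer and turns that same buffer into a String.
fn read_attr(len: usize) -> Result<String, FromUtf8Error> {
    // Allocate as u8 so the Vec can be handed to String::from_utf8 directly.
    let mut buf: Vec<u8> = vec![0; len];
    // Only the pointer is cast; u8 and c_char have the same size and alignment.
    let status = unsafe { fake_read(buf.as_mut_ptr() as *mut c_char) };
    assert!(status >= 0, "read failed");
    // Keep only the bytes before the first nul terminator.
    let end = buf.iter().position(|&b| b == 0).unwrap_or(buf.len());
    buf.truncate(end);
    // Validates UTF-8 and reuses the existing allocation (no copy).
    String::from_utf8(buf)
}
```

Calling read_attr(64) with this stand-in yields Ok("test"). Compared with CStr::from_ptr, this variant produces an owned String and never reads past the end of the buffer, even if the C side forgets the terminator.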

Related

How to convert from &str to [i8; 256]

I am working with a C API through bindings generated automatically by Bindgen. One of the property structs defined by the Rust wrapper has a field of type [i8; 256], which needs to hold a string such as "mystr".
The C declaration is something like this:
typedef struct Properties
{
...
/** Vendor name used to identify specific hardware requested */
char name[MAX_NAME];
...
}
The binding that Bindgen creates looks like this:
pub struct Properties
{
...
pub name: [::std::os::raw::c_char; 256usize],
...
}
My code needs to fill this field somehow. I experimented with something like this:
//property.name is of type [i8; 256]
property.name = *("myName".as_bytes()) as [i8; 256];
which results in an error like this:
non-primitive cast: `[u8]` as `[i8; 256]`
an `as` expression can only be used to convert between primitive types or to coerce to a specific trait object
This should probably actually be something more like a string copy function.
My current solution which seems to work, but isn't very clean, is defining a function to loop through the slice and put its characters into the array one by one. There has to be a better way to do this :
pub fn strcpy_to_arr_i8(out_arr: &mut [i8; 256], in_str: &str) -> Result<(), SimpleError> {
if in_str.len() > 256 {
bail!("Input str exceeds output buffer size.")
}
for (i, c) in in_str.chars().enumerate() {
out_arr[i] = c as i8;
}
Ok(())
}
Rust has byte string literals, so "myName".as_bytes() is unnecessary: b"myName" gives you a reference to a byte array, &[u8; 6] (note the initial b).
But these literals are unsigned bytes (u8). Since you seem to need signed bytes, you need to convert them: (*b"adfadsf").map(|u| u as i8). Note that this yields an array exactly as long as the literal, so it still has to be copied into the 256-element field.
It's also likely that a C string has to be 0-terminated. To get this, you would need to add a \0 explicitly at the end of your string.
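Putting these points together, a copy helper could look roughly like the sketch below. The names copy_str_to_c_array and MAX_NAME are illustrative assumptions (MAX_NAME is set to 256 to match the generated binding), and a plain String is used as the error type instead of SimpleError to keep the example free of external crates.

```rust
use std::os::raw::c_char;

/// Assumed to match the C header's MAX_NAME (256 in the generated binding).
const MAX_NAME: usize = 256;

/// Copies `src` into `dst` as a nul-terminated C string.
/// Fails if the string plus its terminator does not fit, or if the
/// string contains a nul byte that would truncate it on the C side.
fn copy_str_to_c_array(dst: &mut [c_char; MAX_NAME], src: &str) -> Result<(), String> {
    let bytes = src.as_bytes();
    if bytes.len() >= MAX_NAME {
        return Err(format!(
            "input is {} bytes; at most {} fit",
            bytes.len(),
            MAX_NAME - 1
        ));
    }
    if bytes.contains(&0) {
        return Err("input contains an interior nul byte".to_string());
    }
    // Reinterpret each u8 as a c_char (i8 on many platforms, u8 on others).
    for (slot, &b) in dst.iter_mut().zip(bytes) {
        *slot = b as c_char;
    }
    // Explicit terminator right after the copied bytes.
    dst[bytes.len()] = 0;
    Ok(())
}
```

Working on bytes rather than chars matters here: str::len() counts bytes, and casting a multi-byte char to i8 would silently truncate it, whereas copying the UTF-8 bytes preserves the string as C sees it.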

Why does the first character of my string get cut off when I pass from swift to C?

Here is the function in swift to convert from a swift string to a C string
func swiftStringToCString(swiftString: String) -> UnsafeMutablePointer<CString>?{
let convertedCString: [CChar]? = swiftString.cString(using: .utf8)
if let safeConvertedCString = convertedCString {
var cString = UnsafeMutablePointer<CString>.allocate(capacity: 1)
//defer {
// cString.deallocate()
//}
cString.pointee.count = UInt32(safeConvertedCString.count) - 1
cString.pointee.data = UnsafePointer<Int8>(safeConvertedCString)
return cString
}
else
{
return nil
}
}
The CString struct is defined in a C header file:
typedef struct {
const char *data;
uint32_t count;
} CString;
I also have an additional test function which simply prints out the string passed in:
extern void __cdecl testCString(CString *pCString);
When I call
testCString(swiftStringToCString(swiftString: "swiftString"))
This gets printed out:
wiftString
I also noticed that I get the warning
Initialization of 'UnsafePointer<Int8>' results in a dangling pointer
when I do
cString.pointee.data = UnsafePointer<Int8>(safeConvertedCString)
This approach is incorrect. You can't just hold onto pointers into a String's internal storage and expect it to stick around past the current line of code. That's what the warning is telling you.
In order to ensure that a pointer is valid, you need to use .withUnsafeBufferPointer. I would expect something along these lines (utf8CString includes the terminating NUL, so it is subtracted from the count to match your original code):
"swiftString".utf8CString.withUnsafeBufferPointer { buffer in
    var cString = CString(data: buffer.baseAddress!, count: UInt32(buffer.count - 1))
    testCString(&cString)
}
This ensures that the utf8 buffer exists until the end of the block. If you want to hold onto it beyond that, you're going to need to add code to copy those bytes into memory you allocate and release yourself.

Is there a more idiomatic way to keep an optional argument string from being freed?

I want to take command-line arguments in a Rust program and pass them to a C function. However, these arguments are optional and the program should behave differently if no arguments are supplied. I have read the docs for CString::as_ptr, but I had hoped that keeping a local variable holding an Option that contains the argument (if it exists) would keep that String from being freed, as in the following example.
This Rust code:
extern crate libc;
use std::ffi::CString;
extern "C" {
fn print_in_c(opt_class: *const libc::c_char) -> libc::c_int;
}
fn main() {
let mut args = std::env::args();
//skip executable name
args.next();
let possible_arg = args.next();
println!("{:?}", possible_arg);
let arg_ptr = match possible_arg {
Some(arg) => CString::new(arg).unwrap().as_ptr(),
None => std::ptr::null(),
};
unsafe {
print_in_c(arg_ptr);
};
}
Along with this C code:
#include <stdio.h>
int
print_in_c(const char *bar)
{
puts("C:");
puts(bar);
return 0;
}
But this didn't work.
The code prints out the following when passed an argument of "foo":
Some("foo")
C:
Followed by a blank line.
I got the program to print the correct text if I change the Rust code to the following:
extern crate libc;
use std::ffi::CString;
extern "C" {
fn print_in_c(opt_class: *const libc::c_char) -> libc::c_int;
}
fn main() {
let mut args = std::env::args();
//skip executable name
args.next();
let possible_arg = args.next();
println!("{:?}", possible_arg);
let mut might_be_necessary = CString::new("").unwrap();
let arg_ptr = match possible_arg {
Some(arg) => {
might_be_necessary = CString::new(arg).unwrap();
might_be_necessary.as_ptr()
}
None => std::ptr::null(),
};
unsafe {
print_in_c(arg_ptr);
};
}
When run, this prints
Some("foo")
C:
foo
as expected.
This method technically works, but it is awkward to extend to multiple arguments and results in a compiler warning:
warning: value assigned to `might_be_necessary` is never read
--> src/main.rs:19:9
|
19 | let mut might_be_necessary = CString::new("").unwrap();
| ^^^^^^^^^^^^^^^^^^^^^^
|
= note: #[warn(unused_assignments)] on by default
Is there a better way to do this?
The problem is that your code is creating a temporary CString but holding on to just a pointer. The actual CString is dropped, while the dangling pointer is passed to the C function. To understand what's going on, it is useful to expand the pattern match to a more verbose form:
let arg_ptr = match possible_arg {
Some(arg) => {
let tmp = CString::new(arg).unwrap();
tmp.as_ptr()
} // <-- tmp gets destructed here, arg_ptr is dangling
None => std::ptr::null(),
};
Safe Rust prevents dangling pointers by only supporting pointer indirection through references, whose lifetimes are carefully tracked by the compiler. Any use of a reference that outlives the object is automatically rejected at compile time. But you are using raw pointers and an unsafe block which prevents those checks from taking place, so you need to manually ensure proper lifetimes. And indeed, the second snippet fixes the problem by creating a local variable that stores the CString for long enough for its value to outlive the pointer.
The prolonged lifetime comes at the cost of an additional local variable. But fortunately it can be avoided - since you already have a local variable that holds the pointer, you can modify that to store the actual CString, and extract the pointer only when actually needed:
let arg_cstring = possible_arg.map(|arg| CString::new(arg).unwrap());
unsafe {
print_in_c(arg_cstring.as_ref()
.map(|cs| cs.as_ptr())
.unwrap_or(std::ptr::null()));
}
There are several things to notice here:
arg_cstring holds an Option<CString>, which ensures that CString has storage that can outlive the pointer passed to the C function;
Option::as_ref() is used to prevent arg_cstring from being moved into map, which would again free it before the pointer was actually used;
Option::map() is used as an alternative to pattern matching when you want to express "do something with Option if Some, otherwise just leave it as None".
The pattern x.as_ref().map(|x| x.as_ptr()).unwrap_or(null()) can and probably should be moved into a utility function if it is used more than once in the program. Be careful that the function takes a reference to the Option to avoid a move.
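Such a utility function might look like the following sketch. The name opt_cstring_ptr is made up for illustration; the important part is that it borrows the Option, so the CString stays owned by the caller and the returned pointer remains valid for as long as that Option is alive and unmodified.

```rust
use std::ffi::CString;
use std::os::raw::c_char;
use std::ptr;

/// Hypothetical helper: returns a pointer to the C string inside `opt`,
/// or a null pointer when `opt` is None.
/// Takes `&Option<CString>` so the CString is borrowed, not moved and dropped.
fn opt_cstring_ptr(opt: &Option<CString>) -> *const c_char {
    opt.as_ref().map_or(ptr::null(), |cs| cs.as_ptr())
}
```

With this in place the call site shrinks to print_in_c(opt_cstring_ptr(&arg_cstring)), and extending it to several optional arguments only needs one Option<CString> local per argument.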

How do I create and initialize an immutable array?

I want to create an array. I don't need the array to be mutable, and at the time of creation, I have all the information I need to calculate the i-th member of the array. However, I can't figure out how to create an immutable array in Rust.
Here's what I have now:
let mut my_array: [f32; 4] = [0.0; 4];
for i in 0..4 {
// some calculation, doesn't matter what exactly
my_array[i] = some_function(i);
}
And here's what I want:
let my_array: [f32; 4] = array_factory!(4, some_function);
How can I achieve that in Rust?
Here's the macro definition with sample usage:
macro_rules! array_factory(
($size: expr, $factory: expr) => ({
unsafe fn get_item_ptr<T>(slice: *mut [T], index: usize) -> *mut T {
(slice as *mut T).offset(index as isize)
}
let mut arr = ::std::mem::MaybeUninit::<[_; $size]>::uninit();
unsafe {
for i in 0..$size {
::std::ptr::write(get_item_ptr(arr.as_mut_ptr(), i), $factory(i));
}
arr.assume_init()
}
});
);
fn some_function(i: usize) -> f32 {
i as f32 * 3.125
}
fn main() {
let my_array: [f32; 4] = array_factory!(4, some_function);
println!("{} {} {} {}", my_array[0], my_array[1], my_array[2], my_array[3]);
}
The macro's body is essentially your first version, but with a few changes:
The type annotation on the array variable is omitted, because it can be inferred.
The array is created uninitialized, because we're going to overwrite all values immediately anyway. Messing with uninitialized memory is unsafe, so we must operate on it from within an unsafe block. Here, we're using MaybeUninit, which was introduced in Rust 1.36 to replace mem::uninitialized [1].
Items are assigned using std::ptr::write() due to the fact that the array is uninitialized. Assignment would try to drop an uninitialized value in the array; the effects depend on the array item type (for types that implement Copy, like f32, it has no effect; for other types, it could crash).
The macro body is a block expression (i.e. it's wrapped in braces), and that block ends with an expression that is not followed by a semicolon, arr.assume_init(). The result of that block expression is therefore arr.assume_init().
Instead of using unsafe features, we can make a safe version of this macro; however, it requires that the array item type implements the Default trait (and, because the standard library only implements Default for arrays of up to 32 elements, that the array is no longer than that). Note that we must use normal assignment here, to ensure that the default values in the array are properly dropped.
macro_rules! array_factory(
    ($size: expr, $factory: expr) => ({
        // Start from default values, then overwrite each slot
        let mut arr: [_; $size] = ::std::default::Default::default();
        for i in 0..$size {
            arr[i] = $factory(i);
        }
        arr
    });
);
[1] And for a good reason. The previous version of this answer, which used mem::uninitialized, was not memory-safe: if a panic occurred while initializing the array (because the factory function panicked), and the array's item type had a destructor, the compiler would insert code to call the destructor on every item in the array, even the items that were not initialized yet! MaybeUninit avoids this problem because it wraps the value being initialized in ManuallyDrop, which is a magic type in Rust that prevents the destructor from running automatically.
Now, there is a (pretty popular) crate to do that exact thing: array_init
use array_init::array_init;
let my_array: [f32; 4] = array_init(some_function);
PS:
There is a lot of discussion and evolution around creating abstractions around arrays inside the rust team.
For example, the map function for arrays was stabilized in Rust 1.55.
If you wanted to, you could implement your function with map:
let mut i = 0usize;
let result = [(); 4].map(|_| {
    let v = some_function(i);
    i += 1;
    v
});
And there are even discussions around your particular problem, you can look here
Try to make your macro expand to this:
let my_array = {
    let mut tmp: [f32; 4] = [0.0; 4];
    for i in 0..4 {
        tmp[i] = some_function(i);
    }
    tmp
};
What I don't know is whether this is properly optimized to avoid moving tmp to my_array. But for 4 f32 values (128 bits) it probably does not make a significant difference.
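On newer toolchains (Rust 1.63 and later) the standard library also provides std::array::from_fn, which calls a closure or function with each index and collects the results into an array. A small sketch using the some_function from the answers above; build_array is just an illustrative wrapper name:

```rust
/// Same sample factory as in the answers above.
fn some_function(i: usize) -> f32 {
    i as f32 * 3.125
}

/// Builds the array without any `mut` binding, macro, or unsafe code.
/// The length 4 is inferred from the return type.
fn build_array() -> [f32; 4] {
    std::array::from_fn(some_function)
}
```

Here build_array() returns [0.0, 3.125, 6.25, 9.375]. Unlike the Default-based macro, this places no trait requirements on the element type.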

How do I convert a C string into a Rust string and back via FFI?

I'm trying to get a C string returned by a C library and convert it to a Rust string via FFI.
mylib.c
const char* hello(){
return "Hello World!";
}
main.rs
#![feature(link_args)]
extern crate libc;
use libc::c_char;
#[link_args = "-L . -I . -lmylib"]
extern {
fn hello() -> *c_char;
}
fn main() {
//how do I get a str representation of hello() here?
}
The best way to work with C strings in Rust is to use structures from the std::ffi module, namely CStr and CString.
CStr is a dynamically sized type and so it can only be used through a pointer. This makes it very similar to the regular str type. You can construct a &CStr from *const c_char using an unsafe CStr::from_ptr static method. This method is unsafe because there is no guarantee that the raw pointer you pass to it is valid, that it really does point to a valid C string and that the string's lifetime is correct.
You can get a &str from a &CStr using its to_str() method.
Here is an example:
extern crate libc;
use libc::c_char;
use std::ffi::CStr;
use std::str;
extern {
fn hello() -> *const c_char;
}
fn main() {
let c_buf: *const c_char = unsafe { hello() };
let c_str: &CStr = unsafe { CStr::from_ptr(c_buf) };
let str_slice: &str = c_str.to_str().unwrap();
let str_buf: String = str_slice.to_owned(); // if necessary
}
You need to take into account the lifetime of your *const c_char pointers and who owns them. Depending on the C API, you may need to call a special deallocation function on the string. You need to carefully arrange conversions so the slices won't outlive the pointer. The fact that CStr::from_ptr returns a &CStr with arbitrary lifetime helps here (though it is dangerous by itself); for example, you can encapsulate your C string into a structure and provide a Deref conversion so you can use your struct as if it was a string slice:
extern crate libc;
use libc::c_char;
use std::ops::Deref;
use std::ffi::CStr;
extern "C" {
fn hello() -> *const c_char;
fn goodbye(s: *const c_char);
}
struct Greeting {
message: *const c_char,
}
impl Drop for Greeting {
fn drop(&mut self) {
unsafe {
goodbye(self.message);
}
}
}
impl Greeting {
fn new() -> Greeting {
Greeting { message: unsafe { hello() } }
}
}
impl Deref for Greeting {
type Target = str;
fn deref<'a>(&'a self) -> &'a str {
let c_str = unsafe { CStr::from_ptr(self.message) };
c_str.to_str().unwrap()
}
}
There is also another type in this module called CString. It has the same relationship with CStr as String with str - CString is an owned version of CStr. This means that it "holds" the handle to the allocation of the byte data, and dropping CString would free the memory it provides (essentially, CString wraps Vec<u8>, and it's the latter that will be dropped). Consequently, it is useful when you want to expose the data allocated in Rust as a C string.
Unfortunately, C strings always end with the zero byte and can't contain one inside them, while Rust &[u8]/Vec<u8> are exactly the opposite thing - they do not end with zero byte and can contain arbitrary numbers of them inside. This means that going from Vec<u8> to CString is neither error-free nor allocation-free - the CString constructor both checks for zeros inside the data you provide, returning an error if it finds some, and appends a zero byte to the end of the byte vector which may require its reallocation.
Like String, which implements Deref<Target = str>, CString implements Deref<Target = CStr>, so you can call methods defined on CStr directly on CString. This is important because the as_ptr() method that returns the *const c_char necessary for C interoperation is defined on CStr. You can call this method directly on CString values, which is convenient.
CString can be created from everything which can be converted to Vec<u8>. String, &str, Vec<u8> and &[u8] are valid arguments for the constructor function, CString::new(). Naturally, if you pass a byte slice or a string slice, a new allocation will be created, while Vec<u8> or String will be consumed.
extern crate libc;
use libc::c_char;
use std::ffi::CString;
fn main() {
let c_str_1 = CString::new("hello").unwrap(); // from a &str, creates a new allocation
let c_str_2 = CString::new(b"world" as &[u8]).unwrap(); // from a &[u8], creates a new allocation
let data: Vec<u8> = b"12345678".to_vec(); // from a Vec<u8>, consumes it
let c_str_3 = CString::new(data).unwrap();
// and now you can obtain a pointer to a valid zero-terminated string
// make sure you don't use it after c_str_2 is dropped
let c_ptr: *const c_char = c_str_2.as_ptr();
// the following will print an error message because the source data
// contains zero bytes
let data: Vec<u8> = vec![1, 2, 3, 0, 4, 5, 0, 6];
match CString::new(data) {
Ok(c_str_4) => println!("Got a C string: {:p}", c_str_4.as_ptr()),
Err(e) => println!("Error getting a C string: {}", e),
}
}
If you need to transfer ownership of the CString to C code, you can call CString::into_raw. You are then required to get the pointer back and free it in Rust; the Rust allocator is unlikely to be the same as the allocator used by malloc and free. All you need to do is call CString::from_raw and then allow the string to be dropped normally.
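As a rough sketch of that round trip, assuming the C side promises to hand the pointer back unchanged once it is done with it (make_greeting and free_greeting are hypothetical names, not part of any existing API):

```rust
use std::ffi::CString;
use std::os::raw::c_char;

/// Gives up ownership of a Rust-allocated C string.
/// The returned pointer must eventually be passed to `free_greeting`.
fn make_greeting() -> *mut c_char {
    CString::new("hello from Rust")
        .expect("literal has no interior nul")
        .into_raw()
}

/// Reclaims a pointer previously produced by `make_greeting`.
/// Safety: `ptr` must come from `make_greeting`, must not have been
/// freed already, and the string's length must not have been changed.
unsafe fn free_greeting(ptr: *mut c_char) {
    if !ptr.is_null() {
        // Rebuild the CString so that dropping it frees the allocation
        // with the same allocator that created it.
        drop(unsafe { CString::from_raw(ptr) });
    }
}
```

In a real binding both functions would typically be exported with extern "C" so that the C code can call free_greeting itself instead of calling free on the pointer.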
In addition to what @vladimir-matveev has said, you can also convert between them without the aid of CStr or CString:
#![feature(link_args)]
extern crate libc;
use libc::{c_char, puts, strlen};
use std::{slice, str};
#[link_args = "-L . -I . -lmylib"]
extern "C" {
fn hello() -> *const c_char;
}
fn main() {
//converting a C string into a Rust string:
let s = unsafe {
let c_s = hello();
str::from_utf8_unchecked(slice::from_raw_parts(c_s as *const u8, strlen(c_s)+1))
};
println!("s == {:?}", s);
//and back:
unsafe {
puts(s.as_ptr() as *const c_char);
}
}
Just make sure that when converting from a &str to a C string, your &str ends with '\0'.
Notice that in the code above I use strlen(c_s)+1 instead of strlen(c_s), so s is "Hello World!\0", not just "Hello World!".
(Of course in this particular case it works even with just strlen(c_s). But with a fresh &str you couldn't guarantee that the resulting C string would terminate where expected.)
Here's the result of running the code:
s == "Hello World!\u{0}"
Hello World!

Resources