How do I convert a C string into a Rust string and back via FFI?

I'm trying to get a C string returned by a C library and convert it to a Rust string via FFI.
mylib.c
const char* hello(){
return "Hello World!";
}
main.rs
extern crate libc;
use libc::c_char;
// link against libmylib; pass the search path (-L .) via a build script or RUSTFLAGS
#[link(name = "mylib")]
extern "C" {
fn hello() -> *const c_char;
}
fn main() {
//how do I get a str representation of hello() here?
}

The best way to work with C strings in Rust is to use structures from the std::ffi module, namely CStr and CString.
CStr is a dynamically sized type, so it can only be used through a pointer, which makes it very similar to the regular str type. You can construct a &CStr from a *const c_char using the unsafe CStr::from_ptr associated function. It is unsafe because there is no guarantee that the raw pointer you pass to it is valid (in particular, non-null), that it really points to a NUL-terminated C string, or that the string lives long enough.
You can get a &str from a &CStr using its to_str() method, which returns an error if the bytes are not valid UTF-8.
Here is an example:
extern crate libc;
use libc::c_char;
use std::ffi::CStr;
extern "C" {
fn hello() -> *const c_char;
}
fn main() {
let c_buf: *const c_char = unsafe { hello() };
let c_str: &CStr = unsafe { CStr::from_ptr(c_buf) };
let str_slice: &str = c_str.to_str().unwrap();
let str_buf: String = str_slice.to_owned(); // if necessary
}
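The example above assumes hello() never returns NULL; passing a null pointer to CStr::from_ptr is undefined behavior. A minimal sketch of a helper that checks for NULL first and copies the contents into an owned String (c_str_to_string is a hypothetical name, and a CString stands in for the C library so the snippet runs on its own):

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Copies a C string into an owned String, returning None for a null pointer.
/// Invalid UTF-8 sequences are replaced with U+FFFD by `to_string_lossy`.
///
/// # Safety
/// `ptr` must be null or point to a valid NUL-terminated string
/// that stays alive for the duration of this call.
unsafe fn c_str_to_string(ptr: *const c_char) -> Option<String> {
    if ptr.is_null() {
        return None;
    }
    let c_str = unsafe { CStr::from_ptr(ptr) };
    Some(c_str.to_string_lossy().into_owned())
}

fn main() {
    // Stand-in for a pointer returned by C code
    let owned = CString::new("Hello World!").unwrap();
    let s = unsafe { c_str_to_string(owned.as_ptr()) };
    println!("{:?}", s);
    let none = unsafe { c_str_to_string(std::ptr::null()) };
    println!("{:?}", none);
}
```

Copying into a String detaches the result from the C pointer's lifetime, at the cost of one allocation.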
You need to take into account the lifetime of your *const c_char pointers and who owns them. Depending on the C API, you may need to call a special deallocation function on the string. You need to carefully arrange conversions so the slices won't outlive the pointer. The fact that CStr::from_ptr returns a &CStr with arbitrary lifetime helps here (though it is dangerous by itself); for example, you can encapsulate your C string into a structure and provide a Deref conversion so you can use your struct as if it was a string slice:
extern crate libc;
use libc::c_char;
use std::ops::Deref;
use std::ffi::CStr;
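extern "C" {
// returns a string owned by the C library
fn hello() -> *const c_char;
// the C library's function for releasing a string returned by hello()
fn goodbye(s: *const c_char);
}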
struct Greeting {
message: *const c_char,
}
impl Drop for Greeting {
fn drop(&mut self) {
unsafe {
goodbye(self.message);
}
}
}
impl Greeting {
fn new() -> Greeting {
Greeting { message: unsafe { hello() } }
}
}
impl Deref for Greeting {
type Target = str;
fn deref<'a>(&'a self) -> &'a str {
let c_str = unsafe { CStr::from_ptr(self.message) };
c_str.to_str().unwrap()
}
}
There is also another type in this module called CString. It has the same relationship with CStr as String has with str: CString is an owned version of CStr. This means that it owns the allocation holding the byte data, and dropping the CString frees that memory (essentially, CString wraps an owned byte buffer, and it's that buffer that gets dropped). Consequently, it is useful when you want to expose data allocated in Rust as a C string.
Unfortunately, C strings always end with a zero byte and can't contain one inside them, while Rust &[u8]/Vec<u8> are exactly the opposite: they need not end with a zero byte and can contain any number of them inside. This means that going from Vec<u8> to CString is neither error-free nor allocation-free: the CString constructor checks the data you provide for zero bytes, returning an error if it finds any, and appends a zero byte to the end of the byte vector, which may require reallocating it.
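Both behaviors can be observed directly with the standard CString API; a small sketch using only std (the string values are arbitrary):

```rust
use std::ffi::CString;

fn main() {
    // CString::new appends the terminating NUL for you
    let ok = CString::new("hi").unwrap();
    println!("{:?}", ok.as_bytes());          // contents without the terminator
    println!("{:?}", ok.as_bytes_with_nul()); // the bytes C actually sees

    // An interior NUL is rejected; the error says where it was found
    match CString::new("a\0b") {
        Ok(_) => unreachable!(),
        Err(e) => {
            println!("NUL at byte {}", e.nul_position());
            // The rejected bytes are handed back inside the error
            let original: Vec<u8> = e.into_vec();
            println!("{:?}", original);
        }
    }
}
```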
Like String, which implements Deref<Target = str>, CString implements Deref<Target = CStr>, so you can call methods defined on CStr directly on CString. This is important because the as_ptr() method that returns the *const c_char necessary for C interoperation is defined on CStr. You can call this method directly on CString values, which is convenient.
CString can be created from anything that can be converted into a Vec<u8> (CString::new() takes an Into<Vec<u8>> argument), so String, &str, Vec<u8> and &[u8] are all valid arguments. Naturally, if you pass a byte slice or a string slice, a new allocation is created, while a Vec<u8> or String is consumed.
extern crate libc;
use libc::c_char;
use std::ffi::CString;
fn main() {
let c_str_1 = CString::new("hello").unwrap(); // from a &str, creates a new allocation
let c_str_2 = CString::new(b"world" as &[u8]).unwrap(); // from a &[u8], creates a new allocation
let data: Vec<u8> = b"12345678".to_vec(); // from a Vec<u8>, consumes it
let c_str_3 = CString::new(data).unwrap();
// and now you can obtain a pointer to a valid zero-terminated string
// make sure you don't use it after c_str_2 is dropped
let c_ptr: *const c_char = c_str_2.as_ptr();
// the following will print an error message because the source data
// contains zero bytes
let data: Vec<u8> = vec![1, 2, 3, 0, 4, 5, 0, 6];
match CString::new(data) {
Ok(c_str_4) => println!("Got a C string: {:p}", c_str_4.as_ptr()),
Err(e) => println!("Error getting a C string: {}", e),
}
}
If you need to transfer ownership of the CString to C code, you can call CString::into_raw. You are then required to get the pointer back and free it in Rust, not with C's free: the Rust allocator is not guaranteed to be the same one used by malloc and free. To free it, call CString::from_raw on the pointer and let the resulting CString be dropped normally.
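A minimal sketch of that hand-off, assuming the Rust library exposes a pair of functions to C. The names make_greeting and free_greeting are hypothetical; a real library would also mark them #[no_mangle] (spelled #[unsafe(no_mangle)] in the 2024 edition). Here main plays the role of the C caller:

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hands a heap-allocated C string to the caller, who now owns it.
pub extern "C" fn make_greeting() -> *mut c_char {
    CString::new("Hello from Rust").unwrap().into_raw()
}

/// Takes ownership back and frees the string with Rust's allocator.
/// Calling C's free() on the pointer instead would be undefined behavior.
///
/// # Safety
/// `ptr` must come from `make_greeting` and must not be used afterwards.
pub unsafe extern "C" fn free_greeting(ptr: *mut c_char) {
    if !ptr.is_null() {
        // Rebuilding the CString and letting it drop releases the memory
        drop(unsafe { CString::from_raw(ptr) });
    }
}

fn main() {
    let p = make_greeting();
    // The "C side" may read the string while it holds the pointer
    let text = unsafe { CStr::from_ptr(p) }.to_str().unwrap().to_owned();
    println!("{}", text);
    unsafe { free_greeting(p) };
}
```

Pairing every exported allocation with an exported free function keeps allocation and deallocation on the same allocator.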

In addition to what @vladimir-matveev has said, you can also convert between them without the aid of CStr or CString:
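extern crate libc;
use libc::{c_char, puts, strlen};
use std::{slice, str};
// link against libmylib (replaces the removed #![feature(link_args)])
#[link(name = "mylib")]
extern "C" {
fn hello() -> *const c_char;
}
fn main() {
//converting a C string into a Rust string:
let s = unsafe {
let c_s = hello();
// +1 keeps the terminating '\0' in the slice
let bytes = slice::from_raw_parts(c_s as *const u8, strlen(c_s) + 1);
// from_utf8 validates the bytes; from_utf8_unchecked would be UB on invalid UTF-8
str::from_utf8(bytes).unwrap()
};
println!("s == {:?}", s);
//and back (only sound because s ends with '\0'):
unsafe {
puts(s.as_ptr() as *const c_char);
}
}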
Just make sure that when converting from a &str to a C string, your &str ends with '\0' and contains no other '\0'.
Notice that in the code above I use strlen(c_s)+1 instead of strlen(c_s), so s is "Hello World!\0", not just "Hello World!".
(Of course, in this particular case it also works with just strlen(c_s), because the C string literal is still followed by its '\0' in memory. But with an arbitrary &str you couldn't guarantee that the resulting C string would terminate where expected.)
Here's the result of running the code:
s == "Hello World!\u{0}"
Hello World!
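If you'd rather not rely on remembering the trailing '\0', the standard library can check that condition for you without allocating: CStr::from_bytes_with_nul accepts a byte slice only if it ends with exactly one NUL and contains no other. A minimal sketch (as_c_str is a hypothetical helper name):

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Borrows `s` as a C string if it already ends with exactly one '\0'
/// and contains no other NUL bytes; no allocation or copy is made.
fn as_c_str(s: &str) -> Option<&CStr> {
    CStr::from_bytes_with_nul(s.as_bytes()).ok()
}

fn main() {
    let good = "Hello World!\0";
    if let Some(c) = as_c_str(good) {
        // This pointer is what you would pass to puts(); it borrows from `good`
        let ptr: *const c_char = c.as_ptr();
        println!("valid C string at {:p}", ptr);
    }
    // A missing terminator and an interior NUL are both rejected
    println!("{:?}", as_c_str("Hello World!"));
    println!("{:?}", as_c_str("Hello\0World!\0"));
}
```

Returning a &CStr rather than a raw pointer keeps the borrow tied to the original &str, so the compiler still checks the lifetime until the final as_ptr() call.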

Related

Swift: Turning a String into UnsafeMutablePointer<Int8>

I have a C function mapped to Swift defined as
predictX010(inputString: UnsafePointer<emxArray_char_T>!, ypred: UnsafeMutablePointer<emxArray_real_T>!)
I want to input a string in the inputString, but in order to do that I have to play around with emxArray_char_T which is
emxArray_char_T.init(data: UnsafeMutablePointer<Int8>!, size: UnsafeMutablePointer<Int32>!, allocatedSize: Int32, numDimensions: Int32, canFreeData: boolean_T)
My string will consist of let x = ":1580222503,GCP001,007,Male,30,Left,1,IL8 and IL10,0000; 0,281411,-78,521074,-3,344657,132,347776,-93,25,44" and I just cannot figure out how to pass it as the data of the emxArray_char_T.
First, write a wrapper function in your bridging header that accepts char pointers, then build the emxArray_char_T in C, e.g.:
// declared as returning void because the return type wasn't specified
void predictWrapper(const char *aChar, char *bChar) {
// do your casting and call original func predictX010(...)
}
Then in Swift (this isn't going to be pretty):
let arg: String = "some arg"
let arg2: String = "another arg"
// use closures so the pointers don't outlive the strings
arg.withCString { body in // body is UnsafePointer<Int8>
arg2.withCString { body2 in // we'll cast this to UnsafeMutablePointer
// only to satisfy the signature: the C side must not write through it
let arg2Mutable = UnsafeMutablePointer<Int8>(mutating: body2)
//call your wrapper
predictWrapper(body, arg2Mutable)
}
}
You may be able to use the original types and function, but I've always found it easier (less banging my head on the desk) to use the most standard C types you can in Swift and cast to the custom/complex types in C.

Pointer from Rust to C via bindgen: first element is always zero

I use bindgen to generate a C interface for my Rust code. I want to return a structure that contains an Option<Vec<f64>> from Rust to C. In Rust I have created the following structure:
#[repr(C)]
pub struct mariettaSolverStatus {
lagrange: *const c_double
}
which bindgen translates into the following C structure:
/* Auto-generated structure */
typedef struct {
const double *lagrange;
} mariettaSolverStatus;
the corresponding structure in Rust is
pub struct AlmOptimizerStatus {
lagrange_multipliers: Option<Vec<f64>>,
}
impl AlmOptimizerStatus {
pub fn lagrange_multipliers(&self) -> &Option<Vec<f64>> {
&self.lagrange_multipliers
}
}
The idea is to map AlmOptimizerStatus (in Rust) to mariettaSolverStatus (in C). When lagrange_multipliers is None, a null pointer will be assigned to the pointer in C.
Now in Rust, I have the following function:
#[no_mangle]
pub extern "C" fn marietta_solve(
instance: *mut mariettaCache,
u: *mut c_double,
params: *const c_double
) -> mariettaSolverStatus {
/* obtain an instance of `AlmOptimizerStatus`, which contains
* an instance of `&Option<Vec<f64>>`
*/
let status = solve(params, &mut instance.cache, u, 0, 0);
/* At this point, if we print status.lagrange_multipliers() we get
*
* Some([-14.079295698854809,
* 12.321753192707693,
* 2.5355683425384417
* ])
*
*/
/* return an instance of `mariettaSolverStatus` */
mariettaSolverStatus {
lagrange: match &status.lagrange_multipliers() {
/* cast status.lagrange_multipliers() as a `*const c_double`,
* i.e., get a constant pointer to the data
*/
Some(y) => {y.as_ptr() as *const c_double},
/* return NULL, otherwise */
None => {0 as *const c_double},
}
}
}
Bindgen generates a C header and library files that allow us to invoke Rust functions in C. Up to this point I should say that I get no warnings from Rust.
However, when I call the above function from C, using the auto-generated C interface, the first element of mariettaSolverStatus.lagrange is always 0, whereas, all subsequent elements are correctly stored.
This is my C code:
#include <stdio.h>
#include "marietta_bindings.h"
int main() {
int i;
double p[MARIETTA_NUM_PARAMETERS] = {2.0, 10.0}; /* parameters */
double u[MARIETTA_NUM_DECISION_VARIABLES] = {0}; /* initial guess */
double init_penalty = 10.0;
double y[MARIETTA_N1] = {0.0};
/* obtain cache */
mariettaCache *cache = marietta_new();
/* solve */
mariettaSolverStatus status = marietta_solve(cache, u, p, y, &init_penalty);
/* prints:
* y[0] = 0 <------- WRONG!
* y[1] = 12.3218
* y[2] = 2.5356
*/
for (i = 0; i < MARIETTA_N1; ++i) {
printf("y[%d] = %g\n", i, status.lagrange[i]);
}
/* free memory */
marietta_free(cache);
return 0;
}
I would guess that somehow, somewhere, some pointer goes out of scope.
I'm pretty sure the issue lies in your implementation of marietta_solve. Let's walk through it line by line:
let status = solve(params, &mut instance.cache, u, 0, 0);
You've created an AlmOptimizerStatus along with all its inner members. Up to here, everything is kosher (assuming solve doesn't do anything silly).
mariettaSolverStatus {
lagrange: match &status.lagrange_multipliers() {
/* cast status.lagrange_multipliers() as a `*const c_double`,
* i.e., get a constant pointer to the data
*/
Some(y) => {y.as_ptr() as *const c_double},
/* return NULL, otherwise */
None => {0 as *const c_double},
}
}
You then return a raw pointer into a struct that is about to go out of scope and get dropped (status). Inside it is the Option<Vec<f64>> whose buffer you are returning a pointer to.
As a result, this leads to undefined behavior (UB): your vector has been freed, but you still hold a raw pointer to its buffer. Since Rust does not protect you against this when you use raw pointers, no error is reported. Once the memory is freed, anything may overwrite it, such as a later allocation or the allocator itself, which may keep its own bookkeeping data in a freed block; that could explain why only the first element appears corrupted.
You can convince yourself of this with this playground example, where I have replaced the raw pointers with references to trigger the borrow checker.
In order to get out of this problem, you will need to forcibly cause Rust to forget the existence of the vector, like so (playground):
impl AlmOptimizerStatus {
pub fn lagrange_multipliers(self) -> Vec<f64> {
self.lagrange_multipliers.unwrap_or(vec![])
}
}
fn test() -> *const c_double {
let status = solve();
let output = status.lagrange_multipliers();
let ptr = output.as_ptr();
std::mem::forget(output);
ptr
}
Notice the changes:
lagrange_multipliers() now takes self by value and moves the inner vector out of your struct. If you do not want this, you'll need to make a copy of the vector instead. As this wasn't the point of the question, I went with moving it out to keep the code short. Note that None now becomes an empty vector, whose pointer is dangling rather than NULL, so the C side can no longer test for NULL.
std::mem::forget forgets a Rust object, allowing it to go out of scope without being deallocated. This is how you typically pass objects across the FFI boundary, the other option being to allocate the memory via MaybeUninit, std::ptr or other means.
And the obvious gotcha: unless you deal with the memory you have deliberately leaked, it stays leaked. Free it on the Rust side by rebuilding the Vec (Vec::from_raw_parts, which also needs the original length and capacity) and dropping it, for example from a Rust function that C calls when it is done. Calling C's free on it is not safe, because Rust's allocator is not guaranteed to be malloc.
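A minimal sketch of a leak-free variant that also keeps the original None → NULL mapping. It converts the vector into a Box<[f64]> (so capacity equals length and only the length has to travel with the pointer), returns pointer plus length, and exports a matching free function for C to call. LagrangeOut, export_multipliers and free_multipliers are hypothetical names, not part of the asker's generated bindings, and the added len field assumes you are free to change the C-facing struct:

```rust
use std::os::raw::c_double;
use std::ptr;

/// C-compatible view of the multipliers; `data` is NULL when there are none.
#[repr(C)]
pub struct LagrangeOut {
    pub data: *mut c_double,
    pub len: usize,
}

/// Transfers ownership of the multipliers to the caller.
fn export_multipliers(v: Option<Vec<f64>>) -> LagrangeOut {
    match v {
        Some(v) => {
            // into_boxed_slice drops spare capacity, so len alone
            // is enough to rebuild the allocation later
            let boxed: Box<[f64]> = v.into_boxed_slice();
            let len = boxed.len();
            let data = Box::into_raw(boxed) as *mut c_double;
            LagrangeOut { data, len }
        }
        None => LagrangeOut { data: ptr::null_mut(), len: 0 },
    }
}

/// Must be called exactly once by the C side for every exported value.
///
/// # Safety
/// `out` must come from `export_multipliers` and not have been freed already.
pub unsafe extern "C" fn free_multipliers(out: LagrangeOut) {
    if !out.data.is_null() {
        // Reassemble the Box<[f64]> and let it drop with Rust's allocator
        let slice = ptr::slice_from_raw_parts_mut(out.data, out.len);
        drop(unsafe { Box::from_raw(slice) });
    }
}

fn main() {
    let out = export_multipliers(Some(vec![-14.08, 12.32, 2.54]));
    // What the C side would do: read data[0..len]
    let view = unsafe { std::slice::from_raw_parts(out.data, out.len) };
    println!("{:?}", view);
    unsafe { free_multipliers(out) };

    let none = export_multipliers(None);
    println!("null for None: {}", none.data.is_null());
    unsafe { free_multipliers(none) };
}
```

In marietta_solve you would call export_multipliers on the vector moved out of status, so the buffer outlives the function instead of being dropped with status.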

Read raw C string to Rust... what's the right way to convert signed to unsigned in this context?

I'm binding some C functions to Rust. I'm facing a little problem and I'd like to know the right way to solve it in Rust.
Here's the function that I'd like to call from the C API:
extern "C" {
pub fn H5Aread(attr_id: hid_t, type_id: hid_t, buf: *mut c_char) -> herr_t;
}
The function reads something from a file, and stores it in buf.
So, I created this buffer in a vector:
let len: u64 = get_the_length();
let mut attr_raw_string: Vec<c_char> = Vec::new(); // c_char is i8 on most platforms (u8 on some, e.g. ARM Linux)
attr_raw_string.resize(len as usize, 0);
let attr_raw_string_ptr = attr_raw_string.as_mut_ptr();
let read_error = H5Aread(attr_obj, attr_type, attr_raw_string_ptr);
if read_error < 0 {
panic!("...");
}
let result_str: String = String::from_utf8(attr_raw_string);
Now this doesn't compile because from_utf8 expects a Vec<u8>, but Vec<c_char> is a Vec<i8>.
Is there a way to fix this without having to copy and cast the string every time as a new type u8?
You were almost there.
For now, we're going to assume that the C side of your FFI boundary is correct - i.e. it properly generates a null-terminated string.
To efficiently assign and recover this in Rust, we're going to use CStr. This creates a borrowed type referencing a C string in memory (i.e. a *const c_char). This does not allocate, since it is not an owned type.
We then convert this to a &str for the final comparison with what we expected. This is still not an owned type, so all we have created is our Vec<> that we effectively used as a buffer.
The full code is available below and on the playground:
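use std::ffi::CStr;
use std::os::raw::c_char;
// Mock standing in for the real H5Aread (which also takes attr_id and type_id):
// writes a NUL-terminated "test" into buf
#[allow(non_snake_case)]
unsafe fn H5Aread(buf: *mut c_char) -> i32 {
for (i, &b) in b"test\0".iter().enumerate() {
*buf.add(i) = b as c_char;
}
0
}
#[test]
fn test() {
let len: u64 = 64;
// Allocate a zero-filled buffer of len bytes for the C side to write into
let mut buffer: Vec<c_char> = vec![0; len as usize];
let attr_raw_string_ptr = buffer.as_mut_ptr();
let read_error = unsafe { H5Aread(attr_raw_string_ptr) };
if read_error < 0 {
panic!("...");
}
// Borrow the NUL-terminated contents of the buffer; nothing is copied
let result = unsafe {
CStr::from_ptr(attr_raw_string_ptr)
};
let result_str = result.to_str().unwrap();
assert_eq!(result_str, "test");
}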
Three important gotchas:
CStr::to_str() can fail when the content of the CStr is not valid UTF-8 (hence it returns a Result<&str, Utf8Error>). This is because both Rust's String and &str types must be valid UTF-8.
Obviously, your input buffer needs to be at least as large as what your C function will write into it, including the terminating NUL byte. Refer to the C side to be able to make this guarantee.
CStr::from_ptr has a bunch of gotchas that you should at least keep in mind when using it
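To address the original question of turning the Vec<c_char> itself into a String without copying, here is a minimal sketch. Since c_char (i8 or u8 depending on the platform) has the same size and alignment as u8, the allocation can be reinterpreted as a Vec<u8>, truncated at the NUL, and validated with String::from_utf8. buffer_into_string is a hypothetical helper name, and it assumes the Vec's length covers the bytes the C function wrote (e.g. a buffer created with vec![0; len], not Vec::with_capacity, whose length stays 0):

```rust
use std::mem::ManuallyDrop;
use std::os::raw::c_char;

/// Turns a NUL-terminated Vec<c_char> into a String, reusing its allocation.
/// Returns None if there is no NUL or the bytes are not valid UTF-8.
fn buffer_into_string(buf: Vec<c_char>) -> Option<String> {
    // Everything after the first NUL is unused buffer space
    let nul = buf.iter().position(|&c| c == 0)?;
    // Stop the original Vec from freeing the allocation we are about to reuse
    let mut buf = ManuallyDrop::new(buf);
    let (ptr, len, cap) = (buf.as_mut_ptr(), buf.len(), buf.capacity());
    // Safety: c_char has the same size and alignment as u8, and the
    // allocation is now owned solely by `bytes`.
    let mut bytes = unsafe { Vec::from_raw_parts(ptr as *mut u8, len, cap) };
    bytes.truncate(nul);
    String::from_utf8(bytes).ok()
}

fn main() {
    // Simulates a zero-filled buffer that C code wrote "test" into
    let mut buf: Vec<c_char> = vec![0; 64];
    for (slot, &b) in buf.iter_mut().zip(b"test") {
        *slot = b as c_char;
    }
    println!("{:?}", buffer_into_string(buf));
}
```

The resulting String keeps the buffer's original capacity; call shrink_to_fit() on it if the spare bytes matter.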

Is there a more idiomatic way to keep an optional argument string from being freed?

I want to take command-line arguments in a Rust program and pass them to a C function. However, these arguments are optional and the program should behave differently if no arguments are supplied. I have read the docs for CString::as_ptr, but I had hoped that keeping a local variable containing an Option with the argument (if it exists) would keep that String from being freed, as in the following example.
This Rust code:
extern crate libc;
use std::ffi::CString;
extern "C" {
fn print_in_c(opt_class: *const libc::c_char) -> libc::c_int;
}
fn main() {
let mut args = std::env::args();
//skip executable name
args.next();
let possible_arg = args.next();
println!("{:?}", possible_arg);
let arg_ptr = match possible_arg {
Some(arg) => CString::new(arg).unwrap().as_ptr(),
None => std::ptr::null(),
};
unsafe {
print_in_c(arg_ptr);
};
}
Along with this C code:
#include <stdio.h>
int
print_in_c(const char *bar)
{
puts("C:");
puts(bar);
return 0;
}
But this didn't work.
The code prints out the following when passed an argument of "foo":
Some("foo")
C:
Followed by a blank line.
I got the program to print the correct text if I change the Rust code to the following:
extern crate libc;
use std::ffi::CString;
extern "C" {
fn print_in_c(opt_class: *const libc::c_char) -> libc::c_int;
}
fn main() {
let mut args = std::env::args();
//skip executable name
args.next();
let possible_arg = args.next();
println!("{:?}", possible_arg);
let mut might_be_necessary = CString::new("").unwrap();
let arg_ptr = match possible_arg {
Some(arg) => {
might_be_necessary = CString::new(arg).unwrap();
might_be_necessary.as_ptr()
}
None => std::ptr::null(),
};
unsafe {
print_in_c(arg_ptr);
};
}
When run, this prints
Some("foo")
C:
foo
as expected.
This method technically works, but it is awkward to extend to multiple arguments and results in a compiler warning:
warning: value assigned to `might_be_necessary` is never read
--> src/main.rs:19:9
|
19 | let mut might_be_necessary = CString::new("").unwrap();
| ^^^^^^^^^^^^^^^^^^^^^^
|
= note: #[warn(unused_assignments)] on by default
Is there a better way to do this?
The problem is that your code is creating a temporary CString but holding on to just a pointer. The actual CString is dropped, while the dangling pointer is passed to the C function. To understand what's going on, it is useful to expand the pattern match to a more verbose form:
let arg_ptr = match possible_arg {
Some(arg) => {
let tmp = CString::new(arg).unwrap();
tmp.as_ptr()
} // <-- tmp gets destructed here, arg_ptr is dangling
None => std::ptr::null(),
};
Safe Rust prevents dangling pointers by only supporting pointer indirection through references, whose lifetimes are carefully tracked by the compiler. Any use of a reference that outlives the object is automatically rejected at compile time. But you are using raw pointers and an unsafe block which prevents those checks from taking place, so you need to manually ensure proper lifetimes. And indeed, the second snippet fixes the problem by creating a local variable that stores the CString for long enough for its value to outlive the pointer.
The prolonged lifetime comes at the cost of an additional local variable. But fortunately it can be avoided - since you already have a local variable that holds the pointer, you can modify that to store the actual CString, and extract the pointer only when actually needed:
let arg_cstring = possible_arg.map(|arg| CString::new(arg).unwrap());
unsafe {
print_in_c(arg_cstring.as_ref()
.map(|cs| cs.as_ptr())
.unwrap_or(std::ptr::null()));
}
There are several things to notice here:
arg_cstring holds an Option<CString>, which ensures that CString has storage that can outlive the pointer passed to the C function;
Option::as_ref() is used to prevent arg_cstring from being moved into map, which would again free it before the pointer was actually used;
Option::map() is used as an alternative to pattern matching when you want to express "do something with Option if Some, otherwise just leave it as None".
The pattern x.as_ref().map(|x| x.as_ptr()).unwrap_or(null()) can and probably should be moved into a utility function if it is used more than once in the program. Be careful that the function takes a reference to the Option to avoid a move.
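A minimal sketch of such a utility function (opt_c_ptr is a hypothetical name). Taking &Option<CString> leaves ownership with the caller, so the CString outlives the returned pointer for as long as the Option stays in scope:

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;
use std::ptr;

/// Returns a pointer to the string, or NULL for None.
/// The pointer is only valid while `opt` is alive and unmodified.
fn opt_c_ptr(opt: &Option<CString>) -> *const c_char {
    opt.as_ref().map_or(ptr::null(), |cs| cs.as_ptr())
}

fn main() {
    let arg_cstring: Option<CString> =
        std::env::args().nth(1).map(|arg| CString::new(arg).unwrap());
    let p = opt_c_ptr(&arg_cstring);
    if p.is_null() {
        println!("no argument");
    } else {
        // Stands in for the C function reading the string
        println!("{}", unsafe { CStr::from_ptr(p) }.to_string_lossy());
    }
}
```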

How can I append a formatted string to an existing String?

Using format!, I can create a String from a format string, but what if I already have a String that I'd like to append to? I would like to avoid allocating the second string just to copy it and throw away the allocation.
let s = "hello ".to_string();
append!(s, "{}", 5); // Doesn't exist
A close equivalent in C/C++ would be snprintf.
I see now that String implements Write, so we can use write!:
use std::fmt::Write;
pub fn main() {
let mut a = "hello ".to_string();
write!(a, "{}", 5).unwrap();
println!("{}", a);
assert_eq!("hello 5", a);
}
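It is impossible for this write! call to return an Err, at least as of Rust 1.47: writing into a String never fails, and an error could only come from a formatting implementation (such as a custom Display impl) that returns one itself. So the unwrap should not cause concern.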
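If you want the append! spelling from the question, a thin macro over write! is enough. This is a sketch; append! is a hypothetical name, not a std macro:

```rust
use std::fmt::Write;

/// Appends formatted text to an existing String in place, like format!
/// but without allocating a second String.
macro_rules! append {
    ($s:expr, $($arg:tt)*) => {
        // Writing into a String only fails if a formatting impl itself errors
        write!($s, $($arg)*).expect("a formatting trait implementation returned an error")
    };
}

fn main() {
    let mut s = "hello ".to_string();
    append!(s, "{}", 5);
    append!(s, ", {}!", "world");
    println!("{}", s);
}
```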
