Related
My question is more on the side of performance in relation to the operations I want to do.
question
I have a u8 byte file where I know the byte offsets of the information I want to pull from it, and their lengths in bytes. Ideally I want to store this information in some kind of object for use afterwards.
ex.
Need Info1 # byte offset 0x2C
Need Info2 # byte offset 0x30
My naive solution to this problem is to read up to the offset with read_exact, then read the information I want into a buffer of the required byte length (see below).
naive solution
use std::fs::File;
use std::io::Read;

fn main() -> std::io::Result<()> {
let mut file = File::open("foo.xd")?;
// read till offset
let mut offset = [0; 0x2C];
file.read_exact(&mut offset)?;
// then get the byte of information i want
let mut info1 = [0; 0x1];
file.read_exact(&mut info1)?;
println!("{info1:?}");
// do some kind of repeatable process for the other information
Ok(())
}
I lack the Rust experience to know whether performing this operation over and over again on different offsets is good or bad. My intuition says bad. Would someone be able to suggest a repeatable pattern here that would work for my use case, but is also valid from a performance standpoint?
As @Kenny mentions, if you want to jump large distances in a file you probably want to use the Seek trait.
data.txt:
abcdefghijklmnopqrstuvwxyz
main.rs:
use std::{
fs::File,
io::{Read, Seek, SeekFrom},
};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let mut file = File::open("data.txt")?;
// Buffer to read into
let mut buffer = [0u8; 5];
// Read at position 10
file.seek(SeekFrom::Start(10))?;
file.read_exact(&mut buffer)?;
println!("{:?}", std::str::from_utf8(&buffer).unwrap());
// Read at position 5
file.seek(SeekFrom::Start(5))?;
file.read_exact(&mut buffer)?;
println!("{:?}", std::str::from_utf8(&buffer).unwrap());
Ok(())
}
"klmno"
"fghij"
Of course, if your data lies in ordered positions, you could use relative seeks as well:
use std::{
fs::File,
io::{Read, Seek, SeekFrom},
};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let mut file = File::open("data.txt")?;
// Buffer to read into
let mut buffer = [0u8; 5];
// Read at position 5
file.seek(SeekFrom::Start(5))?;
file.read_exact(&mut buffer)?;
println!("{:?}", std::str::from_utf8(&buffer).unwrap());
// Read at position 12
file.seek(SeekFrom::Current(2))?;
file.read_exact(&mut buffer)?;
println!("{:?}", std::str::from_utf8(&buffer).unwrap());
Ok(())
}
"fghij"
"mnopq"
A small digression.
You could wrap it in a function:
use std::{
fs::File,
io::{Read, Seek, SeekFrom},
};
fn read_at_position(f: &mut File, pos: u64, buffer: &mut [u8]) -> std::io::Result<()> {
f.seek(SeekFrom::Start(pos))?;
f.read_exact(buffer)
}
fn main() -> Result<(), Box<dyn std::error::Error>> {
let mut file = File::open("data.txt")?;
// Buffer to read into
let mut buffer = [0u8; 5];
// Read at position 10
read_at_position(&mut file, 10, &mut buffer)?;
println!("{:?}", std::str::from_utf8(&buffer).unwrap());
// Read at position 5
read_at_position(&mut file, 5, &mut buffer)?;
println!("{:?}", std::str::from_utf8(&buffer).unwrap());
Ok(())
}
Or even implement it as a trait for File:
use std::{
fs::File,
io::{Read, Seek, SeekFrom},
};
trait ReadAtPosition {
fn read_at_position(&mut self, pos: u64, buffer: &mut [u8]) -> std::io::Result<()>;
}
impl ReadAtPosition for File {
fn read_at_position(&mut self, pos: u64, buffer: &mut [u8]) -> std::io::Result<()> {
self.seek(SeekFrom::Start(pos))?;
self.read_exact(buffer)
}
}
fn main() -> Result<(), Box<dyn std::error::Error>> {
let mut file = File::open("data.txt")?;
// Buffer to read into
let mut buffer = [0u8; 5];
// Read at position 10
file.read_at_position(10, &mut buffer)?;
println!("{:?}", std::str::from_utf8(&buffer).unwrap());
// Read at position 5
file.read_at_position(5, &mut buffer)?;
println!("{:?}", std::str::from_utf8(&buffer).unwrap());
Ok(())
}
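Coming back to the part of the question about storing the values in some kind of object: a minimal sketch, assuming both fields are one byte long (the question only shows a one-byte read for Info1). The names `Header`, `read_at` and `read_header` are hypothetical. The helper is generic over `Read + Seek`, so it accepts a `File` as well as an in-memory `Cursor`.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Hypothetical container for the two values from the question.
#[derive(Debug, PartialEq)]
struct Header {
    info1: u8,
    info2: u8,
}

/// Seek to `pos` and read exactly `N` bytes into a fixed-size array.
fn read_at<R: Read + Seek, const N: usize>(reader: &mut R, pos: u64) -> io::Result<[u8; N]> {
    let mut buf = [0u8; N];
    reader.seek(SeekFrom::Start(pos))?;
    reader.read_exact(&mut buf)?;
    Ok(buf)
}

/// Read both fields at the offsets given in the question (0x2C and 0x30).
/// Assumption: each field is a single byte.
fn read_header<R: Read + Seek>(reader: &mut R) -> io::Result<Header> {
    let [info1] = read_at::<R, 1>(reader, 0x2C)?;
    let [info2] = read_at::<R, 1>(reader, 0x30)?;
    Ok(Header { info1, info2 })
}

fn main() -> io::Result<()> {
    // In-memory stand-in for the file: byte i has the value i.
    let data: Vec<u8> = (0u8..64).collect();
    let mut reader = Cursor::new(data);
    let header = read_header(&mut reader)?;
    println!("{header:?}");
    Ok(())
}
```

With a real file, `File::open("foo.xd")?` could be passed in place of the `Cursor`; each field then costs one seek and one small read.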
A very common pattern I have to deal with is, I am given some raw byte data. This data can represent an array of floats, 2D vectors, Matrices...
I know the data is compact and properly aligned. In C usually you would just do:
vec3 * ptr = (vec3*)data;
And start reading from it.
I am trying to create a view to this kind of data in rust to be able to read and write to the buffer as follows:
pub trait AccessView<T>
{
fn access_view<'a>(
offset : usize,
length : usize,
buffer : &'a Vec<u8>) -> &'a mut [T]
{
let bytes = &buffer[offset..(offset + length)];
let ptr = bytes.as_ptr() as *mut T;
return unsafe { std::slice::from_raw_parts_mut(ptr, length / size_of::<T>()) };
}
}
And then calling it:
let data: &[f32] =
AccessView::<f32>::access_view(0, 32, &buffers[0]);
The idea is, I should be able to replace f32 with vec3 or mat4 and get a slice view into the underlying data.
This fails to compile with:
--> src/main.rs:341:9
|
341 | AccessView::<f32>::access_view(&accessors[0], &buffer_views, &buffers);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cannot infer type
|
= note: cannot satisfy `_: AccessView<f32>`
How could I use Rust to achieve my goal, i.e. have a generic "template" for turning a set of raw bytes into a range-checked slice view cast to some type?
There are two important problems I can identify:
1. You are using a trait incorrectly. You have to connect a trait to an actual type. If you want to call it the way you do, it needs to be a struct instead.
2. Soundness. You are creating a mutable reference from an immutable one through unsafe code. This is unsound and dangerous. By using unsafe, you tell the compiler that you manually verified that your code is sound, and the borrow checker should blindly believe you. Your code, however, is not sound.
To part 1, @BlackBeans already gave you a good answer. I would still do it a little differently, though: I would implement the trait directly for byte slices ([u8]), so you can write data.access_view::<T>().
To part 2, you at least need to make the input data &mut. Further, make sure they have the same lifetime, otherwise the compiler might not realize that they are actually connected.
Also, don't use &Vec<u8> as an argument; in general, use slices (&[u8]) instead.
Be aware that, with all that said, there is still the problem of endianness: the behavior you get will not be consistent between platforms. Use other means of conversion if that is something you require. Do not put this code in a generic library; at most, use it for your own personal project.
That all said, here is what I came up with:
pub trait AccessView {
fn access_view<'a, T>(&'a mut self, offset: usize, length: usize) -> &'a mut [T];
}
impl AccessView for [u8] {
fn access_view<T>(&mut self, offset: usize, length: usize) -> &mut [T] {
let bytes = &mut self[offset..(offset + length)];
let ptr = bytes.as_mut_ptr() as *mut T;
return unsafe { std::slice::from_raw_parts_mut(ptr, length / ::std::mem::size_of::<T>()) };
}
}
impl AccessView for Vec<u8> {
fn access_view<T>(&mut self, offset: usize, length: usize) -> &mut [T] {
self.as_mut_slice().access_view(offset, length)
}
}
fn main() {
let mut data: Vec<u8> = vec![1, 2, 3, 4, 5, 6, 7, 8];
println!("{:?}", data);
let float_view: &mut [f32] = data.access_view(2, 4);
float_view[0] = 42.0;
println!("{:?}", float_view);
println!("{:?}", data);
// println!("{:?}", float_view); // Adding this would cause a compiler error, which shows that we implemented lifetimes correctly
}
[1, 2, 3, 4, 5, 6, 7, 8]
[42.0]
[1, 2, 0, 0, 40, 66, 7, 8]
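If consistent results across platforms matter, one alternative to reinterpreting the bytes in place is to decode them explicitly. This is a minimal sketch, assuming the data holds little-endian `f32` values; the function name `read_f32_le` is made up for illustration. It copies the values instead of handing out a mutable view, but it does not depend on the platform's byte order or on the alignment of the offset. (A view created at byte offset 2, as in the example above, need not be 4-byte aligned, which `from_raw_parts_mut` requires for `f32`.)

```rust
/// Decode `length` bytes starting at `offset` as little-endian f32 values.
/// Returns None if the range is out of bounds or not a multiple of 4 bytes.
fn read_f32_le(buffer: &[u8], offset: usize, length: usize) -> Option<Vec<f32>> {
    let end = offset.checked_add(length)?;
    let bytes = buffer.get(offset..end)?;
    if bytes.len() % 4 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(4)
            .map(|chunk| f32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]))
            .collect(),
    )
}

fn main() {
    // Same bytes as the output above, after 42.0 was written at offset 2.
    let data: Vec<u8> = vec![1, 2, 0, 0, 40, 66, 7, 8];
    println!("{:?}", read_f32_le(&data, 2, 4));
}
```

Writing back would go the other way with `f32::to_le_bytes` and `copy_from_slice`.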
I think you haven't fully understood what traits are. A trait represents a characteristic of a type. For instance, since the size of u32 is known at compile time (32 bits), u32 implements the marker trait Sized, written u32: Sized. A more feature-complete trait is Default: if there is a "default" way of building a value of type T, we can implement Default for T, so that there is now a standard way of building it.
In your example, you are using a trait as a namespace for functions, i.e. you could simply have
use std::mem::size_of;

fn access_view<'a, T>(
    offset: usize,
    length: usize,
    // `&mut`, because returning `&mut [T]` from a shared borrow is unsound
    buffer: &'a mut [u8],
) -> &'a mut [T] {
    let bytes = &mut buffer[offset..offset + length];
    let ptr = bytes.as_mut_ptr() as *mut T;
    unsafe { std::slice::from_raw_parts_mut(ptr, length / size_of::<T>()) }
}
Or, if you want to put it as a trait:
// `Sized` is required because the method returns a slice of `Self`.
trait Viewable: Sized {
    fn access_view<'a>(
        offset: usize,
        length: usize,
        buffer: &'a mut [u8],
    ) -> &'a mut [Self] {
        let bytes = &mut buffer[offset..offset + length];
        let ptr = bytes.as_mut_ptr() as *mut Self;
        unsafe { std::slice::from_raw_parts_mut(ptr, length / std::mem::size_of::<Self>()) }
    }
}
Then implement it:
impl<T> Viewable for T {}
Or, again, differently
// `Sized` is required because the method returns a slice of `Self`.
trait Viewable: Sized {
    fn access_view<'a>(
        offset: usize,
        length: usize,
        buffer: &'a mut [u8],
    ) -> &'a mut [Self];
}
impl<T> Viewable for T {
    fn access_view<'a>(
        offset: usize,
        length: usize,
        buffer: &'a mut [u8],
    ) -> &'a mut [Self] {
        let bytes = &mut buffer[offset..offset + length];
        let ptr = bytes.as_mut_ptr() as *mut T;
        unsafe { std::slice::from_raw_parts_mut(ptr, length / std::mem::size_of::<T>()) }
    }
}
Although all of these ways of structuring the code produce the same result, that doesn't mean they are equivalent. Maybe you should learn a little more about traits before using them.
Also, your code, as is, really seems unsound, in the sense that you call an unsafe function without any checking (i.e. what if I call it with random nonsense in buffer?). That doesn't mean it is unsound (we don't have access to the rest of your code), but you should be careful about that: Rust is not C.
Finally, your error simply comes from the fact that it's impossible for Rust to find out which type T you are calling the associated method access_view of.
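To make the point about checking concrete, here is a minimal sketch of a view restricted to `f32` that validates the range, the length and the alignment before the unsafe call. The names `view_f32_mut` and `Aligned` are hypothetical. The sketch is deliberately not generic, since a generic version would also have to guarantee that every bit pattern is a valid `T`.

```rust
use std::mem::{align_of, size_of};

/// Reinterpret `length` bytes at `offset` as a mutable f32 slice.
/// Returns None if the range is out of bounds, the length is not a
/// multiple of 4, or the start address is not aligned for f32.
fn view_f32_mut(buffer: &mut [u8], offset: usize, length: usize) -> Option<&mut [f32]> {
    let end = offset.checked_add(length)?;
    let bytes = buffer.get_mut(offset..end)?;
    if bytes.len() % size_of::<f32>() != 0 {
        return None;
    }
    let count = bytes.len() / size_of::<f32>();
    let ptr = bytes.as_mut_ptr();
    if (ptr as usize) % align_of::<f32>() != 0 {
        return None;
    }
    // SAFETY: the range is in bounds, exclusively borrowed, aligned for f32,
    // a whole number of f32s long, and every bit pattern is a valid f32.
    Some(unsafe { std::slice::from_raw_parts_mut(ptr as *mut f32, count) })
}

/// Demo wrapper that forces 4-byte alignment of the byte buffer.
#[repr(align(4))]
struct Aligned([u8; 8]);

fn main() {
    let mut data = Aligned([0; 8]);
    if let Some(view) = view_f32_mut(&mut data.0, 4, 4) {
        view[0] = 42.0;
    }
    println!("{:?}", data.0);
    // A misaligned offset is rejected instead of causing undefined behavior.
    println!("{}", view_f32_mut(&mut data.0, 1, 4).is_none());
}
```

The bytes written are still in the platform's native byte order, so the endianness caveat applies to this view as well.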
I have a simple recursive data structure like this:
struct X{
i:u32,
n:Option<Box<X>>
}
fn new(i:u32,x:X)->X{
X{i,n:Some(Box::new(x))}
}
Now I wish to iterate over it in a loop:
let mut x = new(7,new(3,new(4,new(0,X{i:3,n:None}))));
let mut i:&mut X = &mut x;
while let Some(n) = &mut i.n{
println!("{}",i.i);
i = &mut n;
}
Unfortunately, the borrow checker is not happy with this code. Is there any way to make it work, or should I use raw pointers instead?
Never mind, I figured it out. It turns out it's not that complicated:
use std::borrow::BorrowMut;
struct X{
i:u32,
n:Option<Box<X>>
}
fn n(i:u32,x:X)->X{
X{i,n:Some(Box::new(x))}
}
let mut x = n(7,n(3,n(4,n(0,X{i:3,n:None}))));
let mut i:&mut X = &mut x;
while let Some(n) = &mut i.n{
println!("{}",i.i);
i = n.borrow_mut();
}
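Note that the loop above stops before printing the last node, because its condition tests whether a successor exists. Below is a minimal sketch of a variant that visits every node and needs no `BorrowMut` import; it uses `Option::as_deref_mut` to go from `Option<Box<X>>` to `Option<&mut X>`. The helper name `values` and the increment are made up for illustration.

```rust
struct X {
    i: u32,
    n: Option<Box<X>>,
}

fn n(i: u32, x: X) -> X {
    X { i, n: Some(Box::new(x)) }
}

/// Walk the list mutably, incrementing each value, and collect the results.
fn values(head: &mut X) -> Vec<u32> {
    let mut out = Vec::new();
    let mut cur = Some(head);
    while let Some(node) = cur {
        node.i += 1; // mutation works because `node` is `&mut X`
        out.push(node.i);
        // Option<Box<X>> -> Option<&mut X>
        cur = node.n.as_deref_mut();
    }
    out
}

fn main() {
    let mut x = n(7, n(3, n(4, n(0, X { i: 3, n: None }))));
    println!("{:?}", values(&mut x)); // prints [8, 4, 5, 1, 4]
}
```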
I am working on a project that involves reading different information from a file at different offsets.
Currently, I am using the following code:
// ------------------------ SECTORS PER CLUSTER ------------------------
// starts at 13
opened_file.seek(SeekFrom::Start(13)).unwrap();
let aux: &mut [u8] = &mut [0; 1];
let _buf = opened_file.read_exact(aux);
// ------------------------ RESERVED SECTORS ------------------------
// starts at 14
opened_file.seek(SeekFrom::Start(14)).unwrap();
let aux: &mut [u8] = &mut [0; 2];
let _buf = opened_file.read_exact(aux);
But as you can see, I need to create a new buffer of the size I want to read every time. I can't specify it directly as a parameter of the function.
I created a struct but I could not make a struct of all the different pieces of data I wanted. For example:
struct FileStruct {
a1: &mut [u8] &mut [0; 1],
a2: &mut [u8] &mut [0; 2],
}
Which are the types that are required for the read_exact method to work?
Is there a more effective way to read information from different offsets of a file without having to repeatedly copy-paste these lines of code for every piece of information I want to read from the file? Some sort of function, Cursor, or Vector to easily move around the offset? And a way to write this info into struct fields?
The easiest way is to have a struct of owned arrays, then seek and read into the struct.
use std::io::{self, prelude::*, SeekFrom};
#[derive(Debug, Clone, Default)]
struct FileStruct {
a1: [u8; 1],
a2: [u8; 2],
}
fn main() -> io::Result<()> {
let mut file_struct: FileStruct = Default::default();
let mut opened_file = unimplemented!(); // open file somehow
opened_file.seek(SeekFrom::Start(13))?;
opened_file.read_exact(&mut file_struct.a1)?;
opened_file.seek(SeekFrom::Start(14))?;
opened_file.read_exact(&mut file_struct.a2)?;
println!("{:?}", file_struct);
Ok(())
}
This is still decently repetitive, so you can make a seek_read function to reduce the repetition:
use std::io::{self, prelude::*, SeekFrom};
#[derive(Debug, Clone, Default)]
struct FileStruct {
a1: [u8; 1],
a2: [u8; 2],
}
fn seek_read(mut reader: impl Read + Seek, offset: u64, buf: &mut [u8]) -> io::Result<()> {
reader.seek(SeekFrom::Start(offset))?;
reader.read_exact(buf)?;
Ok(())
}
fn main() -> io::Result<()> {
let mut file_struct: FileStruct = Default::default();
let mut opened_file = unimplemented!(); // open file somehow
seek_read(&mut opened_file, 13, &mut file_struct.a1)?;
seek_read(&mut opened_file, 14, &mut file_struct.a2)?;
println!("{:?}", file_struct);
Ok(())
}
The repetition can be lowered even more by using a macro:
use std::io::{self, prelude::*, SeekFrom};
#[derive(Debug, Clone, Default)]
struct FileStruct {
a1: [u8; 1],
a2: [u8; 2],
}
macro_rules! read_offsets {
($file: ident, $file_struct: ident, []) => {};
($file: ident, $file_struct: ident, [$offset: expr => $field: ident $(, $offsets: expr => $fields: ident)*]) => {
$file.seek(SeekFrom::Start($offset))?;
$file.read_exact(&mut $file_struct.$field)?;
read_offsets!($file, $file_struct, [$($offsets => $fields),*]);
}
}
fn main() -> io::Result<()> {
let mut file_struct: FileStruct = Default::default();
let mut opened_file = unimplemented!(); // open file somehow
read_offsets!(opened_file, file_struct, [13 => a1, 14 => a2]);
println!("{:?}", file_struct);
Ok(())
}
This is a complementary answer to Aplet123's: it's not quite clear that you must store the bytes as is into a structure, so you can also allocate one buffer (as a fixed-size array) and reuse it with the correctly sized slice e.g.
let mut buf = [0u8;16];
opened_file.read_exact(&mut buf[..4])?; // will read 4 bytes
// do thing with the first 4 bytes
opened_file.read_exact(&mut buf[..8])?; // will read 8 bytes this time
// etc...
You could also use the byteorder crate, which lets you directly read numbers or sequences of numbers. It basically just does the underlying "create stack buffer of the right size; read; decode" for you.
That's especially useful because it looks a lot like "SECTORS PER CLUSTER" should be a u8 and "RESERVED SECTORS" should be a u16. With byteorder you can call read_u8() or read_u16::<LittleEndian>() directly (or BigEndian, depending on the file format).
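The same "stack buffer; read; decode" steps can also be written with the standard library alone. This is a minimal sketch, assuming the two-byte field is stored little-endian (the question does not state the byte order); the helper names `read_u8_at` and `read_u16_le_at` are hypothetical.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Read one byte at `offset`.
fn read_u8_at(reader: &mut (impl Read + Seek), offset: u64) -> io::Result<u8> {
    let mut buf = [0u8; 1];
    reader.seek(SeekFrom::Start(offset))?;
    reader.read_exact(&mut buf)?;
    Ok(buf[0])
}

/// Read a u16 at `offset`. Assumption: the value is stored little-endian.
fn read_u16_le_at(reader: &mut (impl Read + Seek), offset: u64) -> io::Result<u16> {
    let mut buf = [0u8; 2];
    reader.seek(SeekFrom::Start(offset))?;
    reader.read_exact(&mut buf)?;
    Ok(u16::from_le_bytes(buf))
}

fn main() -> io::Result<()> {
    // In-memory stand-in for the file; bytes 13..16 hold sample values.
    let mut image = vec![0u8; 16];
    image[13] = 8;
    image[14] = 0x20;
    image[15] = 0x00;
    let mut reader = Cursor::new(image);
    let sectors_per_cluster = read_u8_at(&mut reader, 13)?;
    let reserved_sectors = read_u16_le_at(&mut reader, 14)?;
    println!("{sectors_per_cluster} {reserved_sectors}"); // prints "8 32"
    Ok(())
}
```

Returning typed integers also means the struct fields can be `u8` and `u16` instead of byte arrays.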
Also building on Aplet123's answer, the following seek_read function doesn't require knowing the number of bytes to read at compile time, since it returns a Vec instead of filling a fixed-size byte slice:
use std::io::{Read, Result, Seek, SeekFrom};

// Starting at `offset`, reads `amount_to_read` bytes from `reader`.
// Returns the bytes as a vector.
fn seek_read(
reader: &mut (impl Read + Seek),
offset: u64,
amount_to_read: usize,
) -> Result<Vec<u8>> {
// A buffer filled with as many zeros as we'll read with read_exact
let mut buf = vec![0; amount_to_read];
reader.seek(SeekFrom::Start(offset))?;
reader.read_exact(&mut buf)?;
Ok(buf)
}
Here are some tests to demonstrate how seek_read behaves:
use std::io::Cursor;
#[test]
fn seek_read_works() {
let bytes = b"Hello world!";
let mut reader = Cursor::new(bytes);
assert_eq!(seek_read(&mut reader, 0, 2).unwrap(), b"He");
assert_eq!(seek_read(&mut reader, 1, 4).unwrap(), b"ello");
assert_eq!(seek_read(&mut reader, 6, 5).unwrap(), b"world");
assert_eq!(seek_read(&mut reader, 2, 0).unwrap(), b"");
}
#[test]
#[should_panic(expected = "failed to fill whole buffer")]
fn seek_read_beyond_buffer_fails() {
let mut reader = Cursor::new(b"Hello world!");
seek_read(&mut reader, 6, 99).unwrap();
}
#[test]
#[should_panic(expected = "failed to fill whole buffer")]
fn start_seek_reading_beyond_buffer_fails() {
let mut reader = Cursor::new(b"Hello world!");
seek_read(&mut reader, 99, 1).unwrap();
}
In the code below, the way the string buffer is created is the quickest I have found, since (if I understand correctly) no allocation or deallocation takes place.
pub extern fn rust_print_file() -> *mut PackChar {
//set min size to 50 - avoid expanding when line count is 50 or less
let mut out_vec = Vec::with_capacity(50);
let mut curdr = env::current_dir().unwrap();//get path to file dir
let fl_str = "file_test.txt";
curdr.push(fl_str);//created full path to be used
let file = BufReader::new(File::open(curdr).unwrap());
//here i try to accommodate each line in a struct
let mut line_index = 0;
for line in file.lines() {
let cur_line = line.unwrap();
let loclbuf_size = cur_line.len();
let mut loclbuf = String::with_capacity(buffer_size);
//i tried two ways
loclbuf.push_str(cur_line.unwrap()); // can't be done
loclbuf.push_str(line.unwrap()); // can't be done too
let pack_char = PackChar {
int_val: line_index,
buffer_size: loclbuf_size as i32,
buffer: loclbuf.as_ptr() as *mut _,
};
line_index+=1;
mem::forget(buffer);
out_vec.push(pack_char);
}
Box::into_raw(out_vec.into_boxed_slice()) as *mut _
}
This is the struct I am using to pass the data to C#:
#[repr(C)]
pub struct PackChar {
pub int_val: c_int,
pub buffer: *mut c_char,
pub buffer_size: c_int,
}
When I generate some dummy text, I have checked that the data is passed correctly to "the other side".
It does not work, however, for the line-reading task with the text produced as coded above.
This is another way I have tried. I prefer the code above, but this version throws a compile error:
error: use of moved value: buffer [E0382] on forget(buffer)
#[no_mangle]
pub extern fn rust_return_file_read_lines() -> *mut PackChar {
let mut out_vec = Vec::with_capacity(50);
let mut curdr = env::current_dir().unwrap();
let fl_str = "file_test.txt";
curdr.push(fl_str);
let file = BufReader::new(File::open(curdr).unwrap());
let mut lindex = 0;
for line in file.lines() {
let tmpbuffer = line.unwrap().into_bytes();
let tmpbuffer_size = buffer.len();
let pack_char = PackChar {
int_val: lindex,
buffer_size: tmpbuffer_size as i32,
buffer: Box::into_raw(tmpbuffer.into_boxed_slice()) as *mut _
};
lindex+=1;
mem::forget(buffer);
out_vec.push(pack_char);
}
Box::into_raw(out_vec.into_boxed_slice()) as *mut _
}
Edit
As long as the buffer is assigned like this:
buffer: loclbuf.as_ptr() as *mut _,
I could pass the data correctly to C#.
So how can I read the lines so that each one is stored in buffer as described?
As it appears now, there was a bug in my Visual Studio. It's not the first time this has happened, but as I am new to Rust I was sure that the code was wrong.
This is what is working for me. I will be happy to have comments and suggestions:
extern crate libc;
use std::env;
use libc::c_char;
use libc::c_int;
use std::mem;
use std::io::{BufReader, BufRead};
use std::fs::File;
#[repr(C)]
pub struct PackChar {
pub int_val: c_int,
pub buffer: *mut c_char, // changed
pub dbuffer_size: c_int, // added
}
#[no_mangle]
pub extern fn rust_print_file() -> *mut PackChar {
let mut out_vec = Vec::with_capacity(50 as usize);
let mut cwd = env::current_dir().unwrap();
let fl_str = "file_test.txt";
cwd.push(fl_str);
let file = BufReader::new(File::open(cwd).unwrap());
for (index, line) in file.lines().enumerate() {
let buffer = line.unwrap();
let buffer_size = buffer.len();
let pack_char = PackChar {
int_val: index as i32,
dbuffer_size: buffer_size as i32,
buffer: buffer.as_ptr() as *mut _,
};
mem::forget(buffer); // don't deallocate memory
out_vec.push(pack_char);
}
Box::into_raw(out_vec.into_boxed_slice()) as *mut _ // changed
}
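One suggestion, as a sketch rather than a drop-in replacement: a `String` is not NUL-terminated, and `mem::forget` leaks it for good, so an alternative is to hand out `CString::into_raw` pointers together with a matching free function. The names `pack_lines` and `free_pack` are hypothetical. The sketch takes its lines from memory instead of a file so that it is self-contained, and it uses `std::os::raw` in place of the `libc` crate.

```rust
use std::ffi::CString;
use std::os::raw::{c_char, c_int};

#[repr(C)]
pub struct PackChar {
    pub int_val: c_int,
    pub buffer: *mut c_char,
    pub dbuffer_size: c_int,
}

/// Convert lines into PackChar records that own NUL-terminated buffers.
/// Lines containing an interior NUL byte are skipped in this sketch.
fn pack_lines<'a>(lines: impl Iterator<Item = &'a str>) -> Vec<PackChar> {
    lines
        .enumerate()
        .filter_map(|(index, line)| {
            let c_line = CString::new(line).ok()?;
            Some(PackChar {
                int_val: index as c_int,
                dbuffer_size: c_line.as_bytes().len() as c_int,
                // Ownership moves to the caller until free_pack is called.
                buffer: c_line.into_raw(),
            })
        })
        .collect()
}

/// Give a buffer back to Rust so it can be deallocated.
///
/// # Safety
/// `pack.buffer` must come from `pack_lines` and must not be freed twice.
unsafe fn free_pack(pack: PackChar) {
    if !pack.buffer.is_null() {
        drop(unsafe { CString::from_raw(pack.buffer) });
    }
}

fn main() {
    let packs = pack_lines("first\nsecond line".lines());
    for pack in packs {
        println!("{} {}", pack.int_val, pack.dbuffer_size);
        unsafe { free_pack(pack) };
    }
}
```

In the real library the free function would be exported like `rust_print_file`, so that the C# side can call it once it has copied the text.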