How do I get a random line from a file?

I'm trying to get a random line from a file:
extern crate rand;
use rand::Rng;
use std::{
    fs::File,
    io::{prelude::*, BufReader},
};
const FILENAME: &str = "/etc/hosts";
fn find_word() -> String {
    let f = File::open(FILENAME).expect(&format!("(;_;) file not found: {}", FILENAME));
    let f = BufReader::new(f);
    let lines: Vec<_> = f.lines().collect();
    let n = rand::thread_rng().gen_range(0, lines.len());
    let line = lines
        .get(n)
        .expect(&format!("(;_;) Couldn't get {}th line", n))
        .unwrap_or(String::from(""));
    line
}
This code doesn't work:
error[E0507]: cannot move out of borrowed content
  --> src/main.rs:18:16
   |
18 |       let line = lines
   |  ________________^
19 | |         .get(n)
20 | |         .expect(&format!("(;_;) Couldn't get {}th line", n))
   | |____________________________________________________________^ cannot move out of borrowed content
I tried adding .clone() before .expect(...) and before .unwrap_or(...) but it gave the same error.
Is there a better way to get a random line from a file that doesn't involve collecting the whole file in a Vec?

Use IteratorRandom::choose to randomly sample from an iterator using reservoir sampling. This scans through the entire file once and creates a String for each line, but it never builds one giant vector holding every line:
use rand::seq::IteratorRandom; // 0.7.3
use std::{
    fs::File,
    io::{BufRead, BufReader},
};
const FILENAME: &str = "/etc/hosts";
fn find_word() -> String {
    let f = File::open(FILENAME)
        .unwrap_or_else(|e| panic!("(;_;) file not found: {}: {}", FILENAME, e));
    let f = BufReader::new(f);
    let lines = f.lines().map(|l| l.expect("Couldn't read line"));
    lines
        .choose(&mut rand::thread_rng())
        .expect("File had no lines")
}
Your original problem is that:
1. slice::get returns an optional reference into the vector.
   You can either clone the element or take ownership of it:
   let line = lines[n].clone();     // the element type must implement Clone, so unwrap the io::Result first
   let line = lines.swap_remove(n); // `lines` must be declared `mut`
   Both of these panic if n is out of bounds, which is reasonable here because you know that n is in bounds.
2. BufRead::lines yields io::Result<String> items, so you have to handle that error case.
Additionally, don't use format! with expect:
expect(&format!("..."))
This will unconditionally allocate memory. When there's no failure, that allocation is wasted. Use unwrap_or_else as shown.
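To illustrate the difference, here is a minimal sketch. The helper name nth_line is made up for this example, and it takes a slice of lines that have already been read rather than a file. The panic message is formatted only inside the closure, so nothing is allocated on the success path:

```rust
/// Returns the n-th line, panicking with a formatted message if it does not exist.
/// `nth_line` is a hypothetical helper used for illustration only.
fn nth_line(lines: &[String], n: usize) -> &str {
    lines
        .get(n)
        // The closure runs only when `get` returned `None`, so the
        // message is formatted lazily instead of on every call.
        .unwrap_or_else(|| panic!("(;_;) Couldn't get {}th line", n))
}
```

Calling nth_line(&lines, 1) on a two-element vector returns the second element without ever building the error string.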

Is there a better way to get a random line from a file that doesn't involve collecting the whole file in a Vec?
You will always need to read the whole file, if only to know the number of lines. However, you don't need to store everything in memory: you can read lines one by one and discard them as you go, so that you only keep one in the end. Here is how it goes:
1. Read and store the first line.
2. Read the second line, draw a random choice and either:
   - keep the first line with a probability of 50%,
   - or discard the first line and store the second line with a probability of 50%.
3. Keep reading lines from the file and, for line number n, draw a random choice and:
   - keep the currently stored line with a probability of (n-1)/n,
   - or replace the currently stored line with the current line with a probability of 1/n.
Note that this is more or less what sample_iter does, except that sample_iter is more generic since it can work on any iterator and it can pick samples of any size (e.g. it can choose k items randomly).
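The steps above could be sketched by hand roughly as follows. This is only an illustration, not what the rand crate does internally: the function name pick_random_line is made up, and the random source is passed in as a closure (next_random, an assumption of this sketch) so that the code depends only on the standard library.

```rust
use std::io::{self, BufRead};

/// Picks one line at random while holding only one line in memory.
/// `next_random` is a stand-in for a real random source (an assumption of
/// this sketch); it should return uniformly distributed `u64` values.
fn pick_random_line<R: BufRead>(
    reader: R,
    mut next_random: impl FnMut() -> u64,
) -> io::Result<Option<String>> {
    let mut chosen = None;
    for (index, line) in reader.lines().enumerate() {
        let line = line?;
        let n = index as u64 + 1; // 1-based line number
        // Replace the stored line with probability 1/n. For n == 1 the
        // condition is always true, so the first line is always stored.
        // (The modulo introduces a tiny bias, which is acceptable in a sketch.)
        if next_random() % n == 0 {
            chosen = Some(line);
        }
    }
    Ok(chosen)
}
```

With the rand crate used elsewhere in this thread, a closure along the lines of `|| rand::thread_rng().gen::<u64>()` (rand 0.7/0.8 naming) could serve as the random source. An empty file yields Ok(None) instead of a panic.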

Related

How to completely remove a line from a file?

How do I completely remove a line in Rust? Not just replace it with an empty line.
In Rust, when you delete a line from a file with the following code as an example:
let mut file: File = File::open("file.txt").unwrap();
let mut buf = String::from("");
file.read_to_string(&mut buf).unwrap(); // Read the file to a buffer
let reader = BufReader::new(&file);
for (index, line) in reader.lines().enumerate() { // Loop through all the lines in the file
    if line.as_ref().unwrap().contains("some text") { // If the line contains "some text", execute the block
        buf = buf.replace(line.as_ref().unwrap(), ""); // Replace "some text" with nothing
    }
}
file.write_all(buf.as_bytes()).unwrap(); // Write the buffer back to the file
file.txt:
random text
random text
random text
some text
random text
random text
When you run the code, file.txt turns into this:
random text
random text
random text

random text
random text
Rather than just
random text
random text
random text
random text
random text
Is there any way to completely remove the line rather than just leaving it blank? Like some sort of special character?
This part is bad news: buf = buf.replace(line.as_ref().unwrap(), "");
This searches your entire buffer for the line's contents (without the '\n') and replaces every occurrence with "". To make it behave as you expect, you need to add the newline back in. You can just about do this with buf.replace(&format!("{}\n", line.as_ref().unwrap()), ""). The problem is that lines() treats more than "\n" as a newline: it also splits on "\r\n". If you know you're always using "\n" or always "\r\n" as newlines you can work around this; if not, you'll need something trickier than lines().
However, there is a trickier issue. For larger files, this may end up scanning through the string and resizing it many times, giving O(N^2)-style behaviour rather than the expected O(N). Also, the entire file needs to be read into memory, which can be bad for very large files.
The simplest solution to the O(N^2) and memory issues is to do your processing line by line, and then move your new file into place. It would look something like this:
use std::fs::{self, File};
use std::io::{BufRead, BufReader, BufWriter, Write};
// Scope to ensure that both files are closed before the rename
{
    let file = File::open("file.txt").unwrap();
    // create, not open: the temp file does not exist yet and must be writable
    let out_file = File::create("file.txt.temp").unwrap();
    let reader = BufReader::new(&file);
    let mut writer = BufWriter::new(&out_file);
    for line in reader.lines() {
        let line = line.unwrap();
        if !line.contains("some text") {
            writeln!(writer, "{}", line).unwrap();
        }
    }
    writer.flush().unwrap(); // surface write errors instead of losing them on drop
}
fs::rename("file.txt.temp", "file.txt").unwrap();
This still does not handle cross-platform newlines correctly, for that you'd need a smarter lines iterator.
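One possible way around the newline issue, sketched here under assumptions, is to use BufRead::read_line instead of lines(). read_line keeps the line terminator in the buffer, so "\n" and "\r\n" endings are copied through unchanged. The function name remove_matching_lines is made up for this example, and it works on any reader and writer rather than on the two files above:

```rust
use std::io::{self, BufRead, Write};

/// Copies `reader` to `writer`, skipping every line that contains `needle`.
/// Line terminators are copied as-is because `read_line` keeps them in the buffer.
fn remove_matching_lines<R: BufRead, W: Write>(
    mut reader: R,
    mut writer: W,
    needle: &str,
) -> io::Result<()> {
    let mut line = String::new();
    loop {
        line.clear();
        // `read_line` returns Ok(0) once the end of the input is reached.
        if reader.read_line(&mut line)? == 0 {
            break;
        }
        if !line.contains(needle) {
            writer.write_all(line.as_bytes())?;
        }
    }
    Ok(())
}
```

In the temp-file approach above, the reader would be the BufReader over file.txt and the writer the BufWriter over file.txt.temp. Like lines(), read_line returns an error on input that is not valid UTF-8.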
Hmm, you could try removing the newline character at the end of the previous line.

Take numbers from a file and put into a Vec<i32> but keep getting error

Code:
use std::io::Read;
fn main() {
    let mut file = std::fs::File::open("numbs").unwrap();
    let mut contents = String::new();
    file.read_to_string(&mut contents).unwrap();
    let mut v: Vec<i32> = Vec::new();
    for s in contents.lines() {
        v.push(s.parse::<i32>().unwrap());
    }
}
Error:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ParseIntError { kind: Empty }', src/libcore/result.rs:1165:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
Most likely, your file contains an empty line: for example an extra blank line at the end of the file, or empty lines in the middle. (A single trailing \n on its own is fine, because str::lines does not yield an empty final line for it.)
The easiest way to fix this for your use case is to just ignore empty lines:
for s in contents.lines() {
    if !s.is_empty() {
        v.push(s.parse::<i32>().unwrap());
    }
}
However, it is generally not a good idea to just unwrap a Result especially if you cannot guarantee that it will never panic. A more robust solution is to handle each possible outcome of the Result appropriately. Another advantage of this solution is that it will not just ignore empty lines but also strings that cannot be parsed as an i32. Whether this is what you want or if you wish to handle this error explicitly is up to you. In the following example, we will use if-let to only insert values into the vector if they were successfully parsed as an i32:
for s in contents.lines() {
    if let Ok(i) = s.parse::<i32>() {
        v.push(i);
    }
}
Side note: you don't need to read the entire file into a string and then parse that line by line. Refer to "Read large files line by line in Rust" to see how to achieve this more idiomatically.
Combining the aforementioned point and the use of flatten and flat_map, we can greatly simplify the logic to:
use std::fs::File;
use std::io::{BufRead, BufReader};
fn main() {
    let file = File::open("numbs").unwrap();
    let v: Vec<i32> = BufReader::new(file)
        .lines()
        .flatten() // gets rid of Err from lines
        .flat_map(|line| line.parse::<i32>()) // ignores Err variant from Result of str.parse
        .collect();
}
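If silently skipping unparsable lines is not what you want, one option is to collect into a Result, which stops at the first error and returns it. This is a minimal sketch; the function name parse_numbers is made up for this example, and it takes the file contents as a string as in the question's code:

```rust
use std::num::ParseIntError;

/// Parses one integer per line, skipping blank lines but reporting the first
/// line that is not a valid `i32`. `parse_numbers` is a hypothetical helper.
fn parse_numbers(contents: &str) -> Result<Vec<i32>, ParseIntError> {
    contents
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(|line| line.parse::<i32>())
        .collect() // collecting into a Result stops at the first Err
}
```

The caller can then decide whether to propagate the error with ?, print it, or fall back to a default.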

How to modify a Cow variable that uses itself in a loop?

I am trying to remove all the parentheses in a string. Not thinking about it too hard, I just do a simple regexp replace (i.e. the problem in question is not particularly about getting rid of arbitrary levels of nested parentheses, but feel free to suggest a better way of doing that in a comment if you want).
use regex::Regex;
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let input = "Text (with some (nested) parentheses)!";
    let re = Regex::new(r"\([^()]*\)")?;
    let output = re.replace_all(&input, "");
    let output = re.replace_all(&output, "");
    // let output = re.replace_all(&output, "");
    // let output = re.replace_all(&output, "");
    // let output = re.replace_all(&output, "");
    // let output = re.replace_all(&output, "");
    // ...
    assert_eq!("Text !", output);
    println!("Works!");
    Ok(())
}
Because I do not know how nested the parentheses will be, I need to do the replacement in a loop rather than repeating it "just enough times". Creating a loop, however, creates a new scope and that's where I'm hitting a dead point in the discussion with the borrow checker.
The simplest case that shows what I am trying to do in the loop would be:
let mut output = re.replace_all(&input, "");
while re.is_match(&output) {
    output = re.replace_all(&output, "");
}
However that cannot be done because I am assigning to a borrowed variable:
error[E0506]: cannot assign to `output` because it is borrowed
 --> src/main.rs:9:9
  |
9 |         output = re.replace_all(&output, "");
  |         ^^^^^^                  ------- borrow of `output` occurs here
  |         |
  |         assignment to borrowed `output` occurs here
  |         borrow later used here
What I would like to do, ideally, is to create new variable binding with the same name, but using let output = will shadow the outer variable binding, so the loop would cycle infinitely.
No matter what inner or outer temporary variable I create I cannot make it do what I want. I also tried using the fact that re.replace_all() returns Cow and tried using .to_owned() and .to_string() in a couple of places, but that didn't help either.
Here's a link to a playground.
re.replace_all() returns Cow
This is the root of the problem. The compiler knows that the return value might reference output, but it will also replace output, causing output to be dropped right away. If it allowed this, the reference would point to unallocated memory, leading to memory unsafety.
The solution is to avoid borrowing at all.
tried using .to_owned()
to_owned on a Cow just returns the same Cow. Perhaps you meant into_owned?
let mut output = re.replace_all(&input, "").into_owned();
while re.is_match(&output) {
    output = re.replace_all(&output, "").into_owned();
}
and .to_string() in a couple of places
This works as well:
let mut output = re.replace_all(&input, "").to_string();
while re.is_match(&output) {
    output = re.replace_all(&output, "").to_string();
}
Shepmaster's answer works, but it's not as efficient as it could be. A subtle property of the Cow type is that by inspecting it, we can determine whether the string was modified, and skip work if it wasn't.
Due to constraints of the Rust type system, if the value was not modified then Cow::into_owned() makes a copy. (Cow::into_owned() of a modified value does not copy). (into_owned documentation)
In your use case, we can detect unmodified Cow -- Cow::Borrowed -- and skip into_owned().
use std::borrow::Cow;
let mut output = /* mutable String */;
// regex::Regex::is_match returns a plain bool, so there is nothing to unwrap
while re.is_match(&output) {
    match re.replace_all(&output, "") {
        // Unmodified -- skip copy
        Cow::Borrowed(_) => {}
        // replace_all() returned a new value that we already own
        Cow::Owned(new) => output = new,
    }
}
But we can go further. Calling both is_match() and replace_all() means the pattern is matched twice. With our new knowledge of Cows, we can optimize that away:
let mut output = /* mutable String */;
// Cow::Owned is returned when the string was modified.
while let Cow::Owned(new) = re.replace_all(&output, "") {
    output = new;
}
Edit: If your input value is immutable, you can avoid the .to_string() copy by making it Cow as well:
let input = "value";
let mut output = Cow::from(input);
while let Cow::Owned(new) = re.replace_all(&output, "") {
    output = Cow::Owned(new);
}
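The loop pattern does not depend on the regex crate, so it can be tried with the standard library alone. Below is a minimal sketch in which a hand-written function, strip_innermost_parens, stands in for re.replace_all with the pattern from the question: it returns Cow::Borrowed when nothing changed and Cow::Owned otherwise. Both function names are made up for this example and are not part of any library.

```rust
use std::borrow::Cow;

/// Removes every innermost "(...)" group in one pass.
/// Hypothetical stand-in for `re.replace_all(s, "")` with `\([^()]*\)`.
fn strip_innermost_parens(s: &str) -> Cow<'_, str> {
    let mut spans: Vec<(usize, usize)> = Vec::new();
    let mut open: Option<usize> = None;
    for (i, b) in s.bytes().enumerate() {
        match b {
            b'(' => open = Some(i), // remember the most recent '('
            b')' => {
                // A ')' closes a group only if a '(' is pending.
                if let Some(start) = open.take() {
                    spans.push((start, i + 1));
                }
            }
            _ => {}
        }
    }
    if spans.is_empty() {
        return Cow::Borrowed(s); // nothing to remove, no allocation
    }
    let mut out = String::with_capacity(s.len());
    let mut last = 0;
    for (start, end) in spans {
        out.push_str(&s[last..start]);
        last = end;
    }
    out.push_str(&s[last..]);
    Cow::Owned(out)
}

/// Repeats the replacement until the input comes back unmodified.
fn remove_all_parens(input: &str) -> String {
    let mut output = input.to_string();
    // Cow::Owned means the string was modified, so keep going.
    while let Cow::Owned(new) = strip_innermost_parens(&output) {
        output = new;
    }
    output
}
```

For the input from the question, the first pass removes "(nested)", the second removes the outer group, and the third returns Cow::Borrowed, which ends the loop.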

AS3 embedding CSV files and reading from them

So in a simple arcade/platformer game, I'm making it so I have a .csv text file set out like so:
660, 25, 0
720, 15, 1
etc..
The first number being the x coordinate, the next being the y coordinate and the last being whether the block kills you or not. Loading this data externally is not a problem and works fine but when it comes to actually running the .swf by itself obviously the .csv file is not embedded into it so I cannot access any values from it.
Therefore my question is: How can I embed a .csv file into my project and then read out 3 values per line into a multi dimensional array with each line denoting a different obstacle?
(The multi dimensional array being [obstacleID][0 for x coord/1 for y coord/2 for whether it kills or not])
How to embed a text file in Flash
then you can try:
var csv:embedded_csv = new embedded_csv();
var csvLines:Array = csv.toString().split("\n"); // \n or File.lineSeparator or \r\n
for (var i:int = 0; i < csvLines.length; i++)
{
    var line:Array = String(csvLines[i]).split(", ");
    x = line[0];
    y = line[1];
    kills = line[2];
    ...
}

Need program to loop correctly

Got it.
while 1:
    line = sub.readline().split()
    if line == []:
        new = main
        break
    else:
        new = main.replace(line[0],line[1])
        main = new
This seems to work for me. Thanks for the help =).
Try another loop and in that loop index the word swap that you need to occur:
I assume each line of sub.txt has the subs that you want. Read all the lines of sub.txt, storing each line in an indexable array. Set up a loop around your main code, and in that loop index through the array, referencing in sequence the line of sub.txt that you want each time.
As pointed out by Cameron, this method will overwrite the output file every time thus recording only your last change.
The error in this block is the indentation of the marked line:
while True:
    word = substitute.readline().split()
    print(word)
    if word == []:  # --- Indentation ---
        break
    else:
        new = (main_story.read().replace(word[0],word[1]))
        new_story.write(new)
You need to read the complete file at once, make changes and write to file.
Or you could read from the first file, and then do subsequent read/writes on the new file.

Resources