I am trying to remove all the parentheses in a string. Not thinking about it too hard, I just do a simple regexp replace (i.e. the problem in question is not particularly about getting rid of arbitrary levels of nested parentheses, but feel free to suggest a better way of doing that in a comment if you want).
use regex::Regex;
fn main() -> Result<(), Box<dyn std::error::Error>> {
let input = "Text (with some (nested) parentheses)!";
let re = Regex::new(r"\([^()]*\)")?;
let output = re.replace_all(&input, "");
let output = re.replace_all(&output, "");
// let output = re.replace_all(&output, "");
// let output = re.replace_all(&output, "");
// let output = re.replace_all(&output, "");
// let output = re.replace_all(&output, "");
// ...
assert_eq!("Text !", output);
println!("Works!");
Ok(())
}
Because I do not know how nested the parentheses will be, I need to do the replacement in a loop rather than repeating it "just enough times". Creating a loop, however, creates a new scope and that's where I'm hitting a dead point in the discussion with the borrow checker.
The simplest case that shows what I am trying to do in the loop would be:
let mut output = re.replace_all(&input, "");
while re.is_match(&output) {
output = re.replace_all(&output, "");
}
However that cannot be done because I am assigning to a borrowed variable:
error[E0506]: cannot assign to `output` because it is borrowed
--> src/main.rs:9:9
|
9 | output = re.replace_all(&output, "");
| ^^^^^^ ------- borrow of `output` occurs here
| |
| assignment to borrowed `output` occurs here
| borrow later used here
What I would like to do, ideally, is to create a new variable binding with the same name, but using let output = inside the loop only shadows the outer binding within the loop body. The loop condition would never see the new value, so the loop would cycle infinitely.
No matter what inner or outer temporary variable I create I cannot make it do what I want. I also tried using the fact that re.replace_all() returns Cow and tried using .to_owned() and .to_string() in a couple of places, but that didn't help either.
Here's a link to a playground.
re.replace_all() returns Cow
This is the root of the problem. The compiler knows that the return value might reference output, but the assignment also replaces output, so the old value would be dropped right away. If this were allowed, the reference would point to freed memory, leading to memory unsafety.
The solution is to avoid borrowing at all.
tried using .to_owned()
to_owned on a Cow just clones it, so you get back another Cow that may still borrow from output. Perhaps you meant into_owned?
let mut output = re.replace_all(&input, "").into_owned();
while re.is_match(&output) {
output = re.replace_all(&output, "").into_owned();
}
and .to_string() in a couple of places
This works as well:
let mut output = re.replace_all(&input, "").to_string();
while re.is_match(&output) {
output = re.replace_all(&output, "").to_string();
}
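To make the to_owned / into_owned difference concrete, here is a minimal sketch using only the standard library. The function name to_owned_vs_into_owned is made up for illustration; it is not part of any crate.

```rust
use std::borrow::Cow;

/// Returns (does `to_owned` still give a borrowed Cow?, the detached String).
fn to_owned_vs_into_owned(text: &str) -> (bool, String) {
    let cow: Cow<'_, str> = Cow::Borrowed(text);

    // `Cow` implements `Clone`, so `to_owned` clones the Cow itself:
    // the result is still `Cow::Borrowed` and still tied to `text`.
    let cloned: Cow<'_, str> = cow.to_owned();
    let still_borrowed = matches!(cloned, Cow::Borrowed(_));

    // `into_owned` consumes the Cow and yields a `String` with no borrow attached.
    let owned: String = cow.into_owned();

    (still_borrowed, owned)
}
```

Only the String produced by into_owned (or to_string) is free of the borrow, which is why those two variants satisfy the borrow checker in the loop.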
Shepmaster's answer works, but it's not as efficient as it could be. A subtle property of the Cow type is that by inspecting it, we can determine whether the string was modified, and skip work if it wasn't.
Due to constraints of the Rust type system, if the value was not modified then Cow::into_owned() makes a copy. (Cow::into_owned() of a modified value does not copy). (into_owned documentation)
In your use case, we can detect unmodified Cow -- Cow::Borrowed -- and skip into_owned().
let mut output = /* mutable String */;
while re.is_match(&output) {
match re.replace_all(&output, "") {
// Unmodified -- skip copy
Cow::Borrowed(_) => {}
// replace_all() returned a new value that we already own
Cow::Owned(new) => output = new,
}
}
But we can go further. Calling both is_match() and replace_all() means the pattern is matched twice. With our new knowledge of Cows, we can optimize that away:
let mut output = /* mutable String */;
// Cow::Owned is returned when the string was modified.
while let Cow::Owned(new) = re.replace_all(&output, "") {
output = new;
}
Edit: If your input value is immutable, you can avoid the .to_string() copy by making it Cow as well:
let input = "value";
let mut output = Cow::from(input);
while let Cow::Owned(new) = re.replace_all(&output, "") {
output = Cow::Owned(new);
}
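For anyone who wants to try the while let Cow::Owned loop without pulling in the regex crate, here is a rough sketch. strip_innermost is a hypothetical stand-in that I wrote to behave like re.replace_all(s, "") with the pattern \([^()]*\): it returns Cow::Borrowed when nothing matched and Cow::Owned otherwise. remove_all_parens is likewise just an illustrative name.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for `re.replace_all(s, "")` with `\([^()]*\)`:
/// removes every innermost "(...)" group, borrowing the input if none exists.
fn strip_innermost(s: &str) -> Cow<'_, str> {
    let mut out = String::new();
    let mut copied_up_to = 0; // byte offset in `s` up to which input is handled
    let mut open = None; // byte offset of the latest '(' with no paren after it

    for (i, c) in s.char_indices() {
        match c {
            '(' => open = Some(i),
            ')' => {
                if let Some(start) = open.take() {
                    // Keep the text before the group, skip the group itself.
                    out.push_str(&s[copied_up_to..start]);
                    copied_up_to = i + 1; // ')' is a single byte
                }
            }
            _ => {}
        }
    }

    if copied_up_to == 0 {
        Cow::Borrowed(s) // nothing was removed, no allocation needed
    } else {
        out.push_str(&s[copied_up_to..]);
        Cow::Owned(out)
    }
}

/// Repeats the replacement until the Cow comes back borrowed (unmodified).
fn remove_all_parens(input: &str) -> String {
    let mut output = input.to_string();
    while let Cow::Owned(new) = strip_innermost(&output) {
        output = new;
    }
    output
}
```

Calling remove_all_parens("Text (with some (nested) parentheses)!") should give "Text !" after two modifying passes; the third pass returns Cow::Borrowed and ends the loop.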
So I am trying to compare user input to the lines from a separate file named fruits.txt. I believe I got it mostly working, but I am running into this error:
error[E0658]: use of unstable library feature 'option_result_contains'
--> src/main.rs:19:20
|
19 | s if s.contains(&ask) => println!("{} is a fruit!", ask),
| ^^^^^^^^
|
= note: see issue #62358 <https://github.com/rust-lang/rust/issues/62358> for more information
For more information about this error, try `rustc --explain E0658`.
error: could not compile `learn_arrays` due to previous error
I have tried several ways to match it in Rust, and this is the closest I got without the compiler complaining that I am trying to match a string against whatever type line is. Here is what it looks like:
use std::fs::File;
use std::io::{BufReader, BufRead, Error, stdin};
fn main() -> Result<(), Error>{
let path = "fruits.txt";
let input = File::open(path)?;
let buffered = BufReader::new(input);
let mut ask = String::new();
stdin()
.read_line(&mut ask)
.expect("Failed to read line");
let ask: String = ask.trim().parse().expect("Please type a valid string!");
for line in buffered.lines() {
match line {
s if s.contains(&ask) => println!("{} is a fruit!", ask),
_ => println!("{} is either not in the list or not a fruit", ask),
}
}
Ok(())
}
Is there a way I can use the unstable feature, or is there another, better method to compare user input to lines from a file?
I was able to fix the issue by changing the part where I am attempting to match the input to:
let mut found = false;
println!("Result");
for line in buffered.lines() {
let s = line.unwrap();
if s.find(&ask).is_some() {
println!("{} is a fruit!", ask);
found = true;
break;
}
}
if !found {
println!("{} is either not in the list or not a fruit", ask)
}
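The underlying reason for the error is that each item from lines() is an io::Result<String>, not a String, so s.contains in the match guard resolved to the unstable method on Result rather than to str::contains. Here is a small sketch of the same fix that propagates read errors instead of unwrapping. The helper name any_line_contains is my own, not something from the standard library.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: reports whether any line of `reader` contains `needle`.
fn any_line_contains<R: BufRead>(reader: R, needle: &str) -> io::Result<bool> {
    for line in reader.lines() {
        // `line` is an io::Result<String>; `?` yields the String or returns the error.
        if line?.contains(needle) {
            return Ok(true);
        }
    }
    Ok(false)
}
```

In the original program this could be called as any_line_contains(buffered, &ask)?, with the two println! branches chosen from the returned bool.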
Code:
use std::io::Read;
fn main() {
let mut file = std::fs::File::open("numbs").unwrap();
let mut contents = String::new();
file.read_to_string(&mut contents).unwrap();
let mut v: Vec<i32> = Vec::new();
for s in contents.lines() {
v.push(s.parse::<i32>().unwrap());
}
}
Error:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ParseIntError { kind: Empty }', src/libcore/result.rs:1165:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
Most likely, your file contains an empty line, for example a blank line at the end or somewhere in the middle. (A single trailing newline character \n is fine on its own: lines() does not yield an extra empty line for it.)
The easiest way to fix this for your use case is to just ignore empty lines:
for s in contents.lines() {
if !s.is_empty() {
v.push(s.parse::<i32>().unwrap());
}
}
However, it is generally not a good idea to just unwrap a Result especially if you cannot guarantee that it will never panic. A more robust solution is to handle each possible outcome of the Result appropriately. Another advantage of this solution is that it will not just ignore empty lines but also strings that cannot be parsed as an i32. Whether this is what you want or if you wish to handle this error explicitly is up to you. In the following example, we will use if-let to only insert values into the vector if they were successfully parsed as an i32:
for s in contents.lines() {
if let Ok(i) = s.parse::<i32>() {
v.push(i);
}
}
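If you would rather report a bad line than silently skip it, one option is to collect into a Result. This is only a sketch; parse_numbers is a name I picked for illustration, and trimming each line is my own assumption about what the file may contain.

```rust
use std::num::ParseIntError;

/// Parses one i32 per non-empty line; the first unparsable line becomes the error.
fn parse_numbers(contents: &str) -> Result<Vec<i32>, ParseIntError> {
    contents
        .lines()
        .map(str::trim) // assumption: surrounding whitespace should be ignored
        .filter(|line| !line.is_empty()) // skip blank lines
        .map(|line| line.parse::<i32>())
        .collect() // stops at the first Err and returns it
}
```

The caller can then decide whether an unparsable line is fatal, instead of hitting a panic from unwrap.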
Side Note: You don't need to read the entire file into a string and then parse that line-by-line. Refer to "Read large files line by line in Rust" to see how to achieve this more idiomatically.
Combining the aforementioned point and the use of flatten and flat_map, we can greatly simplify the logic to:
use std::fs::File;
use std::io::{BufRead, BufReader};
fn main() {
let file = File::open("numbs").unwrap();
let v: Vec<i32> = BufReader::new(file)
.lines()
.flatten() // gets rid of Err from lines
.flat_map(|line| line.parse::<i32>()) // ignores Err variant from Result of str.parse
.collect();
}
I have a text file containing 18000 lines of city names. Each line has the city name, state, latitude, longitude, etc. Below is the function that reads it. If I don't call string.components(separatedBy: ", "), the loading function is pretty fast, but with that call it takes long enough to freeze my UI. What is the right way of doing this? Is string.components(separatedBy: ", ") that costly?
I profiled the app: the string.components(separatedBy: ", ") line takes 1.45s out of 2.09s for the whole function.
func readCitiesFromCountry(country: String) -> [String] {
var cityArray: [String] = []
var flag = true
var returnedCitiesList: [String] = []
if let path = Bundle.main.path(forResource: country, ofType: "txt") {
guard let streamReader = StreamReader(path: path) else {fatalError()}
defer {
streamReader.close()
}
while flag {
if let nextLine = streamReader.nextLine() {
cityArray = nextLine.components(separatedBy: ",") // this is the line taking a lot of time, without this function runs pretty fast
if (country == "USA") {
returnedCitiesList.append("\(cityArray[0]) , \(cityArray[1]) , \(cityArray[2])")
} else {
returnedCitiesList.append("\(cityArray[0]) , \(cityArray[1])")
}
//returnedCitiesList.append(nextLine)
} else {
flag = false
}
}
} else {
fatalError()
}
return returnedCitiesList
}
The StreamReader used in the code comes from "Read a file/URL line-by-line in Swift". It reads a file line by line.
This question is not about how to split a string into an array (see "Split a String into an array in Swift?"), but about why splitting takes so much time in the given function.
NSString.components(separatedBy:) returns a [String], which requires that the content of every piece be copied from the original string into newly allocated strings. This slows things down.
You could address the symptom (UI freezing) by putting this work on a background thread, but that just sweeps the problem under the rug (the inefficient copying is still there) and complicates things (async code is never fun).
Instead, you should consider using String.split(separator:maxSplits:omittingEmptySubsequences:), which returns [Substring]. Each Substring is just a view into the original string's memory, which stores the relevant range so that you only see that portion of the String which is modeled by the Substring. The only memory allocation happening here is for the array.
Hopefully that should be enough to speed your code up to acceptable levels. If not, you should combine both solutions, and use split off-thread.
I'm trying to implement a commonly used pattern - using the result of a previous loop iteration in the next loop iteration. For example, to implement pagination where you need to give the id of the last value on the previous page.
struct Result {
str: String,
}
fn main() {
let times = 10;
let mut last: Option<&str> = None;
for i in 0..times {
let current = do_something(last);
last = match current {
Some(r) => Some(&r.str.to_owned()),
None => None,
};
}
}
fn do_something(o: Option<&str>) -> Option<Result> {
Some(Result {
str: "whatever string".to_string(),
})
}
However, I'm not sure how to actually get the value out of the loop. Currently, the compiler error is temporary value dropped while borrowed (at &r.str.to_owned()). I made many other attempts, but to no avail.
The only way I found to actually get it working is to create some sort of local tmp_str variable and do a hack like this:
match current {
Some(r) => {
tmp_str.clone_from(&r.str);
last = Some(&tmp_str);
}
None => {
last = None;
}
}
But that doesn't feel like it's the way it's supposed to be done.
In your code, it remains unclear who the owner of the String referenced in last: Option<&str> is supposed to be. You could introduce an extra mutable local variable that owns the string. But then you would have two variables: the owner and the reference, which seems redundant. It would be much simpler to just make last the owner:
struct MyRes {
str: String,
}
fn main() {
let times = 10;
let mut last: Option<String> = None;
for _i in 0..times {
last = do_something(&last).map(|r| r.str);
}
}
fn do_something(_o: &Option<String>) -> Option<MyRes> {
Some(MyRes {
str: "whatever string".to_string(),
})
}
In do_something, you can just pass the whole argument by reference; this seems more likely to be what you wanted. Also note that naming your own struct Result is a bad idea, because Result is such a pervasive type with deep language support (the ? operator etc.).
Follow-up question: Option<&str> or Option<String>?
Option<&str> and Option<String> have different trade-offs. One is better for passing string literals, the other is better for passing owned Strings. I'd actually propose to use neither, and instead make the function generic over a type S that implements AsRef<str>. Here is a comparison of various methods:
fn do_something(o: &Option<String>) {
let _a: Option<&str> = o.as_ref().map(|r| &**r);
let _b: Option<String> = o.clone();
}
fn do_something2(o: &Option<&str>) {
let _a: Option<&str> = o.clone(); // do you need it?
let _b: Option<String> = o.map(|r| r.to_string());
}
fn do_something3<S: AsRef<str>>(o: &Option<S>) {
let _a: Option<&str> = o.as_ref().map(|s| s.as_ref());
let _b: Option<String> = o.as_ref().map(|r| r.as_ref().to_string());
}
fn main() {
let x: Option<String> = None;
let y: Option<&str> = None;
do_something(&x); // nice
do_something(&y.map(|r| r.to_string())); // awkward & expensive
do_something2(&x.as_ref().map(|x| &**x)); // cheap but awkward
do_something2(&y); // nice
do_something3(&x); // nice
do_something3(&y); // nice, in both cases
}
Note that not all of the above combinations are very idiomatic; some are added just for completeness (e.g. asking for AsRef<str> and then building an owned String out of it seems a bit strange).
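As a further option for the pagination case, the loop can own the String and lend it out as Option<&str> through Option::as_deref, which does the same job as the o.as_ref().map(|r| &**r) conversion above. The sketch below is only an illustration: next_page and last_id_after are hypothetical names, and the "ids are increasing numbers" logic is an assumption made up so the example has something to compute.

```rust
/// Hypothetical page fetcher: receives the last id of the previous page
/// and returns the last id of the next one (ids are assumed to be numbers).
fn next_page(last: Option<&str>) -> Option<String> {
    let n: u32 = match last {
        Some(id) => id.parse::<u32>().ok()?, // give up on a malformed id
        None => 0,                           // first page
    };
    Some((n + 1).to_string())
}

/// Feeds the result of each iteration into the next one.
fn last_id_after(pages: usize) -> Option<String> {
    let mut last: Option<String> = None; // the loop variable owns the String
    for _ in 0..pages {
        // `as_deref` lends the owned String as Option<&str>; the borrow ends
        // when `next_page` returns, so reassigning `last` is allowed.
        last = next_page(last.as_deref());
    }
    last
}
```

This keeps a single owner (last) while the callee still gets the cheap Option<&str> signature.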
r.str.to_owned() is a temporary value. You can take a reference to a temporary, but because the temporary value will usually be dropped (destroyed) at the end of the innermost enclosing statement, the reference becomes dangling at that point. In this case the "innermost enclosing statement" is either the last line of the loop, or the loop body itself -- I'm not sure exactly which one applies here, but it doesn't matter, because either way, you're trying to make last contain a reference to a String that will soon be dropped, making last unusable. The compiler is right to stop you from using it again in the next iteration of the loop.
The easiest fix is just to not make last a reference at all -- in the example, it's not necessary or desirable. Just use Option<String>:
fn main() {
let times = 10;
let mut last = None;
for _ in 0..times {
last = match do_something(last) {
Some(r) => Some(r.str),
None => None,
};
}
}
fn do_something(_: Option<String>) -> Option<Result> {
// ...
}
There are also ways to make the reference version work; here is one:
let mut current; // lift this declaration out of the loop so `current` will have
// a lifetime longer than one iteration
for _ in 0..times {
current = do_something(last);
last = match current {
Some(ref r) => Some(&r.str), // borrow from `current` in the loop instead
// of from a newly created String
None => None,
};
}
You might want to do this if your code is more complicated than the example and using String would mean a lot of potentially expensive .clone()s.
I have a fixed size array of Strings: [String; 2]. I want to turn it into a (String, String). Can I do this without copying the values?
The piece of code that I'm working on in particular is the following:
let (basis, names_0, names_1) = if let Some(names) = self.arg_name {
(ComparisonBasis::Name, names[0], names[1])
} else {
(ComparisonBasis::File, self.arg_file[0], self.arg_file[1])
};
types:
self.arg_name: Option<[String; 2]>
self.arg_file: Vec<String>
Right now I'm getting errors
cannot move out of type `[std::string::String; 2]`, a non-copy fixed-size array [E0508]
and
cannot move out of indexed content [E0507]
for the two arms of the if
You've omitted a fair amount of context, so I'm taking a guess at a few aspects. I'm also hewing a little closer to the question you asked, rather than the vaguer one implied by your snippets.
struct NeverSpecified {
arg_names: Option<[String; 2]>,
arg_file: Vec<String>,
}
impl NeverSpecified {
fn some_method_i_guess(mut self) -> (String, String) {
if let Some(mut names) = self.arg_names {
use std::mem::replace;
let name_0 = replace(&mut names[0], String::new());
let name_1 = replace(&mut names[1], String::new());
(name_0, name_1)
} else {
let mut names = self.arg_file.drain(0..2);
let name_0 = names.next().expect("expected 2 names, got 0");
let name_1 = names.next().expect("expected 2 names, got 1");
(name_0, name_1)
}
}
}
I use std::mem::replace to switch the contents of the array, whilst leaving it in a valid state. This is necessary because Rust won't allow you to have a "partially valid" array. There are no copies or allocations involved in this path.
In the other path, we have to pull elements out of the vector by hand. Again, you can't just move values out of a container via indexing (this is actually a limitation of indexing overall). Instead, I use Vec::drain to essentially chop the first two elements out of the vector, then extract them from the resulting iterator. To be clear: this path doesn't involve any copies or allocations, either.
As an aside, those expect calls shouldn't ever be triggered (drain does bounds checking, so it panics first if the vector holds fewer than two elements), but better paranoid than sorry; if you want to replace them with unwrap() calls instead, that should be fine.
Since Rust 1.36, you can use slice patterns to bind to all the values of the array at once:
struct NeverSpecified {
arg_names: Option<[String; 2]>,
arg_file: Vec<String>,
}
impl NeverSpecified {
fn some_method_i_guess(mut self) -> (String, String) {
if let Some([name_0, name_1]) = self.arg_names.take() {
(name_0, name_1)
} else {
let mut names = self.arg_file.drain(0..2);
let name_0 = names.next().expect("expected 2 names, got 0");
let name_1 = names.next().expect("expected 2 names, got 1");
(name_0, name_1)
}
}
}
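Pulled out of the struct, the two moves can be sketched as free functions. The names pair_from_array and pair_from_vec are made up for this example, and returning None for a too-short vector (instead of letting drain panic) is my own choice, not something the question asked for.

```rust
/// Moves both Strings out of the array with an array pattern; nothing is cloned.
fn pair_from_array(names: [String; 2]) -> (String, String) {
    let [first, second] = names;
    (first, second)
}

/// Moves the first two Strings out of the vector, or returns None
/// (leaving the vector untouched) when it holds fewer than two elements.
fn pair_from_vec(files: &mut Vec<String>) -> Option<(String, String)> {
    if files.len() < 2 {
        return None; // `drain(0..2)` would panic here
    }
    let mut head = files.drain(0..2);
    let first = head.next()?;
    let second = head.next()?;
    Some((first, second))
}
```

After pair_from_vec succeeds, the vector keeps whatever elements came after the first two.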
See also:
Method for safely moving all elements out of a generic array into a tuple with minimal overhead