Rust Programming - Trying to compare user input to lines from a file

So I am trying to compare user input to the lines from a separate file named fruits.txt. I believe I have it mostly working, but I am running into this error:
error[E0658]: use of unstable library feature 'option_result_contains'
--> src/main.rs:19:20
|
19 | s if s.contains(&ask) => println!("{} is a fruit!", ask),
| ^^^^^^^^
|
= note: see issue #62358 <https://github.com/rust-lang/rust/issues/62358> for more information
For more information about this error, try `rustc --explain E0658`.
error: could not compile `learn_arrays` due to previous error
I have tried several ways to match it in Rust, and this is the closest I have come without the compiler complaining that I am trying to match a string against whatever type lines yields. Here is what it looks like:
use std::fs::File;
use std::io::{BufReader, BufRead, Error, stdin};
fn main() -> Result<(), Error>{
let path = "fruits.txt";
let input = File::open(path)?;
let buffered = BufReader::new(input);
let mut ask = String::new();
stdin()
.read_line(&mut ask)
.expect("Failed to read line");
let ask: String = ask.trim().parse().expect("Please type a valid string!");
for line in buffered.lines() {
match line {
s if s.contains(&ask) => println!("{} is a fruit!", ask),
_ => println!("{} is either not in the list or not a fruit", ask),
}
}
Ok(())
}
Is there a way I can use the unstable feature, or is there a better method to compare user input to lines from a file?

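I was able to fix the issue by changing the part where I attempt to match the input. Each item yielded by buffered.lines() is an io::Result<String>, not a String, so s.contains(&ask) resolved to the unstable Result::contains; unwrapping the line first gives a String to search: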
let mut found = false;
println!("Result");
for line in buffered.lines() {
let s = line.unwrap();
if s.find(&ask).is_some() {
println!("{} is a fruit!", ask);
found = true;
break;
}
}
if !found {
println!("{} is either not in the list or not a fruit", ask)
}
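The same idea can be sketched with the I/O error propagated instead of unwrapped. This is only an illustration: the helper name contains_fruit is hypothetical, and it compares whole lines for equality (an assumption, so that "app" does not match "apple"), whereas the fix above does a substring search. In a real program the reader would be BufReader::new(File::open("fruits.txt")?); an in-memory list is used here to keep the sketch self-contained.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: returns Ok(true) if any line of `reader`
/// equals `needle` after trimming surrounding whitespace.
fn contains_fruit<R: BufRead>(reader: R, needle: &str) -> io::Result<bool> {
    for line in reader.lines() {
        // `lines()` yields io::Result<String>, so unwrap it with `?` first.
        let line = line?;
        if line.trim() == needle {
            return Ok(true);
        }
    }
    Ok(false)
}

fn main() -> io::Result<()> {
    // Stand-in for the contents of fruits.txt (assumed one fruit per line).
    let fruits = "apple\nbanana\ncherry\n";
    for ask in ["banana", "carrot"].iter() {
        if contains_fruit(fruits.as_bytes(), ask)? {
            println!("{} is a fruit!", ask);
        } else {
            println!("{} is either not in the list or not a fruit", ask);
        }
    }
    Ok(())
}
```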

Related

Rust - open dynamic number of writers

Let's say I have a dynamic number of input strings from a file (barcodes).
I want to split up a huge 111GB text file based upon matches to the input strings, and write those hits to files.
I don't know how many inputs to expect.
I have done all the file input and string matching, but am stuck at the output step.
Ideally, I would open a file for each input in the input vector barcodes, just containing strings. Are there any approaches to open a dynamic number of output files?
A suboptimal approach is searching for a barcode string as an input arg, but this means I have to read the huge file repeatedly.
The barcode input vector just contains strings, eg
"TAGAGTAT",
"TAGAGTAG",
Ideally, output should look like this if the previous two strings are input
file1 -> TAGAGTAT.txt
file2 -> TAGAGTAG.txt
Thanks for your help.
extern crate needletail;
use needletail::{parse_fastx_file, Sequence, FastxReader};
use std::str;
use std::fs::File;
use std::io::prelude::*;
use std::path::Path;
fn read_barcodes () -> Vec<String> {
// TODO - can replace this with file reading code (OR move to an arguments based model, parse and demultiplex only one oligomer at a time..... )
// The `vec!` macro can be used to initialize a vector of strings
let barcodes = vec![
"TCTCAAAG".to_string(),
"AACTCCGC".into(),
"TAAACGCG".into()
];
println!("Initial vector: {:?}", barcodes);
return barcodes
}
fn main() {
//let filename = "test5m.fastq";
let filename = "Undetermined_S0_R1.fastq";
println!("Fastq filename: {} ", filename);
//println!("Barcodes filename: {} ", barcodes_filename);
let barcodes_vector: Vec<String> = read_barcodes();
let mut counts_vector: [i32; 30] = [0; 30];
let mut n_bases = 0;
let mut n_valid_kmers = 0;
let mut reader = parse_fastx_file(&filename).expect("Not a valid path/file");
while let Some(record) = reader.next() {
let seqrec = record.expect("invalid record");
// get sequence
let sequenceBytes = seqrec.normalize(false);
let sequenceText = str::from_utf8(&sequenceBytes).unwrap();
//println!("Seq: {} ", &sequenceText);
// get first 8 chars (8chars x 2 bytes)
let sequenceOligo = &sequenceText[0..8];
//println!("barcode vector {}, seqOligo {} ", &barcodes_vector[0], sequenceOligo);
if sequenceOligo == barcodes_vector[0]{
//println!("Hit ! Barcode vector {}, seqOligo {} ", &barcodes_vector[0], sequenceOligo);
counts_vector[0] = counts_vector[0] + 1;
}
You probably want a HashMap<String, File>. You could build it from your barcode vector like this:
use std::collections::HashMap;
use std::fs::File;
use std::path::Path;
fn build_file_map(barcodes: &[String]) -> HashMap<String, File> {
let mut files = HashMap::new();
for barcode in barcodes {
let filename = Path::new(barcode).with_extension("txt");
let file = File::create(filename).expect("failed to create output file");
files.insert(barcode.clone(), file);
}
files
}
You would call it like this:
let barcodes = vec!["TCTCAAAG".to_string(), "AACTCCGC".into(), "TAAACGCG".into()];
let file_map = build_file_map(&barcodes);
And you would get a file to write to like this:
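// borrow the key: a String cannot be moved out of the vector by indexing
let barcode = &barcodes[0];
// `&File` implements `Write`, so a shared reference is enough to write through
let mut file = file_map.get(barcode).expect("barcode not in file map");
// write to file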
Quoting the asker: "I just need an example of a) how to properly instantiate a vector of files named after the relevant string b) set up the output file objects properly c) write to those files."
Here's a commented example:
use std::io::Write;
use std::fs::File;
use std::io;
fn read_barcodes() -> Vec<String> {
// read barcodes here
todo!()
}
fn process_barcode(barcode: &str) -> String {
// process barcodes here
todo!()
}
fn main() -> io::Result<()> {
let barcodes = read_barcodes();
for barcode in barcodes {
// process barcode to get output
let output = process_barcode(&barcode);
// create file for barcode with {barcode}.txt name
let mut file = File::create(format!("{}.txt", barcode))?;
// write output to created file
file.write_all(output.as_bytes())?;
}
Ok(())
}
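To connect the HashMap suggestion with the single pass over the large file, here is a minimal sketch of the routing step. The function name route_read and the barcode_len parameter are hypothetical, and the writers are generic so the example can run against in-memory buffers; in a real run each value would more likely be a BufWriter<File> created with File::create(format!("{}.txt", barcode)). With a very large number of barcodes, the operating system's limit on open files may become a constraint.

```rust
use std::collections::HashMap;
use std::io::{self, Write};

/// Hypothetical helper: writes `read` to the writer registered for its
/// first `barcode_len` bytes. Returns Ok(true) if a writer matched.
fn route_read<W: Write>(
    writers: &mut HashMap<String, W>,
    read: &str,
    barcode_len: usize,
) -> io::Result<bool> {
    // `get` returns None when the read is shorter than the barcode.
    let prefix = match read.get(..barcode_len) {
        Some(p) => p,
        None => return Ok(false),
    };
    match writers.get_mut(prefix) {
        Some(writer) => {
            writeln!(writer, "{}", read)?;
            Ok(true)
        }
        None => Ok(false),
    }
}

fn main() -> io::Result<()> {
    // In-memory buffers stand in for one output file per barcode.
    let mut writers: HashMap<String, Vec<u8>> = HashMap::new();
    for barcode in ["TAGAGTAT", "TAGAGTAG"].iter() {
        writers.insert(barcode.to_string(), Vec::new());
    }
    for read in ["TAGAGTATACGT", "TAGAGTAGGGCC", "CCCCCCCCAAAA"].iter() {
        let matched = route_read(&mut writers, read, 8)?;
        println!("{} matched: {}", read, matched);
    }
    Ok(())
}
```

Each read is looked up once, so the input only has to be scanned a single time regardless of how many barcodes there are.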

Take numbers from a file and put into a Vec<i32> but keep getting error

Code:
use std::io::Read;
fn main() {
let mut file = std::fs::File::open("numbs").unwrap();
let mut contents = String::new();
file.read_to_string(&mut contents).unwrap();
let mut v: Vec<i32> = Vec::new();
for s in contents.lines() {
v.push(s.parse::<i32>().unwrap());
}
}
Error:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ParseIntError { kind: Empty }', src/libcore/result.rs:1165:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
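Most likely, your file contains an empty line, for example a blank line at the end or in the middle of the file. A single trailing newline \n is not the problem by itself, because lines() does not yield an empty final line for it; parsing an empty string is what produces ParseIntError { kind: Empty }.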
The easiest way to fix this for your use case is to just ignore empty lines:
for s in contents.lines() {
if !s.is_empty() {
v.push(s.parse::<i32>().unwrap());
}
}
However, it is generally not a good idea to just unwrap a Result especially if you cannot guarantee that it will never panic. A more robust solution is to handle each possible outcome of the Result appropriately. Another advantage of this solution is that it will not just ignore empty lines but also strings that cannot be parsed as an i32. Whether this is what you want or if you wish to handle this error explicitly is up to you. In the following example, we will use if-let to only insert values into the vector if they were successfully parsed as an i32:
for s in contents.lines() {
if let Ok(i) = s.parse::<i32>() {
v.push(i);
}
}
Side note: You don't need to read the entire file into a string and then parse it line by line. See "Read large files line by line in Rust" for a more idiomatic way to do this.
Combining the aforementioned point and the use of flatten and flat_map, we can greatly simplify the logic to:
use std::fs::File;
use std::io::{BufRead, BufReader};
fn main() {
let file = File::open("numbs").unwrap();
let v: Vec<i32> = BufReader::new(file)
.lines()
.flatten() // gets rid of Err from lines
.flat_map(|line| line.parse::<i32>()) // ignores Err variant from Result of str.parse
.collect();
}
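If silently dropping bad lines is not what you want, a variant that skips blank lines but reports I/O and parse errors to the caller could look like the sketch below. The function name read_numbers is hypothetical, and the in-memory input stands in for BufReader::new(File::open("numbs")?).

```rust
use std::error::Error;
use std::io::BufRead;

/// Hypothetical helper: parses one i32 per non-blank line and returns
/// the first I/O or parse error instead of ignoring it.
fn read_numbers<R: BufRead>(reader: R) -> Result<Vec<i32>, Box<dyn Error>> {
    let mut numbers = Vec::new();
    for line in reader.lines() {
        let line = line?; // propagate read errors
        let trimmed = line.trim();
        if trimmed.is_empty() {
            continue; // blank lines are skipped on purpose
        }
        numbers.push(trimmed.parse::<i32>()?); // propagate parse errors
    }
    Ok(numbers)
}

fn main() -> Result<(), Box<dyn Error>> {
    // `&[u8]` implements BufRead, so a byte slice works as a reader.
    let input = "1\n2\n\n3\n";
    let numbers = read_numbers(input.as_bytes())?;
    println!("{:?}", numbers);
    Ok(())
}
```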

How to modify a Cow variable that uses itself in a loop?

I am trying to remove all the parentheses in a string. Not thinking about it too hard, I just do a simple regexp replace (i.e. the problem in question is not particularly about getting rid of arbitrary levels of nested parentheses, but feel free to suggest a better way of doing that in a comment if you want).
use regex::Regex;
fn main() -> Result<(), Box<dyn std::error::Error>> {
let input = "Text (with some (nested) parentheses)!";
let re = Regex::new(r"\([^()]*\)")?;
let output = re.replace_all(&input, "");
let output = re.replace_all(&output, "");
// let output = re.replace_all(&output, "");
// let output = re.replace_all(&output, "");
// let output = re.replace_all(&output, "");
// let output = re.replace_all(&output, "");
// ...
assert_eq!("Text !", output);
println!("Works!");
Ok(())
}
Because I do not know how nested the parentheses will be, I need to do the replacement in a loop rather than repeating it "just enough times". Creating a loop, however, creates a new scope and that's where I'm hitting a dead point in the discussion with the borrow checker.
The simplest case that shows what I am trying to do in the loop would be:
let mut output = re.replace_all(&input, "");
while re.is_match(&output) {
output = re.replace_all(&output, "");
}
However that cannot be done because I am assigning to a borrowed variable:
error[E0506]: cannot assign to `output` because it is borrowed
--> src/main.rs:9:9
|
9 | output = re.replace_all(&output, "");
| ^^^^^^ ------- borrow of `output` occurs here
| |
| assignment to borrowed `output` occurs here
| borrow later used here
What I would like to do, ideally, is to create a new variable binding with the same name, but using let output = will shadow the outer variable binding, so the loop would cycle infinitely.
No matter what inner or outer temporary variable I create I cannot make it do what I want. I also tried using the fact that re.replace_all() returns Cow and tried using .to_owned() and .to_string() in a couple of places, but that didn't help either.
Here's a link to a playground.
"re.replace_all() returns Cow"
This is the root of the problem. The compiler knows that the returned Cow might borrow from output, but the assignment also replaces output, which drops the old value immediately. If this were allowed, the reference would point to freed memory, leading to memory unsafety.
The solution is to avoid borrowing at all.
"tried using .to_owned()"
to_owned on a Cow just returns the same Cow. Perhaps you meant into_owned?
let mut output = re.replace_all(&input, "").into_owned();
while re.is_match(&output) {
output = re.replace_all(&output, "").into_owned();
}
"and .to_string() in a couple of places"
This works as well:
let mut output = re.replace_all(&input, "").to_string();
while re.is_match(&output) {
output = re.replace_all(&output, "").to_string();
}
Shepmaster's answer works, but it's not as efficient as it could be. A subtle property of the Cow type is that by inspecting it, we can determine whether the string was modified, and skip work if it wasn't.
Due to constraints of the Rust type system, if the value was not modified then Cow::into_owned() makes a copy. (Cow::into_owned() of a modified value does not copy). (into_owned documentation)
In your use case, we can detect unmodified Cow -- Cow::Borrowed -- and skip into_owned().
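use std::borrow::Cow;
let mut output = input.to_string(); // an owned, mutable String
// regex::Regex::is_match returns a plain bool, so there is nothing to unwrap
while re.is_match(&output) {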
match re.replace_all(&output, "") {
// Unmodified -- skip copy
Cow::Borrowed(_) => {}
// replace_all() returned a new value that we already own
Cow::Owned(new) => output = new,
}
}
But we can go further. Calling both is_match() and replace_all() means the pattern is matched twice. With our new knowledge of Cows, we can optimize that away:
let mut output = input.to_string(); // an owned, mutable String
// Cow::Owned is returned when the string was modified.
while let Cow::Owned(new) = re.replace_all(&output, "") {
output = new;
}
Edit: If your input value is immutable, you can avoid the .to_string() copy by making it Cow as well:
let input = "value";
let mut output = Cow::from(input);
while let Cow::Owned(new) = re.replace_all(&output, "") {
output = Cow::Owned(new);
}
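The loop shape can be tried without the regex crate. In the sketch below, strip_innermost is a hypothetical standard-library stand-in that mimics re.replace_all(s, "") for the pattern \([^()]*\): it returns Cow::Borrowed when nothing was removed and Cow::Owned otherwise. strip_all is likewise a hypothetical wrapper around the while let loop.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for `re.replace_all(s, "")`: removes every
/// innermost "(...)" group in a single pass over `s`.
fn strip_innermost(s: &str) -> Cow<'_, str> {
    let mut out = String::with_capacity(s.len());
    let mut changed = false;
    let mut rest = s;
    while let Some(close) = rest.find(')') {
        // The nearest '(' before this ')' opens an innermost group.
        match rest[..close].rfind('(') {
            Some(open) => {
                out.push_str(&rest[..open]);
                changed = true;
            }
            // Unmatched ')': keep it as-is.
            None => out.push_str(&rest[..=close]),
        }
        rest = &rest[close + 1..];
    }
    if !changed {
        return Cow::Borrowed(s); // nothing removed, no allocation handed out
    }
    out.push_str(rest);
    Cow::Owned(out)
}

/// Repeats the pass until it reports that nothing changed.
fn strip_all(input: &str) -> Cow<'_, str> {
    let mut output = Cow::from(input);
    while let Cow::Owned(new) = strip_innermost(&output) {
        output = Cow::Owned(new);
    }
    output
}

fn main() {
    let cleaned = strip_all("Text (with some (nested) parentheses)!");
    println!("{}", cleaned);
}
```

When the input contains no parentheses, strip_all hands back the original borrow, so no String is allocated for the result.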

How do I move String values from an array to a tuple without copying?

I have a fixed size array of Strings: [String; 2]. I want to turn it into a (String, String). Can I do this without copying the values?
The piece of code that I'm working on in particular is the following:
let (basis, names_0, names_1) = if let Some(names) = self.arg_name {
(ComparisonBasis::Name, names[0], names[1])
} else {
(ComparisonBasis::File, self.arg_file[0], self.arg_file[1])
};
types:
self.arg_name: Option<[String; 2]>
self.arg_file: Vec<String>
Right now I'm getting errors
cannot move out of type `[std::string::String; 2]`, a non-copy fixed-size array [E0508]
and
cannot move out of indexed content [E0507]
for the two arms of the if.
You've omitted a fair amount of context, so I'm taking a guess at a few aspects. I'm also hewing a little closer to the question you asked, rather than the vaguer one implied by your snippets.
struct NeverSpecified {
arg_names: Option<[String; 2]>,
arg_file: Vec<String>,
}
impl NeverSpecified {
fn some_method_i_guess(mut self) -> (String, String) {
if let Some(mut names) = self.arg_names {
use std::mem::replace;
let name_0 = replace(&mut names[0], String::new());
let name_1 = replace(&mut names[1], String::new());
(name_0, name_1)
} else {
let mut names = self.arg_file.drain(0..2);
let name_0 = names.next().expect("expected 2 names, got 0");
let name_1 = names.next().expect("expected 2 names, got 1");
(name_0, name_1)
}
}
}
I use std::mem::replace to switch the contents of the array, whilst leaving it in a valid state. This is necessary because Rust won't allow you to have a "partially valid" array. There are no copies or allocations involved in this path.
In the other path, we have to pull elements out of the vector by hand. Again, you can't just move values out of a container via indexing (this is actually a limitation of indexing overall). Instead, I use Vec::drain to essentially chop the first two elements out of the vector, then extract them from the resulting iterator. To be clear: this path doesn't involve any copies or allocations, either.
As an aside, those expect calls shouldn't ever be triggered (since drain does bounds checking), but better paranoid than sorry; if you want to replace them with unwrap() calls instead, that should be fine.
Since Rust 1.36, you can use slice patterns to bind to all the values of the array at once:
struct NeverSpecified {
arg_names: Option<[String; 2]>,
arg_file: Vec<String>,
}
impl NeverSpecified {
fn some_method_i_guess(mut self) -> (String, String) {
if let Some([name_0, name_1]) = self.arg_names.take() {
(name_0, name_1)
} else {
let mut names = self.arg_file.drain(0..2);
let name_0 = names.next().expect("expected 2 names, got 0");
let name_1 = names.next().expect("expected 2 names, got 1");
(name_0, name_1)
}
}
}
See also:
Method for safely moving all elements out of a generic array into a tuple with minimal overhead
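Stripped of the surrounding struct, the two moves can be sketched as free functions. The names array_to_tuple and first_two are hypothetical, and first_two returns None for a short vector instead of panicking, which is an assumption about the desired behaviour rather than what the answers above do.

```rust
/// Moves both Strings out of the array by destructuring it.
fn array_to_tuple(names: [String; 2]) -> (String, String) {
    let [first, second] = names;
    (first, second)
}

/// Moves the first two Strings out of the vector, or returns None
/// if it holds fewer than two elements.
fn first_two(names: Vec<String>) -> Option<(String, String)> {
    let mut iter = names.into_iter();
    let first = iter.next()?;
    let second = iter.next()?;
    Some((first, second))
}

fn main() {
    let names = [String::from("left"), String::from("right")];
    let (a, b) = array_to_tuple(names);
    println!("{} {}", a, b);

    let files = vec![String::from("a.txt"), String::from("b.txt")];
    if let Some((first, second)) = first_two(files) {
        println!("{} {}", first, second);
    }
}
```

Both functions only move the String handles; the heap buffers holding the text are not copied.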

Writing to a file or stdout in Rust

I'm learning Rust, and I'm somewhat stumped.
I'm trying to give the user the option of writing output to stdout or to a supplied filename.
I started with the example code that's given for using extra::getopts located here. From there, in the do_work function, I'm trying to do this:
use std::io::stdio::stdout;
use std::io::buffered::BufferedWriter;
fn do_work( input: &str, out: Option<~str> ) {
println!( "Input: {}", input );
println!( "Output: {}", match out {
Some(x) => x,
None => ~"Using stdout"
} );
let out_writer = BufferedWriter::new( match out {
// I know that unwrap is frowned upon,
// but for now I don't want to deal with the Option.
Some(x) => File::create( &Path::new( x ) ).unwrap(),
None => stdout()
} );
out_writer.write( bytes!( "Test output\n" ) );
}
But it outputs the following error:
test.rs:25:43: 28:6 error: match arms have incompatible types: expected `std::io::fs::File` but found `std::io::stdio::StdWriter` (expected struct std::io::fs::File but found struct std::io::stdio::StdWriter)
test.rs:25 let out_writer = BufferedWriter::new( match out {
test.rs:26 Some(x) => File::create( &Path::new( x ) ).unwrap(),
test.rs:27 None => stdout()
test.rs:28 } );
test.rs:25:22: 25:41 error: failed to find an implementation of trait std::io::Writer for [type error]
test.rs:25 let out_writer = BufferedWriter::new( match out {
^~~~~~~~~~~~~~~~~~~
But I don't understand what the issue is because both File and StdWriter implement the Writer Trait. Can someone explain what I'm doing wrong?
Thanks!
A lot has changed in Rust since 2014, so here is an answer that works for me using Rust 1.15.1:
let out_writer = match out {
Some(x) => {
let path = Path::new(x);
Box::new(File::create(&path).unwrap()) as Box<dyn Write>
}
None => Box::new(io::stdout()) as Box<dyn Write>,
};
This is pretty much the same as @Arjan's answer, except that ~ was replaced by Box, and some names have changed. I'm leaving out BufferedWriter, but if you want that, I believe it is now named BufWriter.
Yes, both implement Write, but the problem is that BufWriter expects a single type T that implements Write, and that T can't be File and Stdout at the same time.
You must cast both to the common type (either Box<dyn Write> or &dyn Write, but since you cannot return references you have to use Box):
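use std::fs::File;
use std::io::{stdout, BufWriter, Write};
use std::path::Path;
fn do_work(input: &str, out: Option<String>) {
// the annotation makes both arms coerce to the common Box<dyn Write> type
let writer: Box<dyn Write> = match out {
Some(ref x) => Box::new(File::create(Path::new(x)).unwrap()),
None => Box::new(stdout()),
};
let mut out_writer = BufWriter::new(writer);
out_writer.write_all(b"Test output\n").unwrap();
}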
You should also handle errors properly, not just using unwrap (used in example for simplicity).
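As a rough illustration of that last point, the same approach can return io::Result and use ? instead of unwrap. The helper name open_output and the Option<&str> signature are choices made for this sketch, not part of the answers above.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};

/// Hypothetical helper: opens a file when a path is given, stdout otherwise.
fn open_output(path: Option<&str>) -> io::Result<Box<dyn Write>> {
    // The annotation lets both arms coerce to the common trait object type.
    let writer: Box<dyn Write> = match path {
        Some(p) => Box::new(File::create(p)?),
        None => Box::new(io::stdout()),
    };
    Ok(writer)
}

fn do_work(input: &str, path: Option<&str>) -> io::Result<()> {
    let mut out = BufWriter::new(open_output(path)?);
    writeln!(out, "Input: {}", input)?;
    // Flush explicitly so a write error is reported instead of being
    // lost when the BufWriter is dropped.
    out.flush()
}

fn main() -> io::Result<()> {
    // No path given, so the output goes to stdout.
    do_work("example", None)
}
```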
