How do I completely remove a line in Rust? Not just replace it with an empty line.
In Rust, suppose you try to delete a line from a file with code like the following:
use std::fs::File;
use std::io::{BufRead, BufReader, Read, Write};

let mut file = File::open("file.txt").unwrap();
let mut buf = String::new();
file.read_to_string(&mut buf).unwrap(); // Read the file into a buffer

let reader = BufReader::new(File::open("file.txt").unwrap()); // Re-open, since the first handle is now at end-of-file
for line in reader.lines() { // Loop through all the lines in the file
    if line.as_ref().unwrap().contains("some text") { // If the line contains "some text", execute the block
        buf = buf.replace(line.as_ref().unwrap(), ""); // Replace the line's text with nothing
    }
}

let mut file = File::create("file.txt").unwrap(); // Re-open the file for writing
file.write_all(buf.as_bytes()).unwrap(); // Write the buffer back to the file
file.txt:
random text
random text
random text
some text
random text
random text
When you run the code, file.txt turns into this:
random text
random text
random text

random text
random text
Rather than just
random text
random text
random text
random text
random text
Is there any way to completely remove the line rather than just leaving it blank? Like some sort of special character?
This part is bad news: buf = buf.replace(line.as_ref().unwrap(), "");
It searches your entire buffer for the line contents (without the '\n') and replaces them with "". To make it behave as you expect, you need to add the newline back in before replacing; see the sketch below. The problem is that lines() treats more than just "\n" as a newline: it also splits on "\r\n". If you know you're always using "\n" or "\r\n" as the newline you can work around this; if not, you'll need something trickier than lines().
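If you know the file only ever uses "\n" or "\r\n", a rough sketch of that workaround is a small helper that puts the terminator back before replacing (the remove_line helper name is just for illustration):

fn remove_line(buf: &str, line: &str) -> String {
    // Try the Windows ending first, then the Unix one, so the line's
    // text is removed together with its terminator. A final line with
    // no trailing newline would still be left behind.
    buf.replace(&format!("{}\r\n", line), "")
        .replace(&format!("{}\n", line), "")
}

Inside the question's loop this would be used as buf = remove_line(&buf, line.as_ref().unwrap());.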
There is a further issue, though: for larger files, this may end up scanning through the string and resizing it many times, giving O(N^2)-style behaviour rather than the expected O(N). Also, the entire file needs to be read into memory, which can be bad for very large files.
The simplest solution to the O(N^2) and memory issues is to do your processing line by line and then move the new file into place. It would look something like this:
use std::fs::{self, File};
use std::io::{BufRead, BufReader, BufWriter, Write};

// Scope to ensure that the files are closed
{
    let file = File::open("file.txt").unwrap();
    let out_file = File::create("file.txt.temp").unwrap();
    let reader = BufReader::new(&file);
    let mut writer = BufWriter::new(&out_file);
    for line in reader.lines() {
        let line = line.unwrap();
        if !line.contains("some text") {
            writeln!(writer, "{}", line).unwrap();
        }
    }
}
fs::rename("file.txt.temp", "file.txt").unwrap();
This still does not handle cross-platform newlines correctly, for that you'd need a smarter lines iterator.
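For example, a minimal sketch of that idea using BufRead::read_line, which keeps the line terminator intact so both "\n" and "\r\n" survive the round trip (the remove_matching_lines helper and the temp-file naming are just for illustration):

use std::fs::{self, File};
use std::io::{BufRead, BufReader, BufWriter, Write};

fn remove_matching_lines(path: &str, needle: &str) -> std::io::Result<()> {
    let temp_path = format!("{}.temp", path);
    {
        let mut reader = BufReader::new(File::open(path)?);
        let mut writer = BufWriter::new(File::create(&temp_path)?);
        let mut line = String::new();
        loop {
            line.clear();
            // read_line keeps the terminator ("\n" or "\r\n"),
            // so we never have to guess which one the file uses.
            if reader.read_line(&mut line)? == 0 {
                break; // end of file
            }
            if !line.contains(needle) {
                writer.write_all(line.as_bytes())?;
            }
        }
    } // both files are closed (and the writer flushed) here
    fs::rename(temp_path, path)
}

Calling remove_matching_lines("file.txt", "some text") mirrors the example above, but preserves whichever newline convention the file already uses.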
Hmm, you could try removing the newline character on the previous line.
Related
I'm trying to get a random line from a file:
extern crate rand;

use rand::Rng;
use std::{
    fs::File,
    io::{prelude::*, BufReader},
};

const FILENAME: &str = "/etc/hosts";

fn find_word() -> String {
    let f = File::open(FILENAME).expect(&format!("(;_;) file not found: {}", FILENAME));
    let f = BufReader::new(f);
    let lines: Vec<_> = f.lines().collect();
    let n = rand::thread_rng().gen_range(0, lines.len());
    let line = lines
        .get(n)
        .expect(&format!("(;_;) Couldn't get {}th line", n))
        .unwrap_or(String::from(""));
    line
}
This code doesn't work:
error[E0507]: cannot move out of borrowed content
--> src/main.rs:18:16
|
18 | let line = lines
| ________________^
19 | | .get(n)
20 | | .expect(&format!("(;_;) Couldn't get {}th line", n))
| |____________________________________________________________^ cannot move out of borrowed content
I tried adding .clone() before .expect(...) and before .unwrap_or(...) but it gave the same error.
Is there a better way to get a random line from a file that doesn't involve collecting the whole file in a Vec?
Use IteratorRandom::choose to randomly sample from an iterator using reservoir sampling. This will scan through the entire file once, creating a String for each line, but it will not collect every line into a giant vector:
use rand::seq::IteratorRandom; // 0.7.3
use std::{
    fs::File,
    io::{BufRead, BufReader},
};

const FILENAME: &str = "/etc/hosts";

fn find_word() -> String {
    let f = File::open(FILENAME)
        .unwrap_or_else(|e| panic!("(;_;) file not found: {}: {}", FILENAME, e));
    let f = BufReader::new(f);
    let lines = f.lines().map(|l| l.expect("Couldn't read line"));
    lines
        .choose(&mut rand::thread_rng())
        .expect("File had no lines")
}
Your original problems are:
slice::get returns an optional reference into the vector. You can either clone this or take ownership of the value:
let line = lines[n].clone();
let line = lines.swap_remove(n);
Both of these panic if n is out of bounds, which is reasonable here as you know that you are in bounds.
BufRead::lines yields io::Result<String> items, so you also have to handle that error case.
Additionally, don't use format! with expect:
expect(&format!("..."))
This will unconditionally allocate memory. When there's no failure, that allocation is wasted. Use unwrap_or_else as shown.
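For comparison, a minimal sketch of the two forms (assuming the same FILENAME constant and imports as the code above):

// Builds the message String up front, even when opening succeeds:
let f = File::open(FILENAME).expect(&format!("(;_;) file not found: {}", FILENAME));

// Only builds the message if opening actually fails:
let f = File::open(FILENAME).unwrap_or_else(|e| panic!("(;_;) file not found: {}: {}", FILENAME, e));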
Is there a better way to get a random line from a file that doesn't involve collecting the whole file in a Vec?
You will always need to read the whole file, if only to know the number of lines. However, you don't need to store everything in memory, you can read lines one by one and discard them as you go so that you only keep one in the end. Here is how it goes:
Read and store the first line;
Read the second line, draw a random choice and either:
keep the first line with a probability of 50%,
or discard the first line and store the second line with a probability of 50%,
Keep reading lines from the file and for line number n, draw a random choice and:
keep the currently stored line with a probability of (n-1)/n,
or replace the currently stored line with the current line with a probability of 1/n.
Note that this is more or less what sample_iter does, except that sample_iter is more generic: it works on any iterator and it can pick samples of any size (e.g. it can choose k items at random).
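A minimal sketch of the procedure above, hand-rolled with a sample size of 1 (assuming the same rand 0.7 crate as the other answer; the random_line function name is just for illustration):

use rand::Rng; // rand 0.7
use std::{
    fs::File,
    io::{self, BufRead, BufReader},
};

fn random_line(path: &str) -> io::Result<Option<String>> {
    let reader = BufReader::new(File::open(path)?);
    let mut rng = rand::thread_rng();
    let mut chosen = None;

    for (i, line) in reader.lines().enumerate() {
        let line = line?;
        // This is line number n = i + 1: replace the stored line with probability 1/n.
        if rng.gen_range(0, i + 1) == 0 {
            chosen = Some(line);
        }
    }

    Ok(chosen)
}

For the first line (n = 1) the probability is 1, so something is always stored if the file is non-empty; each later line replaces it with probability 1/n, which is exactly the keep/replace rule described in the steps above.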
I have details like the ones below in an array. In the actual case there will be plenty of testbed entries. I want to grep a particular testbed (TESTBED = vApp_eprapot_icr) so that the information shown below gets copied to another array. How can I do it using Perl? The end of a testbed's info is marked by the closing curly bracket }.
TESTBED = vApp_eprapot_icr {
DEVICE = vApp_eprapot_icr-ipos1
DEVICE = vApp_eprapot_icr-ipos2
DEVICE = vApp_eprapot_icr-ipos3
DEVICE = vApp_eprapot_icr-ipos5
CARDS=1GIGE,ETHFAST
CARDS=3GIGE,ETHFAST
CARDS=10PGIGE,ETHFAST
CARDS=20PGIGE,ETHFAST
CARDS=40PGIGE,ETHFAST
CARDS=ETHFAST,ETHFAST
CARDS=10GIGE,ETHFAST
CARDS=ETH,ETHFAST
CARDS=10P10GIGE,ETHFAST
CARDS=PPA2GIGE,ETHFAST
CARDS=ETH,ETHFAST,ETHGIGE
}
Let me make it simpler; please see the array below:
@array = ("
student=Amit {
Age=20
sex=male
rollno=201
}
student=Akshaya {
Age=24
phone:88665544
sex=female
rollno=407
}
student=Akash {
Age=23
sex=male
rollno=356
address=na
phone=88456789
}
");
Consider an array like this, where there are plenty of such entries. I need to grep, for example, student=Akshaya's data: everything from the opening '{' to the closing '}' should get copied to another array. This is what I'm looking for.
while (<>) {
    print if /TESTBED = vApp_eprapot_icr/../\}/;
}
As a side note, <> reads from the filename you pass on the command line. So if the data is stored in a file, you will run it from the command line as:
perl scriptname.pl filename.txt
Ok. We finally have enough information to come up with an answer. Or, at least, to produce two answers which will work on slightly different versions of your input file.
In a comment you say that you are creating your array like this:
@array = `cat $file`;
That's not a very good idea, for a couple of reasons. Firstly, why run an external command like cat when Perl will read the file for you? And secondly, this gives you one element in your array for each line in your input file. Things become far easier if you arrange it so that each of your TESTBED = foo { ... } records is a single array element.
Let's get rid of the cat first. The easiest way to read a single file into an array is to use the file input operator - <>. That will read data from the file whose name is given on the command line. So if you call your program filter_records, you can call it like this:
$ ./filter_records your_input_data.txt
And then read it into an array like this:
@array = <>;
That's good, but we still have each line of the input file in its own array element. How we fix that depends on the exact format of your input file. It's easiest if there's a blank line between each record in the input file, so it looks like this:
student=Amit {
Age=20
sex=male
rollno=201
}
student=Akshaya {
Age=24
phone:88665544
sex=female
rollno=407
}
student=Akash {
Age=23
sex=male
rollno=356
address=na
phone=88456789
}
Perl has a special variable called $/ which controls how it reads records from input files. If we set it to be an empty string then Perl goes into "paragraph" mode and it uses blank lines to delimit records. So we can write code like this:
{
    local $/ = '';
    @array = <>;
}
Note that it's always a good idea to localise changes to Perl's special variables, which is why I have enclosed the whole thing in a naked block.
If there are no blank lines, then things get slightly harder. We'll read the whole file in and then split it.
Here's our example file with no blank lines:
student=Amit {
Age=20
sex=male
rollno=201
}
student=Akshaya {
Age=24
phone:88665544
sex=female
rollno=407
}
student=Akash {
Age=23
sex=male
rollno=356
address=na
phone=88456789
}
And here's the code we use to read that data into an array.
{
    local $/;
    $data = <>;
}
@array = split /(?<=^})\n/m, $data;
This time, we've set $/ to undef, which puts Perl into slurp mode so that the whole file is read in one go. We then split the data wherever we find a newline that is preceded by a } on a line by itself.
Whichever of the two solutions above we use, we end up with an array which (for our sample data) has three elements - one for each of the records in our data file. It's then simple to use Perl's grep to filter that array in various ways:
# All students whose names start with 'Ak'
@filtered_array = grep { /student=Ak/ } @array;
If you use similar techniques on your original data file, then you can get the records that you are interested in with code like this:
@filtered_array = grep { /TESTBED = vApp_eprapot_icr/ } @array;
I am trying to modify a text file. I am using PHP, though I could also use C#. The file I am working on is a text file that consists of strings, for example:
TM_len= --------------------------------------------
EMM_len --------------------------------------------
T_len=45 CTGCCTGAGCTCGTCCCCTGGATGTCCGGGTCTCCCCAGGCGG
NM_=2493 ----------------ATATAAAAAGATCTGTCTGGGGCCGAA
I want to delete those four lines from the file if I find that one of the lines contains only "-" and no other characters, and of course save the result back to the file.
Maybe something like this? I wrote it in an easy-to-understand and "not-shortened" way:
$newfiledata = "";
$signature = " ";
$handle = fopen("inputfile.txt", "r"); // open file
if ($handle) {
while (($line = fgets($handle)) !== false) { // read line by line
$pos = strpos($line, $signature); // locate spaces in line text
if ($pos) {
$lastpart = trim(substr($line, $pos)); // get second part of text
$newstring = trim(str_replace('-', '', $line)); // remove all dashes
if (len($newstring) > 0) $newfiledata .= $line."\r\n"; // if still there is characters, append it to our variable
}
}
fclose($handle);
}
// write new file
file_put_contents("newfile.txt", $newfiledata);
Thanks for your response, but nothing happened to the file. Please check the link to the file and the link to the desired output for it: download the file and the required output file.
First I use URLRequest to read a txt file containing multiple lines of strings:
var urlRequest:URLRequest = new URLRequest("listOfJsonFile.txt");
and I have created an array:
private var listOfJson:Array = new Array();
Then I split the loaded data into that array:
var loader:URLLoader = URLLoader(event.target);
listOfJson = loader.data.split(/\n/);
trace(listOfJson[0]); // return XXX.Json
Question:
How can I do this:
var urlRequest:URLRequest = new URLRequest(listOfJson[0]);
It gives: Error #2044: Unhandled ioError:. text=Error #2032: Stream Error.
I have tried creating a temporary String var and casting the element with String().
I did var urlRequest:URLRequest = new URLRequest("XXX.json"); directly, and it works.
Maybe your text file contains \r (carriage return) characters as well as \n (newline) characters? That depends on the operating system the file was created on. When you split on \n characters only, those CR characters remain part of the file name you try to load, and the file XXX.json\r most likely doesn't exist.
So use split("\r\n") and see if that helps.
If that doesn't help, it may even be the other way round (I keep forgetting the actual order: split("\n\r")).
My purpose is to parse text files and store the information in the respective tables.
I have to parse around 100 folders containing more than 8,000 files, around 20 GB in total.
When I tried to store the whole file contents in a string, an out-of-memory exception was thrown.
That is
using (StreamReader objStream = new StreamReader(filename))
{
    string fileDetails = objStream.ReadToEnd();
}
Hence I tried logic like this:
using (StreamReader objStream = new StreamReader(filename))
{
    // Getting total number of lines in a file
    int fileLineCount = File.ReadLines(filename).Count();
    if (fileLineCount < 90000)
    {
        fileDetails = objStream.ReadToEnd();
        fileDetails = fileDetails.Replace(Environment.NewLine, "\n");
        string[] fileInfo = fileDetails.ToString().Split('\n');
        // call respective method for parsing and insertion
    }
    else
    {
        while ((firstLine = objStream.ReadLine()) != null)
        {
            lineCount++;
            fileDetails = (fileDetails != string.Empty)
                ? string.Concat(fileDetails, "\n", firstLine)
                : string.Concat(firstLine);
            if (lineCount == 90000)
            {
                fileDetails = fileDetails.Replace(Environment.NewLine, "\n");
                string[] fileInfo = fileDetails.ToString().Split('\n');
                lineCount = 0;
                // call respective method for parsing and insertion
            }
        }
        // when content is 90057, to parse the remaining 57
        if (lineCount < 90000)
        {
            string[] fileInfo = fileDetails.ToString().Split('\n');
            lineCount = 0;
            // call respective method for parsing and insertion
        }
    }
}
Here 90,000 is the batch size that is safe to process without causing an out-of-memory exception in my case.
Still, the process takes more than 2 days to complete. I observed that this is because of reading line by line.
Is there any better approach to handle this ?
Thanks in Advance :)
You can use a profiler to detect what sucks your performance. In this case it's obvious: disk access and string concatenation.
Do not read a file more than once. Let's take a look at your code. First of all, the line int fileLineCount = File.ReadLines(filename).Count(); means you read the whole file just to count its lines and then discard what you've read. That's bad. Throw away your if (fileLineCount < 90000) branch and keep only the else.
It almost doesn't matter if you read line-by-line in consecutive order or the whole file because reading is buffered in any case.
Avoid string concatenation, especially for long strings.
fileDetails = fileDetails.Replace(Environment.NewLine, "\n");
string[] fileInfo = fileDetails.ToString().Split('\n');
It's really bad. You are already reading the file line by line, so why do this replacement/split at all? File.ReadLines() gives you a lazily evaluated sequence of lines; just pass it to your parsing routine.
If you do this properly, I expect a significant speedup. It can be optimized further by reading files in a separate thread while processing them in the main one, but that is another story.