Check for letter in text file in Python

I need to count the number of vowels in a text file with a Python program and return the number. For whatever reason, when I run the program it returns 0 vowels, even though the count variable is supposed to increase by one each time the loop finds a vowel.
def numVowels(file):
    count = 0
    opened_file = open(file)
    content = opened_file.readlines()
    for char in content:
        if char.lower() in 'aeiou':
            count += 1
    return(count)
I'm not sure if that is because I am working with a text file, but usually I am able to do this without an issue. Any help is greatly appreciated.
Thank you!

readlines() returns a list of lines from the file, so for char in content: means char is a whole line of text, not a single character, which isn't what you are looking for.
You can read() the whole file into memory, or iterate through the file line by line and then iterate through each line a character at a time:
def numVowels(file):
    count = 0
    with open(file) as opened_file:
        for content in opened_file:
            for char in content:
                if char.lower() in 'aeiou':
                    count += 1
    return count
You can sum a generator of 1's to produce the same value:
def numVowels(file):
    with open(file) as f:
        return sum(1 for content in f for char in content if char.lower() in 'aeiou')
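The read() variant mentioned above is just as short; a minimal sketch (it loads the entire file into memory at once, so it is best suited to smaller files):

def numVowels(file):
    with open(file) as f:
        # f.read() returns the whole file as a single string
        return sum(1 for char in f.read() if char.lower() in 'aeiou')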

Related

How can I replace two characters in a 40GB file in Unix?

I have two huge json files (20GB each) and I need to join them. The files have the following content:
file_1.json = [{"key": "value"}, {...}]
file_2.json = [{"key": "value"}, {...}]
The main problem, however, is that I need all dicts to be in the same list. I tried to do this in Python, but unfortunately I don't have the memory for this operation.
So I thought maybe I could tackle this with Unix commands, by replacing the closing ] in the first file with ", " (note that there is a space after the comma) and erasing the opening [ from the second file. Then I would join the two files with the cat Unix command.
Is there a way for me to edit only the last 10 characters in Unix?
I tried to use echo and tr but I might be doing something wrong with the syntax.
You can very easily append to a file in place, i.e. add characters at the end without rewriting the data that's already there. With the right tools (truncate if your system has it), you can truncate a file in place, i.e. remove characters at the end without rewriting the data that's staying. With the right tools (dd, if you're feeling adventurous), you can replace a part of a file by a string of the same length, without rewriting the unchanged parts. On the other hand, you can't remove characters from the beginning or middle of a file without rewriting the file (with a few exceptions that aren't relevant here).
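For illustration, the same three in-place operations expressed in Python (a sketch, assuming a hypothetical existing file data.bin):

import os

# Append in place: add bytes at the end without rewriting what is there.
with open('data.bin', 'ab') as f:
    f.write(b'extra')

# Truncate in place: drop the last 2 bytes without rewriting the rest.
with open('data.bin', 'r+b') as f:
    f.truncate(os.path.getsize('data.bin') - 2)

# Overwrite in place: replace bytes at an offset with a same-length string.
with open('data.bin', 'r+b') as f:
    f.seek(16)
    f.write(b'xyz')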
In any case, rewriting both files in place wouldn't help you much here: you need to rewrite at least the content of the second file in order to append it to the first.
If you don't need to keep the split files around, you can append the second file to the first file in place, after taking care of the middle punctuation. Remove the last ] character from the first file, as well as any following spaces and line breaks. Assuming that the first file ends in ] and a newline and you have GNU core utilities (e.g. non-embedded Linux):
truncate -s -2 file_1.json
Now you can add a comma and optionally a line break to the first file, and append the data from the second file without its first character.
echo , >>file_1.json
tail -c +2 file_2.json >>file_1.json
If you want to keep the original files unmodified, you can make a copy of the first file and truncate it. Or you can directly make a truncated copy of the first file (still assuming GNU coreutils):
head -c -2 file_1.json >concatenated.json
echo , >>concatenated.json
tail -c +2 file_2.json >>concatenated.json
If you're more comfortable with Python, you can do all of this in Python. Just don't read the whole file in one go, i.e. don't call read() or use readlines() in a way that reads all the lines at once. Instead, read and process a single line at a time (if the lines are short) or a single block of data. Untested code:
import io
import re

with open('concatenated.json', 'wb') as out:
    with open('file_1.json', 'rb') as inp:
        # Find the absolute offset of the closing ']' (possibly followed
        # by whitespace) near the end of the first file.
        buf = bytearray(1024)
        size = inp.seek(-len(buf), io.SEEK_END)
        n = inp.readinto(buf)
        m = re.search(rb']\s*\Z', buf[:n])
        stop_at = size + m.start()
        # Copy everything that precedes the closing ']'.
        inp.seek(0, io.SEEK_SET)
        remaining = stop_at
        while remaining > 0:
            n = inp.readinto(buf)
            if n == 0:
                break
            out.write(buf[:min(n, remaining)])
            remaining -= n
    out.write(b',')
    with open('file_2.json', 'rb') as inp:
        # Replace the opening '[' of the second file by a newline.
        buf = bytearray(1024)
        n = inp.readinto(buf)
        assert buf[0:1] == b'['
        buf[0:1] = b'\n'
        while n > 0:
            out.write(buf[:n])
            n = inp.readinto(buf)
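A quick sanity check after the merge, looking only at the first and last bytes instead of parsing 40GB of JSON (a sketch):

import io

with open('concatenated.json', 'rb') as f:
    head = f.read(16)
    f.seek(-16, io.SEEK_END)
    tail = f.read()

# Expect head to start with b'[' and tail to end with b']'
# (possibly followed by a newline).
print(head, tail)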

Function is not reading the integer from a file

I'm trying to write a function for a program I'm working on. The aim of the function is to take a number from the "text" file, hold that value in the variable store, set current equal to store plus 1, and write the value of current back to the original text file.
But when I execute the program with the number 1 in the text file, it appears that store does not take the 1 from the text file and is 0 when I print the variable, while current is equal to 1, so 1 is written to the file. Below is the code and an example of what I'm trying to do.
If you do have suggestions, concerns, or a solution. Please feel free to comment, Thank you.
Example:
If the text file contains 1,
then store = 1 (read from the file),
current = store + 1 = 2,
and 2 is written back to the text file.
#include <stdio.h>

main()
{
    int current = 0, store;
    FILE *count = fopen("text", "r");
    FILE *countout = fopen("text", "w");
    fscanf(count, "%d", store);
    current = store + 1;
    fprintf(countout, "%d", current);
    fclose(count);
    fclose(countout);
}
Fixed code. There were two problems: fscanf() needs the address of store (&store), and opening "text" for writing truncates it before the read happens. So read the number and close the file first, then reopen it for writing:
#include <stdio.h>

int main(void)
{
    int current = 0, store;
    FILE *count = fopen("text", "r");
    fscanf(count, "%d", &store);
    fclose(count);
    current = store + 1;
    printf("%d %d", store, current);
    FILE *countout = fopen("text", "w");
    fprintf(countout, "%d", current);
    fclose(countout);
    return 0;
}
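For comparison, the same read, increment, write-back cycle in Python (a sketch assuming the file text holds a single integer); note how the value is read and the file closed before it is reopened for writing, which is exactly what the fixed C code does:

def increment_counter(path="text"):
    # Read the stored value first...
    with open(path) as f:
        store = int(f.read())
    current = store + 1
    # ...then reopen for writing; opening for writing truncates the
    # file, which is why the original C code found nothing to read.
    with open(path, "w") as f:
        f.write(str(current))
    return current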

Best way to append text to file in dm-script

What is the best way to append line(s) to a file?
Currently I am using the following script:
/**
 * Append the `line` to the file given at the `path`.
 *
 * @param path
 *      The absolute or relative path to the file with
 *      extension
 * @param line
 *      The line to append
 * @param [max_lines=10000]
 *      The maximum number of lines to allow for a file
 *      to prevent an infinite loop
 */
void append(string path, string line, number max_lines){
    number f = OpenFileForReadingAndWriting(path);

    // go through the file until the end is reached to set the
    // internal pointer to this position
    number line_counter = 0;
    string file_content = "";
    string file_line;
    while(ReadFileLine(f, file_line) && line_counter < max_lines){
        line_counter++;
        // file_content += file_line;
    }
    // result("file content: \n" + file_content + "{EOF}");

    // append the line
    WriteFile(f, line + "\n");
    CloseFile(f);
}

void append(string path, string line){
    append(path, line, 10000);
}

string path = "path/to/file.txt";
append(path, "Appended line");
To me it seems a little odd to read the whole file content just to append one line. If the file is very big, this is probably very slow1. So I guess there is a better solution than this. Does anyone know it?
Some background
My application is written in python but executed in Digital Micrograph. My python application logs its steps. Sometimes I execute dm-script from python, and there I have no way to see what is going on. Since there is a bug, I need something to find out what is happening, so I want to add logging to the dm-script side too.
This also explains why I want to open and close the file every single time. This takes more time, but I don't care about execution speed while debugging. The logs will either be removed or switched off for the normal version, as usual. On the other hand, dm-script and python run alternately, so I have to prevent python from blocking the file for dm-script and vice versa.
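On the python side, this open-append-close pattern could look like the following (a minimal sketch; the log path and message format are placeholders):

import datetime

LOG_PATH = "path/to/file.txt"  # hypothetical shared log file

def log(message):
    # Open, append one line, and close immediately, so the file is
    # free for dm-script (and vice versa) between log calls.
    with open(LOG_PATH, "a") as f:
        f.write("{} python: {}\n".format(
            datetime.datetime.now().isoformat(), message))

log("executing dm-script snippet")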
1As written in the background, I am not really interested in speed, so the current script is enough for me. Still, I am interested in how to do this better, just for learning's and curiosity's sake.
The best way to deal with any files in DM-script (binary or text) is to use the streaming object. The following example should answer your question:
void writeText()
{
    string path
    if ( !SaveAsDialog( "Save text as" , path , path ) ) return

    number fileID = CreateFileForWriting( path )
    object fStream = NewStreamFromFileReference( fileID , 1 ) // 1 for auto-close file when out of scope

    // Write some text
    number encoding = 0 // 0 = system default
    fStream.StreamWriteAsText( encoding , "The quick brown dog jumps over the lazy fox" )

    // Replace last 'fox' by 'dog'
    fStream.StreamSetPos( 1 , -3 ) // 3 bytes before current position
    fStream.StreamWriteAsText( encoding, "dog" )

    // Replace first 'dog' by 'fox'
    fStream.StreamSetPos( 0 , 16 ) // 16 bytes after start
    fStream.StreamWriteAsText( encoding, "fox" )

    // Append at end
    fStream.StreamSetPos( 2 , 0 ) // end position (0 bytes from end)
    fStream.StreamWriteAsText( encoding, "." )
}
writeText()

Where am I going wrong in getting this function to do what I would like?

I have written the following function in my C program. The program loads a text file (Les Miserables Vol. I) as well as another text file of 20 of the characters names. The purpose of this function is to scan the entire file, line by line, and count the number of times any of the 20 names appear.
NumOfNames = 20.
Names is an array of the 20 names, stored from Names[1] to Names[20].
MaxName is a global integer variable in which I would like to store the total number of name appearances throughout the file (it should be in the hundreds or even thousands).
EDIT: After the function is executed, the value of MaxName is 4. I am completely lost as to where I have made a mistake, but it appears that I have made several throughout the function. One seems to be that only the first iteration of the for loop executes, i.e. it only searches for Names[1]; however, the first name appears 196 times in the file, so it isn't even working correctly for just the first name.
void MaxNameAppearances()
{
    char LineOfText[85];
    char *TempName;
    FILE *fpn = fopen(LesMisFilePath, "r+");
    for (i = 1; i <= NumOfNames; i++)
    {
        while (fgets(LineOfText, sizeof(LineOfText), fpn))
        {
            TempName = strstr(LineOfText, Names[i]);
            if (TempName != NULL)
            {
                MaxName++;
            }
        }
    }
    fclose(fpn);
}
I guess that one problem with the code is that the file would have to be re-read on every iteration of i. Try re-ordering the loops like this:
while (fgets(LineOfText, sizeof(LineOfText), fpn))
{
    for (i = 1; i <= NumOfNames; i++)
    {
        TempName = strstr(LineOfText, Names[i]);
        if (TempName != NULL)
        {
            MaxName++;
        }
    }
}
This reads a line, checks the occurrences of all names in that line, and then goes on to the next line.
If you do it your way, the file position is already at end of file after the i == 1 pass, so fgets() returns NULL for every later name.
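For reference, the same line-outer, name-inner traversal in Python (a sketch; like the strstr() version above, it counts at most one hit per name per line):

def count_name_appearances(text_path, names):
    total = 0
    with open(text_path) as f:
        for line in f:            # outer loop: each line is read once
            for name in names:    # inner loop: check every name
                if name in line:
                    total += 1
    return total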

How to get the length of a file in MATLAB?

Is there any way to figure out the length of a .dat file (in terms of rows) without loading the file into the workspace?
Row counter -- only loads one character per row (%1c reads a single character, %*[^\n] skips the rest of the line):
Nrows = numel(textread('mydata.txt','%1c%*[^\n]'))
or the file length in bytes (MATLAB):
datfileh = fopen(fullfile(path, filename));
fseek(datfileh, 0,'eof');
filelength = ftell(datfileh);
fclose(datfileh);
I'm assuming you are working with text files, since you mentioned finding the number of rows.
Here's one solution:
fid = fopen('your_file.dat','rt');
nLines = 0;
while (fgets(fid) ~= -1),
    nLines = nLines+1;
end
fclose(fid);
This uses FGETS to read each line, counting the number of lines it reads. Note that the data from the file is never saved to the workspace; it is simply used in the conditional check for the while loop.
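The same streaming line count in Python, for reference (a sketch; nothing but the running total is kept in memory):

def count_lines(path):
    with open(path) as f:
        return sum(1 for _ in f)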
It's also worth bearing in mind that you can use your operating system's built-in commands, so on Linux you could use the command
[s,w] = system('wc -l your_file.dat');
and then get the number of lines from the returned text (which is stored in w). (I don't think there's an equivalent command under Windows.)
