Text file read lost formatting vs slowdown - file

I've been trying to work this out for a while.
Reading a text file.
Using this code the formatting gets replaced with the getline...but the time to load the file is simply too long...
std::string line = "";
std::string file = "";
std::ifstream filepath(path);
if (filepath.is_open())
{
while (std::getline (filepath,line) )
{
file = file + line + "\r\n";
}
filepath.close();
}
Using this code the time to load the file is around 10X faster, but the formatting is lost:
std::ifstream in(path);
std::stringstream stream;
stream << in.rdbuf();
std::string file(stream.str());
Is it possible to get the speed of the second method with the formatting of the first...? Or better yet faster speeds and no change in formatting?
I had considered trying not to keep loading the same string in the first example at each getline in the loop but early attempts didn't seem to help.

Okay after checking again, not sure what this is called but this string optimization took care of the delay and the formatting... whew!
If anyone knows how to make it even faster I'd love to learn how...
std::string line = "";
std::string file = "";
std::string tmp_str = "";
std::ifstream filepath(path);
unsigned int c = 0;
if (filepath.is_open())
{
while (std::getline (filepath,line) )
{
tmp_str = tmp_str + line + "\r\n";
c++;
if (c > 100)
{
file = file + tmp_str;
tmp_str = "";
c = 0;
}
}
if (c != 0)
{
file = file + tmp_str;
}
filepath.close();

Related

Split ogg vorbis stream without BOS

Input: a stream of ogg/vorbis coming from an encoder chip of an embedded system.
Problem: create output chunks of one second without transcoding.
Issue: the stream is being read "in the middle", so the first page with BOS (Beginning of Stream) is not available. Since the encoder chip has always the same parameters, I'd like to recreate the BOS page using the BOS page of a stream that was read from the start (reference stream).
I am trying to use vcut. I modified it so that it creates infinite chunks of one second. It was easy, and it works with files and streams with BOS.
I also hacked it so that I wrote to a file the first pages of the reference stream and then read them before reading the production stream with no BOS. In this way, vs->headers are populated. When I detect a page serial number change, I change it so that vcut and libogg do not freak:
int process_page(vcut_state *s, ogg_page *page) {
...
else if(vs->serial != ogg_page_serialno(page))
{
// fprintf(stderr, _("Multiplexed bitstreams are not supported.\n"));
vs->stream_in.serialno = ogg_page_serialno(page);
vs->serial = ogg_page_serialno(page);
vs->granulepos = -1;
vs->initial_granpos = 0;
// ogg_stream_init(&vs->stream_in, vs->serial);
// vorbis_info_init(&vs->vi);
// vorbis_comment_init(&vs->vc);
s->vorbis_init = 1;
}
However, this gigantic hack does not work. How to solve this issue?
It actually works: see VS1053 split ogg.
What I needed to do was to consider that starting reading in the middle of the stream, granulepos was naturally high. So it was mine logical mistake.
In process_audio_packet, I added:
int process_audio_packet(vcut_state *s,
vcut_vorbis_stream *vs, ogg_packet *packet)
{
...
if(packet->granulepos >= 0)
{
if (!firstNonZeroGranule) { // my addition
firstNonZeroGranule = 1;
vs->initial_granpos = packet->granulepos - bs;
if(vs->initial_granpos < 0)
vs->initial_granpos = 0;
} else if(vs->granulepos == 0 && packet->granulepos != bs) {
...

strrchr() function cause memory corruption

I'm just try to writing a little function to get extension of a file (char * file) using strrchr() (string.h).
But, I've a problem, this function cause a memory corruption error (and I don't know why precisly).
I already check the parameter file, it's OK.
I expect a result like ".jpg" when I put "01.jpg" in input.
When it's "" in input, i'm waiting "" in result.
And same when input is "NA"
char * getExtensionOfFile(char * file){
//create variable ext
char * ext = (char*)malloc(sizeof(char)*4);
strcpy(ext,"");
if(strlen(file)==0 || strcmp(file,"NA")==0) return ""; //If file is empty or useless (case file=="NA")
sprintf(ext,"%s",strrchr(file,'.'));
return ext;
I think the guilty is strrchr().
If it's true, why ? if not ? Which ?
I tried to rewrite this function from scratch with char[] instead but it's less beautiful and I really want to understand.
Thanks!
So, after discutions with others persons, someone found a pretty (good?) solution.
Thanks for your help!
The solution (not by myself)
const char * getExtensionOfFile(const char * file){
if (strlen(file) == 0 || strcmp(file, "NA") == 0) return "";
return strrchr(file,'.');
}
suggest keeping it simple.
#include <string.h>
char * returnExtension( char * filename )
{
char * ext;
if( (ext = strchr( filename, '.'))
{
return ext;
else
{
return "";
}
}

LZW encoding for large file

I am building an LZW encoding algorithm, which uses dictionary and hashing so it can reach fast enough for working words already stored in a dictionary.
The algorithm gives proper results when ran on smaller files (cca few hundreds of symbols), but on the larger files (and especially in those files which contain of less different symbols - for example, it gives the worst performance when ran on a file which consists only of 1 symbol, 'y' let's say). The worst performance, in terms that it just crashes when dictionary is not even close to being full. However, when the large input file consists of more than 1 symbol, dictionary gets close to being full, approximately 90%, but again then it crashes.
Considering the structure of my algorithm, I am not quite sure what is causing it to crash in general, or crash so soon when large file of just 1 symbol is given.
It must be something about hashing (first time doing it, so it might have some bugs).
The hash function I am using can be found here, and from what I have tested it, it gives good results: oat_hash
LZW encoding algorithm is based on this link, with slight change, that it works until the dictionary is not full: LZW encoder
Let's get into code:
Note: oat_hash is changed so it returns value % CAPACITY, so every index is from DICTIONARY
// Globals
#define CAPACITY 100000
char *DICTIONARY[CAPACITY];
unsigned short CODES[CAPACITY]; // CODES and DICTIONARY are linked via index: word from dictionary on index i, has its code in CODES on index i
int position = 0;
int code_counter = 0;
void encode(FILE *input, FILE *output){
int succ1 = fseek(input, 0, SEEK_SET);
if(succ1 != 0) printf("Error: file not open!");
int succ2 = fseek(output, 0, SEEK_SET);
if(succ2 != 0) printf("Error: file not open!");
//1. Working word = next symbol from the input
char *working_word = malloc(2048*sizeof(char));
char new_symbol = getc(input);
working_word[0] = new_symbol;
working_word[1] = '\0';
//2. WHILE(there are more symbols on the input) DO
//3. NewSymbol = next symbol from the input
while((new_symbol = getc(input)) != EOF){
char *workingWord_and_newSymbol= NULL;
char newSymbol[2];
newSymbol[0] = new_symbol;
newSymbol[1] = '\0';
workingWord_and_newSymbol = working_word_and_new_symbol(working_word, newSymbol);
int index = oat_hash(workingWord_and_newSymbol, strlen(workingWord_and_newSymbol));
//4. IF(WorkingWord + NewSymbol) is already in the dictionary THEN
if(DICTIONARY[index] != NULL){
// 5. WorkingWord += NewSymbol
working_word = working_word_and_new_symbol(working_word, newSymbol);
}
//6. ELSE
else{
//7. OUTPUT: code for WorkingWord
int idx = oat_hash(working_word, strlen(working_word));
fprintf(output, "%u", CODES[idx]);
//8. Add (WorkingWord + NewSymbol) into a dictionary and assign it a new code
if(!dictionary_full()){
DICTIONARY[index] = workingWord_and_newSymbol;
CODES[index] = code_counter + 1;
code_counter += 1;
working_word = strdup(newSymbol);
}else break;
}
//10. END IF
}
//11. END WHILE
//12. OUTPUT: code for WorkingWord
int index = oat_hash(working_word, strlen(working_word));
fprintf(output, "%u", CODES[index]);
free(working_word);
}
int index = oat_hash(workingWord_and_newSymbol, strlen(workingWord_and_newSymbol));
And later
int idx = oat_hash(working_word, strlen(working_word));
fprintf(output, "%u", CODES[idx]);
//8. Add (WorkingWord + NewSymbol) into a dictionary and assign it a new code
if(!dictionary_full()){
DICTIONARY[index] = workingWord_and_newSymbol;
CODES[index] = code_counter + 1;
code_counter += 1;
working_word = strdup(newSymbol);
}else break;
idx and index are unbounded and you use them to access a bounded array. You're accessing memory out of range. Here's a suggestion, but it may skew the distribution. If your hash range is much larger than CAPACITY it shouldn't be a problem. But you also have another problem which was mentioned, collisions, you need to handle them. But that's a different problem.
int index = oat_hash(workingWord_and_newSymbol, strlen(workingWord_and_newSymbol)) % CAPACITY;
// and
int idx = oat_hash(working_word, strlen(working_word)) % CAPACITY;
LZW compression is certainly used to construct binary files and normally is capable of reading binary files.
The following code is problematic as it relies on new_symbol never being a \0.
newSymbol[0] = new_symbol; newSymbol[1] = '\0';
strlen(workingWord_and_newSymbol)
strdup(newSymbol)
Needs re-write to work with arrays of bytes rather than strings.
fopen() was not shown. Insure one is opening in binary. input = fopen(..., "rb");
#Wumpus Q. Wumbley is correct, use int newSymbol.
Minor:
new_symbol and newSymbol are confusing.
Consider:
// char *working_word = malloc(2048*sizeof(char));
#define WORKING_WORD_N (2048)
char *working_word = malloc(WORKING_WORD_N*sizeof(*working_word));
// or
char *working_word = malloc(WORKING_WORD_N);

bilingual program in console application in C

I have been trying to implement a way to make my program bilingual : the user could chose if the program should display French or English (in my case).
I have made lots of researches and googling but I still cannot find a good example on how to do that :/
I read about gettext, but since this is for a school's project we are not allowed to use external libraries (and I must admit I have nooo idea how to make it work even though I tried !)
Someone also suggested to me the use of arrays one for each language, I could definitely make this work but I find the solution super ugly.
Another way I thought of is to have to different files, with sentences on each line and I would be able to retrieve the right line for the right language when I need to. I think I could make this work but it also doesn't seem like the most elegant solution.
At last, a friend said I could use DLL for that. I have looked up into that and it indeed seems to be one of the best ways I could find... the problem is that most resources I could find on that matter were coded for C# and C++ and I still have no idea how I would do to implement in C :/
I can grasp the idea behind it, but have no idea how to handle it in C (at all ! I do not know how to create the DLL, call it, retrieve the right stuff from it or anything >_<)
Could someone point me to some useful resources that I could use, or write a piece of code to explain the way things work or should be done ?
It would be seriously awesome !
Thanks a lot in advance !
(Btw, I use visual studio 2012 and code in C) ^^
If you can't use a third party lib then write your own one! No need for a dll.
The basic idea is the have a file for each locale witch contains a mapping (key=value) for text resources.
The name of the file could be something like
resources_<locale>.txt
where <locale> could be something like en, fr, de etc.
When your program stars it reads first the resource file for specified locale.
Preferably you will have to store each key/value pair in a simple struct.
Your read function reads all key/value pair into a hash table witch offers a very good access speed. An alternative would be to sort the array containing the key/value pairs by key and then use binary search on lookup (not the best option, but far better than iterating over all entries each time).
Then you'll have to write a function get_text witch takes as argument the key of the text resource to be looked up an return the corresponding text in as read for the specified locale. You have to handle keys witch have no mapping, the simplest way would be to return key back.
Here is some sample code (using qsort and bsearch):
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#define DEFAULT_LOCALE "en"
#define NULL_ARG "[NULL]"
typedef struct localized_text {
char* key;
char* value;
} localized_text_t;
localized_text_t* localized_text_resources = NULL;
int counter = 0;
char* get_text(char*);
void read_localized_text_resources(char*);
char* read_line(FILE*);
void free_localized_text_resources();
int compare_keys(const void*, const void*);
void print_localized_text_resources();
int main(int argc, char** argv)
{
argv++;
argc--;
char* locale = DEFAULT_LOCALE;
if(! *argv) {
printf("No locale provided, default to %s\n", locale);
} else {
locale = *argv;
printf("Locale provided is %s\n", locale);
}
read_localized_text_resources(locale);
printf("\n%s, %s!\n", get_text("HELLO"), get_text("WORLD"));
printf("\n%s\n", get_text("foo"));
free_localized_text_resources();
return 0;
}
char* get_text(char* key)
{
char* text = NULL_ARG;
if(key) {
text = key;
localized_text_t tmp;
tmp.key = key;
localized_text_t* result = bsearch(&tmp, localized_text_resources, counter, sizeof(localized_text_t), compare_keys);
if(result) {
text = result->value;
}
}
return text;
}
void read_localized_text_resources(char* locale)
{
if(locale) {
char localized_text_resources_file_name[64];
sprintf(localized_text_resources_file_name, "resources_%s.txt", locale);
printf("Read localized text resources from file %s\n", localized_text_resources_file_name);
FILE* localized_text_resources_file = fopen(localized_text_resources_file_name, "r");
if(! localized_text_resources_file) {
perror(localized_text_resources_file_name);
exit(1);
}
int size = 10;
localized_text_resources = malloc(size * sizeof(localized_text_t));
if(! localized_text_resources) {
perror("Unable to allocate memory for text resources");
}
char* line;
while((line = read_line(localized_text_resources_file))) {
if(strlen(line) > 0) {
if(counter == size) {
size += 10;
localized_text_resources = realloc(localized_text_resources, size * sizeof(localized_text_t));
}
localized_text_resources[counter].key = line;
while(*line != '=') {
line++;
}
*line = '\0';
line++;
localized_text_resources[counter].value = line;
counter++;
}
}
qsort(localized_text_resources, counter, sizeof(localized_text_t), compare_keys);
// print_localized_text_resources();
printf("%d text resource(s) found in file %s\n", counter, localized_text_resources_file_name);
}
}
char* read_line(FILE* p_file)
{
int len = 10, i = 0, c = 0;
char* line = NULL;
if(p_file) {
line = malloc(len * sizeof(char));
c = fgetc(p_file);
while(c != EOF) {
if(i == len) {
len += 10;
line = realloc(line, len * sizeof(char));
}
line[i++] = c;
c = fgetc(p_file);
if(c == '\n' || c == '\r') {
break;
}
}
line[i] = '\0';
while(c == '\n' || c == '\r') {
c = fgetc(p_file);
}
if(c != EOF) {
ungetc(c, p_file);
}
if(strlen(line) == 0 && c == EOF) {
free(line);
line = NULL;
}
}
return line;
}
void free_localized_text_resources()
{
if(localized_text_resources) {
while(counter--) {
free(localized_text_resources[counter].key);
}
free(localized_text_resources);
}
}
int compare_keys(const void* e1, const void* e2)
{
return strcmp(((localized_text_t*) e1)->key, ((localized_text_t*) e2)->key);
}
void print_localized_text_resources()
{
int i = 0;
for(; i < counter; i++) {
printf("Key=%s value=%s\n", localized_text_resources[i].key, localized_text_resources[i].value);
}
}
Used with the following resource files
resources_en.txt
WORLD=World
HELLO=Hello
resources_de.txt
HELLO=Hallo
WORLD=Welt
resources_fr.txt
HELLO=Hello
WORLD=Monde
run
(1) out.exe /* default */
(2) out.exe en
(3) out.exe de
(4) out.exe fr
output
(1) Hello, World!
(2) Hello, World!
(3) Hallo, Welt!
(4) Hello, Monde!
gettext is the obvious answer but it seems it's not possible in your case. Hmmm. If you really, really need a custom solution... throwing out a wild idea here...
1: Create a custom multilingual string type. The upside is that you can easily add new languages afterwards, if you want. The downside you'll see in #4.
//Terrible name, change it
typedef struct
{
char *french;
char *english;
} MyString;
2: Define your strings as needed.
MyString s;
s.french = "Bonjour!";
s.english = "Hello!";
3: Utility enum and function
enum
{
ENGLISH,
FRENCH
};
char* getLanguageString(MyString *myStr, int language)
{
switch(language)
{
case ENGLISH:
return myStr->english;
break;
case FRENCH:
return myStr->french;
break;
default:
//How you handle other values is up to you. You could decide on a default, for instance
//TODO
}
}
4: Create wrapper functions instead of using plain old C standard functions. For instance, instead of printf :
//Function should use the variable arguments and allow a custom format, too
int myPrintf(const char *format, MyString *myStr, int language, ...)
{
return printf(format, getLanguageString(myStr, language));
}
That part is the painful one : you'll need to override every function you use strings with to handle custom strings. You could also specify a global, default language variable to use when one isn't specified.
Again : gettext is much, much better. Implement this only if you really need to.
the main idea of making programs translatable is using in all places you use texts any kind of id. Then before displaying the test you get the text using the id form the appropriate language-table.
Example:
instead of writing
printf("%s","Hello world");
You write
printf("%s",myGetText(HELLO_WORLD));
Often instead of id the native-language string itself is used. e.g.:
printf("%s",myGetText("Hello world"));
Finally, the myGetText function is usually implemented as a Macro, e.g.:
printf("%s", tr("Hello world"));
This macro could be used by an external parser (like in gettext) for identifying texts to be translated in source code and store them as list in a file.
The myGetText could be implemented as follows:
std::map<std::string, std::map<std::string, std::string> > LangTextTab;
std::string GlobalVarLang="en"; //change to de for obtaining texts in German
void readLanguagesFromFile()
{
LangTextTab["de"]["Hello"]="Hallo";
LangTextTab["de"]["Bye"]="Auf Wiedersehen";
LangTextTab["en"]["Hello"]="Hello";
LangTextTab["en"]["Bye"]="Bye";
}
const char * myGetText( const char* origText )
{
return LangTextTab[GlobalVarLang][origText ].c_str();
}
Please consider the code as pseudo-code. I haven't compiled it. Many issues are still to mention: unicode, thread-safety, etc...
I hope however the example will give you the idea how to start.

Reading unsingned chars with Qt

Hi fellow stack overflowers,
I'm currently parsing a file which both contains text and binary data. Currently, I'm reading the file in following manner:
QTextStream in(&file);
int index = 0;
while(!in.atEnd()) {
if (index==0) {
QString line = in.readLine(); // parse file here
} else {
QByteArray raw_data(in.readAll().toAscii());
data = new QByteArray(raw_data);
}
index++;
}
where data refers to the binary data I'm looking for. I'm not sure if this is what I want, since the QString is encoded into ascii and I have no idea if some bytes are lost.
I checked the documentation, and it recommends using a QDataStream. How can I combine both approaches, i.e. read lines with an encoding and also read the binary dump, after one line break?
Help is greatly appreciated!
This will do what you want.
QTextStream t(&in);
QString line;
QByteArray raw_data;
if(!in.atEnd()) {line = t.readLine();}
in.reset();
int lineSize = line.toLocal8Bit().size() + 1;
in.seek(lineSize);
if(!in.atEnd())
{
int len = in.size() - lineSize;
QDataStream d(&in);
char *raw = new char[len]();
d.readRawData(raw, len);
raw_data = QByteArray(raw, len);
delete raw;
}
PS: if file format is yours, it will be better to create file with QDataStream and write data with <<, read with >>. This way you can store QByteArray and QString in file without such problems.

Resources