I have a file containing a list of numbers separated by commas. I tried different methods of reading data, and this piece of code has worked without issues on different datasets.
Input for example (600 values): https://pastebin.com/AHJ5UpEu
#include <stdio.h>
#include <stdint.h>
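#include <stdlib.h> // malloc, atoi, exit (standard replacement for <malloc.h>)
#include <string.h> // strtok (standard replacement for <mem.h>)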
#define READ "r"
#define MAX_LINE_SIZE 4096
#define DATA_DELIMITER ","
unsigned char *readInput(const char *filename, size_t inputs) {
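unsigned char *input = malloc(sizeof(unsigned char) * inputs);
unsigned char nbr = 0; // number of values read so far
const char *token;
FILE *inputPtr = fopen(filename, READ);
char line[MAX_LINE_SIZE];
if (input == NULL || inputPtr == NULL) {
perror("readInput");
exit(1);
}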
while (fgets(line, MAX_LINE_SIZE, inputPtr)) {
nbr = 0;
for (token = strtok(line, DATA_DELIMITER); token && *token; token = strtok(NULL, ",\n")) {
input[nbr] = (unsigned char) atoi(token);
nbr++;
}
break;
}
fclose(inputPtr);
if(nbr != inputs){
printf("Error, did not read all values. Only read %d\n",nbr);
exit(-1);
}
return input;
}
int main() {
unsigned char *d = readInput("../traces/inputs.dat", 600);
free(d);
exit(0);
}
However, it only reads the first 88 values. If I change MAX_LINE_SIZE to, for example, 512, this number is 145.
As I understand it, the buffer only needs to be at least as long as the line, which in my case is ~2100 characters, so using 4096 shouldn't be an issue.
Please do correct me if I'm wrong.
How come I'm not reading all 600 values, but only parts of the data?
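nbr is being used like an integer counter but is defined as an unsigned char. An unsigned char is one byte, with a range of 0 to 255. Incrementing it past 255 wraps it back around to 0 (and input[nbr] then overwrites the earliest values), so nbr actually holds the total number of entries processed modulo 256: 600 mod 256 = 88, which is exactly the count you see. Declare it as size_t (or at least int) instead. The 145 you get with MAX_LINE_SIZE set to 512 has a different cause: fgets then stores at most 511 characters, so the ~2100-character line is cut off, and because of the break the rest of it is never read.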
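A minimal sketch of a fix, assuming the values fit in an unsigned char as in the question. It counts with a size_t and, as a swapped-in technique, reads with fscanf instead of a fixed-size fgets buffer, so the line length no longer matters. The name read_values and its FILE * parameter are hypothetical, not part of the original code.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: reads up to max comma-separated integers from fp
   into out and returns how many were stored. The counter is a size_t, so it
   cannot wrap around at 255 the way an unsigned char counter does, and
   fscanf reads value by value, so no line buffer can truncate the input. */
size_t read_values(FILE *fp, unsigned char *out, size_t max)
{
    size_t count = 0;
    int value;

    /* " %d" skips leading whitespace (including newlines) before a number. */
    while (count < max && fscanf(fp, " %d", &value) == 1) {
        out[count++] = (unsigned char)value;
        /* Consume the separating comma, if any, without storing it. */
        (void)fscanf(fp, " %*[,]");
    }
    return count;
}
```

In the question's code this would replace the fgets/strtok loop: open the file, call read_values(inputPtr, input, inputs), and compare the result with inputs.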
I'm trying to practice working with text files, printing to and reading from them. I need to take input from the user (maybe their phone number or someone else's), but I want them to be able to use spaces between the digits.
Let's say my phone number is: 565 856 12
I want them to be able to give me this number with spaces, instead of a squished version like 56585612.
So far I've tried scanf(), and I don't know how to make scanf() do this. I've tried reading chars in for loops, but it's a tangle.
When I type 565 856 12 and press Enter, only 565 is stored as the phone number, and 856 12 goes to the next scanf.
#include <stdio.h>
struct Student{
unsigned long long student_phone_number;
};
int main(){
FILE *filePtr;
filePtr = fopen("std_info.txt","w");
if (filePtr == NULL) return 1;
struct Student Student1;
printf("\nEnter Student's Phone Number: ");
// %llu stops at the first space, so "565 856 12" only stores 565
scanf("%llu",&Student1.student_phone_number);
fprintf(filePtr,"%llu\t",Student1.student_phone_number);
fclose(filePtr);
}
To solve this problem, I modified the Student structure to store both an unsigned long long integer and a character array. The user's input is read from stdin into the character array, validated with the isValid() function, and then converted to an unsigned long long integer by the convertToNumber() function.
#include <stdio.h>
#include <ctype.h>
#include <stdbool.h>
struct Student{
unsigned long long numberForm;
char textForm[16]; // buffer for "NNN NNN NN" plus the terminating '\0'
};
// Converts character array to unsigned long long integer.
void convertToNumber(struct Student * const student);
// Validates the data in the Student.textForm variable.
bool isValid(const char * const input, const char * const format);
// Returns the number of characters in a character array.
size_t getSize(const char * const input);
// This function returns the "base ^ exponent" result.
unsigned long long power(int base, unsigned int exponent);
int main()
{
struct Student student;
char format[] = "NNN NNN NN"; /* "123 456 78" */
printf("Enter phone number: ");
// fgets() limits the read to the buffer size; gets() was removed in C11
if(fgets(student.textForm, getSize(format) + 1, stdin) == NULL)
return 1;
if(isValid(student.textForm, format))
{
convertToNumber(&student);
printf("Result: %llu", student.numberForm);
}
return 0;
}
void convertToNumber(struct Student * const student)
{
int size = getSize(student->textForm) - 2; // number of digits: the format contains two spaces
unsigned int temp[size];
student->numberForm = 0ull;
for(int i = 0, j = 0 ; i < getSize(student->textForm) ; ++i)
if(isdigit(student->textForm[i]))
temp[j++] = student->textForm[i] - '0';
for(size_t i = 0 ; i < size ; ++i)
student->numberForm += temp[i] * power(10, size - i - 1);
}
bool isValid(const char * const input, const char * const format)
{
if(getSize(input) == getSize(format))
{
size_t i;
for(i = 0 ; i < getSize(input) ; ++i)
{
if(format[i] == 'N')
{
if(!isdigit(input[i]))
break;
}
else if(format[i] == ' ')
{
if(input[i] != format[i])
break;
}
}
if(i == getSize(input))
return true;
}
return false;
}
unsigned long long power(int base, unsigned int exponent)
{
unsigned long long result = 1;
for(size_t i = 0 ; i < exponent ; ++i)
result *= base;
return result;
}
size_t getSize(const char * const input)
{
size_t size = 0;
// count characters up to, not including, the terminating '\0'
while(input[size] != '\0')
++size;
return size;
}
This program works as follows:
Enter phone number: 123 465 78
Result: 12346578
You can use fgets to read a whole line of input, spaces and all:
#include <stdio.h>
#define SIZE 100
int main() {
char str[SIZE];
fgets(str, sizeof str, stdin); // reads a whole line, spaces included,
// without writing past the end of str
}
If you then want to remove the spaces you can do that easily:
#include <ctype.h>
void remove_white_spaces(char *str)
{
int i = 0, j = 0;
while (str[i])
{
if (!isspace((unsigned char)str[i]))
str[j++] = str[i];
i++;
}
str[j] = '\0';
}
Presto: this function removes all whitespace, including the newline that fgets keeps, from the string you pass as an argument.
Input:
12 4 345 789
Output:
124345789
After that it's easy to convert this into an unsigned integer value with strtoul (or strtoull). But why would you store a phone number in a numeric type? Would you be performing arithmetic operations on it? Doubtfully. And since you then save it to a file, it really doesn't matter whether it's a string or a numeric type. I would just keep it as a string.
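If you do need a number, here is a sketch of the strtoull conversion with error checking, assuming the spaces have already been removed. The name digits_to_ull is hypothetical. It also illustrates one drawback of a numeric type: a leading zero is silently lost.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: converts a string of decimal digits (spaces already
   removed) to unsigned long long. Returns 0 on success, -1 if the string is
   empty, contains a non-digit, or does not fit in unsigned long long. */
int digits_to_ull(const char *s, unsigned long long *out)
{
    char *end;
    unsigned long long value;

    /* strtoull would also accept leading spaces, '+' or '-', so require a digit first. */
    if (*s < '0' || *s > '9')
        return -1;
    errno = 0;
    value = strtoull(s, &end, 10);
    if (errno == ERANGE || *end != '\0')
        return -1;
    *out = value;
    return 0;
}
```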
It is generally better to store a telephone number as a string, not as a number, since a numeric type loses leading zeros and characters such as '+'.
In order to read a whole line of input (not just a single word) as a string, I recommend that you use the function fgets.
Here is an example based on the code in your question:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
struct Student{
char phone_number[50];
};
int main( void )
{
//open output file (copied from code in OP's question)
FILE *filePtr;
filePtr = fopen("std_info.txt","w");
if ( filePtr == NULL )
{
printf( "could not open output file!\n" );
exit( EXIT_FAILURE );
}
//other variable declarations
struct Student Student1;
char *p;
//prompt user for input
printf( "Enter student's phone number: ");
//attempt to read one line of input
if ( fgets( Student1.phone_number, sizeof Student1.phone_number, stdin ) == NULL )
{
printf( "input error!\n" );
exit( EXIT_FAILURE );
}
//attempt to find newline character, in order to verify that
//the entire line was read in
p = strchr( Student1.phone_number, '\n' );
if ( p == NULL )
{
printf( "line too long for input!\n" );
exit( EXIT_FAILURE );
}
//remove newline character by overwriting it with terminating
//null character
*p = '\0';
//write phone number to file
fprintf( filePtr, "%s\n", Student1.phone_number );
//cleanup
fclose( filePtr );
}
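Use a string for that purpose: remove the spaces from the string, then convert the number to an integer with the atoi function, which requires including the stdlib.h header file. Note that atoi returns an int, so a phone number with more digits than an int can hold needs strtoull instead. For example: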
#include<stdio.h>
#include<stdlib.h>
int main(){
char str[] = "123456";
unsigned long long int num = strtoull(str, NULL, 10); // unlike atoi, not limited to int
printf("%llu\n", num);
}
I'm programming something that counts the number of UTF-8 characters in a file. I've already written the base code, but now I'm stuck on the part where the characters are supposed to be counted. So far, this is what I have:
What's inside the text file:
黄埔炒蛋
你好
こんにちは
여보세요
What I've coded so far:
#include <stdio.h>
typedef unsigned char BYTE;
int main(int argc, char const *argv[])
{
FILE *file = fopen("file.txt", "r");
if (!file)
{
printf("Could not open file.\n");
return 1;
}
int count = 0;
while(1)
{
BYTE b;
// fread returns 0 at end of file or on a read error; stop in both cases
if (fread(&b, 1, 1, file) != 1)
{
break;
}
count++;
}
printf("Number of characters: %i\n", count);
fclose(file);
return 0;
}
My question is: how would I code the part where the UTF-8 characters are counted? I looked for inspiration on GitHub and YouTube, but I haven't found anything that works well with my code yet.
Edit: Originally, this code prints that the text file has 48 characters. But considering UTF-8, it should only be 18 characters.
See: https://en.wikipedia.org/wiki/UTF-8#Encoding
Each UTF-8 sequence contains one lead byte and zero to three continuation bytes.
Continuation bytes always start with the bits 10, and a lead byte never starts with that sequence.
You can use that information to count only the first byte of each UTF-8 sequence.
if((b&0xC0) != 0x80) {
count++;
}
Keep in mind that this will miscount if the file contains invalid UTF-8 sequences.
Also, "UTF-8 characters" might mean different things. For example "👩🏿" will be counted as two characters by this method.
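Put together with the question's loop, a minimal sketch could look like this. It reads with getc instead of fread (equivalent here, one byte at a time), uses the same continuation-byte test, and, like the snippet above, assumes valid UTF-8. The function name count_utf8_chars is hypothetical.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: counts UTF-8 characters (code points) in a stream by
   counting every byte that is not a continuation byte (10xxxxxx).
   Assumes valid UTF-8; malformed input is miscounted, not reported. */
size_t count_utf8_chars(FILE *fp)
{
    size_t count = 0;
    int c;

    while ((c = getc(fp)) != EOF) {
        if ((c & 0xC0) != 0x80)   /* lead byte or plain ASCII byte */
            count++;
    }
    return count;
}
```

Opening the file in binary mode ("rb") avoids newline translation on Windows. Newlines count as characters too, so a sample file of 48 bytes with three newlines gives 15 + 3 = 18, matching the number expected in the question.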
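In C, as in C++, there is no ready-made function for counting UTF-8 characters. With a UTF-8 locale set via setlocale, you can convert the string to wide characters using mbstowcs and use the wcslen function (note that wchar_t is UTF-32 on Linux but UTF-16 on Windows, where characters outside the Basic Multilingual Plane count as two), but this is not the best way for performance, especially if you only need the count and nothing else.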
I think a good answer to your question is here: counting unicode characters in c++.
Example from the linked answer:
for (; *p != 0; ++p)
count += ((*p & 0xc0) != 0x80);
You could look into the specification, RFC 3629: https://www.rfc-editor.org/rfc/rfc3629.
Section 3 contains this table:
Char. number range | UTF-8 octet sequence
(hexadecimal) | (binary)
--------------------+---------------------------------------------
0000 0000-0000 007F | 0xxxxxxx
0000 0080-0000 07FF | 110xxxxx 10xxxxxx
0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
You could inspect the bytes and decode the Unicode code points yourself.
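A sketch of that approach, assuming the whole input is already in memory: it decodes each sequence according to the table and, unlike the mask-only count, rejects malformed input (bad lead bytes, missing continuation bytes, overlong forms, surrogates, and values above U+10FFFF, all of which RFC 3629 excludes). The name utf8_count_checked is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: decodes UTF-8 per the RFC 3629 table and returns the
   number of code points, or -1 if the bytes are not valid UTF-8. */
long utf8_count_checked(const unsigned char *s, size_t n)
{
    long count = 0;
    size_t i = 0;

    while (i < n) {
        unsigned char b = s[i];
        size_t len;
        unsigned long cp, min;

        /* The lead byte gives the sequence length and its payload bits. */
        if (b < 0x80)                 { len = 1; cp = b;        min = 0x0; }
        else if ((b & 0xE0) == 0xC0)  { len = 2; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0)  { len = 3; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0)  { len = 4; cp = b & 0x07; min = 0x10000; }
        else
            return -1;                /* continuation byte or 0xF8..0xFF as lead */

        if (len > n - i)
            return -1;                /* sequence truncated at end of input */

        for (size_t k = 1; k < len; ++k) {
            if ((s[i + k] & 0xC0) != 0x80)
                return -1;            /* expected a 10xxxxxx byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;                /* overlong, out of range, or surrogate */

        count++;
        i += len;
    }
    return count;
}
```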
A separate question is whether you would count a base character and its accent (a combining mark, cf. https://en.wikipedia.org/wiki/Combining_character) as one character or as several.
There are multiple options you may take:
You may rely on your system's implementation of wide and multibyte encodings:
you may read the file as a wide stream and just count the wide characters, relying on the system to do the UTF-8 multibyte to wide character conversion on its own (see main1 below)
you may read the file as bytes, convert the multibyte string into a wide string, and count the wide characters (see main2 below)
You may use an external library that operates on UTF-8 strings and count the Unicode characters (see main3 below, which uses libunistring)
Or you may roll your own utf8_strlen-ish solution that relies on UTF-8's byte patterns and checks the bytes yourself, as shown in other answers.
Here is an example program with rudimentary error checking via assert; under Linux it has to be linked with -lunistring:
#include <stdio.h>
#include <wchar.h>
#include <locale.h>
#include <assert.h>
#include <stdlib.h>
void main1()
{
// read the file as wide characters
const char *l = setlocale(LC_ALL, "en_US.UTF-8");
assert(l);
FILE *file = fopen("file.txt", "r");
assert(file);
int count = 0;
while(fgetwc(file) != WEOF) {
count++;
}
fclose(file);
printf("Number of characters: %i\n", count);
}
// helper: reads the whole file into a malloc'ed, NUL-terminated buffer and stores its length in *strlen
char *file_to_buf(const char *filename, size_t *strlen) {
FILE *file = fopen(filename, "r");
assert(file);
size_t n = 0;
char *ret = malloc(1);
assert(ret);
for (int c; (c = fgetc(file)) != EOF;) {
ret = realloc(ret, n + 2);
assert(ret);
ret[n++] = c;
}
ret[n] = '\0';
*strlen = n;
fclose(file);
return ret;
}
void main2() {
const char *l = setlocale(LC_ALL, "en_US.UTF-8");
assert(l);
size_t strlen = 0;
char *str = file_to_buf("file.txt", &strlen);
assert(str);
// convert the multibyte string to a wide string,
// assuming the multibyte encoding is UTF-8;
// with a NULL destination, mbsrtowcs only returns the number of wide characters.
// This may also be done in a streaming fashion while reading byte by byte from the file,
// by calling `mbtowc`, checking errno for EILSEQ, and managing some buffer
mbstate_t ps = {0};
const char *tmp = str;
size_t count = mbsrtowcs(NULL, &tmp, 0, &ps);
assert(count != (size_t)-1);
printf("Number of characters: %zu\n", count);
free(str);
}
#include <unistr.h> // u8_mbsnlen from libunistring
void main3() {
size_t strlen = 0;
char *str = file_to_buf("file.txt", &strlen);
assert(str);
// for simplicity I am assuming uint8_t is the same type as unsigned char
size_t count = u8_mbsnlen((const uint8_t *)str, strlen);
printf("Number of characters: %zu\n", count);
free(str);
}
int main() {
main1();
main2();
main3();
}
I've got an array of unsigned chars that I'd like to output to a file using the C file I/O. What's a good way to do that? Preferably for now, I'd like to output just one unsigned char.
My array of unsigned chars is also not null-terminated, since the data is binary.
I'd suggest using fwrite to write binary data to a file; obviously the solution works for arrays of SIZE == 1 as well:
#include <stdio.h>
int main() {
#define SIZE 10
unsigned char a[SIZE] = {1, 2, 3, 4, 5, 0, 1, 2, 3, 4 };
FILE *f1 = fopen("file.bin", "wb");
if (f1) {
size_t r1 = fwrite(a, sizeof a[0], SIZE, f1);
printf("wrote %zu elements out of %d requested\n", r1, SIZE);
fclose(f1);
}
}
If you have, as you claim, an array (not just a pointer to its first element), you don't need a null terminator: sizeof gives you the array's length, which for unsigned char is its number of elements:
size_t len = sizeof myarray;
// To write a single char:
fputc(myarray[0], fp);
// To write the entire array:
fwrite(myarray, 1, len, fp);
If you do want to output the entire array, note what the question says:
"I've got an array of unsigned chars that I'd like to output"
and its title: "How to output unsigned char to a text file in c?"
The tricky part is "to a text file". This usually means to save something human readable.
If that is the goal, then save printable characters as text and non-printable ones as escaped text, with special handling for '\n' and '\\'.
#include <ctype.h>
#include <stdio.h>
void print_1_uchar(FILE *stream, unsigned char uch) {
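// print printable characters (except the backslash) and '\n' unchanged
if ((isprint(uch) && uch != '\\') || uch == '\n') {
fputc(uch, stream);
} else {
// everything else, including '\\', becomes a 3-digit octal escape such as \001
fprintf(stream, "\\%03o", uch);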
}
}
void print_1_array(FILE *stream, const void *a, size_t sz) {
const unsigned char *s = a;
while (sz-- > 0) {
print_1_uchar(stream, *s++);
}
}
Usage
unsigned char foo[] = { 1,2,3,4,5 };
print_1_array(stdout, foo, sizeof foo);
I am looking for the part of a string containing TP2, DP3, OP1, or OP2 in a text file.
Each line holds a different set of characters, and eventually these three-character tokens are used, but never two of them on the same line.
I can get it to print once I find OP2, but it will not print the three before it. If I comment out the OP2 check, it finds OP1; if I do that to OP1 and OP2, it finds DP3, and so on.
I don't understand why it cannot print out all four once found.
I used two different methods, one where I strcpy into a temp and one where I just print it as is, and neither works. Later I want it to print what is to the right of the = sign on the lines with the four search tokens, but I will work on that after I get the print issue fixed. Any help or explanation would be much appreciated.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_LINE_LENGTH 150
int main(void) {
FILE *file1, *file2;
char parts[MAX_LINE_LENGTH+1];
int len = strlen(parts);
//char TP2[3] = "TP2";
char DP3[3] = "DP3";
char MOP1[3] = "OP1";
//char MOP2[3] = "OP2";
//char TP2Temp[MAX_LINE_LENGTH];
char DP3Temp[MAX_LINE_LENGTH];
char MOP1Temp[MAX_LINE_LENGTH];
//char MOP2Temp[MAX_LINE_LENGTH];
file1 = fopen("input.txt", "r");
file2 = fopen("output2.txt", "w");
if (file1 == NULL || file2 ==NULL) {
exit(1);
}
while(fgets(parts, sizeof(parts), file1)!=NULL){
if(parts[len -1 ] =='\n'){
parts[len -1 ] ='\0';
}
//if(strstr(parts, TP2)!=NULL){
// strcpy(TP2Temp, parts);
// fprintf(file2, "%s", TP2Temp);
//}
if(strstr(parts,DP3)!=NULL){
strcpy(DP3Temp, strstr(parts,DP3));
fprintf(file2, "%s", DP3Temp);
}
else if(strstr(parts, MOP1)!=NULL){
strcpy(MOP1Temp, strstr(parts,MOP1));
fprintf(file2, "%s", MOP1Temp);
}
/*else if(strstr(parts, MOP2)!=NULL){
strcpy(MOP2Temp, parts);
fprintf(file2, "%s", MOP2Temp);
}*/
}
fclose(file1);
fclose(file2);
return 0;
}
/*Here is the text file sample
TC_TP1[2]=1
TC_TP2[2]="9070036"
TC_TP3[2]=1
TC_TP4[2]=1
TC_TP5[2]=1
TC_TP6[2]=1
TC_TP7[2]=1
TC_DP1[2,1]=120
TC_DP2[2,1]=0
TC_DP3[2,1]=179.85
TC_DP4[2,1]=0
TC_DP5[2,1]=0
TC_MOP1[2,1]=3
TC_MOP2[2,1]=28
TC_MOP3[2,1]=0
TC_MOP4[2,1]=0
TC_TP1[3]=1
TC_TP2[3]="9005270"
TC_TP3[3]=1*/
char parts[MAX_LINE_LENGTH+1];
int len = strlen(parts);
parts is uninitialised in this code, and thus isn't guaranteed to contain a string. Even if it were, len would be initialised to the length of that garbage string, which is meaningless and thus useless; the length has to be computed after each fgets call, inside the loop.
char DP3[3] = "DP3";
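If your understanding of strings is valid, you should realise that each of these string literals has four characters: the three letters plus the terminating '\0', for which a three-element array has no room. The following program demonstrates this: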
#include <stdio.h>
int main(void) {
printf("sizeof \"DP3\": %zu\n", sizeof "DP3");
}
You are reading a book to learn C, right? Your book would explain this, among many other things, so we wouldn't need to: strstr requires its operands to be strings, and strings always contain a terminating '\0'. Where's your terminating '\0'? How is strstr expected to know the length of the string pointed to by DP3?
Because your tokens are at most three bytes long, you currently only need to read and store at most three bytes at a time to conduct your search (four including the terminating byte explained above; see the untested, incomplete example below). This requirement could change if you decide to introduce longer (or dynamically sized) tokens: your cursor will need to be as wide as your longest token.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef unsigned char item[4];
int item_cmp(void const *x, void const *y) {
return memcmp(x, y, sizeof (item));
}
int main(void) {
item cursor = "",
haystack[] = { "TP2", "DP3", "OP1", "OP2" };
/* size = number of bytes currently in the window (0 to 3) */
size_t size = fread(cursor, 1, sizeof cursor - 1, stdin),
nelem = sizeof haystack / sizeof *haystack;
int c = 0, e = !size;
qsort(haystack, nelem, sizeof *haystack, item_cmp);
while (size) {
if (bsearch(cursor, haystack, nelem, sizeof *haystack, item_cmp)) {
printf("match found for %s\n", (char *)cursor);
}
/* slide the window left by one byte; cursor[3] stays '\0' */
memmove(cursor, cursor + 1, sizeof cursor - 1);
if (!e) {
c = fgetc(stdin);
e = c == EOF;
}
/* append the next byte; a newline (or EOF) becomes '\0' so tokens never span lines */
cursor[size - 1] = e || c == '\n' ? '\0' : c;
size -= e; /* after EOF, the window drains one byte per iteration */
}
exit(0);
}
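For comparison, here is a sketch of the question's original line-based loop with the two problems above fixed: the tokens are string literals, so each keeps its terminating '\0', and the newline is stripped after every fgets call rather than with a length computed once before the loop. The function name extract_values is hypothetical, and it writes the text after '=', which the question lists as a later goal.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINE_LENGTH 150

/* Hypothetical helper: for each line of in that contains one of the
   tokens, writes the text after '=' (or the whole line if there is no '=')
   to out, one value per line. */
void extract_values(FILE *in, FILE *out)
{
    static const char *const tokens[] = { "TP2", "DP3", "OP1", "OP2" };
    const size_t ntokens = sizeof tokens / sizeof tokens[0];
    char parts[MAX_LINE_LENGTH + 1];

    while (fgets(parts, sizeof parts, in) != NULL) {
        parts[strcspn(parts, "\r\n")] = '\0';   /* drop "\n" or "\r\n" */
        for (size_t t = 0; t < ntokens; ++t) {
            if (strstr(parts, tokens[t]) != NULL) {
                const char *eq = strchr(parts, '=');
                fprintf(out, "%s\n", eq ? eq + 1 : parts);
                break;   /* the question says tokens never share a line */
            }
        }
    }
}
```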
Thank you again, BLUEPIXY. With your information I was able to make the changes I needed and extract the data where it found TP2, along with the value after the equal sign. I am sure there is a nicer way to code this, but my solution is below. I will add a change to accept any file name. The reason for the line
MOP1Equal[strlen(MOP1Equal) -1] ='\0';
was to lay the values out in columns for a CSV file to open in Excel, and in the line
fprintf(file2, "\t%s", MOP1Equal+1);
adding 1 skips the = sign.
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#define MAX_LINE_LENGTH 150
int main(void) {
FILE *file1, *file2;
char parts[MAX_LINE_LENGTH+1] = "startingvaluebeforechange";
char TP2[4] = "TP2";
char DP3[4] = "DP3";
char MOP1[4] = "OP1";
char MOP2[4] = "OP2";
char Equal[2] = "=";
char TP2Temp[MAX_LINE_LENGTH];
char TP2Equal[MAX_LINE_LENGTH];
char DP3Temp[MAX_LINE_LENGTH];
char DP3Equal[MAX_LINE_LENGTH];
char MOP1Temp[MAX_LINE_LENGTH];
char MOP1Equal[MAX_LINE_LENGTH];
char MOP2Temp[MAX_LINE_LENGTH];
char MOP2Equal[MAX_LINE_LENGTH];
file1 = fopen("input.txt", "r");
file2 = fopen("output.txt", "w");
if (file1 == NULL || file2 ==NULL) {
exit(1);
}
while(fgets(parts, sizeof(parts), file1)!=NULL){
int len = strlen(parts);
if(parts[len -1 ] =='\n'){
parts[len -1 ] ='\0';
}
if(strstr(parts, TP2)!=NULL){
strcpy(TP2Temp, strstr(parts,TP2));
strcpy(TP2Equal, strstr(TP2Temp,Equal));
TP2Equal[strlen(TP2Equal) -2] ='\0';
fprintf(file2, "%s", TP2Equal+2);
}
if(strstr(parts,DP3)!=NULL){
strcpy(DP3Temp, strstr(parts,DP3));
strcpy(DP3Equal, strstr(DP3Temp,Equal));
DP3Equal[strlen(DP3Equal) -1] ='\0';
fprintf(file2, "\t%s", DP3Equal+1);
}
if(strstr(parts, MOP1)!=NULL){
strcpy(MOP1Temp, strstr(parts,MOP1));
strcpy(MOP1Equal, strstr(MOP1Temp,Equal));
MOP1Equal[strlen(MOP1Equal) -1] ='\0';
fprintf(file2, "\t%s", MOP1Equal+1);
}
if(strstr(parts, MOP2)!=NULL){
strcpy(MOP2Temp, strstr(parts,MOP2));
strcpy(MOP2Equal, strstr(MOP2Temp,Equal));
fprintf(file2, "\t%s", MOP2Equal+1);
}
}
fclose(file1);
fclose(file2);
return 0;
}
I would like to read characters from a single line of a file (for a school exercise). The exercise states that the string is at most 1000 characters long.
Using the following code I was able to read the file content in a char[]:
FILE *fp;
const int buffsize = 1000;
char *filepath = argv[1];
char buff[buffsize];
fp = fopen(filepath, "r");
fscanf(fp, "%s", buff);
//buff has the file contents
Considering these includes:
#include <stdio.h>
#include <stdlib.h>
This seems to work fine; however, when I iterate over the array, it goes way beyond the actual string length (because I fixed buffsize at 1000, I assume). How could I approach this in a better way, so that I iterate over the correct number of indices?
To get the length (which is a better term since it has little to do with the size of the buffer it's in) of a string, use strlen():
const size_t len = strlen(buff);
This will count the number of characters before the first 0-character, which is how strings are terminated ("end-marked") in C.
Now you can iterate using an index if you want:
for(size_t i = 0; i < len; ++i)
printf("The character at index %zu is '%c'\n", i, buff[i]);
Note that many C programmers would iterate over the buffer directly instead, using a pointer:
for(const char *s = buff; *s != '\0'; ++s)
printf("Found character '%c'\n", *s);
A C-"string" of 1000 characters requires a buffer of 1000+1=1001 chars, as a C-"string"'s end is indicated by an additional 0.
So to enable your code to actually read 1000 characters, change
const int buffsize = 1000;
to be
const int buffsize = 1000 + 1;
and to make sure to also just scan in 1000 chars and not overflow buff, change
fscanf(fp, "%s", buff);
to be
fscanf(fp, "%1000s", buff);
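Note that %s, with or without a width, stops at the first whitespace character. If the line may contain spaces, a sketch using fgets instead, under the same 1000-character limit, could look like this; read_line is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line (spaces included) into buf, strips the
   trailing newline, and returns the string's length, or (size_t)-1 if
   nothing could be read. buf must hold at least size bytes, e.g. 1000 + 1. */
size_t read_line(FILE *fp, char *buf, size_t size)
{
    if (fgets(buf, (int)size, fp) == NULL)
        return (size_t)-1;
    buf[strcspn(buf, "\n")] = '\0';   /* remove the newline, if present */
    return strlen(buf);
}
```

The returned length is then the bound to iterate to, instead of the buffer size.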