I can have strings containing random 10 digit numbers e.g.
"abcgfg1234567890gfggf" or
"fgfghgh3215556890ddf" etc
basically any combination of 10 digits plus chars together in a string, so I need to check the string to determine whether a 10-digit number is present. I use strspn, but it returns 0:
char str_in[] = "abcgfg1234567890gfggf";
char cset[] = "1234567890";
int result;
result = strspn(str_in, cset); // returns 0 need it to return 10
The fact that the code above returns 0 instead of 10 highlights the problem. I asked this previously, but most replies were for checking against a known 10-digit number. In my case the number will be random. Is there a better way than strspn?
It returns 0 because there are no digits at the start of the string.
The strspn() function calculates the length (in bytes) of the
initial segment of s which consists entirely of bytes in accept.
You need to skip non-digits - strcspn - and then call strspn on the string + that offset. You could try:
/* Count chars to skip. */
skip = strcspn(str_in, cset);
/* Measure all-digit portion. */
length = strspn(str_in + skip, cset);
EDIT
I should mention this must be done in a loop. For example if your string is "abcd123abcd1234567890" the first strspn will only match 3 characters and you need to look further.
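The loop mentioned in the edit could look roughly like the sketch below. The helper name find_digit_run is hypothetical, and it assumes that "a 10 digit number" means a run of exactly the requested length (an 11-digit run is rejected).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the first run of exactly
   `want` consecutive digits in `s`, or NULL if there is no such run. */
const char *find_digit_run(const char *s, size_t want)
{
    static const char digits[] = "0123456789";

    while (*s != '\0') {
        s += strcspn(s, digits);          /* skip non-digits */
        size_t run = strspn(s, digits);   /* measure the digit run */
        if (run == 0)
            break;                        /* reached the end of the string */
        if (run == want)
            return s;
        s += run;                         /* wrong length, keep looking */
    }
    return NULL;
}
```

Calling find_digit_run("abcd123abcd1234567890", 10) skips the 3-digit run and returns a pointer to the second run.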
Just use sscanf():
unsigned long long value;
const char *str_in = "abcgfg1234567890gfggf";
if(sscanf(str_in, "%*[^0-9]%llu", &value) == 1)
{
if(value >= 1000000000ull) /* Check that it has at least 10 digits. */
{
/* do magic here */
}
}
The above relies on unsigned long long being large enough to hold a 10-digit decimal number; the C standard guarantees that it is at least 64 bits wide, so it is. Note that the check only verifies that the number has at least 10 digits and no leading zero.
The %*[^0-9] conversion specifier tells sscanf() to skip a run of initial characters that are not (decimal) digits, then convert an unsigned long long (%llu) directly after that. The trailing characters are ignored.
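One caveat with a format that starts with %*[^0-9]: a scanset directive must match at least one character, so the whole conversion fails when the string begins with a digit. A minimal sketch that avoids this by skipping the non-digits with strcspn first (first_number is a hypothetical helper name):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: stores the first decimal number found in `s`
   in *out. Returns 1 on success, 0 if `s` contains no digits. */
int first_number(const char *s, unsigned long long *out)
{
    s += strcspn(s, "0123456789");   /* may skip zero characters */
    return sscanf(s, "%llu", out) == 1;
}
```

This works for both "abc123x" and "42abc"; counting the digits still has to be done separately if leading zeros matter.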
How about using a regex?
#include <stdio.h>
#include <stdlib.h>
#include <regex.h>
int
main(int argc, char **argv)
{
char str_in[] = "abcgfg1234567890gfggf";
int result = 0;
const char *pattern = "[0-9]{10}";
regex_t re;
char msg[256];
if (regcomp(&re, pattern, REG_EXTENDED|REG_NOSUB) != 0) {
perror("regcomp");
return(EXIT_FAILURE);
}
result = regexec(&re, str_in, (size_t)0, NULL, 0);
if (!result) {
printf("Regex got a match.\n");
} else if (result == REG_NOMATCH) {
printf("Regex got no match.\n");
} else {
/* regerror() needs the compiled regex, so free it only afterwards */
regerror(result, &re, msg, sizeof(msg));
fprintf(stderr, "Regex match failed: %s\n", msg);
regfree(&re);
return(EXIT_FAILURE);
}
regfree(&re);
return(EXIT_SUCCESS);
}
strspn seems handy for this, but you would have to include it in a loop and search several times. Given the specific requirements, the easiest way is probably to make your own custom function.
int find_digits (const char* str, int n);
/* Searches [str] for a sequence of [n] adjacent digits.
Returns the index of the first valid substring containing such a sequence,
otherwise returns -1.
*/
#include <ctype.h>
int find_digits (const char* str, int n)
{
int result = -1;
int substr_len = 0;
for(int i=0; str[i] != '\0'; i++)
{
if(isdigit((unsigned char)str[i]))
{
substr_len++;
}
else
{
substr_len=0;
}
if(substr_len == n)
{
result = i - n + 1; /* index where the run of n digits starts */
break;
}
}
return result;
}
(I just hacked this down here and now, not tested, but you get the idea. This is most likely the fastest algorithm for the task, that is, if performance matters at all)
Alternative use of sscanf()
(blatant variation of @unwind's answer)
const char *str_in = "abcgfg0123456789gfggf";
int n1 = 0;
int n2 = 0;
// %*[^0-9] Scan any non-digits. Do not store result.
// %n Store number of characters read so far.
// %*[0-9] Scan digits. Do not store result.
sscanf(str_in, "%*[^0-9]%n%*[0-9]%n", &n1, &n2);
if (n2 == 0) return 0;
return n2 - n1;
Counts leading 0 characters as part of digit count.
Should one wish to avoid sscanf()
char str_in[] = "abcgfg1234567890gfggf";
const char *p1 = str_in;
while (*p1 && !isdigit(*p1)) p1++;
const char *p2 = p1;
while (isdigit(*p2)) p2++;
result = p2 - p1;
To test for a specific digit sequence (here the contents of cset) inside a string, you can do something like this:
#include <string.h>
int main()
{
char str_in[] = "abcgfg1234567890gfggf";
char cset[] = "1234567890";
int result;
int i;
int f;
i = 0;
f = 0;
while (str_in[i])
{
if (str_in[i] == cset[f])
{
f++;
if(f == strlen(cset))
return (f);
}
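else
{
i -= f; /* back up so the search resumes one past the start of the failed partial match */
f = 0;
}
i++;
}
return (0); /* sequence not found */
}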
I'm trying to practice working with text files, printing to and reading from them. I need to take input from the user (maybe their phone number or someone else's), but I want them to be able to put spaces between the digits.
Let's say my phone number is: 565 856 12
I want them to be able to give me this number with spaces, instead of a squished version like 56585612.
So far I've tried scanf(), and I don't know how to make scanf() do something like this. I've tried going for chars and for loops, but it's a tangle.
When I type 565 856 12 and press Enter, only 565 is read as the phone number, and 856 12 is left for the next scanf.
struct Student{
unsigned long long student_phone_number;
};
int main(){
FILE *filePtr;
filePtr = fopen("std_info.txt","w");
struct Student Student1;
printf("\nEnter Student's Phone Number: ");
scanf("%llu",&Student1.student_phone_number);
fprintf(filePtr,"%llu\t",Student1.student_phone_number);
}
To solve this problem, I modified the Student structure to store both an unsigned long long integer and a character array. The program reads the character array from stdin, validates it with isValid(), and converts it to an unsigned long long integer with convertToNumber().
#include <stdio.h>
#include <ctype.h>
#include <stdbool.h>
struct Student{
unsigned long long numberForm;
char textForm[16]; /* an array, so fgets() has storage to write into */
};
// Converts character array to unsigned long long integer.
void convertToNumber(struct Student * const student);
// Validates the data in the Student.textForm variable.
bool isValid(const char * const input, const char * const format);
// Returns the number of characters in a character array.
size_t getSize(const char * const input);
// This function returns the "base ^ exponent" result.
unsigned long long power(int base, unsigned int exponent);
int main()
{
struct Student student;
char format[] = "NNN NNN NN"; /* "123 456 78" */
printf("Enter phone number: ");
fgets(student.textForm, getSize(format) + 1, stdin);
// gets() was removed from the C standard in C11; fgets() is the bounded replacement.
if(isValid(student.textForm, format))
{
convertToNumber(&student);
printf("Result: %llu", student.numberForm);
}
return 0;
}
void convertToNumber(struct Student * const student)
{
int size = getSize(student->textForm) - 2;
unsigned int temp[size];
student->numberForm = 0ull;
for(int i = 0, j = 0 ; i < getSize(student->textForm) ; ++i)
if(isdigit(student->textForm[i]))
temp[j++] = student->textForm[i] - '0';
for(size_t i = 0 ; i < size ; ++i)
student->numberForm += temp[i] * power(10, size - i - 1);
}
bool isValid(const char * const input, const char * const format)
{
if(getSize(input) == getSize(format))
{
size_t i;
for(i = 0 ; i < getSize(input) ; ++i)
{
if(format[i] == 'N')
{
if(!isdigit(input[i]))
break;
}
else if(format[i] == ' ')
{
if(input[i] != format[i])
break;
}
}
if(i == getSize(input))
return true;
}
return false;
}
unsigned long long power(int base, unsigned int exponent)
{
unsigned long long result = 1;
for(size_t i = 0 ; i < exponent ; ++i)
result *= base;
return result;
}
size_t getSize(const char * const input)
{
size_t size = 0;
while(input[size] != '\0') ++size; /* also correct for an empty string */
return size;
}
This program works as follows:
Enter phone number: 123 465 78
Result: 12346578
You can use fgets to read a whole line of input, spaces and all:
#include <stdio.h>
#define SIZE 100
int main() {
char str[SIZE];
fgets(str, sizeof str, stdin); // reads a whole line, spaces included,
// and never writes past the end of str
}
If you then want to remove the spaces you can do that easily:
#include <ctype.h>
void remove_white_spaces(char *str)
{
int i = 0, j = 0;
while (str[i])
{
if (!isspace((unsigned char)str[i]))
str[j++] = str[i];
i++;
}
str[j] = '\0';
}
Presto, this function will remove white spaces from the string you pass as an argument.
Input:
12 4 345 789
Output:
124345789
After that it's easy to convert this into an unsigned integral value; you can use strtoul (or strtoull for longer numbers). But why would you store a phone number in a numeric type? Would you be performing arithmetic operations on it? Doubtful. And since you then save it to a file, it really doesn't matter whether it's a string or a numeric type. I would just keep it as a string.
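If a numeric value is wanted anyway, the conversion step could be sketched as below. parse_phone is a hypothetical helper, and the 32-byte scratch buffer is an arbitrary assumption; the function rejects anything other than digits and whitespace.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: strips whitespace from `line`, checks that only
   digits remain, and converts them with strtoull.
   Returns 1 on success, 0 on invalid or out-of-range input. */
int parse_phone(const char *line, unsigned long long *out)
{
    char digits[32];
    size_t n = 0;

    for (; *line != '\0'; line++) {
        unsigned char c = (unsigned char)*line;
        if (isspace(c))
            continue;                     /* drop spaces and the newline */
        if (!isdigit(c) || n + 1 >= sizeof digits)
            return 0;                     /* stray character or too long */
        digits[n++] = (char)c;
    }
    if (n == 0)
        return 0;                         /* nothing but whitespace */
    digits[n] = '\0';

    errno = 0;
    *out = strtoull(digits, NULL, 10);
    return errno != ERANGE;
}
```

Note that a number such as "056 585 612" loses its leading zero in numeric form, which is one more reason to keep phone numbers as strings.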
It is generally better to store a telephone number as a string, not as a number.
In order to read a whole line of input (not just a single word) as a string, I recommend that you use the function fgets.
Here is an example based on the code in your question:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
struct Student{
char phone_number[50];
};
int main( void )
{
//open output file (copied from code in OP's question)
FILE *filePtr;
filePtr = fopen("std_info.txt","w");
//other variable declarations
struct Student Student1;
char *p;
//prompt user for input
printf( "Enter student's phone number: ");
//attempt to read one line of input
if ( fgets( Student1.phone_number, sizeof Student1.phone_number, stdin ) == NULL )
{
printf( "input error!\n" );
exit( EXIT_FAILURE );
}
//attempt to find newline character, in order to verify that
//the entire line was read in
p = strchr( Student1.phone_number, '\n' );
if ( p == NULL )
{
printf( "line too long for input!\n" );
exit( EXIT_FAILURE );
}
//remove newline character by overwriting it with terminating
//null character
*p = '\0';
//write phone number to file
fprintf( filePtr, "%s\n", Student1.phone_number );
//cleanup
fclose( filePtr );
}
Read the number as a string, trim the spaces from it, and then convert it to an integer. Note that atoi returns an int, which may be too small for a phone number; strtoull, declared in stdlib.h, returns an unsigned long long. For example:
#include<stdio.h>
#include<stdlib.h>
int main(){
char str[] = "123456";
unsigned long long int num = strtoull(str, NULL, 10);
printf("%llu", num);
}
I am making a language translator, and want to read from the buffer word by word and store them in a key-value struct.
The buffer contains such a file:
hola:hello
que:what
and so on. I have already tried everything, and I keep getting errors such as segmentation fault: 11, or the same line is read again and again.
struct key_value{
char *key;
char *value;
};
...
struct key_value *kv = malloc(sizeof(struct key_value) * count);
char k[20]; //key
char v[20]; //value
int x = 0;
for(i = 0; i < numbytes; i++){
sscanf(buffer,"%21[^:]:%21[^\n]\n",k,v);
(kv + i)->key = k;
(kv + i)->value = v;
}
for(i = 0; i < count; i++){
printf("key: %s, value: %s\n",(kv + i)->key,(kv + i)->value);
}
free(buffer);
free(kv);
I expect the output to be key: hola, value: hello key: que, value: what,
but the actual output is just key: hola, value: hello again and again.
Which is the right way to do it?
There are multiple problems with your code, among them:
On each loop iteration, you read from the beginning of the buffer. It is natural, then, that each iteration extracts the same key and value.
More generally, your read loop iteration variable seems to have no relationship with the data read. It appears to be a per-byte iteration, but you seem to want a per-line iteration. You might want to look into scanf's %n directive to help you track progress through the buffer.
You are scanning each key / value pair into the same local k and v variables, then you are assigning pointers to those variables to your structures. The resulting pointers are all the same, and they will become invalid when the function returns. I suggest giving struct key_value arrays for its members instead of pointers, and copying the data into them.
Your sscanf format reads up to 21 characters each for key and value, but the provided destination arrays are not long enough for that. You need them to be dimensioned for at least 22 characters to hold 21 plus a string terminator.
Your sscanf() format and usage do not support recognition of malformed input, especially overlength keys or values. You need to check the return value, and you probably need to match the trailing newline with a %c field (the literal newline in the format does not mean what you think it means).
Tokenizing (the whole buffer) with strtok_r or strtok or even strchr instead of sscanf() might be easier for you.
Also, style note: your expressions of the form (kv + i)->key are valid, but it would be more idiomatic to write kv[i].key.
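The points above can be combined into a sketch that walks the buffer line by line with strcspn and memchr instead of sscanf. The names struct kv, copy_field and parse_pairs are hypothetical, the 22-byte field size is borrowed from the discussion above, and overlong fields are silently truncated here, which may not be what you want.

```c
#include <stddef.h>
#include <string.h>

struct kv {                 /* hypothetical: arrays instead of pointers */
    char key[22];
    char value[22];
};

/* Copies at most dstsz-1 bytes of [src, src+len) into dst and terminates it. */
static void copy_field(char *dst, size_t dstsz, const char *src, size_t len)
{
    if (len >= dstsz)
        len = dstsz - 1;    /* truncate overlong fields */
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Parses "key:value" lines from buf into out[0..max-1].
   Lines without a ':' are skipped. Returns the number of pairs stored. */
size_t parse_pairs(const char *buf, struct kv *out, size_t max)
{
    size_t count = 0;

    while (*buf != '\0' && count < max) {
        size_t line_len = strcspn(buf, "\n");            /* length of this line */
        const char *colon = memchr(buf, ':', line_len);  /* separator, if any */

        if (colon != NULL) {
            size_t key_len = (size_t)(colon - buf);
            copy_field(out[count].key, sizeof out[count].key, buf, key_len);
            copy_field(out[count].value, sizeof out[count].value,
                       colon + 1, line_len - key_len - 1);
            count++;
        }
        buf += line_len;
        if (*buf == '\n')
            buf++;                                       /* step past the newline */
    }
    return count;
}
```

Because each pair is copied into its own array storage, the entries stay valid after the parsing function returns.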
I've written a simple piece of code that may help you solve your problem. It uses fgets to read from a file named "file.txt" and strchr to locate the first occurrence of the separator ':'.
Here is the code:
#include <stdio.h>
#include <string.h>
#include <errno.h>
#define MAX_LINE_SIZE 256
#define MAX_DECODED_LINE 1024
struct decod {
char key[MAX_LINE_SIZE];
char value[MAX_DECODED_LINE];
};
static struct decod decod[1024];
int main(void)
{
FILE * fptr = NULL;
char fbuf[MAX_LINE_SIZE];
char * value;
int cnt=0,i;
if ( !(fptr=fopen("file.txt","r")) )
{
perror("");
return errno;
}
while( fgets(fbuf,MAX_LINE_SIZE,fptr)) {
// Eliminate UNIX/DOS line terminator
value=strrchr(fbuf,'\n');
if (value) *value=0;
value=strrchr(fbuf,'\r');
if (value) *value=0;
//Find first occurrence of the separator ':'
value=strchr(fbuf,':');
if (value) {
// Truncates fbuf string to first word
// and (++) points second word
*value++=0;
}
if (cnt<MAX_DECODED_LINE) {
strcpy(decod[cnt].key,fbuf);
if (value!=NULL) {
strcpy(decod[cnt].value,value);
} else {
decod[cnt].value[0]=0;
}
cnt++;
} else {
fprintf(stderr,
"Cannot read more than %d lines\n", MAX_DECODED_LINE);
break;
}
}
if (fptr)
fclose(fptr);
for(i=0;i<cnt;i++) {
printf("key:%s\tvalue:%s\n",decod[i].key,decod[i].value);
}
return 0;
}
This code reads all the lines (at most 1024) of the file named file.txt, loads the key/value pairs it finds into the struct array decod, and then prints the contents of the array.
I wrote this code, and I think it does the job. I think it is simpler than the accepted answer, and it allocates only as much memory as is needed.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
struct key_value{
char key[22];
char value[22];
};
void parse_str(char* str, struct key_value** kv_arr, int* num){
int n = 0;
int read = -1;
char k[22];
char v[22];
int current_pos = 0;
int consumed = 0;
/*counting number of key-value pairs*/
while (1){
if(current_pos >= (int)strlen(str)){
break;
}
consumed = 0; /* stays 0 if sscanf stops before reaching %n */
read = sscanf(str + current_pos, "%21[^:]:%21[^\n]\n%n", k, v, &consumed);
if(read != 2 || consumed == 0){
break; /* malformed input: stop rather than loop forever */
}
current_pos += consumed;
++n;
}
printf("n = %d\n", n);
*kv_arr = malloc(sizeof(struct key_value) * n);
/*filling key_value array*/
int i = 0;
current_pos = 0;
while (i < n){
consumed = 0;
read = sscanf(str + current_pos, "%21[^:]:%21[^\n]\n%n", k, v, &consumed);
if(read != 2 || consumed == 0){
break;
}
current_pos += consumed;
struct key_value* kv = &((*kv_arr)[i]);
strncpy(kv->key, k, 22);
strncpy(kv->value, v, 22);
++i;
}
*num = n;
}
int main(){
char* str = "hola:hello\n"
"que:what\n";
int n;
struct key_value* kv_arr;
parse_str(str, &kv_arr, &n);
for (int i = 0; i < n; ++i) {
printf("%s <---> %s\n", kv_arr[i].key, kv_arr[i].value);
}
free(kv_arr);
return 0;
}
output :
n = 2
hola <---> hello
que <---> what
Process finished with exit code 0
Note: sscanf operates on a const char*, not on an input stream, so it does NOT keep track of how much it has consumed.
Solution: I used %n in the format string to get the number of characters consumed so far (available since C89).
I have a dynamic string like "/users/5/10/fnvfnvdjvndfvjvdklchsh" and also a dynamic format like "/users/%u/%d/%s". How can I check whether the string matches the format?
By string I mean char[255] or char* str = malloc(x).
I tried to use sscanf, but I don't know the number of arguments or their types. If I do:
int res = sscanf(input, format);
I get a stack overflow. Can I allocate a buffer to prevent this?
For example:
void* buffer = malloc(1024);
int res = sscanf(input, format, buffer);
I would like to have a function like this:
bool stringMatches(const char* format, const char* input);
stringMatches("/users/%u/%d/%s", "/users/5/10/fnvfnvdjvndfvjvdklchsh"); //true
stringMatches("/users/%u/%d/%s", "/users/5/10"); //false
stringMatches("/users/%u/%d/%s", "/users/-10/10/aaa"); //false %u is unsigned
Do you see any solution?
Thanks in advance.
I don't think that there is a scanf-like matching function in the standard lib, so you will have to write your own. Replicating all details of the scanf behaviour is difficult, but it's probably not necessary.
If you allow only % and a limited selection of single format identifiers without size, width and precision information, the code isn't terribly complex:
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
bool stringMatches(const char *format, const char *input)
{
while (*format) {
if (*format == '%') {
format++;
switch(*format++) {
case '%': {
if (*input++ != '%') return false;
}
break;
case 'u':
if (*input == '-') return false;
// continue with 'd' case
case 'd': {
char *end;
strtol(input, &end, 10); /* base 10, as %d and %u expect decimal input */
if (end == input) return false;
input = end;
}
break;
case 's': {
if (isspace((uint8_t) *input)) return false;
while (*input && !isspace((uint8_t) *input)) input++;
}
break;
default:
return false;
}
} else {
if (*format++ != *input++) return false;
}
}
return (*input == '\0');
}
Some notes:
I've parsed the numbers with strtol. If you want to include floating-point number formats, you could use strtod for that, if your embedded system provides it. (You could also parse stretches of isdigit() chars as valid numbers.)
The 'u' case falls through to the 'd' case here. The function strtoul would parse an unsigned long, but it also accepts a minus sign, so the code uses strtol for both cases and rejects a leading '-' explicitly. (But the way it is caught, it won't allow leading white space.)
You could implement your own formats or re-interpret existing ones. For example you could decide that you don't want leading white space for numbers or that a string ends with a slash.
It's a rather tricky one. I don't think C has very useful built in functions that will help you.
What you could do is using a regex. Something like this:
#include <sys/types.h>
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
regex_t regex;
/* REG_EXTENDED is needed for '+' to mean "one or more" */
if (regcomp(&regex, "/users/[[:digit:]]+", REG_EXTENDED)) {
fprintf(stderr, "Error\n");
exit(1);
}
const char *mystring = "/users/5/10/fnvfnvdjvndfvjvdklchsh";
if (regexec(&regex, mystring, 0, NULL, 0) == 0)
printf("Match\n");
regfree(&regex);
}
The regex in the code above does not suit your example. I just used something to show the idea. I think it would correspond to the format string "/users/%u" but I'm not sure. Nevertheless, I think this is one of the easiest ways to tackle this problem.
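For the specific format "/users/%u/%d/%s", an anchored POSIX extended regex might look like the sketch below. The helper name matches_users_path is hypothetical, and the pattern is only an approximation of what sscanf accepts (for instance, it does not allow a leading '+' or white space before the numbers).

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if `input` looks like "/users/%u/%d/%s",
   i.e. an unsigned number, an optionally negative number, and a
   non-empty token without white space. */
bool matches_users_path(const char *input)
{
    static const char pattern[] =
        "^/users/[0-9]+/-?[0-9]+/[^[:space:]]+$";
    regex_t re;
    bool ok;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;                 /* pattern failed to compile */
    ok = (regexec(&re, input, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

Translating an arbitrary scanf format into a pattern at run time would take more work; this only shows the target for one fixed format.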
The easiest is to just try parsing it with sscanf, and see if the scan succeeded.
char * str = "/users/5/10/fnvfnvdjvndfvjvdklchsh";
unsigned int tmp_u;
int tmp_d;
char tmp_s[256];
int n = sscanf (str, "/users/%u/%d/%255s", &tmp_u, &tmp_d, tmp_s); /* width keeps %s inside tmp_s */
if (n!=3)
{
/* Match failed */
}
Just remember that you don't have to match everything in one go. You can use the %n format specifier to get the number of bytes parsed so far, and advance the string pointer for the next parse.
This example abuses the fact that bytes_parsed will not be modified if the parsing doesn't reach the %n specifier:
char * str = "/users/5/10/fnvfnvdjvndfvjvdklchsh";
int bytes_parsed = 0;
/* parse prefix */
sscanf(str, "/users/%n", &bytes_parsed);
if (bytes_parsed == 0)
{
/* Parse error */
}
str += bytes_parsed; /* str = "5/10/fnvfnvdjvndfvjvdklchsh"; */
bytes_parsed = 0;
/* Parse next num */
unsigned int tmp_u;
sscanf(str, "%u%n", &tmp_u, &bytes_parsed);
if (bytes_parsed)
{
/* Number was an unsigned, do something */
}
else
{
/* First number was not an `unsigned`, so we try parsing it as signed */
int tmp_d;
sscanf(str, "%d%n", &tmp_d, &bytes_parsed);
if (bytes_parsed)
{
/* Number was signed, do something */
}
}
if (!bytes_parsed)
{
/* failed parsing number */
}
str += bytes_parsed; /* str = "/10/fnvfnvdjvndfvjvdklchsh"; */
......
I have a string that contains both Mandarin and English words in UTF-8:
char *str = "你a好测b试";
If you use strlen(str), it will return 14, because each Mandarin character uses three bytes, while each English character uses only one byte.
Now I want to copy the leftmost 4 characters ("你a好测"), and append "..." at the end, to give "你a好测...".
If the text were in a single-byte encoding, I could just write:
strncpy(buf, str, 4);
strcat(buf, "...");
But 4 characters in UTF-8 isn't necessarily 4 bytes. For this example, it will be 10 bytes: three each for 你, 好 and 测, and one for a. So, for this specific case, I would need
strncpy(buf, str, 10);
strcat(buf, "...");
If I had a wrong value for the length, I could produce a broken UTF-8 stream with an incomplete character. Obviously I want to avoid that.
How can I compute the right number of bytes to copy, corresponding to a given number of characters?
First you need to know your encoding. By the sound of it (3 byte Mandarin) your string is encoded with UTF-8.
What you need to do is convert the UTF-8 back to Unicode code points (integers). You can then have an array of integers rather than bytes, so each element of the array will be one character, regardless of the language.
You could also use a library of functions that already handle utf8 such as http://www.cprogramming.com/tutorial/utf8.c
http://www.cprogramming.com/tutorial/utf8.h
In particular this function: int u8_toucs(u_int32_t *dest, int sz, char *src, int srcsz); might be very useful, it will create an array of integers, with each integer being 1 character. You can then modify the array as you see fit, then convert it back again with int u8_toutf8(char *dest, int sz, u_int32_t *src, int srcsz);
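If pulling in a library is not an option, the decoding step itself is short. The following is a minimal sketch, not the library's implementation: utf8_to_codepoints is a hypothetical name, and it checks only the byte structure (overlong encodings and surrogates are not rejected).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decodes UTF-8 from `s` into out[0..max-1].
   Returns the number of code points written, or (size_t)-1 on a
   malformed sequence. Overlong forms and surrogates are not rejected. */
size_t utf8_to_codepoints(const char *s, uint32_t *out, size_t max)
{
    size_t n = 0;

    while (*s != '\0' && n < max) {
        unsigned char lead = (unsigned char)*s++;
        uint32_t cp;
        int extra;                                  /* continuation bytes expected */

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else return (size_t)-1;                     /* stray continuation byte */

        while (extra-- > 0) {
            unsigned char c = (unsigned char)*s;
            if ((c & 0xC0) != 0x80)
                return (size_t)-1;                  /* truncated or invalid */
            cp = (cp << 6) | (c & 0x3F);
            s++;
        }
        out[n++] = cp;
    }
    return n;
}
```

Once the text is an array of uint32_t, taking the first four characters is plain array indexing; the result then has to be encoded back to UTF-8 for output.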
I would recommend dealing with this at a higher level of abstraction: either convert to wchar_t or use a UTF-8 library. But if you really want to do it at the byte level, you could count characters by skipping over the continuation bytes (which are of the form 10xxxxxx):
#include <stddef.h>
size_t count_bytes_for_chars(const char *s, int n)
{
const char *p = s;
n += 1; /* we're counting up to the start of the subsequent character */
while (*p && (n -= (*p & 0xc0) != 0x80))
++p;
return p-s;
}
Here's a demonstration of the above function:
#include <string.h>
#include <stdio.h>
int main()
{
const char *str = "你a好测b试";
char buf[50];
int truncate_at = 4;
size_t bytes = count_bytes_for_chars(str, truncate_at);
strncpy(buf, str, bytes);
strcpy(buf+bytes, "...");
printf("'%s' truncated to %d characters is '%s'\n", str, truncate_at, buf);
}
Output:
'你a好测b试' truncated to 4 characters is '你a好测...'
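The demonstration above trusts that buf is large enough. A bounded variant of the same idea (skip bytes of the form 10xxxxxx when counting) might look like this sketch; utf8_truncate is a hypothetical name, and the function assumes that src is valid UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies at most max_chars UTF-8 characters of src
   into dst and appends "..." if anything was cut off.
   Assumes src is valid UTF-8. Returns 0 on success, -1 if dst is too small. */
int utf8_truncate(char *dst, size_t dstsz, const char *src, size_t max_chars)
{
    const char *p = src;
    size_t chars = 0;

    while (*p != '\0') {
        /* a byte that is not 10xxxxxx starts a new character */
        if (((unsigned char)*p & 0xC0) != 0x80) {
            if (chars == max_chars)
                break;      /* p is at the start of the first dropped character */
            chars++;
        }
        p++;
    }

    size_t bytes = (size_t)(p - src);
    const char *tail = (*p != '\0') ? "..." : "";
    if (bytes + strlen(tail) + 1 > dstsz)
        return -1;

    memcpy(dst, src, bytes);
    strcpy(dst + bytes, tail);
    return 0;
}
```

Unlike the strncpy version, this one refuses to write when the destination cannot hold the result, and it leaves short strings without the trailing dots.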
The Basic Multilingual Plane was designed to contain characters for almost all modern languages. In particular, it does contain Chinese.
So you just have to convert your UTF-8 string to a UTF-16 one so that each character (within the BMP) occupies a single position. That means you can use a wchar_t array, or even better a std::wstring, and use all the string functions natively.
Starting with C++11, the <codecvt> header declares a dedicated converter, std::codecvt_utf8, to convert UTF-8 narrow strings to wide Unicode ones. (Note that <codecvt> was deprecated in C++17.) I must admit it is not very easy to use, but it should be enough here. The code could look like this:
char str[] = "你a好测b试";
std::codecvt_utf8<wchar_t> cvt;
std::mbstate_t state = std::mbstate_t();
wchar_t wstr[sizeof(str)] = {0}; // there will be unused space at the end
const char *end;
wchar_t *wend;
auto cr = cvt.in(state, str, str+sizeof(str), end,
wstr, wstr+sizeof(str), wend);
*wend = 0;
Once you have the wstr wide string, you can convert it to a wstring and use all the C++ library tools, or if you prefer C strings you can use the ws... counterparts of the str... functions.
Pure C solution:
In UTF-8, every byte of a multibyte character has its most significant bit set to 1, and the leading bits of the first byte indicate how many bytes make up the code point.
The question is ambiguous in regards to the criterion used in cutting; either:
a fixed number of codepoints followed by three dots; this will require a variable-size output buffer
a fixed size output buffer, which will impose "whatever you can fit inside"
Both the solutions will require a helper function telling how many chars make the next codepoint:
// Note: the function does NOT fully validate a
// UTF8 sequence, only looks at the first char in it
int codePointLen(const char* c) {
if(NULL==c) return -1;
if( (*c & 0xF8)==0xF0 ) return 4; // 4 ones and one 0
if( (*c & 0xF0)==0xE0 ) return 3; // 3 ones and one 0
if( (*c & 0xE0)==0xC0 ) return 2; // 2 ones and one 0
if( (*c & 0x7F)==*c ) return 1; // no ones on msb
return -2; // invalid UTF8 starting character
}
So, solution for the criterion 1 (fixed number of code points, variable output buff size) - does not append ... to the destination, but you can ask "how many chars I need" upfront and if it is longer than you can afford, reserve yourself the extra space.
// returns the number of chars used from the output
// If not enough space or the dest is null, does nothing
// and returns the length required for the output buffer
// Returns negative val if the source is not valid UTF8
int copyFirstCodepoints(
int codepointsCount, const char* src,
char* dest, int destSize
) {
if(NULL==src) {
return -1;
}
// do a cold run to see if size of the output buffer can fit
// as many codepoints as required
const char* walker=src;
for(int cnvCount=0; cnvCount<codepointsCount && *walker; cnvCount++) { // stop at the end of src
int chCount=codePointLen(walker);
if(chCount<0) {
return chCount; // err
}
walker+=chCount;
}
if(walker-src < destSize && NULL!=dest) {
// enough space at destination (including the terminating NUL)
strncpy(dest, src, walker-src);
dest[walker-src] = '\0';
}
// else do nothing
return walker-src;
}
Second criterion (limited buffer size): just use the first one with the number of codepoints returned by this one
// return negative if UTF encoding error
int howManyCodepointICanFitInOutputBufferOfLen(const char* src, int maxBufflen) {
if(NULL==src) {
return -1;
}
int ret=0; // codepoints that fit so far
const char* walker=src;
while(*walker) {
int advance=codePointLen(walker);
if(advance<0) {
return -1-(int)(walker-src); // negative, and encodes the err pos
}
// stop once this codepoint plus the terminating NUL would not fit
if((walker-src)+advance >= maxBufflen) {
break;
}
// look at the continuation chars of this codepoint:
// if any is 0, we have a premature end of the source
for(int k=1; k<advance; k++) {
if(0==walker[k]) {
return -1-(int)(walker-src);
}
}
walker+=advance;
ret++;
}
return ret;
}
static char *CutStringLength(char *lpszData, int nMaxLen)
{
if (NULL == lpszData || 0 >= nMaxLen)
{
return "";
}
int len = strlen(lpszData);
if(len <= nMaxLen)
{
return lpszData;
}
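static char strTemp[1024]; // static: the pointer returned below must outlive this call
strncpy(strTemp, lpszData, sizeof strTemp - 1);
strTemp[sizeof strTemp - 1] = '\0';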
char *p = strTemp;
p = p + (nMaxLen-1);
if ((unsigned char)(*p) < 0xA0)
{
*(++p) = '\0'; // if the last byte is Mandarin character
}
else if ((unsigned char)(*(--p)) < 0xA0)
{
*(++p) = '\0'; // if the last but one byte is Mandarin character
}
else if ((unsigned char)(*(--p)) < 0xA0)
{
*(++p) = '\0'; // if the last but two byte is Mandarin character
}
else
{
int i = 0;
p = strTemp;
while(*p != '\0' && i+2 <= nMaxLen)
{
if((unsigned char)(*p++) >= 0xA0 && (unsigned char)(*p) >= 0xA0)
{
p++;
i++;
}
i++;
}
*p = '\0';
}
printf("str = %s\n",strTemp);
return strTemp;
}
I'm trying to parse and convert a CSV file and thought I'd try my hand at C, since I'm currently learning the language.
int main()
{
char * s1, * s2;
int field_count = 0;
while ((field_count = scanf("%20[^;];%s", s1, s2)) != EOF)
{
printf("%d\n", field_count);
if (field_count != 2)
continue;
}
return 0;
}
I don't get why field_count is always 0 no matter how many fields there are in each line, and the loop never ends either.
I tried changing the while clause to:
while ((field_count = scanf("%s", s1)) != EOF)
but it made no difference.
I've tried googling, but examples retaining the return value AND checking for EOF were hard to find.
This is an example of code that should work.
It allocates a storage buffer for s1 and s2.
This code expects that the fields are at most 99 characters long.
int main(int argc, char * argv[])
{
char s1[100], s2[100];
int field_count = 0;
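/* The leading space skips the newline left over from the previous line;
   stopping at 0 or EOF avoids spinning on input that cannot be matched. */
while ((field_count = scanf(" %99[^;];%99[^\n]", s1, s2)) > 0)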
{
printf("%d '%s' '%s'\n", field_count, s1, s2);
}
return 0;
}
scanf is rather limited for parsing input. You may get unexpected results when you give it an empty line, for example.
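A common way around those limits is to read each line with fgets and then parse the line with sscanf, so that a bad line cannot stall the input stream. A minimal sketch of the parsing half, with split_line as a hypothetical helper name and the 99-character field limit taken from the code above:

```c
#include <stdio.h>

/* Hypothetical helper: splits one "first;second" line into two fields.
   Returns the number of fields stored (0, 1 or 2). */
int split_line(const char *line, char first[100], char second[100])
{
    int n;

    first[0] = second[0] = '\0';
    n = sscanf(line, "%99[^;\n];%99[^\n]", first, second);
    return n < 0 ? 0 : n;   /* treat EOF (empty input) as zero fields */
}
```

A caller would loop with while (fgets(line, sizeof line, stdin) != NULL) and pass each line to split_line; an empty line then simply yields 0 fields and the loop moves on to the next line.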