Unencoding data from POST (CGI and C)


I was just reading this page http://www.cs.tut.fi/~jkorpela/forms/cgic.html about getting started with CGI in C. I had a question about the code in the unencoding part.
#include <stdio.h>
#include <stdlib.h>
#define MAXLEN 80
#define EXTRA 5
/* 4 for field name "data", 1 for "=" */
#define MAXINPUT MAXLEN+EXTRA+2
/* 1 for added line break, 1 for trailing NUL */
#define DATAFILE "../data/data.txt"
void unencode(char *src, char *last, char *dest)
{
for(; src != last; src++, dest++)
if(*src == '+')
*dest = ' ';
else if(*src == '%') {
int code;
if(sscanf(src+1, "%2x", &code) != 1) code = '?';
*dest = code;
src +=2; }
else
*dest = *src;
*dest = '\n';
*++dest = '\0';
}
int main(void)
{
char *lenstr;
char input[MAXINPUT], data[MAXINPUT];
long len;
printf("%s%c%c\n",
"Content-Type:text/html;charset=iso-8859-1",13,10);
printf("<TITLE>Response</TITLE>\n");
lenstr = getenv("CONTENT_LENGTH");
if(lenstr == NULL || sscanf(lenstr,"%ld",&len)!=1 || len > MAXLEN)
printf("<P>Error in invocation - wrong FORM probably.");
else {
FILE *f;
fgets(input, len+1, stdin);
unencode(input+EXTRA, input+len, data);
f = fopen(DATAFILE, "a");
if(f == NULL)
printf("<P>Sorry, cannot store your data.");
else
fputs(data, f);
fclose(f);
printf("<P>Thank you! Your contribution has been stored.");
}
return 0;
}
I was wondering exactly how these lines:
else if(*src == '%') {
int code;
if(sscanf(src+1, "%2x", &code) != 1) code = '?';
*dest = code;
src +=2; }
convert something like %21 back into the exclamation mark?
Thanks!

else if(*src == '%') {
int code;
if(sscanf(src+1, "%2x", &code) != 1) code = '?';
*dest = code;
src +=2;
}
If the current character is a %, sscanf() is used to parse the hexadecimal digits that follow it. The "%x" format converts hexadecimal digits to an integer value (in this case, a character code), and the 2 specifies a maximum field width, so that it consumes at most 2 characters.
The return value of sscanf() indicates the number of successful conversions, so if it doesn't return 1, it didn't find a valid hexadecimal number.
Then the character code is assigned to *dest, and the src pointer is advanced to point to the next character after the %xx sequence.
There are actually three bugs here:
The "%x" format specifier expects an argument of type unsigned int *. An int * was passed instead, which, I believe, invokes undefined behaviour. Variadic functions (such as sscanf()) have unusual ways of passing their arguments, and the type of each argument is required to match its format specifier.
However, the two types are similar enough that it will probably work just fine in practice.
It also accepts signed hexadecimal numbers (with a + or - character), which is probably not what the author intended.
For example, "%-ffText" would result in code == -15.
The src pointer is advanced by 2 bytes, but sscanf() doesn't necessarily consume 2 characters.
"%fText" would result in code == 15, and consume only one character (other than the % character). The "%-ffText" example above would consume 3 characters.
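A stricter decoder can avoid all three problems by checking for exactly two hexadecimal digits itself instead of relying on sscanf(). The following is only a sketch: the names url_decode and hex_value are made up for illustration, and copying a malformed % sequence through unchanged is an assumption about the desired behaviour (the original code substitutes '?' instead).

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: numeric value of one hex digit.
 * The caller must have checked isxdigit(c) already. */
static int hex_value(unsigned char c)
{
    if (isdigit(c))
        return c - '0';
    return tolower(c) - 'a' + 10;
}

/* Decodes the bytes in [src, last) into dest and NUL-terminates it.
 * dest needs room for (last - src) + 1 bytes.
 * Returns the number of decoded bytes, not counting the NUL. */
size_t url_decode(const char *src, const char *last, char *dest)
{
    char *out = dest;

    while (src != last) {
        if (*src == '+') {
            *out++ = ' ';
            src++;
        } else if (*src == '%' && last - src >= 3 &&
                   isxdigit((unsigned char)src[1]) &&
                   isxdigit((unsigned char)src[2])) {
            /* Exactly two hex digits follow: no sign, no short match. */
            *out++ = (char)(hex_value((unsigned char)src[1]) * 16 +
                            hex_value((unsigned char)src[2]));
            src += 3;
        } else {
            /* Ordinary byte, or a malformed '%' copied literally
             * (an assumption, see above). */
            *out++ = *src++;
        }
    }
    *out = '\0';
    return (size_t)(out - dest);
}
```

With this version, "%21" decodes to '!', while "%-ff" and "%f" followed by a non-hex character are left untouched because they are not two-digit escapes.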

The sscanf function translates 2 hex chars into a single int value. This value is equal to the ASCII value, so it is stored in 'dest'. Since 2 chars have been decoded, src has to advance two positions.
So '%21' -> 0x21 -> char '!'

Related

Convert a string to the character that it represents

const char *lex1 = "\'a\'"; // prints 'a'
const char *lex2 = "\'\\t\'"; // prints '\t'
const char *lex3 = "\'0\'"; // prints '0'
How would I convert from a char * to the representative char? For the above example, the conversion is as follows:
char lex1_converted = 'a';
char lex2_converted = '\t';
char lex3_converted = '0';
I am currently thinking of chains of if-else blocks, but there might be a better approach.
This is a homework problem related to lexical analysis with flex. Please provide the minimum hint required to comprehend the solution. Thanks!
Edit: Looks like I misinterpreted the tokenized strings in my first post :)

const char* lex1="your sample text";
char lex1_converted=*lex1;
This code converts the first char of lex1. To convert the char at another position, do this:
char lex1_converted=*(lex1+your_desired_index);
//example
char lex1_converted=*(lex1+1);
Another way is:
char lex1_converted=lex1[your_desired_index];

Please provide the minimum hint required to comprehend the solution.
A key attribute of the conversion is to detect that the source string may lack the expected format.
Below is some untested sample code to get OP started.
Sorry, did not see "homework problem related to lexical analysis with flex" until late.
Non-flex, C only approach:
Pseudo code
// Return 0-255 on success
// Else return -1.
int swamp_convert(s)
Find string length
If length too small or string missing the two '
return fail
if s[1] char is not \
if length is 3, return middle char
return fail
if length is 4 & character after \ is a common escaped character, (search a string)
return corresponding value (table lookup)
Maybe consider octal, hexadecimal, other escaped forms
else return fail
#include <string.h>
// Return 0-255 on success
// Else return -1.
int swamp_convert(const char *s) {
size_t len = strlen(s);
if (len < 3 || s[0] != '\'' || s[len - 1] != '\'') {
return -1;
}
if (s[1] != '\\') {
return (len == 3) ? s[1] : -1;
}
// Common escaped single characters
const char *escape_chars = "abtnvfr\\\'\"";
const char *e = strchr(escape_chars, s[2]);
if (e && len == 4) {
return "\a\b\t\n\v\f\r\\\'\""[e - escape_chars];
}
// Additional code to handle \0-7, \x, \X, maybe \e ...
return -1;
}
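The "additional code" for numeric escapes could look roughly like the sketch below. The name numeric_escape is hypothetical, and limiting hex escapes to two digits is an assumption made to keep the result within 0-255 (C itself allows more hex digits). It handles only the numeric forms, so it would be called after the table lookup above fails.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: value of a quoted numeric escape such as
 * "'\\101'" (octal) or "'\\x41'" (hex), or -1 if s is not one. */
int numeric_escape(const char *s)
{
    size_t len = strlen(s);
    if (len < 4 || s[0] != '\'' || s[1] != '\\' || s[len - 1] != '\'')
        return -1;

    const char *p = s + 2;          /* first character after the backslash */
    const char *end = s + len - 1;  /* the closing quote */
    int base = 8;
    int max_digits = 3;             /* an octal escape has at most 3 digits */
    if (*p == 'x') {
        base = 16;
        max_digits = 2;             /* assumption: two hex digits suffice */
        p++;
    }
    if (p == end || end - p > max_digits)
        return -1;

    int value = 0;
    for (; p < end; p++) {
        unsigned char c = (unsigned char)*p;
        int digit;
        if (c >= '0' && c <= '7')
            digit = c - '0';
        else if (base == 16 && isdigit(c))
            digit = c - '0';
        else if (base == 16 && isxdigit(c))
            digit = tolower(c) - 'a' + 10;
        else
            return -1;              /* not a digit in this base */
        value = value * base + digit;
    }
    return value <= 255 ? value : -1;
}
```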

C if statement, optimal way to check for special characters and letters

Hi folks, thanks in advance for any help. I'm doing the CS50 course and I'm at the very beginning of programming.
I'm trying to check whether the string from the main function parameter string argv[] is indeed a number. I searched for several ways to do it.
In another topic, "How can I check if a string has special characters in C++ effectively?", I found this in the solution posted by the user Jerry Coffin:
char junk;
if (sscanf(str, "%*[A-Za-z0-9_]%c", &junk))
/* it has at least one "special" character */
else
/* no special characters */
It seems to me it may work for what I'm trying to do, but I'm not familiar with the sscanf function and I'm having a hard time integrating and adapting it to my code. I got this far, and I can't understand the logic of my mistake:
#include <cs50.h>
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>
int numCheck(string[]);
int main(int argc, string argv[]) {
//Function to check for user "cooperation"
int key = numCheck(argv);
}
int numCheck(string input[]) {
int i = 0;
char junk;
bool usrCooperation = true;
//check for user "cooperation" check that key isn't a letter or special sign
while (input[i] != NULL) {
if (sscanf(*input, "%*[A-Za-z_]%c", &junk)) {
printf("test fail");
usrCooperation = false;
} else {
printf("test pass");
}
i++;
}
return 0;
}

check if the string from the main function parameter string argv[] is indeed a number
A direct way to test if the string converts to an int is to use strtol(). This nicely handles "123", "-123", "+123", "1234567890123", "x", "123x", "".
#include <errno.h>   // errno
#include <limits.h>  // INT_MAX, INT_MIN
#include <stdlib.h>  // strtol
int numCheck(const char *s) {
char *endptr;
errno = 0; // Clear error indicator
long num = strtol(s, &endptr, 0);
if (s == endptr) return 0; // no conversion
if (*endptr) return 0; // Junk after the number
if (errno) return 0; // Overflow
if (num > INT_MAX || num < INT_MIN) return 0; // int Overflow
return 1; // Success
}
int main(int argc, string argv[]) {
// Call each arg[] starting with `argv[1]`
for (int a = 1; a < argc; a++) {
int success = numCheck(argv[a]);
printf("test %s\n", success ? "pass" : "fail");
}
}
sscanf(*input, "%*[A-Za-z_]%c", &junk) is the wrong approach for testing numerical conversion.

You pass argv to numcheck and test all strings in it: this is incorrect as argv[0] is the name of the running executable, so you should skip this argument. Note also that you should pass input[i] to sscanf(), not *input.
Furthermore, let's analyze the return value of sscanf(input[i], "%*[A-Za-z_]%c", &junk):
it returns EOF if the input string is empty,
it returns 0 if %*[A-Za-z_] fails,
it also returns 0 if the conversion %c fails after the %*[A-Za-z_] succeeds,
it returns 1 if both conversions succeed.
This test is insufficient to check for non digits in the string, it does not actually give useful information: the return value will be 0 for the string "1" and also for the string "a"...
sscanf() is very tricky, full of quirks and traps. Definitely not the right tool for pattern matching.
If the goal is to check that the strings contain only digits (at least one), use this instead, using the often overlooked standard function strspn():
#include <stdio.h>
#include <string.h>
int numCheck(char *input[]) {
int i;
int usrCooperation = 1;
//check for user "cooperation" check that key isn't a letter or special sign
for (i = 1; input[i] != NULL; i++) {
// count the number of matching character at the beginning of the string
int ndigits = strspn(input[i], "0123456789");
// check for at least 1 digit and no characters after the digits
if (ndigits > 0 && input[i][ndigits] == '\0') {
printf("test passes: %d digits\n", ndigits);
} else {
printf("test fails\n");
usrCooperation = 0;
}
}
return usrCooperation;
}

Let's try this again:
This is still your problem:
if (sscanf(*input, "%*[A-Za-z_]%c", &junk))
but not for the reason I originally said - *input is equal to input[0]. What you want to have there is
if ( sscanf( input[i], "%*[A-Za-z_]%c", &junk ) )
what you're doing is cycling through all your command line arguments in the while loop:
while( input[i] != NULL )
but you're only actually testing input[0].
So, quick primer on sscanf:
The first argument (input) is the string you're scanning. The type of this argument needs to be char * (pointer to char). The string typedef name is an alias for char *. CS50 tries to paper over the grosser parts of C string handling and I/O and the string typedef is part of that, but it's unique to the CS50 course and not a part of the language. Beware.
The second argument is the format string. %[ and %c are format specifiers and tell sscanf what you're looking for in the string. %[ specifies a set of characters called a scanset - %[A-Za-z_] means "match any sequence of upper- and lowercase letters and underscores". The * in %*[A-Za-z_] means don't assign the result of the scan to an argument. %c matches any character.
Remaining arguments are the input items you want to store, and their type must match up with the format specifier. %[ expects its corresponding argument to have type char * and be the address of an array into which the input will be stored. %c expects its corresponding argument (in this case junk) to also have type char *, but it's expecting the address of a single char object.
sscanf returns the number of items successfully read and assigned - in this case, you're expecting the return value to be either 0 or 1 (because only junk gets assigned to).
Putting it all together,
sscanf( input, "%*[A-Za-z_]%c", &junk )
will read and discard characters from input up until it either sees the string terminator or a character that is not part of the scanset. If it sees a character that is not part of the scanset (such as a digit), that character gets written to junk and sscanf returns 1, which in this context is treated as "true". If it doesn't see any characters outside of the scanset, then nothing gets written to junk and sscanf returns 0, which is treated as "false".
EDIT
So, chqrlie pointed out a big error of mine - this test won't work as intended.
If there are no non-letter and non-underscore characters in input[i], then nothing gets assigned to junk and sscanf returns 0 (nothing assigned). If input[i] starts with a letter or underscore but contains a non-letter or non-underscore character later on, that bad character will be converted and assigned to junk and sscanf will return 1.
So far so good, that's what you want to happen. But...
If input[i] starts with a non-letter or non-underscore character, then you have a matching failure and sscanf bails out, returning 0. So it will erroneously match a bad input.
Frankly, this is not a very good way to test for the presence of "bad" characters.
A potentially better way would be to use something like this:
while ( input[i] )
{
bool good = true;
/**
* Cycle through each character in input[i] and
* check to see if it's a letter or an underscore;
* if it isn't, we set good to false and break out of
* the loop.
*/
for ( char *c = input[i]; *c; c++ )
{
if ( !isalpha( *c ) && *c != '_' )
{
good = false;
break;
}
}
if ( !good )
{
puts( "test fails" );
usrCooperation = 0;
}
else
{
puts( "test passes" );
}
i++; // advance to the next argument, otherwise the loop never ends
}

I followed the solution by the user "chux - Reinstate Monica". Thanks, everybody, for helping me solve this problem. Here is my final program; maybe it can help another learner in the future. I decided to avoid using the non-standard library "cs50.h".
//#include <cs50.h>
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <limits.h>
void keyCheck(int);
int numCheck(char*);
int main(int argc, char* argv[])
{
//Error code == 1;
int key = 0;
keyCheck(argc); //check that two parameters were sent to main.
key = numCheck(argv[1]); //Check for user "cooperation".
return 0;
}
//check that main received two parameters.
void keyCheck(int key)
{
if (key != 2) //check that main argc only has two parameter. if not terminate program.
{
exit(1);
}
}
//check that the key (main parameter (argv [])) is a valid number.
int numCheck(char* input)
{
char* endptr;
errno = 0;
long num = strtol(input, &endptr, 0);
if (input == endptr) //no conversion is possible.
{
printf("Error: No conversion possible");
return 1;
}
else if (errno == ERANGE) //Input out of range
{
printf("Error: Input out of range");
return 1;
}
else if (*endptr) //Junk after numeric text
{
printf("Error: data after main parameter");
return 1;
}
else //conversion successful
{
//verify that the long int is in the integer limits.
if (num >= INT_MIN && num <= INT_MAX)
{
return num;
}
//if the main parameter is bigger than an int, terminate program
else
{
printf("Error key out of integer limits");
exit(1);
}
}
/* else
{
printf("Success: %ld", num);
return num;
} */
}

how to cut out a chinese words & english words mixture string in c language

I have a string that contains both Mandarin and English words in UTF-8:
char *str = "你a好测b试";
If you use strlen(str), it will return 14, because each Mandarin character uses three bytes, while each English character uses only one byte.
Now I want to copy the leftmost 4 characters ("你a好测"), and append "..." at the end, to give "你a好测...".
If the text were in a single-byte encoding, I could just write:
strncpy(buf, str, 4);
strcat(buf, "...");
But 4 characters in UTF-8 isn't necessarily 4 bytes. For this example, it will be 10 bytes: three each for 你, 好 and 测 and one for a. So, for this specific case, I would need
strncpy(buf, str, 10);
strcat(buf, "...");
If I had a wrong value for the length, I could produce a broken UTF-8 stream with an incomplete character. Obviously I want to avoid that.
How can I compute the right number of bytes to copy, corresponding to a given number of characters?

First you need to know your encoding. By the sound of it (3 byte Mandarin) your string is encoded with UTF-8.
What you need to do is convert the UTF-8 back to Unicode code points (integers). You can then have an array of integers rather than bytes, so each element of the array will be 1 character, regardless of the language.
You could also use a library of functions that already handle utf8 such as http://www.cprogramming.com/tutorial/utf8.c
http://www.cprogramming.com/tutorial/utf8.h
In particular this function: int u8_toucs(u_int32_t *dest, int sz, char *src, int srcsz); might be very useful, it will create an array of integers, with each integer being 1 character. You can then modify the array as you see fit, then convert it back again with int u8_toutf8(char *dest, int sz, u_int32_t *src, int srcsz);
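If pulling in a library is not an option, the decoding step itself is small. Here is a minimal sketch, assuming NUL-terminated input; the name utf8_to_codepoints is hypothetical and is not part of the utf8.c library linked above. It checks the shape of each sequence but does not reject overlong encodings or surrogates.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decodes up to max code points from the
 * NUL-terminated UTF-8 string s into out.
 * Returns the number of code points stored, or (size_t)-1 if a
 * malformed or truncated sequence is found. */
size_t utf8_to_codepoints(const char *s, uint32_t *out, size_t max)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;

    while (*p && n < max) {
        uint32_t cp;
        int extra;  /* number of continuation bytes expected */

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return (size_t)-1;       /* stray continuation or invalid byte */
        p++;

        while (extra-- > 0) {
            if ((*p & 0xC0) != 0x80)  /* also catches a premature NUL */
                return (size_t)-1;
            cp = (cp << 6) | (*p & 0x3F);
            p++;
        }
        out[n++] = cp;
    }
    return n;
}
```

Once the string is an array of code points, taking the leftmost 4 characters is just taking the first 4 array elements before converting back.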

I would recommend dealing with this at a higher level of abstraction: either convert to wchar_t or use a UTF-8 library. But if you really want to do it at the byte level, you could count characters by skipping over the continuation bytes (which are of the form 10xxxxxx):
#include <stddef.h>
size_t count_bytes_for_chars(const char *s, int n)
{
const char *p = s;
n += 1; /* we're counting up to the start of the subsequent character */
while (*p && (n -= (*p & 0xc0) != 0x80))
++p;
return p-s;
}
Here's a demonstration of the above function:
#include <string.h>
#include <stdio.h>
int main()
{
const char *str = "你a好测b试";
char buf[50];
int truncate_at = 4;
size_t bytes = count_bytes_for_chars(str, truncate_at);
strncpy(buf, str, bytes);
strcpy(buf+bytes, "...");
printf("'%s' truncated to %d characters is '%s'\n", str, truncate_at, buf);
}
Output:
'你a好测b试' truncated to 4 characters is '你a好测...'

The Basic Multilingual Plane was designed to contain characters for almost all modern languages. In particular, it does contain Chinese.
So you just have to convert your UTF8 string to a UTF16 one to have each character using one single position. That means that you can just use a wchar_t array or even better a wstring to be allowed to use natively all string functions.
Starting with C++11, the <codecvt> header declares a dedicated converter std::codecvt_utf8 to specifically convert UTF8 narrow strings to wide Unicode ones. I must admit it is not very easy to use, but it should be enough here. Code could be like:
char str[] = "你a好测b试";
std::codecvt_utf8<wchar_t> cvt;
std::mbstate_t state = std::mbstate_t();
wchar_t wstr[sizeof(str)] = {0}; // there will be unused space at the end
const char *end;
wchar_t *wend;
auto cr = cvt.in(state, str, str+sizeof(str), end,
wstr, wstr+sizeof(str), wend);
*wend = 0;
Once you have the wstr wide string, you can convert it to a wstring and use all the C++ library tools, or if you prefer C strings you can use the ws... counterparts of the str... functions.

Pure C solution:
Every byte of a multibyte UTF8 character has its most significant bit set to 1, and the leading bits of the first byte indicate how many bytes make up the codepoint.
The question is ambiguous with regard to the criterion used for cutting; either:
a fixed number of codepoints followed by three dots, which will require a variable-size output buffer
a fixed-size output buffer, which will impose "whatever you can fit inside"
Both solutions will require a helper function telling how many chars make up the next codepoint:
// Note: the function does NOT fully validate a
// UTF8 sequence, only looks at the first char in it
int codePointLen(const char* c) {
if(NULL==c) return -1;
if( (*c & 0xF8)==0xF0 ) return 4; // 4 ones and one 0
if( (*c & 0xF0)==0xE0 ) return 3; // 3 ones and one 0
if( (*c & 0xE0)==0xC0 ) return 2; // 2 ones and one 0
if( (*c & 0x7F)==*c ) return 1; // no ones on msb
return -2; // invalid UTF8 starting character
}
So, solution for the criterion 1 (fixed number of code points, variable output buff size) - does not append ... to the destination, but you can ask "how many chars I need" upfront and if it is longer than you can afford, reserve yourself the extra space.
// returns the number of chars used from the output
// If not enough space or the dest is null, does nothing
// and returns the length required for the output buffer
// Returns negative val if the source is not a valid UTF8
int copyFirstCodepoints(
int codepointsCount, const char* src,
char* dest, int destSize
) {
if(NULL==src) {
return -1;
}
// do a cold run to see if size of the output buffer can fit
// as many codepoints as required
const char* walker=src;
for(int cnvCount=0; *walker && cnvCount<codepointsCount; cnvCount++) {
int chCount=codePointLen(walker);
if(chCount<0) {
return chCount; // err
}
walker+=chCount;
}
if(walker-src < destSize && NULL!=dest) {
// enough space at destination (including the terminating NUL)
strncpy(dest, src, walker-src);
dest[walker-src]='\0';
}
// else do nothing
return walker-src;
}
Second criterion (limited buffer size): just use the first one with the number of codepoints returned by this one
// returns how many whole codepoints fit in an output buffer of
// maxBufflen bytes (one byte is kept for the terminating NUL)
// returns negative if UTF encoding error
int howManyCodepointICanFitInOutputBufferOfLen(const char* src, int maxBufflen) {
if(NULL==src) {
return -1;
}
int ret=0;
const char* walker=src;
while(*walker) {
int advance=codePointLen(walker);
if(advance<0) {
return advance; // err: invalid starting byte
}
// look at the chars after the first one in this codepoint
// if any is 0, we have a premature end of the source
for(int k=1; k<advance; k++) {
if(0==walker[k]) {
return -3; // err: truncated sequence
}
}
if((walker-src)+advance >= maxBufflen) {
break; // this codepoint would no longer fit
}
walker+=advance; // walker is set on the correct position for the next attempt
ret++;
}
return ret;
}

static char *CutStringLength(char *lpszData, int nMaxLen)
{
if (NULL == lpszData || 0 >= nMaxLen)
{
return "";
}
int len = strlen(lpszData);
if(len <= nMaxLen)
{
return lpszData;
}
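static char strTemp[1024]; // static, because a pointer to it is returned
strncpy(strTemp, lpszData, sizeof(strTemp)-1);
strTemp[sizeof(strTemp)-1] = '\0';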
char *p = strTemp;
p = p + (nMaxLen-1);
if ((unsigned char)(*p) < 0xA0)
{
*(++p) = '\0'; // if the last byte is Mandarin character
}
else if ((unsigned char)(*(--p)) < 0xA0)
{
*(++p) = '\0'; // if the last but one byte is Mandarin character
}
else if ((unsigned char)(*(--p)) < 0xA0)
{
*(++p) = '\0'; // if the last but two byte is Mandarin character
}
else
{
int i = 0;
p = strTemp;
while(*p != '\0' && i+2 <= nMaxLen)
{
if((unsigned char)(*p++) >= 0xA0 && (unsigned char)(*p) >= 0xA0)
{
p++;
i++;
}
i++;
}
*p = '\0';
}
printf("str = %s\n",strTemp);
return strTemp;
}

Rotate words around vowels in C

I am trying to write a program that reads the stdin stream looking for words (consecutive alphabetic characters) and for each word rotates it left to the first vowel (e.g. "friend" rotates to "iendfr") and writes this sequence out in place of the original word. All other characters are written to stdout unchanged.
So far, I have managed to reverse the letters, but have been unable to do much more. Any suggestions?
#include <stdio.h>
#include <ctype.h>
#include <string.h>
#define MAX_STK_SIZE 256
char stk[MAX_STK_SIZE];
int tos = 0; // next available place to put char
void push(int c) {
if (tos >= MAX_STK_SIZE) return;
stk[tos++] = c;
}
void putStk() {
while (tos >= 0) {
putchar(stk[--tos]);
}
}
int main (int charc, char * argv[]) {
int c;
do {
c = getchar();
if (isalpha(c) && (c == 'a' || c == 'A' || c == 'e' || c == 'E' || c == 'i' || c == 'o' || c == 'O' || c == 'u' || c == 'U')) {
push(c);
} else if (isalpha(c)) {
push(c);
} else {
putStk();
putchar(c);
}
} while (c != EOF);
}

I am not going to write the whole program for you, but this example shows how to rotate a word from the first vowel (if any). The function strcspn returns the index of the first character matching any in the set passed, or the length of the string if no matches are found.
#include <stdio.h>
#include <string.h>
void vowelword(const char *word)
{
size_t len = strlen(word);
size_t index = strcspn(word, "aeiou");
size_t i;
for(i = 0; i < len; i++) {
printf("%c", word[(index + i) % len]);
}
printf("\n");
}
int main(void)
{
vowelword("friend");
vowelword("vwxyz");
vowelword("aeiou");
return 0;
}
Program output:
iendfr
vwxyz
aeiou

There are a number of ways you can approach the problem. You can use a stack, but that just adds the overhead of handling the stack operations. You can use mathematical reindexing, or you can use a copy-and-fill solution where you copy from the first vowel to a new string and then simply add the initial characters to the end of the string.
While you can read/write a character at a time, you are probably better served by creating the rotated string in a buffer to allow use of the string within your code. Regardless which method you use, you need to validate all string operations to prevent reading/writing beyond the end of your input and/or rotated strings. An example of a copy/fill approach to rotating to the first vowel in your input could be something like the following:
/* rotate 's' from first vowel with results to 'rs'.
* if 's' contains a vowel, 'rs' contains the rotated string,
* otherwise, 'rs' contains 's'. a pointer to 'rs' is returned
* on success, NULL otherwise and 'rs' is an empty-string.
*/
char *rot2vowel (char *rs, const char *s, size_t max)
{
if (!rs || !s || !max) /* validate params */
return NULL;
char *p = strpbrk (s, "aeiou");
size_t i, idx, len = strlen (s);
if (len > max - 1) { /* validate length */
fprintf (stderr, "error: insufficient storage (len > max - 1).\n");
return NULL;
}
if (!p) { /* if no vowel, copy s to rs, return rs */
strcpy (rs, s);
return rs;
}
idx = p - s; /* set index offset */
strcpy (rs, p); /* copy from 1st vowel */
for (i = 0; i < idx; i++) /* rotate beginning to end */
rs[i+len-idx] = s[i];
rs[len] = 0; /* nul-terminate */
return rs;
}
Above, strpbrk is used to return a pointer to the first occurrence of a vowel in string 's'. The function takes as parameters a pointer to an adequately sized string to hold the rotated string 'rs', the input string 's' and the allocated size of 'rs' in 'max'. The parameters are validated and s is checked for a vowel with strpbrk, which returns a pointer to the first vowel in s (if it exists), NULL otherwise. The length is checked against max to ensure adequate storage.
If no vowels are present, s is copied to rs and a pointer to rs returned, otherwise the pointer difference is used to set the offset index to the first vowel, the segment of the string from the first vowel-to-end is copied to rs and then the preceding characters are copied to the end of rs with the loop. rs is nul-terminated and a pointer is returned.
While I rarely recommend the use of scanf for input, (a fgets followed by sscanf or strtok is preferable), for purposes of a short example, it can be used to read individual strings from stdin. Note: responding to upper/lower case vowels is left to you. A short example setting the max word size to 32-chars (31-chars + the nul-terminating char) will work for all known words in the unabridged dictionary (longest word is 28-chars):
#include <stdio.h>
#include <string.h>
enum { BUFSZ = 32 };
char *rot2vowel (char *rs, const char *s, size_t max);
int main (void)
{
char str[BUFSZ] = {0};
char rstr[BUFSZ] = {0};
while (scanf ("%31s", str) == 1) /* field width keeps input within BUFSZ */
printf (" %-8s => %s\n", str, rot2vowel (rstr, str, sizeof rstr));
return 0;
}
Example Use/Output
(shamelessly borrowing the example strings from WeatherVane :)
$ echo "friend vwxyz aeiou" | ./bin/str_rot2vowel
friend => iendfr
vwxyz => vwxyz
aeiou => aeiou
Look it over and let me know if you have any questions. Note: you can call the rot2vowel function prior to the printf statement and print the results with rstr, but since the function returns a pointer to the string, it can be used directly in the printf statement. How you use it is up to you.

Parsing text in C

I have a file like this:
...
words 13
more words 21
even more words 4
...
(General format is a string of non-digits, then a space, then any number of digits and a newline)
and I'd like to parse every line, putting the words into one field of the structure, and the number into the other. Right now I am using an ugly hack of reading the line while the chars are not numbers, then reading the rest. I believe there's a clearer way.

Edit: You can use pNum-buf to get the length of the alphabetical part of the string, and use strncpy() to copy that into another buffer. Be sure to add a '\0' to the end of the destination buffer. I would insert this code before the pNum++.
int len = pNum-buf;
strncpy(newBuf, buf, len);
newBuf[len] = '\0';
You could read the entire line into a buffer and then use:
char *pNum;
if (pNum = strrchr(buf, ' ')) {
pNum++;
}
to get a pointer to the number field.

fscanf(file, "%s %d", word, &value);
This gets the values directly into a string and an integer, and copes with variations in whitespace and numerical formats, etc.
Edit
Ooops, I forgot that you had spaces between the words.
In that case, I'd do the following. (Note that it truncates the original text in 'line')
// Scan to find the last space in the line
char *p = line;
char *lastSpace = NULL;
while(*p != '\0')
{
if (*p == ' ')
lastSpace = p;
p++;
}
if (lastSpace == NULL)
return("parse error");
// Replace the last space in the line with a NUL
*lastSpace = '\0';
// Advance past the NUL to the first character of the number field
lastSpace++;
char *word = line;
int number = atoi(lastSpace);
You can solve this using stdlib functions, but the above is likely to be more efficient as you're only searching for the characters you are interested in.

Given the description, I think I'd use a variant of this (now tested) C99 code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>
struct word_number
{
char word[128];
long number;
};
int read_word_number(FILE *fp, struct word_number *wnp)
{
char buffer[140];
if (fgets(buffer, sizeof(buffer), fp) == 0)
return EOF;
size_t len = strlen(buffer);
if (buffer[len-1] != '\n') // Error if line too long to fit
return EOF;
buffer[--len] = '\0';
char *num = &buffer[len-1];
while (num > buffer && !isspace((unsigned char)*num))
num--;
if (num == buffer) // No space in input data
return EOF;
char *end;
wnp->number = strtol(num+1, &end, 0);
if (*end != '\0') // Invalid number as last word on line
return EOF;
*num = '\0';
if (num - buffer >= sizeof(wnp->word)) // Non-number part too long
return EOF;
memcpy(wnp->word, buffer, num - buffer + 1); // +1 copies the terminating NUL
return(0);
}
int main(void)
{
struct word_number wn;
while (read_word_number(stdin, &wn) != EOF)
printf("Word <<%s>> Number %ld\n", wn.word, wn.number);
return(0);
}
You could improve the error reporting by returning different values for different problems.
You could make it work with dynamically allocated memory for the word portion of the lines.
You could make it work with longer lines than I allow.
You could scan backwards over digits instead of non-spaces - but this allows the user to write "abc 0x123" and the hex value is handled correctly.
You might prefer to ensure there are no digits in the word part; this code does not care.

You could try using strtok() to tokenize each line, and then check whether each token is a number or a word (a fairly trivial check once you have the token string - just look at the first character of the token).

Assuming that the number is immediately followed by '\n',
you can read each line into a char buffer, use sscanf() with "%d" to get the number, and then calculate how many chars that number takes up at the end of the text string.
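One way to make the sscanf idea concrete is to let a scanset consume the non-digit part first, so that "%d" starts at the number. This is a sketch under the question's stated format (non-digits, a space, digits, newline); the name split_line and the 128-byte word size are assumptions, not something from the answers above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: splits "some words 42\n" into word and number.
 * word must have room for at least 128 bytes.
 * Returns 1 on success, 0 if the line does not match the format. */
int split_line(const char *line, char *word, int *number)
{
    /* %127[^0-9] stores everything up to the first digit,
     * then %d reads the number that follows. */
    if (sscanf(line, "%127[^0-9]%d", word, number) != 2)
        return 0;

    /* Drop the space(s) that separated the words from the number. */
    size_t len = strlen(word);
    while (len > 0 && word[len - 1] == ' ')
        word[--len] = '\0';

    return len > 0;
}
```

Because the scanset stops at the first digit, this only works when the word part never contains digits, which is what the question describes.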

Depending on how complex your strings become you may want to use the PCRE library. At least that way you can compile a perl'ish regular expression to split your lines. It may be overkill though.

Given the description, here's what I'd do: read each line as a single string using fgets() (making sure the target buffer is large enough), then split the line using strtok(). To determine if each token is a word or a number, I'd use strtol() to attempt the conversion and check the error condition. Example:
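#include <stdlib.h>
#include <stdio.h>
#include <string.h>
/* Illustrative limits; adjust them to your data */
#define MAX_LINE_LENGTH 256
#define MAX_STR_SIZE 64
#define MAX_NUM_STRINGS 16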
/**
* Read the next line from the file, splitting the tokens into
* multiple strings and a single integer. Assumes input lines
* never exceed MAX_LINE_LENGTH and each individual string never
* exceeds MAX_STR_SIZE. Otherwise things get a little more
* interesting. Also assumes that the integer is the last
* thing on each line.
*/
int getNextLine(FILE *in, char (*strs)[MAX_STR_SIZE], int *numStrings, int *value)
{
char buffer[MAX_LINE_LENGTH];
int rval = 1;
if (fgets(buffer, sizeof buffer, in))
{
char *token = strtok(buffer, " ");
*numStrings = 0;
while (token)
{
char *chk;
*value = (int) strtol(token, &chk, 10);
if (*chk != 0 && *chk != '\n')
{
strcpy(strs[(*numStrings)++], token);
}
token = strtok(NULL, " ");
}
}
else
{
/**
* fgets() hit either EOF or error; either way return 0
*/
rval = 0;
}
return rval;
}
/**
* sample main
*/
int main(void)
{
FILE *input;
char strings[MAX_NUM_STRINGS][MAX_STR_SIZE];
int numStrings;
int value;
input = fopen("datafile.txt", "r");
if (input)
{
while (getNextLine(input, strings, &numStrings, &value))
{
/**
* Do something with strings and value here
*/
}
fclose(input);
}
return 0;
}
