I have a text file with thousands of lines, and the length of each line varies. The file mainly contains hex data in bytes. For example:
01 01 04 03 = 4 bytes.
Second line might contain 8 bytes, 3rd 40 bytes and so on. There are thousands of such lines.
Now I want to read these bytes into an int buffer. I am reading into a char buffer, so in memory the data is stored as ASCII: 3031 3031 3034 3033. I am converting this to 0001 0001 0004 0003, but that is still 8 bytes, which I do not want.
Below is my code:
FILE *file;
char buffer[100] = { '\0' };
char line[100] = { '0' };
if(file!=NULL)
{
while(fgets(line, sizeof(line), file)!=NULL)
{
for(i = 0; (line[i] != '\r') ; i++)
{
buffer[i] = line[i];
}
}
}
I want to read the file line by line, not all at once. In memory I want to see just 01 01 04 03. I guess using an int buffer will help, but as soon as a line is read into the buffer, it is stored as char. Any suggestions?
I would read in a line, then use strtol to convert the individual numbers in the input. strtol gives you a pointer to the character at which the conversion failed, which you can use as a starting point to find/convert the next number.
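A minimal sketch of that approach, assuming each line holds whitespace-separated hex values such as "01 01 04 03" and that each value should be stored as one byte; the name parse_hex_line is hypothetical. strtol skips leading white space and reports through its end pointer where it stopped; when it converts nothing, the end pointer equals the start pointer, which ends the loop. Note that strtol also accepts an optional sign and a 0x prefix.

```c
#include <stdlib.h>

/* Hypothetical helper: parse every hex number on one text line, e.g.
 * "01 01 04 03\n", into 'out', storing at most 'max' values.
 * Returns how many values were stored. */
size_t parse_hex_line(const char *line, unsigned char *out, size_t max)
{
    size_t count = 0;
    const char *p = line;

    while (count < max) {
        char *end;
        long value = strtol(p, &end, 16);   /* skips leading white space */
        if (end == p)                       /* nothing converted: end of line */
            break;
        out[count++] = (unsigned char)value;
        p = end;                            /* the next number starts here */
    }
    return count;
}

/* Usage, reading line by line:
 *   char line[256];
 *   unsigned char bytes[128];
 *   while (fgets(line, sizeof line, fp) != NULL) {
 *       size_t n = parse_hex_line(line, bytes, sizeof bytes);
 *       ... for the example line, n is 4 and bytes holds 0x01 0x01 0x04 0x03 ...
 *   }
 */
```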
You can convert small hex numbers:
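#include <ctype.h>
#include <stdint.h>

/* Convert the run of hex digits starting at 'digits' into a byte value. */
uint8_t digits2hex(const char *digits) {
    uint8_t r = 0;
    while (isxdigit((unsigned char)*digits)) {
        int c = tolower((unsigned char)*digits);
        r = r * 16 + (isdigit(c) ? c - '0' : c - 'a' + 10);
        digits++;
        /* check size? more than 2 digits overflows a uint8_t */
    }
    return r;
}
/* ... */
i = 0;
while (line[i] != '\0')
{
    if (isxdigit((unsigned char)line[i])) {
        hexnumbers[hexcount++] = digits2hex(line + i);
        /* skip the digits just converted */
        while (isxdigit((unsigned char)line[i]))
            i++;
    } else {
        i++; /* skip white space (including '\r' and '\n') and anything else */
    }
}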
You seem to be confusing the textual representation of a byte with the value of the byte (or alternately, expecting your compiler to do more than it does.)
When your program reads in "01", it is reading in two bytes whose values correspond to the ASCII codes for the characters "0" and "1". C doesn't do anything special with them, so you need to convert this sequence into a one-byte value. Note that a C char is one byte and so is the right size to hold this result. This is a coincidence and in any case is not true for Unicode and other wide character encodings.
There are several ways to do this conversion. You can do arithmetic on the bytes yourself like this:
unsigned char charToHex(char c) {
    if (isdigit((unsigned char)c)) return c - '0';
    return 10 + toupper((unsigned char)c) - 'A';   // 'A'/'a' -> 10 ... 'F'/'f' -> 15
}
...
first = getc(fh);
second = getc(fh);
buffer[*end] = charToHex(first) << 4 | charToHex(second);
(Note that I'm using getc() to read the characters instead of fgets(). I'll go into that later.)
Note also that 'first' is the most significant half-byte of the input.
You can also (re)create a string from the two bytes and call strtol on it:
char digits[3];
digits[0] = first;
digits[1] = second;
digits[2] = '\0'; // null-terminator
buffer[*end] = (unsigned char)strtol(digits, NULL, 16);
Related to this, you'd probably have better luck using getc() to read in the file one character at a time, ignoring anything that isn't a hex digit. That way, you won't get a buffer overflow if an input line is longer than the buffer you pass to fgets(). It also makes it easier to tolerate garbage in the input file.
Here's a complete example of this. It uses isxdigit() to detect hex characters and ignores anything else including single hex digits:
#include <stdio.h>
#include <ctype.h>

// Given a single hex digit, return its numeric value
unsigned char charToHex(char c) {
    if (isdigit((unsigned char)c)) return c - '0';
    return 10 + toupper((unsigned char)c) - 'A';
}

// Read in file 'fh' and for each pair of hex digits found, append
// the corresponding value to 'buffer'. On return, '*end' is the index
// one past the last byte written (the number of bytes stored, if it
// started at 0). 'buffer' is assumed to have enough space.
void readBuffer(FILE *fh, unsigned char buffer[], size_t *end) {
    for (;;) {
        // Advance to the next hex digit in the stream.
        int first;
        do {
            first = getc(fh);
            if (first == EOF) return;
        } while (!isxdigit(first));
        int second;
        second = getc(fh);
        // Ignore any single hex digits
        if (!isxdigit(second)) continue;
        // Compute the hex value and append it to the array.
        buffer[*end] = charToHex(first) << 4 | charToHex(second);
        (*end)++;
    }
}
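#include <stdint.h>
FILE *fp = ...;
uint32_t buffer[1024]; /* enough memory */
int r_pos = 0;         /* read start position */
char line[128];
unsigned char tmp[4];
int consumed;
char *cp;
if (fp) {
    while (NULL != fgets(line, sizeof(line), fp)) {
        cp = line;
        /* %2hhx reads one hex byte into an unsigned char; %n stores how many
         * characters were consumed (it is not counted in the return value).
         * Bytes left over when a line is not a multiple of 4 are skipped. */
        while (sscanf(cp, "%2hhx %2hhx %2hhx %2hhx%n",
                      &tmp[0], &tmp[1], &tmp[2], &tmp[3], &consumed) == 4) {
            /* pack 4 bytes big-endian (first byte most significant) */
            buffer[r_pos++] = ((uint32_t)tmp[0] << 24) | ((uint32_t)tmp[1] << 16)
                            | ((uint32_t)tmp[2] << 8) | tmp[3];
            cp += consumed;
        }
    }
}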
Related
I'm programming something that counts the number of UTF-8 characters in a file. I've already written the base code but now, I'm stuck in the part where the characters are supposed to be counted. So far, these are what I have:
What's inside the text file:
黄埔炒蛋
你好
こんにちは
여보세요
What I've coded so far:
#include <stdio.h>
typedef unsigned char BYTE;
int main(int argc, char const *argv[])
{
FILE *file = fopen("file.txt", "r");
if (!file)
{
printf("Could not open file.\n");
return 1;
}
int count = 0;
while(1)
{
BYTE b;
fread(&b, 1, 1, file);
if (feof(file))
{
break;
}
count++;
}
printf("Number of characters: %i\n", count);
fclose(file);
return 0;
}
My question is, how would I code the part where the UTF-8 characters are being counted? I tried to look for inspirations in GitHub and YouTube but I haven't found anything that works well with my code yet.
Edit: Originally, this code prints that the text file has 48 characters. But considering UTF-8, it should only be 18 characters.
See: https://en.wikipedia.org/wiki/UTF-8#Encoding
Each UTF-8 sequence contains one starting byte and zero or more extra bytes.
Extra bytes always start with bits 10 and first byte never starts with that sequence.
You can use that information to count only first byte in each UTF-8 sequence.
if((b&0xC0) != 0x80) {
count++;
}
Keep in mind this will break, if file contains invalid UTF-8 sequences.
Also, "UTF-8 characters" might mean different things. For example "👩🏿" will be counted as two characters by this method.
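A sketch of that rule, assuming the input is valid UTF-8; utf8_count and utf8_count_file are hypothetical names. The file version applies the same test inside a read loop like the question's; opening the file with "rb" (an assumption, the question uses "r") avoids newline translation on Windows.

```c
#include <stdio.h>

/* Count UTF-8 characters (code points) in a buffer: every byte that is
 * not a continuation byte (10xxxxxx) starts a new character.
 * Assumes the input is valid UTF-8. */
size_t utf8_count(const unsigned char *s, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)   /* ASCII byte or lead byte */
            count++;
    }
    return count;
}

/* Same test applied while streaming a file byte by byte, replacing the
 * question's fread()/feof() loop. */
size_t utf8_count_file(FILE *file)
{
    size_t count = 0;
    int c;
    while ((c = getc(file)) != EOF) {
        if ((c & 0xC0) != 0x80)
            count++;
    }
    return count;
}

/* Usage:
 *   FILE *file = fopen("file.txt", "rb");
 *   if (file) { printf("Number of characters: %zu\n", utf8_count_file(file)); fclose(file); }
 */
```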
In C, as in C++, there is no ready-made function for counting UTF-8 characters. You can convert UTF-8 to a wide-character string using mbstowcs (this requires a UTF-8 locale; wchar_t is UTF-32 on most Unix systems and UTF-16 on Windows) and then use the wcslen function, but this is not the best way for performance (especially if you only need to count the number of characters and nothing else).
I think a good answer to your question is here: counting unicode characters in c++.
Example from the answer at that link:
for (; *p != 0; ++p)
    count += ((*p & 0xc0) != 0x80);
You could look into the specs: https://www.rfc-editor.org/rfc/rfc3629.
Section 3 has this table in it:
Char. number range | UTF-8 octet sequence
(hexadecimal) | (binary)
--------------------+---------------------------------------------
0000 0000-0000 007F | 0xxxxxxx
0000 0080-0000 07FF | 110xxxxx 10xxxxxx
0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
You could inspect the bytes and build the unicode characters.
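Here is a rough sketch of that, following the table: the lead byte's high bits give the sequence length and its remaining low bits start the value, and each continuation byte adds 6 more bits. The name utf8_decode is hypothetical, and the sketch does not reject overlong encodings or surrogate code points, which RFC 3629 forbids.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence starting at s (n bytes available).
 * Stores the code point in *cp and returns the sequence length,
 * or 0 if the sequence is malformed or truncated.
 * Overlong forms and surrogates are NOT rejected in this sketch. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len;
    uint32_t value;

    if (n == 0)
        return 0;
    if (s[0] < 0x80)                { len = 1; value = s[0]; }        /* 0xxxxxxx */
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; value = s[0] & 0x1F; } /* 110xxxxx */
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; value = s[0] & 0x0F; } /* 1110xxxx */
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; value = s[0] & 0x07; } /* 11110xxx */
    else
        return 0;                /* stray continuation byte or invalid lead */

    if (len > n)
        return 0;                /* sequence cut off */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;            /* expected 10xxxxxx */
        value = (value << 6) | (s[i] & 0x3F);
    }
    *cp = value;
    return len;
}
```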
A different point is, whether you would count a base character and its accent (combining mark cf. https://en.wikipedia.org/wiki/Combining_character) as one or as several characters.
There are multiple options you may take:
you may depend on your system's implementation of wide and multibyte encodings
you may read the file as a wide stream and count the wide characters, relying on the system to do the UTF-8 multibyte to wide string conversion on its own (see main1 below)
you may read the file as bytes, convert the multibyte string into a wide string and count the wide characters (see main2 below)
you may use an external library that operates on UTF-8 strings and counts the Unicode characters (see main3 below, which uses libunistring)
or you may roll your own utf8_strlen-ish solution that relies on UTF-8's byte patterns and checks the bytes yourself, as shown in other answers.
Here is an example program with rudimentary error checking via assert; under Linux it has to be compiled with -lunistring:
#include <stdio.h>
#include <wchar.h>
#include <locale.h>
#include <assert.h>
#include <stdlib.h>
void main1()
{
// read the file as wide characters
const char *l = setlocale(LC_ALL, "en_US.UTF-8");
assert(l);
FILE *file = fopen("file.txt", "r");
assert(file);
int count = 0;
while(fgetwc(file) != WEOF) {
count++;
}
fclose(file);
printf("Number of characters: %i\n", count);
}
// just a helper function because I'm lazy
char *file_to_buf(const char *filename, size_t *strlen) {
FILE *file = fopen(filename, "r");
assert(file);
size_t n = 0;
char *ret = malloc(1);
assert(ret);
for (int c; (c = fgetc(file)) != EOF;) {
ret = realloc(ret, n + 2);
assert(ret);
ret[n++] = c;
}
ret[n] = '\0';
*strlen = n;
fclose(file);
return ret;
}
void main2() {
const char *l = setlocale(LC_ALL, "en_US.UTF-8");
assert(l);
size_t strlen = 0;
char *str = file_to_buf("file.txt", &strlen);
assert(str);
// convert multibyte string to wide string
// assuming multibytes are in UTF-8
// this may also be done in a streaming fashion when reading byte by byte from a file
// and calling with `mbtowc` and checking errno for EILSEQ and managing some buffer
mbstate_t ps = {0};
const char *tmp = str;
size_t count = mbsrtowcs(NULL, &tmp, 0, &ps);
assert(count != (size_t)-1);
printf("Number of characters: %zu\n", count);
free(str);
}
#include <unistr.h> // u8_mbsnlen from libunistring
void main3() {
size_t strlen = 0;
char *str = file_to_buf("file.txt", &strlen);
assert(str);
// for simplicity I am assuming uint8_t is equal to unsigned char
size_t count = u8_mbsnlen((const uint8_t *)str, strlen);
printf("Number of characters: %zu\n", count);
free(str);
}
int main() {
main1();
main2();
main3();
}
I have a string that contains both Mandarin and English words in UTF-8:
char *str = "你a好测b试";
If you use strlen(str), it will return 14, because each Mandarin character uses three bytes, while each English character uses only one byte.
Now I want to copy the leftmost 4 characters ("你a好测"), and append "..." at the end, to give "你a好测...".
If the text were in a single-byte encoding, I could just write:
strncpy(buf, str, 4);
strcat(buf, "...");
But 4 characters in UTF-8 aren't necessarily 4 bytes. For this example, it will be 10 bytes: three each for 你, 好 and 测 and one for a. So, for this specific case, I would need
strncpy(buf, str, 10);
strcat(buf, "...");
If I had a wrong value for the length, I could produce a broken UTF-8 stream with an incomplete character. Obviously I want to avoid that.
How can I compute the right number of bytes to copy, corresponding to a given number of characters?
First you need to know your encoding. By the sound of it (3 byte Mandarin) your string is encoded with UTF-8.
What you need to do is convert the UTF-8 back to Unicode code points (integers). You can then have an array of integers rather than bytes, so each element of the array will be 1 character, regardless of the language.
You could also use a library of functions that already handle utf8 such as http://www.cprogramming.com/tutorial/utf8.c
http://www.cprogramming.com/tutorial/utf8.h
In particular this function: int u8_toucs(u_int32_t *dest, int sz, char *src, int srcsz); might be very useful, it will create an array of integers, with each integer being 1 character. You can then modify the array as you see fit, then convert it back again with int u8_toutf8(char *dest, int sz, u_int32_t *src, int srcsz);
I would recommend dealing with this at a higher level of abstraction: either convert to wchar_t or use a UTF-8 library. But if you really want to do it at the byte level, you could count characters by skipping over the continuation bytes (which are of the form 10xxxxxx):
#include <stddef.h>
size_t count_bytes_for_chars(const char *s, int n)
{
const char *p = s;
n += 1; /* we're counting up to the start of the subsequent character */
while (*p && (n -= (*p & 0xc0) != 0x80))
++p;
return p-s;
}
Here's a demonstration of the above function:
#include <string.h>
#include <stdio.h>
int main()
{
const char *str = "你a好测b试";
char buf[50];
int truncate_at = 4;
size_t bytes = count_bytes_for_chars(str, truncate_at);
strncpy(buf, str, bytes);
strcpy(buf+bytes, "...");
printf("'%s' truncated to %d characters is '%s'\n", str, truncate_at, buf);
}
Output:
'你a好测b试' truncated to 4 characters is '你a好测...'
The Basic Multilingual Plane was designed to contain characters for almost all modern languages. In particular, it does contain Chinese.
So you just have to convert your UTF-8 string to a wide string to have each character use one single position (wchar_t is UTF-16 on Windows and UTF-32 on most Unix systems, and a BMP character needs only one unit in either). That means that you can just use a wchar_t array, or even better a std::wstring, and use all the string functions natively.
Starting with C++11, the <codecvt> header declares a dedicated converter, std::codecvt_utf8, to convert UTF-8 narrow strings to wide Unicode ones (the header was deprecated in C++17). I must admit it is not very easy to use, but it should be enough here. The code could look like this:
char str[] = "你a好测b试";
std::codecvt_utf8<wchar_t> cvt;
std::mbstate_t state = std::mbstate_t();
wchar_t wstr[sizeof(str)] = {0}; // there will be unused space at the end
const char *end;
wchar_t *wend;
auto cr = cvt.in(state, str, str+sizeof(str), end,
wstr, wstr+sizeof(str), wend);
*wend = 0;
Once you have the wstr wide string, you can convert it to a std::wstring and use all the C++ library tools, or, if you prefer C strings, you can use the wcs... counterparts of the str... functions (wcslen, wcscpy and so on).
Pure C solution:
All UTF-8 multibyte characters are made of bytes with the most significant bit set to 1, and the leading bits of the first byte indicate how many bytes make up the code point.
The question is ambiguous in regards to the criterion used in cutting; either:
a fixed number of codepoints followed by three dots, which will require a variable-size output buffer
a fixed size output buffer, which will impose "whatever you can fit inside"
Both the solutions will require a helper function telling how many chars make the next codepoint:
// Note: the function does NOT fully validate a
// UTF8 sequence, only looks at the first char in it
int codePointLen(const char* c) {
if(NULL==c) return -1;
if( (*c & 0xF8)==0xF0 ) return 4; // 4 ones and one 0
if( (*c & 0xF0)==0xE0 ) return 3; // 3 ones and one 0
if( (*c & 0xE0)==0xC0 ) return 2; // 2 ones and one 0
if( (*c & 0x7F)==*c ) return 1; // no ones on msb
return -2; // invalid UTF8 starting character
}
So, here is a solution for criterion 1 (fixed number of code points, variable output buffer size). It does not append "..." to the destination, but you can ask up front how many bytes you need (by passing a NULL destination) and, if that is more than you can afford, reserve the extra space yourself.
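#include <string.h> // memcpy

// Copies the first 'codepointsCount' code points of 'src' (fewer if 'src'
// ends first) into 'dest' and NUL-terminates the result.
// Returns the number of bytes those code points occupy, excluding the
// terminator. If 'dest' is NULL or 'destSize' is too small (it must be at
// least the return value + 1), copies nothing and still returns the
// required length. Returns a negative value if 'src' is NULL or not valid UTF-8.
int copyFirstCodepoints(
    int codepointsCount, const char* src,
    char* dest, int destSize
) {
    if(NULL==src) {
        return -1;
    }
    // do a cold run to see if size of the output buffer can fit
    // as many codepoints as required
    const char* walker=src;
    for(int cnvCount=0; cnvCount<codepointsCount && *walker; cnvCount++) {
        int chCount=codePointLen(walker);
        if(chCount<0) {
            return chCount; // err
        }
        for(int k=1; k<chCount; k++) {
            if(0==walker[k]) {
                return -2; // sequence cut off by the end of the string
            }
        }
        walker+=chCount;
    }
    if(walker-src < destSize && NULL!=dest) {
        // enough space at destination, including the terminator
        memcpy(dest, src, walker-src);
        dest[walker-src] = '\0';
    }
    // else do nothing
    return walker-src;
}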
Second criterion (limited buffer size): use this function to find how many code points fit, then pass that number to the first one:
// Returns how many whole code points of 'src' fit in an output buffer of
// 'maxBufflen' bytes (one byte is kept for the terminator).
// Returns a negative value on error: -1 for a NULL source, otherwise
// -(offset + 1) of the invalid or truncated UTF-8 sequence.
int howManyCodepointICanFitInOutputBufferOfLen(const char* src, int maxBufflen) {
    if(NULL==src) {
        return -1;
    }
    int ret=0;   // code points that fit so far
    int used=0;  // bytes those code points occupy
    const char* walker=src;
    while(*walker) {
        int advance=codePointLen(walker);
        if(advance<0) {
            return -(int)(walker-src) - 1; // negative, indicating the err pos
        }
        // every byte inside this code point must be present (non-zero)
        for(int k=1; k<advance; k++) {
            if(0==walker[k]) {
                return -(int)(walker+k-src) - 1; // premature end of the source
            }
        }
        if(used+advance > maxBufflen-1) {
            break; // the next code point would not fit
        }
        used+=advance;
        walker+=advance; // walker is set on the next code point
        ret++;
    }
    return ret;
}
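#include <stdio.h>
#include <string.h>

/* Truncate lpszData to at most nMaxLen bytes without splitting a
 * double-byte character. This heuristic treats bytes >= 0xA0 as halves of
 * a two-byte character (as in GB2312/GBK); it does not decode UTF-8.
 * Assumes 3 <= nMaxLen < 1024. The result lives in a static buffer, so it
 * stays valid after return but is overwritten by the next call. */
static char *CutStringLength(char *lpszData, int nMaxLen)
{
    static char strTemp[1024];
    if (NULL == lpszData || 0 >= nMaxLen)
    {
        return "";
    }
    int len = strlen(lpszData);
    if(len <= nMaxLen)
    {
        return lpszData;
    }
    strncpy(strTemp, lpszData, sizeof(strTemp) - 1); // bounded copy
    strTemp[sizeof(strTemp) - 1] = '\0';
    char *p = strTemp;
    p = p + (nMaxLen-1);
    if ((unsigned char)(*p) < 0xA0)
    {
        *(++p) = '\0'; // last byte is single-byte: cut right after it
    }
    else if ((unsigned char)(*(--p)) < 0xA0)
    {
        *(++p) = '\0'; // last byte would start a split pair: drop it
    }
    else if ((unsigned char)(*(--p)) < 0xA0)
    {
        *(++p) = '\0'; // last two bytes are high bytes: cut before them
    }
    else
    {
        // long run of high bytes: walk from the start, pairing them up
        int i = 0;
        p = strTemp;
        while(*p != '\0' && i+2 <= nMaxLen)
        {
            if((unsigned char)(*p++) >= 0xA0 && (unsigned char)(*p) >= 0xA0)
            {
                p++;
                i++;
            }
            i++;
        }
        *p = '\0';
    }
    printf("str = %s\n",strTemp);
    return strTemp;
}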
I want to shift the elements in my array to the right and fill the vacated elements with zeros (as characters), but it's a bit more difficult than I thought.
This program reads a text file containing two lines. Each line contains an integer (the maximum length is defined), and each line is saved in a separate array.
I need to convert these two arrays to integer arrays later and do some arithmetic operations,
so I need to make sure that the two arrays have the same length.
For example, my input is:
num_first : 1859654
num_second: 5654
Now I need them to be:
num_first : 1859654 (it's larger than the second number, so it doesn't change)
num_second: 0005654 (smaller than the first, so we need to add leading zeros)
How do I add those leading zeros to the array based on the difference in length between the two inputs?
In the end I want to save them in arrays (as strings or integers).
#include <stdio.h>
#include <string.h>
#define SIZE_MAX 20
int main()
{
FILE *fPTR;
char num_first[SIZE_MAX]; // string input
char num_second[SIZE_MAX];
if ((fPTR = fopen("input.txt", "r")) == NULL) // our file contains two line of integers. one at each
{
puts("File could not be opened.");
}
else
{
if (fgets(num_first, SIZE_MAX, fPTR) != NULL) // reads first line and saves to num_first
puts(num_first); // prints first number
if (fgets(num_second, SIZE_MAX, fPTR) != NULL) // reads second line and saves to num_second
puts(num_second); // prints second number
fclose(fPTR);
}
// getting strings lengths
int fLEN = strlen(num_first) - 1;
int sLEN = strlen(num_second);
int e = 0; // difference between two string lengths
// here we get the difference and it's the place which i want to shif the arrays
if (fLEN>sLEN) // first string is bigger than second
{
e = fLEN-sLEN;
}
else if (sLEN>fLEN) // second string is bigger than first
{
e = sLEN-fLEN;
}
else // there is no difference between two strings
{
e = fLEN-sLEN;
}
}
Edit: I also had another idea, but it doesn't work as intended.
if (fLEN>sLEN) // first string is bigger than second
{
e = fLEN-sLEN;
for(i=e;i>=0;i--)
{
num_second[i+1] = num_second[i];
}
for(i=fLEN-e;i>=0;i--)
{
num_second[i] = '0';
}
}
Edit 2: (slowly starting to work) It prints too many zeros; trying to fix it.
#include <stdio.h>
#include <string.h>
#define SIZE_MAX 20
int main()
{
FILE *fPTR;
char num_first[SIZE_MAX]; // string input
char num_second[SIZE_MAX];
char num_second2[SIZE_MAX] = {0};
int i = 0;
char numbers[SIZE_MAX];
if ((fPTR = fopen("input.txt", "r")) == NULL) // our file contains two line of integers. one at each
{
puts("File could not be opened.");
}
else
{
if (fgets(num_first, SIZE_MAX, fPTR) != NULL) // reads first line and saves to num_first
puts(num_first); // prints first number
if (fgets(num_second, SIZE_MAX, fPTR) != NULL) // reads second line and saves to num_second
puts(num_second); // prints second number
fclose(fPTR);
}
// getting strings lengths
int fLEN = strlen(num_first) - 1;
int sLEN = strlen(num_second);
int e = 0; // difference between two string lengths
int h = 4;
// here we get the difference and it's the place which i want to shif the arrays
if (fLEN>sLEN) // first string is bigger than second
{
e = fLEN-sLEN;
while (i>= h)//for(i=e;i>=0;i--)
{
num_second2[i+h] = num_second[i];
i++;
h--;
}
for(i=fLEN-e;i>=0;i--)
{
// num_second[i] = '0';
}
}
else if (sLEN>fLEN) // second string is bigger than first
{
e = sLEN-fLEN;
}
else // there is no difference between two strings
{
e = fLEN-sLEN;
}
printf("\n%d\n", e);
for (i = 0; i < SIZE_MAX; i++) // using c to print
{
printf("%d", num_second2[i]);
}
puts(num_second);
}
#include <stdio.h>
#include <string.h>
#define SIZE_MAX 20
int main()
{
FILE *fPTR;
char num_first[SIZE_MAX]; // string input
char num_second[SIZE_MAX];
char num_zeros[SIZE_MAX];//array for leading zeros
int i = 0;
char numbers[SIZE_MAX];
if ((fPTR = fopen("input.txt", "r")) == NULL) // our file contains two lines, one integer on each
{
puts("File could not be opened.");
return 1; // nothing to pad without input
}
else
{
if (fgets(num_first, SIZE_MAX, fPTR) != NULL) // reads first line and saves to num_first
puts(num_first); // prints first number
if (fgets(num_second, SIZE_MAX, fPTR) != NULL) // reads second line and saves to num_second
puts(num_second); // prints second number
fclose(fPTR);
}
for ( i = 0; i < SIZE_MAX; i++)
{
num_zeros[i] = '0';//fill array with '0's
}
// getting strings lengths
int fLEN = strlen(num_first);
if ( fLEN && num_first[fLEN - 1] == '\n')
{
num_first[fLEN - 1] = '\0';//remove trailing newline
fLEN--;
}
int sLEN = strlen(num_second);
if ( sLEN && num_second[sLEN - 1] == '\n')
{
num_second[sLEN - 1] = '\0';//remove trailing newline
sLEN--;
}
int e = 0; // difference between two string lengths
// here we get the difference, which is how far the shorter number must be shifted
if (fLEN>sLEN) // first string is bigger than second
{
e = fLEN-sLEN;
num_zeros[e] = '\0';//terminate array leaving e leading zeros
strcat ( num_zeros, num_second);
strcpy ( num_second, num_zeros);
}
else if (sLEN>fLEN) // second string is bigger than first
{
e = sLEN-fLEN;
while ( fLEN >= 0)//start at end of array
{
num_first[fLEN + e] = num_first[fLEN];//copy each element e items from current location
fLEN--;// decrement length
}
while ( e)// number of leading zeros
{
e--;
num_first[e] = '0';// set first e elements to '0'
}
}
else // there is no difference between two strings
{
//e = fLEN-sLEN;
}
puts(num_first);
puts(num_second);
return 0;
}
Good start with your code.
Since you're reading strings you can just use string manipulation.
If you wanted to read them as ints you could determine the size with a logarithm function, but that would be overkill.
You could save the numbers as ints, but then you'd have to defer the padding until you printed them later or saved them to a file.
The easiest way to left-pad the numbers with zeroes would be to use sprintf() with the correct format specifier to right-justify the number. Then you can iterate through each character of the result and replace each space (' ') with '0'. That will create left-side 0-padded entries: sprintf right-justifies your number in a buffer that has room to hold the maximum-sized number you can read, leaving spaces on the left.
Then in a loop indexing one character at a time in the entry, and based on MAX_NUMBER_LEN shown below, you skip extra spaces on the left you don't want zeroes in (e.g. MAX_NUMBER_LEN - maxPaddingLenYouCalculateAtRuntime), and start replacing with zeroes.
Then it's just a matter of creating a buffer whose address you'll pass to sprintf() that has enough space allocated to hold the result. That will have to be as big or bigger than your max length. I would call it maxStrLen rather than e, because naming variables for what they're used for makes for an easier to understand and maintain program.
You have a few choices as to how to allocate that buffer of the right size, including using malloc(). But it is probably easier to determine the maximum size that an integer could be. C even provides constants for the maximum value of each integer type (for example INT_MAX and LLONG_MAX in <limits.h>), so you can create a char array of fixed-length entries based on that size in advance.
For example:
#define MAX_ENTRIES 10
#define MAX_NUMBER_LEN 15
char numbers[MAX_ENTRIES][MAX_NUMBER_LEN];
That would give you the storage to store your sprintf() results in.
Example:
sprintf(numbers[entryNumber], "*right-justifying format specifier you look up*", numberReadFromFile);
Wherein entryNumber is which slot in the array you want to store the result.
The MAX_NUMBER_LEN part you don't need to include when getting the address for sprintf (notice I just passed in numbers[entryNumber], but not the 2nd set of brackets, intentionally). Because by omitting the second set of brackets, you're saying you want the address of the specific [MAX_NUMBER_LEN] chunk corresponding to the entryNumber, inside the numbers array. Note, there is no & on numberReadFromFile either, because you are reading them into a char array as strings, and because you're passing the address of the first char of an array you can just pass the array name. Alternatively you could pass in &numberReadFromFile[0] to get the address of the first element to pass to sprintf().
And when passing arrays this way, you don't need the & character to get the address to pass to sprintf(), as you would for a variable that is not an array, because an array name used as a function argument decays to a pointer to its first element. Understanding how arrays and pointers relate is key to being effective with C and worth the initial struggle to comprehend.
Here's an example of how to do the actual zero padding, based on what you got into your array from sprintf. I won't code up a full example because learning C is about the struggle to actually do it yourself. There's no shortcut to real comprehension. I'm giving you the hardest to discover aspects, by you working it through and making a solution of it, you'll gain quite a bit of mastery. This code is off the top of my head, not compiled and tested. It would either work or be close to working.
/* numberToPad is a buffer containing a right-justified number, e.g. a number
* shifted-right by sprintf(), in a buffer sized 15 characters,
* space-padded on the left by sprintf's right-justifying format specifier.
* We want to convert the number to a 10-digit zero-padded number
* inside a 15 character field. The result should be 4 spaces, followed
* by some zeroes, followed by non-zero digits, followed by null-terminator.
* example: "ƀƀƀƀ0000012345\0" where ƀ is a space character.
*/
#define MAX_NUMBER_LEN 15 /* note: 15 includes null-terminator of string */
int maxNumberLenActuallyRead = 10;
/* 14 visible characters (indices 0..13), so a 10-digit field starts at index 4 */
int startZeroPaddingPos = MAX_NUMBER_LEN - 1 - maxNumberLenActuallyRead;
char numberToPad[MAX_NUMBER_LEN]; /* filled by sprintf() beforehand */
int i;
for (i = 0; i < MAX_NUMBER_LEN - 1; i++) {
    if (i < startZeroPaddingPos)
        continue;
    if (numberToPad[i] == ' ')
        numberToPad[i] = '0';
}
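A compact, self-contained sketch of the same pad-with-spaces-then-replace technique, assuming the trailing newline has already been stripped from each line; zero_pad is a hypothetical helper. It uses snprintf's "%*s" to right-justify the string in a field of the requested width. The 0 flag cannot be used with %s (its behaviour is undefined there); it only applies to numeric conversions, so values stored as integers could be padded directly with "%0*d".

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: left-pad the digit string 'digits' with '0' up to
 * 'width' characters, writing the result into 'out' (outSize bytes,
 * which should be at least width + 1). Strings already 'width' or longer
 * are copied unchanged (if they fit in 'out'). */
void zero_pad(const char *digits, int width, char *out, size_t outSize)
{
    /* right-justify: "%*s" fills the left side of the field with spaces */
    snprintf(out, outSize, "%*s", width, digits);
    /* then turn those leading spaces into zeros */
    for (size_t i = 0; out[i] == ' '; i++)
        out[i] = '0';
}

/* Usage with the question's numbers (newlines already removed):
 *   char padded[32];
 *   int width = (int)(strlen(num_first) > strlen(num_second)
 *                     ? strlen(num_first) : strlen(num_second));
 *   zero_pad(num_second, width, padded, sizeof padded);   "5654" becomes "0005654"
 */
```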
int fd = open(argv[argc-1], O_RDONLY, 0);
if (fd >=0) {
char buff[4096]; //should be better sized based on stat
ssize_t readBytes;
int j;
readBytes = read(fd, buff, 4096);
char out[4096];
for (j=0; buff[j] != '\0'; j++) {
out[j] = buff[j];
//printf("%d ", out[j]);
}
write(STDOUT_FILENO, out, j+1);
close(fd);
}
else {
perror("File not opened.\n");
exit(errno);
}
This is code for a file dump program. The goal is to have a file and dump its contents to the command line both as ASCII chars and as hex/dec values. The current code is able to dump the ascii values, but not the hex/dec. We are allowed to use printf (as seen in the commented out section) but we can get extra credit if we don't use any high level (higher than system) functions. I have tried multiple ways to manipulate the char array in the loop, but it seems no matter how I try to add or cast the chars they come out as chars.
This isn't surprising, since I know chars are technically integers in C. I am at a loss for how to print the hex/dec value of a char using write(), and I have not yet seen any answers on Stack Overflow that don't fall back to printf() or putchar().
You could make a larger buffer, make the conversion from ASCII to hex/dec (as needed) in that and print the new one. I hope this example illustrates the idea:
#include <stdlib.h>
#include <io.h>  // Windows; on POSIX systems include <unistd.h> instead
int main (int argc, char** argv)
{
    const char* pHexLookup = "0123456789abcdef";
    char pBuffer[] = {'a', 'b', 'c'}; // Assume buffer is the contents of the file you have already read in
    size_t nInputSize = sizeof(pBuffer); // You will set this according to how much your input read in
    char* pOutputBuffer = (char*)malloc(nInputSize * 3); // Enough for hex: the char plus 2 hex digits per byte; for decimal, multiply by 4
    if (pOutputBuffer == NULL)
        return EXIT_FAILURE;
    for (size_t nByte = 0; nByte < nInputSize; ++nByte)
    {
        unsigned char byte = (unsigned char)pBuffer[nByte]; // no negative indices for bytes >= 0x80
        pOutputBuffer[3 * nByte] = pBuffer[nByte];
        pOutputBuffer[3 * nByte + 1] = pHexLookup[byte / 16];
        pOutputBuffer[3 * nByte + 2] = pHexLookup[byte % 16];
    }
    write(1 /*STDOUT_FILENO*/, pOutputBuffer, nInputSize * 3);
    free(pOutputBuffer);
    return EXIT_SUCCESS;
}
This will print a61b62c63, the ASCII and hex values side by side.
This was done on Windows, so don't try to copy it directly; I tried to stick to POSIX system calls. Basically, for hex you allocate a memory chunk 3 times larger than the original (or more if you need to pad the output with spaces) and put the ASCII symbols for the byte's hex value next to it. For decimal you will need more space, since the value can span 3 characters. Then just write the new buffer. Hope this is clear enough.
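A POSIX-only sketch of the same lookup-table idea that uses nothing above write(); format_hex and write_hex are hypothetical names. Treating the data as unsigned char keeps bytes at or above 0x80 from producing negative table indices, and the write loop handles short writes.

```c
#include <stddef.h>
#include <unistd.h>

/* Format each byte of 'in' as two lowercase hex digits plus a space into
 * 'out' (which needs 3 * n bytes). Returns the number of bytes produced.
 * Only array indexing and arithmetic are used, no printf. */
size_t format_hex(const unsigned char *in, size_t n, char *out)
{
    static const char digits[] = "0123456789abcdef";
    size_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        out[pos++] = digits[in[i] >> 4];    /* high nibble */
        out[pos++] = digits[in[i] & 0x0F];  /* low nibble */
        out[pos++] = ' ';
    }
    return pos;
}

/* Dump 'n' bytes as hex to file descriptor 'fd' using only write(). */
void write_hex(int fd, const unsigned char *in, size_t n)
{
    char chunk[3 * 256];
    while (n > 0) {
        size_t take = n < 256 ? n : 256;
        size_t len = format_hex(in, take, chunk);
        size_t done = 0;
        while (done < len) {                /* write() may write less than asked */
            ssize_t w = write(fd, chunk + done, len - done);
            if (w <= 0)
                return;                     /* give up on error */
            done += (size_t)w;
        }
        in += take;
        n -= take;
    }
}

/* Usage: write_hex(STDOUT_FILENO, (const unsigned char *)buff, (size_t)readBytes); */
```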
How about:
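unsigned char byte = (unsigned char)*out; /* avoid negative values for bytes >= 0x80 */
unsigned char val;
val = byte / 100 + '0';         /* hundreds digit */
write(STDOUT_FILENO, &val, 1);
val = (byte / 10) % 10 + '0';   /* tens digit */
write(STDOUT_FILENO, &val, 1);
val = byte % 10 + '0';          /* units digit */
write(STDOUT_FILENO, &val, 1);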
I was just reading this page http://www.cs.tut.fi/~jkorpela/forms/cgic.html about getting started with CGI in C. I had a question about the code in the unencoding part.
#include <stdio.h>
#include <stdlib.h>
#define MAXLEN 80
#define EXTRA 5
/* 4 for field name "data", 1 for "=" */
#define MAXINPUT MAXLEN+EXTRA+2
/* 1 for added line break, 1 for trailing NUL */
#define DATAFILE "../data/data.txt"
void unencode(char *src, char *last, char *dest)
{
for(; src != last; src++, dest++)
if(*src == '+')
*dest = ' ';
else if(*src == '%') {
int code;
if(sscanf(src+1, "%2x", &code) != 1) code = '?';
*dest = code;
src +=2; }
else
*dest = *src;
*dest = '\n';
*++dest = '\0';
}
int main(void)
{
char *lenstr;
char input[MAXINPUT], data[MAXINPUT];
long len;
printf("%s%c%c\n",
"Content-Type:text/html;charset=iso-8859-1",13,10);
printf("<TITLE>Response</TITLE>\n");
lenstr = getenv("CONTENT_LENGTH");
if(lenstr == NULL || sscanf(lenstr,"%ld",&len)!=1 || len > MAXLEN)
printf("<P>Error in invocation - wrong FORM probably.");
else {
FILE *f;
fgets(input, len+1, stdin);
unencode(input+EXTRA, input+len, data);
f = fopen(DATAFILE, "a");
if(f == NULL)
printf("<P>Sorry, cannot store your data.");
else {
fputs(data, f);
fclose(f); /* close only a stream that was successfully opened */
}
printf("<P>Thank you! Your contribution has been stored.");
}
return 0;
}
I was wondering exactly how these lines:
else if(*src == '%') {
int code;
if(sscanf(src+1, "%2x", &code) != 1) code = '?';
*dest = code;
src +=2; }
convert something like %21 back into the exclamation mark?
Thanks!
else if(*src == '%') {
int code;
if(sscanf(src+1, "%2x", &code) != 1) code = '?';
*dest = code;
src +=2;
}
If the string begins with a % character, sscanf() is used to parse the following hexadecimal characters. The "%x" format converts hexadecimal characters to an integer value (in this case, a character code), and the 2 specifies a maximum field width, so that it consumes at most 2 characters.
The return value of sscanf() indicates the number of successful conversions, so if it doesn't return 1, it didn't find a valid hexadecimal number.
Then the character code is assigned to *dest, and the src pointer is advanced to point to the next character after the %xx sequence.
There are actually three bugs here:
The "%x" format specifier expects an argument of type unsigned int *. A signed int * was passed, which, I believe, invokes undefined behaviour. Variadic functions (such as sscanf()) have unusual ways of passing the arguments, and the format specifier is required to match the type of the argument.
However, the two types are similar enough that it will probably work just fine in practice.
It also accepts signed hexadecimal numbers (with a + or - character), which is probably not what the author intended.
For example, "%-ffText" would result in code == -15.
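The src pointer is advanced by 2 bytes, but sscanf() doesn't necessarily consume 2 characters.
"%fText" would result in code == 15 and consume only one character (other than the % character). The example above consumes 2 characters ("-f"), because the sign counts toward the field width; conversely, %x skips leading white space, which does not count toward the width, so "% 21Text" would consume 3 characters.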
The sscanf function translates 2 hex chars into a single int value. This value is equal to the ASCII value, so it is put in 'dest'. Since 2 chars have been decoded, src has to advance two positions.
So '%21' -> 0x21 -> char '!'
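For comparison, a sketch of a decoder that sidesteps the three issues listed in the earlier answer by parsing the two hex digits by hand: a '%' is decoded only when exactly two hex digits follow it, so no sign or white space is accepted and the input always advances by 3. url_unencode and hex_value are hypothetical names, the function takes a length instead of the tutorial's 'last' pointer, and copying a malformed '%' through unchanged is a design choice (the tutorial substitutes '?').

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode src[0..len) into dest (which needs len + 1 bytes) and add a
 * terminating '\0'. '+' becomes a space and "%XY" becomes the byte 0xXY
 * only when X and Y are both hex digits; otherwise '%' is copied as-is.
 * Returns the number of decoded bytes. */
size_t url_unencode(const char *src, size_t len, char *dest)
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        if (src[i] == '+') {
            dest[out++] = ' ';
        } else if (src[i] == '%' && i + 2 < len
                   && hex_value(src[i + 1]) >= 0 && hex_value(src[i + 2]) >= 0) {
            dest[out++] = (char)(hex_value(src[i + 1]) * 16 + hex_value(src[i + 2]));
            i += 2;              /* exactly two digits were consumed */
        } else {
            dest[out++] = src[i];
        }
    }
    dest[out] = '\0';
    return out;
}
```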