Is it possible to "force" UTF-8 in a C program? - c

Usually when I want my program to use UTF-8 encoding, I write setlocale (LC_ALL, "");. But today I found that it just sets the locale to the environment's default locale, and I can't know whether the environment is using UTF-8 by default.
I wonder, is there any way to force the character encoding to be UTF-8? Also, is there any way to check whether my program is using UTF-8?

It is possible, but it is the completely wrong thing to do.
First of all, the current locale is for the user to decide. It is not just the character set, but also the language, date and time formats, and so on. Your program has absolutely no "right" to mess with it.
If you cannot localize your program, just tell the user the environmental requirements your program has, and let them worry about it.
Really, you should not rely on UTF-8 being the current encoding at all, but use wide character support instead, including functions like wctype(), mbstowcs(), and so on. POSIXy systems also provide the iconv_open() and iconv() family of functions in their C libraries to convert between encodings (which should always include conversion to and from wchar_t); on Windows, you need a separate library such as libiconv. This is how, for example, the GCC compiler handles different character sets. (Internally, it uses Unicode/UTF-8, but if you ask it to, it can do the necessary conversions to work with other character sets.)
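As a rough illustration of the wide character route, here is a minimal sketch (the helper name to_wide() is made up for this example). It converts a multibyte string in the current locale's encoding to a freshly allocated wide string. It uses mbsrtowcs(), the restartable sibling of mbstowcs(), because standard C explicitly defines what it does with a NULL destination (it only counts the wide characters needed).

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert a multibyte string (current locale's
 * encoding) to a dynamically allocated wide string.
 * Returns NULL on an invalid multibyte sequence or if memory runs out.
 * The caller releases the result with free(). */
wchar_t *to_wide(const char *src)
{
    mbstate_t   state;
    const char *p;
    wchar_t    *dst;
    size_t      len;

    if (!src)
        return NULL;

    /* First pass: with a NULL destination, mbsrtowcs() only counts. */
    memset(&state, 0, sizeof state);
    p = src;
    len = mbsrtowcs(NULL, &p, 0, &state);
    if (len == (size_t)-1)
        return NULL;

    dst = malloc((len + 1) * sizeof dst[0]);
    if (!dst)
        return NULL;

    /* Second pass: convert, starting again from the initial state. */
    memset(&state, 0, sizeof state);
    p = src;
    if (mbsrtowcs(dst, &p, len + 1, &state) == (size_t)-1) {
        free(dst);
        return NULL;
    }

    dst[len] = L'\0';
    return dst;
}
```

Call setlocale(LC_ALL, ""); once at program start, so that the conversion follows the user's locale instead of the default "C" locale.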
I am personally a strong proponent of using UTF-8 everywhere, but overriding the user locale in a program is horrific. Abominable. Distasteful; like a desktop applet changing the display resolution because the programmer is particularly fond of a certain one.
I would be happy to write some example code to show how to correctly solve any character-set-sensitive situation, but there are so many, I don't know where to start.
If the OP amends their question to state exactly what problem overriding the character set is supposed to solve, I'm willing to show how to use the aforementioned utilities and POSIX facilities (or equivalent freely available libraries on Windows) to solve it correctly.
If this seems harsh to someone, it is, but only because taking the easy and simple route here (overriding the user's locale setting) is so ... wrong, purely on technical grounds. Even no action is better, and actually quite acceptable, as long as you just document your application only handles UTF-8 input/output.
Example 1. Localized Happy New Year!
    #include <stdlib.h>
    #include <locale.h>
    #include <stdio.h>
    #include <wchar.h>

    int main(void)
    {
        /* We wish to use the user's current locale. */
        setlocale(LC_ALL, "");

        /* We intend to use wide functions on standard output. */
        fwide(stdout, 1);

        /* For Windows compatibility, print out a Byte Order Mark.
         * If you save the output to a file, this helps tell Windows
         * applications that the file is Unicode.
         * Other systems don't need it nor use it.
         */
        fputwc(L'\uFEFF', stdout);

        wprintf(L"Happy New Year!\n");
        wprintf(L"С новым годом!\n");
        wprintf(L"新年好!\n");
        wprintf(L"賀正!\n");
        wprintf(L"¡Feliz año nuevo!\n");
        wprintf(L"Hyvää uutta vuotta!\n");

        return EXIT_SUCCESS;
    }
Note that wprintf() takes a wide string (wide string constants are of form L"", wide character constants L'', as opposed to normal/narrow counterparts "" and ''). Formats are still the same; %s prints a normal/narrow string, and %ls a wide string.
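To make the %s versus %ls distinction concrete, here is a small sketch using swprintf(), which formats into a wide buffer the same way wprintf() formats to a wide stream. The function name format_both() is made up for this illustration.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: formats "<narrow> / <wide>" into a wide buffer.
 * %s takes a narrow (char) string, %ls takes a wide (wchar_t) string.
 * Returns the number of wide characters written (not counting the
 * terminating L'\0'), or a negative value if the buffer is too small
 * or an encoding error occurs. */
int format_both(wchar_t *buf, size_t size,
                const char *narrow, const wchar_t *wide)
{
    return swprintf(buf, size, L"%s / %ls", narrow, wide);
}
```

The narrow string is converted using the current locale, so plain ASCII always works, but anything else needs the locale to match the encoding of the narrow string.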
Example 2. Reading input lines from standard input, and optionally saving them to a file. The file name is supplied on the command line.
#include <stdlib.h>
#include <string.h>
#include <locale.h>
#include <wctype.h>
#include <wchar.h>
#include <errno.h>
#include <stdio.h>
typedef enum {
TRIM_LEFT = 1, /* Remove leading whitespace and control characters */
TRIM_RIGHT = 2, /* Remove trailing whitespace and control characters */
TRIM_NEWLINE = 4, /* Remove newline at end of line */
TRIM = 7, /* Remove leading and trailing whitespace and control characters */
OMIT_NUL = 8, /* Skip NUL characters (embedded binary zeros, L'\0') */
OMIT_CONTROLS = 16, /* Skip control characters */
CLEANUP = 31, /* All of the above. */
COMBINE_LWS = 32, /* Combine all whitespace into a single space */
} trim_opts;
/* Read an unlimited-length line from a wide input stream.
*
* This function takes a pointer to a wide string pointer,
* pointer to the number of wide characters dynamically allocated for it,
* the stream to read from, and a set of options on how to treat the line.
*
* If an error occurs, this will return 0 with errno set to nonzero error number.
* Use strerror(errno) to obtain the error description (as a narrow string).
*
* If there is no more data to read from the stream,
* this will return 0 with errno 0, and feof(stream) will return true.
*
* If an empty line is read,
* this will return 0 with errno 0, but feof(stream) will return false.
*
* Typically, you initialize variables like
* wchar_t *line = NULL;
* size_t size = 0;
* before calling this function, so that subsequent calls reuse the same, dynamically
* allocated buffer for the line, and it is automatically grown if necessary.
* There are no built-in limits to line lengths this way.
*/
size_t getwline(wchar_t **const lineptr,
size_t *const sizeptr,
FILE *const in,
trim_opts const trimming)
{
wchar_t *line;
size_t size;
size_t used = 0;
wint_t wc;
fpos_t startpos;
int seekable;
if (lineptr == NULL || sizeptr == NULL || in == NULL) {
errno = EINVAL;
return 0;
}
if (*lineptr != NULL) {
line = *lineptr;
size = *sizeptr;
} else {
line = NULL;
size = 0;
*sizeptr = 0;
}
/* In error cases, we can try and get back to this position
* in the input stream, as we cannot really return the data
* read thus far. However, some streams like pipes are not seekable,
* so in those cases we should not even try.
* Use (seekable) as a flag to remember if we should try.
*/
if (fgetpos(in, &startpos) == 0)
seekable = 1;
else
seekable = 0;
while (1) {
/* When we read a wide character from a wide stream,
* fgetwc() will return WEOF with errno set if an error occurs.
* However, fgetwc() will return WEOF with errno *unchanged*
* if there is no more input in the stream.
* To detect which of the two happened, we need to clear errno
* first.
*/
errno = 0;
wc = fgetwc(in);
if (wc == WEOF) {
if (errno) {
const int saved_errno = errno;
if (seekable)
fsetpos(in, &startpos);
errno = saved_errno;
return 0;
}
if (ferror(in)) {
if (seekable)
fsetpos(in, &startpos);
errno = EIO;
return 0;
}
break;
}
/* Dynamically grow line buffer if necessary.
* We need room for the current wide character,
* plus at least the end-of-string mark, L'\0'.
*/
if (used + 2 > size) {
/* Size policy. This can be anything you see fit,
* as long as it yields size >= used + 2.
*
* This one increments size to next multiple of
* 1024 (minus 16). It works well in practice,
* but do not think of it as the "best" way.
* It is just a robust choice.
*/
size = (used | 1023) + 1009;
line = realloc(line, size * sizeof line[0]);
if (!line) {
/* Memory allocation failed. */
if (seekable)
fsetpos(in, &startpos);
errno = ENOMEM;
return 0;
}
*lineptr = line;
*sizeptr = size;
}
/* Append character to buffer. */
if (!trimming)
line[used++] = wc;
else {
/* Check if we have reasons to NOT add the character to buffer. */
do {
/* Omit NUL if asked to. */
if (trimming & OMIT_NUL)
if (wc == L'\0')
break;
/* Omit controls if asked to. */
if (trimming & OMIT_CONTROLS)
if (iswcntrl(wc))
break;
/* If we are at start of line (nothing stored yet), and we are left-trimming,
* only graphs (printable non-whitespace characters) are added. */
if ((trimming & TRIM_LEFT) && used == 0)
if (wc == L'\0' || !iswgraph(wc))
break;
/* Combine whitespaces if asked to. */
if (trimming & COMBINE_LWS)
if (iswspace(wc)) {
if (used > 0 && line[used-1] == L' ')
break;
else
wc = L' ';
}
/* Okay, add the character to buffer. */
line[used++] = wc;
} while (0);
}
/* End of the line? */
if (wc == L'\n')
break;
}
/* The above loop will only end (break out)
* if end of line or end of input was found,
* and no error occurred.
*/
/* Trim right if asked to. */
if (trimming & TRIM_RIGHT)
while (used > 0 && iswspace(line[used-1]))
--used;
else
if (trimming & TRIM_NEWLINE)
while (used > 0 && (line[used-1] == L'\r' || line[used-1] == L'\n'))
--used;
/* Ensure we have room for end-of-string L'\0'. */
if (used >= size) {
size = used + 1;
line = realloc(line, size * sizeof line[0]);
if (!line) {
if (seekable)
fsetpos(in, &startpos);
errno = ENOMEM;
return 0;
}
*lineptr = line;
*sizeptr = size;
}
/* Add end of string mark. */
line[used] = L'\0';
/* Successful return. */
errno = 0;
return used;
}
/* Counts the number of wide characters in 'alpha' class.
*/
size_t count_letters(const wchar_t *ws)
{
size_t count = 0;
if (ws)
while (*ws != L'\0')
if (iswalpha(*(ws++)))
count++;
return count;
}
int main(int argc, char *argv[])
{
FILE *out;
wchar_t *line = NULL;
size_t size = 0;
size_t len;
setlocale(LC_ALL, "");
/* Standard input and output should use wide characters. */
fwide(stdin, 1);
fwide(stdout, 1);
/* Check if the user asked for help. */
if (argc < 2 || argc > 3 || strcmp(argv[1], "-h") == 0 || strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "/?") == 0) {
fprintf(stderr, "\n");
fprintf(stderr, "Usage: %s [ -h | --help | /? ]\n", argv[0]);
fprintf(stderr, " %s FILENAME [ PROMPT ]\n", argv[0]);
fprintf(stderr, "\n");
fprintf(stderr, "The program will read input lines until a line with only '.' is supplied.\n");
fprintf(stderr, "If you do not want to save the output to a file,\n");
fprintf(stderr, "use '-' as the FILENAME.\n");
fprintf(stderr, "\n");
return EXIT_SUCCESS;
}
/* Open file for output, unless it is "-". */
if (strcmp(argv[1], "-") == 0)
out = NULL; /* No output to file */
else {
out = fopen(argv[1], "w");
if (out == NULL) {
fprintf(stderr, "%s: %s.\n", argv[1], strerror(errno));
return EXIT_FAILURE;
}
/* The output file is used with wide strings. */
fwide(out, 1);
}
while (1) {
/* Prompt? Note: our prompt string is narrow, but stdout is wide. */
if (argc > 2) {
wprintf(L"%s\n", argv[2]);
fflush(stdout);
}
len = getwline(&line, &size, stdin, CLEANUP);
if (len == 0) {
if (errno) {
fprintf(stderr, "Error reading standard input: %s.\n", strerror(errno));
break;
}
if (feof(stdin))
break;
}
/* The user does not wish to supply more lines? */
if (wcscmp(line, L".") == 0)
break;
/* Print the line to the file. */
if (out != NULL) {
fputws(line, out);
fputwc(L'\n', out);
}
/* Tell the user what we read. */
wprintf(L"Received %lu wide characters, %lu of which were letterlike.\n",
(unsigned long)len, (unsigned long)count_letters(line));
fflush(stdout);
}
/* The line buffer is no longer needed, so we can discard it.
* Note that free(NULL) is safe, so we do not need to check.
*/
free(line);
/* I personally also like to reset the variables.
* It helps with debugging, and to avoid reuse-after-free() errors. */
line = NULL;
size = 0;
return EXIT_SUCCESS;
}
The getwline() function above is pretty much at the most complicated end of functions you might need when dealing with localized wide character support. It allows you to read localized input lines without length restrictions, and optionally trims and cleans up (removing control codes and embedded binary zeros) the returned string. It also works fine with both LF and CR-LF (\n and \r\n) newline encodings.
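The newline handling is the simplest part to lift out on its own. A minimal sketch of the same idea as the TRIM_NEWLINE option, applied to a wide string that is already in memory (the name wchomp() is made up here):

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: remove any trailing L'\n' and L'\r' characters
 * from a wide string, in place. This covers both LF and CR-LF line
 * endings. Returns the new length in wide characters. */
size_t wchomp(wchar_t *ws)
{
    size_t len;

    if (!ws)
        return 0;

    len = wcslen(ws);
    while (len > 0 && (ws[len - 1] == L'\n' || ws[len - 1] == L'\r'))
        ws[--len] = L'\0';

    return len;
}
```

This is useful when you read lines with fgetws() into a fixed-size buffer and do not need the rest of the cleanup getwline() offers.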

Try:
setlocale(LC_ALL, "en_US.UTF-8");
You can run locale -a in the terminal to get a full list of locales supported by your system ("en_US.UTF-8" should be supported by most/all UTF-8 supporting systems).
EDIT 1 (alternate spelling)
In the comments, Lee points out that some systems have an alternate spelling, "en_US.utf8" (which surprised me, but we learn new stuff every day).
Since setlocale returns NULL when it fails, you can chain these calls:
    if (!setlocale(LC_ALL, "en_US.UTF-8") && !setlocale(LC_ALL, "en_US.utf8"))
        printf("failed to set locale to UTF-8");
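If the list of spellings grows, the chaining can be written as a loop. This is only a sketch, and set_first_locale() is a made-up name; it returns the first candidate that setlocale() accepts, or NULL if none is.

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: try each locale name in turn for the given
 * category, and return the first one setlocale() accepts.
 * Returns NULL if every candidate is rejected. */
const char *set_first_locale(int category,
                             const char *const names[], size_t count)
{
    size_t i;

    for (i = 0; i < count; i++)
        if (names[i] && setlocale(category, names[i]))
            return names[i];

    return NULL;
}
```

A candidate list could be something like { "en_US.UTF-8", "en_US.utf8", "C.UTF-8" }, although which of these names exist depends entirely on the system (check locale -a). Passing LC_CTYPE instead of LC_ALL would limit the override to character handling and leave the user's language, date, and number formats alone.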
EDIT 2 (finding out if we're using UTF-8)
To find out if the locale is set to UTF-8 (after attempting to set it), you can either check the returned value (NULL means the call failed) or check the locale in use.
Option 1:
    char *result;
    if ((result = setlocale(LC_ALL, "en_US.UTF-8")) == NULL)
        printf("failed to set locale to UTF-8");
Option 2:
    setlocale(LC_ALL, "en_US.UTF-8");        // set
    char *result = setlocale(LC_ALL, NULL);  // review
    /* Accept both spellings of the encoding name. */
    if (!result || (!strstr(result, "UTF-8") && !strstr(result, "utf8")))
        printf("failed to set locale to UTF-8");
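On POSIX systems there is also a more direct way to ask which character encoding the current locale uses: nl_langinfo(CODESET) from <langinfo.h>. The sketch below assumes a POSIX system (this is not available on Windows), and both function names are made up for illustration. The comparison ignores case and dashes, since the codeset name can be spelled "UTF-8", "utf8", and so on.

```c
#include <ctype.h>
#include <langinfo.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the name spells UTF-8 in any of
 * the usual ways ("UTF-8", "utf8", "UTF8", ...), 0 otherwise.
 * Case, dashes, and underscores are ignored. */
int codeset_is_utf8(const char *name)
{
    static const char want[] = "utf8";
    size_t i = 0;

    if (!name)
        return 0;

    for (; *name; name++) {
        const unsigned char c = (unsigned char)*name;
        if (c == '-' || c == '_')
            continue;
        if (i >= sizeof want - 1 || tolower(c) != want[i])
            return 0;
        i++;
    }

    return i == sizeof want - 1;
}

/* Returns 1 if the current locale's character encoding is UTF-8. */
int current_locale_is_utf8(void)
{
    return codeset_is_utf8(nl_langinfo(CODESET));
}
```

Call current_locale_is_utf8() after setlocale(); before any setlocale() call the program is in the "C" locale, whose codeset is normally not UTF-8.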

This is not an answer, but a third, quite complex example, on how to use wide character I/O. This was too long to add to my actual answer to this question.
This example shows how to read and process CSV files (RFC-4180 format, optionally with limited backslash escape support) using wide strings.
The following code is CC0/public domain, so you are free to use it any way you like, even include in your own proprietary projects, but if it breaks anything, you get to keep all the bits and not complain to me. (I'll be happy to include any bug fixes if you find and report them in a comment below, though.)
The logic of the code is robust, however. In particular, it supports universal newlines, all four common newline types: Unix-like LF (\n), old CR LF (\r\n), old Mac CR (\r), and the occasionally encountered weird LF CR (\n\r). There are no built-in limitations wrt. the length of a field, the number of fields in a record, or the number of records in a file. It works very nicely if you need to convert CSV or process CSV input stream-like (field by field or record-by-record), without having to have more than one in memory at one point. If you want to construct structures to describe the records and fields in memory, you'll need to add some scaffolding code for that.
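The universal newline rule itself is tiny. As a sketch on an in-memory wide string (the CSV code applies the same rule to a stream using fgetwc() and ungetwc(); the name newline_length() is made up for this example):

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: if the wide string starts with a newline
 * sequence, return its length in wide characters:
 *   LF (\n) or CR (\r) alone  -> 1
 *   CR LF (\r\n) or LF CR (\n\r) -> 2
 * Returns 0 if the string does not start with a newline. */
size_t newline_length(const wchar_t *s)
{
    if (!s)
        return 0;

    if (s[0] == L'\n')
        return (s[1] == L'\r') ? 2 : 1;

    if (s[0] == L'\r')
        return (s[1] == L'\n') ? 2 : 1;

    return 0;
}
```

Note that two identical characters in a row, like \n\n, are two newlines (an empty line in between), not one.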
Because of universal newline support, when reading input interactively, this program might require two consecutive end-of-inputs (Ctrl+Z in Windows and MS-DOS, Ctrl+D everywhere else), as the first one is usually "consumed" by the csv_next_field() or csv_skip_field() function, and the csv_next_record() function needs to re-read it again to actually detect it. However, you do not normally ask the user to input CSV data interactively, so this should be an acceptable quirk.
#include <stdlib.h>
#include <locale.h>
#include <string.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>
#include <errno.h>
/* RFC-4180 -format CSV file processing using wide input streams.
*
* #define BACKSLASH_ESCAPES if you additionally wish to have
* \\, \a, \b, \t, \n, \v, \f, \r, \", and \, de-escaped to their
* C string equivalents when reading CSV fields.
*/
typedef enum {
CSV_OK = 0,
CSV_END = 1,
CSV_INVALID_PARAMETERS = -1,
CSV_FORMAT_ERROR = -2,
CSV_CHARSET_ERROR = -3,
CSV_READ_ERROR = -4,
CSV_OUT_OF_MEMORY = -5,
} csv_status;
const char *csv_error(const csv_status code)
{
switch (code) {
case CSV_OK: return "No error";
case CSV_END: return "At end";
case CSV_INVALID_PARAMETERS: return "Invalid parameters";
case CSV_FORMAT_ERROR: return "Bad CSV format";
case CSV_CHARSET_ERROR: return "Illegal character in CSV file (incorrect locale?)";
case CSV_READ_ERROR: return "Read error";
case CSV_OUT_OF_MEMORY: return "Out of memory";
default: return "Unknown csv_status code";
}
}
/* Start the next record. Automatically skips any remaining fields in current record.
* Returns CSV_OK if successful, CSV_END if no more records, or a negative CSV_ error code. */
csv_status csv_next_record (FILE *const in);
/* Skip the next field. Returns CSV_OK if successful, CSV_END if no more fields in current record,
* or a negative CSV_ error code. */
csv_status csv_skip_field (FILE *const in);
/* Read the next field. Returns CSV_OK if successful, CSV_END if no more fields in current record,
* or a negative CSV_ error code.
* If this returns CSV_OK, then *dataptr is a dynamically allocated wide string to the field
* contents, space allocated for *sizeptr wide characters; and if lengthptr is not NULL, then
* *lengthptr is the number of wide characters in said wide string. */
csv_status csv_next_field (FILE *const in, wchar_t **const dataptr,
size_t *const sizeptr,
size_t *const lengthptr);
static csv_status internal_skip_quoted(FILE *const in)
{
while (1) {
wint_t wc;
errno = 0;
wc = fgetwc(in);
if (wc == WEOF) {
if (errno == EILSEQ)
return CSV_CHARSET_ERROR;
if (errno)
return CSV_READ_ERROR;
if (ferror(in)) {
errno = EIO;
return CSV_READ_ERROR;
}
errno = 0;
return CSV_FORMAT_ERROR;
}
if (wc == L'"') {
errno = 0;
wc = fgetwc(in);
if (wc == L'"')
continue;
while (wc != WEOF && wc != L'\n' && wc != L'\r' && iswspace(wc)) {
errno = 0;
wc = fgetwc(in);
}
if (wc == WEOF) {
if (errno == EILSEQ)
return CSV_CHARSET_ERROR;
if (errno)
return CSV_READ_ERROR;
if (ferror(in)) {
errno = EIO;
return CSV_READ_ERROR;
}
errno = 0;
return CSV_END;
}
if (wc == L',') {
errno = 0;
return CSV_OK;
}
if (wc == L'\n' || wc == L'\r') {
ungetwc(wc, in);
errno = 0;
return CSV_END;
}
ungetwc(wc, in);
errno = 0;
return CSV_FORMAT_ERROR;
}
#ifdef BACKSLASH_ESCAPES
if (wc == L'\\') {
errno = 0;
wc = fgetwc(in);
if (wc == L'"')
continue;
if (wc == WEOF) {
if (errno == EILSEQ)
return CSV_CHARSET_ERROR;
if (errno)
return CSV_READ_ERROR;
if (ferror(in)) {
errno = EIO;
return CSV_READ_ERROR;
}
errno = 0;
return CSV_END;
}
}
#endif
}
}
static csv_status internal_skip_unquoted(FILE *const in, wint_t wc)
{
while (1) {
if (wc == WEOF) {
if (errno == EILSEQ)
return CSV_CHARSET_ERROR;
if (errno)
return CSV_READ_ERROR;
if (ferror(in)) {
errno = EIO;
return CSV_READ_ERROR;
}
errno = 0;
return CSV_END;
}
if (wc == L',') {
errno = 0;
return CSV_OK;
}
if (wc == L'\n' || wc == L'\r') {
ungetwc(wc, in);
errno = 0;
return CSV_END;
}
#ifdef BACKSLASH_ESCAPES
if (wc == L'\\') {
errno = 0;
wc = fgetwc(in);
if (wc == WEOF) {
if (errno == EILSEQ)
return CSV_CHARSET_ERROR;
if (errno)
return CSV_READ_ERROR;
if (ferror(in)) {
errno = EIO;
return CSV_READ_ERROR;
}
errno = 0;
return CSV_END;
}
}
#endif
errno = 0;
wc = fgetwc(in);
}
}
csv_status csv_next_record(FILE *const in)
{
while (1) {
wint_t wc;
csv_status status;
do {
errno = 0;
wc = fgetwc(in);
} while (wc != WEOF && wc != L'\n' && wc != L'\r' && iswspace(wc));
if (wc == WEOF) {
if (errno == EILSEQ)
return CSV_CHARSET_ERROR;
if (errno)
return CSV_READ_ERROR;
if (ferror(in)) {
errno = EIO;
return CSV_READ_ERROR;
}
errno = 0;
return CSV_END;
}
if (wc == L'\n' || wc == L'\r') {
wint_t next_wc;
errno = 0;
next_wc = fgetwc(in);
if (next_wc == WEOF) {
if (errno == EILSEQ)
return CSV_CHARSET_ERROR;
if (errno)
return CSV_READ_ERROR;
if (ferror(in)) {
errno = EIO;
return CSV_READ_ERROR;
}
errno = 0;
return CSV_END;
}
if ((wc == L'\n' && next_wc == L'\r') ||
(wc == L'\r' && next_wc == L'\n')) {
errno = 0;
return CSV_OK;
}
ungetwc(next_wc, in);
errno = 0;
return CSV_OK;
}
if (wc == L'"')
status = internal_skip_quoted(in);
else
status = internal_skip_unquoted(in, wc);
if (status < 0)
return status;
}
}
csv_status csv_skip_field(FILE *const in)
{
wint_t wc;
if (!in) {
errno = EINVAL;
return CSV_INVALID_PARAMETERS;
} else
if (ferror(in)) {
errno = EIO;
return CSV_READ_ERROR;
}
/* Skip leading whitespace. */
do {
errno = 0;
wc = fgetwc(in);
} while (wc != WEOF && wc != L'\n' && wc != L'\r' && iswspace(wc));
if (wc == L'"')
return internal_skip_quoted(in);
else
return internal_skip_unquoted(in, wc);
}
csv_status csv_next_field(FILE *const in, wchar_t **const dataptr,
size_t *const sizeptr,
size_t *const lengthptr)
{
wchar_t *data;
size_t size;
size_t used = 0; /* length */
wint_t wc;
if (lengthptr)
*lengthptr = 0;
if (!in || !dataptr || !sizeptr) {
errno = EINVAL;
return CSV_INVALID_PARAMETERS;
} else
if (ferror(in)) {
errno = EIO;
return CSV_READ_ERROR;
}
if (*dataptr) {
data = *dataptr;
size = *sizeptr;
} else {
data = NULL;
size = 0;
*sizeptr = 0;
}
/* Skip leading whitespace. */
do {
errno = 0;
wc = fgetwc(in);
} while (wc != WEOF && wc != L'\n' && wc != L'\r' && iswspace(wc));
if (wc == WEOF) {
if (errno == EILSEQ)
return CSV_CHARSET_ERROR;
if (errno)
return CSV_READ_ERROR;
if (ferror(in)) {
errno = EIO;
return CSV_READ_ERROR;
}
errno = 0;
return CSV_END;
}
if (wc == L'\n' || wc == L'\r') {
ungetwc(wc, in);
errno = 0;
return CSV_END;
}
if (wc == L'"')
while (1) {
errno = 0;
wc = getwc(in);
if (wc == WEOF) {
if (errno == EILSEQ)
return CSV_CHARSET_ERROR;
if (errno)
return CSV_READ_ERROR;
if (ferror(in)) {
errno = EIO;
return CSV_READ_ERROR;
}
errno = 0;
return CSV_FORMAT_ERROR;
} else
if (wc == L'"') {
errno = 0;
wc = getwc(in);
if (wc != L'"') {
/* Not an escaped doublequote. */
while (wc != WEOF && wc != L'\n' && wc != L'\r' && iswspace(wc)) {
errno = 0;
wc = getwc(in);
}
if (wc == WEOF) {
if (errno == EILSEQ)
return CSV_CHARSET_ERROR;
if (errno)
return CSV_READ_ERROR;
if (ferror(in)) {
errno = EIO;
return CSV_READ_ERROR;
}
} else
if (wc == L'\n' || wc == L'\r') {
ungetwc(wc, in);
} else
if (wc != L',') {
errno = 0;
return CSV_FORMAT_ERROR;
}
break;
}
#ifdef BACKSLASH_ESCAPES
} else
if (wc == L'\\') {
errno = 0;
wc = getwc(in);
if (wc == L'\0')
continue;
else
if (wc == WEOF) {
if (errno == EILSEQ)
return CSV_CHARSET_ERROR;
if (errno)
return CSV_READ_ERROR;
if (ferror(in)) {
errno = EIO;
return CSV_READ_ERROR;
}
break;
} else
switch (wc) {
case L'a': wc = L'\a'; break;
case L'b': wc = L'\b'; break;
case L't': wc = L'\t'; break;
case L'n': wc = L'\n'; break;
case L'v': wc = L'\v'; break;
case L'f': wc = L'\f'; break;
case L'r': wc = L'\r'; break;
case L'\\': wc = L'\\'; break;
case L'"': wc = L'"'; break;
case L',': wc = L','; break;
default:
ungetwc(wc, in);
wc = L'\\';
}
#endif
}
if (used + 2 > size) {
/* Allocation policy.
* Anything that yields size >= used + 2 is acceptable.
* This one allocates in roughly 1024 byte chunks,
* and is known to be robust (but not optimal) in practice. */
size = (used | 1023) + 1009;
data = realloc(data, size * sizeof data[0]);
if (!data) {
errno = ENOMEM;
return CSV_OUT_OF_MEMORY;
}
*dataptr = data;
*sizeptr = size;
}
data[used++] = wc;
}
else
while (1) {
if (wc == L',')
break;
if (wc == L'\n' || wc == L'\r') {
ungetwc(wc, in);
break;
}
#ifdef BACKSLASH_ESCAPES
if (wc == L'\\') {
errno = 0;
wc = fgetwc(in);
if (wc == WEOF) {
if (errno == EILSEQ)
return CSV_CHARSET_ERROR;
if (errno)
return CSV_READ_ERROR;
if (ferror(in)) {
errno = EIO;
return CSV_READ_ERROR;
}
wc = L'\\';
} else
switch (wc) {
case L'a': wc = L'\a'; break;
case L'b': wc = L'\b'; break;
case L't': wc = L'\t'; break;
case L'n': wc = L'\n'; break;
case L'v': wc = L'\v'; break;
case L'f': wc = L'\f'; break;
case L'r': wc = L'\r'; break;
case L'"': wc = L'"'; break;
case L',': wc = L','; break;
case L'\\': wc = L'\\'; break;
default:
ungetwc(wc, in);
wc = L'\\';
}
}
#endif
if (used + 2 > size) {
/* Allocation policy.
* Anything that yields size >= used + 2 is acceptable.
* This one allocates in roughly 1024 byte chunks,
* and is known to be robust (but not optimal) in practice. */
size = (used | 1023) + 1009;
data = realloc(data, size * sizeof data[0]);
if (!data) {
errno = ENOMEM;
return CSV_OUT_OF_MEMORY;
}
*dataptr = data;
*sizeptr = size;
}
data[used++] = wc;
errno = 0;
wc = getwc(in);
if (wc == WEOF) {
if (errno == EILSEQ)
return CSV_CHARSET_ERROR;
if (errno)
return CSV_READ_ERROR;
if (ferror(in)) {
errno = EIO;
return CSV_READ_ERROR;
}
break;
}
}
/* Ensure there is room for the end-of-string mark. */
if (used >= size) {
size = used + 1;
data = realloc(data, size * sizeof data[0]);
if (!data) {
errno = ENOMEM;
return CSV_OUT_OF_MEMORY;
}
*dataptr = data;
*sizeptr = size;
}
data[used] = L'\0';
if (lengthptr)
*lengthptr = used;
errno = 0;
return CSV_OK;
}
/* Helper function: print a wide string as if in quotes, but backslash-escape special characters.
*/
static void wquoted(FILE *const out, const wchar_t *ws, const size_t len)
{
if (out) {
size_t i;
for (i = 0; i < len; i++)
if (ws[i] == L'\0')
fputws(L"\\0", out);
else
if (ws[i] == L'\a')
fputws(L"\\a", out);
else
if (ws[i] == L'\b')
fputws(L"\\b", out);
else
if (ws[i] == L'\t')
fputws(L"\\t", out);
else
if (ws[i] == L'\n')
fputws(L"\\n", out);
else
if (ws[i] == L'\v')
fputws(L"\\v", out);
else
if (ws[i] == L'\f')
fputws(L"\\f", out);
else
if (ws[i] == L'\r')
fputws(L"\\r", out);
else
if (ws[i] == L'"')
fputws(L"\\\"", out);
else
if (ws[i] == L'\\')
fputws(L"\\\\", out);
else
if (iswprint(ws[i]))
fputwc(ws[i], out);
else
if (ws[i] < 65535)
fwprintf(out, L"\\x%04x", (unsigned int)ws[i]);
else
fwprintf(out, L"\\x%08lx", (unsigned long)ws[i]);
}
}
static int show_csv(FILE *const in, const char *const filename)
{
wchar_t *field_contents = NULL;
size_t field_allocated = 0;
size_t field_length = 0;
unsigned long record = 0UL;
unsigned long field;
csv_status status;
while (1) {
/* First field in this record. */
field = 0UL;
record++;
while (1) {
status = csv_next_field(in, &field_contents, &field_allocated, &field_length);
if (status == CSV_END)
break;
if (status < 0) {
fprintf(stderr, "%s: %s.\n", filename, csv_error(status));
free(field_contents);
return -1;
}
field++;
wprintf(L"Record %lu, field %lu is \"", record, field);
wquoted(stdout, field_contents, field_length);
wprintf(L"\", %lu characters.\n", (unsigned long)field_length);
}
status = csv_next_record(in);
if (status == CSV_END) {
free(field_contents);
return 0;
}
if (status < 0) {
fprintf(stderr, "%s: %s.\n", filename, csv_error(status));
free(field_contents);
return -1;
}
}
}
static int usage(const char *argv0)
{
fprintf(stderr, "\n");
fprintf(stderr, "Usage: %s [ -h | --help | /? ]\n", argv0);
fprintf(stderr, " %s CSV-FILE [ ... ]\n", argv0);
fprintf(stderr, "\n");
fprintf(stderr, "Use special file name '-' to read from standard input.\n");
fprintf(stderr, "\n");
return EXIT_SUCCESS;
}
int main(int argc, char *argv[])
{
FILE *in;
int arg;
setlocale(LC_ALL, "");
fwide(stdin, 1);
fwide(stdout, 1);
if (argc < 2)
return usage(argv[0]);
for (arg = 1; arg < argc; arg++) {
if (!strcmp(argv[arg], "-h") || !strcmp(argv[arg], "--help") || !strcmp(argv[arg], "/?"))
return usage(argv[0]);
if (!strcmp(argv[arg], "-")) {
if (show_csv(stdin, "(standard input)"))
return EXIT_FAILURE;
} else {
in = fopen(argv[arg], "r");
if (!in) {
fprintf(stderr, "%s: %s.\n", argv[arg], strerror(errno));
return EXIT_FAILURE;
}
if (show_csv(in, argv[arg]))
return EXIT_FAILURE;
if (ferror(in)) {
fprintf(stderr, "%s: %s.\n", argv[arg], strerror(EIO));
fclose(in);
return EXIT_FAILURE;
}
if (fclose(in)) {
fprintf(stderr, "%s: %s.\n", argv[arg], strerror(EIO));
return EXIT_FAILURE;
}
}
}
return EXIT_SUCCESS;
}
The use of the above csv_next_field(), csv_skip_field(), and csv_next_record() is quite straightforward.
Open the CSV file normally, then call fwide(stream, 1) on it to tell the C library you intend to use the wide string variants instead of the standard narrow string I/O functions.
Create four variables, and initialize the first two:
wchar_t *field = NULL;
size_t allocated = 0;
size_t length;
csv_status status;
field is a pointer to the dynamically allocated contents of each field you read. It is allocated automatically; essentially, you don't need to worry about it at all. allocated holds the currently allocated size (in wide characters, including terminating L'\0'), and we'll use length and status later.
At this point, you are ready to read or skip the first field in the first record.
Do not call csv_next_record() at this point, unless you wish to skip the very first record in the file entirely.
Call status = csv_skip_field(stream); to skip the next field, or status = csv_next_field(stream, &field, &allocated, &length); to read it.
If status == CSV_OK, you have the field contents in wide string field. It has length wide characters in it.
If status == CSV_END, there were no more fields in the current record. (The field is unchanged, and you should not examine it.)
Otherwise, status < 0, and it describes an error code. You can use csv_error(status) to obtain a (narrow) string describing it.
At any point, you can move (skip) to the start of the next record by calling status = csv_next_record(stream);.
If it returns CSV_OK, there might be a new record available. (We only know when you try to read or skip the first field. This is similar to how standard C library function feof() only tells you whether you have tried to read past the end of input, it does not tell whether there is more data available or not.)
If it returns CSV_END, you already have processed the last record, and there are no more records.
Otherwise, it returns a negative error code, status < 0. You can use csv_error(status) to obtain a (narrow) string describing it.
After you are done, discard the field buffer:
free(field);
field = NULL;
allocated = 0;
You do not actually need to reset the variables to NULL and zero, but I recommend it. In fact, you can do the above at any point (when you are no longer interested in the contents of the current field), as the csv_next_field() will then automatically allocate a new buffer as necessary.
Note that free(NULL); is always safe and does nothing. You do not need to check if field is NULL or not before freeing it. This is also the reason why I recommend initializing the variables immediately when you declare them. It just makes everything so much easier to handle.
The compiled example program takes one or more CSV file names as command-line parameters, then reads the files and reports the contents of each field in the file. If you have a particularly fiendishly complex CSV file, this is a good way to check whether this approach reads all the fields correctly.

Related

Is there any way to change sigaction flags during execution?

I have this child process in an infinite loop, and I want it to stop the loop when it receives SIGUSR1 from the parent pid.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <signal.h>
#include <setjmp.h>
int GameOver = 0;
sigjmp_buf here; // <------- After Joshua's Answer (sigjmp_buf, as sigsetjmp() needs it)
void trataSIGUSR1(int sig, siginfo_t *info, void *extra);
int main(int argc, char** argv){
int someNumber = 0, score = 0;
char word[15],c;
struct sigaction new_action;
// new_action.sa_flags = SA_SIGINFO; // <------- Before Joshua's Answer
new_action.sa_flags = SA_SIGINFO | SA_RESTART; // <------- After Joshua's Answer
new_action.sa_sigaction = &trataSIGUSR1;
sigfillset(&new_action.sa_mask);
if (sigaction(SIGUSR1, &new_action, NULL) == -1){
perror("Error: cannot handle SIGUSR1"); // should not happen
return EXIT_FAILURE;
}
FILE *f;
f = fopen("randomfile.txt", "r");
if (f == NULL){
printf("Error opening file!\n");
return EXIT_FAILURE;
}
// setjmp(here); // <------- After Joshua's Answer
sigsetjmp(here,1); // <-- After wildplasser's Answer
while (!GameOver){
fscanf(f, "%s", word);
printf("\nWord -> %s\n", word);
if(!scanf("%d", &someNumber)){
puts("Invalid Value!");
while ((c = getchar()) != '\n' && c != EOF);
continue;
}
if(someNumber == strlen(word) && !GameOver)
score ++;
if(feof(f)){
printf("\nEnd of file.\n");
break;
}
}
if( GameOver )
puts("\nAcabou o tempo!"); // <-- After wildplasser's Answer
fclose(f);
return score;
}
void trataSIGUSR1(int sig, siginfo_t *info, void *extra){
if (info->si_pid == getppid()){ // only end when parent send SIGUSR1
// puts("\nAcabou o tempo!"); // <-- Before wildplasser's Answer
GameOver = 1;
// longjmp(here,1); // <------- After Joshua's Answer
siglongjmp(here,1); // <---- After wildplasser's Answer
}
}
It works fine, but if I send SIGUSR1 to the child pid from another process, scanf gets interrupted... I want to interrupt the scanf and automatically stop the loop only when the signal comes from the parent, and just ignore it otherwise. Is there any way to change the flags to new_action.sa_flags = SA_RESTART; when the signal comes from another process?!
There are several possibilities, ranging from a huge hack, to proper (but complicated).
The simplest thing is to have the SIGUSR1 from parent reopen standard input to /dev/null. Then, when scanf() fails, instead of complaining and retrying, you can break out of the loop if feof(stdin) is true. Unfortunately, freopen() is not async-signal safe, so this is not a standards (POSIX, in this case) compliant way of doing things.
The standards-compliant way of doing things is to implement your own function that reads an input line into a dynamically allocated string, and detects when the signal handler has set the flag. The flag should also be of volatile sig_atomic_t type, not an int; the volatile in particular tells the compiler that the value may be changed unexpectedly (by the signal handler), so whenever referenced, the compiler must re-read the variable value, instead of remembering it from a previous access. The sig_atomic_t type is an atomic integer type: the process and the signal handler will only ever see either the new or the old value, never a mix of the two, although its valid range may be as small as 0 to 127, inclusive.
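A minimal sketch of such a flag and its handler, assuming a POSIX system; the names game_over, on_signal(), and install_flag_handler() are made up for this illustration. The handler does nothing except set the flag, which keeps it async-signal safe.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the signal handler, read by the main program. */
static volatile sig_atomic_t game_over = 0;

/* The handler only sets the flag; nothing else is done here. */
static void on_signal(int signum)
{
    (void)signum;
    game_over = 1;
}

/* Hypothetical helper: install on_signal() for the given signal,
 * without SA_RESTART, so blocking I/O is interrupted on delivery.
 * Returns 0 if successful, -1 with errno set otherwise. */
int install_flag_handler(int signum)
{
    struct sigaction act;

    memset(&act, 0, sizeof act);
    sigemptyset(&act.sa_mask);
    act.sa_handler = on_signal;
    act.sa_flags = 0;

    return sigaction(signum, &act, NULL);
}
```

The main loop then becomes while (!game_over) { ... }, and because the flag is volatile, the compiler re-reads it on every iteration.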
Signal delivery to an userspace handler (installed without SA_RESTART) does interrupt a blocking I/O operation (like read or write; in the thread used for signal delivery – you only have one, so that will always be used), but it might occur between the flag check and the scanf(), so in this case, it is not reliable.
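The race described above (the signal arriving between the flag check and the blocking call) can also be closed without polling, by keeping the signal blocked and letting pselect() unblock it only while waiting. This is a different technique from the polling program in this answer, shown only as a minimal sketch: the names stop_flag, on_stop(), stop_setup() and wait_readable() are made up for illustration, and the "only accept SIGUSR1 from the parent" check is left out to keep it short.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical names; not part of the original program. */
static volatile sig_atomic_t stop_flag = 0;

static void on_stop(int signum)
{
    (void)signum;
    stop_flag = 1;
}

/* Install the handler and block SIGUSR1 during normal execution.
 * On success, *unblocked is a mask in which SIGUSR1 is NOT blocked;
 * it is only used while waiting inside pselect(). */
static int stop_setup(sigset_t *unblocked)
{
    struct sigaction act;
    sigset_t block;

    memset(&act, 0, sizeof act);
    sigemptyset(&act.sa_mask);
    act.sa_handler = on_stop;
    act.sa_flags = 0;               /* no SA_RESTART */
    if (sigaction(SIGUSR1, &act, NULL) == -1)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, unblocked) == -1)
        return -1;
    sigdelset(unblocked, SIGUSR1);
    return 0;
}

/* Wait until fd is readable.
 * Returns 1 if readable, 0 if the stop flag was set, -1 on error. */
static int wait_readable(int fd, const sigset_t *unblocked)
{
    for (;;) {
        fd_set readable;
        int result;

        /* SIGUSR1 is blocked here, so the flag cannot change
         * between this check and the pselect() call. */
        if (stop_flag)
            return 0;

        FD_ZERO(&readable);
        FD_SET(fd, &readable);
        result = pselect(fd + 1, &readable, NULL, NULL, NULL, unblocked);
        if (result > 0)
            return 1;
        if (result == -1 && errno != EINTR)
            return -1;
    }
}
```

Because the signal can only be delivered while pselect() is waiting, a signal sent at any other moment stays pending and interrupts the next wait, so no poll interval is needed. The actual data is then read with read(), not with stdin functions.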
The proper solution here is to not use stdin at all, and instead use the low-level <unistd.h> I/O for this. Note that it is imperative to not mix stdin/scanf() and low-level I/O for the same stream. You can safely use printf(), fprintf(stdout, ...), fprintf(stderr, ...), and so on. The reason is that the C library internal stdin stream structure will not be updated correctly by our low-level access, and will be out-of-sync with reality if we mix both (for the same stream).
Here is an example program showing one implementation (licensed under Creative Commons Zero v1.0 Universal; do as you wish with it, no guarantees though):
// SPDX-License-Identifier: CC0-1.0
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <time.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>
/* Maximum poll() timeout, in milliseconds, so that done flag is checked often enough.
*/
#ifndef DONE_POLL_INTERVAL_MS
#define DONE_POLL_INTERVAL_MS 100
#endif
static volatile sig_atomic_t done = 0;
static void handle_done(int signum, siginfo_t *info, void *context)
{
/* This silences warnings about context not being used. It does nothing. */
(void)context;
if (signum == SIGUSR1) {
/* SIGUSR1 is only accepted if it comes from the parent process */
if (info->si_pid == getppid())
done = 1;
} else {
/* All other signals are accepted from all processes (that have the necessary privileges) */
done = 1;
}
}
static int install_done(const int signum)
{
struct sigaction act;
memset(&act, 0, sizeof act);
sigemptyset(&(act.sa_mask));
act.sa_sigaction = handle_done;
act.sa_flags = SA_SIGINFO;
return sigaction(signum, &act, NULL);
}
/* Our own input stream structure type. */
struct input {
int descriptor;
char *data;
size_t size;
size_t head;
size_t tail;
};
/* Associating an input stream with a file descriptor.
Do not mix stdin use and input stream on descriptor STDIN_FILENO!
*/
static int input_use(struct input *const in, const int descriptor)
{
/* Check that the parameters are not obviously invalid. */
if (!in || descriptor == -1) {
errno = EINVAL;
return -1;
}
/* Set the descriptor nonblocking. */
{
int flags = fcntl(descriptor, F_GETFL);
if (flags == -1) {
/* errno set by fcntl(). */
return -1;
}
if (fcntl(descriptor, F_SETFL, flags | O_NONBLOCK) == -1) {
/* errno set by fcntl(). */
return -1;
}
}
/* Initialize the stream structure. */
in->descriptor = descriptor;
in->data = NULL;
in->size = 0;
in->head = 0;
in->tail = 0;
/* Success. */
return 0;
}
/* Read until delimiter from an input stream.
* If 'done' is set at any point, will return 0 with errno==EINTR.
* Returns 0 if an error occurs, with errno set.
* Returns 0 with errno==0 when end of input stream.
*/
static size_t input_getdelim(struct input *const in,
int const delim,
char **const dataptr,
size_t *const sizeptr,
const double timeout)
{
const clockid_t timeout_clk = CLOCK_BOOTTIME;
struct timespec then;
/* Verify none of the pointers are NULL. */
if (!in || !dataptr || !sizeptr) {
errno = EINVAL;
return 0;
}
/* Record current time for timeout measurement. */
clock_gettime(timeout_clk, &then);
char *line_data = *dataptr;
size_t line_size = *sizeptr;
/* If (*sizeptr) is zero, then we ignore dataptr value, like getline() does. */
if (!line_size)
line_data = NULL;
while (1) {
struct timespec now;
struct pollfd fds[1];
ssize_t n;
int ms = DONE_POLL_INTERVAL_MS;
/* Done flag set? */
if (done) {
errno = EINTR;
return 0;
}
/* Is there a complete line in the input buffer? */
if (in->tail > in->head) {
const char *ptr = memchr(in->data + in->head, delim, in->tail - in->head);
if (ptr) {
const size_t len = ptr - (in->data + in->head);
if (len + 2 > line_size) {
/* Since we do not have any meaningful data in line_data,
and it would be overwritten anyway if there was,
instead of reallocating it we just free and allocate it. */
free(line_data); /* Note: free(NULL) is safe. */
line_size = len + 2;
line_data = malloc(line_size);
if (!line_data) {
/* Oops, we lost the buffer. */
*dataptr = NULL;
*sizeptr = 0;
errno = ENOMEM;
return 0;
}
*dataptr = line_data;
*sizeptr = line_size;
}
/* Copy the line, including the separator, */
memcpy(line_data, in->data + in->head, len + 1);
/* add a terminating nul char, */
line_data[len + 1] = '\0';
/* and update stream buffer state. */
in->head += len + 1;
return len + 1;
}
/* No, we shall read more data. Prepare the buffer. */
if (in->head > 0) {
memmove(in->data, in->data + in->head, in->tail - in->head);
in->tail -= in->head;
in->head = 0;
}
} else {
/* Input buffer is empty. */
in->head = 0;
in->tail = 0;
}
/* Do we need to grow input stream buffer? */
if (in->tail >= in->size) {
/* TODO: Better buffer size growth policy! */
const size_t size = (in->tail + 65535) | 65537;
char *data;
data = realloc(in->data, size);
if (!data) {
errno = ENOMEM;
return 0;
}
in->data = data;
in->size = size;
}
/* Try to read additional data. It is imperative that the descriptor
has been marked nonblocking, as otherwise this will block. */
n = read(in->descriptor, in->data + in->tail, in->size - in->tail);
if (n > 0) {
/* We read more data without blocking. */
in->tail += n;
continue;
} else
if (n == 0) {
/* End of input mark (Ctrl+D at the beginning of line, if a terminal) */
const size_t len = in->tail - in->head;
if (len < 1) {
/* No data buffered, read end of input. */
if (line_size < 1) {
line_size = 1;
line_data = malloc(line_size);
if (!line_data) {
errno = ENOMEM;
return 0;
}
*dataptr = line_data;
*sizeptr = line_size;
}
line_data[0] = '\0';
errno = 0;
return 0;
}
if (len + 1 > line_size) {
/* Since we do not have any meaningful data in line_data,
and it would be overwritten anyway if there was,
instead of reallocating it we just free and allocate it. */
free(line_data); /* Note: free(NULL) is safe. */
line_size = len + 1;
line_data = malloc(line_size);
if (!line_data) {
/* Oops, we lost the buffer. */
*dataptr = NULL;
*sizeptr = 0;
errno = ENOMEM;
return 0;
}
*dataptr = line_data;
*sizeptr = line_size;
}
memmove(line_data, in->data, len);
line_data[len] = '\0';
in->head = 0;
in->tail = 0;
return len; /* Final line without a trailing delimiter. */
} else
if (n != -1) {
/* This should never occur; it would be a C library bug. */
errno = EIO;
return 0;
} else {
const int err = errno;
if (err != EAGAIN && err != EWOULDBLOCK && err != EINTR)
return 0;
/* EAGAIN, EWOULDBLOCK, and EINTR are not real errors. */
}
/* Nonblocking operation, with timeout == 0.0? */
if (timeout == 0.0) {
errno = ETIMEDOUT;
return 0;
} else
if (timeout > 0.0) {
/* Obtain current time. */
clock_gettime(timeout_clk, &now);
/* Elapsed time in milliseconds; timeout is in milliseconds, too. */
const double elapsed = 1000.0 * (double)(now.tv_sec - then.tv_sec)
+ (double)(now.tv_nsec - then.tv_nsec) / 1000000.0;
/* Timed out? */
if (elapsed >= timeout) {
errno = ETIMEDOUT;
return 0;
}
if (timeout - elapsed < (double)DONE_POLL_INTERVAL_MS) {
ms = (int)(timeout - elapsed);
if (ms < 1) {
errno = ETIMEDOUT;
return 0;
}
}
}
/* Negative timeout values means no timeout check,
and ms retains its initialized value. */
/* Another done check; it's cheap. */
if (done) {
errno = EINTR;
return 0;
}
/* Wait for input, but not longer than ms milliseconds. */
fds[0].fd = in->descriptor;
fds[0].events = POLLIN;
fds[0].revents = 0;
poll(fds, 1, ms);
/* We don't actually care about the result at this point. */
}
/* Never reached. */
}
static inline size_t input_getline(struct input *const in,
char **const dataptr,
size_t *const sizeptr,
const double timeout)
{
return input_getdelim(in, '\n', dataptr, sizeptr, timeout);
}
int main(void)
{
struct input in;
char *line = NULL;
size_t size = 0;
size_t len;
if (install_done(SIGINT) == -1 ||
install_done(SIGHUP) == -1 ||
install_done(SIGTERM) == -1 ||
install_done(SIGUSR1) == -1) {
fprintf(stderr, "Cannot install signal handlers: %s.\n", strerror(errno));
return EXIT_FAILURE;
}
if (input_use(&in, STDIN_FILENO)) {
fprintf(stderr, "BUG in input_use(): %s.\n", strerror(errno));
return EXIT_FAILURE;
}
while (!done) {
/* Wait for input for five seconds. */
len = input_getline(&in, &line, &size, 5000);
if (len > 0) {
/* Remove the newline at end, if any. */
line[strcspn(line, "\n")] = '\0';
printf("Received: \"%s\" (%zu chars)\n", line, len);
fflush(stdout);
continue;
} else
if (errno == 0) {
/* This is the special case: input_getline() returns 0 with
errno == 0 when there is no more input. */
fprintf(stderr, "End of standard input.\n");
return EXIT_SUCCESS;
} else
if (errno == ETIMEDOUT) {
printf("(No input for five seconds.)\n");
fflush(stdout);
} else
if (errno == EINTR) {
/* Break or continue works here, since input_getline() only
returns 0 with errno==EINTR if done==1. */
break;
} else {
fprintf(stderr, "Error reading from standard input: %s.\n", strerror(errno));
return EXIT_FAILURE;
}
}
printf("Signal received; done.\n");
return EXIT_SUCCESS;
}
Save it as e.g. example.c, compile using e.g. gcc -Wall -Wextra -O2 example.c -o example, and run using ./example. Type input and enter to supply lines, or Ctrl+D at the beginning of a line to end input, or Ctrl+C to send the process a SIGINT signal.
Note the compile-time constant DONE_POLL_INTERVAL_MS. If the signal is delivered between a done check and poll(), this is the maximum delay, in milliseconds (1000ths of a second), that the poll may block; it is therefore roughly the maximum delay between receiving the signal and acting upon it.
To make the example more interesting, it also implements a timeout on reading a full line. The above example prints a message when the timeout is reached, but that messes up how the user sees the input they're typing. (It does not affect the input.)
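The interplay between the line timeout and the poll interval can be isolated into a small pure function. The following is only a sketch: next_wait_ms() is a hypothetical helper that is not part of the program above, and it assumes the timeout and the elapsed time are both given in milliseconds (matching the 5000 passed for "five seconds" in main()).

```c
/* Hypothetical helper, not part of the example program.
 * Returns how many milliseconds the next poll() may block,
 * or 0 if the timeout has already expired.
 * A negative timeout_ms means "no timeout": always wait a full interval. */
static int next_wait_ms(double timeout_ms, double elapsed_ms, int interval_ms)
{
    double left;

    if (timeout_ms < 0.0)
        return interval_ms;

    left = timeout_ms - elapsed_ms;
    if (left < 1.0)
        return 0;                   /* less than 1 ms left: timed out */
    if (left < (double)interval_ms)
        return (int)left;           /* wait only for the remainder */
    return interval_ms;             /* wake up to re-check the done flag */
}
```

The wait is never longer than the interval, so the done flag is re-checked at least that often, and never longer than the time left before the timeout.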
This is by no means a perfect example of such functions, but I hope it is a readable one, with the comments explaining the reasoning behind each code block.
Historically we solved this problem by always setting SA_RESTART and calling longjmp() to get out of the signal handler when the condition is met.
The standard makes this undefined but I think this does the right thing when stdin is connected to the keyboard. Don't try it with redirected handles. It won't work well. At least you can check for this condition with isatty(0).
If it doesn't work and you are bent on using signals like this, you'll need to abandon scanf() and friends and get all your input using read().

Can the Pagemap folder of processes in the Linux kernel be read(64bit per read) a finite number of times?

I'm trying to keep track of the number of writes per physical page using the file "/proc/PID/pagemap". But the file is binary, the size shown in the file properties is 0, and the following code reports 0 as well.
struct stat buf;
int iRet = fstat(fd, &buf);
if(iRet == -1)
{
perror("fstat error");
exit(-1);
}
printf("the size of file is : %ld\n", buf.st_size);
I wrote a monitor program that reads a process's "pagemap" 64 bits at a time and records bit 55 (the soft-dirty bit) to check whether a page has been written. Of course, before doing this I cleared all soft-dirty bits in the process's pagemap. This method is provided by the Linux kernel. My problem is that when I use a file descriptor (I also tried an fstream pointer) to get the data from pagemap, my reading of pagemap ends only when the process I'm monitoring finishes, as if the file were infinite. I know the process's logical address management is dynamic, but I want to know how I could count the writes properly. Should I read a part of this infinite file at fixed time intervals? And how many items should I read? T_T
You need something like the following:
#define _GNU_SOURCE
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>
struct pagemap_region {
struct pagemap_region *next;
uintptr_t addr; /* First address within region */
uintptr_t ends; /* First address after region */
size_t pages; /* Number of pages in this region */
uint64_t page[]; /* 64-bit pagemap flags per page */
};
static void free_pagemaps(struct pagemap_region *list)
{
while (list) {
struct pagemap_region *curr = list;
list = curr->next;
curr->addr = 0;
curr->ends = 0;
curr->pages = 0;
free(curr);
}
}
struct pagemap_region *get_pagemaps(const pid_t pid)
{
struct pagemap_region *list = NULL;
size_t page;
char *line_ptr = NULL;
size_t line_max = 256;
ssize_t line_len;
FILE *maps;
int n, fd;
page = sysconf(_SC_PAGESIZE);
/* We reuse this for the input line buffer. */
line_ptr = malloc(line_max);
if (!line_ptr) {
errno = ENOMEM;
return NULL;
}
/* First, fill it with the path to the map pseudo-file. */
if (pid > 0)
n = snprintf(line_ptr, line_max, "/proc/%d/maps", (int)pid);
else
n = snprintf(line_ptr, line_max, "/proc/self/maps");
if (n < 0 || (size_t)n + 1 >= line_max) {
free(line_ptr);
errno = EINVAL;
return NULL;
}
/* Read the maps pseudo-file. */
maps = fopen(line_ptr, "re"); /* Read-only, close-on-exec */
if (!maps) {
free(line_ptr);
errno = ESRCH;
return NULL;
}
while (1) {
struct pagemap_region *curr;
unsigned long addr, ends;
size_t pages;
char *ptr, *end;
line_len = getline(&line_ptr, &line_max, maps);
if (line_len < 0)
break;
/* Start address of the region. */
end = ptr = line_ptr;
errno = 0;
addr = strtoul(ptr, &end, 16);
if (errno || end == ptr || *end != '-')
break;
/* End address of the region. */
ptr = ++end;
errno = 0;
ends = strtoul(ptr, &end, 16);
if (errno || end == ptr || *end != ' ')
break;
/* Number of pages in the region. */
pages = (ends - addr) / page;
if (addr + page * pages != ends || (addr % page) != 0)
break;
/* Allocate new region map. */
curr = malloc(sizeof (struct pagemap_region) + pages * sizeof curr->page[0]);
if (!curr)
break;
curr->addr = addr;
curr->ends = ends;
curr->pages = pages;
/* Prepend to the region list. */
curr->next = list;
list = curr;
}
/* Any issues when reading the maps pseudo-file? */
if (!feof(maps) || ferror(maps)) {
fclose(maps);
free(line_ptr);
free_pagemaps(list);
errno = EIO;
return NULL;
} else
if (fclose(maps)) {
free(line_ptr);
free_pagemaps(list);
errno = EIO;
return NULL;
}
/* Reuse the line buffer for the pagemap pseudo-file path */
if (pid > 0)
n = snprintf(line_ptr, line_max, "/proc/%d/pagemap", (int)pid);
else
n = snprintf(line_ptr, line_max, "/proc/self/pagemap");
if (n < 0 || (size_t)n + 1 >= line_max) {
free(line_ptr);
free_pagemaps(list);
errno = ENOMEM;
return NULL;
}
do {
fd = open(line_ptr, O_RDONLY | O_NOCTTY | O_CLOEXEC);
} while (fd == -1 && errno == EINTR);
if (fd == -1) {
n = errno;
free(line_ptr);
free_pagemaps(list);
errno = n;
return NULL;
}
/* Path no longer needed. */
free(line_ptr);
line_ptr = NULL;
line_max = 0;
/* Read each pagemap section. */
for (struct pagemap_region *curr = list; curr != NULL; curr = curr->next) {
off_t offset = (size_t)(curr->addr / page) * (sizeof curr->page[0]);
unsigned char *ptr = (unsigned char *)&(curr->page[0]);
size_t need = curr->pages * sizeof curr->page[0];
ssize_t bytes;
while (need > 0) {
bytes = pread(fd, ptr, need, offset);
if (bytes > 0 && (size_t)bytes >= need)
break;
else
if (bytes > 0) {
ptr += bytes;
offset += bytes;
need -= bytes;
} else
if (bytes == 0) {
/* Assume this is a region we can't access, like [VSYSCALL]; clear the rest of the bits. */
memset(ptr, 0, need);
break;
} else
if (bytes != -1 || errno != EINTR) {
close(fd);
free_pagemaps(list);
errno = EIO;
return NULL;
}
}
}
if (close(fd) == -1) {
free_pagemaps(list);
errno = EIO;
return NULL;
}
return list;
}
int main(int argc, char *argv[])
{
struct pagemap_region *list, *curr;
long pid;
char *end;
if (argc != 2 || !strcmp(argv[1], "-h") || !strcmp(argv[1], "--help")) {
const char *argv0 = (argc > 0 && argv && argv[0]) ? argv[0] : "(this)";
fprintf(stderr, "\n");
fprintf(stderr, "Usage: %s [ -h | --help ]\n", argv0);
fprintf(stderr, " %s PID\n", argv0);
fprintf(stderr, "\n");
fprintf(stderr, "This program prints a map of the pages of process PID;\n");
fprintf(stderr, "R for pages in RAM, S for pages in swap space, and . for others.\n");
fprintf(stderr, "You can use -1 for the PID of this process itself.\n");
fprintf(stderr, "\n");
return EXIT_SUCCESS;
}
end = argv[1];
errno = 0;
pid = strtol(argv[1], &end, 10);
if (errno || end == argv[1] || *end) {
fprintf(stderr, "%s: Invalid PID.\n", argv[1]);
return EXIT_FAILURE;
}
if (pid != -1 && (pid < 1 || (long)(pid_t)pid != pid)) {
fprintf(stderr, "%s: Not a valid PID.\n", argv[1]);
return EXIT_FAILURE;
}
list = get_pagemaps(pid);
if (!list) {
fprintf(stderr, "%s.\n", strerror(errno));
return EXIT_FAILURE;
}
for (curr = list; curr != NULL; curr = curr->next) {
printf("Region %p - %p: %zu pages\n", (void *)(curr->addr), (void *)(curr->ends), curr->pages);
for (uint64_t *map = curr->page; map < curr->page + curr->pages; map++) {
if ((*map >> 63) & 1)
putchar('R');
else
if ((*map >> 62) & 1)
putchar('S');
else
putchar('.');
}
putchar('\n');
}
return EXIT_SUCCESS;
}
We read /proc/PID/maps line by line, and construct a struct pagemap_region for each; this contains the start address, the end address, and the number of pages in the region. (I didn't bother to support huge pages, though; if you do, consider parsing /proc/PID/smaps instead. If a line begins with a 0-9 or lowercase a-f, it specifies a region; otherwise the line begins with a capital letter A-Z and specifies a property of that region.)
Each struct pagemap_region also contains room for the 64-bit pagemap value per page. After the regions have been found/chosen – this one tries all –, the /proc/PID/pagemap file is opened, and the corresponding data read from the proper location using pread(), which works like read(), but also takes the file offset as an extra parameter.
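The offset arithmetic and the per-entry flag bits can be pulled out into small helpers. This is a sketch only; the function names are made up for illustration. It uses the bits the program above tests (63 for "present in RAM", 62 for "swapped") plus bit 55, the soft-dirty bit the question asks about.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper names; not part of the program above. */

/* Byte offset in /proc/PID/pagemap of the entry for virtual address addr:
 * one 64-bit entry per page, indexed by virtual page number. */
static uint64_t pagemap_offset(uintptr_t addr, size_t page_size)
{
    return (uint64_t)(addr / page_size) * sizeof (uint64_t);
}

static int entry_present(uint64_t entry)    { return (int)((entry >> 63) & 1); }
static int entry_swapped(uint64_t entry)    { return (int)((entry >> 62) & 1); }
static int entry_soft_dirty(uint64_t entry) { return (int)((entry >> 55) & 1); }

/* Count the entries that have the soft-dirty bit set, for example over
 * the page[] array of one struct pagemap_region. */
static size_t count_soft_dirty(const uint64_t *entries, size_t count)
{
    size_t dirty = 0;
    for (size_t i = 0; i < count; i++)
        if (entry_soft_dirty(entries[i]))
            dirty++;
    return dirty;
}
```

With the regions read as above, calling count_soft_dirty(curr->page, curr->pages) for each region gives the number of pages written since the soft-dirty bits were last cleared. Repeating the clear-then-read cycle at fixed intervals is one way to avoid treating pagemap as an endless stream.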
Not all regions are accessible. I do believe [VSYSCALL] is one of those, but being a kernel-userspace interface, its pagemap bits are uninteresting anyway. Instead of removing such regions from the list, the above just clears the bits to zero.
This is not intended as a "do it exactly like this, just copy and paste this" answer, but as a suggestion of how to start going about this, perhaps exploring a bit, comparing the results or behaviour to your particular needs; a sort of a rough outline for an initial suggestion only.
Also, as I wrote it in a single sitting, it's likely got nasty bugs in it. (If I knew where or for sure, I'd fix them; it's just that bugs happen.)

My program hangs and won't break out

I am trying to run the LIST command to display the files. When I run it, it displays all the files like I want it to, but then it just hangs there and doesn't break back to the menu. The last 3 characters of the list are always a newline followed by a period and then a newline, so I put that in an if statement to break out and close the socket, but it doesn't work. Am I missing something?
case 'l':
case 'L':
//Handle L case
sprintf(buff, "LIST\n");
send(sockfd, buff, 1000, 0);
int length = strlen(buff);
while ((rsize = recv(sockfd, buff, 1000, 0)) > 0)
{
fwrite(buff, rsize, 1, stdout);
if ( buff[length-3] == '\n' && buff[length-2] == '.' && buff[length-1] == '\n' )
{
break;
}
}
close(sockfd);
break;
Here's your problem:
if ( buff[length-3] ...
length comes from strlen(buff), and buff, at that point, contains the data you sent, not the data you received, so buff[length-3] is probably not even close to the end of your input data, which could be up to 1000 characters long.
You should be concentrating here on rsize, which is the number of bytes you received, rather than length.
EDIT: As was once mentioned in the comments (EDIT 2: and now in a separate answer), you're going to run into problems here any time recv() either unexpectedly stops in the middle of your end-of-line sequence, and particularly if it stops after having read less than three characters, since then you'll be illegally using a negative index to your array. It would be better to write a function to read an entire line from the socket and store it in your buffer, and then just call if ( !strcmp(buffer, ".") ) to know when you're done.
Here's an example:
#include <stdbool.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
ssize_t socket_readline(const int socket, char * buffer, const size_t max_len) {
ssize_t num_read, total_read = 0;
size_t index = 0;
bool finished = false;
memset(buffer, 0, max_len);
while ( !finished && index < (max_len - 1) ) {
num_read = read(socket, &buffer[index], 1);
if ( num_read == -1 ) {
if ( errno == EINTR ) {
continue; /* Interrupted by signal, so retry the same position */
}
else {
return -1; /* Other read() error, return error code */
}
}
else if ( num_read == 0 ) {
finished = true; /* Peer closed the connection, so stop */
}
else {
if ( buffer[index] == '\n' ) {
buffer[index] = '\0'; /* Remove newline */
finished = true; /* End of line, so stop */
}
else {
++total_read;
++index;
}
}
}
return total_read;
}
Using a system call for each individual character is a bit of an overhead, but if you don't do that you're going to have to store the additional characters you read somewhere, so unless you want to write your own buffering facilities, that's the best option.
As an aside, you should also be checking the return from send() (and from all system calls, for that matter), since that's not guaranteed to send all your characters in one go, and you may need additional tries.
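A loop that keeps calling send() until everything has gone out might look like the following. This is a minimal sketch; send_all() is a made-up helper name, and it assumes a blocking stream socket.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: send exactly len bytes on a blocking socket.
 * Returns 0 on success, -1 on error (with errno set by send()). */
static int send_all(int sockfd, const char *data, size_t len)
{
    while (len > 0) {
        ssize_t n = send(sockfd, data, len, 0);
        if (n == -1) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        data += n;                  /* skip what was already sent */
        len  -= (size_t)n;
    }
    return 0;
}
```

For the command in the question this would be send_all(sockfd, "LIST\n", 5), which sends only the five bytes of the command instead of the whole 1000-byte buffer.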
You can't rely on rsize by itself. Think of what happens if one call to recv() ends on the first '\n', and then the next recv() receives the '.'. Or if recv() does not receive >=3 bytes to begin with. You would not be able to check for "\n.\n" in a single if statement like you are trying to do.
What you really should be doing instead is reading the socket data into a buffer until a '\n' is encountered (do not store it in the buffer), then process the buffer as needed and clear it, then repeat until the buffer contains only '.' by itself.
Try something like this:
case 'l':
case 'L':
{
//Handle L case
int linecapacity = 1000;
char *line = (char*) malloc(linecapacity);
if (line)
{
int linelength = 0;
if (send(sockfd, "LIST\n", 5, 0) == 5)
{
bool stop = false;
while (!stop)
{
rsize = recv(sockfd, buff, 1000, 0);
if (rsize <= 0) break;
fwrite(buff, rsize, 1, stdout);
char *start = buff;
char *end = &buff[rsize];
while ((start < end) && (!stop))
{
char *ptr = (char*) memchr(start, '\n', end-start);
if (!ptr) ptr = end;
int length = (int)(ptr - start);
int needed = (linelength + length);
if (needed > linecapacity)
{
char *newline = realloc(line, needed);
if (!newline)
{
stop = true;
break;
}
line = newline;
linecapacity = needed;
}
memcpy(&line[linelength], start, length);
linelength += length;
if (ptr == end)
break; // no '\n' yet: keep the partial line and wait for the next recv()
if ((linelength == 1) && (line[0] == '.'))
{
stop = true;
break;
}
// process line up to linelength characters as needed...
linelength = 0;
start = ptr + 1;
}
}
}
free(line);
}
close(sockfd);
break;
}
Alternatively:
case 'l':
case 'L':
{
//Handle L case
int linecapacity = 1000;
char *line = (char*) malloc(linecapacity);
if (line)
{
int linelength = 0;
if (send(sockfd, "LIST\n", 5, 0) == 5)
{
char ch;
while (true)
{
rsize = recv(sockfd, &ch, 1, 0);
if (rsize < 1) break;
fwrite(&ch, 1, 1, stdout);
if (ch == '\n')
{
if ((linelength == 1) && (line[0] == '.'))
break;
// process line up to linelength characters as needed...
linelength = 0;
}
else
{
if (linelength == linecapacity)
{
char *newline = realloc(line, linecapacity + 1000);
if (!newline)
break;
line = newline;
linecapacity += 1000;
}
line[linelength++] = ch;
}
}
}
free(line);
}
close(sockfd);
break;
}
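The line-splitting logic in both versions can also be kept separate from the socket calls, which makes it easy to check that a terminator split across two recv() calls is still detected. The following is only a sketch: struct line_scanner, scanner_feed() and the fixed LINE_CAPACITY are assumptions made for illustration, not part of the code above.

```c
#include <stddef.h>

#define LINE_CAPACITY 1000          /* hypothetical fixed line limit */

struct line_scanner {
    char   line[LINE_CAPACITY];     /* current, possibly partial, line */
    size_t length;                  /* bytes stored in line[] */
    size_t lines_seen;              /* complete lines before the terminator */
};

/* Feed one chunk, exactly as received from recv().
 * Returns 1 once a line consisting only of "." has been seen,
 * 0 if more data is needed, -1 if a line exceeds LINE_CAPACITY. */
static int scanner_feed(struct line_scanner *s, const char *chunk, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        if (chunk[i] == '\n') {
            if (s->length == 1 && s->line[0] == '.')
                return 1;           /* terminator line */
            /* process s->line[0 .. s->length-1] here as needed */
            s->lines_seen++;
            s->length = 0;
        } else {
            if (s->length >= LINE_CAPACITY)
                return -1;
            s->line[s->length++] = chunk[i];
        }
    }
    return 0;                       /* partial line is kept for the next chunk */
}
```

The receive loop then becomes: zero-initialize a struct line_scanner, call recv() into buff, pass buff and rsize to scanner_feed(), and stop when it returns nonzero or recv() returns 0 or -1.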

Linux C read file UNICODE formatted text (notepad Windows)

Is there a way to read a text file, under Linux with C, saved on Windows as "UNICODE" with notepad?
The text in Linux with nano editor looks like:
��T^#e^#s^#t^#
^#
but under vi editor is read properly as:
Test
I must specify the text is normal strings ANSI (no Unicode characters or foreign languages related).
Tried like this but no result:
#include <stdio.h>
#include <wchar.h>
#include <locale.h>
int main() {
char *loc = setlocale(LC_ALL, 0);
setlocale(LC_ALL, loc);
FILE * f = fopen("unicode.txt", "r");
wint_t c;
while((c = fgetwc(f)) != WEOF) {
wprintf(L"%lc\n", c);
}
return 0;
}
UPDATE:
Forgot to mention the file format is Little-endian UTF-16 Unicode text or UTF-16LE
Include <wchar.h>, set a UTF-8 locale (setlocale(LC_ALL, "en_US.UTF-8") is fine), open the file or stream in byte-oriented mode (handle=fopen(filename, "rb"), fwide(handle,-1), i.e. in not-wide mode). Then you can use
wint_t getwc_utf16le(FILE *const in)
{
int lo, hi, code;
if ((lo = getc(in)) == EOF)
return WEOF;
if ((hi = getc(in)) == EOF)
return lo; /* Or abort; input sequence ends prematurely */
code = lo + 256 * hi;
if (code < 0xD800 || code > 0xDBFF)
return code; /* Or abort; input sequence is not UTF16-LE */
if ((lo = getc(in)) == EOF)
return code; /* Or abort; input sequence ends prematurely */
if ((hi = getc(in)) == EOF) {
ungetc(lo, in);
return code; /* Or abort; input sequence ends prematurely */
}
/* Note: if ((lo + 256*hi) < 0xDC00 || (lo + 256*hi) > 0xDFFF)
* the input sequence is not valid UTF16-LE. */
return 0x10000 + ((code & 0x3FF) << 10) + ((lo + 256 * hi) & 0x3FF);
}
to read code points from such an input file, assuming it contains UTF16-LE data.
The above function is more permissive than strictly necessary, but it does parse all UTF16-LE I could throw at it (including the sometimes problematic U+100000..U+10FFFF code points), so if the input is correct, this function should handle it just fine.
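The surrogate-pair arithmetic in the last line of that function can be checked on its own. Here is a small sketch of a stricter variant that rejects invalid pairs instead of accepting them; utf16_combine() is a made-up name, not a library function.

```c
/* Hypothetical helper: combine a UTF-16 surrogate pair into a code point.
 * Returns the code point (U+10000..U+10FFFF), or -1 if the two units
 * do not form a valid high/low surrogate pair. */
static long utf16_combine(unsigned int hi, unsigned int lo)
{
    if (hi < 0xD800 || hi > 0xDBFF)
        return -1;                  /* not a high (leading) surrogate */
    if (lo < 0xDC00 || lo > 0xDFFF)
        return -1;                  /* not a low (trailing) surrogate */
    return 0x10000L + ((long)(hi & 0x3FF) << 10) + (long)(lo & 0x3FF);
}
```

For example, the pair 0xD83D 0xDE00 combines to U+1F600, and the largest pair 0xDBFF 0xDFFF gives U+10FFFF.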
Because the locale is set to UTF-8 in Linux, and Linux implementations support the full Unicode set, the code points match the ones produced by above functions, and you can safely use wide character functions (from <wchar.h>) to handle the input.
Often the first character in the file is the BOM ("byte-order mark"), U+FEFF. You can ignore it if it is the first character in the file; elsewhere it is the zero-width non-breaking space. In UTF-16LE the BOM is stored as the two bytes 0xFF 0xFE, and in my experience, those two bytes at the start of a file that is supposed to be text are a quite reliable indicator that the file is UTF-16LE. (So, you could peek at the first two bytes, and if they match, assume it is UTF-16LE.)
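Peeking at the first two bytes could be sketched like this. The enum and function names are made up for illustration, and the check is only the heuristic described above: a UTF-32LE file also starts with 0xFF 0xFE (followed by two zero bytes), which this sketch does not try to tell apart.

```c
#include <stddef.h>

/* Hypothetical names, for illustration only. */
enum bom_kind { BOM_NONE = 0, BOM_UTF16LE, BOM_UTF16BE };

/* Inspect the first bytes of a file for a UTF-16 byte-order mark. */
static enum bom_kind detect_utf16_bom(const unsigned char *buf, size_t len)
{
    if (len >= 2) {
        if (buf[0] == 0xFF && buf[1] == 0xFE)
            return BOM_UTF16LE;     /* U+FEFF stored low byte first */
        if (buf[0] == 0xFE && buf[1] == 0xFF)
            return BOM_UTF16BE;     /* U+FEFF stored high byte first */
    }
    return BOM_NONE;
}
```

If the result is BOM_UTF16LE, skip those two bytes and read the rest with getwc_utf16le().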
Remember that wide-character end-of-file is WEOF, not EOF.
Hope this helps.
Edited 20150505: Here is a helper function one could use instead, to read inputs (using low-level unistd.h interface), converting to UTF-8: read_utf8.h:
#ifndef READ_UTF8_H
#define READ_UTF8_H
/* Read input from file descriptor fd,
* convert it to UTF-8 (using "UTF8//TRANSLIT" iconv conversion),
* and appending to the specified buffer.
* (*dataptr) points to a dynamically allocated buffer (may reallocate),
* (*sizeptr) points to the size allocated for that buffer,
* (*usedptr) points to the amount of data already in the buffer.
* You may initialize the values to NULL,0,0, in which case they will
* be dynamically allocated as needed.
*/
int read_utf8(char **dataptr, size_t *sizeptr, size_t *usedptr, const int fd, const char *const charset);
#endif /* READ_UTF8_H */
read_utf8.c:
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <iconv.h>
#include <string.h>
#include <errno.h>
#define INPUT_CHUNK 16384
#define OUTPUT_CHUNK 8192
int read_utf8(char **dataptr, size_t *sizeptr, size_t *usedptr, const int fd, const char *const charset)
{
char *data;
size_t size;
size_t used;
char *input_data;
size_t input_size, input_head, input_tail;
int input_more;
iconv_t conversion = (iconv_t)-1;
if (!dataptr || !sizeptr || !usedptr || fd == -1 || !charset || !*charset)
return errno = EINVAL;
if (*dataptr) {
data = *dataptr;
size = *sizeptr;
used = *usedptr;
if (used > size)
return errno = EINVAL;
} else {
data = NULL;
size = 0;
used = 0;
}
conversion = iconv_open("UTF8//TRANSLIT", charset);
if (conversion == (iconv_t)-1)
return errno = ENOTSUP;
input_size = INPUT_CHUNK;
input_data = malloc(input_size);
if (!input_data) {
iconv_close(conversion);
return errno = ENOMEM;
}
input_head = 0;
input_tail = 0;
input_more = 1;
while (1) {
if (input_tail > input_head) {
if (input_head > 0) {
memmove(input_data, input_data + input_head, input_tail - input_head);
input_tail -= input_head;
input_head = 0;
}
} else {
input_head = 0;
input_tail = 0;
}
if (input_more && input_tail < input_size) {
ssize_t n;
do {
n = read(fd, input_data + input_tail, input_size - input_tail);
} while (n == (ssize_t)-1 && errno == EINTR);
if (n > (ssize_t)0)
input_tail += n;
else
if (n == (ssize_t)0)
input_more = 0;
else
if (n != (ssize_t)-1) {
free(input_data);
iconv_close(conversion);
return errno = EIO;
} else {
const int errcode = errno;
free(input_data);
iconv_close(conversion);
return errno = errcode;
}
}
if (input_head == 0 && input_tail == 0)
break;
if (used + OUTPUT_CHUNK > size) {
size = (used / (size_t)OUTPUT_CHUNK + (size_t)2) * (size_t)OUTPUT_CHUNK;
data = realloc(data, size);
if (!data) {
free(input_data);
iconv_close(conversion);
return errno = ENOMEM;
}
*dataptr = data;
*sizeptr = size;
}
{
char *source_ptr = input_data + input_head;
size_t source_len = input_tail - input_head;
char *target_ptr = data + used;
size_t target_len = size - used;
size_t n;
n = iconv(conversion, &source_ptr, &source_len, &target_ptr, &target_len);
if (n == (size_t)-1 && errno == EILSEQ) {
free(input_data);
iconv_close(conversion);
return errno = EILSEQ;
}
if (source_ptr == input_data + input_head && target_ptr == data + used) {
free(input_data);
iconv_close(conversion);
return errno = EDEADLK;
}
input_head = (size_t)(source_ptr - input_data);
used = (size_t)(target_ptr - data);
*usedptr = used;
}
}
free(input_data);
iconv_close(conversion);
if (used + 16 >= size) {
size = (used | 15) + 17;
data = realloc(data, size);
if (!data)
return errno = ENOMEM;
*dataptr = data;
*sizeptr = size;
memset(data + used, 0, size - used);
} else
if (used + 32 < size)
memset(data + used, 0, 32);
else
memset(data + used, 0, size - used);
return errno = 0;
}
and an example program, example.c, on how to use it:
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>
#include "read_utf8.h"
int main(int argc, char *argv[])
{
char *file_buffer = NULL;
size_t file_allocd = 0;
size_t file_length = 0;
int fd;
if (argc != 3 || !strcmp(argv[1], "-h") || !strcmp(argv[1], "--help")) {
fprintf(stderr, "\n");
fprintf(stderr, "Usage: %s [ -h | --help ]\n", argv[0]);
fprintf(stderr, " %s FILENAME CHARSET\n", argv[0]);
fprintf(stderr, " %s FILENAME CHARSET//IGNORE\n", argv[0]);
fprintf(stderr, "\n");
return EXIT_FAILURE;
}
do {
fd = open(argv[1], O_RDONLY | O_NOCTTY);
} while (fd == -1 && errno == EINTR);
if (fd == -1) {
fprintf(stderr, "%s: %s.\n", argv[1], strerror(errno));
return EXIT_FAILURE;
}
if (read_utf8(&file_buffer, &file_allocd, &file_length, fd, argv[2])) {
if (errno == ENOTSUP)
fprintf(stderr, "%s: Unsupported character set.\n", argv[2]);
else
fprintf(stderr, "%s: %s.\n", argv[1], strerror(errno));
return EXIT_FAILURE;
}
errno = EIO;
if (close(fd)) {
fprintf(stderr, "%s: %s.\n", argv[1], strerror(errno));
return EXIT_FAILURE;
}
fprintf(stderr, "%s: read %zu bytes, allocated %zu.\n", argv[1], file_length, file_allocd);
if (file_length > 0)
if (fwrite(file_buffer, file_length, 1, stdout) != 1) {
fprintf(stderr, "Error writing to standard output.\n");
return EXIT_FAILURE;
}
return EXIT_SUCCESS;
}
This lets you read (either into an empty, dynamically allocated buffer, or append to an existing dynamically allocated buffer) using any character set supported by your system (use iconv --list to see the list), auto-converting the contents to UTF-8.
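For data that is already in memory, the core of the conversion is a single iconv() call. The following sketch converts a UTF-16LE byte array to UTF-8 in one step; utf16le_to_utf8() is a made-up helper name, and it assumes the output buffer is large enough and that the C library's iconv knows the "UTF-16LE" and "UTF-8" names (glibc does).

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert src_len bytes of UTF-16LE to UTF-8.
 * Returns the number of bytes written to dst, or (size_t)-1 on failure
 * (unsupported conversion, invalid input, or dst too small). */
static size_t utf16le_to_utf8(const char *src, size_t src_len,
                              char *dst, size_t dst_len)
{
    iconv_t cd = iconv_open("UTF-8", "UTF-16LE");   /* to, from */
    char   *in = (char *)src;       /* iconv() takes non-const pointers */
    char   *out = dst;
    size_t  in_left = src_len;
    size_t  out_left = dst_len;
    size_t  result;

    if (cd == (iconv_t)-1)
        return (size_t)-1;

    result = iconv(cd, &in, &in_left, &out, &out_left);
    iconv_close(cd);

    if (result == (size_t)-1 || in_left != 0)
        return (size_t)-1;
    return dst_len - out_left;
}
```

read_utf8() above does the same thing, but in chunks, growing the output buffer as needed and carrying incomplete input sequences over to the next read.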
It uses a temporary input buffer (of INPUT_CHUNK bytes) to read the file part by part, and reallocates the output buffer in multiples of OUTPUT_CHUNK bytes, keeping at least OUTPUT_CHUNK bytes available for each conversion. The constants may need a bit of tuning for different use cases; they're by no means optimal or even suggested values. Larger ones lead to faster code, especially for INPUT_CHUNK, as most filesystems perform better when reading large chunks (2097152 is suggested size currently, if I/O performance is important) -- but you should have OUTPUT_CHUNK at similar size, or perhaps twice that, to reduce the number of reallocations needed. (You can trim the resulting buffer afterwards, to used+1 bytes, using realloc(), to avoid memory waste.)

using C, if statement not working (using fork and child processes to run some part of the program)

I've created a program that is a remake of the wc program in Bash. For some reason my check doesn't work as it should: the word count and line count (which are handled by my child processes, using fork) are still displayed when they should not be. If I type './test -n', it is only meant to display the current user, but it displays that followed by the word and line counts, even though I didn't ask for them. The if statement that doesn't seem to work is near the bottom of the code. Here is my code:
#include <getopt.h>
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <ctype.h>
#include <unistd.h>
#include <sys/types.h>
#include <string.h>
#include <sys/wait.h>
/* Size of character buffer to read in a file. */
#define BUFFSIZE 1000000
/* Read file 'filename' into character buffer 'buff'.
*
* @param filename file to read from
* @param buff character buffer to read into
*
* @return the number of bytes read.
*/
long read_file(char *filename, char *buff)
{
FILE *fp = fopen(filename, "r");
long size = 0; // Number of characters read.
int len = 0;
if (fp == NULL)
{
fprintf(stderr,"1 Error could not open file: %s\n",strerror(errno));
return -1;
}
/* Go to the end of the file. */
if (fseek(fp, 0L, SEEK_END) == 0)
{
/* Get the size of the file. */
size = ftell(fp);
if (size == -1)
{
fprintf(stderr,"2 Error could not open file: %s\n",strerror(errno));
return -1;
}
/* Go back to the start of the file. */
if (fseek(fp, 0L, SEEK_SET) != 0)
{
fprintf(stderr,"3 Error rewinding to start of file: %s\n",strerror(errno));
return -1;
}
/* Read the entire file into memory. */
len = fread(buff, sizeof(char), (size_t)size, fp);
if (len == 0)
{
fprintf(stderr,"4 Error reading file into memory: %s\n",strerror(errno));
return -1;
}
else
{
buff[++len] = '\0'; /* Add a null-terminator. */
}
}
(void)fclose(fp);
return size;
}
int compute_words(char* fileloc)
{
int wordcount = 0;
int check = 1;
char file;
FILE *f = fopen(fileloc, "r");
while((file=getc(f)) != EOF)
{
if(isspace(file) || file == '\t' || file == '\n')
{
if (check == 0)
{
check++;
wordcount++;
}
}
else
{
check = 0;
}
}
fclose(f);
return wordcount;
}
int compute_lines(char* fileloc)
{
int linecount = 0;
char file;
FILE *f = fopen(fileloc, "r");
while((file=getc(f)) != EOF)
{
if(file == '\n')
linecount++;
}
fclose(f);
return linecount;
}
/* The name of this program. */
const char* program_name;
/* Prints usage information for this program to STREAM (typically
stdout or stderr), and exit the program with EXIT_CODE. Does not
return. */
void print_usage (FILE* stream, int exit_code)
{
fprintf (stream, "Usage: %s options [ inputfile .... ]\n", program_name);
fprintf (stream,
" -h --help Display this usage information.\n"
" -n --num Display my student number.\n"
" -c --chars Print number of characters in FILENAME.\n"
" -w --words Print number of words in FILENAME.\n"
" -l --lines Print number of lines in FILENAME.\n"
" -f --file FILENAME Read from file.\n");
exit (exit_code);
}
/* Main program entry point. ARGC contains number of argument list
elements; ARGV is an array of pointers to them. */
int main (int argc, char* argv[])
{
int pipes[2][2];
pid_t child[2];
int status = 0;
int i;
//printf("\nParents Pro ID is %d\n\n", getpid());
char* fileloc = "/usr/share/dict/words";
char buffer[BUFFSIZE];
char* buff = &buffer[0];
int num = 0, chars = 0, words = 0, lines = 0;
int wordcount = 0;
int linecount = 0;
int next_option;
/* A string listing valid short options letters. */
const char* const short_options = "hncwlf:";
/* An array describing valid long options. */
const struct option long_options[] = {
{ "help", 0, NULL, 'h' },
{ "num", 0, NULL, 'n' },
{ "chars", 0, NULL, 'c' },
{ "words", 0, NULL, 'w' },
{ "lines", 0, NULL, 'l' },
{ "file", 1, NULL, 'f' },
{ NULL, 0, NULL, 0 } /* Required at end of array. */};
/* The name of the file to receive program output, or NULL for
standard output. */
const char* output_filename = NULL;
/* Remember the name of the program, to incorporate in messages.
The name is stored in argv[0]. */
program_name = argv[0];
do
{
next_option = getopt_long (argc, argv, short_options,long_options, NULL);
switch (next_option)
{
case 'h': /* -h or --help */
/* User has requested usage information. Print it to standard
output, and exit with exit code zero (normal termination). */
print_usage (stdout, 0);
case 'n':
num=1;
break;
case 'c':
chars=1;
break;
case 'w':
words=1;
break;
case 'l':
lines=1;
break;
case 'f':
fileloc = optarg;
break;
case '?': /* The user specified an invalid option. */
/* Print usage information to standard error, and exit with exit
code one (indicating abnormal termination). */
print_usage (stderr, 1);
case -1: /* Done with options. */
if(!num && !chars && !words && !lines)
chars=1;words=1;lines=1;
break;
default: /* Something else: unexpected. */
abort ();
}
}
while (next_option != -1);
for(i = 0; i < 3; i++)
{
if (pipe(pipes[i]) != 0)
{
printf("Error pipe %d could not be created\n", i);
exit(1);
}
if ((child[i] = fork()) == -1)//create fork
{
printf("Error fork %d could not be created\n", i);
exit(1);
}
else if (child[i] == 0) //fork successful
{
close(pipes[i][0]);
if(words && child[0]) //child 1
{
int computewords = compute_words(fileloc);
write(pipes[0][1], &computewords, sizeof(computewords));
}
if(lines && child[1]) //child 2
{
int computelines = compute_lines(fileloc);
write(pipes[1][1], &computelines, sizeof(computelines));
}
exit(0);
}
}
for (i = 0; i < 2; i++)
{
wait(&status);
}
if(num)
{
char *z=getenv("USER");
if(z == NULL) return EXIT_FAILURE;
printf("\nStudent number: 12345 and logged in as %s\n", z);
}
if(chars)
printf("\nNumber of Characters in the file:%s:\t%ld\n", fileloc, read_file(fileloc, buff));
if(words)
{
close(pipes[0][1]);
read(pipes[0][0], &wordcount, 50);
close(pipes[0][0]);
printf("\nNumber of Words in the file:%s:\t%d\n", fileloc, wordcount);
}
if(lines)
{
close(pipes[1][1]);
read(pipes[1][0], &linecount, 50);
close(pipes[1][0]);
printf("\nNumber of Lines in the file:%s:\t%d\n", fileloc, linecount);
}
close(pipes[0][0]);
close(pipes[1][0]);
close(pipes[0][1]);
close(pipes[1][1]);
return 0;
}
There's something else going on here: an if statement will work if you're getting the expected arguments. Try debugging main, because it appears you have an error in your option parsing.
Consider the following case statement:
case -1: /* Done with options. */
    if(!num && !chars && !words && !lines)
        chars=1;words=1;lines=1;
    break;
You have an if without braces around the assignments. Putting the statements on the same line doesn't group them: only the first statement after the if belongs to it. The code is parsed as:
case -1: /* Done with options. */
    if(!num && !chars && !words && !lines)
        chars=1;
    words=1;
    lines=1;
    break;
As a result, words and lines are always set to 1 once option parsing finishes, which is why './test -n' still prints the word and line counts.
if(!num && !chars && !words && !lines)
    chars=1;words=1;lines=1;
is equivalent to
if(!num && !chars && !words && !lines)
    chars=1;
words=1;
lines=1;
You need some braces, or to put everything in a single statement like this:
if(!num && !chars && !words && !lines)
    chars=words=lines=1;
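For completeness, the braces variant mentioned above could look like the following sketch. The function apply_defaults() is a hypothetical wrapper introduced only so the check can be shown in isolation; in the question's code the if statement sits directly in the case -1 branch.

```c
/* Hypothetical helper mirroring the 'case -1' branch: when no display
 * option was given, enable all three counts; otherwise leave the flags
 * exactly as the option parser set them. */
void apply_defaults(int num, int *chars, int *words, int *lines)
{
    if (!num && !*chars && !*words && !*lines) {
        /* The braces make all three assignments part of the if body. */
        *chars = 1;
        *words = 1;
        *lines = 1;
    }
}
```

With the braces in place, running the program with only -n leaves chars, words and lines at 0, so only the user line is printed.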
