My setup: gcc-4.9.2, UTF-8 environment.
The following C program works with ASCII text, but not with UTF-8.
Create the input file:
echo -n 'привет мир' > /tmp/вход
This is test.c:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define SIZE 10
int main(void)
{
char buf[SIZE+1];
char *pat = "привет мир";
char str[SIZE+2];
FILE *f1;
FILE *f2;
f1 = fopen("/tmp/вход","r");
f2 = fopen("/tmp/выход","w");
if (fread(buf, 1, SIZE, f1) > 0) {
buf[SIZE] = 0;
if (strncmp(buf, pat, SIZE) == 0) {
sprintf(str, "% 11s\n", buf);
fwrite(str, 1, SIZE+2, f2);
}
}
fclose(f1);
fclose(f2);
exit(0);
}
Check the result:
./test; grep -q ' привет мир' /tmp/выход && echo OK
What should be done to make UTF-8 code work as if it were ASCII code, without having to worry about how many bytes a symbol takes? In other words: what should be changed in the example so that any UTF-8 symbol is treated as a single unit (including argv, STDIN, STDOUT, STDERR, file input and output, and the program code itself)?
#define SIZE 10
The buffer size of 10 is insufficient to store the UTF-8 string привет мир: it has 10 characters but takes 19 bytes (each of the nine Cyrillic letters takes 2 bytes, plus 1 byte for the space). Try changing it to a larger value. On my system (Ubuntu 12.04, gcc 4.8.1), changing it to 20 worked perfectly.
UTF-8 is a multibyte encoding which uses between 1 and 4 bytes per character, so for a 10-character string it is safer to use 40 as the buffer size above.
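A minimal sketch makes the byte/character difference concrete. The helper utf8_count is hypothetical (it is not from any of the answers); it counts code points by skipping UTF-8 continuation bytes (bytes of the form 10xxxxxx) and assumes the input is valid UTF-8:

```c
#include <stddef.h>

/* Count the code points in a NUL-terminated UTF-8 string by counting
   every byte that is NOT a continuation byte (10xxxxxx).
   Assumes valid UTF-8; no validation is performed. */
size_t utf8_count(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

For "привет мир" it returns 10, while the string occupies 19 bytes (sizeof "привет мир" - 1), which is why a 10-byte buffer truncates it.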
There is an extensive discussion in the question "How many bytes does one Unicode character take?", which might be interesting.
Siddhartha Ghosh's answer identifies the basic problem. Fixing your code requires more work, though.
I used the following script (chk-utf8-test.sh):
echo -n 'привет мир' > вход
make utf8-test
./utf8-test
grep -q 'привет мир' выход && echo OK
I called your program utf8-test.c and amended the source like this, removing the references to /tmp, and being more careful with lengths:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define SIZE 40
int main(void)
{
char buf[SIZE + 1];
char *pat = "привет мир";
char str[SIZE + 2];
FILE *f1 = fopen("вход", "r");
FILE *f2 = fopen("выход", "w");
if (f1 == 0 || f2 == 0)
{
fprintf(stderr, "Failed to open one or both files\n");
return(1);
}
size_t nbytes;
if ((nbytes = fread(buf, 1, SIZE, f1)) > 0)
{
buf[nbytes] = 0;
if (strncmp(buf, pat, nbytes) == 0)
{
sprintf(str, "%.*s\n", (int)nbytes, buf);
fwrite(str, 1, nbytes, f2);
}
}
fclose(f1);
fclose(f2);
return(0);
}
And when I ran the script, I got:
$ bash -x chk-utf8-test.sh
+ '[' -f /etc/bashrc ']'
+ . /etc/bashrc
++ '[' -z '' ']'
++ return
+ alias 'r=fc -e -'
+ echo -n 'привет мир'
+ make utf8-test
gcc -O3 -g -std=c11 -Wall -Wextra -Werror utf8-test.c -o utf8-test
+ ./utf8-test
+ grep -q 'привет мир' $'в?\213?\205од'
+ echo OK
OK
$
For the record, I was using GCC 5.1.0 on Mac OS X 10.10.3.
This is more of a corollary to the other answers, but I'll try to explain this from a slightly different angle.
Here is Jonathan Leffler's version of your code, with three slight changes: (1) I made the actual individual bytes in the UTF-8 strings explicit; (2) I modified the width specifier in the sprintf format string to (hopefully) do what you are actually attempting to do; and, tangentially, (3) I used perror to get a slightly more useful error message when something fails.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define SIZE 40
int main(void)
{
char buf[SIZE + 1];
char *pat = "\320\277\321\200\320\270\320\262\320\265\321\202"
" \320\274\320\270\321\200"; /* "привет мир" */
char str[SIZE + 3]; /* room for width nbytes+1, plus '\n' and '\0' */
FILE *f1 = fopen("\320\262\321\205\320\276\320\264", "r"); /* "вход" */
FILE *f2 = fopen("\320\262\321\213\321\205\320\276\320\264", "w"); /* "выход" */
if (f1 == 0 || f2 == 0)
{
perror("Failed to open one or both files"); /* use perror() */
return(1);
}
size_t nbytes;
if ((nbytes = fread(buf, 1, SIZE, f1)) > 0)
{
buf[nbytes] = 0;
if (strncmp(buf, pat, nbytes) == 0)
{
sprintf(str, "%*s\n", 1+(int)nbytes, buf); /* nbytes+1 length specifier */
fwrite(str, 1, 1+nbytes, f2); /* +1 here too */
}
}
fclose(f1);
fclose(f2);
return(0);
}
The behavior of sprintf with a positive numeric width specifier is to pad with spaces from the left, so the space you tried to use is superfluous. But you have to make sure the target field is wider than the string you are printing in order for any padding to actually take place.
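The same byte-versus-character issue applies to the field width itself. A small sketch, assuming the source file is saved as UTF-8 (width_demo is a hypothetical name), shows that a width of 11 adds no padding to the 19-byte string, whereas a width one larger than its byte length adds a single space:

```c
#include <stdio.h>
#include <string.h>

/* A printf field width counts bytes (chars), not characters:
   the 19-byte UTF-8 string already "fills" a width of 11. */
void width_demo(char *narrow, char *wide, size_t size)
{
    const char *pat = "привет мир";           /* 10 characters, 19 bytes */
    snprintf(narrow, size, "%11s", pat);       /* no padding added */
    snprintf(wide, size, "%*s", (int)strlen(pat) + 1, pat); /* one space */
}
```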
Just to make this answer self-contained, I will repeat what others have already said. A traditional char is always exactly one byte, but a character in UTF-8 is one byte only if it is ASCII; every other character takes two to four bytes. One of the attractions of UTF-8 is that legacy C code doesn't need to know anything about UTF-8 in order to continue to work, but of course the assumption that one char is one glyph cannot hold. (For example, the glyph п in "привет мир" maps to the two bytes, and hence two chars, "\320\277".)
This is clearly less than ideal, but demonstrates that you can treat UTF-8 as "just bytes" if your code doesn't particularly care about glyph semantics. If yours does, you are better off switching to wchar_t as outlined e.g. here: http://www.gnu.org/software/libc/manual/html_node/Extended-Char-Intro.html
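If glyph semantics only matter for padding, as with the question's "% 11s" format, one option short of switching to wchar_t is to count code points yourself and pad accordingly. A minimal sketch, assuming valid UTF-8 and one display column per code point (not true for combining marks or East Asian wide characters); utf8_pad_left is a hypothetical name:

```c
#include <stdio.h>

/* Hypothetical helper: left-pad a UTF-8 string with spaces so that it is
   `width` characters (code points) wide instead of `width` bytes wide.
   Assumes valid UTF-8 and one display column per code point.
   Returns what snprintf returns (bytes needed, excluding the NUL). */
int utf8_pad_left(char *dst, size_t size, const char *src, size_t width)
{
    size_t chars = 0;
    for (const char *p = src; *p != '\0'; p++)
        if (((unsigned char)*p & 0xC0) != 0x80)   /* skip continuation bytes */
            chars++;
    size_t pad = chars < width ? width - chars : 0;
    /* "%*s" applied to an empty string emits exactly `pad` spaces */
    return snprintf(dst, size, "%*s%s", (int)pad, "", src);
}
```

With a width of 11, "привет мир" gets exactly one leading space, which is what the grep in the question looks for.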
However, the standard wchar_t is less than ideal when the standard expectation is UTF-8. See e.g. the GNU libunistring documentation for a less intrusive alternative, and a bit of background. With that, you should be able to replace char with uint8_t and the various str* functions with u8_str* replacements and be done. The assumption that one glyph equals one byte will still need to be addressed, but that becomes a minor technicality in your example program. An adaptation is available at http://ideone.com/p0VfXq (though unfortunately the library is not available on http://ideone.com/ so it cannot be demonstrated there).
The following code works as required:
#include <stdio.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>
#define SIZE 10
int main(void)
{
setlocale(LC_ALL, "");
wchar_t buf[SIZE+1];
wchar_t *pat = L"привет мир";
wchar_t str[SIZE+2];
FILE *f1;
FILE *f2;
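f1 = fopen("/tmp/вход","r");
f2 = fopen("/tmp/выход","w");
if (f1 == NULL || f2 == NULL) {
perror("fopen");
exit(1);
}
/* fgetws counts wide characters: SIZE+1 means up to SIZE characters plus L'\0' */
if (fgetws(buf, SIZE+1, f1) != NULL && wcsncmp(buf, pat, SIZE) == 0) {
/* the field width 11 is now measured in characters, not bytes */
swprintf(str, SIZE+2, L"%11ls", buf);
fputws(str, f2);
}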
fclose(f1);
fclose(f2);
exit(0);
}
Probably your test.c file is not stored in UTF-8, so the "привет мир" string literal is compiled into different bytes (in some other encoding) than the UTF-8 text in the input file, and the comparison fails. Change the text encoding of the source file to UTF-8 and try again.
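One way to check this, sketched under the assumption that GCC's usual UTF-8 input and execution character sets are in effect (literal_is_utf8 is a hypothetical helper), is to compare the literal with the explicit UTF-8 byte sequence written with octal escapes in one of the answers above:

```c
#include <string.h>

/* Returns 1 if the literal was compiled to the same bytes as the explicit
   UTF-8 escapes, i.e. the source file really was read as UTF-8. */
int literal_is_utf8(void)
{
    static const char lit[] = "привет мир";
    static const char esc[] = "\320\277\321\200\320\270\320\262\320\265\321\202"
                              " \320\274\320\270\321\200";
    return sizeof lit == sizeof esc && memcmp(lit, esc, sizeof lit) == 0;
}
```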
Related
My application needs to read thousands of lines from a large CSV file (around 300 GB, with billions of lines); each line contains several numbers. The data look like this:
1, 34, 56, 67, 678, 23462, ...
2, 3, 6, 8, 34, 5
23,547, 648, 34657 ...
...
...
I tried reading the file line by line with fgets in C, but it took a really, really long time; even wc -l on Linux took quite a while just to read all of the lines.
I also tried writing all the data to an SQLite3 database, structured according to the logic of the application. However, that data structure differs from the CSV file above: it has 100 billion lines, with only two numbers per line. I then created two indices on top of it, which resulted in a 2.5 TB database, compared with 1 TB without the indices. Since the indices are larger than the data, and a query has to read the whole 1.5 TB of indices, I think the database approach doesn't make any sense, right?
So I would like to ask: what is the quickest way to read several specific lines from a large CSV file with billions of lines, in C or Python? And by the way, is there any formula for estimating how the time needed to read a file relates to the amount of RAM?
environment: linux, RAM 200GB, C, python
Requirements
huge csv file, several hundred GB in size
each line contains several numbers
the program must extract several thousand lines per run
the program works several times with the same file, only different lines should be extracted
Since lines in the csv files have a variable length, you would have to read the entire file to get the data of the required lines. Sequential reading of the entire file would still be very slow - even if you optimized the file reading as much as possible. A good indicator is actually the runtime of wc -l, as mentioned already by the OP in the question.
Instead, one should optimize on the algorithmic level. A one-time preprocessing of the data is necessary, which then allows fast access to certain lines - without reading the whole file.
There are several possible ways, for example:
Using a database with an index
programmatic creation of an index file (association of line numbers with file offsets)
convert the csv file into a binary file with fixed format
The OP's test shows that approach 1 led to 1.5 TB of indices. Approach 2, a small program that associates each line number with a file offset, is certainly also a possibility. Finally, approach 3 allows the file offset of a line to be calculated directly from its line number, without the need for a separate index file. This approach is especially useful if the maximum number of numbers per line is known; otherwise, approaches 2 and 3 are very similar.
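Approach 2 can be sketched as follows. This is an illustrative implementation, not code from the answer; the function names are hypothetical, and it assumes POSIX getline() and fseeko() with a 64-bit off_t. One pass stores the starting byte offset of every line as a fixed-size uint64_t, so looking up line n means reading 8 bytes from the index and seeking once in the CSV file:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>

/* One pass over the CSV: write the byte offset where each line starts. */
int build_line_index(FILE *csv, FILE *idx)
{
    char *buf = NULL;
    size_t cap = 0;
    ssize_t len;
    uint64_t offset = 0;
    int rc = 0;
    while ((len = getline(&buf, &cap, csv)) != -1) {
        if (fwrite(&offset, sizeof offset, 1, idx) != 1) {
            rc = -1;
            break;
        }
        offset += (uint64_t)len;   /* next line starts right after this one */
    }
    free(buf);
    if (ferror(csv))
        rc = -1;
    return rc;
}

/* Read line n (0-based): look up its offset, seek there, read one line.
   Returns the line length (as getline does), or -1 on failure. */
ssize_t read_line_n(FILE *csv, FILE *idx, uint64_t n, char **line, size_t *cap)
{
    uint64_t offset;
    if (fseeko(idx, (off_t)(n * sizeof offset), SEEK_SET) != 0 ||
        fread(&offset, sizeof offset, 1, idx) != 1)
        return -1;
    if (fseeko(csv, (off_t)offset, SEEK_SET) != 0)
        return -1;
    return getline(line, cap, csv);
}
```

The index costs 8 bytes per line, so for the 3,000,000,000-line test file it would be 24 GB, still far smaller than the database indices the OP reported.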
Approach 3 is explained in more detail below. There may be additional requirements that require the approach to be slightly modified, but the following should get things started.
A one-time preprocessing step is necessary. The textual CSV lines are converted into int arrays, which are stored in binary in a separate file using a fixed record format. To read a particular line n, you can then simply calculate its file offset, e.g. with line_nr * (sizeof(int) * MAX_ELEMENTS_PER_LINE);. Finally, jump to this offset with fseeko(fp, offset, SEEK_SET); and read MAX_ELEMENTS_PER_LINE ints. So you only need to read the data that you actually want to process.
Not only does the program run much faster; it also requires very little main memory.
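The offset arithmetic for approach 3 should be done in 64 bits. Assuming 4-byte ints, each record holds 10 × 4 = 40 bytes, so the 3,000,000,000-line test file described below becomes 120,000,000,000 bytes (about 120 GB) in binary form, far beyond what a 32-bit offset can address. A minimal sketch (record_offset is a hypothetical name):

```c
#include <stdint.h>

#define MAX_ELEMENTS_PER_LINE 10

/* Byte offset of record `line_nr` (0-based) in the fixed-format binary file.
   The multiplication is done in uint64_t so it cannot overflow 32 bits. */
uint64_t record_offset(uint64_t line_nr)
{
    return line_nr * (uint64_t)(sizeof(int) * MAX_ELEMENTS_PER_LINE);
}
```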
Test case
A test file with 3,000,000,000 lines was created. Each line contains up to 10 random int numbers, separated by a comma.
In this case this gave a csv file with about 342 GB of data.
A quick test with
time wc -l numbers.csv
gives
187.14s user 74.55s system 96% cpu 4:31.48 total
This means that it would take a total of at least 4.5 minutes if a sequential file read approach were used.
For the one-time preprocessing, a converter program reads each line and stores 10 binary ints per line. The converted file is called 'numbers_bin'. A quick test accessing the data of 10,000 randomly selected rows:
time demo numbers_bin
gives
0.03s user 0.20s system 5% cpu 4.105 total
So instead of 4.5 minutes, it takes 4.1 seconds for this specific example data. That is more than a factor of 65 faster.
Source Code
This approach may sound more complicated than it actually is.
Let's start with the converter program. It reads the csv file and creates a binary fixed format file.
The interesting part takes place in the function pre_process: there a line is read in a loop with 'getline', the numbers are extracted with 'strtok' and 'strtol' and put into an int array initialized with 0. Finally this array is written to the output file with 'fwrite'.
Errors during the conversion result in a message on stderr and the program is terminated.
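As an aside, the parsing step can also be done with strtol alone, which avoids strtok's hidden static state. The following is a hypothetical variant (parse_csv_ints is not part of the answer's code) that assumes comma-separated decimal ints with optional spaces; it returns the number of values parsed, or -1 on a malformed value or too many values:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ELEMENTS_PER_LINE 10

/* Parse up to `max` comma-separated ints from `line` into `out`,
   zero-filling unused slots. strtol itself skips leading whitespace. */
int parse_csv_ints(const char *line, int *out, int max)
{
    int n = 0;
    const char *p = line;
    memset(out, 0, sizeof(int) * (size_t)max);
    while (*p != '\0' && *p != '\n') {
        char *end;
        errno = 0;
        long val = strtol(p, &end, 10);
        if (end == p || errno != 0 || val > INT_MAX || val < INT_MIN || n >= max)
            return -1;                       /* bad number or too many values */
        out[n++] = (int)val;
        p = end;
        while (*p == ' ' || *p == '\t' || *p == '\r')
            p++;
        if (*p == ',')
            p++;                             /* move on to the next field */
        else if (*p != '\0' && *p != '\n')
            return -1;                       /* unexpected character */
    }
    return n;
}
```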
convert.c
#include "data.h"
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <limits.h>
static void pre_process(FILE *in, FILE *out) {
int *block = get_buffer();
char *line = NULL;
size_t line_capp = 0;
while (getline(&line, &line_capp, in) > 0) {
line[strcspn(line, "\n")] = '\0';
memset(block, 0, sizeof(int) * MAX_ELEMENTS_PER_LINE);
char *token;
char *ptr = line;
int i = 0;
while ((token = strtok(ptr, ", ")) != NULL) {
if (i >= MAX_ELEMENTS_PER_LINE) {
fprintf(stderr, "too many elements in line\n");
exit(EXIT_FAILURE);
}
char *end_ptr;
errno = 0;
long val = strtol(token, &end_ptr, 10);
if (val > INT_MAX || val < INT_MIN || errno || *end_ptr != '\0' || end_ptr == token) {
fprintf(stderr, "value error with '%s'\n", token);
exit(EXIT_FAILURE);
}
ptr = NULL;
block[i] = (int) val;
i++;
}
if (fwrite(block, sizeof(int), MAX_ELEMENTS_PER_LINE, out) != MAX_ELEMENTS_PER_LINE) {
perror("fwrite");
exit(EXIT_FAILURE);
}
}
free(block);
free(line);
}
static void one_off_pre_processing(const char *csv_in, const char *bin_out) {
FILE *in = get_file(csv_in, "rb");
FILE *out = get_file(bin_out, "wb");
pre_process(in, out);
fclose(in);
fclose(out);
}
int main(int argc, char *argv[]) {
if (argc != 3) {
fprintf(stderr, "usage: convert <in> <out>\n");
exit(EXIT_FAILURE);
}
one_off_pre_processing(argv[1], argv[2]);
return EXIT_SUCCESS;
}
Data.h
A few auxiliary functions are used. They are more or less self-explanatory.
#ifndef DATA_H
#define DATA_H
#include <stdio.h>
#include <stdint.h>
#define NUM_LINES 3000000000LL
#define MAX_ELEMENTS_PER_LINE 10
void read_data(FILE *fp, uint64_t line_nr, int *block);
FILE *get_file(const char *const file_name, char *mode);
int *get_buffer(void);
#endif //DATA_H
Data.c
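#include "data.h"
#include <stdlib.h>
#include <inttypes.h>
#include <sys/types.h>
void read_data(FILE *fp, uint64_t line_nr, int *block) {
/* Fixed-size records: line n starts at n * record size. off_t must be 64 bits
(the default on 64-bit Linux/macOS; on 32-bit systems use -D_FILE_OFFSET_BITS=64) */
off_t offset = (off_t) (line_nr * (sizeof(int) * MAX_ELEMENTS_PER_LINE));
if (fseeko(fp, offset, SEEK_SET) != 0) {
perror("fseeko");
exit(EXIT_FAILURE);
}
if(fread(block, sizeof(int), MAX_ELEMENTS_PER_LINE, fp) != MAX_ELEMENTS_PER_LINE) {
fprintf(stderr, "data read error for line %" PRIu64 "\n", line_nr);
exit(EXIT_FAILURE);
}
}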
FILE *get_file(const char *const file_name, char *mode) {
FILE *fp;
if ((fp = fopen(file_name, mode)) == NULL) {
perror(file_name);
exit(EXIT_FAILURE);
}
return fp;
}
int *get_buffer(void) {
int *block = malloc(sizeof(int) * MAX_ELEMENTS_PER_LINE);
if(block == NULL) {
perror("malloc failed");
exit(EXIT_FAILURE);
}
return block;
}
demo.c
And finally a demo program that reads the data for 10,000 randomly determined lines.
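The function request_lines picks 10,000 random line numbers and sorts them with qsort, so the file is read in ascending order of offset. The data for these lines is then read. Some lines of the code are commented out; if you uncomment them, the read data is printed to the console. Note that arc4random_uniform is available on macOS and the BSDs, and in glibc only since version 2.36; on other systems, substitute a different random number source.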
#include "data.h"
#include <stdlib.h>
#include <assert.h>
#include <sys/stat.h>
static int comp(const void *lhs, const void *rhs) {
uint64_t l = *((uint64_t *) lhs);
uint64_t r = *((uint64_t *) rhs);
if (l > r) return 1;
if (l < r) return -1;
return 0;
}
static uint64_t *request_lines(uint64_t num_lines, int num_request_lines) {
assert(num_lines < UINT32_MAX);
uint64_t *request_lines = malloc(sizeof(*request_lines) * num_request_lines);
if (request_lines == NULL) {
perror("malloc failed");
exit(EXIT_FAILURE);
}
for (int i = 0; i < num_request_lines; i++) {
request_lines[i] = arc4random_uniform(num_lines);
}
qsort(request_lines, num_request_lines, sizeof(*request_lines), comp);
return request_lines;
}
#define REQUEST_LINES 10000
int main(int argc, char *argv[]) {
if (argc != 2) {
fprintf(stderr, "usage: demo <file>\n");
exit(EXIT_FAILURE);
}
struct stat stat_buf;
if (stat(argv[1], &stat_buf) == -1) {
perror(argv[1]);
exit(EXIT_FAILURE);
}
uint64_t num_lines = stat_buf.st_size / (MAX_ELEMENTS_PER_LINE * sizeof(int));
FILE *bin = get_file(argv[1], "rb");
int *block = get_buffer();
uint64_t *requests = request_lines(num_lines, REQUEST_LINES);
for (int i = 0; i < REQUEST_LINES; i++) {
read_data(bin, requests[i], block);
//do sth with the data,
//uncomment the following lines to output the data to the console
// printf("%llu: ", (unsigned long long) requests[i]);
// for (int x = 0; x < MAX_ELEMENTS_PER_LINE; x++) {
// printf("'%d' ", block[x]);
// }
// printf("\n");
}
free(requests);
free(block);
fclose(bin);
return EXIT_SUCCESS;
}
Summary
This approach provides much faster results than reading through the entire file sequentially (4 seconds instead of 4.5 minutes per run for the sample data). It also requires very little main memory.
The prerequisite is the one-time pre-processing of the data into a binary format. This conversion is quite time-consuming, but the data for certain rows can be read very quickly afterwards using a query program.
I need to get proper Polish characters "ąężźćśół". I used some solutions like setlocale, system chcp, wchar_t. Everything goes well as long as I don't use files/lists. wscanf, wprintf and wchar_t works perfectly.
But if I try to read something from a file and save it into a list (or even an array), and then print it to the screen, I can't get proper Polish characters. With lists, I also get different results from time to time, for example z` or A2, like random characters from nowhere. I've tried fscanf and fgets and their w (wide) variants, but it doesn't work. Am I doing something wrong?
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <locale.h>
struct dyk{
wchar_t line[200];
struct dyk *next;
};
typedef struct dyk dyk;
void printdyk(char name[100]){
dyk *wyp;
wyp = malloc(sizeof(dyk));
wchar_t yt[100];
FILE *dyktando;
dyktando = fopen(name, "r+");
if(dyktando == NULL){
wprintf(L"Błąd otwarcia pliku!\n"); //Can't open file
}else{
fgets(&wyp->line, sizeof(dyk), dyktando); //reading from file and send to the list
wprintf(L"%s\n", wyp->line); //write text from the list on the screen
wchar_t yt[100];
wscanf(L"%s", &yt); //testing strings comparing, so I have to put some variables
int n=strcmp(yt, wyp->line); //str compare
printf("%d", n); //result, it gives me -1 every time
}
fclose(dyktando);
}
I tested the function with a .txt file that contains only one character, "ż". It can't read from the file properly. At the start of the main function I put these two lines:
system("chcp 852");
setlocale(LC_ALL, ".852");
I'm using Code::Blocks with the mingw32-gcc compiler and no flags.
You are not using wchar_t compatible functions everywhere in your code. In particular:
fgets(&wyp->line, sizeof(dyk), dyktando); //reading from file and send to the list
The wchar_t compatible version is fgetws. Also, wyp->line (without the & operator) is the correct argument.
int n=strcmp(yt, wyp->line); //str compare
wcscmp should be used instead.
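Also note that sizeof on a wchar_t array gives its size in bytes, which is not correct when a function expects a length in wide characters (as fgetws does); use sizeof array / sizeof array[0] instead. Similarly, in wprintf the %s conversion expects a narrow (char) string; use %ls for a wchar_t string.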
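Put together, the corrected reading and comparison might look like the following sketch. first_line_matches is a hypothetical helper, not code from the question; for Polish text the program still needs setlocale(LC_ALL, "") and a file in the locale's encoding, which the test below avoids by using ASCII:

```c
#include <stdio.h>
#include <wchar.h>

/* Read the first line of `f` into `buf` (capacity `n` WIDE CHARACTERS, not
   bytes), strip the newline, and compare it with `expected` using wcscmp.
   Returns 1 on a match, 0 on a mismatch, -1 if nothing could be read. */
int first_line_matches(FILE *f, wchar_t *buf, size_t n, const wchar_t *expected)
{
    if (fgetws(buf, (int)n, f) == NULL)      /* buf, not &buf */
        return -1;
    buf[wcscspn(buf, L"\r\n")] = L'\0';
    return wcscmp(buf, expected) == 0;       /* wcscmp, not strcmp */
}
```

Typical use: wchar_t line[200]; first_line_matches(f, line, sizeof line / sizeof line[0], L"ż");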
A comment by the OP (Amatheon) indicates that the real underlying problem is how to properly read files using wide-character functions.
To ensure maximum compatibility and portability, let's restrict to C99. Consider the following example program:
#include <stdlib.h>
#include <locale.h>
#include <string.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>
#include <errno.h>
#include <sys/types.h> /* ssize_t is POSIX, not standard C */
#ifdef USE_ERRNO_CONSTANTS
#define SET_ERRNO(value) (errno = (value))
#else
#define SET_ERRNO(value)
#endif
ssize_t get_wide_delimited(wchar_t **lineptr, size_t *sizeptr, wint_t delim, FILE *stream)
{
wchar_t *line = NULL;
size_t size = 0;
size_t used = 0;
wint_t wc;
if (!lineptr || !sizeptr || !stream) {
/* Invalid function parameters. NULL pointers are not allowed. */
SET_ERRNO(EINVAL);
return -1;
}
if (ferror(stream)) {
/* Stream is already in error state. */
SET_ERRNO(EIO);
return -1;
}
if (*sizeptr > 0) {
line = *lineptr;
size = *sizeptr;
} else {
*lineptr = NULL;
}
while (1) {
wc = fgetwc(stream);
if (wc == WEOF || wc == delim)
break;
if (used + 1 > size) {
/* Growth policy. We wish to allocate a chunk of memory at once,
so we don't need to do realloc() too often as it is a bit slow,
relatively speaking. On the other hand, we don't want to do
too large allocations, because that would waste memory.
Anything that makes 'size' larger than 'used' will work.
*/
if (used < 254)
size = 256;
else
if (used < 65536)
size = 2 * used;
else
size = (used | 65535) + 65521;
line = realloc(line, size * sizeof (wchar_t));
if (!line) {
/* Out of memory. */
SET_ERRNO(ENOMEM);
return -1;
}
*lineptr = line;
*sizeptr = size;
}
line[used++] = wc;
}
if (wc == WEOF) {
/* Verify that the WEOF did not indicate a read error. */
if (ferror(stream)) {
/* Read error. */
SET_ERRNO(EIO);
return -1;
}
}
/* Ensure there is enough room for the delimiter and end-of-string mark. */
if (used + 2 > size) {
/* We could reuse the reallocation policy here,
with the exception that the minimum is used + 2, not used + 1.
For simplicity, we use the minimum reallocation instead.
*/
size = used + 2;
line = realloc(line, size * sizeof (wchar_t));
if (!line) {
/* Out of memory. */
SET_ERRNO(ENOMEM);
return -1;
}
*lineptr = line;
*sizeptr = size;
}
/* Append the delimiter, unless end-of-stream mark. */
if (wc != WEOF)
line[used++] = wc;
/* Append the end-of-string nul wide char,
but do not include it in the returned length. */
line[used] = L'\0';
/* Success! */
return (ssize_t)used;
}
ssize_t get_wide_line(wchar_t **lineptr, size_t *sizeptr, FILE *stream)
{
return get_wide_delimited(lineptr, sizeptr, L'\n', stream);
}
int main(int argc, char *argv[])
{
wchar_t *line = NULL, *p;
size_t size = 0;
unsigned long linenum;
FILE *in;
int arg;
if (!setlocale(LC_ALL, ""))
fprintf(stderr, "Warning: Your C library does not support your current locale.\n");
if (fwide(stdout, 1) < 1)
fprintf(stderr, "Warning: Your C library does not support wide standard output.\n");
if (argc < 2 || !strcmp(argv[1], "-h") || !strcmp(argv[1], "--help")) {
fprintf(stderr, "\n");
fprintf(stderr, "Usage: %s [ -h | --help ]\n", argv[0]);
fprintf(stderr, " %s FILENAME [ FILENAME ... ]\n", argv[0]);
fprintf(stderr, "\n");
fprintf(stderr, "This program will output the named files, using wide I/O.\n");
fprintf(stderr, "\n");
return EXIT_FAILURE;
}
for (arg = 1; arg < argc; arg++) {
in = fopen(argv[arg], "r");
if (!in) {
fprintf(stderr, "%s: %s.\n", argv[arg], strerror(errno));
return EXIT_FAILURE;
}
if (fwide(in, 1) < 1) {
fprintf(stderr, "%s: Wide input is not supported from this file.\n", argv[arg]);
fclose(in);
return EXIT_FAILURE;
}
linenum = 0;
while (get_wide_line(&line, &size, in) > 0) {
linenum++;
/* We use another pointer to the line for simplicity.
We must not modify 'line' (except via 'free(line); line=NULL; size=0;'
or a similar reallocation), because it points to dynamically allocated buffer. */
p = line;
/* Remove leading whitespace. */
while (iswspace(*p))
p++;
/* Trim off the line at the first occurrence of newline or carriage return.
(The line will also end at the first embedded nul wide character, L'\0',
if the file contains any.) */
p[wcscspn(p, L"\r\n")] = L'\0';
wprintf(L"%s: Line %lu: '%ls', %zu characters.\n", argv[arg], linenum, p, wcslen(p));
}
if (ferror(in)) {
fprintf(stderr, "%s: Read error.\n", argv[arg]);
fclose(in);
return EXIT_FAILURE;
}
if (fclose(in)) {
fprintf(stderr, "%s: Delayed read error.\n", argv[arg]);
return EXIT_FAILURE;
}
wprintf(L"%s: Total %lu lines read.\n", argv[arg], linenum);
fflush(stdout);
}
free(line);
line = NULL;
size = 0;
return EXIT_SUCCESS;
}
Because the EINVAL, EIO, and ENOMEM errno constants are not defined in the C standards, the get_wide_line() and get_wide_delimited() only set errno if you define the USE_ERRNO_CONSTANTS preprocessor value.
The get_wide_line() and get_wide_delimited() functions are reimplementations of getwline() and getwdelim() from ISO/IEC TR 24731-2:2010, the wide-character equivalents of the POSIX.1 getline() and getdelim() functions. Unlike fgets() or fgetws(), these use a dynamically allocated buffer to hold the line, so there is no fixed line length limit other than available memory.
I've explicitly marked the code to be under Creative Commons Zero license: No Rights Reserved. It means you can use it in your own code, under whatever license you want.
Note: I would really love users to push their vendors and C standard committee members to get these included in the bog-standard C library part of the next version of the C standard. As you can see above, they can already be implemented in standard C; it is just that the C library itself can do the same much more efficiently. The GNU C library is a perfect example of that (although even they are stalling with the implementation, because of the lack of standardization). Just think how many buffer overflow bugs would be avoided if people used getline()/getdelim()/getwline()/getwdelim() instead of fgets()/fgetws()! It would also spare programmers from having to work out the maximum reasonable line length in each instance. Win-win!
(In fact, we could switch the return type to size_t, and use 0 instead of -1 as the error indicator. That would limit the changes to the text of the C standard to the addition of the four functions. It saddens and irritates me to no end, to have such a significant group of trivial functions so callously and ignorantly overlooked, for no sensible reason. Please, bug your vendors and any C standards committee members you have access to about this, as incessantly and relentlessly as you can manage. Both you and they deserve it.)
The essential parts of the program are
if (!setlocale(LC_ALL, ""))
This tells the C library to use the locale the user has specified.
Please, do not hardcode the locale value into your programs. In most operating systems, all you need to do is to change the LANG or LC_ALL environment variable to the locale you want to use, before running your program.
You might think that "well, I can hardcode it this time, because this is the locale used for this data", but even that can be a mistake, because new locales can be created at any time. This is particularly annoying when the character set part is hardcoded. For example, the ISO 8859 single-byte character set used in Western Europe is ISO 8859-15, not ISO 8859-1, because ISO 8859-15 has the € character in it, whereas ISO 8859-1 does not. If you have hardcoded ISO 8859-1 in your program, then it cannot correctly handle the € character at all.
if (fwide(stream, 1) < 1) for both stdout and file handles
While the C library does internally do an equivalent of the fwide() call based on which type of I/O function you use on the file handle the very first time, the explicit check is much better.
In particular, if the C library cannot support wide I/O to the file or stream represented by the handle, fwide() will return negative. (Unless the second parameter is also zero, it should never return zero; because of the issues in standardization, I recommend a strict return value check approach in this case, to catch vendors who decide to try to make life as difficult as possible for programmers trying to write portable code while technically still fulfilling the standard text, like Microsoft is doing. They even stuffed the C standard committee with their own representatives, so they could tweak C11 away from C99 features they didn't want to support, plus get a stamp of approval of their own nonstandard extensions nobody used before, to help create barriers for developers writing portable C code. Yeah, I don't trust their behaviour at all.)
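How sticky stream orientation is can be seen in a small sketch (the helper name is hypothetical): once a stream has been used for byte I/O, a later request for wide orientation is refused, and fwide() simply reports the existing, negative orientation. That is why checking the return value, rather than just calling fwide(), is worthwhile:

```c
#include <stdio.h>
#include <wchar.h>

/* The first I/O operation (or fwide call) fixes a stream's orientation;
   later fwide calls cannot change it and just report the current one. */
int orientation_after_narrow_write(FILE *f)
{
    fputc('x', f);          /* first operation: stream becomes byte-oriented */
    return fwide(f, 1);     /* request wide; refused, reports < 0 */
}
```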
ssize_t len = get_wide_line(&line, &size, handle);
If you initialize wchar_t *line = NULL; and size_t size = 0; prior to first call to get_wide_line() or get_wide_delimited(), the function will dynamically resize the buffer as needed.
The return value is negative if and only if an error occurs. It is zero when the end of input is reached before any characters are read.
When a line is read successfully, the return value reflects the number of wide characters in the buffer, including the delimiter (newline, L'\n', for get_wide_line()), and is always positive (greater than zero). The contents of the buffer will have a terminating end-of-wide-string character, L'\0', but it is not counted in the return value.
Note that when the delimiter is not L'\0', the buffer may contain embedded wide nul characters, L'\0'. In that case, len > wcslen(line).
The example program above skips any leading whitespace on each input line, and trims off the line at the first linefeed (L'\n'), carriage return (L'\r'), or nul (L'\0'). Because of this, the return value len is only checked for success (a return value greater than zero).
free(line); line = NULL; size = 0;
It is okay to discard the line at any point its contents are no longer needed. I recommend explicitly setting the line pointer to NULL, and the size to zero, to avoid use-after-free bugs. Furthermore, that allows any following get_wide_line() or get_wide_delimited() to correctly dynamically allocate a new buffer.
ferror(handle) after a wide input function fails
Just like with narrow streams and EOF, there are two cases why wide input functions might return WEOF (or return -1, depending on the function): because there is no more input, or because a read error occurred.
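The check itself is short. A minimal sketch (count_wide_chars is a hypothetical name) that reads until WEOF and then uses ferror() to tell a genuine end of input apart from a read error:

```c
#include <stdio.h>
#include <wchar.h>

/* Count wide characters until WEOF, then use ferror() to distinguish
   end of input from a read error. Returns the count, or -1 on error. */
long count_wide_chars(FILE *in)
{
    long count = 0;
    while (fgetwc(in) != WEOF)
        count++;
    if (ferror(in))
        return -1;      /* WEOF was caused by a read error */
    return count;       /* WEOF meant end of input: feof(in) is set */
}
```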
There is no reason whatsoever to write computer programs that ignore read or write errors, without reporting them to the user. Sure, they are rare, but not so rare that a programmer can sanely expect them to never occur. (In fact, with Flash memory on flimsy circuits stored in weak plastic housings and subjected to human-sized stresses (I've sat on mine time and time again), the errors aren't that rare.) It is just evil, rather similar to food preparers being too lazy to wash their hands, causing fecal bacteria outbreaks every now and then. Don't be a fecal bacteria spreader equivalent programmer.
Let's say you have a harebrained lecturer who does not allow you to use the above get_wide_line() or get_wide_delimited() functions.
Don't worry. We can implement the same program using fgetws(), if we restrict lines to some fixed maximum length (in wide characters). Lines longer than that will be read as two or more lines instead:
#include <stdlib.h>
#include <locale.h>
#include <string.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>
#include <errno.h>
#ifndef MAX_WIDE_LINE_LEN
#define MAX_WIDE_LINE_LEN 1023
#endif
int main(int argc, char *argv[])
{
wchar_t line[MAX_WIDE_LINE_LEN + 1], *p;
unsigned long linenum;
FILE *in;
int arg;
if (!setlocale(LC_ALL, ""))
fprintf(stderr, "Warning: Your C library does not support your current locale.\n");
if (fwide(stdout, 1) < 1)
fprintf(stderr, "Warning: Your C library does not support wide standard output.\n");
if (argc < 2 || !strcmp(argv[1], "-h") || !strcmp(argv[1], "--help")) {
fprintf(stderr, "\n");
fprintf(stderr, "Usage: %s [ -h | --help ]\n", argv[0]);
fprintf(stderr, " %s FILENAME [ FILENAME ... ]\n", argv[0]);
fprintf(stderr, "\n");
fprintf(stderr, "This program will output the named files, using wide I/O.\n");
fprintf(stderr, "\n");
return EXIT_FAILURE;
}
for (arg = 1; arg < argc; arg++) {
in = fopen(argv[arg], "r");
if (!in) {
fprintf(stderr, "%s: %s.\n", argv[arg], strerror(errno));
return EXIT_FAILURE;
}
if (fwide(in, 1) < 1) {
fprintf(stderr, "%s: Wide input is not supported from this file.\n", argv[arg]);
fclose(in);
return EXIT_FAILURE;
}
linenum = 0;
while (1) {
/* If line is an array, (sizeof line / sizeof line[0]) evaluates to
the number of elements in it. This does not work if line is a pointer
to dynamically allocated memory. In that case, you need to remember
number of wide characters you allocated for in a separate variable,
and use that variable here instead. */
p = fgetws(line, sizeof line / sizeof line[0], in);
if (!p)
break;
/* Have a new line. */
linenum++;
/* Remove leading whitespace. */
while (iswspace(*p))
p++;
/* Trim off the line at the first occurrence of newline or carriage return.
(The line will also end at the first embedded nul wide character, L'\0',
if the file contains any.) */
p[wcscspn(p, L"\r\n")] = L'\0';
wprintf(L"%s: Line %lu: '%ls', %zu characters.\n", argv[arg], linenum, p, wcslen(p));
}
if (ferror(in)) {
fprintf(stderr, "%s: Read error.\n", argv[arg]);
fclose(in);
return EXIT_FAILURE;
}
if (fclose(in)) {
fprintf(stderr, "%s: Delayed read error.\n", argv[arg]);
return EXIT_FAILURE;
}
wprintf(L"%s: Total %lu lines read.\n", argv[arg], linenum);
fflush(stdout);
}
return EXIT_SUCCESS;
}
Aside from the function used to read each line, the difference is that instead of keeping the while loop condition as while ((p = fgetws(line, ...))) { ... }, I changed to the while (1) { p = fgetws(line, ...); if (!p) break; ... form that I believe is more readable.
I deliberately showed the longer, more complicated-looking version first and this simpler-looking one last, in the hope that you would see that the more complicated-looking one actually has the simpler main(), if we look at how many opportunities for mistakes there are rather than just counting lines of code or something equally silly.
As OP themselves wrote in a comment, the size of the buffer passed to fgets() or fgetws() is a real issue. There are rules of thumb, but they all suffer from being fragile against edits (especially the differences between arrays and pointers). With getline()/getdelim()/getwline()/getwdelim()/get_wide_line()/get_wide_delimited(), the rule of thumb is wchar_t *line = NULL; size_t size = 0; ssize_t len; and len = get_wide_line(&line, &size, handle);. No variations, and simple to remember and use. Plus it gets rid of any fixed limitations.
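The narrow POSIX getline() follows the same rule of thumb: start with a NULL pointer and a zero size, let the function grow the buffer, and free it once at the end. A minimal sketch (count_lines is a hypothetical name) that also applies the ferror() check:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Count lines with getline(); the buffer grows as needed and is freed once.
   Returns the number of lines, or -1 on a read error. */
long count_lines(FILE *in)
{
    char *line = NULL;      /* NULL + 0: getline allocates */
    size_t size = 0;
    long lines = 0;
    while (getline(&line, &size, in) != -1)
        lines++;
    free(line);
    return ferror(in) ? -1 : lines;
}
```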
I found code on Google that was over 50 lines long, which is completely unnecessary for what I'm trying to do.
I want to make a very simple cp implementation in C.
Just so I can play with the buffer sizes and see how it affects performance.
I want to use only Linux API calls like read() and write() but I'm having no luck.
I want a buffer that is defined as a certain size so data from file1 can be read into buffer and then written to file2 and that continues until file1 has reached EOF.
Here is what I tried, but it doesn't do anything:
#include <stdio.h>
#include <sys/types.h>
#define BUFSIZE 1024
int main(int argc, char* argv[]){
FILE fp1, fp2;
char buf[1024];
int pos;
fp1 = open(argv[1], "r");
fp2 = open(argv[2], "w");
while((pos=read(fp1, &buf, 1024)) != 0)
{
write(fp2, &buf, 1024);
}
return 0;
}
The way it would work is ./mycopy file1.txt file2.txt
This code has an important problem: you always write 1024 bytes, regardless of how many you read.
Also:
You don't check the number of command line arguments.
You don't check if the source file exists (if it opens).
You don't check that the destination file opens (permission issues).
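You pass the address of the array (&buf, of type char (*)[1024]) rather than a pointer to its first element (buf, of type char *).
The types of fp1 and fp2 are wrong: open() returns an int file descriptor, not a FILE, and it takes flags such as O_RDONLY rather than a mode string such as "r".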
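#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>
int main(int argc, char **argv)
{
char buffer[1024];
int files[2];
ssize_t count;
int status = EXIT_SUCCESS;
/* Check for insufficient parameters */
if (argc < 3)
return EXIT_FAILURE;
files[0] = open(argv[1], O_RDONLY);
if (files[0] == -1) /* Check if file opened */
return EXIT_FAILURE;
/* The permission bits go in the third argument, not in the flags */
files[1] = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
if (files[1] == -1) /* Check if file opened (permissions problems ...) */
{
close(files[0]);
return EXIT_FAILURE;
}
/* read() returns 0 at end of file and -1 on error; stop on either */
while ((count = read(files[0], buffer, sizeof(buffer))) > 0)
{
/* Write only the bytes actually read */
if (write(files[1], buffer, count) != count)
{
status = EXIT_FAILURE;
break;
}
}
if (count == -1)
status = EXIT_FAILURE;
close(files[0]);
if (close(files[1]) == -1)
status = EXIT_FAILURE;
return status;
}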
Go to section 8.3 of K&R's "The C Programming Language". There you will see an example of what you want to accomplish. Try using different buffer sizes, and you will eventually see a point beyond which performance stops improving.
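#include <stdio.h>
int cpy(char *, char *);
int main(int argc, char *argv[])
{
char *fn1, *fn2;
if (argc != 3) {
fprintf(stderr, "Usage: %s source destination\n", argv[0]);
return 1;
}
fn1 = argv[1];
fn2 = argv[2];
if (cpy(fn2, fn1) == -1) {
perror("cpy");
return 1;
}
return 0;
}
int cpy(char *fnDest, char *fnSrc)
{
FILE *fpDest, *fpSrc;
int c;
/* Open the source first, so a missing source does not truncate the destination */
if ((fpSrc = fopen(fnSrc, "rb")) == NULL)
return -1;
if ((fpDest = fopen(fnDest, "wb")) == NULL) {
fclose(fpSrc);
return -1;
}
while ((c = getc(fpSrc)) != EOF)
putc(c, fpDest);
fclose(fpSrc);
/* fclose() flushes buffered output, so a failure here means the copy is incomplete */
if (fclose(fpDest) == EOF)
return -1;
return 0;
}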
First, we get the two file names from the command line (argv[1] and argv[2]). We don't start from *argv (argv[0]) because it contains the program name.
We then call our cpy function, which copies the contents of the file named by its second argument (the source, argv[1]) into the file named by its first argument (the destination, argv[2]).
Within cpy, we declare two file pointers: fpDest, the destination file pointer, and fpSrc, the source file pointer. We also declare c, the character that will be read. It is of type int because getc() returns either a character value (as an unsigned char) or EOF, and a char cannot hold all of those values distinctly.
If we could open the files successfully (if fopen does not return NULL), we get characters from fpSrc and copy them onto fpDest, as long as the character we have read is not EOF. Once we have seen EOF, we close our file pointers and return 0, the success indicator. If we could not open the files, -1 is returned. The caller can check the return value for -1 and, if so, print an error message.
Good question. Related to another good question:
How can I copy a file on Unix using C?
There are two approaches to the "simplest" implementation of cp. One approach uses a file copying system call function of some kind - the closest thing we get to a C function version of the Unix cp command. The other approach uses a buffer and read/write system call functions, either directly, or using a FILE wrapper.
It's likely that file-copying system calls, which work entirely in kernel memory, are faster than a read/write loop, which moves every chunk from kernel memory into a user-space buffer and back, especially on a network filesystem (copying between machines). But that needs testing (e.g. with the Unix time command), and the results will depend on the hardware the code is compiled and run on.
It's also likely that someone with an OS that doesn't have the standard Unix library will want to use your code. Then you'd want to use the buffer read/write version, since it only depends on <stdlib.h> and <stdio.h> (and friends).
<unistd.h>
Here's an example that uses the function copy_file_range, declared in <unistd.h> (it is not part of POSIX; it is available on Linux), to copy a source file to a (possibly non-existent) destination file. The copy takes place in kernel space.
/* copy.c
*
* Defines function copy:
*
* Copy source file to destination file on the same filesystem (possibly NFS).
* If the destination file does not exist, it is created. If the destination
* file does exist, the old data is truncated to zero and replaced by the
* source data. The copy takes place in the kernel space.
*
* Compile with:
*
* gcc copy.c -o copy -Wall -g
*/
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>
/* On versions of glibc < 2.27 there is no copy_file_range() wrapper, so
* call the system call directly.
*
* glibc reports its own version through the __GLIBC__ and __GLIBC_MINOR__
* macros (from <features.h>, which every glibc header includes). GCC's
* version macros describe the compiler, not the C library, so they cannot
* be used for this test.
*/
#if defined(__GLIBC__) && (__GLIBC__ < 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ < 27))
static ssize_t copy_file_range(int in, loff_t* off_in, int out,
loff_t* off_out, size_t s, unsigned int flags)
{
return syscall(__NR_copy_file_range, in, off_in, out, off_out, s,
flags);
}
#endif
/* The copy function.
*/
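int copy(const char* src, const char* dst){
int in, out;
struct stat st; /* not named stat, to avoid shadowing the stat() function */
loff_t s, n;
if(0>(in = open(src, O_RDONLY))){
perror("open(src, ...)");
exit(EXIT_FAILURE);
}
if(fstat(in, &st)){
perror("fstat(in, ...)");
exit(EXIT_FAILURE);
}
s = st.st_size;
if(0>(out = open(dst, O_CREAT|O_WRONLY|O_TRUNC, 0644))){
perror("open(dst, ...)");
exit(EXIT_FAILURE);
}
/* copy_file_range() may copy fewer bytes than requested, so loop until
* done. An empty source skips the loop, and a return of 0 means the end
* of the source was reached early (e.g. the file shrank). */
while(0<s){
if(0>(n = copy_file_range(in, NULL, out, NULL, s, 0))){
perror("copy_file_range(...)");
exit(EXIT_FAILURE);
}
if(0==n)
break;
s-=n;
}
close(in);
if(close(out)){
perror("close(dst)");
exit(EXIT_FAILURE);
}
return EXIT_SUCCESS;
}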
/* Test it out.
*
* BASH:
*
* gcc copy.c -o copy -Wall -g
* echo 'Hello, world!' > src.txt
* ./copy src.txt dst.txt
* [ -z "$(diff src.txt dst.txt)" ]
*
*/
int main(int argc, char* argv[argc]){
if(argc!=3){
fprintf(stderr, "Usage: %s <SOURCE> <DESTINATION>\n", argv[0]);
exit(EXIT_FAILURE);
}
copy(argv[1], argv[2]);
return EXIT_SUCCESS;
}
It's based on the example in my Ubuntu 20.x Linux distribution's man page for copy_file_range. Check your man pages for it with:
> man copy_file_range
Then press j or Enter until you reach the EXAMPLES section, or search for it by typing /EXAMPLE.
<stdio.h>/<stdlib.h> only
Here's an example that only uses stdlib/stdio. The downside is it uses an intermediate buffer in user-space.
/* copy.c
*
* Compile with:
*
* gcc copy.c -o copy -Wall -g
*
* Defines function copy:
*
* Copy a source file to a destination file. If the destination file already
* exists, this clobbers it. If the destination file does not exist, it is
* created.
*
* Uses a buffer in user-space, so may not perform as well as
* copy_file_range, which copies in kernel-space.
*
*/
#include <stdlib.h>
#include <stdio.h>
#define BUF_SIZE 65536 //2^16
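int copy(const char* in_path, const char* out_path){
size_t n;
int status = EXIT_FAILURE;
FILE* in=NULL, * out=NULL;
char* buf = calloc(BUF_SIZE, 1);
if(buf && (in = fopen(in_path, "rb")) && (out = fopen(out_path, "wb"))){
/* Stop at end of file, on a read error, or on a short write. */
while((n = fread(buf, 1, BUF_SIZE, in)) && fwrite(buf, 1, n, out) == n)
;
/* Succeed only if all input was read and no stream reported an error. */
if(feof(in) && !ferror(in) && !ferror(out))
status = EXIT_SUCCESS;
}
if(status != EXIT_SUCCESS)
perror("copy");
free(buf);
if(in) fclose(in);
if(out && fclose(out)) status = EXIT_FAILURE;
return status;
}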
/* Test it out.
*
* BASH:
*
* gcc copy.c -o copy -Wall -g
* echo 'Hello, world!' > src.txt
* ./copy src.txt dst.txt
* [ -z "$(diff src.txt dst.txt)" ]
*
*/
int main(int argc, char* argv[argc]){
if(argc!=3){
printf("Usage: %s <SOURCE> <DESTINATION>\n", argv[0]);
exit(EXIT_FAILURE);
}
return copy(argv[1], argv[2]);
}
Another way to ensure portability in general while still working with a Unix-like C API is to develop with the GNOME libraries (e.g. GLib, GIO):
https://docs.gtk.org/glib/
https://docs.gtk.org/gio/
How can I write to an existing file with UTF-16LE encoding? I've already used fopen(file, "a"), but the resulting file looks like this:
<?xml version="1.0" encoding="UTF-16" standalone="yes"?>
㰼㱤㱯㱣㰾㰊㰼㱰㱡㱧㱥㰠㱮㱡㱭㱥㰽㰢㱎㱏㱒㱍㱁㱌㰢㰾㰊㰼㱦㱩㱥㱬㱤㰠㱮㱡㱭㱥㰽㰢㱉㱤㱥㱮㱴㱩㱦㱩㱣㱡㱴㱩㱯㱮㸢㱔㱃㰳㰶㰰㰴㰰㰱㰭㰭㰭㰭㰱㰲㰷㰼㰯㱦㱩㱥㱬㱤㰾㰊㰼㱦㱩㱥㱬㱤㰠㱮㱡㱭㱥㰽㰢㱔㱲㱡㱣㱥㱡㱢㱩㱬㱩㱴㱹㸢㰱㰳㱖㱖㱖㰭㰭㰭㰭㰭㰭㰭㰭㰭㰭㰭㰭㰭㰭㰭㰭㰰㰰㰼㰯㱦㱩㱥㱬㱤㰾㰊㰼㱦㱩㱥㱬㱤㰠㱮㱡㱭㱥㰽㰢㱄㱥㱳㱣㱲㱩㱰㱴㱩㱯㱮㸢㱄㱥㱳㱣㱲㱩㱰㱴㱩㱯㱮㰀㰼㰯㱦㱩㱥㱬㱤㰾㰊㰼㰯㱰㱡㱧㱥㰾㰊㰼㰯㱤㱯㱣㰾㰊
I don't know how I can append a 2-byte character to this file.
A UTF-16 character is not necessarily 2 bytes wide. It may be 2 bytes
or 4 bytes: characters above U+FFFF are encoded as a surrogate pair of
two 16-bit code units.
The weird output you have posted most likely results from appending wchar_ts
directly to the file, generating UTF-16 characters with a byte order that is
the reverse of the right one, and these UTF-16 characters fall in the
CJK ideograph region of the Unicode range.
Assuming from your question's tags that you are working with GCC on Linux,
you may use the iconv character-encoding conversion API, which glibc
provides, by including <iconv.h>. Here is a specimen program
that converts the wchar_t array:
L'A',L'P',L'P',L'E',L'N',L'D',L'A',L'G',L'E' // "APPENDAGE"
to UTF-16LE and appends the result to the file "tdata.txt". It hard-codes
a limit of 64 bytes on the converted length of output.
#include <stdio.h>
#include <stdlib.h>
#include <iconv.h>
#include <assert.h>
#define MAXOUT 64
int main(void)
{
wchar_t appendage [] = {
L'A',L'P',L'P',L'E',L'N',L'D',L'A',L'G',L'E'
};
wchar_t * inp = appendage;
char converted[MAXOUT];
char * outp = converted;
size_t remain_in = sizeof(appendage);
size_t remain_out = MAXOUT;
size_t conversions;
size_t written;
char const *tfile = "../tdata.txt";
// Create the right converter from wchar_t to UTF-16LE
iconv_t iconvdesc = iconv_open("UTF-16LE","WCHAR_T");
if (iconvdesc == (iconv_t) -1) {
perror("error: conversion from wchar_t to UTF-16LE is not available");
exit(EXIT_FAILURE);
}
FILE * fp = fopen(tfile,"ab"); // binary mode, so no newline translation alters the UTF-16 bytes
if (!fp) {
fprintf(stderr,"error: cannot open \"%s\" for append\n",tfile);
perror(NULL);
exit(EXIT_FAILURE);
}
// Do the conversion.
conversions =
iconv(iconvdesc, (char **)&inp, &remain_in, (char **)&outp, &remain_out);
if (conversions == (size_t)-1) {
perror("error: iconv() failed");
exit(EXIT_FAILURE);
}
assert(remain_in == 0);
// Write the UTF-16LE
written = fwrite(converted,1,MAXOUT - remain_out,fp);
assert(written == MAXOUT - remain_out);
fclose(fp);
iconv_close(iconvdesc);
exit(EXIT_SUCCESS);
}
With GCC on Linux, wchar_t is 4 bytes wide (it holds UTF-32), so it can represent
any Unicode code point. With Microsoft's compilers it is 2 bytes wide and holds UTF-16 code units.
See the POSIX documentation of <iconv.h> (iconv_open(), iconv() and iconv_close()) for details.
I've been trying to get this code to work for hours! All I need to do is open a file to see if it is real and readable. I'm new to C so I'm sure there is something stupid I'm missing. Here is the code (shorthand, but copied):
#include <stdio.h>
main() {
char fpath[200];
char file = "/test/file.this";
sprintf(fpath,"~cs4352/projects/proj0%s",file);
FILE *fp = fopen(fpath,"r");
if(fp==NULL) {
printf("There is no file on the server");
exit(1);
}
fclose(fp);
//do more stuff
}
I have also verified that the path is correctly specifying a real file that I have read permissions to. Any other ideas?
Edit 1: I do know that the fpath ends up as "~cs4352/projects/proj0/test/file.this"
Edit 2: I have also tried using the absolute file path. In both cases, I can verify with ls that the paths are built properly.
Edit 3: The errno is 2... I'm currently trying to find out what that means.
Edit 4: OK, errno 2 means "No such file or directory". I get this when the path passed to fopen() is "/home/courses1/cs4352/projects/proj0/index.html", which I verified exists and which I have read rights to. As for the C code above, there may be a few semantic/newbie errors in it, but gcc does not give me any compile-time warnings, and the code works exactly as it should except that fopen() keeps failing with errno 2. In other words, I know that all the strings/char arrays are working properly, and the only thing that could be an issue is the fopen() call.
Solution: OK, access() is what helped me the most (and what I am still using, as it is less code and the more elegant way of doing it). The problem actually came from something that I didn't explain to you all (because I didn't see it until I used access()). To derive the file name, I was splitting strings with strtok() on " \n" only, but the input lines ended with "\r\n", so I needed to add "\r" to the delimiters as well. Once I fixed that, everything fell into place, and I'm sure that fopen() would work as well, but I have not tested it.
Thank you all for your helpful suggestions, and especially to Paul Beckingham for finding this wonderful solution.
Cheers!
The "~" is expanded by the shell, and is not expanded by fopen.
To test the existence and readability of a file, consider using the POSIX.1 "access" function:
#include <unistd.h>
if (access ("/path/to/file", F_OK | R_OK) == 0)
{
// file exists and is readable
}
First, file needs to be declared as char * or const char *, not simply char as you've written. This might just be a typo, but the compiler should at least give a warning there.
Secondly, use an absolute path (or a path relative to the current directory), not shell syntax with ~. The substitution of ~cs4352 by the respective home directory is usually done by the shell, but you are directly opening the file. So you are trying to open a file in a ~cs4352 subdirectory of your current working directory, which I guess is not what you want.
Other people have probably produced the equivalent (every modern shell, for example), but here's some code that will expand a filename with ~ or ~user notation.
#if __STDC_VERSION__ >= 199901L
#define _XOPEN_SOURCE 600
#else
#define _XOPEN_SOURCE 500
#endif
#include <assert.h>
#include <limits.h>
#include <pwd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
char *relfname(const char *name, char *buffer, size_t bufsiz)
{
assert(name != 0 && buffer != 0 && bufsiz != 0);
if (name[0] != '~')
strncpy(buffer, name, bufsiz);
else
{
const char *copy;
struct passwd *usr = 0;
if (name[1] == '/' || name[1] == '\0')
{
usr = getpwuid(getuid());
copy = &name[1];
}
else
{
char username[PATH_MAX];
size_t len;
copy = strchr(name, '/');
if (copy == 0)
copy = name + strlen(name);
len = copy - &name[1];
if (len >= sizeof(username))
return(0); /* user name too long for the buffer */
memcpy(username, &name[1], len);
username[len] = '\0';
usr = getpwnam(username);
}
if (usr == 0)
return(0);
snprintf(buffer, bufsiz, "%s%s", usr->pw_dir, copy);
}
buffer[bufsiz-1] = '\0';
return buffer;
}
#ifdef TEST
static struct { const char *name; int result; } files[] =
{
{ "/etc/passwd", 1 },
{ "~/.profile", 1 },
{ "~root/.profile", 1 },
{ "~nonexistent/.profile", 0 },
};
#define DIM(x) (sizeof(x)/sizeof(*(x)))
int main(void)
{
size_t i; /* size_t to match the type of DIM() */
int fail = 0;
for (i = 0; i < DIM(files); i++)
{
char buffer[PATH_MAX];
char *name = relfname(files[i].name, buffer, sizeof(buffer));
if (name == 0 && files[i].result != 0)
{
fail++;
printf("!! FAIL !! %s\n", files[i].name);
}
else if (name != 0 && files[i].result == 0)
{
fail++;
printf("!! FAIL !! %s --> %s (unexpectedly)\n", files[i].name, name);
}
else if (name == 0)
printf("** PASS ** %s (no match)\n", files[i].name);
else
printf("** PASS ** %s -> %s\n", files[i].name, name);
}
return((fail == 0) ? EXIT_SUCCESS : EXIT_FAILURE);
}
#endif
You could try examining errno for more information on why you're not getting a valid FILE*.
BTW: errno is set by many library functions and system calls when they fail, to give more information than just "it didn't work". Its value is only meaningful immediately after a call that has reported failure, since later calls may change it.
char file = "/test/file.this";
You probably want
char *file = "/test/file.this";
Are you sure you do not mean
"~/cs4352/projects/proj0%s"
for your home directory?
To sum up:
1. Use char *file = "/test/file.this";
2. Don't expect fopen() to do shell substitution on ~ because it won't. Use the full path, or use a relative path and make sure the current directory is appropriate.
3. errno 2 (ENOENT) means the file wasn't found. It wasn't found because of item #2 on this list.
4. For extra credit, using sprintf() like this to write into a buffer that's allocated on the stack is a dangerous habit. Look up and use snprintf(), at the very least.
5. As someone else here mentioned, using access() would be a better way to do what you're attempting here.