Reading files to shared memory - c

I am reading a binary file that I want to offload directly to the Xeon Phi through Cilk and shared memory.
As we are reading a fairly large amount of binary data each time, the preferred option is to use fread.
A very simple example looks like this:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
_Cilk_shared uint8_t* _Cilk_shared buf;
int main(int argc, char **argv) {
printf("Argv is %s\n", argv[1]);
FILE* infile = fopen(argv[1], "rb");
buf = (_Cilk_shared uint8_t*) _Offload_shared_malloc(2073600);
int len = fread(buf, 1, 2073600, infile);
if(ferror(infile)) {
perror("ferror");
}
printf("Len is %d and first value of buf is %d\n", len, *buf);
return 0;
}
The example is greatly simplified from the real code, but it is enough to exemplify the behavior.
This code then prints
ferror: Bad address
Len is 0 and first value of buf is 0
However, if we replace fread with fgets (not very suitable for reading binary data, especially given its return value), things work great.
That is, if we use fgets((char *) buf, 2073600, infile); instead and drop len from the printout, we get
first value of buf is 46
This is what we need, and I can then run _Cilk_offload on a function with buf as an argument and do work on it.
Is there something I am missing, or is fread just not supported? I've tried to find as much information on this as possible from both Intel and other sites on the internet, but sadly I have been unable to.
----EDIT----
After more research, it seems that calling fread on the shared memory with a size greater than 524287 bytes (524287 = 2^19 - 1, i.e. all 19 low bits set) produces the error above. At 524287 or lower, things work, and you can call fread as many times as you want to read all the data.
I am utterly unable to find any documented reason for this.

I don't have a Phi, so I can't check whether this makes a difference, but fread has its own buffering, and even if that may be turned off for this type of reading, I don't see why you would go through the overhead of fread rather than just using the lower-level calls open and read, like this:
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h> // read, close
_Cilk_shared uint8_t* _Cilk_shared buf;
int main(int argc, char **argv) {
printf("Argv is %s\n", argv[1]);
int infile = open(argv[1], O_RDONLY); // should test whether open succeeded, but skipped to keep the code similar to the OP's
int len, pos = 0, size = 2073600;
buf = (_Cilk_shared uint8_t*) _Offload_shared_malloc(size);
do {
buf[pos] = 0; // touch the address so it is mapped into process memory before read
len = read(infile, &buf[pos], size);
if (len < 0) {
perror("error");
break;
}
if (len == 0) // end of file reached before size bytes were read; avoid looping forever
break;
pos += len; // move the position forward in case the first read did not return all the data
size -= len;
} while (size > 0);
printf("Len is %d (%d) and first value of buf is %d\n", len, pos, *buf);
close(infile);
return 0;
}
read and write should work with the allocated shared memory without the problem you are seeing.

Can you try to insert something like this before the fread calls?
memset(buf, 0, 2073600); // after including string.h
This trick worked for me, but I don't know why (lazy allocation?).
FYI, you can also post a MIC question on this forum.

Related

Consistently Getting Null Value in C String using getcwd

I am trying to make a simple program that just writes your working directory to a file, and I cannot, for the life of me, figure out what I am doing wrong. No matter what I do, loc is NULL after my call to getcwd(). I suspect it may have to do with permissions, but allegedly Linux now does some wizardry to ensure that getcwd almost never has access problems (keyword, "almost"). Can anyone test it on their machines? Or is there an obvious bug I am missing?
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main(int argc, char *argv[])
{
printf("Error is with fopen if stops here\n");
FILE* out_file = fopen("dir_loc.sh","w+");
char* loc = malloc(sizeof(char)*10000);
size_t size = sizeof(loc);
printf("Error is with cwd if stops here\n");
loc = getcwd(loc,size);
printf("%s",loc);
fprintf(out_file,"cd %s",loc);
printf("Error is with fclose if stops here\n");
free(loc);
fclose(out_file);
return 0;
}
Compiled with gcc main.c (the file is named "main.c").
EDIT: As several posters mentioned, sizeof(loc) was giving the size of a char pointer, not the amount of space allocated to that pointer. Changed it to malloc(sizeof(char)*1000) and now it all works.
Your problem is here:
size_t size = sizeof(loc);
You're getting the size of a char pointer, not the allocated memory for your char.
Change it to:
size_t size = sizeof(char) * 10000;
or even to
size_t size = 10000;
since sizeof(char) is guaranteed to be 1.
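Since you pass size to getcwd, getcwd is told the buffer holds only sizeof(char *) bytes (typically 8 on a 64-bit system), which is too little for most paths. getcwd then fails, returns NULL and sets errno to ERANGE, and because you assign its result back to loc, loc ends up NULL, so your result is unsurprising.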
If you don't want to change several numbers in the code every time you change the size, you can use a #define macro for text replacement, like this:
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#define LOC_ARRAY_SIZE 10000 // Here you define the array size
int main(int argc, char *argv[])
{
printf("Error is with fopen if stops here\n");
FILE* out_file = fopen("dir_loc.sh","w+");
char* loc = malloc(sizeof(char)*LOC_ARRAY_SIZE); // sizeof(char) could be omitted
size_t size = sizeof(char)*LOC_ARRAY_SIZE;
printf("Error is with cwd if stops here\n");
loc = getcwd(loc,size);
printf("%s",loc);
fprintf(out_file,"cd %s",loc);
printf("Error is with fclose if stops here\n");
free(loc);
fclose(out_file);
return 0;
}

How to read an integer and a char with read() function in C?

I'm working on linux, I have a file that contains a line like this:
328abc
I would like, in C, to read the integer part (328) and the characters 'a','b','c', using only the function:
ssize_t read(int filedes, void *buffer, size_t size)
This is the only thing the file contains.
I know there are better ways to do that with other functions, but I haven't coded in C for a long time, and I'm trying to help a friend; only this function is allowed.
How do I play with the buffer to do that?
Thanks
edit:
I understand that I need to parse the buffer manually; my question is how.
If that's the only thing in the file, this will do:
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
int main() {
char buffer[6];
char intBuffer[4];
ssize_t bytesRead;
int number;
int fd;
if ((fd = open("file.txt", O_RDONLY)) == -1) {
perror("Error opening file");
exit(EXIT_FAILURE);
}
if ((bytesRead = read(fd, buffer, 6)) == -1) {
perror("Error reading file");
exit(EXIT_FAILURE);
}
close(fd);
if (bytesRead < 6) { // the file must contain at least "328abc"
fprintf(stderr, "File too short\n");
exit(EXIT_FAILURE);
}
memcpy(intBuffer, buffer, 3); // copy the three digits
intBuffer[3] = '\0'; // terminate so atoi sees a C string
number = atoi(intBuffer);
printf("The number is %d\n", number);
exit(EXIT_SUCCESS);
}
This code prints "The number is 328".
Is this some kind of homework?
I am asking because there are better ways to do that than using the read function.
Anyway, to answer your question: read reads up to size bytes from the file whose file descriptor is filedes and places them in buffer; it may return fewer bytes than requested.
It does not know anything about line breaks, so you have to find where a line ends yourself. If you want to use only read, you need to parse the buffer manually after each call to read (assuming your file contains many lines that you want to parse).
Beware that a line may be split between two read calls, so you need to handle that case with caution.

File get contents in C

What is the best way to get the contents of a file into a single character array?
I have read this question:
Easiest way to get file's contents in C
But from the comments, I've seen that the solution isn't great for large files. I do have access to the stat function. If the file size is over 4 GB, should I just return an error?
The contents of the file are encrypted, and since the file is supplied by the user, it could be as large as anyone wants. I want the function to return an error rather than crash if the file is too big. The main purpose of populating the character array with the contents of a file is to compare it to another character array and also (if needed and configured to do so) to log both of them to a log file (or multiple log files if necessary).
You may use fstat(2) from sys/stat.h. Here is a small function that gets the size of the file, allocates memory if the file is smaller than 4 GB, and fails otherwise. It reads the whole file into a newly allocated char array and returns a char * to it; the caller should free() it after use.
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
char *loadlfile(const char *path)
{
int file_descr;
FILE *fp;
struct stat buf;
char *buffer;
size_t nread;
if ((file_descr = open(path, O_RDONLY)) == -1)
return NULL;
if (fstat(file_descr, &buf) == -1) {
close(file_descr);
return NULL;
}
// This check is done at preprocessing and requires no check at runtime.
// It basically means "If this machine is not of a popular 64-bit architecture,
// it probably has limits on its maximum memory size."
// This check is done to skip an unnecessary malloc(3) invocation at runtime.
// x86-64 and AArch64 (GCC/Clang), x64 and ARM64 (Microsoft's compiler).
#if !defined(__x86_64__) && !defined(__aarch64__) && !defined(_M_X64) && !defined(_M_ARM64)
#define FILE_MAX_BYTES (4000000000LL)
// buf.st_size is of type off_t, so cast it for the comparison.
if ((long long)buf.st_size >= FILE_MAX_BYTES - 1) {
close(file_descr);
return NULL;
}
#endif
if (NULL == (buffer = malloc((size_t)buf.st_size + 1))) {
close(file_descr);
return NULL;
}
if (NULL == (fp = fdopen(file_descr, "rb"))) {
free(buffer);
close(file_descr);
return NULL;
}
// Read the whole file at once instead of one fgetc call per byte.
nread = fread(buffer, 1, (size_t)buf.st_size, fp);
buffer[nread] = '\0';
fclose(fp); // also closes file_descr
return buffer;
}
A very broad list of predefined macros can be found at http://sourceforge.net/p/predef/wiki/Home/. The reason for the architecture and file size check is that malloc can be expensive at times, so it is best to skip it when it is not needed: requesting a whole 4 GB block on a machine that can address at most 4 GB of memory is just a waste of precious cycles.
If I understand your question correctly, you can simply adapt the code from that answer:
char * buffer = 0;
long length;
FILE * f = fopen (filename, "rb");
if (f)
{
fseek (f, 0, SEEK_END);
length = ftell (f);
if (length < 0 || length > MY_MAX_SIZE) { // MY_MAX_SIZE is your own limit
fclose (f);
return -1;
}
fseek (f, 0, SEEK_SET);
buffer = malloc (length);
if (buffer && fread (buffer, 1, length, f) != (size_t) length)
{
free (buffer); // short read: treat it as an error
buffer = 0;
}
fclose (f);
}
if (buffer)
{
// start to process your data / extract strings here...
}

Why can't my program save a large amount (>2GB) to a file?

I am having trouble trying to figure out why my program cannot save more than 2GB of data to a file. I cannot tell if this is a programming or environment (OS) problem. Here is my source code:
#define _LARGEFILE_SOURCE
#define _LARGEFILE64_SOURCE
#define _FILE_OFFSET_BITS 64
#include <math.h>
#include <time.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/*-------------------------------------*/
//for file mapping in Linux
#include<fcntl.h>
#include<unistd.h>
#include<sys/stat.h>
#include<sys/time.h>
#include<sys/mman.h>
#include<sys/types.h>
/*-------------------------------------*/
#define PERMS 0600
#define NEW(type) (type *) malloc(sizeof(type))
#define FILE_MODE (S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH)
void write_result(char *filename, char *data, long long length){
int fd, fq;
fd = open(filename, O_RDWR|O_CREAT|O_LARGEFILE, 0644);
if (fd < 0) {
perror(filename);
return -1;
}
if (ftruncate(fd, length) < 0)
{
printf("[%d]-ftruncate64 error: %s/n", errno, strerror(errno));
close(fd);
return 0;
}
fq = write (fd, data,length);
close(fd);
return;
}
main()
{
long long offset = 3000000000; // 3GB
char * ttt;
ttt = (char *)malloc(sizeof(char) *offset);
printf("length->%lld\n",strlen(ttt)); // length=0
memset (ttt,1,offset);
printf("length->%lld\n",strlen(ttt)); // length=3GB
write_result("test.big",ttt,offset);
return 1;
}
According to my tests, the program can create a file larger than 2GB and can allocate that much memory as well.
The weird thing happens when I try to write data into the file: when I check the file, it is empty, although it is supposed to be filled with 1s.
Can any one be kind and help me with this?
You need to read a little more about C strings and what malloc and calloc do.
In your original main, ttt points to memory holding whatever garbage was there when malloc was called. This means a nul terminator (the end marker of a C string, a byte with value 0) could be anywhere in that memory, or nowhere in it, in which case strlen reads past the end of the allocation.
Also, since malloc does not touch every byte of the allocated memory (and you're asking for a lot), you could get sparse memory, which means the memory is not actually physically available until it is read or written.
calloc allocates and fills the allocated memory with 0. It is a little more prone to fail because of this (it touches every byte allocated, so if the OS left the allocation sparse it will not be sparse after calloc fills it.)
Here's your code with fixes for the above issues.
You should also always check the return value from write and react accordingly. I'll leave that to you...
int main(void)
{
long long offset = 3000000000; // 3GB
char * ttt;
//ttt = (char *)malloc(sizeof(char) *offset);
ttt = (char *)calloc( offset, sizeof( char ) ); // instead of malloc( ... )
if( !ttt )
{
puts( "calloc failed, bye bye now!" );
exit( 87 );
}
printf("length->%zu\n",strlen(ttt)); // length=0 (this now works as expected if calloc does not fail)
memset( ttt, 1, offset );
ttt[offset - 1] = 0; // Now it's nul terminated and the printf below will work
printf("length->%zu\n",strlen(ttt)); // length=offset-1 (just under 3GB)
write_result("test.big",ttt,offset);
free( ttt );
return 0;
}
Note to Linux gurus... I know sparse may not be the correct term. Please correct me if I'm wrong as it's been a while since I've been buried in Linux minutiae. :)
Looks like you're hitting the internal file system's limitation on the iDevice; see "ios - Enterprise app with more than resource files of size 2GB".
Files of 2 GB or more are simply not possible there. If you need to store that much data, you should consider using other tools or writing a file chunk manager.
I'm going to go out on a limb here and say that your problem may lie in memset().
I think the best thing to do here is to check the data after memset()ing it:
for (unsigned long i = 0; i < 3000000000UL; i++) {
if (ttt[i] != 1) { printf("error in data at location %lu\n", i); break; }
}
Once you've validated that the data you're trying to write is correct, then you should look into writing a smaller file such as 1GB and see if you have the same problems. Eliminate each and every possible variable and you will find the answer.

Using fseek and fread

I am working on a project that reads data from bin files and processes the data. The bin file is huge and is about 150MB. I am trying to use fseek to skip unwanted processing of data.
I am wondering if the processing time of fseek is the same as fread.
Thanks!
fseek just repositions the stream's file position indicator, whereas fread actually reads data, so I'd expect fseek to be much faster than fread.
If you are really curious to see what's happening behind the scenes, download the glibc source and check for yourself :)
I am wondering if the processing time of fseek is the same as fread.
Probably not, though of course it's implementation-dependent.
Most likely, fseek will only set an in-memory "file pointer" without going out to the disk to read any information. fread, on the other hand, will read information.
An fseek to file position 149M followed by a 1M fread will probably be faster than 150 different 1M fread calls, throwing away all but the last.
I feel fseek is probably a bit faster than fread, since fseek only moves the file position to the offset you specify and no data is read.
If you are processing huge files have you considered alternatives to read/write?
You may find that mmap() (UNIX) or MapViewOfFile (Windows) is a more suitable alternative.
The following UNIX example demonstrates opening a file for reading and counting the occurrences of the ASCII character 'Q'. NOTE: all error checking has been omitted to keep the example short.
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
int main(int argc, char **argv)
{
int fd, total;
off_t i, len;
char *map;
fd = open("/tmp/mybigfile", O_RDONLY);
len = lseek(fd, 0, SEEK_END); // offset 0 from the end gives the file size
map = (char *)mmap(0, len, PROT_READ, MAP_SHARED, fd, 0);
total = 0;
for (i = 0; i < len; i++) {
if (map[i] == 'Q') total++;
}
printf("Found %d instances of 'Q'\n", total);
munmap(map, len);
close(fd);
return 0;
}
