Copy file skipping first n lines and last m lines - c

I want to copy a file while skipping its first n lines and its last m lines, using open, read, write and lseek
(eg. n = 1, m = 2, source file:
AAAAAAA
BBBBBBB
CCCCCCC
DDDDDDD
dest file:
BBBBBBB )
I know how to copy a file but don't know how to skip the lines. Here is my code for copy:
char buf[128];
ssize_t size; // read() returns ssize_t; -1 signals an error
int source = open(argv[1], O_RDONLY);
int dest = open(argv[2], O_CREAT | O_APPEND | O_WRONLY, 0644); // O_CREAT requires a mode argument
if(source == -1) {
printf("error");
return;
}
if(dest == -1) {
printf("error");
return;
}
while((size = read(source, buf, sizeof(buf))) > 0) {
write(dest, buf, size);
}
close(source);
close(dest);
How can I solve this problem?

You can use fgets to read your file, since it reads line by line.
Because you cannot know the total number of lines without reading the whole file, I would suggest that you:
1. use fgets to read the file line by line
2. skip outputting the first n lines
3. write the rest to your output file, counting the lines and remembering the length of each
4. use ftruncate to cut off the last m lines.
This should do the trick:
void copy_nm(char * source, char * dest, int n, int m) {
FILE * in = fopen(source, "r");
FILE * out = fopen(dest, "w");
size_t file_length = 0;
size_t line_lengths[m + 1];
memset(line_lengths, 0 , sizeof(line_lengths));
int lengths_iterator = 0;
char buffer[0x400];
while (fgets(buffer, sizeof(buffer), in)) {
size_t length = strlen(buffer);
if (n) { // skip this line
if (buffer[length - 1] == '\n') // only if it is a real line
n--;
continue;
}
fwrite(buffer, length, 1, out);
line_lengths[lengths_iterator] += length;
file_length += length;
if (buffer[length - 1] != '\n') { // line was longer than the buffer
continue;
}
lengths_iterator++;
lengths_iterator %= m+1;
line_lengths[lengths_iterator] = 0;
}
for (lengths_iterator = 0; lengths_iterator < m+1; lengths_iterator++) {
file_length -= line_lengths[lengths_iterator];
}
fflush(out); // flush buffered output before truncating
ftruncate(fileno(out), file_length);
fclose(in);
fclose(out);
}
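Since the question asks specifically for open, read, write and lseek, here is a minimal sketch of that route. It is not the answer's method: it counts the lines in one pass, locates the two byte offsets that bound the wanted region, then seeks to the first and copies up to the second. The names copy_skip_lines, count_lines and offset_after_newlines are made up for this illustration, and it assumes a final fragment without a trailing newline counts as a line.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Byte offset just past the k-th newline, or the file size if the file
   has fewer than k newlines. Returns -1 on error. */
static off_t offset_after_newlines(int fd, long k) {
    char buf[4096];
    off_t pos = 0;
    ssize_t got;
    if (k <= 0)
        return 0;
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1)
        return -1;
    while ((got = read(fd, buf, sizeof buf)) > 0) {
        for (ssize_t i = 0; i < got; i++)
            if (buf[i] == '\n' && --k == 0)
                return pos + i + 1;
        pos += got;
    }
    return got < 0 ? -1 : pos;
}

/* Number of lines; an unterminated final fragment counts as a line. */
static long count_lines(int fd) {
    char buf[4096];
    char last = '\n';
    long lines = 0;
    ssize_t got;
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1)
        return -1;
    while ((got = read(fd, buf, sizeof buf)) > 0) {
        for (ssize_t i = 0; i < got; i++)
            if (buf[i] == '\n')
                lines++;
        last = buf[got - 1];
    }
    if (got < 0)
        return -1;
    return last == '\n' ? lines : lines + 1;
}

/* Copy src to dst without the first n and last m lines. 0 on success. */
int copy_skip_lines(const char *src, const char *dst, long n, long m) {
    int in = open(src, O_RDONLY);
    if (in == -1)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out == -1) {
        close(in);
        return -1;
    }
    int rc = -1;
    long total = count_lines(in);
    off_t start = 0, end = 0;
    if (total >= 0 && n + m < total) {
        start = offset_after_newlines(in, n);
        end = offset_after_newlines(in, total - m);
    }
    if (total >= 0 && start >= 0 && end >= start &&
        lseek(in, start, SEEK_SET) != (off_t)-1) {
        char buf[4096];
        off_t left = end - start;
        rc = 0;
        while (left > 0) {
            size_t want = left < (off_t)sizeof buf ? (size_t)left : sizeof buf;
            ssize_t got = read(in, buf, want);
            /* A short write is treated as an error in this sketch. */
            if (got <= 0 || write(out, buf, (size_t)got) != got) {
                rc = -1;
                break;
            }
            left -= got;
        }
    }
    close(in);
    close(out);
    return rc;
}
```

With the example from the question (n = 1, m = 2), the region runs from just after the first newline to just after the second, so only the BBBBBBB line is copied. The file is scanned several times, which keeps the code simple at the cost of extra reads.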

Related

Allocating unknown char[] in dynamically reading a file in c

char* freadline(FILE* fp){
fseek(fp, 0, SEEK_END);
int lSize = ftell(fp);
fseek(fp, 0, SEEK_SET);
char *buffer = malloc(lSize);
fread(buffer, 1, lSize, fp);
fgets(buffer, sizeof(lSize), fp);
return buffer;
}
but it doesn't read line by line. Any suggestions as to how to read this line by line?
There are a couple of solutions here.
The first is to get the size of the entire file using fseek and ftell. fseek lets you go to the end of the file, and ftell gives you the current position, which can be used as the size. You can then allocate a buffer large enough to read the entire file and split it into lines.
The other solution is to use a temporary buffer of 1000 or so bytes, as you're already doing: read a character at a time using fgetc in a loop and feed it into the temporary buffer until you hit a newline, then use strlen to get the length, allocate a buffer of that size, copy the temporary buffer into it, and return the allocated buffer.
There are also errors in your code, as pointed out in the comments. You're discarding your allocated memory, resulting in a leak. And your freadline doesn't actually read a line; it just reads whatever size you tell it to read.
the lines in the file could be of any length.
realloc() is a classic approach, but how about a simple, slow and plodding one:
Read once to find line length, seek, allocate, then read again to save the line.
#include <stdio.h>
char* freadline(FILE *fp) {
int length = 0;
long offset = ftell(fp);
if (offset == -1)
return NULL;
int scan_count = fscanf(fp, "%*[^\n]%n", &length); // Save scan length
if (scan_count == EOF)
return NULL;
if (fseek(fp, offset, SEEK_SET))
return NULL;
size_t n = length + 1u; // +1 for potential \n
char *buf = malloc(n + 1); // + 1 for \0
if (buf == NULL)
return NULL;
size_t len = fread(buf, 1, n, fp);
buf[len] = '\0';
return buf;
}
Test
#include <assert.h>
#include <stdlib.h>
int main() {
FILE *fp = fopen("tmp.txt", "w+");
assert(fp);
for (int i = 0; i < 10; i++) {
int l = i * 7;
for (int j = 0; j < l; j++) {
fputc(rand() % 26 + 'a', fp);
}
fputc('\n', fp);
}
rewind(fp);
char *s;
while ((s = freadline(fp)) != NULL) {
printf("<%s>", s);
free(s);
}
fclose(fp);
return 0;
}
Output
<
><lvqdyoq
><ykfdbxnqdquhyd
><jaeebzqmtblcabwgmscrn
><oiaftlfpcuqffaxozqegxmwgglkh
><vxtdhnzqankyprbwteazdafeqxtijjtkwea
><zqgmplohyxrutojvbzllqgjaidbtqibygdzcxkujvw
><ghwbmjjmbpksnzkgzgiluiggpkzwhaetclrcyxcsixsutjmrm
><vqlybsjnihnfqyfhyszwgpsvnhnngdnjzjypqcflnztrhcfgbkakzxam
><alsuauxxchqjxqaiddtjszgcbullyyjymytioyawpzshhfpqpsatddbcagjgobm
>
If you're ok targeting POSIX, it already has a function that does what you need: getline.
#include <stdio.h>
#include <stdlib.h>
FILE *fh = ...;
char *line = NULL;
size_t buf_size = 0;
while (1) {
ssize_t line_len = getline(&line, &buf_size, fh);
if (line_len == -1)
break;
// ...
}
free(line);
If not, getline can be implemented using fgets and realloc in a loop. Just start with an arbitrarily sized buffer.
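A minimal sketch of that fgets-and-realloc loop, for platforms without getline. The name read_line and the starting size of 64 are arbitrary choices for this illustration, and unlike getline it allocates a fresh buffer per call instead of reusing one. It also assumes the text contains no embedded '\0' bytes, since it relies on strlen.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd line, including its '\n' if it had one, or NULL at
   end of file or on allocation failure. The caller frees the result. */
char *read_line(FILE *fp) {
    size_t cap = 64; /* arbitrary starting size */
    size_t len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    /* Each pass appends to what has been read so far. */
    while (fgets(buf + len, (int)(cap - len), fp) != NULL) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n')
            return buf; /* got a complete line */

        /* No newline yet: grow the buffer and keep reading. */
        char *bigger = realloc(buf, cap * 2);
        if (bigger == NULL) {
            free(buf);
            return NULL;
        }
        buf = bigger;
        cap *= 2;
    }

    if (len == 0) { /* nothing was read: end of file */
        free(buf);
        return NULL;
    }
    return buf; /* last line without a trailing newline */
}
```

It can be used the same way as the freadline test above: call it in a loop until it returns NULL, and free each returned line.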

Corrupted data when using a multidimensional char array

I'm currently implementing a function that uses the "external sort" method, because I have to sort a big file (200K+ lines) on a device with little RAM. Right now I'm just trying to make it run on a Windows PC.
I'm working on the function that splits the file into small sorted files.
The problem I'm facing is that in some of the small sorted files the function creates, the data on certain lines is truncated.
I'm quite sure I've made a mistake somewhere, but I haven't been able to find it yet. Could you help me find the problem, please?
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define MAX_LINE_LEN 50
#define MAX_LINES_SORTED 130
void createSortedFiles(FILE*);
int main()
{
FILE *fp = fopen("C:\\C\\Tests\\1.txt", "r+");
if(fp == NULL){
printf("Error opening fp");
return 1;
}
createSortedFiles(fp);
fclose(fp);
return 0;
}
int cmp(const void *p1, const void *p2) {
return strcmp(p1, p2);
}
void createSortedFiles(FILE* fp) {
FILE* sfp;
//FILE* sfp2 = fopen("C:\\C\\Tests\\test.txt", "w+");
char lines[MAX_LINES_SORTED][MAX_LINE_LEN + 1] = {0}, buffer[MAX_LINE_LEN + 1] = { 0 }, fnum[6];
char fname[20] = "C:\\C\\Tests\\";
char *p;
int i = 0, j = 0 /*file names*/, max_lines = MAX_LINES_SORTED - 1;
size_t N;
while (1){
p = fgets(buffer, MAX_LINE_LEN, fp);
// fwrite(buffer, strlen(buffer), 1, sfp2);
if(strlen(buffer) > 0 || i > 0){
if(p != NULL)
memcpy(lines[i], buffer, strlen(buffer));
//If reached the max number of lines accepted in the array
//Or reached EOF
//=> Sort and write the array "lines"
if (i >= max_lines || p == NULL) {
N = sizeof(lines) / sizeof(lines[0]);
qsort(lines, N, sizeof(*lines), cmp);
//sets the name of the current file
memset(&fname[11], 0, 9);
itoa(j, fnum, 10);
strcat(fname, fnum);
if ((sfp = fopen(fname, "w+")) == NULL) {
printf("Error opening sfp");
return;
}
for (i = 0; i < N; i++) {
fwrite(lines[i], strlen(lines[i]), 1, sfp);
}
fclose(sfp);
memset(lines, 0, sizeof(lines[0][0]) * MAX_LINES_SORTED * MAX_LINE_LEN);
j++; i = -1; //because incremented right after
}
}
if(p == NULL){
break;
}
i++;
}
//fclose(sfp2);
return;
}
Here's an example of the fp file (each line ends with \r\n):
8023796280724;00060-014.W47
8023796280731;00060-014.W48
;0009070305/08007
;0009470337/08007
;0009490338/13001
;0010480311/08007
;0010830308/08007
;0011S
8033280129293;002004GRS4XL
;002015RSM
5708628117005;00207-630-06T42
5708628117012;00207-630-06T44
5708628117036;00207-630-06T46
4051428088756;647530241000045
4051428088763;647530241000046
4051428088770;647530241000047
;647BLPMF
4051428092586;648510256000040
4051428092593;648510256000041
4051428092609;648510256000042
4051428092616;648510256000043
4051428092623;648510256000044
4051428092630;648510256000045
4051428092647;648510256000046
Your "truncated lines" are not really truncated lines; they are stray data left in the buffer from previous files.
This array:
#define MAX_LINE_LEN 50
#define MAX_LINES_SORTED 130
char lines[MAX_LINES_SORTED][MAX_LINE_LEN + 1];
has 6630 bytes, but here:
memset(lines, 0, sizeof(lines[0][0]) * MAX_LINES_SORTED * MAX_LINE_LEN);
you zero out only 6500 bytes, so the last 130 bytes of the array (its final two rows and part of the row before them) keep their old contents.
You can fix this by using (MAX_LINE_LEN + 1) in the size calculation, but the array can be zeroed out more tersely (and more reliably) with just:
memset(lines, 0, sizeof(lines));
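To see the shortfall concretely, the small check below fills an array of the same shape with a marker byte, applies the size expression from the question, and counts the bytes that survive. The function name stale_bytes_after_partial_clear is made up for this illustration.

```c
#include <stddef.h>
#include <string.h>

#define MAX_LINE_LEN 50
#define MAX_LINES_SORTED 130

/* Counts the bytes left untouched by the memset size used in the question. */
size_t stale_bytes_after_partial_clear(void) {
    static char lines[MAX_LINES_SORTED][MAX_LINE_LEN + 1];
    const char *bytes = (const char *)lines;
    size_t stale = 0;

    memset(lines, 'x', sizeof(lines)); /* pretend the array holds old data */
    /* Same size expression as in the question: 130 * 50 bytes. */
    memset(lines, 0, sizeof(lines[0][0]) * MAX_LINES_SORTED * MAX_LINE_LEN);

    for (size_t i = 0; i < sizeof(lines); i++)
        if (bytes[i] != 0)
            stale++;
    return stale; /* 6630 - 6500 = 130 */
}
```

Those 130 surviving bytes are what qsort later picks up as extra "lines" in the next output file.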

Copying file pointer into array

Here, I am trying to read from a text file, copy this file into an array, then I want to write the array to another text file. This is not copying into the array at all. I am just getting blank values when I print.
int main(void)
{
char char_array[50];
char copied_array[50];
//int n = 2;
FILE* fpointer = fopen("hello_world.txt", "r");
FILE* fpointer2 = fopen("copyhello.txt", "w");
for(int i = 0;i < 50; i++)
{
fread(&char_array, sizeof(char), 1, fpointer);
copied_array[i] = char_array[i];
}
for(int j = 0;j < 50; j++)
{
printf("char_array: %c\n", copied_array[j]);
}
fclose(fpointer);
fclose(fpointer2);
}
Here is working code; I hope it makes things clearer :)
Note that this uses fread/fwrite; compare with fgets/fputs for strings.
#include <stdio.h>
#include <string.h>
#define BUFSIZE 50
// memory size 'plus one' to leave room for a string-terminating '\0'
#define BUFMEMSIZE (BUFSIZE+1)
const char *file1 = "hello_world.txt";
const char *file2 = "copyhello.txt";
int main(void)
{
char char_array[BUFMEMSIZE];
char copied_array[BUFMEMSIZE];
FILE *fInput, *fOutput;
fInput = fopen(file1, "r");
if(fInput != NULL)
{
fOutput = fopen(file2, "w");
if(fOutput != NULL)
{
// make sure memory is wiped before use
memset(char_array, 0, BUFMEMSIZE);
memset(copied_array, 0, BUFMEMSIZE);
size_t lastSuccessfulRead = 0;
// the read-then-loop pattern: try and read 50 chars
size_t bytesRead = fread(char_array, sizeof(char), BUFSIZE, fInput);
while(bytesRead != 0)
{
// we got at least 1 char ..
// (to be used at end - so we know where in char_array is the last byte read)
lastSuccessfulRead = bytesRead;
// 'bytesRead' bytes were read : copy to other array
// (memcpy rather than strncpy, so embedded '\0' bytes are copied too)
memcpy(copied_array, char_array, bytesRead);
// write to output file, number of bytes read
fwrite(copied_array, sizeof(char), bytesRead, fOutput);
// read more, and loop, see if we got any more chars
bytesRead = fread(char_array, sizeof(char), BUFSIZE, fInput);
}
// set char after the last-read-in char to null, as a string-terminator.
char_array[lastSuccessfulRead] = '\0';
// an array of chars is also a 'string'
printf("char_array: %s\n", char_array);
fclose(fOutput);
}
else printf("cant open %s\n", file2);
fclose(fInput);
}
else printf("cant open %s\n", file1);
}

C - Segmentation fault (core dumped), read first N bytes from file

I wrote some code to read the first pos bytes from a binary file and write them into another file. It turned out I got a segmentation fault when I ran it. Here is the code:
void outputUntillPos(const char * inFileName, const char * outFileName, int pos) {
FILE * inFile = fopen(inFileName, "r");
FILE * outFile = fopen(outFileName, "aw");
char buf[1024];
int read = 0;
int remain = pos;
do {
if(remain <= 1024) {
read = fread(buf, 1, pos, inFile);
} else {
read = fread(buf, 1, 1024, inFile);
}
remain -= read;
fwrite(buf, 1, read, outFile);
memset(buf, 0, 1024);
} while(remain > 0);
}
Did I perform an out-of-range access here?
EDIT: Thanks to all the help, here is the edited code.
void outputUntillPos(const char * inFileName, const char * outFileName, int pos) {
FILE * inFile = fopen(inFileName, "rb"); // binary input
FILE * outFile = fopen(outFileName, "ab"); // "aw" is not a valid mode; append in binary
char buf[1024];
int read = 0;
int remain = pos;
if((inFile != NULL) && (outFile != NULL)) {
do {
if(remain <= 1024) {
read = fread(buf, 1, remain, inFile);
} else {
read = fread(buf, 1, 1024, inFile);
}
remain -= read;
fwrite(buf, 1, read, outFile);
memset(buf, 0, 1024);
} while(remain > 0 && read > 0);
}
if(inFile != NULL) fclose(inFile); // fclose(NULL) is undefined behavior
if(outFile != NULL) fclose(outFile);
}
When remain becomes <= 1024 and the if branch is entered, you read pos bytes, which, if pos is greater than 1024, writes past the end of the buffer. That's what causes the segfault.
You want to use remain here instead:
if(remain <= 1024) {
read = fread(buf, 1, remain, inFile);
} else {
read = fread(buf, 1, 1024, inFile);
}
Also, be sure to check the return value of fopen, and to fclose(inFile) and fclose(outFile) before you return.
When the number of bytes left to read (in the variable remain) drops to 1024 or less, you attempt to read pos bytes for some reason. Why pos? You are supposed to read remain bytes on the last iteration, not pos bytes.
If pos is greater than 1024 and the input file still has extra data, then of course you will overrun the buffer on that last iteration.
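Putting the advice from both answers together, a minimal sketch might look like the following. The name copy_prefix is made up for this illustration, and it opens the output with "wb" (overwrite) rather than appending as the question does; that choice is an assumption, not something the answers prescribe.

```c
#include <stdio.h>

/* Copies at most pos bytes from in_name to out_name.
   Returns the number of bytes copied, or -1 on failure. */
long copy_prefix(const char *in_name, const char *out_name, long pos) {
    FILE *in = fopen(in_name, "rb");
    if (in == NULL)
        return -1;
    FILE *out = fopen(out_name, "wb"); /* assumption: overwrite, not append */
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    char buf[1024];
    long remain = pos;
    long copied = 0;
    while (remain > 0) {
        /* Never ask for more than the buffer holds or than is left. */
        size_t want = remain < (long)sizeof buf ? (size_t)remain : sizeof buf;
        size_t got = fread(buf, 1, want, in);
        if (got == 0)
            break; /* end of file or read error */
        if (fwrite(buf, 1, got, out) != got) {
            copied = -1;
            break;
        }
        remain -= (long)got;
        copied += (long)got;
    }

    fclose(in);
    if (fclose(out) != 0)
        copied = -1;
    return copied;
}
```

Clamping the request to the smaller of remain and sizeof buf removes the need for the if/else on 1024, and the early exit on a zero-length read keeps the loop from spinning when the file is shorter than pos.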

split file in c error get buffer in readfile

I wrote a program in C on Ubuntu to split a file.
I get an error when reading the file into a buffer.
Here is my code.
int split(char *filename, unsigned long part) {
FILE *fp;
char *buffer;
size_t result; // bytes read
off_t fileSize;
fp = fopen(filename, "rb");
if (fp == NULL) {
fprintf(stderr, "Cannot Open %s", filename);
exit(2);
}
// Get Size
fileSize = get_file_size(filename);
// Buffer
buffer = (char*) malloc(sizeof(char) * (fileSize + 1));
if (buffer == NULL) {
fputs("Memory error", stderr);
fclose(fp);
return 1;
}
// Copy file into buffer
//char buffers[11];
result = fread(buffer, 1, fileSize, fp);
buffer[fileSize] = '\0';
if (result != fileSize) {
fputs("Reading error", stderr);
return 1;
}
// Split file
off_t partSize = fileSize / part;
// Last Part
off_t lastPartSize = fileSize - partSize * part;
unsigned long i;
unsigned long j;
// create part 1 to n-1
for (j = 0; j < part; j++) {
char partName[255];
char *content;
char partNumber[3];
// Content of file part
// for (i = j; i < partSize * (j + 1); i++) {
//
// }
content = (char*) malloc(sizeof(char) * partSize);
content = copychar(buffer, j + i, partSize + i);
i += partSize;
//copy name
strcpy(partName, filename);
// part Number
sprintf(partNumber, "%d", j);
// file name with .part1 2 3 4 ....
strcat(partName, ".part");
strcat(partName, partNumber);
// Write to file
writeFile(partName, content);
free(content);
}
// last part
char *content;
content = (char*) malloc(sizeof(char) * (fileSize - partSize * (part - 1)));
content = copychar(buffer, (part - 1) * partSize + 1, fileSize);
char lastPartNumber[3];
char lastPartName[255];
sprintf(lastPartNumber, "%d", part);
strcpy(lastPartName, filename);
strcat(lastPartName, ".part");
strcat(lastPartName, lastPartNumber);
writeFile(lastPartName, content);
free(content);
free(buffer);
fclose(fp);
return 0;
}
Here is the copychar function, which copies from start to end:
char *copychar(char* buffer, unsigned long start, unsigned long end) {
if (start >= end)
return NULL;
char *result;
result = (char*) malloc(sizeof(char) * (end - start) + 1);
unsigned long i;
for (i = start; i <= end; i++)
result[i] = buffer[i];
result[end] = '\0';
return result;
}
Here is the function to get the file size:
off_t get_file_size(char *filename) {
struct stat st;
if (stat(filename, &st) == 0)
return st.st_size;
fprintf(stderr, "Cannot determine size of %s: %s\n", filename);
return -1;
}
Here is the function to write a file:
int writeFile(char* filename, char*buffer) {
if (buffer == NULL || filename == NULL)
return 1;
FILE *file;
file = fopen(filename, "wb");
fwrite(buffer, sizeof(char), sizeof(buffer) + 1, file);
fclose(file);
return 0;
}
When I test with a 29 MB file, the program crashes with a core dump.
In the debugger, fileSize is correct, but after reading the file the buffer appears to hold only 135 characters, and the call to copychar then fails.
Breakpoint 1, 0x0000000000400a0b in copychar (buffer=0x7ffff5e3a010 "!<arch>\ndebian-binary 1342169369 0 0 100644 4 `\n2.0\ncontrol.tar.gz 1342169369 0 0 100644 4557 `\n\037\213\b", start=4154703576, end=4164450461) at final.c:43
Program received signal SIGSEGV, Segmentation fault.
0x0000000000400a0b in copychar (buffer=0x7ffff5e3a010 "!<arch>\ndebian-binary 1342169369 0 0 100644 4 `\n2.0\ncontrol.tar.gz 1342169369 0 0 100644 4557 `\n\037\213\b", start=4154703576, end=4164450461) at final.c:43
Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
I don't know how to divide the buffer into parts so that each part can be written out when splitting.
Thanks in advance!
It's highly impractical to copy files in 1 big block as you may have noticed. And it's not needed.
At the simplest level you could copy the file byte by byte, like this
while( ( ch = fgetc(source) ) != EOF ) {
fputc(ch, target);
}
Which will work, but it will be quite slow. Better to copy in blocks, like this:
unsigned char buf[4096];
size_t size;
while( (size = fread(buf, 1, sizeof(buf), fpRead) ) > 0) {
fwrite(buf, 1, size, fpWrite);
}
Notice that the resulting code is way simpler and contains no dynamic memory allocation.
You still need to add the splitting logic of course, but that can be done by tracking the number of bytes written and opening a new write-file before actually writing it.
EDIT: here is how to handle the multipart aspect, schematically. You still need to add extra checks for some special cases and test the results of the various calls, of course:
unsigned char buf[4096];
size_t size;
size_t partsize = 100000; // assuming you want to write 100k parts.
size_t stilltobewritten = partsize; // bytes remaining to be written in current part
size_t chunksize = sizeof(buf); // first time around we read full buffersize
while( (size = fread(buf, 1, chunksize, fpRead) ) > 0) {
fwrite(buf, 1, size, fpWrite);
stilltobewritten -= size; // subtract bytes written from the remaining balance
if (stilltobewritten == 0) {
// part is complete, close this part and open next
fclose(fpWrite);
fpWrite = fopen(nextpart,"wb");
// and reinit variables
stilltobewritten = partsize;
chunksize = sizeof(buf);
} else {
// prep next round on present file - just the special case of the last block
// to handle
chunksize = (stilltobewritten > sizeof(buf)) ? sizeof(buf) : stilltobewritten;
}
}
EDIT 2: building the part file name can be made a lot simpler as well:
sprintf(partName, "%s.part%lu", filename, j); // j is an unsigned long
Concerning the original code, there's some confusion about start and end in copychar. First, you probably meant sizeof(char) * (end - start + 1) rather than sizeof(char) * (end - start) + 1 in the malloc. Second, you're copying end - start + 1 characters from the original buffer (for (i = start; i <= end; i++)) and then overwriting the last one with '\0', which probably isn't the intended behavior.
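Two further details in the question's code are worth noting. In copychar, the loop writes result[i] with i starting at start, so for any start greater than 0 the writes land beyond the end - start + 1 bytes that were allocated. In split, i is never initialized before it is used in j + i, which fits the huge start value shown in the debugger output. A minimal sketch of a range copy that avoids the first problem follows; the name copy_range and the choice of an exclusive end are assumptions made for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Copies buffer[start] .. buffer[end - 1] (end is exclusive) into a new
   NUL-terminated block. Returns NULL for an empty range or on allocation
   failure. The caller frees the result. */
char *copy_range(const char *buffer, size_t start, size_t end) {
    if (start >= end)
        return NULL;

    size_t len = end - start;
    char *result = malloc(len + 1); /* +1 for the terminator */
    if (result == NULL)
        return NULL;

    /* The destination starts at index 0, not at start. */
    memcpy(result, buffer + start, len);
    result[len] = '\0';
    return result;
}
```

Because the function allocates its own block, the caller should not malloc a buffer beforehand and then overwrite the pointer, as the question's split does with content. For binary parts, the length has to be passed to the writing function as well, since neither strlen nor sizeof on a pointer gives the size of the data.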
