Read own source file and text file line number? - c

We have an assignment, and the teacher doesn't go into depth when explaining things, so I'm a bit confused since I haven't done much programming before. We have to write a program that, when executed, reads its own source file and creates a text file identical to the source file except that each line has a line number. My problem is that I don't understand how to begin. Could someone give me an example of how to get started and what steps to take? I'm not asking for someone to do the programming for me, just for an example. Thanks in advance.

Roughly the steps you'll want to take are:
Read each line of the input text file.
Prepend the line number to the beginning of each line.
Write your modified lines into a new text file.
There's a lot of good information on how to read/write to files here, and string concatenation (for how to prepend the line number) here. You may also want to look into for loops so that you can hit every line in the input file.
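The three steps above can be sketched roughly as follows. This is only a minimal sketch: the function name number_lines and the "N: " prefix format are my own choices for illustration, not something the assignment prescribes. It also uses fprintf to write the number directly instead of string concatenation, and copies character by character so that no line-length limit is needed.

```c
#include <stdio.h>

/* Copy `in` to `out`, writing "N: " in front of every line.
 * Returns the number of lines seen.
 * (number_lines and the prefix format are illustrative assumptions.) */
long number_lines(FILE *in, FILE *out)
{
    long line_no = 0;
    int at_line_start = 1;
    int ch;

    while ((ch = fgetc(in)) != EOF) {
        if (at_line_start) {
            /* First character of a new line: emit the line number first. */
            fprintf(out, "%ld: ", ++line_no);
            at_line_start = 0;
        }
        fputc(ch, out);
        if (ch == '\n')
            at_line_start = 1;
    }
    return line_no;
}
```

You would call it with two streams opened via fopen, one on the input file in "r" mode and one on the new text file in "w" mode, and fclose both afterwards.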

There are really two parts to your question: "Who am I?" (what file are you) and "Write a copy of myself with line numbers"
The part that you describe above is the first -- "Who am I?" and for that, something external to your source code has to provide the info because the language itself can reside in any file.
Often, information about what is being compiled is made available by the preprocessor (as the name suggests, it runs before your source code is compiled). In this case, "preprocessor macros" commonly give you this sort of environmental data.
Take a look at this link for GNU C: to start researching what is available under what conditions. Your compiler, if not gcc, should have similar docs.
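As one concrete case of such a macro, standard C defines __FILE__, which expands to a string literal holding the name of the source file as it was given to the compiler. A hedged sketch of using it for the "Who am I?" part (the helper names own_source_path and open_own_source are made up for illustration):

```c
#include <stdio.h>

/* __FILE__ is replaced by the preprocessor with the source file's name,
 * exactly as it was passed to the compiler (often a relative path). */
const char *own_source_path(void)
{
    return __FILE__;
}

/* Try to open the program's own source for reading.
 * Assumption: the source file still exists at that path when the
 * program runs; otherwise fopen returns NULL. */
FILE *open_own_source(void)
{
    return fopen(own_source_path(), "r");
}
```

Because the path may be relative, this only works if the program is run from a directory where that path still resolves, so check the result for NULL before reading from it.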


Find duplicate files using C

I am attempting to write a C program that searches for duplicate files, groups the files, and then returns any files that are duplicates. The user can either enter a file path or specify the files to check on the command line (argc/argv). I am going to use stat() to traverse the system, and I know I need to use a hash table to bin the files. However, I am a bit lost on what to do to actually check if the files are repeats.
I know there are already programs that will do this for you, but this is an academic exercise that I need to complete. I am not looking for the coding answer, just a higher level answer on how I should go about solving the problem. Any feedback is appreciated, including any suggestions other than the ones I have listed above (again, I have to write this program from scratch).
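For the "are these two files repeats?" step, one possible approach (not something stated in the question, just a sketch under my own assumptions) is to compare candidates byte by byte once the hash table has binned them together. Files whose sizes differ, which stat() can report, can be rejected before any bytes are read. The function name same_contents is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if both streams hold identical bytes from their current
 * positions to end-of-file, 0 otherwise.
 * Assumes both streams are regular files opened in binary mode. */
int same_contents(FILE *a, FILE *b)
{
    unsigned char buf_a[4096], buf_b[4096];

    for (;;) {
        size_t na = fread(buf_a, 1, sizeof buf_a, a);
        size_t nb = fread(buf_b, 1, sizeof buf_b, b);

        /* Different chunk lengths or different bytes: not duplicates. */
        if (na != nb || memcmp(buf_a, buf_b, na) != 0)
            return 0;
        /* Both streams ended together; make sure it was not a read error. */
        if (na == 0)
            return !ferror(a) && !ferror(b);
    }
}
```

Hashing each file's contents and using the hash as the table key would reduce how many of these pairwise comparisons are needed, but a final byte comparison still guards against hash collisions.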

How to read a text based data file into an array in C?

I have to read a text-based data file, with an unknown number of data points, into an array in C, but I can't work out how to do this. I can't even manage to get my program to successfully open the text file, let alone put it into an array, etc.
The file contains numerical values, so it is not a string it needs to be read into. Ideally this should be done by the user inputting the file name.
I basically need the program to:
Ask user to input file name (I understand this is just a simple printf job)
When the user inputs the file name, the program opens the text file, stores the data from it into an array of an appropriate size.
Print entire array to show that this has been done.
If anyone could give a step to step explanation of how this can be done I would really appreciate it.
Anything described step by step without any input from you would just be a copy of someone else's work.
The best advice is to learn things step by step on your own.
File I/O in C:
If you want to add features such as user input: How to read a string from user input in C, put in array and print
Do some research on the file's content and how it is handled by the program. (It seems you are referring to an ASCII-format file.)
You should have done some searching before asking a question of this complexity.
If you want advice on this task in the future, I suggest adding your code here.
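As a starting point for the "unknown number of data points" part, here is a minimal sketch that reads whitespace-separated numbers into an array that grows as needed. It assumes the values can be parsed as double by fscanf; the function name read_numbers and the starting capacity of 16 are arbitrary choices for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read whitespace-separated numbers from `in` until EOF or the first
 * token that is not a number. Returns a malloc'd array (caller frees)
 * and stores the element count in *count. Returns NULL if out of memory. */
double *read_numbers(FILE *in, size_t *count)
{
    size_t capacity = 16, n = 0;
    double *data = malloc(capacity * sizeof *data);
    double value;

    if (data == NULL)
        return NULL;

    while (fscanf(in, "%lf", &value) == 1) {
        if (n == capacity) {
            /* Array is full: double its size. */
            double *grown = realloc(data, 2 * capacity * sizeof *grown);
            if (grown == NULL) {
                free(data);
                return NULL;
            }
            data = grown;
            capacity *= 2;
        }
        data[n++] = value;
    }
    *count = n;
    return data;
}
```

To use it, read the file name from the user (for example with fgets, removing the trailing newline), pass it to fopen with mode "r", check that the result is not NULL, and then loop over the returned array with printf to show the values.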

unknown custom file format reader source equals writer source?

Am I correct in assuming that the C-level source of a loader for an obscure file format, if it closely corresponds to a hex dump of the original file, can also be used to write code that constructs that file format from scratch, in something like bootstrapping?
In general, no. There usually are auxiliary resources that do not need to be written out, but still have to be reconstructed by the loading function. It's hard to say anything more without knowing your specific situation.

Extract records from compiled search program, C

Does anybody have an idea on how to extract all information from a compiled, record search program?
I think the program works by using a binary search. It was compiled and the database was in the program. The only way to see the records is to make a correct search.
Is there some way that I can bruteforce the program and extract all information?
Records are looked up by ID, which starts with 1 and is 10 digits long [ 1xxxxxxxxx ].
If you want to try, 1112700303 will work but I don't have the other numbers.
I've tried some decompilers, but I have no idea what I'm doing.
The program can be downloaded from here:
Your help is appreciated, as it will increase my knowledge and help me learn something new here :D
Tough question. Is there no way to get hold of the source code (ask the author, search for the program name, ...)?
On Unix/Linux, the program strings extracts printable strings from a binary file. Doing that on x86 executables gives a long list of strings that are just instructions which happen to be ASCII strings, names of functions used by the program, and other junk. Somewhere it lists initialized text data for the program (printf(3) formats, constant strings used), which in this case shows a bunch of names that look Arabic, and some directory names. Perhaps searching for those could help.
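To make clear what strings is doing, here is a simplified sketch of the idea: scan the bytes and print every run of printable characters that is at least a minimum length. This is not the real tool's implementation, and the function name print_strings is made up; it assumes min_len is at least 1.

```c
#include <ctype.h>
#include <stdio.h>

/* Write every run of at least `min_len` printable bytes in `buf`
 * to `out`, one run per line. Returns the number of runs written.
 * Assumes min_len >= 1. */
size_t print_strings(const unsigned char *buf, size_t len,
                     size_t min_len, FILE *out)
{
    size_t found = 0, start = 0, i;

    for (i = 0; i <= len; i++) {
        /* A run ends at a non-printable byte or at the end of the buffer. */
        if (i == len || !isprint(buf[i])) {
            if (i - start >= min_len) {
                fwrite(buf + start, 1, i - start, out);
                fputc('\n', out);
                found++;
            }
            start = i + 1;
        }
    }
    return found;
}
```

To apply it to an executable, you would read the whole file into a buffer (opened in "rb" mode) and pass that buffer in, with stdout as the output stream.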
This can probably be achieved by using Snowman. It might not give you the exact source code you are looking for, but it should give you enough to extract all the data you need, such as the constant strings.

Why do file formats have magic numbers?

For example, Portable Executable has several, including the famous "MZ" at the beginning, as well as the "PE\0\0" at the start of the PE header. The Rar file format has the "Rar!" header at the beginning, and several others have similar "magic values" in the file.
What purpose do such magic values serve?
Because users change the file extension, or other programs steal the file extension, it allows the application to cancel processing of a file in an unknown format instead of trying its best and then failing anyway.
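A minimal sketch of that kind of early check, assuming the magic value sits at the very start of the file (as with "MZ" or "Rar!"). The function name has_magic and the 16-byte limit are illustrative choices only.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if the file starts with the `len` bytes at `magic`,
 * 0 otherwise (including when the file is shorter than `len`). */
int has_magic(FILE *f, const void *magic, size_t len)
{
    unsigned char head[16];

    if (len > sizeof head)
        return 0;
    rewind(f);                        /* magic is assumed to be at offset 0 */
    if (fread(head, 1, len, f) != len)
        return 0;                     /* too short to carry the magic value */
    return memcmp(head, magic, len) == 0;
}
```

A loader could call has_magic(f, "Rar!", 4) right after opening the file in "rb" mode and refuse to continue when it returns 0, instead of parsing data it does not understand.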
The concept of magic numbers goes back to Unix and predates the use of file extensions.
The original idea of the shell was that all executables would look the same: it didn't matter how the file had been created or what program should be used to evaluate it. The shell would look at the contents of the file and determine the appropriate program. Microsoft came along and chose a different approach, and the era of file extensions was born. Then, to make things "nicer" for users, Microsoft chose to hide these extensions, and the era of trojan files, which look like they are of one type but really have a different extension and are processed by a different program, was born.
If two applications store data differently, but are constructed such that a file for one might possibly also be a valid (but meaningless) file for the other, very bad things can happen. A program may think it has successfully loaded the file (unaware that the data is meaningless) and then write back a file which to it would be semantically identical, but which would no longer be meaningfully readable by the application that wrote it (or anything else for that matter).
Using magic numbers doesn't entirely prevent this, but it can help at least somewhat.
BTW, trying to guess about the format of data is often very dangerous. For example, suppose one has a list of what are probably dates in the format nn-nn-nn. If one doesn't know what format the dates are in, there may be enough information to pretty well guess the format (e.g. if one of the records is 12-31-99, then absent information to the contrary, the dates are probably mm-dd-yy) but if all dates are within the first 12 days of a month, the data could easily be misinterpreted. Suppose, though, the data were preceded by something saying "MM-DD-YY". Then the risks of misinterpretation could be reduced.
To quickly identify the type of the file, or the positions within it.
Your question should not be “why do file formats have magic numbers”, but rather “what are the advantages of file formats having magic numbers”!
Programs that undelete files by reading disk free space can recognize file types.
Your Unix knows whether an executable file is to be interpreted (shebang) or is binary.
When you lose extensions, programs like file can detect what your files are.
Designers of file formats consider it safer when applications can easily check that they are reading a file in the right format.
Since you have a header anyway, it does not cost much to put the magic number at its start.
