How can I read a string from a PDF file in C?

I want to create a program that computes the edit distance between two files. My code works with strings read from a txt file, but now I want to read strings from PDF, DOC, etc. How can I read strings from these files? I tried the function fread, but it does not work.
This is the code that I wrote:
void method () {
FILE *file;
char *str;
if ((file = fopen("C:/Users/latin/Desktop/prova.pdf", "rb")) == NULL) {
printf("Error!\n");
}
fread(&str,18,1,file);
printf("%s",str);
}
prova.pdf is a PDF file that contains this string: ciaoCiao merendina

It is possible to do this in plain C. Adobe did it. Artifex did it. Others have done it. But as noted in the comments, it is a ton of work. I can outline the steps to give you a feel for what's involved.
First you could read the "Magic Number" at the start and check that it is actually a PDF. It should start with %PDF- followed by a version number. But apparently many PDF producers don't conform to this requirement.
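As a rough sketch of that first check, the helper below reads the first five bytes and compares them with %PDF-. The name looks_like_pdf is made up for this illustration, and the check is deliberately strict: it assumes the header sits at byte 0, which, as noted above, not every producer guarantees.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the stream starts with "%PDF-",
   0 otherwise. Rewinds the stream before reading. */
int looks_like_pdf(FILE *fp)
{
    char header[5];

    assert(fp != NULL);
    rewind(fp);
    if (fread(header, 1, sizeof header, fp) != sizeof header)
        return 0;                       /* file shorter than 5 bytes */
    return memcmp(header, "%PDF-", sizeof header) == 0;
}
```

The file should be opened with fopen(path, "rb") so that no newline translation happens on Windows.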
Next, you need to skip to the very end of the file and read backwards, looking for something like:
startxref
1581
%%EOF
That number is the byte offset of the start of the cross-reference (xref) table, which lists the byte offsets of all the "objects" in the file. An object can be a Page or a Font or a Content Stream or many other things.
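A minimal sketch of this step, assuming the startxref keyword sits within the last 1024 bytes of the file (that window size and the function name find_startxref are choices made for this example):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the number that follows the last
   "startxref" keyword, or -1 if the keyword is not found in the
   final 1024 bytes. The stream must be opened in binary mode. */
long find_startxref(FILE *fp)
{
    char tail[1024 + 1];
    const char *kw = "startxref";
    size_t kwlen = strlen(kw);
    long size, start;
    size_t n, i;

    assert(fp != NULL);
    if (fseek(fp, 0L, SEEK_END) != 0)
        return -1;
    size = ftell(fp);
    if (size < 0)
        return -1;
    start = size > 1024 ? size - 1024 : 0;
    if (fseek(fp, start, SEEK_SET) != 0)
        return -1;
    n = fread(tail, 1, sizeof tail - 1, fp);
    tail[n] = '\0';                     /* lets strtol stop safely */
    if (n < kwlen)
        return -1;

    /* Scan backwards; memcmp is used because the tail may hold NUL bytes. */
    for (i = n - kwlen + 1; i-- > 0; ) {
        if (memcmp(tail + i, kw, kwlen) == 0)
            return strtol(tail + i + kwlen, NULL, 10);
    }
    return -1;
}
```

For the trailer shown above this would return 1581, which you can then pass to fseek to jump to the table. Note that strtol returns 0 when no digits follow the keyword, so a careful caller would validate the result against the file size.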
Looking at the cross-reference table, you'll see something like this:
xref
0 5
0000000000 65535 f
0000000010 00000 n
0000000063 00000 n
0000000127 00000 n
0000000234 00000 n
trailer
<<
/Root 1 0 R
/Size 5
>>
The line /Root 1 0 R tells you which object is the root of the document tree (the Catalog). Its /Pages entry leads you to the top-level Pages object, which looks like this:
2 0 obj
<< /Kids [ 3 0 R ]
/Type /Pages
/Count 1
>>
endobj
The Kids element here contains a reference to the first Page object which looks like this:
3 0 obj
<< /Contents [ 4 0 R ]
/MediaBox [ 0.0 0.0 612.0 792.0 ]
/Type /Page
/Parent 2 0 R
>>
endobj
Then you'll need to find the Contents object referenced here. A Content stream, if it's not encrypted or compressed, will show you the drawing commands and text commands being drawn to the page.
5 0 obj
<<
/Length 15660
>>
stream
BT F1 10.0 Tf 30.0 750.0 Td (<< ) Tj ET BT F1 10.0 Tf 50.0 738.0 Td (/)
Tj ET BT F1 10.0 Tf 56.0586 738.0 Td (astring) Tj ET BT F1 10.0 Tf 86.7852
738.0 Td ( ) Tj ET BT F1 10.0 Tf 89.2852 738.0 Td (\() Tj ET BT F1 10.0 Tf
92.6133 738.0 Td (this string data) Tj ET
[...lots more commands follow...]
endstream
endobj
Text commands will always be bracketed by BT ... ET. In here, you can finally see the strings wrapped in parens. But you'll have to pay attention to the coordinates 30.0 750.0 Td of each string to figure out which ones are part of the same logical line.
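To give a feel for the last step, here is a small sketch that pulls the parenthesized strings out of an uncompressed content stream that is already in memory. It is a simplification, not a real PDF text extractor: it ignores the Td coordinates, treats every (...) as text, and keeps the character after a backslash as-is instead of decoding \n or octal escapes. The function name extract_literal_strings is invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies the contents of every (...) literal string
   in `stream` into `out`, in order of appearance. Returns the number of
   characters written, not counting the terminating NUL. */
size_t extract_literal_strings(const char *stream, char *out, size_t outsz)
{
    size_t len = 0;
    int depth = 0;              /* nesting level of unescaped parentheses */
    const char *p;

    assert(stream != NULL && out != NULL && outsz > 0);
    for (p = stream; *p != '\0'; p++) {
        char c = *p;
        if (depth == 0) {
            if (c == '(')
                depth = 1;      /* a literal string starts here */
            continue;
        }
        if (c == '\\' && p[1] != '\0') {
            c = *++p;           /* simplification: keep the escaped char */
        } else if (c == '(') {
            depth++;            /* balanced nested paren, part of the text */
        } else if (c == ')') {
            if (--depth == 0)
                continue;       /* closing paren of the string itself */
        }
        if (len + 1 < outsz)
            out[len++] = c;     /* silently truncates if out is too small */
    }
    out[len] = '\0';
    return len;
}
```

Fed the stream shown above, this would collect the pieces "<< ", "/", "astring", " ", "(" and "this string data" in that order. Deciding where spaces and line breaks belong still requires looking at the coordinates.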
If the PDF was created from a word processor, it is likely to contain text in this form but with lots of caveats. It might have re-encoded fonts and the text strings will no longer represent ASCII characters but just positions in the font's encoding vector. If the PDF was created from a scanned document, it may just contain images of the pages with no text content at all unless it has gone through a conversion involving OCR.


Gnuplot: Iterate over folders

I have different folders with datasets called e.g.
3-1-1
3-1-2
3-2-1
3-2-2
The first placeholder is fixed; the second and third are elements of a list:
k1values = "1 2"
k2values = "1 2"
I want to do simple operations in my Gnuplot script, e.g. cd into the above directories and read a line of a text file. It should cd into the folder, read a file, cd back out again, and so on.
My first idea was to combine the system command with sprintf:
do for[i=1:words(k1values)]{
do for[j=1:words(k2values)]{
system sprintf("cd 3-%d-%d", i, j)
system 'pwd'
system 'cd ..'
}
}
With that, the same path is printed every time, so no cd is happening at all.
I also tried system 'cd sprintf("3-%d-%d", i, j)'
Unfortunately, this does not work either.
Error message: sh: 1: Syntax error: "(" unexpected
I also tried concatenating the values into a string and passing it as the path. This doesn't work either:
k1values = "1 2"
k2values = "1 2"
string1 = '3'
do for[i=1:words(k1values)]{
do for[j=1:words(k2values)]{
path = sprintf("%s-%d-%d", string1, i, j)
system sprintf("cd %s", path)
system 'pwd'
system 'cd ..'
}
}
I print the path for testing, but the working directory is not changed at all.
Thanks in advance!
Edit: The idea, in pseudocode, is like this:
do for k1
do for k2
valueX = <readingCommand>
make dir "3-k1-k2/Pictures"
for int i = 0; i<valueX; i++
set output bla
plot "3-k1-k2/Data/i.txt" <options>
end for
end do for
end do for
Unless there is a reason we don't know about yet, why do you want to change back and forth into the subdirectories? Each system call runs in its own subshell, so a cd issued there is gone as soon as the call returns.
Why not build your path/filename with a function, then load the desired file and plot the desired lines?
For example, if you have the following directory structure:
CurrentFolder
3-1-1
Data.dat
3-1-2
Data.dat
3-2-1
Data.dat
3-2-2
Data.dat
and the following files:
3-1-1/Data.dat
1 1.14
2 1.15
3 1.12
4 1.11
5 1.13
3-1-2/Data.dat
1 1.24
2 1.25
3 1.22
4 1.21
5 1.23
3-2-1/Data.dat
1 2.14
2 2.15
3 2.12
4 2.11
5 2.13
3-2-2/Data.dat
1 2.24
2 2.25
3 2.22
4 2.21
5 2.23
The following example loads all the files Data.dat from the corresponding subdirectories and plots the lines 2 to 4 (the lines have 0-based index, check help every).
Script:
### plot specific lines from files from different directories
reset session
k1values = "1 2"
k2values = "1 2"
string1 = '3'
myPath(i,j) = sprintf("%s-%s-%s",string1,word(k1values,i),word(k2values,j))
myFile(i,j) = sprintf("%s/%s",myPath(i,j),"Data.dat")
set key out
plot for [i=1:words(k1values)] for[j=1:words(k2values)] myFile(i,j) \
u 1:2 every ::1::3 w lp pt 7 ti myPath(i,j)
### end of script
Result: [plot image not included]
This is my final solution:
k1values = '0.5 1'
k2values = '0.5 1'
omega = 3
do for[i in k1values]{
do for[j in k2values]{
savingPoint = system('head -n 1 "3-'.i.'-'.j.'/<fileName>.dat" | tail -1')
number = savingPoint/<value>
do for[m = savingPoint:0:-<value>]{
set title <...>
set output <...>
plot ''.omega.'-'.i.'-'.j.'/Data/'.m.'.txt' <...>
}
}
}
<...> is a placeholder for irrelevant details.
So this is how I finally iterate over the folders.
Within the second for loop, a read command is executed and its result is assigned to a variable, which is needed in the third for loop. Note that i and j are strings here, but that does not matter.

How to use VTK with C language?

I've already installed CMake, but I haven't understood how to use the Visualization Toolkit! I have written a .dat file with C and I want to make a .vtk file.
A .vtk structured grid looks like this :
# vtk DataFile Version 2.0
Really cool data
ASCII
DATASET STRUCTURED_GRID
DIMENSIONS 2 2 1
POINTS 4 float
0 0 0
0 0 2
0 1 0
0 1 1
POINT_DATA 4
SCALARS volume_scalars char 1
LOOKUP_TABLE default
1 2 3 4
Using the toolkit is not strictly necessary to write such a file. You can try it if you are working with C++: http://www.vtk.org/Wiki/VTK/Examples/Cxx/StructuredGrid/StructuredGrid
Or you can use fopen(), fprintf() and fclose() from stdio.h.
Combine things like:
#include <stdio.h>
...
FILE* f = fopen("bla.vtk","w");
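if(f==NULL){printf("file vtk, failed to open\n"); return 1;} /* stop here: never write to a NULL stream */
fprintf(f, "# vtk DataFile Version 2.0\n"); /* each header line must end with a newline */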
...
fprintf(f,"%f %f %f\n",x,y,z);
...
fclose(f);
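Putting those pieces together, a complete writer could look like the sketch below. The grid layout (a flat nx-by-ny sheet at z = 0), the scalar values, the title line, and the function name write_vtk_grid are all placeholders chosen for this example; substitute the coordinates and data from your own .dat file.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes an nx-by-ny planar grid (z = 0) with one
   scalar per point in legacy ASCII VTK format. Returns 0 on success,
   -1 if the file cannot be opened or written. */
int write_vtk_grid(const char *path, int nx, int ny)
{
    FILE *f;
    int i, j;

    assert(path != NULL && nx > 0 && ny > 0);
    f = fopen(path, "w");
    if (f == NULL)
        return -1;

    fprintf(f, "# vtk DataFile Version 2.0\n");
    fprintf(f, "Example grid\n");                  /* free-form title line */
    fprintf(f, "ASCII\n");
    fprintf(f, "DATASET STRUCTURED_GRID\n");
    fprintf(f, "DIMENSIONS %d %d 1\n", nx, ny);
    fprintf(f, "POINTS %d float\n", nx * ny);
    for (j = 0; j < ny; j++)                       /* x index varies fastest */
        for (i = 0; i < nx; i++)
            fprintf(f, "%f %f %f\n", (double)i, (double)j, 0.0);

    fprintf(f, "POINT_DATA %d\n", nx * ny);
    fprintf(f, "SCALARS value float 1\n");
    fprintf(f, "LOOKUP_TABLE default\n");
    for (j = 0; j < ny; j++)
        for (i = 0; i < nx; i++)
            fprintf(f, "%f\n", (double)(i + j));   /* placeholder data */

    if (ferror(f)) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

Calling write_vtk_grid("bla.vtk", 2, 2) should produce a file with the same structure as the example at the top of this answer, which a viewer such as ParaView can open.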
Good luck !

How to read a file in MATLAB?

I have a txt file whose content is rows of numbers.
Each row has 5 floating-point numbers, separated by commas.
Example:
1.1 , 12 , 1.42562, 3.5 , 2.2
2.1 , 3.3 , 3 , 3.333, 6.75
How can I read the file content into a matrix in MATLAB?
So far I have this:
fid = fopen('file.txt');
comma = char(',');
A = fscanf(fid, ['%f', comma]);
fclose(fid);
The problem is that it only gives me the first line, and when I
try to display the content of A I get this: 1.0e+004 * some number
Can someone help me please?
I guess that I need to read the file in a loop, but I don't know how.
Edit: One more question: when I output A I get this:
A =
1.0e+004 *
4.8631 0 0 0 0.0001
4.8638 -0.0000 -0.0000 0.0004 0.0114
4.8647 -0.0000 -0.0000 0.0008 0.0109
I want the matrix to hold the same values as the file. How can I make the numbers display as regular floats instead of in this format? Or are the numbers in the matrix actually floats, and only the output is displayed like this?
MATLAB's built-in dlmread function would be a much easier solution for what you want to accomplish.
A = dlmread('filename.txt',',') % call dlmread and specify a comma as the delimiter
Try using the importdata function:
A = importdata('filename.txt');
It should solve your problem.
EDIT
Alternative 1)
A = dlmread('test_so.txt',',');
The answer is surprisingly simple:
fid = fopen('depthMap.txt');
A = fscanf(fid, '%f');
fclose(fid);

How are the control points connected in a NURBS surface?

I'm trying to learn how to deal with NURBS surfaces for a project. Basically I want to build a geometry with NURBS in some 3D program, export the geometry, and run some simulations with it. I have figured out NURBS curves, and I think I mostly understand how surfaces work, but what I don't get is how the control points are connected. Apparently you don't need a topology matrix as with polygons? When I export NURBS surfaces from Maya in the .ma file format, which is a plain text file, I can see the knot vectors and then just a list of points. No topology information. How does this work? How can you reconstruct the NURBS surface without knowing how the points are connected to each other? The exported file is shown below:
//Maya ASCII 2013 scene
//Name: test4.ma
//Last modified: Sat, Jan 26, 2013 07:21:36 PM
//Codeset: UTF-8
requires maya "2013";
requires "stereoCamera" "10.0";
currentUnit -l centimeter -a degree -t film;
fileInfo "application" "maya";
fileInfo "product" "Maya 2013";
fileInfo "version" "2013 x64";
fileInfo "cutIdentifier" "201207040330-835994";
fileInfo "osv" "Mac OS X 10.8.2";
fileInfo "license" "student";
createNode transform -n "loftedSurface1";
setAttr ".t" -type "double3" -0.68884794895562784 0 -3.8172687581953233 ;
createNode nurbsSurface -n "loftedSurfaceShape1" -p "loftedSurface1";
setAttr -k off ".v";
setAttr ".vir" yes;
setAttr ".vif" yes;
setAttr ".covm[0]" 0 1 1;
setAttr ".cdvm[0]" 0 1 1;
setAttr ".dvu" 0;
setAttr ".dvv" 0;
setAttr ".cpr" 4;
setAttr ".cps" 4;
setAttr ".cc" -type "nurbsSurface"
3 3 0 0 no
8 0 0 0 1 2 3 3 3
11 0 0 0 1 2 3 4 5 6 6 6
54
0.032814107781307778 -0.01084889661073064 -2.5450696958149557
0.032814107781308312 -0.010848896610730773 -1.6967131305433036
0.032824475105651972 -0.010848896610730714 -0.0016892641735144487
0.032777822146102309 -0.01084889661073018 2.5509821204222565
0.032948882997777158 -0.010848896610730326 5.3256822304677218
0.032311292550627417 -0.010848896610730283 7.5033561343333179
0.034690593487551526 -0.010848896610730296 11.39484483093603
0.014785648001686571 -0.010848896610730293 11.972583607988943
-0.00012526283089935193 -0.010848896610730293 12.513351622510489
0.87607723187763198 -0.023973071493875439 -2.5450696958149557
0.87607723187766595 -0.023973071493876091 -1.6967131305433036
0.87636198619878247 -0.023973071493875821 0.00026157734839016289
0.87508059175355446 -0.023973071493873142 2.5441541750955903
0.87977903805225144 -0.023973071493873861 5.3510431702524812
0.86226664730269065 -0.02397307149387367 7.4087403205209448
0.9276177640022375 -0.023973071493873725 11.747947146400762
0.39164345444212556 -0.023973071493873704 12.72679599298271
-0.003344290659457324 -0.023973071493873708 13.356608602511475
2.7585407036097025 0.080696275184513055 -2.5450696958149557
2.7979735813230628 0.036005680442686323 -1.6988092981025378
2.7828331201271896 0.05438167150027777 0.0049374879309111996
2.6143679292284574 0.23983328019207673 2.5309327393956176
2.67593270347135 0.19013709747074492 5.3992530024698517
2.5981387973985108 0.20347021966427298 7.2291224273514345
2.8477496474469728 0.19983391361149261 12.418208886861429
1.1034136098865515 0.20064198162322153 14.474560637904968
-0.010126299867110311 0.20064198162322155 15.133224682698101
4.5214126649737496 0.45953483463333544 -2.5450696958149557
4.6561826938778452 0.23941045408996731 -1.7369291398229287
4.6267725925384751 0.29043329565744253 0.025561242784985394
3.9504978751410711 1.3815767918640129 2.5159293599869446
4.1596851721552888 1.0891788615080038 5.438642765250469
3.9992107014958198 1.1676270867254697 7.0865667556376426
4.4319212871194775 1.1462321162116154 12.949041810935984
1.6384310220676352 1.1509865541035829 15.927795222282771
-0.015643773215464073 1.1509865541035829 16.578582772395933
5.2193823159440154 3.0233786192453191 -2.5450696958149557
5.2193823159440162 3.0233786192453196 -1.6967131305433036
5.2218229691816047 3.0233786192453191 0.0091618497226043649
5.2108400296124504 3.0233786192453196 2.5130032217858407
5.251110808032692 3.0233786192453191 5.4667467111172652
5.1010106339208772 3.0233786192453191 6.9770771103715621
5.6611405519478906 3.0233786192453205 13.358896446133507
2.0430537629341199 3.0233786192453183 17.059047057656215
-0.019924192630756767 3.0233786192453191 17.6998820408444
5.1365144716134976 5.4897102753589557 -2.5450696958149557
5.1365144716134994 5.4897102753589566 -1.6967131305433036
5.1389093836131625 5.4897102753589566 0.0089946049919694682
5.1281322796146718 5.4897102753589566 2.5135885783430627
5.1676483276091361 5.4897102753589548 5.4645725296190131
5.0203612396297714 5.4897102753589566 6.9851884798073476
5.5699935435527692 5.4897102753589566 13.328625149888618
2.0133428487217855 5.4897102753589557 16.975388787391935
-0.01960785732642523 5.4897102753589557 17.617014800296868
;
select -ne :time1;
setAttr ".o" 1;
setAttr ".unw" 1;
select -ne :renderPartition;
setAttr -s 2 ".st";
select -ne :initialShadingGroup;
setAttr ".ro" yes;
select -ne :initialParticleSE;
setAttr ".ro" yes;
select -ne :defaultShaderList1;
setAttr -s 2 ".s";
select -ne :postProcessList1;
setAttr -s 2 ".p";
select -ne :defaultRenderingList1;
select -ne :renderGlobalsList1;
select -ne :hardwareRenderGlobals;
setAttr ".ctrs" 256;
setAttr ".btrs" 512;
select -ne :defaultHardwareRenderGlobals;
setAttr ".fn" -type "string" "im";
setAttr ".res" -type "string" "ntsc_4d 646 485 1.333";
select -ne :ikSystem;
setAttr -s 4 ".sol";
connectAttr "loftedSurfaceShape1.iog" ":initialShadingGroup.dsm" -na;
// End of test4.ma
A NURBS surface is always topologically square, with degree+spans points in the u direction and (degree-1)+spans+1* points in the v direction. (A single NURBS surface is like one face of a polygon mesh, only more complicated.)
The first two attributes in ".cc" are the degrees in each direction, and the next two lines define the knots; each interval between distinct values represents a span. Duplicates are knot multiplicities, so the value is simply repeated x times. So:
8 0 0 0 1 2 3 3 3
means there are 8 knots (in this case in the U direction) running over 0 1 2 3, i.e. 3 spans, for a total of 6 points, so it's a three-span curve of third degree in the U direction. The example has 9 points in the V direction, thus 6*9 = 54 points in total.
This is not enough, however, for NURBS to be even remotely useful. You must implement trim curves, which are curves that lie on the UV parametrization of the surface and can clip an individual NURBS surface to a different shape.
In practice, however, Maya users rely on manual quilting. Quilts** are the higher-order NURBS equivalent of a mesh, a concept that most NURBS modelers use. To handle these, even trim curves are often not enough, because trim curves cannot be reliably transported between applications without sewing. Thus many applications rely on being told explicitly how the surfaces in a quilt are topologically connected to each other. So be prepared to write your own intersection algorithms, etc., for any meaningful NURBS compatibility.
For more on the mathematical underpinnings, see Wikipedia, Wolfram, etc.
* If I remember correctly; something like that.
** Quilts have different names in different applications due to simultaneous discovery in several different language areas.
NURBS surfaces' CVs are always laid out in a grid. The number of CVs in a nurbs surface can be computed using the degree of the surface and the number of knots in each direction. Then the CVs are just presented in some specific order, typically row-major.
Let's look at your example. I'm mostly just guessing the format, so you'll want to check my assumptions.
3 3 0 0 no
It looks like you have a bicubic surface. It's not periodic in either direction (that is, you have a sheet rather than a cylinder or torus). Your CVs are non-rational, meaning they're [x,y,z] instead of [xw,yw,zw,w].
In other words, the format of that first line appears to be:
[degree in s] [degree in t] [periodic in s] [periodic in t] [rational]
Next up, one knot vector has 8 knot values, and the other has 11. For a degree 3 non-periodic nurbs, the number of CVs is num_knots - 2. So, you have 6 x 9 CVs in this surface.
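Judging by the coordinates in your listing (the first 9 points share nearly the same x and y, and only z changes), the index along the 9-CV direction varies fastest: the first 9 CVs form the first row, the next 9 form the next row, and so on, for 6 rows in total.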
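The bookkeeping can be sketched in a few lines of C. Two assumptions are baked in and should be checked against the Maya documentation: that Maya stores num_cvs + degree - 1 knots per direction (which matches 8 knots giving 6 CVs and 11 knots giving 9 CVs here), and that the v index varies fastest in the point list, as the coordinates in your file suggest. The function names are made up for this example.

```c
#include <assert.h>

/* Number of CVs in one direction of a non-periodic surface, assuming
   Maya's convention num_knots = num_cvs + degree - 1. */
int cv_count(int num_knots, int degree)
{
    assert(degree >= 1 && num_knots >= 2 * degree);
    return num_knots - degree + 1;
}

/* Position of CV (iu, iv) in the flat point list, assuming the v index
   varies fastest (an assumption inferred from the listing). */
int cv_index(int iu, int iv, int num_cvs_v)
{
    assert(iu >= 0 && iv >= 0 && iv < num_cvs_v);
    return iu * num_cvs_v + iv;
}
```

With these, the neighbours of CV (iu, iv) in the control net are simply (iu±1, iv) and (iu, iv±1), which is why no explicit topology table is needed.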
If you're looking for more information on NURBS, I'd recommend this text for theory. For Maya-specific details, there is some decent documentation in the Maya API.

How do I read and parse a .dat file in C?

I have a file called resistors.dat and I need to get my program to read and parse the values from the file into my program.
How would I read a file like this in C?
Read from the file resistors.dat (supplied on Blackboard) similarly to what you have done in Problem 2 of Lab 12. Each line in resistors.dat now represents one row: Ria, Rib and Ric (i = 1, 2, ..., n) of the circuit. Expand Problem 2 of Lab 12 to calculate the total resistance of the circuit. Hint: The total resistance is given by 1/R = 1/R1 + 1/R2 + 1/R3 + ... + 1/Rn, where Ri is the sum of resistances in one input row. In a loop, compute the sum of the inverse resistances 1/Ri. After the input has finished, compute the inverse of this sum to obtain the final result.
This is the content of resistors.dat:
64.35 35.52 85.37
90.43 12.99 80.40
98.37 32.63 78.42
3.82 82.74 52.61
3.75 72.47 49.05
96.73 16.07 23.46
48.15 36.62 83.64
51.96 27.19 22.38
4.18 46.07 91.21
96.94 8.17 50.45
0
There are several ways to accomplish this. I expect that your resistors.dat file looks something like this:
r=1
r=20
r=22
r=2
I suggest you do something like this:
Use fopen to open the file, then call fgets in a while loop until it returns NULL (end of file) to read each line. Then use sscanf to parse each line.
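Applied to the three-column layout shown in the question (rather than the r=... layout guessed above), the fgets/sscanf approach could be sketched as follows. The function name total_resistance is invented for this example, and treating the final 0 line as an end marker is an assumption about the assignment.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads rows "Ria Rib Ric" until end of file or a
   row that does not hold three numbers (such as the final "0").
   Returns the total resistance, or 0.0 if no valid row was read. */
double total_resistance(FILE *in)
{
    char line[256];
    double a, b, c;
    double inverse_sum = 0.0;

    assert(in != NULL);
    while (fgets(line, sizeof line, in) != NULL) {
        if (sscanf(line, "%lf %lf %lf", &a, &b, &c) != 3)
            break;                      /* end marker or malformed line */
        if (a + b + c <= 0.0)
            break;                      /* avoid dividing by zero */
        inverse_sum += 1.0 / (a + b + c);   /* 1/Ri for this row */
    }
    return inverse_sum > 0.0 ? 1.0 / inverse_sum : 0.0;
}
```

In main you would open the file with fopen("resistors.dat", "r"), check the result for NULL, pass the stream to this function, print the value with printf, and close the file with fclose.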
