The project I’m working on requires a scanned file to be associated with a record in the database. The association is required so that scanned documents can be attached to an invoice when it is sent out by email. To meet this requirement, I have created a small QR code that is printed on a sticky label. The QR code contains the ID of the associated database record. I have asked the admin staff to stick a QR code label on each paper document that they scan. The scans are saved as JPG files.
I then have a cron job that runs every few minutes, looks at the JPG files in the scan folder and, if it finds a QR code, renames the file with the ID of the record. E.g. file1.jpg becomes DB-12345-file1.jpg, where 12345 is the ID of the database record. The file is then moved to a different folder.
The cron job runs the following command on each file it finds:
zbarimg -Sdisable -Sqrcode.enable --raw
zbarimg is software that can locate QR codes embedded in JPG files.
This is all being tested at the moment and seems to work most of the time. However, the company now wants to scan the files in PDF format, which does not work with zbarimg. I have tried converting the PDFs to JPG, but the quality of the QR code is lost and zbarimg fails most of the time.
Is there another automated way of linking scanned PDF files to a database record?
I have already written some code using pdf2text and saving the data into the database but this does not provide an indexed association between the PDF files and the records. It just provides a nice way to search for text in a PDF file.
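A common workaround is to rasterize the PDF at a higher DPI before handing it to zbarimg, since low-resolution conversion is usually what destroys the QR code. The sketch below assumes pdftoppm (from poppler-utils) and zbarimg are on the PATH; the helper names are illustrative, and the renaming follows the DB-12345- scheme described above.

```python
import re
import subprocess
from pathlib import Path

def renamed_for(qr_text: str, filename: str) -> str:
    """Build the new filename from the decoded QR text,
    e.g. '12345' + 'file1.jpg' -> 'DB-12345-file1.jpg'."""
    record_id = re.sub(r"\D", "", qr_text)  # keep digits only
    return f"DB-{record_id}-{filename}"

def decode_pdf_qr(pdf_path: Path) -> str:
    """Rasterize the PDF at 300 DPI with pdftoppm, then try zbarimg on
    each page image. Returns the raw QR text, or '' if nothing decoded.
    Assumes both external tools are installed."""
    png_prefix = pdf_path.with_suffix("")  # pdftoppm appends -1.png, -2.png, ...
    subprocess.run(
        ["pdftoppm", "-r", "300", "-png", str(pdf_path), str(png_prefix)],
        check=True)
    for page in sorted(pdf_path.parent.glob(pdf_path.stem + "*.png")):
        result = subprocess.run(
            ["zbarimg", "-Sdisable", "-Sqrcode.enable", "--raw", str(page)],
            capture_output=True, text=True)
        if result.returncode == 0 and result.stdout.strip():
            return result.stdout.strip()
    return ""
```

The pure renaming logic is separate from the tool invocations so it can be tested without a scanner or sample PDFs.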
I am dealing with a problem, and before starting development I decided to do some research.
The problem is: what would be an efficient solution for storing files?
I read about it, and some people were against storing files within the database, as it has a negative impact on backups/restores and adds processing time when reading large files from the database, etc.
A good option would be to use S3 or another cloud solution to store the files, but for this particular customer the cloud won't be acceptable.
Another option would be to store the files in the file system. The concept is clear, but I am trying to work out what I need to understand before implementing that solution.
For example, we need to consider how to structure the directories: storing 100,000 files in one directory can become slow to open, and there is also a maximum number of files that can be stored in one directory.
Are there any 3rd-party tools that help manage files in the file system, automatically partitioning them into directories?
I work with software that has more than 10 million files in the file system. How you structure the folders depends on your needs, but what I did was:
Create a folder for each entity (Document, Image...)
Inside each folder, create a folder for each object, with the ID as the folder name, and put the object's files inside (this could vary).
Example for a Person with ID 15:
ls /storage/Person/15/Image/
will list the 4 images that, in the database, I linked to the person with ID 15:
Output:
1.jpg
2.png
3.bmp
4.bmp
If you have a HUGE number of elements, you could separate each digit of the ID into a subfolder; that is, the Person with ID 16549 would have this path: /storage/Person/1/6/5/4/9/Image/
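The digit-per-level scheme above is easy to express in code. This is a minimal sketch; the root, entity, and kind names are just illustrative placeholders.

```python
from pathlib import PurePosixPath

def storage_path(entity: str, object_id: int, kind: str,
                 root: str = "/storage") -> str:
    """Shard an object's storage path by splitting each digit of its ID
    into its own subfolder, e.g. 16549 -> 1/6/5/4/9."""
    digits = list(str(object_id))  # 16549 -> ['1', '6', '5', '4', '9']
    return str(PurePosixPath(root, entity, *digits, kind))
```

With five digits this caps each level at 10 subdirectories, so no single directory ever grows unboundedly.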
About limits of files in folder I suggest you to read this thread: https://stackoverflow.com/a/466596/12914069
About a 3rd-party tool, I don't know of one, but you could certainly build this logic into your own code.
English isn't my first language, tell me if you didn't understand something.
I am relatively new to AWS Glue. After creating my crawler and running it successfully, I can see that a new table has been created, but I can't see any columns in that table. It is absolutely blank.
I am using a .csv file from an S3 bucket as my data source.
Is your file UTF-8 encoded? Glue has a problem if it's not.
Does your file have at least 2 records?
Does the file have more than one column?
There are various factors that can prevent the crawler from identifying a CSV file.
Please refer to this documentation, which covers the built-in classifier and what it needs to crawl a CSV file properly:
https://docs.aws.amazon.com/glue/latest/dg/add-classifier.html
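The checks listed above can be pre-flighted locally before re-running the crawler. This is a rough sketch, not a reimplementation of Glue's classifier: it only confirms the file is valid UTF-8, has more than one column, and has at least two rows (a header plus one data record).

```python
import csv

def csv_looks_crawlable(path: str) -> bool:
    """Rough sanity check mirroring the common Glue CSV classifier
    requirements: UTF-8 encoding, >1 column, and >=2 rows."""
    try:
        with open(path, encoding="utf-8", errors="strict", newline="") as f:
            rows = list(csv.reader(f))
    except UnicodeDecodeError:
        return False  # not UTF-8: Glue may fail to classify it
    return len(rows) >= 2 and all(len(r) > 1 for r in rows[:2])
```

If this returns False, fix the file (re-encode to UTF-8, add a data row, or check the delimiter) before blaming the crawler configuration.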
I get 15+ PDFs a day that I have to enter into a database. They are generated from a table where the "blanks" are filled in from specific table fields. Are there any tools or Python code examples I could use to develop a means of extracting the data from the PDFs, to either write to the database or create a table to import into it? The database is currently an Access .mdb.
Thanks
There are a number of approaches that will work.
One simple approach is to print the PDF file to a text file and then have Access import that text. All recent versions of Windows allow you to install a "text" printer that writes the printed output of a document to a text file. You can have Access "process" a folder of PDFs, print them to text, and then import those text files. You might need some VBA to remove page breaks and some extra lines before you import the data into Access.
Another approach is to use Word (automated from Access) to open a PDF. When Word opens a PDF, it converts it to a Word document. This approach will even format rows as a Word table. You can then pluck out that table data and send it to Access. You can likely pull the text out without writing it to a text file, or just use Word's "Save As" to a text file (you can automate this process from Access).
Another approach is to use the free Ghostscript library, which can extract text from a PDF (I would consider this if you did not have Word at your disposal).
So which solution is best will depend largely on the software installed on the computer running Access. Opening the PDF files with Word would be my first choice to test.
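Whichever extraction route is used, the text still has to be turned into records before Access can import it. This is a hedged sketch of that parsing step: it assumes each filled-in blank comes out as a "Label: value" line, which you would adjust to match what the text printer or Word actually produces.

```python
def parse_filled_blanks(text: str) -> dict:
    """Turn 'Label: value' lines (one per filled-in blank) into a record
    dict suitable for writing to a database table. The 'Label: value'
    layout is an assumption about the extracted text, not a given."""
    record = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            record[key.strip()] = value.strip()
    return record
```

Keeping the parsing in a small pure function makes it easy to test against a few sample PDFs' extracted text before wiring it to the database.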
At my old job we used Cogniview, which converted PDFs to Excel spreadsheets quite quickly. If you want to use Python, a quick search turned up this, which seems straightforward enough: PDF to XLS with Python
Is there a way to send mail with a different PDF file to each contact using FileMaker?
I am aware of sending batch emails with one attachment, but I would like to send a personalized PDF to each contact, which seems not so simple.
Also
Can I add the PDF files to the table itself, or would I have to use the path to each file?
Example:
Table 1

| Name | Email | PDFfileAttachment |
| --- | --- | --- |
| James Brown | brown.j#gmail.com | folder/PDF/JamesBrown.pdf |
| James Blue | blue.j#gmail.com | folder/PDF/JamesBlue.pdf |
So an Email for James Brown would look like:
Dear James Brown, please see the attached file.
Attachment [JamesBrown.pdf] {actual file}
and
Dear James Blue, please see the attached file.
Attachment [JamesBlue.pdf] {actual file}
I think you can solve it by creating a container field in your database and importing the PDFs into it.
Then you can use Export Field Contents [] to export each one and send it by email.
Hope this is useful.
> I would like to send a personalize PDF for each contact which seems not so simple.
Find the records of contacts you want to include and loop among them, sending mail to each one individually (i.e. without selecting the 'Collect addresses across found set' option).
> Can I add PDF files to the table itself or would I have to use the path to the file?
You can do either, it's up to you. If the path to the file can be calculated (as in your example), you can calculate it right there in the Send Mail script step.
Note that you can also generate the PDF files during the process itself.
Do I understand correctly that you would actually like to personalize the PDF document(s)?
This is possible; maybe not trivial, but quite simple. The trick is to prepare the PDF as a form and then fill the form fields to personalize it.
PDF has a native forms data format (called FDF), which is described in ISO 32000 (as well as in the older PDF specification documents provided by Adobe, which you can find in the Acrobat SDK, downloadable from the Adobe website).
FDF is a simple structured text file, which can easily be assembled using FileMaker (I have done that routinely for several catalog projects). The easiest way to get going is to open the form in Acrobat, fill in the fields, and then export the data as FDF. This gives you the pattern to "fill in the blanks".
So, you create the FDF files using FileMaker. With them you can fill the blank form and feed the saved document to the email system.
Which tool to use to fill the blank form depends on the volume you have to process. Acrobat is not very powerful (and you may end up in a bit of a legal gray zone, because Acrobat is not set up for being used as a service). There are applications made specifically for filling out forms on a server (such as FDFMerge by Appligent), and there are also several libraries with tools for filling out forms (iText and pdflib come to mind). These applications also allow you to flatten the PDF, which means the form fields are removed and their contents become part of the page itself.
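Since FDF is plain structured text, assembling it is straightforward in any language, not just with FileMaker's text functions. Below is a minimal Python sketch that builds an FDF body from a dict of field names and values; it follows the FDF structure described in ISO 32000, and only handles simple text fields.

```python
def make_fdf(fields: dict) -> str:
    """Assemble a minimal FDF document (plain text) mapping form-field
    names to values. Parentheses and backslashes inside PDF string
    literals must be escaped."""
    def esc(s: str) -> str:
        return s.replace("\\", r"\\").replace("(", r"\(").replace(")", r"\)")
    entries = "".join(
        f"<< /T ({esc(k)}) /V ({esc(v)}) >>\n" for k, v in fields.items())
    return ("%FDF-1.2\n"
            "1 0 obj\n"
            f"<< /FDF << /Fields [\n{entries}] >> >>\n"
            "endobj\n"
            "trailer\n"
            "<< /Root 1 0 R >>\n"
            "%%EOF\n")
```

As the answer suggests, the easiest way to discover the right field names is to fill the form once in Acrobat and export the data as FDF, then use that as the pattern.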
The resulting file can then either be attached to an email, or you can make it available on a server and send an email with a link to the file (which method you use may depend on security and privacy regulations).
Can anyone help me build a table that lists all files in a specified folder, so that whenever a file is copied to that folder the table updates and logs the file?
I need the list to retain the names even if a file is later moved from that folder or deleted. The data would later be deleted by a scheduler.
I also need the table to record the exact time when each file was copied into the folder, not its modification or creation time.
I am using Windows 7; how can I build a system with this behaviour?
Just turn on Windows file auditing for that folder; the YouTube video takes you through the process.
Microsoft provides information on its TechNet site on how to use the LogParser tool to extract Security events from the event log.
Note: Admin questions should really be posted to the SuperUser site.
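If enabling auditing is not an option, a scheduled polling script can approximate the requirement. This is a sketch only: it records the time a file was first noticed (polling can never capture the exact copy time, which is why auditing is the better answer), and it keeps the name in the log even after the file is moved or deleted.

```python
import os
import sqlite3
import time

def log_new_files(folder: str, db_path: str) -> list:
    """One polling pass: record any file in 'folder' not yet seen,
    stamped with the time this pass noticed it. Names persist in the
    log even if the file later disappears. Returns the new names."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS file_log "
                "(name TEXT PRIMARY KEY, seen_at REAL)")
    known = {row[0] for row in con.execute("SELECT name FROM file_log")}
    new = sorted(n for n in os.listdir(folder) if n not in known)
    con.executemany("INSERT INTO file_log VALUES (?, ?)",
                    [(n, time.time()) for n in new])
    con.commit()
    con.close()
    return new
```

Run it every minute from Task Scheduler; the timestamp accuracy is then bounded by the polling interval.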