Extract a repeating list of data from a PDF using Azure Form Recognizer - azure-form-recognizer

I have a PDF with sections that are similar in structure but contain different data. The sections are not detected as tables. Each section has multiple values to be read, and I need to get all the values for all sections as an array. The number of sections varies from invoice to invoice. How do I achieve that?
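One way to picture the desired result is as a post-processing step. This is a minimal sketch, assuming the service has already returned the labels and values as a flat list in reading order. The function name `group_into_sections`, the sample labels, and the rule "a new section starts whenever the first label reappears" are all assumptions made for illustration; no Form Recognizer SDK calls are shown.

```python
from typing import Dict, List, Tuple


def group_into_sections(
    pairs: List[Tuple[str, str]], first_label: str
) -> List[Dict[str, str]]:
    """Group a flat, ordered list of (label, value) pairs into sections.

    Assumption: every section begins with the same label (first_label).
    """
    sections: List[Dict[str, str]] = []
    for label, value in pairs:
        # Start a new section when the opening label shows up again,
        # or when nothing has been collected yet.
        if label == first_label or not sections:
            sections.append({})
        sections[-1][label] = value
    return sections


# Hypothetical extracted pairs from an invoice with two repeating sections.
pairs = [
    ("Item", "Widget"), ("Qty", "2"), ("Price", "9.99"),
    ("Item", "Gadget"), ("Qty", "1"), ("Price", "24.50"),
]
sections = group_into_sections(pairs, first_label="Item")
```

Because the sections are collected in a list, the number of sections can differ from one invoice to the next without changing the code.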

Related

Using Google Sheets to split data from a column into different sheets

I work at a private international school which wastes a ton of paper. My goal is to reduce that by using Google Sheets to collect students' lunch data from teachers, rather than writing it on paper weekly, then having someone input it all manually, and then making separate sheets for each student by hand.
I want to make this more efficient by using Google Sheets.
My first sheet has all the students' data for a whole month, with their names and data in what I expect is a normal format. How can I use the Google Sheets split function to read the column containing their names and separate the rows with the same name into their own sheet?
Screenshot of preliminary data
The picture shows each student's name repeating, which will happen 4 or 5 times depending on the month. Rather than separating them manually, I imagine there is a script I can use in Google Sheets to automatically read each repeated name and separate it into its own sheet.
How do I do this?
The information you provided leaves some gaps, so I generated a sample data sheet for you to verify; I believe it represents the data you have.
There is a wide range of solutions to your problem. A common one is the FILTER formula.
Here is some information on how to use it.
Finally, here is a sample answer sheet using this formula, so you can check the full configuration of the solution.
You can also copy and paste your own data into it; if it has the same structure, the sheet will adapt dynamically. You can also make a private copy of this spreadsheet via File > Make a copy.
Other formulas you can use are QUERY or VLOOKUP.
If you need further assistance, you can contact Stack Overflow members through Stack Overflow, or you can join a social group such as this Facebook group, where we handle this kind of question.
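In Sheets, a FILTER call takes a range and a condition, for example `=FILTER(Sheet1!A:D, Sheet1!A:A = "Ana")`, where the sheet name, range, and student name are placeholders rather than values from the original spreadsheet. The grouping it performs per student can be sketched outside Sheets as follows. This is a Python illustration of the logic only, with invented sample rows and a hypothetical helper name `split_by_name`.

```python
from collections import defaultdict
from typing import Dict, List


def split_by_name(rows: List[List[str]], name_column: int = 0) -> Dict[str, List[List[str]]]:
    """Collect rows that share the same value in the name column."""
    groups: Dict[str, List[List[str]]] = defaultdict(list)
    for row in rows:
        groups[row[name_column]].append(row)
    return dict(groups)


# Invented sample data: name, week, lunch choice.
rows = [
    ["Ana", "Week 1", "Pasta"],
    ["Ben", "Week 1", "Rice"],
    ["Ana", "Week 2", "Soup"],
]
by_student = split_by_name(rows)
```

Each key of `by_student` corresponds to what would become one per-student sheet.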

fileserver vs DB query speed

I have very simple data that I need to retrieve as quickly as possible:
I have JSON data associated with a hash of an email address, so the table looks like this:
email_sha256, json
and has millions of rows.
I was wondering if one of the following two options would be faster:
1 Split the single large table into many smaller ones (split alphabetically)
2 Do not use a DB at all and serve the data as files, i.e. every email hash is the name of a separate file that contains the JSON data.
Creating a file for each user (for each email address) looks wrong in many respects:
If you need good performance, you need a small number of files per directory.
Databases were created for this; you can use an index to retrieve the information very quickly.
Without a DB, you need your own locking/synchronization mechanism.
If you are using a DB, why use JSON to store the data?
If you are looking for performance, do not serialize the data to JSON.
What do you mean by "fast"? Can you quantify the duration/delay?
The exception (maybe) is when the information associated with each user is huge (much larger than one disk sector). But even in that case, what do you mean by fast?
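The index point can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for whatever database is actually in use, and the table name `profiles`, the column name `json_data`, and the sample emails are assumptions for the example. Declaring the hash as the primary key gives the table an index, so a lookup is a keyed search rather than a scan of every row.

```python
import hashlib
import json
import sqlite3


def email_key(email: str) -> str:
    """Hash an email address the same way for inserts and lookups."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
# The primary key on email_sha256 creates the index used for lookups.
conn.execute(
    "CREATE TABLE profiles (email_sha256 TEXT PRIMARY KEY, json_data TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO profiles VALUES (?, ?)",
    [(email_key(f"user{i}@example.com"), json.dumps({"id": i})) for i in range(1000)],
)


def lookup(conn: sqlite3.Connection, email: str):
    """Return the decoded JSON for an email, or None if it is unknown."""
    row = conn.execute(
        "SELECT json_data FROM profiles WHERE email_sha256 = ?",
        (email_key(email),),
    ).fetchone()
    return json.loads(row[0]) if row else None


# Ask SQLite how it would run the lookup; the last column describes the plan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT json_data FROM profiles WHERE email_sha256 = ?",
    ("x",),
).fetchall()
```

To answer the "what do you mean by fast" question for a real workload, the same `lookup` could be timed against a realistic row count.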

How to handle numerous forms with different fields and data types in the same table?

I need to develop an application that handles more than 30 forms. Those forms have different numbers of fields with different data types. After storage, I need to do advanced search over the forms. Full-text search may be needed for fields with a specific name shared among forms. Expected data size is ~50k forms with ~500k form fields. PostgreSQL is going to be used.
The solutions that I came up with:
1. Encoding form fields into a JSON string
Problems: Performing full-text search on data with a specific field name can be cumbersome. Also, whenever I need to read or update the data in a form, I have to decode and re-encode it.
2. Creating a table with as many columns as there are inputs in a form
Problems: As the form mapping classes are going to be ready, I can actually map form fields to database fields for each of those forms. For search, I may need to write different rules for each of those forms as the mapping for fields will change radically.
3. Keeping fields in a different table with a foreign-key to the form table
Problems: Maintaining the form data is still an issue but I don't know about speed. I expect it to run at least as fast as the previous way. Generating the search query from Java/Hibernate will be a bit harder than the previous way.
As I don't have any experience in handling such a case, I need your help and suggestions.
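To make the third option concrete, here is a minimal sketch of a form table plus a field table linked by a foreign key. It runs on Python's built-in `sqlite3` purely as a stand-in for PostgreSQL, and the table names, column names, and sample forms are assumptions for illustration. The substring `LIKE` match stands in for real full-text search, which PostgreSQL provides through its own text search features.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE form (
        id        INTEGER PRIMARY KEY,
        form_type TEXT NOT NULL
    );
    CREATE TABLE form_field (
        form_id   INTEGER NOT NULL REFERENCES form(id) ON DELETE CASCADE,
        name      TEXT NOT NULL,
        data_type TEXT NOT NULL,
        value     TEXT,
        PRIMARY KEY (form_id, name)
    );
    -- Index by field name so searches on a shared field stay cheap.
    CREATE INDEX idx_form_field_name ON form_field(name);
    """
)


def save_form(conn: sqlite3.Connection, form_type: str, fields: dict) -> int:
    """Insert one form and one row per field; values are stored as text."""
    form_id = conn.execute(
        "INSERT INTO form (form_type) VALUES (?)", (form_type,)
    ).lastrowid
    conn.executemany(
        "INSERT INTO form_field VALUES (?, ?, ?, ?)",
        [(form_id, n, type(v).__name__, str(v)) for n, v in fields.items()],
    )
    return form_id


def search(conn: sqlite3.Connection, field_name: str, needle: str) -> list:
    """Find forms whose field `field_name` contains `needle`."""
    rows = conn.execute(
        "SELECT form_id FROM form_field WHERE name = ? AND value LIKE ? ORDER BY form_id",
        (field_name, f"%{needle}%"),
    )
    return [r[0] for r in rows]


a = save_form(conn, "leave_request", {"applicant": "Ada Lovelace", "days": 3})
b = save_form(conn, "expense_claim",
              {"applicant": "Alan Turing", "amount": 120.5, "note": "Lovelace conference"})
hits = search(conn, "applicant", "Lovelace")
```

The same `search` function works across all form types because the field name is data rather than a column, which is the main appeal of this layout; the cost is that every value is stored as text and typed comparisons need a cast.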

How to display all records in one page in crystal reports while using multiple detail section?

I designed a Crystal Report, shown in the image below.
The records print across eight pages. I have three detail sections: in one I inserted a subreport, and in the other two I inserted two different formats.
I have written a stored procedure, shown below.
The output is:
I have no idea how to solve this problem.
Please help me.
My actual problem is that I have three tests in my project, and I have to print them on three different pages with different formats. The first test has eight results, and I have to show all eight on one page. Is there any solution?
You have to group all the detail sections, and while grouping, select the Keep Group Together option.
That should solve your problem.

Database or file type for containing modular page contents?

I'm building a personal website and I have a problem planning my DB structure.
I have a portfolio page and each artwork has its own description page.
So I need to save contents in a file or a database.
The description page follows some guidelines,
but the length, the number, and the order of elements are free.
For example, a page may have just one paragraph or more,
and in each paragraph, many footage codes and text blocks can be mixed.
The question is:
(I made a simple diagram to describe my needs).
What data type can I choose to store this content structure while maintaining the order?
I'm used to XML and I know XML can be one of the choices, but if the content is big, it will be hard to read and slow.
I've heard that JSON is replacing XML these days, but from what I found, JSON cannot maintain the order of elements, can it?
Waiting for clever recommendations :)
Thanks.
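On the ordering question: JSON arrays are ordered sequences, while the member order of a JSON object is not guaranteed by the specification. A layout that keeps order-sensitive elements in arrays therefore survives a round trip. The sketch below uses invented field names (`paragraphs`, `blocks`, `type`, `content`) and sample content as assumptions; it is one possible shape, not a prescribed schema.

```python
import json

# Paragraphs and the blocks inside them are lists, so their order is part of the data.
page = {
    "title": "Artwork 1",
    "paragraphs": [
        {
            "blocks": [
                {"type": "text", "content": "Intro text."},
                {"type": "footage", "content": "<embed code>"},
                {"type": "text", "content": "Closing text."},
            ]
        },
    ],
}

serialized = json.dumps(page)      # what would be stored in a file or a DB column
restored = json.loads(serialized)  # what the page renderer would read back

# Walk the blocks in stored order.
block_types = [
    block["type"]
    for paragraph in restored["paragraphs"]
    for block in paragraph["blocks"]
]
```

A renderer can then loop over `paragraphs` and `blocks` in sequence and emit each element according to its `type`.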
