Manually create a multi-sheet Excel file from C

I am working on a C application. I was planning on using a CSV file to read the values into a spreadsheet, but as the data got more and more complex (around 100 columns), I saw the need to split it across multiple sheets. I am working on a single-board computer, and the file is used for storing diagnostic information. I would like to be able to write the file from the SBC in an ASCII format, then import it into Excel (or the open-source alternative) and have multiple sheets. Is this even possible, or should I start working on macros to run on the data?

Maybe consider using XML to write the data; then it can easily be transformed into whatever format you want, and you can have the data emitted in a way that makes semantic sense rather than according to the architecture of a spreadsheet. This might allow more flexibility for different external programs to interact with the data. You can use XSLT to transform it into multiple CSV files if that's desired, and there are probably reasonable ways to import it into spreadsheets directly.
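If the goal is a single ASCII file that opens straight into a multi-sheet workbook, one concrete XML dialect to consider is the XML Spreadsheet 2003 format, which Excel and (as far as I know) LibreOffice Calc open directly and which allows any number of <Worksheet> elements per file. Below is a minimal sketch of emitting it from C; the file name, sheet names, and diagnostic values are invented for illustration, and real data would be looped out of your own structures.

    /* Minimal sketch: write an XML Spreadsheet 2003 file with two sheets.
     * This is plain ASCII, so it can be produced on the SBC with nothing
     * but fprintf. File name, sheet names, and values are placeholders. */
    #include <stdio.h>

    static void write_sheet(FILE *f, const char *name,
                            const double *vals, int nvals)
    {
        fprintf(f, " <Worksheet ss:Name=\"%s\">\n  <Table>\n   <Row>\n", name);
        for (int i = 0; i < nvals; i++)
            fprintf(f, "    <Cell><Data ss:Type=\"Number\">%g</Data></Cell>\n",
                    vals[i]);
        fprintf(f, "   </Row>\n  </Table>\n </Worksheet>\n");
    }

    int main(void)
    {
        FILE *f = fopen("diagnostics.xml", "w");
        if (!f) return 1;

        fprintf(f,
            "<?xml version=\"1.0\"?>\n"
            "<?mso-application progid=\"Excel.Sheet\"?>\n"
            "<Workbook xmlns=\"urn:schemas-microsoft-com:office:spreadsheet\"\n"
            "          xmlns:ss=\"urn:schemas-microsoft-com:office:spreadsheet\">\n");

        double voltages[] = { 3.31, 3.29, 3.30 };
        double temps[]    = { 41.5, 42.0, 40.9 };
        write_sheet(f, "Voltages", voltages, 3);
        write_sheet(f, "Temperatures", temps, 3);

        fprintf(f, "</Workbook>\n");
        fclose(f);
        return 0;
    }

Opening diagnostics.xml from Excel or LibreOffice Calc should then give a workbook with a "Voltages" sheet and a "Temperatures" sheet, no macros required.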

Related

Why is there no program-data independence in traditional file processing?

"In traditional file processing, the structure of data files is embedded in the application programs, so any changes to the structure of a file may require changing all programs that access that file. By contrast, DBMS access programs do not require such changes in most cases. The structure of data files is stored in the DBMS catalog separately from the access programs. We call this property program-data independence."
The preceding text is taken from the book Fundamentals of Database Systems. I didn't get the part about traditional file processing. Can somebody please explain (an example would be appreciated)?
I'll give you a simple example.
Microsoft Excel used to save its files in a proprietary binary format. In practical terms, this meant that you could only work on those files using Excel.
But now Excel also supports an open, XML-based document format that is text-based and allows other programs, like the OpenOffice SDK, to interact with those files. So you no longer need to rely on Excel to work with Excel files.
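To make the book's point concrete, here is a hypothetical C sketch of "traditional file processing": the record layout is hard-coded into the program, so adding even one field to the file forces every program that reads it to be changed and recompiled.

    /* Hypothetical example of program-data dependence: the layout of
     * customers.dat is baked into this (and every other) reader program. */
    #include <stdio.h>

    struct customer {
        int  id;
        char name[40];
        /* char email[60];   adding this field changes the record size and
         *                   breaks every program still using the old struct */
    };

    int main(void)
    {
        FILE *f = fopen("customers.dat", "rb");
        if (!f) return 1;

        struct customer c;
        while (fread(&c, sizeof c, 1, f) == 1)
            printf("%d %.40s\n", c.id, c.name);

        fclose(f);
        return 0;
    }

With a DBMS, by contrast, the schema lives in the catalog, so a query such as SELECT id, name FROM customers keeps working unchanged after a new column is added.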

Is it possible to manipulate PDF files in Visual Basic without an external library/SDK?

I am looking at how to implement PDF merging with raw VB code so that the code may be invoked by a bot for business process automation.
The software used to create the bot provides a function to invoke VB code, but I don't believe it can access any externally imported libraries because it expects plain source, so I essentially need to produce code that one could run in a VB shell environment without anything fancy (or convenient, it seems).
All the research I've done so far points me in the direction of external packages I would need to install, such as iText; this is what I'm looking to avoid.
(previous iText employee here)
PDF is not an easy (binary) format.
Essentially, blobs of information (text that has to be rendered, fonts, images, vector graphics, etc.) are compressed and gathered into objects.
Each object gets a number. Objects are allowed to reference each other (a piece of text might say 'I want to be rendered with font 4433').
All object numbers and their byte offsets in the file are gathered in the cross-reference (often called XREF) table.
A PDF includes a 'Pages' dictionary object that tells the viewer which objects belong on which page.
In order to merge PDF files, you would need to:
- read all XREF tables of all files
- adjust all of those offsets so they point to the correct byte positions in the merged file
- update various dictionary objects within the PDF file that tell it where all the objects per page are kept
This is by no means a trivial task, but it can be done using only VB.
If you are serious about implementing a robust, scalable version of this tool, perhaps it's better to look at the iText source code and try to port it to VB?
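To give a feel for the structures described above, here is a minimal C sketch that locates the classic cross-reference table of a PDF and prints each object's byte offset. It only handles older-style files with a plain "xref" table (not the compressed cross-reference streams newer PDFs may use), it does almost no error handling, and it is still a long way from a merger, but it shows where the offsets live.

    /* Minimal sketch: dump the classic xref table of a PDF.
     * Handles only plain "xref" tables, not PDF 1.5+ xref streams. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        if (argc < 2) { fprintf(stderr, "usage: %s file.pdf\n", argv[0]); return 1; }

        FILE *f = fopen(argv[1], "rb");
        if (!f) { perror("fopen"); return 1; }

        /* The end of the file contains "startxref" followed by the byte
         * offset of the cross-reference table. */
        char tail[1025] = {0};
        fseek(f, 0, SEEK_END);
        long size = ftell(f);
        long tail_len = size < 1024 ? size : 1024;
        fseek(f, size - tail_len, SEEK_SET);
        fread(tail, 1, (size_t)tail_len, f);
        for (long i = 0; i < tail_len; i++)      /* make strstr safe on binary */
            if (tail[i] == '\0') tail[i] = ' ';

        char *p = NULL, *q = tail;
        while ((q = strstr(q, "startxref")) != NULL) { p = q; q++; }
        if (!p) { fprintf(stderr, "no classic startxref found\n"); return 1; }
        long xref_off = strtol(p + 9, NULL, 10);

        /* Walk the table: a "first count" subsection header, then one
         * "offset generation type" entry per object. */
        char line[128];
        fseek(f, xref_off, SEEK_SET);
        fgets(line, sizeof line, f);             /* the "xref" keyword line */
        while (fgets(line, sizeof line, f)) {
            long first, count;
            if (strncmp(line, "trailer", 7) == 0) break;
            if (sscanf(line, "%ld %ld", &first, &count) != 2) break;
            for (long i = 0; i < count && fgets(line, sizeof line, f); i++) {
                long offset; int gen; char type;
                if (sscanf(line, "%ld %d %c", &offset, &gen, &type) == 3)
                    printf("object %ld: offset %ld (%c)\n", first + i, offset, type);
            }
        }
        fclose(f);
        return 0;
    }

A merge tool would then have to copy the objects of each input file, renumber them, rewrite all of these offsets, and stitch the Pages trees together, which is exactly why the existing libraries are as large as they are.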

Data Extraction from PDF

I get 15+ PDFs a day that I have to enter into a database. They are generated from a table where the "Blanks" are filled in from specific table fields. Are there any tools or Python code examples I could use to try to develop a means of extracting the data from the PDFs, either to write to the database table directly or to create a table to import into it? The database is currently an Access .mdb.
Thanks
There are a number of approaches that will work.
One simple approach is to print the PDF file out to a text file and then have Access import that text. All recent versions of Windows allow you to install a "text" printer that outputs the printing of a document to a text file. You can have Access "process" a folder of PDFs, print them to text, and then import those text files. You might need some VBA to remove "pages" and some extra lines before you import the data into Access.
Another approach is to use Word (automated from Access) to open a PDF. When Word opens a PDF, it converts it to a Word document. This approach will even format rows as a Word table. You can then pluck out that table data and send it to Access. You can likely pull that text out without writing the data to a text file first, or just use Word's Save As to a text file (you can automate this process from Access).
Another approach is to use the free Ghostscript library, which can extract text from a PDF (I would consider this if you did not have Word at your disposal); a minimal example of scripting that call follows below.
Which solution is best will depend very much on the software installed on the computer running Access. Opening the PDF files with Word would be my first choice to test.
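For the Ghostscript route, the call is easy to script. A minimal sketch, assuming the gs executable is on the PATH (on Windows it is usually gswin64c) and with placeholder file names:

    /* Minimal sketch: shell out to Ghostscript's txtwrite device to dump
     * the text of a PDF into a plain .txt file that Access can import.
     * "report.pdf" and "report.txt" are placeholders. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        const char *pdf = "report.pdf";
        const char *txt = "report.txt";
        char cmd[512];

        snprintf(cmd, sizeof cmd,
                 "gs -dBATCH -dNOPAUSE -sDEVICE=txtwrite -o \"%s\" \"%s\"",
                 txt, pdf);

        return system(cmd) == 0 ? 0 : 1;
    }

The same command line can just as well be launched from VBA with Shell if you prefer to keep everything inside Access.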
At my old job we used Cogniview, which converted PDFs to Excel spreadsheets quite quickly. If you want to use Python, a quick search yielded this, which seems straightforward enough: PDF to XLS with Python

Reading Excel files into SQL Server not using OLEDB/ODBC

Is there a way to read Excel 2010/2013 files natively?
We are importing Excel files into SQL Server and have come across a specific issue whereby the Excel driver appears to decide the type of a destination data column by testing the contents of only the first 65K-odd rows.
This has only just started happening within the past 3 weeks, before then we had managed to convince Excel of the error of its ways by a simple registry hack that forced it to read the entire set of rows.
The problem is that we have some datasets that contain, say, 120,000 rows; these may have all-numeric values for the first 80,000, then some non-numeric yet vital information that we wish to retain.
Yes, the data is not correctly typed, we know.
Because the source data type has been determined by the Excel driver to be a float, it promptly turns all our non-numeric values into NULLs - not very useful.
If there were some other way to read an Excel file without using the standard ODBC/OLEDB drivers, that might help.
We have tried saving it into various other formats before importing, but of course all these exports use the Excel driver, which has the problem.
I think the closest we have got is to save it as XML (which is frankly huge at 800 MB) and then shred it using standard XPath queries and some pretty dodgy workarounds to handle the no doubt well-formed but still tricky variations in how column data is represented.
Edit: changed title to more closely reflect the issue
As well as the registry key, when connecting to your Excel file, have you tried setting the following?
;Extended Properties="IMEX=1"
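For reference, a full ACE OLEDB connection string with that property would look something like this (the provider version and file path are placeholders for whatever you actually have installed):

Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Data\Book1.xlsx;Extended Properties="Excel 12.0 Xml;HDR=YES;IMEX=1"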
See here
Also see this MSDN article

Generating several similar SSIS packages (file data source to DB)

Is there a way to automatically generate SSIS packages? I need to create a lot of SSIS packages that just erase data from one table and import data from a text file. The file name matches the table name, and the column headers are in the first line of the file.
For more detailed information:
I am working on a project in which I have to separate two systems that are currently coupled (one system has direct access to the other's database). After the modifications, one system will provide data through txt files to be loaded into the other database.
We have to use SSIS to load data into the database from the text files.
The text files will be provided in CSV format with column headers in the first line.
The tables from both databases have matching column names, and all we need to do is clear the table and load data from the files.
I have more than one hundred tables with different numbers of columns. Do I need to create each package manually?
I'm familiar with 2 free options.
EzAPI might be a good place to start if you're a .NET-heavy shop or just really want to geek out with the API. This approach lets you control pretty much the entire package generation, but at the cost of coding time. I find EzAPI generally easier than working with the base COM/.NET libraries for SSIS.
Biml is an interesting beast. Varigence will be happy to sell you a license to Mist, but it's not needed. All you would need is BIDSHelper; then browse through BimlScript and look for a recipe that approximates your needs. Once you have that, click the context-sensitive menu button in BIDSHelper and, whoosh, it generates packages.
I did this just using VB. I passed in the table names as a command parameter and used VB to generate the insert and the clear; it worked a charm. I can try to dig it out tomorrow when I'm back in the office, but it was pretty simple. There didn't seem to be any other way to say "just get x and export it" or "just take y and import it into z", so VB it had to be. In fact, come to think of it, I think I actually used a small XML file to pass the table info for the export and then determined the table name for the import from the CSV file name. To be clear, this was only one package, but it could dynamically choose the number of imports/exports it did. Further clarification: this was VB within SSIS as a processing step.