Read from excel file in C

I want to read from an Excel file in C. The Excel 2007 file contains about 6000 rows and 2 columns, and I want to store the contents in a 2-D array in C. If there is a C library or any other method, please let me know.

Excel 2007 stores the data in a bunch of files, most of them in XML, all crammed together into a zip file. If you want to look at the contents, you can rename your .xlsx to whatever.zip and then open it and look at the files inside.
Assuming your Excel file just contains raw data, and all you care about is reading it (i.e., you do not need to update its contents and have Excel open it again), reading the data is actually pretty easy. Inside the zip file, you're looking for the subdirectory xl/worksheets/, which will contain a number of .xml files, one for each worksheet (e.g., a default workbook has three worksheets named sheet1.xml, sheet2.xml and sheet3.xml).
Inside each of those, you're looking for the <sheetData> tag. Inside that, you'll have <row> tags (one for each row of data), and inside them <c> tags with an r attribute holding the cell reference in the usual column/row notation (e.g., r="A1"). Each <c> tag has a nested <v> tag where you'll find the value for that cell.
I do feel obliged to add a warning though: while reading really simple data can indeed be this easy, life gets a lot more complex in a hurry if you try to do much more than read plain rows and columns of numbers (for example, text cells normally store an index into a shared-strings table rather than the text itself).
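If you only need the numbers, a brute-force scan of the worksheet XML is often enough. Here is a minimal sketch, assuming the worksheet XML (e.g., xl/worksheets/sheet1.xml) has already been extracted from the zip and every cell holds a plain number (no shared strings); the file name and the 6000 x 2 layout come from the question.

    /* Minimal sketch: pull numeric <v> values out of an already-extracted
       xl/worksheets/sheet1.xml and fill a rows x 2 array in document order.
       Assumes purely numeric cells (no shared strings, no inline strings). */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define MAX_ROWS 6000
    #define COLS     2

    int main(void)
    {
        FILE *fp = fopen("sheet1.xml", "rb");   /* extracted from the .xlsx zip */
        if (!fp) { perror("sheet1.xml"); return 1; }

        fseek(fp, 0, SEEK_END);
        long len = ftell(fp);
        rewind(fp);

        char *xml = malloc(len + 1);
        if (!xml || fread(xml, 1, len, fp) != (size_t)len) { fclose(fp); return 1; }
        xml[len] = '\0';
        fclose(fp);

        static double data[MAX_ROWS][COLS];
        size_t count = 0;

        /* Walk the buffer looking for <v>...</v> pairs. */
        for (char *p = strstr(xml, "<v>"); p && count < MAX_ROWS * COLS;
             p = strstr(p, "<v>")) {
            p += 3;                              /* skip "<v>" */
            data[count / COLS][count % COLS] = strtod(p, NULL);
            count++;
        }

        printf("read %zu values (%zu rows)\n", count, count / COLS);
        free(xml);
        return 0;
    }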

You have several choices:
1) Save your Excel worksheet to a CSV file and parse that (see the sketch after this list).
2) Use the COM API (Windows-only and tricky).
3) See this link for a C++ class that you could modify.
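For option 1, the parsing side can be as small as the sketch below. It assumes a plain two-column numeric CSV with no quoted fields or embedded commas; the file name data.csv is an assumption.

    /* Sketch for option 1: parse a two-column CSV exported from Excel
       into a 2-D array. Assumes plain numeric fields, no quoted commas. */
    #include <stdio.h>

    #define MAX_ROWS 6000

    int main(void)
    {
        static double table[MAX_ROWS][2];
        char line[256];
        int rows = 0;

        FILE *fp = fopen("data.csv", "r");
        if (!fp) { perror("data.csv"); return 1; }

        while (rows < MAX_ROWS && fgets(line, sizeof line, fp)) {
            /* Each line is expected to look like "123.4,567.8". */
            if (sscanf(line, "%lf,%lf", &table[rows][0], &table[rows][1]) == 2)
                rows++;
        }
        fclose(fp);

        printf("loaded %d rows\n", rows);
        return 0;
    }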

Another C library for reading data from Excel files can be found here.

Related

CSV file not recognized as csv, reason nominal value not declared in header

I am trying to load a dataset in Weka. I have tried many solutions, such as ARFF format, commas, etc., but they all failed. Could any of you give me a working solution, or load this dataset according to the format?
Here is a link to the dataset
Instead of using Weka's functionality for reading CSV files, you could use ADAMS (developed at the same university; I'm the lead developer) instead.
Download the adams-ml-app snapshot and then use the Weka Investigator to load/save the file:
Load it as ADAMS Spreadsheets (.csv, .csv.gz)
Save it as Arff data files (.arff, .arff.gz) or Simple ARFF data files (.arff, .arff.gz)
The Reviews column contains an erroneous 3.0M, which prevents it from becoming numeric.
If you want to have an introduction to the Weka Investigator, then take a look at my talk from the Weka User Conference 2021: Taking Weka to the next level with ADAMS.
There are too many issues with lines in this file.
In line 23, I eliminated the odd-looking brackets.
I removed all single quotes (')
I eliminated all repeated double quotes ("")
In line 10474 the first two fields (before the number) didn't seem to be separated, so I added a comma.
This allowed the file to go through initial screening, but...
The file contains a lot of odd emojis. I started to eliminate them one by one, but there are clearly more of these than I wish to deal with.
Each time I got rid of one, it would read farther into the file, then stop at the next one.
If I just try to read the top of the file, the first 20 lines before we get to any of these problems, it reads fine.
My partial editing can be found here: https://www.dropbox.com/s/ij707mb23dt1jvz/googleplaystore3.csv?dl=0
I think if you clear up the remaining emojis the file should be usable.
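If hand-editing gets tedious, a small filter program can do the same cleanup in one pass. This is only a hypothetical helper, written in C: it copies the file while dropping every byte outside printable ASCII, which removes multi-byte UTF-8 sequences such as emojis, but it is a blunt instrument that also strips legitimate accented characters.

    /* Hypothetical cleanup filter: copy a CSV, keeping only printable ASCII
       plus newlines, carriage returns and tabs. Multi-byte UTF-8 sequences
       (emojis, accented characters) are dropped entirely. */
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s in.csv out.csv\n", argv[0]);
            return 1;
        }
        FILE *in  = fopen(argv[1], "rb");
        FILE *out = fopen(argv[2], "wb");
        if (!in || !out) { perror("fopen"); return 1; }

        int c;
        while ((c = fgetc(in)) != EOF) {
            if (c == '\n' || c == '\r' || c == '\t' || (c >= 0x20 && c <= 0x7E))
                fputc(c, out);
        }
        fclose(in);
        fclose(out);
        return 0;
    }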

How does one split large result set from a Group By into multiple flat files?

I'm far from an SSIS expert, and I'm attempting to correct an error (unspecified in the messages) that began once I modified a variable to increase the amount of data accumulated and exported into a flat file. (The variable was a date in the WHERE clause that limited the data returned by the SELECT.)
So in the data flow there's a GROUP BY component, and I'm trying to find the appropriate component to put between that and the flat file destination component to chop up the results. I figured there'd be something to export, say, flatFile1.csv, flatFile2.csv, etc., based on a number of lines (so if I set a limit of 1 million lines and the results returned 3.5 million, I'd get 4 files with the last one containing half a million lines), or perhaps a maximum file size with similar results.
Which component should I use from the toolbox to guarantee a manageable file size?
Is a script component the only way to handle output of any size? If so, would it sit between the Group By and the Flat File output components, or would the script completely obviate the need for the Flat File output?

Reading and writing to xls and doc files in C

I have a problem where I have to write a C program that reads numerical data from a text file. The data is tab-delimited. Here is a sample from the text file:
1 23099 345565 345569
2 908 66766 66768
This is data for clients, and each client has a row. The columns represent customer number, previous balance, previous reading and current reading. I then have to generate a .doc document that summarizes all this information and calculates the balance. I can write a function that does this, but how do I create an xls document and a Word document where all the results are summarized by the program? The text document has only numerical data. Any ideas?
The easiest way is to create a CSV file rather than an xls file.
Office opens CSV files with good results, and it is far easier to create an ASCII text file with comma-separated values than to write something in a closed format like the MS Office formats.
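As a rough sketch of that approach (the input and output file names are placeholders, and the balance formula shown is only an illustration; substitute your real calculation):

    /* Sketch of the CSV approach: read the tab-delimited client file and emit
       a summary CSV that Excel (or any spreadsheet) can open. The balance
       formula below is a placeholder, not the real business rule. */
    #include <stdio.h>

    int main(void)
    {
        FILE *in  = fopen("clients.txt", "r");     /* assumed input file name */
        FILE *out = fopen("summary.csv", "w");
        if (!in || !out) { perror("fopen"); return 1; }

        fprintf(out, "Customer,Previous balance,Previous reading,"
                     "Current reading,Balance\n");

        long cust, prev_bal, prev_read, curr_read;
        while (fscanf(in, "%ld %ld %ld %ld",
                      &cust, &prev_bal, &prev_read, &curr_read) == 4) {
            long balance = curr_read - prev_read;  /* placeholder formula */
            fprintf(out, "%ld,%ld,%ld,%ld,%ld\n",
                    cust, prev_bal, prev_read, curr_read, balance);
        }

        fclose(in);
        fclose(out);
        return 0;
    }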
The simplest way to create a spreadsheet that contains formulas and formatting, and can be opened by Excel, is to create an XML Spreadsheet file.
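For reference, here is a bare-bones sketch of writing that format from C. It emits the minimum Excel 2003-style XML Spreadsheet (SpreadsheetML) markup; the single hard-coded row and the R1C1-style formula are only illustrative.

    /* Sketch: emit a minimal Excel 2003 "XML Spreadsheet" file. Excel opens
       it directly, and unlike CSV it can carry formulas and basic formatting.
       Only the bare-minimum markup is shown. */
    #include <stdio.h>

    int main(void)
    {
        FILE *out = fopen("summary.xml", "w");
        if (!out) { perror("summary.xml"); return 1; }

        fputs("<?xml version=\"1.0\"?>\n"
              "<?mso-application progid=\"Excel.Sheet\"?>\n"
              "<Workbook xmlns=\"urn:schemas-microsoft-com:office:spreadsheet\"\n"
              "          xmlns:ss=\"urn:schemas-microsoft-com:office:spreadsheet\">\n"
              " <Worksheet ss:Name=\"Summary\">\n  <Table>\n", out);

        /* One data row plus a cell whose value is computed by a formula. */
        fputs("   <Row>\n"
              "    <Cell><Data ss:Type=\"Number\">23099</Data></Cell>\n"
              "    <Cell><Data ss:Type=\"Number\">345565</Data></Cell>\n"
              "    <Cell><Data ss:Type=\"Number\">345569</Data></Cell>\n"
              "    <Cell ss:Formula=\"=RC[-1]-RC[-2]\">"
              "<Data ss:Type=\"Number\">0</Data></Cell>\n"
              "   </Row>\n", out);

        fputs("  </Table>\n </Worksheet>\n</Workbook>\n", out);
        fclose(out);
        return 0;
    }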

Need to extract/consolidate info from database files

Here's a summary of my problem:
Our company's old software had a large database of contacts in it.
We switched to a new program and have no way to easily transfer those contacts to it.
The contacts database appears to have 4 files, all of which can be opened in Excel but not in MS Access. The four files contain the following:
File 1: A nicely formatted spreadsheet of names and some other BASIC info for each contact. There is an ID number on each one, but the numbers do not seem to correspond to anything in File 2.
File 2: Info on each contact, but not in rows. Instead it looks something like this:
JHGH_CONTACT_BLOB: 1426367745
EMAIL: SMITH
WEB:
PHONE_COUNT: 1
FAX_COUNT: 0
ADDRESS_COUNT: 0
NOTE_COUNT: 0
555-7364
(I changed some info for privacy reasons)
Each blob of info is on a separate spreadsheet row. Each starts with the same first line, and even the number is the same, so it can't be some sort of ID number.
File 3: A file containing a lot of gobbledygook, interspersed with a few readable bits of text here and there. The readable text looks like it belongs to the database (i.e., it is info on contacts, such as place of work and other notes).
File 4: Contains one row and one column labeled ID, with the number 12725 in it.
I need to somehow get the info from File 2, into the nicely formatted file 1. In essence, I need to add the phone numbers, emails etc included in a messy fashion in file 2 on their proper rows in file 1.
This probably makes little sense and I thank you for even reading down this far. If you have any suggestions, I'd love to hear them.
Thanks
We have established that you have a DBF file, an FPT file and a CDX file. These are likely to all relate to Visual FoxPro (a now discontinued Microsoft product).
The .dbf file can be opened in Excel via the standard file open dialog by changing "Files of type" to "dBase files (*.dbf)". Going by your original post, Excel seems to be able to open this sensibly in the first place.
The combination of all three files might be accessible by downloading this OLE DB provider for FoxPro, which would let you access the database from Excel using the methods outlined here.
You can get more info on the specific file structures at the following links: DBF, FPT and CDX. The DBF contains most of the data, the FPT contains binary memo data and the CDX is an index file.
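To get a quick look at what the DBF actually contains before committing to a conversion path, a short C program can dump the header. This is a rough sketch based on the standard dBase/FoxPro layout described in the links above (a 32-byte file header, then 32-byte field descriptors terminated by 0x0D); the file name contacts.dbf is just an example.

    /* Rough sketch: print the record count and field names of a DBF file,
       following the standard dBase/FoxPro header layout. */
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    int main(void)
    {
        FILE *fp = fopen("contacts.dbf", "rb");
        if (!fp) { perror("contacts.dbf"); return 1; }

        unsigned char hdr[32];
        if (fread(hdr, 1, 32, fp) != 32) { fclose(fp); return 1; }

        /* Little-endian record count at offset 4, record length at offset 10. */
        uint32_t records = hdr[4] | (hdr[5] << 8) |
                           ((uint32_t)hdr[6] << 16) | ((uint32_t)hdr[7] << 24);
        uint16_t rec_len = (uint16_t)(hdr[10] | (hdr[11] << 8));
        printf("%lu records, %u bytes per record\n",
               (unsigned long)records, (unsigned)rec_len);

        /* Field descriptors follow, one 32-byte entry per column,
           terminated by a single 0x0D byte. */
        unsigned char fld[32];
        while (fread(fld, 1, 1, fp) == 1 && fld[0] != 0x0D) {
            if (fread(fld + 1, 1, 31, fp) != 31) break;
            char name[12];
            memcpy(name, fld, 11);
            name[11] = '\0';
            printf("field: %-11s type: %c length: %u\n",
                   name, fld[11], (unsigned)fld[16]);
        }

        fclose(fp);
        return 0;
    }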

Components of an SPSS project

I have given some data in an Excel sheet to a 3rd party for SPSS data processing. After the processing is complete, what are the files that I should get back from them?
I have received one file with a ".sav" extension. I presume this file contains the imported data (from my excel file).
I have received documents (.rtf - rich text format) with the charts and graphs only. Is there something else I need to get so that I can use the files later on for further analysis?
Thanks in advance
V Karthick
Yes, the ".sav" extension is the data file. You should also request the syntax file(s), ".sps" extension. The syntax file is a record of all data transformations which have been performed and allows you to review their work. The syntax file can be opened with notepad or any text editor.
Arthur
