Need to extract/consolidate info from database files

Here's a summary of my problem:
Our company's old software had a large database of contacts in it.
We switched to a new program and have no way to easily transfer those contacts to it.
The contacts database appears to have 4 files which can all be opened in Excel, but not MSAccess. The four files contain the following:
File 1: A nicely formatted spreadsheet of names and some other BASIC info for each contact. There is an ID number on each one, but the numbers do not seem to correspond to anything in File 2.
File 2: Info on each contact, but not in rows. Instead it looks something like this :
JHGH_CONTACT_BLOB: 1426367745
EMAIL: SMITH
WEB:
PHONE_COUNT: 1
FAX_COUNT: 0
ADDRESS_COUNT: 0
NOTE_COUNT: 0
555-7364
(I changed some info for privacy reasons)
Each blob of info is on a separate spreadsheet row. Each starts off with the same first line, even the number is the same, so it can't be some sort of ID number.
File 3: A file containing a lot of gobbledygook, interspersed with a few readable bits of text here and there. The readable text looks like it belongs to the database (i.e., it is info on contacts, like place of work and other notes).
File 4: Contains one row and one column labeled ID, with the number 12725 in it.
I need to somehow get the info from File 2 into the nicely formatted File 1. In essence, I need to add the phone numbers, emails, etc. that appear in a messy fashion in File 2 to their proper rows in File 1.
This probably makes little sense and I thank you for even reading down this far. If you have any suggestions, I'd love to hear them.
Thanks

We have established that you have a DBF file, an FPT file and a CDX file. These are likely to all relate to Visual FoxPro (a now discontinued Microsoft product).
The .dbf file can be opened in Excel via the standard file open dialog by changing "Files of type" to "dBase files (*.dbf)". Going by your original post, Excel seems to be able to open this sensibly in the first place.
The combination of all three files might be accessible by downloading the OLE DB provider for FoxPro, which would let you access the database from Excel.
You can get more info on the specific file structures at the following links: DBF, FPT and CDX. The DBF contains most of the data, the FPT contains binary memo data and the CDX is an index file.
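Once the memo text is readable, each blob shown in the question can be split into labelled fields. Below is a minimal Python sketch, assuming every blob follows the "KEY: value" layout from the sample, with unlabelled lines (such as the bare phone number) kept separately. The function name parse_contact_blob is made up for illustration, and how a blob is matched to its row in File 1 is not something this sketch addresses.

```python
def parse_contact_blob(blob):
    """Split one memo blob into (labelled fields, leftover lines)."""
    fields = {}
    extras = []  # lines without a "KEY: value" shape, e.g. a bare phone number
    for line in blob.splitlines():
        line = line.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        # Assumption: labels are upper-case words joined by underscores.
        if sep and key.isupper() and key.replace("_", "").isalnum():
            fields[key] = value.strip()
        else:
            extras.append(line)
    return fields, extras


sample = """JHGH_CONTACT_BLOB: 1426367745
EMAIL: SMITH
WEB:
PHONE_COUNT: 1
FAX_COUNT: 0
ADDRESS_COUNT: 0
NOTE_COUNT: 0
555-7364"""

fields, extras = parse_contact_blob(sample)
```

With the sample above, fields holds the labelled values (EMAIL, PHONE_COUNT and so on) and extras holds the unlabelled phone number, which could then be written out as extra columns.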

Related

SAP Data Services .csv data file load from Excel with special characters

I am trying to load data from an Excel .csv file to a flat file format to use as a datasource in a Data Services job data flow which then transfers the data to an SQL-Server (2012) database table.
I consistently lose 1 in 6 records.
I have tried various parameter values in the file format definition and settled on setting Adaptable file scheme to "Yes", file type "delimited", column delimiter "comma", row delimiter {windows new line}, text delimiter ", language eng (English) and all else as defaults.
I have also set "write errors to file" to "yes" but it just creates an empty error file (I expected the 6,000 odd unloaded rows to be in here).
If we strip out three of the columns containing special characters (visible in Excel) it loads a treat, so I think these characters are the problem.
The thing is, we need the data in those columns and unfortunately, this .csv file is as good a data source as we are likely to get and it is always likely to contain special characters in these three columns so we need to be able to read it in if possible.
Should I try to specifically strip the columns in the Query source component of the dataflow? Am I missing a data-cleansing trick in the query or file format definition?
OK, so I didn't get the answer I was looking for, but I did get it to work by setting the "Row within Text String" parameter to "Row delimiter".

SSIS error handling: redirect rows that have zip code field more than 5 from a flat file

I have been given the task of loading a simple flat file into another using an SSIS package. The source flat file contains a zip code field. My task is to load the rows with a correct zip code (5 digits) into another flat file and redirect the invalid rows to a new file.
Since I am new to SSIS, any help or ideas are much appreciated.
You can add a derived column which determines the length of the field. Then you can add a conditional split based on that column. <= 5 goes the good path, > 5 goes the reject path.
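Outside of SSIS, the same derived-column-plus-conditional-split logic can be sketched in a few lines of Python. This is only an illustration of the rule described above (length <= 5 is good, > 5 is rejected); the function name, the "zip" field name and the dictionary row shape are assumptions, and a stricter check (exactly five digits) may be what you actually want.

```python
def split_by_zip_length(rows, zip_field="zip", max_len=5):
    """Route rows to (good, rejected) based on the length of the zip field."""
    good, rejected = [], []
    for row in rows:
        # The "derived column": length of the trimmed zip value.
        zip_len = len(str(row[zip_field]).strip())
        # The "conditional split": pick the output path from that length.
        if zip_len <= max_len:
            good.append(row)
        else:
            rejected.append(row)
    return good, rejected


rows = [
    {"name": "A", "zip": "12345"},
    {"name": "B", "zip": "12345-6789"},
    {"name": "C", "zip": " 54321 "},
]
good, rejected = split_by_zip_length(rows)
```

Here rows A and C take the good path and row B is redirected, mirroring the two outputs of the conditional split.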

AS400: How to know which program created a file?

I am not an expert on AS400; I just know some commands, and I export some files from AS400 (iSeries) into SQL Server 2005.
I need to know which RPG program created a file in a library, because that file contains statistical data derived from other files stored in other AS400 libraries.
This screenshot shows the file STTMVF in the library DAT_4DWH (from DSPLIB DAT_4DWH).
Is there a command that tells me which RPG program created the file STTMVF?
If so, I need to open the RPG or CL source and work out which physical files are used to build this statistics file.
Thanks in advance!
You can use journal management or program references to determine what is writing to the file.
Journal management
Starting the journal
To create a basic journal you need to create a journal receiver, a journal, and activate journalling for the file. Replace RECEIVER-LIB, RECEIVER-FILE, JOURNAL-LIB, JOURNAL-FILE, FILE-LIB and FILE with values appropriate for your system.
CRTJRNRCV JRNRCV(RECEIVER-LIB/RECEIVER-FILE)
CRTJRN JRN(JOURNAL-LIB/JOURNAL-FILE) JRNRCV(RECEIVER-LIB/RECEIVER-FILE)
STRJRNPF FILE(FILE-LIB/FILE) JRN(JOURNAL-LIB/JOURNAL-FILE) OMTJRNE(*OPNCLO)
Dumping the journal
DSPJRN JRN(JOURNAL-LIB/JOURNAL-FILE) FILE(FILE-LIB/FILE) RCVRNG(*CURCHAIN) JRNCDE(R) ENTTYP(PT PX DL UP) OUTPUT(*OUTFILE) OUTFILFMT(*TYPE1) OUTFILE(QTEMP/QADSPJRN)
Querying the journal
The field JOPGM will contain the program name that inserted, updated, or deleted records from the file.
Removing the journal
ENDJRNPF FILE(FILE-LIB/FILE)
DLTJRN JRN(JOURNAL-LIB/JOURNAL-FILE)
Program references
Dumping the references
DSPPGMREF PGM(*ALLUSR/*ALL) OUTPUT(*OUTFILE) OUTFILE(QTEMP/QADSPPGM)
Querying the references
Search the file for all references where the field WHFNAM equals FILE. The field WHPNAM will contain the program name. Due to file overrides, etc., this method is not as accurate as using a journal.

How to extract all the IUPAC names mentioned in the data available from Pubchem(NCBI) into a text file?

I want to build lists of prefixes and suffixes of some length from all the IUPAC names in the PubChem database, so that I can use them as a feature in my project. So I want all the IUPAC chemical names in a text file, or in some format from which I can extract these lists.
Thanks.
Sounds like you need something like the NIST species list.
You can also search for most of them in the WebBook, but I failed to find a download link for the complete set.
In our lab we got a CD(?) with the mass spectral database, which contained the (complete? well, it had something like 250,000 substances) database as a text file. Maybe you can get that through one of the vendors.
The PubChem site lets you download a dump of its data by FTP. Why not use that?
PubChem data can be downloaded via ftp from the PubChem site. A complete description of the available data can be obtained here: https://pubchemdocs.ncbi.nlm.nih.gov/downloads
Of particular interest for the question of IUPAC names, the data are downloadable from the "Compound Extras" section of the ftp site: ftp://ftp.ncbi.nlm.nih.gov/pubchem/Compound/Extras/
The README-Extras file in this location describes the data in detail. For the IUPAC names, the following information is provided:
CID-IUPAC.gz:
This is a listing of all CIDs with their computed IUPAC names.
It is a gzipped text file with CID, tab, IUPAC on each line. Note
that the names may contain UTF8 characters.
A download today (23-Apr-2020) contains 102,586,778 rows. An excerpt of the information is shown below.
> head CID-IUPAC
1 3-acetyloxy-4-(trimethylazaniumyl)butanoate
2 (2-acetyloxy-3-carboxypropyl)-trimethylazanium
3 5,6-dihydroxycyclohexa-1,3-diene-1-carboxylic acid
4 1-aminopropan-2-ol
5 (3-amino-2-oxopropyl) dihydrogen phosphate
6 1-chloro-2,4-dinitrobenzene
7 9-ethylpurin-6-amine
8 2,3-dihydroxy-3-methylpentanoic acid
9 (2,3,4,5,6-pentahydroxycyclohexyl) dihydrogen phosphate
11 1,2-dichloroethane
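For the prefix/suffix lists asked about in the question, here is a rough Python sketch that counts fixed-length prefixes and suffixes from the CID, tab, IUPAC layout described in the README. The function names are invented for illustration, and the choice to skip names shorter than n is an assumption you may want to change.

```python
import gzip
from collections import Counter


def iter_iupac_names(lines):
    """Yield the IUPAC name from each 'CID<tab>IUPAC' line."""
    for line in lines:
        cid, sep, name = line.rstrip("\n").partition("\t")
        if sep and name:
            yield name


def affix_counts(names, n):
    """Count prefixes and suffixes of length n (shorter names are skipped)."""
    prefixes, suffixes = Counter(), Counter()
    for name in names:
        if len(name) >= n:
            prefixes[name[:n]] += 1
            suffixes[name[-n:]] += 1
    return prefixes, suffixes


def affixes_from_file(path, n):
    """Stream a gzipped CID-IUPAC file without loading it all into memory."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return affix_counts(iter_iupac_names(handle), n)
```

Something like affixes_from_file("CID-IUPAC.gz", 4) would then return two Counter objects; given the size of the file (over 100 million rows), streaming line by line as above is likely preferable to reading everything at once.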

Read from excel file in C

I want to read from an Excel file in C. The Excel 2007 file contains about 6000 rows and 2 columns. I want to store the contents in a 2-D array in C. If a C library or any other method exists, please let me know.
Excel 2007 stores the data in a bunch of files, most of them in XML, all crammed together into a zip file. If you want to look at the contents, you can rename your .xlsx to whatever.zip and then open it and look at the files inside.
Assuming your Excel file just contains raw data, and all you care about is reading it (i.e., you do not need/want to update its contents and get Excel to open it again), reading the data is actually pretty easy. Inside the zip file, you're looking for the subdirectory xl\worksheets\, which will contain a number of .xml files, one for each worksheet from Excel (e.g., a default workbook will have three worksheets named sheet1.xml, sheet2.xml and sheet3.xml).
Inside of those, you're looking for the <sheetData> tag. Inside of that, you'll have <row> tags (one for each row of data), and inside of them <c> tags with an attribute r=RC, where RC is replaced by the normal column/row notation (e.g., "A1"). The <c> tag will have a nested <v> tag where you'll find the value for that cell.
I do feel obliged to add a warning though: while reading really simple data can indeed be just this easy, life can get a lot more complex in a hurry if you decide to do much more than reading simple rows/columns of numbers.
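To make the layout concrete, here is a small sketch of the same walk through the zip and XML. It is written in Python rather than C purely to keep it short; a C version would need a zip library and an XML parser to do the equivalent steps. The function name and the default sheet path are assumptions. It returns the raw text of each <v> element, so it only suits plain numeric cells: text cells normally store an index into a separate shared-strings file, which this sketch does not resolve.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML worksheet files.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def read_sheet_values(xlsx_file, sheet_path="xl/worksheets/sheet1.xml"):
    """Return {cell reference: raw <v> text} for one worksheet."""
    with zipfile.ZipFile(xlsx_file) as archive:
        root = ET.fromstring(archive.read(sheet_path))
    cells = {}
    for row in root.findall("m:sheetData/m:row", NS):
        for cell in row.findall("m:c", NS):
            value = cell.find("m:v", NS)
            if value is not None:  # empty cells may have no <v> element
                cells[cell.get("r")] = value.text
    return cells
```

Calling read_sheet_values("whatever.xlsx") gives a dictionary keyed by references such as "A1", which can then be copied into a 2-D array once the row and column are split out of each key.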
You have several choices:
1) Save your Excel worksheet to a CSV file and parse that.
2) Use the COM API (Windows proprietary and tricky)
3) See this link for a C++ class that you could modify.
Another C lib to read data from excel files can be found here.
