CBIS-DDSM file names don't match the names in the train/test CSV files

I've downloaded the CBIS-DDSM data from The Cancer Imaging Archive, along with its descriptive train/test CSV files. However, I noticed that the names of the .dcm files do not match the ones listed in the CSV files.
For example, a file named Mass-Training_P_00001_LEFT_CC/../1-1.dcm is listed as Training_P_00001_LEFT_CC/../000000.dcm in the descriptive CSV.
For those who have used this dataset before, how did you handle this?
Thank you for your help.
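
One possible approach, sketched here under assumptions rather than taken from the dataset documentation, is to ignore the file names entirely and match on the case key that both paths share (for example `P_00001_LEFT_CC`). The regular expression, `case_key` and `match_paths` are hypothetical names for illustration, and the key format is inferred only from the example paths above.

```python
import re

# Assumption: every path contains a case key such as "P_00001_LEFT_CC",
# optionally followed by an abnormality number such as "_1".
CASE_KEY = re.compile(r"P_\d+_(?:LEFT|RIGHT)_(?:CC|MLO)(?:_\d+)?")


def case_key(path):
    """Return the case key embedded in a path, or None if there is none."""
    match = CASE_KEY.search(path)
    return match.group(0) if match else None


def match_paths(csv_paths, disk_paths):
    """Map each CSV path to the downloaded paths that share its case key."""
    by_key = {}
    for path in disk_paths:
        key = case_key(path)
        if key is not None:
            by_key.setdefault(key, []).append(path)
    # Paths without a recognizable key simply map to an empty list.
    return {path: by_key.get(case_key(path), []) for path in csv_paths}
```

If one case folder holds several images, the match returns all of them and a second criterion would still be needed to tell them apart.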

Related

Extract compressed text files

I'd like some help extracting a specific set of files.
I have roughly 2 TB of compressed files, and I want to extract only the .txt files that contain the text "Função: finalizar vendas".
The archives are a mix of .rar, .zip and .7z files. Each contains several folders named after employees, and inside those folders are text files, some of which describe the employee's function.
The closest I have come is a 7-Zip command that extracts only the .txt files from the archives, but that pulls out many unneeded text files, and I only want the ones containing the sentence above.
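
A minimal sketch of the filtering step, assuming Python is available and covering only the .zip archives (the standard library cannot read .rar or .7z, so those would need a separate tool). The function name `extract_matching_txt` and the list of candidate encodings are assumptions, since the encoding of the text files is not stated.

```python
import zipfile

PHRASE = "Função: finalizar vendas"


def extract_matching_txt(archive_path, out_dir, phrase=PHRASE,
                         encodings=("utf-8", "cp1252")):
    """Extract only the .txt members of a .zip whose content contains phrase."""
    extracted = []
    with zipfile.ZipFile(archive_path) as archive:
        for info in archive.infolist():
            if info.is_dir() or not info.filename.lower().endswith(".txt"):
                continue
            data = archive.read(info)  # member is read into memory, not to disk
            # The encoding is unknown, so look for the phrase in each candidate.
            if any(phrase.encode(enc) in data for enc in encodings):
                extracted.append(archive.extract(info, out_dir))
    return extracted
```

Each member is inspected in memory first, so only the matching text files are ever written to disk, with their employee folder preserved under `out_dir`.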

How to merge two folders full of text files with matching names?

Say I have the following:
C:/folder1/textfile1.txt
C:/folder1/textfile2.txt
C:/folder1/textfile3.txt
C:/folder2/textfile1.txt
C:/folder2/textfile2.txt
C:/folder2/textfile3.txt
I'd like to merge each file in folder2 with its matching file in folder1. If this is possible, how would I go about doing that? I've found a lot of stuff about merging a bunch of text files in a given folder into a single file, but nothing like this.
Thanks.
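
A rough sketch of one way to do this in Python, assuming "merge" means appending the content of the folder2 file after the content of the folder1 file with the same name. The function name `merge_matching_files` and the separate output folder are assumptions made so that the originals stay untouched.

```python
from pathlib import Path


def merge_matching_files(folder1, folder2, out_dir):
    """Write folder1/<name> followed by folder2/<name> to out_dir/<name>."""
    folder1, folder2, out_dir = Path(folder1), Path(folder2), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    merged = []
    for first in sorted(folder1.glob("*.txt")):
        second = folder2 / first.name
        if not second.is_file():
            continue  # no counterpart in folder2, so skip this file
        combined = first.read_bytes() + second.read_bytes()
        (out_dir / first.name).write_bytes(combined)
        merged.append(first.name)
    return merged
```

For the layout above, a call such as `merge_matching_files("C:/folder1", "C:/folder2", "C:/merged")` would produce one combined file per matching name.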

Using a batch file to copy multiple files with the same name and paste them into a new folder with different names

I have a long list of files that are auto-produced every month. They all have the same file name with a sequential file extension, like this: file.001, file.002, file.003
Each file contains different information, despite having the same name. I need to copy them from their home directory into a new directory as text files with names that reflect their purpose, like this: Budget.txt, Expense.txt, Retention.txt
Is it possible to do this with a batch file? I've been unable to find a method that works. Any help would be appreciated.
EDIT: I've tried that solution, and it works as far as it goes. The frustrating thing is that the extensions are not always the same, but they are always sequentially numbered.
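
Not a batch file, but as an illustration of the idea: because the extensions are always sequential, the files can be sorted by their numeric extension and paired with the target names by position instead of by exact extension. This Python sketch assumes that the order of the numbers always corresponds to the order of the purposes; `copy_with_names` and its parameters are hypothetical.

```python
import shutil
from pathlib import Path


def copy_with_names(src_dir, dst_dir, stem, new_names):
    """Copy stem.NNN files, ordered by numeric extension, to <new name>.txt."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    # Keep only files whose extension is purely numeric, e.g. file.001.
    numbered = [p for p in src_dir.glob(stem + ".*") if p.suffix[1:].isdigit()]
    numbered.sort(key=lambda p: int(p.suffix[1:]))
    copied = []
    # Pair the lowest number with the first name, the next with the second, etc.
    for source, name in zip(numbered, new_names):
        target = dst_dir / (name + ".txt")
        shutil.copyfile(source, target)
        copied.append(target.name)
    return copied
```

With `new_names` set to `["Budget", "Expense", "Retention"]`, the lowest-numbered file becomes Budget.txt whether it arrived as file.001 or file.006.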

Find and replace string in several xml files

I need to replace a string "AVL_P" with "AVL_PRESSURE" in several XML files. Would someone be able to help me do this using a .bat file?
The XML files are in the ECU1, ECU2 and ECU3 folders under C:\ECU.
Thanks
John.
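
As a sketch of the replacement logic (in Python rather than a .bat file), one detail worth handling is that "AVL_P" is also the beginning of "AVL_PRESSURE", so a plain search and replace would corrupt files that are processed twice. The version below assumes that only the standalone name should change, that the files are UTF-8, and that they sit directly in the three folders; `replace_in_xml` is a hypothetical name.

```python
import re
from pathlib import Path

# Match AVL_P only when it is not the start of a longer name such as
# AVL_PRESSURE, so a second run leaves already converted files alone.
PATTERN = re.compile(r"AVL_P(?![A-Za-z0-9_])")


def replace_in_xml(root, folders=("ECU1", "ECU2", "ECU3"), encoding="utf-8"):
    """Replace AVL_P with AVL_PRESSURE in every .xml file of the given folders."""
    changed = []
    for folder in folders:
        for path in sorted((Path(root) / folder).glob("*.xml")):
            text = path.read_text(encoding=encoding)
            new_text, count = PATTERN.subn("AVL_PRESSURE", text)
            if count:  # rewrite only files that actually changed
                path.write_text(new_text, encoding=encoding)
                changed.append(path.name)
    return changed
```

Calling `replace_in_xml(r"C:\ECU")` would process the three folders in turn. If longer names that merely begin with AVL_P should also change, the lookahead would have to be dropped.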

Detecting the database a .DAT file belongs to

I have a set of .DAT files alongside a set of .IDX files with the same names.
The goal is to open these files and read their contents, parsing them into a new format. The problem: I have no idea what database the data is stored in! The files contain no headers or clues, they are binary, and the source I received them from has no idea what storage mechanism was used.
So the question is: what are some common databases that store data in .DAT files and their indexes in .IDX files with the same name? Is there an application for Linux or Windows that can detect the database?
EDIT :-
File names:
price.dat
price.idx
Here is a hex dump of the beginning of the .DAT file:
030D04806420500FFE3E0500002078581001C000738054E0C0099804138100402550080442090082403C101F7406010080C0A010201002010C006FC0246C0403FE00B041C051F0091BFE042F812FE054F8177E066F81BFE078F8207E08AF824FE09CF8297E0AEF82DFE0C0F8327E0D2F836FE0E4F83B7E0F6F83FE5FEFF47C06608480FA91F003C0213101F1BFDFE804220100F500D2A00388430801E04028D4390D128B46804024010A067269FCA546003C0844060E11F084B9E1377850
Here is a hex dump of the beginning of the .IDX file:
030D04805820100FFD7E0000397FEB60050410007300246A3060068220009BE0401030088B3903F740E010C80402410281402030094004C708004DC058880FFC052F015EBFE042F812FE054F8177E066F81BFE078F8207E08AF824FE09CF8297E0AEF82DFE0C0F8327E0D2F836FE0E4F83B7E0F6F83FFE108F8447E11AF848FE12CF84D7E13EF851FE150F8567E162F85AFE174F85F7E186F863FE198F8687E1AAF86CFE1BCF8717E1CEF875FE1E0F87A7E1F2F87EF5FEFF005E30901714
Both files start with the same four bytes, 030D0480. I wonder if this is a good start?
Did a quick search on Google but it didn't return anything...
END EDIT :-
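
To check exactly how many leading bytes the two dumps share, a small Python sketch like the following can help. The two constants are just the first 16 bytes copied from the dumps above, and `shared_prefix_hex` is a hypothetical helper name.

```python
import os

DAT_HEAD = "030D04806420500FFE3E050000207858"  # first 16 bytes of price.dat
IDX_HEAD = "030D04805820100FFD7E0000397FEB60"  # first 16 bytes of price.idx


def shared_prefix_hex(a, b):
    """Return the longest common prefix of two hex strings, in whole bytes."""
    prefix = os.path.commonprefix([a.upper(), b.upper()])
    # Drop a trailing half byte so the result is always an even number of digits.
    return prefix[: len(prefix) - len(prefix) % 2]


prefix = shared_prefix_hex(DAT_HEAD, IDX_HEAD)
print(prefix, len(prefix) // 2, "bytes")  # prints: 030D0480 4 bytes
```

Only those four bytes are candidates for a format signature to search for; the fifth byte already differs between the two files (64 versus 58).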
Any other ideas?
Thanks much in advance!
There is a FairCom ODBC driver called 'ctreeODBC_RO.exe' which should be capable of this.
