How do I parse a custom file with F#?

I have a custom file that contains data in a format like the one below:
prop1: value1
prop2: value2
prop3: value 2
Table Instance 1
A B C D E
10 11 12 13 14
12 13 11 12 20
Table Instance 2
X Y Z
1 3 4
3 4 0
Table Instance 3
P R S
2 3 5
5 5 0
I want to be able to parse this file and map the contents to a POCO. I was really excited about working with the CSV type provider in F#, but I quickly realized that it might not be possible to use it in my case. Do I have to write my own parser here (iterate through each line and parse the values into the appropriate properties of the POCO)?
Thanks
Kay

If that's a one-off file format, I would just write a parser by hand. Split the file into separate tables, throw away the title and header, then String.Split each row and parse the resulting array into a record type specific to the table.
If that file format is more or less standardized and you expect that you'll need to parse it more often and in different contexts (and/or you're feeling adventurous), you can always write your own type provider.
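To make the hand-written approach concrete, here is a minimal sketch of the same steps. It is written in Python rather than F# purely for illustration, and the function name parse_custom_file is hypothetical. It assumes every table starts with a "Table Instance" title line followed by one header line, and that all cell values are integers. Unlike the suggestion above, it keeps the header as column names instead of throwing it away.

```python
def parse_custom_file(text):
    """Split the file into leading 'key: value' properties and named tables."""
    props = {}
    tables = {}
    current = None   # title of the table currently being read
    header = None    # column names of the current table

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Table Instance"):
            # A title line starts a new table; its header comes next.
            current = line
            tables[current] = []
            header = None
        elif current is None:
            # Before the first table: 'name: value' property lines.
            key, _, value = line.partition(":")
            props[key.strip()] = value.strip()
        elif header is None:
            header = line.split()
        else:
            # Data row: split on whitespace and pair with the header.
            values = [int(v) for v in line.split()]
            tables[current].append(dict(zip(header, values)))
    return props, tables


sample = """prop1: value1
prop2: value2
Table Instance 1
A B C
10 11 12
12 13 11
Table Instance 2
X Y
1 3
"""
props, tables = parse_custom_file(sample)
```

In F# the dictionaries would more naturally become one record type per table, as suggested above.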

How do I eliminate data that falls within a range of other data in the same query?

I have a query that returns results that describe a numeric range, with some of these data falling within the range of other data returned in the same query. How can I easily eliminate those?
I have the following data:
Code Start End
----- ------- -------
abc 1 1
abc 2 2
abc 3 8
abc 4 4
abc 5 5
xyz 1 1
xyz 2 5
xyz 3 3
In this case, where code is "abc", there are two rows: start=4,end=4 and start=5,end=5. But preceding them is a row where start=3,end=8. So both of those rows should not be returned in my result set.
I can do this with a temp table, cursor, etc. But I'd like to know if there's an elegant way to do this within the query.
I would do this with a WHERE NOT EXISTS() clause.
The EXISTS() function would be to check for another row where the Start is less than or equal to my Start and the End is greater than or equal to my End.
There are no exact duplicate rows in your sample data, but if it's possible for them to exist in your real data, you will have to consider what you want to do with those as well.
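A minimal sketch of that NOT EXISTS check, run from Python against an in-memory SQLite database so it can be tried without a server. The table name ranges and the column names range_start and range_end are assumptions (the question does not name them), and the query treats "another row" as a row with the same code but a different start or end, so exact duplicates would both be kept.

```python
import sqlite3

# Hypothetical table and column names; the question only shows Code/Start/End.
QUERY = """
SELECT r.code, r.range_start, r.range_end
FROM ranges AS r
WHERE NOT EXISTS (
    SELECT 1
    FROM ranges AS o
    WHERE o.code = r.code
      AND o.range_start <= r.range_start
      AND o.range_end   >= r.range_end
      -- exclude the row itself (and exact duplicates of it)
      AND (o.range_start <> r.range_start OR o.range_end <> r.range_end)
)
ORDER BY r.code, r.range_start
"""


def filter_contained(rows):
    """Return only the rows whose range is not covered by another row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ranges (code TEXT, range_start INTEGER, range_end INTEGER)"
    )
    conn.executemany("INSERT INTO ranges VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


data = [
    ("abc", 1, 1), ("abc", 2, 2), ("abc", 3, 8), ("abc", 4, 4), ("abc", 5, 5),
    ("xyz", 1, 1), ("xyz", 2, 5), ("xyz", 3, 3),
]
kept = filter_contained(data)
```

With the sample data from the question, the rows abc 4-4, abc 5-5 and xyz 3-3 are dropped because a wider row with the same code covers them.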

Unable to decode all information from Oracle RAW data

I have an application where I can upload files and add metadata to each file. This metadata is stored in a database, but parts of the added information are encoded somehow (sadly I have no access to the source code).
The raw representation of the metadata in the Oracle database is as follows:
00000009010000000000000000512005B69801505B000000010000000700000040000000010000000A0100000006496D616765000000003C000000010000000A010000000A696D6167652F706E670000000027000000030000000501000000010000000500000001010000000B64653A3132332E706E6700000002A8000000030000000501000000030000000700000001010000000E737461636B6F766572666C6F770000000042000000010000000A010000001844433078303166363565396420307830303033336433640000000A2600000001000000020100033D3D0000003E000000010000000A0100000021346266653539343939343631356333323861613736313431636337346134353900
Whereas the raw sequence
737461636B6F766572666C6F77
corresponds to
stackoverflow
The query
select UTL_RAW.CAST_TO_VARCHAR2(<raw_data>) from dual;
returns the string below:
Here the values of the metadata are shown. But the names/identifier of the properties are unreadable. The corresponding name/identifier of stackoverflow should be test or a foreign key to a table that contains test. The other data contains additional information about the file (like the checksum, title or mime type)
Is it possible to retrieve the unreadable data (identifier) from the raw string?
RAW columns do not always contain a string. Judging from the results, the content looks like binary data, more precisely a jpg file that has a string header in it among the binary information.
Converting it to a varchar will generate invalid character codes, which are displayed as rectangular boxes.
What you are doing here with varchar is the equivalent of opening a binary file, e.g. a Word .doc or a .jpeg, in Notepad.
To be able to get the content you need to treat it as an image, not as a varchar.
You can obtain the jpg file by using PLSQL as described here:
http://www.dba-oracle.com/t_extract_jpg_image_photo_sql_file.htm
Alternatively, it is possible to get all the content without loss in a char datatype using the following:
select RAWTOHEX(<raw_data>) from dual;
This will return the whole content as a character value containing its hexadecimal equivalent, and it should not contain any invalid ANSI characters (the ones represented by a rectangular box).
Of course, you will no longer be able to read "stackoverflow" or any other text, since you will only get a sequence of hex values.
Your program will then need to convert it to binary/image and treat it properly.
Both "A01" and "101" are used to preface a 4-byte length followed by the text, which is null-terminated:
00000009 010000000000000000512005B69801505B000000010000000700000040000000010000000A01
00000006 496D61676500 Image
0000003C 000000010000000A01
0000000A 696D6167652F706E6700 image/png
00000027 00000003000000050100000001000000050000000101
0000000B 64653A3132332E706E6700 de:123.png
000002A8 00000003000000050100000003000000070000000101
0000000E 737461636B6F766572666C6F7700 stackoverflow
00000042 000000010000000A01
00000018 444330783031663635653964203078303030333364336400
D C 0 x 0 1 f 6 5 e 9 d 0 x 0 0 0 3 3 d 3 d
00000A26 00000001000000020100033D3D0000003E000000010000000A01
00000021 346266653539343939343631356333323861613736313431636337346134353900
4 b f e 5 9 4 9 9 4 6 1 5 c 3 2 8 a a 7 6 1 4 1 c c 7 4 a 4 5 9
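The layout above can be checked with a small scanner. This is only a heuristic sketch in Python, not a decoder for the real (unknown) format: it assumes the 4-byte lengths are big-endian and include the trailing null byte, as the breakdown suggests, and it accepts a candidate only if everything before the null is printable ASCII. The function name extract_strings is hypothetical.

```python
import struct


def extract_strings(blob: bytes) -> list:
    """Find length-prefixed, null-terminated ASCII strings in a binary blob."""
    found = []
    pos = 0
    while pos + 4 <= len(blob):
        # Read a candidate 4-byte big-endian length at the current offset.
        (length,) = struct.unpack_from(">I", blob, pos)
        start, end = pos + 4, pos + 4 + length
        # The length must cover at least one character plus the null byte.
        if length >= 2 and end <= len(blob) and blob[end - 1] == 0:
            text = blob[start:end - 1]
            if all(0x20 <= b <= 0x7E for b in text):
                found.append(text.decode("ascii"))
                pos = end  # continue scanning after the string
                continue
        pos += 1  # no string here, try the next byte
    return found


# Excerpt copied from the RAW value in the question.
excerpt = bytes.fromhex(
    "000000010000000A0100000006496D616765"
    "000000003C000000010000000A010000000A696D6167652F706E6700"
)
strings = extract_strings(excerpt)
```

On this excerpt the scanner finds "Image" and "image/png". The bytes between the strings are skipped, so the property identifiers, if they are stored there, still have to be interpreted separately.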

SSIS Inserting Records into table in the same order in flat file

I have a flatfile that looks like the first set. I have a table with an auto incrementing primary key field. Using SSIS how can I guarantee when I import that data that it keeps the record order as specified in the flatfile? I'm assuming that when SSIS reads the file that it will keep that order as it inserts into the database. Is this true?
In File:
RecordType | Amount
5 1.00
6 2.00
6 3.00
5 .5
6 1.5
7 .8
5 .5
In a Database Table
ID | RecordType | Amount
1 5 1.00
2 6 2.00
3 6 3.00
4 5 .5
5 6 1.5
6 7 .8
7 5 .5
Just to be safe, I'd add a Sort Transformation to your SSIS package; you can choose the column you want sorted and how it's sorted. This should ensure the rows are read in the order you want.
The order doesn't matter in a table. It only matters in a query.
In my experience it will always load in the order of the input file if you are using an autoincrement ID that is also the clustered index.
Here is a similar discussion that has a couple of ideas, particularly preprocessing the file or using a script component as the source. You may want to take one of those routes, because the fact that it may behave the way you want by default does not mean it always will.
http://www.sqlservercentral.com/Forums/Topic1300952-364-1.aspx

Extract a .txt file to a .mat file

I am working with a publicly available database that contains four files, all of them .txt documents. How can I convert them to .mat format? Here is a simple example:
A.txt file
1 2 3 4 5 6
7 8 9 1 2 3
4 5 6 7 8 9
1 2 3 4 9 8
So I need to form a matrix with 4 rows and 6 columns. The data in the .txt file is separated by spaces, and the rows are separated by newlines. Typically the .txt documents that I will handle will have sizes such as 130x1000, 3200x58, etc. Can anyone please help me with this? The public database is available at : click link. Please download the dataset under the topic "Multimodal Texture Dataset".
You can load the .txt file into MATLAB:
load audio.txt
then save it:
save audio audio
(The first "audio" is the .mat file; the second "audio" is the name of the variable stored in it.)
Hope this helps.

Reusing inline data in gnuplot

Does anyone know how I can reuse inline data in gnuplot? I've been googling it and can't find anything; everything suggests entering the data again. Basically, I want to reuse the '-' file.
In place of a bare replot, you can use refresh if you're using gnuplot 4.3 or newer. If you actually want to add more data to be plotted, I think you're out of luck.
e.g.
plot '-' u 1:2
1 2
2 3
e
set label "Hello World!" at 1.5,2.5
refresh
Since I stumbled over this old question via Google...
There are two ways to have "inline data" (data in the gnuplot file):
the special filename '-', which reads the lines immediately following the plot command. This data can only be used once.
named datablocks with here documents, which can be reused:
$Data << EOD
0 0 0
1 1 1
2 2 4
3 3 9
4 4 16
EOD
plot $Data using 1:2 title 'linear' with linespoints, \
$Data using 1:3 title 'quadratic' with linespoints
See http://gnuplot.info/docs_5.5/loc3521.html