Does anyone know how I can reuse inline data in gnuplot? I've been googling it and can't find anything; everything suggests inputting the data again. Basically, I want to reuse the '-' file.
In place of a bare replot, you can use refresh if you're using gnuplot 4.3 or newer. If you actually want to add more data to be plotted, I think you're out of luck.
e.g.
plot '-' u 1:2
1 2
2 3
e
set label "Hello World!" at 1.5,2.5
refresh
Since I stumbled over this old question via Google...
There are two ways of having "inline data" (data in the gnuplot file):
the special filename '-', which reads the lines immediately following the plot command. This data can only be used once.
named datablocks with here documents, which can be reused:
$Data << EOD
0 0 0
1 1 1
2 2 4
3 3 9
4 4 16
EOD
plot $Data using 1:2 title 'linear' with linespoints, \
$Data using 1:3 title 'quadratic' with linespoints
See http://gnuplot.info/docs_5.5/loc3521.html
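Since the named datablock stays defined after the plot, it can also be reused by later commands. A minimal sketch (assuming gnuplot 5.0 or newer, where datablocks are available):

stats $Data using 3 nooutput                               # reuse the same inline data for statistics
plot $Data using 1:3 title 'quadratic again' with lines    # plot it again without retyping the data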
I want to work with the sentiment140 dataset for a sentiment analysis task. As far as I saw, it normally contains the following labels:
0 and 4 for negative and positive sentences
2 for neutral sentences
which I found when looking at the dataset on its website:
https://huggingface.co/datasets/sentiment140#data-fields
But after importing it in my notebook, it tells me that it contains just two labels:
0 for negative
4 for positive
So how do I get the full dataset with all three labels?
You are correct that the HuggingFace sentiment140 dataset only contains 2 labels in the training set (0 and 4); however, the test set contains all 3 labels (0, 2 and 4).
You could open a discussion on the dataset's page to ask the authors if this is normal.
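To double-check this yourself, something like the following sketch should do it (assuming the label column is called sentiment, as described on the dataset card; adjust the name if your copy differs):

from collections import Counter
from datasets import load_dataset

ds = load_dataset("sentiment140")
for split in ds:                                   # e.g. "train" and "test"
    print(split, Counter(ds[split]["sentiment"]))
# the train split should report only labels 0 and 4, while the test split also contains 2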
I have an application where I can upload files and add metadata to them. This metadata is stored in a database, but parts of the added information are encoded somehow (sadly I have no access to the source code).
The raw representation of the metadata in the Oracle database is as follows:
00000009010000000000000000512005B69801505B000000010000000700000040000000010000000A0100000006496D616765000000003C000000010000000A010000000A696D6167652F706E670000000027000000030000000501000000010000000500000001010000000B64653A3132332E706E6700000002A8000000030000000501000000030000000700000001010000000E737461636B6F766572666C6F770000000042000000010000000A010000001844433078303166363565396420307830303033336433640000000A2600000001000000020100033D3D0000003E000000010000000A0100000021346266653539343939343631356333323861613736313431636337346134353900
Whereas the raw sequence
737461636B6F766572666C6F77
corresponds to
stackoverflow
The query
select UTL_RAW.CAST_TO_VARCHAR2(<raw_data>) from dual;
returns a string in which the values of the metadata are readable, but the names/identifiers of the properties are not. The corresponding name/identifier of stackoverflow should be test, or a foreign key to a table that contains test. The other data contains additional information about the file (such as the checksum, title, or MIME type).
Is it possible to retrieve the unreadable data (identifier) from the raw string?
RAW columns do not always contain a string; from the results, it looks like the content is binary data, more precisely a jpg file that has string headers embedded among binary information.
Converting it to a varchar will generate invalid character codes, which are rendered as rectangular boxes.
What you are doing here with varchar is the equivalent of opening a binary file, e.g. a winword.doc or even a .jpeg, in Notepad.
To get the content, you need to treat it as an image, not as a varchar.
You can obtain the jpg file by using PLSQL as described here:
http://www.dba-oracle.com/t_extract_jpg_image_photo_sql_file.htm
Alternatively, it is possible to get all the content without loss into a character datatype using the following:
select RAWTOHEX(<raw_data>) from dual;
This will return the whole content as a character value containing its hexadecimal equivalent, and it should not contain any invalid ANSI characters rendered as rectangular boxes.
Of course, you will no longer be able to read "stackoverflow" or any other text directly, since you will only get a sequence of hex values.
You will then need to convert it back to binary/image in your program and handle it accordingly.
Both "A01" and "101" are used to preface a 4 byte length followed by the Text, which is null terminated
00000009 010000000000000000512005B69801505B000000010000000700000040000000010000000A01
00000006 496D61676500 Image
0000003C 000000010000000A01
0000000A 696D6167652F706E6700 image/png
00000027 00000003000000050100000001000000050000000101
0000000B 64653A3132332E706E6700 de:123.png
000002A8 00000003000000050100000003000000070000000101
0000000E 737461636B6F766572666C6F7700 stackoverflow
00000042 000000010000000A01
00000018 444330783031663635653964203078303030333364336400 DC0x01f65e9d 0x00033d3d
00000A26 00000001000000020100033D3D0000003E000000010000000A01
00000021 346266653539343939343631356333323861613736313431636337346134353900 4bfe594994615c328aa76141cc74a459
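Based on that layout, a rough decoding sketch in Python (a heuristic inferred purely from this one sample, not a documented format) can pull the readable strings out of the RAWTOHEX output:

# the RAW value from the question, split at the field boundaries shown above
raw_hex = (
    "00000009010000000000000000512005B69801505B000000010000000700000040000000010000000A01"
    "00000006496D61676500"
    "0000003C000000010000000A01"
    "0000000A696D6167652F706E6700"
    "0000002700000003000000050100000001000000050000000101"
    "0000000B64653A3132332E706E6700"
    "000002A800000003000000050100000003000000070000000101"
    "0000000E737461636B6F766572666C6F7700"
    "00000042000000010000000A01"
    "00000018444330783031663635653964203078303030333364336400"
    "00000A2600000001000000020100033D3D0000003E000000010000000A01"
    "00000021346266653539343939343631356333323861613736313431636337346134353900"
)
data = bytes.fromhex(raw_hex)

strings = []
i = 0
while i + 6 <= len(data):
    # look for the 0A01 / 0101 markers that precede a 4-byte big-endian length
    if data[i:i + 2] in (b"\x0a\x01", b"\x01\x01"):
        length = int.from_bytes(data[i + 2:i + 6], "big")
        chunk = data[i + 6:i + 6 + length]
        # accept only NUL-terminated printable ASCII of the announced length
        if len(chunk) == length and chunk.endswith(b"\x00") and all(32 <= b < 127 for b in chunk[:-1]):
            strings.append(chunk[:-1].decode("ascii"))
            i += 6 + length
            continue
    i += 1

print(strings)   # ['Image', 'image/png', 'de:123.png', 'stackoverflow', ...]

On this sample it recovers the same strings as the manual breakdown above. Note that nothing resembling the property name test shows up, which fits the idea that the identifier is stored elsewhere (e.g. as a foreign key).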
After trying the whole day to get more into awk and arrays, my code now roughly does (I think) what I was hoping for: matching two files based on a common column and then adding another column from file 1 to file 2. This has been asked previously and I tried many different versions, but I have the impression that it now works more by coincidence than by design.
People already tried to help me in related cases
(how to use awk to add specific values to a column based on numeric ranges and print output of user-defined function in awk gives unexpected token error and more)
but the different solutions collided in my head and now there is a mess.
Though my code is somewhat working, it now prints out matching lines twice (?) and it is also quite slow. I'm sure there is a lot to optimize; could you give some hints on what my code is actually doing and how to improve it? This is only for one pair of files; I have about a thousand of these pairs.
contig_lengths_cut.txt (300,000 lines):
k141_157024 1 1011
k141_158290 1 462
k141_158291 1 1648
k141_158292 1 329
k141_158293 1 534
k141_158294 1 497
k141_158295 1 418482
k141_186288 1 324
k141_186289 1 340
k141_186290 1 390
k141_186291 1 206156
k141_186292 1 491
k141_186293 1 759
k141_186294 1 4885
k141_186295 1 2736
k141_185742 1 377
k141_185743 1 6775
k141_185744 1 301
gene_length.txt (50 to 300 lines)
k141_185743 1184 gene=phnM_10
k141_186291 1247 gene=phnM_11
k141_186291 1226 gene=phnM_12
k141_157024 350 gene=phnM_9
k141_158295 1160 gene=phnM_10
k141_158295 1145 gene=phnM_11
k141_247338 410 gene=phnM_1
my code:
awk 'NR==FNR {contig[$1]=$3; next}
     {for (k in contig)
        if ($3 ~ contig[k]) print $0, contig[$1]}' contig_lengths_cut.txt gene_length.txt
current output is:
# with the updated sample data it is not working at all; if I add more lines to the sample data, it works again... something is going spectacularly wrong
my desired output is:
k141_185743 1184 gene=phnM_10 6775
k141_186291 1247 gene=phnM_11 206156
k141_186291 1226 gene=phnM_12 206156
k141_157024 350 gene=phnM_9 1011
k141_158295 1160 gene=phnM_10 418482
k141_158295 1145 gene=phnM_11 418482
#k141_247338 410 gene=phnM_1 #no match, don't print
I assume that contig[$1]=$3 means (only for the first file) that the first column of the file is used as the index and the third column as the assigned value?
And for all such elements in the array contig, the third column of the second file is used for matching? That would not make much sense in my view. But if I use the first column instead, I get hundreds of identical entries, whereas as shown I get the desired number of lines.
Finally, I print the whole line of the second file plus the value stored under that index in the array, which represents the third column of the first file, correct?
Sorry for the mess; please help me understand what I'm doing here so I don't have to ask so frequently anymore ;-)
Your input samples don't provide enough data to test and reproduce your output, but I think I see what your issue is: you're doing a regex match on values instead of an exact match on keys. If you change your script to
awk 'NR==FNR {contig[$1]=$3; next}
$1 in contig {print $0, contig[$1]}' contig_lengths_cut.txt gene_length.txt
it should work fine. However, it's not tested due to the lack of testable data.
In terms of speed, if your files are not sorted, this is about as fast as it will get. You could perhaps split file1 into chunks, do a parallel run of all file1 chunks against file2, and combine the results.
If you want to debug your original code, add contig[k] to the print statement.
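For example, the debug variant could look something like this (an untested sketch of your original loop, just with the matching key and value added to the output):

awk 'NR==FNR {contig[$1]=$3; next}
     {for (k in contig)
        if ($3 ~ contig[k]) print $0, contig[$1], "| matched key:", k, "value:", contig[k]}' contig_lengths_cut.txt gene_length.txt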
I am working with a publicly available database that contains four files; they are all .txt documents. How can I put them into .mat format? Here is a simple example:
A.txt file
1 2 3 4 5 6
7 8 9 1 2 3
4 5 6 7 8 9
1 2 3 4 9 8
So I need to form a matrix with 4 rows and 6 columns. The data in the .txt files is separated by a space delimiter, and the rows are separated by newlines. Typically the .txt documents I will handle have sizes like 130x1000, 3200x58, etc. Can anyone please help me with this? The public database is available at: click link. Please download the dataset under the topic "Multimodal Texture Dataset".
You can load the .txt file into MATLAB:
load audio.txt
then save them
save audio audio
(the first "audio" is the ".mat" file, the second "audio" is the name of the variable stored in it.
Hope this helps.
I have a custom file which contains data in a format like below
prop1: value1
prop2: value2
prop3: value 2
Table Instance 1
A B C D E
10 11 12 13 14
12 13 11 12 20
Table Instance 2
X Y Z
1 3 4
3 4 0
Table Instance 3
P R S
2 3 5
5 5 0
I want to be able to parse this file and map the contents to a POCO. I was really excited about working with the CSV type provider in F#, but then I quickly realized that it might not be possible to use it in my case. Do I have to write my own parser in this case? (Iterate through each line and parse the values into the appropriate properties of the POCO.)
Thanks
Kay
If that's a one-off file format, I would just write a parser by hand. Split the file into separate tables, throw away the title and header, then String.Split each row and parse the resulting array into a record type specific for the table.
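For instance, a rough sketch of that hand-rolled approach (the type and function names are made up; it keeps the header row in the record rather than discarding it, and it ignores the leading prop: value lines):

type TableInstance = { Headers : string[]; Rows : int[][] }

let split (s : string) =
    s.Split([| ' '; '\t' |], System.StringSplitOptions.RemoveEmptyEntries)

let parseTables (lines : string list) =
    lines
    |> List.filter (fun l -> l.Trim() <> "")
    // start a new block at every "Table Instance" title; lines before the
    // first title (the "prop: value" part) are ignored here
    |> List.fold (fun blocks line ->
        if line.StartsWith "Table Instance" then [] :: blocks
        else
            match blocks with
            | current :: rest -> (line :: current) :: rest
            | [] -> blocks) []
    |> List.map List.rev
    |> List.rev
    |> List.map (fun block ->
        match block with
        | header :: rows ->
            { Headers = split header
              Rows = rows |> List.map (split >> Array.map int) |> List.toArray }
        | [] -> { Headers = [||]; Rows = [||] })

// usage: let tables = System.IO.File.ReadAllLines "data.txt" |> Array.toList |> parseTables

From there you can map each TableInstance onto whatever POCO you need.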
If that file format is more or less standardized and you expect that you'll need to parse it more often and in different contexts (and/or you're feeling adventurous), you can always write your own type provider.