Basically, I want to write data to a file at a particular position, and I don't want to load the data into memory to sort it. For example, if I have this in a file:
FILE.txt
Andy dsoza
Arpit Raj
Karishma Shah
Pratik Mehta
Zppy andre
If I want to insert a contact "Barbie Patel", I will read the first letter of every line in the file, so Barbie should be inserted after Arpit and before Karishma. The file after editing should be:
FILE.txt
Andy dsoza
Arpit Raj
Barbie Patel
Karishma Shah
Pratik Mehta
Zppy andre
But fseek only takes me to that position; it doesn't help me insert when I use fprintf/fwrite/putc. These calls replace the bytes at that position rather than inserting before them.
Loading all the data into memory and sorting it there would not be good if I have a lot of contacts in the future.
You won't be able to insert directly into a file without loading data into memory. How you manage a larger file comes down to choosing an efficient design.
One approach would be to use different files.
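A minimal sketch of that idea in C (the temp-file name, buffer size, and the case-sensitive strcmp ordering are all assumptions, not part of your code): copy the contacts line by line into a temporary file, write the new contact just before the first existing line that sorts after it, then swap the files.

#include <stdio.h>
#include <string.h>

/* Insert `contact` (newline-terminated) into the sorted file by
   copying everything into a temp file and renaming it back. */
int insert_contact(const char *path, const char *contact)
{
    FILE *in  = fopen(path, "r");
    FILE *out = fopen("contacts.tmp", "w");
    char line[256];
    int inserted = 0;

    if (!in || !out) {
        if (in)  fclose(in);
        if (out) fclose(out);
        return -1;
    }

    while (fgets(line, sizeof line, in)) {
        /* First existing line that sorts after the new contact:
           write the new contact just before it. */
        if (!inserted && strcmp(line, contact) > 0) {
            fputs(contact, out);
            inserted = 1;
        }
        fputs(line, out);
    }
    if (!inserted)               /* new contact sorts last */
        fputs(contact, out);

    fclose(in);
    fclose(out);
    remove(path);                /* rename() won't overwrite on some systems */
    return rename("contacts.tmp", path);
}

Called as insert_contact("FILE.txt", "Barbie Patel\n"), this keeps only one line in memory at a time, at the cost of rewriting the whole file once per insertion.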
You cannot insert data into the middle of a file. You have to first read everything in the file from that point to the end, then overwrite starting at that point with the new data, and finally write back what you read.
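In terms of the stdio calls already mentioned, that looks roughly like this (a sketch only; it assumes the tail of the file fits in memory and skips most error checking):

#include <stdio.h>
#include <stdlib.h>

/* Insert `text` at byte offset `pos` by rewriting the tail of the file. */
int insert_at(const char *path, long pos, const char *text)
{
    FILE *fp = fopen(path, "r+");
    if (!fp)
        return -1;

    fseek(fp, 0, SEEK_END);
    long end = ftell(fp);
    long tail_len = end - pos;

    char *tail = malloc(tail_len);
    if (!tail) { fclose(fp); return -1; }

    fseek(fp, pos, SEEK_SET);
    fread(tail, 1, tail_len, fp);    /* everything after the insertion point */

    fseek(fp, pos, SEEK_SET);
    fputs(text, fp);                 /* overwrite from pos with the new line... */
    fwrite(tail, 1, tail_len, fp);   /* ...then put the old bytes back after it */

    free(tail);
    return fclose(fp);
}

For a very large file you would copy the tail in chunks instead, or fall back to the temporary-file approach above, since this version still holds the tail in memory.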
I am trying to load a dataset in Weka. I have tried many things, such as converting it to ARFF format, fixing the commas, etc., but it all failed. Could any of you give me a working solution, or load this dataset in the correct format?
Here is a link to the dataset
Instead of using Weka's functionality for reading CSV files, you could use ADAMS (developed at the same university; I'm the lead developer).
Download the adams-ml-app snapshot and then use the Weka Investigator to load/save the file:
Load it as ADAMS Spreadsheets (.csv, .csv.gz)
Save it as Arff data files (.arff, .arff.gz) or Simple ARFF data files (.arff, .arff.gz)
The Reviews column contains an erroneous 3.0M, which prevents it from becoming numeric.
If you want an introduction to the Weka Investigator, take a look at my talk from the Weka User Conference 2021: Taking Weka to the next level with ADAMS.
There are too many issues with lines in this file.
In line 23, I eliminated the odd-looking brackets.
I removed all single quotes (')
I eliminated all repeated double quotes ("")
In line 10474, the first two fields (before the number) didn't seem to be separated, so I added a comma.
This allowed the file to go through initial screening, but...
The file contains a lot of odd emojis. I started to eliminate them one by one, but there are clearly more of these than I wish to deal with.
Each time I got rid of one, it would read farther into the file, then stop at the next one.
If I just try to read the top of the file, the first 20 lines before we get to any of these problems, it reads fine.
My partial editing can be found here: https://www.dropbox.com/s/ij707mb23dt1jvz/googleplaystore3.csv?dl=0
I think if you clean up the remaining emojis, the file should be usable.
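If you'd rather not hunt them down by hand, a small one-pass filter that simply drops every non-ASCII byte is enough to get rid of the emojis (a rough sketch in C; note it will also strip any legitimate accented characters in the data):

#include <stdio.h>

/* Copy stdin to stdout, keeping only plain ASCII bytes.
   Usage: ./strip_ascii < googleplaystore.csv > cleaned.csv */
int main(void)
{
    int c;
    while ((c = getchar()) != EOF) {
        if (c < 128)   /* drop emoji and other multi-byte UTF-8 sequences */
            putchar(c);
    }
    return 0;
}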
Recently I got a requirement to read a file and insert its records into a DB. But when I looked at the file, it is not consistent, and the source team is not in a position to alter it in any way. So, is there a way to read it?
Example of a File:
Record1,Record2,Record3,Record4
Record1,Record2,Record3,Record4
Record1,Record2
Record1,Record2,Record3,Record4
Record1,Record2,Record3,Record4
Record1,Record2
Record1,Record2,Record3,Record4
Any inputs will be appreciated.
Regards,
Vishnu.
If I understand correctly, you have a comma-separated list of values and each new line forms a dataset.
You can use the MFL Format Builder with a comma as the delimiter and generate a standardized XML document from your data.
This link has a good tutorial to get you started.
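If you end up parsing the file yourself instead of going through MFL, the varying number of fields per line is easy to tolerate: split each line on commas and treat missing trailing fields as absent. A rough C sketch (the field count of 4 and the pipe-separated output are assumptions for illustration; in real code you would bind the fields to an INSERT statement):

#include <stdio.h>
#include <string.h>

#define MAX_FIELDS 4

int main(void)
{
    char line[1024];

    /* Read stdin line by line; each record may have fewer than MAX_FIELDS fields. */
    while (fgets(line, sizeof line, stdin)) {
        line[strcspn(line, "\r\n")] = '\0';          /* strip the newline */

        char *fields[MAX_FIELDS] = {0};
        int n = 0;
        for (char *tok = strtok(line, ","); tok && n < MAX_FIELDS;
             tok = strtok(NULL, ","))
            fields[n++] = tok;

        /* Missing fields simply stay NULL in the array. */
        for (int i = 0; i < MAX_FIELDS; i++)
            printf("%s%s", fields[i] ? fields[i] : "",
                   i < MAX_FIELDS - 1 ? "|" : "\n");
    }
    return 0;
}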
I am currently having issues exporting data from my cursor into a txt file. Unfortunately, the txt file has to look a certain way. I have my cursor, which I just named "Export", and I have to push it into a txt file so that it looks like this. The asterisk also has to be there.
*Col1,Col2
Col1,Col2,Col3,Col4,Col5,Col6,Col7,Col8,Col9,Col10.
and repeat about 647 times. I have been searching for a good way to do this, but I feel as if my end result is too specific, which I hope isn't true. Any help would be immensely appreciated.
set textmerge on noshow
set textmerge to myfile.txt
select export
scan
\\*<<col1>>,<<col2>><<chr(13)>>
\\<<Col1>>,<<Col2>>,<<Col3>>,<<Col4>>,<<Col5>>,<<Col6>><<chr(13)>>
endscan
set textmerge off
set textmerge to
The line that stops at Col6 you would obviously continue in the same way up to Col10; I truncated it to fit here.
I currently have the following sample text file:
http://pastebin.com/BasTiD4x
and I need to duplicate the CDS blocks. Essentially the line that has the word "CDS" and the 4 lines after it are part of the CDS block.
I need to insert this duplicated CDS block right before a line that says CDS, and I need to change the word CDS in the duplicated block to mRNA.
Of course, this needs to happen every time there is an instance of CDS.
A sample output would be here:
http://pastebin.com/mEMAB50t
Essentially for every CDS block, I need an mRNA block that says exactly the same thing.
I would appreciate help with this; I've never done 4-line insertions and replacements before.
Thanks,
Adrian
Sorry for the very specific question. Here is a working solution provided by someone else:
perl -ne 'if (! /^\s/){$ok=0;if($mem){$mem=~s/CDS/mRNA/;print $mem;$mem="";}}$ok=1 if (/\d+\s+CDS/);if($ok){$mem.=$_};print;' exemple
I need a text file that contains every title (the title of each topic/item), each on its own line.
How can I do this, or create this file, if I have already downloaded a Freebase RDF dump?
If possible, I also need a separate text file with each topic's/item's description, with each description on its own line.
How can I do that?
I would greatly appreciate it if someone could help me make either of these files from a Freebase rdf dump.
Thanks in Advance!
Filter the RDF dump on the predicate/property ns:type.object.name. If you only want a particular language, also filter by that language, e.g. #en.
EDIT: I missed the second part about descriptions being desired as well. Here's a three-part regex which will get you all the lines with:
English names
English descriptions
a type of /common/topic
Combining the three is left as an exercise for the reader.
zegrep $'\tns:(((type\\.object\\.name|common\\.topic\\.description)\t.*#en)|type\\.object\\.type\tns:common\\.topic)\\.$' freebase-rdf-2013-06-30-00-00.gz | gzip > freebase-rdf-2013-06-30-00-00-names-descriptions.gz
It seems to have a performance issue that I'll have to look at. A simple grep of the entire file takes ~11 min on my laptop, but this has been running for several times that. I'll look into it later though...