Is it possible to output formatted text in a text file? - c

I'm working on a little text editor. My application is a WinAPI one written in C. The idea is to write text in a large textbox (like in Notepad); when I press a button, it takes all the text into a buffer, formats it according to some rules, and then writes it to a .txt file.
For example, if my input is:
Anne \red(got) \blue(\bold(apples)) and \italic(\bold(snails!))
After I parse it, is it possible to put it into a .txt file so that, when I open the file, I see the text with the formatting applied ("got" in red, "apples" in bold blue, "snails!" in bold italic)?
I want to thank everyone for their time. I got exactly the answer I wanted. Everyone here rocks.

I think that you are programming for fun, just for the pleasure of it, and with a view to learning more. If that is the objective, then it is okay to invent your own formats and try out your own solutions.
The problem as presented can be read in two ways:
Do the formatting results need to be shown in the editor itself?
Or do you just need to produce something that is going to be rendered in an external program?
If you are after the first possibility, then you need a Win32 (given your environment) component that can display the formatting. That component is RichEdit, and it implements RTF, a markup format that can be saved to a text file and is more or less standard.
If you have the second possibility in mind, then you can choose from a variety of markup formats. You would just be creating a text editor, probably with some helpers that write part of the commands for the user. For example, you could be creating an HTML editor or an RTF editor.
There is a third possibility, though. You create your own markup, and when saving, you translate that markup to HTML and then open the document in a web browser.
Say that you have:
\bold(hello), world.
You would translate that to:
<html><body><b>hello</b>, world.</body></html>
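As a rough illustration of that translation step, here is a minimal sketch in Python (the asker works in C, but the logic carries over). The function name to_html, the TAGS and COLORS tables, and the choice of a span with an inline color style are all assumptions made for this example, not part of the original answer. It also assumes that a ")" inside formatted text always closes the innermost command.

```python
import html

# Hypothetical mapping from markup commands to HTML tags (assumption).
TAGS = {"bold": ("<b>", "</b>"), "italic": ("<i>", "</i>")}
# Hypothetical set of color commands, rendered as inline styles (assumption).
COLORS = {"red", "blue"}


def to_html(src):
    """Translate \\name(...) markup into a small HTML document."""
    out = []
    closers = []  # stack of closing tags for the commands currently open
    i = 0
    while i < len(src):
        ch = src[i]
        if ch == "\\":
            # Read the command name that follows the backslash.
            j = i + 1
            while j < len(src) and src[j].isalpha():
                j += 1
            name = src[i + 1:j]
            known = name in TAGS or name in COLORS
            if known and j < len(src) and src[j] == "(":
                if name in TAGS:
                    opener, closer = TAGS[name]
                else:
                    opener = '<span style="color:%s">' % name
                    closer = "</span>"
                out.append(opener)
                closers.append(closer)
                i = j + 1  # skip past the opening parenthesis
                continue
        if ch == ")" and closers:
            out.append(closers.pop())  # close the innermost command
        else:
            out.append(html.escape(ch))  # ordinary text, escaped for HTML
        i += 1
    return "<html><body>" + "".join(out) + "</body></html>"


print(to_html(r"\bold(hello), world."))
```

Unknown commands are passed through as literal text, and unbalanced parentheses are not reported; a real editor would probably want to handle both.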
The possibilities, as you can see, are infinite.
Hope this helps.

Related

How can I detect visual blocks in a PDF?

I'm trying to OCR resumes. My first problem is, before OCR, to get the main blocks of a document.
Since all resumes have "visual blocks" (professional experience, skills, languages, hobbies, and so on), I wonder whether there is an open source solution that splits a document into such blocks regardless of the layout design (I assume that is where some kind of AI comes in).
Thank you
First, decompress the content streams of your PDF; they are usually compressed with zlib (Flate).
You will then be able to see the PDF in a readable format - https://web.archive.org/web/20141010035745/http://gnupdf.org/Introduction_to_PDF#A_first_example
The PDF format is somewhat similar to PostScript.
Also try converting your PDF to PostScript to see how the contents are arranged.
You can decompress the PDF using pdf-parser: https://blog.didierstevens.com/2008/10/30/pdf-parserpy/
Try this as well - https://gist.github.com/averagesecurityguy/ba8d9ed3c59c1deffbd1390dafa5a3c2
Once you can see how your data is laid out, you can start applying algorithms to extract more meaning.
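The decompression step can be tried with nothing but the Python standard library. The sketch below is not a PDF parser: it simply looks for stream ... endstream sections and tries to inflate each one with zlib, skipping those that are not zlib data. The function name inflate_streams is made up for this example, and the approach assumes the streams use the Flate filter and are not encrypted.

```python
import re
import zlib

# Raw bytes between the "stream" and "endstream" keywords of a PDF object.
STREAM = re.compile(rb"stream\r?\n(.*?)endstream", re.S)


def inflate_streams(pdf_bytes):
    """Return the decompressed content of every zlib-compressed stream."""
    results = []
    for match in STREAM.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj() ignores the line break left before "endstream".
            results.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            # Not zlib data (for example a JPEG image stream): skip it.
            continue
    return results


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as f:
        for number, data in enumerate(inflate_streams(f.read())):
            print("--- stream", number, "---")
            print(data.decode("latin-1"))
```

Run it as a script with the path of a PDF as its only argument. Text-positioning operators in the output show where each piece of text is placed on the page, which is the raw material for finding blocks.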

Bank for text files of different kinds?

This isn't really a programming question, but I want to do a little programming project, and for it I need a big text file that looks like this:
one
two
three
four
...
thirteen
fourteen
...
one hundred
...
The longer the list the better.
Is there some website that has loads of different text files such as this one available for free?
You don't specifically need a website to create a text file for you; you can create one yourself. Try using Notepad if you're on Windows.
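For a list like the one in the question, writing it by hand gets tedious, so a short script is another way to create the file yourself. The following Python sketch spells out the numbers from 1 to 999; the names number_to_words and write_number_list, the default file name, and the hyphenated spelling ("twenty-one") are choices made for this example.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer between 1 and 999 in English."""
    if not 1 <= n <= 999:
        raise ValueError("only 1..999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def write_number_list(path="numbers.txt", limit=999):
    """Write one spelled-out number per line, from 1 up to limit."""
    with open(path, "w") as f:
        for n in range(1, limit + 1):
            f.write(number_to_words(n) + "\n")
```

Calling write_number_list() produces a 999-line file; extending the function to thousands and beyond follows the same divmod pattern.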
Do you specifically need a text file that lists sequential numbers, or just a big text file with lots of lines of data?
If you just want a big file of randomish data, you could use a lorem ipsum generator such as http://www.lipsum.com/
Just enter how much data you want in the file and download.

format document WebPage via Windows Forms Application?

So, I'm building a Windows Forms application that uses a StreamReader/StreamWriter to read each line of the .aspx, .ascx and .master pages on our ASP.NET website. It then removes certain properties and such from the controls through string manipulation, and writes the result back (overwriting the page's markup with the edited markup). The problem is that some of these pages are being written as one continuous line.
I've been unable to find any way to call Visual Studio's 'Format Document' function. I found this question that would likely accomplish my goal if I weren't trying to do this from my Windows Forms application (as it's an automated process).
Any tips or points in the right direction would be appreciated.
A quick-and-dirty solution (which I don't recommend) would be:
content = content.Replace("></", ">\0/").Replace("><", ">\n\t<").Replace(">\0/", "></");
Here content is the string that holds the page markup.
The first and last replacements keep the second one from inserting a newline inside something like <tag></tag>: the pair is temporarily rewritten with a "\0" marker, which is assumed not to occur in the markup. The marker must not itself contain "><"; a marker such as ">></" would be split by the second replacement and never restored. The above code still has some flaws. Something like <tag1><tag2 /></tag1> will not be formatted correctly, because the closing </tag1> stays on the same line as <tag2 />. You could avoid this by pre-replacing /></ with something you can safely re-replace at the end.
You may also want to replace \n with \r\n.
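The pre-replacement idea can be sketched as follows. This is a Python rendering of the same string-replacement chain, not the answerer's code; the function name break_tags and the two control characters used as placeholders are assumptions (they must not occur in the markup), and the sketch shares the general fragility of formatting markup with plain string replacement.

```python
SELF_CLOSE = "\x00"  # placeholder for "/></" (self-closing tag before a closing tag)
EMPTY_PAIR = "\x01"  # placeholder for "></" (opening tag directly followed by its close)


def break_tags(content):
    """Insert line breaks between adjacent tags in single-line markup."""
    # Protect the two sequences that must not be split the default way.
    content = content.replace("/></", SELF_CLOSE)
    content = content.replace("></", EMPTY_PAIR)
    # Every remaining "><" is a boundary between tags: break and indent.
    content = content.replace("><", ">\n\t<")
    # Restore the protected sequences.
    content = content.replace(EMPTY_PAIR, "></")
    content = content.replace(SELF_CLOSE, "/>\n</")
    return content


print(break_tags("<tag1><tag2 /></tag1>"))
```

The order matters: "/></" has to be protected before "></", because the shorter sequence is contained in the longer one.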

find part of rtf file with bold or italic

I'm new to Stack Overflow, so forgive me if I make any mistakes.
I'm new to programming and scripting, although I have played around a little with Python and understand the basics of FileMaker Pro.
This is my problem: I have a full database that I built over the years. "Database" is just a manner of speaking because it is actually a huge number of RTF files with topics inside them. Now that I've built a real database, I want to transfer my data from one to the other. Just one table.
The real problem is: in my old RTF days, I used to store my data in an easy-to-view manner, meaning that all my titles were bold/italic/underlined and the text itself wasn't. So I have approximately 200 RTF files, each with 10-20 (sub)topics, waiting to be transferred to a two-column table (title; content).
I would appreciate it if anyone has a better idea than mine. My idea was to run a script that finds the bold/italic text, copies it to one table field, finds the non-bold text, copies it to the other field, and so on. But I'm unable to find the answer to my simple question: how do I search for (and select) bold text?
I'd like to use AppleScript (it's what I'm sort of comfortable with), but I could use something else.
You can try something like:
tell application "TextEdit"
set boldText to attribute runs of text of document 1 whose font contains "Bold"
end tell
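Since the question mentions Python as an option, here is a rough alternative sketch that works on the RTF source directly. It is not a full RTF reader: it only tracks the \b / \b0 control words and group nesting, skips a few header groups, and ignores italic, underline and hex-escaped characters. The names TOKEN, SKIPPED, add_text and split_runs are invented for this example.

```python
import re

TOKEN = re.compile(
    r"\\([a-zA-Z]+)(-?\d+)? ?"  # control word with optional numeric argument
    r"|\\'[0-9a-fA-F]{2}"       # hex-escaped character (ignored in this sketch)
    r"|\\(.)"                   # control symbol such as \{ or \*
    r"|([{}])"                  # group start or end
    r"|([^\\{}]+)",             # plain text
    re.S,
)
# Header groups whose contents are not document text.
SKIPPED = {"fonttbl", "colortbl", "stylesheet", "info"}


def add_text(runs, bold, text):
    """Append text to the last run if the bold state matches, else start a new run."""
    if not text:
        return
    if runs and runs[-1][0] == bold:
        runs[-1] = (bold, runs[-1][1] + text)
    else:
        runs.append((bold, text))


def split_runs(rtf):
    """Return a list of (is_bold, text) runs found in an RTF string."""
    runs, stack = [], []
    bold = skip = False
    for match in TOKEN.finditer(rtf):
        word, arg, symbol, brace, text = match.groups()
        if brace == "{":
            stack.append((bold, skip))   # formatting is scoped to the group
        elif brace == "}":
            if stack:
                bold, skip = stack.pop()
        elif word:
            if word in SKIPPED:
                skip = True
            elif word == "b":
                bold = arg != "0"        # \b turns bold on, \b0 turns it off
            elif word == "plain":
                bold = False
            elif word == "par" and not skip:
                add_text(runs, bold, "\n")
        elif symbol:
            if symbol == "*":
                skip = True              # ignorable destination group
            elif symbol in "{}\\" and not skip:
                add_text(runs, bold, symbol)
        elif text and not skip:
            # Raw line breaks in the RTF source carry no meaning.
            add_text(runs, bold, text.replace("\r", "").replace("\n", ""))
    return runs


sample = (r"{\rtf1\ansi{\fonttbl{\f0 Helvetica;}}\f0 {\b Title one}\par "
          r"Body text here.\par \b Title two\b0\par More body.}")
titles = [text for is_bold, text in split_runs(sample) if is_bold]
print(titles)
```

Pairing each bold run with the non-bold run that follows it would give the (title; content) rows for the table.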

Huge amount of plaintext data for parsing experiment

I am developing a parser in Ruby which parses some nonuniform text data. Can anybody tell me where I can get a large amount of plaintext data for that?
Here you'll find a list of many sources:
http://www.quora.com/Data/Where-can-I-get-large-datasets-open-to-the-public
And my favorite is:
http://ftp.sunet.se/mirror/archive/ftp.sunet.se/pub/tv+movies/imdb/
You could scrape Wikipedia (or just run a bunch of it through lynx -dump). That would also give you a vast source of non-English text. Project Gutenberg would be another good source of large amounts of plain text.

Resources