ParaView - Dataset locations of built-in demos / Sources menu datasets

ParaView ships with some examples, which appear in the Sources menu: Box, 2D Glyph, Sphere, Outline, and so on.
I'm trying to figure out how to get at these datasets, which should be .vtk files (or .html files). Does anyone know where the .vtk files for each of them are? If they are not available, is there an option to extract the points, or to convert my visualization into a .vtk (or .html) file? I've been trying to find out, with no success so far. Thanks in advance.

I found the answer. With the source selected, I just have to choose File -> Save Data and save it as a .vtk file. How did I not see this before?

Related

Can't use PNG files larger than 10 KB in PDF generation

I am trying to generate PDF files using renderToStaticMarkup from the react-dom/server library, and every time I use PNG files larger than 10 KB they are not displayed. I read the documentation looking for workarounds, but found nothing. It does not accept other file types such as .svg either. Is there any way for me to add good-quality images to the PDFs, or do I have to go without them? Thanks for all the answers in advance!
PS. When I load images from external links, they display properly.

Machine learning: file parsing and prediction class file

Good morning all,
I am currently working on a Machine Learning project whose goal is supervised classification of a set of data. My data is a large number of PDF files, and each file has a specific class. The goal is to use these files as a training dataset in order to predict the class of new files.
My problem is that I don't know how to build my training dataset. The classification algorithm must train on the content of each file, but my training data frame only holds the class and the name of each file. How do I include the content of each PDF file in my training data frame?
Thank you in advance for your help
PDF files are usually characterized by text, images, charts or whatever, and so they cannot be easily transformed into vectors of numbers that can be given to a machine learning algorithm. First you need to extract information of interest from your files.
In this regard, you might want to try first some libraries which can be used to extract information, and see what happens. For Python, a good start can be PyPDF2. You can find a tutorial here.
If this does not work as expected, my advice would be to try some OCR tools, which read the PDF directly as an image to extract information. In Python, pytesseract is one of the most used, but it is not the only one.
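As a rough illustration of how extracted text could end up in the training data, here is a minimal sketch using only the Python standard library. The names `build_training_rows`, `to_matrix`, and `extract_text` are hypothetical: `extract_text` stands in for whatever extraction step you choose (a PDF library or an OCR tool) and is passed in as a plain function, and the bag-of-words counts are just one simple way to turn text into numbers.

```python
import re
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_training_rows(
    labelled_files: Iterable[Tuple[str, str]],
    extract_text: Callable[[str], str],
) -> List[Dict]:
    """Build one row per file: name, class label, raw text, token counts.

    extract_text is an assumption: any function that takes a file path
    and returns the text content of that file.
    """
    rows = []
    for path, label in labelled_files:
        text = extract_text(path)
        rows.append(
            {
                "file": path,
                "label": label,
                "text": text,
                "counts": Counter(tokenize(text)),
            }
        )
    return rows


def to_matrix(rows: List[Dict]) -> Tuple[List[str], List[List[int]]]:
    """Turn the token counts into a vocabulary and a bag-of-words matrix."""
    vocab = sorted({tok for row in rows for tok in row["counts"]})
    matrix = [[row["counts"][tok] for tok in vocab] for row in rows]
    return vocab, matrix


if __name__ == "__main__":
    # In-memory stand-in for real PDF extraction, for demonstration only.
    fake_content = {
        "a.pdf": "Total due: 10 EUR",
        "b.pdf": "Annual report, total revenue",
    }
    rows = build_training_rows(
        [("a.pdf", "invoice"), ("b.pdf", "report")],
        fake_content.__getitem__,
    )
    vocab, matrix = to_matrix(rows)
    print(vocab)
    print(matrix)
```

The rows can then be loaded into a data frame, with the matrix as features and the "label" column as the target. In practice a TF-IDF weighting is often preferred over raw counts.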

Download wiki in one or more files

I would like to load data from Wikipedia for a task in Hadoop. I found some links: http://www.kiwix.org/wiki/Main_Page#Wikipedia_files, https://archive.org/details/enwiki-20160113. But I am not sure which format the data will be in or how to work with it. So the question is: does anybody know if it is possible to download Wikipedia as one or more txt files?
Well, you can download the most recent complete dumps of Wikipedia content here (another dump, 20161101, is in progress): https://dumps.wikimedia.org/enwiki/20161020/
Note that I don't think this includes the media files themselves, and that this link is for the English site only; the other sites are available there too.
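The article dumps are compressed XML rather than plain text, so some conversion is needed before line-oriented Hadoop input formats can use them. Below is a minimal sketch, assuming a dump file with `<page>`, `<title>`, and `<text>` elements; the function names and the one-page-per-line, tab-separated output are illustrative choices, not a standard tool. The extracted text is still wiki markup, not cleaned prose.

```python
import bz2
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterable, Iterator, TextIO, Tuple


def local_name(tag: str) -> str:
    """Strip the XML namespace, e.g. '{ns}page' becomes 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream: BinaryIO) -> Iterator[Tuple[str, str]]:
    """Stream (title, wikitext) pairs without loading the whole dump."""
    title, text = "", ""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = "", ""
            elem.clear()  # release the children of the finished page


def write_one_page_per_line(pages: Iterable[Tuple[str, str]], out: TextIO) -> int:
    """Write 'title<TAB>text' lines, collapsing whitespace inside the text."""
    count = 0
    for title, text in pages:
        flat = " ".join(text.split())
        out.write(f"{title}\t{flat}\n")
        count += 1
    return count


def convert_dump(dump_path: str, out_path: str) -> int:
    """Convert a .xml.bz2 dump (path supplied by the caller) to a text file."""
    with bz2.open(dump_path, "rb") as src, open(out_path, "w", encoding="utf-8") as dst:
        return write_one_page_per_line(iter_pages(src), dst)


if __name__ == "__main__":
    # Tiny in-memory sample standing in for a real dump.
    sample = (
        b"<mediawiki><page><title>A</title>"
        b"<revision><text>Hello\n[[world]]</text></revision></page></mediawiki>"
    )
    buffer = io.StringIO()
    write_one_page_per_line(iter_pages(io.BytesIO(sample)), buffer)
    print(buffer.getvalue(), end="")
```

Putting each page on a single line keeps records intact when Hadoop splits the input by line.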

Simplest way to modify a document "template" and print via WPF?

The Situation
I have a WPF program that I want to print several documents from using some data. Currently these documents exist as an Excel Spreadsheet and a Word Document.
What I have tried
Opened the XPS file (saved as XPS from Excel) as a zip, pulled out the Page (it only consists of a single page) and slapped it into a Window with a grid, just for a test. OMG!! Resources that could not be found and red squigglies everywhere. Fonts that are specified in the XPS are stored in an .odttf file, which WPF does not seem to like. Renaming it to .ttf doesn't appear to work. The layout appeared correctly, grid lines and whatnot, so that is hopeful.
What I really would rather not have to do
Recreate the files as a flow document, XPS, or other XAML objects by hand. The layout is pretty involved for the Excel Spreadsheet document. The Word doc is not so bad.
So really I just need to know: given the two inputs that I am using (Word Document, Excel Spreadsheet), how best would I get these into a format that I could easily print from WPF? Currently I have some code snippets that would allow me to open Excel, open the spreadsheet, put the data into the specified cells, print, issue a close command, check that the program has unloaded and kill it if necessary. I don't want to do that anymore though. It is messy and can be buggy, and it requires the Office Interop assemblies and other stuff to be installed.
I found an article here which explains some things that I had previously not realized.

File format of CF10-jpg

While working on a tool that exchanges the images of several third-party applications, and thus creates individual "skins" for those applications, I have stumbled across a jpg format about which I cannot seem to find any decent information.
When looking at it in a hex editor, it starts with the tag "CF10". Searching the internet has only turned up a tool that is able to handle this kind of file, without any additional information.
Does anyone have any further information about this type of jpg format?
file(1) should give you some useful information. You can also use ImageMagick's identify(1) program (optionally with the -verbose option) to get even more details about the file. See the example on that page for a good idea of what information it provides.
You could also try and see what the DROID identification tool says about that file.
CF stands for "Compression Factor". CF-10 means factor ten, and I don't think it's different from any "standard" jpeg.
DROID gives it as being a "JTIP (JPEG Tiled Image Pyramid)". Some info from http://www.bcr.org/cdp/digitaltb/digital_imaging/formats.html :
JTIP (JPEG Tiled Image Pyramid) is similar to GridPrix. It offers multiple layers of higher and higher resolutions. Each layer is further divided into tiles. A user can zoom into these tiles, or request a corresponding tile at a higher resolution.
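Alongside file(1) and DROID, the leading bytes can also be checked by hand. The sketch below compares the start of a file against the usual JPEG signature (FF D8 FF) and looks for that signature later in the data. It assumes, without any confirmation from the format's vendor, that a "CF10" file might wrap an ordinary JPEG stream behind a custom header; `sniff` and `sniff_file` are hypothetical helper names.

```python
from typing import Dict, Union

# A standard JPEG starts with the SOI marker FF D8, followed by another
# marker that begins with FF.
JPEG_SIGNATURE = b"\xff\xd8\xff"


def sniff(data: bytes) -> Dict[str, Union[bytes, bool, int]]:
    """Report the leading tag and where (if anywhere) a JPEG stream begins."""
    return {
        "tag": data[:4],  # e.g. b"CF10" for the files in question
        "is_plain_jpeg": data.startswith(JPEG_SIGNATURE),
        # Offset of the first JPEG signature, or -1 if none is found.
        # A hit is only a hint: the same bytes can occur by coincidence.
        "jpeg_offset": data.find(JPEG_SIGNATURE),
    }


def sniff_file(path: str) -> Dict[str, Union[bytes, bool, int]]:
    """Read a file (path supplied by the caller) and sniff its contents."""
    with open(path, "rb") as handle:
        return sniff(handle.read())


if __name__ == "__main__":
    # Synthetic bytes for demonstration: a made-up 16-byte header
    # followed by the start of a JPEG stream.
    wrapped = b"CF10" + b"\x00" * 12 + b"\xff\xd8\xff\xe0"
    print(sniff(wrapped))
```

If `jpeg_offset` is positive, copying the bytes from that offset into a new file and opening it in an image viewer is a quick way to test the assumption.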

Resources