How can I convert a shapefile (GIS) to text?
Or, how can I extract the information in a shapefile?
If you are willing to write some code (and you will probably need to anyway, as there is quite a bit of information in a shapefile, not all of it of interest for any given application), check out shapelib. It has bindings to many scripting languages. It also builds dbfdump, an executable for dumping the dbf files, and shpdump, which dumps the shp files.
Also of interest if you program in R is the maptools package.
Mapwindow (http://mapwindow.org/) is free, open source, and has a convert shp to csv feature.
The csv files it produces are a little strange, but you should be able to manage.
Windows only.
Look into the GDAL/OGR bindings.
I wrote a small app that can convert your shapefile to KML. Not exactly text but human readable and shareable across map applications. http://www.reimers.dk/files/folders/google_maps/entry328.aspx
Try the online MyGeodata GIS Data Formats and Coordinate Systems Converter. It uses the OGR library mentioned here and converts the most widely used GIS formats to any other GIS format, so in your case, for example, from ESRI Shapefile to GeoJSON, GML, CSV or other text-based formats.
There is a web page to view the contents:
http://webprocessresults.com/pgsActv/ShpDump.aspx
The same site offers a free desktop (windows) version:
http://webprocessresults.com/svcs/FreeStuff/FreeStuff.aspx
If you are using ArcGIS and the shapefile is composed of several files, you can try just opening the .dbf file in Excel and saving it in another format (e.g. csv). I've done this plenty of times and haven't had any ill effects, and it's a pretty quick and easy method of converting your shapefiles or making drastic edits: save as csv, then reimport into GIS and save as a new shapefile. I will say this is an inelegant solution though ;)
You can find most of the shp (Shape File) format detailed here: http://en.wikipedia.org/wiki/Shapefile. The full specification is here: http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf.
The shp file format is very simple, but keep in mind that the length fields count 16-bit words, not 8-bit bytes. If you forget this, you will spend a bit of time debugging what is going wrong when trying to parse out the records.
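As a rough illustration of the word-versus-byte point, here is a minimal Python sketch that reads the 100-byte main header and walks the record headers of a .shp file already loaded into memory. The field offsets follow the ESRI whitepaper linked above; the function names `read_shp_header` and `iter_records` are made up for this example, and the record contents are returned as raw bytes rather than decoded geometry.

```python
import struct

def read_shp_header(data: bytes) -> dict:
    """Parse the 100-byte main file header of a .shp file."""
    # File code (always 9994) and file length are big-endian.
    (file_code,) = struct.unpack(">i", data[0:4])
    (length_words,) = struct.unpack(">i", data[24:28])
    # Version, shape type and bounding box are little-endian.
    version, shape_type = struct.unpack("<ii", data[28:36])
    xmin, ymin, xmax, ymax = struct.unpack("<4d", data[36:68])
    return {
        "file_code": file_code,
        "length_bytes": length_words * 2,  # stored in 16-bit words
        "version": version,
        "shape_type": shape_type,
        "bbox": (xmin, ymin, xmax, ymax),
    }

def iter_records(data: bytes):
    """Yield (record_number, content_bytes) for each record after the header."""
    offset = 100
    while offset + 8 <= len(data):
        # Record header: record number and content length, both big-endian.
        number, content_words = struct.unpack(">ii", data[offset:offset + 8])
        start = offset + 8
        end = start + content_words * 2  # again counted in 16-bit words
        yield number, data[start:end]
        offset = end
```

Each yielded content block starts with a little-endian shape type, followed by the geometry for that shape type as described in the specification.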
The dbf generally contains information associated with each shape. You can also parse the dbf file, but you will have to roll your own reader. I have done it before, but the easiest approach may be to load the dbf into a spreadsheet application, save it as a csv file, and then load that. Also, if I remember correctly, you have to be careful because some of the sites that document the dbf format can be a little off. It had something to do with different versions of the format, where some fields are slightly different. So if you are rolling your own and you get stuck, be mindful that you may be reading the file correctly and that it simply differs from the specification you are using. I think my solution was to go back to Google, search for different docs, and eventually find ones that described the version I was reading.
The shp and dbf are linked by the record index. The first record in the shp is linked with the first record in the dbf and so forth.
You can fairly easily find format specifications for dbf such as here: http://www.clicketyclick.dk/databases/xbase/format/index.html. If you are willing to roll your own it will not be too much of a project.
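To give a sense of the size of that project, here is a minimal Python sketch of a reader for the common dBASE III style layout (32-byte file header, 32-byte field descriptors ending with a 0x0D byte, fixed-width records with a leading deletion flag). It treats every field as text and ignores memo fields and version differences, so it is a starting point under those assumptions rather than a complete reader; `read_dbf` and `dbf_to_csv` are names invented for this example.

```python
import csv
import io
import struct

def read_dbf(data: bytes):
    """Return (field_names, rows) from the bytes of a dBASE III style .dbf file."""
    # Bytes 4-7: record count, 8-9: header size, 10-11: record size (little-endian).
    num_records, header_size, record_size = struct.unpack("<IHH", data[4:12])
    fields = []
    pos = 32
    # Field descriptors are 32 bytes each; a 0x0D byte terminates the list.
    while data[pos] != 0x0D:
        name = data[pos:pos + 11].split(b"\x00", 1)[0].decode("ascii")
        length = data[pos + 16]
        fields.append((name, length))
        pos += 32
    rows = []
    for i in range(num_records):
        start = header_size + i * record_size
        record = data[start:start + record_size]
        if record[0:1] == b"*":  # first byte flags a deleted record
            continue
        offset = 1
        row = []
        for _, length in fields:
            row.append(record[offset:offset + length].decode("latin-1").strip())
            offset += length
        rows.append(row)
    return [name for name, _ in fields], rows

def dbf_to_csv(data: bytes) -> str:
    """Render the table as CSV text with a header row."""
    names, rows = read_dbf(data)
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(names)
    writer.writerows(rows)
    return buffer.getvalue()
```

Note that skipping deleted records changes the row numbering, which matters if you rely on the row position to match records in the shp file.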
In any case, whether you roll your own reader for the dbf or the shp, you will need to be mindful of the byte ordering of the fields, as some are big-endian and others little-endian. I think this only applies to the shp file.
You can open the dbf file and copy the content to another format, for example odt or XLSX.
To open the dbf file, I recommend LibreOffice Calc.
If you really only need the content of the shapefile, then looking at the .dbf file will give you a better view. You can open the ".dbf" file directly with any Excel viewer to look at the content of the shapefile.
Related
We’ve got time-stamped directories containing text files, stored in HDFS.
We can regularly get new files added, so we’re using a FileSource (Flink 1.14.4, and a streaming job) with a monitoring duration, so that it continuously picks up any new files.
The challenge is that we need to include the parent directory’s timestamp in the output, for doing time-window joins of this enrichment data with another stream.
Previously I could extend the input format to extract path information, and emit a Tuple2<LongWritable, Text> (see my SO answer to a question about doing that).
But with the new FileSource architecture, I’m really not sure if it’s possible, or if so, the right way to go about doing it.
I’ve wandered through the source code (FileSource, AbstractFileSource, SourceReader, FileSourceReader, FileSourceSplit, ad nauseam) but haven’t seen any happy path to making that all work.
There might be a way using some really ugly hacks to TextLineFormat, where it would reverse engineer the FSDataInputStream to try to find information about the original file, but feels very fragile.
Any suggestions?
A simple but, heh, still weird question. I hope this is the right section; I couldn't find a decent answer anywhere on the internet.
First of all, it looks strongly like COBOL (ACUCOBOL?), but I am not sure.
I have binary files with the extensions .AC, .vix and .SC, several MBytes each. Most of the files come in pairs, e.g. ADDRESSES.AC + ADDRESSES.vix or COMPANIES.SC + COMPANIES.vix.
In the middle of these files I can see parts of records, however it seems to be a set of binary files.
There are no human-readable indexes, maps, dialects, configuration files or headers of the kind I know exist in COBOL databases, so nothing that can be parsed using normal text tools. No CPY, RDD or XFD files either. Just files with a lot of binary data and parts of records/ids (?) from time to time. So I can determine, e.g., that one file contains a set of addresses, the next apparently sales, the next client data, etc.
Questions are:
How do I determine which version of COBOL database I am using? (Mostly to obtain a proper tool to extract the data.)
How do I convert this database to something that can be parsed and moved to whatever else, even Excel?
I have no access to the computer that worked with this database, as it went into the litter bin many years ago; nothing else remained, just one folder with the database files.
Has anybody had the same problem?
Here is a sample:
How to determine which version of COBOL database am I using?
You aren't using a database but ISAM files, very likely ACUCOBOL GT file format 5. For details about the format see official documentation.
Mostly to obtain a proper tool to extract the data.
The proper tool would be vutil and the command vutil -u -t ADDRESSES.AC ADDRESSES.TXT, which will give you a text file that is very likely in fixed-length form (variable form is relatively uncommon) --> step 1.
As the data likely contains binary fields, you have to investigate the data to check the actual format/record layout --> step 2, and calculate the decimal values from the binary fields --> step 3.
But there are tools out there that help you with step 2 and 3, I can recommend RecordEditor where you'll see the data, can set field widths/types (defining a record layout, similar to Excel Import, but also allows you to use binary COBOL types) and convert the resulting file to CSV.
If you don't have access to vutil (or vutil32.exe on Windows), you may find someone who has access to this tool and can convert the data for you; or get an evaluation version (it would be an old download, as the new product owner of ACUCOBOL-GT is Micro Focus, which only provides evaluation versions of its incompatible "Visual COBOL" product).
Alternatively, you can reverse-engineer the format (the record layout is in the vix file; open it with a hex editor and dive in), but this is likely a bigger task...
Summary:
decide how to do step 1, vutil/vutil32.exe is the easiest way
1: convert the data to text format
2: investigate the files and inspect for the record layout (field width, type)
3: load the file, convert binary fields, export as csv
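Steps 2 and 3 can be sketched in Python once a record layout has been worked out. Everything about the layout below is a made-up assumption for illustration: 14-byte records with no separators, a 10-byte ASCII name, and a 4-byte packed-decimal (COMP-3) amount with two implied decimals. The real field widths and types have to come from inspecting your own unloaded file, and other COBOL binary types need different decoding.

```python
from decimal import Decimal

def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode packed decimal: two BCD digits per byte, sign in the last nibble."""
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)
        digits.append(byte & 0x0F)
    digits.append(raw[-1] >> 4)
    sign_nibble = raw[-1] & 0x0F
    value = int("".join(str(d) for d in digits))
    if sign_nibble == 0x0D:  # 0xD marks a negative value
        value = -value
    return Decimal(value).scaleb(-scale)

def records_to_rows(data: bytes, record_length: int = 14):
    """Slice fixed-length records using the hypothetical layout described above."""
    rows = []
    for start in range(0, len(data) - record_length + 1, record_length):
        record = data[start:start + record_length]
        name = record[0:10].decode("ascii").rstrip()   # text field
        balance = unpack_comp3(record[10:14], scale=2)  # packed-decimal field
        rows.append([name, str(balance)])
    return rows
```

The resulting rows can be handed to Python's `csv.writer` to produce the CSV file for step 3.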
You definitely have Vision indexed data files, as you can see from the matching .vix files; if a file has no .vix file, then it is a relative file with a set number of records.
If you have Acubench under the Tools Menu there is an option for Vision File Utility, from there you can Unload your Vision Data to a text file which is tab delimited.
From there you can import to Excel as a tab delimited file and then re-save as a csv file.
So in the end, I suppose this was the ISAM version.
To untangle this the following tools were needed:
First of all, some migration tool. In my case it was the ISMIGRATE GUI Wizard:
This package comes from isCOBOL 2017 R1, and you can find some free demos to download. Note that you don't need to install the whole package, just this migration tool.
Then you can use the ctree2 -> jisam conversion or just try all the available options (not every one is available, because some need paid libraries that are missing).
After conversion you'll end with something like this:
In worse cases there will be some special ASCII chars, but you can get rid of them using tools like Notepad++, or even Excel. I mean searching for them by hex code and replacing each with a space (note that each space replaces exactly one unwanted character, so the column positions are preserved).
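If the editors struggle with the file, the same one-for-one replacement can be scripted. This is a minimal Python sketch, assuming the file is read as raw bytes; the function name `blank_control_chars` is invented for this example, and tab, line feed and carriage return are kept on the assumption that they are legitimate separators.

```python
def blank_control_chars(data: bytes) -> bytes:
    """Replace control bytes with spaces, one for one, so columns stay aligned."""
    keep = {0x09, 0x0A, 0x0D}  # tab, line feed, carriage return
    table = bytes(
        0x20 if (b < 0x20 and b not in keep) or b == 0x7F else b
        for b in range(256)
    )
    return data.translate(table)
```

Because every unwanted byte becomes exactly one space, the output has the same length as the input and fixed-width fields keep their positions.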
Note, that you can as well use special function of importing ASCII text files from MS Access/MS Excel. It is really helpful.
To position everything correctly, cut this file up and make all the adjustments (and export to e.g. csv), you can use http://record-editor.sourceforge.net, which is free. Note that after several trials I noticed that other tools, even paid ones, probably won't help you. The problem is in the first point: conversion.
To be sure that everything works fine, you can even run MS Access or similar to see how to create foreign keys and reverse-engineer the whole database. With a working preview, it will be easy to do that on a larger scale, e.g. in PostgreSQL/Oracle.
That's it. I hope it'll be useful for somebody.
What was UNSUCCESSFUL:
Establishing an Actian Vector server; it is a really great and free tool, but it won't help you significantly
Trying some online tools (besides, who knows where the data will be sent)
Any other ASCII editors, because in my case many of them crashed, I suppose because of the size of the files and some control chars (?)
The situation
I use LabVIEW 2012 on Windows 7
My test result data is written to text files. First, information about the test is written to the file (product type, test type, test conditions, etc.), and after that the logged data is written each second.
All data files are stored in folders sorted by date, and the file names contain some info about the test
I have years' worth of data files, and my search function currently works only on the file names (opening each file to look for search terms takes too much time)
The goal
To write metadata (additional properties like Word files can have) with the text files so that I can implement a search function to quickly find the file that I need
I found here the way to write/read metadata for images, but I need it for text files or something similar.
You would need to be writing to data files that support metadata to begin with (such as the LabVIEW TDMS or datalog file formats). In a similar situation, I would simply use a separate file with the same name but a different extension, for example. Then you can index those file names, and if you want the data you just swap the metadata filename extension and you are good to go.
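The separate-file idea is language-independent, so here is a small sketch of it in Python rather than LabVIEW. The `.meta` extension, the JSON content and the function names are all assumptions chosen for this illustration; any small, quick-to-read format would do.

```python
import json
from pathlib import Path

def sidecar_path(data_file: Path) -> Path:
    """Metadata lives next to the data file, same name with a .meta extension."""
    return Path(data_file).with_suffix(".meta")

def write_metadata(data_file: Path, metadata: dict) -> None:
    sidecar_path(data_file).write_text(json.dumps(metadata))

def find_data_files(root: Path, key: str, value: str) -> list:
    """Search only the small sidecar files, then swap the extension back."""
    matches = []
    for meta_file in sorted(Path(root).rglob("*.meta")):
        metadata = json.loads(meta_file.read_text())
        if metadata.get(key) == value:
            matches.append(meta_file.with_suffix(".txt"))
    return matches
```

The search never opens the large log files, which is the point of keeping the metadata separate.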
I would not bother with files and would use a database for logging the results. It may not be what you are willing to do, but this is the ultimate solution to the search problem, and it opens up a lot of data analytics possibilities.
The metadata in Word files is from a feature called "Alternate Data Streams", which is actually a function of NTFS. You can learn more about it here.
I can't say I've ever used this feature. I don't think there is a nice API for LabVIEW, but one could certainly be made. With some research you should be able to play around with this feature and see if it really makes finding files any easier. My understanding is that the data can be lost if transferred over the network or onto a non-NTFS thumbdrive.
The program I work on has several shapefiles, with quite a few attributes. At the moment they are stored in our version control (Subversion) as compressed blobs (dbf.gz, shp.gz and shx.gz). This is how they are used by the program, but it's extremely inconvenient for versioning purposes. We get no information about changes to entries, or attributes - just that something, somewhere in the file has changed. No useful diff.
The DBF is the one that has the attributes. I was thinking maybe we could store it as CSV and then as part of the build process, convert it to DBF and do ??? (to be determined) to make it a valid shapefile, then make the zipped version as it currently uses.
Another approach might be to remove nearly all the attributes from the shapefile, store those in CSV/YAML/whatever (which can be versioned nicely), and either look them up by the shape IDs or try to attach them to our objects after they have been instantiated from shapefiles, something like that.
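The lookup-by-ID variant might look roughly like the following Python sketch. It assumes the attributes live in a CSV with a `shape_id` column and that each loaded shape is available as a dict carrying the same ID; both the column name and the shape representation are hypothetical stand-ins for whatever the program really uses.

```python
import csv
import io

def load_attributes(csv_text: str, id_column: str = "shape_id") -> dict:
    """Index the versioned CSV rows by shape ID."""
    attributes = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        shape_id = int(row.pop(id_column))
        attributes[shape_id] = row
    return attributes

def attach_attributes(shapes: list, attributes: dict) -> list:
    """Merge the CSV attributes into each shape that has a matching ID."""
    merged = []
    for shape in shapes:
        extra = attributes.get(shape["shape_id"], {})
        merged.append({**shape, **extra})
    return merged
```

With this split, a change to a single attribute shows up as a one-line diff in the CSV, while the binary shapefile only changes when the geometry does.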
But maybe folks with more experience with shapefiles have better ideas?
The DBF you mention at the start of your second paragraph has the attributes. Why not dump the table on a per-shape basis to an XML-style file and put THAT in Subversion? If you are actually working within Visual FoxPro (which also uses DBF-style files), you could use the function CursorToXML(), run it in a loop over the distinct shapes, and dump each one to its own XML file. Then, when reading it back in, use XMLToCursor() on each per-shape file.
I have been assigned a project to create a new binary file structure for storing coordinate information coming out of 3D CAD/CAM software. I would highly appreciate it if you could point out some articles or books on creating a new binary file format. Thanks for your time :)
I would start by taking a look at other similar file formats on wotsit.org. The site is for various different file formats and it contains links to their specifications.
By looking at other file formats you'll get ideas about how best to format and present information about your specification.
There's a universal binary (and compact) notation called ASN.1. It's used widely and there are books about it available. ASN.1 can be compared to XML, but on some lower (more primitive yet more flexible) level than XML. And XML, especially binary XML mentioned above, would be of great help to you too.
Also, if you have more than just one sequence of data to hold in your file, take a look at Solid File System as a container for several data streams in one file.
If I had the same assignment, I would inspect something that already exists, like .OBJ, and then try to implement something similar, probably with minor changes.
Short answer: Don't. Use XML or a text format instead for readability, extensibility and portability.
Longer answer: CAD/CAM has loads of 'legacy' formats around. I'd look to use one of those (possibly extending it if necessary). And if there's nothing suitable, and XML is considered too bloated and slow, look at Binary XML formats instead.
I think that what you really need is to figure out what data you have to save. Then you load it into memory and serialize that memory.
Here is a tutorial on serialization in C++. This page also addresses many issues with saving data
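To make the serialization idea concrete, here is a minimal Python sketch of a toy binary layout for 3D points: a small header (magic bytes, a version number, a point count) followed by the coordinates as little-endian doubles. The magic value `XYZ1`, the header fields and the function names are all invented for this example and do not correspond to any existing CAD/CAM format.

```python
import struct

MAGIC = b"XYZ1"                   # hypothetical file signature
HEADER = struct.Struct("<4sHI")   # magic, format version, point count
POINT = struct.Struct("<3d")      # x, y, z as little-endian float64

def dump_points(points, version: int = 1) -> bytes:
    """Serialize a list of (x, y, z) tuples to the toy binary layout."""
    parts = [HEADER.pack(MAGIC, version, len(points))]
    parts.extend(POINT.pack(*point) for point in points)
    return b"".join(parts)

def load_points(data: bytes) -> list:
    """Read the points back, checking the signature first."""
    magic, version, count = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("not a point file")
    return [
        POINT.unpack_from(data, HEADER.size + i * POINT.size)
        for i in range(count)
    ]
```

Fixing the byte order explicitly and storing a version number in the header are the two choices that tend to matter most later, since they let files move between machines and let the layout change without breaking old readers.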