How to version shapefiles - versioning

The program I work on has several shapefiles, with quite a few attributes. At the moment they are stored in our version control (Subversion) as compressed blobs (dbf.gz, shp.gz and shx.gz). This is how they are used by the program, but it's extremely inconvenient for versioning purposes. We get no information about changes to entries, or attributes - just that something, somewhere in the file has changed. No useful diff.
The DBF is the one that has the attributes. I was thinking maybe we could store it as CSV and then as part of the build process, convert it to DBF and do ??? (to be determined) to make it a valid shapefile, then make the zipped version as it currently uses.
Another approach might be to remove nearly all the attributes from the shapefile, store those in CSV/YAML/whatever (which can be versioned nicely), and either look them up by the shape IDs or try to attach them to our objects after they have been instantiated from shapefiles, something like that.
But maybe folks with more experience with shapefiles have better ideas?

The DBF you are referring to starting your second paragraph has the attributes. Why not dump out the table on a "per Shape" basis to an XML style file and use THAT for the subversion. If you are actually working within Visual Foxpro (which also uses DBF style files too), you could use the function CursorToXML() and just run that through a loop of distinct shapes and dump out to each respective XML file. Then, when reading it back in.... XMLToCursor() of the per file shape.

Related

How to include path information in data extracted from text files with FileSource

We’ve got time-stamped directories containing text files, stored in HDFS.
We can regularly get new files added, so we’re using a FileSource (Flink 1.14.4, and a streaming job) with a monitoring duration, so that it continuously picks up any new files.
The challenge is that we need to include the parent directory’s timestamp in the output, for doing time-window joins of this enrichment data with another stream.
Previously I could extend the input format to extract path information, and emit a Tuple2<LongWritable, Text> (see my SO answer to a question about doing that).
But with the new FileSource architecture, I’m really not sure if it’s possible, or if so, the right way to go about doing it.
I’ve wandered through the source code (FileSource, AbstractFileSource, SourceReader, FileSourceReader, FileSourceSplit, ad nauseam) but haven’t seen any happy path to making that all work.
There might be a way using some really ugly hacks to TextLineFormat, where it would reverse engineer the FSDataInputStream to try to find information about the original file, but feels very fragile.
Any suggestions?

Which COBOL database do I have?

A simple but, heh, still weird question. Hope in good section, couldn't find decent answer in whole internet.
First of all, it looks strongly like COBOL (ACUCOBOL?), but I am not sure.
I have binary files with extensions: .AC, .vix, .SC; several MBytes each. Most of files are in pairs eg. ADDRESSES.AC + ADDRESSES.vix or COMPANIES.SC + COMPANIES.vix.
In the middle of these files I can see parts of records, however it seems to be a set of binary files.
No human readable indexes, maps, dialects, configuration files, headers that I know exists in Cobol databases - nothing to be parsed using some normal text tools. No CPY, RDD, XFD files as well. Just files with a lot of binary data and parts of records/ids (?) from time to time. So I can determine e.g., that one file contains set of addresses, next apparently sales, next client data etc.
Questions are:
How to determine which version of COBOL database am I using? (Mostly to obtain a proper tool to extract the data.)
How to convert this database to something that can be parsed and moved to whatever else - even Excel?
I have no access to computer that was working with this database as it is deep in the litter bin from many years, nothing else remained, just one folder with database files.
Had anybody the same problem?
Here is sample:
How to determine which version of COBOL database am I using?
You aren't using a database but ISAM files, very likely ACUCOBOL GT file format 5. For details about the format see official documentation.
Mostly to obtain a proper tool to extract the data.
The proper tool would be vutil and the command vutil -u -t ADDRESSES.AC ADDRESSES.TXT which will present you with a text file that is very likely in fixed-length form (variable form is relative uncommon) -> step 1.
As the data likely contains binary fields you have to investigate the data to check the actual format/record layout --> step2, and calculate the decimal values from the binary fields --> step 3.
But there are tools out there that help you with step 2 and 3, I can recommend RecordEditor where you'll see the data, can set field widths/types (defining a record layout, similar to Excel Import, but also allows you to use binary COBOL types) and convert the resulting file to CSV.
If you don't have access to vutil (or vutil32.exe on Windows) you may find someone that has access to this tool and convert the data for you; or get an evaluation version (would be an old download, the new product owner of ACUCOBOL-GT is MicroFocus and only provides evaluation versions of their not-compatible "Visual COBOL" product).
Alternatively you can reverse-engineer the format (the record layout is in the vix-file, open it with an hex-editor and dive in), but this likely is a bigger task...
Summary:
decide how to do step 1, vutil/vutil32.exe is the easiest way
1: convert the data to text format
2: investigate the files and inspect for the record layout (field width, type)
3: load the file, convert binary fields, export as csv
You definitely have the vision indexed data files as you will see the .vix files which match, if you do not have a .vix file then it is a relative file with a set no of records.
If you have Acubench under the Tools Menu there is an option for Vision File Utility, from there you can Unload your Vision Data to a text file which is tab delimited.
From there you can import to Excel as a tab delimited file and then re-save as a csv file.
So after all I suppose this was ISAM version.
To untangle this the following tools were needed:
First of all some migration tool. In my case it was ISMIGRATE GUI WIzard:
This package comes from isCOBOL 2017 R1, you can find some free demos to download. Note, that you don't need install all package, just this migration tool.
Then you can use ctree2 -> jisam conversion or just try all available options (not every one is available cause of missing libraries that are paid)
After conversion you'll end with something like this:
In worse cases there will be some ASCII special chars, but you can get rid of them using some tools like Notepad++, or even Excel. I mean to search for them by HEX code and replace by space (note, that space will replace one missing character to preserve column ordering)
Note, that you can as well use special function of importing ASCII text files from MS Access/MS Excel. It is really helpful.
to position everything correctly, cut this file and do all adjustements (and export to e.g. csv) you can use http://record-editor.sourceforge.net
that is free. Note, that after several trials I've noticed, that other even paid tools rather won't help you. The problem is in 1st point: conversion.
To be sure that everything works fine you can run even MS Access or similar to see how to create foreign keys and reverse-engineer all database. Having working preview it will be easy to do that on larger scale e.g. in PostgreSQL/Oracle.
That's it. I hope it'll be useful for somebody.
What was UNSUCCESSFUL:
Estabilishing Actian Vector server; it is really great and free tool, but it won't help you significantly
Trying some online tools (despite of who knows where data will be sent)
Any other ASCII editors, cause in my case many of them crashed, i suppose because of size of files and because of some control chars (?)

Is it possible to write/read metadata for a text file using Labview?

The situation
I use Labview 2012 on Windows 7
my test result data is written in text files. First, information about the test is written in the file (product type, test type, test conditions etc) and after that the logged data is written each second.
All data files are stored in folders, sorted to date and the names of the files contain some info about the test
I have years worth of data files and my search function now only works on the file names (opening each file to look for search terms costs too much time)
The goal
To write metadata (additional properties like Word files can have) with the text files so that I can implement a search function to quickly find the file that I need
I found here the way to write/read metadata for images, but I need it for text files or something similar.
You would need to be writing to data files that supports meta data to begin with (such as LabVIEW TDMS or datalog file formats). In a similar situation, I would simply use a separate file with the same name, but a different extension for example. Then you can index those file names, and if you want the data you just swap the meta data filename extension and you are good to go.
I would not bother with files and use database for results logging. It may be not what you wiling to do, but this is the ultimate solution for the search problem and it open a lot of data analytics possibilities.
The metadata in Word files is from a feature called "Alternative Data Streams" which is actually a function of NTFS. You can learn more about it here.
I can't say I've ever used this feature. I don't think there is a nice API for LabVIEW, but one could certainly be made. With some research you should be able to play around with this feature and see if it really makes finding files any easier. My understanding is that the data can be lost if transferred over the network or onto a non-NTFS thumbdrive.

SCM for databases

For a data-driven approach, e.g. for games, data that goes into a database is part of the (generalized) source code for the project. What is the best strategy for version controlling database contents? note: not schema. It needs to have all the properties of a SCM like rollback and branching.
Simple text files to hold the contents work well for version control. Pick a format that is easy to read and write, comma-separated is easiest if you don't have like 500 columns or anything like that.
But this leaves the matter of loading it. If you have a simple situation, your upgrade/install script can truncate the source tables and reload. If that is no good because of foreign keys, you have to code a little routine that goes through the text files line-by-line and inserts new values and possibly overwrites changed values.

Database for record storage with revisioning

I've recently been tasked with improving a records database that consists of the following:
All records are stored in one giant
XML file.
Any changes or updates to
these records are done by hand within
this XML file.
Each record contains
an 'Updated' datetime stamp to keep
some form of revision control.
The entire XML file is also checked into
a subversion repository to keep
revision control for the entire
collection.
This records database is strictly for internal use only and does not face any public interface.
I'm a bit of a newbie to database design, but the above method feels a little cumbersome. I was thinking of moving all of the above to some form of perhaps a SQLite database and building some form of a front end to update/remove/view entries while keeping track of any changes to that DB. Are there better ways to do this or is it pretty standard to have a system like is already in place?
Putting the information into a database is a good solution. Another decent solution is just making each record its own file and using a revision-control system to track the changes to each individual record. This is much more efficient than having one glommed-together file :-).
Doesnt actually sound that bad! Depends how often its updated and how many programs read the XML.
I would try to approaches depending on the above.
First get one of the nifty XML validating editors like XML spy and define an XML Schema if or xsd if you havent already got one. You you now have a clean user interface that can update and validate the file. Continue to use the revision control to system to keep a history.
Secondly -- if the updates are really simple write a quick Java/C#/VB or whatever program to update the XML -- otherwise carry on as before.

Resources