For a data-driven approach, e.g. for games, the data that goes into a database is part of the (generalized) source code for the project. What is the best strategy for version controlling database contents (note: not the schema)? It needs to have all the properties of an SCM, such as rollback and branching.
Simple text files holding the contents work well for version control. Pick a format that is easy to read and write; comma-separated is easiest unless you have hundreds of columns.
But this leaves the matter of loading it. If you have a simple situation, your upgrade/install script can truncate the source tables and reload. If that is no good because of foreign keys, you have to write a small routine that goes through the text files line by line, inserting new values and overwriting changed ones.
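A minimal sketch of such a line-by-line routine, assuming Python with SQLite and a made-up game_items table whose id column is the primary key (the real loader would target whatever tables and RDBMS the project actually uses):

    import csv
    import sqlite3

    def upsert_from_csv(conn: sqlite3.Connection, csv_path: str) -> None:
        # Insert new rows and overwrite changed values from a versioned CSV file.
        with open(csv_path, newline="") as f:
            for row in csv.DictReader(f):
                # Requires SQLite 3.24+ for the UPSERT clause; id is assumed
                # to be the table's primary key.
                conn.execute(
                    """
                    INSERT INTO game_items (id, name, cost)
                    VALUES (:id, :name, :cost)
                    ON CONFLICT(id) DO UPDATE SET
                        name = excluded.name,
                        cost = excluded.cost
                    """,
                    row,
                )
        conn.commit()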
So I'm thinking of writing a small piece of software to run ML experiments on a cluster or an arbitrary abstracted executor, and then save the results so that I can view them in real time efficiently. The executor software will have write access to the database and will push metrics live. Now, I have not worked much with databases, so I'm not sure what the correct approach for this is. Here is a description of what the system should store:
Each experiment will consist of a single piece of code (or archive of code) such that it can be executed on the remote machine. For now we will assume all dependencies etc. are installed there. The code will accept command line arguments. The experiment will also include a YAML schema defining those command line arguments. The code itself will specify what gets logged (e.g. I will provide a library in the language for registering channels). In terms of logging, you can log numerical values, arrays, text, etc., so quite a few types. Each channel will be allowed a single specification (e.g. 2 columns: first an int iteration, second a float error). The code will also provide a final copy of the parameters at the end of the experiment.
When one submits an experiment, one will need to provide its unique group name plus parameters for execution. This will launch the experiment and log everything.
Implementing this is easiest for me to do with a flat file system. Each project will have a unique name. Each new experiment gets a unique id and a folder inside the project, where I can store the code. Each channel gets a file, which for simplicity can be CSV, with a special schema file describing what types of values are stored there so I can load them back. The final parameters can also be copied into the folder.
However, because of the variety of ways I could do this, and the fact that this might require a separate "table" for each experiment, I have no idea whether this is possible in any database system. Additionally, maybe I'm overlooking something very obvious; if you have any experience with this, any suggestions/advice are most welcome. The main goal at the end is to be able to serve this to a web interface. Maybe NoSQL could accommodate this, maybe not (I don't know exactly how those work)?
The data for ML would primarily be unstructured. That kind of data will not fit naturally into an RDBMS. Essentially, a document database like MongoDB is far better suited for such cases.
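For what it's worth, here is a rough sketch of how the experiment data described in the question could map onto a document database, using Python and pymongo; the database, collection, and field names are assumptions, not anything from the question:

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    db = client["ml_experiments"]

    # One document per experiment run: the group/project name, the final
    # copy of the parameters, and one declared schema per logging channel.
    run_id = db.runs.insert_one({
        "project": "my-project",
        "params": {"lr": 0.01, "epochs": 10},
        "channels": {
            "train_error": [
                {"name": "iteration", "type": "int"},
                {"name": "error", "type": "float"},
            ],
        },
    }).inserted_id

    # The executor pushes metrics live: one small document per logged row,
    # so no per-experiment table is ever needed.
    db.metrics.insert_one({
        "run": run_id,
        "channel": "train_error",
        "iteration": 1,
        "error": 0.42,
    })

A web interface can then query db.metrics by run and channel to plot progress in real time.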
I'm building a website, and I'm planning to publish various kinds of posts, like tutorials, articles, etc. I'm going to manage it with PHP, but when it comes to storing the content of each post, the very text that will be displayed, what's the better option: using a separate text file or storing it in a field for each entry in the database?
I don't see a reason not to use the database directly to store the content, but this is the first time I've used a DB and it feels kind of wrong.
What's your experience in this matter?
OK friends, I am revisiting this question for the benefit of those who will read this answer. After a lot of trial and error I have reached the conclusion that keeping text in the database is a lot more convenient and easier to manipulate. Thus all my data is now in the database. Previously I had some details in the database and the text part in a file, but now I have moved everything to the database.
The only problem is editing posts: fields like title, tags, or subject are changed on a simple HTML form, but for the main content I have created a text area, so I just copy the text from the text area into my favorite text editor and, after editing, paste it back.
Some benefits that pushed me to put everything in the database are:
EASY SEARCH: you can run queries like MySQL LIKE on your text (especially the main content).
EASY ESCAPING: you can easily run commands on your data to escape special characters and make it suitable for display, etc.
GETTING INPUT FROM USERS: if you want users to give you input, it makes sense to save that input in the database, escape it, and manipulate it as and when required.
Operations like moving tables, backing up, merging two records, arranging posts with similar content in sequential order, etc. are all easier in a database than in the file system.
In a file system there is always the problem of missing files, inconsistent file names, the wrong file being shown for a given title, and so on.
I do not escape user input before adding it to the database, only just before display. This way no permanent changes are stored in the text. (I don't know if that's OK or not.)
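Here is a minimal sketch of that store-raw / escape-on-display pattern, shown in Python with SQLite purely for illustration (in PHP the same idea would use prepared statements plus htmlspecialchars() at output time); the posts table and its columns are made up:

    import html
    import sqlite3

    conn = sqlite3.connect("site.db")

    def save_post(title: str, content: str) -> None:
        # Store the raw, unescaped text; the parameterized query keeps the
        # insert safe without altering what is stored.
        conn.execute("INSERT INTO posts (title, content) VALUES (?, ?)",
                     (title, content))
        conn.commit()

    def render_post(post_id: int) -> str:
        # Escape only at display time, so no permanent changes are made
        # to the stored text.
        title, content = conn.execute(
            "SELECT title, content FROM posts WHERE id = ?", (post_id,)
        ).fetchone()
        return f"<h1>{html.escape(title)}</h1>\n<p>{html.escape(content)}</p>"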
In fact, I am doing something similar to you. However, I have reached the conclusion explained below (almost the same as the answer above mine). I expect you have made your decision by now, but I will still explain it so that it is useful for the future.
My solution: I have a table called content_table that contains details about every article, post, or anything else I write. The main text portion of the article/post is placed in a directory as a .php or .txt file. When a user clicks on an article to read, a view of the article is created dynamically using the information in the database and then pulling the text part (I call it the main content) from the .txt file. The database contains information like content_id, creation date, author, and category (most of these become meta tags). A sketch of this hybrid approach follows the list below.
The two major benefits are:
performance, since there is less load on the database
editing the text content is easy.
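A minimal sketch of this hybrid layout, assuming Python with SQLite; the column names and the convention of naming each text file after its content id are assumptions on top of the answer above:

    import sqlite3
    from pathlib import Path

    conn = sqlite3.connect("site.db")
    CONTENT_DIR = Path("content")  # directory holding the .txt main-content files

    def load_article(content_id: int) -> dict:
        # Metadata (the meta-tag fields) comes from the database...
        author, created, category = conn.execute(
            "SELECT author, creation_date, category FROM content_table "
            "WHERE content_id = ?",
            (content_id,),
        ).fetchone()
        # ...while the main content is pulled from the text file on disk.
        body = (CONTENT_DIR / f"{content_id}.txt").read_text(encoding="utf-8")
        return {"author": author, "created": created,
                "category": category, "main_content": body}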
I am giving comments based on my experience.
Except for attachments, you can store things in the DB, because managing content, backup, restore, querying, and searching (especially full-text search) will be easy.
Store attached files in some folder and keep the path in the DB tables.
Furthermore, if you are willing to implement search inside attachments, you can go for a search engine like Lucene, which is efficient at searching static content.
Whether to keep attachments in the DB or in the file system comes down to how important the files are.
The program I work on has several shapefiles, with quite a few attributes. At the moment they are stored in our version control (Subversion) as compressed blobs (dbf.gz, shp.gz and shx.gz). This is how they are used by the program, but it's extremely inconvenient for versioning purposes. We get no information about changes to entries, or attributes - just that something, somewhere in the file has changed. No useful diff.
The DBF is the file that holds the attributes. I was thinking we could store it as CSV and then, as part of the build process, convert it to DBF and do ??? (to be determined) to make it a valid shapefile, then produce the zipped version the program currently uses.
Another approach might be to remove nearly all the attributes from the shapefile, store those in CSV/YAML/whatever (which can be versioned nicely), and either look them up by the shape IDs or try to attach them to our objects after they have been instantiated from shapefiles, something like that.
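A minimal sketch of that second approach, assuming the attributes live in a versioned CSV keyed by a shape_id column (the column name and the shape objects' id/attributes members are made up):

    import csv

    def load_attributes(csv_path: str) -> dict:
        # The attribute CSV is what gets versioned; one row per shape,
        # keyed by an assumed shape_id column.
        with open(csv_path, newline="") as f:
            return {row["shape_id"]: row for row in csv.DictReader(f)}

    # After the shapes have been instantiated from the (attribute-stripped)
    # shapefile, attach the versioned attributes by ID:
    # attrs = load_attributes("attributes.csv")
    # for shape in shapes:
    #     shape.attributes = attrs.get(shape.id, {})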
But maybe folks with more experience with shapefiles have better ideas?
The DBF you refer to at the start of your second paragraph holds the attributes. Why not dump the table out on a per-shape basis to an XML-style file and use THAT for Subversion? If you are actually working within Visual FoxPro (which also uses DBF-style files), you could use the function CursorToXML() and run it in a loop over the distinct shapes, dumping each one to its own XML file. Then, when reading it back in, use XMLToCursor() on each per-shape file.
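Outside Visual FoxPro, the same per-shape dump could be approximated along these lines (a sketch only: records stands in for rows already read from the DBF with whatever DBF library is at hand, and the element and field names are invented):

    import xml.etree.ElementTree as ET
    from pathlib import Path

    def dump_per_shape(records, out_dir):
        # Write one small XML file per shape so Subversion can produce
        # meaningful diffs for attribute changes.
        out = Path(out_dir)
        out.mkdir(parents=True, exist_ok=True)
        for rec in records:
            root = ET.Element("shape", id=str(rec["shape_id"]))
            for name, value in rec.items():
                ET.SubElement(root, "attr", name=name).text = str(value)
            ET.ElementTree(root).write(out / f"{rec['shape_id']}.xml",
                                       encoding="utf-8", xml_declaration=True)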
I've recently been tasked with improving a records database that consists of the following:
- All records are stored in one giant XML file.
- Any changes or updates to these records are done by hand within this XML file.
- Each record contains an 'Updated' datetime stamp to keep some form of revision control.
- The entire XML file is also checked into a Subversion repository to keep revision control for the entire collection.
This records database is strictly for internal use only and does not face any public interface.
I'm a bit of a newbie to database design, but the above method feels a little cumbersome. I was thinking of moving all of the above into something like a SQLite database and building some form of front end to update/remove/view entries while keeping track of any changes to that DB. Are there better ways to do this, or is a system like the one already in place pretty standard?
Putting the information into a database is a good solution. Another decent solution is just making each record its own file and using a revision-control system to track the changes to each individual record. This is much more efficient than having one glommed-together file :-).
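As a sketch of the record-per-file idea, assuming the giant file looks roughly like <records><record id="...">...</record>...</records> (the element and attribute names are guesses about the unseen schema):

    import xml.etree.ElementTree as ET
    from pathlib import Path

    def split_records(big_xml_path, out_dir):
        # Break the single glommed-together file into one file per record,
        # which a revision-control system can diff and track individually.
        out = Path(out_dir)
        out.mkdir(parents=True, exist_ok=True)
        for record in ET.parse(big_xml_path).getroot().findall("record"):
            rec_id = record.get("id")
            ET.ElementTree(record).write(out / f"{rec_id}.xml",
                                         encoding="utf-8", xml_declaration=True)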
Doesn't actually sound that bad! It depends how often it's updated and how many programs read the XML.
I would try two approaches, depending on the above.
First, get one of the nifty validating XML editors like XMLSpy and define an XML Schema (XSD) if you haven't already got one. You now have a clean user interface that can update and validate the file. Continue to use the revision control system to keep a history.
Secondly, if the updates are really simple, write a quick Java/C#/VB (or whatever) program to update the XML; otherwise carry on as before.
I have a need to copy data from one Informix database to another. I do not want to use LOAD for doing this. Is there any script that can help me with this? Is there any other way to do this?
Without a bit more information about the types of Informix databases you have, it's hard to say exactly what the best option is for you.
If it's a small number of tables and large volumes of data, have a look at onunload, onload and/or the High Performance Loader. (I'm assuming we're not talking about Standard Engine here.)
If on the other hand you have lots of tables and HPL will be too fiddly, have a look at myexport/myimport (available on the iiug.org site). These are non-locking equivalents of the standard dbexport/dbimport utilities.
The simplest solution is to back up the database instance and restore it to a separate instance. If this is not possible for you, then there are other possibilities:
dbexport/dbimport
unload/load
hand-crafted SQL inserts
If the database structure is identical then you can use dbexport/dbimport; however, this will unload the data to flat files, either in the file system or on tape, and then import from the flat files.
I generally find that if the DB structure is the same then load/unload is the easiest solution.
If you do not want to use unload/load or dbexport/dbimport, then you can use direct SQL INSERTs as follows (untested; you will need to check the syntax):
INSERT INTO dbname2@informix_server2:table_name
SELECT * FROM dbname1@informix_server1:table_name
This of course implies a consistent table structure; you could use a column list if the structures differ.
One area that will cause you issues is referential integrity. If you have foreign keys, you will need to ensure the inserts are done in the correct order. You may also have issues with SERIAL columns and INSERTs; LOAD does not suffer from this problem, as you can load into a table with a serial column and retain the original values.
I have often found that the best solution is as follows (a rough sketch of the split in step 2 follows the list):
1. Take a schema dump from database1.
2. Split it into two parts: the initial segment is all of the table creation statements; the second part is all of the CREATE INDEX, referential integrity, etc. statements.
3. Create database2 from the first part of the schema.
4. Use UNLOAD/LOAD to load the data into database2.
5. Apply the second part of the schema to database2.
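A rough sketch of the split in step 2, assuming the schema is a plain SQL dump (e.g. dbschema output) and that naive keyword matching per statement is good enough, which is an assumption about how the dump is laid out:

    def split_schema(schema_path):
        # Statements that must wait until after the data load
        # (indexes, referential integrity) versus plain table creation.
        post_load_keywords = ("CREATE INDEX", "FOREIGN KEY", "ADD CONSTRAINT")
        tables, post_load = [], []
        with open(schema_path) as f:
            for stmt in f.read().split(";"):
                if any(k in stmt.upper() for k in post_load_keywords):
                    post_load.append(stmt)
                else:
                    tables.append(stmt)
        return tables, post_load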
This is very similar to the process that dbimport goes through but historically I have not been able to use dbimport as my database contains synonyms to another database and dbimport did/does not work with these.
UNLOAD and LOAD are the simplest way of doing it. By precluding them, you preclude the use of DB-Load and DB-Access and DB-Export and DB-Import too. These are the easiest ways to do it.
As already noted, you could consider using HPL.
You could also set up an ER (Enterprise Replication) system; it is harder than UNLOAD followed by LOAD, but doesn't use the verboten operations.
If the two machines are substantially identical, you could consider onunload and onload; I would not recommend it.