I am looking for a method to store only the incremental changes of documents in some kind of storage. It can be a database or a file system, but the main requirements are:
Fast (adding documents, saving new revisions and retrieving all should be handled as fast as possible)
Efficient (it should use the least amount of storage while keeping it fast enough)
Able to handle a lot of files (billions of files/documents)
At first I was using SVN, and now my best choice seems to be Git. It has all the things I want. However, it has a few issues.
I have to keep a copy of the latest version of each document in the repository, which means a lot of files sitting in the storage folder.
It also seems like overkill to use a full version control system just for its storage capability. I'm not sure whether that has any real disadvantages, though.
I think the ideal solution would be a database which has something like version control, or basically git's core functionality, at its core.
Is there such a solution? Is it possible for a single developer to create such a tool without months or years of research and effort?
What would you recommend and why?
Git does meet your requirements. The database, or incremental storage, is simply what the .git folder of a git repo holds.
A remote repo is the place that stores only the checksummed changes for the different versions. And it is a bare repo, which means there is no working directory (no copies of the latest version of each document sitting on disk), so you can treat the remote repo as the database.
1. Create a remote repo:
You can create the remote repo locally, or host it on GitHub, Bitbucket, etc. For a remote repo hosted on GitHub or Bitbucket, you just need to sign up, create a repository, and then clone a working copy of it. So I will only show how to create a remote repo locally here:
# In an empty folder, such as D:\repo
git init --bare
Now you have an empty remote repo in D:\repo.
2. Making changes to the remote repo/database:
To work with the git repo, you need a working copy (local repo). So you clone a local repo from the remote and make/commit changes there. When you want to store the changes in the remote repo (database), just push them to the remote.
# In another directory, such as D:\local
git clone D:/repo
cd repo
# Add/create files you want to store in git repo (D:\local\repo)
git add .
git commit -m 'message'
git push origin master   # on the first push, name the remote and branch explicitly (use main if that is your default branch)
Now any changes you make will be stored in the remote repo.
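To get documents or earlier revisions back out of the remote repo (the database), you can read them straight from the bare repo without keeping a working copy around. A minimal sketch, assuming a tracked file named document.txt (the file name is just an example):
# Point git at the bare repo with --git-dir
git --git-dir=D:/repo log --oneline -- document.txt   # list all stored revisions of the document
git --git-dir=D:/repo show HEAD:document.txt          # print the latest stored version
git --git-dir=D:/repo show HEAD~1:document.txt        # print the previous revision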
Related
I am writing a task for Capistrano 3 and I need to get the current commit SHA1. How can I read that? Is there a variable for it?
I have seen fetch(:sha1) in some files but this isn't working for me.
I am deploying into a docker container, and I need to tag the image with the current sha1 (and ideally, skip the deployment if there is already an image corresponding to the current sha1)
Capistrano creates a file in the deployed folder containing the git revision. In looking at the task which creates that file, we can see how it obtains the revision: https://github.com/capistrano/capistrano/blob/master/lib/capistrano/tasks/deploy.rake#L224
So, it is obtaining the revision from fetch(:current_revision).
In the git specific tasks, we can see where it is set: https://github.com/capistrano/capistrano/blob/master/lib/capistrano/scm/tasks/git.rake#L62
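As a rough sketch of how you can use that value (the deploy path and image name below are placeholders): the deploy.rake task linked above writes the revision into a REVISION file in the release directory, and inside a task the same value is available via fetch(:current_revision).
# On the target server, after a successful deploy (path is a placeholder)
cat /var/www/myapp/current/REVISION                          # prints the sha of the deployed commit
# e.g. tag a docker image with the deployed revision
docker tag myapp:latest myapp:$(cat /var/www/myapp/current/REVISION)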
As a side note, Capistrano is probably not the best tool for what you are trying to do. Capistrano is useful for repeated deployments to the same server. Docker essentially is building a deployable container already containing the code. See the answers here for more detail: https://stackoverflow.com/a/39459945/3042016
Capistrano 3 uses a plugin system for the version manager being used (git, svn, etc.).
The current_revision is delegated to the version manager plugin and I don't understand how to access it...
In the meantime, a dirty solution would be:
set :current_revision, (lambda do
  # chomp the trailing newline so the sha can be used directly (e.g. as a docker tag)
  `git rev-list --max-count=1 #{fetch(:branch)}`.chomp
end)
But I'm still waiting for a proper solution that would instead manage to invoke the right task from the SCM plugin.
I'm thinking about a good deployment strategy for Magento. I have already managed to deploy code with git from my local installation to my staging server. (The jump to live is not a problem then.)
Now I'm thinking about how to deploy backend changes like the following:
I'm adding a new attribute set and I want it to be available on my staging and later on the live server. Since these settings are in the database, I could just do a mysqldump and restore that dump on my staging/live systems.
But I can't do this, since the database also holds data like orders, articles (with current stock availability) and a lot more stuff which I don't want to deploy from my testing system.
How are others handling this deployment "problem"?
After some testing, I chose the extension Mageploy, which is easy to install via modman (I prefer modgit, which relies on the same data for installation) and already captures a lot of important backend settings.
If you need more, it's possible to extend it to cover additional backend settings yourself (and then contribute back to the git project; pull requests are considered quickly).
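For reference, installing an extension with modman looks roughly like this (a sketch; the repository URL is a placeholder you would replace with Mageploy's actual git URL):
# From the Magento root directory
modman init                       # creates the .modman directory
modman clone <mageploy-git-url>   # clones the extension and symlinks its files into Magento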
I'm working on a project that uses CouchDB, and it's the first time I've used it. Our process is to edit the CouchDB file, commit and push it to our git repo, and run a curl command to update the corresponding document in CouchDB, but we can also edit the document directly in the CouchDB interface.
Is this process correct? How can I ensure that the documents in CouchDB are the same version that I have in git if a user changed them through the interface?
I looked for tutorials on how to manage this; does anyone have a recommendation?
When you open the CouchDB interface, make sure you are pointing to the files in the git repository on your system.
One way to check is:
1. Make changes using the CouchDB interface; don't commit or push anything.
2. Open the location that houses your git repo of CouchDB files and run git status. It should show the files you modified in the interface as changed. If not, then your interface is pointing to files stored in some other location.
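If you want to compare what is live in CouchDB with what is committed in git, a rough sketch looks like this (the database name, document ID and file name are just examples, and the live document will contain a server-side _rev field that your git copy won't have):
# Fetch the live document over CouchDB's HTTP API
curl -s http://localhost:5984/mydb/mydoc > /tmp/live_doc.json
# Compare it with the version committed in git (bash process substitution)
diff <(git show HEAD:mydoc.json) /tmp/live_doc.json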
For example, we have a 20MB bzip2-compressed SQL file of development data that we'd like to have versioned along with our development code.
However, we don't want this file pulled down from the repo by default with every fresh clone/fetch.
One solution seems to be to store this large file in a separate repo and then link to it with a submodule. A developer would then fetch the DB file only when they need to retrieve and reset their development database. And when there's a schema change, the database file would be updated, committed to the external repo, and the submodule reference updated.
Is this a good development workflow? Or is there a better way of doing this?
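For concreteness, the submodule setup described above would look roughly like this (the repository URL and path are placeholders):
# In the main project repo: reference the dump repo as a submodule
git submodule add <dump-repo-url> db/dump
git commit -m "Add database dump repo as a submodule"
# A fresh clone does not fetch the submodule's contents until a developer asks for them:
git submodule update --init db/dump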
EDIT: The uncompressed SQL dump is 360MB.
EDIT: Github says "no", don't do this:
Database dumps
Large SQL files do not play well with version control systems such as
Git. If you are looking to provide your developers with the most
recent production dataset, we recommend using Dropbox for sharing
files like these among your developers.
I ended up making a simple web server serve the schema dump directory from the repo where the dumps are stored. That repo grew really quickly because the dumps are large, and it was slowing people down just to clone it when they had to bring up new nodes.
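With that setup, developers pull the dump over HTTP instead of through git; something like this (assuming a MySQL development database; the URL, file name and credentials are placeholders):
# Download the latest dump from the internal web server and load it into the local dev database
curl -O http://dumps.example.internal/dev_data.sql.bz2
bunzip2 -c dev_data.sql.bz2 | mysql -u dev -p dev_db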
I am using ProjectLocker as my SVN server. I am stuck at synchronizing files at run time with my local DB files. I am using TortoiseSVN.
From your comments, it sounds like you may not be familiar with some version control concepts. For new Subversion users, I recommend Chapter 1 of the Version Control With Subversion book. This will explain what a working copy is in more detail, and how Subversion keeps your data. Chapter 2 has more information on a basic work cycle. ProjectLocker takes care of all the svnadmin steps for you, so you can ignore those and look at how to check out, update, and commit.
The first thing you should do is create a staging directory where you keep any files that you're doing development on. You may need to copy your PHP, CSS, DB files and so on to that location. You then run the TortoiseSVN equivalent of svn import to upload all the files to your server. Once you've imported them, back up the directory you just created, and create an empty working directory. Run the TortoiseSVN equivalent of svn checkout and you will pull down all the files in your repository. Once you have that, Subversion will take care of identifying which changes can be merged and which will need manual intervention as you make changes, run updates to pull changes from other users, and commit.
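The command-line equivalents of those TortoiseSVN steps look roughly like this (the repository URL and paths are placeholders for your ProjectLocker details):
# One-time import of your staging directory into the repository
svn import /path/to/staging https://example.projectlocker.com/svn/myproject/trunk -m "Initial import"
# Check out a fresh working copy and work inside it
svn checkout https://example.projectlocker.com/svn/myproject/trunk workingcopy
cd workingcopy
# ...edit files...
svn update                        # pull in changes committed by other users
svn commit -m "Describe change"   # send your changes to the server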
If you wish to upload files to a remote location after commits and you have a paid account, you can use ProjectLocker's remote deployment solution to FTP a particular Subversion directory over to your actual server for deployment.
I apologize if this is a little vague, but the scope of your question is quite broad, and so I wanted to give you as concise an answer as possible while still addressing your needs.