How does GitHub store your repository files? - database

I feel a bit stupid asking, but I want to know how GitHub and Dropbox store user files, because I have a similar problem: I need to store users' project files.
Is it just a matter of storing the project files somewhere on the server and keeping the location in a database field, or are there better methods?
Thanks.

GitHub uses Git to store repositories, and accesses those repos from their Ruby application. They used to do this with Grit, a Ruby library. Grit was written to implement Git in Ruby but has since been replaced by Rugged, which provides Ruby bindings to libgit2. There are Git reimplementations in other languages, such as JGit for Java and Dulwich for Python. This presentation gives some details about how GitHub has changed over the years, and it is worth watching or browsing the slides.
If you wanted to store Git repositories, you would store them on a filesystem (or a cluster of filesystems), keep a pointer in your database to where each repository lives, and then use a library like Rugged, JGit, or Dulwich to read from the Git repository.
Dropbox stores files on Amazon's S3 service and then implements some wrappers around that for security and so on. This paper describes the protocol that Dropbox uses.
The actual question you've asked is how do you store user files. The simple answer is... on the filesystem. There are plugins for a lot of popular web frameworks for doing user file uploads and file management. Django has Django-Filer for instance. The difficulty you'll encounter in rolling your own file upload management system is building a sensible way to do permissions (so users can only download the files they are entitled to download), so it is worth looking into how the various framework plugins do it.
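A minimal sketch of the filesystem-plus-database approach described above, using only Python's standard library. The table layout, the helper names (`init_db`, `save_upload`, `read_for_user`), and the owner-only permission rule are assumptions made for illustration; a real application would more likely rely on its framework's upload and permission machinery.

```python
import sqlite3
import uuid
from pathlib import Path


def init_db(conn):
    """Create the table that maps file ids to owners and on-disk locations."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " id TEXT PRIMARY KEY,"
        " owner_id INTEGER NOT NULL,"
        " original_name TEXT NOT NULL,"
        " stored_path TEXT NOT NULL)"
    )


def save_upload(conn, storage_root, owner_id, original_name, data):
    """Write the bytes to disk and record where they went."""
    file_id = uuid.uuid4().hex
    # Use a generated name on disk so a user-supplied name never becomes a path.
    user_dir = Path(storage_root) / str(owner_id)
    user_dir.mkdir(parents=True, exist_ok=True)
    target = user_dir / file_id
    target.write_bytes(data)
    conn.execute(
        "INSERT INTO files (id, owner_id, original_name, stored_path)"
        " VALUES (?, ?, ?, ?)",
        (file_id, owner_id, original_name, str(target)),
    )
    conn.commit()
    return file_id


def read_for_user(conn, file_id, user_id):
    """Return the file contents only if user_id owns the file (assumed rule)."""
    row = conn.execute(
        "SELECT owner_id, stored_path FROM files WHERE id = ?", (file_id,)
    ).fetchone()
    if row is None:
        raise FileNotFoundError(file_id)
    owner_id, stored_path = row
    if owner_id != user_id:
        raise PermissionError("user may not read this file")
    return Path(stored_path).read_bytes()
```

The database only holds metadata and a path, so the permission check happens before the file is ever opened. Sharing a file with other users would need an extra table or column, which is left out here.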

Related

Heroku: where can I save files?

I have a Telegram bot that saves users' audio messages and photos in the repository and stores only their paths in the database. I deployed it on PythonAnywhere and everything works.
Before that, I tried to deploy it on Heroku and ran into the problem that you can't store files there and everything has to go through a database.
Do I understand correctly that I need to create a database field that stores the file itself, or are there other ways?
You could use, for example, Cloudinary. They provide 25GB of bandwidth for free. The service is intended for pictures but works well with other files too, and it has a good API for many programming languages (not sponsored).

Using Apache Solr for Project Archive

In our company there are many projects, each of which contains several kinds of information, e.g. source code, project information, bug reports, or emails. This information is not stored in a central place, so if you want to find a problem that was solved in a past project, you have to search each system yourself.
The idea now is to build a searchable project archive. We want to use Apache Solr to create a web app with which you can search across all of this information.
Indexing PDF, Word, or Java files is not the problem in this case. The question is what the best solution is for gathering all the files from the different systems. The documents live in systems like MS SharePoint, Atlassian Confluence, Jira, SVN, or Git.
What is the best strategy for exporting all the information from the different systems to a central place where indexing can easily be done, ideally automatically?

How to download an ephemeral file from Heroku Cedar

I have a Rails project hosted on Heroku Cedar that does the following:
crawls the daily news feed and stores the items in the database
manually judge the feeds and classify them into categories
use the judgments to build a classifier that automatically classifies new incoming feed
iteratively improve the classification with additional judgments
The problem is that the classifier needs to write to a file. However, when I run the scripts on Heroku Cedar, the file they create is ephemeral and does not persist.
My questions are:
Is there a way to download the ephemeral file I created by running a script on Heroku?
What's a better way to handle a situation like this?
In short, no. You want to store any generated data in some sort of persistent file or data store. You should look at pushing these files to S3 or similar.
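The suggestion to push generated files to S3 could look roughly like the sketch below, written in Python for brevity even though the project in question is a Rails app. The `persist_file` and `restore_file` helpers, and whatever bucket name and key you pass to them, are hypothetical. `upload_file` and `download_file` are methods of the boto3 S3 client; boto3 is a third-party package and is imported lazily here, so any object offering the same two methods can be passed in instead.

```python
from pathlib import Path


def _default_client():
    # Assumption: boto3 is installed and AWS credentials are available
    # in the environment (for example as config vars on the host).
    import boto3
    return boto3.client("s3")


def persist_file(local_path, bucket, key, client=None):
    """Copy a file from the ephemeral disk to durable object storage."""
    if client is None:
        client = _default_client()
    client.upload_file(str(local_path), bucket, key)


def restore_file(local_path, bucket, key, client=None):
    """Fetch the stored copy back onto the local disk, e.g. after a restart."""
    if client is None:
        client = _default_client()
    Path(local_path).parent.mkdir(parents=True, exist_ok=True)
    client.download_file(bucket, key, str(local_path))
```

A script would call `persist_file` right after the classifier writes its file and `restore_file` before the next run, so nothing depends on the ephemeral disk surviving between runs.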

Best Practice for Location of Java JSP Application Files in Tomcat Environment

My Java JSP application needs to store permanent files on the Tomcat web server. At the moment I save the files in the "/temp" folder of the system, but this folder gets cleared from time to time. In addition, the current location is hard-coded, which makes it inflexible (e.g. when moving to another server).
I would like to know if there is a best practice for defining and accessing a permanent directory in this configuration. In detail, where is the best place to define the app file directory, and how would I access it from within my Java application? The goal of this setup is to minimize the effort when (a) updating the application (i.e. deploying a new WAR file), and (b) moving from one server or OS to another (e.g. Unix, Windows, macOS).
The research I have done on this topic so far revealed that the following would be solutions (possibly amongst others):
1.) A custom subdirectory in the Tomcat installation directory.
What happens to the files if I deploy a new version to Tomcat via a WAR file?
Where do I define this directory so that it can be accessed from within my Java application?
2.) A separate directory in the file system.
What are good locations, or ways to determine a location, without knowing the system?
Where do I define this directory so that it can be accessed from within my Java application?
Thank you for your advice!
Essentially, you are creating 'a database' in the form of some files. In the world of Java EE and servlet containers, the only really general approach to this is to configure such a resource via JNDI. Tomcat and other containers have no concept of 'a place for persistent storage for webapps'. If a webapp needs persistent storage, the location has to be configured via JNDI, a -D system property, or a setting you send to the webapp at runtime. There's no convention or standard practice you can borrow.
You can pick a file system pathname by convention and document that convention (e.g. /var/something on Linux, something similar on Windows). But you won't necessarily be aligned with what anyone else is doing.

Common file system API for files in the cloud?

Our app is a sort-of self-service website builder for a particular industry. We need to be able to store the HTML and image files for each customer's site so that users can easily access and edit them. I'd really like to be able to store the files on S3, but potentially other places like Box.net, Google Docs, Dropbox, and Rackspace Cloud Files.
It would be easiest if there were some common file system API that I could use over these repositories, but unfortunately everything is proprietary. So I've got to implement something. FTP or SFTP is the obvious choice, but it's a lot of work. WebDAV will also be a pain.
Our server-side code is Java.
Please someone give me a magic solution which is fast, easy, standards-based, and will solve all my problems perfectly without any effort on my part. Please?
Not sure if this is exactly what you're looking for, but we built http://mover.io to address this kind of thing. We currently support 13 different endpoints, and we have a GUI and an API for interfacing with all these cloud storage providers.
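If no ready-made common API fits, one option is to define a small storage interface inside the application and write one adapter per backend. The sketch below shows the shape of that idea in Python for brevity (the asker's server code is Java, where the same structure would be an interface with several implementations). The `Storage` protocol, its method names, the two backends, and `copy_all` are all hypothetical; adapters for S3, Dropbox, and the others would wrap each provider's own SDK behind the same three methods.

```python
from pathlib import Path
from typing import Dict, Iterable, Protocol


class Storage(Protocol):
    """The minimal operations the application needs from any backend."""

    def read(self, key: str) -> bytes: ...
    def write(self, key: str, data: bytes) -> None: ...
    def delete(self, key: str) -> None: ...


class LocalStorage:
    """Backend that keeps files under a directory on the local disk."""

    def __init__(self, root):
        self.root = Path(root).resolve()

    def _path(self, key: str) -> Path:
        path = (self.root / key).resolve()
        # Refuse keys such as "../x" that would escape the storage root.
        if self.root not in path.parents:
            raise ValueError(f"invalid key: {key}")
        return path

    def read(self, key: str) -> bytes:
        return self._path(key).read_bytes()

    def write(self, key: str, data: bytes) -> None:
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def delete(self, key: str) -> None:
        self._path(key).unlink()


class MemoryStorage:
    """Backend that keeps everything in a dict; handy for tests."""

    def __init__(self):
        self._objects: Dict[str, bytes] = {}

    def read(self, key: str) -> bytes:
        return self._objects[key]

    def write(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def delete(self, key: str) -> None:
        del self._objects[key]


def copy_all(src: Storage, dst: Storage, keys: Iterable[str]) -> None:
    """Copy the given keys from one backend to another."""
    for key in keys:
        dst.write(key, src.read(key))
```

Application code that only depends on `Storage` does not need to change when a customer's files move from one provider to another; only the adapter passed in changes.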
