When users upload files or the system generates them, where are those files stored? The system keeps each file's header in a database and its body on a hard drive. With this approach performance does not suffer, because file bodies are never pulled from the database, and a file may have more than one body. For example, a picture may have two bodies stored against its single database header: a scaled JPG and a full-size JPG.
File system directory
The system is configured to use a directory on a hard drive. This directory contains the following subfolders:
- The data directory stores all files. It is split into subfolders (with many files in each) to stay within the OS limit on files per directory, and so that a user who opens the directory does not face an endless list of files (which may take ages to render).
- The tmp directory holds temporary files before they go into the database. When a user uploads a file, its body is created in tmp. When the upload completes, the system moves the body to data and creates the database header. The same process applies when the system copies a file: its body is copied into tmp and then moved to data, with a new database header created. A task scans the directory and removes abandoned files, judging by last access time.
- The third directory is periodically scanned for files containing data to feed into the system. A task scans this directory and removes unprocessed files, judging by last access time.
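The upload flow through tmp and data can be sketched roughly as follows. This is a minimal Python illustration, not the system's actual code: the function names, the `create_header` callback (standing in for the database insert), and the bucket rule `header_id // FILES_PER_BUCKET` are all assumptions made for the example. Only the tmp and data directory names and the hex-ID file naming come from this document.

```python
import os
import tempfile
import time
from pathlib import Path

FILES_PER_BUCKET = 1000  # assumed bucket size, not taken from the real system


def store_upload(root, chunks, create_header):
    """Write an upload into tmp, then move the finished body into data."""
    tmp_dir = Path(root) / "tmp"
    tmp_dir.mkdir(parents=True, exist_ok=True)
    # The body lives under a unique temporary name while the upload runs.
    fd, tmp_name = tempfile.mkstemp(dir=tmp_dir)
    with os.fdopen(fd, "wb") as out:
        for chunk in chunks:
            out.write(chunk)
    # Upload complete: create the header (hypothetical callback returning its ID).
    header_id = create_header()
    # Hypothetical sharding rule that keeps each data subfolder small.
    bucket = Path(root) / "data" / format(header_id // FILES_PER_BUCKET, "x")
    bucket.mkdir(parents=True, exist_ok=True)
    target = bucket / f"{header_id:x}.d"
    os.replace(tmp_name, target)  # a rename, since tmp and data share one root
    return header_id, target


def remove_abandoned(tmp_dir, max_idle_seconds, now=None):
    """Delete tmp files whose last access time is older than the limit."""
    now = time.time() if now is None else now
    removed = []
    for entry in sorted(Path(tmp_dir).iterdir()):
        if entry.is_file() and now - entry.stat().st_atime > max_idle_seconds:
            entry.unlink()
            removed.append(entry.name)
    return removed
```

Because the body is only moved into data once it is complete, an interrupted upload leaves nothing behind except a tmp file, which the cleanup pass removes later.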
File header and body
- Header is stored in a database and contains information about a file, such as its virtual folder, its file name, and when it was last updated. The file body is accessed through this database header. Every file header must have a file body, and a body without a header will eventually be removed by a task.
- Body is stored on a hard drive as an ordinary file which was uploaded or fed into the system.
The system gets a file's metadata (path, name, size, last update etc.) by querying its header from the database and then retrieves its body by reading the hard drive. In the picture above there are three file headers in the database. The first two files (bar.txt and my.doc) have only one body each on the hard drive (1f.d and 1c.d; these are hex numbers corresponding to database IDs, plus d for the default suffix, which is explained later). The last file header (foo.jpg) has two bodies on the hard drive (2e.d and 2e.o; the o suffix is also explained later). There is one more file body on the hard drive (2f.d), which has no header and will be removed by the cleaner task shortly.
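The naming rule described above (hex database ID plus a suffix) is simple enough to show in a few lines. The helper names below are hypothetical and the sketch is in Python purely for illustration.

```python
def body_name(header_id, suffix="d"):
    """Build a body file name: the header ID in hex, a dot, then the suffix."""
    return f"{header_id:x}.{suffix}"


def parse_body_name(name):
    """Split a body file name back into (header_id, suffix)."""
    stem, _, suffix = name.rpartition(".")
    return int(stem, 16), suffix
```

With this rule, header ID 31 maps to 1f.d, and the name 2e.o parses back to header ID 46 with the o suffix.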
File with multiple bodies
Additional bodies may be attached to a file header. The system uses a suffix (the file extension on the hard drive) to differentiate between bodies of the same file header: the d suffix marks the default file body, the o suffix the original body, and so on. For example, a picture is scaled before going into a user's gallery, but the system also stores its original unscaled body with the o suffix. If the proportions need to change, the system rescales the original body and replaces the default body with the result.
The database file header stores information only about its default file body. A new file header is created with its default body, and special actions must be performed to read or save additional bodies. Every database header must have a default file body attached.
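As an illustration of the picture example, the sketch below keeps the original under the o suffix and regenerates the default body from it. It is a simplified Python sketch under stated assumptions: the function names are invented, the `scale` argument is a placeholder for whatever image scaling the system really performs, and the data directory is treated as flat, ignoring its subfolder layout.

```python
from pathlib import Path


def body_path(data_dir, header_id, suffix):
    """Locate one body of a header; the suffix selects which body."""
    return Path(data_dir) / f"{header_id:x}.{suffix}"


def save_picture(data_dir, header_id, original, scale):
    """Store the unscaled original (o) and the scaled default body (d)."""
    body_path(data_dir, header_id, "o").write_bytes(original)
    body_path(data_dir, header_id, "d").write_bytes(scale(original))


def rescale(data_dir, header_id, scale):
    """Rebuild the default body from the original; the original is untouched."""
    original = body_path(data_dir, header_id, "o").read_bytes()
    body_path(data_dir, header_id, "d").write_bytes(scale(original))
```

Always rescaling from the o body, rather than from the current d body, avoids compounding quality loss across repeated changes.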
Deleting a file
When the system deletes a file, its header is removed from the database. The file body remains untouched on the hard drive. A task scans the hard drive and removes all bodies that have no database header. If the file had multiple bodies, the task removes those as well.
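A cleaner pass of this kind might look like the following Python sketch. The function name and the idea of passing the live header IDs in as a set are assumptions for the example; the real task presumably queries the database itself.

```python
from pathlib import Path


def remove_orphan_bodies(data_dir, live_header_ids):
    """Delete every body whose hex ID has no matching database header."""
    removed = []
    # sorted() collects the listing first, so deleting while looping is safe.
    for body in sorted(Path(data_dir).rglob("*.*")):
        if not body.is_file():
            continue
        try:
            header_id = int(body.stem, 16)
        except ValueError:
            continue  # not named like a body file; leave it alone
        if header_id not in live_header_ids:
            body.unlink()  # applies to every suffix (d, o, ...) of that ID
            removed.append(body.name)
    return removed
```

Because the check is made per ID rather than per suffix, all bodies of a deleted file disappear in the same pass.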
File with no body
- Such a case is only allowed while the system is creating a new file. A file body can only be saved based on its database header ID, so the header must be created before the body.
- Trying to read the file data results in an error and should be treated as a bug. Every header must have a body attached.
This case may occur as the result of restoring a database without restoring its file system directory: the database headers then have no files on the hard drive. Any database header which does not have a body on the hard drive will result in an error.
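After a restore, one way to find the affected headers before users hit errors is a consistency check along these lines. This is a hypothetical Python sketch: the function name is invented, the header IDs are assumed to have been read from the database already, and the data directory is treated as flat for brevity.

```python
from pathlib import Path


def headers_without_body(data_dir, header_ids):
    """Return the header IDs whose default (d) body is missing on disk."""
    data_dir = Path(data_dir)
    return [
        header_id
        for header_id in header_ids
        if not (data_dir / f"{header_id:x}.d").is_file()
    ]
```

An empty result means every header has its default body; anything else lists the headers that would fail on read.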
Retrieving a file
When a browser requests a file, it supplies a path to the file. The system uses this path to find the requested file header in the database and then locates the file body on the hard drive. The system then sends the file data back to the browser using the binary servlet.
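The lookup sequence can be sketched as below. This Python example only models the two steps (header by path, then body by ID); the `Header` class, its fields, and the in-memory header list are stand-ins for the database, and the servlet response itself is out of scope.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Header:
    id: int
    folder: str  # virtual folder, e.g. "/gallery"
    name: str    # file name, e.g. "foo.jpg"


def retrieve(headers, data_dir, requested_path):
    """Find the header for a requested path, then read its default body."""
    for header in headers:
        if f"{header.folder}/{header.name}" == requested_path:
            body = Path(data_dir) / f"{header.id:x}.d"
            # Raises FileNotFoundError if the header has no body on disk.
            return header, body.read_bytes()
    raise LookupError(f"no header for {requested_path}")
```

A missing header and a missing body surface as different exceptions here, which mirrors the distinction the document draws between a file that does not exist and a header with no body.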