Disaster Recovery

From LogicalDOC Community Wiki

The Aim: Mirroring LogicalDOC resources

As a DMS, LogicalDOC has to handle large amounts of sensitive and confidential documents. From the user's perspective it is important to guarantee information resilience even after a server failure or, more generally, a disaster that may involve the building or the geographical area. We also need a tool to quickly restore the data once the runtime environment becomes available again after the fault. It is important to note that what is described here is NOT a backup; it is only a mirroring system used to quickly restore documents after a disaster.

External procedure with Amazon S3

We aim to develop a mirroring procedure that puts LogicalDOC resources into a specific Amazon S3 bucket using Java and the AWS SDK. Below is a rough description of the main cases to handle when inspecting a single resource:

1) If the resource is locally new, simply execute a PUT

  For each inspected folder a special .cache file is created storing a list of all resources inside the folder itself.
  For each uploaded file the current MD5 hash is also stored

2) If the resource was locally modified, execute a DELETE (discarding any exceptions) followed by a PUT (we don't want to handle versions)

  To check if a resource is modified it is enough to check the MD5 against the one saved in the .cache file

3) If the resource was locally deleted, execute a DELETE

  To detect deletions we can use the current .cache file: every resource that is listed in the file but is no longer
  physically present in the folder can be considered deleted. Remote deletions have to be logged but must not stop the main thread.
  If the deleted resource is a folder, all remote elements inside this folder have to be deleted remotely as well.
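
The three cases above can be sketched as a comparison between the hashes saved in the .cache file and the hashes of the files currently in the folder. This is only a minimal sketch: the class name MirrorPlanner, the Action enum and the representation of the .cache content as a name-to-MD5 map are assumptions made for illustration, and the actual S3 calls are deliberately left out.

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: decides which remote operation each resource needs.
final class MirrorPlanner {

    /** Remote operation to run for one resource. */
    enum Action { PUT, DELETE_THEN_PUT, DELETE, SKIP }

    /** Returns the hex-encoded MD5 of the given content. */
    static String md5Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(content);
            // Left-pad with zeros so the result always has 32 hex digits.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /**
     * @param cached resource name -> MD5 as saved in the .cache file (assumed layout)
     * @param local  resource name -> MD5 of the file currently in the folder
     */
    static Map<String, Action> plan(Map<String, String> cached, Map<String, String> local) {
        Map<String, Action> actions = new TreeMap<>();
        for (Map.Entry<String, String> e : local.entrySet()) {
            String cachedHash = cached.get(e.getKey());
            if (cachedHash == null) {
                actions.put(e.getKey(), Action.PUT);             // case 1: locally new
            } else if (!cachedHash.equals(e.getValue())) {
                actions.put(e.getKey(), Action.DELETE_THEN_PUT); // case 2: locally modified
            } else {
                actions.put(e.getKey(), Action.SKIP);            // unchanged, nothing to do
            }
        }
        for (String name : cached.keySet()) {
            if (!local.containsKey(name)) {
                actions.put(name, Action.DELETE);                // case 3: locally deleted
            }
        }
        return actions;
    }
}
```

A caller would execute the returned actions against the bucket, logging (but not rethrowing) failures of the DELETE operations, and then rewrite the .cache file with the current hashes.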


The procedure will take all its configuration parameters from the central context.properties file and from additional properties files. Among other things, the procedure must support these optional limits:

  • Maximum size expressed in MB
  • Maximum number of resources
  • Maximum time expressed in minutes
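
As an illustration, the optional limits could be read from a java.util.Properties object and checked before each upload. The property keys used here (mirror.max.size.mb, mirror.max.resources, mirror.max.minutes) and the class name MirrorLimits are hypothetical; the real keys in context.properties are not defined on this page. A missing or empty key is treated as "no limit".

```java
import java.util.Properties;

// Hypothetical holder for the three optional limits of a mirroring run.
final class MirrorLimits {
    private static final long UNLIMITED = -1L;

    private final long maxBytes;
    private final long maxResources;
    private final long maxMillis;

    private MirrorLimits(long maxBytes, long maxResources, long maxMillis) {
        this.maxBytes = maxBytes;
        this.maxResources = maxResources;
        this.maxMillis = maxMillis;
    }

    /** Builds the limits from properties; the key names are assumptions. */
    static MirrorLimits fromProperties(Properties p) {
        return new MirrorLimits(
                scale(read(p, "mirror.max.size.mb"), 1024L * 1024L), // MB -> bytes
                read(p, "mirror.max.resources"),
                scale(read(p, "mirror.max.minutes"), 60_000L));      // minutes -> ms
    }

    private static long read(Properties p, String key) {
        String value = p.getProperty(key);
        return (value == null || value.trim().isEmpty())
                ? UNLIMITED
                : Long.parseLong(value.trim());
    }

    private static long scale(long value, long factor) {
        return value < 0 ? UNLIMITED : value * factor;
    }

    /** True while no configured limit has been reached, so the run may continue. */
    boolean allows(long bytesUploaded, long resourcesUploaded, long elapsedMillis) {
        return within(bytesUploaded, maxBytes)
                && within(resourcesUploaded, maxResources)
                && within(elapsedMillis, maxMillis);
    }

    private static boolean within(long used, long limit) {
        return limit < 0 || used < limit;
    }
}
```

The mirroring loop would call allows(...) before processing each resource and stop cleanly when it returns false, leaving the remaining resources to the next run.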

What about the partial file problem?

When the procedure inspects a folder, how can it tell whether a file is complete or still being written? We can disregard the question: when this happens, a later run will fix the issue, since the MD5 will have changed and the remote resource will be replaced by the updated local binary. This minimalistic approach is simple, but it does not guarantee a consistent remote mirror at a given point in time, and it may use a little more bandwidth. Another approach would be to change the write logic to use .tmp files that are renamed once the write completes; the mirroring procedure would then skip .tmp files to avoid uploading partial data.
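
The .tmp approach could look roughly like the sketch below, which uses only java.nio.file. The class name SafeWriter and its two methods are hypothetical and are not part of LogicalDOC; the sketch also assumes that the writer and the mirroring procedure agree on the .tmp suffix.

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing the write-then-rename idea.
final class SafeWriter {
    static final String TMP_SUFFIX = ".tmp";

    /** Writes the content to "<name>.tmp" first, then renames it to the final name. */
    static Path write(Path target, byte[] content) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + TMP_SUFFIX);
        Files.write(tmp, content);
        try {
            // Whether an existing target is replaced by an atomic move is
            // implementation specific (see the Files.move documentation).
            return Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            // Fallback for file systems without atomic rename.
            return Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    /** Lists the regular files of a folder that are safe to mirror, skipping .tmp files. */
    static List<Path> mirrorCandidates(Path folder) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(folder)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                if (Files.isRegularFile(entry) && !name.endsWith(TMP_SUFFIX)) {
                    result.add(entry);
                }
            }
        }
        return result;
    }
}
```

With this scheme a file only appears under its final name once it is completely written, so the mirroring procedure never sees a partial binary, at the cost of changing every place where LogicalDOC writes resources.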