File Change API
One of the main problems on a shared hosting system for file backup operations is to figure out which files have changed. Using INOTIFY on a 1T drive with a large number of small files and directories guarantees slow startup times, and a lot of context switching between kernel and userspace - generating additional load. On the other hand scanning disk for newly modified files is very IO intensive, and can kill the performance of the fastest disks.
CloudLinux File Change API is a kernel level technology with the user space interface that buffers lists of modified files in the kernel and then off-loads that list to user space daemon.
After that - any software (with enough permissions) can get a list of files that has been modified for the last 24 hours.
The software is very simple to use and produces the list of modified files. As such we expect file backup software, including integrated cPanel backup system to integrate with this API soon.
Usage and Integration
/usr/bin/cloudlinux-backup-helper is a utility for getting the list of changed files.
It is supposed to be run by a super user only.
Command line parameters:
-t | --timestamp retrieve file names for files modified after specified timestamp -u | --uid retrieve file names for particular UID only
protocol version (1 right now), timestamp (in seconds) - up to which time data was collected UID:absolute path to file changed UID:absolute path to file changed …
The timestamp in output is needed so you can clearly identify from which timestamp to get list of changed files next.
[root@localhost ~]# cloudlinux-backup-helper -t 1495533489 -u <UID> 1,1495533925 1001:/home/user2/public_html/output.txt 1001:/home/user2/public_html/info.php [root@localhost ~]# cloudlinux-backup-helper -t 1495533489 1,1495533925 1000:/home/user1/.bashrc 1001:/home/user2/public_html/output.txt 1001:/home/user2/public_html/info.php 1003:/home/user3/logs/data.log
/usr/bin/cloudlinux-backup-helper-uid is a SUID wrapper for the cloudlinux-backup-helper utility that enables an end user to get the list of files changed. It accepts timestamp argument only and retrieves data of the user who is running it only.
[user@localhost ~]$ cloudlinux-backup-helper-uid 1,1495530576 1000:/home/user/.bash_history [user@localhost ~]$ cloudlinux-backup-helper-uid -t 1495547922 1,1495548343 1000:/home/user/file1.txt 1000:/home/user/file2.txt
Installation and Configuration
CloudLinux OS 6 (requires Hybrid kernel) or 7 Kernel Version: 3.10.0-427.36.1.lve1.4.47
Installation and Configuration
To install cloudlinux-fchange system run:
yum install cloudlinux-fchange --enablerepo=cloudlinux-updates-testing
CloudLinux 6 Hybrid:
yum install cloudlinux-fchange --enablerepo=cloudlinux-hybrid-testing
Database containing list of modified files is located at /var/lve/cloudlinux-fchange.db by default.
Starting and Stopping
After successful installation the event collecting daemon starts automatically, providing all kernel-exposed data are in place.
To start daemon: CloudLinux 7:
systemctl start cloudlinux-file-change-collector
CloudLinux 6 Hybrid:
service cloudlinux-file-change-collector start
systemctl stop cloudlinux-file-change-collector
CloudLinux 6 Hybrid:
service cloudlinux-file-change-collector stop
To uninstall cloudlinux-fchange run:
yum remove cloudlinux-fchange
Configuration resides in /etc/sysconfig/cloudlinux-fchange. The following is the default configuration (see comments):
# sqlite database file path. If commented out a default value is used #database_path=/var/lve/cloudlinux-fchange.db # If uncommented paths starting with 'include' one are processed only # Pay attention this parameter is a regular string, not a regex # To include more than one item just specify several lines to include: # include=/one # include=/two # If uncommented exclude paths which contain 'exclude' # Pay attention this parameter is a regular string, not a regex # To exclude more than one item just specify several lines to exclude: # exclude=var # exclude=tmp # Daemon polling interval in seconds polling_interval=5 # Time to keep entries in days. Does not clean if commented out or zero time_to_keep=1 # User read-only mode minimal UID # If file change collector stopped, all users with UID >= user_ro_mode_min_uid # are restricted to write to their home directory. This prevents to miss # a file change event. # Value of -1 (default) allows to disable the feature user_ro_mode_min_uid=-1 # Minimal UID of events to be processed. # Events of users with UID less then specified are not handled. # By default 500 (non-system users for redhat-based systems) #minimal_event_uid=500 # SQLite shared lock prevents setting more restrictive locks. That is a # process cannot write to a database table when a concurrent process reads # from the table. As saving data to database is considered far more important # than getting them (data could be reread a second later after all), database # writer could try to terminate concurrent reading processes. Just set # terminate rivals to 'yes' to turn this ability on. # terminate_rivals=no # Events to be handled. Currently the following types of events are processed: # 1. file creation # 2. file deletion # 3. directory creation # 4. directory deletion # 5. file content/metadata modification # 6. file/directory attributes/ownership modification # 7. hardlink creation # 8. symlink creation # 9. file/directory moving/renaming # By default all events are processed. Keep in mind that events for a filepath # are cached, i.e if a file was deleted and then a file with the same absolute # name is created, only the deletion event is triggerred. Changing file # modification timestamp with command 'touch' will trigger modification event # as if a file content is modified. # Currently supported options are: # file_created, file_modified, file_deleted, dir_created, dir_deleted, # owner_changed, attrib_changed, moved, hardlink_created, symlink_created, all # Options that don't have 'file' or 'dir' prefix, applied to both files and # directories. To set more than one options, separate them with commas, # e.g. event_types=file_created,file_deleted,file_modified. Unknown options are # ignored. # # event_types=all
Please keep in mind, that current implementation implies that one process is writing to a database and another is reading from it. As reading sets shared lock to a database table, the writing process cannot write to the table until the lock is released. That’s why passing a timestamp to cloudlinux-backup-helpermatters: this way the number of records to be returned is substantially decreased, lowering the processing time and filtering out old records. Likewise, pay attention to narrowing the scope of events being recorded. Chances are that changing attributes, ownership, directory creation/deletion, symlink events are not relevant and there’s no need to keep them.
Using this options is dangerous, and might cause problems with CloudLinux File Change API.
The kernel exposes the functionality to folder.
enable - enable/disable the functionality. Write 1 to this file to enable, 0 to disable. If disabled, no events are coming to events file.
events - the modified files log itself. Events in the format <EVENT_ID>:<EVENT_TYPE_ID>:<USER_ID>:<FILE_PATH> are constantly appending to the end of the file if datacycle enabled. File events are never duplicated: if we have file modification event, we would not get file deletion event if the file has been later deleted. This events buffer has limited capacity, therefore from time to time, the events log requires flushing.
flush - a file for clearing events log. For flushing, the last event_id from the events file is written to this file. Right after this, events log is truncated to that event_id .
user_ro_mode - forbidding users with UIDs equal or bigger that set in this file writing to their home directories. At the boot, the file has -1. When it’s written positive value, say 500, the system starts effectively preventing users from modifying their home dirs (on write attempt a user gets ‘read-only filesystem’ error). This feature is designed to prevent users from updating their home dirs when events are not handled.
entries_in_buffer - just counter of log entries of events file.
min_event_uid - this file has minimal UID of events to be handled. Events from users with smaller UID are not handled. By default 500 (non-system users in redhat-based systems).