A Tiny, High-Performance File Integrity and Comparison Tool
“A truly great application will always fit on a floppy disk. Hopefully, someone out there still remembers what those were… But it’s not about the floppies, it’s about quality software!”© :-D
precizer is a lightweight and blazing-fast command-line application written entirely in pure C. It is designed for file integrity verification and comparison, making it particularly useful for checking synchronization results. The program walks directory trees, generating a database of files and their checksums for quick and efficient comparisons.
Built for both embedded platforms and large-scale clustered mainframes, precizer helps detect synchronization errors by comparing files and their checksums across different sources. It can also be used to analyze historical changes by comparing databases generated at different points in time from the same source.
Consider a scenario where two machines have large mounted volumes at /mnt1 and /mnt2, respectively, containing identical data. The goal is to verify, byte by byte, whether the contents are truly identical or if discrepancies exist.
- Run precizer on the first machine (e.g., hostname host1):
precizer --progress /mnt1
This command traverses the directory tree under /mnt1, creating a database file host1.db in the current directory. The --progress flag provides real-time progress updates, displaying the total traversed space and the number of processed files.
- Run precizer on the second machine (e.g., hostname host2):
precizer --progress /mnt2
This will generate a database file host2.db in the current directory.
- Copy host1.db and host2.db to one of the machines and run the following command to compare them:
precizer --compare host1.db host2.db
The output will display:
- Files that exist on host1 but are missing on host2, and vice versa.
- Files present on both hosts but with different checksums.
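The two kinds of findings listed above can be pictured as set operations over two mappings from relative path to checksum. The following Python sketch is an illustration only, not precizer's code: the function name compare_indexes and the sample paths and digests are hypothetical.

```python
def compare_indexes(first, second):
    """Compare two {relative_path: checksum} mappings.

    Returns (only_in_first, only_in_second, mismatched) as sorted lists.
    """
    only_in_first = sorted(set(first) - set(second))
    only_in_second = sorted(set(second) - set(first))
    mismatched = sorted(
        path
        for path in set(first) & set(second)
        if first[path] != second[path]
    )
    return only_in_first, only_in_second, mismatched


# Hypothetical data standing in for the contents of host1.db and host2.db.
host1 = {"abc/a.txt": "aa11", "abc/b.txt": "bb22", "abc/c.txt": "cc33"}
host2 = {"abc/a.txt": "aa11", "abc/b.txt": "ffff", "abc/d.txt": "dd44"}

only1, only2, diff = compare_indexes(host1, host2)
print("only on host1:", only1)
print("only on host2:", only2)
print("checksum mismatch:", diff)
```

Files missing on either side and files with differing checksums fall out of the same pass, which is why a single comparison run can report both.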
precizer stores only relative file paths in its database. For example, a file located at:
/mnt1/abc/def/aaa.txt
will be stored as:
abc/def/aaa.txt
without the /mnt1 prefix. Similarly, the corresponding file on /mnt2:
/mnt2/abc/def/aaa.txt
will also be stored as:
abc/def/aaa.txt
This ensures that even when files reside in different mount points or sources, they can still be compared accurately under the same relative paths and their respective checksums.
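A minimal Python sketch of this idea follows. It walks a directory tree, keys every file by its path relative to the traversal root, and hashes the content in chunks with SHA-512. The helper names sha512_of_file and build_index are hypothetical, and the temporary directories only stand in for /mnt1 and /mnt2; precizer itself is written in C and stores its data in SQLite.

```python
import hashlib
import os
import tempfile


def sha512_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks so large files are never loaded whole."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_index(root):
    """Map each file's path relative to root to its SHA-512 hex digest."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            relative = os.path.relpath(full_path, root).replace(os.sep, "/")
            index[relative] = sha512_of_file(full_path)
    return index


# Two stand-in "mount points" holding the same file under the same subpath.
with tempfile.TemporaryDirectory() as base:
    for mount in ("mnt1", "mnt2"):
        target = os.path.join(base, mount, "abc", "def")
        os.makedirs(target)
        with open(os.path.join(target, "aaa.txt"), "wb") as handle:
            handle.write(b"same content")
    index1 = build_index(os.path.join(base, "mnt1"))
    index2 = build_index(os.path.join(base, "mnt2"))

print(sorted(index1))    # ['abc/def/aaa.txt']
print(index1 == index2)  # True
```

Because the mount prefix never enters the key, the two indexes are directly comparable.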
Consider a scenario where a primary storage system has a backup copy. For example, this could be a data center storage and its Disaster Recovery copy.
Synchronization from the primary storage to the backup occurs periodically, but due to the massive data volumes, synchronization is most likely not performed byte-by-byte but rather by detecting metadata changes within the file system. In such cases, file size and modification time are taken into account, but the actual content is not verified byte by byte.
This approach makes sense because the primary data center and the Disaster Recovery site usually have high-speed communication channels, but a full byte-by-byte synchronization would take an unreasonably long time.
Tools like rsync allow both types of synchronization — metadata-based and byte-by-byte — but they have one major drawback: state is not preserved between sessions.
The following scenario illustrates the issue:
- Given: Server "A" and Server "B" (Primary Data Center and Disaster Recovery)
- Some files have been modified on Server "A".
- The rsync algorithm detects them based on changes in size and modification time and synchronizes them to Server "B".
- Multiple connection failures occur during synchronization between the Primary Data Center and the Disaster Recovery site.
- To verify data integrity (i.e., ensuring that files on "A" and "B" are identical byte by byte), rsync is often used with byte-by-byte comparison. The process works as follows:
  - rsync is launched on Server "A" with the --checksum mode, attempting to compute checksums sequentially on both "A" and "B" in a single session.
  - This process takes an extremely long time for large-scale storage systems.
- Since rsync does not save computed checksums between sessions, it introduces several technical challenges:
  - If the connection drops, rsync terminates the session, and on the next run, everything must start from scratch! Given the huge data volumes, performing a byte-by-byte verification for full data integrity becomes an impossible task.
- Storage subsystem failures can also lead to binary inconsistencies. In such cases, file system metadata cannot reliably determine whether file contents on "A" and "B" are truly identical.
- Over time, errors accumulate, increasing the risk of maintaining an inconsistent Disaster Recovery copy of system "A" on system "B", rendering the entire Disaster Recovery effort useless. Standard utilities do not detect these inconsistencies, and technical personnel may be completely unaware of data integrity problems in the Disaster Recovery storage.
- To overcome these limitations, precizer was developed. The program identifies exactly which files differ between "A" and "B" so that they can be resynchronized with the necessary corrections. The tool operates at maximum speed (pushing hardware performance to its limits) because it is written in pure C and utilizes high-performance algorithms optimized for efficiency. The program is designed to handle both small files and petabyte-scale data volumes, with no upper limits*.
- The name precizer comes from the word precision, implying something that enhances accuracy.
- The program precisely analyzes directory contents, including subdirectories, computing checksums for every encountered file while storing metadata in an SQLite database (a regular binary file).
- precizer is fault-tolerant and can resume execution from the point of interruption. For example, if the program is terminated via Ctrl+C while analyzing a petabyte-scale file, it will NOT restart from the beginning but continue exactly where it left off using previously recorded data in the database. This significantly saves resources, time, and effort for system administrators.
- The program can be interrupted at any time using any method, and this is completely safe for both the scanned data and the database created by precizer.
- If the program is intentionally or accidentally stopped, there is no need to worry about losing progress. All results are fully preserved and can be used in subsequent runs.
- The checksum calculations rely on the fast and well-established SHA512 algorithm. With a 512-bit digest, an accidental collision is practically impossible, even when analyzing a single massive file. If two large files differ by just one byte, their SHA512 checksums will differ for all practical purposes. Weaker functions give much less assurance: CRC32 has only a 32-bit output, and practical collisions have been demonstrated for SHA1.
- The algorithms in precizer are designed to make it easy to keep the database up to date without having to recalculate everything from scratch. Simply run the program with the --update parameter, and new files will be added to the database, while entries for deleted files will be removed. If a file has been modified and its size has changed, its SHA512 checksum will be recalculated and updated in the database.
- During --update, entries for missing files are removed, but records for inaccessible files (permission denied) are kept by default. This protection exists because permissions can temporarily change (ownership, ACLs, transient mount issues), and dropping records in that state would silently erase valid database history. Using --db-drop-inaccessible with --update is intended only when those database records must be dropped.
- When --progress is enabled, warnings and errors collected during a session are printed in one block right before exit so important messages (for example, file access issues) are not lost in routine logs.
- The --quiet-ignored option suppresses per-file log lines for paths filtered by --ignore and --include. This helps keep program logs free of extra messages once ignore regular expressions are tuned and stable in use; other warnings and errors remain visible.
- There is an option to consider not only the file size when updating the database but also the file’s status-change (ctime) and modification (mtime) timestamps. With it, any change in these timestamps triggers an SHA512 checksum recalculation and an update in the database. For example, if a file’s ctime changes but its size remains the same, the checksum will NOT be recalculated if only the --update parameter is used. To force checksum recalculation for such files, add --watch-timestamps. This option is disabled by default because timestamps can change even when the file’s content remains the same; for example, chmod or chown updates ctime.
- precizer can be used as a security monitoring tool, detecting unauthorized file modifications where contents might have changed while metadata remains untouched.
- The program never modifies, deletes, moves, or copies any files or directories it processes. All it does is list files, compute their checksums, and update them in the database. All changes are strictly confined to the database.
- Performance is primarily limited by disk subsystem speed. Each file is read byte by byte, and its SHA512 checksum is computed.
- The program runs very fast thanks to SQLite and FTS libraries (man 3 fts).
- Command-line argument parsing is handled via the ARGP library.
- Regular expression support is provided by PCRE2.
- The program is safe to use with an enormous number of files, directories, and deeply nested subdirectories. Thanks to the FTS library, recursion is avoided, preventing stack overflows even with extreme levels of nesting.
- Due to its compact and portable codebase, the program can be used even on specialized devices like NAS systems, embedded platforms, or IoT devices.
- The database contents created by precizer can be explored with DB Browser for SQLite.
- The --help option is designed to be as detailed as possible, specifically to assist users who may not have advanced technical knowledge.
- Author contact options:
Download executables from https://github.com/precizer/precizer/releases/latest/ for:
- Linux x86_64 precizer_linux_x86_64_portable.zip
- Linux arm aarch64 precizer_linux_aarch64_portable.zip
- macOS arm64 precizer_macos_arm64.zip
The release packages contain portable executables in a zip archive.
- The Linux build is a single executable, statically linked ELF binary not tied to any specific distribution. It can be run immediately on almost any Linux distro and does not require external shared libraries.
- The binary is produced by GitHub CI/CD, then compressed with UPX (the executable packer). The self-extracting compressed binary is then placed into a ZIP archive for convenient download. The file can be extracted from the archive and run directly.
- Static linking is not supported on macOS, so running the downloaded application requires the following libraries to be available on the system: sqlite3, pcre2, argp and fts.
- The author has set up an automated build system using GitHub Workflows and will continue maintaining new versions.
- The author is not willing to personally package and maintain precizer for all existing operating system distributions.
- If packaging for a specific distribution encounters major challenges adapting the code, the author can help with supporting the initiative and optimizing the program for the target distro or package manager. Contact details are in the “Questions & Bug Reports” section.
Building the program is already supported via Docker. Several tuned platforms are prepared and can be selected as the build distribution. Successfully tested distros:
- Almalinux
- Alpine
- Arch
- Debian
- Gentoo
- Rocky
- Ubuntu
Configuration details and installed libraries are listed in the corresponding Dockerfiles under .docker/.
Build targets use the form docker-<distro>-<build> (for example, the distro debian with the build dynamic-production gives docker-debian-dynamic-production).
make docker-gentoo-production
This builds a production binary using the Gentoo Docker container.
make docker-ubuntu-production
This builds the same production target using Ubuntu.
After the build completes, an executable precizer appears in the project directory (built inside the container). The main benefit of using Docker is that a full build toolchain, libraries, and their dependencies are not required on the host system; running Docker yields the binary. The next step is choosing the binary variant. When in doubt, make portable is a good starting point. All available build variants are described below.
git clone --depth=1 https://github.com/precizer/precizer.git
cd precizer
make portable
The result is a single statically linked, self-extracting compressed UPX ELF file with no dynamic dependencies. It contains the whole program and can be run on almost any modern Linux distribution. The file can be copied to any platform of the same architecture (x64/arm/etc).
The program is optimized for maximum portability.
Compilation and linking flags: -static -O2 -mtune=generic
Docker alternative:
make docker-ubuntu-portable
or replace -ubuntu- with any distro from the list above.
make production
The result is a statically linked, self-extracting compressed UPX ELF file tuned for the local CPU. It contains the whole program, can be run on the local machine, and will use the maximum available CPU features.
The program is optimized for maximum possible performance on local hardware.
Compilation and linking flags: -static -O3 -march=native
Docker alternative:
make docker-ubuntu-production
or replace -ubuntu- with any distro from the list above.
make dynamic-production
The result is an ELF executable of about 50 kilobytes. It is tuned for the local CPU and dynamically linked against libraries installed on the system; it is also self-extracting and UPX-compressed. It can be built and run on the local machine if libraries such as sqlite3, pcre2, argp and fts are installed.
The binary is optimized for maximum performance and minimal size.
Compilation flags: -O3 -march=native
Docker alternative:
make docker-ubuntu-dynamic-production
or replace -ubuntu- with any distro from the list above.
The test sets in the tests/examples/ directory can be used to evaluate the program’s capabilities.
Test execution:
git clone https://github.com/precizer/precizer.git
cd precizer
make tests
Just copy the resulting precizer executable to any location listed in the $PATH environment variable for quick invocation.
Install build and compile tools on Linux
Arch:
sudo pacman -S --noconfirm base-devel gcc-libs sqlite pcre2 upx
Debian and Ubuntu:
sudo apt -y install gcc make libpcre2-dev libsqlite3-dev upx-ucl
Alpine:
sudo apk add --update build-base pcre2-dev pcre2-static fts-dev argp-standalone sqlite-dev upx
Almalinux and Rocky:
sudo dnf -y install gcc make sqlite sqlite-devel glibc-devel pcre2 pcre2-devel upx pcre2-static glibc-static
Gentoo:
echo "dev-libs/libpcre2 static-libs" >> /etc/portage/package.use/libpcre2;
emerge dev-libs/libpcre2 app-arch/upx
make purge
Add files to two databases and compare them with each other:
precizer --progress --database=database1.db tests/examples/diffs/diff1
precizer --progress --database=database2.db tests/examples/diffs/diff2
precizer --compare database1.db database2.db
The comparison of database1.db and database2.db databases is starting…
Starting database file database1.db integrity check…
Database database1.db has been verified and is in good condition
Starting database file database2.db integrity check…
Database database2.db has been verified and is in good condition
These files are no longer in the database1.db but still exist in the database2.db
path1/AAA/BCB/CCC/b.txt
These files are no longer in the database2.db but still exist in the database1.db
path2/AAA/ZAW/D/e/f/b_file.txt
The SHA512 checksums of these files do not match between database1.db and database2.db
2/AAA/BBB/CZC/a.txt
3/AAA/BBB/CCC/a.txt
4/AAA/BBB/CCC/a.txt
path1/AAA/ZAW/D/e/f/b_file.txt
path2/AAA/BCB/CCC/a.txt
Comparison of database1.db and database2.db databases is complete
The precizer completed its execution without any issues
Database Update
Run the previous example again. The first attempt ends with a warning:
precizer --progress --database=database1.db tests/examples/diffs/diff1
The database database1.db was previously created and already contains data with files and their checksums. Use the --update option only when it is certain that the database needs to be updated and when file information (including changes, deletions, and additions) should be synchronized with the database.
ERROR: The precizer process terminated unexpectedly due to an error
The --update parameter must be included. This parameter is required to protect the database from data loss caused by accidental execution.
precizer --update --progress --database=database1.db tests/examples/diffs/diff1
Primary database file name: database1.db
Starting database file database1.db integrity check…
Database database1.db has been verified and is in good condition
File system traversal initiated to calculate file count and storage usage
Total size: 45B, total items: 58, dirs: 46, files: 12, symlnks: 0
The database file database1.db has NOT been modified since the program was launched
The precizer completed its execution without any issues
Make the following adjustments:
# Modify a file
echo -n " " >> tests/examples/diffs/diff1/1/AAA/BCB/CCC/a.txt
# Add a new file
touch tests/examples/diffs/diff1/1/AAA/BCB/CCC/c.txt
# Remove a file
rm tests/examples/diffs/diff1/path2/AAA/ZAW/D/e/f/b_file.txt
Run precizer again with the --update parameter:
precizer --update --progress --database=database1.db tests/examples/diffs/diff1
Primary database file name: database1.db
Starting database file database1.db integrity check…
Database database1.db has been verified and is in good condition
File system traversal initiated to calculate file count and storage usage
Total size: 43B, total items: 58, dirs: 46, files: 12, symlnks: 0
The --update option has been used, so the information about files will be updated against the database database1.db
File traversal started
These files have been added or changed and those changes will be reflected against the DB database1.db:
1/AAA/BCB/CCC/a.txt changed size & ctime & mtime rehashed
1/AAA/BCB/CCC/c.txt added
File traversal complete
Total size: 43B, total items: 58, dirs: 46, files: 12, symlnks: 0
These files are no longer exist or ignored and will be deleted against the DB database1.db:
path2/AAA/ZAW/D/e/f/b_file.txt
Start vacuuming the primary database…
The primary database has been vacuumed
The database file database1.db has been modified since the program was launched
The precizer completed its execution without any issues
Every time precizer runs, it traverses the file system and then checks whether a record for a specific file already exists in the database. In other words, the program prioritizes the current state of the file system on disk.
Directory traversal in precizer uses an algorithm similar to the one in rsync.
It's important to note that precizer will not recalculate SHA512 checksums for files that are already recorded in the database, as long as their size remains unchanged. If the --watch-timestamps argument is specified, the program will also consider the status-change time (ctime) and modification time (mtime) in addition to the file size.
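The decision of whether a file needs to be rehashed can be sketched in Python as follows. This is a simplified model and not precizer's implementation: it assumes that size alone is compared by default and that ctime and mtime are added to the comparison with --watch-timestamps. The FileMeta record and the needs_rehash function are hypothetical names.

```python
from collections import namedtuple

# Hypothetical record layout; precizer's real database schema may differ.
FileMeta = namedtuple("FileMeta", ["size", "ctime", "mtime"])


def needs_rehash(stored, current, watch_timestamps=False):
    """Return True when the checksum should be recalculated."""
    if stored is None:  # no record in the database yet: a new file
        return True
    if stored.size != current.size:
        return True
    if watch_timestamps:
        return (stored.ctime, stored.mtime) != (current.ctime, current.mtime)
    return False


stored = FileMeta(size=100, ctime=1000, mtime=1000)
touched = FileMeta(size=100, ctime=2000, mtime=1000)  # e.g. after chmod
grown = FileMeta(size=101, ctime=2000, mtime=2000)

print(needs_rehash(stored, touched))                         # False
print(needs_rehash(stored, touched, watch_timestamps=True))  # True
print(needs_rehash(stored, grown))                           # True
```

Skipping the hash when the recorded metadata still matches is what keeps repeated runs over large, mostly unchanged trees fast.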
Any new, deleted, or modified files between application runs will be processed accordingly. All changes will be reflected in the database if the --update parameter is specified.
Using the --silent mode. When this mode is enabled, the program does not produce any output on the screen. This is useful when precizer is used in scripts.
Add the --silent parameter to the previous example:
precizer --silent --update --progress --database=database1.db tests/examples/diffs/diff1
As a result, nothing will be displayed on the screen.
Additional Information in --verbose mode. This mode can be useful for debugging.
Add the --verbose parameter to the previous example:
precizer --verbose --update --progress --database=database1.db tests/examples/diffs/diff1
2025-01-25 09:55:59:820 src/parse_arguments.c:442:parse_arguments:Configuration: rational_logger_mode=VERBOSE
paths=tests/examples/diffs/diff1; database=database1.db; db_file_name=database1.db; verbose=yes; maxdepth=-1; silent=no; force=no; update=yes; watch-timestamps=no; progress=yes; compare=no, db-drop-ignored=no, dry-run=no, check-level=FULL, rational_logger_mode=VERBOSE
2025-01-25 09:55:59:820 src/parse_arguments.c:558:parse_arguments:Arguments parsed
2025-01-25 09:55:59:820 src/detect_paths.c:025:detect_paths:Checking directory paths provided as arguments
2025-01-25 09:55:59:820 src/file_availability.c:034:file_availability:Verify that the path tests/examples/diffs/diff1 exists
2025-01-25 09:55:59:820 src/file_availability.c:053:file_availability:The path tests/examples/diffs/diff1 is exists and it is a directory
2025-01-25 09:55:59:821 src/detect_paths.c:036:detect_paths:Paths detected
2025-01-25 09:55:59:821 src/init_signals.c:034:init_signals:Set signal SIGUSR2 OK:pid:604770
2025-01-25 09:55:59:821 src/init_signals.c:043:init_signals:Set signal SIGINT OK:pid:604770
2025-01-25 09:55:59:821 src/init_signals.c:052:init_signals:Set signal SIGTERM OK:pid:604770
2025-01-25 09:55:59:821 src/init_signals.c:055:init_signals:Signals initialized
2025-01-25 09:55:59:821 src/determine_running_dir.c:018:determine_running_dir:Current directory: /tmp
2025-01-25 09:55:59:821 src/db_determine_name.c:099:db_determine_name:Primary database file name: database1.db
2025-01-25 09:55:59:821 src/db_determine_name.c:105:db_determine_name:Primary database file path: database1.db
2025-01-25 09:55:59:821 src/db_determine_name.c:109:db_determine_name:DB name determined
2025-01-25 09:55:59:821 src/file_availability.c:034:file_availability:Verify that the path . exists
2025-01-25 09:55:59:821 src/file_availability.c:053:file_availability:The path . is exists and it is a directory
2025-01-25 09:55:59:821 src/file_availability.c:034:file_availability:Verify that the path database1.db exists
2025-01-25 09:55:59:821 src/file_availability.c:044:file_availability:The path database1.db is exists and it is a file
2025-01-25 09:55:59:821 src/db_determine_mode.c:128:db_determine_mode:Final value for config->sqlite_open_flag: SQLITE_OPEN_READWRITE
2025-01-25 09:55:59:821 src/db_determine_mode.c:129:db_determine_mode:Final value for config->db_initialize_tables: false
2025-01-25 09:55:59:821 src/db_determine_mode.c:131:db_determine_mode:DB mode determined
2025-01-25 09:55:59:821 src/db_test.c:061:db_test:Starting database file database1.db integrity check…
2025-01-25 09:55:59:821 src/db_test.c:082:db_test:The database verification level has been set to FULL
2025-01-25 09:55:59:821 src/db_test.c:126:db_test:Database database1.db has been verified and is in good condition
2025-01-25 09:55:59:822 src/db_get_version.c:087:db_get_version:Version number 1 found in database
2025-01-25 09:55:59:822 src/db_check_version.c:032:db_check_version:The database1.db database file is version 1
2025-01-25 09:55:59:822 src/db_check_version.c:061:db_check_version:The database database1.db is on version 1 and does not require any upgrades
2025-01-25 09:55:59:822 src/db_init.c:030:db_init:Successfully opened database database1.db
2025-01-25 09:55:59:822 src/db_init.c:118:db_init:The primary database and tables have NOT been initialized
2025-01-25 09:55:59:822 src/db_init.c:150:db_init:The primary database named database1.db is ready for operations
2025-01-25 09:55:59:822 src/db_init.c:167:db_init:The in-memory runtime_paths_id database successfully attached to the primary database database1.db
2025-01-25 09:55:59:822 src/db_init.c:174:db_init:Database initialization process completed
2025-01-25 09:55:59:822 src/db_compare.c:136:db_compare:Database comparison mode is not enabled. Skipping comparison
2025-01-25 09:55:59:822 src/db_contains_data.c:086:db_contains_data:The database database1.db has already been created previously
2025-01-25 09:55:59:822 src/db_validate_paths.c:192:db_validate_paths:The paths written against the database and the paths passed as arguments are completely identical
2025-01-25 09:55:59:822 src/file_list.c:143:file_list:File system traversal initiated to calculate file count and storage usage
2025-01-25 09:55:59:823 src/file_list.c:038:show_status:Total size: 43B, total items: 58, dirs: 46, files: 12, symlnks: 0
2025-01-25 09:55:59:825 src/db_get_version.c:087:db_get_version:Version number 1 found in database
2025-01-25 09:55:59:825 src/db_consider_vacuum_primary.c:025:db_consider_vacuum_primary:No changes were made. The primary database doesn't require vacuuming
2025-01-25 09:55:59:825 src/status_of_changes.c:049:status_of_changes:The database file database1.db has NOT been modified since the program was launched
2025-01-25 09:55:59:825 src/exit_status.c:027:exit_status:The precizer completed its execution without any issues
Non-recursive traversal using the --maxdepth parameter
tree tests/examples/4
tests/examples/4
├── AAA
│   ├── BBB
│   │   ├── CCC
│   │   │   └── a.txt
│   │   └── uuu.txt
│   └── tttt.txt
└── sss.txt
3 directories, 4 files
The --maxdepth=0 parameter completely disables recursion.
precizer --maxdepth=0 tests/examples/4
Primary database file name: myhost.db
The path myhost.db doesn't exist or it is not a file
The primary DB file not yet exists. Brand new database will be created
Recursion depth limited to: 0
File traversal started
These files will be added against the myhost.db database:
sss.txt
File traversal complete
Total size: 2B, total items: 5, dirs: 4, files: 1, symlnks: 0
Start vacuuming the primary database…
The primary database has been vacuumed
The database myhost.db has been modified since the last check (files were added, removed, or updated)
The precizer completed its execution without any issues
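The effect of a depth limit can be sketched in Python by pruning the directory list during traversal. This is only an illustration: precizer uses the FTS library in C, list_files is a hypothetical helper, and the only behavior taken from the example above is that a limit of 0 keeps just the files directly under the starting directory.

```python
import os
import tempfile


def list_files(root, maxdepth=-1):
    """Yield relative file paths under root.

    maxdepth=-1 means unlimited; maxdepth=0 lists only files directly in root.
    """
    root = os.path.abspath(root)
    for dirpath, dirnames, filenames in os.walk(root):
        relative_dir = os.path.relpath(dirpath, root)
        depth = 0 if relative_dir == "." else relative_dir.count(os.sep) + 1
        if maxdepth >= 0 and depth >= maxdepth:
            dirnames[:] = []  # stop os.walk from descending any further
        for name in sorted(filenames):
            path = os.path.join(relative_dir, name) if depth else name
            yield path.replace(os.sep, "/")


# Recreate the layout of tests/examples/4 in a temporary directory.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "AAA", "BBB", "CCC"))
    for relative in (
        "sss.txt",
        "AAA/tttt.txt",
        "AAA/BBB/uuu.txt",
        "AAA/BBB/CCC/a.txt",
    ):
        with open(os.path.join(root, *relative.split("/")), "w") as handle:
            handle.write("x")
    depth0 = sorted(list_files(root, maxdepth=0))
    everything = sorted(list_files(root))

print(depth0)      # ['sss.txt']
print(everything)
```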
Example of a Path to Ignore. To specify a pattern for ignoring files or directories, PCRE2 regular expressions can be used. Note: All paths in the regular expression must be specified as relative.
PCRE2 regular expressions can be tested and validated using https://regex101.com/.
To illustrate how a relative path looks, run a directory traversal without the --ignore option and check how the terminal displays the relative paths recorded in the database:
% tree -L 3 tests/examples/diffs
tests/examples/diffs
├── diff1
│   ├── 1
│   │   └── AAA
│   ├── 2
│   │   └── AAA
│   ├── 3
│   │   └── AAA
│   ├── 4
│   │   └── AAA
│   ├── path1
│   │   └── AAA
│   └── path2
│       └── AAA
└── diff2
    ├── 1
    │   └── AAA
    ├── 2
    │   └── AAA
    ├── 3
    │   └── AAA
    ├── 4
    │   └── AAA
    ├── path1
    │   └── AAA
    └── path2
        └── AAA
26 directories, 0 files
precizer --ignore="^diff1/1/.*" tests/examples/diffs
In this example, the initial traversal path is ./tests/examples/diffs, and the generated ignore path is ./tests/examples/diffs/diff1/1/ along with all its subdirectories (/*).
Primary database file name: myhost.db
The path myhost.db doesn't exist or it is not a file
The primary DB file not yet exists. Brand new database will be created
File traversal started
These files will be added against the myhost.db database:
diff1/1/AAA/BCB/CCC/a.txt ignored & not added
diff1/1/AAA/ZAW/A/b/c/a_file.txt ignored & not added
diff1/1/AAA/ZAW/D/e/f/b_file.txt ignored & not added
diff1/2/AAA/BBB/CZC/a.txt
diff1/3/AAA/BBB/CCC/a.txt
diff1/4/AAA/BBB/CCC/a.txt
diff1/path1/AAA/BCB/CCC/a.txt
diff1/path1/AAA/ZAW/A/b/c/a_file.txt
diff1/path1/AAA/ZAW/D/e/f/b_file.txt
diff1/path2/AAA/BCB/CCC/a.txt
diff1/path2/AAA/ZAW/A/b/c/a_file.txt
diff1/path2/AAA/ZAW/D/e/f/b_file.txt
diff2/1/AAA/BCB/CCC/a.txt
diff2/1/AAA/ZAW/A/b/c/a_file.txt
diff2/1/AAA/ZAW/D/e/f/b_file.txt
diff2/2/AAA/BBB/CZC/a.txt
diff2/3/AAA/BBB/CCC/a.txt
diff2/4/AAA/BBB/CCC/a.txt
diff2/path1/AAA/BCB/CCC/a.txt
diff2/path1/AAA/BCB/CCC/b.txt
diff2/path1/AAA/ZAW/A/b/c/a_file.txt
diff2/path1/AAA/ZAW/D/e/f/b_file.txt
diff2/path2/AAA/BCB/CCC/a.txt
diff2/path2/AAA/ZAW/A/b/c/a_file.txt
File traversal complete
Total size: 97B, total items: 114, dirs: 90, files: 24, symlnks: 0
Start vacuuming the primary database…
The primary database has been vacuumed
The database myhost.db has been modified since the last check (files were added, removed, or updated)
The precizer completed its execution without any issues
Enjoy life!
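The way an ignore pattern applies to relative paths can be tried out with a short Python sketch. It uses Python's re module rather than PCRE2; the two engines agree on a simple anchored pattern like this one but can differ on advanced syntax. The function split_by_ignore is a hypothetical helper, not part of precizer.

```python
import re


def split_by_ignore(relative_paths, ignore_patterns):
    """Partition paths into (kept, ignored) using regular expressions."""
    compiled = [re.compile(pattern) for pattern in ignore_patterns]
    kept, ignored = [], []
    for path in relative_paths:
        if any(regex.search(path) for regex in compiled):
            ignored.append(path)
        else:
            kept.append(path)
    return kept, ignored


# A few of the relative paths from the output above.
paths = [
    "diff1/1/AAA/BCB/CCC/a.txt",
    "diff1/2/AAA/BBB/CZC/a.txt",
    "diff2/1/AAA/BCB/CCC/a.txt",
]
kept, ignored = split_by_ignore(paths, [r"^diff1/1/.*"])
print("ignored:", ignored)
print("kept:", kept)
```

The leading ^ anchors the pattern to the start of the relative path, so diff2/1/... is not affected by a pattern written for diff1/1/.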
Repeat the same example, but this time without the --ignore option to include the three previously ignored files:
precizer --update tests/examples/diffs
Primary database file name: myhost.db
Starting database file myhost.db integrity check…
Database myhost.db has been verified and is in good condition
The --update option has been used, so the information about files will be updated against the database myhost.db
File traversal started
These files have been added or changed and those changes will be reflected against the DB myhost.db:
diff1/1/AAA/BCB/CCC/a.txt add
diff1/1/AAA/ZAW/A/b/c/a_file.txt add
diff1/1/AAA/ZAW/D/e/f/b_file.txt add
File traversal complete
Total size: 97B, total items: 114, dirs: 90, files: 24, symlnks: 0
Start vacuuming the primary database…
The primary database has been vacuumed
The database file myhost.db has been modified since the program was launched
The precizer completed its execution without any issues
Continuation of the previous example (Example 6).
Multiple regular expressions for ignoring files can be specified simultaneously by repeating the --ignore option.
The database will be cleaned of references to files matching the regular expressions provided via the --ignore arguments: "^diff1/1/.*" and "^diff2/1/.*".
The --db-drop-ignored parameter must be explicitly specified to remove database entries for files that match the patterns passed through the --ignore option.
No changes were made to the file system, but the ignored files will be removed from the database.
# Update the database by removing entries for files that were marked as ignored:
precizer \
--update \
--db-drop-ignored \
--ignore="^diff1/1/.*" \
--ignore="^diff2/1/.*" \
tests/examples/diffs
Primary database file name: myhost.db
Starting database file myhost.db integrity check…
Database myhost.db has been verified and is in good condition
The --update option has been used, so the information about files will be deleted against the database myhost.db
These files are no longer exist or ignored and will be deleted against the DB myhost.db:
diff1/1/AAA/BCB/CCC/a.txt clean ignored
diff1/1/AAA/ZAW/A/b/c/a_file.txt clean ignored
diff1/1/AAA/ZAW/D/e/f/b_file.txt clean ignored
diff2/1/AAA/BCB/CCC/a.txt clean ignored
diff2/1/AAA/ZAW/A/b/c/a_file.txt clean ignored
diff2/1/AAA/ZAW/D/e/f/b_file.txt clean ignored
Start vacuuming the primary database…
The primary database has been vacuumed
The database file myhost.db has been modified since the program was launched
The precizer completed its execution without any issues
Using --ignore together with --include
# Remove the old database and create a new one, then populate it with data:
rm -i "${HOST}.db"
precizer tests/examples/diffs
This variant uses regular expressions.
The --include option takes PCRE2 regular expressions for relative paths that need to be included. The specified relative paths will be included even if they were excluded using one or more --ignore parameters. Multiple regular expressions can be specified by repeating --include.
PCRE2 regular expressions can be checked and tested using https://regex101.com/.
The DB will be cleaned of references to files matching the regular expressions provided in the --ignore arguments: "^.*/path2/.*" and "^diff2/.*", but paths matching the patterns in --include will remain in the database.
The --db-drop-ignored parameter must be specified additionally to remove references to files matching the regular expressions passed via the --ignore options from the database.
# Update the database, removing references to files that were marked as ignored,
# except for paths matching the --include patterns.
precizer --update \
--progress \
--ignore="^.*/path2/.*" \
--ignore="^diff2/.*" \
--include="^diff2/1/AAA/ZAW/A/b/c/.*" \
--include="^diff2/path1/AAA/ZAW/.*" \
--include="^diff1/path2/AAA/ZAW/A/b/c/a_file\..*" \
--db-drop-ignored \
tests/examples/diffs
Primary database file name: myhost.db
Starting database file myhost.db integrity check…
Database myhost.db has been verified and is in good condition
The --update option has been used, so the information about files will be deleted against the database myhost.db
These files are no longer exist or ignored and will be deleted against the DB myhost.db:
diff1/path2/AAA/BCB/CCC/a.txt clean ignored
diff1/path2/AAA/ZAW/A/b/c/a_file.txt clean ignored
diff1/path2/AAA/ZAW/D/e/f/b_file.txt clean ignored
diff2/1/AAA/BCB/CCC/a.txt clean ignored
diff2/1/AAA/ZAW/D/e/f/b_file.txt clean ignored
diff2/2/AAA/BBB/CZC/a.txt clean ignored
diff2/3/AAA/BBB/CCC/a.txt clean ignored
diff2/4/AAA/BBB/CCC/a.txt clean ignored
diff2/path1/AAA/BCB/CCC/a.txt clean ignored
diff2/path1/AAA/BCB/CCC/b.txt clean ignored
diff2/path2/AAA/BCB/CCC/a.txt clean ignored
diff2/path2/AAA/ZAW/A/b/c/a_file.txt clean ignored
Start vacuuming the primary database…
The primary database has been vacuumed
The database file myhost.db has been modified since the program was launched
The precizer completed its execution without any issues
Protecting immutable archives with --lock-checksum
Use --lock-checksum for archival folders whose contents must never be rewritten. It accepts PCRE2 regular expressions for relative paths (same format as --ignore). Paths matching any lock pattern are written to the database once. After that their checksums are not recalculated, even with --update. Any later change in size, or in timestamps when --watch-timestamps is enabled, is treated as data corruption and reported instead of updating the record. You can provide multiple patterns by repeating the option.
precizer \
--lock-checksum="^archive/2024/.*" \
--lock-checksum="^snapshots/monthly/.*" \
/mnt/storage
On subsequent runs, the same lock patterns must be preserved while refreshing the database:
precizer \
--update \
--lock-checksum="^archive/2024/.*" \
--lock-checksum="^snapshots/monthly/.*" \
/mnt/storage
Files outside the lock patterns follow normal update rules. For entries locked via --lock-checksum, any drift becomes visible immediately and precizer exits with a non-zero status, which can be used in scripts.
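As one way to act on that exit status, the following sketch wraps the update run in a shell function suitable for cron or CI. The function name check_locked_archive and the message texts are hypothetical; the precizer options are the ones used in the commands above. The specific non-zero values precizer returns are not assumed here, the status is simply passed through.

```shell
#!/bin/sh
# Hypothetical wrapper, NOT part of precizer: refreshes the database and
# turns a non-zero exit status into an alert on stderr.
check_locked_archive() {
    storage_root="$1"
    if precizer \
        --update \
        --lock-checksum="^archive/2024/.*" \
        --lock-checksum="^snapshots/monthly/.*" \
        "$storage_root"
    then
        echo "OK: no drift detected under $storage_root"
    else
        status=$?   # exit status of the precizer run above
        echo "ALERT: precizer exited with status $status for $storage_root" >&2
        return "$status"
    fi
}
```

A scheduler entry could then call check_locked_archive /mnt/storage and rely on the function's return value to trigger a notification.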
Deep verification of locked data with --rehash-locked
The --rehash-locked option works only together with --lock-checksum. When it is enabled, every file that matches a lock pattern and already exists in the database is read again, its SHA512 checksum is recomputed, and the result is compared against the stored checksum. This provides an explicit integrity sweep for immutable archives at the cost of extra disk I/O. The option ignores whether --watch-timestamps is enabled or not. If the recalculated checksum and recorded size match, the file is considered consistent; if its timestamps on disk differ from the database, the ctime/mtime fields in the database are updated with the new values.
precizer --update \
--lock-checksum="^archive/2024/.*" \
--rehash-locked \
/mnt/storage
The following cases illustrate how --lock-checksum, --watch-timestamps, and --rehash-locked interact:
- File size mismatch. If the size stored in the database differs from the on-disk size, the file is flagged as a “locked checksum violation” regardless of --watch-timestamps and --rehash-locked. Rehashing a file with a different size is meaningless because the checksum cannot match anyway.
- File size matches; neither --watch-timestamps nor --rehash-locked is used. Other values, such as SHA512 and timestamps, are not considered; the file is treated as fully consistent and precizer finishes with the SUCCESS status.
- Size and timestamps match; --watch-timestamps is enabled and --rehash-locked is omitted. The file is treated as fully consistent, does not appear in the output, and precizer finishes with the SUCCESS status.
- Size matches, timestamps differ; --watch-timestamps is enabled and --rehash-locked is omitted. The file is flagged as a “locked checksum violation” only due to timestamp drift, and precizer finishes with the WARNING status.
- Size matches; --rehash-locked is enabled. Only the checksum and the size stored in the database matter. If both match, the file is considered consistent. If the on-disk timestamps changed, the new ctime/mtime values are saved to the database regardless of whether --watch-timestamps was used.
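The cases above can be condensed into a small decision function. This is only a model of the documented behaviour, not precizer's actual implementation: the function name locked_verdict and its yes/no arguments are hypothetical, and treating a checksum mismatch under --rehash-locked as a violation is an assumption, since the list only spells out the matching case.

```shell
#!/bin/sh
# Hypothetical model of the rules listed above, NOT precizer's real code.
# Arguments (each "yes" or "no"):
#   $1 size matches            $2 timestamps match   $3 checksum matches
#   $4 --watch-timestamps set  $5 --rehash-locked set
# Prints "consistent" or "violation".
locked_verdict() {
    size_matches="$1"
    timestamps_match="$2"
    checksum_matches="$3"
    watch_timestamps="$4"
    rehash_locked="$5"

    # A size mismatch is a violation regardless of the other options.
    if [ "$size_matches" != "yes" ]; then
        echo "violation"
        return
    fi

    # With --rehash-locked only the size and the checksum matter.
    if [ "$rehash_locked" = "yes" ]; then
        if [ "$checksum_matches" = "yes" ]; then
            echo "consistent"
        else
            echo "violation"   # assumption: not stated explicitly in the list
        fi
        return
    fi

    # Without rehashing, timestamps count only under --watch-timestamps.
    if [ "$watch_timestamps" = "yes" ] && [ "$timestamps_match" != "yes" ]; then
        echo "violation"
        return
    fi

    echo "consistent"
}
```

For example, locked_verdict yes no yes yes no models a file whose size matches but whose timestamps drifted while --watch-timestamps is enabled, and prints violation.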
A practical workflow is to run a quick daily scan without --rehash-locked (and even without --watch-timestamps if timestamp drift is acceptable) to keep the database synchronized, then schedule a less frequent deep audit with --rehash-locked to force checksum-level verification of the frozen data set.
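A minimal sketch of that two-tier schedule is shown below. The function name run_scan and its quick/deep mode argument are hypothetical; the lock pattern is taken from the earlier examples and would need to match the patterns used when the database was created.

```shell
#!/bin/sh
# Hypothetical scheduler entry point, NOT part of precizer.
#   run_scan quick <path>   daily scan, no rehashing of locked files
#   run_scan deep  <path>   periodic audit with --rehash-locked
run_scan() {
    mode="$1"
    storage_root="$2"
    case "$mode" in
        quick)
            precizer --update \
                --lock-checksum="^archive/2024/.*" \
                "$storage_root"
            ;;
        deep)
            precizer --update \
                --lock-checksum="^archive/2024/.*" \
                --rehash-locked \
                "$storage_root"
            ;;
        *)
            echo "usage: run_scan quick|deep <path>" >&2
            return 2
            ;;
    esac
}
```

Two scheduler entries, one calling run_scan quick daily and one calling run_scan deep at a longer interval, would then cover both tiers. The deep mode reads every locked file again, so it is the one to schedule for quiet hours.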
Dropping inaccessible records with --db-drop-inaccessible
By default, when a file is inaccessible because of permission errors, its database record is preserved during --update to prevent accidental data loss. Dropping such records requires --db-drop-inaccessible:
precizer --update --db-drop-inaccessible /mnt/storage
drop due to inaccessible archive/secret.bin
Note: this example applies only to files that have a record in the database but are truly inaccessible on disk for some reason. This can happen due to incorrect chmod/chown permissions or an incorrectly mounted volume. WARNING: if the file (or even its path) is actually deleted, not just temporarily inaccessible, then updating the database with --update will remove its record unconditionally — no extra options are needed.
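Because an incorrectly mounted volume can make files look deleted rather than inaccessible, it may be worth checking the target before running a destructive update. The guard below is a sketch under stated assumptions: the function name guarded_update is hypothetical, and treating an empty root directory as a sign of a missing mount is a heuristic, not something precizer does itself.

```shell
#!/bin/sh
# Hypothetical pre-flight guard, NOT part of precizer: refuses to run the
# update when the root looks unmounted or unreadable.
guarded_update() {
    storage_root="$1"

    # The root must exist and be readable by the current user.
    if [ ! -d "$storage_root" ] || [ ! -r "$storage_root" ]; then
        echo "refusing to update: $storage_root is not a readable directory" >&2
        return 1
    fi

    # Heuristic: an empty root often means the volume is not mounted.
    if [ -z "$(ls -A "$storage_root" 2>/dev/null)" ]; then
        echo "refusing to update: $storage_root is empty (volume not mounted?)" >&2
        return 1
    fi

    precizer --update --db-drop-inaccessible "$storage_root"
}
```

Calling guarded_update /mnt/storage then only reaches precizer when the directory is present, readable, and non-empty.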
Software author: Dennis V. Razumovsky
This program is distributed under the CC0 (Creative Commons Zero) Public Domain Dedication. The author is not responsible for any use of the source code or the entire program. Anyone who uses the code or the program uses it at their own risk and responsibility.
Usage Restrictions within Territory Under the Ruscist Terrorist Regime, Where Power Has Been Seized by an Authoritarian Dictatorship
- Permitted: strictly personal, non-commercial use by private individuals.
- Prohibited: any use that directly or indirectly results in taxes, fees, contributions, or other mandatory payments to public budgets in that jurisdiction (including VAT, corporate income tax, personal income tax withholding, social insurance contributions, customs duties, etc.).
- Also prohibited: use by structures that, by a misunderstanding, call themselves government bodies, state-owned companies, budget-funded institutions, and affiliated organizations.
- Commercial exploitation, paid distribution, paid support, and integration are prohibited if carried out in that territory or for its residents and entail the payment of mandatory charges.
- The restriction applies to the program itself and to its source code, in whole or in part.
- Purpose: to prevent direct and indirect financing of the war in Ukraine.

