precizer

Link to the Russian language README page

Precizer — verify file checksums at scale

A Tiny, High-Performance File Integrity and Comparison Tool

“A truly great application will always fit on a floppy disk. Hopefully, someone out there still remembers what those were… But it’s not about the floppies, it’s about quality software!” © :-D

TL;DR

Overview

precizer is a lightweight and blazing-fast command-line application written entirely in pure C. It is designed for file integrity verification and comparison, making it particularly useful for checking synchronization results. The program walks directory trees, generating a database of files and their checksums for quick and efficient comparisons.

Built for both embedded platforms and large-scale clustered mainframes, precizer helps detect synchronization errors by comparing files and their checksums across different sources. It can also be used to analyze historical changes by comparing databases generated at different points in time from the same source.

Basic Example

Consider a scenario where two machines have large mounted volumes at /mnt1 and /mnt2, respectively, containing identical data. The goal is to verify, byte by byte, whether the contents are truly identical or if discrepancies exist.

Run precizer on the first machine (e.g., hostname host1):

precizer --progress /mnt1

This command traverses the directory tree under /mnt1, creating a database file host1.db in the current directory. The --progress flag provides real-time progress updates, displaying the total traversed space and the number of processed files.

Run precizer on the second machine (e.g., hostname host2):

precizer --progress /mnt2

This will generate a database file host2.db in the current directory.

Copy host1.db and host2.db to one of the machines and run the following command to compare them:

precizer --compare host1.db host2.db

The output will display:

Files that exist on host1 but are missing on host2, and vice versa.
Files present on both hosts but with different checksums.

Relative Paths for Consistent Comparison

precizer stores only relative file paths in its database. For example, a file located at:

/mnt1/abc/def/aaa.txt

will be stored as:

abc/def/aaa.txt

without the /mnt1 prefix. Similarly, the corresponding file on /mnt2:

/mnt2/abc/def/aaa.txt

will also be stored as:

abc/def/aaa.txt

This ensures that even when files reside under different mount points or sources, they can still be matched by the same relative path and compared by their checksums.

TECHNICAL DETAILS

Consider a scenario where a primary storage system has a backup copy. For example, this could be a data center storage and its Disaster Recovery copy.

Synchronization from the primary storage to the backup occurs periodically, but due to the massive data volumes, synchronization is most likely not performed byte-by-byte but rather by detecting metadata changes within the file system. In such cases, file size and modification time are taken into account, but the actual content is not verified byte by byte.

This approach makes sense because, even though the primary data center and the Disaster Recovery site usually have high-speed communication channels, a full byte-by-byte synchronization would take an unreasonably long time.

Tools like rsync allow both types of synchronization — metadata-based and byte-by-byte — but they have one major drawback: state is not preserved between sessions.

The following scenario illustrates the issue:

Given: Server "A" and Server "B" (Primary Data Center and Disaster Recovery)
Some files have been modified on Server "A".
The rsync algorithm detects them based on changes in size and modification time and synchronizes them to Server "B".
Multiple connection failures occur during synchronization between the Primary Data Center and the Disaster Recovery site.
To verify data integrity (i.e., ensuring that files on "A" and "B" are identical byte by byte), rsync is often used with byte-by-byte comparison. The process works as follows:
- rsync is launched on Server "A" with the --checksum mode, attempting to compute checksums sequentially on both "A" and "B" in a single session.
- This process takes an extremely long time for large-scale storage systems.
- Since rsync does not save computed checksums between sessions, it introduces several technical challenges:
  - If the connection drops, rsync terminates the session, and on the next run everything must start from scratch! Given the huge data volumes, a byte-by-byte verification of full data integrity becomes practically impossible.
- Storage subsystem failures can also lead to binary inconsistencies. In such cases, file system metadata cannot reliably determine whether file contents on "A" and "B" are truly identical.
- Over time, errors accumulate, increasing the risk of maintaining an inconsistent Disaster Recovery copy of system "A" on system "B", rendering the entire Disaster Recovery effort useless. Standard utilities do not detect these inconsistencies, and technical personnel may be completely unaware of data integrity problems in the Disaster Recovery storage.
To overcome these limitations, precizer was developed. The program identifies exactly which files differ between "A" and "B" so that they can be resynchronized with the necessary corrections. The tool operates at maximum speed (pushing hardware performance to its limits) because it is written in pure C and utilizes high-performance algorithms optimized for efficiency. The program is designed to handle both small files and petabyte-scale data volumes, with no upper limits*.
The name precizer comes from the word precision, implying something that enhances accuracy.
The program precisely analyzes directory contents, including subdirectories, computing checksums for every encountered file while storing metadata in an SQLite database (a regular binary file).
precizer is fault-tolerant and can resume execution from the point of interruption. For example, if the program is terminated via Ctrl+C while analyzing a petabyte-scale file, it will NOT restart from the beginning but continue exactly where it left off using previously recorded data in the database. This significantly saves resources, time, and effort for system administrators.
The program can be interrupted at any time using any method, and this is completely safe for both the scanned data and the database created by precizer.
If the program is intentionally or accidentally stopped, there is no need to worry about losing progress. All results are fully preserved and can be used in subsequent runs.
The checksum calculations use the reliable and fast SHA512 algorithm. Its 512-bit digest makes an accidental collision practically impossible, even when analyzing a single massive file: if two identical large files differ by just one byte, their SHA512 checksums will be different. Weaker functions such as CRC32 or SHA1 offer far smaller safety margins, and SHA1 collisions can even be produced deliberately.
The algorithms in precizer are designed to make it easy to keep the database up to date without having to recalculate everything from scratch. Simply run the program with the --update parameter, and new files will be added to the database, while entries for deleted files will be removed. If a file has been modified and its size has changed, its SHA512 checksum will be recalculated and updated in the database.
During --update, entries for missing files are removed, but records for inaccessible files (permission denied) are kept by default. This protection exists because permissions can temporarily change (ownership, ACLs, transient mount issues), and dropping records in that state would silently erase valid database history. Using --db-drop-inaccessible with --update is intended only when those database records must be dropped.
When --progress is enabled, warnings and errors collected during a session are printed in one block right before exit so important messages (for example, file access issues) are not lost in routine logs.
The --quiet-ignored option suppresses per-file log lines for paths filtered by --ignore and --include. This helps keep program logs free of extra messages once ignore regular expressions are tuned and stable in use; other warnings and errors remain visible.
There is an option to consider not only the file size when updating the database but also the file’s status change time (ctime) and modification time (mtime). With this option, a change in either timestamp triggers an SHA512 checksum recalculation and an update in the database. For example, if a file’s ctime changes but its size remains the same, the checksum will NOT be recalculated if only the --update parameter is used. To force checksum recalculation for such files, --watch-timestamps should be added. This option is disabled by default because timestamps can change frequently even when the file’s content remains the same: ctime changes with commands like chmod or chown, and mtime can be set with commands like touch.
precizer can be used as a security monitoring tool, detecting unauthorized file modifications where contents might have changed while metadata remains untouched.
The program never modifies, deletes, moves, or copies any files or directories it processes. All it does is list files, compute their checksums, and update them in the database. All changes are strictly confined to the database.
Performance is primarily limited by disk subsystem speed. Each file is read in full, and its SHA512 checksum is computed.
The program runs very fast thanks to SQLite and FTS libraries (man 3 fts).
Command-line argument parsing is handled via the ARGP library.
Regular expression support is provided by PCRE2.
The program is safe to use with an enormous number of files, directories, and deeply nested subdirectories. Thanks to the FTS library, recursion is avoided, preventing stack overflows even with extreme levels of nesting.
Due to its compact and portable codebase, the program can be used even on specialized devices like NAS systems, embedded platforms, or IoT devices.
The database contents created by precizer can be explored with DB Browser for SQLite.

QUESTIONS & BUG REPORTS

The --help option is designed to be as detailed as possible, specifically to assist users who may not have advanced technical knowledge.
Author contact options:
- GitHub Discussions.
- Bug reports and feature requests.

DOWNLOAD

Download executables from https://github.com/precizer/precizer/releases/latest/ for:

Linux x86_64 precizer_linux_x86_64_portable.zip
Linux arm aarch64 precizer_linux_aarch64_portable.zip
macOS arm64 precizer_macos_arm64.zip

The release packages contain portable executables in a zip archive.

Technical details of the portable build

The Linux build is a single, statically linked ELF executable that is not tied to any specific distribution. It can be run immediately on almost any Linux distribution and does not require external shared libraries.
The binary is produced by GitHub CI/CD, then compressed with UPX (the executable packer). The self-extracting compressed binary is then placed into a ZIP archive for convenient download. The file can be extracted from the archive and run directly.
Static linking is not supported on macOS, so running the downloaded application requires the following libraries to be available on the system: sqlite3, pcre2, argp and fts.

BUILD & INSTALLATION

Packaging for Distributions

The author has set up an automated build system using GitHub Workflows and will continue maintaining new versions.
The author is not willing to personally package and maintain precizer for all existing operating system distributions.
If packaging for a specific distribution encounters major challenges adapting the code, the author can help with supporting the initiative and optimizing the program for the target distro or package manager. Contact details are in the “Questions & Bug Reports” section.

Building with Docker

The program can already be built with Docker. Several preconfigured distributions are available as build environments. Successfully tested distros:

Almalinux
Alpine
Arch
Debian
Gentoo
Rocky
Ubuntu

Configuration details and installed libraries are listed in the corresponding Dockerfiles under .docker/.

Build targets use the form docker-<distro>-<build>, for example docker-debian-dynamic-production (distro debian, build dynamic-production).

make docker-gentoo-production

This builds a production binary using the Gentoo Docker container.

make docker-ubuntu-production

This builds the same production target using Ubuntu.

After the build completes, an executable named precizer appears in the project directory (built inside the container). The main benefit of using Docker is that the build toolchain, libraries, and their dependencies do not need to be installed on the host system; Docker alone is enough to produce the binary. Next, choose a binary variant. When in doubt, the portable variant (make portable) is a good starting point. All available build variants are described below.

Manual Build

Preparation

git clone --depth=1 https://github.com/precizer/precizer.git
cd precizer

Portable binary

make portable

The result is a single statically linked ELF executable with no dynamic dependencies, compressed with UPX into a self-extracting file. It contains the whole program and can be run on almost any modern Linux distribution. The file can be copied to any machine of the same architecture (x86_64, aarch64, etc.).

The program is optimized for maximum portability.

Compilation and linking flags: -static -O2 -mtune=generic

Docker alternative:

make docker-ubuntu-portable

or replace -ubuntu- with any distro from the list above.

Single binary optimized for the local CPU

make production

The result is a statically linked, self-extracting compressed UPX ELF file tuned for the local CPU. It contains the whole program, can be run on the local machine, and will use the maximum available CPU features.

The program is optimized for maximum possible performance on local hardware.

Compilation and linking flags: -static -O3 -march=native

Docker alternative:

make docker-ubuntu-production

or replace -ubuntu- with any distro from the list above.

Dynamically linked binary optimized for the local CPU

make dynamic-production

The result is an ELF executable of about 50 kilobytes. It is tuned for the local CPU and dynamically linked against libraries installed on the system; it is also self-extracting and UPX-compressed. It can be built and run on the local machine if libraries such as sqlite3, pcre2, argp and fts are installed.

The binary is optimized for maximum performance and minimal size.

Compilation flags: -O3 -march=native

Docker alternative:

make docker-ubuntu-dynamic-production

or replace -ubuntu- with any distro from the list above.

Tests

The test sets in the tests/examples/ directory can be used to evaluate the program’s capabilities.

Test execution:

git clone https://github.com/precizer/precizer.git
cd precizer
make tests

Installation

Just copy the resulting precizer executable to any location listed in the $PATH environment variable for quick invocation.

Build dependencies for specific OS

Install build and compile tools on Linux

Arch Linux

sudo pacman -S --noconfirm base-devel gcc-libs sqlite pcre2 upx

Ubuntu/Debian Linux

sudo apt -y install gcc make libpcre2-dev libsqlite3-dev upx-ucl

Alpine Linux

sudo apk add --update build-base pcre2-dev pcre2-static fts-dev argp-standalone sqlite-dev upx

Almalinux/Rocky Linux

sudo dnf -y install gcc make sqlite sqlite-devel glibc-devel pcre2 pcre2-devel upx pcre2-static glibc-static

Gentoo Linux

echo "dev-libs/libpcre2 static-libs" >> /etc/portage/package.use/libpcre2;
emerge dev-libs/libpcre2 app-arch/upx

Clean up

Remove all build artifacts

make purge

USAGE EXAMPLES

Example 1

Add files to two databases and compare them with each other:

precizer --progress --database=database1.db tests/examples/diffs/diff1

precizer --progress --database=database2.db tests/examples/diffs/diff2

precizer --compare database1.db database2.db

_{The comparison of database1.db and database2.db databases is starting…

Starting database file database1.db integrity check…

Database database1.db has been verified and is in good condition

Starting database file database2.db integrity check…

Database database2.db has been verified and is in good condition

These files are no longer in the database1.db but still exist in the database2.db

path1/AAA/BCB/CCC/b.txt

These files are no longer in the database2.db but still exist in the database1.db

path2/AAA/ZAW/D/e/f/b_file.txt

The SHA512 checksums of these files do not match between database1.db and database2.db

2/AAA/BBB/CZC/a.txt

3/AAA/BBB/CCC/a.txt

4/AAA/BBB/CCC/a.txt

path1/AAA/ZAW/D/e/f/b_file.txt

path2/AAA/BCB/CCC/a.txt

Comparison of database1.db and database2.db databases is complete

The precizer completed its execution without any issues}

Example 2

Database Update

Run the first command of the previous example again. On this first attempt, precizer refuses to touch the existing database and prints a warning:

precizer --progress --database=database1.db tests/examples/diffs/diff1

_{The database database1.db was previously created and already contains data with files and their checksums. Use the --update option only when it is certain that the database needs to be updated and when file information (including changes, deletions, and additions) should be synchronized with the database.

ERROR: The precizer process terminated unexpectedly due to an error}

The --update parameter must be included. This parameter is required to protect the database from data loss caused by accidental execution.

precizer --update --progress --database=database1.db tests/examples/diffs/diff1

_{Primary database file name: database1.db

Starting database file database1.db integrity check…

Database database1.db has been verified and is in good condition

File system traversal initiated to calculate file count and storage usage

Total size: 45B, total items: 58, dirs: 46, files: 12, symlnks: 0

The database file database1.db has NOT been modified since the program was launched

The precizer completed its execution without any issues}

Make the following adjustments:

# Modify a file
echo -n "  " >> tests/examples/diffs/diff1/1/AAA/BCB/CCC/a.txt

# Add a new file
touch tests/examples/diffs/diff1/1/AAA/BCB/CCC/c.txt

# Remove a file
rm tests/examples/diffs/diff1/path2/AAA/ZAW/D/e/f/b_file.txt

Run precizer again with the --update parameter:

precizer --update --progress --database=database1.db tests/examples/diffs/diff1

_{Primary database file name: database1.db

Starting database file database1.db integrity check…

Database database1.db has been verified and is in good condition

File system traversal initiated to calculate file count and storage usage

Total size: 43B, total items: 58, dirs: 46, files: 12, symlnks: 0

The --update option has been used, so the information about files will be updated against the database database1.db

File traversal started

These files have been added or changed and those changes will be reflected against the DB database1.db:

1/AAA/BCB/CCC/a.txt changed size & ctime & mtime rehashed

1/AAA/BCB/CCC/c.txt added

File traversal complete

Total size: 43B, total items: 58, dirs: 46, files: 12, symlnks: 0

These files are no longer exist or ignored and will be deleted against the DB database1.db:

path2/AAA/ZAW/D/e/f/b_file.txt

Start vacuuming the primary database…

The primary database has been vacuumed

The database file database1.db has been modified since the program was launched

The precizer completed its execution without any issues}

Every time precizer runs, it traverses the file system and then checks whether a record for a specific file already exists in the database. In other words, the program prioritizes the current state of the file system on disk.

precizer traverses directories using an algorithm similar to the one used by rsync.

It's important to note that precizer will not recalculate SHA512 checksums for files that are already recorded in the database as long as their size remains unchanged. If the --watch-timestamps argument is specified, the program will also consider the status change time (ctime) and modification time (mtime) in addition to the file size.

Any new, deleted, or modified files between application runs will be processed accordingly. All changes will be reflected in the database if the --update parameter is specified.

Example 3

Using the --silent mode. When this mode is enabled, the program does not produce any output on the screen. This is useful when precizer is used in scripts.

Add the --silent parameter to the previous example:

precizer --silent --update --progress --database=database1.db tests/examples/diffs/diff1

As a result, nothing will be displayed on the screen.

Example 4

Additional Information in --verbose mode. This mode can be useful for debugging.

Add the --verbose parameter to the previous example:

precizer --verbose --update --progress --database=database1.db tests/examples/diffs/diff1

_{2025-01-25 09:55:59:820 src/parse_arguments.c:442:parse_arguments:Configuration: rational_logger_mode=VERBOSE

paths=tests/examples/diffs/diff1; database=database1.db; db_file_name=database1.db; verbose=yes; maxdepth=-1; silent=no; force=no; update=yes; watch-timestamps=no; progress=yes; compare=no, db-drop-ignored=no, dry-run=no, check-level=FULL, rational_logger_mode=VERBOSE

2025-01-25 09:55:59:820 src/parse_arguments.c:558:parse_arguments:Arguments parsed

2025-01-25 09:55:59:820 src/detect_paths.c:025:detect_paths:Checking directory paths provided as arguments

2025-01-25 09:55:59:820 src/file_availability.c:034:file_availability:Verify that the path tests/examples/diffs/diff1 exists

2025-01-25 09:55:59:820 src/file_availability.c:053:file_availability:The path tests/examples/diffs/diff1 is exists and it is a directory

2025-01-25 09:55:59:821 src/detect_paths.c:036:detect_paths:Paths detected

2025-01-25 09:55:59:821 src/init_signals.c:034:init_signals:Set signal SIGUSR2 OK:pid:604770

2025-01-25 09:55:59:821 src/init_signals.c:043:init_signals:Set signal SIGINT OK:pid:604770

2025-01-25 09:55:59:821 src/init_signals.c:052:init_signals:Set signal SIGTERM OK:pid:604770

2025-01-25 09:55:59:821 src/init_signals.c:055:init_signals:Signals initialized

2025-01-25 09:55:59:821 src/determine_running_dir.c:018:determine_running_dir:Current directory: /tmp

2025-01-25 09:55:59:821 src/db_determine_name.c:099:db_determine_name:Primary database file name: database1.db

2025-01-25 09:55:59:821 src/db_determine_name.c:105:db_determine_name:Primary database file path: database1.db

2025-01-25 09:55:59:821 src/db_determine_name.c:109:db_determine_name:DB name determined

2025-01-25 09:55:59:821 src/file_availability.c:034:file_availability:Verify that the path . exists

2025-01-25 09:55:59:821 src/file_availability.c:053:file_availability:The path . is exists and it is a directory

2025-01-25 09:55:59:821 src/file_availability.c:034:file_availability:Verify that the path database1.db exists

2025-01-25 09:55:59:821 src/file_availability.c:044:file_availability:The path database1.db is exists and it is a file

2025-01-25 09:55:59:821 src/db_determine_mode.c:128:db_determine_mode:Final value for config->sqlite_open_flag: SQLITE_OPEN_READWRITE

2025-01-25 09:55:59:821 src/db_determine_mode.c:129:db_determine_mode:Final value for config->db_initialize_tables: false

2025-01-25 09:55:59:821 src/db_determine_mode.c:131:db_determine_mode:DB mode determined

2025-01-25 09:55:59:821 src/db_test.c:061:db_test:Starting database file database1.db integrity check…

2025-01-25 09:55:59:821 src/db_test.c:082:db_test:The database verification level has been set to FULL

2025-01-25 09:55:59:821 src/db_test.c:126:db_test:Database database1.db has been verified and is in good condition

2025-01-25 09:55:59:822 src/db_get_version.c:087:db_get_version:Version number 1 found in database

2025-01-25 09:55:59:822 src/db_check_version.c:032:db_check_version:The database1.db database file is version 1

2025-01-25 09:55:59:822 src/db_check_version.c:061:db_check_version:The database database1.db is on version 1 and does not require any upgrades

2025-01-25 09:55:59:822 src/db_init.c:030:db_init:Successfully opened database database1.db

2025-01-25 09:55:59:822 src/db_init.c:118:db_init:The primary database and tables have NOT been initialized

2025-01-25 09:55:59:822 src/db_init.c:150:db_init:The primary database named database1.db is ready for operations

2025-01-25 09:55:59:822 src/db_init.c:167:db_init:The in-memory runtime_paths_id database successfully attached to the primary database database1.db

2025-01-25 09:55:59:822 src/db_init.c:174:db_init:Database initialization process completed

2025-01-25 09:55:59:822 src/db_compare.c:136:db_compare:Database comparison mode is not enabled. Skipping comparison

2025-01-25 09:55:59:822 src/db_contains_data.c:086:db_contains_data:The database database1.db has already been created previously

2025-01-25 09:55:59:822 src/db_validate_paths.c:192:db_validate_paths:The paths written against the database and the paths passed as arguments are completely identical

2025-01-25 09:55:59:822 src/file_list.c:143:file_list:File system traversal initiated to calculate file count and storage usage

2025-01-25 09:55:59:823 src/file_list.c:038:show_status:Total size: 43B, total items: 58, dirs: 46, files: 12, symlnks: 0

2025-01-25 09:55:59:825 src/db_get_version.c:087:db_get_version:Version number 1 found in database

2025-01-25 09:55:59:825 src/db_consider_vacuum_primary.c:025:db_consider_vacuum_primary:No changes were made. The primary database doesn't require vacuuming

2025-01-25 09:55:59:825 src/status_of_changes.c:049:status_of_changes:The database file database1.db has NOT been modified since the program was launched

2025-01-25 09:55:59:825 src/exit_status.c:027:exit_status:The precizer completed its execution without any issues}

Example 5

Non-recursive traversal using the --maxdepth parameter

tree tests/examples/4

tests/examples/4
├── AAA
│   ├── BBB
│   │   ├── CCC
│   │   │   └── a.txt
│   │   └── uuu.txt
│   └── tttt.txt
└── sss.txt

3 directories, 4 files

The --maxdepth=0 parameter completely disables recursion.

precizer --maxdepth=0 tests/examples/4

_{Primary database file name: myhost.db

The path myhost.db doesn't exist or it is not a file

The primary DB file not yet exists. Brand new database will be created

Recursion depth limited to: 0

File traversal started

These files will be added against the myhost.db database:

sss.txt

File traversal complete

Total size: 2B, total items: 5, dirs: 4, files: 1, symlnks: 0

Start vacuuming the primary database…

The primary database has been vacuumed

The database myhost.db has been modified since the last check (files were added, removed, or updated)

The precizer completed its execution without any issues}

Example 6

Example of ignoring paths. PCRE2 regular expressions can be used to specify patterns for files or directories to ignore. Note: the regular expressions are matched against paths relative to the traversal root, so every path in a pattern must be relative.

PCRE2 regular expressions can be tested and validated using https://regex101.com/.

To illustrate how a relative path looks, run a directory traversal without the --ignore option and check how the terminal displays the relative paths recorded in the database:

% tree -L 3 tests/examples/diffs

tests/examples/diffs
├── diff1
│   ├── 1
│   │   └── AAA
│   ├── 2
│   │   └── AAA
│   ├── 3
│   │   └── AAA
│   ├── 4
│   │   └── AAA
│   ├── path1
│   │   └── AAA
│   └── path2
│       └── AAA
└── diff2
    ├── 1
    │   └── AAA
    ├── 2
    │   └── AAA
    ├── 3
    │   └── AAA
    ├── 4
    │   └── AAA
    ├── path1
    │   └── AAA
    └── path2
        └── AAA

26 directories, 0 files

precizer --ignore="^diff1/1/.*" tests/examples/diffs

In this example, the traversal starts at ./tests/examples/diffs, and the pattern ^diff1/1/.* is matched against paths relative to it, so ./tests/examples/diffs/diff1/1/ and everything beneath it are ignored.

_{Primary database file name: myhost.db

The path myhost.db doesn't exist or it is not a file

The primary DB file not yet exists. Brand new database will be created

File traversal started

These files will be added against the myhost.db database:

diff1/1/AAA/BCB/CCC/a.txt ignored & not added

diff1/1/AAA/ZAW/A/b/c/a_file.txt ignored & not added

diff1/1/AAA/ZAW/D/e/f/b_file.txt ignored & not added

diff1/2/AAA/BBB/CZC/a.txt

diff1/3/AAA/BBB/CCC/a.txt

diff1/4/AAA/BBB/CCC/a.txt

diff1/path1/AAA/BCB/CCC/a.txt

diff1/path1/AAA/ZAW/A/b/c/a_file.txt

diff1/path1/AAA/ZAW/D/e/f/b_file.txt

diff1/path2/AAA/BCB/CCC/a.txt

diff1/path2/AAA/ZAW/A/b/c/a_file.txt

diff1/path2/AAA/ZAW/D/e/f/b_file.txt

diff2/1/AAA/BCB/CCC/a.txt

diff2/1/AAA/ZAW/A/b/c/a_file.txt

diff2/1/AAA/ZAW/D/e/f/b_file.txt

diff2/2/AAA/BBB/CZC/a.txt

diff2/3/AAA/BBB/CCC/a.txt

diff2/4/AAA/BBB/CCC/a.txt

diff2/path1/AAA/BCB/CCC/a.txt

diff2/path1/AAA/BCB/CCC/b.txt

diff2/path1/AAA/ZAW/A/b/c/a_file.txt

diff2/path1/AAA/ZAW/D/e/f/b_file.txt

diff2/path2/AAA/BCB/CCC/a.txt

diff2/path2/AAA/ZAW/A/b/c/a_file.txt

File traversal complete

Total size: 97B, total items: 114, dirs: 90, files: 24, symlnks: 0

Start vacuuming the primary database…

The primary database has been vacuumed

The database myhost.db has been modified since the last check (files were added, removed, or updated)

The precizer completed its execution without any issues

Enjoy life!}

Run precizer again, this time without the --ignore option (and with --update, since the database already exists), so that the three previously ignored files are added:

precizer --update tests/examples/diffs

_{Primary database file name: myhost.db

Starting database file myhost.db integrity check…

Database myhost.db has been verified and is in good condition

The --update option has been used, so the information about files will be updated against the database myhost.db

File traversal started

These files have been added or changed and those changes will be reflected against the DB myhost.db:

diff1/1/AAA/BCB/CCC/a.txt add

diff1/1/AAA/ZAW/A/b/c/a_file.txt add

diff1/1/AAA/ZAW/D/e/f/b_file.txt add

File traversal complete

Total size: 97B, total items: 114, dirs: 90, files: 24, symlnks: 0

Start vacuuming the primary database…

The primary database has been vacuumed

The database file myhost.db has been modified since the program was launched

The precizer completed its execution without any issues}

Example 7

Continuation of Example 6.

Multiple regular expressions for ignoring files can be specified simultaneously by repeating the --ignore option.

The database will be cleaned of references to files matching the regular expressions provided via the --ignore arguments: "^diff1/1/.*" and "^diff2/1/.*".

The --db-drop-ignored parameter must be explicitly specified to remove database entries for files that match the patterns passed through the --ignore option.

No changes were made to the file system, but the ignored files will be removed from the database.

# Update the database by removing entries for files that were marked as ignored:

precizer \
    --update \
    --db-drop-ignored \
    --ignore="^diff1/1/.*" \
    --ignore="^diff2/1/.*" \
    tests/examples/diffs

_{Primary database file name: myhost.db

Starting database file myhost.db integrity check…

Database myhost.db has been verified and is in good condition

The --update option has been used, so the information about files will be deleted against the database myhost.db

These files are no longer exist or ignored and will be deleted against the DB myhost.db:

diff1/1/AAA/BCB/CCC/a.txt clean ignored

diff1/1/AAA/ZAW/A/b/c/a_file.txt clean ignored

diff1/1/AAA/ZAW/D/e/f/b_file.txt clean ignored

diff2/1/AAA/BCB/CCC/a.txt clean ignored

diff2/1/AAA/ZAW/A/b/c/a_file.txt clean ignored

diff2/1/AAA/ZAW/D/e/f/b_file.txt clean ignored

Start vacuuming the primary database…

The primary database has been vacuumed

The database file myhost.db has been modified since the program was launched

The precizer completed its execution without any issues}

Example 8

Using --ignore together with --include

# Remove the old database and create a new one, then populate it with data:

rm -i "${HOST}.db"

precizer tests/examples/diffs

Both options take PCRE2 regular expressions for relative paths.

--include specifies relative paths that must be kept: a path matching an --include pattern is included even if it also matches one or more --ignore patterns. Like --ignore, --include can be repeated to supply several patterns.

PCRE2 regular expressions can be checked and tested using https://regex101.com/.

The database will be cleaned of references to files matching the regular expressions provided in the --ignore arguments: "^.*/path2/.*" and "^diff2/.*", but paths matching the patterns in --include will remain in the database.

As in Example 7, --db-drop-ignored must also be specified; without it, the records of files matching the --ignore patterns are not removed from the database.

# Update the database, removing references to files that were marked as ignored,
# except for paths matching the --include patterns.

precizer --update \
	--progress \
	--ignore="^.*/path2/.*" \
	--ignore="^diff2/.*" \
	--include="^diff2/1/AAA/ZAW/A/b/c/.*" \
	--include="^diff2/path1/AAA/ZAW/.*" \
	--include="^diff1/path2/AAA/ZAW/A/b/c/a_file\..*" \
	--db-drop-ignored \
	tests/examples/diffs

_{Primary database file name: myhost.db

Starting database file myhost.db integrity check…

Database myhost.db has been verified and is in good condition

The --update option has been used, so the information about files will be deleted against the database myhost.db

These files are no longer exist or ignored and will be deleted against the DB myhost.db:

diff1/path2/AAA/BCB/CCC/a.txt clean ignored

diff1/path2/AAA/ZAW/A/b/c/a_file.txt clean ignored

diff1/path2/AAA/ZAW/D/e/f/b_file.txt clean ignored

diff2/1/AAA/BCB/CCC/a.txt clean ignored

diff2/1/AAA/ZAW/D/e/f/b_file.txt clean ignored

diff2/2/AAA/BBB/CZC/a.txt clean ignored

diff2/3/AAA/BBB/CCC/a.txt clean ignored

diff2/4/AAA/BBB/CCC/a.txt clean ignored

diff2/path1/AAA/BCB/CCC/a.txt clean ignored

diff2/path1/AAA/BCB/CCC/b.txt clean ignored

diff2/path2/AAA/BCB/CCC/a.txt clean ignored

diff2/path2/AAA/ZAW/A/b/c/a_file.txt clean ignored

Start vacuuming the primary database…

The primary database has been vacuumed

The database file myhost.db has been modified since the program was launched

The precizer completed its execution without any issues}

Example 9

Protecting immutable archives with --lock-checksum

Use --lock-checksum for archival folders whose contents must never be rewritten. It accepts PCRE2 regular expressions for relative paths (same format as --ignore). Paths matching any lock pattern are written to the database once; after that, their checksums are not recalculated, even with --update. Any later change in size (or in timestamps, when --watch-timestamps is enabled) is treated as data corruption and reported instead of updating the record. Multiple patterns can be provided by repeating the option.

precizer \
  --lock-checksum="^archive/2024/.*" \
  --lock-checksum="^snapshots/monthly/.*" \
  /mnt/storage

On subsequent runs, pass the same lock patterns again when refreshing the database with --update:

precizer \
  --update \
  --lock-checksum="^archive/2024/.*" \
  --lock-checksum="^snapshots/monthly/.*" \
  /mnt/storage

Files outside the lock patterns follow normal update rules. For entries locked via --lock-checksum, any drift becomes visible immediately and precizer exits with a non-zero status, which can be used in scripts.

Example 10

Deep verification of locked data with --rehash-locked

The --rehash-locked option works only together with --lock-checksum. When it is enabled, every file that matches a lock pattern and already has a database record is read again, its SHA512 checksum is recomputed, and the result is compared with the stored checksum. This provides an explicit integrity sweep for immutable archives at the cost of extra disk I/O. The option behaves the same whether or not --watch-timestamps is enabled. If the recalculated checksum and the recorded size match, the file is considered consistent; if its on-disk timestamps differ from the database, the ctime/mtime fields in the database are updated with the new values.

precizer --update \
  --lock-checksum="^archive/2024/.*" \
  --rehash-locked \
  /mnt/storage

The following cases illustrate how --lock-checksum, --watch-timestamps, and --rehash-locked interact:

File size mismatch. If the size stored in the database differs from the on-disk size, the file is flagged as a “locked checksum violation” regardless of --watch-timestamps and --rehash-locked. Rehashing a file with a different size is meaningless because the checksum cannot match anyway.
File size matches; neither --watch-timestamps nor --rehash-locked is used. Other values, such as SHA512 and timestamps, are not considered; the file is treated as fully consistent and precizer finishes with the SUCCESS status.
Size and timestamps match; --watch-timestamps is enabled and --rehash-locked is omitted. The file is treated as fully consistent, does not appear in the output, and precizer finishes with the SUCCESS status.
Size matches, timestamps differ; --watch-timestamps is enabled and --rehash-locked is omitted. The file is flagged as a “locked checksum violation” only due to timestamp drift, and precizer finishes with the WARNING status.
Size matches; --rehash-locked is enabled. Only the checksum and the size stored in the database matter. If both match, the file is considered consistent. If the on-disk timestamps changed, the new ctime/mtime values are saved to the database regardless of whether --watch-timestamps was used.

A practical workflow is to run a quick daily scan without --rehash-locked (and even without --watch-timestamps if timestamp drift is acceptable) to keep the database synchronized, then schedule a less frequent deep audit with --rehash-locked to force checksum-level verification of the frozen data set.

Example 11

Dropping inaccessible records with --db-drop-inaccessible

By default, when a file is inaccessible because of permission errors, its database record is preserved during --update to prevent accidental data loss. Dropping such records requires --db-drop-inaccessible:

precizer --update --db-drop-inaccessible /mnt/storage

_{drop due to inaccessible archive/secret.bin}

Note: this applies only to files that have a record in the database but cannot be accessed on disk, for example because of incorrect chmod/chown permissions or an incorrectly mounted volume. WARNING: if the file (or even its path) has actually been deleted rather than made temporarily inaccessible, updating the database with --update removes its record unconditionally; no extra options are needed.

AUTHOR

Software author: Dennis V. Razumovsky

LICENSE

This program is distributed under the CC0 (Creative Commons Zero) Public Domain Dedication. The author is not responsible for any use of the source code or the entire program. Anyone who uses the code or the program uses it at their own risk and responsibility.

Usage Restrictions within Territory Under the Ruscist Terrorist Regime, Where Power Has Been Seized by an Authoritarian Dictatorship

Permitted: strictly personal, non-commercial use by private individuals.
Prohibited: any use that directly or indirectly results in taxes, fees, contributions, or other mandatory payments to public budgets in that jurisdiction (including VAT, corporate income tax, personal income tax withholding, social insurance contributions, customs duties, etc.).
Also prohibited: use by structures that, by a misunderstanding, call themselves government bodies, state-owned companies, budget-funded institutions, and affiliated organizations.
Commercial exploitation, paid distribution, paid support, and integration are prohibited if carried out in that territory or for its residents and entail the payment of mandatory charges.
The restriction applies to the program itself and to its source code, in whole or in part.
Purpose: to prevent direct and indirect financing of the war in Ukraine.


