Collecting Local RPM File Changes

Do you know this situation? There is an old system you set up ages ago without using a configuration management tool like Ansible, and you did not set up etckeeper right away. Now it is time to migrate the configuration from one Fedora or CentOS version to a newer one. It is a mess to figure out what you changed on the system compared to the distribution packages, and it is not much fun to migrate these changes to a new system. Wouldn’t it be nice to be able to get a diff of your local system compared to a clean installation?

I would like to have this, and I have been thinking about what is needed to implement it. RPM is great at tracking the state of all the local files it manages. It stores several attributes such as the modification date, size and one or more checksums (usually MD5 and SHA-256 for Fedora packages). This makes it easy to find files that were changed locally. For example, rpm -qVa queries the RPM database and shows all files where one of the attributes changed:
$ rpm -qVa
S.5....T. c /etc/koji-gc/koji-gc.conf
S.5....T. c /etc/kojira/kojira.conf
S.5....T. /usr/sbin/koji-gc

I am not 100% certain what the attributes mean. I believe the T means the timestamp is different, the 5 relates to a different MD5 checksum, and S might be the size. The c probably identifies config files. Querying the RPM database is the first step to identify which packages need to be diffed, but it does not yet make it possible to create the diff. However, this information could at least be used to collect all the files that contain changes that need to be migrated. After playing around with the RPM Python API, I am planning to write a small utility that gathers all locally changed files and puts them into a tarball. Before I invest too much time in this, do you maybe know if something like this already exists?
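A minimal sketch of what such a utility could look like in Python follows. It shells out to rpm -Va instead of using the RPM Python API, which is an assumption made to keep the example short. The names FLAG_NAMES, parse_verify_line and collect_changed_files are hypothetical, and the flag meanings are taken from the rpm man page. Reading every changed file will usually require root, and error handling is left out.

```python
import os
import subprocess
import tarfile

# Meaning of the characters in the 9-character verify string (see rpm(8)).
FLAG_NAMES = {
    "S": "size", "M": "mode", "5": "digest", "D": "device",
    "L": "symlink", "U": "user", "G": "group", "T": "mtime",
    "P": "capabilities",
}


def parse_verify_line(line):
    """Split one line of verify output into (changed attributes, marker, path).

    Returns None for lines that are not a 9-character verify result,
    for example "missing   /some/file".
    """
    parts = line.rstrip("\n").split(None, 1)
    if len(parts) != 2 or len(parts[0]) != 9:
        return None
    flags, rest = parts
    if not set(flags) <= set(FLAG_NAMES) | {".", "?"}:
        return None
    if rest.startswith("/"):
        marker, path = "", rest          # no attribute marker present
    else:
        marker, path = rest.split(None, 1)  # e.g. "c /etc/foo.conf"
    changed = [FLAG_NAMES[ch] for ch in flags if ch in FLAG_NAMES]
    return changed, marker, path


def collect_changed_files(tarball_path, only_config=False):
    """Put every file whose content differs from its package into a tarball."""
    # rpm exits non-zero when it finds differences, so the exit code is ignored.
    result = subprocess.run(["rpm", "-Va"], capture_output=True, text=True)
    with tarfile.open(tarball_path, "w:gz") as tar:
        for line in result.stdout.splitlines():
            parsed = parse_verify_line(line)
            if parsed is None:
                continue
            changed, marker, path = parsed
            if "digest" not in changed:
                continue  # only the timestamp or the permissions changed
            if only_config and marker != "c":
                continue
            if os.path.isfile(path):
                tar.add(path)
```

Calling collect_changed_files("local-changes.tar.gz", only_config=True) would restrict the tarball to files marked as configuration files, which is probably what matters most for a migration.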

The next step would be to get the unmodified files from the original RPMs and then produce a proper diff. This would also mean interacting with the actual RPM repositories and downloading the original packages using yum or dnf, I guess. What do you think about this idea? Would it be useful for you as well? Is there something that implements this already?
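To make the idea more concrete, here is a rough sketch of that second step in Python. It rests on several assumptions: the dnf download command is available (on older systems it comes from dnf-plugins-core), the installed package version can still be found in an enabled repository, and rpm2cpio and cpio are installed. The function names are hypothetical, and this is an illustration of the approach, not a finished tool.

```python
import difflib
import glob
import os
import subprocess
import tempfile


def unified_diff_text(pristine_text, local_text, path):
    """Return a unified diff from the packaged content to the local content."""
    return "".join(difflib.unified_diff(
        pristine_text.splitlines(keepends=True),
        local_text.splitlines(keepends=True),
        fromfile="a" + path,
        tofile="b" + path,
    ))


def fetch_pristine(path, workdir):
    """Download the package owning `path` and extract the packaged file.

    Assumes the exact installed version is still available in a repository.
    """
    workdir = os.path.abspath(workdir)
    # Ask the RPM database which package owns the file.
    nevra = subprocess.run(
        ["rpm", "-qf", "--qf", "%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n", path],
        capture_output=True, text=True, check=True,
    ).stdout.split()[0]
    subprocess.run(["dnf", "download", "--destdir", workdir, nevra], check=True)
    rpm_file = glob.glob(os.path.join(workdir, "*.rpm"))[0]
    # Equivalent to: rpm2cpio pkg.rpm | cpio -id --quiet ./path
    rpm2cpio = subprocess.Popen(["rpm2cpio", rpm_file], stdout=subprocess.PIPE)
    subprocess.run(["cpio", "-id", "--quiet", "." + path],
                   stdin=rpm2cpio.stdout, cwd=workdir, check=True)
    rpm2cpio.stdout.close()
    rpm2cpio.wait()
    return os.path.join(workdir, path.lstrip("/"))


def diff_against_package(path):
    """Diff a local file against the copy shipped in its package."""
    with tempfile.TemporaryDirectory() as workdir:
        pristine = fetch_pristine(path, workdir)
        with open(pristine, errors="replace") as f:
            pristine_text = f.read()
        with open(path, errors="replace") as f:
            local_text = f.read()
    return unified_diff_text(pristine_text, local_text, path)
```

Something like print(diff_against_package("/etc/kojira/kojira.conf")) would then show the local changes to a single file. Binary files and packages that are no longer available in any repository would need separate handling.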

4 thoughts on “Collecting Local RPM File Changes”

  1. From the `rpm` man-page:

    The format of the output is a string of 9 characters, a possible attribute marker:

    c %config configuration file.
    d %doc documentation file.
    g %ghost file (i.e. the file contents are not included in the package payload).
    l %license license file.
    r %readme readme file.

    from the package header, followed by the file name. Each of the 9 characters denotes the result of a
    comparison of attribute(s) of the file to the value of those attribute(s) recorded in the database. A single “.”
    (period) means the test passed, while a single “?” (question mark) indicates the test could not be performed
    (e.g. file permissions prevent reading). Otherwise, the (mnemonically emBoldened) character denotes failure of
    the corresponding --verify test:

    S file Size differs
    M Mode differs (includes permissions and file type)
    5 digest (formerly MD5 sum) differs
    D Device major/minor number mismatch
    L readLink(2) path mismatch
    U User ownership differs
    G Group ownership differs
    T mTime differs
    P caPabilities differ
