[ale] RPM DB Problems

Chris Ricker kaboom at gatech.edu
Sun Mar 3 15:34:59 EST 2002


On 2 Mar 2002, Michael Golden wrote:

> Hello all,
> 	Perhaps you may know how to fix this problem. Since Mandrake doesn't
> like to support apt anymore I'm forced to use urpmi.

You can probably still use apt; <http://apt-rpm.tuxfamily.org/> lists a
Mandrake mirror which supports apt, and I'd guess there are others.

> Now, I can't even use it to do updates anymore however because apparently
> something broke my rpm database and so urpmi segfaults on it. So, here is
> the problem:
> 
> [root@naugrim michael]# rpm --initdb
> rpmdb: Overflow page 2070 of invalid type
> rpmdb: Overflow page 2605 of invalid type
> error: db3 error(-30985) from db->verify: DB_VERIFY_BAD: Database
> verification failed
> [root@naugrim michael]# rpm --rebuilddb
> error: rpmdb: damaged header instance #357 retrieved, skipping.
> Segmentation fault

Jeff Johnson regularly posts his guide to fixing corrupted rpmdb (since it's
the question he gets asked at least once a week ;-).  Here's the relevant
portion of a copy of it I saved:

<begin jbj quote>

No single tool, but here's what I do:

0) Make a copy of everything, just in case:
        cd /var/lib
        tar czvf /tmp/rpmdb.tar.gz rpm

1) Check /var/lib/rpm/Packages with db_verify.
        The command is
                db_verify /var/lib/rpm/Packages
        If that fails, the fix is
                cd /var/lib/rpm
                mv Packages Packages-ORIG
                db_dump Packages-ORIG | db_load Packages
        At this point Packages should be OK.

2) Read all Package headers.
        The command is
                rpm -qa
        If that "works", go to step 5), do "rpm --rebuilddb" and you're done.
        If that segfaults, go to 3).
        If that hangs, then do
                rm -f /var/lib/rpm/__db*
        and try again.

3) You have a segfault on a header.
        The first step is to get the segfaulting header instance.
        Append "debug" to the Packages configuration by
          echo "%_dbi_config_Packages %{_dbi_htconfig} lockdbfd debug" \
                >> /etc/rpm/macros

        Try "rpm -qa", lots of boring glop until segfault.

        On the screen will be something like

    ...
    Get Packages key (0x8216580,4) data (0x4014f008,155244) "#612" 47000000 rc 0
    kernel-enterprise-2.4.9-26beta.17
    Get Packages key ((nil),0) data ((nil),0) "" deadbeef rc -30990
    <segfault here>

        The 2nd to last "Get Packages" line is what you want, the
        bad header instance above is "612" in decimal. Note the
        package name-version-release, you'll want to reinstall
        that package somewhen.

4) Nuke the bad header.
        I usually whip out my handy-dandy header instance deleter,
        edit the bad record into the source, compile, run, and go back
        to step 2).


        The program is at
                ftp://people.redhat.com/jbj/t38454.c
        FWIW, the program is named after the bugzilla report that
        caused me to write it.

        Retrieve, and edit the bad header instances into
                ...
                int badrecs[] = {
--> EDIT -->            562, 566, 559, 561, 563, 560, 564,
                        -1
                };
                ...

        Compile with
                cc -o t38454 t38454.c -ldb-3.2

        Run as
                t38454

        Go back to step 2).

5) Clean up.
        a) rpm --rebuilddb
        b) rm -f /etc/rpm/macros
        c) rm -rf /var/lib/rpmrebuild*
        d) Reinstall the deleted package.

Otherwise, give me a bug report at
        http://bugzilla.redhat.com
with a pointer (i.e. URL, bugzilla attachments won't work) to the
tarball you created in step 0), and I'll see what I can figger.

<end jbj quote>
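
Steps 0 and 1 above can be scripted. What follows is a minimal sketch and
not part of Jeff's guide: the function names backup_rpmdb and
repair_packages are made up for illustration, and it assumes db_verify,
db_dump and db_load are on the PATH under exactly those names (some
systems may ship them under different names).

```shell
# Hypothetical helper for step 0: archive the rpm database directory.
#   $1 = database directory (the guide uses /var/lib/rpm)
#   $2 = output tarball     (the guide uses /tmp/rpmdb.tar.gz)
backup_rpmdb() {
    dbdir=${1:-/var/lib/rpm}
    out=${2:-/tmp/rpmdb.tar.gz}
    [ -d "$dbdir" ] || { echo "no such directory: $dbdir" >&2; return 1; }
    # Archive relative to the parent so the tarball unpacks as "rpm/...".
    tar czf "$out" -C "$(dirname "$dbdir")" "$(basename "$dbdir")"
}

# Hypothetical helper for step 1: dump and reload Packages, but only
# when db_verify reports a problem.
#   $1 = database directory (defaults to /var/lib/rpm)
repair_packages() {
    dbdir=${1:-/var/lib/rpm}
    if db_verify "$dbdir/Packages"; then
        echo "Packages verifies cleanly; nothing to do"
        return 0
    fi
    # Keep the damaged file around as Packages-ORIG, as the guide does.
    mv "$dbdir/Packages" "$dbdir/Packages-ORIG" &&
        db_dump "$dbdir/Packages-ORIG" | db_load "$dbdir/Packages"
}
```

Run as root, "backup_rpmdb && repair_packages" would cover both steps with
the guide's default paths. Passing a scratch copy of the directory as the
first argument lets you try it without touching the live database.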

You can probably start on step #3, but you might as well begin from #0 to be
sure.  Depending upon what the underlying problem is, this may or may not
actually work for you.  I've only twice gotten dbs so corrupted that they
segfaulted rpm --rebuilddb, and both times it was due to kernel
filesystem bugs which had caused part of the db to be overwritten with
random system binaries.  Obviously, there was no recovery from that ;-).  
You might be luckier, though.
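
If you do start at step 3, picking the header instance out of the debug
output can be automated. A rough sketch, assuming the debug lines look
like the sample in the quoted guide; the function name
bad_header_instance is hypothetical:

```shell
# Read rpm debug output on stdin and print the header instance number
# from the second-to-last "Get Packages" line, which is the line the
# guide says to look at.
bad_header_instance() {
    grep 'Get Packages key' |
        tail -n 2 | head -n 1 |
        # The instance appears in the line as "#NNN"; keep only the digits.
        sed -n 's/.*"#\([0-9][0-9]*\)".*/\1/p'
}
```

Piping "rpm -qa 2>&1" into it should print the number to edit into
badrecs. The 2>&1 is there on the assumption that the debug lines may be
written to stderr rather than stdout.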

later,
chris


---
This message has been sent through the ALE general discussion list.
See http://www.ale.org/mailing-lists.shtml for more info. Problems should be 
sent to listmaster at ale dot org.