Finally we got to run the actual authrate tests, which peaked at 4526 auths/second with 40 client threads. The rate declined from there as more clients were added; AD clearly can't handle large numbers of concurrent sessions.
It's enlightening to look at the actual CPU time used during the import tasks. For ldifde on W2K3 we got:
time ldifde.exe -i -f examp3.ldif -h -q 8
261.10u 140.73s 4:23:46.85 2.5%
For slapadd on FC6 we got:
time .slapadd -f slapd.conf.slam -q -l example.ldif.1mil
260.75u 80.86s 7:05.17 80%
The interesting part here is that the user CPU time is nearly identical in both cases (261.10 vs. 260.75 seconds). That suggests that slapadd and ldifde are doing about the same amount of work to parse the input LDIF.
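To make the comparison easier to check, here is a rough Python sketch that parses the two timing summaries and recomputes the CPU utilization as (user + system) / elapsed. It assumes the four fields are user seconds, system seconds, elapsed wall-clock time and CPU percentage, as in csh-style time output; the helper names (elapsed_seconds, parse_time) are made up for this illustration.

```python
import re

# Assumed layout of a csh-style `time` summary: "<user>u <sys>s <elapsed> <cpu>%"
TIME_RE = re.compile(
    r"(?P<user>[\d.]+)u\s+(?P<sys>[\d.]+)s\s+(?P<elapsed>[\d:.]+)\s+(?P<cpu>[\d.]+)%"
)

def elapsed_seconds(text):
    """Convert an elapsed time like '7:05.17' or '4:23:46.85' to seconds."""
    total = 0.0
    for part in text.split(":"):
        total = total * 60 + float(part)
    return total

def parse_time(summary):
    """Parse one timing summary and recompute the CPU utilization."""
    match = TIME_RE.search(summary)
    if match is None:
        raise ValueError(f"unrecognized time summary: {summary!r}")
    user = float(match["user"])
    system = float(match["sys"])
    elapsed = elapsed_seconds(match["elapsed"])
    return {
        "user": user,
        "sys": system,
        "elapsed": elapsed,
        # Share of wall-clock time the process actually spent on the CPU.
        "cpu_pct": 100.0 * (user + system) / elapsed,
    }

# The two summaries reported above.
runs = {
    "ldifde": "261.10u 140.73s 4:23:46.85 2.5%",
    "slapadd": "260.75u 80.86s 7:05.17 80%",
}

for name, summary in runs.items():
    r = parse_time(summary)
    print(
        f"{name}: user={r['user']:.2f}s sys={r['sys']:.2f}s "
        f"elapsed={r['elapsed']:.2f}s cpu={r['cpu_pct']:.1f}%"
    )
```

The recomputed utilization comes out at about 2.5% for ldifde and about 80.3% for slapadd, which agrees with the percentages reported above. In other words, ldifde spent nearly all of its 4 hours 23 minutes waiting rather than computing.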
Comparing the system and elapsed times isn't really fair, since it seems that ldifde just feeds data into a running server over LDAP, while slapadd simply writes to the DB directly. I guess for the sake of fairness we'll have to time an OpenLDAP import using ldapadd next.