  • nyanshak

    nyanshak

    2 years ago
    👋 Hey - the OpenBSM audit system is crashing for some users in my fleet on macOS 10.15.4 (reproduced on various 4.x.y osquery versions up to 4.3.0). I can detect this problem by looking for <some_timestamp>.crash_recovery log files in /var/audit. When the audit system crashes, osquery stops receiving events from the process_events table. When the system is restarted, process_events start flowing again, since the audit subsystem is restarted.
    1. (for a temporary fix) Is there a way to make the audit subsystem recover without rebooting the machine? man audit suggests you should be able to run sudo audit -i to reinitialize the system. However, doing this doesn't clear out the crash_recovery file, and process_events still don't get processed, even after restarting osquery.
    2. (troubleshooting) Are there any good tools that can parse the binary audit log files? I'm trying to see if I can find any meaningful leads on *why* it crashed.
    3. Has anyone else run into this, and do you have any suggestions?
    Re 2): on macOS, praudit can be used to view them 🙂 so that answers that question. If anyone has a Linux version or knows how to read them on Linux, that would also be helpful.
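    The detection described above (checking /var/audit for crash_recovery files) could be scripted roughly like this. It is only a minimal sketch: the function name find_crash_recovery is made up for illustration, and it assumes the script runs with enough privileges to list /var/audit (typically root).

```python
from pathlib import Path


def find_crash_recovery(audit_dir="/var/audit"):
    """Return the sorted names of *.crash_recovery files in audit_dir.

    Hypothetical helper: returns an empty list when the directory
    does not exist, so it can run on hosts without BSM audit trails.
    """
    root = Path(audit_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.glob("*.crash_recovery"))


if __name__ == "__main__":
    # Print one file name per line; no output means no recorded crashes.
    for name in find_crash_recovery():
        print(name)
```

    A fleet check could run this periodically and flag any host where the list is non-empty.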
  • theopolis

    theopolis

    2 years ago
    Interesting, so is it osquery’s usage of OpenBSM that influences the crash? Do you know whether it then crashes for every process using audit? Does it stop logging on the whole system, or just for osquery?
  • terracatta

    terracatta

    2 years ago
    Here is an additional data point. I currently run osquery with the disable_audit flag set to true, and I have many crashes in my folder just from this month.
    My audit_control file also has not been modified since the OS was first installed earlier this year.
  • billcobbler

    billcobbler

    2 years ago
    We're not actually able to determine whether osquery's usage is the cause. The data in the crash_recovery files does not indicate what caused the crash, just that a crash happened and the system recovered. In the failure state, events continue to be written to the .not_terminated file, but log volume is severely reduced and only two event values appear:
      • SecSrvr AuthEngine
      • user authentication
    Examples of those two events, with user info redacted:
    <record version="11" event="user authentication" modifier="0" time="Thu Apr 30 16:28:19 2020" msec=" + 158 msec" >
    <subject audit-uid="502" uid="502" gid="20" ruid="502" rgid="20" pid="11031" sid="100011" tid="2686386 0.0.0.0" />
    <text>Verify password for record type Users &apos;user1&apos; node &apos;/Local/Default&apos;</text>
    <return errval="failure: Unknown error: 255" retval="5000" />
    <identity signer-type="1" signing-id="com.apple.opendirectoryd" signing-id-truncated="no" team-id="" team-id-truncated="no" cdhash="0x1f5920de3532b6fae4f8050f2c7f507b5bbe838a" />
    </record>
    
    <record version="11" event="SecSrvr AuthEngine" modifier="0" time="Thu Apr 30 17:22:08 2020" msec=" + 661 msec" >
    <subject audit-uid="-1" uid="0" gid="0" ruid="0" rgid="0" pid="16775" sid="100000" tid="2701830 0.0.0.0" />
    <text>begin evaluation</text>
    <return errval="success" retval="0" />
    <identity signer-type="1" signing-id="com.apple.authd" signing-id-truncated="no" team-id="" team-id-truncated="no" cdhash="0xda52fe385f41ebc0f7fb14140bea0dfc97ac5644" />
    </record>
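    To check which event values show up in a trail like the one above, the XML records can be tallied by their event attribute. This is a rough sketch under assumptions: count_events is a made-up helper, and it expects a fragment of bare <record> elements as pasted here (no XML declaration or enclosing root element), such as records copied from praudit's XML output.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_events(records_xml):
    """Count <record> elements by their event attribute.

    Hypothetical helper: the fragment is wrapped in a synthetic
    <records> root so that several sibling records parse as one document.
    """
    root = ET.fromstring("<records>" + records_xml + "</records>")
    return Counter(rec.get("event") for rec in root.iter("record"))
```

    Comparing the tally from a healthy machine with one from a machine in the failure state should show whether only the two event values listed above remain.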
  • nyanshak

    nyanshak

    2 years ago
    I'm not saying it's osquery's fault per se. I'm just not *that* familiar with macOS internals and am struggling to understand how to dig into it properly, and I hoped there was someone around with deeper expertise in this area. Regardless of whether osquery causes it, it certainly affects osquery and our usage of it a fair bit. I have actually seen it on other macOS versions too. Seen (in decreasing order of frequency on our fleet): 10.15.4, 10.14.6, 10.15.3, 10.15.5, 10.13.6, 10.14.3, 10.15.1, 10.15.2. But the vast majority of our fleet is on 10.15.4 and 10.14.6, so it's hard to say whether it shows up on all versions and, if so, whether the cause is the same.
    https://github.com/osquery/osquery/issues/6431 Raised this for tracking purposes
    @User - do you have any other process running that would read the audit socket that might trigger this crash? Trying to get more info about what's going on 🤷
  • terracatta

    terracatta

    2 years ago
    @nyanshak I just went over to my wife's iMac, which is only used for web browsing, a few games, and Adobe programs, and it has about 15 crash reports on it.
    It has never run osquery or any other security software.
  • nyanshak

    nyanshak

    2 years ago
    When you say "15 crash reports" - what do you mean? Like /var/audit/*.crash_recovery files? Or something else?
  • terracatta

    terracatta

    2 years ago
    Yes, like /var/audit/*.crash_recovery