  • z

    Zach Zeid

    1 year ago
    Is there any benefit from having two of the same queries running at different intervals, one that is set to snapshot and the other set to differential? Or would it make more sense to just have differential on a more frequent schedule?
  • zwass

    zwass

    1 year ago
    Some folks see benefit from this. Snapshot queries are easy to interpret since they show the full state at once. Of course, they generate more data to store wherever your logging pipeline sends it. Some folks schedule a snapshot query once a day and the same query as differential every 10 minutes (intervals just for example), and then you can reconstruct the state at any given time by "applying" the diffs to the snapshot.
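
    A minimal sketch of "applying" the diffs to a snapshot. It assumes each log record is a dict with an `action` of `snapshot`, `added`, or `removed`, a `unixTime`, and either a `snapshot` list of rows or a single `columns` row. Check those field names against what your own logger actually emits. `freeze` and `rebuild_state` are hypothetical helper names.

```python
from typing import Iterable


def freeze(columns: dict) -> tuple:
    """Turn a row dict into a hashable, order-independent tuple."""
    return tuple(sorted(columns.items()))


def rebuild_state(records: Iterable[dict]) -> set:
    """Replay records in time order and return the resulting set of rows.

    A snapshot replaces the whole state; added/removed records adjust it.
    Field names are assumptions about the log format, not a guaranteed schema.
    """
    state: set = set()
    for rec in sorted(records, key=lambda r: int(r["unixTime"])):
        action = rec["action"]
        if action == "snapshot":
            # The snapshot is the new baseline; earlier diffs no longer matter.
            state = {freeze(row) for row in rec["snapshot"]}
        elif action == "added":
            state.add(freeze(rec["columns"]))
        elif action == "removed":
            state.discard(freeze(rec["columns"]))
    return state
```

    Because every snapshot overwrites the state, a dropped diff only causes drift until the next snapshot arrives. This does nothing about skew between the two schedules.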
  • s

    seph

    1 year ago
    I'm not sure that works in practice. The diffs aren't based on the scheduled results so weird skew is possible.
    I suspect people use snapshot to try to capture state and diffs to alert in specific conditions. But 🤷 I'm speculating.
  • a

    Alexander

    1 year ago
    We are using such a scheme in our installation. In our model, data loss is possible (and sometimes happens) in the logs collector, so a daily snapshot covers possible loss of differential query results for some queries where we want to be able to rebuild the state of the machine (for example, the deb_packages query).
    But we don't use distributed queries. If you have them and can use them, and don't need timelines, then you can consider using the Fleet or Kolide service to get the most current state of a machine.
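
    For illustration, a schedule pairing a daily snapshot with a more frequent differential run of the same query might be generated like this. The query names, the selected columns, and the intervals are placeholders, not the actual configuration described above.

```python
import json

DAY = 86400        # seconds; illustrative interval for the snapshot
TEN_MINUTES = 600  # seconds; illustrative interval for the differential run

# Same SQL for both entries, so the snapshot and the diffs describe the same rows.
QUERY = "SELECT name, version, arch FROM deb_packages;"

config = {
    "schedule": {
        # Hypothetical query names; "snapshot": True logs the full result set each run.
        "deb_packages_snapshot": {"query": QUERY, "interval": DAY, "snapshot": True},
        # Without "snapshot", the query is logged differentially (added/removed rows).
        "deb_packages_diff": {"query": QUERY, "interval": TEN_MINUTES},
    }
}

print(json.dumps(config, indent=2))
```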
  • s

    seph

    1 year ago
    How do you reconcile the diffs with the snapshots? Or is your data seldom changing so it’s a non-issue?
    How do you re-sync a diff stream that has dropped messages?
  • a

    Alexander

    1 year ago
    Yes, frequent changes require a different pipeline and the *_events tables. The scheme I described is for seldom-changing data, like installed packages, kernel modules, and the users list, and it is applied to servers. Usually it is enough to just take the latest added and snapshot results in some time window. But if you want to use such a pipeline in automation and "smart" alerts, then you have to fully rebuild the state of the machine by applying incoming diffs to the last snapshot. Currently this is done only for the list of installed packages (used to track and check packages on servers against USN and NVD feeds).
    We don't re-sync. Drops usually occur during network problems or performance problems on the logs collector. We have queues to soften the impact of a short outage, but can't do anything about long-lasting problems. So in that case we have to wait for the next snapshot.
    Just to make it clear: the logs collector is not where osquery sends logs. We have a service that gets logs from all osquery instances, batches them, and sends them to the archive and the logs collector. That service is much more stable than our current collector.
  • s

    seph

    1 year ago
    You may want to look at the epoch incrementing functionality
  • a

    Alexander

    1 year ago
    I see. Can the epoch be changed by some event inside osquery, or only by the config plugin?
  • z

    Zach Zeid

    1 year ago
    epoch incrementing functionality?
  • a

    Alexander

    1 year ago
    Not sure if I got it right. Do you mean to use the schedule_epoch option and the counter to rebuild state and detect drops?
  • s

    seph

    1 year ago
    epoch changes are triggered by the TLS server (presumably also by the config plugin). I’m not aware of another way to change the epoch, but I haven’t gone digging too much.
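
    A rough sketch of how a consumer might react to an epoch change. It assumes each differential record carries an `epoch` field alongside `action` and `columns`, and that after an epoch change osquery re-sends the full result set as added rows. Both are assumptions to verify against your osquery version. `apply_with_epoch` is a hypothetical helper name.

```python
def apply_with_epoch(state: set, last_epoch, rec: dict):
    """Apply one differential record, starting over when the epoch changes.

    Returns the (possibly new) state set and the epoch of this record.
    """
    epoch = rec.get("epoch", 0)
    if last_epoch is not None and epoch != last_epoch:
        # Assumption: a new epoch means the old rows are no longer a valid baseline.
        state = set()
    row = tuple(sorted(rec["columns"].items()))
    if rec["action"] == "added":
        state.add(row)
    elif rec["action"] == "removed":
        state.discard(row)
    return state, epoch
```

    Under those assumptions, bumping the epoch from the server side acts as a forced re-sync for a diff stream that has dropped messages.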