• b

    Brandon

    1 year ago
    is there a parameter I can pass to configs to address this issue:
    save enroll failed: host identified by 1234123-1234-1234-1234-C3C04F373533 enrolling too often
    Also, seeing
    authentication error: missing node key
    and
    enroll failed: no matching secret found
    and finally
    failed to mark host seen: marking host seen: Error 1205: Lock wait timeout exceeded; try restarting transaction
  • These errors make up less than 0.8% total traffic from osquery to our elk stack
  • zwass

    zwass

    1 year ago
    Which version of Fleet are you on?
  • This usually means you have multiple hosts with the same UUIDs. The issue can be addressed by setting
    --host_identifier=instance
    in your osquery flagfile, or in Fleet 3.9.0 you can configure it within Fleet itself: https://github.com/fleetdm/fleet/blob/master/docs/2-Deployment/2-Configuration.md#osquery_host_identifier
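To check which identifiers are actually shared, one option is to tally the "enrolling too often" errors by host identifier. The following is a minimal sketch, not a Fleet tool. The regular expression is taken from the error text quoted earlier in the thread, while the function name `count_throttled_identifiers` and the assumption that each log entry arrives as one plain-text line are hypothetical.

```python
import re
from collections import Counter

# Pattern based on the error message quoted in the thread; the rest of the
# log line format (timestamps, JSON wrapping, etc.) is an assumption.
TOO_OFTEN = re.compile(r"host identified by (\S+) enrolling too often")


def count_throttled_identifiers(lines):
    """Return a Counter mapping host identifier -> number of throttled enrolls."""
    counts = Counter()
    for line in lines:
        match = TOO_OFTEN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "save enroll failed: host identified by "
        "1234123-1234-1234-1234-C3C04F373533 enrolling too often",
        "authentication error: missing node key",
    ]
    # Identifiers that appear many times are candidates for duplicated UUIDs.
    for identifier, hits in count_throttled_identifiers(sample).most_common():
        print(identifier, hits)
```

An identifier that shows up far more often than a single host's retry interval would allow is a likely sign that several hosts are presenting the same UUID.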
  • b

    Brandon

    1 year ago
    3.9.0 is the version
  • s

    Scott Lampert

    1 year ago
    As an aside, I noticed that if you already have hosts showing up with duplicate IDs, changing to
    host_identifier=instance
    doesn’t help, because the hosts already have the duplicated ID stored in their osquery backing store and won’t generate a new one. Only new hosts that pick up that config change will have newly generated IDs.
  • @zwass :this:
  • b

    Brandon

    1 year ago
    Would redeploying to the hosts fix it?
  • zwass

    zwass

    1 year ago
    It sounds like using the setting in Fleet would probably be your easiest option.
  • @Scott Lampert are you talking about setting
    host_identifier=instance
    from the osquery options within Fleet?
  • s

    Scott Lampert

    1 year ago
    @zwass Both. Once osquery boots up with any sort of config that stores its uuid in the osquery backing store it won’t change unless you either remove the backing store and restart with
    instance
    enabled in the flags or use
    ephemeral
    in the flags. The issue on the Fleet side is that if you already have a bunch of nodes trying to enroll with the same ID, you really need to use the cooldown, or the database will get thousands of lock contentions and fall over (we have 120,000+ nodes checking into Fleet). If a large portion of those nodes are stuck with a non-unique ID, they never get to enroll, since the rate of nodes trying to enroll will always trigger the cooldown. This means you can’t really count on any UUID-related osquery config changes made in Fleet being picked up. This might not be an issue until a certain scale.
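The starvation effect described above can be illustrated with a toy model. This is a rough sketch under stated assumptions, not Fleet's implementation: the `EnrollCooldown` class, the `simulate` helper, and the rule that only a successful enroll starts the cooldown window are all hypothetical.

```python
class EnrollCooldown:
    """Toy per-identifier cooldown: one accepted enroll per window."""

    def __init__(self, cooldown_seconds):
        self.cooldown_seconds = cooldown_seconds
        self._last_enroll = {}  # identifier -> time of last accepted enroll

    def try_enroll(self, identifier, now):
        last = self._last_enroll.get(identifier)
        if last is not None and now - last < self.cooldown_seconds:
            return False  # would surface as "enrolling too often"
        self._last_enroll[identifier] = now
        return True


def simulate(num_hosts, retry_interval, cooldown_seconds, duration):
    """All hosts share one identifier and retry every retry_interval seconds.

    Returns (accepted, rejected) enroll attempts over the simulated duration.
    """
    gate = EnrollCooldown(cooldown_seconds)
    accepted = rejected = 0
    for tick in range(0, duration, retry_interval):
        for _ in range(num_hosts):
            if gate.try_enroll("shared-uuid", tick):
                accepted += 1
            else:
                rejected += 1
    return accepted, rejected


if __name__ == "__main__":
    # 1000 hosts with the same identifier, retrying every 10 seconds
    # against a 60 second cooldown, for 10 minutes.
    print(simulate(1000, 10, 60, 600))
```

In this model only one enroll per cooldown window is accepted for the shared identifier, so almost every attempt is rejected no matter how often the hosts retry.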
  • zwass

    zwass

    1 year ago
    @Scott Lampert is it possible that what you are seeing is that an already-enrolled osquery database was copied over to multiple hosts? Otherwise that sounds like a bug in osquery, as
    instance_identifier
    should be generated separately for any installation, regardless of existence/value of UUID.
  • s

    Scott Lampert

    1 year ago
    The symptom we saw is that osquery was misconfigured locally to not have any
    host_identifier
    settings on a few thousand hosts exhibiting the above behavior. We found that even after ssh’ing into a host and re-running with
    --host_identifier=instance
    fleet would still see the original duplicate hw uuid regardless of that setting. If we set it to
    ephemeral
    it would also work correctly. If we deleted the backing store and restarted it with
    instance
    it also would show up correctly.
  • Just changing it to
    instance
    would not seem to generate a new uuid in the osquery info.
  • once it already had one.
  • zwass

    zwass

    1 year ago
    instance_id
    is the column Fleet would use if you configure https://github.com/fleetdm/fleet/blob/master/docs/2-Deployment/2-Configuration.md#osquery_host_identifier. That should be unique per osquery database and if it's not that's a bug (please file an issue).
  • s

    Scott Lampert

    1 year ago
    It is, but only if you initially used
    instance
    .
    instance
    stores the id in the backing store once it’s generated. Otherwise you would want ephemeral.
  • This is by design in osquery
  • instance uses an instance-unique UUID generated at process start, persisted in the backing store.
  • So once it has an ID in the backing store, changing it to
    instance
    will not generate a new uuid
  • You would have to use
    instance
    before osquery makes its db the first time.