Question

Hey guys, my Fleet process started acting weird lately. It starts, then takes over all of the CPU and RAM, and stops after a few minutes. I have about 15,000 hosts and this has never happened before. Any ideas?

zwass · Answer

What version of Fleet are you running?

Alon Starikov · Answer

Currently 3.0.0; planning on upgrading to 3.3.0 in the near future.

zwass · Answer

Do your logs by chance include many requests to the EnrollAgent endpoint?

Alon Starikov · Answer

Yes

zwass · Answer

Is that expected for you? Do you have quite a few new agents enrolling?

zwass · Answer

If not, is it possible you've deployed a number of hosts with the same hardware UUID? Perhaps by copying a VM?

Alon Starikov · Answer

That might be the case. Is that the cause?

zwass · Answer

We saw something similar with another user. The problem is that enrollment is a fairly expensive operation, and if multiple hosts appear to Fleet to be the same host, they will continually overwrite each other's enrollment.
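To make the overwrite loop concrete, here is a minimal toy simulation. It is not Fleet's actual code: it assumes a simplified model in which the server keeps one node key per host identifier and a host re-enrolls whenever its key stops being valid. The names ToyEnrollmentStore and simulate are hypothetical.

```python
import secrets


class ToyEnrollmentStore:
    """Toy model (assumption): one record per host identifier, one node key each."""

    def __init__(self):
        self.node_key_by_identifier = {}
        self.enroll_count = 0

    def enroll(self, identifier):
        # A new enrollment replaces whatever key was stored for this identifier.
        key = secrets.token_hex(8)
        self.node_key_by_identifier[identifier] = key
        self.enroll_count += 1
        return key

    def is_valid(self, identifier, node_key):
        return self.node_key_by_identifier.get(identifier) == node_key


def simulate(identifiers, rounds):
    """Return how many re-enrollments happen after the initial enrollment.

    `identifiers` holds one entry per host; two hosts sharing a hardware
    UUID appear as the same string twice.
    """
    store = ToyEnrollmentStore()
    keys = [store.enroll(identifier) for identifier in identifiers]
    initial = store.enroll_count
    for _ in range(rounds):
        for n, identifier in enumerate(identifiers):
            # A host whose key was overwritten by its twin enrolls again,
            # which in turn invalidates the twin's key.
            if not store.is_valid(identifier, keys[n]):
                keys[n] = store.enroll(identifier)
    return store.enroll_count - initial
```

In this model, hosts with unique identifiers never re-enroll, while two hosts sharing one identifier each re-enroll on every check-in, so the expensive enrollment path runs continuously.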

zwass · Answer

Here are some notes from that conversation:

1. Status quo (host_identifier=uuid)
   - Works until hosts have the same UUID. Seems to be an issue in your current environment.
   - Not viable in your current environment due to hosts overwriting enrollment.

2. host_identifier=instance
   - A new osquery-specific UUID will be generated and stored in the osquery DB for each host.
   - Works until a VM image is copied with the osquery DB already initialized (though host_identifier=uuid will fail in the same way).
   - Changing this now will cause Fleet to see every host as a fresh enrollment, leading to a single duplicate for each host in Fleet. The duplicates will have to be cleaned up later, though this can be automated with the host expiry setting in Fleet.

3. Redeploy offending hosts with properly reset UUIDs
   - No idea if this is viable for your situation, but if the duplicate issue described above seems worse than doing this, it is worth considering.
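Before choosing between these options, it may help to confirm which hosts actually share a hardware UUID. The sketch below assumes you can gather (hostname, uuid) pairs from some source outside Fleet, for example by reading the uuid column of osquery's system_info table on each host. The function name and the sample data are hypothetical.

```python
from collections import defaultdict


def find_duplicate_uuids(inventory):
    """Group hostnames by hardware UUID and return only the UUIDs shared by 2+ hosts.

    `inventory` is an iterable of (hostname, uuid) pairs. UUIDs are compared
    case-insensitively with surrounding whitespace removed (an assumption
    about how the data was collected).
    """
    hosts_by_uuid = defaultdict(list)
    for hostname, uuid in inventory:
        hosts_by_uuid[uuid.strip().lower()].append(hostname)
    return {
        uuid: sorted(hostnames)
        for uuid, hostnames in hosts_by_uuid.items()
        if len(hostnames) > 1
    }


# Hypothetical sample data: vm-02 and vm-03 were cloned from the same image.
sample_inventory = [
    ("vm-01", "11111111-1111-1111-1111-111111111111"),
    ("vm-02", "22222222-2222-2222-2222-222222222222"),
    ("vm-03", "22222222-2222-2222-2222-222222222222"),
]
duplicates = find_duplicate_uuids(sample_inventory)
```

Any UUID that shows up in the result is a candidate for option 3 (redeploying with a reset UUID), or a sign that option 2 is the simpler route.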

Alon Starikov · Answer

Right, I'll look into it. Thanks!

zwass · Answer

Please let me know how that goes. Of course, we also need to fix Fleet to alert the user and not fall over in this situation.

zwass · Answer

@Alon Starikov are you still encountering this issue? Would it be possible for you to generate a debug archive so that I can try to understand what is going on? I am going to implement a fix that will rate limit enrollment, but I'd also really like to debug the issue that is being triggered before that is fixed.

zwass · Answer

@Alon Starikov we've pushed a cooldown period for host enrollment in Fleet 3.5.0 that is likely to resolve the issue for you. If you have a chance before upgrading, we would really appreciate a debug archive. It's easy to do and may help us prevent similar problems in the future.
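For readers wondering what an enrollment cooldown looks like in principle, here is a minimal sketch. It is not the implementation shipped in Fleet 3.5.0. It assumes a per-identifier cooldown where a repeat enrollment for the same host identifier is refused until a fixed period has elapsed. The class name and the 60-second value in the usage note are hypothetical.

```python
class EnrollCooldown:
    """Refuse repeat enrollments for the same identifier within a cooldown window."""

    def __init__(self, cooldown_seconds):
        self.cooldown_seconds = cooldown_seconds
        self.last_enroll = {}  # identifier -> timestamp of last accepted enrollment

    def allow(self, identifier, now):
        """Return True and record the enrollment if `identifier` is outside its cooldown.

        `now` is a timestamp in seconds, passed in explicitly so the
        behaviour is easy to test without a real clock.
        """
        last = self.last_enroll.get(identifier)
        if last is not None and now - last < self.cooldown_seconds:
            return False  # too soon: skip the expensive enrollment work
        self.last_enroll[identifier] = now
        return True
```

With EnrollCooldown(60), two hosts fighting over one identifier could trigger at most one accepted enrollment per minute for that identifier, rather than one on every check-in. Hosts with other identifiers are unaffected.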

Alon Starikov · Answer

Apologies, I won't be able to get around to it this week, unfortunately. I will try to get it done as soon as I can. host_identifier=instance actually seems to do the trick for me, though; I haven't encountered any problems since changing this setting.