# general
Macear

02/12/2021, 2:13 AM
And I have another question: when I execute a distributed query against the osquery_schedule table, I see that on some servers there are no executions of scheduled queries. But on the other hand I get query results via syslog, which means the queries actually executed. I also see many memory/CPU threshold violations and osquery workers stopping. I have only two scheduled queries, against the process_events and file_events tables. Why? What is the reason, and what should I look for?
zwass

02/12/2021, 2:16 AM
Do those queries show as `denylisted` in `osquery_schedule`? That seems likely given the watchdog issues you describe.
What is the interval on those queries? Possibly setting a lower interval will mean the resource spikes at query time will be smaller and the queries will be able to complete.
Macear

02/12/2021, 2:48 AM
The intervals are 60s. They aren't denylisted. I have the default watchdog limits. Do I need to increase them? If so, are there any recommendations about what optimal (more or less common) thresholds are?
zwass

02/12/2021, 2:49 AM
It really depends on your environment and the observability <> performance tradeoffs you are willing to make. I discuss performance some in https://dactiv.llc/blog/osquery-performance-at-scale/ though you might already know most of what's in there.
I would consider dropping the interval to 10s... 6x fewer events per query run could mean ~6x lower resource usage. Of course this will happen 6x more often. But the point is to smooth out the load spikes so that osquery stays under the watchdog limits.
This strategy assumes you are using event-based tables and you have `events_optimize` turned on (the default).
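For reference, a schedule entry with the lowered interval discussed here might look like the following in the osquery config. The query names and the `SELECT *` statements are illustrative, not taken from the conversation:

```json
{
  "schedule": {
    "process_events_q": {
      "query": "SELECT * FROM process_events;",
      "interval": 10
    },
    "file_events_q": {
      "query": "SELECT * FROM file_events;",
      "interval": 10
    }
  }
}
```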
Macear

02/12/2021, 2:52 AM
Yes, `events_optimize` is on
What does the CPU and memory utilisation of queries like "select * from process_events;" depend on? What do I need to take into consideration? Does the number of running processes affect it? I suppose so, but nevertheless I'd be more confident if you could confirm it
I also want to ask whether the query profiler (from the GitHub repo) shows resource usage for *_events queries?
zwass

02/12/2021, 2:58 AM
Keep in mind that any `_events` table will be constantly recording data into osquery's local store, and the data will be read from the store when the query is run on the schedule.
This means that the longer the interval for an `_events` table, the more data is processed when the query runs. For a regular table all of the data is generated at query time, so the interval has no effect.
The profiler won't be able to give you any helpful information on event-based tables because there won't be any events in the buffer when the profile is run. Essentially it will just look like the query is very efficient because it's not processing any data.
Macear

02/12/2021, 3:00 AM
Ok, that’s a good piece of information, thanks. Would it affect performance if I set the interval to less than 60s? What is the minimum recommended interval setting?
zwass

02/12/2021, 3:03 AM
Let's say that your machine is generating 100 events per second. If you run the query every 10 seconds, the query processes 1,000 events each time. If you run it every 60 seconds, it processes 6,000 events each time. It's the same total number of events, but the spike in resource usage is much smaller when the interval is lower.
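The arithmetic above can be sketched in a few lines. The 100 events/second rate is the hypothetical figure from the example; real workloads will be burstier:

```python
# Back-of-the-envelope check of the interval tradeoff described above.
# Assumes a constant event rate (hypothetical example figure).
EVENT_RATE = 100  # events generated per second


def events_per_run(interval_s: int) -> int:
    """Events buffered between scheduled runs at a given interval."""
    return EVENT_RATE * interval_s


print(events_per_run(10))  # 1000 events per run at a 10s interval
print(events_per_run(60))  # 6000 events per run at a 60s interval
```

The total events processed per minute is identical either way; only the size of each per-run spike changes.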
👍 1
Macear

02/12/2021, 3:03 AM
To clarify a little about my config: 1) `events_expiry` is 1, 2) `events_max` is 50000
zwass

02/12/2021, 3:04 AM
I don't have any hard numbers, but my instinct would be that if you go below 5 seconds or so you will start to have enough overhead that it might not gain you much.
Macear

02/12/2021, 3:07 AM
Ok, so for now I will try decreasing the interval step by step and observe how it changes the resource usage on the servers. I'll also try increasing the watchdog limits. Thanks a lot for such a quick reply to my initial question and the further help 👍
zwass

02/12/2021, 3:09 AM
Yeah good luck! Let us know how it goes 🙂
👍 1
Francisco Huerta

02/12/2021, 6:26 PM
@zwass @Macear interesting discussion. What would be your take on the config refresh interval? Is 10s reasonable here as well, or too aggressive (i.e., does a configuration update request force a full download in each cycle, or only whenever there is a change)?
👀 1
zwass

02/12/2021, 6:38 PM
For config refresh I would guide it by what your needs are. Do you need a new scheduled query or a change to a scheduled query to be picked up within 10 seconds by an online host? My feeling would be you probably don't. In most environments 10 minutes, an hour, or even a day could be just fine for those. You still have live queries if you need anything quickly.
The config is going to be fully downloaded each cycle. It's just some JSON so this is not a crazy amount of data, but still this is network traffic you are generating and a slight bit of load on both the osqueryd client and the server.
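As a sketch, a relaxed refresh schedule could be set via osquery command-line flags, e.g. in a flagfile. The values below are illustrative, not a recommendation from the conversation:

```
# osquery.flags — illustrative values, tune to your environment
--config_refresh=3600             # fetch config once per hour
--config_accelerated_refresh=300  # retry faster after a failed fetch
```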
Francisco Huerta

02/12/2021, 6:41 PM
Thanks much @zwass. Those were kind of my assumptions too, and indeed I don't think an immediate refresh of the config is needed that quickly. I see the benefit of more frequent updates when packs are being fine-tuned, but then an interval of 1h or more would probably make more sense.
zwass

02/12/2021, 7:03 PM
Yeah this makes sense to me.
Macear

02/15/2021, 5:31 PM
For now, some interim results: I’ve decreased the interval of queries against *_events tables from 60s down to 10s. The number of hosts where limit violations are observed has decreased by half. On several hosts the issue remains, and I noticed that 200 MB is too low a memory limit for them. I’m in the process of deploying a new osquery configuration with increased watchdog limits; then I’ll see if it helps.
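For anyone following along, raising the watchdog limits is done with osquery flags. The specific numbers below are illustrative only (the memory default is 200 MB, which matches the limit hit above); appropriate values depend on the environment:

```
# osquery.flags — example watchdog overrides (memory default is 200 MB)
--watchdog_memory_limit=350      # worker memory limit in MB
--watchdog_utilization_limit=30  # worker CPU utilization limit
```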