# windows

Juan Alvarez

12/22/2021, 11:48 AM
Hi, has anybody used osquery successfully on Windows servers with a high load of events? I am trying to set up osquery to capture all the windows_events and send them up to our SIEM using the tls plugin, in some scenarios with around 2000-3000 events per second. I have noticed that osquery struggles to get that huge amount of events out. I am currently querying every 30 secs (tried 60 secs as well) and flushing every 10 seconds. I have also increased `logger_tls_max_lines` to the maximum (99999). Any suggested configuration or experience from somebody?
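For reference, a flagfile matching this setup would look roughly like the following (the hostname and endpoint path are placeholders; the 30-sec query interval itself lives in the scheduled query/pack, not in the flagfile):

```
# Placeholder TLS endpoints; the flush/buffer values are the ones described above.
--logger_plugin=tls
--tls_hostname=siem.example.com
--logger_tls_endpoint=/logger
# Flush buffered results every 10 seconds.
--logger_tls_period=10
# Raised from the default to push more results per flush.
--logger_tls_max_lines=99999
```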

Stefano Bonicatti

12/22/2021, 1:46 PM
Are you able to get direct access to one of those servers? With that amount of events and the default watchdog CPU or memory limits, I would expect the watchdog to keep killing osquery and so blocking its ability to flush the logs. One quick way to notice that (beyond looking at the log, which you are struggling to receive) is to check via Task Manager whether one of the `osqueryd` PIDs keeps changing. You can also verify how much CPU/memory it is using.
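For example, from a command prompt this can be checked quickly (assuming a default install, where the process is named `osqueryd.exe`):

```
:: Run this a few times; if the worker's PID keeps changing between runs,
:: the watchdog is repeatedly killing and respawning it.
tasklist /FI "IMAGENAME eq osqueryd.exe"
```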

Juan Alvarez

12/22/2021, 2:20 PM
Yes, I should have mentioned that I disabled the watchdog for testing purposes (just to see if I am actually able to send the events).
Now, I have restarted in verbose mode, and I see some of these messages which I have not seen before:
```
I1222 14:20:13.356295  2880 scheduler.cpp:176] Found results for query: pack/DevoEventsPack/all_windows_events
I1222 14:20:16.077103  2880 rocksdb.cpp:67] RocksDB: [WARN] [db/column_family.cc:924] [events] Stalling writes because we have 15 immutable memtables (waiting for flush), max_write_buffer_number is set to 16 rate 16777216
I1222 14:20:16.142108  2880 rocksdb.cpp:67] RocksDB: [WARN] [db/column_family.cc:924] [events] Stalling writes because we have 15 immutable memtables (waiting for flush), max_write_buffer_number is set to 16 rate 16777216
I1222 14:20:16.149109   112 rocksdb.cpp:67] RocksDB: [WARN] [db/column_family.cc:924] [events] Stalling writes because we have 15 immutable memtables (waiting for flush), max_write_buffer_number is set to 16 rate 13421772
I1222 14:20:16.712949  2880 rocksdb.cpp:67] RocksDB: [WARN] [db/column_family.cc:924] [events] Stalling writes because we have 15 immutable memtables (waiting for flush), max_write_buffer_number is set to 16 rate 16777216
I1222 14:20:16.775468  2880 rocksdb.cpp:67] RocksDB: [WARN] [db/column_family.cc:924] [events] Stalling writes because we have 15 immutable memtables (waiting for flush), max_write_buffer_number is set to 16 rate 16777216
I1222 14:20:16.806708  4772 rocksdb.cpp:67] RocksDB: [WARN] [db/column_family.cc:944] [events] Stalling writes because we have 20 level-0 files rate 13421772
I1222 14:20:16.869204  3528 rocksdb.cpp:67] RocksDB: [WARN] [db/column_family.cc:944] [events] Stalling writes because we have 21 level-0 files rate 10737417
I1222 14:20:16.900492  2880 rocksdb.cpp:67] RocksDB: [WARN] [db/column_family.cc:944] [events] Stalling writes because we have 21 level-0 files rate 8589933
I1222 14:20:16.916087  2268 rocksdb.cpp:67] RocksDB: [WARN] [db/column_family.cc:944] [events] Stalling writes because we have 22 level-0 files rate 6871946
I1222 14:20:16.962956   112 rocksdb.cpp:67] RocksDB: [WARN] [db/column_family.cc:944] [events] Stalling writes because we have 23 level-0 files rate 5497556
I1222 14:20:17.116748  2880 rocksdb.cpp:67] RocksDB: [WARN] [db/column_family.cc:944] [events] Stalling writes because we have 23 level-0 files rate 4398044
I1222 14:20:17.476115  2880 rocksdb.cpp:67] RocksDB: [WARN] [db/column_family.cc:944] [events] Stalling writes because we have 23 level-0 files rate 3518435
I1222 14:20:17.788628  2880 rocksdb.cpp:67] RocksDB: [WARN] [db/column_family.cc:944] [events] Stalling writes because we have 23 level-0 files rate 2814748
I1222 14:20:18.024206  2880 rocksdb.cpp:67] RocksDB: [WARN] [db/column_family.cc:924] [events] Stalling writes because we have 15 immutable memtables (waiting for flush), max_write_buffer_number is set to 16 rate 3940647
I1222 14:20:18.149199  2880 rocksdb.cpp:67] RocksDB: [WARN] [db/column_family.cc:924] [events] Stalling writes because we have 15 immutable memtables (waiting for flush), max_write_buffer_number is set to 16 rate 5516905
I1222 14:20:18.242956  2880 rocksdb.cpp:67] RocksDB: [WARN] [db/column_family.cc:924] [events] Stalling writes because we have 15 immutable memtables (waiting for flush), max_write_buffer_number is set to 16 rate 7723666
```
🤔
Also, I should mention that I am using osquery 4.9.0; not sure if there is any patch in 5.1 that might help, but I am going to give it a try.
I am going to also try to increase this hidden flag:
`HIDDEN_FLAG(int32, rocksdb_write_buffer, 16, "Max write buffer number");`
Any idea if that might help unstick things?
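Concretely, I was thinking of adding something like this to the flagfile (32 is just an arbitrary doubling of the default of 16 to test, not a tuned value):

```
# Hypothetical test value: double the default write buffer number.
--rocksdb_write_buffer=32
```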

Stefano Bonicatti

12/22/2021, 3:57 PM
I see; I haven't personally encountered such a high amount of events or those messages. RocksDB is on disk, so disk speed will affect osquery's ability to quickly store data.
I know of others who have reported this in the past, e.g. https://github.com/osquery/osquery/issues/5162; it's something that would need a deeper look. I suspect that increasing that number will only improve the ability to absorb spikes, which helps if spikes are the problem, but if the load is continuous, then the writing rate itself has to increase.

Juan Alvarez

12/22/2021, 4:08 PM
I see. This server is gathering data from other servers via WEF/WEC, which is why the amount of events is so high (the events are forwarded). What would you say is the number of events per second osquery can handle in a normal scenario?
I am going to try to put a better disk in place and see if the situation improves. Any other recommendation to increase the writing rate? Any parameter that can be tweaked in RocksDB?

Stefano Bonicatti

12/22/2021, 4:16 PM
Giving numbers is quite difficult because there are too many variables, from the hardware to the nature of the events. I think I would expect maybe less than 1k events/s. As for parameters to change, you could try increasing `rocksdb_background_flushes` (this should translate roughly into how many threads are used to do the flushing).
In any case, as I was saying, I haven't personally encountered such a high amount of events being generated; this would need testing and some profiling, and I suspect it requires code changes.
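For example (the value 8 here is only an arbitrary higher-than-default number to experiment with, not a recommendation):

```
# Hidden flag; roughly the number of background threads RocksDB can use for flushes.
--rocksdb_background_flushes=8
```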

Juan Alvarez

12/22/2021, 7:19 PM
Thanks, I am going to test a bit with those; I'll see where I can get.
Tried with a gp3 volume in AWS with 16000 IOPS and 1000 MiB/s throughput; still could not make it work. I have left a comment in https://github.com/osquery/osquery/issues/5162. I am not sure how common this scenario is (maybe it is not common to be so busy), but I still feel that with the Windows events we always see problems when loads get a little higher.

Stefano Bonicatti

12/23/2021, 4:55 PM
@Juan Alvarez Since you were using AWS, would you be able to also give us the specs of that VM? CPU/Memory/Disk. Thanks!

Juan Alvarez

12/23/2021, 4:59 PM
Sure:
- Instance type: m5.xlarge
- 4 vCPUs / 16 GB RAM
- Disk: GP3 SSD
  - Size: 40 GB
  - IOPS: 16000 max
  - Throughput: 1000 MiB/s

Stefano Bonicatti

12/23/2021, 5:04 PM
Do you also know roughly what the CPU usage is before and while osquery is running?

Juan Alvarez

12/23/2021, 5:08 PM
Around 15% on the server where I am replicating the issue; only an event generator, https://github.com/andrewkroh/goeventgen, is running to emulate the load.