#core
z

zwass

05/11/2022, 11:09 PM
Does anyone have a sense of where most of the resource utilization in osquery comes from when using event-based tables with a high volume of events? Is it in the RocksDB read/write?
a

alessandrogario

05/11/2022, 11:38 PM
Is this about a specific platform? In Audit/BPF, there's a lot of stuff to parse, then there's a lot of state that needs updating, and finally some RocksDB too
Looking to solve this with the experimental BPF work
z

zwass

05/11/2022, 11:39 PM
I'm just trying to get an intuitive sense of what's going on when people talk about resource consumption issues with evented tables.
a

alessandrogario

05/11/2022, 11:40 PM
For macOS, I haven't tried it, but I expect Endpoint Security to work really well; events seem rather rich in metadata and didn't require too much parsing/state handling
On Linux we are at an additional disadvantage when the host is running many containers, essentially causing osquery to trace multiple machines' worth of data at the same time
ty 1
d

defensivedepth

05/12/2022, 11:55 AM
When I talk about performance issues with evented tables, it's usually two things:
• osquery itself is struggling with the volume of events - REF: https://osquery.slack.com/archives/C08V7KTJB/p1647970028338159
• my backend system is struggling with the volume of events aggregated from a bunch of osquery endpoints sending evented data. Other agents (like Sysinternals Sysmon) allow you to have complex filters at the endpoint, before the events are shipped off the box.
I'm actually presenting at BSides Fort Wayne next week on the 2nd issue - automatically generating osquery filters from Sysmon configs. (Something @fritz worked with me on last year)
ty 1
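To make the filtering idea concrete, here is a minimal sketch of a scheduled query that applies the filter at the endpoint, so noisy rows are dropped before they are ever shipped (the path patterns are placeholders, not a recommended allowlist):
```
-- Only ship process events whose binary is outside the usual system
-- directories; everything else is discarded on the box.
SELECT pid, path, cmdline, uid, time
FROM process_events
WHERE path NOT LIKE '/usr/bin/%'
  AND path NOT LIKE '/usr/sbin/%';
```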
j

Juan Alvarez

05/12/2022, 4:04 PM
We have many customers collecting `windows_events` to forward to our SIEM, and we hit issues every time with the watchdog limits as well as the RocksDB limitation mentioned in the thread above. I think those are fairly common scenarios in bigger companies where events per second (EPS) on the same box go up to at least 500. I am not sure I can say where the memory consumption comes from at a lower level, but we used to send the data using the `tls` logger, and I think performance seems better when using `filesystem` (which we can combine with some other software for remote send, like Fluent Bit). Now I have been testing with a logger developed by us, with ingestion levels near 1000 EPS approximately, that seems stable, but it just works in memory for now (no RocksDB). Just sharing this info in case it helps.
ty 1
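As a rough sketch of that kind of collection, a scheduled query over `windows_events` restricted to a single channel (the column names are the documented schema as far as I recall, and 'Security' is just an example):
```
-- Forward only Security-channel events instead of every channel.
SELECT datetime, source, provider_name, eventid, data
FROM windows_events
WHERE source = 'Security';
```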
s

seph

05/13/2022, 7:13 PM
In the distant past, Uptycs was talking about how performance in that kind of log shipping situation suffered because the data flows through the SQL engine. Honestly, it always felt a bit weird? Like if you don't want any of the SQL side, why use osquery as your high-volume log shipper?
a

alessandrogario

05/13/2022, 7:13 PM
we had a cool concept from packetzero that traded data integrity for performance
I think we could have a different database plugin, a rocksdb alternative
z

zwass

05/13/2022, 7:14 PM
Ah yeah I remember both of those conversations. Would it be crazy to try to use SQLite?
a

alessandrogario

05/13/2022, 7:15 PM
I think there are different things in the discussion:
1. Going through sqlite to compute the results
2. Where results are stored (currently rocksdb)
3. Where buffered log lines are stored (also rocksdb, from the buffered log forwarder)
packetzero's PoC replaced 2 and 3, but still went through 1
uptycs wanted to bypass 1/2/3 entirely
s

seph

05/13/2022, 7:18 PM
I’m generally against bypassing (1). Mostly coming from a “what else is osquery” stance. I’m willing to be convinced. And I have no deep feelings about (2) and (3). To me, those are implementation details, and I’ll roll with whatever y’all tell me makes sense
💯 1
a

alessandrogario

05/13/2022, 7:18 PM
Totally agree with the above sentence
We also have to talk about integrity again, because it's not as guaranteed as it could be in RocksDB
and one might argue that it should either be all off and fast, or all on and slow (i.e. not the current situation where it's mixed)
z

zwass

05/13/2022, 7:50 PM
I'm most curious about whether it could make sense to use sqlite to do 2 and 3. But it doesn't matter if RocksDB is only a small portion of the resource utilization for the event based tables.
a

alessandrogario

05/13/2022, 7:55 PM
before the integrity was relaxed again, it was taking a significant toll on cpu/memory
I seem to recall that packetzero's PoC had clear advantages on that front
I think we had an sqlite database plugin, but it was deprecated (while still experimental). It was frowned upon, but I never used it, so I don't know how it performed or why it was removed
s

seph

05/13/2022, 7:59 PM
Does our SQLite have any kind of disk persistence mechanism?
z

zwass

05/13/2022, 8:35 PM
IIRC we open the DB in memory, so even when you do things like `create table`, they go away on restart.
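A minimal illustration of that behavior (the table name is made up):
```
-- Works for the lifetime of the osquery process...
CREATE TABLE scratch (id INTEGER, note TEXT);
INSERT INTO scratch VALUES (1, 'lives only in memory');
SELECT * FROM scratch;
-- ...but because the SQLite database is opened in memory, 'scratch'
-- no longer exists after a restart.
```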
s

seph

05/13/2022, 8:36 PM
Kinda what I mean… So using sqlite for (2) or (3) has data loss questions
z

zwass

05/13/2022, 8:39 PM
Ah yeah but I would think if we did it we would use disk-backed sqlite.
s

seph

05/13/2022, 8:39 PM
Could be an interesting experiment. Though I wonder why. It feels like a deepish change, and hitting disk is hitting disk.
z

zwass

05/13/2022, 8:41 PM
Partly I'm interested in the idea of "sqlite all the way down". Getting some of the data living in an actual sqlite table seems like it could bring osquery's behavior closer to what many folks expect.
s

seph

05/13/2022, 8:41 PM
Maaaybe. I am somewhat skeptical.
z

zwass

05/13/2022, 8:41 PM
Also I'm still scarred from seeing so many DB corruption issues in the past with RocksDB. I'm not sure that's been an issue as much lately though?
s

seph

05/13/2022, 8:42 PM
For (2) and (3) I think we’d end up needing to do a lot that subverts sqlite expectations.
My gut sense is that when people hint around "what a db should be", what they really mean is something about performance on tables. Eg: tables should be data on disk, so that join performance is as expected. I'm sympathetic, but I'm not always sure I agree. Mostly I think of osquery as an API translation layer. With that caveat, I think exploring osquery as more of a db would be interesting. But I don't think I'd come at it from (2) and (3) above. Those feel weirdly tied into events. I'd probably start with:
• The existing query cache stuff that never seems to work?
• stefano's caching code
• Moving away from eponymous tables to ???
But I think there are a lot of hard-to-answer questions.
Something like the `file` table, or the `plist` one, cannot be real tables. Those are close to functions masquerading as tables.
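A small illustration of why they feel like functions: both only return rows when the query supplies an input constraint (the paths are just examples):
```
-- 'file' needs a path (or directory) constraint to know what to stat.
SELECT path, size, mtime FROM file WHERE path = '/etc/hosts';

-- 'plist' likewise parses whatever file the constraint points it at.
SELECT key, value FROM plist
WHERE path = '/Library/Preferences/com.apple.loginwindow.plist';
```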
z

zwass

05/13/2022, 10:07 PM
Many (most?) tables aren't conducive to that pattern because there's no API for events. Osquery as an api translation layer works really well for a lot of apis, but less so for events IMO.
s

seph

05/13/2022, 10:14 PM
I agree! I don't have a simple model for events.
It's like the api translation bolted onto a table store, with a magic cleanup routine. Which is a mouthful.
💯 1
a

alessandrogario

05/13/2022, 11:49 PM
joins are also really weird when using evented tables
I wouldn't mind having an alternative interface to event data; the problem is identifying what it should look like
given how JOINs essentially add race conditions to evented tables, one way of thinking would be to just never do it and make sure evented rows have everything you could possibly need
at that point it doesn't make much sense to use sqlite to access it, though
(race conditions as in: joining the user id of an audit event with the users table)
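For reference, the kind of join being described, where the users table is read at query time rather than at event time (column names as in the current schemas):
```
-- The uid captured with the event is resolved against the *current*
-- users table; if the account was removed or the uid was reused between
-- the event and the query, the username returned here is wrong.
SELECT pe.time, pe.path, pe.uid, u.username
FROM process_events AS pe
LEFT JOIN users AS u ON u.uid = pe.uid;
```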
s

seph

05/14/2022, 12:02 AM
Thinking aloud….
1. Events feed a table
2. That table has some max events cleanup
3. Events have a unique Id
4. join select, whatever.
5. The magic thing would track last id, and add an implicit "where id > last"
But that's a lot of overhead, and I'm not sure it would be performant.
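A rough sketch of what that implicit cursor would amount to if written out by hand; `id` and `:last_id` are hypothetical, standing in for the unique event id from step 3 and the value the magic layer remembers:
```
-- Only return rows newer than the last-seen id, then remember the new
-- high-water mark for the next run.
SELECT *
FROM process_events
WHERE id > :last_id
ORDER BY id;
-- The layer would store MAX(id) from this result as the new :last_id.
```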
a

alessandrogario

05/14/2022, 12:05 AM
I think the problem is that you can for example add/remove users while events are generated
if the event itself is not capturing the username then it's only a guess what is going to happen when joining against the users table
worst case scenario user IDs are being reused and we would end up with the same ID mapping to different names
if we talk about data "quality", events rank high because they come from the source and hopefully things have been acquired atomically (like Endpoint Security, or the metadata we get in BPF/Audit)
joining against anything will lower the quality significantly, and I wondered many times if it actually makes sense to do it
I can see why Uptycs would like to acquire the data as-is, sending it directly to the logger