# general

vaar

06/19/2020, 9:03 PM
I am facing an issue with the process_events table: when a pid is reused, joins with other tables can return wrong results. Did you ever consider implementing an additional random ID (larger than uint16)? Most EDRs implement this to avoid pid-reuse ambiguity.

zwass

06/19/2020, 9:13 PM
This is an interesting one... I'd be curious to know more about how other tools deal with it. Can you open an issue on GitHub with some description of that?

seph

06/21/2020, 12:56 PM
I'm curious how that works in practice, and how it helps.

Mike Myers

06/22/2020, 9:56 PM
Maybe osquery could tag the process events with an additional randomly generated ID for uniqueness in the logs, but I don't think it would avoid the issue you're seeing, which I bet is a race condition between two point-in-time queries that constitute the JOIN. This is a design limitation, I think. @alessandrogario might know differently

seph

06/22/2020, 9:57 PM
I’m hesitant to suggest adding a ULID without understanding the problem it would solve. I don’t think it would help correlate between osquery tables. Best case, it provides a unique identifier for external systems, but those external systems should be able to make their own unique identifiers.

alessandrogario

06/22/2020, 10:01 PM
How would an additional ID solve this issue? We need a way to map pid -> internal-pid-that-is-never-reused -> back to pid
we can come up with a way to generate it, but when joining we always end up using a standard pid once again

seph

06/22/2020, 10:02 PM
Yes, exactly. And it’s extra state to track

alessandrogario

06/22/2020, 10:02 PM
i.e.: we decide to use SPECIAL_UUID, convert it to a pid, and then access /proc/<pid>
but that pid may have been reused anyway
there's a setting on Linux, pid_max, that can be tweaked
the best option would be to stop reusing pids entirely, but I'm not sure the kernel can be configured to avoid that
maybe there's something we can do, but I have to test some stuff first
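The Linux setting referred to here is the kernel's pid_max, exposed as /proc/sys/kernel/pid_max. A minimal sketch of reading it (Linux only; raising it requires root, via sysctl):

```python
# pid_max is the value at which Linux pids wrap around and reuse begins;
# raising it widens the window before a pid can be recycled.
with open("/proc/sys/kernel/pid_max") as f:
    pid_max = int(f.read())

print(pid_max)
```

On many modern distributions this defaults to 32768 or 4194304.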

seph

06/22/2020, 10:07 PM
AFAIK the underlying OS APIs use the pid for correlation, so I don’t see what else there is to do

Mike Myers

06/22/2020, 10:13 PM
I somewhat recall PID-reuse behavior being OS-specific too

vaar

06/24/2020, 1:41 PM
yeah, it is not easy to solve in osquery. Some EDRs build an internal mapping from (pid, process start time) to a unique identifier, so each process instance stays unique even when its pid is reused. In osquery it would be easy to add that to the processes table, but not to the other tables with a pid field.
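A minimal sketch of the (pid, start time) identifier described here, assuming Linux's /proc/<pid>/stat layout (field 22 is the process start time in clock ticks since boot); the function names are illustrative, not osquery API:

```python
import os

def process_start_time(pid: int) -> int:
    """Return the process start time in clock ticks since boot
    (field 22 of /proc/<pid>/stat, Linux only)."""
    with open(f"/proc/{pid}/stat", "rb") as f:
        data = f.read()
    # The comm field (field 2) can contain spaces and parentheses,
    # so split after the LAST ')' before indexing the remaining fields.
    rest = data.rsplit(b")", 1)[1].split()
    return int(rest[19])  # field 22 overall = index 19 after fields 1-3 handling

def stable_process_id(pid: int) -> str:
    """A (pid, start-time) pair survives pid reuse: a recycled pid
    gets a different start time."""
    return f"{pid}.{process_start_time(pid)}"

print(stable_process_id(os.getpid()))
```

Usage: record the identifier when a process is first observed; a later lookup with the same pid but a different start time is a different process.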

alessandrogario

06/24/2020, 1:52 PM
Yeah, I actually had an idea for this the other day:
1. We add support for a secondary process ID (internally we can use pid.timestamp so that we don't have state to track)
2. Add support for using that ID in SQL
3. Update our utilities that scan /proc so that they opendir() the pid folder under /proc to pin it, then fstat() to check the timestamp, returning ENOENT if they don't match
cc @theopolis
👍 2
if it's interesting, we can open a blueprint issue so that people can weigh the pros and cons of implementing something like this
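Step 3 of the idea above might look roughly like this on Linux. This is a sketch, not osquery's implementation; it assumes the timestamps on /proc/<pid> stay fixed for a process's lifetime, and open_pinned_proc_dir is an illustrative name:

```python
import errno
import os

def open_pinned_proc_dir(pid: int, expected_start: float) -> int:
    """Open /proc/<pid> and verify it still belongs to the process
    instance we expect. The open fd stays bound to that instance,
    so later reads through it cannot race with pid reuse.
    expected_start is the st_mtime recorded when the process was
    first observed."""
    fd = os.open(f"/proc/{pid}", os.O_RDONLY | os.O_DIRECTORY)
    st = os.fstat(fd)
    if st.st_mtime != expected_start:
        os.close(fd)
        # Same pid, different process: report it as gone.
        raise OSError(errno.ENOENT, "pid was reused")
    return fd

# Usage: record the timestamp at first sight, re-check on every scan.
pid = os.getpid()
first_seen = os.stat(f"/proc/{pid}").st_mtime
fd = open_pinned_proc_dir(pid, first_seen)
os.close(fd)
```

The design point is that the check and the subsequent /proc reads go through the same pinned fd, rather than re-resolving the pid by path each time.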

seph

06/24/2020, 5:40 PM
Is the intent that if you join between two tables, something implicit (like the pid timestamp) will prevent joins from breaking? That seems clever, though I wonder what problem we’re solving.
Is it joins within a short time interval? Or archival data in some SIEM?

alessandrogario

06/24/2020, 5:43 PM
How short the interval is depends on pid_max, osquery event expiration, how often events are generated, and how often scheduled queries are hitting those tables
but yeah a blueprint could give us more feedback from users
it doesn't seem like archival is required now (unless I'm wrong)