• m

    MarkMurdock

    1 year ago
    I've been struggling with a launcher/osquery problem for a few days now. We have a few Server 2019 systems that this is happening on, and from what I can tell it appears that launcher can't communicate with the osquery process, but I'm definitely no expert at reading the debug information. I'll post more in replies.
  • The affected systems look very similar to me compared to the systems where things are working, but I'm assuming there has to be some difference so I'm still searching. Same OS/Build and same patch level.
  • Here's a debug file output from the following attempt to run launcher manually:
    "C:\Program Files\Kolide\Launcher-launcher\bin\launcher.exe" --autoupdate --debug --debug_log_file "launcher-debug.log" --osquery_verbose --update_channel stable --hostname <http://uscolo1fleet01.corp.dir.spartech.com:8080|uscolo1fleet01.corp.dir.spartech.com:8080> --enroll_secret_path "C:\Program Files\Kolide\Launcher-launcher\conf\secret" --root_directory "C:\Program Files\Kolide\Launcher-launcher" --osquery_flag host_identifier=hostname
  • I've tried completely removing our AV/endpoint product, just in case that was interfering, but there's no change in behavior.
  • I've tried uninstalling and wiping the Kolide program files folder, and then re-installing.
  • Also I've tried with and without autoupdate, with no change. It's almost as if some pipe communication is not happening between launcher and osqueryd? I see the osqueryd.exe and osquery-extension.exe processes running while launcher is retrying to communicate with them.
  • s

    seph

    1 year ago
    That’s a weird
    --root_directory
    — it’s not the root of the launcher install, it should be the data directory where launcher and osquery write their db to
  • The basic thing I see, is that osquery starts, but when launcher tries to start using it, it’s not there on the socket. I don’t see any osquery errors, but I wonder if its just taking a weirdly long time to start up
  • One thing to try, would be manually launching osquery. Does it start up promptly, or does it take ages.
  • I know there’s some timing sensitive startup stuff
  • m

    MarkMurdock

    1 year ago
    Thank you Seph those are good observations and I'll give those things a try and get back to you here.
  • Do you recommend that when running osqueryd manually I mimic the arguments that launcher is using, based on what I see in the log output? I did that on both systems (working launcher and non-working launcher) and I'm a little surprised to find that it does not run on either.
  • s

    seph

    1 year ago
    Honestly it's been awhile since I've done that. What's the error?
  • m

    MarkMurdock

    1 year ago
    Here's the output that I capture in a file. This looks pretty similar on both systems but I'm going to compare a little closer. The output I'm sending you here is from the system where launcher is failing.
    I0527 19:57:25.045593  5312 init.cpp:342] osquery initialized [version=4.7.0]
    I0527 19:57:25.076864  5312 system.cpp:342] Found stale process for osqueryd (4076)
    I0527 19:57:25.076864  5312 system.cpp:374] Writing osqueryd pid (8712) to C:\Program Files\Kolide\Launcher-launcher\data\osquery.pid
    I0527 19:57:25.076864  5312 extensions.cpp:438] Found autoloadable extension: C:\Program Files\Kolide\Launcher-launcher\bin\osquery-extension.exe
    I0527 19:57:25.076864  5312 dispatcher.cpp:78] Adding new service: WatcherRunner (0000014492AFD010) to thread: 4548 (0000014492AF7AC0) in process 8712
    I0527 19:57:25.076864  5312 rocksdb.cpp:132] Opening RocksDB handle: C:\Program Files\Kolide\Launcher-launcher\data\osquery.db
    I0527 19:57:25.092486  4548 watcher.cpp:659] Created and monitoring extension child (4160): C:\Program Files\Kolide\Launcher-launcher\bin\osquery-extension.exe
    [C:\Program Files\Kolide\Launcher-launcher\bin\osquery-extension.exe --verbose --socket \\.\pipe\kolide-osquery-01F6QZD64HQ6J03NNP01ZREWSK --timeout 10 --interval 3]I0527 19:57:25.352573  5312 dispatcher.cpp:78] Adding new service: ExtensionWatcher (0000014492B159E0) to thread: 7896 (0000014492B56630) in process 8712
    I0527 19:57:25.352573  5312 dispatcher.cpp:78] Adding new service: ExtensionRunnerCore (0000014492AB18F0) to thread: 2504 (0000014492B56530) in process 8712
    I0527 19:57:25.352573  2504 interface.cpp:270] Extension manager service starting: \\.\pipe\kolide-osquery-01F6QZD64HQ6J03NNP01ZREWSK
    E0527 19:57:40.675662  5312 shutdown.cpp:69] Cannot activate kolide_grpc config plugin: Unknown registry plugin: kolide_grpc
    I0527 19:57:40.675662  5312 auto_constructed_tables.cpp:97] Removing stale ATC entries
    W0527 19:57:40.675662  5312 init.cpp:589] Error reading config: Missing config plugin 
    E0527 19:57:56.008266  5312 shutdown.cpp:69] Cannot activate kolide_grpc logger plugin: Unknown registry plugin: kolide_grpc
    E0527 19:58:11.259007  5312 shutdown.cpp:69] Cannot activate kolide_grpc distributed plugin: Unknown registry plugin: kolide_grpc
    I0527 19:58:11.259007  5312 eventfactory.cpp:156] Event publisher not enabled: ntfs_event_publisher: NTFS event publisher disabled via configuration
    I0527 19:58:11.259007  5312 events.cpp:36] Skipping subscriber: powershell_events: Required publisher is disabled by configuration
    I0527 19:58:11.259007  5312 events.cpp:36] Skipping subscriber: windows_events: Required publisher is disabled by configuration
    I0527 19:58:11.259007  5260 eventfactory.cpp:391] Starting event publisher run loop: WindowsEventLogPublisher
    I0527 19:58:11.259007  5260 eventfactory.cpp:411] Event publisher WindowsEventLogPublisher run loop terminated for reason: Publisher disabled by configuration
    I0527 19:58:11.259007  5312 dispatcher.cpp:149] Thread: 5312 requesting a stop
    I0527 19:58:11.259007  5312 dispatcher.cpp:156] Service: 0000014492AFD010 has been interrupted
    I0527 19:58:11.259007  5312 dispatcher.cpp:156] Service: 0000014492B159E0 has been interrupted
    I0527 19:58:11.259007  5312 dispatcher.cpp:156] Service: 0000014492AB18F0 has been interrupted
    I0527 19:58:11.259007  5312 dispatcher.cpp:122] Thread: 5312 requesting a join
    I0527 19:58:11.259007  5312 dispatcher.cpp:140] Service thread: 0000014492B56530 has joined
    I0527 19:58:11.259007  5312 dispatcher.cpp:140] Service thread: 0000014492B56630 has joined
    I0527 19:58:11.259007  5312 dispatcher.cpp:140] Service thread: 0000014492AF7AC0 has joined
    I0527 19:58:11.259007  5312 dispatcher.cpp:144] Services and threads have been cleared
  • It appears to bomb here:
    Cannot activate kolide_grpc config plugin: Unknown registry plugin: kolide_grpc
  • s

    seph

    1 year ago
    Hrm. It may be harder to make it just work. Some of that is coming in via launcher
  • m

    MarkMurdock

    1 year ago
    That's sort of what I thought. Maybe I can get some debug going on osquery and run it through launcher.
  • s

    seph

    1 year ago
    You’re already getting that — the osquery args include
    --verbose
    .
  • m

    MarkMurdock

    1 year ago
    Ok, gotcha. Well, there's got to be some difference between these systems. It seems like these might be newer OS's that I've deployed, so I'll continue to hunt for differences.
  • s

    seph

    1 year ago
    I’d generally bet something in permissions. But I’m still surprised there’s no logging.
  • I’d also tinker with trying to start osquery from the command line
  • You don’t need it to connect to the server, so maybe drop that too
  • Just trying to see if the OS is doing something weoird
  • m

    MarkMurdock

    1 year ago
    Good ideas, thanks.
  • s

    seph

    1 year ago
    I keep wondering what’s up with
    Extension socket not available: \\\\.\\pipe\\kolide-osquery-01F6QPMW4JVZVQZAWQS94WTCEJ.30828"
    That path is, AFAIK, reasonable
    \\.\pipe\<whatever>
    is where windows keeps named pipes. And the rest is a random ulid to ensure uniqueness. So, why would that fail? It’s weird
  • (I might also be chasing the wrong idea)
  • m

    MarkMurdock

    1 year ago
    I think you may be on to something with the pipe. If I look at a working system, there are two pipes:
    \\.\\pipe\\kolide-osquery-01F6RS33G9Y895NC1HWN79ZDAD
    \\.\\pipe\\kolide-osquery-01F6RS33G9Y895NC1HWN79ZDAD.13599
    Note that one of them has the 5 digit extension. If I watch the pipe list on a non-working system right after I start the launcher service, I only see 1 pipe:
    \\.\\pipe\\kolide-osquery-01F6RTP7R619269W0ZS6ZWDVAN
    Is there any significance to that? It seems to be the pipe name with the 5 digit extension that the error message indicates launcher can't connect to, and on my non-working systems that pipe name is never created.
  • Also by the way I was able to get osqueryd to run and connect to my fleet server on both systems (working and non-working) so it does appear that osqueryd is stable on both. It really seems to be something around that pipe communication.