• a

    Artem

    6 months ago
    Hi guys, could someone please help me connect osquery agents on Windows Server 2019 to Fleet? Some details: • I use the same configs and deploy scripts on the problem servers and on basic Windows 10 VMs (which work fine); • the osquery agents enroll successfully (I see it in the logs, and the agents show up in the UI), but they are listed as "Never fetched" and are mostly "Offline"; • I checked the debug output of the osquery daemon and see only one possible issue:
    W0207 18:02:24.934172 4352 watcher.cpp:391] osqueryd worker (560) stopping: Maximum sustainable CPU utilization limit exceeded: 18
    shortly after executing
    fleet_detail_query_software_windows
    • Adding
    --disable_watchdog=false --watchdog_delay=120 --watchdog_level=0 --watchdog_memory_limit=400 --watchdog_utilization_limit=21
    did not help, and now I'm out of ideas.
  • zwass

    zwass

    6 months ago
    All of the failing hosts are Windows Server?
  • a

    Artem

    6 months ago
    Right, Microsoft Windows Server 2019 Standard 10.0 and Microsoft Windows Server 2019 Datacenter 10.0
  • Tomas Touceda

    Tomas Touceda

    6 months ago
    hi there, one thing you can try, to check whether the software inventory query is the cause, is disabling software inventory:
    ---
    apiVersion: v1
    kind: config
    spec:
      host_settings:
        enable_software_inventory: false
    if you apply that yaml, fleet will stop sending the software inventory queries to the hosts
  • could you tell me a bit more about your fleet setup: how many labels, policies, and packs/queries have you got?
  • a

    Artem

    6 months ago
    Hi Tomas, honestly, I tried to minimise the query packs for these hosts. As they are Domain Controllers, I expected too many NTFS or socket events. I attached the debug output of the osquery daemon, where you can see all applied queries. I get the same result with the "process_all" query pack disabled. My Fleet installation has about 1k hosts, only 4 labels, and no policies at all.
  • Tomas Touceda

    Tomas Touceda

    6 months ago
    we have pending work to distribute the queries better for cases like this. I do notice you're running 5.1.0, would you be able to try 4.9.0? I have seen issues with 5.x in the past (on Linux, though), but it would be good to rule out an osquery issue
  • a

    Artem

    6 months ago
    I don't have debug output for 4.9.0, but I had the same problem with it. I hoped the upgrade would fix it 😞
  • Tomas Touceda

    Tomas Touceda

    6 months ago
    ok, well that clears that doubt
  • have you tried disabling software inventory and seeing if that is the query causing the issue or if it's just a symptom of something else?
  • a

    Artem

    6 months ago
    Oh, I tried to disable software inventory as you recommended and it helped to resolve the issue! I also tried to set
    enable_host_users = false
    as that query returns about 1800 domain users, but it looks like the issue is only with
    enable_software_inventory
  • Tomas Touceda

    Tomas Touceda

    6 months ago
    that's great! so either the software inventory query pushed things over the edge, or it's taking too long. Would you be able to run this on one of the Windows hosts:
    WITH cached_users AS (SELECT * FROM users)
    SELECT
      name AS name,
      version AS version,
      'Program (Windows)' AS type,
      'programs' AS source
    FROM programs
    UNION
    SELECT
      name AS name,
      version AS version,
      'Package (Python)' AS type,
      'python_packages' AS source
    FROM python_packages
    UNION
    SELECT
      name AS name,
      version AS version,
      'Browser plugin (IE)' AS type,
      'ie_extensions' AS source
    FROM ie_extensions
    UNION
    SELECT
      name AS name,
      version AS version,
      'Browser plugin (Chrome)' AS type,
      'chrome_extensions' AS source
    FROM cached_users CROSS JOIN chrome_extensions USING (uid)
    UNION
    SELECT
      name AS name,
      version AS version,
      'Browser plugin (Firefox)' AS type,
      'firefox_addons' AS source
    FROM cached_users CROSS JOIN firefox_addons USING (uid)
    UNION
    SELECT
      name AS name,
      version AS version,
      'Package (Chocolatey)' AS type,
      'chocolatey_packages' AS source
    FROM chocolatey_packages
    UNION
    SELECT
      name AS name,
      version AS version,
      'Package (Atom)' AS type,
      'atom_packages' AS source
    FROM cached_users CROSS JOIN atom_packages USING (uid)
    UNION
    SELECT
      name AS name,
      version AS version,
      'Package (Python)' AS type,
      'python_packages' AS source
    FROM python_packages;
    to see if that's taking too long on its own?
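One way to narrow this down further is to run each per-user branch of the query on its own in `osqueryi` and see which one is slow. The sketch below isolates the Chrome extensions branch, reusing the CTE and join from the query above; it only counts rows, so it says nothing about how Fleet handles the results. The other `cached_users` branches (`firefox_addons`, `atom_packages`) can be checked by swapping the table name.

```sql
-- Isolate one per-user branch of the software inventory query.
-- Same CTE and join as in the full query; only the projection differs.
WITH cached_users AS (SELECT * FROM users)
SELECT COUNT(*) AS chrome_extension_rows
FROM cached_users CROSS JOIN chrome_extensions USING (uid);
```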
  • a

    Artem

    6 months ago
    So, the users query returns 1800 results (all domain users, excluding computers). The software inventory query uses
    SELECT * FROM users
    which returns ~3500 results (domain users + domain computers), and that increases CPU usage. I think this SELECT should be filtered the same way as in the users query.
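A quick way to see how the rows behind `SELECT * FROM users` break down on such a host is to group them by account type. This is only a sketch: it assumes the Windows-only `type` column of osquery's `users` table is populated on the domain controller, and the type values it returns there are not shown in this thread.

```sql
-- Count the accounts returned by the users table, grouped by account type.
SELECT type, COUNT(*) AS accounts
FROM users
GROUP BY type
ORDER BY accounts DESC;
```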
  • Tomas Touceda

    Tomas Touceda

    6 months ago
    ah, interesting find!
  • we are discussing internally to see how we can better approach this, btw, thank you for your patience!
  • zwass

    zwass

    6 months ago
    If you do
    select * from users
    on that DC, do all of the users have a value for the
    directory
    column?
  • a

    Artem

    6 months ago
    No, it looks like the
    directory
    field is set only for accounts with a successful local login. I attached redacted results that include rows with a populated
    directory
    ; all domain users look like the last two lines.
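To quantify this observation, the following sketch counts how many rows in `users` have a non-empty `directory`, then reruns one per-user branch with the CTE restricted to those rows. Whether such a filter is the right fix is not settled in this thread, so treat the `WHERE directory <> ''` clause as an experiment, not as Fleet's actual query.

```sql
-- How many accounts have a home directory on this host?
SELECT
  COUNT(*) AS total_users,
  SUM(CASE WHEN directory <> '' THEN 1 ELSE 0 END) AS with_directory
FROM users;

-- Experimental variant of one branch, with the CTE narrowed to
-- accounts that have a directory (assumption, not Fleet's query).
WITH cached_users AS (SELECT * FROM users WHERE directory <> '')
SELECT COUNT(*) AS chrome_extension_rows
FROM cached_users CROSS JOIN chrome_extensions USING (uid);
```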
  • Hi, did you come to any conclusion?
  • Tomas Touceda

    Tomas Touceda

    5 months ago
    hi there! this issue is still ongoing
  • g

    Gregory Storme

    5 months ago
    Also encountered this on Windows 2016 and 2019 running osqueryd 5.2.2, but only on domain controllers. No more issues after setting
    enable_software_inventory: false
  • defensivedepth

    defensivedepth

    5 months ago
    Same here on multiple Windows DCs. Disabled software_inventory & enable_host_users and things are stable