• z

    zhong

    7 months ago
    Hi folks, I'm looking for some help with connectivity to Fleet. I have a few RHEL hosts enrolled in Fleet. Some stay connected and show as online, but others periodically show as offline. I've tried deleting them from Fleet and reinstalling my 'fleet-osquery' package. That seems to fix the issue, but not permanently: the same hosts show as offline again a few hours later. I've connected these hosts to a domain controller, so I edited
    /etc/resolv.conf
    with the domain nameserver IP and edited
    /etc/NetworkManager/NetworkManager.conf
    so that the changes to
    /resolv.conf
    persist even after a reboot. I've also disabled
    SELINUX
    because it was not letting the hosts connect. I'm currently stumped on what could be blocking the connection or why the hosts go offline. Any help on what the issue could be?
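One thing that can be checked when a host drops offline is whether the nameserver entry is still in place. The following is a minimal sketch, not a confirmed diagnosis: the function names and the IP `10.0.0.10` are hypothetical, and it only reads a resolv.conf-style file.

```shell
#!/bin/sh
# List the nameserver IPs from a resolv.conf-style file.
# Usage: list_nameservers [FILE]   (defaults to /etc/resolv.conf)
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Succeed only if the expected IP is among the configured nameservers.
# Usage: has_nameserver EXPECTED_IP [FILE]
has_nameserver() {
    list_nameservers "${2:-/etc/resolv.conf}" | grep -qxF -- "$1"
}

# Example (10.0.0.10 is a hypothetical domain controller IP):
#   has_nameserver 10.0.0.10 || echo "nameserver entry is missing"
```

If the entry is missing at the moment a host shows offline, that would point at the file being rewritten rather than at Fleet itself.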
  • zwass

    zwass

    7 months ago
    Are you able to SSH onto one of the affected hosts when it goes offline? Can you
    curl
    the Fleet server at that time?
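A minimal sketch of that check, run from the affected host. The hostname `fleet.example.com` is a placeholder, the function names are hypothetical, and the `/healthz` path is an assumption about the server's health endpoint; substitute whatever URL the agent was enrolled with.

```shell
#!/bin/sh
# Print the HTTP status code for a URL ("000" if the connection fails).
# -s silences progress, -o discards the body, -w prints the status code,
# --max-time bounds how long a hung connection can block the check.
http_status() {
    curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1"
}

# Turn a status code into a one-word verdict.
classify_status() {
    case "$1" in
        2??|3??) echo "reachable" ;;
        000)     echo "unreachable" ;;
        *)       echo "error" ;;
    esac
}

# Example (placeholder hostname, assumed endpoint):
#   classify_status "$(http_status https://fleet.example.com/healthz)"
```

Looking at the status code rather than the page body makes it easier to tell a DNS or TLS failure (no response at all) apart from a server-side error.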
  • z

    zhong

    7 months ago
    I can still SSH into them, and when I
    curl
     the Fleet server, it returns HTML.
  • zwass

    zwass

    7 months ago
    Anything in
    /var/log/osquery
    ?
  • Lucas Rodriguez

    Lucas Rodriguez

    7 months ago
    Also try checking the syslog messages on the host (
    /var/log/messages
    , IIRC).
  • z

    zhong

    7 months ago
    /var/log/osquery
    is empty, and I do not see
    messages
    in
    /var/log
  • zwass

    zwass

    7 months ago
    systemctl status orbit.service
    ?
  • z

    zhong

    7 months ago
    It shows that it is active and running.
  • zwass

    zwass

    7 months ago
    Can you edit
    /usr/lib/systemd/system/orbit.service
    to add
    --debug
    to the
    orbit
    command and then reload+restart the service?
  • z

    zhong

    7 months ago
    Sorry, I'm still a bit new to Fleet/osquery. Where in
    orbit.service
    would I add
    --debug
    ?
  • zwass

    zwass

    7 months ago
    Can you show the contents of that file?
  • (I don't have a Linux box up at the moment to reference easily)
  • z

    zhong

    7 months ago
    Here are the contents. I appreciate all the help so far 😄
  • zwass

    zwass

    7 months ago
    Add
    --debug
    at the end of the
    ExecStart
    line please.
  • Then
    sudo systemctl daemon-reload && sudo systemctl restart orbit.service
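The edit described above can be sketched as a small helper. This is only an illustration: the unit file's contents are not shown in this thread, so the function name `add_debug_flag` is hypothetical and it assumes a single-line `ExecStart=` entry.

```shell
#!/bin/sh
# Append --debug to the ExecStart line of a unit file, unless already there.
# Usage: add_debug_flag UNIT_FILE
add_debug_flag() {
    unit_file=$1
    # Skip the edit if ExecStart already carries the flag (idempotent).
    if grep -Eq '^ExecStart=.*--debug' "$unit_file"; then
        return 0
    fi
    # Strip trailing whitespace on the ExecStart line, append the flag,
    # then write the result back in place so ownership and mode are kept.
    sed '/^ExecStart=/ s/[[:space:]]*$/ --debug/' "$unit_file" > "$unit_file.tmp" &&
        cat "$unit_file.tmp" > "$unit_file" &&
        rm -f "$unit_file.tmp"
}
```

Run as root against `/usr/lib/systemd/system/orbit.service`, this would be followed by the same `daemon-reload` and `restart` commands given above. Keeping a backup copy of the unit file first is a sensible precaution.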
  • z

    zhong

    7 months ago
    Doing that brought the host back online.
  • Mind if I ask what adding
    --debug
    did for orbit?
  • zwass

    zwass

    7 months ago
    Can you check that the logs are more verbose in
    systemctl status orbit.service
    ?
  • It just turned on more verbose logging. It was probably restarting the orbit/osquery process that brought it back online.
  • If we have more verbose logs now, hopefully we can determine what the issue is when it goes offline again.
  • I suspect
    systemctl restart orbit.service
    without any other changes would have temporarily "fixed" it because that would have restarted the processes.
  • z

    zhong

    7 months ago
    Ah, I see. Will those logs be in
    /var/log/orbit
    ?
  • zwass

    zwass

    7 months ago
    Yes, I think so.
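The log location is not confirmed in this thread. Because orbit runs as a systemd service, its output should normally also be readable through `journalctl`, so a rough sketch for narrowing the verbose logs down to likely connection problems could look like this. The function name and the keyword list are guesses, not Orbit's actual message format.

```shell
#!/bin/sh
# Keep only log lines that look like connection problems.
# Reads log text on stdin and matches case-insensitively.
# "|| true" keeps the exit status at 0 when nothing matches.
filter_connection_errors() {
    grep -Ei 'error|fail|timeout|refused|unreachable' || true
}

# Example: scan the last two hours of the service's journal.
#   journalctl -u orbit.service --since "2 hours ago" --no-pager \
#       | filter_connection_errors
```

Running this right after a host shows as offline keeps the time window small enough to read through by hand.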
  • z

    zhong

    7 months ago
    Awesome, I will keep an eye on the host, see when it goes offline again, and update here. Thank you for the help!
  • zwass

    zwass

    7 months ago
    Thank you!
  • z

    zhong

    7 months ago
    None of the hosts that had the logging turned on have gone offline since yesterday 🤦‍♂️. Still waiting on them! Anything else I could check that could be the issue in the meantime?
  • zwass

    zwass

    7 months ago
    I am not sure what else we could check while they are working as expected. Let us know if they go bad again.
  • z

    zhong

    7 months ago
    Will do!