• d

    Dawei Zhang

    4 months ago
    Hi, I have an issue regarding Fleet running behind a load balancer.
  • zwass

    zwass

    4 months ago
    Ah, this looks like an issue with your load balancer not supporting websockets.
  • d

    Dawei Zhang

    4 months ago
    When we run a query in the Fleet UI, it occasionally runs forever. This happens about once in 7 or 8 tries.
  • we enabled wss, and wss works
  • for example, this time wss works.
  • that issue only happens once in 10 times or so
  • Kathy Satterlee

    Kathy Satterlee

    4 months ago
    Hi @Dawei Zhang! I'd like to get a little more info about your environment so that we can hopefully dig into the root of the issue. Let's start with: 1. How is your Fleet server deployed? 2. What version of Fleet are you running? 3. Is there anything the hanging queries have in common, or does it seem random? 4. Are you seeing any errors in the Fleet logs?
  • Tomas Touceda

    Tomas Touceda

    4 months ago
    another important piece of the puzzle is Redis here, we've found some issues depending on configuration. See here for instance: https://fleetdm.com/docs/deploying/faq#im-only-getting-partial-results-from-live-queries
  • d

    Dawei Zhang

    4 months ago
    @Kathy Satterlee Thanks for looking into it. 1. We deploy two Fleet instances behind an Nginx load balancer. 2. Fleet version: fleet_v4.12.0_linux 3. Random queries. 4. I do see some errors in the logs:
    {
      "component": "http",
      "err": "timestamp: 2022-03-31T18:30:26Z: error in query ingestion",
      "ingestion-err": "campaign waiting for listener (please retry)",
      "ip_addr": "10.124.121.115",
      "level": "error",
      "method": "POST",
      "took": "6.469503ms",
      "ts": "2022-03-31T18:30:26.444353614Z",
      "uri": "/api/v1/osquery/distributed/write",
      "x_for_ip_addr": "10.124.121.115"
    }
    let me know if you need more info
  • @Tomas Touceda Thank you for the info, let me try to update Redis config
  • a

    Artem

    3 months ago
    @Dawei Zhang hi! Have you solved your problem? We are seeing the same error now and are trying to fix it.
  • zwass

    zwass

    3 months ago
    If you are seeing the xhr_send request, this likely means your load balancer (or something in the network) is blocking websockets.
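One way to test the "something is blocking websockets" theory outside the browser is to send a bare WebSocket handshake through the load balancer and see whether it answers with `101 Switching Protocols`. The sketch below uses only the Python standard library. The host and path are placeholders you would copy from the `wss://` URL shown in the browser's network tab; whether Fleet accepts an unauthenticated handshake on that path is an assumption, not something confirmed in this thread.

```python
import base64
import hashlib
import http.client
import os

# Fixed GUID defined by RFC 6455; the server appends it to the client key
# before hashing to produce the Sec-WebSocket-Accept header.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(key: str) -> str:
    """Return the Sec-WebSocket-Accept value a compliant server sends for `key`."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def check_upgrade(host: str, path: str, port: int = 443, timeout: float = 10.0) -> bool:
    """Send a WebSocket handshake over HTTPS and report whether it was upgraded.

    `host` and `path` are hypothetical inputs supplied by the caller.
    """
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    conn = http.client.HTTPSConnection(host, port, timeout=timeout)
    try:
        conn.request(
            "GET",
            path,
            headers={
                "Connection": "Upgrade",
                "Upgrade": "websocket",
                "Sec-WebSocket-Version": "13",
                "Sec-WebSocket-Key": key,
            },
        )
        resp = conn.getresponse()
        # A proxy that strips the Upgrade headers typically yields a non-101
        # status (for example 200, 400 or 404) instead of an upgrade.
        return (
            resp.status == 101
            and resp.getheader("Sec-WebSocket-Accept") == expected_accept(key)
        )
    finally:
        conn.close()
```

Running `check_upgrade` once against the load balancer and once directly against a Fleet instance, several times each, may help show whether the intermittent failures come from the proxy layer or from somewhere behind it.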
  • d

    Dawei Zhang

    3 months ago
    It's not fixed yet. We only have one instance running now
  • a

    Artem

    3 months ago
    We have configured the load balancer (nginx) correctly, so we don’t see any problems with websockets. BTW, I have some problems with the software inventory (and, as a result, with the vulnerability management module) that produce similar errors:
    Apr 28 18:55:58 fleet-01.test.tech fleet[3040986]: {"component":"http","err":"timestamp: 2022-04-28T18:55:58Z: error in query ingestion","ingestion-err":"ingesting query software_linux: update host software: insert software: timestamp: 2022-04-28T18:55:58Z: Error 1213: Deadlock found when trying to get lock; try restarting transaction","ip_addr":"172.12.13.14","level":"error","method":"POST","took":"6.156863664s","ts":"2022-04-28T18:55:58.53477351Z","uri":"/api/v1/osquery/distributed/write","x_for_ip_addr":"172.12.13.14"}
    
    Apr 28 18:55:58 fleet-01.test.tech fleet[3040986]: {"component":"http","err":"timestamp: 2022-04-28T18:55:54Z: error in query ingestion || create transaction: timestamp: 2022-04-28T18:55:58Z: context canceled || save host with id 27: timestamp: 2022-04-28T18:55:58Z: context canceled","ingestion-err":"ingesting query software_linux: update host software: insert software: timestamp: 2022-04-28T18:55:54Z: context canceled","ip_addr":"172.12.13.15","level":"error","method":"POST","took":"19.774983596s","ts":"2022-04-28T18:55:58.898478856Z","uri":"/api/v1/osquery/distributed/write","x_for_ip_addr":"172.12.13.15"}
    Ad-hoc and scheduled queries work fine. We also know that this is not a load balancer problem (connecting osquery directly to Fleet shows the same problem). So now we are trying to locate the cause somewhere between Redis and MySQL.
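The log lines pasted in this thread show at least three different failure messages behind the same "error in query ingestion" wrapper. A small script can separate them so it is clearer which problem dominates. This is a minimal sketch that assumes one JSON object per line, optionally preceded by a syslog prefix as in the lines above. The category names are made up for illustration; the matched substrings are taken from the logs in this thread.

```python
import json
from collections import Counter


def parse_fleet_log_line(line: str):
    """Return the JSON object embedded in a log line, or None if there is none."""
    start = line.find("{")  # skip any syslog prefix before the JSON payload
    if start == -1:
        return None
    try:
        return json.loads(line[start:])
    except json.JSONDecodeError:
        return None


def classify(entry: dict) -> str:
    """Map an error entry to a coarse category (category names are hypothetical)."""
    text = f"{entry.get('err', '')} {entry.get('ingestion-err', '')}"
    if "Deadlock found" in text:
        return "mysql-deadlock"
    if "context canceled" in text:
        return "context-canceled"
    if "campaign waiting for listener" in text:
        return "live-query-no-listener"
    return "other"


def summarize(lines) -> Counter:
    """Count error-level entries per category across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        entry = parse_fleet_log_line(line)
        if entry and entry.get("level") == "error":
            counts[classify(entry)] += 1
    return counts
```

Feeding it the journal output (for example `summarize(open("fleet.log"))`, where `fleet.log` is a placeholder file name) gives a per-category count that can be compared before and after a Redis or MySQL configuration change.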
  • Tomas Touceda

    Tomas Touceda

    3 months ago
    hi @Artem what version of fleet are you running?
  • a

    Artem

    3 months ago
    Some of our clients have been connected for several days, but we still don’t see software data. P.S. If I run the queries from https://github.com/fleetdm/fleet/blob/main/server/service/osquery_utils/queries.go#L391 in interactive mode, they work fine.
  • Hi @Tomas Touceda! Fleet 4.13.0 • Go go1.17.8
  • I think we need to dive into our Redis and MySQL configs (because they were set up by different teams). It would be great if you could give any advice about the right places to check. I checked the Redis logs and don’t see any errors.
  • Tomas Touceda

    Tomas Touceda

    3 months ago
    that deadlock is in mysql, could you check the host details for host id 27 and see if it shows software there?
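For context on the MySQL message itself: error 1213 means InnoDB picked this transaction as the deadlock victim and rolled it back, and the message's own advice is to run the transaction again. The sketch below illustrates that generic retry pattern. It is not Fleet's implementation; `DBError` and `run_with_deadlock_retry` are hypothetical names standing in for whatever exception type and helper a real database driver or application would use.

```python
import random
import time

DEADLOCK_ERRNO = 1213  # MySQL: "Deadlock found when trying to get lock"


class DBError(Exception):
    """Hypothetical stand-in for a driver exception that carries a MySQL errno."""

    def __init__(self, errno: int, message: str):
        super().__init__(message)
        self.errno = errno


def run_with_deadlock_retry(txn, attempts: int = 3, base_delay: float = 0.05,
                            sleep=time.sleep):
    """Call `txn()` and retry it only when it fails with a deadlock error.

    Uses exponential backoff with a little jitter so that the two competing
    transactions are less likely to collide again on the retry.
    """
    for attempt in range(1, attempts + 1):
        try:
            return txn()
        except DBError as exc:
            # Re-raise anything that is not a deadlock, or the final failure.
            if exc.errno != DEADLOCK_ERRNO or attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay))
```

A retry only hides occasional deadlocks. If the same insert deadlocks on every check-in, as the missing software data for host 27 might suggest, the database side still needs to be inspected.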
  • a

    Artem

    3 months ago
    No, there is no software in /hosts/27
  • Currently I don’t have direct access to the MySQL server, so I cannot see its logs, but I will try to get them with help from our DBA asap 🙂