  • theopolis
    1 year ago
    Hi @Julian Scala, I sort of understand the behavior you are asking for, but I want to clarify. What do you want the agents to do exactly in the event a logger plugin is not responding?
  • Julian Scala
    1 year ago
    Thanks for the response! I want them to do nothing haha, specifically to NOT store/cache results until the logger plugin is back online and consuming again.
    I mean, discard results from scheduled queries until the event logger plugin responds, I guess.
  • theopolis
    1 year ago
    I see. I hope you don’t mind me saying that this is quite the opposite of what people normally want/expect.
  • Julian Scala
    1 year ago
    Haha, no worries, but just to give a little bit of context: we don’t want the devices to store information that can’t be sent (unless there is a limit that can be set). At the same time, we don’t want our backend services to get swamped with records once everything comes back online. 😄
  • theopolis
    1 year ago
    You may be able to accomplish this today with the buffered logger options. The buffered logger is sort of the “base class” for multiple remote logging plugins.
  • Julian Scala
    1 year ago
    Is the --buffered_log_max=10 flag used for this? I understand that 10 logs is the maximum count the device/agent can hold, and if more logs are buffered the older ones will be removed. Is that the way it works? We have this set to 0, meaning that there is no limit on this buffer and everything will be kept. Please correct me if I am wrong.
  • zwass
    1 year ago
    Yes. IIRC that buffer is only cleared after a successful or failed log attempt so setting it to something like 1 might effectively do what you want?
  • Julian Scala
    1 year ago
    I think it is! This is amazing, thanks a lot for your help!🚀
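
As a rough illustration of the setting discussed above, an osquery flagfile for a Kinesis-backed logger that keeps almost nothing buffered on the device might look like the sketch below. The stream name and interval are placeholders, and the exact eviction behavior of a very small --buffered_log_max should be verified against the osquery version in use.

    # Sketch of a flagfile; values below are placeholders.
    --logger_plugin=aws_kinesis
    # name of the Kinesis data stream to write results to
    --aws_kinesis_stream=osquery-results
    # flush buffered results to Kinesis every 10 seconds
    --aws_kinesis_period=10
    # keep at most 1 buffered result on the device (0 = unlimited)
    --buffered_log_max=1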
  • seph
    1 year ago
    Can you say more about what leads you to this use case? I’m not sure I’ve encountered it before, so I’d love to hear about what’s behind it
  • Julian Scala
    1 year ago
    Yes! We output the logs of a huge device fleet to an AWS Kinesis data stream. This past week there was an outage on AWS Kinesis, causing the streams not to respond or receive any records. Every device we manage stored every result for more than 24 hours (we have a lot of snapshot queries on short intervals). As you can imagine, by the time Kinesis was up again, every device sort of ‘puked’ every record it had. Thankfully we have a really good backend service processing those results, but it got really smashed. Not to mention devices losing disk space by accumulating every result log.
    We want to avoid this kind of situation again and just discard the data.
  • seph
    1 year ago
    Okay! I kind of get that, but I’d probably come at it a bit differently… This is a common problem in modern microservices. One method is to have some kind of rate limiting or a circuit breaker to avoid a large ingest melting things. “Backpressure” is another approach. Though I’d generally expect Kinesis to be able to handle anything you throw at it.
    If you’re willing to just toss aside this data, how much value does collecting it have?
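
(Purely as an illustration of the rate-limiting idea mentioned above, and not something osquery itself provides: a consumer-side token bucket in front of the record-processing step might look roughly like the sketch below; kinesis_records and process_record are hypothetical placeholders.)

    import time

    class TokenBucket:
        # Allow roughly `rate` records per second, with bursts up to `capacity`.
        def __init__(self, rate, capacity):
            self.rate = rate
            self.capacity = capacity
            self.tokens = capacity
            self.last = time.monotonic()

        def acquire(self):
            # Refill tokens based on elapsed time, then block until one is available.
            while True:
                now = time.monotonic()
                self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                time.sleep((1 - self.tokens) / self.rate)

    # Usage sketch: drain a post-outage flood at a bounded rate instead of
    # letting it hammer downstream services.
    # bucket = TokenBucket(rate=500, capacity=1000)
    # for record in kinesis_records():      # hypothetical record source
    #     bucket.acquire()
    #     process_record(record)            # hypothetical processing step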
  • Julian Scala
    1 year ago
    I never thought Kinesis could ever be down, but it happened. Things didn’t melt, it just cost more money for a while. We don’t collect the data; we just process records in order to have the current state of devices. We toss aside past data since we don’t really care about historical changes, just the current state.
  • seph
    1 year ago
    Ah. That makes sense. If you’re just using it as a “current state” sort of thing, there’s little value in the past
  • zwass
    1 year ago
    @Julian Scala I wonder if it might be as/more effective to run a TLS server that issues the same queries as live queries. Sounds like you're not using the main benefits of scheduled queries (differentials and offline results).
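
(For context on this suggestion: live queries in osquery are driven by the distributed plugin rather than the query schedule, so a rough flagfile sketch pointing agents at a TLS server could look like the following; the hostname and endpoint paths are placeholders, not a prescribed layout.)

    # Sketch: have osqueryd poll a TLS server for "live" (distributed) queries.
    --disable_distributed=false
    --distributed_plugin=tls
    # placeholder server and endpoint paths
    --tls_hostname=osquery.example.com
    --distributed_tls_read_endpoint=/api/distributed/read
    --distributed_tls_write_endpoint=/api/distributed/write
    # how often (in seconds) agents check in for new live queries
    --distributed_interval=60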