Logs taking longer to load

tacLog · October 23, 2019, 10:06pm

Hey,

I just wanted to report a small issue that has been cropping up for me recently.

When I go to troubleshoot a location I normally will open up a bunch of devices from that location in a bunch of different tabs. I do this so I can audit the logs and run terminal commands quickly on batches of devices.

However, on about every 1 in 2 devices I am getting “No logs yet” in the logs pannel.

This message will often resolve within 5-10 minutes, and sometimes faster, but it still slows me down and even worse is highly inconsistent between refreshes and different devices.

I also suspected that the log feed was also failing on me for a certain amount of time yesterday.

Thanks and I hope the summit is going well!

-Thomas

hedss · October 24, 2019, 10:48am

Hello there,

Thanks for your report. We’ve actually seen this internally as well, and we’re currently working on investigating and resolving this. We’ll let you know as soon as we have more information for you!

Best regards,

Heds

tacLog · October 24, 2019, 5:33pm

Thanks @hedss

It never seems to cripple me, they always catch up eventually.

Best of luck to you!
-Thomas

hedss · October 25, 2019, 12:38pm

Hi Thomas,

This issue has been resolved, and you should now see log speeds greatly improved. Please let us know if this is not the case!

Best regards,

Heds

tacLog · October 25, 2019, 8:41pm

@hedss

Log speeds are beyond 10x improved. A report I whipped the other day up to read all my devices logs and report a basic status used to take around 4-5 hours to run. It just ran in just over 2 minutes. Fantastic!

Great job as always!
-Thomas

tacLog · November 5, 2019, 1:39am

Hey @hedss

The issues is prevalent again.

Is it possible to get this added to the status page?

-Thomas

thundron · November 5, 2019, 11:03am

Hi, can you share with us support access to a device with slow logs? We’d like to investigate a bit more if possible, as we didn’t find anything suspicious on our end

tacLog · November 6, 2019, 12:37am

Hey @thundron:

Finding one device to share with you is hard because the problem normally resolves itself and then doesn’t come back (maybe I am warming up the cache?)

I haven’t had time today to confirm that this list is constant between runs, but I added code to time and record the time it takes to pull logs.

But here are the access times per device:
https://pastebin.com/gYfaShYC

There is a long tail of good access times, but many that took much longer than the rest.

The code snippet used to pull logs:

def get_logs(device, trys=3):
    logs = None
    if trys <= 0:
        return  logs
    try:
        logs = balena.logs.history(device['uuid'], 200)
    except requests.exceptions.ReadTimeout:
        sleep_time = math.pow(2, 3-trys + 2)
        time.sleep(sleep_time)
        logs = get_logs(device, trys - 1)

    return logs

As you can see there is a retry, system built-in, of 4, 8, and 16 seconds.

The use case for this is reading all the logs every day and extracting metadata. It isn’t a bit deal but slows things down a few hours.

-Thomas

sradevski · November 6, 2019, 10:41am

Hey @tacLog,

Thanks for all the info, we will investigate this further and let you know what we find. In the meantime, can you provide more details about the devices that have slow logs? Are they newly provisioned devices? Is it always the same devices that are slow, or it varies randomly? Any additional info would help a lot.

Thanks!

tacLog · November 7, 2019, 2:12am

@sradevski

I wish I could provide more details.

Today when the report ran the average time it took to pull logs as was just alot lower today:
Here is the updated list

I think we might just be measuring load at any given time because I can’t find a pattern.

I will update, tomorrow, when I have time to do a more in-depth analysis.

_Page · November 7, 2019, 1:03pm

Hey, just to check the pastebin logs are numbered by latency right, not by order of querying? Just want to make sure it’s not a pattern where it gets slower after you’ve run a load of queries. Apart from that we’re planning to roll out some changes that should improve load generally but I also want to track down if there’s any remaining performance issues in the implementation itself, so knowing if there’s a way to trigger the slowness would be huge for me

tacLog · November 7, 2019, 10:09pm

Hey @_Page

Yes I sorted them. Today, I will output two lists, one in the query order one sorted.

I will look for something reproducible as I collect more information.

-Thomas

tacLog · November 14, 2019, 12:46am

Hey @_Page

I wish I followed up on this sooner.

No matter what I do, I can no longer reproduce this issue. All logs now load in sub-second times.

If you rolled something out, great job! If you didn’t, I am sorry, this will be a hard one to trace down.

Ether way,
Nice job!

-Thomas

sradevski · November 14, 2019, 8:03am

Hey Thomas, it’s good to hear that this is not an issue anymore. We scaled up our redis instances, which is probably why you cannot reproduce it anymore. Do let us know if the issue appears again and you can reliably reproduce it.

Have a nice day!

Topic		Replies	Views
Logs not showing correctly Product support support	3	299	November 30, 2021
Logs not updating in dashboard (restart needed) Product support	4	823	May 13, 2018
No logs on balena dashboard balenaOS	7	468	July 18, 2022
Reading device logs from openbalena-api openBalena	2	304	May 9, 2023
Duplicate entries in `balena logs` after boot. Product support	25	1339	November 3, 2020

Logs taking longer to load

Related topics