Task #3529

Make a small bibisection tool

Added by Eyal Rozenberg 5 months ago. Updated 5 months ago.

Status: New
Priority: Low
Assignee: -
Category: QA
Target version: -
Start date:
Due date:
% Done: 0%
Tags:
URL:

Description

bibisecting works by examining one LO build at a time. And yet, the bibisect tool (on Linux at least) requires downloading a 9 GB git bundle.

Perhaps if you're performing a lot of bibisecting, that makes sense; frankly - I'm skeptical, because usually you care about recent builds which break things. Regardless, the typical triaging volunteer who might want to bibisect something is bibisecting very rarely. Such a person will balk at the huge download and just give up.

So, I'd like to ask that a small bibisecting mechanism be created. Maybe just a smaller git bundle, maybe more than that.

Current git bundle description:

https://bibisect.libreoffice.org/linux-64-6.1

#1

Updated by Beluga Beluga 5 months ago

Eyal Rozenberg wrote:

bibisecting works by examining one LO build at a time. And yet, the bibisect tool (on Linux at least) requires downloading a 9 GB git bundle.

9 GB is not much, if you consider it might contain 7000 binaries. You can't really work around downloading big binaries, if investigating locally. The only other way would be remotely connecting to some workstation hosting all the repositories.

Is your proposal in practice that we should split the repositories up, offering multiple repositories for each version? I don't see how that would solve the problem as you need to first roughly determine where the bug appeared and would end up downloading gigabytes of data anyway.

I disagree that the typical triaging volunteer is bibisecting very rarely. If you are not familiar with git and get introduced to the whole concept through bisecting, you need a lot of practice to get comfortable. I am guiding new triagers into the process quite early on in their career. I recommend the new contributors focus on the newer versions as they naturally have the most undiscovered regressions.

#2

Updated by Michael Meeks 5 months ago

Beluga Beluga wrote in #note-1:

Eyal Rozenberg wrote:

bibisecting works by examining one LO build at a time. And yet, the bibisect tool (on Linux at least) requires downloading a 9 GB git bundle.

So - 9 GB is clearly too large for most people most of the time; I think it's a fair comment. Of course, given the huge power that it gives us, it is a price worth paying, but ... I think we should consider how we can make it easier.

Here are some crazy ideas:

  • serve the git log as a file somehow - and write a simple perl^Wpython bisection tool to operate on that to find the right git hash to test next. Yes, branches are confusing, and it's not a trivial thing to do - but we could do a quick & dirty first cut here, I think, to get much closer.
  • shallow clone: git clone [remote-url] --branch [name] --single-branch [folder]
    if we can do this for an individual git hash - we can significantly reduce the bandwidth. I wonder if that means we need a branch-per-commit - I would hope not - but might be an option.
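
A minimal sketch of what the "quick & dirty first cut" of the first idea might look like, under stated assumptions: the served log is a plain text file with one commit hash per line, newest first (for example the output of git log --format=%H), and the history is treated as strictly linear, so branches are ignored. The names load_commits, first_bad_commit and is_bad are hypothetical and not part of any existing bibisect tooling; is_bad stands in for whatever step fetches and tries out a single build.

```python
from typing import Callable, List, Sequence


def load_commits(log_text: str) -> List[str]:
    """Parse a log with one hash per line (newest first) into an oldest-first list."""
    hashes = [line.strip() for line in log_text.splitlines() if line.strip()]
    hashes.reverse()
    return hashes


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search a linear, oldest-first history for the first bad commit.

    Assumes commits[0] is known good and commits[-1] is known bad, and that
    every commit after the first bad one is also bad. Only the commits
    between those two endpoints are passed to is_bad().
    """
    if len(commits) < 2:
        raise ValueError("need at least one known-good and one known-bad commit")
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # the regression is at mid or earlier
        else:
            good = mid  # the regression is after mid
    return commits[bad]
```

With a log of about 7000 entries, this asks for at most 13 builds, so a client built this way would only ever need to fetch a handful of individual builds rather than the whole bundle.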

or:

  • server-side bisection tool
  • we could effectively give people a terminal onto a server hosted repo - and allow them to rsync down an exact version. Possibly with a bit of PHP and some file-stamp locking we could have a URL to download any version from by source hash.

or:

  • even easier - just provide remote / RDP / VNC access to an AWS machine that already has the repo checked out and a "just do it" environment setup ready for people to bibisect =) - presumably we have hardware budget for that.

Of course - all of that is work; but I do think it would be valuable - when I point people at bibisection I fear many of them fall at the download hurdle.

#3

Updated by Eyal Rozenberg 5 months ago

Beluga Beluga wrote in #note-1:

9 GB is not much, if you consider it might contain 7000 binaries. You can't really work around downloading big binaries, if investigating locally.

This is basically a contradiction: If the entire set is just 9 GB, that means that the delta between pairs of binaries is not that big typically.
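
As a rough back-of-the-envelope check of that argument, using only the two figures quoted in this thread (a 9 GB bundle that "might contain 7000 binaries") and assuming decimal gigabytes:

```python
# Figures quoted in the thread; both are approximate.
BUNDLE_BYTES = 9 * 10**9  # "9 GB", taken as decimal gigabytes (assumption)
BUILD_COUNT = 7000        # "might contain 7000 binaries"


def average_share_mb(total_bytes: int, builds: int) -> float:
    """Average share of the compressed bundle per build, in megabytes."""
    return total_bytes / builds / 10**6


print(f"{average_share_mb(BUNDLE_BYTES, BUILD_COUNT):.2f} MB per build on average")
```

That works out to roughly 1.3 MB per build. This is only the average share of the compressed bundle, not the size of a standalone build, but it illustrates how small the stored differences between neighbouring builds must be.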

But - that's not the point. The point is that 9 GB deters bug reporters, who might otherwise be willing to do some bibisection, from doing it. And this won't change because there may be a good excuse for the file size.

The only other way would be remotely connecting to some workstation hosting all the repositories.

That's not the only way, as Michael Meeks indicates. To start bibisecting, you don't need all of those 7000 binary versions up-front.

Is your proposal in practice that we should split the repositories up, offering multiple repositories for each version? I don't see how that would solve the problem as you need to first roughly determine where the bug appeared and would end up downloading gigabytes of data anyway.

I'm not proposing a specific solution, as this is not really my specialty. I'm indicating a need.

I disagree that the typical triaging volunteer is bibisecting very rarely.

I was basing my claim on intuition and knowledge of the (small) RTL QA community. It's extremely rare, if it ever happens, for one of us to bibisect.

If you are not familiar with git and get introduced to the whole concept through bisecting, you need a lot of practice to get comfortable.

Even people who work with git daily, and occasionally bisect their own repos, are deterred from downloading the 9 GB bundle. Like me, for example.

I am guiding new triagers into the process quite early on in their career. I recommend the new contributors focus on the newer versions as they naturally have the most undiscovered regressions.

I would assume that the majority of people who do some triage are not those who get any official guidance or training.

But, now that I'm saying this - it's not just triagers proper. If bibisecting was more accessible to bug reporters, they might be motivated enough to do it. The 9 GB bundle is a hurdle to that.

#4

Updated by Xisco Fauli Tarazona 5 months ago

Eyal Rozenberg wrote:

bibisecting works by examining one LO build at a time. And yet, the bibisect tool (on Linux at least) requires downloading a 9 GB git bundle.

Perhaps if you're performing a lot of bibisecting, that makes sense; frankly - I'm skeptical, because usually you care about recent builds which break things.

If you just care about master, just clone the master repo < https://bibisect.libreoffice.org/linux-64-7.3 >. At the moment (updated this morning to a commit from today, eac5977bfc11797eda356560a5e45c51108ef5a1, and containing 820 builds) it's less than 2 GB. After that, you just need to update it every other week; every update would be ~100 MB of data.
