[Bug]: Crawl workflow statistics are a bit off from expected #3011

@tw4l

Browsertrix Version

v1.9.1 (and other recent versions)

What did you expect to happen? What happened instead?

I expect crawl workflow statistics to be accurate and self-explanatory.

In reality, every crawl that reaches an end state increments crawlSuccessfulCount and adds its status.filesAddedSize to the workflow's total size, even if the crawl was canceled or failed. As a result, workflow statistics are misleading or confusing.
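A minimal sketch of the expected behavior, assuming a simple in-memory model: only crawls in a successful end state bump the success counter and contribute to total size. The names WorkflowStats, on_crawl_finished and SUCCESSFUL_STATES are hypothetical and do not come from the Browsertrix codebase. The set of states that should count as "successful" is also an assumption.

```python
from dataclasses import dataclass

# Hypothetical set of end states that count as "successful".
# The real Browsertrix state names and which ones qualify may differ.
SUCCESSFUL_STATES = frozenset({"complete"})


@dataclass
class WorkflowStats:
    """Hypothetical per-workflow aggregate counters."""
    crawl_count: int = 0
    crawl_successful_count: int = 0
    total_size: int = 0


def on_crawl_finished(stats: WorkflowStats, state: str, files_added_size: int) -> None:
    """Update workflow stats once a crawl reaches an end state."""
    # Every finished crawl counts toward the total number of crawls.
    stats.crawl_count += 1
    # Only successful crawls bump the success counter and add to total size;
    # canceled or failed crawls leave both untouched.
    if state in SUCCESSFUL_STATES:
        stats.crawl_successful_count += 1
        stats.total_size += files_added_size
```

With this gating, finishing one "complete" crawl of 1000 bytes and one "canceled" crawl of 500 bytes would leave crawl_count at 2, crawl_successful_count at 1 and total_size at 1000.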

In at least one recent case, we've also seen a workflow end up with a crawl count of -1 after a crawl was deleted. This happened to a user during a workshop, so we don't have clear reproduction steps.
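The root cause of the -1 count isn't confirmed. One possible safeguard is to recompute a workflow's stats from the crawls that still exist after a deletion, rather than decrementing counters incrementally. A count derived this way cannot go negative or drift. The sketch below assumes a hypothetical CrawlRecord shape and the same hypothetical "successful" state set as above. It is an illustration, not how Browsertrix currently stores or updates these fields.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical set of end states that count as "successful".
SUCCESSFUL_STATES = frozenset({"complete"})


@dataclass(frozen=True)
class CrawlRecord:
    """Hypothetical minimal view of a stored crawl."""
    state: str
    files_added_size: int


@dataclass
class WorkflowStats:
    """Hypothetical per-workflow aggregate counters."""
    crawl_count: int = 0
    crawl_successful_count: int = 0
    total_size: int = 0


def recompute_stats(crawls: Iterable[CrawlRecord]) -> WorkflowStats:
    """Derive workflow stats from scratch using only the crawls that remain."""
    stats = WorkflowStats()
    for crawl in crawls:
        stats.crawl_count += 1
        if crawl.state in SUCCESSFUL_STATES:
            stats.crawl_successful_count += 1
            stats.total_size += crawl.files_added_size
    return stats
```

Because every value is a count or sum over existing records, deleting the last crawl yields zeros rather than a negative number. The trade-off is an extra read of the workflow's crawls on each deletion, compared with a single counter update.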

Reproduction instructions

  1. Create a new workflow
  2. Run a crawl, pause it after a short while, then cancel it
  3. Observe that the workflow's size and crawlSuccessfulCount have increased, even though the crawl was canceled

Screenshots / Video

No response

Environment

No response

Additional details

Commit 217e935 contains a partial fix. We decided it was out of scope for the PR it was originally part of, so I moved it to a separate branch to preserve it.

Metadata

Labels

bug (Something isn't working)

Status

Implementing

Relationships

None yet

Development

No branches or pull requests

Issue actions