-
-
Notifications
You must be signed in to change notification settings - Fork 56
fix: retry file copy on failure due to race #359
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
| )), | ||
| _ => Err(anyhow::anyhow!(e).context("Failed to copy file")), | ||
| }, | ||
| let mut delay = Duration::from_millis(FILE_COPY_INITIAL_DELAY_MS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we're mixing synchronous duration calls with asynchronous code execution here? I think it would be better to keep them under the same runtime system.
| /// } | ||
| /// ``` | ||
| async fn copy_file_with_error_handling(old_path: &Path, new_path: &Path) -> Result<()> { | ||
| match fs::copy(&old_path, &new_path).await { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tested the bug you're getting with the old code, but using tokio::fs (async file operations) to see if it's due to the crossing the runtime boundary?
Perhaps we could do this using a more correct pattern for synchronous FS operations - would also be useful elsewhere in the code-base too.
(Struct/container implementing asref/deref + handling for Drop trait impl. etc.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, that's probably a better way to fix this. Of course because this bug is tempermental, I can't reproduce it at the moment - even without either fix. I guess because we don't really care abot blocking the main thread in this tool, it would be best to standardize on synchronous FS?
I'm honestly not super well versed in rust, but I kept running into these big issues so wanted to do something about them :p
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way Rust's async runtime works differs quite a bit to other languages (Like Python or JS (Well, Node really). Anything that's blocking will effectively 'halt-the-world' until it's finished, as you're handing over the entire program runtime (which for Rust, is it's runtime and that's it) to tokio. The problem with blocking (especially in a loop like here) is every time we loop and hit a blocking operation, the program effectively 'hiccups' and halts.
I believe that tokio already wraps their struct for file operations to handle handover for their executor, though if you're up for the challenge you're more than welcome to attempt an implementation of a file operation wrapper struct.
Would like be something akin to Arc + probably an atomic to sync on, or a tokio semaphore to cause wait/gather, but you'll also need to poll the future yourself and implement Drop to handle the future completing.
Alternatives are of course handing it off via a task too
fixes #332
It appears that the check prior to the copy sometimes doesn't release the file quick enough before copying the proxy. This adds 5 retry attempts to hopefully get around this (working in my local testing!)