Skip to content

Conversation

@lfrancke
Copy link
Member

@lfrancke lfrancke commented Oct 6, 2025

https://issues.apache.org/jira/browse/ZOOKEEPER-4985

The zkCleanup.sh script has several issues that make it difficult to use in containerized or non-standard deployments:

The script hardcodes the path to zoo.cfg and fails with confusing error messages when the config file is not in the default location. When the config file is missing, it passes empty strings to PurgeTxnLog, causing confusing error messages

Steps to Reproduce:

  • Run ZooKeeper with config in a non-standard location (e.g., /custom/path/zoo.cfg)
  • Try to run cleanup:
grep: /path/to/zookeeper/bin/../conf/zoo.cfg: No such file or directory
grep: /path/to/zookeeper/bin/../conf/zoo.cfg: No such file or directory
Path '/path/to/zookeeper/bin' does not exist.
Usage:
PurgeTxnLog dataLogDir [snapDir] -n count 

@lfrancke
Copy link
Member Author

lfrancke commented Oct 6, 2025

The shellcheck failures were there before already.

bin/zkCleanup.sh Outdated
if [[ -z $ZOODATALOGDIR ]]; then
# Only dataDir specified
"$JAVA" "-Dzookeeper.log.dir=$ZOO_LOG_DIR" "-Dzookeeper.log.file=$ZOO_LOG_FILE" \
-cp "$CLASSPATH" "${flags[@]}" \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CLASSPATH environment variable is already exported. Passing it again as -cp $CLASSPATH is redundant, and makes really really long output in ps, and sometimes breaks the ability to use pkill or pgrep on the process, due to the length of the command-line. It is enough to export the variable. You don't need -cp.

(Same comment applies in the other cases where you've added it, as well.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I honestly can't remember why I did this. I just assume it was a remnant from debugging the issue.

@ctubbsii
Copy link
Member

The shellcheck failures were there before already.

I don't think that's true. I think it is caused by removing the shellcheck file hints. I just ran the tools/ci/run-shellcheck.sh script on the current master branch, and it succeeds. So, I think the failure was added in this PR.

lfrancke and others added 2 commits November 19, 2025 16:50
Co-authored-by: Christopher Tubbs <ctubbsii@apache.org>
@lfrancke
Copy link
Member Author

Thank you for the review. I'd like to claim it was late and I was tired but looking at the timestamp that wasn't the case. Sorry for the extra work. I believe I addressed all of your comments.

@lfrancke lfrancke requested a review from ctubbsii November 19, 2025 16:01
Comment on lines +49 to +50
ZOODATADIR="$(grep "^[[:space:]]*dataDir=" "$ZOOCFG" 2>/dev/null | sed -e 's/.*=//')"
ZOODATALOGDIR="$(grep "^[[:space:]]*dataLogDir=" "$ZOOCFG" 2>/dev/null | sed -e 's/.*=//')"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is perhaps something for another PR, since it's a bit out of scope here, and there may be other scripts that would also benefit from this change, but I noticed that sed has greedy matching here and doesn't trim the whitespace in the value. cut -f2- -d= could be used instead to avoid the greedy matching, but still doesn't trim whitespace in the extracted property value.

It can be done with bash regular expressions and a simple capture group, though, without calling another process:

  re='^[^=]*=[[:space:]]*(.*)[[:space:]]*$'
  ZOODATADIR="$(grep "^[[:space:]]*dataDir=" "$ZOOCFG" 2>/dev/null)"
  [[ $ZOODATADIR =~ $re ]] && ZOODATADIR="${BASH_REMATCH[1]}"
  ZOODATALOGDIR="$(grep "^[[:space:]]*dataLogDir=" "$ZOOCFG" 2>/dev/null)"
  [[ $ZOODATALOGDIR =~ $re ]] && ZOODATALOGDIR="${BASH_REMATCH[1]}"

ZOO_LOG_FILE=zookeeper-$USER-cleanup-$HOSTNAME.log

# shellcheck disable=SC2206
flags=($JVMFLAGS)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like "-Dzookeeper.log.dir=$ZOO_LOG_DIR" "-Dzookeeper.log.file=$ZOO_LOG_FILE" is added in all cases. So, it could just be added to flags here and removed from the subsequent calls:

Suggested change
flags=($JVMFLAGS)
flags=("-Dzookeeper.log.dir=$ZOO_LOG_DIR" "-Dzookeeper.log.file=$ZOO_LOG_FILE" $JVMFLAGS)

Comment on lines +59 to 73
if [[ -n $ZOODATADIR ]]; then
if [[ -z $ZOODATALOGDIR ]]; then
# Only dataDir specified
"$JAVA" "-Dzookeeper.log.dir=$ZOO_LOG_DIR" "-Dzookeeper.log.file=$ZOO_LOG_FILE" \
"${flags[@]}" org.apache.zookeeper.server.PurgeTxnLog "$ZOODATADIR" "$@"
else
# Both dataDir and dataLogDir specified
"$JAVA" "-Dzookeeper.log.dir=$ZOO_LOG_DIR" "-Dzookeeper.log.file=$ZOO_LOG_FILE" \
"${flags[@]}" org.apache.zookeeper.server.PurgeTxnLog "$ZOODATALOGDIR" "$ZOODATADIR" "$@"
fi
else
"$JAVA" "-Dzookeeper.log.dir=$ZOO_LOG_DIR" "-Dzookeeper.log.file=$ZOO_LOG_FILE" "${flags[@]}" \
org.apache.zookeeper.server.PurgeTxnLog "$ZOODATALOGDIR" "$ZOODATADIR" "$@"
# No config or config doesn't specify directories - pass all args to PurgeTxnLog
"$JAVA" "-Dzookeeper.log.dir=$ZOO_LOG_DIR" "-Dzookeeper.log.file=$ZOO_LOG_FILE" \
"${flags[@]}" org.apache.zookeeper.server.PurgeTxnLog "$@"
fi
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The complexity of this if/else can be entirely removed if you just build up the command-line arguments. Combining with my earlier suggestion to add the other system properties to the flags array, this simplifies to:

Suggested change
if [[ -n $ZOODATADIR ]]; then
if [[ -z $ZOODATALOGDIR ]]; then
# Only dataDir specified
"$JAVA" "-Dzookeeper.log.dir=$ZOO_LOG_DIR" "-Dzookeeper.log.file=$ZOO_LOG_FILE" \
"${flags[@]}" org.apache.zookeeper.server.PurgeTxnLog "$ZOODATADIR" "$@"
else
# Both dataDir and dataLogDir specified
"$JAVA" "-Dzookeeper.log.dir=$ZOO_LOG_DIR" "-Dzookeeper.log.file=$ZOO_LOG_FILE" \
"${flags[@]}" org.apache.zookeeper.server.PurgeTxnLog "$ZOODATALOGDIR" "$ZOODATADIR" "$@"
fi
else
"$JAVA" "-Dzookeeper.log.dir=$ZOO_LOG_DIR" "-Dzookeeper.log.file=$ZOO_LOG_FILE" "${flags[@]}" \
org.apache.zookeeper.server.PurgeTxnLog "$ZOODATALOGDIR" "$ZOODATADIR" "$@"
# No config or config doesn't specify directories - pass all args to PurgeTxnLog
"$JAVA" "-Dzookeeper.log.dir=$ZOO_LOG_DIR" "-Dzookeeper.log.file=$ZOO_LOG_FILE" \
"${flags[@]}" org.apache.zookeeper.server.PurgeTxnLog "$@"
fi
directories=()
[[ -n $ZOODATADIR ]] && directories=("$ZOODATADIR")
[[ -n $ZOODATALOGDIR ]] && directories=("$ZOODATALOGDIR" "${directories[@]}")
"$JAVA" "${flags[@]}" org.apache.zookeeper.server.PurgeTxnLog "${directories[@]}" "$@"

This also covers the case where only dataLogDir is set, which I think wasn't covered before.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants