Conversation

@GhostTeen96

Description:

This PR adds Unicode support for usernames, allowing users to create usernames with international characters and emojis. Previously, usernames were restricted to ASCII characters only, which prevented users from using proper names in languages like Chinese, Arabic, Cyrillic, and others, as well as emojis.

Changes:

  • Updated the username validation regex pattern in src/core/validations/username.ts to support:
    • Unicode letters (\p{L}) - supports all international characters
    • Unicode numbers (\p{N})
    • Emojis (\p{Emoji})
    • Existing allowed characters (spaces, underscores, square brackets)
  • Updated the error message in resources/lang/en.json to reflect the broader character support
  • All existing tests pass (23/23), including the existing Unicode test case

Examples of now-supported usernames:

  • International names: José, Müller, 李明, محمد, Игорь, Søren
  • Emojis: User🎮, Cat🐈, Café☕User
  • Still blocks special characters such as ! and @. Note that \p{Emoji} also matches #, * and the ASCII digits 0-9, so # is accepted by this pattern.
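
As a quick sanity check, the sketch below runs the pattern under review in this PR against a few of the names listed above. The helper name `isValidUsernameChars` is made up for illustration and is not part of the codebase. Two things it surfaces: `#` passes because Unicode assigns it the Emoji property (it is the base of a keycap sequence), and a name typed with a combining accent fails because combining marks are neither letters nor numbers.

```typescript
// Pattern as proposed in this PR for src/core/validations/username.ts.
const validPattern = /^[\p{L}\p{N}_[\] \p{Emoji}\u{1F300}-\u{1F9FF}]+$/u;

// Hypothetical helper, for illustration only.
function isValidUsernameChars(name: string): boolean {
  return validPattern.test(name);
}

const samples: string[] = [
  "Jos\u00E9", // José with a precomposed é
  "\u674E\u660E", // 李明
  "User\u{1F3AE}", // User🎮
  "[TAG] Player_1",
  "user!",
  "user@home",
  "user#1", // '#' carries the Emoji property
  "Jose\u0301", // José typed as e + combining acute accent
];

for (const name of samples) {
  console.log(JSON.stringify(name), isValidUsernameChars(name));
}
```

Normalizing input to NFC before validation would let the last sample pass, since NFC composes `e` plus the combining accent into the single letter `é`.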

Please complete the following:

  • I have added screenshots for all UI updates
  • I process any text displayed to the user through translateText() and I've added it to the en.json file
  • I have added relevant tests to the test directory
  • I confirm I have thoroughly tested these changes and take full responsibility for any bugs introduced

Please put your Discord username so you can be contacted if a bug or regression is found:

orion_nebula22
or just Orion Nebula

@GhostTeen96 GhostTeen96 requested a review from a team as a code owner November 28, 2025 17:59
@coderabbitai
Contributor

coderabbitai bot commented Nov 28, 2025

Walkthrough

The changes expand username validation to accept Unicode letters, Unicode numbers, and emojis alongside existing ASCII characters. The corresponding user-facing validation message in English localization is updated to reflect these expanded character permissions.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Validation logic: src/core/validations/username.ts | Broadens validUsername regex pattern to accept Unicode letters (\p{L}), Unicode numbers (\p{N}), spaces, underscores, brackets, and emojis (\p{Emoji} and specific emoji code range). Surrounding censoring, clan tag handling, and length validation logic remain unchanged. |
| Localization: resources/lang/en.json | Updates username invalid characters error message to document the expanded character set: "letters (including Unicode), numbers, spaces, underscores, emojis, and [square brackets]" replaces the previous ASCII-only description. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

  • Verify regex pattern correctness and Unicode property escape syntax
  • Confirm emoji range boundaries and code point inclusion
  • Check that the validation message accurately reflects the implemented logic
  • Ensure no unintended security implications from expanded character acceptance

Possibly related issues

  • UTF-8 Support for usernames #2534 — Directly addresses UTF‑8 support in username validation; this PR implements the expanded Unicode character acceptance requested.

Suggested reviewers

  • evanpelle
  • scottanderson
  • VariableVince

Poem

🌍 Usernames now bloom far and wide,
With emoji stars and letters worldwide—
Unicode dances, no bounds to confide,
A world more welcoming ✨ side by side!

Pre-merge checks

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding Unicode support for usernames with international characters and emojis. |
| Description check | ✅ Passed | The description thoroughly explains the changes, provides specific examples, lists modifications to both validation logic and user-facing messages, and documents test results. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
src/core/validations/username.ts (1)

25-27: Consider removing redundant emoji range.

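The \p{Emoji} property escape already matches the emoji in this range, so the additional range \u{1F300}-\u{1F9FF} (Miscellaneous Symbols and Pictographs through Supplemental Symbols and Pictographs) is largely redundant. The only code points it adds are non-emoji symbols from the blocks in between, such as Alchemical Symbols, which this validator probably does not need.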

Apply this diff to simplify:

-// Allow Unicode letters, numbers, spaces, underscores, brackets, and common symbols/emojis
-// \p{L} = any Unicode letter, \p{N} = any Unicode number, \p{Emoji} = emojis
-const validPattern = /^[\p{L}\p{N}_[\] \p{Emoji}\u{1F300}-\u{1F9FF}]+$/u;
+// Allow Unicode letters, numbers, spaces, underscores, brackets, and emojis
+// \p{L} = any Unicode letter, \p{N} = any Unicode number, \p{Emoji} = emojis
+const validPattern = /^[\p{L}\p{N}_[\] \p{Emoji}]+$/u;
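
Whether the explicit range is redundant can be checked directly. The sketch below compares the two character classes from the diff on a few code points; the variable names are illustrative only. Code points inside U+1F300 to U+1F9FF that lack the Emoji property, such as U+1F700 from the Alchemical Symbols block, pass only while the range is present, so dropping the range slightly narrows what is accepted.

```typescript
// The two candidate patterns from the diff above.
const withRange = /^[\p{L}\p{N}_[\] \p{Emoji}\u{1F300}-\u{1F9FF}]+$/u;
const withoutRange = /^[\p{L}\p{N}_[\] \p{Emoji}]+$/u;

// Probe characters: two emoji and one non-emoji symbol inside the range.
const probes: Array<[string, string]> = [
  ["\u{1F3AE}", "video game emoji"],
  ["\u{1F600}", "grinning face emoji"],
  ["\u{1F700}", "alchemical symbol, not an emoji"],
];

for (const [ch, label] of probes) {
  console.log(label, "| with range:", withRange.test(ch), "| without:", withoutRange.test(ch));
}
```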
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ab53ee6 and fec6229.

📒 Files selected for processing (2)
  • resources/lang/en.json (1 hunks)
  • src/core/validations/username.ts (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-06-02T14:27:37.609Z
Learnt from: andrewNiziolek
Repo: openfrontio/OpenFrontIO PR: 1007
File: resources/lang/de.json:115-115
Timestamp: 2025-06-02T14:27:37.609Z
Learning: For OpenFrontIO project: When localization keys are renamed in language JSON files, the maintainers separate technical changes from translation content updates. They wait for community translators to update the actual translation values rather than attempting to translate in the same PR. This allows technical changes to proceed while ensuring accurate translations from native speakers.

Applied to files:

  • resources/lang/en.json
📚 Learning: 2025-08-16T10:52:08.292Z
Learnt from: TheGiraffe3
Repo: openfrontio/OpenFrontIO PR: 884
File: resources/lang/en.json:456-461
Timestamp: 2025-08-16T10:52:08.292Z
Learning: In OpenFrontIO, translation files in resources/lang/*.json (except en.json) should not be updated in regular PRs. Only dedicated translation PRs titled "mls" and made by Aotumori should update non-English locale files. Regular PRs should only update en.json when adding or modifying translation keys.

Applied to files:

  • resources/lang/en.json
🔇 Additional comments (1)
resources/lang/en.json (1)

254-254: LGTM! Error message accurately reflects the validation rules.

The updated message clearly communicates the expanded character support and matches the regex pattern in src/core/validations/username.ts. The phrasing "(including Unicode)" helps users understand that international characters are now supported.

Also correctly follows the project's localization guidelines by only updating en.json in this PR.

Based on learnings, non-English translation files should only be updated in dedicated translation PRs.

Comment on lines +25 to +27
// Allow Unicode letters, numbers, spaces, underscores, brackets, and common symbols/emojis
// \p{L} = any Unicode letter, \p{N} = any Unicode number, \p{Emoji} = emojis
const validPattern = /^[\p{L}\p{N}_[\] \p{Emoji}\u{1F300}-\u{1F9FF}]+$/u;
Contributor

@coderabbitai coderabbitai bot Nov 28, 2025

⚠️ Potential issue | 🟠 Major

Address homoglyph security risk.

Unicode support introduces homoglyph attacks where visually identical characters from different scripts enable username impersonation. For example, Cyrillic 'а' (U+0430) looks identical to Latin 'a' (U+0061), allowing "Admin" and "Аdmin" to coexist as different usernames.

While resolveConfusablesTransformer() handles this for profanity detection (line 16), it doesn't prevent duplicate-looking usernames during registration or display.

Consider one of these approaches:

  1. Restrict to specific scripts (recommended for gaming context):
// Allow only Latin-script letters (including precomposed accented letters), numbers, spaces, underscores, brackets, and emojis
const validPattern = /^[\p{Script=Latin}\p{N}_[\] \p{Emoji}]+$/u;
  2. Normalize usernames before storage/comparison:
function normalizeUsername(username: string): string {
  // NFKC folds compatibility forms (e.g. fullwidth letters) and toLowerCase()
  // folds case. Neither maps cross-script look-alikes such as Cyrillic "а" to
  // Latin "a"; that needs a separate confusables mapping, as in the profanity check.
  return username.normalize('NFKC').toLowerCase();
}
  3. Add visual similarity check during registration to prevent confusable usernames.
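
To make the difference between the first two options concrete, here is a small sketch. It shows that NFKC normalization folds compatibility forms (such as fullwidth letters) but leaves cross-script look-alikes untouched, whereas a `\p{Script=Latin}` pattern rejects the Cyrillic letter outright. The `latinOnly` pattern mirrors option 1 without the emoji class; all names are illustrative.

```typescript
const latinA = "a"; // U+0061
const cyrillicA = "\u0430"; // U+0430, visually identical in most fonts
const fullwidthA = "\uFF41"; // U+FF41 FULLWIDTH LATIN SMALL LETTER A

// NFKC maps compatibility characters to their plain forms...
console.log(fullwidthA.normalize("NFKC") === latinA); // true
// ...but it does not map between scripts.
console.log(cyrillicA.normalize("NFKC") === latinA); // false

// Script-based restriction: Latin letters, numbers, and the existing extras.
const latinOnly = /^[\p{Script=Latin}\p{N}_[\] ]+$/u;
console.log(latinOnly.test("Admin")); // true
console.log(latinOnly.test("\u0410dmin")); // false, leading Cyrillic А
console.log(latinOnly.test("M\u00FCller")); // true, ü belongs to the Latin script
```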

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

Unicode property escapes \p{L} \p{N} \p{Emoji} browser support Chrome Firefox Safari Edge 2024 2025

💡 Result:

Short answer

  • Unicode property escapes (\p{...}/\P{...}) — including \p{L} and \p{N} — are supported in modern browsers: Chrome 64+, Firefox 78+, Safari 11.1+, and current Chromium-based Edge (Edge 79+). Use the /u flag. [1][2]

Notes about \p{Emoji}

  • \p{Emoji} (and related emoji properties) are available as Unicode properties, but emoji detection is tricky: some characters classified as Emoji include digits/symbols, and many “emoji” are multi-codepoint sequences. For reliably matching “visible emojis” prefer \p{Extended_Pictographic} (or combine Emoji_Presentation / Emoji_Modifier etc.) and be careful about sequence handling (use /v where available or handle ZWJ sequences). [2][3][4]

Practical guidance

  • If you need broad compatibility, transpile (e.g. regexpu/Babel) or feature-detect before using. Test on target browser versions (see Can I Use for exact version tables). [1][3]

Sources

  1. Can I Use — Unicode character class escape (\p{...}). [caniuse data]. (caniuse.com)
  2. MDN — Unicode character class escape documentation (examples, compatibility, Extended_Pictographic notes). (developer.mozilla.org)
  3. Mathias Bynens — notes on Unicode property escapes and browser/engine history. (mathiasbynens.be)
  4. Community notes / examples (StackOverflow, GitHub gists) discussing \p{Emoji} vs \p{Extended_Pictographic} and practical regexes. (stackoverflow.com)


🏁 Script executed:

#!/bin/bash
# Check for browser compatibility configuration
echo "=== Checking browserslist configuration ==="
find . -type f -name ".browserslistrc" -o -name "browserslist" -o -name ".eslintrc*" -o -name "tsconfig.json" -o -name "babel.config.*" -o -name "package.json" | head -20

Repository: openfrontio/OpenFrontIO

Length of output: 190


Unicode property escapes have solid modern browser support, but transpilation may be needed for legacy support.

Unicode property escapes (\p{L}, \p{N}) are supported in all modern browsers—Chrome 64+, Firefox 78+, Safari 11.1+, and Edge 79+. However:

  • If your project targets ES2018+ and modern browsers only, no action is needed.
  • If you support older browsers or need broad compatibility, use a transpiler (e.g., regexpu via Babel) to convert Unicode property escapes to equivalent character classes.
  • Note: \p{Emoji} is supported but has limitations for complex emoji (multi-codepoint sequences, ZWJ sequences). For reliable emoji detection, consider \p{Extended_Pictographic} or combine Emoji_Presentation / Emoji_Modifier properties with sequence handling.

Verify your project's target browsers in tsconfig.json, .browserslistrc, or babel.config.js to determine if transpilation is required.
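
Two of the points above can be checked at runtime. The sketch below (helper names are hypothetical) feature-detects Unicode property escapes by building the regex from a string, which throws a `SyntaxError` on engines without support, and then contrasts `\p{Emoji}` with `\p{Extended_Pictographic}` on an ASCII digit, `#`, and an actual emoji.

```typescript
// Returns true when the engine understands \p{...} with the u flag.
function supportsUnicodePropertyEscapes(): boolean {
  try {
    new RegExp("\\p{L}", "u");
    return true;
  } catch {
    return false;
  }
}

// Reports which of the two emoji-related properties a single character has.
// Regexes are built from strings so that unsupported engines fail at call time,
// not at parse time.
function classify(ch: string): { emoji: boolean; pictographic: boolean } {
  return {
    emoji: new RegExp("^\\p{Emoji}$", "u").test(ch),
    pictographic: new RegExp("^\\p{Extended_Pictographic}$", "u").test(ch),
  };
}

if (supportsUnicodePropertyEscapes()) {
  for (const ch of ["5", "#", "\u{1F3AE}"]) {
    console.log(JSON.stringify(ch), classify(ch));
  }
}
```

The digit and `#` have the Emoji property but not Extended_Pictographic, which is why the latter is the safer choice when the goal is to match pictographs only.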

🤖 Prompt for AI Agents
In src/core/validations/username.ts around lines 25-27, the regex uses Unicode
property escapes (\p{L}, \p{N}, \p{Emoji}) which may break on older runtimes and
\p{Emoji} misses complex emoji sequences; either ensure your build targets
ES2018+ and supported browsers in tsconfig/browserslist so no transpilation is
needed, or enable regex transpilation (e.g., Babel regexpu/polyfill or the
regexpu-loader/plugin) so \p{...} is converted for legacy environments;
additionally replace \p{Emoji} with \p{Extended_Pictographic} (and consider
handling ZWJ/emoji modifier sequences separately) to more reliably match emoji
sequences.

@Otzie2023

Firstly, this is a video game, where such issues are relatively unproblematic. To ensure protection against homoglyph attacks, simply standardise characters that look the same. For example, use the Latin letter “a” instead of the Cyrillic letter “a”. However, if a user enters a Cyrillic “a”, replace it with a Latin “a” in the background. Deal with all other problematic characters in the same way.

Contributor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check what obscenity exports and if confusables utilities are available
echo "=== Checking obscenity imports and usage ==="
rg -n "from ['\"]obscenity['\"]" -A 2 -B 2

echo -e "\n=== Checking package.json for obscenity version and related libraries ==="
cat package.json | jq '.dependencies, .devDependencies' | grep -i "obscen\|confus"

echo -e "\n=== Checking if there's a confusables library already in use ==="
rg -n "confusables" --type ts --type js -g '!node_modules'

echo -e "\n=== Checking for any existing normalization utilities ==="
rg -n "normalize|normaliz" src/core/validations/username.ts

Length of output: 653


🌐 Web query:

obscenity npm library confusables API exports javascript

💡 Result:

Short answer — obscenity supports confusable/unicode-homoglyph normalization as a transformer in its matching pipeline. Import the library exports (matchers, censor, datasets and the recommended transformers) and include the recommended transformers (which contain confusables normalization) when you build a matcher.

Example (CommonJS / ESM):

  • Import top-level exports: RegExpMatcher, TextCensor, englishDataset, englishRecommendedTransformers. Use them to create a matcher that includes confusables handling:
    const { RegExpMatcher, TextCensor, englishDataset, englishRecommendedTransformers } = require('obscenity');
    const matcher = new RegExpMatcher({ ...englishDataset.build(), ...englishRecommendedTransformers });
    const censor = new TextCensor();
    const matches = matcher.getAllMatches('ʃ𝐟ʃὗƈk ỹоứ'); // confusable/form-mangled text
    console.log(censor.applyTo('ʃ𝐟ʃὗƈk ỹоứ', matches));

Sources:

  • obscenity README / package (exports and usage examples). [1]
  • confusables (the npm library used for Unicode confusable handling / background on confusables normalization). [2]

References:
[1] obscenity (npm / GitHub) — package README and API examples.
[2] confusables (npm) — library for removing/injecting confusable characters.


🌐 Web query:

confusables npm library normalize unicode homoglyphs javascript

💡 Result:

Short answer — yes.

Sources:
[1] npm "confusables" (gc/confusables).
[2] npm "unicode-confusables" / @ensdomains.
[3] npm "homoglypher".
[4] Unicode Technical Standard UTS #39 (confusables.txt).


@Otzie2023, that's a pragmatic approach for the gaming context! Here's how to implement confusables normalization:

Recommended Implementation

Install the confusables package, which normalizes visually similar Unicode characters to their Latin equivalents:

npm install confusables

Then update src/core/validations/username.ts:

import { remove as removeConfusables } from 'confusables';

// Add a new normalization function
export function normalizeUsername(username: string): string {
  // Convert confusables (Cyrillic "а" → Latin "a", etc.)
  return removeConfusables(username);
}

// Update sanitizeUsername to apply normalization
export function sanitizeUsername(str: string): string {
  const normalized = normalizeUsername(str);
  const sanitized = Array.from(normalized)
    .filter((ch) => validPattern.test(ch))
    .join("")
    .slice(0, MAX_USERNAME_LENGTH);
  return sanitized.padEnd(MIN_USERNAME_LENGTH, "x");
}

// Optionally normalize in validation for consistency
export function validateUsername(username: string): {
  isValid: boolean;
  error?: string;
} {
  if (typeof username !== "string") {
    return { isValid: false, error: translateText("username.not_string") };
  }

  // Normalize before validating
  const normalizedUsername = normalizeUsername(username);

  if (normalizedUsername.length < MIN_USERNAME_LENGTH) {
    return {
      isValid: false,
      error: translateText("username.too_short", {
        min: MIN_USERNAME_LENGTH,
      }),
    };
  }

  if (normalizedUsername.length > MAX_USERNAME_LENGTH) {
    return {
      isValid: false,
      error: translateText("username.too_long", {
        max: MAX_USERNAME_LENGTH,
      }),
    };
  }

  if (!validPattern.test(normalizedUsername)) {
    return {
      isValid: false,
      error: translateText("username.invalid_chars", {
        max: MAX_USERNAME_LENGTH,
      }),
    };
  }

  return { isValid: true };
}

This way, when a user enters "Аdmin" (with Cyrillic "а"), it gets normalized to "Admin" (Latin) in the background, preventing duplicate-looking usernames while still allowing genuine international names like "José" or "李明" to pass through.
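
One design question this leaves open is whether the normalized form should replace what the player typed or only be used for comparison. A minimal sketch of the comparison-only variant follows. It assumes a tiny hand-written look-alike table as a stand-in for a real confusables data set; `LOOKALIKES`, `skeleton`, and `looksLike` are hypothetical names and are not part of the `confusables` package or this codebase.

```typescript
// Illustrative subset only; a real table (for example one derived from the
// UTS #39 confusables data) is far larger.
const LOOKALIKES: Record<string, string> = {
  "\u0430": "a", // Cyrillic а
  "\u0435": "e", // Cyrillic е
  "\u043E": "o", // Cyrillic о
  "\u0440": "p", // Cyrillic р
  "\u0441": "c", // Cyrillic с
};

// Reduce a name to a comparison key: fold compatibility forms and case,
// then replace known look-alikes with their Latin counterparts.
function skeleton(name: string): string {
  const folded = name.normalize("NFKC").toLowerCase();
  return Array.from(folded, (ch) => LOOKALIKES[ch] ?? ch).join("");
}

// Two names are treated as confusable when their skeletons are equal.
function looksLike(a: string, b: string): boolean {
  return skeleton(a) === skeleton(b);
}

console.log(looksLike("Admin", "\u0410dmin")); // true, Cyrillic А folds to "a"
console.log(looksLike("Admin", "Admin2")); // false
```

With this shape the name is stored and displayed exactly as entered, and only the skeleton is compared when checking for look-alike names.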


🧬 Code Graph Analysis Results

<src/core/Util.ts> (lines 64-72)

export function simpleHash(str: string): number {
  let hash = 0;
  for (let i = 0; i < str.length; i++) {
    const char = str.charCodeAt(i);
    hash = (hash << 5) - hash + char;
    hash = hash & hash; // Convert to 32-bit integer
  }
  return Math.abs(hash);
}

<src/core/Util.ts> (lines 173-177)

export function sanitize(name: string): string {
  return Array.from(name)
    .join("")
    .replace(/[^\p{L}\p{N}\s\p{Emoji}\p{Emoji_Component}[\]_]/gu, "");
}

<src/core/Util.ts> (lines 347-350)

export function getClanTagOriginalCase(name: string): string | null {
  const clanTag = clanMatch(name);
  return clanTag ? clanTag[1] : null;
}

🧠 Learnings used
Learnt from: Aotumuri
Repo: openfrontio/OpenFrontIO PR: 1534
File: src/client/LangSelector.ts:97-106
Timestamp: 2025-07-23T12:36:35.354Z
Learning: In OpenFrontIO's LangSelector.ts, the getClosestSupportedLang method always joins language code parts with underscores ("_") because all keys in the languageMap use underscore format (e.g., pt_BR, sv_SE, zh_CN). This normalization ensures consistency regardless of whether the input language code uses hyphens or underscores as delimiters.

@evanpelle
Collaborator

We currently don't have plans to support emojis in usernames.

@Otzie2023

Otzie2023 commented Nov 29, 2025

Then simply block emojis (for example, the Emoticons block, U+1F600 to U+1F64F). However, support for non-English letters in usernames would be very important. Many proper names cannot be represented using ASCII characters alone.
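
For reference, U+1F600 to U+1F64F is the Emoticons block, which holds only part of the emoji repertoire. The small sketch below (function names are illustrative) shows that a filter built on that range alone misses emoji from other blocks, while the `Extended_Pictographic` property catches both.

```typescript
// Matches any character in the Emoticons block only.
const emoticonsBlock = /[\u{1F600}-\u{1F64F}]/u;
// Matches any pictographic character, regardless of block.
const anyPictographic = /\p{Extended_Pictographic}/u;

function containsEmoticon(name: string): boolean {
  return emoticonsBlock.test(name);
}

function containsPictographic(name: string): boolean {
  return anyPictographic.test(name);
}

console.log(containsEmoticon("User\u{1F600}")); // true, 😀 is in the Emoticons block
console.log(containsEmoticon("User\u{1F3AE}")); // false, 🎮 lives in another block
console.log(containsPictographic("User\u{1F3AE}")); // true
console.log(containsPictographic("Jos\u00E9")); // false
```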

@Duwibi
Contributor

Duwibi commented Nov 29, 2025

The problem at hand is that moderating usernames becomes borderline impossible if we allow all Unicode characters. That's why only the English alphabet is currently supported.

@Otzie2023

Otzie2023 commented Nov 29, 2025

What do you mean by moderating?
If allowing all Unicode characters is too much, then allow only these blocks: Basic Latin (U+0000-U+007F), Latin-1 Supplement (U+0080-U+00FF), Latin Extended-A (U+0100-U+017F), Latin Extended-B (U+0180-U+024F), and Latin Extended Additional (U+1E00-U+1EFF).
The characters in these blocks are all still clearly distinguishable visually.
You can view the blocks and their characters here:
https://www.compart.com/de/unicode/block

@Duwibi
Contributor

Duwibi commented Nov 29, 2025

By moderating, we mean trying to minimize inappropriate usernames.

@Otzie2023

Otzie2023 commented Nov 29, 2025

Moderation would be possible with profanity-cleaner or leo-profanity in JS/TS. Because only 5 blocks are used, the possibilities for inappropriate names are also reduced.

Google's Perspective API would also be an option.

@Otzie2023

I would say that the best compromise would be to allow these Unicode blocks:

  1. Basic Latin (U+0000-U+007F)
  2. Latin-1 Supplement (U+0080-U+00FF)
  3. Latin Extended-A (U+0100-U+017F)
  4. Latin Extended-B (U+0180-U+024F)
  5. Latin Extended Additional (U+1E00-U+1EFF)
    This still allows for moderation while leaving room for personal expression.
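
A minimal sketch of this allow-list follows, under the assumption that only letters and digits from those five blocks should pass, together with the space, underscore and square brackets the validator already allows. The blocks also contain control characters, punctuation and symbols, so the sketch intersects the ranges with `\p{L}` and `\p{N}` instead of admitting whole blocks. `LATIN_BLOCKS`, `isAllowedChar`, and `isLatinBlockUsername` are hypothetical names, and length checks are left out.

```typescript
// Inclusive code point ranges of the five proposed blocks.
const LATIN_BLOCKS: ReadonlyArray<readonly [number, number]> = [
  [0x0000, 0x007f], // Basic Latin
  [0x0080, 0x00ff], // Latin-1 Supplement
  [0x0100, 0x017f], // Latin Extended-A
  [0x0180, 0x024f], // Latin Extended-B
  [0x1e00, 0x1eff], // Latin Extended Additional
];

const letterOrDigit = /^[\p{L}\p{N}]$/u;
const extraAllowed = new Set([" ", "_", "[", "]"]);

// A character passes if it is one of the extras, or a letter/number
// whose code point falls inside one of the listed blocks.
function isAllowedChar(ch: string): boolean {
  if (extraAllowed.has(ch)) return true;
  const cp = ch.codePointAt(0);
  if (cp === undefined) return false;
  const inBlock = LATIN_BLOCKS.some(([lo, hi]) => cp >= lo && cp <= hi);
  return inBlock && letterOrDigit.test(ch);
}

// NFC first, so that "e" + combining accent becomes the single letter "é".
function isLatinBlockUsername(name: string): boolean {
  const chars = Array.from(name.normalize("NFC"));
  return chars.length > 0 && chars.every((ch) => isAllowedChar(ch));
}

console.log(isLatinBlockUsername("S\u00F8ren")); // true
console.log(isLatinBlockUsername("\u0418\u0433\u043E\u0440\u044C")); // false, Cyrillic
```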
