Add Unicode support for usernames - Allow international characters and emojis in usernames #2535
Walkthrough
The changes expand username validation to accept Unicode letters, Unicode numbers, and emojis alongside existing ASCII characters. The corresponding user-facing validation message in English localization is updated to reflect these expanded character permissions.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/core/validations/username.ts (1)
25-27: Consider removing redundant emoji range.
The \p{Emoji} property escape already matches emoji characters. The additional range \u{1F300}-\u{1F9FF} (Miscellaneous Symbols and Pictographs through Supplemental Symbols and Pictographs) is likely redundant.
Apply this diff to simplify:
-// Allow Unicode letters, numbers, spaces, underscores, brackets, and common symbols/emojis
-// \p{L} = any Unicode letter, \p{N} = any Unicode number, \p{Emoji} = emojis
-const validPattern = /^[\p{L}\p{N}_[\] \p{Emoji}\u{1F300}-\u{1F9FF}]+$/u;
+// Allow Unicode letters, numbers, spaces, underscores, brackets, and emojis
+// \p{L} = any Unicode letter, \p{N} = any Unicode number, \p{Emoji} = emojis
+const validPattern = /^[\p{L}\p{N}_[\] \p{Emoji}]+$/u;
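Before dropping the range, it may be worth checking whether it is strictly redundant. The sketch below is not taken from the PR; it compares the two character classes on U+1F700, an alchemical symbol that sits inside the explicit range but, per the Unicode emoji data, does not carry the Emoji property. The constant names are illustrative, and an ES2018+ runtime is assumed.

```typescript
// Illustrative check (not from the PR): is the explicit range a pure
// subset of \p{Emoji}, or does it admit extra symbols?
const emojiOnly = /^[\p{Emoji}]$/u;
const withRange = /^[\p{Emoji}\u{1F300}-\u{1F9FF}]$/u;

// U+1F700 ALCHEMICAL SYMBOL FOR QUINTESSENCE: inside the range, not an emoji.
const alchemical = "\u{1F700}";
// U+1F600 GRINNING FACE: an ordinary emoji covered by both classes.
const grinning = "\u{1F600}";

const rangeAddsNonEmoji =
  withRange.test(alchemical) && !emojiOnly.test(alchemical);
const emojiStillCovered = emojiOnly.test(grinning);
```

If `rangeAddsNonEmoji` is true in the target runtime, removing the range narrows the accepted set slightly (non-emoji pictographs and arrows are no longer allowed), which is probably the intended behavior but is a change rather than a pure simplification.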
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
resources/lang/en.json (1 hunks)
src/core/validations/username.ts (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-06-02T14:27:37.609Z
Learnt from: andrewNiziolek
Repo: openfrontio/OpenFrontIO PR: 1007
File: resources/lang/de.json:115-115
Timestamp: 2025-06-02T14:27:37.609Z
Learning: For OpenFrontIO project: When localization keys are renamed in language JSON files, the maintainers separate technical changes from translation content updates. They wait for community translators to update the actual translation values rather than attempting to translate in the same PR. This allows technical changes to proceed while ensuring accurate translations from native speakers.
Applied to files:
resources/lang/en.json
📚 Learning: 2025-08-16T10:52:08.292Z
Learnt from: TheGiraffe3
Repo: openfrontio/OpenFrontIO PR: 884
File: resources/lang/en.json:456-461
Timestamp: 2025-08-16T10:52:08.292Z
Learning: In OpenFrontIO, translation files in resources/lang/*.json (except en.json) should not be updated in regular PRs. Only dedicated translation PRs titled "mls" and made by Aotumori should update non-English locale files. Regular PRs should only update en.json when adding or modifying translation keys.
Applied to files:
resources/lang/en.json
🔇 Additional comments (1)
resources/lang/en.json (1)
254-254: LGTM! Error message accurately reflects the validation rules.
The updated message clearly communicates the expanded character support and matches the regex pattern in src/core/validations/username.ts. The phrasing "(including Unicode)" helps users understand that international characters are now supported.
Also correctly follows the project's localization guidelines by only updating en.json in this PR.
Based on learnings, non-English translation files should only be updated in dedicated translation PRs.
// Allow Unicode letters, numbers, spaces, underscores, brackets, and common symbols/emojis
// \p{L} = any Unicode letter, \p{N} = any Unicode number, \p{Emoji} = emojis
const validPattern = /^[\p{L}\p{N}_[\] \p{Emoji}\u{1F300}-\u{1F9FF}]+$/u;
Address homoglyph security risk.
Unicode support introduces homoglyph attacks where visually identical characters from different scripts enable username impersonation. For example, Cyrillic 'а' (U+0430) looks identical to Latin 'a' (U+0061), allowing "Admin" and "Аdmin" to coexist as different usernames.
While resolveConfusablesTransformer() handles this for profanity detection (line 16), it doesn't prevent duplicate-looking usernames during registration or display.
Consider one of these approaches:
- Restrict to specific scripts (recommended for gaming context):
// Allow only Latin letters (including precomposed accented letters), numbers, spaces, underscores, brackets, and emojis
const validPattern = /^[\p{Script=Latin}\p{N}_[\] \p{Emoji}]+$/u;
- Normalize usernames before storage/comparison:
function normalizeUsername(username: string): string {
  // NFKC folds compatibility forms (e.g. fullwidth letters); toLowerCase() folds case.
  // Neither maps cross-script look-alikes such as Cyrillic 'а' to Latin 'a';
  // that requires a separate confusables mapping step.
  return username.normalize('NFKC').toLowerCase();
}
- Add visual similarity check during registration to prevent confusable usernames.
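The normalization and similarity-check approaches can be combined by comparing "skeletons" of usernames. The following is a minimal sketch under stated assumptions: `CONFUSABLES`, `usernameSkeleton`, and `looksLikeExisting` are hypothetical names, and the tiny hand-written map stands in for a full mapping such as Unicode's confusables.txt. It is not the project's implementation.

```typescript
// Hypothetical, deliberately tiny look-alike map. A real deployment would
// load a complete mapping (for example from Unicode's confusables.txt).
const CONFUSABLES: Record<string, string> = {
  "\u0410": "A", // Cyrillic capital A
  "\u0430": "a", // Cyrillic small a
  "\u0435": "e", // Cyrillic small ie
  "\u043E": "o", // Cyrillic small o
  "\u0440": "p", // Cyrillic small er
  "\u0441": "c", // Cyrillic small es
};

// Reduce a username to a comparison key: compatibility-normalize,
// map known look-alikes to Latin, then fold case.
function usernameSkeleton(name: string): string {
  return Array.from(name.normalize("NFKC"))
    .map((ch) => CONFUSABLES[ch] ?? ch)
    .join("")
    .toLowerCase();
}

// True when the candidate would be visually confusable with a taken name.
function looksLikeExisting(candidate: string, taken: string[]): boolean {
  const skeleton = usernameSkeleton(candidate);
  return taken.some((name) => usernameSkeleton(name) === skeleton);
}

// "\u0410dmin" starts with a Cyrillic capital A but renders like "Admin".
const spoofed = "\u0410dmin";
const clash = looksLikeExisting(spoofed, ["Admin"]);
```

The skeleton is only used for comparison here, so the name a player typed can still be stored and displayed unchanged; whether to rewrite the stored name instead is a separate product decision.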
🧩 Analysis chain
🌐 Web query:
Unicode property escapes \p{L} \p{N} \p{Emoji} browser support Chrome Firefox Safari Edge 2024 2025
💡 Result:
Short answer
- Unicode property escapes (\p{...}/\P{...}) — including \p{L} and \p{N} — are supported in modern browsers: Chrome 64+, Firefox 78+, Safari 11.1+, and current Chromium-based Edge (Edge 79+). Use the /u flag. [1][2]
Notes about \p{Emoji}
- \p{Emoji} (and related emoji properties) are available as Unicode properties, but emoji detection is tricky: some characters classified as Emoji include digits/symbols, and many “emoji” are multi-codepoint sequences. For reliably matching “visible emojis” prefer \p{Extended_Pictographic} (or combine Emoji_Presentation / Emoji_Modifier etc.) and be careful about sequence handling (use /v where available or handle ZWJ sequences). [2][3][4]
Practical guidance
- If you need broad compatibility, transpile (e.g. regexpu/Babel) or feature-detect before using. Test on target browser versions (see Can I Use for exact version tables). [1][3]
Sources
- Can I Use — Unicode character class escape (\p{...}). [caniuse data]. (caniuse.com)
- MDN — Unicode character class escape documentation (examples, compatibility, Extended_Pictographic notes). (developer.mozilla.org)
- Mathias Bynens — notes on Unicode property escapes and browser/engine history. (mathiasbynens.be)
- Community notes / examples (StackOverflow, GitHub gists) discussing \p{Emoji} vs \p{Extended_Pictographic} and practical regexes. (stackoverflow.com)
Want a one-line compatibility table for specific Chrome/Firefox/Safari/Edge versions (2024–2025)? I can produce it.
Citations:
- 1: https://caniuse.com/mdn-javascript_regular_expressions_unicode_character_class_escape?utm_source=openai
- 2: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Unicode_character_class_escape?utm_source=openai
- 3: https://mathiasbynens.be/notes/es-unicode-property-escapes?utm_source=openai
- 4: https://stackoverflow.com/questions/18862256/how-to-detect-emoji-using-javascript?utm_source=openai
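The "feature-detect before using" advice can be sketched as follows. This is an illustration only: `supportsUnicodePropertyEscapes`, `buildUsernamePattern`, and the ASCII fallback pattern are hypothetical and not part of the PR. The Unicode pattern is built through the `RegExp` constructor so that an engine without `\p{...}` support throws at call time, where it can be caught, instead of failing when the script is parsed.

```typescript
// Returns true when the running engine understands \p{...} with the u flag.
function supportsUnicodePropertyEscapes(): boolean {
  try {
    new RegExp("\\p{L}", "u");
    return true;
  } catch {
    return false;
  }
}

// Hypothetical ASCII-only fallback for engines without property escapes.
const ASCII_FALLBACK = /^[a-zA-Z0-9_[\] ]+$/;

// Pick the Unicode-aware pattern when available, else the fallback.
function buildUsernamePattern(): RegExp {
  return supportsUnicodePropertyEscapes()
    ? new RegExp("^[\\p{L}\\p{N}_[\\] ]+$", "u")
    : ASCII_FALLBACK;
}

const usernamePattern = buildUsernamePattern();
```

On any browser listed in the compatibility notes above, the first branch is taken, so the fallback only matters if the project's supported targets reach further back.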
🏁 Script executed:
#!/bin/bash
# Check for browser compatibility configuration
echo "=== Checking browserslist configuration ==="
find . -type f -name ".browserslistrc" -o -name "browserslist" -o -name ".eslintrc*" -o -name "tsconfig.json" -o -name "babel.config.*" -o -name "package.json" | head -20
Repository: openfrontio/OpenFrontIO
Length of output: 190
Unicode property escapes have solid modern browser support, but transpilation may be needed for legacy support.
Unicode property escapes (\p{L}, \p{N}) are supported in all modern browsers—Chrome 64+, Firefox 78+, Safari 11.1+, and Edge 79+. However:
- If your project targets ES2018+ and modern browsers only, no action is needed.
- If you support older browsers or need broad compatibility, use a transpiler (e.g., regexpu via Babel) to convert Unicode property escapes to equivalent character classes.
- Note: \p{Emoji} is supported but has limitations for complex emoji (multi-codepoint sequences, ZWJ sequences). For reliable emoji detection, consider \p{Extended_Pictographic} or combine Emoji_Presentation/Emoji_Modifier properties with sequence handling.
Verify your project's target browsers in tsconfig.json, .browserslistrc, or babel.config.js to determine if transpilation is required.
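The \p{Emoji} caveats can be made concrete with a few checks. This sketch assumes an ES2018+ runtime; the constant names are illustrative. It shows that \p{Emoji} accepts ASCII digits and "#", that \p{Extended_Pictographic} does not, and that a ZWJ family sequence only passes a character-class test once \p{Emoji_Component} (which covers the zero-width joiner) is allowed as well.

```typescript
const emoji = /^\p{Emoji}$/u;
const pictographic = /^\p{Extended_Pictographic}$/u;

// Digits and "#" carry the Emoji property because of keycap sequences.
const digitIsEmoji = emoji.test("7");
const hashIsEmoji = emoji.test("#");
const digitIsPictographic = pictographic.test("7");

// Man + ZWJ + woman + ZWJ + girl: one visible glyph, five code points.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";

// The zero-width joiner (U+200D) is Emoji_Component but not Emoji.
const emojiRun = /^[\p{Emoji}]+$/u;
const emojiRunWithComponents = /^[\p{Emoji}\p{Emoji_Component}]+$/u;

const familyMatchesEmojiOnly = emojiRun.test(family);
const familyMatchesWithComponents = emojiRunWithComponents.test(family);
```

For a username allowlist the digit overlap is harmless when \p{N} is already permitted, but it matters if \p{Emoji} is ever used on its own to detect or block emoji.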
🤖 Prompt for AI Agents
In src/core/validations/username.ts around lines 25-27, the regex uses Unicode
property escapes (\p{L}, \p{N}, \p{Emoji}) which may break on older runtimes and
\p{Emoji} misses complex emoji sequences; either ensure your build targets
ES2018+ and supported browsers in tsconfig/browserslist so no transpilation is
needed, or enable regex transpilation (e.g., Babel regexpu/polyfill or the
regexpu-loader/plugin) so \p{...} is converted for legacy environments;
additionally replace \p{Emoji} with \p{Extended_Pictographic} (and consider
handling ZWJ/emoji modifier sequences separately) to more reliably match emoji
sequences.
Firstly, this is a video game, where such issues are relatively unproblematic. To ensure protection against homoglyph attacks, simply standardise characters that look the same. For example, use the Latin letter “a” instead of the Cyrillic letter “a”. However, if a user enters a Cyrillic “a”, replace it with a Latin “a” in the background. Deal with all other problematic characters in the same way.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check what obscenity exports and if confusables utilities are available
echo "=== Checking obscenity imports and usage ==="
rg -n "from ['\"]obscenity['\"]" -A 2 -B 2
echo -e "\n=== Checking package.json for obscenity version and related libraries ==="
cat package.json | jq '.dependencies, .devDependencies' | grep -i "obscen\|confus"
echo -e "\n=== Checking if there's a confusables library already in use ==="
rg -n "confusables" --type ts --type js -g '!node_modules'
echo -e "\n=== Checking for any existing normalization utilities ==="
rg -n "normalize|normaliz" src/core/validations/username.ts
Length of output: 653
🌐 Web query:
obscenity npm library confusables API exports javascript
💡 Result:
Short answer — obscenity supports confusable/unicode-homoglyph normalization as a transformer in its matching pipeline. Import the library exports (matchers, censor, datasets and the recommended transformers) and include the recommended transformers (which contain confusables normalization) when you build a matcher.
Example (CommonJS / ESM):
- Import top-level exports: RegExpMatcher, TextCensor, englishDataset, englishRecommendedTransformers. Use them to create a matcher that includes confusables handling:
const { RegExpMatcher, TextCensor, englishDataset, englishRecommendedTransformers } = require('obscenity');
const matcher = new RegExpMatcher({ ...englishDataset.build(), ...englishRecommendedTransformers });
const censor = new TextCensor();
const matches = matcher.getAllMatches('ʃ𝐟ʃὗƈk ỹоứ'); // confusable/form-mangled text
console.log(censor.applyTo('ʃ𝐟ʃὗƈk ỹоứ', matches));
Sources:
- obscenity README / package (exports and usage examples). [1]
- confusables (the npm library used for Unicode confusable handling / background on confusables normalization). [2]
References:
[1] obscenity (npm / GitHub) — package README and API examples.
[2] confusables (npm) — library for removing/injecting confusable characters.
🌐 Web query:
confusables npm library normalize unicode homoglyphs javascript
💡 Result:
Short answer — yes.
- The npm package "confusables" (gc/confusables) normalizes/removes Unicode homoglyphs for English; API includes remove(...) and obfuscate(...). Example:
import remove from 'confusables'; remove('fück') // -> 'fuck'. [1]
- Alternatives that use Unicode's confusables data:
  - unicode-confusables / @ensdomains/unicode-confusables: provides isConfusing, confusables, rectifyConfusion. Good for detection + rectification. [2]
  - homoglypher: utility to normalize homoglyphs to ASCII using Unicode's confusables.txt. [3]
- Underlying standard: these libraries are based on Unicode's confusables mappings (UTS #39 / confusables.txt). For robust/security-sensitive use (IDNs, identifiers), follow UTS #39 guidance rather than ad-hoc rules. [4]
Sources:
[1] npm "confusables" (gc/confusables).
[2] npm "unicode-confusables" / @ensdomains.
[3] npm "homoglypher".
[4] Unicode Technical Standard UTS #39 (confusables.txt).
@Otzie2023, that's a pragmatic approach for the gaming context! Here's how to implement confusables normalization:
Recommended Implementation
Install the confusables package, which normalizes visually similar Unicode characters to their Latin equivalents:
npm install confusables
Then update src/core/validations/username.ts:
import { remove as removeConfusables } from 'confusables';
// Add a new normalization function
export function normalizeUsername(username: string): string {
// Convert confusables (Cyrillic "а" → Latin "a", etc.)
return removeConfusables(username);
}
// Update sanitizeUsername to apply normalization
export function sanitizeUsername(str: string): string {
const normalized = normalizeUsername(str);
const sanitized = Array.from(normalized)
.filter((ch) => validPattern.test(ch))
.join("")
.slice(0, MAX_USERNAME_LENGTH);
return sanitized.padEnd(MIN_USERNAME_LENGTH, "x");
}
// Optionally normalize in validation for consistency
export function validateUsername(username: string): {
isValid: boolean;
error?: string;
} {
if (typeof username !== "string") {
return { isValid: false, error: translateText("username.not_string") };
}
// Normalize before validating
const normalizedUsername = normalizeUsername(username);
if (normalizedUsername.length < MIN_USERNAME_LENGTH) {
return {
isValid: false,
error: translateText("username.too_short", {
min: MIN_USERNAME_LENGTH,
}),
};
}
if (normalizedUsername.length > MAX_USERNAME_LENGTH) {
return {
isValid: false,
error: translateText("username.too_long", {
max: MAX_USERNAME_LENGTH,
}),
};
}
if (!validPattern.test(normalizedUsername)) {
return {
isValid: false,
error: translateText("username.invalid_chars", {
max: MAX_USERNAME_LENGTH,
}),
};
}
return { isValid: true };
}
This way, when a user enters "Аdmin" (with Cyrillic "а"), it gets normalized to "Admin" (Latin) in the background, preventing duplicate-looking usernames while still allowing genuine international names like "José" or "李明" to pass through.
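One detail in the functions above may deserve a check once emojis and other non-BMP characters are allowed: `.length` and `.slice()` operate on UTF-16 code units, so an emoji counts as two and a cut can land in the middle of a surrogate pair. The helpers below are a hedged sketch of counting and truncating by code point instead; `codePointLength` and `truncateByCodePoints` are hypothetical names, not existing project functions.

```typescript
// Count code points rather than UTF-16 code units.
function codePointLength(s: string): number {
  return Array.from(s).length;
}

// Truncate on code point boundaries so surrogate pairs stay intact.
function truncateByCodePoints(s: string, max: number): string {
  return Array.from(s).slice(0, max).join("");
}

// "ab" followed by U+1F600 GRINNING FACE (two UTF-16 code units).
const sample = "ab\u{1F600}";

const unitLength = sample.length; // code units
const pointLength = codePointLength(sample); // code points
const brokenCut = sample.slice(0, 3); // ends in a lone high surrogate
const safeCut = truncateByCodePoints(sample, 3);
```

This still counts code points, not user-perceived characters, so a ZWJ emoji sequence would count as several; whether that matters depends on how the length limits are meant to be read.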
🧬 Code Graph Analysis Results
<src/core/Util.ts> (lines 64-72)
export function simpleHash(str: string): number {
let hash = 0;
for (let i = 0; i < str.length; i++) {
const char = str.charCodeAt(i);
hash = (hash << 5) - hash + char;
hash = hash & hash; // Convert to 32-bit integer
}
return Math.abs(hash);
}
<src/core/Util.ts> (lines 173-177)
export function sanitize(name: string): string {
return Array.from(name)
.join("")
.replace(/[^\p{L}\p{N}\s\p{Emoji}\p{Emoji_Component}[\]_]/gu, "");
}
<src/core/Util.ts> (lines 347-350)
export function getClanTagOriginalCase(name: string): string | null {
const clanTag = clanMatch(name);
return clanTag ? clanTag[1] : null;
}
🧠 Learnings used
Learnt from: Aotumuri
Repo: openfrontio/OpenFrontIO PR: 1534
File: src/client/LangSelector.ts:97-106
Timestamp: 2025-07-23T12:36:35.354Z
Learning: In OpenFrontIO's LangSelector.ts, the getClosestSupportedLang method always joins language code parts with underscores ("_") because all keys in the languageMap use underscore format (e.g., pt_BR, sv_SE, zh_CN). This normalization ensures consistency regardless of whether the input language code uses hyphens or underscores as delimiters.
We currently don't have plans to support emojis in usernames.
Then simply block the emoji range (U+1F600 to U+1F64F, the Emoticons block). However, the use of non-English letters in usernames would be very important. Many proper names cannot be represented using ASCII characters alone.
The problem at hand is that moderating usernames is going to become borderline impossible if we allow all Unicode characters. That's why only the English alphabet is currently supported.
What do you mean by moderating?
By moderating we mean trying to minimize inappropriate usernames.
Moderation would be possible with profanity-cleaner or leo-profanity in JS/TS. Because only 5 blocks are used, the possibilities for inappropriate names are also reduced. Google's Perspective API would also be an option.
I would say that the best compromise would be to activate these Unicode blocks:
Description:
This PR adds Unicode support for usernames, allowing users to create usernames with international characters and emojis. Previously, usernames were restricted to ASCII characters only, which prevented users from using proper names in languages like Chinese, Arabic, Cyrillic, and others, as well as emojis.
Changes:
- Updated src/core/validations/username.ts to support:
  - Unicode letters (\p{L}) - supports all international characters
  - Unicode numbers (\p{N})
  - Emojis (\p{Emoji})
- Updated resources/lang/en.json to reflect the broader character support
Examples of now-supported usernames:
!,@,#, etc.
Please complete the following:
Please put your Discord username so you can be contacted if a bug or regression is found:
orion_nebula22
or just Orion Nebula