qmk-vial/docs/feature_unicode.md
2022-08-31 16:39:16 -07:00


Unicode Support

Unicode characters can be input straight from your keyboard! There are some limitations, however.

In order to enable Unicode support on your keyboard, you will need to do the following:

  1. Choose one of three supported Unicode implementations: Basic Unicode, Unicode Map, UCIS.
  2. Find which input mode is the best match for your operating system and setup.
  3. Set the appropriate input mode (or modes) in your configuration.
  4. Add Unicode keycodes to your keymap.

1. Methods :id=methods

QMK supports three different methods for enabling Unicode input and adding Unicode characters to your keymap. Each has its pros and cons in terms of flexibility and ease of use. Choose the one that best fits your use case.

The Basic method should be enough for most users. However, if you need a wider range of supported characters (including emoji, rare symbols etc.), you should use Unicode Map.


1.1. Basic Unicode :id=basic-unicode

The easiest to use method, albeit somewhat limited. It stores Unicode characters as keycodes in the keymap itself, so it only supports code points up to 0x7FFF. This covers characters for most modern languages (including East Asian), as well as symbols, but it doesn't cover emoji.

Add the following to your rules.mk:

UNICODE_ENABLE = yes

Then add UC(c) keycodes to your keymap, where c is the code point of the desired character (preferably in hexadecimal, up to 4 digits long). For example, UC(0x40B) will output Ћ, and UC(0x30C4) will output ツ.
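If you're unsure whether a character can be used with UC(), you can check its code point against the 0x7FFF limit mentioned above. The following is a minimal standalone sketch, not part of QMK; the helper names fits_basic_unicode and within_unicode_range are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Highest code point a UC() keycode can carry, per the limit stated above. */
#define BASIC_UNICODE_MAX 0x7FFFu
/* Highest code point defined by the Unicode standard. */
#define UNICODE_MAX 0x10FFFFu

/* Hypothetical helper: true if cp is small enough for the Basic method. */
bool fits_basic_unicode(uint32_t cp) {
    return cp <= BASIC_UNICODE_MAX;
}

/* Hypothetical helper: true if cp lies within the Unicode range at all. */
bool within_unicode_range(uint32_t cp) {
    return cp <= UNICODE_MAX;
}
```

With these helpers, 0x40B (Ћ) and 0x30C4 (ツ) pass the Basic check, while 0x1F40D (🐍) does not and would need one of the other methods.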


1.2. Unicode Map :id=unicode-map

In addition to standard character ranges, this method also covers emoji, ancient scripts, rare symbols etc. In fact, all possible code points (up to 0x10FFFF) are supported. Here, Unicode characters are stored in a separate mapping table. You need to maintain a unicode_map array in your keymap file, which may contain at most 16384 entries.

Add the following to your rules.mk:

UNICODEMAP_ENABLE = yes

Then add X(i) keycodes to your keymap, where i is the desired character's index in the mapping table. This can be a numeric value, but it's recommended to keep the indices in an enum and access them by name.

enum unicode_names {
    BANG,
    IRONY,
    SNEK
};

const uint32_t PROGMEM unicode_map[] = {
    [BANG]  = 0x203D,  // ‽
    [IRONY] = 0x2E2E,  // ⸮
    [SNEK]  = 0x1F40D, // 🐍
};

Then you can use X(BANG), X(SNEK) etc. in your keymap.

Lower and Upper Case

Characters often come in lower and upper case pairs, such as å and Å. To make inputting these characters easier, you can use XP(i, j) in your keymap, where i and j are the mapping table indices of the lower and upper case character, respectively. If you're holding down Shift or have Caps Lock turned on when you press the key, the second (upper case) character will be inserted; otherwise, the first (lower case) version will appear.

This is most useful when creating a keymap for an international layout with special characters. Instead of having to put the lower and upper case versions of a character on separate keys, you can have them both on the same key by using XP(). This helps blend Unicode keys in with regular alphas.

Due to keycode size constraints, i and j can each only refer to one of the first 128 characters in your unicode_map. In other words, 0 ≤ i ≤ 127 and 0 ≤ j ≤ 127. This is enough for most use cases, but if you'd like to customize the index calculation, you can override the unicodemap_index() function. This also allows you to, say, check Ctrl instead of Shift/Caps.
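The 128-entry limit makes sense if you think of each index as occupying 7 bits of a 16-bit keycode. The sketch below models that idea in plain C. The bit layout and the names pack_pair and select_index are assumptions for illustration; check the QMK source for the actual encoding before relying on it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: each index gets 7 bits, so valid indices are 0..127. */
#define PAIR_INDEX_BITS 7
#define PAIR_INDEX_MASK 0x7Fu

/* Pack the lower- and upper-case indices into one 14-bit value. */
uint16_t pack_pair(uint8_t lower, uint8_t upper) {
    return (uint16_t)((lower & PAIR_INDEX_MASK) |
                      ((upper & PAIR_INDEX_MASK) << PAIR_INDEX_BITS));
}

/* Pick the upper-case index when `upper_case` is true, else the lower-case one. */
uint8_t select_index(uint16_t packed, bool upper_case) {
    if (upper_case) {
        packed >>= PAIR_INDEX_BITS;
    }
    return (uint8_t)(packed & PAIR_INDEX_MASK);
}
```

An index of 128 or more does not fit in 7 bits and is silently truncated by the mask, which is why both indices must stay within the first 128 table entries.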


1.3. UCIS :id=ucis

This method also supports all possible code points. As with the Unicode Map method, you need to maintain a mapping table in your keymap file. However, there are no built-in keycodes for this feature — you have to create a custom keycode or function that invokes this functionality.

Add the following to your rules.mk:

UCIS_ENABLE = yes

Then define a table like this in your keymap file:

const qk_ucis_symbol_t ucis_symbol_table[] = UCIS_TABLE(
    UCIS_SYM("poop", 0x1F4A9),                // 💩
    UCIS_SYM("rofl", 0x1F923),                // 🤣
    UCIS_SYM("cuba", 0x1F1E8, 0x1F1FA),       // 🇨🇺
    UCIS_SYM("look", 0x0CA0, 0x005F, 0x0CA0)  // ಠ_ಠ
);

By default, each table entry may be up to 3 code points long. This number can be changed by adding #define UCIS_MAX_CODE_POINTS n to your config.h file.

To use UCIS input, call qk_ucis_start(). Then, type the mnemonic for the character (such as "rofl") and hit Space, Enter or Esc. QMK should erase the "rofl" text and insert the laughing emoji.

Customization

There are several functions that you can define in your keymap to customize the functionality of this feature.

  • void qk_ucis_start_user(void) This runs when you call the "start" function, and can be used to provide feedback. By default, it types out a keyboard emoji.
  • void qk_ucis_success(uint8_t symbol_index) This runs when the input has matched something and has completed. By default, it doesn't do anything.
  • void qk_ucis_symbol_fallback(void) This runs when the input doesn't match anything. By default, it falls back to trying that input as a Unicode code.

You can find the default implementations of these functions in process_ucis.c.
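To make the matching and fallback behavior described above more concrete, here is a simplified standalone model of it. This is not the code from process_ucis.c: the symbol_t type and the find_symbol and parse_hex_fallback helpers are hypothetical, and the real implementation may differ in its details.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CODE_POINTS 3 /* mirrors the default entry length described above */

/* Hypothetical stand-in for a UCIS table entry. Unused slots are 0. */
typedef struct {
    const char *mnemonic;
    uint32_t    code_points[MAX_CODE_POINTS];
} symbol_t;

const symbol_t symbol_table[] = {
    {"poop", {0x1F4A9}},
    {"rofl", {0x1F923}},
    {"cuba", {0x1F1E8, 0x1F1FA}},
    {"look", {0x0CA0, 0x005F, 0x0CA0}},
};

/* Returns the table index whose mnemonic equals `typed`, or -1 if none does. */
int find_symbol(const char *typed) {
    size_t count = sizeof symbol_table / sizeof symbol_table[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(symbol_table[i].mnemonic, typed) == 0) {
            return (int)i;
        }
    }
    return -1;
}

/* Fallback: read `typed` as a hexadecimal code point. Returns 0 on failure. */
uint32_t parse_hex_fallback(const char *typed) {
    char         *end = NULL;
    unsigned long value;

    if (typed[0] == '\0') {
        return 0;
    }
    value = strtoul(typed, &end, 16);
    if (*end != '\0' || value > 0x10FFFF) {
        return 0; /* trailing non-hex characters or out of range */
    }
    return (uint32_t)value;
}
```

In this model, typing "rofl" finds entry 1, while typing "203d" matches no mnemonic and is instead interpreted as the code point 0x203D.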

2. Input Modes :id=input-modes

Unicode input in QMK works by inputting a sequence of characters to the OS, sort of like a macro. Unfortunately, the way this is done differs for each platform. Specifically, each platform requires a different combination of keys to trigger Unicode input. Therefore, a corresponding input mode has to be set in QMK.
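Whatever the platform, the middle of that sequence is the character's code point typed out as hexadecimal digits, wrapped in a platform-specific start and finish step. The standalone sketch below shows only the digit part. The helper name code_point_to_hex_keys is made up, and padding to at least four digits is an assumption about what host input methods expect.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Writes the hex digits for `cp` into `out`, zero-padded to at least four
 * digits. Returns the number of digits written, or -1 if `out` is too small.
 */
int code_point_to_hex_keys(uint32_t cp, char *out, size_t out_size) {
    int written = snprintf(out, out_size, "%04lX", (unsigned long)cp);
    if (written < 0 || (size_t)written >= out_size) {
        return -1;
    }
    return written;
}
```

For example, U+040B becomes the digits 040B and U+1F40D becomes 1F40D. The keyboard then taps those digits between the start and finish keys of the active input mode.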

The following input modes are available:

  • UC_MAC: macOS built-in Unicode hex input. Supports code points up to 0x10FFFF (all possible code points).

    To enable, go to System Preferences > Keyboard > Input Sources, add Unicode Hex Input to the list (it's under Other), then activate it from the input dropdown in the Menu Bar. By default, this mode uses the left Option key (KC_LALT) for Unicode input, but this can be changed by defining UNICODE_KEY_MAC with a different keycode.

    !> Using the Unicode Hex Input input source may disable some Option-based shortcuts, such as Option+Left and Option+Right.

    !> UC_OSX is a deprecated alias of UC_MAC that will be removed in future versions of QMK. All new keymaps should use UC_MAC.

  • UC_LNX: Linux built-in IBus Unicode input. Supports code points up to 0x10FFFF (all possible code points).

    Enabled by default and works almost anywhere on IBus-enabled distros. Without IBus, this mode works under GTK apps, but rarely anywhere else. By default, this mode uses Ctrl+Shift+U (LCTL(LSFT(KC_U))) to start Unicode input, but this can be changed by defining UNICODE_KEY_LNX with a different keycode. This might be required for IBus versions ≥1.5.15, where Ctrl+Shift+U behavior is consolidated into Ctrl+Shift+E.

    Users who want Unicode support in non-GTK apps without IBus may need to resort to a more indirect method, such as creating a custom keyboard layout (see Software keyboard layout on Linux below).

  • UC_WIN: (not recommended) Windows built-in hex numpad Unicode input. Supports code points up to 0xFFFF.

    To enable, create a registry key under HKEY_CURRENT_USER\Control Panel\Input Method of type REG_SZ called EnableHexNumpad and set its value to 1. This can be done from the Command Prompt by running reg add "HKCU\Control Panel\Input Method" -v EnableHexNumpad -t REG_SZ -d 1 with administrator privileges. Reboot afterwards. This mode is not recommended because of reliability and compatibility issues; use the UC_WINC mode instead.

  • UC_BSD: (not implemented) Unicode input under BSD. If you're a BSD user and want to help add support for it, please open an issue on GitHub.

  • UC_WINC: Windows Unicode input using WinCompose. As of v0.9.0, supports code points up to 0x10FFFF (all possible code points).

    To enable, install the latest release of WinCompose. Once installed, WinCompose will automatically run on startup. This mode works reliably under all versions of Windows supported by the app. By default, this mode uses right Alt (KC_RALT) as the Compose key, but this can be changed in the WinCompose settings and by defining UNICODE_KEY_WINC with a different keycode.

3. Setting the Input Mode :id=setting-the-input-mode

To set your desired input mode, add the following define to your config.h:

#define UNICODE_SELECTED_MODES UC_LNX

This example sets the board's default input mode to UC_LNX. You can replace this with UC_MAC, UC_WINC, or any of the other modes listed above. The board will automatically use the selected mode on startup, unless you manually switch to another mode (see below).

You can also select multiple input modes, which allows you to easily cycle through them using the UC_MOD/UC_RMOD keycodes.

#define UNICODE_SELECTED_MODES UC_MAC, UC_LNX, UC_WINC

Note that the values are separated by commas. The board will remember the last used input mode and will continue using it on next power-up. You can disable this and force it to always start with the first mode in the list by adding #define UNICODE_CYCLE_PERSIST false to your config.h.
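Cycling through the selected modes amounts to stepping an index through the list and wrapping around at either end. Here is a minimal standalone sketch of that logic. The function name cycle_mode_index is hypothetical, and the wrap-around behavior is an assumption based on the description above rather than a copy of QMK's code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns the index of the next mode in a list of `count` selected modes.
 * `forward` steps towards the end of the list; otherwise towards the start.
 * Stepping past either end wraps around to the other end.
 */
uint8_t cycle_mode_index(uint8_t current, uint8_t count, bool forward) {
    if (count == 0) {
        return 0; /* nothing to cycle through */
    }
    if (forward) {
        return (uint8_t)((current + 1) % count);
    }
    return (uint8_t)((current + count - 1) % count);
}
```

With the three-mode list from the example above, stepping forward from the last entry (UC_WINC, index 2) returns index 0 (UC_MAC), and stepping backward from index 0 returns index 2.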

Keycodes

You can switch the input mode at any time by using the following keycodes. Adding these to your keymap allows you to quickly switch to a specific input mode, including modes not listed in UNICODE_SELECTED_MODES.

| Keycode | Alias | Input Mode | Description |
|---|---|---|---|
| UNICODE_MODE_FORWARD | UC_MOD | Next in list | Cycle through selected modes, reverse direction when Shift is held |
| UNICODE_MODE_REVERSE | UC_RMOD | Prev in list | Cycle through selected modes in reverse, forward direction when Shift is held |
| UNICODE_MODE_MAC | UC_M_MA | UC_MAC | Switch to macOS input |
| UNICODE_MODE_LNX | UC_M_LN | UC_LNX | Switch to Linux input |
| UNICODE_MODE_WIN | UC_M_WI | UC_WIN | Switch to Windows input |
| UNICODE_MODE_BSD | UC_M_BS | UC_BSD | Switch to BSD input (not implemented) |
| UNICODE_MODE_WINC | UC_M_WC | UC_WINC | Switch to Windows input using WinCompose |
| UNICODE_MODE_EMACS | UC_M_EM | UC_EMACS | Switch to emacs (C-x-8 RET) |

You can also switch the input mode by calling set_unicode_input_mode(x) in your code, where x is one of the above input mode constants (e.g. UC_LNX).

?> Using UNICODE_SELECTED_MODES is preferable to calling set_unicode_input_mode() in matrix_init_user() or similar functions, since it's better integrated into the Unicode system and has the added benefit of avoiding unnecessary writes to EEPROM.

Audio Feedback

If you have the Audio feature enabled on the board, you can set melodies to be played when you press the above keys. That way you can have some audio feedback when switching input modes.

For instance, you can add these definitions to your config.h file:

#define UNICODE_SONG_MAC  AUDIO_ON_SOUND
#define UNICODE_SONG_LNX  UNICODE_LINUX
#define UNICODE_SONG_BSD  TERMINAL_SOUND
#define UNICODE_SONG_WIN  UNICODE_WINDOWS
#define UNICODE_SONG_WINC UNICODE_WINDOWS

Additional Customization

Because Unicode is a large and versatile feature, there are a number of options you can customize to make it work better on your system.

Start and Finish Input Functions

The functions for starting and finishing Unicode input on your platform can be overridden locally. Possible uses include customizing input mode behavior if you don't use the default keys, or adding extra visual/audio feedback to Unicode input.

  • void unicode_input_start(void) This sends the initial sequence that tells your platform to enter Unicode input mode. For example, it holds the left Alt key followed by Num+ on Windows, and presses the UNICODE_KEY_LNX combination (default: Ctrl+Shift+U) on Linux.
  • void unicode_input_finish(void) This is called to exit Unicode input mode, for example by pressing Space or releasing the Alt key.

You can find the default implementations of these functions in process_unicode_common.c.

Input Mode Callbacks

There are callback functions that are called whenever the Unicode input mode changes. The new input mode is passed to the function.

| Callback | Description |
|---|---|
| unicode_input_mode_set_kb(uint8_t input_mode) | Callback for unicode input mode set, for keyboard. |
| unicode_input_mode_set_user(uint8_t input_mode) | Callback for unicode input mode set, for users. |

This feature can be used, for instance, to implement LED indicators for the current unicode input mode.
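As a rough idea of what such an indicator could look like, the sketch below maps each mode to a color. It is written to compile outside QMK, so the SKETCH_UC_* constants and the set_indicator helper are stand-ins invented for this example. In a real keymap you would compare against UC_MAC, UC_LNX and so on, and call your board's own LED or RGB functions instead.

```c
#include <stdint.h>

/* Stand-in mode constants so the sketch compiles outside QMK. */
enum sketch_modes { SKETCH_UC_MAC, SKETCH_UC_LNX, SKETCH_UC_WINC };

/* Last color requested by the callback; stands in for a real LED driver. */
uint8_t indicator_rgb[3] = {0, 0, 0};

/* Hypothetical helper: record the requested indicator color. */
void set_indicator(uint8_t r, uint8_t g, uint8_t b) {
    indicator_rgb[0] = r;
    indicator_rgb[1] = g;
    indicator_rgb[2] = b;
}

/* Same signature as the user callback listed above. */
void unicode_input_mode_set_user(uint8_t input_mode) {
    switch (input_mode) {
        case SKETCH_UC_MAC:
            set_indicator(255, 255, 255); /* white for macOS */
            break;
        case SKETCH_UC_LNX:
            set_indicator(255, 128, 0); /* orange for Linux */
            break;
        case SKETCH_UC_WINC:
            set_indicator(0, 0, 255); /* blue for WinCompose */
            break;
        default:
            set_indicator(0, 0, 0); /* off for any other mode */
            break;
    }
}
```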

Input Key Configuration

You can customize the keys used to trigger Unicode input for macOS, Linux and WinCompose by adding corresponding defines to your config.h. The default values match the platforms' default settings, so you shouldn't need to change this unless Unicode input isn't working, or you want to use a different key (e.g. in order to free up left or right Alt).

| Define | Type | Default | Example |
|---|---|---|---|
| UNICODE_KEY_MAC | uint8_t | KC_LALT | #define UNICODE_KEY_MAC KC_RALT |
| UNICODE_KEY_LNX | uint16_t | LCTL(LSFT(KC_U)) | #define UNICODE_KEY_LNX LCTL(LSFT(KC_E)) |
| UNICODE_KEY_WINC | uint8_t | KC_RALT | #define UNICODE_KEY_WINC KC_RGUI |

Sending Unicode Strings

QMK provides several functions that allow you to send Unicode input to the host programmatically:

send_unicode_string()

This function is much like send_string(), but it allows you to input UTF-8 characters directly. It supports all code points, provided the selected input mode also supports them. Make sure your keymap.c file is saved using UTF-8 encoding.

send_unicode_string("(ノಠ痊ಠ)ノ彡┻━┻");

Example uses include sending Unicode strings when a key is pressed, as described in Macros.
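The reason the file encoding matters is that the string literal has to be decoded from UTF-8 bytes back into code points before anything can be sent. The standalone sketch below shows what decoding a single character involves. It is a simplified illustration with a made-up function name (utf8_decode), not QMK's own decoder, and it does not reject overlong encodings or surrogates.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Decodes one UTF-8 sequence starting at `s` and stores the code point in
 * `*cp`. Returns the number of bytes consumed, or 0 if the bytes are not a
 * well-formed lead byte followed by the right number of continuation bytes.
 */
size_t utf8_decode(const char *s, uint32_t *cp) {
    const unsigned char *p = (const unsigned char *)s;
    size_t               len;
    uint32_t             value;

    if (p[0] < 0x80) {
        *cp = p[0]; /* plain ASCII, one byte */
        return 1;
    } else if ((p[0] & 0xE0) == 0xC0) {
        len   = 2;
        value = p[0] & 0x1F;
    } else if ((p[0] & 0xF0) == 0xE0) {
        len   = 3;
        value = p[0] & 0x0F;
    } else if ((p[0] & 0xF8) == 0xF0) {
        len   = 4;
        value = p[0] & 0x07;
    } else {
        return 0; /* stray continuation byte or invalid lead byte */
    }

    for (size_t i = 1; i < len; i++) {
        if ((p[i] & 0xC0) != 0x80) {
            return 0; /* also catches a string that ends too early */
        }
        value = (value << 6) | (p[i] & 0x3F);
    }
    *cp = value;
    return len;
}
```

If keymap.c is saved in a different encoding, the bytes in the literal no longer form valid UTF-8 sequences, and a decoder like this one would produce the wrong code points or fail.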

Additional Language Support

In quantum/keymap_extras, you'll see various language files — these work the same way as the ones for alternative layouts such as Colemak or BÉPO. When you include one of these language headers, you gain access to keycodes specific to that language / national layout. Such keycodes are defined by a 2-letter country/language code, followed by an underscore and a 4-letter abbreviation of the character to which the key corresponds. For example, including keymap_french.h and using FR_UGRV in your keymap will output ù when typed on a system with a native French AZERTY layout.

If the primary system layout you use on your machine is different from US ANSI, using these language-specific keycodes can help your QMK keymaps better match what will actually be output on the screen. However, keep in mind that these keycodes are just aliases for the corresponding default US keycodes under the hood, and that the HID protocol used by keyboards is itself inherently based on US ANSI.

International Characters on Windows

AutoHotkey

This method does not require Unicode support in the keyboard itself but instead depends on AutoHotkey running in the background.

First you need to select a modifier combination that is not in use by any of your programs. Ctrl+Alt+Win is not used very widely and should therefore be perfect for this. There is a macro defined for a mod-tap combo, LCAG_T. Add this mod-tap combo to a key on your keyboard, e.g.: LCAG_T(KC_TAB). This makes the key behave like a Tab key if pressed and released immediately, but changes it to the modifier if used with another key.

In the default script of AutoHotkey you can define custom hotkeys.

<^<!<#a::Send, ä
<^<!<#<+a::Send, Ä

The hotkeys above are for the combinations Ctrl+Alt+Gui and Ctrl+Alt+Gui+Shift plus the letter a. AutoHotkey inserts the text to the right of Send, when one of these combinations is pressed.

US International

If you enable the US International layout on the system, it will use punctuation to accent the characters. For instance, typing "`a" will result in à. You can find details on how to enable this here.

Software keyboard layout on Linux :id=custom-linux-layout

This method does not require Unicode support on the keyboard itself but instead uses a custom keyboard layout for Xorg. This is how special characters are inserted by regular keyboards. This does not require IBus and works in practically all software. Help on creating a custom layout can be found here, here and here. The following example shows how you could edit the us layout to get 🤣 on RALT(KC_R):

Edit the keyboard layout file /usr/share/X11/xkb/symbols/us.

Inside xkb_symbols "basic" {, add include "level3(ralt_switch)".

Find the line defining the R key and add an entry to the list, making it look like this:

key <AD04> {	[	  r,	R, U1F923		]	};

Save the file and run the command setxkbmap us to reload the layout.

You can define one custom character for each key defined in the layout, and another if you populate the fourth level. Additional levels, up to the eighth, are also possible.

This method is specific to the computer on which you set the custom layout. The custom keys will be available only when Xorg is running. To avoid accidents, always reload the layout using setxkbmap after editing it. Otherwise, an invalid layout could prevent you from logging into your system, locking you out.