Environment Variables You may wish to specify settings of several important environment variables that affect Cygwin's operation. Some of these settings need to be in effect prior to launching the initial Cygwin session (before starting your bash shell, for instance). They should therefore be set in the Windows environment; all Windows environment variables are imported when Cygwin starts. Such settings can be placed in a .bat file. An initial file is named Cygwin.bat and is created in the Cygwin root directory that you specified during setup. Note that the "Cygwin" option of the Start Menu points to Cygwin.bat. Edit Cygwin.bat to your liking or create your own .bat files to start Cygwin processes. The CYGWIN variable is used to configure many global settings for the Cygwin runtime system. Initially you can leave CYGWIN unset or set it to tty (e.g. to support job control with ^Z etc...) using a syntax like this in the DOS shell, before launching bash. C:\> set CYGWIN=tty notitle glob Locale support is controlled by the LANG and LC_xxx environment variables. You can set all of them but Cygwin itself only honors the variables LC_ALL, LC_CTYPE, and LANG, in this order, according to the POSIX standard. The first one found rules. For a more detailed description see . The PATH environment variable is used by Cygwin applications as a list of directories to search for executable files to run. This environment variable is converted from Windows format (e.g. C:\Windows\system32;C:\Windows) to UNIX format (e.g., /cygdrive/c/Windows/system32:/cygdrive/c/Windows) when a Cygwin process first starts. Set it so that it contains at least the x:\cygwin\bin directory where "x:\cygwin is the "root" of your cygwin installation if you wish to use cygwin tools outside of bash. This is usually done by the batch file you're starting your shell with. The HOME environment variable is used by many programs to determine the location of your home directory and we recommend that it be defined. This environment variable is also converted from Windows format when a Cygwin process first starts. It's usually set in the shell profile scripts in the /etc directory. The TERM environment variable specifies your terminal type. It is automatically set to cygwin if you have not set it to something else. The LD_LIBRARY_PATH environment variable is used by the Cygwin function dlopen () as a list of directories to search for .dll files to load. This environment variable is converted from Windows format to UNIX format when a Cygwin process first starts. Most Cygwin applications do not make use of the dlopen () call and do not need this variable. In addition to PATH, HOME, and LD_LIBRARY_PATH, there are three other environment variables which, if they exist in the Windows environment, are converted to UNIX format: TMPDIR, TMP, and TEMP. The first is not set by default in the Windows environment but the other two are, and they point to the default Windows temporary directory. If set, these variables will be used by some Cygwin applications, possibly with unexpected results. You may therefore want to unset them by adding the following two lines to your ~/.bashrc file: unset TMP unset TEMP This is done in the default ~/.bashrc file. Alternatively, you could set TMP and TEMP to point to /tmp or to any other temporary directory of your choice. For example: export TMP=/tmp export TEMP=/tmp Changing Cygwin's Maximum Memory Cygwin's heap is extensible. However, it does start out at a fixed size and attempts to extend it may run into memory which has been previously allocated by Windows. In some cases, this problem can be solved by adding an entry in the either the HKEY_LOCAL_MACHINE (to change the limit for all users) or HKEY_CURRENT_USER (for just the current user) section of the registry. Add the DWORD value heap_chunk_in_mb and set it to the desired memory limit in decimal MB. It is preferred to do this in Cygwin using the regtool program included in the Cygwin package. (For more information about regtool or the other Cygwin utilities, see or use the --help option of each util.) You should always be careful when using regtool since damaging your system registry can result in an unusable system. This example sets memory limit to 1024 MB: regtool -i set /HKLM/Software/Cygwin/heap_chunk_in_mb 1024 regtool -v list /HKLM/Software/Cygwin Exit all running Cygwin processes and restart them. Memory can be allocated up to the size of the system swap space minus any the size of any running processes. The system swap should be at least as large as the physically installed RAM and can be modified under the System category of the Control Panel. Here is a small program written by DJ Delorie that tests the memory allocation limit on your system: main() { unsigned int bit=0x40000000, sum=0; char *x; while (bit > 4096) { x = malloc(bit); if (x) sum += bit; bit >>= 1; } printf("%08x bytes (%.1fMb)\n", sum, sum/1024.0/1024.0); return 0; } You can compile this program using: gcc max_memory.c -o max_memory.exe Run the program and it will output the maximum amount of allocatable memory. Internationalization Overview Internationalization support is controlled by the LANG and LC_xxx environment variables. You can set all of them but Cygwin itself only honors the variables LC_ALL, LC_CTYPE, and LANG, in this order, according to the POSIX standard. The content of these variables should follow the POSIX standard for a locale specifier. The correct form of a locale specifier is language[[_TERRITORY][.charset][@modifier]] "language" is a lowercase two character string per ISO 639-1, or, if there is no ISO 639-1 code for the language (for instance, "Lower Sorbian"), a three character string per ISO 639-3. "TERRITORY" is an uppercase two character string per ISO 3166, charset is one of a list of supported character sets. The modifier doesn't matter here (though some are recognized, see below). If you're interested in the exact description, you can find it in the online publication of the POSIX manual pages on the homepage of the Open Group. Typical locale specifiers are "de_CH" language = German, territory = Switzerland, default charset "fr_FR.UTF-8" language = french, territory = France, charset = UTF-8 "ko_KR.eucKR" language = korean, territory = South Korea, charset = eucKR "syr_SY" language = Syriac, territory = Syria, default charset If the locale specifier does not follow the above form, Cygwin checks if the locale is one of the locale aliases defined in the file /usr/share/locale/locale.alias. If so, and if the replacement localename is supported by the underlying Windows, the locale is accepted, too. So, given the default content of the /usr/share/locale/locale.alias file, the below examples would be valid locale specifiers as well. "catalan" defined as "ca_ES.ISO-8859-1" in locale.alias "japanese" defined as "ja_JP.eucJP" in locale.alias "turkish" defined as "tr_TR.ISO-8859-9" in locale.alias The file /usr/share/locale/locale.alias is provided by the gettext package under Cygwin. At application startup, the application's locale is set to the default "C" or "POSIX" locale. Under Cygwin 1.7.2 and later, this locale defaults to the ASCII character set on the application level. If you want to stick to the "C" locale and only change to another charset, you can define this by setting one of the locale environment variables to "C.charset". For instance "C.ISO-8859-1" The default locale in the absence of the aforementioned locale environment variables is "C.UTF-8". Windows uses the UTF-16 charset exclusively to store the names of any object used by the Operating System. This is especially important with filenames. Cygwin uses the setting of the locale environment variables LC_ALL, LC_CTYPE, and LANG, to determine how to convert Windows filenames from their UTF-16 representation to the singlebyte or multibyte character set used by Cygwin. The setting of the locale environment variables at process startup is effective for Cygwin's internal conversions to and from the Windows UTF-16 object names for the entire lifetime of the current process. Changing the environment variables to another value changes the way filenames are converted in subsequently started child processes, but not within the same process. However, even if one of the locale environment variables is set to some other value than "C", this does only affect how Cygwin itself converts filenames. As the POSIX standard requires, it's the application's responsibility to activate that locale for its own purposes, typically by using the call setlocale (LC_ALL, ""); early in the application code. Again, so that this doesn't get lost: If the application calls setlocale as above, and there is none of the important locale variables set in the environment, the locale is set to the default locale, which is "C.UTF-8". But what about applications which are not locale-aware? Per POSIX, they are running in the "C" or "POSIX" locale, which implies the ASCII charset. The Cygwin DLL itself, however, will nevertheless use the locale set in the environment (or the "C.UTF-8" default locale) for converting filenames etc. When the locale in the environment specifies an ASCII charset, for example "C" or "en_US.ASCII", Cygwin will still use UTF-8 under the hood to translate filenames. This allows for easier interoperability with applications running in the default "C.UTF-8" locale. Starting with Cygwin 1.7.2, the language and territory are used to fetch locale-dependent information from Windows. If the language and territory are not known to Windows, the setlocale function fails. The following modifiers are recognized. Any other modifier is simply ignored for now. For languages which default to the ISO-8859-1 character set, the modifier "@euro" can be added to enforce usage of the ISO-8859-15 character set, which includes a character for the "Euro" currency sign. Beware, that also works for non-european locales. The default script used for all Serbian language locales (sr_BA, sr_ME, sr_RS, and the deprecated sr_CS and sr_SP) is cyrillic. With the "@latin" modifier it gets switched to the latin script with the respective collation behaviour. The default charset of the "be_BY" locale (Belarusian/Belarus) is CP1251. With the "@latin" modifier it's UTF-8. The default charset of the "tt_RU" locale (Tatar/Russia) is ISO-8859-5. With the "@iqtelif" modifier it's UTF-8. The default charset of the "uz_UZ" locale (Uzbek/Uzbekistan) is ISO-8859-1. With the "@cyrillic" modifier it's UTF-8. There's a class of characters in the Unicode character set, called the "CJK Ambiguous Width Character set". For these characters the width returned by the wcwidth/wcswidth function is usually 1. This is often a problem in East-Asian languages, which historically use character sets in which these characters have a width of 2. By default, the wcwidth/wcswidth functions return 1 as the width of these characters, except if the language is specifed as "ja" (Japanese), "ko" (Korean), or "zh" (Chinese). In these languages wcwidth and wcswidth return 2 for these characters. This is not correct in all circumstances, so the user of one of these languages can specify the modifier "@cjknarrow", which modifies the behaviour of wcwidth/wcswidth to return 1 for the ambiguous width characters. How to set the locale Assume that you've set one of the aforementioned environment variables to some valid POSIX locale value, other than "C" and "POSIX". Assume further that you're living in Japan. You might want to use the language code "ja" and the territory "JP", thus setting, say, LANG to "ja_JP". You didn't set a character set, so what will Cygwin use now? Starting with Cygwin 1.7.2, the default character set is determined by the default Windows ANSI codepage for this language and territory. Cygwin uses a character set which is the typical Unix-equivalent to the Windows ANSI codepage. For instance: "en_US" ISO-8859-1 "el_GR" ISO-8859-7 "pl_PL" ISO-8859-2 "pl_PL@euro" ISO-8859-15 "ja_JP" EUCJP "ko_KR" EUCKR "te_IN" UTF-8 You don't want to use the default character set? In that case you have to specify the charset explicitly. For instance, assume you're from Japan and don't want to use the japanese default charset EUC-JP, but the Windows default charset SJIS. What you can do, for instance, is to set the LANG variable in the C:\cygwin\Cygwin.bat file which is the batch file to start a Cygwin session from the "Cygwin" desktop shortcut. @echo off C: chdir C:\cygwin\bin set LANG=ja_JP.SJIS bash --login -i For a list of locales supported by your Windows machine, use the new >getlocale -a command, which is part of the Cygwin package. For a description see For a list of supported character sets, see Last, but not least, most singlebyte or doublebyte charsets have a big disadvantage. Windows filesystems use the Unicode character set in the UTF-16 encoding to store filename information. Not all characters from the Unicode character set are available in a singlebyte or doublebyte charset. While Cygwin has a workaround to access files with unusual characters (see ), a better workaround is to use always the UTF-8 character set.i UTF-8 is the only multibyte character set which can represent every Unicode character. set LANG=es_MX.UTF-8 For a description of the Unicode standard, see the homepage of the Unicode Consortium. The Windows Console character set Most of the time the Windows console is used to run Cygwin applications. While terminal emulations like xterm or mintty have a distinct way to set the character set used for in- and output, the Windows console hasn't such a way, since it's not an application in its own right. This problem is solved in Cygwin as follows. When a Cygwin process is started in a Windows console (either explicitly from cmd.exe, or implicitly by, for instance, clicking on the Cygwin desktop icon, or running the Cygwin.bat file), the Console character set is determined by the setting of the aforementioned internationalization environment variables, the same way as described in . What is that good for? Why not switch the console character set with the applications requirements? After all, the application knows if it uses localization or not. However, what if a non-localized application calls a remote application which itself is localized? This can happen with ssh or rlogin. Both commands don't have and don't need localization and they never call setlocale. Setting one of the internationalization environment variable to the same charset as the remote machine before starting ssh or rlogin fixes that problem. Potential Problems when using Locales You can set the above internationalization variables not only in Cygwin.bat or in the Windows environment, but also in your Cygwin shell on the fly, even switch to yet another character set, and yet another. In bash for instance: bash$ export LC_CTYPE="nl_BE.UTF-8" However, here's a problem. At the start of the first Cygwin process in a session, the Windows environment is converted from UTF-16 to UTF-8. The environment is another of the system objects stored in UTF-16 in Windows. As long as the environment only contains ASCII characters, this is no problem at all. But if it contains native characters, and you're planning to use, say, GBK, the environment will result in invalid characters in the GBK charset. This would be especially a problem in variables like PATH. To circumvent the worst problems, Cygwin converts the PATH environment variable to the charset set in the environment, if it's different from the UTF-8 charset. Per POSIX, the name of an environment variable should only consist of valid ASCII characters, and only of uppercase letters, digits, and the underscore for maximum portablilty. Symbolic links, too, may pose a problem when switching charsets on the fly. A symbolic link contains the filename of the target file the symlink points to. When a symlink had been created with older versions of Cygwin, the current ANSI or OEM character set had been used to store the target filename, dependent on the old CYGWIN environment variable setting codepage (see . If the target filename contains non-ASCII characters and you use another character set than your default ANSI/OEM charset, the target filename of the symlink is now potentially an invalid character sequence in the new character set. This behaviour is not different from the behaviour in other Operating Systems. So, if you suddenly can't access a symlink anymore which worked all these years before, maybe it's because you switched to another character set. This doesn't occur with symlinks created with Cygwin 1.7 or later. Another problem you might encounter is that older versions of Windows did not install all charsets by default. If you are running Windows XP or older, you can open the "Regional and Language Options" portion of the Control Panel, select the "Advanced" tab, and select entries from the "Code page conversion tables" list. The following entries are useful to cygwin: 932/SJIS, 936/GBK, 949/EUC-KR, 950/Big5, 20932/EUC-JP. What does not work? The environment variable and locale setting LC_MESSAGES is ignored right now. There's no known WIndows function to fetch the regular expressions to recognize user input with the meaning of "yes" or "no" from some Windows function. Therefore, nl_langinfo(YESEXPR) and nl_langinfo(NOEXPR) always return a string suitable only for the English language. If somebody knows a simple solution to this problem, feel free to notify us on the Cygwin mailing list. List of supported character sets Last but not least, here's the list of currently supported character sets. The left-hand expression is the name of the charset, as you would use it in the internationalization environment variables as outlined above. Note that charset specifiers are case-insensitive. EUCJP is equivalent to eucJP or eUcJp. Writing the charset in the exact case as given in the list below is a good convention, though. The right-hand side is the number of the equivalent Windows codepage as well as the Windows name of the codepage. They are only noted here for reference. Don't try to use the bare codepage number or the Windows name of the codepage as charset in locale specifiers, unless they happen to be identical with the left-hand side. Especially in case of the "CPxxx" style charsets, always use them with the trailing "CP". This works: set LC_ALL=en_US.CP437 This does not work: set LC_ALL=en_US.437 You can find a full list of Windows codepages on the Microsoft MSDN page Code Page Identifiers. Charset Codepage ------------------- ------------------------------------------- ASCII 20127 (US_ASCII) CP437 437 (OEM United States) CP720 720 (DOS Arabic) CP737 737 (OEM Greek) CP775 775 (OEM Baltic) CP850 850 (OEM Latin 1, Western European) CP852 852 (OEM Latin 2, Central European) CP855 855 (OEM Cyrillic) CP857 857 (OEM Turkish) CP858 858 (OEM Latin 1 + Euro Symbol) CP862 862 (OEM Hebrew) CP866 866 (OEM Russian) CP874 874 (ANSI/OEM Thai) CP932 932 (Shift_JIS, not exactly identical to SJIS) CP1125 1125 (OEM Ukraine) CP1250 1250 (ANSI Central European) CP1251 1251 (ANSI Cyrillic) CP1252 1252 (ANSI Latin 1, Western European) CP1253 1253 (ANSI Greek) CP1254 1254 (ANSI Turkish) CP1255 1255 (ANSI Hebrew) CP1256 1256 (ANSI Arabic) CP1257 1257 (ANSI Baltic) CP1258 1258 (ANSI/OEM Vietnamese) ISO-8859-1 28591 (ISO-8859-1) ISO-8859-2 28592 (ISO-8859-2) ISO-8859-3 28593 (ISO-8859-3) ISO-8859-4 28594 (ISO-8859-4) ISO-8859-5 28595 (ISO-8859-5) ISO-8859-6 28596 (ISO-8859-6) ISO-8859-7 28597 (ISO-8859-7) ISO-8859-8 28598 (ISO-8859-8) ISO-8859-9 28599 (ISO-8859-9) ISO-8859-10 - (not available) ISO-8859-11 - (not available) ISO-8859-13 28603 (ISO-8859-13) ISO-8859-14 - (not available) ISO-8859-15 28605 (ISO-8859-15) ISO-8859-16 - (not available) Big5 950 (ANSI/OEM Traditional Chinese) EUCJP or euc-JP 20932 (EUC Japanese) EUCKR or euc-KR 949 (EUC Korean) GBK 936 (ANSI/OEM Simplified Chinese) GEORGIAN-PS - (not available) KOI8-R 20866 (KOI8-R Russian Cyrillic) KOI8-U 21866 (KOI8-U Ukrainian Cyrillic) PT154 - (not available) SJIS - (not available, almost, but not exactly CP932) TIS620 or TIS-620 874 (ANSI/OEM Thai) UTF-8 or utf8 65001 (UTF-8) Customizing bash To set up bash so that cut and paste work properly, click on the "Properties" button of the window, then on the "Misc" tab. Make sure that "QuickEdit mode" and "Insert mode" are checked. These settings will be remembered next time you run bash from that shortcut. Similarly you can set the working directory inside the "Program" tab. The entry "%HOME%" is valid, but requires that you set HOME in the Windows environment. Your home directory should contain three initialization files that control the behavior of bash. They are .profile, .bashrc and .inputrc. The Cygwin base installation creates stub files when you start bash for the first time. .profile (other names are also valid, see the bash man page) contains bash commands. It is executed when bash is started as login shell, e.g. from the command bash --login. This is a useful place to define and export environment variables and bash functions that will be used by bash and the programs invoked by bash. It is a good place to redefine PATH if needed. We recommend adding a ":." to the end of PATH to also search the current working directory (contrary to DOS, the local directory is not searched by default). Also to avoid delays you should either unset MAILCHECK or define MAILPATH to point to your existing mail inbox. .bashrc is similar to .profile but is executed each time an interactive bash shell is launched. It serves to define elements that are not inherited through the environment, such as aliases. If you do not use login shells, you may want to put the contents of .profile as discussed above in this file instead. shopt -s nocaseglob will allow bash to glob filenames in a case-insensitive manner. Note that .bashrc is not called automatically for login shells. You can source it from .profile. .inputrc controls how programs using the readline library (including bash) behave. It is loaded automatically. For full details see the Function and Variable Index section of the GNU readline manual. Consider the following settings: # Ignore case while completing set completion-ignore-case on # Make Bash 8bit clean set meta-flag on set convert-meta off set output-meta on The first command makes filename completion case insensitive, which can be convenient in a Windows environment. The next three commands allow bash to display 8-bit characters, useful for languages with accented characters. Note that tools that do not use readline for display, such as less and ls, require additional settings, which could be put in your .bashrc: alias less='/bin/less -r' alias ls='/bin/ls -F --color=tty --show-control-chars'