.. _i18n:

######################################
Internationalization Coding Guidelines
######################################

Preparing code to be presented in many languages can be complex and difficult.
This section presents best practices for marking English strings in source so
that they can be extracted, translated, and displayed to the user in the
language of their choice.

.. contents::
   :depth: 1
   :local:


**********************************
General Internationalization Rules
**********************************

For source files to be successfully localized, you need to prepare them so that
any human-readable strings can be extracted by a pre-processing step, and then
have localized strings used at runtime. This preparation requires attention to
detail, and unfortunately limits what you can do with strings in the code.

Follow these general rules for internationalizing your code.

#. Always mark complete sentences for translation. If you combine fragments at
   runtime, there is no way for the translator to construct a proper sentence
   in their language.

#. Do not combine strings together at runtime.

#. Limit the amount of text in strings that is not presented to the user. HTML
   markup is better applied after translation. If you include HTML in strings
   there is a chance that translators will translate your tags or attributes.

#. Use placeholders with descriptive names: ``"Welcome {student_name}"`` is
   much better than ``"Welcome {0}"``.

For details, see :ref:`i18n Style Guidelines`.


********************
Editing Source Files
********************

When you edit source files (including Python, JavaScript, or HTML template
files), you should be aware of the following conventions.

#. Know what has to be at the top of the file (if anything) to prepare it for
   i18n.

#. Know how strings are marked for internationalization. This markup typically
   takes the form of a function call with the string as an argument.

#. Know how translator comments are indicated. Such comments in the file will
   travel with the strings to the translators, giving them context to produce
   the best translation. Translator comments have a ``Translators:`` marker
   and must appear on the line preceding the text they describe. In Python,
   multi-line comments are supported for translator comments that need to be
   wrapped.

The code samples below show how to do each of these things for a variety of
programming languages.

.. note:: Take into account not just the programming language involved, but the
   type of file that you are preparing for internationalization. For example,
   JavaScript embedded in an HTML Mako template is treated differently than
   JavaScript in a pure .js file. There are many different escaping methods that
   you can use. For more details, see :ref:`Preventing XSS`.

.. contents::
   :depth: 1
   :local:


.. _i18n Python Source Code:

==================
Python Source Code
==================

.. highlight:: python

In most Python source code, indicate strings for translation and add
translator comments as shown. For more details, refer to the Django
documentation. ::

    from django.utils.translation import ugettext as _

    # Translators: This will help the translator
    message = _("Welcome!")

    # Translators: This is a very long comment that needs to wrap
    # over multiple lines because it would be too long otherwise.
    message = _("Hello world")

Some edX code cannot use Django imports. To maintain portability, XBlocks,
XModules, Inputtypes and Responsetypes forbid importing Django. Each of these
has its own way of accessing translations, as shown in the following examples. ::

    ### for XBlock & XModule:
    _ = self.runtime.service(self, "i18n").ugettext
    # Translators: a greeting to newly-registered students.
    message = _("Welcome!")

    # for InputType and ResponseType:
    _ = self.capa_system.i18n.ugettext
    # Translators: a greeting to newly-registered students.
    message = _("Welcome!")

Translator comments will work in these places too, so wherever possible,
provide clarifying comments for translators. However, be aware of a quirk in
the Python parser. When you write translator comments, make sure the message
string is on the very next line after the comment.

The following example is not correct. ::

    # Translators: this comment won't be properly harvested!
    message = _(
        "Long message "
        "on a few lines."
    )

This example is correct. ::

    message = _(
        # Translators: this comment will be properly harvested!
        "Long message "
        "on a few lines."
    )


.. _i18n Django Template Files:

=====================
Django Template Files
=====================

.. highlight:: django

In Django template files (`templates/*.html`), indicate strings for
translation and add translator comments as shown. ::

    {% load i18n %}

    {# Translators: this will help the translator. #}
    {% trans "Welcome!" %}


.. _i18n Mako Template Files:

===================
Mako Template Files
===================

.. highlight:: mako

In Mako template files (`templates/*.html`), you can use all of the tools
available to Python programmers. Just make sure to import the relevant
functions first. Here is a Mako template example. ::

    <%page expression_filter="h"/>
    <%! from django.utils.translation import ugettext as _ %>
    ...
    ## Translators: message to the translator. This comment may
    ## wrap on to multiple lines if needed, as long as they are
    ## lines directly above the marked up string.
    ${_("Welcome!")}

Make sure that all Mako comments, including translators comments, begin
with *two* pound signs (#).

All translated strings should be text, not HTML. This means that for display
in an HTML page, the strings must be HTML-escaped. In the example above, HTML-
escaping is handled through the ``<%page>`` directive with the ``h`` filter.
For more information, see :ref:`Preventing XSS`.

To mix plain text and HTML using ``format()``, you must use the ``HTML()`` and
``Text()`` functions. Use the ``HTML()`` function when you have a replacement
string that contains HTML tags. For the ``HTML()`` function to work, you must
use the ``Text()`` function to wrap the plain text translated string. Both the
``HTML()`` and ``Text()`` functions must be closed before any calls to
``format()``.

.. code-block:: mako

    <%page expression_filter="h"/>
    <%!
    from django.utils.translation import ugettext as _

    from openedx.core.djangolib.markup import HTML, Text
    %>
    ...
    ${Text(_("Click over to {link_start}the home page{link_end}.")).format(
        link_start=HTML('<a href="/home">'),
        link_end=HTML('</a>'),
    )}


You can nest the formatting further. The rule is, any string which is HTML
should be wrapped in the ``HTML()`` function, and any string which is not
wrapped in ``HTML()`` should be escaped as needed to be displayed as regular
text. Again, you must close the ``HTML()`` and ``Text()`` calls before making
any call to ``format()``.

.. code-block:: mako

    <%page expression_filter="h"/>
    <%!
    from django.utils.translation import ugettext as _

    from openedx.core.djangolib.markup import HTML, Text
    %>
    ...
    ${Text(_("Click over to {link_start}the home page{link_end}.")).format(
        link_start=HTML('<a href="{}">').format(home_page_link),
        link_end=HTML('</a>'),
    )}

For more information on proper escaping, see :ref:`Preventing XSS`.


.. _i18n JavaScript Files:

================
JavaScript Files
================

.. highlight:: html

To internationalize JavaScript, the HTML template (``base.html``) must first
load a special JavaScript library, and Django must be configured to serve it.
::

    <script type="text/javascript" src="jsi18n/"></script>

.. highlight:: javascript

Then, in JavaScript files (``*.js``)::

    // Translators: this will help the translator.
    var message = gettext('Welcome!');

For interpolation with translated strings, you must use
``StringUtils.interpolate`` or ``HtmlUtils.interpolateHtml``, as shown in the
following example. ::

    var message = StringUtils.interpolate(
        gettext('You are enrolling in {courseName}'),
        {
            courseName: 'Rock & Roll 101'
        }
    )

For more details on how to use ``StringUtils`` and ``HtmlUtils``, see :ref:`Safe
JavaScript Files <Safe JavaScript Files>`.

Note that JavaScript embedded in HTML in a Mako template file is handled
differently. There, you must use the Mako syntax even within the JavaScript.


.. _i18n CoffeeScript Files:

==================
CoffeeScript Files
==================

.. highlight:: coffeescript

CoffeeScript files are compiled to JavaScript files, so you indicate strings
for translation and add translator comments mostly as you would in
:ref:`Javascript <i18n JavaScript Files>`. ::

    `// Translators: this will help the translator.`
    message = gettext('Hey there!')

    # Interpolation must use JavaScript, not CoffeeScript interpolation
    var message = StringUtils.interpolate(
        gettext('You are enrolling in {courseName}'),
        {
            courseName: 'Rock & Roll 101'
        }
    )

However, because strings are extracted from the compiled .js files, some native
CoffeeScript features break the extraction from the .js files. Be aware of the
following rules.

#. Do not use CoffeeScript string interpolation. Doing so results in string
   concatenation in the .js file, preventing string extraction. Instead, use
   ``StringUtils.interpolate`` and ``HtmlUtils.interpolateHtml`` as documented
   in :ref:`Safe JavaScript Files <Safe JavaScript Files>`.

#. Do not use CoffeeScript comments for translator comments. They are not
   passed through to the JavaScript files.

::

    # DO NOT do this:
    # Translators: this won't get to the translators!
    message = gettext("This won't work")

    # YES do this:
    `// Translators: this will get to the translators.`
    message = gettext("This works")

    ###
    Translators: This will work, but takes three lines :(
    ###
    message = gettext("Hey there")

.. highlight:: python


.. _i18n Underscore Template Files:

=========================
Underscore Template Files
=========================

Underscore template files are used in conjunction with JavaScript, so the same
techniques that are used for localization in :ref:`Javascript <i18n JavaScript
Files>` are used for Underscore template files.

Make sure that the i18n JavaScript library has already been loaded, and then
use the i18n function ``gettext`` and the ``StringUtils`` function
``StringUtils.interpolate`` in your template, as shown in this example.

.. code-block:: javascript

    <%-
        StringUtils.interpolate(
            gettext('You are enrolling in {courseName}'),
            {
                courseName: 'Rock & Roll 101'
            }
        )
    %>

.. important:: Due to a bug in the underlying underscore extraction library,
   when ``StringUtils.interpolate`` and ``gettext`` are on the same line, the
   library will not work properly. In such cases, the library will extract the
   word ``gettext`` rather than the actual string that needs to be extracted.
   Make sure to separate ``StringUtils.interpolate`` and ``gettext`` into two
   different lines, as shown in the example above.

.. note:: You must use ``<%-`` for all translated strings that do not include
   HTML tags, as this will HTML-escape the strings before including them in the
   page.

If you have a translated string that includes a mix of HTML and plain text,
you must use ``HtmlUtils.interpolateHtml`` along with ``<%=``. Using ``<%=``
is only acceptable when you use an ``HtmlUtils`` function.

.. code-block:: javascript

    <%=
        HtmlUtils.interpolateHtml(
            gettext('You are enrolling in {spanStart}{courseName}{spanEnd}'),
            {
                courseName: 'Rock & Roll 101',
                spanStart: HtmlUtils.HTML('<span class="course-title">'),
                spanEnd: HtmlUtils.HTML('</span>')
            }
        )
    %>

You can access ``HtmlUtils`` and ``StringUtils`` from inside a template that is
processed using ``HtmlUtils.template()``. For more details regarding the use of
``StringUtils`` and ``HtmlUtils``, see :ref:`Safe JavaScript Files <Safe
JavaScript Files>`.

Currently, translator comments are not supported in underscore template files,
because the underlying library does not parse them out. You should add
translator comments using standard comment syntax, so that when work is done
to support translator comments, the comments are already defined in your code.
Additionally, translator comments in the code will enable us to answer
questions from translators.


.. _i18n Other Types of Content:

======================
Other Types of Content
======================

We have not yet established guidelines for internationalizing the following
types of content.

* Course content (such as subtitles for videos)
* Documentation (written for Sphinx as .rst files)


.. _i18n Coverage Testing:

****************
Coverage Testing
****************

These instructions assume that you are a developer working on
internationalizing new or existing user-facing features. To test that your
code is properly internationalized, you generate a set of 'dummy'
translations, then view those translations on your app's page to make sure
everything (scraping and serving) is working properly.

First, use the coverage tool to generate dummy files.

.. code-block:: bash

    paver i18n_dummy

This step creates new dummy translations in the Esperanto directory
(edx-platform/conf/local/eo/LC_MESSAGES) and the RTL directory
(edx-platform/conf/local/rtl/LC_MESSAGES). DO NOT CHECK THESE FILES IN. You
should discard these files once you have finished testing.

Next, restart the LMS and Studio to load in the new translation files.

.. code-block:: bash

    paver devstack lms
    paver devstack studio

Append ``/update_lang/`` to the root LMS or Studio URL and use the form to set
the preview language. The language code ``eo`` can be used to specify the test
language.

Instead of plain English strings, you should see specially-accented English
strings that look like this example.

    Thé Fütüré øf Ønlïné Édüçätïøn Ⱡσяєм ι#
    Før änýøné, änýwhéré, änýtïmé Ⱡσяєм #

This dummy text is distinguished by extra accent characters. If you see plain
English without these accents, it most likely means that the strings have not
yet been marked for translation, or you have broken a rule. To fix this issue,
follow these steps.

#. Find the strings in the source tree (either in Python, JavaScript, or HTML
   template code).

#. Refer to the coding guidelines above to make sure the strings have been
   properly externalized.

#. Rerun the scripts and confirm that the strings are now properly converted
   into dummy text.

This dummy text is also distinguished by "Lorem ipsum" text at the end of each
string, and is always terminated with "#". The original English string is
padded with about 30% more characters, to simulate languages (such as German)
which tend to have longer strings than English. If you see problems with your
page layout, such as columns that do not fit, or text that is truncated (the
``#`` character should always be displayed on every string), then you will
probably need to fix the page layouts accordingly to accommodate the longer
strings.

Finally, append ``/update_lang/`` to the root LMS or Studio URL and set the
language code to ``rtl`` to view your feature in the dummy right-to-left
(RTL) language. Test to make sure that the user interface is properly
"flipped" to right-to-left mode. Note that certain page elements might not
look correct because they are controlled by the browser. For more effective testing,
switch your browser's language to Arabic or another RTL language (Hebrew,
Persian, or Urdu) as well. See our `RTL UI Guidelines`_ for information
about fixing any issues that you find.

When you have finished reviewing, append ``/update_lang/`` to the LMS or
Studio URL and reset your session to your base language.

====================
Set Preview Language
====================

Before you set the preview language, sign in to either LMS or Studio. Then
append  ``/update_lang/`` to the root LMS or Studio URL. A form appears for you
to set or  clear the preview language. Set the Language code (for example use
``eo`` for the  test language Esperanto), and then select **Submit** to set the
preview language.  Use the **Reset** button to reset the preview language to
your default setting.  Refresh your browser page to display the page in the
selected language. The  language persists for the duration of your session.

.. _RTL UI Guidelines: https://github.com/openedx/edx-platform/wiki/RTL-UI-Best-Practices


.. _i18n Style Guidelines:

****************
Style Guidelines
****************

===========================================
Do Not Append Strings or Interpolate Values
===========================================

It can be difficult for translators to provide reasonable translations of
small sentence fragments. If your code appends sentence fragments, even if it
seems fine in English, the same concatenation is very unlikely to work
properly for other languages.

Bad::

    message = _("Welcome to the ") + settings.PLATFORM_NAME + _(" dashboard.")

In this scenario, the translator has to figure out how to translate these two
separate strings. It is very difficult to translate a fragment such as
"Welcome to the." In some languages, the fragments will be in a different
order. For example, in Spanish this phrase would be ordered as "Welcome to the
dashboard of edX".

It is much easier for a translator to figure out how to translate the entire
sentence, using the pattern "Welcome to the {platform_name} dashboard."

Good::

    message = _("Welcome to the {platform_name} dashboard.").format(platform_name=settings.PLATFORM_NAME)


Note that you cannot concatenate (+) within the ``gettext`` call at all. The
following example does not work.

Bad::

    message = _(
        "Welcome to {platform_name}, the online learning platform " +
        "that hosts courses from world-class universities around the world!"
    ).format(platform_name=settings.PLATFORM_NAME)

In Python, because _() is a function, the following example works.

Good (Python only!)::

    message = _(
        "Welcome to {platform_name}, the online learning platform "
        "that hosts courses from world-class universities around the world!"
    ).format(platform_name=settings.PLATFORM_NAME)

However, in JavaScript and other languages, the ``gettext`` call cannot be
broken up over multiple lines. You will have to live with long lines on
``gettext`` calls, and we make a style exception for this.

Bad::

    message = gettext('Here is a really really long message that is' +
        'incorrectly broken over two lines.')

Good (JavaScript)::

    message = gettext('Here is a really really long message that is correctly left on a single line.')

======================
Use Named Placeholders
======================

Python string formatting provides both positional and named placeholders. Use
named placeholders, never use positional placeholders. Positional placeholders
cannot be translated into other languages, which might need to re-order them
to make syntactically correct sentences. Even with a single placeholder, a
named placeholder provides more context to the translator.

Bad::

    message = _('Today is %s %d.') % (m, d)

OK::

    message = _('Today is %(month)s %(day)s.') % {'month': m, 'day': d}

Best::

    message = _('Today is {month} {day}.').format(month=m, day=d)

Notice that in English, the month comes first, but in Spanish the day comes
first. This is reflected in the .po file in the following way. ::

    # fragment from edx-platform/conf/locale/es/LC_MESSAGES/django.po
    msgid "Today is {month} {day}."
    msgstr "Hoy es {day} de {month}."

The resulting output is correct in each language. ::

    English output: "Today is November 26."
    Spanish output: "Hoy es 26 de Noviembre."


==============================
Only Translate Literal Strings
==============================

As programmers, we are used to using functions in flexible ways. But
translation functions such as ``_()`` and ``gettext()`` cannot be used in the
same ways as other functions. At runtime, they are real functions like any
other, but they also serve as markers for the string extraction process.

For string extraction to work properly, the translation functions must be
called only with literal strings. If you use them with a computed value, the
string extractor will not have a string to extract.

The difference between the right way and the wrong way can be very subtle, as
shown in these examples.

::

    # BAD: This tries to translate the result of .format()
    _("Welcome, {name}".format(name=student_name))

    # GOOD: Translate the literal string, then use it with .format()
    _("Welcome, {name}").format(name=student_name))

::

    # BAD: The dedent always makes the same string, but the extractor can't find it.
    _(dedent("""
    .. very long message ..
    """))

    # GOOD: Dedent the translated string.
    dedent(_("""
    .. very long message ..
    """))

::

    # BAD: The string is separated from _(), the extractor won't find it.
    if hello:
        msg = "Welcome!"
    else:
        msg = "Goodbye."
    message = _(msg)

    # GOOD: Each string is wrapped in _()
    if hello:
        message = _("Welcome!")
    else:
        message = _("Goodbye.")


==========================
Be Aware of Nested Context
==========================

When you provide strings in templated files for translation, you have to be
careful of nested context. For example, consider this JavaScript fragment in a
Mako template. ::

    <script>
    var feeling = '${_("I love you.")}';
    </script>

When the string is rendered in French, it will produce the following invalid
JavaScript.

.. code-block:: none

    <script>
    var feeling = 'Je t'aime.';
    </script>

Avoid this issue by following the best practices detailed in :ref:`Preventing
XSS`. Here is the same example with proper escaping.

.. code-block:: mako

    <%!
    from django.utils.translation import ugettext as _

    from openedx.core.djangolib.js_utils import js_escaped_string
    %>
    ...
    <script>
    var feeling = '${_("I love you.") | n, js_escaped_string}';
    </script>

The code with proper escaping produces the following JavaScript-escaped code.

.. code-block:: html

    <script>
    var feeling = 'Je t\u0027aime.';
    </script>


==================
Singular vs Plural
==================

It can be tempting to improve a message by selecting singular or plural based
on a count, as shown in the following example. ::

    if count == 1:
        msg = _("There is 1 file.")
    else:
        msg = _("There are {file_count} files.").format(file_count=count)

The example above is not the correct way to choose a string, because other
languages have different rules for when to use singular and when plural, and
there might be more than two choices.

One option is not to use different text for different counts. ::

    msg = _("Number of files: {file_count}").format(file_count=count)

If you want to choose based on number, you need to use another ``gettext`` variant
to do so. ::

    from django.utils.translation import ungettext
    msg = ungettext("There is {file_count} file", "There are {file_count} files", count)
    msg = msg.format(file_count=count)

The example above will properly use count to find a correct string in the
translation file; you can then use that string to format in the count.


=====================
Translating Too Early
=====================

When the ``_()`` function is called, it will fetch a translated string using
the current user's language to decide which string to fetch.

If you invoke the ``_()`` function before we know the user, then the wrong
language might be used. ::

    from django.utils.translation import ugettext as _

    HELLO = _("Hello")
    GOODBYE = _("Goodbye")

    def get_greeting(hello):
        if hello:
            return HELLO
        else:
            return GOODBYE

Here the HELLO and GOODBYE constants are assigned when the module is first
imported, at server startup. There is no current user at that time, so
``ugettext`` will use the server's default language. When we eventually use
those constants to show a message to the user, they are not looked up again,
and the user will get the wrong language.

There are a few ways to deal with this situation. The first is to avoid
calling ``_()`` until we have the user. ::

    def get_greeting(hello):
        if hello:
            return _("Hello")
        else:
            return _("Goodbye")

Another way is to use Django's ``ugettext_lazy`` function. Instead of returning
a string, it returns a lazy object that will wait to do the lookup until it is
actually used as a string. ::

    from django.utils.translation import ugettext_lazy as _

Using this method can be tricky, because the lazy object does not act like a
string in all cases.

The last way to solve the problem is to mark the string so that it will be
extracted properly, but not actually do the lookup when the constant is
defined. ::

    from django.utils.translation import ugettext

    _ = lambda text: text

    HELLO = _("Hello")
    GOODBYE = _("Goodbye")

    def get_greeting(hello):
        if hello:
            return ugettext(HELLO)
        else:
            return ugettext(GOODBYE)

Here we define ``_()`` as a pass-through function, so the string will be found
during extraction, but will not be translated too early. At runtime, we then
use the real translation function to get the localized string.


==================
Multi-line Strings
==================

Translator notes must directly precede the string literals to which they refer.
For example, the translator note here will not be passed along to translators.
::

    # Translators: you will not be able to see this note because
    # I do not directly prepend the line with the translated string literal.
    # See the line directly below this one does not contain part of the string?
    long_translated_string = _(
        "I am a long string, with many, many words. So many words that it is "
        "advisable that I be split over this line."
    )

In such a case, make sure you format your code so that the string begins on
a line directly below the translator note. ::

    # Translators: you will be able to see this note.
    # See how the line directly below this one contains the start of the string?
    long_translated_string = _("I am a long string, with many, many words. "
                               "So many words that it is advisable that I "
                               "be split over this line.")

.. _i18n Additional Resources:

********************
Additional Resources
********************

The following links provide other resources related to internationalization.

* `Django Internationalization <https://docs.djangoproject.com/en/dev/topics/i18n/>`_ (overview)
* `Django: Internationalizing Python code <https://docs.djangoproject.com/en/dev/topics/i18n/translation/#internationalization-in-python-code>`_
* `Django Translation guidelines <https://docs.djangoproject.com/en/dev/topics/i18n/translation/>`_
* `Django Format localization <https://docs.djangoproject.com/en/dev/topics/i18n/formatting/>`_


**Maintenance chart**

+--------------+-------------------------------+----------------+--------------------------------+
| Review Date  | Working Group Reviewer        |   Release      |Test situation                  |
+--------------+-------------------------------+----------------+--------------------------------+
|              |                               |                |                                |
+--------------+-------------------------------+----------------+--------------------------------+
