Working with locale (regional language) tags in PHP

Time:2021-9-25

I believe zh_CN is no stranger to you. We see it in PHP and on our web pages. It specifies the country or region and the language used for display. PHP offers quite a lot of functionality around these locale tags. The Locale class we look at today operates on this locale-related content. It cannot be instantiated; all of its methods are static.

Get and set the current regional language information

First, we can dynamically obtain and set the corresponding regional language information.

// # echo $LANG;
// en_US.UTF-8

// php.ini
// intl.default_locale => no value => no value

echo Locale::getDefault(), PHP_EOL; // en_US_POSIX
ini_set('intl.default_locale', 'zh_CN');
echo Locale::getDefault(), PHP_EOL; // zh_CN
Locale::setDefault('fr');
echo Locale::getDefault(), PHP_EOL; // fr

By default, getDefault() returns the value of the intl.default_locale setting in php.ini. If that is not configured, it falls back to the default locale derived from the operating system environment, which is the en_US_POSIX output in the example above; the POSIX part indicates that the value comes from the operating system rather than from php.ini.

To change the current locale dynamically, either modify the ini setting with ini_set() or call the setDefault() method.
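Because setDefault() changes the default for everything that runs afterwards, it can be convenient to switch it only temporarily. The following is a minimal sketch, assuming the intl extension is loaded; withLocale() is a hypothetical helper written for this example, not part of the Locale class.

```php
<?php
// Hypothetical helper (not part of intl): run a callback under a temporary
// default locale and restore the previous default afterwards.
function withLocale(string $locale, callable $fn)
{
    $previous = Locale::getDefault();
    Locale::setDefault($locale);
    try {
        return $fn();
    } finally {
        Locale::setDefault($previous); // always restore, even if $fn throws
    }
}

$before = Locale::getDefault();
$inside = withLocale('fr', function () {
    return Locale::getDefault();
});

echo $inside, PHP_EOL; // fr
echo Locale::getDefault() === $before ? 'restored' : 'changed', PHP_EOL; // restored
```

The try/finally block is the design choice here: the previous default is put back even when the callback fails.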

Rules for language tags

Before going further, let's study the specification for language tags. Most people have probably only come across en_US and zh_CN, but the full definition is much longer; when we use the short forms, the remaining parts simply take default values. The complete rule is:

language-extlang-script-region-variant-extension-privateuse
language - extended language - script - region - variant - extension - private use

In other words, our zh_CN can be written out in full as follows:

zh-cmn-Hans-CN-Latn-pinyin

It breaks down as follows: zh is the language, cmn is the extended language (Mandarin, Putonghua), Hans is the script (simplified Chinese), CN is the region, Latn is a variant (Latin alphabet) and pinyin is a variant (Pinyin).

Doesn't such a simple thing suddenly look sophisticated? In addition, the zh prefix is no longer recommended: zh is not a single language code but a macrolanguage. We can use cmn (Mandarin), yue (Cantonese), wu (Wu) or hsn (Xiang, spoken in Hunan) directly as the language. The tag above can therefore also be written as follows:

cmn-Hans-CN-Latn-pinyin

In the previous article on NumberFormatter, we saw that we can output numbers spelled out in Chinese. What if we want the result in traditional Chinese? Simply add the Hant script subtag, which selects traditional Chinese as the script.

For more detail on the language tag rules, see the reference links at the end of the article.

$fmt = new NumberFormatter('zh-Hant', NumberFormatter::SPELLOUT);
echo $fmt->format(1234567.891234567890000), PHP_EOL; 
// Prints the number spelled out in traditional Chinese; in English: one million two hundred thirty-four thousand five hundred sixty-seven point eight nine one two three four five six seven nine

Get the individual parts of a language tag

What can we do once we know the rules for language tags? The main job of the Locale class is to parse tags and extract these attributes.

Obtain various attribute information separately

echo Locale::getDisplayLanguage('cmn-Hans-CN-Latn-pinyin', 'zh_CN'), PHP_EOL; // cmn
echo Locale::getDisplayLanguage('zh-Hans-CN-Latn-pinyin', 'zh_CN'), PHP_EOL; // Chinese

echo Locale::getDisplayName('cmn-Hans-CN-Latn-pinyin', 'zh_CN'), PHP_EOL; // cmn (Simplified, China, LATN_PINYIN)
echo Locale::getDisplayName('zh-Hans-CN-Latn-pinyin', 'zh_CN'), PHP_EOL; // Chinese (Simplified, China, LATN_PINYIN)

echo Locale::getDisplayRegion('cmn-Hans-CN-Latn-pinyin', 'zh_CN'), PHP_EOL; // China
echo Locale::getDisplayRegion('zh-Hans-CN-Latn-pinyin', 'zh_CN'), PHP_EOL; // China

echo Locale::getDisplayScript('cmn-Hans-CN-Latn-pinyin', 'zh_CN'), PHP_EOL; // Simplified Chinese
echo Locale::getDisplayScript('zh-Hans-CN-Latn-pinyin', 'zh_CN'), PHP_EOL; // Simplified Chinese

echo Locale::getDisplayVariant('cmn-Hans-Latn-pinyin', 'zh_CN'), PHP_EOL; // LATN_PINYIN
echo Locale::getDisplayVariant('zh-Hans-CN-Latn-pinyin', 'zh_CN'), PHP_EOL; // LATN_PINYIN

We run each method against both forms of the tag so that the results can be compared. Because the second argument (the display locale) is zh_CN, the actual output is in Chinese; the comments above give the English equivalents.

  • getDisplayLanguage() returns the display name of the language, that is, the language part of the rule.

  • getDisplayName() returns the full display name of the tag. You can see that its content is richer.

  • getDisplayRegion() returns the country or region.

  • getDisplayScript() returns the script (writing system).

  • getDisplayVariant() returns the variant information.
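The second argument of these methods is the display locale, the language in which the names are printed. As a small sketch (assuming the intl extension with full ICU data; the tag and the display locales are chosen only for illustration), the same tag can be described in several languages:

```php
<?php
$tag = 'zh-Hans-CN';

// Print the language, region and full name of $tag in three display locales.
// The exact wording depends on the ICU data shipped with your PHP build.
foreach (['en', 'fr', 'zh_CN'] as $displayLocale) {
    printf(
        "%-6s language=%s region=%s name=%s\n",
        $displayLocale,
        Locale::getDisplayLanguage($tag, $displayLocale),
        Locale::getDisplayRegion($tag, $displayLocale),
        Locale::getDisplayName($tag, $displayLocale)
    );
}
```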

Get attribute information in batch

Of course, we can also get some language related information in batches.

$arr = Locale::parseLocale('zh-Hans-CN-Latn-pinyin');
if ($arr) {
    foreach ($arr as $key => $value) {
        echo "$key : $value ", PHP_EOL;
    }
}
// language : zh
// script : Hans
// region : CN
// variant0 : LATN
// variant1 : PINYIN

The parseLocale() method extracts every part of a language tag into an array, where the key is the name of the subtag and the value is its content. Compare the result with the rules introduced above.
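Short tags do not contain every part, so code that reads the parsed array should not assume a key exists. A minimal sketch, assuming the intl extension is loaded; describeTag() is a hypothetical helper that returns null for parts the tag does not carry:

```php
<?php
// Hypothetical helper: return language, script and region of a tag,
// using null for parts that are missing from the tag.
function describeTag(string $tag): array
{
    $parts = Locale::parseLocale($tag) ?: [];
    $get = function (string $key) use ($parts) {
        return empty($parts[$key]) ? null : $parts[$key];
    };

    return [
        'language' => $get('language'),
        'script'   => $get('script'),
        'region'   => $get('region'),
    ];
}

var_export(describeTag('zh-Hans-CN')); // language zh, script Hans, region CN
echo PHP_EOL;
var_export(describeTag('fr'));         // language fr, script and region null
echo PHP_EOL;
```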

Get all variants

As the code above shows, our tag contains two variants. The getAllVariants() method returns all variants in a language tag directly as an array.

$arr = Locale::getAllVariants('zh-Hans-CN-Latn-pinyin');
var_export($arr);
echo PHP_EOL;
//  array (
//     0 => 'LATN',
//     1 => 'PINYIN',
//   )

Canonicalize a tag and get its keywords

echo Locale::canonicalize('zh-Hans-CN-Latn-pinyin'), PHP_EOL; // zh_Hans_CN_LATN_PINYIN

$keywords_arr = Locale::getKeywords('zh-CN@currency=CMY;collation=UTF-8');
if ($keywords_arr) {
    foreach ($keywords_arr as $key => $value) {
        echo "$key = $value", PHP_EOL;
    }
}
// collation = UTF-8
// currency = CMY

The canonicalize() method rewrites a language tag in its canonical form. You can see that it turns our hyphens into underscores and converts the variants to uppercase. This is the canonical spelling. Our applications and web pages accept either separator and any letter case, but it is still best to write tags in the canonical form.

getKeywords() reads the keyword attributes that follow the @ symbol. For example, we define zh-CN, then set its currency to CMY and its collation to UTF-8 (values used purely for testing). getKeywords() returns these currency and collation attributes directly as an array.
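When a locale string carries no keywords at all, getKeywords() does not necessarily return an array, so it is worth guarding against that. A minimal sketch, assuming the intl extension is loaded; localeKeyword() is a hypothetical helper, and the de_DE string is just sample input:

```php
<?php
// Hypothetical helper: read a single keyword from a locale string,
// returning $default when the keyword (or any keyword) is missing.
function localeKeyword(string $locale, string $keyword, ?string $default = null): ?string
{
    $keywords = Locale::getKeywords($locale);
    if (!is_array($keywords)) {
        return $default; // no keywords after the @ symbol
    }

    return $keywords[$keyword] ?? $default;
}

echo localeKeyword('de_DE@currency=EUR;collation=PHONEBOOK', 'currency'), PHP_EOL; // EUR
echo localeKeyword('de_DE', 'currency', 'none'), PHP_EOL;                           // none
```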

Matching language tags

We can check whether two given language tags match each other, for example:

echo (Locale::filterMatches('cmn-CN', 'zh-CN', false)) ? "Matches" : "Does not match", PHP_EOL;
echo (Locale::filterMatches('zh-CN-Latn', 'zh-CN', false)) ? "Matches" : "Does not match", PHP_EOL;
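To see how the matching behaves, it helps to run one filter against several tags. The sketch below assumes the intl extension is loaded; the list of tags is made up for illustration. The second argument acts as a prefix filter, so a longer tag that starts with it matches, while a tag with a different language or region does not.

```php
<?php
$filter = 'zh-CN';

// Check each candidate tag against the same filter.
foreach (['zh-CN', 'zh-CN-Latn', 'zh-TW', 'cmn-CN'] as $tag) {
    $matches = Locale::filterMatches($tag, $filter, false);
    echo str_pad($tag, 12), $matches ? 'Matches' : 'Does not match', PHP_EOL;
}
// zh-CN       Matches
// zh-CN-Latn  Matches
// zh-TW       Does not match
// cmn-CN      Does not match
```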

We can also use the lookup() method to find which of a given set of language tags is closest to the specified tag.

$arr = [
    'zh-hans',
    'zh-hant',
    'zh',
    'zh-cn',
];
echo Locale::lookup($arr, 'zh-Hans-CN-Latn-pinyin', true, 'en_US'), PHP_EOL; // zh_hans
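A typical use of lookup() is choosing one of the locales an application supports, with a fallback when nothing fits. A minimal sketch, assuming the intl extension is loaded; pickLocale() and the list of supported tags are hypothetical:

```php
<?php
// Hypothetical helper: pick the closest supported locale for a requested tag,
// or return $fallback when none of the supported tags is a prefix of it.
function pickLocale(array $supported, string $requested, string $fallback): string
{
    $match = Locale::lookup($supported, $requested, true, $fallback);

    return ($match === null || $match === '') ? $fallback : $match;
}

$supported = ['zh-hans', 'zh-hant', 'zh', 'zh-cn'];

echo pickLocale($supported, 'zh-Hant-TW', 'en_US'), PHP_EOL; // the traditional Chinese entry
echo pickLocale($supported, 'fr-FR', 'en_US'), PHP_EOL;      // en_US
```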

Compose a standard language tag

Since we can extract the individual parts of a language tag, can we also build a standard language tag from them?

$arr = [
    'language' => 'en',
    'script' => 'Hans',
    'region' => 'CN',
    'variant2' => 'rozaj',
    'variant1' => 'nedis',
    'private1' => 'prv1',
    'private2' => 'prv2',
];
echo Locale::composeLocale($arr), PHP_EOL; // en_Hans_CN_nedis_rozaj_x_prv1_prv2

Yes, the composeLocale() method builds a complete, standard language tag from an array. The test data here is thrown together arbitrarily: it amounts to an en_CN tag, which you would not normally write.
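parseLocale() and composeLocale() use the same array keys, so the two can be combined to change a single part of a tag. A minimal sketch, assuming the intl extension is loaded; the tag and the replacement region are chosen only for illustration:

```php
<?php
// Parse a tag, replace its region, and compose it again.
$parts = Locale::parseLocale('zh-Hans-CN') ?: [];
$parts['region'] = 'SG';

$tag = Locale::composeLocale($parts);
echo $tag, PHP_EOL; // zh_Hans_SG
```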

acceptFromHttp: read language information from the request header

The Locale class also provides a method that determines the client browser's language from the Accept-Language request header.

// Locale::acceptFromHttp($_SERVER['HTTP_ACCEPT_LANGUAGE']);

echo Locale::acceptFromHttp('en_US'), PHP_EOL; // en_US
echo Locale::acceptFromHttp('en_AU'), PHP_EOL; // en_AU

echo Locale::acceptFromHttp('zh_CN'), PHP_EOL; // zh
echo Locale::acceptFromHttp('zh_TW'), PHP_EOL; // zh

As the tests show, the method only needs a string argument, so we can try it on the command line. Note that in these tests it returns only the language for Chinese, without the region.
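Real Accept-Language headers usually list several languages with q-values, and the header may be absent altogether. The following is a minimal sketch, assuming the intl extension is loaded; detectLocale() is a hypothetical helper and the header strings are sample input.

```php
<?php
// Hypothetical helper: determine the locale from an Accept-Language header,
// returning $fallback when the header is missing or cannot be resolved.
function detectLocale(?string $acceptLanguage, string $fallback): string
{
    if ($acceptLanguage === null || trim($acceptLanguage) === '') {
        return $fallback;
    }

    $locale = Locale::acceptFromHttp($acceptLanguage);

    return (is_string($locale) && $locale !== '') ? $locale : $fallback;
}

// In a web request: detectLocale($_SERVER['HTTP_ACCEPT_LANGUAGE'] ?? null, 'en');
echo detectLocale('en-US,en;q=0.9,fr;q=0.8', 'en'), PHP_EOL; // en_US
echo detectLocale(null, 'en'), PHP_EOL;                      // en
```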

Summary

To be honest, the author has hardly touched the Locale class in day-to-day development, but I believe readers who work on cross-border projects will be more familiar with it. Since my own work has not called for it, we can only take a brief look here. When a relevant business need does come up, don't forget that it exists!

Test code:

https://github.com/zhangyue0503/dev-blog/blob/master/php/202011/source/5.PHP Operations on locale markup information in. PHP

Reference documents:

https://www.php.net/manual/zh/class.locale.php

https://www.zhihu.com/question/20797118/answer/63480740
