Tracking modern Chinese language with LIVAC

 

Which individuals in the Chinese-speaking communities of Hong Kong, Taiwan, and Beijing have had the most media exposure over the last two weeks? Which words were used most frequently? You may think these questions have no definite answers, only subjective guesses. In fact, precise, statistics-based answers to these and similar questions are only a click away in the Synchronous Linguistic Variation in Chinese Speech Communities (LIVAC) Corpus (www.rcl.cityu.edu.hk/livac/sample), developed by the Language Information Sciences Research Centre (LISRC), a CityU University Research Centre.

The three key indices of the LISRC, "Celebrity Roster", "Place Name Rank", and "Common Word List", are compiled from the Synchronous LIVAC Corpus. First launched in 1994 by LISRC Director and Chair Professor of Linguistics and Asian Languages, Professor Benjamin T'sou, the LIVAC Corpus is one of the Competitive Earmarked Research Grants projects supported by Hong Kong's Research Grants Council.

A ten-year research project

Since July 1995, the LIVAC database has been regularly compiled with linguistic data from the major newspapers and electronic media of six Chinese-speaking communities: Hong Kong, Taiwan, Beijing, Shanghai, Macau, and Singapore. Words and phrases are first selected automatically by computer and then manually proofread and categorized. From this, a database organized by linguistic unit (character, phrase, sentence, and text) is constructed. This database is very useful for linguists and for anyone interested in exploring linguistic phenomena, social organization, culture, and other developments in Chinese communities.

In early 2001, the size of the corpus exceeded 70 million characters and 400,000 phrases, and it is continuously expanding. Currently, the part of the corpus that has been put on the web comprises approximately 16 million characters and 190,000 phrases, consisting mainly of linguistic data compiled from July 1995 to June 1997. According to the LISRC schedule, the database will be expanded and updated until June 2005. By the end of the project, the corpus is estimated to reach 100 million characters and 600,000 phrases.

A Chinese language time capsule

"The corpus is like a time capsule, capturing the social, cultural, and linguistic developments of the six Chinese-speaking communities within a decade," Professor T'sou explained. "This provides valuable primary research materials for linguists and those interested in studying Chinese societies." One of the many important objectives of the corpus is to explore in depth the dynamics of the development of modern Chinese vocabulary. This includes examining the origins and subsequent forms of new-concept words, the development of meaning in words, the transference of old phrases, and phrases with local colour.

Guess how many common Chinese translations can be found for the term "Internet" in the six targeted communities? According to LIVAC records between 1995 and 2000, there are at least 13, and the most frequently used translation varies between the different Chinese-speaking communities. For instance, in Hong Kong, "互聯網" (pronounced hu lian wang in Putonghua) is often used; in Taiwan, "網際網路" (wang ji wang lu); in Singapore, "網際網絡" (wang ji wang luo); in Macau, "互聯網絡" (hu lian wang luo); and in Shanghai and Beijing, "因特網" (yin te wang).
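The kind of comparison described above can be pictured as a simple tally of variants per community. The following is only a minimal sketch in Python, not the LIVAC implementation: the function name `top_variant_by_community` is hypothetical, and the observations are made-up toy data (romanized for readability), not LIVAC counts.

```python
from collections import Counter, defaultdict

# Hypothetical (community, variant) observations for illustration only.
# These are NOT LIVAC data; the counts are invented toy values.
observations = [
    ("Hong Kong", "hu lian wang"),
    ("Hong Kong", "hu lian wang"),
    ("Hong Kong", "wang ji wang lu"),
    ("Taiwan", "wang ji wang lu"),
    ("Taiwan", "wang ji wang lu"),
    ("Taiwan", "hu lian wang"),
    ("Beijing", "yin te wang"),
    ("Beijing", "yin te wang"),
    ("Beijing", "hu lian wang"),
]


def top_variant_by_community(pairs):
    """Return {community: (most frequent variant, count)} for the given pairs."""
    counts = defaultdict(Counter)
    for community, variant in pairs:
        counts[community][variant] += 1
    # most_common(1) yields a one-element list of (variant, count)
    return {c: ctr.most_common(1)[0] for c, ctr in counts.items()}


top = top_variant_by_community(observations)
for community, (variant, n) in top.items():
    print(f"{community}: {variant} ({n})")
```

A real corpus would first need word segmentation and manual proofreading, as the article describes, before such a tally would be meaningful.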

Professor T'sou said, "The Chinese language is diverse, not a single entity. It carries different local colour in different communities. People often criticize the Chinese written language used by young people in Hong Kong as being mingled with Cantonese colloquial expressions. This is in fact a value judgment. The same language of the same locale develops differences over the passage of time. Language never stops evolving. The corpus lets us see the developments and variations of modern Chinese language in different Chinese communities over the last 10 years."

Unlimited application potential

The process of building the database is long, laborious and tedious, similar to "cultivating a barren continent" or "moving a huge mountain", Professor T'sou said. "However, when the task is completed and the result is a 'feast' to be shared by all who are interested, we forget about the hardship and feel rewarded."

 

Apart from academic research, a large linguistic corpus with built-in search and statistical functions has enormous potential for application. It is increasingly common for Hong Kong's law courts to use Cantonese, and the Synchronous LIVAC Corpus can be used in the process of recording litigation. Mobile phones designed for Chinese input also need to be supported by a large linguistic database. In fact, as Professor T'sou pointed out, some network and IT product development companies, such as the Japanese telecom giant NTT, Hong Kong's leading web content provider tom.com, and a subsidiary of AOL, have already started applying the LIVAC database.

 
