Bug #644
closed
ASKLIBREOFFICE: Japanese site transliterates question URLs incorrectly
0%
Description
http://ask.libreoffice.org/ja is transliterating question titles into URLs as though from Chinese rather than Japanese.
Assuming that the point of this is to make a URL that can be read rather than only relying on a unique identifier, the result is jarring and unpleasant to read.
For instance, the question:
カスタムインストールで機能を選べません
becomes:
http://ask.libreoffice.org/ja/question/39107/kasutamuinsutorudeji-neng-woxuan-bemasen/
Within the URL, the three Chinese characters (機, 能 and 選) are given the readings "ji", "neng" and "xuan", which are Chinese and not Japanese. (Written Japanese partly uses Chinese characters, but they aren't pronounced anything like Chinese.)
If this had been tokenised into words and transliterated correctly, it would read something like:
...kasutamu-insutoru-de-kino-wo-erabemasen/
Assuming this is something missing in AskBot itself rather than its configuration, fixing this problem while retaining the transliteration would involve adding a Japanese morphological analyser. There are obvious candidates for this in C (chasen, mecab) and Java. Unfortunately for AskBot, the only Python solutions I can see appear to involve interfacing with one or other of the C packages, which doesn't seem ideal in a web server environment.
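To make the "tokenise, then transliterate" idea concrete, here is a minimal sketch in pure Python. It is not AskBot code and not a real morphological analyser: the `LEXICON` table, `tokenise` and `slugify_ja` are all hypothetical names invented for illustration, and the hand-written lexicon only covers the example title. A real fix would get the word boundaries and readings from an analyser such as mecab instead.

```python
# Toy stand-in for a morphological analyser: surface form -> romanised reading.
# ASSUMPTION: this table is hand-written for the example title only.
LEXICON = {
    "カスタム": "kasutamu",
    "インストール": "insutoru",
    "で": "de",
    "機能": "kino",
    "を": "wo",
    "選べません": "erabemasen",
}


def tokenise(text, lexicon):
    """Split text into words by greedy longest-match against the lexicon."""
    max_len = max(len(word) for word in lexicon)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then progressively shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in lexicon:
                tokens.append(piece)
                i += length
                break
        else:
            # No lexicon entry starts here: emit the single character as-is.
            tokens.append(text[i])
            i += 1
    return tokens


def slugify_ja(title, lexicon=LEXICON):
    """Build a hyphen-separated slug from word readings.

    Tokens with no lexicon entry are silently dropped in this sketch.
    """
    readings = [lexicon[t] for t in tokenise(title, lexicon) if t in lexicon]
    return "-".join(readings)


if __name__ == "__main__":
    print(slugify_ja("カスタムインストールで機能を選べません"))
```

The point is only that readings are looked up per word rather than per character, so 機能 is read as one Japanese word instead of two separate Chinese syllables.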
I don't know a great deal about Python, but feel free to ask me if you need any more help with analysing Japanese text, where I have some prior experience.