Getting the byte count

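For example, suppose we have the string

あaaaあ

This string is 5 characters long.

However, for reasons such as reserving the space needed to display the value, you sometimes want to count each full-width character as 2 and each half-width character (A-Z, 0-9, and so on) as 1, so that the string is evaluated as 7.

In that case, you can get the value as follows.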

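using System.Text;

// Returns the number of bytes the string occupies when encoded as Shift_JIS.
// On .NET Core / .NET 5+, call Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) first.
public int GetByteLength(string targetString) {
    Encoding shiftJis = Encoding.GetEncoding("Shift_JIS");
    return shiftJis.GetByteCount(targetString);
}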

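Actually, what I wanted to talk about this time is not the technique itself but the question: why does this code return 7?

A while ago, on a certain forum, I came across someone who said they didn't understand why.
I wonder whether there are more people like that these days.

So that's today's topic. I'll go ahead and explain it.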

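In our beloved Visual Studio (more precisely, in .NET's String class), text is basically held as Unicode, specifically in the UTF-16 encoding.
In this encoding, the full-width and half-width characters used here take the same number of bytes, 2 each; only characters outside the Basic Multilingual Plane take 4 bytes, as a surrogate pair.
For example, "ANK" and "あんく" are both 6 bytes.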
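The 6-byte figure can be checked with a small sketch like the following. The class and method names (`Utf16ByteCountDemo`, `Utf16Bytes`) are made up for illustration; `Encoding.Unicode` is .NET's UTF-16 (little-endian) encoding.

```csharp
using System;
using System.Text;

public static class Utf16ByteCountDemo
{
    // Hypothetical helper: byte count of a string when encoded as UTF-16.
    public static int Utf16Bytes(string s)
    {
        return Encoding.Unicode.GetByteCount(s);
    }

    public static void Main()
    {
        Console.WriteLine(Utf16Bytes("ANK"));    // 6 (3 characters x 2 bytes)
        Console.WriteLine(Utf16Bytes("あんく")); // 6 (3 characters x 2 bytes)
    }
}
```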

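In Shift_JIS, on the other hand, ANK characters (alphanumerics and half-width kana) are few enough to fit in the range of a single byte (0-255), so each is represented with 1 byte.
Japanese (full-width) characters use as their first byte a value that is not assigned to any ANK character, followed by one more byte, so each is represented by a 2-byte combination.

In other words, "ANK" is 3 bytes, while "あんく" is 6 bytes.

Therefore, if you convert to Shift_JIS and take the byte count, you get 7 for the example above.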
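To make the comparison concrete, here is a minimal sketch that prints the character count, the UTF-16 byte count, and the Shift_JIS byte count for the example string, then the Shift_JIS size of each character. The names `ShiftJisByteCountDemo` and `GetShiftJis` are illustrative only. It assumes .NET Core or .NET 5+, where code-page encodings must be registered first; on the .NET Framework the `RegisterProvider` line should not be needed.

```csharp
using System;
using System.Text;

public static class ShiftJisByteCountDemo
{
    // Hypothetical helper that returns a Shift_JIS encoding object.
    public static Encoding GetShiftJis()
    {
        // Assumption: running on .NET Core / .NET 5+, where this registration is required.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding("shift_jis");
    }

    public static void Main()
    {
        Encoding sjis = GetShiftJis();
        string s = "あaaaあ";

        Console.WriteLine(s.Length);                         // 5 characters
        Console.WriteLine(Encoding.Unicode.GetByteCount(s)); // 10 bytes in UTF-16
        Console.WriteLine(sjis.GetByteCount(s));             // 7 bytes in Shift_JIS

        // Per-character breakdown: 2 + 1 + 1 + 1 + 2 = 7.
        foreach (char c in s)
        {
            Console.WriteLine(c.ToString() + ": " + sjis.GetByteCount(new[] { c }));
        }
    }
}
```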

Posted: January 18, 2007 14:05

Feedback

# re: Getting the byte count 2007/01/18 14:06 Ｒ・田中一郎

The difference between how ANK and kanji are displayed is hard to see... orz

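# re: Getting the byte count 2007/01/18 14:26 中博俊

The definition of ANK is shaky.
I see what you mean, but taken strictly it is wrong.
Also, Unicode is fine, but handling everything as 2 bytes is a matter of convenience for the String class and isn't strictly appropriate.
It would be good to cover that as well, but that is certainly a deep topic.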

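# re: Getting the byte count 2007/01/18 14:32 えムナウ

>basically uses a character encoding called Unicode.
>It always represents a character by combining 2 bytes.
This is wrong.
As 中-san says, it is a matter of convenience for the String class.
In Unicode a single character can take 4 bytes or more, so it must be tough in countries where that happens.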

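# re: Getting the byte count 2007/01/18 14:44 Blue

A question.

Converting to Shift_JIS and computing the byte count

Is this approach really OK?
There are characters that can only be represented in Unicode; is it fine to treat those as full-width (equivalent to 2 bytes)?
(What does GetByteCount do in that case, again? Return the byte count up to just before that character?)

# Or maybe you really have to compute it from the character width (in a monospaced font)...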
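One way to explore the GetByteCount question is to pick the encoder fallback explicitly instead of relying on the default. The sketch below is only an illustration under assumptions: the class and method names are hypothetical, the runtime is assumed to be .NET Core or .NET 5+ (hence the `RegisterProvider` call), and a Hangul syllable is used as a sample character with no Shift_JIS mapping. With a replacement fallback each unmappable character is counted as one "?" byte; with an exception fallback the call throws, so the caller can detect the problem.

```csharp
using System;
using System.Text;

public static class ShiftJisFallbackDemo
{
    // Builds a Shift_JIS encoding with an explicit fallback for unmappable characters.
    private static Encoding CreateShiftJis(EncoderFallback fallback)
    {
        // Assumption: .NET Core / .NET 5+, where code-page encodings are opt-in.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding("shift_jis", fallback, DecoderFallback.ReplacementFallback);
    }

    // Counts bytes, replacing each unmappable character with "?".
    public static int CountWithReplacement(string s)
    {
        return CreateShiftJis(new EncoderReplacementFallback("?")).GetByteCount(s);
    }

    // Returns false instead of a count when the string cannot be represented in Shift_JIS.
    public static bool TryCountStrict(string s, out int count)
    {
        try
        {
            count = CreateShiftJis(EncoderFallback.ExceptionFallback).GetByteCount(s);
            return true;
        }
        catch (EncoderFallbackException)
        {
            count = 0;
            return false;
        }
    }

    public static void Main()
    {
        string hangul = "\uD55C"; // U+D55C, a Hangul syllable with no Shift_JIS mapping
        int n;
        Console.WriteLine(CountWithReplacement("a" + hangul));  // 2: 'a' plus one '?'
        Console.WriteLine(TryCountStrict("a" + hangul, out n)); // False
        Console.WriteLine(TryCountStrict("あaaaあ", out n));     // True
    }
}
```

Which behavior is appropriate depends on what the count is used for; a silent "?" may be acceptable for a rough column width but not for data that will actually be stored as Shift_JIS.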

# re: Getting the byte count 2007/01/18 14:51 中博俊

No, no, no, えムナウ-san, that's not right.
The surrogate pair problem exists in Japan too.
Everyone is just turning a blind eye to it.

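# re: Getting the byte count 2007/01/18 14:57 中博俊

Computing it from the character width is also inappropriate.
With fonts the problem is even worse than with Shift_JIS character mapping,
because there is no such thing as a complete, all-powerful universal font.
Japanese fonts contain quite a lot of glyphs, but U+3270, for example, is a Korean glyph and is not included in MS Gothic.
It is included in the Arial Unicode MS font.
Browsing around in the IME Pad can be enlightening.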

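# re: Getting the byte count 2007/01/18 14:59 えムナウ

>The surrogate pair problem exists in Japan too.
I see, so the 吉 in 吉野家 (Yoshinoya) is properly written with 土 over 口, and that character is a surrogate pair taking 4 bytes.
But the IME won't convert to it, so it can't really be used on a computer, and I wonder whether any font even has it?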

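# re: Getting the byte count 2007/01/18 15:01 かずくん

> In Unicode a single character can take 4 bytes or more, so it must be tough in countries where that happens.
Even for Japanese, Vista supports JIS X 0213:2004, and some of its characters are placed in Unicode's supplementary planes,
so I wonder whether they can be handled correctly on the .NET Framework. Is there no problem?

> handling everything as 2 bytes is a matter of convenience for the String class,

This point bothers me a little.
Was it designed with the supplementary planes in mind from the start?
Someone knowledgeable, please enlighten me.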

# re: Getting the byte count 2007/01/18 15:04 中博俊

Just the conclusion.
No good.

# re: Getting the byte count 2007/01/18 15:05 Ｒ・田中一郎

中-san, えムナウ-san.

Thank you for pointing that out.
I've revised the wording.

# In haste; please excuse the brevity.

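# re: Getting the byte count 2007/01/18 15:10 中博俊

I wrote a simple test program.
It shows
this.textbox1.text.length
in a message box.
If you enter a single surrogate-pair character, there is one glyph, but the result is 2.
In short, just like in the old days, we are back in a situation where grabbing and extracting just one "character" unit gives you something that cannot be displayed.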
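The same effect can be reproduced without a text box. This is a rough sketch with a hypothetical class name, using U+20BB7 (𠮷) as the sample supplementary-plane character, written as its UTF-16 surrogate pair.

```csharp
using System;
using System.Text;

public static class SurrogatePairDemo
{
    // U+20BB7 (𠮷) written as its UTF-16 surrogate pair.
    public const string Yoshi = "\uD842\uDFB7";

    public static void Main()
    {
        Console.WriteLine(Yoshi.Length);                         // 2 (UTF-16 code units, not characters)
        Console.WriteLine(char.IsSurrogatePair(Yoshi, 0));       // True
        Console.WriteLine(Encoding.Unicode.GetByteCount(Yoshi)); // 4

        // Extracting "one char" yields a lone high surrogate, which is not a displayable character.
        string half = Yoshi.Substring(0, 1);
        Console.WriteLine(char.IsHighSurrogate(half[0]));        // True
    }
}
```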

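# re: Getting the byte count 2007/01/18 15:20 中博俊

An extraordinary passion for character encodings!!
When I say "a matter of convenience for String", I mean UTF-16.
The problems caused by using UTF-16 as the internal format would go away with UTF-32.
Windows and its surrounding technologies moved to Unicode so early that switching to UTF-32 would probably be quite difficult.
Providing Win32 API variants and a String32, mapping tchar and the like to unsigned long int, and so on: an enormous amount of work for almost zero benefit.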

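# re: Getting the byte count 2007/01/18 16:07 YuO

Even at the Win32 API level, CharNext, for one, doesn't recognize surrogates.
GetTextExtentExPoint will return a position in the middle of a surrogate character, so I suspect there are odd editors out there where the caret lands in the middle of a character.

By the way, I have a request for Ｒ・田中一郎-san.
In the - Author - section, the greater-than sign at the end of the start tag of the a element at the end of the comment has been replaced by the a element's end tag.
As a result, Opera applies link styling to the body text and comments, which makes them hard to read, so I'd appreciate it if you could fix it.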

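# re: Getting the byte count 2007/01/18 16:39 かずくん

Thank you for going as far as testing it.

> If you enter a single surrogate-pair character, there is one glyph, but the result is 2.
So it was displayed, but the character count is wrong?
Well, display is the native API's territory, so presumably it rendered.
The character count is the managed API's territory, so I guess that is why an incorrect result came back.

The options available to a rank-and-file developer would be:
1. Don't use it until MS fixes the .NET Framework implementation.
2. Write your own utility function to count characters.
Something like that, I suppose.

And as for 2,
> Even at the Win32 API level, CharNext, for one, doesn't recognize surrogates.

if that's the case, I guess it has to be implemented without the Win32 API...

> In the - Author - section, the greater-than sign at the end of the start tag of the a element at the end of the comment has been replaced by the a element's end tag.
Come to think of it, underlines were showing up in Safari too; so that was the reason.
I have a feeling underlines also appeared in the comment sections of other people's blogs on わんくま, but I've forgotten where.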

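# re: Getting the byte count 2007/01/18 16:43 シャノン

Everyone keeps saying Unicode, but you mean UTF-16, right?
Unicode itself (or rather UCS) shouldn't specify any byte count.
UTF-8 is also Unicode (one way of representing it), but there a character can be 1 byte or 3 bytes, for instance.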
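To illustrate that the byte count is a property of the encoding form rather than of Unicode itself, here is a small sketch comparing UTF-8 and UTF-16 sizes for a few characters. The class and method names are invented for this example.

```csharp
using System;
using System.Text;

public static class Utf8ByteCountDemo
{
    // Hypothetical helpers: byte count of a string in each encoding form.
    public static int Utf8Bytes(string s) { return Encoding.UTF8.GetByteCount(s); }
    public static int Utf16Bytes(string s) { return Encoding.Unicode.GetByteCount(s); }

    public static void Main()
    {
        Console.WriteLine(Utf8Bytes("A"));             // 1
        Console.WriteLine(Utf8Bytes("あ"));            // 3
        Console.WriteLine(Utf8Bytes("\uD842\uDFB7"));  // 4 (U+20BB7, outside the BMP)

        Console.WriteLine(Utf16Bytes("A"));            // 2
        Console.WriteLine(Utf16Bytes("あ"));           // 2
        Console.WriteLine(Utf16Bytes("\uD842\uDFB7")); // 4 (surrogate pair)
    }
}
```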

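# re: Getting the byte count 2007/01/18 16:55 中博俊

As a matter of principle, かずくん's option 1 is impossible.
The text box sitting right there accepts such input.
You can't choose "not to use it".
At root it is the same as turning a blind eye to characters that can't be converted to Shift_JIS in a text box that accepts Unicode input. Well, that one is beyond the pale by now.

As for 2, it is feasible, but the question is where you would actually use a character count obtained that rigorously.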

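# re: Getting the byte count 2007/01/18 18:31 黒龍

>At root it is the same as turning a blind eye to characters that can't be converted to Shift_JIS in a text box that accepts Unicode input. Well, that one is beyond the pale by now.
For some reason a lot of people want to make Shift_JIS the baseline, which is a nuisance... Come to think of it, I was once surprised by some Thai data. It used an irregular encoding where one character was anywhere from 1 to 5 bytes.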

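# re: Getting the byte count 2007/01/19 0:07 kkamegawa

>CharNext
Back in the Windows NT 3.1 days, when the Win32 API was defined, Unicode was at 1.0 and surrogate pairs had not been invented yet, so at first it couldn't be helped. Surrogate pairs date from Unicode 2.0 in 1996. I do agree they could have implemented it by now, though.

Incidentally, UTF-8 takes at most 4 bytes per character.

For managed code, the documentation said that the StringInfo class gives a correct, surrogate-pair-aware character count.

And please don't forget about combining characters.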
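A minimal sketch of the StringInfo approach, covering both a surrogate pair and a combining character, might look like this. `StringInfo.LengthInTextElements` is the real .NET property; the class and method names around it are hypothetical.

```csharp
using System;
using System.Globalization;

public static class TextElementCountDemo
{
    // Counts text elements (roughly, user-perceived characters) instead of UTF-16 code units.
    public static int CountTextElements(string s)
    {
        return new StringInfo(s).LengthInTextElements;
    }

    public static void Main()
    {
        string yoshinoya = "\uD842\uDFB7野家"; // 𠮷野家: the first character is a surrogate pair
        Console.WriteLine(yoshinoya.Length);             // 4
        Console.WriteLine(CountTextElements(yoshinoya)); // 3

        string ga = "か\u3099"; // か followed by the combining voiced sound mark
        Console.WriteLine(ga.Length);                    // 2
        Console.WriteLine(CountTextElements(ga));        // 1
    }
}
```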

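# re: Getting the byte count 2007/01/19 1:39 RUN

For systems that don't need to be general-purpose, there is also the escape route of user-defined characters (gaiji).
(Does that also count as turning a blind eye?)
As for "a matter of convenience for the String class", it basically means:
with ASCII, characters are counted over a memory area that is a char array,
whereas with the String class they are counted over a memory area that is an int16 array,
so a full-width character counts as either 2 or 1 array elements. Was that it?

Well, strictly speaking that is also a wrong way to think about it, but let's call it an easy-to-picture mental model (lol)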

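# re: Getting the byte count 2007/01/19 14:43 Ｒ・田中一郎

中博俊-san

>Also, Unicode is fine, but handling everything as 2 bytes is a matter of convenience for the String class and isn't strictly appropriate.
>It would be good to cover that as well, but that is certainly a deep topic.

And then a deep discussion did unfold after that, and I learned a great deal from it.
Thank you very much.

>An extraordinary passion for character encodings!!

It makes me wonder what happened in the past ^^;

------------------------
えムナウ-san

>I see, so the 吉 in 吉野家 (Yoshinoya) is properly written with 土 over 口, and that character is a surrogate pair taking 4 bytes.

I-is that so...
I need to study this area too... ^^;

Thank you.

------------------------
Blue-san

>There are characters that can only be represented in Unicode; is it fine to treat those as full-width (equivalent to 2 bytes)?

I suppose it depends on what you intend to use the "full-width is 2 bytes, half-width is 1 byte" count for.

If you purely want the display or print width of a string, I feel that using the Graphics.MeasureString() method is the more correct approach.

This time, though, I simply wanted to steer the post toward the idea of converting to Shift_JIS first and then taking the byte count (lol)

------------------------
かずくん-san

I learned from the questions you asked as well.

>Come to think of it, underlines were showing up in Safari too; so that was the reason.

I apologize for the inconvenience.
I'll deal with it as soon as possible.

------------------------
YuO-san

>By the way, I have a request for Ｒ・田中一郎-san.

Jitta-san pointed this out to me before as well, and I have now checked it in Opera.
I really do need to check the display in at least IE, Firefox, and Opera.
I'll fix it.

Thank you for pointing it out.

------------------------
シャノン-san

>Everyone keeps saying Unicode, but you mean UTF-16, right?

In my case, when the topic involves the development environment, I tend to just say Unicode.

------------------------
黒龍-san

>Come to think of it, I was once surprised by some Thai data. It used an irregular encoding where one character was anywhere from 1 to 5 bytes.

Thai!?
That's something else ^^;

------------------------
kkamegawa-san

>For managed code, the documentation said that the StringInfo class gives a correct, surrogate-pair-aware character count.

I'll look into it right away.

------------------------
RUN-san

I see.
Thank you for the easy-to-understand explanation.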

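# re: Getting the byte count 2007/01/19 16:56 中博俊

It's not that anything in particular happened at a previous job.
It's more a frustration that Japanese people, who live their lives in Japanese, don't speak up to give Japanese the important position it deserves.
I think Japanese people ought to object to an encoding scheme like UTF-8 too, which is close to bullying.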

# re: Getting the byte count 2007/01/19 17:40 Ｒ・田中一郎

中博俊-san

I see, you are absolutely right.
That gives me a lot to think about.

# Both Shift_JIS and UTF-8 2007/01/23 14:30 中の技術日誌ブログ

Both Shift_JIS and UTF-8

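# re: Getting the byte count 2011/09/26 12:27 Katz

...Sorry, I know this is a late post.
Whenever character encodings come up, I can't help wanting to shout, "That's why we should throw away Unicode, an encoding established at the whim of particular companies from the single-byte-character world."
Sorry for flipping the table (_o_)

> It's more a frustration that Japanese people, who live their lives in Japanese, don't speak up to give Japanese the important position it deserves.
> I think Japanese people ought to object to an encoding scheme like UTF-8 too, which is close to bullying.

Absolutely.
Incidentally, I used to work on a postcard printing system, so I know all too well that no matter how many characters you provide, some customers will still complain (lol). For the "nabe" character in the surname Watanabe alone, more than 60 variants were provided, and even that wasn't enough (lol). I wonder how many 今昔文字鏡 (Konjaku Mojikyo) has; it's practically bullying the vendors...

It's a world that Americans and Europeans can't even imagine, and leaving the character-encoding problem to such people was the mistake in the first place.

As an aside, today's concept of a character code is an extension of old movable-type technology. But there are people in the world who use scripts that don't lend themselves to movable type, and for people living in those cultures, one has to ask whether the current approach to character codes, Unicode included, is right to begin with. Incidentally, by headcount such people are actually the majority, so it's an issue that can't be ignored if the goal is "to process the world's texts by computer".

Sorry for a long post that flips the table at this late date...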

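# re: Getting the byte count 2011/09/27 19:40 Ｒ・田中一郎

>For the "nabe" character in the surname Watanabe alone, more than 60 variants were provided, and even that wasn't enough (lol). I wonder how many 今昔文字鏡 (Konjaku Mojikyo) has; it's practically bullying the vendors...

What, there are that many!?
That's rough~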

