Topic: java /unicode /UTF-8


1.java /unicode /UTF-8
Posted by: zerol
Posted on: 2003-12-27 17:13

Why does it say "in memory every char is at least 2 bytes"? How should I understand "at least"?

Excerpted from JavaWorld (JW), "Sizeof for Java: Object sizing revisited":
http://www.javaworld.com/javaworld/javaqa/2003-12/02-qa-1226-sizeof.html?

Fallacy: You can measure an object's size by serializing it into a byte stream and looking at the resulting stream length
The reason this does not work is because the serialization layout is only a remote reflection of the true in-memory layout. One easy way to see it is by looking at how Strings get serialized: in memory every char is at least 2 bytes, but in serialized form Strings are UTF-8 encoded and so any ASCII content takes half as much space.
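To see the excerpt's point concretely, here is a minimal sketch (assuming Java 8 or later) that serializes Strings with the standard `ObjectOutputStream` and measures how much the stream grows per character. The class and helper names (`SerializedStringSize`, `serializedLength`, `repeat`) are made up for illustration. Subtracting the length of a serialized empty string cancels the fixed overhead, so what remains is the cost of the characters themselves.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class SerializedStringSize {

    // Serializes one object with ObjectOutputStream and returns the total stream length in bytes.
    public static int serializedLength(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrow unchecked to keep callers simple.
            throw new UncheckedIOException(e);
        }
        return bytes.size(); // read after close(), so all buffered data has been flushed
    }

    // Builds a String of n copies of c (avoids String.repeat, which needs Java 11).
    public static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        int n = 100;
        // The empty string carries only the fixed overhead
        // (stream header plus the string's type tag and 2-byte length prefix).
        int base = serializedLength("");
        int asciiGrowth = serializedLength(repeat('a', n)) - base;
        int cjkGrowth = serializedLength(repeat('\u5B57', n)) - base; // n copies of U+5B57

        System.out.println(n + " chars at 2 bytes each: " + n * Character.BYTES + " bytes");
        System.out.println("serialized, " + n + " ASCII chars: +" + asciiGrowth + " bytes");
        System.out.println("serialized, " + n + " x U+5B57:    +" + cjkGrowth + " bytes");
    }
}
```

The stream grows by 100 bytes for the ASCII string and by 300 bytes for the Chinese one, against 200 bytes of 2-byte char data, so the serialized size says little about the in-memory size in either direction.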

2.Re:java /unicode /UTF-8 [Re: zerol]
Posted by: dingligang
Posted on: 2003-12-29 16:53

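Unicode is a character set, the same kind of thing as ASCII: it assigns a number to every character. UTF-8, on the other hand, is a storage format, one particular way of turning those numbers into bytes.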
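A small sketch of that distinction, assuming Java 7 or later for `StandardCharsets`: the same character '字' has a single Unicode number (code point U+5B57), but different encodings store that number as different bytes. The class name and the `hex` helper are hypothetical, added only for this illustration.

```java
import java.nio.charset.StandardCharsets;

public class CodePointVsEncoding {

    // Formats a byte array as space-separated uppercase hex, e.g. "E5 AD 97".
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String zi = "\u5B57"; // the character from the post, written as an escape to keep the source ASCII
        // Unicode: the character's number.
        System.out.printf("code point: U+%04X%n", zi.codePointAt(0));
        // Encodings: the same number stored as different byte sequences.
        System.out.println("UTF-8:    " + hex(zi.getBytes(StandardCharsets.UTF_8)));
        System.out.println("UTF-16BE: " + hex(zi.getBytes(StandardCharsets.UTF_16BE)));
    }
}
```

This prints `U+5B57`, then `E5 AD 97` for UTF-8 and `5B 57` for UTF-16BE: one character, one number, two byte layouts.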

Inside the JVM, whether the VM is managing data in memory or serializing an object, characters and strings are always Unicode.
In the JVM, though, characters and strings are stored as char values, and each char occupies 2 bytes (you can write, for example, char c='字'), so "字" and "Z" both take 2 bytes;
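after an object is serialized, by contrast, its strings are stored as UTF-8, where English letters, digits and other ASCII characters take only 1 byte each, while a common Chinese character such as "字" takes 3 bytes. See the links below.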
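The fixed 2-byte size of `char` can be checked directly. This is a sketch assuming Java 8 or later (for `Character.BYTES`); the class and the `utf16Bytes` helper are hypothetical names. It also shows one situation where a single character needs more than one char: code points above U+FFFF are stored as a surrogate pair, i.e. 4 bytes. That is only one way a character can exceed 2 bytes; it is not necessarily what the JavaWorld author had in mind with "at least".

```java
public class CharSize {

    // Bytes that one code point occupies as UTF-16 chars (1 or 2 chars, 2 bytes each).
    public static int utf16Bytes(int codePoint) {
        return Character.charCount(codePoint) * Character.BYTES;
    }

    public static void main(String[] args) {
        char zi = '\u5B57'; // U+5B57, the character from the post
        char z = 'Z';

        // A char is always one 16-bit UTF-16 code unit, whatever character it holds.
        System.out.println("Character.BYTES = " + Character.BYTES);
        System.out.println("'Z'    -> " + utf16Bytes(z) + " bytes");
        System.out.println("U+5B57 -> " + utf16Bytes(zi) + " bytes");

        // A code point above U+FFFF does not fit in one char and becomes a surrogate pair.
        int grinning = 0x1F600;
        String s = new String(Character.toChars(grinning));
        System.out.println("U+1F600 -> " + s.length() + " chars, " + utf16Bytes(grinning) + " bytes");
    }
}
```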

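That is why, for English-heavy text, the strings in a serialized object take only about half the space they occupy in memory. (With all-English text, the 2-byte chars take twice the space of UTF-8; with all-Chinese text, UTF-8 actually takes 1.5 times as much, 3 bytes versus 2 per character.)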
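The ratio can be checked with a quick sketch that compares the 2-bytes-per-char figure with the standard UTF-8 length for an all-English and an all-Chinese string (class and method names are hypothetical). One caveat: Java serialization actually writes strings in the "modified UTF-8" of `DataOutput.writeUTF`, which matches standard UTF-8 for these characters and differs only for U+0000 and for characters above U+FFFF.

```java
import java.nio.charset.StandardCharsets;

public class Utf8VsChars {

    // Bytes the string's chars occupy at 2 bytes per char.
    public static int charBytes(String s) {
        return s.length() * Character.BYTES;
    }

    // Bytes the string takes when encoded as standard UTF-8.
    public static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    static void report(String label, String s) {
        int asChars = charBytes(s);
        int asUtf8 = utf8Bytes(s);
        System.out.printf("%s: %d bytes as chars, %d bytes as UTF-8 (ratio %.2f)%n",
                label, asChars, asUtf8, (double) asChars / asUtf8);
    }

    public static void main(String[] args) {
        report("all English", "Hello World");             // 11 ASCII chars
        report("all Chinese", "\u4E2D\u6587\u5B57");     // U+4E2D U+6587 U+5B57
    }
}
```

For the English string the char form is twice the UTF-8 size (22 vs 11 bytes); for the Chinese string UTF-8 is larger (9 vs 6 bytes).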

About Unicode:
http://www.3552808.com/gy/dl/ShowArticle.asp?ArticleID=155

About UTF-8:
http://www.ctosoft.com/book/utf8.html


Copyright © 2002-2021 Cjsdn Team. All Rights Reserved. 闽ICP备05005120号-1