Topic: java / unicode / UTF-8
1. java / unicode / UTF-8
Posted by: zerol Posted on: 2003-12-27 17:13
Why does the article say "in memory every char is at least 2 bytes"? How should "at least" be understood?
Excerpted from "Sizeof for Java -- Object sizing revisited" by JW: http://www.javaworld.com/javaworld/javaqa/2003-12/02-qa-1226-sizeof.html?
Fallacy: You can measure an object's size by serializing it into a byte stream and looking at the resulting stream length
The reason this does not work is because the serialization layout is only a remote reflection of the true in-memory layout. One easy way to see it is by looking at how Strings get serialized: in memory every char is at least 2 bytes, but in serialized form Strings are UTF-8 encoded and so any ASCII content takes half as much space.
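The quoted point can be checked with a small sketch. It assumes the article's model of 2 bytes per char in memory, and it uses DataOutputStream.writeUTF as a stand-in for the string encoding used by object serialization (both write a length prefix followed by Java's modified UTF-8). The class and method names here are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class SerializedStringSize {
    // Size of the character data in memory, assuming 2 bytes per char
    // (the model used in the quoted article).
    static int inMemoryCharBytes(String s) {
        return s.length() * 2;
    }

    // Size of the encoded character data written by writeUTF,
    // not counting the 2-byte length prefix it puts in front.
    static int writeUtfPayloadBytes(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size() - 2;
    }

    public static void main(String[] args) {
        String ascii = "hello world"; // 11 ASCII characters
        System.out.println(inMemoryCharBytes(ascii));    // prints 22
        System.out.println(writeUtfPayloadBytes(ascii)); // prints 11
    }
}
```

For pure ASCII text the serialized character data is half the in-memory figure, which is why stream length is a poor proxy for object size.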
2. Re: java / unicode / UTF-8 [Re: zerol]
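Posted by: dingligang Posted on: 2003-12-29 16:53
Unicode is a character encoding standard, the same kind of thing as ASCII, while UTF-8 is a storage format for it.
Inside the JVM, strings are Unicode text both when the virtual machine manages data in memory and when objects are serialized. In memory, however, they are stored as char values, and one char takes 2 bytes (for example, you can declare char c='字'), so "字" and "Z" each take 2 bytes.
After serialization, string data is stored as UTF-8 (Java's modified UTF-8), in which a Chinese character typically takes 3 bytes while English letters, digits, and other ASCII characters take only 1 byte; see the links below.
As a result, the character data of all-English text takes half its in-memory size once serialized, whereas all-Chinese text takes about 1.5 times its in-memory size; mixed text falls somewhere in between.
About Unicode: http://www.3552808.com/gy/dl/ShowArticle.asp?ArticleID=155
About UTF-8: http://www.ctosoft.com/book/utf8.html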
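A minimal sketch of the byte counts discussed in this reply, using String.getBytes with the standard charsets. UTF-16BE stands in for the in-memory char representation (2 bytes per char, no byte-order mark). The class and method names are made up for illustration. The last sample is an assumption about what "at least" might refer to, not something the article confirms: a character outside the Basic Multilingual Plane needs two char values, so it takes 4 bytes rather than 2.

```java
import java.nio.charset.StandardCharsets;

public class CharByteCounts {
    // Bytes needed as UTF-16 code units, i.e. 2 bytes per Java char.
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // Bytes needed in standard UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String latin = "Z";
        String chinese = "\u5b57"; // the character used in the reply above
        // Hypothetical extra sample: U+1F600 lies outside the BMP,
        // so it is stored as a surrogate pair (two chars).
        String supplementary = new String(Character.toChars(0x1F600));

        for (String s : new String[] {latin, chinese, supplementary}) {
            System.out.println(s.length() + " char(s), "
                    + utf16Bytes(s) + " bytes as UTF-16, "
                    + utf8Bytes(s) + " bytes as UTF-8");
        }
        // prints:
        // 1 char(s), 2 bytes as UTF-16, 1 bytes as UTF-8
        // 1 char(s), 2 bytes as UTF-16, 3 bytes as UTF-8
        // 2 char(s), 4 bytes as UTF-16, 4 bytes as UTF-8
    }
}
```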