Java floating point numbers review

Nov 15, 2000. Compiled on Thursday, August 25, 2016 at 05:31 AM

1 Java primitive types sizes

 type      size in bytes
 byte      1
 short     2
 int       4
 long      8
 float     4 (IEEE 754)
 double    8 (IEEE 754)
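The sizes above can be checked from Java itself. This is a minimal sketch, assuming Java 8 or later (where the wrapper classes expose `BYTES` constants); the class name `PrimitiveSizes` is made up for illustration.

```java
// Sizes of Java's numeric primitive types, read from the wrapper-class
// BYTES constants (available since Java 8).
class PrimitiveSizes {
    // Order: byte, short, int, long, float, double.
    static int[] sizesInBytes() {
        return new int[] {
            Byte.BYTES, Short.BYTES, Integer.BYTES,
            Long.BYTES, Float.BYTES, Double.BYTES
        };
    }

    public static void main(String[] args) {
        String[] names = {"byte", "short", "int", "long", "float", "double"};
        int[] sizes = sizesInBytes();
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i] + " " + sizes[i]);
        }
    }
}
```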

2 Maximum value in signed and unsigned integers

Signed integer table

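 number of bits   Java type   range                     range in base 10
 8                byte        $-2^{7}$ to $2^{7}-1$     -128 to 127
 16               short       $-2^{15}$ to $2^{15}-1$   -32768 to 32767
 32               int         $-2^{31}$ to $2^{31}-1$   -2147483648 to 2147483647
 64               long        $-2^{63}$ to $2^{63}-1$   -9223372036854775808 to 9223372036854775807

 number of bits   Java type   maximum value in HEX
 8                byte        7F
 16               short       7F FF
 32               int         7F FF FF FF
 64               long        7F FF FF FF FF FF FF FF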
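The signed limits follow the two's-complement pattern $-2^{n-1}$ to $2^{n-1}-1$. A small sketch that computes them for the four widths and can be compared against `Byte.MAX_VALUE`, `Short.MIN_VALUE` and so on; the class and method names are hypothetical.

```java
class SignedRanges {
    // Smallest value of an n-bit two's-complement integer: -2^(n-1).
    static long min(int bits) {
        return bits == 64 ? Long.MIN_VALUE : -(1L << (bits - 1));
    }

    // Largest value of an n-bit two's-complement integer: 2^(n-1) - 1.
    static long max(int bits) {
        return bits == 64 ? Long.MAX_VALUE : (1L << (bits - 1)) - 1;
    }

    public static void main(String[] args) {
        for (int bits : new int[] {8, 16, 32, 64}) {
            System.out.println(bits + " bits: " + min(bits) + " to " + max(bits)
                    + " (max in hex: " + Long.toHexString(max(bits)).toUpperCase() + ")");
        }
    }
}
```

The 64-bit case is handled separately only because $2^{63}$ itself does not fit in a `long`.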

Unsigned integer table

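 number of bits   Java type   range              range in base 10
 8                byte        0 to $2^{8}-1$     0 to 255
 16               short       0 to $2^{16}-1$    0 to 65535
 32               int         0 to $2^{32}-1$    0 to 4294967295
 64               long        0 to $2^{64}-1$    0 to 18446744073709551615

 number of bits   Java type   maximum value in HEX
 8                byte        FF
 16               short       FF FF
 32               int         FF FF FF FF
 64               long        FF FF FF FF FF FF FF FF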
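Java's `byte`, `short`, `int` and `long` are themselves signed, so the unsigned maxima above describe the bit width rather than a Java type. Since Java 8 the wrapper classes can reinterpret the same bits as unsigned. The sketch below (hypothetical class name) applies that to the all-ones pattern, which is -1 when read as signed.

```java
class UnsignedViews {
    // All bits set is -1 as a signed value; read as unsigned it is 2^n - 1.
    static int byteMax()     { return Byte.toUnsignedInt((byte) -1); }
    static int shortMax()    { return Short.toUnsignedInt((short) -1); }
    static String intMax()   { return Integer.toUnsignedString(-1); }
    static String longMax()  { return Long.toUnsignedString(-1L); }

    public static void main(String[] args) {
        System.out.println("8 bits:  " + byteMax());
        System.out.println("16 bits: " + shortMax());
        System.out.println("32 bits: " + intMax());
        System.out.println("64 bits: " + longMax());
    }
}
```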

3 Some bits table

The maximum value that can be represented using $n$ bits is given by the formula $2^{n}-1$; this assumes unsigned values.

 bit pattern                           base 10      Hex
 0                                     0            0
 1                                     1            1
 10                                    2            2
 11                                    3            3
 100                                   4            4
 101                                   5            5
 110                                   6            6
 111                                   7            7
 1000                                  8            8
 1001                                  9            9
 1010                                  10           A
 1011                                  11           B
 1100                                  12           C
 1101                                  13           D
 1110                                  14           E
 1111                                  15           F
 1 0000                                16           10
 1 0001                                17           11
 1 0010                                18           12
 1 0011                                19           13
 1 0100                                20           14
 1 0101                                21           15
 1 0110                                22           16
 1 0111                                23           17
 1 1000                                24           18
 1 1001                                25           19
 1 1010                                26           1A
 1 1011                                27           1B
 1 1100                                28           1C
 1 1101                                29           1D
 1 1110                                30           1E
 1 1111                                31           1F
 10 0000                               32           20
 0111 1111                             127          7F
 10000000                              128          80
 11111111                              255          FF
 1 00000000                            256          1 00
 1111 11111111                         4095         F FF
 11111111 11111111                     65535        FF FF
 1111 11111111 11111111                1048575      F FF FF
 11111111 11111111 11111111            16777215     FF FF FF
 1111 11111111 11111111 11111111       268435455    F FF FF FF
 11111111 11111111 11111111 11111111   4294967295   FF FF FF FF
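Rows of this kind can be regenerated with the standard conversions `Integer.toBinaryString` and `Integer.toHexString`. The following is only a sketch with made-up names; `maxUnsigned` applies the $2^{n}-1$ rule and is limited to $n$ below 64 so the result fits in a `long`.

```java
class BitPatterns {
    // Largest unsigned value that fits in n bits (1 <= n <= 63): 2^n - 1.
    static long maxUnsigned(int n) {
        return (1L << n) - 1;
    }

    // One table row: bit pattern, base 10, hex (no digit grouping).
    static String row(int value) {
        return Integer.toBinaryString(value) + " " + value + " "
                + Integer.toHexString(value).toUpperCase();
    }

    public static void main(String[] args) {
        for (int v = 0; v <= 32; v++) {
            System.out.println(row(v));
        }
        System.out.println("12 bits max: " + maxUnsigned(12));
        System.out.println("32 bits max: " + maxUnsigned(32));
    }
}
```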

So, a 16-bit pattern needs 5 digits in base 10 to represent it.
A 32-bit pattern needs 10 digits in base 10 to represent it.
A 64-bit pattern needs 20 digits in base 10 to represent it.

So, it looks like the number of digits in base 10 needed to represent a bit pattern of length $n$ is $\lceil n \log_{10} 2 \rceil \approx 0.301\,n$.
So 128 bits will require 39 digits in base 10 to represent externally.
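The digit counts can be checked exactly with `java.math.BigInteger` by printing $2^{n}-1$ and counting characters. The sketch below (hypothetical names) also evaluates the estimate $\lceil n \log_{10} 2 \rceil$ for comparison.

```java
import java.math.BigInteger;

class DecimalDigits {
    // Exact number of base-10 digits in the largest n-bit value, 2^n - 1.
    static int exact(int n) {
        return BigInteger.ONE.shiftLeft(n).subtract(BigInteger.ONE).toString().length();
    }

    // Estimate: ceil(n * log10(2)).
    static int estimate(int n) {
        return (int) Math.ceil(n * Math.log10(2));
    }

    public static void main(String[] args) {
        for (int n : new int[] {16, 32, 64, 128}) {
            System.out.println(n + " bits: exact " + exact(n) + ", estimate " + estimate(n));
        }
    }
}
```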

4 Power of 2 table

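 power of two   base 2         base 10                  Hex
 $2^{0}$        1              1                        1
 $2^{1}$        10             2                        2
 $2^{2}$        100            4                        4
 $2^{3}$        1000           8                        8
 $2^{4}$        1 0000         16                       10
 $2^{5}$        10 0000        32                       20
 $2^{6}$        100 0000       64                       40
 $2^{7}$        1000 0000      128                      80
 $2^{8}$        1 0000 0000    256                      1 00
 $2^{9}$        10 0000 0000   512                      2 00
 $2^{10}$       …              1024 (1K)                4 00
 $2^{11}$       …              2048                     8 00
 $2^{12}$       …              4096                     10 00
 $2^{13}$       …              8192                     20 00
 $2^{14}$       …              16384                    40 00
 $2^{15}$       …              32768                    80 00
 $2^{16}$       …              65536                    1 00 00
 $2^{17}$       …              131072                   2 00 00
 $2^{18}$       …              262144                   4 00 00
 $2^{19}$       …              524288                   8 00 00
 $2^{20}$       …              1048576 (1 MB)           10 00 00
 $2^{21}$       …              2097152                  20 00 00
 $2^{22}$       …              4194304                  40 00 00
 $2^{23}$       …              8388608                  80 00 00
 $2^{24}$       …              16777216                 1 00 00 00
 $2^{25}$       …              33554432                 2 00 00 00
 $2^{26}$       …              67108864                 4 00 00 00
 $2^{27}$       …              134217728                8 00 00 00
 $2^{28}$       …              268435456                10 00 00 00
 $2^{29}$       …              536870912                20 00 00 00
 $2^{30}$       …              1073741824 (1 GB)        40 00 00 00
 $2^{31}$       …              2147483648               80 00 00 00
 $2^{32}$       …              4294967296               1 00 00 00 00
 $2^{33}$       …              8589934592               2 00 00 00 00
 $2^{34}$       …              17179869184              4 00 00 00 00
 $2^{35}$       …              34359738368              8 00 00 00 00
 $2^{36}$       …              68719476736              10 00 00 00 00
 $2^{37}$       …              137438953472             20 00 00 00 00
 $2^{38}$       …              274877906944             40 00 00 00 00
 $2^{39}$       …              549755813888             80 00 00 00 00
 $2^{40}$       …              1099511627776 (1 tera)   1 00 00 00 00 00
 $2^{41}$       …              2199023255552            2 00 00 00 00 00
 $2^{42}$       …              4398046511104            4 00 00 00 00 00
 $2^{43}$       …              8796093022208            8 00 00 00 00 00
 $2^{44}$       …              17592186044416           10 00 00 00 00 00
 $2^{45}$       …              35184372088832           20 00 00 00 00 00
 $2^{46}$       …              70368744177664           40 00 00 00 00 00
 $2^{47}$       100000…        140737488355328          80 00 00 00 00 00
 $2^{48}$       …              281474976710656          1 00 00 00 00 00 00
 $2^{49}$       …              562949953421312          2 00 00 00 00 00 00
 $2^{50}$       …              1125899906842624         4 00 00 00 00 00 00
 $2^{51}$       …              2251799813685248         8 00 00 00 00 00 00
 $2^{52}$       …              4503599627370496         10 00 00 00 00 00 00
 $2^{53}$       …              9007199254740992         20 00 00 00 00 00 00
 $2^{54}$       …              18014398509481984        40 00 00 00 00 00 00
 $2^{55}$       …              36028797018963968        80 00 00 00 00 00 00
 $2^{56}$       …              72057594037927936        1 00 00 00 00 00 00 00
 $2^{57}$       …              144115188075855872       2 00 00 00 00 00 00 00
 $2^{58}$       …              288230376151711744       4 00 00 00 00 00 00 00
 $2^{59}$       …              576460752303423488       8 00 00 00 00 00 00 00
 $2^{60}$       …              1152921504606846976      10 00 00 00 00 00 00 00
 $2^{61}$       …              2305843009213693952      20 00 00 00 00 00 00 00
 $2^{62}$       …              4611686018427387904      40 00 00 00 00 00 00 00
 $2^{63}$       …              9223372036854775808      80 00 00 00 00 00 00 00
 $2^{64}$       …              18446744073709551616     1 00 00 00 00 00 00 00 00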
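Since $2^{63}$ and $2^{64}$ do not fit in a signed `long`, one way to regenerate the table is with `BigInteger`. A short sketch, with a hypothetical class name and without the two-digit hex grouping used in the table:

```java
import java.math.BigInteger;

class PowersOfTwo {
    // One row for 2^n: the power, its base-10 value and its hex value.
    static String row(int n) {
        BigInteger p = BigInteger.ONE.shiftLeft(n);
        return "2^" + n + " " + p.toString(10) + " " + p.toString(16).toUpperCase();
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 64; n++) {
            System.out.println(row(n));
        }
    }
}
```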

5 Float and Double in Java

Java uses IEEE 754.

A number $x$ is expressed as $x = m \times 10^{e}$ or as $x = m \times 2^{e}$, where $m$ is the mantissa and $e$ is the exponent.

In floating point, the second form above is used, i.e. base 2 is used for the exponent.

In a 32-bit float, the sign uses 1 bit: 0 for positive and 1 for negative. The exponent uses the next 8 bits (biased by 127), and the mantissa (fraction) uses the remaining 23 bits.
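The three fields of a float can be pulled apart with `Float.floatToIntBits` plus shifts and masks. This is a minimal sketch; the class and method names are invented for illustration, and the masks assume the 1 + 8 + 23 bit layout of single precision.

```java
class FloatFields {
    // Bit 31: 0 for positive, 1 for negative.
    static int sign(float x) {
        return Float.floatToIntBits(x) >>> 31;
    }

    // Bits 30..23: the exponent as stored, i.e. still biased by 127.
    static int biasedExponent(float x) {
        return (Float.floatToIntBits(x) >>> 23) & 0xFF;
    }

    // Bits 22..0: the fraction bits, without the implied leading 1.
    static int fraction(float x) {
        return Float.floatToIntBits(x) & 0x7FFFFF;
    }

    public static void main(String[] args) {
        float x = 6.5f; // 1.625 * 2^2
        System.out.println("sign     = " + sign(x));
        System.out.println("exponent = " + biasedExponent(x) + " (unbiased " + (biasedExponent(x) - 127) + ")");
        System.out.println("fraction = 0x" + Integer.toHexString(fraction(x)));
    }
}
```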

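In Java, a float uses IEEE 754 single precision. The following explains how float and double are represented in Java.

With sign bit $s$, 8-bit stored exponent $E$ and 23-bit fraction $f$, a normalized float is expressed as

$$(-1)^{s} \times 1.f \times 2^{E-127}$$

So, from the above, a normalized float in IEEE 754 has a magnitude in the range $2^{-126} \approx 1.18 \times 10^{-38}$ to $(2 - 2^{-23}) \times 2^{127} \approx 3.4028235 \times 10^{38}$.

In Java, a double uses 1 sign bit, an 11-bit stored exponent $E$ (biased by 1023) and a 52-bit fraction $f$, and a normalized double is expressed as

$$(-1)^{s} \times 1.f \times 2^{E-1023}$$

So, from the above, a normalized double in IEEE 754 has a magnitude in the range $2^{-1022} \approx 2.2250738585072014 \times 10^{-308}$ to $(2 - 2^{-52}) \times 2^{1023} \approx 1.7976931348623157 \times 10^{308}$.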
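Java exposes these limits as `Float.MAX_VALUE`, `Float.MIN_NORMAL`, `Double.MAX_VALUE` and `Double.MIN_NORMAL`. The sketch below rebuilds the largest finite values from the standard IEEE 754 expressions $(2 - 2^{-23}) \times 2^{127}$ and $(2 - 2^{-52}) \times 2^{1023}$ using `Math.scalb`, which scales by an exact power of two. The class name is hypothetical.

```java
class FloatLimits {
    // (2 - 2^-23) * 2^127: all 23 fraction bits set, largest normal exponent.
    static float largestFloat() {
        return (float) Math.scalb(2.0 - Math.scalb(1.0, -23), 127);
    }

    // (2 - 2^-52) * 2^1023: all 52 fraction bits set, largest normal exponent.
    static double largestDouble() {
        return Math.scalb(2.0 - Math.scalb(1.0, -52), 1023);
    }

    public static void main(String[] args) {
        System.out.println(largestFloat() == Float.MAX_VALUE);
        System.out.println(largestDouble() == Double.MAX_VALUE);
        System.out.println("smallest normal float:  " + Float.MIN_NORMAL);
        System.out.println("smallest normal double: " + Double.MIN_NORMAL);
    }
}
```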

5.1 How to read a floating point?

Given this example:

11000011100101100000000000000000

The above is the binary representation of a single-precision (32-bit) floating point number.

Read it from the leftmost bit (bit 31) to the rightmost bit (bit 0).

Bit 31 is 1, so this is a negative number. Bits 30…23 are the exponent, which is 10000111, or 135. Since the exponent is biased by 127, it is actually $135 - 127 = 8$, so the exponent part is $2^{8}$. Next are bits 22…0, which are 00101100000000000000000. Since there is an implied leading 1, this can be rewritten as 1.00101100000000000000000, which is read as follows:

$$1 + 0 \times 2^{-1} + 0 \times 2^{-2} + 1 \times 2^{-3} + 0 \times 2^{-4} + 1 \times 2^{-5} + 1 \times 2^{-6} + 0 \times 2^{-7} + \dots$$

which is

$$1 + 0.125 + 0.03125 + 0.015625 = 1.171875$$

Hence the final number is $-1.171875 \times 2^{8} = -300$.
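The same reading can be done in code. The sketch below decodes the example bit string twice: once by letting `Float.intBitsToFloat` do the work, and once by hand, following the sign, exponent and fraction steps above. The names are hypothetical, and the by-hand version only covers normalized numbers (stored exponent neither 0 nor 255).

```java
class DecodeFloat {
    // Let the JVM interpret the 32-bit pattern as a float.
    static float decode(String bits) {
        // parseUnsignedInt accepts patterns whose bit 31 is set.
        return Float.intBitsToFloat(Integer.parseUnsignedInt(bits, 2));
    }

    // Decode by hand: sign, unbiased exponent, then 1.fraction.
    static double decodeByHand(String bits) {
        int raw = Integer.parseUnsignedInt(bits, 2);
        int sign = raw >>> 31;
        int exponent = ((raw >>> 23) & 0xFF) - 127;        // remove the bias
        int fraction = raw & 0x7FFFFF;                      // 23 fraction bits
        double mantissa = 1.0 + fraction / (double) (1 << 23); // implied leading 1
        double value = Math.scalb(mantissa, exponent);      // mantissa * 2^exponent
        return sign == 0 ? value : -value;
    }

    public static void main(String[] args) {
        String bits = "11000011100101100000000000000000";
        System.out.println(decode(bits));
        System.out.println(decodeByHand(bits));
    }
}
```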

The above implies that a number that can't be expressed as a finite sum of powers of 2 can't be represented exactly in floating point. Since a float is represented as $1.f \times 2^{e}$, assume $e = 0$; then the accuracy of a float goes like this: $2^{-1}, 2^{-2}, \dots, 2^{-23}$, or $0.5, 0.25, \dots, 1.1920929 \times 10^{-7}$.

So, a number such as $0.1$ can't be exactly expressed in floating point, because its value can't be expressed as a finite sum of powers of 2.
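To see this in practice, `new BigDecimal(double)` prints the exact value a float actually stores. The sketch below takes 0.1 as the illustrative value (my choice, not necessarily the original example) next to 0.75, which is $2^{-1} + 2^{-2}$ and therefore exact. The class and method names are made up.

```java
import java.math.BigDecimal;

class ExactValue {
    // Exact decimal expansion of the value stored in a float.
    static String exact(float x) {
        return new BigDecimal(x).toString();
    }

    // True when the decimal literal is stored in a float without rounding.
    static boolean isExact(String decimal) {
        float f = Float.parseFloat(decimal);
        return new BigDecimal(f).compareTo(new BigDecimal(decimal)) == 0;
    }

    public static void main(String[] args) {
        System.out.println("0.1f  is stored as " + exact(0.1f));
        System.out.println("0.75f is stored as " + exact(0.75f));
        System.out.println("0.1 exact?  " + isExact("0.1"));
        System.out.println("0.75 exact? " + isExact("0.75"));
    }
}
```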

The greatest finite number that has an exact IEEE single-precision representation is 340282346638528859811704183484516925440.0. This is a 39-digit number, which is represented by $(2 - 2^{-23}) \times 2^{127}$, i.e. the bit pattern 0 11111110 11111111111111111111111.
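In Java this value is `Float.MAX_VALUE`. As a quick check, the sketch below (hypothetical class name) prints its exact decimal expansion with `BigDecimal` and its raw bit pattern in hex.

```java
import java.math.BigDecimal;

class MaxFloat {
    // Exact integer value of Float.MAX_VALUE, with no exponent notation.
    static String exactMax() {
        return new BigDecimal(Float.MAX_VALUE).toPlainString();
    }

    // Raw IEEE 754 bit pattern of Float.MAX_VALUE, in hex.
    static String bitsHex() {
        return Integer.toHexString(Float.floatToIntBits(Float.MAX_VALUE));
    }

    public static void main(String[] args) {
        String digits = exactMax();
        System.out.println(digits + " (" + digits.length() + " digits)");
        System.out.println("bits: 0x" + bitsHex());
    }
}
```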

6 References

The Java programming language specification.