Predefined Data Types
Now that you have seen how to declare
variables and constants, this section takes a closer look at the
data types available in C#. As you will see, C# is much stricter
about the types available and their definitions than some
other languages are.
Value Types and Reference Types
Before examining the data types in C#, it is
important to understand that C# distinguishes between two
categories of data type:
- Value types
- Reference types
The next few sections look in detail at the syntax
for value and reference types. Conceptually, the difference is that
a value type stores its value directly, whereas a
reference type stores a reference to the value.
Compared to other languages, value types in C# are basically the
same thing as simple types (integer, float, but not pointers or
references) in Visual Basic or C++. Reference types are the same as
reference types in Visual Basic or are similar to types accessed
through pointers in C++.
These types are stored in different places in
memory; value types are stored in an area known as the
stack, and reference types are stored in an area
known as the managed heap. It is important to be aware of
whether a type is a value type or a reference type because of the
different effect that assignment has. For example, int is a value type, which means that the following
statement will result in two locations in memory storing the value
20:
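A minimal sketch of such an assignment follows. The variable names i and j and the class name are assumptions made for illustration; the extra assignment to j is there only to show that the two values are independent.

```csharp
using System;

class ValueCopyDemo
{
    static void Main()
    {
        int i = 20;
        int j = i;   // copies the value; i and j are now stored separately

        j = 99;      // changing j leaves i untouched
        Console.WriteLine(i);   // prints 20
        Console.WriteLine(j);   // prints 99
    }
}
```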
However, consider the following code. For this
code, assume that you have defined a class called Vector. Assume that Vector is a reference type and has an int member variable called Value:
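One way this code might look is sketched here. The Vector class is a hypothetical stand-in with a single public int field called Value, as described above.

```csharp
using System;

// Hypothetical reference type matching the description in the text.
class Vector
{
    public int Value;
}

class ReferenceCopyDemo
{
    static void Main()
    {
        Vector x, y;
        x = new Vector();             // the only Vector object that is created
        x.Value = 30;
        y = x;                        // copies the reference, not the object
        Console.WriteLine(y.Value);   // prints 30
        y.Value = 50;                 // modifies the object that x also refers to
        Console.WriteLine(x.Value);   // prints 50
    }
}
```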
The crucial point to understand is that after
executing this code, there is only one Vector object around. x
and y both point to the memory location
that contains this object. Because x and
y are variables of a reference type,
declaring each variable simply reserves a reference - it doesn’t
instantiate an object of the given type. This is the same as
declaring a pointer in C++ or an object reference in Visual Basic.
In neither case is an object actually created. In order to create
an object, you have to use the new
keyword, as shown. Because x and
y refer to the same object, changes made
to x will affect y and vice versa. Hence the code will display
30 then 50.
Tip: C++ developers should note that this syntax
is like a reference, not a pointer. We use the . notation, not ->, to
access object members. Syntactically, C# references look more like
C++ reference variables. However, behind the superficial syntax,
the real similarity is with C++ pointers.
If a variable is a reference, it is possible to
indicate that it does not refer to any object by setting its value
to null:
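A short sketch, reusing a hypothetical Vector class with an int field called Value, shows both the assignment and what happens if the null reference is then used. The runtime exception raised in this situation is NullReferenceException.

```csharp
using System;

// Hypothetical reference type used only for this illustration.
class Vector
{
    public int Value;
}

class NullReferenceDemo
{
    static void Main()
    {
        Vector y = new Vector();
        y = null;   // y no longer refers to any object

        try
        {
            Console.WriteLine(y.Value);   // member access through a null reference
        }
        catch (NullReferenceException)
        {
            Console.WriteLine("y is null");   // this line is printed
        }
    }
}
```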
This is just the same as setting a reference to
null in Java, a pointer to NULL in C++, or an object reference in Visual Basic
to Nothing. If a reference is set to
null, then clearly it is not possible to
call any nonstatic member functions or fields against it; doing so
would cause an exception to be thrown at runtime.
In languages like C++, the developer could choose
whether a given value was to be accessed directly or via a pointer.
Visual Basic was more restrictive, taking the view that COM objects
were reference types and simple types were always value types. C#
is similar to Visual Basic in this regard: whether a variable is a
value or reference is determined solely by its data type, so
int, for example, is always a value
type. It is not possible to declare an int variable as a reference (although in Chapter
6, “Operators and Casts,” which covers
boxing, you see it is possible to wrap value types
in references of type object).
In C#, basic data types like bool and long are value
types. This means that if you declare a bool variable and assign it the value of another
bool variable, you will have two
separate bool values in memory. Later,
if you change the value of the original bool variable, the value of the second bool variable does not change. These types are
copied by value.
In contrast, most of the more complex C# data
types, including classes that you yourself declare, are reference
types. They are allocated upon the heap, have lifetimes that can
span multiple function calls, and can be accessed through one or
several aliases. The Common Language Runtime (CLR) implements an
elaborate algorithm to track which reference variables are still
reachable and which have been orphaned. Periodically, the CLR's
garbage collector destroys orphaned objects and reclaims the memory
that they once occupied.
C# has been designed this way because high
performance is best served by keeping primitive types (like
int and bool)
as value types, while having larger types that contain many fields
(as is usually the case with classes) as reference types. If you
want to define your own type as a value type, you should declare it
as a struct.
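The difference between the two kinds of user-defined type can be sketched as follows. PointValue and PointReference are made-up names; the only difference between them is the struct or class keyword.

```csharp
using System;

struct PointValue        // value type: assignment copies the whole value
{
    public int X;
}

class PointReference     // reference type: assignment copies the reference
{
    public int X;
}

class StructVersusClassDemo
{
    static void Main()
    {
        PointValue a = new PointValue();
        a.X = 1;
        PointValue b = a;         // b is an independent copy of a
        b.X = 2;
        Console.WriteLine(a.X);   // prints 1

        PointReference c = new PointReference();
        c.X = 1;
        PointReference d = c;     // d refers to the same object as c
        d.X = 2;
        Console.WriteLine(c.X);   // prints 2
    }
}
```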
CTS Types
As mentioned in Chapter 1, “.NET
Architecture,” the basic predefined types recognized by C# are
not intrinsic to the language but are part of the .NET Framework.
For example, when you declare an int in
C#, what you are actually declaring is an instance of a .NET
struct, System.Int32. This may sound
like a minor point, but it has a profound
significance: it means that you are able to treat all the primitive
data types syntactically as if they were classes that supported
certain methods. For example, to convert an int
i to a string, you can write:
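A minimal sketch of that conversion follows; the variable s and the class name are assumptions. The second line of output confirms that int is an alias for System.Int32.

```csharp
using System;

class ToStringDemo
{
    static void Main()
    {
        int i = 42;
        string s = i.ToString();   // int is System.Int32, which supplies ToString()

        Console.WriteLine(s);                      // prints 42
        Console.WriteLine(i.GetType().FullName);   // prints System.Int32
    }
}
```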
It should be emphasized that, behind this
syntactical convenience, the types really are stored as primitive
types, so there is absolutely no performance cost associated with
the idea that the primitive types are notionally represented by
.NET structs.
The following sections review the types that
are recognized as built-in types in C#. Each type is listed, along
with its definition and the name of the corresponding .NET type
(CTS type). C# has 15 predefined types: 13 value types and 2
reference types (string and object).
Predefined Value Types
The built-in value types represent
primitives, such as integer and floating-point numbers, character,
and Boolean types.
Integer types
C# supports eight predefined integer types,
shown in the following table.
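As a rough sketch of what such a table covers, the following program lists each predefined integer type alongside its CTS name, its size in bits, and its range. The class name is made up for illustration, and the ranges are read from the MinValue and MaxValue constants rather than typed in by hand.

```csharp
using System;

class IntegerRangesDemo
{
    static void Main()
    {
        // Columns: C# keyword, CTS type, size in bits, range
        Console.WriteLine("sbyte   System.SByte    8  {0} to {1}", sbyte.MinValue, sbyte.MaxValue);
        Console.WriteLine("short   System.Int16   16  {0} to {1}", short.MinValue, short.MaxValue);
        Console.WriteLine("int     System.Int32   32  {0} to {1}", int.MinValue, int.MaxValue);
        Console.WriteLine("long    System.Int64   64  {0} to {1}", long.MinValue, long.MaxValue);
        Console.WriteLine("byte    System.Byte     8  {0} to {1}", byte.MinValue, byte.MaxValue);
        Console.WriteLine("ushort  System.UInt16  16  {0} to {1}", ushort.MinValue, ushort.MaxValue);
        Console.WriteLine("uint    System.UInt32  32  {0} to {1}", uint.MinValue, uint.MaxValue);
        Console.WriteLine("ulong   System.UInt64  64  {0} to {1}", ulong.MinValue, ulong.MaxValue);
    }
}
```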
Future versions of Windows will target 64-bit
processors, which can move bits into and out of memory in larger
chunks to achieve faster processing times. Consequently, C#
supports a rich palette of signed and unsigned integer types
ranging in size from 8 to 64 bits.
Many of these type names will be new to Visual
Basic developers. C++ and Java developers should be careful; some of the names
of C# types are the same as C++ and Java types, but the types have
different definitions. For example, in C#, an int is always a 32-bit signed integer. In C++ an
int is a signed integer, but the number
of bits is platform-dependent (32 bits on Windows). In C#, all data
types have been defined in a platform-independent manner to allow
for the possible future porting of C# and .NET to other
platforms.
A byte is the standard
8-bit type for values in the range 0 to 255 inclusive. Be aware
that, in keeping with its emphasis on type safety, C# regards the
byte type and the char type as completely distinct, and any
programmatic conversions between the two must be explicitly
requested. Also be aware that unlike the other types in the integer
family, a byte type is by default
unsigned. Its signed version bears the special name sbyte.
With .NET, a short is no
longer quite so short; it is now 16 bits long. The int type is 32 bits long. The long type reserves 64 bits for values. All
integer-type variables can be assigned values in decimal or in hex
notation. The latter require the 0x
prefix:
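For instance, a hex literal might be assigned as in this sketch. The value 0x12ab and the variable names are arbitrary choices for illustration.

```csharp
using System;

class HexLiteralDemo
{
    static void Main()
    {
        long x = 0x12ab;   // hexadecimal notation: 1*4096 + 2*256 + 10*16 + 11
        long y = 4779;     // the same value in decimal notation

        Console.WriteLine(x);        // prints 4779
        Console.WriteLine(x == y);   // prints True
    }
}
```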
If there is any ambiguity about whether an integer
is int, uint,
long, or ulong, it will default to an int. To specify which of the other integer types the
value should take, you can append one of the following characters
to the number:
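The suffixes in question are U for uint, L for long, and UL for ulong. A brief sketch, with arbitrary variable names and values, shows each one:

```csharp
using System;

class IntegerSuffixDemo
{
    static void Main()
    {
        uint ui = 1234U;     // U marks the literal as unsigned
        long l = 1234L;      // L marks the literal as long
        ulong ul = 1234UL;   // UL marks the literal as unsigned long

        Console.WriteLine(ui.GetType());   // prints System.UInt32
        Console.WriteLine(l.GetType());    // prints System.Int64
        Console.WriteLine(ul.GetType());   // prints System.UInt64
    }
}
```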
You can also use lowercase u and l, although the
latter could be confused with the integer 1 (one).
Floating-Point Types
In addition to its integer data types, C# supports floating-point
types. They will be
familiar to C and C++ programmers.
The float data type is
for smaller floating-point values, for which less precision is
required. The double data type is
bulkier than the float data type but
offers twice the precision (15 digits).
If you hard-code in a non-integer number (such as
12.3) in your code, the compiler will normally assume that you want
the number interpreted as a double. If
you want to specify that the value is a float, you append the character F (or f) to it:
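A small sketch of the suffix in use follows; the variable names are assumptions. The commented-out line shows the assignment that the compiler rejects when the suffix is left off.

```csharp
using System;

class FloatSuffixDemo
{
    static void Main()
    {
        float f = 12.3F;   // the F suffix makes the literal a float
        double d = 12.3;   // without a suffix the literal is a double
        // float g = 12.3; // would not compile: a double literal cannot be
        //                 // implicitly converted to float

        Console.WriteLine(f.GetType());   // prints System.Single
        Console.WriteLine(d.GetType());   // prints System.Double
    }
}
```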
The decimal Type
In addition, there is a decimal type representing higher-precision
floating-point numbers, as shown in the following table.
One of the great things about the CTS and C# is the
provision of a dedicated decimal type
for financial calculations. How you use the 28 digits that the
decimal type provides is up to you. In other words, you can track
smaller dollar amounts with greater accuracy for cents or larger
dollar amounts with more rounding in the fractional area. You
should bear in mind, however, that decimal is not implemented under the hood as a
primitive type, so using decimal will
have a performance impact on your calculations.
To specify that your number is of a decimal type rather than a double, float, or an
integer, you can append the M (or
m) character to the value, as shown in
the following example:
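A sketch of the suffix in use follows, with made-up names and values. The two comparisons are an added illustration of why decimal suits financial calculations: decimal fractions such as 0.1 are represented exactly in decimal but not in double.

```csharp
using System;

class DecimalSuffixDemo
{
    static void Main()
    {
        decimal price = 12.30M;   // the M suffix makes the literal a decimal
        Console.WriteLine(price.GetType());       // prints System.Decimal

        Console.WriteLine(0.1M + 0.2M == 0.3M);   // prints True
        Console.WriteLine(0.1 + 0.2 == 0.3);      // prints False (double rounding)
    }
}
```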
The Boolean Type
The C# bool type
is used to contain Boolean values of either true or false.
You cannot implicitly convert bool values to and from integer values. If a
variable (or a function return type) is declared as a bool, you can only use values of true and false. You will
get an error if you try to use zero for false and a non-zero value for true.
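The restriction can be sketched as follows; the variable names are hypothetical. The commented-out lines are the kind of code the compiler rejects, and the final assignment shows the explicit comparison that takes their place.

```csharp
using System;

class BoolDemo
{
    static void Main()
    {
        bool isReady = true;
        // bool bad = 1;      // would not compile: int cannot be converted to bool
        int count = 3;
        // if (count) { }     // would not compile either: the condition must be a bool

        bool hasItems = count != 0;   // write the comparison explicitly instead

        Console.WriteLine(isReady);    // prints True
        Console.WriteLine(hasItems);   // prints True
    }
}
```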
The Character Type
For storing the value of a single character,
C# supports the char data type.
Although this data type has a superficial
resemblance to the char type provided by
C and C++, there is a significant difference. C++ char represents an 8-bit character, whereas a C#
char contains 16 bits. This is part of
the reason that implicit conversions between the char type and the 8-bit byte type are not permitted.
Although 8 bits may be enough to encode every
character in the English language and the digits 0–9, they aren’t
enough to encode every character in more expansive symbol systems
(such as Chinese). In a gesture toward universality, the computer
industry is moving away from the 8-bit character set and toward the
16-bit Unicode scheme, of which the ASCII encoding is a subset.
Literals of type char
are signified by being enclosed in single quotation marks, for
example 'A'. If you try to enclose a
character in double quotation marks, the compiler will treat this
as a string and report an error.
As well as representing chars as character literals, you can represent them
with four-digit hex Unicode values (for example '\u0041'), as integer values with a cast (for
example, (char)65), or as hexadecimal
values ('\x0041'). They can also be
represented by an escape sequence, as shown in the following
table.
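The following sketch puts the four notations side by side and prints the character codes of a few commonly used escape sequences. The selection of escape sequences is illustrative rather than a complete list.

```csharp
using System;

class CharLiteralDemo
{
    static void Main()
    {
        char a1 = 'A';        // character literal
        char a2 = '\u0041';   // four-digit hex Unicode value
        char a3 = (char)65;   // integer value with a cast
        char a4 = '\x0041';   // hexadecimal value

        Console.WriteLine(a1 == a2 && a2 == a3 && a3 == a4);   // prints True

        // A few escape sequences and their character codes
        Console.WriteLine((int)'\0');   // null character, prints 0
        Console.WriteLine((int)'\t');   // tab, prints 9
        Console.WriteLine((int)'\n');   // newline, prints 10
        Console.WriteLine((int)'\'');   // single quotation mark, prints 39
        Console.WriteLine((int)'\\');   // backslash, prints 92
    }
}
```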
C++ developers should note that because C# has
a native string type, you don’t need to
represent strings as arrays of chars.
Predefined Reference Types
C# supports two predefined reference types,
described in the following table.
The object Type
Many programming languages and class
hierarchies provide a root type, from which all other objects in
the hierarchy are derived. C# and .NET are no exception. In C#, the
object type is the ultimate parent type
from which all other intrinsic and user-defined types are derived.
This is a key feature of C#, which distinguishes it from both
Visual Basic and C++, although its behavior here is very similar to
Java. All types implicitly derive ultimately from the System.Object class. This means that you can use the
object type for two purposes:
- You can use an object reference to bind to an object of any
particular subtype. For example, in Chapter 6, “Operators
and Casts,” you see how you can use the object type to box a value object on the stack to
move it to the heap. object references
are also useful in reflection, when code must manipulate objects
whose specific types are unknown. This is similar to the role
played by a void pointer in C++ or by a Variant data type in VB.
- The object type
implements a number of basic, general-purpose methods, which
include Equals(), GetHashCode(), GetType(),
and ToString(). Responsible user-defined
classes may need to provide replacement implementations of some of
these methods using an object-oriented technique known as overriding, which is discussed in Chapter
4, “Inheritance.” When you override ToString(), for
example, you equip your class with a method for intelligently
providing a string representation of itself. If you don’t provide
your own implementations for these methods in your classes, the
compiler will pick up the implementations in object, which may or may not be correct or sensible
in the context of your classes.
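Both uses can be sketched briefly. Point and Plain are hypothetical classes: Point overrides ToString(), while Plain inherits the default implementation from object, which returns the type name.

```csharp
using System;

class Point
{
    public int X;
    public int Y;

    // Replacement for the implementation inherited from object
    public override string ToString()
    {
        return "(" + X + ", " + Y + ")";
    }
}

class Plain
{
    // No override: the implementation in object is used
}

class ObjectDemo
{
    static void Main()
    {
        Point p = new Point();
        p.X = 1;
        p.Y = 2;

        object o = p;                     // an object reference can bind to any type
        Console.WriteLine(o);             // prints (1, 2)
        Console.WriteLine(new Plain());   // prints Plain

        object boxed = 42;                      // a boxed int
        Console.WriteLine(boxed.GetType());     // prints System.Int32
    }
}
```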
The object type is
examined in more detail in subsequent chapters.
The string Type
Veterans of C and C++ probably have battle
scars from wrestling with C-style strings. A C or C++ string was
nothing more than an array of characters, so the client programmer
had to do a lot of work just to copy one string to another or to
concatenate two strings. In fact, for a generation of C++
programmers, implementing a string class that wrapped up the messy
details of these operations was a rite of passage requiring many
hours of teeth gnashing and head scratching. Visual Basic
programmers had a somewhat easier life, with a string type, while Java people had it even better,
with a String class that is in many ways
very similar to C# string.
C# recognizes the string
keyword, which under the hood is translated to the .NET class,
System.String. With it, operations like
string concatenation and string copying are a snap:
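A minimal sketch of both operations, using made-up variable names and values:

```csharp
using System;

class StringBasicsDemo
{
    static void Main()
    {
        string str1 = "Hello ";
        string str2 = "World";
        string str3 = str1 + str2;   // string concatenation
        string copy = str3;          // string copying by simple assignment

        Console.WriteLine(str3);   // prints Hello World
        Console.WriteLine(copy);   // prints Hello World
    }
}
```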
Despite this style of assignment, string is a reference type. Behind the scenes, a
string object is allocated on the heap,
not the stack, and when you assign one string variable to another
string, you get two references to the same string in memory.
However, with string there are some
differences from the usual behavior for reference types. For
example, should you then make changes to one of these strings, note
that this will create an entirely new string object, leaving the other string unchanged.
Consider the following code:
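One way to write such a test is sketched here. The class name and the message text are assumptions; s1, s2, and the initial value a string follow the discussion of this example.

```csharp
using System;

class StringExample
{
    static void Main()
    {
        string s1 = "a string";
        string s2 = s1;                          // s2 refers to the same string object
        Console.WriteLine("s1 is " + s1);        // prints s1 is a string
        Console.WriteLine("s2 is " + s2);        // prints s2 is a string

        s1 = "another string";                   // s1 now refers to a different object
        Console.WriteLine("s1 is now " + s1);    // prints s1 is now another string
        Console.WriteLine("s2 is now " + s2);    // prints s2 is now a string
    }
}
```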
The output from this is:
Changing the value of s1
had no effect on s2, contrary to what
you’d expect with a reference type! What’s happening here is that
when s1 is initialized with the value
a string, a new string object is
allocated on the heap. When s2 is
initialized, the reference points to this same object, so
s2 also has the value a string. However, when you now change the value of
s1, instead of replacing the original
value, a new object will be allocated on the heap for the new
value. The s2 variable will still point
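to the original object, so its value is unchanged. Under the hood,
this happens because string objects are immutable: assigning a new
value to s1 makes it refer to a different object rather than
modifying the existing one. The string class also overloads
operators such as ==, a topic that is
explored in Chapter 6, “Operators and Casts.” In general,
the string class has been implemented so
that its semantics follow what you would normally intuitively
expect for a string.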
String literals are enclosed in double quotation
marks ("..."); if you attempt to enclose
a string in single quotation marks, the compiler will take the
value as a char, and report an error. C#
strings can contain the same Unicode and hexadecimal escape
sequences as chars. Because these escape
sequences start with a backslash, you can’t use this character
unescaped in a string. Instead, you need to escape it with two
backslashes (\\):
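For example, a Windows file path might be written as in this sketch. The path itself is hypothetical.

```csharp
using System;

class EscapedBackslashDemo
{
    static void Main()
    {
        // Each \\ in the source produces a single backslash in the string
        string filepath = "C:\\ProCSharp\\First.cs";

        Console.WriteLine(filepath);   // prints C:\ProCSharp\First.cs
    }
}
```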
Even if you are confident you can remember to do
this all the time, it can prove annoying typing all those double
backslashes. Fortunately, C# gives you an alternative. You can
prefix a string literal with the at character (@) and all the characters in it will be treated at
face value; they won’t be interpreted as escape sequences:
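A short sketch comparing the two notations, using a hypothetical file path:

```csharp
using System;

class VerbatimStringDemo
{
    static void Main()
    {
        string escaped = "C:\\ProCSharp\\First.cs";   // regular literal with escapes
        string verbatim = @"C:\ProCSharp\First.cs";   // verbatim literal, no escapes needed

        Console.WriteLine(verbatim);              // prints C:\ProCSharp\First.cs
        Console.WriteLine(escaped == verbatim);   // prints True
    }
}
```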
This even allows you to include line breaks in your
string literals:
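A sketch of such a literal follows. The text assigned to jabberwocky is an assumption, taken from the opening of the Lewis Carroll poem that the variable name suggests. The second line of the literal starts at the left margin because any indentation would become part of the string.

```csharp
using System;

class MultiLineStringDemo
{
    static void Main()
    {
        // The line break inside the literal is kept as part of the string
        string jabberwocky = @"'Twas brillig and the slithy toves
Did gyre and gimble in the wabe.";

        Console.WriteLine(jabberwocky);   // prints the text on two lines
    }
}
```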
Then the value of jabberwocky would be this: