OpenGL Coordinates

3D Basics Everyone Should Know Before Touching OpenGL

In this part I will cover 3D graphics in general; most of the following topics are not limited to OpenGL.

So what exactly is 3D, and how can it be represented to the viewer on a computer screen?

To describe the idea behind rendering 3D objects on the screen, it's best to use an actual 3D object as an example.

Let's examine the following image of a wireframe 3D cube.

You see, 3D objects are so familiar to your brain that by looking at this picture you instantly recognize a 3D shape, even though it's nothing more than a collection of 12 2D lines connected to each other at specific angles.

And yet it's hard to think of this image as "flat". At the visual level, 3D graphics is (mostly) all about rendering objects to the screen.

The question is: what are the main requirements for rendering an object so that you correctly recognize it as a 3D object and not just a collection of lines or polygons?

Obviously, the idea is to render objects to the screen the way you would see them in real life. And how do you see objects in real life?

This is where perspective comes in.

Long before computers, artists used the same techniques to paint their masterpieces that today's 3D software uses to create 3D images.

The point behind perspective is that all objects farther away from the viewer look smaller than objects closer to the viewer, and ultimately they disappear into the vanishing point.

This holds for most 3D graphics applications. Now let's take a look at the OpenGL coordinate system we will be using.

It is the so-called 3D Cartesian coordinate system. As you can see, in addition to the x- and y-axes known from 2D graphics, we have the z-axis, which extends from the center of the screen into negative space away from the viewer and into positive space toward the viewer.

This image illustrates what I've just described.
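The orientation described above (assuming x points to the right and y points up, with positive z toward the viewer) is what makes this coordinate system right-handed. A quick way to check it is the cross product: crossing the x-axis with the y-axis should give the positive z-axis. The C sketch below uses a hypothetical VEC3 type and cross function; neither is part of OpenGL.

```c
/* A hypothetical 3D vector type used only for this illustration. */
typedef struct { float x, y, z; } VEC3;

/* Cross product a x b: a vector perpendicular to both a and b.
   In a right-handed system, x-axis x y-axis = z-axis. */
VEC3 cross(VEC3 a, VEC3 b)
{
    VEC3 r;
    r.x = a.y * b.z - a.z * b.y;
    r.y = a.z * b.x - a.x * b.z;
    r.z = a.x * b.y - a.y * b.x;
    return r;
}
```

Calling cross with (1, 0, 0) and (0, 1, 0) yields (0, 0, 1), the z-axis pointing out of the screen toward the viewer; swapping the arguments gives (0, 0, -1), pointing into the screen.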

Perspective and Orthographic Projections

As we take small steps toward the end of this tutorial, I think it's the right time to explain projection. There are actually two types of projection: perspective projection and orthographic projection (described shortly). I'll start with perspective projection, since I've already explained perspective.

The objects you render are what we might call "projected" onto the screen. By projection I mean the conversion from 3D coordinates (usually the vertices of objects) to the flat 2D surface of the screen. Since the computer screen has only two dimensions, we somehow have to display 3D objects on a 2D screen, and that's precisely what projection does for us.

Perspective projection works as follows. I will take a single point as an example. Imagine we have a point with coordinates (5, -3, 2) on the x-, y- and z-axes respectively, and we want to project it onto the screen. We do it with the following formula. Assume we have a structure POINT3D containing the coordinates of the point, initialized with the values above.

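// a simple point structure (assumed for this example; float fields
// keep the division below from being integer division)
typedef struct { float x, y, z; } POINT3D;

// initialize point
POINT3D point = { 5, -3, 2 };

// find the corresponding position on the screen in 2D coordinates;
// HALFWIDTH/HALFHEIGHT are half the screen resolution and ViewingDistance
// is the distance from the eye to the viewing plane
int x2d = HALFWIDTH + point.x * ViewingDistance / point.z;
int y2d = HALFHEIGHT + point.y * ViewingDistance / point.z;

// plot the projected point on the screen (Pixel is a hypothetical drawing routine)
Pixel(x2d, y2d);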

Let's take the formula apart. As you already know, in 2D graphics screen coordinates are usually based on the 4th quadrant of the 2D Cartesian coordinate system (with the y-axis pointing down), which means that (0, 0) is at the upper-left corner of the screen. In 3D graphics, we want our view (or, to be exact, the camera, which is explained a little further into this tutorial) to be located as in the following image, so that we're always looking straight down the negative z-axis.

As you can see, a 3D point at (0, 0, -16) would be exactly in the center of the screen. Because screen coordinates start at the upper-left corner, a little modification is required here. Take a look at the projection formula again: there we first add half of the screen resolution to center all results. We're in fact translating the point from (0, 0) to (halfwidth, halfheight) on the screen; in 640x480 resolution we would be translating the point to (320, 240). Now take the constant ViewingDistance out of the equation for a second, and you will see that the rest of the formula is just the ratio of X to Z for x2d and of Y to Z for y2d. This is the most important idea behind perspective projection. As you recall, objects farther from the viewer appear smaller, and this is achieved exactly by dividing both the horizontal and vertical coordinates by the object's distance from the viewer. However, there is a problem. Merely dividing the x and y coordinates by the depth (the z coordinate) only gives us the ratio between the horizontal or vertical position of the point and its depth. What we need is how they relate to the Viewing Distance and the Viewing Volume. These two terms are explained below.
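To see the division by depth at work, the sketch below wraps the projection formula in a function and projects the same (x, y) at two depths. It assumes a 640x480 screen (HALFWIDTH 320, HALFHEIGHT 240) and a hypothetical ViewingDistance of 64; like the original formula, it expects z > 0 for points in front of the viewer.

```c
#define HALFWIDTH  320    /* half of a 640-pixel-wide screen */
#define HALFHEIGHT 240    /* half of a 480-pixel-high screen */

/* Hypothetical viewing distance, chosen only for this example. */
static const float ViewingDistance = 64.0f;

typedef struct { float x, y, z; } POINT3D;

/* Perspective-project a point with z > 0 onto the 2D screen.
   x and y are divided by the depth z, so farther points land
   closer to the center of the screen. */
void project(POINT3D p, int *x2d, int *y2d)
{
    *x2d = HALFWIDTH  + (int)(p.x * ViewingDistance / p.z);
    *y2d = HALFHEIGHT + (int)(p.y * ViewingDistance / p.z);
}
```

The point (5, -3, 2) lands at (480, 144), while (5, -3, 4), twice as far away, lands at (400, 192): its offset from the center (320, 240) is exactly halved. Two conventions are worth keeping in mind. With OpenGL's own convention of looking down the negative z-axis, points in front of the camera have negative z, so a renderer following that convention divides by -z instead. And because screen y grows downward, this formula places points with negative y above the center; renderers that treat +y as up usually subtract the y term.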

The Viewing Volume is the space between the near clipping plane (or viewing plane) and the far clipping plane, as seen in the second picture below. Back to our equation for a second: we simply multiply x and y by ViewingDistance to get the right relationship between the Viewing Volume and the X and Y coordinates. Simple as that. Viewing Distance is closely related to the Viewing Volume: the longer the viewing distance, the narrower the field of view and therefore the smaller the viewing volume. The good news is that we don't have to worry about any of this in OpenGL, since everything is done behind the scenes; however, you still need to understand these terms to understand why images appear on the screen the way they do, and I wanted to explain the basics of perspective projection. The above formula could be used in a software 3D renderer, but we're not interested in that at this moment.

To sum up, here's how a whole object (as opposed to the single point in the previous example) would be projected onto the screen in theory. In the upper-right corner of this image there is a real object (a cube) in space. I tried to make the projected version of the cube as close as possible to how it would actually appear on the screen, but it is only an approximation. Just keep in mind that the whole object is projected onto the flat screen vertex by vertex (and polygon by polygon at a higher level).
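Projecting a whole object works the same way: every vertex goes through the same formula, and the edges between the projected vertices are then drawn on the screen. The sketch below projects the eight corners of a hypothetical cube of side 2 centered at (0, 0, 5), assuming the same 640x480 screen and hypothetical viewing distance of 64 used for a single point, with z > 0 in front of the viewer.

```c
#define HALFWIDTH  320
#define HALFHEIGHT 240

typedef struct { float x, y, z; } POINT3D;
typedef struct { int x, y; } POINT2D;

static const float ViewingDistance = 64.0f;  /* hypothetical value */

/* Project each of the n vertices in 'in' (all with z > 0) into 'out'. */
void project_vertices(const POINT3D *in, POINT2D *out, int n)
{
    for (int i = 0; i < n; i++) {
        out[i].x = HALFWIDTH  + (int)(in[i].x * ViewingDistance / in[i].z);
        out[i].y = HALFHEIGHT + (int)(in[i].y * ViewingDistance / in[i].z);
    }
}

/* The eight corners of a cube with side 2 centered at (0, 0, 5):
   indices 0-3 form the near face (z = 4), 4-7 the far face (z = 6). */
const POINT3D cube[8] = {
    { -1, -1, 4 }, {  1, -1, 4 }, {  1,  1, 4 }, { -1,  1, 4 },
    { -1, -1, 6 }, {  1, -1, 6 }, {  1,  1, 6 }, { -1,  1, 6 }
};
```

The near face projects to a 32-pixel square from (304, 224) to (336, 256), while the far face shrinks to a 20-pixel square from (310, 230) to (330, 250), which is exactly the perspective effect the picture tries to show.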

I talked about the Viewing Volume and how it relates to the perspective projection equation. But what is the Viewing Volume? It is also known as the clipping volume or the frustum. Here's a visual representation of the viewing volume.

There are two planes: the viewing plane and the far clipping plane. The viewing plane is actually the screen, and the far plane indicates how far you can "see"; whatever is behind the far clipping plane will not be visible. The viewing volume is the space between those two planes. It is sometimes called the clipping volume because you usually want to clip your polygons against it.
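Using the same conventions as the projection formula (z > 0 in front of the viewer, a 640x480 screen, a hypothetical ViewingDistance of 64), a point lies inside the viewing volume when it is between the near and far planes and its projection lands on the screen. The sketch below is one way to write that test; inside_view_volume and its near_z/far_z parameters are hypothetical.

```c
#include <math.h>

#define HALFWIDTH  320
#define HALFHEIGHT 240

typedef struct { float x, y, z; } POINT3D;

static const float ViewingDistance = 64.0f;  /* hypothetical value */

/* Returns 1 if p is inside the viewing volume: between the near and far
   planes (near_z <= z <= far_z, with near_z > 0) and projecting inside the
   screen rectangle; returns 0 otherwise. Comparing |x| * D with HALFWIDTH * z
   is the same as comparing |x| * D / z with HALFWIDTH, without the division. */
int inside_view_volume(POINT3D p, float near_z, float far_z)
{
    if (p.z < near_z || p.z > far_z)
        return 0;                                   /* in front of near or behind far plane */
    if (fabsf(p.x) * ViewingDistance > HALFWIDTH * p.z)
        return 0;                                   /* off the left or right edge */
    if (fabsf(p.y) * ViewingDistance > HALFHEIGHT * p.z)
        return 0;                                   /* off the top or bottom edge */
    return 1;
}
```

A polygon whose vertices all fail the same test (for example, all beyond the far plane) can be skipped entirely; polygons that straddle the boundary are the ones that need clipping.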

Orthographic Projection

As I mentioned before, there is another type of projection: orthographic projection. It is rarely a good fit for games or other real-time applications that aim for a realistic view, since it ignores the z coordinate when computing where a point lands on the screen. In other words, if you draw a bunch of trees both close to and far away from the viewer, they will all appear the same size. Orthographic projection is used in technical design software, and OpenGL supports it as well. In this series of OpenGL tutorials we will always be using perspective projection.
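The difference between the two projections shows up clearly when the same object is placed at two depths. In the sketch below, a hypothetical orthographic projection simply scales x by a constant and ignores z, while the perspective version divides by z as in the earlier formula; the screen size, OrthoScale and ViewingDistance are assumed values.

```c
#define HALFWIDTH 320

static const float ViewingDistance = 64.0f;  /* hypothetical value       */
static const float OrthoScale      = 32.0f;  /* hypothetical pixels/unit */

/* Orthographic: the screen position does not depend on depth. */
int ortho_x(float x, float z)
{
    (void)z;                          /* depth is ignored */
    return HALFWIDTH + (int)(x * OrthoScale);
}

/* Perspective: the screen position moves toward the center with depth. */
int persp_x(float x, float z)
{
    return HALFWIDTH + (int)(x * ViewingDistance / z);
}
```

The edge of a tree at x = 2 lands at 384 in both projections when z = 2, but at z = 8 the orthographic projection still puts it at 384 while the perspective projection moves it to 336, so only the perspective view makes the distant tree look smaller.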

The 3D Camera

At this point I should explain what the camera is. The camera is always located at the origin of the virtual "view". Note, however, that it is NOT NECESSARILY located at the origin of the COORDINATE SYSTEM, since you can move the camera around and transform it to anywhere in the world. The camera and the view are basically the same thing. The camera is only a way to represent a virtual viewing point; there is no physical camera anywhere. I already mentioned it, but it is important to understand that there is some space between the origin of the camera and the viewing plane, as you saw in the previous image. That space is the VIEWING DISTANCE.

If you look straight ahead, for example, you are considered to be looking down the camera's z-axis into negative z space, in 3D terms. Camera rotation is possible around all 3 axes, as you would expect, and OpenGL makes it even easier for you. Camera rotation is responsible for moving the view; it's what happens when you move your virtual head around with the mouse or arrow keys in a 3D first-person shooter.

Let's examine the camera a little closer. The camera, like any other object in space, has two coordinate systems: the Local Coordinate System and the World Coordinate System. The local coordinates are the camera's rotation angles around each of its LOCAL x-, y- and z-axes, along with any displacement within that local coordinate system. The world coordinates specify the camera's position in the world. For example, when you walk around in a 3D first-person shooter, you are actually moving the camera's world coordinates, and when you look around you change the camera's local coordinates. It is also possible to use the local camera coordinates for moving, by translating them to the new location, but only AFTER rotation is performed, because rotation is also done in local coordinates around (0,0,0): if you move the camera to, say, (0, 5, 0) before rotating, it will not rotate correctly, since its center will be displaced and taken into account during rotation. Remember this rule: always rotate around the local center (0,0,0). If this sounds confusing, don't worry. It will all settle down the more you study and actually code in OpenGL, if you haven't already. Here's how the camera's coordinates are transformed.

If you understand this so far, that's good. Now let's move on to object rotation basics. It works exactly as demonstrated in the camera rotation part of the image above. The only difference is that we're not viewing the world FROM that object; we are OBSERVING that object from the current camera position. This is how an object is rotated around each of the 3 possible axes. When we actually do it in the following tutorials, I will make it clearer, so don't worry if you don't get something at this moment.

Just as with the camera, objects also have two coordinate systems and, as you might have guessed already, they are positioned according to the LOCAL and WORLD coordinate systems. The local coordinates are usually used for rotating the object, and the world coordinates are used for positioning the object in the world or, say, in a 3D level.

As you add objects and static polygons (e.g. walls, terrain, etc.) to your 3D world, you want to clip all of the polygons that are not located in the camera's viewing volume. You also want to clip off the parts of polygons that cross the edge of the viewing volume, so that only what fits on the screen is drawn. OpenGL's clipping stage takes care of both for us.

Another issue associated with drawing polygons is that you don't want to draw the back faces (or sides) of polygons when they are facing the camera. Imagine a textured polygon that is rotated by 180 degrees so its "back" is facing us. Let's also assume that the polygon is part of a bigger structure, a wall for example. Usually you will never want to see what's "behind" the wall. Have you ever wanted to see what's behind your room's wallpaper? I surely hope not. So the point is, if you rotate a polygon like this, its vertex order appears reversed from the camera's point of view; you never want to see that side anyway, and that space is usually covered by the other side of the wall, so why bother drawing it? That's right, there is no reason to, and a technique called back-face culling comes to our help. Back-face culling works this way: it calculates the normal of the polygon (a normal is a vector pointing straight out of the polygon at a 90-degree angle, and is very common in 3D graphics), and if the normal points in the same direction the camera is looking (that is, away from the viewer), the polygon is not rendered, as illustrated in this image.
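The test described above can be sketched in a few lines of C. The normal comes from the cross product of two polygon edges, and the dot product with the vector from the camera to the polygon tells us whether the normal points away from the viewer. This sketch assumes front faces have counterclockwise vertex order (also OpenGL's default front-face convention); the type and function names are hypothetical.

```c
typedef struct { float x, y, z; } VEC3;

static VEC3 sub(VEC3 a, VEC3 b)
{
    VEC3 r = { a.x - b.x, a.y - b.y, a.z - b.z };
    return r;
}

static float dot(VEC3 a, VEC3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

static VEC3 cross(VEC3 a, VEC3 b)
{
    VEC3 r = { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
    return r;
}

/* Returns 1 if triangle (v0, v1, v2), whose front side has counterclockwise
   vertex order, faces away from (or is edge-on to) a camera at 'eye'. */
int is_back_face(VEC3 v0, VEC3 v1, VEC3 v2, VEC3 eye)
{
    VEC3 normal = cross(sub(v1, v0), sub(v2, v0));  /* perpendicular to the polygon */
    VEC3 view   = sub(v0, eye);                     /* from the camera to the polygon */
    return dot(normal, view) >= 0.0f;               /* same direction as the view: cull */
}
```

With the camera at the origin, the triangle (0, 0, -5), (1, 0, -5), (0, 1, -5) faces the camera and is drawn; listing the same vertices in the opposite order flips the normal, so the triangle is culled.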

This technique was so common among older 3D engines that the developers of OpenGL decided to build it in and do all the dirty work for us in hardware, to speed up the pipeline, which is in fact the next topic of this tutorial.

3D Graphics Pipeline

In case you're wondering what all these pipelines everyone is talking about are, a pipeline is nothing more than a sequence of relatively distinct operations. At this stage it is too early to talk about what those operations are. Depending on what kind of program you're writing, be it a 3D FPS engine or a flight simulator, the pipeline might take different forms that work best for the given task. Therefore I'm not going to describe it here in detail, but I will as soon as we get some tasks to do in further tutorials.

OpenGL Variable and Function Naming Conventions

In conclusion I want to say a few words on this topic. OpenGL was made for use in various environments, not just Windows. You can always find more information in the numerous OpenGL books, which are reasonably affordable for technical books, considering the amount of knowledge you gain by the time you finish one. In this section I explain the naming conventions for both OpenGL functions and variables. Although you don't have to use the OpenGL-defined types, I still feel obligated to describe them here so that anyone who wants their software to be platform-independent understands what this all means.

Well, let's see. OpenGL has a number of predefined types. If you never plan to be platform-independent, it might be simplest to use native C types such as int, float and double. If that's not the case, OpenGL has definitions that will work on the current system, whatever the system is. All you have to do is add GL in front of the standard C type names. For example, if you want a floating-point type, use GLfloat instead of C's float, and if you want an int, use GLint. That works for the rest of the basic C types as well. If you want an unsigned type, just add a "u" between GL and the type, like so: GLuint is an unsigned integer. There is also GLboolean, a Boolean type (typically defined as an unsigned char) that holds GL_TRUE or GL_FALSE. GLbitfield is used for bit fields, such as the mask passed to glClear. A slightly less obvious pair of types is GLclampf and GLclampd, floating-point and double-precision values that are clamped to the range [0, 1]; they are used, for example, for the color components passed to glClearColor. There are no special types for pointers; pointers are declared the usual way. For instance, this is an array of 16 pointers to GLint: GLint *i[16];

Each OpenGL function has a neat naming convention and its format is:

<library><function name><number of arguments><type of arguments>

To demonstrate this on a real function, I will use glVertex3f.

glVertex3f(0.0f, 0.0f, 0.0f);
| |     ||
| |     |+- f means all parameters are floats (GLfloat)
| |     |
| |     +- 3 is the number of parameters
| |
| +- Vertex is the name of the function, which specifies a 3D point (a vertex)
|
+- gl specifies the OpenGL library
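Because the convention is so regular, a name can even be taken apart mechanically. The sketch below (hypothetical helpers parse_gl_name and gl_type_for_suffix and a hypothetical GLNAME struct, none of which are part of OpenGL) splits a name into its parts and maps the common type suffixes b, s, i, f, d, ub, us and ui to the corresponding OpenGL types.

```c
#include <ctype.h>
#include <string.h>

/* Pieces of an OpenGL function name such as "glVertex3f". */
typedef struct {
    char name[64];   /* e.g. "Vertex"                      */
    int  count;      /* number of arguments, 0 if no digit */
    char type[4];    /* type suffix, e.g. "f" or "ub"      */
} GLNAME;

/* Split an OpenGL function name into <library><name><count><type>.
   Returns 0 if the name does not start with the "gl" prefix. */
int parse_gl_name(const char *s, GLNAME *out)
{
    size_t n = 0;
    if (strncmp(s, "gl", 2) != 0)
        return 0;
    s += 2;                                    /* skip the library prefix */
    while (*s && !isdigit((unsigned char)*s) && n < sizeof out->name - 1)
        out->name[n++] = *s++;                 /* copy the function name  */
    out->name[n] = '\0';
    out->count = 0;
    while (isdigit((unsigned char)*s))         /* read the argument count */
        out->count = out->count * 10 + (*s++ - '0');
    strncpy(out->type, s, sizeof out->type - 1);  /* the rest is the type */
    out->type[sizeof out->type - 1] = '\0';
    return 1;
}

/* Map a type suffix to the OpenGL type name, or NULL if unknown. */
const char *gl_type_for_suffix(const char *suffix)
{
    static const char *const table[][2] = {
        { "b", "GLbyte" },    { "s", "GLshort" },  { "i", "GLint" },
        { "f", "GLfloat" },   { "d", "GLdouble" }, { "ub", "GLubyte" },
        { "us", "GLushort" }, { "ui", "GLuint" }
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(suffix, table[i][0]) == 0)
            return table[i][1];
    return NULL;
}
```

Parsing "glVertex3f" gives the name "Vertex", 3 arguments and type "f" (GLfloat), while "glColor4ub" gives "Color", 4 arguments and "ub" (GLubyte).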

The last two parts of the name (the number and type of arguments) are mostly encountered in functions that are responsible for drawing primitives. Many other functions are usually used in this form: