An OpenVPN Primer

© Steve Butler, December 8, 2010

Why VPN?

A Virtual Private Network (VPN) is a useful tool for anyone wanting to have full access to their home computers from anywhere in the world via the Internet. There are many reasons why someone would want to do this. To my mind, one of the main ones is cost since you get more bang for your buck when beefing up a desktop PC than you do on a notebook. If you can access your PC from anywhere, then why not build yourself a killer home machine, and save on the notebook? Especially with the super cheap netbooks and lightning communications networks available nowadays, you can pretty easily get a powerful system set up at home without sacrificing portability. Of course, if what you want to do is play games on your laptop, this isn’t going to help you. But if like me you mostly use your computers for work and hobbies (rather than entertainment) then there’s a lot of value in having a VPN, and it’s relatively straightforward to set up.

Another benefit of configuring a VPN is that it helps to keep things organized since I can work from one central point and not have to constantly copy things back and forth between my notebook and home computer (inevitably leading me to accidentally overwrite information, and having to go searching multiple systems for files whenever I need them).

A popular piece of free VPN software is called OpenVPN and can be downloaded for Windows, Linux, or Mac at openvpn.net. When I was setting up OpenVPN I read a lot of newsgroup posts where people commented on what a breeze it was to set up. I’m glad some people have this experience, because in my case it took several days and a lot of reading and mucking around to get things working the way I wanted them. In retrospect I can’t help thinking that the process could be made much easier if some basic concepts were better explained up-front for the layperson. I’m a Software Developer by trade and I found setting up my VPN enough of a challenge – I can only assume that most of these newsgroup posters who enjoyed the cakewalk are hobbyists who delight in spending hours learning new technologies or IT professionals with a better background in networking than mine. Or maybe I was just unlucky for whatever reason. Anyway, always wanting to improve matters I figured I’d write an article explaining the things I found out about OpenVPN so anyone using this software in the future can benefit.

This article is not intended to go through the details of configuring OpenVPN on your computer since there are a lot of resources out there explaining this already, but rather as an introduction to OpenVPN that explains some of the overall concepts in plain English, so those resources will be easier to understand. I tried to write it so that anyone interested in setting up a VPN can follow along regardless of their technical background. As such, if you’re a more advanced reader, you may find some of the explanations obvious, but you can skip the sections explaining concepts you already understand.

DISCLAIMER: I’m not an experienced IT Administrator nor do I have any particular knowledge of the inner workings of OpenVPN. I’m a Software Developer who went through the process of setting up a VPN on my own network, and wrote this article to try and pass on my understanding of how it works to anyone doing the same. The information provided here, while correct to the best of my knowledge, may contain some factual errors, in which case I welcome any comments or suggestions that I can use to improve the content of the article. In short, I hope it helps you, but don’t blame me should you find something I’ve written to be incorrect.

A VPN In Broad Brushstrokes

A VPN allows you to connect your notebook to a local network or LAN with an Internet connection, such as in a coffee shop, and use that connection to “plug into” a remote (meaning physically elsewhere) network, such as your home network, so that you can access the remote network as if it were your local network. That means you should have all the same access to your home computers as you would if you were plugging your notebook directly into your home router using a network cable. As such you can do all the same things you can do at home like see and use the desktop of your home PC, print on your home printers, transfer files from your home PC to your notebook, etc. Also this must be done in a way that doesn’t open up your home network to anyone other than you, so a basic requirement of a VPN is that it is secure.

So how do you convince the computers on your network that your laptop is connected locally, when really it is sitting in a coffee shop somewhere? The answer is that you need a piece of software running inside the network that receives data (known as traffic) over the Internet from the laptop, and spits it back out onto the network as if it had come from there. Also you need a piece of software running on your laptop that does the same thing in the other direction – receives incoming traffic from the remote network and makes it appear to the laptop to have come from a network that’s connected locally. These two pieces of software, one inside and one outside your home network, need to stay in communication with each other, shuttling data back and forth, to keep up the ruse that the laptop is plugged into the remote network. Once you have that, the laptop becomes part of what’s known as a virtual network – it isn’t actually physically plugged into the network, but it acts for all intents and purposes as if it is. As it turns out, both ends of the virtual network are taken care of by two copies of OpenVPN running simultaneously that talk to each other – one is running inside the network, and the other’s on the laptop.

As I mentioned, the “private” part of the Virtual Private Network (VPN) has to do with security. If you know how the Internet works, you know that no matter what you do, traffic between the local and remote networks is going to travel through strange and unknown computers as it crosses the Net. That means random people are going to be able to look at it en route, and some of these people may not be well-intentioned. So the only way to ensure the data isn’t used maliciously is to put it in a form that nobody could possibly understand – in other words to encrypt it, and decrypt it at the destination. Encryption just means that you mix the data up so badly it becomes unreadable, but that the recipient will still be able to restore it to its original form. To be of practical use, encryption and decryption need to be done automatically on all data traveling between the two networks. The automatic agent that takes care of this on both ends is OpenVPN.
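To make the idea of "mixing the data up" concrete, here is a minimal Python sketch of a reversible scramble based on a shared key. It is a toy for illustration only: the key, the message, and the function names are all made up, and this is not the cipher OpenVPN uses, nor something to rely on for real security.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Derive a repeatable pseudo-random byte stream from a shared key."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def scramble(key: bytes, data: bytes) -> bytes:
    """XOR the data with the keystream; applying it twice restores the original."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

shared_key = b"known only to client and server"  # hypothetical key
message = b"open the garage door"                # hypothetical traffic

on_the_wire = scramble(shared_key, message)      # what an eavesdropper would see
restored = scramble(shared_key, on_the_wire)     # what the recipient recovers
```

Anyone who captures `on_the_wire` without the key sees only noise, while the holder of the key gets the original bytes back. That is the property the VPN depends on, whatever cipher is actually in use.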

In your home network you’ll have OpenVPN installed on one computer that will be considered the server, and you’ll also install it on your notebook, which will be called the client (see Fig.1). The difference between the two is that the client typically connects to the server, and not vice versa. Also you can set up any number of clients to connect to a single server. For example, in addition to your notebook you might also want to connect a work PC to your home VPN, and you can use a similar configuration to do so for both clients.

So in the coffee shop scenario OpenVPN will be running both locally (on the client) and remotely (on the server), and both of these machines will have been pre-configured by you to be able to talk to each other. You can then tell the client to connect to the server, which means OpenVPN will “call” the server over the Internet, and establish a secure connection with it. This is easily compared to the client calling the server over the telephone. The connection that gets established, like a phone call, is initiated by one of the participants (here, the client) and can be ended by either party at any time.

This connection is often conceptualized as a pipe connecting the two computers, through which data can flow freely, and which gets torn down when OpenVPN disconnects. The interesting thing about the pipe, which is translucent, is that it’s made of a magic material that confuses anyone who looks through it, so they can’t understand what’s going on inside there. In reality the pipe is nothing more than an arrangement between the client and server – they’re aware that they’re talking to each other, know how to reach each other over the Internet, and are encrypting all data passing between them.

 

OpenVPN Coffee Shop Scenario
Fig. 1 – Secure pipe from home to the coffee shop

 

OpenVPN offers two methods of creating this pipe:

  1. via Network Bridging
  2. via Routing

When reading other online resources, it’s important to know which of these two methods is being discussed, because the two do not mix well. There are some differences between the two methods both in terms of functionality and security considerations. Generally the documentation I’ve read recommends setting up a routed network because it gives you more control when it comes to security; however, it’s also a little more complicated to do so.

To understand the difference between bridging and routing, it helps to know about a thing called the TCP/IP protocol stack. This is explained in more detail in Appendix A. If you’re already familiar with the TCP/IP stack then read on.

Bridging Explained

The term “Network bridging” refers to a situation where you have a computer with two or more network cards, each of which is connected to a different network, and you want to merge the smaller networks together into one big network. To illustrate this, imagine a guy working out of his home who has four computers that he uses for work, and each of his kids has a PC in their room that they use for downloading media and playing games. So in this example there are two networks in the same house: a home network and a work network. His main desktop PC has two network cards in it: one connects the PC to the work network, and the other connects it to the home network (see Fig. 2). Now say for some reason, he wants one of his other work computers (not his main desktop PC since that can already see both networks) to access the PC in one of his kids’ rooms. In Fig. 2 this is shown as computer B trying to talk to computer A. These two computers are already physically connected – if you follow the network wires from either end they’ll both take you into Dad’s desktop PC which is the connection point – however these computers can’t currently talk to each other because they exist on different networks.

 

Two Networks Connected by A Bridge
Fig. 2 – A Network bridge can combine networks that share a host

 

So here’s where bridging comes in handy – the guy can connect up the two networks by creating a bridge between the two network cards in his PC. Each chunk of information flowing over the network (called a “packet” or “frame”) contains an address (called a MAC address) indicating what computer on the network it is destined for. The bridge in this case is just a program that takes all frames coming into either card and shoves them out over the other card, assuming they aren’t addressed to the desktop PC itself (in which case they would just get used by the PC instead of pushed out over the other network). The result is that instead of two separate networks, there’s now effectively a single network in the home with all the computers on it, and the desktop PC acting as a go-between. The work computer (labeled “B” in Fig. 2) can now see and talk to the kid’s personal PC (labeled “A” in Fig. 2) on the same network.

Bridging can be done with any number of network cards in a machine – if there are more than two then the bridge software looks at each packet’s address to figure out which card to shove it out onto. For a particular frame if it doesn’t know where the destination computer resides (because it hasn’t seen any traffic from that computer yet) it’ll shove the frame out on all cards other than the one it came in on, until it figures out which network the destination machine is attached to.
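The forwarding behaviour described above can be sketched in a few lines of Python. This is a simplified model under my own assumptions, not OpenVPN's or any operating system's bridge code: the class name, the port names, and the single-letter MAC addresses are made up, and frames addressed to the bridging PC itself are left out.

```python
class LearningBridge:
    def __init__(self, ports):
        self.ports = list(ports)   # names of the network cards being bridged
        self.mac_table = {}        # MAC address -> card it was last seen on

    def handle_frame(self, in_port, src_mac, dst_mac):
        """Return the list of cards the frame should be sent out on."""
        # Learn: the sender must be reachable via the card the frame came in on.
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Destination not seen yet: send on every card except the incoming one.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            # Destination lives on the side the frame came from; nothing to do.
            return []
        return [out_port]

bridge = LearningBridge(["work", "home", "lab"])
first = bridge.handle_frame("work", "B", "A")   # A unknown, so the frame is flooded
reply = bridge.handle_frame("home", "A", "B")   # B was already learned on "work"
again = bridge.handle_frame("work", "B", "A")   # A is now known to be on "home"
```

After the first exchange the bridge no longer needs to flood: `first` goes out on both other cards, while `reply` and `again` each go out on exactly one.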

Note that one of the dangers of bridging is that IP addresses (which are different from MAC addresses) on separate networks can conflict, and these conflicts would need to be resolved for the bridged network to work properly. I’ll explain what this means better later, but imagine if two teams in a workplace get merged, and each team has a guy named Steve (this has in fact happened to me many times). The new team will need to start calling one of the Steves something different so that two people don’t keep going “huh?” every time someone calls out the name “Steve”.

A Bridged VPN

The way OpenVPN uses bridging to create a secure VPN pipe is that it creates a virtual network card on your computer, called the tap device, which it pretends is connected to the remote network (as if via an imaginary Ethernet cable). This “device” is just a piece of software, but for all intents and purposes your Operating System treats it as another hardware device just like your network card. After creating the device, OpenVPN makes a bridge between your real network card and the virtual tap network card, which effectively combines the local LAN and the remote network into one single network as explained above. Of course, all data is really flowing in and out of your computer via your one real network card (or wireless card, same difference).

The network traffic flowing through your real network card gets divided up by the OpenVPN tap device to appear to be coming from two devices. The word “tap” refers to a traditional network tap device – a physical box that attaches to a network cable without interrupting the data flow, but copies all data in either direction to an alternate output terminal for the purposes of sniffing network traffic (kind of like tapping a water main). So OpenVPN’s virtual device acts like a tap on the data coming out of your real network card, and examines each inbound network frame. If incoming data is encrypted and originating from the remote network it gets decrypted and then passed up to the Network layer (only it will appear to have come from the tap device instead of the real network adaptor), otherwise the unchanged frame flows through the tap as it would normally from your network adaptor. This all happens in the Link layer, which means no other layers or programs running on your computer will have a chance to see the data before this redirection happens.

So to recap, a bridged VPN:

  1. uses software to create the appearance of two network cards on the local machine,
  2. re-directs any VPN traffic from the real network card to the virtual device to create the appearance of a second local network, and then
  3. bridges the two networks by saying “from now on I’m going to treat these two separate networks as one big network”.

Why bother separating out the network traffic into two devices and then re-combining the networks? Because unlike the data coming from your local network (LAN) or Internet (WAN), all the VPN packets are encrypted and decrypted by the virtual device in order to provide a secure private channel over the public Internet. Using this approach, the end-user experiences the remote network exactly as if they had a second network card in their computer physically attached to it.

Although the ease with which you can merge a local and remote network into one using bridging is pretty attractive in terms of its elegance, it turns out it comes with a significant downside, and this is why routing tends to be a preferred method of creating a VPN pipe. While the fact that both networks are merged is great in one direction – you now have total access to all the computers on your home network – it’s a real drag in the other. It means not only do you have access to your home network but potentially any other machine on your local network does as well, since they now see all computers on both networks as being on the same network. So in other words the guy sitting next to you at the coffee shop, if he knew what he was doing, could potentially access your home network. Of course you can put measures in place to stop traffic between your home network and his machine, since all VPN traffic is flowing through your laptop. But the point is that a bridged VPN takes a non-conservative strategy in terms of security. It blanket opens up your network to anyone on the client-side LAN, and then you must scramble to narrow access back down to a level you’re comfortable with. In contrast, the routing method only opens up those routes that are intended, so there can be no accidental access to your network that you didn’t overtly grant. This latter approach is much preferable to those paranoid network-administrator types who crave complete control when it comes to configuring for security.

A Routed VPN

Whereas bridging just kind of mashes the traffic from the local and remote networks together without altering it, routing takes a more hands-on approach, which takes place further up the TCP/IP protocol stack in the “Network” layer, using the Internet Protocol (IP).

If you already know about IP addresses and Network Address Translation (NAT), then you will easily be able to understand how a routed network works. Otherwise, read Appendix B to learn about these concepts.

Diagram Note: For simplicity’s sake Fig.1 shows an integrated broadband modem and router in both the coffee shop and at home. In reality the gateway computer may be a stand-alone broadband modem with an Ethernet cable connecting it to a separate router, but conceptually speaking this doesn’t change things too much as long as the modem isn’t blocking any traffic from flowing to the router.

So on with our discussion of routed VPNs… To illustrate, consider your laptop connected via Wifi to the coffee shop’s LAN. It will have a local IP address on the wireless network, and it will know the public IP address of the Internet gateway for the remote (home) network.

 


 

If you attempt to access the remote gateway over the Internet then your laptop will stamp the outbound traffic with its address. The traffic will leave your laptop, travel to the coffee shop’s Internet gateway (which will have its own Internet address), and from there it will get forwarded via the Internet to the remote gateway.

Now say there’s a computer inside the remote network you want to talk to (i.e. behind the gateway computer). This is typical if the gateway is a broadband modem and is connected via a LAN to a home PC as illustrated in Fig. 1. A simple solution is to configure the home gateway to forward any incoming traffic straight to the PC (for example using port forwarding). Then you’d have a pipe from your laptop to the PC over the Internet and you’d be able to do things like log into it and use it right from the coffee shop.

But the problem here is that anyone else on the Internet can do it too, because all traffic, regardless of sender, is being forwarded indiscriminately by the home gateway. Even if the login is password-protected, anyone on a network in the path between the coffee shop and your home can easily read your password as soon as you attempt to log in, because none of the information flowing through this pipe is hidden in any way. It’s like shouting your PIN number to the cashier instead of typing it in every time you buy groceries – other people might not be interested in that information, but chances are, if you always do things this way, sooner or later someone unscrupulous will use the opportunity to take advantage of your openness. Also, unlike in the supermarket, the Internet has programs that can snoop your traffic on behalf of the ne’er-do-wells.

Of course, this security concern is the whole reason we need a VPN in the first place. Instead of sending the raw information back and forth over the Internet, we want the traffic specifically between your laptop and the remote host to get automatically encrypted and decrypted at each end, so that the endpoints have access to the data but nobody in between does.

So OpenVPN’s solution in a routed VPN is to say “let’s pretend there are two remote networks: an insecure one, and a secure one.” Communication with the insecure network is already provided by the remote gateway’s IP address, and OpenVPN makes up a new IP address to represent the secure network. In the config file you can actually tell OpenVPN what pool of IP addresses you’d like it to use for this purpose, but it’s important to remember that the secure network’s address does not represent a real network connection – it’s just a conceptual address used by OpenVPN. Using it, any application running on the local machine can send data to the remote network and take for granted that it’ll remain encrypted until it reaches the safety of the OpenVPN server on the other end of the pipe.

That’s because any traffic from your laptop that’s destined for the secure network’s address gets automatically encrypted by OpenVPN. How is this done? Once again, using the same trusty “tap” device (described in the section entitled “A Bridged VPN”) that was installed locally when you installed OpenVPN, and which checks all traffic flowing into and out of the network card (or wireless card). If outbound traffic is destined for the secure network, the device will encrypt it first, and similarly inbound traffic arriving from the secure network will get decrypted.

This is similar to the behaviour of the tap device described in the section on Bridged VPNs, however the device acts differently in this case. Whereas the tap device allows all traffic to flow through it, selectively encrypting or decrypting VPN packets, in the routing case the device has its own IP address that inbound and outbound VPN packets are stamped with, so it acts more as an endpoint of a pipe than a tap on a water main. Because of this, in a routed network the virtual device is referred to as a tunneling device, which OpenVPN abbreviates as “tun”. So you’ll see in the OpenVPN config files you can specify what mode you want the device to act in, and the two possible options are “tap” (for a bridged VPN) or “tun” (for a routed VPN).

Besides just encrypting packets you may have already guessed that the tun device also does one other thing before sending the traffic across the Internet. Remember that the traffic is destined for a fictitious “secure network” with a made-up IP address that OpenVPN selected. Therefore if that traffic were allowed to flow out into the public Internet it would end up at an unintended destination; much like if you picked up your phone and dialed a made-up number – who knows where it’ll end up? So this made-up IP address needs to be removed from any packets before they hit the “outside”. It follows that after encrypting a packet the tun device must also change its destination address to be the insecure network’s address (which is the only real destination it knows about), but now it’s OK to send the packet there because no one will be able to decipher its contents. In other words, the tun device in this case is translating the fictitious address to a real address – yes, it’s doing Network Address Translation (NAT).
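One way to picture this step is as wrapping: the original packet, made-up addresses and all, is encrypted and carried as the payload of a new packet addressed to the real gateway. The Python sketch below models that under stated assumptions. The `Packet` class, the function names, and every address are hypothetical, and the single-byte XOR is only a stand-in for real encryption.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    src: str
    dst: str
    payload: bytes

def toy_encrypt(data: bytes) -> bytes:
    # Stand-in for real encryption: XOR every byte with a fixed value.
    return bytes(b ^ 0x5A for b in data)

toy_decrypt = toy_encrypt  # XOR with the same value undoes itself

def client_send(inner: Packet, client_public: str, server_public: str) -> Packet:
    """Encrypt the whole inner packet and address the result to the real gateway."""
    blob = f"{inner.src}|{inner.dst}|".encode() + inner.payload
    return Packet(client_public, server_public, toy_encrypt(blob))

def server_receive(outer: Packet) -> Packet:
    """Decrypt the payload and recover the original packet and its addresses."""
    src, dst, payload = toy_decrypt(outer.payload).split(b"|", 2)
    return Packet(src.decode(), dst.decode(), payload)

# Made-up addresses: a VPN address for the laptop, a home PC, two public gateways.
inner = Packet("192.168.5.6", "10.10.10.3", b"ping")
outer = client_send(inner, "198.51.100.7", "203.0.113.10")
recovered = server_receive(outer)
```

On the Internet only `outer` is visible, and it carries nothing but real public addresses and an unreadable payload. The made-up addresses reappear only after the server decrypts it.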

You’ll notice that while the tap device doesn’t actually change the packets flowing through it, the tun device actively writes routing information in each packet. Because of this behaviour, the tap/tun device is in fact acting totally differently than it was in the bridged network scenario. There, it was acting as a network bridge, and here it is acting as a router – therein lies the difference between the two methods of building a secure pipe supported by OpenVPN.

Getting It To Actually Work

After you’ve installed OpenVPN on both the client and the server, making it work comes down to getting two things right:

  1. the contents of the OpenVPN config file on both client and server
  2. ensuring the data is getting routed properly between client and server

The easiest way to debug your setup is to sit at the server with your laptop connected to an external network (for example, using a 3G USB dongle) and run a packet sniffer (such as Wireshark) on both. It would also be a good idea to run a packet sniffer on a third PC inside the LAN other than the OpenVPN server. First, try and get the client and server talking to each other, and once that’s working then try and get the client talking to the third PC. If you set the client to endlessly ping the target computer in each case (for example, by using “ping -t <destination IP>” on Windows; on Linux or Mac, plain “ping” already repeats until interrupted), then you should be able to monitor the packet sniffer output on each computer to see those ping packets making their way across your network. If the packets aren’t reaching their destination, this will enable you to see pretty easily where the pipe is breaking down.

Some barriers you’re likely to encounter along the way include:

Firewalls

Your router, your broadband modem, and each PC/laptop in your VPN are all likely running a firewall of some kind, any of which could decide to toast your VPN packets if they don’t like the look of them. If you suspect a particular firewall, disable it temporarily to see if it fixes the problem. If that fixes it, then as a longer term solution you’ll need to specify a firewall exception rule to allow your specific VPN traffic to pass. The way to do this differs depending on the firewall software, but for security reasons you should always strive to make the exception as narrow as possible. Bear in mind too that the failure of the packet to arrive at its destination could be due to it getting blocked by several firewalls along the way, rather than just one, and this can complicate the debugging process.

Local Routing Tables

If you’re using a routed VPN then you may need to add your VPN subnet to the local routing tables of some of your network PCs in order to be able to properly talk to them. The reason for this is as follows.

Imagine your ping packet arriving at your home network via the Internet. It’ll be stamped with the public IP address for your gateway (i.e. your broadband modem). First off, if there’s no forwarding rule set up at your router then its journey will stop there, so you’ll need to make sure the router knows to forward this traffic to your OpenVPN server. Once the packet reaches the server, it is decrypted and then spat back out onto the network stamped with the destination host’s address (which had been previously hidden within the encrypted packet) and gets routed to that host.

So now the destination host has received the ping and it should attempt to respond to it. In order to do so it pulls the source IP address from the packet – that is, the VPN address of the computer that sent the ping (your laptop). But remember, this is a made-up address, known only to OpenVPN, so none of the other computers on the network are going to know how to reach this address. The ping response will head out onto the network stamped with this destination address, but once it hits the router it will get dropped, never making it to the OpenVPN server, and the laptop will never get its ping response.

This problem can be addressed in one of two ways:

  1. by adding a forwarding rule to the router so any LAN traffic arriving at it that’s destined for any address in the VPN’s subnet will get forwarded to the OpenVPN server.
  2. by modifying the local routing table on the destination host so it knows to send any VPN subnet responses (such as the ping response in the example) directly to the OpenVPN server. The way to do this will differ depending on the destination host – you can just Google the method for your particular setup.
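To see why the extra route matters, here is a small Python sketch of how a host picks where to send a reply: it uses the most specific entry in its routing table that covers the destination. The subnets, host addresses, and descriptions are made up for illustration, and real routing tables carry more detail than this.

```python
import ipaddress

def next_hop(routes, destination):
    """Pick the most specific route (longest prefix) that contains the destination."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in routes if dest in net]
    net, hop = max(matches, key=lambda item: item[0].prefixlen)
    return hop

lan = ipaddress.ip_network("10.10.10.0/24")      # hypothetical home LAN
vpn = ipaddress.ip_network("192.168.5.0/24")     # hypothetical VPN address pool
default = ipaddress.ip_network("0.0.0.0/0")      # "everything else"

# Routing table of the destination host before and after adding the VPN route.
before = [(lan, "deliver directly on LAN"), (default, "router 10.10.10.1")]
after = before + [(vpn, "OpenVPN server 10.10.10.2")]

laptop_vpn_address = "192.168.5.6"
reply_before = next_hop(before, laptop_vpn_address)
reply_after = next_hop(after, laptop_vpn_address)
```

Without the VPN entry the ping reply falls through to the default route and heads for the router. With it, the reply goes to the OpenVPN server, which knows how to get it back to the laptop.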

IP Address Conflicts

Whether you’re using a bridged or a routed VPN you may encounter annoying IP conflict problems. To illustrate what I mean, consider that your laptop is connected to a local network such as a wireless network in a coffee shop, and has an IP address on that network that is used by all the other computers to identify your laptop on the network. This IP address is like a name given to your computer, so that the others on the network know how to address it. This can be done in one of two ways: either you have a DHCP server that chooses a name from a pre-defined “pool” and assigns it (“Welcome to our network, from now on we will call you Fred because none of us are using that name”), or you can hard-code the address and hope no one else is using it (“Hi there everyone, you can call me Fred”). In the latter case, if someone is using that name already, you’ll experience an IP Address conflict, which is what happens when you have two guys in the same room named Fred. Whenever someone on the network puts their hand up and says “I’ve got a message here for Fred”, that message could end up with one or the other (or both) of the Freds, and the one that gets it may very well be an unintended recipient. This is bad because you can’t reliably connect to either of the Freds in this case – you’d never know at any time which Fred you were talking to. Obviously this situation is easily resolved by changing your hard-coded IP address to one that isn’t in use; then your laptop will have a unique name on the network like every other computer and there’ll be no confusion as to which traffic is meant for whom.

But when you connect to a bridged VPN this problem can crop up again in a less obvious way. As soon as the bridge is made between the local and remote network, the traffic from both networks gets combined at the Data Link layer, which has no knowledge of IP addresses on account of being lower in the TCP/IP stack than the IP protocol (which, if you recall, resides at the Network layer). However all computers on both networks do have local IP addresses assigned to them. The new traffic flooding onto either network from the other is stamped with the IP addresses from the originating network, so once again you can see there’s a potential here for conflict. To re-use the analogy I used earlier, you have two teams of people being merged into one team, but nobody has checked to see whether any of the members might have the same names. The new combined team may end up having two Freds on it by accident.

This scenario is even more likely due to the fact that there are certain pools of IP addresses that are conventionally used in small networks, for example the ranges 192.168.0.*, 192.168.1.*, and 10.10.10.*. If you’re unlucky the local addresses on both ends of the VPN pipe could be getting drawn from the same pool, which will almost certainly result in conflicts. So to avoid this eventuality it’s recommended that you configure your home network IP address pool to be something a little unusual, such as 192.168.5.*. This doesn’t offer you any guarantees of avoiding conflicts, but it’s much less likely for a coffee shop to be assigning addresses from this pool than the other ones previously mentioned.

A routed VPN doesn’t protect you from these conflicts either. Say you have your home network configured to use 10.10.10.* addresses, and you configure OpenVPN to assign addresses from the pool 192.168.5.*. You likely won’t have a problem connecting your laptop to your OpenVPN server because they will talk to each other using addresses on the 192.168.5.* subnet, and that’s unlikely to be used by the coffee shop. But say you want to access another computer on your home network besides the OpenVPN server – then you’ll have to use its local address which would be something like 10.10.10.3. If you’re unlucky and the coffee shop you’re in happens to be assigning from the 10.10.10.* pool, it’s highly likely that some other machine in the coffee shop, such as someone else’s laptop, has already been assigned this address. So traffic leaving your computer destined for 10.10.10.3 might either get routed to the other customer’s laptop within the coffee shop, or it might get sent through the secure pipe to your home network. What actually happens in any given case is unclear and probably comes down to the luck of the draw – hardly what you’d consider to be a reliable connection. So again, regardless of the type of VPN you have implemented, it behooves you to configure your home network to use uncommon IP addresses to avoid these circumstances.
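A quick way to check for this kind of clash is to compare the address ranges directly. The Python sketch below uses the standard `ipaddress` module. The function name and the example ranges are my own, and the `/24` notation is just another way of writing pools such as 10.10.10.*.

```python
import ipaddress

def conflicts(home_lan, vpn_pool, local_lan):
    """Return which of your ranges overlap the network you are currently on."""
    local = ipaddress.ip_network(local_lan)
    candidates = (("home LAN", home_lan), ("VPN pool", vpn_pool))
    return [name for name, cidr in candidates
            if ipaddress.ip_network(cidr).overlaps(local)]

# Home uses 10.10.10.*, the VPN pool is 192.168.5.*.
unlucky = conflicts("10.10.10.0/24", "192.168.5.0/24", "10.10.10.0/24")
lucky = conflicts("10.10.10.0/24", "192.168.5.0/24", "192.168.1.0/24")
```

In the unlucky coffee shop the home LAN range is reported as clashing, which is exactly the case where traffic for 10.10.10.3 may never reach the secure pipe.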

Appendix A: The TCP/IP Protocol Stack

Protocols

To begin, a protocol in computers has a similar meaning to the definition in the usual sense. It’s a collection of strictly defined rules that, when adhered to exactly, allows two independent components to understand each other’s meaning. For example, I had a friend explain to me that in Japan, when “cheers”ing, it’s culturally understood that the socially “lower” person must clink with his glass rim positioned lower than the socially “higher” person. So as humans, if I were to clink with my glass held higher than an older businessman’s we’d say I made a gaffe, and if we were computers he might say “Error 0x04f5: respect not found”. The protocol in this case would be the rule “Thou shalt clink glasses with rims at levels respective to thy social rank”, which if observed on both sides would allow me and the Japanese businessman to get on together.

Or a simple technical example would be this – say a protocol for movie files contains the rule “the first number in the file indicates the frames per second of the movie, and the second number is the length of the movie in minutes”. So a file might start with {30, 15} to mean a 15 minute clip at 30 fps. Now if a movie player wasn’t following this protocol, it might read this file and think “this is a 30 minute file at 15 fps”, and then continue to read the file contents with this wrong idea in mind. Clearly at some point the movie player is going to get confused and the movie won’t get played as intended. The problem here is that the writer of the file was following a different protocol than the reader (ie. the movie player app) which means they couldn’t talk effectively to each other. (Side note: Strictly speaking, this would probably be more commonly understood as a file format rather than a protocol since these components are not actively talking to each other at the same time, but if the movies were being streamed rather than read from a file then this same example would describe a protocol.)
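To make the movie example concrete, here is a small Python sketch of a writer and two readers. The header layout (two 16-bit numbers) is an assumption made purely for illustration and doesn't correspond to any real movie format.

```python
import struct

# Hypothetical header layout, following the rule in the text:
# first number = frames per second, second number = length in minutes.
HEADER = struct.Struct("<HH")  # two unsigned 16-bit integers, little-endian

def write_header(fps, minutes):
    """Writer: packs the two numbers in the agreed order."""
    return HEADER.pack(fps, minutes)

def read_header(data):
    """Reader that follows the same protocol as the writer."""
    fps, minutes = HEADER.unpack(data[:HEADER.size])
    return {"fps": fps, "minutes": minutes}

def read_header_wrong(data):
    """Reader that assumes the two numbers are in the opposite order."""
    minutes, fps = HEADER.unpack(data[:HEADER.size])
    return {"fps": fps, "minutes": minutes}

header = write_header(30, 15)
print(read_header(header))        # {'fps': 30, 'minutes': 15}
print(read_header_wrong(header))  # {'fps': 15, 'minutes': 30}
```

Both readers see exactly the same bytes. The only difference is the set of rules each one applies to them, which is the whole point of a protocol.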

So in computers a protocol, which is just a set of rules, is said to have some dependencies, and also offers some services to the components that use it. Using the earlier example, a dependency of the “cheers” protocol is: “the two people must be seated together at dinner first” – if I ran into a business meeting in progress holding a glass of sake and started clinking with people, it’s unlikely to be well received regardless of how high I’m holding the rim. The service offered by this protocol, you could say, is some minor social cohesion – after clinking, me and the businessman feel more relaxed and amiable towards each other, that’s the whole reason we bothered doing it in the first place. So protocols in general rely on dependencies in order to deliver services – these things are a part of the protocol’s definition as well as the rules themselves.

Protocol Stack

A protocol stack, then, is a bunch of protocols that work together to deliver more complex services than any of them could individually. Each protocol in the stack has a different job that it is concerned with. It delivers services to the next protocol up the stack, and is dependent on the protocol underneath it in the stack.

This is easy to understand if you think of how things work when you buy a DVD player online. Imagine if when you submitted your online order there was a guy sitting in the warehouse watching what you’re typing, who then goes over, picks up your DVD player, tosses it in the back of his pickup truck and drives it over to your place (after going down to the bank to get the money out of your account for it first). Clearly this way of doing things would never work in the real world – you’d need a dedicated multi-skilled employee for every DVD player sold!

The way it really works is: there’s an IT department managing the web page, a sales department taking care of the money transaction and managing orders, a warehouse managing inventory, a delivery company to bring it to your door, and who knows what other groups involved in getting that DVD player to you. Each group is just doing its piece of the overall job and it understands the rules it needs to follow to get that job done. The sales department depends on the fact that the paid orders it records are going to be serviced by the rest of the infrastructure, and it offers the service of handling the money exchange for the website. The delivery company offers a service, and delivers that service (you could say) to the warehouse, which is the next department up in the stack. The user typing at the computer is at the top of the stack, because he isn’t offering any services to anyone, and is dependent on all other layers to deliver his DVD player.

 

Stacked Services For Delivering a DVD Player
Fig. 3 – Possible stack of services to handle online purchase of a DVD player

 

So a protocol stack is no different. If you’ve ever thought that it must be kind of complicated to get the instant message you just typed to display on a monitor halfway across the world, you’re right! There are several different stacked protocols working just on your computer alone to make this happen, each doing a different job that contributes to those bits going across the world in the way you intended it to happen when you typed the message. That stack is what’s called the TCP/IP protocol stack and it’s used across the board by every Internet-enabled application on your computer to satisfy its own communication needs.
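The layering idea can be sketched in a few lines of Python. This is a toy model only: the layer names are simplified labels and the "headers" are just text, whereas real protocols add binary headers with many more fields. Each layer wraps whatever the layer above handed it, and the matching layer on the receiving computer unwraps it again.

```python
def wrap(layer, payload):
    """Each layer adds its own header around whatever the layer above gave it."""
    return f"[{layer}|{payload}]"

def unwrap(layer, data):
    """The matching layer on the receiving side strips that header off again."""
    prefix = f"[{layer}|"
    if not (data.startswith(prefix) and data.endswith("]")):
        raise ValueError(f"{layer} layer cannot understand: {data!r}")
    return data[len(prefix):-1]

# Simplified layer labels for illustration, listed from top to bottom.
LAYERS = ["application", "transport", "internet", "link"]

def send(message):
    data = message
    for layer in LAYERS:            # travel down the stack
        data = wrap(layer, data)
    return data

def receive(data):
    for layer in reversed(LAYERS):  # travel back up the stack
        data = unwrap(layer, data)
    return data

on_the_wire = send("hello")
print(on_the_wire)           # [link|[internet|[transport|[application|hello]]]]
print(receive(on_the_wire))  # hello
```

Notice that no layer needs to understand what is inside the payload it was given. It only has to trust that the layer below will deliver it.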

If you get the idea of a protocol stack then it should make sense when I say that bridging occurs at a lower level in the stack than routing does, and that’s the difference between them. The software needed to implement Bridging has to run at the second level of the stack, called the “Link layer”, whereas Routing takes place at the next level up – level three, which is the “Network” or “Internet” layer. In other words, the two types of VPN use different protocols to accomplish the same end goal (ie. sending encrypted data back and forth over the internet between two specific computers). This is explained in more detail in the main article.

 

Instant Messaging Using the TCP Stack
Fig. 4 – The TCP/IP stack layers protocols for transmitting all Internet data, such as Instant Messages

Appendix B: The Internet Protocol at a Glance

Rather than just moving frames between local devices as is done in a bridged network, the IP protocol is involved with routing packets anywhere in the world via the Internet. How is this done? Using a standardized addressing scheme that all Internet computers adhere to, much like the one used by paper mail systems worldwide. For this to work all destination computers have a globally unique address that the sender can find out. This address is called a…yes, you guessed it, “IP address”, which is just a number between 0 and 4,294,967,295 (Actually, the later version of the IP protocol, IPv6, has increased this to an even huger number to handle the sheer size of the Internet, since it must be bigger than the total number of computers on the Internet. But most computers these days are still using this old limit for many applications). IP addresses are typically expressed as four smaller numbers (each less than 256) put together and separated by dots such as 173.194.37.104.
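To see that the dotted form and the single big number really are the same thing, here is a short Python sketch that converts between the two. The function names are my own, and the result is cross-checked against Python's standard ipaddress module.

```python
import ipaddress

def dotted_to_int(address):
    """Combine the four dotted numbers into the single 32-bit number they stand for."""
    a, b, c, d = (int(part) for part in address.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def int_to_dotted(number):
    """Split a 32-bit number back into four numbers, each less than 256."""
    return ".".join(str((number >> shift) & 0xFF) for shift in (24, 16, 8, 0))

n = dotted_to_int("173.194.37.104")
print(n)
print(int_to_dotted(n))                                    # 173.194.37.104
print(n == int(ipaddress.IPv4Address("173.194.37.104")))   # True

# The largest possible address corresponds to the upper limit quoted above.
print(int_to_dotted(4294967295))                           # 255.255.255.255
```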

The IP protocol also defines a series of rules that all Internet-enabled computers use to pass data around the globe. This seemingly impossible task is done by arranging all computers in a global hierarchy and segmenting the IP address so that different parts of the address refer to different levels of the hierarchy. The end result is that each packet gets passed from computer to computer as it traverses the Internet until it reaches its intended destination – this only works when all computers stick to the forwarding rules defined in the protocol. It also means that information going across the Internet passes through any number of strange computers, where malicious people can potentially do unscrupulous things with it if they’re able to read it.

So as I mentioned, up until recently there’s been a limit of around 4.3 billion possible IP addresses. If this seems small considering this number has to account for every single Internet-enabled computer in the whole world, you’re right. In practice the demands on addressing are reduced by isolating local area networks (LANs) from the Internet. For example, most home computers are attached to LANs in which only one machine is actually connected to the Internet (often a broadband modem in home networks). So even if there are 20 computers on the network only one address is used for the whole lot of them. Consider how everyone working for a company can have their mail sent to the front desk, and the company will route it internally to each employee’s desk. In this way all employees of the company effectively share a single mailing address – LANs are typically set up to use a similar scheme.

Networks arranged in hierarchies, such as the Internet, make use of subnets to help with routing traffic properly. Subnets are basically just pools of IP addresses that are assumed to be a part of the same network. For example, your home network might make use of the 10.10.10.* subnet, which means all your computers have addresses drawn from this pool, and your router knows that any traffic destined for an IP address beginning with 10.10.10 will get routed inside your network regardless of the last number attached to it. Similarly, as described in the main article, in a routed VPN OpenVPN defines a subnet to represent the remote network, and will automatically encrypt any traffic destined for, or decrypt any traffic arriving from, that subnet.
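Here is a simplified Python sketch of the decision a machine makes based on subnets. It assumes the home network uses 10.10.10.* (written 10.10.10.0/24 in "slash" notation) and a hypothetical VPN subnet of 192.168.5.*. A real routing table holds more detail than this, so treat it as an illustration of the idea only.

```python
import ipaddress

# Hypothetical subnets: the home LAN from the example and a VPN subnet.
LOCAL_SUBNET = ipaddress.ip_network("10.10.10.0/24")
VPN_SUBNET = ipaddress.ip_network("192.168.5.0/24")

def choose_route(destination):
    """Pick where to send a packet based on which subnet its address falls in."""
    address = ipaddress.ip_address(destination)
    if address in LOCAL_SUBNET:
        return "local network"
    if address in VPN_SUBNET:
        return "VPN tunnel (encrypted)"
    return "Internet gateway"

for dest in ["10.10.10.3", "192.168.5.6", "173.194.37.104"]:
    print(dest, "->", choose_route(dest))
```

Only the prefix of the address matters for the decision. The last number just picks out one machine within the chosen subnet.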

Network Address Translation (NAT)

In order for a company to deliver mail internally to the proper desks, it needs an internal addressing scheme (such as office building/floor/office number/name, for example) that’s different from the mailing address the Post Office delivers to. So the front desk will receive mail that was posted to the public company address, and it will somehow stamp it with an internal address before sending it via inter-office mail to the intended recipient. The same holds true in networking (see Fig. 5), but the local network actually uses the same addressing scheme as the public one (Note the addresses themselves would be different for Internet than for local traffic). That is to say LANs also use the Internet Protocol to route traffic even when it’s not going across the Internet.

So each computer on the LAN has one IP address, except for the Internet gateway computer, which has two: a local address and a global address. This global address is shared by all computers on the LAN to receive traffic from the Internet. Traffic from the Internet arrives at the gateway stamped with the global address (ie. the company’s mailing address), and the gateway has to figure out which LAN computer is the intended recipient and stamp the local address on it before forwarding it internally (ie. using inter-office mail in the example). This process is called Network Address Translation (NAT) because it translates the global address into a local one for inbound traffic, and does the opposite for outbound traffic.

Note that if a computer is doing NAT, it’s acting as a router because it is actively defining the route packets will take across the network. A NAT router can change the address of any packet, but it doesn’t have to, depending on where it wants to send it.
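The translation step can be sketched as a toy Python class. This is a heavily simplified model and not how any particular router is implemented: it assumes the gateway tells connections apart by port number, which is a detail not covered in this article, and the class name, port numbers and public address (taken from a range reserved for documentation) are all made up for illustration.

```python
class ToyNat:
    """Very simplified NAT gateway that tracks connections by port number."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.table = {}  # public port -> (local ip, local port)

    def outbound(self, local_ip, local_port):
        """Rewrite the source address of a packet leaving the LAN."""
        for public_port, entry in self.table.items():
            if entry == (local_ip, local_port):
                return self.public_ip, public_port  # reuse the existing entry
        public_port = self.next_port
        self.next_port += 1
        self.table[public_port] = (local_ip, local_port)
        return self.public_ip, public_port

    def inbound(self, public_port):
        """Look up which LAN computer a packet arriving from the Internet belongs to."""
        return self.table.get(public_port)  # None means no matching entry


nat = ToyNat("203.0.113.7")               # documentation-only example address
print(nat.outbound("10.10.10.3", 5000))   # ('203.0.113.7', 40000)
print(nat.outbound("10.10.10.4", 5000))   # ('203.0.113.7', 40001)
print(nat.inbound(40001))                 # ('10.10.10.4', 5000)
print(nat.inbound(12345))                 # None
```

The table plays the role of the front desk's list of who sits where: without an entry, inbound traffic has no internal address to be stamped with.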

 

Network Address Translation is Like the Mail System
Fig. 5: NAT replaces packets’ public addresses with private LAN addresses, much as a company employs an inter-office mail system

Appendix C: Glossary of Terms

3G 3rd Generation cellular phone communication standard.
Client A role played by a piece of software in client-server style communication. Any number of clients can initiate connections to a shared server in order to receive services from it.
Connection An active data channel between two pieces of software that allows them to communicate freely until the connection is broken.
Decryption A procedure to return a previously-encrypted piece of data to its original state.
DHCP The Dynamic Host Configuration Protocol. A protocol used to automatically assign IP addresses to computers on a network.
Encryption A procedure to scramble data into an unreadable form that requires a decryption key to reverse. The decryption key will only be held by the intended recipient, allowing them but nobody else to retrieve the original form of the data.
File format Precise definition of each element in a type of file that if followed by the file writer, allows any reader to properly interpret the data in files of the type.
Firewall A software program that analyzes data arriving at or leaving from a network connection and either permits or denies passage of each packet.
Frame A unit of data on a packet-switched network, containing a payload as well as addressing information.
Gateway The computer in a local area network that has a connection to the Internet in addition to the local connection.
Insecure When discussing security, any insecure component is one that would potentially allow a malevolent outsider to read or otherwise compromise the network.
Internet Layer In the TCP/IP stack this is the layer that implements the Internet protocol (IP).
Internet Protocol (IP) The protocol that defines how traffic is routed across the Internet. It includes things like the format of mandatory headers included in each packet, the Internet addressing scheme using IP addresses, and routing behaviour.
IP address An address, usually expressed as 4 numbers between 0 and 255 separated by periods (eg. 209.85.146.103), that’s assigned to any machine on a network that allows that machine to be uniquely identified by the IP protocol.
LAN A Local Area Network, for example a home network, as opposed to a public network such as the Internet.
Link layer In the TCP/IP protocol stack the Link layer implements the Ethernet protocol that defines how traffic is routed around a LAN.
local The word local in computers is a non-specific term referring to scope, the same as it does in the general sense. Local activity occurs in the immediate vicinity of the user, as opposed to remote activity which is physically elsewhere. Depending on the context this could mean the user’s LAN, the user’s PC, or even the scope of an individual application.
MAC address A Media Access Control Address. This is a globally unique address that’s hard-coded in network hardware (such as network adaptors) by the manufacturer. Low-level protocols in the Link layer use it to identify devices that are physically present in the LAN.
Network Address Translation (NAT) A process wherein a router consistently changes (ie. translates) the IP address in a network packet to another address based on an associative table it maintains.
Network bridge A software component that bridges the traffic passing through two or more individual network adaptors. See the main article “A Bridged VPN” for details on how a network bridge works.
network adaptor (or network card) A hardware device that allows a computer to connect to a LAN. This might be a wired connection, in which case the adaptor has a port to receive a network cable, or a wireless (Wifi) connection.
Network layer For the purposes of this article, another name for the Internet layer.
packet See definition for “Frame”.
packet sniffer An application that displays the contents (in text) of network packets arriving at or leaving from a particular network adaptor. These programs are extremely useful for debugging network problems as they allow the administrator to examine any and all network data passing through a particular computer.
PC Personal Computer.
ping A simple application/protocol that sends a tiny packet of data to a target computer. When any computer receives a ping packet it replies to the sender with a response packet. Ping is useful for testing which computers can talk to each other on a network. If a ping does not receive a response, the connection may be blocked (for example, by an intervening firewall preventing the computers from talking to each other).
pipe A term often used to describe an established data connection between two specific endpoints on a network.
port forwarding A feature of routers where specific ports can be associated with network hosts so that any inbound traffic sent to that port will get automatically forwarded to the associated host by the router.
remote A general term meaning “physically elsewhere”. See the definition for “local”.
Router A computer, often with multiple network connections, that actively moves traffic between hosts on a network. Routers have to decide where to send each packet based on their own configuration and the contents of the IP headers in the packet.
Routing The process of moving traffic around a network. Typically routing behaviour is defined by the IP protocol and the router’s local configuration.
secure When discussing security, a secure component is one that is protected against attacks from an external assailant (such as a hacker).
server A role played by a piece of software in client-server style communication wherein the server accepts requests from arbitrary clients and delivers services to the clients.
service A service in software is any functionality that adds value to an application. Usually services are offered by dedicated software components, and consumed by clients of the component.
subnet A part of the IP network hierarchy wherein all IP addresses in the subnet share a prefix. For example, the computers on a home network having addresses starting with 10.10.10.* define a subnet.
tap device A virtual device created by an OpenVPN installation that implements a VPN tunnel using the Ethernet protocol.
TCP/IP Transmission Control Protocol/Internet Protocol. These are two original protocols which form the basis for Internet communication. Nowadays TCP/IP can also refer to a suite of protocols including these two as well as others.
TCP/IP protocol stack See Appendix A for a description of the TCP/IP protocol stack.
Tunneling device (tun) A virtual device created by an OpenVPN installation that implements a VPN tunnel using the IP protocol.
WAN Wide Area Network. This refers to a large network, usually the Internet, as opposed to a LAN.
Wireshark A popular free packet sniffer application.
