Vagaries: The Internet Wiggles

Question

Why does the Confidence Factor℠ change sometimes?

Also why does it sometimes take longer than other times to get a webpage? And why does a webpage, or part of it, sometimes fail to load at all, but a refresh fixes it?

And while we are at it, why does my email sometimes arrive right away, and sometimes take minutes or even hours?

Answer

The answer is "Internet Vagaries". Vagaries is a term we coined to lump together all the excuses the Internet has for these things.

These questions get at the core of how the Internet, and email even more, are designed. Unlike an engine or a gear box, the pieces of the Internet are not directly coupled together. They are loosely coupled, which makes them substantially more robust.

To understand why, let's first look at how the Internet works.

How the Internet Works

Data goes across the Internet in little pieces, called packets. The Internet was designed by the US Department of Defense (DARPA) to get these packets from point A to point B even if something breaks in between them. Let us use a fantasy analogy to illustrate this.

Imagine a sports stadium full of people and ushers who have all agreed to pass notes for each other. Now let's say a husband in section 132 row 12 seat 4 wants to send a note to his wife in section 401 row 3 seat 22. Notice how the husband has a better seat than his wife? Chauvinist pig!

Anyway, the husband can't reach his wife directly -- she is two tiers and many sections away around the stadium. So he hands the note to the person next to him, who hands it to the person next to them, and so on until the note reaches the end of the row.

That person doesn't know how to reach the wife, so they hand it to the person behind them. Eventually the note reaches the back row, where it gets handed to the usher. The usher knows that section 401 is closer in a clockwise direction, so he hands the note to the usher in the next clockwise section. And so on until the note reaches the usher for section 401. The usher hands the note to people in the section who pass it to the wife.

Turns out this is a very fail-safe way to move notes. If the person next to you is up getting a snack, you just hand it to the next person over. If an usher is on break, the note is just passed to the next usher. If a whole area is empty (bad team, can't fill the stadium), the person with the note gets up and walks it across.

Each person (node) in a path between husband and wife doesn't know about the whole path, but everyone knows how to get the note closer to where it belongs. The ushers do know how the sections are laid out -- these are the core routers on the Internet that know where big blocks of hosts (seats) are.

Hosts connected to the Internet know what to do with packets going to other hosts near them, and they have a default route where they hand packets they don't know what to do with. Packets go up a local path to a core router (an usher), which sends them to another core router, which sends them back down a local path.
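For readers who like to see the idea in code, here is a minimal Python sketch of how a host might pick where to hand a packet. The routing table, the addresses, and the next-hop descriptions are all made up for illustration; real hosts keep this table in the operating system. The sketch picks the most specific matching route and falls back to the default route (0.0.0.0/0) when nothing closer matches.

```python
import ipaddress

# Hypothetical routing table for one host: (destination network, next hop).
# Every entry is invented for illustration only.
ROUTES = [
    (ipaddress.ip_network("192.168.1.0/24"), "deliver directly (same row)"),
    (ipaddress.ip_network("10.0.0.0/8"), "hand to 10.0.0.1 (nearby section)"),
    (ipaddress.ip_network("0.0.0.0/0"), "hand to default gateway (the usher)"),
]


def next_hop(destination: str) -> str:
    """Return the next hop for a destination IPv4 address."""
    addr = ipaddress.ip_address(destination)
    # Collect every route whose network contains the destination.
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    # The most specific route is the one with the longest prefix.
    # The default route (prefix length 0) always matches, so it wins
    # only when nothing more specific does.
    _, hop = max(matches, key=lambda match: match[0].prefixlen)
    return hop


if __name__ == "__main__":
    for dest in ("192.168.1.7", "10.2.3.4", "8.8.8.8"):
        print(dest, "->", next_hop(dest))
```

Notice that the host never needs to know the whole path. It only needs to know who is one step closer, which is exactly what each person in the stadium knows.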

The Internet Wiggles

So how does the stadium "wiggle"? People are always leaving their seats. Ushers are always taking breaks. Another note from the same husband to his wife could be handled by different people. The note could go counter-clockwise, or go up and then over instead of over then up, and so on.

So a note from and to the same places can take different times to get there. And sometimes people drop notes or the usher gets really busy and loses one.

Bottom line, the design of the Internet makes data transfers reliable but inconsistent -- they "wiggle". The effect of this wiggle is the inconsistency in webpage loads and email arrival times. We coined the term "Internet Vagaries" as the reason for these inconsistencies.

Email is an even worse case for wiggles because it takes a draconian approach to dealing with Internet Vagaries. Email is quick to give up on a delivery attempt if a packet gets lost or takes too long to get there, because all email systems are programmed to try again and again to deliver a message. Quitting is what email often does, knowing it will try again later.
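To see how quitting and retrying turns into "minutes or even hours", here is a small Python sketch. The retry schedule is an assumption made up for this example; every mail server has its own configured schedule. The function adds up the waits before the first attempt that succeeds.

```python
from typing import Optional, Sequence

# Hypothetical retry schedule: minutes to wait after each failed attempt.
# Real mail servers use their own configured schedules.
RETRY_WAITS_MINUTES = (5, 15, 30, 60)


def delivery_delay_minutes(
    attempt_succeeds: Sequence[bool],
    waits: Sequence[int] = RETRY_WAITS_MINUTES,
) -> Optional[int]:
    """Return minutes of delay before delivery, or None if never delivered.

    attempt_succeeds[i] says whether the i-th delivery attempt works.
    """
    delay = 0
    for i, succeeded in enumerate(attempt_succeeds):
        if succeeded:
            return delay
        if i >= len(waits):
            # The schedule is used up, so the sender gives up for good.
            return None
        # Quit this attempt and wait before trying again.
        delay += waits[i]
    # Ran out of listed attempts without a success.
    return None


if __name__ == "__main__":
    print(delivery_delay_minutes([True]))                       # no wiggle
    print(delivery_delay_minutes([False, True]))                # one wiggle
    print(delivery_delay_minutes([False, False, False, True]))  # bad day
```

With this made-up schedule, a message that gets through on the first try arrives right away, one wiggle costs 5 minutes, and three wiggles in a row cost 50 minutes.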

CheckTLS tests, on the other hand, are real-time and are looking for failures, so a retry affects the result. A quit or even just a random wiggle can change a test result.

At CheckTLS, we handle this by ignoring an inconsistent test result. For particular tests that we run daily, we don't worry about a change unless it happens two or three days in a row. For time-critical tests, we schedule the test to run three times separated by five minutes. If all three fail, we know it was not Internet Vagaries but a real change in results.
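If you want to apply the same three-strikes idea to your own monitoring, a minimal Python sketch might look like the following. This is not CheckTLS code. The function name, its parameters, and the `run_test` callable (which should return True when the test passes) are all hypothetical placeholders for whatever check you run.

```python
import time
from typing import Callable


def is_real_failure(
    run_test: Callable[[], bool],
    attempts: int = 3,
    delay_seconds: float = 300,  # five minutes between runs
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True only if every attempt fails.

    run_test is a hypothetical callable that returns True on a pass.
    """
    for attempt in range(attempts):
        if run_test():
            # A single pass means the earlier failures were likely wiggles.
            return False
        if attempt < attempts - 1:
            # Wait before the next run; no wait is needed after the last one.
            sleep(delay_seconds)
    return True


if __name__ == "__main__":
    # Simulated results: fail, then pass. No real waiting in this demo.
    results = iter([False, True])
    print(is_real_failure(lambda: next(results), sleep=lambda s: None))
```

The sketch stops as soon as one run passes, since the remaining runs could not change the verdict. Passing in `sleep` as a parameter is only there so the logic can be tried out without actually waiting ten minutes.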