Posts tagged ‘numbers’

2009-08-03

The 64-bit Difference

I was just reading about the limitations of the WAV audio format.

The WAV format is limited to files that are less than 4 GB in size, because of its use of a 32-bit unsigned integer to record the file size header (some programs limit the file size to 2–4 GB). Although this is equivalent to about 6.8 hours of CD-quality audio (44.1 kHz, 16-bit stereo), it is sometimes necessary to exceed this limit, especially when greater sampling rates or bit resolutions are required. […] Its 64-bit header allows for much longer recording times.

I got to thinking about computer memory and the difference in capacity between N-bit systems. A computer uses an address to access different parts of its memory. The address is a number (internally, ones and zeros), which for a 32-bit system (where addresses are always 32 bits long) would look something like “af34c97b” written in radix 16. A 32-bit system uses these addresses to look up places in memory. Each address stands for a particular byte, so obviously, if we only have 32-bit addresses, we can’t look beyond the address consisting of 32 ones in a row, since that is the largest value 32 bits can hold.

Think of mailing a letter: you can send it to anyone you want using only two digits for the house or apartment number. You would be able to send it to (0)1-99 Blah Blah St., but not to the guy living at the end of the street at no. 100. Memory addresses work the same way.

Let’s do some math now. Say your system works with 32-bit memory addresses. That means the largest value we could have (the farthest down the street we could send the letter) would be 1111 1111 1111 1111 1111 1111 1111 1111, or FFFF FFFF in hexadecimal. Let’s write this figure out in a unit we’re more familiar with, such as gibibytes (GiB), or, as they are more commonly (but incorrectly) known: gigabytes (GB). 1 GiB = 1024^3 bytes; 1 GB = 1000^3 bytes.

Addresses run from 0 up to FFFF FFFF, which gives us 2^32 of them, one per byte. In GiB that is 2^32 / 1024^3 = 2^32 / (2^10)^3 = 2^32 / 2^30 = 2^(32-30) = 2^2 = 4 GiB.

You might have heard already that 32-bit systems can only handle 4 GiB of memory, and now you hopefully know why if you didn’t already. Now then, what happens if we double the address width and make it a 64-bit system?

FFFF FFFF FFFF FFFF in GiB would be 2^64 / 2^30 = 2^34 = 17179869184 GiB, or 16 exbibytes (EiB). A MASSIVE amount of memory. As you can see, doubling the address width does not double the memory space; it squares it: 2^64 = (2^32)^2, so we get the number of bytes in 4 GiB raised to the power of 2. 4 GiB = 4294967296 bytes, and 16 EiB = 18446744073709551616 bytes. These numbers are obviously incomprehensible, so I thought it would be easier to demonstrate them with an example based on the Wikipedia article quoted at the top of this post.
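If you’d rather let a computer do the arithmetic, a quick, throwaway Python sanity check (nothing more than that) confirms the figures:

    # An N-bit address can name 2**N distinct bytes.
    GIB = 2 ** 30  # bytes in one gibibyte (1024**3)
    EIB = 2 ** 60  # bytes in one exbibyte (1024**6)

    print(2 ** 32 // GIB)  # 4            -> 4 GiB with 32-bit addresses
    print(2 ** 64 // GIB)  # 17179869184  -> the same figure as above
    print(2 ** 64 // EIB)  # 16           -> 16 EiB with 64-bit addresses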

As the quote says, a 4 GB (actually GiB) WAV file (with a 32-bit file size field) gives us 6.8 hours of music at a sampling rate of 44.1 kHz, a bit depth of 16 bits, and 2 channels (stereo).

If we assume the file size is proportional to the playing time (keeping the quality settings the same), we can calculate the playing time of a WAV file with a 64-bit file size field:

17179869184 [GiB] * (6.8 [hours] / 4 [GiB]) [hour-to-filesize ratio] = 29205777612.8 hours of music.

This number is still incomprehensible, so let’s walk up the ladder of time units, shall we? Note that when calculating the number of years, we will use a year length of 365.2425 days, which is the mean length of a year in the Gregorian calendar: its 400-year cycle contains 146097 days, and 146097 / 400 = 365.2425 days. This takes leap years into account. One could instead stretch each of the 365 days to 24 hours and roughly 57.4 seconds (the extra 0.2425 days, i.e. 20952 seconds, spread over 365 days), but that doesn’t feel as nice, somehow.

29205777612.8 hours
= 1216907400.533333333 days
= 173843914.361904762 weeks
= 39981351.585316605 months (average of 30.436875 days/month in one 365.2425-day year)
= 3331779.298776384 years
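If you don’t trust my calculator (I wouldn’t either), the whole conversion chain fits in a few lines of Python:

    # The playing-time ladder from above, as a quick sanity check.
    hours  = 17179869184 * (6.8 / 4)   # GiB times hours per GiB, ≈ 29205777612.8
    days   = hours / 24                # ≈ 1216907400.53
    weeks  = days / 7                  # ≈ 173843914.36
    months = days / (365.2425 / 12)    # 30.436875 days per month, ≈ 39981351.59
    years  = days / 365.2425           # ≈ 3331779.30
    print(f"{years:,.2f} years")       # roughly 3.33 million years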

So we see that just by doubling the number of bits, we go from 6.8 hours of music (which I guess you could plough through on a really dull and long bus ride) to more than 3.33 million years of music.

That, my friend, is the 64-bit difference.

… hmm? What was that about 128 bits? Shut up. 😦

No but really, to fill a 128-bit hard drive would take more energy than it would to boil the oceans of the earth. Theoretical breakdown. Enjoy.

2009-07-18

Open-Source Calculus

I’m (re(re))taking the second course on calculus during the summer at the university. It’s going much better this time around, which is a good thing.

I did have problems with one problem (har): I couldn’t make sense of it in my messy notebook, and I didn’t feel like doing the entire problem over again. The only thing wrong with my solution was that the answers in the back of the book said term1 – term2, while I kept getting term2 – term1, so I wasn’t too far off, but I still couldn’t find the erring minus sign. All I really felt like doing was some programming, which I enjoy, but what would I code if I had no software needs?

Then I thought: I need a more structured view of this problem. Why not write it up in LaTeX and make it into a nice, good-enough-to-print PDF solution? That way I get some practice writing LaTeX documents (it’s been a few months, sadly), and writing LaTeX is pretty much programming in a way, so I get to scratch my coding itch and maybe find out where that offending minus sign went.
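For the curious, the write-up is nothing fancier than a bare-bones amsmath document along these lines (the integral here is just a placeholder, not the actual exercise from the book), compiled with pdflatex:

    \documentclass[a4paper,11pt]{article}
    \usepackage{amsmath}

    \begin{document}

    \section*{Problem 6.2.9}

    % The real problem statement and solution go here; this is only a skeleton.
    \begin{align*}
      \int x e^{x} \,dx
        &= x e^{x} - \int e^{x} \,dx   && \text{integration by parts} \\
        &= x e^{x} - e^{x} + C.
    \end{align*}

    \end{document}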

Sure enough, it worked pretty well.

Solution to problem 6.2.9 in Calculus, A Complete Course (Sixth Edition).

I want to point out that this PDF was produced entirely using Free and Open-Source Software (FOSS):

  • TeX Live LaTeX distribution (TeX-to-PDF compiler: pdflatex/pdftex)
  • GNU Emacs as the LaTeX editor
  • Ubuntu Linux to run Emacs and the TeX compiler
  • Totem to play music while writing up the solution
  • Grip + oggenc to rip CDs to Ogg Vorbis
  • the Ogg container format with the Vorbis audio codec
  • libvorbis 1.2.0 used by oggenc (darn you Ubuntu for not updating libvorbis since 2007)
  • and so forth.

It works, folks!

2009-01-26

Endianness

Once again I came across the term endianness. I’ve never really cared enough, or felt I had the general knowledge of the field, to understand what endianness really means, but today I finally felt differently. Endianness has an article on Wikipedia, and I decided to read some of it and finally get an understanding of the term.

Endianness basically has to do with what comes first. Take the number 128 as an example. The “1” is the most significant digit, not because the digit itself is large, but because its position carries the greatest weight. The same goes for any positional notation in any base. Take the binary number 00100100, for example (the ASCII code for “$” and Bender’s apartment number). The first 0 represents 0 * 128, the second one 0 * 64, then the first 1 represents 1 * 32, and so on. As you can see, the farther we go to the right, the smaller the value of the position. This is called big-endian order, that is, the most significant information comes first. Then there is little-endian order, which would quite simply write the number 128 as 821 while still meaning the same value.

Big-endian and little-endian order can be important to deal with when writing a computer program, especially for applications that communicate over a network and run on different architectures. This is discussed in the Wikipedia article. You can, however, get lucky, for example with our dollar sign: 00100100 is a palindrome, which means it reads the same backwards. Most words and numbers are not palindromes.
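To make that concrete, here is a small Python sketch showing the same multi-byte value laid out in both byte orders (the example value 0x0A0B0C0D is arbitrary, chosen only because every byte is distinct):

    import struct
    import sys

    # 0b00100100 is 36, the ASCII code for "$" (and Bender's apartment number).
    print(int("00100100", 2), ord("$"))    # 36 36

    # Byte order only matters once a value spans more than one byte.
    value = 0x0A0B0C0D
    print(struct.pack(">I", value).hex())  # big-endian:    0a0b0c0d
    print(struct.pack("<I", value).hex())  # little-endian: 0d0c0b0a

    # And whatever the machine running this happens to use:
    print(sys.byteorder)                   # e.g. 'little' on x86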

Now we get to the practical part of endianness in regular writing and reading on a piece of paper. Say you’re reading an article on astronomy and it gives you some astronomical number that has tonnes of digits, and the writer doesn’t use prefixes or scientific notation because let’s say it’s a popular-science magazine and the target readers aren’t used to either of them.

Say the article talks about a distance in space of 819273987123781233 km. That’s fun to read. If you were reading the article aloud to a friend, you’d likely take a few seconds first to determine how big that number actually is (millions/billions/trillions/etc.), and then start slowly traversing it. Now, we long ago invented something called the thousands separator, which transforms the huge number into something slightly more readable, but not by much: 819,273,987,123,781,233. The problem is that we don’t see big numbers like this often enough to “see” how big they are immediately. If we see the number “100,000”, or even “100000”, we can determine its true size much faster, because those numbers are much more common. But not this one.
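As an aside, if you ever need to print numbers like this from a program, modern Python will insert the separators for you with a one-character format spec:

    n = 819273987123781233
    print(f"{n:,}")  # 819,273,987,123,781,233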

The number is spoken as: “eight-hundred and nineteen quadrillion two-hundred and seventy-three trillion nine-hundred and eighty-seven billion one-hundred and twenty-three million seven-hundred and eighty-one thousand two-hundred and thirty-three”. Not only does it take long to say, but even longer to form the phrase in your head when you only have the digits to start with. What would seemingly make things easier would be to move to a little-endian notation where the smallest digit comes first, so that we would write 332,187,321,789,372,918 while still meaning the same value as before. However, this would force us to say “three, thirty, two-hundred, one thousand, eighty thousand, seven-hundred thousand, three million” and so on. Even if it lets us start reading and saying the number sooner, it is still as inefficient as the old way, or worse, since we have to say “thousand” and “million” and “billion” and so on for each digit we come to.

This is where I propose the adoption of a more peculiar style, which I call little-endian thousands-separated notation. It builds on the fact that we like to group things by thousands, and by powers of one thousand, in our numbering system. The basic idea comes in two flavours: either keep the pure little-endian digit order and read the first group 332 as “two-hundred and thirty-three”, or use the even more peculiar, but most likely underestimated, notation that reverses only the groups: 233,781,123,987,273,819, which you would read as “two-hundred and thirty-three seven-hundred and eighty-one thousand one-hundred and twenty-three million nine-hundred and eighty-seven billion two-hundred and seventy-three trillion eight-hundred and nineteen quadrillion”. This way we compress not only the way we think the huge number in our head but also the way we say it. We also get to start reading and saying the number right away, without having to scan more than three digits at a time, which would come more and more naturally as the notation becomes more widely adopted. As an added bonus, saying really huge numbers could add excitement: as it is now, we say the largest part first, thereby ruining the surprise of just how big the number really is.
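Just for fun, here is a toy Python sketch of both variants (the function names, and the whole idea of automating this, are of course just my own invention):

    def little_endian_groups(n: int) -> str:
        """Reverse the order of the three-digit groups, keeping each group intact."""
        groups = f"{n:,}".split(",")        # ['819', '273', '987', '123', '781', '233']
        return ",".join(reversed(groups))

    def little_endian_digits(n: int) -> str:
        """The pure digit-reversed variant, regrouped into threes from the left."""
        digits = str(n)[::-1]
        return ",".join(digits[i:i + 3] for i in range(0, len(digits), 3))

    print(little_endian_groups(819273987123781233))  # 233,781,123,987,273,819
    print(little_endian_digits(819273987123781233))  # 332,187,321,789,372,918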