Archive for November 2011
Hash Tables
Hash tables are an efficient implementation of a keyed array data structure,
a structure sometimes known as an associative array or map. If you're working
in C++, you can take advantage of the STL map container
for keyed arrays implemented using binary trees, but
this article will give you some of the theory behind how a hash table
works.
Keyed Arrays vs. Indexed Arrays
One of the biggest drawbacks to a language like C is that there are no
keyed arrays. In a normal C array (also called an indexed array), the only way
to access an element would be through its index number. To find element 50 of
an array named "employees" you have to access it like this:
1
| employees[50]; |
In a keyed array, however, you would be able to associate each
element with a "key," which can be anything from a name to a product
model number. So, if you have a keyed array of employee records, you
could access the record of employee "John Brown" like this:
1
| employees["Brown, John"]; |
One basic form of a keyed array is called the hash table. In a hash table, a
key is used to find an element instead of an index number. Since the hash table
has to be coded using an indexed array, there has to be some way of
transforming a key to an index number. That way is called the hashing function.
Hashing Functions
A hashing function can be just about anything. How the hashing function is
actually coded depends on the situation, but generally the hashing function
should return a value based on a key and the size of the array the hashing
table is built on. Also, one important thing that is sometimes overlooked is
that a hashing function has to return the same value every time it is given the
same key.
Let's say you wanted to organize a list of about 200 addresses by people's
last names. A hash table would be ideal for this sort of thing, so that you can
access the records with the people's last names as the keys.
First, we have to determine the size of the array we're using. Let's
use a 260 element array so that there can be an average of about 10
element spaces per letter of the alphabet.
Now, we have to make a hashing function. First, let's create a relationship between letters and numbers:
A --> 0 B --> 1 C --> 2 D --> 3 ... and so on until Z --> 25.
The easiest way to organize the hash table would be based on the first letter of the last name.
Since we have 260 elements, we can multiply the first letter of the
last name by 10. So, when a key like "Smith" is given, the key would be
transformed to the index 180 (S is the 19 letter of the alphabet, so S
--> 18, and 18 * 10 = 180).
Since we use a simple function to generate an index number quickly,
and we use the fact that the index number can be used to access an
element directly, a hash table's access time is quite small. A linked list of keys and elements wouldn't be nearly as fast, since you would have to search through every single key-element pair.
Collisions and Collision Handling
Problems, of course, arise when we have last names with the same
first letter. So "Webster" and "Whitney" would correspond to the same
index number, 22. A situation like this when two keys get sent to the
same location in the array is called a collision. If you're trying to
insert an element, you might find that the space is already filled by a
different one.
Of course, you might try to just make a huge array and thus make it
almost impossible for collisions to happen, but then that defeats the
purpose of using a hash table. One of the advantages of the hash table
is that it is both fast and small.
Collision handling with open addressing
The simplest collision handling algorithm is known as the open
address method or the closed hashing method. When you are adding an
element, say "Whitney," and you find that another element is already
there ("Webster," for instance) then you would just proceed to the next
element space (the one after "Webster"). If that is filled, you go on
to the next one, and so on, until you find an empty space to insert the
new element (all those extra elements came in handy after all!)
... 220 "White" | <-- ### COLLISION ### : Gotta move on to the next. 221 "Webster" | <-- ### COLLISION ### : Next one. 222 | Ahhh, perfect. Insert Here. 223 | ...Since we modified the insertion algorithm, we also have to change the function that finds the element. You have to have some way of verifying that you've found the element you want, and not some other element. The simplest way is to just compare keys. (Does this record have the last name "Whitney"? Does this one?) If the element you find is not one of them, just move on to the next element until you reach the one you want or you find an empty space (which means the element is not in the table).
Sounds simple, right? Well, it gets more complicated. What if you have so many collisions that you run off the end of the array?
If you're trying to insert "Zorba" and all the elements are filled
because of the collision handling, then what? Look at the example:
... 258 "Whitney" | <-- Nope, not Empty 259 "Zeno" | Nope, not Empty ---------------- <-- Ummm, what now?
The easiest thing to do is to just wrap around to the beginning again. If
there are still no empty spaces, then we have to resize the array, since there
isn't enough space in the hash table for all of the elements. If we resize the
array, of course, we'll have to come up with a tweak to our hash function (or
at least how we handle it) so that it covers the right range of values
again, but at least we'll have room. (Note that resizing the array means that
occasionally inserting a value into the list will cause an O(n) copy
operation to take place, but that on average this should happen only
once for every n items inserted, so insertion should be on average
constant time, O(1). (If you aren't sure what terms like "O(n") and
"constant time" mean, take a look at the
Cprogramming.com article series on algorithmic efficiency.)
As you can see, resizing isn't all that bad--still, if you know the
amount of space you will need to start with, you can save your program
some work.
Handling collisions with separate chaining
A second collision handling strategy is to store a linked list
at each element in the hash data structure. This way, when a collision
occurs, you can just add the element into the linked list that is
stored at the hash index. If you have only a single element with a
particular hash value, then you have a single element list--no
performance penalty. If you have a lot of elements hashing to the same
value, you'll see a slowdown of course, but no more than you otherwise
would see with hash collisions.
One nice thing about separate chaining is that having a bunch of
values that hash "near" each other is less important. With open
addressing, if you have a cluster of values that hash to nearly the
same value, you'll run out of open space in that part of the hash. With
separate chaining, each element that has a different hash value will
not impact the other elements.
Resizing dynamically based on a load factor
Generally speaking, you wouldn't want your hash table to grow
completely full because this will make lookups take much longer. If a
value isn't in the array, with open addressing, you have to keep
looking until you hit an empty location or you get back to the starting
point--in other words, with a completely full table, lookups could be
O(n), which is horrible. A real hash table implementation will keep
track of its load factor, the ratio of elements to array size. If you
have a 10 element array, with 7 elements, the load factor is 0.7. In
fact, 0.7 is generally about the right time to resize the underlying
array.
Choosing a Good Hash Algorithm
The more collisions you have, the worse the performance of your hash table
will be. With enough elements in your hash table, you can get an average
performance that's quite good--essentially constant time O(1). (The trick is to
make the array grow over time as you start to fill up the array.) But if you
have a lot of elements that hash to the same value, then you will have to start
doing a lookup through a list of elements that all have the same hash value.
This can make your hash lookups go from constant time to being, well, linear
time in the number of elements. Imagine if your hash function hashed all values
to 0, putting them in the first element of the array. Then it would be just a
really complicated way of implementing a linear search.
Choosing a good hash algorithm can require some care and experimentation,
and it will depend on your problem domain. If you're working with names, you probably
don't want a hash algorithm that just looks at the first letter, because the
letters of the alphabet are not used evenly--you'll find a lot more names that
start with S than with Z. You also want to have your hash functions be
fast--you don't want to lose all the time savings you're getting from the hash
table because you're computing the hash function really slowly. It's a delicate
balance..
Bitwise Operators in C and C++: A Tutorial
Another example comes up when dealing with data compression: what if you wanted to compress a file? In principle, this means taking one representation and turning it into a representation that takes less space. One way of doing this is to use an encoding that takes less than 8 bits to store a byte. (For instance, if you knew that you would only be using the 26 letters of the Roman alphabet and didn't care about capitalization, you'd only need 5 bits to do it.) In order to encode and decode files compressed in this manner, you need to actually extract data at the bit level.
Finally, you can use bit operations to speed up your program or perform neat tricks. (This isn't always the best thing to do.)
Thinking about Bits
The byte is the lowest level at which we can access data; there's no "bit" type, and we can't ask for an individual bit. In fact, we can't even perform operations on a single bit -- every bitwise operator will be applied to, at a minimum, an entire byte at a time. This means we'll be considering the whole representation of a number whenever we talk about applying a bitwise operator. (Note that this doesn't mean we can't ever change only one bit at a time; it just means we have to be smart about how we do it.) Understanding what it means to apply a bitwise operator to an entire string of bits is probably easiest to see with the shifting operators. By convention, in C and C++ you can think about binary numbers as starting with the most significant bit to the left (i.e., 10000000 is 128, and 00000001 is 1). Regardless of underlying representation, you may treat this as true. As a consequence, the results of the left and right shift operators are not implementation dependent for unsigned numbers (for signed numbers, the right shift operator is implementation defined).The leftshift operator is the equivalent of moving all the bits of a number a specified number of places to the left:
[variable]<<[number of places]For instance, consider the number 8 written in binary 00001000. If we wanted to shift it to the left 2 places, we'd end up with 00100000; everything is moved to the left two places, and zeros are added as padding. This is the number 32 -- in fact, left shifting is the equivalent of multiplying by a power of two.
int mult_by_pow_2(int number, int power)
{
return number<<power;
}
Note that in this example, we're using integers, which are either 2 or 4
bytes, and that the operation gets applied to the entire sequence of 16 or 32
bits.
But what happens if we shift a number like 128 and we're only storing it in a single byte: 10000000? Well, 128 * 2 = 256, and we can't even store a number that big in a byte, so it shouldn't be surprising that the result is 00000000.
It shouldn't surprise you that there's a corresponding right-shift operator: >> (especially considering that I mentioned it earlier). Note that a bitwise right-shift will be the equivalent of integer division by 2.
Why is it integer division? Consider the number 5, in binary, 00000101. 5/2 is 2.5, but if you are performing integer division, 5/2 is 2. When you perform a right shift by one: (unsigned int)5>>1, you end up with 00000010, as the rightmost 1 gets shifted off the end; this is the representation of the number 2. Note that this only holds true for unsigned integers; otherwise, we are not guaranteed that the padding bits will be all 0s.
Generally, using the left and right shift operators will result in significantly faster code than calculating and then multiplying by a power of two. The shift operators will also be useful later when we look at how to manipulating individual bits.
For now, let's look at some of the other binary operators to see what they can do for us.
Bitwise AND
The bitwise AND operator is a single ampersand: &. A handy mnemonic is that the small version of the boolean AND, &&, works on smaller pieces (bits instead of bytes, chars, integers, etc). In essence, a binary AND simply takes the logical AND of the bits in each position of a number in binary form.For instance, working with a byte (the char type):
01001000 & 10111000 = -------- 00001000The most significant bit of the first number is 0, so we know the most significant bit of the result must be 0; in the second most significant bit, the bit of second number is zero, so we have the same result. The only time where both bits are 1, which is the only time the result will be 1, is the fifth bit from the left. Consequently,
72 & 184 = 8
Bitwise OR
Bitwise OR works almost exactly the same way as bitwise AND. The only difference is that only one of the two bits needs to be a 1 for that position's bit in the result to be 1. (If both bits are a 1, the result will also have a 1 in that position.) The symbol is a pipe: |. Again, this is similar to boolean logical operator, which is ||.01001000 | 10111000 = -------- 11111000and consequently
72 | 184 = 248Let's take a look at an example of when you could use just these four operators to do something potentially useful. Let's say that you wanted to keep track of certain boolean attributes about something -- for instance, you might have eight cars (!) and want to keep track of which are in use. Let's assign each of the cars a number from 0 to 7.
Since we have eight items, all we really need is a single byte, and we'll use each of its eight bits to indicate whether or not a car is in use. To do this, we'll declare a char called in_use, and set it to zero. (We'll assume that none of the cars are initially "in use".)
char in_use = 0;Now, how can we check to make sure that a particular car is free before we try to use it? Well, we need to isolate the one bit that corresponds to that car. The strategy is simple: use bitwise operators to ensure every bit of the result is zero except, possibly, for the bit we want to extract.
Consider trying to extract the fifth bit from the right of a number: XX?XXXXX We want to know what the question mark is, and we aren't concerned about the Xs. We'd like to be sure that the X bits don't interfere with our result, so we probably need to use a bitwise AND of some kind to make sure they are all zeros. What about the question mark? If it's a 1, and we take the bitwise AND of XX?XXXXX and 00100000, then the result will be 00100000:
XX1XXXXX & 00100000 = -------- 00100000Whereas, if it's a zero, then the result will be 00000000:
XX0XXXXX & 00100000 = -------- 00000000So we get a non-zero number if, and only if, the bit we're interested in is a 1.
This procedure works for finding the bit in the nth position. The only thing left to do is to create a number with only the one bit in the correct position turned on. These are just powers of two, so one approach might be to do something like:
int is_in_use(int car_num)
{
// pow returns an int, but in_use will also be promoted to an int
// so it doesn't have any effect; we can think of this as an operation
// between chars
return in_use & pow(2, car_num);
}
While this function works, it can be confusing. It obscures the fact that
what we want to do is shift a bit over a certain number of places, so that we
have a number like 00100000 -- a couple of zeros, a one, and some more zeros.
(The one could also be first or last -- 10000000 or 00000001.)
We can use a bitwise leftshift to accomplish this, and it'll be much faster to boot. If we start with the number 1, we are guaranteed to have only a single bit, and we know it's to the far-right. We'll keep in mind that car 0 will have its data stored in the rightmost bit, and car 7 will be the leftmost.
int is_in_use(int car_num)
{
return in_use & 1<<car_num;
}
Note that shifting by zero places is a legal operation -- we'll just get back
the same number we started with.
All we can do right now is check whether a car is in use; we can't actually set the in-use bit for it. There are two cases to consider: indicating a car is in use, and removing a car from use. In one case, we need to turn a bit on, and in the other, turn a bit off.
Let's tackle the problem of turning the bit on. What does this suggest we should do? If we have a bit set to zero, the only way we know right now to set it to 1 is to do a bitwise OR. Conveniently, if we perform a bitwise OR with only a single bit set to 1 (the rest are 0), then we won't affect the rest of the number because anything ORed with zero remains the same (1 OR 0 is 1, and 0 OR 0 is 0).
Again we need to move a single bit into the correct position: void set_in_use(int car_num) { in_use = in_use | 1<<car_num; } What does this do? Take the case of setting the rightmost bit to 1: we have some number 0XXXXXXX | 10000000; the result, 1XXXXXXX. The shift is the same as before; the only difference is the operator and that we store the result.
Setting a car to be no longer in use is a bit more complicated. For that, we'll need another operator.
The Bitwise Complement
The bitwise complement operator, the tilde, ~, flips every bit. A useful way to remember this is that the tilde is sometimes called a twiddle, and the bitwise complement twiddles every bit: if you have a 1, it's a 0, and if you have a 0, it's a 1.This turns out to be a great way of finding the largest possible value for an unsigned number:
unsigned int max = ~0;0, of course, is all 0s: 00000000 00000000. Once we twiddle 0, we get all 1s: 11111111 11111111. Since max is an unsigned int, we don't have to worry about sign bits or twos complement. We know that all 1s is the largest possible number.
Note that ~ and ! cannot be used interchangeably. When you take the logical NOT of a non-zero number, you get 0 (FALSE). However, when you twiddle a non-zero number, the only time you'll get 0 is when every bit is turned on. (This non-equivalence principle holds true for bitwise AND too, unless you know that you are using strictly the numbers 1 and 0. For bitwise OR, to be certain that it would be equivalent, you'd need to make sure that the underlying representation of 0 is all zeros to use it interchangeably. But don't do that! It'll make your code harder to understand.)
Now that we have a way of flipping bits, we can start thinking about how to turn off a single bit. We know that we want to leave other bits unaffected, but that if we have a 1 in the given position, we want it to be a 0. Take some time to think about how to do this before reading further.
We need to come up with a sequence of operations that leaves 1s and 0s in the non-target position unaffected; before, we used a bitwise OR, but we can also use a bitwise AND. 1 AND 1 is 1, and 0 AND 1 is 0. Now, to turn off a bit, we just need to AND it with 0: 1 AND 0 is 0. So if we want to indicate that car 2 is no longer in use, we want to take the bitwise AND of XXXXX1XX with 11111011.
How can we get that number? This is where the ability to take the complement of a number comes in handy: we already know how to turn a single bit on. If we turn one bit on and take the complement of the number, we get every bit on except that bit:
~(1<<position)Now that we have this, we can just take the bitwise AND of this with the current field of cars, and the only bit we'll change is the one of the car_num we're interested in.
int set_unused(int car_num)
{
in_use = in_use & ~(1<<position);
}
You might be thinking to yourself, but this is kind of clunky. We actually
need to know whether a car is in use or not (if the bit is on or off) before we can
know which function to call. While this isn't necessarily a bad thing, it
means that we do need to know a little bit about what's going on. There is an
easier way, but first we need the last bitwise operator: exclusive-or.
Bitwise Exclusive-Or (XOR)
There is no boolean operator counterpart to bitwise exclusive-or, but there is a simple explanation. The exclusive-or operation takes two inputs and returns a 1 if either one or the other of the inputs is a 1, but not if both are. That is, if both inputs are 1 or both inputs are 0, it returns 0. Bitwise exclusive-or, with the operator of a carrot, ^, performs the exclusive-or operation on each pair of bits. Exclusive-or is commonly abbreviated XOR.For instance, if you have two numbers represented in binary as 10101010 and 01110010 then taking the bitwise XOR results in 11011000. It's easier to see this if the bits are lined up correctly:
01110010 ^ 10101010 -------- 11011000You can think of XOR in the following way: you have some bit, either 1 or 0, that we'll call A. When you take A XOR 0, then you always get A back: if A is 1, you get 1, and if A is 0, you get 0. On the other hand, when you take A XOR 1, you flip A. If A is 0, you get 1; if A is 1, you get 0.
So you can think of the XOR operation as a sort of selective twiddle: if you apply XOR to two numbers, one of which is all 1s, you get the equivalent of a twiddle.
Additionally, if you apply the XOR operation twice -- say you have a bit, A, and another bit B, and you set C equal to A XOR B, and then take C XOR B: you get A XOR B XOR B, which essentially either flips every bit of A twice, or never flips the bit, so you just get back A. (You can also think of B XOR B as cancelling out.) As an exercise, can you think of a way to use this to exchange two integer variables without a temporary variable?
How does that help us? Well, remember the first principle: XORing a bit with 0 results in the same bit. So what we'd really like to be able to do is just call one function that flips the bit of the car we're interested in -- it doesn't matter if it's being turned on or turned off -- and leaves the rest of the bits unchanged.
This sounds an awful lot like the what we've done in the past; in fact, we only need to make one change to our function to turn a bit on. Instead of using a bitwise OR, we use a bitwise XOR. This leaves everything unchanged, but flips the bit instead of always turning it on:
void flip_use_state(int car_num)
{
in_use = in_use ^ 1<<car_num;
}
When should you use bitwise operators?
Bitwise operators are good for saving space -- but many times, space is hardly an issue. And one problem with working at the level of the individual bits is that if you decide you need more space or want to save some time -- for instance, if we needed to store information about 9 cars instead of 8 -- then you might have to redesign large portions of your program. On the other hand, sometimes you can use bitwise operators to cleverly remove dependencies, such as by using ~0 to find the largest possible integer. And bit shifting to multiply by two is a fairly common operation, so it doesn't affect readability in the way that advanced use of bit manipulation can in some cases (for instance, using XOR to switch the values stored in two variables).There are also times when you need to use bitwise operators: if you're working with compression or some forms of encryption, or if you're working on a system that expects bit fields to be used to store boolean attributes.
Summary
You should now be familiar with six bitwise operators:Works on bits for left argument, takes an integer as a second argument
bit_arg<<shift_argShifts bits to of bit_arg shift_arg places to the left -- equivalent to multiplication by 2^shift_arg
bit_arg>>shift_argShifts bits to of bit_arg shift_arg places to the right -- equivalent to integer division by 2^shift_arg
Works on the bits of both arguments
left_arg & right_argTakes the bitwise AND of left_arg and right_arg
left_arg ^ right_argTakes the bitwise XOR of left_arg and right_arg
left_arg | right_argWorks on the bits of only argument
~argReverses the bits of arg
Understanding Different Base Systems
This essay is targeted at new students of computer programming or computer science who want to understand how base two (binary), base eight (octal), and base sixteen (hexadecimal) work.
First of all, it's important to realize that each of these base systems is just another way of writing down the same number. When you convert a number between different bases, it should still have the same value. In this essay, when I want to refer to the actual value of a number (regardless of its base), I'll do it in base 10 because that's what most people are used to.
It's generally easiest to understand the concept of different bases by looking at base 10. When we have a number in base 10, each digit can be referred to as the ones digit, tens digit, the hundreds digit, the thousands digit, or so forth. For instance, in the number 432, 4 is the hundreds digit, 3 is the tens digit, and 2 is the ones digit.
Another way to think about this is to rewrite 432 as
4 x 102 + 3 x 101 + 2 x 100Each digit is multiplied by the next power of ten. Numbers in other bases, such as base 16, are merely numbers where the base is not ten! For instance, we could interpret 432 as though it were in base 16 by evaluating it as
4 x 162 + 3 x 161 + 2 x 100This would be the same as the number 1074 in base 10.
So to convert a number from a given base into base 10, all we need to do is treat each place as a power of the given base times the value of the digit in that place. Note that customarily for a given base, only digits from 0 to the base minus one are used. For instance, in decimal, we only use the digits 0 through 9. That's because we don't need any more digits to express every possible number. (But we do need at least that many; if we only had 8 digits, how would we ever express the value 9?)
Now, bases greater than 10 will require more than 10 possible digits. For instance, the number 11 in base ten can be expressed in base 16 with only a single digit because the ones place in base 16 can range from 0 to 15. Since we only have 10 digits, the letters A through F are used to stand for the "digits" 10 through 15. So, for instance, the hexadecimal number B stands for the decimal number 11.
Bases less than ten will require fewer digits--for instance, binary, which works using powers of two, only needs two digits: one and zero. The binary number 1001, for instance, is the same as writing
1 * 23 0 * 22 0 * 21 1 * 20which comes out to the decimal value 9.
Numbers written in octal use a base of 8 instead of 2 or 16. See if you can figure out what the number 20 written in octal would be in base ten.
Because octal, hexadecimal, and decimal numbers can often share the same digits, there needs to be some way of distinguishing between them. Traditionally, octal numbers are written with a leading 0; for instance, 020 is the same thing as the number 20 in base 8. Hexadecimal numbers are written with the prefix of "0x". So 0x20 would be the number 20 in base 16; we'd interpret it the same as the decimal number 32.
Converting from decimal to octal or hexadecimal
It turns out that when you wish to convert from decimal to octal or hexadecimal, there is a very easy formula that you can use. I'll give you the one for octal, and let you puzzle out the hexadecimal version (which may come quite naturally to some of you).To convert from octal to hexadecimal, all you need to do is group the binary digits into pairs of three and convert each one into the corresponding octal number. For instance, given the binary number 010011110, you would group 011 and 110 together. 010 is 2, 011 is 3 and 110 is 6. So the octal number is 0236.
So why exactly does this work? Well, let's take a look at what 010011110 looks like:
0 * 28 1 * 27 0 * 26 0 * 25 + 1 * 24 + 1 * 23 + 1 * 22 + 1 * 21 + 0 * 20That's actually the same as
0 * 22 * 26 + 1 * 21 * 26 + 0 * 20 * 26 + 0 * 22 * 23 + 1 * 21 * 23 + 1 * 20 * 23 + 1 * 22 * 20 + 1 * 21 * 20 + 0 * 20 * 20Whoa! First, notice that the far right column is actually turning into powers of 8! 23 is 8, and 26 is 64! So this means for each group of three digits, we have the base increasing by a factor of 8. Moreover, look at the two columns on the left. It can sum up to at most 7 (since 20 + 21 + 22 = 1 + 2 + 4 and the binary digit just decides whether each power of two is included into the sum or not). That's exactly the same as having eight digits, 0 through 7, and once we sum them all together, we multiply the sum by a power of eight. That's just the same as making each group of three binary digits an octal digit!
Integer to english string program in c++
#include <string.h>
#include <iostream.h>
using namespace std;
string num_to_text[] = { "", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen", "seventeen", "eighteen", "nineteen" };
string tens_to_text[] = { "", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety" };
string power_to_text[] = { "", "thousand", "million", "billion" };
string padIfNeeded (string ans)
{
if ( ans == "" )
{
return "";
}
return " " + ans;
}
string translateHundred (int hundred_chunk)
{
// handle special cases in the teens
if ( hundred_chunk < 20 ) {
return num_to_text[ hundred_chunk ];
}
int tens = hundred_chunk / 10;
int ones = hundred_chunk % 10;
return tens_to_text[ tens ] + padIfNeeded( num_to_text[ ones ] );
}
string translateThousand (int thousand_chunk)
{
if ( thousand_chunk < 100 )
{
return translateHundred( thousand_chunk );
}
else
{
int hundreds = thousand_chunk / 100;
int hundred_chunk = thousand_chunk % 100;
return num_to_text[ hundreds ] + " hundred" + padIfNeeded( translateHundred( hundred_chunk ) );
}
}
int main()
{
int n;
cin >> n;
string number;
bool is_negative = false;
if ( n < 0 )
{
is_negative = true;
n *= -1;
}
int chunk_count = 0;
while ( n > 0 )
{
if ( n % 1000 != 0 ) {
number = translateThousand( n % 1000 ) + padIfNeeded( power_to_text[ chunk_count ] + padIfNeeded( number ) );
}
n /= 1000;
chunk_count++;
}
if ( number == "" )
{
number = "zero";
}
if ( is_negative )
{
number = "negative " + number;
}
cout >> number >> endl;
}
What's the difference between... > static and extern?
The extern keyword specifies external linkage. It tells the compiler that the object or function declared here is actually defined in another file. It helps here to explain the difference between definition and declaration. Definition is where the variable is created and space is set aside for it. For functions, the definition is the function body. Declaration is simply stating the linkage and type of the variable, so the compiler can ensure you're using it correctly. For functions, the declaration states the linkage, return type, parameter count and parameter types. Go here for more on the difference between declare and define in C and C++. The static keyword is somewhat the opposite of extern. It tells the compiler that the object or function declared is internally linked, and only visible from within that translation unit (a translation unit is a technical term for a .c file after all the preprocessing is finished and the #includes added in). extern should be used in a header file to make functions available to other .c files that wish to use that interface. This makes them akin to public members in a C++ class. static should be used in the .c file with the variable and function definitions, to hide functions and variables from other .c files. This is akin to private members in a C++ class. As you can tell, it makes no sense to have something declared as static and again as extern. Doing so results in undefined behavior, so don't do it! For an example, see the Multiple Source Files FAQ entry. Footnote: (1) static seems to be a poorly chosen keyword for specifying linkage, and has confused many, if not all C programmers at some point (it's choice for describing storage duration, however, is excellent). There is nothing particularly static about something with internal (static) linkage. They don't stay in one place any more than their non-static counterparts. A more appropriate choice might have been intern(al) or private, but it's too late now. |
Multiple source files for one program (C++ example)
/* * Program1.cpp */ //all in one program. Want to "save" class and function for reuse. #include <iostream.h> class MyClass { public: //using default default constructor //using default copy constructor //using default assignment operator //using default destructor int getData(void); void setData(int); private: int data; }; int MyClass::getData(void) { return(data); } void MyClass::setData(int input) { data = input; } void myFunc(int &); int main(void) { int num = 3; myFunc(num); cout << num << endl; MyClass sample; sample.setData(num); cout << sample.getData(); return(0); } void myFunc(int &Num) { Num += 10; } To break Program1 up into reuseable segments do this:For the CLASS /* * The header file: Data.h */ #ifndef DATA_H #define DATA_H //put just the descriptive part of the class in here class MyClass { public: //using default default constructor //using default copy constructor //using default assignment operator //using default destructor int getData(); void setData(int); private: int data; }; #endif For the associated cpp file for the CLASS /* * Data.cpp */ #include "Data.h" //put interesting/working part of class in here int MyClass::getData() { return data; } void MyClass::setData(int input) { data = input; } For the function /* * The header file: MyFunctions.h */ #ifndef MYFUNCTIONS_H #define MYFUNCTIONS_H //list descriptive part of function--usually just the prototype(s) void myFunc(int &); //any other functions that you might want to include could go here too. #endif For the associated cpp file for the function /* * The associated cpp file for the class: MyFunctions.cpp */ #include "MyFunctions.h" //here you put the working part of the function(s) //usually the definitions void myFunc(int &Num) { Num += 10; }Now put the program back together using the modules /* * Program2.cpp */ //driver program for MyClass and MyFunctions #include <iostream.h> //include your header files this time, not the actual code as before //Note: do NOT include the cpp files, don't even mention them //the linker will link the necessary cpp files for you (like magic!) #include "Data.h" #include "MyFunctions.h" int main(void) { int num = 3; myFunc(num); cout << num << endl; MyClass sample; sample.setData(num); cout << sample.getData(); return(0); }And here's the output from the compiler: C:\>bcc32 -emyprog.exe myfunctions.cpp program2.cpp data.cpp Borland C++ 5.5 for Win32 Copyright (c) 1993, 2000 Borland myfunctions.cpp: program2.cpp: data.cpp: Turbo Incremental Link 5.00 Copyright (c) 1997, 2000 Borland C:\>myprog 13 13 For most header files you create on your computer with your compiler this should work just fine. But let's say you copy your header file and associated cpp file to a floppy to give to a friend or you download the above MyClass files to your computer from a shareware site. If the header file and cpp file you are trying to use is not in the same subdirectory/directory/folder as the program you are trying to link to, then you should use the angled brackets like you do for iostream (as iostream.h is usually in a different directory/subdirectory from the program you are working on). |
C program to convert the number entered into words as in cheques etc...
#include <stdio.h>
#include <stdlib.h>
#include<string.h>
#include<conio.h>
char *ordinal (char buf[1024], int value);
int main (int argc, char *argv[])
{
clrscr();
char buf[1024];
if (argc != 2)
{
printf("usage: ordinal <number>\n");
return EXIT_FAILURE;
}
ordinal (buf, atoi (argv[1]));
puts (buf);
getch();
return EXIT_SUCCESS;
}
/* Works for positive and negative numbers up to 9 digits long,
if you speak American English. */
char *ordinal (char buf[1024], int value)
{
static const char *const powers[]={"thousand", "million", "billion"};
static const char *const ones[]={"one", "two", "three", "four", "five",
"six", "seven", "eight", "nine", "ten",
"eleven", "twelve", "thirteen", "fourteen", "fifteen",
"sixteen", "seventeen", "eighteen", "nineteen"};
static const char *const tens[]={"twenty", "thirty", "forty", "fifty",
"sixty", "seventy", "eighty", "ninety"};
char *cp = buf;
if (value < 0)
{
cp += sprintf (cp, "negative ");
value = -value;
}
else if (value == 0)
{
strcpy (buf, "zero");
getch();
return buf;
}
{
int part_stack[4];
int *part_ptr = part_stack;
for (; value; value /= 1000)
*part_ptr++ = value % 1000;
while (part_ptr > part_stack)
{
int p = *--part_ptr;
if (p >= 100)
{
cp += sprintf (cp, "%s hundred ", ones[p / 100 - 1]);
p %= 100;
}
if (p >= 20)
{
if (p % 10)
cp += sprintf (cp, "%s-%s ", tens[p / 10 - 2], ones[p %
10 - 1]);
else
cp += sprintf (cp, "%s ", tens[p / 10 - 2]);
}
else if (p > 0)
cp += sprintf (cp, "%s ", ones[p - 1]);
if (p && part_ptr > part_stack)
cp += sprintf(cp, "%s ", powers[part_ptr - part_stack - 1]);
}
}
cp[-1] = 0;
return buf;
}
C++0x: The future of C++
What is C++0x?
C++0x is the working name for the new standard for C++, adding many language features that I'll cover in this series on C++0x. Just days ago, a C++0x draft was voted on and finalized, meaning that C++0x is almost certain to be C++11, and many compilers now provide support for some of the core C++0x features. Final publication of the standard itself is expected in mid-2011, with the final standard likely being called C++11. Until the final name change, I'll refer to the new draft as C++0x. [Update: according to Herb Sutter, the standard is confirmed as C++11 and should be published in a matter of weeks as of mid-August.]
C++0x includes a wide range of features: major new features like lambda support and "move semantics", usability improvements like type inference through the auto keyword, simplified looping over containers, and many improvements that will make templates easier to write and easier to use. This series on C++0x will cover all of these features and many more.
Should you care about C++0x?
Most definitely. C++0x adds many new language features to C++. C++0x should fix many annoyances and reduce the overall verbosity of C++ as well as provide new tools, such as lambda expressions, that increase its overall expressiveness and clarity. Fetaures like move semantics improve the basic efficiency of the language, allowing you to write faster code, and the improvements to the template system make it much easier to write generic code.
The new standard library will also include many new features, including adding multithreading support directly into C++ and improved smart pointers that will simplify memory management for those who aren't already using features like boost::shared_ptr.
I've started using several new C++0x features professionally and I'm loving it. Some of the new features I'm fond of include the new meaning of the auto keyword, simplifications like better handling of right angle brackets in templates, lambda expressions and the new function declaration syntax.
How was C++0x developed?
I can't go on further about C++0x without acknowledging the hard work done by the C++ standards committee--a group of experts from academia and industry who have met many times to work through all the edge cases and design a programming language that can be implemented across multiple platforms by multiple compilers, producing efficient and reasonably maintainable code. The next standard, C++0x, looks to be a fantastic addition to the flexibility and power of C++.What is C++0x about?
Language usability
Having started to use C++0x, I'd say that the most fundamental way of looking at it is that it makes C++ a much more usable language. This isn't to say that it makes it a simpler language--there are lots of new features--but it provides a lot of functionality that makes it easier to program. Let's look at one example, the auto keyword.
In C++0x, if the compiler is able to determine the type of a variable from its initialization, you don't need to provide the type. For example, you can write code such as
int x = 3; auto y = x;and the compiler will deduce that y is an int. This, of course, isn't a shining example of where auto is really useful. Auto really comes into its own when working with templates and especially the STL. Why is that? Imagine working with an iterator:
map<string, string> address_book; address_book[ "Alex" ] = "webmaster@cprogramming.com"; // add a bunch of people to address_book
Now you want to iterate over the elements of the address_book. To do it, you need an iterator:
map<string, string>::iterator itr = address_book.begin();
That's an awfully long type declaration for something that you already know the type of! Wouldn't it be nice to simply write:
auto itr = address_book.begin();
The code is much shorter, and frankly, I think it's more readable, not less, because the template syntax obscures everything else on that line. This is one of my favorite new features, and I find it eliminates a lot of headaches and hard-to-track-down compiler errors, and just generally saves time without losing expressiveness.
Ranged For Loops
Now, the iterator example is one where C++0x has come up with an even better way of handling this--something called a range-based for loop (which almost every language has nowadays). The idea is so elegant, an example should suffice:
vector<int> vec;
vec.push_back( 10 );
vec.push_back( 20 );
for (int &i : vec )
{
cout << i;
}
All you need to do is give your variable and the range to iterate over (defined as something with iterators available via calls to begin and end--so all STL containers) and you're set! This is a pretty new feature, available as far as I know only in GCC 4.6.
But what if you want to iterate over a map? How do you put in the type for a value stored in a map? With a vector, you know the value is an int. With a map, it's essentially a pair, with .first and .second giving you the key and value. But with auto, we don't need to worry about getting the exact type right, you can simply do this:
for ( auto address_entry : address_book )
{
cout << address_entry.first << " < " << address_entry.second << ">" <<endl;
}
Which prints out as:
Alex <webmaster@cprogramming.com>
Isn't that a nice combination of new features in C++0x? It feels like it was designed that way :)
Right angle brackets
And I have one more usability improvement for you--in previous versions of the C++ standard, if you wrote a template that had another template:
vector<vector<int> > vector_of_int_vectors;
You had to write a space between the two closing angle brackets. This was not only annoying, but if you did write >> without a space, you'd get obtuse and confusing compiler error messages. The reason for this behavior had to do with an obscure C++ lexer trait called the maximal munch rule. The good news is, you no longer need to worry about it! Say hello to
vector<vector<int>> vector_of_int_vectors;
True, this seems like a small thing, but it's a victory of human code writers over machine tools. Plus it's much less ugly. Compiler support for right-angle brackets is great: GCC since 4.3, MSVC since version 8 (!) and the Intel compiler since version 11.
Mulithreading
For the first time, the C++0x standard will include a memory model and corresponding libraries for multithreading, meaning that you'll be able to write standards-compliant multithreading code. The new standard will provide for all the normal threading functionality, such as threads and thread-local storage and atomic operations. It will also include an interesting set of features, futures and promises. The basic idea of futures and promises is that you can write code that says, "this object, a future, stands for a result that hasn't been computed yet" and the work to compute the value can take place in the background. When the value is needed, you ask the future for it; if the value is ready, you get it; if not, you wait.
I'll go into more depth on mulithreading in a later article in this series.
Lots of other stuff
The number of features in C++0x is incredibly exciting. You can get a taste for what's available on the C++0x page at Wikipedia ,and I plan to dive into many of these features in more depth in this series, including:
- How auto, decltype, and the new function syntax work together to create better code
- Lambda functions
- Usability improvements like range-based for loops
- Performance improvements like generalized constant expressions
- Performance improvements like rvalue references and move semantics
- Language expressiveness improvements like nullptr, improved enumerations, and the ability to explicitly override or make "final" virtual functions
- Improvements to object construction, like initialization lists, delegating constructors and explicit control over auto-generated functions
- Template improvements including templated typedefs, variadic templates, uniform initialization and static assertions
- The new C++ memory model and the feature it supports: multithreading
- Improvements to the standard library including regular expressions, hashtables and improved smart pointers
- What features were removed or deprecated by the standard, and why
Compiler Support for C++0x
Of course, no language feature matters if it's not available to use, and the good news is that many compilers now support the new C++0x features. The Apache foundation has compiled a very useful list of C++0x language features and the compilers that support them: Compiler Support for C++0x. If you're interested in GCC, this page describes the GCC 4.7 support for C++0x.
Some compilers, such as GCC, do not automatically enable support for these features--for example, to enable C++0x features, you must compile with -std=c++0x. Nonetheless, they are still valuable if you're working on a project where you can control the choice of compiler and set of language features.
Monday, 14 November 2011
Posted by ANIMESH SHAW
