cubbi.com: Fibonacci numbers in many programming languages

Fibonacci numbers are defined as follows

F(0) = 0
F(1) = 1
F(n) = F(n-1) + F(n-2)
for negative n: F(-n) = (-1)n+1F(n)

Note that examples and benchmarks made before september 2009 on this page use the older, less general, definition, with F(0) = 1.
September 2009 UPDATE:
UPDATE IN PROGRESS: Five years since I wrote this page, I've decided to rewrite all examples to be more useful, the subroutines will be as general as possible (no more 2x2 matrix multiplication), all code will be commented, each example will now take its arguments from command line (where possible)
April 2014 UPDATE:
This effort was abandoned somewhere in 2010.

ALGORITHMS AND BENCHMARKS SUMMARY

1. EXPONENTIAL COMPLEXITY
1A - naive binary recursion
execution time: T(n)=ton
asm
java
C
to=2.8 ns
to=3.0 ns
to=3.1 ns
2. LINEAR COMPLEXITY
1B - cached binary recursion / memoization pending 2009/2010 update
2A - cached linear recursion / infinite lazy-evaluated list pending 2009/2010 update
2B - linear recursion with accumulator pending 2009/2010 update
2C - imperative loop with mutable variables pending 2009/2010 update
3. LOGARITHMIC COMPLEXITY
3A - matrix multiplication pending 2009/2010 update
3B - fast recursion pending 2009/2010 update
3C - Binet's formula with rounding pending 2009/2010 update

EXAMPLES

LanguagesAlgorithms
1A1B2A2B2C3A3B3C
concatenative languages
(program is a composition of functions applied to stack of arguments)
Forth (1960) OK OK OK OK OK OK OK OK
PostScript (1982) OK OK OK OK OK OK OK OK
MUF (1989) OK OK OK OLD OLD OLD OLD OLD
Joy (1999) OK OLD OLD OLD OLD OLD OLD
possible todo: Factor
Vector languages
(program is a composition of functions applied to a multidimentional array argument)
J (1990) OK OK OK OK OK OK OK OK
possible todo: APL, K, Q, Nial
functional languages
(program is an application of functions)
Scheme (1975) OK OK OK OK OK OK OK OK
Hope (1978) OK OK OK OK N/A OK OK OK
OCaml (1989) OK OK OK OK OK OK OK OK
Haskell (1990) OK OK OK OK N/A OK OK OK
possible todo: Clojure, Pure
logic and concurrent languages
(program is a collection of logic statements)
Prolog (1971) OK OK OK OK OK OK OK OK
possible todo: KL1, Erlang
object languages
(program is a list of data objects and their relationships)
C++ (1986) OK OK OK OK OK OK OK OK
Java (1995) OK OK OK OK OK OK OK OK
possible todo: C#, Python
imperative languages
(program is a sequence of commands to the CPU)
Assembly (1952) OK OLD OLD OLD TBD OLD TBD
Fortran (1954) OK OLD OLD OLD OLD OLD
C (1972) OK OK OLD OK OK OLD OLD OLD
sh (1972) OK OLD OLD OLD OLD OLD OLD
awk (1978) OK OLD OLD OLD OLD OLD OLD
perl (1987) OK OLD OLD OLD OLD OLD OLD
Tcl (1990) OK OLD OLD OLD OLD OLD OLD

ALGORITHMS AND BENCHMARKS IN DETAIL

Please remember that these benchmarks do not show strength or weakness of any programming language. All they show is how efficient specific language translators are at implementing the featured algorithms. It is currently (as of 2009) in the transitional stage from OLD to NEW benchmarks using modern hardware and better written examples.

Benchmarked on Intel Pentium Core i7 2.66GHz running Linux 2.6.31 with 12GB RAM, in x86_64 mode
OLD benchmarks: Intel Pentium 4 2.20GHz running Linux 2.5.75 with 640M RAM
NEW RAW BENCHMARK DATA available if you want to see them, as well as OLD RAW BENCHMARK DATA.

ALGORITHM 1A: NAIVE BINARY RECURSION
  1. If n equals 0, return 0
  2. If n equals 1, return 1
  3. Recursively calculate F(n-1)
  4. Recursively calculate F(n-2)
  5. Return the sum of the results from steps 2 and 3
    Complexity (fixed precision): O(exp n) speed, O(exp n) space
The number of additions grows exponentially with n, and each recursive call requires memorization of state (since neither call is tail-recursive), which makes space complexity also exponential. This algorithm becomes uselessly slow for F(n) with n>50 and all measured results easily fit in 64-bit integer data type. With the duration of the integer arithmetic operations independent of n, the time to execute these programs is almost perfectly t0φn, which is why the plots of the logarithm of execution time vs. n are parallel straight lines.
This algorithm was implemented directly in all languages featured on this web page, since all of them allow recursion. Only Fortran did not allow recursion in the past, but it is supported since the 1990 standard.
Curiously, Java outperforms C++ on this test, but it's one of the very few artifical tests
where it does so
LanguageTranslator [*]T0T(46),
seconds
Assembly GNU as 2.19.1 [c] 2.8 ns 11.63 plot
log(Execution time, seconds) vs. Fibonacci number calculated
Java Sun Java SE 1.6.0_16-b01 / HotSpot 64-Bit VM 14.2-b01 [b] 3.0 ns 12.34
C GNU gcc 4.4.1 [c] 3.1 ns 12.57
C++ GNU g++ 4.4.1 [c] 3.1 ns 12.61
OCaml INRIA OCaml 3.11.1 [c] 3.4 ns 13.98
Fortran GNU gfortran 4.4.1 [i] 4.3 ns 17.62
Forth GNU gforth 0.7.0 [i] 15.3 ns 61.91
Scheme MIT Scheme 7.7.90 20090107 [b] 22.6 ns 98.04
Haskell GHC 6.8.2 [c] 31.3 ns 129.1
Joy Joy1 by Manfred von Thun, 07-May-03 [i] 199 ns 813.5
PostScript GPL Ghostscript 8.70 [i] 284 ns 1147
awk GNU Awk 3.1.7 [i] 417 ns 1729
MUF FuzzBall MUCK 6.09 [b] 716 ns 2939
perl perl 5.8.8 [i] 755 ns 3101
J Jsoftware J64 6.02 [i] 1.08 μs 4386
Hope Hope by Ross Paterson, 08-Dec-03 [i] 1.52 μs 6035
Prolog SWI Prolog 5.6.64 [b] 2.17 μs 8900
Tcl Tcl 8.5.7 [i] 3.37 μs 13700
sh bash 4.0.33(2) [i] 101 μs 415000
     
[*] - [c] for Compiler, [b] for Bytecode intepreter, [i] for Interpreter
Execution time is user time as reported by time(1) except for MUF, where an internal time function was called from the program. T(46) was measured directly only when it was less than 2000 seconds, otherwise it was extrapolated from execution times for smaller inputs. In all cases I tried to pick the fastest one of the language translators that are readily available and offer unlimited precision integer arithmetics (it becomes important in further tests).

 

ALGORITHM 1B: CACHED BINARY RECURSION / MEMOIZATION
  1. If n equals 0, return 0
  2. If n equals 1, return 1
  3. Read F(n-1) from cache or recursively calculate if not cached yet
  4. Read F(n-2) from cache or recursively calculate if not cached yet
  5. Return the sum of the results from steps 2 and 3 and save it in the cache
    Complexity (unlimited precision): O(n2) speed, O(exp n) space
The common optimization technique applied to binary recursions is memoization: for any given set of arguments, the recursive function is calculated only once, and the result is stored in memory. Further calls to the function with the same arguments return the previously calculated result.
Since no function result is calculated twice, the number of additions required to calculate F(n) is reduced to n. The number of cache lookups is instead exponential but, assuming it is implemented as a suitable random-access data structure (array or hash table), the time of each cache lookup is O(1).
Space requirements for this algorithm are still exponential because of binary recursion.
In the programming languages that have built-in support for memoization (such as J or Factor), algorithm 1A can be turned into 1B with just one keyword. For other languages, I manually implement a suitably generic data structure.
LanguageTranslator [*]t0T(1,000,000),
seconds
        plot
sqrt(Execution time, seconds) vs. Fibonacci number calculated
     
[*] - [c] for Compiler, [b] for Bytecode intepreter, [i] for Interpreter
Execution time is user time as reported by time(1), the test programs were modified to avoid conversion of the entire result to string, since that operation may be slow for large numbers.

 

ALGORITHM 2A: CACHED LINEAR RECURSION
When previously calculated Fibonacci numbers are stored in a data structure, there is no need for binary recursion, the entire structure can be filled with only n iterations.
  • infinite lazy evaluated list.
    1. Construct an infinite list F with F[0] equal 0 and F[1] equal 1
    2. Define F[i] as the sum of F[n-1] and F[n-2].
    3. Return the nth element of the list
    In the languages that support lazy evaluation, either implicitly (like Hope or Haskell) or explicitly (like OCaml or Scheme), it is possible to construct the data structure containing all Fibonacci numbers and simply return the nth element of the structure.
  • random-access container.
    1. Create a container F with the elements F[0] equal 0 and F[1] equal 1
    2. Recursively (or iteratively) calculate the elements F[2..n] according to the formula F[i]=F[i-1]+F[i-2]
    3. Return F[n]
    In the languages that only support eager evaluation, the entire data structure must be filled with an explicit loop (or a tail-recursive function). If the container permits random access with O(1) complexity, the total execution complexity is O(n), assuming fixed precision. For large precision numbers, the complexity of addition grows with n as well.
  • Complexity (fixed precision): O(n) speed, O(n) space
    Complexity (unlimited precision): O(n2) speed, O(n2) space.
LanguageTranslator [*]t0T(1,000,000),
seconds
        plot
sqrt(Execution time, seconds) vs. Fibonacci number calculated
     
[*] - [c] for Compiler, [b] for Bytecode intepreter, [i] for Interpreter
Execution time is user time as reported by time(1), the test programs were modified to avoid conversion of the entire result to string, since that operation may be slow for large numbers.

 

ALGORITHM 2B: LINEAR RECURSION WITH ACCUMULATOR
  1. Define a recursive function λ(n1,n2,n) such as
  2. λ(n1,n2,n) returns 0, if n=0 and 1 if n=1
  3. λ(n1,n2,n) return the result of λ(n2,n1+n2,n-1), if n>1
  4. Return the result of λ(0,1,n)
    Complexity (fixed precision): O(n) speed, O(1) space
    Complexity (unlimited precision): O(n2) speed, O(n) space
These algorithms execute n additions to calculate F(n), and store the intermediate results in the recursive function's additional arguments, which serve as accumulators. The time to execute is, therefore n * time to execute one addition and one recursive call. The addition of large numbers if O(n), and recursive calls are typically constant time and never greater than O(n). The time to execute the algorithm is T(n)=t0n2. which is a straight line in the plot of sqrt(T) vs. n. This recursive functions is easily written as tail recursion, which means it does not contribute to space complexity, but space is needed to store the accumulated sum, which grows as O(n). In the languages that do not support tail recursion, such as C/C++, space grows as O(n2), and the algorithm quickly becomes useless. Such languages are more suited for 2C (iteration) anyway.
LanguageTranslator [*]t0T(1,000,000),
seconds
Haskell GHC 6.8.2 [c] 12.4 fs 12.42 plot
sqrt(Execution time, seconds) vs. Fibonacci number calculated
Prolog SWI Prolog 5.7.15 [b] with GMP 4.3.1 21.3 fs 22.22
OCaml INRIA OCaml 3.11.1 [c] 32.2 fs 32.14
Java Sun Java SE 1.6.0_16-b01 / HotSpot 64-Bit VM 14.2-b01 [b] 34.1 fs 31.27
Scheme MIT Scheme 7.7.90 20090107 [b] 48.8 fs 47.88
Forth GNU gforth 0.7.0 [i] - -
C++ GNU g++ 4.4.1 [c] with GMP 4.3.1 - -
J Jsoftware J64 6.02 [i] - -
     
[*] - [c] for Compiler, [b] for Bytecode intepreter, [i] for Interpreter
Execution time is user time as reported by time(1), the test programs were modified to avoid conversion of the entire result to string, since that operation may be slow for large numbers.
J reports stack overflow when trying to calculate numbers over ~3000, and I could not find a way to increase available stack size in jconsole
C++ linked with GMP consumes memory at a very large pace, exceeding 20Gb when calculating 500,000th number
Forth (Gforth linked with GMP) also consumes memory at a very large pace, running into tens of gigabytes about the 500,000th number

 

ALGORITHM 2C: IMPERATIVE LOOP WITH MUTABLE VARIABLES
  1. Assign value 0 to variable n1 and value 1 to variable n2
  2. Assign the value of n1+n2 to variable n2 and assign the old value of n2 to n1
  3. Repeat step 2 n times.
  4. Return the value of variable n1
    Complexity (fixed precision): O(n) speed, O(1) space
    Complexity (unlimited precision): O(n2) speed, O(n) space
These algorithms execute n additions to calculate F(n), and store the intermediate results in mutable variables. Time to execute is, therefore n * time to execute one addition and two variable read/store operations. The addition of large numbers if O(n), and store/retrieve is typically constant time and never greater than O(n). The time to execute the algorithm is T(n)=t0n2. which is a straight line in the plot of sqrt(T) vs. n.
LanguageTranslator [*]t0T(1,000,000),
seconds
Forth GNU gforth 0.7.0 [i] with GMP 4.3.1 11.1 fs 11.13 plot
sqrt(Execution time, seconds) vs. Fibonacci number calculated
C++ GNU g++ 4.4.1 [c] with GMP 4.3.1 23.2 fs 22.62
Java Sun Java SE 1.6.0_16-b01 / HotSpot 64-Bit VM 14.2-b01 [b] 31.6 fs 29.52
OCaml INRIA OCaml 3.11.1 [c] 35.9 fs 34.52
Scheme MIT Scheme 7.7.90 20090107 [b] 50.8 fs 48.46
J Jsoftware J64 6.02 [i] 413 fs 403.6
Prolog SWI Prolog 5.7.15 [b] with GMP 4.3.1 73.8 ns -
Haskell GHC 6.8.2 [c] - -
Hope Hope by Ross Paterson, 08-Dec-03 [i] - -
     
[*] - [c] for Compiler, [b] for Bytecode intepreter, [i] for Interpreter
Execution time is user time as reported by time(1), the test programs were modified to avoid conversion of the entire result to string, since that operation may be slow for large numbers.
Prolog usually does not have mutable variables, but some implementations do.
Haskell does not have the concept of mutable variables.
Hope does not have the concept of mutable variables.

 

ALGORITHM 3A: MATRIX MULTIPLICATION
This is the most mathematically elegant way to calculate a Fibonacci number.
    
| F(n)   F(n-1) |     | 1  1 | ^ n
|               |  =  |      |
| F(n-1) F(n-2) |     | 1  0 |
    Complexity (fixed precision): O(log n) speed, O(1) space
    Complexity (unlimited precision): O(n log2 n log log n) speed, O(n) space
Raising a matrix to the power of n requires only O(log n) matrix multiplications, if n is a natural number and the repeated squaring algorithm is used. Space requirements for repeated squaring include the product accumulator and the last calculated square, which makes it O(1) (in fixed-precision case) unless non-tail-recursive implementation is used. In case of unlimited precision, execution time of each 2x2 matrix multiplication is proportionate to the execution time of one integer multiplication, which has the complexity of O(n log n log log n) and space requirements to store one 2x2 matrix grow as O(n).

The repeated squaring algorithm for calculation of xn:
  1. start with product accumulator a=1 and last square accumulator s=x
  2. if n is odd, increase the accumulated product, a=a*s
  3. calculate the next square of s, s=s2
  4. divide n by 2, n = n div 2
  5. repeat until n=0, at which time the final value of a is the answer
In languages providing bit operations, this algorithm can examine the bits in the binary representation of n, proceeding from least-significant to most-significat bit, instead of repeating division by 2. The example in J makes good use of this approach.
LanguageTranslator [*]t0T(1,000,000),
seconds
        plot
sqrt(Execution time, seconds) vs. Fibonacci number calculated
     
[*] - [c] for Compiler, [b] for Bytecode intepreter, [i] for Interpreter
Execution time is user time as reported by time(1), the test programs were modified to avoid conversion of the entire result to string, since that operation may be slow for large numbers.

 

ALGORITHM 3B: FAST RECURSION
This algorithm can be derived either from from the fact that, from definition, F(n+m)=F(n)F(m) + F(n-1)F(m-1), or by rewriting the matrix equation using the fact that An+m=AnAm:
F(2n-1) = Fn^2 + F(n-1^2
F(2n) = (F(n-1) + F(n+1)Fn = (2F(n-1) + Fn)Fn
This algorithm is faster than than 3A because it does not repeat calculations of the same numbers (top right and bottom left corners of the matrix in 3A), but it cannot be easily implemented as a tail-recursive function, resulting in O(log n) space complexity.
I implement it this way:
  1. Define a recursive function λ(n) such as
  2. λ(n) returns a pair of numbers (0,0) if n equals 0
  3. λ(n) returns a pair of numbers (1,0) if n equals 1
  4. recursively calculate λ(n div 2) (integer division, (n-1)/2 for odd n) to obtain a pair of numbers (k2,k1)
  5. for even n, return a pair of numbers (2k1+k2)*k2, k12+k22)
  6. for odd n, return a pair of numbers ((k1+k2)2 , (2k1+k2)*k2)
  7. The end result is the first element of the result of λ(n)
    Complexity (fixed precision): O(log n) speed, O(log n) space
    Complexity (unlimited precision): O(n log2 n log log n) speed, O(n log n) space
LanguageTranslator [*]t0T(1,000,000),
seconds
        plot
sqrt(Execution time, seconds) vs. Fibonacci number calculated
     
[*] - [c] for Compiler, [b] for Bytecode intepreter, [i] for Interpreter
Execution time is user time as reported by time(1), the test programs were modified to avoid conversion of the entire result to string, since that operation may be slow for large numbers.

 

ALGORITHM 3C: BINET'S FORMULA WITH ROUNDING
    Eigendecomposition of the matrix equation from 1A gives a formula, published by Abraham de Moivre in 1730, Euler in 1765, and by Jacques Philippe Marie Binet in 1843
    F(n) = 1/sqrt(5) * (phi^n - (1-phi^n)
    Where φ is the golden ratio, (1+sqrt(5))/2. Because (1-phin) / sqrt(5) is less than 1/2 for all positive n, this formula can be simplified to nearest integer to φn/sqrt(5) or, using the floor function,
    F(n) = floor((phi^n/sqrt(5) + 1/2)
    Complexity (fixed precision): O(log n) speed, O(1) space
    Complexity (unlimited precision): O(n log2 n log log n) speed, O(n) space

Evaluation of this formula, just as evaluation of the formula in 3A, is thus reduced to taking an integer power of a number, only this number is irrational and must be represented as a floating-point value with sufficient precision to produce the correct fibonacci number. Not many languages support arbitrary precision floating point values, I use GNU MP library where possible. The computational and space complexity of this algorithm are the same as 1A, and are the same as the complexity of the repeated squaring algorithm with one addition: time to calculate the square root of five with sufficient precision. Where not provided by the compiler/library, I iteratively calculate inverse square root via y(n+1) = 0.5*y(n)*[3 - a*y(n)^2] because this recurrent expression has no division.
LanguageTranslator [*]t0T(1,000,000),
seconds
        plot
sqrt(Execution time, seconds) vs. Fibonacci number calculated
     
[*] - [c] for Compiler, [b] for Bytecode intepreter, [i] for Interpreter
Execution time is user time as reported by time(1), the test programs were modified to avoid conversion of the entire result to string, since that operation may be slow for large numbers.

LINKS
Wolfram's MathWorld entry
OEIS entry