
Synergy User Manual and Tutorial

Documenting the Synergy Project

Supervised by Dr. Yuan Shi

Compiled by Joe Jupin

syn·er·gy (sĭn′ər-jē) noun
plural syn·er·gies

1. The interaction of two or more agents or forces so that their combined effect is greater than the sum of their individual effects.

2. Cooperative interaction among groups, especially among the acquired subsidiaries or merged parts of a corporation, that creates an enhanced combined effect.

[From Greek sunergia, cooperation, from sunergos, working together.]

"For it is unworthy of excellent men to lose hours like slaves in the labour of calculation which could safely be relegated to anyone else if machines were used."
-Gottfried Wilhelm Leibniz


Table of Contents

Introduction
1. History and Limitations of Traditional Computing

Parallel Processing
1. What is parallel processing?
2. Why parallel processing?
3. History and Existing Tools for Parallel Processing
   a. History of Parallel Processing
   b. Linda
   c. Parallel Virtual Machine (PVM)
   d. Message Passing Interface (MPI)
4. Parallel Programming Concepts
   a. Symmetric MultiProcessor (SMP)
   b. Stateless Machine (SLM)
   c. Stateless Parallel Processing (SPP)
   d. Tuple Spaces
   e. Division of labor (sharing workload between workers)
   f. Debugging Parallel Programs
5. Theory and Challenges of Parallel Programs and Performance Evaluation
   a. Temporal Logic
   b. Petri Net
   c. Amdahl's Law
   d. Gustafson's Laws
   e. Performance Metrics
   f. Timing Models
      i. Gathering System Performance Data
      ii. Gathering Network Performance Data
   g. Optimal Load Balancing
   h. Availability

About Synergy
1. Introduction to The Synergy Project
   a. What is Synergy?
   b. Why Synergy?
   c. History
2. Major Components and Inner Workings of Synergy
   a. What is in Synergy? (Synergy Kernel with Explanation)
3. Comparisons with Other Systems
   a. Synergy vs. PVM/MPI
   b. Synergy vs. Linda
4. Parallel Programming and Processing in Synergy
5. Load Balance and Performance Optimization
6. Fault Tolerance

Installing and Configuring Synergy
1. Basic Requirements
2. Compiling
3. Setup
4. Configuring the Synergy Environment
5. Activating Synergy
6. Creating a Processor Pool

Using Synergy
1. The Synergy System
   a. The Command Specification Language (csl) File
   b. Synergy's Tuple Space Objects
   c. Synergy's Pipe Objects
   d. Synergy's File Objects
   e. Compiling Synergy Applications
   f. Running Synergy Applications
   g. Debugging Synergy Applications
2. Tuple Space Object Programming
   a. A simple application—Hello Synergy!
   b. Sending and Receiving Data—Hello Workers!—Hello Master!!!
   c. Sending and Receiving Data Types
   d. Getting Workers to Work
      i. Sum of First N Integers
      ii. Matrix Multiplication
   e. Work Distribution by Chunking
      i. Sum of First N Integers Chunking Example
      ii. Matrix Multiplication Chunking Example
   f. Optimized Programs
      i. Matrix Multiplication Optimized
3. Pipe Object Programming
4. File Object Programming

Parallel Meta-Language (PML)
1. Automated Parallel Code Generation

Future Directions

Function and Command Reference
1. Commands
2. Functions
3. Error Codes

References

Index


Introduction

Red text: Copied and pasted from syng_man.ps by Dr. Shi

The emergence of low cost, high performance uni-processors forces the enlargement of processing grains in all multi-processor systems. Consequently, individual parallel programs have increased in length and complexity. However, like reliability, parallel processing of any multiple communicating sequential programs is not really a functional requirement.

Separating pure functional programming concerns from parallel processing and resource management concerns can greatly simplify the conventional "parallel programming" tasks. For example, the use of dataflow principles can facilitate automatic task scheduling. Smart tools can automate resource management. As long as the application dependent parallel structure is uncovered properly, we can even automatically assign processors to parallel programs in all cases.

Synergy V3.0 is an implementation of the above ideas. It supports parallel processing using multiple "Unix computers" mounted on multiple file systems (or clusters) using TCP/IP. It allows parallel processing of any application using mixed languages, including parallel programming languages. Synergy may be thought of as a successor to Linda [1], PVM [2] and Express [3].

Our need to store and process data has been continually increasing for thousands of years. This need has led to the development of complex storage, communication, numerical and processing systems. The information in this section was wholly obtained from sources freely available on the Internet, which are cited in the references section. Much of it was obtained from timelines, encyclopedias and academic Web pages. The accuracy of information collected from the Internet was checked by using multiple corroborating resources and eliminating contradictory information.

[1] Linda is a tuple space parallel programming system led by Dr. David Gelernter, Yale University. Its commercial version is distributed by Scientific Computing Associates, New Haven, CT.
[2] PVM is a message passing parallel programming system by Oak Ridge National Laboratory, the University of Tennessee and Emory University.
[3] Express is a commercial message passing parallel programming system by ParaSoft, CA.


History and Limitations of Ancient and Traditional Computing

The first recognized use of a tool to record the result of transactions was a device called a tally stick. The oldest known artifact is a wolf bone with a series of fifty-five cuts in groups of five that dates from approximately 30,000 to 25,000 BC. The notches in the stick may refer to the number of coins or other items that were counted by some early form of bookkeeping. The earliest stock markets used tally sticks to record transactions. The word "stock" actually means a stout stick. During a transaction the "broker" would record the purchase of stock on a tally stick and then "break" the stick, keeping half and giving the other half to the investor. The two halves would be fit together at some later time to verify the investor's ownership of the shares of stock. In 1734 the English government ordered the cessation of the use of tally sticks, but they were not completely abolished until 1826. By 1834 the British Parliament had collected a very large number of tally sticks, which they decided to burn in the fireplace at the House of Lords. The fireplace was so "engorged" with tally sticks that the fire spread to the paneling and to the neighboring House of Commons, destroying both buildings, which took ten years to reconstruct. [i] Other primitive recording devices included clay tablets, knotted strings, pebbles in bags and parchments. In modern times, books or ledgers have been used to record commercial or financial data using more formal bookkeeping systems, such as the double entry standard that is widely used today.

The first place-valued numerical system, in which both digit and position within the number determine value, and the abacus, which was the first actual calculating mechanism, are believed to have been invented by the Babylonians sometime between 3000 and 500 BC. Their number system is believed to have been developed based on astrological observations. It was a sexagesimal (base-60) system, which had the advantage of being wholly divisible by 2, 3, 4, 5, 6, 10, 15, 20 and 30. The first abacus was likely a stone covered with sand on which pebbles were moved across lines drawn in the sand. Later improvements were constructed from wood frames with either thin sticks or a tether material on which clay beads or pebbles were threaded.

Sometime between 200 BC and the 14th century, the Chinese invented a more advanced abacus device. The typical Chinese swanpan (abacus) is approximately eight inches tall and of various widths and typically has more than seven rods, which hold beads usually made from hardwood. This device works as a 5-2-5-2 based number system, which is similar to the decimal system. Advanced swanpan techniques are not limited to simple addition and subtraction; multiplication, division, square roots and cube roots can be calculated very efficiently. A variation of this device is still in use by shopkeepers in various Asian countries. [ii] There is direct evidence that the Chinese were using a positional number system by 1300 BC and were using a zero valued digit by 800 AD.

Sometime after 200 BC, Eratosthenes of Cyrene (276-194 BC) developed the Sieve of Eratosthenes, a procedure for determining prime numbers. It is called a sieve because it strains or filters out all non-primes. The process is as follows:

1. Make a list of all integers greater than one and less than or equal to n.
2. Strike out the multiples of all primes less than or equal to the square root of n.
3. The numbers that are left are the primes.

The table below shows the result for n = 50; after sieving, the numbers that remain (the primes) are 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43 and 47.

 2  3  4  5  6  7  8  9 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48 49 50
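To make the procedure concrete, the following is a minimal sketch of the sieve in C; the fixed limit of 50 and the variable names are illustrative choices, not part of the original text.

    #include <stdio.h>
    #include <string.h>

    #define N 50

    int main(void)
    {
        /* is_composite[i] is set once i has been "struck out" of the list */
        char is_composite[N + 1];
        memset(is_composite, 0, sizeof(is_composite));

        /* Step 2: strike out the multiples of every prime up to sqrt(N). */
        for (int p = 2; p * p <= N; p++) {
            if (is_composite[p])
                continue;
            for (int m = p * p; m <= N; m += p)
                is_composite[m] = 1;
        }

        /* Step 3: the numbers that are left are the primes. */
        for (int i = 2; i <= N; i++)
            if (!is_composite[i])
                printf("%d ", i);
        printf("\n");
        return 0;
    }

Compiled with any C compiler, this prints the same fifteen primes listed above.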

Eratosthenes is also credited with being the first person to accurately estimate the diameter of the Earth, and he also served as the director of the famed Library of Alexandria. [iii]


[Figure: A postage stamp issued by the USSR in 1983 to commemorate the 1200th anniversary of Muhammad al-Khowarizmi. Scanned by Donald Knuth, one of the legends of computer science.]

The Sieve of Eratosthenes is one of the first well-documented uses of an efficient algorithm-type solution to solve a complex problem. The word algorithm is derived from the Latinized form of Al-Khowarizmi's name. Muhammad ibn Musa al-Khwarizmi was an Arab mathematician of the court of Mamun in Baghdad, born before 800 AD in central Asia, in the region now called Uzbekistan. Along with other Arabic mathematicians, he is responsible for the proliferation of the base-ten number system, which was developed in India. His book on the subject of Hindu numerals was later translated into the Latin text Liber Algorismi de numero Indorum. While a scholar at the House of Wisdom in Baghdad, he wrote Hisab al-jabr w'al-muqabala (from which the word "algebra" is derived). Loose translations of this title could be "the science of transposition and cancellation" or "the calculation of reduction and restoration." He devised a method to restore or transpose negative terms to the other side of an equation and to reduce (cancel) or unite similar terms on either side of the equation. Transposition means that a quantity can be added or subtracted (multiplied or divided) from both sides of an equation, and cancellation means that if there are two equal terms on either side of an equation, they can be altogether cancelled. The following is a translation of a popular verse in Arab schools from over six hundred years ago:

Cancel minus terms and then
Restore to make your algebra;
Combine your homogeneous terms
And this is called muqabalah.

Robert of Chester translated this work into Latin in 1140 AD. Similar methods are still in use in modern algebraic manipulations, which came in the sixteenth century from Francois Viete. Al-Khowarizmi also claimed in his book Indorum (the book of Al-Khowarizmi) that any complex mathematical problem could be broken down into smaller, simpler sub-problems, whose results could be logically combined to solve the initial problem. This is the main concept of an algorithm. Latin translations of his work contributed to much of medieval Europe's knowledge of mathematics. In 1202, Leonardo of Pisa (otherwise known by his nickname Fibonacci) (c. 1175-1250) wrote the historic book Liber Abaci, or "The Book of Calculation", which was his interpretation of the Arabic-Hindu decimal number system that he learned while traveling with Arabs in North Africa. This book was the first to expose the general public, rather than academia, to the decimal number system, which quickly gained popularity because of its clear superiority over existing systems. [iv]

The Greek astronomer, geographer and mathematician Hipparchus (c. 190 BC - 120 BC) likely invented the navigational instrument called the astrolabe. This is a protractor-like device consisting of a degree-marked circle with a rotating arm attached at its center. When the zero degree mark is aligned with the horizon and a celestial body is sighted along the movable arm, the celestial body's position can be read from the degree marks on the circle. The sextant eventually replaced this device because the sextant measured relative to the horizon rather than to the device itself, which allowed more accurate measurements of position for latitude.

Sometime between 1612 and 1614, John Napier (1550 - 1617), born at Merchiston Tower in Edinburgh, Scotland, developed the decimal point, logarithms and Napier's bones—an abacus for the calculation of products and quotients of numbers. Hand-performed calculations were made much easier by the use of logarithms, which made possible many later scientific advancements. His mathematical work, Mirifici Logarithmorum Canonis Descriptio, or in English "Description of the Marvelous Canon of Logarithms", contained thirty-seven pages of explanatory matter and ninety pages of tables, which furthered advancements in astronomy, dynamics and physics. Based on Napier's logarithms, in 1622 William Oughtred (1574 - 1660) invented the circular slide rule for calculating multiplication and division. In 1632 he published Circles of Proportion and the Horizontal Instrument, which described slide rules and sundials. By 1650 the sliding stick form of the slide rule had been developed. In 1624, Henry Briggs (1561 - 1630) published the first set of modern logarithms, and in 1628, Adrian Vlacq published the first complete set of modern logarithms.

In 1623, Wilhelm Schickard (1592 - 1635) invented what is believed to be the first mechanical calculating machine (left). This device used a "calculating clock" with a gear-driven carry mechanism to calculate the multiplication of multi-digit numbers in higher order positions. Between 1642 and 1643, at the age of 18, Blaise Pascal (1623 - 1662) created the "Pascaline" (right), a gear-driven adding machine, which was the first mechanical adding/subtracting machine. Pascal developed this machine to help his father, a tax collector, with his work. He discovered how to mechanically carry numbers to the next higher order by causing the higher order gear to advance one tooth for a full rotation (ten teeth) of the next lower ordered gear. This method is similar to that of old pinball machines or gas pumps with rotating number counters. These devices were never placed into commercial service due to the high cost of manufacture. Approximately fifty Pascalines were constructed and could handle calculations with up to eight digits. [v]

In 1666 Sir Samuel Morland (1625 - 1695) invented a mechanical calculator that could add and subtract. This machine was designed for use with English currency but had no automatic carry mechanism. Auxiliary dials recorded numerical overflows, which had to be re-entered as addends. [vi] In 1673, Gottfried Wilhelm von Leibniz (1646 - 1716) designed a machine called the "Stepped Reckoner" that could mechanically perform all four mathematical operations using a stepped cylinder gear, though the initial design gave some wrong answers. This machine was never mass-produced because the high level of precision needed to manufacture it was not yet available. [vii] In 1774 Philipp-Matthaus Hahn (1739 - 1790) constructed and sold a small number of mechanical calculators with twelve digits of precision.

The advent of the Industrial Revolution, just prior to the start of the nineteenth century, ushered in a massive increase in commercial activity. This created a great need for automatic and reliable calculation. Charles Xavier Thomas (1791 - 1871) of Colmar, France invented the first mass-produced calculating machine, called the Arithmometer (left), in 1820. His machine used Leibniz's stepped cylinder as a digital-value actuator. Thomas' automatic carry system worked in every possible case and was much more robust than that of any predecessor. This machine was improved and produced for decades. Other models, designed by competitors, eventually entered the marketplace.

In 1786, J. H. Mueller, of the Hessian army, conceived the "Difference Engine" but could not raise the funds necessary for its construction. This was a special purpose calculating device that, given the differences between certain values for which the polynomial is uniquely specified, can tabulate the polynomial's values. Such a calculator is useful for functions that can be approximated polynomially over certain intervals. The Difference Engine's mechanical computer prototype design would not be realized until 1822, when it was conceived by Charles Babbage (1792 - 1871). In 1832, Babbage and Joseph Clement built a scaled-down prototype that could perform operations on 6-digit numbers and 2nd order or quadratic polynomials. A full-sized machine would be as big as a room and able to perform operations on 20-digit numbers and 6th order polynomials. Babbage's Difference Engine project was eventually canceled due to cost overruns. In 1843, George Scheutz and his son Edvard Scheutz, of Stockholm, produced a 3rd order engine with the ability to print its results. From 1989-91, a team at London's Science Museum built a fully functional Difference Engine based on Babbage's latest (1837), improved and simpler design, using modern construction materials and techniques. The machine could successfully operate on 31-digit numbers and 7th order differences.


The Difference Engine uses Sir Isaac Newton's method of differences. It works as follows: consider the polynomial p(x) = x² + 2x + 1 and tabulate the values for p(0), p(0.1), p(0.2), p(0.3) and p(0.4). The table below contains the polynomial values in the first column, the differences of each consecutive pair of polynomial values in the second column, and the differences of each consecutive pair of values from the second column in the third column. For a 2nd order polynomial, the third column will always contain the same value.

p(x)             first differences         second differences
p(0) = 1
                 1 - 1.21 = -0.21
p(0.1) = 1.21                              -0.21 - (-0.23) = 0.02
                 1.21 - 1.44 = -0.23
p(0.2) = 1.44                              -0.23 - (-0.25) = 0.02
                 1.44 - 1.69 = -0.25
p(0.3) = 1.69                              -0.25 - (-0.27) = 0.02
                 1.69 - 1.96 = -0.27
p(0.4) = 1.96

Likewise, for an nth order polynomial, column n+1 will always have the same value. To find p(0.5), start from the right column with the value 0.02 and subtract it from the last entry in the second column to get -0.29. Then subtract this value from the last entry in the first column to get 2.25, which is the solution to p(0.5). This can be continued incrementally for greater p(x), indefinitely, by updating the table and repeating the algorithm.
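As a concrete illustration, the following is a minimal C sketch of the same procedure: it seeds the table from p(x) = x^2 + 2x + 1 and then extends it one step at a time using only additions, as a difference engine would. Note that the sketch uses forward differences (p(x+0.1) - p(x)), the opposite sign convention from the table above, so each new value is obtained by adding rather than subtracting; the function and variable names are illustrative, not from the original text.

    #include <stdio.h>

    /* p(x) = x^2 + 2x + 1, used only to seed the difference table */
    static double p(double x) { return x * x + 2.0 * x + 1.0; }

    int main(void)
    {
        const double step = 0.1;

        double v  = p(0.0);                 /* current polynomial value     */
        double d1 = p(0.1) - p(0.0);        /* current first difference     */
        double d2 = (p(0.2) - p(0.1)) - d1; /* constant second difference   */

        printf("p(%.1f) = %.2f\n", 0.0, v);

        /* Each new value needs only two additions, no multiplication. */
        for (int i = 1; i <= 5; i++) {
            v  += d1;   /* next value = previous value + first difference   */
            d1 += d2;   /* next first difference grows by the constant d2   */
            printf("p(%.1f) = %.2f\n", i * step, v);
        }
        return 0;
    }

Running the sketch reproduces the tabulated values and ends with p(0.5) = 2.25, the result derived above.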

This device impresses a zinc block, which prints the results of its calculations on paper. It could be considered the first standalone computer printer.

Babbage also invented the Analytical Engine, which was the first computing device designed to use read-only memory, in the form of punched cards, to store programs. This general-purpose mathematical device was very similar to the electronic processes used in early computers. Later designs of this machine would perform operations on 40-digit numbers. The machine had a processing unit called the "mill" that contained two main accumulators and some special purpose auxiliary accumulators. It also had a memory area called the "store", which could hold approximately 100 numbers. To accept data and program instructions, the Analytical Engine would be equipped with several punch card readers in which the cards were linked together to allow forward and reverse reading. Such linked cards were first used in 1801 by Joseph-Marie Jacquard to control the weaving patterns of a loom. The machine could perform conditional branching, called "jumps", which allowed it to skip to a desired instruction. The device was capable of using a form of microcoding by using the position of studs on a metal barrel called the "control barrel" to interpret instructions. This machine could calculate an addition or subtraction operation in about three seconds, and a multiplication or division operation in about three minutes.

In 1843, Augusta Ada Byron (1815 - 1852), Lady Lovelace, mathematician, scientist and daughter of the famed poet Lord Byron, translated an article from French about Babbage's Analytical Engine, adding her own notes. Ada composed a plan for the calculation of Bernoulli numbers, which is considered to be the first ever "computer program." However, because the Analytical Engine was never built, the algorithm was never run on it. In 1979, the U.S. Department of Defense honored the world's first "computer programmer" by naming its own software development language "Ada." [viii]

In 1854, George Boole (1815 - 1864) (right) wrote "An Investigation of the Laws of Thought, on Which Are Founded the Mathematical Theories of Logic and Probabilities." This work detailed Boole's new binary approach to logic, which processed only two objects at a time (in a yes-no, true-false, on-off, zero-one manner), by incorporating logic into mathematics and reducing it to a simple algebra that presented an analogy between symbols representing logical forms and algebraic symbols. Three primary operations were defined, based on those in Set Theory: AND—intersection, OR—union, and NOT—complement. This system was the beginning of the Boolean algebra that is the basis for many applications in modern electronic circuits and computation. [ix] Though his idea was either ignored or criticized by many of his peers, twelve years later an American, Charles Sanders Peirce, described it to the American Academy of Arts and Sciences. He spent the next twenty years expanding and modifying the idea, eventually designing a basic electrical logic circuit.
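To illustrate how Boole's two-valued algebra maps directly onto modern software and hardware, here is a small, hypothetical C sketch (not from the original text) that prints the truth tables for the three primary operations named above:

    #include <stdio.h>

    int main(void)
    {
        /* Each variable takes only the two Boolean values 0 (false) and 1 (true). */
        for (int a = 0; a <= 1; a++) {
            for (int b = 0; b <= 1; b++) {
                printf("a=%d b=%d  AND=%d  OR=%d  NOT a=%d\n",
                       a, b,
                       a & b,   /* AND: intersection */
                       a | b,   /* OR:  union        */
                       !a);     /* NOT: complement   */
            }
        }
        return 0;
    }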

Processing and storage were not the only advancements made prior to the 20th century. There were also great improvements in communications technology. Samuel Morse (1791 - 1872) conceived the telegraph in 1832 and had built a working model by 1835. This was the first device to communicate through the use of electricity. The telegraph worked by tapping out a message on a sending device (right) in Morse code, a series of dots-and-dashes that represented letters, numbers, punctuation and other symbols. These dots-and-dashes were converted into electrical impulses and sent, over the wire, to a receiver (left). The receiver converted the electrical impulses into an audible sound that represented the original dots-and-dashes. In 1844, Morse sent a signal from Washington to Baltimore over this communication device. By 1854 there were 23,000 miles of telegraph wire in use within the United States. This provided a much more efficient form of communication that greatly affected national socio-economic development. [x] In 1858, a telegraph cable was run across the Atlantic Ocean, providing communication service between the U.S. and England for less than a month. By 1861 a transcontinental cable connected the East and West coasts of the U.S., and by 1880, 100,000 miles of undersea cable had been laid.

The next great advancement in communication was Alexander Graham Bell's (1847 - 1922) invention of the "electrical speech machine", or telephone, in 1876. This invention was developed from improvements that Bell made to the telegraph, which allowed more than one signal to be transmitted over a single set of telegraph wires simultaneously. Within two years, he had set up the first telephone exchange in New Haven, Connecticut. He had established long distance connections between Boston, Massachusetts and New York City by 1884. The telecommunication industry would eventually reach almost every locality in the country, then the world. Bell's original venture evolved into larger companies, and in 1881 American Bell Telephone Co. Inc. purchased the Western Electric Manufacturing Company to manufacture equipment for Bell. In 1885, the American Telephone and Telegraph Company (AT&T) was formed to extend Bell system long lines across the U.S., and in 1899 AT&T became the parent company of Bell, assuming all assets. The Western Electric Engineering Dept. was organized in 1907, and a research branch to do scientific research and development was organized in 1911. On December 27, 1925, Bell Telephone Laboratories was created to consolidate the research labs from AT&T and Western Electric; it remained a wholly owned subsidiary of AT&T after the divestiture of the seven regional Bell companies. Bell Laboratories would eventually become one of the world's premier communication and computer research centers. One of Bell Labs' contributions to computing was the development of UNIX by Dennis Ritchie and Kenneth Thompson in 1970. In 1991, AT&T acquired NCR, formerly National Cash Register, which became AT&T Global Information Solutions. [xi]

The explosion in population growth between 1880 and 1890, due to increased birth rates and immigration, created a great dilemma for the Census Bureau. During this time, Herman Hollerith (right) was a statistician for the Census Bureau and was responsible for solving problems related to the processing of the large amounts of data from the 1880 US census. He was attempting to find ways of manipulating data mechanically, as had been suggested to him by Dr. John Shaw Billings. In 1882, Hollerith joined MIT to teach mechanical engineering and also started to experiment with Billings' suggestion by studying the operation of the Jacquard loom. Though he found that the loom's operation was not useful for processing data, he determined that the punched cards were very useful for storing data. In 1884, Hollerith devised a method to convert the data stored on the punched cards into electrical impulses using a card-reading device. He also developed a typewriter-like device to record the data on the punched cards, which changed very little in its design over the next 50 years. The card readers used pins that passed through the holes in the cards, creating electrical contacts; the impulses from these contacts would activate mechanical counters to manipulate and tally the data. This system was successfully demonstrated in 1887 by tabulating mortality statistics, and it won the bid to be used to tabulate the 1890 Census data.

Hollerith had Pratt and Whitney manufacture the punching devices and the Western Electric Company manufacture the counting devices. The Census Bureau's new system was ready by 1890 and was processing the first data by September of the same year. The count was completed by December 12, 1890, revealing the total population of the United States to be 62,622,250. The count was not only completed eight times faster than if it had been performed manually, it also allowed the gathering of more data about the country's population than was possible before, such as the number of children in a family. Hollerith founded the Tabulating Machine Company in 1896 to produce his improved counting machines and other inventions, one of which automatically fed the cards into the counting machines. His system was used again for the 1900 Census, but because Hollerith demanded more than the cost of counting the data by hand, the Census Bureau was forced to develop its own system. In 1911, Hollerith's company merged with another company, becoming the Computing Tabulating Recording Company, but was nearly forced out of the counting machine market due to fierce competition from new entrants. Hollerith retired from his position of consulting engineer in 1921. Because of the efforts of Thomas J. Watson, who joined the company in 1918, the company reestablished its position as a leader in the market by 1920. In 1924, the Computing Tabulating Recording Company was renamed International Business Machines Corporation (IBM). By 1928, punch card equipment would be attached to computers as output devices and would also be used by L. J. Comrie to calculate the motion of the moon. [xii]

In 1895, the Italian physicist and inventor Guglielmo Marconi sent the first wireless message. Prior to his first transmission, Marconi studied the works of Heinrich Hertz (1857 - 1894) and later started to experiment with Hertzian waves to transmit and receive messages over increasing distances without the use of wires. The messages were sent in Morse code. He patented his invention in 1896. After years of experimentation and improvement, especially with respect to distance, in 1897 Marconi named his company the Wireless Telegraph and Signal Company. After a series of takeovers and mergers, this company eventually became part of the General Electric Company (GEC), which was eventually renamed Marconi Corporation plc in 2003. [xiii] In 1904, radio technology was improved by the invention of the two-electrode radio rectifier, which was the first electron tube, also called the oscillation valve or thermionic valve (left). It is credited to John Ambrose Fleming, a consultant to the Marconi Company. This device was much more sensitive to radio signals than its predecessor, the coherer. This invention inspired all subsequent developments in wireless transmission. In 1906, Lee de Forest improved the thermionic valve by adding a third electrode and a grid to control and amplify signals, creating a new device called the Audion. This device was used to detect radio waves and convert the radio frequency (RF) to an audio frequency (AF), which could be amplified through a loudspeaker or headphones. By 1907 gramophone music was regularly broadcast from New York over radio waves. [xiv] In 1907, both A. A. Campbell-Swinton (left) and Boris Rosing independently suggested using cathode ray tubes to transmit images. Though intended for television, the cathode ray tube has made a valuable contribution to computing by providing a human-readable interface with computational devices. In a letter to Nature magazine, Swinton gave the first full description of an all-electronic television system:

"Distant electric vision can probably be solved by the employment of two beams of kathode rays (one at the transmitting and one at the receiving station) synchronously deflected by the varying fields of two electromagnets placed at right angles to one another and energised by two alternating electric currents of widely different frequencies, so that the moving extremities of the two beams are caused to sweep synchronously over the whole of the required surfaces within the one-tenth of a second necessary to take advantage of visual persistence. Indeed, so far as the receiving apparatus is concerned, the moving kathode beam has only to be arranged to impinge on a suitably sensitive fluorescent screen, and given suitable variations in its intensity, to obtain the desired result."

In 1927, during a television demonstration, Herbert Hoover's face was the first image broadcast in the U.S., with telephone wires used for the voice transmission. Vladimir Zworykin invented the cathode ray tube (CRT) in 1928. It eventually became the first computer storage device. Color television signals were successfully transmitted in 1929 and first broadcast in 1940.


In 1911, while studying the effects of extremely cold temperatures on metals such as mercury and lead, the physicist Heike Kamerlingh Onnes discovered that they lost all resistance at certain low temperatures just above absolute zero. This phenomenon is known as superconductivity. In 1915, another physicist, Manson Benedicks, discovered that alternating current could be converted to direct current by using a germanium crystal, which eventually led to the use of microchips. In 1919, the physicists William Henry Eccles (1875 - 1966) and F. W. Jordan invented the flip-flop, the first electronic switching circuit, which was critical to high-speed electronic counting systems. The flip-flop is a digital logic hardware circuit that can switch or toggle between two states controlled by its inputs, which makes it behave like a one-bit memory. The three common types of flip-flop are the SR flip-flop, the JK flip-flop and the D-type flip-flop.
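As a rough software analogy (an illustration only, not from the original text), the following C sketch models the behaviour of a clocked D-type flip-flop: a single stored bit that takes on the value of its input only when the clock "ticks".

    #include <stdio.h>

    /* One-bit memory: the stored state of the flip-flop. */
    struct d_flip_flop {
        int q; /* current output, 0 or 1 */
    };

    /* On each clock tick the flip-flop copies its D input to Q. */
    static void clock_tick(struct d_flip_flop *ff, int d)
    {
        ff->q = d ? 1 : 0;
    }

    int main(void)
    {
        struct d_flip_flop ff = { 0 };
        int inputs[] = { 1, 1, 0, 1, 0 };

        for (int i = 0; i < 5; i++) {
            clock_tick(&ff, inputs[i]);
            printf("tick %d: D=%d -> Q=%d\n", i, inputs[i], ff.q);
        }
        return 0;
    }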

In 1925, Vannevar Bush (1890 - 1974) developed the first analog computer to solve differential equations. These analog computers were mechanical devices that used large gears and other mechanical parts to solve equations. The first working machine was completed in 1931 (left). In 1945, he published an article in the Atlantic Monthly called "As We May Think", which described a theoretical device called a memex. This device uses a microfilm search system, very similar to hypertext, based on a concept that he called associative trails. His description of the system is:


"The owner of the memex let us say, is interested in the<br />

origin <strong>and</strong> properties of the bow <strong>and</strong> arrow. Specifically he<br />

is studying why the short Turkish bow was apparently<br />

superior to the English long bow in the skirmishes of the<br />

Crusades. He has dozens of possibly pertinent books <strong>and</strong><br />

articles in his memex. First he runs through an<br />

encyclopedia, finds an interesting but sketchy article,<br />

leaves it projected. Next, in a history, he finds another<br />

pertinent item, <strong>and</strong> ties the two together. Thus he goes,<br />

building a trail of many items. Occasionally he inserts a<br />

comment of his own, either linking it into the main trail or<br />

joining it by a side trail to a particular item. When it<br />

becomes evident that the elastic properties of available<br />

materials had a great deal to do with the bow, he branches<br />

off on a side trail which takes him through textbooks on<br />

elasticity <strong>and</strong> physical constants. He inserts a page of longh<strong>and</strong> analysis of his own. Thus<br />

he builds a trail of his interest through the maze of materials available to him."<br />

In 1934, Konrad Zuse (1910 - 1995) was an engineer working for the Henschel Aircraft Company, studying stresses caused by vibrations in aircraft wings. His work involved a great deal of mathematical calculation. To aid him in these calculations, he developed ideas on how machines should perform calculations. He determined that such machines should be freely programmable by reading a sequence of instructions from a punched tape, and that a machine should use both the binary number system and a binary logic system so that it could be built from binary switching elements. He designed a semi-logarithmic floating-point representation, using an exponent and a mantissa, to calculate both very small and very large numbers. He developed a "high performance adder", which included a one-step carry-ahead and precise handling of arithmetic exceptions. He also developed an addressable memory that could store arbitrary data. He devised a control unit to control all the other devices within the machine, along with input and output devices to convert numbers from binary to decimal and vice versa.

By 1936 he had completed the design for the Z1 computer, which he constructed in his parents' living room by 1938. This was a completely mechanical unit based on his previous design. Though unreliable, it had the ability to store 64 words, each 22 bits in length (8 bits for the exponent and sign, and 14 bits for the mantissa), in a memory that consisted of layers of metal bars between layers of glass. Its arithmetic unit was constructed from a large number of mechanical switches and had two 22-bit registers. The machine was freely programmable with the use of a punched tape. The device also had the prescribed control unit and addressable memory, making it the world's first programmable binary computing machine, with a clock speed of 1 Hertz. Viewed from above, the Z1 is very similar in appearance to a silicon chip. At first the machine was not very reliable; however, it functioned reliably by 1939.

The Z2 was an experimental machine similar to the Z1, but it used 800 relays for the arithmetic unit instead of mechanical switches. This machine proved that relays were reliable, which prompted Zuse to design and build the Z3 using relays. The Z3 was constructed between 1938 and 1941 in Berlin. The Z3 used relays for the entire machine and had a 64-word memory consisting of 22-bit floating-point numbers. The Z3 was the first reliable, fully functional, freely programmable computer based on binary floating-point numbers and a switching system, and it had the capability to perform complex arithmetic calculations. It had a clock speed of 5.33 Hertz and could perform a multiplication operation in 3 seconds. This machine contained all the components of the machine described by von Neumann et al. in 1946 except the ability to store the program in memory together with the data. In 1998, Raul Rojas proved that the Z3 was a truly universal computer in the sense of a Turing machine. Zuse completed a reconstruction of the Z3 in 1961; Allied bombing during World War II had destroyed the original machine.

An example program for the Z3, from "The Life and Work of Konrad Zuse" Web site authored by Horst Zuse (listed in the references section), is the calculation of the polynomial ((a4x + a3)x + a2)x + a1, where a4, a3, a2 and a1 would first be loaded into memory cells 4, 3, 2 and 1:

Lu       Call the input device for the variable x
Ps 5     Store the variable x in memory word 5
Pr 4     Load a4 into register R1
Pr 5     Load x into register R2
Lm       Multiply: R1 := R1 x R2
Pr 3     Load a3 into register R2
Ls1      Add: R1 := R1 + R2
Pr 5     Load x into register R2
Lm       Multiply: R1 := R1 x R2
Pr 2     Load a2 into register R2
Ls1      Add: R1 := R1 + R2
Pr 5     Load x into register R2
Lm       Multiply: R1 := R1 x R2
Pr 1     Load a1 into register R2
Ls1      Add: R1 := R1 + R2
Ld       Show the result as a decimal number
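For comparison with a modern high-level language, here is a small C sketch (illustrative only; the coefficient and input values are made up for the example) of the same Horner-style evaluation ((a4x + a3)x + a2)x + a1 that the Z3 program above performs with its registers R1 and R2:

    #include <stdio.h>

    int main(void)
    {
        /* Coefficients a4..a1, corresponding to the Z3 memory cells 4..1. */
        double a4 = 1.0, a3 = -2.0, a2 = 3.0, a1 = -4.0; /* example values */
        double x = 2.0;                                  /* the input value */

        /* r1 plays the role of register R1: multiply by x, then add the
           next coefficient, exactly as the Lm / Ls1 instruction pairs do. */
        double r1 = a4;
        r1 = r1 * x + a3;
        r1 = r1 * x + a2;
        r1 = r1 * x + a1;

        printf("((a4*x + a3)*x + a2)*x + a1 = %f\n", r1);
        return 0;
    }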

The program above is very similar to the assembly code used in modern computers. From 1942 to 1946 Zuse worked on ways to program computers. To aid engineers and scientists in the solution of complex problems, he developed the Plankalkül (plan calculus) programming language. This precursor to today's algorithmic languages was the world's first programming language and was intended for a logical machine. A logical machine could do more than just the numerical calculations to which the algebraic machines he had previously designed (Z1, Z2, Z3 and Z4) were limited. The Z4 model, completed in 1945 and reconstructed in 1950, used a mechanical memory similar to that of the Z1 and had 32-bit words. By 1955, this machine had gained the ability to call subprograms, through a secondary punch tape reader, and to use a conditional branch instruction.

In 1942, Zuse built the S1, a special purpose computer with 600 relays and 12-bit binary words, to measure the wing surface area of airplanes. This machine was destroyed in 1944. Zuse improved this model with the construction of the S2, which used approximately 100 clock gauges to automatically scan the surface of wings. This computer was most likely the first machine to use the concept of a process. It was destroyed in 1945. In 1949, Zuse founded Zuse KG, Germany's first computer company. In 1952, Zuse KG constructed the Z5 for optical calculations, an improved version of the Z4 that was about six times faster. It had many punch card readers for data and program input, a punch card writer to output data, and could handle 32-bit floating-point numbers. In 1957, Zuse KG constructed the Z22, which contained an 8192-word magnetic drum and was the company's first stored-program computer. In 1961, Zuse KG built the Z23, which was based on the same logic as, and was three times faster than, the Z22, and was the company's first transistor-based computer. In 1965, the company produced the Z43, which was the first modern transistor computer to use TTL logic. A TTL (transistor-transistor logic) digital integrated circuit (IC) uses transistor switches for logical operations. In 1967, Siemens AG purchased Zuse KG. [xv]

In 1937, Howard Aiken (1900 - 1973) proposed to Harvard University a machine that could perform the four fundamental operations of arithmetic (addition, subtraction, multiplication and division) in a predetermined order; the proposal was forwarded to IBM. His research had led to a system of differential equations that had no exact solutions and that could only be solved numerically using a prohibitive amount of calculation. His report stated:

"... whereas accounting machines handle only positive numbers, scientific machines must be able to handle negative ones as well; that scientific machines must be able to handle such functions as logarithms, sines, cosines and a whole lot of other functions; the computer would be most useful for scientists if, once it was set in motion, it would work through the problem frequently for numerous numerical values without intervention until the calculation was finished; and that the machine should compute lines instead of columns, which is more in keeping with the sequence of mathematical events."


Aiken, working with IBM engineers, developed the ASCC (Automatic Sequence Controlled Calculator), which was capable of five operations: addition, subtraction, multiplication, division and reference to previous results. Though it ran on electricity and its major components were magnetically operated switches, this machine had a lot in common with Babbage's Analytical Engine. Construction of the machine started in 1939 at the IBM laboratories in Endicott and was completed in 1943. The machine weighed 35 tons, had more than 500 miles of wire, and used vacuum tubes and relays to operate. The machine had 72 storage registers and could perform operations to 23 significant figures. The machine instructions were entered on punched paper tapes, and punched cards were used to enter input data. The output was either in the form of punched cards or printed by means of an electric typewriter. The machine was moved to Harvard University, where it was renamed the Harvard Mark I. The US Navy used this machine for gunnery and ballistics calculations in the Bureau of Ordnance's Computation Project, which was carried out at Harvard. In 1947, Aiken completed the Harvard Mark II, a faster, relay-based computer. He also worked on the Mark III (the first computer to contain a drum memory) and Mark IV computers, and made contributions in electronics and switching theory. [xvi]

In 1937, Claude Shannon (1916 - 2001) wrote his Master's thesis, "A Symbolic Analysis of Relay and Switching Circuits", using symbolic logic and Boole's algebra to analyze and optimize relay-switching and computer circuits. It was published in the A.I.E.E. Transactions in 1938. For this work, Shannon was awarded the Alfred Noble Prize of the combined engineering societies of the United States in 1940. In 1948, Shannon published his most important work on information theory and communication, "A Mathematical Theory of Communication", in which he demonstrated that all information sources have a "source rate" and all communication channels have a "capacity", both measurable in bits per second, and that information can be transmitted over a channel if and only if the source rate does not exceed the channel capacity. He also published works related to cryptography and the reliability of relay circuits, both with respect to transmission in noisy channels. [xvii]
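Stated symbolically (a restatement of the claim above, with notation that is not from the original text), for a source rate $R$ and a channel capacity $C$, both in bits per second:

$$\text{reliable transmission is possible} \iff R \le C .$$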

In 1937, George Stibitz, a Bell Labs researcher, created the first electromechanical circuit that could perform binary addition, built from old relays, batteries, flashlight bulbs, wires and tin strips. He realized that Boolean logic could be applied to electromechanical telephone relays. He incorporated this binary adder prototype (pictured with Stibitz) into his Model K digital calculator. Over the next two years, Stibitz and his associates at Bell Labs devised a machine to perform all four basic math operations on complex numbers. It was initially called the Complex Number Calculator but was renamed the Bell Labs Model Relay Computer (also known as the Bell Labs Model 1) in 1949. This machine is considered to be one of the first digital computers. Its electromechanical brain consisted of 450 telephone relays and 10 crossbar switches, and three teletypewriters provided input to the machine. It could find the quotient of two eight-place complex numbers in about 30 seconds. In 1940, Stibitz brought one of the teletypewriters to an American Mathematical Society meeting at Dartmouth and performed the world's first demonstration of remote computing, using phone lines to communicate with the Complex Number Calculator, which was in New York. [xviii]


In 1937, Alan Turing (1912 - 1954) published his paper "On Computable Numbers, with an Application to the Entscheidungsproblem (decision problem)". In this paper, he introduced the Turing Machine, an abstract machine capable of reading or writing symbols and moving between states, dependent upon the symbol read from a bidirectional, movable tape, using a finite set of rules listed in a finite table. This machine demonstrated that every method found for describing 'well-defined procedures', introduced by other mathematicians, could be reproduced on a Turing machine. This statement is known as the Church-Turing thesis and is a founding work of modern computer science, which defined computation and its absolute limitations. His definition of computable was that a problem is 'calculable by finite means'. In his 1938 Ph.D. thesis, published as "Systems of Logic Based on Ordinals" in 1939, Turing addressed uncomputable problems.

During World War II, Turing worked at Bletchley Park, the British government's wartime communications headquarters. His main task was to master the Enigma (pictured right), the German enciphering machine, which he was able to crack, providing the Allies with valuable intelligence. His contributions made him a chief scientific figure in the fields of computation and cryptography. After the war, he was interested in comparing the power of computation with the power of the human brain, and he proposed the possibility that a computer, if properly programmed, could rival the human mind. In 1950, Turing wrote his famous paper "Computing Machinery and Intelligence," which, along with his previous work, founded the study of 'Artificial Intelligence'. This paper introduced 'the imitation game', a test to determine whether a computer program has intelligence. This game is now referred to as the Turing Test. Turing describes the original imitation game as:


“The new form of the problem can be described in terms of a game which we call the ‘imitation game.’ It is played with three people, a man (A), a woman (B), and an interrogator (C) who may be of either sex. The interrogator stays in a room apart from the other two. The object of the game for the interrogator is to determine which of the other two is the man and which is the woman. He knows them by labels X and Y, and at the end of the game he says either "X is A and Y is B" or "X is B and Y is A." The interrogator is allowed to put questions to A and B.”

The idea in the Turing Test is that the interrogator (C) is actually communicating with a human (A) and a machine (B). The interrogator asks the two candidates questions to decide their identities, as above with the man and the woman. In order to be judged intelligent, the machine must fool the interrogator into choosing it as the human. xix

Between 1937 and 1938, John Vincent Atanasoff (far left) and Clifford Berry devised the principles for the ABC machine (right), an electronic digital machine that would lead to advances in digital computing machines. Construction of this non-programmable binary machine began in 1941 but was stopped in 1942, before it became operational, due to World War II. The machine employed capacitors to store electrical charge that could correspond to numbers in the form of logical 0's and 1's. It was the first machine to demonstrate electronic techniques in calculation and to use regenerative memory. It contained 300 vacuum tubes in its arithmetic unit and 300 more in its control unit. The capacitors were affixed inside 12-inch-tall by 8-inch-diameter rotating Bakelite (a thermosetting plastic) cylinders (shown below) with metal contact bands on their outer surface. Each cylinder contained 1,500 capacitors and could store 30 binary numbers, 50 bits in length, which could be read from or written to the metal bands of the rotating cylinder. The input data was loaded on punched cards. Intermediate data was also stored on punched cards by burning small spots onto the cards with electric sparks; these could be re-read by the computer at some later time by detecting the difference in electrical resistance of the carbonized burned spots. The machine could also convert from binary to decimal and vice versa. xx

In 1943, the U.S. Army contracted with the Moore School of Electrical Engineering, University of Pennsylvania, for the production of the Electronic Numerical Integrator and Computer (ENIAC), designed by J. Presper Eckert (1919-1995) and John Mauchly (1907-1980), which would be used to calculate ballistic tables. The 30-ton machine with approximately 18,000 vacuum tubes was completed in 1946 and was contained in a 30' by 50' room.

The ENIAC was a general-purpose digital electronic computer that could call subroutines. It could reliably perform 5,000 additions or 360 multiplications per second, which was between 100 and 1,000 times faster than existing technology. At the time of its introduction, ENIAC was the world's largest single electronic apparatus. The machine was separated into thirty autonomous units. Twenty of these were accumulators, which were ten-digit, high-speed adding machines with the ability to store results. These accumulators used electronic circuits called ring counters, a loop of bistable devices (flip-flops) interconnected in such a manner that only one of the devices may be in a specified state at one time, to count each of its digits from 0 to 9 (a decimal arithmetic unit). The machine also had a multiplier and a divider/square-rooter, which were special devices to accelerate their respective arithmetic operations. A "computer program" on ENIAC was entered by using wires to connect different units of the machine so as to perform operations in a required sequence. The picture on the left shows two women entering a program, which was a very difficult task. The machine was controlled by a sequence of electronic pulses, in which each unit on the machine could issue a pulse to cause one or more other units to perform a computation. The control and data signals on ENIAC were identical, typically 2-microsecond pulses placed at ten-microsecond intervals, which allowed the output of an accumulator to be attached to the input of a control line of another accumulator. This made data-sensitive operations, or operations based on data content, possible. It also had a unit called the "Master Programmer", which performed nested loops or iterations. ENIAC's units could operate simultaneously, performing parallel calculations. Eventually this machine could perform IF-THEN conditional branches; it is likely that this was the first machine with this operation. xxi

In 1944, because of suggested improvements from people involved with the project, the U.S. Army extended the ENIAC project to include research on the Electronic Discrete Variable Automatic Computer (EDVAC), a stored-program computer. At about this time, John von Neumann (1903 - 1957) visited the Moore School to take part in discussions regarding EDVAC's design. He is best known for producing the best-recognized formal description of a modern, stored-program computer, known as the von Neumann architecture, in his 1945 paper "First Draft of a Report on the EDVAC". The basic elements of this architecture are listed below; a short illustrative sketch of the instruction cycle they imply follows the list.


• A memory, which contains both data and instructions and also allows both data and instruction locations to be read from, and written to, in any order.
• A calculating unit, which can perform both arithmetic and logical operations on the data.
• A control unit, which can interpret retrieved memory instructions and select alternative courses of action based on the results of previous operations.
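The following is a minimal sketch, in C rather than in any notation from the EDVAC report, of the fetch-decode-execute cycle that these three elements imply. The four-instruction machine, its opcode values and its 16-word memory are invented purely for illustration; the only point is that instructions and data share the same memory and are interpreted by a simple control loop.

    #include <stdio.h>

    /* Hypothetical 4-instruction machine: code and data share mem[]. */
    enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2, OP_STORE = 3 };

    int main(void)
    {
        int mem[16] = {
            /* program: acc = mem[10]; acc += mem[11]; mem[12] = acc; halt */
            OP_LOAD, 10, OP_ADD, 11, OP_STORE, 12, OP_HALT, 0, 0, 0,
            /* data */ 7, 35, 0, 0, 0, 0
        };
        int pc  = 0;                    /* control unit: program counter  */
        int acc = 0;                    /* calculating unit: accumulator  */

        for (;;) {                      /* fetch-decode-execute loop      */
            int op = mem[pc++];         /* fetch the instruction          */
            if (op == OP_HALT) break;
            int addr = mem[pc++];       /* fetch the operand address      */
            switch (op) {               /* decode and execute             */
            case OP_LOAD:  acc = mem[addr];  break;
            case OP_ADD:   acc += mem[addr]; break;
            case OP_STORE: mem[addr] = acc;  break;
            }
        }
        printf("mem[12] = %d\n", mem[12]);
        return 0;
    }

Running the sketch executes the tiny program stored in the first seven words of mem[] and prints mem[12] = 42, the sum of the two data words.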

The EDVAC was a multipurpose binary computing machine with a memory capacity of 1,000 words, which was more than any other computing device of its time. Its memory used mercury delay lines, tubes of mercury in which electrical impulses were bounced back and forth, creating a two-state device for storing 0's and 1's that could be assigned or retrieved at will. It used 12 of 16 possible 4-bit instructions, and each word in memory had 44 bits. The fixed-point range was ±(1 - 2^-43), and the floating-point numbers had a 33-bit mantissa, a 10-bit exponent and 1 sign bit, giving a range of ±(1 - 2^-33)·2^511. It had approximately 10,000 crystal diodes and 4,000 vacuum tubes. Its average error-free up-time was about 8 hours. Its magnetic drum could hold 4,608 words of 48 bits each, with a block transfer length of between 1 and 384 words. It also had a magnetic tape storage system that could store 112 characters per inch on a magnetic wire between 1,250 and 2,500 feet long, with a variable block length of between 2 and 1,024 words, also 48 bits long. During searches of the tape the machine could be released for computation, and data read from the tape could be automatically re-recorded to the same place on the tape. EDVAC's input devices consisted of a photoelectric tape reader that could read 78 words per second and an IBM card reader that could read 146 cards per minute at 8 words per card. The output devices were a 30-word-per-minute paper tape perforator, a 30-word-per-minute teletypewriter and a 1,000-word-per-minute card punch. This machine had a clock speed of 1 MHz and was a significant improvement over ENIAC. xxii

Thomas Flowers and his crew started construction of the Mark 1 COLOSSUS computer in 1943 at the Dollis Hill Post Office Research Station in the U.K. Max Newman and associates at Bletchley Park ('Station X'), Buckinghamshire, designed this machine, which was primarily intended for cryptanalysis of the German "Fish" teleprinter ciphers used during World War II; this electromechanical attempt at a one-time pad was the German military's most secure method of communication. Prior to knowledge of Zuse's Z3, COLOSSUS was considered to be the first totally electronic computing device, since it used only vacuum tubes, as opposed to the relays in the Z3. This special-purpose computer was equipped with very fast optical paper tape readers for input. Nine of the improved Mark II machines were constructed and the original COLOSSUS Mark I was converted, for a total of ten machines. These machines were considered to be of the highest level of secrecy. After the end of the war, by direct order from Churchill, all ten machines were destroyed, reduced into pieces no larger than a man's hand. The COLOSSUS, the Heath Robinson (a precursor to the COLOSSUS) and the Bombe (a machine designed by Alan Turing) are all in the process of reconstruction to preserve these important achievements.

The Universal Automatic Computer I (UNIVAC I) was designed by J. Presper Eckert and John Mauchly in 1947. The machine, constructed by the Eckert-Mauchly Computer Corporation (founded by Eckert and Mauchly in 1946 and later purchased by Remington Rand), was delivered to the U.S. Census Bureau in 1951 at a cost of $159,000. By 1953, three UNIVACs were in operation, and by 1958 there were forty-six in the service of government departments and private organizations. Remington Rand sold the later machines for more than $1,000,000 each.


UNIVAC's input devices consisted of a 12,800-character-per-second magnetic tape reader, a 240-card-per-minute card-to-tape converter and a punched-paper-tape-to-magnetic-tape converter. Its output devices consisted of a 12,800-character-per-second magnetic tape unit, a 120-card-per-minute tape-to-card converter, a 10-character-per-second character printer, a Uniprinter (a 600-line-per-minute high-speed line printer developed by Earl Masterson in 1954) and a 60-word-per-minute Rad Lab buffer. This was the first machine to use a buffered memory. It had 5,200 vacuum tubes, 18,000 crystal diodes and 300 relays, and contained a mercury delay line memory that could hold 1,000 words of 72 bits each (11 decimal digits plus sign). The 8-ton, 25-by-50-foot machine consumed 125,000 watts of power, more than 300 times as much as a desktop computer (the average desktop consumes less than 400 watts). It could perform 1,900 additions, 465 multiplications or 256 divisions per second. The machine also had a character set, similar to a typewriter keyboard, with capital letters. In 1956 a commercial UNIVAC computer was introduced that used transistors.

In 1943, the Massachusetts Institute of Technology (MIT) started the Whirlwind Project for the U.S. Navy, under the supervision of Jay Forrester, after determining that it was possible to produce a computer to run a flight simulator for training bomber crews. Initially, they attempted to use an analog machine but found that it was neither flexible nor accurate. Another problem was that the typical batch-mode computers of the day were not computationally sufficient for time-constrained processing, because they could not continually operate on continually changing input. Whirlwind also required much more speed than typical computational systems. The design of this high-speed stored-program computer was completed by 1947, and 175 people started construction in 1948. The system was completed in three years, at which point the U.S. Air Force picked it up, because the Navy had lost interest, renaming it Project Claude. The machine was too slow, and improvements were implemented to increase performance. The initial machine used Williams tubes, cathode ray tubes that were used to store electronic data, which were unreliable and slow. Forrester expanded on the work of An Wang, who created the pulse transfer-controlling device in 1949. The result was magnetic core memory (upper left), which permanently stores binary data on tiny donut-shaped magnets strung together on a wire grid. This approximately doubled the memory speed of the new machine, completed in 1953. Whirlwind was the world's first real-time computer and the first computer to use a cathode ray tube, which at this time was a large oscilloscope screen, as a video monitor for an output device.

The new machine was used in the Semi-Automatic Ground Environment (SAGE), which was manufactured by IBM and became operational in 1958. The picture on the right shows a SAGE terminal. This system coordinated a complex network of radar, telephone lines, radio links, aircraft and ships. It could identify and detect aircraft when they entered U.S. airspace. Each two-system SAGE installation was contained in a 40,000-square-foot area, had 30,000 vacuum tubes, had a 4K-by-32-bit-word magnetic drum memory and used 3 megawatts of power. In 1958, the Whirlwind project was also extended to include an air traffic control system. The last Whirlwind-based SAGE computer was in service until 1983. xxiii


In 1946, work started at Cambridge on the Electronic Delay Storage Automatic Calculator (EDSAC), a serial electronic calculating machine. It was contained in a 5-by-4-meter room, had 3,000 valves, consumed 12,000 watts and could perform 650 instructions per second at 500 kHz. Its mercury ultrasonic delay line memory could hold 1,024 words of 17 bits each (35-bit "long" numbers could be held by using two adjacent memory "tanks"), and it had an "operating system" (called the "initial orders") that was stored in 31 words of read-only memory. The input device was a 6⅔-character-per-second 5-track teleprinter paper tape reader, and output was performed on a 6⅔-character-per-second teleprinter. A commercial version of EDSAC, called LEO, which was manufactured by the Lyons Company, began service in 1953. Cambridge was the first university in the world to offer a Diploma in Computer Science, using EDSAC; it was initially a one-year postgraduate course called Numerical Analysis and Automatic Computing. xxiv


In 1948, at the University of Manchester in England, the Small-Scale Experimental Machine, nicknamed the "Baby", successfully executed its first program, becoming the world's first stored-program electronic digital computer. Frederic C. Williams (1911 - 1977) and Tom Kilburn (1921 - 2001) built the machine to test the Williams-Kilburn Tube (a type of memory that stores each bit as a charged, illuminated spot on the face of a cathode ray tube) for speed and reliability, and to demonstrate the feasibility of a stored-program computer. Its success prompted the development of the Manchester Mark I, a usable computer based on the same principles. The picture shows the "Baby" (replica), the shortest cabinet at the right, and the Mark I, the six taller cabinets.


The picture on the left shows Williams and Kilburn at the console of the Manchester Mark I. It was built in 1949 and could store data in addressable "lines", each holding one 40-bit number or two 20-bit instructions, and had two 20-bit address modifier registers, called "B-lines" (for modifying addresses in instructions), which functioned either as index registers or as base address registers. This Mark I was of historical significance because it was the first machine to include this index/base register in its architecture, which was a very important improvement. It was also the first random-access-memory computer. It could perform serial 40-bit arithmetic, with hardware add, subtract and multiply (with an 80-bit double-length accumulator) and logical instructions. The average instruction time was 1.8 milliseconds (about 550 additions per second), with multiplication taking much longer. It had a single-address-format order code with about 30 function codes. The machine used two Williams tubes for its 128 words of memory. Each tube contained 64 rows with 40 points (bits) per row, which was two "pages" (a page was an array of 32 by 40 points). It also had a 128-page-capacity drum backing store, with 2 pages per track and about 30 milliseconds revolution time, on 2 drums (each drum could hold up to 32 tracks, i.e. 64 pages).

The machine's peripheral instructions included a "read" from a 5-hole paper tape reader, on which code was normally entered, and a "transfer" of a page or track to or from a Williams-Kilburn Tube page or pair of pages in storage. It also had a bank of 40 (8 by 5) buttons that could be used to set the ones in a word in storage, and additional switches that controlled the operations of the Mark I. The current storage contents could be viewed on the machine's display tube, shown on the left, which was organized into 8 columns of 5-bit groups. There was a direct correspondence between the symbols, each made up of a 5-bit group, on the punched tape and the symbols on the display tube. The government awarded the contract to mass-produce Mark I computers to Ferranti Ltd.; the resulting Ferranti Mark I was the world's first commercially available computer. Kilburn wrote the first electronically stored computer program for the Mark I and also established the world's first university computer science department, at Manchester. xxv

There were substantial improvements in computer programming and user interface design as well as in hardware architecture. In 1949, John Mauchly (of ENIAC and UNIVAC) developed Short Order Code, which is thought to be the first high-level language, for the Binary Automatic Computer (BINAC). The BINAC, completed in 1949, was designed for Northrop Aviation and was the first computer to use magnetic tape. In 1951, David Wheeler, Maurice Wilkes, and Stanley Gill introduced sub-programs and the "Wheeler jump" to implement them, by jumping to a different section of instructions and returning to the original section after the sub-program is finished. Maurice Wilkes also originated the concept of micro-programming, a technique for providing an orderly approach to designing the control section of a computer system.

In 1951, while working with the UNIVAC I mainframe, Betty Holberton (left) created the sort-merge generator, a predecessor to the compiler and possibly the first useful program capable of generating other programs for the UNIVAC I. She also developed the C-10 instruction code, which controlled the machine's core functions. The C-10 instruction code allowed UNIVAC to be controlled by console (keyboard) commands instead of switches, dials and wires, which made the system much more useful and human-friendly. The code was designed to use mnemonic characters to input instructions, such as 'a' for add. She later chaired the committee that established the standards for the Common Business Oriented Language (COBOL). xxvi

In 1952, Grace Murray Hopper developed A-0, believed to be the first real compiler: an intermediary program that converts symbolic mathematical code into a sequence of instructions that can be executed by a computer. This allowed specific call numbers to be assigned to collected programming routines stored on magnetic tape, which the computer could then find and execute. In the same year she developed a compiler for business use, B-0 (later renamed FLOW-MATIC), that could translate English terms, and wrote a paper describing the use of symbolic English notation to program computers, which is much easier to use than the machine code that was previously required. While working on the UNIVAC I, she encouraged programmers to reuse common pieces of code that were known to work well, reducing programming errors. She was on the CODASYL Short Range Committee that defined the basic COBOL language design, which appeared in 1959 and was greatly influenced by FLOW-MATIC. COBOL was launched in 1960 and was the first standardized computer programming language for business applications. Various computer manufacturers and the Department of Defense supported development of the standard. It was intended to solve business problems, to be machine independent and to allow for future updates. COBOL has been updated and improved over the years, and is still used today. Hopper spent many years contributing to the standardization of compilers, which eventually led to international and national standards and validation facilities for many programming languages. xxvii

In 1956, John Backus and his IBM team created the first FORTRAN (short for FORmula TRANslation) compiler. The initial compiler consisted of 25,000 lines of machine code, which could be stored on magnetic tape. Backus and his team wrote the paper "Preliminary Report, Specifications for the IBM Mathematical FORmula TRANslating System, FORTRAN" to communicate their work and to show that scientists and mathematicians could program without understanding how the machines worked and without knowing assembly language. FORTRAN works by using a software device called a translator, which contains a parser, to translate the high-level language that can be read by people into a binary language that can be executed on a computer. A later version of FORTRAN is still in use today, over 40 years later. Backus also developed a standard notation, Backus-Naur Form (BNF), to unambiguously and formally describe a computer language. BNF uses grammatical-type rules to describe a language; for example, a rule such as <digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" defines a digit as any one of the ten numerals.

In 1947, a major event occurred in electronics and computation: John Bardeen, Walter Brattain and William Shockley (pictured in order on the left) announced that they had developed the transistor, for which they were awarded the Nobel Prize in 1956. This invention ushered in a new era in computers. First-generation computers used vacuum tubes as their principal digital circuits; vacuum tubes generated heat, consumed electrical power and quickly burned out, requiring frequent maintenance. They were also used in telecommunications to amplify long-distance phone calls, which was the reason for this team's research. Transistors can switch and modulate electric current, and are composed of a semiconductor material, such as germanium or silicon, that can both conduct and insulate. The transistor can act as a transmitter, by converting sound waves into electronic waves, and as a resistor, by controlling electrical current. In 1954, Texas Instruments lowered the cost of production by introducing silicon transistors. The transistor brought about the second generation of computers by replacing vacuum tubes with solid-state components, which began the semiconductor revolution. xxviii Philco Corporation engineers developed the surface barrier transistor in 1954, which was the first transistor suitable for use in high-speed computers. In 1957, Philco completed the TRANSAC S-2000, the first large-scale, fully transistorized scientific computer to be offered as a manufactured product. xxix

In 1957, the Burroughs Atlas computer, constructed at the Great Valley Research Laboratory outside Philadelphia, was one of the first to use transistors. The machine was developed for the American air defense system deployed during the 1950's and was the ground guidance computer for the Atlas intercontinental ballistic missile (ICBM); the first launch was in 1958. The system had two memory areas, one for data with 256 24-bit words and one for instructions with 2,048 18-bit words. Eighteen Atlas computers were constructed, costing $37 million. xxx


After the launch of Sputnik (NASA recreated model pictured on left) by the U.S.S.R. in 1957, the U.S. government responded by forming the Advanced Research Projects Agency (ARPA) to ensure technological superiority by expanding new frontiers of technology beyond immediate requirements. Initially ARPA's mission concerned issues including space, ballistic missile defense and nuclear test detection. The major contribution that ARPA made to computer technology was the Advanced Research Projects Agency Network (ARPANET).

In 1960, Paul Baran of the RAND Corporation published studies on secure communication technologies that would allow military communications to continue operating after a nuclear attack. He arrived at two important ideas that outline the packet-switching principle for data communications (a small sketch of the second idea follows the list):

1. Use a decentralized network with multiple paths between any two points, so that the system can automatically recover from single points of failure.
2. Divide complete user messages into blocks before sending them into the network.
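The following is a minimal sketch, in C, of the second idea: a complete message is divided into fixed-size blocks, each carrying a sequence number so that the receiver can reassemble them even if they travel by different paths and arrive out of order. The 8-byte block size and the packet structure are illustrative assumptions only, not part of Baran's design.

    #include <stdio.h>
    #include <string.h>

    #define BLOCK_SIZE 8          /* illustrative payload size per packet  */

    struct packet {
        int  seq;                 /* sequence number for reassembly        */
        int  len;                 /* number of payload bytes actually used */
        char payload[BLOCK_SIZE];
    };

    int main(void)
    {
        const char *message = "THIS MESSAGE IS SENT AS SEVERAL SMALL BLOCKS";
        size_t total = strlen(message);
        struct packet pkt;
        int seq = 0;

        /* Divide the complete message into blocks before "sending" them. */
        for (size_t off = 0; off < total; off += BLOCK_SIZE) {
            size_t n = total - off;
            if (n > BLOCK_SIZE) n = BLOCK_SIZE;
            pkt.seq = seq++;
            pkt.len = (int)n;
            memcpy(pkt.payload, message + off, n);
            printf("packet %d: %.*s\n", pkt.seq, pkt.len, pkt.payload);
        }
        return 0;
    }

In a real packet-switched network each block would also carry addressing and error-check information, as the TCP/IP discussion later in this chapter describes.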

In 1961, Leonard Kleinrock performed research on "store and forward" messaging, where messages are buffered completely on a switch or router, checksummed to find whether an error exists in the message, and then sent to the next location. In 1962, J.C.R. Licklider of MIT discussed the "Galactic Network" concept in a series of memos. These computer network ideas represent the same type of general communication system as is used in the Internet. The same year that he wrote these memos, Licklider was working at ARPA and was able to convince others that this was an important idea. In 1966, Lawrence G. Roberts of MIT was brought in to head the ARPANET project and build the network. Roberts' "plan for the ARPANET" was introduced at a symposium in 1967; it included a time-sharing scheme using smaller computers to facilitate communication between larger machines, as suggested by Wesley Clark. An updated plan was completed in 1968, which included packet switching. The contract to construct the network was awarded to Bolt, Beranek and Newman in early 1969. The first connected network consisted of four nodes, at UCLA, the Stanford Research Institute, UCSB and the University of Utah, and was completed in December 1969. The ARPANET was the world's first operational packet-switched network. Packet switching was a new concept that allowed more than one machine to access one channel to communicate with other machines; previously, channels were circuit switched and allowed only one machine to communicate with one other machine at a time. By 1973, University College London in England and NORSAR in Norway had connected to the ARPANET, making it an international network.

With the advent of computer internetworking came new innovations to facilitate communication between machines. One innovation, formulated by Robert Kahn and Vint Cerf, was to make the host computers responsible for reliability, instead of the network, as had been done in the initial ARPANET. This minimized the role of the network, which made it possible to connect networks and machines with different characteristics, and it led to the development of the Transmission Control Protocol (TCP), to check, track and correct transmission errors, and the Internet Protocol (IP), to manage packet switching. The TCP/IP suite is arranged as a layered set of protocols, called the TCP/IP stack, which defines each layer's responsibilities in the connectionless transmission of data as well as the interfaces that allow data to be passed between layers. Because the interfaces between the layers are standardized and well defined, hardware and software can be developed for different purposes and on different architectures. The TCP/IP protocols replaced the Network Control Protocol (NCP), the original ARPANET protocol, and the military part of ARPANET was separated off, forming MILNET, in 1983.
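As a present-day illustration of how an application hands reliability over to the hosts' TCP/IP implementations, the following is a minimal sketch of a TCP client written in C against the standard Berkeley sockets interface found on Unix systems. The host name "example.com", the port 80 and the HTTP request line are placeholders chosen only for the example, and error handling is reduced to the bare minimum.

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <netdb.h>
    #include <sys/socket.h>

    int main(void)
    {
        struct addrinfo hints, *res;
        memset(&hints, 0, sizeof hints);
        hints.ai_family   = AF_UNSPEC;     /* IPv4 or IPv6               */
        hints.ai_socktype = SOCK_STREAM;   /* TCP: reliable byte stream  */

        /* "example.com" and port "80" are placeholders for this sketch. */
        if (getaddrinfo("example.com", "80", &hints, &res) != 0) {
            fprintf(stderr, "name lookup failed\n");
            return 1;
        }

        int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
        if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) < 0) {
            fprintf(stderr, "connect failed\n");
            return 1;
        }

        /* TCP (in the hosts) handles sequencing, acknowledgement and    */
        /* retransmission; IP (in the network) only moves the packets.   */
        const char *request = "HEAD / HTTP/1.0\r\nHost: example.com\r\n\r\n";
        write(fd, request, strlen(request));

        char buf[512];
        ssize_t n = read(fd, buf, sizeof buf - 1);
        if (n > 0) { buf[n] = '\0'; printf("%s", buf); }

        close(fd);
        freeaddrinfo(res);
        return 0;
    }

The application only writes and reads a byte stream; sequencing, acknowledgement and retransmission are handled by TCP in the two hosts, while IP in the routers between them simply forwards the packets.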

The initial network restricted commercial activities because it was government funded. In the early 1970's, message exchanges that were initially available only on mainframe systems became available across wide area networks. In 1972, Ray Tomlinson introduced the "name@computer" addressing scheme to simplify e-mail messaging, which is still in use today. In the same year, the Telnet standard for terminal emulation over TCP/IP networks, which allows users to log onto a remote computer, was introduced. It enables users to enter commands on offsite computers, executing them as if they were using the remote system's own console. In 1973, the File Transfer Protocol (FTP) was developed to facilitate the long-distance transfer of files across computer networks. The Unix User Network (Usenet) was created in 1979 to facilitate the posting and sharing of messages, called "articles", on network-distributed bulletin boards, called "newsgroups". In the mid 1980's, the Domain Name System used domain name servers to simplify machine identification: instead of using a machine's IP address, such as "10.192.20.128", a user need only remember the machine's domain name, such as "thismachine.net". By 1982, commercial e-mail service was available in 25 cities, and the term "Internet" was designated to mean a "connected set of computer networks". In 1983, the complete change to TCP/IP created a truly global "Internet".

The National Science Foundation (NSF) became involved in ARPANET in the mid 1980's. In 1986, the NSFNet backbone was started to connect supercomputers and provide access to them. In the late 1980's, the Department of Defense stopped funding ARPANET, and the NSF assumed responsibility for long-haul connectivity in 1989. The first Internet Service Provider (ISP) companies also appeared, servicing regional research networks and providing access to e-mail and Usenet News for the public. The NSF initiated the connection of regional TCP/IP networks, and the Internet began to emerge. In the 1990's, commercial activity was allowed and the Internet grew rapidly. Eventually, this commercial activity created competition, and commercial regional providers connected through Network Access Points (NAPs) took over the backbones and interconnections, causing NSFNet to be dropped and all existing commercial restrictions to be removed.

In 1989, Tim Berners-Lee invented the Uniform Resource Locator (URL) and the Hypertext Markup Language (HTML), which were inspired by Vannevar Bush's "memex". The URL provides a simple way to find specific documents on the Internet by giving the name of the machine, the name of the document file and the protocol used to obtain and display the file; for example, in "http://www.example.com/index.html" the protocol is http, the machine is www.example.com and the document is index.html. HTML is a method of formatting a document by embedding codes, which can also be used to designate hypertext, text that can be "clicked" on with a mouse pointer to cause some action or to retrieve another document. Eventually it became possible to place graphics and sound in documents, which started the World Wide Web (WWW) and many of the services that are now available on the Internet. By 1997, 150 countries and 15 million host computers were connected to the Internet, and 50 million people were using the World Wide Web. By 1990, approximately 9 million people were already sending over 2.3 billion e-mail messages. xxxi

In 1958, the ALGOrithmic Language (ALGOL) 58, a high-level scientific programming language, was formalized. It was designed by an international committee to be a universal language, and it was the first attempt at software portability through a machine-independent implementation. ALGOL is considered an important language because it influenced the development of future languages. Almost all later languages have been developed with "ALGOL-like" lexical and syntactic structures, with hierarchical, nested environments and control structures. ALGOL 60 had block structure for statements and the ability to pass parameters to subprograms by name or by value. It also had if-then-else control statements, constructs for iteration and support for recursion. ALGOL has a small number of basic constructs, with a non-restricted associated type, and rules to combine them into more complex constructs, some of which can produce values. ALGOL also had dynamic arrays with variable-specified subscript ranges, reserved words for key functions that could not be used as identifiers, and user-defined data types to fit particular problems. A sample ALGOL "Hello World!" program from the Web site referenced for this information, which runs on a Unisys A-series mainframe, is: xxxii

BEGIN
    FILE F (KIND=REMOTE);
    EBCDIC ARRAY E [0:11];
    REPLACE E BY "HELLO WORLD!";
    WHILE TRUE DO
        BEGIN
            WRITE (F, *, E);
        END;
END.

As of 1959, more than 200 programming languages had been created.

Between 1958 and 1959, both Texas Instruments and Fairchild Semiconductor Corporation were introducing integrated circuits (ICs). TI's Jack Kilby, an engineer with a background in transistor-based hearing aids, introduced the first IC (pictured left, from CNN), which was based on a germanium semiconductor. Soon after, one of Fairchild's founders and research engineers, Robert Noyce, produced a similar device based on a silicon semiconductor. The monolithic integrated circuit combined transistors, capacitors, resistors and all connective wiring on a single semiconductor crystal, or chip. Fairchild produced the first commercially available ICs in 1961. Integrated circuits quickly became the industry-standard architecture for computers. Robert Noyce later co-founded Intel. Jack Kilby commented:

"What we didn't realize then was that the integrated circuit would reduce the cost of electronic functions by a factor of a million to one, nothing had ever done that for anything before" xxxiii

In 1960, Remington Rand UNIVAC delivered the Livermore Advanced Research Computer (LARC) to the University of California Radiation Laboratory, now called the Lawrence Livermore National Laboratory. This machine had four major cabinets that were approximately 20 feet long, 4 feet wide and 7 feet tall. One cabinet contained the I/O processor to route and control input and output, another held the computational unit, and the last two contained 16K of ferrite core memory. There were also twelve floating-head drums, rotating cylinders coated with a magnetic material, approximately 4 feet wide, 3 feet deep and 5 feet high, which were used as storage devices. Each drum could store 250,000 12-decimal-digit LARC words, or almost 3 million words across the 12 drums, and there were two independent controllers for read and write operations. There were also eight tape units that could hold approximately 450,000 LARC words on each tape reel, after deducting storage overhead. Its printer could print 600 lines per minute and had a 51-character alphanumeric set. There was a punched card reader and a control console with toggle switches to control the system (pictured above). The LARC performed decimal-mode arithmetic to 22 decimal digits and could perform a 12-digit-by-12-digit addition in 4 microseconds and a 12-digit-by-12-digit multiplication in 12 microseconds, with division taking a little longer. The machine used storage, shift and result registers to hold information during repetitive calculations. LARC's hardware was difficult to maintain due to its discrete nature, being composed of a collection of individual transistors, resistors, capacitors and other electronic components. xxxiv

In November of 1960, Digital Equipment Corporation (DEC) started production of the world's first commercial interactive computer, the PDP-1 (left). The $120,000 machine's four cabinets measured approximately 8 feet in length. A DEC technical bulletin describes it as:

"...a compact, solid state general purpose computer with an internal instruction execution rate of 100,000 to 200,000 operations per second. PDP-1 is a single address, single construction, stored program machine with a word length of 18-bits operating in parallel on 1's complement binary numbers."

It had a 4,000-word memory of 18-bit words. It was the first computer with a typewriter keyboard and a cathode-ray tube display monitor. It also had a light pen, which made it interactive, and a paper punch output device. Producing 50 of these machines made DEC the world's first mass computer maker. xxxv

Between 1961 and 1962, Fernando Corbató of MIT developed the Compatible Time-Sharing System (CTSS) as part of Project MAC; it was one of the first time-sharing operating systems that allowed multiple users to share a single machine. It was also the first system to have a text-formatting utility and one of the first to have e-mail capabilities. Louis Pouzin developed RUNCOM for CTSS, the precursor of the UNIX shell script, which executed commands contained in a file and allowed parameter substitution. The Multiplexed Information and Computing Service (Multics), the operating system that led to the development of UNIX, was also developed by Project MAC. This system was the successor to CTSS and was used for multiple-access computing. xxxvi


In 1962, the Telstar I communications satellite was launched and relayed the first transatlantic television signals. The black-and-white image of an American flag was relayed from a large antenna in Andover, Maine to the radome in Pleumeur-Bodou, France. This was the first satellite built for active communications, and it demonstrated that a worldwide communication system was feasible. The satellite was launched by NASA from Cape Canaveral, Florida, weighed 171 pounds and was 34 inches in diameter. On the same day, Telstar I beamed the first satellite long-distance phone call. The satellite was in service until 1963. As of 2002, there were 260 active satellites in Earth's orbit.


In late 1962, the Atlas computer (left) entered service at the University of Manchester, England. This was the first machine to have pipelined instruction execution, virtual memory and paging, and separate fixed- and floating-point arithmetic units. At the time it was the world's most powerful computer, capable of about 200,000 FLOPS. It could perform the following arithmetic operations (approximate times):

• Fixed-point addition in 1.59 microseconds
• Floating-point addition in 1.61 microseconds
• Floating-point multiplication in 4.97 microseconds

The machine could timeshare between different peripheral and computing operations, was capable of multiprogramming, and had interleaved stores, V-stores to store images of memory, a one-level virtual store, autonomous transfer units and ROM stores. It had an operating system called the "Supervisor" to manage the computer's processing time and scheduling, and it could compile high-level languages. The machine had a 48-bit word size and a 24-bit address size. It could store 16K words in its main ferrite core memory, interleaving odd and even addresses, and had an additional 96K of storage on its four magnetic drums, which were integrated with the main memory using virtual memory, or paging. It also accessed its peripheral devices through V-store addresses and extracode routines. xxxvii
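Atlas's one-level store is the direct ancestor of the address translation that every paged virtual memory system still performs. The following is a minimal sketch of the idea in C; the 512-word page size, the tiny eight-page address space and the hand-filled page table are simplifications chosen purely for illustration, not a description of Atlas's actual hardware.

    #include <stdio.h>

    #define PAGE_WORDS 512               /* words per page (illustrative)   */
    #define NUM_PAGES  8                 /* size of the toy virtual space   */

    /* page_table[p] holds the physical frame for virtual page p, or -1 if */
    /* the page currently lives on the drum (a "page fault").              */
    static int page_table[NUM_PAGES] = { 3, -1, 0, -1, 1, -1, -1, 2 };

    /* Translate a virtual word address into a physical one. */
    static int translate(int vaddr)
    {
        int page   = vaddr / PAGE_WORDS;
        int offset = vaddr % PAGE_WORDS;
        if (page >= NUM_PAGES || page_table[page] < 0)
            return -1;                   /* would trigger a drum transfer   */
        return page_table[page] * PAGE_WORDS + offset;
    }

    int main(void)
    {
        int samples[] = { 5, 1029, 2000, 3583 };
        for (int i = 0; i < 4; i++) {
            int p = translate(samples[i]);
            if (p < 0)
                printf("virtual %4d -> page fault\n", samples[i]);
            else
                printf("virtual %4d -> physical %4d\n", samples[i], p);
        }
        return 0;
    }

When the table holds no frame for a page, a real system would fetch the page from the drum (or disk) and retry; the sketch simply reports a page fault.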

In 1964, J. Kemeny and T. Kurtz, mathematics professors at Dartmouth College, developed the Beginner's All-purpose Symbolic Instruction Code (BASIC) as a language that was simple to learn and interpret and that would help students go on to more complex and powerful languages, such as FORTRAN or ALGOL. xxxviii In the same year, IBM developed its Programming Language 1 (PL/1), formerly known as New Programming Language (NPL), which was the first attempt to develop a language that could be used for many application areas. Previously, programming languages had been designed for a single purpose, such as mathematics or physics; PL/1 can be used for both business and scientific purposes. PL/1 is a freeform language with no reserved keywords; it has hardware-independent data types, is block oriented, contains control structures for conditional logic, supports arrays, structures and unions (and complex combinations of the three), and provides storage classes. xxxix

In 1962, Doug Engelbart of the Stanford Research Institute published the paper "Augmenting Human Intellect: A Conceptual Framework". His ideas proposed a device that would allow a computer user to interact with an information display screen by moving a cursor on the screen, in other words, a mouse. The actual device, shown on the left, was invented in 1964. xl In the same year, the number of computers in the US grew to 18,000. In 1972, the Xerox Palo Alto Research Center (PARC) Learning Research Group developed Smalltalk. This forerunner of Mac OS and MS Windows was the first system with overlapping windows and opaque pop-up menus. In 1973, Alan Kay invented the "office computer", a forerunner of the PC and Mac. Its design was based on Smalltalk, with icons, graphics and a mouse. Kay stated at a 1971 meeting at PARC:

"Don't worry about what anybody else is going to do… The best way to predict the future is to invent it. Really smart people with reasonable funding can do just about anything that doesn't violate too many of Newton's Laws!" xli

In 1973, R. Metcalfe and researchers at Xerox PARC developed the experimental Alto PC, which incorporated a mouse, a graphical user interface and Ethernet. Within the same year, PARC's Charles Simonyi developed the Bravo text editor, the first "What You See Is What You Get" (WYSIWYG) application. Later in the year, Metcalfe wrote a memo describing Ethernet as a modified "Alohanet", titled "Ether Acquisition". By 1975, Metcalfe had developed the first Ethernet local area network (LAN), and by 1979 Xerox, Intel and DEC had announced support for Ethernet. The Alto PC was officially introduced in 1981 with a mouse, built-in Ethernet and Smalltalk. The commercial version, available the same year, was named the Xerox Star and was the first commercially available workstation with a WYSIWYG, desktop-style Graphical User Interface (GUI).

In 1964, Control Data Corp. introduced the CDC 6600 (left). Designed by supercomputer guru Seymour Cray, it had 400,000 transistors and was capable of 350,000 FLOPS. The 100 machines produced, which sold for $7-10 million each, had over 100 miles of electrical wiring and a Freon refrigeration system to keep the electronics cool; the CDC 6600 was the world's first commercially successful supercomputer. The machine was also the first to have an interactive display that showed graphical results of data as it was processed, in real time.

Between 1964 and 1965, DEC introduced the PDP-8, the world's first minicomputer. It contained transistor-based circuitry modules and was mass-produced for the commercial market, making it the first computer sold as a retail product. At its initial price of $18,000, it was the smallest and least expensive parallel general-purpose computer available. By 1973, the PDP-8, described as the "Model T" of the computer industry, was the best-selling computer in the world. PDP-8s had 12-bit words, usually with 4K words of memory, a robust instruction set, and could run at room temperature. xlii

In 1965, Maurice V. Wilkes proposed the use of cache memory: a smaller, faster, more expensive type of memory that holds a copy of part of main memory. Access to entities in cache memory is much faster than access to main memory, which leads to better system performance. The same year, Gordon Moore (later a co-founder of Intel) proposed that the number of transistors on microchips would double every year. The prediction held and came to be known as Moore's Law. Consider that a chip of about 2½ cm² in 1964 had ten components, while a chip of the same size in 1970 had about 1,000.
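As a rough back-of-the-envelope check of those figures, doubling once per year over that six-year span gives

\[ 10 \times 2^{\,1970-1964} = 10 \times 2^{6} = 640, \]

which is the right order of magnitude for the quoted figure of about 1,000 components; on these numbers the growth was in fact slightly faster than one doubling per year.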

In 1967, Donald Knuth produced some of the work that would become The Art of Computer Programming. He introduced the idea that a computer program's algorithms and data structures should be treated as entities separate from the program itself, an idea that has greatly improved computer programming. Volume 1 of The Art of Computer Programming was published in 1968.

In 1967, Niklaus Wirth began to develop the Pascal structured programming language. The Pascal Standard (ISO 7185) states that it was intended to:

• "make available a language suitable for teaching programming as a systematic discipline based on fundamental concepts clearly and naturally reflected by the language"

• "define a language whose implementations could be both reliable and efficient on then-available computers" xliii

Pascal, based on ALGOL's block structure, was released in 1970. An example "Hello World!" program in Pascal is:

Program Hello (Input, Output);
Begin
    Writeln ('Hello World!');
End.

In 1968, Burroughs introduced the first computers that used integrated circuits, the B2500 and the B3500. The same year, Control Data built the CDC 7600 and NCR introduced its Century series computer, both using only integrated circuits.

In 1968, the Federal Information Processing Standard created "The Year 2000 Crisis" by encouraging the "YYMMDD" six-digit date format for information interchange. In the same year, the practice of structured programming started with Edsger Dijkstra's writings about the harm of the goto statement. This led to wide use of control structures, such as the while loop, to control iterative routines in programs. xliv Between 1968 and 1969, the NATO Science Committee held two conferences on Software Engineering, which are considered to be the start of that field. From the 1960s to the 1980s there was a "software crisis", because many software projects came to undesirable ends. Software Engineering arose from the need to produce better software, on schedule and within the anticipated budget. Essentially, Software Engineering is a set of diverse practices and technologies used in the creation and maintenance of software for diverse purposes. xlv

In 1969, Bell Labs withdrew support from Project MAC and the Multics system to begin development of UNIX, and Kenneth Thompson and Dennis Ritchie began designing UNIX in the same year. The operating system was initially named the Uniplexed Information and Computing System (UNICS), as a play on Multics, but the name was later changed. In the beginning, UNIX received no financial support from Bell Labs; some support was granted to add text processing to UNIX for use on the DEC PDP-11/20. The text processor was named runoff, which Bell Labs used to record patent information; it later evolved into troff, the world's first publishing program with full typesetting capability. In 1973, it was decided to rewrite UNIX in C, a high-level language, to make it easily modifiable and portable to other machines, which accelerated the development of UNIX. AT&T licensed the system to commercial, educational and government organizations.

In 1973, Dennis Ritchie developed the C programming language, a high-level language intended mainly for use with UNIX. A sample "Hello World!" program in C is:

#include <stdio.h>

int main(){
    printf("Hello World!\n");
    return 0;
}

Later, in 1983, Bjarne Stroustrup added object orientation to C, creating C++ at AT&T Bell Labs. In 1995, Sun Microsystems released its object-oriented Java programming language, which was both platform independent and network capable. Java's syntax derives largely from C++, and C++ is an extension of C.

By 1975, there were versions of UNIX using pipes for inter-process communication (IPC). AT&T released a commercial version, UNIX System III, in 1982. Later, System V was developed by combining features from other versions, including U.C. Berkeley's Berkeley Software Distribution (BSD), which contributed the vi editor and curses. Berkeley continued to work on BSD, the noncommercial version, and added the Transmission Control Protocol (TCP) and the Internet Protocol (IP), known together as the TCP/IP suite, to the UNIX kernel for network communication. Eventually AT&T produced UNIX System V by adding system administration, file locking for file-level security, job control, streams, the Remote File System and the Transport Layer Interface (TLI) as a network application programming interface (API). Between 1987 and 1989, AT&T merged System V and XENIX, Microsoft's x86 UNIX implementation, into UNIX System V Release 4 (SVR4).

Novell bought the rights to UNIX from AT&T in an attempt to challenge Microsoft's Windows NT, which caused its core markets to suffer. Novell then sold the UNIX rights to X/OPEN, an industry consortium that defined a version of the UNIX standard and later merged with the OSF, another standards group, to form the Open Group. The Open Group presently defines the UNIX operating system. xlvi

In 1969, the RS-232 standard, commonly referred to as the serial port, was established for serial binary data interchange between data terminal equipment (DTE) and data communication equipment (DCE). xlvii

In 1970, RCA developed metal-oxide semiconductor (MOS) technology for fabricating integrated circuits, which made them smaller, cheaper and faster to produce. The first chips using large-scale integration (LSI) were produced in the same year, containing up to 15,000 transistors per chip. In 1971, Intel introduced the world's first mass-produced, single-chip, universal microprocessor, the Intel 4004, invented by Federico Faggin, Ted Hoff, Stan Mazor and their engineering team. It was a dual in-line package (DIP) processor, meaning that it had two rows of pins that were inserted into the motherboard. The microprocessor can be thought of as a "computer on a chip": all of the thinking parts of the computer, the central processing unit (CPU), memory, and input and output (I/O) controls, were miniaturized and condensed onto a single chip. The 4004, based on silicon-gate MOS technology, had more than 2,300 transistors in an area of 12 square millimeters, a 4-bit CPU that used 8-bit instructions, a command register, a decoder, decoding control, control monitoring of machine commands and an interim register. The chip ran at a speed of 108 kHz and could process 60,000 instructions per second, at a cost of $300. It had sixteen 4-bit (or eight 8-bit) general-purpose registers and a set of 45 instructions, and it could address 1K of program memory and 4K of data memory. Later models had clock speeds of up to 740 kHz. The Pioneer 10 spacecraft, launched on March 2, 1972, used a 4004 processor and became the first spacecraft (and microprocessor) to enter the Asteroid Belt. xlviii

In 1972, Intel offered the 8008, the world's first 8-bit microprocessor. The 8008 had 3,300 transistors and, even though its 800 kHz clock made it slightly slower than the 4004 in instructions per second, its 8-bit design let it access more RAM and process data three to four times faster than the 4-bit chips. In 1974, Intel released the 8080, which had a 16-bit address bus and an 8-bit data bus. It had a 16-bit stack pointer, a 16-bit program counter and seven 8-bit registers, some of which could be combined into 16-bit registers. It also had 256 I/O ports to ensure that devices did not interfere with its memory address space. It had a clock speed of 2 MHz, 64 KB of addressable memory, 48 instructions and vectored multilevel interrupts.

In 1978, Intel introduced the 8086, the first 16-bit microprocessor. This chip had 29,000 transistors, used a 3.0-micron die design and had 300 instructions. It had a 16-bit bus for communication with peripherals. The chips were available in 5, 6, 8 and 10 MHz clock speeds and had a 20-bit memory address space that could address up to 1 MB of RAM. Though the 8086 was available, IBM chose to use the 8088, the 8-bit-bus version developed slightly later, because of the 8086's greater expense. xlix

The Intel 80186, released in 1980, had a 16-bit external bus, an initial clock speed of 6 MHz and a 1.0-micron die. This chip was Intel's first pin grid array (PGA) offering, meaning that the pins on the processor were arranged in a matrix-like array around the outside edge. This popular chip was mostly used in embedded systems and rarely used in PCs. The model required fewer external chips than its predecessors: it had an integrated system controller, a priority interrupt controller, two direct memory access (DMA) channels with controller, and timing circuitry (three timers). It replaced 22 separate VLSI and transistor-transistor logic (TTL) chips and was more cost efficient than the chips it replaced.

In 1982, Intel developed the 80286 processor, which had 134,000 transistors and a 1.5-micron die and could address up to 16 megabytes of memory. This microprocessor was the first to introduce protected mode, which allowed the computer to multitask by running more than one program at a time through time-sharing of the system's resources. Its initial models ran at 8, 10 and 12.5 MHz, but later models ran as fast as 20 MHz.

The 80386 processor was released in 1985 with 275,000 transistors, a 1.0-micron die, 32-bit instructions and a 32-bit memory address space that could address up to four gigabytes of RAM; it could also address up to 64 terabytes of virtual memory. The initial clock speeds were 16, 20, 25 and 33 MHz. It also had a feature called instruction pipelining, which allowed the processor to begin the next instruction before finishing the previous one. It had a virtual real mode that allowed more than one real-mode program to run at a time, a feature used by multitasking operating systems. This chip also had a system management mode (SMM), which could power down various hardware devices to decrease power use.

In 1989, Intel introduced the 80486 line of processors with 1.2 million transistors, a 1.0-micron die, and the same instruction and memory address sizes as the 386. This was the first microprocessor to have an integrated floating-point unit (FPU); previously, CPUs needed an external FPU, called a math coprocessor, to speed up floating-point operations. It also had 8 kilobytes of on-die cache, which stored predicted next instructions for pipelining, saving accesses to main memory, which is much slower than cache memory. Later 486 models could operate at greater speeds than the maximum system bus speed: the 486DX2/66 was clock-doubled from 33 MHz to 66 MHz, and the 486DX4/100 was clock-tripled from 33 MHz to 100 MHz.

In 1993, Intel released the Pentium processor with 3.21 million transistors and a 0.8-micron die. Clock speeds were available from 60 to 200 MHz, with a 60 MHz processor capable of 100 MIPS. It had the same 32-bit address space as the 386 and 486 but had an external data bus width of 64 bits and a superscalar architecture (able to process two instructions per clock cycle), which allowed it to process instructions and data about twice as fast as the 486. Internally, the chip was effectively two 32-bit pipelines chained together and sharing the workload. It had two separate 8 KB caches (one for data and one for instructions) and a pipelined FPU, which could perform floating-point operations much faster than the 486. Later versions of the chip supported symmetric dual processing, the ability to have two processors in the same system.

In 1995, the Pentium Pro was released with 5.5 million transistors, a 0.6-micron die and clock speeds of up to 200 MHz. It was a reduced instruction set computer (RISC) style processor. RISC processors have a smaller set of instructions than complex instruction set computer (CISC) processors. The first computers were of CISC design, intended to bridge the semantic gap between low-level machine code and high-level programming languages; this reduced the size of programs and the number of calls to main memory but did not necessarily improve system performance. The main idea of RISC is to build more complex operations from sequences of smaller, simpler instructions. Complex instructions incur greater time and space overhead during decoding, especially when microcode is used to decode macroinstructions, and in practice most programs rely heavily on a small subset of simple instructions. Limiting a computer to a smaller, optimized instruction set can therefore contribute to greater performance. The Pentium Pro could process three instructions per clock cycle and had decoupled decoding and execution, which allowed the processor to keep working on instructions in other pipelines if one pipeline stalled waiting for an event; the standard Pentium would stop all pipelines until the event occurred. It also had up to 1 MB of level-2 cache in the processor package, which was faster than having the cache on the motherboard.

In 1997, Intel released the Pentium MMX series of processors with 4.5 million transistors, clock speeds up to 233 MHz and a 0.35-micron die. The MMX added 57 new instructions that helped the CPU perform multimedia and gaming workloads 10 to 20 percent faster than processors without the MMX instruction set. The processor also had dual 16K level-1 caches, improved dynamic branch prediction, an additional instruction pipe and a pipelined FPU.

In 1997, Intel released the Pentium II, which had 27.4 million transistors and a 0.25-micron die. The Pentium II combined technology from both the Pentium Pro and the Pentium MMX: it had the Pro's dynamic branch prediction, the MMX instructions, dual 16K level-1 caches and 512K of level-2 cache. The level-2 cache ran at half speed and was not attached directly to the processor die, which yielded better performance than motherboard cache but not as much as full-speed, directly attached cache would have. The most notable change was the single edge contact (SEC) package design, called "Slot 1", which resembled a card more than a conventional processor. Initial chips had a 66 MHz bus speed, but later models had a 100 MHz bus. The bus speed is the maximum speed at which the processor accesses data in main memory.

In 1999, Intel released the Pentium III processor with 28 million transistors, a 0.18-micron die and a 450 MHz clock speed. This processor had 70 additional instructions extending the MMX set, called the SSE instruction set (also known as MMX2), which improved the performance of 3D graphics applications. Later versions of the Pentium III increased the bus speed to 133 MHz and moved the level-2 cache off the board and onto the CPU core. Though Intel halved that cache to 256K, performance still benefited.

In late 2000, Intel introduced the Pentium 4 with 42 million transistors, a 0.13-micron die and a new NetBurst architecture intended to support future increases in speed. NetBurst consists of Hyper Pipelined Technology, the Rapid Execution Engine, the Execution Trace Cache and a 400 MHz system bus. Hyper Pipelined Technology doubled the depth of the pipeline from 10 to 20 stages, which decreased the amount of work per stage and allowed the chip to handle more instructions in flight. A negative consequence of deepening the pipeline is that recovering from errors takes longer; a newer, more advanced branch predictor helped the chip hedge against this. The Rapid Execution Engine comprised two arithmetic logic units operating at double the processor's clock speed, which was necessary to feed the deeper pipeline. The Execution Trace Cache was a new kind of cache that held decoded instructions until they were ready for execution. The chip has less level-1 cache, 8K, to decrease latency. l

One of the ways Intel and other manufacturers have increased the speed and performance of CPUs is by decreasing die size, which lowers the voltage needed to run the processor and allows higher clock speeds. The functional part of a processor is actually a tiny chip, with less than a third of a square inch of area, inside the external package shown in the preceding paragraphs. The chips are thinner than a dime and contain tens of millions of electronic circuits and switches. They are constructed from semiconductor materials, such as gallium arsenide or, most commonly, silicon, which conduct electricity only under certain conditions. In the case of silicon, the material is grown into a large crystal and sliced by precision saws into sheets, called wafers, each of which can hold many individual chips. Layers of various materials treated with a photosensitive coating are built up on the surface of the wafer to form the foundation of the transistors and data pathways. A process called photolithography copies the circuitry onto the layered materials on the wafer using a separate mask for each layer: light is accurately focused through the masks, transferring each mask's image onto the wafer, where it causes a chemical reaction in the photosensitive material and fixes the circuitry. Another chemical washes away the excess material. After the photolithography process is complete, the wafer is cut into small rectangular chips. The chips are installed into the CPU package by soldering the appropriate contacts on the chip to the other circuitry and to the pins that form the interface with the computer's motherboard. li

In 1975, Bill Gates and Paul Allen developed a BASIC interpreter for the Altair 8800, the first programming language implementation for a microcomputer. In 1977, Microsoft, Gates and Allen's newly founded company, released Altair BASIC for use on the Altair 8800. In 1980, Microsoft acquired the nonexclusive rights to an operating system, called 86-DOS, developed by Tim Paterson of Seattle Computer Products; Microsoft paid $100,000 to contract the rights from SCP to sell 86-DOS to an unnamed client. In 1980, IBM chose the Microsoft product PC-DOS as the operating system for its new personal computer line.

The IBM PC became a mainstream corporate item when it was released in 1981. Microsoft bought all rights to 86-DOS in 1981 and renamed it MS-DOS. IBM's 5150 had a 4.77 MHz Intel 8088 CPU with 64K of RAM and 40K of ROM. It had a 5.25-inch, single-sided floppy drive and PC-DOS 1.0 installed, and it sold for $3,000. IBM's new PC had an open architecture that used off-the-shelf components. This was good for rapid, industry-standard development but bad for IBM, because other companies could obtain the same components and build their own machines. In 1982, Columbia Data Products released the first IBM PC compatible "clone", called the MPC, and Microsoft released an IBM-compatible version of its operating system, MS-DOS v1.25, which could support 360K double-sided floppy disks. The same year, Compaq introduced its first PC. The popularity of the PC caused sales to soar to 3,275,000 units in 1982, more than ten times as many as in 1981. The social impact of computers was so great that Time magazine put the personal computer on the cover of its January 1983 issue as "Machine of the Year", in place of its usual "Man of the Year". By 1990, more than 54 million computers were in use in the U.S. By 1996, approximately 66 percent of employees and 33 percent of homes had access to personal computers.

The initial MS-DOS offerings did not support hard disks. Version 2.0 in 1983 supported hard disks up to 10 MB and tree-structured file systems. Version 3.0 in 1984 supported 1.2 MB floppy disks and hard disks larger than 10 MB, and Version 3.1 added Microsoft network support. Version 4.0 in 1988 had graphical user interface support, a shell menu interface and support for hard disks larger than 32 MB. Version 5.0 in 1991 had a full-screen editor, undelete and unformat utilities, and task swapping. Version 6.0 in 1993 added the DoubleSpace disk compression utility and sold over a million copies in 40 days. Version 7.0 of MS-DOS was included with Windows 95 in 1995. lii

In 1985, Microsoft introduced Windows 1.0 with the promise of an easy-to-use graphical user interface, device-independent graphics and multitasking support. A limited set of available applications led to modest sales. Windows 2.0 was released in 1987 in two forms. One was for the 16-bit Intel 80286 microprocessor, called Windows/286; it added icons and overlapping windows with independently running applications. The other was for Intel's 32-bit line of 80386 microprocessors and had all the functionality of the Windows/286 system plus the ability to run multiple DOS applications simultaneously. Windows 2.0 sold much better because of the availability of software applications, including Excel, Word, Corel Draw!, Ami, Aldus PageMaker and Micrografx Designer.

In 1990, Microsoft released Windows 3.0 with a completely new interface and the ability to address memory beyond 640K without secondary memory manager utilities. Many independent software developers produced applications for this environment, boosting sales to over 10,000,000 copies.

In 1993, Microsoft released Windows NT 3.1 with an entirely new operating system kernel. This system was intended for high-end uses such as network servers, workstations and software development machines. Windows NT 4.0, an object-oriented operating system, followed in 1996. In 1995, Microsoft introduced Windows 95, a full 32-bit operating system with preemptive multitasking, multithreading, integrated networking and an advanced file system. Though it included DOS 7.0, the Windows 95 OS assumed full control of the system after booting. In 1998, Windows 98 was released with enhanced Web support (the Internet Explorer browser was integrated with the OS), FAT32 for very large hard disks, and multiple display support for up to eight video cards and monitors. It also had hardware support for DVD, FireWire, the universal serial bus (USB) and the accelerated graphics port (AGP). In 2000, Windows 2000 (formerly NT 5.0) was released; it included many of the features of Windows 98, including integrated Web support, and enhanced support for distributed file systems. It also supported Internet, intranet and extranet platforms, Active Directory, virtual private networks, file and directory encryption, and installation of the W2K OS from a server located on the LAN.

In 1976, Cray Research developed the Cray-1 supercomputer with a vector architecture; the first machine was installed at the Los Alamos National Laboratory. The $8.8 million machine could perform 160 million FLOPS (a world record at the time) and had an 8-megabyte (1 million word) main memory. The machine's hardware contained no wires longer than four feet and had a unique C-shape, which allowed its integrated circuits to be very close together. In 1982, Steve Chen and his research group built the Cray X-MP by making architectural changes to the Cray-1: it contained two Cray-1 compatible pipelined processors and a shared memory (essentially two Cray-1 machines linked together in parallel through a shared memory). This was the first use of shared-memory multiprocessing in vector supercomputing. The initial computational speedup of the two-processor X-MP over the Cray-1 was 300 percent, three times the computational speed from only doubling the number of processors, and it was capable of 500 megaflops. The machine became the world's most commercially successful parallel vector supercomputer. Chen commented that the X in X-MP stood for "extraordinary". The X-MP ran UNICOS, Cray's first UNIX-like operating system. In 1985, the Cray-2 reached one billion FLOPS and had the world's largest memory at 2,048 megabytes. In 1988, Cray produced the Y-MP, the first supercomputer to sustain over one billion FLOPS on many of its applications; it had multiple 333-megaflop processors that together could achieve 2.3 billion FLOPS. liii

In 1977, DEC introduced the 32-bit VAX-11/780 computer, which was used primarily for scientific and technical applications. The first machine was installed at Carnegie Mellon University, with other units installed at CERN in Switzerland and the Max Planck Institute in Germany. It could perform 1,000,000 instructions per second and was the first commercially available 32-bit machine. liv

In 1981, Motorola introduced one of the first 32-bit instruction microprocessors in its 68000 line of processors. The chip had 32-bit registers and a flat 32-bit address space, which could address a specific memory location directly instead of working through blocks of memory like the 8086. It had a 16-bit ALU but a 32-bit address adder for address arithmetic, along with eight general-purpose registers and eight address registers; the last address register served as a stack pointer, and there was a separate status register. It was initially designed as an embedded processor for household products but found its way into Amiga and Atari home computers and into arcade games as a controller. It was also used in Apple Macintosh, Sun Microsystems and Silicon Graphics machines. The architecture of this chip was very similar to the PDP-11 and VAX machines, which made it work well with programs written in the C language. The chip has also been used by auto manufacturers as a controller, as well as in medical hardware and computer printers, because of its low cost. Updated models of the processor are still used in personal digital assistants (PDAs) and in Texas Instruments TI-89, TI-92 and Voyage 200 calculators. In 1988, Motorola introduced the 88000 series processors, which were RISC-based, had a true Harvard architecture (separate instruction and data buses) and could perform 17 MIPS. lv

In 1985, Inmos introduced the transputer (transistor computer) with its concurrent parallel microprocessing architecture. A single transputer chip had all the circuitry necessary to work by itself, or chips could be wired together to form more powerful devices, from simple controllers to complex computers. Chips of varying power and complexity were available to serve a wide array of tasks: a low-power chip might be configured as a hard disk controller, while a few higher-powered chips might act as CPUs. These were the first general-purpose chips specifically designed for parallel computing.

It was realized in the early 1980s that conventional CPUs would reach a performance limit. Even though advances in technology had miniaturized processor circuitry, packing millions of transistors onto chips smaller than a fingernail, and had drastically increased computational speed, there was still an impenetrable barrier to conventional processor performance: the speed of light. Light in a vacuum travels at approximately 299,792,458 meters per second, or approximately one foot in a nanosecond. This is the upper limit for the speed at which signals can travel within electrical equipment, which suggests that the clock speed limit for processors is on the order of 10 GHz. We are almost halfway to this limit, and the speed of light is already a limiting factor in the design of CPUs. The best way to ensure continued progress in computational performance is parallel processing. lvi
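The 10 GHz figure can be recovered with a rough calculation. Assuming, for illustration, that a signal must cross a path of about three centimeters (roughly an inch) within one clock cycle and that it propagates at approximately the speed of light:

\[ t \approx \frac{d}{c} = \frac{0.03\ \text{m}}{3 \times 10^{8}\ \text{m/s}} = 10^{-10}\ \text{s} = 0.1\ \text{ns}, \qquad f_{\max} \approx \frac{1}{t} \approx 10\ \text{GHz}. \]

The path length here is an assumed figure; shorter paths raise the limit, which is one reason shrinking die sizes has allowed clock speeds to keep increasing.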

Parallel Processing

What is parallel processing?

Parallel processing is the concurrent execution of the same activity or task on multiple processors. The task is divided or specially prepared so that the work can be spread among many processors and yield the same result as if done on one processor, but in less time. There is a variety of parallel processing systems: a parallel processing system can be a single machine with many processors or many machines connected by a network. The most powerful machines in the world have hundreds or thousands of processors and hundreds of gigabytes of memory; these machines are called massively parallel processors (MPP). Many individual machines can also cooperate to perform the same task in a distributed network, and the combined power of many lower-performance computers can match or exceed that of a single high-performance computer when their aggregate computational resources are comparable. The computational power of MPPs has also been combined using the distributed system model to produce unprecedented performance.

Flynn's taxonomy classifies computing systems with respect to the two types of streams that flow into and out of a processor: instructions and data. These two types of streams can be treated conceptually as separate streams, even if they are delivered on the same wire. The classifications, based on the number of streams of each type, are:

Single instruction stream/single data stream (SISD) systems have a single instruction processing unit and a single data processing unit. These are conventional single-processor computers, also known as sequential computers or scalar processors.

Single instruction stream/multiple data streams (SIMD) systems have a single instruction processing unit, or controller, and multiple data processing units. The instruction unit fetches and executes instructions until a data or arithmetic operation is reached. It then sends this instruction to all of the data processing units, which each perform the same task on different pieces of data until all data is processed. The data processing units are either idle or all performing the same operation; they cannot perform different tasks simultaneously. Each data processor has a dedicated memory storage area and is directed by the instruction processor to store and retrieve data to and from that memory. The advantage of this design is the decrease in the amount of logic on the data processors: approximately 20 to 50 percent of a single processor's logic is dedicated to control operations, with the rest shared by register, cache, arithmetic and data operations. The data processors have little or no control logic, which allows them to perform arithmetic and data operations much more rapidly. A vector or array processing machine is an example of an SIMD machine that distributes data across all of its memories (for instance, storing each cell of an array, or each column of a matrix, in a different memory area). These machines are designed to execute arithmetic and data operations on a large number of data elements very quickly; a vector machine can perform such an operation in constant time if the length of the vectors (arrays) does not exceed the number of data processors. Most supercomputers used for scientific computing in the 1980s and 1990s were based on this architecture.
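As a minimal illustration of the kind of operation such a machine accelerates, the C sketch below applies one arithmetic operation to every element of two arrays. On an SISD machine the loop body runs once per element in sequence; on a SIMD or vector machine the same addition is applied to all elements in lockstep, one element per data processing unit. The names and the array size here are illustrative only, not taken from any particular system.

/* vector_add.c: element-wise addition, the canonical SIMD/vector operation. */
#include <stdio.h>

#define N 8   /* assume N does not exceed the number of data processing units */

/* On a vector machine this whole loop corresponds to a single vector instruction. */
void vector_add(const double a[], const double b[], double c[], int n)
{
    int i;
    for (i = 0; i < n; i++)
        c[i] = a[i] + b[i];   /* the same operation applied to every data element */
}

int main(void)
{
    double a[N], b[N], c[N];
    int i;

    for (i = 0; i < N; i++) {
        a[i] = i;
        b[i] = 2.0 * i;
    }
    vector_add(a, b, c, N);
    for (i = 0; i < N; i++)
        printf("c[%d] = %.1f\n", i, c[i]);
    return 0;
}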

Multiple instruction streams/single data stream (MISD) systems have multiple instruction processors and a single data processor. Few of these machines have been produced, and none has had commercial success.

Multiple instruction streams/multiple data streams (MIMD) systems have multiple instruction processors and multiple data processors. There is a diverse variety of MIMD systems, from those constructed of inexpensive off-the-shelf components to much more expensive interconnected vector processors, and many other configurations. Computers on a network that cooperate simultaneously to complete a single task form an MIMD system; computers that have two or more independent processors are another example. A machine with multiple independent processors has the ability to perform more than one task simultaneously. lvii

There are three types of performance gain obtainable from a parallel processing solution using n processors (a worked formulation follows this list):

• Sub-linear speedup, where the increase in speed is less than n (e.g., five processors yield only a 3x speedup)

• Linear speedup, where the increase is equal to n (e.g., five processors yield a 5x speedup)

• Super-linear speedup, where the increase is greater than n (e.g., five processors yield a 7x speedup)
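These categories are commonly expressed as a speedup ratio and an efficiency. Writing \(T_1\) for the run time of the sequential solution and \(T_n\) for the run time on n processors, the usual definitions are:

\[ S(n) = \frac{T_1}{T_n}, \qquad E(n) = \frac{S(n)}{n}, \]

so sub-linear, linear and super-linear speedup correspond to \(S(n) < n\), \(S(n) = n\) and \(S(n) > n\) respectively (efficiency below, at, or above 1). For example, a job that takes 100 seconds on one processor and 25 seconds on five processors has \(S(5) = 4\) and \(E(5) = 0.8\), a sub-linear speedup.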

Generally, linear or faster speedup is very hard to achieve because of the sequential nature of most algorithms; parallel algorithms must be designed specifically to take advantage of parallel hardware. Parallel systems may have one shared memory area to which all processors have access. In shared memory systems, care must be taken to design parallel algorithms that ensure mutual exclusion, which protects data from being corrupted when operated on by more than one processor. The results of parallel operations should be determinate, meaning they should be the same as if produced by a sequential algorithm. As an example, suppose two processors operate on the same variable x in memory such that:

• Processor 1 reads: x

• Processor 2 reads: x

• Processor 1 writes: x = x + 1

• Processor 2 writes: x = x - 1

Depending on the ordering of the reads and writes, the resulting value could be x - 1, x + 1 or x. This is a race condition, and it is extremely undesirable because the result depends on chance. Synchronization primitives, such as semaphores and monitors, aid in resolving conflicts due to race conditions. The shared memory may be in a single machine that has more than one processor, or it may be a distributed shared memory, where individual computers access the same memory area(s) located on other computers on the network.
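The sketch below reproduces the x = x + 1 / x = x - 1 conflict from the example above on a shared-memory machine, using the standard POSIX threads API rather than any Synergy facility; it is a minimal illustration, not code from this manual's own system. The mutex makes each read-modify-write atomic, so the two updates cannot interleave and the final value of x is determinate.

/* race_fix.c: two threads update one shared variable.
 * Without the mutex, the reads and writes could interleave as described
 * above (a race condition); with it, x always ends at its starting value.
 * Build on most UNIX systems with: cc race_fix.c -o race_fix -lpthread
 */
#include <stdio.h>
#include <pthread.h>

static int x = 0;                                      /* shared variable */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *increment(void *arg)                      /* "Processor 1" */
{
    pthread_mutex_lock(&lock);                         /* enter critical section */
    x = x + 1;
    pthread_mutex_unlock(&lock);                       /* leave critical section */
    return NULL;
}

static void *decrement(void *arg)                      /* "Processor 2" */
{
    pthread_mutex_lock(&lock);
    x = x - 1;
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;

    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, decrement, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    printf("x = %d\n", x);                             /* always prints x = 0 */
    return 0;
}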

Parallel processors must use some means of communication. In a single computer with multiple processors, this is done over the system bus and through shared memory. When multiple machines are involved, communication is implemented over a network using either message passing or a distributed shared memory.

Cost is a very important consideration in distributed computing: a parallel system with n processors is cheaper to build than a single processor that is n times faster. For tasks that need to be completed quickly and can be divided among multiple threads of execution with little interdependence, parallel processing is an excellent solution, and many high-performance and supercomputing machines have parallel processing architectures. The parallel implementations discussed in the remainder of this book are based on distributed computing, as opposed to single machines with multiple processors.

Existing Tools for Parallel Processing

The parallel programming systems discussed here, PVM, MPI and Linda, are implemented as libraries of function calls that are coded directly into C or Fortran source code and compiled. They use two primary types of communication: message passing (PVM and MPI) and tuple space (Linda and Synergy). In message passing, a participating process may send messages directly to any other process, which is somewhat similar to inter-process communication (IPC) in the Linux/UNIX operating system; in fact, both message passing and tuple space systems are implemented with sockets in the Linux/UNIX environment. A tuple space is a type of distributed shared memory that participating processes use to hold messages; these messages can be posted or retrieved by any of the participants. All of these systems work with "master" and "worker" designations: the master is generally responsible for breaking the task into pieces and assembling the results, and the workers are responsible for completing their pieces of the task. These systems communicate over computer networks and typically rely on some type of middleware to facilitate cooperation between machines, such as the cluster arrangement discussed below.

Computer Clusters

Computer clusters, sometimes referred to as server farms, are groups of connected computers that form a parallel computer by working together to complete tasks. Clusters were originally developed in the 1980s by Digital Equipment Corporation (DEC) to facilitate parallel computing and file and peripheral device sharing. An example of a cluster would be a Linux network with middleware software to implement the parallelism. Well-established cluster systems have procedures to eliminate single points of failure, providing some level of fault tolerance. The four major types of clusters are:

• Director-based clusters: one machine directs or controls the behavior of the cluster, usually to enhance performance

• Two-node clusters: two nodes perform the same part of the task, or one serves as a backup in case the other fails, to ensure fault tolerance

• Multi-node clusters: may have tens of clustered machines, usually on the same network

• Massively parallel clusters: may have hundreds or thousands of machines on many networks

Currently, the fastest supercomputing cluster is the Earth Simulator at 35.86 TFLOPS, which is 15 TFLOPS faster than the second-place machine. The main reason for cluster-based supercomputing, after performance, is cost efficiency. The third fastest supercomputing cluster is the 17.6 TFLOPS System X at Virginia Tech. It consists of 1100 dual-processor Apple Power Macintosh G5s running Mac OS X, and it cost a mere $5.2 million, roughly 10 percent of the cost of much slower mainframe supercomputers.

The Parallel Virtual Machine (PVM)

The Parallel Virtual Machine (PVM), a software tool for building a system of networked parallel computers, was originally developed at Oak Ridge National Laboratory (ORNL) in 1989 by Vaidy Sunderam and Al Geist. Version 1 was a prototype used only internally for research. PVM was later rewritten by the University of Tennessee and released as Version 2 in 1991, which was used primarily for scientific applications. PVM Version 3, completed in 1993, supported fault tolerance and provided better portability. The system supports the C, C++ and Fortran programming languages.

PVM allows a heterogeneous network of machines to function as a single distributed parallel processor. The system uses the message-passing model to share tasks between machines. Programmers use PVM's message passing to exploit the computational power of many computers, possibly of various types, in a distributed system, making them appear to be one virtual machine. PVM's API is a collection of functions that facilitate parallel programming by message passing. To spawn workers, the pvm_spawn() function is called:

int status = pvm_spawn(char* task, char** argv, int flag, char* where, int ntask, int* tid);

where status is an integer holding the number of tasks successfully spawned, task is the name of the executable to start, argv holds the arguments for the task program, flag is an integer that specifies PVM options, where is the identifier of a host or system on which to start the process, ntask is an integer holding the number of task processes to start, and tid is an array to hold the task process IDs. To terminate another task process, use the pvm_kill() function:

int status = pvm_kill(int tid);

where status contains information about the operation and tid is the number of the task process to kill. To end the calling task, use the pvm_exit() function:

int status = pvm_exit();

where status contains information about the operation. To obtain the task process ID of the calling process, use the pvm_mytid() function:

int myid = pvm_mytid();

where myid is an integer holding the calling process's task ID. To obtain the task process ID of the calling process's parent, use the pvm_parent() function:

int pid = pvm_parent();

where pid is an integer holding the parent's task process ID. Before sending a message, the send buffer must be initialized by calling the pvm_initsend() function:

int bufid = pvm_initsend(int encoding);

where bufid is the buffer's ID number and encoding is the method used to pack the message. To pack a string message into the buffer, use the pvm_pkstr() function:

int status = pvm_pkstr(char* msg);

where status contains information about the operation and msg is a null-terminated string; this function packs the array msg into the buffer. There are other functions to pack arrays of other data types into the buffer; for a complete listing, see the PVM User's Guide listed in the references. To send a message, use the pvm_send() function:

int status = pvm_send(int tid, int msgtag);

where status contains information about the operation, tid is the task process number of the recipient, and msgtag is the message identifier. To receive a message, use the pvm_recv() function:

int bufid = pvm_recv(int tid, int msgtag);

where bufid is the buffer's ID number, tid is the task process number of the sender, and msgtag is the message identifier. This is a blocking receive. Passing "-1" as the tid value is a wildcard and will accept messages from any task process. To unpack a buffer, use the pvm_upkstr() function:

int status = pvm_upkstr(char* msg);

where status contains information about the operation and msg is a string in which to store the message. To compile and run a PVM application, type:

[c615111@owin ~/pvm ]>aimk master worker
[c615111@owin ~/pvm ]>master

The aimk command compiles the application, and running the name of the master executable starts the application. An example PVM "Hello worker / Hello master" application is shown below; it demonstrates the structure of a basic PVM program. The master program is:

// master.c: "Hello worker" program
#include <stdio.h>
#include <unistd.h>
#include "pvm3.h"

#define NUM_WKRS 3

int main()
{
    int status;             // Status of operation
    int tid[NUM_WKRS];      // Array of task IDs; all must be unique in the system
    int msgtag;             // Message tag to identify a message
    int flag = 0;           // Used to specify options for pvm_spawn
    int i;                  // Loop counter
    int bufid;              // ID of a received message buffer
    int bytes, mtag, src;   // Details of a received message (filled by pvm_bufinfo)
    char buf[100];          // Message string buffer
    char* wkr_arg0 = NULL;  // NULL argument to terminate the workers' argument list
    char** wkr_args;        // Array of args used to start the workers
    char host[128];         // Host machine name

    // Point wkr_args at the single NULL argument
    wkr_args = &wkr_arg0;

    // Get host machine name
    gethostname(host, sizeof(host));

    // Get my task ID and print ID and host name to screen
    printf("Master: ID is %x, name is %s\n", pvm_mytid(), host);

    // Spawn a program executable named "worker"
    // Returns the number of workers spawned on success, or <= 0 on error
    // The empty string (fourth arg) requests any machine;
    // putting a host name in this arg would request a specific machine
    status = pvm_spawn("worker", wkr_args, flag, "", NUM_WKRS, tid);

    // If the spawn was successful it returns NUM_WKRS,
    // since there are NUM_WKRS workers
    if (status == NUM_WKRS) {
        // Label the first message as 1
        msgtag = 1;
        // Put the message in the buffer
        sprintf(buf, "Hello worker from %s", host);
        // Initialize the send message operation
        pvm_initsend(PvmDataDefault);
        // Transfer the message to PVM storage
        pvm_pkstr(buf);
        // Send the message to all workers
        for (i = 0; i < NUM_WKRS; i++)
            pvm_send(tid[i], msgtag);
        // Report the messages sent to the workers
        printf("Master: Messages sent to %d workers\n", NUM_WKRS);
        // Get replies from the workers
        for (i = 0; i < NUM_WKRS; i++) {
            // Execute a blocking receive to wait for a reply from any (-1) worker
            bufid = pvm_recv(-1, msgtag);
            // Find out which worker sent the reply
            pvm_bufinfo(bufid, &bytes, &mtag, &src);
            // Unpack the received message into the buffer
            pvm_upkstr(buf);
            // Print the message
            printf("Master: From %x: %s\n", src, buf);
        }
        // Print the end message
        printf("Master: Application is finished\n");
    }
    // Else the spawn was not successful
    else
        printf("Cannot start worker program\n");

    // Exit the application
    pvm_exit();
    return 0;
}

The master program spawns a number of workers, sends the "Hello worker…" message and waits for the replies. As each reply is received, it is printed to the screen; the master then terminates. The worker program is:

// worker.c: "Hello master" program
#include <stdio.h>
#include <unistd.h>
#include "pvm3.h"

int main()
{
    int ptid;          // Parent's task ID
    int msgtag;        // Message tag to identify a message
    char buf[100];     // Message string buffer
    char host[128];    // Host machine name
    FILE* fd;          // File in which to write the master's message

    // Open the file in which to store the message
    fd = fopen("msg.txt", "a");

    // Get host machine name
    gethostname(host, sizeof(host));

    // Get the parent's task ID
    ptid = pvm_parent();

    // Label the first message as 1
    msgtag = 1;

    // Execute a blocking receive to wait for the message from the master
    pvm_recv(ptid, msgtag);
    // Unpack the received message into the buffer
    pvm_upkstr(buf);
    // Print the message to the file
    fprintf(fd, "Worker: From %x: %s\n", ptid, buf);

    // Put the reply message in the buffer
    sprintf(buf, "Hello master from %s", host);
    // Initialize the send message operation
    pvm_initsend(PvmDataDefault);
    // Transfer the message to PVM storage
    pvm_pkstr(buf);
    // Send the message to the master
    pvm_send(ptid, msgtag);

    // Close the file
    fclose(fd);
    // Exit the application
    pvm_exit();
    return 0;
}

The worker waits for the initial message from the master, writes the message to a file,<br />

sends a reply <strong>and</strong> terminates. The output on the master machine would resemble:<br />

[c615111@owin ~/pvm ]>master<br />

Master: ID is 0, name is owin<br />

Master: Messages sent to 3 workers<br />

Master: From 3: Hello master from saber<br />

Master: From 1: Hello master from sarlac<br />

Master: From 2: Hello master from owin<br />

Master: Application is finished<br />

All the workers output can be redirected to the master’s terminal by running the<br />

application in PVM’s console, which can be started by typing:<br />

[c615111@owin ~/pvm ]>pvm<br />


pvm>spawn -> master<br />

Typing “pvm” at the comm<strong>and</strong> prompt activates the console <strong>and</strong> typing “spawn -><br />

master” at the console prompt executes the application in console mode. The “->” causes<br />

all worker screen output to be printed on the master's terminal. At any point in time in a

parallel application any executing PVM task (worker) may:<br />

• Create or terminate other tasks<br />

• Add or remove computers from the parallel virtual machine<br />

• Have any of its processes communicate with any other task's processes
• Have any of its processes synchronize with any other task's processes

By proper use of PVM constructs <strong>and</strong> host language control-flow statements, any specific<br />

dependency <strong>and</strong> control structure may be employed under the PVM system. Because of<br />

its easy-to-use programming interface and its implementation of the virtual machine concept, PVM became popular in the high-performance scientific computing community. It is no longer under active development, but it made a significant contribution to modern distributed processing designs and implementations. lviii
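As a rough illustration of these capabilities, the sketch below (not one of the manual's examples; the host name "saber" and the error handling are hypothetical) adds a host to the running virtual machine, spawns an extra worker on it by name, and later removes the host again:

// dynamic.c: a minimal, illustrative sketch of dynamic PVM reconfiguration
#include <stdio.h>
#include "pvm3.h"

main(){
    char* hosts[] = { "saber" };   // Hypothetical host to add to the virtual machine
    int   infos[1];                // Per-host status codes filled in by PVM
    int   tid;                     // Task ID of the newly spawned worker

    // Add the host to the running parallel virtual machine
    if (pvm_addhosts(hosts, 1, infos) < 1)
        printf("Could not add host %s\n", hosts[0]);
    // Spawn one more worker, requesting the new host by name
    if (pvm_spawn("worker", (char**)0, PvmTaskHost, hosts[0], 1, &tid) == 1)
        printf("Spawned extra worker %x on %s\n", tid, hosts[0]);
    // ... exchange messages with the new worker as in the example above ...
    // Remove the host again when it is no longer needed
    pvm_delhosts(hosts, 1, infos);
    pvm_exit();
}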

Message Passing Interface (MPI/MPICH)<br />

The Message Passing Interface (MPI) is a communications protocol that was introduced<br />

in 1994. It is the product of a community effort to define the semantics <strong>and</strong> syntax for a<br />

core set of message passing libraries for use by a wide variety of users <strong>and</strong> that could be<br />

used on a wide variety of MPP systems. MPI is not a st<strong>and</strong>alone parallel system for<br />

distributed computing because it does not include facilities to manage processes,<br />

configure virtual machines or support input/output operations. It has become a st<strong>and</strong>ard<br />

for communication among machines running parallel programs on distributed memory<br />

systems. MPI is primarily a library of routines that can be invoked from programs<br />

written in the C, C++ or Fortran languages. Its differential advantages over older protocols are portability and performance. It is more portable because MPI has an implementation for almost every distributed system, and it is faster because it is optimized for

the specific hardware on which it is run. MPICH is the most commonly used<br />

implementation of MPI.<br />

The MPI API has hundreds of function calls to perform various operations within a<br />

parallel program. Many of these function calls are similar to IPC calls in the UNIX<br />

operating system. Some of the basic MPI functions will be briefly explained <strong>and</strong> used in<br />

an example program. Before any MPI operations can be used in a program the MPI<br />

interface must be initialized with the MPI_Init() function:<br />


MPI_Init(&argc, &argv);<br />

where argc is the number of arguments <strong>and</strong> argv is a vector of strings, both of which<br />

should be taken as comm<strong>and</strong> line arguments because the same program will be used for<br />

both the master <strong>and</strong> worker processes in the example application. After initialization, a<br />

program must determine its rank by calling MPI_Comm_rank(), designated by process<br />

number, to determine if it is the master or a worker process. The master will be process<br />

number 0. The function call is:<br />

MPI_Comm_rank(MPI_Comm comm, int* rank);<br />

where comm is a communicator <strong>and</strong> is defined in MPI’s libraries <strong>and</strong> rank is a reference<br />

pointer to an integer to hold this process’ rank. It may also be necessary for an<br />

application to determine the number of currently running processes. The<br />

MPI_Comm_size() function returns this number. The function call is:<br />

MPI_Comm_size(MPI_Comm comm, int* size);<br />

where comm is a communicator <strong>and</strong> is defined in MPI’s libraries <strong>and</strong> size is a reference<br />

pointer to an integer to hold the number processes. To send a message to another process<br />

the MPI_Send() function is used as such:<br />

MPI_Send(void* msg, strlen(msg)+1, MPI_Datatype type, int dest, int tag,<br />

MPI_Comm comm);<br />

where msg is a message buffer, strlen(msg)+1 sets the length of the message <strong>and</strong> its null<br />

terminal, type is the data type of the message as defined by MPI’s libraries, dest is an<br />

integer holding the process number of the destination, tag is an integer holding the<br />

message tag, <strong>and</strong> comm is a communicator <strong>and</strong> is defined in MPI’s libraries. This is a<br />

blocking send <strong>and</strong> will wait for the destination to receive the message before executing<br />

further instructions. To receive a message the MPI_Recv() function is used as such:<br />

MPI_Recv(void* msg, int size, MPI_Datatype type, int source, int tag, MPI_Comm<br />

comm, MPI_Status* status)<br />

where msg is a message buffer, size is an integer holding the actual size of the receiving

buffer, type is the data type of the message as defined by MPI’s libraries, source is an<br />

integer holding the process number of the source, tag is an integer holding the message<br />

tag, comm is a communicator <strong>and</strong> is defined in MPI’s libraries, <strong>and</strong> status is the data<br />

about the receive operation. To end an MPI application session the MPI_Finalize()<br />

function is called:<br />


MPI_Finalize();<br />

which disables the MPI interface. To compile <strong>and</strong> run an MPI application type:<br />

[c615111@owin ~/mpi ]>mpicc -o hello hello.c<br />

[c615111@owin ~/mpi ]>mpirun –np 4 hello<br />

The mpirun command activates an MPI application named "hello" with 4 processes (1 master and 3 workers). The mpicc command is actually not a proprietary compiler; it is a definition that is equivalent to a call to the cc compiler with the following arguments to

access the proper libraries:<br />

[c615111@owin ~/mpi ]>cc -o hello hello.c -I/usr/local/mpi/include\<br />

-L/usr/local/mpi/lib -lmpi<br />

An example of an MPI application is:<br />

// hello.c program
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include "mpi.h"

main(int argc, char** argv){
    int my_rank;          // Rank of process
    int p;                // Number of processes
    int source;           // Rank of sender in loops
    int dest;             // Rank of receiver
    int tag = 50;         // Tag for messages
    char buf[100];        // Storage buffer for the message
    char host[128];       // Host machine name
    MPI_Status status;    // Return status for receive
    FILE* fd;             // File in which to write master's message

    // Open file to store message
    fd = fopen("msg.txt", "a");
    // Get host machine name
    gethostname(host, sizeof(host));
    // Initialize MPI application session
    // No MPI functions may be used until this is called
    // This function may only be called once
    MPI_Init(&argc, &argv);
    // Get my rank
    // Master's rank will be '0'
    // Workers' ranks will be greater than '0'
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    // Get the number of running processes
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    // If my_rank != 0, I am a worker
    if (my_rank != 0){
        // Set source to '0' for master
        source = 0;
        // Receive message from master
        MPI_Recv(buf, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
        // Print the message to file
        fprintf(fd, "Worker: %s\n", buf);
        // Put reply in buffer
        sprintf(buf, "Hello master from %s number %d", host, my_rank);
        // Set destination to '0' for master
        dest = 0;
        // Send the reply to master
        // Use strlen(buf)+1 to include '\0'
        MPI_Send(buf, strlen(buf)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    }
    // Else my_rank == 0 and I am the master
    else{
        // Print rank and host name to screen
        printf("Master: ID rank %d, name is %s\n", my_rank, host);
        // Put message in buffer
        sprintf(buf, "Hello worker from %s number %d", host, my_rank);
        // Send the message to all workers
        for (dest = 1; dest < p; dest++){
            // Use strlen(buf)+1 to include '\0'
            MPI_Send(buf, strlen(buf)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
            printf("Master: Sent: %s to %d\n", buf, dest);
        }
        // Get replies from all workers, in whatever order they finish
        for (source = 1; source < p; source++){
            MPI_Recv(buf, 100, MPI_CHAR, MPI_ANY_SOURCE, tag, MPI_COMM_WORLD, &status);
            printf("Master: Received: %s\n", buf);
        }
        // Print end message
        printf("Master: Application is finished\n");
    }
    // Close file
    fclose(fd);
    // End MPI application session
    // No MPI functions may be called after this function is called
    MPI_Finalize();
}

The screen output on the master machine would resemble:<br />

Master: ID rank 0, name is owin<br />

Master: Sent: Hello worker from owin number 0 to 1<br />

Master: Sent: Hello worker from owin number 0 to 2<br />

Master: Sent: Hello worker from owin number 0 to 3<br />

Master: Received: Hello master from saber number 3<br />

Master: Received: Hello master from owin number 1<br />

Master: Received: Hello master from sarlac number 2<br />

Master: Application is finished<br />

Linda<br />

Linda is an environment <strong>and</strong> coordination language for parallel processing that was<br />

initially developed as a research project <strong>and</strong> a commercial product at Yale University by<br />

David Gelernter <strong>and</strong> Nicolas Carriero. Linda’s design is based on a compromise between<br />

message passing <strong>and</strong> shared memory within a distributed parallel processing system.<br />

This system introduced the concept of a tuple space, which is a distributed shared<br />

memory area in which machines can communicate by reading, taking or putting tuples.<br />

A single tuple space is created when the master program is executed. Tuples are similar<br />

to a vector data type but do not have specified primitive or structured data types<br />

contained within them. This allows any data to be stored in a binary format within the<br />

tuple space. Any combination of mixed data types can be placed not only into a tuple<br />

space but also in individual tuples within the space. Linda tuples may have a maximum<br />

of 16 fields, which are separated by commas. Entries in the tuple space are identified by<br />

names or numerical values in the tuple’s data rather than as an address in local machines.<br />

An example of a tuple space entry with 3 fields is:<br />

(“string”, 123, 45.678);<br />

which contains a character string, an integer <strong>and</strong> a floating point number, respectively.<br />

There are two kinds of tuples in Linda: active tuples, also called live or process tuples,<br />

are tuples that are under active evaluation, <strong>and</strong> passive tuples, also called data tuples, are<br />


entries in the tuple space similar to the example above. Active tuples are created with the<br />

eval() function. The function call:<br />

eval(“worker”, worker());<br />

would create a tuple entry with “worker” in the first field <strong>and</strong> spawn a new process that<br />

will immediately call the worker() function. Passive tuples are created <strong>and</strong> added to the<br />

tuple space with the Linda’s out() function. The function call:<br />

out(“string”, 123, 45.678);<br />

would create the tuple <strong>and</strong> add it to the tuple space.<br />

Data can be either read or removed from the tuple space. A template is used to retrieve a<br />

tuple from the tuple space by matching a pattern against the tuple's fields. The

following conditions must be met to match a template to a tuple:<br />

1. The template <strong>and</strong> tuple both must have the same number of fields.<br />

2. The template <strong>and</strong> tuple both must have the same types, values, <strong>and</strong> length of all<br />

literal values in corresponding fields.<br />

3. The template <strong>and</strong> tuple both must have matching types <strong>and</strong> lengths of all formals<br />

in the corresponding fields.<br />

A read operation, using the rd() function, leaves the tuple for other processes to access.<br />

The function call:<br />

rd(“string”, 123, ? A);<br />

reads a three entry tuple that has “string” as its first element <strong>and</strong> 123 as its second. The<br />

data in the third element is placed in the A variable. The in() function gets <strong>and</strong> removes<br />

an entry from the tuple space. The function call:<br />

in(“string”, 123, ? A);<br />

gets a three entry tuple that has “string” as its first element <strong>and</strong> 123 as its second. The<br />

data in the third element is placed in the A variable <strong>and</strong> the entry is removed from the<br />

tuple space.<br />

Programming for a tuple space is similar to programming for shared memory because all<br />

participating processes share it. However it is also similar to message passing because<br />

entries are posted <strong>and</strong> taken from it. The major benefit of this system is that participants<br />

can enter and leave the system without formally announcing an arrival or departure.


They can also take messages, data or tasks from the tuple space at their own pace, which<br />

can balance the workload, giving more work to machines capable of greater performance,<br />

<strong>and</strong> decrease the overall duration of a given task. Tuple spaces <strong>and</strong> load balancing will<br />

be discussed further in later sections.<br />

It should also be noted that Linda tuple spaces do not observe a first in first out (FIFO)<br />

structure. Reading or retrieving an entry may not necessarily obtain the oldest entry,<br />

which may cause programming errors if this structure is assumed. Linda parallel<br />

programs are written with both the master <strong>and</strong> worker programs in the same source file.<br />

The master function is the main function <strong>and</strong> the worker is a named function. Linda has<br />

its own built-in compiler to compile the executable. To compile and execute a distributed

network application type:<br />

[c615111@owin ~/linda ]>clc -o hello hello.cl<br />

[c615111@owin ~/linda ]>ntsnet hello<br />

The clc comm<strong>and</strong> activates Linda’s compiler <strong>and</strong> the ntsnet comm<strong>and</strong> executes the hello<br />

program as a network application. An example of a Linda master or main function for<br />

the “Hello worker—Hello Master” application is:<br />

// hello.cl program<br />

#define NUM_WKRS 3<br />

real_main(int argc, char* argv[]){

int i;          // Loop counter
int worker();   // Worker function declaration

char buf[100]; // Message string buffer<br />

char host[128]; // Host machine name<br />

// Get host machine name<br />

gethostname(host, sizeof(host));<br />

// Print master’s name<br />

printf("Master: Name is %s\n", host);<br />

// Put message in buffer<br />

sprintf(buf, "Hello workers from %s", host);<br />

// Put the message in the tuple space<br />

out("message", buf);<br />

// Start the workers<br />

for (i=0; i< NUM_WKRS; i++)<br />

// Start an active tuple (a worker process)<br />

eval("worker", worker(i));<br />

// Get all workers’ reply from tuple space<br />

for (i=0; i< NUM_WKRS; i++){
    // Get reply and remove from tuple space
    in("reply", ? buf);
    // Print reply to screen
    printf("Master: %s\n", buf);
}
// Print end message to screen
printf("Master: Application is finished\n");
// End the master
return(0);
}

An example of a worker function is:<br />

// The worker function<br />

worker(int i){<br />


char buf[100]; // Message string buffer<br />

char host[128]; // Host machine name<br />

// Get host machine name<br />

gethostname(host, sizeof(host));<br />

// Read the message from tuple space<br />

rd(“message”, ? buf);<br />

// Print the message to screen<br />

printf("Worker: %s number %d got %s\n", host, i, buf);<br />

// Put message in buffer<br />

sprintf(buf, "Hello master from %s number %d", host, i);<br />

// Put reply in tuple space<br />

out("reply", buf);<br />

// Print end message to screen<br />

printf("Worker: %s finished\n", host);
// End the worker
return(0);
}

Linda prints both the master <strong>and</strong> workers’ output to the master’s screen. The screen<br />

output on the master machine would resemble:<br />

[c615111@owin ~/fpc01 ]>ntsnet hello<br />

Master: Name is owin<br />


Worker: saber number 1 got Hello workers from owin<br />

Worker: owin number 0 got Hello workers from owin<br />

Worker: owin finished<br />

Worker: sarlac number 2 got Hello workers from owin<br />

Master: Hello master from sarlac number 2<br />

Worker: saber finished<br />

Worker: sarlac finished<br />

Master: Hello master from saber number 1<br />

Master: Hello master from owin number 0<br />

Master: Application is finished<br />

It should also be noted that global variables in Linda applications are not transferred to<br />

workers. Using global variables will have unpredictable results. lix<br />


Parallel Programming Concepts<br />

Stateless Parallel Processing (SPP)<br />

The Stateless Parallel Processing architecture is comprised of “fully configured<br />

computers” connected by a “multiple redundant switching network” that form a<br />

“unidirectional virtual ring network”, as shown below. Multiple direct paths are provided<br />

from each node to every other node. Redundancy allows for scalable performance <strong>and</strong><br />

fault tolerance.<br />

[Figure: fully configured computers connected by a multiple redundant switching network that forms a unidirectional virtual ring network]
The Stateless Parallel Processing Architecture

Please note that the unidirectional “virtual” network is implemented through the multiple<br />

redundant switching network’s hardware <strong>and</strong> is not an actual physical ring. Each<br />

computer might have only one network interface adapter card. Each node on the virtual<br />

ring is aware of every other node because each maintains a current list of all participating<br />

nodes. Each node can also detect <strong>and</strong> isolate faulty nodes. The SPP virtual ring’s<br />

responsibility is limited to tuple queries <strong>and</strong> SPP backbone management. Tuple data is<br />

transmitted directly from point to point. This ring also provides full b<strong>and</strong>width support<br />

for multicast communication through the network, where all nodes can access multicast<br />


messages. The diagram below shows a conceptual representation of a unidirectional<br />

virtual ring, where the arrows may represent a single multicast message that all

nodes can acquire. The multiple switch network can transport a massive amount of data<br />

between machines.<br />

[Figure: nodes P1 through P8 connected in a unidirectional ring]
The Unidirectional Virtual Ring Configuration

The tuple space model allows participating processes to acquire messages from a current tuple space without temporal restrictions. Processes can take messages when they are ready without causing a work stoppage, unlike communication methods that use a blocking send. In this design, tuples flow freely through the network from process to process. Each process will perform a part of the task by taking work data tuples from the tuple space at its own pace. The processes are purely data driven and will activate or continue processing only when they receive the required data. There are no explicit global

state controls in this “stateless” system, which ensures fault tolerance. If a process fails<br />

the system can recover because the data can be renewed in the tuple space <strong>and</strong> taken by<br />

another worker process.<br />

SPP applications use a parallel processing model called “scatter <strong>and</strong> gather”, involving<br />

master <strong>and</strong> worker processes. A master process is the application controller for the<br />

worker processes. In a single task, single pass application, it divides the task into n<br />

subtasks, places the work data tuples in a tuple space, collects the completed subtasks<br />


from a tuple space, <strong>and</strong> directs the workers to terminate when all of the results are<br />

received. The three diagrams below show possible contents during an application's execution.

[Figures: three tuple-space snapshots for the nodes Owin, Saber, Sarlac and Luke. Left: a problem tuple space holding message tuples and data tuples 1 through n. Center: a result tuple space holding result tuples 1 through n. Right: a problem tuple space holding message tuples and a termination tuple]

The left-most diagram shows a problem tuple space, where work data is stored, after messages to workers and work data tuples have been placed in it. The center shows a result tuple space, where the master will receive completed subtasks. The right-most diagram shows a problem tuple space with a termination tuple, also called a poison pill, which instructs the workers to terminate. Notice that the message tuples remain in the tuple space and that the data tuples are removed. This is because the messages were accessed by a read operation and the data tuples were accessed by a take operation. If the termination message is accessed by a take operation, it must be replaced so that the next worker can access it. This scenario assumes a parallel system that can create multiple tuple spaces, such as Synergy. If the system is limited to one, then it depends more heavily on name pattern matching of tuples.
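A minimal sketch of this scatter-and-gather pattern, written with the Linda operations introduced earlier, is shown below. The tuple names, worker count and the subtask arithmetic are made up purely for illustration:

// A sketch of scatter and gather with a poison pill, using C-Linda operations
#define N_TASKS     100   // Number of subtasks (illustrative)
#define NUM_WORKERS 3     // Number of worker processes (illustrative)

real_main(int argc, char* argv[]){          // The master
    int i; double result, sum = 0.0;
    int worker();
    for (i = 0; i < N_TASKS; i++)
        out("work", i);                     // Scatter: one work tuple per subtask
    for (i = 0; i < NUM_WORKERS; i++)
        eval("worker", worker());           // Start the workers as active tuples
    for (i = 0; i < N_TASKS; i++){
        in("result", ? result);             // Gather: collect each completed subtask
        sum += result;
    }
    out("work", -1);                        // Poison pill: tells the workers to stop
    return 0;
}

worker(){                                   // A worker
    int task; double result;
    for (;;){
        in("work", ? task);                 // Take the next work tuple at its own pace
        if (task < 0){                      // Saw the poison pill:
            out("work", task);              //   put it back for the next worker
            return 0;                       //   and terminate
        }
        result = (double)task * task;       // Perform the subtask (placeholder work)
        out("result", result);              // Post the result for the master to gather
    }
}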

The master program with its accompanying tuple spaces can reside on any participating<br />

node. The worker processes take work tuples from the tuple space that match a tuple<br />

query, put the results into the result tuple space, until all work is completed, <strong>and</strong><br />

terminate when they get the terminate message tuple from the master. The diagram<br />

below shows a possible master-worker configuration. It should be noted that the master<br />

machine generally has both a master process <strong>and</strong> a worker process. Otherwise a valuable<br />

system resource would be wasted because the master machine would be idle between<br />

receiving results.<br />


[Figure: a master node (M) and several worker nodes (W) attached to a multiple switching network; initial requests are multicast on the virtual ring network]
The SPP Architectural Support

Stateless Machine (SLM)<br />

A stateless machine (SLM) is a fully implemented stateless parallel processing system.<br />

An SLM should provide an API that offers a robust but easy-to-use interface to the

system’s functionality. It should have a fault tolerance facility to recover from dropped<br />

hosts <strong>and</strong> lost data. The network structure should offer high efficiency <strong>and</strong> high<br />

performance. The locations of processes should be transparent for all participating<br />

processes in the application, meaning that the system should h<strong>and</strong>le communication<br />

between machines <strong>and</strong> not be directly noticeable to running programs. The workload<br />

should be balanced between the participating processes, where each process is kept busy<br />

until all work is complete.<br />

Linda Tuple Spaces Revisited<br />


As previously mentioned, the tuple space was first defined in the Linda distributed<br />

parallel programming implementation as a method of multi-machine inter-process<br />

coordination. It’s easiest to think of a Linda tuple space as a buffer, a virtual bag or a<br />

public repository that cooperating processes from different computers can put tuples in,<br />

or read <strong>and</strong> get tuples from. It’s a type of distributed shared memory, where any process<br />

can access any tuple, regardless of its storage location. A tuple space is not a physical<br />

shared memory. It is a logical shared memory because processes have to access it<br />

through an intermediary or tuple h<strong>and</strong>ling process. The API only makes the tuple space<br />

appear to be physically shared memory. The computers, though physically dispersed,<br />

must be part of some distributed system. The machines can communicate with each other<br />

without really being aware that any of the other machines exist, other than the data passed<br />

through the tuple space. Heterogeneous data types can be stored in tuples <strong>and</strong> differently<br />

structured tuples can be placed in the tuple space. Hence, all of the following data types:<br />

char name[4] = {“Bob”};<br />

int number = 12;<br />

double fraction = 34.56;<br />

can be placed in the same tuple:<br />

(name, number, fraction)<br />

<strong>and</strong> all of the following tuples:<br />

(name, number, fraction)<br />

(102, 73, 36, 125, 67.5, 1000)<br />

(“Sally”, “123 Broad St”, “Philadelphia PA 19024”, “555-123-4567”)<br />

can be placed in the same tuple space.<br />


[Figure: a tuple space shared by the nodes Owin, Saber, Sarlac and Luke, containing the tuples ("Bob", 12, 34.56), (102, 73, 36, 125, 67.5, 1000) and ("Sally", "123 Broad St", "Philadelphia PA 19024", "555-123-4567")]

Tuples are placed in <strong>and</strong> retrieved from tuple spaces by function calls, previously<br />

described, that match a pattern from a template. A template is essentially a tuple that is<br />

used to express a pattern. The template:<br />

(? A, 12, ? B)<br />

where A is a string <strong>and</strong> B is a double, matches:<br />

(name, number, fraction) = (“Bob”, 12, 34.56)<br />

However, this template will not match the other tuples in the example above. The<br />

general rules for a Linda tuple were stated previously. This is called an associative<br />

memory because elements or tuples in the memory are accessed by associating them with a pattern in their content, as opposed to being referenced by a

memory address or physical location.<br />
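For instance, a fragment like the following (a hypothetical illustration using the Linda calls described above, not code from the manual) stores the first tuple and then retrieves it by pattern:

// Illustrative C-Linda fragment (hypothetical example)
example(){
    char   name[4]  = "Bob";
    int    number   = 12;
    double fraction = 34.56;
    double B;                      // Formal for the floating point field

    out(name, number, fraction);   // Places ("Bob", 12, 34.56) in the tuple space
    rd("Bob", 12, ? B);            // Reads the tuple by matching its first two fields; B becomes 34.56
    in("Bob", 12, ? B);            // The same match, but removes the tuple from the space
    return 0;
}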

Active tuples in Linda are based on the generative communication model, where<br />

dynamically spawned processes are turned into data upon completion of their task. The<br />

eval("worker", worker()) function will leave a tuple in the tuple space with two fields when the called worker function completes. A worker function such as:

worker(){
    // perform task
    return 0;
}

places a tuple with the name assigned by the process that spawned the worker function in the first field (in this case "worker") and the return value of the worker function in the second field. All tuples placed by the worker into the tuple space will be accessible by all

other processes even after the worker terminates. The tuple from the example above after<br />

the eval() function returns would be:<br />

(“worker”, 0)<br />

Since the concept was pioneered at Yale, many languages have been implemented using<br />

variants of Linda’s tuple space model, including LiPS, ActorSpaces, TSpace,<br />

PageSpaces, OpenSpaces, Jini/Javaspaces, <strong>Synergy</strong>, etc.<br />


Theory <strong>and</strong> Challenges of Parallel Programs <strong>and</strong> Performance<br />

Evaluation<br />

Basic Logic<br />

Logic is the study of the reasoning of arguments <strong>and</strong> is both a branch of mathematics <strong>and</strong><br />

a branch of philosophy. In the mathematical sense, it is the study of mathematical<br />

properties <strong>and</strong> relations, such as soundness <strong>and</strong> completeness of arguments. In the<br />

philosophical sense, logic is the study of the correctness of arguments. A logic is<br />

comprised of a formal language coupled with model-theoretic semantics and/or a

deductive system. The language allows the arguments to be stated, which is similar to<br />

the way we state our thoughts in written or spoken languages. The semantics provide a<br />

definition of possible truth-conditions for arguments <strong>and</strong> the deductive system provides<br />

inferences that are correct for the given language.<br />

This section introduces formal logics that can be used as methods to design program logic<br />

<strong>and</strong> prove that the logic is sound. Systems based on propositional logic have been<br />

produced to facilitate the design <strong>and</strong> proofs for sequential programs. However, these<br />

systems were inadequate for concurrent applications. Variations of temporal logic, which<br />

is based on modal logic, are used to evaluate the logic of concurrent programs.<br />

Propositional Logic<br />

Symbolic logic is divided into several parts of which propositional calculus is the most<br />

fundamental. A proposition, or statement, is any declarative sentence, which is either<br />

true or false. We refer to true (T) or false (F) as the truth-value of the statement.<br />

“1 + 1 = 2” is a true statement.<br />

“1 + 1 = 11” is a false statement.<br />

“Tomorrow will be a sunny day” is a proposition whose truth is yet to be determined.<br />

“The number 1” is not a proposition because it is not a sentence.<br />

Simple statements are those that represent a single idea or subject <strong>and</strong> contain no other<br />

statements within. Simple statements will be represented by the symbols: p, q, r <strong>and</strong> s. If<br />

p st<strong>and</strong>s for the proposition: “ice is cold”, we denote it as:<br />

p: “ice is cold”,<br />

which is read as:<br />


p is the statement “ice is cold”.<br />

The following is an example of a simple statement assertion <strong>and</strong> negation.<br />

p assertion p is true if p is true or p is false if p is false.<br />

¬p negation ¬p is false if p is true or ¬p is true if p is false.<br />

Then for the true statement: p: “ice is cold”, ¬p is the statement that “ice is not cold”,<br />

which is false.<br />

A compound statement is made up of two or more simple statements. The simple<br />

statements are known as components of the compound statement. These components<br />

may be made up of smaller components. Operators, or connectives, separate<br />

components. The sentential connectives are disjunction (∨, pronounced as OR), conjunction (∧, pronounced as AND), implication (→, pronounced as IF) and equivalence (↔, pronounced as IF AND ONLY IF). These are called sentential because they join

statements, or sentences, into compound sentences. They are binary operators because<br />

they operate on two components or statements. Equivalence statements (p↔q) are also<br />

called biconditionals, <strong>and</strong> implication statements (p→q) are also called conditionals. In<br />

the p → q conditional statement, the "if-clause" or first statement, p, is called the

antecedent <strong>and</strong> the "then-clause" or second statement, q, is called the consequent. The<br />

antecedent <strong>and</strong> consequent could be compounds in more complicated conditionals rather<br />

than the simple statements shown above. These terms are used for all the binary<br />

operators listed above. Negation (¬) is called a unary operator because it only operates<br />

on one component or statement. The following define the conditions under which<br />

components joined with connectives are true; otherwise they are false:<br />

p∨q disjunction either p is true, or q is true, or both are true<br />

p∧q conjunction both p <strong>and</strong> q are true<br />

p→q implication if p is true, then q is true<br />

p↔q equivalence p <strong>and</strong> q are either both true or both false<br />

The statements:<br />

p: “ice is cold”<br />

q: 1 + 1 = 2<br />

r: “water is dry”<br />

s: 1 + 1 = 11<br />

under conjunction:<br />


p∧q is true because “ice is cold” is true <strong>and</strong> “1 + 1 = 2” is true<br />

p∧r is false because “ice is cold” is true <strong>and</strong> “1 + 1 = 11” is false<br />

s∧q is false because “1 + 1 = 11” is false <strong>and</strong> “1 + 1 = 2” is true<br />

r∧s is false because “water is dry” is false <strong>and</strong> “1 + 1 = 11” is false<br />

All meaningful statements will have a truth-value. The truth-value of a statement<br />

designates the statement as true T or false F. The statement p is either absolutely true or<br />

absolutely false. If a compound statement’s truth-value can be determined in its entirety<br />

based solely on its components, the compound statement is said to be truth-functional. If<br />

a connective constructs compounds that are all truth-functional, the connective is said to<br />

be truth-functional. Using these conditions it is possible to build truth-functional<br />

compounds from other truth-functional compounds <strong>and</strong> connectives. As an example: if<br />

the truth-values of p <strong>and</strong> of q are known, then we could deduce the truth-value of the<br />

compound using the disjunction connective, p∨q. This establishes that the compound,<br />

p∨q, is a truth-functional compound <strong>and</strong> disjunction is a truth-functional connective. A<br />

truth table contains all possible truth-values for a given statement. The truth table for p<br />

is:<br />

because the simple statement p is either absolutely true or absolutely false. The<br />

following is the truth table of p <strong>and</strong> q for the five previously mentioned operators:<br />

p<br />

T<br />

F<br />

p q ¬p ¬q p∨q p∧q p→q p↔q<br />

T T F F T T T T<br />

T F F T T F F F<br />

F T T F T F T F<br />

F F T T F F T T<br />

Parentheses ( ) are used to group components into whole statements. The whole<br />

compound statement p∧q can be negated by grouping it with parentheses <strong>and</strong> negating<br />

the group ¬(p∧q). The table below shows all negated truth-values for the operators in the previous table.

p q ¬(¬p) ¬(¬q) ¬(p∨q) ¬(p∧q) ¬(p→q) ¬(p↔q)<br />

T T T T F F F F<br />

T F T F F T T T<br />

F T F T F T F T<br />

F F F F T T F F<br />


To avoid an excessive number of parentheses in statements, there is a st<strong>and</strong>ard for<br />

operator precedence. This simply means the order in which operations are performed.<br />

Negation has precedence over conjunction <strong>and</strong> conjunction has precedence over<br />

disjunction. The statement:<br />

¬p∨q is (¬p)∨q not ¬(p∨q)
and
¬p∨q∧r is (¬p)∨(q∧r)

A truth table will have 2ⁿ rows, where n is the number of distinct simple statements in the

whole statement. The first truth table for p had only two rows <strong>and</strong> the previous two had<br />

four rows. If p, q <strong>and</strong> r were under consideration, there would be eight rows. To find<br />

which values for p, q, <strong>and</strong> r will evaluate to true for P(p, q, r) = ¬(p∨q)∧(r∨p), construct a<br />

truth table for the statement. Start by placing true values in the top row <strong>and</strong> false values<br />

in the next from the bottom row for one instance of each unique simple statement as<br />

shown below. The last row is to maintain the steps performed by operator precedence<br />

<strong>and</strong> parentheses. Mark all simple statements step 1.<br />

¬ (p ∨ q) ∧ (r ∨ p)<br />

T T T<br />

F F F<br />

1 1 1 1<br />

Then assume all F’s are 0’s <strong>and</strong> all T’s are 1’s, <strong>and</strong> count up the table from 0 to 7 in<br />

binary. Then copy values to all other duplicate simple statements.<br />

¬ (p ∨ q) ∧ (r ∨ p)<br />

T T T T<br />

T T F T<br />

T F T T<br />

T F F T<br />

F T T F<br />

F T F F<br />

F F T F<br />

F F F F<br />


1 1 1 1<br />

This holds all combinations of F’s <strong>and</strong> T’s relative to the three simple statements.<br />

Remember the pattern in the columns and you won't have to count next time. Next mark the second set of columns to be evaluated by precedence and fill in the truth-values.

Because of the parentheses, the next columns will be the third <strong>and</strong> seventh.<br />

¬ (p ∨ q) ∧ (r ∨ p)<br />

T T T T T T<br />

T T T F T T<br />

T T F T T T<br />

T T F F T T<br />

F T T T T F<br />

F T T F F F<br />

F F F T T F<br />

F F F F F F<br />

1 2 1 1 2 1<br />

Negation has precedence over conjunction. Hence the first column is the negation of the<br />

third. To find the truth-values for conjunction, consider the highest values in the last row<br />

on each side, which is column one on the left <strong>and</strong> column seven on the right.<br />

¬ (p ∨ q) ∧ (r ∨ p)<br />

F T T T F T T T<br />

F T T T F F T T<br />

F T T F F T T T<br />

F T T F F F T T<br />

F F T T F T T F<br />

F F T T F F F F<br />

T F F F T T T F<br />

T F F F F F F F<br />

3 1 2 1 4 1 2 1<br />

The statement is only true for P(p, q, r) = P(F, F, T).<br />
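This result is easy to check mechanically. A small C program (not part of the manual) that enumerates all eight rows of the truth table for P(p, q, r) = ¬(p∨q)∧(r∨p) is:

#include <stdio.h>

int main(void){
    int p, q, r;
    // Enumerate the rows in the same order as the tables above: T before F
    for (p = 1; p >= 0; p--)
        for (q = 1; q >= 0; q--)
            for (r = 1; r >= 0; r--){
                int P = !(p || q) && (r || p);   // P(p, q, r) = ¬(p∨q)∧(r∨p)
                printf("%c %c %c : %c\n",
                       p ? 'T' : 'F', q ? 'T' : 'F', r ? 'T' : 'F',
                       P ? 'T' : 'F');
            }
    return 0;
}

Only the row p = F, q = F, r = T prints T, agreeing with the table.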

Again, with p, q and r under consideration, to find which values of p, q, and r evaluate to true for Q(p, q, r) = (p→q)∧[(r↔p)∨(¬p)], construct a truth table for the statement. Also note

that brackets [ ] <strong>and</strong> braces { } can be used to differentiate compound groupings up to<br />

three levels.<br />

(p → q) ∧ [(r ↔ p) ∨ (¬ p)]<br />

T T T T T T T T F T<br />

T T T F F F T F F T<br />

T F F F T T T T F T<br />


T F F F F F T F F T<br />

F T T T T F F T T F<br />

F T T T F T F T T F<br />

F T F T T F F T T F<br />

F T F T F T F T T F<br />

1 2 1 4 1 2 1 3 2 1<br />

There are three types of propositional statements that can be deduced from all truth-functional statements:

• If the truth-value column for the table has a mixture of T’s <strong>and</strong> F’s, the table’s<br />

statement is called a contingency.<br />

• If the truth-value column contains all T’s, the statement is called a tautology.<br />

• Lastly, if the truth-value column contains all F’s, the statement is called a<br />

contradiction.<br />

The following logical equivalences apply to any combination of statements used to create<br />

larger compound statements. The p's, q's and r's can be atomic statements or compound

statements.<br />

The Double Negative Law: ¬(¬p) ≡ p
The Commutative Law for conjunction: p∧q ≡ q∧p
The Commutative Law for disjunction: p∨q ≡ q∨p
The Associative Law for conjunction: (p∧q)∧r ≡ p∧(q∧r)
The Associative Law for disjunction: (p∨q)∨r ≡ p∨(q∨r)
DeMorgan's Law for conjunction: ¬(p∨q) ≡ (¬p)∧(¬q)
DeMorgan's Law for disjunction: ¬(p∧q) ≡ (¬p)∨(¬q)
The Distributive Law for conjunction: p∧(q∨r) ≡ (p∧q)∨(p∧r)
The Distributive Law for disjunction: p∨(q∧r) ≡ (p∨q)∧(p∨r)
Absorption Law for conjunction: p∧p ≡ p
Absorption Law for disjunction: p∨p ≡ p
Conditional using negation and disjunction: p→q ≡ (¬p)∨q
Equivalence using conditionals and conjunction: p↔q ≡ (p→q)∧(q→p)
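These equivalences can also be checked exhaustively. A short C program (not part of the manual) that verifies DeMorgan's laws for every combination of truth-values is:

#include <stdio.h>

int main(void){
    int p, q, ok = 1;
    for (p = 0; p <= 1; p++)
        for (q = 0; q <= 1; q++){
            // ¬(p∨q) ≡ (¬p)∧(¬q)  and  ¬(p∧q) ≡ (¬p)∨(¬q)
            ok = ok && (!(p || q) == (!p && !q))
                    && (!(p && q) == (!p || !q));
        }
    printf(ok ? "DeMorgan's laws hold\n" : "Counterexample found\n");
    return 0;
}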

Predicate Calculus<br />

Another part of symbolic logic is predicate calculus, which is built from propositional<br />

calculus. Predicate calculus allows logical arguments based on some or all variables<br />

under consideration. Consider the following arguments, which cannot be expressed in<br />

propositional logic:<br />

All dogs are mammals<br />


Fido is a dog<br />

Therefore, Fido is a mammal<br />

The three statements:<br />

p: All dogs are mammals<br />

q: Fido is a dog<br />

r: Fido is a mammal<br />

are of the form:<br />

p<br />

q<br />

∴ r<br />

can be independently evaluated under propositional logic but cannot be evaluated to<br />

derive the conclusion “r: Fido is a mammal” because “therefore” (‘∴’) is not a legitimate<br />

propositional logic operator. We need to exp<strong>and</strong> propositional calculus <strong>and</strong> set theory to<br />

make use of the predicate calculus.<br />

We use the universal quantifier ∀, which means for all or for every, to establish a<br />

symbolic statement that includes all of the things in a set X that we are considering as<br />

such:<br />

∀x[Px→Qx]<br />

The brackets define the scope of the quantifier. This example is read “For every variable<br />

x in set X, if Px then Qx". Applied to the example above, we could reword the statement "All dogs are mammals" by letting Px be "if x is a dog" and Qx be "then x is a mammal". We have:

“For all x, if x is a dog, then x is a mammal”.<br />

This is called a statement form <strong>and</strong> will become a statement when x is given a value. Let<br />

f = Fido. A syllogism is a predicate calculus argument with two premises sharing a<br />

common term.<br />

∀x[Px→Qx]<br />

Pf<br />

∴ Qf<br />


The predicate P means “is a dog” <strong>and</strong> Q means “is a mammal”. The conclusion states<br />

that because Fido is a dog, Fido is a mammal. If we negate the quantifier as such:<br />

¬∀x[Px→Qx]<br />

The statement becomes:<br />

“Not every dog is a mammal”.<br />

This sounds ridiculous, but the statement is permissible in predicate logic. Compare this with:
∀x[Px→¬Qx]
which translates to:
"No dog is a mammal".

Mathematical statements can be constructed using propositional calculus. The statement:
"If an integer is less than 10, then it is less than 11"
can be converted using the universal quantifier so that it is true for every integer x (x ∈ N) as such:
∀x ∈ N [(x < 10) → (x < 11)]
We also use the existential quantifier ∃, which means there exists or for some, to state that at least one member of a set satisfies a statement. Consider the statement: "Some lawyers speak the truth".

If we let Px be “x is a lawyer” <strong>and</strong> Qx be “x speaks the truth”, we have:<br />

∃x [Px ∧ Qx],<br />

which states that at least one lawyer speaks the truth. Quantifiers can be applied to more than one variable in a statement.

Let P be “is a shoe in my closet”, where x is a right shoe <strong>and</strong> y is a left shoe. Then:<br />

∀x, ∃y[Px ∧ Py],<br />

is a symbolic representation of the statement: “For every right shoe in my closet, there<br />

exists a left shoe”. A mathematical statement would be:<br />

∃z ∈ N [x = y×z], x ∈ N, y ∈ N,<br />

which states that there exists an integer z, such that integer x is divisible by integer y. lx<br />
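As a concrete illustration (not from the manual), the existential statement above can be checked mechanically by a brute-force search for a witness z:

#include <stdio.h>

// Returns 1 if there exists an integer z with x = y * z, i.e. x is divisible by y
int divisible(int x, int y){
    int z;
    for (z = 0; z <= x; z++)     // search the finite range of possible witnesses
        if (x == y * z)
            return 1;            // a witness z was found, so ∃z[x = y×z] holds
    return 0;                    // no witness exists
}

int main(void){
    printf("12 divisible by 3: %d\n", divisible(12, 3));   // prints 1
    printf("12 divisible by 5: %d\n", divisible(12, 5));   // prints 0
    return 0;
}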

Modal Logic<br />

Modal logic extends the capabilities of traditional logic to include modal expressions,<br />

which contain premises such as “it is necessary that…” or “it is possible that…”. Modal<br />

logic is the study of deductive behavior of expressions based on necessary <strong>and</strong>/or<br />

possible premises. Modal logic can also be defined as a family of related logical systems<br />

that include logics for belief <strong>and</strong> temporal related expressions. The table below contains<br />

some common symbols <strong>and</strong> definitions used in the modal logic family:<br />

Logic            Symbol   Expression Symbolized
Modal Logic      □        It is necessary that …
                 ◊        It is possible that …
Deontic Logic    O        It is obligatory that …
                 P        It is permitted that …
                 F        It is forbidden that …
Temporal Logic   G        It will always be the case that …
                 F        It will be the case that …
                 H        It has always been the case that …
                 P        It was the case that …
Doxastic Logic   Bx       x believes that …


A popular weak modal logic K, conceived by Saul Kripke, defines three operators: "negation" (¬), "if…then…" (→), and "it is necessary that…" (□). The other connectives, "and" (∧), "or" (∨), and "if and only if" (↔), can be defined by ¬ and → as in propositional logic. The operator "possibly" (◊) can be defined by ◊A = ¬□¬A. In addition to the standard rules in propositional logic, K has the following rules:
Necessitation Rule: If A is a theorem of K, then so is □A.
Distribution Axiom: □(A → B) → (□A → □B).

The necessitation rule states that all theorems are necessary <strong>and</strong> the distribution axiom<br />

states that “if it is necessary that if A then B, then if necessarily A then necessarily B”. A<br />

<strong>and</strong> B range over all possible formulas for the language.<br />

Some common axioms that can be added to K, and the reductions they allow, are:
(M) □A → A
(4) □A → □□A
(5) ◊A → □◊A
(B) A → □◊A
(S4): □□…□ = □ and ◊◊…◊ = ◊
(S5): 00…□ = □ and 00…◊ = ◊, where each 0 is either □ or ◊

Axiom Name   Axiom            Condition on Frames           R is...
(D)          □A → ◊A          ∃u wRu                        Serial
(M)          □A → A           wRw                           Reflexive
(4)          □A → □□A         (wRv ∧ vRu) → wRu             Transitive
(B)          A → □◊A          wRv → vRw                     Symmetric
(5)          ◊A → □◊A         (wRv ∧ wRu) → vRu             Euclidean
(CD)         ◊A → □A          (wRv ∧ wRu) → v = u           Unique
(□M)         □(□A → A)        wRv → vRv                     Shift Reflexive
(C4)         □□A → □A         wRv → ∃u(wRu ∧ uRv)           Dense
(C)          ◊□A → □◊A        (wRv ∧ wRx) → ∃u(vRu ∧ xRu)   Convergent


lxi<br />

Temporal Logic<br />

P "It has at some time been the case that …"
F "It will at some time be the case that …"
H "It has always been the case that …"
G "It will always be the case that …"

Pp ≡ ¬H¬p<br />

Fp ≡ ¬G¬p<br />

Gp→Fp<br />

"What will always be, will be"<br />

G(p→q)→(Gp→Gq) "If p will always imply q, then if p will always be the case, so will q"<br />

Fp→FFp<br />

"If it will be the case that p, it will be — in between — that it will be"<br />

¬Fp→F¬Fp "If it will never be that p then it will be that it will never be that p"<br />

p→HFp "What is, has always been going to be"
p→GPp "What is, will always have been"
H(p→q)→(Hp→Hq) "Whatever always follows from what always has been, always has been"
G(p→q)→(Gp→Gq) "Whatever always follows from what always will be, always will be"

RH: From a proof of p, derive a proof of Hp<br />

RG: From a proof of p, derive a proof of Gp<br />

F∃xp(x)→∃xFp(x) ("If there will be something that is p, then there is now something that will be p")


Spq "q has been true since a time when p was true"<br />

Upq "q will be true until a time when p is true"<br />

Pp ≡ Sp(p∨¬p)<br />

Fp ≡ Up(p∨¬p)<br />

Pp ≡ ∃n(n < 0 & Fnp)
Hp ≡ ∀n(n < 0 → Fnp)

Op ≡ Up(p&¬p)<br />

Fp ≡ Op ∨ OFp<br />

Pp is true at t if and only if p is true at some time t′ such that t′ < t
Gp→Fp ∀t∃t′(t < t′)

Petri Net<br />

Amdahl’s Law<br />

Gene Amdahl, a computer architect, entrepreneur, former IBM employee <strong>and</strong> one of the<br />

creators of the IBM System 360 architecture, devised this method in 1967 to determine<br />

the maximum expected improvement to a system when only part of it has been improved.<br />

He presented this as an argument against parallel processing. This law is similar to the<br />

law of diminishing returns, which states that as more input is applied, each additional input unit will produce less additional output. Amdahl's law states that a number of functions or operations must be executed sequentially, limiting the speedup gained when more processors are added. In other words, the number of tasks that must be

completed sequentially limits computational speedup. This causes a bottleneck in the<br />

workflow, slowing the overall task. However as the size of a task increases the effect of<br />

Amdahl’s law decreases. The speedup of a system is:<br />

speedup = unimproved_time / improved_time = performance_with_improvement / performance_without_improvement

If you make an improvement that greatly increases performance (maybe 100 times or<br />

more) in part of a computation but the overall improvement is only 25 percent, then the<br />

upper limit for speedup S is:<br />

S = unimproved_time / improved_time = 1.00 / (1.00 − 0.25) = 1.333

Note: The unimproved execution time is 1.00 = 100% because this example makes use of<br />

the ratio between the two times, not the actual values. Assume that an unimproved<br />

computation takes 4 seconds <strong>and</strong> the improved computation takes 3 seconds. The<br />

equation is:<br />

S = unimproved_time / improved_time = 4 sec / 3 sec = 1.333

If the improved computation is taken to be 100 percent performance, then by the<br />

relationship above the unimproved computation has 75 percent performance with respect<br />

to the improved.<br />


S = performance_with_improvement / performance_without_improvement = 100 / 75 = 1.333

If a computation is improved such that it affects a proportion F_p of the computation, and the improvement gives a speedup S_p on that portion, then the improved time for the computation will be equal to the unimproved time multiplied by the sum of the unaffected portion (1 − F_p) and the speedup-reduced affected portion (F_p ÷ S_p) of the task. To find the improved execution time we use:

improved_time = unimproved_time × [(1 − F_p) + F_p / S_p]

Continuing the formula above with an affected portion of 40 percent and a speedup of 2.66 times on this portion, we have:

improved_time = 4 × [(1 − 0.4) + 0.4 / 2.66] = 4 × (0.6 + 0.15) = 4 × 0.75 = 3

This method states, assuming that the value for the speed of the unimproved computation<br />

is 100 percent, the overall speedup for this computational improvement will be:<br />

S = unimproved_time / improved_time = 1 / [(1 − F_p) + F_p / S_p]

Then plugging in the example proportional values:<br />

$$S = \frac{1}{(1 - 0.4) + \dfrac{0.4}{2.66}} = \frac{1}{0.75} = 1.33$$

Using time values instead of proportions, we have:<br />

$$S = \frac{4\ \text{sec}}{(4\ \text{sec} - 1.6\ \text{sec}) + \dfrac{1.6\ \text{sec}}{2.66}} = \frac{4\ \text{sec}}{3\ \text{sec}} = 1.33$$


Amdahl’s law for parallelization considers the sequential fraction $F_s$ of a task that cannot be performed in parallel and the fraction $F_p = 1 - F_s$ that can. It gives the following formula for the maximum speedup with $N_p$ processors:<br />

$$S = \frac{1}{F_s + \dfrac{1 - F_s}{N_p}}$$

As $N_p$ approaches infinity, the maximal speedup approaches $1/F_s$. As the $(1 - F_s)/N_p$ term becomes very small, the price paid for each marginal performance increase grows. Assume that $F_s = 0.06$; then $F_p = 1 - F_s = 0.94$. For 4 processors:<br />

$$S = \frac{1}{0.06 + \dfrac{1 - 0.06}{4}} = \frac{1}{0.06 + \dfrac{0.94}{4}} = \frac{1}{0.06 + 0.235} = \frac{1}{0.295} = 3.3898$$

The table below shows the run time, speedup, efficiency and cost for $N_p$ = {1, 2, 4, …, 1024} processors, where $F_s = 0.06$ and $F_p = 0.94$. Notice that the speedup gained per additional processor becomes much smaller as $N_p$ increases, causing greater cost and lower efficiency. The graphs show the effect on speedup (y-axis) with respect to $F_s$ (x-axis) with increasing $N_p$.<br />

Processors(N p) 1 2 4 8 16 32 64 128 256 512 1024<br />

Run Time 1024.00 542.72 302.08 181.76 121.60 91.52 76.48 68.96 65.20 63.32 62.38<br />

Speedup 1.0000 1.8868 3.3898 5.6338 8.4211 11.1888 13.3891 14.8492 15.7055 16.1718 16.4155<br />

Efficiency 100.00% 94.34% 84.75% 70.42% 52.63% 34.97% 20.92% 11.60% 6.13% 3.16% 1.60%<br />

Cost 1.00 1.06 1.18 1.42 1.90 2.86 4.78 8.62 16.30 31.66 62.38<br />
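For readers who want to experiment with these numbers, the short Python fragment below (illustrative only; the function and variable names are ours, not part of Synergy) evaluates Amdahl’s formula and reproduces the table above, assuming a single-processor run time of 1024 time units.<br />

def amdahl_speedup(f_s, n_p):
    # Maximum speedup when a fraction f_s of the work must run serially.
    return 1.0 / (f_s + (1.0 - f_s) / n_p)

f_s = 0.06            # serial fraction used in the example above
base_time = 1024.0    # single-processor run time (arbitrary units)

print("Np    RunTime   Speedup  Efficiency   Cost")
for n_p in [2 ** k for k in range(11)]:        # 1, 2, 4, ..., 1024
    s = amdahl_speedup(f_s, n_p)
    run_time = base_time / s                   # parallel run time
    efficiency = s / n_p                       # useful fraction of each processor
    cost = run_time * n_p / base_time          # cost relative to the serial run
    print(f"{n_p:5d} {run_time:9.2f} {s:9.4f} {efficiency:10.2%} {cost:7.2f}")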

[Graphs: maximum speedup (y-axis) under Amdahl’s law, $1/(F_s + (1 - F_s)/N_p)$, plotted against the serial fraction $F_s$ from 0 to 0.06 (x-axis) for $N_p$ = 4, 16, 64, 256 and 1024. The curves reach approximately 3.39, 8.42, 13.39, 15.71 and 16.42 at $F_s = 0.06$.]<br />

The graphs use $N_p$ values of 4, 16, 64, 256 and 1024. Notice that as $N_p$ increases, the area under the curve decreases, meaning that the non-parallelizable part of the serial program has a greater effect and the degradation occurs faster as $N_p$ increases. Amdahl’s intention was to show “the continued validity of the single processor approach and of the weaknesses of the multiple processor approach”. His paper offered arguments in support of this position, such as:<br />

• “The nature of this overhead appears to be sequential so that it is unlikely to be<br />

amenable to parallel processing techniques.”<br />

• “A fairly obvious conclusion which can be drawn at this point is that the effort<br />

expended on achieving high parallel performance rates is wasted unless it is<br />

accompanied by achievements in sequential processing rates of very nearly the<br />

same magnitude.”<br />

Gustafson’s Law<br />

In 1988, John L. Gustafson argued that massively parallel processing is beneficial because Amdahl’s law assumes that the parallel part of the computation is fixed and independent of the number of processors [ lxiii ]. He proposed a formula for a scaled speedup based on the observation that in most real-world computations “the problem size scales with the number of processors”. His proposed formula is:<br />

$$S = \frac{\text{fraction\_serial} + (\text{fraction\_parallel} \times \text{number\_of\_processors})}{\text{fraction\_serial} + \text{fraction\_parallel}}, \qquad \text{fraction\_serial} + \text{fraction\_parallel} = 1$$

$$S = \frac{F_s + (1 - F_s) \times N_p}{F_s + (1 - F_s)} = \frac{F_s + (1 - F_s) \times N_p}{1} = F_s + N_p - N_p F_s = N_p + (F_s - N_p F_s) = N_p + (1 - N_p) \times F_s$$


where $S$ is the speedup, $F_s$ is the serial portion and $N_p$ is the number of processors. Again, assume that $F_s = 0.06$; then $F_p = 1 - F_s = 0.94$. For 4 processors:<br />

$$S = N_p + (1 - N_p) \times F_s = 4 + (1 - 4) \times 0.06 = 4 - 0.18 = 3.82$$

The table <strong>and</strong> graphs below show the same data as in Amdahl but using Gustafson’s law.<br />

Processors(N) 1 2 4 8 16 32 64 128 256 512 1024<br />

Run Time 1024.0000 527.8351 268.0628 135.0923 67.8146 33.9748 17.0043 8.5064 4.2543 2.1274 1.0638<br />

Speedup 1.0000 1.9400 3.8200 7.5800 15.1000 30.1400 60.2200 120.3800 240.7000 481.3400 962.6200<br />

Efficiency 100.00% 97.00% 95.50% 94.75% 94.38% 94.19% 94.09% 94.05% 94.02% 94.01% 94.01%<br />

Cost 1.0000 1.0309 1.0471 1.0554 1.0596 1.0617 1.0628 1.0633 1.0636 1.0637 1.0638<br />
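As a companion to the Amdahl sketch earlier, the Python fragment below (again illustrative only; the function and variable names are ours) evaluates Gustafson’s scaled speedup for the same serial fraction and processor counts as the table above.<br />

def gustafson_speedup(f_s, n_p):
    # Scaled speedup when the problem size grows with the processor count.
    return n_p + (1 - n_p) * f_s

f_s = 0.06
base_time = 1024.0
for n_p in [2 ** k for k in range(11)]:        # 1, 2, 4, ..., 1024
    s = gustafson_speedup(f_s, n_p)
    print(f"Np={n_p:5d}  speedup={s:9.4f}  run time={base_time / s:9.4f}  "
          f"efficiency={s / n_p:7.2%}  cost={n_p / s:6.4f}")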

[Graphs: Gustafson’s scaled speedup, $N_p + (1 - N_p)F_s$, plotted against the serial fraction $F_s$ from 0 to 0.06 for $N_p$ = 4, 16, 64, 256 and 1024. The speedups fall only slightly, to about 3.82, 15.1, 61.16, 240.7 and 962.62 at $F_s = 0.06$.]<br />

Consider the following diagrams, which are similar to those in Gustafson’s paper:<br />

[Diagram, fixed-size (Amdahl) view: a single processor runs the serial part $s_A$ and the parallel part $p_A$ in time $s_A + p_A = 1$; $N$ processors run the same problem in time $s_A + p_A/N_p$.]<br />
[Diagram, scaled-size (Gustafson) view: $N$ processors run the serial part $s_G$ and the parallel part $p_G$ in time $s_G + p_G = 1$; a single processor would need time $s_G + N_p\,p_G$ for the same scaled problem.]<br />

Under Gustafson’s proposal, increasing the number of processors has little effect on cost or efficiency and yields an almost linear speedup, as shown in the graphs above. The problem with this method of evaluating computational speedup is that the serial and parallel programs perform different numbers of operations on the primary task, because the task for the parallel implementation is $N_p$ times larger than that of the serial one. If the parallelized operation were multiplication of $n \times n$ matrices with $n_s = 10$, there would be $10^3 = 1000$ multiplication and 1000 addition operations in the serial program. If you scale the problem up for $N_p = 4$ processors, the multiplication operations must increase to 4000 and the matrix size $n_p$ must increase to:<br />

$$n_p = \sqrt[3]{4000} = \sqrt[3]{1000} \times \sqrt[3]{4} = 10 \times 1.5874 \approx 16$$

Because matrix multiplication is of $O(n^3)$ complexity, increasing the size of the matrix, even minimally, creates a much bigger job. An observation by Yuan Shi, proposed in [ lxiv ], explains an equivalence between Amdahl’s Law and Gustafson’s Law. The relationship is based on an adjusted serial fraction for Amdahl’s Law, call it $F_{sA}$, computed from the unadjusted serial fraction used in Gustafson’s Law, call it $F_{sG}$, such that:<br />

$$F_{sA} = \frac{1}{1 + \dfrac{(1 - F_{sG}) \times N_p}{F_{sG}}}$$
As an example, consider a task that has serial fraction F sG = 0.05 with 1024 processors.<br />

Amdahl’s Law would predict speedup S to be:<br />


$$S = \frac{1}{F_{sG} + \dfrac{1 - F_{sG}}{N_p}} = \frac{1}{0.05 + \dfrac{0.95}{1024}} = \frac{1}{0.05 + 0.0009277} = \frac{1}{0.0509277} = 19.635666$$

Gustafson’s Law predicts:<br />

$$S = N_p + (1 - N_p) \times F_{sG} = 1024 + (1 - 1024) \times 0.05 = 1024 - 1023 \times 0.05 = 1024 - 51.15 = 972.85$$

However when the serial fraction F sA is calculated from F sG using the equation above, we<br />

have:<br />

$$F_{sA} = \frac{1}{1 + \dfrac{(1 - F_{sG}) \times N_p}{F_{sG}}} = \frac{1}{1 + \dfrac{(1 - 0.05) \times 1024}{0.05}} = \frac{1}{1 + \dfrac{972.8}{0.05}} = \frac{1}{1 + 19456} = \frac{1}{19457} = 5.14 \times 10^{-5}$$

We substitute F sA for F sG <strong>and</strong> solve:<br />

$$S = \frac{1}{F_{sA} + \dfrac{1 - F_{sA}}{N_p}} = \frac{1}{5.14 \times 10^{-5} + \dfrac{0.99994}{1024}} = \frac{1}{5.14 \times 10^{-5} + 9.7650 \times 10^{-4}} = \frac{1}{1.0279 \times 10^{-3}} = 972.85$$

For this situation the claim holds: obtaining $F_{sA}$ from $F_{sG}$ as defined above and substituting $F_{sA}$ for $F_{sG}$ in Amdahl’s Law gives the same result as Gustafson’s Law. The table below shows that this is true for all numbers of processors, where $N_p$ = {1, 2, 4, 8, …, 1024} and $F_{sG}$ = 0.05.<br />


Processors $N_p$  1 2 4 8 16 32 64 128 256 512 1024<br />
$F_{sG}$  0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05<br />
$F_{sA}$  0.05 0.025641 0.012987 0.0065359 0.0032787 0.001642 0.0008217 0.000411 0.0002055 0.0001028 5.14E-05<br />
Amdahl-$F_{sG}$  1 1.9047619 3.4782609 5.9259259 9.1428571 12.54902 15.421687 17.414966 18.618182 19.284369 19.635666<br />
Gustafson  1 1.95 3.85 7.65 15.25 30.45 60.85 121.65 243.25 486.45 972.85<br />
Amdahl-$F_{sA}$  1 1.95 3.85 7.65 15.25 30.45 60.85 121.65 243.25 486.45 972.85<br />

The table below shows that this is also true for all $F_{sG}$, where $F_{sG}$ = {0.01, 0.02, …, 0.09, 0.1, 0.2} and $N_p$ = 1024.<br />

Processors $N_p$  1024 1024 1024 1024 1024 1024 1024 1024 1024 1024 1024<br />
$F_{sG}$  0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 0.2<br />
$F_{sA}$  9.864E-06 1.993E-05 3.02E-05 4.069E-05 5.14E-05 6.233E-05 7.35E-05 8.491E-05 9.657E-05 0.0001085 0.0002441<br />
Amdahl-$F_{sG}$  91.184328 47.716682 32.313033 24.427481 19.635666 16.415518 14.102741 12.361178 11.002471 9.9128751 4.9805447<br />
Gustafson  1013.77 1003.54 993.31 983.08 972.85 962.62 952.39 942.16 931.93 921.7 819.4<br />
Amdahl-$F_{sA}$  1013.77 1003.54 993.31 983.08 972.85 962.62 952.39 942.16 931.93 921.7 819.4<br />
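The conversion between the two serial fractions is easy to verify numerically. The illustrative Python fragment below (the helper names are ours) computes $F_{sA}$ from $F_{sG}$ and confirms that Amdahl’s formula with $F_{sA}$ matches Gustafson’s prediction, as the tables above show.<br />

def amdahl(f_s, n_p):
    return 1.0 / (f_s + (1.0 - f_s) / n_p)

def gustafson(f_s, n_p):
    return n_p + (1 - n_p) * f_s

def adjusted_serial_fraction(f_sg, n_p):
    # Convert Gustafson's serial fraction into Amdahl's adjusted fraction.
    return 1.0 / (1.0 + (1.0 - f_sg) * n_p / f_sg)

n_p, f_sg = 1024, 0.05
f_sa = adjusted_serial_fraction(f_sg, n_p)
print(f"F_sA                = {f_sa:.3e}")                 # about 5.14e-05
print(f"Amdahl with F_sG    = {amdahl(f_sg, n_p):.6f}")    # about 19.635666
print(f"Gustafson with F_sG = {gustafson(f_sg, n_p):.2f}") # 972.85
print(f"Amdahl with F_sA    = {amdahl(f_sa, n_p):.2f}")    # 972.85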

Performance Metrics<br />

Performance metrics are measures of computer and/or network system behavior over a given period of time. The primary types of performance metrics are:<br />

• Latency<br />

• Throughput<br />

• Efficiency<br />

• Availability<br />

• Reliability<br />

• Utilization<br />

Latency is also called response time. It is a measure of the delay between the initial time<br />

of a request for some service <strong>and</strong> the time that the service arrives, expressed in units of<br />

elapsed time. The elapsed time between the completion of dialing a phone number <strong>and</strong><br />

the first ring, the time that a router holds a packet, <strong>and</strong> the time spent waiting for a Web<br />

page to be displayed after a hyperlink is clicked are all latency metrics. It can be stated<br />

as a statistical distribution. An example is a server that must acknowledge 99.9% of<br />

client requests in one second or less.<br />


Throughput, also called capacity, is the rate at which results arrive, or the amount of work done in a given time. It is measured in units of quantity per unit of time. Megabits per second of data transmitted across a network, transactions completed per minute in a transaction server, and gigabytes of data per second transferred across a system bus are all throughput metrics. The theoretical maximum throughput is called bandwidth. The bandwidth of a 400 MHz, 64-bit data bus is 25.6 Gb/s (400 MHz × 64 bits), but the actual throughput is less because of padding between data blocks and control protocols.<br />

The ratio of usable throughput to bandwidth is called efficiency. The efficiency of a 400 MHz, 64-bit data bus with a throughput of 20.48 Gb/s is 80% (20.48 Gb/s ÷ 25.6 Gb/s). Goodput is the arrival rate of good data packets across a computer network. If, on average, 920 of every 1000 packets sent arrive uncorrupted at the destination, the goodput is said to be 92%.<br />

Availability is the percentage of time that a system is available to provide service. If a<br />

server is down for 15 minutes each day for maintenance, it has 98.96% availability<br />

(1425min ÷ 1440min).<br />

The reliability metric reports the mean time between failures (MTBF), which indicates<br />

the average period that the system is usable. The mean time to repair (MTTR) is the<br />

average time to recover from failures.<br />

Utilization is the percentage of time that a component in the system is active. Utilization<br />

is typically measured as a percentage. The capacity or maximum throughput of a system<br />

is reached when the utilization of the busiest component is 100%. Many systems have a<br />

utilization threshold because as utilization approaches 100%, system latency quickly<br />

increases.<br />
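The figures quoted in the examples above follow directly from the definitions; the small Python fragment below simply restates that arithmetic (the variable names are ours).<br />

bandwidth_gbps = 400e6 * 64 / 1e9      # 400 MHz x 64-bit bus -> 25.6 Gbit/s
efficiency = 20.48 / bandwidth_gbps    # 20.48 Gbit/s of usable throughput -> 80%
goodput = 920 / 1000                   # 920 of 1000 packets arrive intact -> 92%
availability = (1440 - 15) / 1440      # 15 minutes of downtime per day -> 98.96%

print(f"bandwidth    = {bandwidth_gbps:.1f} Gbit/s")
print(f"efficiency   = {efficiency:.0%}")
print(f"goodput      = {goodput:.0%}")
print(f"availability = {availability:.2%}")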

Performance metrics for parallel systems include the following:<br />

• Runtime<br />

• Speedup<br />

• Efficiency<br />

• Cost<br />

• Scalability<br />

The run time of a parallel system is the elapsed time from the instant the master or controller program begins execution until the last program in the parallel system terminates. $T_s$ usually denotes the serial (single-processor) run time of a task and $T_p$ usually denotes the parallel run time.<br />


Speedup, usually denoted by S, is the ratio calculated by dividing the serial run time of a<br />

particular task by the parallel run time for the same task:<br />

$$S = \frac{T_s}{T_p}$$

As an example, if two size-$n$ matrices are to be multiplied, the operation has complexity $\Theta(n^3)$. Assuming that the run time for the operation on a single processor is $n^3$, the theoretical speedup, ignoring parallel system overhead, for 2 processors is:<br />

$$T_1 = n^3, \qquad T_2 = \frac{n^3}{2}, \qquad S = \frac{T_1}{T_2} = \frac{n^3}{n^3/2} = 2$$

Be careful not to make the following mistake for parallel time <strong>and</strong> speedup:<br />

$$T_s = n^3, \qquad T_p = \left(\frac{n}{2}\right)^3 = \frac{n^3}{8}, \qquad S = \frac{T_s}{T_p} = \frac{n^3}{n^3/8} = 8$$

This assumes a change in the overall problem size, which is false: matrix multiplication requires $n^3$ multiplications and $n^3$ additions regardless of how many processors are used.<br />

Efficiency, usually denoted as E, is the ratio calculated by dividing the speedup S by the<br />

number of processors N p , which measures the percentage of time that a processor is<br />

working on the primary task. For the matrix multiplication example the efficiency is:<br />

$$E = \frac{S}{N_p} = \frac{2}{2} = 1 = 100\%$$

Parallel system overhead $T_o$ can decrease system efficiency. Parallel system overhead consists of all the operations necessary to manage and set up the parallel system, divide the task among the processors, transmit the task to the worker processes, collect the results from the processes and compile the results. It may also include pieces of the sequential program that cannot be parallelized, $T_{1-p}$. Hence a more realistic formula for the run time with $n$ processors, $T_n$, where $T_c$ is the time spent on computation of the task, is:<br />


$$T_n = T_c + T_o + T_{1-p}$$

Assume that the following values are valid for the matrix multiplication above:<br />

• Sequential run time $T_1$ = 120 sec<br />
• Parallel computation time $T_c$ = 60 sec<br />
• Parallel overhead $T_o$ = 20 sec<br />
• No non-parallelizable code assumed: $T_{1-p}$ = 0 sec<br />

Then the speedup would be:<br />
$$T_1 = 120\ \text{sec}, \qquad T_2 = T_c + T_o + T_{1-p} = 60\ \text{sec} + 20\ \text{sec} + 0\ \text{sec} = 80\ \text{sec}, \qquad S = \frac{120\ \text{sec}}{80\ \text{sec}} = 1.5 = 150\%$$

This is somewhat less than the previous speedup.<br />

The cost $C$ of a parallel system is calculated by multiplying the parallel run time $T_n$ by the number of processors $N_p$ and dividing by the sequential run time $T_1$:<br />

$$C = \frac{T_n \times N_p}{T_1}$$

Using the values from the example above, ignoring overhead, we have:<br />

$$C = \frac{T_n \times N_p}{T_1} = \frac{60\ \text{sec} \times 2}{120\ \text{sec}} = \frac{120\ \text{sec}}{120\ \text{sec}} = 1$$

This shows that the parallel system is cost-optimal, because the increase in speed is proportional to the number of processors added. Typically, costs are not optimal. Considering the overhead in the example above, we have:<br />

$$C = \frac{T_n \times N_p}{T_1} = \frac{80\ \text{sec} \times 2}{120\ \text{sec}} = \frac{160\ \text{sec}}{120\ \text{sec}} = 1.333$$
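To tie the run-time, speedup, efficiency and cost definitions together, here is a minimal Python sketch using the example numbers above ($T_1$ = 120 sec, $T_c$ = 60 sec, $T_o$ = 20 sec on two processors); the helper function and its name are our own.<br />

def parallel_metrics(t_serial, t_parallel, n_p):
    # Speedup, efficiency and cost as defined in this section.
    speedup = t_serial / t_parallel
    efficiency = speedup / n_p
    cost = t_parallel * n_p / t_serial
    return speedup, efficiency, cost

t1, n_p = 120.0, 2
t_c, t_o, t_1p = 60.0, 20.0, 0.0

print(parallel_metrics(t1, t_c, n_p))    # ignoring overhead: (2.0, 1.0, 1.0)
t2 = t_c + t_o + t_1p                    # realistic parallel run time: 80 sec
print(parallel_metrics(t1, t2, n_p))     # with overhead: (1.5, 0.75, 1.333...)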

Timing Models<br />


Gathering System Performance Data<br />

Gathering Network Performance Data<br />

Optimal Load balancing<br />

Load balancing is the efficient distribution of the workload over all available processors,<br />

keeping all processors busy until the task is complete. Not all machines will have the<br />

same computational capacity. Some machines may have lower processor speeds or other<br />

tasks that consume system resources. The idea is to shift more work to processors that<br />

can accommodate it. Optimization is the modification of a system to improve<br />

performance <strong>and</strong> efficiency. Optimal load balancing occurs when the latency of requests<br />

is minimized, computation is distributed equally across all processors, system throughput<br />

is maximized, <strong>and</strong> the system completes all tasks in the least possible time. An<br />

absolutely optimal system is rare <strong>and</strong> can be difficult to produce. Optimization usually<br />

involves compromise. Performance or efficiency in one part of a system may have to be<br />

sacrificed to optimize another part.<br />

Successful optimization requires the development of sound algorithms <strong>and</strong> a functional<br />

prototype. Challenges to load balancing include problems with timing, communication,<br />

synchronization, and iterative tasks and branching that may depend on conditions elsewhere<br />

in the parallel system. If tasks in a parallel system have differing execution times, one or<br />

more processors will have to wait for the longest executing task to finish.<br />

Communication <strong>and</strong> synchronization will occur over some communication channel, such<br />

as the system bus or a network. Systems that require an abundance of communication<br />

may cause a bottleneck in these channels. If the channel is shared between multiple<br />

processes, competition for the resource may cause contention in heavily loaded channels.<br />

Loops <strong>and</strong> branches can easily lead to non-deterministic program behavior if measures<br />

are not employed to prevent it.<br />

There are two classifications of load balancing: static and dynamic. Static load balancing uses statistics about each processor’s ability to perform work to share the burden of the workload. Dynamic load balancing shares work by dynamically adjusting job sizes based on the performance of the participating processors. Dynamic load balancing requires more communication and synchronization between processes, which consumes<br />

unexpected delays when jobs take unreasonable amounts of time, where static load<br />

balancing cannot. If a task is taking longer than anticipated, some work can be sent to<br />

other processes. The extra communication may decrease throughput but the processes<br />

will be kept busy. It is also important to mention that load balancing should reduce the<br />


overall run time for the system. If it takes less time to complete the task without it, we<br />

should forgo load balancing. [ lxv ]<br />
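The difference between the two approaches can be illustrated with a small simulation. The Python sketch below is a generic illustration, not the Synergy scheduler; it distributes the same jobs either statically in equal shares or dynamically from a shared work queue to three processors of unequal speed, and the dynamic schedule finishes much sooner.<br />

JOBS = [1.0] * 120                       # 120 equal-sized work units
SPEEDS = [4.0, 2.0, 1.0]                 # relative speeds of three unequal processors

def static_schedule(jobs, speeds):
    # Pre-assign an equal share of jobs to every processor.
    share = len(jobs) // len(speeds)
    return max(share * jobs[0] / s for s in speeds)   # slowest processor finishes last

def dynamic_schedule(jobs, speeds):
    # Hand each processor its next job as soon as it becomes idle (work queue).
    clocks = [0.0] * len(speeds)
    for job in jobs:
        i = clocks.index(min(clocks))    # the processor that is free earliest
        clocks[i] += job / speeds[i]
    return max(clocks)

print(f"static : {static_schedule(JOBS, SPEEDS):.1f} time units")   # 40.0
print(f"dynamic: {dynamic_schedule(JOBS, SPEEDS):.1f} time units")  # about 17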


About <strong>Synergy</strong><br />

Blue text: Copied <strong>and</strong> pasted from Getting Started by Dr. Shi<br />

Red text: Copied <strong>and</strong> pasted from syng_man.ps by Dr. Shi<br />

Introduction to The <strong>Synergy</strong> Project<br />

What is <strong>Synergy</strong>?<br />

<strong>Synergy</strong> is a parallel computing system using a Stateless Parallel Processing (SPP)<br />

principle. It is a simplified prototype implementation of a Stateless Machine (SLM). It<br />

lacks backbone fault tolerance <strong>and</strong> stateful process fault tolerance. It is also known to<br />

have an inefficient tuple matching engine in comparison to the full implementation of<br />

SLM.<br />

SPP is based on coarse-grain dataflow processing. A full SLM implementation will<br />

offer, in addition to all benefits that <strong>Synergy</strong> affords, a more efficient tuple matching<br />

engine <strong>and</strong> a non-stop computing platform with total fault tolerance for stateful processes<br />

<strong>and</strong> for the backbone. An SLM can be considered a higher form of Symmetric<br />

MultiProcessor (SMP).<br />

Functionally, <strong>Synergy</strong> can be thought of as an equivalent to PVM, Linda or MPI/MPICH.<br />

<strong>Synergy</strong> uses passive objects for inter-process(or) communication. It offers<br />

programming ease, load balancing and fault tolerance benefits. The application-programming interface (API) is a small set of operators defined on the supported object types, such as tuple space, file and database. Synergy programs use a conventional open-manipulate-close sequence for each passive object. Each Synergy program is individually compiled using a conventional compiler and the Synergy Language Injection Library (LIL). A parallel application is synthesized through a configuration specification (CSL) and an automatic processor-binding algorithm. The Synergy runtime system can execute multiple parallel applications on the same cluster at the same time.<br />

The Synergy API blends well into conventional sequential programs. It is particularly<br />

helpful for reengineering legacy applications. It even allows parallel processing of mixed<br />

PVM <strong>and</strong> MPI programs.<br />


<strong>Synergy</strong> <strong>and</strong> SPP<br />

<strong>Synergy</strong> is a prototype implementation of a StateLess Machine (SLM). It uses a Passive<br />

Object-Flow Programming (POFP) method to offer programming ease, process fault<br />
tolerance and high efficiency using a cluster of networked computers.<br />

In principle, a Stateless Parallel Processing (SPP) system requires total location<br />

transparency for all processes (running programs). This affords three important non-functional<br />
features: ease of programming, fault tolerance and load balancing.<br />

In programming, this means that location (host address <strong>and</strong> port) dependent IPC<br />

primitives are NOT allowed. Consequently, a special asynchronous IPC layer (of Passive<br />

Objects) is used for inter-process communication <strong>and</strong> synchronization. The SPP runtime<br />

system can automatically determine the optimal process-to-processor binding during the<br />

execution of a parallel application. This additional IPC layer does carry some overheads<br />

in comparison to direct IPC systems such as MPI/PVM. In return, it gives three critical<br />

benefits: programming ease, load balancing <strong>and</strong> fault tolerance support at the architecture<br />

level.<br />

Why <strong>Synergy</strong>?<br />

First, one hidden fact that is rarely mentioned in the high-performance multiprocessing literature is that the use of multiple processors for a single application necessarily reduces its availability if any processor failure can halt the entire application. The current state of the art in parallel processing is still under the shadow of this gloomy fact.<br />

SPP offers an approach that promises breakthroughs in both high performance <strong>and</strong> high<br />

availability using multi-processors. <strong>Synergy</strong> is the first prototype designed to explore<br />

architectural flaws <strong>and</strong> to validate the claims of SPP.<br />

Second, technically, separation of functional programming from process coordination <strong>and</strong><br />

resource management functions can ease parallel programming while maintaining high<br />

performance <strong>and</strong> availability. Although many believe that explicit manipulation of<br />

processes <strong>and</strong> data objects can produce highly optimized parallel codes, we believe ease<br />

of programming, high performance and high availability are of higher importance in<br />
making industrial-strength parallel applications using multiprocessors.<br />


<strong>Synergy</strong> Philosophy<br />

Facilitating the best use of computing <strong>and</strong> networking resources for each application is<br />

the key philosophy in <strong>Synergy</strong>. We advocate competitive resource sharing as opposed to<br />

``cycle stealing.'' The tactic is to reduce processing time for each application. Multiple<br />

running applications would fully exploit system resources. The realization of the<br />

objectives, however, requires both quantitative analysis <strong>and</strong> highly efficient tools.<br />

It is inevitable that parallel programming and debugging will be more time consuming than single-thread processing, regardless of how well the application programming interface (API) is designed. The elusive parallel processing results taught us that we must have quantitatively convincing reasons to process an application in parallel before committing to the potential expenses (programming, debugging and future maintenance).<br />

We use Timing Models to evaluate the potential speedups of a parallel program using<br />

different processors <strong>and</strong> networking devices [13]. Timing models capture the orders of<br />

timing costs for computing, communication, disk I/O <strong>and</strong> synchronization requirements.<br />

We can quantitatively examine an application's speedup potential under various processor<br />

<strong>and</strong> networking assumptions. The analysis results delineate the limit of hopes. When<br />

applied to practice, timing models provide guidelines for processing grain selection <strong>and</strong><br />

experiment design.<br />

Efficiency analysis showed that effective parallel processing should follow an<br />

incremental coarse-to-fine grain refinement method. Processors can be added only if<br />

there is unexplored parallelism, processors are available and the network is capable of<br />

carrying the anticipated load. Hard-wiring programs to processors will only be efficient<br />

for a few special applications with restricted input at the expense of programming<br />

difficulties.<br />

To improve performance, we took an application-oriented approach in the tool design.<br />

Unlike conventional compilers <strong>and</strong> operating systems projects, we build tools to<br />

customize a given processing environment for a given application. This customization<br />

defines a new infrastructure among the pertinent compilers, operating systems <strong>and</strong> the<br />

application for effective resource exploitation. Simultaneous execution of multiple<br />

parallel applications permits exploiting available resources for all users. This makes the<br />

networked processors a fairly real ``virtual supercomputer.''<br />

An important advantage of the <strong>Synergy</strong> compiler-operating system-application<br />

infrastructure is its higher-level portability over existing systems. It allows parallel<br />
programs, once written, to adapt to any programming, processor and networking technologies<br />
without compromising performance.<br />


An important lesson we learned was that mixing parallel processing, resource<br />

management <strong>and</strong> functional programming tools in one language made tool automation<br />

<strong>and</strong> parallel programming unnecessarily difficult. This is especially true for parallel<br />

processors employing high performance uni-processors.<br />

Building timing models before parallel programming can determine the worthiness of the<br />

undertaking in the target multiprocessor environment <strong>and</strong> prevent costly design mistakes.<br />

The analysis can also provide guidelines for parallelism grain size selection <strong>and</strong><br />

experiment design (http://joda.cis.temple.edu/~shi/super96/timing/timing.html)<br />

Except for server programs, all parallel processing applications can be represented by a<br />

coarse grain dataflow graph (CGDG). In CGDG, each node is either a repetition node or<br />

a non-repetition node. A repetition node contains either an iterative or recursive process.<br />

The edges represent data dependencies. It should be fairly obvious that CGDG must be<br />

acyclic.<br />

CGDG fully exhibits potential effective (coarse grain) parallelism for a given application.<br />

For example, the SIMD parallelism is only possible for a repetition node. The MIMD<br />

parallelism is possible for any 1-K branch in CGDG. Pipelines exist along all<br />

sequentially dependent paths provided that there are repetitive input data feeds. The<br />

actual processor assignment determines the deliverable parallelism.<br />

Any repetition node can be processed in a coarse grain SIMD (or scatter-<strong>and</strong>-gather)<br />

fashion. The implementation of a repetition node is to have a master <strong>and</strong> a worker<br />

program connected via two tuple space objects. The master is responsible for distributing<br />

the work tuples <strong>and</strong> collecting results. The worker is responsible for computing the<br />

results from a given input <strong>and</strong> delivering the results.<br />

For all other components in the graph, one can use tuple space or pipe. The use of<br />

file <strong>and</strong> database (yet to be implemented) objects is defined by the application.<br />

Following the above description results in a static IPC graph using passive objects. The<br />

programmer's job is to compose parallel programs communicating with these objects.<br />
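To make the master/worker arrangement described above concrete, here is a minimal, self-contained Python sketch of the scatter-and-gather pattern over a tuple-space-like object. It is an in-process illustration only; the TupleSpace class and its methods are invented for this example and are not the Synergy LIL API.<br />

import queue
import threading

class TupleSpace:
    # Minimal FIFO tuple store: put (name, value) tuples and take them back.
    def __init__(self):
        self._tuples = queue.Queue()
    def put(self, name, value):
        self._tuples.put((name, value))
    def get(self):
        return self._tuples.get()        # blocks until a tuple is available

def master(work, problem_space, result_space):
    # Distribute one work tuple per chunk, then collect one result per chunk.
    for i, chunk in enumerate(work):
        problem_space.put(f"work{i}", chunk)
    return sum(result_space.get()[1] for _ in work)

def worker(problem_space, result_space, n_tuples):
    # Repeatedly take a work tuple, compute on it, and deliver a result tuple.
    for _ in range(n_tuples):
        name, chunk = problem_space.get()
        result_space.put(name, sum(x * x for x in chunk))   # the "computation"

if __name__ == "__main__":
    problems, results = TupleSpace(), TupleSpace()
    work = [list(range(i, i + 10)) for i in range(0, 100, 10)]   # ten work chunks
    t = threading.Thread(target=worker, args=(problems, results, len(work)))
    t.start()
    print(master(work, problems, results))   # sum of squares of 0..99 -> 328350
    t.join()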

History<br />

<strong>Synergy</strong> V3.0 is an enhancement to <strong>Synergy</strong> V2.0 (released in early 1994). Earlier<br />

versions of the same system appeared in the literature under the names of MT (1989),<br />

ZEUS (1986), Configurator (1982) <strong>and</strong> <strong>Synergy</strong> V1.0 (1992) respectively.<br />


Major Components <strong>and</strong> Inner Workings of<br />

<strong>Synergy</strong><br />

Technically, the <strong>Synergy</strong> system is an automatic client/server software generation system<br />

that can form an effective parallel processor for each application using multiple<br />

distributed Unix or Linux computers. This parallel processor is specifically engineered to<br />

process programs inter-connected in an application dependent IPC (Inter-Program<br />

Communication/ Synchronization) graph using industry st<strong>and</strong>ard compilers, operating<br />

systems <strong>and</strong> communication protocols. This IPC graph exhibits application dependent<br />

coarse grain SIMD (Single Instruction Multiple Data), MIMD (Multiple Instruction<br />

Multiple Data) <strong>and</strong> pipeline parallelisms.<br />

<strong>Synergy</strong> V3.0 supports three passive data objects for program-to-program communication<br />

<strong>and</strong> synchronization:<br />

1. Tuple space (a FIFO ordered tuple data manager)<br />

2. Pipe (a generic location independent indirect message queue)<br />

3. File (a location transparent sequential file)<br />

A passive object is any structured data repository permitting no object creation functions.<br />

All commonly known large data objects, such as databases, knowledge bases, hashed<br />

files, <strong>and</strong> ISAM files, can be passive objects provided the object creating operators are<br />

absent. Passive objects confine dynamic dataflows into a static IPC graph for any<br />

parallel application. This is the basis for automatic customization.<br />

POFP uses a simple open-manipulate-close sequence for each passive object. A one-dimensional<br />
Coarse-To-Fine (CTF) decomposition method (see the Adaptable Parallel<br />
Application Development section for details) can produce designs of modular parallel<br />

programs using passive objects. A global view of the connected parallel programs reveals<br />

application dependent coarse grain SIMD, MIMD <strong>and</strong> pipeline potentials. Processing<br />

grain adjustments are done via the work distribution programs (usually called Masters).<br />

These adjustments can be made without changing codes. All parallel programs can be<br />

developed <strong>and</strong> compiled independently.<br />

What are in <strong>Synergy</strong>? (<strong>Synergy</strong> Kernel with Explanation)<br />

The first important ingredient in <strong>Synergy</strong> is the confinement of inter-program<br />

communication <strong>and</strong> synchronization (IPC) mechanisms. They convert dynamic<br />


application dataflows to a static, bipartite IPC graph. In <strong>Synergy</strong>, this graph is used to<br />

automate process coordination <strong>and</strong> resource management. In other words, <strong>Synergy</strong> V3.0<br />

uses this static IPC graph to automatically map parallel programs onto a set of networked<br />
computers that forms a virtual multiprocessor. In the full SLM implementation, this<br />

static IPC graph will be implemented via a self-healing backbone.<br />

<strong>Synergy</strong> v3.0 contains the following service components:<br />

• A language injection library (LIL). This is the API programmers use to compose<br />

parallel programs. It contains operators defined on supported passive objects,<br />

such as tuple space, file, pipe or database.<br />

• Two memory resident service daemons (PMD <strong>and</strong> CID). These daemons resolve<br />

network references <strong>and</strong> are responsible for remote process/object execution <strong>and</strong><br />

management.<br />

• Two dynamic object daemons (TSH <strong>and</strong> FAH). These daemons are launched<br />

before every parallel application begins <strong>and</strong> are removed after the application<br />

terminates. They implement the defined semantics of LIL operators.<br />

• A customized Distributed Application Controller (DAC). This program actually<br />

synthesizes a multiprocessor application. It conducts processor binding <strong>and</strong><br />

records relevant information about all processes involved in the application until<br />

completion. DAC represents a customized virtual multiprocessor for each<br />

application.<br />

• Synergy shell (prun and pcheck). These programs are the Synergy runtime user<br />
interface.<br />

o prun launches a parallel application<br />

o pcheck is a runtime monitor for managing multiple parallel applications<br />

<strong>and</strong> processes<br />

ADD PRUN AND LIL INFO HERE<br />

Program ``pcheck'' functions analogously to the ``ps'' command in Unix. It monitors<br />

parallel applications <strong>and</strong> keeps track of parallel processes of each application. Pcheck<br />

also allows killing running processes or applications if necessary.<br />

To make remote processors listen to personal commands, there are two lightweight<br />
utility daemons: the Command Interpreter Daemon (cid) and the Port Mapper Daemon<br />

(pmd). Cid interprets a limited set of process control comm<strong>and</strong>s from the network for<br />

each user account. In other words, parallel users on the same processor need different<br />

cid's. Pmd (the peer leader) provides a "yellow page" service for locating local cid's.<br />

Pmd is automatically started by any cid <strong>and</strong> is transparent to all users.<br />


FDD is a Fault Detection Daemon. It is activated by an option in the prun comm<strong>and</strong> to<br />

detect worker process failures at runtime.<br />

<strong>Synergy</strong> V3.0 requires no root privileged processes. All parallel processes assume<br />

respective user security <strong>and</strong> resource restrictions defined at account creation. Parallel use<br />

of multiple computers imposes no additional security threat to the existing systems.<br />

Theoretically, there should be one object daemon for each supported object type. For the<br />

three supported types: tuple space, pipe <strong>and</strong> files, we saved the pipe daemon by<br />

implementing it directly in LIL. Thus, <strong>Synergy</strong> V3.0 has only two object daemons: the<br />

Tuple Space H<strong>and</strong>ler (tsh) <strong>and</strong> the File Access H<strong>and</strong>ler (fah). The object daemons, when<br />

activated, talk to parallel programs via the LIL operators under the user defined identity<br />

(via CSL). They are potentially resource hungry. However they only "live" on the<br />

computers where they are needed <strong>and</strong> permitted.<br />

Optimal processor assignment is theoretically complex. <strong>Synergy</strong>'s automatic processor<br />

binding algorithm is extremely simple: unless specifically designated, it binds all tuple<br />

space objects, one master <strong>and</strong> one worker to a single processor. Other processors run the<br />

worker-type (with repeatable logic) processes. Since the network is the bottleneck, this<br />
binding algorithm minimizes network traffic, thus promising good performance for most<br />

applications using the current tuple matching engine. The full implementation of SLM<br />

will have a distributed tuple matching engine that promises to fulfill a wider range of<br />

performance requirements.<br />

Fault tolerance is a natural benefit of the SPP design. Processor failures discovered before a run are automatically isolated. Worker processor failures during a parallel execution are treated in V3.0 by a "tuple shadowing" technique. Synergy V3.0 can automatically recover the lost data from a lost worker with little overhead. This feature makes the availability of a multiprocessor application equal to that of a single processor and is completely transparent to application programs.<br />

<strong>Synergy</strong> provides the basis for automatic load balancing. However, optimal load<br />

balancing requires adjusting tuple sizes. Tuple size adjustments can adopt guided self-scheduling<br />
[1], factoring [2] or fixed chunking using the theory of optimal granule size<br />

for load balancing [3].<br />

<strong>Synergy</strong> V3.0 runs on clusters of workstations. This evaluation copy allows unlimited<br />

processors across multiple file systems (*requires one binary installation per file system).<br />


Comparisons with Other Systems<br />

<strong>Synergy</strong> vs. PVM/MPI<br />

PVM/MPI is a direct message passing system [5,6] that requires inter-process<br />

communication be carried out based on process task id's. This requirement forces an<br />

extra user-programming layer if fault tolerance <strong>and</strong> load balancing are desired. This is<br />

because for load balancing <strong>and</strong> fault tolerance, working data cannot be "hard wired" to<br />

specific processors. An "anonymous" data item can only be supplied using an additional<br />

data management layer providing a tuple space-like interface. In this sense, we consider<br />

PVM/MPI a lower level parallel API as compared to Linda <strong>and</strong> <strong>Synergy</strong>.<br />

Fault tolerant <strong>and</strong> load balanced parallel programs typically require more inter-process<br />

communication than direct message passing since they refresh their states frequently in<br />

order to expose more “stateless moments” – critical to load balance <strong>and</strong> fault tolerance.<br />

This is a tradeoff that users must make before adapting the <strong>Synergy</strong> parallel programming<br />

platform.<br />

<strong>Synergy</strong> vs. Linda<br />

The original Linda implementation [4] uses a virtual global tuple space implemented<br />

using a compile time analysis method. The main advantage of the Linda method is the<br />

potential to reduce communication overhead. It was believed that many tuple access<br />

patterns could be un-raveled into single lines of communication. Thus the compiler can<br />

build the machine dependent codes directly without going through an intermediate<br />

runtime daemon that would potentially double the communication latency of each tuple<br />

transmission. However, experiments indicate that the majority of applications do not have<br />

static tuple access patterns that a compiler can easily discern. As a result, increased<br />

communication overhead is inevitable.<br />

The compile time tuple binding method is also detrimental to fault tolerance <strong>and</strong> load<br />

balancing.<br />

Another problem in the Linda design is its limited scalability. Composing all parallel programs in one file, to be compiled by a single compiler, makes programming unnecessarily complex and is impractical for large-scale applications. It also presents difficulties for mixed-language processing.<br />


In comparison, <strong>Synergy</strong> uses dynamic tuple binding at the expense of increased<br />

communication overhead by using dynamic tuple space daemons. In the full SLM<br />

implementation, this overhead will be reduced by a distributed tuple matching engine.<br />

Practical computational experiments indicate that synchronization overhead (due to load<br />

imbalance) logged more time than communication. Thus <strong>Synergy</strong>'s load balancing<br />

advantage can be used to offset its increased communication overhead.<br />

Parallel Programming <strong>and</strong> Processing in <strong>Synergy</strong><br />

A parallel programmer must use the passive objects for communication <strong>and</strong><br />

synchronization purposes. These operations are provided via the language injection<br />

library (LIL). LIL is linked to source programs at compilation time to generate hostless<br />

binaries that can run on any binary-compatible platform.<br />

After making the parallel binaries the interconnection of parallel programs (IPC graph)<br />

should be specified in CSL (Configuration Specification Language). Program ``prun''<br />

starts a parallel application. Prun calls CONF to process the IPC graph <strong>and</strong> to complete<br />

the program/object-to-processor assignments automatically or as specified. It then<br />

activates DAC to start appropriate object daemons <strong>and</strong> remote processes (via remote<br />

cid's). It preserves the process dependencies until all processes are terminated.<br />

Building parallel applications using <strong>Synergy</strong> requires the following steps:<br />

1. Parallel program definitions. This requires, preferably, establishing timing models<br />

for a given application. Timing model analysis provides decomposition<br />

guidelines. Parallel programs <strong>and</strong> passive objects are defined using these<br />

guidelines.<br />

2. Individual program composition using passive objects.<br />

3. Individual program compilation. This makes hostless binaries by compiling the<br />

source programs with the <strong>Synergy</strong> object library (LIL). It may also include<br />

moving the binaries to the $HOME/bin directory when appropriate.<br />

4. Application synthesis. This requires a specification of program-to-program<br />

communication <strong>and</strong> synchronization graph (in CSL). When needed, user preferred<br />

program-to-processor bindings are to be specified as well.<br />

5. Run (prun). At this time the program synthesis information is mapped on to a<br />

selected processor pool. Dynamic IPC patterns are generated (by CONF) to guide<br />

the behavior of remote processes (via DAC <strong>and</strong> remote cid's). Object daemons are<br />

started <strong>and</strong> remote processes are activated (via DAC <strong>and</strong> remote cid's).<br />

6. Monitor <strong>and</strong> control (pcheck).<br />


Load Balancing <strong>and</strong> Performance Optimization<br />

Fault Tolerance<br />


Installing <strong>and</strong> Configuring <strong>Synergy</strong><br />

Red text: Copied <strong>and</strong> pasted from syng_man.ps by Dr. Shi<br />

Gray text: Copied <strong>and</strong> pasted from a document by Dr. Shi<br />

Basic Requirements<br />

In addition to installing <strong>Synergy</strong> V3.0 on each computer cluster, there are four<br />

requirements for each ``parallel'' account:<br />

1. An active SNG_PATH symbol definition pointing to the directory where <strong>Synergy</strong><br />

V3.0 is installed. It is usually /usr/local/synergy.<br />

2. An active comm<strong>and</strong> search path ($SNG_PATH/bin) pointing to the directory<br />

holding the <strong>Synergy</strong> binaries.<br />

3. A local host file ($HOME/.sng_hosts). Note that this file is only necessary for a<br />

host to be used as an application submission console.<br />

4. An active personal comm<strong>and</strong> interpreter (cid) running in the background. Note<br />

that the destination of future parallel process's graphic display should be defined<br />

before starting cid.<br />

Since the local host file is used each time an application is started, it needs to reflect a)<br />

all accessible processors; <strong>and</strong> b) selected hosts for the current application.<br />

Unpacking<br />

To uncompress, at Unix prompt, type<br />

% uncompress synergy-3.0.tar.Z<br />

To untar,<br />

% tar -xvf synergy-3.0.tar<br />

A directory called "synergy" will be created <strong>and</strong> all files<br />

unpacked under this directory.<br />

Compiling<br />


To compile, change to the synergy directory <strong>and</strong> type<br />

% make<br />

The current version has been tested on these platforms:<br />

- SUN 3/4, SunOs<br />

- IBM RS6000, AIX<br />

- DEC Alpha, OSF/1<br />

- DEC ULTRIX<br />

- Silicon Graphics, SGI<br />

- HP, HP-UX<br />

- CDC cyber, EP/IX<br />

The makefile will try to detect the operating system <strong>and</strong> build binaries, libraries <strong>and</strong><br />

sample applications. You may need to edit the makefile if your system requires special<br />

flags, <strong>and</strong>/or if your include/library path is nonst<strong>and</strong>ard. Check the makefile for detail.<br />

Configuring the <strong>Synergy</strong> Environment<br />

After the installation procedure is complete, some minor changes must be made to the<br />

computer's environment to access the Synergy system. When using a UNIX/Linux<br />

system we enter comm<strong>and</strong>s in a comm<strong>and</strong>-line environment called a shell. This shell<br />

must be configured to recognize the <strong>Synergy</strong> system. The two most used shells are C<br />

Shell (csh) <strong>and</strong> Bourne Again Shell (bash). Examples of configuration or profile files<br />

will be shown below for csh <strong>and</strong> bash. Because these files are hidden, you must type:<br />

ls -a<br />

<strong>and</strong> press the enter key at the terminal comm<strong>and</strong> prompt to view them.<br />

To configure csh, you must edit the “.cshrc” file in your home directory by adding the<br />

line:<br />

setenv SNG_PATH synergy_directory<br />

where synergy_directory is the directory containing all the binary files <strong>and</strong> the<br />

<strong>Synergy</strong> object library. Next, add the <strong>Synergy</strong> binary directory to the path definition by<br />

typing:<br />

set path=($SNG_PATH/bin $path)<br />


at the comm<strong>and</strong> line <strong>and</strong> pressing enter. It is important to add $SNG_PATH/bin before<br />

$path, since “prun” may be overloaded in some operating systems (such as SunOS 5.9).<br />

To activate the new settings enter:<br />

source .cshrc<br />

at the comm<strong>and</strong> prompt.<br />

An example of a “.cshrc” file after the settings have been changed, with the changes in<br />

bold, for the SunOS is:<br />

#ident "@(#)local.cshrc 1.2 00/05/01 SMI"<br />

umask 077<br />

set path=( /usr/users/shi/synergy/bin /opt/SUNWspro/bin /bin /usr/bin /usr/ucb<br />

/etc ~ )<br />

if ( -d ~/bin ) then<br />

set path=( $path ~/bin )<br />

endif<br />

set path=( $path . )<br />

if ( $?prompt ) then<br />

set history=32<br />

endif<br />

set prompt="[%n@%m %c ]%#"<br />

# Initialize new variables<br />

setenv LD_LIBRARY_PATH ""<br />

setenv MANPATH "/opt/SUNWspro/man"<br />

# Adding the SUN Companion CD Software, including GCC 2.95<br />

set path=( $path /opt/sfw/bin /opt/sfw/sparc-sun-solaris2.9/bin /usr/local/bin<br />

)<br />

setenv LD_LIBRARY_PATH "${LD_LIBRARY_PATH}:/opt/sfw/lib:/usr/local/lib"<br />

setenv MANPATH "/opt/sfw/man:/usr/local/man:${MANPATH}"<br />

# Adding Usr-Local-Bin<br />

set path=( $path /usr/local/bin )<br />

setenv LD_LIBRARY_PATH "${LD_LIBRARY_PATH}:/usr/local/lib"<br />

setenv MANPATH "/usr/local/man:${MANPATH}"<br />

# Usr-Sfw<br />

set path=( $path /usr/sfw/bin )<br />

setenv LD_LIBRARY_PATH "${LD_LIBRARY_PATH}:/usr/lib:/usr/sfw/lib"<br />

setenv MANPATH "${MANPATH}:/usr/man:/usr/sfw/man"<br />

# DT Window Manager<br />

set path=( $path /usr/dt/bin )<br />

#setenv LD_LIBRARY_PATH $LD_LIBRARY_PATH:/usr/dt/lib<br />

setenv MANPATH "${MANPATH}:/usr/dt/man"<br />

# GNOME<br />


set path=( $path /usr/share/gnome )<br />

setenv LD_LIBRARY_PATH "${LD_LIBRARY_PATH}:/usr/share/lib"<br />

setenv MANPATH "${MANPATH}:/usr/share/man"<br />

setenv SNG_PATH /usr/users/shi/synergy<br />

# SBIN<br />

set path=( $path /sbin /usr/sbin )<br />

An example “.cshrc” file for Linux OS would be:<br />

set path = ( ~ ~/bin /usr/java/j2sdk_nb/j2sdk1.4.2/bin $path \<br />

/usr/local/X11R6/bin /usr/local/bin /usr/bin /usr/users/shi/synergy/bin<br />

. )<br />

set noclobber<br />

limit coredumpsize 0<br />

# aliases for all shells<br />

#alias cd 'cd \!*;set prompt="`hostname`:`pwd`>"'<br />
alias pwd 'echo $cwd'<br />
alias edt 'textedit -fn screen.b.14'<br />

set history = 1000<br />

set savehist = 400<br />

set ignoreeof<br />

set prompt="%m:%~>"<br />

alias help man
alias key 'man -k'

setenv EDITOR 'pico -t'<br />

setenv MANPATH /usr/man:/usr/local/man:/usr/share/man<br />

setenv WWW_HOME http://www.cis.temple.edu<br />

setenv NNTPSERVER netnews.temple.edu<br />

setenv SNG_PATH /usr/users/shi/synergy<br />

#source ~/.aliases<br />

# auto goto client<br />

[ "$tty" != "" ] && [ `hostname` = 'lucas' ] && exec gotoclient<br />

To configure bash you must edit the “.bash_profile” file by adding the lines:<br />

SNG_PATH=synergy_directory

export SNG_PATH<br />

where synergy_directory is the directory containing all the binary files <strong>and</strong> the<br />

<strong>Synergy</strong> object library <strong>and</strong> add the following entry to the path:<br />

/usr/users/shi/synergy/bin:<br />


To activate the new settings enter:<br />

source .bash_profile<br />

at the comm<strong>and</strong> prompt.<br />

Below is an example of the “.bash_profile” file for the Linux OS.<br />

# .bash_profile<br />

# Get the aliases <strong>and</strong> functions<br />

if [ -f ~/.bashrc ]; then<br />

. ~/.bashrc<br />

fi<br />

# <strong>User</strong> specific environment <strong>and</strong> startup programs<br />

PATH=/usr/users/shi/synergy/bin:/usr/java/j2sdk_nb/j2sdk1.4.2/bin:$PATH:$HOME/bin<br />

SNG_PATH=/usr/users/shi/synergy

export PATH<br />

export SNG_PATH<br />

unset USERNAME<br />

# auto goto client<br />

[ "$TERM" != "dumb" ] && [ `hostname` = 'lucas' ] && exec gotoclient<br />

Activating a Processor Pool<br />

To activate your personal parallel processors, you will need to start one "cid" on each of the hosts, either manually or by a shell script, at least once.

In addition, if you have special remote display requirements, you need to set up your display characteristics BEFORE starting cid. For example, you may want to monitor a simulator running on many hosts and "steer" the program as it goes.

In this case, you will need to open as many windows as the number of hosts you want to monitor and telnet (or rlogin) to these hosts. Then you need to start a cid on each of these hosts after you designate your display host. Cid remembers this setting: it will send the local display to the host designated by the "setenv DISPLAY" command.

To start cid enter:<br />


%cid &<br />

Cid will try to connect to another daemon named "pmd". If it cannot contact this peer leader after three attempts, it will start the peer leader automatically.

To check the accessibility of all processors in the pool, enter the following at any host:

%cds<br />

This comm<strong>and</strong> checks host status for all SELECTED entries in your host file.<br />

Note that you DO NOT have to restart cid on de-selected hosts in order to re-select them if a cid is already running there, unless you want to change the display setup.


Using <strong>Synergy</strong><br />

The <strong>Synergy</strong> System<br />

Using <strong>Synergy</strong>’s Tuple Space Objects<br />

Using <strong>Synergy</strong>’s Pipe Objects<br />

Using <strong>Synergy</strong>’s File Objects<br />

Compiling <strong>Synergy</strong> Applications<br />

Running <strong>Synergy</strong> Applications<br />

Debugging <strong>Synergy</strong> Applications<br />


Tuple Space Object Programming<br />

A Simple Application – Hello <strong>Synergy</strong>!<br />

The first example given in most introductory computer programming books is the “Hello<br />

World!” program. To get started with <strong>Synergy</strong> programming, the “Hello <strong>Synergy</strong>!”<br />

program will be the first example. The master program (tupleHello1Master.c) simply<br />

opens a tuple space, puts the message in the tuple space <strong>and</strong> terminates. The worker<br />

programs (tupleHello1Worker.c) open the tuple space, read the message from the tuple<br />

space, display the message <strong>and</strong> terminate. The following example programs can be found<br />

in the example01 directory.<br />

The following is the tuple space “Hello <strong>Synergy</strong>!” master program:<br />

#include <stdio.h>
#include <string.h>

main(){
int tplength;     // Length of ts entry
int status;       // Return status for tuple operations
int P;            // Number of processors
int tsd;          // Problem tuple space identifier
char host[128];   // Host machine name
char tpname[20];  // Identifier of ts entry
// Message sent to workers
char sendMsg[50] = "Hello Synergy!\0";

// Get host machine name<br />

gethostname(host, sizeof(host));<br />

// Open tuple spaces<br />

printf("Master: Opening tuple space\n");<br />

// Open problem tuple space<br />

tsd = cnf_open("problem",0);<br />

printf("Master: Tuple space open complete\n");<br />

// Get number of processors<br />

P = cnf_getP();<br />

printf("Master: Processors %d\n", P);<br />

// Send 'Hello <strong>Synergy</strong>!' to problem tuple space<br />

// Set length of send entry<br />

tplength = sizeof(sendMsg);<br />

// Set name of entry to host<br />

strcpy(tpname, host);<br />

printf("Master: Putting '%s' Length %d Name %s\n",<br />

sendMsg, tplength, tpname);<br />

// Put entry in tuple space<br />


status = cnf_tsput(tsd, tpname, sendMsg, tplength);<br />

printf("Master: Put '%s' complete\n", sendMsg);<br />

// Sleep 1 second<br />

sleep(1);<br />

// Terminate program<br />

printf("Master: Terminated\n");<br />

cnf_term();
}

The following is the tuple space “Hello <strong>Synergy</strong>!” worker program:<br />

#include <stdio.h>
#include <string.h>

main(){
int tsd;          // Problem tuple space identifier
int status;       // Return status for tuple operations
int tplength;     // Length of ts entry
char host[128];   // Host machine name
char tpname[20];  // Identifier of ts entry
char recdMsg[50]; // Message received from master

// Get host machine name<br />

gethostname(host, sizeof(host));<br />

// Open tuple space<br />

printf("Worker: Opening tuple space\n");<br />

// Open problem tuple space<br />

tsd = cnf_open("problem",0);<br />

printf("Worker: Tuple space open complete\n");<br />

// Set name to any<br />

strcpy(tpname,"*");<br />

// Read problem from problem tuple space

tplength = cnf_tsread(tsd, tpname, recdMsg, 0);<br />

printf("Worker: Taking item (%s)\n", tpname);<br />

// Normal receive<br />

if (tplength > 0){<br />

printf("Worker: Took message: %s from %s\n",<br />

recdMsg, tpname);<br />

}<br />

// Terminate program<br />

printf("Worker: Terminated\n");<br />

cnf_term();
}

Before the master and worker programs can be executed, a Command Specification Language (csl) file must be created. It is also much more convenient to use a makefile to compile the programs. Examples of both are below.


The csl file for the programs is:

configuration: tupleHello1;<br />

m: master = tupleHello1Master<br />

(factor = 1<br />

threshold = 1<br />

debug = 0<br />

)<br />

-> f: problem<br />

(type = TS)<br />

-> m: worker = tupleHello1Worker<br />

(type = slave)<br />

-> f: result<br />

(type = TS)<br />

-> m: master;<br />
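Read from top to bottom, this file declares a parallel application named tupleHello1: the master program (tupleHello1Master, with factor = 1, threshold = 1 and debug = 0) is connected through a tuple space object named problem to the worker program (tupleHello1Worker, of type slave), which in turn is connected through a second tuple space object named result back to the master. The object names problem and result are exactly the names the programs pass to cnf_open().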

The makefile for the programs is:<br />

CFLAGS = -O1<br />

OBJS = -L$(SNG_PATH)/obj -lsng -lnsl -lsocket<br />

all : nxdr copy<br />

nxdr : master1 worker1<br />

master1 : tupleHello1Master.c<br />

gcc $(CFLAGS) -o tupleHello1Master tupleHello1Master.c $(OBJS)<br />

worker1 : tupleHello1Worker.c<br />

gcc $(CFLAGS) -o tupleHello1Worker tupleHello1Worker.c $(OBJS)<br />

copy : tupleHello1Master tupleHello1Worker<br />

cp tupleHello1Master $(HOME)/bin<br />

cp tupleHello1Worker $(HOME)/bin<br />

To run the “Hello <strong>Synergy</strong>!” distributed application:<br />

1. Make the executables by typing “make” <strong>and</strong> pressing the enter key.<br />

2. Run the application by typing “prun tupleHello1” <strong>and</strong> pressing the enter key.<br />

The screen output for the master terminal should resemble:<br />

[c615111@owin ~/fpc01 ]>prun tupleHello1<br />

== Checking Processor Pool:<br />

++ Benchmark (186) ++ (owin) ready.<br />

== Done.<br />

== Parallel Application Console: (owin)<br />

== CONFiguring: (tupleHello1.csl)<br />

== Default directory: (/usr/classes/cis6151/c615111/fpc01)<br />

++ Automatic program assignment: (worker)->(owin)<br />


++ Automatic program assignment: (master)->(owin)<br />

++ Automatic object assignment: (problem)->(owin) pred(1) succ(1)<br />

++ Automatic object assignment: (result)->(owin) pred(1) succ(1)<br />

== Done.<br />

== Starting Distributed Application Controller ...<br />

Verifying process [|(c615111)|*/tupleHello1Master<br />

CID verify ****'d process (bin/tupleHello1Master)<br />

Verifying process [|(c615111)|*/tupleHello1Worker<br />

CID verify ****'d process (bin/tupleHello1Worker)<br />

** (tupleHello1.prcd) verified, all components executable.<br />

CID starting object (result)<br />

CID starting object (problem)<br />

CID starting program. path (bin/tupleHello1Master)<br />

Master: Opening tuple space<br />

CID starting program. path (bin/tupleHello1Worker)<br />

Master: Tuple space open complete<br />

Master: Processors 1<br />

Master: Putting 'Hello Synergy!' Length 50 Name owin

Master: Put 'Hello Synergy!' complete

Worker: Opening tuple space<br />

** (tupleHello1.prcd) started.<br />

Worker: Tuple space open complete<br />

Worker: Taking item (owin)<br />

Worker: Took message: Hello Synergy! from owin

Worker: Terminated<br />

CID. subp(27144) terminated<br />

Setup exit status for (27144)<br />

Master: Terminated<br />

CID. subp(27143) terminated<br />

Setup exit status for (27143)<br />

CID. subp(27141) terminated<br />

Setup exit status for (27141)<br />

== (tupleHello1) completed. Elapsed [1] Seconds.<br />

CID. subp(27142) terminated<br />

Setup exit status for (27142)<br />

[c615111@owin ~/fpc01 ]><br />

The output for the worker terminal should resemble:<br />

CID verify ****'d process (bin/tupleHello1Worker)<br />

CID starting program. path (bin/tupleHello1Worker)<br />

Worker: Opening tuple space<br />

Worker: Tuple space open complete<br />

Worker: Taking item (owin)<br />

Worker: Took message: Hello Synergy! from owin

Worker: Terminated<br />

CID. subp(21015) terminated<br />

Setup exit status for (21015)<br />

The output shows <strong>Synergy</strong>’s distributed application initialization screen output, the<br />

execution screen output of the master <strong>and</strong> worker programs, <strong>and</strong> termination screen<br />

output of both programs <strong>and</strong> the distributed application.<br />


Sending <strong>and</strong> Receiving Data<br />

Hello Workers!—Hello Master!!!<br />

In this example application, the master (tupleHello2Master.c) sends the message “Hello<br />

Workers!” to all workers (tupleHello2Worker.c) <strong>and</strong> gets the response “Hello Master!!!”<br />

<strong>and</strong> the worker’s name from each worker. The source code, makefile <strong>and</strong> csl file for this<br />

application is located in the example02 directory.<br />

The following is the tuple space “Hello Workers!—Hello Master!!!” master program:<br />

#include <stdio.h>
#include <string.h>

main() {
int tplength;     // Length of ts entry
int status;       // Return status for tuple operations
int P;            // Number of processors
int i;            // Counter index
int res;          // Result tuple space identifier
int tsd;          // Problem tuple space identifier
char host[128];   // Host machine name
char tpname[20];  // Identifier of ts entry
char recdMsg[50]; // Message received from workers
// Message sent to workers
char sendMsg[50] = "Hello Workers!\0";

// Get host machine name<br />

gethostname(host, sizeof(host));<br />

// Open tuple spaces<br />

printf("Master: Opening tuple spaces\n");<br />

// Open problem tuple space<br />

tsd = cnf_open("problem",0);<br />

// Open result tuple space<br />

res = cnf_open("result",0);<br />

printf("Master: Tuple spaces open complete\n");<br />

// Get number of processors<br />

P = cnf_getP();<br />

printf("Master: Processors %d\n", P);<br />

// Send 'Hello Workers!' to problem tuple space

// Set length of send entry<br />

tplength = sizeof(sendMsg);<br />

// Set name of entry to host<br />

strcpy(tpname, host);<br />

printf("Master: Putting '%s' Length %d Name %s\n",<br />

sendMsg, tplength, tpname);<br />

// Put entry in tuple space<br />


status = cnf_tsput(tsd, tpname, sendMsg, tplength);<br />

printf("Master: Put '%s' complete\n", sendMsg);<br />

// Sleep 1 second<br />

sleep(1);<br />

// Receive 'Hello Master!!!' from each worker via the result tuple space
for(i=0; i<P; i++){
printf("Master: Waiting for reply\n");
// Set name of entry to any
strcpy(tpname, "*");
// Get entry from result tuple space
tplength = cnf_tsget(res, tpname, recdMsg, 0);
printf("Master: Taking item from %s\n", tpname);
printf("Master: Took message '%s'\n", recdMsg);
}
// Terminate program
printf("Master: Terminated\n");
cnf_term();
}

The following is the tuple space "Hello Workers!—Hello Master!!!" worker program:

#include <stdio.h>
#include <string.h>

main(){
int tsd;          // Problem tuple space identifier
int res;          // Result tuple space identifier
int status;       // Return status for tuple operations
int tplength;     // Length of ts entry
char host[128];   // Host machine name
char tpname[20];  // Identifier of ts entry
char recdMsg[50]; // Message received from master
// Message sent back to master
char sendMsg[50] = "Hello Master!!!\0";
// Get host machine name
gethostname(host, sizeof(host));
// Open tuple spaces
printf("Worker: Opening tuple spaces\n");
// Open problem tuple space
tsd = cnf_open("problem",0);
// Open result tuple space
res = cnf_open("result",0);
printf("Worker: Tuple spaces open complete\n");
// Set name to any
strcpy(tpname,"*");
// Read problem from problem tuple space
tplength = cnf_tsread(tsd, tpname, recdMsg, 0);
printf("Worker: Taking item %s\n", tpname);

// Normal receive<br />

if (tplength > 0){<br />

printf("Worker: Took message: %s from %s\n",<br />

recdMsg, tpname);<br />

// Set size of entry<br />

tplength = sizeof(sendMsg);<br />

// Set name to host<br />

sprintf(tpname,"%s", host);<br />

printf("Worker: Put '%s' Length %d Name %s\n",<br />

sendMsg, tplength, tpname);<br />

// Put response in result tuple space<br />

status = cnf_tsput(res, tpname, sendMsg, tplength);<br />

printf("Worker: Reply sent\n");<br />

}<br />

// Terminate program<br />

printf("Worker: Terminated\n");<br />

cnf_term();
}

The makefile and csl file are similar to the "Hello <strong>Synergy</strong>!" program except that all occurrences of "tupleHello1…" are changed to "tupleHello2…" in both files. To run the "Hello Workers!—Hello Master!!!" distributed application:

1. Make the executables by typing “make” <strong>and</strong> pressing the enter key.<br />

2. Run the application by typing “prun tupleHello2” <strong>and</strong> pressing the enter key.<br />

The screen output for the master terminal with <strong>Synergy</strong>’s initialization <strong>and</strong> termination<br />

output removed should resemble:<br />

[c615111@owin ~/fpc02 ]>prun tupleHello2<br />

Master: Tuple spaces open complete<br />

Master: Processors 2<br />

Master: Putting 'Hello Workers!' Length 50 Name owin<br />

Master: Put 'Hello Workers!' complete<br />

Worker: Opening tuple spaces<br />

Worker: Tuple spaces open complete<br />

Worker: Taking item owin<br />

Worker: Took message: Hello Workers! from owin

Worker: Put 'Hello Master!!!' Length 50 Name owin<br />

Worker: Reply sent<br />

Worker: Terminated<br />

Master: Waiting for reply<br />

Master: Taking item from saber<br />

Master: Took message 'Hello Master!!!'<br />

Master: Waiting for reply<br />

Master: Taking item from owin<br />

Master: Took message 'Hello Master!!!'<br />

Master: Terminated<br />

[c615111@owin ~/fpc02 ]><br />


The screen output for the worker terminal with <strong>Synergy</strong>’s initialization <strong>and</strong> termination<br />

output removed should resemble:<br />

Worker: Opening tuple spaces<br />

Worker: Tuple spaces open complete<br />

Worker: Taking item owin<br />

Worker: Took message: Hello Workers! from owin

Worker: Put 'Hello Master!!!' Length 50 Name saber<br />

Worker: Reply sent<br />

Worker: Terminated<br />


Sending <strong>and</strong> Receiving Data Types<br />

Sending Various Data Types<br />

<strong>Synergy</strong> can put <strong>and</strong> get more than characters from its tuple space. The following<br />

example shows how to put various data types into a tuple space <strong>and</strong> get various data types<br />

out of a tuple space. The master program (tuplePassMaster.c) puts different data types<br />

into the problem tuple space, <strong>and</strong> the worker (tuplePassWorker.c) gets them, displays<br />

them <strong>and</strong> puts messages in the result tuple space identifying which data types it took.<br />

This application also uses a distributed semaphore to ensure that the workers take data properly. It also demonstrates the difference between the cnf_tsread() and cnf_tsget() functions: cnf_tsread() copies a matching tuple but leaves it in the tuple space, while cnf_tsget() removes it. The tuplePass application is located in the example03 directory. The

tuplePass.h file has the definitions for the constant <strong>and</strong> the data structure used in the<br />

application.<br />
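The tuplePass.h header itself is not reproduced in this manual. From the way its definitions are used in the two programs below, it must define the constant MAX (the array length, 5 in the sample runs) and the struct person type whose fields are initialized by the master. The sketch below is only a reconstruction consistent with that usage; the field sizes are assumptions chosen so the example compiles, not the original values.

/* tuplePass.h -- sketch reconstructed from its usage in the example programs.
   MAX and the field names come from the code; the array sizes are assumptions. */
#define MAX 5

struct person {
    char name[20];      /* e.g. "Bob"           */
    char address[30];   /* e.g. "123 Broad St." */
    char city[20];
    char state[20];
    char zip[10];
    int  age;
    char eyes[20];      /* eye color            */
    float height;
    char hair[20];      /* hair color           */
};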

The following is the tuple space “data type passing” master program:<br />

#include <stdio.h>
#include <string.h>
#include "tuplePass.h"

main(){
int tplength;     // Length of ts entry
int status;       // Return status for tuple operations
int P;            // Number of processors
int i;            // Counter index
int res;          // Result tuple space identifier
int tsd;          // Problem tuple space identifier
int sem;          // Semaphore
char host[128];   // Host machine name
char tpname[20];  // Identifier of ts entry
char recdMsg[50]; // Message received from workers

// Different datatypes to send to workers<br />

// Integer sent to worker<br />

int num = 12000;<br />

int *numPtr = &num;<br />

// Long integer sent to worker<br />

long lnum = 1000000;<br />

long *lnumPtr = &lnum;<br />

// Float sent to worker<br />

float frac = 0.5;<br />

float *fracPtr = &frac;<br />

// Double sent to worker<br />

double dfrac = 12345.678;<br />

double *dfracPtr = &dfrac;<br />

// Integer array sent to worker<br />

int numArr[MAX] = {0,1,2,3,4};<br />


// Double array sent to worker<br />

double dblArr[MAX] = {10000.1234, 2000.567,<br />

300.89, 40.0, 5.01};<br />

// String sent to worker<br />

char sendMsg[50] = "A text string.\0";<br />

// Struct sent to worker<br />

struct person bob = {"Bob",<br />

"123 Broad St.",<br />

"Pliladelphia", "PA", "19124",<br />

20, "brown", 70.5, "red"};<br />

// Get host machine name<br />

gethostname(host, sizeof(host));<br />

// Open tuple spaces<br />

printf("Master: Opening tuple spaces\n");<br />

// Open problem tuple space<br />

tsd = cnf_open("problem",0);<br />

// Open result tuple space<br />

res = cnf_open("result",0);<br />

printf("Master: Tuple spaces open complete\n");<br />

// Get number of processors<br />

P = cnf_getP();<br />

printf("Master: Processors %d\n", P);<br />

// Put semaphore in problem tuple space<br />

// Set name to sem<br />

strcpy(tpname,"sem");<br />

// Set length for semaphore<br />

tplength = sizeof(int);<br />

// Place the semaphore signal in problem ts<br />

printf("Master: Putting semaphore\n");<br />

status = cnf_tsput(tsd, tpname, &sem, tplength);<br />

// Put int num in ts<br />

// Set length of send entry<br />

tplength = sizeof(int);<br />

// Set name of entry to num<br />

strcpy(tpname, "D_num");<br />

printf("Master: Putting '%d' Length %d Name %s\n",<br />

num, tplength, tpname);<br />

// Put entry in tuple space<br />

status = cnf_tsput(tsd, tpname, numPtr, tplength);<br />

printf("Master: Put '%d' complete\n", num);<br />

// Put long lnum in ts<br />

// Set length of send entry<br />

tplength = sizeof(long);<br />

// Set name of entry to lnum<br />

strcpy(tpname, "D_lnum");<br />

printf("Master: Putting '%ld' Length %d Name %s\n",<br />

lnum, tplength, tpname);<br />

// Put entry in tuple space<br />

status = cnf_tsput(tsd, tpname, lnumPtr, tplength);<br />

printf("Master: Put '%ld' complete\n", lnum);<br />


// Put float frac in ts<br />

// Set length of send entry<br />

tplength = sizeof(float);<br />

// Set name of entry to frac<br />

strcpy(tpname, "D_frac");<br />

printf("Master: Putting '%f' Length %d Name %s\n",<br />

frac, tplength, tpname);<br />

// Put entry in tuple space<br />

status = cnf_tsput(tsd, tpname, fracPtr, tplength);<br />

printf("Master: Put '%f' complete\n", frac);<br />

// Put double dfrac in ts<br />

// Set length of send entry<br />

tplength = sizeof(double);<br />

// Set name of entry to dfrac<br />

strcpy(tpname, "D_dfrac");<br />

printf("Master: Putting '%g' Length %d Name %s\n",<br />

dfrac, tplength, tpname);<br />

// Put entry in tuple space<br />

status = cnf_tsput(tsd, tpname, (char *)dfracPtr, tplength);<br />

printf("Master: Put '%g' complete\n", dfrac);<br />

// Put int array numArr in ts<br />

// Set length of send entry<br />

tplength = sizeof(int)*MAX;<br />

// Set name of entry to numArr<br />

strcpy(tpname, "D_numArr");<br />

printf("Master: Putting\n ");<br />

for(i=0; i<MAX; i++)
printf("%d ", numArr[i]);
printf("\n Length %d Name %s\n", tplength, tpname);
// Put entry in tuple space
status = cnf_tsput(tsd, tpname, numArr, tplength);
printf("Master: Put '%s' complete\n", tpname);
// Put double array dblArr in ts
// Set length of send entry
tplength = sizeof(double)*MAX;
// Set name of entry to dblArr
strcpy(tpname, "D_dblArr");
printf("Master: Putting\n ");
for(i=0; i<MAX; i++)
printf("%g ", dblArr[i]);
printf("\n Length %d Name %s\n", tplength, tpname);
// Put entry in tuple space
status = cnf_tsput(tsd, tpname, dblArr, tplength);
printf("Master: Put '%s' complete\n", tpname);
// Put struct person bob in ts
// Set length of send entry
tplength = sizeof(struct person);
// Set name of entry to bob
strcpy(tpname, "D_bob");
printf("Master: Putting\n");
printf(" %s\n", bob.name);
printf(" %s %s, %s %s\n",<br />

bob.address, bob.city, bob.state, bob.zip);<br />

printf(" %d %s %f %s\n",<br />

bob.age, bob.eyes, bob.height, bob.hair);<br />

printf(" Length %d Name %s\n", tplength, tpname);<br />

// Put entry in tuple space<br />

status = cnf_tsput(tsd, tpname, bob, tplength);<br />

printf("Master: Put struct bob complete\n");<br />

// Put string in ts<br />

// Set length of send entry<br />

tplength = sizeof(sendMsg);<br />

// Set name of entry to msg<br />

strcpy(tpname, "D_msg");<br />

printf("Master: Putting '%s' Length %d Name %s\n",<br />

sendMsg, tplength, tpname);<br />

// Put entry in tuple space<br />

status = cnf_tsput(tsd, tpname, sendMsg, tplength);<br />

printf("Master: Put '%s' complete\n", sendMsg);<br />

// Receive results from result tuple space<br />

for(i=0; i<8; i++){
printf("Master: Waiting for reply\n");
// Set name of entry to any
strcpy(tpname, "*");
// Get entry from result tuple space
tplength = cnf_tsget(res, tpname, recdMsg, 0);
printf("Master: Taking item from %s\n", tpname);
printf("Master: %s took '%s'\n", tpname, recdMsg);
}
// Put the terminal signal in the problem tuple space
printf("Master: Putting terminal signal in problem ts\n");
strcpy(tpname, "D_term");
tplength = sizeof(int);
status = cnf_tsput(tsd, tpname, &sem, tplength);
printf("Master: Put terminal in ts\n");
// Terminate program
printf("Master: Terminated\n");
cnf_term();
}

The following is the tuple space "data type passing" worker program:

#include <stdio.h>
#include <string.h>
#include "tuplePass.h"

main(){
int tsd;          // Problem tuple space identifier
int res;          // Result tuple space identifier

int status; // Return status for tuple operations<br />

int tplength; // Length of ts entry<br />

int i;            // Counter index

int sem = 0; // Semaphore<br />

char host[128]; // Host machine name<br />

char tpname[20]; // Identifier of ts entry<br />

char sendMsg[50]; // Message sent back to master<br />

// Different datatypes to receive from master<br />

// Integer received from master<br />

int num;<br />

// Long integer received from master<br />

long lnum;<br />

// Float received from master<br />

float frac;<br />

// Double received from master<br />

double dfrac;<br />

// Integer array received from master<br />

int numArr[MAX];<br />

// Double array received from master<br />

double dblArr[MAX];<br />

// String received from master<br />

char recdMsg[50];<br />

// Struct received from master<br />

struct person bob;<br />

// Initialize sendMsg<br />

strcpy(sendMsg, "");<br />

// Get host machine name<br />

gethostname(host, sizeof(host));<br />

// Open tuple spaces<br />

printf("Worker: Opening tuple spaces\n");<br />

// Open problem tuple space<br />

tsd = cnf_open("problem",0);<br />

// Open result tuple space<br />

res = cnf_open("result",0);<br />

printf("Worker: Tuple spaces open complete\n");<br />

while(1){<br />

// Set name to sem<br />

strcpy(tpname,"sem");<br />

// Read semaphore from problem tuple space<br />

tplength = cnf_tsget(tsd, tpname, &sem, 0);<br />

printf("Worker: Taking semaphore\n");<br />

// Set name to any<br />

strcpy(tpname,"D_*");<br />

tplength = cnf_tsread(tsd, tpname, recdMsg, 0);<br />

printf("Worker: Taking item %s\n", tpname);<br />

// Get int num from ts<br />

if(!strcmp(tpname, "D_num")){<br />


// Read problem from problem tuple space<br />

tplength = cnf_tsget(tsd, tpname, &num, 0);<br />

// Record the data type received<br />

strcpy(sendMsg, tpname);<br />

// Display the data<br />

printf("Worker: took %s '%d'\n", tpname, num);<br />

// Send reply back to master<br />

// Set size of entry<br />

tplength = sizeof(sendMsg);<br />

// Set name to host<br />

strcpy(tpname, host);<br />

printf("Worker: Put '%s' Length %d Name %s\n",<br />

sendMsg, tplength, tpname);<br />

// Put response in result tuple space<br />

status = cnf_tsput(res, tpname, sendMsg, tplength);<br />

printf("Worker: Reply sent\n");<br />

// Get int lnum from ts<br />

else if(!strcmp(tpname, "D_lnum")){<br />

// Read problem from problem tuple space<br />

tplength = cnf_tsget(tsd, tpname, &lnum, 0);<br />

// Record the data type received

strcpy(sendMsg, tpname);<br />

// Display the data<br />

printf("Worker: took %s '%ld'\n", tpname, lnum);<br />

// Send reply back to master<br />

// Set size of entry<br />

tplength = sizeof(sendMsg);<br />

// Set name to host<br />

strcpy(tpname, host);<br />

printf("Worker: Put '%s' Length %d Name %s\n",<br />

sendMsg, tplength, tpname);<br />

// Put response in result tuple space<br />

status = cnf_tsput(res, tpname, sendMsg, tplength);<br />

printf("Worker: Reply sent\n");<br />

}<br />

// Get int frac from ts<br />

else if(!strcmp(tpname, "D_frac")){<br />

// Read problem from problem tuple space<br />

tplength = cnf_tsget(tsd, tpname, &frac, 0);<br />

// Record the data type received<br />

strcpy(sendMsg, tpname);<br />

// Display the data<br />

printf("Worker: took %s '%f'\n", tpname, frac);<br />

// Send reply back to master<br />

// Set size of entry<br />

tplength = sizeof(sendMsg);<br />

// Set name to host<br />

strcpy(tpname, host);<br />

printf("Worker: Put '%s' Length %d Name %s\n",<br />

sendMsg, tplength, tpname);<br />

// Put response in result tuple space<br />

status = cnf_tsput(res, tpname, sendMsg, tplength);<br />

printf("Worker: Reply sent\n");<br />


}<br />

// Get double dfrac from ts<br />

else if(!strcmp(tpname, "D_dfrac")){<br />

// Read problem from problem tuple space<br />

tplength = cnf_tsget(tsd, tpname, &dfrac, 0);<br />

// Record the data type received<br />

strcpy(sendMsg, tpname);<br />

// Display the data<br />

printf("Worker: took (%s) '%g'\n", tpname, dfrac);<br />

// Send reply back to master<br />

// Set size of entry<br />

tplength = sizeof(sendMsg);<br />

// Set name to host<br />

strcpy(tpname, host);<br />

printf("Worker: Put '%s' Length %d Name %s\n",<br />

sendMsg, tplength, tpname);<br />

// Put response in result tuple space<br />

status = cnf_tsput(res, tpname, sendMsg, tplength);<br />

printf("Worker: Reply sent\n");<br />

}<br />

// Get integer array numArr<br />

else if(!strcmp(tpname, "D_numArr")){<br />

// Read problem from problem tuple space<br />

tplength = cnf_tsget(tsd, tpname, numArr, 0);<br />

// Record the data type received<br />

strcpy(sendMsg, tpname);<br />

// Display the data<br />

printf("Worker: took %s\n ", tpname);<br />

for(i=0; i<MAX; i++)
printf("%d ", numArr[i]);
printf("\n Length(%d) Name(%s)\n", tplength, tpname);
// Send reply back to master
// Set size of entry
tplength = sizeof(sendMsg);
// Set name to host
strcpy(tpname, host);
printf("Worker: Put '%s' Length %d Name %s\n",
sendMsg, tplength, tpname);
// Put response in result tuple space
status = cnf_tsput(res, tpname, sendMsg, tplength);
printf("Worker: Reply sent\n");
}
// Get double array dblArr
else if(!strcmp(tpname, "D_dblArr")){
// Read problem from problem tuple space
tplength = cnf_tsget(tsd, tpname, dblArr, 0);
// Record the data type received
strcpy(sendMsg, tpname);
// Display the data
printf("Worker: took %s\n ", tpname);
for(i=0; i<MAX; i++)
printf("%g ", dblArr[i]);
printf("\n Length %d Name %s\n", tplength, tpname);

// Send reply back to master<br />

// Set size of entry<br />

tplength = sizeof(sendMsg);<br />

// Set name to host<br />

strcpy(tpname, host);<br />

printf("Worker: Put '%s' Length %d Name %s\n",<br />

sendMsg, tplength, tpname);<br />

// Put response in result tuple space<br />

status = cnf_tsput(res, tpname, sendMsg, tplength);<br />

printf("Worker: Reply sent\n");<br />

// Get struct person bob<br />

else if(!strcmp(tpname, "D_bob")){<br />

// Read problem from problem tuple space<br />

tplength = cnf_tsget(tsd, tpname, &bob, 0);<br />

// Record the data type received<br />

strcpy(sendMsg, tpname);<br />

// Display the data<br />

printf("Worker: took\n");<br />

printf(" %s\n", bob.name);<br />

printf(" %s %s, %s %s\n", bob.address,<br />

bob.city, bob.state, bob.zip);<br />

printf(" %d %s %f %s\n", bob.age, bob.eyes,<br />

bob.height, bob.hair);<br />

printf(" Length %d Name %s\n", tplength, tpname);<br />

// Send reply back to master<br />

// Set size of entry<br />

tplength = sizeof(sendMsg);<br />

// Set name to host<br />

strcpy(tpname, host);<br />

printf("Worker: Put '%s' Length %d Name %s\n",<br />

sendMsg, tplength, tpname);<br />

// Put response in result tuple space<br />

status = cnf_tsput(res, tpname, sendMsg, tplength);<br />

printf("Worker: Reply sent\n");<br />

}<br />

// Get string<br />

else if(!strcmp(tpname, "D_msg")){<br />

// Read problem from problem tuple space<br />

tplength = cnf_tsget(tsd, tpname, recdMsg, 0);<br />

// Record the data type received<br />

strcpy(sendMsg, tpname);<br />

// Display the data<br />

printf("Worker: took %s '%s'\n", tpname, recdMsg);<br />

// Send reply back to master<br />

// Set size of entry<br />

tplength = sizeof(sendMsg);<br />

// Set name to host<br />

strcpy(tpname, host);<br />

printf("Worker: Put '%s' Length %d Name %s\n",<br />

sendMsg, tplength, tpname);<br />

// Put response in result tuple space<br />

status = cnf_tsput(res, tpname, sendMsg, tplength);<br />

printf("Worker: Reply sent\n");<br />


}<br />

// Get terminal<br />

else if(!strcmp(tpname, "D_term")){<br />

printf("Worker: Received terminal\n");<br />

// Set name to sem<br />

strcpy(tpname,"sem");<br />

// Set length for semaphore<br />

tplength = sizeof(int);<br />

// Replace the semaphore signal in problem ts<br />

printf("Worker: Putting semaphore\n");<br />

status = cnf_tsput(tsd, tpname, &sem, tplength);<br />

break;<br />

}<br />

// Set name to sem<br />

strcpy(tpname,"sem");<br />

// Set length for semaphore<br />

tplength = sizeof(int);<br />

// Replace the semaphore signal in problem ts<br />

printf("Worker: Putting semaphore\n");<br />

status = cnf_tsput(tsd, tpname, &sem, tplength);<br />

// Sleep 1 second<br />

sleep(1);<br />

}<br />

// Terminate program<br />

printf("Worker: Terminated\n");<br />

cnf_term();<br />

}<br />

The makefile <strong>and</strong> csl file are similar to the last two applications except in the naming of<br />

the application objects <strong>and</strong> files. To run the data passing distributed application:<br />

1. Make the executables by typing “make” <strong>and</strong> pressing the enter key.<br />

2. Run the application by typing “prun tuplePass” <strong>and</strong> pressing the enter key.<br />

The screen output for the master terminal with <strong>Synergy</strong>’s initialization <strong>and</strong> termination<br />

output removed should resemble:<br />

[c615111@owin ~/fpc03 ]>prun tuplePass2<br />

Master: Opening tuple spaces<br />

Master: Tuple spaces open complete<br />

Master: Processors 2<br />

Master: Putting semaphore<br />

Master: Putting '12000' Length 4 Name D_num<br />

Master: Put '12000' complete<br />

Master: Putting '1000000' Length 4 Name D_lnum<br />

Master: Put '1000000' complete<br />

Master: Putting '0.500000' Length 4 Name D_frac<br />


Master: Put '0.500000' complete<br />

Master: Putting '12345.7' Length 8 Name D_dfrac<br />

Master: Put '12345.7' complete<br />

Master: Putting<br />

0 1 2 3 4<br />

Length 20 Name D_numArr<br />

Master: Put 'D_numArr' complete<br />

Master: Putting<br />

10000.1 2000.57 300.89 40 5.01<br />

Length 40 Name D_dblArr<br />

Master: Put 'D_dblArr' complete<br />

Master: Putting<br />

Bob<br />

123 Broad St. Philadelphia, PA 19124

20 brown 70.500000 red<br />

Length 164 Name D_bob<br />

Master: Put struct bob complete<br />

Master: Putting 'A text string.' Length 50 Name D_msg<br />

Master: Put 'A text string.' complete<br />

Master: Waiting for reply<br />

Master: Taking item from saber<br />

Master: saber took 'D_num'<br />

Master: Waiting for reply<br />

Worker: Opening tuple spaces<br />

Worker: Tuple spaces open complete<br />

Worker: Taking semaphore<br />

Worker: Taking item D_lnum<br />

Worker: took D_lnum '1000000'<br />

Worker: Put 'D_lnum' Length 50 Name owin<br />

Master: Taking item from owin<br />

Master: owin took 'D_lnum'<br />

Master: Waiting for reply<br />

Worker: Reply sent<br />

Worker: Putting semaphore<br />

Master: Taking item from saber<br />

Master: saber took 'D_frac'<br />

Master: Waiting for reply<br />

Worker: Taking semaphore<br />

Worker: Taking item D_dfrac<br />

Worker: took (D_dfrac) '12345.7'<br />

Worker: Put 'D_dfrac' Length 50 Name owin<br />

Master: Taking item from owin<br />

Master: owin took 'D_dfrac'<br />

Master: Waiting for reply<br />

Worker: Reply sent<br />

Worker: Putting semaphore<br />

Master: Taking item from saber<br />

Master: saber took 'D_numArr'<br />

Master: Waiting for reply<br />

Worker: Taking semaphore<br />

Worker: Taking item D_dblArr<br />

Worker: took D_dblArr<br />

10000.1 2000.57 300.89 40 5.01<br />

Length 40 Name D_dblArr<br />

Worker: Put 'D_dblArr' Length 50 Name owin<br />

Worker: Reply sent<br />


Worker: Putting semaphore<br />

Master: Taking item from owin<br />

Master: owin took 'D_dblArr'<br />

Master: Waiting for reply<br />

Master: Taking item from saber<br />

Master: saber took 'D_bob'<br />

Master: Waiting for reply<br />

Worker: Taking semaphore<br />

Worker: Taking item D_msg<br />

Worker: took D_msg 'A text string.'<br />

Worker: Put 'D_msg' Length 50 Name owin<br />

Worker: Reply sent<br />

Worker: Putting semaphore<br />

Master: Taking item from owin<br />

Master: owin took 'D_msg'<br />

Master: Putting terminal signal in problem ts<br />

Master: Put terminal in ts<br />

Master: Terminated<br />

Worker: Taking semaphore<br />

Worker: Taking item D_term<br />

Worker: Received terminal<br />

Worker: Putting semaphore<br />

Worker: Terminated<br />

The screen output for the worker terminal with <strong>Synergy</strong>’s initialization <strong>and</strong> termination<br />

output removed should resemble:<br />

Worker: Opening tuple spaces<br />

Worker: Tuple spaces open complete<br />

Worker: Taking semaphore<br />

Worker: Taking item D_num<br />

Worker: took D_num '12000'<br />

Worker: Put 'D_num' Length 50 Name saber<br />

Worker: Reply sent<br />

Worker: Putting semaphore<br />

Worker: Taking semaphore<br />

Worker: Taking item D_frac<br />

Worker: took D_frac '0.500000'<br />

Worker: Put 'D_frac' Length 50 Name saber<br />

Worker: Reply sent<br />

Worker: Putting semaphore<br />

Worker: Taking semaphore<br />

Worker: Taking item D_numArr<br />

Worker: took D_numArr<br />

0 1 2 3 4<br />

Length(20) Name(D_numArr)<br />

Worker: Put 'D_numArr' Length 50 Name saber<br />

Worker: Reply sent<br />

Worker: Putting semaphore<br />

Worker: Taking semaphore<br />

Worker: Taking item D_bob<br />

Worker: took<br />

Bob<br />

123 Broad St. Philadelphia, PA 19124<br />


20 brown 70.500000 red<br />

Length 164 Name D_bob<br />

Worker: Put 'D_bob' Length 50 Name saber<br />

Worker: Reply sent<br />

Worker: Putting semaphore<br />

Worker: Taking semaphore<br />

Worker: Taking item D_term<br />

Worker: Received terminal<br />

Worker: Putting semaphore<br />

Worker: Terminated<br />


Getting Workers to Work<br />

Sum of First N Integers<br />

The sum of the first n integers, ∑i for i = 1 to n (that is, 1 + 2 + … + n), can easily be calculated in a regular computer program. An ANSI C program would be:

#include <stdio.h>

#define N 6

int main(){
    int i;
    int sum = 0;
    for(i=N; i>=1; i--)
        sum += i;
    printf("The sum of the first %d integers is %d\n", N, sum);
    return 0;
}

This problem can easily be performed in a parallel program by having the master (tupleSum1Master.c) put each integer into the problem tuple space. The workers (tupleSum1Worker.c) take the integers out of the problem tuple space, tally their respective sub sums and put the sub sums into the result tuple space. The master gets the sub sums from the result tuple space and produces the desired sum. This application is located in the example04 directory.
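For example, with maxNum = 6 and two workers, one worker might take the tuples 6, 4 and 2 and report the sub sum 12, while the other takes 5, 3 and 1 and reports 9; the master then adds 12 + 9 = 21, which agrees with the closed form n(n+1)/2 = (6 × 7)/2 = 21 that the master also computes as a check. This is exactly what happens in the sample run shown later in this section.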

The following is the tuple space sum of n integers master program:<br />


#include <stdio.h>
#include <string.h>

main(){
int P;                   // Number of processors
int i;                   // Counter index
int status;              // Return status for tuple operations
int res;                 // Result tuple space identifier
int tsd;                 // Problem tuple space identifier
int maxNum = 6;          // MAX of n for sum of 1..n
int sendNum = 0;         // Number sent to problem ts
int *sendPtr = &sendNum; // Pointer to sendNum
int recdSum = 0;         // Subsum received from result ts
int *recdPtr = &recdSum; // Pointer to recdSum
int calcSum = 0;         // Calculated sum
int sumTotal = 0;        // Sum total of all subsums
int tplength;            // Length of ts entry
char tpname[20];         // Identifier of ts entry
char host[128];          // Host machine name

// Get host machine name<br />

gethostname(host, sizeof(host));<br />

// Open tuple spaces<br />

printf("Master: Opening tuple spaces\n");<br />

// Open problem tuple space<br />

tsd = cnf_open("problem", 0);<br />

// Open result tuple space<br />

res = cnf_open("result", 0);<br />

printf("Master: Tuple spaces open complete\n");<br />

// Get number of processors<br />

P = cnf_getP();<br />

printf("Master: Processors %d\n", P);<br />

// Send integers to problem tuple space<br />

// Set length of entry<br />

tplength = sizeof(int);<br />

printf("Master: tplength = (%d)\n", tplength);<br />

// Set maximum n<br />

sendNum = maxNum;<br />

printf("Master: Putting 1...%d to problem tuple space\n", maxNum);<br />

// Loop until all numbers are sent to workers<br />

while (sendNum > 0) {<br />

printf("Master: Putting %d\n", sendNum);<br />

// Set name of entry<br />

sprintf(tpname,"%d", sendNum);<br />

// Put entry in problem tuple space<br />

status = cnf_tsput(tsd, tpname, (char *)sendPtr, tplength);<br />

// Decrement number to set entry value<br />

sendNum--;<br />

}<br />

printf("Master: Finished sending 1...%d to tuple space\n", maxNum);<br />

// Insert negative integer tuple as termination signal<br />

printf("Master: Sending terminal signal\n");<br />

// Set length of entry<br />

tplength = sizeof(int);<br />

// Set entry value<br />

sendNum = -1;<br />

// Set entry name<br />

sprintf(tpname, "%d", maxNum+1);<br />

// Put entry in problem tuple space<br />

status = cnf_tsput(tsd, tpname, (char *)sendPtr, tplength);<br />

printf("Master: Finished sending terminal signal\n");<br />

// Receive sub sums from result tuple space<br />

i = 1;<br />

printf("Master: Getting sub sums from result tuple space\n");<br />

while (i


<strong>Synergy</strong> <strong>User</strong> <strong>Manual</strong> <strong>and</strong> <strong>Tutorial</strong><br />

}<br />

tplength = cnf_tsget(res, tpname, (char *)recdPtr, 0);<br />

printf("Master: Received %d from %s\n", recdSum, tpname);<br />

// Add result to total<br />

sumTotal += recdSum;<br />

// Increment counter<br />

i++;<br />

}<br />

printf("Master: The sum total is: %d\n", sumTotal);<br />

// Calculate correct answer with math formula<br />

calcSum = (maxNum*(maxNum+1))/2;<br />

printf ("Master: The calculated sum is: %d\n", calcSum);<br />

// Compare results<br />

if(calcSum == sumTotal)<br />

printf("Master: The workers gave the correct answer\n");<br />

else<br />

printf("Master: The workers gave an incorrect answer\n");<br />

// Terminate program<br />

printf("Master: Terminated\n");<br />

cnf_term();
}

The following is the tuple space sum of n integers worker program:<br />

#include <stdio.h>
#include <string.h>

main(){
// Variable declarations
int tsd;                 // Problem tuple space identifier
int res;                 // Result tuple space identifier
int recdNum = 0;         // Number received to be added
int *recdPtr = &recdNum; // Pointer to recdNum
int sendSum = 0;         // Sum of numbers received
int *sendPtr = &sendSum; // Pointer to sendSum
int status;              // Return status for tuple operations
int tplength;            // Length of ts entry
char tpname[20];         // Identifier of ts entry
char host[128];          // Host machine name

// Get host machine name<br />

gethostname(host, sizeof(host));<br />

// Open tuple spaces<br />

printf("Worker: Opening tuple spaces\n");<br />

// Open problem tuple space<br />

tsd = cnf_open("problem", 0);<br />

// Open result tuple space<br />

res = cnf_open("result", 0);<br />

printf("Worker: Tuple spaces open complete\n");<br />

// Loop forever to accumulate sendSum<br />


printf("Worker: Beginning to accumulate sum\n");<br />

while(1){<br />

// Set name to any<br />

strcpy(tpname, "*");<br />

// Get problem from tuple space<br />

tplength = cnf_tsget(tsd, tpname, (char *)recdPtr, 0);<br />

printf("Worker: Took item %s\n", tpname);<br />

// If normal receive<br />

if(recdNum > 0){<br />

// Add to sum<br />

sendSum += recdNum;<br />

printf("Worker: Present subtotal is %d\n", sendSum);<br />

}<br />

// Else terminate worker<br />

else{<br />

printf("Worker: Received terminal signal\n");<br />

// Put terminal message back in problem tuple space<br />

status = cnf_tsput(tsd, tpname, (char *)recdPtr, tplength);<br />

// Set length of entry<br />

tplength = sizeof(int);<br />

// Set name of entry to host<br />

sprintf(tpname,"%s", host);<br />

printf("Worker: Sending sum %d\n", sendSum);<br />

// Put sum in result tuple space<br />

status = cnf_tsput(res, tpname, (char *)sendPtr, tplength);<br />

// Terminate worker<br />

printf("Worker: Terminated\n");<br />

cnf_term();<br />

}<br />

// Sleep 1 second<br />

sleep(1);<br />

}
}

To run the sum of first n integers distributed application:<br />

1. Make the executables by typing “make” <strong>and</strong> pressing the enter key.<br />

2. Run the application by typing “prun tupleSum1” <strong>and</strong> pressing the enter key.<br />

The screen output for the master terminal with <strong>Synergy</strong>’s initialization <strong>and</strong> termination<br />

output removed should resemble:<br />

[c615111@owin ~/fpc04 ]>prun tupleSum1<br />

Master: Opening tuple spaces<br />

Master: Tuple spaces open complete<br />

Master: Processors 2<br />

Master: tplength = (4)<br />

Master: Putting 1...6 to problem tuple space<br />

Master: Putting 6<br />

Master: Putting 5<br />

Master: Putting 4<br />

Master: Putting 3<br />

Master: Putting 2<br />


Master: Putting 1<br />

Master: Finished sending 1...6 to tuple space<br />

Master: Sending terminal signal<br />

Master: Finished sending terminal signal<br />

Master: Getting sub sums from result tuple space<br />

Worker: Opening tuple spaces<br />

Worker: Tuple spaces open complete<br />

Worker: Beginning to accumulate sum<br />

Worker: Took item 5<br />

Worker: Present subtotal is 5<br />

Worker: Took item 3<br />

Worker: Present subtotal is 8<br />

Worker: Took item 1<br />

Worker: Present subtotal is 9<br />

Master: Received 12 from saber<br />

Worker: Took item 7<br />

Worker: Received terminal signal<br />

Worker: Sending sum 9<br />

Worker: Terminated<br />

Master: Received 9 from owin<br />

Master: The sum total is: 21<br />

Master: The calculated sum is: 21<br />

Master: The workers gave the correct answer<br />

Master: Terminated<br />

[c615111@owin ~/fpc04 ]><br />

The screen output for the worker terminal with <strong>Synergy</strong>’s initialization <strong>and</strong> termination<br />

output removed should resemble:<br />

Worker: Tuple spaces open complete<br />

Worker: Beginning to accumulate sum<br />

Worker: Took item 6<br />

Worker: Present subtotal is 6<br />

Worker: Took item 4<br />

Worker: Present subtotal is 10<br />

Worker: Took item 2<br />

Worker: Present subtotal is 12<br />

Worker: Took item 7<br />

Worker: Received terminal signal<br />

Worker: Sending sum 12<br />

Worker: Terminated<br />

Matrix Multiplication<br />

Matrix multiplication, A ⋅ B = C, can be performed by a traditional C program using the<br />

following function:<br />

void multIntMats(int A[N][N], int B[N][N], int C[N][N]){
int i=0, j=0, k=0;
for(i=0; i<N; i++){
for(j=0; j<N; j++){
C[i][j] = 0;
for(k=0; k<N; k++){
C[i][j] += A[i][k] * B[k][j];
}
}
}
}

The procedure multiplies an array (or vector) by a matrix. An example of this procedure<br />

is:<br />

A0 = ( 1 0 1 0 0 0 )

         0  0  1  0 -1  0
         0  0  0  1  0 -1
B  =     1  0 -1  0  1  0
         0  1  0 -1  0  1
        -1  0  1  0  0  0
         0 -1  0  1  0  0

C0 = A0 ⋅ B = ( 1 0 0 0 0 0 )
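The row-times-matrix step illustrated above is the unit of work that each worker performs in the distributed version below. The small stand-alone program that follows shows just that step for the example row A0 and matrix B; the function name multRowByMat is illustrative only and is not part of the <strong>Synergy</strong> library.

#include <stdio.h>

#define N 6

/* Multiply one row Ai of A by the matrix B, producing one row Ci of C. */
void multRowByMat(int Ai[N], int B[N][N], int Ci[N]){
    int j, k;
    for(j = 0; j < N; j++){
        Ci[j] = 0;
        for(k = 0; k < N; k++)
            Ci[j] += Ai[k] * B[k][j];
    }
}

int main(){
    int j;
    int A0[N] = {1,0,1,0,0,0};           /* first row of A from the example */
    int B[N][N] = {{ 0, 0, 1, 0,-1, 0},  /* the B matrix from the example   */
                   { 0, 0, 0, 1, 0,-1},
                   { 1, 0,-1, 0, 1, 0},
                   { 0, 1, 0,-1, 0, 1},
                   {-1, 0, 1, 0, 0, 0},
                   { 0,-1, 0, 1, 0, 0}};
    int C0[N];

    multRowByMat(A0, B, C0);
    for(j = 0; j < N; j++)               /* prints: 1 0 0 0 0 0 */
        printf("%d ", C0[j]);
    printf("\n");
    return 0;
}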

The master will know which row to put the Ci results in because the tuple name (the i)<br />

will be the row number, which is also the tuple entry name. The multiplication of A <strong>and</strong><br />

B after the results were taken out of the result tuple space <strong>and</strong> assembled by the master<br />

would be:

         1 0 0 0 0 0
         0 1 0 0 0 0
C  =     0 0 1 0 0 0
         0 0 0 1 0 0
         0 0 0 0 1 0
         0 0 0 0 0 1

Notice that the multiplication produces the identity matrix. The B matrices used in<br />

examples are intentionally set to be the inverse of their respective A matrices to<br />

demonstrate that the programs actually work. The files for this application are located in<br />

the example05 directory. The master program for the matrix multiplication is:<br />

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define N 6

main(){
int i, j;            // Matrix indices
int tplength;        // Length of ts entry
int status;          // Return status for tuple operations
int P;               // Number of processors
int res;             // Result tuple space identifier
int tsd;             // Problem tuple space identifier
int n;               // Counter
int Ai[N];           // Row from A to send to worker
int Ci[N];           // Row from C to get from worker
char host[128];      // Host machine name
char tpname[20];     // Identifier of ts entry
// The A matrix to break up into arrays
// and send to workers

int A[N][N] = {{1,0,1,0,0,0},<br />

{0,1,0,1,0,0},<br />

{1,0,1,0,1,0},<br />

{0,1,0,1,0,1},<br />

{0,0,1,0,1,0},<br />

{0,0,0,1,0,1}};<br />

// The B matrix to send to workers<br />

int B[N][N] = {{0,0,1,0,-1,0},<br />

{0,0,0,1,0,-1},<br />

{1,0,-1,0,1,0},<br />

{0,1,0,-1,0,1},<br />

{-1,0,1,0,0,0},<br />

{0,-1,0,1,0,0}};<br />

// The C matrix built from arrays<br />

// received from workers<br />

int C[N][N];<br />

printf("Master: started\n");<br />

// Get host machine name<br />

gethostname(host, sizeof(host));<br />

// Open tuple spaces<br />

printf("Master: Opening tuple spaces\n");<br />

// Open problem tuple space<br />

tsd = cnf_open("problem",0);<br />

// Open result tuple space<br />

res = cnf_open("result",0);<br />

printf("Master: Tuple spaces open complete\n");<br />

// Get number of processors<br />

P = cnf_getP(); // Get number of processors<br />

printf("Master: Processors %d\n", P);<br />

// Print matrix A <strong>and</strong> B<br />

printf("Master: Matrix A\n");<br />

for(i=0; i


<strong>Synergy</strong> <strong>User</strong> <strong>Manual</strong> <strong>and</strong> <strong>Tutorial</strong><br />

printf("\n");<br />

}<br />

printf("Master: Matrix B\n");<br />

for(i=0; i


<strong>Synergy</strong> <strong>User</strong> <strong>Manual</strong> <strong>and</strong> <strong>Tutorial</strong><br />

}<br />

// Print the C matrix from workers<br />

printf("Master: Matrix C\n");<br />

for(i=0; i


<strong>Synergy</strong> <strong>User</strong> <strong>Manual</strong> <strong>and</strong> <strong>Tutorial</strong><br />

// Set name to B<br />

strcpy(tpname,"B");<br />

// Read B matrix from problem tuple space<br />

status = cnf_tsread(tsd, tpname, B, 0);<br />

tplength = (N*N)*sizeof(double);<br />

printf("Worker: Matrix B\n");<br />

for(i=0; i<N; i++){
for(j=0; j<N; j++)
printf("%d ", B[i][j]);
printf("\n");
}
// If the B matrix was received
if (status > 0){
// Loop taking rows of A until the terminal tuple arrives
while(1){
// Set name to match any row of A
strcpy(tpname, "A*");
// Get a row of A from the problem tuple space
tplength = cnf_tsget(tsd, tpname, Ai, 0);
printf("Worker: Taking item %s\n", tpname);
// A zero length tuple is the termination signal
if (tplength <= 0){
// Put the terminal tuple back for the other workers
status = cnf_tsput(tsd, tpname, Ai, 0);
printf("Worker: Terminated\n");
cnf_term();
}
for(j=0; j<N; j++)
printf("%d ", Ai[j]);
printf("\n");
// Compute Ci = Ai . B and return it under the same name
printf("Worker : Array C%s ", tpname);
for(j=0; j<N; j++){
Ci[j] = 0;
for(i=0; i<N; i++)
Ci[j] += Ai[i] * B[i][j];
printf("%d ", Ci[j]);
}
printf("\n");

status = cnf_tsput(res, tpname, Ci, tplength);<br />

sleep(1);<br />

}<br />

}<br />

// Else a zero length tuple was received<br />

else{<br />

printf("Worker: Error-received zero length tuple");<br />

printf("Worker: Terminated\n");<br />

cnf_term();<br />

}
}

To run the matrix multiplication distributed application:<br />

1. Make the executables by typing “make” <strong>and</strong> pressing the enter key.<br />

2. Run the application by typing “prun tupleMat1” <strong>and</strong> pressing the enter key.<br />

The screen output for the master terminal with <strong>Synergy</strong>’s initialization <strong>and</strong> termination<br />

output removed should resemble:<br />

[c615111@owin ~/fpc05 ]>prun tupleMat1<br />

Master: Tuple spaces open complete<br />

Master: Processors 2<br />

Master: Matrix A<br />

1 0 1 0 0 0<br />

0 1 0 1 0 0<br />

1 0 1 0 1 0<br />

0 1 0 1 0 1<br />

0 0 1 0 1 0<br />

0 0 0 1 0 1<br />

Master: Matrix B<br />

0 0 1 0 -1 0<br />

0 0 0 1 0 -1<br />

1 0 -1 0 1 0<br />

0 1 0 -1 0 1<br />

-1 0 1 0 0 0<br />

0 -1 0 1 0 0<br />

Master: Starting C = A . B<br />

Master: Putting Length 144 Name B<br />

Master: tplength = 24<br />

Master: Putting item A0 1 0 1 0 0 0<br />

Master: Putting item A1 0 1 0 1 0 0<br />

Master: Putting item A2 1 0 1 0 1 0<br />

Master: Putting item A3 0 1 0 1 0 1<br />

Master: Putting item A4 0 0 1 0 1 0<br />

Master: Putting item A5 0 0 0 1 0 1<br />

Worker: Opening tuple spaces<br />

Worker: Tuple spaces open complete<br />

Worker: Matrix B<br />

0 0 1 0 -1 0<br />

0 0 0 1 0 -1<br />

Master: Received 0<br />


1 0 -1 0 1 0<br />

0 1 0 -1 0 1<br />

-1 0 1 0 0 0<br />

0 -1 0 1 0 0<br />

Worker: Taking item A1<br />

0 1 0 1 0 0<br />

Worker : Array CA1 0 1 0 0 0 0<br />

Master: Received 1<br />

Worker: Taking item A2<br />

1 0 1 0 1 0<br />

Worker : Array CA2 0 0 1 0 0 0<br />

Master: Received 2<br />

Master: Received 3<br />

Worker: Taking item A4<br />

0 0 1 0 1 0<br />

Worker : Array CA4 0 0 0 0 1 0<br />

Master: Received 4<br />

Master: Received 5<br />

Master: Matrix C<br />

1 0 0 0 0 0<br />

0 1 0 0 0 0<br />

0 0 1 0 0 0<br />

0 0 0 1 0 0<br />

0 0 0 0 1 0<br />

0 0 0 0 0 1<br />

Master: Putting terminal signal<br />

Master: Terminated<br />

Worker: Taking item A6<br />

Worker: Terminated<br />

[c615111@owin ~/fpc05 ]><br />

The screen output for the worker terminal with <strong>Synergy</strong>’s initialization <strong>and</strong> termination<br />

output removed should resemble:<br />

Worker: Opening tuple spaces<br />

Worker: Tuple spaces open complete<br />

Worker: Matrix B<br />

0 0 1 0 -1 0<br />

0 0 0 1 0 -1<br />

1 0 -1 0 1 0<br />

0 1 0 -1 0 1<br />

-1 0 1 0 0 0<br />

0 -1 0 1 0 0<br />

Worker: Taking item A0<br />

1 0 1 0 0 0<br />

Worker : Array CA0 1 0 0 0 0 0<br />

Worker: Taking item A3<br />

0 1 0 1 0 1<br />

Worker : Array CA3 0 0 0 1 0 0<br />

Worker: Taking item A5<br />

0 0 0 1 0 1<br />

Worker : Array CA5 0 0 0 0 0 1<br />

Worker: Taking item A6<br />

Worker: Terminated<br />


Work Distribution by Chunking<br />

Finding the Sum of the First n Integers with Chunking<br />
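Rather than putting one integer per tuple as in the previous example, the master here packs several integers into each tuple: it obtains the chunk size with cnf_getf() (presumably the factor value from the application's csl file), publishes that size in the problem tuple space so the workers know how many numbers to expect, and then sends the N = 32 integers a chunk at a time. With a chunk size of, say, 8, that is four tuples of eight integers each, plus a shorter final tuple if N is not a multiple of the chunk size. Fewer, larger tuples mean fewer tuple space operations for the same amount of work.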

The following is the tuple space “sum of n integers” master program implemented by<br />

sending work in chunks:<br />

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define N 32

main(){
int P;                   // Number of processors
int chunk_size;          // Chunk size
int remainder;           // Remainder of numbers to be sent
int i;                   // Counter index
int job;                 // Job number
int status;              // Return status for tuple operations
int res;                 // Result tuple space identifier
int tsd;                 // Problem tuple space identifier
int *sendArr = 0;        // Array of numbers sent to problem ts
int sendNum;             // Number sent to worker in sendArr
int recdSum = 0;         // Subsum received from result ts
int *recdPtr = &recdSum; // Pointer to recdSum
int calcSum = 0;         // Calculated sum
int sumTotal = 0;        // Sum total of all subsums
int tplength;            // Length of ts entry
char tpname[20];         // Identifier of ts entry
char host[128];          // Host machine name

// Get host machine name<br />

gethostname(host, sizeof(host));<br />

// Open tuple spaces<br />

printf("Master: Opening tuple spaces\n");<br />

// Open problem tuple space<br />

tsd = cnf_open("problem", 0);<br />

// Open result tuple space<br />

res = cnf_open("result", 0);<br />

printf("Master: Tuple spaces open complete\n");<br />

// Get number of processors<br />

P = cnf_getP();<br />

printf("Master: Processors %d\n", P);<br />

// Get chunk size<br />

chunk_size = cnf_getf();<br />

printf("Master: Chunk size %d\n", chunk_size);<br />

// Put chunk size in ts<br />

// Set length of entry<br />


tplength = sizeof(int);<br />

// Set name of entry<br />

strcpy(tpname, "chunk_size");<br />

// Put entry in ts<br />

status = cnf_tsput(tsd, tpname, &chunk_size, tplength);<br />

printf("Master: Sent chunk size\n");<br />

// Send integers to problem tuple space<br />

// Set length of entry to chunk_size + 1 integers<br />

tplength = (chunk_size+1) * sizeof(int);<br />

printf("Master: tplength = %d\n", tplength);<br />

// Prepare <strong>and</strong> send integer arrays into tuple space<br />

printf("Master: Putting 1...%d to problem tuple space\n", N);<br />

if((sendArr = (int *) malloc(tplength)) == NULL)<br />

exit(1);<br />

// Loop until all numbers are sent to workers<br />

remainder = N;<br />

job = 0;<br />

sendNum = 1;<br />

while (remainder > 0) {<br />

if (remainder < chunk_size)<br />

chunk_size = remainder;<br />

remainder = remainder - chunk_size;<br />

job++;<br />

// Set name of entry to job number<br />

sprintf(tpname,"A%d", job);<br />

// Put chunk_size in index zero<br />

sendArr[0] = chunk_size;<br />

printf("Master: Putting %s Size %d\n ", tpname, sendArr[0]);<br />

// Put chunk_size integers in array<br />

for(i = 1; i <= chunk_size; i++){
// Fill the array with consecutive integers
sendArr[i] = sendNum;
printf("%d ", sendArr[i]);
sendNum++;
}
printf("\n");
// Put entry in problem tuple space
status = cnf_tsput(tsd, tpname, sendArr, tplength);
}
printf("Master: Finished sending 1...%d to tuple space\n", N);
// Get sub sums from result tuple space
printf("Master: Getting sub sums from result tuple space\n");
// Loop until all sub sums have been collected
while(job > 0){
// Set name of entry to any
strcpy(tpname,"*");
// Get entry from result tuple space
tplength = cnf_tsget(res, tpname, (char *)recdPtr, 0);
printf("Master: Recieved %d from %s\n", recdSum, tpname);
// Add result to total
sumTotal += recdSum;
// Decrement the outstanding job counter
job--;
}

printf("Master: The sum total is: %d\n", sumTotal);<br />

// Calculate correct answer with math formula<br />

calcSum = (N*(N+1))/2;<br />

printf ("Master: The formula calculated sum is: %d\n", calcSum);<br />

// Compare results<br />

if(calcSum == sumTotal)<br />

printf("Master: The workers gave the correct answer\n");<br />

else<br />

printf("Master: The workers gave an incorrect answer\n");<br />

// Insert negative integer tuple as termination signal<br />

printf("Master: Sending terminal signal\n");<br />

// Set length of entry<br />

tplength = (1) * sizeof(int);<br />

// Set entry value<br />

sendArr[0] = -1;<br />

// Set entry name<br />

sprintf(tpname, "A%d", N+1);<br />

// Send entry to tuple space<br />

status = cnf_tsput(tsd, tpname, sendArr, tplength);<br />

printf("Master: Finished sending terminal signal\n");<br />

// Terminate program<br />

printf("Master: Terminated\n");<br />

cnf_term();<br />

The following is the tuple space “sum of n integers” worker program implemented by<br />

receiving work in chunks:<br />

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

main(){<br />

// Variable declarations<br />

int tsd;                  // Problem tuple space identifier
int res;                  // Result tuple space identifier
int *recdPtr;             // Pointer to recd array
int sendSum = 0;          // Sum of numbers received
int *sendPtr = &sendSum;  // Pointer to sendSum
int status;               // Return status for tuple operations
int tplength;             // Length of ts entry
int chunk_size;           // Size of recdPtr
int i;                    // Index counter
char tpname[20];          // Identifier of ts entry
char host[128];           // Host machine name

// Get host machine name<br />

gethostname(host, sizeof(host));<br />

// Open tuple spaces<br />


printf("Worker: Opening tuple spaces\n");<br />

// Open problem tuple space<br />

tsd = cnf_open("problem", 0);<br />

// Open result tuple space<br />

res = cnf_open("result", 0);<br />

printf("Worker: Tuple spaces open complete\n");<br />

// Get the chunk size from ts<br />

// Set name of entry<br />

strcpy(tpname, "chunk_size");<br />

// Read chunk size<br />

status = cnf_tsread(tsd, tpname, &chunk_size, 0);<br />

printf("Worker: Chunk size %d\n", chunk_size);<br />

// Set length of tuple space entry<br />

tplength = (chunk_size+1) * sizeof(int);<br />

// Allocate memory for entry<br />

if((recdPtr = (int *)malloc(tplength)) == NULL)<br />

exit(-1);<br />

printf("Worker: array size %d\n", tplength);<br />

// Loop forever to accumulate sendSum<br />

printf("Worker: Begining to accumulate sum\n");<br />

while(1){<br />

sendSum = 0;<br />

// Set name to any<br />

strcpy(tpname, "A*");<br />

// Get problem from tuple space<br />

tplength = cnf_tsget(tsd, tpname, recdPtr, 0);<br />

// Get chunk_size from index zero<br />

chunk_size = (int) recdPtr[0];<br />

printf("Worker: Took item %s length %d\n ", tpname, chunk_size);<br />

// If normal receive<br />

if(chunk_size > 0){<br />

// Get number of array elements<br />

// Add to sendSum<br />

for(i = 1; i <= chunk_size; i++){
// Echo the number and add it to the sub sum
printf("%d ", recdPtr[i]);
sendSum += recdPtr[i];
}
printf("\n");
// Send the sub sum to the result tuple space under this host's name
printf("Worker: Sending sum %d\n", sendSum);
tplength = sizeof(int);
status = cnf_tsput(res, host, sendPtr, tplength);
}
// Otherwise the terminal signal was recieved
else{
printf("Worker: Recieved terminal signal\n");
// Put the terminal signal back for the other workers
status = cnf_tsput(tsd, tpname, recdPtr, tplength);
// Free memory and terminate
free(recdPtr);
printf("Worker: Terminated\n");
cnf_term();
}
}

// Sleep 1 second<br />

sleep(1);<br />

To run the sum of first n integers distributed application with chunking:<br />

1. Make the executables by typing “make” <strong>and</strong> pressing the enter key.<br />

2. Run the application by typing “prun tupleSum2” <strong>and</strong> pressing the enter key.<br />

The screen output for the master terminal with <strong>Synergy</strong>’s initialization <strong>and</strong> termination<br />

output removed should resemble:<br />

[c615111@owin ~/fpc06 ]>prun tupleSum2<br />

Master: Opening tuple spaces<br />

Master: Tuple spaces open complete<br />

Master: Processors 2<br />

Master: Chunk size 4<br />

Master: Sent chunk size<br />

Master: tplength = 20<br />

Master: Putting 1...32 to problem tuple space<br />

Master: Putting A1 Size 4<br />

1 2 3 4<br />

Master: Putting A2 Size 4<br />

5 6 7 8<br />

Master: Putting A3 Size 4<br />

9 10 11 12<br />

Master: Putting A4 Size 4<br />

13 14 15 16<br />

Master: Putting A5 Size 4<br />

17 18 19 20<br />

Master: Putting A6 Size 4<br />

21 22 23 24<br />

Master: Putting A7 Size 4<br />

25 26 27 28<br />

Master: Putting A8 Size 4<br />

29 30 31 32<br />

Master: Finished sending 1...32 to tuple space<br />

Master: Getting sub sums from result tuple space<br />

Master: Recieved 10 from saber<br />

Worker: Opening tuple spaces<br />

Worker: Tuple spaces open complete<br />

Worker: Chunk size 4<br />

Worker: array size 20<br />

Worker: Begining to accumulate sum<br />

Worker: Took item A2 length 4<br />

5 6 7 8<br />

Worker: Sending sum 26<br />

Master: Recieved 26 from owin<br />

Master: Recieved 42 from saber<br />

Worker: Took item A4 length 4<br />

13 14 15 16<br />


Worker: Sending sum 58<br />

Master: Recieved 58 from owin<br />

Master: Recieved 74 from saber<br />

Worker: Took item A6 length 4<br />

21 22 23 24<br />

Worker: Sending sum 90<br />

Master: Recieved 90 from owin<br />

Master: Recieved 106 from saber<br />

Worker: Took item A8 length 4<br />

29 30 31 32<br />

Worker: Sending sum 122<br />

Master: Recieved 122 from owin<br />

Master: The sum total is: 528<br />

Master: The formula calculated sum is: 528<br />

Master: The workers gave the correct answer<br />

Master: Sending terminal signal<br />

Master: Finished sending terminal signal<br />

Master: Terminated<br />

Worker: Took item A33 length -1<br />

Worker: Recieved terminal signal<br />

Worker: Terminated<br />

[c615111@owin ~/fpc06 ]><br />

The screen output for the worker terminal with <strong>Synergy</strong>’s initialization <strong>and</strong> termination<br />

output removed should resemble:<br />

Worker: Opening tuple spaces<br />

Worker: Tuple spaces open complete<br />

Worker: Chunk size 4<br />

Worker: array size 20<br />

Worker: Begining to accumulate sum<br />

Worker: Took item A1 length 4<br />

1 2 3 4<br />

Worker: Sending sum 10<br />

Worker: Took item A3 length 4<br />

9 10 11 12<br />

Worker: Sending sum 42<br />

Worker: Took item A5 length 4<br />

17 18 19 20<br />

Worker: Sending sum 74<br />

Worker: Took item A7 length 4<br />

25 26 27 28<br />

Worker: Sending sum 106<br />

Worker: Took item A33 length -1<br />

Worker: Recieved terminal signal<br />

Worker: Terminated<br />
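The chunking pattern above generalizes to any N and chunk size: the master walks the input range and shrinks the final chunk when N is not evenly divisible by the chunk size. The following stand-alone sketch shows only that partitioning arithmetic; it is plain C with no Synergy calls, and the names in it are illustrative rather than taken from the tutorial programs.

#include <stdio.h>

/* Stand-alone illustration of the chunk partitioning used by the master:
   N work items are handed out in chunks of at most chunk_size, and the
   last chunk shrinks to whatever remains. */
int main(void)
{
    int N = 32;            /* total number of integers, as in the example   */
    int chunk_size = 4;    /* the factor value supplied by the csl file     */
    int remainder = N;
    int first = 1;         /* first integer of the current chunk            */
    int job = 0;

    while (remainder > 0) {
        int size = (remainder < chunk_size) ? remainder : chunk_size;
        job++;
        printf("A%d holds %d..%d (size %d)\n",
               job, first, first + size - 1, size);
        first += size;
        remainder -= size;
    }
    /* With N = 32 and chunk_size = 4 this produces eight tuples A1..A8,
       matching the master output shown above. */
    return 0;
}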

Matrix Multiplication with Chunking<br />


The test data for this example is a pair of 10 x 10 matrices: A has ones filling its upper-left triangle (row i contains 10 - i leading ones), and B, built by makeDblInv, is the inverse of A, a band of 1 and -1 entries running along the anti-diagonal. The product C = A . B is therefore the 10 x 10 identity matrix. All three matrices appear in the program output below.

The following is the tuple space “matrix multiplication” master program implemented by<br />

sending work in chunks:<br />

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "matrix.h"<br />

// The A matrix to break up into arrays<br />

// <strong>and</strong> send to workers<br />

double A[N][N];<br />

// The B matrix<br />

double B[N][N];<br />

// The resulting C matrix<br />

double C[N][N];<br />


main(){<br />

int processors; // Number of processors<br />

int chunk_size; // Chunk size<br />

int remaining; // Remaining arrays of work<br />

int i, j;        // Matrix indices
int matrix_row;  // Index of matrix row
int array_pos;   // Array position in rows array
int status;      // Return status for tuple operations
int res;         // Result tuple space identifier
int tsd;         // Problem tuple space identifier

double *rows; // Rows from A to send to worker<br />

double worker_time; // Sum of times returned by workers<br />

double total_time; // Total application run time<br />

int tplength; // Length of ts entry<br />

char tpname[20]; // Identifier of ts entry<br />

char host[128]; // Host machine name<br />

// Get host machine name<br />

gethostname(host, sizeof(host));<br />

// Get time stamp<br />

total_time = wall_clock();<br />

// Open tuple spaces<br />

printf("Master: Opening tuple spaces\n");<br />

// Open problem tuple space<br />

tsd = cnf_open("problem",0);<br />

// Open result tuple space<br />

res = cnf_open("result",0);<br />

printf("Master: Tuple spaces open complete\n");<br />

// Get number of processors<br />

processors = cnf_getP();<br />

printf("Master: Processors %d\n", processors);<br />

// Get chunk size<br />

chunk_size = cnf_getf();<br />

printf("Master: Chunk size %d\n", chunk_size);<br />

printf("Master: Starting C = A . B\n");<br />

printf(" on %d x %d matrices\n", N, N);<br />

// Create <strong>and</strong> print matrix B<br />

makeDblInv(B);<br />

if(N <= 20){
printf("The B double matrix\n");
for(i = 0; i < N; i++){
for(j = 0; j < N; j++) printf("%.0f ", B[i][j]);
printf("\n");
}
}
// Set the length and name of the B entry
tplength = N * N * sizeof(double);
strcpy(tpname, "B");
printf("Master: Putting B Length(%d) Name(%s)\n", tplength, tpname);

// Put entry in tuple space<br />

status = cnf_tsput(tsd, tpname, B, tplength);<br />

// Create <strong>and</strong> print matrix A<br />

makeDblMat(A);<br />

if(N <= 20){
printf("The A double matrix\n");
for(i = 0; i < N; i++){
for(j = 0; j < N; j++) printf("%.0f ", A[i][j]);
printf("\n");
}
}
// Put chunk_size in the problem tuple space
tplength = sizeof(int);
strcpy(tpname, "chunk_size");
printf("Master: Putting chunk_size Length(%d) Name(%s)\n", tplength, tpname);
status = cnf_tsput(tsd, tpname, &chunk_size, tplength);
// Prepare the rows buffer: chunk_size rows plus two header slots
tplength = (2 + chunk_size * N) * sizeof(double);
printf("Master: Ai tplength = (%d)\n", tplength);
if((rows = (double *) malloc(tplength)) == NULL)
exit(-1);
// Send the rows of A to the problem tuple space in chunks
printf("Master: Putting A in problem tuple space\n");
remaining = N;
matrix_row = 0;
// Loop until all rows are sent to workers
while (remaining > 0) {

// If remaining rows is less than chunk size<br />

// set number of rows sent to remaining rows<br />

if (remaining < chunk_size)<br />

chunk_size = remaining;<br />

// Subtract rows being sent from remaining rows<br />

remaining = remaining - chunk_size;<br />

printf("Master: chunk_size(%d) remaining(%d) \n",<br />

chunk_size, remaining);<br />

// Put chunk_size in first index<br />

rows[0] = chunk_size;<br />

// Set rows array position to 2<br />

// Second position (1) is reserved for<br />

// time returned by worker<br />

array_pos = 2;<br />

// Put rows of A matrix in rows array<br />

for (i = 0; i < chunk_size; i++){
for (j = 0; j < N; j++){
rows[array_pos] = A[matrix_row + i][j];
if(N <= 20) printf("%.0f ", rows[array_pos]);
array_pos++;
}
if(N <= 20) printf("\n");
}
// Name the entry after the first row in this chunk
sprintf(tpname, "A%d", matrix_row);
// Put entry in problem tuple space
status = cnf_tsput(tsd, tpname, rows, tplength);
// Advance to the next chunk of rows
matrix_row = matrix_row + chunk_size;
}

// Get the result rows back from the result tuple space
remaining = N;
worker_time = 0.0;
while (remaining > 0){

// Set entry name<br />

strcpy(tpname,"*");<br />

// Get entry from result tuple space<br />

tplength = cnf_tsget(res, tpname, rows, 0);<br />

// Get number rows in this chunk from last index<br />

chunk_size = rows[0];<br />

// Get time returned by worker<br />

worker_time += rows[1];<br />

// Convert beginning row of entry to an integer<br />

matrix_row = atoi(tpname);<br />

printf("Master: Recieved %s Size %d\n", tpname, chunk_size);<br />

// Set the position in the array to 2<br />

array_pos = 2;<br />

// Assemble the result matrix C
// Loop through recieved rows
printf("Master: Recieved\n");
for (i = 0; i < chunk_size; i++){
for (j = 0; j < N; j++){
C[matrix_row + i][j] = rows[array_pos];
if(N <= 20) printf("%.0f ", rows[array_pos]);
array_pos++;
}
if(N <= 20) printf("\n");
}
// Subtract the recieved rows from the remaining rows
remaining = remaining - chunk_size;
}
// Resolve total time
total_time = wall_clock() - total_time;
printf("Master: The multiplication took %g seconds total time\n",
(total_time/1000000.0));

// Resolve worker time<br />

printf("Master: The workers used %g seconds of processor time\n",<br />

(worker_time/1000000.0));<br />

// Check <strong>and</strong> print the C matrix<br />

if(N <= 20){
printf("The C double matrix\n");
for(i = 0; i < N; i++){
for(j = 0; j < N; j++) printf("%.0f ", C[i][j]);
printf("\n");
}
}
// Verify that C is the identity matrix
status = 1;
for(i = 0; i < N; i++)
for(j = 0; j < N; j++)
if(C[i][j] != (i == j ? 1.0 : 0.0)) status = 0;
if(status)
printf("Master: C is Identity Matrix\n");
else
printf("Master: C is NOT the Identity Matrix\n");
// Insert the terminal signal tuple for the workers
strcpy(tpname, "A-term");
status = cnf_tsput(tsd, tpname, rows, tplength);
// Terminate program
printf("Master: Terminated\n");
cnf_term();

The following is the tuple space "matrix multiplication" worker program implemented by
receiving work in chunks:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "matrix.h"
// The B matrix read from the tuple space
double B[N][N];
main(){
int chunk_size;      // Chunk size
int i, j, k, n;      // Matrix and loop indices
int matrix_row;      // Index of matrix row
int array_put;       // Put position in rows array
int array_get;       // Get position in rows array
int status;          // Return status for tuple operations
int res;             // Result tuple space identifier
int tsd;             // Problem tuple space identifier
double *rows;        // Rows from A / result rows
double a_row[N];     // One row of A (buffer assumed in this reconstruction)
double sum;          // Dot product accumulator (assumed in this reconstruction)
double worker_time;  // Time to return to master
int tplength;        // Length of ts entry
char tpname[20];     // Identifier of ts entry
char host[128];      // Host machine name

// Get host machine name<br />

gethostname(host, sizeof(host));<br />

// Open tuple spaces<br />

printf("Worker: Opening tuple spaces\n");<br />

// Open problem tuple space<br />

tsd = cnf_open("problem", 0);<br />

// Open result tuple space<br />

res = cnf_open("result", 0);<br />

printf("Worker: Tuple spaces open complete\n");<br />

// Set tpname to B<br />

strcpy(tpname,"B");<br />

// Read matrix B from tuple space<br />

status = cnf_tsread(tsd, tpname, B, 0);<br />

// Print matrix B<br />

if(N <= 20){
printf("The B double matrix\n");
for(i = 0; i < N; i++){
for(j = 0; j < N; j++) printf("%.0f ", B[i][j]);
printf("\n");
}
}
// Read chunk_size from the problem tuple space
strcpy(tpname, "chunk_size");
status = cnf_tsread(tsd, tpname, &chunk_size, 0);
// Allocate the rows buffer: chunk_size rows plus two header slots
tplength = (2 + chunk_size * N) * sizeof(double);
if((rows = (double *) malloc(tplength)) == NULL)
exit(-1);
// Loop until the terminal signal is recieved
while(1){
// Set entry name to any beginning with A
strcpy(tpname, "A*");
// Get entry from problem tuple space
tplength = cnf_tsget(tsd, tpname, rows, 0);
// Normal recieve
if(tplength > 0){

// Check termination signal<br />

if (!strcmp(tpname, "A-term")){<br />

printf("Worker: Recieved the terminal signal\n");<br />

// Replace the terminal signal in problem ts<br />

status = cnf_tsput(tsd, tpname, rows, tplength);<br />

// Free memory for rows<br />

free(rows);<br />

// Terminate worker<br />

printf("Worker: Terminated\n");<br />

cnf_term();<br />

}<br />

// Get number rows in this chunk from last index<br />


chunk_size = rows[0];<br />

// Convert beginning row of entry to an integer<br />

matrix_row = atoi(&tpname[1]);<br />

printf("Worker: chunk_size %d matrix_row %d\n",<br />

chunk_size, matrix_row);<br />

// Set rows array put position to 2<br />

array_put = 2;<br />

// Set rows array get position to 2<br />

array_get = 2;<br />

// Get beginning worker time<br />

worker_time = wall_clock();<br />

// For each row in chunk_size<br />

for(n = 0; n < chunk_size; n++){
// Copy the recieved row of A into a local buffer
for(j = 0; j < N; j++)
a_row[j] = rows[array_get++];
if(N <= 20){
printf("Worker: Recieved\n");
for(j = 0; j < N; j++) printf("%.0f ", a_row[j]);
printf("\n");
}
// Multiply the row by B; the result row overwrites the same slots
for(j = 0; j < N; j++){
sum = 0.0;
for(k = 0; k < N; k++)
sum += a_row[k] * B[k][j];
rows[array_put + j] = sum;
}
if(N <= 20){
printf("Worker: Calculated array CA%d+%d\n", matrix_row, n);
for(j = 0; j < N; j++) printf("%.0f ", rows[array_put + j]);
printf("\n");
}
array_put = array_put + N;
}
// Resolve worker time and pack the result header
worker_time = wall_clock() - worker_time;
rows[0] = chunk_size;
rows[1] = worker_time;
// Name the result entry after the first row of the chunk
sprintf(tpname, "%d", matrix_row);
printf("Worker: Putting %d\n", matrix_row);

status = cnf_tsput(res, tpname, rows, tplength);<br />

}
}

The following is the csl file for this application (chunk size factor = 4):

configuration: tupleMat2;
m: master = tupleMat2Master
(factor = 4
threshold = 1
debug = 0
)
-> f: problem

(type = TS)<br />

-> m: worker = tupleMat2Worker<br />

(type = slave)<br />

-> f: result<br />

(type = TS)<br />

-> m: master;<br />

The screen output for the master terminal with <strong>Synergy</strong>’s initialization <strong>and</strong> termination<br />

output removed should resemble:<br />

[c615111@owin ~/fpc07new ]>prun tupleMat2<br />

Master: Opening tuple spaces<br />

Master: Tuple spaces open complete<br />

Master: Processors 2<br />

Master: Chunk size 4<br />

Master: Starting C = A . B<br />

on 10 x 10 matrices<br />

The B double matrix<br />

0 0 0 0 0 0 0 0 0 1<br />

0 0 0 0 0 0 0 0 1 -1<br />


0 0 0 0 0 0 0 1 -1 0<br />

0 0 0 0 0 0 1 -1 0 0<br />

0 0 0 0 0 1 -1 0 0 0<br />

0 0 0 0 1 -1 0 0 0 0<br />

0 0 0 1 -1 0 0 0 0 0<br />

0 0 1 -1 0 0 0 0 0 0<br />

0 1 -1 0 0 0 0 0 0 0<br />

1 -1 0 0 0 0 0 0 0 0<br />

Master: Putting B Length(800) Name(B)<br />

The A double matrix<br />

1 1 1 1 1 1 1 1 1 1<br />

1 1 1 1 1 1 1 1 1 0<br />

1 1 1 1 1 1 1 1 0 0<br />

1 1 1 1 1 1 1 0 0 0<br />

1 1 1 1 1 1 0 0 0 0<br />

1 1 1 1 1 0 0 0 0 0<br />

1 1 1 1 0 0 0 0 0 0<br />

1 1 1 0 0 0 0 0 0 0<br />

1 1 0 0 0 0 0 0 0 0<br />

1 0 0 0 0 0 0 0 0 0<br />

Master: Putting chunk_size Length(4) Name(chunk_size)<br />

Master: Ai tplength = (336)<br />

Master: Putting A in problem tuple space<br />

Master: chunk_size(4) remaining(6)<br />

1 1 1 1 1 1 1 1 1 1<br />

1 1 1 1 1 1 1 1 1 0<br />

1 1 1 1 1 1 1 1 0 0<br />

1 1 1 1 1 1 1 0 0 0<br />

Master: chunk_size(4) remaining(2)<br />

1 1 1 1 1 1 0 0 0 0<br />

1 1 1 1 1 0 0 0 0 0<br />

1 1 1 1 0 0 0 0 0 0<br />

1 1 1 0 0 0 0 0 0 0<br />

Master: chunk_size(2) remaining(0)<br />

1 1 0 0 0 0 0 0 0 0<br />

1 0 0 0 0 0 0 0 0 0<br />

Worker: Opening tuple spaces<br />

Worker: Tuple spaces open complete<br />

The B double matrix<br />

0 0 0 0 0 0 0 0 0 1<br />

0 0 0 0 0 0 0 0 1 -1<br />

0 0 0 0 0 0 0 1 -1 0<br />

0 0 0 0 0 0 1 -1 0 0<br />

0 0 0 0 0 1 -1 0 0 0<br />

0 0 0 0 1 -1 0 0 0 0<br />

0 0 0 1 -1 0 0 0 0 0<br />

0 0 1 -1 0 0 0 0 0 0<br />

0 1 -1 0 0 0 0 0 0 0<br />

1 -1 0 0 0 0 0 0 0 0<br />

Worker: chunk_size 4 matrix_row 4<br />

Worker: Recieved<br />

1 1 1 1 1 1 0 0 0 0<br />

Worker: Calculated array CA4+0<br />

0 0 0 0 1 0 0 0 0 0<br />

Worker: Recieved<br />

1 1 1 1 1 0 0 0 0 0<br />


Worker: Calculated array CA4+1<br />

0 0 0 0 0 1 0 0 0 0<br />

Worker: Recieved<br />

1 1 1 1 0 0 0 0 0 0<br />

Worker: Calculated array CA4+2<br />

0 0 0 0 0 0 1 0 0 0<br />

Worker: Recieved<br />

1 1 1 0 0 0 0 0 0 0<br />

Worker: Calculated array CA4+3<br />

0 0 0 0 0 0 0 1 0 0<br />

Worker: Putting 4<br />

Master: Recieved 4 Size 4<br />

Master: Recieved<br />

0 0 0 0 1 0 0 0 0 0<br />

0 0 0 0 0 1 0 0 0 0<br />

0 0 0 0 0 0 1 0 0 0<br />

0 0 0 0 0 0 0 1 0 0<br />

Master: Recieved 0 Size 4<br />

Master: Recieved<br />

1 0 0 0 0 0 0 0 0 0<br />

0 1 0 0 0 0 0 0 0 0<br />

0 0 1 0 0 0 0 0 0 0<br />

0 0 0 1 0 0 0 0 0 0<br />

Worker: chunk_size 2 matrix_row 8<br />

Worker: Recieved<br />

1 1 0 0 0 0 0 0 0 0<br />

Worker: Calculated array CA8+0<br />

0 0 0 0 0 0 0 0 1 0<br />

Worker: Recieved<br />

1 0 0 0 0 0 0 0 0 0<br />

Worker: Calculated array CA8+1<br />

0 0 0 0 0 0 0 0 0 1<br />

Worker: Putting 8<br />

Master: Recieved 8 Size 2<br />

Master: Recieved<br />

0 0 0 0 0 0 0 0 1 0<br />

0 0 0 0 0 0 0 0 0 1<br />

Master: The multiplication took 1.11439 seconds total time<br />

Master: The workers used 0.024033 seconds of processor time<br />

The C double matrix<br />

1 0 0 0 0 0 0 0 0 0<br />

0 1 0 0 0 0 0 0 0 0<br />

0 0 1 0 0 0 0 0 0 0<br />

0 0 0 1 0 0 0 0 0 0<br />

0 0 0 0 1 0 0 0 0 0<br />

0 0 0 0 0 1 0 0 0 0<br />

0 0 0 0 0 0 1 0 0 0<br />

0 0 0 0 0 0 0 1 0 0<br />

0 0 0 0 0 0 0 0 1 0<br />

0 0 0 0 0 0 0 0 0 1<br />

Master: C is Identity Matrix<br />

Master: Terminated<br />

Worker: Recieved the terminal signal<br />

Worker: Terminated<br />

== (tupleMat2) completed. Elapsed [2] Seconds.<br />

[c615111@owin ~/fpc07new ]><br />


The screen output for the worker terminal with <strong>Synergy</strong>’s initialization <strong>and</strong> termination<br />

output removed should resemble:<br />

Worker: Opening tuple spaces<br />

Worker: Tuple spaces open complete<br />

The B double matrix<br />

0 0 0 0 0 0 0 0 0 1<br />

0 0 0 0 0 0 0 0 1 -1<br />

0 0 0 0 0 0 0 1 -1 0<br />

0 0 0 0 0 0 1 -1 0 0<br />

0 0 0 0 0 1 -1 0 0 0<br />

0 0 0 0 1 -1 0 0 0 0<br />

0 0 0 1 -1 0 0 0 0 0<br />

0 0 1 -1 0 0 0 0 0 0<br />

0 1 -1 0 0 0 0 0 0 0<br />

1 -1 0 0 0 0 0 0 0 0<br />

Worker: chunk_size 4 matrix_row 0<br />

Worker: Recieved<br />

1 1 1 1 1 1 1 1 1 1<br />

Worker: Calculated array CA0+0<br />

1 0 0 0 0 0 0 0 0 0<br />

Worker: Recieved<br />

1 1 1 1 1 1 1 1 1 0<br />

Worker: Calculated array CA0+1<br />

0 1 0 0 0 0 0 0 0 0<br />

Worker: Recieved<br />

1 1 1 1 1 1 1 1 0 0<br />

Worker: Calculated array CA0+2<br />

0 0 1 0 0 0 0 0 0 0<br />

Worker: Recieved<br />

1 1 1 1 1 1 1 0 0 0<br />

Worker: Calculated array CA0+3<br />

0 0 0 1 0 0 0 0 0 0<br />

Worker: Putting 0<br />

Worker: Recieved the terminal signal<br />

Worker: Terminated<br />

To run the matrix multiplication distributed application with chunk size of 200 <strong>and</strong> N =<br />

500 (a 500 x 500 matrix):<br />

1. Set the factor value in the csl file to 200 (as shown below)<br />

2. Make the executables by typing “make SIZE=500” <strong>and</strong> pressing the enter key.<br />

3. Run the application by typing “prun tupleMat2” <strong>and</strong> pressing the enter key.<br />

configuration: tupleMat2;<br />

m: master = tupleMat2Master<br />

(factor = 200<br />

threshold = 1<br />

debug = 0<br />


)<br />

-> f: problem<br />

(type = TS)<br />

-> m: worker = tupleMat2Worker<br />

(type = slave)<br />

-> f: result<br />

(type = TS)<br />

-> m: master;<br />

The screen output for the master terminal with <strong>Synergy</strong>’s initialization <strong>and</strong> termination<br />

output removed should resemble:<br />

[c615111@owin ~/fpc07new ]>prun tupleMat2<br />

Master: Opening tuple spaces<br />

CID starting program. path (bin/tupleMat2Worker)<br />

Master: Tuple spaces open complete<br />

Master: Processors 2<br />

Master: Chunk size 200<br />

Master: Starting C = A . B<br />

on 500 x 500 matrices<br />

Master: Putting B Length(2000000) Name(B)<br />

Worker: Opening tuple spaces<br />

Worker: Tuple spaces open complete<br />

Master: Putting chunk_size Length(4) Name(chunk_size)<br />

Master: Ai tplength = (800016)<br />

Master: Putting A in problem tuple space<br />

Master: chunk_size(200) remaining(300)<br />

Master: chunk_size(200) remaining(100)<br />

Worker: chunk_size 200 matrix_row 200<br />

Master: chunk_size(100) remaining(0)<br />

Worker: Putting 200<br />

Master: Recieved 0 Size 200<br />

Master: Recieved<br />

Master: Recieved 200 Size 200<br />

Master: Recieved<br />

Master: Recieved 400 Size 100<br />

Master: Recieved<br />

Master: The multiplication took 9.66808 seconds total time<br />

Master: The workers used 15.0322 seconds of processor time<br />

Master: C is Identity Matrix<br />

Worker: Recieved the terminal signal<br />

Master: Terminated<br />

Worker: Terminated<br />

== (tupleMat2) completed. Elapsed [10] Seconds.<br />

[c615111@owin ~/fpc07new ]><br />

The screen output for the worker terminal with <strong>Synergy</strong>’s initialization <strong>and</strong> termination<br />

output removed should resemble:<br />

Worker: Opening tuple spaces<br />

Worker: Tuple spaces open complete<br />

Worker: chunk_size 200 matrix_row 0<br />

Worker: Putting 0<br />


Worker: chunk_size 100 matrix_row 400<br />

Worker: Putting 400<br />

Worker: Recieved the terminal signal<br />

Worker: Terminated<br />
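In both matrix programs each tuple exchanged through the problem and result spaces is a flat array of doubles: slot 0 carries the number of rows in the chunk, slot 1 is reserved for the worker's processing time on the way back, and the matrix rows are packed from index 2 onward. The following stand-alone sketch shows only that packing convention; it uses no Synergy calls, and the helper name pack_chunk is illustrative, not part of the tutorial programs.

#include <stdio.h>
#include <stdlib.h>

#define N 10   /* matrix dimension, as in the 10 x 10 example */

/* Illustration of the tuple layout used by tupleMat2/tupleMat3:
   rows[0] = number of rows in the chunk, rows[1] = worker time,
   rows[2..] = the chunk's rows, one after another.                */
static double *pack_chunk(double A[N][N], int first_row, int chunk_size)
{
    int i, j, pos = 2;
    double *rows = malloc((2 + chunk_size * N) * sizeof(double));
    if (rows == NULL) exit(1);
    rows[0] = chunk_size;     /* slot 0: how many rows follow           */
    rows[1] = 0.0;            /* slot 1: reserved for the worker's time */
    for (i = 0; i < chunk_size; i++)
        for (j = 0; j < N; j++)
            rows[pos++] = A[first_row + i][j];
    return rows;
}

int main(void)
{
    double A[N][N];
    double *chunk;
    int i, j;
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            A[i][j] = (i + j < N) ? 1.0 : 0.0;   /* the example's A matrix */
    chunk = pack_chunk(A, 4, 4);                 /* rows 4..7, i.e. tuple "A4" */
    printf("chunk of %.0f rows, first element %.0f\n", chunk[0], chunk[2]);
    free(chunk);
    return 0;
}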


Optimized Programs<br />

Optimized Matrix Multiplication with Chunking<br />

The following is the tuple space “optimized matrix multiplication” master program<br />

implemented by sending work in chunks:<br />

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// The A matrix to break up into arrays<br />

// <strong>and</strong> send to workers<br />

double A[N][N];<br />

double B[N][N];<br />

double C[N][N];<br />

#include "matrix.h"<br />

// Main function<br />

main(){<br />

int processors; // Number of processors<br />

int chunk_size; // Chunk size<br />

int remaining; // Remaining arrays of work<br />

int i, j;        // Matrix indices
int matrix_row;  // Index of matrix row
int array_pos;   // Array position in rows array
int status;      // Return status for tuple operations
int res;         // Result tuple space identifier
int tsd;         // Problem tuple space identifier

double *rows; // Rows from A to send to worker<br />

double worker_time; // Sum of times returned by workers<br />

double total_time; // Total application run time<br />

int tplength; // Length of ts entry<br />

char tpname[20]; // Identifier of ts entry<br />

char host[128]; // Host machine name<br />

// Get host machine name<br />

gethostname(host, sizeof(host));<br />

// Get time stamp<br />

total_time = wall_clock();<br />

// Open tuple spaces<br />

printf("Master: Opening tuple spaces\n");<br />

// Open problem tuple space<br />

tsd = cnf_open("problem",0);<br />

// Open result tuple space<br />

res = cnf_open("result",0);<br />

printf("Master: Tuple spaces open complete\n");<br />


// Get number of processors<br />

processors = cnf_getP();<br />

printf("Master: Processors %d\n", processors);<br />

// Get chunk size<br />

chunk_size = cnf_getf();<br />

printf("Master: Chunk size %d\n",<br />

chunk_size);<br />

printf("Master: Starting C = A . B\n");<br />

printf(" on %d x %d matrices\n", N, N);<br />

// Create <strong>and</strong> print matrix B<br />

makeDblInv(B);<br />

if(N <= 20){
printf("The B double matrix\n");
for(i = 0; i < N; i++){
for(j = 0; j < N; j++) printf("%.0f ", B[i][j]);
printf("\n");
}
}
// Put matrix B in the problem tuple space
tplength = N * N * sizeof(double);
strcpy(tpname, "B");
printf("Master: Putting B Length %d Name %s\n", tplength, tpname);
status = cnf_tsput(tsd, tpname, B, tplength);
// Create and print matrix A
makeDblMat(A);
if(N <= 20){
printf("The A double matrix\n");
for(i = 0; i < N; i++){
for(j = 0; j < N; j++) printf("%.0f ", A[i][j]);
printf("\n");
}
}
// Put chunk_size in the problem tuple space
tplength = sizeof(int);
strcpy(tpname, "chunk_size");
printf("Master: Putting chunk_size Length %d Name %s\n", tplength, tpname);
status = cnf_tsput(tsd, tpname, &chunk_size, tplength);
// Prepare the rows buffer: chunk_size rows plus two header slots
tplength = (2 + chunk_size * N) * sizeof(double);
printf("Master: Ai tplength = (%d)\n", tplength);
if((rows = (double *) malloc(tplength)) == NULL)
exit(-1);
// Send the rows of A to the problem tuple space in chunks
printf("Master: Putting A in problem tuple space\n");
remaining = N;

matrix_row = 0;<br />

// Loop until all numbers are sent to workers<br />

while (remaining > 0) {<br />

// If remaining rows is less than chunk size<br />

// set number of rows sent to remaining rows<br />

if (remaining < chunk_size)<br />

chunk_size = remaining;<br />

// Subtract rows being sent from remaining rows<br />

remaining = remaining - chunk_size;<br />

// Set rows array position to 2<br />

// Second position (1) is reserved for<br />

// time returned by worker<br />

array_pos = 2;<br />

// Put chunk_size in last index<br />

rows[0] = chunk_size;<br />

// Put rows of A matrix in rows array<br />

for (i = 0; i < chunk_size; i++)
for (j = 0; j < N; j++)
rows[array_pos++] = A[matrix_row + i][j];
// Name the entry after the first row in this chunk
sprintf(tpname, "A%d", matrix_row);
printf("Master: Putting chunk_size %d matrix_row %s remaining %d\n",
chunk_size, tpname, remaining);
// Put entry in problem tuple space
status = cnf_tsput(tsd, tpname, rows, tplength);
// Advance to the next chunk of rows
matrix_row = matrix_row + chunk_size;
}
printf("Master: All work has been sent\n");
// Get the result rows back from the result tuple space
remaining = N;
worker_time = 0.0;
while (remaining > 0){
// Set entry name
strcpy(tpname, "*");
// Get entry from result tuple space
tplength = cnf_tsget(res, tpname, rows, 0);
// Get the chunk size and worker time from the header slots
chunk_size = rows[0];
worker_time += rows[1];
// Convert beginning row of entry to an integer
matrix_row = atoi(tpname);
printf("Master: Recieved chunk_sizs %d matrix_row %d\n",
chunk_size, matrix_row);

// Set the position in the array to 2<br />

array_pos = 2;<br />

// Assemble the result matrix C<br />

// Loop through recieved rows<br />

if(N <= 20) printf("Master: Recieved\n");
for (i = 0; i < chunk_size; i++){
for (j = 0; j < N; j++){
C[matrix_row + i][j] = rows[array_pos];
if(N <= 20) printf("%.0f ", rows[array_pos]);
array_pos++;
}
if(N <= 20) printf("\n");
}
// Subtract the recieved rows from the remaining rows
remaining = remaining - chunk_size;
}
printf("Master: Recieved all work from workers\n");
printf("Master: C matrix has been assembled\n");
// Resolve total and worker times
total_time = wall_clock() - total_time;
printf("Master: The multiplication took %g seconds total time\n",
(total_time/1000000.0));
printf("Master: The workers used %g seconds of processor time\n",
(worker_time/1000000.0));
// Check and print the C matrix
if(N <= 20){
printf("The C double matrix\n");
for(i = 0; i < N; i++){
for(j = 0; j < N; j++) printf("%.0f ", C[i][j]);
printf("\n");
}
}
// Verify that C is the identity matrix
status = 1;
for(i = 0; i < N; i++)
for(j = 0; j < N; j++)
if(C[i][j] != (i == j ? 1.0 : 0.0)) status = 0;
if(status)
printf("Master: C is Identity Matrix\n");
else
printf("Master: C is NOT the Identity Matrix\n");
// Insert the terminal signal tuple for the workers
strcpy(tpname, "A-term");
status = cnf_tsput(tsd, tpname, rows, tplength);

printf("Master: Terminated\n");<br />

cnf_term();<br />

The following is the tuple space “optimized matrix multiplication” worker program<br />

implemented by receiving work in chunks:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

double Ai[N/2][N]; // A chunk of A matrix<br />

double B[N][N]; // B matrix<br />

double Ci[N/2][N]; // A chunk of C matrix<br />

#include "matrix.h"<br />

// Main function<br />

main(){<br />

int chunk_size; // Chunk size<br />

int i, j, k; // Matrix indices<br />

int matrix_row; // Index of matrix row<br />

int array_pos; // Get array position in rows array<br />

int status;      // Return status for tuple operations
int res;         // Result tuple space identifier
int tsd;         // Problem tuple space identifier

double *rows; // Rows from A<br />

double worker_time; // Time to return to master<br />

int tplength; // Length of ts entry<br />

char tpname[20]; // Identifier of ts entry<br />

char host[128]; // Host machine name<br />

// Get host machine name<br />

gethostname(host, sizeof(host));<br />

// Open tuple spaces<br />

printf("Worker: Opening tuple spaces\n");<br />

// Open problem tuple space<br />

tsd = cnf_open("problem", 0);<br />

// Open result tuple space<br />

res = cnf_open("result", 0);<br />

printf("Worker: Tuple spaces open complete\n");<br />

// Set tpname to B<br />

strcpy(tpname,"B");<br />

// Read matrix B from tuple space<br />

status = cnf_tsread(tsd, tpname, B, 0);<br />

// Print matrix B<br />

if(N <= 20){
printf("The B double matrix\n");
for(i = 0; i < N; i++){
for(j = 0; j < N; j++) printf("%.0f ", B[i][j]);
printf("\n");
}
}

// Get chunk_size from master<br />

// Set tpname to chunk_size<br />

strcpy(tpname,"chunk_size");<br />

// Read chunk_size from tuple space<br />

status = cnf_tsread(tsd, tpname, &chunk_size, 0);<br />

// Prepare integer array for tuple space exchanges<br />

tplength = (2+chunk_size*N)*sizeof(double);<br />

if ((rows = (double*)malloc(tplength)) == NULL)<br />

exit(-1);<br />

// Loop until terminal signal is recieved<br />

while(1){<br />

// Set entry name to any begins with A<br />

strcpy(tpname,"A*");<br />

// Set length of entry<br />

tplength = cnf_tsget(tsd, tpname, rows, 0);<br />

// Normal recieve<br />

if(tplength > 0){<br />

// Check termination signal<br />

if (!strcmp(tpname, "A-term")){<br />

printf("Worker: Recieved the terminal signal\n");<br />

// Replace the terminal signal in problem ts<br />

status = cnf_tsput(tsd, tpname, rows, tplength);<br />

// Free memory for rows<br />

free(rows);<br />

// Terminate worker<br />

printf("Worker: Terminated\n");<br />

cnf_term();<br />

}<br />

// Get number rows in this chunk from last index<br />

chunk_size = (int)rows[0];<br />

// Convert beginning row of entry to an integer<br />

matrix_row = atoi(&tpname[1]);<br />

printf("Worker: Recieved chunk_size %d matrix_row %d\n",<br />

chunk_size, matrix_row);<br />

// Get beginning worker time<br />

worker_time = wall_clock();<br />

// For each row in chunk_size<br />

// Copy rows from rows to Ai<br />

array_pos = 2;
for(i = 0; i < chunk_size; i++)
for(j = 0; j < N; j++)
Ai[i][j] = rows[array_pos++];
// Multiply the chunk: Ci = Ai . B
for(i = 0; i < chunk_size; i++)
for(j = 0; j < N; j++){
Ci[i][j] = 0.0;
for(k = 0; k < N; k++)
Ci[i][j] += Ai[i][k] * B[k][j];
}
// Copy the result rows back into the rows buffer
array_pos = 2;
for(i = 0; i < chunk_size; i++)
for(j = 0; j < N; j++)
rows[array_pos++] = Ci[i][j];
// Resolve worker time and pack the result header
worker_time = wall_clock() - worker_time;
rows[0] = chunk_size;
rows[1] = worker_time;
// Name the result entry after the first row of the chunk
sprintf(tpname, "%d", matrix_row);
printf("Worker: Putting chunk_size %d matrix_row %d\n",
chunk_size, matrix_row);
// Put the result rows in the result tuple space
status = cnf_tsput(res, tpname, rows, tplength);
}
}

The following is the csl file for this application (chunk size factor = 200):

configuration: tupleMat3;
m: master = tupleMat3Master
(factor = 200
threshold = 1
debug = 0
)
-> f: problem
(type = TS)
-> m: worker = tupleMat3Worker

(type = slave)<br />

-> f: result<br />

(type = TS)<br />

-> m: master;<br />

The screen output for the master terminal with <strong>Synergy</strong>’s initialization <strong>and</strong> termination<br />

output removed should resemble:<br />


Master: Opening tuple spaces<br />

Master: Tuple spaces open complete<br />

Master: Processors 2<br />

Master: Chunk size 200<br />

Master: Starting C = A . B<br />

on 500 x 500 matrices<br />

Master: Putting B Length 2000000 Name B<br />

Worker: Opening tuple spaces<br />

Worker: Tuple spaces open complete<br />

Master: Putting chunk_size Length 4 Name chunk_size<br />

Master: Ai tplength = (800016)<br />

Master: Putting A in problem tuple space<br />

Master: Putting chunk_size 200 matrix_row A0 remaining 300<br />

Master: Putting chunk_size 200 matrix_row A200 remaining 100<br />

Worker: Recieved chunk_size 200 matrix_row 200<br />

Master: Putting chunk_size 100 matrix_row A400 remaining 0<br />

Master: All work has been sent<br />

Master: Recieved chunk_sizs 200 matrix_row 0<br />

Worker: Putting chunk_size 200 matrix_row 200<br />

Master: Recieved chunk_sizs 200 matrix_row 200<br />

Master: Recieved chunk_sizs 100 matrix_row 400<br />

Master: Recieved all work from workers<br />

Master: C matrix has been assembled<br />

Master: The multiplication took 4.39389 seconds total time<br />

Master: The workers used 6.23962 seconds of processor time<br />

Master: C is Identity Matrix<br />

Master: Terminated<br />

Worker: Recieved the terminal signal<br />

Worker: Terminated<br />

== (tupleMat3) completed. Elapsed [4] Seconds.<br />

[c615111@owin ~/fpc08 ]><br />

The screen output for the worker terminal with <strong>Synergy</strong>’s initialization <strong>and</strong> termination<br />

output removed should resemble:<br />

Worker: Opening tuple spaces<br />

Worker: Tuple spaces open complete<br />

Worker: Recieved chunk_size 200 matrix_row 0<br />

Worker: Putting chunk_size 200 matrix_row 0<br />

Worker: Recieved chunk_size 100 matrix_row 400<br />

Worker: Putting chunk_size 100 matrix_row 400<br />

Worker: Recieved the terminal signal<br />

Worker: Terminated<br />


<strong>Synergy</strong> in the Future<br />


Function <strong>and</strong> Comm<strong>and</strong> Reference<br />

Comm<strong>and</strong>s<br />

addhost<br />

This comm<strong>and</strong> adds a host into the host file. The comm<strong>and</strong> fails if the given host is not<br />

<strong>Synergy</strong> capable. The [-f] option forces the insertion even if the host is not ready. A<br />

newly added host automatically becomes “selected”.<br />

Syntax:<br />

[c615111@owin ~ ]>addhost [-f]<br />

cds<br />

Checks the status of remote daemons. This comm<strong>and</strong> prints all available remote hosts to<br />

screen <strong>and</strong> shows their benchmark, name <strong>and</strong> availability.<br />

Example:<br />

[c615111@owin ~ ]>cds<br />

++ Benchmark (186) ++ (owin) ready.<br />

++ Benchmark (2077) ++ (rancor) ready.<br />

++ Benchmark (2109) ++ (saber) ready.<br />

++ Benchmark (1497) ++ (sarlac) ready.<br />

++ Benchmark (186) ++ (lynox) ready.<br />

[c615111@luke ~ ]><br />

[c615111@owin ~ ]>cds<br />

????? PMD down (129.32.92.82,ewok)<br />

????? CID down (129.32.92.66,luke) (c615111)<br />

????? CID down (129.32.92.89,ackbar) (c615111)<br />

????? CID down (129.32.92.69,r2d2) (c615111)<br />

[c615111@luke ~ ]><br />

[c615111@luke ~ ]>cds<br />

????? PMD down (129.32.92.82,ewok)<br />

++ Benchmark (371) ++ (luke) ready.<br />

????? CID down (129.32.92.89,ackbar) (c615111)<br />

????? CID down (129.32.92.69,r2d2) (c615111)<br />

[c615111@luke ~ ]><br />


chosts<br />

This comm<strong>and</strong> allows you to toggle the selected <strong>and</strong> de-selected status of processors.<br />

Only the selected processors will be used for immediate parallel processing. The -v<br />

option gives the current <strong>Synergy</strong> connection status. It requires some extra time.<br />

Syntax:<br />

[c615111@owin ~ ]>chosts [-v]<br />

Example:<br />

<strong>Synergy</strong> V3.0 : Host Selection Utility<br />

=Status=No.===IP Address=================Host Name==============Login=F Sys.=<br />

[-----] ( 1) #129.32.92.82 ewok c615111 none<br />

[-----] ( 2) #129.32.92.66 luke c615111 none<br />

[-----] ( 3) #129.32.92.89 ackbar c615111 none<br />

[-----] ( 4) #129.32.92.69 r2d2 c615111 none<br />

[-----] ( 5) #129.32.92.87 alliance c615111 none<br />

[-----] ( 6) #129.32.92.91 anakin c615111 none<br />

[-----] ( 7) #129.32.92.78 bantha c615111 none<br />

[-----] ( 8) #129.32.92.74 bobafet c615111 none<br />

[-----] ( 9) #129.32.92.80 c3p0 c615111 none<br />

[-----] ( 10) #129.32.92.88 chewbaca c615111 none<br />

[-----] ( 11) #129.32.92.86 droids c615111 none<br />

[-----] ( 12) #129.32.92.68 emperor c615111 none<br />

[-----] ( 13) #129.32.92.77 gredo c615111 none<br />

[-----] ( 14) #129.32.92.71 jabba c615111 none<br />

[-----] ( 15) #129.32.92.76 jawa c615111 none<br />

[-----] ( 16) #129.32.92.83 l<strong>and</strong>o c615111 none<br />

[-----] ( 17) #129.32.92.84 leia c615111 none<br />

[-----] ( 18) #129.32.92.81 owin c615111 none<br />

[-----] ( 19) #129.32.92.70 rancor c615111 none<br />

=== Enter s(elect) | d(e-select) | c(ontinue):<br />

[-----] ( 3) #129.32.92.89 ackbar c615111 none<br />

[-----] ( 4) #129.32.92.69 r2d2 c615111 none<br />

[-----] ( 5) #129.32.92.87 alliance c615111 none<br />

[-----] ( 6) #129.32.92.91 anakin c615111 none<br />

[-----] ( 7) #129.32.92.78 bantha c615111 none<br />

[-----] ( 8) #129.32.92.74 bobafet c615111 none<br />

[-----] ( 9) #129.32.92.80 c3p0 c615111 none<br />

[-----] ( 10) #129.32.92.88 chewbaca c615111 none<br />

[-----] ( 11) #129.32.92.86 droids c615111 none<br />

[-----] ( 12) #129.32.92.68 emperor c615111 none<br />

[-----] ( 13) #129.32.92.77 gredo c615111 none<br />

[-----] ( 14) #129.32.92.71 jabba c615111 none<br />


[-----] ( 15) #129.32.92.76 jawa c615111 none
[-----] ( 16) #129.32.92.83 lando c615111 none

[-----] ( 17) #129.32.92.84 leia c615111 none<br />

[-----] ( 18) #129.32.92.81 owin c615111 none<br />

[-----] ( 19) #129.32.92.70 rancor c615111 none<br />

=== Enter s(elect) | d(e-select) | c(ontinue): s<br />

=== Host From (0 to continue) #: 1<br />

To #: 4<br />

(129.32.92.82 ewok) selected.<br />

(129.32.92.66 luke) selected.<br />

(129.32.92.89 ackbar) selected.<br />

(129.32.92.69 r2d2) selected.<br />

=== Enter s(elect) | d(e-select) | c(ontinue):<br />

<strong>Synergy</strong> V3.0 : Host Selection Utility<br />

=Status=No.===IP Address=================Host Name==============Login=F Sys.=<br />

[-----] ( 1) 129.32.92.82 ewok c615111 none<br />

[-----] ( 2) 129.32.92.66 luke c615111 none<br />

[-----] ( 3) 129.32.92.89 ackbar c615111 none<br />

[-----] ( 4) 129.32.92.69 r2d2 c615111 none<br />

[-----] ( 5) #129.32.92.87 alliance c615111 none<br />

[-----] ( 6) #129.32.92.91 anakin c615111 none<br />

[-----] ( 7) #129.32.92.78 bantha c615111 none<br />

[-----] ( 8) #129.32.92.74 bobafet c615111 none<br />

[-----] ( 9) #129.32.92.80 c3p0 c615111 none<br />

[-----] ( 10) #129.32.92.88 chewbaca c615111 none<br />

[-----] ( 11) #129.32.92.86 droids c615111 none<br />

[-----] ( 12) #129.32.92.68 emperor c615111 none<br />

[-----] ( 13) #129.32.92.77 gredo c615111 none<br />

[-----] ( 14) #129.32.92.71 jabba c615111 none<br />

[-----] ( 15) #129.32.92.76 jawa c615111 none<br />

[-----] ( 16) #129.32.92.83 l<strong>and</strong>o c615111 none<br />

[-----] ( 17) #129.32.92.84 leia c615111 none<br />

[-----] ( 18) #129.32.92.81 owin c615111 none<br />

[-----] ( 19) #129.32.92.70 rancor c615111 none<br />

=== Enter s(elect) | d(e-select) | c(ontinue):<br />

[-----] ( 1) 129.32.92.82 ewok c615111 none<br />

[-----] ( 2) 129.32.92.66 luke c615111 none<br />

[-----] ( 3) 129.32.92.89 ackbar c615111 none<br />

[-----] ( 4) 129.32.92.69 r2d2 c615111 none<br />

[-----] ( 5) #129.32.92.87 alliance c615111 none<br />

[-----] ( 6) #129.32.92.91 anakin c615111 none<br />

[-----] ( 7) #129.32.92.78 bantha c615111 none<br />

[-----] ( 8) #129.32.92.74 bobafet c615111 none<br />

[-----] ( 9) #129.32.92.80 c3p0 c615111 none<br />

[-----] ( 10) #129.32.92.88 chewbaca c615111 none<br />

[-----] ( 11) #129.32.92.86 droids c615111 none<br />

[-----] ( 12) #129.32.92.68 emperor c615111 none<br />

[-----] ( 13) #129.32.92.77 gredo c615111 none<br />

[-----] ( 14) #129.32.92.71 jabba c615111 none<br />

[-----] ( 15) #129.32.92.76 jawa c615111 none<br />

[-----] ( 16) #129.32.92.83 l<strong>and</strong>o c615111 none<br />


[-----] ( 17) #129.32.92.84 leia c615111 none
[-----] ( 18) #129.32.92.81 owin c615111 none

[-----] ( 19) #129.32.92.70 rancor c615111 none<br />

=== Enter s(elect) | d(e-select) | c(ontinue): d<br />

=== Host From (0 to continue) #: 2<br />

To #: 3<br />

(luke, #129.32.92.66) de-selected.<br />

(ackbar, #129.32.92.89) de-selected.<br />

=== Enter s(elect) | d(e-select) | c(ontinue):<br />

<strong>Synergy</strong> V3.0 : Host Selection Utility<br />

=Status=No.===IP Address=================Host Name==============Login=F Sys.=<br />

[-----] ( 1) 129.32.92.82 ewok c615111 none<br />

[-----] ( 2) #129.32.92.66 luke c615111 none<br />

[-----] ( 3) #129.32.92.89 ackbar c615111 none<br />

[-----] ( 4) 129.32.92.69 r2d2 c615111 none<br />

[-----] ( 5) #129.32.92.87 alliance c615111 none<br />

[-----] ( 6) #129.32.92.91 anakin c615111 none<br />

[-----] ( 7) #129.32.92.78 bantha c615111 none<br />

[-----] ( 8) #129.32.92.74 bobafet c615111 none<br />

[-----] ( 9) #129.32.92.80 c3p0 c615111 none<br />

[-----] ( 10) #129.32.92.88 chewbaca c615111 none<br />

[-----] ( 11) #129.32.92.86 droids c615111 none<br />

[-----] ( 12) #129.32.92.68 emperor c615111 none<br />

[-----] ( 13) #129.32.92.77 gredo c615111 none<br />

[-----] ( 14) #129.32.92.71 jabba c615111 none<br />

[-----] ( 15) #129.32.92.76 jawa c615111 none<br />

[-----] ( 16) #129.32.92.83 l<strong>and</strong>o c615111 none<br />

[-----] ( 17) #129.32.92.84 leia c615111 none<br />

[-----] ( 18) #129.32.92.81 owin c615111 none<br />

[-----] ( 19) #129.32.92.70 rancor c615111 none<br />

=== Enter s(elect) | d(e-select) | c(ontinue):<br />

cid

This command starts the CID daemon on the local host.

Example:

[c615111@luke ~ ]>cid &<br />

[1] 23104<br />

[c615111@luke ~ ]> CID HOST NAME (luke)<br />

Actual CID IP(129.32.92.66)<br />

CID ready.<br />

[c615111@owin ~ ]><br />

[c615111@owin ~ ]>cid &<br />

[2] 240<br />


[c615111@owin ~ ]> CID HOST NAME (owin)<br />

Actual CID IP(129.32.92.81)<br />

Found an old CID.<br />

Removed an old CID<br />

Reusing cid entry.<br />

CID ready.<br />

[c615111@owin ~ ]><br />

delhost<br />

This comm<strong>and</strong> permanently deletes a host from the host file. It fails if the host is <strong>Synergy</strong><br />

ready. The [-f] option forces the removal.<br />

Syntax:<br />

[c615111@owin ~ ]>delhost [-f]<br />

Example:<br />

dhosts<br />

This comm<strong>and</strong> lets you permanently delete more than one host at a time. The -v option<br />

will verify the hosts' current <strong>Synergy</strong> connection status (it takes some extra time).<br />

Syntax:<br />

[c615111@owin ~ ]>dhosts [-v]<br />

Example:<br />

kds<br />

This comm<strong>and</strong> kills all remote daemons. It only kills the daemons started by your own<br />

login. It will NOT kill daemons started by others.<br />

pcheck<br />

Utility to check <strong>and</strong> maintain running parallel programs<br />


Syntax:<br />

[c615111@owin ~ ]>pcheck<br />

Example:<br />

pmd

This command starts the PMD daemon on the local host.

Example:

[c615111@ewok ~ ]>pmd &<br />

[1] 24172<br />

[c615111@ewok ~ ]><br />

[c615111@luke ~ ]>pmd &<br />

[2] 23106<br />

[c615111@luke ~ ]>PMD already running.<br />

[2] Exit 1 pmd<br />

[c615111@luke ~ ]><br />

prun

This command runs a parallel application: it checks the selected processor pool, configures the application from its csl file, assigns the programs and tuple space objects to hosts, and starts the distributed application controller.

Example:

[c615111@owin ~/example01 ]>prun tupleHello1<br />

== Checking Processor Pool:<br />

++ Benchmark (185) ++ (owin) ready.<br />

++ Benchmark (1487) ++ (rancor) ready.<br />

++ Benchmark (1482) ++ (saber) ready.<br />

== Done.<br />

== Parallel Application Console: (owin)<br />

== CONFiguring: (tupleHello1.csl)<br />

== Default directory: (/usr/classes/cis6151/c615111/example01)<br />

++ Automatic program assignment: (worker)->(owin)<br />

++ Automatic slave generation: (worker1)->(rancor)<br />

++ Automatic slave generation: (worker2)->(saber)<br />

++ Automatic program assignment: (master)->(owin)<br />

++ Automatic object assignment: (problem)->(owin) pred(1) succ(3)<br />

++ Automatic object assignment: (result)->(owin) pred(3) succ(1)<br />

== Done.<br />

== Starting Distributed Application Controller ...<br />

Verifying process [|(c615111)|*/tupleHello1Worker<br />


Verifying process [|(c615111)|*/tupleHello1Worker<br />

Verifying process [|(c615111)|*/tupleHello1Master<br />

Verifying process [|(c615111)|*/tupleHello1Worker<br />

** (tupleHello1.prcd) verified, all components executable.<br />

** (tupleHello1.prcd) started.<br />

== (tupleHello1) completed. Elapsed [5] Seconds.<br />

[c615111@owin ~/example01 ]><br />

sds<br />

This comm<strong>and</strong> starts daemons on selected hosts (defined in ~/.sng_hosts).<br />

sfs<br />

Example:<br />

shosts<br />

Example:<br />


Functions<br />

cnf_close(id)<br />

PURPOSE: Close all internal data structures according to type
PARAMETERS: int id – identifier of the object to be closed
RETURNS: Nothing

cnf_dget(tpname, tpvalue, tpsize)

PURPOSE: Destructive read of a tuple from a direct tuple space
PARAMETERS: char *tpname – the name of the object to be read from
char *tpvalue – address of receiving buffer
int tpsize – ?
RETURNS: int tpsize – the length of the data read in 8-bit bytes

cnf_dinit()

PURPOSE: Initializes the tid_list before each scatter operation
PARAMETERS: None
RETURNS: 1 always

cnf_dput(tsd, tid, tpname, tpvalue, tpsize)

PURPOSE: Inserts a tuple into a direct tuple space
PARAMETERS: int tsd
long tpsize
char *tid
char *tpname
char *tpvalue
RETURNS: ?

cnf_dread(tpname, tpvalue, tpsize)

PURPOSE: Destructive read of a tuple from a direct tuple space
PARAMETERS: int tpsize
char *tpname
char *tpvalue
RETURNS: int tpsize

cnf_dzap()<br />

PURPOSE: Removes all local CID's tuples
PARAMETERS: None
RETURNS: 1 if success or an error code otherwise

cnf_eot(id)

PURPOSE: Marks the end of tasks
PARAMETERS: int id - ?
RETURNS: 1 if success or an error code otherwise

cnf_error(errno)

PURPOSE: Prints to the user the kind of error encountered
PARAMETERS: int errno
RETURNS: 1 always

cnf_fflush(id)

PURPOSE: Flushes a file
PARAMETERS: int id – index into cnf_map to get channel #/ptr
RETURNS: 1 if success or 0 if error

cnf_fgetc(id, buf)<br />

PURPOSE: Read a char from file into buffer<br />

PARAMETERS: int id – index into cnf_map to get channel #/ptr<br />

char *buf; – address of receiving buffer<br />

RETURNS: 0 on EOF otherwise 1<br />

cnf_fgets(id, buf, bufsiz)

PURPOSE: Read a line from file into buffer
PARAMETERS: int id – index into cnf_map to get channel #/ptr
char *buf – address of receiving buffer
int bufsiz – max size of receiving buffer
RETURNS: 0 if EOF otherwise number of bytes read

cnf_fputc(id, buf)<br />

PURPOSE: Write a char from buffer to file
PARAMETERS: int id – index into cnf_map to get channel #/ptr
char buf – address of receiving buffer
RETURNS: 1 if success or 0 if error

cnf_fputs(id, buf, bufsiz)

PURPOSE: Write a line from buffer to file
PARAMETERS: int id – index into cnf_map to get channel #/ptr
char *buf – address of receiving buffer
int bufsiz – size of buffer
RETURNS: Number of bytes written or 0 if error

cnf_fread(id, buf, bufsiz, nitems)

PURPOSE: Read a 'record' from file into buffer
PARAMETERS: int id – index into cnf_map to get channel #/ptr
char *buf – address of receiving buffer
int bufsiz – max size of receiving buffer
int nitems – number of bufsiz blocks to read
RETURNS: 0 if EOF otherwise number of bytes read

cnf_fseek(id, from, offset)

PURPOSE: Set the read pointer from "from" to "offset" in a file
PARAMETERS: int id – index into cnf_map to get channel #/ptr
int from
int offset
RETURNS: 1 if success or 0 if error

cnf_fwrite(id, buf, bufsiz, nitems)<br />

PURPOSE: Write a 'record' from buffer into file
PARAMETERS: int id – index into cnf_map to get channel #/ptr
char *buf – address of receiving buffer
int bufsiz – max size of receiving buffer
int nitems – number of bufsiz blocks to write
RETURNS: Number of bytes written or an error code on error

cnf_getarg(idx)

PURPOSE: Returns the runtime argument by index
PARAMETERS: int idx – the index
RETURNS: char * (idx'th argument)

cnf_getf()

PURPOSE: Returns the factor value for loop scheduling
PARAMETERS: None
RETURNS: f value (0..100] integer

cnf_getP()

PURPOSE: Returns the number of parallel workers
PARAMETERS: None
RETURNS: P value [1..N] integer

cnf_gett()

PURPOSE: Returns the threshold value for loop scheduling
PARAMETERS: None
RETURNS: t value [1..N) integer
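These three calls are normally used together at the start of a master program to size the work distribution, as in the tutorial examples. The following minimal sketch assumes the usual Synergy set-up (a csl file that runs the program under prun); it is an illustration rather than a complete application.

#include <stdio.h>

/* Minimal master sketch: read the scheduling parameters and stop. */
main(){
int P, factor, threshold;
P = cnf_getP();          /* number of parallel workers                   */
factor = cnf_getf();     /* factor value from the csl file (chunk size)  */
threshold = cnf_gett();  /* threshold value from the csl file            */
printf("workers=%d factor=%d threshold=%d\n", P, factor, threshold);
cnf_term();
}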


cnf_gts(tsd)<br />

PURPOSE: Get all tid's processor assignments in one shot<br />

PARAMETERS: int tsd - ?<br />

RETURNS: 1 if success, 0 if no memory or an error code otherwise<br />

cnf_init()<br />

PURPOSE: Initializes sng_map_hd and sng_map using either the init file or
direct transmission from DAC. The init file's name is constructed
from the value of the logical name CNF_MODULE suffixed with ".ini".
PARAMETERS: None
RETURNS: Nothing if successful or an error code otherwise

cnf_open(local_name, mode)

PURPOSE: Lookup a pipe or tuple space object in the sng_map structure and open a
channel to the physical address for that ref_name
PARAMETERS: char *local_name – local_name to find in cnf_map
char *mode – open modes: r,w,a,r+,w+,a+. Only for FILEs
RETURNS: int chan – an integer handle, if successful, or an error code
otherwise. This is used like a usual Unix file handle.
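A program typically opens its tuple spaces once, right after start-up, and keeps the returned handles for all later tuple operations. The sketch below follows the style of the tutorial programs; the tuple space names "problem" and "result" are taken from those examples and must match the objects declared in the csl file.

#include <stdio.h>

/* Minimal sketch: open two tuple spaces declared in the csl file, then terminate. */
main(){
int tsd, res;
tsd = cnf_open("problem", 0);   /* problem tuple space handle */
res = cnf_open("result", 0);    /* result tuple space handle  */
printf("handles: problem=%d result=%d\n", tsd, res);
cnf_term();                     /* closes anything left open  */
}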

cnf_print_map()<br />

PURPOSE: ?<br />

PARAMETERS: None<br />

RETURNS: Nothing<br />

cnf_read(id, buf, bufsiz)<br />

PURPOSE: Read a 'record' from file or pipe into buffer (starting at address buff)
PARAMETERS: int id – index into cnf_map to get channel #/ptr
int bufsiz – max size of receiving buffer
char *buf – address of receiving buffer
RETURNS: 0 on EOF otherwise number of bytes read

cnf_rmall(id)<br />

PURPOSE: Destroy all tuples in a named tuple space<br />

PARAMETERS: int id - ?<br />

RETURNS: 0 if successful or an error code otherwise<br />

cnf_sot(id)<br />

PURPOSE: Marks the start of scattering of tasks
PARAMETERS: int id
RETURNS: 1 if successful or an error code otherwise

cnf_spzap(tsd)<br />

PURPOSE: Removes all "retrieve" entries in TSH<br />

PARAMETERS: int tsd - ?<br />

RETURNS: 1 if successful or an error code otherwise<br />

cnf_term()<br />

PURPOSE: Called before image return to clean things up. Closes any files left open.
PARAMETERS: None
RETURNS: Nothing

cnf_tget(tpname, tpvalue, tpsize)<br />

PURPOSE: Destructive read a tuple from a named tuple space<br />

PARAMETERS: int tpsize -<br />

char *tpname -<br />

char *tpvalue -<br />

RETURNS: int tpsize – the size of the tuple received if successful or an error<br />

code otherwise<br />


cnf_tsput(tpname, tpvalue, tpsize)<br />

PURPOSE: Inserts a tuple into a named tuple space<br />

PARAMETERS: int tpsize -<br />

char *tpname -<br />

char *tpvalue -<br />

RETURNS: ? on success or an error code otherwise<br />

cnf_tsread(tpname, tpvalue, tpsize)<br />

PURPOSE: Read a tuple from a named tuple space<br />

PARAMETERS: int tpsize -<br />

char *tpname -<br />

char *tpvalue -<br />

RETURNS: int tpsize – the size of the tuple received if successful or an error<br />

code otherwise<br />

cnf_tsget(id, tpname, tpvalue, tpsize)<br />

PURPOSE: Destructive read a tuple from a named tuple space<br />

PARAMETERS: int id -<br />

int tpsize -<br />

char *tpname -<br />

char *tpvalue -<br />

RETURNS: int tpsize – the size of the tuple received if successful or an error<br />

code otherwise<br />

cnf_tsput(id, tpname, tpvalue, tpsize)<br />

PURPOSE: Inserts a tuple into a named tuple space
PARAMETERS: int id -
int tpsize -
char *tpname -
char *tpvalue -
RETURNS: ? on success or an error code otherwise


cnf_tsread(id, tpname, tpvalue, tpsize)
PURPOSE: Read a tuple from a named tuple space
PARAMETERS: int id -
            int tpsize -
            char *tpname -
            char *tpvalue -
RETURNS: int tpsize – the size of the tuple received if successful or an error code otherwise
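The handle-taking variants pair naturally with cnf_open(). Below is a hedged worker skeleton; the logical name "tspace", the tuple names "task" and "result", and the convention that a non-positive size ends the loop are assumptions rather than the manual's prescribed protocol.

#include <string.h>
#include "cnf.h"   /* assumed header */

/* Consume task tuples and publish result tuples over one tuple-space channel. */
int worker_loop(void)
{
    char task[256], result[256];
    int  ts, n;

    ts = cnf_open("tspace", NULL);          /* logical name must match the configuration */
    if (ts < 0)
        return -1;

    for (;;) {
        n = cnf_tsget(ts, "task", task, sizeof task);   /* remove one task tuple */
        if (n <= 0)
            break;                          /* treat error / empty as "stop" in this sketch */
        memcpy(result, task, (size_t)n);    /* stand-in for real computation */
        if (cnf_tsput(ts, "result", result, n) < 0)
            break;
    }
    cnf_term();                             /* close anything still open before returning */
    return 0;
}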

cnf_write(id, buf, bytes)
PURPOSE: Send a 'record' to a file (or mailbox or DECnet channel) from the buffer (starting at address buf). bytes is the number of bytes to send. id is the index into the cnf_map global data structure where the actual channel number or file pointer is stored.
PARAMETERS: int id – index into cnf_map for channel #/ptr
            int bytes – number of bytes to send/write
            char buf[] – address of message to send
RETURNS: 1 if successful or an error code otherwise
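For illustration, a small wrapper that sends a text record down an output channel obtained from cnf_open(), treating anything other than the documented success value 1 as failure:

#include <string.h>
#include "cnf.h"   /* assumed header */

/* Send one text record; returns 0 on success, -1 otherwise. */
int send_line(int out, const char *line)
{
    /* The cast assumes cnf_write takes a non-const buffer, as documented above. */
    return cnf_write(out, (char *)line, (int)strlen(line)) == 1 ? 0 : -1;
}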

cnf_xdr_fgets(id, buf, bufsize, e_type)
PURPOSE: Read the external data representation (XDR) of a line from a file into the buffer (starting at address buf) and translate it to the native C representation.
PARAMETERS: int id – the index into the cnf_map global data structure where the actual channel number or file pointer is stored
            char *buf -
            int bufsize – the number of bytes to read
            int e_type -
RETURNS: 0 on EOF, the number of bytes read on success, or an error code on error

cnf_xdr_fputs(id, buf, bufsize, e_type)
PURPOSE: Translates a line to its external data representation (XDR) and sends it to a file from the buffer (starting at address buf).
PARAMETERS: int id – the index into the cnf_map global data structure where the actual channel number or file pointer is stored
            char *buf -
            int bufsize – the number of bytes to send
            int e_type -
RETURNS: int status – the number of bytes written, 0 on a write error, or an error code otherwise

cnf_xdr_fread(id, buf, bufsize, nitems, e_type)
PURPOSE: Read the external data representation (XDR) of a 'record' from a file into the buffer (starting at address buf) and translate it to the native C representation.
PARAMETERS: int id – the index into the cnf_map global data structure where the actual channel number or file pointer is stored
            char *buf -
            int bufsize – the number of bytes to read
            int nitems -
            int e_type -
RETURNS: int status – the number of bytes read, 0 on a read error, or an error code otherwise

cnf_xdr_fwrite(id, buf, bufsize, nitems, e_type)
PURPOSE: Translates a 'record' to its external data representation (XDR) and sends it to a file from the buffer (starting at address buf).
PARAMETERS: int id – the index into the cnf_map global data structure where the actual channel number or file pointer is stored
            char *buf -
            int bufsize – the number of bytes to send
            int nitems -
            int e_type -
RETURNS: the number of bytes written, an error code, or –1 on error
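A hedged sketch of pushing a block of doubles through the XDR file calls and reading it back. The meaning and legal values of e_type are not listed in this manual, so it is left as a caller-supplied argument, and the channel handles are assumed to come from cnf_open().

#include "cnf.h"   /* assumed header */

/* Round-trip nitems doubles through XDR-encoded file channels. */
int xdr_roundtrip(int out, int in, double *vals, int nitems, int e_type)
{
    int bytes = (int)(nitems * sizeof(double));

    if (cnf_xdr_fwrite(out, (char *)vals, bytes, nitems, e_type) < 0)
        return -1;                       /* -1 is the documented error value */
    if (cnf_xdr_fread(in, (char *)vals, bytes, nitems, e_type) == 0)
        return -1;                       /* 0 signals a read error, as documented above */
    return 0;
}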

cnf_xdr_read(id, buf, bufsize, e_type)
PURPOSE: Read the external data representation (XDR) of a 'record' from a file or pipe into the buffer (starting at address buf) and translate it to the native C representation.
PARAMETERS: int id – the index into the cnf_map global data structure where the actual channel number or file pointer is stored
            char *buf -
            int bufsize – the number of bytes to read
            int e_type -
RETURNS: int status – the number of bytes read, 0 on a read error, or an error code otherwise

cnf_xdr_tsget(tsh, tp_name, tuple, tp_len, e_type)
PURPOSE: Destructively reads the external data representation (XDR) of a tuple from a named tuple space and translates it to the native C representation.
PARAMETERS: int tsh
            char *tp_name
            char *tuple
            int tp_len
            int e_type
RETURNS: int status – the size of the tuple received if successful, 0 if it is an asynchronous read, or –1 on error

cnf_xdr_tsput(tsh, tp_name, tuple, tp_len, e_type)
PURPOSE: Translates a tuple to its external data representation (XDR) and inserts it into a named tuple space.
PARAMETERS: int tsh
            char *tp_name
            char *tuple
            int tp_len
            int e_type
RETURNS: int status – ? on success or an error code otherwise

cnf_xdr_tsread(tsh, tp_name, tuple, tp_len, e_type)
PURPOSE: Reads the external data representation (XDR) of a tuple from a named tuple space and translates it to the native C representation.
PARAMETERS: int tsh
            char *tp_name
            char *tuple
            int tp_len
            int e_type
RETURNS: int status – the number of bytes read on success; 0 on a read error, an error code, or –1 on error otherwise
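A short sketch of exchanging data through a tuple space in XDR form, which is what allows hosts with different native data formats to interoperate. The handle tsh is assumed to come from cnf_open(), the tuple name "vector" is illustrative, and e_type is again left to the caller because its legal values are not given here.

#include "cnf.h"   /* assumed header */

/* Publish an array of doubles as an XDR tuple, then consume it again. */
int publish_and_consume(int tsh, double *v, int n, int e_type)
{
    int len = (int)(n * sizeof(double));

    if (cnf_xdr_tsput(tsh, "vector", (char *)v, len, e_type) < 0)
        return -1;                                   /* error convention assumed */
    return cnf_xdr_tsget(tsh, "vector", (char *)v, len, e_type);
}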

cnf_xdr_write(id, buf, bufsize, e_type)
PURPOSE: Translates a 'record' to its external data representation (XDR) and sends it to a file (or mailbox or DECnet channel) from the buffer (starting at address buf).
PARAMETERS: int id – the index into the cnf_map global data structure where the actual channel number or file pointer is stored
            char *buf -
            int bufsize – the number of bytes to send
            int e_type -
RETURNS: 1 if successful, or an error code / –1 on error


Error Codes

TSH_ER_NOERROR   Normal operation - no error at all
TSH_ER_INSTALL   Error: Tuple space daemon could not be started
TSH_ER_NOTUPLE   Error: Could not find such tuple
TSH_ER_NOMEM     Error: Tuple space daemon out of memory
TSH_ER_OVERRT    Warning: Tuple was overwritten
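As a hedged example of reacting to these codes: the sketch below assumes the TSH_ER_* constants are visible through the same assumed header and that the tuple space calls return them directly on failure, which the "or an error code otherwise" wording suggests but does not spell out.

#include <stdio.h>
#include "cnf.h"   /* assumed header; also assumed to define the TSH_ER_* codes */

/* Distinguish "no such tuple" from daemon failures after a destructive read. */
int fetch_or_report(int tsh, char *buf, int len)
{
    int status = cnf_tsget(tsh, "result", buf, len);

    if (status == TSH_ER_NOTUPLE) {
        fprintf(stderr, "no matching tuple yet\n");
        return 0;
    }
    if (status == TSH_ER_NOMEM || status == TSH_ER_INSTALL) {
        fprintf(stderr, "tuple space daemon failure\n");
        return -1;
    }
    return status;   /* size of the tuple received */
}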

