ALGEBRA: the abstraction of pattern from the numerical universe

Algebra is often presented as though it were some sort of symbolic game, with increasingly sophisticated rules and conventions, which appeals to a small proportion of individuals but which is an incomprehensible exercise in futility to the majority.
A subset of those individuals who find they are good at the game end up 'teaching' the activity to the next generation...without bothering to investigate the epistemological purpose and meanings of the intellectual procedures involved...and so the aura of mystery, and the exasperation of the majority of initiates, is perpetuated.

This brief essay is an attempt to present a rationale to any moderately literate reader...one that might engender an appreciation of what the activity is all about, and why it has proved so powerful a tool in the hands and minds of expert practitioners.

In the 'Arithmetic' essay on this website, although the main focus was on numerical manipulations, it was nevertheless convenient at several stages to introduce a number of abstract symbols in order to express several general patterns.

alphabetic characters F, A, B, M, N, a, b, c, x,...
were used to represent undefined numbers.

an 'index' notation was used to represent various multiplications of ten
10 * 10 = 10^2     10*10*10 = 10^3     10*10*10*10 = 10^4... and so on...

and the symbol '√' was introduced to represent the idea of a 'square root'.

In this essay, all the ideas and patterns that were established in the 'Arithmetic' essay will be assumed, but the focus will now be more on the patterns that can be symbolized rather than on any numerical detail.


Ordinary 'algebra' is a written symbolic system that evolved from the aspiration of attempting to generalize and formalize all the realities and practicalities of manipulating numerical entities. Symbols were chosen, definitions adopted, rules agreed upon, conventions adhered to and logical realities recognized...all in a sufficiently coherent and integrated structure that the communication of generalized numerical statements might become as unambiguous as possible.


Conventions are generally accepted and common understandings about the manner in which social interactions and communications are carried out. They are usually historically justified rather than being based on irrefutable logic. Some of the most widely used mathematical symbol conventions are as follows...


a     b     c ... x     y     z         A     B     C ...     α     β     γ     δ ...
are generally used as symbols for numbers which are variables, parameters,
and invented 'entities' like matrices and points.

Where it is possible we often use characters [letters] which remind us what we are symbolizing, when the numbers being considered are connected to some sort of physical phenomena. In English, we might use the letter 'H' to stand for the Height of something, the letter 'r' to stand for the Radius of a circle, the symbol 'M' to stand for the Mass of something and so on. The letters 'x', 'y', 'z', 't', in particular get heavily used as variables for measurements in three dimensional space-time and in all manner of theoretical manipulations of equation statements.


as in Arithmetic, numerals are used as symbols for specific number counts.

    0     1    
in base 2 counting

    0     1     2     3     4     5     6     7     8     9    
in base 10 counting

  0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F  
in base 16 counting [hexadecimal]

For a detailed account of how the base ten numerals are traditionally used
refer to 'Arithmetic'.


Relational symbols indicate how two or more entities
would sequence if compared numerically.
The most common are:

equality   =          less than   <          greater than   >

thus '5 > 3' is a true statement
'5 < 3' is a false statement
and 'x = 8' asserts that x can be replaced by 8.

the following symbols are also often used
    ≠     not equal to
    ≥     greater than or equal to
    ≤     less than or equal to

and various others are defined in the context as the need arises.


Operators indicate specific arithmetic procedures that are to be performed.

The operators in each of the following pairs are inverses, in that each can reverse the effect of the other...

addition subtraction
+ -

thus all the vast infinity of arithmetic calculations like:
8 + 5 - 5 = 8     and     10 + 2 - 2 = 10     and     157 + 32 - 32 = 157
could all be summarized by the single symbolic formula

A + a - a = A

multiplication division
∗ ÷

and all the calculations like:
17 ∗ 6 ÷ 6 = 17     and     269 ∗ 39 ÷ 39 = 269     and     13 ∗ 2.7 ÷ 2.7 = 13
could all be summarized by the single symbolic formula

A ∗ a ÷ a = A

squaring square root
( )^2 √( )

'Squaring' is a common operation whereby an entity is multiplied by itself.
Thus all the operations like:
(10)^2 = 10 ∗ 10     and     (4)^2 = 4 ∗ 4     and     (2.5)^2 = 2.5 ∗ 2.5
could be summarized by the symbolic formula

        A^2 = A ∗ A

'Square root' is defined as the inverse ('reverse') of 'squaring',
so that operations like     √(10^2) = 10     and     (√7)^2 = 7
could all be summarized by symbolic formulae like:

   √((A)^2) = A         or         (√(A))^2 = A
( if one chooses to be pedantic about the brackets)
( here A is taken to be non-negative, since √ denotes the positive root)

but since 'A' is a single entity, the brackets around it are essentially redundant
and the formulae would be just as unambiguous if they were written as:

   √(A^2) = A         or         (√A)^2 = A

nth power nth root
( )^n ( )^(1/n)

'nth power' is an operation that extends the idea of 'squaring'
to allow for situations where n copies of an entity are multiplied together.
Thus 10^3 = 10*10*10       and       5^4 = 5*5*5*5       and       2^8 = 2*2*2*2*2*2*2*2
might be generalized by some sort of formula like:

A^n = A*A*A*...*A (n factors)

'nth root' is then defined to be the inverse of 'nth power',
so that it 'reverses' or 'undoes' the nth power operation
eg    (10^3)^(1/3) = 10         or         (2^(1/5))^5 = 2 etc
and this situation could be symbolized by such formulae as:

   (A^n)^(1/n) = A         or         (A^(1/n))^n = A

NB. Traditionally, the symbol 'x' has been (and still is) used for multiplication.

This has always provided the opportunity for ambiguity and confusion
between the use of 'x' as a numerical symbol and 'x' as an operational instruction.

With the advent of computers and computer languages
it was imperative that such uncertainty be resolved
so a decision was taken during the writing of the early languages
to use the 'asterisk' symbol '*' for multiplication
because it was conveniently available on the standard type keyboard.
The interpreter or compiler of a computer language statement like A = B*C
would translate the B*C part of the instruction as 'B multiplied by C'.
Many of these languages are still in use
and so the interpretation of '*' to mean multiplication is quite widespread.

For similar reasons, the symbol '/' is used for division
simply because it is available directly on the keyboard.

However, by far the most widespread presentation adopted
is to omit the multiplication sign altogether,
to assume that any adjacent alphabetic characters are multiplied,
and to place divisions below a horizontal line.

For this to work in complex situations, certain priority rules must be adhered to,
but for a simple case

 abc
-----
 xy

means 'a' multiplied by 'b' multiplied by 'c',
all divided by 'x' multiplied by 'y'


Punctuation marks and brackets are used to partition symbols and remove ambiguity.

Thus a (decimal) point partitions integers from fractions
345.678 means that 345 is the integer part and 678 the fractional part

Brackets ( ) [ ] { } etc package associated symbols
so that a(b + c) means that b and c must be added before being multiplied by a.

A division line partitions the top 'numerator' from the bottom 'denominator' so

 a + c
-------
 x + y

partitions the 'numerator' (a + c) from the 'denominator' (x + y)

Unicode and mathematical symbol library

Thousands of symbols are available from the Unicode library.
♠         ♬         ♁         ♃         ◔         ≆         ∯         ∀         and so on....
Mathematical symbols are a small subset of the total resource.
All/any of the symbols could be defined and interpreted mathematically
according to context and suitability


Rules are devised to conveniently resolve ambiguities and improve efficiencies.
Because they are often concerned with such matters as
the order in which a process is carried out,
the positional orientation of a symbol,
the omission (or not) of some relevant aspect
they can be the source of incomprehension to the uninitiated.
Making sense of algebra and arithmetic
requires an appreciation of at least the following few rules:


Adjacent alphabetic characters are assumed to be multiplied thus...

        abcdef         =         a ∗ b ∗ c ∗ d ∗ e ∗ f

Multiplication symbols are often omitted
so that multiplications are usually implied rather than explicitly symbolized

It is an attempt at efficiency and simplification
and is a satisfactory shortcut in most contexts.

If it is necessary to emphasize the multiplication operation for some reason
one of the symbols '∗' or '.' or 'x' is deliberately introduced

and so 'a multiplied by b'     might be symbolized either by
        'ab'     or     'a ∗ b'     or     'a.b'     or     'a x b'

Clearly such practices are not so logically irreproachable,
but this is the present linguistic reality of mathematical symbolism.


Adjacent numerals represent numbers using a positional rule
(described in some detail in 'Arithmetic')

In essence, adjacent digits differ in magnitude by a chosen 'base' multiplier,
and integers are separated from fractions by the use of the '.' symbol.

The magnitude of the represented number is found by adding all the contributions.

thus 735.69     represents     7∗10^2 + 3∗10 + 5 + 6/10 + 9/100     in base 10
and 101.01     represents     1∗2^2 + 0∗2 + 1 + 0/2 + 1/(2^2)     in base 2
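
The positional rule can be checked mechanically. The following is only a minimal sketch in Python (the function name `positional_value` is invented for this illustration): it adds up the contribution of each digit, using exact fractions so that no rounding creeps in.

```python
from fractions import Fraction

def positional_value(numeral: str, base: int) -> Fraction:
    """Add up the contribution of every digit in a positional numeral."""
    digits = "0123456789ABCDEF"
    whole, _, frac = numeral.upper().partition(".")
    total = Fraction(0)
    # Digits left of the point carry base^0, base^1, base^2 ... reading right to left.
    for power, ch in enumerate(reversed(whole)):
        total += digits.index(ch) * Fraction(base) ** power
    # Digits right of the point carry 1/base, 1/base^2 ... reading left to right.
    for power, ch in enumerate(frac, start=1):
        total += digits.index(ch) * Fraction(1, base ** power)
    return total

print(positional_value("735.69", 10))   # 73569/100
print(positional_value("101.01", 2))    # 21/4, which is 5.25 in base 10
```

The sketch does not check that every digit is smaller than the base; that is left out to keep it short.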


The rules for '+' and '-' signs arise from the consequences of numerical manipulations
and are justified and examined more carefully in 'Arithmetic'

Firstly, if no sign is physically present, the '+' sign is assumed
...thus either 'a' or '+a' means 'positive a'
whereas '-a' is the only option for 'negative a'

Secondly, when two signs are involved the following procedures are adopted

Like signs become positive

Thus '+(+a)' could be replaced by '+a' or simply 'a'
and '-(-a)' could also be replaced by '+a' or 'a'

'(+a) ∗ (+b)' could be replaced by '+ab' or 'ab'
'(-a) ∗ (-b)' could be replaced by '+ab' or 'ab'
'(+a)/(+b)' could be replaced by '+a/b' or 'a/b'
'(-a)/(-b)' could be replaced by '+a/b' or 'a/b'

Unlike signs become negative

Thus '+(-a)' could be replaced by '-a'
and '-(+a)' could also be replaced by '-a'

'(-a) ∗ (+b)' could be replaced by '-ab'
'(+a) ∗ (-b)' could be replaced by '-ab'
'(-a)/(+b)' could be replaced by '-a/b'
'(+a)/(-b)' could be replaced by '-a/b'

It is worth noting that an extension of these rules
for more than two multiplications or divisions
implies that
an even number of '-' signs will result in '+'
an odd number of '-' signs will result in '-'

so that (-a)*(-b)*(-c)*(-d)*(+e)     =     +abcde
whereas (-a)*(+b)*(-c)*(-d)*(+e)     =     -abcde


There are at least four words in common use
for a numeric symbol placed above and right of a base symbol

... index, power, exponent, logarithm ...

so that the four linguistic templates below are essentially equivalent

        [base]^(index)        [base]^(power)         [base]^(exponent)        [base]^(logarithm)

Here are a few very common examples

10^6                a^2                x^(1/2)                 (a+x)^n                 e^(-kt)                (1-x)^(-1/3)


The influence (scope) of an index is restricted to the immediately preceding base
and not to any other multiplying factors, coefficients or signs.

Thus the 2 of -5^2 applies only to the 5 and not to the - sign
so that -5^2 means -(5^2) which is equivalent to -(5*5) = -25

Brackets need to be used if the 2 is required to apply to the - sign as well as the 5
thus (-5)^2 = (-5)*(-5) = +25

It is important to realize that

-2x^2 does NOT mean (-2x)^2 or -(2x)^2

-2x^2 = -2*x*x
(-2x)^2 = (-2x)*(-2x) = (-2)*x*(-2)*x = +4*x*x = +4x^2

and -(2x)^2 = -(2x)*(2x) = -4x^2
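
Python happens to follow the same scope convention for its power operator `**`, so the distinctions above can be tried out directly. This is only an illustrative sketch; the value chosen for `x` is arbitrary.

```python
x = 3  # arbitrary sample value, chosen only for illustration

print(-5 ** 2)        # -25 : the index applies to the 5 only
print((-5) ** 2)      # 25  : brackets bring the sign inside the scope
print(-2 * x ** 2)    # -18 : -2*x*x
print((-2 * x) ** 2)  # 36  : +4*x*x
print(-(2 * x) ** 2)  # -36 : -4*x*x
```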

Operators with the smallest scope have priority

It is always desirable that brackets be used to remove ambiguity
but if circumstances are encountered where this has not been done
then operations with the least influence (scope) are performed first

Suppose for example, it was necessary to evaluate the following arithmetic expression

[8 + 5√(3^2 - 2)] / (4 ∗ 6)

The power operator '2' only operates on the 3
and the multiplication operator '∗' only influences the 4 and the 6
so these have computational priority thus

[8 + 5√(3^2 - 2)] / (4 ∗ 6)     =     [8 + 5√(9 - 2)] / 24

The '-' operator only operates on the 9 and the 2 to give 7
and then the √ only operates on this 7 so we end up with

[8 + 5√(9 - 2)] / 24     =     [8 + 5 ∗ 2.64575] / 24     (approx)

The '∗' operator only operates on the 5 and the 2.64575 to give 13.22876 approx
and then the '+' only operates on this and the 8

[8 + 5 ∗ 2.64575] / 24     =     [8 + 13.22876] / 24     =     21.22876 / 24

Finally the 'division' operator only operates on the 21.22876 and the 24

21.22876 / 24     =     0.88453 (approx)
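
The same evaluation can be followed one step at a time in code. The sketch below (variable names are invented for the illustration) performs the operations in the smallest-scope-first order described above.

```python
import math

inner = 3 ** 2 - 2            # the power acts on the 3 first, then 9 - 2 = 7
root = math.sqrt(inner)       # the root acts on the 7, about 2.64575
numerator = 8 + 5 * root      # multiply before adding, about 21.22876
denominator = 4 * 6           # 24
result = numerator / denominator

print(round(result, 5))       # 0.88453
```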


When the operations of         ∗         /         ( )^n         ( )^(1/n)         are applied
to [base]^(index) expressions and combinations
there are a number of important rules and interpretations that result
from ensuring that symbolic meanings remain consistent and are not contradictory

       MULTIPLICATIONS                 ADD indices
B^x ∗ B^y         =         B^(x + y)

the reason for this is related directly to the meaning of indices
B^4 means B ∗ B ∗ B ∗ B         and         B^3 means B ∗ B ∗ B
so that B^4 ∗ B^3         =         B ∗ B ∗ B ∗ B ∗ B ∗ B ∗ B         =         B^7

       DIVISIONS                 SUBTRACT indices
B^x / B^y         =         B^(x - y)

this is because division is the inverse of multiplication
and numerator/denominator pairs will neutralize one another
B^8 / B^5         =         (B ∗ B ∗ B ∗ B ∗ B ∗ B ∗ B ∗ B) / (B ∗ B ∗ B ∗ B ∗ B)         =         B ∗ B ∗ B         =         B^3

       INDEX OF ZERO                 UNITY
B^0         =         1

since any number (other than zero) divided by itself must equal 1
A / A         =         1
so         B^n / B^n        =         1        =         B^(n - n)        =         B^0 .

       INDEX OF ONE                 BASE
B^1         =         B

because of the meaning of indices         B^2 means B ∗ B
so that to make the first rule work we need to choose an index of one for B
thus         B^2         =         B^1 ∗ B^1

       INDEX OF MINUS ONE                 RECIPROCAL
B^(-1)         =         1/B

by definition, the product of reciprocals is one
thus 10 ∗ (1/10) = 1         7 ∗ (1/7) = 1         π ∗ (1/π) = 1
B^1 ∗ B^(-1)        =         B^(1 + (-1))        =         B^0        =         1
thus B^(-1) must be treated as the reciprocal of B

       REPEAT POWERS                 MULTIPLY indices
[B^x]^y         =         B^(xy)

[B^3]^4 means B^3 ∗ B^3 ∗ B^3 ∗ B^3
because the scope of the 4 applies to every thing inside the [ ] brackets
and B^3 ∗ B^3 ∗ B^3 ∗ B^3     =     B^(3+3+3+3)     =     B^(3 ∗ 4)     =     B^12

       FRACTIONAL INDEX                 POWER/ROOT
B^(x/y)         =         [B^x]^(1/y)         =         [B^(1/y)]^x

because x/y         =         x ∗ (1/y)         =         (1/y) ∗ x
are valid optional product equivalences

B^(x/y) can be interpreted either as
the yth root of B^x or as the xth power of the yth root of B

thus 8^(2/3)         =         [8^2]^(1/3)         =         [64]^(1/3)         =         4
or 8^(2/3)         =         [8^(1/3)]^2         =         [2]^2         =         4

These rules can be a little daunting to begin with
but as with any language and skill
repetition and training in the basics
is the only pathway to fluency and confidence
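
One way of building that confidence is to test each index rule on actual numbers. The sketch below is an illustration only: the base 3 and the indices 4 and 3 are arbitrary sample values, and exact fractions are used so that the whole-number rules can be compared with '=='. Fractional indices are computed in floating point, so they are compared with a tolerance instead.

```python
import math
from fractions import Fraction

B = Fraction(3)   # arbitrary sample base, held exactly
x, y = 4, 3       # arbitrary sample indices

assert B**x * B**y == B**(x + y)     # multiplications add indices
assert B**8 / B**5 == B**3           # divisions subtract indices
assert B**0 == 1                     # index of zero gives unity
assert B**1 == B                     # index of one gives the base
assert B**-1 == 1 / B                # index of minus one gives the reciprocal
assert (B**x)**y == B**(x * y)       # repeat powers multiply indices

# Fractional index: both routes to 8^(2/3) should land on 4 (within rounding).
root_of_power = (8 ** 2) ** (1 / 3)
power_of_root = (8 ** (1 / 3)) ** 2
print(math.isclose(root_of_power, 4), math.isclose(power_of_root, 4))  # True True
```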


A term is a group of numerals and/or alphabetic characters
connected by multiplications and/or divisions.

-2a^2b         5a/(3b)         12x^3y^4z^2         are examples of simple terms.

but because multiplication commutations are valid
( as established in 'Arithmetic' )

-2ba^2         (a ∗ 5)/(b ∗ 3)         12y^4z^2x^3         would be valid alternatives.

A factor is one of the multipliers of a term...thus...

1     -1     2     -2     a     -a     +b     -b
+2a     -2a     +2b     -2b     a^2     -a^2     +2a^2     -2a^2
+ab     -ab     +2ab     -2ab     +a^2b     -a^2b     +2a^2b    -2a^2b

could all be considered as possible factors of -2a^2b


to multiply two or more terms                (5a^2b)(-4ba)^3(-ab^3)
write it out in more detail                  (5a^2b)*(-4ba)*(-4ba)*(-4ba)*(-ab^3)
sort/commute the numbers and bases           5*(-4)*(-4)*(-4)*(-1)*a^2*a*a*a*a*b*b*b*b*b^3
apply the arithmetic and index rules         320a^6b^7
(with care and some practice
simple examples can be done mentally)


The index rules can be used to derive a useful rule
when dividing anything 'A' by a fraction 'N/D'

  A / [N/D]                      'Anything' / [Numerator/Denominator]
= A / [N ∗ D^(-1)]               since 1/D = D^(-1)
= A ∗ [N ∗ D^(-1)]^(-1)          since 1/[ ] = [ ]^(-1)
= A ∗ [N^(-1) ∗ D]               multiplying indices
= A ∗ [D ∗ N^(-1)]               commuting 2nd multiplication
= A ∗ [D/N]                      since N^(-1) = 1/N

thus to divide 'anything' by a fraction
invert the fraction and multiply

5 / (1/2)     =     (5/1) ∗ (2/1)     =     10/1     =     10

a^3 / (a/b)     =     (a^3/1) ∗ (b/a)     =     a^2b
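
The 'invert and multiply' rule is easy to confirm with exact fractions. In this sketch the helper name `divide_by_fraction` and the sample values for a and b are invented for the illustration.

```python
from fractions import Fraction

def divide_by_fraction(anything, n, d):
    """Divide 'anything' by the fraction n/d by multiplying by d/n instead."""
    return anything * Fraction(d, n)

# 5 divided by 1/2, done both ways
print(divide_by_fraction(5, 1, 2), 5 / Fraction(1, 2))   # 10 10

# a^3 divided by a/b should equal a^2 * b
a, b = 7, 3   # arbitrary sample values
print(divide_by_fraction(a**3, a, b) == a**2 * b)        # True
```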


invert any divisions and write as a single multiplication
then apply the rules of arithmetic and indices

given          (5a^2b/2) ∗ (10b/(15a)) / (3b^(-1)/(2a))

write out in fraction format, inverting any divisions
(5a^2b/2) ∗ (10b/(15a)) ∗ (2a/(3b^(-1)))

sort numerical and algebraic factors by commutation
(5 ∗ 10 ∗ 2 ∗ a^2b ∗ b ∗ a) / (2 ∗ 15 ∗ 3 ∗ a ∗ b^(-1))

simplify arithmetic and use index rules
(10 ∗ a^3b^2) / (9 ∗ a ∗ b^(-1))

use index rules
(10/9)a^2b^3


Expressions are simply sets of terms linked by '+' or '-' operators like

5x^2 + 2x - 3       or       ax^3 + bx^2 + cx + d       or       √x + 3x^(-1) - 2/y

when it is important that they are treated as independent entities
they will be enclosed in some form of brackets
so that multiplying two expressions together
could be symbolized something like the following

(X₁ + X₂ + X₃ + ... ) ∗ (Y₁ + Y₂ + Y₃ + ... )


The so-called 'Distributive Law' is simply a generalized statement which
specifies how one expression can validly multiply another expression.

To establish the most elementary symbolic form of this algebraic manipulation
consider a rectangle (dotted line) made up of two smaller rectangles (left and right)

area(whole rectangle) = area(left rectangle) + area(right rectangle)

so if the whole rectangle has breadth 'b', the left rectangle has length 'c'
and the right rectangle has length 'd', then...

area(entire dotted-line rectangle)     =     b ∗ ( c + d )
area(left rectangle) + area(right rectangle)     =     ( b ∗ c ) + ( b ∗ d )

and since both areas must be the same we can conclude that

b ∗ ( c + d )     =     ( b ∗ c ) + ( b ∗ d )

In this minimal form we think of the term 'b' being multiplied
by each of the terms in the expression (c + d)
no matter what order we take them in

for example:
-3x^(-2)y^2 ∗ (2xy^(-1) + x^2y^(-2))     =     (-3x^(-2)y^2) ∗ (2xy^(-1)) + (-3x^(-2)y^2) ∗ (x^2y^(-2))
     =     -6x^(-1)y - 3

(a^3 + a) a^(-1)     =     a^3 ∗ a^(-1) + a ∗ a^(-1)
     =     a^2 + 1

If the single term 'b' was replaced by a simple 2-term expression (a + b)
then the area analogy would involve a rectangle divided into four smaller rectangles.

Using the same lines of reasoning, the multiplication of two 2-term expressions would be

(a + b) ∗ (c + d)     =     (a ∗ c) + (a ∗ d) + (b ∗ c) + (b ∗ d)

the 'expansion' of a simple example would look something like this....

(x - 2y) ∗ (3a - 4b ) =     (x) ∗ (3a) + (x) ∗ (-4b) + (-2y) ∗ (3a) + (-2y) ∗ (-4b)
=     3ax - 4bx - 6ya + 8yb

Extending these ideas to include expressions with any number of terms

the Distributive Law asserts that
the valid multiplication of one expression by another
requires that every term of one expression
be multiplied by every term of the other


( x + y ) ∗ ( a + b + c) = xa + xb + xc + ya + yb + yc
( x + y + z ) ∗ ( a + b + c) = xa + xb + xc + ya + yb + yc + za + zb + zc
and so on
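
The 'every term by every term' pattern is exactly what a nested loop produces. The sketch below is illustrative only: each expression is represented as a plain list of term names, signs and coefficients are ignored, and the function name `expand` is invented for the purpose.

```python
from itertools import product

def expand(*expressions):
    """Pair every term of each expression with every term of the others."""
    return ["".join(choice) for choice in product(*expressions)]

print(expand(["x", "y"], ["a", "b", "c"]))
# ['xa', 'xb', 'xc', 'ya', 'yb', 'yc']

print(len(expand(["x", "y", "z"], ["a", "b", "c"])))
# 9
```

The number of terms produced is simply the product of the numbers of terms in the expressions being multiplied.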


The most logically secure method of simplifying additions and subtractions
is to identify 'common factors' and rewrite them as a distribution.

it is tempting to imagine that this is 'obvious' in simple cases like...

5a + 7a - 2a = 5 ∗ a + 7 ∗ a - 2 ∗ a
= (5 + 7 - 2 ) ∗ a
= 10a

but as things get more demanding, some caution and ingenuity is required...thus...

a^4b + 2a^3b^2 + a^2b^3     =     a^2b * a^2     +     a^2b * 2ab     +     a^2b * b^2
    =     a^2b * ( a^2     +     2ab     +     b^2 )
    =     a^2b * ( a^2     +     ab     +     ab     +     b^2 )
    =     a^2b * (a*(a + b) + b*(a + b))
    =     a^2b * ((a + b)*(a + b))
    =     a^2b(a + b)^2


A 'fraction' is essentially a term which involves at least one division.
The possible symbolisms are outlined in 'Arithmetic'
but are repeated here algebraically.

A 'numerator' divided by a 'denominator' might variously be expressed as

n/d                 n ÷ d                 n ∗ d^(-1)                 n : d

but the most convenient form is usually a horizontal line used as a bracket
where all the multiplication factors are written above it
and all the division factors below it thus:

 numerator                all multiplication factors
-----------     =     ----------------------------
denominator                all division factors

Terms with negative exponents are often rewritten in this manner thus:

                                     5ax^3y^2
5(a/b)x^3y^2(x+1)^(-4)     =     ------------
                                   b(x+1)^4

One of the most useful manipulation tools in algebra
results from the realization that
... anything multiplied by one is unaltered in value
... any factor multiplied by its inverse can be replaced by one

thus         F ∗ F^(-1)     =     F/F     =     1

n/d     =     (n/d) ∗ 1     =     (n/d) ∗ (F/F)     =     (n ∗ F)/(d ∗ F)

for any fraction...
multiplication of both numerator and denominator
by the same factor
does not change its value

numerator / denominator     =     [numerator ∗ (Factor)] / [denominator ∗ (Factor)]


If the denominators are the same,
the distributive law is all that is needed

fraction format          a/x     ±     b/x

indices format           ax^(-1)     ±     bx^(-1)

distributive law         x^(-1) [ a     ±     b ]

fraction format          (a ± b)/x

If the denominators are different,
the denominators must be manipulated to be identical
before the distributive law is used

two simple terms in fraction format          A/x     ±     B/y

now multiply both numerator and denominator of each fraction
by whatever factors are necessary
to ensure that all of the denominators are equal.

In this case, multiplying the first by (y) and the second by (x)
will ensure that both denominators are (xy).

denominators are now equal          (A ∗ y)/(x ∗ y)     ±     (B ∗ x)/(y ∗ x)

since the horizontal line acts as a bracket
the term 1/(xy) is distributed over the two terms (Ay) and (Bx)
so we obtain the equivalence

A/x     ±     B/y     =     (Ay ± Bx)/(xy)

More complex examples use exactly the same techniques.

Suppose we require a single fraction representation for the expression

3/(10a^2) + a/6 - b/a

write in fraction format          3/(10a^2)     +     a/6     -     b/a

now multiply both numerator and denominator of each fraction
by whatever factors are necessary
to ensure that all of the denominators are equal.

the 1st denominator has the factors '2,5,a,a'
the 2nd denominator has the factors '2,3'
the 3rd denominator has just 'a'

the single factor which includes all of these is     2 ∗ 3 ∗ 5 ∗ a ∗ a = 30a^2
so each denominator must be provided with the factors that it lacks thus...

(3 ∗ 3)/(10a^2 ∗ 3)     +     (a ∗ 5a^2)/(6 ∗ 5a^2)     -     (b ∗ 30a)/(a ∗ 30a)

since the horizontal line acts as a bracket
the term 1/(30a^2) is distributed over all three terms
so we can write the three numerators over the single denominator thus:

(9 + 5a^3 - 30ab) / (30a^2)
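
A manipulation like this can be sanity-checked by substituting numbers: if the single fraction (9 + 5a^3 - 30ab)/(30a^2) really is equivalent to 3/(10a^2) + a/6 - b/a, the two must agree for any non-zero a. The sketch below tries a few arbitrary integer pairs using exact fractions; the function names are invented for the illustration, and agreement on samples is a check, not a proof.

```python
from fractions import Fraction

def original(a, b):
    """3/(10a^2) + a/6 - b/a, evaluated exactly."""
    return Fraction(3, 10 * a**2) + Fraction(a, 6) - Fraction(b, a)

def combined(a, b):
    """(9 + 5a^3 - 30ab) / (30a^2), evaluated exactly."""
    return Fraction(9 + 5 * a**3 - 30 * a * b, 30 * a**2)

# arbitrary sample values (a must not be zero)
for a, b in [(1, 1), (2, 5), (-3, 7), (4, -9)]:
    assert original(a, b) == combined(a, b)

print("both forms agree on every sample pair")
```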


In mathematics, an expression with exactly two terms
can be referred to as a 'binomial'.
[ the Latin bi + nomen = two + names ]

The expression        a + b        is a binomial

We have already examined how the multiplication of two such expressions
is subject to a distribution process
where each term of one binomial multiplies each term of the other.
Brackets are used around each binomial in order to avoid ambiguity

( a₁ + b₁ )( a₂ + b₂ ) = a₁a₂ + a₁b₂ + b₁a₂ + b₁b₂

Note that each of the terms on the right hand side
has just one factor chosen from each of the binomials.
The two possibilities from the first binomial could each be combined
with the two possibilities of the second so that in total
there must be     2 ∗ 2     =     2^2     = 4     terms on the right hand side

If we now use the distribution process again
and multiply by a third binomial we obtain

( a₁ + b₁ )( a₂ + b₂ )( a₃ + b₃ ) = a₁a₂a₃ + a₁a₂b₃ + a₁b₂a₃ + a₁b₂b₃ +
b₁a₂a₃ + b₁a₂b₃ + b₁b₂a₃ + b₁b₂b₃

because again... each of the terms on the right hand side
has just one factor chosen from each of the binomials.

The two possibilities from the first binomial could each be combined
with the two possibilities of the second and then
with the two possibilities of the third...

1st      a₁                              b₁
2nd      a₂              b₂              a₂              b₂
3rd      a₃      b₃      a₃      b₃      a₃      b₃      a₃      b₃

so in total there must be
2 ∗ 2 ∗ 2     =     2^3     =     8 terms on the right hand side

We can now appreciate the consequences of further binomial multiplications.

If there are 'n' binomial multiplications, where 'n' is any +ve integer

( a₁ + b₁ )( a₂ + b₂ )( a₃ + b₃ )( a₄ + b₄ ) ...... ( aₙ₋₁ + bₙ₋₁ )( aₙ + bₙ )

    there will be 2ⁿ terms in the expansion
    and each term will have 'n' factors      
(exactly one from each of the different binomials)

thus ( a₁ + b₁ )( a₂ + b₂ )( a₃ + b₃ )( a₄ + b₄ )( a₅ + b₅ )
will have 2⁵ = 32 terms, each of which will have 5 factors

There are no short-cuts when such different binomials are multiplied
and the distributive process is all that is available if an expansion is necessary

( 3a + 4b )( 2p - 5q )( 2x - y ) = (3a + 4b)(2p∗2x - 2p∗y - 5q∗2x + 5q∗y)
  = (3a + 4b)(4px - 2py - 10qx + 5qy)
  = 3a∗4px - 3a∗2py - 3a∗10qx + 3a∗5qy
  + 4b∗4px - 4b∗2py - 4b∗10qx + 4b∗5qy
  = 12apx - 6apy - 30aqx + 15aqy
  + 16bpx - 8bpy - 40bqx + 20bqy

The increasing complexity of successive binomial multiplications
can be alleviated somewhat by deriving a formula rule... called a 'theorem'...
if and when we introduce some general simplifications, but we need to address two or three general considerations first...

• How many ways can a source group be ordered?[permutations]

• How many ways can a sub-group be selected from a source group?[combinations]

• What if all the 'a' terms were the same and all the 'b' terms were the same?


When we select members of a group...for various reasons...
there could be many different sequences or orders by which this might be done.
Sometimes the selection order is important...and sometimes it is not.

If the order is important the different choices are called 'permutations'.

Suppose for example we had a set of 4 brass digits { 1, 2, 5, 8 }
and wanted to use them as 2-digit street address numbers.

We have 4 choices for the first digit and then 3 choices for the second,
so that the total possible permutations are     4     *     3     =     12

12     15     18     21     25     28     51     52     58     81     82     85

whether the results are interpreted as decimal numbers, or unique house markers
the order of the digits is very significant.

Suppose now we wanted to use all 4 digits { 1, 2, 5, 8 } for a street address...
how many different addresses could we choose?
Now we have 4 choices for the 1st digit, 3 choices for the 2nd digit,
2 choices for the 3rd digit, and 1 choice for the 4th digit
i.e there are     4     *     3     *     2     *     1     =     24 permutations

1258     2158     5128     8125
1285     2185     5182     8152
1528     2518     5218     8215
1582     2581     5281     8251
1825     2815     5812     8512
1852     2851     5821     8521

Once again each is unique and the order is important.
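
Lists like these are tedious to write out by hand but trivial to generate. As a minimal sketch, Python's standard `itertools.permutations` function produces every ordered selection; the variable names are invented for the illustration.

```python
from itertools import permutations

digits = "1258"

# ordered selections of 2 digits out of the 4
two_digit = ["".join(p) for p in permutations(digits, 2)]
print(len(two_digit), two_digit)
# 12 ['12', '15', '18', '21', '25', '28', '51', '52', '58', '81', '82', '85']

# ordered selections using all 4 digits
four_digit = ["".join(p) for p in permutations(digits)]
print(len(four_digit))
# 24
```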

If we are considering a general group of 'n' different unspecified things
then we could symbolize the group thus

{ a₁,    a₂,    a₃, ...    aₙ }
( where n could be any +ve integer 1, 2, 3...)

Let us demonstrate the pattern that emerges
when we consider choosing ALL of a general set of unspecified things.

If we have a group of one object { a₁ } then we have one choice only
[choose a₁ and it is all over! ]

If we have a group of two objects { a₁, a₂ }
then we have two sequence choices...namely     a₁ a₂     or     a₂ a₁
[choose a₁ first and then a₂     or     choose a₂ first and then a₁]

For a group of three objects { a₁, a₂, a₃ } there are six permutations
[    a₁ a₂ a₃     a₁ a₃ a₂     a₂ a₁ a₃     a₂ a₃ a₁     a₃ a₁ a₂     a₃ a₂ a₁    ]
[there are 3 choices to begin with, then 2 choices, then only one ]
[i.e    3 ∗ 2 ∗ 1     =     6 possible selection sequences]

Extrapolating this selection process to the group of n objects
we would have ( n ) choices to begin with
then (n-1) choices, then (n-2) choices and so on
until only 1 choice was left
so that the total number of possible choice sequence permutations would be
(n) ∗ (n-1) ∗ (n-2) ∗ ... ∗ 3 ∗ 2 ∗ 1

this counting-number product occurs quite often,
is called 'n factorial' and is written in shorthand as     n!

When selecting ALL of a group of {n objects}
the number of possible choice permutations is

n!     =     (n) ∗ (n-1) ∗ (n-2) ∗ ... ∗ 3 ∗ 2 ∗ 1

eg the 6 objects { a, b, c, d, e, f} can be sequenced in     6!     different ways
where     6!     =     6 ∗ 5 ∗ 4 ∗ 3 ∗ 2 ∗ 1 = 720     (which is countable)

but the 26 objects {a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z}
can be sequenced in 26! different ways
or about 403 291 461 100 000 000 000 000 000     =     4.033*10^26 ways

Numbers like this are so large that they are more or less beyond comprehension.

Let us take a moment or two to grasp the implications.

Suppose a computer could do all the processing necessary
to choose and record just one permutation
zyxabcwvudeftsrghiqpojklmn         for example
in just one nano-second     =     10^(-9) sec

Then the total time necessary
for the computer to be able to write them all out
would be about 4.033*10^26 * 10^(-9) sec     =     4.033*10^17 seconds

and since there are about 3.156*10^7 sec in one Julian year
the total time necessary would be around
(4.033*10^17) / (3.156*10^7)     =     1.278*10^10 Julian years

but evidence to date seems to suggest that
the age of the universe is only about 1.38*10^10 years
so the calculated time necessary to write all the permutations
comes out at (1.278*10^10) / (1.38*10^10)    =     0.93 universe durations (approx).

We are definitely flirting with nonsense....
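
The size of 26! and the time estimate can be reproduced in a few lines. This is a sketch under the same assumption as the text, namely one permutation written every nano-second; the variable names are invented for the illustration.

```python
import math

count = math.factorial(26)            # exact number of orderings of 26 letters
seconds = count * 1e-9                # one permutation per nano-second (assumed rate)
julian_year = 365.25 * 24 * 60 * 60   # 31 557 600 seconds
years = seconds / julian_year

print(f"{count:.4e}")   # 4.0329e+26
print(f"{years:.2e}")   # 1.28e+10
```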

{ a1, a2, a3, a4 ... an }

Sometimes, order is basically unimportant when making selections from a source group.
Choosing reference books for a course of study is based on content of the books,
not the order in which they were chosen.
Selecting a representative sports team from a pool of association players
is based on ability and team strengths, not on the order of selection.
Selecting tools for a workshop is fundamentally a process based on the expected utility of the tools. The order in which the choices were made is likely to have a much lower significance.

A sub-group selection from a source group...
when order is irrelevant or unimportant...
is called a combination

eg     any of the selection sequences
a₁ a₃ aₙ     a₁ aₙ a₃     a₃ a₁ aₙ    a₃ aₙ a₁     aₙ a₁ a₃     aₙ a₃ a₁
will select the combination of an 'a₁' and an 'a₃' and an 'aₙ'
from the group { a₁, a₂, a₃, a₄ ... aₙ } equally validly

[ When you select a hammer and a saw and a drill for your toolbox,
...from the display shelves of the tool supply store...
the order of selection will usually be entirely irrelevant.]

How many other combinations of 3 could be selected from the source group?
( assuming n is greater than or equal to 4 )

The number of possible 3 selection sequences is simply (n)∗(n-1)∗(n-2)
because there are n possible entities for the first choice
then (n-1) possible entities for the second choice
and finally (n-2) possible entities for the third choice

Since each and every group of 3 can be validly selected
by (3)∗(2)∗(1) = 3! = 6 different sequences, as indicated previously...
the number of possible 3-group combinations must be equal to
the number of possible 3-sequence selections divided by 3!

number of possible 3-group combinations
from a source group { a₁, a₂, a₃, a₄ ... aₙ }
    =     [(n)∗(n-1)∗(n-2)] / [3 ∗ 2 ∗ 1]

thus for example:
number of possible 3-group combinations
from a source group { a₁, a₂, a₃, a₄, a₅ }
    =     (5 ∗ 4 ∗ 3) / (3 ∗ 2 ∗ 1)     =     10

Which are listed below...
a₁ a₂ a₃     a₁ a₂ a₄     a₁ a₂ a₅     a₁ a₃ a₄     a₁ a₃ a₅    
a₁ a₄ a₅     a₂ a₃ a₄     a₂ a₃ a₅     a₂ a₄ a₅     a₃ a₄ a₅    

In general, very similar considerations apply.
If n is a counting integer and r is less than or equal to n
and the symbolism nCr is used as shorthand for
'the number of possible r-group combinations
from an n-group source { a₁, a₂, a₃, a₄ ... aₙ }'

nCr     =     [(n)∗(n-1)∗(n-2)∗...∗(n-r+1)] / [(r)∗(r-1)∗...∗2∗1]

so that
number of possible 2-group combinations
from a 5-group source { a₁ a₂ a₃ a₄ a₅ }
    =     5C2     =     (5 ∗ 4) / (2 ∗ 1)     =     10

a₁ a₂     a₁ a₃     a₁ a₄     a₁ a₅     a₂ a₃    
a₂ a₄     a₂ a₅     a₃ a₄     a₃ a₅     a₄ a₅    

An awkward exception to this symbolism occurs if r = 0.
The formula as written then has no factors left to multiply,
either above or below the division line, so it gives no obvious value.
nC0 is thus defined to be equal to one,
which agrees with the observation that there is exactly one way of selecting nothing.
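
The counting argument can be compared against a brute-force listing. In the sketch below the helper name `n_c_r` is invented for the illustration: it multiplies r descending factors starting at n and divides by r!, while the standard library's `itertools.combinations` and `math.comb` provide independent checks. With the usual convention that a product of no factors is 1 (so that 0! = 1), the case r = 0 comes out as one without special treatment.

```python
from itertools import combinations
from math import comb, factorial

def n_c_r(n, r):
    """Number of r-group combinations from an n-group source."""
    numerator = 1
    for k in range(r):          # n * (n-1) * ... * (n-r+1), r factors in all
        numerator *= n - k
    return numerator // factorial(r)

source = ["a1", "a2", "a3", "a4", "a5"]

print(n_c_r(5, 3), len(list(combinations(source, 3))))   # 10 10
print(n_c_r(5, 2), comb(5, 2))                           # 10 10
print(n_c_r(5, 0))                                       # 1
```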

( using sub-group selections)

Let us consider multiplying 4 binomials together

( a₁ + b₁ )( a₂ + b₂ )( a₃ + b₃ )( a₄ + b₄ )

firstly...     there will be 2^4 = 16 terms in the expansion
secondly...     each term will have 4 factors        

Let us tabulate all 16 possible ways of choosing groups of 'a' factors from { a₁ a₂ a₃ a₄ }
(once the 'a' factors are chosen all the rest will automatically be 'b' factors)

number of a's     symbol     calculation                          count     terms

4 a's             4C4        (4 ∗ 3 ∗ 2 ∗ 1)/(1 ∗ 2 ∗ 3 ∗ 4)      1         a₁ a₂ a₃ a₄

3 a's             4C3        (4 ∗ 3 ∗ 2)/(1 ∗ 2 ∗ 3)              4         a₁ a₂ a₃ b₄
                                                                            a₁ a₂ b₃ a₄
                                                                            a₁ b₂ a₃ a₄
                                                                            b₁ a₂ a₃ a₄

2 a's             4C2        (4 ∗ 3)/(1 ∗ 2)                      6         a₁ a₂ b₃ b₄
                                                                            a₁ a₃ b₂ b₄
                                                                            a₁ a₄ b₂ b₃
                                                                            a₂ a₃ b₁ b₄
                                                                            a₂ a₄ b₁ b₃
                                                                            a₃ a₄ b₁ b₂

1 a               4C1        4/1                                  4         a₁ b₂ b₃ b₄
                                                                            a₂ b₁ b₃ b₄
                                                                            a₃ b₁ b₂ b₄
                                                                            a₄ b₁ b₂ b₃

0 a               4C0        = 1 by defn.                         1         b₁ b₂ b₃ b₄

If we now introduce the simplification
that all the a's are to be the same
and all the b's are to be the same
( a₁ + b₁ )( a₂ + b₂ )( a₃ + b₃ )( a₄ + b₄ ) will become ( a + b )( a + b )( a + b )( a + b )
the term a₁ a₂ a₃ a₄ will simply become a ∗ a ∗ a ∗ a = a^4
the term a₁ a₂ b₃ b₄ will simply become a ∗ a ∗ b ∗ b = a^2b^2
and so on... so we will be able to write that...

( a + b )^4 = a^4 + 4a^3b + 6a^2b^2 + 4ab^3 + b^4

Clearly a parallel procedure would work for further binomial multiplications
and if 'n' was any positive integer
a more general statement... the Binomial Theorem...could be expressed thus:

( a + b )^n = nCn a^n + nC(n-1) a^(n-1)b + nC(n-2) a^(n-2)b^2 + ... + nC1 ab^(n-1) + nC0 b^n

For simple powers of ( a + b ) it is not really as complicated as it looks:

( a + b )^5 = 5C5 a^5 + 5C4 a^4b + 5C3 a^3b^2 + 5C2 a^2b^3 + 5C1 ab^4 + 5C0 b^5

( a + b )^5 = a^5 + 5a^4b + 10a^3b^2 + 10a^2b^3 + 5ab^4 + b^5
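
The coefficients in these expansions, and the theorem itself, can be checked numerically. The following is a sketch only: the function names are invented for the illustration, `math.comb` is the standard library's version of nCr, and the values substituted for a and b are arbitrary.

```python
from math import comb

def binomial_coefficients(n):
    """Coefficients nCn, nC(n-1), ... nC0 in the order used above."""
    return [comb(n, r) for r in range(n, -1, -1)]

print(binomial_coefficients(4))   # [1, 4, 6, 4, 1]
print(binomial_coefficients(5))   # [1, 5, 10, 10, 5, 1]

def expand_binomial(a, b, n):
    """Sum of nCr * a^r * b^(n-r) over r = 0 ... n."""
    return sum(comb(n, r) * a**r * b**(n - r) for r in range(n + 1))

# arbitrary sample values: the term-by-term sum should equal (a + b)^n
print(expand_binomial(3, -2, 5) == (3 + -2) ** 5)   # True
print(expand_binomial(7, 4, 6) == (7 + 4) ** 6)     # True
```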

What is sometimes called 'ordinary' algebra,
is just an elaboration of the ideas and principles
that have been outlined above.
Confidence and skill, as with any language,
can only come with many hours of practice.

There are no 'practice examples' here in this essay
but following worked examples
...on the internet or with the assistance of artificial intelligence...
where an 'example expression' is modified to a 'solution expression'
in a series of justified step by step manipulations
will prove very helpful.