Quantum Machine Learning: Introduction to Quantum Systems

Quantum systems are similar to classical probability distributions, but they have certain properties that make them unique.

5 years ago   •   11 min read

By Peter Foy

Quantum computers have been making rapid advances over the last few years, and one of the main fields that quantum computers will be applied to is machine learning.

As you've probably seen, machine learning has also been making rapid advances in recent years, particularly in the field of deep learning.

One of the ways that we compute large deep neural networks with classical computers is with parallel processing.

Since parallel processing is already capable of handling many of these tasks, we want to use quantum computers for the extremely difficult machine learning models.

One example of this is probabilistic models over graph structures, as these models are very difficult to compute classically.

In this series on Quantum Machine Learning we're going to look at abstract algorithms (which will require near perfect quantum computers), as well as practical uses for the imperfect quantum computers we have today.

source: IBM Research

This guide is based on this course on Quantum Machine Learning by Peter Wittek, and is organized as follows:

  1. Classical Probability Theory
  2. Quantum States
  3. Measurement & Mixed States
  4. Evolution of Closed and Open Systems
  5. Summary of Quantum Systems

Before we get into quantum systems, let's first review classical probability theory.

Stay up to date with AI

We're an independent group of machine learning engineers, quantitative analysts, and quantum computing enthusiasts. Subscribe to our newsletter and never miss our articles, latest news, etc.

Great! Check your inbox and click the link.
Sorry, something went wrong. Please try again.

1. Classical Probability Theory

Probability theory is a cornerstone for machine learning.

Before we move onto quantum states, let's review a few important concepts in classical probability theory.

Let's start with the classic example of probability distribution: flipping a coin.

When we flip a coin there are two possible outcomes: heads or tails, and we can associate a probability with each outcome.

We have a probability \(P(X=0) = p_0\) for heads \(P(X=1) = p_1\) for tails with each toss of the coin.

Here's what we know about these values:

  • They are non-negative
  • They are real numbers
  • They sum up to 1, \(\sum_i p_i = 1\)

The plot below demonstrates all possible probability distributions of biased and unbiased coins:

We can also describe the probabilities in a vector:

\[\vec{p} = \begin{bmatrix} p_0 \\ p_1 \end{bmatrix}\]

Another important concept of classical probability is called entropy.

Entropy characterizes a probability distribution and says something about its unpredictability.

A probability distribution's entropy is defined as:

\[H(p) = - \sum_i p_i \log_2 p_i\]

Here's a visual representation of a probability distributions entropy:

As we can see, entropy peaks at the 50-50 unbiased coin, which is the uniform distribution.

This point represents the most unpredictable coin flip - it's 50-50 so there's no predictive power about what the next coin flip will be.

If the coin is biased either way it gives it probability predictive power about future coin flips.

3. Quantum States

A classical coin is a two-level system: it is either heads or tails.

A quantum state is a probability distribution, and the simplest case is a two-level state, which we call a qubit.

We can think of quantum states as probability distributions, but with certain properties that make them unique from classical probabilities.

Based on classical probability theory and the stochastic vector (which describes a probability distribution) we can now move on to quantum states.

The main difference with quantum states is that we are not restricted to real and non-negative real numbers.

Instead, entries in the vector are complex values.

A quantum state is similar to a stochastic vector and can be written as a column vector.

The label of a quantum state is called a ket in Dirac notation.

For example, for a qubit we can write:

|\psi\rangle = \begin{bmatrix}
a_0 \\
a_1 \\

One key difference to classical probability distributions and stochastic vectors is the normalization constraint.

The normalization of a quantum state vector doesn't happen in \(l_1\) norm, instead it happens in \(l_2\) norm.

We're still normalizing to one but with quantum states, it is the square of the absolute values that adds up to one.


We can now introduce notation for the 1 0 vector and for the 0 1 vector.

  • The 1 0 vector is called 0 ket, and the notation is \(|0\rangle\)
  • The 0 1 vector is called 1 ket, and the notation is \(|1\rangle\)

|0\rangle = \begin{bmatrix}
1 \\
0 \\
\end{bmatrix} \,\,\, |1\rangle = \begin{bmatrix}
0 \\
1 \\

Superposition is just the expansion of this vector in a basis.

In front of the 0 and 1 ket we can also have different coefficients: \(a0\) and \(a1\).

These coefficients are called probability amplitudes.

These coefficients no longer represent probabilities directly, instead it's the absolute value squared of these values that gives you a probability.

So once you get an outcome of 0, we know that it is in the 0 state, and if we get an outcome of 1 the state is going to be in 1 state.

This is referred to as the "collapse of the wave function".

So once we pull out a sample from the distribution and make an observation, we get a deterministic state.

So we have a 2-dimensional complex space, which would take 4 dimensions to visualize.

But since we have a restriction on the degree of freedom we can have a 3-dimensional object representing these qubit states.

The geometric representation is referred to as the Bloch sphere:

As we can see the north pole is represented by 0 ket and the south pole is identified by 1 ket.

Every single point on the surface of the sphere is a qubit state.

If we compare this to classical probability distributions where every single probability distribution lies on a straight light, you can see that we have a much larger representative power.

Here's what our \(|0\rangle\) looks like plotted on the Boch sphere using the Forest SDK from Rigetti:

There are several things we can do with quantum states which we cannot with classical computers.

One of which is called "interference".

Interference is a phenomenon where different basis vectors and the coefficients interact in our calculations.

If you want to read more about the role of interference in quantum computing, check out this article on the subject.

Multiple Qubits

Now that we've introduced the simplest possible quantum system, a qubit state, let's now look at larger probability distributions composed of multiple qubits.

First, we need to introduce a new mathematical operation: a tensor product.

As we can see from Wikipedia:

Given two vectors, we can form a tensor of their own from them rather naturally using the outer product, which is denoted $VW$ and equals $vw^T$.

Here's how we write the column vector describing two qubits with a tensor product.

Given two quits:

\(|\psi\rangle=\begin{bmatrix}a_0\\a_1\end{bmatrix}\) and \(|\psi'\rangle=\begin{bmatrix}b_0\\b_1\end{bmatrix}\), their product is \(|\psi\rangle\otimes|\psi'\rangle=\begin{bmatrix}a_0b_0\\ a_0b_1\\ a_1b_0\\ a_1b_1\end{bmatrix}\)

One important convention in most quantum computing libraries is that the rightmost qubit is the qubit 0, and to the left of it would be 1, and so on.

This is the same order of representing binary values in classical computers.

Now we know these are called "product states", but there are also states that cannot be written in this form, even though they live in the same space.

State that cannot be written as a product are called entangled states.

Entanglement is the mathematical form of describing a phenomenon of strong correlations between random variables.

This correlation exceeds what is possible with classical computation.

Together with interference, entanglement plays a very important role in quantum computing.

Later in this series we'll look at how we can exploit these quantum-mechanical properties in quantum calculations.

4. Measurement & Mixed States

Now that we've introduced quantum states as a generalization of classical probabilities, we need to discuss how we actually extract a certain event with a certain probability.

To understand this concept we need to introduce measurements.

Measurement connects the quantum realm to our classical world.

First we need to introduce some more notation to complement to ket.

As described earlier, a quantum state can be described as a column vector.

In the simplest form, a quantum state is a two-level system called a qubit, which is a vector of two complex elements.

The notation we need is called a bra and it is denoted by \(\langle\psi|\) for some quantum state \(|\psi\rangle\).

Together these form the bra-ket, or Dirac notation.

One thing that we can do with complex numbers that we cannot with real numbers is take their conjugate (this means we flip the sign of the imaginary component).

A bra is the conjugate transpose of a ket, and is written as the mirror of a ket.

This means that a bra is a row vector.

For example, this is the bra for \(|0\rangle\):

|0> ket:
<0| bra:
 [[1 0]]

This makes it esay to write dot products: for example, a bra followed by a ket is exactly what the dot product is.

This is so common that we drop one the vertical bars and just write, for example: \(\langle 0|0\rangle\).

As mentioned quantum states are normalized, so the inner product of a quantum state with itself is always 1.

On the other hand, if we have a ket and a bra this is going to result in a matrix of the outer products of the vectors.

For example, here's \(|0\rangle\langle 0|\):

zero_ket @ zero_ket.T.conj()
array([[1, 0],
       [0, 0]])

To summarize:

  • If we take a ket and a bra this gives us a matrix
  • If we take a bra and a ket this gives us a scalar value

Now that we have this understanding we can discuss measurement.


The intuition for measuring a quantum system is very similar to a random variable in classical probability theory.

In classical probability theory, random variables take values.

When measuring quantum states these take measurement outcomes and you always get a random outcome, the same way that random variables are intrinsically random.

To formalize this, we need to introduce the Born Rule.

The Born Rule tells us that we get some outcome 0 of the qubit state with a probability of the absolute value of \(|a_0^2|\).

The state afterward becomes 0 ket.

After measurement superposition is destroyed and we only get one part of the superposition. The part that we get is random with a certain probability.

As mentioned above, this is called the "collapse of the wave function".

To summarize, a measurement in quantum mechanics is an operator-valued random variable.

Many of the questions about measurements are still waiting to be solved, although most quantum computers today use one specific measurement.

This measurement is in the canonical basis.

The measurement contains two projections: \(|0\rangle\langle 0|\) and \(|1\rangle\langle 1|\).

This measurement can be applied to any of the qubits of the quantum computer.

From this description, it seems like there is a rigid boundary between a quantum and a classical system that is bridged by measurement.

Reality is, of course, a bit more complex.

Unless a quantum system is perfectly isolated, it still interacts with its surrounding environment.

This leads to the introduction of mixed states, which in one limit recover classical probabilities.

Mixed States

One of the characteristics of quantum computers today is that they have a lot of noise that affects the quantum states.

To understand this we need to introduce more notation.

Mixed states are states that factor in the noise that affects our current quantum computers.

As we saw a ket and a bra is a projection, and it is also a density matrix.

A density matrix is another way of writing a quantum state, instead of using ket.

For example, we could write \(\rho = |\psi\rangle\langle\psi|\), where \(\rho\) is the density matrix for \(|\psi\rangle\).

So why do we need these density matrices?

The reason is that we can also create probabilistic mixtures over pure states.

Every state that we've mentioned so far has been a pure state, meaning we have kets or a density matrix created as a ket and a bra.

But there are also other states called mixed states.

A mixed state is a classical probability distribution over pure states.

We can write a mixed state as \(\sum_i p_i |\psi_i\rangle\langle\psi_i|\), where \(\sum_i p_i=1\).

Ideally, our quantum state would be perfectly isolated from the environment, but the quantum computers we have today simply cannot achieve this level of isolation.

This means that coherences are slowly lost to the environment, a process which is called decoherence.

The speed of decoherence determines how long the quantum algorithms can run on the quantum computer.

5. Evolution of Closed and Open Systems

We now know what a quantum state is and how we can pull samples from the quantum state.

Let's now discuss how we do a quantum computation.

In other words, how do we evolve this probability distribution?

Let's start with closed systems.

Closed Quantum Systems

A closed system means that the evolution is going to be unitary.

This is an idealized version of a quantum calculation.

First, let's look at how we transform classical probability distributions.

Recall that we have stochastic vectors, and we transform them with stochastic matrices.

After the transformation, we still have a stochastic vector.

When it comes to quantum states, we want to apply an operation on the state and still return a quantum state.

The matrix that fulfills this condition has a unitary property.

Unitary means that if you apply its convex conjugate you get the identity.

This means that the complex conjugate of the matrix is its inverse.

This operation is linear since we're applying a matrix on a vector, and this becomes difficult when we want to apply nonlinear operations.

This idealized quantum system is in contrast with classical computing, as in digital computers you actually lose information as you proceed in a calculation.

In this quantum system, if we apply the inverse operation we get back the initial state that we started with.

Open Quantum Systems

The closed quantum system we just discussed is exactly what we want a universal quantum computer to acheive.

Our current quantum systems, however, are not closed systems.

Instead, they are open systems.

This means that if we have a system, for example a quantum processor, which interacts with its environment.

This causes the system to lose coherence as the system interacts with the environment, which happens largely in an uncontrolled way.

Since there is this interaction with the environment, the system's evolution is no longer a unitary operation.

To understand this, let's return back to the idea of decoherence.

We introduced density matrices - so when we take a density matrix of a pure state we can mix it with random noise.

We can control how much noise we have by tuning a parameter called invisibility between 0 and 1.

With invisibility of 1 we have a pure state, and as we get to 0 we get to end up with a maximally mixed state.

The speed at which this happens affects our quantum calculations, and this is sometimes referred to as the \(T_2\) time.

6. Summary of Quantum Systems

In this article we introduced the foundation for quantum machine learning, which is understanding quantum systems.

As we discussed quantum systems are similar to classical probability distributions, but they have certain properties that make them unique.

A quantum state is a probability distribution, and the simplest case is a two-level state called a qubit.

The main difference with quantum states is that we're not restricted to real and non-negative real numbers. Instead, entries in the vector are complex values.

We saw that interference, superposition, and entanglement play a very important role in quantum computing.

We then looked at how we measure a quantum system, which is very similar to a random variable in classical probability theory.

We saw that after measurement superposition is destroyed and we only get one part of the superposition. The part that we get is random with a certain probability.

Next we discussed one of the characteristics of quantum computers, which is that they have a lot of noise that affects the quantum states.

Mixed states are states that factor in the noise that affects our current quantum computers.

Ideally, our quantum state would be perfectly isolated from the environment, but current quantum computers cannot achieve this level of isolation.

Finally we looked at the evolution of closed and open systems.

An idealized universal quantum computer would exist in a closed system, but our current systems are open systems.

This means that we have a system that interacts with its environment, which causes it to lose coherence in a largely uncontrolled way.

Now that we have an understanding of quantum systems, in the next article we will discuss how we exploit these systems with quantum computation.


Spread the word

Keep reading