Two Generals’ Problem

What is the Two Generals’ Problem?

The Two Generals’ Problem, also known as the Two Generals’ Paradox or the Two Armies Problem, is a classic computer science and computer communication thought experiment that we’re going to talk about in this post.

First of all, to avoid any confusion, we need to remember that the Two Generals’ Problem, although related to the Byzantine Generals’ Problem, is not the same thing. The Byzantine Generals’ Problem is a more general version of the Two Generals’ Problem and it’s often discussed in the context of distributed systems, fault tolerance and blockchain. We’ll cover it in a separate post.

But now let’s move to the story of the two generals.

The story of the Two Generals

Two Generals Problem

Let’s imagine two armies, led by two generals, planning an attack on a common enemy. The enemy’s city is in a valley and has a defence strong enough to easily fight off a single army. The two generals have to communicate with each other to plan a synchronised attack, as this is their only chance to win. The only problem is that to communicate with each other they have to send a messenger across the enemy’s territory. If a messenger is captured, the message he’s carrying is lost. On top of that, each general wants to know that the other general knows when to attack. Otherwise, a general couldn’t be sure he wasn’t attacking alone, and attacking alone is pointless.

Now, let’s go through a simple scenario. Let’s call our generals A and B and assume everything goes perfectly fine. General A, who is the leader, sends a message – “Attack tomorrow at dawn”. General B receives the message and sends back an acknowledgement – “I confirm, attack tomorrow at dawn”. A receives B’s confirmation. Is this enough to form a consensus between the generals? Unfortunately not, as General B still doesn’t know whether his confirmation was received by General A. Ok, so what if General A confirms General B’s confirmation? Then, of course, that confirmation also has to be confirmed, and we end up with an infinite exchange of confirmations.

In the second scenario, let’s assume again that General A sends a message to General B. Some time passes and General A starts wondering what happened to his message, as there is no confirmation coming back from General B. There are two possibilities: either the messenger sent by General A has been captured and never delivered the message, or B’s messenger carrying B’s confirmation has been captured. In both cases the generals cannot come to a consensus, as A is not able to tell whether his message was lost or whether it was B’s confirmation that didn’t get through. Again, we end up in an inconsistent state, which would result in either General A or General B attacking alone.

We can quickly realise that no matter how many different scenarios we try and how many messages we send, we cannot guarantee that consensus is reached and that each general is certain his ally will attack at the same time. To make it even worse, it can be proven that no protocol over such a channel can provide this guarantee, so the problem remains unsolvable.

I hope you can clearly see an analogy to computers’ communication here.

How is this related to computer science and TCP?

Instead of two generals, let’s imagine two computer systems talking to each other. The main problem here is again the unreliable communication channel and the inconsistent state between two machines. A very common example that comes up when talking about the Two Generals’ Problem is the TCP protocol.

TCP

As we probably know, TCP uses a mechanism called the 4-way handshake to terminate a connection. In this mechanism, the system that wants to terminate the connection sends a FIN message. The system on the other side of the communication channel replies with an ACK and sends its own FIN, which is followed by another ACK from the system that initiated the termination. When all of those messages are received correctly, both sides know that the connection is terminated. So far it looks fine, but the problem here is again the shared knowledge between the two systems. When, for example, the second FIN or the final ACK is lost, we end up with a half-open connection where one side is not aware that the connection has been closed. That’s why, even though TCP is a very reliable protocol, it doesn’t solve the Two Generals’ Problem.
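
To make the half-open situation concrete, here is a minimal toy sketch of the idea. It is not real TCP (real TCP retransmits and uses TIME_WAIT to cope with exactly this case); it is just a simplified model of the 4-way close over a lossy channel, where the loss probability is an assumption picked for illustration.

```java
import java.util.Random;

// Toy model of a TCP-style 4-way close over a lossy channel.
// This is a simplified illustration, not an implementation of real TCP.
public class HalfOpenDemo {

    static final Random RNG = new Random();

    // Returns true if the message survives the lossy channel.
    static boolean send(String message, double lossProbability) {
        boolean delivered = RNG.nextDouble() >= lossProbability;
        System.out.println(message + (delivered ? " delivered" : " LOST"));
        return delivered;
    }

    public static void main(String[] args) {
        double loss = 0.3;        // assumed per-message loss probability

        boolean aClosed = false;  // does A think the connection is fully closed?
        boolean bClosed = false;  // does B think the connection is fully closed?

        // A->B FIN, B->A ACK, B->A FIN, A->B final ACK
        if (send("A->B FIN", loss)
                && send("B->A ACK", loss)
                && send("B->A FIN", loss)) {
            aClosed = true;                            // A has seen B's FIN
            bClosed = send("A->B final ACK", loss);    // B only closes if the last ACK arrives
        }

        System.out.println("A thinks closed: " + aClosed + ", B thinks closed: " + bClosed);
        if (aClosed != bClosed) {
            System.out.println("Inconsistent state: a half-open connection.");
        }
    }
}
```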

So maybe a pragmatic approach?

I’m happy you’re not giving up. Unsurprisingly, a number of people have tried to tackle the unsolvable Two Generals’ Problem and they came up with a few practical approaches. The main idea is to accept the uncertainty of the communication channel and mitigate it to a sufficient degree.

Let’s go back to our generals. What if General A, instead of sending only 1 messenger, sends 100 of them, assuming that General B will receive at least 1 message? How about marking each message with a serial number from 1 up to 100? Based on the missing numbers in the sequence, General B would be able to gauge how reliable the communication channel is and reply with an appropriate number of confirmations. These approaches, even though quite expensive, help the generals build up their confidence and come to a consensus.
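
As a rough illustration of why redundancy helps, here is a minimal sketch that computes the chance that at least one of n messengers gets through, which is 1 − pⁿ. The per-messenger capture probability and the counts are made-up assumptions, not part of the original story.

```java
// Probability that at least one of n messengers survives a channel
// that captures each messenger independently with probability p.
public class MessengerRedundancy {

    static double atLeastOneDelivered(double captureProbability, int messengers) {
        // P(all captured) = p^n, so P(at least one delivered) = 1 - p^n
        return 1.0 - Math.pow(captureProbability, messengers);
    }

    public static void main(String[] args) {
        double p = 0.5; // assume each messenger is captured half the time
        for (int n : new int[]{1, 10, 100}) {
            System.out.printf("n=%d -> P(at least one delivered) = %.10f%n",
                    n, atLeastOneDelivered(p, n));
        }
        // With p = 0.5 and n = 100 the failure probability is 2^-100,
        // so small that the printed value rounds to 1 - tiny, but never exactly zero.
    }
}
```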

If sacrificing messengers is a problem, we can come up with yet another approach where the absence of messengers builds up the generals’ confidence. Let’s assume that it takes 20 minutes to cross the valley, deliver a message and come back. General A keeps sending a messenger every 20 minutes until he gets a confirmation from General B; as soon as a confirmation arrives, he stops. In the meantime, General B, after sending back his confirmation, waits for further messengers from General A, but this time the absence of a messenger builds up General B’s confidence, as this is what the generals agreed on.
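
This is essentially retransmission with a timeout. Below is a minimal, purely illustrative simulation of the scheme; the loss probability, the 20-minute round trip and the cap on attempts are assumptions chosen just to make the demo run.

```java
import java.util.Random;

// Toy simulation of the "resend every 20 minutes until confirmed" scheme.
public class RetryUntilConfirmed {

    public static void main(String[] args) {
        Random rng = new Random();
        double lossProbability = 0.4;   // chance a single crossing is intercepted
        int roundTripMinutes = 20;      // time for a messenger to go there and back
        int maxAttempts = 50;           // give up eventually so the demo terminates

        boolean confirmed = false;
        int attempts = 0;

        while (!confirmed && attempts < maxAttempts) {
            attempts++;
            // The round trip succeeds only if the messenger survives both crossings.
            boolean requestArrived = rng.nextDouble() >= lossProbability;
            boolean replyArrived = requestArrived && rng.nextDouble() >= lossProbability;
            confirmed = replyArrived;
        }

        if (confirmed) {
            System.out.println("Confirmation received after " + attempts
                    + " messenger(s), i.e. about " + attempts * roundTripMinutes + " minutes.");
        } else {
            System.out.println("No confirmation after " + attempts + " attempts.");
        }
    }
}
```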

In this case, we have a clear speed vs cost trade-off and it’s up to us to decide which approach is more suitable for our problem.

That’s the end of the story of the Two Generals. Time for a quick summary.

Summary

The Two Generals’ Problem is a classic computer science problem that remains unsolvable.

It comes up whenever we talk about communication over an unreliable channel.

The main problem is an inconsistent state caused by a lack of common knowledge.

There are some pragmatic approaches to the Two Generals’ Problem.

Extras

The Two Generals’ Problem was first published in 1975 in the paper “Some Constraints and Trade-offs in the Design of Network Communications”, where it described a communication problem between two groups of gangsters. The original paper is worth checking out if you want to read that version.

What are robo-advisors?

Have you ever come across the term robo-advisor? Have you ever wondered what it is? The first thing that comes to mind is probably the “robo” bit, but if you’re imagining a human-like metal creature, you couldn’t be further from the truth. Robo-advisors are digital financial advising platforms that make investment decisions with minimal or no human intervention. They use algorithms to allocate assets and manage portfolios. After the 2008 financial crisis, a lot of people lost trust in the financial sector and were reluctant to pay financial advisors hefty fees. This is when low-cost robo-advisors came into play and started attracting lots of customers. The other factor driving robo-advisors is that the majority of millennials are comfortable with online tools, so they expect a similar experience when it comes to investing.

robo-advisors


How do robo-advisors work?

So how exactly do robo-advisors work? Let’s get into the details.

Robo-advisors collect a client’s personal information, risk tolerance and investment goals. Based on that, they choose an appropriate portfolio structure. Most robo-advisors use Modern Portfolio Theory to allocate money between different types of assets, like stocks and bonds, maximising returns while minimising risk. Instead of investing money in particular stocks or bonds, robo-advisors prefer to choose exchange-traded funds (ETFs) to diversify their portfolios even further. The majority of robo-advisors also use tax-loss harvesting to optimise customers’ taxes.

The first thing a tech-savvy person might notice is that robo-advisors are not that clever. They allocate money based on predefined rules and rebalance your portfolio back to the base allocation whenever too much money ends up in one type of asset. Let’s look at an example. Let’s assume you are a fairly young investor and a robo-advisor decides to put 90% of your cash into a well-diversified portfolio of stocks and 10% into government bonds. If the value of your stock allocation rises to, let’s say, 95% of the value of your total portfolio, the robo-advisor rebalances your portfolio by selling some of your shares and buying more bonds to bring the levels back to 90/10. It is as simple as that. It’s not rocket science at all.
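
Here is a minimal sketch of that threshold rebalancing rule. The 90/10 target, the 5% drift threshold and the dollar amounts are assumptions for illustration, not how any particular robo-advisor is actually implemented.

```java
// Toy threshold rebalancing between a stock allocation and a bond allocation.
public class Rebalancer {

    static void rebalance(double stockValue, double bondValue,
                          double targetStockWeight, double driftThreshold) {
        double total = stockValue + bondValue;
        double stockWeight = stockValue / total;
        double drift = stockWeight - targetStockWeight;

        System.out.printf("Current allocation: %.1f%% stocks / %.1f%% bonds%n",
                stockWeight * 100, (1 - stockWeight) * 100);

        if (Math.abs(drift) < driftThreshold) {
            System.out.println("Within the threshold - no trades needed.");
            return;
        }

        // Amount of stock to sell (positive) or buy (negative) to get back to target.
        double stockToSell = stockValue - targetStockWeight * total;
        System.out.printf("Sell %.2f of stocks and buy the same amount of bonds.%n", stockToSell);
    }

    public static void main(String[] args) {
        // Stocks rallied, so a 90/10 portfolio has drifted to roughly 95/5.
        rebalance(95_000, 5_000, 0.90, 0.05);
    }
}
```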

Although the most popular robo-advisors work in this way, there is also a small subset of robo-advisors that use AI and machine learning to predict which investments will be the most profitable in the future. These types of robo-advisors are way more exciting, but they also carry a substantially higher risk. Of course, it all depends on how big a chunk of the portfolio is actively managed by AI, but from my perspective those robo-advisors start to resemble hedge funds more than anything else.

Different types of robo-advisors

As we know from the previous chapter, not every single robo-advisor is the same and there are some substantial differences between them. Let’s have a look at some of the common types of robo-advisors:

  • Standard robo-advisors – these robo-advisors work just as described in the previous section. They allocate money between stocks and bonds based on predefined rules and facilitate a passive investment strategy.
  • AI-driven robo-advisors – this is the new kid on the block. One of the most popular AI-driven robo-advisors is Responsive, which rebalances your portfolio automatically as the economy changes.
  • Theme-based investing – Motif is a good example of a robo-advisor which allows its customers to have greater control over where their money goes. Investors can choose from different theme-based investments in their Impact Portfolio, for example Sustainable Planet, Fair Labour or Good Corporate Behaviour.


What are the pros and cons of robo-advisors?

Like with everything there are always pros and cons. Let’s start with the pros:

  • lower management fees – robo-advisors offer lower management fees compared to traditional financial advisors. The reason is quite simple: there is minimal or no human time required to manage your portfolio, and you don’t need to pay machines (besides paying for electricity, space, maintenance and developers’ salaries…)
  • low barrier to entry – some robo-advisors will look after your portfolio from as little as $1, which is particularly beneficial for millennials who are, as we know, not very keen on saving money.
  • automated process – your portfolio is managed online, your balance and chosen investments are visible at all times, and you don’t need to spend time meeting your financial advisor.

The cons:

  • lack of human interaction – for some people, especially the older generation, this is a disadvantage.
  • less personalised portfolios – most robo-advisors allocate your money based on personal information and risk tolerance, but they do not treat customers as individually as human financial advisors do. This might change with the help of AI.
  • lack of active investment – your portfolio is well diversified and passive. There is no space for excitement and some people find it very boring, but as George Soros said: “Good investing is boring”.


What are the biggest robo-advisors?

Globally, there are over 300 robo-advisors looking after people’s money, and the US accounts for the lion’s share of that market with over 200 of them. The biggest robo-advisors in the US are Betterment and Wealthfront. In the UK, the market leaders are Nutmeg, Moneyfarm and Wealthify. Some bigger players in the investment business have also started looking into the robo-advisor space and came up with their own solutions or acquired existing companies. The robo-advisor industry is expanding rapidly and its assets under management (AUM) are expected to grow to a stunning $4.6 trillion by 2022 (a prediction by BI Intelligence).

Summary

Robo-advisors seem to be a good solution for someone who’s looking for a passive investment strategy and doesn’t necessarily want to go into the details of their investments. They definitely fill a gap in the industry. From a technology point of view, robo-advisors seem to have been inevitable, and I’m not surprised they are taking over more and more of the financial advising business. In the future, we can only expect more solutions like this, and hopefully that will be beneficial for consumers. I’m also expecting a rise of AI-driven robo-advisors, which may come to dominate the wealth management space.

Algorithmic efficiency and Big O Notation

How to compare the efficiency of algorithms?

Let’s imagine you just wrote a new search algorithm. How would you measure its performance? How would you compare it with already existing algorithms? One of the most important characteristics of every algorithm is its time complexity. Measuring time complexity allows us to predict how long an algorithm takes to complete its execution, which is crucial for every algorithm whose input can vary in size. To be able to compare different algorithms we use asymptotic notation. Big O, Big Ω and Big Θ are all examples of asymptotic notation; they belong to the family of Bachmann–Landau notations and they can precisely describe the complexity of algorithms. Although all three notations are accurate ways of describing algorithms, software developers tend to use only Big O notation.

There is also a difference between what academics and non-academics mean by Big O notation. In the academic sense, Big O puts an upper bound on the algorithm. This means that an algorithm with an average time complexity of n can be described as \(O(n^2)\), as the algorithm is no slower than \(n^2\) (and it’s also no slower than \(n \log n\)). Software developers, however, tend to use the tightest bound possible when they say Big O. When a fellow programmer tells you that his algorithm has O(n) complexity, he means that on average it executes in linear time, not in logarithmic or constant time. In the academic setup that tight bound is described by Big Θ.

I know it’s a little bit confusing, but for some reason software developers use Big O as if it were Big Θ. It’s probably too late to change that now, so I think we should just accept the discrepancy. It’s also worth mentioning that very often programmers drop the “O” and just say that, for example, their algorithm has \(n^2\) complexity (in most cases they simply mean \(O(n^2)\)).

What is Big O notation?

As mentioned before, Big O is the tech industry standard for describing the complexity of algorithms. Big O notation can be used to describe both time and space complexity. In this article we’ll focus on time complexity, but it’s also important to understand space complexity. Time complexity describes how the runtime scales with the size of the input. Space complexity, on the other hand, describes how memory usage scales with the size of the input; it is useful for determining how much memory is needed to run our algorithms smoothly.

In Big O notation, the complexity of algorithms is described using well known mathematical functions. The most common ones are constant, logarithmic, linear, quadratic, exponential and factorial functions. Big O simplifies more complicated functions to show the growth rate of a function rather than every detail of it.

Average, best and worst cases

Big O notation can be used to describe the best, worst and average case. In most cases, we just focus on the average and worst scenarios, as the best-case execution is not that relevant. Let’s use bubble sort as an example. As we know, bubble sort is a simple sorting algorithm which compares each pair of adjacent items and swaps them if they are in the wrong order. We can easily imagine a situation where our input array is already sorted. In that situation, bubble sort goes through the input array only once, as no swaps are required, which means its best-case time complexity is O(n). What’s the probability of this scenario? From basic combinatorics we can calculate that it’s \(1/n!\) for a uniformly random input. If we have an array with just 10 random elements, our chances are \(1/3628800\). That’s why we almost never focus on the best performance. What matters in most cases is the average and worst-case performance. Although the worst case might also be very unlikely, it’s important to be aware of it, as it might degrade our algorithm and cause huge trouble (going from O(log n) to O(n) for a big enough input may freeze the whole system for a while). This is exactly why we focus mostly on the average and worst cases and not on the best one.
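
For reference, here is a minimal bubble sort sketch with the early-exit check that gives it the O(n) best case on already-sorted input. It’s a standard textbook version, not tied to any particular library.

```java
import java.util.Arrays;

// Bubble sort with an early exit: if a full pass performs no swaps,
// the array is already sorted and we stop - that's the O(n) best case.
// The average and worst cases remain O(n^2).
public class BubbleSort {

    static void bubbleSort(int[] array) {
        for (int pass = 0; pass < array.length - 1; pass++) {
            boolean swapped = false;
            // After each pass the largest remaining element has bubbled to the end.
            for (int i = 0; i < array.length - 1 - pass; i++) {
                if (array[i] > array[i + 1]) {
                    int tmp = array[i];
                    array[i] = array[i + 1];
                    array[i + 1] = tmp;
                    swapped = true;
                }
            }
            if (!swapped) {
                return; // no swaps in this pass: the array is sorted
            }
        }
    }

    public static void main(String[] args) {
        int[] data = {5, 1, 4, 2, 8};
        bubbleSort(data);
        System.out.println(Arrays.toString(data)); // [1, 2, 4, 5, 8]
    }
}
```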

Now let’s look at common ways of simplifying Big O notation.

Dropping constants

When using Big O and other asymptotic notations, it’s standard practice to drop all the constants. What’s the reason for this? Let’s have a look at an example. Let’s say our algorithm runs in \(3n^2\) time. How does the constant “3” influence our algorithm? Let’s have a look at the table below.

n | number of operations for 3n^2 | number of operations for n^2
1 | 3 | 1
10 | 300 | 100
100 | 30000 | 10000

As we can see, the constant changes the absolute numbers. However, we are interested in the rate of growth, and the major factor in that growth is \(n^2\), not the constant. To simplify Big O we omit constants.

Dropping non-dominant terms

Dropping non-dominant terms has exactly the same purpose as dropping constants. It simplifies the function and focuses on what is important, which is the rate of growth. Again, let’s have a look at a quick example: \(n^2 + n\)

n | number of operations for n^2 + n | number of operations for n^2
1 | 2 | 1
10 | 110 | 100
100 | 10100 | 10000

As we can see, as the input grows the non-dominant n term matters less and less and becomes irrelevant. That’s why we can safely drop it.

Common pitfalls

Different input variables

Let’s take a function which accepts two different lists and for every element in the first list it iterates through every element of the second list.

void printAll(String[] firstNames, String[] lastNames)

Some developers might be tempted to classify this as \(O(n^2)\), which is not true. Although we iterate through a nested loop, the two input lists are not necessarily of the same size, so the complexity is O(ab), where a is the length of the first list and b is the length of the second one. This is a very common mistake when calculating an algorithm’s complexity and we need to be able to avoid it.
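
A possible body for that printAll signature might look like the sketch below; the method body is my own illustration, not from the original post. Counting the work shows a·b iterations rather than n².

```java
// Illustrative body for the printAll signature above.
// The outer loop runs a times and the inner loop runs b times for each of them,
// so the total work is a * b pairs - O(ab), not O(n^2).
public class PrintAllExample {

    static void printAll(String[] firstNames, String[] lastNames) {
        for (String first : firstNames) {          // a iterations
            for (String last : lastNames) {        // b iterations per outer iteration
                System.out.println(first + " " + last);
            }
        }
    }

    public static void main(String[] args) {
        printAll(new String[]{"Ada", "Alan"},
                 new String[]{"Lovelace", "Turing", "Hamilton"}); // prints 2 * 3 = 6 lines
    }
}
```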

When to add and when to multiply

This is going to be our last rule for calculating Big O. Let’s imagine we have two loops in our algorithm. When should we add and when should we multiply their complexities? The rule is quite simple: if the algorithm finishes one loop and only then goes through the other one, we add; if for every element of the first loop the algorithm goes through every single element of the second loop, we multiply. Of course, adding doesn’t really bring any value here, as we drop constants afterwards anyway, so if we have 2 consecutive loops going through n elements our complexity is still O(n).
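
A minimal sketch of the two cases (the method names are just illustrative):

```java
// Two consecutive loops vs two nested loops over the same input.
public class AddVersusMultiply {

    // n + n = 2n operations -> drop the constant -> O(n)
    static void consecutiveLoops(int[] items) {
        for (int item : items) {
            System.out.println("first pass: " + item);
        }
        for (int item : items) {
            System.out.println("second pass: " + item);
        }
    }

    // n * n operations -> O(n^2)
    static void nestedLoops(int[] items) {
        for (int a : items) {
            for (int b : items) {
                System.out.println(a + ", " + b);
            }
        }
    }

    public static void main(String[] args) {
        int[] items = {1, 2, 3};
        consecutiveLoops(items); // 6 lines printed
        nestedLoops(items);      // 9 lines printed
    }
}
```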

Different complexities with examples

Let’s have a look at the chart with all of our common functions plotted.

As we can see the difference between every single function is huge. Let’s have a closer look at each complexity.

O(1) – constant

O(1) means that the complexity does not depend on the size of the input. No matter how big our input is, it never changes the running time of our algorithm. On the chart, it’s represented by a horizontal line.

n | number of operations
1 | 5
10 | 5
100 | 5

Although sometimes our constant might be quite big it’s still a constant and it doesn’t grow when the input changes.

Examples:

  • Searching for an element in a HashMap (although it can degenerate to O(n) if hashes are not efficient)
  • Adding two integers

O(log n) – logarithmic

O(log n) algorithms grow logarithmically with the input. What’s the base of the logarithm? The answer is that it’s not really relevant, as logarithms with different bases differ only by a constant factor, which we drop in Big O notation anyway. O(log n) algorithms are very often based on repeatedly halving the problem to find a result.

n | number of operations
1 | 0
10 | ≈ 3.32
100 | ≈ 6.64

Examples:

  • Binary search (see the sketch below)
  • Insertion, deletion and search in a binary tree (although it may degenerate to O(n))
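
As a concrete example of the halving idea, here is a minimal iterative binary search sketch over a sorted array; each step discards half of the remaining range, so it takes at most about log₂ n steps.

```java
// Iterative binary search over a sorted array.
// Each iteration halves the search range, giving O(log n) comparisons.
public class BinarySearch {

    // Returns the index of target in sortedArray, or -1 if it's not present.
    static int binarySearch(int[] sortedArray, int target) {
        int low = 0;
        int high = sortedArray.length - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2; // avoids overflow for large indices
            if (sortedArray[mid] == target) {
                return mid;
            } else if (sortedArray[mid] < target) {
                low = mid + 1;   // discard the lower half
            } else {
                high = mid - 1;  // discard the upper half
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] data = {2, 3, 5, 8, 13, 21, 34};
        System.out.println(binarySearch(data, 13)); // 4
        System.out.println(binarySearch(data, 7));  // -1
    }
}
```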

O(n) – linear

O(n) algorithms have a linear growth rate, which means their running time is directly proportional to the size of the input. Let’s have a look at the table below.

n | number of operations
1 | 1
10 | 10
100 | 100

Examples:

  • finding an element in an array by iterating through the whole array
  • printing all the elements of an array

O(n*log n) – quasilinear

O(n*log n) is a very common complexity of algorithms. It’s especially common amongst efficient sorting algorithms.

n | number of operations
1 | 0
10 | ≈ 33.22
100 | ≈ 664.39

Examples:

  • the average case of Quick Sort
  • fast Fourier transform

\(O(n^2)\) – quadratic

\(O(n^2)\) algorithms grow quadratically in relation to the input. This is where our algorithms start to become very inefficient, and it is a very common practice to try to reduce quadratic complexity to O(n log n) if possible.

Let’s have a look at another table. We can easily see that the rate of growth is quite fast.

n | number of operations
1 | 1
10 | 100
100 | 10000

Examples:

  • Bubble Sort
  • Insertion Sort

\(O(2^n)\) – exponential time

This is where our algorithms become really slow. Unfortunately, there is a set of problems that cannot be sped up any further and they end up having exponential time complexity.

n | number of operations
1 | 2
10 | 1024
100 | 1.267651e+30

Examples:

  • solving the travelling salesman problem using dynamic programming

O(n!) – factorial

The worst of them all. These types of algorithms are incredibly slow. Let’s have a look at the main chart. The orange line shoots up dramatically.

n | number of operations
1 | 1
10 | 3628800
100 | 9.332622e+157

Example:

  • solving the travelling salesman problem by the brute force method
  • generating all the possible permutations of a list (see the sketch below)
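
To see where the n! comes from, here is a minimal sketch that generates every permutation of a list recursively: for each of the n choices of the first element there are (n − 1)! permutations of the rest, which multiplies out to n!.

```java
import java.util.ArrayList;
import java.util.List;

// Generates all permutations of a list recursively.
// There are n choices for the first element, then (n-1)! arrangements of the rest,
// so the total number of permutations (and of printed lines) is n!.
public class Permutations {

    static void permute(List<String> remaining, List<String> chosen) {
        if (remaining.isEmpty()) {
            System.out.println(chosen); // one complete permutation
            return;
        }
        for (int i = 0; i < remaining.size(); i++) {
            List<String> nextRemaining = new ArrayList<>(remaining);
            String picked = nextRemaining.remove(i);
            List<String> nextChosen = new ArrayList<>(chosen);
            nextChosen.add(picked);
            permute(nextRemaining, nextChosen);
        }
    }

    public static void main(String[] args) {
        // 3 elements -> 3! = 6 permutations
        permute(List.of("A", "B", "C"), new ArrayList<>());
    }
}
```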

Why is it important to know all of this?

It’s all about being able to predict how well your algorithm performs as the input scales. When it comes to small inputs, it may not really matter; your bubble sort may work just as well as merge sort (or even faster). But when it comes to big inputs, complexity plays a huge role. Imagine that your algorithm accepts an array as its input, and that it spends one millisecond on every element it processes. Let’s pass an array with 1 million elements to our algorithm and assume we wrote 2 versions of it: one which works in O(log n) time and another one which works in O(n) time. This is how long it takes for the 2 versions of our algorithm to complete their execution.

O(n): 1 000 000 operations × 1 ms = 1 000 000 ms = 1000 seconds ≈ 16.7 minutes

O(log n): log₂(1 000 000) ≈ 20 operations × 1 ms ≈ 20 ms = 0.02 seconds

The numbers speak for themselves. There is nothing really to add here. Let me just quote the famous proverb “time is money” – it may cost our company a fortune if we decide to use inefficient algorithms.

Summary

As you can see, being able to determine the complexity of an algorithm is a crucial skill for a software developer (and for any other technologist too). There is a huge performance difference between particular algorithms and we always need to be aware of that. I hope this article convinced you that the complexity of algorithms and Big O notation matter and that it’s extremely important to understand them.

