Computer Science – Finematics
https://finematics.com
decentralized finance education

Compiled vs Interpreted Programming Languages – C, C++, Rust, Go, Haskell, C#, Java, Python, Ruby, Javascript
https://finematics.com/compiled-vs-interpreted-programming-languages/
Wed, 27 May 2020 16:23:52 +0000

Introduction

When it comes to code compilation and execution, not all programming languages follow the same approach. One common, although not ideal, way to differentiate them is to split them into two groups: compiled and interpreted languages.

The main goal of both compilation and interpretation is to transform the human-readable source code into machine code that can be executed directly by a CPU, but there are some caveats to it.

One of the main things we have to understand is that a programming language itself is neither compiled nor interpreted, but the implementation of a programming language is. In fact, there are many programming languages that have been implemented using both compilers and interpreters.

Java is a good example of such a language: Java’s source code is compiled to an intermediate representation called bytecode, which is then interpreted by the interpreter that is part of the Java Virtual Machine (JVM). This is the standard process in all of Java’s popular implementations.

Compiled Languages

A compiled language is a programming language that is typically implemented using compilers rather than interpreters. A compiler is a program that translates statements written in a particular programming language into another language, usually machine code. Instead of translating code on the fly, a standard compiler does all of its work ahead of execution time.

A good example of a compiled language is C++.

In C++, the source code is compiled into machine code. The compilation process consists of preprocessing, compiling and linking, and the end result is either a library or an executable that can be executed directly by the CPU that the program was compiled for.

The main benefit of compiled languages is the speed of execution, as the executable that contains machine code can be run directly on the target machine without any additional steps.

The main drawbacks are poor portability, as programs have to be compiled for a specific CPU architecture, and the long time that the actual compilation can take.

Other examples of popular compiled languages are C, Go, Haskell and Rust.

Interpreted Languages

An interpreted language is a programming language that is typically implemented using interpreters and doesn’t compile source code directly into machine code ahead of execution. The interpreter executes the program by translating each statement into a sequence of one or more subroutines and then into machine code. We can say that the interpreter translates programs on the fly instead of processing the whole program at once.

Even though an interpreter could translate source code directly into machine code, these days most interpreters work with an intermediate representation, called bytecode in most interpreted programming languages. This is because interpreting source code directly would be quite slow, and most interpreted languages benefit from first compiling into bytecode, which prepares and optimises the code for further interpretation into machine code.
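To make the bytecode step a bit more tangible, here is a minimal sketch using CPython’s standard `dis` module. The variable names (`price`, `quantity`, `shipping`) are made up for illustration, and the exact instructions printed differ between Python versions, so treat the output as an example rather than a specification.

```python
import dis

source = "total = price * quantity + shipping"

# compile() turns source text into a code object that holds bytecode.
code = compile(source, "<example>", "exec")

# Print the bytecode instructions the interpreter loop will execute.
for instruction in dis.get_instructions(code):
    print(instruction.opname, instruction.argrepr)

# The same code object can then be executed by the interpreter.
namespace = {"price": 4, "quantity": 3, "shipping": 2}
exec(code, namespace)
print(namespace["total"])  # 14
```

The source text is only parsed once; after that the interpreter works purely with the compact bytecode instructions.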

There are not many fully interpreted languages left. One notable example is JavaScript, which, depending on the implementation, can be fully interpreted. This means that the source code of the actual program is interpreted by the interpreter and translated into machine code on the fly. This feature was quite useful in JavaScript, as the code could easily be sent over the network and executed in the user’s browser.

Even though it is quite hard to find any popular language in the fully interpreted category, we can easily find plenty of them in the bytecode-interpreted one. Examples are Java, C#, Python and Ruby.

The main benefits of using an interpreted language are portability, as programs don’t have to be compiled for a specific CPU architecture, and a faster compilation process (for the language implementations that compile to bytecode).

The main drawbacks are usually slower execution speed and potential for leaking source code if the non-obfuscated source code is sent to the client.

Middle ground? JIT Compilation

So far it looks like both compiled and interpreted languages have their pros and cons.

What if I told you that you could still achieve the speed of a fully compiled language without sacrificing portability and fast compilation time? Sounds impossible? This is where JIT compilation comes into play.

JIT, or just-in-time, compilation is a hybrid between normal compilation (also called ahead-of-time compilation) and interpretation. Instead of translating each statement from the input file (which is usually bytecode) every time it runs, a JIT compiler stores already compiled machine code so it doesn’t have to translate it again.

JIT compilation works by analysing the code that is being executed (usually bytecode) and deciding which parts of it should be fully compiled to machine code, based on how often each piece of code is executed (and a few other factors).
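The “count how often something runs, then compile and cache it” idea can be sketched in a few lines of Python. This is only a toy under loud assumptions: `ToyJit` and `HOT_THRESHOLD` are hypothetical names, the “compiled” form is Python bytecode rather than real machine code, and only `+` and `*` are supported. Real JIT compilers are far more sophisticated.

```python
import ast

HOT_THRESHOLD = 3  # hypothetical: calls before an expression counts as "hot"


class ToyJit:
    """Evaluates an arithmetic expression, compiling it once it becomes hot."""

    def __init__(self, expression):
        self.tree = ast.parse(expression, mode="eval")
        self.calls = 0
        self.compiled = None  # cache for the compiled version

    def _interpret(self, node, env):
        # Slow path: walk the syntax tree on every call.
        if isinstance(node, ast.Expression):
            return self._interpret(node.body, env)
        if isinstance(node, ast.BinOp):
            left = self._interpret(node.left, env)
            right = self._interpret(node.right, env)
            if isinstance(node.op, ast.Add):
                return left + right
            if isinstance(node.op, ast.Mult):
                return left * right
        if isinstance(node, ast.Name):
            return env[node.id]
        if isinstance(node, ast.Constant):
            return node.value
        raise ValueError("unsupported expression")

    def run(self, **env):
        self.calls += 1
        if self.compiled is not None:
            return eval(self.compiled, {}, env)  # fast path: reuse cached code
        if self.calls >= HOT_THRESHOLD:
            # Hot: compile once and keep the result for later calls.
            self.compiled = compile(self.tree, "<jit>", "eval")
        return self._interpret(self.tree, env)


jit = ToyJit("x * 2 + y")
results = [jit.run(x=i, y=1) for i in range(5)]
print(results)                   # [1, 3, 5, 7, 9]
print(jit.compiled is not None)  # True
```

The first calls take the slow tree-walking path; once the call counter reaches the threshold, the expression is compiled a single time and every later call reuses the cached result.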

The main benefit of this approach is high execution speed, as all the critical and frequently executed code fragments are fully compiled into machine code. This comes at the cost of somewhat slower execution during the initial period, when the critical code fragments are still being analysed and are not fully compiled yet.

A full explanation of the JIT compilation process is outside of the scope of this video, but I’m thinking about creating another one dedicated to the JIT compilation as this is a super interesting process that not everyone fully understands.

Some of the languages and implementations that make use of JIT compilation are Java, C#, PyPy (an alternative Python implementation) and V8 (a JavaScript engine).

Summary

Let’s compare a few main characteristics of compiled, interpreted and JIT-compiled languages one by one.

characteristic | compiled | interpreted | JIT-compiled
execution speed | fast | slow | usually fast (depending on the JIT implementation)
portability | poor | good | good
compilation time | slow | fast (bytecode) | fast (bytecode)

As you have probably already noticed, splitting programming languages into compiled and interpreted is quite artificial, as there are not many fully interpreted languages left.

Most of the popular programming languages these days fit into one of three categories: compiled; compiled to bytecode and interpreted; and compiled to bytecode and interpreted with JIT compilation.

Extra

One more interesting fact before we wrap this up.

When it comes to programming languages with a multitude of different implementations, Python is one of the winners.

This is a non-exhaustive list of Python’s alternative implementations:

  • IronPython (Python running on .NET)
  • Jython (Python running on the Java Virtual Machine)
  • PyPy (Python with a JIT compiler)

If you have any questions about compiled and interpreted languages or any suggestions for the next videos, please comment down below.

UDP vs TCP. What are the differences?
https://finematics.com/udp-vs-tcp/
Tue, 21 Apr 2020 17:23:36 +0000

Introduction

UDP and TCP are both transport protocols used for communication between different hosts. They’re part of the transport layer in the broadly-known OSI model. The transport layer is responsible for delivering data to the correct application processes over a network.

Together, UDP and TCP account for pretty much all traffic on the Internet, no matter if you’re watching a movie on Netflix, browsing the web or checking your banking app.

Although both UDP and TCP are ultimately used for the same purpose (communication), there are many differences between them.

Before we jump into the comparison, let’s quickly review both of the protocols.

UDP

UDP, or the User Datagram Protocol, is a simple connectionless protocol that can be used to send messages, a.k.a. datagrams, between different systems. A UDP datagram consists of a header and a data section. The header consists of 4 fields: source port, destination port, length and checksum. Each of them is 2 bytes long, which makes the header 8 bytes in total.
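The header layout described above can be sketched with Python’s `struct` module. The helper names `build_datagram` and `parse_datagram` are made up for this illustration, and the checksum is left at 0 as a placeholder instead of being computed.

```python
import struct

# Four unsigned 16-bit fields in network (big-endian) byte order.
UDP_HEADER = struct.Struct("!HHHH")


def build_datagram(src_port, dst_port, payload, checksum=0):
    # The length field covers the header plus the payload.
    length = UDP_HEADER.size + len(payload)
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload


def parse_datagram(datagram):
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    return src_port, dst_port, length, checksum, datagram[UDP_HEADER.size:length]


datagram = build_datagram(5000, 53, b"hello")
print(UDP_HEADER.size)           # 8
print(parse_datagram(datagram))  # (5000, 53, 13, 0, b'hello')
```

In practice the operating system builds this header for you; the sketch is only meant to show how little there is in it.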

UDP is fast as it doesn’t have to establish a connection before it can start sending data. On top of that, it doesn’t use acknowledgements, which results in less data overall being sent over the network. The other characteristic that makes it fast is that it has no concept of flow control or congestion control, so the data is always sent immediately, even if the receiver or the network cannot keep up.

The main downside of UDP is that it’s not reliable. The way UDP sends messages is sometimes called fire and forget: the sender doesn’t care whether the data was successfully delivered, it doesn’t attempt to resend lost messages and it doesn’t wait for any acknowledgements. On top of potentially not delivering all the data, the messages can also arrive in a different order from the one they were sent in, or they can be duplicated. The only thing that UDP provides is integrity verification of the header and the payload, implemented using a checksum. If two systems want to communicate via UDP in a reliable way, they have to add reliability at the application level.
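Here is a small sketch of fire and forget using Python’s standard `socket` module on the loopback interface. The message text and the 1-second timeout are arbitrary choices for the example. On loopback the datagram will almost always arrive, but nothing in UDP itself guarantees it.

```python
import socket

# Receiver: bind to a free port on the loopback interface.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(1.0)  # do not block forever if the datagram is lost
address = receiver.getsockname()

# Sender: no connect() or handshake, sendto() just hands the datagram over.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"attack at dawn", address)

try:
    data, peer = receiver.recvfrom(1024)
    print(data)
except socket.timeout:
    data = None  # UDP gives the sender no signal that this happened
    print("datagram lost")
finally:
    sender.close()
    receiver.close()
```

Notice that the sender finishes its job without ever learning whether the receiver got anything. Any retry or acknowledgement logic would have to be written by the application.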

So what kind of applications use UDP?

UDP is used across different systems. These range from simple query-response protocols that don’t have to establish a connection at all, for example DNS or NTP, to time-sensitive applications that prefer to drop messages instead of waiting for packets delayed by retransmission, for example IPTV, VoIP or online games. Another group of systems that UDP is suitable for are broadcasting services, where one message has to be delivered to multiple recipients. This can easily be achieved with multicast, which UDP supports. Some examples are service discovery systems and routing protocols.

It’s also worth noting that UDP on its own is not suitable when all data must arrive at the destination in the correct order, so sending an important file or an email via plain UDP may not be the best idea.

Now, let’s move on to UDP’s biggest rival – TCP.

TCP

TCP, or the Transmission Control Protocol, is a complex, connection-oriented protocol that can be used for reliable communication between different systems. TCP divides data into chunks and adds a TCP header to each chunk, creating a TCP segment. The TCP header consists of multiple fields, including source and destination port, sequence number, acknowledgement number and checksum. The minimum size of the TCP header is 20 bytes.
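For comparison with UDP, the fixed 20-byte part of the TCP header can also be sketched with `struct`. The function name `parse_tcp_header`, the dictionary keys and the hand-built SYN segment are illustrative assumptions; options, which make the header longer than 20 bytes, are ignored here.

```python
import struct

# Fixed 20-byte part of the TCP header, in network byte order:
# ports (2 + 2), sequence (4), acknowledgement (4), data offset byte (1),
# flags byte (1), window (2), checksum (2), urgent pointer (2).
TCP_HEADER = struct.Struct("!HHIIBBHHH")

FLAG_FIN, FLAG_SYN, FLAG_ACK = 0x01, 0x02, 0x10


def parse_tcp_header(segment):
    (src, dst, seq, ack, offset_byte, flags,
     window, checksum, urgent) = TCP_HEADER.unpack_from(segment)
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "header_bytes": (offset_byte >> 4) * 4,  # data offset counts 32-bit words
        "syn": bool(flags & FLAG_SYN),
        "ack_flag": bool(flags & FLAG_ACK),
        "fin": bool(flags & FLAG_FIN),
    }


# A hand-built SYN segment with no options (data offset = 5 words).
syn = TCP_HEADER.pack(40000, 80, 1000, 0, 5 << 4, FLAG_SYN, 65535, 0, 0)
header = parse_tcp_header(syn)
print(TCP_HEADER.size, header["header_bytes"], header["syn"])  # 20 20 True
```

The sequence and acknowledgement numbers are the fields that make ordering and retransmission possible, which is exactly what the UDP header lacks.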

As a connection-oriented protocol, TCP establishes a connection between two parties before any data can be sent. It does this using a mechanism called a three-way handshake. When the parties decide they are done sending data, the connection can be terminated by another mechanism called a four-way handshake, with each side of the connection terminating independently.
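From an application’s point of view, both handshakes are hidden behind a couple of socket calls. The sketch below, again on the loopback interface and with an arbitrary message, shows roughly where they happen; the actual SYN, ACK and FIN segments are exchanged by the operating system, not by this code.

```python
import socket

# Server side: listen on a free loopback port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)

# connect() returns once the operating system has completed the handshake.
client = socket.create_connection(server.getsockname(), timeout=1.0)
connection, peer = server.accept()
connection.settimeout(1.0)

client.sendall(b"hello over tcp")
received = connection.recv(1024)
print(received)

# close() starts each side's half of the connection termination.
client.close()
connection.close()
server.close()
```

Compared with the UDP case, the sender cannot transmit anything until the connection exists, which is the price paid for the reliability features.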

TCP is a reliable protocol as it provides multiple features for guaranteed data delivery. The main features are retransmission of lost packets, congestion control, error detection, guaranteed ordering of packets and acknowledgements for delivered packets.

The main downsides are the usually lower speed of sending data and higher latency, as data doesn’t always get sent out immediately. Also, more data overall is sent over the network because of the acknowledgements.

TCP is one of the most commonly used protocols on the Internet. HTTP has traditionally been built on it, so whenever you’re browsing the web you’re most likely using TCP. Other examples include FTP for file transfers, SSH for remote sessions and SMTP for sending emails.

Time for the main comparison.

UDP vs TCP

Let’s compare UDP and TCP side by side.

UDP

  • message-oriented
  • smaller header – 8 bytes
  • not reliable (no retransmission, no error detection besides checksums, no guaranteed ordering, no congestion control)
  • usually faster with lower latency (no acks, no congestion control)
  • lower bandwidth consumption (no acks)
  • multicast

TCP

  • connection-oriented
  • bigger header – 20 bytes (minimum)
  • reliable (retransmission, error detection, guaranteed ordering, congestion control)
  • usually slower with higher latency (acks, congestion control)
  • higher bandwidth consumption (acks, retransmissions)
  • no multicast

Summary

Despite their major differences, UDP and TCP are both extremely useful in their own way. They offer inherent trade-offs when it comes to reliability, speed and simplicity.

The key to choosing which one is more appropriate for our system is knowing those trade-offs and understanding our system requirements.

If you have any questions about UDP and TCP or any suggestions for the next posts/videos, please comment down below.

Extra

TCP/IP Illustrated (Addison-Wesley Professional) → https://amzn.to/2VNYG69

Two Generals’ Problem
https://finematics.com/two-generals-problem/
Wed, 02 May 2018 12:59:14 +0000

What is the Two Generals’ Problem?

The Two Generals’ Problem, also known as the Two Generals’ Paradox or the Two Armies Problem, is a classic computer science and computer communication thought experiment that we’re going to talk about in this post.

First of all, to avoid any confusion, we need to remember that the Two Generals’ Problem, although related to the Byzantine Generals’ Problem, is not the same. The Byzantine Generals’ Problem is a more general version of the Two Generals’ Problem and it’s often discussed when talking about distributed systems, fault tolerance and blockchain. We’ll be talking about it in the following post.

But now let’s move to the story of the two generals.

The story of the Two Generals

Let’s imagine two armies, led by two generals, planning an attack on a common enemy. The enemy’s city is in a valley and has a strong defence that can easily fight off a single army. The two generals have to communicate with each other to plan a synchronised attack as this is their only chance to win. The only problem is that to communicate with each other they have to send a messenger across the enemy’s territory. If a messenger is captured, the message he’s carrying is lost. Also, each general wants to know that the other general knows when to attack. Otherwise, a general wouldn’t be sure whether he’s attacking alone and, as we know, attacking alone is rather pointless.

Now, let’s go through a simple scenario. Let’s call our generals A and B and let’s assume everything goes perfectly fine. General A, who is the leader, sends a message – “Attack tomorrow at dawn”. General B receives the message and sends back an acknowledgement – “I confirm, attack tomorrow at dawn”. A receives B’s confirmation. Is this enough to form a consensus between the generals? Unfortunately not, as General B still doesn’t know whether his confirmation was received by General A. OK, so what if General A confirms General B’s confirmation? Then, of course, that confirmation also has to be confirmed and we end up with an infinite exchange of confirmations.

In the second scenario, let’s again assume that General A sends a message to General B. Some time passes and General A starts wondering what happened to his message, as there is no confirmation coming back from General B. There are two possibilities here. Either the messenger sent by General A has been captured and hasn’t delivered the message, or B’s messenger carrying B’s confirmation has been captured. In both cases, they cannot come to a consensus, as A is not able to tell whether his message was lost or whether it was B’s confirmation that didn’t get through. Again, we end up in an inconsistent state which could result in either General A or General B attacking by himself.

We can quickly realise that no matter how many different scenarios we try and how many messages we send, we cannot guarantee that consensus is reached and that each general is certain that his ally will attack at the same time. To make it even worse, this is not just a matter of finding a cleverer protocol: the Two Generals’ Problem has been proven to be unsolvable.
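The core of the argument is that whoever sends the last message cannot tell whether it arrived. A tiny simulation can illustrate this. The function `exchange` and its bookkeeping are a made-up model for this post, not a formal proof: each general only “knows” how many messages he has received so far.

```python
def exchange(total_messages, last_message_lost):
    """Generals A and B alternate messages, A first.

    Returns how many messages each general has received, which is all
    the information a general has about the state of the exchange.
    """
    received = {"A": 0, "B": 0}
    for index in range(total_messages):
        receiver = "B" if index % 2 == 0 else "A"
        is_last = index == total_messages - 1
        if is_last and last_message_lost:
            break  # the final messenger is captured
        received[receiver] += 1
    return received


for total in range(1, 6):
    last_sender = "A" if total % 2 == 1 else "B"
    delivered = exchange(total, last_message_lost=False)
    lost = exchange(total, last_message_lost=True)
    # The last sender's view is identical in both worlds.
    print(total, last_sender, delivered[last_sender] == lost[last_sender])
```

Every line of output ends with `True`: however long the exchange is, the general who sent the final message sees exactly the same thing whether it was delivered or captured, so adding one more confirmation only moves the uncertainty to the other side.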

I hope you can clearly see the analogy to communication between computers here.

How is this related to computer science and TCP?

Instead of two generals, let’s imagine two computer systems talking to each other. The main problem here is again the unreliable communication channel and the inconsistent state between the two machines. A very common example that always comes up when talking about the Two Generals’ Problem is the TCP protocol.

TCP

As we probably know, TCP uses a mechanism called a four-way handshake to terminate a connection. In this mechanism, a system that wants to terminate a connection sends a FIN message. The system on the other side of the communication channel replies with an ACK and sends its own FIN message, which is followed by another ACK from the system that initiated the termination. When all of those messages are received correctly, both sides know that the connection is terminated. So far it looks OK, but the problem here is again the shared knowledge between the two systems. When, for example, the second FIN is lost, we end up with a half-open connection where one side is not aware that the connection has been closed. That’s why, even though TCP is a very reliable protocol, it doesn’t solve the Two Generals’ Problem.

So maybe a pragmatic approach?

I’m happy you’re not giving up. Unsurprisingly, a number of people have tried to solve the unsolvable Two Generals’ Problem and they came up with a few practical approaches. The main idea here is to accept the uncertainty of the communication channel and mitigate it to a sufficient degree.

Let’s go back to our generals. What if General A, instead of sending only 1 messenger, sends 100 of them, assuming that General B will receive at least 1 message? How about marking each message with a serial number from 1 to 100? General B, based on the missing numbers in the sequence, would be able to gauge how reliable the communication channel is and reply with an appropriate number of confirmations. These approaches, even though quite expensive, help the generals build up their confidence and come to a consensus.
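To get a feel for why sending many messengers helps, here is a back-of-the-envelope calculation. It assumes, purely for illustration, that each messenger is captured independently with the same probability; the story itself doesn’t give us any such number.

```python
def delivery_probability(messengers, capture_probability):
    """Chance that at least one of the messengers gets through."""
    return 1 - capture_probability ** messengers


# Hypothetical capture probability of 90% per messenger.
for messengers in (1, 10, 100):
    print(messengers, delivery_probability(messengers, 0.9))
```

Under that assumption the chance of at least one delivery grows quickly with the number of messengers, but as long as the capture probability is above zero it never mathematically reaches 1. That is the pragmatic trade: more cost for more confidence, never certainty.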

If sacrificing messengers is a problem, we can come up with yet another approach where the absence of messengers builds up the generals’ confidence. Let’s assume that it takes 20 minutes to cross the valley, deliver a message and come back. General A sends a messenger every 20 minutes until he gets a confirmation from General B. As soon as a confirmation arrives, General A stops sending messengers. In the meantime, General B, after sending his messenger with the confirmation, waits for further messengers coming from General A, but this time the absence of a messenger builds up General B’s confidence, as this is what the generals agreed on.

In this case, we have a clear speed vs cost tradeoff and it’s up to us which approach is more suitable to our problem.

That’s the end of the story of the Two Generals. Time for a quick summary.

Summary

The Two Generals’ Problem is a classic computer science problem that remains unsolvable.

It comes up whenever we talk about communication over an unreliable channel.

The main problem is an inconsistent state caused by a lack of common knowledge.

There are some pragmatic approaches to the Two Generals’ Problem.

Extras

The Two Generals’ Problem was first published in 1975 in the paper “Some Constraints and Trade-offs in the Design of Network Communications”, which described a problem with communication between two groups of gangsters. If you want to read the original version check this link.

 

 

 

Actor Model Explained
https://finematics.com/actor-model-explained/
Tue, 17 Apr 2018 23:16:03 +0000

What is the Actor Model?

The Actor Model is a conceptual model of concurrent computation that originated in 1973. In the Actor Model, an actor is the fundamental unit of computation and everything is an actor.

The only allowed operations for an actor are:

  • to create another actor,
  • to send a message
  • or to designate how to handle the next message

The first two operations are quite easy to understand. Let’s focus on the last one. An actor can hold its own private state and it can decide how to process the next message based on that state. Let’s imagine an actor that stores the total balance of our account. If our actor receives a message with a new transaction, it updates its state by adding the new amount to the already calculated total balance, which means that for the next message the state of the actor will be different.
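The balance example can be sketched in a few lines of Python. `BalanceActor`, the message shapes `("transaction", amount)` and `("report", reply_list)`, and the idea of replying by appending to a list are all simplifications invented for this sketch; a real actor would reply by sending a message to another actor.

```python
from collections import deque


class BalanceActor:
    """Keeps a running account balance as private state."""

    def __init__(self):
        self._balance = 0        # private state, never shared directly
        self.mailbox = deque()   # messages waiting to be processed

    def send(self, message):
        self.mailbox.append(message)

    def process_next(self):
        # Handle exactly one message; the updated state decides what the
        # next message will see.
        kind, payload = self.mailbox.popleft()
        if kind == "transaction":
            self._balance += payload
        elif kind == "report":
            payload.append(self._balance)  # stand-in for sending a reply


account = BalanceActor()
replies = []
account.send(("transaction", 100))
account.send(("transaction", -30))
account.send(("report", replies))
while account.mailbox:
    account.process_next()
print(replies)  # [70]
```

Nothing outside the actor touches `_balance` directly; the only way to influence it is to put a message in the mailbox.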

Properties of the actors

Actors are lightweight and it is easy to create thousands or even millions of them as they require fewer resources than threads.

Let’s have a look at actors in more detail. Actors are isolated from each other and they do not share memory. They have state, but the only way to change it is for the actor to receive a message. Every actor has its own mailbox, which is similar to a message queue. Messages are stored in an actor’s mailbox until they are processed. Once created, actors wait for messages to arrive. Actors can communicate with each other only through messages. Messages are sent to actors’ mailboxes and processed in FIFO (first in, first out) order. Messages are simple, immutable data structures that can be easily sent over the network.

Conceptually an actor can handle only 1 message at a time. Actors are decoupled, they work asynchronously and they don’t need to wait for a response from another actor.
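A rough way to picture the mailbox and the one-message-at-a-time rule is a thread that drains a queue. The `Actor` class below is a minimal sketch built on Python’s standard `queue` and `threading` modules, not how Akka or Erlang actually implement actors, and it uses far heavier machinery (one OS thread per actor) than real actor runtimes do.

```python
import queue
import threading


class Actor:
    """Runs a handler on its own thread, one message at a time."""

    _STOP = object()  # sentinel that ends the message loop

    def __init__(self, handler):
        self._handler = handler
        self._mailbox = queue.Queue()  # FIFO mailbox
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        # Returns immediately; the sender never waits for processing.
        self._mailbox.put(message)

    def stop(self):
        self._mailbox.put(self._STOP)
        self._thread.join()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is self._STOP:
                break
            self._handler(message)


seen = []
logger = Actor(seen.append)
for number in range(5):
    logger.send(number)
logger.stop()  # waits until everything queued before the stop is handled
print(seen)    # [0, 1, 2, 3, 4]
```

Because only the actor’s own thread ever runs the handler, the handler needs no locks, even though `send` can be called from anywhere.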

Addresses

Actors have addresses, so an actor can send a message to another actor if it knows its address. An actor can only communicate with actors whose addresses it has. An actor has the addresses of the actors it has itself created and it can obtain other addresses from the messages it receives. One actor can have many addresses. We need to remember that an address is not the same as an identity, so two actors with the same identity do not necessarily have the same address.

Actors can run locally or remotely on another machine. This is completely transparent to the system, as actors communicate through addresses, which can be local or remote.

Fault tolerance

Now, let’s look at fault tolerance. In the Actor Model, actors can supervise other actors. An actor can supervise the actors it creates and can decide what to do in case of a failure. A supervisor can, for example, restart a supervised actor or redirect messages to another actor. This leads us to self-healing systems.
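A restart strategy can be sketched very simply. `Worker`, `Supervisor` and the `"boom"` message are hypothetical names for this illustration, and the calls are synchronous to keep the example short. Real frameworks offer several strategies and run supervision asynchronously.

```python
class Worker:
    """A child actor that fails on bad input."""

    def __init__(self):
        self.processed = []

    def receive(self, message):
        if message == "boom":
            raise RuntimeError("worker crashed")
        self.processed.append(message)


class Supervisor:
    """Creates a worker and restarts it whenever it fails."""

    def __init__(self):
        self.worker = Worker()
        self.restarts = 0

    def forward(self, message):
        try:
            self.worker.receive(message)
        except RuntimeError:
            # Restart strategy: replace the failed actor with a fresh one.
            self.worker = Worker()
            self.restarts += 1


supervisor = Supervisor()
for message in ["a", "boom", "b"]:
    supervisor.forward(message)
print(supervisor.restarts, supervisor.worker.processed)  # 1 ['b']
```

Note that the restarted worker starts with fresh state, which is why only `'b'` shows up at the end. Whether losing that state is acceptable is one of the decisions a supervisor strategy has to make.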

Pros and cons

Let’s have a look at the pros and cons of the Actor Model.

Pros | Cons
easy to scale | actors are susceptible to deadlocks
fault tolerance | overflowing mailboxes
geographical distribution |
not sharing state |

Implementations of the Actor Model

We need to remember that the Actor Model is only a conceptual model and a lot of the properties of your system depend on the chosen implementation. The best-known implementations of the Actor Model are Akka (Scala and Java) and Elixir (Erlang).

Akka ► https://akka.io

Elixir ► https://elixir-lang.org

Extras

If you want to learn more about the Actor Model, please check this great talk by Carl Hewitt.

If you want to learn more about different concurrency models including the Actor Model, I recommend the following book.
