Wednesday, 20 October 2021

Distributed system design

 The user thinks they are talking to a single computer, but behind it there are many other machines, hardware and services working together. This collection of machines and services running in the background is called a distributed system.

Fallacies of distributed systems:-

The network is reliable

Latency is zero

Bandwidth is infinite

Topology doesn't change

The network is secure

There is one administrator

Transport cost is zero
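
Because the network is not reliable and latency is not zero, any remote call can fail or time out. One common way to cope is to retry with exponential backoff and jitter. This is a minimal Python sketch, not a production client: `TransientNetworkError` and `flaky_remote_call` are hypothetical stand-ins for a real network error and a real remote call, and the attempt count and delays are arbitrary.

```python
import random
import time


class TransientNetworkError(Exception):
    """Hypothetical error standing in for a dropped packet or a timeout."""


def call_with_retry(operation, max_attempts=4, base_delay=0.1, sleep=time.sleep):
    """Call `operation`, retrying on TransientNetworkError with exponential backoff.

    The delay doubles after each failure (base_delay, 2*base_delay, ...) and up to
    50% random jitter is added so many clients don't all retry at the same moment.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientNetworkError:
            if attempt == max_attempts:
                raise  # give up: the caller has to handle the failure
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))


# Usage: a fake remote call that fails twice before succeeding.
failures_left = [2]


def flaky_remote_call():
    if failures_left[0] > 0:
        failures_left[0] -= 1
        raise TransientNetworkError("simulated timeout")
    return "ok"


print(call_with_retry(flaky_remote_call, sleep=lambda s: None))  # prints "ok"
```

Retrying is only safe when repeating the request has no extra side effects, which ties into the communication failures listed further down.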


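Distributed system characteristics:-

- No shared clock - each node keeps its own time, so events on different nodes can't simply be ordered by their timestamps

- No shared memory - each component has its own storage

- Shared resources - hardware, software and data can be shared among the nodes

- Concurrency and consistency - nodes work at the same time, yet they must agree on a consistent view of shared data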
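
Since there is no shared clock, one well-known technique for ordering events is a Lamport logical clock: every node keeps a counter, increments it on each event, and on receiving a message jumps past the sender's timestamp. This Python sketch is one possible illustration of that idea, not something the notes above prescribe.

```python
class LamportClock:
    """Logical clock: orders events without relying on a shared physical clock."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """A local event happened: advance the counter."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event too; the returned value travels with the message."""
        return self.tick()

    def receive(self, msg_time):
        """On receipt, move past both our own clock and the sender's timestamp."""
        self.time = max(self.time, msg_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
a.tick()              # a: 1
t = a.send()          # a: 2, message carries 2
b.receive(t)          # b: max(0, 2) + 1 = 3
print(a.time, b.time)  # prints "2 3"
```

The guarantee is one-directional: if event X caused event Y, X gets a smaller timestamp, but a smaller timestamp alone does not prove causality.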

Distributed system communication:-

- Different parts of a distributed system need to be able to talk to each other.

- This requires an agreed-upon format and protocol.

- Lots of things can go wrong, and each of them needs to be handled:

--Client can't find the server.

--Server crashes in the middle of a request.

--Server processed the request, but its response never reached the client (network failure).

--Client crashes.
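
The third case is tricky: the client cannot tell whether the server crashed or only the response was lost, so it usually retries. One common safeguard is an idempotency key, where the client sends a unique ID per logical request and the server remembers the result for that ID. This Python sketch assumes a hypothetical `IdempotentServer` with a `deposit` operation and keeps the ID table in memory.

```python
import uuid


class IdempotentServer:
    """Hypothetical server that remembers results by request ID.

    If a response is lost and the client retries with the same ID, the server
    returns the stored result instead of applying the operation twice.
    """

    def __init__(self):
        self.balance = 0
        self._results = {}  # request_id -> result returned the first time

    def deposit(self, request_id, amount):
        if request_id in self._results:
            return self._results[request_id]  # duplicate: no new side effect
        self.balance += amount
        self._results[request_id] = self.balance
        return self.balance


server = IdempotentServer()
req_id = str(uuid.uuid4())   # client generates one ID per logical request
server.deposit(req_id, 100)  # processed, but suppose the reply is lost
server.deposit(req_id, 100)  # client retries with the same ID
print(server.balance)        # prints 100, not 200
```

In a real service the ID table would need to survive restarts and expire old entries. This sketch skips both.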

Benefits of distributed systems:-

- More reliable and fault tolerant - high availability

- Scalability - more machines or nodes can be added depending on traffic

- Lower latency, increased performance - requests are spread across many servers running in the background

- Cost effective

System design performance metrics:-

Scalability:-

--Ability of a system to grow and handle increased traffic, using horizontal scaling (adding more machines) or vertical scaling (making one machine more powerful).

--Increased traffic can mean a growing volume of data or of requests.
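
With horizontal scaling, traffic has to be spread over the added machines, typically by a load balancer. This is a minimal Python sketch of the simplest policy, round-robin. The server names are hypothetical, and real load balancers offer other policies too (least connections, hashing, and so on).

```python
import itertools


class RoundRobinBalancer:
    """Spread requests evenly over a pool of identical servers."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._cycle = itertools.cycle(self.servers)

    def add_server(self, server):
        """Scale out: add a node and restart the rotation so it is included."""
        self.servers.append(server)
        self._cycle = itertools.cycle(self.servers)

    def route(self, request):
        """Return (server, request) for the next server in the rotation."""
        return next(self._cycle), request


lb = RoundRobinBalancer(["app-1", "app-2"])
print([lb.route(i)[0] for i in range(4)])  # ['app-1', 'app-2', 'app-1', 'app-2']
lb.add_server("app-3")
print([lb.route(i)[0] for i in range(3)])  # ['app-1', 'app-2', 'app-3']
```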

Reliability:-

--Probability that a system will keep working without failure over a given period of time.

Failures can come from software or hardware. Both need to be monitored, for example with health-check scripts in production, and the necessary actions taken to reduce the chance of failure.

Automated tests can also be run against production to find any remaining bugs.

A product is reliable when it keeps working even after a hardware or software failure.
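
The health-check idea can be sketched as a small monitor that records probe results per node and flags a node only after several consecutive failures, so one slow response doesn't raise a false alarm. This is a hypothetical Python illustration. The node names are made up, the threshold is arbitrary, and how each probe is made (for example, calling a `/health` endpoint) is assumed and left out.

```python
class HealthMonitor:
    """Track node health from periodic probe results."""

    def __init__(self, nodes, threshold=3):
        self.threshold = threshold                  # consecutive failures before "unhealthy"
        self.failures = {node: 0 for node in nodes}

    def record(self, node, probe_ok):
        """Record one probe result; a success resets the failure count."""
        self.failures[node] = 0 if probe_ok else self.failures[node] + 1

    def healthy_nodes(self):
        return [n for n, f in self.failures.items() if f < self.threshold]


monitor = HealthMonitor(["db-1", "db-2"], threshold=2)
monitor.record("db-2", False)
monitor.record("db-2", False)
print(monitor.healthy_nodes())  # prints ['db-1']
```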

Availability:-

--Amount of time a system is operational during a given period. High availability is achieved through redundancy: if one server fails, requests are redirected to another server, so the service stays available to clients.

--A poorly designed system needs more downtime for updates. I remember that a decade ago, even for a small change we used to take the whole site down and update it early in the morning, when traffic was at its lowest.
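
Redirecting requests when a server fails (failover) can be sketched as trying replicas in order until one answers. In this Python sketch each replica is a plain function, and `ServerDown`, `primary` and `standby` are hypothetical names. A real setup would usually combine this with health checks, not wait for each call to fail.

```python
class ServerDown(Exception):
    """Hypothetical error raised when a server does not respond."""


def failover_call(servers, request):
    """Try each replica in order and return the first successful response."""
    errors = []
    for server in servers:
        try:
            return server(request)
        except ServerDown as exc:
            errors.append(exc)  # this replica failed: redirect to the next one
    raise ServerDown(f"all {len(servers)} replicas failed: {errors}")


def primary(req):
    raise ServerDown("primary unreachable")


def standby(req):
    return f"standby handled {req}"


print(failover_call([primary, standby], "GET /home"))  # prints "standby handled GET /home"
```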

Availability calculations:-

Availability % = ( available time / total time ) x 100

                        = 23/24 x 100 = 95.83%

                        If availability is 95.83%, the system is down 1 hour every day, i.e. 365 hours a year, or about 15.2 days of downtime annually.
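
The same arithmetic in code, so other availability targets can be checked quickly. This is a small Python sketch that assumes a 365-day year (leap years ignored).

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def availability_percent(available_time, total_time):
    """Availability % = (available time / total time) x 100."""
    return available_time / total_time * 100


def annual_downtime_hours(availability_pct):
    """Hours of downtime per year implied by an availability percentage."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR


a = availability_percent(23, 24)
print(round(a, 2))                         # 95.83
print(round(annual_downtime_hours(a), 1))  # 365.0 hours, about 15.2 days
for target in (99.0, 99.9, 99.99):
    print(target, round(annual_downtime_hours(target), 2), "hours/year")
```

The loop shows why each extra "nine" matters: 99% still allows about 87.6 hours of downtime a year, while 99.99% allows less than one hour.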

Reliable vs available:-

- A reliable system is always an available system, but an available system is not necessarily reliable.

- Availability can be achieved through a high-availability setup, i.e. by adding more servers, creating clusters and replicating those servers.

- Reliability is about detecting software or hardware failures early, so that traffic can be redirected to working software/hardware in time.

- Reliable software is more profitable because it doesn't require as many extra servers/clusters/replicas.

- Depending on your software, you may give more priority to availability or to reliability. For example, on a social media site, if posting a video fails, there is no need to worry; you can try again later. The system is less reliable, but you can live with that. On the other hand, for a plane in the air, reliability is more important than availability: you can't take the risk of relying on availability alone.
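
The point that availability comes from adding servers and replicas can be put in numbers. A standard estimate for n redundant replicas, where any one of them can serve the request, is A_total = 1 - (1 - A)^n. It assumes replicas fail independently, which real systems only approximate (shared power, network or bad deploys break this). A small Python sketch:

```python
def parallel_availability(single, replicas):
    """Availability of `replicas` identical servers where any one suffices.

    Assumes independent failures: the service is down only when all replicas
    are down at once, which happens with probability (1 - single) ** replicas.
    """
    return 1 - (1 - single) ** replicas


for n in (1, 2, 3):
    print(n, f"{parallel_availability(0.99, n):.6f}")
# 1 0.990000
# 2 0.999900
# 3 0.999999
```

This also shows why availability alone is not enough for the plane example: redundancy lowers the chance that the whole service is down, but says nothing about whether each individual operation is carried out correctly.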