Skip to main content

Designing a URL Shortening Service

Designing a URL Shortening Service like TinyURL

Lets design a URL Shortening service like TinyURL. This service will provide short short URLs for a large URL.

What is the Problem ?

URLs can be pretty huge depending upon the resources like the following :

https://news.google.com/topics/CAAqIggKIhxDQkFTRHdvSkwyMHZNREZqY0hsNUVnSmxiaWdBUAE?hl=en-IN&gl=IN&ceid=IN%3Aen , 

I think this Shortening was majorly used in Twittor where there is a limit of 140 characters. 

Requirements of the System

Its always necessary to clear out the requirements with the Stakeholders on what are the expectations they are making, This will ensure that our System is designed as per the Requirements. 

Questions which are already answered 

  1. We need to design a system which will store a shorter version of URL that was given.
  2. When somebody clicks that shorter URL , request will hit our Service and they will be redirected to the original URL.

Questions which needs to be answered before designing System 

  1. Are Users allowed to pick among a list of URLs to choose ?
  2. For how long we will be storing the actual URL and shorter version of it ?
  3. Who will be our clients  : Browsers or Applications or Both ?
  4. Do we need to apply authentication and authorization on this ?
  5. What kind of metrics are required like How many times a  particular URL was redirected ? Do we need to store who all accessed it ?
  6. What is maximum allowed length of shortened URLs ?

Once we start asking questions about the problem and related parameters to it we are starting deeper into the Problem Scenario. Some of the above questions will be linked to Design and lets say we will be asked how much there is our capability to do this. 

Following Questions will be asked for which we need to find answers from Technology Capability available today 

  1. What is the shortest length or URL that our Service can support.
  2. How many Requests (Shortening/Redirection) we can serve per month/day/second.
  3. What is the maximum time duration for which we can store these URLs.
  4. What it will take to have our Service be called from Browsers as well as Applications. 
  5. What is the complexity of involving authentication and authorization in our Service.
  6. What should be throttling limit per User ? 

Problem 1 : What is the minimum size of shortened version of URL we can accommodate ?

This is really dependent on the algorithm we are choosing for shortening the URL. 
We can compute a unique hash of the URL. These hash functions could be md5, sha245.
md5 , Outputs 128 Bits  ie. 128 digits in Base2
sha256 : Outputs 256 Bits i.e 128 digits in Base2

We won't be storing these 128 bits, and we can convert these bits to some alphanumeric characters as these will be returned as a shortened URL.
If we take 6 Base64 characters we would have 64^6 = 68,719,476,736 , which is 68 Billion Unique Codes.
If we take 8 Base64 characters we would have 64^8 = 281 Trillion Unique Codes.





But when we are going to store these many URLs we also going to incur some cost on it. 

So, choosing how many digits is also dependent on the Cost and the URLs which we want to support.





Problem 2 : How many URLs Requests we can serve with this System ?
Taking the current existing systems around this market we have

Refer Here : https://tinyurl.com/y7l98fv4
Maximum that i found was 1 Million/Day , lets say we want to build which can take up to 50 Millions/Day Requests.

For URL Shortening Services, we need to determine the Read/Write Traffic Ratio of it.
We can safely assume that traffic would be 80:20 , ie 80 %of Requests will be Read Requests and 20% are Write Requests.
i.e. 80% are  Redirection Requests where Shortened URL needs to be redirected to original URL
and 20% are shortening Requests which will be returning a shortening requests.
Lets divide our Requests in actual numbers based on our maximum planned capacity. it would be

Read Requests = 50 Million/day x 0.80 = 40 Million/Day.

Write Requests = 50 Million/day x 0.20 = 10 Million/Day.

Taking the above capacity ,let see how many URLs we would be generating per day, per month, per year.
Per Day : 10 Million
Per Month : 300 Million
Per Year : 3600 Million = 3.6 Trillion
Per 10 Year  : 36 Trillion.


Even if we take 8 Base64 Characters we can serve this upto
281 Trillion / 36 ~ 78 Years.  That is we can serve upto 78 Years till the time our Unique Keys are exhausted.
But Traffic will be increasing over the year, which we cannot plan ahead.
But looking at this data Base64 8 Characters if more than sufficient for the next 10 Years atleast in terms of how many URLs we can store.
If we want to reduce this length , then we have to go on Expiry which we can discuss later on Expiry Problem.

But can we serve these requests ie. 50 Million/Day. = 50/24 Million/Hour ~ 2 Million/Hour ~  2/3600 Million/Hour = 555 Requests/Second.

Well obviously we can serve these many request, we need to see how much we are taking to process a Request. This would decide how many servers we need to put in behind load balancers. Here comes the cost impact, where we need to decide how many EC2/Virtual Machines/Lambdas we would be firing to get the work done.
But given the fact that this is very low cost operation , its latency will depend on lot many factors which we will discuss when we will come on Data Store part.

For now lets consider that Our Service would be serving 555 Requests/Second where 555 * 0.80 =  444 Read Request/Second  and 111 Write Requests Per Second.



References








Comments

Popular posts from this blog

Database Sharding

Collating some of the resources which talks about Database Sharding. https://en.wikipedia.org/wiki/Shard_(database_architecture) [Feb 2019]  http://highscalability.com/blog/2019/2/19/intro-to-redis-cluster-sharding-advantages-limitations-deplo.html Redis Cluster is the Native Sharding implementation available within Redis that allows your to automatically distribute your data across multiple nodes without having to rely on external tools and utilities. Its covers Sharding with Redis Cluster  where Redis Clusters is divided in 16384 slots and these slots are assigned to multiple Redis Nodes. The  Redis Cluster Specification  is the definitive guide to understanding the internals of the technology, while the  Redis Cluster Tutorial  provides deployment and administration guidelines. [ Jan 2019  ]  https://scalegrid.io/blog/scalegrid-hosting-adds-support-for-highly-available-redis-clusters-with-automated-sharding/ ScaleGrid : Fully Manage...

Penetration Testing Basics

Penetration testing, often called “pentesting”,“pen testing”, or “security testing”, is the practice of attacking your own or your clients’ IT systems in the same way a hacker would to identify security holes. Of course, you do this without actually harming the network. The person carrying out a penetration test is called a penetration tester or pentester. The difference between Penetration Testing and Hacking is that you have the system owner's permission to do testing and to identfiy security holes. If you want to do penetration testing u should better ask for his/her permission. Basic Security Concepts Vulnerability: It is a security hole in a piece of software, hardware of Operating system that provides a way to attack the system.A vulnerabilty is as simple as weak passwords and as complex as buffer overflows as well as SQL injection. Security Research: Vulnerabilities are typically searched by security researchers who finds the flaws in the system. Security Research can ...