Designing a URL Shortening Service like TinyURL
Let's design a URL shortening service like TinyURL. This service will provide a short URL in place of a long URL.
What is the Problem?
URLs can be pretty long depending on the resource they point to, for example:
https://news.google.com/topics/CAAqIggKIhxDQkFTRHdvSkwyMHZNREZqY0hsNUVnSmxiaWdBUAE?hl=en-IN&gl=IN&ceid=IN%3Aen
I think this kind of shortening was used mainly because of Twitter, where there is a limit of 140 characters.
Requirements of the System
It is always necessary to clarify the requirements with the stakeholders and understand their expectations. This ensures that our system is designed to meet those requirements.
Questions which are already answered
- We need to design a system that stores a shorter version of a given URL.
- When somebody clicks that shorter URL, the request will hit our service and they will be redirected to the original URL.
Questions that need to be answered before designing the system
- Are users allowed to pick from a list of suggested short URLs?
- For how long will we store the actual URL and its shorter version?
- Who will be our clients: browsers, applications, or both?
- Do we need to apply authentication and authorization to this service?
- What kind of metrics are required, e.g. how many times a particular URL was redirected? Do we need to store who accessed it?
- What is the maximum allowed length of a shortened URL?
Once we start asking questions about the problem and its related parameters, we start digging deeper into the problem scenario. Some of the above questions are linked to the design itself, and answering them means figuring out how much capability we can realistically build.
The following questions need answers based on the technology capabilities available today:
- What is the shortest length of URL that our service can support?
- How many requests (shortening/redirection) can we serve per month/day/second?
- What is the maximum duration for which we can store these URLs?
- What will it take to have our service called from browsers as well as applications?
- What is the complexity of adding authentication and authorization to our service?
- What should the throttling limit per user be?
Problem 1: What is the minimum size of the shortened URL we can accommodate?
This really depends on the algorithm we choose for shortening the URL.
We can compute a hash of the URL. These hash functions could be MD5 or SHA-256.
MD5: outputs 128 bits, i.e. 128 digits in Base2.
SHA-256: outputs 256 bits, i.e. 256 digits in Base2.
We won't return these raw bits as-is; instead, we can encode them into alphanumeric characters (for example Base64) and use a short prefix of that encoding as the shortened URL.
If we take 6 Base64 characters we would have 64^6 = 68,719,476,736, which is about 68 Billion unique codes.
If we take 8 Base64 characters we would have 64^8 ≈ 281 Trillion unique codes.
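As a minimal sketch of this idea (assuming Python, MD5, and URL-safe Base64; this is illustrative, not the final design), the shortening step could look like the code below. Truncating the digest means collisions are possible, so a real service would check the datastore and resolve conflicts, e.g. by re-hashing with a salt.

```python
import base64
import hashlib

def short_code(url: str, length: int = 8) -> str:
    """Hash the URL and keep the first `length` URL-safe Base64 characters.

    Truncation makes collisions possible; a real system would detect and
    resolve them against the datastore (e.g. by re-hashing with a salt).
    """
    digest = hashlib.md5(url.encode("utf-8")).digest()           # 128-bit digest
    encoded = base64.urlsafe_b64encode(digest).decode("ascii")   # Base64 over those bits
    return encoded[:length]                                      # keep 6 or 8 characters

print(short_code("https://example.com/some/very/long/path?with=query&params=1", 8))
```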
But storing this many URLs will also incur cost.
So the number of characters we choose also depends on cost and on how many URLs we want to support.
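To get a feel for that cost, here is a rough, hedged estimate assuming about 500 bytes per stored record (short code + original URL + some metadata); the per-record size is an assumption for illustration only, not a requirement.

```python
# Rough storage estimate assuming ~500 bytes per record (code + URL + metadata).
BYTES_PER_RECORD = 500  # assumption for illustration only

for chars, label in [(6, "6-character codes"), (8, "8-character codes")]:
    keyspace = 64 ** chars                            # number of unique codes
    storage_tb = keyspace * BYTES_PER_RECORD / 1e12   # bytes -> terabytes
    print(f"Full keyspace of {label}: ~{storage_tb:,.0f} TB")
# Full keyspace of 6-character codes: ~34 TB
# Full keyspace of 8-character codes: ~140,737 TB (roughly 141 PB)
```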
Problem 2: How many URL requests can we serve with this system?
Looking at the existing systems in this market (refer here: https://tinyurl.com/y7l98fv4), the maximum I found was 1 Million requests/day. Let's say we want to build a system that can take up to 50 Million requests/day.
For a URL shortening service, we need to determine its read/write traffic ratio.
We can safely assume the traffic would be 80:20, i.e. 80% of requests are read requests and 20% are write requests.
That is, 80% are redirection requests, where a shortened URL needs to be redirected to the original URL, and 20% are shortening requests, which create and return a new shortened URL.
Let's split our requests into actual numbers based on our maximum planned capacity:
Read Requests = 50 Million/day x 0.80 = 40 Million/Day.
Write Requests = 50 Million/day x 0.20 = 10 Million/Day.
With the above capacity, let's see how many URLs we would be generating per day, per month, and per year.
Per Day: 10 Million
Per Month: 300 Million
Per Year: 3,600 Million = 3.6 Billion
Per 10 Years: 36 Billion
Even if we take 8 Base64 characters, the key space of ~281 Trillion codes divided by ~3.6 Billion new URLs per year gives roughly 78,000 years before our unique keys are exhausted.
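A quick sanity check of this arithmetic, as a small Python sketch over the figures stated above:

```python
# Sanity check of the key-space lifetime using the figures above.
writes_per_day = 10_000_000                  # 20% of 50 Million requests/day
urls_per_year = writes_per_day * 30 * 12     # ~3.6 Billion new URLs per year

keyspace_8_chars = 64 ** 8                   # ~281 Trillion unique 8-character codes
years_to_exhaust = keyspace_8_chars / urls_per_year
print(f"{urls_per_year:,} URLs/year -> keys last ~{years_to_exhaust:,.0f} years")
# 3,600,000,000 URLs/year -> keys last ~78,187 years
```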
But traffic will increase over the years, which we cannot fully plan for ahead of time.
Still, looking at this data, 8 Base64 characters is more than sufficient for at least the next 10 years in terms of how many URLs we can store.
If we want to reduce this length, then we have to rely on expiry, which we can discuss later in the expiry problem.
But can we serve these requests, i.e. 50 Million/Day? That is roughly 50/24 ≈ 2 Million/Hour, and 2 Million/3600 ≈ 555 Requests/Second.
We can certainly serve this many requests, but we need to know how long we take to process a single request. This decides how many servers we need to put behind load balancers. Here comes the cost impact, where we need to decide how many EC2 instances/virtual machines/Lambdas we would spin up to get the work done.
Given that this is a very low-cost operation, its latency will depend on many factors, which we will discuss when we come to the data store part.
For now, let's consider that our service would be serving 555 Requests/Second, where 555 * 0.80 ≈ 444 are Read Requests/Second and the remaining ~111 are Write Requests/Second.
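For reference, a small sketch doing the division without the intermediate rounding gives a slightly higher figure, about 580 requests/second, with the same 80:20 split:

```python
# Requests-per-second estimate straight from the daily target, without rounding.
requests_per_day = 50_000_000
seconds_per_day = 24 * 3600

total_rps = requests_per_day / seconds_per_day   # ~579 requests/second
read_rps = total_rps * 0.80                      # redirections (reads)
write_rps = total_rps * 0.20                     # shortenings (writes)
print(f"total ~{total_rps:.0f}/s, reads ~{read_rps:.0f}/s, writes ~{write_rps:.0f}/s")
# total ~579/s, reads ~463/s, writes ~116/s
```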

