This article is going to be different from the rest of my articles published on Analytics Vidhya, both in terms of content and format. I usually lay out my articles so that, after reading, the reader is left to think about how the ideas can be implemented on the ground.
In this article, I will start with a round of brainstorming around a particular type of business problem and then talk about a sample analytics-based solution to these problems. To get the most out of this article, make sure that you follow my instructions carefully.
Let’s start with a few business cases:
- Retail bank: Optimize primary bank branch allocation for all customers. The aim is to make sure that the branch allotted to a customer is close to the customer's mailing or permanent address, for convenience. This is especially applicable when we open a new branch and it becomes the closest branch for many existing customers.
- Retail store chain: Send special offers to your loyal customers. Offers can be region-specific, so the same offer cannot be sent to everyone. Hence, you first need to find the closest store to each customer and then mail the offer that is currently applicable at that store.
- Credit card company that sells co-branded cards: You wish to find all partner stores that are closest to your existing client base and then mail these clients the appropriate offers.
- Manufacturing plant: You wish to find wholesalers near your plant for the components required to manufacture the product.
What do all the problems mentioned above have in common? Each of them deals with finding the distance between multiple combinations of source and destination.
Exercise : Think about at least 2 such cases in your current industry and then at least 2 cases outside your current industry and write them in the comment section below.
A common approach
I have worked in multiple domains and have seen this problem solved in a similar fashion, one that gives approximate but quick results.
Exercise : Can you think of a method to do the same using your currently available data and resources?
Here is the approach :
You generally have a PIN CODE for both source and destination. Using these PIN CODES, we find the centroid of each region. Once you have both centroids, you look up their latitude and longitude. Finally, you calculate the Euclidean distance between these two points and use this number as an approximation of the required distance. The following figure explains the process better :
The two marked areas refer to different PIN CODES, and the distance of 10 km is used as the approximate distance between the two points.
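The centroid approach above can be sketched in a few lines of Python. Note that this sketch swaps the Euclidean distance for the haversine (great-circle) formula, because a degree of longitude and a degree of latitude do not cover the same distance on the ground. The `PIN_CENTROIDS` table and its coordinates are made-up placeholders, not real PIN CODE data, and the function names are my own.

```python
from math import asin, cos, radians, sin, sqrt

# Hypothetical lookup: PIN code -> (latitude, longitude) of the region's centroid.
# The coordinates are illustrative placeholders, not real PIN code data.
PIN_CENTROIDS = {
    "110001": (28.63, 77.22),
    "110020": (28.53, 77.27),
}

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def pin_distance_km(source_pin, destination_pin, centroids=PIN_CENTROIDS):
    """Approximate the distance between two PIN codes by the distance between their centroids."""
    return haversine_km(*centroids[source_pin], *centroids[destination_pin])


approx_km = pin_distance_km("110001", "110020")
```

Whichever formula you pick, the result is still a straight-line approximation between two centroids, not a distance between the actual addresses.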
Exercise : Can you think of challenges with this approach ?
Here are a few I can think of :
- If the point of interest is far away from the centroid, this approach will give inaccurate results.
- Sometimes the centroid of another PIN CODE is closer to the point of interest than the centroid of its own PIN CODE. But because the point falls in the area of the PIN CODE with the more distant centroid, we still approximate it with that distant centroid.
- In cases where we need distances finer than the resolution of the PIN CODE boundaries, this method leads nowhere. Imagine a scenario where two branches of a bank and the customer's address are all located in the same PIN CODE. We have no way to find the closest branch.
- The distance calculated is a point-to-point distance, not the distance by road. Imagine a scenario where two PIN CODES are right next to each other but are separated by a valley that you need to circle around to reach the destination.
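To make the "same PIN CODE" challenge concrete, here is a minimal sketch. All branch names, PIN CODES and centroid distances in it are invented for illustration.

```python
# Made-up example: every address is known only by its PIN code.
CUSTOMER_PIN = "560001"
BRANCH_PINS = {"Branch A": "560001", "Branch B": "560001", "Branch C": "560034"}

# Made-up centroid-to-centroid distances in km, keyed by (source PIN, destination PIN).
CENTROID_DISTANCE_KM = {
    ("560001", "560001"): 0.0,
    ("560001", "560034"): 7.5,
}


def closest_branches(customer_pin, branch_pins, distances):
    """Return the smallest centroid distance and every branch tied at that distance."""
    by_branch = {name: distances[(customer_pin, pin)] for name, pin in branch_pins.items()}
    best = min(by_branch.values())
    tied = sorted(name for name, d in by_branch.items() if d == best)
    return best, tied


best, tied = closest_branches(CUSTOMER_PIN, BRANCH_PINS, CENTROID_DISTANCE_KM)
print(best, tied)
```

Branch A and Branch B both come out at 0.0 km because they share the customer's PIN CODE, so the method cannot tell us which one is actually closer.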
A manual approach
Say you have two branches and a single customer. How will you decide which of the two branches is closer? Here is a step-by-step approach :
- You choose the first branch-customer pair.
- You feed the two addresses into Google Maps.
- You note the distance/time by road.
- You fill in the distance in the table of combinations (2 in this case).
- You repeat the same process with the other combination.
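The bookkeeping in the steps above can be sketched as follows. The distances are placeholder numbers standing in for values you would read off the map by hand, and the function names are my own.

```python
from itertools import product


def build_distance_table(branches, customers, distance_fn):
    """One row per branch-customer combination: (branch, customer, distance)."""
    return [(b, c, distance_fn(b, c)) for b, c in product(branches, customers)]


def closest_branch(table):
    """For each customer, keep the branch with the smallest distance."""
    best = {}
    for branch, customer, distance in table:
        if customer not in best or distance < best[customer][1]:
            best[customer] = (branch, distance)
    return best


# Placeholder road distances in km, standing in for values noted from the map.
ROAD_KM = {("Branch 1", "Customer"): 4.2, ("Branch 2", "Customer"): 3.1}

table = build_distance_table(
    ["Branch 1", "Branch 2"], ["Customer"], lambda b, c: ROAD_KM[(b, c)]
)
result = closest_branch(table)
```

With two branches and one customer the table has just two rows, but the same two functions work unchanged for any number of branches and customers.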
How to automate this approach?
Obviously, this process cannot be done manually for millions of customers and thousands of branches. But it can be automated well (note, however, that the Google API has a few caps on the total number of searches). Here is some simple Python code that can be used to create functions to calculate the distance between two points on Google Maps.
Exercise : Create a table with a few sources and destinations. Use these functions to find distance and time between those points. Reply “Done without support” if you are able to implement the code without looking at the rest of the solution.
Here is how we can read in a table of different source-destination combinations :
Notice that we have all types of combinations here. Combination 1 is a pair of two cities. Combination 4 is a pair of two detailed addresses. Combination 6 pairs a city with a monument. Let’s now get the distances and times and check whether they make sense.
All the distance and time calculations in this table look accurate.
Exercise : What are the benefits of using this approach over the PIN CODE approach mentioned above? Can you think of a better way to do this task?
Here is the complete code :
[stextbox id=”grey”]
import os
from datetime import datetime

import googlemaps
import pandas as pd


def finddist(source, destination):
    # Driving distance (human-readable text) for the first leg of the first route
    gmaps = googlemaps.Client(key='XXX')  # replace 'XXX' with your own API key
    now = datetime.now()
    directions_result = gmaps.directions(source, destination, mode="driving", departure_time=now)
    for map1 in directions_result:
        overall_stats = map1['legs']
        for dimensions in overall_stats:
            distance = dimensions['distance']
            return distance['text']
    return None  # no route found


def findtime(source, destination):
    # Driving time (human-readable text) for the first leg of the first route
    gmaps = googlemaps.Client(key='XXX')  # replace 'XXX' with your own API key
    now = datetime.now()
    directions_result = gmaps.directions(source, destination, mode="driving", departure_time=now)
    for map1 in directions_result:
        overall_stats = map1['legs']
        for dimensions in overall_stats:
            duration = dimensions['duration']
            return duration['text']
    return None  # no route found


# Read the table of source-destination combinations
os.chdir(r"C:\Users\Tavish\Desktop")
cities = pd.read_csv("cities.csv")

# Text columns that will hold the results
cities["distance"] = ""
cities["time"] = ""

for i in range(len(cities)):
    source = cities.loc[i, 'Source']
    destination = cities.loc[i, 'Destination']
    # .loc writes into the DataFrame itself and avoids chained assignment
    cities.loc[i, 'distance'] = finddist(source, destination)
    cities.loc[i, 'time'] = findtime(source, destination)
[/stextbox]
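The `finddist` and `findtime` functions above each make their own directions request for the same pair of addresses. Since both values sit in the same response, one possible refinement is to read them from a single request, which halves the number of searches. The sketch below is only an illustration: `route_summary`, `find_dist_and_time` and `FakeClient` are names I made up, and `FakeClient` returns a hand-written response so the example runs without an API key.

```python
from datetime import datetime


def route_summary(directions_result):
    """Return (distance text, duration text) for the first leg of the first route."""
    for route in directions_result:
        for leg in route["legs"]:
            return leg["distance"]["text"], leg["duration"]["text"]
    return None, None  # no route found


def find_dist_and_time(client, source, destination):
    """One request per pair. `client` is assumed to be a googlemaps.Client,
    or any object with a compatible `directions` method."""
    result = client.directions(
        source, destination, mode="driving", departure_time=datetime.now()
    )
    return route_summary(result)


class FakeClient:
    """Offline stand-in for googlemaps.Client that returns a hand-written response."""

    def directions(self, origin, destination, mode=None, departure_time=None):
        return [{"legs": [{"distance": {"text": "12.3 km"},
                           "duration": {"text": "25 mins"}}]}]


distance, duration = find_dist_and_time(FakeClient(), "Source address", "Destination address")
```

In real use you would create one `googlemaps.Client` with your own key and pass it in place of `FakeClient()`.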
End Notes
The Google Maps API comes with a few limitations on the total number of searches. Have a look at the documentation if you see a use case for this approach.
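One simple way to stay within such limits is to never look up the same source-destination pair twice. Here is a minimal sketch, assuming your lookup is a plain function of two address strings. `fake_lookup` and the city names are placeholders that stand in for a real API call.

```python
from functools import lru_cache


def with_cache(lookup_fn):
    """Wrap a (source, destination) lookup so repeated pairs are served from memory."""
    return lru_cache(maxsize=None)(lookup_fn)


calls = []  # records every lookup that actually reaches the "API"


def fake_lookup(source, destination):
    """Placeholder for a real distance lookup. It only records that it was called."""
    calls.append((source, destination))
    return f"{source} -> {destination}"


lookup = with_cache(fake_lookup)
lookup("Delhi", "Agra")
lookup("Delhi", "Agra")    # repeated pair, answered from the cache
lookup("Delhi", "Jaipur")
```

Only two of the three calls reach `fake_lookup`. On a table with many repeated pairs, the saving in searches can be substantial.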
Did you find the article useful? Share with us any other use cases of the Google Maps API apart from the one mentioned in this article. Also share any links to related videos or articles on leveraging the Google Maps API. Do let us know your thoughts about this article in the box below.