Introduction
I have been associated with the Great Lakes Business Analytics Program as visiting faculty for some time now. It is one of the ways I interact with people who want to build a career in analytics. The energy and desire some of these students bring to business analytics is contagious!
As part of the curriculum, people undergoing this course are expected to complete a Capstone Project. This Capstone Project is like a climax at the end of a yearlong movie – it looks difficult at the start, but everything you do in the entire movie comes together at the end.
I had the privilege of sitting on the evaluation panel for a few projects. In today’s article, I plan to give an overview of what happened in these projects.
Why share these projects?
There are a few reasons why I wanted to share these projects with a larger audience in the form of an article:
- I think they are good examples of small projects that people who want to pursue analytics can undertake on their own to build a portfolio of data science projects.
- We get a lot of queries about this program and I think this article can help address a lot of those queries directly or indirectly.
I have already informed Great Lakes about the article and I’ve published this with their consent.
Overview – How do these projects work?
As mentioned before, every person undergoing this course is expected to submit a Capstone Project. For the project, a group is formed with participants from diverse backgrounds. Each group then works with a mentor to complete the project over a period of 4–5 months. About 80% of these projects have mentors from industry. Mentors from companies such as Value Labs, Whiskkers Marketing Pvt. Ltd., BRIDGE i2i and Analytics Vidhya help students with these projects.
Importance of Capstone Project for students
Analytics is a very hands-on subject. I personally believe that unless you complete a few projects end to end (i.e. from data collection to building a model to implementing it), your knowledge and approach stay theoretical.
The students I talked to shared the same view. Here is one of the students sharing his experience of the Capstone Project:
During our course at Great Lakes, we learnt about various techniques like, ways of cleaning-up data, multiple algorithms for analysis and modeling, goodness of fit to check model fitness and had numerous sessions on business intelligence. But it is the Capstone Project that introduced us to the real world challenges of applying our learning, testing adequacy, usefulness and subjectivity of certain algorithms. We also got some real time insight into the processes and steps involved in a typical analytics project. Without Capstone Project our learning would have remained incomplete. – Sriram Alagarsamy, Great Lakes PGPBA Alumnus
The Projects
In the rest of this article, I’ll describe a few Capstone Projects from recent batches at Great Lakes whose industry sponsors / mentors were happy to share the project details.
Project 1: Credit Default Prediction
Students: Kanthimathi Gayatri Sukumar, S.R.Balaji, S.Ramnath, Rudragouda Patil, Mithur Niranjan
In this project, the students built several models to predict credit default on two-wheeler loans in villages in India. The variables available to them included:
- Interest rate (Finance rate)
- Finance charges in Rupees
- Finance amount (Loaned amount) in Rupees
- Average EMI (Equated Monthly Instalment) in Rupees
- Income to Instalment ratio (noir)
- First information report (fire)
- Gross income of the customer (gih) in Rupees
- Gross income of spouse (gis) in Rupees
- Gross income of co-borrower (gic) in Rupees
- Average bank balance/EMI ratio (noisb)
The group built several models, including logistic regression, a classification tree (CART), an ensemble of classification trees (random forest), neural networks and discriminant analysis. Here is the comparison of various modeling techniques:
The group finally chose logistic regression because of its high accuracy and ease of insight generation (rather than a black-box approach to modeling).
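To give a feel for why logistic regression is easy to interpret, here is a minimal sketch in plain Python. It is not the students' model: the eight borrowers are invented, and the two features (income to instalment ratio, bank balance to EMI ratio) are simply borrowed from the variable list above for illustration. The fitting routine is ordinary batch gradient descent on the log loss.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, x):
    """Probability of default; weights[0] is the intercept."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))

def fit_logistic(rows, labels, lr=0.1, epochs=5000):
    """Fit weights by batch gradient descent on the mean log loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = predict_proba(weights, x) - y
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights

# Invented toy data: (income to instalment ratio, bank balance / EMI ratio)
borrowers = [(3.0, 2.5), (2.8, 2.0), (2.5, 1.8), (2.2, 1.5),
             (1.2, 0.6), (1.0, 0.4), (0.9, 0.5), (0.8, 0.2)]
defaulted = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = defaulted (hypothetical labels)

weights = fit_logistic(borrowers, defaulted)
safe_p = predict_proba(weights, (3.0, 2.5))   # comfortable borrower
risky_p = predict_proba(weights, (0.8, 0.2))  # stretched borrower
print("intercept and coefficients:", [round(w, 2) for w in weights])
print("P(default) safe vs risky:", round(safe_p, 3), round(risky_p, 3))
```

The sign and size of each coefficient can be read directly as the direction and strength of that variable's effect on the default odds, which is the kind of insight a black-box model does not give as readily.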
Project 2: Statistical Analysis of Consumer Durables Retail Sales
Students: Soumya Tiwari, Remina Surendrababu, Arnab Majumdar, Vijayalekshmi L, Siju Joseph
The project team analyzed about 0.5 million transactions from a retail chain with 20 branches across Chennai. The team collected data covering a period of three years to find reasonable, practical solutions to the following business problems faced by a consumer durables retail chain:
- Purchase Basket Analysis
- Analysis of loyalty to any particular brand across categories
This group used R and built a recommender system using the Apriori algorithm.
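The group worked in R, but the idea behind Apriori is easy to sketch in a few lines of plain Python. The version below is a simplified illustration, not the group's code: the six baskets and product names are made up, and the thresholds are arbitrary. It finds frequent itemsets level by level, pruning any candidate that has an infrequent subset, and then derives rules with a single item on the right-hand side.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    items = {item for b in baskets for item in b}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

def rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) with one-item consequents."""
    found = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2:
            continue
        for item in itemset:
            antecedent = itemset - {item}
            confidence = sup / frequent[antecedent]
            if confidence >= min_confidence:
                found.append((antecedent, item, confidence))
    return found

# Invented purchase baskets for a consumer durables store
baskets = [
    {"tv", "wall_mount", "hdmi_cable"},
    {"tv", "wall_mount"},
    {"tv", "hdmi_cable"},
    {"fridge", "stabilizer"},
    {"tv", "wall_mount", "stabilizer"},
    {"fridge", "stabilizer"},
]
frequent = apriori(baskets, min_support=0.3)
found_rules = rules(frequent, min_confidence=0.9)
for antecedent, item, confidence in found_rules:
    print(sorted(antecedent), "->", item, round(confidence, 2))
```

A rule such as "wall_mount -> tv" is what a recommender would act on: when the left-hand side is in the basket, suggest the right-hand side.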
Project 3: Web & Text Mining – Sentiment Analysis
Student: Anjana Agrawal
Anjana has been doing freelance IT consulting for clients across the globe. Her Capstone Project required her to enable a political organization to understand people’s concerns, views and sentiments on topics in near real time. A publication also came out of her work (commendable, I must say).
The project started by obtaining input text data from Twitter for one of the leading political organizations, and the tweets were scored for sentiment. The output, in the form of “Common Key Words”, “Association between common key words” and “Sentiment Score”, was observed and analyzed over a period of approximately 3 weeks. This output mattered because the organization was campaigning for upcoming elections, and citizen sentiment helped it refine its next course of action.
The project ended up being used by the party in real time and helped it address people’s concerns quickly.
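The article does not say how the tweets were scored, so the following is only a sketch of one common approach: a word-list (lexicon) score plus a keyword count. The word lists, stopwords and tweets are all invented for illustration.

```python
import re
from collections import Counter

# Hypothetical mini-lexicons; a real project would use a much larger list
POSITIVE = {"good", "great", "support", "happy", "improved"}
NEGATIVE = {"bad", "poor", "corrupt", "angry", "broken"}
STOPWORDS = {"the", "a", "is", "are", "in", "and", "of", "to", "we", "our"}

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def sentiment_score(text):
    """Positive word count minus negative word count."""
    tokens = tokenize(text)
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

def common_keywords(texts, top_n=3):
    """Most frequent non-stopword tokens across all texts."""
    counts = Counter(t for text in texts for t in tokenize(text)
                     if t not in STOPWORDS)
    return counts.most_common(top_n)

# Invented example tweets
tweets = [
    "Great rally today, we support the new roads plan",
    "Roads are broken and the water supply is poor",
    "Happy to see roads improved in our ward",
]
scores = [sentiment_score(t) for t in tweets]
print(scores)
print(common_keywords(tweets, top_n=1))
```

Even this crude version shows the shape of the output described above: a score per tweet and a list of the topics people keep mentioning.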
Project 4: Credit card – Risk Analytics
Students: Amit Madan, Praveen Panwar
The objective of this project was to develop a model to calculate the probability of default (PD model) for a bank’s credit card holders. The model was meant to be run once a customer had spent more than a year with the bank, and to predict the probability of default over the next 3 months.
Amit & Praveen had more than 2 million data points with 20 variables (18 numeric and 2 character) to start with. They created more than 50 additional variables based on their internal brainstorming. After removing variables with high correlation and the insignificant variables, they used Logistic Regression to build the model.
The final model had a concordant ratio of 91.4% and a lift of 230%. Here is the Lorenz curve for the same.
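For readers who have not met these two metrics, here is a rough sketch of how they can be computed from model scores. The scores and labels are invented, and I am assuming the usual definitions: the concordant ratio is the share of (defaulter, non-defaulter) pairs in which the defaulter gets the higher score, and lift is the default rate among the top-scored accounts divided by the overall default rate. The article does not say which cut-off the 230% figure refers to, so `top_fraction` is a free parameter here.

```python
def concordance(scores, labels):
    """Share of (event, non-event) pairs where the event has the higher score."""
    events = [s for s, y in zip(scores, labels) if y == 1]
    non_events = [s for s, y in zip(scores, labels) if y == 0]
    concordant = sum(1 for e in events for n in non_events if e > n)
    return concordant / (len(events) * len(non_events))

def lift(scores, labels, top_fraction=0.1):
    """Default rate in the top-scored fraction relative to the overall rate."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(len(ranked) * top_fraction))
    top_rate = sum(y for _, y in ranked[:k]) / k
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate

# Invented scores (predicted PD) and outcomes (1 = defaulted)
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

print(round(concordance(scores, labels), 3))             # 20 of 21 pairs
print(round(lift(scores, labels, top_fraction=0.2), 2))  # top 2 accounts vs all
```

In this toy data the top 20% of accounts default at about 3.3 times the overall rate, which would be quoted as a lift of roughly 333%.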
It was amazing to see the amount of learning the students had undergone through this project. Even if you join a bank as an analyst, it would take you a few years before you get your hands dirty on a problem like this!
Amit reflected a similar sentiment in my discussion with him later:
There are three important steps in the computational modelling of any physical process (1) Problem definition (2) mathematical model and (3) computer simulation. CAPSTONE provided for an opportunity to synthesize domain expertise, mathematical representation and computational skills to re-produce a modelling framework so as to be able to cull out the insights present in a data. It was by virtue of CAPSTONE, we could apply all of what we had learned at Great Lakes and it helped generate an urge to keep undertaking similar assignments for sustained mental stimulation. – Amit Madan, Great Lakes PGPBA, Alumnus
Project 5: Developing Least Cost Effective Intervention in Schools
Students: Balamuril S, RamKumar R, Senthil J, Sreeraman K, Sriram A
This group applied the power of analytics to government-aided schools in 5 districts. They created clusters of schools, developed insights about their performance and translated them into a quality improvement program for these schools.
They further created a model to rank these schools, simulate the effect of government interventions and then measure them as well. The project should not only impact the 5 districts; applying these insights to schools across the nation could have a far larger impact.
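The article does not say which clustering method the group used. As an assumption, here is a minimal k-means sketch in plain Python on invented school data, with two made-up features per school (pass rate and attendance, both as fractions so that no rescaling is needed).

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=20):
    """Plain k-means; starts from the first k points so the result is repeatable."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels

# Invented schools: (pass rate, attendance)
schools = [(0.92, 0.95), (0.45, 0.60), (0.88, 0.90),
           (0.50, 0.65), (0.90, 0.93), (0.40, 0.55)]
centroids, labels = kmeans(schools, k=2)
print(labels)
print([tuple(round(v, 2) for v in c) for c in centroids])
```

Each cluster centroid summarizes a "type" of school, which is the starting point for deciding which intervention suits which group.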
Project 6: Lead Generation for Health Insurance Firms using Web and Social Media Data
Students: Harpreet Kaur, Aneet Sachdeva, Peeyush Tiwari
This group first collected open data and social media data, then applied text mining and natural language processing (NLP) techniques to extract features from it. They then applied various clustering and bag-of-words techniques to pull out insights into insurance purchase behavior. Some of the resulting groups were identified as high-potential leads for health insurance products.
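As a rough illustration of the bag-of-words idea (not the group's actual pipeline), the sketch below turns each post into a word-count vector and compares posts with cosine similarity. The posts are invented; posts that score close to each other would end up in the same cluster.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Word-count vector for a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Invented social media posts
posts = [
    "looking for family health insurance plan",
    "which health insurance plan covers family",
    "best cricket match of the season",
]
vectors = [bag_of_words(p) for p in posts]
similar = cosine(vectors[0], vectors[1])
unrelated = cosine(vectors[0], vectors[2])
print(round(similar, 3), round(unrelated, 3))
```

The first two posts share four of their six words and score about 0.67, while the cricket post shares none and scores 0, so only the first two would be grouped as potential insurance leads.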
End Notes
As you can see, the projects varied from predicting default in rural regions to helping parties contest elections. It was heartening to see the output from this group of people, most of whom had very little knowledge about analytics a year ago.
I hope this article gives learners in analytics a few ideas for their own Capstone Projects and offers a glimpse of the program to those interested. You can read more details about the program here.