Are you fascinated by Data Science? Do you think Machine Learning is fun? Do you want to learn more about these fields but aren’t sure where to start? Well, start with Kaggle!
Kaggle is an online community devoted to Data Science and Machine Learning. It was founded in 2010 and acquired by Google in 2017. It is the largest data science community in the world, with members ranging from ML beginners like yourself to some of the best researchers in the world. Kaggle is also the best place to start playing with data, as it hosts over 23,000 public datasets and more than 200,000 public notebooks that can be run online! And in case that’s not enough, Kaggle also hosts many Data Science competitions with insanely high cash prizes ($1.5 million was offered once!).
But there are still many misconceptions about Kaggle. Some believe that it is only a competition hosting website, while others think that only experts can make full use of it. The truth is that Kaggle is also a platform for beginners, as it provides resources like basic courses on Data Science and ML. It also has simple competitions in the “Getting Started” category that gradually turn beginners into experts. That is why this article provides an introduction to Kaggle, along with a path you can follow to eventually become a full-fledged Data Science expert. Now let’s get started!!!
Resources Available on Kaggle
There are many resources available on Kaggle that will help you as a Data Science beginner. So first, let’s look at each of these resources in detail.
1. Datasets: There are around 23,000 public datasets on Kaggle that you can download for free, and some of the most popular ones have been downloaded millions of times. You can use the search box to find public datasets on whatever topic you want, ranging from health to science to popular cartoons! You can also publish your own public datasets on Kaggle, which may earn you medals and help you progress toward advanced Kaggle titles like Expert, Master, and Grandmaster.
2. Notebooks: Notebooks on Kaggle are Jupyter notebooks that run in the cloud, so there is nothing to download or install. And they are free of charge! You can open any public notebook and use the “Copy and Edit” button to make your own copy, then change the code, add images, or do basically whatever you want! You can also create a new notebook from scratch (notebooks were previously called kernels on Kaggle) by clicking the “New Notebook” button.
3. Courses: Kaggle offers an entire set of free courses on Data Science and Machine Learning that teach you what you need to know to get started. While these courses are not very in-depth, they are the fastest way to start practicing on Kaggle. The Micro-Courses (as they are called) start with basics like Python, Machine Learning, SQL, and Data Visualization, and move on to more advanced topics like Pandas, Deep Learning, Geospatial Analysis, etc.
4. Discussion: Apart from the comments on Notebooks, Kaggle has an entire Discussion section. It includes the Kaggle Forum; QnA, where you can ask other Data Scientists for advice; Getting Started, which is the first stop for beginners; Product Feedback; and Learn, which covers questions about Kaggle Courses. Check out this section to ask questions and learn more about Kaggle!
5. Competitions: After you have spent some time with Kaggle Datasets and Notebooks, it is time to move on to Competitions. Kaggle Competitions are a great way to test your knowledge and see where you stand in the Data Science world! If you are a beginner, start with practice problems such as Titanic: Machine Learning from Disaster. After that, you can move on to the active competitions and maybe even win huge cash prizes!!!
6. Blog: Kaggle has an Official Blog that contains interesting articles ranging from “The future of AI in Africa” to “Teaching an AI to dance”! The Kaggle blog also has various tutorials on topics like Neural Networks, High Dimensional Data Structures, etc. You can also check out some Kaggle news here like interviews with Grandmasters, Kaggle updates, etc.
7. Jobs: And finally, whether you are hiring or looking for a job, Kaggle also has a Job Portal! If you are hiring, you can create a Job Listing and reach the 1.5 million data scientists on Kaggle. If you are seeking a job, you can subscribe to the Kaggle Jobs Board to get access to the available career openings.
Basic Outline To Follow When Starting Kaggle
Now that you know all the options available on Kaggle, here is a basic outline to follow when you are just getting started. Once you know more about the community, you can focus on problems and competitions that match your skill level.
1. Select a Programming Language
The one thing that you absolutely cannot skip when starting on Kaggle is learning a programming language! Python and R are currently the two most popular programming languages for Data Science and Machine Learning. If you come from a software development background, Python will probably be the easier option, while if you come from an analytics or statistics background, you may prefer R.
However, Python is currently the most popular language for ML. In fact, there are many Python libraries that are specifically useful for Artificial Intelligence and Machine Learning such as Keras, TensorFlow, Scikit-learn, etc. So if you want to learn ML, it’s best if you learn Python! There is even a free Python course available on Kaggle that will teach you most of the things you need to know to get started!
2. Learn on Standard Datasets
Once you have learned Python (or R), the next step is mastering data! You should be comfortable loading and exploring a dataset before you try to get good results from it. For this, learn different models and practice them on real datasets. This will also help you recognize which models to use in different situations.
There are around 23,000 public datasets on Kaggle that you can use for practice. If you are a beginner, though, it can be hard to tell which datasets are good ones. So it’s best to start practicing with standard datasets such as Indian Liver Patient Records, Iris Species, Adult Census Income, Breast Cancer Wisconsin, etc.
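Here is a minimal sketch of what loading and exploring a dataset can look like, written with only the Python standard library so it runs anywhere. It assumes a CSV laid out like the Iris Species dataset; the column names (SepalLengthCm, ..., Species) are an assumption, so check the header of the file you actually download. The six rows are a handful of illustrative samples embedded as a string so the example is self-contained, and the helper names are hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict
from statistics import mean

# A handful of illustrative rows, embedded so the example runs on its own.
# Column names are assumed to follow Kaggle's Iris.csv (its Id column is omitted);
# check the header of the file you actually download.
SAMPLE_CSV = """SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
"""


def load_rows(text, label_column="Species"):
    """Parse CSV text into (features, label) pairs, converting features to floats."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        label = record.pop(label_column)
        features = {name: float(value) for name, value in record.items()}
        rows.append((features, label))
    return rows


def class_counts(rows):
    """Count how many rows belong to each label."""
    return dict(Counter(label for _, label in rows))


def feature_mean_by_class(rows, feature):
    """Average one feature separately for each label."""
    groups = defaultdict(list)
    for features, label in rows:
        groups[label].append(features[feature])
    return {label: mean(values) for label, values in groups.items()}


rows = load_rows(SAMPLE_CSV)
print(len(rows), "rows; columns:", list(rows[0][0]))
print(class_counts(rows))
print(feature_mean_by_class(rows, "PetalLengthCm"))
```

On Kaggle itself you would more likely use pandas (the subject of one of the Micro-Courses), where `df.groupby("Species")["PetalLengthCm"].mean()` does the same per-class averaging in one line.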
3. Practice Old Kaggle Competition Problems
Now that you have a basic idea about Kaggle, it’s time to practice some old competition problems. It’s best to work through popular Kaggle problems from the last few years so that you know what to expect. Solve problems of various types and then try to improve your solutions. You can do this by studying the forum posts, GitHub repositories, and winners’ blog posts for each problem. This will teach you how to approach a Kaggle problem efficiently, so that you can even win competitions in the future!
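One simple way to check whether a change really improves a solution is to hold back part of your training data and score each version on it. The sketch below compares a majority-class baseline with a hand-written “predict survival for women” rule on a few made-up, Titanic-style records of (sex, survived). The records and all helper names are hypothetical, chosen only to illustrate the workflow.

```python
# Hypothetical Titanic-style records: (sex, survived). Made up for illustration only.
RECORDS = [
    ("male", 0), ("female", 1), ("male", 1), ("female", 1),
    ("male", 0), ("female", 1), ("male", 0), ("female", 1),
    ("male", 0), ("female", 1), ("male", 0), ("male", 0),
]


def split_holdout(records, every=4):
    """Put every `every`-th record (indices 3, 7, 11, ... for every=4) in a validation set."""
    train = [r for i, r in enumerate(records) if i % every != every - 1]
    valid = [r for i, r in enumerate(records) if i % every == every - 1]
    return train, valid


def accuracy(predict, records):
    """Fraction of records where predict(sex) matches the true label."""
    correct = sum(1 for sex, survived in records if predict(sex) == survived)
    return correct / len(records)


def majority_baseline(train):
    """Always predict the most common outcome seen in the training rows."""
    survivors = sum(survived for _, survived in train)
    guess = 1 if 2 * survivors > len(train) else 0
    return lambda sex: guess


def sex_rule(sex):
    """A slightly smarter hand-written rule: predict survival for women only."""
    return 1 if sex == "female" else 0


train, valid = split_holdout(RECORDS)
print("baseline accuracy:", accuracy(majority_baseline(train), valid))
print("sex rule accuracy:", accuracy(sex_rule, valid))
```

A three-row validation set is far too small to trust. With real data you would use many more rows and ideally cross-validation, but the habit is the same: a change counts as an improvement only if it beats the previous version on data it was not tuned on.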
In case you are confused about which problems to start with, here are some basic competitions that will help you build confidence.
- Titanic: Machine Learning from Disaster: This challenge is a very popular beginner project for ML, and many tutorials are available for it. It is a great introduction to ML concepts like data exploration, feature engineering, and model tuning.
- Digit Recognizer: This is a project you should try once you know some Python and ML basics. It is a great introduction to the exciting world of neural networks, using a classic dataset of handwritten digits (MNIST) that includes pre-extracted features.
- First Steps With Julia: This competition will help you learn Julia, a relatively new programming language. It also includes two tutorials: the first focuses on the basics of the language and the second on the k-nearest neighbors (KNN) algorithm.
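The k-nearest neighbors idea covered in the second Julia tutorial is small enough to sketch in a few lines. The version below is a plain-Python illustration, not the tutorial’s Julia code: it classifies a point by a majority vote among its `k` closest training points using Euclidean distance. The function name and the toy points are made up for this example.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by a majority vote among its k nearest training points.

    train: list of (feature_vector, label) pairs
    query: a feature vector with the same length as the training vectors
    """
    # Sort training points by Euclidean distance to the query and keep the k closest.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    # most_common() lists tied labels in first-seen order, so a tie goes to the
    # tied label that appears first (i.e. closest) in the sorted neighbor list.
    return votes.most_common(1)[0][0]


# Two made-up clusters of 2-D points labelled "a" and "b".
TRAIN = [
    ((1.0, 1.0), "a"), ((1.5, 2.0), "a"), ((2.0, 1.0), "a"),
    ((6.0, 6.0), "b"), ((7.0, 5.5), "b"), ((6.5, 7.0), "b"),
]

print(knn_predict(TRAIN, (1.2, 1.5)))  # prints: a
print(knn_predict(TRAIN, (6.2, 6.1)))  # prints: b
```

Sorting the whole training set is fine for a toy example; on a real dataset with many rows you would typically reach for a library implementation such as scikit-learn’s, which uses faster neighbor search structures.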
Besides these three, there are many other past competitions you can practice on, particularly in the “Getting Started” category.
4. Compete in Active Kaggle Competitions
Now that you are comfortable with Kaggle, it’s time to start participating in active competitions! Kaggle competitions are famous for their huge prizes, so who knows what you may win! But it’s best to start small and focus on only one competition at a time. Initially, aim for a spot in the top 25% of the private leaderboard, since expecting to win right away is unrealistic.
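Checking that goal is simple arithmetic. Here is a minimal sketch, assuming rank 1 is the best and reading “top 25%” as a rank no greater than a quarter of the number of teams; the function names are hypothetical and not part of any Kaggle API.

```python
def in_top_quarter(rank, num_teams):
    """True if `rank` (1 = best) falls within the top 25% of `num_teams` teams.

    Uses integer arithmetic (rank * 4 <= num_teams) to avoid float rounding.
    """
    if not 1 <= rank <= num_teams:
        raise ValueError("rank must be between 1 and num_teams")
    return rank * 4 <= num_teams


def percent_from_top(rank, num_teams):
    """Leaderboard position as a percentage, e.g. rank 50 of 400 -> 12.5."""
    return 100 * rank / num_teams


print(in_top_quarter(250, 1000))  # prints: True
print(in_top_quarter(251, 1000))  # prints: False
print(percent_from_top(50, 400))  # prints: 12.5
```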
Also, share your thoughts and solutions on the forum, as they may lead to new ideas and collaborations in the future. In the end, have fun, since you are aiming to learn and not just to win. (And who knows, you may win as well!!!)