Hi! In this post I will be sharing my experience on Kaggle.

I have been active on Kaggle for the past month, and I have learned a lot about how to do data science and machine learning on real datasets.



Actually, I had registered on Kaggle a year ago but was not active. After one of the company officials suggested it during a campus interview, I decided to explore it further. Initially I participated in the famous beginners' competition, Titanic: Machine Learning from Disaster. I simply picked all the numerical features and applied logistic regression, which landed me in the top 86%. I was not happy with that position, but it was my first ever submission, and I still had little experience with techniques such as data visualization and feature engineering. I also tried some other algorithms, like KNN, SVM, decision trees, and LDA, but saw no improvement.
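A first pass like that can be sketched as follows, assuming scikit-learn and pandas. The DataFrame below is made-up illustrative data with Titanic-style column names, not the real competition files:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in for the Titanic training data (columns are illustrative).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Pclass": rng.integers(1, 4, n),
    "Age": np.where(rng.random(n) < 0.2, np.nan, rng.uniform(1, 80, n)),
    "Fare": rng.uniform(5, 200, n),
    "Sex": rng.choice(["male", "female"], n),  # ignored by this baseline
    "Survived": rng.integers(0, 2, n),
})

# Keep only the numeric columns, as in a simple first submission.
X = df.drop(columns="Survived").select_dtypes(include="number")
y = df["Survived"]

# Impute missing ages, then fit plain logistic regression.
model = make_pipeline(SimpleImputer(strategy="median"),
                      LogisticRegression(max_iter=1000))
model.fit(X, y)
print(model.score(X, y))
```

Dropping every non-numeric column is exactly why such a baseline scores poorly: features like `Sex` carry most of the signal on Titanic but never reach the model.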



I decided to take a course on feature engineering, as most of the blogs I read talked about it. I took a DataCamp course and came away with insights into how to extract the important features from a dataset, and how to derive features from categorical columns and columns containing text. I was excited to apply these techniques to the Titanic dataset.
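Two of the standard tricks of that kind can be sketched like this, assuming pandas. The rows are Titanic-style examples for illustration; the regex and column names are my own choices, not necessarily what the course used:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Braund, Mr. Owen", "Cumings, Mrs. John", "Heikkinen, Miss. Laina"],
    "Sex": ["male", "female", "female"],
})

# Derive a feature from free text: pull the title ("Mr", "Mrs", ...)
# out of the Name column with a regular expression.
df["Title"] = df["Name"].str.extract(r",\s*([^\.]+)\.", expand=False)

# Turn categorical columns into numeric ones via one-hot encoding.
features = pd.get_dummies(df[["Sex", "Title"]])
print(features.columns.tolist())
```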


OMG! After applying these techniques my accuracy improved a lot, and I ended up in the top 12%. That is when I realised the importance of feature engineering. It felt great, since I was a beginner and was doing well so far.



Next I decided to take part in some regression-based competitions, and I settled on House Prices: Advanced Regression Techniques. Initially I used only the numeric columns, and then applied some techniques to fill in missing values, which let me use other columns as features too. I also used a variety of data visualization techniques. After this competition, I took a course on supervised learning from DataCamp and tried some other regression techniques on the dataset as well.
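One common way to handle missing values so that both numeric and categorical columns become usable, sketched with scikit-learn. The tiny DataFrame uses House-Prices-style column names but made-up values, and Ridge here is just one example of a regression technique, not necessarily the one I used:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Illustrative stand-in for the House Prices data.
df = pd.DataFrame({
    "LotArea": [8450.0, 9600.0, np.nan, 11250.0],
    "GarageType": ["Attchd", np.nan, "Detchd", "Attchd"],
    "SalePrice": [208500, 181500, 223500, 140000],
})
num_cols = ["LotArea"]
cat_cols = ["GarageType"]

# Numeric columns: fill gaps with the median.
# Categorical columns: fill gaps with the mode, then one-hot encode.
pre = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), num_cols),
    ("cat", make_pipeline(SimpleImputer(strategy="most_frequent"),
                          OneHotEncoder(handle_unknown="ignore")), cat_cols),
])
model = make_pipeline(pre, Ridge())
model.fit(df[num_cols + cat_cols], df["SalePrice"])
preds = model.predict(df[num_cols + cat_cols])
```

Imputing instead of dropping is what unlocks columns like `GarageType`, which would otherwise be discarded for containing NaNs.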


I also followed a notebook on Kaggle for applying a CNN to the famous Digit Recognizer dataset, commonly known as MNIST in the machine learning world. I then used the same technique on a dataset named Kannada MNIST, which is essentially MNIST but with digits written in the Kannada script. I ended up in the top 31% when that competition ended.
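A small CNN in the spirit of the public MNIST notebooks looks roughly like this, assuming Keras (via TensorFlow). The layer sizes are illustrative, not the exact architecture from the notebook I followed:

```python
from tensorflow import keras
from tensorflow.keras import layers

# 28x28 grayscale digits in, 10 class probabilities out.
model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

The same model transfers directly to Kannada MNIST because the inputs have the identical 28x28 shape and there are still 10 digit classes; only the training data changes.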


So these were some of the competitions I participated in. I am still trying to improve my accuracy by using different techniques such as bagging, boosting, and AdaBoost.
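Comparing those ensemble methods side by side is straightforward with scikit-learn; this sketch uses a synthetic dataset, so the scores are illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              GradientBoostingClassifier)
from sklearn.model_selection import cross_val_score

# Synthetic classification problem standing in for a real competition dataset.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Bagging, gradient boosting, and AdaBoost, scored by 3-fold cross-validation.
results = {}
for clf in [BaggingClassifier(random_state=0),
            GradientBoostingClassifier(random_state=0),
            AdaBoostClassifier(random_state=0)]:
    results[type(clf).__name__] = cross_val_score(clf, X, y, cv=3).mean()

for name, score in results.items():
    print(f"{name}: {score:.3f}")
```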


Currently I am doing an NLP course on DataCamp and trying to deepen my knowledge of different aspects of machine learning. I will be sharing some of the techniques I have used in future posts.


I will just add the following points as advice:

  • Just start, and don’t wait for anything to push you. Even if you have only a little knowledge, that is okay.

  • Try the different things you come across, and read others’ notebooks too. Learn new techniques and practice them on Kaggle.

  • Work on different types of datasets on Kaggle, and ask questions on the discussion forums.


I am still learning, and my first month gave me enough motivation to keep going until I end up a Grandmaster in one of Kaggle’s domains.



