Carefully curated list of awesome Data Science resources.
Contributions Welcome! Add links through pull requests or create an issue to start a discussion.
Would you like to see the resources in this repo in your native language? You can help us with translating them on this dedicated repo!
Currently 200+ Resources added. (Updated September 2023)
Welcome to Awesome Data Science! This repository is a curated collection of valuable resources, tools, and tutorials for anyone passionate about the exciting field of data science. Whether you're an aspiring data scientist or an experienced practitioner, you'll find a wealth of information here to enhance your knowledge and skills. Explore various topics, including machine learning, data visualization, and statistical analysis. Discover new insights and stay up-to-date with the latest trends in this ever-evolving discipline. Dive in and elevate your data science journey with the resources we've gathered for you!
✔️ Want to know what the most common tools for data science are?
✔️ Want to know what a data scientist does?
✔️ Ever wondered if you can teach yourself to be a data scientist?
✔️ Want to know how to launch a data science career?
✔️ Need help practicing for data science interviews?
✔️ Are you looking for a data science mentor?
The collection below is part of the Awesome Data Science GitHub Repo that contains data science resources for the beginner and pro.
📰 Articles | 📑 Cheat Sheets | 🧑‍🤝‍🧑 Communities
🗣️ Mentoring | 🖥️ Online Courses | 🖋️ Projects
🎤 Talks | 🔤 Tutorials | 🎥 Webinars
Join the Artificial Intelligence First newsletter today. You'll be kept informed about open source frameworks, carefully selected tutorials, and articles compiled by experts in artificial intelligence.
- Articles
- Books
- Cheat Sheets
- Communities
- Datasets
- Development Environments Tools
- Games
- Local Newspaper Archives
- Mentoring
- Online Courses
- Open-Source Project Contribution
- Podcasts
- Projects
- Reports
- Talks
- Tutorials
- Videos
- Volunteering
- Webinars
- How to create a great data science portfolio - An essential guide on the hallmarks of a great data science portfolio.
- The data science guide - A comprehensive guide with cases, code samples and notebooks about the Databricks Lakehouse Platform.
- 6 Data science trends - An Ebook containing six emerging trends that have the potential to accelerate ML projects and move organisations from descriptive toward predictive and prescriptive analytics.
- Data Science Applications - A look at how we can apply data science thinking to different problems.
- How to show your data science skills - A portfolio guide to help you ensure that you've covered all of the major categories when you are showcasing your data science skills.
- Blueprint for building a data product business - An article that pulls together several key design templates to create a Data Product Blueprint to help organisations build their data products business.
- Get ready for a successful job interview - A collection of guides to help you research, prepare, and practice to make the most of your next job interview. Then, learn how to stay present and show your best self whether your interview is in-person or virtual.
- Mapping the world's ecosystems - A Conversation with Roger Sayre, Senior Scientist for Ecosystems at the U.S. Geological Survey.
- What a big data approach and geospatial tools reveal about human mobility - A look at how mobility studies experts can contextualise and visualise large-scale human movements.
- Fighting crime with data - A look at how Police Charleroi focuses on the most important and urgent issues, measures its effectiveness over time and keeps stakeholders informed.
- Data Science (MIT Press Essential Knowledge series) - By John D Kelleher (2018)
- Python: 3 books in 1 - Your complete guide to Python programming with Python for beginners - By Brady Ellison (2022)
- Build a Career in Data Science - By Emily Robinson (2020)
- Python Data Science Handbook - By Jake VanderPlas (2016)
- The Handbook of Data Science And AI - By Stefan Papp (2022)
- Data Science from Scratch - By Joel Grus
- The Art of Statistics: Learning from Data - By David Spiegelhalter (2020)
- Storytelling with Data: A Data Visualization Guide for Business Professionals - By Cole Nussbaumer Knaflic (2015)
- Big Data: A Revolution That Will Transform How We Live, Work and Think - By Viktor Mayer-Schönberger and Kenneth Cukier (2014)
- Data Science for Dummies - By Lillian Pierson (2015)
- Machine learning cheat sheet - DataCamp guides you around the top machine learning algorithms and use-cases.
- Python for data science - DataCamp presents the Python basics.
- Data science cheat sheet for business leaders - DataCamp guides you through how data science can help your business.
- Descriptive Statistics cheat sheet - DataCamp presents the most common statistical techniques for descriptive analytics.
- Cheat sheet for supervised learning - Afshine Amidi presents a supervised learning guide.
- Cheat sheet for unsupervised learning - Afshine Amidi presents an unsupervised learning guide.
- Cheatography Machine learning cheat sheet - Cheatography presents a cheat sheet on Machine learning.
- Scikit-learn algorithm cheat sheet - The official scikit-learn cheat sheet.
- Keras for data science cheat sheet - An Amazonaws Keras cheat sheet.
- Neural Networks cheat sheet - An Asimov Institute Neural Networks cheat sheet.
- Data preparation with SQL cheat sheet - The official KDnuggets SQL cheat sheet.
- Getting started with Pandas - The official KDnuggets Pandas cheat sheet.
- DataCamp Slack Community - DataCamp has built an instant messaging community where subscribers can discuss DataCamp and data science. (DataCamp, 2022)
- Codecademy Chapters - Codecademy Chapters are the perfect place to collaborate with fellow learners virtually or in-person. (Codecademy, 2022)
- Microsoft Azure Database Community - An Azure Data community managed by the Azure SQL DB support team. (Microsoft, 2022)
- Kaggle Community - Discuss the Kaggle platform & machine learning topics. (Kaggle, 2022)
- Dataquest Community - The Dataquest community will help you get your data questions answered and allow you to collaborate with your peers. (Dataquest, 2022)
- Stack Overflow Community - A community where you can ask questions about data science. (Stack Overflow, 2022)
- IBM Data Science Community - Master the art of data science with the IBM data science community. (IBM, 2022)
- Reddit Data Science Community - A place for data science practitioners and professionals to discuss and debate data science career questions. (Reddit, 2022)
- DrivenData Community - A space where experienced and aspiring data scientists can solve pressing problems for mission-driven organisations. (DrivenData, 2022)
- DataScienceCentral Community - A community for big data practitioners. (Data Science Central, 2022)
- Replit Community - A global community of coders with a place for everyone, beginners and experts alike. (Replit, 2022)
- GitHub Community - A community that supports all GitHub users on their educational journey via discussions. (GitHub, 2022)
- Khan Academy Community - A community providing free, world-class education to anyone, anywhere. (Khan Academy, 2022)
- DataKind Community - A community of passionate data scientists, visionary partners and mission-driven organisations. (DataKind, 2022)
- PyData Community - The global PyData network that promotes discussions around best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualisation. (PyData, 2022)
- Hacktoberfest - Join the Hacktoberfest Discord community. (Hacktoberfest, 2022)
- Outreachy Community - Outreachy's community supports people from groups underrepresented in tech. (Outreachy, 2022)
- Up-For-Grabs - A collection of open-source projects for new contributors and a community of supporters. (Up for Grabs, 2022)
- CodeTriage - A collection of open-source projects for new contributors and a community of supporters. (CodeTriage, 2022)
- Ovio - Browse through a curated list of projects and issues waiting for your help. (Ovio, 2022)
- Huckletree - A community of innovators, brands, investors, mentors, ambassadors and educators. (Huckletree, 2022)
- Python - Get involved and stay informed with the Python community. (Python, 2022)
- Rails Girls - Join the community and get involved. Support Rails Girls Summer of Code as a coach, a mentor or as an organiser, helping in various areas such as fundraising, editing, working on our sites or helping the students find Open Source projects. (Rails Girls, 2022)
- Zindi - Connect with fellow data scientists, and learn from the best. (Zindi, 2022)
- Seaborn data - Data repository for seaborn examples.
- Plotly data - Plotly sample datasets.
- Matplotlib data - Sample data needed for some of Matplotlib's examples.
- Free public data sets for analysis - Free public data sets for analysis.
- Open Datasets - Explore, analyse and share quality data.
- Natural Language Toolkit Data - NLTK Corpora datasets.
- FiveThirtyEight Data - Free datasets to advance public knowledge.
- BuzzFeed Data - Free datasets for data analytics.
- NASA Earth Data - Full and open access to NASA's collection of Earth science data for understanding and protecting our home planet.
- NASA Space Data - Advanced, focused search tools are available from several PDS discipline nodes.
- Our World in Data - Free data to make progress against the world's largest problems.
- Bokeh Sample Data - The sample data module can be used to download datasets used in Bokeh examples.
- TensorFlow Data - TensorFlow datasets.
- PyDataset Data - Instant access to many popular datasets for Python (in data frame structure).
- Scikit-learn Data - The sklearn.datasets package embeds some small toy datasets.
- Statsmodels Data - Statsmodels provides datasets (i.e. data and meta-data) for use in examples, tutorials, model testing, etc.
- Visual Studio Code - A standalone source code editor that runs on Windows, macOS, and Linux.
- IntelliJ - Every aspect of IntelliJ IDEA has been designed to maximise developer productivity.
- PyCharm - Rely on it for intelligent code completion, on-the-fly error checking and quick fixes, easy project navigation, and much more.
- RStudio - RStudio integrated development environment (IDE) is a set of tools built to help you be more productive with R and Python.
- RubyMine - A Ruby and Rails IDE.
- Jupyter - Free software, open standards, and web services for interactive computing across all programming languages.
- NetBeans - A Development Environment, Tooling Platform and Application Framework.
- Replit - Build software collaboratively from anywhere in the world, on any device, without spending a second on setup.
- CodinGame - The new way to improve your programming skills while having fun and getting noticed.
- CSS Diner - It is a fun game to learn and practice CSS selectors.
- Flexbox Froggy - Flexbox Froggy is a game where you help Froggy and friends by writing CSS code!
- Flexbox Defense - Your job is to stop the incoming enemies from getting past your defences. Unlike other tower defence games, you must position your towers using CSS!
- Grid Garden - Grid Garden, where you write CSS code to grow your carrot garden!
- CodeCombat - An innovative game-based learning technology that has transformed how students learn to code.
- Scratch - Scratch is the world's largest free coding community for kids.
- Tynker - Tynker is a fun way to learn programming and develop problem-solving & critical-thinking skills.
- SQL Murder Mystery - The SQL Murder Mystery is designed to be both a self-directed lesson to learn SQL concepts and commands and a fun game for experienced SQL users to solve an intriguing crime.
- Untrusted - A hacking/social deduction game by Alex Nisnevich and Greg Shuflin.
- Data Visualisation: How America moves its homeless - By the Outside in America team (The Guardian, 2017)
- Here's when we expect Omicron to peak - By Unknown (Dataviz Inspiration, 2022)
- Four ways to slice Obama's 2013 budget proposal - By Shan Carter (The New York Times, 2012)
- Presidential election 2022: the dashboard of polls, speaking time and sponsorships - By Raphaëlle Aubert, Manon Romain and Gary Dagorn (Le Monde, 2022)
- Migration waves - By Unknown (National Geographic, 2022)
- Russia, Gas, and the Ukraine conflict - By Unknown (The New York Times, 2022)
- End of the Covid-19 pandemic - By Sarun Charumilind, Matt Craven, Jessica Lamb, Adam Sabow, Shubham Singhal, and Matt Wilson (McKinsey, 2022)
- Infections caught in laboratories are surprisingly common - By Unknown (The Economist, 2021)
- Individual carbon footprint - By Unknown (The Economist, 2021)
- Child labour - By Unknown (Dataviz Inspiration)
- Stemettes - Mentor young women and non-binary people through the Student to Stemette (STS) programme.
- Robogals - Utilise the resources on Robogals for your mentoring.
- Khan Academy - Utilise the resources on Khan Academy for your mentoring.
- She Codes - Find ways to get involved and mentor with She Codes.
- Girls Who Code - Join the Girls Who Code community by starting a Club or joining the Summer Program teaching staff.
- CBF - Mentor with CBF.
- PyData - Mentor with PyData.
- DataKind - Volunteer in various areas, including data science, project management, event planning, and guest blogging.
- ColorInTech - Utilise the resources on ColorInTech for your mentoring.
- Black Valley - Utilise the resources on Black Valley for your mentoring.
- IBM Data Science Professional Certificate - By IBM - 11 Months - Beginner level. (Coursera, 2022)
- Harvard University Data Science Courses - By Harvard University - 5 weeks - Beginners to advanced levels. (Harvard, 2022)
- Datacamp: Understanding data science - By DataCamp - 2 Hours - Beginner level. (DataCamp, 2022)
- Codecademy Data Science courses - By Codecademy - 1 Month lessons - Beginner level. (Codecademy, 2022)
- JetBrains programming courses - By JetBrains Academy - 1 Month - Beginner level. (JetBrains, 2022)
- Introduction to computational thinking and data science - By MIT - 9 Weeks - Beginner level. (edX, 2022)
- SQL for data science with R - By IBM - 6 Weeks - Beginner level. (edX, 2022)
- Statistical thinking for data science and analytics - By Columbia University - 5 Weeks - Beginner level. (edX, 2022)
- Introduction to natural language processing - By Udemy - 3 Hours - Beginner level. (Udemy, 2022)
- Hands-on data science: build real world projects - By Udemy - 105 Hours - Beginner level. (Udemy, 2022)
- Hacktoberfest - Contribute to open-source projects through Hacktoberfest - A platform to help technical and non-technical people contribute to open-source projects.
- GitHub - Contribute to open-source projects through GitHub - Find a list of open-source projects.
- First Timers Only - Contribute to open-source projects through First Timers Only - for the code newbie or experienced coder.
- First Contributions - Contribute to open-source projects through First Contributions - A project to simplify and guide the way beginners make their first contribution.
- Up-For-Grabs - Contribute to open-source projects through Up-For-Grabs - a list of projects which have curated tasks specifically for new contributors.
- Good First Issues - Contribute to open-source projects through Good first issues - a website primarily targeted at developers who want to contribute to open-source software but do not know where or how to start.
- CodeTriage - Contribute to open-source projects through CodeTriage - Free community tools for contributing to open-source projects.
- Outreachy - Contribute to open-source projects through Outreachy - an online collaborative environment for learning and remote mentoring with experienced FOSS contributors.
- Python - Contribute to open-source projects through Python - an open-source platform for developers.
- Scala - Contribute to open-source projects through Scala - an Open Source platform for learning different technologies based in the Scala Programming Language.
- Season of Docs - Contribute to open-source projects through Season of Docs - A platform that provides support for open-source projects to improve their documentation and allows professional technical writers to gain experience in open source.
- Google Summer of Code - Contribute to open-source projects through Google Summer of Code - a global, online program focused on bringing new contributors into open-source software development.
- Forem - Contribute to open-source projects through Forem - an open-source platform for building modern, independent, and safe communities.
- Open-source programs and competitions - Contribute to open-source projects through the Open Source programs - a GitHub repository list of resources.
- The rise of spatial data science - By Wendy Keyes and David Gadsden (Esri, 2022)
- Data science and the rise of geospatial thinking - By Lauren Bennett (Esri, 2021)
- Data and location: from insights to action - By Mike Lippmann (Esri, 2019)
- Weaving the geospatial fabric of the world with authoritative data - By Greg Bunce (Esri, 2022)
- How data literacy skills help you succeed - By Jordan Morrow (DataCamp, 2022)
- Creating a database for AI - By Davit Buniatyan (Data Crunch, 2022)
- The future of unstructured data - By Edward Cui (Data Crunch, 2022)
- Data governance for data science - By Adam Wood (TWIML AI, 2022)
- Feature platforms for data-centric AI - By Mike Del Balso (TWIML AI, 2022)
- The state of artificial intelligence 2022 - By Stanford Institute for Human-Centered Artificial Intelligence (Data Science at Home, 2022)
- Analyze your personal Netflix data - By Dataquest (2020)
- Data science projects - By GitHub (2022)
- Top 10 data science projects in 2022 - By Hackr.io (2022)
- 14 data science projects from beginner to advanced level - By Udemy (2022)
- Language translator using Google API in Python - By GeeksForGeeks (2022)
- 100 Python projects for beginners - By Nat (2022)
- 200 Python projects for beginners - By Nat (2022)
- Dataquest: Exploring Hacker News posts - By Dataquest (2022)
- Tableau Community Projects - By Tableau (2022)
- 55 Fun Python project ideas - By Dataquest (2022)
- State of Data Science 2022: paving the way for innovation - Viewable on the Anaconda website.
- Scaling AI/ML Initiatives: The critical role of data - Viewable on the Snowflake website.
- The opportunity of biomedical data science - Viewable on the UKRI website.
- Grit: The power of passion and perseverance - Angela Lee Duckworth explains her theory of "grit" as a predictor of success.
- The best stats you've ever seen - Hans Rosling debunks myths about the so-called "developing world."
- The beauty of data visualization - David McCandless suggests the best way to navigate the information glut.
- Beyond the numbers: A data analyst journey - Anna Leach discusses the importance of investing time with people and the process of analysing data, as well as its resources.
- The human insights missing from big data - Tricia Wang demystifies big data and identifies its pitfalls, suggesting that we focus instead on "thick data" -- precious, unquantifiable insights from actual people -- to make the right business decisions and thrive in the unknown.
- Make data more human - Jer Thorp shares his moving projects, from graphing an entire year's news cycle to mapping the way people share articles across the internet.
- Big data is better data - Kenneth Cukier looks at what's next for machine learning -- and human knowledge.
- Own your body's data - Talithia Williams makes a compelling case that all of us should be measuring and recording simple data about our bodies every day — because our own data can reveal much more than even our doctors may know.
- The power of believing that you can improve - Carol Dweck describes two ways to think about a problem that's slightly too hard for you to solve.
- The world needs all kinds of minds - Temple Grandin makes the case that the world needs people on the autism spectrum: visual thinkers, pattern thinkers, verbal thinkers, and all kinds of smart geeky kids.
- Learn just enough JavaScript - Hands-on tutorials by Observable
- Data science tutorial with W3Schools - Hands-on tutorials by W3Schools
- Plotly open-source graphing library tutorials - Hands-on tutorials by Plotly
- Building and deploying machine learning pipelines - Hands-on tutorials by DataCamp
- Getting started with TabPy - Hands-on tutorials by DataCamp
- Databricks data science and engineering - Hands-on tutorials by Databricks
- Decision tree introduction with examples - Hands-on tutorials by GeeksforGeeks
- Build a recurrent neural network using PyTorch - Hands-on tutorials by IBM
- Download datasets from the Red Hat Marketplace - Hands-on tutorials by IBM
- Building models using Jupyter notebooks in IBM Watson Studio - Hands-on tutorials by IBM
- What is data science? - By IBM Technology (YouTube, 2022)
- Learn data science tutorial - By freeCodeCamp (YouTube, 2019)
- Python for data science - By freeCodeCamp (YouTube, 2020)
- Machine learning for data science - By freeCodeCamp (YouTube, 2022)
- Data structures easy to advanced - By freeCodeCamp (YouTube, 2019)
- 12 data science apps using Python - By freeCodeCamp (YouTube, 2021)
- Statistics - Data science basics - By freeCodeCamp (YouTube, 2019)
- Programming R tutorial - By freeCodeCamp (YouTube, 2019)
- How data and culture unlock digital transformation - By DataCamp (YouTube, 2022)
- Stemettes - Volunteer at one of the Stemettes events.
- Robogals - Find your local chapter and start making a difference today.
- Khan Academy - Share your story or help translate Khan Academy content.
- She Codes - Get involved and mentor with She Codes.
- Girls Who Code - Start a club, fundraise or join a campaign.
- CBF - Volunteer with CBF.
- PyData - Volunteer at an event, mentor, or contribute code.
- DataKind - Volunteer at an event.
- ColorInTech - Join the community and attend events to find out about opportunities.
- Sentiment Analysis and prediction in Python - Justin Saddlemyer delivers a live training session helping you to build machine learning models. (DataCamp, 2022)
- Implementing deep learning models in streaming analytics projects - Steven Allan and Daniele Cazzari discuss implementing deep learning models. (Data Science Central, 2021)
- K Means algorithm explained in 60 minutes - This Edureka webinar on K-Nearest Neighbor Algorithm or KNN Algorithm will help you to build your base by covering the theoretical, mathematical and implementation parts of the KNN algorithm in Python. (Edureka, 2022)
- Demystifying the cloud journey - Hear from experts from Bloomberg, Confluent and AWS on the cloud-agnostic approach, using Bloomberg data in cloud infrastructure, and connecting to enterprise data via the cloud. (Bloomberg, 2022)
- The data science learning journey: How ModelOps fits into the big picture - Catherine Truxill and Peter Christie discuss how using ModelOps effectively can help you make the connection between the insights from your data and the answers you need. (SAS, 2022)
- The modern data team as a team sport - Chia-Liang Kao discusses why a modern data team is like a team sport, what a modern data team looks like, and what data observability is, and gives some concrete tips for creating the data dream team. (Cognilytica, 2022)
- Building a personal brand in data - Kate Strachnyi discusses the importance of showcasing your skills and becoming part of the data community. (DataCamp, 2022)
- DZone data pipeline trends: simplify data architectures with an open lakehouse - Jeremiah Morrow and John Esposito discuss how an open data lakehouse can simplify your data architecture and more. (Dremio, 2022)
- How to create a data mesh that is right for you - Dr Jennifer Belissent discusses creating data meshes that are right for your organisation. (Snowflake, 2022)
- How industry cross-collaboration can generate new revenue for Telcos - William Cage discusses how to generate new revenue for telcos. (Snowflake, 2022)
- Natural Language Processing: Trends, Challenges and Opportunities - Marco Bonzanini gives an overview of the advances, challenges and opportunities in NLP technologies, looking at different modelling solutions and the Python ecosystem. (PyData, 2021)
- How To Ensure Responsible Use Of AI With A Real-World Example - Tariq Rashid explores an example of an innovative product designed to help the more vulnerable in our communities and discusses how we should ensure it is developed to be safe, fair and ethical. (PyData, 2021)
- Working with Data in a Connected World - Dr. Clair Sullivan gives a hands-on tutorial that will begin with a discussion comparing querying data in a tabular environment such as SQL or Pandas dataframes. (PyData, 2021)
To the extent possible under law, Natasha has waived all copyright and related or neighboring rights to this work.