Dataquest


Machine Learning Overview

Plus, 67 Free Ways to Improve Your Python Skills and More.

This email was sent November 1, 2022 7:52am EDT



The Dataquest Download

Data Bytes

An Overview of Machine Learning

You've probably heard a lot about machine learning over the past few years. It's used everywhere these days, from recommending music to trading stocks automatically. It can sound like science fiction: a computer making predictions on its own.

Most machine learning works by having an algorithm learn the mapping between some input numbers and a target we're trying to predict. This is called supervised machine learning: we supervise the algorithm by training it on examples where the correct target is already known.

For example, let's say we're trying to predict the stock market. We first set a target (what we're trying to predict). This target might be tomorrow's closing stock price.

We then use the other data we have (in this case, today's opening and closing prices) to predict the target.

You can see the opening and closing prices (called the predictors) and the target in the table below:

Date	Open	Close	Target (next day's close)
2022-09-07	3909.42	3979.87	4006.17
2022-09-08	3959.93	4006.17	4067.36
2022-09-09	4022.93	4067.36	4107.27
2022-09-12	4083.66	4107.27	3932.69
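In the table above, each row's target is simply the next row's closing price. The short sketch below (plain Python, no libraries; the function name is our own) rebuilds the Target column from the Close column. The last row gets `None`, because the following day's close isn't among the Open/Close values in this data.

```python
# Rows from the table above: (date, open, close)
rows = [
    ("2022-09-07", 3909.42, 3979.87),
    ("2022-09-08", 3959.93, 4006.17),
    ("2022-09-09", 4022.93, 4067.36),
    ("2022-09-12", 4083.66, 4107.27),
]

def add_next_day_target(rows):
    """Attach tomorrow's close as the target for each row.

    The final row has no "tomorrow" in this data, so its target is None.
    """
    closes = [close for _, _, close in rows]
    targets = closes[1:] + [None]  # shift closes up by one row
    return [(date, open_, close, target)
            for (date, open_, close), target in zip(rows, targets)]

for row in add_next_day_target(rows):
    print(row)
```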

If we're trying to determine whether the stock price will go up or down tomorrow based on today's price, we might make up some rules by hand:

If today's price is higher than the average over the last month, then the stock will go back down.
If today's closing price is a lot lower than today's opening price, then the price will go back up tomorrow.
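Written as code, the two hand-made rules might look like the sketch below. It makes a few assumptions that the rules themselves leave open: "today's price" is taken to be today's close, "a lot lower" is a hypothetical 1% threshold (`big_drop`), and rule 1 is checked first when both apply. Having to guess each of these is exactly what makes hand-written rules fragile.

```python
from statistics import mean

def rule_based_prediction(last_month_closes, today_open, today_close,
                          big_drop=0.01):
    """Apply the two hand-written rules; return "up", "down", or None.

    big_drop is a hypothetical threshold for "a lot lower": a close at
    least 1% below the open. Rule 1 is checked first (an arbitrary choice).
    """
    # Rule 1: price above the one-month average -> expect it to fall back.
    if today_close > mean(last_month_closes):
        return "down"
    # Rule 2: close far below open -> expect a rebound tomorrow.
    if (today_open - today_close) / today_open >= big_drop:
        return "up"
    # Neither rule fires: no prediction.
    return None
```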

With supervised machine learning, we use an algorithm to learn these rules automatically from historical data. This creates a trained model that we can then test on historical data it hasn't seen (for example, a later time period) to measure its performance. Once we've tested the model and confirmed that it works, we can use it to predict the future (like tomorrow's stock price!).
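Here is a minimal sketch of "learn from history, then test on data the model hasn't seen," using the four rows from the table. The model is deliberately naive and is our own illustration, not a method from this newsletter: it learns the average change from today's close to tomorrow's close on the earlier rows, then adds that change to a new close. The split is chronological, so the test day comes after the training days, mirroring how the model would be used on future data.

```python
# Today's close and tomorrow's close (the target), from the table.
closes  = [3979.87, 4006.17, 4067.36, 4107.27]
targets = [4006.17, 4067.36, 4107.27, 3932.69]

def train(closes, targets):
    """Learn the average change from today's close to tomorrow's close."""
    changes = [t - c for c, t in zip(closes, targets)]
    return sum(changes) / len(changes)

def predict(avg_change, close):
    """Predict tomorrow's close as today's close plus the learned change."""
    return close + avg_change

def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Chronological split: first three days to train, the last day to test.
split = 3
avg_change = train(closes[:split], targets[:split])
test_preds = [predict(avg_change, c) for c in closes[split:]]
mae = mean_absolute_error(targets[split:], test_preds)
print(f"learned average change: {avg_change:.2f}")  # 42.47
print(f"test MAE: {mae:.2f}")                        # 217.05
```

The large error on the test day (the price fell sharply the next day) shows why we test on held-out data before trusting a model. With only one test row this is far too little data to judge anything; it just illustrates the workflow.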

Supervised machine learning algorithms fall into two broad categories:

Classification algorithms learn which category something is in. For example, we might classify images based on which breed of dog is in them.
Regression algorithms will predict a number, like tomorrow's stock price.
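The same stock data can be framed either way. In the sketch below (our own illustration), the regression target is tomorrow's closing price itself, while the classification target is a category derived from it: "up" if tomorrow's close is higher than today's, otherwise "down" (treating an unchanged price as "down" is an arbitrary choice).

```python
closes  = [3979.87, 4006.17, 4067.36, 4107.27]
targets = [4006.17, 4067.36, 4107.27, 3932.69]  # tomorrow's close

# Regression target: the number itself.
regression_targets = targets

# Classification target: a category computed from the same numbers.
classification_targets = ["up" if t > c else "down"
                          for c, t in zip(closes, targets)]
print(classification_targets)  # ['up', 'up', 'up', 'down']
```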

Here are some of the most popular machine learning algorithms:

Linear regression assumes that there is a linear relationship between your target and your predictors. It's one of the most commonly used machine learning algorithms.
Decision trees work by repeatedly splitting your data into two groups based on the value of a predictor (a lot like decision trees you might use in real life!).
Neural networks use internal weights to transform your data from the input into the target. Neural networks are used in deep learning, which has led to many recent AI breakthroughs, like GPT-3.
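To make the first two algorithms less abstract, here are textbook-style sketches in plain Python for a single predictor. These are simplified illustrations, not how libraries implement them (real implementations handle many predictors, build full trees rather than one split, and much more): `fit_simple_linear_regression` finds the least-squares line, and `best_split` performs one splitting step of a regression tree, choosing the threshold that leaves the least squared error when each group predicts its own mean. Neural networks are not sketched here.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares for one predictor: y ~ intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def best_split(x, y):
    """Find the single threshold on x that best splits y into two groups.

    Each group predicts its mean; we keep the threshold with the lowest
    total squared error. Requires at least two distinct x values.
    """
    def sse(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values)

    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # can't split between identical x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        error = sse(left) + sse(right)
        if best is None or error < best[0]:
            best = (error, threshold,
                    sum(left) / len(left), sum(right) / len(right))
    if best is None:
        raise ValueError("need at least two distinct x values to split")
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


print(fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0)
print(best_split([1, 2, 3, 10, 11, 12], [1, 1, 1, 5, 5, 5]))     # (6.5, 1.0, 5.0)
```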

This may sound complicated, and machine learning algorithms do involve a lot of math internally, ranging from linear algebra to calculus.
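As a small taste of that math, here is where both show up in ordinary least squares linear regression with one predictor $x$, a target $y$, and $n$ examples (standard results, included for illustration). Calculus finds the line that minimizes the squared error by setting the partial derivatives to zero; linear algebra expresses the same solution for many predictors, where $X$ holds one row per example plus a column of ones for the intercept, and $X^\top X$ must be invertible.

```latex
L(a, b) = \sum_{i=1}^{n} \left( y_i - a - b\,x_i \right)^2

\frac{\partial L}{\partial a} = 0, \quad \frac{\partial L}{\partial b} = 0
\;\Longrightarrow\;
b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad a = \bar{y} - b\,\bar{x}

\hat{\beta} = \left( X^\top X \right)^{-1} X^\top y
```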

Luckily, you don't need to know all of the math to actually use these algorithms. In most programming languages and data tools, libraries that implement common machine learning algorithms have already been developed:

scikit-learn in Python implements many popular algorithms.
R has packages that implement most machine learning algorithms.
JavaScript now has machine learning libraries, including TensorFlow.js.
Excel can perform linear regression.

If you want to learn more about machine learning, check out our Dataquest courses and projects:

You can also find tutorials on our blog and YouTube:

—Vik

Founder, Dataquest

P.S. This section is new, and we're working to improve it! Please reply to this email and let us know what you think.

Video

How to Start Your Career in Data: Find Your Role, Learn Skills, Build a Portfolio

In this video, we'll cover how to find the right data role, how to learn the skills you need to get hired, and how to show your skills to employers.

You'll learn about the five main data roles: data analyst, data scientist, data engineer, business analyst, and machine learning engineer.
We'll cover the exact skills you need to get a job in each role. We'll talk about a four-step learning method that will help you learn the skills.
Then, we'll cover how to build a project portfolio that will advance your career.

Watch Video

Learning Tips

Python Online Practice: 67 Free Ways to Improve Your Skills

Whether you’re just starting your learning journey or looking to brush up before a job interview, getting the right Python practice can make a big difference. Studies on learning have repeatedly shown that people learn best by doing. So here are 67 ways to practice Python by writing actual code.

Keep reading

Tutorial

Implementing a B-Tree Data Structure

Rudolf Bayer and Edward M. McCreight coined the term B-tree at the Boeing Research Labs in 1971. Their scientific paper, "Organization and Maintenance of Large Ordered Indices," introduced a new data structure for fast data retrieval from disks. Although the B-tree has evolved over the decades, understanding its concepts is still valuable. Here's what we'll cover in this tutorial:

B-tree data structures
B-tree properties
Traversing in a B-tree
Searching in a B-tree
Inserting a key in a B-tree
Deleting a key in a B-tree
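For a quick preview of two of these operations, here is a minimal Python sketch of searching and in-order traversal in a B-tree. It is our own illustration, not the tutorial's code: `BTreeNode`, `search`, and `traverse` are hypothetical names, the tree is built by hand, and insertion and deletion (which keep the tree balanced by splitting and merging nodes) are not shown. Each node stores sorted keys, and a key between `keys[i-1]` and `keys[i]` can only live in `children[i]`.

```python
from bisect import bisect_left
from dataclasses import dataclass, field

@dataclass
class BTreeNode:
    keys: list = field(default_factory=list)      # sorted keys in this node
    children: list = field(default_factory=list)  # empty list means leaf

    @property
    def is_leaf(self):
        return not self.children


def search(node, key):
    """Return (node, index) holding key, or None if key is absent."""
    i = bisect_left(node.keys, key)  # first position with keys[i] >= key
    if i < len(node.keys) and node.keys[i] == key:
        return node, i
    if node.is_leaf:
        return None
    # Keys in children[i] lie between keys[i-1] and keys[i].
    return search(node.children[i], key)


def traverse(node):
    """In-order traversal: yields every key in sorted order."""
    for i, key in enumerate(node.keys):
        if not node.is_leaf:
            yield from traverse(node.children[i])
        yield key
    if not node.is_leaf:
        yield from traverse(node.children[-1])


# A small hand-built tree with a root and three leaves.
root = BTreeNode(
    keys=[10, 20],
    children=[BTreeNode([5, 7]), BTreeNode([12, 15]), BTreeNode([25, 30])],
)
print(list(traverse(root)))  # [5, 7, 10, 12, 15, 20, 25, 30]
print(search(root, 8))       # None
```

Because each node holds many keys, a search touches only one node per level, which is what made B-trees attractive for data stored on disk.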

Keep reading

Community Spotlight

@giovanni.srg shared a high-quality machine learning project, Support Vector Classifier with Python, in which he used the SVC algorithm to classify the level of customer satisfaction with an airline. The project is notable for its excellent explanations, insightful plots, and the high accuracy achieved by the algorithm.

In his R project, Analysis Of The Factors Affecting Forest Fires, @abomayesan did an incredible job of providing the necessary background information and making amazing visualizations.

@anna.strahl shared an awesome project, Finding Indicators of Heavy Traffic on Westbound I-94. In addition to excellent storytelling, she created unusual figures to support her conclusions and identified intriguing ideas for further investigation.

Visit the community

Invest in your future

Go Premium and Accomplish Your Goals.

Your goals are within reach. Subscribe and follow Dataquest's proven paths to grow your career.

View Plans


Dataquest • 548 Market St #73537 San Francisco, CA, 94104
