Web Dev Bootcamp TIL Day-19(Machine Learning Start)

frannyk 2022. 5. 13. 08:57

Websites for collaboration and pulling data

Kaggle: an online community of data scientists and machine learning practitioners.
Colaboratory (Colab): Google's hosted Jupyter notebook

Regression vs. Classification

regression predicts a continuous value from patterns in the data ----> each data input(x) is assigned a float value(y) by the model we have trained
classification assigns a class to each input
- Binary classification & Multi-class classification
- ex) P/F, Grade Scheme, Age Range
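To make the difference in output types concrete, here is a minimal sketch. The inputs, the weight/bias values, and the grade cut-offs are all made up for illustration, and the "classifier" is just a threshold on the regression output rather than a trained model.

```python
import numpy as np

# Hypothetical inputs x (e.g. hours studied); not from any real dataset
hours = np.array([1.0, 4.0, 7.5, 9.0])

# Regression: the output is a float for every input
w, b = 10.0, 5.0                      # assumed slope and intercept
predicted_score = w * hours + b       # array([15., 45., 80., 95.])

# Binary classification: the output is one of two classes (P/F)
pass_fail = np.where(predicted_score >= 60, "P", "F")

# Multi-class classification: the output is one of several classes (grades)
grade_labels = np.array(["F", "D", "C", "B", "A"])
grade_edges = [60, 70, 80, 90]        # assumed cut-offs
grades = grade_labels[np.digitize(predicted_score, grade_edges)]

print(predicted_score)  # floats
print(pass_fail)        # ['F' 'F' 'P' 'P']
print(grades)           # ['F' 'F' 'B' 'A']
```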

Supervised Learning vs. Unsupervised Learning

supervised learning involves annotating (labeling) the input data so that our model learns to predict the annotation
unsupervised learning involves grouping input data into clusters w/out annotations
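A small sketch of the contrast, using made-up 1-D data. The supervised half uses the labels to build a nearest-class-mean predictor. The unsupervised half never sees the labels and groups the same points with a bare-bones 2-cluster k-means. Both are simplified stand-ins chosen for brevity, not methods from the course.

```python
import numpy as np

points = np.array([1.0, 1.2, 0.8, 5.0, 5.3, 4.9])   # hypothetical inputs
labels = np.array([0, 0, 0, 1, 1, 1])               # annotations (supervised only)

# Supervised: learn one mean per labeled class, predict the nearest class mean
class_means = np.array([points[labels == c].mean() for c in (0, 1)])

def predict(x):
    return int(np.argmin(np.abs(class_means - x)))

# Unsupervised: no labels, just group points around 2 cluster centers (k-means)
centers = np.array([points.min(), points.max()])    # simple initial guess
for _ in range(10):
    # assign every point to its closest center, then move centers to the group mean
    assignment = np.argmin(np.abs(points[:, None] - centers[None, :]), axis=1)
    centers = np.array([points[assignment == c].mean() for c in (0, 1)])

print(predict(1.1), predict(4.5))   # predicted classes for two new inputs
print(assignment)                   # cluster index per point, found without labels
```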

Reinforcement Learning

agent: who
environment: where(rules and settings of a game)
state: current state/status(certain position)
action: what the agent does in the current state
reward: feedback the agent receives for an action
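The five terms above fit together in a loop: the agent sees a state, picks an action, and the environment answers with a new state and a reward. Below is a rough sketch of that loop only. `CorridorEnv`, `run_episode`, and both policies are hypothetical names invented for this example, and no actual learning happens here.

```python
import random

class CorridorEnv:
    """Hypothetical environment: a 1-D corridor with the goal at the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0                      # state: the agent's current position
        return self.state

    def step(self, action):
        # action: -1 (move left) or +1 (move right), clamped to the corridor
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0       # reward: only given at the goal
        return self.state, reward, done

def run_episode(env, policy, max_steps=100):
    """Run the agent (a policy function) in the environment for one episode."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward

def always_right(state):
    return 1

def random_policy(state):
    return random.choice([-1, 1])

random.seed(0)
print(run_episode(CorridorEnv(), always_right))   # 1.0, reaches the goal in 4 steps
print(run_episode(CorridorEnv(), random_policy))  # 0.0 or 1.0 depending on the walk
```

A real reinforcement learning algorithm would use the rewards to improve the policy over many episodes; this sketch stops at the interaction itself.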

Data Splitting

it is convention to split the data at a fixed ratio ----> 4(training set) : 1(test set), i.e. 80% / 20%
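A quick sketch of the 4:1 split with scikit-learn's `train_test_split`. The data here is a made-up array of 100 samples, and `random_state=42` is only there so the shuffle is repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical dataset: 100 samples, 1 feature each
x_data = np.arange(100).reshape(-1, 1)
y_data = 2 * x_data.ravel() + 1

# test_size=0.2 keeps 80% for training and 20% for testing (4:1)
x_train, x_test, y_train, y_test = train_test_split(
    x_data, y_data, test_size=0.2, random_state=42
)

print(len(x_train), len(x_test))  # 80 20
```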

Packages we will use

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam, SGD
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split

Basic model template

parameters:
- weight & bias: slope and intercept of the function the model learns
- loss function: calculates the difference between actual outputs and predicted outputs
- optimizer: updates the weight and bias in response to the output of the loss function, so that the loss gets smaller
- learning rate: tuning parameter of the optimizer that determines the step size at each iteration while moving toward a minimum of the loss function
- epochs: number of full passes over the entire training dataset the algorithm has completed

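a learning rate that is too small converges slowly and may stall in a local minimum instead of reaching the global minimum loss; one that is too large may overshoot the minimum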
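To see how these pieces interact, here is a hand-written gradient descent loop in NumPy. It is a stand-in for what a Keras `Dense(1)` layer with an optimizer does, not the Keras API itself. The data, the `train` function, and the chosen learning rates are assumptions made for this sketch.

```python
import numpy as np

# Hypothetical data generated from y = 2x + 1, so the ideal weight is 2 and bias is 1
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0

def train(x, y, learning_rate, epochs):
    w, b = 0.0, 0.0                          # weight & bias start at zero
    for _ in range(epochs):                  # one epoch = one pass over all data
        error = (w * x + b) - y              # predicted output minus actual output
        # gradients of the mean squared error loss with respect to w and b
        grad_w = 2.0 * np.mean(error * x)
        grad_b = 2.0 * np.mean(error)
        # optimizer step: move against the gradient, scaled by the learning rate
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    loss = np.mean(((w * x + b) - y) ** 2)   # loss function: mean squared error
    return w, b, loss

w, b, loss = train(x, y, learning_rate=0.1, epochs=2000)
w_slow, b_slow, loss_slow = train(x, y, learning_rate=0.0001, epochs=2000)

print(w, b, loss)                  # close to 2, 1, and a loss near 0
print(w_slow, b_slow, loss_slow)   # still far from 2 and 1 after the same epochs
```

With the same number of epochs, the tiny learning rate has barely moved away from its starting point, which is the practical cost of choosing a step size that is too small.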

Machine Learning Models

Support Vector Machine(SVM)
- creates a dividing boundary whose margin is as wide as possible, i.e. equidistant from the nearest points of the opposing classes
- new features will result in additional dimensions ----> the boundary will be multidimensional as well
k-Nearest Neighbors(KNN)
- identifies a new object's class based on the classes of the k objects closest to it
Random Forest
- runs an object through multiple decision trees then outputs a result based on majority voting
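Of the three, KNN is the easiest to write by hand, so here is a minimal NumPy sketch of it. `knn_predict` and the sample points are made up for illustration; in practice a library implementation would normally be used. The majority vote at the end is the same idea a random forest applies to the outputs of its trees.

```python
import numpy as np
from collections import Counter

def knn_predict(train_points, train_labels, new_point, k=3):
    """Classify new_point by majority vote among its k nearest training points."""
    # Euclidean distance from the new point to every training point
    distances = np.linalg.norm(train_points - new_point, axis=1)
    nearest = np.argsort(distances)[:k]              # indices of the k closest
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D training data with two classes
train_points = np.array([
    [1.0, 1.0], [1.5, 2.0], [2.0, 1.0],   # class "a"
    [6.0, 6.0], [7.0, 7.5], [6.5, 6.0],   # class "b"
])
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_points, train_labels, np.array([1.2, 1.4])))  # a
print(knn_predict(train_points, train_labels, np.array([6.8, 6.5])))  # b
```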

Preprocessing Data

normalizing data
- ex) scaling x_data to z-scores ----> (x - u) / s, where u is the mean and s is the standard deviation
standardizing data
- ex) zero-centering data ----> x - u
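Both preprocessing steps are one-liners in NumPy. The sketch below uses a made-up `x_data` array, with `u` as the mean and `s` as the (population) standard deviation.

```python
import numpy as np

x_data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # hypothetical values

u = x_data.mean()            # 5.0
s = x_data.std()             # 2.0 (population standard deviation)

centered = x_data - u        # zero-centering: the mean becomes 0
z_scores = (x_data - u) / s  # z-scores: mean 0 and standard deviation 1

print(centered)   # [-3. -1. -1. -1.  0.  0.  2.  4.]
print(z_scores)   # [-1.5 -0.5 -0.5 -0.5  0.   0.   1.   2. ]
```

Note the parentheses: without them, `x - u / s` would divide only `u` by `s`.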