Web Dev Bootcamp TIL Day-19 (Machine Learning Start)
frannyk
2022. 5. 13. 08:57
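Websites for collaboration and sourcing data
- Kaggle: an online community of data scientists and machine learning practitioners that also hosts public datasets and competitions
- Colaboratory (Colab): Google's hosted Jupyter notebook environment that runs in the browser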
Regression vs. Classification
- regression predicts a continuous value ----> the model we implement assigns each input (x) a real-valued (float) output (y)
- classification assigns a discrete class (label) to each input
- binary classification (two classes) & multi-class classification (more than two classes)
- ex) pass/fail (binary); grade scheme, age range (multi-class)
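To make the difference concrete, here is a minimal sketch assuming scikit-learn is installed (the notes already import it for `train_test_split`). The "hours studied" data is hypothetical. The same input is used twice: once with a continuous target (regression) and once with a pass/fail target (binary classification).

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical toy data: hours studied (one feature per row)
hours = np.array([[1], [2], [3], [4], [5], [6]])
scores = np.array([52.0, 58.0, 63.0, 70.0, 74.0, 81.0])  # continuous target -> regression
passed = np.array([0, 0, 0, 1, 1, 1])                    # class target (0 = fail, 1 = pass) -> classification

reg = LinearRegression().fit(hours, scores)
clf = LogisticRegression().fit(hours, passed)

# Regression outputs a float for each input
print(reg.predict([[3.5]]))
# Classification outputs one of the known class labels for each input
print(clf.predict([[1], [6]]))
```

Note that `LogisticRegression` is a classifier despite its name: it models a probability internally but `predict` returns a class label.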
Supervised Learning vs. Unsupervised Learning
- supervised learning trains on input data that has been annotated (labeled) so that our model learns to predict the annotation for new inputs
- unsupervised learning works on input data w/out annotations and discovers structure on its own, e.g. grouping similar inputs into clusters
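As an unsupervised example, a minimal sketch assuming scikit-learn's `KMeans`: the toy points below are hypothetical and come with no labels (no `y`), yet the algorithm still separates them into groups. The cluster numbers it assigns (0 or 1) are arbitrary IDs, not annotations.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical unlabeled data: two obvious groups, but no annotations are given
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
              [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])

# Ask for 2 clusters; n_init and random_state are set for repeatable results
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Points in the same group share a cluster ID; which group gets 0 vs 1 is arbitrary
print(labels)
```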
Reinforcement Learning
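- agent: the learner / decision maker (who)
- environment: where the agent acts (the rules and settings of a game)
- state: the current situation of the environment (e.g. a certain position)
- action: a move the agent chooses in the current state
- reward: the feedback signal the environment returns after an action; the agent tries to maximize its total reward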
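The five terms fit together in an interaction loop. Below is a minimal sketch in plain Python; `CorridorEnv`, `run_episode`, and `always_right` are hypothetical names made up for illustration, and the agent is just a fixed policy function. Nothing here learns: a real RL algorithm would use the rewards to improve the policy over many episodes.

```python
class CorridorEnv:
    """Hypothetical toy environment: cells 0..length-1, goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        # Put the agent back at the start and return the initial state
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); walls keep the position inside the corridor
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        # Rules of the "game": small penalty per move, +1.0 for reaching the goal
        reward = 1.0 if done else -0.1
        return self.state, reward, done


def run_episode(env, policy, max_steps=50):
    """Let an agent (a policy: state -> action) interact with the environment."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                  # agent chooses an action
        state, reward, done = env.step(action)  # environment returns new state + reward
        total_reward += reward
        if done:
            break
    return state, total_reward


def always_right(state):
    # Hypothetical fixed policy: ignore the state and always move right
    return +1


final_state, total = run_episode(CorridorEnv(length=5), always_right)
print(final_state, round(total, 2))  # 4 0.7
```

Reaching the goal takes four moves (three penalties of -0.1 plus the final +1.0), which is why the total is 0.7.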
Data Splitting
- it is conventional to split the data into a training set and a test set at a fixed ratio, commonly 4 (training set) : 1 (test set), i.e. 80/20
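A minimal sketch of the 4:1 split with scikit-learn's `train_test_split`; the 50-sample arrays are hypothetical placeholders. `test_size=0.2` reserves 20% of the samples for testing, and `random_state` makes the shuffle repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 50 samples with 2 features each, and a 0/1 label per sample
X = np.arange(100).reshape(50, 2)
y = np.arange(50) % 2

# 80% training / 20% test, shuffled with a fixed seed
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # (40, 2) (10, 2)
```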
Packages we will use
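```python
# Keras (TensorFlow) pieces for building and training neural networks
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam, SGD

# Data handling and visualization
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Splitting data into training and test sets
from sklearn.model_selection import train_test_split
```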
Basic model template
- key components:
- weight & bias: the learnable parameters; in a linear model y = wx + b they are the slope and intercept
- loss function: measures the difference (error) between the actual outputs and the predicted outputs
- optimizer: updates the weight & bias based on the loss (typically its gradient) so as to minimize the loss function
- learning rate: a hyperparameter of the optimization algorithm that sets the step size at each iteration while moving toward a minimum of the loss function
- epochs: the number of complete passes the algorithm makes over the entire training dataset (also a hyperparameter)
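Each component of the template shows up in a hand-written training loop. The sketch below uses plain NumPy (not Keras) so every step is visible; the toy data generated from y = 2x + 1 is hypothetical. The loss is mean squared error and the optimizer is plain full-batch gradient descent.

```python
import numpy as np

# Hypothetical toy data generated from y = 2x + 1 (no noise)
x = np.linspace(0, 1, 20)
y = 2 * x + 1

w, b = 0.0, 0.0        # parameters: weight (slope) and bias (intercept)
learning_rate = 0.5    # hyperparameter: step size of each update
epochs = 1000          # hyperparameter: number of passes over the data

for epoch in range(epochs):
    y_pred = w * x + b
    error = y_pred - y
    loss = np.mean(error ** 2)        # loss function: mean squared error

    # Gradients of the MSE loss with respect to w and b
    grad_w = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)

    # Optimizer: plain gradient descent, step against the gradient
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b, loss)  # w and b approach 2 and 1
```

In Keras the same roles are played, roughly, by a `Sequential` model with one `Dense(1)` layer, `model.compile(optimizer=SGD(learning_rate=...), loss="mse")`, and `model.fit(x, y, epochs=...)`. A much larger learning rate can make the updates overshoot and diverge instead of converging.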
Machine Learning Models
- Support Vector Machine (SVM)
- finds the decision boundary (hyperplane) with the largest margin, so the boundary is equidistant from the closest points (support vectors) of the opposing classes
- each additional feature adds a dimension ----> the boundary becomes a hyperplane in that higher-dimensional space
- k-Nearest Neighbors (KNN)
- classifies a new object by majority vote among the k training objects closest to it
- Random Forest
- runs the object through multiple decision trees, each trained on a random sample of the data, then outputs a result based on majority voting (for regression, the average)
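For the SVM, a minimal sketch assuming scikit-learn's `SVC` with a linear kernel; the toy points are hypothetical. A very large `C` approximates a hard-margin SVM on separable data. Dividing the decision function by the norm of the weight vector gives each support vector's signed distance to the boundary, which illustrates the "equidistant" property.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two linearly separable classes in 2-D
X = np.array([[1, 1], [2, 1], [1, 2],    # class 0
              [5, 5], [6, 5], [5, 6]])   # class 1
y = np.array([0, 0, 0, 1, 1, 1])

# Linear kernel + large C approximates a hard-margin SVM
clf = SVC(kernel="linear", C=1000.0)
clf.fit(X, y)

w = clf.coef_[0]  # normal vector of the separating hyperplane

# Signed distance of each support vector to the boundary: f(x) / ||w||
distances = clf.decision_function(clf.support_vectors_) / np.linalg.norm(w)

print("support vectors:\n", clf.support_vectors_)
print("distances to boundary:", distances)  # roughly equal magnitude, opposite signs
print(clf.predict([[0, 0], [7, 7]]))
```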
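KNN is simple enough to write from scratch. In this sketch `knn_predict` is a hypothetical helper (scikit-learn's `KNeighborsClassifier(n_neighbors=3)` provides the library version), distance is assumed to be Euclidean, and the toy data is made up. An odd `k` avoids ties between two classes.

```python
import numpy as np
from collections import Counter


def knn_predict(X_train, y_train, x_new, k=3):
    """Hypothetical helper: classify x_new by majority vote of its k nearest training points."""
    # Euclidean distance from the new point to every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Majority vote among their labels
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Hypothetical toy data
X_train = np.array([[1, 1], [1, 2], [2, 1],
                    [6, 6], [6, 7], [7, 6]])
y_train = np.array(["A", "A", "A", "B", "B", "B"])

print(knn_predict(X_train, y_train, np.array([1.5, 1.5])))  # A
print(knn_predict(X_train, y_train, np.array([6.5, 6.5])))  # B
```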
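For the random forest, a sketch assuming scikit-learn's `RandomForestClassifier` on hypothetical toy data. One caveat: according to the scikit-learn documentation, its forest combines trees by averaging their predicted class probabilities rather than by a hard majority vote. To show the voting idea directly, the code also collects each fitted tree's prediction from `forest.estimators_` and takes the majority by hand; on clear-cut points like these, both give the same answer.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data: two well-separated groups
X = np.array([[1, 1], [1, 2], [2, 1], [2, 2], [1.5, 1.5],
              [8, 8], [8, 9], [9, 8], [9, 9], [8.5, 8.5]])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

forest = RandomForestClassifier(n_estimators=25, random_state=0)
forest.fit(X, y)

X_new = np.array([[1.2, 1.8], [8.7, 8.2]])

# One row per tree: each tree's prediction for every new point
votes = np.array([tree.predict(X_new) for tree in forest.estimators_])

# Hard majority vote (labels are 0/1, so a mean above 0.5 means most trees said 1)
majority = (votes.mean(axis=0) > 0.5).astype(int)

print(majority)
print(forest.predict(X_new))  # scikit-learn's own (probability-averaging) prediction
```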
Preprocessing Data
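- standardizing data
- ex) scaling x_data to z-scores ----> z = (x - μ) / σ, where μ is the mean and σ the standard deviation; this zero-centers the data (mean 0) and gives it unit standard deviation
- normalizing data
- ex) min-max scaling ----> (x - min) / (max - min), which rescales each feature into the range [0, 1]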
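Both rescalings are one line each in NumPy. The sketch below uses a hypothetical four-value feature. Note the parentheses in the z-score, (x - μ) / σ: the mean is subtracted before dividing. Min-max scaling is included as the common alternative. scikit-learn's `StandardScaler` and `MinMaxScaler` (in `sklearn.preprocessing`) apply the same formulas column by column; `StandardScaler` also uses the population standard deviation, which matches NumPy's default `std`.

```python
import numpy as np

# Hypothetical single feature
x = np.array([2.0, 4.0, 6.0, 8.0])

# Standardization (z-scores): subtract the mean, then divide by the standard deviation
z = (x - x.mean()) / x.std()

# Min-max normalization: rescale into the range [0, 1]
x_minmax = (x - x.min()) / (x.max() - x.min())

print(z)         # mean 0, standard deviation 1
print(x_minmax)  # values 0, 1/3, 2/3, 1
```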
