
2-1_Bike Sharing Demand

by Hot김치 2022. 7. 10.

Reference


Target

  • Predict the bike rental count (count).

Approach

  • Treat the task as a regression problem (not classification) and estimate the value.
  • Independent variables: all columns other than count
  • Dependent variable: count, i.e. the number of rentals
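The split into independent and dependent variables can be sketched as below. This is a minimal sketch on made-up toy rows, not the author's code: the helper name `split_features_target` is hypothetical, and dropping casual and registered from the features is an added assumption (per the field descriptions they add up to count, so keeping them would leak the target).

```python
import pandas as pd

# Toy rows with made-up values; the real data comes from Kaggle's train.csv.
toy = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2011-01-01 00:00:00",
        "2011-01-01 01:00:00",
        "2011-01-01 02:00:00",
    ]),
    "season": [1, 1, 1],
    "temp": [9.84, 9.02, 9.02],
    "humidity": [81, 80, 80],
    "casual": [3, 8, 5],
    "registered": [13, 32, 27],
    "count": [16, 40, 32],
})

TARGET = "count"
# Assumption: casual + registered make up count, so they are excluded too.
LEAKY_COLUMNS = ["casual", "registered"]


def split_features_target(df, target=TARGET, drop=LEAKY_COLUMNS):
    """Return (X, y): the feature columns and the regression target."""
    X = df.drop(columns=[target, *drop], errors="ignore")
    y = df[target]
    return X, y


X, y = split_features_target(toy)
print(list(X.columns))
print(y.tolist())
```

Because the target is a continuous count rather than a class label, y is then passed to a regressor rather than a classifier.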

Data Fields from kaggle

  • datetime - hourly date + timestamp
  • season
    • 1 = spring
    • 2 = summer
    • 3 = fall
    • 4 = winter
  • holiday - whether the day is considered a holiday
  • workingday - whether the day is neither a weekend nor holiday
  • weather
    • 1: Clear, Few clouds, Partly cloudy
    • 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
    • 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
    • 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
  • temp - temperature in Celsius
  • atemp - "feels like" temperature in Celsius
  • humidity - relative humidity
  • windspeed - wind speed
  • casual - number of non-registered user rentals initiated
  • registered - number of registered user rentals initiated
  • count - number of total rentals
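To make the coded fields easier to read during exploration, the integer codes can be mapped to labels. The sketch below uses made-up toy rows; the column names `season_name` and `weather_name` and the shortened weather labels are hypothetical. It also checks the assumption, suggested by the field descriptions, that casual and registered add up to count.

```python
import pandas as pd

SEASON_LABELS = {1: "spring", 2: "summer", 3: "fall", 4: "winter"}
# Shortened, hypothetical labels for the four weather codes.
WEATHER_LABELS = {
    1: "clear",
    2: "mist",
    3: "light snow/rain",
    4: "heavy rain/snow",
}

# Toy rows with made-up values in the same layout as the Kaggle columns.
toy = pd.DataFrame({
    "season": [1, 2, 3, 4],
    "weather": [1, 2, 3, 1],
    "casual": [3, 8, 5, 0],
    "registered": [13, 32, 27, 1],
    "count": [16, 40, 32, 1],
})

# Map the integer codes to human-readable labels.
toy["season_name"] = toy["season"].map(SEASON_LABELS)
toy["weather_name"] = toy["weather"].map(WEATHER_LABELS)

# Sanity check: total rentals should equal casual + registered.
totals_match = (toy["casual"] + toy["registered"]).eq(toy["count"]).all()
print(toy[["season_name", "weather_name"]])
print(totals_match)
```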

Evaluation

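Submissions are evaluated on the Root Mean Squared Logarithmic Error (RMSLE). The RMSLE is calculated as

$$\text{RMSLE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(\log(p_i + 1) - \log(a_i + 1)\bigr)^2}$$

where n is the number of hours in the test set, p_i is the predicted count, a_i is the actual count, and log is the natural logarithm.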
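RMSLE is the square root of the mean squared difference between log(predicted + 1) and log(actual + 1). A minimal NumPy sketch of the metric follows; the function name `rmsle` is chosen here for illustration and does not come from the original notebook.

```python
import numpy as np


def rmsle(predicted, actual):
    """Root Mean Squared Logarithmic Error between two count sequences."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # log1p(x) computes log(x + 1), which is defined for zero counts.
    log_diff = np.log1p(predicted) - np.log1p(actual)
    return float(np.sqrt(np.mean(log_diff ** 2)))


# Identical predictions give an error of zero.
print(rmsle([16, 40, 32], [16, 40, 32]))
# The same absolute miss of 10 costs more on a small count than a large one.
print(rmsle([20], [10]), rmsle([1010], [1000]))
```

Because the error is computed on log-transformed values, it penalizes relative rather than absolute differences, so misses on low-demand hours weigh as much as proportionally similar misses on high-demand hours.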

1. Import the required libraries

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from scipy.stats import norm
from datetime import datetime
from sklearn.preprocessing import StandardScaler
from scipy import stats
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline

# Use the ggplot style so the grid makes the numeric ranges easy to read in plots
plt.style.use('ggplot')

2. Collecting the data

Analyzing the train data using pandas

In [2]:
## parse_dates takes the column name, so passing 'datetime' loads that column as datetime64.
df_train = pd.read_csv('./data/train.csv', parse_dates=['datetime'])
## Parse the test file the same way so both frames share the datetime dtype.
df_test = pd.read_csv('./data/test.csv', parse_dates=['datetime'])
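To see what parse_dates changes, the sketch below reads a few made-up rows from an in-memory string instead of ./data/train.csv and then derives an hour column through the `.dt` accessor. The toy CSV text and the `hour` column are illustrative assumptions, not part of the original notebook.

```python
import io

import pandas as pd

# Made-up rows in the same layout as a few train.csv columns.
csv_text = """datetime,season,temp,count
2011-01-01 00:00:00,1,9.84,16
2011-01-01 01:00:00,1,9.02,40
2011-01-01 02:00:00,1,9.02,32
"""

# With parse_dates the column is loaded as datetime64 instead of plain strings.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["datetime"])
print(df.dtypes)

# A datetime64 column exposes the .dt accessor for extracting time parts.
df["hour"] = df["datetime"].dt.hour
print(df[["datetime", "hour"]])
```

Without parse_dates the column would load as object (string) dtype, and `.dt` would raise an error until the column is converted with `pd.to_datetime`.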
 
