GamerBhai02

DS Assignment 9 & 10

May 14th, 2025
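import pandas as pd
from sklearn.preprocessing import OneHotEncoder, KBinsDiscretizer

# Load the Titanic dataset and report missing values per column
url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"
df = pd.read_csv(url)
print(df.isnull().sum())

# Impute missing values: median for numeric columns, mode for Embarked.
# Assigning back avoids the chained-assignment warning that inplace=True raises on a column.
df['Age'] = df['Age'].fillna(df['Age'].median())
df['Embarked'] = df['Embarked'].fillna(df['Embarked'].mode()[0])
df['Fare'] = df['Fare'].fillna(df['Fare'].median())

# One-hot encode the categorical columns, dropping the first level of each
categorical_features = ['Sex', 'Embarked']
encoder = OneHotEncoder(sparse_output=False, drop='first')
encoded_data = encoder.fit_transform(df[categorical_features])

# Reuse df's index so the final concat aligns row by row
encoded_df = pd.DataFrame(encoded_data, columns=encoder.get_feature_names_out(), index=df.index)

# Bin Age and Fare into 5 equal-width bins, encoded as ordinal labels 0-4
discretizer = KBinsDiscretizer(n_bins=5, encode='ordinal', strategy='uniform')
df[['Age_binned', 'Fare_binned']] = discretizer.fit_transform(df[['Age', 'Fare']])

# Drop the original columns that have been encoded or binned
df = df.drop(columns=categorical_features + ['Age', 'Fare'])

# Combine the remaining columns with the one-hot encoded features
df_final = pd.concat([df, encoded_df], axis=1)

# print() works outside notebooks; display() is only defined in IPython/Jupyter
print(df_final.head())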
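
For readers unsure what equal-width ("uniform") binning with ordinal labels does, here is a minimal standard-library sketch. The helper name `uniform_bin` and the sample ages are hypothetical and are not part of the assignment. It is meant to illustrate the idea, not to reproduce scikit-learn's implementation exactly.

```python
from bisect import bisect_right


def uniform_bin(values, n_bins=5):
    """Assign each value an ordinal bin index using equal-width bins."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # Constant input: every value falls into a single bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # Only the n_bins - 1 inner cut points are needed for the lookup.
    inner_edges = [lo + width * i for i in range(1, n_bins)]
    # bisect_right sends a value equal to an edge into the higher bin;
    # the maximum value lands in the last bin (n_bins - 1).
    return [bisect_right(inner_edges, v) for v in values]


# Hypothetical sample data: range 0-80, so each bin is 16 wide.
ages = [0.0, 10.0, 20.0, 35.0, 80.0]
bins = uniform_bin(ages, n_bins=5)
print(bins)  # [0, 0, 1, 2, 4]
```

Because the bins have equal width rather than equal counts, a skewed column such as Fare may put most rows in the lowest bin. Setting `strategy='quantile'` on `KBinsDiscretizer` is the usual alternative if balanced bins are wanted.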