Ex No: 01
Date: 19/02/2025
CREATE A SAMPLE DATASET AND EXPLORE STATISTICAL OPERATIONS USING PANDAS AND VISUALIZE THE RESULTS THROUGH PLOTS
Aim:
To perform exploratory data analysis (EDA) on a user-defined sample dataset, using statistical operations and visualizations to identify patterns, trends, and insights.
Procedure:
Step 1: Open a new notebook in Google Colab.
Step 2: Import all the necessary modules.
Step 3: Create a dataset using the NumPy random number generator and download it using the files module.
Step 4: Load the dataset and convert it to a DataFrame using pandas.
Step 5: Compute the listed metrics using the DataFrame and the imported packages.
Step 6: Additionally, import seaborn and matplotlib to plot graphs for the dataset.
Implementation:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from google.colab import files

# Upload the dataset manually in Google Colab
uploaded = files.upload()
file_path = "sales_data (1).csv"  # Adjusted path for Google Colab
df = pd.read_csv(file_path)

# Display the top 5 rows of the DataFrame
print(df.head())
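The procedure promises NumPy-generated data, statistical operations, and plots beyond df.head(). A minimal sketch continuing from the cells above is given below; the column names 'Region' and 'Sales' are illustrative assumptions about the sample data, not part of the original record.

# (Optional) Recreate a sample dataset with NumPy, as in Step 3 of the procedure,
# if no CSV is at hand; the columns here are assumed for illustration
rng = np.random.default_rng(42)
df = pd.DataFrame({
    'Region': rng.choice(['North', 'South', 'East', 'West'], size=100),
    'Sales': rng.normal(loc=500, scale=120, size=100).round(2)
})

# Summary statistics (count, mean, std, min, quartiles, max) for numeric columns
print(df.describe())

# Per-column statistics (assumes a numeric 'Sales' column)
print("Mean:", df['Sales'].mean())
print("Median:", df['Sales'].median())
print("Std dev:", df['Sales'].std())

# Histogram of the 'Sales' column
sns.histplot(df['Sales'], kde=True)
plt.title('Distribution of Sales')
plt.show()

# Bar plot of mean sales per region (assumes a categorical 'Region' column)
df.groupby('Region')['Sales'].mean().plot(kind='bar')
plt.ylabel('Mean Sales')
plt.title('Mean Sales by Region')
plt.show()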
Ex No: 02
Date: 06/03/2025
IMPLEMENT UNINFORMED SEARCH STRATEGIES FOR ANY REAL-WORLD PROBLEM
Aim:
To develop a User vs AI Tic-Tac-Toe game where the AI uses Breadth-First Search (BFS) to determine optimal moves.
Algorithm:
Step 1: Initialize the Board
1. Create a 3x3 board with all cells initially set to empty ('-').
2. Print the initial empty board for the player.
Step 2: Display the Board
1. Print the current state of the board after each move.
Step 3: Main Game Loop (alternating turns between the human player and the AI)
1. Set the current player to 'X' (human player) initially.
2. Repeat until there is a winner or the board is full:
   1. If it is the human player's turn (current player == 'X'):
      1. Prompt the human player to enter a row and column (0, 1, 2) for their move.
      2. Check if the selected cell is empty:
         1. If the cell is not empty, prompt the user again.
         2. If the cell is empty, make the move by placing 'X' in the selected spot.
   2. If it is the AI's turn (current player == 'O'):
      1. The AI chooses a move based on the game state using the BFS algorithm:
         1. Explore all valid moves by simulating each one.
         2. For each move, evaluate whether it leads to a win, loss, or draw.
         3. Select the move that maximizes the AI's chances of winning and minimizes the human player's chances (using the minimax strategy).
      2. Make the move by placing 'O' in the selected spot.
Step 4: Check for a Winner
1. After every move, check if the current player has won the game:
   1. Check each row, column, and both diagonals to see if all cells contain the same player's symbol ('X' or 'O').
   2. If a player has won, declare the winner and end the game.
Step 5: Check for a Draw
1. After each move, check if the game has ended in a draw:
   1. If there are no valid moves left and no winner, declare the game a draw.
Step 6: Switch Turns
1. If there is no winner or draw, switch the current player:
   1. If current player == 'X', switch to 'O' (AI's turn).
   2. If current player == 'O', switch to 'X' (human player's turn).
Step 7: End the Game
1. The game ends when:
   1. A player wins the game.
   2. The game ends in a draw (no valid moves left and no winner).
2. Print the result ("Player X wins!", "Player O wins!", or "It's a draw!").
3. Exit the game.
Implementation:
from collections import deque

def print_board(board):
    print("-------------")
    for row in board:
        print("|", " | ".join(row), "|")
    print("-------------")

def check_winner(board, player):
    # Check rows
    for row in board:
        if all(cell == player for cell in row):
            return True
    # Check columns
    for col in range(3):
        if all(board[row][col] == player for row in range(3)):
            return True
    # Check diagonals
    if all(board[i][i] == player for i in range(3)) or all(board[i][2 - i] == player for i in range(3)):
        return True
    return False

def get_valid_moves(board):
    moves = []
    for i in range(3):
        for j in range(3):
            if board[i][j] == '-':
                moves.append((i, j))
    return moves

def make_move(board, move, player):
    board[move[0]][move[1]] = player

def bfs(board, player):
    # Breadth-first exploration of game states from `board`, with `player` to move
    queue = deque([(board, player)])
    while queue:
        current_board, current_player = queue.popleft()
        if check_winner(current_board, 'X'):
            return -1
        elif check_winner(current_board, 'O'):
            return 1
        elif len(get_valid_moves(current_board)) == 0:
            return 0
        for move in get_valid_moves(current_board):
            next_board = [row[:] for row in current_board]
            make_move(next_board, move, 'O' if current_player == 'X' else 'X')
            queue.append((next_board, 'O' if current_player == 'X' else 'X'))

def main():
    board = [['-' for _ in range(3)] for _ in range(3)]
    print("Welcome to Tic Tac Toe!\nEnter row and column (0-2) to make your move.")
    print_board(board)
    current_player = 'X'
    while True:
        if current_player == 'X':
            row = int(input(f"Player {current_player}, enter row: "))
            col = int(input(f"Player {current_player}, enter column: "))
            if board[row][col] != '-':
                print("Invalid move! Try again.")
                continue
            make_move(board, (row, col), current_player)
        else:
            print(f"Player {current_player}'s turn (AI). Thinking...")
            best_move, best_score = None, -float('inf') if current_player == 'O' else float('inf')
            for move in get_valid_moves(board):
                next_board = [row[:] for row in board]
                make_move(next_board, move, 'O' if current_player == 'O' else 'X')
                score = bfs(next_board, 'O' if current_player == 'X' else 'X')
                if (current_player == 'O' and score > best_score) or (current_player == 'X' and score < best_score):
                    best_score, best_move = score, move
            make_move(board, best_move, current_player)
            print(f"AI (Player {current_player}) chooses {best_move}")
        print_board(board)
        if check_winner(board, current_player):
            print(f"Player {current_player} wins!")
            break
        elif len(get_valid_moves(board)) == 0:
            print("It's a draw!")
            break
        else:
            current_player = 'X' if current_player == 'O' else 'O'

if __name__ == "__main__":
    main()
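A quick, non-interactive way to exercise the evaluator is to score a hypothetical midgame position directly. Note that bfs() as written returns the outcome of the first finished game it dequeues in breadth-first order rather than a full minimax value, so this is a smoke test, not a proof of optimal play:

# Hypothetical midgame position (an assumption for illustration): X to move next
sample = [['X', 'O', 'X'],
          ['-', 'O', '-'],
          ['-', '-', '-']]
# 1 favours 'O', -1 favours 'X', 0 is a draw
print(bfs([row[:] for row in sample], 'X'))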
Exp. No: 3(a)
Date:
FIND OPTIMAL SOLUTION FOR A GIVEN PROBLEM USING ANY LOCAL SEARCH ALGORITHM
Aim:
To use a genetic algorithm to find a conflict-free arrangement of queens on an 8x8 chessboard through selection, crossover, and mutation.
Algorithm:
Step 1: Input Initial Configuration: Prompt the user to enter the column position (0-7) for each row (optional).
Step 2: Initialize Parameters: Set population_size = 100, mutation_rate = 0.1, generations = 1000.
Step 3: Generate Initial Population: Create 100 individuals, each with 8 integers (column positions of queens).
Step 4: Fitness Function: Evaluate conflicts (same column or diagonal) and calculate fitness as fitness = 28 - conflicts.
Step 5: Repeat for Each Generation:
a. Selection: Use roulette-wheel selection to choose parents.
b. Crossover: Choose a crossover point and create two children.
c. Mutation: Apply mutation with some probability to change queen positions.
d. Form New Population: Add children to the new population until the desired size is reached.
e. Check for Solution: If fitness = 28, return the solution and stop.
Step 6: Display Best Solution: Print the board with queens ("Q") and empty spaces ("."). Output the generation number, or the best solution if the maximum number of generations is reached.
Implementation:
import random

def generate_population(size):
    population = []
    for _ in range(size):
        individual = [random.randint(0, 7) for _ in range(8)]
        population.append(individual)
    return population

def calculate_fitness(individual):
    conflicts = 0
    for i in range(8):
        for j in range(i + 1, 8):
            if individual[i] == individual[j] or abs(individual[i] - individual[j]) == abs(i - j):
                conflicts += 1
    return 28 - conflicts  # 28 is the maximum fitness score achievable

def select_parents(population):
    # Roulette-wheel selection: probability proportional to fitness
    total_fitness = sum(calculate_fitness(individual) for individual in population)
    probabilities = [calculate_fitness(individual) / total_fitness for individual in population]
    parent1 = random.choices(population, weights=probabilities)[0]
    parent2 = random.choices(population, weights=probabilities)[0]
    return parent1, parent2

def crossover(parent1, parent2):
    crossover_point = random.randint(0, 7)
    child1 = parent1[:crossover_point] + parent2[crossover_point:]
    child2 = parent2[:crossover_point] + parent1[crossover_point:]
    return child1, child2

def mutate(individual, mutation_rate):
    for i in range(8):
        if random.random() < mutation_rate:
            individual[i] = random.randint(0, 7)
    return individual

def genetic_algorithm(population_size, mutation_rate, generations):
    population = generate_population(population_size)
    for gen in range(generations):
        new_population = []
        for _ in range(population_size // 2):
            parent1, parent2 = select_parents(population)
            child1, child2 = crossover(parent1, parent2)
            child1 = mutate(child1, mutation_rate)
            child2 = mutate(child2, mutation_rate)
            new_population.extend([child1, child2])
        population = new_population
        best_individual = max(population, key=calculate_fitness)
        if calculate_fitness(best_individual) == 28:
            return best_individual, gen
    best_individual = max(population, key=calculate_fitness)
    return best_individual, generations

def print_board(board):
    for i in range(8):
        for j in range(8):
            if board[i] == j:
                print("Q", end=" ")
            else:
                print(".", end=" ")
        print()
    print()

def main():
    print("Enter the initial positions of queens (0-7) for each row:")
    initial_board = []  # Note: read for reference only; the GA starts from a random population
    for i in range(8):
        position = int(input(f"Row {i}: "))
        initial_board.append(position)
    population_size = 100
    mutation_rate = 0.1
    generations = 1000
    solution, gen_count = genetic_algorithm(population_size, mutation_rate, generations)
    print("Solution Found:")
    print_board(solution)
    print(f"Solution found in generation: {gen_count}")

if __name__ == "__main__":
    main()
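As noted in the comment above, main() reads an initial configuration but genetic_algorithm() starts from a purely random population. If the user's board should participate in the search, one minimal (assumed, not from the original) extension is to seed it into the initial population:

def generate_population_seeded(size, seed_individual=None):
    # Like generate_population, but optionally includes the user's board once
    population = [seed_individual[:]] if seed_individual else []
    while len(population) < size:
        population.append([random.randint(0, 7) for _ in range(8)])
    return population

# Hypothetical usage: pass initial_board through genetic_algorithm and call
# generate_population_seeded(population_size, initial_board) in place of generate_population.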
Exp. No: 3(b)
Date:
FIND OPTIMAL SOLUTION FOR A GIVEN PROBLEM USING ANY LOCAL SEARCH ALGORITHM
Aim:
To solve the 8 Queens Problem using the Hill Climbing Algorithm, which places eight queens on a chessboard such that no two queens attack each other, by iteratively moving towards better (lower-cost) configurations.
Algorithm:
Step 1: Generate a random initial board with 8 queens (one queen per row, represented by its column position).
Step 2: Calculate the cost (number of attacking queen pairs).
Step 3: Generate all possible next boards by moving each queen within its own row.
Step 4: Select the next board with the lowest cost.
Step 5:
• If the new board has a lower cost, move to it.
• Else, restart with a new random board.
Step 6: Repeat Steps 2-5 until a board with cost = 0 is found.
Step 7: Display the final solution board and the number of steps or restarts.
Implementation:
import random

def generate_board():
    return [random.randint(0, 7) for _ in range(8)]

def calculate_cost(board):
    cost = 0
    for i in range(len(board)):
        for j in range(i + 1, len(board)):
            if board[i] == board[j] or abs(board[i] - board[j]) == abs(i - j):
                cost += 1
    return cost

def get_next_board(board):
    # Generate every board reachable by moving one queen, then pick the cheapest
    next_boards = []
    for i in range(8):
        for j in range(8):
            if j != board[i]:
                next_board = list(board)
                next_board[i] = j
                next_boards.append(next_board)
    next_boards.sort(key=lambda x: calculate_cost(x))
    return next_boards[0], calculate_cost(next_boards[0])

def print_board(board):
    for i in range(8):
        for j in range(8):
            if board[i] == j:
                print("Q", end=" ")
            else:
                print(".", end=" ")
        print()
    print()

def hill_climbing():
    current_board = generate_board()
    current_cost = calculate_cost(current_board)
    while True:
        print("Current Board:")
        print_board(current_board)
        print("Cost:", current_cost)
        if current_cost == 0:
            print("Solution Found!")
            break
        next_board, next_cost = get_next_board(current_board)
        if next_cost >= current_cost:
            print("Local optimum reached. Restarting...")
            current_board = generate_board()
            current_cost = calculate_cost(current_board)
        else:
            current_board = next_board
            current_cost = next_cost

def main():
    print("Solving 8 Queens Problem using Hill Climbing Algorithm:")
    hill_climbing()

if __name__ == "__main__":
    main()
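Step 7 of the algorithm asks for the number of steps or restarts, which the loop above does not track. A minimal counted variant (the names steps and restarts are additions) could look like this:

def hill_climbing_counted():
    # Same search as hill_climbing(), but counts downhill moves and random restarts
    steps, restarts = 0, 0
    current_board = generate_board()
    current_cost = calculate_cost(current_board)
    while current_cost != 0:
        next_board, next_cost = get_next_board(current_board)
        if next_cost >= current_cost:
            restarts += 1
            current_board = generate_board()
            current_cost = calculate_cost(current_board)
        else:
            steps += 1
            current_board, current_cost = next_board, next_cost
    print_board(current_board)
    print(f"Solved after {steps} steps and {restarts} restarts")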
Ex No: 04
Date:
PROPOSE AN AI SOLUTION FOR A GIVEN CONSTRAINT SATISFACTION PROBLEM
Aim:
To solve the Water Jug Problem using Constraint Satisfaction Problem (CSP) techniques by modeling the jugs and their constraints to find an exact measurement of water, demonstrating how CSP methods yield efficient solutions through search and pruning.
Procedure:
Step 1: Define variables for each jug (e.g., X1, X2) representing water levels.
Step 2: Define domains for each variable (possible water levels from 0 to the jug's capacity).
Step 3: Establish constraints: capacity limits, valid actions (fill, transfer, empty), and the goal state (target water level).
Step 4: Implement a backtracking search to explore all possible states and actions.
Step 5: Apply forward checking to prune invalid states during the search.
Step 6: Track visited states to prevent cycles and redundant searches.
Step 7: Backtrack, and stop when the goal state (target water level) is reached.
Implementation:
from collections import defaultdict

jug1, jug2, aim = 4, 3, 1
visited = defaultdict(lambda: False)

def water_jug_solver(x, y):
    # (x, y) is the current amount of water in jug1 and jug2
    if x == aim or y == aim:
        print(f"Reached goal state: ({x}, {y})")
        return True
    if visited[(x, y)]:
        return False
    visited[(x, y)] = True
    print(f"Exploring state: ({x}, {y})")
    return (
        water_jug_solver(0, y) or                                        # empty jug1
        water_jug_solver(x, 0) or                                        # empty jug2
        water_jug_solver(jug1, y) or                                     # fill jug1
        water_jug_solver(x, jug2) or                                     # fill jug2
        water_jug_solver(x - min(x, jug2 - y), y + min(x, jug2 - y)) or  # pour jug1 -> jug2
        water_jug_solver(x + min(y, jug1 - x), y - min(y, jug1 - x))     # pour jug2 -> jug1
    )

water_jug_solver(0, 0)
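The recursive solver above prints states as it explores but does not recover the sequence of actions. An alternative sketch using BFS (same capacities and target assumed) returns the shortest sequence of (jug1, jug2) states leading to the goal:

from collections import deque

def water_jug_bfs(cap1, cap2, target):
    # BFS over (x, y) states; parent links let us rebuild the shortest path
    start, parent = (0, 0), {(0, 0): None}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        if x == target or y == target:
            path, state = [], (x, y)
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        pour12 = min(x, cap2 - y)  # amount poured jug1 -> jug2
        pour21 = min(y, cap1 - x)  # amount poured jug2 -> jug1
        for nxt in [(0, y), (x, 0), (cap1, y), (x, cap2),
                    (x - pour12, y + pour12), (x + pour21, y - pour21)]:
            if nxt not in parent:
                parent[nxt] = (x, y)
                queue.append(nxt)
    return None

print(water_jug_bfs(4, 3, 1))  # [(0, 0), (4, 0), (1, 3)]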
Ex No: 05
Date:
TAKE A SAMPLE DATASET AND APPLY SUITABLE PRE-PROCESSING TECHNIQUES
Aim:
To take a sample dataset and perform preprocessing steps such as handling missing values, encoding features, scaling data, and splitting into training and testing sets.
Algorithm:
Step 1: Input Dataset
• Read the CSV file into a DataFrame.
Step 2: Explore and Inspect Data
• View first few rows, summary statistics, and basic info.
Step 3: Clean Data
• Drop non-informative columns ('id', 'Unnamed: 32').
• Check for missing values and duplicates.
Step 4: Analyze Data
• Group by 'diagnosis' and compute mean features.
Step 5: Encode Labels
• Apply label encoding: Malignant = 1, Benign = 0.
Step 6: Scale Features
• Standardize feature values using StandardScaler.
Implementation:
import pandas as pd
import matplotlib.pyplot as plt

# Read the dataset
data = pd.read_csv("/content/data.csv")
data.head(5)

# Generate summary statistics for numerical columns in the dataset
data.describe()

# Display basic information about the dataset including column types and non-null counts
data.info()

# Drop non-informative columns: 'id' and 'Unnamed: 32' (which may be empty or irrelevant)
data = data.drop([col for col in ['id', 'Unnamed: 32'] if col in data.columns], axis=1)

# Get the count of each category in the 'diagnosis' column (Malignant/M and Benign/B)
data["diagnosis"].value_counts()

# Check for missing values in each column
data.isnull().sum()

# Check for duplicate rows in the dataset
data.duplicated().sum()

# Get the shape of the dataset (rows, columns)
data.shape

# Group the data by 'diagnosis' and calculate the mean of each feature
data.groupby("diagnosis").mean()

# Apply Label Encoding: Convert 'Malignant' (M) to 1 and 'Benign' (B) to 0
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
data['diagnosis'] = le.fit_transform(data['diagnosis'])
data['diagnosis']

# Standardize features: Scale the data using StandardScaler
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(data.drop('diagnosis', axis=1))
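The aim also calls for splitting into training and testing sets, which the cells above stop short of. A minimal continuation from X_scaled and the encoded labels:

# Split into training (80%) and testing (20%) sets
from sklearn.model_selection import train_test_split
y = data['diagnosis']
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)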
Ex No: 06
Date:
PERFORM DIMENSIONALITY REDUCTION USING PRINCIPAL COMPONENT ANALYSIS ON A LARGE DATASET
Aim:
To apply Principal Component Analysis (PCA) on the scaled breast cancer dataset features and reduce the dimensionality to 2 components for visualization and analysis.
Algorithm:
Step 1: Import PCA Module
• Import PCA from sklearn.decomposition.
Step 2: Initialize PCA
• Set the number of principal components (e.g., n_components=2).
Step 3: Apply PCA
• Fit PCA on the standardized dataset and transform it into a lower-dimensional space.
Step 4: Output Results
• Display the transformed feature set and verify the new shape (rows × 2 columns).
Implementation:
# Import PCA from sklearn
from sklearn.decomposition import PCA

# Apply PCA: specify the number of components to retain.
# Here we reduce to 2 components (for visualization).
pca = PCA(n_components=2)

# Fit PCA on the scaled data and transform it to the new lower-dimensional space
X_pca = pca.fit_transform(X_scaled)
print(X_pca)

# Output the shape of the transformed data (it should now have 2 features)
print(X_pca.shape)
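For analysis, it also helps to check how much variance the two components retain and to plot the projection. A short sketch, assuming X_pca from above and the encoded 'diagnosis' column from Ex No: 05 are still in scope:

# Proportion of the original variance captured by each principal component
print(pca.explained_variance_ratio_)

# 2-D scatter of the projection, coloured by diagnosis (0 = benign, 1 = malignant)
import matplotlib.pyplot as plt
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=data['diagnosis'], cmap='coolwarm', alpha=0.6)
plt.xlabel('PC 1')
plt.ylabel('PC 2')
plt.title('Data Projected onto 2 Principal Components')
plt.show()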
Ex No: 07
Date:
IMPLEMENT AND DEMONSTRATE THE WORKING OF NAIVE BAYES CLASSIFIER IN A REAL-LIFE APPLICATION
Aim:
To implement and demonstrate the working of the Gaussian Naive Bayes classifier for predicting breast cancer diagnosis using a real-life medical dataset.
Algorithm:
Step 1: Import Required Libraries
• Import libraries for data preprocessing, model training, and evaluation (train_test_split, GaussianNB, StandardScaler, metrics).
Step 2: Prepare Data
• Split the pre-processed dataset into features (X) and target (y).
• Standardize the feature values using StandardScaler.
Step 3: Split Data
• Split the dataset into training and testing sets (80% training, 20% testing).
Step 4: Train the Model
• Initialize the Gaussian Naive Bayes model.
• Fit the model on the training data.
Step 5: Make Predictions
• Predict the target values for the testing dataset.
Step 6: Evaluate the Model
• Calculate performance metrics: Accuracy, Precision, Recall, and F1 Score.
• Display the results.
Implementation:
# Import necessary libraries
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, classification_report)
from sklearn.preprocessing import StandardScaler

# Assume 'data' is already pre-processed and ready to use
# Split the data into features (X) and target (y)
X = data.drop('diagnosis', axis=1)
y = data['diagnosis']

# Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Train-test split (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42
)

# Initialize the Gaussian Naive Bayes classifier
nb = GaussianNB()

# Train the model
nb.fit(X_train, y_train)

# Make predictions on the test set
y_pred = nb.predict(X_test)

# Model performance: Accuracy, Precision, Recall, F1 Score
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

# Print the results
print(f"Accuracy : {accuracy:.4f}")
print(f"Precision : {precision:.4f}")
print(f"Recall : {recall:.4f}")
print(f"F1 Score : {f1:.4f}")
Ex No: 08
Date:
DEVELOP A PREDICTION SYSTEM USING LINEAR AND LOGISTIC REGRESSION
Aim:
To develop and compare prediction systems using Linear Regression and Logistic Regression for classifying or predicting outcomes on a real-world dataset.
Algorithm:
Linear Regression (for demonstration or numeric prediction)
Step 1: Import libraries and load preprocessed data.
Step 2: Split data into features (X) and target (y).
Step 3: Standardize feature values.
Step 4: Train-test split the dataset.
Step 5: Fit a Linear Regression model to the training data.
Step 6: Predict on the test data and evaluate using metrics like MSE/R².
Logistic Regression (for classification)
Step 1: Import LogisticRegression from sklearn.linear_model.
Step 2: Split standardized data into training and testing sets.
Step 3: Train the logistic regression model on the training set.
Step 4: Make predictions and evaluate performance using accuracy, precision, recall, F1 score, and confusion matrix.
Implementation:
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# Assume 'data' is already loaded and cleaned
X = data.drop('diagnosis', axis=1)
y = data['diagnosis']

# Standardize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42
)

# ---------- LINEAR REGRESSION ----------
linreg = LinearRegression()
linreg.fit(X_train, y_train)
y_pred_lin = linreg.predict(X_test)

# Convert continuous predictions to binary labels using a 0.5 threshold
y_pred_lin_class = [1 if val >= 0.5 else 0 for val in y_pred_lin]

# Metrics for Linear Regression (used as a classifier)
print("----- Linear Regression (as classifier) -----")
print(f"Accuracy : {accuracy_score(y_test, y_pred_lin_class):.4f}")
print(f"Precision : {precision_score(y_test, y_pred_lin_class):.4f}")
print(f"Recall : {recall_score(y_test, y_pred_lin_class):.4f}")
print(f"F1 Score : {f1_score(y_test, y_pred_lin_class):.4f}")
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_lin_class))
# ---------- LOGISTIC REGRESSION ----------
logreg = LogisticRegression()
logreg.fit(X_train, y_train)
y_pred_log = logreg.predict(X_test)

# Metrics for Logistic Regression
print("\n----- Logistic Regression -----")
print(f"Accuracy : {accuracy_score(y_test, y_pred_log):.4f}")
print(f"Precision : {precision_score(y_test, y_pred_log):.4f}")
print(f"Recall : {recall_score(y_test, y_pred_log):.4f}")
print(f"F1 Score : {f1_score(y_test, y_pred_log):.4f}")
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_log))

# Plot confusion matrix for logistic regression
cm = confusion_matrix(y_test, y_pred_log)
sns.heatmap(
    cm,
    annot=True,
    fmt='d',
    cmap='YlGnBu',
    xticklabels=['Benign', 'Malignant'],
    yticklabels=['Benign', 'Malignant']
)
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix - Logistic Regression')
plt.show()
Ex No: 9
Date:
DEVELOP A CLASSIFIER USING AN ARTIFICIAL NEURAL NETWORK FOR ANY ONLINE EXPERT SYSTEM
Aim:
To build a classifier using an Artificial Neural Network (ANN) to predict customer churn as part of an online expert system.
Algorithm:
Step 1: Import necessary libraries (pandas, numpy, tensorflow, etc.).
Step 2: Load the dataset (Churn_Modelling.csv) using pandas.read_csv().
Step 3: Preprocess the data: drop unnecessary columns (RowNumber, CustomerId, Surname) and encode categorical variables (Gender and Geography).
Step 4: Split the dataset into training and test sets (e.g., 80/20 split).
Step 5: Scale features using StandardScaler for normalization.
Step 6: Initialize the ANN using Sequential().
Step 7: Add hidden layers with the 'relu' activation function.
Step 8: Add an output layer with 'sigmoid' activation (binary classification).
Step 9: Compile the model with the adam optimizer and binary_crossentropy loss.
Step 10: Train the model using fit() with the defined epochs and batch size.
Step 11: Evaluate the model and make predictions.
Step 12: Save the trained ANN model to a file (e.g., .h5 format).
Implementation:
# Step 1: Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Step 2: Import the dataset
dataset = pd.read_csv('/content/Churn_Modelling.csv')
print(dataset.head())

# Grouping customers based on Geography and counting their numbers
country_counts = dataset.groupby('Geography').size().reset_index(name='Count')
print(country_counts)

dataset.info()

# Number of rows
shape_no_row = dataset.shape
shape_no_row[0]

# Task 1: Generating matrix of features (X) — all independent variables
X = dataset.iloc[:, 3:-1].values
print(X)

# Generating dependent variable vector (Y)
Y = dataset.iloc[:, -1].values
print(Y)

# Task 3: Feature Engineering
# 1. Encoding categorical variable: Gender
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X[:, 2] = le.fit_transform(X[:, 2])
print(X[:, 2])

# 2. Encoding categorical variable: Country (Geography)
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(
    transformers=[('encoder', OneHotEncoder(), [1])], remainder='passthrough'
)
X = ct.fit_transform(X)
print(X[:5])

# Task 4: Creating Training and Testing Data
# 1. Splitting dataset into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
print("X_train shape:", X_train.shape)
print("X_test shape:", X_test.shape)
print("y_train shape:", y_train.shape)
print("y_test shape:", y_test.shape)

# 2. Performing feature scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
print("X_train after scaling:\n", X_train[:5])
print("X_test after scaling:\n", X_test[:5])

# Task 5: Building an Artificial Neural Network (ANN)
# 1. Initializing the ANN
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
ann = Sequential()
# (summary() is deferred until layers are added: an empty Sequential model cannot be summarized)

# 2. Creating hidden layers
from tensorflow.keras.layers import Dense
# First hidden layer with 6 neurons (adjustable)
ann.add(Dense(units=6, activation='relu', input_dim=X_train.shape[1]))
# Second hidden layer with 6 neurons
ann.add(Dense(units=6, activation='relu'))
ann.summary()

# 3. Creating the output layer with 1 neuron and sigmoid activation
ann.add(Dense(units=1, activation='sigmoid'))
ann.summary()

# 4. Compiling the ANN
ann.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# 5. Fitting the ANN to the training data
ann.fit(X_train, y_train, batch_size=32, epochs=100)
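Steps 11 and 12 of the algorithm (evaluating and saving the ANN) are not shown in the remaining cells, which switch to separate regression examples. A minimal sketch continuing from the fitted ann (the file name is an assumption):

# Step 11: Evaluate the ANN on the held-out test data
loss, accuracy = ann.evaluate(X_test, y_test)
print(f"Test accuracy: {accuracy:.4f}")

# Threshold the churn probabilities at 0.5 to get class labels
y_pred_ann = (ann.predict(X_test) > 0.5).astype(int)

# Step 12: Save the trained model in HDF5 format
ann.save('churn_ann_model.h5')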
# Task 6: Making Predictions with the Trained ANN
# 1. Predicting output for a single data point (illustrated here with Linear Regression)
from sklearn.linear_model import LinearRegression
import numpy as np

# Sample training data
X_train_lr = np.array([[1], [2], [3], [4], [5]])  # Example features (independent variable)
y_train_lr = np.array([1, 2, 3, 4, 5])            # Example labels (dependent variable)

# Create and train the model
model = LinearRegression()
model.fit(X_train_lr, y_train_lr)

# Single data point for prediction
X_new = np.array([[6]])  # New input for which we want to predict the output

# Predict the output for the new data point
prediction = model.predict(X_new)

# Print the prediction
print(f"Predicted output for the data point {X_new[0][0]}: {prediction[0]}")
# 2. Predicting output for multiple data points (illustrated here with Logistic Regression)
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.linear_model import LogisticRegression

# Load dataset
df = pd.read_csv('Churn_Modelling.csv')

# Drop irrelevant columns
df = df.drop(['RowNumber', 'CustomerId', 'Surname'], axis=1)

# Encode categorical features
le_geo = LabelEncoder()
le_gender = LabelEncoder()
df['Geography'] = le_geo.fit_transform(df['Geography'])
df['Gender'] = le_gender.fit_transform(df['Gender'])

# Define features and label
X = df.drop('Exited', axis=1)
y = df['Exited']

# Scale the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Split into training and testing data
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Train a Logistic Regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Define new multiple customer inputs (10 features each)
new_data = pd.DataFrame([
    [600, le_geo.transform(['France'])[0], le_gender.transform(['Male'])[0], 40, 3, 60000, 1, 1, 1, 50000],
    [750, le_geo.transform(['Germany'])[0], le_gender.transform(['Female'])[0], 50, 5, 100000, 2, 0, 0, 90000],
    [580, le_geo.transform(['Spain'])[0], le_gender.transform(['Male'])[0], 37, 2, 20000, 1, 1, 0, 40000],
    [820, le_geo.transform(['France'])[0], le_gender.transform(['Female'])[0], 30, 4, 85000, 2, 1, 1, 110000]
], columns=X.columns)

# Scale new inputs
new_data_scaled = scaler.transform(new_data)

# Predict churn
predictions = model.predict(new_data_scaled)

# Convert predictions to True/False strings
predicted_labels = ['True' if p == 1 else 'False' for p in predictions]
print(predicted_labels)
Ex No: 10
Date:
IMPLEMENT THE K-MEANS CLUSTERING ALGORITHM FOR SEGMENTING INPUTS OF A BUSINESS MODEL
Aim:
To implement the K-Means clustering algorithm for segmenting customer inputs in a business model based on features like age, income, and spending score.
Algorithm:
Step 1: Import necessary libraries (pandas, matplotlib, seaborn, sklearn).
Step 2: Load the dataset (Customers_data.csv) using pandas.read_csv().
Step 3: Preprocess the data: rename columns for consistency, then check for and handle missing values.
Step 4: Perform data analysis and visualization: use correlation heatmaps and scatter plots; visualize data distributions and pairwise relationships.
Step 5: Select features for clustering (e.g., Age, Annual Income, Spending Score).
Step 6: Determine the optimal number of clusters using the Elbow Method (plot WCSS vs K).
Step 7: Apply K-Means with the selected K (e.g., K = 5).
Step 8: Assign cluster labels to the data using fit_predict().
Step 9: Visualize the clustered data using scatter plots with different colors for each cluster.
Step 10: Analyze and interpret clusters for business segmentation insights.
Implementation:
# 1. Import the Dataset
import pandas as pd
data = pd.read_csv('/content/Customers_data.csv')
data.head()

# 2. Find Metadata
data.info()
data.describe()
data.shape
data.columns

# 3. Data Preprocessing
# Rename columns for easier access in code
data.rename(columns={
    'Annual Income (k$)': 'Annual_Income',
    'Spending Score (1-100)': 'Spending_Score'
}, inplace=True)

# Check for missing values
print("\nMissing Values:")
print(data.isnull().sum())

# Fill missing numeric values with column mean (if any)
data.fillna(data.mean(numeric_only=True), inplace=True)
# Task 1 – Data Analysis & Visualization
# 1. Find the correlation
correlation_matrix = data.corr(numeric_only=True)
print("Correlation Matrix:")
print(correlation_matrix)

# Visualize the correlation matrix with a heatmap
import seaborn as sns
import matplotlib.pyplot as plt
plt.figure(figsize=(8, 6))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()

# 2. Draw the pair plot
sns.pairplot(data)
plt.suptitle("Pair Plot of Features", y=1.02)
plt.show()

# 3. Pearson, Spearman, Kendall correlations
data_numeric = data.select_dtypes(include=['number'])
print("Pearson Correlation:\n\n", data_numeric.corr(method='pearson'))
print("\n\n")
print("Spearman Correlation:\n\n", data_numeric.corr(method='spearman'))
print("\n\n")
print("Kendall Correlation:\n\n", data_numeric.corr(method='kendall'))
print("\n\n")

# 4. Draw "Age vs Annual Income" and "Age vs Spending Score" graphs
# Age vs Annual Income
plt.figure(figsize=(6, 4))
sns.scatterplot(x='Age', y='Annual_Income', data=data)
plt.title('Age vs Annual Income')
plt.xlabel('Age')
plt.ylabel('Annual Income (k$)')
plt.grid(True)
plt.show()

# Age vs Spending Score
plt.figure(figsize=(6, 4))
sns.scatterplot(x='Age', y='Spending_Score', data=data)
plt.title('Age vs Spending Score')
plt.xlabel('Age')
plt.ylabel('Spending Score (1-100)')
plt.grid(True)
plt.show()
# Task 2
# 1. Key difference between df.loc and df.iloc:
#    df.loc selects by label (row/column names), df.iloc selects by integer position.

# 2. Use df.loc to get the Annual Income and Spending Score
X_loc = data.loc[:, ['Annual_Income', 'Spending_Score']]
print("X_loc =")
print(X_loc)

# 3. Use df.iloc to get the Annual Income and Spending Score
# (the integer positions must match the column order of the CSV)
X_iloc = data.iloc[:, [1, 2]]
print("X_iloc =")
print(X_iloc)
# Task 3
# 1. Distribution of Annual Income
plt.figure(figsize=(8, 6))
sns.histplot(data['Annual_Income'], kde=True, color='blue', bins=20)
plt.title('Distribution of Annual Income')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()

# 2. Distribution of Age
plt.figure(figsize=(8, 6))
sns.histplot(data['Age'], kde=True, color='green', bins=20)
plt.title('Distribution of Age')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()

# 3. Distribution of Spending Score
plt.figure(figsize=(8, 6))
sns.histplot(data['Spending_Score'], kde=True, color='red', bins=20)
plt.title('Distribution of Spending Score')
plt.xlabel('Spending Score (1-100)')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()

# 4. Number of Female and Male customers (and plot)
gender_count = data['Gender'].value_counts()
print("Number of Female and Male:")
print(gender_count)

gender_count.plot(kind='bar', color=['lightblue', 'lightcoral'])
plt.title('Number of Female and Male')
plt.ylabel('Count')
plt.xlabel('Gender')
plt.xticks(rotation=0)
plt.show()
# Task 4
# 1. Annual Income vs Spending Score (clustering visualization)
plt.figure(figsize=(8, 6))
sns.scatterplot(x='Annual_Income', y='Spending_Score', data=data, hue='Gender', palette='coolwarm', s=100, alpha=0.7)
plt.title('Annual Income vs Spending Score')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.grid(True)
plt.show()

# 2. Annual Income vs Age (clustering visualization)
plt.figure(figsize=(8, 6))
sns.scatterplot(x='Annual_Income', y='Age', data=data, hue='Gender', palette='coolwarm', s=100, alpha=0.7)
plt.title('Annual Income vs Age')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Age')
plt.grid(True)
plt.show()

# 3. Age vs Spending Score (clustering visualization)
plt.figure(figsize=(8, 6))
sns.scatterplot(x='Age', y='Spending_Score', data=data, hue='Gender', palette='coolwarm', s=100, alpha=0.7)
plt.title('Age vs Spending Score')
plt.xlabel('Age')
plt.ylabel('Spending Score (1-100)')
plt.grid(True)
plt.show()
# Task 5
# WCSS (Within-Cluster Sum of Squares) in K-Means
from sklearn.cluster import KMeans
X = data[['Annual_Income', 'Spending_Score']]  # Features for clustering
wcss = []
for k in range(1, 11):
    kmeans = KMeans(n_clusters=k, init='k-means++', random_state=42)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)

plt.figure(figsize=(10, 6))
plt.plot(range(1, 11), wcss, marker='o')
plt.title('WCSS vs Number of Clusters (Elbow Method)')
plt.xlabel('Number of Clusters (K)')
plt.ylabel('WCSS')
plt.grid(True)
plt.xticks(range(1, 11))
plt.show()
# Task 6
# Apply K-Means clustering with the selected K = 5
X = data[['Age', 'Annual_Income', 'Spending_Score']]  # Features for clustering
kmeans = KMeans(n_clusters=5, init='k-means++', random_state=42)
data['Cluster'] = kmeans.fit_predict(X)
data.head()
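Step 9 calls for visualizing the clustered data with a different color per cluster, which the cells above stop short of. A minimal continuation from the fitted model:

# Visualize the 5 segments on the Annual Income vs Spending Score plane
plt.figure(figsize=(8, 6))
sns.scatterplot(x='Annual_Income', y='Spending_Score', data=data,
                hue='Cluster', palette='tab10', s=80)
plt.title('Customer Segments (K-Means, K = 5)')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.grid(True)
plt.show()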