Ex No: 01
Date: 19/02/2025
Create a sample dataset and explore statistical operations using pandas and visualize the results through plots
Aim:
To perform exploratory data analysis (EDA) on a user-defined sample dataset, using statistical operations and visualizations to identify patterns, trends, and insights.
Procedure:
Step 1: Open a new notebook in Google Colab.
Step 2: Import all the necessary modules.
Step 3: Create a dataset using the NumPy random number generator and download it using the files module.
Step 4: Load the dataset and convert it to a DataFrame using pandas.
Step 5: Compute the required statistical metrics using the DataFrame and the imported packages.
Step 6: Additionally, use seaborn and matplotlib to plot graphs for the dataset.
Implementation:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from google.colab import files

# Upload dataset manually in Google Colab
uploaded = files.upload()
file_path = "sales_data (1).csv"  # Adjusted path for Google Colab
df = pd.read_csv(file_path)

# Display the top 5 rows of the DataFrame
print(df.head())
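The lines below are a minimal sketch of the statistical operations and plots the aim calls for; the exact column names depend on the generated sales dataset, so numeric columns are selected generically rather than by name.

# Summary statistics (count, mean, std, min, quartiles, max) for numeric columns
print(df.describe())

# Individual statistics on the numeric columns
numeric_cols = df.select_dtypes(include=np.number)
print("Mean:\n", numeric_cols.mean())
print("Median:\n", numeric_cols.median())
print("Correlation matrix:\n", numeric_cols.corr())

# Visualize the distributions and pairwise relationships
numeric_cols.hist(bins=20, figsize=(10, 6))
plt.tight_layout()
plt.show()

sns.heatmap(numeric_cols.corr(), annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()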


Ex No: 02
Date: 06/03/2025
IMPLEMENT UNINFORMED SEARCH STRATEGIES FOR ANY REAL-WORLD PROBLEM
Aim:
To develop a User vs AI Tic-Tac-Toe game in which the AI uses Breadth-First Search (BFS) to determine optimal moves.
Algorithm:
Step 1: Initialize the Board
1. Create a 3x3 board with all cells initially set to empty ('-').
2. Print the initial empty board to show the player.
Step 2: Display the Board
1. Print the current state of the board after each move.
Step 3: Main Game Loop (alternating turns between the human player and the AI)
1. Set the current player to 'X' (human player) initially.
2. Repeat until there is a winner or the board is full:
   a. If it is the human player's turn (current player == 'X'):
      i. Prompt the human player to enter a row and column (0, 1, 2) for their move.
      ii. Check if the selected cell is empty:
         - If the cell is not empty, prompt the user again.
         - If the cell is empty, make the move by placing 'X' in the selected spot.
   b. If it is the AI's turn (current player == 'O'):
      i. The AI chooses a move based on the game state using the BFS algorithm:
         - Explore all valid moves by simulating each one.
         - For each move, evaluate whether it leads to a win, loss, or draw.
         - Select the move that maximizes the AI's chances of winning and minimizes the human player's chances.
      ii. Make the move by placing 'O' in the selected spot.
Step 4: Check for a Winner
1. After every move, check if the current player has won the game:
   a. Check each row, column, and both diagonals to see if all cells contain the same player's symbol ('X' or 'O').
2. If a player has won, declare the winner and end the game.
Step 5: Check for a Draw
1. After each move, check if the game has ended in a draw:
   a. If there are no valid moves left and no winner, declare the game a draw.
Step 6: Switch Turns
1. If there is no winner or draw, switch the current player:
   a. If current player == 'X', switch to 'O' (AI's turn).
   b. If current player == 'O', switch to 'X' (human player's turn).
Step 7: End the Game
1. The game ends when:
   a. A player wins, or
   b. The game ends in a draw (no valid moves left and no winner).
2. Print the result ("Player X wins!", "Player O wins!", or "It's a draw!").
3. Exit the game.
Implementation:
from collections import deque

def print_board(board):
    print("-------------")
    for row in board:
        print("|", " | ".join(row), "|")
    print("-------------")

def check_winner(board, player):
    # Check rows
    for row in board:
        if all(cell == player for cell in row):
            return True
    # Check columns
    for col in range(3):
        if all(board[row][col] == player for row in range(3)):
            return True
    # Check diagonals
    if all(board[i][i] == player for i in range(3)) or all(board[i][2 - i] == player for i in range(3)):
        return True
    return False

def get_valid_moves(board):
    moves = []
    for i in range(3):
        for j in range(3):
            if board[i][j] == '-':
                moves.append((i, j))
    return moves

def make_move(board, move, player):
    board[move[0]][move[1]] = player

def bfs(board, player):
    # Breadth-first exploration of the game tree starting from 'board',
    # where 'player' is the side to move. Returns the score of the first
    # terminal state reached: +1 (O wins), -1 (X wins), 0 (draw).
    queue = deque([(board, player)])
    while queue:
        current_board, current_player = queue.popleft()
        if check_winner(current_board, 'X'):
            return -1
        elif check_winner(current_board, 'O'):
            return 1
        elif len(get_valid_moves(current_board)) == 0:
            return 0
        for move in get_valid_moves(current_board):
            next_board = [row[:] for row in current_board]
            make_move(next_board, move, current_player)  # the side to move plays
            queue.append((next_board, 'X' if current_player == 'O' else 'O'))

def main():
    board = [['-' for _ in range(3)] for _ in range(3)]
    print("Welcome to Tic Tac Toe!\nEnter row and column (0-2) to make your move.")
    print_board(board)
    current_player = 'X'
    while True:
        if current_player == 'X':
            row = int(input(f"Player {current_player}, enter row: "))
            col = int(input(f"Player {current_player}, enter column: "))
            if board[row][col] != '-':
                print("Invalid move! Try again.")
                continue
            make_move(board, (row, col), current_player)
        else:
            print(f"Player {current_player}'s turn (AI). Thinking...")
            best_move, best_score = None, -float('inf')
            for move in get_valid_moves(board):
                next_board = [row[:] for row in board]
                make_move(next_board, move, current_player)  # AI plays 'O'
                score = bfs(next_board, 'X')  # after the AI's move, 'X' moves next
                if score > best_score:
                    best_score, best_move = score, move
            make_move(board, best_move, current_player)
            print(f"AI (Player {current_player}) chooses {best_move}")
        print_board(board)
        if check_winner(board, current_player):
            print(f"Player {current_player} wins!")
            break
        elif len(get_valid_moves(board)) == 0:
            print("It's a draw!")
            break
        else:
            current_player = 'X' if current_player == 'O' else 'O'

if __name__ == "__main__":
    main()
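A quick, self-contained check (not part of the original listing) of the evaluation values: check_winner should recognize a completed row, and bfs should score a position where 'O' has already won as +1.

test_board = [['O', 'O', 'O'],
              ['X', 'X', '-'],
              ['-', '-', '-']]
print(check_winner(test_board, 'O'))  # True: top row is all 'O'
print(bfs(test_board, 'X'))           # 1: the position is already a win for 'O'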


Exp. No: 3(a)
Date:
FIND OPTIMAL SOLUTION FOR A GIVEN PROBLEM USING ANY LOCAL SEARCH ALGORITHM
Aim:
To use a genetic algorithm to find a conflict-free arrangement of queens on an 8x8 chessboard through selection, crossover, and mutation.
Algorithm:
Step 1: Input Initial Configuration: Prompt the user to enter the column position (0-7) for each row (optional).
Step 2: Initialize Parameters: Set population_size = 100, mutation_rate = 0.1, generations = 1000.
Step 3: Generate Initial Population: Create 100 individuals, each with 8 integers (column positions of queens).
Step 4: Fitness Function: Count conflicts (same column or diagonal) and calculate fitness as fitness = 28 - conflicts.
Step 5: Repeat for Each Generation:
a. Selection: Use roulette-wheel selection to choose parents.
b. Crossover: Choose a crossover point and create two children.
c. Mutation: Apply mutation with some probability to change queen positions.
d. Form New Population: Add children to the new population until the desired size is reached.
e. Check for Solution: If fitness = 28, return the solution and stop.
Step 6: Display Best Solution: Print the board with queens ("Q") and empty spaces ("."). Output the generation number, or the best solution found if the maximum number of generations is reached.
Implementation:
import random

def generate_population(size):
    population = []
    for _ in range(size):
        individual = [random.randint(0, 7) for _ in range(8)]
        population.append(individual)
    return population

def calculate_fitness(individual):
    conflicts = 0
    for i in range(8):
        for j in range(i + 1, 8):
            if individual[i] == individual[j] or abs(individual[i] - individual[j]) == abs(i - j):
                conflicts += 1
    return 28 - conflicts  # 28 = C(8,2) pairs, the maximum fitness (no conflicting pair)

def select_parents(population):
    # Roulette-wheel selection: fitter individuals are more likely to be chosen
    total_fitness = sum(calculate_fitness(individual) for individual in population)
    probabilities = [calculate_fitness(individual) / total_fitness for individual in population]
    parent1 = random.choices(population, weights=probabilities)[0]
    parent2 = random.choices(population, weights=probabilities)[0]
    return parent1, parent2

def crossover(parent1, parent2):
    crossover_point = random.randint(0, 7)
    child1 = parent1[:crossover_point] + parent2[crossover_point:]
    child2 = parent2[:crossover_point] + parent1[crossover_point:]
    return child1, child2

def mutate(individual, mutation_rate):
    for i in range(8):
        if random.random() < mutation_rate:
            individual[i] = random.randint(0, 7)
    return individual

def genetic_algorithm(population_size, mutation_rate, generations, initial_board=None):
    population = generate_population(population_size)
    if initial_board is not None:
        population[0] = initial_board  # seed the search with the user-entered configuration
    for gen in range(generations):
        new_population = []
        for _ in range(population_size // 2):
            parent1, parent2 = select_parents(population)
            child1, child2 = crossover(parent1, parent2)
            child1 = mutate(child1, mutation_rate)
            child2 = mutate(child2, mutation_rate)
            new_population.extend([child1, child2])
        population = new_population
        best_individual = max(population, key=calculate_fitness)
        if calculate_fitness(best_individual) == 28:
            return best_individual, gen
    best_individual = max(population, key=calculate_fitness)
    return best_individual, generations

def print_board(board):
    for i in range(8):
        for j in range(8):
            if board[i] == j:
                print("Q", end=" ")
            else:
                print(".", end=" ")
        print()
    print()

def main():
    print("Enter the initial positions of queens (0-7) for each row:")
    initial_board = []
    for i in range(8):
        position = int(input(f"Row {i}: "))
        initial_board.append(position)

    population_size = 100
    mutation_rate = 0.1
    generations = 1000

    solution, gen_count = genetic_algorithm(population_size, mutation_rate, generations, initial_board)
    print("Solution Found:")
    print_board(solution)
    print(f"Solution found in generation: {gen_count}")

if __name__ == "__main__":
    main()
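A quick check of the fitness function (not in the original listing): the classic solution [0, 4, 7, 5, 2, 6, 1, 3] has no attacking pair, so its fitness equals the maximum of 28.

assert calculate_fitness([0, 4, 7, 5, 2, 6, 1, 3]) == 28
print_board([0, 4, 7, 5, 2, 6, 1, 3])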


Exp. No: 3(b)
Date:
FIND OPTIMAL SOLUTION FOR A GIVEN PROBLEM USING ANY LOCAL SEARCH ALGORITHM
Aim:
To solve the 8 Queens Problem using the Hill Climbing Algorithm, which places eight queens on a chessboard such that no two queens attack each other, by iteratively moving towards better (lower-cost) configurations.
Algorithm:
Step 1: Generate a random initial board with 8 queens (one per row).
Step 2: Calculate the cost (number of attacking queen pairs).
Step 3: Generate all possible next boards by moving each queen to another column within its row.
Step 4: Select the next board with the lowest cost.
Step 5:
If the new board has a lower cost, move to it.
Else, restart with a new random board.
Step 6: Repeat Steps 2-5 until a board with cost = 0 is found.
Step 7: Display the final solution board and the number of steps or restarts.
Implementation:
import random

def generate_board():
    # One queen per row; board[i] is the column of the queen in row i
    return [random.randint(0, 7) for _ in range(8)]

def calculate_cost(board):
    # Number of attacking pairs (same column or same diagonal)
    cost = 0
    for i in range(len(board)):
        for j in range(i + 1, len(board)):
            if board[i] == board[j] or abs(board[i] - board[j]) == abs(i - j):
                cost += 1
    return cost

def get_next_board(board):
    # Generate all neighbours (move each queen to another column in its row)
    # and return the cheapest one
    next_boards = []
    for i in range(8):
        for j in range(8):
            if j != board[i]:
                next_board = list(board)
                next_board[i] = j
                next_boards.append(next_board)
    next_boards.sort(key=calculate_cost)
    return next_boards[0], calculate_cost(next_boards[0])

def print_board(board):
    for i in range(8):
        for j in range(8):
            if board[i] == j:
                print("Q", end=" ")
            else:
                print(".", end=" ")
        print()
    print()

def hill_climbing():
    current_board = generate_board()
    current_cost = calculate_cost(current_board)
    while True:
        print("Current Board:")
        print_board(current_board)
        print("Cost:", current_cost)
        if current_cost == 0:
            print("Solution Found!")
            break
        next_board, next_cost = get_next_board(current_board)
        if next_cost >= current_cost:
            # No neighbour improves the cost: a local optimum, so restart
            print("Local optimum reached. Restarting...")
            current_board = generate_board()
            current_cost = calculate_cost(current_board)
        else:
            current_board = next_board
            current_cost = next_cost

def main():
    print("Solving 8 Queens Problem using Hill Climbing Algorithm:")
    hill_climbing()

if __name__ == "__main__":
    main()


Ex No: 04
Date:
PROPOSE AN AI SOLUTION FOR A GIVEN CONSTRAINT SATISFACTION PROBLEM
Aim:
To solve the Water Jug Problem using Constraint Satisfaction Problem (CSP) techniques by modeling the jugs and their constraints to find an exact measurement of water, demonstrating how CSP methods yield efficient solutions through search and pruning.
Procedure:
Step 1: Define variables for each jug (e.g., X1, X2) representing water levels.
Step 2: Define domains for each variable (possible water levels from 0 to the jug's capacity).
Step 3: Establish constraints: capacity limits, valid actions (fill, transfer, empty), and the goal state (target water level).
Step 4: Implement a backtracking search to explore all possible states and actions.
Step 5: Apply forward checking to prune invalid states during the search.
Step 6: Track visited states to prevent cycles and redundant searches.
Step 7: Backtrack and stop when the goal state (target water level) is reached.
Implementation:
from collections import defaultdict

jug1, jug2, aim = 4, 3, 1

visited = defaultdict(lambda: False)

def water_jug_solver(x, y):
    # (x, y) = current amounts of water in jug1 and jug2
    if x == aim or y == aim:
        print(f"Reached goal state: ({x}, {y})")
        return True

    if visited[(x, y)]:
        return False

    visited[(x, y)] = True
    print(f"Exploring state: ({x}, {y})")

    # Try every legal action: empty a jug, fill a jug, or pour one into the other
    return (
        water_jug_solver(0, y) or
        water_jug_solver(x, 0) or
        water_jug_solver(jug1, y) or
        water_jug_solver(x, jug2) or
        water_jug_solver(x - min(x, jug2 - y), y + min(x, jug2 - y)) or
        water_jug_solver(x + min(y, jug1 - x), y - min(y, jug1 - x))
    )

water_jug_solver(0, 0)
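The recursive solver above finds a goal state but not necessarily the shortest action sequence. As an illustrative sketch (not part of the original program, reusing the same jug1/jug2/aim values), a BFS over states with parent links returns a minimal path from (0, 0) to the goal:

from collections import deque

def water_jug_bfs(capacity1, capacity2, target):
    # BFS over (x, y) states; parent links let us reconstruct the path
    start = (0, 0)
    parent = {start: None}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        if x == target or y == target:
            path = []
            state = (x, y)
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]  # from (0, 0) to the goal
        successors = [
            (0, y), (x, 0), (capacity1, y), (x, capacity2),
            (x - min(x, capacity2 - y), y + min(x, capacity2 - y)),
            (x + min(y, capacity1 - x), y - min(y, capacity1 - x)),
        ]
        for nxt in successors:
            if nxt not in parent:
                parent[nxt] = (x, y)
                queue.append(nxt)
    return None

print(water_jug_bfs(jug1, jug2, aim))  # e.g., [(0, 0), (4, 0), (1, 3)]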


Ex No: 05
Date:
TAKE A SAMPLE DATASET AND APPLY SUITABLE PRE-PROCESSING TECHNIQUES
Aim:
To take a sample dataset and perform preprocessing steps such as handling missing values, encoding features, scaling data, and splitting into training and testing sets.
Algorithm:
Step 1: Input Dataset
Read the CSV file into a DataFrame.
Step 2: Explore and Inspect Data
View the first few rows, summary statistics, and basic info.
Step 3: Clean Data
Drop non-informative columns ('id', 'Unnamed: 32').
Check for missing values and duplicates.
Step 4: Analyze Data
Group by 'diagnosis' and compute mean features.
Step 5: Encode Labels
Apply label encoding: Malignant = 1, Benign = 0.
Step 6: Scale Features
Standardize feature values using StandardScaler.
Implementation:
import pandas as pd
import matplotlib.pyplot as plt

# Read the dataset
data = pd.read_csv("/content/data.csv")
data.head(5)

# Generate summary statistics for numerical columns in the dataset
data.describe()

# Display basic information about the dataset including column types and non-null counts
data.info()

# Drop non-informative columns: 'id' and 'Unnamed: 32' (which may be empty or irrelevant)
data = data.drop([col for col in ['id', 'Unnamed: 32'] if col in data.columns], axis=1)

# Get the count of each category in the 'diagnosis' column (Malignant/M and Benign/B)
data["diagnosis"].value_counts()

# Check for missing values in each column
data.isnull().sum()

# Check for duplicate rows in the dataset
data.duplicated().sum()

# Get the shape of the dataset (rows, columns)
data.shape

# Group the data by 'diagnosis' and calculate the mean of each feature
data.groupby("diagnosis").mean()

# Apply Label Encoding: Convert 'Malignant' (M) to 1 and 'Benign' (B) to 0
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
data['diagnosis'] = le.fit_transform(data['diagnosis'])
data['diagnosis']

# Standardize features: Scale the data using StandardScaler
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(data.drop('diagnosis', axis=1))
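The aim also calls for splitting into training and testing sets; a minimal sketch follows (an 80/20 split is assumed here, matching the later experiments):

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, data['diagnosis'], test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)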


Ex No: 06
Date:
PERFORM DIMENSIONALITY REDUCTION USING PRINCIPAL COMPONENT ANALYSIS ON A LARGE DATASET
Aim:
To apply Principal Component Analysis (PCA) on the scaled breast cancer dataset features and reduce the dimensionality to 2 components for visualization and analysis.
Algorithm:
Step 1: Import PCA Module
Import PCA from sklearn.decomposition.
Step 2: Initialize PCA
Set the number of principal components (e.g., n_components=2).
Step 3: Apply PCA
Fit PCA on the standardized dataset and transform it into a lower-dimensional space.
Step 4: Output Results
Display the transformed feature set and verify the new shape (rows × 2 columns).
Implementation:
# Import PCA from sklearn
from sklearn.decomposition import PCA

# Apply PCA: specify the number of components to retain
# Here we reduce to 2 components (for visualization)
pca = PCA(n_components=2)

# Fit PCA on the scaled data and transform it to the new lower-dimensional space
X_pca = pca.fit_transform(X_scaled)
print(X_pca)

# Output the shape of the transformed data (it should now have 2 features)
print(X_pca.shape)
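For the analysis part of the aim, a short addition (assuming the encoded 'diagnosis' column from Ex No: 05 is still in scope): explained_variance_ratio_ reports how much variance each component captures, and the two components can be plotted directly.

import matplotlib.pyplot as plt

# Proportion of variance captured by each principal component
print(pca.explained_variance_ratio_)

# 2-D scatter in the principal-component space, colored by diagnosis
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=data['diagnosis'], cmap='coolwarm', alpha=0.6)
plt.xlabel('PC 1')
plt.ylabel('PC 2')
plt.title('PCA of Breast Cancer Features')
plt.show()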


Ex No: 07
Date:
IMPLEMENT AND DEMONSTRATE THE WORKING OF NAIVE BAYES CLASSIFIER IN A REAL-LIFE APPLICATION
Aim:
To implement and demonstrate the working of the Gaussian Naive Bayes classifier for predicting breast cancer diagnosis using a real-life medical dataset.
Algorithm:
Step 1: Import Required Libraries
Import libraries for data preprocessing, model training, and evaluation (train_test_split, GaussianNB, StandardScaler, metrics).
Step 2: Prepare Data
Split the pre-processed dataset into features (X) and target (y).
Standardize the feature values using StandardScaler.
Step 3: Split Data
Split the dataset into training and testing sets (80% training, 20% testing).
Step 4: Train the Model
Initialize the Gaussian Naive Bayes model.
Fit the model on the training data.
Step 5: Make Predictions
Predict the target values for the testing dataset.
Step 6: Evaluate the Model
Calculate performance metrics: Accuracy, Precision, Recall, and F1 Score.
Display the results.
Implementation:
# Import necessary libraries
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Assume 'data' is already pre-processed and ready to use

# Split the data into features (X) and target (y)
X = data.drop('diagnosis', axis=1)
y = data['diagnosis']

# Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Train-test split (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42
)

# Initialize the Gaussian Naive Bayes classifier
nb = GaussianNB()

# Train the model
nb.fit(X_train, y_train)

# Make predictions on the test set
y_pred = nb.predict(X_test)

# Model performance: Accuracy, Precision, Recall, F1 Score
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

# Print the results
print(f"Accuracy : {accuracy:.4f}")
print(f"Precision : {precision:.4f}")
print(f"Recall : {recall:.4f}")
print(f"F1 Score : {f1:.4f}")


Ex No: 08
Date:
DEVELOP A PREDICTION SYSTEM USING LINEAR AND LOGISTIC REGRESSION
Aim:
To develop and compare prediction systems using Linear Regression and Logistic Regression for classifying or predicting outcomes on a real-world dataset.
Algorithm:
Linear Regression (for demonstration or numeric prediction)
Step 1: Import libraries and load preprocessed data.
Step 2: Split data into features (X) and target (y).
Step 3: Standardize feature values.
Step 4: Train-test split the dataset.
Step 5: Fit a Linear Regression model to the training data.
Step 6: Predict on the test data and evaluate using metrics like MSE/R².
Logistic Regression (for classification)
Step 1: Import LogisticRegression from sklearn.linear_model.
Step 2: Split standardized data into training and testing sets.
Step 3: Train the logistic regression model on the training set.
Step 4: Make predictions and evaluate performance using accuracy, precision, recall, F1 score, and confusion matrix.
Implementation:
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# Assume your 'data' is already loaded and cleaned
X = data.drop('diagnosis', axis=1)
y = data['diagnosis']

# Standardize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42
)

# ---------- LINEAR REGRESSION ----------
linreg = LinearRegression()
linreg.fit(X_train, y_train)
y_pred_lin = linreg.predict(X_test)

# Convert continuous predictions to binary classes using threshold 0.5
y_pred_lin_class = [1 if val >= 0.5 else 0 for val in y_pred_lin]

# Metrics for Linear Regression (used as a classifier)
print("----- Linear Regression (as classifier) -----")
print(f"Accuracy : {accuracy_score(y_test, y_pred_lin_class):.4f}")
print(f"Precision : {precision_score(y_test, y_pred_lin_class):.4f}")
print(f"Recall : {recall_score(y_test, y_pred_lin_class):.4f}")
print(f"F1 Score : {f1_score(y_test, y_pred_lin_class):.4f}")
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_lin_class))

# ---------- LOGISTIC REGRESSION ----------
logreg = LogisticRegression()
logreg.fit(X_train, y_train)
y_pred_log = logreg.predict(X_test)

# Metrics for Logistic Regression
print("\n----- Logistic Regression -----")
print(f"Accuracy : {accuracy_score(y_test, y_pred_log):.4f}")
print(f"Precision : {precision_score(y_test, y_pred_log):.4f}")
print(f"Recall : {recall_score(y_test, y_pred_log):.4f}")
print(f"F1 Score : {f1_score(y_test, y_pred_log):.4f}")
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_log))

# Plot confusion matrix for logistic regression
cm = confusion_matrix(y_test, y_pred_log)
sns.heatmap(
    cm,
    annot=True,
    fmt='d',
    cmap='YlGnBu',
    xticklabels=['Benign', 'Malignant'],
    yticklabels=['Benign', 'Malignant']
)
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix - Logistic Regression')
plt.show()


Ex No: 09
Date:
DEVELOP A CLASSIFIER USING ARTIFICIAL NEURAL NETWORK FOR ANY ONLINE EXPERT SYSTEM
Aim:
To build a classifier using an Artificial Neural Network (ANN) to predict customer churn as part of an online expert system.
Algorithm:
Step 1: Import necessary libraries (pandas, numpy, tensorflow, etc.).
Step 2: Load the dataset (Churn_Modelling.csv) using pandas.read_csv().
Step 3: Preprocess the data: drop unnecessary columns (RowNumber, CustomerId, Surname) and encode categorical variables (Gender and Geography).
Step 4: Split the dataset into training and test sets (e.g., 80/20 split).
Step 5: Scale features using StandardScaler for normalization.
Step 6: Initialize the ANN using Sequential().
Step 7: Add hidden layers with the 'relu' activation function.
Step 8: Add an output layer with 'sigmoid' activation (binary classification).
Step 9: Compile the model with the adam optimizer and binary_crossentropy loss.
Step 10: Train the model using fit() with the defined epochs and batch size.
Step 11: Evaluate the model and make predictions.
Step 12: Save the trained ANN model to a file (e.g., .h5 format).
Implementation:

# Step 1: Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Step 2: Import the dataset
dataset = pd.read_csv('/content/Churn_Modelling.csv')
print(dataset.head())

# Grouping customers based on Geography and counting their numbers
country_counts = dataset.groupby('Geography').size().reset_index(name='Count')
print(country_counts)

dataset.info()

# Number of rows
shape_no_row = dataset.shape
shape_no_row[0]

# Task 1: Generating Matrix of Features (X) — all independent variables
X = dataset.iloc[:, 3:-1].values
print(X)

# Generating Dependent Variable Vector (Y)
Y = dataset.iloc[:, -1].values
print(Y)

# Task 3: Feature Engineering
# 1. Encoding Categorical Variable: Gender
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X[:, 2] = le.fit_transform(X[:, 2])
print(X[:, 2])

# 2. Encoding Categorical Variable: Country (Geography)
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(
    transformers=[('encoder', OneHotEncoder(), [1])], remainder='passthrough'
)
X = ct.fit_transform(X)
print(X[:5])

# Task 4: Creating Training and Testing Data
# 1. Splitting dataset into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
print("X_train shape:", X_train.shape)
print("X_test shape:", X_test.shape)
print("y_train shape:", y_train.shape)
print("y_test shape:", y_test.shape)

# 2. Performing feature scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
print("X_train after scaling:\n", X_train[:5])
print("X_test after scaling:\n", X_test[:5])

# Task 5: Building an Artificial Neural Network (ANN)
# 1. Initializing the Artificial Neural Network
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
ann = Sequential()
# Note: ann.summary() cannot be called yet; with no layers added,
# Keras would raise "This model has not yet been built"

# 2. Creating hidden layers
from tensorflow.keras.layers import Dense
# First hidden layer with 6 neurons (adjustable)
ann.add(Dense(units=6, activation='relu', input_dim=X_train.shape[1]))
# Second hidden layer with 6 neurons
ann.add(Dense(units=6, activation='relu'))
ann.summary()

# 3. Creating the output layer
# Output layer with 1 neuron and sigmoid activation
ann.add(Dense(units=1, activation='sigmoid'))
ann.summary()

# 4. Compiling the Artificial Neural Network
ann.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# 5. Fitting the Artificial Neural Network to the training data
ann.fit(X_train, y_train, batch_size=32, epochs=100)
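
# Added sketch for Steps 11-12 of the algorithm (evaluate, predict, save),
# which the listing stops short of; these are standard Keras calls, and the
# file name is illustrative.
loss, accuracy = ann.evaluate(X_test, y_test)
print(f"Test loss: {loss:.4f}, Test accuracy: {accuracy:.4f}")
y_pred_ann = (ann.predict(X_test) > 0.5).astype(int)
ann.save('churn_ann_model.h5')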

# Task 6: Making Predictions with Trained Models
# 1. Predicting output for a single data point (illustrated with Linear Regression)
from sklearn.linear_model import LinearRegression
import numpy as np
# Sample training data
X_train_lr = np.array([[1], [2], [3], [4], [5]])  # Example features (independent variable)
y_train_lr = np.array([1, 2, 3, 4, 5])  # Example labels (dependent variable)
# Create and train the model
model = LinearRegression()
model.fit(X_train_lr, y_train_lr)
# Single data point for prediction
X_new = np.array([[6]])  # New input data point (for which we want to predict the output)
# Predict the output for the new data point
prediction = model.predict(X_new)
# Print the prediction
print(f"Predicted output for the data point {X_new[0][0]}: {prediction[0]}")

# 2. Predicting output for multiple data points (illustrated with Logistic Regression)
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.linear_model import LogisticRegression
# Load dataset
df = pd.read_csv('Churn_Modelling.csv')
# Drop irrelevant columns
df = df.drop(['RowNumber', 'CustomerId', 'Surname'], axis=1)
# Encode categorical features
le_geo = LabelEncoder()
le_gender = LabelEncoder()
df['Geography'] = le_geo.fit_transform(df['Geography'])
df['Gender'] = le_gender.fit_transform(df['Gender'])
# Define features and label
X = df.drop('Exited', axis=1)
y = df['Exited']
# Scale the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Split into training and testing data
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
# Train a Logistic Regression model
model = LogisticRegression()
model.fit(X_train, y_train)
# Define new multiple customer inputs (10 features each)
new_data = pd.DataFrame([
    [600, le_geo.transform(['France'])[0], le_gender.transform(['Male'])[0], 40, 3, 60000, 1, 1, 1, 50000],
    [750, le_geo.transform(['Germany'])[0], le_gender.transform(['Female'])[0], 50, 5, 100000, 2, 0, 0, 90000],
    [580, le_geo.transform(['Spain'])[0], le_gender.transform(['Male'])[0], 37, 2, 20000, 1, 1, 0, 40000],
    [820, le_geo.transform(['France'])[0], le_gender.transform(['Female'])[0], 30, 4, 85000, 2, 1, 1, 110000]
], columns=X.columns)
# Scale new inputs
new_data_scaled = scaler.transform(new_data)
# Predict churn
predictions = model.predict(new_data_scaled)
# Convert predictions to True/False
predicted_labels = ['True' if p == 1 else 'False' for p in predictions]
print(predicted_labels)


Ex No: 10
Date:
IMPLEMENT K-MEANS CLUSTERING ALGORITHM FOR SEGMENTING INPUTS OF A BUSINESS MODEL
Aim:
To implement the K-Means clustering algorithm for segmenting customer inputs in a business model based on features like age, income, and spending score.
Algorithm:
Step 1: Import necessary libraries (pandas, matplotlib, seaborn, sklearn).
Step 2: Load the dataset (Customers_data.csv) using pandas.read_csv().
Step 3: Preprocess the data: rename columns for consistency; check and handle missing values.
Step 4: Perform data analysis and visualization: use correlation heatmaps and scatter plots; visualize data distributions and pairwise relationships.
Step 5: Select features for clustering (e.g., Age, Annual Income, Spending Score).
Step 6: Determine the optimal number of clusters using the Elbow Method (plot WCSS vs K).
Step 7: Apply K-Means with the selected K (e.g., K = 5).
Step 8: Assign cluster labels to the data using fit_predict().
Step 9: Visualize the clustered data using scatter plots with different colors for each cluster.
Step 10: Analyze and interpret clusters for business segmentation insights.
Implementation:

# 1. Import the Dataset
import pandas as pd
data = pd.read_csv('/content/Customers_data.csv')
data.head()

# 2. Find Metadata
data.info()
data.describe()
data.shape
data.columns

# 3. Data Preprocessing
# Rename columns for easier access in code
data.rename(columns={
    'Annual Income (k$)': 'Annual_Income',
    'Spending Score (1-100)': 'Spending_Score'
}, inplace=True)

# Check for missing values
print("\nMissing Values:")
print(data.isnull().sum())

# Fill missing numeric values with column mean (if any)
data.fillna(data.mean(numeric_only=True), inplace=True)

# Task 1 – Data Analysis & Visualization

# 1. Find the Correlation
correlation_matrix = data.corr(numeric_only=True)
print("Correlation Matrix:")
print(correlation_matrix)

# Visualize correlation matrix with a heatmap
import seaborn as sns
import matplotlib.pyplot as plt

plt.figure(figsize=(8, 6))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()

# 2. Draw the Pair Plot
sns.pairplot(data)
plt.suptitle("Pair Plot of Features", y=1.02)
plt.show()

# 3. Pearson, Spearman, Kendall Correlations
data_numeric = data.select_dtypes(include=['number'])
print("Pearson Correlation:\n\n", data_numeric.corr(method='pearson'))
print("\n\n")
print("Spearman Correlation:\n\n", data_numeric.corr(method='spearman'))
print("\n\n")
print("Kendall Correlation:\n\n", data_numeric.corr(method='kendall'))
print("\n\n")

# 4. Draw "Age vs Annual Income" and "Age vs Spending Score" Graphs

# Age vs Annual Income
plt.figure(figsize=(6, 4))
sns.scatterplot(x='Age', y='Annual_Income', data=data)
plt.title('Age vs Annual Income')
plt.xlabel('Age')
plt.ylabel('Annual Income (k$)')
plt.grid(True)
plt.show()

# Age vs Spending Score
plt.figure(figsize=(6, 4))
sns.scatterplot(x='Age', y='Spending_Score', data=data)
plt.title('Age vs Spending Score')
plt.xlabel('Age')
plt.ylabel('Spending Score (1-100)')
plt.grid(True)
plt.show()

# Task 2

# 1. Key Difference Between df.loc and df.iloc
# df.loc selects rows/columns by LABEL (e.g., column names), while
# df.iloc selects them by INTEGER POSITION (0-based indices).

# 2. Use df.loc to get the Annual Income and Spending Score:
X_loc = data.loc[:, ['Annual_Income', 'Spending_Score']]
print("X_loc =")
print(X_loc)

# 3. Use df.iloc to get the Annual Income and Spending Score
# (assuming the usual column order CustomerID, Gender, Age,
# Annual_Income, Spending_Score, these sit at positions 3 and 4)
X_iloc = data.iloc[:, [3, 4]]
print("X_iloc =")
print(X_iloc)
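
# A one-line illustration of the label-vs-position difference (assuming
# 'Age' is the third column): both expressions return the same value.
print(data.loc[0, 'Age'], data.iloc[0, 2])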

# Task 3

# 1. Distribution of Annual Income
plt.figure(figsize=(8, 6))
sns.histplot(data['Annual_Income'], kde=True, color='blue', bins=20)
plt.title('Distribution of Annual Income')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()

# 2. Distribution of Age
plt.figure(figsize=(8, 6))
sns.histplot(data['Age'], kde=True, color='green', bins=20)
plt.title('Distribution of Age')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()

# 3. Distribution of Spending Score
plt.figure(figsize=(8, 6))
sns.histplot(data['Spending_Score'], kde=True, color='red', bins=20)
plt.title('Distribution of Spending Score')
plt.xlabel('Spending Score (1-100)')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()

# 4. Number of Female and Male customers (and plot)
gender_count = data['Gender'].value_counts()
print("Number of Female and Male:")
print(gender_count)
gender_count.plot(kind='bar', color=['lightblue', 'lightcoral'])
plt.title('Number of Female and Male')
plt.ylabel('Count')
plt.xlabel('Gender')
plt.xticks(rotation=0)
plt.show()

# Task 4

# 1. Annual Income vs Spending Score (clustering visualization)
plt.figure(figsize=(8, 6))
sns.scatterplot(x='Annual_Income', y='Spending_Score', data=data, hue='Gender', palette='coolwarm', s=100, alpha=0.7)
plt.title('Annual Income vs Spending Score')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.grid(True)
plt.show()

# 2. Annual Income vs Age (clustering visualization)
plt.figure(figsize=(8, 6))
sns.scatterplot(x='Annual_Income', y='Age', data=data, hue='Gender', palette='coolwarm', s=100, alpha=0.7)
plt.title('Annual Income vs Age')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Age')
plt.grid(True)
plt.show()

# 3. Age vs Spending Score (clustering visualization)
plt.figure(figsize=(8, 6))
sns.scatterplot(x='Age', y='Spending_Score', data=data, hue='Gender', palette='coolwarm', s=100, alpha=0.7)
plt.title('Age vs Spending Score')
plt.xlabel('Age')
plt.ylabel('Spending Score (1-100)')
plt.grid(True)
plt.show()

# Task 5

# WCSS (Within-Cluster Sum of Squares) in K-Means: the Elbow Method
from sklearn.cluster import KMeans

X = data[['Annual_Income', 'Spending_Score']]  # Features for clustering
wcss = []
for k in range(1, 11):
    kmeans = KMeans(n_clusters=k, init='k-means++', random_state=42)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)
plt.figure(figsize=(10, 6))
plt.plot(range(1, 11), wcss, marker='o')
plt.title('WCSS vs Number of Clusters (Elbow Method)')
plt.xlabel('Number of Clusters (K)')
plt.ylabel('WCSS')
plt.grid(True)
plt.xticks(range(1, 11))
plt.show()

# Task 6

# 3. Apply K-Means Clustering
X = data[['Age', 'Annual_Income', 'Spending_Score']]  # Features for clustering
kmeans = KMeans(n_clusters=5, init='k-means++', random_state=42)
data['Cluster'] = kmeans.fit_predict(X)
data.head()
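
Step 9 of the algorithm calls for visualizing the clusters; a minimal sketch follows, plotting two of the three clustered features colored by the assigned cluster label (the centroid coordinates come from kmeans.cluster_centers_, whose columns follow the feature order Age, Annual_Income, Spending_Score):

plt.figure(figsize=(8, 6))
sns.scatterplot(x='Annual_Income', y='Spending_Score', data=data, hue='Cluster', palette='tab10', s=80)
plt.scatter(kmeans.cluster_centers_[:, 1], kmeans.cluster_centers_[:, 2], c='black', marker='X', s=200, label='Centroids')
plt.title('Customer Segments (K-Means, K=5)')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.legend()
plt.show()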