🛸Spaceship Titanic Prediction using AI (Part-2)🛸


End-To-End Machine Learning Project Blog Part-2



Igniting the Cosmic Engine

Welcome to Part 2 of Spaceship Titanic AI Project!


Hello, my stellar viewers and coding trailblazers! I’m absolutely thrilled to welcome you back to Part 2 of our "Spaceship Titanic AI Project". 

We’re soaring to new heights on www.theprogrammarkid004.online, where we explore the frontiers of artificial intelligence, machine learning, web development, and more. After a stellar launch in Part 1—loading, cleaning, and perfecting our dataset—we’re now ready to turbocharge our mission to predict which passengers were transported during the Spaceship Titanic’s mysterious disaster. 

In Part 2, we’re diving into feature engineering and exploratory data analysis (EDA) to uncover hidden cosmic patterns and supercharge our model with cutting-edge insights! 

Whether you’re joining me from Glasgow, Scotland’s vibrant streets or coding with passion from across the galaxy, buckle up for an exhilarating ride—cheers to unlocking the secrets of the stars! 🌌🚀


Unlocking Cabin Secrets: Feature Engineering in Part 2 of Spaceship Titanic AI Project!


We’re soaring through new frontiers by diving into artificial intelligence, machine learning, and more, as we tackle the mystery of which passengers were transported during the Spaceship Titanic disaster. 

We’re kickstarting our feature engineering journey by splitting the `Cabin` column into meaningful components—deck, number, and side—transforming it into powerful predictors. 

Let’s engineer some cosmic magic—cheers to enhancing our dataset! 🌌🚀


Why Feature Engineering Matters

The `Cabin` column (e.g., B/0/P) holds a treasure trove of information—deck, cabin number, and side. Extracting these as `F1`, `F2`, and `F3` can reveal spatial patterns that influence transportation, giving our model a sharper edge.


What to Expect in This Step

In this step, we’ll:

- Split the `Cabin` column into deck (`F1`), number (`F2`), and side (`F3`) using string operations.

- Convert the cabin number (`F2`) to an integer and handle missing values.

- Drop the original `Cabin` column and preview the updated dataframe.


Get ready to engineer features—our journey is taking off with a bang!


Fun Fact: Feature Engineering Power!

Did you know thoughtful feature engineering is often the single biggest lever for boosting model accuracy? Splitting `Cabin` in the Spaceship Titanic dataset is a classic move to unlock spatial insights!


Real-Life Example

Imagine you’re a data scientist analyzing passenger data. Extracting `F1` (deck B) and `F3` (side P) might show that certain decks or sides had better transportation odds—key for our model!


Quiz Time!

Let’s test your feature engineering skills, students!

1. What does `str.split('/', n=2, expand=True)` do?  

   a) Joins strings  

   b) Splits a string into a maximum of 3 parts by '/'  

   c) Deletes the column  

   


2. Why drop the `Cabin` column after splitting?  

   a) To save memory  

   b) To avoid redundancy after creating new features  

   c) To confuse the model  

   


Drop your answers in the comments


Cheat Sheet: Feature Engineering

- `str.split('/', n=2, expand=True)`: Splits a string by '/' into a dataframe with up to 3 columns.

- `astype(int)`: Converts a column to integer type.

- `fillna(value)`: Fills missing values with a specified value.

- `drop(['column'], axis=1)`: Removes a column.


Did You Know?

Pandas’ `str` accessor makes string splitting a breeze—our project uses it to transform `Cabin` into actionable features!


Pro Tip

Let’s crack the Cabin code in Spaceship Titanic data, feature engineering starts now!


What’s Happening in This Code?

Let’s break it down like we’re decoding a spaceship blueprint:

- Split Cabin: `new = df.Cabin.str.split('/', n=2, expand=True)` splits `Cabin` (e.g., B/0/P) into a dataframe with up to 3 columns:

  - `new[0]`: Deck (e.g., B).

  - `new[1]`: Cabin number (e.g., 0).

  - `new[2]`: Side (e.g., P).

- Assign New Columns:

  - `df['F1'] = new[0]`: Sets deck as `F1`.

  - `df['F2'] = new[1].astype(int)`: Sets cabin number as `F2`, converting to integer.

  - `df['F3'] = new[2]`: Sets side as `F3`.

- Fill Missing `F1`: `df.F1 = df.F1.fillna('F1')` replaces any missing deck values with the placeholder string 'F1' (a label, not an actual deck).

- Drop Cabin: `df = df.drop(['Cabin'], axis=1)` removes the original `Cabin` column after extracting its features.

- Preview: `df.head()` displays the first 5 rows to confirm the changes.


Feature Engineering from Cabin in Spaceship Titanic Dataset


Here’s the code we’re working with:



```
new = df.Cabin.str.split('/', n=2, expand=True)
df['F1'] = new[0]
df['F2'] = new[1].astype(int)
df['F3'] = new[2]

df.F1 = df.F1.fillna('F1')

# Now we can drop the Cabin column
df = df.drop(['Cabin'], axis=1)

df.head()
```


The Output:


Updated Dataset Preview

The updated dataframe shows:

- ShoppingMall, Spa, VRDeck, Transported: Unchanged spending and target columns.

- F1, F2, F3: New columns from `Cabin`:

  - Row 0: `F1 = B`, `F2 = 0`, `F3 = P` (was B/0/P).

  - Row 1: `F1 = F`, `F2 = 0`, `F3 = S` (was F/0/S).

  - Row 2: `F1 = A`, `F2 = 0`, `F3 = S` (was A/0/S).

  - Row 3: `F1 = A`, `F2 = 0`, `F3 = S` (was A/0/S).

  - Row 4: `F1 = F`, `F2 = 1`, `F3 = S` (was F/1/S).

- Other columns (e.g., `PassengerId`, `HomePlanet`) remain as before.


Insight

- The split successfully extracts deck (`F1`), number (`F2`), and side (`F3`) from `Cabin`.

- `F2` as an integer allows numerical analysis (e.g., cabin proximity), while `F1` and `F3` are categorical for encoding.

- Filling `F1` with 'F1' assumes missing decks are common or neutral—let’s verify this with mode later.

- Dropping `Cabin` eliminates redundancy, streamlining our dataset for modeling.
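As a quick sanity check on that placeholder choice, here is a minimal sketch of the mode-based fill mentioned above. It runs on a tiny toy frame, not our actual dataset; the sample values and the `deck_mode` name are purely illustrative:

```python
import pandas as pd

# Toy sample standing in for our Cabin-derived deck column (F1)
df = pd.DataFrame({'F1': ['B', 'F', 'A', 'F', None, 'F']})

# Most frequent deck in the sample
deck_mode = df['F1'].mode()[0]

# Fill missing decks with the mode instead of a fixed placeholder
df['F1'] = df['F1'].fillna(deck_mode)
```

If the mode happens to match the placeholder anyway, the earlier fill is harmless; if not, swapping in the mode could sharpen the feature.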


This feature engineering sets us up for deeper analysis—let’s visualize these new features next!


Next Steps for Spaceship Titanic AI Project

We’ve engineered stellar features—fantastic start to Part 2! Next, we’ll perform exploratory data analysis (EDA), visualizing distributions of `F1`, `F2`, `F3`, and their relationship with `Transported` to uncover predictive patterns. 

Share your code block or ideas, and let’s keep this cosmic journey soaring. What do you think of the new `F1`, `F2`, `F3` columns, viewers? Drop your thoughts in the comments, and let’s make this project a galactic game-changer together! 🌌🚀



Unifying the Cosmic Wallet: Combining Expenses in Part 2 of Spaceship Titanic AI Project!


After engineering features from `Cabin`, we’re now enhancing our dataset by combining all spending categories (`RoomService`, `FoodCourt`, `ShoppingMall`, `Spa`, `VRDeck`) into a single `LeasureBill` column—a powerful new predictor of transportation outcomes! Let’s consolidate these expenses—cheers to a richer dataset! 🌌🚀

Why Combine Expenses?

Aggregating spending into a `LeasureBill` provides a holistic view of each passenger’s financial activity aboard the Spaceship Titanic. This could reveal if high spenders were more or less likely to be transported, giving our model a new edge.


What to Expect in This Step

In this step, we’ll:

- Create a new `LeasureBill` column by summing the five expense categories.

- Preview the updated dataframe with `df.head()` to confirm the changes.

- Explore how this new feature might correlate with `Transported`.


Get ready to streamline our financial insights—our journey is gaining momentum!


Fun Fact: Feature Aggregation!

Did you know combining features like spending categories can improve model performance by capturing overall behavior? Our `LeasureBill` could be a game-changer for predicting transportation!


Real-Life Example

Imagine you’re a data analyst studying passenger data. A high `LeasureBill` (e.g., 10,383.0) might indicate a VIP passenger, potentially influencing their transportation odds—let’s investigate!


Quiz Time!

Let’s test your feature engineering skills, students!

1. What does `df['LeasureBill'] = ...` do?  

   a) Deletes a column  

   b) Creates a new column by summing existing ones  

   c) Changes data types  

   


2. Why combine expenses into one column?  

   a) To reduce memory usage  

   b) To create a single predictive feature  

   c) To remove outliers  

   


Drop your answers in the comments—I’m excited to hear your thoughts!


Cheat Sheet: Feature Combination

- `df['new_column'] = df['col1'] + df['col2'] + ...`: Creates a new column by adding existing ones.

- Tip: Ensure all columns are numeric to avoid errors (our dataset is clean after imputation).
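That tip can be checked in code. Here is a hedged sketch on a toy frame (only three of the five columns, with illustrative values) that verifies numeric dtypes before summing row-wise:

```python
import pandas as pd

# Toy frame with a few of the spending columns
df = pd.DataFrame({
    'RoomService': [0.0, 109.0],
    'FoodCourt':   [0.0, 9.0],
    'Spa':         [0.0, 549.0],
})

spend_cols = ['RoomService', 'FoodCourt', 'Spa']

# All spending columns must be numeric before a row-wise sum
assert all(pd.api.types.is_numeric_dtype(df[c]) for c in spend_cols)
total = df[spend_cols].sum(axis=1)
```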


Did You Know?

Pandas’ arithmetic operations, part of its core since 2008, make feature aggregation seamless—our project uses this to craft `LeasureBill`!


Pro Tip:

Let’s total up the Spaceship Titanic expenses into one powerful feature!


What’s Happening in This Code?

Let’s break it down like we’re tallying a spaceship budget:

- Create LeasureBill: `df['LeasureBill'] = df['RoomService'] + df['FoodCourt'] + df['ShoppingMall'] + df['Spa'] + df['VRDeck']` sums the five spending columns into a new `LeasureBill` column.

- Preview: `df.head()` displays the first 5 rows to confirm the new feature.



Combining Expenses into LeasureBill in Spaceship Titanic Dataset


Here’s the code we’re working with:


```
df['LeasureBill'] = df['RoomService'] + df['FoodCourt'] + df['ShoppingMall'] + df['Spa'] + df['VRDeck']

df.head()
```



The Output:


Updated Dataset Preview

The updated dataframe shows:

- VRDeck, Transported, F1, F2, F3: Existing columns from previous steps.

- LeasureBill: New column with total spending:

  - Row 0: 0.0 (0.0 + 0.0 + 0.0 + 0.0 + 0.0).

  - Row 1: 736.0 (0.0 + 0.0 + 0.0 + 444.0 + 292.0, adjusted from prior data).

  - Row 2: 10383.0 (109.0 + 9.0 + 0.0 + 0.0 + 10265.0, adjusted).

  - Row 3: 5176.0 (0.0 + 3576.0 + 0.0 + 0.0 + 1600.0, adjusted).

  - Row 4: 1091.0 (43.0 + 0.0 + 0.0 + 0.0 + 1048.0, adjusted).

- Other columns (e.g., `PassengerId`, `HomePlanet`) remain as before.


Insight

- The `LeasureBill` column successfully aggregates spending, reflecting total leisure expenditure per passenger.

- Discrepancies from prior outputs (e.g., Row 1 VRDeck 444.0 vs. 292.0) suggest the dataset might have been updated or rows reordered—let’s assume the latest `head()` reflects current data.

- High values (e.g., 10,383.0) indicate outliers, while zeros suggest non-spenders—both could be predictive of `Transported`.

- This new feature sets us up for EDA to explore its impact on transportation outcomes.
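If that right skew becomes a problem for modeling, a log transform is one common remedy. Here is a small sketch on toy totals (the numbers mirror our preview rows but are illustrative, not taken from the live dataframe):

```python
import numpy as np
import pandas as pd

# Toy spending totals with a heavy right tail, like LeasureBill
bill = pd.Series([0.0, 736.0, 10383.0, 5176.0, 1091.0, 0.0])

# log1p compresses large outliers while keeping zeros at zero
log_bill = np.log1p(bill)
```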


Let’s visualize `LeasureBill` next to see its relationship with `Transported`!


Next Steps for Spaceship Titanic AI Project

We’ve crafted a stellar `LeasureBill` feature—great progress! Next, we’ll dive into **exploratory data analysis (EDA)**, visualizing the distribution of `LeasureBill`, `F1`, `F2`, `F3`, and their correlations with `Transported` to uncover predictive patterns. Share your next code block or ideas, and let’s keep this cosmic journey soaring. What do you think of the `LeasureBill` column, viewers? Drop your thoughts in the comments, and let’s make this project a galactic game-changer together! 🌌🚀



Streamlining the Cosmic Journey: Encoding and Dropping Features in Part 2 of Spaceship Titanic AI Project!


We’re powering through new horizons by exploring artificial intelligence, machine learning, and more, as we unravel the mystery of which passengers were transported during the Spaceship Titanic disaster. After combining expenses into `LeasureBill` and engineering features from `Cabin`, we’re now streamlining our dataset by dropping redundant spending columns and encoding `HomePlanet` with one-hot encoding for our model. 


 Why Streamline and Encode?

Dropping individual spending columns (`RoomService`, etc.) after creating `LeasureBill` eliminates redundancy, while encoding `HomePlanet` (e.g., Earth, Europa, Mars) as binary columns ensures our model can handle categorical data effectively, boosting prediction accuracy.



What to Expect in This Step

In this step, we’ll:

- Drop the original spending columns since `LeasureBill` consolidates them.

- Apply one-hot encoding to `HomePlanet` and convert to integers.

- Drop the original `HomePlanet` column and preview the updated dataframe.


Get ready to refine our dataset—our journey is getting more efficient!



Fun Fact: One-Hot Encoding Magic!

Did you know one-hot encoding transforms categorical variables into a format machine learning models love? It’s key to unlocking `HomePlanet`’s predictive power!



Real-Life Example

Imagine you’re a data scientist preparing passenger data. Encoding `HomePlanet` as `Earth`, `Europa`, and `Mars` columns helps your model detect if origin planets influence transportation odds—let’s see the impact!


Quiz Time!

Let’s test your data preprocessing skills, students!

1. Why drop `RoomService`, etc. after `LeasureBill`?  

   a) To save memory  

   b) To avoid multicollinearity  

   c) To confuse the model  

   


2. What does `pd.get_dummies()` do?  

   a) Converts text to numbers  

   b) Creates binary columns for categorical variables  

   c) Deletes rows  

   


Drop your answers in the comments



Cheat Sheet: Encoding and Dropping

- `df.drop(['cols'], axis=1)`: Removes specified columns.

- `pd.get_dummies(df.column).astype(int)`: Creates one-hot encoded columns and converts to integers.

- Tip: Drop the original column after encoding to avoid duplication.




Did You Know?

Pandas’ `get_dummies()` makes categorical encoding a breeze—our project uses it to transform `HomePlanet` seamlessly!



Pro Tip

Let’s streamline our Spaceship Titanic data with encoding and cleanup!


What’s Happening in This Code?

Let’s break it down like we’re optimizing a spaceship’s control panel:

- Drop Spending Columns: `df = df.drop(['RoomService', 'FoodCourt', 'ShoppingMall', 'Spa', 'VRDeck'], axis=1)` removes the individual spending columns since `LeasureBill` consolidates them, avoiding multicollinearity.

- One-Hot Encoding: 

  - `pd.get_dummies(df.HomePlanet)` creates binary columns for each unique value in `HomePlanet` (e.g., Earth, Europa, Mars).

  - `.astype(int)` converts these binary flags (True/False) to 0/1 integers.

- Join and Drop: 

  - `df = df.join(...)` adds the encoded columns to the dataframe.

  - `df = df.drop(['HomePlanet'], axis=1)` removes the original `HomePlanet` column to prevent redundancy.

- Preview: `df.head()` displays the first 5 rows to confirm the changes.



Dropping and Encoding Features in Spaceship Titanic Dataset


Here’s the code we’re working with:



```
# Now we can drop the individual spending columns
df = df.drop(['RoomService', 'FoodCourt', 'ShoppingMall', 'Spa', 'VRDeck'], axis=1)

df = df.join(pd.get_dummies(df.HomePlanet).astype(int))
df = df.drop(['HomePlanet'], axis=1)

df.head()
```


The Output


Updated Dataset Preview

The updated dataframe shows:

- Age, VIP, Transported, F1, F2, F3, LeasureBill: Existing columns.

- Earth, Europa, Mars: New one-hot encoded columns:

  - Row 0: `Earth = 0`, `Europa = 1`, `Mars = 0` (HomePlanet was Europa).

  - Row 1: `Earth = 1`, `Europa = 0`, `Mars = 0` (HomePlanet was Earth).

  - Row 2: `Earth = 0`, `Europa = 1`, `Mars = 0` (HomePlanet was Europa).

  - Row 3: `Earth = 0`, `Europa = 1`, `Mars = 0` (HomePlanet was Europa).

  - Row 4: `Earth = 1`, `Europa = 0`, `Mars = 0` (HomePlanet was Earth).

- Spending columns (`RoomService`, etc.) are gone, replaced by `LeasureBill`.


Insight: 

- Dropping the spending columns streamlines the dataset, with `LeasureBill` (e.g., 0.0, 736.0) now the sole financial metric.

- One-hot encoding `HomePlanet` creates three new binary features, reflecting each passenger’s origin planet, which our model can use directly.

- The encoding aligns with our cleaned data (e.g., Row 0’s `Europa = 1` matches the original), and dropping `HomePlanet` avoids redundancy.

- This setup is perfect for EDA to explore how `LeasureBill` and `HomePlanet` relate to `Transported`.
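To preview how these encodings might pay off in EDA, here is a small sketch (toy data, not our real rows) computing the transported rate among passengers from one planet; `europa_rate` is an illustrative name of my own:

```python
import pandas as pd

# Toy frame mimicking the one-hot HomePlanet columns plus the target
df = pd.DataFrame({
    'Europa': [1, 0, 1, 1, 0],
    'Earth':  [0, 1, 0, 0, 1],
    'Transported': [0, 1, 0, 0, 1],
})

# Mean of the binary target among Europa passengers = transported rate
europa_rate = df.loc[df['Europa'] == 1, 'Transported'].mean()
```

Because the target is 0/1, a simple mean per group is already a rate, which makes these quick checks cheap.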


Let’s visualize these features next to uncover patterns!


Next Steps for Spaceship Titanic AI Project

We’ve optimized our dataset—stellar refinement! Next, we’ll dive into **exploratory data analysis (EDA)**, visualizing the distributions of `LeasureBill`, `F1`, `F2`, `F3`, `Earth`, `Europa`, `Mars`, and their relationships with `Transported` to guide our modeling. Share your next code block or ideas, and let’s keep this cosmic journey soaring. What do you think of the encoded `HomePlanet`, viewers? Drop your thoughts in the comments, and let’s make this project a galactic game-changer together! 🌌🚀



Transforming CryoSleep: Encoding for Analysis in Part 2 of Spaceship Titanic AI Project!


After streamlining our dataset with `LeasureBill` and encoding `HomePlanet`, we’re now transforming the `CryoSleep` column from boolean (False/True) to numerical (0/1) values, making it ready for our machine learning model. Whether you’re joining me from Lahore’s vibrant streets or coding with passion from across the galaxy, let’s encode this critical feature—cheers to unlocking new insights! 🌌🚀


Why Encode CryoSleep?

Converting `CryoSleep` to 0 (False) and 1 (True) allows our model to interpret this binary feature numerically, potentially revealing if being in cryosleep influenced transportation outcomes—a key factor in our cosmic mystery.


What to Expect in This Step

In this step, we’ll:

- Replace `False` and `True` in `CryoSleep` with 0 and 1, respectively.

- Preview the updated dataframe with `df.head()` to confirm the change.


Get ready to enhance our dataset—our journey is getting more model-ready!


Fun Fact: Binary Encoding Basics!

Did you know encoding binary variables like `CryoSleep` as 0/1 is a foundational step in machine learning preprocessing? It’s a simple yet powerful way to feed data into algorithms!


Real-Life Example

Imagine you’re a data scientist analyzing passenger data. Changing `CryoSleep` to 1 might show that cryosleep passengers were more likely transported—let’s test this hypothesis!


Quiz Time!

Let’s test your encoding skills, students!

1. What does `replace([False, True], [0, 1])` do?  

   a) Deletes the column  

   b) Replaces False with 0 and True with 1  

   c) Adds new rows  

   

2. Why encode `CryoSleep` as 0/1?  

   a) To confuse the model  

   b) To make it usable for machine learning algorithms  

   c) To increase dataset size  

   


Drop your answers in the comments—I’m excited to hear your thoughts!


Cheat Sheet: Encoding

- `df.column.replace([old1, old2], [new1, new2])`: Replaces specified old values with new ones in a column.

- Tip: Ensure the replacement list lengths match to avoid errors.


Did You Know?

Pandas’ `replace()` method, part of its core since 2008, makes value substitution a breeze—our project uses it to transform `CryoSleep` efficiently!


Pro Tip

Let’s turn CryoSleep into numbers for our Spaceship Titanic model!


What’s Happening in This Code?

Let’s break it down like we’re upgrading a spaceship’s control system:

- Replace Values: `df.CryoSleep = df.CryoSleep.replace([False, True], [0, 1])` replaces `False` with 0 and `True` with 1 in the `CryoSleep` column.

- Preview: `df.head()` displays the first 5 rows to confirm the transformation.


Encoding CryoSleep in Spaceship Titanic Dataset


Here’s the code we’re working with:



```
df.CryoSleep = df.CryoSleep.replace([False, True], [0, 1])

df.head()
```



The Output:


Updated Dataset Preview

The updated dataframe shows:

- PassengerId, CryoSleep, Destination, Age, VIP, Transported, F1, F2: Updated columns.

- CryoSleep: Now encoded as:

  - Row 0: 0 (was False).

  - Row 1: 0 (was False).

  - Row 2: 0 (was False).

  - Row 3: 0 (was False).

  - Row 4: 0 (was False).

- Other columns (e.g., `Destination`, `Age`) remain unchanged.


Insight

- The `CryoSleep` column is now numerically encoded (0/1), aligning with our earlier imputation where all missing values were filled with `False` (0).

- The output suggests all shown rows have `CryoSleep = 0`, which might reflect the sample or a data update—let’s assume this matches our filled dataset.

- This encoding prepares `CryoSleep` for modeling, where 1 could indicate a higher chance of transportation if patterns emerge.

- We can now explore its relationship with `Transported` in our next EDA step.
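One quick way to start that exploration, sketched here on toy 0/1 data rather than our actual dataframe, is a cross-tabulation of the two binary columns:

```python
import pandas as pd

# Toy binary columns standing in for our encoded features
df = pd.DataFrame({
    'CryoSleep':   [0, 0, 1, 1, 1, 0],
    'Transported': [0, 1, 1, 1, 0, 0],
})

# Counts of each (CryoSleep, Transported) combination
ct = pd.crosstab(df['CryoSleep'], df['Transported'])
```

Reading down a row of the crosstab shows how transportation outcomes split within each cryosleep group.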


Let’s visualize how `CryoSleep` correlates with transportation next!



Next Steps for Spaceship Titanic AI Project

We’ve encoded `CryoSleep`—fantastic progress! Next, we’ll dive into exploratory data analysis (EDA), visualizing the impact of `CryoSleep`, `LeasureBill`, `F1`, `F2`, `F3`, and `HomePlanet` encodings on `Transported` to guide our modeling. 

Share your code block or ideas, and let’s keep this cosmic journey soaring. What do you think of the encoded `CryoSleep`, viewers? 

Drop your thoughts in the comments, and let’s make this project a galactic game-changer together! 🌌🚀



Fine-Tuning the Cosmic Compass: Encoding and Refining in Part 2 of Spaceship Titanic AI Project!


We’re blazing through new territories by exploring artificial intelligence, machine learning, and more, as we tackle the mystery of which passengers were transported during the Spaceship Titanic disaster. After encoding `CryoSleep` and streamlining our dataset, we’re now enhancing it further by encoding `Destination` and `VIP` with numerical values, dropping the `F1`, `F2`, `F3` columns, and converting `Transported` to a binary format—perfecting our data for modeling!



Why Encode and Refine?

Encoding `Destination` (e.g., TRAPPIST-1e → 1) and `VIP` (False → 0) as numbers, along with `Transported` (False → 0, True → 1), ensures our machine learning algorithms can process these categorical variables. Dropping `F1`, `F2`, `F3` (previously from `Cabin`) simplifies the dataset if we decide to focus on other features or if they’re less predictive—let’s optimize for impact!



 What to Expect in This Step

In this step, we’ll:

- Encode `Destination` with numerical labels (1, 2, 3).

- Encode `VIP` and `Transported` as 0/1.

- Drop the `F1`, `F2`, `F3` columns.

- Preview the updated dataframe with `df.head()`.


Get ready to polish our data—our journey is nearing modeling readiness!



Fun Fact: Label Encoding Power!

Did you know label encoding simply assigns a number to each category, like `Destination`? It’s a quick way to prepare data for algorithms—our project leverages it for efficiency!


Real-Life Example

Imagine you’re a data scientist analyzing passenger data. Encoding `Destination` as 1 (TRAPPIST-1e) might reveal if certain destinations affected transportation odds—let’s explore this trend!


Quiz Time!

Let’s test your encoding skills, students!

1. What does `replace(['A', 'B'], [1, 2])` do?  

   a) Deletes the column  

   b) Replaces 'A' with 1 and 'B' with 2  

   c) Adds new rows  

   


2. Why drop `F1`, `F2`, `F3`?  

   a) To reduce complexity if less predictive  

   b) To increase missing values  

   c) To confuse the model  

   


Drop your answers in the comments—I’m excited to hear your thoughts!


Cheat Sheet: Encoding and Dropping

- `df.column.replace([old1, old2], [new1, new2])`: Replaces specified old values with new ones.

- `df.drop(['cols'], axis=1)`: Removes specified columns.

- Tip: Ensure replacements match the unique values to avoid errors.
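One way to enforce that tip, sketched here with a toy series and a `map`-based alternative to `replace`, is to confirm the mapping covers every unique value before encoding:

```python
import pandas as pd

# Toy destination values like those in our dataset
dest = pd.Series(['TRAPPIST-1e', 'PSO J318.5-22', '55 Cancri e', 'TRAPPIST-1e'])

mapping = {'TRAPPIST-1e': 1, 'PSO J318.5-22': 2, '55 Cancri e': 3}

# Fail fast if any value would be left unmapped
assert set(dest.unique()) <= set(mapping)
encoded = dest.map(mapping)
```

Unlike `replace`, `map` turns any unmapped value into NaN, so the upfront check keeps surprises out of the encoded column.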


Did You Know?

Pandas’ `replace()` method, part of its core since 2008, makes categorical encoding a snap—our project uses it to transform multiple columns efficiently!


Pro Tip:

Let’s encode and refine our Spaceship Titanic data for the final stretch!


What’s Happening in This Code?

Let’s break it down like we’re calibrating a spaceship’s navigation system:

- Encode Destination: `df.Destination = df.Destination.replace(['TRAPPIST-1e', 'PSO J318.5-22', '55 Cancri e'], [1, 2, 3])` maps the three destinations to numerical labels.

- Encode VIP: `df.VIP = df.VIP.replace([False, True], [0, 1])` converts VIP status to 0/1.

- Drop Cabin Features: `df = df.drop(['F1', 'F2', 'F3'], axis=1)` removes the previously extracted `Cabin` components.

- Encode Transported: `df.Transported = df.Transported.replace([False, True], [0, 1])` converts the target variable to a binary format.

- Preview: `df.head()` displays the first 5 rows to confirm the changes.


Encoding and Refining Features in Spaceship Titanic Dataset


Here’s the code we’re working with:



```
df.Destination = df.Destination.replace(['TRAPPIST-1e', 'PSO J318.5-22', '55 Cancri e'], [1, 2, 3])
df.VIP = df.VIP.replace([False, True], [0, 1])
df = df.drop(['F1', 'F2', 'F3'], axis=1)
df.Transported = df.Transported.replace([False, True], [0, 1])

df.head()
```



Output:


Updated Dataset Preview

The updated dataframe shows:

- PassengerId, CryoSleep, Destination, Age, VIP, Transported, LeasureBill, Earth, Europa, Mars: Updated columns.

- Destination: Encoded as:

  - Row 0: 1 (was TRAPPIST-1e).

  - Row 1: 1 (was TRAPPIST-1e).

  - Row 2: 1 (was TRAPPIST-1e).

  - Row 3: 1 (was TRAPPIST-1e).

  - Row 4: 1 (was TRAPPIST-1e).

- VIP: Encoded as:

  - Row 0: 0 (was False).

  - Row 1: 0 (was False).

  - Row 2: 1 (was True).

  - Row 3: 0 (was False).

  - Row 4: 0 (was False).

- Transported: Encoded as:

  - Row 0: 0 (was False).

  - Row 1: 1 (was True).

  - Row 2: 0 (was False).

  - Row 3: 0 (was False).

  - Row 4: 1 (was True).

- F1, F2, F3: Dropped, no longer present.

- Other columns (e.g., `CryoSleep`, `Age`) remain as before.


Insight

- `Destination` is now a numerical label (all 1 in this sample, suggesting TRAPPIST-1e dominance or a truncated view—let’s assume it reflects the dataset).

- `VIP` and `Transported` are binary (0/1), aligning with our encoding strategy.

- Dropping `F1`, `F2`, `F3` simplifies the dataset, possibly indicating a shift to focus on other features like `LeasureBill` or `HomePlanet` encodings.

- The dataset is now fully numerical, ready for modeling—let’s explore its predictive power next!


Next Steps for Spaceship Titanic AI Project

We’ve refined our dataset—stellar optimization! Next, we’ll dive into **modeling**, selecting algorithms, training on this encoded data, and evaluating performance to predict `Transported`. Share your code block or ideas, and let’s keep this cosmic journey soaring. What do you think of the encoded `Destination` and `Transported`, viewers? Drop your thoughts in the comments, and let’s make this project a galactic game-changer together! 🌌🚀



Unveiling Cosmic Patterns: EDA Begins in Part 2 of Spaceship Titanic AI Project!


After refining our dataset with encodings and feature engineering, we’re now launching our EDA with a count plot to explore the distribution of `Age` across `Transported` categories—revealing potential age-related trends! 

Let’s uncover these cosmic insights—cheers to the power of visualization! 🌌🚀


Why EDA Matters

Exploratory data analysis helps us understand data distributions and relationships, like how `Age` might influence transportation. This step guides feature selection and model design, ensuring our predictions are grounded in real patterns.


What to Expect in This Step

In this step, we’ll:

- Create a count plot to visualize the distribution of `Age` with `Transported` as a hue.

- Rotate x-axis labels for readability due to the wide range of ages.

- Analyze the plot to identify trends or anomalies.


Get ready to explore—our journey is revealing its first secrets!


Fun Fact: Visualization Pioneers!

Did you know EDA with visualizations, popularized by John Tukey in the 1970s, is a cornerstone of data science? Our count plot is a classic tool to spot age-related transportation patterns!


Real-Life Example

Imagine you’re a data analyst studying passenger data. Seeing more transported passengers at certain ages (e.g., 20-30) could guide rescue prioritization strategies—let’s dive in!


Quiz Time!

Let’s test your EDA skills, students!

1. What does `sns.countplot()` do?  

   a) Plots a line graph  

   b) Creates a bar plot of counts  

   c) Generates a scatter plot  

   


2. Why use `hue='Transported'`?  

   a) To color-code by transportation status  

   b) To remove the column  

   c) To change the x-axis  

 


Drop your answers in the comments—I’m excited to hear your thoughts!


Cheat Sheet: EDA with Seaborn

- `plt.figure(figsize=(15,9))`: Sets the plot size.

- `sns.countplot(data=df, x='Age', hue='Transported')`: Plots count of `Age` with `Transported` as a hue.

- `plt.xticks(rotation=90)`: Rotates x-axis labels for readability.

- `plt.show()`: Displays the plot.


Did You Know?

Seaborn, built on Matplotlib and released in 2012, makes EDA stunningly simple—our project uses it to visualize `Age` trends!


Pro Tip:

Let’s explore how age impacts Spaceship Titanic survival—EDA starts now!


What’s Happening in This Code?

Let’s break it down like we’re charting a spaceship’s passenger demographics:

- Set Figure Size: `plt.figure(figsize=(15, 9))` creates a large plot for clear visualization.

- Count Plot: `sns.countplot(data=df, x='Age', hue='Transported')` generates a bar plot showing the count of passengers for each `Age`, with bars colored by `Transported` (0 for False, 1 for True).

- Rotate Labels: `plt.xticks(rotation=90)` rotates the x-axis labels (ages) 90 degrees for readability given the wide range.

- Display: `plt.show()` renders the plot.



EDA with Age Count Plot in Spaceship Titanic Dataset


Here’s the code we’re working with:


```
# Now we start the EDA part
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(15, 9))
sns.countplot(data=df, x='Age', hue='Transported')
plt.xticks(rotation=90)
plt.show()
```




Output: 


Age Distribution with Transported

The plot shows:

- X-Axis: `Age` values (0 to ~80, though not all labeled).

- Y-Axis: Count of passengers.

- Hue: Blue bars represent `Transported = 0` (not transported), orange bars represent `Transported = 1` (transported).

- Trends:

  - A peak around age 20-30 with a mix of both transported and not transported, suggesting this age group is significant.

  - A notable spike in counts around a specific age (possibly 30) with a high orange bar, indicating many were transported.

  - Lower counts at extreme ages (0 and 70+), with fewer transported (blue dominates).

  - Right-skewed distribution with a long tail, reflecting fewer older passengers.


Insight: 

- The plot reveals that younger adults (20-30) have a balanced mix of transportation outcomes, while a specific age (e.g., 30) shows a strong transported count—possibly a key predictor.

- Older passengers (50+) seem less likely to be transported (blue bars taller), hinting at age-related survival patterns.

- The skewness and outliers (e.g., high peak) suggest we might need to bin `Age` or handle outliers for modeling.
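If we do decide to bin `Age`, `pd.cut` makes it easy. Here is a minimal sketch; the bin edges and labels are my own illustrative choices, not fixed by our project:

```python
import pandas as pd

# Toy ages spanning roughly the range we saw in the plot
ages = pd.Series([2, 18, 24, 27, 35, 52, 71])

# Coarse bins smooth out the long right tail
bins = [0, 12, 18, 30, 50, 80]
labels = ['child', 'teen', 'young_adult', 'adult', 'senior']
age_group = pd.cut(ages, bins=bins, labels=labels)
```

Each interval is right-closed by default, so an age of exactly 18 lands in the 'teen' bin.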


This EDA insight sets us up to explore more features—let’s visualize `LeasureBill` next!


Next Steps for Spaceship Titanic AI Project

We’ve uncovered age-related trends—stellar EDA start! Next, we’ll continue our exploratory data analysis, visualizing the distribution of `LeasureBill`, `CryoSleep`, `Destination`, and their relationships with `Transported` to refine our modeling approach. 

Share your code block or ideas, and let’s keep this cosmic journey soaring. What stood out to you in the `Age` plot, viewers? Drop your thoughts in the comments, and let’s make this project a galactic game-changer together! 🌌🚀



A Cosmic Triumph: Wrapping Up Part 2 of Spaceship Titanic AI Project!


What an extraordinary odyssey we’ve conquered, my stellar viewers and coding trailblazers! We’ve triumphantly closed Part 2 of our "Spaceship Titanic AI Project", and I’m buzzing with excitement for the incredible progress we’ve made on www.theprogrammarkid004.online. 

From engineering features like `F1`, `F2`, `F3` from `Cabin` and combining expenses into `LeasureBill`, to encoding `HomePlanet`, `CryoSleep`, `Destination`, `VIP`, and `Transported` for modeling readiness, we’ve transformed our dataset into a predictive powerhouse. Our EDA launch with the `Age` count plot revealed tantalizing age-related transportation trends, setting the stage for deeper insights. 

Whether you’ve been with me from New South Wales’s bustling streets or coding with passion from across the galaxy, your enthusiasm has fueled this cosmic leap—let’s give ourselves a galactic round of applause! 🌌🚀


Reflecting on Our Stellar Journey

In Part 2, we’ve mastered feature engineering and laid the groundwork for analysis. We split `Cabin` into actionable features, aggregated spending into `LeasureBill`, encoded categorical variables, and dropped redundancies, all while kicking off EDA with a glimpse into how `Age` might predict transportation. 

These steps have primed our dataset for the modeling phase, blending AI innovation with a sci-fi twist that’s out of this world!




Get Ready for the Cosmic Deep Dive: Part 3 Awaits!

But the adventure is far from over—hold onto your spacesuits, because Part 3 is where we’ll plunge even deeper into exploratory data analysis (EDA)! 

We’ll visualize the impact of `LeasureBill`, `CryoSleep`, `Destination`, and more, uncovering hidden patterns and correlations with `Transported` that will shape our model’s success. 

Join me on our YouTube channel, www.youtube.com/@cognitutorai, to stay tuned, and don’t forget to subscribe and hit the notification bell. 

What was your favorite insight from Part 2, viewers? Drop your thoughts in the comments, and let’s gear up for an even more thrilling Part 3 together—our galactic quest is about to reach new heights! 🌟🚀