Python is the most widely used language for Data Analysts because of its simple syntax, powerful libraries (Pandas, NumPy, Matplotlib, Scikit-learn), and flexibility across ETL, reporting, visualization, and modeling.
This guide consolidates basic Python interview questions, explains key data structures (list, tuple, set, dictionary, array), introduces Pandas essentials (joins, groupby, pivot), and ends with FAQs and cheat sheets to clear up confusion during interviews.
1. What is Python, and what are its key features?
Python is a high-level, interpreted, object-oriented language designed for readability and productivity.
Key Features
- ✅ Simple Syntax – Easy for beginners, efficient for experts.
- ✅ Interpreted – Executes code line by line, easier debugging.
- ✅ Dynamic Typing – No need to declare variable types explicitly.
- ✅ Extensive Libraries – Pandas, NumPy, Scikit-learn, Django.
- ✅ Cross-Platform – Works on Windows, macOS, Linux.
Data Analyst Connection
- Pandas → data cleaning and wrangling.
- NumPy → numerical computations.
- Matplotlib/Seaborn → data visualization.
- SQLAlchemy → database connectivity.
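To see how these pieces fit together, here is a minimal, illustrative sketch of a typical analyst workflow (the data values and column names are made up for the example):
```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sales data standing in for a real CSV or database load.
df = pd.DataFrame({"Region": ["North", "South", "North"],
                   "Sales": [500, 300, 450]})

# Pandas: aggregate sales by region.
totals = df.groupby("Region")["Sales"].sum()

# Matplotlib: visualize the result.
totals.plot(kind="bar", title="Sales by Region")
plt.show()
```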
2. Core Data Structures in Python
📌 Comparison Table
Feature | List | Tuple | Set | Dictionary | Array (NumPy) |
---|---|---|---|---|---|
Definition | Ordered, mutable collection | Ordered, immutable collection | Unordered, unique elements | Key-value mapping | Homogeneous, numeric arrays |
Syntax Example | `a = [1, 2, 3, 3]` | `t = (1, 2, 3, 3)` | `s = {1, 2, 3}` | `d = {"a": 1, "b": 2}` | `arr = np.array([1, 2, 3])` |
Order | ✅ Preserves order | ✅ Preserves order | ❌ Unordered | ✅ Preserves order (3.7+) | ✅ Preserves order |
Mutable | ✅ Yes: `a[0] = 10` | ❌ No (immutable) | ✅ Add/remove: `s.add(4)` | ✅ Yes: `d["c"] = 3` | ✅ Yes: `arr[0] = 10` |
Duplicates | ✅ Allowed: `[1, 1, 2]` | ✅ Allowed: `(1, 1, 2)` | ❌ Not allowed: `{1, 2}` | ❌ Keys unique, values can repeat: `{"a": 1, "b": 1}` | ✅ Allowed: `[1, 1, 2]` |
Indexing | ✅ By index: `a[1]` → 2 | ✅ By index: `t[1]` → 2 | ❌ Not available | ✅ By key: `d["a"]` → 1 | ✅ By index: `arr[1]` → 2 |
Methods | `.append(4)`, `.sort()` | `.count(2)`, `.index(3)` | `.union()`, `.intersection()` | `.keys()`, `.values()` | `.mean()`, `.reshape()` |
Performance | Moderate | Faster (immutable) | Very fast membership test | Fast lookups (hash table) | Extremely fast (vectorized) |
Best For | General-purpose collections | Fixed schema, constants | Unique IDs, categories | Mappings, JSON-style data | Numeric-heavy analytics |
Analyst Example | `[1200, 800, 450]` → raw sales | `(2023, "Q1", 5000)` → report key | `{1001, 1002, 1003}` → unique IDs | `{101: "Electronics"}` → product map | `[[100, 200], [300, 400]]` → sales matrix |
📌 Dimensionality
- List → 1D: `[1, 2, 3]`; 2D via nesting: `[[1, 2], [3, 4]]` (inefficient for math).
- Tuple → Similar to a list, but immutable.
- Set → Always 1D (no index).
- Dictionary → 1D mapping; nested dicts simulate 2D.
- NumPy Array → True 1D, 2D, 3D, nD (efficient for matrix ops).
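To see why nested lists are "inefficient for math," here is a short sketch contrasting a list-of-lists loop with the vectorized NumPy equivalent (values are illustrative):
```python
import numpy as np

# Nested lists need explicit loops for elementwise math...
matrix = [[1, 2], [3, 4]]
doubled = [[x * 2 for x in row] for row in matrix]   # [[2, 4], [6, 8]]

# ...while a NumPy array vectorizes the same operation in one step.
arr = np.array(matrix)
doubled_arr = arr * 2        # array([[2, 4], [6, 8]])
col_sums = arr.sum(axis=0)   # array([4, 6]) → per-column totals
```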
📌 Methods Cheat Sheet with 1D & 2D Examples
🔹 List
```python
# 1D List
a = [3, 1, 2]
a.append(4)   # [3, 1, 2, 4]
a.sort()      # [1, 2, 3, 4]

# 2D List (List of Lists)
matrix = [[1, 2], [3, 4], [5, 6]]
matrix[1][0]                 # 3
[row[1] for row in matrix]   # [2, 4, 6] → column extraction
```
✅ Use Case: Representing tabular data before loading into Pandas.
🔹 Tuple
```python
# 1D Tuple
t = (1, 2, 3, 2)
t.count(2)   # 2
t.index(3)   # 2

# Nested Tuple (Tuple of Tuples)
coords = ((10, 20), (30, 40))
coords[1][0]   # 30
```
✅ Use Case: Immutable coordinates, fixed reference mappings.
🔹 Set
```python
# 1D Set
s = {1, 2, 2, 3}          # duplicates collapse → {1, 2, 3}
s.add(4)                  # {1, 2, 3, 4}
s.intersection({2, 5})    # {2}

# Set of frozensets (sets are unhashable, so nest frozensets instead)
fs = {frozenset([1, 2]), frozenset([3, 4])}
```
✅ Use Case: Deduplication, unique groups, fast membership tests.
🔹 Dictionary
```python
# 1D Dictionary
d = {"Alice": 100, "Bob": 200}
d["Charlie"] = 300
list(d.keys())   # ['Alice', 'Bob', 'Charlie']

# Nested Dictionary
students = {
    "Alice": {"Math": 90, "Eng": 85},
    "Bob": {"Math": 78, "Eng": 92},
}
students["Alice"]["Math"]   # 90
```
✅ Use Case: Key-value storage, hierarchies like customer → orders.
🔹 NumPy Array
```python
import numpy as np

# 1D Array
arr = np.array([1, 2, 3, 4])
arr.mean()   # 2.5

# 2D Array (Matrix)
mat = np.array([[1, 2], [3, 4], [5, 6]])
mat.shape    # (3, 2)
mat[:, 1]    # array([2, 4, 6]) → column slice

# 3D Array
cube = np.arange(8).reshape(2, 2, 2)
```
✅ Use Case: Efficient math on 2D/3D data (images, stats, regression input).
🚀 Summary of “Why & When”
Structure | Best For | Pros | Cons | Analyst Approach |
---|---|---|---|---|
List | Sequential, tabular-like scratch data | Flexible, indexable | Duplicates, slower on large sets | Start raw data wrangling |
Tuple | Fixed records | Immutable, safe | Can’t update | Use for stable lookups (coordinates, IDs) |
Set | Unique values | Deduplication, fast membership | Unordered, no indexing | Find distinct customers/products |
Dict | Key → value | Fast, readable | Memory heavy | Hierarchies (customer → orders) |
NumPy | Math, ML input | Vectorized, memory-efficient | Rigid types | Statistical ops, transformations |
📌 Data Analyst Use Cases
Collection | Example Scenario |
---|---|
List | Store raw sales records before cleaning → `[1200, 800, 450]`. |
Tuple | Fixed `(Year, Quarter, Revenue)` values for reporting. |
Set | Deduplicate customer IDs quickly: `set(customer_list)`. |
Dictionary | Product → Category mapping → `{101: "Electronics"}`. |
Array | Perform matrix operations, e.g., sales by Region × Month. |
3. The `__init__()` Method in Python
The `__init__()` method is a constructor in OOP used to initialize objects.
```python
class SalesRecord:
    def __init__(self, date, amount, region):
        self.date = date
        self.amount = amount
        self.region = region

record = SalesRecord("2024-01-01", 500, "North")
print(record.region)   # North
```
Analyst Use Case – Create custom objects such as `Customer` or `Transaction` classes to structure datasets in ETL pipelines.
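As an illustrative sketch of that use case (reusing the `SalesRecord` class above), such objects can feed straight into a DataFrame:
```python
import pandas as pd

records = [SalesRecord("2024-01-01", 500, "North"),
           SalesRecord("2024-01-02", 300, "South")]

# vars(obj) returns the instance's attribute dict, giving one row per record.
df = pd.DataFrame([vars(r) for r in records])   # columns: date, amount, region
```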
4. Mutable vs Immutable
Type | Examples | Change Allowed? |
---|---|---|
Mutable | List, Dict, Set | ✅ Yes |
Immutable | Int, Float, String, Tuple | ❌ No |
Example
- Store transaction logs in lists/dicts (mutable).
- Store `(Year, Month, ProductID)` keys in tuples (immutable, fixed schema).
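A quick demonstration of the difference (values are illustrative):
```python
log = [("2024-01", 500)]
log.append(("2024-02", 300))   # ✅ mutable: the list grows in place

key = (2024, 1, "P-101")
try:
    key[0] = 2025              # ❌ immutable: tuples reject item assignment
except TypeError as err:
    print(err)                 # 'tuple' object does not support item assignment
```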
5. Comprehensions
List Comprehension
```python
squares = [i**2 for i in range(5)]   # [0, 1, 4, 9, 16]
```
Dictionary Comprehension
```python
sales = {region: revenue for region, revenue in [("A", 1000), ("B", 2000)]}
```
Generator Expression (there is no true "tuple comprehension"; parentheses create a generator)
```python
gen = (i for i in range(5))
list(gen)   # [0, 1, 2, 3, 4]
```
Analyst Example – Filter out negative sales:
```python
# assuming sales here is a list of numbers, e.g. [1200, -50, 800]
clean_sales = [x for x in sales if x > 0]
```
6. Global Interpreter Lock (GIL)
- Allows only one thread to execute Python bytecode at a time within a single process.
- ❌ Limitation → CPU-bound tasks (slow for heavy math).
- ✅ Works fine → I/O-bound tasks (API calls, file operations).
Analyst Example
- Web scraping multiple APIs → threading is effective.
- Heavy analytics models → prefer multiprocessing or NumPy vectorization.
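A minimal sketch of that threading-vs-multiprocessing distinction, using only the standard library (the URL and workload sizes are placeholders):
```python
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
import urllib.request

def fetch(url):
    # I/O-bound: the GIL is released while waiting on the network.
    return urllib.request.urlopen(url).read()

def crunch(n):
    # CPU-bound: pure-Python math holds the GIL, so threads won't speed it up.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    urls = ["https://example.com"] * 4
    with ThreadPoolExecutor() as pool:        # threads work well for I/O
        pages = list(pool.map(fetch, urls))
    with ProcessPoolExecutor() as pool:       # separate processes sidestep the GIL
        results = list(pool.map(crunch, [10**6] * 4))
```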
7. Conversions Between Formats
Conversion | Example Code | Resulting Output | Analyst Use Case |
---|---|---|---|
List → Set | `ids = [101, 101, 102, 103]; set(ids)` | `{101, 102, 103}` | Remove duplicate customer IDs from survey responses or transaction logs. |
Dict → DataFrame | `d = {"A": 100, "B": 200}; pd.DataFrame(d.items())` | Two-column table of keys and values | Convert a mapping into a tabular report for analysis or export to Excel/CSV. |
List → NumPy Array | `sales = [100, 200, 300]; np.array(sales)` | `array([100, 200, 300])` | Perform vectorized operations and statistical analysis on sales figures. |
Array → List | `arr = np.array([1, 2, 3]); arr.tolist()` | `[1, 2, 3]` | Convert numeric arrays back to Python lists for JSON/CSV export or API responses. |
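Put together as one runnable sketch (using the same example values as the table):
```python
import numpy as np
import pandas as pd

ids = [101, 101, 102, 103]
unique_ids = set(ids)                 # {101, 102, 103}

d = {"A": 100, "B": 200}
report = pd.DataFrame(list(d.items()), columns=["Key", "Value"])

sales = [100, 200, 300]
arr = np.array(sales)                 # vectorized math now possible

back_to_list = arr.tolist()           # [100, 200, 300] → JSON/CSV-ready
```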
8. Pandas Essentials
Series
```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(s["b"])   # 20
```
DataFrame
```python
df = pd.DataFrame({"Name": ["Alice", "Bob"], "Sales": [100, 200]})
```
Core Operations
Method | Example | Purpose |
---|---|---|
`df.head()` | `df.head(5)` | Preview first 5 rows |
`df.info()` | `df.info()` | Schema + memory usage |
`df.describe()` | `df.describe()` | Summary statistics |
`df.isnull().sum()` | `df.isnull().sum()` | Check missing values |
`df.dropna()` | `df.dropna()` | Remove null rows |
`df.fillna()` | `df.fillna(0)` | Replace nulls with 0 |
`df.groupby()` | `df.groupby("Region")["Sales"].sum()` | Aggregate sales by region |
`df.sort_values()` | `df.sort_values("Sales")` | Sort by Sales column |
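A short sketch chaining several of these operations on hypothetical sales data:
```python
import pandas as pd

df = pd.DataFrame({"Region": ["North", "South", "North", "South"],
                   "Sales": [500, None, 450, 300]})

df.isnull().sum()             # Sales: 1 missing value
clean = df.fillna(0)          # replace the null with 0
totals = (clean.groupby("Region")["Sales"]
               .sum()
               .sort_values(ascending=False))
```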
Joins in Pandas
Join Type | Pandas Code | Example (df1 vs df2) | Analyst Use Case |
---|---|---|---|
Inner Join | `pd.merge(df1, df2, on="id", how="inner")` | df1 IDs: [1, 2, 3], df2 IDs: [2, 3, 4] → Result: [2, 3] | Customers with transactions |
Left Join | `pd.merge(df1, df2, on="id", how="left")` | df1 IDs: [1, 2, 3] → Keeps all [1, 2, 3] | All customers + purchases |
Right Join | `pd.merge(df1, df2, on="id", how="right")` | df2 IDs: [2, 3, 4] → Keeps all [2, 3, 4] | All sales records |
Outer Join | `pd.merge(df1, df2, on="id", how="outer")` | df1 [1, 2, 3], df2 [2, 3, 4] → Result: [1, 2, 3, 4] | Union of both |
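As a runnable sketch with the IDs from the table (customer names and amounts are invented):
```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2, 3], "customer": ["Ann", "Ben", "Cal"]})
df2 = pd.DataFrame({"id": [2, 3, 4], "amount": [250, 400, 150]})

inner = pd.merge(df1, df2, on="id", how="inner")   # ids 2, 3
left = pd.merge(df1, df2, on="id", how="left")     # ids 1, 2, 3 (amount NaN for 1)
outer = pd.merge(df1, df2, on="id", how="outer")   # ids 1, 2, 3, 4
```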
GroupBy and Pivot
```python
# Group sales by region
df.groupby("Region")["Sales"].sum()

# Pivot: Regions vs Products (each Region/Product pair must be unique)
df.pivot(index="Region", columns="Product", values="Sales")
```
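Note that `df.pivot()` raises an error if a Region/Product pair appears more than once; when duplicates are possible, `pivot_table()` aggregates them instead:
```python
# pivot_table aggregates duplicate Region/Product pairs instead of raising
df.pivot_table(index="Region", columns="Product",
               values="Sales", aggfunc="sum")
```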
9. FAQs (Common Confusions)
- Are lists 1D or 2D? → Lists are 1D; nested lists simulate 2D; use NumPy for real 2D.
- Why tuples if lists exist? → Tuples are immutable (faster, safer).
- Why sets if lists can remove duplicates? → Sets offer O(1) membership lookup.
- Are dictionaries 2D? → Single dict = 1D; nested dicts = 2D like a table.
- Which structure is closest to SQL tables? → List of dicts or Pandas DataFrame.
- When to use arrays over lists? → Use arrays for numeric-heavy computations.
- Difference between merge and concat? → Merge = SQL join logic; concat = stack frames vertically/horizontally (see the sketch below).
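To make the merge-vs-concat distinction concrete, a short sketch (frames are illustrative):
```python
import pandas as pd

jan = pd.DataFrame({"id": [1, 2], "sales": [100, 200]})
feb = pd.DataFrame({"id": [3, 4], "sales": [150, 250]})

stacked = pd.concat([jan, feb], ignore_index=True)   # concat: stack rows → 4 rows

names = pd.DataFrame({"id": [1, 2, 3, 4], "name": ["A", "B", "C", "D"]})
joined = pd.merge(stacked, names, on="id")           # merge: SQL-style key join
```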
10. Analyst Interview Cheat Sheet
Question | Short Answer |
---|---|
Why use a set vs list? | Deduplication and faster membership test. |
Why use a tuple? | Fixed schema like `(year, month, product_id)`. |
Why use a dictionary? | Fast lookups, mapping codes to names. |
Why use arrays? | Efficient vectorized computations. |
List comprehension use? | Quick filtering/cleaning in ETL. |
GIL impact? | Good for I/O, bad for CPU-bound tasks. |
Pandas groupby use? | Aggregations (e.g., sales by region). |
Pandas pivot use? | Reshape data like Excel pivot tables. |
Final Takeaway
- List → staging raw data.
- Tuple → fixed, unchangeable schema.
- Set → uniqueness and fast membership checks.
- Dictionary → mapping keys to values.
- Array (NumPy) → numeric efficiency at scale.
- Pandas → indispensable for tabular wrangling, joins, and aggregations.
Mastering these fundamentals, along with their analyst-focused use cases, ensures you can confidently answer both technical and applied interview questions.