In [1]:
import sys

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio

import roadmap_python.analysis_utils as utils

print("Python version:", sys.version.split()[0])
print("pandas version:", pd.__version__)
print("numpy version:", np.__version__)
print("matplotlib version:", plt.matplotlib.__version__)

%load_ext autoreload
%autoreload 2
Python version: 3.12.3
pandas version: 2.3.3
numpy version: 2.4.0
matplotlib version: 3.10.8

Idea behind this project is that you and/or your team use MS Project for developing a roadmap, workb reakdown structure, Gantt chart, etc. However, you're not impressed with the timeline view. It's tedious to create a custom view in Power Point, which requires manually re-creating and syncing with the MS Project file. So you decide to try Python to automatically generate a new customizable view from MS Project file's data.

Step 1 - Get MS Project Data¶

MS Project files as-is are not the most suitable for interrogating with Python. Instead, you'll want to export the file into another format. The most sensible one is the XML export option. As I understand it, this XML format is also known as mspdi (MS Project Data Interchange).

I do not have MS Project on my PC. So I asked ChatGPT for help finding an XML file. It pointed me to open-msp-viewer. This project ships with a few samples which can be used for testing. I figured this would be a good place to start.

Step 2 - XML-> Dataframe¶

For a timeline view, I'm only realy interested in the MS Project file's task data. Like the name of the task and when it starts and finishes. This will be caputred in the "tasks_ms_df" dataframe.

I'll also capture the dependency data (links) in case I want to catpure that in the future. This will be captured in the "links_df" (basically a depedency table).

In [2]:
paths = utils.get_paths()
smaple_proj_f = "3PointPlan-example.xml"
tasks_ms_df, links_df = utils.read_mspdi_tasks_and_links(
    paths.raw / "open-msp-viewer-sample" / smaple_proj_f
)

# magnify the dates so span multiple FY, for demo purposes later
factor = 18  # 6x longer overall (tweak until it spans multiple FYs)
anchor = tasks_ms_df["start"].min()
for col in ["start", "finish"]:
    tasks_ms_df[col] = anchor + (tasks_ms_df[col] - anchor) * factor

display(tasks_ms_df)
uid name start finish outline_level wbs summary milestone predecessors
0 0 3 Point Plan - example 2006-10-02 08:30:00 2008-11-18 06:13:12 0 0 True False NaN
1 1 3 Point Plan 2006-10-02 08:30:00 2008-11-18 06:13:12 1 1 True False NaN
2 2 Group Task 1 2006-10-02 08:30:00 2007-06-14 22:25:12 2 1.1 True False NaN
3 3 Detail Task A 2006-10-02 08:30:00 2006-10-21 16:39:36 3 1.1.1 False False NaN
4 4 Detail Task B 2006-10-21 16:39:36 2006-11-26 10:03:36 3 1.1.2 False False [{'predecessor_uid': 3, 'type': 1, 'lag': 0, '...
5 5 Detail Task C 2006-11-26 10:03:36 2007-02-24 05:33:36 3 1.1.3 False False [{'predecessor_uid': 4, 'type': 1, 'lag': 0, '...
6 18 Group Task 2 2007-02-11 12:06:00 2007-06-14 22:25:12 3 1.1.4 True False [{'predecessor_uid': 5, 'type': 1, 'lag': 0, '...
7 19 Detail Task A 2007-02-11 12:06:00 2007-04-05 09:06:00 4 1.1.4.1 False False NaN
8 20 Detail Task B 2007-04-05 09:06:00 2007-06-14 22:25:12 4 1.1.4.2 False False [{'predecessor_uid': 19, 'type': 1, 'lag': 0, ...
9 21 Detail Task C 2007-04-05 09:06:00 2007-04-23 00:06:00 4 1.1.4.3 False False [{'predecessor_uid': 19, 'type': 1, 'lag': 0, ...
10 14 Group Task 3 2007-02-11 12:06:00 2008-01-01 10:07:12 2 1.2 True False NaN
11 15 Detail Task A 2007-06-14 22:25:12 2007-10-16 18:49:12 3 1.2.1 False False [{'predecessor_uid': 18, 'type': 1, 'lag': 0, ...
12 16 Detail Task B 2007-02-11 12:06:00 2007-06-15 03:34:48 3 1.2.2 False False [{'predecessor_uid': 5, 'type': 1, 'lag': 0, '...
13 17 Detail Task C 2007-10-16 18:49:12 2008-01-01 10:07:12 3 1.2.3 False False [{'predecessor_uid': 15, 'type': 1, 'lag': 0, ...
14 10 Group Task 4 2007-06-14 22:25:12 2008-04-16 16:25:12 2 1.3 True False NaN
15 11 Detail Task A 2007-06-14 22:25:12 2007-07-20 21:13:12 3 1.3.1 False False [{'predecessor_uid': 2, 'type': 1, 'lag': 0, '...
16 12 Detail Task B 2008-01-01 10:07:12 2008-02-24 01:07:12 3 1.3.2 False False [{'predecessor_uid': 14, 'type': 1, 'lag': 0, ...
17 13 Detail Task C 2008-02-24 01:07:12 2008-04-16 16:25:12 3 1.3.3 False False [{'predecessor_uid': 11, 'type': 1, 'lag': 0, ...
18 6 Group Task 5 2008-04-16 16:25:12 2008-11-18 06:13:12 2 1.4 True False NaN
19 7 Detail Task A 2008-04-16 16:25:12 2008-06-27 15:13:12 3 1.4.1 False False [{'predecessor_uid': 10, 'type': 1, 'lag': 0, ...
20 8 Detail Task B 2008-06-27 15:13:12 2008-08-03 12:31:12 3 1.4.2 False False [{'predecessor_uid': 7, 'type': 1, 'lag': 0, '...
21 9 Detail Task C 2008-08-03 12:31:12 2008-11-18 06:13:12 3 1.4.3 False False [{'predecessor_uid': 8, 'type': 1, 'lag': 0, '...
22 22 Final Delivery 2008-11-18 06:13:12 2008-11-18 06:13:12 2 1.5 False True [{'predecessor_uid': 6, 'type': 1, 'lag': 0, '...

Step 3 - A simple timeline with plotly express timeline function¶

We'll start with a simple timeline using plotly express "timeline". From the docs: "In a timeline plot, each row of data_frame is represented as a rectangular mark on an x axis of type date, spanning from x_start to x_end".

plotly.express is Plotly’s high-level API: you hand it a tidy dataframe + a few column names, and it builds a full interactive figure for you. Under the hood it returns a normal plotly.graph_objects.Figure, so you can still do all the “serious” customization afterward with fig.update_layout(), fig.update_traces(), etc.

In [3]:
pio.renderers.default = "notebook_connected"  # good default for classic notebooks

df = tasks_ms_df.copy()

# Keep only tasks that can be drawn as bars
df = df[df["start"].notna() & df["finish"].notna()].copy()

# Ensure one row per task on the y-axis (avoid duplicate-name collisions)
# Prefer WBS if present; fall back to UID.
df["task_label"] = df.apply(
    lambda r: f"{r['wbs']} — {r['name']}"
    if pd.notna(r.get("wbs")) and str(r.get("wbs")).strip()
    else f"UID {r['uid']} — {r['name']}",
    axis=1,
)

# Milestones often have start == finish; give them a tiny width so they render
is_milestone = (df["milestone"]) | (df["start"] == df["finish"])
df["finish_plot"] = df["finish"]
df.loc[is_milestone, "finish_plot"] = df.loc[is_milestone, "finish"] + pd.Timedelta(weeks=2)

# Pick a stable order (WBS-ish if available, otherwise by start)
sort_cols = ["start", "outline_level", "uid"]
if "wbs" in df.columns and df["wbs"].notna().any():
    sort_cols = ["wbs", "start", "outline_level", "uid"]

df = df.sort_values(sort_cols, kind="stable")

fig = px.timeline(
    df,
    x_start="start",
    x_end="finish_plot",
    y="task_label",
    hover_data={
        "uid": True,
        "name": True,
        "wbs": True,
        "outline_level": True,
        "summary": True,
        "milestone": True,
        "start": "|%Y-%m-%d",
        "finish": "|%Y-%m-%d",
        "finish_plot": False,
    },
)

fig.update_yaxes(autorange="reversed")  # top-to-bottom like a plan
fig.update_layout(
    height=max(750, 18 * len(df)),
    xaxis_rangeslider_visible=True,  # handy for zooming
    margin=dict(l=20, r=20, t=40, b=20),
)

fig

Step 4 - Customization with a data dictionary¶

So we have a minimal viable product of sorts. Already it is pretty slick but not anything that MS Project couldn't do. I now want to add lots of customization.

I'll use a data dictionary to define my desired style. These will primarily be (1) shape of label bars and milestones, (2) colors/patterns/gradients of task bars and milestones, and (3) labels used for tasks/milestones (and therefore, which row they'll appear on).

In [4]:
df = tasks_ms_df.copy()
# Keep only tasks that can be drawn as bars
df = df[df["start"].notna() & df["finish"].notna()].copy()
dd_path = paths.raw / "roadmap_data_dictionary_tasks.csv"
dd = pd.read_csv(dd_path)
dd = dd[
    [
        "uid",
        "y_id",
        "y_label",
        "item_type",
        "group",
        "shape",
        "color",
        "include",
        "fill_alpha",
        "bar_pattern",
        "bar_line_color",
        "bar_line_width",
        "marker_size",
        "marker_line_color",
        "marker_line_width",
    ]
].copy()

df_p = df.merge(dd, on="uid", how="left")
# display(df_p)
# Fallbacks so you can leave blanks in the CSV and still render
df_p["y_label"] = df_p["y_label"].fillna(df_p["name"])
df_p["item_type"] = df_p["item_type"].fillna(
    df_p["milestone"].fillna(False).map(lambda m: "milestone" if m else "task")
)
df_p["shape"] = df_p["shape"].fillna(
    df_p["item_type"].map(lambda t: "star" if t == "milestone" else "rectangle")
)
# make sure it's clean 0..1 floats
df_p["fill_alpha"] = pd.to_numeric(df_p["fill_alpha"], errors="coerce").fillna(1.0).clip(0, 1)

df_p = df_p[(df_p["include"])]

df_p.head()
Out[4]:
uid name start finish outline_level wbs summary milestone predecessors y_id ... shape color include fill_alpha bar_pattern bar_line_color bar_line_width marker_size marker_line_color marker_line_width
2 2 Group Task 1 2006-10-02 08:30:00 2007-06-14 22:25:12 2 1.1 True False NaN 1.1 ... rectangle #EF553B True 1.00 NaN #333333 2 NaN NaN NaN
3 3 Detail Task A 2006-10-02 08:30:00 2006-10-21 16:39:36 3 1.1.1 False False NaN 1.1.1 ... rectangle #EF553B True 0.85 NaN #333333 1 NaN NaN NaN
4 4 Detail Task B 2006-10-21 16:39:36 2006-11-26 10:03:36 3 1.1.2 False False [{'predecessor_uid': 3, 'type': 1, 'lag': 0, '... 1.1.2 ... rectangle #EF553B True 0.85 NaN #333333 1 NaN NaN NaN
5 5 Detail Task C 2006-11-26 10:03:36 2007-02-24 05:33:36 3 1.1.3 False False [{'predecessor_uid': 4, 'type': 1, 'lag': 0, '... 1.1.3 ... rectangle #EF553B True 0.85 NaN #333333 1 NaN NaN NaN
6 18 Group Task 2 2007-02-11 12:06:00 2007-06-14 22:25:12 3 1.1.4 True False [{'predecessor_uid': 5, 'type': 1, 'lag': 0, '... 1.1.4 ... rectangle #EF553B True 0.85 NaN #333333 1 NaN NaN NaN

5 rows × 23 columns

In [5]:
#################################
# Tasks
#################################
df_tasks = df_p[
    (df_p["item_type"] == "task") & df_p["start"].notna() & df_p["finish"].notna()
].copy()
# df_tasks = df_tasks[(df_tasks["outline_level"] >= 2)]

y_col = "y_id"  # whatever you’re using on your y-axis
fig = px.timeline(
    df_tasks,
    x_start="start",
    x_end="finish",
    y=y_col,
    hover_data=["uid", "wbs", "outline_level", "summary"],
    text="name",  # labels from your CSV
)

# Put text inside bars (timeline() doesn't take textposition directly)
fig.update_traces(
    textposition="inside", insidetextanchor="middle", selector=dict(type="bar")
)  # :contentReference[oaicite:3]{index=3}

fig.update_traces(
    marker_color=df_tasks["color"].tolist(),
    marker_pattern_shape=df_tasks["bar_pattern"].fillna("").tolist(),
    marker_opacity=df_tasks["fill_alpha"].tolist(),
    selector=dict(type="bar"),
)

# y tick labels
fig.update_yaxes(
    tickmode="array",
    tickvals=df_tasks["y_id"].tolist(),
    ticktext=df_tasks["y_label"].tolist(),
)

# # Apply per-bar colors from the dictionary (no legend, but obeys CSV)
# fig.update_traces(
#     marker=dict(color=df_tasks["color"].tolist()),
#     selector=dict(type="bar")
# )  # :contentReference[oaicite:4]{index=4}


#################################
# Milestones
#################################
df_ms = df_p[(df_p["item_type"] == "milestone") & df_p["start"].notna()].copy()

fig.add_trace(
    go.Scatter(
        x=df_ms["start"],
        y=df_ms[y_col],
        mode="markers",
        marker=dict(
            symbol=df_ms["shape"].tolist(),  # e.g. "star"
            color=df_ms["color"].tolist(),  # e.g. "gold" or "#FFD700"
            size=14,
            line=dict(width=1, color="black"),
        ),
        text=df_ms["name"],
        hovertemplate="%{text}<extra></extra>",
        name="Milestones",
    )
)  # :contentReference[oaicite:5]{index=5}


#################################
# Overall Figure Styling
#################################

# Update timeline to show FY quarters
fy_start_month = 10  # change if your FY starts in a different month

xmin = df_p["start"].min()
xmax = (df_p["finish_plot"] if "finish_plot" in df_p.columns else df_p["finish"]).max()

fig.update_yaxes(autorange="reversed")  # top-to-bottom like a plan
fig.update_layout(
    height=max(1000, 18 * len(df)),
    xaxis_rangeslider_visible=True,  # handy for zooming
    margin=dict(l=20, r=20, t=40, b=20),
)


fig

Step 4a - Make it a function¶

Taking what built above, throw it into a function which has data dictionaries as an input. Demo below is another way to style the timeline. Demonstrating that changing the data disctionary allows for changing the task and milestone style and grouping/order.

In [6]:
df = tasks_ms_df.copy()
ddv2_path = paths.raw / "roadmap_data_dictionary_tasks_v2.csv"
# display(df)
fig, df_merged = utils.build_roadmap_timeline_from_dd(df, ddv2_path, y_col="y_id")

fig.update_layout(
    height=max(1000, 18 * len(df)),
    xaxis_rangeslider_visible=True,  # handy for zooming
    margin=dict(l=20, r=20, t=40, b=20),
)

More Complexity with a DoD Acquisition Timeline Example¶

You can imagine a scenario where a acquisition program timeline is being maintained in MS Project as the "source of truth" and these views created with Python can create customized "artifacts", e.g., those for presentations with selective information etc. I asked ChatGPT to generate an XML file of a generic DoD acquistiion timeline and a corresponding data dictionary. It gave a pretty good result which I can manually refine/edit.

In [7]:
paths = utils.get_paths()
smaple_proj_f = "dod_acquisition_lifecycle_demo.xml"
tasks_ms_df, links_df = utils.read_mspdi_tasks_and_links(paths.raw / smaple_proj_f)
df = tasks_ms_df.copy()
# Keep only tasks that can be drawn as bars
df = df[df["start"].notna() & df["finish"].notna()].copy()
display(df.head())
/home/autrpy/projects/2025-roadmap-python/src/roadmap_python/analysis_utils.py:250: RuntimeWarning:

invalid value encountered in cast

/home/autrpy/projects/2025-roadmap-python/src/roadmap_python/analysis_utils.py:253: RuntimeWarning:

invalid value encountered in cast

uid name start finish outline_level wbs summary milestone predecessors
0 0 Generic DoD Program — Acquisition Lifecycle (D... 2025-10-01 08:00:00 2031-09-30 17:00:00 1 1 True False NaN
1 1 Materiel Development Decision (MDD) 2025-10-15 13:00:00 2025-10-15 13:00:00 2 1.1 False True NaN
2 2 Materiel Solution Analysis (MSA) Phase 2025-10-16 08:00:00 2026-03-31 17:00:00 2 1.2 True False [{'predecessor_uid': 1, 'type': 1, 'lag': -922...
3 3 Technology Maturation & Risk Reduction (TMRR) ... 2026-04-01 08:00:00 2026-12-31 17:00:00 2 1.3 True False NaN
4 4 Engineering & Manufacturing Development (EMD) ... 2027-01-01 08:00:00 2028-09-30 17:00:00 2 1.4 True False NaN
In [8]:
ddv_path = paths.raw / "roadmap_data_dictionary_dod_demo_v1.csv"
fig, df_merged = utils.build_roadmap_timeline_from_dd(df, ddv_path, y_col="y_id", y_label="y_label")

fig.update_layout(
    height=max(2000, 18 * len(df)),
    xaxis_rangeslider_visible=True,  # handy for zooming
    margin=dict(l=20, r=20, t=40, b=20),
)

The above represents a complete view generated by ChatGPT that I lightly edited. Because this timeline is so complex, I asked ChatGPT to create a more focused timeline view via an updated data dictionary. The XML input is the same, only the data dictionary was updated. Tasks and milestones are more logically grouped and actiites are hidden to reduce the complexity.

In [9]:
ddv_path = paths.raw / "roadmap_data_dictionary_dod_demo_v4.csv"
fig, df_merged = utils.build_roadmap_timeline_from_dd(
    df, ddv_path, bar_text_wrap_width=32, xaxis_fy_quarters=True
)

fig.update_layout(
    height=max(1000, 18 * len(df)),
    margin=dict(l=20, r=20, t=40, b=20),
)

Next Steps¶

Explore further interactite features, visualizing dependencies, and export options.