Converting MS Project Data to an Interactive Timeline w/ Python, Plotly

11 Jan 2026

import sys

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio

import roadmap_python.analysis_utils as utils

print("Python version:", sys.version.split()[0])
print("pandas version:", pd.__version__)
print("numpy version:", np.__version__)
print("matplotlib version:", plt.matplotlib.__version__)

%load_ext autoreload
%autoreload 2

Python version: 3.12.3
pandas version: 2.3.3
numpy version: 2.4.0
matplotlib version: 3.10.8

Converting MS Project Data to an Interactive Timeline w/ Python, Plotly

Link to Notebook (HTML)

Interactive outputs

These interactive Plotly exports are saved as standalone HTML files in assets/ and embedded below in the relevant sections:

fig0-simple-timeline.html
fig1-timeline-add-datadict.html
fig2-timeline-altview.html
fig3-timeline-dodacq-view1.html
fig4-timeline-dodacq-view2.html

Idea behind this project is that you and/or your team use MS Project for developing a roadmap, workb reakdown structure, Gantt chart, etc. However, you’re not impressed with the timeline view. It’s tedious to create a custom view in Power Point, which requires manually re-creating and syncing with the MS Project file. So you decide to try Python to automatically generate a new customizable view from MS Project file’s data.

Step 1 - Get MS Project Data

MS Project files as-is are not the most suitable for interrogating with Python. Instead, you’ll want to export the file into another format. The most sensible one is the XML export option. As I understand it, this XML format is also known as mspdi (MS Project Data Interchange).

I do not have MS Project on my PC. So I asked ChatGPT for help finding an XML file. It pointed me to open-msp-viewer. This project ships with a few samples which can be used for testing. I figured this would be a good place to start.

Step 2 - XML-> Dataframe

For a timeline view, I’m only realy interested in the MS Project file’s task data. Like the name of the task and when it starts and finishes. This will be caputred in the “tasks_ms_df” dataframe.

I’ll also capture the dependency data (links) in case I want to catpure that in the future. This will be captured in the “links_df” (basically a depedency table).

paths = utils.get_paths()
smaple_proj_f = "3PointPlan-example.xml"
tasks_ms_df, links_df = utils.read_mspdi_tasks_and_links(
    paths.raw / "open-msp-viewer-sample" / smaple_proj_f
)

# magnify the dates so span multiple FY, for demo purposes later
factor = 18  # 6x longer overall (tweak until it spans multiple FYs)
anchor = tasks_ms_df["start"].min()
for col in ["start", "finish"]:
    tasks_ms_df[col] = anchor + (tasks_ms_df[col] - anchor) * factor

display(tasks_ms_df)

	uid	name	start	finish	outline_level	wbs	summary	milestone	predecessors
0	0	3 Point Plan - example	2006-10-02 08:30:00	2008-11-18 06:13:12	0	0	True	False	NaN
1	1	3 Point Plan	2006-10-02 08:30:00	2008-11-18 06:13:12	1	1	True	False	NaN
2	2	Group Task 1	2006-10-02 08:30:00	2007-06-14 22:25:12	2	1.1	True	False	NaN
3	3	Detail Task A	2006-10-02 08:30:00	2006-10-21 16:39:36	3	1.1.1	False	False	NaN
4	4	Detail Task B	2006-10-21 16:39:36	2006-11-26 10:03:36	3	1.1.2	False	False	[{'predecessor_uid': 3, 'type': 1, 'lag': 0, '...
5	5	Detail Task C	2006-11-26 10:03:36	2007-02-24 05:33:36	3	1.1.3	False	False	[{'predecessor_uid': 4, 'type': 1, 'lag': 0, '...
6	18	Group Task 2	2007-02-11 12:06:00	2007-06-14 22:25:12	3	1.1.4	True	False	[{'predecessor_uid': 5, 'type': 1, 'lag': 0, '...
7	19	Detail Task A	2007-02-11 12:06:00	2007-04-05 09:06:00	4	1.1.4.1	False	False	NaN
8	20	Detail Task B	2007-04-05 09:06:00	2007-06-14 22:25:12	4	1.1.4.2	False	False	[{'predecessor_uid': 19, 'type': 1, 'lag': 0, ...
9	21	Detail Task C	2007-04-05 09:06:00	2007-04-23 00:06:00	4	1.1.4.3	False	False	[{'predecessor_uid': 19, 'type': 1, 'lag': 0, ...
10	14	Group Task 3	2007-02-11 12:06:00	2008-01-01 10:07:12	2	1.2	True	False	NaN
11	15	Detail Task A	2007-06-14 22:25:12	2007-10-16 18:49:12	3	1.2.1	False	False	[{'predecessor_uid': 18, 'type': 1, 'lag': 0, ...
12	16	Detail Task B	2007-02-11 12:06:00	2007-06-15 03:34:48	3	1.2.2	False	False	[{'predecessor_uid': 5, 'type': 1, 'lag': 0, '...
13	17	Detail Task C	2007-10-16 18:49:12	2008-01-01 10:07:12	3	1.2.3	False	False	[{'predecessor_uid': 15, 'type': 1, 'lag': 0, ...
14	10	Group Task 4	2007-06-14 22:25:12	2008-04-16 16:25:12	2	1.3	True	False	NaN
15	11	Detail Task A	2007-06-14 22:25:12	2007-07-20 21:13:12	3	1.3.1	False	False	[{'predecessor_uid': 2, 'type': 1, 'lag': 0, '...
16	12	Detail Task B	2008-01-01 10:07:12	2008-02-24 01:07:12	3	1.3.2	False	False	[{'predecessor_uid': 14, 'type': 1, 'lag': 0, ...
17	13	Detail Task C	2008-02-24 01:07:12	2008-04-16 16:25:12	3	1.3.3	False	False	[{'predecessor_uid': 11, 'type': 1, 'lag': 0, ...
18	6	Group Task 5	2008-04-16 16:25:12	2008-11-18 06:13:12	2	1.4	True	False	NaN
19	7	Detail Task A	2008-04-16 16:25:12	2008-06-27 15:13:12	3	1.4.1	False	False	[{'predecessor_uid': 10, 'type': 1, 'lag': 0, ...
20	8	Detail Task B	2008-06-27 15:13:12	2008-08-03 12:31:12	3	1.4.2	False	False	[{'predecessor_uid': 7, 'type': 1, 'lag': 0, '...
21	9	Detail Task C	2008-08-03 12:31:12	2008-11-18 06:13:12	3	1.4.3	False	False	[{'predecessor_uid': 8, 'type': 1, 'lag': 0, '...
22	22	Final Delivery	2008-11-18 06:13:12	2008-11-18 06:13:12	2	1.5	False	True	[{'predecessor_uid': 6, 'type': 1, 'lag': 0, '...

Step 3 - A simple timeline with plotly express timeline function

We’ll start with a simple timeline using plotly express “timeline”. From the docs: “In a timeline plot, each row of data_frame is represented as a rectangular mark on an x axis of type date, spanning from x_start to x_end”.

plotly.express is Plotly’s high-level API: you hand it a tidy dataframe + a few column names, and it builds a full interactive figure for you. Under the hood it returns a normal plotly.graph_objects.Figure, so you can still do all the “serious” customization afterward with fig.update_layout(), fig.update_traces(), etc.

pio.renderers.default = "notebook_connected"  # good default for classic notebooks

df = tasks_ms_df.copy()

# Keep only tasks that can be drawn as bars
df = df[df["start"].notna() & df["finish"].notna()].copy()

# Ensure one row per task on the y-axis (avoid duplicate-name collisions)
# Prefer WBS if present; fall back to UID.
df["task_label"] = df.apply(
    lambda r: f"{r['wbs']} — {r['name']}"
    if pd.notna(r.get("wbs")) and str(r.get("wbs")).strip()
    else f"UID {r['uid']} — {r['name']}",
    axis=1,
)

# Milestones often have start == finish; give them a tiny width so they render
is_milestone = (df["milestone"]) | (df["start"] == df["finish"])
df["finish_plot"] = df["finish"]
df.loc[is_milestone, "finish_plot"] = df.loc[is_milestone, "finish"] + pd.Timedelta(weeks=2)

# Pick a stable order (WBS-ish if available, otherwise by start)
sort_cols = ["start", "outline_level", "uid"]
if "wbs" in df.columns and df["wbs"].notna().any():
    sort_cols = ["wbs", "start", "outline_level", "uid"]

df = df.sort_values(sort_cols, kind="stable")

fig = px.timeline(
    df,
    x_start="start",
    x_end="finish_plot",
    y="task_label",
    hover_data={
        "uid": True,
        "name": True,
        "wbs": True,
        "outline_level": True,
        "summary": True,
        "milestone": True,
        "start": "|%Y-%m-%d",
        "finish": "|%Y-%m-%d",
        "finish_plot": False,
    },
)

fig.update_yaxes(autorange="reversed")  # top-to-bottom like a plan
fig.update_layout(
    height=max(750, 18 * len(df)),
    xaxis_rangeslider_visible=True,  # handy for zooming
    margin=dict(l=20, r=20, t=40, b=20),
)

display(fig)
fig.write_html(paths.figures / "fig0-simple-timeline.html")

Interactive chart (embedded): Open in a new tab

Step 4 - Customization with a data dictionary

So we have a minimal viable product of sorts. Already it is pretty slick but not anything that MS Project couldn’t do. I now want to add lots of customization.

I’ll use a data dictionary to define my desired style. These will primarily be (1) shape of label bars and milestones, (2) colors/patterns/gradients of task bars and milestones, and (3) labels used for tasks/milestones (and therefore, which row they’ll appear on).

df = tasks_ms_df.copy()
# Keep only tasks that can be drawn as bars
df = df[df["start"].notna() & df["finish"].notna()].copy()
dd_path = paths.raw / "roadmap_data_dictionary_tasks.csv"
dd = pd.read_csv(dd_path)
dd = dd[
    [
        "uid",
        "y_id",
        "y_label",
        "item_type",
        "group",
        "shape",
        "color",
        "include",
        "fill_alpha",
        "bar_pattern",
        "bar_line_color",
        "bar_line_width",
        "marker_size",
        "marker_line_color",
        "marker_line_width",
    ]
].copy()

df_p = df.merge(dd, on="uid", how="left")
# display(df_p)
# Fallbacks so you can leave blanks in the CSV and still render
df_p["y_label"] = df_p["y_label"].fillna(df_p["name"])
df_p["item_type"] = df_p["item_type"].fillna(
    df_p["milestone"].fillna(False).map(lambda m: "milestone" if m else "task")
)
df_p["shape"] = df_p["shape"].fillna(
    df_p["item_type"].map(lambda t: "star" if t == "milestone" else "rectangle")
)
# make sure it's clean 0..1 floats
df_p["fill_alpha"] = pd.to_numeric(df_p["fill_alpha"], errors="coerce").fillna(1.0).clip(0, 1)

df_p = df_p[(df_p["include"])]

df_p.head()

	uid	name	start	finish	outline_level	wbs	summary	milestone	predecessors	y_id	...	shape	color	include	fill_alpha	bar_pattern	bar_line_color	bar_line_width	marker_size	marker_line_color	marker_line_width
2	2	Group Task 1	2006-10-02 08:30:00	2007-06-14 22:25:12	2	1.1	True	False	NaN	1.1	...	rectangle	#EF553B	True	1.00	NaN	#333333	2	NaN	NaN	NaN
3	3	Detail Task A	2006-10-02 08:30:00	2006-10-21 16:39:36	3	1.1.1	False	False	NaN	1.1.1	...	rectangle	#EF553B	True	0.85	NaN	#333333	1	NaN	NaN	NaN
4	4	Detail Task B	2006-10-21 16:39:36	2006-11-26 10:03:36	3	1.1.2	False	False	[{'predecessor_uid': 3, 'type': 1, 'lag': 0, '...	1.1.2	...	rectangle	#EF553B	True	0.85	NaN	#333333	1	NaN	NaN	NaN
5	5	Detail Task C	2006-11-26 10:03:36	2007-02-24 05:33:36	3	1.1.3	False	False	[{'predecessor_uid': 4, 'type': 1, 'lag': 0, '...	1.1.3	...	rectangle	#EF553B	True	0.85	NaN	#333333	1	NaN	NaN	NaN
6	18	Group Task 2	2007-02-11 12:06:00	2007-06-14 22:25:12	3	1.1.4	True	False	[{'predecessor_uid': 5, 'type': 1, 'lag': 0, '...	1.1.4	...	rectangle	#EF553B	True	0.85	NaN	#333333	1	NaN	NaN	NaN

5 rows × 23 columns

#################################
# Tasks
#################################
df_tasks = df_p[
    (df_p["item_type"] == "task") & df_p["start"].notna() & df_p["finish"].notna()
].copy()
# df_tasks = df_tasks[(df_tasks["outline_level"] >= 2)]

y_col = "y_id"  # whatever you’re using on your y-axis
fig = px.timeline(
    df_tasks,
    x_start="start",
    x_end="finish",
    y=y_col,
    hover_data=["uid", "wbs", "outline_level", "summary"],
    text="name",  # labels from your CSV
)

# Put text inside bars (timeline() doesn't take textposition directly)
fig.update_traces(
    textposition="inside", insidetextanchor="middle", selector=dict(type="bar")
)  # :contentReference[oaicite:3]{index=3}

fig.update_traces(
    marker_color=df_tasks["color"].tolist(),
    marker_pattern_shape=df_tasks["bar_pattern"].fillna("").tolist(),
    marker_opacity=df_tasks["fill_alpha"].tolist(),
    selector=dict(type="bar"),
)

# y tick labels
fig.update_yaxes(
    tickmode="array",
    tickvals=df_tasks["y_id"].tolist(),
    ticktext=df_tasks["y_label"].tolist(),
)

# # Apply per-bar colors from the dictionary (no legend, but obeys CSV)
# fig.update_traces(
#     marker=dict(color=df_tasks["color"].tolist()),
#     selector=dict(type="bar")
# )  # :contentReference[oaicite:4]{index=4}


#################################
# Milestones
#################################
df_ms = df_p[(df_p["item_type"] == "milestone") & df_p["start"].notna()].copy()

fig.add_trace(
    go.Scatter(
        x=df_ms["start"],
        y=df_ms[y_col],
        mode="markers",
        marker=dict(
            symbol=df_ms["shape"].tolist(),  # e.g. "star"
            color=df_ms["color"].tolist(),  # e.g. "gold" or "#FFD700"
            size=14,
            line=dict(width=1, color="black"),
        ),
        text=df_ms["name"],
        hovertemplate="%{text}<extra></extra>",
        name="Milestones",
    )
)  # :contentReference[oaicite:5]{index=5}


#################################
# Overall Figure Styling
#################################

# Update timeline to show FY quarters
fy_start_month = 10  # change if your FY starts in a different month

xmin = df_p["start"].min()
xmax = (df_p["finish_plot"] if "finish_plot" in df_p.columns else df_p["finish"]).max()

fig.update_yaxes(autorange="reversed")  # top-to-bottom like a plan
fig.update_layout(
    height=max(1000, 18 * len(df)),
    xaxis_rangeslider_visible=True,  # handy for zooming
    margin=dict(l=20, r=20, t=40, b=20),
)


display(fig)
fig.write_html(paths.figures / "fig1-timeline-add-datadict.html")

Interactive chart (embedded): Open in a new tab

Step 4a - Make it a function

Taking what built above, throw it into a function which has data dictionaries as an input. Demo below is another way to style the timeline. Demonstrating that changing the data disctionary allows for changing the task and milestone style and grouping/order.

df = tasks_ms_df.copy()
ddv2_path = paths.raw / "roadmap_data_dictionary_tasks_v2.csv"
# display(df)
fig, df_merged = utils.build_roadmap_timeline_from_dd(df, ddv2_path, y_col="y_id")

fig.update_layout(
    height=max(1000, 18 * len(df)),
    xaxis_rangeslider_visible=True,  # handy for zooming
    margin=dict(l=20, r=20, t=40, b=20),
)

display(fig)
fig.write_html(paths.figures / "fig2-timeline-altview.html")

Interactive chart (embedded): Open in a new tab

More Complexity with a DoD Acquisition Timeline Example

You can imagine a scenario where a acquisition program timeline is being maintained in MS Project as the “source of truth” and these views created with Python can create customized “artifacts”, e.g., those for presentations with selective information etc. I asked ChatGPT to generate an XML file of a generic DoD acquistiion timeline and a corresponding data dictionary. It gave a pretty good result which I can manually refine/edit.

paths = utils.get_paths()
smaple_proj_f = "dod_acquisition_lifecycle_demo.xml"
tasks_ms_df, links_df = utils.read_mspdi_tasks_and_links(paths.raw / smaple_proj_f)
df = tasks_ms_df.copy()
# Keep only tasks that can be drawn as bars
df = df[df["start"].notna() & df["finish"].notna()].copy()
display(df.head())

/home/autrpy/projects/2025-roadmap-python/src/roadmap_python/analysis_utils.py:250: RuntimeWarning:

invalid value encountered in cast

/home/autrpy/projects/2025-roadmap-python/src/roadmap_python/analysis_utils.py:253: RuntimeWarning:

invalid value encountered in cast

	uid	name	start	finish	outline_level	wbs	summary	milestone	predecessors
0	0	Generic DoD Program — Acquisition Lifecycle (D...	2025-10-01 08:00:00	2031-09-30 17:00:00	1	1	True	False	NaN
1	1	Materiel Development Decision (MDD)	2025-10-15 13:00:00	2025-10-15 13:00:00	2	1.1	False	True	NaN
2	2	Materiel Solution Analysis (MSA) Phase	2025-10-16 08:00:00	2026-03-31 17:00:00	2	1.2	True	False	[{'predecessor_uid': 1, 'type': 1, 'lag': -922...
3	3	Technology Maturation & Risk Reduction (TMRR) ...	2026-04-01 08:00:00	2026-12-31 17:00:00	2	1.3	True	False	NaN
4	4	Engineering & Manufacturing Development (EMD) ...	2027-01-01 08:00:00	2028-09-30 17:00:00	2	1.4	True	False	NaN

ddv_path = paths.raw / "roadmap_data_dictionary_dod_demo_v1.csv"
fig, df_merged = utils.build_roadmap_timeline_from_dd(df, ddv_path, y_col="y_id", y_label="y_label")

fig.update_layout(
    height=max(2000, 18 * len(df)),
    xaxis_rangeslider_visible=True,  # handy for zooming
    margin=dict(l=20, r=20, t=40, b=20),
)
display(fig)
fig.write_html(paths.figures / "fig3-timeline-dodacq-view1.html")

Interactive chart (embedded): Open in a new tab

The above represents a complete view generated by ChatGPT that I lightly edited. Because this timeline is so complex, I asked ChatGPT to create a more focused timeline view via an updated data dictionary. The XML input is the same, only the data dictionary was updated. Tasks and milestones are more logically grouped and actiites are hidden to reduce the complexity.

ddv_path = paths.raw / "roadmap_data_dictionary_dod_demo_v4.csv"
fig, df_merged = utils.build_roadmap_timeline_from_dd(
    df, ddv_path, bar_text_wrap_width=32, xaxis_fy_quarters=True
)

fig.update_layout(
    height=max(1000, 18 * len(df)),
    margin=dict(l=20, r=20, t=40, b=20),
)

display(fig)
fig.write_html(paths.figures / "fig4-timeline-dodacq-view2.html")

Interactive chart (embedded): Open in a new tab

Next Steps

Explore further interactite features, visualizing dependencies, and export options.