Converting MS Project Data to an Interactive Timeline w/ Python, Plotly
11 Jan 2026
import sys
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import roadmap_python.analysis_utils as utils
print("Python version:", sys.version.split()[0])
print("pandas version:", pd.__version__)
print("numpy version:", np.__version__)
print("matplotlib version:", plt.matplotlib.__version__)
%load_ext autoreload
%autoreload 2
Python version: 3.12.3
pandas version: 2.3.3
numpy version: 2.4.0
matplotlib version: 3.10.8
Converting MS Project Data to an Interactive Timeline w/ Python, Plotly
Interactive outputs
These interactive Plotly exports are saved as standalone HTML files in assets/ and embedded below in the relevant sections:
fig0-simple-timeline.htmlfig1-timeline-add-datadict.htmlfig2-timeline-altview.htmlfig3-timeline-dodacq-view1.htmlfig4-timeline-dodacq-view2.html
Idea behind this project is that you and/or your team use MS Project for developing a roadmap, workb reakdown structure, Gantt chart, etc. However, you’re not impressed with the timeline view. It’s tedious to create a custom view in Power Point, which requires manually re-creating and syncing with the MS Project file. So you decide to try Python to automatically generate a new customizable view from MS Project file’s data.
Step 1 - Get MS Project Data
MS Project files as-is are not the most suitable for interrogating with Python. Instead, you’ll want to export the file into another format. The most sensible one is the XML export option. As I understand it, this XML format is also known as mspdi (MS Project Data Interchange).
I do not have MS Project on my PC. So I asked ChatGPT for help finding an XML file. It pointed me to open-msp-viewer. This project ships with a few samples which can be used for testing. I figured this would be a good place to start.
Step 2 - XML-> Dataframe
For a timeline view, I’m only realy interested in the MS Project file’s task data. Like the name of the task and when it starts and finishes. This will be caputred in the “tasks_ms_df” dataframe.
I’ll also capture the dependency data (links) in case I want to catpure that in the future. This will be captured in the “links_df” (basically a depedency table).
paths = utils.get_paths()
smaple_proj_f = "3PointPlan-example.xml"
tasks_ms_df, links_df = utils.read_mspdi_tasks_and_links(
paths.raw / "open-msp-viewer-sample" / smaple_proj_f
)
# magnify the dates so span multiple FY, for demo purposes later
factor = 18 # 6x longer overall (tweak until it spans multiple FYs)
anchor = tasks_ms_df["start"].min()
for col in ["start", "finish"]:
tasks_ms_df[col] = anchor + (tasks_ms_df[col] - anchor) * factor
display(tasks_ms_df)
| uid | name | start | finish | outline_level | wbs | summary | milestone | predecessors | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 Point Plan - example | 2006-10-02 08:30:00 | 2008-11-18 06:13:12 | 0 | 0 | True | False | NaN |
| 1 | 1 | 3 Point Plan | 2006-10-02 08:30:00 | 2008-11-18 06:13:12 | 1 | 1 | True | False | NaN |
| 2 | 2 | Group Task 1 | 2006-10-02 08:30:00 | 2007-06-14 22:25:12 | 2 | 1.1 | True | False | NaN |
| 3 | 3 | Detail Task A | 2006-10-02 08:30:00 | 2006-10-21 16:39:36 | 3 | 1.1.1 | False | False | NaN |
| 4 | 4 | Detail Task B | 2006-10-21 16:39:36 | 2006-11-26 10:03:36 | 3 | 1.1.2 | False | False | [{'predecessor_uid': 3, 'type': 1, 'lag': 0, '... |
| 5 | 5 | Detail Task C | 2006-11-26 10:03:36 | 2007-02-24 05:33:36 | 3 | 1.1.3 | False | False | [{'predecessor_uid': 4, 'type': 1, 'lag': 0, '... |
| 6 | 18 | Group Task 2 | 2007-02-11 12:06:00 | 2007-06-14 22:25:12 | 3 | 1.1.4 | True | False | [{'predecessor_uid': 5, 'type': 1, 'lag': 0, '... |
| 7 | 19 | Detail Task A | 2007-02-11 12:06:00 | 2007-04-05 09:06:00 | 4 | 1.1.4.1 | False | False | NaN |
| 8 | 20 | Detail Task B | 2007-04-05 09:06:00 | 2007-06-14 22:25:12 | 4 | 1.1.4.2 | False | False | [{'predecessor_uid': 19, 'type': 1, 'lag': 0, ... |
| 9 | 21 | Detail Task C | 2007-04-05 09:06:00 | 2007-04-23 00:06:00 | 4 | 1.1.4.3 | False | False | [{'predecessor_uid': 19, 'type': 1, 'lag': 0, ... |
| 10 | 14 | Group Task 3 | 2007-02-11 12:06:00 | 2008-01-01 10:07:12 | 2 | 1.2 | True | False | NaN |
| 11 | 15 | Detail Task A | 2007-06-14 22:25:12 | 2007-10-16 18:49:12 | 3 | 1.2.1 | False | False | [{'predecessor_uid': 18, 'type': 1, 'lag': 0, ... |
| 12 | 16 | Detail Task B | 2007-02-11 12:06:00 | 2007-06-15 03:34:48 | 3 | 1.2.2 | False | False | [{'predecessor_uid': 5, 'type': 1, 'lag': 0, '... |
| 13 | 17 | Detail Task C | 2007-10-16 18:49:12 | 2008-01-01 10:07:12 | 3 | 1.2.3 | False | False | [{'predecessor_uid': 15, 'type': 1, 'lag': 0, ... |
| 14 | 10 | Group Task 4 | 2007-06-14 22:25:12 | 2008-04-16 16:25:12 | 2 | 1.3 | True | False | NaN |
| 15 | 11 | Detail Task A | 2007-06-14 22:25:12 | 2007-07-20 21:13:12 | 3 | 1.3.1 | False | False | [{'predecessor_uid': 2, 'type': 1, 'lag': 0, '... |
| 16 | 12 | Detail Task B | 2008-01-01 10:07:12 | 2008-02-24 01:07:12 | 3 | 1.3.2 | False | False | [{'predecessor_uid': 14, 'type': 1, 'lag': 0, ... |
| 17 | 13 | Detail Task C | 2008-02-24 01:07:12 | 2008-04-16 16:25:12 | 3 | 1.3.3 | False | False | [{'predecessor_uid': 11, 'type': 1, 'lag': 0, ... |
| 18 | 6 | Group Task 5 | 2008-04-16 16:25:12 | 2008-11-18 06:13:12 | 2 | 1.4 | True | False | NaN |
| 19 | 7 | Detail Task A | 2008-04-16 16:25:12 | 2008-06-27 15:13:12 | 3 | 1.4.1 | False | False | [{'predecessor_uid': 10, 'type': 1, 'lag': 0, ... |
| 20 | 8 | Detail Task B | 2008-06-27 15:13:12 | 2008-08-03 12:31:12 | 3 | 1.4.2 | False | False | [{'predecessor_uid': 7, 'type': 1, 'lag': 0, '... |
| 21 | 9 | Detail Task C | 2008-08-03 12:31:12 | 2008-11-18 06:13:12 | 3 | 1.4.3 | False | False | [{'predecessor_uid': 8, 'type': 1, 'lag': 0, '... |
| 22 | 22 | Final Delivery | 2008-11-18 06:13:12 | 2008-11-18 06:13:12 | 2 | 1.5 | False | True | [{'predecessor_uid': 6, 'type': 1, 'lag': 0, '... |
Step 3 - A simple timeline with plotly express timeline function
We’ll start with a simple timeline using plotly express “timeline”. From the docs: “In a timeline plot, each row of data_frame is represented as a rectangular mark on an x axis of type date, spanning from x_start to x_end”.
plotly.express is Plotly’s high-level API: you hand it a tidy dataframe + a few column names, and it builds a full interactive figure for you. Under the hood it returns a normal plotly.graph_objects.Figure, so you can still do all the “serious” customization afterward with fig.update_layout(), fig.update_traces(), etc.
pio.renderers.default = "notebook_connected" # good default for classic notebooks
df = tasks_ms_df.copy()
# Keep only tasks that can be drawn as bars
df = df[df["start"].notna() & df["finish"].notna()].copy()
# Ensure one row per task on the y-axis (avoid duplicate-name collisions)
# Prefer WBS if present; fall back to UID.
df["task_label"] = df.apply(
lambda r: f"{r['wbs']} — {r['name']}"
if pd.notna(r.get("wbs")) and str(r.get("wbs")).strip()
else f"UID {r['uid']} — {r['name']}",
axis=1,
)
# Milestones often have start == finish; give them a tiny width so they render
is_milestone = (df["milestone"]) | (df["start"] == df["finish"])
df["finish_plot"] = df["finish"]
df.loc[is_milestone, "finish_plot"] = df.loc[is_milestone, "finish"] + pd.Timedelta(weeks=2)
# Pick a stable order (WBS-ish if available, otherwise by start)
sort_cols = ["start", "outline_level", "uid"]
if "wbs" in df.columns and df["wbs"].notna().any():
sort_cols = ["wbs", "start", "outline_level", "uid"]
df = df.sort_values(sort_cols, kind="stable")
fig = px.timeline(
df,
x_start="start",
x_end="finish_plot",
y="task_label",
hover_data={
"uid": True,
"name": True,
"wbs": True,
"outline_level": True,
"summary": True,
"milestone": True,
"start": "|%Y-%m-%d",
"finish": "|%Y-%m-%d",
"finish_plot": False,
},
)
fig.update_yaxes(autorange="reversed") # top-to-bottom like a plan
fig.update_layout(
height=max(750, 18 * len(df)),
xaxis_rangeslider_visible=True, # handy for zooming
margin=dict(l=20, r=20, t=40, b=20),
)
display(fig)
fig.write_html(paths.figures / "fig0-simple-timeline.html")
Interactive chart (embedded): Open in a new tab
Step 4 - Customization with a data dictionary
So we have a minimal viable product of sorts. Already it is pretty slick but not anything that MS Project couldn’t do. I now want to add lots of customization.
I’ll use a data dictionary to define my desired style. These will primarily be (1) shape of label bars and milestones, (2) colors/patterns/gradients of task bars and milestones, and (3) labels used for tasks/milestones (and therefore, which row they’ll appear on).
df = tasks_ms_df.copy()
# Keep only tasks that can be drawn as bars
df = df[df["start"].notna() & df["finish"].notna()].copy()
dd_path = paths.raw / "roadmap_data_dictionary_tasks.csv"
dd = pd.read_csv(dd_path)
dd = dd[
[
"uid",
"y_id",
"y_label",
"item_type",
"group",
"shape",
"color",
"include",
"fill_alpha",
"bar_pattern",
"bar_line_color",
"bar_line_width",
"marker_size",
"marker_line_color",
"marker_line_width",
]
].copy()
df_p = df.merge(dd, on="uid", how="left")
# display(df_p)
# Fallbacks so you can leave blanks in the CSV and still render
df_p["y_label"] = df_p["y_label"].fillna(df_p["name"])
df_p["item_type"] = df_p["item_type"].fillna(
df_p["milestone"].fillna(False).map(lambda m: "milestone" if m else "task")
)
df_p["shape"] = df_p["shape"].fillna(
df_p["item_type"].map(lambda t: "star" if t == "milestone" else "rectangle")
)
# make sure it's clean 0..1 floats
df_p["fill_alpha"] = pd.to_numeric(df_p["fill_alpha"], errors="coerce").fillna(1.0).clip(0, 1)
df_p = df_p[(df_p["include"])]
df_p.head()
| uid | name | start | finish | outline_level | wbs | summary | milestone | predecessors | y_id | ... | shape | color | include | fill_alpha | bar_pattern | bar_line_color | bar_line_width | marker_size | marker_line_color | marker_line_width | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 2 | Group Task 1 | 2006-10-02 08:30:00 | 2007-06-14 22:25:12 | 2 | 1.1 | True | False | NaN | 1.1 | ... | rectangle | #EF553B | True | 1.00 | NaN | #333333 | 2 | NaN | NaN | NaN |
| 3 | 3 | Detail Task A | 2006-10-02 08:30:00 | 2006-10-21 16:39:36 | 3 | 1.1.1 | False | False | NaN | 1.1.1 | ... | rectangle | #EF553B | True | 0.85 | NaN | #333333 | 1 | NaN | NaN | NaN |
| 4 | 4 | Detail Task B | 2006-10-21 16:39:36 | 2006-11-26 10:03:36 | 3 | 1.1.2 | False | False | [{'predecessor_uid': 3, 'type': 1, 'lag': 0, '... | 1.1.2 | ... | rectangle | #EF553B | True | 0.85 | NaN | #333333 | 1 | NaN | NaN | NaN |
| 5 | 5 | Detail Task C | 2006-11-26 10:03:36 | 2007-02-24 05:33:36 | 3 | 1.1.3 | False | False | [{'predecessor_uid': 4, 'type': 1, 'lag': 0, '... | 1.1.3 | ... | rectangle | #EF553B | True | 0.85 | NaN | #333333 | 1 | NaN | NaN | NaN |
| 6 | 18 | Group Task 2 | 2007-02-11 12:06:00 | 2007-06-14 22:25:12 | 3 | 1.1.4 | True | False | [{'predecessor_uid': 5, 'type': 1, 'lag': 0, '... | 1.1.4 | ... | rectangle | #EF553B | True | 0.85 | NaN | #333333 | 1 | NaN | NaN | NaN |
5 rows × 23 columns
#################################
# Tasks
#################################
df_tasks = df_p[
(df_p["item_type"] == "task") & df_p["start"].notna() & df_p["finish"].notna()
].copy()
# df_tasks = df_tasks[(df_tasks["outline_level"] >= 2)]
y_col = "y_id" # whatever you’re using on your y-axis
fig = px.timeline(
df_tasks,
x_start="start",
x_end="finish",
y=y_col,
hover_data=["uid", "wbs", "outline_level", "summary"],
text="name", # labels from your CSV
)
# Put text inside bars (timeline() doesn't take textposition directly)
fig.update_traces(
textposition="inside", insidetextanchor="middle", selector=dict(type="bar")
) # :contentReference[oaicite:3]{index=3}
fig.update_traces(
marker_color=df_tasks["color"].tolist(),
marker_pattern_shape=df_tasks["bar_pattern"].fillna("").tolist(),
marker_opacity=df_tasks["fill_alpha"].tolist(),
selector=dict(type="bar"),
)
# y tick labels
fig.update_yaxes(
tickmode="array",
tickvals=df_tasks["y_id"].tolist(),
ticktext=df_tasks["y_label"].tolist(),
)
# # Apply per-bar colors from the dictionary (no legend, but obeys CSV)
# fig.update_traces(
# marker=dict(color=df_tasks["color"].tolist()),
# selector=dict(type="bar")
# ) # :contentReference[oaicite:4]{index=4}
#################################
# Milestones
#################################
df_ms = df_p[(df_p["item_type"] == "milestone") & df_p["start"].notna()].copy()
fig.add_trace(
go.Scatter(
x=df_ms["start"],
y=df_ms[y_col],
mode="markers",
marker=dict(
symbol=df_ms["shape"].tolist(), # e.g. "star"
color=df_ms["color"].tolist(), # e.g. "gold" or "#FFD700"
size=14,
line=dict(width=1, color="black"),
),
text=df_ms["name"],
hovertemplate="%{text}<extra></extra>",
name="Milestones",
)
) # :contentReference[oaicite:5]{index=5}
#################################
# Overall Figure Styling
#################################
# Update timeline to show FY quarters
fy_start_month = 10 # change if your FY starts in a different month
xmin = df_p["start"].min()
xmax = (df_p["finish_plot"] if "finish_plot" in df_p.columns else df_p["finish"]).max()
fig.update_yaxes(autorange="reversed") # top-to-bottom like a plan
fig.update_layout(
height=max(1000, 18 * len(df)),
xaxis_rangeslider_visible=True, # handy for zooming
margin=dict(l=20, r=20, t=40, b=20),
)
display(fig)
fig.write_html(paths.figures / "fig1-timeline-add-datadict.html")
Interactive chart (embedded): Open in a new tab
Step 4a - Make it a function
Taking what built above, throw it into a function which has data dictionaries as an input. Demo below is another way to style the timeline. Demonstrating that changing the data disctionary allows for changing the task and milestone style and grouping/order.
df = tasks_ms_df.copy()
ddv2_path = paths.raw / "roadmap_data_dictionary_tasks_v2.csv"
# display(df)
fig, df_merged = utils.build_roadmap_timeline_from_dd(df, ddv2_path, y_col="y_id")
fig.update_layout(
height=max(1000, 18 * len(df)),
xaxis_rangeslider_visible=True, # handy for zooming
margin=dict(l=20, r=20, t=40, b=20),
)
display(fig)
fig.write_html(paths.figures / "fig2-timeline-altview.html")
Interactive chart (embedded): Open in a new tab
More Complexity with a DoD Acquisition Timeline Example
You can imagine a scenario where a acquisition program timeline is being maintained in MS Project as the “source of truth” and these views created with Python can create customized “artifacts”, e.g., those for presentations with selective information etc. I asked ChatGPT to generate an XML file of a generic DoD acquistiion timeline and a corresponding data dictionary. It gave a pretty good result which I can manually refine/edit.
paths = utils.get_paths()
smaple_proj_f = "dod_acquisition_lifecycle_demo.xml"
tasks_ms_df, links_df = utils.read_mspdi_tasks_and_links(paths.raw / smaple_proj_f)
df = tasks_ms_df.copy()
# Keep only tasks that can be drawn as bars
df = df[df["start"].notna() & df["finish"].notna()].copy()
display(df.head())
/home/autrpy/projects/2025-roadmap-python/src/roadmap_python/analysis_utils.py:250: RuntimeWarning:
invalid value encountered in cast
/home/autrpy/projects/2025-roadmap-python/src/roadmap_python/analysis_utils.py:253: RuntimeWarning:
invalid value encountered in cast
| uid | name | start | finish | outline_level | wbs | summary | milestone | predecessors | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Generic DoD Program — Acquisition Lifecycle (D... | 2025-10-01 08:00:00 | 2031-09-30 17:00:00 | 1 | 1 | True | False | NaN |
| 1 | 1 | Materiel Development Decision (MDD) | 2025-10-15 13:00:00 | 2025-10-15 13:00:00 | 2 | 1.1 | False | True | NaN |
| 2 | 2 | Materiel Solution Analysis (MSA) Phase | 2025-10-16 08:00:00 | 2026-03-31 17:00:00 | 2 | 1.2 | True | False | [{'predecessor_uid': 1, 'type': 1, 'lag': -922... |
| 3 | 3 | Technology Maturation & Risk Reduction (TMRR) ... | 2026-04-01 08:00:00 | 2026-12-31 17:00:00 | 2 | 1.3 | True | False | NaN |
| 4 | 4 | Engineering & Manufacturing Development (EMD) ... | 2027-01-01 08:00:00 | 2028-09-30 17:00:00 | 2 | 1.4 | True | False | NaN |
ddv_path = paths.raw / "roadmap_data_dictionary_dod_demo_v1.csv"
fig, df_merged = utils.build_roadmap_timeline_from_dd(df, ddv_path, y_col="y_id", y_label="y_label")
fig.update_layout(
height=max(2000, 18 * len(df)),
xaxis_rangeslider_visible=True, # handy for zooming
margin=dict(l=20, r=20, t=40, b=20),
)
display(fig)
fig.write_html(paths.figures / "fig3-timeline-dodacq-view1.html")
Interactive chart (embedded): Open in a new tab
The above represents a complete view generated by ChatGPT that I lightly edited. Because this timeline is so complex, I asked ChatGPT to create a more focused timeline view via an updated data dictionary. The XML input is the same, only the data dictionary was updated. Tasks and milestones are more logically grouped and actiites are hidden to reduce the complexity.
ddv_path = paths.raw / "roadmap_data_dictionary_dod_demo_v4.csv"
fig, df_merged = utils.build_roadmap_timeline_from_dd(
df, ddv_path, bar_text_wrap_width=32, xaxis_fy_quarters=True
)
fig.update_layout(
height=max(1000, 18 * len(df)),
margin=dict(l=20, r=20, t=40, b=20),
)
display(fig)
fig.write_html(paths.figures / "fig4-timeline-dodacq-view2.html")
Interactive chart (embedded): Open in a new tab
Next Steps
Explore further interactite features, visualizing dependencies, and export options.