DOC: Fix doc redirects #56577

Merged: 3 commits, Dec 21, 2023

@rhshadrach rhshadrach commented Dec 20, 2023

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Summary:

  • Removed any redirects for methods that have been removed/deprecated
  • Added/updated redirects for GroupBy -> DataFrameGroupBy
  • Added redirects for EWM -> ExponentialMovingWindow
  • Added/updated redirects for alias pages that we used to generate, where either the alias itself was removed/deprecated (e.g. backfill -> bfill, applymap -> map) or only the doc page was removed (e.g. notnull -> notna)
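
To make the alias redirects above concrete, here is a minimal sketch of writing stub pages that forward an old page to its replacement. `REDIRECT_TEMPLATE`, `ALIAS_REDIRECTS`, `render_redirect`, and `write_redirects` are hypothetical names, and the `DataFrame` page names are assumed for illustration. The stub format is an assumption modelled on the strings the testing script searches for, not the actual pandas doc build code.

```python
from pathlib import Path
import tempfile

# Hypothetical stub template (an assumption, not the real pandas template).
REDIRECT_TEMPLATE = (
    "<html>\n"
    "    <head>\n"
    '        <meta http-equiv="Refresh" content="0;URL={target}"/>\n'
    "    </head>\n"
    "    <body>\n"
    '        <p>The page has been moved to <a href="{target}">this page</a></p>\n'
    "    </body>\n"
    "</html>\n"
)

# Illustrative old page -> new page mapping, based on the aliases named above.
ALIAS_REDIRECTS = {
    "pandas.DataFrame.backfill.html": "pandas.DataFrame.bfill.html",
    "pandas.DataFrame.applymap.html": "pandas.DataFrame.map.html",
    "pandas.DataFrame.notnull.html": "pandas.DataFrame.notna.html",
}


def render_redirect(target: str) -> str:
    """Return the HTML of a stub page that forwards to `target`."""
    return REDIRECT_TEMPLATE.format(target=target)


def write_redirects(api_dir: Path, redirects: dict[str, str]) -> None:
    """Write one stub page per old page name into `api_dir`."""
    for old_page, new_page in redirects.items():
        (api_dir / old_page).write_text(render_redirect(new_page), encoding="utf-8")


# Write the stubs into a throwaway directory and read one back.
with tempfile.TemporaryDirectory() as tmp:
    api_dir = Path(tmp)
    write_redirects(api_dir, ALIAS_REDIRECTS)
    stub = (api_dir / "pandas.DataFrame.applymap.html").read_text(encoding="utf-8")
```

Keeping the mapping as plain data makes it easy to audit which old pages forward where, which is the property the testing script checks.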

There are many old API pages that no longer exist even though the corresponding methods are still part of the API (or at least, I couldn't find deprecation/removal notes). These are in the missing_methods list in the script below. I'm also currently excluding pandas.tseries; there are too many potential issues to handle here.
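
One rough way to check whether the name behind an old page still resolves is to import the longest module prefix and walk the remaining attributes. This is only a sketch: `resolves` and `page_to_name` are hypothetical helpers and were not part of how the lists in this PR were built. A name that resolves may still be deprecated, so this can only flag candidates for manual review. The demo uses standard-library names so it runs without pandas installed.

```python
import importlib


def page_to_name(page: str) -> str:
    """Turn a doc page file name into a dotted name, e.g. 'a.b.html' -> 'a.b'."""
    return page.removesuffix(".html")


def resolves(dotted_name: str) -> bool:
    """Return True if `dotted_name` can be reached via import plus getattr."""
    parts = dotted_name.split(".")
    # Try the longest importable module prefix first, then walk attributes.
    for split in range(len(parts), 0, -1):
        try:
            obj = importlib.import_module(".".join(parts[:split]))
        except ImportError:
            continue
        try:
            for attr in parts[split:]:
                obj = getattr(obj, attr)
        except AttributeError:
            return False
        return True
    return False


# Standard-library demo; with pandas installed one could instead call
# resolves(page_to_name("pandas.Index.to_numpy.html")).
found = resolves("pathlib.Path.exists")
not_found = resolves("pathlib.Path.no_such_method")
```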

Testing script
from pathlib import Path
import os

# Download each of the versions of docs listed below as zip files, unzip into a
# directory with subdirectories 1.1, 1.2, etc.
base_old = Path("/home/richard/Downloads/docs/")
base_pr = Path("/home/richard/dev/pandas/doc/build/html")

# We'll make sure redirects don't exist for any methods that have been removed / deprecated
removed_methods = [
    "pandas.DataFrame.append.html",
    "pandas.DataFrame.iteritems.html",
    "pandas.DataFrame.lookup.html",
    "pandas.DataFrame.mad.html",
    "pandas.DataFrame.slice_shift.html",
    "pandas.DataFrame.tshift.html",
    "pandas.DatetimeIndex.to_perioddelta.html",
    "pandas.DatetimeIndex.weekofyear.html",
    "pandas.Float64Index.html",
    "pandas.Index.asi8.html",
    "pandas.Index.get_value.html",
    "pandas.Index.holds_integer.html",
    "pandas.Index.is_all_dates.html",
    "pandas.Index.is_mixed.html",
    "pandas.Index.is_monotonic.html",
    "pandas.Index.is_type_compatible.html",
    "pandas.Index.set_value.html",
    "pandas.Index.to_native_types.html",
    "pandas.Int64Index.html",
    "pandas.MultiIndex.is_lexsorted.html",
    "pandas.Resampler.pad.html",
    "pandas.Series.append.html",
    "pandas.Series.is_monotonic.html",
    "pandas.Series.iteritems.html",
    "pandas.Series.mad.html",
    "pandas.Series.slice_shift.html",
    "pandas.Series.swapaxes.html",
    "pandas.Series.tshift.html",
    "pandas.Timedelta.delta.html",
    "pandas.Timedelta.is_populated.html",
    "pandas.Timestamp.freq.html",
    "pandas.Timestamp.freqstr.html",
    "pandas.UInt64Index.html",
    "pandas.api.types.is_categorical.html",
    "pandas.api.types.is_extension_type.html",
    "pandas.core.groupby.GroupBy.pad.html",
    "pandas.core.groupby.DataFrameGroupBy.mad.html",
    "pandas.core.groupby.DataFrameGroupBy.pad.html",
    "pandas.core.groupby.DataFrameGroupBy.tshift.html",
    "pandas.core.resample.Resampler.pad.html",
    "pandas.io.formats.style.Styler.hide_columns.html",
    "pandas.io.formats.style.Styler.hide_index.html",
    "pandas.io.formats.style.Styler.set_na_rep.html",
    "pandas.io.formats.style.Styler.set_precision.html",
]
# These methods no longer have doc pages but are still part of the API 
# (or at least, I couldn't find deprecation/removal notes)
missing_methods = [
    "pandas.DataFrame.flags.html",
    "pandas.DataFrame.isetitem.html",
    "pandas.DataFrame.to_numpy.html",
    "pandas.DatetimeIndex.week.html",
    "pandas.ExcelWriter.sheets.html",
    "pandas.ExcelFile.close.html",
    "pandas.ExcelWriter.supported_extensions.html",
    "pandas.ExcelWriter.write_cells.html",
    "pandas.ExcelWriter.handles.html",
    "pandas.ExcelWriter.datetime_format.html",
    "pandas.ExcelWriter.book.html",
    "pandas.ExcelWriter.close.html",
    "pandas.ExcelWriter.check_extension.html",
    "pandas.ExcelWriter.if_sheet_exists.html",
    "pandas.ExcelWriter.path.html",
    "pandas.ExcelWriter.date_format.html",
    "pandas.ExcelWriter.cur_sheet.html",
    "pandas.ExcelWriter.engine.html",
    "pandas.ExcelWriter.save.html",
    "pandas.Flags.allows_duplicate_labels.html",
    "pandas.Index.array.html",
    "pandas.Index.diff.html",
    "pandas.Index.format.html",
    "pandas.Index.groupby.html",
    "pandas.Index.round.html",
    "pandas.Index.infer_objects.html",
    "pandas.Index.isnull.html",
    "pandas.Index.nlevels.html",
    "pandas.Index.sort.html",
    "pandas.Index.sortlevel.html",
    "pandas.Index.to_flat_index.html",
    "pandas.Index.to_numpy.html",
    "pandas.NamedAgg.index.html",
    "pandas.NamedAgg.aggfunc.html",
    "pandas.NamedAgg.column.html",
    "pandas.NamedAgg.count.html",
    "pandas.PandasArray.html",
    "pandas.Series.axes.html",
    "pandas.Series.divmod.html",
    "pandas.Series.dt.week.html",
    "pandas.Series.dt.weekofyear.html",
    "pandas.Series.info.html",
    "pandas.Series.rdivmod.html",
    "pandas.Timedelta.freq.html",
    "pandas.Timedelta.resolution_string.html",
    "pandas.Timestamp.fromisocalendar.html",
    "pandas.Timestamp.fromisoformat.html",
    "pandas.api.extensions.ExtensionArray._ndarray_values.html",
    "pandas.api.extensions.ExtensionDtype.construct_array_type.html",
    "pandas.api.extensions.ExtensionDtype.construct_from_string.html",
    "pandas.api.extensions.ExtensionDtype.empty.html",
    "pandas.api.extensions.ExtensionDtype.is_dtype.html",
    "pandas.api.extensions.ExtensionDtype.kind.html",
    "pandas.api.extensions.ExtensionDtype.na_value.html",
    "pandas.api.extensions.ExtensionDtype.name.html",
    "pandas.api.extensions.ExtensionDtype.names.html",
    "pandas.api.extensions.ExtensionDtype.type.html",
    "pandas.api.indexers.BaseIndexer.get_window_bounds.html",
    "pandas.api.indexers.FixedForwardWindowIndexer.get_window_bounds.html",
    "pandas.api.indexers.VariableOffsetWindowIndexer.get_window_bounds.html",
    "pandas.core.window.Rolling.html",
    "pandas.errors.AccessorRegistrationWarning.html",
    "pandas.io.formats.style.Styler.render.html",
    "pandas.io.formats.style.Styler.template.html",
    "pandas.io.formats.style.Styler.where.html",
    "pandas.option_context.__call__.html",
    "pandas.window.Rolling.html",
]

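def strip_str(haystack, needle, before):
    """Drop `needle` and everything before it (before=True) or after it (before=False)."""
    idx = haystack.find(needle)
    assert idx >= 0, f"{needle!r} not found"
    if before:
        # Keep only the text after the needle
        result = haystack[idx + len(needle):]
    else:
        # Keep only the text before the needle
        result = haystack[:idx]
    return result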

for version in ["1.0", "1.1", "1.2", "1.3", "1.4", "1.5", "2.0", "2.1"]:
    base_old_api = base_old / version / "reference" / "api"
    base_pr_api = base_pr / "reference" / "api"

    print('Version:', version)
    for e in base_old_api.rglob("*.html"):
        relative_path = e.relative_to(base_old_api)
        
        if str(relative_path).startswith("pandas.tseries"):
            # pandas.tseries pages are excluded from this check
            continue
        
        target_exists = os.path.exists(base_pr_api / relative_path)
        
        # Use a distinct loop variable so the outer `e` is not shadowed
        if any(name in str(relative_path) for name in removed_methods + missing_methods):
            if target_exists:
                print(f'  {relative_path}: Removed method has redirect')
            continue
            
        if not target_exists:
            print(f'  {relative_path}: Does not exist')
            continue

        # read_text closes the file, unlike a bare open(...).read()
        contents = (base_pr_api / relative_path).read_text(encoding="utf-8")
        
        if "The page has been moved to" not in contents:
            # There is no redirect
            continue

        # Get the redirect path
        contents = strip_str(contents, "0;URL=", before=True)
        path = Path(strip_str(contents, '"/>\n', before=False))
        
        if not os.path.exists(base_pr_api / path):
            print(f'  {relative_path}: Redirect does not exist')
            continue

        contents = (base_pr_api / path).read_text(encoding="utf-8")
        if "The page has been moved to" in contents:
            print(f'  {relative_path}: Redirect is another redirect')
            
    print()
    print('-' * 20)
    print()
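
The redirect-of-a-redirect check in the script above can also be sketched with a regular expression and a small chain walker. This is an illustration only: `redirect_target` and `resolve_chain` are hypothetical helpers. The sketch assumes the stub contains `content="0;URL=<target>"`, and that all pages live in one directory so targets are plain file names.

```python
import re
from typing import Optional

# Assumed stub format: a meta refresh tag with content="0;URL=<target>".
_REDIRECT_RE = re.compile(r'content="0;URL=([^"]+)"')


def redirect_target(html: str) -> Optional[str]:
    """Return the redirect target in `html`, or None if it is a regular page."""
    match = _REDIRECT_RE.search(html)
    return match.group(1) if match else None


def resolve_chain(pages: dict[str, str], start: str, max_hops: int = 10) -> list[str]:
    """Follow redirects from `start` and return the page names visited, in order."""
    chain = [start]
    current = start
    for _ in range(max_hops):
        html = pages.get(current)
        if html is None:
            break  # the target page does not exist
        target = redirect_target(html)
        if target is None:
            break  # reached a regular page
        seen_before = target in chain
        chain.append(target)
        if seen_before:
            break  # redirect loop
        current = target
    return chain


# In-memory stand-in for the built docs: a.html -> b.html -> c.html.
pages = {
    "a.html": '<meta http-equiv="Refresh" content="0;URL=b.html"/>\n',
    "b.html": '<meta http-equiv="Refresh" content="0;URL=c.html"/>\n',
    "c.html": "<html>regular page</html>",
}
chain = resolve_chain(pages, "a.html")
```

A chain longer than two entries corresponds to the "Redirect is another redirect" case the script reports.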

@rhshadrach rhshadrach added this to the 2.2 milestone Dec 20, 2023

@mroeschke mroeschke left a comment

LGTM. I haven't worked with the redirects, but these redirects are compatible with each version in our version switcher, correct?

@rhshadrach

I believe only the redirects of the version you've chosen will be used. In other words, this PR has no impact on the docs when you look at 2.1 or earlier.

@rhshadrach rhshadrach changed the title Bug doc redirects BUG: Fix doc redirects Dec 20, 2023
@rhshadrach rhshadrach changed the title BUG: Fix doc redirects DOC: Fix doc redirects Dec 20, 2023
@lithomas1 lithomas1 merged commit 62f4098 into pandas-dev:main Dec 21, 2023
@lithomas1

thanks @rhshadrach

@rhshadrach rhshadrach deleted the bug_doc_redirects branch December 21, 2023 23:33
cbpygit pushed a commit to cbpygit/pandas that referenced this pull request Jan 2, 2024
* DOC: Update redirects

* fixups

* fixups