PGLearn Datasets

This Hugging Face organization hosts the PGLearn datasets, which are described in https://arxiv.org/abs/2505.22825.

| Collection | Link | # feasible samples | # total samples |
| --- | --- | --- | --- |
| Small (#buses ≤ 1000) | https://huggingface.co/collections/PGLearn/pglearn-small | ~5.749M | 6M |
| Medium (#buses ≤ 5000) | https://huggingface.co/collections/PGLearn/pglearn-medium | ~1.573M | 1.75M |
| Large (#buses ≤ 10000) | https://huggingface.co/collections/PGLearn/pglearn-large | ~253.6K | 300K |
| Extra-Large (#buses > 10000) | https://huggingface.co/collections/PGLearn/pglearn-extralarge | ~69.9K | 75K |
| N-1 contingency cases | https://huggingface.co/collections/PGLearn/pglearn-n-1 | ~3.575M | 4.2M |
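
To enumerate the PGLearn dataset repositories programmatically rather than browsing the collections, one option is the huggingface_hub client; a minimal sketch (listing by organization author is just one way to browse):

from huggingface_hub import list_datasets

# List all public dataset repositories published under the PGLearn organization.
for ds in list_datasets(author="PGLearn"):
    print(ds.id)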

Citation

@article{klamkin2025pglearn,
  title={PGLearn--An Open-Source Learning Toolkit for Optimal Power Flow},
  author={Klamkin, Michael and Tanneau, Mathieu and Van Hentenryck, Pascal},
  journal={arXiv preprint arXiv:2505.22825},
  year={2025}
}

Instructions

The datasets are available in two formats: Parquet and HDF5.

  • Parquet: Use the Hugging Face datasets package as usual; see its documentation for further instructions. A short load_dataset sketch is also included at the end of these instructions.
  • HDF5: Use the snapshot_download function from huggingface_hub:
from huggingface_hub import snapshot_download

snapshot_download(
  "PGLearn/PGLearn-Small-14_ieee",
  repo_type="dataset",
  local_dir="./14_ieee",    # where to put it
  revision="script",        # IMPORTANT: grab the HDF5 files, not the parquet files

  # Optional filters, e.g. to download only the DCOPF samples:
  allow_patterns=[
    "*/DCOPF/*", "*input*", "case.json.gz", "config.toml",
  ],
  # Skip the infeasible samples:
  ignore_patterns=[
    "infeasible/*"
  ],
)
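
If you are unsure which allow_patterns/ignore_patterns to use, one option is to list the repository contents first; a minimal sketch using huggingface_hub.list_repo_files (same repository and revision as above):

from huggingface_hub import list_repo_files

# List every file available on the "script" revision (the HDF5 layout).
files = list_repo_files(
    "PGLearn/PGLearn-Small-14_ieee",
    repo_type="dataset",
    revision="script",
)
print("\n".join(files))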

You can then load the HDF5 files with h5py. Note that for some large cases, the dual solution data had to be split into multiple files; the helper below can decompress single files and reconstruct split ones:

from pathlib import Path
import gzip, shutil

import h5py


def open_maybe_gzip_cat(path: str | list):
    """Open an HDF5 file that may be gzip-compressed or split into pieces.

    If `path` is a list of split pieces, concatenate them (once) into a single
    `.h5` file next to the pieces, then open the result.
    """
    if isinstance(path, list):
        dest = Path(path[0]).parent.with_suffix(".h5")
        if not dest.exists():
            # Concatenate the pieces, in order, into the destination file.
            with open(dest, "wb") as dest_f:
                for piece in path:
                    with open(piece, "rb") as piece_f:
                        shutil.copyfileobj(piece_f, dest_f)
            # Remove the directory holding the split pieces; they are no longer needed.
            shutil.rmtree(Path(path[0]).parent)
        path = dest.as_posix()
    return gzip.open(path, "rb") if path.endswith(".gz") else open(path, "rb")


primal = h5py.File(open_maybe_gzip_cat("data/SOCOPF/primal.h5.gz"), "r")
dual = h5py.File(open_maybe_gzip_cat(
  ["data/SOCOPF/dual/xaa", "data/SOCOPF/dual/xab", "data/SOCOPF/dual/xac"]
), "r")

If you plan to use the HDF5 data more than once, it is recommended to decompress the files once up front. The following helper does this:

from pathlib import Path
import gzip, shutil

# Decompress every *.h5.gz file under the download directory in place.
for src in Path("./14_ieee").rglob("*.h5.gz"):
    dest = src.with_suffix("")  # e.g. primal.h5.gz -> primal.h5
    with gzip.open(src, "rb") as fsrc, open(dest, "wb") as fdest:
        shutil.copyfileobj(fsrc, fdest)
    src.unlink()  # optional: delete the compressed file
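
For the Parquet format mentioned above, loading goes through the datasets library. A minimal sketch, where the configuration name ("DCOPF") and split ("train") are assumptions; check each dataset card for the available ones:

from datasets import load_dataset

# The configuration ("DCOPF") and split ("train") names are assumptions;
# see the dataset card of each repository for the actual options.
ds = load_dataset("PGLearn/PGLearn-Small-14_ieee", "DCOPF", split="train")
print(ds)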
