Data Tools

Repo | Stars | Issues | Contributors | Version | Last Publish | Forks | Watchers
pandas | 43721 | 3618 | 30 | Pandas 2.2.3 | 2024-09-20 | 17937 | 1110
polars | 30235 | 2179 | 30 | Rust Polars 0.44.2 | 2024-11-01 | 1954 | 167
spark | 39817 | 227 | 30 | | | 28300 | 2021
dask | 12577 | 1096 | 30 | 2024.11.0 | 2024-11-08 | 1709 | 212
modin | 9876 | 666 | 30 | Modin 0.32.0 | 2024-09-11 | 651 | 117
duckdb | 24094 | 329 | 30 | v1.1.3 Bugfix Release | 2024-11-04 | 1909 | 207
vaex | 8290 | 534 | 30 | Version linked to the paper | 2018-03-29 | 590 | 144
fugue | 2005 | 37 | 22 | Support `dict[str,Any]` as transformer input and output | 2024-06-28 | 94 | 25
ibis | 5281 | 257 | 30 | 9.5.0 | 2024-09-11 | 595 | 85
Daft | 2311 | 259 | 30 | v0.3.11 | 2024-11-07 | 160 | 16

pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
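
A minimal sketch of the labeled-data-structure idea, using made-up toy temperature readings: columns are addressed by name, rows are grouped by a label, and summary statistics come from built-in methods.

```python
import pandas as pd

# Toy data for illustration only: one labeled column per field, like an R data.frame
df = pd.DataFrame(
    {
        "city": ["Oslo", "Lima", "Oslo", "Lima"],
        "temp_c": [3.0, 19.5, 5.0, 21.5],
    }
)

# Select a column by label, group rows by another label, and aggregate per group
mean_temp = df.groupby("city")["temp_c"].mean()

# Descriptive statistics (count, mean, std, quartiles, ...) for numeric columns
summary = df.describe()

print(mean_temp)
print(summary)
```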

polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust

spark

Apache Spark - A unified analytics engine for large-scale data processing

dask

Parallel computing with task scheduling
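
One way to see "task scheduling" in Dask is `dask.delayed`: calling decorated functions only records tasks in a graph, and `compute()` hands that graph to a scheduler that runs independent tasks in parallel. The `load` and `partial_sum` functions below are hypothetical stand-ins for real per-partition work.

```python
from dask import delayed


@delayed
def load(i):
    # Hypothetical stand-in for reading one partition of a larger dataset
    return list(range(i * 10, (i + 1) * 10))


@delayed
def partial_sum(chunk):
    return sum(chunk)


# These calls only build a task graph; no work has run yet
partials = [partial_sum(load(i)) for i in range(4)]
total = delayed(sum)(partials)

# The threaded scheduler runs the four independent load/partial_sum chains
# in parallel, then the final sum once its inputs are ready
result = total.compute(scheduler="threads")
print(result)  # 780
```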

modin

Modin: Scale your Pandas workflows by changing a single line of code

duckdb

DuckDB is an analytical in-process SQL database management system

vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀

fugue

A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.

ibis

The portable Python dataframe library

Daft

Distributed data engine for Python/SQL designed for the cloud, powered by Rust