API Reference¶
Input/Output¶
Pickling¶
Flat File¶
Clipboard¶
Excel¶
JSON¶
json_normalize (data[, record_path, meta, ...]) |
“Normalize” semi-structured JSON data into a flat table |
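As a rough illustration of what flattening means here, the sketch below normalizes a few nested records. The sample records and field names are hypothetical, and the lookup of json_normalize assumes either a recent pandas (top-level function) or an older one (pandas.io.json).

```python
import pandas as pd

# Hypothetical sample records; the field names are made up for illustration.
records = [
    {"id": 1, "user": {"name": "Ann", "city": "Oslo"}},
    {"id": 2, "user": {"name": "Bob", "city": "Rome"}},
]

# Recent pandas exposes json_normalize at the top level; older releases
# only provide it as pandas.io.json.json_normalize.
json_normalize = getattr(pd, "json_normalize", None)
if json_normalize is None:
    from pandas.io.json import json_normalize

# Nested dict keys become dotted column names such as "user.name".
flat = json_normalize(records)

# record_path selects a nested list to expand into rows;
# meta copies parent-level fields onto each expanded row.
teams = [{"team": "a", "members": [{"name": "Ann"}, {"name": "Bob"}]}]
members = json_normalize(teams, record_path="members", meta=["team"])

print(flat)
print(members)
```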
HTML¶
HDFStore: PyTables (HDF5)¶
SAS¶
SQL¶
Google BigQuery¶
read_gbq (query[, project_id, index_col, ...]) |
Load data from Google BigQuery. |
to_gbq (dataframe, destination_table, project_id) |
Write a DataFrame to a Google BigQuery table. |
STATA¶
StataReader.data (**kwargs) |
DEPRECATED: Reads observations from Stata file, converting them into a dataframe |
StataReader.data_label () |
Returns data label of Stata file |
StataReader.value_labels () |
Returns a dict, associating each variable name a dict, associating |
StataReader.variable_labels () |
Returns variable labels as a dict, associating each variable name |
StataWriter.write_file () |
General functions¶
Data manipulations¶
Top-level missing data¶
Top-level conversions¶
Top-level dealing with datetimelike¶
Top-level evaluation¶
Standard moving window functions¶
Standard expanding window functions¶
Exponentially-weighted moving window functions¶
Series¶
Constructor¶
Attributes¶
Axes
- index: axis labels
Series.values |
Return Series as ndarray or ndarray-like |
Series.dtype |
return the dtype object of the underlying data |
Series.ftype |
return if the data is sparse|dense |
Series.shape |
return a tuple of the shape of the underlying data |
Series.nbytes |
return the number of bytes in the underlying data |
Series.ndim |
return the number of dimensions of the underlying data, by definition 1 |
Series.size |
return the number of elements in the underlying data |
Series.strides |
return the strides of the underlying data |
Series.itemsize |
return the size of the dtype of the item of the underlying data |
Series.base |
return the base object if the memory of the underlying data is shared |
Series.T |
return the transpose, which is by definition self |
Conversion¶
Indexing, iteration¶
Series.at |
Fast label-based scalar accessor |
Series.iat |
Fast integer location scalar accessor. |
Series.ix |
A primarily label-location based indexer, with integer position fallback. |
Series.loc |
Purely label-location based indexer for selection by label. |
Series.iloc |
Purely integer-location based indexing for selection by position. |
For more information on .at, .iat, .ix, .loc, and .iloc, see the indexing documentation.
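A minimal sketch of how the label-based and position-based accessors differ on a Series. The data is made up for illustration, and .ix is left out because it was removed from later pandas releases.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

by_label = s.loc["b"]        # label-based selection
by_position = s.iloc[0]      # position-based selection
scalar_label = s.at["c"]     # fast scalar access by label
scalar_pos = s.iat[1]        # fast scalar access by integer position

# Label slices with .loc include both endpoints; .iloc slices exclude the stop.
label_slice = s.loc["a":"b"]
pos_slice = s.iloc[0:1]

print(by_label, by_position, scalar_label, scalar_pos)
```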
Binary operator functions¶
Function application, GroupBy¶
Computations / Descriptive Stats¶
Reindexing / Selection / Label manipulation¶
Missing data handling¶
Reshaping, sorting¶
Combining / joining / merging¶
Datetimelike Properties¶
Series.dt can be used to access the values of the series as datetimelike and return several properties. These can be accessed like Series.dt.<property>.
Datetime Properties
Series.dt.date |
Returns numpy array of datetime.date. |
Series.dt.time |
Returns numpy array of datetime.time. |
Series.dt.year |
The year of the datetime |
Series.dt.month |
The month as January=1, December=12 |
Series.dt.day |
The days of the datetime |
Series.dt.hour |
The hours of the datetime |
Series.dt.minute |
The minutes of the datetime |
Series.dt.second |
The seconds of the datetime |
Series.dt.microsecond |
The microseconds of the datetime |
Series.dt.nanosecond |
The nanoseconds of the datetime |
Series.dt.week |
The week ordinal of the year |
Series.dt.weekofyear |
The week ordinal of the year |
Series.dt.dayofweek |
The day of the week with Monday=0, Sunday=6 |
Series.dt.weekday |
The day of the week with Monday=0, Sunday=6 |
Series.dt.dayofyear |
The ordinal day of the year |
Series.dt.quarter |
The quarter of the date |
Series.dt.is_month_start |
Logical indicating if first day of month (defined by frequency) |
Series.dt.is_month_end |
Logical indicating if last day of month (defined by frequency) |
Series.dt.is_quarter_start |
Logical indicating if first day of quarter (defined by frequency) |
Series.dt.is_quarter_end |
Logical indicating if last day of quarter (defined by frequency) |
Series.dt.is_year_start |
Logical indicating if first day of year (defined by frequency) |
Series.dt.is_year_end |
Logical indicating if last day of year (defined by frequency) |
Series.dt.daysinmonth |
The number of days in the month |
Series.dt.days_in_month |
The number of days in the month |
Series.dt.tz |
|
Series.dt.freq |
get/set the frequency of the Index |
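The datetime properties listed above can be tried out with a short sketch like the following. The dates are arbitrary sample values chosen to hit a month end and a leap-year February.

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["2016-01-31", "2016-02-01", "2016-12-31"]))

years = s.dt.year
months = s.dt.month
weekdays = s.dt.dayofweek           # Monday=0, Sunday=6
month_end = s.dt.is_month_end       # True on the last day of a month
days_in_month = s.dt.days_in_month  # 29 for February 2016 (leap year)

print(pd.DataFrame({"weekday": weekdays, "month_end": month_end}))
```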
Datetime Methods
Timedelta Properties
Series.dt.days |
Number of days for each element. |
Series.dt.seconds |
Number of seconds (>= 0 and less than 1 day) for each element. |
Series.dt.microseconds |
Number of microseconds (>= 0 and less than 1 second) for each element. |
Series.dt.nanoseconds |
Number of nanoseconds (>= 0 and less than 1 microsecond) for each element. |
Series.dt.components |
Return a dataframe of the components (days, hours, minutes, seconds, milliseconds, microseconds, nanoseconds) of the Timedeltas. |
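For timedelta data, the same accessor exposes the properties in the table above. The following is a small sketch with made-up durations.

```python
import pandas as pd

td = pd.Series(pd.to_timedelta(["1 days 02:03:04", "0 days 00:00:30"]))

days = td.dt.days          # whole days per element
seconds = td.dt.seconds    # remaining seconds within the day (0 <= value < 86400)
parts = td.dt.components   # DataFrame with one column per component

print(parts)
```

Note that seconds holds only the sub-day remainder, so the first element gives 2*3600 + 3*60 + 4 = 7384 rather than the total duration in seconds.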
Timedelta Methods
String handling¶
Series.str can be used to access the values of the series as strings and apply several methods to it. These can be accessed like Series.str.<function/property>.
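A brief sketch of the accessor pattern described above, using a few common string methods on made-up values:

```python
import pandas as pd

s = pd.Series(["apple", "Banana", "cherry"])

upper = s.str.upper()           # element-wise upper-casing
lengths = s.str.len()           # length of each string
has_an = s.str.contains("an")   # substring/regex match per element

print(upper.tolist(), lengths.tolist(), has_an.tolist())
```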
Categorical¶
If the Series is of dtype category, Series.cat can be used to change the categorical data. This accessor is similar to Series.dt or Series.str and has the following usable methods and properties:
Series.cat.categories |
The categories of this categorical. |
Series.cat.ordered |
Gets the ordered attribute |
Series.cat.codes |
To create a Series of dtype category, use cat = s.astype("category").
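The conversion and the accessor properties listed above can be sketched as follows. The values are illustrative, and the category order shown assumes the default behaviour of sorting the unique values.

```python
import pandas as pd

s = pd.Series(["low", "high", "low", "mid"])
cat = s.astype("category")

categories = list(cat.cat.categories)   # unique values, sorted by default
codes = cat.cat.codes.tolist()          # integer position of each value in categories
ordered = cat.cat.ordered               # False unless an ordering was requested

print(categories, codes, ordered)
```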
The following two Categorical constructors are considered API but should only be used when adding ordering information or special categories is needed at creation time of the categorical data:
np.asarray(categorical) works by implementing the array interface. Be aware that this converts the Categorical back to a numpy array, so levels and order information are not preserved!
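A small sketch of that loss of information, assuming a Categorical built with an explicit, non-alphabetical ordering (the values are hypothetical):

```python
import numpy as np
import pandas as pd

categorical = pd.Categorical(["b", "a", "b"], categories=["b", "a"], ordered=True)

# The conversion keeps the values but returns a plain ndarray.
arr = np.asarray(categorical)

print(type(arr).__name__, arr.tolist())
print(hasattr(arr, "categories"))  # the array carries no category metadata
```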
Plotting¶
Series.plot is both a callable method and a namespace attribute for specific plotting methods of the form Series.plot.<kind>.
Serialization / IO / Conversion¶
Sparse methods¶
DataFrame¶
Constructor¶
Attributes and underlying data¶
Axes
- index: row labels
- columns: column labels
DataFrame.dtypes |
Return the dtypes in this object |
DataFrame.ftypes |
Return the ftypes (indication of sparse/dense and dtype) in this object. |
DataFrame.values |
Numpy representation of NDFrame |
DataFrame.axes |
Return a list with the row axis labels and column axis labels as the only members. |
DataFrame.ndim |
Number of axes / array dimensions |
DataFrame.size |
number of elements in the NDFrame |
DataFrame.shape |
Return a tuple representing the dimensionality of the DataFrame. |
Conversion¶
Indexing, iteration¶
DataFrame.at |
Fast label-based scalar accessor |
DataFrame.iat |
Fast integer location scalar accessor. |
DataFrame.ix |
A primarily label-location based indexer, with integer position fallback. |
DataFrame.loc |
Purely label-location based indexer for selection by label. |
DataFrame.iloc |
Purely integer-location based indexing for selection by position. |
For more information on .at, .iat, .ix, .loc, and .iloc, see the indexing documentation.
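On a DataFrame the same accessors take a row selector and a column selector. The following is a minimal sketch with made-up data, again leaving out .ix since later pandas releases no longer provide it.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]}, index=["a", "b", "c"])

row = df.loc["b"]                  # one row by label, returned as a Series
cell = df.at["c", "y"]             # scalar by row and column label
corner = df.iat[0, 0]              # scalar by integer position
block = df.iloc[0:2, 0:1]          # first two rows, first column, by position
subset = df.loc[["a", "c"], "x"]   # selected rows of one column, by label

print(block)
```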
Binary operator functions¶
Function application, GroupBy¶
Computations / Descriptive Stats¶
Reindexing / Selection / Label manipulation¶
Missing data handling¶
Reshaping, sorting, transposing¶
DataFrame.T |
Transpose index and columns |
Combining / joining / merging¶
Time series-related¶
Plotting¶
DataFrame.plot is both a callable method and a namespace attribute for specific plotting methods of the form DataFrame.plot.<kind>.
Serialization / IO / Conversion¶
Panel¶
Constructor¶
Attributes and underlying data¶
Axes
- items: axis 0; each item corresponds to a DataFrame contained inside
- major_axis: axis 1; the index (rows) of each of the DataFrames
- minor_axis: axis 2; the columns of each of the DataFrames
Panel.values |
Numpy representation of NDFrame |
Panel.axes |
Return index label(s) of the internal NDFrame |
Panel.ndim |
Number of axes / array dimensions |
Panel.size |
number of elements in the NDFrame |
Panel.shape |
Return a tuple of axis dimensions |
Panel.dtypes |
Return the dtypes in this object |
Panel.ftypes |
Return the ftypes (indication of sparse/dense and dtype) in this object. |
Conversion¶
Getting and setting¶
Indexing, iteration, slicing¶
Panel.at |
Fast label-based scalar accessor |
Panel.iat |
Fast integer location scalar accessor. |
Panel.ix |
A primarily label-location based indexer, with integer position fallback. |
Panel.loc |
Purely label-location based indexer for selection by label. |
Panel.iloc |
Purely integer-location based indexing for selection by position. |
For more information on .at, .iat, .ix, .loc, and .iloc, see the indexing documentation.
Binary operator functions¶
Function application, GroupBy¶
Computations / Descriptive Stats¶
Reindexing / Selection / Label manipulation¶
Missing data handling¶
Reshaping, sorting, transposing¶
Combining / joining / merging¶
Time series-related¶
Serialization / IO / Conversion¶
Panel4D¶
Constructor¶
Attributes and underlying data¶
Axes
- labels: axis 0; each label corresponds to a Panel contained inside
- items: axis 1; each item corresponds to a DataFrame contained inside
- major_axis: axis 2; the index (rows) of each of the DataFrames
- minor_axis: axis 3; the columns of each of the DataFrames
Panel4D.values |
Numpy representation of NDFrame |
Panel4D.axes |
Return index label(s) of the internal NDFrame |
Panel4D.ndim |
Number of axes / array dimensions |
Panel4D.size |
number of elements in the NDFrame |
Panel4D.shape |
Return a tuple of axis dimensions |
Panel4D.dtypes |
Return the dtypes in this object |
Panel4D.ftypes |
Return the ftypes (indication of sparse/dense and dtype) in this object. |
Conversion¶
Index¶
Many of these methods, or variants thereof, are available on the objects that contain an index (Series/DataFrame), and those should most likely be used before calling these methods directly.
Attributes¶
Index.values |
return the underlying data as an ndarray |
Index.is_monotonic |
alias for is_monotonic_increasing (deprecated) |
Index.is_monotonic_increasing |
return if the index is monotonic increasing (only equal or |
Index.is_monotonic_decreasing |
return if the index is monotonic decreasing (only equal or |
Index.is_unique |
|
Index.has_duplicates |
|
Index.dtype |
|
Index.inferred_type |
|
Index.is_all_dates |
|
Index.shape |
return a tuple of the shape of the underlying data |
Index.nbytes |
return the number of bytes in the underlying data |
Index.ndim |
return the number of dimensions of the underlying data, by definition 1 |
Index.size |
return the number of elements in the underlying data |
Index.strides |
return the strides of the underlying data |
Index.itemsize |
return the size of the dtype of the item of the underlying data |
Index.base |
return the base object if the memory of the underlying data is shared |
Index.T |
return the transpose, which is by definition self |
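A few of these attributes can be inspected with a sketch like the one below. It uses only attributes that, to the best of my knowledge, are still present in recent pandas releases, and the index values are arbitrary.

```python
import pandas as pd

idx = pd.Index([1, 2, 2, 5])

increasing = idx.is_monotonic_increasing   # equal neighbours are allowed
decreasing = idx.is_monotonic_decreasing
unique = idx.is_unique                     # False because 2 appears twice
dupes = idx.has_duplicates
shape = idx.shape
size = idx.size

print(increasing, decreasing, unique, dupes, shape, size)
```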
Modifying and Computations¶
Conversion¶
Sorting¶
Time-specific operations¶
Combining / joining / set operations¶
Selecting¶
CategoricalIndex¶
Categorical Components¶
CategoricalIndex.codes |
|
CategoricalIndex.categories |
|
CategoricalIndex.ordered |
DatetimeIndex¶
Time/Date Components¶
DatetimeIndex.year |
The year of the datetime |
DatetimeIndex.month |
The month as January=1, December=12 |
DatetimeIndex.day |
The days of the datetime |
DatetimeIndex.hour |
The hours of the datetime |
DatetimeIndex.minute |
The minutes of the datetime |
DatetimeIndex.second |
The seconds of the datetime |
DatetimeIndex.microsecond |
The microseconds of the datetime |
DatetimeIndex.nanosecond |
The nanoseconds of the datetime |
DatetimeIndex.date |
Returns numpy array of datetime.date. |
DatetimeIndex.time |
Returns numpy array of datetime.time. |
DatetimeIndex.dayofyear |
The ordinal day of the year |
DatetimeIndex.weekofyear |
The week ordinal of the year |
DatetimeIndex.week |
The week ordinal of the year |
DatetimeIndex.dayofweek |
The day of the week with Monday=0, Sunday=6 |
DatetimeIndex.weekday |
The day of the week with Monday=0, Sunday=6 |
DatetimeIndex.quarter |
The quarter of the date |
DatetimeIndex.tz |
|
DatetimeIndex.freq |
get/set the frequency of the Index |
DatetimeIndex.freqstr |
return the frequency object as a string if it is set, otherwise None |
DatetimeIndex.is_month_start |
Logical indicating if first day of month (defined by frequency) |
DatetimeIndex.is_month_end |
Logical indicating if last day of month (defined by frequency) |
DatetimeIndex.is_quarter_start |
Logical indicating if first day of quarter (defined by frequency) |
DatetimeIndex.is_quarter_end |
Logical indicating if last day of quarter (defined by frequency) |
DatetimeIndex.is_year_start |
Logical indicating if first day of year (defined by frequency) |
DatetimeIndex.is_year_end |
Logical indicating if last day of year (defined by frequency) |
DatetimeIndex.inferred_freq |
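The components above are also available directly on a DatetimeIndex. A short sketch, assuming a daily range that crosses a month boundary:

```python
import pandas as pd

idx = pd.date_range("2016-01-30", periods=3, freq="D")

quarters = idx.quarter.tolist()
month_end = idx.is_month_end.tolist()      # True only for 2016-01-31
month_start = idx.is_month_start.tolist()  # True only for 2016-02-01
freq_name = idx.freqstr                    # frequency as a string
guessed = idx.inferred_freq                # frequency inferred from the values

print(quarters, month_end, month_start, freq_name, guessed)
```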
Selecting¶
Time-specific operations¶
Conversion¶
TimedeltaIndex¶
Components¶
TimedeltaIndex.days |
Number of days for each element. |
TimedeltaIndex.seconds |
Number of seconds (>= 0 and less than 1 day) for each element. |
TimedeltaIndex.microseconds |
Number of microseconds (>= 0 and less than 1 second) for each element. |
TimedeltaIndex.nanoseconds |
Number of nanoseconds (>= 0 and less than 1 microsecond) for each element. |
TimedeltaIndex.components |
Return a dataframe of the components (days, hours, minutes, seconds, milliseconds, microseconds, nanoseconds) of the Timedeltas. |
TimedeltaIndex.inferred_freq |
Conversion¶
GroupBy¶
GroupBy objects are returned by groupby calls: pandas.DataFrame.groupby(), pandas.Series.groupby(), etc.
Indexing, iteration¶
GroupBy.__iter__ () |
Groupby iterator |
GroupBy.groups |
dict {group name -> group labels} |
GroupBy.indices |
dict {group name -> group indices} |
GroupBy.get_group (name[, obj]) |
Constructs NDFrame from group with provided name |
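The iteration and lookup members above can be sketched on a tiny, made-up frame grouped by a single column:

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "b", "a"], "val": [1, 2, 3]})
grouped = df.groupby("key")

# Iterating yields (group name, sub-frame) pairs.
names = [name for name, _ in grouped]

# get_group returns the rows that belong to one group.
group_a = grouped.get_group("a")

# groups maps names to index labels; indices maps names to integer positions.
labels = {k: list(v) for k, v in grouped.groups.items()}
positions = {k: list(v) for k, v in grouped.indices.items()}

print(names, labels, positions)
```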
Function application¶
GroupBy.apply (func, *args, **kwargs) |
Apply function and combine results together in an intelligent way. |
GroupBy.aggregate (func, *args, **kwargs) |
|
GroupBy.transform (func, *args, **kwargs) |
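The three function-application methods differ mainly in the shape of what they return. The sketch below, on made-up data, contrasts them on a single grouped column.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "b", "a"], "val": [1, 2, 3]})
grouped = df.groupby("key")["val"]

# aggregate reduces each group to a single value.
totals = grouped.aggregate("sum")

# transform returns a result aligned with the original rows.
centered = grouped.transform(lambda x: x - x.mean())

# apply runs an arbitrary function per group and combines the results.
ranges = grouped.apply(lambda x: x.max() - x.min())

print(totals.to_dict(), centered.tolist(), ranges.to_dict())
```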
Computations / Descriptive Stats¶
GroupBy.count () |
Compute count of group, excluding missing values |
GroupBy.cumcount ([ascending]) |
Number each item in each group from 0 to the length of that group - 1. |
GroupBy.first () |
Compute first of group values |
GroupBy.head ([n]) |
Returns first n rows of each group. |
GroupBy.last () |
Compute last of group values |
GroupBy.max () |
Compute max of group values |
GroupBy.mean () |
Compute mean of groups, excluding missing values |
GroupBy.median () |
Compute median of groups, excluding missing values |
GroupBy.min () |
Compute min of group values |
GroupBy.nth (n[, dropna]) |
Take the nth row from each group if n is an int, or a subset of rows if n is a list of ints. |
GroupBy.ohlc () |
Compute open, high, low and close values of a group, excluding missing values |
GroupBy.prod () |
Compute prod of group values |
GroupBy.size () |
Compute group sizes |
GroupBy.sem ([ddof]) |
Compute standard error of the mean of groups, excluding missing values |
GroupBy.std ([ddof]) |
Compute standard deviation of groups, excluding missing values |
GroupBy.sum () |
Compute sum of group values |
GroupBy.var ([ddof]) |
Compute variance of groups, excluding missing values |
GroupBy.tail ([n]) |
Returns last n rows of each group |
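Several of the descriptive methods above differ in how they treat missing values and in what they return. A minimal sketch with one missing value in hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "b", "a", "a"], "val": [1.0, 2.0, None, 4.0]})
grouped = df.groupby("key")

sizes = grouped.size()            # rows per group, missing values included
counts = grouped["val"].count()   # non-missing values per group
means = grouped["val"].mean()     # mean of the non-missing values
order = grouped.cumcount()        # 0-based position of each row within its group
first_rows = grouped.head(1)      # first row of each group, original index kept

print(sizes.to_dict(), counts.to_dict(), means.to_dict())
```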
The following methods are available in both SeriesGroupBy and DataFrameGroupBy objects, but may differ slightly: the DataFrameGroupBy version usually permits the specification of an axis argument, and often an argument indicating whether to restrict application to columns of a specific data type.
DataFrameGroupBy.bfill ([axis, inplace, ...]) |
Synonym for NDFrame.fillna(method='bfill') |
DataFrameGroupBy.cummax ([axis, dtype, out, ...]) |
Return cumulative max over requested axis. |
DataFrameGroupBy.cummin ([axis, dtype, out, ...]) |
Return cumulative min over requested axis. |
DataFrameGroupBy.cumprod ([axis]) |
Cumulative product for each group |
DataFrameGroupBy.cumsum ([axis]) |
Cumulative sum for each group |
DataFrameGroupBy.describe ([percentiles, ...]) |
Generate various summary statistics, excluding NaN values. |
DataFrameGroupBy.all ([axis, bool_only, ...]) |
Return whether all elements are True over requested axis |
DataFrameGroupBy.any ([axis, bool_only, ...]) |
Return whether any element is True over requested axis |
DataFrameGroupBy.corr ([method, min_periods]) |
Compute pairwise correlation of columns, excluding NA/null values |
DataFrameGroupBy.cov ([min_periods]) |
Compute pairwise covariance of columns, excluding NA/null values |
DataFrameGroupBy.diff ([periods, axis]) |
1st discrete difference of object |
DataFrameGroupBy.ffill ([axis, inplace, ...]) |
Synonym for NDFrame.fillna(method='ffill') |
DataFrameGroupBy.fillna ([value, method, ...]) |
Fill NA/NaN values using the specified method |
DataFrameGroupBy.hist (data[, column, by, ...]) |
Draw histogram of the DataFrame’s series using matplotlib / pylab. |
DataFrameGroupBy.idxmax ([axis, skipna]) |
Return index of first occurrence of maximum over requested axis. |
DataFrameGroupBy.idxmin ([axis, skipna]) |
Return index of first occurrence of minimum over requested axis. |
DataFrameGroupBy.mad ([axis, skipna, level]) |
Return the mean absolute deviation of the values for the requested axis |
DataFrameGroupBy.pct_change ([periods, ...]) |
Percent change over given number of periods. |
DataFrameGroupBy.plot |
Class implementing the .plot attribute for groupby objects |
DataFrameGroupBy.quantile ([q, axis, ...]) |
Return values at the given quantile over requested axis, a la numpy.percentile. |
DataFrameGroupBy.rank ([axis, numeric_only, ...]) |
Compute numerical data ranks (1 through n) along axis. |
DataFrameGroupBy.resample (rule[, how, axis, ...]) |
Convenience method for frequency conversion and resampling of regular time-series data. |
DataFrameGroupBy.shift ([periods, freq, axis]) |
Shift each group by periods observations |
DataFrameGroupBy.skew ([axis, skipna, level, ...]) |
Return unbiased skew over requested axis |
DataFrameGroupBy.take (indices[, axis, ...]) |
Analogous to ndarray.take |
DataFrameGroupBy.tshift ([periods, freq, axis]) |
Shift the time index, using the index’s frequency if available |
The following methods are available only for SeriesGroupBy objects.
SeriesGroupBy.nlargest (*args, **kwargs) |
Return the largest n elements. |
SeriesGroupBy.nsmallest (*args, **kwargs) |
Return the smallest n elements. |
SeriesGroupBy.nunique ([dropna]) |
|
SeriesGroupBy.unique () |
Return array of unique values in the object. |
SeriesGroupBy.value_counts ([normalize, ...]) |
The following methods are available only for DataFrameGroupBy objects.
DataFrameGroupBy.corrwith (other[, axis, drop]) |
Compute pairwise correlation between rows or columns of two DataFrame objects. |
Style¶
Styler objects are returned by pandas.DataFrame.style.
Constructor¶
Styler (data[, precision, table_styles, ...]) |
Helps style a DataFrame or Series according to the data with HTML and CSS. |
Style Application¶
Styler.apply (func[, axis, subset]) |
Apply a function column-wise, row-wise, or table-wise, updating the HTML representation with the result. |
Styler.applymap (func[, subset]) |
Apply a function elementwise, updating the HTML representation with the result. |
Styler.set_precision (precision) |
Set the precision used to render. |
Styler.set_table_styles (table_styles) |
Set the table styles on a Styler |
Styler.set_caption (caption) |
Set the caption on a Styler |
Styler.set_properties ([subset]) |
Convenience method for setting one or more non-data-dependent properties on each cell. |
Styler.set_uuid (uuid) |
Set the uuid for a Styler. |
Styler.clear () |
“Reset” the styler, removing any previously applied styles. |
Builtin Styles¶
Styler.highlight_max ([subset, color, axis]) |
Highlight the maximum by shading the background |
Styler.highlight_min ([subset, color, axis]) |
Highlight the minimum by shading the background |
Styler.highlight_null ([null_color]) |
Shade the background null_color for missing values. |
Styler.background_gradient ([cmap, low, ...]) |
Color the background in a gradient according to the data in each column (optionally row). |
Styler.bar ([subset, axis, color, width]) |
Color the background color proportional to the values in each column. |
Style Export and Import¶
Styler.render () |
Render the built up styles to HTML |
Styler.export () |
Export the styles applied to the current Styler. |
Styler.use (styles) |
Set the styles on the current Styler, possibly using styles from Styler.export. |