pyspssio.read_sav
- pyspssio.read_sav(spss_file, row_offset=0, row_limit=None, usecols=None, convert_datetimes=True, include_user_missing=True, chunksize=None, locale=None, string_nan='', data_only=False)
Read data and metadata from SPSS file
- Parameters:
spss_file (str) – SPSS filename (.sav or .zsav)
row_offset (int (default: 0)) – Number of rows to skip
row_limit (int (default: None)) – Maximum number of rows to return
usecols (Union[list, tuple, str, callable, None] (default: None)) – Columns to use (None for all columns)
convert_datetimes (bool (default: True)) – Convert SPSS datetimes to Python/Pandas datetime columns; False returns seconds from October 15, 1582 (SPSS start date)
include_user_missing (bool (default: True)) – Whether to keep user missing values or replace them with NaN (numeric) and "" (strings)
chunksize (int (default: None)) – Number of rows to return per chunk
locale (str (default: None)) – Locale to use when I/O module is operating in codepage mode
string_nan (Any (default: '')) – Value to return for empty strings
data_only (bool (default: False)) – Default (False) returns tuple of dataframe, metadata; if True, only return dataframe; not applicable when chunksize is specified (always returns dataframe generator)
- Return type:
Union[DataFrame, Tuple[DataFrame, dict], Generator[DataFrame, None, None]]
- Returns:
tuple – DataFrame, metadata
generator – DataFrame(s) with chunksize number of rows (only if chunksize is specified)
Examples
Read data and metadata:
df, meta = pyspssio.read_sav("spss_file.sav")
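Read only part of a file. The keyword arguments in the signature above can be combined; what follows is a minimal sketch, not part of the library. The helper name read_subset and its reader argument are hypothetical, introduced here only so the call can be exercised without an SPSS file. By default it forwards to pyspssio.read_sav using the documented usecols, row_offset, row_limit and data_only parameters.

```python
def read_subset(path, columns, skip=0, limit=None, reader=None):
    """Load selected columns and rows, returning only the DataFrame.

    `reader` is a hypothetical injection point for testing; when it is
    omitted, pyspssio.read_sav is used.
    """
    if reader is None:
        import pyspssio  # imported lazily so the sketch loads without the library
        reader = pyspssio.read_sav
    return reader(
        path,
        usecols=columns,   # list of column names to keep
        row_offset=skip,   # number of leading rows to skip
        row_limit=limit,   # maximum number of rows to return
        data_only=True,    # return the DataFrame without the metadata dict
    )
```

For example, read_subset("spss_file.sav", ["var1", "var2"], limit=100) would return at most 100 rows of two columns, where var1 and var2 are placeholder column names.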
Read metadata only:
meta = pyspssio.read_metadata("spss_file.sav")
Read data in chunks of chunksize (number of rows/records):
for df in pyspssio.read_sav("spss_file.sav", chunksize=1000):
    pass  # do something with each chunk
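Aggregate over chunks. Because read_sav returns a generator of DataFrames when chunksize is given, a result can be accumulated one chunk at a time instead of loading the whole file. The sketch below assumes nothing beyond that documented behavior; total_rows and count_sav_rows are hypothetical helper names, not library functions.

```python
def total_rows(chunks):
    """Return the total number of rows across an iterable of chunks."""
    # len() of a DataFrame is its row count, so this works on any sized chunk.
    return sum(len(chunk) for chunk in chunks)


def count_sav_rows(path, chunksize=1000):
    """Count the rows of an SPSS file by streaming it in chunks."""
    import pyspssio  # imported lazily so total_rows is usable without it
    return total_rows(pyspssio.read_sav(path, chunksize=chunksize))
```

Keeping the aggregation separate from the file access means it can be checked against any iterable of sized objects before it is pointed at a real file.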