pyspssio.Header

class pyspssio.Header(*args, **kwargs)[source]

Bases: SPSSFile

Class for getting and setting metadata attributes

__init__(*args, **kwargs)[source]

Methods

__init__(*args, **kwargs)

close()

Close file

commit_header()

Finalize metadata

open()

Open file

set_locale(locale)

Set I/O module to a specific locale

Attributes

case_count

Number of cases

case_size

Record case size (in bytes)

case_weight_var

Case weight variable

compression

Compression level

file_attributes

Arbitrary user-defined file attributes

file_encoding

File encoding reported by I/O module

interface_encoding

I/O interface mode (Unicode or code page)

is_compatible_encoding

Check encoding compatibility

mrsets

Multi response set definitions

mrsets_count

Number of multi response set definitions

release_info

Basic file information

var_alignments

Variable alignments

var_attributes

Variable attributes

var_column_widths

Column display widths

var_compat_names

Short (8-byte) variable names

var_count

Number of variables

var_formats

Variable formats as strings

var_formats_tuple

Variable formats as tuples in the form (type, width, decimals)

var_handles

Variable handle references

var_labels

Variable labels

var_measure_levels

Variable measure levels

var_missing_values

Missing values

var_names

Variable names

var_roles

Variable roles

var_sets

Variable sets

var_types

Variable types

var_value_labels

Variable value labels

property file_attributes: dict

Arbitrary user-defined file attributes

property var_names: list

Variable names

May return a filtered list when returned as part of a metadata object if only a subset of variables is requested (e.g., via usecols in read_sav).

property var_types: dict

Variable types

May return a filtered dictionary when returned as part of a metadata object if only a subset of variables is requested (e.g., via usecols in read_sav).

property var_handles: dict

Variable handle references

Used when calling I/O module procedures that take variable handles instead of variable names as arguments.

property var_formats_tuple: dict

Variable formats as tuples in the form (type, width, decimals)

For example, (5, 8, 2) instead of F8.2.
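To illustrate how the tuple and string forms relate, the sketch below converts a format tuple into its string representation. The helper `format_tuple_to_str` and the lookup table are hypothetical and not part of pyspssio. Only code 5 = F is taken from the example above; code 1 = A is assumed from the SPSS I/O module's format constants, and dropping a zero decimals field is also an assumption.

```python
# Partial lookup of format type codes (assumption: 1 = A, 5 = F).
FORMAT_CODES = {1: "A", 5: "F"}


def format_tuple_to_str(fmt):
    """Render a (type, width, decimals) tuple as an SPSS-style format string."""
    fmt_type, width, decimals = fmt
    name = FORMAT_CODES[fmt_type]  # raises KeyError for codes not listed here
    # Decimals are written out only when non-zero (an assumption of this sketch).
    return f"{name}{width}.{decimals}" if decimals else f"{name}{width}"


print(format_tuple_to_str((5, 8, 2)))  # F8.2
```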

property var_formats: dict

Variable formats as strings

Use var_formats_tuple property for formats as tuples

property var_measure_levels: dict

Variable measure levels

Measure levels are returned as strings. When setting, either strings or numeric codes are accepted.

  • 0 = unknown

  • 1 = nominal

  • 2 = ordinal

  • 3 = scale
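The code-to-name mapping above can be sketched as a small lookup. The function `measure_level_name` is hypothetical and only illustrates the documented behavior (strings are returned, either form is accepted); the case-insensitive handling of string input is an assumption.

```python
# Numeric codes and their string names, as listed above.
MEASURE_LEVELS = {0: "unknown", 1: "nominal", 2: "ordinal", 3: "scale"}


def measure_level_name(value):
    """Return the string name for a measure level given as a code or a name."""
    if isinstance(value, str):
        name = value.lower()  # assumption: names are matched case-insensitively
        if name not in MEASURE_LEVELS.values():
            raise ValueError(f"unknown measure level: {value!r}")
        return name
    return MEASURE_LEVELS[int(value)]


print(measure_level_name(3))          # scale
print(measure_level_name("Nominal"))  # nominal
```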

property var_alignments: dict

Variable alignments

Alignments are returned as strings. When setting, either strings or numeric codes are accepted.

  • 0 = left

  • 1 = right

  • 2 = center

property var_column_widths: dict

Column display widths

Set column widths manually, or specify 0 to let SPSS assign a width automatically.

property var_labels: dict

Variable labels

property var_roles: dict

Variable roles

Roles are returned as strings. When setting, either strings or numeric codes are accepted.

  • 0 = input

  • 1 = target

  • 2 = both

  • 3 = none

  • 4 = partition

  • 5 = split

  • 6 = frequency

  • 7 = recordid

property var_value_labels: dict

Variable value labels

Nested dictionary of variables with their value labels (if defined) as sub-dictionaries

Note: value labels only work for numeric and short string variables (length <= 8)
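As an illustration of the nested structure and of the length restriction in the note, the sketch below builds a value-label dictionary and flags variables that could not carry value labels. All variable names are hypothetical, and the type convention (0 = numeric, n > 0 = string of n bytes) is assumed from the SPSS I/O module.

```python
# Hypothetical variables: "gender" is numeric, "region" is a 2-byte string,
# "comment" is a 200-byte string (assumed convention: 0 = numeric, n = length).
var_types = {"gender": 0, "region": 2, "comment": 200}

# Outer keys are variable names; inner dictionaries map values to labels.
var_value_labels = {
    "gender": {1: "Male", 2: "Female"},
    "region": {"NE": "Northeast", "SW": "Southwest"},
}


def can_have_value_labels(var_type):
    """Numeric variables and strings of length <= 8 can carry value labels."""
    return var_type == 0 or var_type <= 8


unsupported = [n for n, t in var_types.items() if not can_have_value_labels(t)]
print(unsupported)  # ['comment']
```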

property mrsets_count: int

Number of multi response set definitions

Needed if using spssGetMultRespDefByIndex. Otherwise, len(mrsets) should be equivalent.

property mrsets: dict

Multi response set definitions

Multi response sets contain the following attributes:

  • label : set label

  • is_dichotomy : whether the set is dichotomous (True) or category (False)

  • counted_value : counted value for dichotomous sets

  • use_category_labels : whether to use counted value labels instead of variable labels

  • use_first_var_label : whether to use first var label as set label

  • variable_list : list of variables in the set

Notes

mrset name must begin with a “$”.

variable_list is the only required attribute. However, if this is the only included attribute, then is_dichotomy is assumed to be False.

If is_dichotomy is True, counted_value must be specified. If is_dichotomy is None and counted_value is not None, is_dichotomy is assumed to be True.

Numeric dichotomous sets only accept integers for a counted value.

use_category_labels is only applicable for dichotomous sets. Setting this to True turns the set into an “extended” mrset definition.

use_first_var_label is only applicable when use_category_labels is True. Specifying a set label when use_first_var_label is True might result in an invalid mrset definition.

Examples

Category (C) Set:

{"$mc_mrset": {
    "label": "This is an MC set",
    "variable_list": ["var1", "var2", "var3"]
}}

Dichotomous (D) Set:

{"$md_mrset": {
    "label": "This is an MD set",
    "counted_value": 1,
    "variable_list": ["resp1", "resp2", "resp3"]
}}

Dichotomous (E - Extended) Set:

{"$md_mrset": {
    "counted_value": 1,
    "use_category_labels": True,
    "use_first_var_label": True,
    "variable_list": ["cat1", "cat2", "cat3"]
}}
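
The inference rules in the Notes can be sketched as a small function. `resolve_mrset` is a hypothetical helper written for illustration, not pyspssio's own validation code; it only covers the name check, the required variable_list, and how is_dichotomy is inferred.

```python
def resolve_mrset(name, definition):
    """Fill in is_dichotomy for one mrset definition using the rules in Notes."""
    if not name.startswith("$"):
        raise ValueError("mrset name must begin with '$'")
    if "variable_list" not in definition:
        raise ValueError("variable_list is required")

    resolved = dict(definition)
    is_dichotomy = resolved.get("is_dichotomy")
    counted_value = resolved.get("counted_value")

    if is_dichotomy is None:
        # A counted value implies a dichotomous set; otherwise a category set.
        is_dichotomy = counted_value is not None
    if is_dichotomy and counted_value is None:
        raise ValueError("counted_value must be specified for dichotomous sets")

    resolved["is_dichotomy"] = is_dichotomy
    return resolved


category = resolve_mrset("$mc_mrset", {"variable_list": ["var1", "var2"]})
dichotomy = resolve_mrset(
    "$md_mrset", {"counted_value": 1, "variable_list": ["resp1", "resp2"]}
)
print(category["is_dichotomy"], dichotomy["is_dichotomy"])  # False True
```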

property case_size: int

Record case size (in bytes)

Raw number of bytes for a single case record. It can be calculated manually by adding all variable types rounded up to the nearest multiple of 8.

This is the buffer size used to read a whole case record at once. It is not necessarily the number of bytes used to store a case record on disk (depending on compression).
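The manual calculation can be sketched as follows. `estimate_case_size` is a hypothetical helper; it assumes that a variable type of 0 denotes a numeric variable stored as one 8-byte value and that a type of n > 0 denotes a string of n bytes. It does not account for any special handling the I/O module may apply to very long strings.

```python
def estimate_case_size(var_types):
    """Sum per-variable storage, each rounded up to a multiple of 8 bytes."""
    total = 0
    for var_type in var_types.values():
        # Assumption: type 0 is numeric (8 bytes); n > 0 is a string of n bytes.
        n_bytes = 8 if var_type == 0 else var_type
        total += -(-n_bytes // 8) * 8  # ceiling to the next multiple of 8
    return total


# Hypothetical variables: one numeric, a 2-byte string, a 20-byte string.
print(estimate_case_size({"age": 0, "region": 2, "name": 20}))  # 40
```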

property case_weight_var: str

Case weight variable

Variable set as the “weight” variable in SPSS. Must be a scale numeric variable.

property var_missing_values: dict

Missing values

Missing value definitions may contain three keys:

  1. lo = low value used in missing range

  2. hi = high value used in missing range

  3. values = list of discrete values set as user missing

For missing ranges, the following keywords can be used in place of numeric values:
  • low = -inf, lo, low, lowest

  • high = inf, hi, high, highest
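
A missing-value definition and the keyword substitution can be sketched as below. The variable names and the helper `resolve_bound` are hypothetical; mapping keywords to `math.inf` is only an illustration of the equivalence listed above, not a description of how the library represents these bounds internally.

```python
import math

# Keywords accepted in place of numeric range bounds, as listed above.
LOW_KEYWORDS = {"-inf", "lo", "low", "lowest"}
HIGH_KEYWORDS = {"inf", "hi", "high", "highest"}


def resolve_bound(value):
    """Translate a range keyword to -inf/inf; pass numeric bounds through."""
    if isinstance(value, str):
        key = value.lower()
        if key in LOW_KEYWORDS:
            return -math.inf
        if key in HIGH_KEYWORDS:
            return math.inf
        raise ValueError(f"unrecognized range keyword: {value!r}")
    return value


# Hypothetical definitions: a range plus one discrete value, and discrete only.
var_missing_values = {
    "income": {"lo": "lowest", "hi": 0, "values": [999999]},
    "status": {"values": [-1, -2, -3]},
}

income = var_missing_values["income"]
print(resolve_bound(income["lo"]), resolve_bound(income["hi"]))  # -inf 0
```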

property var_attributes: dict

Variable attributes

These are arbitrary variable properties, analogous to file attributes.

property var_compat_names: dict

Short (8-byte) variable names

Dictionary of variable names with their “compatible” short 8-byte counterparts

property var_sets: dict

Variable sets

These are NOT multi response sets. These variable sets are groupings of variables that can be selected in the SPSS application as a sort of view filter.

SPSS apparently may use the 8-byte compatible variable names for this property. It is currently not possible to obtain the auto-generated compatible names until the dictionary is committed, which means setting this property potentially requires first committing a dictionary with all variables, and then rewriting it after obtaining the compatible variable names.

Set names created in the normal SPSS application may contain spaces and special characters. However, the I/O module returns an SPSS_INVALID_VARSETDEF error when these are included. When an “=” sign is included in the set name, the set name is truncated.

commit_header()[source]

Finalize metadata

This function is used to finalize the header information before writing data. Once this function is called, no further metadata modification is allowed.