By clicking the Summaries tab, you can display summary statistics for each of the imported attributes:
- numerical-categorical: the possible types of the attribute; if the attribute can be of both types, the right panel lets you choose between them (see below)
- min-max-avg: the range and average of numerical attributes
- %missing: the percentage of examples in which this attribute is missing
- nr categories: the number of distinct values found for attributes that could be categorical
- distribution: a histogram (numerical) or a pie chart (categorical)
If an attribute can be both categorical and numerical, you can select its type in the right panel. For instance, the binary attribute AtLeast3FusedRings takes the values 0 and 1. These values can be interpreted either as proper numbers or as category labels: in the first case the attribute should be marked as numerical (the default), in the second as categorical. Here, 0 and 1 encode FALSE and TRUE respectively, so they should be interpreted as categories.
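To make the statistics in the Summaries tab concrete, here is a minimal Python sketch of how such per-attribute figures could be computed from raw imported values. It is not the tool's implementation. The function name `summarize`, the use of `None` for a missing value, and the `MAX_CATEGORIES` cutoff that decides whether an attribute "could be categorical" are all illustrative assumptions. Plotting the distribution is omitted.

```python
from statistics import mean

# Hypothetical cutoff (an assumption, not taken from the tool): an attribute
# with at most this many distinct values is considered possibly categorical.
MAX_CATEGORIES = 10


def summarize(values):
    """Summarize one attribute column given as raw strings; None = missing."""
    present = [v for v in values if v is not None]
    pct_missing = 100.0 * (len(values) - len(present)) / len(values) if values else 0.0

    # Numerical is possible only if every present value parses as a number.
    try:
        numbers = [float(v) for v in present]
        can_be_numerical = bool(numbers)
    except ValueError:
        numbers = []
        can_be_numerical = False

    # Categorical is possible if the number of distinct raw values is small.
    distinct = set(present)
    can_be_categorical = 0 < len(distinct) <= MAX_CATEGORIES

    summary = {
        "types": [name for name, ok in (("numerical", can_be_numerical),
                                        ("categorical", can_be_categorical)) if ok],
        "%missing": pct_missing,
    }
    if can_be_numerical:
        summary["min-max-avg"] = (min(numbers), max(numbers), mean(numbers))
    if can_be_categorical:
        summary["nr categories"] = len(distinct)
    return summary


# A 0/1 flag such as AtLeast3FusedRings qualifies as both types.
print(summarize(["0", "1", "1", None, "0", "1"]))
# Text labels can only be categorical.
print(summarize(["aromatic", "aliphatic", None]))
```

For the 0/1 column, both types are listed, which is exactly the situation where the right panel asks you to choose.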
Note that the numerical/categorical labeling of attributes affects how hypotheses are built. Whether attributes are used as targets or as background knowledge components, they are treated differently depending on their type. It is therefore recommended to inspect the summary data carefully and, whenever appropriate, change the default setting from numerical to categorical before starting to create hypotheses (see next Section).
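When inspecting many attributes, a simple heuristic can help spot likely candidates for relabeling: numeric-looking attributes that take only a handful of distinct values, such as 0/1 flags. The Python sketch below is a hypothetical review aid, not a feature of the tool. The function name `suggest_categorical`, the `max_distinct` cutoff, and the example attributes `MolWeight` and `RingType` are assumptions for illustration; the final choice of type should still be made by inspecting each attribute's meaning.

```python
def suggest_categorical(columns, max_distinct=2):
    """Return names of attributes whose values all parse as numbers but take
    at most max_distinct distinct values (e.g. 0/1 flags).

    columns maps attribute name -> list of raw string values (None = missing).
    The cutoff is an illustrative assumption, not a rule from the tool.
    """
    suggestions = []
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        try:
            distinct_numbers = {float(v) for v in present}
        except ValueError:
            continue  # not numeric at all, so it can only be categorical anyway
        if 0 < len(distinct_numbers) <= max_distinct:
            suggestions.append(name)
    return suggestions


columns = {
    "AtLeast3FusedRings": ["0", "1", "1", "0", None],
    "MolWeight": ["180.2", "151.2", "194.2"],  # hypothetical attribute
    "RingType": ["aromatic", "aliphatic"],     # hypothetical attribute
}
print(suggest_categorical(columns))  # flags only AtLeast3FusedRings
```

Raising `max_distinct` flags more attributes for review at the cost of more false alarms, such as genuine counts that happen to have few values.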
© 2002-2007 PharmaDM, NV. All rights reserved.