醫(yī)學(xué)統(tǒng)計(jì)學(xué)-電子教材:Data Preparation

來(lái)源：南方醫(yī)科大學(xué)精品課程網(wǎng) 精品課程網(wǎng)

醫(yī)學(xué)統(tǒng)計(jì)學(xué):電子教材 Data Preparation:ContentDatapreparationTransformationsFillSeriesMeasurementscalesanddatatypesConvertingbetweenunitsLogtransformLogittransformationProbittransformationAngulartransformationCumulativetransformationLadder

Content

Page Data preparation

Page Transformations

Page Fill Series

Page Measurement scales and data types

Page Converting between units

Page Log transform

Page Logit transformation

Page Probit transformation

Page Angular transformation

Page Cumulative transformation

Page Ladder of powers transformation

Page Apply function

Page Date and time intervals

Page Ranking

Page Normal scores

Page Standardization

Page Sort data to new column

Page Sort within worksheet

Page Pairwise

Page Pairwise differences

Page Pairwise means

Page Pairwise slopes

Page Text to numbers

Page Random number generation

Page Rotate data block

Page Dummy variables

Page Search and replace

Page Clear missing data

Page Categorise

Page Extract a subset of data

Page Split by group identifier

Page Combine groups and form an identifier

Page Tabulate

Page Detabulate

Page Crosstabs

Arial;Courier New;Symbol;Helvetica;Courier;Tms Rmn;Helv;New York;System;Wingdings;MSMincho ;Batang ;SimSun ;PMingLiU ;MS Gothic ;Dotum ;SimHei ;MingLiU ;Mincho ;Gulim ;Century;AngsanaNew;CordiaNew;Mangal;Latha;Sylfaen;Vrinda;Raavi;Shruti;Sendnya;Gautami;Tunga;Estrangelo Edessa;Kartika;Arial Unicode MS;Tahoma;Verdana;TimesNew Roman CE;TimesNew Roman Cyr;Times New Roman Greek;Times New Roman Tur;Times New Roman (Hebrew);Times New Roman (Arabic);Times New RomanBaltic;Times New Roman (Vietnamese);Arial CE;Arial Cyr;Arial Greek;ArialTur;Arial (Hebrew);Arial (Arabic);Arial Baltic;Arial (Vietnamese);Courier NewCE;Courier New Cyr;Courier New Greek;Courier New Tur;Courier New(Hebrew);Courier New (Arabic);Courier New Baltic;Courier New(Vietnamese);Sylfaen Cyr;Sylfaen Greek;Verdana CE;Verdana Cyr;VerdanaGreek;Verdana Tur;Verdana Baltic;Verdana (Vietnamese);

Data preparation. 3

Transformations. 4

Fill series. 4

Measurement scales and data types. 5

Converting between units. 6

Logarithmic transformation. 8

Logarithmic transformation. 9

Probit transformation. 9

Angular transformation. 10

Cumulative transformation. 10

Ladder of powers transformations. 11

Apply function. 11

Date and time intervals. 13

Ranking.. 14

Normal scores. 15

Standardization. 16

Sort data to new column. 17

Sort within worksheet. 18

Pairwise. 19

Pairwise differences. 20

Pairwise means. 20

Pairwise slopes. 21

Text to numbers. 22

Random number generation. 22

Rotate data block. 26

Dummy variables. 27

Search and replace. 28

Clear missing data.. 31

Categorise. 32

Extract a subset of data.. 33

Split by group identifier.. 34

Combine groups and form an identifier.. 35

Tabulate. 36

Detabulate. 36

Crosstabs. 37

Datapreparation

·WORKBOOK

·EXCELLINKS

·MISSINGDATA

·TRANSFORMATIONS

·APPLYFUNCTION

·NORMALSCORES

·STANDARDIZATION

·CONVERTINGBETWEEN UNITS

·RANKING

·SORTINGDATA TO NEW COLUMNS IN WORKSHEETS

·SORTINGEXISTING CELLS WITHIN WORKSHEETS

·DATE& TIME INTERVALS

·PAIRWISE

·RANDOMNUMBERS

·ROTATEDATA BLOCK

·DUMMYVARIABLES

·TEXTTO NUMBERS

·SEARCH& REPLACE

·CLEARMISSING DATA

·CATEGORISE

·EXTRACTVARIABLES

·COMBINEGROUPS AND CREATE A GROUP IDENTDIFIER

·SPLITBY GROUP IDENTIFIER

·TABULATE

·DETABULATE

·CROSSTABS

·FREQUENCIES

Download a free 10 dayStatsDirect trial

Transformations

Menu location: Data_Transformations.

This section deals with somecommon transformations:

·Log(natural)

·Log (base10)

·Logit

·Probit

·Angular

·Cumulative

·Ladderof powers

Please note that othertransformations can be achieved by applyinga function to a column of data as described below.

Transforming a variable changesits measurement scale. Reasons for transformation include stabilizing variance,linearizing relationships, normalizing distributions and making data easier tohandle numerically (Armitage and Berry,1994; Bland and Altman, 1996c). Details of the above transformations arediscussed under their own titles, here we consider how to deal with othertransformations, for example a power transform:

Power

Example, for square root applyfunction v1^.5 or sqr(v1), for standardized version (v1^.5-1)/.5

Limitation, v1 must be ³ 0.

Standardized form: y = (x^c-1) /c

For c = 0.5 (square root):

Variance of x can be stabilizedespecially with Poisson distributed data.

For c = -1 (reciprocal):

Variance of x can be stabilizedwith distant high outliers, e.g. very long survival times c.f. mainly shorterones.

For c = 2 (square):

Variance can be stabilized whenvar(x) decreases with increasing x and with negatively skewed x.

Fill series

Menu location: Data_FillSeries.

This function fills a series ofrows in a worksheet with numbers according to a formula that you specify.

The default formula is VX+1 witha starting value of 1, which will give a sequence of numbers from 1 to thelength of series that you specify. A formula of VX+10 would give a series ofintegers spaced at 10, i.e. 10, 20 30, 40 etc. Complex non-linear formulae canbe used; see calculatorfor a list of functions that you can use in a formula.

Download a free 10 dayStatsDirect trial

Measurementscales and data types

It is important, in statisticalanalysis, to know about the different scales of measurement, these are:

·INTERVAL
Scale with a fixed and defined interval e.g.temperature or time.

·ORDINAL
Scale for ordering observations from low tohigh with any ties attributed to lack of measurement sensitivity e.g.score from a questionnaire.

·NOMINAL with order
Scale for grouping into categories with order e.g. mild, moderate orsevere. This can be difficult to separate from ordinal.

·NOMINAL without order
Scale for grouping into unique categoriese.g. eye colour.

·DICHOTOMOUS
As for nominal but two categories only e.g.male/female.

In addition to the classificationof measurement scales, other related terms are used to describe types of data:

·CATEGORICAL vs. NUMERICAL (quantitative vs. qualitative)
Data that represent categories, such as dichotomous (two categories) andnominal (more than two categories) observations, are collectively calledcategorical (qualitative). Data that are counted or measured using anumerically defined method are called numerical (quantitative).

·DISCRETE vs. ORDERED CATEGORICAL
Discrete data arise from observations that can only take certain numericalvalues, usually counts such as number of children or number of patientsattending a clinic in a year. Ordered categorical data are sometimes treated asdiscrete data, this is wrong. For example, using the Registrar General'sclassification of social class, it would be wrong to say that class I is fivetimes the socio-economic status as class V, as there is not a strict numericalrelationship between these categories. It follows, therefore, that averagesocial class is a meaningless statistic. Thus, ordered categorical data shouldnot be treated as discrete data for statistical analysis. Discrete data may betreated as ordered categorical data in statistical analysis, but someinformation is lost in doing so.

·CONTINUOUS
Continuous data are numerical data that can theoretically be measured ininfinitely small units. For example, blood pressure is usually measured to thenearest 2mm Hg, but could bemeasured with much greater resolution of difference. The interval measurementscale is intended for continuous data. Sometimes continuous data are givendiscrete values at certain thresholds, for example age a last birthday is adiscrete value but age itself is a continuous quantity; in these situations itis reasonable to treat discrete values as continuous. Remember that informationis lost when continuous data are recorded only in ranges (ordered categories),and the statistical analysis of continuous data is more powerful than that ofcategorical data.

·PERCENTAGES and RATIOS
Percentages or ratios summarise two pieces of information, namely theirconstituent numerator and denominator values. Simple ratios (0 to 1, i.e. thedenominator is the maximum possible value that the numerator can take) can be treatedas continuous data. More difficult to analyse data arise when the ratiorepresents a change, and the value can be negative. Ratios of observationscompared with reference values, e.g. height relative to the mean of a referencepopulation for a given sex and age, are difficult to handle as values may falleither side of 1 (100%).

Many statistical methods areappropriate only for data of certain measurement scales. When selectinga statistical method, it is essential to understand how the data to beanalysed were measured. The best stage of investigation for ponderingmeasurement scales is the design stage, at which the statistical limitationsimposed by certain measurement scales may influence your choice of observationsand methods of measurement.

Note that transforming datachanges their measurement scales.

Convertingbetween units

Converting from 1:	multiply by:	to give 1:
Atmospheres	76	CM of Mercury (0°C)
Atmospheres	29.921	Inches of Mercury (32°F)
Atmospheres	2116.32	Pounds/sq ft
Atmospheres	14.696	Pounds/sq ft
Barrels, oil	42	Gallon (US)
Barrels (API)	0.159	Meter³
BTU (15.56°C)	1055	Joule
BTU/Minute	0.0236	Horsepower
Calories (mean)	4.19	Joule
Centimetres	0.3937	Inches
Cubic centimetres	0.061	Cubic inches
Cubic Centimetres	0.00022	Gallons (British)
Cubic Centimetres	0.00026	Gallons (US)
Cubic Feet	0.037	Cubic Yards (US)
Cubic Feet	6.2288	Gallons (British)
Cubic Feet	7.4805	Gallons (US)
Cubic Feet	28.3162	Litres
Cubic Inches	16.3872	Cubic Cm
Cubic Meters	35.314	Cubic Feet
Cubic Meters	219.969	Gallons (British)
Cubic Meters	264.173	Gallons (US)
Degrees (F)	(Fahrenheit - 32) / 1.8	Degrees (C)
Degrees (C)	(Celsius * 1.8) + 32	Degrees (F)
Feet	0.3048	Meters
Feet of Water (39.2°F)	0.8826	Inches of Mercury (32°F)
Feet of Water (39.2°F)	62.427	Pounds/sq ft
Feet/minute	0.0183	Kilometres/hour
Feet/minute	0.005	Meters/second
Gallons (British)	0.1605	Cubic Ft
Gallons (British)	1.2009	Gallons(US)
Gallons (British)	4.5459	Litres
Gallons (US)	0.1337	Cubic Ft
Gallons (US)	0.8327	Gallons(British)
Gallons (US)	3.7853	Litres
Grams	0.03527	Ounces
Horsepower	1.014	Cheval-Vapeur
Horsepower	745.7	Watts
Inches	2.54	Centimetres
Inches of Mercury (32°F)	0.0334	Atmospheres
Kilograms	2.2046	Pounds
Kilometres	3280	Feet
Kilometres	0.6213	Miles
Kilowatts	56.884	BTU/Minute
Litres	0.0353	Cubic Feet
Litres	0.2199	Gallons (British)
Litres	0.2641	Gallons (US)
Meters	3.2808	Feet
Miles	5280	Feet
Miles	1.6093	Kilometres
Miles	320	Rods
Miles (nautical)	6080	Feet
Millilitres	0.035	Ounces (Fluid-British)
Millilitres	0.0338	Ounces (Fluid-US)
Millimetres	0.039	Inches
Ounces	28.3495	Grams
Ounces (British)	28.4130	Cubic Cm
Ounces (US)	29.5737	Cubic Cm
Pounds	453.5924	Grams
Pounds/sq ft	0.000472	Atmospheres
Pounds/sq in	0.0680	Atmospheres
Pounds/sq in	2.036	Inches of Mercury (32°F)
Quarts (British)	1136.521	Cubic Cm
Quarts (US)	946.3586	Cubic Cm
Tons (long)	1016.047	Kilograms
Tons (long)	2240	Pounds
Tons (long)	1.016	Tons (metric)
Tons (long)	1.12	Tons (short)
Yards	91.44	Centimetres

Download a free 10 dayStatsDirect trial

Logarithmictransformation

Menu location: Data_Transformations_Log(Natural) and Data_Transformations_Log (Base 10).

Logarithmic transformation can beused to stabilise the variance of a sample (Bland, 2000).

The function named Log (natural)calculates the natural (Naperian, log to the base e) logarithm of the data youselect.

The function named Log (base 10)calculates the common (log to the base 10) logarithm of the data you select.

The results are stored in a newcolumn that is marked Log(natural):<name> or Log(base 10):<name>where <name> is the original column label. Indeterminable values (i.e.log of any data < 0) are marked as missing data unless you opt to add asuitable constant to all data (you are prompted if you need to do this). It isbest to consult with a statistician before using constants in logtransformations.

Effects of log transformation:

·Variance stabilisation.

·Increasing slopes in x in relation to another variable arelinearized.

·Positively skewed distributions of x are normalized.

Download a free 10 dayStatsDirect trial

Logarithmictransformation

Menu location: Data_Transformations_Log(Natural) and Data_Transformations_Log (Base 10).

Logarithmic transformation can beused to stabilise the variance of a sample (Bland, 2000).

The function named Log (natural)calculates the natural (Naperian, log to the base e) logarithm of the data youselect.

The function named Log (base 10)calculates the common (log to the base 10) logarithm of the data you select.

Effects of log transformation:

·Variance stabilisation.

·Increasing slopes in x in relation to another variable arelinearized.

·Positively skewed distributions of x are normalized.

Download a free 10 dayStatsDirect trial

Probittransformation

Menu location: Data_Transformations_Probit.

Probit is a common transformationfor linearizing sigmoid distributions of proportions (Armitage and Berry,1994). The probit is defined as 5 + the 1-p quantile from the standard normaldistribution, where p is a proportion. For example, the number of insectskilled by the log dose of an insecticide might describe a sigmoid relationship,which is a rectangular hyperbolic relationship to the non-log transformed dose.This sort of quantal response situation can be treated as a linear problemafter probit transformation. Probitanalysis uses these principles.

You can supply proportions ordiscrete data for logit transformation. If you specify discrete data thenStatsDirect converts these to proportions by taking each value as a proportionof the maximum of the supplied data. The results are stored in a new columnthat is marked Probit:<name> where <name> is the original columnlabel.

StatsDirect marks indeterminablevalues as missing data, i.e. p=0 or p=1. StatsDirect probit analysis, on theother hand, treats p=0 and p=1 ina more complex way that includes them in the overall analysis.

Download a free 10 dayStatsDirect trial

Angulartransformation

Menu location: Data_Transformations_Angular.

Angular transformation can beused to linearize sigmoid distributions and equalize variances of proportions (Bland, 2000).The angle is defined as arcsin (sqr(p)), where sqr(p) is the square root of aproportion. Logit and probit transformations do not equalize variances butlogistic regression and probit analysis do give special treatment to variancesin the way that they handle expected error distributions. Angular regression isseldom used but angle transformed data can be useful for exploratory dataanalysis by experts.

You can supply proportions ordiscrete data for angular transformation. If you specify discrete data thenStatsDirect converts these to proportions by taking each value as a proportionof the maximum of the supplied data. The results are stored in a new columnthat is marked Angle:<name> where <name> is the original columnlabel.

Indeterminable values are markedas missing data.

Download a free 10 dayStatsDirect trial

Cumulativetransformation

Menu location: Data_Transformations_Cumulative.

This function takes a set of dataand returns the cumulative set in an equal column marked with the labelCumulate:<name> where <name> is the column label of the originaldata. Cumulative data need to have some meaning to the order in which they arederived.

Example:

Data A, become --->	Cumulate:Data A
1	1
3	4
5	9
4	13
7	20
9	29

Download a free 10 dayStatsDirect trial

Ladder ofpowers transformations

Menulocation: Data_Transformations_Ladder of Powers.

This function provides a seriesof transformations that have increasing power to pull in the right hand tail ofa distribution. The ladder consists of:

1/x²	reciprocal of square (power -2)
1/x	reciprocal (power -1)
ln(x)	natural logarithm
x½	square root (power 0.5)
x²	square (power 2)

For more information see "tpayment-defi.comransformingdata".

Download a free 10 dayStatsDirect trial

Apply function

Menu location: Data_ApplyFunction.

This applies an algebraicexpression to any selection of data. The workbook column(s) you select aregiven the variable names V1, V2 ... Vn in the order that they are selected fromleft to right across the workbook. You can then create a new column as afunction of those selected. You do this by entering an algebraic expressionthat can use any of the functions, operators and constants below. For exampleV1/SQR(V2) would produce a column where each row entry was that of the firstselected column divided by the square root of the corresponding row entry inthe second selected column. A maximum of 9 variables/columns can be selected.

The same results can be achievedusing intrinsic functions within a StatsDirect workbook. Large complexfunctions may be handled faster by the intrinsicworkbook functions of StatsDirect.

Constants
PI	3.14159265358979323846 (p)
EE	2.71828182845904523536 (e)

Arithmetic Functions
ABS	absolute value
CLOG	common (base 10) logarithm
CEXP	anti log (base 10)
EXP	anti log (base e)
LOG	natural (base e, Naperian) logarithm
LOGIT	logit: log(p/[1-p]), p=proportion
ALOGIT	antilogit: exp(l)/[1+exp(l)], l=logit
SQR or SQRT	square root
!	factorial (maximum 170.569)
LOG!	log factorial
IZ	normal deviate for a p value
UZ	upper tail p for a normal deviate
LZ	lower tail p for a normal deviate
TRUNC or FIX	integer part of a real number
CINT	real number rounded to nearest integer
INT	real number truncated to integer closest to zero

Please note that the largestfactorial allowed is 170.569398315538748, but you can work with Log factorialsvia the LOG! function, e.g. LOG!(171).

Arithmetic Operators
^	exponentiation (to the power of)
+	addition
-	subtraction
*	multiplication
/	division
\	integer division

Calculations give an order ofpriority to arithmetic operators; this must be considered when enteringexpressions. For example, the result of the expression "6 - 3/2" is4.5 and not 1.5 because division takes priority over subtraction.

Priority of arithmeticoperators in descending order

1. Exponentiation(^)

2. Negation(-X) (Exception = x^-y; i.e. 4^-2 is 0.0625 and not -16)

3. Multiplicationand Division (*, /)

4. IntegerDivision (\)

5. Additionand Subtraction (+, -)

Trigonometric Functions
ARCCOS	arc cosine
ARCCOSH	arc hyperbolic cosine
ARCCOT	arc cotangent
ARCCOTH	arc hyperbolic cotangent
ARCCSC	arc cosecant
ARCCSCH	arc hyperbolic cosecant
ARCTANH	arc hyperbolic tangent
ARCSEC	arc secant
ARCSECH	arc hyperbolic secant
ARCSIN	arc sine
ARCSINH	arc hyperbolic sine
ATN	arc tangent
COS	cosine
COT	cotangent
COTH	hyperbolic cotangent
CSC	cosecant
CSCH	hyperbolic cosecant
SIN	sine
SINH	hyperbolic sine
SECH	hyperbolic secant
SEC	secant
TAN	tangent
TANH	hyperbolic tangent

To convert degrees to radians,multiply degrees by pi/180. To convert radians to degrees, multiply radians by180/pi.

Logical Functions
AND	logical AND
NOT	logical NOT
OR	logical OR
<	less than
=	equal to
>	greater than

Download a free 10 dayStatsDirect trial

Date andtime intervals

Menu location: Data_Date &Time Intervals.

This function enables you tocreate a column of intervals from a column of dates or times. You must havedate or time data in your workbook to do this. The StatsDirect workbookrecognises input such as 1/3/67 or 12:55 as datesand times.

You can use this function to workout the distance of each date or time entry from a point which you specify,this point can be a date, a time or a date with a time after it. You must also specifythe interval that you want StatsDirect to calculate, this can be years, months,weeks, days, hours, minutes or seconds. Positive intervals represent dates ortimes later than the reference point which you entered and negative intervalsreflect earlier dates/times.

Example

Test workbook (Other worksheet:Event).

To working out the interval indays from a reference point of 1/4/96to the dates in the following list you can use the "Date & TimeInterval" function of the "Data" menu.

Event ----->	Event~Days from 01/04/96
01/03/97	334
02/05/97	396
12/12/96	255
01/11/96	214
03/01/97	277
05/05/97	399
07/08/97	493
08/10/96	190

Ranking

Menu location: Data_Rank.

This function ranks the workbookdata you select and saves the rankings into a new workbook column labelledRank: Name where Name is the column label of the original data. You cancalculate a correction factor for ties in the ranking; five formulae areoffered for tie correction:

1.S(t^3 - t /12)

2.S(t * (t-1)/2)

3.S(t * (t-1) *(2t+5))

4.S(t * (t-1) *(t-2))

5.S(t * (t-1) *(t+1))

Here t is the number of data tiedat each tie and upper case sigma (S) is the summation across these ties.

The use of tie corrections is acomplex subject upon which learned statisticians sometimes disagree.

Example

Test workbook (Nonparametricworksheet: First Born).

Ranking the following agressivityscores for a sample of firstborn twins gives.

First Born ----->	Rank: First Born
86	8
71	3.5
77	6.5
68	1
91	11.5
72	5
77	6.5
91	11.5
70	2
71	3.5
88	10
87	9

Download a free 10 dayStatsDirect trial

Normalscores

Menu location: Data_NormalScores.

This function saves the normalscores of workbook data you select into a new workbook column marked Nml Score:Name, where Name is the column label of the original data.

Three different methods forcalculating normal scores are provided:

1. vander Waerden's method (Conover, 1999):

- where s is the normal score for an observation, r is the rank for thatobservation, n is the sample size and F(p) is the pth quantile from the standardnormal distribution.

2. Blom's method (Altman, 1991):

- where s is the normal score for an observation, r is the rank for thatobservation, n is the sample size and F(p) is the pth quantile from the standardnormal distribution.

3. Expectednormal order scores (David, 1981;Royston, 1982; Harter, 1961)

- where s is the normal score for an observation, r is the rank for thatobservation, n is the sample size, f(p) is the standard normal density for p and F(p) is the pthquantile from the standard normal distribution. The solution is found bynumerical integration. Calculation of expected normal order scores is notpractical for very large samples, n of 2500 is the maximum permitted inStatsDirect.

Example

Test workbook (Nonparametricworksheet: First Born).

Scoring the following agressivityscores for a sample of firstborn twins using the first method above gives:

First Born ----->	Nml Score(vdW): First Born
86	0.2934
71	-0.6151
77	0
68	-1.4261
91	1.1984
72	-0.2933
77	0
91	1.1984
70	-1.0201
71	-0.6151
88	0.7363
87	0.5024

Download a free 10 dayStatsDirect trial

Standardization

Menu location: Data_Standardization

This function saves standardizedvalues of workbook data you select into a new workbook column marked Std.(Method): Name, where Name is the column label of the original data and Methodis one of the methods listed below.

Three different methods forstandardization are provided:

1. (x -mean) / standard deviation

2. x /standard deviation

3. x -mean

Example

Test workbook (Nonparametricworksheet: First Born).

Standardizing the followingagressivity scores for a sample of firstborn twins using the first method abovegives.

First Born --->	Std ((x-mean)/SD): First Born
86	0.7782
71	-0.9095
77	-0.2344
68	-1.2470
91	1.3408
72	-0.7970
77	-0.2344
91	1.3408
70	-1.0220
71	-0.9095
88	1.0033
87	0.8907

Sort data to newcolumn

Menu location: Data_Sort_CreateSorted Columns.

StatsDirect enables you to createsorted copies of workbook columns or to manipulate data within a selected datarange of a workbook. This section deals with creating new sorted workbookcolumns based on existing ones. See also workbooksort.

This function sorts workbook datayou select and saves the sorted data into a new workbook column marked Sort:Name, where Name is the column label of the original data. Sorting may beascending or descending. The sort may also be tied to other data, i.e. the datain column b may be sorted in the in the order of sorting the data in column a.Tied sorting can be repeated for any number of columns.

Example 1

Test workbook (Nonparametricworksheet: First Born).

Sorting the following agressivityscores for a sample of firstborn twins in ascending order gives.

First Born ----->	Sort: First Born
86	68
71	70
77	71
68	71
91	72
72	77
77	77
91	86
70	87
71	88
88	91
87	91

Example 2

Test workbook (Nonparametricworksheet: First Born, Second Born).

Sorting the following agressivityscores for a sample of second born twins by the ascending order of the scoresfor firstborn twins gives.

First Born	Second Born ----->	Sr~Second Born~First Born
86	88	64
71	77	65
77	76	80
68	64	77
91	96	72
72	72	76
77	65	65
91	90	88
70	65	72
71	80	81
88	81	96
87	72	90

Download a free 10 dayStatsDirect trial

Sort withinworksheet

Menu location: Data_Sort_ManipulateWorksheet.

StatsDirect enables you to createsorted copies of worksheet columns or to manipulate data within a selected datarange of a worksheet. This section deals with sorting data within existingranges of data that you select in a worksheet. See also sortingcolumns.

PLEASE NOTE that for thisfunction you should not select columns by clicking on the headers of thecolumns. Select only the data ranges that you want to sort by dragging themouse across them with the left hand mouse button held down.

This function manipulatesselected workbook data ranges by sorting the data. Sorting may be ascending ordescending and either up/down columns (by row) or from side to side (bycolumn). You select a key which is the cell where you want the sorting to beginand in the row or column upon which your sort order is based. Selecting morethan one key enables you to perform a nested sort, i.e. sort on key one (saythe data in column x) then if there are repeats of the same value in column xlook in the key two column (say column y) to order the repeated valuesaccording to the order of the data in this secondary key and so on.

Example 1

Sorting the following data rangesascending order by column with key as the first data cell (at value 86) gives:

86 ------>	68
71	70
77	71
68	71
91	72
72	77
77	77
91	86
70	87
71	88
88	91
87	91

Example 2

Sorting the following data rangesby the ascending order of the data in the first column by row with key as thefirst data cell in the first column (86) gives:

86	88 ------>	68	64
71	77	70	65
77	76	71	80
68	64	71	77
91	96	72	72
72	72	77	76
77	65	77	65
91	90	86	88
70	65	87	72
71	80	88	81
88	81	91	96
87	72	1	90

Pairwise

·Pairwisedifferences

·Pairwisemeans

·Pairwiseslopes

Menu location: Data_Pairwise.

These functions providetransformations based on pairwise calculations either between columns or withina column, they often provide intermediate steps in non-parametric methods.

Download a free 10 dayStatsDirect trial

Pairwisedifferences

Menu location: Data_Pairwise_Differences.

This function creates a column ofall possible (n*n) pairwise differences from a pair of columns:

X	Y
1	2
2	3
3	6

Differences (X, Y)

-1

-2

-5

-1

-4

-3

All possible (n * n) Xi...(i=1 tok) are compared with Yj...(j=1 to k).

These contrasts form intermediatesteps in a number of non-parametric statistical methods.

Download a free 10 dayStatsDirect trial

Pairwisemeans

Menu location: Data_Pairwise_Means.

This function creates a column ofall possible (n(n+1)/2) pairwise means within a column:

Means (Within X)

2.5

4.5

All possible (n(n+1)/2) Xi...(i=1to n) are averaged with Xj...(j=i to n).

These values form intermediatesteps in a number of non-parametric statistical methods.

Download a free 10 dayStatsDirect trial

Pairwiseslopes

Menu location: Data_Pairwise_Slopes.

This function creates a column ofall (n(n-1)/2) pairwise slopes from a pair of columns (Y the response variableand X the predictor variable):

X	Y
1	2
2	3
3	6

Slopes (X, Y)

All possible (n(n-1)/2)XiYi...(i=1 to n) are compared with XjYj...(j=i to n) where i<>j. Slopeis (Yi-Yj)/(Xi-Xj).

These values form intermediatesteps in a number of non-parametric statistical methods. For example, innon-parametric linear regression (Theil type: Conover, 1999)the median slope would be 2 in theusual Y = bX +C, where C is the median intercept (i.e. median[Y - median b *X]).

StatsDirect will calculate themedian slope and its confidence interval for you if you select this option whenprompted. The results are stored in the column label of the relevant list ofpairwise slopes. The confidence interval is calculated using the distributionof Kendall's T as described in Conover (Conover, 1999).Please note that the results may differ slightly from Conover's text becauseStatsDirect calculates the inverse of Kendall'sT more accurately than the routines used to calculate Best's widely quoted 1974table for this statistic.

A fuller non-parametriclinear regression function is given in the non-parametric methods andregression sections.

Download a free 10 dayStatsDirect trial

Text tonumbers

Menu location: Data_Text toNumbers.

This function codes text asnumbers to enable analysis. An example of where you might want to do this is aworkbook that has been created from a database where certain categories areentered as text:

Category	Data
TA	2.3
TA	3.4
TB	5.6
TB	4.5
TB	4.9
TA	3.9
TB	3.9

If you enter these data into twocolumns in a StatsDirect workbook, you can convert the Category column tonumbers using the Text to Numbers function from the Data menu. If you selectthe whole of this column by clicking on the header, the first row will have theword "Category" in it. To avoid counting "Category" as oneof your categories when it is really the column title, click on "No"when you are asked if the top row is to be included in the conversion. Theresults will be as follows:

Category	Data
1	2.3
1	3.4
2	5.6
2	4.5
2	4.9
1	3.9
2	3.9

StatsDirect can code textautomatically as in the example above. You may also specify your own numericcode for each different text string by entering any valid number alongside thematching text in a substitution table. The substitution table only appears ifyou click "Yes" when StatsDirect prompts you to specify numbers.

Download a free 10 dayStatsDirect trial

Randomnumber generation

Menu location: Data_RandomNumbers.

This function enables you tocreate one or more series of random numbers from given distributions.

A robust generatorof uniform (pseudo)random numbers is used as the basis for generating deviatesfrom the probability distributions described below. You are given theopportunity to enter your own seed number to be used by the random numbergenerator but you should use the default seed (based upon your computer'sclock) in most cases. Please note that each seed generates its own series andthat series is the same if you use the seed again. You have very little chanceof using the same seed twice if you select the seed that StatsDirect suggests.

These functions are intended forsimulation work; they employ widely cited and debated algorithms (Gentle, 2003).

Uniform (continuousuniform, rectangular)

The continuous uniformdistribution has a constant density function of the interval (a, b) and thus arectangular shape from a to b:

- where a and b lie between minusand plus infinity.

The interval most commonly usedis 0 to 1

Normal (Gaussian)

The commonly used standard normaldistribution (mean of 0 and standard deviation of 1) is one of a family ofnormal distributions defined by the density function:

- where mean m lies betweenminus and plus infinity and standard deviation s is greater than zero.

Algorithm: inversion of thecumulative distribution function (Wichura, 1988;Gentle 2003)

Lognormal

The density of a lognormaldistribution is given by:

- where mean m lies betweenminus and plus infinity and standard deviation s is greater than zero.

The mean of the lognormaldistribution, as opposed to the mean of the underlying normal distribution, isequal to exp(m+s*s/2) and the variance is equal to exp(2m+2s*s)-exp(2m+ s*s).

Algorithm: transformed inversionof the cumulative distribution function (Wichura, 1988;Gentle 2003)

Exponential

The (negative) exponentialdistribution is a special case (shaping parameter of 1) of the gammadistribution. Its density is given by:

- where the parameter l must be greaterthan zero.

Algorithm: transformation (Ahrens & Dieter,1972).

Gamma

The density function of the gammadistribution is given by:

- where the parameters l and r (shapingparameter) must be greater than zero.

G(*) is the gamma function:

,x>0

Please note that gamma deviateswith a shaping parameter of 0.5 are half the square of normal deviates andgamma deviates with a shaping parameter of 1 are exponential deviates.

Algorithm: acceptance-rejectionmethods GD and GS (Ahrens& Dieter, 1974, 1982b).

Binomial

The density function of the binomialdistribution is given by:

- where p lies between 0 and 1 in n ranges.

Algorithm: acceptance-rejectionmethod BTPEC (Kachitvichyanukul& Schmeiser, 1988).

Poisson

The density function of the Poissondistribution is given by:

- where parameter l is greater thanzero.

Algorithm: acceptance-rejection (Ahrens & Dieter,1982a).

Chi-square

The density function of the chi-squaredistribution is given by:

The deviates are calculated asgamma deviates with parameters n/2 and 2, where n is degrees of freedom.

Algorithm: Transformed gammadeviates (Ahrens& Dieter, 1974,1982b; Gentle 2003).

F (variance ratio)

The density function of the F distribution is given by:

The deviates are calculated asnx/dz where n is the numerator degrees of freedom, d is the denominator degreesof freedom, x is a gamma deviate with parameters n/2 and 2 (chi-square with ndegrees of freedom), and z is a gamma deviate with parameters d/2 and 2(chi-square with d degrees of freedom).

Algorithm: Transformed gammadeviates (Ahrens& Dieter, 1974, 1982b; Gentle 2003).

Student's t

The density function of Student's tdistribution is given by:

The deviates are calculated as astandard normal deviate multiplied by the square root of the degrees of freedom(n) divided by a gamma deviate with parameters n/2 and 2 (a chi-square deviatewith n degrees of freedom).

Algorithm: Transformed standardnormal and gamma deviates (Ahrens & Dieter,1974, 1982b; Gentle 2003).

Beta

The density function of the beta distributionis given by:

Algorithm: Acceptance-rejectionmethods BB and BC (Cheng1978).

Logistic

The density function of the betadistribution is given by:

Algorithm: Transformed uniformdeviates (Gentle2003).

Cauchy

Algorithm: Transformed uniformdeviates (Gentle2003).

Weibull

Algorithm: Transformed uniformdeviates (Gentle2003).

Geometric

Algorithm: Transformed Poissonand exponential deviates (Devroye, 1986;Gentle 2003; Ahrens & Dieter, 1972, 1982a).

Negative binomial

Algorithm: Transformed Poissonand gamma deviates (Devroye, 1986;Gentle 2003; Ahrens & Dieter, 1974, 1982b, 1982a).

Download a free 10 dayStatsDirect trial

Rotate datablock

Menu location: Data_RotateData Block.

This function rotates a selectedblock of data from a StatsDirect workbook in the following way:

Example

	1..	2..	3..
1	1.1	0.7	1
2	1.5	0.6	2
3	1.6	0.6	3
4	1.8	0.5	4

...this would become:

	1..	2..	3..	4..
1	1.1	1.5	1.6	1.8
2	0.7	0.6	0.6	0.5
3	1	2	3	4

... in other words rows becomecolumns and columns become rows.

The resulting rotation is bestsaved to a new workbook.

You are given the option to keepexisting formulae or to translate them into results before rotation. Pleasenote that if you keep formulae and save your rotated data to another worksheet,the references in the formulae are likely to be incorrect.

Download a free 10 dayStatsDirect trial

Dummyvariables

Menu location: Data_DummyVariables

This function creates dummy (ordesign) variables from one categorical variable.

The reference cell coding modelis used (Kleinbaumet al., 1998):

- the source data may benumerical or text, representing categories. The coding scheme shown above isapplied to your data in reverse alphanumeric order for the k categoriesfound, so for three categories, say race equal to black, white or other, white(being the last in an alphabetical sorting) is coded 1,0,0 which reduces todummy variables X (3) = 1, X (2) = 0.

In order to represent acategorical variable with more than two levels in a regression model you maywish to convert it to a series of dummy variables using this function.

Say a linear regression model isspecified with three predictors; the first and third predictors are continuousdata, and the second predictor is a classifier (categorical data) with threelevels. The second predictor should be converted to two dummy dichotomousvariables (e.g. the example below) and put into a multiple linear regression astwo predictors.

The naming scheme for dummyvariables is the original variable name suffixed with (1) if there are only twocategories, or suffixed with (j+1) where there are j+1 categories giving riseto j dummy variables.

In general form, a regressionmodel where the jth predictor variable is a classifier with klevels can be interpreted as follows, provided the jth variable isconverted to dummy variables:

- where Y is the outcomevariable, b is a regression coefficient, D is a dummy variablefor a classifier variable of k levels and x is a non-classifierpredictor variable.

Example

Group ID --->	Group ID (2)	Group ID (3)
1	0	0
1	0	0
1	0	0
1	0	0
2	1	0
2	1	0
2	1	0
2	1	0
3	0	1
3	0	1
3	0	1

Download a free 10 dayStatsDirect trial

Search andreplace

Menu location: Data_Search& Replace.

This function searches the activeworksheet for data that match the conditions you specify. When the condition,for example data > 20, is found then you have the option to replace orsimply count the data that match this condition.

Note that you can use the replacefunction to remove cells or rows from a worksheet by leaving the "replacewith" box blank.

Example

The following are data importedfrom an application which uses -99 as the missing data value:

12
1
2
3
4
-99
4
2
1
-99
-99

To replace -99 with StatsDirect missing data youcan use the search function with the "search for x" condition set toequal to -99 and the "replace with" entry set to *. To run thisexample, enter the above data into a workbook column and selectthat column. Then select search from the data menu. Check the "equalto" option, enter -99 inthe "search for x" box and enter * in the "replace with"box. When you click on the OK button the -99 values will be replaced with * andthese will also be displayed as an asterisk (missing data) in the workbook.

The match expression option ofthe StatsDirect search form enables you to select data for an expression whichevaluates as true. You use x to denote the data to match, i.e. x ³ 10 and x £ 20 matches allselected data between 10 and 20 inclusive, you can count or replace the matchesby clicking the relevant button. You can compose an expression using x and anyof the functions, operators and logical expressions below:

Constants
PI	3.14159265358979323846 (p)
EE	2.71828182845904523536 (e)

Arithmetic Functions
ABS	absolute value
CLOG	common (base 10) logarithm
CEXP	anti log (base 10)
EXP	anti log (base e)
LOG or LN	natural (base e, Naperian) logarithm
LOGIT	logit: log(p/[1-p]), p=proportion
ALOGIT	antilogit: exp(l)/[1+exp(l)], l=logit
SQR or SQRT	square root
!	factorial (maximum 170.569)
LOG!	log factorial
IZ	normal deviate for a p value
UZ	upper tail p for a normal deviate
LZ	lower tail p for a normal deviate
TRUNC or FIX	integer part of a real number
CINT	real number rounded to nearest integer
INT	real number truncated to integer closest to zero

Please note that the largestfactorial allowed is 170.569398315538748, but you can work with Log factorialsvia the LOG! function, e.g. LOG!(171).

Arithmetic Operators
^	exponentiation (to the power of)
+	addition
-	subtraction
*	multiplication
/	division
\	integer division

Calculations give an order ofpriority to arithmetic operators, this must be considered when enteringexpressions. For example, the result of the expression "6 - 3/2" is4.5 and not 1.5 because division takes priority over subtraction.

Priority of arithmetic operators in descending order
1.	Exponentiation (^)
2.	Negation (-X) (Exception = x^-y; i.e. 4^-2 is 0.0625 and not -16)
3.	Multiplication and Division (*, /)
4.	Integer Division (\)
5.	Addition and Subtraction (+, -)

Trigonometric Functions
ARCCOS	arc cosine
ARCCOSH	arc hyperbolic cosine
ARCCOT	arc cotangent
ARCCOTH	arc hyperbolic cotangent
ARCCSC	arc cosecant
ARCCSCH	arc hyperbolic cosecant
ARCTANH	arc hyperbolic tangent
ARCSEC	arc secant
ARCSECH	arc hyperbolic secant
ARCSIN	arc sine
ARCSINH	arc hyperbolic sine
ATN	arc tangent
COS	cosine
COT	cotangent
COTH	hyperbolic cotangent
CSC	cosecant
CSCH	hyperbolic cosecant
SIN	sine
SINH	hyperbolic sine
SECH	hyperbolic secant
SEC	secant
TAN	tangent
TANH	hyperbolic tangent

To convert degrees to radians,multiply degrees by pi/180. To convert radians to degrees, multiply radians by180/pi.

Logical Functions
AND	logical AND
NOT	logical NOT
OR	logical OR
<	less than
=	equal to
>	greater than

Download a free 10 dayStatsDirect trial

Clearmissing data

Menu location: Data_ClearMissing Data

This function enables you toclear missing data values from a selection of cells, and to optionally preservethe matching of rows in that selection.

For example, you might want toignore incomplete records. Here, each row is a record for one individual, soyou would normally choose to preserve the matching of rows when prompted:

Sex	Age	Score
1	23	2.3
1	33	3.1
1	19
2	21	1.9
2	43
2	39	5.0
2	26	1.1

Clearing missing data with therow matching option ON would produce:

Sex	Age	Score
1	23	2.3
1	33	3.1
2	21	1.9
2	39	5.0
2	26	1.1

…and clearing missing data withthe row matching option OFF would produce:

Sex	Age	Score
1	23	2.3
1	33	3.1
1	19	1.9
2	21	5.0
2	43	1.1
2	39
2	26

StatsDirect treats the followingnumber, characters and text as missing data:

3E+300

missing

You may additionally definemissing numbers or text codes when prompted. This is often useful when you havebeen given data with missing values coded in a specific way, e.g. a score of -1 in the example above.

Download a free 10 dayStatsDirect trial

Categorise

Menu location: Data_Grouping_Categorise.

This function enables you tocategorise any set of data into groups that you specify, for example ages intoage groups.

Typically, a continuousvariable might be divided into categories or groups. Take the IgM variable inthe parametric sheet of the test workbook for example; this has 298observations which you might want to summarise in ranges of values. In order todo this, simply select the Data_Grouping_Categorise menu item then select theIgM column of data. You are presented with different ways to group your datainto bins (intervals) of counts:

·Quartiles
4 bins (< lower quartile; lower quartile to median; median to upperquartile; >= upper quartile)

·Quintiles
5 bins (< first quintile…; >= fourth quintile)

·Deciles
10 bins (<first decile…; >=ninth decile)

·Age groups
one of four common groupings:
<15; 15-19… five yearly bands to 85+
<15; 15-24… ten yearly bands to 85+
<1; 1-4 … five yearly bands to 85+
<1; 1-4… ten yearly bands to 75+

·User-defined
from minimum min, in k intervals of equal size = step:
<min+1*step; >= min+1*step to < min+2*step… in k intervals to >=min+ k*step

Using the IgM example inquartiles:

category	count
< 0.5	56
>= 0.5; < 0.7	67
>= 0.7; < 1	98
>= 1	77

Using the IgM example in 10 intervals of 0.5 from:

category	count
< 0.5	56
>= 0.5; < 1	165
>= 1; < 1.5	54
>= 1.5; < 2	14
>= 2; < 2.5	6
>= 2.5; < 3	2
>= 3; < 3.5	0
>= 3.5; < 4	0
>= 4; < 4.5	0
>= 4.5	1

A quick look at the counts aboveshows a similar picture to that you would see from a histogram,namely that the data are not evenly spreadinto ranges of values, i.e. they are skewed.The text-basedhistogram will give you counts, but note that the bin values in a histogramare the mid-point of the bin and not the cut-off value between bins, i.e. theyare the same as a user-defined bin cut-off values minus half of the step size.

Download a free 10 dayStatsDirect trial

Extract asubset of data

Menu location: Data_Grouping_Extract.

This function enables you toextract subsets of data from a column of numbers, based upon an expression thatrefers to indicators in related rows of worksheet.

For example, you might want toselect only the rows from a score data column for which sex=2 and age>30;i.e. extract into one new variable, all data for females over 30:

Sex	Age	Score
1	23	2.3
1	33	3.1
1	19	3.2
2	21	1.9
2	43	4.2
2	39	5.0
2	26	1.1

Use the Data_Grouping_Extractmenu item with Score as the data variable and Sex and Age and the indicatorvariables to create a new variable with the expression "v1=1 andv2>30":

Score [Sex=2 AND Age>30]

4.2

You can compose an expressionusing v1, v2 etc. and any of the functions, operators and logical expressionsdescribed under search andreplace. Here are some examples of expressions for the example of the twoindicators mentioned above:

V1: Sex

V2 Age

"v1=1" (not all of theselected/listed indicator variables have to be used)

"v2<>21 andv2>16"

"v1=2 or v2>25"

"(v1=1 and v2<=25) or(v1=2 and v2<=30)"

Download a free 10 dayStatsDirect trial

Split bygroup identifier

Menu location: Data_Grouping_Split.

This function splits a datacolumn into other data columns according to one or more group identifier columns.The split data are put into new workbook columns that are marked Name~G=m,Name~G=f etc, where Name is the original combined data column label and G=xrepresent the group identifiers that you selected. It is wise to keepidentifier schemes simple.

Example

Gp	Data --->	Data~Gp=1	Data~Gp=2	Data~Gp=3
1	1.1	1.1	0.7	1.9
1	1.3	1.3	1.0	2.2
1	0.9	0.9	0.6	1.7
2	0.7	1.5	1.1
2	1.0	1.3
3	1.9
1	1.5
1	1.3
2	0.6
3	2.2
3	1.7
2	1.1

Some other statistical packagesrequire group identifier columns for group comparison tests, where all data areexpected to be in a single column. StatsDirect does not do this for simplegroup comparison methods because separate data columns aid comprehension of theanalysis. If you need to share data between StatsDirect and statisticalpackages that require group identifiers then use this function and the combinegroups function to prepare data prior to import/export.

Download a free 10 dayStatsDirect trial

Combinegroups and form an identifier

Menu location: Data_Grouping_Combine.

This function combines data fromthe workbook columns you select and puts them into a single new data columnwith a group identifier column alongside it.

Example

Data~G1	Data~G2	Data~G3 --->	Group ID	Data
1.1	0.7	1.9	Data~G1	1.1
1.3	1.0	2.2	Data~G1	1.3
0.9	0.6	1.7	Data~G1	0.9
1.5	1.1		Data~G1	1.5
1.3			Data~G1	1.3
			Data~G2	0.7
			Data~G2	1.0
			Data~G2	0.6
			Data~G2	1.1
			Data~G1	1.9
			Data~G1	2.2
			Data~G1	1.7

Some other statistical packagesrequire group identifier columns for group comparison tests, where all data areexpected to be in a single column. StatsDirect does not do this for simplegroup comparison methods because separate data columns aid comprehension of theanalysis. If you need to share data between StatsDirect and statisticalpackages that require group identifiers then use this function and the groupsplit function to prepare data prior to import/export.

Download a free 10 dayStatsDirect trial

Tabulate

Menu location: Data_Grouping_Tabulate.

This a simple two way crosstabulation function. If you have two columns of numbers that correspond todifferent classifications of the same individuals then you can use thisfunction to give a two way frequency table for the cross classification.

If you need to stratify by athird classification variable, or you want to perform analyses straight away onyou cross tabulation, then please see the Analysis_Crosstabsfunction instead.

Example

A database of test scores containstwo fields of interest, sex (M=1, F=0) and grade of skin reaction to an antigen(none = 0, weak + = 1, strong + = 2). Here is a list of those fields for 10patients:

Sex	Reaction
0	0
1	1
1	2
0	2
1	2
0	1
0	0
0	1
1	2
1	0

In order to get a crosstabulation of these from StatsDirect you should enter these data in twoworkbook columns. Then choose Tabulate from the Grouping section of the Datamenu.

For this example:

	Reaction:0	reaction:1	reaction:2
	0	1	2
sex:0	2	2	1
sex:1	1	1	3

Download a free 10 dayStatsDirect trial

Detabulate

Menu location: Data_Grouping_Detabulate.

This a function to expand a twoway table of counts into a column indicator variable and a row indicatorvariable, i.e. to deconstruct a table back into individual level data. This isthe opposite of crosstabs or the tabulatefunction.

Example

Given the following table:

	reaction:0	reaction:1	reaction:2
sex:0	2	2	1
sex:1	1	1	3

You can use theData_Grouping_Detabulate menu item on the two by three table of counts to getback to individual level data with each row representing the classification ofa subject as follows:

Row Classifier	Column Classifier
0	0
1	1
1	2
0	2
1	2
0	1
0	0
0	1
1	2
1	0

Download a free 10 dayStatsDirect trial

Crosstabs

Menu location: Analysis_Crosstabs.

This a two or three way crosstabulation function. If you have two columns of numbers that correspond todifferent classifications of the same individuals then you can use thisfunction to give a two way frequency table for the cross classification. Thiscan be stratified by a third classification variable.

For two way crosstabs,StatsDirect offers a range of analyses appropriate to the dimensions of thecontingency table. For more information see chi-squaretests and exacttests.

For three way crosstabs, StatsDirectoffers either oddsratio (for case-control studies) or relative risk(for cohort studies) meta-analyses for 2 by 2 by k tables, and generalisedCochran-Mantel-Haenszel tests for r by c by k tables.

Example

A database of test scorescontains two fields of interest, sex (M=1, F=0) and grade of skin reaction toan antigen (none = 0, weak + = 1, strong + = 2). Here is a list of those fieldsfor 10 patients:

Sex	Reaction
0	0
1	1
1	2
0	2
1	2
0	1
0	0
0	1
1	2
1	0

In order to get a cross tabulationof these from StatsDirect you should enter these data in two workbook columns.Then choose crosstabs from the analysis menu.

For this example:

		Reaction
		0	1	2
Sex	0	2	2	1
	1	1	1	3

We could then proceed to an r byc (2 by 3) contingencytable analysis to look for association between sex and reaction to thisantigen:

Contingency table analysis

Observed	2	2	1	5
% of row	40%	40%	20%
% of col	66.67%	66.67%	25%	50%

Observed	1	1	3	5
% of row	20%	20%	60%
% of col	33.33%	33.33%	75%	50%

Total	3	3	4	10
% of n	30%	30%	40%