
Finding Full-Text, Peer Reviewed Psychology Articles: Statistical Terminology Glossary

This guide should help you find full-text, peer-reviewed articles in JCCC library databases. If you still have difficulty finding appropriate articles, call the Reference desk at 469-3178 for more assistance.

Glossary of Statistical Terms

This glossary is intended to help you understand and decipher basic terms and concepts encountered when reading empirical research articles in behavioral science journals. These articles contain many special symbols, graphics, Greek letters, and tables. These can confuse beginning students, but they are an integral part of statistical analysis and research methodology. Such nuances become clearer and more meaningful as you take courses in statistics and behavioral research.


 

Correlation -- a basic statistical indicator describing the association between two variables, expressed as a single numeric value known as the correlation coefficient. The most widely used version, the Pearson product-moment correlation coefficient, is denoted by the letter " r ". Its value ranges from -1.0 to +1.0, so its size (ignoring the sign) is never higher than 1.0. Strength and direction are two important attributes of a correlation. The sign (positive or negative) indicates the direction of the relationship between two data sets, and the numeric value (e.g., 0.63) indicates its relative strength. A value of +1.0 or -1.0 indicates a perfect association, meaning that knowing variable X allows one to predict Y without error. Conversely, a correlation of 0.0 indicates no linear relationship between the data sets.

A weak correlation may range from 0.01 to 0.29 (in either direction). Higher values are described in progressively stronger terms as moderate, strong, and very strong, with very strong covering values such as 0.70 to 0.99.
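To see where a value of r comes from, here is a short Python sketch of the standard Pearson formula. It multiplies each pair's deviations from their means, sums those cross-products, and divides by the square root of the product of the two sums of squared deviations. The function name `pearson_r` is our own choice for illustration. Python 3.10 and later also provide `statistics.correlation` for the same calculation.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient for two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations: positive when X and Y move together
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    # Dividing by the spread of both variables keeps r between -1.0 and +1.0
    # (raises ZeroDivisionError if either list has no variation at all)
    return cross / math.sqrt(ss_x * ss_y)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0  (perfect positive)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (perfect negative)
```

Real data almost never lands exactly on +1.0 or -1.0. These toy lists were chosen so that Y is an exact straight-line function of X.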

A positive correlation (+) shows both variables moving in the same direction: as one increases, so does the other. Example: One might posit a positive correlation between hours spent studying and exam scores, with more study time associated with higher scores.

An inverse relationship exists when two variables have a negative correlation (-): as one increases, the other tends to decrease, or vice versa. Example: In considering academic performance and video games, GPA may decrease as the documented hours spent playing video games increase. This pattern suggests that playing video games may negatively affect academic performance, though the correlation alone cannot prove it.

Causation vs Correlation: an Important Distinction to Grasp! Correlation should not be interpreted to imply causation. The two may seem conceptually related, but they are not the same, and using them interchangeably leads to errors in reporting and faulty reasoning. Correlation does not support the "cause and effect" interpretation it is often given. It does not tell a researcher whether changes in one variable (say X) produce changes in another (say Y), or the converse. A correlation reveals only that two data sets tend to move together in a predictable way. Two things can and often do occur together without one causing the other, no matter how conceptually appealing such a link might seem.

Causation - - Going beyond correlation to claim causality requires additional research designs and statistical techniques. Examples include conducting a controlled study, establishing hypotheses, creating nearly identical test and control groups, administering a different treatment to each, and comparing the observed outcomes for a statistically significant difference. If the outcomes of the test and control groups differ substantially, the administered treatment or experience likely has a causal relationship to the differing outcomes.
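A standard way to make test and control groups "nearly identical" on average is random assignment. The Python sketch below shows that step under stated assumptions. The participant IDs, the baseline score of 70, and the 5-point treatment effect are all hypothetical, chosen only so the simulation has something to compare. Whether the resulting difference is statistically significant is a separate question, taken up under the hypothesis entries.

```python
import random
from statistics import mean

rng = random.Random(7)  # fixed seed so the illustration is repeatable

def randomly_assign(people, rng):
    """Shuffle a copy of the list and split it in half: (treatment, control)."""
    shuffled = list(people)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

participants = list(range(40))  # hypothetical participant IDs
treatment, control = randomly_assign(participants, rng)

TREATMENT_EFFECT = 5  # hypothetical: the treatment adds 5 points on average

def simulated_outcome(got_treatment):
    """Hypothetical outcome: baseline 70, random individual variation, plus any treatment effect."""
    return 70 + rng.gauss(0, 8) + (TREATMENT_EFFECT if got_treatment else 0)

treated_scores = [simulated_outcome(True) for _ in treatment]
control_scores = [simulated_outcome(False) for _ in control]

print(f"treatment mean: {mean(treated_scores):.1f}")
print(f"control mean:   {mean(control_scores):.1f}")
```

Because chance alone decides who lands in which group, pre-existing differences between people tend to balance out. That balance is what lets a difference in outcomes be attributed to the treatment.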

Hypothesis - - a statement of explanation, or an assumption, for an observed behavior or event that may or may not be correct; it is fundamental to empirical research. It may draw on experience, reason, or opinion. It is worded as a specific statement about a relationship between two or more variables, expressed as a prediction of cause and effect that can be tested scientifically. Hypotheses often develop out of the data and findings of previous, related research studies and are adapted to a newly conceived issue or problem to investigate. The variables identified in a hypothesis must be specified in a way that indicates how they will be measured.

Two hypothesis statements together are a formative beginning to empirical research:

Null Hypothesis -- a statement which a researcher intends to reject (invalidate or disprove). Given two identified variables, the null posits that there is no relationship between them and that the investigation will show no observable effect. (Example: College males score no better on math tests than do college females.) Suppose the evidence from an investigation, once scientifically analyzed, is so compelling that error or chance is an unlikely explanation. In that case the null is rejected in favor of the proposed alternative, because the findings are incompatible with it. However, failing to reject the null does not mean the statement is true; extraneous factors or errors from flawed analysis can also produce that result. This opens another area for further reading: Type I and Type II errors, what they mean, and how to interpret them.

Research [or Alternative] Hypothesis - - In contrast to the null, the alternative hypothesis is a prediction of what one expects to happen, that is, what significant difference or change will be revealed. It underlies the reason(s) a research study is conducted: some independent (predictor) factor or experimental treatment affects a dependent variable. Example: Tutoring in physics results in better test performance.

Rejecting the null hypothesis when it is, in fact, true is statistically termed a Type I error. Not rejecting the null hypothesis when it is, in fact, false is termed a Type II error.
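To show how the null and alternative hypotheses are actually put to the test, here is a Python sketch of one simple method, a permutation test, applied to the tutoring example. This is one technique among many, not the method any particular study uses. The test scores are invented. The 0.05 cutoff (alpha) is a common convention. It is also the Type I error rate the researcher accepts: the chance of rejecting a null that is actually true.

```python
import random

def _mean(values):
    return sum(values) / len(values)

def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """One-sided permutation test of H0 (no difference) against H1 (mean of a > mean of b)."""
    rng = random.Random(seed)
    observed = _mean(group_a) - _mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # if H0 is true, the group labels are interchangeable
        if _mean(pooled[:n_a]) - _mean(pooled[n_a:]) >= observed:
            count += 1
    # add-one correction keeps the estimated p-value from being exactly zero
    return observed, (count + 1) / (n_resamples + 1)

tutored   = [78, 85, 82, 90, 74, 88, 81, 86]  # hypothetical physics test scores
untutored = [70, 76, 72, 80, 68, 75, 79, 71]

diff, p = permutation_test(tutored, untutored)
alpha = 0.05
print(f"difference in means = {diff:.3f}, p = {p:.4f}")
print("reject H0" if p < alpha else "fail to reject H0")
```

The p-value estimates how often a difference at least this large would appear if tutoring made no difference. A small p leads to rejecting the null. A Type II error would occur if tutoring truly helped but p still came out above alpha.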

Mean - - a measure of the average of a set of scores or numeric values in a distribution. It is calculated by adding up all of the values and dividing the resulting sum by the number of values. Given three families with 1, 5, and 6 children, the mean is the total number of children (12) divided by 3, or 4 children on average. Every value in a distribution is included in the calculation. As a result, the mean becomes a skewed and misleading representation of a "typical" value if the distribution contains a few values out of proportion to the others. Means are best used with numeric variables such as years of education, weeks unemployed, age at marriage, etc.
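The calculation, and its sensitivity to one out-of-proportion value, can be shown in a few lines of Python. The fourth family with 20 children is a hypothetical value added to illustrate skew. Python's `statistics.mean` would give the same results.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

children = [1, 5, 6]
print(mean(children))        # 4.0

# One out-of-proportion value (hypothetical) pulls the mean far from the "typical" family
print(mean([1, 5, 6, 20]))   # 8.0
```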

Median - - a statistical measure that identifies the "middle case" in a distribution of numeric values when the values are arranged in order from highest to lowest. It divides the distribution, with an equal number of cases falling above and below the median; note that all values must first be placed in numerical order. Example: Using the previous scenario of 1, 5, and 6 children, the median is 5 children. If a distribution has an even number of cases (such as 1, 5, 6, and 8 children), the median is calculated as the arithmetic average of the middle pair of values: (5 + 6)/2 = 5.5 children. The median is a better indicator than the mean of a "typical" value in a distribution, especially if the set contains some extremely high or low values that skew it.
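A Python sketch of the two cases described above (odd and even number of values) follows. The last line reuses the hypothetical 20-child family to show that an extreme value barely moves the median. Python's `statistics.median` follows the same even-count rule.

```python
def median(values):
    """Middle value of the sorted data; the average of the middle pair when the count is even."""
    ordered = sorted(values)  # values must be in numerical order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([1, 5, 6]))      # 5
print(median([1, 5, 6, 8]))   # 5.5
print(median([1, 5, 6, 20]))  # 5.5 (the extreme value 20 does not pull it upward)
```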

Mode - - another statistical measure of a representative value in a distribution; it is the most common value, the one occurring most often in a defined set of values. Given a set of test scores (81, 68, 71, 78, 81, 81, 94, and 97), the mode is 81.
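Finding the mode amounts to counting how often each value occurs. The Python sketch below uses `collections.Counter` and returns every value tied for the highest count, since a set of values can have more than one mode. The function name `modes` is our own; the standard library's `statistics.multimode` behaves similarly.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest count (a set can have more than one mode)."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

scores = [81, 68, 71, 78, 81, 81, 94, 97]
print(modes(scores))  # [81]
```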

Note: Often reported together in data findings, the mean, median, and mode are termed "measures of central tendency."  

Population & Sample - - A "macro" concept, a population is the defined "entirety" of a set or unit under study. It may be a large, even global, collection of individuals, groups, things (such as animals or plants), or events. It is framed in terms of criteria established in the context of a researcher's investigation and purpose. Examples: All male military veterans diagnosed with PTSD. (or) All female college students with asthma. Geographic influences, accessibility issues, and cost are some of the many factors that might affect how a population is defined.

Sample - - any subset selected from a population to represent it, make observations about it, and draw inferences about it. When a sample is built through a rigorous selection process, the results obtained can be quite accurate. Their error level is usually minimal, and the savings in time and cost more than justify it. Example: Studying all veterans with PTSD may be problematic at many levels. An alternative would be to create a sample (or subset) such as male military veterans in KS, MO, and MN diagnosed with PTSD. From the sample's research findings, inferences might be made about the larger population.
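The Python sketch below illustrates why a well-drawn sample can stand in for a population. It uses simple random sampling, one common rigorous selection method, in which every member has the same chance of being chosen. The "population" is 100,000 simulated ages, not data about any real group, and the sample size of 500 is an arbitrary assumption.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed for a repeatable illustration

# Hypothetical population: 100,000 simulated ages (made-up data)
population = [rng.randint(20, 80) for _ in range(100_000)]

# Simple random sample: every member has the same chance of selection
sample = rng.sample(population, k=500)

print(f"population mean: {mean(population):.2f}")
print(f"sample mean:     {mean(sample):.2f}")  # close to, but usually not exactly, the population mean
```

The small gap between the two means is sampling error. With a larger sample, the gap tends to shrink.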

Variable - - Central to empirical research, a variable is any entity of interest that is defined and studied in a research investigation. Its key defining characteristics are that it is measurable, it takes on different values, and it differs from one observation to another. Variables can be almost anything; they are selected for their impact on and relevance to a specific research study. In research design, "how old you are" would be termed a "quantitative variable" because it takes on different numeric values across many people and their responses. Other examples: earnings, years of education, age at marriage, number of children, etc.

Some factors of interest are not numeric or quantitative. Such a variable is termed "qualitative" because it cannot be described meaningfully in terms of numbers. Examples: Gender is "qualitative"; there is no meaningful way to assign differing numbers to "female" and "male." The same holds true for various religious denominations.
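The practical difference between the two kinds of variables shows up in what arithmetic makes sense. The Python sketch below works with a few hypothetical survey records. Age (quantitative) can be averaged. Religious denomination (qualitative) can only be counted by category, so the mode is the natural summary. The field names and values are invented for illustration.

```python
from collections import Counter
from statistics import mean

# Hypothetical survey records: age is quantitative, denomination is qualitative
respondents = [
    {"age": 34, "denomination": "Catholic"},
    {"age": 51, "denomination": "Methodist"},
    {"age": 27, "denomination": "Catholic"},
    {"age": 45, "denomination": "Baptist"},
]

# Averaging a quantitative variable is meaningful
ages = [r["age"] for r in respondents]
print(mean(ages))  # 39.25

# A qualitative variable can be counted by category, but not averaged
denominations = Counter(r["denomination"] for r in respondents)
print(denominations.most_common(1))  # [('Catholic', 2)]
```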

 


Sources: Johnson, Allan G. The Blackwell Dictionary of Sociology. Oxford, United Kingdom: Blackwell Publishers, 2000. Additional commentary and text added by this guide's author as appropriate.