Data Screening

For this discussion, identify the goals of data screening. Then discuss how you can identify and remedy the following:

• Errors in data entry.

• Outliers.

• Missing data.
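
As a rough illustration of the three screening tasks listed above, the sketch below uses plain Python rather than SPSS. The function name `screen`, the sample scores, the 0 to 10 legal range, and the |z| > 3 outlier cutoff are all assumptions chosen for illustration; the cutoff is a common rule of thumb, not a rule taken from the text.

```python
from statistics import mean, stdev

def screen(values, low, high, z_cut=3.0):
    """Return index lists for missing values, entry errors, and outliers.

    values: list of numbers, with None marking a missing value.
    low, high: the legal range for the variable (assumed known in advance).
    z_cut: hypothetical cutoff; cases more than z_cut standard deviations
           from the mean are flagged as outliers.
    Needs at least two in-range values so a standard deviation exists.
    """
    # Missing data: cases with no recorded value.
    missing = [i for i, v in enumerate(values) if v is None]

    # Data-entry errors: recorded values outside the legal range.
    errors = [i for i, v in enumerate(values)
              if v is not None and not low <= v <= high]

    # Outliers: legal values that sit far from the mean of the legal values.
    plausible = {i: v for i, v in enumerate(values)
                 if v is not None and low <= v <= high}
    vals = list(plausible.values())
    m, sd = mean(vals), stdev(vals)
    outliers = [i for i, v in plausible.items()
                if sd > 0 and abs(v - m) / sd > z_cut]

    return {"missing": missing, "entry_errors": errors, "outliers": outliers}

# Hypothetical 10-point quiz scores: one blank, one mistyped 75, one very low 1.
scores = [7, 8, None, 9, 6, 8, 7, 8, 9, 7, 8,
          6, 7, 9, 8, 7, 8, 9, 7, 8, 1, 75]
report = screen(scores, low=0, high=10)
print(report)
```

Flagging a case is only the first half of the job. Whether to correct it, delete it, or keep it depends on checking the original record, which no code can do for you.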

Use your IBM SPSS Statistics Step by Step text to complete the following:

• Review “An Introduction to the Example” in Chapter 1. This section provides definitions of SPSS variables used in your Unit 3 assignment.

• Read Chapter 6, “Frequencies.” This reading addresses the following topics:

◦ Frequencies.

◦ Bar charts.

◦ Histograms.

◦ Percentiles.

• Read Chapter 7, “Descriptive Statistics.” This reading addresses the following topics:

◦ Statistical significance.

◦ The normal distribution.

◦ Mean, median, and mode.

◦ Variance and standard deviation, skewness, and kurtosis.

◦ Maximum, minimum, range, and sum.

◦ Standard error.
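
The descriptive statistics named in the Chapter 7 list can be sketched in a few lines of Python. This is a minimal illustration, not SPSS output: the function name `describe` and the sample data are hypothetical, and the skewness and kurtosis lines use the bias-adjusted sample formulas found in many statistics packages. Whether they agree with SPSS to the last digit is an assumption to check against your own output.

```python
from math import sqrt
from statistics import mean, median, mode, stdev, variance

def describe(xs):
    """Basic descriptive statistics for a list of numbers.

    Requires at least four values that are not all equal, because the
    kurtosis formula divides by (n - 3) and by the standard deviation.
    """
    n = len(xs)
    m, s = mean(xs), stdev(xs)          # stdev uses the n - 1 denominator
    z = [(x - m) / s for x in xs]       # standardized scores

    # Bias-adjusted sample skewness.
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)

    # Bias-adjusted sample excess kurtosis (0 for a normal distribution).
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

    return {
        "mean": m,
        "median": median(xs),
        "mode": mode(xs),
        "variance": variance(xs),       # sample variance, n - 1 denominator
        "std_dev": s,
        "skewness": skew,
        "kurtosis": kurt,
        "minimum": min(xs),
        "maximum": max(xs),
        "range": max(xs) - min(xs),
        "sum": sum(xs),
        "std_error": s / sqrt(n),       # standard error of the mean
    }

stats = describe([2, 4, 4, 4, 5, 5, 7, 9])   # hypothetical scores
for name, value in stats.items():
    print(f"{name:>10}: {value}")
```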

# Chapter 1 __An Overview of IBM® SPSS® Statistics__

## Introduction: An Overview of IBM SPSS Statistics 23

THIS BOOK gives you the step-by-step instructions necessary to do most major types of data analysis using SPSS. The software was originally created by three Stanford graduate students in the late 1960s. The acronym “SPSS” initially stood for “Statistical Package for the Social Sciences.” As SPSS expanded their package to address the hard sciences and business markets, the name changed to “Statistical Product and Service Solutions.” In 2009 IBM purchased SPSS and the name morphed to “IBM SPSS Statistics.” SPSS is now such a standard in the industry that IBM has retained the name due to its recognizability. No one particularly cares what the letters “SPSS” stand for any longer. IBM SPSS Statistics is simply one of the world’s largest and most successful statistical software companies. In this book we refer to the program as SPSS.

## 1.1 Necessary Skills

For this book to be effective when you conduct data analysis with SPSS, you should have certain limited knowledge of statistics and have access to a computer that has the necessary resources to run SPSS. Each issue is addressed in the next two paragraphs.

STATISTICS You should have had at least a basic course in statistics or be in the process of taking such a course. While it is true that this book devotes the first two or three pages of each chapter to a description of the statistical procedure that follows, these descriptions are designed to refresh the reader’s memory, not to instruct the novice. While it is certainly possible for the novice to follow the steps in each chapter and get SPSS to produce pages of output, a fundamental grounding in statistics is important for an understanding of which procedures to use and what all the output means. In addition, while the first 16 chapters should be understandable by individuals with limited statistical background, the final 12 chapters deal with much more complex and involved types of analyses. These chapters require substantial grounding in the statistical techniques involved.

COMPUTER REQUIREMENTS You must:

· Have access to a personal computer that has

· Microsoft® Windows Vista® or Windows® 7 or 8.1 or 10; MAC OS® 10.8 (Mountain Lion) or higher installed

· IBM SPSS Statistics 23.0 installed

· Know how to turn the computer on

· Have a working knowledge of the keys on the keyboard and how to use a mouse or another selection device, such as keyboard strokes or a touch-screen monitor.

This book will take you the rest of the way. If you are using SPSS on a network of computers (rather than your own PC or MAC) the steps necessary to access IBM SPSS Statistics may vary slightly from the single step shown in the pages that follow.

## 1.2 Scope of Coverage

IBM SPSS Statistics is a complex and powerful statistical program by any standards. The software occupies about 800 MB of your hard drive and requires at least 1 GB of RAM to operate adequately. Despite its size and complexity, SPSS has created a program that is not only powerful but is user friendly (you’re the user; the program tries to be friendly). By improvements over the years, SPSS has done for data analysis what Henry Ford did for the automobile: made it available to the masses. SPSS is able to perform essentially any type of statistical analysis ever used in the social sciences, in the business world, and in other scientific disciplines.

This book was written for Version 23 of IBM SPSS Statistics. More specifically, the screen shots and output are based on Version 23.0. With some exceptions, what you see here will be similar to SPSS Version 7.0 and higher. Because only a few parts of SPSS are changed with each version, most of this book will apply to previous versions. It’s 100% up-to-date with Version 23.0, but it will lead you astray only about 2% of the time if you’re using Version 21.0 or 22 and is perhaps 60% accurate for Version 7.0 (if you can find a computer and software that old).

Our book covers the statistical procedures present in three of the modules created by SPSS that are most frequently used by researchers. A module (within the SPSS context) is simply a set of different statistical operations. We include the Base Module (technically called IBM SPSS Statistics Base), the module covering advanced statistics (IBM SPSS Advanced Statistics), and the module that addresses regression models (IBM SPSS Regression)—all described in greater detail later in this chapter. To support their program, SPSS has created a set of comprehensive manuals that cover all procedures these three modules are designed to perform. To a person fluent in statistics and data analysis, the manuals are well written and intelligently organized. To anyone less fluent, however, the organization is often undetectable, and the comprehensiveness (the equivalent of almost 2,000 pages of fine-print text) is overwhelming. To the best of our knowledge, hard-copy manuals are no longer available but most of this information may now be accessed from SPSS as PDF downloads. The same information is also available in the exhaustive online Help menu. Despite changes in the method of accessing this information, for sake of simplicity we still refer to this body of information as “SPSS manuals” or simply “manuals.” Our book is about 400 pages long. Clearly we cannot cover in 400 pages as much material as the manuals do in 2,000, but herein lies our advantage.

The purpose of this book is to make the fundamentals of most types of data analysis clear. To create this clarity requires the omission of much (often unnecessary) detail. Despite brevity, we have been keenly selective in what we have included and believe that the material presented here is sufficient to provide simple instructions that cover 95% of analyses ever conducted by researchers. Although we cannot substantiate that exact number, our time in the manuals suggests that at least 1,600 of the 2,000 pages involve detail that few researchers ever consider. How often do you really need 7 different methods of extracting and 6 methods of rotating factors in factor analysis, or 18 different methods for post-hoc comparisons after a one-way ANOVA? (By the way, that last sentence should be understood by statistical geeks only.)

We are in no way critical of the manuals; they do well what they are designed to do and we regard them as important adjuncts to the present book. When our space limitations prevent explanation of certain details, we often refer our readers to the SPSS manuals. Within the context of presenting a statistical procedure, we often show a window that includes several options but describe only one or two of them. This is done without apology, except for the occasional “description of these options extends beyond the scope of this book,” after which we cheerfully refer you to the appropriate SPSS manual. The ultimate goal of this format is to create clarity without sacrificing necessary detail.

## 1.3 Overview

This chapter introduces the major concepts discussed in this book and gives a brief overview of the book’s organization and the basic tools that are needed in order to use it.

If you want to run a particular statistical procedure, have used IBM SPSS Statistics before, and already know which analysis you wish to conduct, you should read the Typographical and Formatting Conventions section in this chapter (pages 5–7) and then go to the appropriate chapter in the last portion of the book (Chapters 6 through 28). Those chapters will tell you exactly what steps you need to perform to produce the output you desire.

If, however, you are new to IBM SPSS Statistics, then this chapter will give you important background information that will be useful whenever you use this book.

## 1.4 This Book’s Organization, Chapter by Chapter

This book was created to describe the crucial concepts of analyzing data. There are three basic tasks associated with data analysis:

1. You must type data into the computer, and organize and format the data so both SPSS and you can identify it easily,

2. You must tell SPSS what type of analysis you wish to conduct, and

3. You must be able to interpret what the SPSS output means.

After this introductory chapter, Chapter 2 deals with basic operations such as types of SPSS windows, the use of the toolbar and menus, saving, viewing, and editing the output, printing output, and so forth. While this chapter has been created with the beginner in mind, there is much SPSS-specific information that should be useful to anyone. Chapter 3 addresses the first step mentioned above—creating, editing, and formatting a data file. The SPSS data editor is an instrument that makes the building, organizing, and formatting of data files wonderfully clear and straightforward.

Chapters 4 and 5 deal with two important issues—modification and transformation of data (Chapter 4) and creation of graphs or charts (Chapter 5). Chapter 4 deals specifically with different types of data manipulation, such as creating new variables, reordering, restructuring, merging files, or selecting subsets of data for analysis. Chapter 5 introduces the basic procedures used when making a number of different graphs; some graphs, however, are described more fully in the later chapters.

Chapters 6 through 28 then address tasks 2 and 3: analyzing your data and interpreting the output. It is important to note that each of the analysis chapters is self-contained. If the beginner, for example, were instructed to conduct t tests on certain data, Chapter 11 would give complete instructions for accomplishing that procedure. In the Step by Step section, Step 1 is always “start the SPSS program” and refers the reader to Chapter 2 if there are questions about how to do this. The second step is always “create a data file or edit (if necessary) an already existing file,” and the reader is then referred to Chapter 3 for instructions if needed. Then the steps that follow explain exactly how to conduct a t test.

As mentioned previously, this book covers three modules produced by SPSS: IBM SPSS Statistics Base, IBM SPSS Advanced Statistics, and IBM SPSS Regression. Since some computers at colleges or universities may not have all of these modules (the Base module is always present), the book is organized according to the structure SPSS has imposed: We cover almost all procedures included in the Base module and then selected procedures from the more complex Advanced and Regression Modules. Chapters 6–22 deal with processes included in the Base module. Chapters 23–27 deal with procedures in the Advanced Statistics and Regression Modules, and Chapter 28, the analysis of residuals, draws from all three.

IBM SPSS STATISTICS BASE: Chapters 6 through 10 describe the most fundamental data analysis methods available, including frequencies, bar charts, histograms, and percentiles (Chapter 6); descriptive statistics such as means, medians, modes, skewness, and ranges (Chapter 7); crosstabulations and chi-square tests of independence (Chapter 8); subpopulation means (Chapter 9); and correlations between variables (Chapter 10).

The next group of chapters (Chapters 11 through 17) explains ways of testing for differences between subgroups within your data or showing the strength of relationships between a dependent variable and one or more independent variables through the use of t tests (Chapter 11); ANOVAs (Chapters 12, 13, and 14); linear, curvilinear, and multiple regression analysis (Chapters 15 and 16); and the most common forms of nonparametric tests (Chapter 17).

Reliability analysis (Chapter 18) is a standard measure used in research that involves multiple response measures; multidimensional scaling is designed to identify and model the structure and dimensions of a set of stimuli from dissimilarity data (Chapter 19); and then factor analysis (Chapter 20), cluster analysis (Chapter 21), and discriminant analysis (Chapter 22) all occupy stable and important niches in research conducted by scientists.

IBM SPSS ADVANCED STATISTICS AND REGRESSION: The next series of chapters deals with analyses that involve multiple dependent variables (SPSS calls these procedures General Linear Models; they are also commonly called MANOVAs or MANCOVAs). Included under the heading General Linear Model are simple and general factorial models and multivariate models (Chapter 23), and models with repeated measures or within-subjects factors (Chapter 24).

The next three chapters deal with procedures that are only infrequently performed, but they are described here because when these procedures are needed they are indispensable. Chapter 25 describes logistic regression analysis and Chapters 26 and 27 describe hierarchical and nonhierarchical log-linear models, respectively. As mentioned previously, Chapter 28 on residuals closes out the book.

## 1.5 An Introduction to the Example

A single data file is used in 17 of the first 19 chapters of this book. For more complex procedures it has been necessary to select different data files to reflect the particular procedures that are presented. Example data files are useful because often, things that appear to be confusing in the SPSS documentation become quite clear when you see an example of how they are done. Although only the most frequently used sample data file is described here, there are a total of 12 data sets that are used to demonstrate procedures throughout the book, in addition to data sets utilized in the exercises. Data files are available for download at www.spss-step-by-step.net. These files can be of substantial benefit to you as you practice some of the processes presented here without the added burden of having to input the data. We suggest that you make generous use of these files by trying different procedures and then comparing your results with those included in the output sections of different chapters.

The example has been designed so it can be used to demonstrate most of the statistical procedures presented here. It consists of a single data file used by a teacher who teaches three sections of a class with approximately 35 students in each section. For each student, the following information is recorded:

· ID number

· Name

· Gender

· Ethnicity

· Year in school

· Upper- or lower-division class person

· Previous GPA

· Section

· Whether or not he or she attended review sessions or did the extra credit

· The scores on five 10-point quizzes and one 75-point final exam

In Chapter 4 we describe how to create four new variables. In all presentations that follow (and on the data file available on the website), these four variables are also included:

· The total number of points earned

· The final percent

· The final grade attained

· Whether the student passed or failed the course
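
To make the four derived variables concrete, here is a small Python sketch of how they could be computed from the recorded scores. The 125-point maximum follows from the description above (five 10-point quizzes plus a 75-point final). The class name `Student`, the field names, the letter-grade cutoffs, and the rule that any grade other than F is a pass are assumptions made for illustration, not the definitions used in Chapter 4.

```python
from dataclasses import dataclass
from typing import List

MAX_POINTS = 5 * 10 + 75  # five 10-point quizzes plus one 75-point final = 125

# Assumed cutoffs (minimum percent, letter); the book's own scale may differ.
GRADE_CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]

@dataclass
class Student:
    quizzes: List[int]  # five quiz scores, each 0 to 10
    final: int          # final exam score, 0 to 75

    @property
    def total(self) -> int:
        """Total number of points earned."""
        return sum(self.quizzes) + self.final

    @property
    def percent(self) -> float:
        """Final percent of the 125 available points."""
        return 100 * self.total / MAX_POINTS

    @property
    def grade(self) -> str:
        """Letter grade under the assumed cutoffs."""
        for cutoff, letter in GRADE_CUTOFFS:
            if self.percent >= cutoff:
                return letter
        return "F"

    @property
    def passed(self) -> bool:
        """Assumed rule: anything other than an F is a pass."""
        return self.grade != "F"

strong = Student(quizzes=[8, 9, 7, 10, 9], final=62)
weak = Student(quizzes=[3, 4, 2, 5, 4], final=30)
for s in (strong, weak):
    print(s.total, s.percent, s.grade, s.passed)
```

Computing the derived values on demand, rather than typing them in by hand, removes one source of data-entry error.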

The example data file (the entire data set is displayed at the end of Chapter 3) will also be used as the example in the introductory chapters (Chapters 2 through 5). If you enter the data yourself and follow the procedures described in these chapters, you will have a working example data file identical to that used through the first half of this book. Yes, the same material is recorded on the downloadable data files, but it may be useful for you to practice data entry, formatting, and certain data manipulations with this data set. If you have your own set of data to work with, all the better.

One final note: All of the data in the grades file are totally fictional, so any findings exist only because we created them when we made the file.

## 1.6 Typographical and Formatting Conventions

CHAPTER ORGANIZATION Chapters 2 through 5 describe IBM SPSS Statistics formatting and procedures, and the material covered dictates each chapter’s organization. Chapters 6 through 28 (the analysis chapters) are, with only occasional exceptions, organized identically. This format includes:

1. The Introduction in which the procedure that follows is described briefly and concisely. These introductions vary in length from one to seven pages depending on the complexity of the analysis being described.

2. The Step by Step section in which the actual steps necessary to accomplish particular analyses are presented. Most of the typographical and formatting conventions described in the following pages refer to the Step by Step sections.

3. The Output section, in which the results from analyses described earlier are displayed—often abbreviated. Text clarifies the meaning of the output, and all of the critical output terms are defined.

THE SCREENS Due to the very visual nature of SPSS, every chapter contains pictures of screens or windows that appear on the computer monitor as you work. The first picture from Chapter 6 (below) provides an example. These pictures are labeled “Screens” despite the fact that sometimes what is pictured is a screen (everything that appears on the monitor at a given time) and other times is a portion of a screen (a window, a dialog box, or something smaller). If the reader sees reference to Screen 13.3, she knows that this is simply the third picture in Chapter 13. The screens are typically positioned within breaks in the text (the screen icon and a title are included) and are used for sake of reference as procedures involving that screen are described. Sometimes the screens are separate from the text and labels identify certain characteristics of the screen (see the inside front cover for an example). Because screens take up a lot of space, frequently used screens are included on the inside front and back covers of this book. At other times, within a particular chapter, a screen from a different chapter may be cited to save space.

Screen 1.1 The Frequencies Window

Sometimes a portion of a screen or window is displayed (such as the menu bar included here) and is embedded within the text without a label.

The Step by Step boxes: Text that surrounds the screens may designate a procedure, but it is the Step by Step boxes that identify exactly what must be done to execute a procedure. The following box illustrates:

Sequence Step 3 means: “Beginning with Screen 1 (displayed on the inside front cover), click on the word File, move the cursor to Open, and then click the word Data. At this point a new window will open (Screen 2 on the inside front cover); type ‘grades.sav’ and then click the Open button, at which point a screen with your data file opens.” Notice that within brackets shortcuts are sometimes suggested: Rather than the File → Open → Data sequence, it is quicker to click the icon. Instead of typing grades.sav and then clicking Open, it is quicker to double click on the grades.sav (with or without the “.sav” suffix; this depends on your settings) file name. Items within Step by Step boxes include:

Screens: A small screen icon will be placed to the left of each group of instructions that are based on that screen. There are three different types of screen icons:

Other images with special meaning inside of Step by Step boxes include:

Sometimes fonts can convey information, as well:

| Font | What it Means |
| --- | --- |
| Monospaced font (Courier) | Any text within the boxes that is rendered in the Courier font represents text (numbers, letters, words) to be typed into the computer (rather than being clicked or selected). |
| Italicized text | Italicized text is used for information or clarifications within the Step by Step boxes. |
| Bold font | The bold font is used for words that appear on the computer screen. |

The groundwork is now laid. We wish you a pleasant journey through the exciting and challenging world of data analysis!

George, Darren. *IBM SPSS Statistics 23 Step by Step, 14th Edition*. Routledge, 2016. VitalBook file.
