Statistics  @ eMaestro

Monday, September 27, 2004

Illustration of Statistical Concepts

http://creative-wisdom.com/computer/sas/sas.html

Checking assumptions of regression
Multicollinearity, Variance Inflation Factor, and Orthogonalization
Centered-score Regression
Using Perl, Bilog, SAS, and DataDesk to Visualize Item Characteristic Curves
Using SAS/Graph to Visualize Distractor Analysis
Using SAS for Item Analysis and Test Construction I
Using SAS for Item Analysis and Test Construction II
Using SAS/IntrNet for evaluating Web-based instruction (Theory)
Using SAS to analyze user access log (Practice)
Filling in Missing Data and Transposing
Joining Two Files
Merging Files with Either Student ID or Social Security Number
Analyze change of pre-test and post-test
Porting SAS Output from UNIX, Mainframe to Desktop
Tips of SAS for Windows
Grading raw responses into binary scores
Using SAS for item analysis and test construction
Writing efficient SAS source codes
Automation of changing item ID
Fill in missing information
JMP
TIPS
Using JMP to analyze user access log (Practice)
Biplot for Principal Component Analysis
Diamond Plot for Comparing Group Means and Variability
Leverage Plot for F-test
Other Statistics Resources
Incoherence and the parametric test framework: Misconceived relationships among sample, sampling distributions, and population
Misconceived relationships between logical positivism and quantitative research: An analysis in the framework of Ian Hacking
Probabilistic inferences or dichotomous answers?
Statistical reasoning
Degrees of Freedom
Don't believe in Null Hypothesis?
Mathematical Reality -- Do Theoretical Distributions Exist?
The interaction of research goal, data type, and graphical format in multivariate visualization
Bring the world into the classroom with internet resources
Identification of misconceptions in learning central limit theorem and evaluation of computer-based instruction as a remedial tool
Identification of misconceptions concerning statistical power with dynamic graphics as a remedial tool
The visualization of multi-way interactions and high-order terms in multiple regression
Induction? Deduction? Abduction? Is there a logic of EDA?
Statistical simulations in Xlisp-stat
Color regression (DataDesk)
Using DeltaGraph: Quick and dirty

Concept - Degree of Freedom

http://seamonkey.ed.asu.edu/~alex/computer/sas/df.html

Married Man: There is only one subject and my degree of freedom is zero. So I shall increase my "sample size."

Illustrating degrees of freedom in terms of sample size and dimensionality
Dr. Chong Ho (Alex) Yu
"Degrees of freedom" have nothing to do with your marriage. "Degree of freedom" (df) is an "intimate stranger" to statistics students. Every quantitative-based research paper requires reporting of degrees of freedom associated with the test results such as "F(df1, df2)," yet very few people understand why it is essential to do so. Although the concept "degree of freedom" is taught in introductory statistics classes, many students learn the literal definition of this term rather than its deeper meaning. Failure to understand "degrees of freedom" has two side effects. First, students and inexperienced researchers tend to mis-interpret a "perfect-fitted" model or an "over-fitted" model as a good model. Second, they have a false sense of security that df is adequate while n is large. This reflects the problem that most failed to comprehend that df is a function of both the number of observations and the number of variables in one's model. Frustration by this problem among statistical instructors is manifested by the fact that the issue "how df should be taught" has been recurring in several statistical-related discussion groups (e.g. edstat-l, sci.stat.edu, sci.stat.math).
Many elementary statistics textbooks introduce this concept in terms of the number of values that are "free to vary" (Howell, 1992; Jaccard & Becker, 1990). Some textbooks simply give the df of various distributions (e.g., Moore & McCabe, 1989; Agresti & Finlay, 1986). Johnson (1992) simply called the degrees of freedom the "index number" for identifying which distribution is used. Some definitions given by statistics instructors are as obscure as "a mathematical property of a distribution related to the number of values in a sample that can be freely specified once you know something about the sample" (cited in Flatto, 1996). The preceding explanations do not clearly show the purpose of df. Even advanced statistics textbooks do not discuss degrees of freedom in detail (e.g., Hays, 1981; Maxwell & Delany, 1990; Winer, Brown, & Michels, 1991). It is not uncommon for advanced statistics students and experienced researchers to have only a vague idea of the concept.
There are other approaches to presenting the concept of degrees of freedom, most of them mathematical in essence (see Appendix A). While these mathematical explanations have merit, they may still be difficult for statistics students, especially in the social sciences, who generally do not have a strong mathematical background. In the following sections, it is suggested that df be explained in terms of sample size and dimensionality; both represent the number of pieces of useful information.
Df in terms of sample size
Toothaker (1986) explained df as the number of independent components minus the number of parameters estimated. This approach is based upon the definition provided by Walker (1940): the number of observations minus the number of necessary relations obtainable from those observations (df = n - r). Although Good (1973) criticized Walker's approach on the grounds that the meaning of "necessary relations" is not obvious, the number of necessary relationships is intuitive when there are just a few variables. A full definition of "necessary relationship" is beyond the scope of this article; to avoid confusion, it is simply taken here to be the relationship between the dependent variable (Y) and each independent variable (X) in the research.
Please keep in mind that this illustration is simplified for conceptual clarity. Although Walker regarded the preceding equation as a universal rule, df = n - r should not be assumed to apply to every situation.
No degree of freedom
In a scatterplot where there is only one datum point, you cannot estimate the regression line at all. The line can go in any direction, as shown in the following graph. In other words, there is no useful information.
When the degree of freedom is zero (df = n - r = 1 - 1 = 0), there is no way to prove or disprove the model! In this sense, the data have no "freedom" to vary and you have no "freedom" to conduct research with this data set.
Perfect fitting
In order to plot a regression line, you must have at least two data points, as indicated in the following scattergram.
In this case, there is one degree of freedom for estimation (n - 1 = 1, where n = 2). With only two data points, one can always join them with a straight regression line and get a perfect correlation (r = 1.00). Since the line goes through all data points and there is no residual, it is considered a "perfect" fit. The term "perfect fit" can be misleading: naive students may regard it as a good sign, but the opposite is true. When you marry a perfect man/woman, it may be too good to be true! The so-called "perfect fit" results from the lack of useful information. Since the data do not have much "freedom" to vary and no alternative models can be explored, the researcher has no "freedom" to further the study.
This point is extremely important because very few researchers are aware that a perfect fit is a sign of serious problems. For instance, when Mendel conducted his research on heredity, the conclusion was derived from almost "perfect" data. Later, R. A. Fisher argued that the data were too good to be true; after re-analyzing them, he found that the "perfectly fitted" data were actually erroneous (Press & Tanur, 2001).
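To make the "perfect fit" concrete, here is a minimal sketch in Python (a tool not used elsewhere in this post; the two data points are invented). Whatever two distinct points you choose, the fitted line passes through both of them, the residuals are exactly zero, and the correlation is a "perfect" 1.00 in absolute value, purely because there is no information left over to contradict the line.

import numpy as np

# Two arbitrary, distinct data points (invented values for illustration)
x = np.array([1.0, 4.0])
y = np.array([2.0, -3.0])

slope, intercept = np.polyfit(x, y, 1)      # fit the straight regression line
residuals = y - (slope * x + intercept)

print(residuals)                 # essentially [0. 0.]: the line passes through both points
print(np.corrcoef(x, y)[0, 1])   # -1.0: a "perfect" correlation from only two points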
Over-fitting
In addition, when there are too many variables in a regression model, i.e., the number of parameters to be estimated is larger than the number of observations, the model is said to lack degrees of freedom and is therefore over-fitted. To simplify the illustration, a scenario with three observations and two variables is presented.
Strictly speaking, it would take four or more variables with three or fewer observations to make the model over-fitted. Nevertheless, when only three subjects are used to estimate the strength of association between two variables, the situation is bad enough. Because there are so few observations, the residuals are small, which gives the illusion that the model and the data fit each other very well. When the sample size is larger and the data points scatter around the plot, the residuals are of course higher, and the model tends to have a lesser degree of fit. Nevertheless, a less well-fitted model that results from more degrees of freedom carries more merit.
Useful information
Finally, you should see that the degree of freedom is the number of pieces of useful information.
Sample size    Degree(s) of freedom    Amount of information
     1                  0              no information
     2                  1              not enough information
     3                  2              still not enough information
Falsifiability
To further explain why a lack of useful information is detrimental to research, degrees of freedom can be tied to falsifiability. In the case of perfect fitting, the model is "always right." In over-fitting, the model tends to be "almost right." Both models have a low degree of falsifiability. The concept of falsifiability was introduced by Karl Popper (1959), a prominent philosopher of science. According to Popper, the validity of knowledge is tied to the probability of falsification: scientific propositions can be falsified empirically, whereas unscientific claims are always "right" and cannot be falsified at all. We cannot conclusively affirm a hypothesis, but we can conclusively negate one. The more specific a theory is, the higher the possibility that its statements can be negated. For Popper, the scientific method consists of "proposing bold hypotheses, and exposing them to the severest criticism, in order to detect where we have erred" (1974, p. 68). If the theory can stand this "trial of fire," we can confirm its validity. When there are no or few degrees of freedom, the data can be fitted to any theory, and the theory is therefore unfalsifiable.
df in terms of dimensions and parameters
Now degrees of freedom are illustrated in terms of dimensionality and parameters. According to I. J. Good, degrees of freedom can be expressed as

df = D(K) - D(H),

where

D(K) = the dimensionality of a broader hypothesis, such as the full model in a regression
D(H) = the dimensionality of the null hypothesis, such as a restricted or null model
In the following, vectors (variables) in hyperspace are used for illustration (Saville & Wood, 1991; Wickens, 1995). It is important to point out that the illustration is only a metaphor to make comprehension easier; vectors do not behave literally as shown.

For the time being, let's ignore the intercept. What are the degrees of freedom when there is one variable (vector) in a regression model? First, we need to find the number of parameters in a one-predictor model. Since there is only one predictor, there is only one beta weight to be estimated. The answer is straightforward: one parameter.
How about a null model? In a null model the parameter is set to zero, the expected Y score is equal to the mean of Y, and there is no beta weight to be estimated.
Based upon df = D(K) - D(H), when there is only one predictor, the degrees of freedom equal one (1 - 0 = 1). This means there is only one piece of useful information for estimation, and the model is not well supported.
As you can see, a two-predictor model (df = 2 - 0 = 2) is better supported than the one-predictor model (df = 1 - 0 = 1). As the number of orthogonal vectors increases, we have more information with which to predict Y, and the model tends to be more stable.
In short, the degree of freedom can be defined in the context of dimensionality, which conveys the amount of useful information. However, increasing the number of variables is not always desirable.
The section treating df as n - r mentioned the problem of over-fitting, in which there are too few observations for too many variables. When you add more variables to a model, R2 (the variance explained) can never decrease. However, adding more variables without enough observations to support the model is another way to create over-fitting. Simply put, the more variables you have, the more observations you need.
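The point that R2 can never go down when a predictor is added is easy to verify numerically. The sketch below (Python with numpy; the data and the pure-noise predictor are invented for the illustration) fits the same response with and without a junk variable and compares the two R2 values.

import numpy as np

rng = np.random.default_rng(0)
n = 20
x1 = rng.normal(size=n)
y = 2.0 * x1 + rng.normal(size=n)        # response driven by x1 only
junk = rng.normal(size=n)                # a predictor unrelated to y

def r_squared(predictors, y):
    X = np.column_stack([np.ones(len(y)), predictors])   # add the intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

print(r_squared(x1, y))                           # R2 with the one real predictor
print(r_squared(np.column_stack([x1, junk]), y))  # never smaller, despite the junk

The second value is at least as large as the first even though the extra column is pure noise; only attention to df (or an adjusted measure) penalizes the wasted information.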
Putting both together
The above illustrations (Part I and Part II) treat df in terms of sample size and df in terms of dimensionality (variables) separately. In the context of df, observations (n) and parameters (k) must be considered together.
For instance, in regression the working definition of degrees of freedom involves both observations and dimensionality: df = n - k - 1, where n = sample size and k = the number of variables. Take the three-observation, two-variable case as an example: df = 3 - 2 - 1 = 0!
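The arithmetic and its consequence can both be shown in a few lines. The sketch below (Python; the numbers are randomly generated and purely illustrative) computes df = n - k - 1 for three observations and two predictors, and then shows that such a model reproduces any y exactly, i.e., it is "perfectly" over-fitted.

import numpy as np

n, k = 3, 2
print("df =", n - k - 1)                    # 3 - 2 - 1 = 0

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # intercept + 2 predictors
y = rng.normal(size=n)                      # y is unrelated to X by construction

beta = np.linalg.solve(X, y)                # 3 equations, 3 unknowns: exact solution
print(np.allclose(X @ beta, y))             # True -- zero residual with df = 0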
View the flash version of this tutorial
References
Agresti, A., & Finlay, B. (1986). Statistical methods for the social sciences. San Francisco, CA: Dellen.
Cramer, H. (1946). Mathematical methods of statistics. Princeton, NJ: Princeton University Press.
Flatto, J. (1996, May 3). Degrees of freedom question. Computer Software System-SPSS Newsgroup (comp.soft-sys.spss).
Galfo, A. J. (1985). Teaching degrees of freedom as a concept in inferential statistics: An elementary approach. School Science and Mathematics, 85(3), 240-247.
Good, I. J. (1973). What are degrees of freedom? American Statistician, 27, 227-228.
Hays, W. L. (1981). Statistics. New York: Holt, Rinehart and Winston.
Howell, D. C. (1992). Statistical methods for psychology (3rd ed.). Belmont, CA: Duxbury.
Jaccard, J. & Becker, M.A. (1990). Statistics for the behavioral sciences. (2nd ed.). Belmont, CA: Wadsworth.
Johnson, R. A. & Wichern, D. W. (1998). Applied multivariate statistical analysis. Englewood Cliffs, NJ: Prentice Hall.
Maxwell, S., & Delany, H. (1990). Designing experiments and analyzing data. Belmont, CA: Wadsworth.
Moore, D. S. & McCabe, G. P. (1989). Introduction to the practice of statistics. New York: W. H. Freeman and Company.
Popper, K. R. (1959). Logic of scientific discovery. London: Hutchinson.
Popper, K. R. (1974). Replies to my critics. In P. A. Schilpp (Ed.), The philosophy of Karl Popper (pp. 963-1197). La Salle: Open Court.
Press, S. J., & Tanur, J. M. (2001). The subjectivity of scientists and the Bayesian approach. New York: John Wiley & Sons.
Rawlings, J. O. (1988). Applied regression analysis: A research tool. Pacific Grove, CA: Wadsworth and Brooks/Cole.
Saville, D. & Wood, G. R. (1991). Statistical methods: The geometric approach. New York: Springer-Verlag.
Toothaker, L. E., & Miller, L. (1996). Introductory statistics for the behavioral sciences. (2nd ed.). Pacific Grove, CA: Brooks/Cole.
Walker, H. W. (1940). Degrees of freedom. Journal of Educational Psychology, 31, 253-269.
Wickens, T. (1995). The geometry of multivariate statistics. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Winer, B. J., Brown, D. R., & Michels, K. M. (1991). Statistical principles in experimental design. (3rd ed.). New York: McGraw-Hill.
Appendix
Different approaches to illustrating degrees of freedom
1. Cramer (1946) defined degrees of freedom as the rank of a quadratic form. Muirhead (1994) also adopted a geometrical approach to explain this concept. Degrees of freedom typically refer to Chi-square distributions (and to F distributions, which are just ratios of Chi-squares). Chi-square distributed random variables are sums of squares (or quadratic forms) and can be represented as the squared lengths of vectors. The dimension of the subspace in which the vector is free to roam is exactly the degrees of freedom. Let X_1 to X_n be independent N(0,1) variables, and let X be the column vector whose ith element is X_i. Then X can roam all over Euclidean n-space. Its squared length, X'X = X_1^2 + ... + X_n^2, has a Chi-square distribution with n degrees of freedom.

With the same setup, now let Y be the vector whose ith element is X_i - X-bar, where X-bar is the sample mean. Since the sum of the elements of Y must always be zero, Y cannot roam all over n-dimensional Euclidean space, but is restricted to the (n-1)-dimensional subspace of all n x 1 vectors whose elements sum to zero. Its squared length, Y'Y, has a Chi-square distribution with n - 1 degrees of freedom.
All commonly occurring situations involving Chi-square distributions are similar. The most common of these are in analysis of variance (or regression) settings. F-ratios here are ratios of independent Chi-square random variables, and inherit their degrees of freedom from the subspaces in which the corresponding vectors must lie.
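A short simulation (Python with numpy; the sample size and seed are arbitrary) illustrates the two Chi-square facts above: the squared length of n independent N(0,1) values behaves like a Chi-square with n degrees of freedom, while centering at the sample mean removes one dimension and leaves n - 1.

import numpy as np

rng = np.random.default_rng(42)
n, reps = 5, 100_000
X = rng.standard_normal((reps, n))          # reps draws of n independent N(0,1) values

ss_raw = np.sum(X ** 2, axis=1)                                          # X'X
ss_centered = np.sum((X - X.mean(axis=1, keepdims=True)) ** 2, axis=1)   # Y'Y

# The mean of a Chi-square equals its degrees of freedom:
print(ss_raw.mean())        # close to n (= 5)
print(ss_centered.mean())   # close to n - 1 (= 4)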
2. Galfo (1985) viewed degrees of freedom as representing the quality of the given statistic, which is computed using the sample X values. Since in the computation of m the X values can take on any of the values present in the population, the number of X values, n, selected for the given sample is the df for m. The n used in the computation of m also expresses the "rung of the ladder" of quality of the computed m; i.e., if n = 1, the df, or restriction, placed on the computation is at the lowest quality level.
3. Rawlings (1988) associated degrees of freedom with each sum of squares (in multiple regression) as the number of dimensions in which that vector is "free to move." Y is free to fall anywhere in n-dimensional space and, hence, has n degrees of freedom. Y-hat, on the other hand, must fall in the X-space and, hence, has degrees of freedom equal to the dimension of the X-space (p', the number of independent variables in the model). The residual vector e can fall anywhere in the subspace of the n-dimensional space that is orthogonal to the X-space. This subspace has dimensionality n - p' and, hence, e has n - p' degrees of freedom.
4. Chen Xi (personal communication) asserted that the best way to describe the concept of degrees of freedom is in control theory: the degree of freedom is a number indicating constraints. With the same number of constraints as degrees of freedom, the whole system is determined. For example, a particle moving in three-dimensional space has 9 degrees of freedom: 3 for positions, 3 for velocities, and 3 for accelerations. If it is in free fall, 4 degrees of freedom are removed, leaving 2 velocities and 2 accelerations in the x-y plane. There are infinitely many ways to add constraints, but each constraint limits the motion in a certain way. The order of the state equation for a controllable and observable system is in fact the degree of freedom.
5. Selig (personal communication) stated that degrees of freedom are lost for each parameter in a model that is estimated in the process of estimating another parameter. For example, one degree of freedom is lost when we estimate the population mean using the sample mean; two degrees of freedom are lost when we estimate the standard error of estimate (in regression) using Y-hat (one degree of freedom for the Y-intercept and one degree of freedom for the slope of the regression line).
6. Lambert (personal communication) regarded degrees of freedom as the number of measurements exceeding the amount absolutely necessary to measure the "object" in question. For example, measuring the diameter of a steel rod requires a minimum of one measurement. If ten measurements are taken instead, the set of ten measurements has nine degrees of freedom. In Lambert's view, once the concept is explained in this way, it is not difficult to extend it to statistical estimators: if n measurements are made of m unknown quantities, then the degrees of freedom are n - m.
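Lambert's rod example maps directly onto the familiar n - 1 divisor of the sample variance. The sketch below (Python; the ten measurement values are invented) shows that dividing the sum of squared deviations by n - m, with m = 1 unknown quantity, is exactly what numpy's ddof=1 option does.

import numpy as np

measurements = np.array([10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 10.0, 9.9, 10.3, 9.7])
n, m = len(measurements), 1                 # n measurements of m unknown quantities

deviations = measurements - measurements.mean()
print(n - m)                                # 9 degrees of freedom
print(np.sum(deviations ** 2) / (n - m))    # sum of squares divided by df ...
print(np.var(measurements, ddof=1))         # ... matches the unbiased sample variance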

Friday, September 24, 2004

Concept - PDF

Probability density function
From Wikipedia, the free encyclopedia.
In mathematics, a probability density function (pdf) serves to represent a probability distribution in terms of integrals. If a probability distribution has density f(x), then intuitively the infinitesimal interval [x, x + dx] has probability f(x) dx. Informally, a probability density function can be seen as a "smoothed out" version of a histogram: if one empirically measures values of a continuous random variable repeatedly and produces a histogram depicting relative frequencies of output ranges, then this histogram will resemble the random variable's probability density (assuming that the variable is sampled sufficiently often and the output ranges are sufficiently narrow).
Formally, a probability distribution has density f(x) if f(x) is a non-negative Lebesgue-integrable function R → R such that the probability of the interval [a, b] is given by

P(a ≤ X ≤ b) = ∫_a^b f(x) dx

for any two numbers a and b. This implies that the total integral of f must be 1. Conversely, any non-negative Lebesgue-integrable function with total integral 1 is the probability density of a suitably defined probability distribution.
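Both properties (total integral 1, interval probabilities as the integral of f) can be checked numerically. Here is a small sketch with Python and scipy; the choice of the standard normal and of the interval [-1, 1] is arbitrary.

import numpy as np
from scipy import integrate, stats

f = stats.norm.pdf                                   # standard normal density
total, _ = integrate.quad(f, -np.inf, np.inf)        # total integral of f
prob, _ = integrate.quad(f, -1.0, 1.0)               # P(-1 <= X <= 1)

print(total)                                         # ~1.0
print(prob)                                          # ~0.6827
print(stats.norm.cdf(1.0) - stats.norm.cdf(-1.0))    # the same probability via the CDF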
For example, the continuous uniform distribution on the interval [0,1] has probability density f(x) = 1 for 0 ≤ x ≤ 1 and zero elsewhere. The standard normal distribution has probability density

f(x) = (1/√(2π)) exp(−x²/2).
If a random variable X is given and its distribution admits a probability density function f(x), then the expected value of X (if it exists) can be calculated as

E[X] = ∫ x f(x) dx.
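As a quick numerical check of the expectation formula (again with scipy, and again only a sketch), integrating x·f(x) for the standard normal recovers its mean of zero.

import numpy as np
from scipy import integrate, stats

ex, _ = integrate.quad(lambda x: x * stats.norm.pdf(x), -np.inf, np.inf)
print(ex)   # ~0.0, the mean of the standard normal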
Not every probability distribution has a density function: the distributions of discrete random variables do not; nor does the Cantor distribution, even though it has no discrete component, i.e., does not assign positive probability to any individual point.
A distribution has a density function if and only if its cumulative distribution function F(x) is absolutely continuous. In this case, F is almost everywhere differentiable, and its derivative can be used as the probability density:

f(x) = dF(x)/dx.
If a probability distribution admits a density, then the probability of every one-point set {a} is zero.
It is a common mistake to think of f(a) as the probability of {a}; in fact, f(a) can even be bigger than 1. Consider a random variable with a uniform distribution between 0 and 1/2: its density equals 2 on that interval.
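The uniform-on-[0, 1/2] counterexample is easy to see in code (a sketch using scipy; the evaluation points are arbitrary): the density takes the value 2, yet no probability ever exceeds 1.

from scipy import stats

u = stats.uniform(loc=0.0, scale=0.5)   # uniform distribution on [0, 0.5]
print(u.pdf(0.25))                      # 2.0: the density value exceeds 1
print(u.cdf(0.4) - u.cdf(0.1))          # 0.6: probabilities themselves stay <= 1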
Two probability densities f and g represent the same probability distribution precisely if they differ only on a set of Lebesgue measure zero.

Concept - P-value

The p-value belongs to the hypothesis-testing context. First you need a null hypothesis and an alternative hypothesis. Then you define a test statistic, usually one with the property that a large value leads to rejection of the null hypothesis (i.e., the larger the value of the test statistic, the greater the deviation from the null hypothesis). The p-value is then the probability, under the null hypothesis, that the test statistic (a random variable) is at least as large as its observed realization (i.e., the probability of an even more deviant outcome under the null hypothesis).
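This definition can be made concrete with a small simulation (Python with numpy/scipy; the data are invented and H0 states that the population mean is 0). The Monte Carlo proportion of null-generated t statistics that exceed the observed one approximates the one-sided p-value from the t distribution.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 30
x = rng.normal(loc=0.3, scale=1.0, size=n)        # observed sample; H0: mean = 0

t_obs = x.mean() / (x.std(ddof=1) / np.sqrt(n))   # one-sample t statistic
p_exact = stats.t.sf(t_obs, df=n - 1)             # P(T >= t_obs) under H0

# Monte Carlo version of the same probability, drawing samples that satisfy H0
reps = 100_000
null = rng.normal(size=(reps, n))
t_null = null.mean(axis=1) / (null.std(axis=1, ddof=1) / np.sqrt(n))

print(p_exact)                                    # one-sided p-value from the t distribution
print(np.mean(t_null >= t_obs))                   # simulated p-value, roughly equal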

Our instructor had a very pithy summary of it:
"The p-value is the degree of how much the sample is near the hypothesis."

I don't remember the exact wording, but after mulling it over for a long time I found it made a lot of sense.

A former TA once gave a correct one-line definition of the p-value that was easy to understand; I'll look it up later. In practice, though, we can conveniently treat it as "the probability that H0 is true." (Strictly speaking this interpretation is wrong, because in the actual population H0 is either true or false, determined though not yet known; there is no question of what the probability of its being true is.)

Also, the older convention was to compare the p-value with the chosen α (significance level): if the p-value was larger than α, H0 was taken to be correct; if smaller, H0 was taken to be wrong. That approach is now rarely used. Instead, one looks at the p-value itself: between 0 and 0.01, rejection of H0 is convincing; between 0.01 and 0.05, moderate; between 0.05 and 0.10, suggestive but inconclusive; above 0.10, H0 is accepted.
So this is a purely Fisherian argument, without considering the consequence (or loss) of the action, and it entirely lacks any optimality consideration.

For example, the various tests in S-PLUS (such as t.test() and binom.test()) report three things for parametric inference: the estimated value of the parameter, the CI of the parameter, and the p-value.

It seems none of you has really understood the p-value, or at least you have not gotten to the point. For a pure Bayesian, interpreting the p-value as the probability that the null hypothesis is true is acceptable, because H0 is regarded as a random event, so its probability can be discussed. For a frequentist, my explanation is this: hypothesis testing is analogous to proof by contradiction in mathematics. Assume H0 holds and then look for a contradiction; if one is found, reject the null hypothesis. If no contradiction is found, that does not show the hypothesis holds; perhaps your skill is poor or your luck is bad. Where statistical hypothesis testing resembles proof by contradiction is that it constructs a statistic with which to look for a counterexample (a contradiction), namely a small-probability event. Because statistics deals only in probabilities, there is no absolute truth or contradiction (if you insist, you can treat probability 1 or 0 as truth or falsehood); this is where it differs from mathematics. Because the statistic is a random variable, its probability can be discussed; and because a statistic is a human construction, different statistics have different power.

Some interesting relationships:
(1) If p is small, then reject H0.
(2) If we accept H0, then p must be large.
(3) If we fail to reject H0, we cannot conclude that H0 should be accepted.


Thursday, September 23, 2004

Concept - indicator function

In mathematics, the indicator function (sometimes also called the characteristic function) of a subset A of a set X is the function from X into {0,1} defined as follows:

1_A(x) = 1 if x ∈ A, and 1_A(x) = 0 if x ∉ A.
The term characteristic function is potentially confusing because it is also used to denote a quite different concept that is also prevalent in probability theory; see characteristic function.
The indicator function is a basic tool in probability theory: if X is a probability space with probability measure P and A is a measurable set, then 1_A becomes a random variable whose expected value is equal to the probability of A:

E[1_A] = ∫_X 1_A dP = P(A).
It may be called an indicator variable, as a random variable returning a 0-1 data point.
For discrete spaces the proof may be written more simply as

E[1_A] = Σ_{x ∈ A} P(x) = P(A).
Furthermore, if A and B are two subsets of X, then

1_{A∩B} = min(1_A, 1_B) = 1_A · 1_B, and 1_{A∪B} = max(1_A, 1_B) = 1_A + 1_B − 1_A · 1_B.
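A quick sketch (Python; the toy "die roll" space and the events A and B are invented) shows both facts in action: averaging the indicator of A over many draws approximates P(A), and indicators multiply for intersections.

import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(1, 7, size=100_000)     # simulated rolls of a fair die

ind_A = (x <= 2).astype(int)             # indicator of A = {1, 2}
ind_B = (x % 2 == 0).astype(int)         # indicator of B = {2, 4, 6}

print(ind_A.mean())                      # ~1/3, i.e. E[1_A] = P(A)
print((ind_A * ind_B).mean())            # ~1/6 = P(A ∩ B), since 1_{A∩B} = 1_A · 1_B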

Concepts - σ-algebra

http://sigma-algebra.wikiverse.org/

In mathematics, a σ-algebra (or σ-field) X over a set S is a family of subsets of S which is closed under countable set operations; σ-algebras are mainly used in order to define measures on S. The concept is important in mathematical analysis and probability theory.
Formally, X is a σ-algebra if and only if it has the following properties:
1. The empty set is in X,
2. If a set A is in X, then so is its complement S \ A,
3. If A1, A2, A3, ... is a countable sequence of sets in X, then so is their union.
From 1 and 2 it follows that S is in X; from 2 and 3 it follows that the σ-algebra is also closed under countable intersections (via De Morgan's laws).
An ordered pair (S, X), where S is a set and X is a σ-algebra over S, is called a measurable space.

Examples
If S is any set, then the family consisting only of the empty set and S is a σ-algebra over S, the so-called trivial σ-algebra. Another σ-algebra over S is given by the full power set of S.
If {Xa} is a family of σ-algebras over S, then the intersection of all Xa is also a σ-algebra over S.
If U is an arbitrary family of subsets of S then we can form a special σ-algebra from U, called the σ-algebra generated by U. We denote it by σ(U) and define it as follows. First note that there is a σ-algebra over S that contains U, namely the power set of S. Let Φ be the family of all σ-algebras over S that contain U (that is, a σ-algebra X over S is in Φ if and only if U is a subset of X.) Then we define σ(U) to be the intersection of all σ-algebras in Φ. σ(U) is then the smallest σ-algebra over S that contains U.
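For a finite S (where countable unions reduce to finite ones), σ(U) can even be computed by brute force: keep closing the family under complements and unions until nothing new appears. The sketch below (Python; S and the generator U are made-up toy sets) is only an illustration of the definition, not an efficient algorithm.

def generated_sigma_algebra(S, U):
    """Smallest family containing U that is closed under complement and union."""
    S = frozenset(S)
    sigma = {frozenset(), S} | {frozenset(a) for a in U}
    changed = True
    while changed:
        changed = False
        for a in list(sigma):
            if S - a not in sigma:              # close under complementation
                sigma.add(S - a)
                changed = True
            for b in list(sigma):
                if a | b not in sigma:          # close under (finite) unions
                    sigma.add(a | b)
                    changed = True
    return sigma

S = {1, 2, 3, 4}
U = [{1}]                                       # the generating family
for a in sorted(generated_sigma_algebra(S, U), key=lambda s: (len(s), sorted(s))):
    print(set(a))                               # set(), {1}, {2, 3, 4}, {1, 2, 3, 4}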
This leads to the most important example: the Borel algebra over any topological space is the σ-algebra generated by the open sets (or, equivalently, by the closed sets). Note that this σ-algebra is not, in general, the whole power set. For a non-trivial example, see the Vitali set.
On the Euclidean space Rn, another σ-algebra is of importance: that of all Lebesgue measurable sets. This σ-algebra contains more sets than the Borel algebra on Rn and is preferred in integration theory.
See also measurable function.


Friday, September 17, 2004

Jobs as statistician

What jobs await me if I earn an MS in statistics? There is a large and increasing demand for workers with the MS degree in our field. "Statistician" is consistently ranked as one of the top five careers based on quality/interest of the job, pay and benefits, stress, etc. With an MS in statistics, you could work in business, industry, or government. Statisticians are needed in all areas of industry, e.g., for quality control, market research, and product development. Businesses in the high-tech, medical, and pharmaceutical sectors are some of the most prolific employers of statisticians, especially for product research and development. Potential government employers might include the FDA, CDC, NOAA, NMFS, USFS, NIST, the Census Bureau, and the National Center for Atmospheric Research, to name a few.

The ASA offers a description of statistics careers and an assessment of statistics as a career choice. The US Bureau of Labor Statistics Occupational Outlook Handbook also provides such information. One prominent place to look for job ads is the ASA magazine "Amstat News"; a portion of their published ads is shown on the web. Please also see this page of our department website for more information about careers in statistics.

Tuesday, September 14, 2004

Fundamental courses in Stat @ CSU

  • STCC201, GENERAL STATISTICS

- Fall 2004, Rick Gumina, http://www.stat.colostate.edu/~gumina/st201/class1/201info.html, Elementary Statistics, 9th ed., by Mario F. Triola

  • STCC301 - INTRODUCTION TO STATISTICAL METHODS,

- Spring 2003, Dave Bowden, http://www.stat.colostate.edu/~dbowden/st301/, Devore/Peck, Statistics: The Exploration and Analysis of Data, 4th edition;

- Spring 2003, Curt Storlie, http://www.stat.colostate.edu/~storlie/st301/, Devore/Peck, Statistics: The Exploration and Analysis of Data, 4th edition;

- Fall 2002, Shean-Tsong Chiu, http://www.stat.colostate.edu/~chiu/st301/, Devore/Peck, Statistics: The Exploration and Analysis of Data, 4th edition;

  • ST304 MULTIPLE REGRESSION ANALYSIS,

- Spring 2004, Dave Bowden, http://www.stat.colostate.edu/~dbowden/ST%20304/, Applied Regression Analysis and Other Multivariable Methods, 3rd edition, Kleinbaum, Kupper, Muller, and Nizam

  • ST 305 Sampling Techniques

- Fall 2004, Dave Bowden, http://www.stat.colostate.edu/~dbowden/ST%20305/,

  • ST 321: Elementary Probabilistic/Stochastic Modeling,

- Spring 2004 , Thomas Lee http://www.stat.colostate.edu/%7Etlee/ST321Spr04, Simulation, Third Edition, Sheldon M. Ross (2002), Academic Press; Introduction to Probability, Second Edition, Grinstead, Charles M. and J. Laurie Snell (1997). This book is published in bound form by the American Mathematical Society but is also available online as a pdf document.

  • ST 420: Probability and Mathematical Statistics I

- Fall 2004 , Thomas Lee http://www.stat.colostate.edu/%7Etlee/ST420Fall04, Intro to Mathematical Statistics and Its Applications, Third Edition, Larsen & Marx, Prentice Hall, 2001

  • ST 430: Probability and Mathematical Statistics II

- Spring 2004 , Thomas Lee http://www.stat.colostate.edu/%7Etlee/ST430Spr04, Intro to Mathematical Statistics and Its Applications, Third Edition, Larsen & Marx, Prentice Hall, 2001

  • ST 460 - APPLIED MULTIVARIATE ANALYSIS

-