Tuesday, January 26, 2010

The Scatter Diagram

As a first step in regression analysis, we use the scatter diagram to study the relationship between two variables.

Example:-




In the above example, the points are plotted by assigning values of the independent variable X to the horizontal axis and values of the dependent variable Y to the vertical axis.
The pattern made by the points plotted on the scatter diagram usually suggests the basic nature and strength of the relationship between the two variables. The scatter diagram also shows that subjects with large waist circumferences also have larger amounts of deep abdominal AT. These impressions suggest that the relationship between the two variables may be described by a straight line crossing the Y-axis below the origin and making approximately a 45-degree angle with the X-axis.

Monday, January 25, 2010

The Sample Regression Equation

In simple linear regression the object of the researcher's interest is the population regression, that is, the regression that describes the true relationship between the dependent variable Y and the independent variable X.

In an effort to reach a decision regarding the likely form of this relationship, the researcher draws a sample from the population of interest and, using the resulting data, computes a sample regression equation that forms the basis for reaching conclusions regarding the unknown population regression equation.


Steps in Regression Analysis
In the absence of extensive information regarding the nature of the variables of interest, a frequently employed strategy is to assume initially that they are linearly related. Subsequent analysis then involves the following steps:
1- Determine whether or not the assumptions underlying a linear relationship are met in the data available for analysis.
2- Obtain the equation for the line that best fits the sample data.
3- Evaluate the equation to obtain some idea of the strength of relationship and the usefulness of the equation for predicting and estimating.
4- If the data appear to conform satisfactorily to the linear model, use the equation obtained from the sample data to predict and to estimate.
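The steps above can be sketched in code. The following is a minimal illustration (not the textbook's own computation) of step 2, obtaining the least-squares line from sample data, and of step 4, using it to predict; the function name and the toy data are made up for illustration:

```python
def least_squares_line(x, y):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx             # slope
    b0 = mean_y - b1 * mean_x  # intercept
    return b0, b1

# Hypothetical sample data; the fitted line is then used for prediction.
b0, b1 = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])
predicted = b0 + b1 * 5  # estimate Y at X = 5
```

Step 3, evaluating the strength of the relationship, is usually carried out with the coefficient of determination or a test on the slope.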



Sunday, January 24, 2010

The Regression Model

In the typical regression problem, as in most problems in applied statistics, researchers have available for analysis a sample of observations from some real or hypothetical population. Based on the results of their analysis of the sample data, they are interested in reaching decisions about the population from which the sample is presumed to have been drawn. It is important, therefore, that the researchers understand the nature of the population in which they are interested. They should know enough about the population to be able either to construct a mathematical model for its representation or to determine if it reasonably fits some established model. A researcher about to analyze a set of data by the methods of simple linear regression, for example, should be secure in the knowledge that the simple linear regression model is, at least, an approximate representation of the population. It is unlikely that the model will be a perfect portrait of the real situation, since this characteristic is seldom found in models of practical value. A model constructed so that it corresponds precisely with the details of the situation is usually too complicated to yield any information of value. On the other hand, the results obtained from the analysis of data that have been forced into a model that does not fit are also worthless. Fortunately, however, a perfectly fitting model is not a requirement for obtaining useful results. Researchers, then, should be able to distinguish between the occasion when their chosen model and the data are sufficiently compatible for them to proceed and the case where their chosen model must be abandoned.
Assumptions of Regression Model
The simple linear regression model rests on the following assumptions:
1- Values of the independent variable X are fixed, or measured without error.
2- For each value of X there is a subpopulation of Y values, and each of these subpopulations is normally distributed.
3- The variances of the subpopulations of Y are all equal.
4- The means of the subpopulations of Y all lie on the same straight line (the assumption of linearity).
5- The Y values are statistically independent.

Friday, January 22, 2010

Simple Linear Regression and Correlation

In analyzing data for the health sciences disciplines, we find that it is frequently desirable to learn something about the relationship between two variables. We may, for example, be interested in studying the relationship between blood pressure and age, height and weight, the concentration of an injected drug and heart rate, the consumption level of some nutrient and weight gain, the intensity of a stimulus and reaction time, or total family income and medical care expenditures. The nature and strength of the relationship between variables such as these may be examined by regression and correlation analysis, two statistical techniques that, although related, serve different purposes.

Regression
Regression analysis is helpful in ascertaining the probable form of the relationship between variables, and the ultimate objective when this method of analysis is employed usually is to predict or estimate the value of one variable corresponding to a given value of another variable. The ideas of regression were first elucidated by the English scientist Sir Francis Galton in reports of his research on heredity (first in sweet peas and later in human stature). He described a tendency of adult offspring, having either short or tall parents, to revert back toward the average height of the general population. He first used the word reversion, and later regression, to refer to this phenomenon.

Correlation
Correlation, on the other hand, is concerned with measuring the strength of the relationship between variables. When we compute measures of correlation from a set of data, we are interested in the degree of the correlation between variables. The concept and the terminology of correlation analysis originated with Galton, who first used the word correlation.
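As a rough sketch of how such a measure is computed, the following computes Pearson's correlation coefficient r from a set of paired observations; the function name is made up and the data used below are invented for illustration:

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient: strength of the linear relationship."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

The value of r ranges from -1 (perfect negative linear relationship) through 0 (no linear relationship) to +1 (perfect positive linear relationship).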


Thursday, January 21, 2010

Factorial Experiment

In the experimental designs that we have considered up to this point we have been interested in the effects of only one variable, the treatments. Frequently, however, we may be interested in studying, simultaneously, the effects of two or more variables. We refer to the variables in which we are interested as factors. An experiment in which two or more factors are investigated simultaneously is called a factorial experiment.
The different designated categories of the factors are called levels. Suppose, for example, that we are studying the effect on reaction time of three dosages of some drug; we then have three levels of the dosage factor. Suppose a second factor of interest in the study is age, and it is thought that two age groups, under 65 years and 65 years and over, should be included. We then have two levels of the age factor. In general, we say that factor A occurs at a levels and factor B occurs at b levels. In a factorial experiment we may study not only the effects of individual factors but also, if the experiment is properly conducted, the interaction between factors.
Advantages
The following are the advantages of the factorial experiment:
1- The interaction of the factors may be studied.
2- There is a saving of time and effort.
3- Since the various factors are combined in one experiment, the results have a wider range of application.
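For the simplest case of two factors at two levels each, the interaction can be sketched as a contrast of cell means. This minimal illustration (the function name and the cell means are invented) shows what "interaction" measures: whether the effect of factor B differs across the levels of factor A.

```python
def interaction_contrast(means):
    """Interaction contrast in a 2 x 2 factorial experiment.

    means[i][j] is the mean response at level i of factor A and
    level j of factor B. A value of 0 means the factor effects are
    additive; a nonzero value indicates interaction."""
    (m11, m12), (m21, m22) = means
    return (m11 - m12) - (m21 - m22)

no_interaction = interaction_contrast([[10, 12], [15, 17]])  # effects add: 0
with_interaction = interaction_contrast([[10, 12], [15, 20]])  # nonzero
```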

Wednesday, January 20, 2010

Repeated Measures Design

One of the most frequently used experimental designs in the health sciences field is the repeated measures design.
Definition
                 ''A repeated measures design is one in which measurements of the same variable are made on each subject on two or more different occasions.''
The usual motivation for using a repeated measures design is a desire to control for variability among subjects. In such a design each subject serves as its own control. When measurements are taken on only two occasions we have the paired comparisons design that we discussed earlier. One of the most frequently encountered situations in which the repeated measures design is used is the situation in which the investigator is concerned with responses over time.
Advantage
The major advantage of the repeated measures design is, as previously mentioned, its ability to control for extraneous variation among subjects. An additional advantage is the fact that fewer subjects are needed for the repeated measures design than for a design in which different subjects are used for each occasion on which measurements are made. Suppose, for example, that we have four treatments (in the usual sense) or four points in time, on each of which we would like to have 10 measurements. If a different sample of subjects is used for each of the four treatments or points in time, 40 subjects would be required. If we are able to take measurements on the same subject for each treatment or point in time, that is, if we can use a repeated measures design, only 10 subjects would be required. This can be a very attractive advantage if subjects are scarce or expensive to recruit.
Disadvantage
A major potential problem to be on the alert for is what is known as the carry-over effect. When two or more treatments are being evaluated, the investigator should make sure that a subject's response to one treatment does not reflect a residual effect from previous treatments. This problem can frequently be solved by allowing a sufficient length of time between treatments. Another possible problem is the position effect. A subject's response to a treatment experienced last in a sequence may be different from the response that would have occurred if the treatment had been first in the sequence. In certain studies, such as those involving physical participation on the part of the subjects, enthusiasm that is high at the beginning of the study may give way to boredom toward the end. A way around this problem is to randomize the sequence of treatments independently for each subject.

Monday, January 18, 2010

Tukey's Test for Unequal Sample Sizes

When the samples are not all of the same size, Tukey's HSD test as given below is not applicable. Spjøtvoll and Stoline have extended the Tukey procedure to the case where the sample sizes are different. Their procedure, which is applicable for experiments involving three or more treatments and significance levels of 0.05 or less, consists of replacing n in the equation

HSD = q * sqrt(MSE / n)

by n*, the smaller of the two sample sizes associated with the two sample means that are to be compared. If we designate the new quantity by HSD*, we have as the new test criterion

HSD* = q * sqrt(MSE / n*)
Tukey's HSD test

A multiple comparison procedure developed by Tukey is frequently used for testing the null hypothesis that all possible pairs of treatment means are equal when the samples are all of the same size. When this test is employed we select an overall significance level of alpha. The probability is alpha, then, that one or more of the true null hypotheses will be rejected.
Tukey's test, which is usually referred to as the HSD (honestly significant difference) test, makes use of a single value against which all differences are compared. This value, called the HSD, is given by

HSD = q * sqrt(MSE / n)

where alpha is the chosen level of significance, k is the number of means in the experiment, N is the total number of observations in the experiment, n is the number of observations in a treatment, MSE is the error (within) mean square from the ANOVA table, and q is obtained by entering Appendix Table H with alpha, k, and N - k.
All possible differences between pairs of means are computed, and any difference that yields an absolute value that exceeds HSD is declared to be significant.
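A minimal sketch of this procedure, assuming the value of q has already been read from the studentized-range table; the function name, the means, n, MSE, and q below are all hypothetical:

```python
import math
from itertools import combinations

def tukey_hsd(means, n, mse, q):
    """Flag pairs of treatment means whose absolute difference exceeds HSD.

    q is the tabled studentized-range value for alpha, k, and N - k
    (looked up separately; it is not computed here)."""
    hsd = q * math.sqrt(mse / n)
    flags = {(i, j): abs(mi - mj) > hsd
             for (i, mi), (j, mj) in combinations(enumerate(means), 2)}
    return hsd, flags

# Three treatment means, 5 observations each, MSE = 5.0, q assumed = 3.95.
hsd, flags = tukey_hsd([10.0, 12.0, 20.0], n=5, mse=5.0, q=3.95)
```

Each pair flagged True is declared significantly different at the chosen overall alpha.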



Sunday, January 17, 2010

ANOVA EXAMPLE

Miller and Vanhoutte conducted experiments in which adult ovariectomized female mongrel dogs were treated with estrogen, progesterone, or estrogen plus progesterone. Five untreated animals served as controls. A variable of interest was the concentration of progesterone in the serum of the animals 14 to 21 days after treatment. We wish to know whether the treatments have different effects on the mean serum concentration of progesterone.
Solution:-




Friday, January 15, 2010

One-Way ANOVA

The simplest type of analysis of variance is that known as one-way ANOVA, in which only one source of variation, or factor, is investigated. It is an extension, to three or more samples, of the t-test procedure for use with two independent samples. We can say that the t test for use with two independent samples is a special case of one-way ANOVA.
In a typical situation we want to use one-way ANOVA to test the null hypothesis that three or more treatments are equally effective. The necessary experiment is designed in such a way that the treatments of interest are assigned completely at random to the subjects or objects on which the measurements to determine treatment effectiveness are to be made. For this reason the design is called the completely randomized experimental design.
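A bare-bones sketch of the one-way ANOVA computation (the function name and the group data below are invented for illustration):

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of independent samples."""
    k = len(groups)
    N = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / N
    # Between-groups (treatment) sum of squares.
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-groups (error) sum of squares.
    ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    msb = ssb / (k - 1)  # mean square between
    msw = ssw / (N - k)  # mean square within
    return msb / msw, k - 1, N - k

f_stat, df1, df2 = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

The value f_stat is then compared against the F distribution with df1 and df2 degrees of freedom to reach a decision about the null hypothesis.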


Wednesday, January 13, 2010

ANOVA Procedure

Following are the steps of the Analysis of Variance procedure.
1- Description of the data
 In addition to describing the data in the usual way, we display the sample data in tabular form.
2- Assumptions
 Along with the assumptions underlying the analysis, we present the model for each design we discuss. The model consists of a symbolic representation of a typical value from the data being analyzed.
3- Hypothesis
4- Test Statistic
5- Distribution of Test Statistic
6- Decision Rule
7- Calculation of test statistic
 The results of the arithmetic calculations will be summarized in a table called the analysis of variance (ANOVA) table. The entries in the table make it easy to evaluate the results of the analysis.
8- Statistical Decision
9- Conclusion

Use of Computer
The calculations required by analysis of variance are lengthy and complicated, so the computer assumes an important role in analysis of variance. All the exercises appearing in this section are suitable for computer analysis and may be used with statistical packages. The output of the statistical packages may vary slightly. The basic concepts of analysis of variance presented here should provide the necessary background for understanding the descriptions of the programs and their output in any of the statistical packages.

Assumptions of ANOVA

Underlying the valid use of analysis of variance as a tool of statistical inference is a set of fundamental assumptions. We refer the reader to the paper of Eisenhart. Although an experimenter must not expect to find all the assumptions met to perfection, it is important that the user of analysis of variance techniques be aware of the underlying assumptions and be able to recognize when they are substantially unsatisfied. The consequences of failure to meet the assumptions are discussed by Cochran in a companion paper to that of Eisenhart. Because experiments in which all the assumptions are perfectly met are rare, Cochran suggests that ANOVA results be considered as approximate rather than exact. These assumptions are pointed out at appropriate points in the following sections.
We discuss ANOVA as it is used to analyze the results of two different experimental designs, the completely randomized and the randomized complete block design. In addition to these, the concept of a factorial experiment is given through its use in a completely randomized design. These designs do not exhaust the possibilities.

Monday, January 11, 2010

Analysis of Variance

Analysis of variance is defined as a technique whereby the total variation present in a set of data is partitioned into two or more components. Associated with each of these components is a specific source of variation, so that in the analysis it is possible to ascertain the magnitude of the contributions of each of these sources to the total variation.
Applications of analysis of variance
Analysis of variance finds wide application in the analysis of data derived from experiments. The principles of the design of experiments are well covered in several books, including those of Cochran and Cox, Cox, Davies, Federer, Finney, Fisher, John, Kempthorne, Li, and Mendenhall.
Analysis of variance is used for two different purposes: 1- to estimate and test hypotheses about population variances, and 2- to estimate and test hypotheses about population means.
The following example illustrates the basic ideas involved in the application of analysis of variance.
Example:-




Sunday, January 10, 2010

Addition Rule of Probability

The third property of probability states that the probability of the occurrence of either one or the other of two mutually exclusive events is equal to the sum of their individual probabilities.
For Example
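A minimal sketch with a single fair die; the two events here are chosen only to illustrate the rule:

```python
from fractions import Fraction

# "Roll an even number" and "roll a 1" cannot happen together, so they
# are mutually exclusive and their probabilities simply add.
p_even = Fraction(3, 6)    # P(2, 4, or 6)
p_one = Fraction(1, 6)     # P(1)
p_either = p_even + p_one  # addition rule for mutually exclusive events
```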



Friday, January 8, 2010

Multiplication Rule

A probability may be calculated from other probabilities. For example, a joint probability may be calculated as the product of an appropriate marginal probability and a conditional probability. This relationship is called the multiplication rule of probability.
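A minimal sketch of the rule using a hypothetical 2 x 2 frequency table (the labels and counts below are invented for illustration):

```python
# Counts of 100 hypothetical subjects: exposed (E) or not (NE),
# diseased (D) or not (ND).
table = {("E", "D"): 20, ("E", "ND"): 30, ("NE", "D"): 10, ("NE", "ND"): 40}
total = sum(table.values())

# Marginal probability of exposure, P(E).
p_e = (table[("E", "D")] + table[("E", "ND")]) / total
# Conditional probability of disease given exposure, P(D|E).
p_d_given_e = table[("E", "D")] / (table[("E", "D")] + table[("E", "ND")])

# Multiplication rule: P(E and D) = P(E) * P(D|E).
p_joint = p_e * p_d_given_e
```

The result agrees with the joint probability computed directly from the table, 20/100.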







Thursday, January 7, 2010

Joint Probability

Sometimes we wish to find the probability that a subject picked at random from a group of subjects possesses two characteristics at the same time. Such a probability is called a joint probability.
Let us use the following table.

Wednesday, January 6, 2010

Conditional Probability

The set of outcomes of interest may constitute only a subset of the complete group; that is, the size of the group of interest may be reduced by conditions that do not apply to the whole group. When probabilities are computed with such a subset of the total group as the denominator, the result is a conditional probability.
Example:-


Calculation of Probability of an Event

Now we use the probability techniques to calculate the probabilities of some events.
Example:-


Properties of Probability

The Russian mathematician A. N. Kolmogorov formalized the axiomatic approach to probability in 1933.
This approach rests on the following three main properties:
1- The probability of any event is nonnegative: P(E) >= 0.
2- The probability of the sure event (the entire sample space) is 1.
3- If two events E1 and E2 are mutually exclusive, then P(E1 or E2) = P(E1) + P(E2).

Subjective Probability

In the early 1950s, L. J. Savage gave considerable impetus to the "personalistic" concept of probability; this is the subjective concept of probability.

Tuesday, January 5, 2010

Relative Frequency Probability

In the relative frequency approach, the probability of an event is estimated by repeating some process a large number of times and counting how often the event of interest occurs. If the process is repeated n times and the event E occurs m times, then

P(E) ≈ m/n

But keep in mind that m/n is only an estimate of P(E).
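A small simulation illustrating the idea; the "process" here is an artificial fair coin, so the true P(E) is 0.5:

```python
import random

random.seed(1)   # fixed seed so the run is reproducible
n = 100_000      # number of repetitions of the process
m = sum(1 for _ in range(n) if random.random() < 0.5)  # times the event occurs
estimate = m / n  # relative-frequency estimate of P(E)
```

With a large n the estimate m/n will be close to 0.5, but it remains an estimate, not P(E) itself.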

Classical Probability

The term classical probability was first used by two mathematicians:
i) Pascal
ii) Fermat
The theory developed out of attempts to solve problems related to games of chance, for example the rolling of dice.
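A short sketch in the classical spirit: with two fair dice, all 36 outcomes are equally likely, so a probability is simply the count of favorable outcomes over the total count (the event "sum is 7" is chosen only as an illustration):

```python
from fractions import Fraction

# Enumerate the equally likely outcomes of rolling two dice.
outcomes = [(a, b) for a in range(1, 7) for b in range(1, 7)]
favorable = [o for o in outcomes if sum(o) == 7]

# Classical probability: number of favorable outcomes over total outcomes.
p_seven = Fraction(len(favorable), len(outcomes))
```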

Definition of classical Probability




Monday, January 4, 2010

Measures of Central Tendency

Arithmetic Mean
The most frequently used measure of central tendency is the arithmetic mean. It is a descriptive measure and is often called the "average". We can simply call the arithmetic mean the "mean".



For Example: We want to get the mean age of a group of 5 people, aged 10, 25, 36, 42, and 56 years.
Solution:-
                        Mean age = (10 + 25 + 36 + 42 + 56) / 5 = 169 / 5 = 33.8 years

Properties of Mean
i)- Unique:- For any given data set there is only one arithmetic mean.
ii)- Simple:- The arithmetic mean is very simple and easily computed.
iii)- The arithmetic mean is affected by each and every value of the data set. Extreme values therefore have a great influence on the mean.
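The properties above can be seen in a couple of lines; the ages are those from the example, plus one invented extreme value to illustrate property iii:

```python
from statistics import mean

ages = [10, 25, 36, 42, 56]
avg = mean(ages)  # the one and only arithmetic mean of this data: 33.8

# Property iii: a single extreme value pulls the mean upward considerably.
avg_with_outlier = mean(ages + [99])
```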




Sunday, January 3, 2010

Basic Statistics

Descriptive statistics
Descriptive statistics is used to summarize the main characteristics of a set of data in quantitative terms. Descriptive statistics are distinguished from inferential statistics in that descriptive statistics aim to quantitatively summarize a data set, rather than to support inferential statements about the population that the data are considered to represent. Even when a data analysis draws its main conclusions using inferential statistical analysis, descriptive statistics are normally presented alongside the more formal analysis, to give an overall sense of the data being analyzed.



Common uses:- Examples of the application of descriptive statistics most often occur in health research studies. In a paper reporting on a study concerning a human population, there normally appears a table giving the overall sample size, the sample sizes in significant subgroups (for each treatment or exposure group), and demographic or clinical characteristics such as the mean age, the proportion of patients of each sex, and the proportion of patients with related diseases.

Saturday, January 2, 2010

Biostatistics





Biostatistics (a combination of the words biology and statistics; sometimes called biometry) is the application of statistics to a broad range of topics in biology. Biostatistics covers the design of biological experiments, especially in the medical field.
Biostatistics shares a number of methods with the following fields:

i) computer science
ii) mathematical demography
iii) econometrics
iv) psychometrics
v) operational research
vi) statistics