Statistics is a broad field of mathematics that studies how data are collected, summarized, and used to draw inferences. It is applied across a wide range of disciplines, from physics and the social sciences to psychology. In essence, statistics is the science of extracting information from collected data. Data may be either quantitative (numerical) or qualitative (non-numerical features and attributes). Categorizing and summarizing data help extract more information from them. Statistics can therefore be defined as the set of practices and methods for categorizing, summarizing, analyzing, and drawing inferences from data.
Statistics deals mainly with situations in which the occurrence of an event cannot be predicted with certainty. Statistical inferences are therefore often uncertain, since they are based on incomplete information.
The main data collection tools include questionnaires, enumeration (censuses), and interviews. The collected data should then be categorized and summarized using frequency tables, cross-tabulations, graphs, and so on. Frequency charts (histograms), Pareto charts, box plots, and other types of charts can also be used to visualize the attributes of a statistical population. There are also indicators, or criteria, that help summarize and compare the attributes of statistical populations or samples, qualitatively or quantitatively.
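As a minimal illustration of this kind of categorization and summarization, the following Python sketch builds a frequency table and a few summary indicators for a small hypothetical data set (the defect counts below are invented for the example):

```python
from collections import Counter
import statistics

# Hypothetical sample data: number of defects recorded per inspected unit
defects = [0, 1, 0, 2, 1, 0, 3, 1, 0, 0, 2, 1]

# Frequency table: how many units showed each defect count
freq = Counter(defects)
for value in sorted(freq):
    print(f"{value}: {freq[value]}")

# Summary indicators that describe the sample quantitatively
print("mean   =", statistics.mean(defects))
print("median =", statistics.median(defects))
print("stdev  =", round(statistics.stdev(defects), 3))
```

The same frequency counts are what a histogram would display graphically, one bar per distinct value.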
Any research project consists of two closely linked phases: the experimental phase and the data analysis phase. In fact, the method selected for data analysis depends directly on the experimental design. The following points should be taken into account when designing a study:
· Replication: making multiple observations of the subjects. Replication enables researchers to obtain an estimate of the experimental error and yields a more precise estimate when the sample mean is used to estimate a factor.
· Randomization: assigning the subjects to the experimental groups at random.
· Blocking: a technique for increasing the precision and reducing the error of an experiment despite practical constraints on the random selection of subjects.
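The randomization and blocking principles above can be sketched in Python. The subjects, the blocking attribute (sex), and the group names below are all hypothetical; the point is that the random assignment happens separately inside each block, so every block contributes equally to every experimental group:

```python
import random

# Hypothetical subjects, each with a blocking attribute (here: sex)
subjects = [
    ("s1", "F"), ("s2", "F"), ("s3", "F"), ("s4", "F"),
    ("s5", "M"), ("s6", "M"), ("s7", "M"), ("s8", "M"),
]
treatments = ["control", "treatment"]

random.seed(42)  # fixed seed only so the example is reproducible

assignment = {}
# Within each block, shuffle the members and alternate treatments,
# so each block is split evenly across the experimental groups.
for block in ["F", "M"]:
    members = [s for s, b in subjects if b == block]
    random.shuffle(members)
    for i, subject in enumerate(members):
        assignment[subject] = treatments[i % len(treatments)]

for subject, group in sorted(assignment.items()):
    print(subject, "->", group)
```

Without the blocking loop, an unlucky shuffle could place most subjects of one sex in the same group and confound the comparison.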
The ultimate purpose of a statistical research project is to analyze random events and the effects of independent variables (predictors) on a response, or dependent, variable. There are two main approaches to such studies: empirical studies and observational studies. Both measure the effects of changes in one or more independent variables on the behavior of the dependent variables; the difference lies in how they are conducted. An empirical study involves measuring the studied system, intervening, and then taking additional measurements to evaluate the effects of the changes. By contrast, an observational study involves no experimental intervention; it consists of collecting data and then investigating the relationships between predictors and responses. The main objective of statistics is to extract the best information from the available data and then to derive knowledge from that information.
Statistical Population and Sample
A statistical population comprises all observations or experiments made under the same conditions. Each of its elements can be examined with respect to different characteristics. A statistical sample should be selected at random so that it represents the population; in random selection, every member of the population has the same chance of being included in the sample.
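A simple random sample of the kind described above can be drawn, for instance, with Python's standard library (the population labels here are hypothetical):

```python
import random

# A hypothetical population of 1000 labelled members
population = [f"member_{i}" for i in range(1000)]

random.seed(7)  # fixed seed only for a reproducible illustration

# random.sample draws without replacement, giving every member the
# same chance of selection -- the defining property of a simple
# random sample.
sample = random.sample(population, k=50)

print(len(sample))       # 50
print(len(set(sample)))  # 50 -- no member drawn twice
```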
Some statistical analyses are performed on a sample rather than on the whole population. This, however, causes a partial loss of information about the population and introduces a level of error into the estimates. The sample size should therefore be chosen so as to reduce the error and increase the validity of the results. Since increasing the sample size also increases the cost and time of the analysis, the smallest sample size that keeps the error at an acceptable level should be selected.
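This trade-off can be made concrete: the standard error of a sample mean shrinks in proportion to 1/sqrt(n), so each quadrupling of the sample size only halves the error. A short sketch, assuming a hypothetical population standard deviation of 10:

```python
import math

# Standard error of the mean, sigma / sqrt(n), for a hypothetical
# population standard deviation of 10: precision improves with n,
# but with clearly diminishing returns.
sigma = 10.0
for n in (25, 100, 400, 1600):
    se = sigma / math.sqrt(n)
    print(f"n = {n:5d}  standard error = {se:.2f}")
```

The loop prints standard errors of 2.00, 1.00, 0.50, and 0.25: going from 400 to 1600 observations costs four times as much data for the same absolute gain that going from 25 to 100 achieved.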
Statistical analysis requires an experimental plan covering the data collection tools, the sample size, and the problem-solving method. The more accurate the experimental plan, the more reliable the results of the statistical analysis. It should also be ensured that none of the measurements relevant to the results is missing or incomplete. At the same time, it is worth considering whether a small fraction of the expense would already produce sufficient data, so that an unnecessarily costly series of experiments can be avoided.
Given the variety of methods and the complexity of the calculations involved in statistical analyses, the use of computer software is inevitable, and various software suites have been developed to perform such analyses. It is therefore necessary to select the software that best meets our needs. Some statistical packages are so popular that they have become almost synonymous with statistics itself, such as the Statistical Package for the Social Sciences (SPSS). Some features of SPSS are as follows:
· Preparation of statistical overviews such as graphs, tables, statistics, etc.
· A variety of mathematical functions such as absolute value, sign function, logarithm, trigonometric functions, etc.
· Preparation of customized tables such as frequency tables, cumulative frequency, frequency percentage, etc.
· A variety of statistical distributions including discrete and continuous distributions
· Preparation of a variety of statistical designs
· One-way, two-way, and multivariate analysis of variance and covariance
· Time series analysis techniques
· Generation of random data from discrete and continuous distributions
· Calculation of a variety of descriptive statistics
· A variety of tests for comparing the mean values of two or more independent or dependent populations
· Exchange of information with other software suites
· Fitting different types of regression models
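Several of the listed capabilities, such as descriptive statistics and regression fitting, can also be illustrated outside SPSS. The following Python sketch computes summary statistics and an ordinary least-squares line for a small hypothetical data set (the dose/response values are invented for the example):

```python
import statistics

# Hypothetical paired observations (e.g. dose vs. response)
x = [1, 2, 3, 4, 5]
y = [2.1, 4.0, 6.2, 7.9, 10.1]

# Descriptive statistics, as a summary table would report them
print("mean(y)  =", statistics.mean(y))
print("stdev(y) =", round(statistics.stdev(y), 3))

# Ordinary least-squares fit of y = a + b*x (simple linear regression)
mx, my = statistics.mean(x), statistics.mean(y)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx
print(f"y =~ {a:.3f} + {b:.3f} * x")
```

For this data the fitted slope comes out near 1.99 and the intercept near 0.09, i.e. the points lie almost exactly on a line of slope 2.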
When data show unusual patterns, have a small volume, or contain many replicates, conventional statistical methods are not sufficient for their analysis, and SPSS clearly cannot perform every type of test. Advanced statistical software suites such as R, SAS, Minitab, and Stata are among the most popular and complete tools for this purpose. Some advantages of these suites are as follows:
· A programming language and software environment suitable for statistical computing and data science
· A simple and advanced programming language consisting of conditional expressions, loops, recursive functions, etc.
· Powerful graphical design for data analysis, graphing, and drawing shapes
· Containing a wide range of statistical techniques
· Powerful software suites for statistical analysis
· The ability to perform matrix calculations
· Dedicated libraries for data mining and machine learning tasks such as classification, clustering, social network analysis, reinforcement learning, etc.
· Specific libraries for analytical operations in various scientific fields
· Well-organized documentation for the associated languages and libraries
· The possibility to expand their capabilities by adding the packages developed by expert users
· A command-line interface (CLI) for entering and executing commands
· Data storage, retrieval, and manipulation capabilities
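As one example of the matrix-calculation capability listed above, here is a minimal pure-Python sketch of matrix multiplication; environments such as R perform this natively, so this is only an illustration of the underlying operation:

```python
# Matrix multiplication with plain nested lists: C[i][j] is the dot
# product of row i of A with column j of B.
def matmul(A, B):
    rows, inner, cols = len(A), len(B), len(B[0])
    assert all(len(row) == inner for row in A), "shape mismatch"
    return [
        [sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]

A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]
```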
Statisticians of Atieh Laboratory are ready to provide services for researchers in areas such as experimental design and data analysis.