Statistics is the branch of mathematics that deals with:
↪ Collection of data
↪ Organisation & Presentation of data
↪ Analysis & Interpretation of data
Origin
The word ‘statistics’ appears to have been derived from the Latin word ‘status’ meaning ‘a (political) state’. In its origin, statistics was simply the collection of data on different aspects of the life of people (population, properties, taxes etc.) useful to the State.
Over time, however, its scope broadened, and statistics began to concern itself not only with the collection and presentation of data but also with interpreting the data and drawing inferences from it. Today, its influence has spread to various areas, such as agriculture, business, economics, medicine, biology, education, sociology, psychology, political science, and many other branches of science and technology.
Meanings of the word ‘Statistics’ in different contexts
↪ Numerical data, when used in a plural sense.
e.g., May I have the latest copy of ‘Educational Statistics of India’? Such statistics may include the number of educational institutions in India, the literacy rates of various states, etc.
↪ The Subject (which deals with the collection, presentation, analysis & interpretation of data) when used as a singular noun.
e.g., I like to study ‘Statistics’ because it is used in day-to-day life.
Collection of Data
Data
The facts or figures, which are numerical or otherwise, collected with a definite purpose are called data. Data is the plural form of the Latin word datum.
Data are collected by individual research workers or by organizations through sample surveys or experiments, keeping in view the objectives of the study.
The data collected can be: (i) Primary Data (ii) Secondary Data
Primary Data
When the information is collected by the investigator (an organization, person, authority, agency, party, etc.) herself or himself through experiments, surveys, questionnaires, focus groups, interviews, etc., with a definite objective in mind, the data obtained is called primary data.
Secondary Data
When the information is gathered from a source (publications, journals, newspapers, the internet, etc.) that already has the information stored, the data obtained is called secondary data. Such data, which has been collected by someone else in another context, needs to be used with great care, after ensuring that the source is reliable.
Presentation of Data
Raw Data
Unorganised data that has not been processed for meaningful use is called raw data. Raw data is often large and difficult to interpret.
e.g., the marks obtained by 10 students in a mathematics test are given below:
55 36 95 73 60 42 25 78 75 62
The data in this form is called raw data.
Organised Data
Systematically classified and arranged data which can be analysed and interpreted is called organised data.
Let us arrange the marks in the previous example in ascending order:
25 36 42 55 60 62 73 75 78 95
Now the data is easier to analyse. The maximum marks, the minimum marks, the range of marks, etc., can be found merely by looking at it.
Range
The difference between the highest and the lowest values in the data is called the range of the data.
The range in the above example is 95 – 25 = 70
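The ordering and the range above can be checked with a short Python sketch. The variable names (`marks`, `ordered`, `data_range`) are chosen here for illustration only.

```python
# Marks of the 10 students from the raw-data example.
marks = [55, 36, 95, 73, 60, 42, 25, 78, 75, 62]

# Organise the raw data by arranging it in ascending order.
ordered = sorted(marks)

# In sorted data, the first value is the minimum and the last is the maximum.
lowest, highest = ordered[0], ordered[-1]

# Range = highest value - lowest value.
data_range = highest - lowest

print("Ordered marks:", ordered)
print("Minimum:", lowest, "Maximum:", highest, "Range:", data_range)
```

Sorting is not strictly needed to find the range (`max(marks) - min(marks)` gives the same result), but it mirrors the way the organised data is read off by eye.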
Frequency
The number of times a particular value occurs in the given data is called its frequency.
e.g., In a Mathematics test, the following marks out of 10 were obtained by 40 students:
8 1 3 7 6 5 5 4 4 2
4 9 5 3 7 1 6 5 2 7
7 3 8 4 2 8 9 5 8 6
7 4 5 6 9 6 4 4 6 6
The frequency of 1 mark is 2, the frequency of 2 marks is 3, the frequency of 3 marks is 3, and so on.
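As a quick check, the frequencies can be counted in Python. This is a minimal sketch that uses `collections.Counter` from the standard library; the name `frequency` is only illustrative.

```python
from collections import Counter

# Marks (out of 10) of the 40 students, row by row as listed above.
marks = [
    8, 1, 3, 7, 6, 5, 5, 4, 4, 2,
    4, 9, 5, 3, 7, 1, 6, 5, 2, 7,
    7, 3, 8, 4, 2, 8, 9, 5, 8, 6,
    7, 4, 5, 6, 9, 6, 4, 4, 6, 6,
]

# Counter maps each distinct mark to the number of times it occurs.
frequency = Counter(marks)

for mark in sorted(frequency):
    print(f"Marks {mark}: frequency {frequency[mark]}")
```

The frequencies always add up to the number of observations, which is 40 here.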
Frequency Distribution Table
Presenting data in ascending or descending order can be quite time-consuming, particularly when the number of observations in an experiment is large. To make the data easier to understand, we write it in a table.
When we arrange the values and their frequencies in the form of a table, it is called a frequency distribution table.
It is similar to a tally chart, where tally marks are used instead of numbers to show the frequencies.
We can use both tally marks and numbers to show the frequencies in a frequency distribution table.
Frequency distribution table for the earlier example –
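Marks Obtained | Tally Marks | Frequency |
1              | ||          | 2         |
2              | |||         | 3         |
3              | |||         | 3         |
4              | ||||/ ||    | 7         |
5              | ||||/ |     | 6         |
6              | ||||/ ||    | 7         |
7              | ||||/       | 5         |
8              | ||||        | 4         |
9              | |||         | 3         |
(A bundle of five tally marks, four strokes crossed by a fifth, is written here as ||||/.)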
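The tally column can be generated from the frequencies. The sketch below starts from the frequencies in the table (2, 3, 3, 7, 6, 7, 5, 4, 3 for marks 1 to 9). The helper `tally` is a hypothetical name, and writing a bundle of five as `||||/` is an assumption made for plain text, where a crossed stroke cannot be drawn.

```python
# Frequencies of the marks 1 to 9, taken from the table above.
frequencies = {1: 2, 2: 3, 3: 3, 4: 7, 5: 6, 6: 7, 7: 5, 8: 4, 9: 3}


def tally(count):
    """Return tally marks for count, bundling every five marks as '||||/'."""
    bundles, singles = divmod(count, 5)
    parts = ["||||/"] * bundles
    if singles:
        parts.append("|" * singles)
    return " ".join(parts)


# Each row holds (marks obtained, tally marks, frequency).
rows = [(mark, tally(freq), freq) for mark, freq in sorted(frequencies.items())]

print(f"{'Marks':<8}{'Tally':<12}{'Frequency'}")
for mark, marks_tally, freq in rows:
    print(f"{mark:<8}{marks_tally:<12}{freq}")
```

Bundling in fives makes a long tally quick to read: count the bundles, multiply by 5, and add the leftover strokes.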
Grouping of Data
When the data has a large number of observations, a frequency distribution table that lists each observation separately would be too long. For convenience, we make groups of observations (say, 0-10, 10-20, and so on) and obtain a frequency distribution of the number of observations falling in each group.
Data presented in this manner is said to be grouped, and the distribution obtained is called a grouped frequency distribution.
Grouped Frequency Distribution Table
A table that presents a grouped frequency distribution is called a grouped frequency distribution table.
e.g., the marks (out of 50) obtained in Mathematics by 50 students can be grouped into class intervals (0-10, 10-20, and so on) to make a grouped frequency distribution table. The marks are:
21, 10, 30, 22, 05, 37, 12, 25, 15, 39, 26, 32, 18, 27, 28, 08, 29, 35, 31, 24, 18, 20, 38, 22, 16, 24, 10, 27, 28, 49, 29, 32, 23, 31, 21, 34, 23, 36, 36, 47, 48, 39, 20, 07, 16, 36, 47, 30, 22, 17.
Group/ Class/ Class Interval
Each of the groups (0-10, 10-20, 20-30, etc.) in a grouped frequency distribution table is called a class interval (or, briefly, a class).
Upper Class Limit & Lower Class Limit
The upper limit of a class interval is called the upper class limit, and the lower limit of a class interval is called the lower class limit.
e.g., In the class 0-10, 0 is the lower class limit and 10 is the upper class limit.
✶ By convention, an observation that equals a limit shared by two classes belongs to the higher class, i.e., 10 belongs to the class interval 10-20 (and not to 0-10).
Width/Size
The difference between the upper class limit and the lower class limit is called the width or size of the class interval. In the above example, it is 10 – 0 = 10.
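The grouping of the 50 marks can be sketched in Python as follows. The sketch assumes classes of width 10 starting at 0 (0-10, 10-20, and so on) and applies the convention that a value equal to a shared limit goes to the higher class. `CLASS_WIDTH` and `lower_class_limit` are illustrative names, and marks such as 05 are written as 5 because Python does not allow leading zeros in integers.

```python
from collections import Counter

# Marks (out of 50) of the 50 students, as listed in the example.
marks = [
    21, 10, 30, 22, 5, 37, 12, 25, 15, 39,
    26, 32, 18, 27, 28, 8, 29, 35, 31, 24,
    18, 20, 38, 22, 16, 24, 10, 27, 28, 49,
    29, 32, 23, 31, 21, 34, 23, 36, 36, 47,
    48, 39, 20, 7, 16, 36, 47, 30, 22, 17,
]

CLASS_WIDTH = 10  # assumed width: upper class limit - lower class limit


def lower_class_limit(mark, width=CLASS_WIDTH):
    """Return the lower limit of the class that contains mark.

    Integer division sends a shared limit such as 10 to the higher
    class (10-20), matching the convention stated above.
    """
    return (mark // width) * width


# Count how many marks fall in each class, keyed by lower class limit.
grouped = Counter(lower_class_limit(mark) for mark in marks)

print("Class interval  Frequency")
for lower in sorted(grouped):
    upper = lower + CLASS_WIDTH
    print(f"{lower}-{upper}".ljust(16) + str(grouped[lower]))
```

Under these assumptions the class frequencies come out as 3, 9, 19, 15 and 4 for the classes 0-10 through 40-50, which add up to the 50 observations.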