Hypothesis Test Calculator (Single Population)
What is Hypothesis Testing?
Hypothesis testing is a well-established methodology in analytics and a key tool in inferential statistics, used across domains such as the social sciences, medicine, and market research. In this article, you will learn the types of hypothesis tests and how to carry them out, either by hand or with our intuitive Hypothesis Testing Calculator.
In general, the purpose of the hypothesis test is to determine whether there is enough statistical evidence in favor of a certain idea, assumption, or the hypothesis itself.
It is an effective way to verify if a survey or experiment produced valid results. We do this by evaluating the odds that the results happened by chance. If there is a strong indication that a certain pattern has occurred at random, the experiment is not repeatable, and its usefulness is limited.
Typically, the process of hypothesis testing involves examining an assumption regarding a population by measuring and analyzing a random sample taken from said population.
In technical terms, a population is the entire group that we’re evaluating. For example, if you want to compare the satisfaction levels of your company’s male and female employees, your population is made up of the entire workforce—let’s say, 1,500 people.
A sample, on the other hand, is the specific group that data is collected from; its size is always smaller than the population’s. If you randomly carry out a survey with 189 men and 193 women among all employees, then those 382 people constitute your sample.
Hypothesis testing becomes particularly relevant when census data cannot be collected, for example, because the process would be too lengthy or too expensive. In such cases, researchers need to develop specific experiment designs and rely on survey samples to collect the necessary data.
In hypothesis testing, statistical measures such as p-values and test statistics are used to interpret the results and make decisions about the null hypothesis. These measures help quantify the strength of the evidence against the null hypothesis and provide insights into the statistical significance of the findings. Modern statistical software, like the Hypothesis Testing Calculator above, can calculate various relevant statistics, test values, and probabilities for us. Feel free to explore our range of Statistics Calculators to find out or consolidate the results of your analysis.
As a decision maker, you probably do not have to run the tests yourself but might nevertheless be presented with the results. All you need to do is learn how to interpret the most important ones. For that, there are a few basics you should master. These include the null hypothesis, the p-value, and the statistical significance.
The Process of Hypothesis Testing
Hypothesis testing is part of a rigorous approach to acquiring knowledge known as the scientific method - a procedure that has characterized natural science since the 17th century. It consists of systematically observing, measuring, experimenting with, formulating, testing, and modifying hypotheses.
Since then, we’ve come to realize that pure observation can be deceiving. Therefore, data increasingly drives business decisions, with more and more companies adopting a data-oriented approach. That, in essence, is the purpose of data science.
While we may no longer refer to it as the scientific method, it remains the underlying idea.
When testing a hypothesis, follow these six steps:
- State the hypothesis;
- Select an appropriate test for your hypothesis;
- Specify the significance level of your hypothesis test;
- State the decision rule of the hypothesis test;
- Calculate the test statistic;
- Make a decision based on the result.
The Process of Hypothesis Testing
![The process of hypothesis testing graph](https://365datascience.com/resources/assets/images/hypothesis-test-01.webp)
These hypothesis testing steps constitute a traditional approach to the statistical method. Note that the p-value approach is also a frequently used approach that we will demonstrate in this Hypothesis Testing Calculator.
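As a preview of what follows, the six steps might be sketched in Python as a simple z-test (every number below is a made-up example, not data from this article):

```python
from statistics import NormalDist

# Hypothetical inputs: a sample of returns with a known population std. dev.
x_bar, mu_0 = 0.06, 0.05   # sample mean and hypothesized mean (step 1: H0: mu = 5%)
sigma, n = 0.02, 36        # known population std. dev. and sample size (step 2: z-test)
alpha = 0.05               # step 3: significance level

# Step 4: decision rule for a two-tailed z-test
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # critical value near 1.96

# Step 5: test statistic
z = (x_bar - mu_0) / (sigma / n ** 0.5)

# Step 6: decision
decision = "reject H0" if abs(z) > z_crit else "fail to reject H0"
print(round(z, 2), decision)
```

Each of these steps is explained in detail in the sections below.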
Null Hypothesis vs Alternative Hypothesis
The first step in any statistical test is to define a hypothesis, which is a statement that helps communicate an understanding of the question or issue at stake.
Practically, a hypothesis resembles a theory in science. But it is “less” than that because it needs to go through extensive testing before it can be deemed a proper theory.
Still, stating and testing a hypothesis does not necessarily have to lead to the development of a scientific theory. We’ll demonstrate how you can use it with examples of hypothesis testing in real life. The approach is commonly used in marketing:
- If we change the color of this conversion button on the website from red to blue, we can increase the click-through rate (CTR) by 2%.
- When we decrease the price of product X by 10%, our sales will increase by 5% during the promotional campaign.
As you can see, hypotheses are formulated as statements, not questions.
A hypothesis is not a mere hunch. It should be regarded as a proposed explanation of a problem derived from limited evidence. For this reason, it constitutes a starting point for further analytical investigation.
In fact, it is common to propose two opposite, mutually exclusive hypothesis types so that only one can be proven correct during testing.
The first is called the null hypothesis, which represents a commonly accepted fact. It is usually expressed as a hypothesis of "no difference" in a set of given observations. For example: "A blue conversion button on the website results in the same click-through rate as a red button."
In statistical terms, the null hypothesis is usually stated as an equality between population parameters. For example: "The red and blue conversion buttons’ mean CTRs are the same," or "The difference between the red conversion button’s and the blue conversion button’s mean CTRs is equal to zero."
Technically, the null hypothesis is written as H0 (for example, H0: μ = μ0, where μ0 is the hypothesized value of the population mean μ).
Indeed, our objective as managers, doctors, or researchers is to challenge the status quo in order to improve our knowledge and decisions. Therefore, you want to reject the null hypothesis.
The alternative hypothesis, denoted as H1 (or Ha), is the statement that contradicts the null hypothesis. It represents the change or effect we are actually testing for, and it is what we conclude in favor of if we reject the null.
In the hypothesis testing examples we mentioned earlier, the alternative hypothesis could be formulated as follows:
A blue conversion button on the website will lead to a different CTR than a red button.
In this case, the objective is to determine whether the population parameter is generally distinct or it significantly diverges in either direction from the hypothesized value. It is called a two-sided (or non-directional) alternative hypothesis. Also known as a two-tailed test, this hypothesis does not tell you whether the difference is positive or negative.
But, sometimes, it can be useful to evaluate whether the population parameter differs from the hypothesized value in a specific direction. This is known as a one-sided (or directional) hypothesis test. Alternatively, we call it a one-tailed test. An example hypothesis would be: “The difference in the blue button’s and the red button’s mean CTRs is positive.” Here, we only care about the blue button yielding a higher CTR than the red button.
We can be even more aggressive in our statement and quantify that divergence, for example, “The difference between the red and the blue conversion buttons’ mean CTRs is higher than 2%.” This would be equivalent to stating that the blue button’s mean CTR is 2% higher than the red button’s mean CTR.
In reality, however, you do not have to specify the alternative hypothesis. Given that the two types of hypotheses are opposite and mutually exclusive, only one can, and will, be true. When conducting a statistical test, it is enough to reject the null hypothesis, so you should always aim to have a clearly stated one.
When someone presents you with a results report from a statistical test, it is always a good idea to ask about the null hypothesis, what it was, and why the researcher chose it.
One-Tailed vs Two-Tailed Hypothesis Tests
There are different ways to formulate hypotheses depending on the nature of the test. Let’s look at two hypothesis testing examples.
Suppose we want to test whether the population mean return is equal to 5%. So, our null hypothesis will be:

H0: μ = 5%

And the alternative hypothesis is:

H1: μ ≠ 5%
In this case, we are testing whether the population mean return is significantly different from 5%. That is, it could be either greater than or less than 5%.
If we find that the sample mean’s return is significantly different from 5%, we reject the null hypothesis. This is an example of a two-sided hypothesis test.
What if we wanted to test whether the mean is lower than or equal to 5%? The null hypothesis would be as follows:

H0: μ ≤ 5%

Whereas the alternative hypothesis is:

H1: μ > 5%
Here, we are testing whether the population mean return is greater than 5%.
If the sample mean return is significantly higher than 5%, we reject the null hypothesis. This is an example of a one-sided hypothesis test.
The main difference between one-sided and two-sided tests lies in the alternative hypothesis and how the results are interpreted.
In a one-tailed test, the alternative hypothesis focuses on a difference in one specific direction (strictly higher than, or strictly lower than, the hypothesized value). In the previous example, we were interested in testing whether the population mean (μ) is greater than 5%, so we used a one-tailed test.
In contrast, a two-tailed test considers the possibility of a difference in either direction from the null hypothesis value. In the first example, we tested whether the population mean return is equal to 5%. In other words, the alternative hypothesis states that the population mean could be either greater than or less than 5%. Essentially, we test for two statements. Hence the name, two-sided or two-tailed.
The next step in hypothesis testing is to find the test statistic.
What Is a Test Statistic?
The test statistic is a measure that allows us to determine what to do with the null hypothesis. The test statistic compares your data with what is expected under the null hypothesis.
The test statistic has the following general form:

Test statistic = (sample statistic − hypothesized value) / standard error of the sample statistic

The exact formula depends on whether the population variance (σ²) is known or unknown, as well as on the sample size.
When the population variance is known and the sample size is greater than or equal to 30, then we use the z-statistic:

z = (x̄ − μ0) / (σ / √n)

In this formula, we have:
- x̄ as the sample mean
- μ0 as the hypothesized mean
- σ / √n as the standard error
- σ as the population standard deviation
- n as the number of observations
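Translating the formula and its components directly into code (the sample and σ below are hypothetical):

```python
from statistics import mean

# Hypothetical yearly returns (%), with sigma assumed known from history
sample = [5.1, 4.9, 5.3, 5.2, 4.8, 5.0, 5.4, 4.7, 5.1, 5.0] * 3   # n = 30
sigma, mu_0 = 0.25, 5.0   # population std. dev. and hypothesized mean

x_bar = mean(sample)                  # sample mean
n = len(sample)                       # number of observations
standard_error = sigma / n ** 0.5     # sigma / sqrt(n)
z = (x_bar - mu_0) / standard_error   # the z-statistic

print(round(z, 2))
```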
When the population variance is unknown and the sample size is less than 30, we typically use the t-statistic with n − 1 degrees of freedom:

t = (x̄ − μ0) / (s / √n)

In this formula, we have:
- x̄ as the sample mean
- μ0 as the hypothesized mean
- s / √n as the standard error
- s as the sample standard deviation
- n as the number of observations
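A similar sketch for the t-statistic, now using the sample standard deviation (the data is hypothetical):

```python
from statistics import mean, stdev

# Hypothetical small sample (n < 30), population variance unknown
sample = [4.8, 5.2, 5.6, 4.9, 5.3, 5.1, 4.7, 5.4]
mu_0 = 5.0

x_bar = mean(sample)
s = stdev(sample)                    # sample std. dev. (n - 1 in the denominator)
n = len(sample)
t = (x_bar - mu_0) / (s / n ** 0.5)  # compare with the t-table at n - 1 = 7 d.f.

print(round(t, 3))
```

If SciPy is available, `scipy.stats.ttest_1samp(sample, mu_0)` should produce the same statistic together with a two-sided p-value.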
This graph shows when you should use each test statistic, depending on your data:
![When you should use each test statistic, depending on your data graph.](https://365datascience.com/resources/assets/images/hypothesis-test-002.webp)
Note that when the population variance is known and the sample size is less than 30, it is recommended to use the t-statistic rather than the z-statistic. This is because when the sample size is small, using the t-test accounts for the increased uncertainty due to the limited amount of data.
In this case, even though the population variance is known, using the t-statistic provides more accurate results by considering the smaller sample size. This helps to avoid underestimating the variability in the population and ensures that the hypothesis test is valid.
In such a case, the formula for the t-statistic becomes:

t = (x̄ − μ0) / (σ / √n)

Note that in this case, instead of using the sample standard deviation (s), you would use the known population standard deviation (σ) in the formula. This makes the equation identical to the one for the z-score; the only difference is that you would look up the critical value in the t-table instead.
Another example is when the population variance is unknown and the sample size is greater than 30. Here, it’s still recommended to use the t-test as it accounts for the uncertainty introduced by the unknown population variance and provides more accurate results. Although the central limit theorem suggests that the sampling distribution approaches a normal one as the sample size increases, using the t-statistic still ensures the hypothesis test’s robustness and validity.
In summary, the choice between the z-statistic and the t-statistic depends on whether the population variance is known and on the sample size. We use the z-test when the population variance is known and the sample size is at least 30, and the t-test in the remaining cases.
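This rule of thumb can be captured in a tiny hypothetical helper (`choose_test` is our own name, not a standard function):

```python
def choose_test(variance_known: bool, n: int) -> str:
    """Rule of thumb from the section above: use the z-statistic only when
    the population variance is known AND the sample is large; in every
    other case the t-statistic is the safer choice."""
    if variance_known and n >= 30:
        return "z-test"
    return "t-test"

print(choose_test(True, 50), choose_test(False, 12), choose_test(True, 12))
```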
Significance Level in Hypothesis Testing
The significance level of a hypothesis test is denoted by alpha (α) and determines the amount of sample evidence needed to reject the null hypothesis. The researcher sets it arbitrarily before running the statistical experiment. It is a value that you select based on the certainty you need.
The level of significance does not depend on the underlying hypotheses, nor is it derived from any observational data. Instead, it is determined in the scientific domain of the research. In many disciplines—including market research, social sciences, and medicine—researchers choose alpha values of 0.05 or 0.01. In other scientific fields such as manufacturing, genetics, or particle physics, you come across much lower alphas.
For instance, when the existence of the Higgs boson, the so-called elementary particle, fundamental for the explanation of the universe, was established in 2012, the CERN researchers used an alpha of about 1 in 3.5 million. Understandably, when it comes to the history and the future of the universe, one does not want to leave anything to chance.
By selecting the significance level of our hypothesis test, we establish a threshold that determines how strong the evidence against the null hypothesis needs to be in order to reject it. A lower significance level, such as 0.01, requires stronger evidence before we reject the null hypothesis. This helps reduce the risk of making a Type I error. (We’ll explain what this error is in an upcoming section of this Hypothesis Testing Calculator article.)
On the other hand, a higher significance level such as 0.10 allows for weaker evidence to reject the null hypothesis. This can be suitable in certain scenarios where the consequences of a Type I error are less severe or when a more tolerant standard of proof is acceptable.
Rejection Region in Hypothesis Testing
When performing a hypothesis test, there are two possible outcomes: we either reject or fail to reject the null hypothesis. What does the latter mean? The failure to reject a hypothesis doesn’t necessarily mean that it is true and that we accept it. Rather, it means that we didn’t find sufficient evidence to reject it.
So, what do we base our decision on? In such cases, we rely on a decision rule.
There are three aspects a decision rule is based on:
- The type of distribution we are dealing with;
- The type of test that will be performed: one- or two-tailed;
- The significance level of the hypothesis test.
First off, we need to have an idea regarding the type of distribution observed for the test statistic. If it’s a standard normal distribution, then we can simply use the critical values taken from a z-table.
Meanwhile, if the population variance is unknown or the sample size is less than 30, our hypothesis test is known as a t-test and we use the critical values from the t-distribution with n-1 degrees of freedom.
Then, depending on how the null hypothesis is formulated, we’ll be able to determine whether to run a one or two-tailed test.
When conducting a two-tailed test, we allocate the level of significance (α) equally between the left and right tails of the distribution. This means that we need to identify two critical values to define the rejection region of the hypothesis test.
For example, when using a z-statistic, we denote the critical values as −z(α/2) and +z(α/2). At a significance level of α = 0.05, each tail contains α/2 = 0.025, which corresponds to critical values of −1.96 and +1.96:
![Two tails critical values when using z-statistic graph](https://365datascience.com/resources/assets/images/hypothesis-test-03.webp)
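Rather than reading a z-table, these cutoffs can be computed from the inverse CDF of the standard normal distribution; here is a sketch using Python’s standard library:

```python
from statistics import NormalDist

alpha = 0.05

# Two-tailed test: split alpha equally between the two tails
z_upper = NormalDist().inv_cdf(1 - alpha / 2)   # upper critical value, near +1.96
z_lower = -z_upper                              # lower critical value, near -1.96

# One-tailed test: the whole alpha goes into a single tail
z_one = NormalDist().inv_cdf(1 - alpha)         # cutoff near 1.645

print(round(z_lower, 2), round(z_upper, 2), round(z_one, 3))
```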
On the other hand, in a one-tailed test, we focus on a specific direction of the hypothesis and allocate the entire level of significance to that side of the distribution.
![One tailed critical values when using z-statistic graph](https://365datascience.com/resources/assets/images/hypothesis-test-04-07.webp)
And finally, we have to decide on the significance level at which to test the null hypothesis. Of course, it makes a noticeable difference whether we run the hypothesis test at a 5% or a 1% level. In the first case, we are a bit more relaxed about the probability of rejecting a true null hypothesis.
We need to calculate a z- or t-statistic, depending on the distribution we work with, then compare it to a critical value obtained for the respective significance level. The process goes like this:
- Determine whether we will use a z- or t-test.
- Calculate the test statistic.
- Find a critical value in a z- or t-table based on the chosen significance level.
- Compare the critical value and the z- or t-statistic.
Suppose we want to test whether the S&P 500 return in a given year is equal to 7%. In this case, we have a null hypothesis of:

H0: μ = 7%

And the alternative hypothesis is:

H1: μ ≠ 7%

Say the z-statistic is 1.15 and the level of significance (α) is 0.05. Because the test is two-sided, we split α equally, leaving 0.025 in the left tail and 0.025 in the right tail.
Looking up a cumulative probability of 1 − 0.025 = 0.975 in the standard normal table gives a z value of 1.96. So, we have +1.96 on the right side and −1.96 on the left side:
Standard Normal Distribution Table
![Standard normal distribution table graph.](https://365datascience.com/resources/assets/images/hypothesis-test-05.webp)
Decision Criteria Using a 5% Level of Significance
![Decision criteria using a 5% level of significance graph.](https://365datascience.com/resources/assets/images/hypothesis-test-06.webp)
We will reject the null hypothesis when z is higher than 1.96 or lower than −1.96. Otherwise, we will fail to reject it.
Given that our z-statistic is 1.15, we can’t reject the null hypothesis.
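A minimal sketch of this two-tailed decision in Python, using the example’s numbers:

```python
from statistics import NormalDist

z_stat, alpha = 1.15, 0.05                      # test statistic and significance level
z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # critical value near 1.96

# Two-tailed rule: reject H0 if the statistic falls in either tail
if z_stat > z_crit or z_stat < -z_crit:
    decision = "reject H0"
else:
    decision = "fail to reject H0"

print(decision)   # 1.15 lies between -1.96 and +1.96
```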
What about one-sided tests? Let’s refer to the last example, but this time test whether the S&P 500 return is higher than or equal to 7% per year. The null hypothesis in this case is:

H0: μ ≥ 7%

And the alternative hypothesis is:

H1: μ < 7%
This is a one-sided test, hence, we don’t have to divide the significance level by 2. Using the same value for α, we discover that this time the whole rejection region of the hypothesis test is on the left:
![One tailed critical values when using z-statistic graph](https://365datascience.com/resources/assets/images/hypothesis-test-04-07.webp)
Looking at the z-table, that corresponds to a z-score of -1.645 since it is on the left:
![Standard normal distribution table graph.](https://365datascience.com/resources/assets/images/hypothesis-test-08.webp)
Now, when calculating our z-statistic, if we get a value lower than −1.645, we would reject the null hypothesis. Otherwise, we would fail to reject it:
![Reject and fail to reject in null hypothesis graph(left tailed).](https://365datascience.com/resources/assets/images/hypothesis-test-09.webp)
Alternatively, if we test whether the return is lower than or equal to 7%, the null hypothesis is:

H0: μ ≤ 7%

And the alternative hypothesis is:

H1: μ > 7%
In this situation, the rejection region is on the right side. So, if the test statistic is bigger than the cut-off z-score, we would reject the null:
![Right side rejection region graph.](https://365datascience.com/resources/assets/images/hypothesis-test2-10.webp)
Therefore, we can reject the null hypothesis if z is higher than 1.645:
![Reject and fail to reject in null hypothesis graph(right tailed).](https://365datascience.com/resources/assets/images/hypothesis-test-11.webp)
This isn’t the case here, given that 1.15 is not higher than 1.645. So, the decision is that we can’t reject our null hypothesis (H0).
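Both one-tailed decision rules can be expressed in a few lines, reusing the z-statistic of 1.15 from the example:

```python
from statistics import NormalDist

z_stat = 1.15                              # test statistic from the example
z_crit = NormalDist().inv_cdf(1 - 0.05)    # one-tailed cutoff, near 1.645

# Left-tailed test (H1: mu < 7%): reject only for strongly negative z
reject_left = z_stat < -z_crit

# Right-tailed test (H1: mu > 7%): reject only for strongly positive z
reject_right = z_stat > z_crit

print(reject_left, reject_right)   # neither rule rejects H0 here
```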
Type I and Type II Error in Hypothesis Testing
When performing a hypothesis test, we want to reach a conclusion regarding the true population parameter based on sample data and a given significance level. Of course, there can be statistical errors. We’ll define the two types of mistakes that could occur when dealing with hypothesis testing.
In general, there are four possible outcomes when running a hypothesis test: two favorable and two unfavorable. To illustrate, let’s look at a common table statisticians use to summarize the different outcomes:
Correct and Incorrect Decisions in Hypothesis Testing
![Correct and incorrect decisions in hypothesis testing table.](https://365datascience.com/resources/assets/images/hypothesis-test-12-13.webp)
The table’s two rows show our decision: failing to reject or outright rejecting the null hypothesis (H0).
Here are the two situations when we have a positive outcome:
- Failing to reject the null hypothesis when it is actually true.
- Rejecting the null hypothesis when it is actually false.
The hypothesis test is successful when we observe the latter situation.
Now let’s define the incorrect decisions also known as Type I and Type II errors.
Type I error in hypothesis testing is when you reject a true null hypothesis. You might also encounter it as a false positive. The probability of making this error is the level of significance, α. Since you, the researcher, choose the significance level of the hypothesis test, the responsibility for making this error lies solely on you.
Note that the level of significance is closely related to the confidence level, which represents the degree of certainty we have in the estimated results. It is equal to (1 − α). For example, a level of significance of 5% for a hypothesis test means that there is a 5% probability of rejecting a true null hypothesis, and it corresponds to a 95% confidence level.
In hypothesis testing, a type II error occurs when you fail to reject a false null hypothesis. The probability of making this error is denoted by β, which depends mainly on the sample size and the magnitude of the effect. So, if the effect you are looking for is small and your sample is limited, this type of error becomes more likely.
The probability of rejecting a false null hypothesis is equal to (1 – β). After all, this is the ultimate goal. Therefore, the probability is called the power of the test. Most often, researchers increase the power by increasing the sample size.
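The last point, that a larger sample increases power, can be illustrated with a short sketch; the means, sigma, and sample sizes below are made-up numbers, and the closed-form power formula applies to a right-tailed z-test with a known population standard deviation:

```python
from statistics import NormalDist

# Hypothetical scenario: H0 says the mean is 5.0, but the true mean is 5.5
mu_0, mu_true = 5.0, 5.5
sigma, alpha = 1.0, 0.05

def power(n: int) -> float:
    """Probability of rejecting H0 when the true mean is mu_true."""
    z_crit = NormalDist().inv_cdf(1 - alpha)         # one-tailed cutoff
    shift = (mu_true - mu_0) / (sigma / n ** 0.5)    # effect in standard-error units
    return 1 - NormalDist().cdf(z_crit - shift)

# A larger sample shrinks beta and raises the power (1 - beta)
print(round(power(10), 3), round(power(40), 3))
```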
Probabilities Associated with Hypothesis Testing Decisions
![Probabilities associated with hypothesis testing decisions graph.](https://365datascience.com/resources/assets/images/hypothesis-test-14.webp)
Let’s illustrate this with an example. Say a person goes to trial, accused of a crime. In this hypothesis testing scenario, the null hypothesis (H0) states that he’s innocent, whereas the alternative hypothesis (H1) is that he committed the crime.
Similarly to the table above, we can have four possible outcomes:
![Guilty, No Guilty example graph.](https://365datascience.com/resources/assets/images/hypothesis-test-15.webp)
Let’s first examine the correct decisions. If the court does not sentence the person (we fail to reject H0) and he is, in fact, innocent, the hypothesis test worked. Likewise, if the court sentences the person (we reject H0) and he did indeed commit the crime, we have again made a correct decision.
Now for the incorrect ones. In the first scenario, the court sentences the person to jail, but he’s actually innocent. The test didn’t work because it rejected a true null hypothesis, otherwise known as a type I error in hypothesis testing.
In the opposite scenario, the test failed to reject a false null hypothesis: the court released the person when he’s actually guilty of the crime. This situation represents what is called a type II error in hypothesis testing.
As we mentioned previously, the significance level we choose for a given test is equal to its probability for making a type I error. If we accept a significance level of 5%, then we also have a 5% chance of rejecting a true null hypothesis. In 5% of the cases, we would sentence a person who’s not guilty. And alternatively, we could be sure that, if we work with a 5% significance level, the test results would be correct in 95% of the cases.
Every time when we carry out such a test, we will need to choose its significance level. Feel free to use our Hypothesis Testing Calculator to experiment with different significance levels for your hypothesis test.
Hypothesis Testing with P-Value
So far, we have rejected or failed to reject the null hypothesis at arbitrarily chosen levels of significance. But we haven’t yet identified the point at which the decision flips: the smallest level of significance at which we can still reject it.
Choosing an arbitrary level of significance in hypothesis testing is an approach that leaves plenty to be desired. Why would an analyst opt for 5% instead of 1%? And how do they communicate the strength of their test at different significance levels?
For this, we can use a measure called the p-value—the smallest level of significance at which the null hypothesis can be rejected.
This is the most common way to test hypotheses. Instead of relying on preassigned levels of significance, we can find the smallest value at which we can still reject the null hypothesis, given the observed sample statistic.
If a researcher performs a hypothesis test at a 5% significance level, and another researcher tests at a 1% significance level, the two could reach very different conclusions. For instance, the first researcher could reject the null hypothesis at 5%, while the second could fail to reject it at 1%.
However, this can quickly become confusing. And it’s also why we turn to the p-value.
In our example with the two researchers, the p-value could be 1.5%. Any test with a significance level of 1.5% or higher would then reject the null hypothesis, while tests with a significance level below 1.5% would fail to reject it.
The p-value in hypothesis testing is a preferred method because it provides more precise information compared to choosing a significance level at our discretion. Most researchers simply communicate a p-value of their findings and leave it up to the consumers to interpret the result’s statistical significance.
So how do we calculate p-values? Most statistical software packages run tests and then provide us with a series of results; one of them is the p-value.
Afterward, it is up to the researcher to decide if the variable is statistically significant or not. Generally, software programs are designed to calculate the p-value to the third digit after the decimal point. When you start conducting your own research, ideally you want to see three zeroes after the dot, i.e., a p-value below 0.001. Why is that? The closer to zero your p-value is, the more significant the result you’ve obtained.
In other words, the smaller the p-value, the stronger the evidence against the null hypothesis.
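For a z-statistic, a two-tailed p-value is simply twice the tail probability beyond |z|; here is a sketch reusing the z = 1.15 statistic from the earlier S&P 500 example:

```python
from statistics import NormalDist

z_stat = 1.15   # test statistic from the earlier example

# Two-tailed p-value: probability of a result at least this extreme under H0
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))

# Reject H0 only if the p-value is at or below the chosen significance level
print(round(p_value, 4), p_value <= 0.05)
```

Here the p-value comes out around 0.25, far above 0.05, which matches the earlier conclusion that we cannot reject the null hypothesis.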
You can use this Hypothesis Testing Calculator to find the p-value, but you can also take advantage of our dedicated P-Value Calculator and obtain quick but concise results in two programming languages: Python and R.
The p-value and the rejection region are two common methods used in hypothesis testing. Both aim to make a decision based on sample data, but they differ in how they interpret and calculate the evidence against the null hypothesis:
- The p-value indicates how strong the evidence against the null hypothesis is. A small p-value suggests that the observed data is implausible if the null hypothesis is true, which means we reject the latter in favor of the alternative hypothesis. In this approach, we have a predetermined significance level (α), typically 0.05. If the p-value is less than or equal to that, we reject the null hypothesis.
- Meanwhile, the rejection region involves determining a critical value (or cutoff value) based on a predetermined significance level (α). We then compare the test statistic (calculated from the sample data) with this critical value. If the test statistic is larger (for a one-tailed test), or falls outside the critical range (for a two-tailed test), then we reject the null hypothesis in favor of the alternative hypothesis. Unlike the p-value, the rejection region does not directly measure how strong the evidence against the null hypothesis is.
In summary, the p-value approach quantifies the strength of evidence against the null hypothesis based on the probability of obtaining the observed test statistic. Meanwhile, the rejection region compares the test statistic to a critical value before reaching a conclusion. The p-value provides more flexibility and a continuous measure of evidence, whereas the rejection region simplifies the decision-making process by using predetermined critical values.
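The two approaches can be sketched side by side to show that they always yield the same decision; z = 2.3 and α = 0.05 below are arbitrary example values:

```python
from statistics import NormalDist

z_stat, alpha = 2.3, 0.05   # hypothetical two-tailed test

# Rejection-region approach: compare |z| with the critical value
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
reject_by_region = abs(z_stat) > z_crit

# P-value approach: compare the p-value with alpha
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
reject_by_p = p_value <= alpha

# The two approaches agree on the decision
print(reject_by_region, reject_by_p)
```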
![Rejection and Non-rejection regions graph.](https://365datascience.com/resources/assets/images/hypothesis-test-16.webp)
Two Tail Hypothesis Testing
![Two tail hypothesis testing graph.](https://365datascience.com/resources/assets/images/hypothesis-test-17.webp)
In hypothesis testing, a test statistic is a numerical value calculated from the sample data used to make inferences about the population parameter of interest. It serves as a basis for comparison between the observed data and the expectations we have under the assumption of the null hypothesis.
The specific test statistic depends on the hypothesis at hand and the type of data we’re analyzing. For example, when the population variance is known and the sample size is greater than 30, then we use the z-statistic. Alternatively, when the population variance is unknown and the sample size is less than 30, then we typically opt for the t-statistic.
Once we calculate the test statistic, we compare it to a critical value to determine whether to reject or fail to reject the null hypothesis.
The test statistic in hypothesis testing essentially summarizes the information in the sample data and provides a basis on which to make an objective decision about the null hypothesis.
The level of significance in hypothesis testing is denoted by the Greek letter α and represents a predetermined threshold that defines the criteria for rejecting or failing to reject the null hypothesis.
The significance level also represents the probability of making a Type I error, which is when we reject the null hypothesis but it is actually true. Commonly used significance levels in hypothesis testing include 0.05 (5%) and 0.01 (1%). We can choose other levels such as 0.10 (10%) based on the specific context and requirements of the study.
When conducting a hypothesis test, the researcher compares the test statistic (calculated from the sample data) to the critical value(s) or calculates the p-value. If the test statistic falls in the rejection region defined by the critical value(s)—or if the p-value is smaller than the significance level— then we reject the null hypothesis. Conversely, if the test statistic falls in the non-rejection region—or the p-value is larger than the significance level—we fail to reject the null hypothesis.
Choosing a significance level is a balance between making accurate conclusions and controlling the risk of Type I errors. A lower significance level (e.g., α = 0.01) makes it more rigorous to reject the null hypothesis, reducing the chance of Type I errors, but potentially increasing the likelihood of Type II errors—failing to reject the null hypothesis when it’s false. That’s because we’ll reject H0 less frequently, including when it is false. On the other hand, a higher significance level (e.g., α = 0.10) increases the chances of Type I errors, but decreases the chances of Type II errors.
Feel free to experiment with different significance levels for your sample in our Hypothesis Testing Calculator and optimize your analyses.