A study on the correlation between power and DFA alpha 1 in everyday workouts

by Stefano Andriolo. We demonstrate a universal relationship between cycling power and DFA alpha 1 from everyday workout data that allows accessible and regular tracking of aerobic and anaerobic thresholds without the need for an exercise lab or even a dedicated testing protocol.

DFA alpha 1 (a1) is a metric derived from Heart Rate Variability (HRV) data, and is used to estimate aerobic and anaerobic thresholds in individuals. It also provides a means to assess an individual's physiological states, particularly during exercise: the lower a1, the more acute exercise stress your body is dealing with. For example, if a1 crosses below 0.75, you cross your aerobic threshold. Once a1 drops further below 0.5, you cross your anaerobic threshold.
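As a minimal sketch of the two cut-offs just described, the function below maps an a1 value to an intensity domain. The function name and the returned labels are illustrative assumptions; only the 0.75 and 0.5 thresholds come from the text.

```python
def intensity_domain(a1: float) -> str:
    """Map a DFA alpha 1 value to an intensity domain.

    The cut-offs 0.75 (aerobic) and 0.5 (anaerobic) are the ones quoted in
    the text; the labels returned here are illustrative only.
    """
    if a1 < 0.5:
        return "above anaerobic threshold"
    if a1 < 0.75:
        return "between aerobic and anaerobic threshold"
    return "below aerobic threshold"


if __name__ == "__main__":
    for value in (1.1, 0.7, 0.45):
        print(value, "->", intensity_domain(value))
```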

Knowledge of an athlete's aerobic and anaerobic thresholds is important to prescribe appropriate exercise intensities in a training plan. It allows productive training intensity zones to be set correctly. Regular automatic assessments of these thresholds increase the amount of data on the effect that different training regimes have on an athlete's fitness, as learned by the machine learning models of AI Endurance (the digital twin of each athlete).

a1 is a measure of internal effort. Hence, it is expected to have a high degree of correlation with external metrics defining the effort such as power (in cycling and running) or pace (in running and swimming): the higher the power/pace, the lower a1, where the precise relationship between a1 and power/pace is expected to depend on the individual.

So far, the exploration of the relationship between power or pace and a1 in the scientific literature has necessitated meticulous protocols such as ramp tests, to be conducted within the controlled confines of a laboratory. These protocols, although accurate, pose a significant barrier to the real-world applicability and understanding of this valuable metric out 'in the wild'. The analysis described here does not rely on laboratory data but instead data collected through the AI Endurance app. Furthermore, data collection was entirely passive as the participants did not follow a particular set of study instructions during their workouts.

Here, we present a novel approach that leverages everyday workout data, unshackling the analysis from the stringent requirements of lab-based experiments. Focusing on cycling power, and utilizing a diverse range of workout data, collected across varying intensities and conditions, we have devised an effective method to elucidate the correlation between power (p) and a1.

Briefly, our new method has two parts. First, we define representatives of each a1 - power/pace data cluster via a simple averaging procedure, which alleviates the effects of cardiac lag that otherwise create noise in the a1 - power/pace relationship. Second, we consider workout groups instead of individual workouts.

We expect our findings will help improve

  • determining aerobic and anaerobic HRV thresholds without the need for dedicated fitness tests such as ramp tests
  • assessing readiness to train and durability via HRV data
  • determining if an athlete is a 'non-responder' for the a1 metric.

We define a non-responder as an athlete with poor correlation between a1 and power/pace and/or power/pace values at a1 = 0.5 and a1 = 0.75 that are nowhere near their actual ventilatory anaerobic and aerobic thresholds. While we could not identify a non-responder in this study, some athletes report markedly suppressed a1 < 0.5 already at the commencement of exercising.

Results

Our analysis:

  1. Confirms the expected correlation between power and a1 both within single workouts and workout groups, see Figures 1 and 2;
  2. Suggests a relationship between power and a1 that possesses a universal feature, see Figure 2.

These results hold generally across the randomly picked data of athletes who have agreed to share their exercise data in AI Endurance for research purposes.

Correlation within a workout

[Figure 1 panels: User 6, workout 1; User 6, workout 2; User 14, workout 1]

Figure 1: Correlations (and relative p-values) within single workouts. The first 2 workouts belong to the same user, the third corresponds to a workout executed in ERG mode. Blue points are the (a1, p) data collected, and are used to compute the 'total' correlation and the 'dynamical' correlation (considering blue points with a1<1). Red points are the representatives of each a1 interval (marked by yellow dashed lines, up to a1=1.2), and are used to compute the 'red' correlation following our method, as described below.

Our analysis reveals a notable trend: in 74% of cases, the correlation obtained using the representative method is lower (more strongly negative) than that obtained using the standard approach, which uses the (a1, p) data in its entirety without representatives. With representatives, 44% of workouts show a correlation below -0.7, compared to just 3% when all data points are considered (see Figure 3 for detailed statistics). This finding underscores the effectiveness of our representative-based method in most scenarios.

Delving into specific examples, as depicted in Figure 1, we observe that in common workout scenarios, the correlation derived using our method is generally stronger than that calculated using all data points. This is particularly evident in the first and third workouts shown. Notably, the third workout exemplifies the method's efficacy even in complex exercises like those performed in ERG mode.

It's important to note, however, that our approach does sometimes reveal only a weak correlation, possibly due to factors like limited data or a focus on certain representative regions (e.g., where a1>1), as illustrated in the last workout of Figure 1. Such weak correlations can also occur in data-richer workouts, as seen in the middle workout of Figure 1. If weak correlations are consistently observed across an athlete's workouts, it might indicate a non-responder pattern. These observations suggest that broader datasets could provide more comprehensive insights. This expectation is indeed confirmed when considering bigger datasets obtained by merging multiple workouts together.

Correlation in workout groups

Figure 2: Correlation of data points obtained by merging workouts into groups. The groups shown correspond to different users and contain 4, 8, 4 workouts respectively. Light blue points represent all workout data collectively, while red and purple points are the representatives computed inside and outside the dynamical region respectively. Notice that the first group contains the second workout shown in Figure 1, and the second group refers to an athlete often using ERG mode. The final image in the figure presents our least successful result, the only group showing a positive correlation, which belongs to a user who has a total of 13 workout groups. Additionally, each group is analyzed with linear and hyperbolic fits, and their corresponding equations are displayed. The goodness of these fits is measured by the R^2 value, with a value closer to 1 indicating a more accurate fit.

Larger datasets are obtained by grouping workouts executed within a short time frame, such as 10 days. Each workout group is a 10-day snapshot of data on external (power) versus internal (a1) effort, and offers a more nuanced understanding of performance. However, it's important to keep the time frame short, for example 10 days, to avoid the influence of physiological adaptations on our short-term analysis.
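The grouping step can be sketched as follows. This is only one possible reading: it assumes consecutive, non-overlapping windows that start at the first workout not yet assigned to a group, which the text does not specify. The function name `group_workouts` and its parameters are hypothetical; the default minimum of 4 workouts per group mirrors the selection rule stated later in the method section.

```python
from datetime import date, timedelta


def group_workouts(workout_dates, window_days=10, min_workouts=4):
    """Split workout dates into consecutive, non-overlapping windows.

    Assumption: a new window opens at the first workout that falls
    `window_days` or more after the start of the current window.
    Groups with fewer than `min_workouts` workouts are discarded.
    """
    groups = []
    current = []
    window_start = None
    for day in sorted(workout_dates):
        if window_start is None or (day - window_start).days >= window_days:
            if len(current) >= min_workouts:
                groups.append(current)
            current = []
            window_start = day
        current.append(day)
    if len(current) >= min_workouts:
        groups.append(current)
    return groups


if __name__ == "__main__":
    start = date(2024, 1, 1)
    dates = [start + timedelta(days=k) for k in (0, 2, 5, 9, 10, 25)]
    for group in group_workouts(dates):
        print([d.isoformat() for d in group])
```

A rolling window would be an equally plausible choice; it would produce overlapping groups and therefore more, but correlated, snapshots.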

As Figure 3 illustrates, our method consistently outperforms the standard correlation calculation approach for all users. For workout groups, the majority of the correlation values now fall below -0.7, and the number of positive correlations is drastically reduced. Indeed, only one group out of 73 shows a positive correlation, which is the same group depicted in the third image of Figure 2.

In fact, there are a number of benefits in considering workout groups:

  • Reduced Impact of Workout Types: Merging different types of workouts, as seen in the second group of Figure 2, minimizes the influence of workout variety. This results in more generalizable insights;
  • Enhanced Significance of Data Points: In workout groups, data points are more densely packed, making the representatives much more meaningful than in isolated workouts. For example, representatives outside the dynamic range display a clearer behavior, aligning with the expectation that lower power corresponds with lower effort. This phenomenon is evident in Figure 2's first workout group, where workouts with weaker individual correlations (such as the second image in Figure 1) contribute to groups with significantly stronger correlations;
  • Identification of Non-Responders: The approach facilitates the detection of non-responders—athletes whose workouts consistently show very weak correlations across almost all their workout groups. A further option to identify non-responders is to compare non-HRV thresholds, e.g. via automatic Critical Power detection in AI Endurance, to power at a1 of 0.5 and 0.75 and look for significant discrepancies. In our analysis of 11 athletes, we didn't find any non-responders;
  • Modelling: Collecting data over several days allows for a meaningful investigation of the short-term relationship between power and a1. As demonstrated in Figure 2, we experimented with two types of two-parameter functions for each workout group, a linear fit and a hyperbolic fit, since a single-parameter hyperbolic fit did not fit the data sufficiently well. As detailed in Figure 4, neither model was clearly preferable, leading us to choose the simpler linear model for its straightforwardness: p = m × a1 + q.

In Figure 2, we explicitly show the values of the parameters m and q. These two parameters are not only user dependent, but also vary with fitness. For example, if an athlete achieves higher power for given a1 over time, i.e. growing fitness, the parameters m and q will change.

This law can be used to track the fitness level of each user across time. It also offers a method to estimate the most current values of power at the aerobic and anaerobic thresholds, pAE = m × 0.75 + q and pAN = m × 0.5 + q, without the need for any specific testing protocol, simply from the most recent workout history!
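As a small worked example of these two formulas, the sketch below evaluates the linear law at a1 = 0.75 and a1 = 0.5. The function name and the values of m and q are made up for illustration and are not taken from the study's data.

```python
def threshold_powers(m: float, q: float) -> tuple[float, float]:
    """Evaluate p = m * a1 + q at the aerobic (0.75) and anaerobic (0.5) cut-offs."""
    p_ae = m * 0.75 + q  # power at the aerobic threshold
    p_an = m * 0.5 + q   # power at the anaerobic threshold
    return p_ae, p_an


if __name__ == "__main__":
    # Hypothetical fit parameters for one workout group (illustrative only).
    m, q = -200.0, 400.0
    p_ae, p_an = threshold_powers(m, q)
    print(f"pAE = {p_ae:.0f} W, pAN = {p_an:.0f} W")
```

With a negative slope m, as expected from the negative correlation, pAN comes out above pAE. Re-running the calculation on each new workout group gives a time series of both thresholds.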

This is a major implication of our analysis: we can track an athlete's threshold without an exercise lab, without a testing protocol, only requiring them to enter the dynamic a1<1 range somewhat regularly during their workouts with clean HRV data.

The Method in detail and a Few Statistics

In analyzing the correlation between power (p) and the intensity marker (a1), certain prerequisites must be met for the data:

Data Quality: Precision is crucial, especially since power meters are generally more reliable (with about 1% error) than heart rate (HR) monitors and HRV detection. This is particularly true when considering artifacts - e.g. missed beats by HR monitors - which can significantly affect the a1 calculation. Hence, our focus is on workouts with a maximum of 5% artifacts. From an initial pool of 3123 workouts across 21 users, this criterion narrows down our dataset to 2096 workouts (still 21 users), forming 411 10-day workout groups.

Fatigue Consideration: To mitigate the impact of fatigue, we concentrate on the first 15 minutes of each workout.

Data Consistency: Inconsistent data, especially during periods without pedaling (where power, p, equals 0), can skew results. While stopping pedaling for a few seconds once in a while is not a problem, it becomes one when it happens for longer periods and/or very frequently. In that situation, we would have many data points with the same power value p=0 but different values of a1, that is, different levels of internal stress, which significantly spoils the correlation. To maintain consistency, we exclude workouts with prolonged or frequent periods of no pedaling and discard data points with p=0.

Data Range and Intensity: For a meaningful correlation between p and a1, a sufficient range of the intensity marker, a1, is necessary. We focus on datasets where at least 50% of the data points are within the dynamic range (a1<1), which indicates a significant effort level. We do not process data with mostly a1>1, typically corresponding (in healthy individuals) to very easy rides.

The last criterion is the most restrictive, reducing the number of eligible single workout datasets to 554 across 17 users, and workout groups to 73 across 11 users. This reduction is also influenced by our decision to exclude workout groups with fewer than 4 workouts to minimize the influence of individual workouts on the overall group analysis.
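The selection criteria above can be sketched as a single filter over one workout. The data layout (a list of `(t_seconds, power, a1)` samples plus an artifact fraction), the function name, and in particular the `max_zero_power_fraction` cut-off are assumptions: the text does not say how "prolonged or frequent" coasting is quantified. The 5% artifact limit, the 15-minute window and the 50% dynamic-range requirement are taken from the text.

```python
def select_workout_data(samples, artifact_fraction,
                        max_artifacts=0.05, max_seconds=15 * 60,
                        max_zero_power_fraction=0.2,
                        min_dynamic_fraction=0.5):
    """Return the usable (power, a1) pairs of a workout, or None if rejected.

    `samples` is a list of (t_seconds, power, a1) tuples.
    `max_zero_power_fraction` is a hypothetical stand-in for the
    "prolonged or frequent periods of no pedaling" rule.
    """
    # Data quality: at most 5% HRV artifacts.
    if artifact_fraction > max_artifacts:
        return None
    # Fatigue: keep only the first 15 minutes.
    early = [(p, a1) for t, p, a1 in samples if t <= max_seconds]
    if not early:
        return None
    # Data consistency: reject workouts with too much coasting ...
    zero_fraction = sum(1 for p, _ in early if p == 0) / len(early)
    if zero_fraction > max_zero_power_fraction:
        return None
    # ... and drop the remaining p = 0 points.
    pedaling = [(p, a1) for p, a1 in early if p > 0]
    if not pedaling:
        return None
    # Data range: at least half of the points in the dynamic range a1 < 1.
    dynamic = sum(1 for _, a1 in pedaling if a1 < 1.0) / len(pedaling)
    if dynamic < min_dynamic_fraction:
        return None
    return pedaling


if __name__ == "__main__":
    demo = [(0, 100, 1.2), (60, 150, 0.9), (120, 200, 0.8), (180, 0, 0.7),
            (240, 250, 0.6), (300, 260, 0.55), (1000, 300, 0.4)]
    print(select_workout_data(demo, artifact_fraction=0.02))
```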

Even after meticulous data selection, directly correlating p and a1 using all data points available (p, a1) typically yields weak correlations, as illustrated by the blue histograms in Figure 3. This is due to the fact that, while data points are a list of (p, a1) pairs collected at the same time, there is a physiological lag in the heart response (and thus HRV and a1) to an increase/decrease in power output.

This physiological lag means that when a user suddenly increases power, there are data points with high power but still high a1, as the latter takes time to catch up. Similarly, during a decrease in power (especially after intense effort), there are points with low power but still low a1. These inconsistencies lead to poor correlations.

An example would be an athlete attempting a short, say ~2 min, power burst after riding at low intensity and going back to riding at low intensity after the effort. The athlete goes through the following stages:

  • Low intensity before the effort: low power, high a1;
  • Intensity burst: high power, high a1 in the beginning, low a1 towards the end (due to cardiac lag the cardiovascular system does not 'feel' the effort in the beginning);
  • Low intensity after the effort: low power, low a1 at first, high a1 after a short time (due to cardiac lag the cardiovascular system needs some time to recover from the effort).

The raw (a1, p) data of this example exhibits data points outside the expected negative correlation, e.g. high a1 and high p as well as low a1 and low p data. To address this issue, we have developed a method to smooth out these problematic data points:

  1. Data Segmentation: We divide the data points into intervals based on a1 levels, as indicated by the yellow (and purple) dashed lines in Figures 1 and 2;
  2. Representative Points: For each interval, we calculate a representative point (p_avg, a1_avg) using the average of p and the average of a1 within that interval. These are the red (and purple) points in the figures;
  3. Correlation of Representatives: Instead of correlating all individual data points, we correlate these representative points.
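The three steps above can be sketched in a few lines of plain Python. The interval width of 0.1 is an assumption (the text does not state the width used), as are the function names; the upper limit of 1.2 follows the Figure 1 caption. The toy data contains two "lagged" points of the kind described earlier.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def representatives(points, width=0.1, a1_max=1.2):
    """Average (a1, p) points inside each a1 interval of size `width`.

    `width` is an assumed value; `a1_max` follows the Figure 1 caption.
    Returns one (a1_avg, p_avg) pair per non-empty interval.
    """
    bins = {}
    for a1, p in points:
        if a1 >= a1_max:
            continue
        bins.setdefault(math.floor(a1 / width), []).append((a1, p))
    reps = []
    for index in sorted(bins):
        members = bins[index]
        a1_avg = sum(a for a, _ in members) / len(members)
        p_avg = sum(p for _, p in members) / len(members)
        reps.append((a1_avg, p_avg))
    return reps


if __name__ == "__main__":
    # Toy (a1, p) data; (0.92, 250) and (0.58, 170) mimic cardiac lag.
    data = [(1.15, 100), (1.12, 140), (0.95, 150), (0.92, 250),
            (0.75, 240), (0.72, 260), (0.55, 310), (0.52, 330), (0.58, 170)]
    reps = representatives(data)
    r_all = pearson([a for a, _ in data], [p for _, p in data])
    r_rep = pearson([a for a, _ in reps], [p for _, p in reps])
    print(f"all points: {r_all:.2f}, representatives: {r_rep:.2f}")
```

Averaging within an interval lets the lagged points be outweighed by the regular ones that share the same a1 level, which is why the correlation of the representatives comes out more strongly negative on this toy data.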

By applying this technique, the correlation values obtained are significantly stronger than those obtained with the standard method. For most users, these improved correlations indicate a strong relationship between power and a1, aligning with our expectations.

Correlations in single workouts

Figure 3: Distribution of correlations obtained in all workouts (above) and workout groups (below). Blue shows the correlations obtained using all data points, red shows the smallest correlation obtained with our method, that is either the red or the red+purple one in Figure 2.

The use of representative points to calculate correlations significantly alters the results when compared to the standard method. This effect is clearly visible in the histograms in Figure 3, where the representative correlations are shifted away from the standard values, typically towards -1.

Regarding single workouts (first histogram), the mean for the standard correlation (blue) is -0.25 with a standard deviation of 0.26, and the range spans from -0.95 to 0.41. In contrast, correlations of representatives (red) show a mean of -0.44 with a broader standard deviation of 0.55, and the range extends from -1 to 1. Notably, while only 3% of workouts have a standard correlation below -0.7, this proportion dramatically increases to 44% when using representatives. Furthermore, in 74% of cases, the correlation calculated with representatives is lower (stronger in a negative sense) than the standard correlation.

The same pattern is observed for workout groups. The mean values for standard (blue) and representative (red) correlations are -0.32 and -0.75, respectively, with standard deviations of 0.19 and 0.27. In this scenario, a significant 66% of workout groups exhibit a red correlation below -0.7, compared to only 4% in the standard case. Moreover, in 96% of workout groups, the red correlation is smaller (stronger) than the blue one. This implies that all users have at least 92% of their workout groups showing an improvement in correlation calculation with our method over the standard approach.

When considering the two-parameter fits to determine the relationship between power output and physiological response, our analysis does not distinctly prefer either the linear p = m × a1 + q or the hyperbolic p = c / a1 + d model. Both approaches seem equally valid in the context of both the dynamic and the entire a1 range, as illustrated in Figure 4. Given the lack of a clear advantage for one model over the other and the current size of our dataset, we lean towards adopting the linear model for its simplicity and ease of interpretation.
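The comparison of the two models can be sketched with ordinary least squares. This is one way to carry out the fits, not necessarily the one used in the study: the hyperbolic law p = c / a1 + d is linear in 1/a1, so the same routine serves both models. All function names are hypothetical, and the input is assumed to be a list of a1 values with matching power values (for example the representative points of a workout group).

```python
def least_squares(xs, ys):
    """Slope and intercept of the ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(ys, predictions):
    """Coefficient of determination R^2 of predictions against observations."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def fit_linear(a1s, ps):
    """Fit p = m * a1 + q; returns (m, q, R^2)."""
    m, q = least_squares(a1s, ps)
    return m, q, r_squared(ps, [m * a + q for a in a1s])


def fit_hyperbolic(a1s, ps):
    """Fit p = c / a1 + d by regressing p on 1 / a1; returns (c, d, R^2)."""
    c, d = least_squares([1 / a for a in a1s], ps)
    return c, d, r_squared(ps, [c / a + d for a in a1s])


if __name__ == "__main__":
    # Illustrative representative points, not data from the study.
    a1s = [0.55, 0.65, 0.75, 0.85, 0.95]
    ps = [290, 265, 250, 230, 215]
    print("linear:     m=%.1f q=%.1f R2=%.3f" % fit_linear(a1s, ps))
    print("hyperbolic: c=%.1f d=%.1f R2=%.3f" % fit_hyperbolic(a1s, ps))
```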

[Figure 4 panels: linear vs hyperbolic model in the dynamical range; linear vs hyperbolic model in the entire range]

Figure 4: Goodness of linear and hyperbolic fits in the dynamical region (above) and entire region for a1 (below). The greater the R^2, the better the fit. In the former case we have means 0.59 and 0.64 (standard deviations 0.29 and 0.28) for the linear and hyperbolic law respectively, while in the latter case we have means 0.49 and 0.54 (standard deviations 0.30 and 0.26) respectively. There is no clear indication of whether one type of fit should be preferred over the other.

A bigger dataset may reveal the most suitable model to represent the universal relationship between power and a1 in the future.

References

  1. Fractal Correlation Properties of Heart Rate Variability: A New Biomarker for Intensity Distribution in Endurance Exercise and Training Prescription? - Thomas Gronwald, Bruce Rogers, Olaf Hoos - Front. Physiol. 2020
  2. Correlation properties of heart rate variability during endurance exercise: A systematic review - Thomas Gronwald, Olaf Hoos - Ann. Noninvasive Electrocardiology 2019
  3. Validity of detrended fluctuation analysis of heart rate variability to determine intensity thresholds in elite cyclists - Manuel Mateo-March, Manuel Moya-Ramón, Alejandro Javaloyes, Cristóbal Sánchez-Muñoz, Vicente J. Clemente-Suárez - Eur. J. Sport Sci. 2023
  4. A New Detection Method Defining the Aerobic Threshold for Endurance Exercise and Training Prescription Based on Fractal Correlation Properties of Heart Rate Variability - Bruce Rogers, David Giles, Nick Draper, Olaf Hoos, Thomas Gronwald - Front. Physiol. 2021
  5. Detection of the Anaerobic Threshold in Endurance Sports: Validation of a New Method Using Correlation Properties of Heart Rate Variability - Bruce Rogers, David Giles, Nick Draper, Laurent Mourot, Thomas Gronwald - J. Funct. Morphol. Kinesiol. 2021
  6. Influence of Artefact Correction and Recording Device Type on the Practical Application of a Non-Linear Heart Rate Variability Biomarker for Aerobic Threshold Determination - Bruce Rogers, David Giles, Nick Draper, Laurent Mourot, Thomas Gronwald - Sensors 2021
  7. Establishing the VO2 versus constant-work-rate relationship from ramp-incremental exercise: simple strategies for an unsolved problem - Danilo Iannetta, Rafael de Almeida Azevedo, Daniel A. Keir, Juan M. Murias - J Appl Physiol 2019