Breaking degeneracies with the Sunyaev-Zeldovich full bispectrum

Non-Gaussian (NG) statistics of the thermal Sunyaev-Zeldovich (tSZ) effect carry significant information which is not contained in the power spectrum. Here, we perform a joint Fisher analysis of the tSZ power spectrum and bispectrum to verify how much the full bispectrum can contribute to improve parameter constraints. We go beyond similar studies of this kind in several respects: first of all, we include the complete power spectrum and bispectrum (auto- and cross-) covariance in the analysis, computing all NG contributions; furthermore we consider a multi-component foreground scenario and model the effects of component separation in the forecasts; finally, we consider an extended set of both cosmological and intra-cluster medium parameters. We show that the tSZ bispectrum is very efficient at breaking parameter degeneracies, making it able to produce even stronger cosmological constraints than the tSZ power spectrum: e.g. the standard deviation on $\sigma_8$ shrinks from $\sigma^\text{PS}(\sigma_8)=0.35$ to $\sigma^\text{BS}(\sigma_8)=0.065$ when we consider a multi-parameter analysis. We find that this is mostly due to the different response of separate triangle types (e.g. equilateral and squeezed) to changes in model parameters. While weak, this shape dependence is clearly non-negligible for cosmological parameters, and it is even stronger, as expected, for intra-cluster medium parameters.


Introduction
The thermal Sunyaev-Zeldovich (tSZ) effect [1,2] [3, for a recent review] is a spectral distortion of the Cosmic Microwave Background (CMB), mostly generated in galaxy clusters by inverse Compton scattering of CMB photons off hot electrons. The tSZ effect is a powerful cosmological observable, mainly applied to the study of individual clusters, to build cluster catalogues and to extract number count statistics. A complementary possibility consists in the study of the tSZ angular power spectrum. After being originally discussed in [4], this approach has then been adopted as a powerful probe of the low-redshift Universe, to test both the standard ΛCDM scenario [5,6] and some extended models, which encompass primordial non-Gaussianity, massive neutrinos and dark energy [7][8][9][10][11]. One of the advantages of the tSZ power spectrum analysis is that it allows including also small, unresolved clusters, and it does not require direct measurements of cluster masses.
As it has been argued long before the tSZ was routinely measured across the sky [12], it is important to notice that the tSZ map is highly non-Gaussian, therefore only a part of the available tSZ information is actually captured by the power spectrum. A natural question which arises is therefore how much additional information can be extracted from higher order statistics, starting with the bispectrum (i.e. the 3-point multipole correlation function).
An initial theoretical study of the tSZ bispectrum was performed in [13]. There, it was pointed out that, besides having a different amplitude scaling with σ 8 , compared to the power spectrum, the bispectrum signal also takes its main contributions from massive clusters at low redshift. This makes the impact of astrophysical uncertainties smaller in the bispectrum than in the power spectrum, since the latter has sizeable contribution from lesswell-understood, low-mass, high-redshift clusters. At the same time the tSZ skewness was detected using Atacama Cosmology Telescope (ACT) data [14,15], while an all-sky Comptony map, and subsequent measurements of both the skewness and the bispectrum were later obtained by Planck [5,6]. In both cases it was shown that the bispectrum could be used to obtain significant constraints on σ 8 . A later study [16] combined cluster counts with power spectrum and equilateral (all three 's equal) bispectrum measurements, showing that the bispectrum -even just by considering the small equilateral triangle subset -can play a significant role in breaking (cosmological and astrophysical) parameter degeneracies. If, as we will show, this is the case, the tSZ bispectrum would be an extremely valuable source of information, especially in light of how difficult it is proving to understand the astrophysics of halos and reconcile various measurement of gas parameters, particularly the hydro-static mass bias b HSM [17][18][19][20][21][22].
These and other similar results clearly encourage further investigation. The aim of this work is to explore how much extra-information is contained in the 3-point function, via a detailed joint Fisher analysis of the tSZ power spectrum and bispectrum. Our main goal is to make our forecasts as realistic and accurate as possible. For this purpose: • We consider the entire bispectrum domain (not just specific triangles) and compute for the first time the full tSZ bispectrum covariance, beyond the Gaussian approximation.
We therefore include all contributions in the bispectrum covariance, up to the connected 6-point correlator, and we will also account for correlations between the power spectrum and the bispectrum.
• We extend our parameter space with respect to previous forecasts and analyses, in order to model in greater detail the impact of uncertainties in the electron pressure profile.
• We account for foreground contamination and model (in a simple way) the effects of component separation, using Internal Linear Combination (ILC).
As we show in the following, our analysis reinforces the conclusion that the tSZ bispectrum is a powerful observable to constrain cosmology, especially due to its efficiency in breaking degeneracies between astrophysical and cosmological parameters, otherwise present in a power spectrum-only analysis.
As a note to our previous statements, let us also stress here that the bispectrum does not account for all the cosmological information which can be extracted from the NG component of the tSZ map, since of course relevant contributions also come from higher order correlators. As long as most of the information is captured by the amplitude -and not by the shapeof different tSZ correlation functions, it was shown that the tSZ 1-point probability-densityfunction is a near-optimal statistic to constrain cosmology, since it encodes information from all n-point amplitudes (see [23] and references therein). The implementation of the 1-point statistic in parameter inference is however work in progress at the moment, making bispectrum estimation still a viable and worth pursuing alternative approach in practice. Moreover, we will show in the remainder that, while indeed small, the tSZ bispectrum shape dependence on cosmology is not negligible, and it actually plays an important role in breaking parameter degeneracies; this argument becomes even stronger when we account for the shape dependence on IntraCluster Medium (ICM) parameters.
The paper is structured as follows: in section 2 we review the halo model approach applied to the theoretical calculation of tSZ n-point correlation function, we show our choice of halo mass function and bias and we describe the analytical model of electron pressure profile which we use in the analysis; in section 3 we illustrate the calculation of the full covariance matrix for our observables; in section 4 we analyze the dependence of the tSZ power spectrum and bispectrum on different cosmological and ICM parameters; in section 5 we discuss foreground contamination issues; in section 6 we describe in detail and comment the results of our analysis. Finally, we summarize and draw our conclusions in section 7.

Theoretical tSZ correlation functions
The temperature fluctuations associated with the tSZ effect appear in sky maps as ∆T T (ν,n) = g(ν) y(n) , g(ν) ≡ x coth(x/2) − 4 , y(n) ≡ σ T k B n e T e m e c 2 dr . (2.1) Here g(ν) encapsulates the spectral dependence of the tSZ effect in units of CMB temperature with x ≡ hν/(k B T ), while y, called the Compton-y parameter, is the tSZ intensity along the line of sightn. σ T is the Thomson cross section and m e , n e and T e are respectively the electron rest mass, number density and temperature. The statistical properties of the y field can be extracted from its n-point angular correlation functions that we predict using the halo model approach [24, for a review]. Under some simplifying assumptions, discussed later, the expressions for all n-point correlators are well known [4,6,13,16,25,26]. They can be derived via the formalism detailed in appendix A, where we express the y-polyspectra as the projection onto the past light-cone of the threedimensional correlators of the corresponding three-dimensional field. This formulation is very useful in the mathematical derivation and numerical implementation of the general y field n-point correlator.
The tSZ power spectrum is given by the sum of the Poisson one-halo term, which account for intra-halo correlations, and the two-halo term which models inter-halo correlations [7,27]. They read Here P m (k) is the linear matter power spectrum, D + (z) is the growth factor, and d 2 V / dz dΩ is the comoving volume element per steradian, which can be calculated as d 2 V / dz dΩ = cχ 2 (z)/H(z), where χ(z) is the radial comoving distance. The Halo Mass Function (HMF) dn dM and halo bias b will be described in section 2.1.ỹ (z, M ) is the 2D Fourier transform of the projected y parameter image of the halo defined in detail in section 2.2. Notice that in eq. (2.2), (2.3), and throughout the paper we also consider a hydrostatic mass bias rescaling the true mass of the halo, i.e., we useM = (1 − b HSM )M in the expression for the projected y.
Higher order correlators are a straightforward extension of the power spectrum. Throughout this work, we will adopt the flat-sky approximation. Therefore, we will have to include the reduced bispectrum in our NG analysis. The bispectrum is described by one-, two-and three-halo terms. Including the three halo term would require the second order halo bias, which is not provided in [28]. To overcome this shortcoming, one could in principle resort to the peak-background-split formalism [29][30][31][32]; however, since the three-halo contribution is negligible we just omit it here. The one-and two-halo terms read b 1h Finally, for the higher order correlators, we will always employ only the one-halo term, which in general reads For later use, we point out that the general one-halo term does depend solely on the magnitudes of the multipoles involved. Unless specified otherwise, in all the spectra we integrate over redshift between z min = 10 −6 and z max = 4.5, and over masses between M min = 10 10 M h −1 and M max = 10 16 M h −1 . These limits, that stretch beyond the values commonly used in the literature, allow us to ensure that even higher order correlators are integrated correctly. We integrate directly over the overdensity mass M 500,c , as in, e.g., [6,9], so that a conversion to the virial masses is never required, and also assume that d ln M ∆,c / d ln M vir ≈ 1 [9]. We use b HSM = 0.2 [6] as the fiducial value throughout the analysis. We choose the values from [33] for the cosmological parameters h = 0.6711, n S = 0.9624, w 0 = −1, but we pick Ω m = 0.28, σ 8 = 0.8 to be in agreement with [6]. 1 The fitting functions we employ, that we are about to describe, are the most commonly used in the literature. A possible alternative, proposed in [35], would be to jointly fit both the pressure profile and the HMF from simulations

Halo mass function
We use the HMF and bias from [28], converting the mass definition M 500,c into M ∆,m , to fit the parameters of their table 4 to the appropriate value of ∆ = 500ρ c /ρ m .
We assume that the redshift scaling provided for ∆ = 200 applies to any ∆: where the parameters at z = 0 are taken from table 4 of [28], using a linear interpolation along ∆. As recommended, for each z > 3 we use the value calculated at z = 3, and at each z we calculate α(z) imposing dνf (ν, z) b(ν, z) = 1 . At z = 0 we correctly recover the value of α interpolated from the above mentioned table.
We use a halo bias of the form with the parameters from table 2 of [28].

Electron pressure profile
In our analysis, we are interested in the projected two-dimensional Compton-y field, which is obtained via line-of-sight integration of the rescaled electron density profile P e , in any given direction of the sky through x P e (x, z, M ) . (2.10) Here r s = r s (z, M ) and s = a(z)χ(z)/r s (z, M ) are, respectively, the typical scale radius of the y-image of the halo, and the multipole moment associated with it. For the parametrization of the electron profile that we employ, given in [36], r s = r 500 and, for a single cluster, P e relies on the generalised Navarro-Frenk-White profile As we rely on the halo model for the description of matter clustering, r 500 is the distance from the center of the halo, delimiting a sphere containing a density of dark matter which is 500 times the critical density of the Universe ρ c . The function C (x, z, M 500,c ) has the expression where α P = 0.12, and the exponent α P (x), also fitted in [36], has the following dependence α P (x) = 0.10 − (α P + 0.10) (2.14) These exponents parametrise deviation from the standard self-similar case. In eqs. (2.12)-(2.13) we also introduced the widely used parameter h 70 ≡ h/0.7.

Covariance matrix for the (binned) observables
In this work, we consider flat-sky binned estimators for the observables of interest as they are smooth and slowly varying functions in multipole space. The bins are defined in harmonic space: As we are working in flat-sky, ∈ R 2 . We are also considering a survey observing a sky fraction corresponding to a solid angle of Ω sky steradians. Similarly to the three-dimensional matter field case, the binned angular power spectrum is defined as [37][38][39] where the sum runs over discrete modes which are integer multiples of the fundamental frequency of our survey f ≡ 2π/Θ sky , Θ sky being the survey linear angular size. We recall that Ω sky = 2π (1 − cos Θ sky ) .

(3.2)
The fieldỹ is the two-dimensional Fourier transform of the y field (2.1). The quantity N ( b ) ≈ 2 ∆ b f sky (in the limit f ) gives the number of vector pairsˆ , −ˆ whose magnitudeˆ is within the bin b , each pair being discriminated by a deviation in the module of the vectors of a unit of the survey fundamental mode f [40][41][42]. In a similar fashion to the three-dimensional matter field case, we define the binned angular bispectrum as [42][43][44][45][46] b b and can therefore be described by a single suitably picked representative¯ b . 3 In practice, it is possible to prove that the 2 estimators above are unbiased and . In the following we drop the bars to distinguish the bin from its representative, as they should be distinguishable from the context. The representative considered in this work will be the central magnitude value¯ b ≡ of the bin b .

Structure of the covariance matrix
For our computation, we chose the binned estimatorsĈ b (3.1) andb 1 2 3 (3.3) for which we present the covariance matrix here below. We split the joint covariance as [47][48][49][50] Cov In eqs. (3.5) and (3.6), the subscript Gauss labels the covariance terms containing only 2point statistics, which are non-vanishing only for correlations within the same −bin. The other covariance terms arise due to the non-Gaussian statistics of the y field (2.1) and correlate modes in different −bins and the power spectrum and bispectrum. The derivation of the flatsky joint power spectrum-bispectrum covariance has already been outlined for other projected scalar fields, such as the weak lensing convergence [42,49]. As the derivation of the covariance for the y field follows the same mathematical and conceptual steps, we refer to the above cited works for more insights on its rigorous derivation. In the following, we report the final expressions employed in the present work, along with the most important concepts associated to their structure.

Power spectrum covariance matrix
Starting from the power spectrum binned estimator (3.1), we can obtain its full covariance by applying the following standard definition A detailed calculation leads to a non-connected correlator of 4 instances of the fieldỹ which can be split via Wick theorem into a sum of products of 2-point correlators and one 4-point connected correlator. The former can be further simplified, leading to the following Gaussian term In eq. (3.9) we potentially account for sources of Gaussian noise via the associated power spectrum N . The 4-point connected component leads instead to the NG term in eq. (3.5) where T , − , , − is the trispectrum (A.13) of the tSZ field. In eq. (3.10) the exact covariance evaluation would require an average of the trispectrum over the two bins b , b 3 NG,6P compared to the full covariance. The white areas correspond to configurations for which the covariance is fully determined by Cov (6) BB for which the plotted quantity is not defined. In both panels, the ticks mark equilateral configurations with sides of the labelled length.
involving both angular and magnitude integrations. However, we work under the common approximation of slowly varying polyspectra within the bins chosen for our work [42,49]. We therefore assumed the averaged trispectrum being the same as the trispectrum computed at the central values of the bins.

Bispectrum covariance matrix
Similarly to the procedure outlined in the previous section, we can compute the covariance matrix for the flat-sky bispectrum binned estimator (3.3), shown in figure 1, left panel. [42,51,52]. The Wick theorem allows us to derive the Gaussian part of the bispectrum covariance Meanwhile, the non-Gaussian (NG) components read Cov (3.13) By employing the estimator definitions, we can also obtain an expression for the crosscovariance between the binned flat-sky observables used in our analyses (3.14) So far, all the equations listed in the present section are exact. The two terms still missing are those associated to the 6-and 5-point correlator arising from the bispectrum covariance and the cross-covariance calculation. In these cases, the spectra should be averaged over the bins involved. However, as it was the case for the NG component of the power spectrum covariance (3.10), we assume them to be almost constant over the bins thus avoiding this computationally expensive operation. Therefore, the final expressions are In the right panel of figure 1 we show the fraction of the total covariance due to the 6-point connected term we just described, which is the dominant one: as shown in figure 2

Power spectrum and bispectrum dependence on the parameters
It is well known that varying cosmological and gas parameter results in an amplitude shift of the power spectrum [e.g. 9, 53] and a slight tilt at smaller scales, where the power spectrum has a stronger dependence on the shape of the halo profile. As a consequence, there are important degeneracies among the various parameters and, both with current and future data, one needs to consider additional observables to constrain various parameter jointly, either exploiting other datasets [e.g. 53] or other statistics extracted from the same tSZ maps [16].
Overall, also the bispectrum displays a similar dependence on different parameters, leading again to an amplitude shift as the main effect. This comes from the fact that tSZ correlators are dominated by the one-halo term, making the overall tSZ statistic nearly Poisson. However, for the bispectrum we can also appreciate a non-negligible, configuration dependent modulation, which differs for varying parameters.
Let us start from considering the bispectrum derivatives for the equilateral ( 1 = 2 = 3 ) and "squeezed" configurations ( 1 ≈ 22, 2 = 3 ), and compare them to their power spectrum counterparts in figure 4. 4 Much like the power spectrum, the derivatives of the equilateral bispectrum are similar to each other, most of them being somewhat flat up to ≈ 10 3 where the halo inner structure starts to be resolved, with a subsequent reduction (in absolute value) on smaller scales. On the other hand, the derivatives of the squeezed bispectrum have a slightly more variegated phenomenology among themselves and, on top of that, they differ from the equilateral bispectrum ones. For this reason we can intuitively expect squeezed configurations to be less affected by parameter degeneracies. While in section 6.1.1 we will confirm this expectation, we will also find that the conditional errors will be much bigger to begin with than what we find using only equilateral configurations, further motivating the analysis of the full bispectrum. Using the bispectrum parametrization introduced in [54], in figure 5 we show how each configuration responds to parameter changes in a slightly different manner. Even though each one has lower significance than the power spectrum, this means that it is easier to discern the effects of the various parameter changes. We will quantify this statement in section 6. The reason why different bispectrum configurations respond differently to changes in the various parameters lies in the slightly different ranges of halo masses and redshifts that contribute the most to the specific configuration. This is investigated in appendix E.

Gaussian experimental noise and foregrounds
To assess the impact of instrumental noise and foregrounds on the forecast we employ the methodology of [7,12]. We assume to use multi-channel measurements to run a simple internal linear combination (ILC) to separate foregrounds from the signal. To do so, we will assume perfect knowledge of the foreground spectral energy densities, which might be unrealistic in a real life scenario [55][56][57]. We leave the assessment of the impact of more refined foreground modelling to future work.

Spectral components
A comprehensive list of the spectral components that affect tSZ measurements is given by instrumental noise, CMB, free-free, thermal dust, synchrotron, radio point sources, infrared point sources. We will consider all of these contributions even if we use approximate descriptions that are anyway commonly used in the literature. We assume we can factorize the inter-frequency correlation in three ingredients: the spectral energy density expressed with respect to the CMB black-body Θ sc (ν), a spectral coherence factor that parametrize the correlation between different instrumental frequency channels R(ν i , ν j , ξ sc ), and an angular scaling C sc : . On the right we also provide a rescaled plot for the perimeter value of P = 8130 to illustrate the differences in derivatives for different triangles.
In [58] the spectral coherence in two channels ν i and ν j , for each component, was parametrized as where, for each spectral component, ∆α sc is the variance of the spectral index over the sky.
Depending on the spectral component, ξ sc varies in the range [0, ∞). In particular, we assume the instrument frequency channels to be independent (ξ sc → 0), and notice that the CMB has perfect correlation across all the frequencies (ξ sc → ∞). Θ sc , ∆α sc , and C sc vary for each spectral component. We will use the values and functional forms provided in [7,56,58,59], that we report in appendix B.

Component separation
The ILC aims at recovering, multipole by multipole, a signal of known spectral dependence computing a weighted average of data collected at different frequencies. 5 The weights are calculated in such a way to have unitary response to the desired spectral shape, while minimizing spurious contributions from other spectral components.
The internal linear combination weights and the effective noise term after multifrequency subtraction are respectively [12] where g = (g(ν 1 ), · · · , g(ν 1 )) T is a vector of the value of the tSZ spectral shape evaluated on the observed channels. This result, valid for a single multipolar coefficient, for each can be averaged over all m: we can therefore express the effective noise term N = N/(2 + 1). 6

Forecast results
We now have all the ingredients to understand what is the information content of the power spectrum, of the full bispectrum, and of their combination. We use a Fisher forecast to estimate how well a particular experiment can constrain a parameter through some observed quantity. The components of the Fisher information matrix are defined as [60]: where L is the likelihood function and the θ i are the parameters we want to constrain. In case of Gaussian distributed data and rotational invariance of the observable, the Fisher matrix can also be written as [61]: is the covariance matrix, described in section 3. We drop the second term as all the relevant information is contained in the mean of the observable, while the covariance dependence on the parameters would introduce spurious information that cannot be extracted by the considered estimator [61]. In principle neither the power spectrum nor the bispectrum follow a Gaussian distribution, but we expect it to be a good approximation for the binned spectra in virtue of the central limit theorem. This ansatz should however be validated against simulations or more refined analysis [62,63] that we leave for future work. For each parameter the conditional error is: which is the error one gets fitting only one parameter while fixing all the others. What we are more interested in, however, is the marginalized error. The diagonal entries of the inverse of a Fisher matrix give the error for each parameter marginalized over all the others, which one would obtain by a multivariate fit: The off diagonal elements then give us the covariance between two parameters, Cov(θ i , θ j ).
We start by considering an ideal, noiseless experiment in the absence of foregrounds, to understand the general properties of the spectra in a simplified case, then we show a forecast for the already available Planck data, and finally we consider two realistic future surveys.

Noiseless survey with no foreground contamination
In order to set an upper limit on the amount of information that can be in principle extracted from the tSZ power spectrum and bispectrum, we start by considering the case of a noiseless survey while also neglecting any issue of foreground contamination. This will also help us to build understanding of the main factors which affect the final constraints. For brevity we will refer to this configuration, slightly improperly, as Cosmic Variance Limited (CVL) survey. This initial oversimplified analysis will then be generalized in the following sections, including all realistic effects.
In tables 1 and 2 we show the forecasted error bars on all the parameters for a full sky survey, with max = 1000 and max = 5000 respectively. To assess the impact of a change in min instead, we repeated both analysis switching from min = 10 to min = 70 and found quantitatively negligible differences.
In both cases, if one considers the conditional errors, the power spectrum outperforms the bispectrum analyzed on its own by a factor 2 ∼ 10; obviously when the two signals are analyzed jointly, the errors shrink marginally with respect to the power spectrum case. If all but one parameters are perfectly known, the power spectrum, comparatively bigger than its noise with respect to the bispectrum, leads to tighter constraints on the last parameter. This outcome is overturned by the marginalization needed for a joint fit of the parameters.  Table 2: Conditional and marginalized error forecasts for a CVL experiment with max = 5000 and f sky = 1.
As anticipated in section 4, at power spectrum level the parameters have serious degeneracies among all of them, whereas the bispectrum has a more diverse response to parameters change. To quantify this statement, in figure 6 we compare for each couple of parameters the Pearson correlation coefficient that one obtains using the power spectrum, the bispectrum, and their combination, in the case of max = 5000. Correlations are generally much higher for the power spectrum and for this reason after marginalization the power spectrum loses much of its constraining power. The same information is also displayed in figure 7, where we show the triangle plot of cosmological and gas parameters.
To show the dramatic impact of degeneracies and the incremental reduction of constraining power when more and more parameters are jointly fitted, in figure 8 we directly compare the error bars recovered in various scenarios. For each parameter the baseline (tightest possible constraints in our analysis) is the conditional error with a joint power spectrum and bispectrum fit. All other errors are shown in figure 8 as ratio to this value. For each parameter the top error bar is calculated with the power spectrum analysis, the middle one with bispectrum analysis and the bottom one with the joint fit of the two. For each bar the most saturated and least saturated color show the conditional and marginalized error, respectively. In the left panel the mid-tone bar is obtained fixing the value of all gas parameters and marginalizing over cosmological ones; in the right panel we did the opposite.
Having made the point that the bispectrum allows to separately fit multiple parameters, we now inquire what are the prospects considering tSZ observations combined to prior knowledge of cosmological and gas parameters. Generally, the gas parameters are fixed to the best-fit values of simulations [64], external X-Ray [36], or stacked tSZ clusters [65] measurements, and not varied throughout the analysis of cosmological parameters. On the other hand, one can also think of using the Planck primary anisotropies measurements to set a prior on the cosmological parameters, and exploit tSZ anisotropies data to cross-check the gas parame-  Notice that the power spectrum 1σ ellipses have been rescaled by a factor to fit in the same graph. The grey bands are the power spectrum conditional errors.
ters. We explore both strategies in table 3. In the left side we fixed (P 0 , α P , c 500,c , α G , β G , γ G ), removing the respective rows and columns from the Fisher matrix. In the right side, instead, we add a (Gaussian) prior to the cosmological parameters, obtained from the 1σ errors in the column TT,TE,EE+lowE+lensing+BAO in table 2 of [34], with the additional assumption of the errors being uncorrelated. Since our paper is meant to be a proof of concept, and a comparison between the power spectrum and the bispectrum, we refrain from comparing the  Figure 8: Impact of degeneracies on the parameters error bars. Left) for each cosmological parameter we show the error marginalized over all others parameter, the error marginalized over cosmological parameters, fixing the gas ones, and the conditional error; all divided by the conditional error of power spectrum and bispectrum combined. This is repeated for (top to bottom in each triplet of bars) power spectrum, bispectrum, and combination of the two. The constraining power of the power spectrum is hindered by the marginalization over other parameters; on the other hand the bispectrum, in principle less constraining, is less affected. Right) same, but switching cosmological and gas parameters.   Table 4: Same as table 2 (CVL, max = 5000, f sky = 1) but with the halo redshift and mass boundaries observationally set by Planck .

Gas fixed
In particular we study the impact of massive halos at low redshift, since higher order correlators are incrementally sensitive to the signal from these clusters. In principle, to calculate the y-distortion along the line of sight one has to integrate from 0 to infinity, and again in principle the HMF has been calibrated in [28] integrating clusters up to a mass of 10 16 M . However, the most massive cluster in the Planck SZ cluster catalogue [66] has a mass of 1.61 × 10 15 M , whereas the closest detected cluster is at z = 0.011. To account for these observational constraints we repeat our analysis enforcing these values as upper limits in the redshift and mass integrations. The results that follow are shown in table 4 which is the main result of the paper. Removing massive nearby clusters has a deeper impact on higher order correlators with respect to the power spectrums and the bispectrum, therefore the signal to noise increases for both observables. For the power spectrum, this translates to proportionally tighter constraints on the parameters, compared to the fiducial model. For the bispectrum the picture is more complicated: conditional error bars shrink as expected; but correlations among different parameters become more severe, enlarging the marginalized error bars. In any case, the marginalized errors obtained with the bispectrum, even with this model, still improve the PS one by approximately one order of magnitude. Therefore, our conclusions are qualitatively unchanged. Even more than in the fiducial case, combining power spectrum and bispectrum improves the constraining power and reliability of the results. Since the correlations are particularly severe for a subset of parameters, fixing even a limited number of them would allow for recovering most of the constraining power we forecasted with the Tinker fiducial model, as seen in figure 10. Nevertheless we conservatively quote as marginalized errors the results obtained without fixing any parameter.

Analysis of a subset of configurations
To validate our statement that the bispectrum is effective at breaking degeneracies because of the shape-dependent response to changes in parameters, combined with the large number of available triangles, we repeat our analysis limiting ourselves to only equilateral configurations, only squeezed configurations, and a combination of the two.  Table 5: Ratio of the equilateral (Eq.), squeezed (Sq.), joint equilateral-plus-squeezed (Eq.+Sq.), and full bispectrum error to the power spectrum one. After marginalization, the constraining power of the bispectrum in one single limit is degraded comparably to the power-spectrum. The same does not apply when we consider the combination of equilateral and squeezed limit. Obviously, the full bispectrum -that encompasses many other configurations, orthogonals and all the intermediate ones -is even more stable under the marginalization.
If we consider conditional errors (left part of table 5), the power spectrum constraining power exceeds the bispectrum The equilateral configurations contribute the most for the bispectrum, whereas the squeezed ones have one order of magnitude less constraining power. Upon marginalization (right part of the table), the constraining power of the squeezed and equilateral bispectrum degrades, as it does for the power spectrum, with the squeezed one degrading noticeably less. However, when we jointly analyze equilateral and squeezed limits, the degeneracies start to break and the bispectrum marginalized errors start to be competitive against the power spectrum ones. Obviously, the full bispectrum comprises even more kinds of configuration that can further ease the remaining degeneracies. We can therefore conclude that, as we claimed in section 4, the reason why the bispectrum constraining power is not as sensitive as the power spectrum one to parameter degeneracies can be explained with the presence of many more modes that are all weighted differently when the value of the parameters changes.

Validation against Planck noise and foreground contamination
The Planck satellite has provided a full sky map of the Compton-y parameter [6]. This data was used to measure the the tSZ power spectrum and the bispectrum, and the former was used to constraint cosmological parameters. It is interesting to understand what are the prospective results for a full fledged analysis of the bispectrum recovered from Planck data. To do so, we consider the Planck Gaussian noise and foreground determined in [6], and we restrict our analysis in the multipole range [70,1000] to remove the bulk of non-Gaussian noise and remaining spurious contributions. This was performed according to what has been done in [6,25]. We show our results in   cannot be meaningfully constrained separately, since are degenerate at power spectrum level. Insead, they can be combined in a parameter that controls the overall amplitude, such as [9]. In fact, we see that after marginalization the relative errors are O(100), while using the full bispectrum even Planck data could lead to O(1) relative errors even after marginalizing over all other parameters.
Introducing the cuts in mass and redshift have a comparatively similar impact as in the previous case, as shown in table 7.

Forecast for future realistic surveys
To assess what are the prospects of full bispectrum analysis in the near future and in the mid-term, we use the results on section 5 in the Fisher matrix formalism to model the impact  of Gaussian noise and foreground. As already discussed, the bulk of non-Gaussian contaminations can be removed enforcing a cut on the lowest multipole, which has little to no impact on the forecasted constraints.
In particular we consider two specific surveys, that are representative of what can be achieved in the next decade and in the next ≈ 30 years: Simons Observatory and the proposed Voyage 2050 Spectro-Polarimeter, respectively.
Simons Observatory (SO) [67] is an observational facility currently being built in the Atacama desert, that will be devoted to the measurement of gravitational lensing, tSZ effect, CMB temperature and polarization on very small scales. It comprises one 6 m Large Aperture Telescope (LAT) and three 0.5 m Small Aperture Telescopes. Here we just consider the LAT instrument, whose noise profile is stated in table 1 of [67]. The observation field will amount to 40% of the sky but, since we expect that some masking will be anyway needed, we conservatively set f sky = 0.30 in the relative forecast. The results are reported in table 8. Despite the lower sky coverage, the higher raw sensitivity and beam size will allow SO to greatly improve constraints over Planck.
The Voyage 2050 Spectro-Polarimeter (V-SP) [68] is an L-class mission concept that has been put forward for the ESA call Voyage 2050. V-SP will map the whole sky, and therefore we conservatively assume it will use a mask much similar to the one employed by Planck to generate the y map. Hence, when forecasting its performance we will take f sky = 0.47. We use the noise profile reported in table 1 of [68]. Table 9 show the results in this scenario; the performance incrementally increases over the SO forecast.
It is interesting to notice that V-SP will already be close to saturating the cosmic variance limited constraints. Since from the technological point of view we could already have the raw sensitivity to get close to the cosmic variance, this reinforces the need for developing new better methods of foreground removal, that will allow us to make use of this potential.
For a visual comparison between a CVL survey, Planck, SO and V-SP, we refer to appendix C.  Table 9: Conditional and marginalized error forecasts for an experiment with V-SP-like noise level and foregrounds, max = 5000, and f sky = 0.47.

Discussion and Conclusions
In this work, we have carried out a joint Fisher analysis of the thermal Sunyaev-Zeldovich power spectrum and bispectrum. In our analysis, we have gone beyond similar studies by significantly increasing the level of accuracy and realism of our forecasts. This has been done in several ways. First of all, we have evaluated the full power spectrum and bispectrum (auto-and cross-) covariance in the analysis, including all NG contributions. This turns out to be important, since we have found that the covariance is dominated by the connected 6-point component. Furthermore, we have considered a multi-component energy spectrum scenario and we have modeled the effects of component separation in our forecasts, via an effective noise term, which was obtained from a simple ILC procedure. Finally, rather than focusing just on σ 8 or on a small number of parameters, we have considered an extended set of both cosmological and ICM parameters, with the aim to accurately assess correlations and degeneracies and how these are dealt with by both the power spectrum and bispectrum.
In the end, we find out that the tSZ bispectrum is a very powerful observable, able to produce even stronger constraints than the tSZ power spectrum, after marginalization in a multi-parameter analysis. This is shown in our main results, summarized in figure 8 and  table 4.
Several reasons have been already pointed out in previous studies, which explain why the bispectrum is so useful in this type of analysis. For example it has been observed that the bispectrum is less affected by uncertainties in ICM parameters, because its contributions come from (better understood) low-redshift, high-mass clusters [13]. Another important point is that the bispectrum can break degeneracies that are present in a power spectrum-only analysis, through a different amplitude scaling with parameters [15,16].
On top of these previously known aspects, our main finding is that the bispectrum is extremely efficient at breaking degeneracies not only via amplitude scaling, but also -and mostly -due to the fact that different triangle shapes (e.g. equilateral and squeezed triangles) are affected differently by parameter changes. This is a somewhat counter-intuitive result: tSZ statistics are dominated by the one-halo term, which leads to a weak shape dependence on cosmology. Yet, we see that this is already strong enough in the bispectrum to produce large improvements in the final results. Furthermore, the triangle shape dependence on ICM parameters is of course stronger than on cosmology, significantly lowering the impact of astrophysical uncertainties in the cosmological analysis (as well as allowing for a precise measurement of astrophysical parameters themselves, possibly complementing [22]). Given its importance, we have studied this effect in detail. To this purpose, we have isolated two specific types of triangles, namely equilateral and squeezed. We have then verified that including only one configuration type in the bispectrum analysis leads to only modest improvements in the final results, whereas including both at the same time produces significantly better forecasts; we have checked that is precisely due to a slightly different response to changes in parameters, in the squeezed and equilateral limit. To further investigate such behaviour, we have isolated the regions in the M -z plane which mostly contribute to the bispectrum in different limits, showing that such regions do not fully overlap. In other words, when we compute equilateral and squeezed bispectrum configurations, we effectively integrate the halo mass function over slightly different intervals in mass and redshift. The effect of this on parameters is fairly small for a single triangle. However, this adds up in a significant way when we produce the final forecasts, by integrating over the very large number of available configurations.
The results obtained in our study clearly suggest that a joint power spectrum-bispectrum analysis of, e.g., Planck data, or a complete likelihood study of this kind, using mock datasets for future experiments, is clearly worth pursuing. This is the object of ongoing work. Another interesting subject for future investigation consists in accounting for spectral corrections to tSZ distortions, arising from relativistic speeds of electrons in clusters [69] (the so called relativistic SZ). Such corrections are mass and redshift dependent [70,71] through the cluster temperature scaling [72] and therefore could provide further help to break parameter degeneracies if better sensitivity is achieved. Further synergies, motivated by the recent interest in the bispectrum of other cosmological probes [49,[73][74][75][76][77], could be achieved considering the cross-bispectrum of tSZ with other tracers of the low-redshift Universe matter distribution.

A.1 Correlation functions for the projected Compton-y field
The total Compton-y field in a given direction n of the sky is defined via the following line-of-sight integral [7] y (n) = dt y 3D (χ(t)n) = dχ a (χ) y 3D (χ(t)n) . (A.1) The three-dimensional Compton-y field is directly related to the electron pressure profile of a single halo P e via the following re-scaling r being the comoving distance from the center of the halo. Under the assumption of small enough scales, we can focus on the two-dimensional Fourier transform of the field (A.1) employing the flat-sky approximatioñ In Fourier space, we define the connected part of the n-point correlation function of the two-dimensional fieldỹ We call the quantity P n ( 1 , . . . , n ) polyspectrum of order n. In eq. (A.4) we described the field on a flat sky, i.e. we approximate the full spherical harmonics decomposition of the real field with a simple two-dimensional Fourier transform. By replacing eq. (A.3) into eq. (A.4), we can derive the expression for the flat-sky polyspectra. In general, the redshift integration appearing in eq. (A.3) would naturally translate into a complex n-dimensional one. To simplify this calculation, we make use of the Limber approximation [78]: we assume that the three-dimensional matter polyspectra have a weak dependence on the momenta component corresponding to the line-of-sight direction. Consequently, the projection collapses into a simple one-dimension redshift integration and we can relate the angular multipoles to the three-dimensional momenta k via the well known Limber relation k ( , z) ≈ { /χ (z) , 0}. Finally, the general n-point polyspectrum P n ( 1 , . . . , n ) relates to the same order one P y 3D n (k 1 , . . . , k n ) for the three-dimensional Compton-y field (A.2) via P n ( 1 , . . . , n ) = ∞ 0 dχ χ 2−2n a n (χ) P y 3D n (k ( 1 , χ) , . . . , k ( n , χ)) ( 1 , z) , . . . , k ( n , χ)) , where we define the kernel For sake of completeness, let us recall the definition for the quantity P y 3D n (k 1 , . . . , k n ) As we do rely on the halo model for the matter clustering, we can write the electron pressure profile y 3D (x) ∝ P e (x) (A.2) (and its Fourier transform) as the sum of contributions from different halos, the mass of the i th halo being m i , centered at position x ĩ The contributionỹ 3D (k, m) from the single halo of mass m c 500 can be obtained as y 3D (k, m) = 4π σ T m e c 2 dr r 2 j 0 (kr) P e (m, r) = 4πr 500 σ T m e c 2 dx x 2 j 0 (kxr 500 ) P e (m c 500 , x) , where we employed the mass definition m c 500 and the variable x as requested for employing the parametrisation (2.11). By replacing eq. (A.9) within eq. (A.7), it is possible to obtain any desired order of correlation for the three-dimensional Compton-y parameter. Furthermore, by splitting the sum in eq. (A.9) into contributions from different multi-halo configurations, we do obtain the well known hierarchy of the different halo terms. We can finally obtain the observable of interest via eq. (A.4). For consistency with the literature, we call the 2-, the 3and the 4-point polyspectrum power spectrum, bispectrum and trispectrum, respectively: We underline that the assumption of an isotropic and homogeneous Universe (Cosmological Principle) allows us to reduce the actual dependencies of the polyspectra. The power spectrum is expressed as function of the module of the momentum (we define ≡ | |) and the bispectrum has a dependence on just 3 degrees of freedom, i.e. the edges of the associated triangular configuration [79]. In the following, we will show in details the equations required for our implementation as obtained from the formalism above.

A.2 Data vector
As far as the data vector of our analyses is concerned, we can write the power spectrum as the sum of the one-and two-halo term where we introduced the abbreviation k z ≡ |k ( , χ (z)) | and the mass m ν is related to the parameter ν via σ(m ν ) = δ/ν. To simplify the expression for the general matter polyspectrum, we can introduce the following quantity where b (0) ≡ 1 and we omitted the redshift dependence. Then, the Compton-y power spectrum (A.14) can be written in a more synthetic way as C = dz Q (2) (z) I 0 2 (k z , k z , z) + I 1 1 (k z , z) 2 P lin. (k z , z) . (A.18) Borrowing the above notation, we can write the bispectrum used in this work as the sum of the one-and two-halo term As already mentioned in the main text, we do not include the three-halo term as it would require the second order halo bias for which a fit has not yet been performed.

A.3 Covariance matrix
Moving to the covariance matrix, we employ the full expressions (A.14) and (A.19) whenever power spectra and bispectra are required. For higher order correlation functions entering the covariance computation, we rely on their respective one-halo component, which can be written in a general fashion as P 1h n ( 1 , . . . , n ) = dz Q (n) (z) I 0 n k z 1 , . . . , k z n ; z .

B Noise and foreground spectral shape
Here we enumerate the energy and angular dependence of the spectral components we used in the forecasts for next generation surveys. For convenience, we express the SED in terms of thermodynamic temperature rather than antenna temperature dividing them by the blackbody derivative with θ ν = (1/ √ 8 ln 2) π/(180 × 60) FWHM ν , where ∆T ν and FWHM ν are the instrument thermodynamic-temperature-error and Full Width at Half Maximum (FWHM) measured in T CMB units and arcseconds respectively, for the channel ν.
Free free.
Radio point sources.
Infrared point sources.

C Full triangle plots
For completeness we report here the triangle plot for all the parameters. In figure 9 we compare the power spectrum and bispectrum covariance ellipses for a noiseless survey with max = 5000, f sky = 1, and no foreground contamination, assuming the full integration domain in redshift and masses from [28]. A comparison to the case in which the integration boundaries have been set according to the Planck observation is drawn in figure 10. Figure 11 and figure 12 show the triangle plots for actual surveys, taking into account both instrumental noise and foreground contamination; Planck in figure 11, and SO and V-SP in 12. For reference, we also show there the results for the CVL survey with max = 1000 and 5000 respectively.

D Binning
As there is not a general recipe to choose a priori the best binning scheme, we validated our choice both calculating the correlation between the binned and unbinned bispectrum, and by comparing different schemes.   Figure 9: Triangle plots for a Cosmic variance limited experiment with perfect foreground separation, max = 5000, and f sky = 1. Notice that the power spectrum 1σ ellipses have been rescaled by a factor to fit in the same graph. The grey bands are the power spectrum conditional errors. Figure 7 is a sub-sample of this one.
In general, one can evaluate the correlation between two signals, and verify if they can be distinguished from each other using a given set of data, calculating theirs scalar product using the covariance matrix as metric [80][81][82] In [80][81][82], this method was employed in the case of the CMB primary bispectrum. That relatively simpler case allowed them to carry an exact calculation of the correlation. In our case this is not possible, and we have to resort to two approximations. First, in principle, the  We also try to fix n s and α P to show the impact of the most degenerate parameters, that could be determined using external data.
covariance would not depend on the tested models. In the case of the primary bispectrum, the weakly non-Gaussian limit can be employed, so that the covariance only depends on the C and instrumental noise, both of which are independent of the bispectrum. For an actual survey, the covariance could instead be constructed by the collected data and, as such, it would again be independent from the theoretical model. Instead, in our case, we assume that the covariance is the one calculated using our fiducial model, and we do not vary it while we compare another model to the fiducial one. Second, the theoretically sound way to test the binning would be to compare the binned bispectrum to the full bispectrum calculated on  every multipole, just repeating the representative value of each bin for every configuration in the bin. However, this leads to the problem that the covariance matrix quickly becomes ill-conditioned when the binning is made finer and finer. This problem was again not present in the case of the primary bispectrum since in the weakly non-Gaussian limit the covariance matrix is diagonal and, as such, trivial to invert. To overcome this problem, we exploit the fact that the bispectrum is a monotonous function of ( 1 , 2 , 3 ). The values of the bispectrum in the bin that deviate the most from the bin representative will therefore be the two calculated at the extreme bin boundaries, e.g. We have verified that changing the maximum and minimum multipoles ([10, 5000], [70,5000], [10,3000]), the number of bins (27,25,32) has little effect ( 10%) on the signal to noise ratio in the case of logarithmically spaced bins. The same applies when using (according to [25]) a combination of linearly and logarithmically spaced bins (linear with ∆ = 64 in [32,1440], 10 logarithmically spaced ones in [1440, 5000]).

E Bispectrum derivatives and breaking the degeneracies
To partially motivate the difference among the derivatives of squeezed and equilateral bispectrum, we investigated the impact of parameter changes on the bispectrum kernel. Not to have to deal with numerical convergence of derivatives in each point we consider the bispectrum kernel evaluated with the fiducial parameters, compared with the same evaluated with one parameter increased by 10%, and finite differences defined as where θ is the parameter being varied andθ is the vector of all the others which have been kept fixed. We already argued that isoperimetric bispectrum configurations probe quite similar regions in the z-M plane, but here we need to focus on the subtle differences that are still present. In the left four panels of figure 13 we therefore compare the bispectrum kernel of the isoperimetric (10,2995,2995) and (2000,2000,2000) configurations, calculated for different combination of parameters. Different parameters modify the kernel in different way (first vs second row), but also different configurations react differently (left vs central column). To see this, we show contour lines for the kernel calculated with the parameter fiducial value (solid) and with one of the parameters (Ω m on top, β G at the bottom) increased by 10%. This can be better appreciated in the right column, where we carried out the mass integral and plotted the finite differences as defined in eq. (E.1). The bispectrum derivatives (which are in a sense related to the integral of the curves plotted there) are not only different between configurations (squeezed or equilateral) but different parameters affect the various configurations differently (the ratio of the curves in the top panel is different from the ratio of the two curves in the bottom panel). Figure 13: Effect of parameter variations on the bispectrum kernel for two isoperimetric configurations: (10,2995,2995) and (2000,2000,2000). In the top row we assess the change due to an increase in Ω m , and in the bottom row an increase in β G . Left and centre) contour plot of the bispectrum kernel in the redshift-mass plane. Solid lines refer to the kernel calculated with the fiducial set of parameters, dashed lines to the kernel calculated with Ω m or β G increased by 10%. The two left panels display a squeezed bispectrum configuration, while the two central panels an equilateral one. The contour lines are traced at 1/10th and half maximum. Right) finite differences, as defined in eq. (E.1), of the kernel integrated over all masses.