==Power-law probability distributions==
In a looser sense, a power-law [[probability distribution]] is a distribution whose tail distribution (survival function) has the form, for large values of <math>x</math>,<ref>N. H. Bingham, C. M. Goldie, and J. L. Teugels, Regular variation. Cambridge University Press, 1989</ref>

:<math>P(X>x) \sim L(x) x^{-(\alpha-1)}</math>

where <math>\alpha > 1</math>, and <math>L(x)</math> is a [[slowly varying function]], which is any function that satisfies <math>\lim_{x\rightarrow\infty} L(r\,x) / L(x) = 1</math> for any positive factor <math>r</math>. This property of <math>L(x)</math> follows directly from the requirement that <math>p(x)</math> be asymptotically scale invariant; thus, the form of <math>L(x)</math> only controls the shape and finite extent of the lower tail. For instance, if <math>L(x)</math> is the constant function, then we have a power law that holds for all values of <math>x</math>. In many cases, it is convenient to assume a lower bound <math>x_{\mathrm{min}}</math> from which the law holds. Combining these two cases, and where <math>x</math> is a continuous variable, the power law has the form of the [[Pareto distribution]]

:<math>p(x) = \frac{\alpha-1}{x_\min} \left(\frac{x}{x_\min}\right)^{-\alpha},</math>

where the pre-factor <math>\frac{\alpha-1}{x_\min}</math> is the [[normalizing constant]]. We can now consider several properties of this distribution. For instance, its [[Moment (mathematics)|moments]] are given by

:<math>\mathbb{E} \left(X^{m} \right) = \int_{x_\min}^\infty x^{m} p(x) \,\mathrm{d}x = \frac{\alpha-1}{\alpha-1-m}x_\min^m</math>

which is only well defined for <math>m < \alpha -1</math>. That is, all moments of order <math>m \geq \alpha - 1</math> diverge: when <math>\alpha\leq 2</math>, the average and all higher-order moments are infinite; when <math>2<\alpha<3</math>, the mean exists, but the variance and higher-order moments are infinite, etc. For finite-size samples drawn from such a distribution, this behavior implies that the [[central moment]] estimators (like the mean and the variance) for diverging moments will never converge: as more data is accumulated, they continue to grow. These power-law probability distributions are also called Pareto-type distributions, distributions with Pareto tails, or distributions with regularly varying tails.
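The moment formula and the non-convergence of sample estimators can be illustrated numerically. The following is a minimal sketch, not a reference implementation: the function names <code>pareto_moment</code> and <code>sample_pareto</code>, the seed, and the parameter values are assumptions chosen for illustration. Sampling uses inverse-transform sampling with <math>L(x)</math> constant, so that <math>\Pr(X > x) = (x/x_\min)^{-(\alpha-1)}</math>.

```python
import math
import random


def pareto_moment(m, alpha, x_min):
    """Analytic m-th moment of the Pareto density; infinite when m >= alpha - 1."""
    if m >= alpha - 1:
        return math.inf
    return (alpha - 1) / (alpha - 1 - m) * x_min ** m


def sample_pareto(n, alpha, x_min, rng):
    """Inverse-transform sampling: solve Pr(X > x) = u for x, with u uniform on (0, 1]."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1)) for _ in range(n)]


rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
for alpha in (3.5, 1.8):  # finite mean versus infinite mean
    for n in (10**3, 10**5):
        xs = sample_pareto(n, alpha, 1.0, rng)
        # Print the sample mean next to the analytic first moment.
        print(alpha, n, sum(xs) / n, pareto_moment(1, alpha, 1.0))
```

For <math>\alpha = 3.5</math> the sample mean should settle near the analytic value as <math>n</math> grows, whereas for <math>\alpha = 1.8</math> the analytic mean is infinite and the sample mean is dominated by the largest observations.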

A modification, which does not satisfy the general form above, with an exponential cutoff,{{sfn|Clauset|Shalizi|Newman|2009}} is

:<math>p(x) \propto L(x) x^{-\alpha} \mathrm{e}^{-\lambda x}.</math>

In this distribution, the exponential decay term <math>\mathrm{e}^{-\lambda x}</math> eventually overwhelms the power-law behavior at very large values of <math>x</math>. This distribution does not scale{{Explain|date=December 2023}} and is thus not asymptotically a power law; however, it does approximately scale over a finite region before the cutoff. The pure form above is a subset of this family, with <math>\lambda=0</math>. This distribution is a common alternative to the asymptotic power-law distribution because it naturally captures finite-size effects.
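One way to see what "does not scale" means here is to compare the ratio <math>p(rx)/p(x)</math> at several values of <math>x</math>: for a pure power law the ratio is the same everywhere, while the cutoff introduces an extra factor <math>\mathrm{e}^{-\lambda (r-1) x}</math>. The sketch below assumes <math>L(x) = 1</math> and uses unnormalized densities; the function names and parameter values are illustrative only.

```python
import math


def power_law(x, alpha):
    """Unnormalized pure power law x^(-alpha)."""
    return x ** (-alpha)


def power_law_cutoff(x, alpha, lam):
    """Unnormalized power law with exponential cutoff, taking L(x) = 1."""
    return x ** (-alpha) * math.exp(-lam * x)


def rescaling_ratios(f, xs, r=2.0):
    """Return f(r*x) / f(x) for each x; a constant list signals scale invariance."""
    return [f(r * x) / f(x) for x in xs]


xs = [1.0, 10.0, 100.0]
pure = rescaling_ratios(lambda x: power_law(x, 2.5), xs)
cut = rescaling_ratios(lambda x: power_law_cutoff(x, 2.5, 0.01), xs)
print(pure)  # every entry is close to 2 ** -2.5
print(cut)   # entries shrink as x grows, by the extra factor exp(-0.01 * x)
```

With <math>\lambda = 0.01</math> the two lists nearly agree at <math>x = 1</math> and diverge by <math>x = 100</math>, which corresponds to the approximate scaling region before the cutoff.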

The [[Tweedie distributions]] are a family of statistical models characterized by [[Closure (mathematics)|closure]] under additive and reproductive convolution as well as under scale transformation.  Consequently, these models all express a power-law relationship between the variance and the mean.  These models have a fundamental role as foci of mathematical [[Limit (mathematics)|convergence]] similar to the role  that the [[normal distribution]] has as a focus in the [[central limit theorem]].  This convergence effect explains why the variance-to-mean power law manifests so widely in natural processes, as with [[Taylor's law]] in ecology and with fluctuation scaling<ref name=Kendal2011a>{{cite journal | last1 = Kendal | first1 = WS | last2 = Jørgensen | first2 = B | year = 2011 | title = Taylor's power law and fluctuation scaling explained by a central-limit-like convergence | journal = Phys. Rev. E | volume = 83 | issue = 6| page = 066115 | doi=10.1103/physreve.83.066115| pmid = 21797449 | bibcode = 2011PhRvE..83f6115K }}</ref> in physics.  It can also be shown that this variance-to-mean power law, when demonstrated by the [[Tweedie distributions|method of expanding bins]], implies the presence of 1/''f'' noise and that 1/''f'' noise can arise as a consequence of this Tweedie convergence effect.<ref name=Kendal2011b>{{cite journal | last1 = Kendal | first1 = WS | last2 = Jørgensen | first2 = BR | year = 2011 | title = Tweedie convergence: a mathematical basis for Taylor's power law, 1/''f'' noise and multifractality | url = https://findresearcher.sdu.dk:8443/ws/files/55639035/e066120.pdf| journal = Phys. Rev. E | volume = 84 | issue = 6| page = 066120 | doi=10.1103/physreve.84.066120| pmid = 22304168 | bibcode = 2011PhRvE..84f6120K }}</ref>

===Graphical methods for identification===

Although more sophisticated and robust methods have been proposed, the most frequently used graphical methods of identifying power-law probability distributions using random samples are Pareto quantile-quantile plots (or Pareto [[Q–Q plot]]s),{{citation needed|date=May 2012}} mean residual life plots<ref>Beirlant, J., Teugels, J. L., Vynckier, P. (1996) ''Practical Analysis of Extreme Values'', Leuven: Leuven University Press</ref><ref>Coles, S. (2001) ''An introduction to statistical modeling of extreme values''. Springer-Verlag, London.</ref> and [[log–log plot]]s. Another, more robust graphical method uses bundles of residual quantile functions.<ref name=Diaz>{{cite journal | last1 = Diaz |first1=F. J. | year = 1999 | title = Identifying Tail Behavior by Means of Residual Quantile Functions | journal = Journal of Computational and Graphical Statistics | volume = 8 | issue = 3| pages = 493–509 | doi = 10.2307/1390871 |jstor=1390871 }}</ref> (Please keep in mind that power-law distributions are also called Pareto-type distributions.) It is assumed here that a random sample is obtained from a probability distribution, and that we want to know if the tail of the distribution follows a power law (in other words, we want to know if the distribution has a "Pareto tail"). Here, the random sample is called "the data".

==== Pareto Q–Q plots ====
Pareto Q–Q plots compare the [[quantile]]s of the log-transformed data to the corresponding quantiles of an exponential distribution with mean 1 (or to the quantiles of a standard Pareto distribution) by plotting the former versus the latter. If the resultant scatterplot suggests that the plotted points ''asymptotically converge'' to a straight line, then a power-law distribution should be suspected.  A limitation of Pareto Q–Q plots is that they behave poorly when the tail index <math>\alpha</math> (also called Pareto index) is close to 0, because Pareto Q–Q plots are not designed to identify distributions with slowly varying tails.<ref name="Diaz" />
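The coordinates of a Pareto Q–Q plot can be computed as in the following sketch. The plotting position <math>i/(n+1)</math> and the helper names are assumptions, since several conventions are in use. Under the density convention of this section, a sample with an exact Pareto tail gives points on a line of slope <math>1/(\alpha-1)</math>.

```python
import math


def pareto_qq_points(data):
    """Return (exponential quantile, log order statistic) pairs for a Pareto Q-Q plot."""
    xs = sorted(data)
    n = len(xs)
    points = []
    for i, x in enumerate(xs, start=1):
        p = i / (n + 1)  # plotting position (an assumed convention)
        points.append((-math.log(1.0 - p), math.log(x)))
    return points


def ls_slope(points):
    """Ordinary least-squares slope through a list of (u, v) points."""
    n = len(points)
    mu = sum(u for u, _ in points) / n
    mv = sum(v for _, v in points) / n
    num = sum((u - mu) * (v - mv) for u, v in points)
    den = sum((u - mu) ** 2 for u, _ in points)
    return num / den


# Idealized data: exact Pareto quantiles for alpha = 2.5 and x_min = 1.
alpha = 2.5
data = [(1.0 - i / 100) ** (-1.0 / (alpha - 1)) for i in range(1, 100)]
slope = ls_slope(pareto_qq_points(data))
print(slope, 1.0 / (alpha - 1))
```

With real data the points are only expected to approach a straight line in the upper tail, so the slope of a fit through all points is a visual aid rather than an estimator.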

==== Mean residual life plots ====
In its version for identifying power-law probability distributions, the mean residual life plot is constructed by first log-transforming the data and then plotting the average of those log-transformed data that are higher than the ''i''-th order statistic versus the ''i''-th order statistic, for ''i''&nbsp;=&nbsp;1,&nbsp;...,&nbsp;''n'', where ''n'' is the size of the random sample. If the resultant scatterplot suggests that the plotted points tend to stabilize about a horizontal straight line, then a power-law distribution should be suspected. Since the mean residual life plot is very sensitive to outliers (it is not robust), it usually produces plots that are difficult to interpret; for this reason, such plots are usually called Hill horror plots.<ref>{{cite journal | last1 = Resnick | first1 = S. I. | year = 1997 | title = Heavy Tail Modeling and Teletraffic Data | journal = The Annals of Statistics | volume = 25 | issue = 5| pages = 1805–1869 | doi=10.1214/aos/1069362376| doi-access = free }}</ref>
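The construction can be sketched as follows. This sketch makes one interpretive assumption that should be stated plainly: the "average" is taken as the mean ''excess'' of the log-transformed data over the threshold (the usual definition of mean residual life), because it is this quantity that is constant, equal to <math>1/(\alpha-1)</math>, for an exact Pareto tail. The function name is hypothetical, and ties are broken by sorted position.

```python
import math


def mean_residual_life_points(data):
    """Return (threshold, mean excess) pairs computed on the log-transformed data.

    The threshold is the i-th order statistic of the logs; the mean excess is the
    average of the larger log values minus that threshold. The largest observation
    is skipped because nothing lies above it.
    """
    logs = sorted(math.log(x) for x in data)
    n = len(logs)
    points = []
    tail_sum = 0.0
    # Walk from the second-largest value down, accumulating the sum of logs above.
    for i in range(n - 2, -1, -1):
        tail_sum += logs[i + 1]
        count = n - 1 - i
        points.append((logs[i], tail_sum / count - logs[i]))
    points.reverse()
    return points


# Small worked example with logs 0, 1, 2, 3.
demo = mean_residual_life_points([math.exp(k) for k in range(4)])
print(demo)  # approximately [(0, 2.0), (1, 1.5), (2, 1.0)]
```

The rightmost points average over very few observations, which is one source of the instability noted above.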

==== Log-log plots ====
[[File:Log-log plot example.svg|thumb|A straight line on a log–log plot is necessary but insufficient evidence for a power law; the slope of the straight line corresponds to the power-law exponent.]]

[[Log–log plot]]s are an alternative way of graphically examining the tail of a distribution using a random sample. Taking the logarithm of a power law of the form <math>f(x) = ax^{k}</math> results in:<ref>http://www.physics.pomona.edu/sixideas/old/labs/LRM/LR05.pdf</ref>

:<math>\begin{align}
 \log(f(x)) &= \log(ax^{k}) \\
 &= \log(a) + \log(x^k) \\
 &= \log(a) + k \cdot \log(x),
\end{align}</math>

which forms a straight line with slope <math>k</math> on a log–log scale. Caution is needed, however: a straight line on a log–log plot is necessary but insufficient evidence for a power-law relationship, because many non-power-law distributions also appear approximately straight on a log–log plot.{{sfn|Clauset|Shalizi|Newman|2009}}<ref>{{cite web|url=http://bactra.org/weblog/491.html|title=So You Think You Have a Power Law — Well Isn't That Special?|website=bactra.org|access-date=27 March 2018}}</ref> The method consists of plotting the logarithm of an estimator of the probability that a particular value occurs versus the logarithm of that value. Usually, this estimator is the proportion of times that the value occurs in the data set. If the points in the plot tend to converge to a straight line for large values on the ''x'' axis, then the researcher concludes that the distribution has a power-law tail. Examples of the application of these types of plot have been published.<ref>{{cite journal |last1=Jeong |first1=H. |last2=Tombor |first2= B. Albert |last3=Oltvai |first3=Z.N. |last4=Barabasi |first4= A.-L. |year=2000 |title=The large-scale organization of metabolic networks |journal=Nature |volume=407 |issue=6804| pages=651–654 |doi=10.1038/35036627 |pmid=11034217 |arxiv=cond-mat/0010278 |bibcode=2000Natur.407..651J |s2cid=4426931}}</ref> A disadvantage of these plots is that, in order for them to provide reliable results, they require huge amounts of data. In addition, they are appropriate only for discrete (or grouped) data.
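The plotted coordinates can be computed as in the sketch below, which uses the proportion of occurrences as the probability estimator. The helper names and the toy data set are assumptions made for illustration; the data are constructed so that the counts follow <math>x^{-2}</math> exactly. The least-squares slope is shown only to make the straight line concrete and is not a recommended way to estimate the exponent.

```python
import math
from collections import Counter


def loglog_frequency_points(data):
    """Return (log value, log relative frequency) pairs for positive discrete data."""
    counts = Counter(data)
    n = len(data)
    return [(math.log(v), math.log(c / n)) for v, c in sorted(counts.items())]


def ls_slope(points):
    """Ordinary least-squares slope through a list of (u, v) points."""
    n = len(points)
    mu = sum(u for u, _ in points) / n
    mv = sum(v for _, v in points) / n
    num = sum((u - mu) * (v - mv) for u, v in points)
    den = sum((u - mu) ** 2 for u, _ in points)
    return num / den


# Toy data: counts 64, 16, 4, 1 at values 1, 2, 4, 8, i.e. counts proportional to x^-2.
toy = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
points = loglog_frequency_points(toy)
slope = ls_slope(points)
print(slope)  # close to -2 for this constructed example
```

With sampled rather than constructed data, the rarest values occur only once or twice, so the right-hand end of such a plot is noisy.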

==== Bundle plots ====
Another graphical method for the identification of power-law probability distributions using random samples has been proposed.<ref name="Diaz" /> This methodology consists of plotting a ''bundle for the log-transformed sample''. Originally proposed as a tool to explore the existence of moments and the moment generation function using random samples, the bundle methodology is based on residual [[quantile function]]s (RQFs), also called residual percentile functions,<ref>{{cite journal | last1 = Arnold | first1 = B. C. | last2 = Brockett | first2 = P. L. | year = 1983 | title = When does the βth percentile residual life function determine the distribution? | journal = Operations Research | volume = 31 | issue = 2| pages = 391–396 | doi=10.1287/opre.31.2.391| doi-access =  }}</ref><ref>{{cite journal | last1 = Joe | first1 = H. | last2 = Proschan | first2 = F. | year = 1984 | title = Percentile residual life functions | journal = Operations Research | volume = 32 | issue = 3| pages = 668–678 | doi=10.1287/opre.32.3.668}}</ref><ref>Joe, H. (1985), "Characterizations of life distributions from percentile residual lifetimes", ''Ann. Inst. Statist. Math.'' 37, Part A, 165–172.</ref><ref>{{cite journal | last1 = Csorgo | first1 = S. | last2 = Viharos | first2 = L. | year = 1992 | title = Confidence bands for percentile residual lifetimes | url =https://deepblue.lib.umich.edu/bitstream/2027.42/30190/1/0000575.pdf | journal = Journal of Statistical Planning and Inference | volume = 30 | issue = 3| pages = 327–337 | doi=10.1016/0378-3758(92)90159-p| hdl = 2027.42/30190 | hdl-access = free }}</ref><ref>{{cite journal | last1 = Schmittlein | first1 = D. C. | last2 = Morrison | first2 = D. G. | year = 1981 | title = The median residual lifetime: A characterization theorem and an application | journal = Operations Research | volume = 29 | issue = 2| pages = 392–399 | doi=10.1287/opre.29.2.392}}</ref><ref>{{cite journal | last1 = Morrison | first1 = D. G. 
| last2 = Schmittlein | first2 = D. C. | year = 1980 | title = Jobs, strikes, and wars: Probability models for duration | journal = Organizational Behavior and Human Performance | volume = 25 | issue = 2| pages = 224–251 | doi=10.1016/0030-5073(80)90065-3}}</ref><ref>{{cite journal | last1 = Gerchak | first1 = Y | year = 1984 | title = Decreasing failure rates and related issues in the social sciences | journal = Operations Research | volume = 32 | issue = 3| pages = 537–546 | doi=10.1287/opre.32.3.537}}</ref> which provide a full characterization of the tail behavior of many well-known probability distributions, including power-law distributions, distributions with other types of heavy tails, and even non-heavy-tailed distributions. Bundle plots do not have the disadvantages of Pareto Q–Q plots, mean residual life plots and log–log plots mentioned above (they are robust to outliers,  allow visually identifying power laws with small values of <math>\alpha</math>, and do not demand the collection of much data).{{citation needed|date=May 2012}} In addition, other types of tail behavior can be identified using bundle plots.

===Plotting power-law distributions===
In general, power-law distributions are plotted on [[log–log plot|doubly logarithmic axes]], which emphasizes the upper tail region. The most convenient way to do this is via the (complementary) [[cumulative distribution function#Complementary cumulative distribution function (tail distribution)|cumulative distribution]] (ccdf), that is, the [[survival function]], <math>P(x) = \mathrm{Pr}(X > x)</math>,

:<math>P(x) = \Pr(X > x) = \int_x^\infty p(X)\,\mathrm{d}X = \frac{\alpha-1}{x_\min^{-\alpha+1}} \int_x^\infty X^{-\alpha}\,\mathrm{d}X = \left(\frac{x}{x_\min} \right)^{-(\alpha-1)}.</math>

The ccdf is also a power-law function, but with a smaller scaling exponent, <math>\alpha-1</math>. For data, an equivalent form of the ccdf is the rank-frequency approach, in which we first sort the <math>n</math> observed values in ascending order, and plot them against the vector <math>\left[1,\frac{n-1}{n},\frac{n-2}{n},\dots,\frac{1}{n}\right]</math>.
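The rank-frequency construction amounts to only a few lines of code. The following sketch (the function name is an assumption) pairs each sorted observation with the corresponding entry of the vector above; plotting the pairs on doubly logarithmic axes gives the empirical survival function.

```python
def empirical_ccdf(data):
    """Pair each sorted value with the fraction of observations at or above its rank.

    The i-th smallest value (counting from zero) is paired with (n - i) / n, so the
    second coordinates run 1, (n-1)/n, ..., 1/n.
    """
    xs = sorted(data)
    n = len(xs)
    return [(x, (n - i) / n) for i, x in enumerate(xs)]


print(empirical_ccdf([3, 1, 2, 4]))
# [(1, 1.0), (2, 0.75), (3, 0.5), (4, 0.25)]
```

No binning is involved, so every observation appears as its own point.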

Although it can be convenient to log-bin the data, or otherwise smooth the probability density (mass) function directly, these methods introduce an implicit bias in the representation of the data, and thus should be avoided.{{sfn|Clauset|Shalizi|Newman|2009}}<ref>{{cite journal|title=Parameter estimation for power-law distributions by maximum likelihood methods|journal= European Physical Journal B|volume=58 |issue=2|pages=167–173|author=Bauke, H. |doi=10.1140/epjb/e2007-00219-y|year=2007|arxiv=0704.1867 |bibcode=2007EPJB...58..167B|s2cid=119602829}}</ref> The survival function, on the other hand, is more robust to (but not free of) such biases in the data and preserves the linear signature on doubly logarithmic axes. Although a survival-function representation is preferable to the pdf when fitting a power law to the data with the linear least-squares method, it is not devoid of mathematical inaccuracy. Thus, when estimating the exponent of a power-law distribution, the maximum likelihood estimator is recommended.

===Estimating the exponent from empirical data===
There are many ways of estimating the value of the scaling exponent for a power-law tail; however, not all of them yield [[Maximum likelihood estimation#Second-order efficiency after correction for bias|unbiased and consistent answers]]. Some of the most reliable techniques are based on the method of [[maximum likelihood estimation|maximum likelihood]]. Alternative methods are often based on making a linear regression on either the log–log probability, the log–log cumulative distribution function, or on log-binned data, but these approaches should be avoided as they can all lead to highly biased estimates of the scaling exponent.{{sfn|Clauset|Shalizi|Newman|2009}}

====Maximum likelihood====

For real-valued, [[independent and identically distributed]] data, we fit a power-law distribution of the form

: <math>p(x) = \frac{\alpha-1}{x_\min} \left(\frac{x}{x_\min}\right)^{-\alpha}</math>

to the data <math>x\geq x_\min</math>, where the coefficient <math>\frac{\alpha-1}{x_\min}</math> is included to ensure that the distribution is [[Normalizing constant|normalized]].  Given a choice for <math>x_\min</math>, the log likelihood function becomes:

:<math>\mathcal{L}(\alpha)=\log  \prod _{i=1}^n \frac{\alpha-1}{x_\min} \left(\frac{x_i}{x_\min}\right)^{-\alpha}</math> 
The maximum of this likelihood is found by differentiating with respect to parameter <math>\alpha</math>, setting the result equal to zero. Upon rearrangement, this yields the estimator equation:

:<math>\hat{\alpha} = 1 + n \left[ \sum_{i=1}^n \ln \frac{x_i}{x_\min} \right]^{-1}</math>

where <math>\{x_i\}</math> are the <math>n</math> data points <math>x_{i}\geq x_\min</math>.<ref name=Newman/><ref name=Hall/> This estimator exhibits a small finite sample-size bias of order <math>O(n^{-1})</math>, which is small when ''n''&nbsp;>&nbsp;100. Further, the standard error of the estimate is <math>\sigma = \frac{\hat{\alpha}-1}{\sqrt{n}} + O(n^{-1})</math>. This estimator is equivalent to the popular{{citation needed|date=June 2012}} [[Hill estimator]] from [[quantitative finance]] and [[extreme value theory]].{{citation needed|date=June 2012}}
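The continuous estimator and its leading-order standard error can be computed directly from the two formulas above. The sketch below assumes that <math>x_\min</math> is supplied by the caller; the function name and the toy data are illustrative.

```python
import math


def fit_alpha_mle(data, x_min):
    """Continuous power-law MLE for alpha, given x_min.

    Returns (alpha_hat, standard_error, n), where n counts the points >= x_min and
    the standard error is the leading-order term (alpha_hat - 1) / sqrt(n).
    """
    tail = [x for x in data if x >= x_min]
    n = len(tail)
    log_sum = sum(math.log(x / x_min) for x in tail)
    if n == 0 or log_sum <= 0.0:
        raise ValueError("need at least one observation strictly above x_min")
    alpha_hat = 1.0 + n / log_sum
    std_err = (alpha_hat - 1.0) / math.sqrt(n)
    return alpha_hat, std_err, n


# Toy data: four points at e (each contributing ln(x / x_min) = 1) and one below x_min.
alpha_hat, std_err, n = fit_alpha_mle([math.e] * 4 + [0.5], x_min=1.0)
print(alpha_hat, std_err, n)  # approximately 2.0, 0.5, 4
```

Observations below <math>x_\min</math> are discarded before fitting, so <math>n</math> in the formulas is the number of tail points rather than the full sample size.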

For a set of ''n'' integer-valued data points <math>\{x_i\}</math>, again where each <math>x_i\geq x_\min</math>, the maximum likelihood exponent is the solution to the transcendental equation

: <math>\frac{\zeta'(\hat\alpha,x_\min)}{\zeta(\hat{\alpha},x_\min)} = -\frac{1}{n} \sum_{i=1}^n \ln x_i </math>

where <math>\zeta(\alpha,x_{\mathrm{min}})</math> is the [[Riemann zeta function#Generalizations|generalized (Hurwitz) zeta function]]. The uncertainty in this estimate follows the same formula as for the continuous case. However, the two equations for <math>\hat{\alpha}</math> are not equivalent, and the continuous version should not be applied to discrete data, nor vice versa.

Further, both of these estimators require the choice of <math>x_\min</math>. For distributions with a non-trivial <math>L(x)</math>, choosing <math>x_\min</math> too small produces a significant bias in <math>\hat\alpha</math>, while choosing it too large increases the uncertainty in <math>\hat{\alpha}</math>, and reduces the [[statistical power]] of our model. In general, the best choice of <math>x_\min</math> depends strongly on the particular form of the lower tail, represented by <math>L(x)</math> above.

More about these methods, and the conditions under which they can be used, can be found in Clauset, Shalizi and Newman (2009).{{sfn|Clauset|Shalizi|Newman|2009}} Further, this comprehensive review article provides [http://www.santafe.edu/~aaronc/powerlaws/ usable code] (Matlab, Python, R and C++) for estimation and testing routines for power-law distributions.

====Kolmogorov–Smirnov estimation====

Another method for the estimation of the power-law exponent, which does not assume [[independent and identically distributed]] (iid) data, uses the minimization of the [[Kolmogorov–Smirnov statistic]], <math>D</math>, between the cumulative distribution functions of the data and the power law:

: <math>\hat{\alpha} = \underset{\alpha}{\operatorname{arg\,min}} \, D_\alpha </math>

with

: <math> D_\alpha = \max_x | P_\mathrm{emp}(x) - P_\alpha(x) | </math>

where <math>P_\mathrm{emp}(x)</math> and <math>P_\alpha(x)</math> denote the cdfs of the data and the power law with exponent <math>\alpha</math>, respectively. As this method does not assume iid data, it provides an alternative way to determine the power-law exponent for data sets in which the temporal correlation cannot be ignored.<ref name=Klaus/>
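The minimization can be sketched with a simple grid search. This is an illustrative sketch under stated assumptions, not the procedure of the cited work: <math>x_\min</math> is assumed known, the model cdf is taken to be the continuous Pareto form <math>1 - (x/x_\min)^{-(\alpha-1)}</math>, the candidate grid is arbitrary, and the function names are hypothetical.

```python
def ks_distance(tail_sorted, alpha, x_min):
    """Kolmogorov-Smirnov distance between the empirical cdf and the Pareto cdf."""
    n = len(tail_sorted)
    d = 0.0
    for i, x in enumerate(tail_sorted):
        model_cdf = 1.0 - (x / x_min) ** (-(alpha - 1.0))
        # The empirical cdf jumps from i/n to (i+1)/n at x, so check both sides.
        d = max(d, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return d


def fit_alpha_ks(data, x_min, alphas):
    """Return the candidate alpha that minimizes the KS distance."""
    tail = sorted(x for x in data if x >= x_min)
    if not tail:
        raise ValueError("no observations at or above x_min")
    return min(alphas, key=lambda a: ks_distance(tail, a, x_min))


# Idealized data: Pareto quantiles for alpha = 2.5, x_min = 1 at levels (i - 0.5) / n.
n = 50
data = [(1.0 - (i - 0.5) / n) ** (-1.0 / 1.5) for i in range(1, n + 1)]
grid = [k / 100 for k in range(150, 401)]  # candidates 1.50, 1.51, ..., 4.00
alpha_ks = fit_alpha_ks(data, 1.0, grid)
print(alpha_ks)
```

A finer grid or a one-dimensional optimizer could replace the grid search; the grid is used here only to keep the example self-contained.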

====Two-point fitting method====
This criterion<ref>{{Cite journal |last1=Guerriero |first1=Vincenzo |last2=Vitale |first2=Stefano |last3=Ciarcia |first3=Sabatino |last4=Mazzoli |first4=Stefano |date=2011-05-09 |title=Improved statistical multi-scale analysis of fractured reservoir analogues |url=https://www.sciencedirect.com/science/article/pii/S0040195111000047 |journal=Tectonophysics |language=en |volume=504 |issue=1 |pages=14–24 |doi=10.1016/j.tecto.2011.01.003 |bibcode=2011Tectp.504...14G |issn=0040-1951}}</ref> can be applied to estimate the power-law exponent in the case of scale-free distributions and provides a more convergent estimate than the maximum likelihood method. It has been applied to study probability distributions of fracture apertures. In some contexts the probability distribution is described not by the [[cumulative distribution function]] but by the [[cumulative frequency analysis|cumulative frequency]] of a property ''X'', defined as the number of elements per meter (or area unit, second etc.) for which ''X''&nbsp;>&nbsp;''x'' applies, where ''x'' is a variable real number. As an example,{{Cn|date=November 2019}} the cumulative frequency of the fracture aperture, ''X'', for a sample of ''N'' elements is defined as "the number of fractures per meter having aperture greater than ''x''". Use of cumulative frequency has some advantages, e.g. it allows one to put on the same diagram data gathered from sample lines of different lengths at different scales (e.g. from outcrop and from microscope).