Answers to referee comments concerning the CMS analysis note AN 2006/003

The updated note (vers.2), taking into account referee comments:

trilap.ps
trilap.tex


Referee Comments for AN 2006/003


Title:           "Trilepton final state from neutralino-chargino
                   production in mSUGRA"

Authors:      W. de Boer, M. Niegel, C. Sander, V. Zhukov (Karlsruhe),
                  K. Mazumdar (TIFR)

Referees:    R. Cavanaugh, A. Tricomi



1. The note clearly represents a significant undertaking and the
referees would like to recognize and commend the authors on the
significant amount of work that this note represents.

2. The current analysis is clearly a model dependent one, as evidenced
by the fact that the di-lepton mass is required to be less than 75
GeV.   This model dependency (which restricts the analysis to low mass
SUSY) should be stated explicitly in the note (and preferably also in
the title).

Included in the Introduction:
"In this paper  a study of the CMS discovery potential for the mSUGRA
pure trilepton final state at low value of $m_{1/2}$..."

3. It is understood that the low mass region is phenomenologically
interesting to dark matter searches and that the trilepton
cross-section rapidly drops with the di-lepton mass.  Nevertheless,
because the trilepton final states are also interesting outside of the
dark matter motivation, the referees would like the authors to also
investigate if it is possible to include di-lepton masses above the Z0
peak (assuming Lint = 30 fb-1).  If this is not possible, then it
would be nice to know how much integrated luminosity is required to
include higher mass SUSY.

We consider the whole mSUGRA parameter plane. Since the neutralino-chargino
production cross section scales with the neutralino mass m as 1/m^4,
it drops fast with m1/2 (m1/2=2.5*m) and depends very weakly on m0.
The m1/2 is related to the gaugino and gluino masses and the m0 to the
scalars. Therefore a measurable signal is expected over a wide
range of scalar masses (i.e. it would be misleading to say that we consider the
low mass SUSY region, which in fact is the bulk region at low m1/2 and m0)
and at relatively low m1/2 (which is naturally constrained by the
decrease of the cross section). The Z peak from the SM backgrounds
disturbs the event selection in the range m1/2 = 230-300 GeV. At m1/2>300 GeV
the cross section is below 1 fb, i.e. Lint>1000 fb^-1 would be needed to see
the signal.
Included in the text:
"For larger m1/2 the trilepton production cross section is below 1 fb
and would require Lint>1000 fb-1"
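
The relation between cross section and integrated luminosity used in this
argument can be sketched as follows. This is only an illustration: the
function names and the optional selection efficiency are assumptions made
for the example and are not taken from the note.

```python
def expected_events(cross_section_fb, lumi_fb_inv, efficiency=1.0):
    """Expected event count N = sigma * Lint * efficiency."""
    return cross_section_fb * lumi_fb_inv * efficiency


def lumi_for_events(n_events, cross_section_fb, efficiency=1.0):
    """Integrated luminosity (in fb^-1) needed to collect n_events."""
    return n_events / (cross_section_fb * efficiency)


if __name__ == "__main__":
    sigma_fb = 1.0  # cross section at the 1 fb boundary quoted in the text
    # Produced (not yet selected) events at the two luminosities discussed.
    print(expected_events(sigma_fb, 30.0))
    print(expected_events(sigma_fb, 1000.0))
```

Any realistic estimate would multiply by the actual selection efficiency,
which is not quoted here.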

4. In section 2.2, the note states that "the QCD and W+Jets
backgrounds have been considered at generator level [only] and were
found to have a negligible contribution to the trilepton final state.
After reconstruction, the fake leptons can appear which can increase
number of background channels.  The detailed study of the fakes is out
of the scope of the current paper, here only estimations are
presented."

The referees understand that by requiring three isolated leptons, the
QCD and W+jets backgrounds are significantly reduced.  However because
of the enormous QCD and high W+Jets cross-sections as well as the soft
lepton requirements (this work considers muons above 5 GeV and
electrons above 10 GeV) used in this analysis, the referees consider
this to be a very serious issue which must be addressed:  fake leptons
will arise due to many effects such as, jets faking electrons,
pi-zeros faking electrons, charged pions and kaons decaying in the
detector volume producing (real!) muons, high pT jets faking muons via
punch through, etc, etc.  The lack of three leptons at generator level
does not imply that one will not reconstruct three (low pT!) isolated
leptons at reconstruction level.  The referees request that the
authors either (1) include QCD and W+jets as reconstructed backgrounds
in the analysis, or (2) establish a reliable estimate for the
systematic uncertainty for ignoring QCD and W+jets in this work.

The W+jets and QCD cross sections would require the simulation of >10^8 events,
which is unfortunately not feasible.
However, we simulated some samples and included them in the analysis.
An upper limit is set on the possible contribution from these channels.
Very large data samples are also required for a systematic study
of the reconstruction fakes. The fake rates are relevant to many CMS analyses
and would require a dedicated study, which we have just started.
The analysis of fakes is included in the updated note as requested.
This analysis is based on limited statistics and should be considered
an estimate. The fake rates are evaluated individually for each background
channel and estimates of the number of fake events are presented.
The obtained fake rates, ~10^-5 for muons and ~10^-4 for electrons, are
compatible with the CMS IN 2005/028 results obtained for W+jets.

5. In section 2.2, the note states that "For all backgrounds the Z and
W bosons were forced to decay leptonically..."

The referees have the same comment as above:  a lack of leptons
produced at generator level, does not mean that fake leptons will not
be reconstructed at reconstruction level (particularly for low pT
leptons).  This effect is mitigated partly by requiring three isolated
leptons in the analysis; however, by ignoring the hadronic decays of the W and Z
bosons, the current analysis is ignoring possible significant
background contributions arising from fake leptons.  The referees
request that the authors either (1) include hadronic decays of the W
and Z bosons in the reconstructed part of the analysis, or (2)
establish a reliable estimate for the systematic uncertainty for
ignoring such contributions.

This depends on the channel; for example, it can be relevant for ZZ with
one Z->2l and the second Z->2j, etc.
For DY or Z+jets the 2 OSSF leptons are important to pass the analysis, and
the effect of more than one fake in an event is quite small.
However, we consider the full samples as suggested, and
the contribution of the fakes is evaluated in the note.


6. In section 2.2, the note states that "...for the Z+jets and DY,
trileptons were preselected at generator level to have Pt > 5 GeV/c
and |eta| < 2.4.  This preselection can lead to underestimation of the
fake [rate]."

Because reconstructed muons are considered above 5 GeV and
reconstructed electrons above 10 GeV (OSSF muons are actually required
to be above 10 GeV and OSSF electrons above 17 GeV), this generator
preselection cut is the same as the reconstructed cut and therefore
presents a very serious difficulty, not to the fake rate as suggested
in the note, but rather in properly estimating the background arising
from mis-measurement of low pT muons which were generated below 5 GeV,
but reconstructed above 5 GeV.   The referees request that the authors
either (1) remove this pre-selection requirement from the analysis, or
(2) reliably estimate the systematic uncertainty to the background
estimation due to this preselection cut.

The cross section of the considered backgrounds is large, and
without the preselection they would be almost impossible to simulate.
The generator-level preselection cut is lower than the cut used at
reconstruction by ~5 GeV: the preselection requires PT>5(10) GeV/c for
muons(electrons), while the reconstruction selection requires PT>10 GeV/c
for all leptons (and >17 GeV/c for the 2 OSSF electrons in order to pass
the trigger).
Given that the Pt resolution of muons(electrons)
at these energies is below 2(5)%, we assume that the contribution from
mis-measured leptons is negligible.
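
As a rough cross-check of this argument, one can express the gap between
the generator threshold and the reconstruction cut in units of the Pt
resolution. The sketch below assumes a Gaussian resolution proportional to
Pt, which is a simplification introduced here; the function name is
hypothetical. It uses the muon thresholds and the OSSF electron threshold
quoted above.

```python
def shift_in_sigmas(gen_pt, reco_cut, rel_resolution):
    """Number of resolution widths a lepton generated at gen_pt must
    fluctuate upwards to be reconstructed above reco_cut.

    Assumes sigma(Pt) = rel_resolution * gen_pt (Gaussian approximation).
    """
    sigma_pt = rel_resolution * gen_pt
    return (reco_cut - gen_pt) / sigma_pt


if __name__ == "__main__":
    # Muons: generated at the 5 GeV/c preselection edge, cut at 10 GeV/c, 2%.
    print(shift_in_sigmas(5.0, 10.0, 0.02))
    # OSSF electrons: generated at 10 GeV/c, cut at 17 GeV/c, 5%.
    print(shift_in_sigmas(10.0, 17.0, 0.05))
```

Non-Gaussian tails in the resolution are not described by this estimate.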


7. In section 2.2, Zbb is not considered as a possible background.
The referees feel that this is potentially a significant background
and that the authors should either (1) include Zbb in this analysis,
or (2) justify why Zbb is not a source of tri-lepton background.


The Zbb background is now included in the analysis and contributes <70 events
to the 1800 background events.

8. In section 2.2, SUSY (LM9) is listed as a background source.  The
referees agree that SUSY itself does indeed represent a source of
background for this analysis.  Nevertheless, the referees request that
the authors kindly describe the details of how SUSY is used as a
background.  For example, how is the tri-lepton signal separated from
non-tri-lepton SUSY events, etc.

The inclusive LM9 events, excluding the direct neutralino-chargino
production (PYTHIA process 230), were used as the SUSY background.
This is clarified in the updated text.


9. In section 2.2, the CMS dataset names are not given for the DSTs
used.  The referees request that the authors list (in an appendix) the
dataset names for the DSTs used in this analysis.

The dataset names are included in the updated text.

10. In section 4, the note states that "the correct pairing was
analyzed with MC tagged leptons and is almost the same for high or low
pT combinations."

The referees would like to request that the authors kindly include
more details on how the reconstructed leptons are tagged using MC
information.

The pairing was studied at the generator level
using RawHepEvent, where the
origin of the leptons is known.
This is clarified in the updated text.


11. In section 3.1, the note states that "The triggers (L1+HLT)
efficiency in the m0, m1/2 plane is presented in Figure 4.  The scan
was produced with FAMOS and the efficiencies were tuned to the
efficiency at LM9 from the full simulation."

The referees are confused by this statement and would like the authors
to clarify what is meant by "tuning" the trigger efficiencies to LM9
and how those "tuned" efficiencies are used in the FAMOS scan of the
m0, m1/2 plane.

Since we used FAMOS for the scan, where the trigger is not
yet well implemented, the trigger selection cuts were
applied to the offline reconstructed objects (muons and electrons).
Therefore the trigger efficiencies differ from (are higher than) those of
the full reconstruction. The FAMOS efficiency was normalized
to the ORCA trigger efficiency at the LM9 point, where the DST was simulated.
Of course this can introduce some small uncertainties due to the
dependence of the reconstruction efficiency on Pt. However,
the scan plot is presented for illustration purposes and shows
the correct tendency.
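
The normalization described above amounts to a single scale factor fixed at
the reference point. A minimal sketch, assuming a dictionary of scan points
and purely hypothetical efficiency values (none of these numbers come from
the note):

```python
def normalize_to_reference(famos_eff, reference_point, orca_eff_at_reference):
    """Rescale fast-simulation efficiencies so that the value at the
    reference point matches the full-simulation efficiency there."""
    scale = orca_eff_at_reference / famos_eff[reference_point]
    return {point: eff * scale for point, eff in famos_eff.items()}


if __name__ == "__main__":
    # Hypothetical trigger efficiencies from the fast-simulation scan.
    famos_scan = {"LM9": 0.80, "point_A": 0.70, "point_B": 0.60}
    # Hypothetical full-simulation efficiency at the reference point.
    tuned = normalize_to_reference(famos_scan, "LM9", 0.72)
    for point, eff in tuned.items():
        print(point, round(eff, 3))
```

A single factor cannot absorb a Pt dependence of the efficiency ratio,
which is the source of the small uncertainty mentioned above.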


12. In section 3.2, the note states that muons can be "contaminated"
from jets not vetoed by the jet veto and that this contamination is
estimated by matching the reconstructed muon in FAMOS with a generated
muon from PYTHIA.  This appears to have only been done for ttbar,
Z+jets, DY samples.  The contamination is estimated to be 3 x 10^-6.

The referees are concerned that this estimate may not be accurate.
While FAMOS does decay pions and kaons, it does not simulate
punch-through and may not simulate other possible effects contributing
to fake muons.  More importantly, however, the referees are more
concerned that other important backgrounds are not included in
estimation of the muon fake rate.  Hence the referees request that
(1) the authors justify the use of FAMOS for estimating the muon
fake rate and (2) that the authors estimate the muon fake rates for all
considered backgrounds.

The most important backgrounds are simulated in FAMOS, and therefore
FAMOS has been used for the evaluation of the fake rates.
FAMOS and ORCA may differ somewhat, but studying this
is certainly out of the scope of the current analysis since it would require
large data samples. However, the numbers we obtained are close to those of
the full simulation (CMS IN 2005/028), although a direct comparison
is difficult since we used somewhat stronger cuts on leptons.
In addition, our fake rate estimates
do not take into account leptons mis-reconstructed outside the
matching cone, i.e. we overestimate the fakes.
The fakes section is re-evaluated in the note (see section 4),
and the fake rates are estimated for all important channels.

13. In section 3.2, the note indicates that the contamination of
reconstructed electrons comes mainly from pi-zeros or photon conversions.
The contamination is estimated from ttbar, Wt, and Z+jets to be
7 x 10^-5.

The referees are concerned that not all backgrounds have been used to
estimate the electron fake rate.  Further the note does not consider
the electron fake rate specifically from jets.  The referees request
that the authors (1) include all backgrounds to estimate the electron
fake rate, not only ttbar, Wt, and Z+jets, including, in particular,
QCD, W+jets, and Zbb, etc and (2) include the possibility of jets
faking an electron when estimating the contamination rate.

The fake rates are re-evaluated in the note (see section 4)
for all channels.


14. In section 3.4, the note states that MET "is almost zero from DY,
Z+jets, ZZ backgrounds."

The referees would kindly like to suggest that the authors change the
wording slightly to account for the fact that the Z0 can decay to
neutrinos, which would create significant MET.

Corrected in the text:
" is almost zero for the leptonic (e,mu) decays.. "


15. All plots in the note are normalised to equal areas.  The referees
would like to suggest that all distributions be normalized to
luminosity weighted cross-sections, where possible and appropriate.
This provides the reader with a better feel for the importance of one
distribution compared with another distribution within the same plot.

Three plots are normalized: Fig.6 (Pt distribution of leptons),
Fig.7 (MET and sumET distributions), Fig.8 (Et of jets and Njets).
Comparable quantities, which can be shown in one plot on one scale,
are obtained only after the selection (see Table 6),
i.e. the distributions would have to be normalized to the number of events
after selection, which is discussed in the next
section and is not known a priori. This prevents the use of
such scaling. On the other hand, the efficiency of each selection cut is
presented in the Table, so the contribution of each background can be
easily understood.


16. In section 4, the note describes "In ~27% of the signal events
another OSSF pair can be constructed, where one of this pair would be
a fake, since one lepton originates from the chargino decay."

The referees are confused by the term "fake" here.  Presumably the
lepton originating from the chargino decay is a "real" lepton (that
is, it was generated and reconstructed).  Do the authors really mean
that using such a lepton in the invariant mass calculation gives the
wrong combination?  The referees would like the authors to define what
is meant by the term "fake" in this section.  If it corresponds to a
"real" lepton, but simply the wrong combination in the invariant mass
calculation, the referees kindly ask the authors to use a different
term so as not to confuse the reader with "fake reconstructed leptons"
which have no correspondence with a generated lepton.

The word 'fake' is misleading here; 'wrong combination'
should be used instead. The text is corrected and this part is clarified.


17. In section 4, the note indicates that the "fake" invariant mass
(above) stays in signal region.

The referees are concerned by this fact, which is manifest in Figures
9, 10, and 12, in which the shape of the signal looks very similar to
the shape of the background.  As a result, a Gaussian fit to the
line-shape (performed later in the analysis) is not appropriate
without taking proper account of the background (either via background
subtraction or via a simultaneous fit to the background).

Indeed, with such a small significance the fit does not make sense.
The Gaussian fit is removed from the analysis.


18. In section 4, the note states that a low mass peak (arising from
DFOS combinations) from Z+jets and DY backgrounds is suppressed by
generator preselection cuts but can also be suppressed with a cut on
the invariant di-lepton mass > 15 GeV.

Clearly, one is not allowed to suppress backgrounds via generator
level cuts.  The referees are thus confused and request that the
authors kindly clarify what is meant by the above, in case the
referees have misunderstood the intent and what was actually performed
in the analysis.  If indeed a generator level cut was used to remove
the low mass peak in the Z+jets and DY backgrounds, then the referees
request that that cut be removed from the analysis.  If a cut on the
invariant di-lepton mass is performed, is that cut performed at
generator level or reconstruction level.  If the cut is performed at
generator level, then the referees request that the authors perform
the cut at reconstruction level.  If the cut is done at reconstruction
level, then the referees kindly ask the authors to clarify this in the
text.

This statement only suggests the possibility of constraining
the invariant mass from below, in addition to the upper limit (75 GeV).
The text is corrected.

19. In section 4, the note describes a Gaussian fit to the di-lepton
invariant mass.

The referees request that the authors clarify the purpose of the
Gaussian fit.  If the background is not subtracted (or fitted
simultaneously) before the fit is performed, then mean of Gaussian
will depend on the background shape, as the authors state in the note.
 It is unclear what value the Gaussian fit adds to the results.

Indeed, the end point is hardly visible and there is no reason to perform
the fit. The fit is removed from the analysis.

20. In section 4, the note states that "A fake event is defined as an
event where at least one lepton is used in the selection does not come
from the main interaction.  In almost 90% of cases, this is found to
be a fake electron."

The referees request that the authors define what is meant by "main
interaction."  The referees are concerned that the note confuses real
reconstructed electrons (corresponding to generated electrons, but
which may not originate from the "main interaction") with fake
reconstructed electrons (which do not correspond to a generated
electron).  If an electron is generated and reconstructed, it is
normally referred to as a "real" electron, even if it does not
originate from the "main interaction."  The referees kindly request
the authors to also precisely define what is meant by "fake
electron" in the above passage.

We agree with this terminology. A real particle means one that is
simulated at MC level, independent of its source.
Note, however, that in the analysis the particles produced in
pileup events were not tagged as MC (RawHepEvent was used for the MC event).
This means the fake rate can be slightly overestimated
because of leptons from pileup events; in reality the situation
can be better.
This part is corrected in the text as suggested.


21. In section 5, the note describes the training of the Neural
Network.  The training samples are said to "have been preselected with
somewhat looser selection cuts."

The referees are concerned by this statement.  The NN training sample
should be configured identically to the sample which is used to make
the significance estimation.  The referees request the authors to
clarify why the training and data samples have different
configurations and what systematic effect this has on the
significance estimation.

Clearly, the looser cuts were used to increase the statistics
of the training samples. When applied, the network was used
as an additional discriminant that combines different variables.
The contribution of the different parts of the training sample
to this value is not straightforward. In our case it was
found that, for the part of the statistics related to the looser cuts,
the NN selection is less efficient than simple cuts.
This does not introduce any systematic effect but simply compensates
for the inefficiency of the NN in the low Pt region.
This is clarified in the updated text.


22. The referees appreciate the motivation of applying the NN (which
accounts for correlations between the discriminating variables)
followed by the Genetic Algorithms (which can efficiently maximize a
function in a multi-dimensional space).  However the referees note
that, by far, the most powerful selection cuts of the analysis comes
from neither the NN nor the GA, but rather from the very simple cuts
which are applied before the NN+GA.  Indeed, the NN+GA gains are only
very modest when compared with the extraordinarily added complication.
 Applying such complexity reduces one's ability to understand the
systematic behaviour of this analysis (which the referees note as
being very systematically challenging in its current form).


The NN selection does bring an improvement. The fact that the improvement is
not a miracle only confirms that the selection with cuts is close to optimal
(which was not obvious in advance).
It is also clear that the NN (as well as GA or likelihood) has a larger
dependence on the simulation model and the reconstruction algorithms, which
in turn increases the influence of systematic uncertainties. This is true
even for simple cuts if they are tuned to maximize the significance of some
signal. On the other hand, each method has its own advantages. Ideally one
would use the most model-independent method to see a signature and then
improve (and cross-check) the results with a more complex analysis, which is
what we have tried to do.
It is possible to make the NN more important by just loosening the cuts and
optimizing the NN better, which would probably give an even better
significance, but, as you pointed out, the results would be more model
dependent.

We agree that the systematics related to the fake leptons are the
most difficult part of the analysis. The problem is related to the
large data samples needed to optimize the fake suppression.
On the other hand, this further optimization might be very
model (GEANT) dependent and far from the real detector performance,
which will be studied in the first year of running.
Therefore the presented analysis of the CMS potential for trileptons
can only be a guideline for the real data analysis (although this
does not exclude further improvement of the algorithms).



23. In section 6, there is no systematic uncertainty quoted for lepton
fake rates.

Due to the aggressive generator-level pre-selection cuts imposed
in this analysis and due to the low pT nature of the accepted
reconstructed leptons, the referees request that the authors provide
a reasonable estimate for the systematic uncertainty on the
significance to observe the signal due to fake leptons.

The uncertainties of the fake rates are included in the significance
calculation as suggested.
These uncertainties can be treated as statistical uncertainties
(sigma) in the background estimation and included in S_c12 via the factor
S_c12t=S_c12t*sqrt(Nb+dNb)/sqrt(sigma**2+Nb+dNb).
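
The correction factor above can be sketched as follows. The definition
S_c12 = 2*(sqrt(Ns+Nb) - sqrt(Nb)) used here is an assumption (it is a
commonly used form and is not spelled out in this reply); the function
names are hypothetical, and Nb, dNb and sigma are simply taken as they
appear in the formula.

```python
import math


def s_c12(n_sig, n_bkg):
    """Assumed definition: S_c12 = 2 * (sqrt(Ns + Nb) - sqrt(Nb))."""
    return 2.0 * (math.sqrt(n_sig + n_bkg) - math.sqrt(n_bkg))


def fake_rate_factor(n_bkg, d_n_bkg, sigma):
    """Factor sqrt(Nb + dNb) / sqrt(sigma^2 + Nb + dNb) from the text."""
    return math.sqrt(n_bkg + d_n_bkg) / math.sqrt(sigma**2 + n_bkg + d_n_bkg)


def s_c12_with_fakes(n_sig, n_bkg, d_n_bkg, sigma):
    """Significance reduced by the fake-rate uncertainty sigma."""
    return s_c12(n_sig, n_bkg) * fake_rate_factor(n_bkg, d_n_bkg, sigma)
```

With sigma = 0 the factor equals 1 and the significance is unchanged; any
nonzero sigma reduces it.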

24. In section 6, the note describes a variation of cuts as a way to
estimate reconstruction uncertainties.

The referees do not feel that varying the cuts is an adequate method
for determining the systematic effects due to uncertainties in the
reconstruction.  The authors are requested to consult the different
PRS detector groups for their recommendation for applying systematic
uncertainties on reconstructed physics quantities.

In this analysis the reconstruction uncertainties
are mostly related to the jet energy scale and the lepton Pt errors,
which enter the selection with cuts. There is no correlation between them,
and varying the energy scale and the lepton Pt is
the simplest method to account for these reconstruction uncertainties.
In the NN analysis more variables are involved, and we did not
consider the influence of the reconstruction uncertainties on all of them.
However, the most significant variables are still related to the leptons
and jets and can be taken into account via the calculated factors (~1%),
added in quadrature.
This part is changed in the text.
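
Adding uncorrelated relative uncertainties in quadrature can be sketched
as below. The function name and the example values are illustrative
assumptions; only the ~1% scale is taken from the text.

```python
import math


def add_in_quadrature(rel_uncertainties):
    """Combine uncorrelated relative uncertainties: sqrt(sum of squares)."""
    return math.sqrt(sum(u * u for u in rel_uncertainties))


if __name__ == "__main__":
    # Two hypothetical ~1% contributions (jet energy scale, lepton Pt).
    print(add_in_quadrature([0.01, 0.01]))
```

This combination is valid only if the contributions are uncorrelated, as
stated above for the jet energy scale and the lepton Pt.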


25. In section 6, the note states "The dimuons, dielectrons, trimuons
and all OSSF pairs final states can be treated as different
experiments and the total significance can be evaluated."

The referees request that the authors clarify the exact trigger
streams for each of the dimuons, dielectrons, trimuons and all OSSF
final states.  If an event corresponds to multiple streams, then it
will possibly enter into multiple of the above "experiments."  If an
event is selected in one "experiment" is it also explicitly vetoed in
all other "experiments."  If the "OR" of all specified triggers is
taken, then the different final states can not be treated as different
independent experiments.

The dimuon (2mu) and dielectron (2e) trigger streams (the main streams for
the trilepton study) are separate and can be considered as independent
experiments. However, the electron contribution adds very little
and may be irrelevant. This part is updated in the text.



V.Zhukov 24.03.2006