The implementation of this Fill() method will generally use objects of classes BaseCut, FloatFun and possibly FillIterator and BinningFun to know what the abscissa value and weights are, and into which histogram to fill a value.
Particular subclasses of SFO are SFH1F, SFH2F, and SFHProf, which correspond to the RooT classes TH1F, TH2F, and TProfile (and are actually derived from them).
Collections of histograms can also be self-filling, and are therefore derived from SFO as well. One such collection is SFROList, which is simply a collection of self-filling objects. Its Fill() method is implemented by looping over all objects in its collection and calling their respective Fill() methods.
By making SFROList an SFO itself, we allow the creation of trees of such object lists. This is an application of the "Composite" pattern of Gamma, Helm, Johnson and Vlissides (the famous "Gang of Four", "GoF" in short).
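The following is a minimal sketch of this Composite arrangement, not the framework's actual code. The names SelfFilling, CountingLeaf and SelfFillingList are hypothetical stand-ins for SFO, a histogram, and SFROList; the leaf merely counts how often it has been filled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the SFO base class.
class SelfFilling {
public:
  virtual ~SelfFilling() {}
  virtual void Fill() = 0;
};

// Hypothetical leaf: instead of filling a histogram, it counts Fill() calls.
class CountingLeaf : public SelfFilling {
public:
  CountingLeaf() : fills(0) {}
  virtual void Fill() { ++fills; }
  int fills;
};

// Composite: a list of self-filling objects that is itself self-filling,
// so lists can contain other lists.
class SelfFillingList : public SelfFilling {
public:
  void add(SelfFilling& obj) { children.push_back(&obj); }
  virtual void Fill() {
    for (std::size_t i = 0; i < children.size(); ++i) {
      assert(children[i] != 0);
      children[i]->Fill();
    }
  }
private:
  std::vector<SelfFilling*> children;  // not owned by the list
};
```

Because a list is also a SelfFilling object, a single Fill() call on the outermost list reaches every leaf in the tree exactly once.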
A particular SFROList subclass is EventLoop, which is the base class for the classes where the user books and outputs her histograms.
SFSetOfHistograms and SFMatrixOfHistograms are another type of self-filling histogram collection. These collections also have a Fill() method and administer several histograms. However, as opposed to SFROList, their Fill() method fills only one histogram at a time: objects of type BinningFun are asked which histogram should receive the entry, and only that histogram is filled. These collections are used if we have a number of similar histograms (say, mass histograms) which should be filled under mutually exclusive conditions. These conditions can be the file type (histogram 0 for data, 1 for MC type 1, 2 for MC type 2, etc.), or some kinematic quantity, if we want to make a measurement differential in t, W, pt, or some other variable.
Initially, the only registered objects in our framework were histograms (at which time we called the class RegH, for "registered histogram"). The main purpose was to build lists, or collections, of self-filling histograms.
Meanwhile, we have extended the use of RegO. Cached objects (class CachedO) are now also derived from RegO, and we use an ROList of cached objects to invalidate all caches at once.
In future releases, we will probably use RegO to implement a reference-counting mechanism, so that purists will not have to complain about memory leaks anymore.
Class ROList is again derived from RegO, another case of the "Composite" pattern of GoF.
A function object is simply an object that overloads operator(). One can think of it as a class that has mainly one purpose, and therefore one main method. This method could be called "doit()", or "foo()", or "doWhatThisClassIsMeantToDo()", or simply have "no name", namely be "operator()".
The nice (though at first puzzling) thing with function objects is that if you have declared such an object, let's say

    FloatFun& pt = *new PtFun (ntuple);

you can use it exactly like a function:

    float thePT = pt();

just as if pt had been declared as an ordinary function:

    float pt() { return something; }

It is probably best to think of a function object as a function with a memory. The object's state is the "memory": it can hold values (e.g. the normalization, mean and sigma of a Gaussian, or a pointer to an ntuple variable), while operator() is the main way to use the object.
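As an illustration of the "function with a memory" idea, here is a small sketch of the Gaussian mentioned above. The class Gaussian is a hypothetical example and not part of the framework; unlike a FloatFun, its operator() takes an argument.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical function object: the normalization, mean and sigma are
// the "memory"; operator() evaluates the Gaussian at x.
class Gaussian {
public:
  Gaussian(double norm_, double mean_, double sigma_)
    : norm(norm_), mean(mean_), sigma(sigma_) {
    assert(sigma > 0);
  }
  double operator()(double x) const {
    const double z = (x - mean) / sigma;
    return norm * std::exp(-0.5 * z * z);
  }
private:
  double norm;
  double mean;
  double sigma;
};
```

An object declared as `Gaussian g(2.0, 1.0, 0.5);` is then called like a function, e.g. `double y = g(1.5);`.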
Now, a FloatFun in our framework is a function object where operator() takes no arguments and returns a float, leaving the object itself unaltered:
    class FloatFun {
      public:
        virtual float operator() () const = 0;
        virtual void destroy() { delete this; }
        virtual const FillIterator *getIterator() const { return 0; }
      protected:
        virtual ~FloatFun() {}
    };
The idea is that for any variable that we want to plot, we have a function object of some class derived from FloatFun that will return the value of the variable via operator().
The self-filling objects will then get pointers to FloatFun objects when they are created, and thereby "know" what to plot when their Fill() method is called.
A nice consequence of using function objects is that we can easily define classes that hold pointers to other function objects and thereby implement arithmetic operations like product or sum, or represent functions of function objects like sine or square root.
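A minimal sketch of such composite function objects could look as follows. The FloatFun shown here is a simplified stand-in (with a public destructor for brevity), and VarFun, SumFun, ProductFun and SqrtFun are hypothetical names chosen for illustration; the framework's own classes may be organized differently.

```cpp
#include <cassert>
#include <cmath>

// Simplified stand-in for the framework's FloatFun.
class FloatFun {
public:
  virtual ~FloatFun() {}
  virtual float operator()() const = 0;
};

// Hypothetical leaf: returns the current value of an external variable.
class VarFun : public FloatFun {
public:
  explicit VarFun(const float& var_) : var(var_) {}
  virtual float operator()() const { return var; }
private:
  const float& var;
};

// Sum of two function objects, evaluated on every call.
class SumFun : public FloatFun {
public:
  SumFun(const FloatFun& lhs_, const FloatFun& rhs_) : lhs(lhs_), rhs(rhs_) {}
  virtual float operator()() const { return lhs() + rhs(); }
private:
  const FloatFun& lhs;
  const FloatFun& rhs;
};

// Product of two function objects.
class ProductFun : public FloatFun {
public:
  ProductFun(const FloatFun& lhs_, const FloatFun& rhs_) : lhs(lhs_), rhs(rhs_) {}
  virtual float operator()() const { return lhs() * rhs(); }
private:
  const FloatFun& lhs;
  const FloatFun& rhs;
};

// Square root of another function object.
class SqrtFun : public FloatFun {
public:
  explicit SqrtFun(const FloatFun& arg_) : arg(arg_) {}
  virtual float operator()() const {
    const float value = arg();
    assert(value >= 0);
    return std::sqrt(value);
  }
private:
  const FloatFun& arg;
};
```

Since every node asks its operands again on each call, a composed object such as sqrt(x*x + y*y) automatically follows the current values of x and y.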
In our framework, FloatFun objects are used to define abscissa values and weights for self-filling histograms.
We can also express cuts by writing things like
    FloatFun& pt = *new PtFun(ntuple);   // returns pt of an event
    BaseCut& ptcut1 = (pt > 3);          // A cut on pt
    BaseCut& ptcut2 = (2 <= pt < 3);     // Another ptcut, equivalent to
                                         // (2 <= pt) && (pt < 3);
    SFH1F *h = new SFH1F ("pt",            // Histo name
                          "Event pt > 3",  // Histo title
                          ptbinning,       // The binning
                          pt,              // What is plotted: pt
                          pt > 3,          // A cut
                          sqrt (pt));      // A weight
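How an expression like `pt > 3` can yield a cut object may be sketched with overloaded operators. This is an assumption about the mechanism, not the framework's actual implementation: FloatFun and BaseCut are simplified stand-ins, and VarFun, GreaterCut, LessCut and AndCut are hypothetical names. The chained form `2 <= pt < 3` is not covered here.

```cpp
// Simplified stand-ins for the framework's base classes.
class FloatFun {
public:
  virtual ~FloatFun() {}
  virtual float operator()() const = 0;
};

class BaseCut {
public:
  virtual ~BaseCut() {}
  virtual bool operator()() const = 0;
};

// Hypothetical leaf: returns the current value of an external variable.
class VarFun : public FloatFun {
public:
  explicit VarFun(const float& var_) : var(var_) {}
  virtual float operator()() const { return var; }
private:
  const float& var;
};

// Cut that passes if the function value exceeds a threshold.
class GreaterCut : public BaseCut {
public:
  GreaterCut(const FloatFun& fun_, float threshold_) : fun(fun_), threshold(threshold_) {}
  virtual bool operator()() const { return fun() > threshold; }
private:
  const FloatFun& fun;
  float threshold;
};

// Cut that passes if the function value is below a threshold.
class LessCut : public BaseCut {
public:
  LessCut(const FloatFun& fun_, float threshold_) : fun(fun_), threshold(threshold_) {}
  virtual bool operator()() const { return fun() < threshold; }
private:
  const FloatFun& fun;
  float threshold;
};

// Logical AND of two cuts.
class AndCut : public BaseCut {
public:
  AndCut(const BaseCut& lhs_, const BaseCut& rhs_) : lhs(lhs_), rhs(rhs_) {}
  virtual bool operator()() const { return lhs() && rhs(); }
private:
  const BaseCut& lhs;
  const BaseCut& rhs;
};

// The operators do not compare anything themselves; they build cut objects.
// The objects are allocated on the heap and deliberately never deleted here,
// because they must stay alive as long as something refers to them.
BaseCut& operator>(const FloatFun& f, float t) { return *new GreaterCut(f, t); }
BaseCut& operator<(const FloatFun& f, float t) { return *new LessCut(f, t); }
BaseCut& operator&&(const BaseCut& a, const BaseCut& b) { return *new AndCut(a, b); }
```

With these definitions, `BaseCut& window = (pt > 2.0f) && (pt < 6.0f);` creates a small tree of cut objects that is evaluated anew each time `window()` is called.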
One last note: Why is the destructor of FloatFun protected? Actually, for FloatFun itself this is a bit pointless, because FloatFun is a pure abstract class, so no FloatFun objects can be instantiated anyway. However, it should remind you, the user, to make the destructor of any subclass of FloatFun protected as well, or even private (in which case no further subclasses can be derived from that class!).
The protected destructor makes code such as this illegal:

    // constructor of DataLoop (derived from EventLoop):
    // Book all histograms
    DataLoop::DataLoop (Ntuple& ntuple) {
      PtFun pt(ntuple);   // pt is now an instance of PtFun

      // h shall plot the event pt; it will hold a pointer to the
      // pt object!
      h = new SFH1F ("pt", "Event pt", ptbinning, pt);

      // pt goes out of scope, ~PtFun() is called,
      // => object pt no longer exists, the pointer held by
      // the self-filling histogram h is now invalid!
    }
Since FloatFun objects are referenced through pointers held by self-filling objects like SFH1F objects, and since the pointers are used in the SFH1F::Fill() method, the FloatFun objects must live at least until the last Fill() call has been made, i.e. basically until the end of the program. By forcing the user to write

    // constructor of DataLoop (derived from EventLoop):
    // Book all histograms
    DataLoop::DataLoop (Ntuple& ntuple) {
      PtFun& pt = *new PtFun(ntuple);   // pt now refers to an instance of PtFun

      // h shall plot the event pt; it will hold a pointer to the
      // pt object!
      h = new SFH1F ("pt", "Event pt", ptbinning, pt);

      // pt goes out of scope, but the (anonymous) object
      // it refers to lives on! => All is well that ends well...
    }

we make sure that the FloatFun object outlives the constructor in which it was created.
    class BaseCut {
      public:
        virtual bool operator() () const = 0;
        virtual void destroy() { delete this; }
        virtual const FillIterator *getIterator() const { return 0; }
      protected:
        virtual ~BaseCut() {}
    };

So, a BaseCut object looks at an event (or some part of an event, such as a jet, if a FillIterator is used), decides whether the event passes a cut or not, and returns this decision as the result of its operator().

    class IntFun {
      public:
        virtual int operator() () const = 0;
        inline virtual FloatIntFun& Float () const { return *new FloatIntFun(*this); }
        virtual void destroy() { delete this; }
        virtual const FillIterator *getIterator() const { return 0; }
      protected:
        virtual ~IntFun() {}
    };
An IntFun is very useful to express cuts on integer values, such as the number of tracks:
    IntFun& ntrack = *new NTIntFun<Ntuple> (ntuple, "ntracks");
    BaseCut& trackcut = (ntrack == 2);
For this, we have method Float(), which returns an object of type FloatIntFun, which is derived from FloatFun and stores a reference to an IntFun. It calls operator() of the IntFun, converts the result to a float and returns it as result of its own operator():
    class FloatIntFun: public FloatFun {
      public:
        FloatIntFun (const IntFun& intFun_) : intFun (intFun_) {}
        virtual float operator() () const { return intFun(); }
      protected:
        virtual ~FloatIntFun() {}
        const IntFun& intFun;
    };

A FillIterator helps to address the question "how can we plot more than one value per event (= ntuple row)?". Typical applications occur when we want to plot the pt (transverse momentum) of all tracks in an event, or the distance of all secondary vertices, or the masses of all D* candidates.
The interface of this class is a bit more complicated than the interfaces we saw so far:
    class FillIterator: public IntFun {
      public:
        virtual int operator() () const = 0;
        virtual bool next() = 0;
        virtual bool reset() = 0;
        virtual const FillIterator *getIterator() const { return this; }
      protected:
        virtual ~FillIterator() {}
    };
Typically, a FillIterator will access some integer value in an ntuple that contains the number of tracks, jets, D* candidates, or similar, and have an internal variable (like "index") that contains the value of the index.
To facilitate the writing of such a class, we have defined a subclass SimpleFillIterator, where the user just has to provide (in a subclass, of course) an implementation of method getRange(), and the rest is already in place.
A FillIterator object (if there is one) is used by the Fill() method of a self-filling histogram to step through all objects (jets, tracks, D* candidates) that are to be plotted. Then, the FloatFun objects are called and asked about their values.
Consequently, such FloatFun objects have to know about the iterator, i.e. they have to hold a pointer to the iterator and in their operator() method have to ask the iterator about the number of the track, jet or whatever whose pt or other value should be plotted.
Such a FloatFun might look like this:
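    class PtFun : public FloatFun {
      public:
        PtFun(const Ntuple& nt_, const FillIterator& iter_)
          : nt(nt_), iter(&iter_) {}
        virtual float operator() () const {
          // Ask the iterator for the index of the current track
          assert ((*iter)() >= 0);
          return nt.IdPt[(*iter)()];
        }
        // Report the iterator this function object depends on
        virtual const FillIterator *getIterator() const { return iter; }
      protected:
        ~PtFun() {}
        const Ntuple& nt;
        const FillIterator* iter;
    };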
It is of paramount importance that the FloatFun object gets the same FillIterator object that is also passed to the self-filling histogram, because the histogram's Fill() method is responsible for incrementing the FillIterator object, which should be reflected in the result of the FloatFun.
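The interplay between the iterator and a function object during filling can be sketched as follows. This is a simplified illustration under stated assumptions: we assume that reset() positions the iterator on the first entry and that reset() and next() return false when no (further) entry exists; the actual contract of FillIterator may differ. TrackIterator, PtOfTrack and fillAll are hypothetical names, and a std::vector stands in for the ntuple row.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical iterator over the tracks of the current event.
class TrackIterator {
public:
  explicit TrackIterator(const std::vector<float>& tracks_)
    : tracks(tracks_), index(-1) {}
  int operator()() const { return index; }              // current track index
  bool reset() { index = 0; return valid(); }           // go to the first track
  bool next() { ++index; return valid(); }              // go to the next track
private:
  bool valid() const { return index < static_cast<int>(tracks.size()); }
  const std::vector<float>& tracks;
  int index;
};

// Hypothetical function object returning the pt of the current track.
// It must refer to the same iterator that the fill loop advances.
class PtOfTrack {
public:
  PtOfTrack(const std::vector<float>& tracks_, const TrackIterator& iter_)
    : tracks(tracks_), iter(iter_) {}
  float operator()() const {
    assert(iter() >= 0 && iter() < static_cast<int>(tracks.size()));
    return tracks[static_cast<std::size_t>(iter())];
  }
private:
  const std::vector<float>& tracks;
  const TrackIterator& iter;
};

// What a Fill() method might do for one event: step through all tracks
// and ask the function object for its value at each step.
void fillAll(TrackIterator& iter, const PtOfTrack& pt, std::vector<float>& filled) {
  for (bool ok = iter.reset(); ok; ok = iter.next()) {
    filled.push_back(pt());
  }
}
```

If PtOfTrack were given a different iterator object than the one advanced in fillAll, it would return the same track over and over, which is exactly the mistake the paragraph above warns about.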
A FillIterator has sort of two faces:
An advantage of a FillIterator being an IntFun is that we can use FillIterator also as value for an axis, e.g. in a situation where we want to plot how often a subtrigger has fired.
If any object of these classes uses a FillIterator object, it should return the pointer to this object in method getIterator(). This is used by the self-filling histograms to ensure that all function objects depend on the same iterator, and furthermore allows the self-filling histograms to deduce which iterator to use, so that the FillIterator need not be given explicitly.
In such cases it can be prohibitively slow to recompute these properties every time operator() of a FloatFun is called. This problem is addressed by cached objects.
The interface of the abstract base class CachedO is again very simple:
    class CachedO: public RegO {
      public:
        CachedO (const ROListPoR& rol);
        virtual ~CachedO ();
        virtual void invalidateCache() = 0;
      private:
        CachedO (const CachedO& rhs);
        CachedO& operator= (const CachedO& rhs);
    };
The main property of a CachedO is a method invalidateCache(), which tells the object that its cache has become invalid, which will be the case after a new row of an ntuple has been read.
To be able to efficiently invalidate all caches, we made CachedO a registered object (derived from RegO). In class EventLoop, we have a ROList named cachedObjects, which is used to collect all cached objects and call invalidateCache() for all of them in the loop() method.
To facilitate the use of cached objects, we have defined four derived classes that should serve as base classes for cached FloatFun and BaseCut classes. We have defined two versions, one using iterators and one without iterators:
However, if we want to recalculate the value of a FloatFun object only when operator() is actually called, we face the problem that we need to store (i.e., cache) the result in a data member, which is not allowed in a "const" method. Now what?
C++ supports the notion of "logical const-ness", i.e. it allows data members that can be changed even for const objects. This is signalled by the keyword "mutable". Look at our implementation of SimpleCachedFloatFun:

    class SimpleCachedFloatFun: public FloatFun, public CachedO {
      public:
        SimpleCachedFloatFun (const ROListPoR& rol)
          : CachedO (rol), cacheValid (false), cachedValue (0) {}
        virtual float operator() () const {
          if (!cacheValid) {
            recalculate();
            cacheValid = true;
          }
          return cachedValue;
        }
        virtual void invalidateCache() { cacheValid = false; }
        virtual void recalculate() const = 0;
      protected:
        mutable bool cacheValid;
        mutable float cachedValue;
    };
Here, cacheValid and cachedValue are declared "mutable" and hence may be altered for const objects, i.e. in member functions that are marked "const", such as operator(). Observe also that recalculate() has been declared "const", because otherwise it could not be called by operator().
Of course, another way out would be to recalculate the values immediately in the invalidateCache() method, which is not "const":
    class SimpleCachedFloatFun: public FloatFun, public CachedO {
      public:
        SimpleCachedFloatFun (const ROListPoR& rol)
          : CachedO (rol), cachedValue (0) {}
        virtual float operator() () const { return cachedValue; }
        virtual void invalidateCache() = 0;
      protected:
        float cachedValue;
    };
We have decided against this model, because presumably the recalculation is a costly process (otherwise we wouldn't go through all the trouble), and it is quite possible that for many events the recalculation is unnecessary, because cuts are made that reject most events (consider, for example, an application where we run a jet finder on an LHC event, but only for events which have at least two muon candidates). In such a case, a calculation of the FloatFun result for every event could degrade performance more than not using the cache mechanism at all.
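The benefit of the lazy model can be made visible with a small, self-contained sketch that counts how often the costly recalculation actually runs. LazyCachedValue and ExpensiveSum are hypothetical classes that imitate the "mutable" technique described above without the framework's RegO machinery; the recalculation counter exists only for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical lazily cached value: recalculates only on the first call
// of operator() after the cache has been invalidated.
class LazyCachedValue {
public:
  LazyCachedValue() : cacheValid(false), cachedValue(0), recalculations(0) {}
  virtual ~LazyCachedValue() {}
  float operator()() const {
    if (!cacheValid) {
      cachedValue = recalculate();
      cacheValid = true;
      ++recalculations;
    }
    return cachedValue;
  }
  void invalidateCache() { cacheValid = false; }
  int getRecalculations() const { return recalculations; }
protected:
  virtual float recalculate() const = 0;
private:
  mutable bool cacheValid;
  mutable float cachedValue;
  mutable int recalculations;   // bookkeeping for this illustration only
};

// Hypothetical "costly" quantity: the sum over all values of the current row.
class ExpensiveSum : public LazyCachedValue {
public:
  explicit ExpensiveSum(const std::vector<float>& row_) : row(row_) {}
protected:
  virtual float recalculate() const {
    float sum = 0;
    for (std::size_t i = 0; i < row.size(); ++i) sum += row[i];
    return sum;
  }
private:
  const std::vector<float>& row;
};
```

In an event loop that calls invalidateCache() for every event but evaluates the object only for events that pass a cheap preselection, the number of recalculations equals the number of selected events, however often the value is read within one event.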
Now we have two possibilities:
We could write (ugly) code like this:
    void setLineColors (const ROList& rol) {
      for (unsigned int i = 0; i < rol.getEntries(); i++) {
        RegO *ro = rol.getEntry(i);
        // Make sure our registered object is really a RooT histogram:
        if (TH1 *h = dynamic_cast<TH1 *>(ro)) {
          // Set line color to red
          h->SetLineColor (2);
        }
      }
    }
Then we apply this method to the various collections (e.g., all SFSetOfHistograms, all SFMatrixOfHistograms objects).
Somewhat better would be to derive our own collections:
    class MySetOfHistograms: public SetOfHistograms {
      public:
        void setLineColors () {
          for (unsigned int i = 0; i < this->getEntries(); i++) {
            RegO *ro = this->getEntry(i);
            // Make sure our registered object is really a RooT histogram:
            if (TH1 *h = dynamic_cast<TH1 *>(ro)) {
              // Set line color to red
              h->SetLineColor (2);
            }
          }
        }
    };
Or is there a third possibility? The Visitor pattern from GoF comes to the rescue:
Write a class like this:
    class LineColorSetter: public HVisitor {
      public:
        virtual void visit (RegO& ro) {
          // Act only on objects that have line attributes
          if (TAttLine *h = dynamic_cast<TAttLine *>(&ro)) {
            h->SetLineColor (2);   // red
          }
        }
    };
Now, all we need is some code in class ROList that calls the visit() method for every object in its list:
    ROList& ROList::visit (HVisitor& v) {
      for (unsigned int i = 0; i < entries; i++) {
        if (theList[i]) v.visit((*theList[i]));
      }
      return *this;
    }
Now in our user code we can write code like this:
    LineColorSetter lcs;   // a LineColorSetter object
    // sfset is a SFSetOfHistograms
    sfset.visit(lcs);      // Set all histograms' line colors to red
In file HVisitors.h (plural!), we have defined a number of handy HVisitor subclasses that perform common tasks:
We have also predefined some objects that can be used directly to set some attributes, with (as we think) self-explanatory names:

    static AttLineSetter blackline (1);
    static AttLineSetter redline (2);
    static AttLineSetter greenline (3);
    static AttLineSetter blueline (4);
    static AttLineSetter yellowline (5);
    static AttLineSetter magentaline (6);
    static AttLineSetter cyanline (7);

    static AttFillSetter blackfill (1);
    static AttFillSetter redfill (2);
    static AttFillSetter greenfill (3);
    static AttFillSetter bluefill (4);
    static AttFillSetter yellowfill (5);
    static AttFillSetter magentafill (6);
    static AttFillSetter cyanfill (7);
    static AttFillSetter hollowfill (-1, 0);
    static AttFillSetter solidfill (-1, 1001);

    static AttMarkerSetter blackmarker (1);
    static AttMarkerSetter redmarker (2);
    static AttMarkerSetter greenmarker (3);
    static AttMarkerSetter bluemarker (4);
    static AttMarkerSetter yellowmarker (5);
    static AttMarkerSetter magentamarker (6);
    static AttMarkerSetter cyanmarker (7);
A note on the class name: HVisitor should really be called RegOVisitor, but for historical reasons its name is what it is.
Often we'll want to perform some operation, let's say a mass fit, only on some subset of all histograms. In such cases it makes sense to define our own ROList objects in our DataLoop class:
    class DataLoop: public EventLoop {
      public:
        // the usual stuff here...
      private:
        ROList masshistos;
        ROList otherhistos;
    };

    // constructor: book histos
    DataLoop::DataLoop() {
      // Define binnings, FloatFuns etc here

      // book a mass histo, put it into list masshistos:
      new SFH1F ("mass", "Some Mass", massbinning, masshistos, massfun);
    }

    // Plot histograms
    void DataLoop::output (const char* rootfile, const char* psfile) {
      // MassFitter is a HVisitor that fits a mass histogram
      MassFitter theMassFitter;

      // Fit only mass histograms, not the others
      masshistos.visit (theMassFitter);

      // Continue here with plotting etc...
    }
Another possibility would be that a HVisitor checks e.g. the name of a histogram and fits only histograms that contain the string "mass".
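Such a name-based visitor might be sketched like this. The classes below are simplified, hypothetical stand-ins (Registered for RegO, Visitor for HVisitor, NamedHisto for a histogram with a name), and instead of performing a fit the visitor only marks the histograms it would act on.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for RegO.
class Registered {
public:
  virtual ~Registered() {}
};

// Hypothetical histogram stand-in: has a name and a flag that records
// whether the visitor acted on it.
class NamedHisto : public Registered {
public:
  explicit NamedHisto(const std::string& name_) : name(name_), marked(false) {}
  std::string name;
  bool marked;
};

// Simplified stand-in for HVisitor.
class Visitor {
public:
  virtual ~Visitor() {}
  virtual void visit(Registered& ro) = 0;
};

// Acts only on histograms whose name contains a given substring.
class NameFilterVisitor : public Visitor {
public:
  explicit NameFilterVisitor(const std::string& pattern_)
    : pattern(pattern_), matches(0) {}
  virtual void visit(Registered& ro) {
    NamedHisto* h = dynamic_cast<NamedHisto*>(&ro);
    if (h && h->name.find(pattern) != std::string::npos) {
      h->marked = true;   // a real visitor would perform the fit here
      ++matches;
    }
  }
  int getMatches() const { return matches; }
private:
  std::string pattern;
  int matches;
};

// What a list's visit() method does: hand every non-null entry to the visitor.
void visitAll(const std::vector<Registered*>& list, Visitor& v) {
  for (std::size_t i = 0; i < list.size(); ++i) {
    if (list[i]) v.visit(*list[i]);
  }
}
```

The advantage over separate lists is that no bookkeeping is needed at booking time; the price is that the selection now depends on a naming convention.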
Often we have to book histograms which all have the same binning. Now, instead of

    h1 = new TH1F ("h1", "hist 1", 100, 0., 1.);
    h2 = new TH1F ("h2", "hist 2", 100, 0., 1.);
    h3 = new TH1F ("h3", "hist 3", 100, 0., 1.);
    h4 = new TH1F ("h4", "hist 4", 100, 0., 1.);
    // and so on, ad infinitum...

we would like to define the binning only once and write

    Binning hbinning (100, 0., 1.);
    h1 = new RegH1F ("h1", "hist 1", hbinning, this);
    h2 = new RegH1F ("h2", "hist 2", hbinning, this);
    h3 = new RegH1F ("h3", "hist 3", hbinning, this);
    h4 = new RegH1F ("h4", "hist 4", hbinning, this);
    // and so on, ad infinitum...

Binning objects allow us to do exactly that. We have constructors for Binning objects that exactly mimic the corresponding part in the RooT histogram constructors:
    class Binning {
      public:
        Binning ();
        Binning (int nbins_, double xlow, double xhigh);
        Binning (int nbins_, const float binedges_[]);
        Binning (int nbins_, const double binedges_[]);
        Binning (const Binning& rhs);
        virtual ~Binning();

        virtual int getBin (double x) const;
        virtual int getNBins() const;
        virtual double getLowerBinEdge(int i) const;
        virtual double getUpperBinEdge(int i) const;
        virtual double getLowerEdge() const;
        virtual double getUpperEdge() const;
        virtual const double *getEdges() const;
        virtual bool isEquidistant() const;

      protected:
        int nbins;
        double *binedges;
        bool equidistant;

      private:
        Binning& operator= (const Binning&);
    };
We have defined constructors for all our histogram classes that take Binning objects instead of the parameters nbins, xlow, xhigh.
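To illustrate what a method like getBin() has to do, here is a minimal sketch of a binning class. SimpleBinning is a hypothetical class, not the framework's Binning; in particular, the zero-based bin numbering and the return value -1 for values outside the range are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical binning: stores nbins + 1 bin edges in ascending order.
class SimpleBinning {
public:
  // Equidistant bins between xlow and xhigh.
  SimpleBinning(int nbins, double xlow, double xhigh)
    : edges(static_cast<std::size_t>(nbins) + 1) {
    assert(nbins > 0 && xlow < xhigh);
    for (int i = 0; i <= nbins; ++i) {
      edges[static_cast<std::size_t>(i)] = xlow + (xhigh - xlow) * i / nbins;
    }
  }
  // Variable bins; binedges must hold nbins + 1 ascending values.
  SimpleBinning(int nbins, const double binedges[])
    : edges(binedges, binedges + nbins + 1) {
    assert(nbins > 0);
  }
  int getNBins() const { return static_cast<int>(edges.size()) - 1; }
  // Returns the zero-based bin containing x, or -1 if x is out of range.
  // Each bin includes its lower edge and excludes its upper edge.
  int getBin(double x) const {
    if (x < edges.front() || x >= edges.back()) return -1;
    return static_cast<int>(
        std::upper_bound(edges.begin(), edges.end(), x) - edges.begin()) - 1;
  }
  double getLowerBinEdge(int i) const { return edges[static_cast<std::size_t>(i)]; }
  double getUpperBinEdge(int i) const { return edges[static_cast<std::size_t>(i) + 1]; }
private:
  std::vector<double> edges;
};
```

Storing explicit edges in both cases keeps the lookup identical for equidistant and variable binnings; an implementation that keeps an "equidistant" flag could instead compute the bin directly from x for the equidistant case.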
Compared to a Binning, a BinningFun has the same constructors, and only a few additional methods that have to be implemented by a derived class:

    class BinningFun: public Binning, public IntFun {
      public:
        BinningFun ();
        BinningFun (int nbins_, float xlow, float xhigh);
        BinningFun (int nbins_, const float binedges_[]);
        BinningFun (int nbins_, const double binedges_[]);
        BinningFun (const Binning& binning_);

        virtual int operator() () const = 0;
        virtual const char *getBinName(int i) const = 0;
        virtual const char *getBinTitle(int i) const = 0;
      protected:
        virtual ~BinningFun() {}
    };

A BinningFun returns a bin number (-1 for "no bin") in operator(); in this respect it is also an IntFun. BinningFun objects are used by classes SFSetOfHistograms and SFMatrixOfHistograms to decide into which histogram a certain entry should be made.
For instance, if we have a flag "filetype" in our Ntuple, where 0 means "data", 1 means "Signal Monte Carlo", and 2 means "Background Monte Carlo", we could have a BinningFun that returns 0, 1, or 2, and then the entry goes into the corresponding histogram in a SFSetOfHistograms.
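The filetype example can be sketched in a few lines. FileTypeBin and CounterSet are hypothetical stand-ins for a BinningFun and a SFSetOfHistograms; each "histogram" is reduced to an entry counter, and a plain int variable plays the role of the ntuple flag.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical selector: maps the current filetype flag to a bin number,
// or -1 ("no bin") if the flag is out of range.
class FileTypeBin {
public:
  FileTypeBin(const int& filetype_, int nbins_)
    : filetype(filetype_), nbins(nbins_) {}
  int operator()() const {
    return (filetype >= 0 && filetype < nbins) ? filetype : -1;
  }
  int getNBins() const { return nbins; }
private:
  const int& filetype;   // refers to the flag of the current ntuple row
  int nbins;
};

// Hypothetical set of "histograms": Fill() makes an entry in exactly one
// of them, chosen by the selector, or in none if the selector returns -1.
class CounterSet {
public:
  explicit CounterSet(const FileTypeBin& selector_)
    : selector(selector_),
      entries(static_cast<std::size_t>(selector_.getNBins()), 0) {}
  void Fill() {
    const int bin = selector();
    if (bin >= 0) ++entries[static_cast<std::size_t>(bin)];
  }
  int getEntries(int bin) const { return entries[static_cast<std::size_t>(bin)]; }
private:
  const FileTypeBin& selector;
  std::vector<int> entries;
};
```

A differential measurement works the same way: only the selector changes, returning the bin of some kinematic variable instead of the file type.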
In addition to operator(), a BinningFun has two other methods that have to be implemented by a derived class, namely getBinName(int i) and getBinTitle (int i). These methods should return strings (allocated with operator new[]) with a name or a title for a given bin i. The name should be short (like "0", "1", "2" or "data", "sigMC") and contain no spaces, the title could be nicer like "Data", "Signal Monte Carlo" or "0.1 < t < 0.2". These methods are used by SetOfHistograms and MatrixOfHistograms and their subclasses to generate histogram names and titles during the histogram booking stage.
To make life easier, we have defined a class FloatFunBinning, which takes a FloatFun that defines a variable according to which the binning is done, a Binning, and a string that should be the variable name (like "t" or "pt" or "W"), from which bin names like "007" and bin titles like "0.1 < t < 0.2" are generated.
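How such names and titles can be generated is sketched below. The helper functions makeBinName and makeBinTitle are hypothetical; they return std::string for simplicity, whereas the getBinName() and getBinTitle() methods described above return character arrays allocated with operator new[]. The exact formatting used by FloatFunBinning may differ.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Bin name: the bin number, zero-padded to three digits (e.g. "007").
std::string makeBinName(int i) {
  std::ostringstream os;
  os << std::setw(3) << std::setfill('0') << i;
  return os.str();
}

// Bin title: lower edge, variable name and upper edge,
// e.g. "0.1 < t < 0.2".
std::string makeBinTitle(double low, double high, const std::string& var) {
  std::ostringstream os;
  os << low << " < " << var << " < " << high;
  return os.str();
}
```

Zero-padded names have the practical advantage that histograms sort in bin order when listed alphabetically.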