playground:cmstest [2017/11/28 14:02] (current revision) jthaler; previous revision [2017/11/20 16:51]

Because of these complications, progress in particle physics typically proceeds via a vigorous dialogue between the theoretical and experimental communities. An experimental advance can inspire a new theoretical method, which launches a new experimental measurement, which motivates a new theoretical calculation, and so on. While some theoretical physicists have officially joined an experimental team, either in a short-term advisory role or as a long-term collaboration member, that is relatively rare. Thus, the best way for me to influence how LHC data is analyzed is to write and publish a paper, and I'm proud that a number of my theoretical ideas have found applications at the LHC.

With the release of the CMS open data, though, I was presented with the opportunity to perform exploratory physics studies directly on data. My friend (and CMS open data consultant) [[https://arts-sciences.buffalo.edu/physics/faculty/salvatore-rappoccio.html|Sal Rappoccio]] always reminds us of the apocryphal saying: "data makes you smarter". This aphorism applies both to detector effects, where "smarter" means processing the data with improved precision and robustness, and to physics effects, where "smarter" means extracting new kinds of information from the collision debris. So while I didn't know exactly what I wanted to do with the CMS open data when I first downloaded the CERN Virtual Machine, I knew that, no matter what, I was going to learn something.

Luckily, an MIT postdoctoral fellow, [[https://th-dep.web.cern.ch/roster/xue-wei|Wei Xue]] (now at CERN), had extensive experience using public data from the [[https://fermi.gsfc.nasa.gov/ssc/data/|Fermi Large Area Telescope]], and he started processing the 2 terabytes of data in the [[http://opendata.cern.ch/record/5|Jet Primary Dataset]] (more about that later). Around the same time, an ambitious MIT sophomore, [[https://lsa.umich.edu/physics/people/graduate-students/aashisht.html|Aashish Tripathee]] (now a graduate student at the University of Michigan), joined the project with no prior experience in particle physics but ample enthusiasm and a solid background in programming.

So what were we actually going to do with the data? My first idea was to try out a [[https://arxiv.org/abs/1310.7584|somewhat obscure]] LHC analysis technique my collaborators and I had developed in 2013, since it had never been tested directly on LHC data. (It may eventually be incorporated into a [[https://hep.uchicago.edu/atlas/trigger/|hardware upgrade]] of the ATLAS detector, or it may remain in obscurity.) Wei was even able to make a plot [[https://indico.cern.ch/event/340703/contributions/802184/attachments/668768/919259/jthaler_FCC_nobuilds.pdf|(slide 29)]] for me to show in March 2015 as part of a long-range planning study for the next collider after the LHC. There is a big difference, though, between making a plot and really understanding the physics at play, and despite performing a [[https://arxiv.org/abs/1501.01965|precision calculation]] of this technique, it was not clear whether we could do a robust analysis.

In early 2015, though, I had the pleasure of collaborating with two MIT postdoctoral fellows, [[http://people.reed.edu/~larkoski/|Andrew Larkoski]] (now at Reed College) and [[https://www.difi.unige.it/it/dipartimento/persone/marzani-simone|Simone Marzani]] (now at the University of Genoa), to develop a [[https://arxiv.org/abs/1502.01719|novel method]] to analyze jets at the LHC. While new, this method had a timeless quality to it, exhibiting remarkable theoretical robustness that we hoped would carry over into the experimental regime.

==== The Substructure of Jets ====

Jets are collimated sprays of particles that arise whenever quarks and gluons are produced in high-energy collisions. Almost every collision at the LHC involves jets in some way, either as part of the signal of interest or as an important component of the background noise. In the 2010 CMS open data, the [[http://opendata.cern.ch/record/5|Jet Primary Dataset]] contains collision events exhibiting a wide range of different jet configurations, from the most ubiquitous case with [[https://arxiv.org/abs/1705.02628|back-to-back jet pairs]], to the more exotic case with just a single jet (which might be a [[https://arxiv.org/abs/1703.01651|signal of dark matter]]), to the explosive case with a high multiplicity of energetic jets (which might arise from [[https://arxiv.org/abs/1705.01403|black hole production]]).

While the formation of jets in high-energy collisions has been known [[https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.35.1609|since 1975]] (and arguably even earlier than that), there has been remarkable progress in the past decade in understanding the [[https://arxiv.org/abs/1709.04464|substructure of jets]]. A typical jet is composed of around 10-30 individual particles, and the pattern of those particles encodes subtle information about whether the jet comes from a quark, or from a gluon, or from a more exotic object. Jet substructure continues to be an active area of development in collider physics, with many new [[https://arxiv.org/abs/1012.5412|advances]] [[https://arxiv.org/abs/1201.0008|made]] [[https://arxiv.org/abs/1311.2708|every]] [[https://arxiv.org/abs/1504.00679|year]].

A fascinating feature of jets is that they exhibit fractal-like behavior. As one zooms in on a jet and examines its substructure, one finds that the substructure itself has sub-substructure, which has sub-sub-substructure, and so on. This recursive self-similar behavior is captured by the [[http://www.sciencedirect.com/science/article/pii/0550321377903844?via%3Dihub|"QCD splitting functions"]], which describe how a quark or gluon fragments into more quarks and gluons. (QCD refers to quantum chromodynamics, which is the theory that describes the interactions of quarks and gluons.) While the QCD splitting functions are well-known and have been indirectly tested through a multitude of collider measurements, they had never before been tested directly.

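For readers who want to see what these functions look like, the standard leading-order splitting functions in the Altarelli-Parisi convention are reproduced below as background; they are not taken from our analysis. Here z is the momentum fraction carried by the first daughter listed (the quark in q → qg and g → qq̄, either gluon in g → gg), and the regularization terms that appear exactly at z = 1 are omitted. The 1/z and 1/(1-z) factors encode the strong preference for soft emissions.

```latex
\begin{aligned}
P_{q \to qg}(z) &= C_F \, \frac{1 + z^2}{1 - z}, \\
P_{g \to gg}(z) &= 2 C_A \left[ \frac{z}{1 - z} + \frac{1 - z}{z} + z (1 - z) \right], \\
P_{g \to q\bar{q}}(z) &= T_R \left[ z^2 + (1 - z)^2 \right],
\end{aligned}
\qquad C_F = \tfrac{4}{3}, \quad C_A = 3, \quad T_R = \tfrac{1}{2}.
```
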
By far the biggest challenge for us (and for most CMS jet analyses) was "triggering". I mentioned above that the [[http://opendata.cern.ch/record/5|Jet Primary Dataset]] contains many different kinds of jet configurations, but I didn't explain how exactly those configurations were chosen. The collision rate at the LHC is so high that there aren't enough computing resources available to process all of the data that is collected. Instead, CMS has a complex system of triggers which reject "uninteresting" events and select "interesting" events. The reason for the scare quotes is that triggers are indeed scary. There is a rather large menu of different possible event configurations which involve jets and other collider objects. If CMS made a mistake in deciding which events were "interesting", then potentially valuable data could be lost forever. On the flip side, if CMS decided that too many events were "interesting", then that could flood their computing systems with a deluge of useless information.

For our final analysis, we had to carefully sew together five different trigger selections, all of which changed over the course of the 2010 run. As an example, one of the triggers was named "HLT_Jet70U". "HLT" stands for "high-level trigger", which is the most sophisticated level of trigger selection. "Jet" means that there was just a single jet object used to define the trigger (even though most of the selected events contain two jets). One might think that "70" would mean that this trigger would select jets with an energy (strictly speaking, "transverse momentum") above 70 GeV, but the "U" means "uncalibrated", such that only when the jet energy was above 150 GeV was "HLT_Jet70U" guaranteed to work as expected. Through a long process of trial and error, we eventually figured out how to properly use the jet trigger information from CMS, which was essential for us to gain confidence in our results.

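The trigger bookkeeping described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the analysis code behind the papers. The naming pattern, the helper names (`parse_trigger`, `assign_trigger`, `keep_event`), and the strategy of splitting the leading-jet transverse momentum into exclusive ranges with one trigger per range are all assumptions. Only the 150 GeV value for HLT_Jet70U comes from the text; any other threshold would have to be determined from the data.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Assumed naming pattern, e.g. "HLT_Jet70U":
# <level>_<object><nominal threshold in GeV><optional "U" = uncalibrated>.
# Illustrative parser only, not an official CMS tool.
TRIGGER_RE = re.compile(r"^(?P<level>[A-Z]+)_(?P<obj>[A-Za-z]+)(?P<gev>\d+)(?P<u>U?)$")


@dataclass(frozen=True)
class TriggerInfo:
    level: str          # "HLT" = high-level trigger
    obj: str            # object used to define the trigger, e.g. "Jet"
    nominal_gev: int    # number in the name, NOT the pT where the trigger is trustworthy
    uncalibrated: bool  # trailing "U" means uncalibrated jet energies


def parse_trigger(name: str) -> TriggerInfo:
    """Split a trigger name such as 'HLT_Jet70U' into its parts."""
    m = TRIGGER_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized trigger name: {name!r}")
    return TriggerInfo(
        level=m["level"],
        obj=m["obj"],
        nominal_gev=int(m["gev"]),
        uncalibrated=m["u"] == "U",
    )


def assign_trigger(leading_jet_pt: float, trusted_pt: dict[str, float]) -> str | None:
    """Hypothetical exclusive-range assignment.

    Each trigger is trusted only above its own threshold, so an event is assigned
    to the trigger with the highest trusted threshold that is still <= its
    leading-jet pT. Returns None if the event lies below every trusted threshold.
    """
    eligible = [(thr, name) for name, thr in trusted_pt.items() if leading_jet_pt >= thr]
    return max(eligible)[1] if eligible else None


def keep_event(leading_jet_pt: float, fired: set[str], trusted_pt: dict[str, float]) -> bool:
    """Keep an event only if its assigned trigger actually fired."""
    name = assign_trigger(leading_jet_pt, trusted_pt)
    return name is not None and name in fired


if __name__ == "__main__":
    # Only this entry comes from the text; other triggers would need measured thresholds.
    TRUSTED_PT_GEV = {"HLT_Jet70U": 150.0}
    print(parse_trigger("HLT_Jet70U"))
    # TriggerInfo(level='HLT', obj='Jet', nominal_gev=70, uncalibrated=True)
    print(assign_trigger(120.0, TRUSTED_PT_GEV))  # None: below the trusted threshold
    print(keep_event(200.0, {"HLT_Jet70U"}, TRUSTED_PT_GEV))  # True
```

Assigning each event to exactly one trigger is one way to avoid double counting where several triggers overlap. A real analysis would also have to handle prescales and a trigger menu that changed from run to run, both of which this sketch ignores.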
Ultimately, once we dealt with these experimental complications, we succeeded at [[http://dx.doi.org/10.1103/PhysRevLett.119.132003|exposing the QCD splitting function]] using the 2010 CMS open data. The results were perfectly in line with our theoretical expectations, providing a direct confirmation of the fractal structure of jets. Armed with this rich open dataset, we also performed a variety of [[https://doi.org/10.1103/PhysRevD.96.074003|additional substructure tests]] that were only possible because of the fantastic performance of the CMS detector. Coming full circle, Aashish presented [[https://indico.cern.ch/event/579660/contributions/2582130/attachments/1494286/2324335/BOOST.pdf|our nearly-final analysis]] at the [[https://indico.cern.ch/event/579660/|BOOST 2017 workshop]] in Buffalo, two years of effort summarized into a 20-minute talk.

==== Learning from the Community ====

While our two publications only list five authors (Aashish, Wei, Andrew, Simone, and myself), our acknowledgements recognize around 40 experimentalists who generously offered their time, advice, and, in some cases, code. Without help from [[https://arts-sciences.buffalo.edu/physics/faculty/salvatore-rappoccio.html|Sal Rappoccio]], we would have struggled to figure out how to extract and apply the proper jet correction factors. Without help from the CMS open data team, including [[https://tuhat.helsinki.fi/portal/en/person/kmlassil|Kati Lassila-Perini]] and [[http://www.desy.de/~geiser/|Achim Geiser]], we would never have figured out how to determine the "integrated luminosity", which tells you how much total data CMS had collected. Whenever I gave talks about our CMS open data effort, experimentalists in the audience would kindly point out some of our "rookie mistakes" (the kind often made by starting experimental PhD students). We also benefitted from having a 2015 summer student, [[https://alexis-romero.weebly.com/|Alexis Romero]] (now a graduate student at the University of California, Irvine), test whether the CMS open data results agreed with those obtained from simulated LHC samples.

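As a reminder of why the integrated luminosity matters, here is the standard textbook relation, added as background and not specific to our analysis. For a process with cross section σ, the expected number of produced events is the cross section times the integrated luminosity, before any trigger efficiency or detector acceptance is applied. Measured event counts can only be converted into cross sections once the integrated luminosity is known.

```latex
N_{\text{events}} = \sigma \, \mathcal{L}_{\text{int}},
\qquad
\mathcal{L}_{\text{int}} = \int \mathcal{L}(t) \, dt .
```
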
Most of the feedback we got from the experimental particle physics community was [[https://twitter.com/KyleCranmer/statuses/913112593715335168|very positive]]. Though there was considerable initial skepticism that a team of 5 theoretical physicists could perform a publishable analysis based on open collider data, much of that skepticism dissipated once it became clear that our analysis was based largely on the same workflow used by CMS. Our analysis is by no means perfect, since there are places where we simply didn't have the information (or the expertise) to address a known shortcoming. But I am proud that we applied a high degree of scrutiny to our own work, even though the final plots in our September 2017 publication are essentially the same as the ones I showed back in August 2015.

There were, however, a number of concerns raised by our work. Unlike analyses performed within CMS, our work did not have to go through the rigorous CMS internal review process. (Our papers were subject to peer review prior to publication, but that standard is not nearly as stringent as the one applied within CMS.) Unlike CMS collaboration members, we did not have to perform service work on the experiment to gain authorship status. (Some of my software tools have been incorporated into the CMS software framework, but that is a relatively small contribution to the overall CMS effort.) These issues are not specific to the CMS open data, though, and arise any time data is released into the public domain. Indeed, there is no guarantee that public data will be used correctly, and there is a risk that making the data public will make it less attractive for an experimentalist to join a collaboration.

In my view, though, the scientific benefits of making data public outweigh the scientific costs. With the CMS open data, there is a 4-5 year time lag between when the data is collected and when it is made public. That time lag helps ensure that open data complements, rather than competes with, the needs of the CMS collaboration. Moreover, open data is a stepping stone towards full archival access, such that even when the LHC is eventually decommissioned, the data will be preserved for future use. By making the data public, there is a chance to perform a back-to-the-future analysis like ours, where 2010 data, released in 2014, is analyzed using a 2015 technique, for publication in 2017.

Interestingly, as we were pursuing our open data analysis, there was an [[https://arxiv.org/abs/1708.09429|official CMS analysis]] on a similar topic. Our analysis was based on proton-proton collisions from 2010, while the CMS analysis was based mostly on lead-lead collisions from 2015. Our analysis was an exploratory study of jet substructure, while the CMS analysis was far more ambitious, using jet substructure to probe the properties of a hot dense state of matter called the quark/gluon plasma. One could cynically say that our analysis was stealing thunder from CMS, but I see these two studies as being synergistic, since we made different analysis choices that led to complementary physics insights. In this way, open data can enrich the dialogue between the theoretical and experimental particle physics communities.

==== Broadening the Open Data Effort ====

In addition to performing our own analyses, we are trying to make it easier for others to work with the CMS open data. For our jet substructure work, we found it beneficial to take the original CMS open data released in AOD format ("Analysis Object Data") and distill it into a simpler MOD format (short for "modified", backronymed to "MIT Open Data"). Because MOD files contain a strict subset of the AOD information, it helped us expedite the analysis workflow as well as avoid common pitfalls. Aashish developed two GitHub repositories to [[https://github.com/tripatheea/MODProducer/|produce]] and [[https://github.com/tripatheea/MODAnalyzer/|analyze]] MOD files, and this code could be the basis for subsequent CMS open data studies. (That said, I do not recommend trying to use these tools in their present forms, since we are actively working to simplify them and make them more portable.)

In April 2016, the CMS experiment released the [[https://home.cern/about/updates/2016/04/cms-releases-new-batch-lhc-open-data|second batch]] of open data from 2011 proton-proton collisions. This 2011 data set is far richer than the 2010 release, since it contains many more event categories as well as more information about detector performance. I have gathered a new team of theorists to work with the 2011 data, and I hope to report on that work sometime next year. Compared to our study with the 2010 open data, our upcoming analysis is simultaneously simpler (since it doesn't directly involve jets) and more complex (since we are digging into more rarefied collision properties). I don't want to reveal the specific topic of our study, though, since short-term secrecy is sometimes needed to enable long-term openness (cf. the 4-5 year time lag for the CMS open data release).

Eventually, the CMS experiment will release the third batch of open data from 2012, with, hopefully, enough information to reproduce the monumental [[https://press.cern/press-releases/2012/07/cern-experiments-observe-particle-consistent-long-sought-higgs-boson|discovery of the Higgs boson]].

Beyond the CMS open data, I am also looking for ways to use archival data from the [[https://hep-project-dphep-portal.web.cern.ch/sites/hep-project-dphep-portal.web.cern.ch/files/archive_data.pdf|ALEPH experiment]]. ALEPH was one of the four main experiments at the former Large Electron-Positron (LEP) collider at CERN. LEP closed in 2000 so that the tunnel could be reused for the LHC. With the help of ALEPH collaboration member [[https://www.rd-alliance.org/users/mmaggi|Marcello Maggi]], we are taking ALEPH data from the 1990s and applying jet substructure techniques that weren't even conceived of until 2008. While LEP data is very different from LHC data, I expect some of the lessons from our archival LEP studies to inform ongoing analyses at the LHC.

| Line 83: | Line 86: | ||
| When I first started working with the CMS open data, people would often ask me why I didn't just join CMS. After all, instead of trying to lead a small group of theorists with no experimental experience, I could have leveraged the power and insights of a few-thousand-person collaboration.  This is true... if my only goal was to perform one specific jet substructure analysis. | When I first started working with the CMS open data, people would often ask me why I didn't just join CMS. After all, instead of trying to lead a small group of theorists with no experimental experience, I could have leveraged the power and insights of a few-thousand-person collaboration.  This is true... if my only goal was to perform one specific jet substructure analysis. | ||
| - | But what about more exploratory studies where the theory hasn't yet been invented?  What about engaging undergraduate students who haven't yet decided if they want to pursue theoretical or experimental work? What about examining old data for signs of new physics?  What about citizen-scientists who might not have world experts on [[http://web.mit.edu/lns/research/particle.html|proton-proton]] and [[http://web.mit.edu/mithig/|lead-lead]] collisions over in the next building?  And what happens if I have a great new theoretical idea after the LHC has already shut down? These were the questions that motivated me to dig into the CMS open data, and I hope that they might motivate some of you to take a look as well. | + | But what about more exploratory studies where the theory hasn't yet been invented?  What about engaging undergraduate students who haven't decided if they want to pursue theoretical or experimental work? What about examining old data for signs of new physics?  What about citizen-scientists who might not have world experts on [[http://web.mit.edu/lns/research/particle.html|proton-proton]] and [[http://web.mit.edu/mithig/|lead-lead]] collisions in the building next door? And what happens if I have a great new theoretical idea after the LHC has already shut down? These were the [[https://indico.cern.ch/event/639314/contributions/2721635/attachments/1540724/2415986/jthaler_Fermilab2017_OpenData.pdf|questions that motivated me]] to dig into the CMS open data, and I hope that they might motivate some of you to take a look as well. Our two publications are a proof of principle that open collider analyses are feasible and potentially impactful. | 
| - | Ultimately, physics is an experimental science, and Sal's aphorism that "data makes you smarter" is true at the highest level.  It is true that theoretical insights have played a crucial role in solidifying the principles of fundamental physics.  But almost everything we know for certain about the universe has originated from centuries of keen observations and detailed measurements.  Without experimental data, physical principles would be mere speculations.  With experimental data, we have an opportunity to expose the deepest structures of the universe... not just by scribbling on a chalkboard but by smashing together particles at ever-increasing energies. | + | Ultimately, physics is an experimental science, and the aphorism "data makes you smarter" holds at the most foundational level.  It is true that theoretical insights have played a crucial role in solidifying the principles of fundamental physics.  But almost everything we know for certain about the universe has originated from centuries of keen observations and detailed measurements.  Without experimental data, physical principles would be mere speculations.  With experimental data, we have an opportunity to expose the deepest structures of the universe... not just by scribbling on a chalkboard but by smashing together particles at ever-increasing energies. | 
| - | When you decide to jump into the CMS open data yourself---and I hope you do---you will be confronted with this question:  [[http://opendata.cern.ch/getting-started/CMS|"I have installed the CERN Virtual Machine: now what?"]]  However you answer this question, I'm sure that you are going to learn something.  And hopefully, you will teach the rest of us something, too. | + | When you decide to jump into the CMS open data yourself (and I hope you do), you will be confronted with this question:  [[http://opendata.cern.ch/getting-started/CMS|"I have installed the CERN Virtual Machine: now what?"]]  However you answer this question, I am sure that you are going to learn something.  And hopefully, you will teach the rest of us something, too. |