This shows you the differences between two versions of the page.
playground:cmstest [2017/11/28 13:30] jthaler |
playground:cmstest [2017/11/28 14:02] (current) jthaler |
||
---|---|---|---|
Line 60: | Line 60: | ||
==== Learning from the Community ==== | ==== Learning from the Community ==== | ||
- | While our two publications only list 5 authors (Aashish, Wei, Andrew, Simone, and myself), our acknowledgements recognize around 40 experimentalists who generously offered their time, advice, and, in some cases, code. Without help from [[https://arts-sciences.buffalo.edu/physics/faculty/salvatore-rappoccio.html|Sal Rappoccio]], we would have struggled to figure out how the extract the proper jet correction factors. Without help from the CMS open data team, including [[https://tuhat.helsinki.fi/portal/en/person/kmlassil|Kati Lassila-Perini]] and [[http://www.desy.de/~geiser/|Achim Geiser]], we would have never figured out how to determine the "integrated luminosity", which tells you how much total data CMS had collected. Whenever I gave talks about our CMS open data effort, experimentalists in the audience would kindly point out some of our "rookie mistakes" (often made by starting experimental PhD students). We also benefitted from having a 2015 summer student [[https://alexis-romero.weebly.com/|Alexis Romero]] (now a graduate student at University of California, Irvine) test whether the CMS open data results agreed with those obtained from simulated LHC samples. | + | While our two publications only list 5 authors (Aashish, Wei, Andrew, Simone, and myself), our acknowledgements recognize around 40 experimentalists who generously offered their time, advice, and, in some cases, code. Without help from [[https://arts-sciences.buffalo.edu/physics/faculty/salvatore-rappoccio.html|Sal Rappoccio]], we would have struggled to figure out how to extract and apply the proper jet correction factors. Without help from the CMS open data team, including [[https://tuhat.helsinki.fi/portal/en/person/kmlassil|Kati Lassila-Perini]] and [[http://www.desy.de/~geiser/|Achim Geiser]], we would have never figured out how to determine the "integrated luminosity", which tells you how much total data CMS had collected. 
Whenever I gave talks about our CMS open data effort, experimentalists in the audience would kindly point out some of our "rookie mistakes" (often made by starting experimental PhD students). We also benefitted from having a 2015 summer student [[https://alexis-romero.weebly.com/|Alexis Romero]] (now a graduate student at University of California, Irvine) test whether the CMS open data results agreed with those obtained from simulated LHC samples. |
Most of the feedback we got from the experimental particle physics community was [[https://twitter.com/KyleCranmer/statuses/913112593715335168|very positive]]. Though there was considerable initial skepticism that a team of 5 theoretical physicists could perform a publishable analysis based on open collider data, much of that skepticism dissipated once it became clear that our analysis was based largely on the same workflow used by CMS. Our analysis is by no means perfect, since there are places where we simply didn't have the information (or the expertise) to address a known shortcoming. But I am proud that we applied a high degree of scrutiny to our own work, even though the final plots in our September 2017 publication are essentially the same as the ones I showed back in August 2015. | Most of the feedback we got from the experimental particle physics community was [[https://twitter.com/KyleCranmer/statuses/913112593715335168|very positive]]. Though there was considerable initial skepticism that a team of 5 theoretical physicists could perform a publishable analysis based on open collider data, much of that skepticism dissipated once it became clear that our analysis was based largely on the same workflow used by CMS. Our analysis is by no means perfect, since there are places where we simply didn't have the information (or the expertise) to address a known shortcoming. But I am proud that we applied a high degree of scrutiny to our own work, even though the final plots in our September 2017 publication are essentially the same as the ones I showed back in August 2015. | ||
Line 68: | Line 68: | ||
In my view, though, the scientific benefits of making data public outweigh the scientific costs. With the CMS open data, there is a 4-5 year time lag between when the data is collected and when it is made public. That time lag helps ensure that open data complements, rather than competes with, the needs of the CMS collaboration. Moreover, open data is a stepping stone towards full archival access, such that even when the LHC is eventually decommissioned, the data will be preserved for future use. By making the data public, there is a chance to perform a back-to-the-future analysis like ours, where 2010 data, released in 2014, is analyzed using a 2015 technique, for publication in 2017. | In my view, though, the scientific benefits of making data public outweigh the scientific costs. With the CMS open data, there is a 4-5 year time lag between when the data is collected and when it is made public. That time lag helps ensure that open data complements, rather than competes with, the needs of the CMS collaboration. Moreover, open data is a stepping stone towards full archival access, such that even when the LHC is eventually decommissioned, the data will be preserved for future use. By making the data public, there is a chance to perform a back-to-the-future analysis like ours, where 2010 data, released in 2014, is analyzed using a 2015 technique, for publication in 2017. | ||
- | Interestingly, as we were pursuing our open data analysis, there was an [[https://arxiv.org/abs/1708.09429|official CMS analysis]] on a similar topic. Our analysis was based on proton-proton collisions from 2010, while the CMS analysis was based mostly on lead-lead collisions from 2015. Our analysis was an exploratory study of jet substructure, while the CMS analysis was far more ambitious, using jet substructure to probe the properties of a hot dense state of matter called the quark/gluon plasma. One could cynically say that our analysis was stealing thunder from CMS, but I see these two studies as being synergistic, since we made different analysis choices that led to complimentary physics insights. In this way, open data can enrich the dialogue between the theoretical and experimental particle physics communities. | + | Interestingly, as we were pursuing our open data analysis, there was an [[https://arxiv.org/abs/1708.09429|official CMS analysis]] on a similar topic. Our analysis was based on proton-proton collisions from 2010, while the CMS analysis was based mostly on lead-lead collisions from 2015. Our analysis was an exploratory study of jet substructure, while the CMS analysis was far more ambitious, using jet substructure to probe the properties of a hot dense state of matter called the quark/gluon plasma. One could cynically say that our analysis was stealing thunder from CMS, but I see these two studies as being synergistic, since we made different analysis choices that led to complementary physics insights. In this way, open data can enrich the dialogue between the theoretical and experimental particle physics communities. |
Line 86: | Line 86: | ||
When I first started working with the CMS open data, people would often ask me why I didn't just join CMS. After all, instead of trying to lead a small group of theorists with no experimental experience, I could have leveraged the power and insights of a few-thousand-person collaboration. This is true... if my only goal was to perform one specific jet substructure analysis. | When I first started working with the CMS open data, people would often ask me why I didn't just join CMS. After all, instead of trying to lead a small group of theorists with no experimental experience, I could have leveraged the power and insights of a few-thousand-person collaboration. This is true... if my only goal was to perform one specific jet substructure analysis. | ||
- | But what about more exploratory studies where the theory hasn't yet been invented? What about engaging undergraduate students who haven't decided if they want to pursue theoretical or experimental work? What about examining old data for signs of new physics? What about citizen-scientists who might not have world experts on [[http://web.mit.edu/lns/research/particle.html|proton-proton]] and [[http://web.mit.edu/mithig/|lead-lead]] collisions in the building next door? And what happens if I have a great new theoretical idea after the LHC has already shut down? These were the questions that motivated me to dig into the CMS open data, and I hope that they might motivate some of you to take a look as well. Our two publications are a proof of principle that open collider analyses are feasible and potentially impactful. | + | But what about more exploratory studies where the theory hasn't yet been invented? What about engaging undergraduate students who haven't decided if they want to pursue theoretical or experimental work? What about examining old data for signs of new physics? What about citizen-scientists who might not have world experts on [[http://web.mit.edu/lns/research/particle.html|proton-proton]] and [[http://web.mit.edu/mithig/|lead-lead]] collisions in the building next door? And what happens if I have a great new theoretical idea after the LHC has already shut down? These were the [[https://indico.cern.ch/event/639314/contributions/2721635/attachments/1540724/2415986/jthaler_Fermilab2017_OpenData.pdf|questions that motivated me]] to dig into the CMS open data, and I hope that they might motivate some of you to take a look as well. Our two publications are a proof of principle that open collider analyses are feasible and potentially impactful. |
- | Ultimately, physics is an experimental science, and the aphorism "data makes you smarter" holds at the most foundational level. It is true that theoretical insights have played a crucial role in solidifying the principles of fundamental physics. But almost everything we know for certain about the universe has originated from centuries of keen observations and detailed measurements. Without experimental data, physical principles would be mere speculations. With experimental data, we have an opportunity to expose the deepest structures of the universe... not just by scribbling on a chalkboard but also by smashing together particles at ever-increasing energies. | + | Ultimately, physics is an experimental science, and the aphorism "data makes you smarter" holds at the most foundational level. It is true that theoretical insights have played a crucial role in solidifying the principles of fundamental physics. But almost everything we know for certain about the universe has originated from centuries of keen observations and detailed measurements. Without experimental data, physical principles would be mere speculations. With experimental data, we have an opportunity to expose the deepest structures of the universe... not just by scribbling on a chalkboard but by smashing together particles at ever-increasing energies. |
- | When you decide to jump into the CMS open data yourself---and I hope you do---you will be confronted with this question: [[http://opendata.cern.ch/getting-started/CMS|"I have installed the CERN Virtual Machine: now what?"]] However you answer this question, I'm sure that you are going to learn something. And hopefully, you will teach the rest of us something, too. | + | When you decide to jump into the CMS open data yourself (and I hope you do), you will be confronted with this question: [[http://opendata.cern.ch/getting-started/CMS|"I have installed the CERN Virtual Machine: now what?"]] However you answer this question, I am sure that you are going to learn something. And hopefully, you will teach the rest of us something, too. |