A tenured academic fights back against the theft of her team’s work.
Many in the world of cybersecurity have a deep respect for the work of others and know we stand on the shoulders of giants. Advancing an entire discipline, especially one as young as this, requires extensive cross-collaboration and the ability to regularly rely on the work of others.
Professionals in this space know that work product which did not originate with them, academic or otherwise, must be appropriately attributed in their own works.
However, not all colleagues share the same respect for one another.
Plagiarism, for which most claimants have no recourse, runs rampant throughout the cybersecurity industry and anecdotally seems to impact women and minorities disproportionately.
So what happened?
SecurityWeek and various other outlets, such as Bleeping Computer, Dark Reading, The Washington Post, Bloomberg, USA Today, Business Insider, and Financial Times, have written directly about or cited exhaustive research performed by Temple University academics to track and compile details of ransomware attacks on critical infrastructure.
The work is led by Dr. Aunshul Rege, associate professor and director of the CARE Lab at Temple University. The project’s main contributor is Rachel Bleiman, a PhD student in Temple’s Criminal Justice program.
Here is how they described their work to this writer (emphasis mine):
“Rachel and I started with the goal of just providing a FREE dataset based on open-source information for the academic community. My students and I know how hard it is to find datasets that are free in aggregate form without any MOU or NDA requirements that allow us to use data freely for research, so we decided to create a dataset ourselves. We wanted to help educators and students. We were collecting data for my NSF CAREER grant anyway, so we decided to rehash it and share it with the academic community. Despite our initial intention of serving the academic community, today, our dataset also serves the industry, government, nonprofits, and journalists.”
The sheer effort involved with developing, maintaining, and continuing to obtain funding for this work seems quite impressive:
“We started this effort in 2019 and initially we had to put in a lot of work with deciding on the variables, defining codebooks, etc.
We then got a lot of feedback from the community and had to revamp the dataset for major revisions, such as mapping on to the MITRE ATT&CK framework (Strain IDs), recording modifications with each iteration, and welcoming publicly reported incidents from the community, which were not captured in our dataset. We also had to do a lot of internal reflection on improving the dataset, such as adding geographic locations, ranking of ransom amounts and attack durations, and adding a secondary critical infrastructure that was impacted by the attack. We try to release an updated version every month to share with the community, so there are regular updates and of course the task of responding to requests.
We don’t post the (free) dataset because we like to know how it benefits the community — we want to know who (industry, government, educator, student…) and why (research, threat intelligence, class project…).
This helps us report back to our funder, the National Science Foundation, which hopefully helps us secure funding in the future and support wonderful and hard-working students like Rachel.”
How did this dataset end up misattributed?
The dataset was officially requested from these researchers by an industry businessman, who promptly changed the attribution to the nonprofit cybersecurity organization he founded and then casually posted the data in a community Telegram chat.
From there, it was picked up and reposted by a Twitter user (I will mostly refer to them as “Reposter” to avoid any unnecessary pile-ons), and subsequently discovered on Twitter by the project’s primary researcher, Dr. Rege, causing her and her team significant amounts of stress.
There is no excuse for this.
The project site, hosted at Temple University, provides a form to request the dataset. The site and the email sent to approved requestors both specify that obtaining the dataset comes with a requirement for attribution, and both provide a format for citing the research.
This is how the form looks (red underlined annotations mine):
It seems pretty clear that this individual understood how the dataset should be handled. In effect, he consented to these stipulations twice: once upon request, and once upon receipt.
As an added annoyance, when the individual requested the material on 3/13 as shown in the screenshots above, they referred to Dr. Rege using male pronouns despite citing their previous correspondence. They clearly didn’t bother to Google her prior to stealing her work. Despite this act of erasure, Dr. Rege replied kindly on 3/29 to provide the dataset and quite generously did not mention the misgendering via form submission.
The plagiarist, Alon Refaeli, a man who is well established in his career, is the founder of “CyberTogether”, an organization with “250 partners” including Akamai, BlackBerry, and Palo Alto Networks which purports to “promote Israeli cyber security innovation”.
Mr. Refaeli describes himself as a “Talented and well connected global business enabler specializes in marketing & sales for Fortune 500 companies in the information security sector. […] the heart & soul of Cyber Together [sic]”
Needless to say, plagiarism is not innovative.
Mr. Refaeli then emailed Dr. Rege in an attempt to convince her to take her tweets down. He also mentioned CyberTogether’s legal counsel, Nimrod Tauber, stating that Dr. Rege’s tweets are “a damage to our name and reputation.”
Based on Dr. Rege’s screenshots below, he claimed it wasn’t him, he had no idea how any of this happened, and that the situation had nothing to do with his organization.
Meanwhile, the dataset’s Reposter publicly apologized to Dr. Rege and her team, saying they knew nothing of the source of the data, as it had been sent into a community Telegram chat:
After Dr. Rege bravely posted about this apparent act of shameless plagiarism, some infosec professionals came out in her team’s defense, with Dr. Rege’s original tweet gaining more than 450 likes as of this writing.
As for how we learned where the information was originally posted?
Iftach Ian Amit, the Chief Security Officer at cybersecurity firm Rapid7, stood up for Dr. Rege and her team.
Mr. Amit shared an additional screenshot of his own research showing Mr. Refaeli posting the message into the unofficial DC9723 chat on Telegram. You see, Mr. Amit founded DC9723, which is the Tel Aviv DEFCON group, boasting more than 10,000 members online and free monthly meetings which see 80–100 participants on average.
Mr. Amit initially noticed a posting in the organization’s Facebook group referencing the dispute on Twitter. He was able to track the misattributed data back to the unofficial Telegram group for DC9723, where it had been posted from an account named Alon Refaeli, which matches the name of the person who requested the data from the researchers.
Mr. Amit reviewed the file and confirmed the datasets were the same, with the attribution modified to reflect Mr. Refaeli’s organization, CyberTogether.
Since the owner wants to delete it for some reason…
…I have copied and posted it in unmodified form for posterity:
Why did Mr. Amit feel compelled to act? In his own words:
Having been around the security industry for a bit, I wanted to dig in and make sure what’s really going on, reviewed the file that was posted by Alon (clearly named CyberTogether, and in it the only credit was given to CyberTogether), and compared it to the data from Temple University.
After reaching out to Alon personally, he claimed he had permission to share the file, based on sharing data with the researchers in the past. He then shifted the blame to someone else, claiming that they should have merged this file with additional data they contributed and it was a mere oversight. I found the explanation lacking, especially as the sharing was done from his personal account and even if he didn’t look at the content, the mere file name should have raised flags when sharing it without any credit.
As someone who’s been in the industry for over 25 years, and who has put out research (and referenced other researchers’ materials) — this topic is close to my heart, as the only thing a researcher can get from their hard work is a simple credit and a citation (in academic settings).
Clearly this irks me, as I’d hate for a few bad apples to have an impact on how the industry is perceived.
So, is this really a big deal?
The hacker community has often stated “information wants to be free” (https://en.wikipedia.org/wiki/Information_wants_to_be_free) to describe a particular philosophical attitude against limitations on transparency and open information sharing. However, distributing that knowledge without attribution diminishes it by separating it from its source. The source carries critical provenance details that people relying on the dataset may need in order to verify its integrity.
Plagiarism involving researched infosec datasets upon which critical judgments may be made is only a few modifications away from becoming a supply chain vulnerability for an organizational security program. Or, even worse in today’s media environment: a supporting element for someone’s disinformation campaign.
Additionally, while the intent behind plagiarism is often to boost one’s reputation, the effect reverses itself once your community discovers your misdeeds. Don’t waste your own time claiming the work of others. If you have no original research to share, the better way to elevate yourself is to elevate others. If you found someone’s work valuable, promote it with proper attribution. It only takes a few minutes to run a Google search to verify your information, and it’s the smart thing to do.
What is the impact on those whose work is plagiarized?
Here are a few further comments from Dr. Aunshul Rege and Ms. Rachel Bleiman at Temple University (emphasis mine):
“Rachel and I are appalled by the brazen intellectual property theft. We have both worked very hard to develop a useful and free resource, and honestly, we just feel violated. It is insulting and demoralizing. It goes against the very essence of what we are trying to accomplish, which is to give back to the community. As an educator, I keep telling my students to give credit where credit is due. As awful as this may seem, this is a valuable learning experience for both of us, especially for Rachel as she starts her career in another few years.
In terms of what must be done to set things right, we think the infosec community has already taken that first step by supporting us greatly and calling out the wrongdoers. However, they are still hosting our dataset and not giving us the recognition that we deserve. We even provided the citation — all they needed to do was use it and acknowledge that it’s our work.”