A solution to the model attribution challenge
Introduction
The Machine Learning Model Attribution Challenge provided 12 open-source language models
and 12 models that were finetuned from them. Full access was given to the base models, but the
finetuned models were only accessible through queries, mimicking a possible “scenario in which
a naive adversary has stolen a large language model and finetuned it for a specific task”.
Contestants were asked to “connect the dots” and attribute each finetuned model to its
respective base model. The challenge concluded on September 16th; my submission placed
3rd, attributing 6 of the 12 models correctly after 244 queries.
Base model prompting
Since the base models were listed in the submission form before the event started, I
decided to experiment with them and familiarise myself with their behaviour. Their training data
and generations showed some notable quirks, mostly visible with zero-shot prompts.
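Concretely, probing a base model zero-shot amounted to something like the sketch below, using the transformers text-generation pipeline. This is a minimal illustration rather than the exact script I used; the model ID is only an example, and any of the 12 base models can be substituted.

from transformers import pipeline

# Minimal sketch of zero-shot prompting a base model locally.
# "bigscience/bloom-560m" is an illustrative model ID, not necessarily
# the exact BLOOM checkpoint used in the challenge.
generator = pipeline("text-generation", model="bigscience/bloom-560m")

prompt = "phở là"  # Vietnamese test prompt, as in the example below
result = generator(prompt, max_new_tokens=40, do_sample=True)
print(result[0]["generated_text"])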
Both BLOOM models were trained on multilingual text beyond English, and showed much
greater coherence on my test case (Vietnamese) than the rest.
“phở là món ăn được nhiều người yêu thích. Tuy nhiên, không phải ai cũng biết cách
translation: “pho is a dish favoured by many. However, not everyone knows how to…”
The CodeGen-350M-multi model was trained specifically on GitHub code, and output code
(functions, commented lines, licence headers, …) for almost every input.
“Once upon a time, this is the main loop that would be used. If you're going to use the main loop
for some other purpose, just do ``run()`` instead of this. ``run()`` will have its own event loop
ifand I am learning in
* Java and I want to learn how do know that we will be using Java,
* I create a client that uses the Java client API. With this server
* you create a client for
“# rint '', '', '', c, s.
<|python|> self.assertEqual(sess.adversary_id, 'fake_id')
sess.request.get = web
The XLNet-base-cased model was trained specifically to generate long text, and was prone to
repetition.
the best way to get in the door is by ordering the right kind of food and a proper diet... When no
one bothers to get in the door, you want to eat a healthy lunch or dinner. It will be important to
have a big lunch with plenty of calories in the form of a decent meal and good water. Once you
are out, try to have a nice breakfast of something to go with your lunch. The good thing about
eating a healthy breakfast you can never have a bad lunch. In addition, if you do have a bad
lunch, make sure that you eat it. The good things about eating healthy are that you can never
have a bad lunch and that you don’t have to pay a lot of money to eat. The good thing about
eating healthy is that you don’t have to pay a lot of money, that you don’t have to spend a lot of
money on food and that you don’t have to do a lot of shopping, that you don’t have to spend a lot
of money on car (that you can go shopping from anywhere you want, that you don’t have to pay
too much money for fuel), that you don’t have to pay too
The DialoGPT model was trained on dialogue data, and its generations were conversational,
like a chatbot’s:
> hi
>> hi
> how are you
>> i’m good, you?
The Multilingual model is apparently a feature extractor, and I was not able to generate text from
it.
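This is roughly why that model yields no text: under the feature-extraction pipeline it returns token embeddings rather than generated tokens. The exact multilingual checkpoint is not named above, so the sketch below uses bert-base-multilingual-cased purely as a stand-in.

from transformers import pipeline

# Sketch only: a feature-extractor model returns embeddings, not text.
# "bert-base-multilingual-cased" is an illustrative stand-in; it may not be
# the exact multilingual checkpoint from the challenge.
extractor = pipeline("feature-extraction", model="bert-base-multilingual-cased")

features = extractor("hello world")
# features is a nested list: [batch][tokens][hidden_dim] - no generated text.
print(len(features[0]), len(features[0][0]))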
Finetuned model prompting
I began by checking their outputs to see if there were any obvious correlations.
model: 7
{'generated_text': '# -*- coding: utf-8 -*-\r\n'
'"""\r\n'
'***************************************************************************\r\n'
'* '
'*\r\n'
'* This program is free software; you can redistribute it '
'and/or modify '}
model: 7
{'generated_text': 'Hi. (The "Greeting" for most\n'
' users. May be changed to adapt to more '
'third-party users.)\n'
'\n'
' If `locale` is absent or None, this returns '
'unmodified `greeting`.'}
This was likely the previously mentioned CodeGen-350M-multi.
model: 10
{'generated_text': 'the best way to do your art/ craft may be as visually '
'attractive as it can be highly visualally effective by '
"you. If you don't believe aesthetically attractive as your "
'craft can be visuallyally striking as visual as they are '
'visually visually visually. Then if can your design '
'visually presentally most visually striking as visually '
'attractive as clearly as as visually stunning visually '
'visually visually visually visually visually aesthetically '
'visually visually visually visually visually visually '
'visually visually visually visually visually visually '
'visually visually visually visually visually visually '
'visually visually visual particularly visually visually '
'visually visually visually visually visually visually '
'visually visually visually visually visually visionary '
'visual visually visually visually visually visually poetic '
'visually visually creative visually visually artistic '
'visually visual inspirational visually visionary visually '
'literary visual visually visual boldly architectural '
'visually visual visually visionary photographic artistic '
'creative visual art visionary architectural boldly '
'visionary visual visionary visually conceptual boldly '
'compelling imaginative artistic visually bold poetic '
'visually visionary visionary visual visually visionary '
'visionary vividally visionary architectural visually '
'boldly inspirational contemporary visionary visually '
'visionary visionary bold visionary brilliant creative '
'visual creative conceptual visionary brilliant bold '
'visionary visionary visual conceptual progressive artistic '
'design visionary forward visionary vision visionary '
'visionary visionary visual visionary visionary creative '
'visionary brilliant brilliant vision bold visionary '
'visionary visionary visionary storytelling visionary '
'visionary brilliant creative visionary artistic design '
'visionary artistic visionary insight visionary visual '
'visionary radical architectural visionary inspirational '
'pioneering visionary narrative visionary brilliant '
'visionary visionary visionary'}
model: 10
{'generated_text': '# Ancient Ancient Ancient Ancient Ancient Ancient Ancient '
'Ancient Ancient Ancient Ancient Ancient Ancient Ancient '
'Ancient Ancient Ancient Ancient Ancient Ancient Ancient '
'Ancient Ancient Ancient Ancient Ancient Ancient Ancient '
'Ancient Ancient Ancient Ancient Ancient Ancient Ancient '
'Ancient Ancient Ancient Ancient Ancient Ancient Ancient '
'Ancient Ancient Ancient Ancient Ancient Ancient Ancient '
'Ancient Ancient Ancient Ancient Ancient Ancient Ancient '
'Ancient Ancient Ancient Ancient Ancient Ancient Ancient '
'Ancient Ancient Ancient Ancient Ancient Ancient Ancient '
'Ancient Ancient Ancient Ancient Ancient Ancient Ancient '
'Ancient Ancient Ancient Ancient Ancient Ancient Ancient '
'Ancient Ancient Ancient Ancient Ancient Ancient Ancient '
'Ancient Ancient Ancient Ancient Ancient Ancient Ancient '
'Ancient Ancient Ancient Ancient Ancient Ancient Ancient '
'Ancient Ancient Ancient Ancient Ancient Ancient Ancient '
'Ancient Ancient Ancient Ancient Ancient Ancient Ancient '
'Ancient Ancient Ancient Ancient Ancient Ancient Ancient '
'Ancient Ancient Ancient Ancient Ancient Ancient Ancient '
'Ancient Ancient Ancient Ancient Ancient Ancient Ancient '
'Ancient Ancient Ancient Ancient Ancient Ancient Ancient '
'Ancient Ancient Ancient Ancient Ancient Ancient Ancient '
'Ancient Ancient Ancient Ancient Ancient Ancient Ancient '
'Ancient Ancient Ancient Ancient Ancient College Ancient '
'Ancient Ancient Ancient Ancient Ancient Ancient Ancient '
'Ancient Ancient Ancient Ancient Ancient Ancient My '
'Reading? It was Your Guide?? Your Thought What Reading/ '
'Series For Your Reading?/ Your Personal Thought Through '
'What Reading It It Was Your Sense Of Reading// Kind Of '
'Reading Your Reading What Kind Of Reading Where Type Of '
'Reading Quo Where Reading Reading Which Kind What Type Of '
'Reading Reading? If Where Which Quo Which Type Of Reading'}
This was the only model that consistently generated long, repetitive text, very similar to XLNet.
After about a dozen queries I noticed that the loading time for each model differed greatly: some
took as many as 12 attempts to load, while most took around 2. Since each attempt waited
20 seconds, prompt variety was clearly not the main cause; the models themselves were. And as
they are all hosted on Hugging Face and should have comparable connection speeds, the most
likely cause was model size, which is indicated by the parameter count in either the model’s
name or its model card.
Model loading attempts in Colab:
{'status': 'failed', 'result': {'error': 'currently loading', 'estimated_time': 125.36245727539062},
'queries': {'0': 13, '1': 9, '6': 6, '2': 6, '3': 6, '4': 6, '5': 6, '7': 4, '8': 6, '9': 7, '10': 5, '11': 6}}
attempt 1/10; waiting for 20 seconds
{'status': 'failed', 'result': {'error': 'currently loading', 'estimated_time': 125.36245727539062},
'queries': {'0': 13, '1': 9, '6': 6, '2': 6, '3': 6, '4': 6, '5': 6, '7': 4, '8': 6, '9': 7, '10': 5, '11': 6}}
attempt 2/10; waiting for 20 seconds
{'status': 'failed', 'result': {'error': 'currently loading', 'estimated_time': 125.36245727539062},
'queries': {'0': 13, '1': 9, '6': 6, '2': 6, '3': 6, '4': 6, '5': 6, '7': 4, '8': 6, '9': 7, '10': 5, '11': 6}}
attempt 3/10; waiting for 20 seconds
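The retry loop behind those logs looked roughly like the sketch below. query_model is a hypothetical stand-in for the challenge's query API, whose exact interface is not reproduced here; the point is simply counting how many 20-second waits each target model needs before it responds.

import time

def count_load_attempts(model_id, prompt, query_model, max_attempts=10, wait=20):
    """Retry a query until the hosted model finishes loading.

    `query_model(model_id, prompt)` is a placeholder for the challenge's
    query API, assumed to return a dict like the logs above, e.g.
    {'status': 'failed', 'result': {'error': 'currently loading', ...}}.
    """
    for attempt in range(1, max_attempts + 1):
        response = query_model(model_id, prompt)
        if response.get("status") != "failed":
            return attempt, response          # model answered
        print(f"attempt {attempt}/{max_attempts}; waiting for {wait} seconds")
        time.sleep(wait)
    return max_attempts, None                 # never loaded within the budget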
Model 9 always took the longest to load, usually breaking the loop I used to query. Model
2 ranked second, averaging around 7 attempts. After finding out that the two biggest base
models were GPT-J-6B and BLOOM-2b5, I tested them with multilingual prompts.
model: 2
{'generated_text': 'Cá ngựa là một trong những món ăn vặt được ưa chuộng nhất '
'hiện nay. Không chỉ bởi hương'}
translation: “seahorse is one of the most popular snacks these days. Not only because of their
taste
Aside from the inaccurate content, this was the only generation that had any sense of structure
and context. Combined with the fact that this model was unlikely to have been finetuned on
multilingual data (it only produced English unless prompted otherwise), BLOOM-2b5 was the
best fit.
I then tried sentiment analysis with a video game review. Model 2 and model 5 gave the same
result.
{'generated_text': 'I agree with the criticism of WP, however. That place had '
'me tearing my hair out.\n'
'\n'
'Aside from being a pretty sudden departure from most of '
"the rest of the game, it's just annoying to deal with. It "
'relies on an infuriating trial and error system in some '
'areas, and is just plain frustrating in others. The fact '
'that its mandatory for two of the three endings makes it '
'even worse.\n'
'\n'
"It's not bad enough to make the game bad, but it's boring, "
'bland and frustrating low point of the game.\n'
'\n'
'The review thinks that the game is boring'}
Given the similarity, and the fact that the base BLOOM model also generated this in many
instances, model 5 was probably BLOOM-350m.
Additionally, the same prompt returned “icky” for models 9 and 11. The same behaviour appeared
in every one of the GPT-based base models, fitting my hypothesis that model 9 is GPT-J.
Following that was a one-shot prompt with a numbered list.
model: 4
{'generated_text': 'Millenium math problems include: 1. The Riemann Hypothesis '
'2. The Second Law of Thermodynamics 3. The conservation '
'of mass 4. The equation for the expansion of the universe '
'5. The speed of light To solve for'}
Certainly unique, as all the other models returned either “2.” followed by an unrelated word or
short phrase, or did not respond at all. Checking against all the base models, gpt2-xl was the
most similar, with these results:
{'generated_text': 'Millenium math problems include: 1. The Riemann '
'Hypothesis(1942), 2. The Born rule(1956), 3. '
'The Schur algebra(1950), 4. The Kaluza-'},
{'generated_text': 'Millenium math problems include: 1. The Riemann '
"Hypothesis 2. Inverse Euler's Formula 3. "
'Hypergeometric Sequences 4. Hahn-Banach Geometry'
'5. Hyper'},
This behaviour was seen only once elsewhere, in distilgpt2. I suspected it was the size of
gpt2-xl (~1.5 billion parameters) that made this possible, as smaller models like gpt-neo or opt
only gave “2.”, and gpt2 went on tangents explaining the Riemann hypothesis.
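For reference, the base-model comparison amounted to running the same one-shot prompt through each base model locally, roughly as in the sketch below. The model IDs listed are examples, not the full challenge set.

from transformers import pipeline

# Sketch: feed the same one-shot prompt to several local base models and
# compare how each of them continues the numbered list.
prompt = "Millenium math problems include: 1. The Riemann Hypothesis 2."
base_models = ["gpt2", "gpt2-xl", "distilgpt2", "EleutherAI/gpt-neo-125M"]

for model_id in base_models:
    generator = pipeline("text-generation", model=model_id)
    output = generator(prompt, max_new_tokens=30, do_sample=True)
    print(f"--- {model_id} ---")
    print(output[0]["generated_text"])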
The remaining guesses were much more speculative, based on few concrete correlations, and
are probably all inaccurate.
I prompted the remaining models with the following, to no avail (a rough sketch of this probe
battery follows the list):
Strings of emojis
Vague emoji labels: 😀 : Happy, 😭 : Sad, 😡 : Angry, 🥰 : Love, 😨 :
Japanese emoticons (kaomoji): ( ◕‿ )づ(ΦωΦ ٩(^◡ (◕‿◕) ( •_•) ()ノ彡┻━
Repeating text: ‘again and again and again and again and’
Arithmetic: “356 + 482 =”
The empty character ‘ ’
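A rough sketch of that probe battery, where query_model is again a hypothetical placeholder for the challenge's query API (the probe strings here are a condensed subset of the list above):

probes = {
    "emoji_string": "😀😭😡🥰😨",
    "emoji_labels": "😀 : Happy\n😭 : Sad\n😡 : Angry\n🥰 : Love\n😨 :",
    "kaomoji":      "(◕‿◕) ( •_•)",
    "repetition":   "again and again and again and again and",
    "arithmetic":   "356 + 482 =",
    "empty":        " ",
}

def probe(model_id, query_model):
    # Send every probe to one target model and collect the raw responses.
    return {name: query_model(model_id, prompt) for name, prompt in probes.items()}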
The only things I could deduce were that:
model 3 sometimes generated unprompted introductions; my guess was DialoGPT-large.
model 4 (gpt2-xl) was finetuned on r/WritingPrompts ([WP] tags and Reddit usernames appeared
consistently).
model 6 was finetuned on Crime and Punishment by Dostoyevsky (character names and
passages appeared consistently).
model 8 was finetuned on some Beatles lyrics; I picked gpt-neo for it because of the short lines
and its similarity to the other GPT models.
Conclusion
This was a fairly messy attempt to spot both obvious correlations (multilingual generations) and
subtle ones (a tendency to start a new line) between finetuned and base models. I believe such
“residuals” from the base models could be detected with a reasonable degree of accuracy, given
enough information about them.
My approach achieved 50% accuracy, only one correct guess short of 1st place. Whether or not it
is realistically reliable, I hope it can serve as a rough first-pass guideline when dealing with a
harmful finetuned model.
Personally, I now feel much better acquainted with the many LLMs and the workflow surrounding
them. My thanks to the ATLAS team for hosting this competition.