I have been working on a dataset for training a GPT model on nothing but ORE chat. I am doing this for fun and for education, and it can bring Computer Science knowledge to more people as well. Only today did I realize that some people do not want to be included, for privacy reasons. I would like to clearly state that ALL of the messages already have the senders' names removed and are stripped down to pure chat text.
So I would like to know who wants their messages removed from this dataset. The problem is that I cannot just ask whether someone wants their old chat to be in the dataset or not, because Minecraft chat logs contain nicknames rather than usernames.
To be removed from this dataset, please comment your previous nicknames and your current ones in the replies. The color does not matter, only the text (e.g. Noksy = Nokia as the nickname).
Please reply in this format:
username:
previous and current nicknames:
confirm to be removed from the GPT-ORE Dataset: y/n
(Note: the nicknames are important. Without all of them we cannot completely remove your chat, but listing as many as you can remember will be enough to remove most of it.)
I was told by multiple people and Don that this would be the best solution.
If you do not see this within 5 days, the model will already have started training. You can still be removed from the dataset when the next version of the model is released.
Current list of players wanting to be removed:
(if anyone wants to they can always change their mind)
So, for those who somehow don’t understand yet, the final dataset DOES NOT INCLUDE YOUR NAMES. This means no message can be linked to the person who typed it. It is literally just the plain text you typed into the Minecraft textbox.
For those who want their messages removed: when we filter the chat logs, we also create a JSON file that stores the nickname alongside each message. This file does not end up in the final model; it exists only so we can filter out the messages of those who ask. You can request to have your messages taken out of the dataset (even though it is anonymous, so removing them does not achieve much), but you must tell us which nicknames you want removed. Only messages sent under those nicknames will be removed; that is the most we can do.
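As a rough illustration of the nickname filter described above, here is a minimal sketch in Python. It is not the actual script: the record fields `nickname` and `message`, the helper names `normalize` and `filter_opted_out`, the case-insensitive matching, and the color-code pattern (`§` or `&` followed by a format character) are all assumptions made for the example.

```python
import json
import re

# Assumption: nicknames may carry Minecraft-style format codes such as "§a" or "&l".
# Note this would also strip a literal "&" followed by one of these characters.
FORMAT_CODE = re.compile(r"[§&][0-9a-fk-or]", re.IGNORECASE)


def normalize(nickname: str) -> str:
    """Drop color/format codes and case so only the nickname text is compared."""
    return FORMAT_CODE.sub("", nickname).strip().lower()


def filter_opted_out(records, opted_out_nicknames):
    """Return plain message texts, skipping records whose nickname opted out."""
    blocked = {normalize(n) for n in opted_out_nicknames}
    return [r["message"] for r in records if normalize(r["nickname"]) not in blocked]


if __name__ == "__main__":
    # Hypothetical intermediate file contents: one object per chat message.
    sample = json.loads(
        '[{"nickname": "§aNokia", "message": "hello"},'
        ' {"nickname": "Someone", "message": "nice build"}]'
    )
    # Only the message text survives; nicknames never reach the output.
    print(filter_opted_out(sample, ["nokia"]))
```

Because matching is done on nicknames only, a nickname that is missing from the opt-out list will not be caught, which is why listing every nickname you remember matters.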
Furthermore, if you want every mention of your name removed from all messages just because haha yes privacy, fuck that. I am not doing that, because it opens a whole new can of worms and would literally not be possible without ruining major parts of the dataset.
If we start doing that stupid stuff, then…
garbage in = garbage out
Also, the more data there is, the better. The model trains better and therefore gives better-quality replies to queries. Data quality is also important, but this is mostly about learning about AI and having a cool thing to show people that they can mess with. It is a learning opportunity.
That said, I will probably go make an AI just for personal learning from the tons of logs I’ve gathered over the years. I have around 200k chat messages logged on this system alone, and many more logs elsewhere, so I could probably get to a million or more and see where that takes me. And yes, Minecraft logs chat messages by default. I have to do a lot of filtering, but this still gives me a ton of data to work with for purely educational purposes.
Opt-in means we barely get any data, and some people won’t care or won’t be online during this timeframe. Either way, your messages are pretty much anonymous and any personal information will be automatically removed as I add more and more filters and improve them as we go.