GPT-2 Medium Text Generation (~10 Million Tokens)
Built a complete text-generation pipeline that produced a dataset of 20,000+ GPT-2 Medium sequences of roughly 500 tokens each, totaling over 10 million tokens. The dataset was analyzed alongside 7 GB of AI- and human-written text spanning more than 1.3 million rows. All generation and analysis ran locally, demonstrating end-to-end capability in large-scale text processing.
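A minimal sketch of what such a local generation loop could look like, assuming the Hugging Face transformers and PyTorch libraries. The gpt2-medium checkpoint matches the project title, but the prompts, sampling parameters, sequence count, and output file name here are illustrative assumptions rather than the project's exact settings.

```python
# Sketch: generate ~500-token sequences locally with GPT-2 Medium.
# Assumes `transformers` and `torch` are installed; all parameters are illustrative.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2-medium")
model = GPT2LMHeadModel.from_pretrained("gpt2-medium").to(device)
model.eval()

def generate_sequence(prompt: str, max_new_tokens: int = 500) -> str:
    """Generate one ~500-token continuation of a short prompt."""
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,                      # sampled decoding, not greedy
            top_p=0.95,                          # nucleus sampling
            temperature=0.9,
            pad_token_id=tokenizer.eos_token_id,
        )
    return tokenizer.decode(output[0], skip_special_tokens=True)

if __name__ == "__main__":
    # Hypothetical seed prompts; the real pipeline would cycle through many more
    # and scale the loop to 20,000+ sequences to reach ~10M tokens.
    prompts = ["The history of", "In recent years,", "Scientists have found"]
    with open("gpt2_medium_sequences.txt", "w", encoding="utf-8") as f:
        for i in range(100):
            text = generate_sequence(prompts[i % len(prompts)])
            f.write(text.replace("\n", " ") + "\n")  # one sequence per line
```

Writing one sequence per line keeps the output easy to stream into downstream analysis alongside the larger AI/human text corpus.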