Trying out AI persona marketing with NVIDIA DGX Spark, Part 2: Selecting target audience

Narrow down by specifying conditions

現在2169件がヒットしています。check

NVIDIA, a foundational AI and artificial intelligence software provider.

[Trying out AI persona marketing with NVIDIA DGX Spark]
Episode 1: Overview of the Demo
Episode 2: Selection of the Subjects to be Investigated
Episode 3: Question Generation
Episode 4: Generating and Collecting Responses
Episode 5: Visualizing the Survey Results
Episode 6 Summary

Contents of this story

In this article, we will explain how to download the Nemotron-Personas-Japan dataset (*1), which is a set of persona data, and how to extract the target audience for the survey from it.

*1: Fujita, A., Gong, V., Ogushi, M., Yamamoto, K., Suhara, Y., Corneil, D., & Meyer, Y. (2025). Nemotron-Personas-Japan: Synthetic Personas Aligned to Real-World Distributions. https://huggingface.co/datasets/nvidia/Nemotron-Personas-Japan

Download Persona Data

The Nemotron-Personas-Japan dataset can be downloaded from Hugging Face Hub as follows:

from huggingface_hub import hf_hub_download
import cudf
import json

# Nemotron-Personas-JapanデータセットのファイルをHugging Face Hubからダウンロード num_files = 8 file_list = [] for i in range(num_files):     filename = f"data/train-{format(i, '#05d')}-of-{format(num_files, '#05d')}.parquet"     path = hf_hub_download(repo_id="nvidia/Nemotron-Personas-Japan", filename=filename, repo_type="dataset")     file_list.append(path) file_list

Loading the dataset

The Nemotron-Personas-Japan dataset is provided as a Parquet file, so it can be loaded using cuDF, which is part of the NVIDIA RAPIDS ecosystem. cuDF is a GPU-accelerated Python DataFrame library that provides a Pandas-compatible API and enables high-speed (up to 150x *2) loading, filtering, and manipulating large datasets on the GPU.

*2: RAPIDS cuDF speeds up pandas by approximately 150 times with zero code changes.

df = cudf.read_parquet(file_list)
print(f"Number of rows: {len(df)}")
print("Column Names:")
print(df.columns)

Number of rows: 1000000
Column Names:
Index(['uuid', 'professional_persona', 'sports_persona', 'arts_persona',
       'travel_persona', 'culinary_persona', 'persona', 'cultural_background',
       'skills_and_expertise', 'skills_and_expertise_list',
       'hobbies_and_interests', 'hobbies_and_interests_list',
       'career_goals_and_ambitions', 'sex', 'age', 'marital_status',
       'education_level', 'occupation', 'region', 'area', 'prefecture',
       'country'],
      dtype='object')

You can extract rows that match the conditions in the same way as with Pandas, as shown below.

# 年齢が20歳以上30歳未満のレコードを抽出 df[(df["age"] >= 20) & (df["age"] < 30)].sample(10) # 抽出したデータからランダムに10行を表示

A portion of the extracted rows is displayed.

*The Nemotron-Personas-Japan dataset was generated by AI, so the names of the people displayed are fictional.

Selection of survey participants

The survey participants (*3) are selected as follows: In this demo, 1,000 individuals aged 20 to under 30 were randomly selected from 1 million fictional individuals contained in the Nemotron-Personas-Japan dataset to be surveyed.

*3: In the Nemotron-Personas-Japan dataset, each (fictional) person has six personas and sixteen attributes. In this demo, all of these (except UUIDs) are provided to the LLM to conduct a survey.

# 条件に合致するレコード数を抽出 filtered_df = df[ (df["age"] >= 20) & (df["age"] < 30) ].sample(1000) len(filtered_df)

# cuDFデータフレームをPandasデータフレームに変換 filtered_df = filtered_df.to_pandas() len(filtered_df)

# JSONL形式で保存 filtered_df.to_json("personas.jsonl", orient="records", force_ascii=False, lines=True)

Summary of this episode

We've shown you how to download the Nemotron-Personas-Japan dataset and extract survey participants that match your user criteria. The Nemotron-Personas-Japan dataset is a massive dataset with 1 million personas, but by loading it into a cuDF DataFrame, you can achieve high-speed data access using NVIDIA GPUs.

In the next episode, we will explain how to generate survey questions and answer choices using LLM.

Episode 3: Question Generation is here

Click here for inquiries

Manufacturer TOP page