df.to_dict(orient=records) not forming exact columns as in the dataframe #25166

Closed
@kebab-mai-haddi

While reading a CSV file, I try to get a list of column names via:

>>> import pandas as pd
>>> df_attr = pd.read_csv("3BtTWMUQAMawXcCheUAOMlXU.csv", nrows=1)
>>> cols = list(df_attr)
>>> print(cols)

The output, i.e., the list of columns, is:

['Response ID', 'Group Context', 'name', 'roh', 'sex', 'age', 'agemonth', 'ms', 'married_in_the_last_1', 'aadhar_yn', 'aadhar_number', 'aadhar_picture', 'post_acc', 'life_ins', 'health_ins', 'asso_memb_Youth.club', 'asso_memb_cultural.group', 'asso_memb_JFM', 'asso_memb_Association.of.the.users.of.drinking.water', 'asso_memb_SHG', 'asso_memb_Semi.governmental.organizations', 'asso_memb_Farmers.association', 'asso_memb_Youth.association', 'asso_memb_Women.s.association', 'asso_memb_nothing', 'asso_memb_Semi.governmental.organizations.1', 'asso_memb_Farmers.association.1', 'asso_memb_Women.s.association.1', 'asso_memb_Youth.association.1', 'asso_memb_SHG_EDTXJEKHESFdTtQwMcIT', 'asso_memb_Association.of.the.users.of.drinking.water.1', 'asso_memb_JFM_EDTXJEKHESFdTtQwMcIT', 'asso_memb_cultural.group.1', 'asso_memb_Youth.club.1', 'asso_memb_nothing.1', 'बँकिंग', 'bank_acc', 'ac_linked_aadhar', 'jdy_yn', 'शिक्षण', 'edu_high', 'edu_informal', 'edu_inst', 'comp_lit', 'कौशल्य.विकासासंबंधी.प्रश्न.फक्त.१८.३५.वयोगटातील.लोक्काना.विचारावे', 'any_skills_development_training', 'receive_skill_training_from', 'member_get_training_from', 'व्यवसाय', 'occ', 'other', 'require_assistance_in_receiving_employment', 'mgnrega_yn', 'mgnregaapp_yn', 'mgnregawork_yn', 'mgnrega_days', 'scheme_memb_Old Pension Scheme', 'scheme_memb_Janani Suraksha Yojana', 'scheme_memb_Disability Benefits', 'scheme_memb_Scholarship', 'scheme_memb_Widow Pension', 'vill_name', 'census_village_sd_2011', 'census_district_2011', 'vill_name_taluka_code', 'census_subdistrict_2011', 'subdistrict_code', 'vill_name_gp_code', 'vill_name_taluka_name', 'district_code', 'vill_name_gp_name', 'vill_name_village_code', 'location_latitude', 'location_longitude', 'location_accuracy', 'hoh', 'contact', 'informant', 'rh', 'rhoth', 'hh_occu', 'religion', 'socialgrp', 'sc_health_id', 'census_country', 'state_name', 'state_code', 'age_married', 'village_code_census2011_raw', 'phase']

and rows via:

>>> import sys
>>> df = pd.read_csv("3BtTWMUQAMawXcCheUAOMlXU.csv", chunksize=2)
>>> for d in df:
...     d = d.to_dict(orient='records')
...     for r in d:
...         print(r)
...     sys.exit()  # stop after the first chunk

The output is:

{'_0': '74a7c6f8-94f3-4882-8ad7-9199313a1a51', '_1': 1, 'name': 'Suresh Kautik Patil', 'roh': 'Self', 'sex': 'Male', 'age': 62, 'agemonth': 2, 'ms': 'Married', 'married_in_the_last_1': 'No', 'aadhar_yn': 1, 'aadhar_number': 467172934356, 'aadhar_picture': 'Https://Collect-V2-Production.s3.Ap-South-1.Amazonaws.com/Nzrzdt5akq7uutju90bf%2Fhxbrh6bz57ihvawa3e4n%2Fze2gzezsov7polixljkg%2Fece8f806-3633-445D-8183-336460Cc6207', 'post_acc': 0, 'life_ins': 0, 'health_ins': 0, '_15': 0, '_16': 0, 'asso_memb_JFM': 0, '_18': 0, 'asso_memb_SHG': 0, '_20': 0, '_21': 0, '_22': 0, '_23': 0, 'asso_memb_nothing': 0, '_25': 0, '_26': 0, '_27': 0, '_28': 0, 'asso_memb_SHG_EDTXJEKHESFdTtQwMcIT': 0, '_30': 0, 'asso_memb_JFM_EDTXJEKHESFdTtQwMcIT': 0, '_32': 0, '_33': 0, '_34': 1, 'बँकिंग': nan, 'bank_acc': 1, 'ac_linked_aadhar': 1, 'jdy_yn': 1, 'शिक्षण': nan, 'edu_high': 'Secondary', 'edu_informal': 'Other', 'edu_inst': 'No Information', 'comp_lit': 'No', '_44': nan, 'any_skills_development_training': nan, 'receive_skill_training_from': nan, 'member_get_training_from': nan, 'व्यवसाय': nan, 'occ': 'Labourers', 'other': nan, 'require_assistance_in_receiving_employment': 'No', 'mgnrega_yn': 0, 'mgnregaapp_yn': nan, 'mgnregawork_yn': nan, 'mgnrega_days': nan, '_56': 'No', '_57': nan, '_58': nan, 'scheme_memb_Scholarship': nan, '_60': nan, 'vill_name': 'Cc2bf74e-5Baa-4A23-Ad9d-21Fef4517f41', 'census_village_sd_2011': 'Shindgavhan', 'census_district_2011': 'Nandurbar', 'vill_name_taluka_code': 3954, 'census_subdistrict_2011': 'Nandurbar', 'subdistrict_code': 3954, 'vill_name_gp_code': 182276, 'vill_name_taluka_name': 'Nandurbar', 'district_code': 497, 'vill_name_gp_name': 'Shidgavahan', 'vill_name_village_code': 525705, 'location_latitude': 21.4179219, 'location_longitude': 74.354738, 'location_accuracy': 3, 'hoh': 'Suresh Kautik Patil', 'contact': 'In(+91)-9374060682', 'informant': 'Suresh Kautik Patil', 'rh': 'Self', 'rhoth': nan, 'hh_occu': 'Labourers', 'religion': 'Hindu', 'socialgrp': 'OBC', 'sc_health_id': 1, 'census_country': 'India', 'state_name': 'Maharashtra', 'state_code': 27, 'age_married': '<21 (For Boys)', 'village_code_census2011_raw': 525705, 'phase': 'Phase 3'}

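As you can see, the column name Response ID is missing from the row: its value appears under the key '_0' instead. The same happens to other columns whose names contain a space or a dot, for example Group Context becomes '_1' and asso_memb_Youth.club becomes '_15', where the number is the column's position. Note that df.iterrows() gave me all the correct column names.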
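The pattern of replaced keys looks like what collections.namedtuple does with rename=True: any field name that is not a valid Python identifier is swapped for an underscore plus its position. That to_dict(orient='records') goes through namedtuples internally in this pandas version is only my assumption, not something confirmed here, but the standard-library behaviour itself can be shown with a short sketch (the column list is a shortened, illustrative subset of mine):

```python
from collections import namedtuple

# Shortened subset of the real header, chosen for illustration only.
columns = ["Response ID", "Group Context", "name",
           "asso_memb_Youth.club", "asso_memb_JFM"]

# rename=True silently replaces invalid identifiers (names with spaces
# or dots) by "_<position>" instead of raising ValueError.
Row = namedtuple("Row", columns, rename=True)

print(Row._fields)
# ('_0', '_1', 'name', '_3', 'asso_memb_JFM')
```

In the full header asso_memb_Youth.club sits at position 15, which would explain the '_15' key in the output above.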

The first few lines (including the header) of my CSV file are here, so that this question can be treated as a minimal, complete, and verifiable example.
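As a possible workaround until this is sorted out, the records can be built by hand by pairing the original column labels with plain tuples from itertuples(index=False, name=None). This is only a sketch: rows_as_dicts is a hypothetical helper name of mine, and the small frame below is made-up sample data standing in for one chunk of the CSV.

```python
import pandas as pd


def rows_as_dicts(frame):
    """Return one dict per row, keyed by the exact column labels.

    Hypothetical helper: name=None makes itertuples yield plain tuples,
    so no column label is ever turned into a namedtuple field name.
    """
    cols = list(frame.columns)
    return [dict(zip(cols, row))
            for row in frame.itertuples(index=False, name=None)]


# Made-up sample standing in for one chunk of the real file.
sample = pd.DataFrame({
    "Response ID": ["74a7c6f8"],
    "asso_memb_Youth.club": [0],
    "name": ["Suresh Kautik Patil"],
})

records = rows_as_dicts(sample)
print(list(records[0].keys()))
```

Inside the chunk loop this would be used as `for r in rows_as_dicts(d): print(r)` in place of `d.to_dict(orient='records')`.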
