Closed
Description
While reading a CSV file, I try to get a list of column names via:
>>> df_attr = pd.read_csv("3BtTWMUQAMawXcCheUAOMlXU.csv", nrows=1)
>>> cols = list(df_attr)
>>> print (cols)
The o/p, i.e., the columns are:
['Response ID', 'Group Context', 'name', 'roh', 'sex', 'age', 'agemonth', 'ms', 'married_in_the_last_1', 'aadhar_yn', 'aadhar_number', 'aadhar_picture', 'post_acc', 'life_ins', 'health_ins', 'asso_memb_Youth.club', 'asso_memb_cultural.group', 'asso_memb_JFM', 'asso_memb_Association.of.the.users.of.drinking.water', 'asso_memb_SHG', 'asso_memb_Semi.governmental.organizations', 'asso_memb_Farmers.association', 'asso_memb_Youth.association', 'asso_memb_Women.s.association', 'asso_memb_nothing', 'asso_memb_Semi.governmental.organizations.1', 'asso_memb_Farmers.association.1', 'asso_memb_Women.s.association.1', 'asso_memb_Youth.association.1', 'asso_memb_SHG_EDTXJEKHESFdTtQwMcIT', 'asso_memb_Association.of.the.users.of.drinking.water.1', 'asso_memb_JFM_EDTXJEKHESFdTtQwMcIT', 'asso_memb_cultural.group.1', 'asso_memb_Youth.club.1', 'asso_memb_nothing.1', 'बँकिंग', 'bank_acc', 'ac_linked_aadhar', 'jdy_yn', 'शिक्षण', 'edu_high', 'edu_informal', 'edu_inst', 'comp_lit', 'कौशल्य.विकासासंबंधी.प्रश्न.फक्त.१८.३५.वयोगटातील.लोक्काना.विचारावे', 'any_skills_development_training', 'receive_skill_training_from', 'member_get_training_from', 'व्यवसाय', 'occ', 'other', 'require_assistance_in_receiving_employment', 'mgnrega_yn', 'mgnregaapp_yn', 'mgnregawork_yn', 'mgnrega_days', 'scheme_memb_Old Pension Scheme', 'scheme_memb_Janani Suraksha Yojana', 'scheme_memb_Disability Benefits', 'scheme_memb_Scholarship', 'scheme_memb_Widow Pension', 'vill_name', 'census_village_sd_2011', 'census_district_2011', 'vill_name_taluka_code', 'census_subdistrict_2011', 'subdistrict_code', 'vill_name_gp_code', 'vill_name_taluka_name', 'district_code', 'vill_name_gp_name', 'vill_name_village_code', 'location_latitude', 'location_longitude', 'location_accuracy', 'hoh', 'contact', 'informant', 'rh', 'rhoth', 'hh_occu', 'religion', 'socialgrp', 'sc_health_id', 'census_country', 'state_name', 'state_code', 'age_married', 'village_code_census2011_raw', 'phase']
and rows via:
>>>df = pd.read_csv("3BtTWMUQAMawXcCheUAOMlXU.csv", chunksize=2)
>>>for d in df:
... d = d.to_dict(orient='records')
... for r in d:
... print(r)
... import sys
... sys.exit()
The o/p is:
{'_0': '74a7c6f8-94f3-4882-8ad7-9199313a1a51', '_1': 1, 'name': 'Suresh Kautik Patil', 'roh': 'Self', 'sex': 'Male', 'age': 62, 'agemonth': 2, 'ms': 'Married', 'married_in_the_last_1': 'No', 'aadhar_yn': 1, 'aadhar_number': 467172934356, 'aadhar_picture': 'Https://Collect-V2-Production.s3.Ap-South-1.Amazonaws.com/Nzrzdt5akq7uutju90bf%2Fhxbrh6bz57ihvawa3e4n%2Fze2gzezsov7polixljkg%2Fece8f806-3633-445D-8183-336460Cc6207', 'post_acc': 0, 'life_ins': 0, 'health_ins': 0, '_15': 0, '_16': 0, 'asso_memb_JFM': 0, '_18': 0, 'asso_memb_SHG': 0, '_20': 0, '_21': 0, '_22': 0, '_23': 0, 'asso_memb_nothing': 0, '_25': 0, '_26': 0, '_27': 0, '_28': 0, 'asso_memb_SHG_EDTXJEKHESFdTtQwMcIT': 0, '_30': 0, 'asso_memb_JFM_EDTXJEKHESFdTtQwMcIT': 0, '_32': 0, '_33': 0, '_34': 1, 'बँकिंग': nan, 'bank_acc': 1, 'ac_linked_aadhar': 1, 'jdy_yn': 1, 'शिक्षण': nan, 'edu_high': 'Secondary', 'edu_informal': 'Other', 'edu_inst': 'No Information', 'comp_lit': 'No', '_44': nan, 'any_skills_development_training': nan, 'receive_skill_training_from': nan, 'member_get_training_from': nan, 'व्यवसाय': nan, 'occ': 'Labourers', 'other': nan, 'require_assistance_in_receiving_employment': 'No', 'mgnrega_yn': 0, 'mgnregaapp_yn': nan, 'mgnregawork_yn': nan, 'mgnrega_days': nan, '_56': 'No', '_57': nan, '_58': nan, 'scheme_memb_Scholarship': nan, '_60': nan, 'vill_name': 'Cc2bf74e-5Baa-4A23-Ad9d-21Fef4517f41', 'census_village_sd_2011': 'Shindgavhan', 'census_district_2011': 'Nandurbar', 'vill_name_taluka_code': 3954, 'census_subdistrict_2011': 'Nandurbar', 'subdistrict_code': 3954, 'vill_name_gp_code': 182276, 'vill_name_taluka_name': 'Nandurbar', 'district_code': 497, 'vill_name_gp_name': 'Shidgavahan', 'vill_name_village_code': 525705, 'location_latitude': 21.4179219, 'location_longitude': 74.354738, 'location_accuracy': 3, 'hoh': 'Suresh Kautik Patil', 'contact': 'In(+91)-9374060682', 'informant': 'Suresh Kautik Patil', 'rh': 'Self', 'rhoth': nan, 'hh_occu': 'Labourers', 'religion': 'Hindu', 'socialgrp': 'OBC', 'sc_health_id': 1, 'census_country': 'India', 'state_name': 'Maharashtra', 'state_code': 27, 'age_married': '<21 (For Boys)', 'village_code_census2011_raw': 525705, 'phase': 'Phase 3'}
As you can see, the column name Response ID is missing while I read the row. It should be noted that df.iterrows() gave me all the correct columns.
The first few lines(inc the header) from my CSV file are here so that one can take this question as MVC.